concrete.util.simple_comm module
Create a simple (valid) Communication suitable for testing purposes
class concrete.util.simple_comm.SimpleCommTempFile(n=10, id_fmt=u'temp-%d', sentence_fmt=u'Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix=u'.concrete')
Bases: object
DEPRECATED. Please use create_comm() instead.
Class representing a temporary file of sample concrete objects. Designed to facilitate testing.
- path (str) – path to file
- communications (Communication[]) – list of communications that were written to file
Usage:
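    from concrete.util import CommunicationReader
    from concrete.util.simple_comm import SimpleCommTempFile

    # Write three sample communications to a temporary file
    with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
        # Read the communications back and compare them with the originals
        reader = CommunicationReader(f.path)
        for (orig_comm, comm_path_pair) in zip(f.communications, reader):
            print(orig_comm.id)
            print(orig_comm.id == comm_path_pair[0].id)
            print(f.path == comm_path_pair[1])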
Create temp file and write communications.
Parameters: - n – number of communications to write
- id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
- sentence_fmt – format string used to generate the sentence text of each communication; should contain one instance of %d, which will be replaced by the number of the communication
- writer_class – CommunicationWriter or CommunicationWriterTGZ
- suffix – file path suffix (you probably want to choose this to match writer_class)
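The role of the two format strings can be illustrated without the library. The following is only a sketch: the helper name make_ids_and_sentences is hypothetical, and numbering the communications from 0 is an assumption that this page does not confirm.

```python
def make_ids_and_sentences(n=10, id_fmt='temp-%d',
                           sentence_fmt='Super simple sentence %d .'):
    """Return one (communication ID, sentence text) pair per communication.

    Hypothetical helper; assumes communications are numbered from 0.
    """
    # Each format string contains a single %d that receives the number
    # of the communication.
    return [(id_fmt % i, sentence_fmt % i) for i in range(n)]


pairs = make_ids_and_sentences(n=3)
for comm_id, sentence in pairs:
    print(comm_id, '->', sentence)
```

A format string without a %d placeholder raises a TypeError under the % operator, which is why both parameters require exactly one %d.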
concrete.util.simple_comm.add_annotation_level_argparse_argument(parser)
Add an '--annotation-level' argument to an ArgumentParser
The '--annotation-level' argument specifies the level of concrete annotation to infer from whitespace in text. See create_comm() for details.
Parameters: parser (argparse.ArgumentParser) –
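To make the effect on a parser concrete, here is a minimal stand-in written with argparse alone. It is not the library's implementation: the function name add_annotation_level_argument is hypothetical, and of the level names used below only 'token' (the default in the create_comm() signature) is confirmed by this page; the other three are assumed.

```python
import argparse

# Assumed level names; only 'token' appears in the documented signature.
ANNOTATION_LEVELS = ('none', 'section', 'sentence', 'token')


def add_annotation_level_argument(parser):
    """Hypothetical stand-in for add_annotation_level_argparse_argument()."""
    parser.add_argument(
        '--annotation-level',
        choices=ANNOTATION_LEVELS,
        default='token',
        help='level of annotation to infer from whitespace in text',
    )


parser = argparse.ArgumentParser(description='Build a Communication from text')
add_annotation_level_argument(parser)

# argparse exposes the option as the attribute "annotation_level"
args = parser.parse_args(['--annotation-level', 'sentence'])
print(args.annotation_level)
```

The parsed value could then be passed on as the annotation_level argument of create_comm().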
concrete.util.simple_comm.create_comm(comm_id, text=u'', comm_type=u'article', section_kind=u'passage', metadata_tool=u'concrete-python', metadata_timestamp=None, annotation_level=u'token')
Create a simple, valid Communication from text.
By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections.
annotation_level controls the amount of annotation that is added:
- AL_NONE: add no optional annotations (not even sections)
- AL_SECTION: add sections but not sentences
- AL_SENTENCE: add sentences but not tokens
- AL_TOKEN: add all annotations, up to tokens (the default)
Parameters: - comm_id (str) –
- text (str) –
- comm_type (str) –
- section_kind (str) –
- metadata_tool (str) –
- metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
- annotation_level (str) –
Return type: Communication
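The splitting rule and the annotation levels described above can be sketched in plain Python. This is an illustration only, not the library's code: the names spans and annotate are hypothetical, the level strings other than 'token' are assumed, and plain dictionaries stand in for the Section and Sentence objects.

```python
LEVELS = ('none', 'section', 'sentence', 'token')  # assumed level names


def spans(text, sep, offset=0):
    """Yield (start, end) character offsets of the non-blank pieces of text."""
    pos = 0
    for piece in text.split(sep):
        if piece.strip():
            yield offset + pos, offset + pos + len(piece)
        pos += len(piece) + len(sep)


def annotate(text, annotation_level='token'):
    """Split text into sections, sentences and tokens up to the given level."""
    depth = LEVELS.index(annotation_level)
    sections = []
    if depth >= 1:
        # Sections are separated by double newlines
        for s_start, s_end in spans(text, '\n\n'):
            section = {'span': (s_start, s_end), 'sentences': []}
            if depth >= 2:
                # Sentences are separated by single newlines within a section
                for t_start, t_end in spans(text[s_start:s_end], '\n', s_start):
                    sentence = {'span': (t_start, t_end), 'tokens': []}
                    if depth >= 3:
                        sentence['tokens'] = text[t_start:t_end].split()
                    section['sentences'].append(sentence)
            sections.append(section)
    return sections


text = 'First sentence .\nSecond sentence .\n\nNew section .'
sections = annotate(text)
for section in sections:
    print(section['span'], [s['span'] for s in section['sentences']])
```

All offsets are relative to the full text, so text[start:end] recovers the original section or sentence.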
concrete.util.simple_comm.create_section(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)
Create Section from provided text and metadata.
Lower-level routine (called by create_comm()).
Parameters: - sec_text (str) –
- sec_start (int) –
- sec_end (int) –
- section_kind (str) –
- aug (_AnalyticUUIDGenerator) –
- metadata_tool (str) –
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – See create_comm() for details
Return type: Section
concrete.util.simple_comm.create_sentence(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)
Create Sentence from provided text and metadata.
Lower-level routine (called indirectly by create_comm()).
Parameters: - sen_text (str) –
- sen_start (int) –
- sen_end (int) –
- aug (_AnalyticUUIDGenerator) –
- metadata_tool (str) –
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – See create_comm() for details
Return type: Sentence
concrete.util.simple_comm.create_simple_comm(comm_id, sentence_string=u'Super simple sentence .')
Create a simple (valid) Communication suitable for testing purposes.
The Communication will have a single Section containing a single Sentence.
Parameters: - comm_id (str) – Specifies a Communication ID
- sentence_string (str) – String to be used for the sentence text. The string will be whitespace-tokenized.
Return type: Communication
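Whitespace tokenization of sentence_string can be pictured as follows. This is a sketch under assumptions: whitespace_tokenize is a hypothetical helper, and the tuples it returns are an illustrative representation, not the token objects the library stores in the Communication.

```python
import re


def whitespace_tokenize(sentence_string):
    """Return (text, start, end) for each whitespace-delimited token."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r'\S+', sentence_string)]


tokens = whitespace_tokenize('Super simple sentence .')
for token_text, start, end in tokens:
    print(token_text, start, end)
```

Because tokens are split on whitespace only, punctuation must be separated by a space to become its own token, as the final " ." in the default sentence string is.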