concrete.util.simple_comm module
Create a simple (valid) Communication suitable for testing purposes.
-
class concrete.util.simple_comm.SimpleCommTempFile(n=10, id_fmt=u'temp-%d', sentence_fmt=u'Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix=u'.concrete')
Bases: object
DEPRECATED. Please use create_comm() instead.
Class representing a temporary file of sample Concrete objects, designed to facilitate testing.
-
path (str) – path to the file
-
communications (Communication[]) – list of Communications that were written to the file
Usage:

    from concrete.util import CommunicationReader
    from concrete.util.simple_comm import SimpleCommTempFile

    with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
        reader = CommunicationReader(f.path)
        # The reader yields (Communication, filename) pairs.
        for (orig_comm, comm_path_pair) in zip(f.communications, reader):
            print(orig_comm.id)
            print(orig_comm.id == comm_path_pair[0].id)
            print(f.path == comm_path_pair[1])
Create a temp file and write communications to it.
Parameters: - n – number of communications to write
- id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
- sentence_fmt – format string used to generate the text of each communication; should contain one instance of %d, which will be replaced by the number of the communication
- writer_class – CommunicationWriter or CommunicationWriterTGZ
- suffix – file path suffix (you probably want to choose this to match writer_class)
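The constructor's pattern (expand the format strings once per item, write to a fresh temporary file, delete it when the with-block ends) can be sketched with the standard library alone. `TempRecordFile` is hypothetical, not part of concrete-python: it writes tab-separated text lines instead of serialized Communications, and it assumes numbering starts at 0.

```python
import os
import tempfile


class TempRecordFile(object):
    """Hypothetical stand-in for SimpleCommTempFile that writes text lines."""

    def __init__(self, n=10, id_fmt='temp-%d',
                 sentence_fmt='Super simple sentence %d .', suffix='.txt'):
        # Expand each format string once per record; starting the
        # numbering at 0 is an assumption of this sketch.
        self.records = [(id_fmt % i, sentence_fmt % i) for i in range(n)]
        # mkstemp returns an open OS-level descriptor plus the file path.
        fd, self.path = tempfile.mkstemp(suffix=suffix)
        with os.fdopen(fd, 'w') as f:
            for (rec_id, text) in self.records:
                f.write('%s\t%s\n' % (rec_id, text))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Remove the temporary file when the with-block ends.
        os.remove(self.path)
```

Used as `with TempRecordFile(n=3) as f: ...`, the file at `f.path` exists only inside the block, which is what makes this shape convenient for test fixtures.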
-
-
concrete.util.simple_comm.add_annotation_level_argparse_argument(parser)
Add an ‘--annotation-level’ argument to an ArgumentParser.
The ‘--annotation-level’ argument specifies the level of Concrete annotation to infer from whitespace in text. See create_comm() for details.
Parameters: parser (argparse.ArgumentParser) – the parser to add the argument to
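A rough equivalent built on the standard argparse module shows how such an option is typically declared and read back. `add_level_argument` and `ANNOTATION_LEVELS` are hypothetical names; the real function's choices and default may differ, since only 'token' (create_comm()'s default) is confirmed above. By default argparse stores ‘--annotation-level’ as the attribute `annotation_level`, which could then be passed on as `create_comm(..., annotation_level=args.annotation_level)`.

```python
import argparse

# Hypothetical level names; only 'token' (create_comm's default) is
# confirmed by the documentation.
ANNOTATION_LEVELS = ('none', 'section', 'sentence', 'token')


def add_level_argument(parser):
    """Hypothetical equivalent of add_annotation_level_argparse_argument."""
    parser.add_argument(
        '--annotation-level', choices=ANNOTATION_LEVELS, default='token',
        help='level of annotation to infer from whitespace in text')


parser = argparse.ArgumentParser()
add_level_argument(parser)
# argparse derives the attribute name 'annotation_level' from the flag.
args = parser.parse_args(['--annotation-level', 'sentence'])
```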
-
concrete.util.simple_comm.create_comm(comm_id, text=u'', comm_type=u'article', section_kind=u'passage', metadata_tool=u'concrete-python', metadata_timestamp=None, annotation_level=u'token')
Create a simple, valid Communication from text.
By default the text will be split by double newlines into sections and then by single newlines into sentences within those sections. Each section will be created with a call to create_section().
annotation_level controls the amount of annotation that is added:
- AL_NONE: add no optional annotations (not even sections)
- AL_SECTION: add sections but not sentences
- AL_SENTENCE: add sentences but not tokens
- AL_TOKEN: add all annotations, up to tokens (the default)
Parameters: - comm_id (str) – Communication id
- text (str) – Communication text
- comm_type (str) – Communication type
- section_kind (str) – Section kind to set on all sections
- metadata_tool (str) – tool name of the analytic that generated this text
- metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
- annotation_level (str) – string representing annotation level to add to communication (see above)
Returns: Communication containing given text and metadata
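The splitting rules and annotation levels described for create_comm() can be summarized in a few lines of plain Python. This is a hypothetical sketch, not the library's implementation: `outline` and `LEVELS` are illustrative names, the level strings other than 'token' are assumed stand-ins for the AL_* constants, and the result is nested lists rather than Concrete Section, Sentence, and Tokenization objects.

```python
# Hypothetical level names, ordered from least to most annotation; only
# 'token' (create_comm's default) is confirmed by the documentation.
LEVELS = ('none', 'section', 'sentence', 'token')


def outline(text, annotation_level='token'):
    """Sketch of the structure each annotation level is described as adding.

    Returns a list of sections; each section is a list of sentences; each
    sentence is its raw text ('sentence' level) or a list of whitespace
    tokens ('token' level).
    """
    depth = LEVELS.index(annotation_level)  # ValueError for unknown levels
    if depth < LEVELS.index('section'):
        return []  # no optional annotations, not even sections
    sections = []
    for sec_text in text.split('\n\n'):  # a double newline ends a section
        sentences = []
        if depth >= LEVELS.index('sentence'):
            for sen_text in sec_text.split('\n'):  # a single newline ends a sentence
                if depth >= LEVELS.index('token'):
                    sentences.append(sen_text.split())
                else:
                    sentences.append(sen_text)
        sections.append(sentences)
    return sections
```

For example, `outline('A b .\nC d .\n\nE f .', 'sentence')` yields two sections, the first holding two sentences.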
-
concrete.util.simple_comm.create_section(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)
Create a Section from provided text and metadata. The section text will be split into sentence texts by newlines, and each sentence will be created with a call to create_sentence().
Lower-level routine (called by create_comm()).
Parameters: - sec_text (str) – text to create section from
- sec_start (int) – starting position of section in Communication text (inclusive)
- sec_end (int) – ending position of section in Communication text (inclusive)
- section_kind (str) – value to set the Section.kind field to
- aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this section
- metadata_tool (str) – tool name of the analytic that generated this section
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – see create_comm() for details
Returns: Concrete Section containing given text and metadata
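The offset bookkeeping implied by sec_start can be sketched as follows. `sentence_spans` is a hypothetical helper, not part of concrete.util.simple_comm: it splits a section on newlines and shifts each sentence's local position by sec_start so that offsets index into the full Communication text. It uses Python's half-open slice convention (end exclusive); the parameter descriptions above call the end positions inclusive, so subtract one from each end if you follow that convention.

```python
def sentence_spans(sec_text, sec_start):
    """Hypothetical helper: split a section into sentences on newlines.

    Returns (sen_text, sen_start, sen_end) triples whose offsets index into
    the full Communication text, half-open: comm_text[sen_start:sen_end].
    """
    spans = []
    offset = sec_start  # absolute position of the current sentence
    for sen_text in sec_text.split('\n'):
        spans.append((sen_text, offset, offset + len(sen_text)))
        offset += len(sen_text) + 1  # skip the newline separator
    return spans
```

Because every span is absolute, `comm_text[sen_start:sen_end] == sen_text` holds for each sentence, which is a convenient invariant to assert in tests.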
-
concrete.util.simple_comm.create_sentence(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)
Create a Sentence from provided text and metadata.
Lower-level routine (called indirectly by create_comm()).
Parameters: - sen_text (str) – text to create sentence from
- sen_start (int) – starting position of sentence in Communication text (inclusive)
- sen_end (int) – ending position of sentence in Communication text (inclusive)
- aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this sentence
- metadata_tool (str) – tool name of the analytic that generated this sentence
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – see create_comm() for details
Returns: Concrete Sentence containing given text and metadata
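At the token level, annotation is inferred from whitespace. A minimal sketch of whitespace tokenization with character offsets, assuming a token is a maximal run of non-whitespace characters; `whitespace_tokens` is a hypothetical helper that returns plain tuples rather than Concrete Token objects.

```python
import re


def whitespace_tokens(sen_text, sen_start):
    """Hypothetical helper: whitespace tokenization with absolute offsets.

    Returns (token_index, token_text, start, end) tuples; start/end are
    half-open offsets into the full Communication text.
    """
    return [
        (i, m.group(), sen_start + m.start(), sen_start + m.end())
        for (i, m) in enumerate(re.finditer(r'\S+', sen_text))
    ]
```

Using `re.finditer` rather than `str.split` keeps each token's position, so runs of several spaces do not shift later offsets.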
-
concrete.util.simple_comm.create_simple_comm(comm_id, sentence_string=u'Super simple sentence .')
Create a simple (valid) Communication suitable for testing purposes.
The Communication will have a single Section containing a single Sentence.
Parameters: - comm_id (str) – Communication id
- sentence_string (str) – Communication text
Returns: Communication containing given text and having the given id