concrete.util.simple_comm module

Create a simple (valid) Communication suitable for testing purposes

class concrete.util.simple_comm.SimpleCommTempFile(n=10, id_fmt=u'temp-%d', sentence_fmt=u'Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix=u'.concrete')

Bases: object

DEPRECATED. Please use create_comm() instead.

Class representing a temporary file of sample concrete objects. Designed to facilitate testing.

path: str – path to file

communications: Communication[] – list of communications that were written to file

Usage:

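from concrete.util import CommunicationReader
from concrete.util.simple_comm import SimpleCommTempFile

# Write three sample communications to a temp file, then read them back.
with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
    reader = CommunicationReader(f.path)
    # The reader yields (communication, path) pairs.
    for (orig_comm, comm_path_pair) in zip(f.communications, reader):
        print(orig_comm.id)
        print(orig_comm.id == comm_path_pair[0].id)
        print(f.path == comm_path_pair[1])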

Create temp file and write communications.

Parameters:

n – number of communications to write
id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
sentence_fmt – format string used to generate the sentence text of each communication; should contain one instance of %d, which will be replaced by the number of the communication
writer_class – CommunicationWriter or CommunicationWriterTGZ
suffix – file path suffix (you probably want to choose this to match writer_class)
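The way id_fmt and sentence_fmt expand can be illustrated without the library. The following is a minimal sketch, not the class's actual implementation: expand_formats is a hypothetical helper, and numbering from 0 is an assumption, since the parameter descriptions above do not say where the count starts.

```python
def expand_formats(n=10, id_fmt='temp-%d',
                   sentence_fmt='Super simple sentence %d .'):
    """Return (comm_id, sentence_text) pairs for n sample communications.

    Hypothetical helper for illustration only; not part of concrete.util.
    """
    # Assumption: communications are numbered from 0.
    return [(id_fmt % i, sentence_fmt % i) for i in range(n)]


pairs = expand_formats(n=3)
for comm_id, sentence in pairs:
    print(comm_id, '|', sentence)
```

Each format string receives the same number, so the ID and the sentence text of a given communication stay in step.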

concrete.util.simple_comm.add_annotation_level_argparse_argument(parser)

Add an '--annotation-level' argument to an ArgumentParser.

The '--annotation-level' argument specifies the level of concrete annotation to infer from whitespace in text. See create_comm() for details.

Parameters:	parser (argparse.ArgumentParser) – parser to which the argument is added
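For readers unfamiliar with this pattern, here is a rough sketch of what such a helper might look like, using only the standard library. It is a hypothetical stand-in, not the library's code: the function name, the help text, and the level strings 'none', 'section' and 'sentence' are assumptions. Only 'token' is confirmed, by the default shown in the create_comm() signature.

```python
import argparse

# Assumed string values for the four annotation levels; only 'token' is
# confirmed by the documented default of create_comm().
ANNOTATION_LEVELS = ('none', 'section', 'sentence', 'token')


def add_annotation_level_argument(parser):
    """Hypothetical stand-in for add_annotation_level_argparse_argument()."""
    parser.add_argument(
        '--annotation-level',
        choices=ANNOTATION_LEVELS,
        default='token',
        help='level of annotation to infer from whitespace in text',
    )


parser = argparse.ArgumentParser(description='Create a Communication from text')
add_annotation_level_argument(parser)

# argparse stores the value under the attribute name annotation_level.
args = parser.parse_args(['--annotation-level', 'sentence'])
print(args.annotation_level)
```

Keeping the argument definition in one helper lets several command-line scripts share the same option name and choices.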

concrete.util.simple_comm.create_comm(comm_id, text=u'', comm_type=u'article', section_kind=u'passage', metadata_tool=u'concrete-python', metadata_timestamp=None, annotation_level=u'token')

Create a simple, valid Communication from text.

By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections.

annotation_level controls the amount of annotation that is added:

AL_NONE: add no optional annotations (not even sections)

AL_SECTION: add sections but not sentences

AL_SENTENCE: add sentences but not tokens

AL_TOKEN: add all annotations, up to tokens (the default)

Parameters:

comm_id (str)
text (str)
comm_type (str)
section_kind (str)
metadata_tool (str)
metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
annotation_level (str)

Return type:	Communication
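The splitting rule described for create_comm() (double newlines separate sections, single newlines separate sentences, whitespace separates tokens) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: spans and annotate are hypothetical names, the result is a list of plain dicts rather than Concrete objects, the level strings other than 'token' are assumed, and the handling of empty pieces and runs of more than two newlines is a guess.

```python
import re


def spans(text, sep_pattern):
    """Yield (start, end) offsets of the non-empty pieces between separators."""
    pos = 0
    for match in re.finditer(sep_pattern, text):
        if match.start() > pos:
            yield pos, match.start()
        pos = match.end()
    if pos < len(text):
        yield pos, len(text)


def annotate(text, annotation_level='token'):
    """Hypothetical sketch: return sections with offsets, sentences, tokens."""
    sections = []
    if annotation_level == 'none':
        return sections  # no optional annotations, not even sections
    # Assumption: two or more consecutive newlines end a section.
    for sec_start, sec_end in spans(text, r'\n\n+'):
        section = {'start': sec_start, 'end': sec_end, 'sentences': []}
        sections.append(section)
        if annotation_level == 'section':
            continue  # sections only, no sentences
        for rel_start, rel_end in spans(text[sec_start:sec_end], r'\n'):
            # Sentence offsets are relative to the whole text.
            sentence = {'start': sec_start + rel_start,
                        'end': sec_start + rel_end}
            if annotation_level == 'token':
                sen_text = text[sentence['start']:sentence['end']]
                sentence['tokens'] = sen_text.split()  # whitespace tokens
            section['sentences'].append(sentence)
    return sections


sample = 'First sentence .\nSecond one .\n\nNew section .'
for section in annotate(sample):
    print(section)
```

Carrying absolute character offsets through each level mirrors the sec_start/sec_end and sen_start/sen_end arguments that the lower-level create_section() and create_sentence() routines accept.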

concrete.util.simple_comm.create_section(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Section from provided text and metadata.

Lower-level routine (called by create_comm()).

Parameters:

sec_text (str)
sec_start (int)
sec_end (int)
section_kind (str)
aug (_AnalyticUUIDGenerator)
metadata_tool (str)
metadata_timestamp (int) – Time in seconds since the Epoch
annotation_level (str) – See create_comm() for details

Return type:	Section

concrete.util.simple_comm.create_sentence(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Sentence from provided text and metadata.

Lower-level routine (called indirectly by create_comm()).

Parameters:

sen_text (str)
sen_start (int)
sen_end (int)
aug (_AnalyticUUIDGenerator)
metadata_tool (str)
metadata_timestamp (int) – Time in seconds since the Epoch
annotation_level (str) – See create_comm() for details

Return type:	Sentence

concrete.util.simple_comm.create_simple_comm(comm_id, sentence_string=u'Super simple sentence .')

Create a simple (valid) Communication suitable for testing purposes

The Communication will have a single Section containing a single Sentence.

Parameters:

comm_id (str) – Specifies a Communication ID
sentence_string (str) – String to be used for the sentence text. The string will be whitespace-tokenized.

Return type:	Communication