concrete.util.simple_comm module

Create a simple (valid) Communication suitable for testing purposes.

class concrete.util.simple_comm.SimpleCommTempFile(n=10, id_fmt=u'temp-%d', sentence_fmt=u'Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix=u'.concrete')

Bases: object

DEPRECATED. Please use create_comm() instead.

Class representing a temporary file of sample concrete objects. Designed to facilitate testing.

path

str – path to file

communications

Communication[] – List of communications that were written to file

Usage demo:

>>> from concrete.util import CommunicationReader
>>> from concrete.util.simple_comm import SimpleCommTempFile
>>> with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
...     reader = CommunicationReader(f.path)
...     for (orig_comm, comm_path_pair) in zip(f.communications, reader):
...         print(orig_comm.id)
...         print(orig_comm.id == comm_path_pair[0].id)
...         print(f.path == comm_path_pair[1])
temp-0
True
True
temp-1
True
True
temp-2
True
True

Create temp file and write communications.

Parameters:
  • n (int) – number of communications to write
  • id_fmt (str) – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
  • sentence_fmt (str) – format string used to generate sentence text; should contain one instance of %d, which will be replaced by the number of the communication
  • writer_class – CommunicationWriter or CommunicationWriterTGZ
  • suffix (str) – file path suffix (you probably want to choose this to match writer_class)
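The way the two format strings expand can be sketched with plain string formatting. This is an illustration only, not the class's implementation: it assumes numbering starts at 0 for both IDs and sentences (the usage demo above confirms this for IDs only), and the variable names are hypothetical.

```python
# Sketch of how id_fmt and sentence_fmt might expand for n communications.
# Assumption: both strings receive the same zero-based index.
n = 3
id_fmt = 'temp-%d'
sentence_fmt = 'Super simple sentence %d .'

# Each '%d' is replaced by the number of the communication.
expanded = [(id_fmt % i, sentence_fmt % i) for i in range(n)]

for comm_id, sentence in expanded:
    print(comm_id, '|', sentence)
```

A format string without exactly one %d raises a TypeError under the % operator, which is why the parameter descriptions ask for one instance.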
concrete.util.simple_comm.add_annotation_level_argparse_argument(parser)

Add an '--annotation-level' argument to an ArgumentParser.

The '--annotation-level' argument specifies the level of concrete annotation to infer from whitespace in text. See create_comm() for details.

Parameters:
  • parser (argparse.ArgumentParser) – parser to which the argument is added
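A command-line script might use such an option as in the sketch below. To stay runnable without concrete-python installed, it registers a comparable option through a hypothetical stand-in, add_annotation_level_argument(). The level names 'none', 'section' and 'sentence' and the help text are assumptions; only the default 'token' is confirmed by the create_comm() signature. Real code would call add_annotation_level_argparse_argument(parser) instead.

```python
import argparse

# Assumed level names; only 'token' (the default) is confirmed by the docs.
ASSUMED_LEVELS = ('none', 'section', 'sentence', 'token')


def add_annotation_level_argument(parser):
    """Hypothetical stand-in for add_annotation_level_argparse_argument()."""
    parser.add_argument(
        '--annotation-level',
        choices=ASSUMED_LEVELS,
        default='token',
        help='level of annotation to infer from whitespace in the text',
    )


parser = argparse.ArgumentParser(description='Build a Communication from text')
add_annotation_level_argument(parser)

# argparse exposes '--annotation-level' as the attribute 'annotation_level'.
args = parser.parse_args(['--annotation-level', 'sentence'])
print(args.annotation_level)
```

The parsed value can then be passed on as the annotation_level argument of create_comm().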
concrete.util.simple_comm.create_comm(comm_id, text=u'', comm_type=u'article', section_kind=u'passage', metadata_tool=u'concrete-python', metadata_timestamp=None, annotation_level=u'token')

Create a simple, valid Communication from text.

By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections.

annotation_level controls the amount of annotation that is added:

  • AL_NONE: add no optional annotations (not even sections)
  • AL_SECTION: add sections but not sentences
  • AL_SENTENCE: add sentences but not tokens
  • AL_TOKEN: add all annotations, up to tokens (the default)
Parameters:
  • comm_id (str) –
  • text (str) –
  • comm_type (str) –
  • section_kind (str) –
  • metadata_tool (str) –
  • metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
  • annotation_level (str) –
Return type: Communication
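The splitting rule described for create_comm() can be illustrated with the standard library alone. The sketch below is not the library's implementation: split_spans() and outline() are hypothetical helpers, the level names other than 'token' are assumed, and plain dicts stand in for Section and Sentence objects. It splits on double newlines, then single newlines, then whitespace, and stops at the requested level.

```python
import re

# Assumed level names, ordered from least to most annotation.
LEVELS = ['none', 'section', 'sentence', 'token']


def split_spans(text, start, end, sep_pattern):
    """Yield (start, end) offsets of the non-empty pieces of
    text[start:end] that lie between matches of sep_pattern."""
    pos = start
    for m in re.compile(sep_pattern).finditer(text, start, end):
        if m.start() > pos:
            yield (pos, m.start())
        pos = m.end()
    if end > pos:
        yield (pos, end)


def outline(text, annotation_level='token'):
    """Return nested dicts describing sections, sentences and tokens."""
    depth = LEVELS.index(annotation_level)
    sections = []
    if depth < 1:
        return sections  # 'none': not even sections
    for s0, s1 in split_spans(text, 0, len(text), r'\n\n+'):
        section = {'span': (s0, s1), 'sentences': []}
        if depth >= 2:
            for t0, t1 in split_spans(text, s0, s1, r'\n'):
                sentence = {'span': (t0, t1), 'tokens': []}
                if depth >= 3:
                    sentence['tokens'] = [
                        text[a:b]
                        for a, b in split_spans(text, t0, t1, r'\s+')
                    ]
                section['sentences'].append(sentence)
        sections.append(section)
    return sections


text = 'Hello world .\nSecond sentence .\n\nNew section .'
print([s['span'] for s in outline(text, 'section')])  # [(0, 31), (33, 46)]
print(outline(text)[0]['sentences'][0]['tokens'])     # ['Hello', 'world', '.']
```

Keeping character offsets rather than substrings mirrors the sec_start/sec_end and sen_start/sen_end arguments taken by the lower-level routines documented in this module.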

concrete.util.simple_comm.create_section(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Section from provided text and metadata.

Lower-level routine (called by create_comm()).

Parameters:
  • sec_text (str) –
  • sec_start (int) –
  • sec_end (int) –
  • section_kind (str) –
  • aug (_AnalyticUUIDGenerator) –
  • metadata_tool (str) –
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Return type: Section

concrete.util.simple_comm.create_sentence(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Sentence from provided text and metadata.

Lower-level routine (called indirectly by create_comm()).

Parameters:
  • sen_text (str) –
  • sen_start (int) –
  • sen_end (int) –
  • aug (_AnalyticUUIDGenerator) –
  • metadata_tool (str) –
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Return type: Sentence

concrete.util.simple_comm.create_simple_comm(comm_id, sentence_string=u'Super simple sentence .')

Create a simple (valid) Communication suitable for testing purposes.

The Communication will have a single Section containing a single Sentence.

Parameters:
  • comm_id (str) – Specifies a Communication ID
  • sentence_string (str) – String to be used for the sentence text. The string will be whitespace-tokenized.
Return type: Communication