concrete.util.simple_comm module

Create a simple (valid) Communication suitable for testing purposes.

class concrete.util.simple_comm.SimpleCommTempFile(n=10, id_fmt='temp-%d', sentence_fmt='Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix='.concrete')

Bases: object

DEPRECATED. Please use create_comm() instead.

Class representing a temporary file of sample concrete objects. Designed to facilitate testing.

path

path to file

Type: str

communications

List of communications that were written to file

Type: Communication[]

Usage:

from concrete.util import CommunicationReader
from concrete.util.simple_comm import SimpleCommTempFile
with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
    reader = CommunicationReader(f.path)
    for (orig_comm, comm_path_pair) in zip(f.communications, reader):
        print(orig_comm.id)
        print(orig_comm.id == comm_path_pair[0].id)
        print(f.path == comm_path_pair[1])

Create temp file and write communications.

Parameters:
  • n – number of communications to write
  • id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
  • sentence_fmt – format string used to generate the sentence text of each communication; should contain one instance of %d, which will be replaced by the number of the communication
  • writer_class – CommunicationWriter or CommunicationWriterTGZ
  • suffix – file path suffix (you probably want to choose this to match writer_class)
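
The way id_fmt and sentence_fmt expand can be sketched with plain %-formatting. This is a minimal illustration using only the standard library, not the class's actual implementation; the helper name sample_ids_and_sentences is hypothetical, and starting the numbering at 0 is an assumption.

```python
def sample_ids_and_sentences(n=10, id_fmt='temp-%d',
                             sentence_fmt='Super simple sentence %d .'):
    """Return (id, sentence) pairs built by %-formatting each template."""
    # Assumption: communications are numbered from 0; the library's
    # numbering may differ.
    return [(id_fmt % i, sentence_fmt % i) for i in range(n)]


pairs = sample_ids_and_sentences(n=3)
for comm_id, sentence in pairs:
    print(comm_id, '->', sentence)
```

Each template must contain exactly one %d; a template with none, or with more than one, makes the % operator raise TypeError.
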
concrete.util.simple_comm.add_annotation_level_argparse_argument(parser)

Add an ‘--annotation-level’ argument to an ArgumentParser.

The ‘--annotation-level’ argument specifies the level of concrete annotation to infer from whitespace in text. See create_comm() for details.

Parameters: parser (argparse.ArgumentParser) – the parser to add the argument to
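
For readers unfamiliar with this pattern, the following standard-library sketch shows roughly what registering such an argument on a parser looks like. It is not the library's code: the function name add_annotation_level_argument_sketch is hypothetical, and the level strings other than 'token' (the documented default of create_comm()) are assumptions.

```python
import argparse

# Assumed level names; only 'token' is confirmed by create_comm()'s default.
ANNOTATION_LEVELS = ('none', 'section', 'sentence', 'token')


def add_annotation_level_argument_sketch(parser):
    """Register an --annotation-level option on the given parser."""
    parser.add_argument(
        '--annotation-level',
        choices=ANNOTATION_LEVELS,
        default='token',
        help='level of annotation to infer from whitespace in text',
    )


parser = argparse.ArgumentParser(description='annotation-level demo')
add_annotation_level_argument_sketch(parser)

# Parse an explicit value, then fall back to the default.
args = parser.parse_args(['--annotation-level', 'sentence'])
default_args = parser.parse_args([])
print(args.annotation_level)
print(default_args.annotation_level)
```

In a real script the parsed value would be passed on as the annotation_level argument of create_comm().
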
concrete.util.simple_comm.create_comm(comm_id, text='', comm_type='article', section_kind='passage', metadata_tool='concrete-python', metadata_timestamp=None, annotation_level='token')

Create a simple, valid Communication from text.

By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections. Each section will be created with a call to create_section().

annotation_level controls the amount of annotation that is added:

  • AL_NONE: add no optional annotations (not even sections)
  • AL_SECTION: add sections but not sentences
  • AL_SENTENCE: add sentences but not tokens
  • AL_TOKEN: add all annotations, up to tokens (the default)
Parameters:
  • comm_id (str) – Communication id
  • text (str) – Communication text
  • comm_type (str) – Communication type
  • section_kind (str) – Section kind to set on all sections
  • metadata_tool (str) – tool name of analytic that generated this text
  • metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
  • annotation_level (str) – string representing annotation level to add to communication (see above)
Returns:

Communication containing given text and metadata
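
The default splitting described above (double newlines delimit sections, single newlines delimit sentences) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the library's implementation: split_with_offsets and outline are hypothetical names, empty pieces are skipped, and offsets follow Python's half-open slice convention for convenience, which says nothing about how the library stores them.

```python
def split_with_offsets(text, sep):
    """Return (start, end, piece) for each non-empty piece of text split on sep."""
    pieces = []
    start = 0
    for piece in text.split(sep):
        end = start + len(piece)
        if piece:  # skip empty pieces produced by repeated separators
            pieces.append((start, end, piece))
        start = end + len(sep)  # the next piece begins after the separator
    return pieces


def outline(text):
    """Split text into sections, then each section into sentences."""
    result = []
    for sec_start, sec_end, sec_text in split_with_offsets(text, '\n\n'):
        # Shift sentence offsets so they are relative to the whole text.
        sentences = [
            (sec_start + s, sec_start + e, sen)
            for s, e, sen in split_with_offsets(sec_text, '\n')
        ]
        result.append({'span': (sec_start, sec_end), 'sentences': sentences})
    return result


text = 'First sentence .\nSecond sentence .\n\nThird sentence .'
sections = outline(text)
for section in sections:
    print(section['span'], [sen for _, _, sen in section['sentences']])
```

Every (start, end) pair in the result satisfies text[start:end] == piece, which is a convenient property to check when testing code that consumes section and sentence spans.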

concrete.util.simple_comm.create_section(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Section from provided text and metadata. Section text will be split into sentence texts by newlines and each sentence will be created with a call to create_sentence().

Lower-level routine (called by create_comm()).

Parameters:
  • sec_text (str) – text to create section from
  • sec_start (int) – starting position of section in Communication text (inclusive)
  • sec_end (int) – ending position of section in Communication text (inclusive)
  • section_kind (str) – value to set the Section.kind field to
  • aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this section
  • metadata_tool (str) – tool name of the analytic that generated this section
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Returns:

Concrete Section containing given text and metadata

concrete.util.simple_comm.create_sentence(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Sentence from provided text and metadata.

Lower-level routine (called indirectly by create_comm()).

Parameters:
  • sen_text (str) – text to create sentence from
  • sen_start (int) – starting position of sentence in Communication text (inclusive)
  • sen_end (int) – ending position of sentence in Communication text (inclusive)
  • aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this sentence
  • metadata_tool (str) – tool name of the analytic that generated this sentence
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Returns:

Concrete Sentence containing given text and metadata
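
At the token annotation level, tokens are inferred from whitespace. A minimal sketch of that idea follows, assuming tokens are maximal runs of non-whitespace characters; the helper name whitespace_tokens is hypothetical, the half-open offsets are the sketch's own convention, and none of this is the library's implementation.

```python
import re


def whitespace_tokens(sen_text, sen_start=0):
    """Return (start, end, token) triples, offset by the sentence start."""
    return [
        (sen_start + m.start(), sen_start + m.end(), m.group())
        for m in re.finditer(r'\S+', sen_text)
    ]


# A sentence that begins at offset 17 of the enclosing text.
tokens = whitespace_tokens('Super simple sentence .', sen_start=17)
for start, end, token in tokens:
    print(start, end, token)
```

Because splitting is driven purely by whitespace, punctuation becomes its own token only when the input separates it with a space, as the default sentence text 'Super simple sentence .' does.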

concrete.util.simple_comm.create_simple_comm(comm_id, sentence_string='Super simple sentence .')

Create a simple (valid) Communication suitable for testing purposes.

The Communication will have a single Section containing a single Sentence.

Parameters:
  • comm_id (str) – Communication id
  • sentence_string (str) – Communication text
Returns:

Communication containing given text and having the given id