Concrete-python Documentation

Concrete-python is the Python interface to Concrete, a natural language processing data format and set of service protocols that work across different operating systems and programming languages via Apache Thrift. Concrete-python contains generated Python classes, utility classes and functions, and scripts. It does not contain the Thrift schema for Concrete, which can be found in the Concrete GitHub repository.

Tutorial

For information about installing and using concrete-python, please see the online documentation.

License

Copyright 2012-2017 Johns Hopkins University HLTCOE. All rights reserved. This software is released under the 2-clause BSD license. Please see LICENSE for more information.

Requirements

concrete-python is tested on Python 2.7 and 3.5 (it does not run on Python 2.6 and may run on other Python 3.x versions) and requires the Thrift Python library, among other Python libraries. These are installed automatically by setup.py or pip. The Thrift compiler is not required.

Note: The accelerated protocol offers a (de)serialization speedup of 10x or more; if you would like to use it, ensure a C++ compiler is available on your system before installing concrete-python. (If a compiler is not available, concrete-python will fall back to the unaccelerated protocol automatically.) If you are on Linux, a suitable C++ compiler will be listed as g++ or gcc-c++ in your package manager.
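
To check whether the accelerated extension was actually built, you can try importing it. The sketch below assumes the compiled module is importable as thrift.protocol.fastbinary, which is where recent Thrift Python releases place it. It only reports availability and does not change which protocol concrete-python uses.

```python
import importlib


def thrift_accelerated_available():
    """Return True if Thrift's compiled fastbinary extension can be imported.

    Assumption: the C extension lives at thrift.protocol.fastbinary.
    If Thrift itself is not installed, this also returns False.
    """
    try:
        importlib.import_module("thrift.protocol.fastbinary")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("accelerated protocol available:", thrift_accelerated_available())
```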

Installation

You can install Concrete using the pip package manager:

pip install concrete

or by cloning the repository and running setup.py:

git clone https://github.com/hltcoe/concrete-python.git
cd concrete-python
python setup.py install

Basic usage

Here and in the following sections we make use of an example Concrete Communication file included in the concrete-python source distribution. The Communication type represents an article, book, post, Tweet, or any other kind of document that we might want to store and analyze. Copy it from tests/testdata/serif_dog-bites-man.concrete if you have the concrete-python source distribution or download it separately here: serif_dog-bites-man.concrete.

First we use the concrete-inspect.py tool (explained in more detail in the following section) to inspect some of the contents of the Communication:

concrete-inspect.py --text serif_dog-bites-man.concrete

This command prints the text of the Communication to the console. In our case the text is a short article formatted in SGML:

<DOC id="dog-bites-man" type="other">
<HEADLINE>
Dog Bites Man
</HEADLINE>
<TEXT>
<P>
John Smith, manager of ACMÉ INC, was bit by a dog on March 10th, 2013.
</P>
<P>
He died!
</P>
<P>
John's daughter Mary expressed sorrow.
</P>
</TEXT>
</DOC>

Now run the following command to inspect some of the annotations stored in that Communication:

concrete-inspect.py --ner --pos --dependency serif_dog-bites-man.concrete

This command shows a tokenization, part-of-speech tagging, named entity tagging, and dependency parse in a CoNLL-like columnar format:

INDEX       TOKEN   POS     NER     HEAD    DEPREL
-----       -----   ---     ---     ----    ------
1   John    NNP     PER     2       compound
2   Smith   NNP     PER     10      nsubjpass
3   ,       ,
4   manager NN              2       appos
5   of      IN              7       case
6   ACMÉ    NNP     ORG     7       compound
7   INC     NNP     ORG     4       nmod
8   ,       ,
9   was     VBD             10      auxpass
10  bit     NN              0       ROOT
11  by      IN              13      case
12  a       DT              13      det
13  dog     NN              10      nmod
14  on      IN              15      case
15  March   DATE-NNP                13      nmod
16  10th    JJ              15      amod
17  ,       ,
18  2013    CD              13      amod
19  .       .

1   He      PRP             2       nsubj
2   died    VBD             0       ROOT
3   !       .

1   John    NNP     PER     3       nmod:poss
2   's      POS             1       case
3   daughter        NN              5       dep
4   Mary    NNP     PER     5       nsubj
5   expressed       VBD             0       ROOT
6   sorrow  NN              5       dobj
7   .       .

Reading Concrete

There are even more annotations stored in this Communication, but for now we move on to demonstrate handling of the Communication in Python. The example file contains a single Communication, but many (if not most) files contain several. The same code can be used to read Communications in a regular file, tar archive, or zip archive:

from concrete.util import CommunicationReader

for (comm, filename) in CommunicationReader('serif_dog-bites-man.concrete'):
    print(comm.id)
    print()
    print(comm.text)

This loop prints the unique ID and text (the same text we saw before) of our one Communication:

tests/testdata/serif_dog-bites-man.xml

<DOC id="dog-bites-man" type="other">
<HEADLINE>
Dog Bites Man
</HEADLINE>
<TEXT>
<P>
John Smith, manager of ACMÉ INC, was bit by a dog on March 10th, 2013.
</P>
<P>
He died!
</P>
<P>
John's daughter Mary expressed sorrow.
</P>
</TEXT>
</DOC>

In addition to the general-purpose CommunicationReader there is a convenience function for reading a single Communication from a regular file:

from concrete.util import read_communication_from_file

comm = read_communication_from_file('serif_dog-bites-man.concrete')

Communications are broken into Sections, which are in turn broken into Sentences, which are in turn broken into Tokens (and that’s only scratching the surface). To traverse this decomposition:

from concrete.util import lun, get_tokens

for section in lun(comm.sectionList):
    print('* section')
    for sentence in lun(section.sentenceList):
        print('  + sentence')
        for token in get_tokens(sentence.tokenization):
            print('    - ' + token.text)

The output is:

* section
* section
  + sentence
    - John
    - Smith
    - ,
    - manager
    - of
    - ACMÉ
    - INC
    - ,
    - was
    - bit
    - by
    - a
    - dog
    - on
    - March
    - 10th
    - ,
    - 2013
    - .
* section
  + sentence
    - He
    - died
    - !
* section
  + sentence
    - John
    - 's
    - daughter
    - Mary
    - expressed
    - sorrow
    - .

Here we used get_tokens, which abstracts the process of extracting a sequence of Tokens from a Tokenization, and lun, which returns its argument or (if its argument is None) an empty list and stands for “list un-none”. Many fields in Concrete are optional, including Communication.sectionList and Section.sentenceList; checking for None quickly becomes tedious.
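
The behavior of lun described above can be approximated in a few lines. This is an illustrative re-implementation for understanding only, and the name lun_sketch is hypothetical. In real code, import lun from concrete.util.

```python
def lun_sketch(arg):
    """Return arg unchanged, or an empty list if arg is None ("list un-none")."""
    return [] if arg is None else arg


# A missing optional list behaves like an empty one, so loops need no None check.
section_list = None
for section in lun_sketch(section_list):
    print("never reached")

print(lun_sketch(["a", "b"]))  # ['a', 'b']
```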

In this Communication the tokens have been annotated with part-of-speech tags, as we saw previously using concrete-inspect.py. We can print them with the following code:

from concrete.util import get_tagged_tokens

for section in lun(comm.sectionList):
    print('* section')
    for sentence in lun(section.sentenceList):
        print('  + sentence')
        for token_tag in get_tagged_tokens(sentence.tokenization, 'POS'):
            print('    - ' + token_tag.tag)

The output is:

* section
* section
  + sentence
    - NNP
    - NNP
    - ,
    - NN
    - IN
    - NNP
    - NNP
    - ,
    - VBD
    - NN
    - IN
    - DT
    - NN
    - IN
    - DATE-NNP
    - JJ
    - ,
    - CD
    - .
* section
  + sentence
    - PRP
    - VBD
    - .
* section
  + sentence
    - NNP
    - POS
    - NN
    - NNP
    - VBD
    - NN
    - .

Writing Concrete

We can add a new part-of-speech tagging to the Communication as well. Let’s add a simplified version of the current tagging:

from concrete.util import AnalyticUUIDGeneratorFactory, now_timestamp
from concrete import TokenTagging, TaggedToken, AnnotationMetadata

augf = AnalyticUUIDGeneratorFactory(comm)
aug = augf.create()

for section in lun(comm.sectionList):
    for sentence in lun(section.sentenceList):
        sentence.tokenization.tokenTaggingList.append(TokenTagging(
            uuid=next(aug),
            metadata=AnnotationMetadata(
                tool='Simple POS',
                timestamp=now_timestamp(),
                kBest=1
            ),
            taggingType='POS',
            taggedTokenList=[
                TaggedToken(
                    tokenIndex=original.tokenIndex,
                    tag=original.tag.split('-')[-1][:2],
                )
                for original
                in get_tagged_tokens(sentence.tokenization, 'POS')
            ]
        ))

Here we used AnalyticUUIDGeneratorFactory, which creates a generator (aug) that supplies a fresh UUID for each annotation we add to this Communication. We also used now_timestamp, which returns a Concrete timestamp representing the current time. But how do we now know which tagging is ours? Each annotation’s metadata contains a tool name, and we can use it to distinguish between competing annotations:

from concrete.util import get_tagged_tokens

for section in lun(comm.sectionList):
    print('* section')
    for sentence in lun(section.sentenceList):
        print('  + sentence')
        token_tag_pairs = zip(
            get_tagged_tokens(sentence.tokenization, 'POS', tool='Serif: part-of-speech'),
            get_tagged_tokens(sentence.tokenization, 'POS', tool='Simple POS')
        )
        for (old_tag, new_tag) in token_tag_pairs:
            print('    - ' + old_tag.tag + ' -> ' + new_tag.tag)

The output shows our new part-of-speech tagging has a smaller, simpler set of possible values:

* section
* section
  + sentence
    - NNP -> NN
    - NNP -> NN
    - , -> ,
    - NN -> NN
    - IN -> IN
    - NNP -> NN
    - NNP -> NN
    - , -> ,
    - VBD -> VB
    - NN -> NN
    - IN -> IN
    - DT -> DT
    - NN -> NN
    - IN -> IN
    - DATE-NNP -> NN
    - JJ -> JJ
    - , -> ,
    - CD -> CD
    - . -> .
* section
  + sentence
    - PRP -> PR
    - VBD -> VB
    - . -> .
* section
  + sentence
    - NNP -> NN
    - POS -> PO
    - NN -> NN
    - NNP -> NN
    - VBD -> VB
    - NN -> NN
    - . -> .
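
The simplified tags come from the expression original.tag.split('-')[-1][:2] in the tagging code. It drops everything up to the last hyphen (so DATE-NNP becomes NNP) and then keeps at most the first two characters. The standalone function below isolates that rule so it can be checked against the table. The name simplify_tag is introduced here for illustration only.

```python
def simplify_tag(tag):
    """Apply the same rule as the "Simple POS" tagging: keep the part after
    the last hyphen, then truncate it to at most two characters."""
    return tag.split('-')[-1][:2]


for original in ['NNP', 'DATE-NNP', 'VBD', 'PRP', 'POS', ',', '.']:
    print(original, '->', simplify_tag(original))
```

One caveat of this rule concerns Penn Treebank bracket tags such as -LRB-, which end with a hyphen. The rule would map them to an empty string. The example sentences contain no such tokens.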

Finally, let’s write our newly annotated Communication back to disk:

from concrete.util import CommunicationWriter

with CommunicationWriter('serif_dog-bites-man.concrete') as writer:
    writer.write(comm)

concrete-inspect.py

Use concrete-inspect.py to quickly explore the contents of a Communication from the command line. concrete-inspect.py and the other scripts are installed on your PATH along with the concrete-python library.

--id

Run the following command to print the unique ID of our modified example Communication:

concrete-inspect.py --id serif_dog-bites-man.concrete

Output:

tests/testdata/serif_dog-bites-man.xml

--metadata

Use --metadata to print the stored annotations along with their tool names:

concrete-inspect.py --metadata serif_dog-bites-man.concrete

Output:

Communication:  concrete_serif v3.10.1pre

  Tokenization:  Serif: tokens

    Dependency Parse:  Stanford

    Parse:  Serif: parse

    TokenTagging:  Serif: names
    TokenTagging:  Serif: part-of-speech
    TokenTagging:  Simple POS

  EntityMentionSet #0:  Serif: names
  EntityMentionSet #1:  Serif: values
  EntityMentionSet #2:  Serif: mentions

  EntitySet #0:  Serif: doc-entities
  EntitySet #1:  Serif: doc-values

  SituationMentionSet #0:  Serif: relations
  SituationMentionSet #1:  Serif: events

  SituationSet #0:  Serif: relations
  SituationSet #1:  Serif: events

  CommunicationTagging:  lda
  CommunicationTagging:  urgency

--sections

Use --sections to print the text of the Communication, broken out by section:

concrete-inspect.py --sections serif_dog-bites-man.concrete

Output:

Section 0 (0ab68635-c83d-4b02-b8c3-288626968e05), from 81 to 82:



Section 1 (54902d75-1841-4d8d-b4c5-390d4ef1a47a), from 85 to 162:

John Smith, manager of ACMÉ INC, was bit by a dog on March 10th, 2013.
</P>


Section 2 (7ec8b7d9-6be0-4c62-af57-3c6c48bad711), from 165 to 180:

He died!
</P>


Section 3 (68da91a1-5beb-4129-943d-170c40c7d0f7), from 183 to 228:

John's daughter Mary expressed sorrow.
</P>

--entities

Use --entities to print the named entities detected in the Communication:

concrete-inspect.py --entities serif_dog-bites-man.concrete

Output:

Entity Set 0 (Serif: doc-entities):
  Entity 0-0:
      EntityMention 0-0-0:
          tokens:     John Smith
          text:       John Smith
          entityType: PER
          phraseType: PhraseType.NAME
      EntityMention 0-0-1:
          tokens:     John Smith , manager of ACMÉ INC ,
          text:       John Smith, manager of ACMÉ INC,
          entityType: PER
          phraseType: PhraseType.APPOSITIVE
          child EntityMention #0:
              tokens:     John Smith
              text:       John Smith
              entityType: PER
              phraseType: PhraseType.NAME
          child EntityMention #1:
              tokens:     manager of ACMÉ INC
              text:       manager of ACMÉ INC
              entityType: PER
              phraseType: PhraseType.COMMON_NOUN
      EntityMention 0-0-2:
          tokens:     manager of ACMÉ INC
          text:       manager of ACMÉ INC
          entityType: PER
          phraseType: PhraseType.COMMON_NOUN
      EntityMention 0-0-3:
          tokens:     He
          text:       He
          entityType: PER
          phraseType: PhraseType.PRONOUN
      EntityMention 0-0-4:
          tokens:     John
          text:       John
          entityType: PER.Individual
          phraseType: PhraseType.NAME

  Entity 0-1:
      EntityMention 0-1-0:
          tokens:     ACMÉ INC
          text:       ACMÉ INC
          entityType: ORG
          phraseType: PhraseType.NAME

  Entity 0-2:
      EntityMention 0-2-0:
          tokens:     John 's daughter Mary
          text:       John's daughter Mary
          entityType: PER.Individual
          phraseType: PhraseType.NAME
          child EntityMention #0:
              tokens:     Mary
              text:       Mary
              entityType: PER
              phraseType: PhraseType.OTHER
      EntityMention 0-2-1:
          tokens:     daughter
          text:       daughter
          entityType: PER
          phraseType: PhraseType.COMMON_NOUN


Entity Set 1 (Serif: doc-values):
  Entity 1-0:
      EntityMention 1-0-0:
          tokens:     March 10th , 2013
          text:       March 10th, 2013
          entityType: TIMEX2.TIME
          phraseType: PhraseType.OTHER

--mentions

Use --mentions to show the named entity mentions in the Communication, annotated on the text:

concrete-inspect.py --mentions serif_dog-bites-man.concrete

Output:

<ENTITY ID=0><ENTITY ID=0>John Smith</ENTITY> , <ENTITY ID=0>manager of <ENTITY ID=1>ACMÉ INC</ENTITY></ENTITY> ,</ENTITY> was bit by a dog on <ENTITY ID=3>March 10th , 2013</ENTITY> .

<ENTITY ID=0>He</ENTITY> died !

<ENTITY ID=2><ENTITY ID=0>John</ENTITY> 's <ENTITY ID=2>daughter</ENTITY> Mary</ENTITY> expressed sorrow .

--situations

Use --situations to show the situations detected in the Communication:

concrete-inspect.py --situations serif_dog-bites-man.concrete

Output:

Situation Set 0 (Serif: relations):

Situation Set 1 (Serif: events):
  Situation 1-0:
      situationType:    Life.Die

--treebank

Use --treebank to show constituency parse trees of the sentences in the Communication:

concrete-inspect.py --treebank serif_dog-bites-man.concrete

Output:

(S (NP (NPP (NNP john)
            (NNP smith))
       (, ,)
       (NP (NPA (NN manager))
           (PP (IN of)
               (NPP (NNP acme)
                    (NNP inc))))
       (, ,))
   (VP (VBD was)
       (NP (NPA (NN bit))
           (PP (IN by)
               (NP (NPA (DT a)
                        (NN dog))
                   (PP (IN on)
                       (NP (DATE (DATE-NNP march)
                                 (JJ 10th))
                           (, ,)
                           (NPA (CD 2013))))))))
   (. .))


(S (NPA (PRP he))
   (VP (VBD died))
   (. !))


(S (NPA (NPPOS (NPP (NNP john))
               (POS 's))
        (NN daughter)
        (NPP (NNP mary)))
   (VP (VBD expressed)
       (NPA (NN sorrow)))
   (. .))

Other options

Use --ner, --pos, --lemmas, and --dependency (together or independently) to show respective token-level information in a CoNLL-like format, and use --text to print the text of the Communication, as described in a previous section.

Run concrete-inspect.py --help to show a detailed help message explaining the options discussed above and others. All concrete-python scripts have such help messages.

create-comm.py

Use create-comm.py to generate a simple Communication from a text file. For example, create a file called history-of-the-world.txt containing the following text:

The dog ran .
The cat jumped .

The dolphin teleported .

Then run the following command to convert it to a Concrete Communication, creating Sections, Sentences, and Tokens based on whitespace:

create-comm.py --annotation-level token history-of-the-world.txt history-of-the-world.concrete

Use concrete-inspect.py as shown previously to verify the structure of the Communication:

concrete-inspect.py --sections history-of-the-world.concrete

Output:

Section 0 (a188dcdd-1ade-be5d-41c4-fd4d81f71685), from 0 to 30:
The dog ran .
The cat jumped .

Section 1 (a188dcdd-1ade-be5d-41c4-fd4d81f7168a), from 32 to 57:
The dolphin teleported .
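
The character offsets in this output are consistent with splitting the file on blank lines. The sketch below is a hypothetical approximation of that segmentation. segment_sections is not part of concrete-python, and create-comm.py may handle edge cases such as runs of several blank lines differently. The sketch also assumes history-of-the-world.txt ends with a single newline.

```python
def segment_sections(text):
    """Split text into sections at blank lines (two consecutive newlines).

    Returns (start, end, sentences) tuples, where each sentence is one
    non-empty line split into whitespace-separated tokens.
    """
    sections = []
    start = 0
    for chunk in text.split('\n\n'):
        end = start + len(chunk)
        sentences = [line.split() for line in chunk.splitlines() if line.strip()]
        sections.append((start, end, sentences))
        start = end + 2  # skip the two newline characters of the separator
    return sections


text = 'The dog ran .\nThe cat jumped .\n\nThe dolphin teleported .\n'
for start, end, sentences in segment_sections(text):
    print('section from', start, 'to', end, '->', sentences)
```

Run on the example file's text, this yields sections spanning 0 to 30 and 32 to 57, the same offsets concrete-inspect.py reports above.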

Other scripts

concrete-python provides a number of other scripts, including but not limited to:

concrete2json.py
reads in a Concrete Communication and prints a JSON version of the Communication to stdout. The JSON is “pretty printed” with indentation and whitespace, which makes the JSON easier to read and to use for diffs.
create-comm-tarball.py
like create-comm.py but for multiple files: reads in a tar.gz archive of text files, parses them into sections and sentences based on whitespace, and writes them back out as Concrete Communications in another tar.gz archive.
fetch-client.py
connects to a FetchCommunicationService, retrieves one or more Communications (as specified on the command line), and writes them to disk.
fetch-server.py
implements FetchCommunicationService, serving Communications to clients from a file or directory of Communications on disk.
search-client.py
connects to a SearchService, reading queries from the console and printing out results as Communication ids in a loop.
validate-communication.py
reads in a Concrete Communication file and prints out information about any invalid fields. This script is a command-line wrapper around the functionality in the concrete.validate library.

Use the --help flag for details about the scripts’ command line arguments.
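
To see why the pretty-printed JSON produced by concrete2json.py is easier to diff, compare two small documents that differ in one field. The sketch below uses only the standard library (json and difflib), with plain dicts standing in for Communications. It does not call concrete2json.py itself.

```python
import difflib
import json


def changed_lines(old, new):
    """Pretty-print two JSON-compatible objects and return the lines a
    unified diff marks as removed or added (excluding the file headers)."""
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    diff = difflib.unified_diff(old_lines, new_lines, lineterm='')
    return [line for line in diff
            if line.startswith(('+', '-'))
            and not line.startswith(('+++', '---'))]


before = {'id': 'dog-bites-man', 'type': 'news', 'text': 'He died!'}
after = dict(before, type='story')
print(changed_lines(before, after))
```

Because each field sits on its own line, only the changed field shows up in the diff. With compact json.dumps(before) output, the whole document sits on one line, so any change marks the entire line as different.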

Validating Concrete Communications

The Python version of the Thrift library does not perform any validation of Thrift objects. You should use the validate_communication() function after reading and before writing a Concrete Communication:

from concrete.util import read_communication_from_file
from concrete.validate import validate_communication

comm = read_communication_from_file('tests/testdata/serif_dog-bites-man.concrete')

# Returns True|False, logs details using Python stdlib 'logging' module
validate_communication(comm)

Thrift fields have three levels of requiredness:

  • explicitly labeled as required
  • explicitly labeled as optional
  • no requiredness label given (“default required”)

Other Concrete tools will raise an exception if a required field is missing on deserialization or serialization, and will raise an exception if a “default required” field is missing on serialization. By default, concrete-python does not perform any validation of Thrift objects on serialization or deserialization. The Python Thrift classes do provide shallow validate() methods, but they only check for explicitly required fields (not “default required” fields) and do not validate nested objects.

The validate_communication() function recursively checks a Communication object for required fields, plus additional checks for UUID mismatches.
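
The difference between a shallow check and a recursive one can be illustrated without Concrete at all. In the toy sketch below, the classes and their required-field lists are hypothetical stand-ins. They are not the real Concrete schema or the implementation of validate_communication(). The point is only that a nested object with a missing field passes a shallow check but fails a recursive one.

```python
class Node(object):
    """Toy stand-in for a Thrift struct (not a real Concrete class)."""
    REQUIRED = ()  # names of fields that must not be None

    def __init__(self, **fields):
        self.__dict__.update(fields)


class ToySection(Node):
    REQUIRED = ('uuid', 'kind')


class ToyCommunication(Node):
    REQUIRED = ('id', 'uuid')


def shallow_ok(obj):
    """Check only obj's own required fields, like a shallow validate()."""
    return all(getattr(obj, name, None) is not None for name in obj.REQUIRED)


def deep_ok(obj):
    """Check obj, then recurse into nested Node values and lists of Nodes."""
    if not shallow_ok(obj):
        return False
    for value in vars(obj).values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, Node) and not deep_ok(child):
                return False
    return True


comm = ToyCommunication(id='doc-1', uuid='u-1',
                        sectionList=[ToySection(uuid='u-2', kind=None)])
print(shallow_ok(comm))  # True: the top-level object's own fields are set
print(deep_ok(comm))     # False: the nested section is missing "kind"
```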

API Reference

Python modules and scripts for working with Concrete, an HLT data specification defined using Thrift.

High-level interface

concrete.inspect module

Functions used by concrete-inspect.py to print data in a Communication.

The function implementations provide useful examples of how to interact with many different Concrete data structures.

concrete.inspect.penn_treebank_for_parse(parse)

Return a Penn-Treebank style representation of a Parse object

Parameters: parse (Parse) –
Returns: A string containing a Penn Treebank style parse tree representation
Return type: str
concrete.inspect.print_communication_taggings_for_communication(comm, tool=None, communication_tagging_filter=None)

Print information for CommunicationTagging objects

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print information for CommunicationTagging objects with a matching metadata.tool field
  • communication_tagging_filter (func) – If not None, print information for only those CommunicationTagging objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_conll_style_tags_for_communication(comm, char_offsets=False, dependency=False, lemmas=False, ner=False, pos=False, dependency_tool=None, dependency_parse_filter=None, lemmas_tool=None, lemmas_filter=None, ner_tool=None, ner_filter=None, pos_tool=None, pos_filter=None, other_tags=None)

Print ‘CoNLL-style’ tags for the tokens in a Communication. If a column is requested (for example, ner is set to True) but there is no such annotation in the communication, that column is not printed (the header is not printed either). If there is more than one such annotation in the communication, one column is printed for each annotation. In the event of differing numbers of annotations per Tokenization, all annotations are printed, but it is not guaranteed that the columns of two different tokenizations correspond to one another.

Parameters:
  • comm (Communication) –
  • char_offsets (bool) – Flag for printing token text specified by a Token’s (optional) TextSpan
  • dependency (bool) – Flag for printing dependency parse HEAD tags
  • dependency_tool (str) – Deprecated. If not None, only print information for DependencyParse objects if they have a matching metadata.tool field
  • dependency_parse_filter (func) – If not None, print information for only those DependencyParse objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
  • lemmas (bool) – Flag for printing lemma tags (TokenTagging objects of type LEMMA)
  • lemmas_tool (str) – Deprecated. If not None, only print information for TokenTagging objects of type LEMMA if they have a matching metadata.tool field
  • lemmas_filter (func) – If not None, print information for only those LEMMA taggings that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
  • ner (bool) – Flag for printing Named Entity Recognition tags (TokenTagging objects of type NER)
  • ner_tool (str) – Deprecated. If not None, only print information for TokenTagging objects of type NER if they have a matching metadata.tool field
  • ner_filter (func) – If not None, print information for only those NER taggings that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
  • pos (bool) – Flag for printing Part-of-Speech tags (TokenTagging objects of type POS)
  • pos_tool (str) – Deprecated. If not None, only print information for TokenTagging objects of type POS if they have a matching metadata.tool field
  • pos_filter (func) – If not None, print information for only those POS taggings that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
  • other_tags (dict) – Map of other tagging types to print (as keys) to annotation filters, or None. If the value (annotation filter) of a given tagging type is not None, print information for only those taggings that pass the filter (should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered)).
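
Each of the *_filter parameters above takes a list of annotations and returns a list of annotations, possibly filtered and re-ordered. The sketch below shows a minimal filter of that kind. It is built by keep_tool, a hypothetical helper that keeps only annotations whose metadata.tool matches a given name. SimpleNamespace objects stand in for real annotations.

```python
from types import SimpleNamespace


def keep_tool(tool_name):
    """Return an annotation filter that keeps annotations whose
    metadata.tool equals tool_name, preserving their original order."""
    def annotation_filter(annotations):
        return [a for a in annotations if a.metadata.tool == tool_name]
    return annotation_filter


def fake_tagging(tool):
    # Stand-in for a TokenTagging: only the metadata.tool field is modeled.
    return SimpleNamespace(metadata=SimpleNamespace(tool=tool))


taggings = [fake_tagging('Serif: part-of-speech'), fake_tagging('Simple POS')]
simple_only = keep_tool('Simple POS')
print([t.metadata.tool for t in simple_only(taggings)])  # ['Simple POS']
```

With a real Communication, a filter like this could be passed as pos_filter=keep_tool('Simple POS') together with pos=True. That would print only the tagging added in the tutorial.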
concrete.inspect.print_entities(comm, tool=None, entity_set_filter=None)

Print information for Entity objects and their associated EntityMention objects

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print information for EntitySet objects with a matching metadata.tool field
  • entity_set_filter (func) – If not None, print information for only those EntitySet objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_id_for_communication(comm, tool=None, communication_filter=None)

Print ID field of Communication

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print ID of Communication objects with a matching metadata.tool field
  • communication_filter (func) – If not None, print information for only those Communication objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_metadata(comm, tool=None, annotation_filter=None)

Print metadata tools used to annotate Communication

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print AnnotationMetadata information for objects with a matching metadata.tool field
  • annotation_filter (func) – If not None, print information for only those objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_penn_treebank_for_communication(comm, tool=None, parse_filter=None)

Print Penn-Treebank parse trees for all Tokenization objects

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print information for Tokenization objects with a matching metadata.tool field
  • parse_filter (func) – If not None, print information for only those Parse objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_sections(comm, tool=None, communication_filter=None)

Print information for all Section objects, according to their spans.

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print information for Section objects with a matching metadata.tool field
  • communication_filter (func) – If not None, print information for only those Communication objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_situation_mentions(comm, tool=None, situation_mention_set_filter=None)

Print information for all SituationMention objects (some of which may not have a Situation)

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print information for SituationMention objects with a matching metadata.tool field
  • situation_mention_set_filter (func) – If not None, print information for only those SituationMentionSet objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_situations(comm, tool=None, situation_set_filter=None)

Print information for all Situation objects and their associated SituationMention objects

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print information for Situation objects with a matching metadata.tool field
  • situation_set_filter (func) – If not None, print information for only those SituationSet objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_text_for_communication(comm, tool=None, communication_filter=None)

Print text field of Communication

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print text field of Communication objects with a matching metadata.tool field
  • communication_filter (func) – If not None, print information for only those Communication objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_tokens_for_communication(comm, tool=None, tokenization_filter=None)

Print token text for a Communication

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print token text for Communication objects with a matching metadata.tool field
  • tokenization_filter (func) – If not None, print information for only those Tokenization objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_tokens_with_entityMentions(comm, tool=None, entity_mention_set_filter=None)

Print information for Token objects that are part of an EntityMention

Parameters:
  • comm (Communication) –
  • tool (str) – Deprecated. If not None, only print information for tokens that are associated with an EntityMention that is part of an EntityMentionSet with a matching metadata.tool field
  • entity_mention_set_filter (func) – If not None, print information for only those EntityMentionSet objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).

concrete.util package

Utility code for working with Concrete

concrete.util.access module
class concrete.util.access.CommunicationContainerFetchHandler(communication_container)

Bases: object

FetchCommunicationService implementation using Communication containers

Implements the FetchCommunicationService interface, retrieving Communications from a dict-like communication_container object that maps Communication ID strings to Communications. The communication_container could be an actual dict, or a container such as:

Usage:

from concrete.util.access import CommunicationContainerFetchHandler
from concrete.util.access_wrapper import FetchCommunicationServiceWrapper

handler = CommunicationContainerFetchHandler(comm_container)
fetch_service = FetchCommunicationServiceWrapper(handler)
fetch_service.serve(host, port)

Parameters: communication_container – Dict-like object that maps Communication IDs to Communications
about()
alive()
fetch(fetch_request)
getCommunicationCount()
getCommunicationIDs(offset, count)
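
A client that needs every ID can page through a handler with getCommunicationCount() and getCommunicationIDs(offset, count). The sketch below uses InMemoryIDSource, a hypothetical in-memory stand-in that models only those two methods over a dict. It assumes IDs are returned in a stable order, which this reference does not specify for the real handler.

```python
class InMemoryIDSource(object):
    """Hypothetical stand-in exposing only the two paging methods."""

    def __init__(self, container):
        self._ids = sorted(container)  # stable order assumed for paging

    def getCommunicationCount(self):
        return len(self._ids)

    def getCommunicationIDs(self, offset, count):
        return self._ids[offset:offset + count]


def all_ids(source, page_size=2):
    """Collect every ID by requesting fixed-size pages until the count is reached."""
    ids = []
    total = source.getCommunicationCount()
    while len(ids) < total:
        page = source.getCommunicationIDs(len(ids), page_size)
        if not page:  # guard against a source that returns fewer IDs than promised
            break
        ids.extend(page)
    return ids


source = InMemoryIDSource({'c3': None, 'c1': None, 'c2': None})
print(all_ids(source))  # ['c1', 'c2', 'c3']
```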
class concrete.util.access.DirectoryBackedStoreHandler(store_path)

Bases: object

Simple StoreCommunicationService implementation using a directory

Implements the StoreCommunicationService interface, storing Communications in a directory.

Parameters:store_path – Path where Communications should be stored
about()
alive()
store(communication)

Save Communication to a directory

Stored Communication files will be named [COMMUNICATION_ID].comm. If a file with that name already exists, it will be overwritten.

class concrete.util.access.RedisHashBackedStoreHandler(redis_db, key)

Bases: object

Simple StoreCommunicationService implementation using a Redis hash.

Implements the StoreCommunicationService interface, storing Communications in a Redis hash, indexed by id.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – key of the hash in the Redis database
about()
alive()
store(communication)

Save Communication to a Redis hash, using the Communication id as a key.

Parameters:communication (Communication) – communication to store
class concrete.util.access.RelayFetchHandler(host, port)

Bases: object

Implements a ‘relay’ to another FetchCommunicationService server.

A FetchCommunicationService that acts as a relay to a second FetchCommunicationService, where the second service is using the TSocket transport and TCompactProtocol protocol.

This class was designed for the use case where you have Thrift JavaScript code that needs to communicate with a FetchCommunicationService server, but the server does not support the same Thrift serialization protocol as the JavaScript client.

The de facto standard for Concrete services is to use the TCompactProtocol serialization protocol over a TSocket connection. But as of Thrift 0.10.0, the Thrift JavaScript libraries only support using TJSONProtocol over HTTP.

The RelayFetchHandler class is intended to be used as server-side code by a web application. The JavaScript code will make FetchCommunicationService RPC calls to the web server using HTTP/TJSONProtocol, and the web application will then pass these RPC calls to another FetchCommunicationService using TSocket/TCompactProtocol RPC calls.

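Parameters:
  • host (str) – hostname of the FetchCommunicationService server that calls are relayed to
  • port (int) – port number of that server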
about()
alive()
fetch(request)
getCommunicationCount()
getCommunicationIDs(offset, count)
class concrete.util.access.S3BackedStoreHandler(bucket, prefix_len=4)

Bases: object

Simple StoreCommunicationService implementation using an AWS S3 bucket.

Implements the StoreCommunicationService interface, storing Communications in an S3 bucket, indexed by id, optionally prefixed with a fixed-length, random-looking but deterministic hash to improve performance.

References

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

Parameters:
  • bucket (boto.s3.bucket.Bucket) – S3 bucket object
  • prefix_len (int) – length of prefix to add to a Communication id to form its key. A prefix of length four enables S3 to better partition the bucket contents, yielding higher performance and a lower chance of getting rate-limited by AWS.
about()

Return S3BackedStoreHandler service information.

Returns:An object of type ServiceInfo
alive()

Return whether service is alive and running.

Returns:True or False
store(communication)

Save Communication to an S3 bucket, using the Communication id with a hash prefix of length self.prefix_len as a key.

Parameters:communication (Communication) – communication to store
concrete.util.access.prefix_s3_key(key_str, prefix_len)

Given unprefixed S3 key key_str, prefix the key with a deterministic prefix of hex characters of length prefix_len and return the result. Keys with such prefixes enable better performance on S3 and reduce the likelihood of rate-limiting.

Parameters:
  • key_str (str) – original (unprefixed) key, as a string
  • prefix_len (int) – length of prefix to add to key
Returns:

prefixed key

References

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

concrete.util.access.unprefix_s3_key(prefixed_key_str, prefix_len)

Given prefixed S3 key prefixed_key_str, remove prefix of length prefix_len from the key and return the result. Keys with random-looking prefixes enable better performance on S3 and reduce the likelihood of rate-limiting.

Parameters:
  • prefixed_key_str (str) – prefixed key, as a string
  • prefix_len (int) – length of prefix to remove from key
Returns:

unprefixed key

References

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
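The prefixing scheme behind prefix_s3_key and unprefix_s3_key can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: it assumes the deterministic prefix is taken from the MD5 hex digest of the key (concrete-python may hash differently), and the function names are hypothetical. The properties that matter are that the same key always receives the same prefix and that dropping prefix_len characters recovers the original key.

```python
import hashlib


def prefix_key_sketch(key_str, prefix_len):
    """Prepend a deterministic, random-looking hex prefix to key_str.

    ASSUMPTION: the prefix is the first prefix_len characters of the MD5
    hex digest of the key; concrete-python's own hash may differ.
    """
    digest = hashlib.md5(key_str.encode('utf-8')).hexdigest()  # 32 hex chars
    if prefix_len > len(digest):
        raise ValueError('prefix_len must be at most %d' % len(digest))
    return digest[:prefix_len] + key_str


def unprefix_key_sketch(prefixed_key_str, prefix_len):
    """Strip the first prefix_len characters to recover the original key."""
    return prefixed_key_str[prefix_len:]


key = prefix_key_sketch('XIN_ENG_20101212.0120', 4)
assert unprefix_key_sketch(key, 4) == 'XIN_ENG_20101212.0120'
```

Because the prefix depends only on the key, a reader can recompute it to locate an object, while the spread of hex prefixes lets S3 partition keys that would otherwise share a common leading string.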

concrete.util.access_wrapper module
class concrete.util.access_wrapper.FetchCommunicationClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.access.FetchCommunicationService'>
class concrete.util.access_wrapper.FetchCommunicationServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.access.FetchCommunicationService'>
class concrete.util.access_wrapper.StoreCommunicationClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.access.StoreCommunicationService'>
class concrete.util.access_wrapper.StoreCommunicationServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.access.StoreCommunicationService'>
class concrete.util.access_wrapper.SubprocessFetchCommunicationServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of FetchCommunicationServiceWrapper

class concrete.util.access_wrapper.SubprocessStoreCommunicationServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of StoreCommunicationServiceWrapper

concrete.util.annotate_wrapper module
class concrete.util.annotate_wrapper.AnnotateCommunicationClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.annotate.AnnotateCommunicationService'>
class concrete.util.annotate_wrapper.AnnotateCommunicationServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.annotate.AnnotateCommunicationService'>
class concrete.util.annotate_wrapper.SubprocessAnnotateCommunicationServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of AnnotateCommunicationServiceWrapper

concrete.util.comm_container module

Communication Containers - mapping Communication IDs to Communications

Classes that behave like a read-only dictionary (implementing Python’s collections.Mapping interface) and map Communication ID strings to Communications.

The classes abstract away the storage backend. If you need to optimize for performance, you may not want to use a dictionary abstraction that retrieves one Communication at a time.
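The pattern shared by these containers, a read-only Mapping whose values are fetched lazily, can be sketched with collections.abc.Mapping (the Python 3 location of the collections.Mapping interface). This is a hypothetical illustration, not one of the classes below: `load` stands in for whatever backend retrieval a real container performs.

```python
from collections.abc import Mapping


class LazyCommunicationContainerSketch(Mapping):
    """Read-only dict-like container that loads each value on demand.

    HYPOTHETICAL: `load` is any callable that takes a Communication ID and
    returns the Communication (e.g. by reading and deserializing a file).
    """

    def __init__(self, comm_ids, load):
        self._comm_ids = list(comm_ids)  # keys are fixed at construction
        self._comm_id_set = set(self._comm_ids)
        self._load = load

    def __getitem__(self, comm_id):
        if comm_id not in self._comm_id_set:
            raise KeyError(comm_id)
        return self._load(comm_id)  # one retrieval per lookup, no cache

    def __iter__(self):
        return iter(self._comm_ids)

    def __len__(self):
        return len(self._comm_ids)

    def __contains__(self, comm_id):
        # Overridden so that `in` checks never trigger a retrieval.
        return comm_id in self._comm_id_set
```

Mapping derives keys(), items(), values(), get() and equality from the three abstract methods. Iterating over items() still retrieves every Communication one at a time, which is exactly the performance caveat above.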

class concrete.util.comm_container.DirectoryBackedCommunicationContainer(directory_path, comm_extensions=[u'.comm', u'.concrete', u'.gz'])

Bases: _abcoll.Mapping

Maps Comm IDs to Comms, retrieving Comms from the filesystem

DirectoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from the filesystem.

Upon initialization, a DirectoryBackedCommunicationContainer instance will (recursively) search directory_path for any files that end with the specified comm_extensions. Files with matching extensions are assumed to be Communication files whose filename (sans extension) is the file’s Communication ID. So, for example, a file named ‘XIN_ENG_20101212.0120.concrete’ is assumed to be a Communication file with a Communication ID of ‘XIN_ENG_20101212.0120’.

Files with the extension .gz will be decompressed using gzip.

A DirectoryBackedCommunicationContainer will not be able to find any files that are added to directory_path after the container was initialized.

Parameters:
  • directory_path (str) – Path to directory containing Communications files
  • comm_extensions (str[]) – List of strings specifying filename extensions to be associated with Communications
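The initialization step described above, a one-time recursive scan that turns filenames into Communication IDs, might look like the following sketch. It only builds the ID-to-path index and reads no Communications. The function name is hypothetical, and how the real class derives an ID from a name such as foo.comm.gz is an assumption here: this sketch strips only the first matching extension.

```python
import os


def find_comm_files_sketch(directory_path,
                           comm_extensions=('.comm', '.concrete', '.gz')):
    """Map Communication IDs to file paths by recursively scanning a directory.

    ASSUMPTION: the ID is the filename with the first matching extension
    stripped; the real container may treat multi-extension names differently.
    """
    comm_id_to_path = {}
    for dirpath, _dirnames, filenames in os.walk(directory_path):
        for filename in filenames:
            for ext in comm_extensions:
                if filename.endswith(ext):
                    comm_id = filename[:-len(ext)]
                    comm_id_to_path[comm_id] = os.path.join(dirpath, filename)
                    break  # files without a matching extension are skipped
    return comm_id_to_path
```

Since the index is built once, a file created after the call does not appear in the returned dict, which mirrors the limitation noted above.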
class concrete.util.comm_container.FetchBackedCommunicationContainer(host, port)

Bases: _abcoll.Mapping

Maps Comm IDs to Comms, retrieving Comms from a FetchCommunicationService server

FetchBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a FetchCommunicationService.

If you need to retrieve large amounts of data from a FetchCommunicationService, then you SHOULD NOT USE THIS CLASS. This class retrieves one Communication at a time using FetchCommunicationService.

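Parameters:
  • host (str) – hostname of the FetchCommunicationService server
  • port (int) – port number of the FetchCommunicationService server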
class concrete.util.comm_container.MemoryBackedCommunicationContainer(communications_file, max_file_size=1073741824)

Bases: _abcoll.Mapping

Maps Comm IDs to Comms by loading all Comms in file into memory

MemoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. All Communications in communications_file will be read into memory using a CommunicationReader instance.

Parameters:
  • communications_file (str) – String specifying name of Communications file
  • max_file_size (int) – Maximum file size, in bytes
class concrete.util.comm_container.RedisHashBackedCommunicationContainer(redis_db, key)

Bases: _abcoll.Mapping

Provides access to Communications stored in a Redis hash, assuming the key of each communication is its Communication id.

RedisHashBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a Redis hash.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – Key in the Redis database where the hash is located
class concrete.util.comm_container.S3BackedCommunicationContainer(bucket, prefix_len=4)

Bases: _abcoll.Mapping

Provides access to Communications stored in an AWS S3 bucket, assuming the key of each communication is its Communication id (optionally prefixed with a fixed-length, random-looking but deterministic hash to improve performance).

S3BackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs (with or without prefixes) to Communications. Communications are lazily retrieved from an S3 bucket.

References

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

Parameters:
  • bucket (boto.s3.bucket.Bucket) – S3 bucket object
  • prefix_len (int) – length of prefix in each Communication’s key in the bucket. This number of characters will be removed from the beginning of the key to determine the Communication id (without incurring the cost of fetching and deserializing the Communication). A prefix enables S3 to better partition the bucket contents, yielding higher performance and a lower chance of getting rate-limited by AWS.
class concrete.util.comm_container.ZipFileBackedCommunicationContainer(zipfile_path, comm_extensions=[u'.comm', u'.concrete'])

Bases: _abcoll.Mapping

Maps Comm IDs to Comms, retrieving Comms from a Zip file

ZipFileBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a Zip file.

Parameters:
  • zipfile_path (str) – Path to Zip file containing Communications
  • comm_extensions (str[]) – List of strings specifying filename extensions associated with Communications
concrete.util.concrete_uuid module

Helper functions for generating Concrete UUID objects

class concrete.util.concrete_uuid.AnalyticUUIDGeneratorFactory(comm=None)

Bases: object

Primary interface to generation of compressible UUIDs. Each compressible UUID takes the form

xxxxxxxx-xxxx-yyyy-yyyy-zzzzzzzzzzzz

where each instance of x, y, or z is a hexadecimal digit, the group of x digits is shared across all annotations in a Communication, the group of y digits is shared across all annotations generated by a given analytic (by convention, AnnotationMetadata tool) in a given Communication, and the group of z digits is unique to each annotation (generated by a given analytic). Thus all UUIDs in a Communication share the same first twelve hex digits and some UUIDs in a Communication share the same middle eight hex digits. Additionally, while the x and y components are generated uniformly at random, the z component for each analytic in a Communication starts at a uniform-at-random twelve hex digits for the first annotation and increments by one for each annotation thereafter. Thus the UUIDs of a Communication likely have many substrings in common and are easily compressed. For example, we might find the following seven UUIDs in a Communication, corresponding to seven annotations split across two analytics:

1bccb123-be45-7288-028a-4fdf3181ab51
1bccb123-be45-7288-028a-4fdf3181ab52
1bccb123-be45-7288-028a-4fdf3181ab53
1bccb123-be45-df12-9c04-198eaa130a4e
1bccb123-be45-df12-9c04-198eaa130a4f
1bccb123-be45-df12-9c04-198eaa130a50
1bccb123-be45-df12-9c04-198eaa130a51
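The x/y/z scheme just described can be made concrete with a minimal, stdlib-only sketch. It is not concrete-python's implementation: it yields plain strings rather than Concrete UUID objects, draws randomness from the random module, and the class name CompressibleUUIDFactorySketch is hypothetical.

```python
import random

HEX_DIGITS = '0123456789abcdef'


def random_hex(n):
    """Return n hexadecimal characters drawn uniformly at random."""
    return ''.join(random.choice(HEX_DIGITS) for _ in range(n))


class CompressibleUUIDFactorySketch(object):
    """One instance per Communication; fixes the shared x component."""

    def __init__(self):
        self.xs = random_hex(12)  # shared by every annotation in the Communication

    def create(self):
        """Return a UUID generator for one analytic."""
        ys = random_hex(8)                  # shared by this analytic's annotations
        z_start = int(random_hex(12), 16)   # random starting value for the counter
        return self._generate(self.xs, ys, z_start)

    @staticmethod
    def _generate(xs, ys, z):
        while True:
            zs = '%012x' % z
            yield '-'.join([xs[:8], xs[8:], ys[:4], ys[4:], zs])
            z = (z + 1) % 16 ** 12  # increment, wrapping after ffffffffffff


factory = CompressibleUUIDFactorySketch()
first_analytic = factory.create()
second_analytic = factory.create()
print([next(first_analytic) for _ in range(3)])
print([next(second_analytic) for _ in range(4)])
```

The printed strings follow the same pattern as the example list above: one shared 12-digit head, one 8-digit middle per analytic, and consecutive tails within an analytic.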

One generator factory should be created per Communication, and a new generator should be created from that factory for each analytic processing the communication. Often each program represents a single analytic, so common usage is:

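from concrete.util.concrete_uuid import AnalyticUUIDGeneratorFactory

augf = AnalyticUUIDGeneratorFactory(comm)
aug = augf.create()
# new_annotations (hypothetical): the annotation objects created by this analytic
for annotation in new_annotations:
    annotation.uuid = next(aug)
    # ... then add annotation to comm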

or if you’re creating a new Communication:

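from concrete import Communication
from concrete.util.concrete_uuid import AnalyticUUIDGeneratorFactory

augf = AnalyticUUIDGeneratorFactory()
aug = augf.create()
comm = Communication()  # ... then set id, type, text, metadata, etc.
comm.uuid = next(aug)
# new_annotations (hypothetical): the annotation objects created by this analytic
for annotation in new_annotations:
    annotation.uuid = next(aug)
    # ... then add annotation to comm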

where the annotation objects might be objects of type Parse, DependencyParse, TokenTagging, CommunicationTagging, etc.

create()
Returns:A UUID generator for a new analytic.
class concrete.util.concrete_uuid.UUIDClustering(comm)

Bases: object

Representation of the UUID instance clusters in a concrete communication (each cluster represents the set of nested members of the communication that reference or are identified by a given UUID).

hashable_clusters()

Hashable version of UUIDClustering.

Two UUIDClusterings c1 and c2 are equivalent (the two underlying Communications’ UUID structures are equivalent) if and only if:

c1.hashable_clusters() == c2.hashable_clusters()
Returns:The set of unlabeled UUID clusters in a unique and hashable format.
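The idea behind hashable_clusters can be illustrated by discarding the UUID strings and keeping only which positions share a UUID. The sketch below assumes a hypothetical input format (a dict from each UUID string to the field paths that carry or reference it); UUIDClustering builds its clusters from the Communication itself, so this shows the comparison principle only.

```python
def hashable_clusters_sketch(uuid_to_paths):
    """Return the UUID clusters as a frozenset of frozensets.

    HYPOTHETICAL input: a dict mapping each UUID string to the set of
    field paths in a Communication that carry or reference that UUID.
    Dropping the UUID strings themselves leaves only the link structure.
    """
    return frozenset(frozenset(paths) for paths in uuid_to_paths.values())


TOK = 'sectionList[0].sentenceList[0].tokenization.uuid'
REF = 'entityMentionSetList[0].mentionList[0].tokens.tokenizationId'
MEN = 'entityMentionSetList[0].mentionList[0].uuid'

c1 = {'aaaa': {TOK, REF}, 'bbbb': {MEN}}
c2 = {'1111': {TOK, REF}, '2222': {MEN}}             # same structure, new UUIDs
c3 = {'aaaa': {TOK}, 'cccc': {REF}, 'bbbb': {MEN}}  # reference now dangles

assert hashable_clusters_sketch(c1) == hashable_clusters_sketch(c2)
assert hashable_clusters_sketch(c1) != hashable_clusters_sketch(c3)
```

Frozensets are used at both levels because they are hashable and order-insensitive, so two clusterings compare equal exactly when they group the same positions together.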
class concrete.util.concrete_uuid.UUIDCompressor(single_analytic=False)

Bases: object

Interface to replacing a Communication’s UUIDs with compressible UUIDs.

Parameters:single_analytic (bool) – True to generate new UUIDs using a single analytic for all annotations, False to use the annotation metadata tool name as the analytic ID
compress(comm)

Return a copy of a communication whose UUIDs have been replaced by compressible UUIDs using AnalyticUUIDGeneratorFactory. When this method returns, this object’s public member variable uuid_map will contain a dictionary mapping the original UUIDs to the new UUIDs.

Parameters:comm (Communication) – communication to be copied (the UUIDs of the copy will be made compressible)
Returns:Deep copy of comm with compressed UUIDs
Return type:Communication
concrete.util.concrete_uuid.bin_to_hex(b, n=None)

Return hexadecimal representation of binary value

Parameters:
  • b (int) – integer whose bit representation will be converted
  • n (int) – length of returned hexadecimal string (the string will be left-padded with 0s if it is originally shorter than n; an exception will be thrown if it is longer; the string will be returned as-is if n is None)
Returns:

a string of hexadecimal characters representing the bit sequence in b, padded to be n characters long if n is not None

Raises:

ValueError – if n is not None and the hexadecimal string representing b is longer than n
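The documented behavior of bin_to_hex (left-pad with zeros to n characters, raise ValueError if the string is already longer than n), together with the inverse conversion performed by hex_to_bin, can be sketched as follows. The function names are hypothetical, and the use of lowercase hex digits is an assumption that the documentation does not specify.

```python
def bin_to_hex_sketch(b, n=None):
    """Return the hex representation of integer b, zero-padded to n characters."""
    h = '%x' % b  # lowercase hex digits, no '0x' prefix
    if n is not None:
        if len(h) > n:
            raise ValueError('hex representation of %d is longer than %d' % (b, n))
        h = h.rjust(n, '0')
    return h


def hex_to_bin_sketch(h):
    """Inverse direction: parse a string of hex characters into an integer."""
    return int(h, 16)


print(bin_to_hex_sketch(255))     # ff
print(bin_to_hex_sketch(255, 4))  # 00ff
print(hex_to_bin_sketch('00ff'))  # 255
```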

concrete.util.concrete_uuid.compress_uuids(comm, verify=False, single_analytic=False)

Create a copy of Communication comm with UUIDs converted according to the compressible UUID scheme

Parameters:
  • comm (Communication) –
  • verify (bool) – If True, use a heuristic to verify the UUID link structure is preserved in the new Communication
  • single_analytic (bool) – If True, use a single analytic prefix for all UUIDs in comm.
Returns:

A 2-tuple containing the new Communication (converted using the compressible UUID scheme) and the UUIDCompressor object used to perform the conversion.

Raises:

ValueError – If verify is True and comm has references added (verification would then loop infinitely).

concrete.util.concrete_uuid.generate_UUID()

Return a Concrete UUID object with a random UUID4 value.

Returns:a Concrete UUID object
concrete.util.concrete_uuid.generate_hex_unif(n)

Generate and return random string of n hexadecimal characters.

Parameters:n (int) – number of characters of string to return
Returns:string of n i.i.d. uniform hexadecimal characters
concrete.util.concrete_uuid.generate_uuid_unif()

Generate and return random UUID string whose characters are drawn uniformly from the hexadecimal alphabet.

Returns:string of hexadecimal characters drawn uniformly at random (delimited into five UUID-like segments by hyphens)
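A minimal sketch of what generate_hex_unif and generate_uuid_unif describe is shown below, using hypothetical names. Because every character is uniform, the version and variant digits that uuid.uuid4() fixes are random here too, so the result is UUID-shaped but not necessarily RFC 4122 conformant. The optional rng parameter is an addition for reproducibility (pass random.Random(seed)); the random module is not suitable where unpredictability matters.

```python
import random

HEX_DIGITS = '0123456789abcdef'


def generate_hex_unif_sketch(n, rng=random):
    """Return n i.i.d. uniform hexadecimal characters."""
    return ''.join(rng.choice(HEX_DIGITS) for _ in range(n))


def generate_uuid_unif_sketch(rng=random):
    """Return 32 uniform hex characters split into 8-4-4-4-12 segments."""
    h = generate_hex_unif_sketch(32, rng)
    return '-'.join([h[:8], h[8:12], h[12:16], h[16:20], h[20:]])


print(generate_uuid_unif_sketch())
```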
concrete.util.concrete_uuid.hex_to_bin(h)

Return binary encoding of hexadecimal string

Parameters:h (str) – string of hexadecimal characters
Returns:an integer whose bit representation corresponds to the hexadecimal representation in h
concrete.util.concrete_uuid.join_uuid(xs, ys, zs)

Given three hexadecimal strings of sizes 12, 8, and 12, join them into a UUID string (inserting hyphens appropriately) and return the result.

Parameters:
  • xs (str) – 12 hexadecimal characters that will form first two segments of the UUID string (size 8 and size 4 respectively)
  • ys (str) – 8 hexadecimal characters that will form the third and fourth segment of the UUID string (each of size 4)
  • zs (str) – 12 hexadecimal characters that will form the last segment of the UUID string (size 12)
Returns:

string of size 36 (12 + 8 + 12 = 32, plus four hyphens inserted appropriately) comprising UUID formed from xs, ys, and zs

Raises:

ValueError – if xs, ys, or zs have incorrect length

concrete.util.concrete_uuid.split_uuid(u)

Split UUID string into three hexadecimal strings of sizes 12, 8, and 12, returning those three strings (with hyphens stripped) in a tuple.

Parameters:u (str) – UUID string
Returns:a tuple of three hexadecimal strings of sizes 12, 8, and 12, corresponding to the first two segments, middle two segments, and last segment of the input UUID string (with all hyphens stripped)
Raises:ValueError – if UUID string is malformatted
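join_uuid and split_uuid are inverses that move between the 12/8/12 x, y, z components and the hyphenated 8-4-4-4-12 string. A sketch with hypothetical names follows. The length checks follow the documentation; rejecting non-hex characters in split is an extra assumption about what counts as "malformatted".

```python
import re

HEX_RE = re.compile(r'[0-9a-fA-F]+')


def join_uuid_sketch(xs, ys, zs):
    """Join hex strings of sizes 12, 8 and 12 into an 8-4-4-4-12 UUID string."""
    if (len(xs), len(ys), len(zs)) != (12, 8, 12):
        raise ValueError('expected hex strings of sizes 12, 8, and 12')
    return '-'.join([xs[:8], xs[8:], ys[:4], ys[4:], zs])


def split_uuid_sketch(u):
    """Return (xs, ys, zs): the first two, middle two and last segments."""
    parts = u.split('-')
    if ([len(p) for p in parts] != [8, 4, 4, 4, 12]
            or not all(HEX_RE.fullmatch(p) for p in parts)):
        raise ValueError('malformatted UUID string: %r' % (u,))
    return parts[0] + parts[1], parts[2] + parts[3], parts[4]


u = '1bccb123-be45-7288-028a-4fdf3181ab51'
print(split_uuid_sketch(u))  # ('1bccb123be45', '7288028a', '4fdf3181ab51')
assert join_uuid_sketch(*split_uuid_sketch(u)) == u
```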
concrete.util.file_io module

Code for reading and writing Concrete Communications

class concrete.util.file_io.CommunicationReader(filename, add_references=True, filetype=0)

Bases: concrete.util.file_io.ThriftReader

Iterator/generator class for reading one or more Communications from a file

The iterator returns a (Communication, filename) tuple

Supported filetypes are:

  • a file with a single Communication
  • a file with multiple Communications concatenated together
  • a gzipped file with a single Communication
  • a gzipped file with multiple Communications concatenated together
  • a .tar.gz file with one or more Communications
  • a .zip file with one or more Communications

Sample usage:

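from concrete.util.file_io import CommunicationReader

for (comm, filename) in CommunicationReader('multiple_comms.tar.gz'):
    do_something(comm)  # do_something: placeholder for your own processing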
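Parameters:
  • filename (str) – path of file to read
  • add_references (bool) – If True, add references to each Communication read from the file
  • filetype (FileType) – Expected type of file. The default value is FileType.AUTO, in which case the reader tries to determine the file type automatically.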
class concrete.util.file_io.CommunicationWriter(filename=None, gzip=False)

Bases: object

Class for writing one or more Communications to a file

Sample usage:

from concrete.util.file_io import CommunicationWriter

writer = CommunicationWriter('foo.concrete')
writer.write(existing_comm_object)  # an existing Communication object
writer.close()
Parameters:
  • filename (str) – if specified, open file at this path during construction (a file can alternatively be opened after construction using the open method)
  • gzip (bool) – Flag indicating if file should be compressed with gzip
close()

Close file.

open(filename)

Open specified file for writing. File will be compressed if the gzip flag of the constructor was set to True.

Parameters:filename (str) – path to file to open for writing
write(comm)
Parameters:comm (Communication) – communication to write to file
class concrete.util.file_io.CommunicationWriterTGZ(tar_filename=None)

Bases: concrete.util.file_io.CommunicationWriterTar

Class for writing one or more Communications to a .TAR.GZ archive

Sample usage:

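from concrete.util.file_io import CommunicationWriterTGZ

# comm_object_one, comm_object_two, comm_object_three: existing Communications
writer = CommunicationWriterTGZ('multiple_comms.tgz')
writer.write(comm_object_one, 'comm_one.concrete')
writer.write(comm_object_two, 'comm_two.concrete')
writer.write(comm_object_three, 'comm_three.concrete')
writer.close()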
class concrete.util.file_io.CommunicationWriterTar(tar_filename=None, gzip=False)

Bases: object

Class for writing one or more Communications to a .TAR archive

Sample usage:

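from concrete.util.file_io import CommunicationWriterTar

# comm_object_one, comm_object_two, comm_object_three: existing Communications
writer = CommunicationWriterTar('multiple_comms.tar')
writer.write(comm_object_one, 'comm_one.concrete')
writer.write(comm_object_two, 'comm_two.concrete')
writer.write(comm_object_three, 'comm_three.concrete')
writer.close()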
Parameters:
  • tar_filename (str) – if specified, open file at this path during construction (a file can alternatively be opened after construction using the open method)
  • gzip (bool) – Flag indicating if .TAR file should be compressed with gzip
close()

Close tar file.

open(tar_filename)

Open specified tar file for writing. File will be compressed if the gzip flag of the constructor was set to True.

Parameters:tar_filename (str) – path to file to open for writing
write(comm, comm_filename=None)
Parameters:
  • comm (Communication) – communication to write to tar file
  • comm_filename (str) – desired filename of communication within tar file (by default the filename will be the communication id appended with a .concrete extension)
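The default member name (the Communication id plus a .concrete extension) can be illustrated with the standard tarfile module. The sketch below is a stand-in, not CommunicationWriterTar itself: it takes bytes that are assumed to be already serialized (the serialization step, for example with TCompactProtocol, is outside the sketch), and the helper name is hypothetical. Writing to a BytesIO object keeps the example free of filesystem side effects.

```python
import io
import tarfile


def add_serialized_comm_sketch(tar, comm_id, data, comm_filename=None):
    """Add already-serialized Communication bytes to an open TarFile.

    By default the member is named after the Communication id with a
    .concrete extension, mirroring the documented default.
    """
    if comm_filename is None:
        comm_filename = comm_id + '.concrete'
    info = tarfile.TarInfo(name=comm_filename)
    info.size = len(data)  # addfile reads exactly this many bytes
    tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:  # 'w:gz' for .tar.gz, 'w' for .tar
    add_serialized_comm_sketch(tar, 'comm_one', b'placeholder bytes')
    add_serialized_comm_sketch(tar, 'comm_two', b'more bytes', 'custom_name.concrete')
```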
class concrete.util.file_io.ThriftReader(thrift_type, filename, postprocess=None, filetype=0)

Bases: object

Iterator/generator class for reading one or more Thrift structures from a file

The iterator returns an (obj, filename) tuple where obj is an object of type thrift_type.

Supported filetypes are:

  • a file with a single Thrift structure
  • a file with multiple Thrift structures concatenated together
  • a gzipped file with a single Thrift structure
  • a gzipped file with multiple Thrift structures concatenated together
  • a .tar.gz file with one or more Thrift structures
  • a .zip file with one or more Thrift structures

Sample usage:

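from concrete import Communication
from concrete.util.file_io import ThriftReader

for (comm, filename) in ThriftReader(Communication,
                                     'multiple_comms.tar.gz'):
    do_something(comm)  # do_something: placeholder for your own processing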
Parameters:
  • thrift_type – Class for Thrift type, e.g. Communication, TokenLattice
  • filename (str) – path of file to read
  • postprocess (function) – A post-processing function that is called with the Thrift object as argument each time a Thrift object is read from the file
  • filetype (FileType) – Expected type of file. The default value is FileType.AUTO, in which case the reader tries to determine the file type automatically.
Raises:

ValueError – if filetype is not a known filetype name or id

next()

Return tuple containing next Thrift object (and filename) in the sequence.

Raises:StopIteration – if there are no more Thrift objects
Returns:tuple containing Thrift object and its filename
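In FileType.AUTO mode the reader has to tell the supported packagings apart. One plausible approach, shown here as a stdlib-only sketch and not as the library's actual detection logic, is to inspect the file's leading bytes: zip archives begin with PK\x03\x04 and gzip streams with the bytes 1f 8b, and tarfile.is_tarfile() can then separate a gzipped tar from a single gzipped file. The function name and returned labels are hypothetical.

```python
import tarfile

ZIP_MAGIC = b'PK\x03\x04'
GZIP_MAGIC = b'\x1f\x8b'


def guess_container_type_sketch(path):
    """Return 'zip', 'tar.gz', 'gzip', 'tar' or 'raw' for the file at path."""
    with open(path, 'rb') as f:
        head = f.read(4)
    if head.startswith(ZIP_MAGIC):
        return 'zip'
    if head.startswith(GZIP_MAGIC):
        # is_tarfile() transparently decompresses, so this detects .tar.gz
        return 'tar.gz' if tarfile.is_tarfile(path) else 'gzip'
    if tarfile.is_tarfile(path):
        return 'tar'
    return 'raw'  # assume one or more concatenated serialized structures
```

Checking magic bytes first keeps the common cases cheap; only gzip and unrecognized files pay for the tarfile probe.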
concrete.util.file_io.read_communication_from_file(communication_filename, add_references=True)

Read a Communication from the file specified by filename

Parameters:
Returns:

Communication read from file

Return type:

Communication

concrete.util.file_io.read_thrift_from_file(thrift_obj, filename)

Instantiate Thrift object from contents of named file

The Thrift file is assumed to be encoded using TCompactProtocol

WARNING - Thrift deserialization tends to fail silently. For example, the Thrift libraries will not complain if you try to deserialize data from the file /dev/urandom.

Parameters:
  • thrift_obj – A Thrift object (e.g. a Communication object)
  • filename (str) – A filename string
Returns:

The Thrift object that was passed in as an argument

concrete.util.file_io.read_tokenlattice_from_file(tokenlattice_filename)

Read a TokenLattice from a file

Parameters:tokenlattice_filename (str) – Name of file containing serialized TokenLattice
Returns:TokenLattice read from file
Return type:TokenLattice
concrete.util.file_io.write_communication_to_file(communication, communication_filename)

Write a Communication to a file

Parameters:
  • communication (Communication) – communication to write
  • communication_filename (str) – path of file to write to
concrete.util.file_io.write_thrift_to_file(thrift_obj, filename)

Write a Thrift object to a file

Parameters:
  • thrift_obj – Thrift object to write
  • filename (str) – path of file to write to
concrete.util.json_fu module

Convert Concrete objects to JSON strings

concrete.util.json_fu.communication_file_to_json(communication_filename, remove_timestamps=False, remove_uuids=False)

Get a “pretty-printed” JSON string representation for a Communication

Parameters:
  • communication_filename (str) – Communication filename
  • remove_timestamps (bool) – Flag for removing timestamps from JSON output
  • remove_uuids (bool) – Flag for removing UUID info from JSON output
Returns:

A “pretty-printed” JSON representation of the Communication

Return type:

str

concrete.util.json_fu.get_json_object_without_timestamps(json_object)

Create a copy of a JSON object created by json.loads(), with all representations of AnnotationMetadata timestamps (dictionary entries whose key is timestamp) recursively removed.

Parameters:json_object – Python object created from string by json.loads()
Returns:A copy of the input data structure with all timestamp objects removed
concrete.util.json_fu.get_json_object_without_uuids(json_object)

Create a copy of a JSON object created by json.loads(), with all representations of UUID objects (dictionaries containing a ‘uuidString’ key) recursively removed.

Parameters:json_object – Python object created from string by json.loads()
Returns:A copy of the input data structure with all UUID objects removed
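Both clean-up helpers above walk the structure returned by json.loads() and rebuild it without certain entries. A minimal recursive sketch with hypothetical function names follows. It returns new containers rather than mutating its input, matching the documented "create a copy" behavior, but the real helpers may differ in details such as whether a top-level UUID object is removed.

```python
import json


def without_key_sketch(obj, key):
    """Return a copy of a json.loads() result with every dict entry named `key` removed."""
    if isinstance(obj, dict):
        return {k: without_key_sketch(v, key) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [without_key_sketch(v, key) for v in obj]
    return obj


def _is_uuid_object(v):
    return isinstance(v, dict) and 'uuidString' in v


def without_uuid_objects_sketch(obj):
    """Return a copy with every UUID object (a dict containing 'uuidString') removed."""
    if isinstance(obj, dict):
        return {k: without_uuid_objects_sketch(v) for k, v in obj.items()
                if not _is_uuid_object(v)}
    if isinstance(obj, list):
        return [without_uuid_objects_sketch(v) for v in obj if not _is_uuid_object(v)]
    return obj


doc = json.loads(
    '{"uuid": {"uuidString": "1bccb123-be45-7288-028a-4fdf3181ab51"},'
    ' "metadata": {"tool": "my-tool", "timestamp": 1500000000},'
    ' "sectionList": [{"uuid": {"uuidString": "1bccb123-be45-7288-028a-4fdf3181ab52"},'
    ' "kind": "passage"}]}'
)
print(without_key_sketch(doc, 'timestamp'))
print(without_uuid_objects_sketch(doc))
```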
concrete.util.json_fu.thrift_to_json(tobj, remove_timestamps=False, remove_uuids=False)

Get a “pretty-printed” JSON string representation for a Thrift object

Parameters:
  • tobj – A Thrift object
  • remove_timestamps (bool) – Flag for removing timestamps from JSON output
  • remove_uuids (bool) – Flag for removing UUID info from JSON output
Returns:

A “pretty-printed” JSON representation of the Thrift object

Return type:

str

concrete.util.json_fu.tokenlattice_file_to_json(toklat_filename)

Get a “pretty-printed” JSON string representation for a TokenLattice

Parameters:toklat_filename (str) – String specifying TokenLattice filename
Returns:A “pretty-printed” JSON representation of the TokenLattice
Return type:str
concrete.util.learn_wrapper module
class concrete.util.learn_wrapper.ActiveLearnerClientClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.learn.ActiveLearnerClientService'>
class concrete.util.learn_wrapper.ActiveLearnerClientServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.learn.ActiveLearnerClientService'>
class concrete.util.learn_wrapper.ActiveLearnerServerClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.learn.ActiveLearnerServerService'>
class concrete.util.learn_wrapper.ActiveLearnerServerServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.learn.ActiveLearnerServerService'>
class concrete.util.learn_wrapper.SubprocessActiveLearnerClientServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of ActiveLearnerClientServiceWrapper

class concrete.util.learn_wrapper.SubprocessActiveLearnerServerServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of ActiveLearnerServerServiceWrapper

concrete.util.locale module
concrete.util.locale.set_stdout_encoding()

Force stdout encoding to UTF-8. Ideally the user should set the output encoding (to UTF-8 or otherwise) in their environment, but in practice this has been difficult to get right, and scripts writing to stdout have broken as a result.
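On Python 3.7 and later, one way to get a similar effect without replacing the stream object is io.TextIOWrapper.reconfigure(). The sketch below illustrates that technique only; it is not concrete-python's implementation, and force_utf8 is a hypothetical name.

```python
import io
import sys


def force_utf8(stream=None):
    """Switch a text stream (sys.stdout by default) to UTF-8 in place.

    Hypothetical helper. TextIOWrapper.reconfigure() needs Python 3.7+ and
    flushes the stream before changing its encoding.
    """
    if stream is None:
        stream = sys.stdout
    stream.reconfigure(encoding='utf-8')
    return stream


# Demonstrate on an in-memory stream that starts out as ASCII.
raw = io.BytesIO()
out = force_utf8(io.TextIOWrapper(raw, encoding='ascii'))
out.write(u'caf\u00e9')
out.flush()
print(raw.getvalue())  # b'caf\xc3\xa9'
```

Without the reconfigure() call, writing the same text would raise UnicodeEncodeError, because the stream was opened as ASCII.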

concrete.util.mem_io module
concrete.util.mem_io.communication_deep_copy(comm)

Return deep copy of communication.

Parameters:comm (Communication) – communication to copy
Returns:deep copy of comm
Return type:Communication
concrete.util.mem_io.read_communication_from_buffer(buf, add_references=True)

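Deserialize buf (a binary string) and return the resulting communication. Add references if requested.

Parameters:
  • buf (bytes) – binary string containing a serialized communication
  • add_references (bool) – True to fill in members in the communication according to UUID relationships (see concrete.util.add_references), False to return the communication as-is
Returns:Communication read from buffer
Return type:Communication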

concrete.util.mem_io.write_communication_to_buffer(comm)

Serialize communication to buffer (binary string) and return buffer.

Parameters:comm (Communication) – communication to serialize
Returns:buffer (binary string) containing the serialized communication
Return type:bytes (str in Python 2)
concrete.util.metadata module
exception concrete.util.metadata.MultipleAnnotationsError(*args, **kwargs)

Bases: exceptions.Exception

Exception representing more than one annotation present in a concrete object when one (or zero) is expected.

exception concrete.util.metadata.ZeroAnnotationsError(*args, **kwargs)

Bases: exceptions.Exception

Exception representing zero annotations present in a concrete object when one (or more) is expected.

concrete.util.metadata.datetime_to_timestamp(dt)

Given time-zone–unaware datetime object representing date and time in UTC, return corresponding Concrete timestamp.

Parameters:dt (datetime) – time-zone–unaware datetime object representing date and time (in UTC) to convert
Returns:concrete timestamp representing datetime dt

Source: http://stackoverflow.com/questions/6999726/how-can-i-convert-a-datetime-object-to-milliseconds-since-epoch-unix-time-in-p
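Because the input is a naive datetime already expressed in UTC, the conversion reduces to subtracting the Unix epoch. A stdlib-only sketch, assuming (as timestamp_to_datetime() states) that a Concrete timestamp is an integer count of seconds since the epoch; datetime_to_timestamp_sketch is a hypothetical name, not the library function.

```python
from datetime import datetime

# The Unix epoch as a naive datetime, interpreted as UTC.
EPOCH = datetime(1970, 1, 1)


def datetime_to_timestamp_sketch(dt):
    """Return whole seconds (truncated) between the epoch and naive UTC dt."""
    return int((dt - EPOCH).total_seconds())


print(datetime_to_timestamp_sketch(datetime(2017, 1, 1)))  # 1483228800
```

Using a naive epoch avoids mixing aware and naive datetimes, which Python refuses to subtract.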

concrete.util.metadata.filter_annotations(annotations, filter_fields=None, sort_field=None, sort_reverse=False, action_if_multiple=u'pass', action_if_zero=u'pass')

Return filtered and/or re-ordered list of annotations, that is, objects containing a metadata field of type AnnotationMetadata. The default behavior is to do no filtering (or re-ordering), returning an exact copy of annotations.

Parameters:
  • annotations (list) – original list of annotations (objects containing a metadata field of type metadata.ttypes.AnnotationMetadata). This list is not modified.
  • filter_fields (dict) – dict of fields and their desired values by which to filter annotations (keep annotations whose field FIELD equals VALUE for all FIELD: VALUE entries). Default: keep all annotations. See get_annotation_field() for valid fields.
  • sort_field (str) – field by which to re-order annotations. Default: do not re-order annotations.
  • sort_reverse (bool) – True to reverse order of annotations (after sorting, if any).
  • action_if_multiple (str) – action to take if, after filtering, there is more than one annotation left. ‘pass’ to return all filtered and re-ordered annotations, ‘raise’ to raise an exception of type MultipleAnnotationsError, ‘first’ to return a list containing the first annotation after filtering and re-ordering, or ‘last’ to return a list containing the last annotation after filtering and re-ordering.
  • action_if_zero (str) – action to take if, after filtering, there are no annotations left. ‘pass’ to return an empty list, ‘raise’ to raise an exception of type ZeroAnnotationsError.
Returns:

filtered and/or re-ordered list of annotations

Raises:
  • ValueError – if the value of action_if_multiple or action_if_zero is not recognized
  • MultipleAnnotationsError – if the value of action_if_multiple is ‘raise’ and there are multiple annotations passing the filter
  • ZeroAnnotationsError – if the value of action_if_zero is ‘raise’ and there are no annotations passing the filter
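The filtering, sorting, and action steps described above can be mirrored in plain Python. This is a sketch of the documented behavior, not the library code: the namedtuples stand in for Concrete objects, the two exception classes are local stand-ins, and the order of validation is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-ins for Concrete objects with AnnotationMetadata.
Metadata = namedtuple('Metadata', ['tool', 'timestamp', 'kBest'])
Annotation = namedtuple('Annotation', ['name', 'metadata'])


class MultipleAnnotationsError(Exception):
    """Local stand-in for concrete.util.metadata.MultipleAnnotationsError."""


class ZeroAnnotationsError(Exception):
    """Local stand-in for concrete.util.metadata.ZeroAnnotationsError."""


def get_field(annotation, field):
    # Only the fields documented for get_annotation_field() are accepted.
    if field not in ('kBest', 'timestamp', 'tool'):
        raise ValueError('unknown field: %r' % (field,))
    return getattr(annotation.metadata, field)


def filter_annotations_sketch(annotations, filter_fields=None,
                              sort_field=None, sort_reverse=False,
                              action_if_multiple='pass',
                              action_if_zero='pass'):
    if action_if_multiple not in ('pass', 'raise', 'first', 'last'):
        raise ValueError('bad action_if_multiple: %r' % (action_if_multiple,))
    if action_if_zero not in ('pass', 'raise'):
        raise ValueError('bad action_if_zero: %r' % (action_if_zero,))
    filter_fields = filter_fields or {}
    # Build a new list so the caller's list is never modified.
    result = [a for a in annotations
              if all(get_field(a, f) == v for (f, v) in filter_fields.items())]
    if sort_field is not None:
        result.sort(key=lambda a: get_field(a, sort_field))
    if sort_reverse:
        result.reverse()
    if len(result) > 1:
        if action_if_multiple == 'raise':
            raise MultipleAnnotationsError(len(result))
        if action_if_multiple == 'first':
            result = result[:1]
        elif action_if_multiple == 'last':
            result = result[-1:]
    elif not result and action_if_zero == 'raise':
        raise ZeroAnnotationsError()
    return result


anns = [Annotation('a', Metadata('parser', 20, 1)),
        Annotation('b', Metadata('tagger', 10, 1)),
        Annotation('c', Metadata('parser', 5, 1))]
# Newest annotation produced by 'parser': filter, sort by time, keep last.
newest_parse = filter_annotations_sketch(
    anns, filter_fields={'tool': 'parser'}, sort_field='timestamp',
    action_if_multiple='last')
print([a.name for a in newest_parse])  # ['a']
```

Sorting by timestamp and keeping the last element is a compact way to select the most recent matching annotation.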
concrete.util.metadata.filter_annotations_json(annotations, kwargs_json)

Call filter_annotations() on annotations, sending it keyword arguments from the JSON-encoded dictionary kwargs_json.

Parameters:
  • annotations (list) – original list of annotations (objects containing a metadata field of type metadata.ttypes.AnnotationMetadata). This list is not modified.
  • kwargs_json (str) – JSON-encoded dictionary of keyword arguments to be passed to filter_annotations().
Returns:

annotations filtered by filter_annotations() according to provided JSON-encoded keyword arguments.

Raises:
  • ValueError – if the value of ‘action_if_multiple’ or ‘action_if_zero’ is not recognized
  • MultipleAnnotationsError – if the value of ‘action_if_multiple’ is ‘raise’ and there are multiple annotations passing the filter
  • ZeroAnnotationsError – if the value of ‘action_if_zero’ is ‘raise’ and there are no annotations passing the filter
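The kwargs_json argument is a JSON object whose keys are filter_annotations() parameter names. A minimal sketch of building one with the standard json module; the tool name 'my-parser' is a hypothetical example value.

```python
import json

# Keyword arguments for filter_annotations(), encoded as a JSON object.
kwargs_json = json.dumps({
    'filter_fields': {'tool': 'my-parser'},  # hypothetical tool name
    'sort_field': 'timestamp',
    'sort_reverse': True,                    # JSON true -> Python True
    'action_if_multiple': 'first',
})
print(kwargs_json)
```

The resulting string would then be passed as filter_annotations_json(annotations, kwargs_json); decoding it with json.loads yields the same keyword arguments shown in the dict.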
concrete.util.metadata.filter_unnone(annotation_filter)

If annotation_filter is None, return a no-op filter (one that returns its input unchanged); otherwise return annotation_filter.

Parameters:annotation_filter (func) – function that takes a list of annotations and returns a filtered (and/or re-ordered) list of annotations
Returns:function that takes a list of annotations and returns a filtered (and/or re-ordered) list of annotations.
concrete.util.metadata.get_annotation_field(annotation, field)

Return requested field of annotation metadata.

Parameters:
  • annotation (object) – object containing a metadata field of type metadata.ttypes.AnnotationMetadata.
  • field (str) – name of metadata field: kBest, timestamp, or tool.
Returns:

value of requested field in annotation metadata.

Raises:

ValueError – on unknown field name

concrete.util.metadata.get_index_of_tool(lst_of_conc, tool)

Return the index of the object in the provided list whose tool name matches tool.

If tool is None, return the first valid index into lst_of_conc.

This returns -1 if:
  • lst_of_conc is None, or
  • lst_of_conc has no entries, or
  • no object in lst_of_conc matches tool.
Parameters:
  • lst_of_conc (list) – list of Concrete objects, each of which has a .metadata field.
  • tool (str) – A tool name to match.
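The -1 cases listed above can be captured in a few lines. This sketch assumes that "first valid index" means 0 for a non-empty list and that the first match wins; get_index_of_tool_sketch and the namedtuple stand-ins are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-ins: any object with .metadata.tool works the same way.
Metadata = namedtuple('Metadata', ['tool'])
Tokenization = namedtuple('Tokenization', ['metadata'])


def get_index_of_tool_sketch(lst_of_conc, tool):
    """Return index of the first object whose metadata.tool equals tool."""
    if not lst_of_conc:        # None or empty list
        return -1
    if tool is None:
        return 0               # assumption: "first valid index" means 0
    for i, obj in enumerate(lst_of_conc):
        if obj.metadata.tool == tool:
            return i
    return -1                  # no object matched


toks = [Tokenization(Metadata('tok-a')), Tokenization(Metadata('tok-b'))]
print(get_index_of_tool_sketch(toks, 'tok-b'))  # 1
```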
concrete.util.metadata.now_timestamp()

Return timestamp representing the current time.

Returns:concrete timestamp representing the current time
concrete.util.metadata.timestamp_to_datetime(timestamp)

Given Concrete timestamp, return corresponding time-zone–unaware datetime object representing date and time in UTC.

Parameters:timestamp (int) – Concrete timestamp (integer representing seconds since the epoch in UTC) representing date and time to convert
Returns:datetime representing timestamp

Source: https://stackoverflow.com/questions/3694487/initialize-a-datetime-object-with-seconds-since-epoch
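The inverse conversion can be written as epoch plus an offset. A stdlib-only sketch (timestamp_to_datetime_sketch is a hypothetical name); adding a timedelta avoids datetime.utcfromtimestamp(), which is deprecated as of Python 3.12.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive, interpreted as UTC


def timestamp_to_datetime_sketch(timestamp):
    """Return the naive UTC datetime for integer seconds since the epoch."""
    return EPOCH + timedelta(seconds=timestamp)


print(timestamp_to_datetime_sketch(1483228800))  # 2017-01-01 00:00:00
```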

concrete.util.metadata.tool_to_filter(tool, explicit_filter=None)

Given a tool name (the deprecated way to filter annotations) or None, and an explicit annotation filter function or None, return an annotation filter function representing whichever is not None. Raise ValueError if both are given (neither is None).

Parameters:
  • tool (str) – name of tool to filter by, or None
  • explicit_filter (func) – function taking a list of annotations as input and returning a sub-list (possibly re-ordered) as output, or None
Returns:

Function taking a list of annotations as input and either applying explicit_filter to it and returning the output, or filtering it by tool and returning the filtered list. If both tool and explicit_filter are given (neither is None), raise ValueError.

Raises:

ValueError – if both tool and explicit_filter are given (neither is None)
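A sketch of the documented selection logic. The behavior when both arguments are None is not specified above; returning a no-op filter (as filter_unnone() does for a None filter) is an assumption. tool_to_filter_sketch is a hypothetical name, and SimpleNamespace objects stand in for annotations.

```python
from types import SimpleNamespace


def tool_to_filter_sketch(tool, explicit_filter=None):
    """Return one annotation filter from a tool name or an explicit filter."""
    if tool is not None and explicit_filter is not None:
        raise ValueError('pass tool or explicit_filter, not both')
    if explicit_filter is not None:
        return explicit_filter
    if tool is not None:
        # Deprecated path: keep only annotations produced by the named tool.
        return lambda annotations: [a for a in annotations
                                    if a.metadata.tool == tool]
    # Assumption: with neither argument given, behave as a no-op filter.
    return lambda annotations: list(annotations)


parse = SimpleNamespace(metadata=SimpleNamespace(tool='my-parser'))
tag = SimpleNamespace(metadata=SimpleNamespace(tool='my-tagger'))
only_parses = tool_to_filter_sketch('my-parser')
print(len(only_parses([parse, tag])))  # 1
```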

concrete.util.net module
concrete.util.net.find_port()

Find and return an available TCP port.

Returns:an unused TCP port (an integer)
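A common stdlib technique for finding a free port is to bind to port 0 and let the operating system choose. Whether find_port() works this way is an assumption; find_port_sketch is a hypothetical name.

```python
import socket


def find_port_sketch():
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(('127.0.0.1', 0))   # port 0: the OS picks an unused port
        return sock.getsockname()[1]
    finally:
        sock.close()                  # release the port for the caller


print(find_port_sketch())
```

Because the socket is closed before the port is returned, another process could in principle claim the port before the caller binds it; for tests this race is usually acceptable.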
concrete.util.redis_io module
class concrete.util.redis_io.RedisCommunicationReader(redis_db, key, add_references=True, **kwargs)

Bases: concrete.util.redis_io.RedisReader

Iterable class for reading one or more Communications from redis. See RedisReader for further description.

Example usage:

from redis import Redis
from concrete.util.redis_io import RedisCommunicationReader
redis_db = Redis(port=12345)
for comm in RedisCommunicationReader(redis_db, 'my-set-key'):
    do_something(comm)

Create communication reader for specified key in specified redis_db.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – name of redis key containing your communication(s)
  • add_references (bool) – True to fill in members in the communication according to UUID relationships (see concrete.util.add_references), False to return communication as-is (note: you may need this False if you are dealing with incomplete communications)

All other keyword arguments are passed through to RedisReader; see RedisReader for a description of those arguments.

Raises:Exception – if deserialize_func is specified (it is set to the appropriate concrete deserializer internally)
class concrete.util.redis_io.RedisCommunicationWriter(redis_db, key, uuid_hash_key=False, **kwargs)

Bases: concrete.util.redis_io.RedisWriter

Class for writing one or more Communications to redis. See RedisWriter for further description.

Example usage:

from redis import Redis
from concrete.util.redis_io import RedisCommunicationWriter
redis_db = Redis(port=12345)
w = RedisCommunicationWriter(redis_db, 'my-set-key')
w.write(comm)

Create communication writer for specified key in specified redis_db.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – name of redis key containing your communication(s)
  • uuid_hash_key (bool) – True to use the UUID as the hash key for a communication, False to use the id

All other keyword arguments are passed through to RedisWriter; see RedisWriter for a description of those arguments.

Raises:Exception – if serialize_func is specified (it is set to the appropriate concrete serializer internally), or if hash_key_func is specified (it is set to an appropriate function internally)
class concrete.util.redis_io.RedisReader(redis_db, key, key_type=None, pop=False, block=False, right_to_left=True, block_timeout=0, temp_key_ttl=3600, temp_key_leaf_len=32, cycle_list=False, deserialize_func=None)

Bases: object

Iterable class for reading one or more objects from redis.

Supported input types are:

  • a set containing zero or more objects
  • a list containing zero or more objects
  • a hash containing zero or more key-object pairs

For list and set types, the reader can optionally pop (consume) its input; for lists only, the reader can moreover block on the input.

Note that iteration over a set or hash will create a temporary key in the redis database to maintain a set of elements scanned so far.

If pop is False and the key (in the database) is modified during iteration, behavior is undefined. If pop is True, modifications during iteration are encouraged.

Example usage:

from redis import Redis
from concrete.util.redis_io import RedisReader
redis_db = Redis(port=12345)
for obj in RedisReader(redis_db, 'my-set-key'):
    do_something(obj)

Create reader for specified key in specified redis_db.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – name of redis key containing your object(s)
  • key_type (str) – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists, so probably not suitable for block and/or pop modes)
  • pop (bool) – True to remove objects from redis as we iterate over them, and False to leave redis unaltered
  • block (bool) – True to block for data (i.e., wait for something to be added to the list if it is empty), False to end iteration when there is no more data
  • right_to_left (bool) – True to iterate over and index in lists from right to left, False to iterate/index from left to right
  • deserialize_func (func) – maps blobs from redis to some more friendly representation (e.g., if all your items are unicode strings, you might want to specify lambda s: s.decode('utf-8')); return blobs unchanged if deserialize_func is None
  • block_timeout (int) – number of seconds to block during operations if block is True; if 0, block forever
  • temp_key_ttl (int) – time-to-live (in seconds) of temporary keys created during scans (amount of time to process a batch of items returned by a scan should be much less than the time-to-live of the temporary key, or duplicate items will be returned)
  • temp_key_leaf_len (int) – length (in bytes) of random part of temporary key (longer is less likely to cause conflicts with other processes but slower)
  • cycle_list (bool) – iterate over the list by popping items from the right end and pushing them onto the left end (atomically); note that iteration thus modifies the list, although a full iteration ultimately leaves it in the same state as when it began
Raises:
  • Exception – if key_type is None but the key does not exist in the database (so its type cannot be guessed)
  • ValueError – if key type is not recognized or the options that were specified are not supported for a recognized key type
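The cycle_list traversal can be pictured with an in-memory deque. In redis, popping from the right and pushing onto the left of the same list is a single atomic command (RPOPLPUSH with identical source and destination keys); that the reader uses exactly this command is an assumption. cycle_iterate is a hypothetical name.

```python
from collections import deque


def cycle_iterate(items):
    """Yield each element once, rotating the deque like RPOPLPUSH key key.

    `items` stands in for a redis list (left end = index 0).
    """
    for _ in range(len(items)):  # length fixed up front: one full cycle
        item = items.pop()       # take from the right end...
        items.appendleft(item)   # ...and push onto the left end
        yield item


redis_list = deque(['a', 'b', 'c'])
seen = list(cycle_iterate(redis_list))
print(seen)        # ['c', 'b', 'a']
print(redis_list)  # deque(['a', 'b', 'c'])
```

Elements come out right to left, and after one full pass the list is back in its original order.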
batch(n)

Return a batch of n objects. May be faster than one-at-a-time iteration, but currently only supported for non-popping, non-blocking set configurations. Support for popping, non-blocking sets is planned; see http://redis.io/commands/spop .

Parameters:n (int) – number of objects to return
Raises:Exception – if key type is not a set, or if it is a set but popping or blocking operation is specified
class concrete.util.redis_io.RedisWriter(redis_db, key, key_type=None, right_to_left=True, serialize_func=None, hash_key_func=None)

Bases: object

Class for writing one or more objects to redis.

Supported input types are:

  • a set of objects
  • a list of objects
  • a hash of key-object pairs

Example usage:

from redis import Redis
from concrete.util.redis_io import RedisWriter
redis_db = Redis(port=12345)
w = RedisWriter(redis_db, 'my-set-key')
w.write(obj)

Create object writer for specified key in specified redis_db.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – name of redis key containing your object(s)
  • key_type (str) – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists)
  • right_to_left (bool) – True to write elements to the left end of lists, False to write to the right end
  • serialize_func (func) – maps objects to blobs before sending to Redis (e.g., if everything you write will be a unicode string, you might want to use lambda u: u.encode('utf-8')); pass objects to Redis unchanged if serialize_func is None
  • hash_key_func (func) – maps objects to keys when key_type is hash (None: use Python’s hash function)
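The serialize_func, deserialize_func, and hash_key_func hooks are plain callables. The sketch below pairs them for unicode strings and uses a dict as a stand-in for a redis hash, so no redis server is involved; write_to_hash and the lowercase naming rule are hypothetical.

```python
# Plain dict standing in for a redis hash (hypothetical; no redis involved).
store = {}


def serialize_func(u):
    return u.encode('utf-8')       # text -> bytes before writing


def deserialize_func(s):
    return s.decode('utf-8')       # bytes -> text after reading


def hash_key_func(u):
    return u.lower()               # hypothetical rule for naming hash fields


def write_to_hash(obj):
    """Mimic a hash write: field name from hash_key_func, value serialized."""
    store[hash_key_func(obj)] = serialize_func(obj)


for word in [u'Caf\u00e9', u'Na\u00efve']:
    write_to_hash(word)
print(store[u'caf\u00e9'])  # b'Caf\xc3\xa9'
```

Keeping the serializer and deserializer as an explicit pair makes it easy to confirm that a round trip returns the original object.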
clear()

Remove all data from redis data structure.

write(obj)

Write object obj to redis data structure.

Parameters:
  • obj (object) – object to be serialized by
  • and written to database, according (self.serialize_func) –
  • key type (to) –
Raises:

Exception – if called on redis key type that is not a list, set, or hash

concrete.util.redis_io.read_communication_from_redis_key(redis_db, key, add_references=True)

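Read the serialized communication stored in the simple (string) redis key named by key, deserialize it, and return it, adding references if requested. Return None if the key does not exist.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – name of simple (string) redis key to read communication from
  • add_references (bool) – True to fill in members in the communication according to UUID relationships (see concrete.util.add_references), False to return communication as-is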
concrete.util.redis_io.write_communication_to_redis_key(redis_db, key, comm)

Serialize communication and store result in redis key.

Parameters:
  • redis_db (redis.Redis) – Redis database connection object
  • key (str) – name of simple (string) redis key to write communication to
  • comm (Communication) – communication to serialize
concrete.util.references module

Add reference variables for each UUID “pointer” in a Communication

concrete.util.references.add_references_to_communication(comm)

Create references for each UUID ‘pointer’

Parameters:comm (Communication) – A Concrete Communication object; it is modified in place by this function

The Concrete schema uses UUID objects as internal pointers between Concrete objects. This function adds member variables to Concrete objects that are references to the Concrete objects identified by the UUID.

For example, each Entity has a mentionIdList that lists the UUIDs of the EntityMention objects for that Entity. This function adds a mentionList variable to the Entity that is a list of references to the actual EntityMention objects. This allows you to access the EntityMention objects using:

entity.mentionList

This function adds these reference variables:

And adds these lists of reference variables:

For variables that represent optional lists of UUID objects (e.g., situation.mentionIdList), Python Thrift sets the variable to None if the list is not provided. When this function adds a list-of-references variable (in this case, situation.mentionList) for an omitted optional list, it sets the new variable to None; it DOES NOT leave the variable undefined.
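The core idea (index objects by UUID, then replace each ID list with a list of object references, using None for omitted optional lists) can be sketched in a few lines. This is a simplified illustration, not the library code: UUIDs are plain strings rather than Concrete UUID objects, only the mentionList case is handled, and add_mention_references is a hypothetical name.

```python
from types import SimpleNamespace


def add_mention_references(entities, mentions):
    """Give each entity a mentionList resolved from its mentionIdList."""
    by_uuid = {m.uuid: m for m in mentions}    # index mentions by UUID
    for entity in entities:
        if entity.mentionIdList is None:
            # Omitted optional list: define the reference list as None too.
            entity.mentionList = None
        else:
            entity.mentionList = [by_uuid[u] for u in entity.mentionIdList]


alice = SimpleNamespace(uuid='uuid-1', text='Alice')
she = SimpleNamespace(uuid='uuid-2', text='she')
entity = SimpleNamespace(mentionIdList=['uuid-1', 'uuid-2'])
add_mention_references([entity], [alice, she])
print([m.text for m in entity.mentionList])  # ['Alice', 'she']
```

Building the dictionary once keeps resolution linear in the number of mentions instead of scanning all mentions for every ID.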

concrete.util.results_wrapper module
class concrete.util.results_wrapper.ResultsServerClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.services.results.ResultsServerService'>
class concrete.util.results_wrapper.ResultsServerServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.services.results.ResultsServerService'>
class concrete.util.results_wrapper.SubprocessResultsServerServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of ResultsServerServiceWrapper

concrete.util.search_wrapper module
class concrete.util.search_wrapper.FeedbackClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.search.FeedbackService'>
class concrete.util.search_wrapper.FeedbackServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.search.FeedbackService'>
class concrete.util.search_wrapper.SearchClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.search.SearchService'>
class concrete.util.search_wrapper.SearchProxyClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.search.SearchProxyService'>
class concrete.util.search_wrapper.SearchProxyServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.search.SearchProxyService'>
class concrete.util.search_wrapper.SearchServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.search.SearchService'>
class concrete.util.search_wrapper.SubprocessFeedbackServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of FeedbackServiceWrapper

class concrete.util.search_wrapper.SubprocessSearchProxyServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of SearchProxyServiceWrapper

class concrete.util.search_wrapper.SubprocessSearchServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of SearchServiceWrapper

concrete.util.service_wrapper module
class concrete.util.service_wrapper.ConcreteServiceClientWrapper(host, port)

Bases: object

Base class for a wrapper around a Concrete service client. Implements the context manager interface so the client can be controlled using the with statement (the client connection is closed when the with scope is exited).

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
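The context-manager pattern described above (open on entry, close on exit, even when the block raises) looks roughly like this. Both classes are hypothetical stand-ins with no network I/O; the real wrapper presumably opens a Thrift transport and hands back a service client rather than itself.

```python
class FakeTransport(object):
    """Hypothetical stand-in for a Thrift transport (no real network I/O)."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


class ClientWrapperSketch(object):
    """Open the connection when the with block starts, close it when it ends."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.transport = FakeTransport()

    def __enter__(self):
        self.transport.open()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.transport.close()   # runs even if the with block raised
        return False             # never suppress the exception


with ClientWrapperSketch('localhost', 9090) as client:
    print(client.transport.is_open)  # True
```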
class concrete.util.service_wrapper.ConcreteServiceWrapper(implementation)

Bases: object

Base class for a wrapper around a Concrete service that runs in (blocks) the current process.

Parameters:implementation (object) – handler of specified concrete service
serve(host, port)

Serve on specified host and port in the current process, blocking until the server is killed. (If the server is not killed, by a signal or otherwise, it blocks forever.)

Parameters:
  • host (str) – hostname to serve on
  • port (int) – port number to serve on
class concrete.util.service_wrapper.SubprocessConcreteServiceWrapper(implementation, host, port, timeout=None)

Bases: object

Base class for a wrapper around a Concrete service that runs in a subprocess. Implements the context manager interface so the subprocess can be controlled using the with statement (the subprocess is stopped and joined when the with scope is exited).

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
SLEEP_INTERVAL = 0.1
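SLEEP_INTERVAL presumably controls how often the parent process checks whether the server in the subprocess has started, up to timeout seconds. A stdlib sketch of such a readiness loop, assuming "started" means "accepts TCP connections"; wait_for_server is a hypothetical name, not part of concrete-python.

```python
import socket
import time

SLEEP_INTERVAL = 0.1  # seconds between readiness checks


def wait_for_server(host, port, timeout=None):
    """Poll until a TCP connection to (host, port) succeeds.

    Return True once the server accepts connections, or False if timeout
    seconds elapse first (timeout=None waits forever).
    """
    start = time.time()
    while True:
        try:
            conn = socket.create_connection((host, port),
                                            timeout=SLEEP_INTERVAL)
        except OSError:
            # Not listening yet (or refused): give up or sleep and retry.
            if timeout is not None and time.time() - start >= timeout:
                return False
            time.sleep(SLEEP_INTERVAL)
        else:
            conn.close()
            return True
```

A short interval keeps start-up latency low at the cost of a few extra connection attempts.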
concrete.util.simple_comm module

Create a simple (valid) Communication suitable for testing purposes

class concrete.util.simple_comm.SimpleCommTempFile(n=10, id_fmt=u'temp-%d', sentence_fmt=u'Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix=u'.concrete')

Bases: object

DEPRECATED. Please use create_comm() instead.

Class representing a temporary file of sample concrete objects. Designed to facilitate testing.

path

str – path to file

communications

Communication[] – List of communications that were written to file

Usage:

from concrete.util import CommunicationReader
from concrete.util.simple_comm import SimpleCommTempFile
with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
    reader = CommunicationReader(f.path)
    for (orig_comm, comm_path_pair) in zip(f.communications, reader):
        print(orig_comm.id)
        print(orig_comm.id == comm_path_pair[0].id)
        print(f.path == comm_path_pair[1])

Create temp file and write communications.

Parameters:
  • n – number of communications to write
  • id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
  • sentence_fmt – format string used to generate the text of each communication; should contain one instance of %d, which will be replaced by the number of the communication
  • writer_class – CommunicationWriter or CommunicationWriterTGZ
  • suffix – file path suffix (you probably want to choose this to match writer_class)
concrete.util.simple_comm.add_annotation_level_argparse_argument(parser)

Add an '--annotation-level' argument to an ArgumentParser.

The '--annotation-level' argument specifies the level of concrete annotation to infer from whitespace in text. See create_comm() for details.

Parameters:parser (argparse.ArgumentParser) – the parser to add the argument to
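A minimal argparse sketch of such an argument. The choice strings are assumptions standing in for the AL_NONE, AL_SECTION, AL_SENTENCE, and AL_TOKEN constants, whose exact values are not given here; the default mirrors create_comm()'s annotation_level='token'.

```python
import argparse

# Assumed string values for AL_NONE, AL_SECTION, AL_SENTENCE, AL_TOKEN.
ANNOTATION_LEVELS = ['none', 'section', 'sentence', 'token']

parser = argparse.ArgumentParser()
parser.add_argument('--annotation-level', choices=ANNOTATION_LEVELS,
                    default='token',
                    help='level of annotation to infer from whitespace')
args = parser.parse_args(['--annotation-level', 'sentence'])
print(args.annotation_level)  # sentence
```

argparse converts the dash in --annotation-level to an underscore, so the parsed value is read back as args.annotation_level.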
concrete.util.simple_comm.create_comm(comm_id, text=u'', comm_type=u'article', section_kind=u'passage', metadata_tool=u'concrete-python', metadata_timestamp=None, annotation_level=u'token')

Create a simple, valid Communication from text.

By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections. Each section will be created with a call to create_section().

annotation_level controls the amount of annotation that is added:

  • AL_NONE: add no optional annotations (not even sections)
  • AL_SECTION: add sections but not sentences
  • AL_SENTENCE: add sentences but not tokens
  • AL_TOKEN: add all annotations, up to tokens (the default)
Parameters:
  • comm_id (str) – Communication id
  • text (str) – Communication text
  • comm_type (str) – Communication type
  • section_kind (str) – Section kind to set on all sections
  • metadata_tool (str) – tool name of analytic that generated this text
  • metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
  • annotation_level (str) – string representing annotation level to add to communication (see above)
Returns:

Communication containing given text and metadata
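The default splitting (double newlines into sections, single newlines into sentences) can be sketched as offset arithmetic over the original text. This is an illustration under assumptions: delimiters are taken literally, so runs of blank lines or trailing whitespace may be handled differently by the real create_comm(), and offsets here follow the Python slice convention text[start:end]. split_text is a hypothetical name.

```python
def split_text(text):
    """Return [((sec_start, sec_end), [(sen_start, sen_end), ...]), ...].

    Sections are separated by blank lines ('\n\n'), sentences by single
    newlines. Offsets follow Python slice convention: text[start:end].
    """
    sections = []
    sec_start = 0
    for sec_text in text.split('\n\n'):
        sec_end = sec_start + len(sec_text)
        sentences = []
        sen_start = sec_start
        for sen_text in sec_text.split('\n'):
            sentences.append((sen_start, sen_start + len(sen_text)))
            sen_start += len(sen_text) + 1   # skip the '\n' separator
        sections.append(((sec_start, sec_end), sentences))
        sec_start = sec_end + 2              # skip the '\n\n' separator
    return sections


text = u'Hello world .\nSecond line .\n\nNew section .'
for (s0, s1), sentences in split_text(text):
    print(repr(text[s0:s1]), [text[a:b] for (a, b) in sentences])
```

Computing offsets while splitting keeps every span anchored to the original Communication text, which is what text spans in Concrete refer to.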

concrete.util.simple_comm.create_section(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Section from provided text and metadata. Section text will be split into sentence texts by newlines and each sentence will be created with a call to create_sentence().

Lower-level routine (called by create_comm()).

Parameters:
  • sec_text (str) – text to create section from
  • sec_start (int) – starting position of section in Communication text (inclusive)
  • sec_end (int) – ending position of section in Communication text (inclusive)
  • section_kind (str) – value for Section.kind field to be set to
  • aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this section
  • metadata_tool (str) – tool name of the analytic that generated this section
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Returns:

Concrete Section containing given text and metadata

concrete.util.simple_comm.create_sentence(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Sentence from provided text and metadata.

Lower-level routine (called indirectly by create_comm()).

Parameters:
  • sen_text (str) – text to create sentence from
  • sen_start (int) – starting position of sentence in Communication text (inclusive)
  • sen_end (int) – ending position of sentence in Communication text (inclusive)
  • aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this sentence
  • metadata_tool (str) – tool name of the analytic that generated this sentence
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Returns:

Concrete Sentence containing given text and metadata

concrete.util.simple_comm.create_simple_comm(comm_id, sentence_string=u'Super simple sentence .')

Create a simple (valid) Communication suitable for testing purposes

The Communication will have a single Section containing a single Sentence.

Parameters:
  • comm_id (str) – Communication id
  • sentence_string (str) – Communication text
Returns:

Communication containing given text and having the given id

concrete.util.summarization_wrapper module
class concrete.util.summarization_wrapper.SubprocessSummarizationServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

Parameters:
  • implementation (object) – handler of specified concrete service
  • host (str) – hostname that will be served on when context is entered
  • port (int) – port number that will be served on when context is entered
  • timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
concrete_service_wrapper_class

alias of SummarizationServiceWrapper

class concrete.util.summarization_wrapper.SummarizationClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

Parameters:
  • host (str) – hostname to connect to
  • port (int) – port number to connect to
concrete_service_class = <module 'concrete.summarization.SummarizationService'>
class concrete.util.summarization_wrapper.SummarizationServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

Parameters:implementation (object) – handler of specified concrete service
concrete_service_class = <module 'concrete.summarization.SummarizationService'>
concrete.util.thrift_factory module
class concrete.util.thrift_factory.ThriftFactory(transportFactory, protocolFactory)

Bases: object

Abstract factory to create Thrift objects for client and server.

createProtocol(transport)

Return new thrift protocol on transport.

Parameters:transport (TTransport.TTransport) – transport to create protocol on
Returns:TProtocol.TProtocolBase
createServer(processor, host, port)

Return new thrift server given a service handler and the server host and port.

Parameters:
  • processor – concrete service handler
  • host (str) – hostname to serve on
  • port (int) – port number to serve on
Returns:

TServer.TThreadedServer

createSocket(host, port)

Return new thrift socket.

Parameters:
  • host (str) – hostname to create socket on
  • port (int) – port number to create socket on
Returns:

TSocket.TSocket

createTransport(socket)

Return new thrift transport on socket.

Parameters:socket (TSocket.TSocket) – socket to create transport on
Returns:TTransport.TTransportBase
concrete.util.thrift_factory.is_accelerated()

Return whether this concrete-python installation has accelerated serialization.

Returns:True if this concrete-python installation is accelerated, False otherwise
concrete.util.tokenization module
exception concrete.util.tokenization.NoSuchTokenTagging(*args, **kwargs)

Bases: exceptions.Exception

Exception indicating that no TokenTagging annotation in a given Concrete object matches the given criteria

concrete.util.tokenization.compute_lattice_expected_counts(lattice)

Given a TokenLattice in which the dst, src, token, and weight fields are set in each arc, compute and return a list of expected token log-probabilities.

Input arc weights are treated as unnormalized log-probabilities.

Parameters:lattice (TokenLattice) – lattice to compute expected counts for
Returns:List of floats (expected log-probabilities) with the float at position i corresponding to the token with tokenIndex i.
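The computation described above amounts to a forward-backward pass in log space. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the `Arc` namedtuple, the `expected_token_log_probs` function, and its explicit `start`/`end` arguments (standing in for the lattice's start and end states) are hypothetical, and the lattice is assumed to be acyclic.

```python
import math
from collections import defaultdict, deque, namedtuple

NEG_INF = float("-inf")

# Hypothetical stand-in for a lattice arc: src/dst are state ids, token_index is
# the tokenIndex of the arc's token, weight is an unnormalized log-probability.
Arc = namedtuple("Arc", ["src", "dst", "token_index", "weight"])


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over an iterable of log values."""
    values = list(values)
    if not values:
        return NEG_INF
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def topological_order(arcs):
    """Order states so every arc goes from an earlier state to a later one."""
    outgoing = defaultdict(list)
    indegree = defaultdict(int)
    states = set()
    for arc in arcs:
        outgoing[arc.src].append(arc)
        indegree[arc.dst] += 1
        states.update((arc.src, arc.dst))
    queue = deque(s for s in states if indegree[s] == 0)
    order = []
    while queue:
        state = queue.popleft()
        order.append(state)
        for arc in outgoing[state]:
            indegree[arc.dst] -= 1
            if indegree[arc.dst] == 0:
                queue.append(arc.dst)
    return order


def expected_token_log_probs(arcs, start, end):
    """Posterior log-probability of each token index (assumes arcs is non-empty)."""
    order = topological_order(arcs)
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for arc in arcs:
        incoming[arc.dst].append(arc)
        outgoing[arc.src].append(arc)

    # Forward scores: log total weight of all paths from start to each state.
    alpha = defaultdict(lambda: NEG_INF)
    alpha[start] = 0.0
    for state in order:
        if state != start:
            alpha[state] = logsumexp(alpha[a.src] + a.weight for a in incoming[state])

    # Backward scores: log total weight of all paths from each state to end.
    beta = defaultdict(lambda: NEG_INF)
    beta[end] = 0.0
    for state in reversed(order):
        if state != end:
            beta[state] = logsumexp(a.weight + beta[a.dst] for a in outgoing[state])

    log_z = alpha[end]  # log normalizer over all start-to-end paths
    per_token = defaultdict(list)
    for a in arcs:
        per_token[a.token_index].append(alpha[a.src] + a.weight + beta[a.dst] - log_z)
    return [logsumexp(per_token[i]) for i in range(max(per_token) + 1)]
```

For arcs `Arc(0, 1, 0, math.log(1.0))`, `Arc(0, 1, 1, math.log(3.0))` and `Arc(1, 2, 2, 0.0)`, tokens 0 and 1 receive probabilities 0.25 and 0.75, and token 2, which lies on every path, receives probability 1.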
concrete.util.tokenization.flatten(a)

Return flattened version of input list.

Parameters:a (list) –
Returns:Flattened list
Return type:list
concrete.util.tokenization.get_comm_tokenizations(comm, tool=None)

Get list of Tokenization objects in a Communication

Parameters:
  • comm (Communication) – communication to extract tokenizations from
  • tool (str) – If not None, only return Tokenization objects whose metadata.tool field is equal to tool
Returns:

List of Tokenization objects

concrete.util.tokenization.get_comm_tokens(comm, sect_pred=None, suppress_warnings=False)

Get list of Token objects in Communication.

Parameters:
  • comm (Communication) – communication to extract tokens from
  • sect_pred (function) – Function that takes a Section and returns False if the Section should be excluded.
  • suppress_warnings (bool) – True to suppress warning messages that Tokenization.kind is None
Returns:

List of Token objects in Communication, delegating to get_tokens() for each sentence.

concrete.util.tokenization.get_lemmas(t, tool=None)

Returns the result of get_tagged_tokens() with a tagging_type of “LEMMA”

Parameters:
  • t (Tokenization) – tokenization to extract tagged tokens from
  • tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns:

list of ‘LEMMA’-tagged tokens matching tool (if specified)

concrete.util.tokenization.get_ner(t, tool=None)

Returns the result of get_tagged_tokens() with a tagging_type of “NER”

Parameters:
  • t (Tokenization) – tokenization to extract tagged tokens from
  • tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns:

list of ‘NER’-tagged tokens matching tool (if specified)

concrete.util.tokenization.get_pos(t, tool=None)

Returns the result of get_tagged_tokens() with a tagging_type of “POS”

Parameters:
  • t (Tokenization) – tokenization to extract tagged tokens from
  • tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns:

list of ‘POS’-tagged tokens matching tool (if specified)

concrete.util.tokenization.get_tagged_tokens(tokenization, tagging_type, tool=None)

Return list of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.

Parameters:
  • tokenization (Tokenization) – tokenization to return tagged tokens for
  • tagging_type (str) – only return tagged tokens for TokenTagging objects whose taggingType field is equal to tagging_type
  • tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns:

List of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.

Raises:
  • NoSuchTokenTagging – if there is no matching tagging
  • Exception – if there is more than one matching tagging.
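The selection rule above (filter by type and tool, then insist on exactly one match) can be sketched in plain Python. This is a hypothetical illustration rather than the library code: `TokenTagging` and `Metadata` are simplified namedtuple stand-ins, `select_tagged_tokens` takes a list of taggings instead of a Tokenization, and case-insensitive type matching is assumed from the default of get_token_taggings().

```python
from collections import namedtuple

# Simplified, hypothetical stand-ins for the Concrete structs involved.
Metadata = namedtuple("Metadata", ["tool"])
TokenTagging = namedtuple("TokenTagging", ["taggingType", "metadata", "taggedTokenList"])


class NoSuchTokenTagging(Exception):
    """Stand-in for the library exception: no TokenTagging matched."""


def select_tagged_tokens(taggings, tagging_type, tool=None):
    """Return the taggedTokenList of the single tagging matching type and tool."""
    # Case-insensitive match on taggingType (assumed default behavior).
    matches = [t for t in taggings
               if t.taggingType is not None
               and t.taggingType.lower() == tagging_type.lower()]
    if tool is not None:
        matches = [t for t in matches if t.metadata.tool == tool]
    if not matches:
        raise NoSuchTokenTagging("no %s tagging for tool %r" % (tagging_type, tool))
    if len(matches) > 1:
        raise Exception("%d %s taggings match; pass tool to disambiguate"
                        % (len(matches), tagging_type))
    return matches[0].taggedTokenList
```

With one POS tagging from each of two taggers, a call without tool raises the generic Exception while passing tool selects one of them, which is where the tool argument of get_lemmas(), get_ner() and get_pos() becomes useful.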
concrete.util.tokenization.get_token_taggings(tokenization, tagging_type, case_sensitive=False)

Return list of TokenTagging objects of taggingType equal to tagging_type.

Parameters:
  • tokenization (Tokenization) – tokenization from which taggings will be selected
  • tagging_type (str) – value of taggingType to filter to
  • case_sensitive (bool) – True to do case-sensitive matching on taggingType.
Returns:

List of TokenTagging objects of taggingType equal to tagging_type, in same order as they appeared in the tokenization.

concrete.util.tokenization.get_tokenizations(comm, tool=None)

Returns a flat list of all Tokenization objects in a Communication

Parameters:
  • comm (Communication) – communication to get tokenizations from
  • tool (str) – if not None, return only tokenizations whose metadata.tool field matches tool
Returns:

A list of all Tokenization objects within the Communication matching tool (if it is not None)

concrete.util.tokenization.get_tokens(tokenization, suppress_warnings=False)

Get list of Token objects for a Tokenization

Return list of Tokens from lattice.cachedBestPath, if Tokenization kind is TOKEN_LATTICE; else, return list of Tokens from tokenList.

Warn and return list of Tokens from tokenList if kind is not set.

Return None if kind is set but the respective data fields are not.

Parameters:
  • tokenization (Tokenization) – tokenization to extract tokens from
  • suppress_warnings (bool) – True to suppress warning messages that tokenization.kind is None
Returns:

List of Token objects, or None

Raises:

ValueError – if tokenization.kind is not a recognized tokenization kind
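A minimal sketch of the dispatch rules described above. The `Tokenization` namedtuple (with `token_list` standing in for tokenList.tokenList and `best_path` for lattice.cachedBestPath.tokenList), the string kind constants, and `tokens_of` are hypothetical simplifications; in concrete-python the kinds are TokenizationKind values.

```python
import warnings
from collections import namedtuple

# Hypothetical stand-ins: concrete-python uses TokenizationKind values instead.
TOKEN_LIST = "TOKEN_LIST"
TOKEN_LATTICE = "TOKEN_LATTICE"

# token_list ~ tokenList.tokenList, best_path ~ lattice.cachedBestPath.tokenList
Tokenization = namedtuple("Tokenization", ["kind", "token_list", "best_path"])


def tokens_of(tokenization, suppress_warnings=False):
    """Pick the token list that matches tokenization.kind."""
    if tokenization.kind is None:
        if not suppress_warnings:
            warnings.warn("tokenization.kind is None; falling back to tokenList")
        return tokenization.token_list
    if tokenization.kind == TOKEN_LIST:
        return tokenization.token_list  # None if the field is unset
    if tokenization.kind == TOKEN_LATTICE:
        return tokenization.best_path  # None if the field is unset
    raise ValueError("unrecognized tokenization kind: %r" % (tokenization.kind,))
```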

concrete.util.tokenization.plus(x, y)

Return concatenation of two lists.

Parameters:
  • x (list) –
  • y (list) –
Returns:

list concatenation of x and y
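flatten() and plus() are small list helpers. The sketch below assumes that flatten() removes exactly one level of nesting by folding plus() over the input; treat that as an assumption about the behavior rather than a quote of the library source.

```python
from functools import reduce


def plus(x, y):
    """Concatenate two lists."""
    return x + y


def flatten(a):
    """Remove one level of nesting, e.g. [[1, 2], [3]] -> [1, 2, 3]."""
    return reduce(plus, a, [])
```

Under this reading, deeper nesting is preserved: flatten([[[1]], [2]]) yields [[1], 2].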

concrete.util.twitter module

Convert between JSON and Concrete representations of Tweets

The JSON fields used by the Twitter API are documented at:

concrete.util.twitter.capture_tweet_lid(tweet)

Read the lang field from a tweet from the Twitter API, if it exists, and return the corresponding Concrete LanguageIdentification object.

Parameters:tweet (dict) – Object created by deserializing a JSON Tweet string
Returns:LanguageIdentification object, or None if the lang field is not present in the Tweet JSON
concrete.util.twitter.json_tweet_object_to_Communication(tweet)

Convert deserialized JSON Tweet object to Communication

Parameters:tweet (object) – Object created by deserializing a JSON Tweet string
Returns:Communication representing the Tweet, with tweetInfo and text fields set (among others) but with a null (None) sectionList.
Return type:Communication
concrete.util.twitter.json_tweet_object_to_TweetInfo(tweet)

Create TweetInfo object from deserialized JSON Tweet object

Parameters:tweet (dict) – Object created by deserializing a JSON Tweet string
Returns:concrete object representing twitter metadata from tweet
Return type:TweetInfo
concrete.util.twitter.json_tweet_string_to_Communication(json_tweet_string, check_empty=False, check_delete=False)

Convert JSON Tweet string to Communication

Parameters:
  • json_tweet_string (str) – JSON Tweet string from Twitter API
  • check_empty (bool) – If True, check if json_tweet_string is empty (return None if it is)
  • check_delete (bool) – If True, check for presence of delete field in Tweet JSON, and if the ‘delete’ field is present, return None
Returns:

Communication representing the Tweet, with tweetInfo and text fields set (among others) but with a null (None) sectionList.

Return type:

Communication

concrete.util.twitter.json_tweet_string_to_TweetInfo(json_tweet_string)

Create TweetInfo object from JSON Tweet string

Parameters:json_tweet_string (str) – JSON Tweet string from Twitter API
Returns:concrete twitter metadata object with fields set from json_tweet_string
Return type:TweetInfo
concrete.util.twitter.snake_case_to_camelcase(value)

Converts snake case to camel case

Implementation copied from this Stack Overflow post: http://goo.gl/SSgo9k

Parameters:value (str) – snake case (lower case with underscores) value
Returns:camel case string corresponding to value (with isolated underscores stripped and sequences of two or more underscores reduced by one underscore)
Return type:str
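The behavior described above follows from splitting on underscores and re-joining. This Python 3 sketch follows the widely circulated Stack Overflow recipe that the docstring refers to; it is an illustration and not necessarily the exact library code. Note that str.capitalize() also lowercases the rest of each word, so "user_ID" becomes "userId".

```python
def snake_case_to_camelcase(value):
    """Convert snake_case to camelCase (sketch of the Stack Overflow recipe)."""
    def camelcase():
        yield str.lower  # first word is lowercased
        while True:
            yield str.capitalize  # every later word is capitalized

    c = camelcase()
    # An empty piece comes from a doubled underscore; it is kept as a single "_"
    # and does not advance the generator.
    return "".join(next(c)(x) if x else "_" for x in value.split("_"))
```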
concrete.util.twitter.twitter_lid_to_iso639_3(twitter_lid)

Convert Twitter Language ID string to ISO639-3 code

Ref: https://dev.twitter.com/rest/reference/get/help/languages

Parameters:twitter_lid (str) – This can be an ISO 639-3 code (no-op), an ISO 639-1 two-letter abbreviation (converted to three letters), or a combination (split by ‘-’, then the first part converted)
Returns:the ISO639-3 code corresponding to twitter_lid
Return type:str
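A sketch of the three input cases listed above. The `ISO639_1_TO_3` table is a hypothetical five-entry excerpt (a real conversion needs the full ISO 639 tables), `twitter_lid_to_iso639_3_sketch` is a hypothetical name, and raising KeyError for codes missing from the table is an assumption.

```python
# Hypothetical excerpt of an ISO 639-1 -> ISO 639-3 mapping.
ISO639_1_TO_3 = {"de": "deu", "en": "eng", "es": "spa", "fr": "fra", "ja": "jpn"}


def twitter_lid_to_iso639_3_sketch(twitter_lid):
    """Map a Twitter language id to an ISO 639-3 code."""
    code = twitter_lid.split("-")[0]  # combination such as "en-gb" -> "en"
    if len(code) == 3:
        return code  # already ISO 639-3: no-op
    return ISO639_1_TO_3[code]  # raises KeyError for codes not in the table
```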
concrete.util.unnone module
concrete.util.unnone.dun(d)

If d is None return an empty dict, else return d. Simplifies iteration over dict fields that might be unset.

Parameters:d (dict) – input dict (or None)
Returns:d, or an empty dict if d is None
concrete.util.unnone.lun(l)

If l is None return an empty list, else return l. Simplifies iteration over list fields that might be unset.

Parameters:l (list) – input list (or None)
Returns:l, or an empty list if l is None
concrete.util.unnone.sun(s)

If s is None return an empty set, else return s. Simplifies iteration over set fields that might be unset.

Parameters:s (set) – input set (or None)
Returns:s, or an empty set if s is None
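These helpers exist because optional Thrift list, set, and map fields are None when unset. A minimal sketch of the pattern, assuming the helpers behave exactly as documented; `section_list` is a hypothetical stand-in for a field such as Communication.sectionList.

```python
def lun(l):
    """Return l, or an empty list if l is None."""
    return [] if l is None else l


def sun(s):
    """Return s, or an empty set if s is None."""
    return set() if s is None else s


def dun(d):
    """Return d, or an empty dict if d is None."""
    return {} if d is None else d


section_list = None  # e.g. a Communication whose sectionList was never set
count = 0
for section in lun(section_list):  # iterates zero times instead of raising TypeError
    count += 1
```

Testing `is None` rather than writing `l or []` means a field that is set to an empty collection is returned as the same object, so callers that append to it still modify the original field.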

concrete.validate module

Library to validate a Concrete Communication

Validation info, error and warning messages are logged using the Python standard library’s logging module.

concrete.validate.validate_communication(comm)

Test if all objects in a Communication are valid.

Calls validate_thrift_deep() to check for Concrete data structure fields that are required by the Concrete Thrift definitions. Then calls:

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_communication_file(communication_filename)

Test if the Communication in a file is valid

Deserializes a Communication file into memory, then calls validate_communication() on the Communication object.

Parameters:communication_filename (str) – Name of file containing a Communication
Returns:bool
concrete.validate.validate_constituency_parses(comm, tokenization)

Test a Tokenization’s constituency Parse objects.

Verifies that, for each constituent Parse:

  • none of the constituent IDs for the parse repeat
  • the parse tree is a fully connected graph
  • the parse “tree” is really a tree data structure
Parameters:
Returns:

bool

concrete.validate.validate_dependency_parses(tokenization)

Test a Tokenization’s DependencyParse objects

Verifies that, for each DependencyParse:

  • the parse is a fully connected graph
  • there are no nodes with a null governor node whose edgeType is not root
Parameters:tokenization (Tokenization) –
Returns:bool
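The two checks listed above can be sketched over a simplified edge list. This is not the library's validation code: each dependency is a hypothetical `(gov, dep, edge_type)` tuple of token indices with `gov` set to None for a root attachment, the literal label "root" is an assumption, and "fully connected" is assumed to mean that the undirected graph over all mentioned tokens has a single component.

```python
from collections import defaultdict


def check_dependency_parse(dependencies):
    """True if the parse is connected and every null-governor edge is 'root'."""
    neighbors = defaultdict(set)
    nodes = set()
    for gov, dep, edge_type in dependencies:
        nodes.add(dep)
        if gov is None:
            # A null governor is only allowed on the root attachment.
            if edge_type != "root":
                return False
            continue
        nodes.add(gov)
        neighbors[gov].add(dep)
        neighbors[dep].add(gov)
    if not nodes:
        return True
    # Depth-first search from any node; connected iff every node is reached.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes
```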
concrete.validate.validate_entity_mention_ids(comm)

Test if all Entity mentionIds are valid

Checks if all Entity mentionId UUIDs refer to an EntityMention UUID that exists in the Communication

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_entity_mention_token_ref_sequences(comm)

Test if all EntityMention objects have valid TokenRefSequence objects

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_entity_mention_tokenization_ids(comm)

Test tokenizationId field of every EntityMention

Verifies that, for each EntityMention, the entityMention.tokens.tokenizationId UUID field matches the UUID of a Tokenization that exists in this Communication

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_situation_mentions(comm)

Test every SituationMention in the Communication

A SituationMention has a list of MentionArgument objects, and each MentionArgument can point to an EntityMention, SituationMention or TokenRefSequence.

Checks that each MentionArgument points to only one type of argument. Also checks validity of all EntityMention and SituationMention UUIDs.

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_situations(comm)

Test every Situation in the Communication

Checks the validity of all EntityMention and SituationMention UUIDs referenced by each Situation.

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_thrift(thrift_object, indent_level=0)

Test if a Thrift object has all required fields.

This function calls the Thrift object’s validate() function. If an exception is raised because of missing required fields, the function catches the exception and logs the exception’s error message using the Python Standard Library’s logging module.

Parameters:
  • thrift_object
  • indent_level (int) – Text indentation level for logging error message
Returns:

bool

concrete.validate.validate_thrift_deep(thrift_object, valid=True)

Deep validation of thrift messages.

Parameters:thrift_object – a Thrift object

The Python version of Thrift 0.9.1 does not support deep (recursive) validation, and none of the Thrift serialization/deserialization code calls even the shallow validation functions provided by Thrift.

This function implements deep validation. The code is adapted from:

See this blog post for more information:

A patch to implement deep validation was submitted to the Thrift repository in February of 2013:

but Thrift 0.9.1 - which was released on 2013-08-21 - does not include this functionality.

concrete.validate.validate_thrift_object_required_fields(thrift_object, indent_level=0)

DEPRECATED: Use validate_thrift() instead

concrete.validate.validate_thrift_object_required_fields_recursively(thrift_object, valid=True)

DEPRECATED. Use validate_thrift_deep() instead.

concrete.validate.validate_token_offsets_for_section(section)

Test if the TextSpan boundaries for all Sentence objects in a Section fall within the boundaries of the Section’s TextSpan

Parameters:section (Section) –
Returns:bool
concrete.validate.validate_token_offsets_for_sentence(sentence)

Test if the TextSpan boundaries for all Token objects in a Sentence fall within the boundaries of the Sentence’s TextSpan.

Parameters:sentence (Sentence) –
Returns:bool
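Both offset checks reduce to an interval-containment test. A sketch, assuming TextSpan's start and ending fields and inclusive comparison at both boundaries (so a child span may share its parent's endpoints); `TextSpan` here is a namedtuple stand-in and `spans_within` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for Concrete's TextSpan(start, ending).
TextSpan = namedtuple("TextSpan", ["start", "ending"])


def spans_within(parent, children):
    """True if every child span lies inside the parent span."""
    return all(parent.start <= c.start and c.ending <= parent.ending
               for c in children)
```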
concrete.validate.validate_token_ref_sequence(comm, token_ref_sequence)

Check if a TokenRefSequence is valid

Verify that all token indices in the TokenRefSequence point to actual token indices in the corresponding Tokenization

Parameters:
Returns:

bool

concrete.validate.validate_token_taggings(tokenization)

Test if a Tokenization has any TokenTagging objects with invalid token indices

Parameters:tokenization (Tokenization) –
Returns:bool

concrete.version module

concrete.version.add_argparse_argument(parser)
concrete.version.concrete_library_version()
concrete.version.concrete_schema_version()

Low-level interface (Concrete schema)

Note that all data types defined by the Concrete schema—except for services—can be imported directly from the top-level concrete package. For example, instead of from concrete.communication.ttypes import Communication you can write from concrete import Communication.

concrete.access package

concrete.access.FetchCommunicationService module
class concrete.access.FetchCommunicationService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.access.FetchCommunicationService.Iface


Service to fetch particular communications.

fetch(request)

Parameters:
- request

getCommunicationCount()

Get the number of Communications this service searches over. Implementations
that do not provide this should throw an exception.

getCommunicationIDs(offset, count)

Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations
that do not provide this should throw an exception.

Parameters:
- offset
- count

recv_fetch()
recv_getCommunicationCount()
recv_getCommunicationIDs()
send_fetch(request)
send_getCommunicationCount()
send_getCommunicationIDs(offset, count)
class concrete.access.FetchCommunicationService.Iface

Bases: concrete.services.Service.Iface


Service to fetch particular communications.

fetch(request)

Parameters:
- request

getCommunicationCount()

Get the number of Communications this service searches over. Implementations
that do not provide this should throw an exception.

getCommunicationIDs(offset, count)

Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations
that do not provide this should throw an exception.

Parameters:
- offset
- count

class concrete.access.FetchCommunicationService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.access.FetchCommunicationService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_fetch(seqid, iprot, oprot)
process_getCommunicationCount(seqid, iprot, oprot)
process_getCommunicationIDs(seqid, iprot, oprot)
class concrete.access.FetchCommunicationService.fetch_args(request=None)

Bases: object


Attributes:
- request

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.fetch_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationCount_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationCount_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationIDs_args(offset=None, count=None)

Bases: object


Attributes:
- offset
- count

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationIDs_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
concrete.access.StoreCommunicationService module
class concrete.access.StoreCommunicationService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.access.StoreCommunicationService.Iface


A service that exists so that clients can store Concrete data
structures to implementing servers.

Implement this if you are creating an analytic that wishes to
store its results back to a server. That server may perform
validation, write the new layers to a database, and so forth.

recv_store()
send_store(communication)
store(communication)

Store a communication to a server implementing this method.

The communication that is stored should contain the new
analytic layers you wish to append. You may also wish to call
methods that unset annotations you feel the receiver would not
find useful in order to reduce network overhead.

Parameters:
- communication

class concrete.access.StoreCommunicationService.Iface

Bases: concrete.services.Service.Iface


A service that exists so that clients can store Concrete data
structures to implementing servers.

Implement this if you are creating an analytic that wishes to
store its results back to a server. That server may perform
validation, write the new layers to a database, and so forth.

store(communication)

Store a communication to a server implementing this method.

The communication that is stored should contain the new
analytic layers you wish to append. You may also wish to call
methods that unset annotations you feel the receiver would not
find useful in order to reduce network overhead.

Parameters:
- communication

class concrete.access.StoreCommunicationService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.access.StoreCommunicationService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_store(seqid, iprot, oprot)
class concrete.access.StoreCommunicationService.store_args(communication=None)

Bases: object


Attributes:
- communication

read(iprot)
validate()
write(oprot)
class concrete.access.StoreCommunicationService.store_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.access.ttypes.FetchRequest(communicationIds=None, auths=None)

Bases: object


Struct representing a request for FetchCommunicationService.

Attributes:
- communicationIds: a list of Communication IDs
- auths: optional authorization mechanism

read(iprot)
validate()
write(oprot)
class concrete.access.ttypes.FetchResult(communications=None)

Bases: object


Struct containing Communications from the FetchCommunicationService service.

Attributes:
- communications: a list of Communication objects that represent the results of the request

read(iprot)
validate()
write(oprot)

concrete.annotate package

concrete.annotate.AnnotateCommunicationService module
class concrete.annotate.AnnotateCommunicationService.Client(iprot, oprot=None)

Bases: concrete.annotate.AnnotateCommunicationService.Iface


Annotator service methods. For concrete analytics that
are to be stood up as independent services, accessible
from any programming language.

annotate(original)

Main annotation method. Takes a communication as input
and returns a new one as output.

It is up to the implementing service to verify that
the input communication is valid.

Can throw a ConcreteThriftException upon error
(invalid input, analytic exception, etc.).

Parameters:
- original

getDocumentation()

Return a detailed description of what the particular tool
does, what inputs and outputs to expect, etc.

Developers who are not familiar with the particular
analytic should be able to read this string and
understand the essential functions of the analytic.

getMetadata()

Return the tool’s AnnotationMetadata.

recv_annotate()
recv_getDocumentation()
recv_getMetadata()
send_annotate(original)
send_getDocumentation()
send_getMetadata()
send_shutdown()
shutdown()

Indicate to the server it should shut down.

class concrete.annotate.AnnotateCommunicationService.Iface

Bases: object


Annotator service methods. For concrete analytics that
are to be stood up as independent services, accessible
from any programming language.

annotate(original)

Main annotation method. Takes a communication as input
and returns a new one as output.

It is up to the implementing service to verify that
the input communication is valid.

Can throw a ConcreteThriftException upon error
(invalid input, analytic exception, etc.).

Parameters:
- original

getDocumentation()

Return a detailed description of what the particular tool
does, what inputs and outputs to expect, etc.

Developers who are not familiar with the particular
analytic should be able to read this string and
understand the essential functions of the analytic.

getMetadata()

Return the tool’s AnnotationMetadata.

shutdown()

Indicate to the server it should shut down.

class concrete.annotate.AnnotateCommunicationService.Processor(handler)

Bases: concrete.annotate.AnnotateCommunicationService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_annotate(seqid, iprot, oprot)
process_getDocumentation(seqid, iprot, oprot)
process_getMetadata(seqid, iprot, oprot)
process_shutdown(seqid, iprot, oprot)
class concrete.annotate.AnnotateCommunicationService.annotate_args(original=None)

Bases: object


Attributes:
- original

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.annotate_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getDocumentation_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getDocumentation_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getMetadata_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getMetadata_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.shutdown_args

Bases: object

read(iprot)
validate()
write(oprot)

concrete.audio package

class concrete.audio.ttypes.Sound(wav=None, mp3=None, sph=None, path=None)

Bases: object


A sound wave. A separate optional field is defined for each
suppported format. Typically, a Sound object will only define
a single field.

Note: we may want to have separate fields for separate channels
(left vs right), etc.

Attributes:
- wav
- mp3
- sph
- path: An absolute path to a file on disk where the sound file can be
found. It is assumed that this path will be accessible from any
machine that the system is run on (i.e., it should be a shared
disk, or possibly a mirrored directory).

read(iprot)
validate()
write(oprot)

concrete.clustering package

class concrete.clustering.ttypes.Cluster(clusterMemberIndexList=None, confidenceList=None, childIndexList=None)

Bases: object


A set of items which are alike in some way. Has an implicit id which is the
index of this Cluster in its parent Clustering’s ‘clusterList’.

Attributes:
- clusterMemberIndexList: The items in this cluster. Values are indices into the
‘clusterMemberList’ of the Clustering which contains this Cluster.
- confidenceList: Co-indexed with ‘clusterMemberIndexList’. The i^{th} value represents the
confidence that mention clusterMemberIndexList[i] belongs to this cluster.
- childIndexList: A set of clusters (implicit ids/indices) from which this cluster was
created. This cluster should represent the union of all the items in all
of the child clusters. (For hierarchical clustering only).

read(iprot)
validate()
write(oprot)
class concrete.clustering.ttypes.ClusterMember(communicationId=None, setId=None, elementId=None)

Bases: object


An item being clustered. Does not designate cluster _membership_, as in
“item x belongs to cluster C”, but rather just the item (“x” in this
example). Membership is indicated through Cluster objects. An item may be an
Entity, EntityMention, Situation, SituationMention, or technically anything
with a UUID.

Attributes:
- communicationId: UUID of the Communication which contains the item specified by ‘elementId’.
This is ancillary info assuming UUIDs are indeed universally unique.
- setId: UUID of the Entity|Situation(Mention)Set which contains the item specified by ‘elementId’.
This is ancillary info assuming UUIDs are indeed universally unique.
- elementId: UUID of the EntityMention, Entity, SituationMention, or Situation that
this item represents. This is the characteristic field.

read(iprot)
validate()
write(oprot)
class concrete.clustering.ttypes.Clustering(uuid=None, metadata=None, clusterMemberList=None, clusterList=None, rootClusterIndexList=None)

Bases: object


An (optionally) hierarchical clustering of items appearing across a set of
Communications (intra-Communication clusterings are encoded by Entities and
Situations). An item may be an Entity, EntityMention, Situation,
SituationMention, or technically anything with a UUID.

Attributes:
- uuid: UUID for this Clustering object.
- metadata: Metadata for this Clustering object.
- clusterMemberList: The set of items being clustered.
- clusterList: Clusters of items. If this is a hierarchical clustering, this may contain
clusters which are the set of smaller clusters.
Clusters may not “overlap”, meaning (for all clusters X, Y):
$X \cap Y \neq \emptyset \implies X \subset Y \vee Y \subset X$
- rootClusterIndexList: A set of disjoint clusters (indices in ‘clusterList’) which cover all
items in ‘clusterMemberList’. This list must be specified for hierarchical
clusterings and should not be specified for flat clusterings.

read(iprot)
validate()
write(oprot)
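The non-overlap constraint and the root-cover requirement can be checked with plain sets. A minimal sketch, assuming each cluster is given as its clusterMemberIndexList with all member indices listed explicitly (including those of its child clusters); `is_laminar` and `roots_cover_members` are hypothetical helpers, not part of concrete-python.

```python
from itertools import combinations


def is_laminar(cluster_list):
    """True if every pair of clusters is either disjoint or nested."""
    sets = [set(c) for c in cluster_list]
    for x, y in combinations(sets, 2):
        # Overlapping clusters must be nested (subset or equal).
        if x & y and not (x <= y or y <= x):
            return False
    return True


def roots_cover_members(cluster_list, root_cluster_indices, num_members):
    """True if the root clusters are pairwise disjoint and cover every member."""
    covered = set()
    for i in root_cluster_indices:
        members = set(cluster_list[i])
        if covered & members:
            return False
        covered |= members
    return covered == set(range(num_members))
```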

concrete.communication package

class concrete.communication.ttypes.Communication(id=None, uuid=None, type=None, text=None, startTime=None, endTime=None, communicationTaggingList=None, metadata=None, keyValueMap=None, lidList=None, sectionList=None, entityMentionSetList=None, entitySetList=None, situationMentionSetList=None, situationSetList=None, originalText=None, sound=None, communicationMetadata=None)

Bases: object


A single communication instance, containing linguistic content
generated by a single speaker or author. This type is used for
both inter-personal communications (such as phone calls or
conversations) and third-party communications (such as news
articles).

Each communication instance is grounded by its original
(unannotated) contents, which should be stored in either the
“text” field (for text communications) or the “sound” field (for
audio communications). If the communication is not available in
its original form, then these fields should store the
communication in the least-processed form available.

Attributes:
- id: Stable identifier for this communication, identifying both the
name of the source corpus and the document that it corresponds to
in that corpus.
- uuid: Universally unique identifier for this communication instance.
This is generated randomly, and cannot be mapped back to the
source corpus. It is used as a target for symbolic “pointers”.
- type: A short, corpus-specific term characterizing the nature of the
communication; may change in a future version of concrete.
Often used for filtering. For example, Gigaword uses
the type “story” to distinguish typical news articles from
weekly summaries (“multi”), editorial advisories (“advis”), etc.
At present, this value is typically a literal form from the
originating corpus: as a result, a type marked ‘other’ may have
different meanings across different corpora.
- text: The full text contents of this communication in its original
form, or in the least-processed form available, if the original
is not available.
- startTime: The time when this communication started (in unix time UTC –
i.e., seconds since January 1, 1970).
- endTime: The time when this communication ended (in unix time UTC –
i.e., seconds since January 1, 1970).
- communicationTaggingList: A list of CommunicationTagging objects that can support this
Communication. CommunicationTagging objects can be used to
annotate Communications with topics, gender identification, etc.
- metadata: metadata.AnnotationMetadata to support this particular communication.

Communications derived from other communications should
indicate in this metadata object their dependency
to the original communication ID.
- keyValueMap: A catch-all store of keys and values. Use sparingly!
- lidList: Theories about the languages that are present in this
communication.
- sectionList: Theory about the block structure of this communication.
- entityMentionSetList: Theories about which spans of text are used to mention entities
in this communication.
- entitySetList: Theories about what entities are discussed in this
communication, with pointers to individual mentions.
- situationMentionSetList: Theories about what situations are explicitly mentioned in this
communication.
- situationSetList: Theories about what situations are asserted in this
communication.
- originalText: Optional original text field that points back to an original
communication.

This field can be populated for the sake of convenience when creating
a “perspective” communication (a communication that is based on
highly destructive changes to an original communication [e.g.,
via MT]). This allows developers to quickly access the original
text that this perspective communication is based on.
- sound: The full audio contents of this communication in its original
form, or in the least-processed form available, if the original
is not available.
- communicationMetadata: Metadata about this specific Communication, such as information
about its author, information specific to this Communication
or Communications like it (info from an API, for example), etc.

read(iprot)
validate()
write(oprot)
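Several Communication fields (lidList, entityMentionSetList, situationMentionSetList, and so on) hold lists of competing theories, each carrying its own AnnotationMetadata. The sketch below shows one way to pick a single theory from such a list: the theory from a named tool if one is requested, otherwise the most recent one by metadata.timestamp. The helper select_theory and its selection policy are hypothetical, not part of concrete-python; SimpleNamespace objects stand in for deserialized Concrete objects so the example runs without the library.

```python
from types import SimpleNamespace


def select_theory(theories, tool=None):
    """Pick one theory from a Communication theory list (hypothetical helper).

    `theories` is a list such as comm.entityMentionSetList, or None when the
    field is unset. If `tool` is given, return the first theory whose
    metadata.tool equals it; otherwise return the theory with the latest
    metadata.timestamp. Return None when nothing qualifies.
    """
    if not theories:
        return None
    if tool is not None:
        return next((t for t in theories if t.metadata.tool == tool), None)
    return max(theories, key=lambda t: t.metadata.timestamp)


# Stand-ins for two EntityMentionSet objects and their AnnotationMetadata.
older = SimpleNamespace(metadata=SimpleNamespace(tool="ner-a", timestamp=100))
newer = SimpleNamespace(metadata=SimpleNamespace(tool="ner-b", timestamp=200))
print(select_theory([older, newer]).metadata.tool)                # ner-b
print(select_theory([older, newer], tool="ner-a").metadata.tool)  # ner-a
```

With a real Communication, the call would be select_theory(comm.entityMentionSetList, tool="..."), using whatever tool name your pipeline records.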
class concrete.communication.ttypes.CommunicationSet(communicationIdList=None, corpus=None, entityMentionClusterList=None, entityClusterList=None, situationMentionClusterList=None, situationClusterList=None)

Bases: object


A structure that represents a collection of Communications.

Attributes:
- communicationIdList: A list of Communication UUIDs that this CommunicationSet
represents.

This field may be absent if this CommunicationSet represents
a large corpus. If absent, the ‘corpus’ field should be present.
- corpus: The name of a corpus or other document body that this
CommunicationSet represents.

Should be present if ‘communicationIdList’ is absent.
- entityMentionClusterList: A list of Clustering objects that represent a
group of EntityMentions that are a part of this
CommunicationSet.
- entityClusterList: A list of Clustering objects that represent a
group of Entities that are a part of this
CommunicationSet.
- situationMentionClusterList: A list of Clustering objects that represent a
group of SituationMentions that are a part of this
CommunicationSet.
- situationClusterList: A list of Clustering objects that represent a
group of Situations that are a part of this
CommunicationSet.

read(iprot)
validate()
write(oprot)
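The communicationIdList/corpus rule above can be checked mechanically. The following is a minimal sketch with a hypothetical helper name, communication_set_scope; it prefers the explicit id list and falls back to the corpus name. Plain strings stand in for the UUID objects a real communicationIdList would contain.

```python
from types import SimpleNamespace


def communication_set_scope(cs):
    """Describe what a CommunicationSet covers (hypothetical helper).

    Prefers the explicit communicationIdList; falls back to the corpus name,
    which should be present when the id list is absent. An empty corpus
    string is treated as absent.
    """
    if cs.communicationIdList is not None:
        return ("ids", len(cs.communicationIdList))
    if cs.corpus:
        return ("corpus", cs.corpus)
    raise ValueError("CommunicationSet has neither communicationIdList nor corpus")


# Strings stand in for UUID objects.
small = SimpleNamespace(communicationIdList=["uuid-1", "uuid-2"], corpus=None)
large = SimpleNamespace(communicationIdList=None, corpus="gigaword")
print(communication_set_scope(small))  # ('ids', 2)
print(communication_set_scope(large))  # ('corpus', 'gigaword')
```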
class concrete.communication.ttypes.CommunicationTagging(uuid=None, metadata=None, taggingType=None, tagList=None, confidenceList=None)

Bases: object


A structure that represents a ‘tagging’ of a Communication. These
might be labels or annotations on a particular communication.

For example, this structure might be used to describe the topics
discussed in a Communication. The taggingType might be ‘topic’, and
the tagList might include ‘politics’ and ‘science’.

Attributes:
- uuid: A unique identifier for this CommunicationTagging object.
- metadata: AnnotationMetadata to support this CommunicationTagging object.
- taggingType: A string that captures the type of this CommunicationTagging
object. For example: ‘topic’ or ‘gender’.
- tagList: A list of strings that represent different tags related to the taggingType.
For example, if the taggingType is ‘topic’, some example tags might be
‘politics’, ‘science’, etc.
- confidenceList: A list of doubles, parallel to the list of strings in tagList,
that indicate the confidences of each tag.

read(iprot)
validate()
write(oprot)
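Because confidenceList is parallel to tagList, consumers usually zip the two together. A minimal sketch, assuming a hypothetical helper tags_with_confidence that also guards against mismatched lengths and tolerates an unset confidenceList:

```python
from types import SimpleNamespace


def tags_with_confidence(tagging):
    """Pair each tag in tagList with its confidence (hypothetical helper).

    confidenceList is documented as parallel to tagList, so the two lists
    must have the same length when confidences are present. If
    confidenceList is unset, each tag is paired with None.
    """
    tags = tagging.tagList or []
    confidences = tagging.confidenceList
    if confidences is None:
        return [(tag, None) for tag in tags]
    if len(confidences) != len(tags):
        raise ValueError("confidenceList must be parallel to tagList")
    return list(zip(tags, confidences))


# Stand-in for a CommunicationTagging with taggingType 'topic'.
topics = SimpleNamespace(taggingType="topic",
                         tagList=["politics", "science"],
                         confidenceList=[0.9, 0.4])
print(tags_with_confidence(topics))  # [('politics', 0.9), ('science', 0.4)]
```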

concrete.email package

class concrete.email.ttypes.EmailAddress(address=None, displayName=None)

Bases: object


An email address, optionally accompanied by a displayName. These
values are typically extracted from strings such as:
“John Smith” <john@xyz.com>.


Attributes:
- address
- displayName

read(iprot)
validate()
write(oprot)
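Python's standard library can already split strings like the one above into their parts. This sketch uses email.utils.parseaddr; the wrapper name split_email_address is hypothetical, and the final step of building concrete.email.ttypes.EmailAddress(address=..., displayName=...) is left to the caller so the example runs without concrete-python.

```python
from email.utils import parseaddr


def split_email_address(raw):
    """Split a header-style address into EmailAddress field values.

    Returns (address, displayName); displayName is None when the string has
    no display name (hypothetical helper built on email.utils.parseaddr).
    """
    display_name, address = parseaddr(raw)
    return address, (display_name or None)


print(split_email_address('"John Smith" <john@xyz.com>'))  # ('john@xyz.com', 'John Smith')
print(split_email_address("john@xyz.com"))                  # ('john@xyz.com', None)
```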
class concrete.email.ttypes.EmailCommunicationInfo(messageId=None, contentType=None, userAgent=None, inReplyToList=None, referenceList=None, senderAddress=None, returnPathAddress=None, toAddressList=None, ccAddressList=None, bccAddressList=None, emailFolder=None, subject=None, quotedAddresses=None, attachmentPaths=None, salutation=None, signature=None)

Bases: object


Extra information about an email communication instance.

Attributes:
- messageId
- contentType
- userAgent
- inReplyToList
- referenceList
- senderAddress
- returnPathAddress
- toAddressList
- ccAddressList
- bccAddressList
- emailFolder
- subject
- quotedAddresses
- attachmentPaths
- salutation
- signature

read(iprot)
validate()
write(oprot)
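Most EmailCommunicationInfo fields correspond to standard message headers. The sketch below pulls several of them out of a raw message with the standard email package and returns a dict keyed by the attribute names above. The header-to-field mapping and the helper name email_info_fields are assumptions (the schema does not prescribe a mapping), and addresses are returned as (address, displayName) tuples rather than EmailAddress objects.

```python
import email
from email.utils import getaddresses


def email_info_fields(raw_message):
    """Map common message headers to EmailCommunicationInfo field names.

    Hypothetical helper; address fields are (address, displayName) tuples
    that could be wrapped in concrete.email.ttypes.EmailAddress objects.
    """
    msg = email.message_from_string(raw_message)

    def addresses(header):
        pairs = getaddresses(msg.get_all(header, []))
        return [(addr, name or None) for name, addr in pairs]

    senders = addresses("From")
    return {
        "messageId": msg.get("Message-ID"),
        "subject": msg.get("Subject"),
        "senderAddress": senders[0] if senders else None,
        "toAddressList": addresses("To"),
        "ccAddressList": addresses("Cc"),
        "inReplyToList": msg.get("In-Reply-To", "").split(),
        "referenceList": msg.get("References", "").split(),
    }


raw = ("From: \"John Smith\" <john@xyz.com>\n"
       "To: a@example.org, \"Bee\" <b@example.org>\n"
       "Message-ID: <m2@xyz.com>\n"
       "In-Reply-To: <m1@xyz.com>\n"
       "Subject: Re: lunch\n"
       "\n"
       "Sounds good.\n")
info = email_info_fields(raw)
print(info["toAddressList"])  # [('a@example.org', None), ('b@example.org', 'Bee')]
```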

concrete.entities package

class concrete.entities.ttypes.Entity(uuid=None, id=None, mentionIdList=None, rawMentionList=None, type=None, confidence=None, canonicalName=None)

Bases: object


A single referent (or “entity”) that is referred to at least once
in a given communication, along with pointers to all of the
references to that referent. The referent’s type (e.g., is it a
person, or a location, or an organization, etc.) is also recorded.

Because each Entity contains pointers to all references to a
referent within a given communication, an Entity can be
thought of as a coreference set.

Attributes:
- uuid: Unique identifier for this entity.
- id: A corpus-specific and stable id such as a Freebase mid
or a DBpedia id.
- mentionIdList: A list of pointers to all of the mentions of this Entity’s
referent. (type=EntityMention)
- rawMentionList: A list of pointers to all of the sentences which contain a
mention of this Entity.
- type: The basic type of this entity’s referent.
- confidence: Confidence score for this individual entity. You can also set a
confidence score for an entire EntitySet using the EntitySet’s
metadata.
- canonicalName: A string containing a representative, canonical, or “best” name
for this entity’s referent. This string may match one of the
mentions’ text strings, but it is not required to.

read(iprot)
validate()
write(oprot)
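Since mentionIdList holds UUID pointers rather than the mentions themselves, reading an Entity's coreference chain means looking those UUIDs up in an EntityMentionSet. A minimal sketch with a hypothetical helper, mentions_of; it assumes the UUID struct's uuidString field (from the concrete.uuid package, which this section does not document) and uses SimpleNamespace stand-ins.

```python
from types import SimpleNamespace


def mentions_of(entity, mention_set):
    """Resolve entity.mentionIdList to EntityMention objects (hypothetical helper).

    Pointers are matched on the uuidString of each mention in the given
    EntityMentionSet. Ids that are not found in the set are skipped.
    """
    by_id = {m.uuid.uuidString: m for m in mention_set.mentionList or []}
    return [by_id[u.uuidString]
            for u in entity.mentionIdList or []
            if u.uuidString in by_id]


def fake_uuid(s):
    # Stand-in for a concrete UUID struct.
    return SimpleNamespace(uuidString=s)


m1 = SimpleNamespace(uuid=fake_uuid("m1"), text="Barack Obama")
m2 = SimpleNamespace(uuid=fake_uuid("m2"), text="he")
mention_set = SimpleNamespace(mentionList=[m1, m2])
obama = SimpleNamespace(mentionIdList=[fake_uuid("m1"), fake_uuid("m2")],
                        canonicalName="Barack Obama")
print([m.text for m in mentions_of(obama, mention_set)])  # ['Barack Obama', 'he']
```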
class concrete.entities.ttypes.EntityMention(uuid=None, tokens=None, entityType=None, phraseType=None, confidence=None, text=None, childMentionIdList=None)

Bases: object


A span of text with a specific referent, such as a person,
organization, or time. Things that can be referred to by a mention
are called “entities.”

It is left up to individual EntityMention taggers to decide which
referent types and phrase types to identify. For example, some
EntityMention taggers may only identify proper nouns, or may only
identify EntityMentions that refer to people.

Each EntityMention consists of a sequence of tokens. This sequence
is usually annotated with information about the referent type
(e.g., is it a person, or a location, or an organization, etc.) as
well as the phrase type (is it a name, pronoun, common noun, etc.).

EntityMentions typically consist of a single noun phrase; however,
other phrase types may also be marked as mentions. For
example, in the phrase “French hotel,” the adjective “French” might
be marked as a mention for France.

Attributes:
- uuid
- tokens: Pointer to sequence of tokens.

Special note: In the case of PRO-drop, where there is no explicit
mention, but an EntityMention is needed for downstream Entity
analysis, this field should be set to a TokenRefSequence with an
empty tokenIndexList and the anchorTokenIndex set to the head/only
token of the verb/predicate from which the PRO was dropped.
- entityType: The type of referent that is referred to by this mention.
- phraseType: The phrase type of the tokens that constitute this mention.
- confidence: A confidence score for this individual mention. You can also
set a confidence score for an entire EntityMentionSet using the
EntityMentionSet’s metadata.
- text: The text content of this entity mention. This field is
typically redundant with the string formed by cross-referencing
the ‘tokens.tokenIndexList’ field with this mention’s
tokenization. This field may not be generated by all analytics.
- childMentionIdList: A list of pointers to the “child” EntityMentions of this
EntityMention.

read(iprot)
validate()
write(oprot)
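The text field is described as redundant with the string obtained by cross-referencing tokens.tokenIndexList with the mention's tokenization. The sketch below rebuilds that string. The helper mention_text is hypothetical, and it assumes the Tokenization layout from the wider Concrete schema (tokenization.tokenList.tokenList holding Token objects with tokenIndex and text), which this section does not document. Tokens are joined with single spaces, so whitespace may differ from the original text, and a PRO-drop mention with an empty tokenIndexList yields an empty string.

```python
from types import SimpleNamespace


def mention_text(mention, tokenization):
    """Rebuild a mention's surface string from its tokens (hypothetical helper).

    Tokens are looked up by tokenIndex, so the helper does not rely on the
    token list being stored in order.
    """
    by_index = {t.tokenIndex: t.text for t in tokenization.tokenList.tokenList}
    return " ".join(by_index[i] for i in mention.tokens.tokenIndexList)


# Stand-ins for the "French hotel" example above.
words = ["the", "French", "hotel"]
tokenization = SimpleNamespace(tokenList=SimpleNamespace(tokenList=[
    SimpleNamespace(tokenIndex=i, text=w) for i, w in enumerate(words)]))
france = SimpleNamespace(tokens=SimpleNamespace(tokenIndexList=[1], anchorTokenIndex=1))
dropped = SimpleNamespace(tokens=SimpleNamespace(tokenIndexList=[], anchorTokenIndex=2))
print(mention_text(france, tokenization))         # French
print(repr(mention_text(dropped, tokenization)))  # ''
```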
class concrete.entities.ttypes.EntityMentionSet(uuid=None, metadata=None, mentionList=None, linkingList=None)

Bases: object


A theory about the set of entity mentions that are present in a
message. See also: EntityMention

This type does not represent a coreference relationship, which is handled by Entity.
This type is meant to represent the output of an entity mention identifier,
which is often part of an in-doc coreference system.

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- mentionList: List of mentions in this set.
- linkingList: Entity linking annotations associated with this EntityMentionSet.

read(iprot)
validate()
write(oprot)
class concrete.entities.ttypes.EntitySet(uuid=None, metadata=None, entityList=None, linkingList=None, mentionSetId=None)

Bases: object


A theory about the set of entities that are present in a
message. See also: Entity.

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- entityList: List of entities in this set.
- linkingList: Entity linking annotations associated with this EntitySet.
- mentionSetId: An optional UUID pointer to an EntityMentionSet.

If this field is present, consumers can assume that all
Entity objects in this EntitySet have EntityMentions that are included
in the named EntityMentionSet.

read(iprot)
validate()
write(oprot)
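When mentionSetId is present, consumers may assume every Entity's mentions live in the named EntityMentionSet. A producer can check that promise before writing. This is a minimal sketch with a hypothetical helper, check_entity_set, which returns the offending mention ids (empty when the guarantee holds). It assumes the UUID struct's uuidString field and uses SimpleNamespace stand-ins.

```python
from types import SimpleNamespace


def check_entity_set(entity_set, comm):
    """Return mention uuidStrings that break the mentionSetId guarantee.

    Hypothetical helper. If entity_set.mentionSetId is unset, nothing is
    promised and an empty list is returned. Raises LookupError if the named
    EntityMentionSet is not in comm.entityMentionSetList.
    """
    if entity_set.mentionSetId is None:
        return []
    target = entity_set.mentionSetId.uuidString
    for mention_set in comm.entityMentionSetList or []:
        if mention_set.uuid.uuidString == target:
            known = {m.uuid.uuidString for m in mention_set.mentionList or []}
            break
    else:
        raise LookupError("EntityMentionSet %s not found" % target)
    return [u.uuidString
            for entity in entity_set.entityList or []
            for u in entity.mentionIdList or []
            if u.uuidString not in known]


def fake_uuid(s):
    # Stand-in for a concrete UUID struct.
    return SimpleNamespace(uuidString=s)


mention_set = SimpleNamespace(uuid=fake_uuid("ms1"),
                              mentionList=[SimpleNamespace(uuid=fake_uuid("m1"))])
comm = SimpleNamespace(entityMentionSetList=[mention_set])
entity_set = SimpleNamespace(mentionSetId=fake_uuid("ms1"), entityList=[
    SimpleNamespace(mentionIdList=[fake_uuid("m1"), fake_uuid("m9")])])
print(check_entity_set(entity_set, comm))  # ['m9']
```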

concrete.exceptions package

exception concrete.exceptions.ttypes.ConcreteThriftException(message=None, serEx=None)

Bases: thrift.Thrift.TException


An exception to be used with Concrete thrift
services.

Attributes:
- message
- serEx

read(iprot)
validate()
write(oprot)

concrete.language package

class concrete.language.ttypes.LanguageIdentification(uuid=None, metadata=None, languageToProbabilityMap=None)

Bases: object


A theory about what languages are present in a given communication
or piece of communication. Note that it is possible to have more
than one language present in a given communication.

Attributes:
- uuid: Unique identifier for this language identification.
- metadata: Information about where this language identification came from.
- languageToProbabilityMap: A map from a language to the probability that that
language occurs in a given communication. Each language code should
occur at most once. The probabilities do not
need to sum to one – for example, if a single communication is known
to contain both English and French, then it would be appropriate
to assign a probability of 1 to both languages. (Manually
annotated probabilities should always be either zero or one;
machine-generated probabilities may be intermediate.)

Note: The string key should represent the ISO 639-3 three-letter code.

read(iprot)
validate()
write(oprot)
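Because the probabilities need not sum to one, "which languages are present" is a per-language threshold test rather than an argmax. A minimal sketch, assuming a hypothetical helper likely_languages, an arbitrary default threshold of 0.5, and a light sanity check that each key is three letters, as ISO 639-3 codes are:

```python
from types import SimpleNamespace


def likely_languages(lang_id, threshold=0.5):
    """Return language codes whose probability is at least `threshold`.

    Hypothetical helper. Several languages can pass, since probabilities are
    independent. Raises ValueError for keys that are not three letters or
    probabilities outside [0, 1].
    """
    probs = lang_id.languageToProbabilityMap or {}
    for code, p in probs.items():
        if len(code) != 3 or not code.isalpha():
            raise ValueError("not a three-letter language code: %r" % code)
        if not 0.0 <= p <= 1.0:
            raise ValueError("probability out of range for %s: %r" % (code, p))
    return sorted(code for code, p in probs.items() if p >= threshold)


# Stand-in for a manually annotated English/French communication.
bilingual = SimpleNamespace(languageToProbabilityMap={"eng": 1.0, "fra": 1.0, "deu": 0.0})
print(likely_languages(bilingual))  # ['eng', 'fra']
```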

concrete.learn package

concrete.learn.ActiveLearnerClientService module
class concrete.learn.ActiveLearnerClientService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.learn.ActiveLearnerClientService.Iface


The active learner client implements a method to accept new sorts of the annotation units

recv_submitSort()
send_submitSort(sessionId, unitIds)
submitSort(sessionId, unitIds)

Submit a new sort of communications to the broker

Parameters:
- sessionId
- unitIds

class concrete.learn.ActiveLearnerClientService.Iface

Bases: concrete.services.Service.Iface


The active learner client implements a method to accept new sorts of the annotation units

submitSort(sessionId, unitIds)

Submit a new sort of communications to the broker

Parameters:
- sessionId
- unitIds

class concrete.learn.ActiveLearnerClientService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.learn.ActiveLearnerClientService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_submitSort(seqid, iprot, oprot)
class concrete.learn.ActiveLearnerClientService.submitSort_args(sessionId=None, unitIds=None)

Bases: object


Attributes:
- sessionId
- unitIds

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerClientService.submitSort_result

Bases: object

read(iprot)
validate()
write(oprot)
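On the client side, the generated Processor(handler) forwards each incoming submitSort call to handler.submitSort(sessionId, unitIds). The sketch below is a hypothetical handler, RecordingSortHandler, that only records the sorts it receives. A complete handler would also have to implement the methods inherited from concrete.services.Service.Iface, which this section does not list, and the wiring shown in the comment requires concrete-python and a Thrift server. Plain strings stand in for the session and unit identifiers.

```python
class RecordingSortHandler(object):
    """Minimal ActiveLearnerClientService handler (hypothetical sketch).

    Records each sort pushed by the active learning server so annotators
    can work through units in the latest order.
    """

    def __init__(self):
        # A list of (sessionId, unitIds) pairs rather than a dict, because
        # generated Thrift structs may not be hashable.
        self.sorts = []

    def submitSort(self, sessionId, unitIds):
        self.sorts.append((sessionId, list(unitIds)))

    def latest_sort(self, sessionId):
        """Return the most recent unit order received for a session, or None."""
        for sid, units in reversed(self.sorts):
            if sid == sessionId:
                return units
        return None


# Wiring it up requires concrete-python (not run here):
#   from concrete.learn.ActiveLearnerClientService import Processor
#   processor = Processor(RecordingSortHandler())

handler = RecordingSortHandler()
handler.submitSort("session-1", ["unit-b", "unit-a"])  # strings stand in for ids
print(handler.latest_sort("session-1"))  # ['unit-b', 'unit-a']
```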
concrete.learn.ActiveLearnerServerService module
class concrete.learn.ActiveLearnerServerService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.learn.ActiveLearnerServerService.Iface


The active learning server is responsible for sorting a list of communications.
Users annotate communications based on the sort.

Active learning is an asynchronous process.
It is started by the client calling start().
At arbitrary times, the client can call addAnnotations().
When the server is done with a sort of the data, it calls submitSort() on the client.
The server can perform additional sorts until stop() is called.

The server must be preconfigured with the details of the data source from which to pull communications.

addAnnotations(sessionId, annotations)

Add annotations from the user to the learning process

Parameters:
- sessionId
- annotations

recv_addAnnotations()
recv_start()
recv_stop()
send_addAnnotations(sessionId, annotations)
send_start(sessionId, task, contact)
send_stop(sessionId)
start(sessionId, task, contact)

Start an active learning session on these communications

Parameters:
- sessionId
- task
- contact

stop(sessionId)

Stop the learning session

Parameters:
- sessionId

class concrete.learn.ActiveLearnerServerService.Iface

Bases: concrete.services.Service.Iface


The active learning server is responsible for sorting a list of communications.
Users annotate communications based on the sort.

Active learning is an asynchronous process.
It is started by the client calling start().
At arbitrary times, the client can call addAnnotations().
When the server is done with a sort of the data, it calls submitSort() on the client.
The server can perform additional sorts until stop() is called.

The server must be preconfigured with the details of the data source from which to pull communications.

addAnnotations(sessionId, annotations)

Add annotations from the user to the learning process

Parameters:
- sessionId
- annotations

start(sessionId, task, contact)

Start an active learning session on these communications

Parameters:
- sessionId
- task
- contact

stop(sessionId)

Stop the learning session

Parameters:
- sessionId

class concrete.learn.ActiveLearnerServerService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.learn.ActiveLearnerServerService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_addAnnotations(seqid, iprot, oprot)
process_start(seqid, iprot, oprot)
process_stop(seqid, iprot, oprot)
class concrete.learn.ActiveLearnerServerService.addAnnotations_args(sessionId=None, annotations=None)

Bases: object


Attributes:
- sessionId
- annotations

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.addAnnotations_result

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.start_args(sessionId=None, task=None, contact=None)

Bases: object


Attributes:
- sessionId
- task
- contact

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.start_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.stop_args(sessionId=None)

Bases: object


Attributes:
- sessionId

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.stop_result

Bases: object

read(iprot)
validate()
write(oprot)
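The server protocol described above (start(), any number of addAnnotations() calls each answered by a submitSort() on the client, then stop()) can be illustrated with a toy in-memory handler. Everything here is hypothetical: SketchLearner, the client_for callback that turns the contact argument into a client, and the placeholder sort policy (unannotated units first). A real server would use an actual active learning strategy and call submitSort asynchronously over Thrift rather than inline.

```python
from types import SimpleNamespace


class SketchLearner(object):
    """Toy handler for ActiveLearnerServerService (hypothetical sketch)."""

    def __init__(self, client_for):
        # client_for(contact) must return an object with submitSort();
        # a real server would open a Thrift connection to the contact here.
        self.client_for = client_for
        self.sessions = []  # a list, since Thrift ids may not be hashable

    def _active(self, sessionId):
        for session in self.sessions:
            if session["id"] == sessionId and session["active"]:
                return session
        raise KeyError("no active session for %r" % (sessionId,))

    def start(self, sessionId, task, contact):
        self.sessions.append({"id": sessionId, "units": list(task.units),
                              "annotated": [], "active": True,
                              "client": self.client_for(contact)})
        return True  # start_result carries a success value

    def addAnnotations(self, sessionId, annotations):
        session = self._active(sessionId)
        session["annotated"].extend(a.id for a in annotations)
        # Placeholder policy: unannotated units first, original order kept.
        todo = [u for u in session["units"] if u not in session["annotated"]]
        done = [u for u in session["units"] if u in session["annotated"]]
        session["client"].submitSort(sessionId, todo + done)

    def stop(self, sessionId):
        self._active(sessionId)["active"] = False


class FakeClient(object):
    """Stand-in for an ActiveLearnerClientService client."""

    def __init__(self):
        self.calls = []

    def submitSort(self, sessionId, unitIds):
        self.calls.append(list(unitIds))


client = FakeClient()
learner = SketchLearner(lambda contact: client)
learner.start("s1", SimpleNamespace(units=["u1", "u2", "u3"]), contact=None)
learner.addAnnotations("s1", [SimpleNamespace(id="u2")])
learner.stop("s1")
print(client.calls)  # [['u1', 'u3', 'u2']]
```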
class concrete.learn.ttypes.Annotation(id=None, communication=None)

Bases: object


Annotation on a communication.

Attributes:
- id: Identifier of the part of the communication being annotated.
- communication: Communication with the annotation stored in it.
The location of the annotation depends on the annotation unit identifier

read(iprot)
validate()
write(oprot)
class concrete.learn.ttypes.AnnotationTask(type=None, language=None, unitType=None, units=None)

Bases: object


Annotation task including information for pulling data.

Attributes:
- type: Type of annotation task
- language: Language of the data for the task
- unitType: Entire communication or individual sentences
- units: Identifiers for each annotation unit

read(iprot)
validate()
write(oprot)

concrete.linking package

class concrete.linking.ttypes.Link(sourceId=None, linkTargetList=None)

Bases: object


A structure that represents the origin of an entity linking annotation.

Attributes:
- sourceId: The “root” of this Link; points to an EntityMention UUID, Entity UUID, etc.
- linkTargetList: A list of LinkTarget objects that this Link contains.

read(iprot)
validate()
write(oprot)
class concrete.linking.ttypes.LinkTarget(confidence=None, targetId=None, dbId=None, dbName=None)

Bases: object


A structure that represents the target of an entity linking annotation.

Attributes:
- confidence: Confidence of this LinkTarget object.
- targetId: A UUID that represents the target of this LinkTarget. This
UUID should exist in the Entity/Situation(Mention)Set that the
Linking object is contained in.
- dbId: A database ID that represents the target of this linking.

This should be used if the target of the linking is not associated
with an Entity|Situation(Mention)Set in Concrete, and therefore cannot be linked by
a UUID internal to concrete.

If present, other optional field ‘dbName’ should also be populated.
- dbName: The name of the database that represents the target of this linking.

Together with the ‘dbId’, this can form a pointer to a target
that is not represented inside concrete.

Should be populated alongside ‘dbId’.

read(iprot)
validate()
write(oprot)
class concrete.linking.ttypes.Linking(metadata=None, linkList=None)

Bases: object


A structure that represents entity linking annotations.

Attributes:
- metadata: Metadata related to this Linking object.
- linkList: A list of Link objects that this Linking object contains.

read(iprot)
validate()
write(oprot)
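A LinkTarget points either inside Concrete (targetId) or to an external database (the dbName/dbId pair, which should be populated together). The sketch below resolves a target to one of those two forms and picks the most confident target of a Link. The helper names, the preference for targetId when both forms are set, and the sample values "Q76" and "wikidata" are all hypothetical; SimpleNamespace objects stand in for the generated classes.

```python
from types import SimpleNamespace


def target_pointer(target):
    """Describe where a LinkTarget points (hypothetical helper).

    Returns ('internal', uuidString) for targetId, or
    ('external', dbName, dbId) for the database pair.
    """
    if target.targetId is not None:
        return ("internal", target.targetId.uuidString)
    if target.dbId is not None:
        if target.dbName is None:
            raise ValueError("dbId is set but dbName is missing")
        return ("external", target.dbName, target.dbId)
    raise ValueError("LinkTarget has neither targetId nor dbId")


def best_target(link):
    """Return the LinkTarget with the highest confidence, or None."""
    scored = [t for t in link.linkTargetList or [] if t.confidence is not None]
    return max(scored, key=lambda t: t.confidence) if scored else None


link = SimpleNamespace(sourceId=SimpleNamespace(uuidString="mention-1"), linkTargetList=[
    SimpleNamespace(confidence=0.2, targetId=None, dbId="Q76", dbName="wikidata"),
    SimpleNamespace(confidence=0.7, targetId=SimpleNamespace(uuidString="entity-9"),
                    dbId=None, dbName=None),
])
print(target_pointer(best_target(link)))  # ('internal', 'entity-9')
```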

concrete.metadata package

class concrete.metadata.ttypes.AnnotationMetadata(tool=None, timestamp=None, digest=None, dependencies=None, kBest=1)

Bases: object


Metadata associated with an annotation or a set of annotations,
that identifies where those annotations came from.

Attributes:
- tool: The name of the tool that generated this annotation.
- timestamp: The time at which this annotation was generated (in unix time
UTC – i.e., seconds since January 1, 1970).
- digest: A Digest, carrying over any information the annotation metadata
wishes to carry over.
- dependencies: The theories that supported this annotation.

An empty field indicates that the theory has no
dependencies (e.g., an ingester).
- kBest: An integer that represents a ranking for systems
that output k-best lists.

For systems that do not output k-best lists,
the default value (1) should suffice.

read(iprot)
validate()
write(oprot)
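The timestamp field expects unix time in UTC, in seconds. This is a small Python 3 sketch for producing that value from a datetime or from the current clock. The helper name unix_seconds is hypothetical, and treating naive datetimes as UTC is an assumption you may need to change if yours hold local time.

```python
import time
from datetime import datetime, timezone


def unix_seconds(dt=None):
    """Return whole seconds since 1970-01-01 UTC (hypothetical helper).

    With no argument, uses the current time. Naive datetimes are assumed
    to already be in UTC.
    """
    if dt is None:
        return int(time.time())
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


print(unix_seconds(datetime(1970, 1, 2, tzinfo=timezone.utc)))  # 86400
# e.g. AnnotationMetadata(tool="my-tokenizer 1.0", timestamp=unix_seconds())
```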
class concrete.metadata.ttypes.CommunicationMetadata(tweetInfo=None, emailInfo=None, nitfInfo=None)

Bases: object


Metadata specific to a particular Communication object.
This might include corpus-specific metadata (from the Twitter API),
attributes associated with the Communication (the author),
or other information about the Communication.

Attributes:
- tweetInfo: Extra information for communications where kind==TWEET:
Information about this tweet that is provided by the Twitter
API. For information about the Twitter API, see:
- emailInfo: Extra information for communications where kind==EMAIL
- nitfInfo: Extra information that may come from the NITF
(News Industry Text Format) schema. See ‘nitf.thrift’.

read(iprot)
validate()
write(oprot)
class concrete.metadata.ttypes.Digest(bytesValue=None, int64Value=None, doubleValue=None, stringValue=None, int64List=None, doubleList=None, stringList=None)

Bases: object


Analytic-specific information about an attribute or edge. Digests
are used to combine information from multiple sources to generate a
unified value. The digests generated by an analytic will only ever
be used by that same analytic, so analytics can feel free to encode
information in whatever way is convenient.

Attributes:
- bytesValue: The following fields define various ways you can store the
digest data (for convenience). If none of these meets your
needs, then serialize the digest to a byte sequence and store it
in bytesValue.
- int64Value
- doubleValue
- stringValue
- int64List
- doubleList
- stringList

read(iprot)
validate()
write(oprot)
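When none of the typed Digest fields fit, the description above says to serialize the data into bytesValue. Since a digest is only read back by the analytic that wrote it, any encoding that analytic understands will do. JSON is used here purely as an illustrative choice, and the helper names are hypothetical.

```python
import json


def encode_digest_bytes(value):
    """Serialize JSON-compatible data for Digest.bytesValue (hypothetical helper)."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


def decode_digest_bytes(data):
    """Inverse of encode_digest_bytes."""
    return json.loads(data.decode("utf-8"))


payload = {"model": "crf-v2", "features": ["suffix3", "shape"]}
blob = encode_digest_bytes(payload)
print(decode_digest_bytes(blob) == payload)  # True
# e.g. Digest(bytesValue=blob)
```

For a single string or number, stringValue, int64Value, or doubleValue is simpler and avoids a custom encoding.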
class concrete.metadata.ttypes.TheoryDependencies(sectionTheoryList=None, sentenceTheoryList=None, tokenizationTheoryList=None, posTagTheoryList=None, nerTagTheoryList=None, lemmaTheoryList=None, langIdTheoryList=None, parseTheoryList=None, dependencyParseTheoryList=None, tokenAnnotationTheoryList=None, entityMentionSetTheoryList=None, entitySetTheoryList=None, situationMentionSetTheoryList=None, situationSetTheoryList=None, communicationsList=None)

Bases: object


A struct that holds UUIDs for all theories that a particular
annotation was based upon (and presumably requires).

Producers of TheoryDependencies should list all stages that they
used in constructing their particular annotation. They do not,
however, need to explicitly label each stage; they can label
only the immediate stage before them.

Examples:

If you are producing a Tokenization, and only used the
SentenceSegmentation in order to produce that Tokenization, list
only the single SentenceSegmentation UUID in sentenceTheoryList.

In this example, even though the SentenceSegmentation will have
a dependency on some SectionSegmentation, it is not necessary
for the Tokenization to list the SectionSegmentation UUID as a
dependency.

If you are a producer of EntityMentions, and you use two
POSTokenTagging and one NERTokenTagging objects, add the UUIDs for
the POSTokenTagging objects to posTagTheoryList, and the UUID of
the NER TokenTagging to the nerTagTheoryList.

In this example, because multiple annotations influenced the
new annotation, they should all be listed as dependencies.

Attributes:
- sectionTheoryList
- sentenceTheoryList
- tokenizationTheoryList
- posTagTheoryList
- nerTagTheoryList
- lemmaTheoryList
- langIdTheoryList
- parseTheoryList
- dependencyParseTheoryList
- tokenAnnotationTheoryList
- entityMentionSetTheoryList
- entitySetTheoryList
- situationMentionSetTheoryList
- situationSetTheoryList
- communicationsList

read(iprot)
validate()
write(oprot)
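The EntityMention example above (two POS taggings and one NER tagging) amounts to collecting UUIDs into the matching list fields. This is a minimal sketch with a hypothetical helper, dependency_kwargs, that checks the keyword names against the attribute list above and drops empty lists. The result could be passed as TheoryDependencies(**dependency_kwargs(...)). Plain strings stand in for UUID objects.

```python
from types import SimpleNamespace

THEORY_FIELDS = frozenset([
    "sectionTheoryList", "sentenceTheoryList", "tokenizationTheoryList",
    "posTagTheoryList", "nerTagTheoryList", "lemmaTheoryList",
    "langIdTheoryList", "parseTheoryList", "dependencyParseTheoryList",
    "tokenAnnotationTheoryList", "entityMentionSetTheoryList",
    "entitySetTheoryList", "situationMentionSetTheoryList",
    "situationSetTheoryList", "communicationsList",
])


def dependency_kwargs(**theories):
    """Collect UUIDs of the theories an annotation used (hypothetical helper).

    Each keyword must be a TheoryDependencies attribute name and map to a
    list of objects exposing a `uuid` attribute. Empty lists are dropped.
    """
    unknown = set(theories) - THEORY_FIELDS
    if unknown:
        raise TypeError("unknown TheoryDependencies fields: %s" % sorted(unknown))
    return {name: [t.uuid for t in objs] for name, objs in theories.items() if objs}


# Stand-ins for two POS TokenTaggings and one NER TokenTagging.
pos_a, pos_b, ner = (SimpleNamespace(uuid=u) for u in ("pos-a", "pos-b", "ner-1"))
deps = dependency_kwargs(posTagTheoryList=[pos_a, pos_b], nerTagTheoryList=[ner])
print(deps["posTagTheoryList"])  # ['pos-a', 'pos-b']
```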

concrete.nitf package

class concrete.nitf.ttypes.NITFInfo(alternateURL=None, articleAbstract=None, authorBiography=None, banner=None, biographicalCategoryList=None, columnName=None, columnNumber=None, correctionDate=None, correctionText=None, credit=None, dayOfWeek=None, descriptorList=None, featurePage=None, generalOnlineDescriptorList=None, guid=None, kicker=None, leadParagraphList=None, locationList=None, nameList=None, newsDesk=None, normalizedByline=None, onlineDescriptorList=None, onlineHeadline=None, onlineLeadParagraph=None, onlineLocationList=None, onlineOrganizationList=None, onlinePeople=None, onlineSectionList=None, onlineTitleList=None, organizationList=None, page=None, peopleList=None, publicationDate=None, publicationDayOfMonth=None, publicationMonth=None, publicationYear=None, section=None, seriesName=None, slug=None, taxonomicClassifierList=None, titleList=None, typesOfMaterialList=None, url=None, wordCount=None)

Bases: object


Attributes:
- alternateURL: This field specifies the URL of the article, if published online. In some
cases, such as with the New York Times, when this field is present,
this URL is preferred to the url field on articles published on
or after April 02, 2006, as the linked page will have richer content.
- articleAbstract: This field is a summary of the article, possibly written by
an indexing service.
- authorBiography: This field specifies the biography of the author of the article.
Generally, this field is specified for guest authors, and not for
regular reporters, except to provide the author’s email address.
- banner: The banner field is used to indicate if there has been additional
information appended to the article since its publication. Examples of
banners include ‘Correction Appended’ and ‘Editor’s Note Appended’.
- biographicalCategoryList: When present, the biographical category field generally indicates that a
document focuses on a particular individual. The value of the field
indicates the area or category in which this individual is best known.
This field is most often defined for Obituaries and Book Reviews.

Examples include: Politics and Government (U.S.); Books and Magazines; Royalty.
- columnName: If the article is part of a regular column, this field specifies the name
of that column.
Sample column names: World News Briefs; WEDDINGS; The Accessories Channel.

- columnNumber: This field specifies the column in which the article starts in the print
paper. A typical printed page in the paper has six columns numbered from
right to left. As a consequence most, but not all, of the values for this
field fall in the range 1-6.
- correctionDate: This field specifies the date on which a correction was made to the
article. Generally, if the correction date is specified, the correction
text will also be specified (and vice versa).
- correctionText: For articles corrected following publication, this field specifies the
correction. Generally, if the correction text is specified, the
correction date will also be specified (and vice versa).
- credit: This field indicates the entity that produced the editorial content of
this document.
- dayOfWeek: This field specifies the day of week on which the article was published.
Possible values: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.
- descriptorList: The “descriptors” field specifies a list of descriptive terms drawn from
a normalized controlled vocabulary corresponding to subjects mentioned in
the article.
Examples include: ECONOMIC CONDITIONS AND TRENDS; AIRPLANES; VIOLINS.
- featurePage: The feature page containing this article, such as Education Page or Fashion Page.
- generalOnlineDescriptorList: The “general online descriptors” field specifies a list of descriptors
that are at a higher level of generality than the other tags associated
with the article.
Examples include: Surfing; Venice Biennale; Ranches.
- guid: The GUID field specifies an integer that is guaranteed to be unique for
every document in the corpus.
- kicker: The kicker is an additional piece of information printed as an
accompaniment to a news headline.
- leadParagraphList: The “lead paragraph” field is the lead paragraph of the article.
Generally this field is populated with the first two paragraphs from the
article.
- locationList: The “locations” field specifies a list of geographic descriptors drawn
from a normalized controlled vocabulary that correspond to places
mentioned in the article.
Examples include: Wellsboro (Pa); Kansas City (Kan); Park Slope (NYC).
- nameList: The “names” field specifies a list of names mentioned in the article.
Examples include: Azza Fahmy; George C. Izenour; Chris Schenkel.
- newsDesk: This field specifies the desk in the newsroom that
produced the article. The desk is related to, but is not the same as, the
section in which the article appears.
- normalizedByline: The Normalized Byline field is the byline normalized to the form (last
name, first name).
- onlineDescriptorList: This field specifies a list of descriptors from a normalized controlled
vocabulary that correspond to topics mentioned in the article.
Examples include: Marriages; Parks and Other Recreation Areas; Cooking and Cookbooks.
- onlineHeadline: This field specifies the headline displayed with the article
online. Often this differs from the headline used in print.
- onlineLeadParagraph: This field specifies the lead paragraph for the online version.
- onlineLocationList: This field specifies a list of place names that correspond to geographic
locations mentioned in the article.
Examples include: Hollywood; Los Angeles; Arcadia.
- onlineOrganizationList: This field specifies a list of organizations that correspond to
organizations mentioned in the article.
Examples include: Nintendo Company Limited; Yeshiva University; Rose Center.
- onlinePeople: This field specifies a list of people that correspond to individuals
mentioned in the article.
Examples include: Lopez, Jennifer; Joyce, James; Robinson, Jackie.
- onlineSectionList: This field specifies the section(s) in which the online version of the article
is placed. This may be populated from a semicolon-delimited (;) list.
- onlineTitleList: This field specifies a list of authored works mentioned in the article.
<br>
Examples Include:
<ol>
<li>Matchstick Men (Movie)</li>
<li>Blades of Glory (Movie)</li>
<li>Bridge and Tunnel (Play)</li>
</ol>
- organizationList: This field specifies a list of organization names drawn from a normalized
controlled vocabulary that correspond to organizations mentioned in the
article.
Examples include: Circuit City Stores Inc; Delaware County Community College (Pa); CONNECTICUT GRAND OPERA.
- page: This field specifies the page of the section in the paper in which the
article appears. This is not an absolute pagination. An article that
appears on page 3 in section A occurs in the physical paper before an
article that occurs on page 1 of section F. The section is encoded in
the section field.
- peopleList: This field specifies a list of people from a normalized controlled
vocabulary that correspond to individuals mentioned in the article.
Examples include: REAGAN, RONALD WILSON (PRES); BEGIN, MENACHEM (PRIME MIN); COLLINS, GLENN.
- publicationDate: This field specifies the date of the article’s publication.
- publicationDayOfMonth: This field specifies the day of the month on which the article was
published, always in the range 1-31.
- publicationMonth: This field specifies the month on which the article was published in the
range 1-12, where 1 is January, 2 is February, etc.
- publicationYear: This field specifies the year in which the article was published. This
value is in the range 1987-2007 for this collection.
- section: This field specifies the section of the paper in which the article
appears. This is not the name of the section, but rather a letter or
number that indicates the section.
- seriesName: If the article is part of a regular series, this field specifies the name
of that series.
- slug: The slug is a short string that uniquely identifies an article from all
other articles published on the same day. Please note, however, that
different articles on different days may have the same slug.
<ul>
<li>30other</li>
<li>12reunion</li>
</ul>
- taxonomicClassifierList: This field specifies a list of taxonomic classifiers that place this
article into a hierarchy of articles. The individual terms of each
taxonomic classifier are separated with the ‘/’ character.
<br>
Examples Include:
<ol>
<li>Top/Features/Travel/Guides/Destinations/North America/United
States/Arizona</li>
<li>Top/News/U.S./Rockies</li>
<li>Top/Opinion</li>
</ol>
- titleList: This field specifies a list of authored works that correspond to works
mentioned in the article.
<br>
Examples Include:
<ol>
<li>Greystoke: The Legend of Tarzan, Lord of the Apes (Movie)</li>
<li>Law and Order (TV Program)</li>
<li>BATTLEFIELD EARTH (BOOK)</li>
</ol>
- typesOfMaterialList: This field specifies a normalized list of terms describing the general
editorial category of the article.
<br>
Examples Include:
<ol>
<li>REVIEW</li>
<li>OBITUARY</li>
<li>ANALYSIS</li>
</ol>
- url: This field specifies the location of the online version of the article. The
“Alternative Url” field is preferred to this field on articles published
on or after April 02, 2006, as the linked page will have richer content.
- wordCount: This field specifies the number of words in the body of the article,
including the lead paragraph.

read(iprot)
validate()
write(oprot)
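The date and taxonomy fields above can be consumed with the standard library alone. A minimal sketch, assuming the metadata values have already been read into a plain dict keyed by the attribute names listed above; the dict, its made-up values, and both helpers are hypothetical, not part of concrete-python:

```python
import datetime


def publication_date(meta):
    """Build a date from publicationYear/Month/DayOfMonth (hypothetical helper)."""
    # Month is 1-12 (1 = January) and day is 1-31, as documented above.
    return datetime.date(meta["publicationYear"],
                         meta["publicationMonth"],
                         meta["publicationDayOfMonth"])


def taxonomy_terms(classifier):
    """Split one taxonomic classifier into its hierarchy terms on '/'."""
    return classifier.split("/")


# Made-up metadata values, for illustration only.
meta = {"publicationYear": 1999, "publicationMonth": 3, "publicationDayOfMonth": 14}
print(publication_date(meta))                   # 1999-03-14
print(taxonomy_terms("Top/News/U.S./Rockies"))  # ['Top', 'News', 'U.S.', 'Rockies']
```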

concrete.search package

concrete.search.FeedbackService module
class concrete.search.FeedbackService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.search.FeedbackService.Iface

addCommunicationFeedback(searchResultsId, communicationId, feedback)

Provide feedback on the relevance of a particular communication to a search

Parameters:
- searchResultsId
- communicationId
- feedback

addSentenceFeedback(searchResultsId, communicationId, sentenceId, feedback)

Provide feedback on the relevance of a particular sentence to a search

Parameters:
- searchResultsId
- communicationId
- sentenceId
- feedback

recv_addCommunicationFeedback()
recv_addSentenceFeedback()
recv_startFeedback()
send_addCommunicationFeedback(searchResultsId, communicationId, feedback)
send_addSentenceFeedback(searchResultsId, communicationId, sentenceId, feedback)
send_startFeedback(results)
startFeedback(results)

Start providing feedback for the specified SearchResults.
This causes the search and its results to be persisted.

Parameters:
- results

class concrete.search.FeedbackService.Iface

Bases: concrete.services.Service.Iface

addCommunicationFeedback(searchResultsId, communicationId, feedback)

Provide feedback on the relevance of a particular communication to a search

Parameters:
- searchResultsId
- communicationId
- feedback

addSentenceFeedback(searchResultsId, communicationId, sentenceId, feedback)

Provide feedback on the relevance of a particular sentence to a search

Parameters:
- searchResultsId
- communicationId
- sentenceId
- feedback

startFeedback(results)

Start providing feedback for the specified SearchResults.
This causes the search and its results to be persisted.

Parameters:
- results
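A server for this interface supplies a handler implementing the Iface methods, which the Processor dispatches to. The sketch below is a hypothetical in-memory handler with the same method names and arguments. It mirrors the SearchFeedback values (NEGATIVE = -1, NONE = 0, POSITIVE = 1) documented under concrete.search.ttypes as plain constants and keys searches by a string instead of a Concrete UUID, so it runs without concrete-python installed. A real handler would subclass concrete.search.FeedbackService.Iface and report errors through the exceptions the service declares rather than KeyError/ValueError.

```python
from types import SimpleNamespace

# Mirrors concrete.search.ttypes.SearchFeedback (values as documented).
NEGATIVE, NONE, POSITIVE = -1, 0, 1
VALID_FEEDBACK = {NEGATIVE, NONE, POSITIVE}


class InMemoryFeedbackHandler(object):
    """Hypothetical in-memory stand-in for a FeedbackService.Iface handler."""

    def __init__(self):
        self.searches = {}   # searchResultsId -> results passed to startFeedback
        self.feedback = {}   # searchResultsId -> {(communicationId, sentenceId): value}

    def startFeedback(self, results):
        # "Persist" the search and its results, keyed by the results' uuid.
        self.searches[results.uuid] = results
        self.feedback[results.uuid] = {}

    def _record(self, searchResultsId, key, feedback):
        if searchResultsId not in self.searches:
            raise KeyError("startFeedback was not called for %r" % searchResultsId)
        if feedback not in VALID_FEEDBACK:
            raise ValueError("unknown feedback value %r" % feedback)
        self.feedback[searchResultsId][key] = feedback

    def addCommunicationFeedback(self, searchResultsId, communicationId, feedback):
        self._record(searchResultsId, (communicationId, None), feedback)

    def addSentenceFeedback(self, searchResultsId, communicationId, sentenceId, feedback):
        self._record(searchResultsId, (communicationId, sentenceId), feedback)


handler = InMemoryFeedbackHandler()
handler.startFeedback(SimpleNamespace(uuid="results-1"))
handler.addCommunicationFeedback("results-1", "doc-7", POSITIVE)
handler.addSentenceFeedback("results-1", "doc-7", "sent-3", NEGATIVE)
```

Requiring startFeedback before any other call follows the documented rule that startFeedback is what persists the search and its results.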

class concrete.search.FeedbackService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.search.FeedbackService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_addCommunicationFeedback(seqid, iprot, oprot)
process_addSentenceFeedback(seqid, iprot, oprot)
process_startFeedback(seqid, iprot, oprot)
class concrete.search.FeedbackService.addCommunicationFeedback_args(searchResultsId=None, communicationId=None, feedback=None)

Bases: object


Attributes:
- searchResultsId
- communicationId
- feedback

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.addCommunicationFeedback_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.addSentenceFeedback_args(searchResultsId=None, communicationId=None, sentenceId=None, feedback=None)

Bases: object


Attributes:
- searchResultsId
- communicationId
- sentenceId
- feedback

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.addSentenceFeedback_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.startFeedback_args(results=None)

Bases: object


Attributes:
- results

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.startFeedback_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
concrete.search.SearchProxyService module
class concrete.search.SearchProxyService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.search.SearchProxyService.Iface


The search proxy service provides a single interface to multiple search providers

getCapabilities(provider)

Get a list of search type and language pairs for a search provider

Parameters:
- provider

getCorpora(provider)

Get a corpus list for a search provider

Parameters:
- provider

getProviders()

Get a list of search providers behind the proxy

recv_getCapabilities()
recv_getCorpora()
recv_getProviders()
search(query, provider)

Specify the search provider when performing a search

Parameters:
- query
- provider

send_getCapabilities(provider)
send_getCorpora(provider)
send_getProviders()
class concrete.search.SearchProxyService.Iface

Bases: concrete.services.Service.Iface


The search proxy service provides a single interface to multiple search providers

getCapabilities(provider)

Get a list of search type and language pairs for a search provider

Parameters:
- provider

getCorpora(provider)

Get a corpus list for a search provider

Parameters:
- provider

getProviders()

Get a list of search providers behind the proxy

search(query, provider)

Specify the search provider when performing a search

Parameters:
- query
- provider
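Because the proxy fronts several providers, its handler is essentially a dispatcher keyed by provider name. A minimal sketch, assuming each backend exposes search/getCapabilities/getCorpora with the SearchService signatures. The backend class, provider names, tuple-shaped capabilities, and the use of ValueError for an unknown provider are all hypothetical.

```python
class EchoBackend(object):
    """Hypothetical provider exposing SearchService-style methods."""

    def __init__(self, name, corpora):
        self.name = name
        self.corpora = corpora

    def getCapabilities(self):
        # Stand-in for a list of SearchCapability(type=..., lang=...) structs.
        return [("COMMUNICATIONS", "eng")]

    def getCorpora(self):
        return list(self.corpora)

    def search(self, query):
        return "%s results for %r" % (self.name, query)


class SearchProxyHandler(object):
    """Hypothetical SearchProxyService handler that routes calls by provider."""

    def __init__(self, providers):
        self.providers = dict(providers)  # provider name -> backend

    def _backend(self, provider):
        if provider not in self.providers:
            raise ValueError("unknown search provider: %r" % provider)
        return self.providers[provider]

    def getProviders(self):
        return sorted(self.providers)

    def getCapabilities(self, provider):
        return self._backend(provider).getCapabilities()

    def getCorpora(self, provider):
        return self._backend(provider).getCorpora()

    def search(self, query, provider):
        return self._backend(provider).search(query)


proxy = SearchProxyHandler({"lucene": EchoBackend("lucene", ["nyt"]),
                            "solr": EchoBackend("solr", ["gigaword"])})
print(proxy.getProviders())      # ['lucene', 'solr']
print(proxy.getCorpora("solr"))  # ['gigaword']
```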

class concrete.search.SearchProxyService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.search.SearchProxyService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_getCapabilities(seqid, iprot, oprot)
process_getCorpora(seqid, iprot, oprot)
process_getProviders(seqid, iprot, oprot)
class concrete.search.SearchProxyService.getCapabilities_args(provider=None)

Bases: object


Attributes:
- provider

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getCapabilities_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getCorpora_args(provider=None)

Bases: object


Attributes:
- provider

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getCorpora_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getProviders_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getProviders_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.search_args(query=None, provider=None)

Bases: object


Attributes:
- query
- provider

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.search_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
concrete.search.SearchService module
class concrete.search.SearchService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.search.SearchService.Iface

getCapabilities()

Get a list of search type-language pairs

getCorpora()

Get a corpus list from the search provider

recv_getCapabilities()
recv_getCorpora()
search(query)

Perform a search specified by the query

Parameters:
- query

send_getCapabilities()
send_getCorpora()
class concrete.search.SearchService.Iface

Bases: concrete.services.Service.Iface

getCapabilities()

Get a list of search type-language pairs

getCorpora()

Get a corpus list from the search provider

search(query)

Perform a search specified by the query

Parameters:
- query
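A search provider implements search(query) by scoring candidates and returning at most k of them, best first. The sketch below is a hypothetical term-count scorer over an in-memory corpus. It uses a dict and (communicationId, score) tuples in place of SearchQuery, SearchResult, and SearchResultItem, so it illustrates the contract rather than concrete-python's API.

```python
import re


def keyword_search(corpus, terms, k=None):
    """Score each document by how often the query terms occur in it.

    corpus: dict of communicationId -> text (hypothetical stand-in data).
    terms:  lower-case single words, like a pre-tokenized SearchQuery.terms.
    k:      maximum number of candidates to return (None = no limit).
    """
    scored = []
    for comm_id, text in corpus.items():
        tokens = re.findall(r"\w+", text.lower())
        score = sum(tokens.count(t) for t in terms)
        if score > 0:
            scored.append((comm_id, float(score)))
    # Best to worst, as searchResultItems is assumed to be ordered;
    # ties are broken by id so the output is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored if k is None else scored[:k]


corpus = {
    "doc-1": "Madrid is the capital of Spain.",
    "doc-2": "Spain borders Portugal; Spain and Portugal share a peninsula.",
    "doc-3": "Blue cheese pairs well with pears.",
}
print(keyword_search(corpus, ["spain"], k=1))  # [('doc-2', 2.0)]
```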

class concrete.search.SearchService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.search.SearchService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_getCapabilities(seqid, iprot, oprot)
process_getCorpora(seqid, iprot, oprot)
class concrete.search.SearchService.getCapabilities_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.getCapabilities_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.getCorpora_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.getCorpora_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.search_args(query=None)

Bases: object


Attributes:
- query

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.search_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.ttypes.SearchCapability(type=None, lang=None)

Bases: object


A search provider describes its capabilities with a list of search type and language pairs.

Attributes:
- type: A type of search supported by the search provider
- lang: Language that the search provider supports.
Use ISO 639-2/T three-letter codes.

read(iprot)
validate()
write(oprot)
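Since capabilities are advertised as (type, lang) pairs, a client can check whether a provider supports a query before sending it. A minimal sketch, assuming capabilities have been converted to tuples of a SearchType name and an ISO 639-2/T code; the helper and the sample list are hypothetical, and the code check only validates the shape of the code, not its presence in the ISO registry.

```python
import re

# Three lower-case letters, e.g. "eng", "spa". Shape check only.
ISO_639_2_SHAPE = re.compile(r"[a-z]{3}")


def supports(capabilities, search_type, lang):
    """Return True if any advertised (type, lang) pair matches exactly."""
    if not ISO_639_2_SHAPE.fullmatch(lang):
        raise ValueError("expected a three-letter ISO 639-2/T code: %r" % lang)
    return (search_type, lang) in set(capabilities)


caps = [("COMMUNICATIONS", "eng"), ("SENTENCES", "eng"), ("COMMUNICATIONS", "spa")]
print(supports(caps, "SENTENCES", "spa"))  # False
```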
class concrete.search.ttypes.SearchFeedback

Bases: object


Feedback values

NEGATIVE = -1
NONE = 0
POSITIVE = 1
class concrete.search.ttypes.SearchQuery(terms=None, questions=None, communicationId=None, tokens=None, rawQuery=None, auths=None, userId=None, name=None, labels=None, type=None, lang=None, corpus=None, k=None, communication=None)

Bases: object


Wrapper for information relevant to a (possibly structured) search.

Attributes:
- terms: Individual words, or multiword phrases, e.g., ‘dog’, ‘blue
cheese’. It is the responsibility of the implementation of
Search* to tokenize multiword phrases, if so desired. Further,
an implementation may choose to support advanced features such as
wildcards, e.g.: ‘blue*’. This specification makes no
commitment as to the internal structure of keywords and their
semantics: that is the responsibility of the individual
implementation.
- questions: e.g., “what is the capital of spain?”

questions is a list so that different phrasings of the same
question can be included, e.g.: “what is the name of spain’s
capital?”
- communicationId: Refers to an optional communication that can provide context for the query.
- tokens: Refers to a sequence of tokens in the communication referenced by communicationId.
- rawQuery: The input from the user provided in the search box, unmodified
- auths: optional authorization mechanism
- userId: Identifies the user who submitted the search query
- name: Human readable name of the query.
- labels: Properties of the query or user.
These labels can be used to group queries and results by a domain or group of
users for training. An example usage would be assigning the geographical region
as a label (“spain”). User labels could be based on organizational units (“hltcoe”).
- type: The type of data this search is over (e.g., communications, sentences, entities)
- lang: The language of the corpus that the user wants to search.
Use ISO 639-2/T three-letter codes.
- corpus: An identifier of the corpus that the search is to be performed over.
- k: The maximum number of candidates the search service should return.
- communication: An optional communication used as context for the query.
If both this field and communicationId are populated, then it is
assumed that the ID of this communication is the same as communicationId.

read(iprot)
validate()
write(oprot)
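The terms attribute leaves tokenization of multiword phrases and support for wildcards such as ‘blue*’ to each implementation. One possible policy, sketched here with the standard library's fnmatch, is to split a phrase into words and require them to match consecutive tokens. The helper and its whitespace tokenization are assumptions, not the behavior of any concrete-python service.

```python
import fnmatch


def term_matches(term, text):
    """Return True if the (possibly multiword, possibly wildcarded) term
    occurs as a run of consecutive whitespace-separated tokens in text."""
    pattern = term.lower().split()
    if not pattern:
        return False
    tokens = text.lower().split()
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        # fnmatchcase gives shell-style wildcards: 'blue*' matches 'bluefin'.
        if all(fnmatch.fnmatchcase(tok, pat) for tok, pat in zip(window, pattern)):
            return True
    return False


text = "I ordered the blue cheese and bluefin tuna"
print(term_matches("blue cheese", text))  # True
print(term_matches("blue*", text))        # True
print(term_matches("cheese blue", text))  # False
```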
class concrete.search.ttypes.SearchResult(uuid=None, searchQuery=None, searchResultItems=None, metadata=None, lang=None)

Bases: object


Single wrapper for results from all the various Search* services.

Attributes:
- uuid: Unique identifier for the results of this search.
- searchQuery: The query that led to this result.
Useful for capturing feedback or building training data.
- searchResultItems: The list is assumed to be sorted best to worst, which should be
reflected by the values contained in the score field of each
SearchResultItem, if that field is populated.
- metadata: Metadata about the system that provided the response. A likely use case for
populating this field is building training data. Presumably
a system will not need/want to return this object in live use.
- lang: The dominant language of the search results.
Use ISO 639-2/T three-letter codes.
Search providers should set this when possible to support downstream processing.
Do not set if it is not known.
If multilingual, use the string “multilingual”.

read(iprot)
validate()
write(oprot)
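Consumers of searchResultItems rely on the best-to-worst ordering. A minimal check, assuming items are represented as (communicationId, score) pairs in which score may be None because the field is optional; the helper is hypothetical.

```python
def is_best_to_worst(items):
    """Return True if the populated scores never increase down the list.

    Items without a score are skipped, since ordering can only be checked
    against scores that are present. Scores may be any real number.
    """
    scores = [score for _, score in items if score is not None]
    return all(a >= b for a, b in zip(scores, scores[1:]))


print(is_best_to_worst([("c1", 3.2), ("c2", None), ("c3", -0.5)]))  # True
print(is_best_to_worst([("c1", 0.1), ("c2", 0.9)]))                 # False
```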
class concrete.search.ttypes.SearchResultItem(communicationId=None, sentenceId=None, score=None, tokens=None, entity=None)

Bases: object


An individual element returned from a search. Most/all methods
will return a communicationId, possibly with an associated score.
For example, if the target element type of the search is Sentence,
then the sentenceId field should be populated.

Attributes:
- communicationId
- sentenceId: The UUID of the returned sentence, which appears in the
communication referenced by communicationId.
- score: Values are not restricted in range (e.g., do not have to be
within [0,1]). Higher is better.

- tokens: If SearchType=ENTITY_MENTIONS then this field should be populated.
Otherwise, this field may be optionally populated in order to
provide a hint to the client as to where to center a
visualization, or the extraction of context, etc.
- entity: If SearchType=ENTITIES then this field should be populated.

read(iprot)
validate()
write(oprot)
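The class description and the tokens and entity comments tie the fields a SearchResultItem should carry to the type of search. A hypothetical validator encoding just those three documented rules, with SearchType values copied from the enum below and items represented as dicts:

```python
# Values copied from concrete.search.ttypes.SearchType.
SENTENCES, ENTITIES, ENTITY_MENTIONS = 2, 3, 4

# Field the documentation says should be populated for each search type.
REQUIRED_FIELD = {
    SENTENCES: "sentenceId",
    ENTITIES: "entity",
    ENTITY_MENTIONS: "tokens",
}


def missing_field(item, search_type):
    """Return the name of the expected-but-unset field, or None if all is well."""
    required = REQUIRED_FIELD.get(search_type)
    if required is not None and item.get(required) is None:
        return required
    return None


print(missing_field({"communicationId": "c1", "sentenceId": None}, SENTENCES))  # sentenceId
print(missing_field({"communicationId": "c1"}, 0))  # None (COMMUNICATIONS)
```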
class concrete.search.ttypes.SearchType

Bases: object


What are we searching over

COMMUNICATIONS = 0
ENTITIES = 3
ENTITY_MENTIONS = 4
SECTIONS = 1
SENTENCES = 2
SITUATIONS = 5
SITUATION_MENTIONS = 6

concrete.services package

concrete.services.results.ResultsServerService module
class concrete.services.results.ResultsServerService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.services.results.ResultsServerService.Iface

getLatestSearchResult(userId)

Get the most recent search results for a user

Parameters:
- userId

getNextChunk(sessionId)

Get next chunk of data to annotate
The client should use the Retriever service to access the data

Parameters:
- sessionId

getSearchResult(searchResultId)

Get a search result object

Parameters:
- searchResultId

getSearchResults(taskType, limit)

Get a list of search results for a particular annotation task
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- limit

getSearchResultsByUser(taskType, userId, limit)

Get a list of search results for a particular annotation task filtered by a user id
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- userId
- limit

recv_getLatestSearchResult()
recv_getNextChunk()
recv_getSearchResult()
recv_getSearchResults()
recv_getSearchResultsByUser()
recv_registerSearchResult()
recv_startSession()
recv_stopSession()
recv_submitAnnotation()
registerSearchResult(result, taskType)

Register the specified search result for annotation.

If a name has not been assigned to the search query, one will be generated.
This service also requires that the userId field be populated in the SearchQuery.

Parameters:
- result
- taskType

send_getLatestSearchResult(userId)
send_getNextChunk(sessionId)
send_getSearchResult(searchResultId)
send_getSearchResults(taskType, limit)
send_getSearchResultsByUser(taskType, userId, limit)
send_registerSearchResult(result, taskType)
send_startSession(searchResultId, taskType)
send_stopSession(sessionId)
send_submitAnnotation(sessionId, unitId, communication)
startSession(searchResultId, taskType)

Start an annotation session
Returns a session id used in future session calls

Parameters:
- searchResultId
- taskType

stopSession(sessionId)

Stop an annotation session

Parameters:
- sessionId

submitAnnotation(sessionId, unitId, communication)

Submit an annotation for a session

Parameters:
- sessionId
- unitId
- communication

class concrete.services.results.ResultsServerService.Iface

Bases: concrete.services.Service.Iface

getLatestSearchResult(userId)

Get the most recent search results for a user

Parameters:
- userId

getNextChunk(sessionId)

Get next chunk of data to annotate
The client should use the Retriever service to access the data

Parameters:
- sessionId

getSearchResult(searchResultId)

Get a search result object

Parameters:
- searchResultId

getSearchResults(taskType, limit)

Get a list of search results for a particular annotation task
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- limit

getSearchResultsByUser(taskType, userId, limit)

Get a list of search results for a particular annotation task filtered by a user id
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- userId
- limit

registerSearchResult(result, taskType)

Register the specified search result for annotation.

If a name has not been assigned to the search query, one will be generated.
This service also requires that the userId field be populated in the SearchQuery.

Parameters:
- result
- taskType

startSession(searchResultId, taskType)

Start an annotation session
Returns a session id used in future session calls

Parameters:
- searchResultId
- taskType

stopSession(sessionId)

Stop an annotation session

Parameters:
- sessionId

submitAnnotation(sessionId, unitId, communication)

Submit an annotation for a session

Parameters:
- sessionId
- unitId
- communication
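The methods above imply a workflow: register a search result for a task, start a session on it, repeatedly fetch the next chunk and submit an annotation for each unit, then stop the session. The in-memory handler below is a hypothetical sketch of that flow. It assumes a chunk is a list of unit identifiers and that an exhausted session returns an empty chunk; the Thrift types are replaced by dicts and strings, and the generated-name scheme is invented. It also shows the documented limit = 0 convention of getSearchResults.

```python
import itertools
import uuid


class InMemoryResultsServer(object):
    """Hypothetical stand-in for a ResultsServerService handler."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size
        self.registered = []     # (taskType, result) in registration order
        self.sessions = {}       # sessionId -> iterator over unit ids
        self.annotations = []    # (sessionId, unitId, communication)
        self._names = itertools.count(1)

    def registerSearchResult(self, result, taskType):
        query = result["searchQuery"]
        if not query.get("userId"):
            raise ValueError("the SearchQuery must have its userId populated")
        if not query.get("name"):
            query["name"] = "query-%d" % next(self._names)  # generated name
        self.registered.append((taskType, result))

    def getSearchResults(self, taskType, limit):
        matches = [r for t, r in self.registered if t == taskType]
        return matches if limit == 0 else matches[:limit]  # 0 means "all"

    def startSession(self, searchResultId, taskType):
        result = next(r for t, r in self.registered
                      if t == taskType and r["uuid"] == searchResultId)
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = iter(result["items"])
        return session_id

    def getNextChunk(self, sessionId):
        # An empty list signals that nothing is left to annotate.
        return list(itertools.islice(self.sessions[sessionId], self.chunk_size))

    def submitAnnotation(self, sessionId, unitId, communication):
        self.annotations.append((sessionId, unitId, communication))

    def stopSession(self, sessionId):
        del self.sessions[sessionId]


TRANSLATION = 1  # value copied from AnnotationTaskType
server = InMemoryResultsServer()
server.registerSearchResult({"uuid": "r1", "searchQuery": {"userId": "alice"},
                             "items": ["c1", "c2", "c3"]}, TRANSLATION)
session = server.startSession("r1", TRANSLATION)
chunk = server.getNextChunk(session)
while chunk:
    for unit in chunk:
        server.submitAnnotation(session, unit, "annotated " + unit)
    chunk = server.getNextChunk(session)
server.stopSession(session)
print(len(server.annotations))  # 3
```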

class concrete.services.results.ResultsServerService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.services.results.ResultsServerService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_getLatestSearchResult(seqid, iprot, oprot)
process_getNextChunk(seqid, iprot, oprot)
process_getSearchResult(seqid, iprot, oprot)
process_getSearchResults(seqid, iprot, oprot)
process_getSearchResultsByUser(seqid, iprot, oprot)
process_registerSearchResult(seqid, iprot, oprot)
process_startSession(seqid, iprot, oprot)
process_stopSession(seqid, iprot, oprot)
process_submitAnnotation(seqid, iprot, oprot)
class concrete.services.results.ResultsServerService.getLatestSearchResult_args(userId=None)

Bases: object


Attributes:
- userId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getLatestSearchResult_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getNextChunk_args(sessionId=None)

Bases: object


Attributes:
- sessionId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getNextChunk_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResult_args(searchResultId=None)

Bases: object


Attributes:
- searchResultId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResult_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResultsByUser_args(taskType=None, userId=None, limit=None)

Bases: object


Attributes:
- taskType
- userId
- limit

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResultsByUser_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResults_args(taskType=None, limit=None)

Bases: object


Attributes:
- taskType
- limit

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResults_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.registerSearchResult_args(result=None, taskType=None)

Bases: object


Attributes:
- result
- taskType

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.registerSearchResult_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.startSession_args(searchResultId=None, taskType=None)

Bases: object


Attributes:
- searchResultId
- taskType

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.startSession_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.stopSession_args(sessionId=None)

Bases: object


Attributes:
- sessionId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.stopSession_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.submitAnnotation_args(sessionId=None, unitId=None, communication=None)

Bases: object


Attributes:
- sessionId
- unitId
- communication

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.submitAnnotation_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
concrete.services.Service module
class concrete.services.Service.Client(iprot, oprot=None)

Bases: concrete.services.Service.Iface


Base service that all other services should inherit from

about()

Get information about the service

alive()

Is the service alive?

recv_about()
recv_alive()
send_about()
send_alive()
class concrete.services.Service.Iface

Bases: object


Base service that all other services should inherit from

about()

Get information about the service

alive()

Is the service alive?
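Every Concrete service extends Service, so every handler also has to answer about() and alive(). A hypothetical sketch of sharing those two methods through a base class; ServiceInfo is replaced by a namedtuple with the same three fields, and the service name, version, and description are made up. A real handler would subclass the generated Iface and return a concrete.services.ttypes.ServiceInfo.

```python
from collections import namedtuple

# Same field names as concrete.services.ttypes.ServiceInfo.
ServiceInfo = namedtuple("ServiceInfo", ["name", "version", "description"])


class BaseHandler(object):
    """Hypothetical base class implementing the two Service methods."""

    NAME = "unnamed"
    VERSION = "0.0.0"
    DESCRIPTION = ""

    def about(self):
        return ServiceInfo(self.NAME, self.VERSION, self.DESCRIPTION)

    def alive(self):
        # A real service might check its dependencies before answering.
        return True


class MySearchHandler(BaseHandler):
    """Hypothetical search handler; inherits about() and alive()."""

    NAME = "my-search"
    VERSION = "1.2.0"
    DESCRIPTION = "Toy keyword search"

    def search(self, query):
        return []


h = MySearchHandler()
print(h.about().version)  # 1.2.0
print(h.alive())          # True
```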

class concrete.services.Service.Processor(handler)

Bases: concrete.services.Service.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_about(seqid, iprot, oprot)
process_alive(seqid, iprot, oprot)
class concrete.services.Service.about_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.services.Service.about_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.services.Service.alive_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.services.Service.alive_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.services.ttypes.AnnotationTaskType

Bases: object


Annotation task types

NER = 2
TOPICID = 3
TRANSLATION = 1
class concrete.services.ttypes.AnnotationUnitIdentifier(communicationId=None, sentenceId=None)

Bases: object


An annotation unit is the part of the communication to be annotated.
It can be the entire communication or a particular sentence in the communication.
If sentenceId is null, the unit is the entire communication.

Attributes:
- communicationId: Communication identifier for loading data
- sentenceId: Sentence identifier if annotating sentences

read(iprot)
validate()
write(oprot)
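The rule above (a null sentenceId means the whole communication is the unit) maps directly onto the AnnotationUnitType values. A minimal sketch with those values copied as constants; the helper is hypothetical.

```python
# Values copied from concrete.services.ttypes.AnnotationUnitType.
COMMUNICATION, SENTENCE = 1, 2


def unit_type(communicationId, sentenceId=None):
    """Classify an annotation unit: no sentenceId means the whole communication."""
    return COMMUNICATION if sentenceId is None else SENTENCE


print(unit_type("comm-1"))            # 1
print(unit_type("comm-1", "sent-4"))  # 2
```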
class concrete.services.ttypes.AnnotationUnitType

Bases: object


An annotation unit is the part of the communication to be annotated.

COMMUNICATION = 1
SENTENCE = 2
class concrete.services.ttypes.AsyncContactInfo(host=None, port=None)

Bases: object


Contact information for asynchronous communication.
When a client contacts a server for a job that takes a significant amount of time,
it is often best to implement this asynchronously.
We do this by having the client stand up a server to accept the results and
passing that information to the original server.
The server may want to create a new thrift client on every request or maintain
a pool of clients for reuse.

Attributes:
- host
- port

read(iprot)
validate()
write(oprot)
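The description leaves the connection strategy to the server: a new client per request, or a pool of clients for reuse. A sketch of the pooled option, assuming a caller-supplied factory that opens a client for a given host and port. The pool class and factory are hypothetical; in practice the factory would build a Thrift client over a transport to that address.

```python
import threading


class ClientPool(object):
    """Hypothetical cache of result-delivery clients keyed by (host, port)."""

    def __init__(self, factory):
        self._factory = factory   # callable (host, port) -> client
        self._clients = {}
        self._lock = threading.Lock()

    def get(self, host, port):
        key = (host, port)
        with self._lock:
            if key not in self._clients:
                self._clients[key] = self._factory(host, port)
            return self._clients[key]


opened = []


def fake_factory(host, port):
    # Records each connection attempt so reuse can be observed.
    opened.append((host, port))
    return object()


pool = ClientPool(fake_factory)
a = pool.get("localhost", 9090)
b = pool.get("localhost", 9090)
print(a is b, len(opened))  # True 1
```

The lock keeps two concurrent requests for the same contact info from opening duplicate clients; a production pool would also need to evict clients whose connections have failed.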
exception concrete.services.ttypes.NotImplementedException(message=None, serEx=None)

Bases: thrift.Thrift.TException


An exception to be used when an invoked method has
not been implemented by the service.

Attributes:
- message: The explanation (why the exception occurred)
- serEx: The serialized exception

read(iprot)
validate()
write(oprot)
class concrete.services.ttypes.ServiceInfo(name=None, version=None, description=None)

Bases: object


Each service is described by this info struct.
It is for human consumption and for records of versions in deployments.

Attributes:
- name: Name of the service
- version: Version string of the service.
It is preferred that the services implement semantic versioning: http://semver.org/
with version strings like x.y.z
- description: Description of the service

read(iprot)
validate()
write(oprot)
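Since version strings are preferably semantic versions like x.y.z, a deployment script might check ServiceInfo.version before recording it. A simplified sketch: the regular expression enforces MAJOR.MINOR.PATCH without leading zeros and accepts optional pre-release and build suffixes, but only loosely checks those suffixes against the semver.org rules. The helper is hypothetical.

```python
import re

# MAJOR.MINOR.PATCH without leading zeros, then optional "-pre-release" and
# "+build" suffixes. The suffixes are only loosely checked.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?"
    r"(?:\+[0-9A-Za-z.-]+)?"
)


def parse_version(version):
    """Return (major, minor, patch), or raise ValueError if not semver-shaped."""
    match = SEMVER.fullmatch(version)
    if match is None:
        raise ValueError("not a semantic version: %r" % version)
    return tuple(int(part) for part in match.groups())


print(parse_version("4.12.0"))      # (4, 12, 0)
print(parse_version("1.0.0-rc.1"))  # (1, 0, 0)
```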
exception concrete.services.ttypes.ServicesException(message=None, serEx=None)

Bases: thrift.Thrift.TException


An exception to be used with Concrete services.

Attributes:
- message: The explanation (why the exception occurred)
- serEx: The serialized exception

read(iprot)
validate()
write(oprot)

concrete.situations package

class concrete.situations.ttypes.Argument(role=None, entityId=None, situationId=None, propertyList=None)

Bases: object


A situation argument, consisting of an argument role and a value.
Argument values may be Entities or Situations.

Attributes:
- role: The relationship between this argument and the situation that
owns it. The roles that a situation’s arguments can take
depend on the type of the situation (including subtype
information, such as event_type).
- entityId: A pointer to the value of this argument, if it is explicitly
encoded as an Entity.
- situationId: A pointer to the value of this argument, if it is a situation.
- propertyList: For the BinarySRL task, there may be situations
where more than one property is attached to a single
participant. A list of these properties can be stored in this field.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.Justification(justificationType=None, mentionId=None, tokenRefSeqList=None)

Bases: object


Attributes:
- justificationType: An enumerated value used to describe the way in which the
justification’s mention provides supporting evidence for the
situation.
- mentionId: A pointer to the SituationMention itself.
- tokenRefSeqList: An optional list of pointers to tokens that are (especially)
relevant to the way in which this mention provides
justification for the situation. It is left up to individual
analytics to decide what tokens (if any) they wish to include
in this field.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.MentionArgument(role=None, entityMentionId=None, situationMentionId=None, tokens=None, constituent=None, confidence=None, propertyList=None)

Bases: object


A “concrete” argument, that may be used by SituationMentions or EntityMentions
to avoid conflicts where abstract Arguments were being used to support concrete Mentions.

Attributes:
- role: The relationship between this argument and the situation that
owns it. The roles that a situation’s arguments can take
depend on the type of the situation (including subtype
information, such as event_type).
- entityMentionId: A pointer to the value of an EntityMention, if this is being used to support
an EntityMention.
- situationMentionId: A pointer to the value of this argument, if it is a SituationMention.
- tokens: The location of this MentionArgument in the Communication.
If this MentionArgument can be identified in a document using an
EntityMention or SituationMention, then UUID references to those
types should be preferred and this field left as null.
- constituent: An alternative way to specify the same thing as tokens.
- confidence: Confidence of this argument belonging to its SituationMention
- propertyList: For the BinarySRL task, there may be situations
where more than one property is attached to a single
participant. A list of these properties can be stored in this field.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.Property(value=None, metadata=None, polarity=None)

Bases: object


Attached to Arguments to support situations where
a ‘participant’ has more than one ‘property’ (in BinarySRL terms),
whereas Arguments notionally only support one Role.

Attributes:
- value: The required value of the property.
- metadata: Metadata to support this particular property object.
- polarity: This value is typically boolean, 0.0 or 1.0, but we use a
float in order to potentially capture cases where an annotator is
highly confident that the value is underspecified, via a value of
0.5.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.Situation(uuid=None, situationType=None, situationKind=None, argumentList=None, mentionIdList=None, justificationList=None, timeML=None, intensity=None, polarity=None, confidence=None)

Bases: object


A single situation, along with pointers to situation mentions that
provide evidence for the situation. “Situations” include events,
relations, facts, sentiments, and beliefs. Each situation has a
core type (such as EVENT or SENTIMENT), along with an optional
subtype based on its core type (e.g., event_type=CONTACT_MEET), and
a set of zero or more unordered arguments.

This struct may be used for a variety of “processed” Situations such
as (but not limited to):
- SituationMentions which have been collapsed into a coreferential cluster
- Situations which are inferred and not directly supported by a textual mention

Attributes:
- uuid: Unique identifier for this situation.
- situationType: The core type of this situation (e.g., EVENT or SENTIMENT),
or a coarse-grained situation type.
- situationKind: A fine-grained situation type that specifically describes the
situation based on situationType above. It allows for more
detailed description of the situation.

Some examples:

if situationType == EVENT, the event type for the situation
if situationType == STATE, the state type
if situationType == TEMPORAL_FACT, the temporal fact type

For Propbank, this field should be the predicate lemma and id,
e.g. “strike.02”. For FrameNet, this should be the frame name,
e.g. “Commerce_buy”.

Different and more varied situationTypes may be added
in the future.
- argumentList: The arguments for this situation. Each argument consists of a
role and a value. It is possible for a situation to have
multiple arguments with the same role. Arguments are
unordered.
- mentionIdList: Ids of the mentions of this situation in a communication
(type=SituationMention)
- justificationList: A list of pointers to SituationMentions that provide
justification for this situation. These mentions may be either
direct mentions of the situation, or indirect evidence.
- timeML: A wrapper for TimeML annotations.
- intensity: An “intensity” rating for this situation, typically ranging from
0-1. In the case of SENTIMENT situations, this is used to record
the intensity of the sentiment.
- polarity: The polarity of this situation. In the case of SENTIMENT
situations, this is used to record the polarity of the
sentiment.
- confidence: A confidence score for this individual situation. You can also
set a confidence score for an entire SituationSet using the
SituationSet’s metadata.

read(iprot)
validate()
write(oprot)
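Because arguments are unordered and several may share a role, it is often convenient to index a situation's argumentList by role. A minimal sketch, assuming arguments are available as (role, value) pairs standing in for Argument objects; the helper, role names, and values are made up for illustration.

```python
from collections import defaultdict


def arguments_by_role(argument_list):
    """Group (role, value) pairs by role; a role may map to several values."""
    grouped = defaultdict(list)
    for role, value in argument_list:
        grouped[role].append(value)
    return dict(grouped)


args = [("Buyer", "entity-1"), ("Goods", "entity-2"), ("Buyer", "entity-3")]
print(arguments_by_role(args))
# {'Buyer': ['entity-1', 'entity-3'], 'Goods': ['entity-2']}
```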
class concrete.situations.ttypes.SituationMention(uuid=None, text=None, situationType=None, situationKind=None, argumentList=None, intensity=None, polarity=None, tokens=None, constituent=None, confidence=None)

Bases: object


A concrete mention of a situation, where “situations” include
events, relations, facts, sentiments, and beliefs. Each situation
has a core type (such as EVENT or SENTIMENT), along with an
optional subtype based on its core type (e.g.,
event_type=CONTACT_MEET), and a set of zero or more unordered
arguments.

This struct should be used for most types of SRL labelings
(e.g. Propbank and FrameNet) because they are grounded in text.

Attributes:
- uuid: Unique identifier for this situation.
- text: The text content of this situation mention. This field is
often redundant with the ‘tokens’ field, and may not
be generated by all analytics.
- situationType: The core type of this situation (e.g., EVENT or SENTIMENT),
or a coarse-grained situation type.
- situationKind: A fine-grained situation type that specifically describes the
situation mention based on situationType above. It allows for
more detailed description of the situation mention.

Some examples:

if situationType == EVENT, the event type for the situation mention
if situationType == STATE, the state type for this situation mention

For Propbank, this field should be the predicate lemma and id,
e.g. “strike.02”. For FrameNet, this should be the frame name,
e.g. “Commerce_buy”.

Different and more varied situationTypes may be added
in the future.
- argumentList: The arguments for this situation mention. Each argument
consists of a role and a value. It is possible for a situation
to have multiple arguments with the same role. Arguments are
unordered.
- intensity: An “intensity” rating for the situation, typically ranging from
0 to 1. In the case of SENTIMENT situations, this is used to record
the intensity of the sentiment.
- polarity: The polarity of this situation. In the case of SENTIMENT
situations, this is used to record the polarity of the
sentiment.
- tokens: An optional pointer to tokens that are (especially)
relevant to this situation mention. It is left up to individual
analytics to decide what tokens (if any) they wish to include in
this field. In particular, it is not specified whether the
arguments’ tokens should be included.
- constituent: An alternative way to specify the same thing as tokens.
- confidence: A confidence score for this individual situation mention. You
can also set a confidence score for an entire SituationMentionSet
using the SituationMentionSet’s metadata.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.SituationMentionSet(uuid=None, metadata=None, mentionList=None, linkingList=None)

Bases: object


A theory about the set of situation mentions that are present in a
message. See also: SituationMention

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- mentionList: List of mentions in this set.
- linkingList: Entity linking annotations associated with this SituationMentionSet.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.SituationSet(uuid=None, metadata=None, situationList=None, linkingList=None)

Bases: object


A theory about the set of situations that are present in a
message. See also: Situation

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- situationList: List of situations in this set.
- linkingList: Entity linking annotations associated with this SituationSet.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.TimeML(timeMLClass=None, timeMLTense=None, timeMLAspect=None)

Bases: object


A wrapper for various TimeML annotations.

Attributes:
- timeMLClass: The TimeML class for situations representing TimeML events
- timeMLTense: The TimeML tense for situations representing TimeML events
- timeMLAspect: The TimeML aspect for situations representing TimeML events

read(iprot)
validate()
write(oprot)

concrete.spans package

class concrete.spans.ttypes.AudioSpan(start=None, ending=None)

Bases: object


A span of audio within a single communication, identified by a
pair of time offsets. Time offsets are zero-based.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

Attributes:
- start: Start time (in seconds)
- ending: End time (in seconds)

read(iprot)
validate()
write(oprot)
class concrete.spans.ttypes.TextSpan(start=None, ending=None)

Bases: object


A span of text within a single communication, identified by a pair
of zero-indexed character offsets into a Thrift string. Thrift strings
are encoded using UTF-8, but the offsets are character-based, not
byte-based: a character with a three-byte UTF-8 representation counts
as only one character.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

Attributes:
- start: Start character, inclusive.
- ending: End character, exclusive.

read(iprot)
validate()
write(oprot)
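
Because the offsets count characters rather than UTF-8 bytes, a decoded Python 3 str can be sliced directly with start and ending. The minimal sketch below uses a plain (start, ending) pair rather than the generated TextSpan class so that it runs with only the standard library; span_text is a hypothetical helper, not part of concrete-python.

```python
def span_text(text, start, ending):
    """Return the characters covered by a [start, ending) character-offset span."""
    if not 0 <= start <= ending <= len(text):
        raise ValueError("span (%d, %d) is outside the text" % (start, ending))
    return text[start:ending]


text = "東京 Tokyo"  # each kanji is one character but three UTF-8 bytes
print(span_text(text, 3, 8))          # Tokyo
# The same position expressed as a byte offset is larger: 3 + 3 + 1 = 7.
print(len(text[:3].encode("utf-8")))  # 7
```

Mixing up the two conventions would silently shift every span that follows a multi-byte character, which is why the character-based rule matters for non-ASCII text.
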

concrete.structure package

class concrete.structure.ttypes.Arc(src=None, dst=None, token=None, weight=None)

Bases: object


Type for arcs. For epsilon edges, leave ‘token’ blank.

Attributes:
- src
- dst
- token
- weight

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Constituent(id=None, tag=None, childList=None, headChildIndex=-1, start=None, ending=None)

Bases: object


A single parse constituent (or “phrase”).

Attributes:
- id: A parse-relative identifier for this constituent. Together
with the UUID for a Parse, this can be used to define
pointers to specific constituents.
- tag: A description of this constituency node, e.g. the category “NP”.
For leaf nodes, this should be a word and for pre-terminal nodes
this should be a POS tag.
- childList
- headChildIndex: The index of the head child of this constituent. I.e., the
head child of constituent c is
c.childList[c.headChildIndex]. A value of -1
indicates that no head child was identified.
- start: The first token (inclusive) of this constituent in the
parent Tokenization. This should almost always be populated.
- ending: The last token (exclusive) of this constituent in the
parent Tokenization. This should almost always be populated.

read(iprot)
validate()
write(oprot)
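
The sketch below follows headChildIndex from a root constituent down to its lexical head. It uses plain dictionaries keyed like the Constituent fields instead of the generated class, and it assumes that childList holds the ids of child constituents and that ids are unique within a Parse; find_head_leaf is a hypothetical helper.

```python
def find_head_leaf(constituents, root_id):
    """Follow headChildIndex pointers from root_id down to a leaf.

    Returns the leaf constituent, or None when a constituent on the way has
    children but no identified head child (headChildIndex == -1).
    """
    by_id = {c["id"]: c for c in constituents}
    node = by_id[root_id]
    while node["childList"]:
        if node["headChildIndex"] == -1:
            return None
        node = by_id[node["childList"][node["headChildIndex"]]]
    return node


# (S (NP (NNP John)) (VP (VBD ran))); start/ending are token offsets, ending exclusive.
parse = [
    {"id": 0, "tag": "S", "childList": [1, 4], "headChildIndex": 1, "start": 0, "ending": 2},
    {"id": 1, "tag": "NP", "childList": [2], "headChildIndex": 0, "start": 0, "ending": 1},
    {"id": 2, "tag": "NNP", "childList": [3], "headChildIndex": 0, "start": 0, "ending": 1},
    {"id": 3, "tag": "John", "childList": [], "headChildIndex": -1, "start": 0, "ending": 1},
    {"id": 4, "tag": "VP", "childList": [5], "headChildIndex": 0, "start": 1, "ending": 2},
    {"id": 5, "tag": "VBD", "childList": [6], "headChildIndex": 0, "start": 1, "ending": 2},
    {"id": 6, "tag": "ran", "childList": [], "headChildIndex": -1, "start": 1, "ending": 2},
]
head = find_head_leaf(parse, 0)
print(head["tag"], head["start"])  # ran 1
```
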
class concrete.structure.ttypes.ConstituentRef(parseId=None, constituentIndex=None)

Bases: object


A reference to a Constituent within a Parse.

Attributes:
- parseId: The UUID of the Parse that this Constituent belongs to.
- constituentIndex: The index in the constituent list of this Constituent.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Dependency(gov=-1, dep=None, edgeType=None)

Bases: object


A syntactic edge between two tokens in a tokenized sentence.

Attributes:
- gov: The governor or the head token. 0-indexed.
- dep: The dependent token. 0-indexed.
- edgeType: The relation that holds between gov and dep.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.DependencyParse(uuid=None, metadata=None, dependencyList=None, structureInformation=None)

Bases: object


Represents a dependency parse with typed edges.

Attributes:
- uuid
- metadata
- dependencyList
- structureInformation

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.DependencyParseStructure(isAcyclic=None, isConnected=None, isSingleHeaded=None, isProjective=None)

Bases: object


Information about the structure of a dependency parse.
This information is computable from the list of dependencies,
but this allows the consumer to make (verified) assumptions
about the dependencies being processed.

Attributes:
- isAcyclic: True iff there are no cycles in the dependency graph.
- isConnected: True iff the dependency graph forms a single connected component.
- isSingleHeaded: True iff every node in the dependency parse has at most
one head/parent/governor.
- isProjective: True iff there are no crossing edges in the dependency parse.

read(iprot)
validate()
write(oprot)
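
The flags above can be recomputed from a list of dependencies, which is one way a consumer might verify them. This is a minimal sketch over plain (gov, dep) pairs, not concrete-python code. It assumes that gov == -1 marks attachment to an artificial root placed before the first token (consistent with the gov=-1 default of Dependency, but not stated on this page), and it counts that root as a node when checking connectivity. The name structure_flags is hypothetical.

```python
from itertools import combinations

ROOT = -1  # assumption: gov == -1 attaches a token to an artificial root


def structure_flags(edges, num_tokens):
    """Recompute DependencyParseStructure-style flags from (gov, dep) pairs."""
    deps = [dep for _, dep in edges]
    nodes = set(range(num_tokens)) | {gov for gov, _ in edges} | set(deps)

    # isSingleHeaded: no token appears as a dependent more than once.
    single_headed = len(deps) == len(set(deps))

    # isAcyclic: depth-first search over gov -> dep arcs, looking for a back edge.
    children = {n: [] for n in nodes}
    for gov, dep in edges:
        children[gov].append(dep)
    state = {}  # missing = unvisited, 1 = on the current path, 2 = finished

    def reaches_cycle(node):
        state[node] = 1
        for child in children[node]:
            if state.get(child) == 1:
                return True
            if child not in state and reaches_cycle(child):
                return True
        state[node] = 2
        return False

    acyclic = not any(n not in state and reaches_cycle(n) for n in nodes)

    # isConnected: one undirected component (the artificial root counts as a node).
    connected = True
    if nodes:
        undirected = {n: set() for n in nodes}
        for gov, dep in edges:
            undirected[gov].add(dep)
            undirected[dep].add(gov)
        first = min(nodes)
        seen, stack = {first}, [first]
        while stack:
            for nb in undirected[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        connected = seen == nodes

    # isProjective: no two arcs cross when drawn above the sentence.
    arcs = [(min(g, d), max(g, d)) for g, d in edges]
    projective = not any(a < c < b < e or c < a < e < b
                         for (a, b), (c, e) in combinations(arcs, 2))

    return {"isAcyclic": acyclic, "isConnected": connected,
            "isSingleHeaded": single_headed, "isProjective": projective}


# "John saw Mary": saw (1) is the root; John (0) and Mary (2) depend on it.
print(structure_flags([(ROOT, 1), (1, 0), (1, 2)], 3))  # all four flags are True
```
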
class concrete.structure.ttypes.LatticePath(weight=None, tokenList=None)

Bases: object


Attributes:
- weight
- tokenList

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Parse(uuid=None, metadata=None, constituentList=None)

Bases: object


A theory about the syntactic parse of a sentence.


Note: If we add support for parse forests in the future, then it
will most likely be done by adding a new field (e.g.,
“forest_root”) that uses a new struct type to encode the
forest. A “kind” field might also be added (analogous to
Tokenization.kind) to indicate whether a parse is encoded
using a simple tree or a parse forest.

Attributes:
- uuid
- metadata
- constituentList

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Section(uuid=None, sentenceList=None, textSpan=None, rawTextSpan=None, audioSpan=None, kind=None, label=None, numberList=None, lidList=None)

Bases: object


A single “section” of a communication, such as a paragraph. Each
section is defined using a text or audio span, and can optionally
contain a list of sentences.

Attributes:
- uuid: The unique identifier for this section.
- sentenceList: The sentences of this “section.”
- textSpan: Location of this section in the communication text.

NOTE: This text span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- rawTextSpan: Location of this section in the raw text.

NOTE: This text span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- audioSpan: Location of this section in the original audio.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- kind: A short, sometimes corpus-specific term characterizing the nature
of the section; may change in a future version of concrete. This
often acts as a coarse-grained descriptor that is used for
filtering. For example, Gigaword uses the section kind “passage”
to distinguish content-bearing paragraphs in the body of an
article from other paragraphs, such as the headline and dateline.
- label: The name of the section. For example, a title of a section on
Wikipedia.
- numberList: Position within the communication with respect to other Sections:
the section number, e.g., 3, or 3.1, or 3.1.2, etc. Aimed at
Communications with content organized in a hierarchy, such as a Book
with multiple chapters, then sections, then paragraphs, or even a
dense Wikipedia page with subsections. Sections should still be
arranged linearly, so that reading these numbers is not required
to get a start-to-finish enumeration of the Communication’s content.
- lidList: An optional field to be used for multi-language documents.

This field should be populated when a section is inside of
a document that contains multiple languages.

Minimally, each block of text in one language should be its own
section. For example, if a paragraph is in English and the
paragraph afterwards is in French, these should be separated into
two different sections, allowing language-specific analytics to
run on appropriate sections.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Sentence(uuid=None, tokenization=None, textSpan=None, rawTextSpan=None, audioSpan=None)

Bases: object


A single sentence or utterance in a communication.

Attributes:
- uuid
- tokenization: Theory about the tokens that make up this sentence. For text
communications, these tokenizations will typically be generated
by a tokenizer. For audio communications, these tokenizations
will typically be generated by an automatic speech recognizer.

The “Tokenization” message type is also used to store the output
of machine translation systems and text normalization
systems.
- textSpan: Location of this sentence in the communication text.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- rawTextSpan: Location of this sentence in the raw text.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- audioSpan: Location of this sentence in the original audio.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

read(iprot)
validate()
write(oprot)

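class concrete.structure.ttypes.SpanLink(tokens=None, concreteTarget=None, externalTarget=None, linkType=None)

Bases: object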


A collection of tokens that represent a link to another resource.
This resource might be another Concrete object (e.g., another
Concrete Communication), represented with the ‘concreteTarget’
field, or it could link to a resource outside of Concrete via the
‘externalTarget’ field.

Attributes:
- tokens: The tokens that make up this SpanLink object.
- concreteTarget
- externalTarget
- linkType

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TaggedToken(tokenIndex=None, tag=None, confidence=None, tagList=None, confidenceList=None)

Bases: object


Attributes:
- tokenIndex: A pointer to the token being tagged.

Token indices are 0-based. These indices are also 0-based.
- tag: A string containing the annotation.
If the tag set you are using is not case sensitive,
then all part of speech tags should be normalized to upper case.
- confidence: Confidence of the annotation.
- tagList: A list of strings that represent a distribution of possible
tags for this token.

If populated, the ‘tag’ field should also be populated
with the “best” value from this list.
- confidenceList: A list of doubles that represent confidences associated with
the tags in the ‘tagList’ field.

If populated, the ‘confidence’ field should also be populated
with the confidence associated with the “best” tag in ‘tagList’.

read(iprot)
validate()
write(oprot)
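
When a tagger emits a distribution, tag and confidence should mirror the best entry of tagList and confidenceList. The sketch below illustrates that rule with a dataclass standing in for the generated TaggedToken (whose constructor arguments appear above); from_distribution is a hypothetical helper, and uppercasing is applied only when the caller says the tag set is not case-sensitive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaggedTokenSketch:
    """Stand-in mirroring the TaggedToken fields; not the generated class."""
    tokenIndex: int
    tag: Optional[str] = None
    confidence: Optional[float] = None
    tagList: List[str] = field(default_factory=list)
    confidenceList: List[float] = field(default_factory=list)


def from_distribution(token_index, tags, confidences, case_sensitive=True):
    """Populate tag/confidence with the highest-confidence entry of the distribution."""
    if not tags or len(tags) != len(confidences):
        raise ValueError("tags and confidences must be non-empty and the same length")
    if not case_sensitive:
        tags = [t.upper() for t in tags]  # normalize case-insensitive tag sets
    best = max(range(len(tags)), key=lambda i: confidences[i])
    return TaggedTokenSketch(token_index, tags[best], confidences[best],
                             list(tags), list(confidences))


tok = from_distribution(0, ["nn", "vb"], [0.7, 0.3], case_sensitive=False)
print(tok.tag, tok.confidence)  # NN 0.7
```
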
class concrete.structure.ttypes.Token(tokenIndex=None, text=None, textSpan=None, rawTextSpan=None, audioSpan=None)

Bases: object


A single token (typically a word) in a communication. The exact
definition of what counts as a token is left up to the tools that
generate token sequences.

Usually, each token will include at least a text string.

Attributes:
- tokenIndex: A 0-based tokenization-relative identifier for this token that
represents the order that this token appears in the
sentence. Together with the UUID for a Tokenization, this can be
used to define pointers to specific tokens. If a Tokenization
object contains multiple Token objects with the same id (e.g., in
different n-best lists), then all of their other fields must be
identical as well.
- text: The text associated with this token.
Note: a tokenizer may be destructive (e.g., Stanford rewriting),
which is why this field is maintained.
- textSpan: Location of this token in this perspective’s text (.text field).
In cases where this token does not correspond directly with any
text span in the text (such as word insertion during MT),
this field may be given a value indicating “approximately” where
the token comes from. A span covering the entire sentence may be
used if no more precise value seems appropriate.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the document, but is the annotation’s best
effort at such a representation.
- rawTextSpan: Location of this token in the original, raw text (.originalText
field). In cases where this token does not correspond directly
with any text span in the original text (such as word insertion
during MT), this field may be given a value indicating
“approximately” where the token comes from. A span covering the
entire sentence may be used if no more precise value seems
appropriate.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original raw document, but is the annotation’s best
effort at such a representation.
- audioSpan: Location of this token in the original audio.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TokenLattice(startState=0, endState=0, arcList=None, cachedBestPath=None)

Bases: object


A lattice structure that assigns scores to a set of token
sequences. The lattice is encoded as an FSA, where states are
identified by integers, and each arc is annotated with an
optional token and a weight. (Arcs with no token are
“epsilon” arcs.) The lattice has a single start state and a
single end state. (You can use epsilon edges to simulate
multiple start states or multiple end states, if desired.)

The score of a path through the lattice is the sum of the weights
of the arcs that make up that path. A path with a lower score
is considered “better” than a path with a higher score.

If possible, path scores should be negative log likelihoods
(with base e; e.g., if P=1, then weight=0, and if P=0.5, then
weight≈0.693). Furthermore, if possible, the path scores should
be globally normalized (i.e., they should encode probabilities).
This will allow for them to be combined with other information
in a reasonable way when determining confidences for system
outputs.

TokenLattices should never contain any paths with cycles. Every
arc in the lattice should be included in some path from the start
state to the end state.

Attributes:
- startState
- endState
- arcList
- cachedBestPath

read(iprot)
validate()
write(oprot)
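
Because lower scores are better and the lattice is acyclic, the best path can be found with a single pass over the states in topological order. The sketch below represents arcs as plain (src, dst, token, weight) tuples mirroring the Arc fields, with token set to None for epsilon arcs; best_path is a hypothetical helper, not part of concrete-python. Weights are negative natural-log probabilities, so exp(-score) recovers the path probability.

```python
import math
from collections import defaultdict, deque


def best_path(start, end, arcs):
    """Return (score, tokens) for the lowest-scoring start-to-end path.

    arcs: iterable of (src, dst, token, weight); token is None for epsilon arcs.
    """
    out = defaultdict(list)
    indegree = defaultdict(int)
    states = {start, end}
    for src, dst, token, weight in arcs:
        out[src].append((dst, token, weight))
        indegree[dst] += 1
        states.update((src, dst))

    # Kahn's algorithm: a topological order exists only if the lattice is acyclic.
    queue = deque(s for s in states if indegree[s] == 0)
    order = []
    while queue:
        state = queue.popleft()
        order.append(state)
        for dst, _, _ in out[state]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    if len(order) != len(states):
        raise ValueError("lattice contains a cycle")

    # Relax arcs in topological order, keeping the lowest score per state.
    best = {start: (0.0, [])}
    for state in order:
        if state not in best:
            continue  # not reachable from the start state
        score, tokens = best[state]
        for dst, token, weight in out[state]:
            path = tokens + [token] if token is not None else tokens
            if dst not in best or score + weight < best[dst][0]:
                best[dst] = (score + weight, path)
    if end not in best:
        raise ValueError("end state is unreachable")
    return best[end]


# Weights are negative natural-log probabilities, as recommended above.
arcs = [
    (0, 1, "New", -math.log(0.6)),
    (0, 1, "Knew", -math.log(0.4)),
    (1, 2, "York", -math.log(0.9)),
    (1, 2, None, -math.log(0.1)),  # epsilon arc
]
score, tokens = best_path(0, 2, arcs)
print(tokens, round(math.exp(-score), 3))  # ['New', 'York'] 0.54
```
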
class concrete.structure.ttypes.TokenList(tokenList=None)

Bases: object


A wrapper around a list of tokens.

Attributes:
- tokenList

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TokenRefSequence(tokenIndexList=None, anchorTokenIndex=-1, tokenizationId=None, textSpan=None, rawTextSpan=None, audioSpan=None)

Bases: object


A list of pointers to tokens that all belong to the same
tokenization.

Attributes:
- tokenIndexList: The tokenization-relative identifiers for each token that is
included in this sequence.
- anchorTokenIndex: An optional field that can be used to describe
the root of a sentence (if this sequence is a full sentence),
the head of a constituent (if this sequence is a constituent),
or some other form of “canonical” token in this sequence if,
for instance, it is not easy to map this sequence to another
annotation that has a head.

This field is defined with respect to the Tokenization given
by tokenizationId, and not to this object’s tokenIndexList.
- tokenizationId: The UUID of the tokenization that contains the tokens.
- textSpan: The text span in the main text (.text field) associated with this
TokenRefSequence.

NOTE: This span represents a best guess, or ‘provenance’: it
cannot be guaranteed that this text span matches the _exact_ text
of the original document, but is the annotation’s best effort at
such a representation.
- rawTextSpan: The text span in the original text (.originalText field)
associated with this TokenRefSequence.

NOTE: This span represents a best guess, or ‘provenance’: it
cannot be guaranteed that this text span matches the _exact_ text
of the original raw document, but is the annotation’s best effort
at such a representation.
- audioSpan: The audio span associated with this TokenRefSequence.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

read(iprot)
validate()
write(oprot)
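
Both tokenIndexList and anchorTokenIndex index into the referenced Tokenization, so the anchor is looked up among the tokenization's tokens, not by position in tokenIndexList. The sketch below shows this with plain (tokenIndex, text) pairs; resolve_ref is a hypothetical helper, and it assumes, following the anchorTokenIndex=-1 constructor default, that -1 means no anchor was set.

```python
def resolve_ref(tokens, token_index_list, anchor_token_index=-1):
    """Return (texts, anchor_text) for a TokenRefSequence-style pointer.

    tokens: (tokenIndex, text) pairs from the referenced Tokenization.
    Both token_index_list and anchor_token_index are tokenization-relative.
    """
    text_by_index = dict(tokens)
    texts = [text_by_index[i] for i in token_index_list]
    # Assumption: -1 (the constructor default) means no anchor was set.
    anchor = text_by_index[anchor_token_index] if anchor_token_index != -1 else None
    return texts, anchor


tokenization = list(enumerate(["Yesterday", ",", "the", "old", "man", "left"]))
# The noun phrase "the old man", anchored at its head "man" (tokenIndex 4).
print(resolve_ref(tokenization, [2, 3, 4], anchor_token_index=4))
# (['the', 'old', 'man'], 'man')
```

Reading anchorTokenIndex as a position in tokenIndexList would fail here, since that list has only three entries.
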
class concrete.structure.ttypes.TokenTagging(uuid=None, metadata=None, taggedTokenList=None, taggingType=None)

Bases: object


A theory about some token-level annotation.
The TokenTagging consists of a mapping from tokens
(using token ids) to string tags (e.g. part-of-speech tags or lemmas).

The mapping defined by a TokenTagging may be partial;
i.e., some tokens may not be assigned any part-of-speech tags.

For lattice tokenizations, you may need to create multiple
part-of-speech taggings (for different paths through the lattice),
since the appropriate tag for a given token may depend on the path
taken. For example, you might define a separate
TokenTagging for each of the top K paths, which leaves all
tokens that are not part of the path unlabeled.

Currently, we use strings to encode annotations. In
the future, we may add fields for encoding specific tag sets
(e.g., treebank tags), or for adding compound tags.

Attributes:
- uuid: The UUID of this TokenTagging object.
- metadata: Information about where the annotation came from.
This should be used to distinguish between gold-standard annotations
and automatically generated theories about the data.
- taggedTokenList: The mapping from tokens to annotations.
This may be a partial mapping.
- taggingType: An ontology-backed string that represents the
type of token taggings this TokenTagging object
produces.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Tokenization(uuid=None, metadata=None, tokenList=None, lattice=None, kind=None, tokenTaggingList=None, parseList=None, dependencyParseList=None, spanLinkList=None)

Bases: object


A theory (or set of alternative theories) about the sequence of
tokens that make up a sentence.

This message type is used to record the output not just of
tokenizers, but also of a wide variety of other tools, including
machine translation systems, text normalizers, part-of-speech
taggers, and stemmers.

Each Tokenization is encoded using either a TokenList
or a TokenLattice. (If you want to encode an n-best list, then
you should store it as n separate Tokenization objects.) The
“kind” field is used to indicate whether this Tokenization contains
a list of tokens or a TokenLattice.

The confidence value for each sequence is determined by combining
the confidence from the “metadata” field with confidence
information from individual token sequences as follows:

- For n-best lists: metadata.confidence
- For lattices: metadata.confidence * exp(-sum(arc.weight))

Note: in some cases (such as the output of a machine translation
tool), the order of the tokens in a token sequence may not
correspond with the order of their original text span offsets.

Attributes:
- uuid
- metadata: Information about where this tokenization came from.
- tokenList: A wrapper around an ordered list of the tokens in this tokenization.
This may also give easy access to the “reconstructed text” associated
with this tokenization.
This field should only have a value if kind==TOKEN_LIST.
- lattice: A lattice that compactly describes a set of token sequences that
might make up this tokenization. This field should only have a
value if kind==TOKEN_LATTICE.
- kind: Enumerated value indicating whether this tokenization is
implemented using an n-best list or a lattice.
- tokenTaggingList
- parseList
- dependencyParseList
- spanLinkList

read(iprot)
validate()
write(oprot)
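
The confidence rules listed above can be written as a small function. This sketch takes the metadata confidence and, for lattices, the weights of the arcs on the chosen path as plain numbers; the integer constants mirror TokenizationKind, and sequence_confidence is a hypothetical name.

```python
import math

# Integer values mirror TokenizationKind.
TOKEN_LIST = 1
TOKEN_LATTICE = 2


def sequence_confidence(kind, metadata_confidence, path_arc_weights=()):
    """Confidence of one token sequence in a Tokenization.

    path_arc_weights: weights of the lattice arcs on the sequence's path
    (ignored for TOKEN_LIST tokenizations).
    """
    if kind == TOKEN_LIST:
        return metadata_confidence
    if kind == TOKEN_LATTICE:
        return metadata_confidence * math.exp(-sum(path_arc_weights))
    raise ValueError("unknown TokenizationKind value: %r" % (kind,))


print(sequence_confidence(TOKEN_LIST, 0.9))  # 0.9
# Path probability 0.6 * 0.9 = 0.54, scaled by a metadata confidence of 0.9.
print(round(sequence_confidence(TOKEN_LATTICE, 0.9,
                                [-math.log(0.6), -math.log(0.9)]), 3))  # 0.486
```
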
class concrete.structure.ttypes.TokenizationKind

Bases: object


Enumerated types of Tokenizations

TOKEN_LATTICE = 2
TOKEN_LIST = 1

concrete.summarization package

concrete.summarization.SummarizationService module
class concrete.summarization.SummarizationService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.summarization.SummarizationService.Iface

getCapabilities()
recv_getCapabilities()
recv_summarize()
send_getCapabilities()
send_summarize(query)
summarize(query)

Parameters:
- query

class concrete.summarization.SummarizationService.Iface

Bases: concrete.services.Service.Iface

getCapabilities()
summarize(query)

Parameters:
- query

class concrete.summarization.SummarizationService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.summarization.SummarizationService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_getCapabilities(seqid, iprot, oprot)
process_summarize(seqid, iprot, oprot)
class concrete.summarization.SummarizationService.getCapabilities_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.summarization.SummarizationService.getCapabilities_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.summarization.SummarizationService.summarize_args(query=None)

Bases: object


Attributes:
- query

read(iprot)
validate()
write(oprot)
class concrete.summarization.SummarizationService.summarize_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.summarization.ttypes.SummarizationCapability(type=None, lang=None)

Bases: object


Attributes:
- type
- lang

read(iprot)
validate()
write(oprot)
class concrete.summarization.ttypes.SummarizationRequest(queryTerms=None, maximumTokens=None, maximumCharacters=None, sourceType=None, sourceIds=None, sourceCommunication=None)

Bases: object


A request to summarize which specifies the length of the desired
summary and the text data to be summarized.
Either set sourceCommunication or sourceType and sourceIds.

Attributes:
- queryTerms: Terms or features pertinent to the query.
Can be empty, meaning summarize all source material with
no a priori beliefs about what is important to summarize.
- maximumTokens: Limit on how long the returned summary can be in tokens.
- maximumCharacters: Limit on how long the returned summary can be in characters.
- sourceType: How to interpret the ids in sourceIds.
May be null if sourceIds is null, otherwise must be populated.
- sourceIds: A list of concrete object ids which serve as the material
to summarize.
- sourceCommunication: Alternative to sourceIds+sourceType: provide a Communication
of text to summarize.

read(iprot)
validate()
write(oprot)
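
The source-selection rules above can be checked before a request is sent. The sketch below reads only attributes, so it should also accept a generated SummarizationRequest, but to stay runnable without concrete-python it builds requests with types.SimpleNamespace. Two points are assumptions: it reads "either ... or" as exactly one of the two source options, and it uses placeholder strings for sourceIds, which in practice hold Concrete ids. make_request and request_problems are hypothetical helpers.

```python
from types import SimpleNamespace

FIELDS = ("queryTerms", "maximumTokens", "maximumCharacters",
          "sourceType", "sourceIds", "sourceCommunication")
DOCUMENT = 0  # SummarySourceType.DOCUMENT


def make_request(**kwargs):
    """Hypothetical stand-in for SummarizationRequest: unset fields are None."""
    unknown = set(kwargs) - set(FIELDS)
    if unknown:
        raise TypeError("unknown fields: %s" % sorted(unknown))
    return SimpleNamespace(**{name: kwargs.get(name) for name in FIELDS})


def request_problems(req):
    """List the ways req breaks the source rules (empty list if none)."""
    problems = []
    has_ids = req.sourceIds is not None
    has_comm = req.sourceCommunication is not None
    if has_ids and req.sourceType is None:
        problems.append("sourceType must be populated when sourceIds is set")
    # Assumption: "either ... or" is read as exactly one of the two source options.
    if has_ids == has_comm:
        problems.append("set either sourceCommunication or sourceType + sourceIds")
    return problems


print(request_problems(make_request(sourceType=DOCUMENT, sourceIds=["doc-1"],
                                    maximumTokens=100)))  # []
print(request_problems(make_request(sourceIds=["doc-1"])))
# ['sourceType must be populated when sourceIds is set']
```
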
class concrete.summarization.ttypes.Summary(summaryCommunication=None, concepts=None)

Bases: object


A shortened version of some text, possibly with some concepts
annotated as justifications for why particular pieces of the
summary were kept.

Attributes:
- summaryCommunication: Contains the text of the generated summary.
- concepts: Concepts mentioned in the summary which are believed to be
interesting and/or worth highlighting.

read(iprot)
validate()
write(oprot)
class concrete.summarization.ttypes.SummaryConcept(tokens=None, concept=None, confidence=1, utility=1)

Bases: object


A mention of a concept described in a summary which is thought
to be informative. Concepts might be named entities, facts, or
events which were determined to be salient in the text being
summarized.

Attributes:
- tokens: Location in summaryCommunication of this concept
- concept: Short description of the concept being evoked, e.g. “kbrel:bornIn” or “related:ACME_Corp”
- confidence: How confident is the system that this concept was evoked by this mention, in [0,1]
- utility: How informative/important it is that this concept be included in the summary (non-negative).

read(iprot)
validate()
write(oprot)
class concrete.summarization.ttypes.SummarySourceType

Bases: object

DOCUMENT = 0
ENTITY = 2
TOKENIZATION = 1

concrete.twitter package

class concrete.twitter.ttypes.BoundingBox(type=None, coordinateList=None)

Bases: object


Attributes:
- type
- coordinateList

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.HashTag(text=None, startOffset=None, endOffset=None)

Bases: object


Attributes:
- text
- startOffset
- endOffset

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.PlaceAttributes(streetAddress=None, region=None, locality=None)

Bases: object


Attributes:
- streetAddress
- region
- locality

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TweetInfo(id=None, text=None, createdAt=None, user=None, truncated=None, entities=None, source=None, coordinates=None, place=None, favorited=None, retweeted=None, retweetCount=None, inReplyToScreenName=None, inReplyToStatusId=None, inReplyToUserId=None, retweetedScreenName=None, retweetedStatusId=None, retweetedUserId=None)

Bases: object


Attributes:
- id
- text
- createdAt
- user
- truncated
- entities
- source
- coordinates
- place
- favorited
- retweeted
- retweetCount
- inReplyToScreenName
- inReplyToStatusId
- inReplyToUserId
- retweetedScreenName
- retweetedStatusId
- retweetedUserId

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterCoordinates(type=None, coordinates=None)

Bases: object


Attributes:
- type
- coordinates

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterEntities(hashtagList=None, urlList=None, userMentionList=None)

Bases: object


Attributes:
- hashtagList
- urlList
- userMentionList

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterLatLong(latitude=None, longitude=None)

Bases: object


A Twitter geocoordinate.

Attributes:
- latitude
- longitude

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterPlace(placeType=None, countryCode=None, country=None, fullName=None, name=None, id=None, url=None, boundingBox=None, attributes=None)

Bases: object


Attributes:
- placeType
- countryCode
- country
- fullName
- name
- id
- url
- boundingBox
- attributes

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterUser(id=None, name=None, screenName=None, lang=None, geoEnabled=None, createdAt=None, friendsCount=None, statusesCount=None, verified=None, listedCount=None, favouritesCount=None, followersCount=None, location=None, timeZone=None, description=None, utcOffset=None, url=None)

Bases: object


Information about a Twitter user.

Attributes:
- id
- name
- screenName
- lang
- geoEnabled
- createdAt
- friendsCount
- statusesCount
- verified
- listedCount
- favouritesCount
- followersCount
- location
- timeZone
- description
- utcOffset
- url

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.URL(startOffset=None, endOffset=None, expandedUrl=None, url=None, displayUrl=None)

Bases: object


Attributes:
- startOffset
- endOffset
- expandedUrl
- url
- displayUrl

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.UserMention(startOffset=None, endOffset=None, screenName=None, name=None, id=None)

Bases: object


Attributes:
- startOffset
- endOffset
- screenName
- name
- id

read(iprot)
validate()
write(oprot)

concrete.uuid package

class concrete.uuid.ttypes.UUID(uuidString=None)

Bases: object


Attributes:
- uuidString: A string representation of a UUID, in the format of:
    550e8400-e29b-41d4-a716-446655440000

read(iprot)
validate()
write(oprot)
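
A string in this format can be produced with Python's standard uuid module, as sketched below; with concrete-python installed it could then be wrapped as UUID(uuidString=...), using the constructor shown above. The regular expression only checks the lowercase hexadecimal layout of the example and is an illustrative assumption, not a validation rule that Concrete imposes.

```python
import re
import uuid

# Lowercase 8-4-4-4-12 hexadecimal layout, as in the example above.
UUID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")

uuid_string = str(uuid.uuid4())  # random; str() yields the lowercase hyphenated form
print(uuid_string, bool(UUID_PATTERN.match(uuid_string)))
```
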

Development

Submitting a bug report

Please report any bugs to the GitLab (internal) or GitHub (public) issue trackers. Your issue will be resolved more quickly if you are able to provide a minimal working example, including an example concrete data file if applicable.

Contributing code

  1. Ensure an issue has been created for your new feature/bugfix on GitLab (internal) or GitHub (public).

  2. If you are adding a new feature, create a stub (placeholder implementation) of the desired argument/function/class/etc.

  3. Write a test for your new feature/bugfix and run it, ensuring that it fails on the current implementation:

    py.test tests/test_my_code.py
    

    NameErrors, ImportErrors, SyntaxErrors, etc. do not count (they indicate the API is wrong).

  4. Implement your new feature/bugfix.

  5. Run the test again, ensuring that it now passes.

  6. Run all tests and style checks, ensuring that they pass:

    tox
    

    Optionally, run integration tests (you must have Redis server version 2.8 or later in your path; run redis-server --version to check):

    tox integration-tests
    
  7. If you created a new module (file) or package (directory) in the library, please see “Adding new modules and packages” in the next section.

  8. Push your changes to a feature branch on GitLab/GitHub (e.g., called n-issue-abbrev where n is the issue number and issue-abbrev is a very short abbreviation of the issue title) and ensure that the build passes. The build is defined in .gitlab-ci.yml (.travis.yml and appveyor.yml for public builds); tox is configured in tox.ini. The build includes unit tests, integration tests, and style checks and runs on Python 2.7 and 3.5 across multiple platforms; if it fails, please find the error in the build log, fix it, and try again.

  9. Add a line to CHANGELOG under the current version-in-progress describing your changes simply and concisely. Add yourself to AUTHORS if you are not already listed.

  10. If you’ve made multiple commits, please squash them and git push -f to the feature branch.

  11. Create a merge/pull request for your feature branch into master, referencing the GitLab/GitHub issue.

For maintainers

Adding new modules and packages

If a new module or package is created, either by hand or in the auto-generated code from Thrift, a small amount of additional configuration must be performed.

In either case, the name of the package (if it is a package and not a module) should be added to the packages parameter in setup.py. The name of the package or module should be added to the subpackage or submodule list in docs/concrete.rst, respectively. A new ReStructuredText file should also be created under docs/ for the package or module; follow the conventions set by the other packages and modules.

If the new module or package was written by hand, a guard should be added to autodoc_process_docstring in docs/conf.py so that the module or package is not ignored by the documentation parser. If it is a package, a guard should also be added to generate.bash so that the corresponding directory is not deleted when the auto-generated code is copied into concrete/ from the Thrift build directory.

If a new package was generated by Thrift, a corresponding exclude should be added to the flake8 configuration in setup.cfg, and the new package’s ttypes module should be added to the star imports in concrete/__init__.py. If a new module (not a package) was generated by Thrift, no further action is necessary.

Branches, versions, and releases

The master branch is kept stable at all times. Before a commit is pushed to master, it should be checked by CI on another branch. The recommended way of maintaining this is to do all work in feature branches that are kept up-to-date with master and pushed to GitLab, waiting for CI to finish before merging.

We use zest.releaser to manage versions, the CHANGELOG, and releases. (Making a new release is a many-step process that requires great care; doing so by hand is strongly discouraged.) Using zest.releaser, stable versions are released to PyPI and master is kept on a development version number (so that a stable version number never represents more than one snapshot of the code). To make a new release, install zest.releaser (pip install zest.releaser) and run fullrelease.

Testing PyPI releases

To test how changes to concrete-python will show up on PyPI (for example, how the readme is rendered) you can use the PyPI testing site. To do so, set the following in ~/.pypirc:

repository = https://testpypi.python.org/pypi

You will also need to create a testpypi user account and you may need to request access to the concrete package on testpypi.

Testing documentation

The automated build checks the documentation for syntax errors. When a push is made to the GitHub repository, the online documentation is regenerated automatically. You can run the same validation and generate the HTML documentation locally by running:

tox -e docs

The generated HTML documentation is stored in .tox/docs/tmp/html (relative to the top of your repository). Open the HTML files in this directory in a web browser to check how your changes will look when published online.
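To open the local build from Python, a small helper might build a file:// URI for the documentation's top-level page. This assumes that the page is index.html, the Sphinx default for a root document named index; the text above does not state the file name. docs_index_uri is a hypothetical helper name.

```python
from pathlib import Path


def docs_index_uri(repo_root='.'):
    """Return a file:// URI for the index page of the locally built docs.

    repo_root is the top of the concrete-python checkout. index.html is an
    assumed file name (the Sphinx default), not one confirmed by this guide.
    """
    index = (Path(repo_root).resolve() / '.tox' / 'docs' / 'tmp' / 'html' /
             'index.html')
    return index.as_uri()
```

After running tox -e docs from the top of the repository, webbrowser.open(docs_index_uri()) from Python's standard webbrowser module should open the page.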

(Re)generating code from concrete

The Python code generated by the Thrift compiler from the schema defined in the concrete project is checked in to concrete-python manually, after applying necessary patches. For trivial modifications to the schema, this process is automated by generate.bash, which assumes concrete has been cloned alongside concrete-python (in the same parent directory):

bash generate.bash

After this succeeds, the tests should be run and the changes should be inspected manually (git diff) for sanity. Note that this will not delete previously generated files that Thrift no longer produces (because their entries were removed from the schema).
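Because generate.bash leaves such stale files in place, a quick comparison can list candidates for deletion. This is a hypothetical helper, not part of concrete-python. It compares the .py files in the checked-in package with those in a fresh Thrift output directory, and both paths are assumptions you supply. Hand-written modules under concrete/ also appear in the result, since Thrift never produces them, so review the list rather than deleting everything on it.

```python
import os


def stale_generated_files(checked_in_dir, fresh_output_dir):
    """List .py files under checked_in_dir with no counterpart in fresh_output_dir.

    Paths in the result are relative to checked_in_dir and sorted.
    """
    def py_files(root):
        # Collect every .py file below root, as a path relative to root.
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith('.py'):
                    found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    return sorted(py_files(checked_in_dir) - py_files(fresh_output_dir))
```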

Note: generate.bash is often not sufficient on its own. The patches in patches/ record where the raw Thrift output falls short for the previously compiled schema. Additionally, if new packages (namespaces) are added to the schema, they must be added to setup.py, setup.cfg, and concrete/__init__.py.

If generate.bash throws an error, the necessary changes should be made manually and checked in to the index. The generated code should then be removed from the working tree, and raw (unpatched) code should be regenerated. New patches should be produced with git diff and stored in patches/. See the arguments to generate.bash for generating the unpatched code.

Indices and tables