Concrete-python Documentation¶
Concrete-python is the Python interface to Concrete, a natural language processing data format and set of service protocols that work across different operating systems and programming languages via Apache Thrift. Concrete-python contains generated Python classes, utility classes and functions, and scripts. It does not contain the Thrift schema for Concrete, which can be found in the Concrete GitHub repository.
Tutorial¶
This document provides a quick tutorial of concrete-python installation and usage. For more information, including an API reference and development information, please see the online documentation.
License¶
Copyright 2012-2019 Johns Hopkins University HLTCOE. All rights reserved. This software is released under the 2-clause BSD license. Please see LICENSE for more information.
Requirements¶
concrete-python is tested on Python 3.5 and requires the Thrift Python library, among other Python libraries. These are installed automatically by setup.py or pip. The Thrift compiler is not required.
Note: The accelerated protocol offers a (de)serialization speedup of 10x or more; if you would like to use it, ensure a C++ compiler is available on your system before installing concrete-python. (If a compiler is not available, concrete-python will fall back to the unaccelerated protocol automatically.) If you are on Linux, a suitable C++ compiler will be listed as g++ or gcc-c++ in your package manager.
If you are using macOS Mojave with the Homebrew package manager (https://brew.sh), you can install the accelerated protocol using the script install-mojave-homebrew-accelerated-thrift.sh.
Installation¶
You can install Concrete using the pip package manager:
pip install concrete
or by cloning the repository and running setup.py:
git clone https://github.com/hltcoe/concrete-python.git
cd concrete-python
python setup.py install
Basic usage¶
Here and in the following sections we make use of an example Concrete
Communication file included in the concrete-python source distribution.
The Communication type represents an article, book, post, Tweet, or
any other kind of document that we might want to store and analyze.
Copy it from tests/testdata/serif_dog-bites-man.concrete if you have the concrete-python source distribution or download it separately here: serif_dog-bites-man.concrete.
First we use the concrete-inspect.py tool (explained in more detail in the following section) to inspect some of the contents of the Communication:
concrete-inspect.py --text serif_dog-bites-man.concrete
This command prints the text of the Communication to the console. In our case the text is a short article formatted in SGML:
<DOC id="dog-bites-man" type="other">
<HEADLINE>
Dog Bites Man
</HEADLINE>
<TEXT>
<P>
John Smith, manager of ACMÉ INC, was bit by a dog on March 10th, 2013.
</P>
<P>
He died!
</P>
<P>
John's daughter Mary expressed sorrow.
</P>
</TEXT>
</DOC>
Now run the following command to inspect some of the annotations stored in that Communication:
concrete-inspect.py --ner --pos --dependency serif_dog-bites-man.concrete
This command shows a tokenization, part-of-speech tagging, named entity tagging, and dependency parse in a CoNLL-like columnar format:
INDEX  TOKEN      POS       NER  HEAD  DEPREL
-----  -----      ---       ---  ----  ------
1      John       NNP       PER  2     compound
2      Smith      NNP       PER  10    nsubjpass
3      ,          ,
4      manager    NN             2     appos
5      of         IN             7     case
6      ACMÉ       NNP       ORG  7     compound
7      INC        NNP       ORG  4     nmod
8      ,          ,
9      was        VBD            10    auxpass
10     bit        NN             0     ROOT
11     by         IN             13    case
12     a          DT             13    det
13     dog        NN             10    nmod
14     on         IN             15    case
15     March      DATE-NNP       13    nmod
16     10th       JJ             15    amod
17     ,          ,
18     2013       CD             13    amod
19     .          .

1      He         PRP            2     nsubj
2      died       VBD            0     ROOT
3      !          .

1      John       NNP       PER  3     nmod:poss
2      's         POS            1     case
3      daughter   NN             5     dep
4      Mary       NNP       PER  5     nsubj
5      expressed  VBD            0     ROOT
6      sorrow     NN             5     dobj
7      .          .
Reading Concrete¶
There are even more annotations stored in this Communication, but for now we move on to demonstrate handling of the Communication in Python. The example file contains a single Communication, but many (if not most) files contain several. The same code can be used to read Communications in a regular file, tar archive, or zip archive:
from concrete.util import CommunicationReader
for (comm, filename) in CommunicationReader('serif_dog-bites-man.concrete'):
    print(comm.id)
    print()
    print(comm.text)
This loop prints the unique ID and text (the same text we saw before) of our one Communication:
tests/testdata/serif_dog-bites-man.xml
<DOC id="dog-bites-man" type="other">
<HEADLINE>
Dog Bites Man
</HEADLINE>
<TEXT>
<P>
John Smith, manager of ACMÉ INC, was bit by a dog on March 10th, 2013.
</P>
<P>
He died!
</P>
<P>
John's daughter Mary expressed sorrow.
</P>
</TEXT>
</DOC>
In addition to the general-purpose CommunicationReader there is a convenience function for reading a single Communication from a regular file:
from concrete.util import read_communication_from_file
comm = read_communication_from_file('serif_dog-bites-man.concrete')
Communications are broken into Sections, which are in turn broken into Sentences, which are in turn broken into Tokens (and that’s only scratching the surface). To traverse this decomposition:
from concrete.util import lun, get_tokens
for section in lun(comm.sectionList):
    print('* section')
    for sentence in lun(section.sentenceList):
        print(' + sentence')
        for token in get_tokens(sentence.tokenization):
            print(' - ' + token.text)
The output is:
* section
* section
+ sentence
- John
- Smith
- ,
- manager
- of
- ACMÉ
- INC
- ,
- was
- bit
- by
- a
- dog
- on
- March
- 10th
- ,
- 2013
- .
* section
+ sentence
- He
- died
- !
* section
+ sentence
- John
- 's
- daughter
- Mary
- expressed
- sorrow
- .
Here we used get_tokens, which abstracts the process of extracting a sequence of Tokens from a Tokenization, and lun, which returns its argument or (if its argument is None) an empty list and stands for “list un-none”. Many fields in Concrete are optional, including Communication.sectionList and Section.sentenceList; checking for None quickly becomes tedious.
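To make the behavior of lun concrete, here is a minimal stand-alone sketch of the same idea. The name list_un_none is hypothetical and is not part of concrete-python; real code should import lun from concrete.util as shown above.

```python
def list_un_none(value):
    """Return value unchanged, or an empty list if value is None."""
    return [] if value is None else value


# An unset optional field is None. Iterating over None directly would raise
# a TypeError, so we substitute an empty list first.
section_list = None
visited = [section for section in list_un_none(section_list)]
print(visited)

# A populated field is passed through untouched.
print(list_un_none(['a', 'b']))
```

With a helper like this, nested loops over optional list fields need no explicit None checks.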
In this Communication the tokens have been annotated with part-of-speech tags, as we saw previously using concrete-inspect.py. We can print them with the following code:
from concrete.util import get_tagged_tokens
for section in lun(comm.sectionList):
    print('* section')
    for sentence in lun(section.sentenceList):
        print(' + sentence')
        for token_tag in get_tagged_tokens(sentence.tokenization, 'POS'):
            print(' - ' + token_tag.tag)
The output is:
* section
* section
+ sentence
- NNP
- NNP
- ,
- NN
- IN
- NNP
- NNP
- ,
- VBD
- NN
- IN
- DT
- NN
- IN
- DATE-NNP
- JJ
- ,
- CD
- .
* section
+ sentence
- PRP
- VBD
- .
* section
+ sentence
- NNP
- POS
- NN
- NNP
- VBD
- NN
- .
Writing Concrete¶
We can add a new part-of-speech tagging to the Communication as well. Let’s add a simplified version of the current tagging:
from concrete.util import AnalyticUUIDGeneratorFactory, now_timestamp
from concrete import TokenTagging, TaggedToken, AnnotationMetadata
augf = AnalyticUUIDGeneratorFactory(comm)
aug = augf.create()
for section in lun(comm.sectionList):
    for sentence in lun(section.sentenceList):
        sentence.tokenization.tokenTaggingList.append(TokenTagging(
            uuid=aug.next(),
            metadata=AnnotationMetadata(
                tool='Simple POS',
                timestamp=now_timestamp(),
                kBest=1
            ),
            taggingType='POS',
            taggedTokenList=[
                TaggedToken(
                    tokenIndex=original.tokenIndex,
                    tag=original.tag.split('-')[-1][:2],
                )
                for original
                in get_tagged_tokens(sentence.tokenization, 'POS')
            ]
        ))
Here we used AnalyticUUIDGeneratorFactory, which creates generators of Concrete UUID objects (see Working with UUIDs for more information). We also used now_timestamp, which returns a Concrete timestamp representing the current time. But now how do we know which tagging is ours? Each annotation’s metadata contains a tool name, and we can use it to distinguish between competing annotations:
from concrete.util import get_tagged_tokens
for section in lun(comm.sectionList):
    print('* section')
    for sentence in lun(section.sentenceList):
        print(' + sentence')
        token_tag_pairs = zip(
            get_tagged_tokens(sentence.tokenization, 'POS', tool='Serif: part-of-speech'),
            get_tagged_tokens(sentence.tokenization, 'POS', tool='Simple POS')
        )
        for (old_tag, new_tag) in token_tag_pairs:
            print(' - ' + old_tag.tag + ' -> ' + new_tag.tag)
The output shows our new part-of-speech tagging has a smaller, simpler set of possible values:
* section
* section
+ sentence
- NNP -> NN
- NNP -> NN
- , -> ,
- NN -> NN
- IN -> IN
- NNP -> NN
- NNP -> NN
- , -> ,
- VBD -> VB
- NN -> NN
- IN -> IN
- DT -> DT
- NN -> NN
- IN -> IN
- DATE-NNP -> NN
- JJ -> JJ
- , -> ,
- CD -> CD
- . -> .
* section
+ sentence
- PRP -> PR
- VBD -> VB
- . -> .
* section
+ sentence
- NNP -> NN
- POS -> PO
- NN -> NN
- NNP -> NN
- VBD -> VB
- NN -> NN
- . -> .
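The mapping in this output comes entirely from the expression tag.split('-')[-1][:2] used when building the new tagging. The following stand-alone sketch isolates that rule so it can be tried without a Communication; the function name simplify_tag is introduced here for illustration only.

```python
def simplify_tag(tag):
    """Keep the part after the last hyphen, truncated to two characters."""
    return tag.split('-')[-1][:2]


for tag in ['NNP', 'DATE-NNP', 'VBD', 'PRP', ',']:
    print(tag, '->', simplify_tag(tag))
```

Tags without a hyphen are simply truncated, which is why PRP becomes PR and POS becomes PO in the output above.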
Finally, let’s write our newly annotated Communication back to disk:
from concrete.util import CommunicationWriter
with CommunicationWriter('serif_dog-bites-man.concrete') as writer:
    writer.write(comm)
Note there are many other useful classes and functions in the concrete.util library. See the API reference in the online documentation for details.
concrete-inspect.py¶
Use concrete-inspect.py to quickly explore the contents of a Communication from the command line. concrete-inspect.py and other scripts are installed to the path along with the concrete-python library.
--id¶
Run the following command to print the unique ID of our modified example Communication:
concrete-inspect.py --id serif_dog-bites-man.concrete
Output:
tests/testdata/serif_dog-bites-man.xml
--metadata¶
Use --metadata to print the stored annotations along with their tool names:
concrete-inspect.py --metadata serif_dog-bites-man.concrete
Output:
Communication: concrete_serif v3.10.1pre
Tokenization: Serif: tokens
Dependency Parse: Stanford
Parse: Serif: parse
TokenTagging: Serif: names
TokenTagging: Serif: part-of-speech
TokenTagging: Simple POS
EntityMentionSet #0: Serif: names
EntityMentionSet #1: Serif: values
EntityMentionSet #2: Serif: mentions
EntitySet #0: Serif: doc-entities
EntitySet #1: Serif: doc-values
SituationMentionSet #0: Serif: relations
SituationMentionSet #1: Serif: events
SituationSet #0: Serif: relations
SituationSet #1: Serif: events
CommunicationTagging: lda
CommunicationTagging: urgency
--sections¶
Use --sections to print the text of the Communication, broken out by section:
concrete-inspect.py --sections serif_dog-bites-man.concrete
Output:
Section 0 (0ab68635-c83d-4b02-b8c3-288626968e05)[kind: SectionKind.PASSAGE], from 81 to 82:
Section 1 (54902d75-1841-4d8d-b4c5-390d4ef1a47a)[kind: SectionKind.PASSAGE], from 85 to 162:
John Smith, manager of ACMÉ INC, was bit by a dog on March 10th, 2013.
</P>
Section 2 (7ec8b7d9-6be0-4c62-af57-3c6c48bad711)[kind: SectionKind.PASSAGE], from 165 to 180:
He died!
</P>
Section 3 (68da91a1-5beb-4129-943d-170c40c7d0f7)[kind: SectionKind.PASSAGE], from 183 to 228:
John's daughter Mary expressed sorrow.
</P>
--entities¶
Use --entities to print the named entities detected in the Communication:
concrete-inspect.py --entities serif_dog-bites-man.concrete
Output:
Entity Set 0 (Serif: doc-entities):
Entity 0-0:
EntityMention 0-0-0:
tokens: John Smith
text: John Smith
entityType: PER
phraseType: PhraseType.NAME
EntityMention 0-0-1:
tokens: John Smith , manager of ACMÉ INC ,
text: John Smith, manager of ACMÉ INC,
entityType: PER
phraseType: PhraseType.APPOSITIVE
child EntityMention #0:
tokens: John Smith
text: John Smith
entityType: PER
phraseType: PhraseType.NAME
child EntityMention #1:
tokens: manager of ACMÉ INC
text: manager of ACMÉ INC
entityType: PER
phraseType: PhraseType.COMMON_NOUN
EntityMention 0-0-2:
tokens: manager of ACMÉ INC
text: manager of ACMÉ INC
entityType: PER
phraseType: PhraseType.COMMON_NOUN
EntityMention 0-0-3:
tokens: He
text: He
entityType: PER
phraseType: PhraseType.PRONOUN
EntityMention 0-0-4:
tokens: John
text: John
entityType: PER.Individual
phraseType: PhraseType.NAME
Entity 0-1:
EntityMention 0-1-0:
tokens: ACMÉ INC
text: ACMÉ INC
entityType: ORG
phraseType: PhraseType.NAME
Entity 0-2:
EntityMention 0-2-0:
tokens: John 's daughter Mary
text: John's daughter Mary
entityType: PER.Individual
phraseType: PhraseType.NAME
child EntityMention #0:
tokens: Mary
text: Mary
entityType: PER
phraseType: PhraseType.OTHER
EntityMention 0-2-1:
tokens: daughter
text: daughter
entityType: PER
phraseType: PhraseType.COMMON_NOUN
Entity Set 1 (Serif: doc-values):
Entity 1-0:
EntityMention 1-0-0:
tokens: March 10th , 2013
text: March 10th, 2013
entityType: TIMEX2.TIME
phraseType: PhraseType.OTHER
--mentions¶
Use --mentions to show the named entity mentions in the Communication, annotated on the text:
concrete-inspect.py --mentions serif_dog-bites-man.concrete
Output:
<ENTITY ID=0><ENTITY ID=0>John Smith</ENTITY> , <ENTITY ID=0>manager of <ENTITY ID=1>ACMÉ INC</ENTITY></ENTITY> ,</ENTITY> was bit by a dog on <ENTITY ID=3>March 10th , 2013</ENTITY> .
<ENTITY ID=0>He</ENTITY> died !
<ENTITY ID=2><ENTITY ID=0>John</ENTITY> 's <ENTITY ID=2>daughter</ENTITY> Mary</ENTITY> expressed sorrow .
--situations¶
Use --situations to show the situations detected in the Communication:
concrete-inspect.py --situations serif_dog-bites-man.concrete
Output:
Situation Set 0 (Serif: relations):
Situation Set 1 (Serif: events):
Situation 1-0:
situationType: Life.Die
--treebank¶
Use --treebank to show constituency parse trees of the sentences in the Communication:
concrete-inspect.py --treebank serif_dog-bites-man.concrete
Output:
(S (NP (NPP (NNP john)
(NNP smith))
(, ,)
(NP (NPA (NN manager))
(PP (IN of)
(NPP (NNP acme)
(NNP inc))))
(, ,))
(VP (VBD was)
(NP (NPA (NN bit))
(PP (IN by)
(NP (NPA (DT a)
(NN dog))
(PP (IN on)
(NP (DATE (DATE-NNP march)
(JJ 10th))
(, ,)
(NPA (CD 2013))))))))
(. .))
(S (NPA (PRP he))
(VP (VBD died))
(. !))
(S (NPA (NPPOS (NPP (NNP john))
(POS 's))
(NN daughter)
(NPP (NNP mary)))
(VP (VBD expressed)
(NPA (NN sorrow)))
(. .))
Other options¶
Use --ner, --pos, --lemmas, and --dependency (together or independently) to show respective token-level information in a CoNLL-like format, and use --text to print the text of the Communication, as described in a previous section.
Run concrete-inspect.py --help to show a detailed help message explaining the options discussed above and others. All concrete-python scripts have such help messages.
create-comm.py¶
Use create-comm.py to generate a simple Communication from a text file. For example, create a file called history-of-the-world.txt containing the following text:
The dog ran .
The cat jumped .

The dolphin teleported .
Then run the following command to convert it to a Concrete Communication, creating Sections, Sentences, and Tokens based on whitespace:
create-comm.py --annotation-level token history-of-the-world.txt history-of-the-world.concrete
Use concrete-inspect.py as shown previously to verify the structure of the Communication:
concrete-inspect.py --sections history-of-the-world.concrete
Output:
Section 0 (a188dcdd-1ade-be5d-41c4-fd4d81f71685)[kind: passage], from 0 to 30:
The dog ran .
The cat jumped .
Section 1 (a188dcdd-1ade-be5d-41c4-fd4d81f7168a)[kind: passage], from 32 to 57:
The dolphin teleported .
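The whitespace convention can be illustrated with a short stand-alone sketch. It assumes that blank lines separate Sections, newlines separate Sentences, and spaces separate Tokens, which is consistent with the two sections shown above; it is not the implementation used by create-comm.py, and the function name split_on_whitespace is hypothetical.

```python
import re


def split_on_whitespace(text):
    """Return a list of sections, each a list of sentences, each a list of tokens."""
    sections = []
    # One or more blank lines separate sections (an assumption of this sketch).
    for block in re.split(r'\n\s*\n', text):
        # Each non-empty line is a sentence; tokens are split on whitespace.
        sentences = [line.split() for line in block.splitlines() if line.strip()]
        if sentences:
            sections.append(sentences)
    return sections


text = 'The dog ran .\nThe cat jumped .\n\nThe dolphin teleported .\n'
for section in split_on_whitespace(text):
    print(section)
```

The first section contains two sentences and the second contains one, mirroring the structure reported by concrete-inspect.py --sections.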
Other scripts¶
concrete-python provides a number of other scripts, including but not limited to:
- concrete2json.py: reads in a Concrete Communication and prints a JSON version of the Communication to stdout. The JSON is “pretty printed” with indentation and whitespace, which makes the JSON easier to read and to use for diffs.
- create-comm-tarball.py: like create-comm.py but for multiple files: reads in a tar.gz archive of text files, parses them into sections and sentences based on whitespace, and writes them back out as Concrete Communications in another tar.gz archive.
- fetch-client.py: connects to a FetchCommunicationService, retrieves one or more Communications (as specified on the command line), and writes them to disk.
- fetch-server.py: implements FetchCommunicationService, serving Communications to clients from a file or directory of Communications on disk.
- search-client.py: connects to a SearchService, reading queries from the console and printing out results as Communication ids in a loop.
- validate-communication.py: reads in a Concrete Communication file and prints out information about any invalid fields. This script is a command-line wrapper around the functionality in the concrete.validate library.
Use the --help flag for details about the scripts’ command line arguments.
Working with UUIDs¶
Each UUID object contains a single string, uuidString, which can be used as a universally unique identifier for the object the UUID is attached to. The AnalyticUUIDGeneratorFactory produces UUID generators for a Communication, one for each analytic (tool) used to process the Communication. In contrast to the Python uuid library, the AnalyticUUIDGeneratorFactory yields UUIDs that have common prefixes within a Communication and within annotations produced by the same analytic, enabling common compression algorithms to much more efficiently store the UUIDs in each Communication. See the AnalyticUUIDGeneratorFactory class in the API reference in the online documentation for more information.
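The effect of shared prefixes on compression can be checked with a small experiment using only the standard library. The identifier layout below (a 20-digit shared prefix followed by 12 random hex digits) is an arbitrary choice for this illustration and is not claimed to be the layout produced by AnalyticUUIDGeneratorFactory; the helper names are hypothetical.

```python
import random
import zlib


def random_hex(rng, n):
    """Return a string of n random hexadecimal digits."""
    return ''.join(rng.choice('0123456789abcdef') for _ in range(n))


def compressed_size(ids):
    """Size in bytes of the zlib-compressed, newline-joined identifiers."""
    return len(zlib.compress('\n'.join(ids).encode('ascii')))


rng = random.Random(0)

# 1000 identifiers with no structure in common.
independent = [random_hex(rng, 32) for _ in range(1000)]

# 1000 identifiers that share a prefix and differ only in their last digits.
prefix = random_hex(rng, 20)
shared = [prefix + random_hex(rng, 12) for _ in range(1000)]

print('independent:', compressed_size(independent))
print('shared prefix:', compressed_size(shared))
```

The shared-prefix list compresses to noticeably fewer bytes, because the compressor can encode each repeated prefix as a short back-reference.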
Note that uuidString is generated by a random process, so running the same code twice will result in two completely different sets of identifiers. Concretely, if you run a parser to produce a part-of-speech TokenTagging for each Tokenization in a Communication, save the modified Communication, then run the parser again on the same original Communication, you will get two different identifiers for each TokenTagging, even though the contents of each pair of TokenTaggings (the part-of-speech tags) may be identical.
Validating Concrete Communications¶
The Python version of the Thrift libraries does not perform any validation of Thrift objects. You should use the validate_communication() function after reading and before writing a Concrete Communication:
from concrete.util import read_communication_from_file
from concrete.validate import validate_communication
comm = read_communication_from_file('tests/testdata/serif_dog-bites-man.concrete')
# Returns True|False, logs details using Python stdlib 'logging' module
validate_communication(comm)
Thrift fields have three levels of requiredness:
- explicitly labeled as required
- explicitly labeled as optional
- no requiredness label given (“default required”)
Other Concrete tools will raise an exception if a required field is
missing on deserialization or serialization, and will raise an
exception if a “default required” field is missing on serialization.
By default, concrete-python does not perform any validation of Thrift objects on serialization or deserialization. The Python Thrift classes do provide shallow validate() methods, but they only check for explicitly required fields (not “default required” fields) and do not validate nested objects.
The validate_communication() function recursively checks a Communication object for required fields, plus additional checks for UUID mismatches.
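The difference between a shallow and a recursive check can be sketched with plain Python objects. Everything in this example (the Record, Span, and Doc classes and both functions) is a hypothetical stand-in for illustration; it does not use the Thrift classes or reproduce the logic of validate_communication().

```python
class Record:
    """Hypothetical stand-in for a generated Thrift class."""
    required = ()

    def __init__(self, **fields):
        self.__dict__.update(fields)


class Span(Record):
    required = ('start', 'end')


class Doc(Record):
    required = ('id', 'span')


def shallow_missing(obj):
    """Names of required fields that are unset on obj itself."""
    return [name for name in obj.required if getattr(obj, name, None) is None]


def deep_missing(obj, path='doc'):
    """Dotted paths of unset required fields on obj and on nested records."""
    missing = [path + '.' + name for name in shallow_missing(obj)]
    for name, value in vars(obj).items():
        if isinstance(value, Record):
            missing.extend(deep_missing(value, path + '.' + name))
    return missing


doc = Doc(id='dog-bites-man', span=Span(start=0, end=None))
print(shallow_missing(doc))  # the top-level object looks complete
print(deep_missing(doc))     # the nested Span is missing 'end'
```

A shallow check reports nothing here because both top-level fields are set, while the recursive check finds the unset field inside the nested object. For brevity, this sketch does not descend into lists of records.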
Advanced Usage¶
In this section we demonstrate more advanced processing of Concrete Communications. We previously traversed Sections, Sentences, TokenLists, and TokenTaggings, which have a nested linear structure; we now demonstrate usage of DependencyParses, Entities, and SituationMentions, which are non-linear, higher-level annotations.
Print DependencyParses¶
The following code prints a Communication’s tokens and their dependency graph in CoNLL format, similar to concrete-inspect.py --dependency, for the first dependency parse in each sentence. This example makes use of serif_dog-bites-man.concrete:
from concrete.util import read_communication_from_file, lun
comm = read_communication_from_file('serif_dog-bites-man.concrete')
for section in lun(comm.sectionList):
    for sentence in lun(section.sentenceList):
        if sentence.tokenization and sentence.tokenization.tokenList:
            # Columns of CoNLL-style output go here.
            taggings = []
            # Token text
            taggings.append([x.text for x in sentence.tokenization.tokenList.tokenList])
            if sentence.tokenization.dependencyParseList:
                # Read dependency arcs from dependency parse tree. (Deps start at zero.)
                head = [-1]*len(sentence.tokenization.tokenList.tokenList)
                for arc in sentence.tokenization.dependencyParseList[0].dependencyList:
                    head[arc.dep] = arc.gov
                # Add head index to taggings
                taggings.append(head)
            # Transpose the list. Format and print each row.
            for row in zip(*taggings):
                print('\t'.join('%15s' % x for x in row))
            print('')
There are many optional fields in Concrete and here we’ve encountered several of them: Communication.sectionList, Section.sentenceList, Sentence.tokenization, Tokenization.tokenList, and Tokenization.dependencyParseList. An unset optional field is represented with a value of None. We’ve used concrete.util.unnone.lun(), which returns its argument if its argument is not None and otherwise returns an empty list, to work around some of the optional fields, while we’ve directly checked the others.
Expected output of the previous code:
John 1
Smith 9
, -1
manager 1
of 6
ACMÉ 6
INC 3
, -1
was 9
bit -1
by 12
a 12
dog 9
on 14
March 12
10th 14
, -1
2013 12
. -1
He 1
died -1
! -1
John 2
's 0
daughter 4
Mary 4
expressed -1
sorrow 4
. -1
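The step that turns a list of dependency arcs into a per-token head column can be tried in isolation. In this sketch, Arc is a hypothetical stand-in for a Concrete dependency with zero-based gov and dep token indices, and the single arc is chosen to match the second sentence of the output above.

```python
from collections import namedtuple

# Hypothetical stand-in for a dependency arc: gov is the index of the head
# token and dep is the index of the dependent token, both zero-based.
Arc = namedtuple('Arc', ['gov', 'dep'])

tokens = ['He', 'died', '!']
arcs = [Arc(gov=1, dep=0)]

# Start with -1 (no head) for every token, then fill in the heads we know.
head = [-1] * len(tokens)
for arc in arcs:
    head[arc.dep] = arc.gov

# Transpose the two columns and print one row per token.
for row in zip(tokens, head):
    print('\t'.join('%15s' % x for x in row))
```

Tokens that never appear as a dependent, such as the root and the punctuation here, keep the placeholder value -1.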
Print Entities¶
We now print Entities and their EntityMentions (which represent the result of coreference resolution). This example makes use of serif_dog-bites-man.concrete:
from concrete.util import read_communication_from_file, lun
comm = read_communication_from_file('serif_dog-bites-man.concrete')
for entitySet in lun(comm.entitySetList):
    for ei, entity in enumerate(entitySet.entityList):
        print('Entity %s (%s)' % (ei, entity.canonicalName))
        for i, mention in enumerate(entity.mentionList):
            print(' Mention %s: %s' % (i, mention.text))
        print('')
    print('')
Note that Entity.mentionList is not in the schema! This field was added in concrete.util.file_io.read_communication_from_file() after deserializing the original Communication. By default, some additional fields are added to Concrete objects by concrete.util.references.add_references_to_communication() when they are deserialized; see that function’s documentation for details. For our purposes here, know that add_references_to_communication adds a mentionList field to each Entity that contains a list of the EntityMentions that reference that Entity.
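The general idea of attaching resolved references after loading can be sketched with plain objects. The data below uses hypothetical stand-ins (SimpleNamespace objects with illustrative uuid, text, and mentionIdList attributes) and is not the implementation of add_references_to_communication.

```python
from types import SimpleNamespace

# Hypothetical stand-ins: each entity stores only the ids of its mentions.
mentions = [
    SimpleNamespace(uuid='m0', text='John Smith'),
    SimpleNamespace(uuid='m1', text='He'),
    SimpleNamespace(uuid='m2', text='ACMÉ INC'),
]
entities = [
    SimpleNamespace(uuid='e0', mentionIdList=['m0', 'm1']),
    SimpleNamespace(uuid='e1', mentionIdList=['m2']),
]

# Build an id -> object lookup once, then attach the resolved objects so
# later code can follow the reference without another lookup.
mention_for_id = {m.uuid: m for m in mentions}
for entity in entities:
    entity.mentionList = [mention_for_id[i] for i in entity.mentionIdList]

for entity in entities:
    print(entity.uuid, [m.text for m in entity.mentionList])
```

After this pass, code that walks entities can read mention text directly, which is the convenience the tutorial code above relies on.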
Expected output of the previous code:
Entity 0 (None)
Mention 0: John Smith
Mention 1: John Smith, manager of ACMÉ INC,
Mention 2: manager of ACMÉ INC
Mention 3: He
Mention 4: John
Entity 1 (None)
Mention 0: ACMÉ INC
Entity 2 (None)
Mention 0: John's daughter Mary
Mention 1: daughter
Entity 0 (2013-03-10)
Mention 0: March 10th, 2013
Print SituationMentions¶
We now print SituationMentions, the results of relation extraction. This example makes use of serif_example.concrete, on which BBN-SERIF’s relation and event extractor has been run:
from concrete.util import read_communication_from_file, lun
comm = read_communication_from_file('serif_example.concrete')
for i, situationMentionSet in enumerate(lun(comm.situationMentionSetList)):
    if situationMentionSet.metadata:
        print('Situation Set %d (%s):' % (i, situationMentionSet.metadata.tool))
    else:
        print('Situation Set %d:' % i)
    for j, situationMention in enumerate(situationMentionSet.mentionList):
        print('SituationMention %d-%d:' % (i, j))
        print(' text', situationMention.text)
        print(' situationType', situationMention.situationType)
        for k, arg in enumerate(lun(situationMention.argumentList)):
            print(' Argument %d:' % k)
            print(' role', arg.role)
            if arg.entityMention:
                print(' entityMention', arg.entityMention.text)
            if arg.situationMention:
                # Print the nested SituationMention held by the argument,
                # not the enclosing one.
                print(' situationMention:')
                print(' text', arg.situationMention.text)
                print(' situationType', arg.situationMention.situationType)
        print('')
    print('')
Expected output:
Situation Set 0 (Serif: relations):
SituationMention 0-0:
text None
situationType ORG-AFF.Employment
Argument 0:
role Role.RELATION_SOURCE_ROLE
entityMention manager of ACME INC
Argument 1:
role Role.RELATION_TARGET_ROLE
entityMention ACME INC
SituationMention 0-1:
text None
situationType PER-SOC.Family
Argument 0:
role Role.RELATION_SOURCE_ROLE
entityMention John
Argument 1:
role Role.RELATION_TARGET_ROLE
entityMention daughter
Situation Set 1 (Serif: events):
SituationMention 1-0:
text died
situationType Life.Die
Argument 0:
role Victim
entityMention He
API Reference¶
Python modules and scripts for working with Concrete, an HLT data specification defined using Thrift.
High-level interface¶
concrete.inspect module¶
Functions used by concrete_inspect.py to print data in a Communication.
The function implementations provide useful examples of how to interact with many different Concrete datastructures.
concrete.inspect.penn_treebank_for_parse(parse)¶
Return a Penn-Treebank style representation of a Parse object
Parameters: parse (Parse) –
Returns: A string containing a Penn Treebank style parse tree representation
Return type: str
concrete.inspect.print_communication_taggings_for_communication(comm, tool=None, communication_tagging_filter=None)¶
Print information for CommunicationTagging objects
Parameters:
- comm (Communication) –
- tool (str) – Deprecated. If not None, only print information for CommunicationTagging objects with a matching metadata.tool field
- communication_tagging_filter (func) – If not None, print information for only those CommunicationTagging objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
Print ‘CoNLL-style’ tags for the tokens in a Communication. If a column is requested (for example, ner is set to True) but there is no such annotation in the communication, that column is not printed (the header is not printed either). If there is more than one such annotation in the communication, one column is printed for each annotation. In the event of differing numbers of annotations per Tokenization, all annotations are printed, but it is not guaranteed that the columns of two different tokenizations correspond to one another.
Parameters:
- comm (Communication) –
- char_offsets (bool) – Flag for printing token text specified by a Token’s (optional) TextSpan
- dependency (bool) – Flag for printing dependency parse HEAD tags
- dependency_tool (str) – Deprecated. If not None, only print information for DependencyParse objects if they have a matching metadata.tool field
- dependency_parse_filter (func) – If not None, print information for only those DependencyParse objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
- lemmas (bool) – Flag for printing lemma tags (TokenTagging objects of type LEMMA)
- lemmas_tool (str) – Deprecated. If not None, only print information for TokenTagging objects of type LEMMA if they have a matching metadata.tool field
- lemmas_filter (func) – If not None, print information for only those LEMMA taggings that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
- ner (bool) – Flag for printing Named Entity Recognition tags (TokenTagging objects of type NER)
- ner_tool (str) – Deprecated. If not None, only print information for TokenTagging objects of type NER if they have a matching metadata.tool field
- ner_filter (func) – If not None, print information for only those NER taggings that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
- pos (bool) – Flag for printing Part-of-Speech tags (TokenTagging objects of type POS)
- pos_tool (str) – Deprecated. If not None, only print information for TokenTagging objects of type POS if they have a matching metadata.tool field
- pos_filter (func) – If not None, print information for only those POS taggings that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
- other_tags (dict) – Map of other tagging types to print (as keys) to annotation filters, or None. If the value (annotation filter) of a given tagging type is not None, print information for only those taggings that pass the filter (should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered)).
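The filter parameters described here all share one contract: a function that takes a list of annotations (objects with a metadata field) and returns a list of annotations, possibly filtered and re-ordered. The sketch below shows two functions that satisfy that contract. The names make_tool_filter and newest_first are hypothetical, and the SimpleNamespace objects are stand-ins for real annotations so that the example runs without concrete-python.

```python
from types import SimpleNamespace


def make_tool_filter(tool):
    """Return a filter keeping annotations whose metadata.tool equals tool."""
    def annotation_filter(annotations):
        return [a for a in annotations if a.metadata.tool == tool]
    return annotation_filter


def newest_first(annotations):
    """Re-order annotations by metadata.timestamp, newest first."""
    return sorted(annotations, key=lambda a: a.metadata.timestamp, reverse=True)


# Hypothetical stand-ins for annotation objects with a metadata field.
taggings = [
    SimpleNamespace(metadata=SimpleNamespace(tool='Serif: part-of-speech', timestamp=1)),
    SimpleNamespace(metadata=SimpleNamespace(tool='Simple POS', timestamp=2)),
]

print([a.metadata.tool for a in make_tool_filter('Simple POS')(taggings)])
print([a.metadata.tool for a in newest_first(taggings)])
```

A function of this shape could be passed as, for example, pos_filter, in place of the deprecated tool-name parameters.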
concrete.inspect.print_entities(comm, tool=None, entity_set_filter=None)¶
Print information for Entity objects and their associated EntityMention objects
Parameters:
- comm (Communication) –
- tool (str) – Deprecated. If not None, only print information for EntitySet objects with a matching metadata.tool field
- entity_set_filter (func) – If not None, print information for only those EntitySet objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_id_for_communication(comm, tool=None, communication_filter=None)¶
Print ID field of Communication
Parameters:
- comm (Communication) –
- tool (str) – Deprecated. If not None, only print ID of Communication objects with a matching metadata.tool field
- communication_filter (func) – If not None, print information for only those Communication objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_metadata(comm, tool=None, annotation_filter=None)¶
Print metadata tools used to annotate Communication
Parameters:
- comm (Communication) –
- tool (str) – Deprecated. If not None, only print AnnotationMetadata information for objects with a matching metadata.tool field
- annotation_filter (func) – If not None, print information for only those objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_penn_treebank_for_communication(comm, tool=None, parse_filter=None)¶
Print Penn-Treebank parse trees for all Tokenization objects
Parameters:
- comm (Communication) –
- tool (str) – Deprecated. If not None, only print information for Tokenization objects with a matching metadata.tool field
- parse_filter (func) – If not None, print information for only those Parse objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
concrete.inspect.print_sections(comm, tool=None, communication_filter=None)¶
Print information for all Section objects, according to their spans.
Parameters:
- comm (Communication) –
- tool (str) – Deprecated. If not None, only print information for Section objects with a matching metadata.tool field
- communication_filter (func) – If not None, print information for only those Communication objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
-
concrete.inspect.
print_situation_mentions
(comm, tool=None, situation_mention_set_filter=None)¶ Print information for all
SituationMention objects (some of which may not have a Situation)
Parameters: - comm (Communication) –
- tool (str) – Deprecated.
If not None, only print information for
SituationMention
objects with a matching metadata.tool field - situation_mention_set_filter (func) – If not None, print information
for only those
SituationMentionSet
objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
-
concrete.inspect.
print_situations
(comm, tool=None, situation_set_filter=None)¶ Print information for all
Situation
objects and their associated SituationMention
objects
Parameters: - comm (Communication) –
- tool (str) – Deprecated.
If not None, only print information for
Situation
objects with a matching metadata.tool field - situation_set_filter (func) – If not None, print information
for only those
SituationSet
objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
-
concrete.inspect.
print_text_for_communication
(comm, tool=None, communication_filter=None)¶ Print text field of Communication
Parameters: - comm (Communication) –
- tool (str) – Deprecated.
If not None, only print text field of
Communication
objects with a matching metadata.tool field - communication_filter (func) – If not None, print information
for only those
Communication
objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
-
concrete.inspect.
print_tokens_for_communication
(comm, tool=None, tokenization_filter=None)¶ Print token text for a
Communication
Parameters: - comm (Communication) –
- tool (str) – Deprecated.
If not None, only print token text for
Communication
objects with a matching metadata.tool field - tokenization_filter (func) – If not None, print information
for only those
Tokenization
objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
-
concrete.inspect.
print_tokens_with_entityMentions
(comm, tool=None, entity_mention_set_filter=None)¶ Print information for
Token
objects that are part of an EntityMention
Parameters: - comm (Communication) –
- tool (str) – Deprecated.
If not None, only print information for tokens
that are associated with an
EntityMention
that is part of an EntityMentionSet
with a matching metadata.tool field - entity_mention_set_filter (func) – If not None, print information
for only those
EntityMentionSet
objects that pass this filter. Should be a function that takes a list of annotations (objects with metadata fields) and returns a list of annotations (possibly filtered and re-ordered).
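The filter arguments described above all share one shape: a function that takes a list of annotations and returns a list of annotations. The following is a minimal sketch of two such functions, not code from concrete-python. The helper names `make_tool_filter` and `newest_first` are hypothetical, the `SimpleNamespace` objects are stand-ins for real Concrete annotations, and the `metadata.timestamp` field used for ordering is an assumption about the annotation metadata.

```python
from types import SimpleNamespace


def make_tool_filter(tool_name):
    """Build a filter that keeps annotations whose metadata.tool matches."""
    def annotation_filter(annotations):
        # Takes a list of annotations, returns a (possibly shorter) list.
        return [a for a in annotations if a.metadata.tool == tool_name]
    return annotation_filter


def newest_first(annotations):
    """Re-order annotations so the largest timestamp comes first.

    Assumes each annotation has a metadata.timestamp field.
    """
    return sorted(annotations,
                  key=lambda a: a.metadata.timestamp,
                  reverse=True)


# Stand-in objects; real annotations would be Concrete objects that
# carry a metadata field.
annotations = [
    SimpleNamespace(metadata=SimpleNamespace(tool='tagger-a', timestamp=10)),
    SimpleNamespace(metadata=SimpleNamespace(tool='tagger-b', timestamp=30)),
    SimpleNamespace(metadata=SimpleNamespace(tool='tagger-a', timestamp=20)),
]

only_a = make_tool_filter('tagger-a')(annotations)
ordered = newest_first(annotations)
```

A function built this way could then be passed as, for example, the annotation_filter argument of print_metadata in place of the deprecated tool argument.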
concrete.util package¶
Utility code for working with Concrete
concrete.util.access module¶
-
class
concrete.util.access.
CommunicationContainerFetchHandler
(communication_container)¶ Bases:
object
FetchCommunicationService implementation using Communication containers
Implements the
FetchCommunicationService
interface, retrieving Communications from a dict-like communication_container object that maps Communication ID strings to Communications. The communication_container could be an actual dict, or a container such as:
DirectoryBackedCommunicationContainer
FetchBackedCommunicationContainer
MemoryBackedCommunicationContainer
RedisHashBackedCommunicationContainer
ZipFileBackedCommunicationContainer
S3BackedCommunicationContainer
Usage:
    from concrete.util.access import CommunicationContainerFetchHandler
    from concrete.util.access_wrapper import FetchCommunicationServiceWrapper

    handler = CommunicationContainerFetchHandler(comm_container)
    fetch_service = FetchCommunicationServiceWrapper(handler)
    fetch_service.serve(host, port)
Parameters: communication_container – Dict-like object that maps Communication IDs to Communications -
about
()¶
-
alive
()¶
-
fetch
(fetch_request)¶
-
getCommunicationCount
()¶
-
getCommunicationIDs
(offset, count)¶
-
class
concrete.util.access.
DirectoryBackedStoreHandler
(store_path)¶ Bases:
object
Simple StoreCommunicationService implementation using a directory
Implements the
StoreCommunicationService
interface, storing Communications in a directory.
Parameters: store_path – Path where Communications should be stored -
about
()¶
-
alive
()¶
-
store
(communication)¶ Save Communication to a directory
Stored Communication files will be named [COMMUNICATION_ID].comm. If a file with that name already exists, it will be overwritten.
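The naming and overwrite behavior described above can be illustrated with a small standalone sketch. This is not the handler's implementation: `store_payload` is a hypothetical helper, and plain bytes stand in for a serialized Communication.

```python
import os
import tempfile


def store_payload(store_path, comm_id, data):
    """Write data to [comm_id].comm inside store_path and return the path."""
    path = os.path.join(store_path, comm_id + '.comm')
    # Mode 'wb' truncates any existing file, so a second store with the
    # same ID replaces the first.
    with open(path, 'wb') as f:
        f.write(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    store_payload(tmp, 'doc-1', b'first version')
    stored_path = store_payload(tmp, 'doc-1', b'second version')
    stored_name = os.path.basename(stored_path)
    with open(stored_path, 'rb') as f:
        stored_bytes = f.read()
    file_count = len(os.listdir(tmp))
```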
-
-
class
concrete.util.access.
RedisHashBackedStoreHandler
(redis_db, key)¶ Bases:
object
Simple StoreCommunicationService implementation using a Redis hash.
Implements the
StoreCommunicationService
interface, storing Communications in a Redis hash, indexed by id.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – key of hash in redis database
-
about
()¶
-
alive
()¶
-
store
(communication)¶ Save Communication to a Redis hash, using the Communication id as a key.
Parameters: communication (Communication) – communication to store
-
class
concrete.util.access.
RelayFetchHandler
(host, port)¶ Bases:
object
Implements a ‘relay’ to another
FetchCommunicationService
server.
A FetchCommunicationService that acts as a relay to a second FetchCommunicationService, where the second service uses the TSocket transport and the TCompactProtocol protocol.
This class was designed for the use case where Thrift JavaScript code needs to communicate with a FetchCommunicationService server, but the server does not support the same Thrift serialization protocol as the JavaScript client.
The de facto standard for Concrete services is to use the TCompactProtocol serialization protocol over a TSocket connection. But as of Thrift 0.10.0, the Thrift JavaScript libraries only support TJSONProtocol over HTTP.
The RelayFetchHandler class is intended to be used as server-side code by a web application. The JavaScript code makes FetchCommunicationService RPC calls to the web server using HTTP/TJSONProtocol, and the web application then passes these RPC calls to another FetchCommunicationService using TSocket/TCompactProtocol.
Parameters: - host (str) – Hostname of
FetchCommunicationService
server - port (int) – Port # of
FetchCommunicationService
server
-
about
()¶
-
alive
()¶
-
fetch
(request)¶
-
getCommunicationCount
()¶
-
getCommunicationIDs
(offset, count)¶
-
class
concrete.util.access.
S3BackedStoreHandler
(bucket, prefix_len=4)¶ Bases:
object
Simple StoreCommunicationService implementation using an AWS S3 bucket.
Implements the
StoreCommunicationService
interface, storing Communications in an S3 bucket, indexed by id, optionally prefixed with a fixed-length, random-looking but deterministic hash to improve performance.
References
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Parameters: - bucket (boto.s3.bucket.Bucket) – S3 bucket object
- prefix_len (int) – length of prefix to add to a Communication id to form its key. A prefix of length four enables S3 to better partition the bucket contents, yielding higher performance and a lower chance of getting rate-limited by AWS.
-
about
()¶ Return S3BackedStoreHandler service information.
Returns: An object of type ServiceInfo
-
alive
()¶ Return whether service is alive and running.
Returns: True or False
-
store
(communication)¶ Save Communication to an S3 bucket, using the Communication id with a hash prefix of length self.prefix_len as a key.
Parameters: communication (Communication) – communication to store
-
concrete.util.access.
prefix_s3_key
(key_str, prefix_len)¶ Given unprefixed S3 key key_str, prefix the key with a deterministic prefix of hex characters of length prefix_len and return the result. Keys with such prefixes enable better performance on S3 and reduce the likelihood of rate-limiting.
Parameters: - key_str (str) – original (unprefixed) key, as a string
- prefix_len (int) – length of prefix to add to key
Returns: prefixed key
References
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
-
concrete.util.access.
unprefix_s3_key
(prefixed_key_str, prefix_len)¶ Given prefixed S3 key prefixed_key_str, remove the prefix of length prefix_len from the key and return the result. Keys with random-looking prefixes enable better performance on S3 and reduce the likelihood of rate-limiting.
Parameters: - prefixed_key_str (str) – prefixed key, as a string
- prefix_len (int) – length of prefix to remove from key
Returns: unprefixed key
References
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
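The prefixing scheme described for prefix_s3_key and unprefix_s3_key can be sketched in a few lines. This is an illustration only: the function names below are hypothetical, and the choice of SHA-256 as the source of the deterministic hex prefix is an assumption, since the text above does not say which hash the library uses.

```python
import hashlib


def sketch_prefix_key(key_str, prefix_len):
    """Prepend prefix_len hex characters derived from a hash of key_str.

    SHA-256 is an assumption made for this sketch; any deterministic
    hash with a hex digest would show the same idea.
    """
    digest = hashlib.sha256(key_str.encode('utf-8')).hexdigest()
    return digest[:prefix_len] + key_str


def sketch_unprefix_key(prefixed_key_str, prefix_len):
    """Drop the first prefix_len characters to recover the original key."""
    return prefixed_key_str[prefix_len:]


prefixed = sketch_prefix_key('XIN_ENG_20101212.0120', 4)
recovered = sketch_unprefix_key(prefixed, 4)
```

Because the prefix has a fixed length, the original key can be recovered without recomputing the hash, which matches the description of how the Communication id is determined from a prefixed key.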
concrete.util.access_wrapper module¶
-
class
concrete.util.access_wrapper.
FetchCommunicationClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.access.FetchCommunicationService'>¶
-
class
concrete.util.access_wrapper.
FetchCommunicationServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service -
concrete_service_class
= <module 'concrete.access.FetchCommunicationService'>¶
-
-
class
concrete.util.access_wrapper.
HTTPFetchCommunicationClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) – -
concrete_service_class
= <module 'concrete.access.FetchCommunicationService'>¶
-
-
class
concrete.util.access_wrapper.
HTTPStoreCommunicationClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) – -
concrete_service_class
= <module 'concrete.access.StoreCommunicationService'>¶
-
-
class
concrete.util.access_wrapper.
StoreCommunicationClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.access.StoreCommunicationService'>¶
-
class
concrete.util.access_wrapper.
StoreCommunicationServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service -
concrete_service_class
= <module 'concrete.access.StoreCommunicationService'>¶
-
-
class
concrete.util.access_wrapper.
SubprocessFetchCommunicationServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
FetchCommunicationServiceWrapper
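The timeout parameter above describes waiting for a server to come up, with None meaning wait indefinitely. The sketch below shows one common way to implement such a wait with the standard library. It is not the wrapper's actual logic: `wait_for_port` and its `poll_interval` parameter are hypothetical, and polling with TCP connection attempts is an assumption.

```python
import socket
import time


def wait_for_port(host, port, timeout=None, poll_interval=0.05):
    """Return True once host:port accepts a TCP connection.

    If timeout is None, keep trying forever; otherwise return False
    after roughly timeout seconds without a successful connection.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)


# Demonstration against a local listening socket on an OS-chosen port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
open_port = listener.getsockname()[1]
reachable = wait_for_port('127.0.0.1', open_port, timeout=2)
listener.close()
```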
-
class
concrete.util.access_wrapper.
SubprocessStoreCommunicationServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
StoreCommunicationServiceWrapper
concrete.util.annotate_wrapper module¶
-
class
concrete.util.annotate_wrapper.
AnnotateCommunicationClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.annotate.AnnotateCommunicationService'>¶
-
class
concrete.util.annotate_wrapper.
AnnotateCommunicationServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service -
concrete_service_class
= <module 'concrete.annotate.AnnotateCommunicationService'>¶
-
-
class
concrete.util.annotate_wrapper.
HTTPAnnotateCommunicationClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) – -
concrete_service_class
= <module 'concrete.annotate.AnnotateCommunicationService'>¶
-
-
class
concrete.util.annotate_wrapper.
SubprocessAnnotateCommunicationServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
AnnotateCommunicationServiceWrapper
concrete.util.comm_container module¶
Communication Containers - mapping Communication IDs to Communications
Classes that behave like a read-only dictionary (implementing Python’s collections.abc.Mapping interface) and map Communication ID strings to Communications.
The classes abstract away the storage backend. If you need to optimize for performance, you may not want to use a dictionary abstraction that retrieves one Communication at a time.
-
class
concrete.util.comm_container.
DirectoryBackedCommunicationContainer
(directory_path, comm_extensions=['.comm', '.concrete', '.gz'], add_references=True)¶ Bases:
collections.abc.Mapping
Maps Comm IDs to Comms, retrieving Comms from the filesystem
DirectoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from the filesystem.
Upon initialization, a DirectoryBackedCommunicationContainer instance will (recursively) search directory_path for any files that end with the specified comm_extensions. Files with matching extensions are assumed to be Communication files whose filename (sans extension) is the file’s Communication ID. So, for example, a file named ‘XIN_ENG_20101212.0120.concrete’ is assumed to be a Communication file with a Communication ID of ‘XIN_ENG_20101212.0120’.
Files with the extension .gz will be decompressed using gzip.
A DirectoryBackedCommunicationContainer will not be able to find any files that are added to directory_path after the container was initialized.
Parameters: - directory_path (str) – Path to directory containing Communications files
- comm_extensions (str[]) – List of strings specifying filename extensions to be associated with Communications
- add_references (bool) – If True, calls
concrete.util.references.add_references_to_communication()
on any retrieved Communication
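The behavior described above (a recursive scan at initialization, IDs taken from filenames, lazy retrieval, and no awareness of files added later) can be sketched with a small read-only Mapping. This is an illustration rather than the library's implementation: `DirectoryIndex` is a hypothetical class, it returns raw file bytes instead of deserialized Communications, and it leaves out the .gz handling.

```python
import os
import tempfile
from collections.abc import Mapping


class DirectoryIndex(Mapping):
    """Read-only mapping from ID (filename sans extension) to file bytes."""

    def __init__(self, directory_path, extensions=('.comm', '.concrete')):
        self._paths = {}
        # The directory tree is scanned once, at construction time.
        for root, _dirs, files in os.walk(directory_path):
            for name in files:
                for ext in extensions:
                    if name.endswith(ext):
                        comm_id = name[:-len(ext)]
                        self._paths[comm_id] = os.path.join(root, name)
                        break

    def __getitem__(self, comm_id):
        # Lazy retrieval: the file is only read when its ID is looked up.
        with open(self._paths[comm_id], 'rb') as f:
            return f.read()

    def __iter__(self):
        return iter(self._paths)

    def __len__(self):
        return len(self._paths)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'nested'))
    path = os.path.join(tmp, 'nested', 'XIN_ENG_20101212.0120.concrete')
    with open(path, 'wb') as f:
        f.write(b'serialized bytes')
    index = DirectoryIndex(tmp)
    ids = list(index)
    payload = index['XIN_ENG_20101212.0120']
    # A file created after the scan is not visible to the index.
    with open(os.path.join(tmp, 'late.comm'), 'wb') as f:
        f.write(b'x')
    sees_late_file = 'late' in index
```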
-
class
concrete.util.comm_container.
FetchBackedCommunicationContainer
(host, port)¶ Bases:
collections.abc.Mapping
Maps Comm IDs to Comms, retrieving Comms from a
FetchCommunicationService
server
FetchBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a FetchCommunicationService.
If you need to retrieve large amounts of data from a FetchCommunicationService, then you SHOULD NOT USE THIS CLASS. This class retrieves one Communication at a time using FetchCommunicationService.
Parameters: - host (str) – Hostname of
FetchCommunicationService
server - port (int) – Port # of
FetchCommunicationService
server
-
class
concrete.util.comm_container.
MemoryBackedCommunicationContainer
(communications_file, max_file_size=1073741824, add_references=True)¶ Bases:
collections.abc.Mapping
Maps Comm IDs to Comms by loading all Comms in file into memory
MemoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. All Communications in communications_file will be read into memory using a CommunicationReader instance.
Parameters: - communications_file (str) – String specifying name of Communications file
- max_file_size (int) – Maximum file size, in bytes
- add_references (bool) – If True, calls
concrete.util.references.add_references_to_communication()
on any retrieved Communication
-
class
concrete.util.comm_container.
RedisHashBackedCommunicationContainer
(redis_db, key, add_references=True)¶ Bases:
collections.abc.Mapping
Provides access to Communications stored in a Redis hash, assuming the key of each communication is its Communication id.
RedisHashBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a Redis hash.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – Key in redis database where hash is located
- add_references (bool) – If True, calls
concrete.util.references.add_references_to_communication()
on any retrieved Communication
-
class
concrete.util.comm_container.
S3BackedCommunicationContainer
(bucket, prefix_len=4, add_references=True)¶ Bases:
collections.abc.Mapping
Provides access to Communications stored in an AWS S3 bucket, assuming the key of each communication is its Communication id (optionally prefixed with a fixed-length, random-looking but deterministic hash to improve performance).
S3BackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs (with or without prefixes) to Communications. Communications are lazily retrieved from an S3 bucket.
References
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Parameters: - bucket (boto.s3.bucket.Bucket) – S3 bucket object
- prefix_len (int) – length of prefix in each Communication’s key in the bucket. This number of characters will be removed from the beginning of the key to determine the Communication id (without incurring the cost of fetching and deserializing the Communication). A prefix enables S3 to better partition the bucket contents, yielding higher performance and a lower chance of getting rate-limited by AWS.
- add_references (bool) – If True, calls
concrete.util.references.add_references_to_communication()
on any retrieved Communication
-
class
concrete.util.comm_container.
ZipFileBackedCommunicationContainer
(zipfile_path, comm_extensions=['.comm', '.concrete'], add_references=True)¶ Bases:
collections.abc.Mapping
Maps Comm IDs to Comms, retrieving Comms from a Zip file
ZipFileBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a Zip file.
Parameters: - zipfile_path (str) – Path to Zip file containing Communications
- comm_extensions (str[]) – List of strings specifying filename extensions associated with Communications
- add_references (bool) – If True, calls
concrete.util.references.add_references_to_communication()
on any retrieved Communication
concrete.util.concrete_uuid module¶
Helper functions for generating Concrete UUID
objects
-
class
concrete.util.concrete_uuid.
AnalyticUUIDGeneratorFactory
(comm=None)¶ Bases:
object
Primary interface to generation of compressible UUIDs. Each compressible UUID takes the form
    xxxxxxxx-xxxx-yyyy-yyyy-zzzzzzzzzzzz
where each instance of x, y, or z is a hexadecimal digit, the group of x digits is shared across all annotations in a Communication, the group of y digits is shared across all annotations generated by a given analytic (by convention, AnnotationMetadata tool) in a given Communication, and the group of z digits is unique to each annotation (generated by a given analytic). Thus all UUIDs in a Communication share the same first twelve hex digits and some UUIDs in a Communication share the same middle eight hex digits. Additionally, while the x and y components are generated uniformly at random, the z component for each analytic in a Communication starts at a uniform-at-random twelve hex digits for the first annotation and increments by one for each annotation thereafter. Thus the UUIDs of a Communication likely have many substrings in common and are easily compressed. For example, we might find the following seven UUIDs in a Communication, corresponding to seven annotations split across two analytics:
    1bccb123-be45-7288-028a-4fdf3181ab51
    1bccb123-be45-7288-028a-4fdf3181ab52
    1bccb123-be45-7288-028a-4fdf3181ab53
    1bccb123-be45-df12-9c04-198eaa130a4e
    1bccb123-be45-df12-9c04-198eaa130a4f
    1bccb123-be45-df12-9c04-198eaa130a50
    1bccb123-be45-df12-9c04-198eaa130a51
One generator factory should be created per Communication, and a new generator should be created from that factory for each analytic processing the communication. Often each program represents a single analytic, so common usage is:
    augf = AnalyticUUIDGeneratorFactory(comm)
    aug = augf.create()
    for <each annotation object created by this analytic>:
        annotation.uuid = next(aug)
        <add annotation to communication>
or if you’re creating a new Communication:
    augf = AnalyticUUIDGeneratorFactory()
    aug = augf.create()
    comm = <create communication>
    comm.uuid = next(aug)
    for <each annotation object created by this analytic>:
        annotation.uuid = next(aug)
        <add annotation to communication>
where the annotation objects might be objects of type Parse, DependencyParse, TokenTagging, CommunicationTagging, etc.
-
create
()¶ Returns: A UUID generator for a new analytic.
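The compressible UUID scheme described above can be sketched in plain Python to make the x, y, and z components concrete. This is a standalone illustration, not the AnalyticUUIDGeneratorFactory implementation: `SketchUUIDFactory` and `random_hex` are hypothetical names, the generator yields plain strings rather than Concrete UUID objects, and wrapping z around after its maximum value is an assumption.

```python
import random

HEX_DIGITS = '0123456789abcdef'


def random_hex(n):
    """Return n hexadecimal characters drawn uniformly at random."""
    return ''.join(random.choice(HEX_DIGITS) for _ in range(n))


class SketchUUIDFactory:
    """One factory per Communication; one generator per analytic."""

    def __init__(self):
        # x: twelve hex digits shared by every UUID in the Communication.
        self.x = random_hex(12)

    def create(self):
        # y: eight hex digits shared by every UUID from this analytic.
        y = random_hex(8)
        # z: starts at a random twelve-digit value, then counts upward.
        z = int(random_hex(12), 16)
        while True:
            zs = format(z, '012x')
            yield '-'.join([self.x[:8], self.x[8:], y[:4], y[4:], zs])
            # Wrap-around at 16**12 is an assumption of this sketch.
            z = (z + 1) % 16 ** 12


factory = SketchUUIDFactory()
analytic_a = factory.create()
analytic_b = factory.create()
a1, a2 = next(analytic_a), next(analytic_a)
b1 = next(analytic_b)
```

Here a1, a2, and b1 all begin with the same thirteen characters (the x digits and their hyphen), while a1 and a2 additionally share the y segments and differ only in the final segment.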
-
-
class
concrete.util.concrete_uuid.
UUIDClustering
(comm)¶ Bases:
object
Representation of the UUID instance clusters in a concrete communication (each cluster represents the set of nested members of the communication that reference or are identified by a given UUID).
-
hashable_clusters
()¶ Hashable version of UUIDClustering.
Two UUIDClusterings c1 and c2 are equivalent (the two underlying Communications’ UUID structures are equivalent) if and only if:
    c1.hashable_clusters() == c2.hashable_clusters()
Returns: The set of unlabeled UUID clusters in a unique and hashable format.
-
-
class
concrete.util.concrete_uuid.
UUIDCompressor
(single_analytic=False)¶ Bases:
object
Interface to replacing a Communication’s UUIDs with compressible UUIDs.
Parameters: single_analytic (bool) – True to generate new UUIDs using a single analytic for all annotations, False to use the annotation metadata tool name as the analytic id -
compress
(comm)¶ Return a copy of a communication whose UUIDs have been replaced by compressible UUIDs using AnalyticUUIDGeneratorFactory. When this method returns, this object’s public member variable uuid_map will contain a dictionary mapping the original UUIDs to the new UUIDs.
Parameters: comm (Communication) – communication to be copied (the UUIDs of the copy will be made compressible)
Returns: Deep copy of comm with compressed UUIDs
Return type: Communication
-
-
concrete.util.concrete_uuid.
bin_to_hex
(b, n=None)¶ Return hexadecimal representation of binary value
Parameters: - b (int) – integer whose bit representation will be converted
- n (int) – length of returned hexadecimal string (the string will be left-padded with 0s if it is originally shorter than n; an exception will be thrown if it is longer; the string will be returned as-is if n is None)
Returns: a string of hexadecimal characters representing the bit sequence in b, padded to be n characters long if n is not None
Raises: ValueError
– if n is not None and the hexadecimal string representing b is longer than n
-
concrete.util.concrete_uuid.
compress_uuids
(comm, verify=False, single_analytic=False)¶ Create a copy of
Communication
comm with UUIDs converted according to the compressible UUID scheme
Parameters: - comm (Communication) –
- verify (bool) – If True, use a heuristic to verify the UUID link structure is preserved in the new Communication
- single_analytic (bool) – If True, use a single analytic prefix for all UUIDs in comm.
Returns: A 2-tuple containing the new
Communication
(converted using the compressible UUID scheme) and the UUIDCompressor
object used to perform the conversion.
Raises: ValueError
– If verify is True and comm has references added, raise because verification would cause an infinite loop.
-
concrete.util.concrete_uuid.
generate_UUID
()¶ Return a Concrete UUID object with a random UUID4 value.
Returns: a Concrete UUID
object
-
concrete.util.concrete_uuid.
generate_hex_unif
(n)¶ Generate and return random string of n hexadecimal characters.
Parameters: n (int) – number of characters of string to return
Returns: string of n i.i.d. uniform hexadecimal characters
-
concrete.util.concrete_uuid.
generate_uuid_unif
()¶ Generate and return random UUID string whose characters are drawn uniformly from the hexadecimal alphabet.
Returns: string of hexadecimal characters drawn uniformly at random (delimited into five UUID-like segments by hyphens)
-
concrete.util.concrete_uuid.
hex_to_bin
(h)¶ Return binary encoding of hexadecimal string
Parameters: h (str) – string of hexadecimal characters
Returns: an integer whose bit representation corresponds to the hexadecimal representation in h
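The documented contracts of bin_to_hex and hex_to_bin (left-padding with zeros, raising ValueError when the string is too long, and converting back to an integer) can be illustrated with built-in conversions. The functions below are hypothetical stand-ins written for this sketch, not the library's code.

```python
def sketch_bin_to_hex(b, n=None):
    """Return the hex string for integer b, left-padded with 0s to n chars."""
    h = format(b, 'x')
    if n is not None:
        if len(h) > n:
            raise ValueError(
                'hex string %r is longer than %d characters' % (h, n))
        h = h.zfill(n)
    return h


def sketch_hex_to_bin(h):
    """Return the integer whose bits correspond to hex string h."""
    return int(h, 16)


unpadded = sketch_bin_to_hex(255)
padded = sketch_bin_to_hex(255, 4)
round_trip = sketch_hex_to_bin(padded)
```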
-
concrete.util.concrete_uuid.
join_uuid
(xs, ys, zs)¶ Given three hexadecimal strings of sizes 12, 8, and 12, join them into a UUID string (inserting hyphens appropriately) and return the result.
Parameters: - xs (str) – 12 hexadecimal characters that will form first two segments of the UUID string (size 8 and size 4 respectively)
- ys (str) – 8 hexadecimal characters that will form the third and fourth segment of the UUID string (each of size 4)
- zs (str) – 12 hexadecimal characters that will form the last segment of the UUID string (size 12)
Returns: string of size 36 (12 + 8 + 12 = 32, plus four hyphens inserted appropriately) comprising UUID formed from xs, ys, and zs
Raises: ValueError
– if xs, ys, or zs have incorrect length
-
concrete.util.concrete_uuid.
split_uuid
(u)¶ Split UUID string into three hexadecimal strings of sizes 12, 8, and 12, returning those three strings (with hyphens stripped) in a tuple.
Parameters: u (str) – UUID string
Returns: a tuple of three hexadecimal strings of sizes 12, 8, and 12, corresponding to the first two segments, middle two segments, and last segment of the input UUID string (with all hyphens stripped)
Raises: ValueError
– if UUID string is malformed
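The 12/8/12 split described for join_uuid and split_uuid maps onto the usual 8-4-4-4-12 UUID layout as shown in the sketch below. The function names are hypothetical and the validation is deliberately minimal (segment lengths only); the library's own checks may differ.

```python
def sketch_split_uuid(u):
    """Split an 8-4-4-4-12 UUID string into (xs, ys, zs) of sizes 12, 8, 12."""
    parts = u.split('-')
    if [len(p) for p in parts] != [8, 4, 4, 4, 12]:
        raise ValueError('malformed UUID string: %r' % u)
    return (parts[0] + parts[1], parts[2] + parts[3], parts[4])


def sketch_join_uuid(xs, ys, zs):
    """Join hex strings of sizes 12, 8, and 12 into a 36-character UUID."""
    if (len(xs), len(ys), len(zs)) != (12, 8, 12):
        raise ValueError('expected segment sizes 12, 8, and 12')
    return '-'.join([xs[:8], xs[8:], ys[:4], ys[4:], zs])


pieces = sketch_split_uuid('1bccb123-be45-7288-028a-4fdf3181ab51')
rejoined = sketch_join_uuid(*pieces)
```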
concrete.util.file_io module¶
Code for reading and writing Concrete Communications
-
class
concrete.util.file_io.
CommunicationReader
(filename, add_references=True, filetype=0)¶ Bases:
concrete.util.file_io.ThriftReader
Iterator/generator class for reading one or more Communications from a file
The iterator returns a (Communication, filename) tuple
Supported filetypes are:
- a file with a single Communication
- a file with multiple Communications concatenated together
- a gzipped file with a single Communication
- a gzipped file with multiple Communications concatenated together
- a .tar.gz file with one or more Communications
- a .zip file with one or more Communications
Sample usage:
    for (comm, filename) in CommunicationReader('multiple_comms.tar.gz'):
        do_something(comm)
Parameters: - filename (str) – path of file to read from
- add_references (bool) – If True, calls
concrete.util.references.add_references_to_communication()
on all Communication
objects read from file - filetype (FileType) – Expected type of file. Default value is FileType.AUTO, where function will try to automatically determine file type.
-
class
concrete.util.file_io.
CommunicationWriter
(filename=None, gzip=False)¶ Bases:
object
Class for writing one or more Communications to a file
Sample usage:
    with CommunicationWriter('foo.concrete') as writer:
        writer.write(existing_comm_object)
Parameters: - filename (str) – if specified, open file at this path during construction (a file can alternatively be opened after construction using the open method)
- gzip (bool) – Flag indicating if file should be compressed with gzip
-
close
()¶ Close file.
-
open
(filename)¶ Open specified file for writing. File will be compressed if the gzip flag of the constructor was set to True.
Parameters: filename (str) – path to file to open for writing
-
write
(comm)¶ Parameters: comm (Communication) – communication to write to file
-
class
concrete.util.file_io.
CommunicationWriterTGZ
(tar_filename=None)¶ Bases:
concrete.util.file_io.CommunicationWriterTar
Class for writing one or more Communications to a .tar.gz (.tgz) archive
Sample usage:
    with CommunicationWriterTGZ('multiple_comms.tar.gz') as writer:
        writer.write(comm_object_one, 'comm_one.concrete')
        writer.write(comm_object_two, 'comm_two.concrete')
        writer.write(comm_object_three, 'comm_three.concrete')
Parameters: tar_filename (str) – if specified, open file at this path during construction (a file can alternatively be opened after construction using the open method)
-
class
concrete.util.file_io.
CommunicationWriterTar
(tar_filename=None, gzip=False)¶ Bases:
object
Class for writing one or more Communications to a .tar archive
Sample usage:
    with CommunicationWriterTar('multiple_comms.tar') as writer:
        writer.write(comm_object_one, 'comm_one.concrete')
        writer.write(comm_object_two, 'comm_two.concrete')
        writer.write(comm_object_three, 'comm_three.concrete')
Initialize
Parameters: - tar_filename (str) – if specified, open file at this path during construction (a file can alternatively be opened after construction using the open method)
- gzip (bool) – Flag indicating if .tar file should be compressed with gzip
-
close
()¶ Close tar file.
-
open
(tar_filename)¶ Open specified tar file for writing. File will be compressed if the gzip flag of the constructor was set to True.
Parameters: tar_filename (str) – path to file to open for writing
-
write
(comm, comm_filename=None)¶ Parameters: - comm (Communication) – communication to write to tar file
- comm_filename (str) – desired filename of communication within tar file (by default the filename will be the communication id appended with a .concrete extension)
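The archive layout described above (one named member per Communication, optionally gzip-compressed) can be sketched with the standard library alone. This is only an illustration under stated assumptions: write_blobs_to_tar and list_tar_members are hypothetical helpers, the byte strings stand in for serialized Communications, and nothing here is taken from the CommunicationWriterTar implementation.

```python
import io
import tarfile


def write_blobs_to_tar(tar_filename, named_blobs, gzip=False):
    """Write (member_name, bytes) pairs into a tar archive (hypothetical helper)."""
    mode = 'w:gz' if gzip else 'w'
    with tarfile.open(tar_filename, mode) as archive:
        for member_name, blob in named_blobs:
            info = tarfile.TarInfo(name=member_name)
            info.size = len(blob)  # the tar header must record the payload size
            archive.addfile(info, io.BytesIO(blob))


def list_tar_members(tar_filename):
    """Return member names in archive order, whatever the compression."""
    with tarfile.open(tar_filename, 'r:*') as archive:
        return archive.getnames()
```

Each member gets an explicit name here, which mirrors passing comm_filename to write; the real writer falls back to the communication id plus a .concrete extension when no name is given.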
-
class
concrete.util.file_io.
CommunicationWriterZip
(zip_filename=None)¶ Bases:
object
Class for writing one or more Communications to a .zip archive
Sample usage:
with CommunicationWriterZip('multiple_comms.zip') as writer:
    writer.write(comm_object_one, 'comm_one.concrete')
    writer.write(comm_object_two, 'comm_two.concrete')
    writer.write(comm_object_three, 'comm_three.concrete')
Parameters: zip_filename (str) – if specified, open file at this path during construction (a file can alternatively be opened after construction using the open method)
-
close
()¶ Close zip file.
-
open
(zip_filename=None)¶ Open specified zip file for writing.
Parameters: zip_filename (str) – path to file to open for writing
-
write
(comm, comm_filename=None)¶ Write communication to zip file.
Parameters: - comm (Communication) – communication to write to zip file
- comm_filename (str) – desired filename of communication within zip file (by default the filename will be the communication id appended with a .concrete extension)
-
-
class
concrete.util.file_io.
ThriftReader
(thrift_type, filename, postprocess=None, filetype=0)¶ Bases:
object
Iterator/generator class for reading one or more Thrift structures from a file
The iterator returns a (obj, filename) tuple where obj is an object of type thrift_type.
Supported filetypes are:
- a file with a single Thrift structure
- a file with multiple Thrift structures concatenated together
- a gzipped file with a single Thrift structure
- a gzipped file with multiple Thrift structures concatenated together
- a .tar.gz file with one or more Thrift structures
- a .zip file with one or more Thrift structures
Sample usage:
for (comm, filename) in ThriftReader(Communication, 'multiple_comms.tar.gz'):
    do_something(comm)
Parameters: - thrift_type – Class for Thrift type, e.g. Communication, TokenLattice
- filename (str) –
- postprocess (function) – A post-processing function that is called with the Thrift object as argument each time a Thrift object is read from the file
- filetype (FileType) – Expected type of file. Default value is FileType.AUTO, where function will try to automatically determine file type.
Raises: ValueError – if filetype is not a known filetype name or id
-
next
()¶ Return tuple containing next communication (and filename) in the sequence.
Raises: - EOFError – unexpected EOF, probably caused by deserializing an invalid Thrift object
- StopIteration – if there are no more communications
Returns: tuple containing Communication object and its filename
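The FileType.AUTO default implies some form of container detection before any Thrift data is read. The sketch below shows one way such detection could be done with the standard library; it is an assumption made for illustration, not a description of how ThriftReader actually decides, and guess_container with its return labels is a hypothetical name.

```python
import tarfile
import zipfile


def guess_container(filename):
    """Return 'zip', 'tar', 'gzip', or 'raw' for the file at filename."""
    if zipfile.is_zipfile(filename):
        return 'zip'
    if tarfile.is_tarfile(filename):  # also recognizes compressed tar files
        return 'tar'
    with open(filename, 'rb') as f:
        if f.read(2) == b'\x1f\x8b':  # gzip magic number
            return 'gzip'
    return 'raw'
```

Checking archives before the gzip magic number matters because a .tar.gz file also starts with the gzip header.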
-
concrete.util.file_io.
read_communication_from_file
(communication_filename, add_references=True)¶ Read a Communication from the file specified by communication_filename
Parameters: - communication_filename (str) – String with filename
- add_references (bool) – If True, calls concrete.util.references.add_references_to_communication() on the Communication read from file
Returns: Communication read from file
Return type: Communication
-
concrete.util.file_io.
read_thrift_from_file
(thrift_obj, filename)¶ Instantiate Thrift object from contents of named file
The Thrift file is assumed to be encoded using TCompactProtocol
WARNING - Thrift deserialization tends to fail silently. For example, the Thrift libraries will not complain if you try to deserialize data from the file /dev/urandom.
Parameters: - thrift_obj – A Thrift object (e.g. a Communication object)
- filename (str) – A filename string
Returns: The Thrift object that was passed in as an argument
-
concrete.util.file_io.
read_tokenlattice_from_file
(tokenlattice_filename)¶ Read a TokenLattice from a file
Parameters: tokenlattice_filename (str) – Name of file containing serialized TokenLattice
Returns: TokenLattice read from file
Return type: TokenLattice
-
concrete.util.file_io.
write_communication_to_file
(communication, communication_filename)¶ Write a Communication to a file
Parameters: - communication (Communication) – communication to write
- communication_filename (str) – path of file to write to
-
concrete.util.file_io.
write_thrift_to_file
(thrift_obj, filename)¶ Write a Thrift object to a file
Parameters: - thrift_obj – Thrift object to write
- filename (str) – path of file to write to
concrete.util.json_fu module¶
Convert Concrete objects to JSON strings
-
concrete.util.json_fu.
communication_file_to_json
(communication_filename, remove_timestamps=False, remove_uuids=False)¶ Get a “pretty-printed” JSON string representation for a
Communication
Parameters: - communication_filename (str) – Communication filename
- remove_timestamps (bool) – Flag for removing timestamps from JSON output
- remove_uuids (bool) – Flag for removing
UUID
info from JSON output
Returns: A “pretty-printed” JSON representation of the Communication
Return type: str
-
concrete.util.json_fu.
get_json_object_without_timestamps
(json_object)¶ Create a copy of a JSON object created by json.loads(), with all representations of AnnotationMetadata timestamps (dictionary keys with value timestamp) recursively removed.
Parameters: json_object – Python object created from string by json.loads()
Returns: A copy of the input data structure with all timestamp objects removed
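The recursive removal described above can be pictured with a short standalone sketch. It assumes that "timestamp" refers to dictionary keys literally named timestamp; the helper name without_keys is hypothetical and the code is not taken from concrete.util.json_fu.

```python
import json


def without_keys(json_object, key_to_drop='timestamp'):
    """Return a copy of json_object with every key_to_drop entry removed."""
    if isinstance(json_object, dict):
        return {
            key: without_keys(value, key_to_drop)
            for key, value in json_object.items()
            if key != key_to_drop
        }
    if isinstance(json_object, list):
        return [without_keys(value, key_to_drop) for value in json_object]
    return json_object  # strings, numbers, booleans, None


data = json.loads('{"metadata": {"tool": "t", "timestamp": 5}, "x": [{"timestamp": 1, "y": 2}]}')
cleaned = without_keys(data)
```

Because new dicts and lists are built at every level, the input structure is left unmodified, which matches the "copy" wording in the description.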
-
concrete.util.json_fu.
get_json_object_without_uuids
(json_object)¶ Create a copy of a JSON object created by json.loads(), with all representations of UUID objects (dictionaries containing a ‘uuidString’ key) recursively removed.
Parameters: json_object – Python object created from string by json.loads()
Returns: A copy of the input data structure with all UUID objects removed
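Removing UUID objects differs from removing a key: the whole dictionary that contains 'uuidString' disappears, whether it is a dict value or a list element. A minimal sketch under that reading follows; without_uuid_dicts and _is_uuid are hypothetical names and this is not the library's code.

```python
def _is_uuid(value):
    """True for a dictionary that looks like a serialized UUID object."""
    return isinstance(value, dict) and 'uuidString' in value


def without_uuid_dicts(json_object):
    """Return a copy of json_object with every UUID-like dictionary dropped."""
    if isinstance(json_object, dict):
        return {
            key: without_uuid_dicts(value)
            for key, value in json_object.items()
            if not _is_uuid(value)
        }
    if isinstance(json_object, list):
        return [without_uuid_dicts(v) for v in json_object if not _is_uuid(v)]
    return json_object


sample = {
    "uuid": {"uuidString": "abc"},
    "id": "doc1",
    "mentionIdList": [{"uuidString": "m1"}],
    "sections": [{"uuid": {"uuidString": "s1"}, "kind": "passage"}],
}
stripped = without_uuid_dicts(sample)
```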
-
concrete.util.json_fu.
thrift_to_json
(tobj, remove_timestamps=False, remove_uuids=False)¶ Get a “pretty-printed” JSON string representation for a Thrift object
Parameters: - tobj – A Thrift object
- remove_timestamps (bool) – Flag for removing timestamps from JSON output
- remove_uuids (bool) – Flag for removing
UUID
info from JSON output
Returns: A “pretty-printed” JSON representation of the Thrift object
Return type: str
-
concrete.util.json_fu.
tokenlattice_file_to_json
(toklat_filename)¶ Get a “pretty-printed” JSON string representation for a TokenLattice
Parameters: toklat_filename (str) – String specifying TokenLattice filename
Returns: A “pretty-printed” JSON representation of the TokenLattice
Return type: str
concrete.util.learn_wrapper module¶
-
class
concrete.util.learn_wrapper.
ActiveLearnerClientClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerClientService'>¶
-
class
concrete.util.learn_wrapper.
ActiveLearnerClientServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerClientService'>¶
-
-
class
concrete.util.learn_wrapper.
ActiveLearnerServerClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerServerService'>¶
-
class
concrete.util.learn_wrapper.
ActiveLearnerServerServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerServerService'>¶
-
-
class
concrete.util.learn_wrapper.
HTTPActiveLearnerClientClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) –
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerClientService'>¶
-
-
class
concrete.util.learn_wrapper.
HTTPActiveLearnerServerClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) –
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerServerService'>¶
-
-
class
concrete.util.learn_wrapper.
SubprocessActiveLearnerClientServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
ActiveLearnerClientServiceWrapper
-
class
concrete.util.learn_wrapper.
SubprocessActiveLearnerServerServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
ActiveLearnerServerServiceWrapper
concrete.util.locale module¶
-
concrete.util.locale.
set_stdout_encoding
()¶ Force stdout encoding to utf-8. Ideally the user should set the output encoding to utf-8 (or otherwise) in their environment, as explained on the internet, but in practice it has been difficult to get that right (and scripts writing to stdout have broken).
concrete.util.mem_io module¶
-
concrete.util.mem_io.
communication_deep_copy
(comm)¶ Return deep copy of communication.
Parameters: comm (Communication) – communication to copy
Returns: deep copy of comm
Return type: Communication
-
concrete.util.mem_io.
read_communication_from_buffer
(buf, add_references=True)¶ Deserialize buf (a binary string) and return resulting communication. Add references if requested.
Parameters: - buf (str) – String representing communication encoded from thrift
- add_references (bool) – If True, calls concrete.util.references.add_references_to_communication() on the Communication read from buffer
Returns: Communication read from buffer
Return type: Communication
-
concrete.util.mem_io.
write_communication_to_buffer
(comm)¶ Serialize communication to buffer (binary string) and return buffer.
Parameters: comm (Communication) – communication to serialize
Returns: buffer (binary string) containing the serialized communication
Return type: binary string
concrete.util.metadata module¶
-
exception
concrete.util.metadata.
MultipleAnnotationsError
(*args, **kwargs)¶ Bases:
Exception
Exception representing more than one annotation present in a concrete object when one (or zero) is expected.
-
exception
concrete.util.metadata.
ZeroAnnotationsError
(*args, **kwargs)¶ Bases:
Exception
Exception representing zero annotations present in a concrete object when one (or more) is expected.
-
concrete.util.metadata.
datetime_to_timestamp
(dt)¶ Given time-zone–unaware datetime object representing date and time in UTC, return corresponding Concrete timestamp.
Parameters: dt (datetime) – time-zone–unaware datetime object representing date and time (in UTC) to convert
Returns: concrete timestamp representing datetime dt
-
concrete.util.metadata.
filter_annotations
(annotations, filter_fields=None, sort_field=None, sort_reverse=False, action_if_multiple='pass', action_if_zero='pass')¶ Return filtered and/or re-ordered list of annotations, that is, objects containing a metadata field of type AnnotationMetadata. The default behavior is to do no filtering (or re-ordering), returning an exact copy of annotations.
Parameters: - annotations (list) – original list of annotations (objects containing a metadata field of type metadata.ttypes.AnnotationMetadata). This list is not modified.
- filter_fields (dict) – dict of fields and their desired values by which to filter annotations (keep annotations whose field FIELD equals VALUE for all FIELD: VALUE entries). Default: keep all annotations. See get_annotation_field() for valid fields.
- sort_field (str) – field by which to re-order annotations. Default: do not re-order annotations.
- sort_reverse (bool) – True to reverse order of annotations (after sorting, if any).
- action_if_multiple (str) – action to take if, after filtering, there is more than one annotation left. ‘pass’ to return all filtered and re-ordered annotations, ‘raise’ to raise an exception of type MultipleAnnotationsError, ‘first’ to return a list containing the first annotation after filtering and re-ordering, or ‘last’ to return a list containing the last annotation after filtering and re-ordering.
- action_if_zero (str) – action to take if, after filtering, there are no annotations left. ‘pass’ to return an empty list, ‘raise’ to raise an exception of type ZeroAnnotationsError.
Returns: filtered and/or re-ordered list of annotations
Raises: - ValueError – if the value of action_if_multiple or action_if_zero is not recognized
- MultipleAnnotationsError – if the value of action_if_multiple is ‘raise’ and there are multiple annotations passing the filter
- ZeroAnnotationsError – if the value of action_if_zero is ‘raise’ and there are no annotations passing the filter
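The order of operations documented for filter_annotations (filter, sort, optionally reverse, then apply the zero and multiple actions) can be followed in a simplified stand-in. Everything below is hypothetical illustration: the namedtuples replace real Concrete objects, the exception classes are local stand-ins for the ones in concrete.util.metadata, and the function is not the library implementation.

```python
from collections import namedtuple

# Stand-ins for Concrete objects carrying AnnotationMetadata (assumption).
Metadata = namedtuple('Metadata', ['tool', 'timestamp', 'kBest'])
Annotation = namedtuple('Annotation', ['name', 'metadata'])


class MultipleAnnotationsError(Exception):
    pass


class ZeroAnnotationsError(Exception):
    pass


def filter_annotations_sketch(annotations, filter_fields=None, sort_field=None,
                              sort_reverse=False, action_if_multiple='pass',
                              action_if_zero='pass'):
    if action_if_multiple not in ('pass', 'raise', 'first', 'last'):
        raise ValueError('unknown action_if_multiple: %r' % action_if_multiple)
    if action_if_zero not in ('pass', 'raise'):
        raise ValueError('unknown action_if_zero: %r' % action_if_zero)
    kept = list(annotations)  # copy, so the input list is never modified
    if filter_fields:
        kept = [a for a in kept
                if all(getattr(a.metadata, field) == value
                       for field, value in filter_fields.items())]
    if sort_field is not None:
        kept.sort(key=lambda a: getattr(a.metadata, sort_field))
    if sort_reverse:
        kept.reverse()
    if not kept and action_if_zero == 'raise':
        raise ZeroAnnotationsError()
    if len(kept) > 1:
        if action_if_multiple == 'raise':
            raise MultipleAnnotationsError()
        if action_if_multiple == 'first':
            return kept[:1]
        if action_if_multiple == 'last':
            return kept[-1:]
    return kept


a = Annotation('a', Metadata('toolA', 10, 1))
b = Annotation('b', Metadata('toolB', 20, 1))
c = Annotation('c', Metadata('toolA', 30, 1))
latest_from_tool_a = filter_annotations_sketch(
    [a, b, c], filter_fields={'tool': 'toolA'},
    sort_field='timestamp', action_if_multiple='last')
```

Selecting the most recent annotation from one tool, as in the last call, is a typical reason to combine filter_fields, sort_field and action_if_multiple.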
-
concrete.util.metadata.
filter_annotations_json
(annotations, kwargs_json)¶ Call filter_annotations() on annotations, sending it keyword arguments from the JSON-encoded dictionary kwargs_json.
Parameters: - annotations (list) – original list of annotations (objects containing a metadata field of type metadata.ttypes.AnnotationMetadata). This list is not modified.
- kwargs_json (str) – JSON-encoded dictionary of keyword arguments to be passed to filter_annotations().
Returns: annotations filtered by filter_annotations() according to provided JSON-encoded keyword arguments.
Raises: - ValueError – if the value of ‘action_if_multiple’ or ‘action_if_zero’ is not recognized
- MultipleAnnotationsError – if the value of ‘action_if_multiple’ is ‘raise’ and there are multiple annotations passing the filter
- ZeroAnnotationsError – if the value of ‘action_if_zero’ is ‘raise’ and there are no annotations passing the filter
-
concrete.util.metadata.
filter_unnone
(annotation_filter)¶ If annotation_filter is None, return no-op filter.
Parameters: annotation_filter (func) – function that takes a list of annotations and returns a filtered (and/or re-ordered) list of annotations
Returns: function that takes a list of annotations and returns a filtered (and/or re-ordered) list of annotations.
-
concrete.util.metadata.
get_annotation_field
(annotation, field)¶ Return requested field of annotation metadata.
Parameters: - annotation (object) – object containing a metadata field of type metadata.ttypes.AnnotationMetadata.
- field (str) – name of metadata field: kBest, timestamp, or tool.
Returns: value of requested field in annotation metadata.
Raises: ValueError – on unknown field name
-
concrete.util.metadata.
get_index_of_tool
(lst_of_conc, tool)¶ Return the index of the object in the provided list whose tool name matches tool.
If tool is None, return the first valid index into lst_of_conc.
- This returns -1 if:
- lst_of_conc is None, or
- lst_of_conc has no entries, or
- no object in lst_of_conc matches tool.
Parameters: - lst_of_conc (list) – list of Concrete objects, each of which has a .metadata field.
- tool (str) – A tool name to match.
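The return rules listed above are easy to check against a small stand-in. The sketch assumes objects that expose metadata.tool and reads "first valid index" as 0; index_of_tool is a hypothetical name, not the library function.

```python
from types import SimpleNamespace


def index_of_tool(lst_of_conc, tool):
    """Mirror the documented rules: -1 for None, empty, or no match."""
    if not lst_of_conc:
        return -1
    if tool is None:
        return 0  # first valid index of a non-empty list
    for index, obj in enumerate(lst_of_conc):
        if obj.metadata.tool == tool:
            return index
    return -1


taggings = [
    SimpleNamespace(metadata=SimpleNamespace(tool='tagger-a')),
    SimpleNamespace(metadata=SimpleNamespace(tool='tagger-b')),
]
```

Because -1 is also a valid Python index (the last element), callers should test the result before using it to subscript the list.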
-
concrete.util.metadata.
now_timestamp
()¶ Return timestamp representing the current time.
Returns: concrete timestamp representing the current time
-
concrete.util.metadata.
timestamp_to_datetime
(timestamp)¶ Given Concrete timestamp, return corresponding time-zone–unaware datetime object representing date and time in UTC.
Parameters: timestamp (int) – Concrete timestamp (integer representing seconds since the epoch in UTC) representing date and time to convert
Returns: datetime representing timestamp
Source: https://stackoverflow.com/questions/3694487/initialize-a-datetime-object-with-seconds-since-epoch
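The two conversions (datetime_to_timestamp and timestamp_to_datetime) are inverses over whole seconds when the datetime is time-zone-unaware and interpreted as UTC. A standard-library sketch of that relationship follows; the helper names are hypothetical and the library may compute the values differently.

```python
import calendar
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC


def naive_utc_to_seconds(dt):
    """Seconds since the epoch for a naive datetime holding UTC time."""
    return calendar.timegm(dt.timetuple())


def seconds_to_naive_utc(timestamp):
    """Naive datetime (UTC) for an integer number of seconds since the epoch."""
    return EPOCH + timedelta(seconds=timestamp)


moment = datetime(2019, 1, 2, 3, 4, 5)
round_trip = seconds_to_naive_utc(naive_utc_to_seconds(moment))
```

Using calendar.timegm rather than time.mktime keeps the local time zone of the machine out of the calculation.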
-
concrete.util.metadata.
tool_to_filter
(tool, explicit_filter=None)¶ Given tool name (deprecated way to filter annotations) or None, and an explicit annotation filter function or None, return an annotation filter function representing whichever is not None (and raise ValueError if both are not None).
Parameters: - tool (str) – name of tool to filter by, or None
- explicit_filter (func) – function taking a list of annotations as input and returning a sub-list (possibly re-ordered) as output, or None
Returns: Function taking a list of annotations as input and either applying explicit_filter to them and returning its output or filtering them by tool tool and returning that filtered list. If both tool and explicit_filter are not None, raise ValueError.
Raises: ValueError
– if both tool and explicit_filter are not None
concrete.util.net module¶
-
concrete.util.net.
find_port
()¶ Find and return an available TCP port.
Returns: an unused TCP port (an integer)
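A common way to obtain an unused TCP port is to bind a socket to port 0 and let the operating system choose. The sketch below shows that technique; whether find_port works this way is an assumption, and find_free_tcp_port is a hypothetical name.

```python
import socket


def find_free_tcp_port():
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(('127.0.0.1', 0))  # port 0 means "pick any free port"
        return sock.getsockname()[1]
```

The port is only known to be free at the moment of the call; another process could claim it before a server binds to it, so callers should be prepared for a bind failure.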
concrete.util.redis_io module¶
-
class
concrete.util.redis_io.
RedisCommunicationReader
(redis_db, key, add_references=True, **kwargs)¶ Bases:
concrete.util.redis_io.RedisReader
Iterable class for reading one or more Communications from redis. See RedisReader for further description.
Example usage:
from redis import Redis
redis_db = Redis(port=12345)
for comm in RedisCommunicationReader(redis_db, 'my-set-key'):
    do_something(comm)
Create communication reader for specified key in specified redis_db.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – name of redis key containing your communication(s)
- add_references (bool) – True to fill in members in the communication according to UUID relationships (see concrete.util.references), False to return communication as-is (note: you may need this False if you are dealing with incomplete communications)
All other keyword arguments are passed through to RedisReader; see RedisReader for a description of those arguments.
Raises: Exception – if deserialize_func is specified (it is set to the appropriate concrete deserializer internally)
-
class
concrete.util.redis_io.
RedisCommunicationWriter
(redis_db, key, uuid_hash_key=False, **kwargs)¶ Bases:
concrete.util.redis_io.RedisWriter
Class for writing one or more Communications to redis. See RedisWriter for further description.
Example usage:
from redis import Redis
redis_db = Redis(port=12345)
w = RedisCommunicationWriter(redis_db, 'my-set-key')
w.write(comm)
Create communication writer for specified key in specified redis_db.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – name of redis key containing your communication(s)
- uuid_hash_key (bool) – True to use the UUID as the hash key for a communication, False to use the id
All other keyword arguments are passed through to RedisWriter; see RedisWriter for a description of those arguments.
Raises: Exception – if serialize_func is specified (it is set to the appropriate concrete serializer internally), or if hash_key_func is specified (it is set to an appropriate function internally)
-
class
concrete.util.redis_io.
RedisReader
(redis_db, key, key_type=None, pop=False, block=False, right_to_left=True, block_timeout=0, temp_key_ttl=3600, temp_key_leaf_len=32, cycle_list=False, deserialize_func=None)¶ Bases:
object
Iterable class for reading one or more objects from redis.
Supported input types are:
- a set containing zero or more objects
- a list containing zero or more objects
- a hash containing zero or more key-object pairs
For list and set types, the reader can optionally pop (consume) its input; for lists only, the reader can moreover block on the input.
Note that iteration over a set or hash will create a temporary key in the redis database to maintain a set of elements scanned so far.
If pop is False and the key (in the database) is modified during iteration, behavior is undefined. If pop is True, modifications during iteration are encouraged.
Example usage:
from redis import Redis
redis_db = Redis(port=12345)
for obj in RedisReader(redis_db, 'my-set-key'):
    do_something(obj)
Create reader for specified key in specified redis_db.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – name of redis key containing your object(s)
- key_type (str) – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists, so probably not suitable for block and/or pop modes)
- pop (bool) – True to remove objects from redis as we iterate over them, and False to leave redis unaltered
- block (bool) – True to block for data (i.e., wait for something to be added to the list if it is empty), False to end iteration when there is no more data
- right_to_left (bool) – True to iterate over and index in lists from right to left, False to iterate/index from left to right
- deserialize_func (func) – maps blobs from redis to some more friendly representation (e.g., if all your items are unicode strings, you might want to specify lambda s: s.decode('utf-8')); return blobs unchanged if deserialize_func is None
- block_timeout (int) – number of seconds to block during operations if block is True; if 0, block forever
- temp_key_ttl (int) – time-to-live (in seconds) of temporary keys created during scans (amount of time to process a batch of items returned by a scan should be much less than the time-to-live of the temporary key, or duplicate items will be returned)
- temp_key_leaf_len (int) – length (in bytes) of random part of temporary key (longer is less likely to cause conflicts with other processes but slower)
- cycle_list (bool) – iterate over list by popping items from the right end and pushing them onto the left end (atomically), note iteration thus modifies the list (although a full iteration ultimately leaves the list in the same state as it began)
Raises: - Exception – if key_type is None but the key does not exist in the database (so its type cannot be guessed)
- ValueError – if key type is not recognized or the options that were specified are not supported for a recognized key type
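The cycle_list behavior (pop from the right end, push onto the left end, and end up with the list unchanged after a full pass) can be demonstrated without a Redis server. The sketch uses collections.deque as a stand-in for a Redis list; cycle_once is a hypothetical name and no Redis commands are involved.

```python
from collections import deque


def cycle_once(items):
    """Yield every element by rotating it from the right end to the left end."""
    for _ in range(len(items)):
        item = items.pop()       # take from the right end
        items.appendleft(item)   # push onto the left end
        yield item


queue = deque(['a', 'b', 'c'])
seen = list(cycle_once(queue))
```

After the loop, seen holds the elements in right-to-left order and queue is back in its starting order, which is the property the cycle_list description relies on.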
-
batch
(n)¶ Return a batch of n objects. May be faster than one-at-a-time iteration, but currently only supported for non-popping, non-blocking set configurations. Support for popping, non-blocking sets is planned; see http://redis.io/commands/spop .
Parameters: n (int) – number of objects to return
Raises: Exception – if key type is not a set, or if it is a set but popping or blocking operation is specified
-
class
concrete.util.redis_io.
RedisWriter
(redis_db, key, key_type=None, right_to_left=True, serialize_func=None, hash_key_func=None)¶ Bases:
object
Class for writing one or more objects to redis.
Supported input types are:
- a set of objects
- a list of objects
- a hash of key-object pairs
Example usage:
from redis import Redis
redis_db = Redis(port=12345)
w = RedisWriter(redis_db, 'my-set-key')
w.write(obj)
Create object writer for specified key in specified redis_db.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – name of redis key containing your object(s)
- key_type (str) – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists)
- right_to_left (bool) – True to write elements to the left end of lists, False to write to the right end
- serialize_func (func) – maps objects to blobs before sending to Redis (e.g., if everything you write will be a unicode string, you might want to use lambda u: u.encode('utf-8')); pass objects to Redis unchanged if serialize_func is None
- hash_key_func (func) – maps objects to keys when key_type is hash (None: use Python’s hash function)
-
clear
()¶ Remove all data from redis data structure.
-
write
(obj)¶ Write object obj to redis data structure.
Parameters: obj (object) – object to be serialized by self.serialize_func and written to database, according to key type
Raises: Exception – if called on redis key type that is not a list, set, or hash
-
concrete.util.redis_io.
read_communication_from_redis_key
(redis_db, key, add_references=True)¶ Return a serialized communication from a string key. If block is True, poll server until key appears at specified interval or until specified timeout (indefinitely if timeout is zero). Return None if block is False and key does not exist or if block is True and key does not exist after specified timeout.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – simple (string) key to read serialized communication from
- add_references (bool) – If True, calls concrete.util.references.add_references_to_communication() on the Communication read from redis
-
concrete.util.redis_io.
write_communication_to_redis_key
(redis_db, key, comm)¶ Serialize communication and store result in redis key.
Parameters: - redis_db (redis.Redis) – Redis database connection object
- key (str) – name of simple (string) redis key to write communication to
- comm (Communication) – communication to serialize
concrete.util.references module¶
Add reference variables for each UUID “pointer” in a Communication
-
concrete.util.references.
add_references_to_communication
(comm)¶ Create references for each UUID ‘pointer’
Parameters: comm (Communication) – A Concrete Communication object, will be modified by this function
The Concrete schema uses UUID objects as internal pointers between Concrete objects. This function adds member variables to Concrete objects that are references to the Concrete objects identified by the UUID.
For example, each Entity has a mentionIdList that lists the UUIDs of the EntityMention objects for that Entity. This function adds a mentionList variable to the Entity that is a list of references to the actual EntityMention objects. This allows you to access the EntityMention objects using:
entity.mentionList
This function adds these reference variables:
- tokenization to each TokenRefSequence
- entityMention to each Argument
- sentence backpointer to each Tokenization
- parentMention backpointer to appropriate EntityMention
And adds these lists of reference variables:
- mentionList to each Entity
- situationMention to each Argument
- mentionList to each Situation
- childMentionList to each EntityMention
For variables that represent optional lists of UUID objects (e.g. situation.mentionIdList), Python Thrift will set the variable to None if the list is not provided. When this function adds a list-of-references variable (in this case, situation.mentionList) for an omitted optional list, it sets the new variable to None; it DOES NOT leave the variable undefined.
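The pointer-resolution idea, including the None rule for omitted optional lists, can be sketched with plain Python classes. The Mention and Entity classes below are simplified stand-ins (real Concrete objects use UUID objects rather than plain strings), and add_mention_references is a hypothetical helper, not the library function.

```python
class Mention:
    """Stand-in for an EntityMention, identified by a UUID string."""
    def __init__(self, uuid, text):
        self.uuid = uuid
        self.text = text


class Entity:
    """Stand-in for an Entity holding UUID pointers to its mentions."""
    def __init__(self, mention_ids):
        self.mentionIdList = mention_ids  # list of UUID strings, or None


def add_mention_references(entities, mentions):
    """Attach mentionList to each entity by resolving its UUID pointers."""
    by_uuid = {mention.uuid: mention for mention in mentions}
    for entity in entities:
        if entity.mentionIdList is None:
            entity.mentionList = None  # defined, but None for an omitted list
        else:
            entity.mentionList = [by_uuid[u] for u in entity.mentionIdList]


mentions = [Mention('m1', 'Ada Lovelace'), Mention('m2', 'she')]
entities = [Entity(['m1', 'm2']), Entity(None)]
add_mention_references(entities, mentions)
```

Building the UUID-to-object dictionary once keeps the resolution linear in the number of pointers, and the resolved list holds references to the original objects rather than copies.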
concrete.util.results_wrapper module¶
-
class
concrete.util.results_wrapper.
HTTPResultsServerClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) –
-
concrete_service_class
= <module 'concrete.services.results.ResultsServerService'>¶
-
-
class
concrete.util.results_wrapper.
ResultsServerClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.services.results.ResultsServerService'>¶
-
class
concrete.util.results_wrapper.
ResultsServerServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service
-
concrete_service_class
= <module 'concrete.services.results.ResultsServerService'>¶
-
-
class
concrete.util.results_wrapper.
SubprocessResultsServerServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
ResultsServerServiceWrapper
concrete.util.search_wrapper module¶
-
class
concrete.util.search_wrapper.
FeedbackClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.search.FeedbackService'>¶
-
class
concrete.util.search_wrapper.
FeedbackServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service
-
concrete_service_class
= <module 'concrete.search.FeedbackService'>¶
-
-
class
concrete.util.search_wrapper.
HTTPSearchClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) –
-
concrete_service_class
= <module 'concrete.search.SearchService'>¶
-
-
class
concrete.util.search_wrapper.
SearchClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.search.SearchService'>¶
-
class
concrete.util.search_wrapper.
SearchProxyClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.search.SearchProxyService'>¶
-
class
concrete.util.search_wrapper.
SearchProxyServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service -
concrete_service_class
= <module 'concrete.search.SearchProxyService'>¶
-
-
class
concrete.util.search_wrapper.
SearchServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service -
concrete_service_class
= <module 'concrete.search.SearchService'>¶
-
-
class
concrete.util.search_wrapper.
SubprocessFeedbackServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
FeedbackServiceWrapper
-
class
concrete.util.search_wrapper.
SubprocessSearchProxyServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
SearchProxyServiceWrapper
-
class
concrete.util.search_wrapper.
SubprocessSearchServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
SearchServiceWrapper
concrete.util.service_wrapper module¶
-
class
concrete.util.service_wrapper.
ConcreteServiceClientWrapper
(host, port)¶ Bases:
object
Base class for a wrapper around a Concrete service client. Implements the context manager interface so client can be controlled using the with: statement (client connection is closed when the with: scope is exited).
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
class
concrete.util.service_wrapper.
ConcreteServiceWrapper
(implementation)¶ Bases:
object
Base class for a wrapper around a Concrete service that runs in (blocks) the current process.
Parameters: implementation (object) – handler of specified concrete service -
serve
(host, port)¶ Serve on specified host and port in current process, blocking until server is killed. (If server is not killed by signal or otherwise it will block forever.)
Parameters: - host (str) – hostname to serve on
- port (int) – port number to serve on
-
-
class
concrete.util.service_wrapper.
HTTPConcreteServiceClientWrapper
(uri)¶ Bases:
object
Base class for a wrapper around an HTTP Concrete service client. Implements the context manager interface so client can be controlled using the with: statement (client connection is closed when the with: scope is exited).
Parameters: uri (str) –
-
class
concrete.util.service_wrapper.
SubprocessConcreteServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
object
Base class for a wrapper around a Concrete service that runs in a subprocess; implements the context manager interface so subprocess can be controlled using the with: statement (subprocess is stopped and joined when the with: scope is exited).
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
SLEEP_INTERVAL
= 0.1¶
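The timeout parameter and SLEEP_INTERVAL suggest a polling loop while the subprocess server starts. The sketch below illustrates that pattern with the standard library only. It is not the wrapper's actual implementation: wait_until and its ready callback are hypothetical names, and a real readiness check might instead try to open a socket to the given host and port.

```python
import time


def wait_until(ready, timeout=None, sleep_interval=0.1):
    """Poll ready() until it returns True or the timeout expires.

    Hypothetical helper: `ready` is any zero-argument callable.
    Returns True once ready() succeeds, False if `timeout` seconds
    pass first. A timeout of None means wait forever.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not ready():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        # Sleep briefly between checks, as SLEEP_INTERVAL = 0.1 suggests.
        time.sleep(sleep_interval)
    return True
```

Passing the check as a callable keeps the loop independent of how readiness is detected.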
concrete.util.simple_comm module¶
Create a simple (valid) Communication suitable for testing purposes
-
class
concrete.util.simple_comm.
SimpleCommTempFile
(n=10, id_fmt='temp-%d', sentence_fmt='Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix='.concrete')¶ Bases:
object
DEPRECATED. Please use
create_comm()
instead. Class representing a temporary file of sample concrete objects. Designed to facilitate testing.
-
path
¶ path to file
Type: str
-
communications
¶ List of communications that were written to file
Type: Communication[]
Usage:
    from concrete.util import CommunicationReader
    from concrete.util.simple_comm import SimpleCommTempFile

    with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
        reader = CommunicationReader(f.path)
        # each item read back is a (Communication, path) pair
        for (orig_comm, comm_path_pair) in zip(f.communications, reader):
            print(orig_comm.id)
            print(orig_comm.id == comm_path_pair[0].id)
            print(f.path == comm_path_pair[1])
Create temp file and write communications.
Parameters: - n – number of communications to write
- id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
- sentence_fmt – format string used to generate sentence text; should contain one instance of %d, which will be replaced by the number of the communication
- writer_class – CommunicationWriter or CommunicationWriterTGZ
- suffix – file path suffix (you probably want to choose this to match writer_class)
-
-
concrete.util.simple_comm.
add_annotation_level_argparse_argument
(parser)¶ Add an ‘--annotation-level’ argument to an ArgumentParser
The ‘--annotation-level’ argument specifies the level of concrete annotation to infer from whitespace in text. See
create_comm()
for details.
Parameters: parser (argparse.ArgumentParser) – the parser to add the argument to
-
concrete.util.simple_comm.
create_comm
(comm_id, text='', comm_type='article', section_kind='passage', metadata_tool='concrete-python', metadata_timestamp=None, annotation_level='token')¶ Create a simple, valid
Communication
from text. By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections. Each section will be created with a call to
create_section().
annotation_level controls the amount of annotation that is added:
- AL_NONE: add no optional annotations (not even sections)
- AL_SECTION: add sections but not sentences
- AL_SENTENCE: add sentences but not tokens
- AL_TOKEN: add all annotations, up to tokens (the default)
Parameters: - comm_id (str) – Communication id
- text (str) – Communication text
- comm_type (str) – Communication type
- section_kind (str) – Section kind to set on all sections
- metadata_tool (str) – tool name of analytic that generated this text
- metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
- annotation_level (str) – string representing annotation level to add to communication (see above)
Returns: Communication containing given text and metadata
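The splitting rule described above (double newlines separate sections, single newlines separate sentences, whitespace separates tokens) can be sketched without the library. This is an illustration only: split_text is a hypothetical name, it returns nested lists rather than Concrete objects, and the level strings other than 'token' (the documented default) are assumed values for the AL_* constants.

```python
def split_text(text, annotation_level='token'):
    """Split text into sections, sentences and tokens by whitespace.

    Hypothetical stand-in for the documented behaviour; the level
    names 'none', 'section' and 'sentence' are assumptions.
    """
    if annotation_level == 'none':
        return []
    # Sections are separated by blank lines (double newlines).
    sections = [sec for sec in text.split('\n\n') if sec.strip()]
    if annotation_level == 'section':
        return sections
    # Sentences are the non-empty lines inside each section.
    sentences = [
        [line for line in sec.split('\n') if line.strip()]
        for sec in sections
    ]
    if annotation_level == 'sentence':
        return sentences
    # Tokens are the whitespace-separated pieces of each sentence.
    return [[sen.split() for sen in sec] for sec in sentences]
```

Each level adds one layer of nesting to the result, mirroring how each annotation level adds one layer of annotation.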
-
concrete.util.simple_comm.
create_section
(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)¶ Create
Section
from provided text and metadata. Section text will be split into sentence texts by newlines and each sentence will be created with a call to create_sentence().
Lower-level routine (called by
create_comm()).
Parameters: - sec_text (str) – text to create section from
- sec_start (int) – starting position of section in Communication text (inclusive)
- sec_end (int) – ending position of section in Communication text (inclusive)
- section_kind (str) – value for Section.kind field to be set to
- aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this section
- metadata_tool (str) – tool name of the analytic that generated this section
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – See
create_comm()
for details
Returns: Concrete Section containing given text and metadata
-
concrete.util.simple_comm.
create_sentence
(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)¶ Create
Sentence
from provided text and metadata. Lower-level routine (called indirectly by
create_comm()).
Parameters: - sen_text (str) – text to create sentence from
- sen_start (int) – starting position of sentence in Communication text (inclusive)
- sen_end (int) – ending position of sentence in Communication text (inclusive)
- aug (_AnalyticUUIDGenerator) – compressible UUID generator for the analytic that generated this sentence
- metadata_tool (str) – tool name of the analytic that generated this sentence
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – See
create_comm()
for details
Returns: Concrete Sentence containing given text and metadata
-
concrete.util.simple_comm.
create_simple_comm
(comm_id, sentence_string='Super simple sentence .')¶ Create a simple (valid)
Communication
suitable for testing purposes. The Communication will have a single
Section
containing a single Sentence.
Parameters: - comm_id (str) – Communication id
- sentence_string (str) – Communication text
Returns: Communication containing given text and having the given id
concrete.util.summarization_wrapper module¶
-
class
concrete.util.summarization_wrapper.
HTTPSummarizationClientWrapper
(uri)¶ Bases:
concrete.util.service_wrapper.HTTPConcreteServiceClientWrapper
Parameters: uri (str) – -
concrete_service_class
= <module 'concrete.summarization.SummarizationService'>¶
-
-
class
concrete.util.summarization_wrapper.
SubprocessSummarizationServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
Parameters: - implementation (object) – handler of specified concrete service
- host (str) – hostname that will be served on when context is entered
- port (int) – port number that will be served on when context is entered
- timeout (int) – number of seconds to wait for server to start in subprocess, when context is entered (if None, wait forever)
-
concrete_service_wrapper_class
¶ alias of
SummarizationServiceWrapper
-
class
concrete.util.summarization_wrapper.
SummarizationClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
Parameters: - host (str) – hostname to connect to
- port (int) – port number to connect to
-
concrete_service_class
= <module 'concrete.summarization.SummarizationService'>¶
-
class
concrete.util.summarization_wrapper.
SummarizationServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
Parameters: implementation (object) – handler of specified concrete service -
concrete_service_class
= <module 'concrete.summarization.SummarizationService'>¶
-
concrete.util.thrift_factory module¶
-
class
concrete.util.thrift_factory.
ThriftFactory
(transportFactory, protocolFactory)¶ Bases:
object
Abstract factory to create Thrift objects for client and server.
-
createProtocol
(transport)¶ Return new thrift protocol on transport.
Parameters: transport (TTransport.TTransport) – transport to create protocol on Returns: TProtocol.TProtocolBase
-
createServer
(processor, host, port)¶ Return new thrift server given a service handler and the server host and port.
Parameters: - processor – concrete service handler
- host (str) – hostname to serve on
- port (int) – port number to serve on
Returns: TServer.TThreadedServer
-
createSocket
(host, port)¶ Return new thrift socket.
Parameters: - host (str) – hostname to create socket on
- port (int) – port number to create socket on
Returns: TSocket.TSocket
-
createTransport
(socket)¶ Return new thrift transport on socket.
Parameters: socket (TSocket.TSocket) – socket to create transport on Returns: TTransport.TTransport
-
-
concrete.util.thrift_factory.
is_accelerated
()¶ Return whether this concrete-python installation has accelerated serialization.
Returns: True if this concrete-python installation is accelerated, False otherwise
concrete.util.tokenization module¶
-
exception
concrete.util.tokenization.
NoSuchTokenTagging
(*args, **kwargs)¶ Bases:
Exception
Exception representing there is no
TokenTagging
annotation that matches the given criteria in a given concrete object
-
concrete.util.tokenization.
compute_lattice_expected_counts
(lattice)¶ Given a
TokenLattice
in which the dst, src, token, and weight fields are set in each arc, compute and return a list of expected token log-probabilities. Input arc weights are treated as unnormalized log-probabilities.
Parameters: lattice (TokenLattice) – lattice to compute expected counts for Returns: List of floats (expected log-probabilities) with the float at position i corresponding to the token with tokenIndex i.
-
concrete.util.tokenization.
flatten
(a)¶ Return flattened version of input list.
Parameters: a (list) – Returns: Flattened list Return type: list
-
concrete.util.tokenization.
get_comm_tokenizations
(comm, tool=None)¶ Get list of
Tokenization
objects in a Communication
Parameters: - comm (Communication) – communication to extract tokenizations from
- tool (str) – If not None, only return
Tokenization
objects whose metadata.tool field is equal to tool
Returns: List of
Tokenization
objects
-
concrete.util.tokenization.
get_comm_tokens
(comm, sect_pred=None, suppress_warnings=False)¶ Get list of
Token
objects in Communication.
Parameters: - comm (Communication) – communication to extract tokens from
- sect_pred (function) – Function that takes a
Section
and returns false if the Section
should be excluded
- suppress_warnings (bool) – True to suppress warning messages that Tokenization.kind is None
Returns: List of
Token
objects in Communication, delegating to get_tokens()
for each sentence.
-
concrete.util.tokenization.
get_lemmas
(t, tool=None)¶ Returns the result of
get_tagged_tokens()
with a tagging_type of “LEMMA”.
Parameters: - t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for
TokenTagging
objects whose metadata.tool field is equal to tool
Returns: list of ‘LEMMA’-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.
get_ner
(t, tool=None)¶ Returns the result of
get_tagged_tokens()
with a tagging_type of “NER”.
Parameters: - t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for
TokenTagging
objects whose metadata.tool field is equal to tool
Returns: list of ‘NER’-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.
get_pos
(t, tool=None)¶ Returns the result of
get_tagged_tokens()
with a tagging_type of “POS”.
Parameters: - t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for
TokenTagging
objects whose metadata.tool field is equal to tool
Returns: list of ‘POS’-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.
get_tagged_tokens
(tokenization, tagging_type, tool=None)¶ Return list of
TaggedToken
objects of taggingType equal to tagging_type, if there is a unique choice.
Parameters: - tokenization (Tokenization) – tokenization to return tagged tokens for
- tagging_type (str) – only return tagged tokens for
TokenTagging
objects whose taggingType field is equal to tagging_type
- tool (str) – If not None, only return tagged tokens for
TokenTagging
objects whose metadata.tool field is equal to tool
Returns: List of
TaggedToken
objects of taggingType equal to tagging_type, if there is a unique choice.
Raises: - NoSuchTokenTagging – if there is no matching tagging
- Exception – if there is more than one matching tagging.
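The "unique choice" rule can be sketched as a filter followed by a count check. The code below is a self-contained illustration, not the library's implementation: Metadata, Tagging, NoMatchingTagging and select_unique_tagging are hypothetical stand-ins, and the sketch compares tagging types exactly rather than reproducing any case handling the library may apply.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Concrete TokenTagging and metadata types.
Metadata = namedtuple('Metadata', ['tool'])
Tagging = namedtuple('Tagging', ['taggingType', 'metadata', 'taggedTokenList'])


class NoMatchingTagging(Exception):
    """Stand-in for NoSuchTokenTagging."""


def select_unique_tagging(taggings, tagging_type, tool=None):
    """Return the tagged tokens of the single matching tagging."""
    matches = [
        t for t in taggings
        if t.taggingType == tagging_type
        and (tool is None or t.metadata.tool == tool)
    ]
    if not matches:
        raise NoMatchingTagging(tagging_type)
    if len(matches) > 1:
        # Ambiguous: the caller should narrow the choice with `tool`.
        raise Exception('more than one tagging of type %r' % tagging_type)
    return matches[0].taggedTokenList
```

Passing tool is the way to resolve the ambiguous case when several analytics have produced taggings of the same type.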
-
concrete.util.tokenization.
get_token_taggings
(tokenization, tagging_type, case_sensitive=False)¶ Return list of
TokenTagging
objects of taggingType equal to tagging_type.
Parameters: - tokenization (Tokenization) – tokenization from which taggings will be selected
- tagging_type (str) – value of taggingType to filter to
- case_sensitive (bool) – True to do case-sensitive matching on taggingType.
Returns: List of
TokenTagging
objects of taggingType equal to tagging_type, in same order as they appeared in the tokenization. If no matching TokenTagging
objects exist, return an empty list.
-
concrete.util.tokenization.
get_tokenizations
(comm, tool=None)¶ Returns a flat list of all Tokenization objects in a Communication
Parameters: - comm (Communication) – communication to get tokenizations from
- tool (str) – if not None, return only tokenizations whose metadata.tool field matches tool
Returns: A list of all Tokenization objects within the Communication matching tool (if it is not None)
-
concrete.util.tokenization.
get_tokens
(tokenization, suppress_warnings=False)¶ Get list of
Token
objects for a Tokenization.
Return list of Tokens from lattice.cachedBestPath, if Tokenization kind is TOKEN_LATTICE; else, return list of Tokens from tokenList.
Warn and return list of Tokens from tokenList if kind is not set.
Return None if kind is set but the respective data fields are not.
Parameters: - tokenization (Tokenization) – tokenization to extract tokens from
- suppress_warnings (bool) – True to suppress warning messages that tokenization.kind is None
Returns: List of
Token
objects, or None
Raises: ValueError
– if tokenization.kind is not a recognized tokenization kind
-
concrete.util.tokenization.
plus
(x, y)¶ Return concatenation of two lists.
Parameters: - x (list) –
- y (list) –
Returns: list concatenation of x and y
concrete.util.twitter module¶
Convert between JSON and Concrete representations of Tweets
The JSON fields used by the Twitter API are documented at:
-
concrete.util.twitter.
capture_tweet_lid
(tweet)¶ Read the lang field from a tweet from the Twitter API, if it exists, and return the corresponding Concrete
LanguageIdentification
object.
Parameters: tweet (dict) – Object created by deserializing a JSON Tweet string Returns: LanguageIdentification
object, or None if the lang field is not present in the Tweet JSON
-
concrete.util.twitter.
json_tweet_object_to_Communication
(tweet)¶ Convert deserialized JSON Tweet object to
Communication
Parameters: tweet (object) – Object created by deserializing a JSON Tweet string Returns: Communication representing the Tweet, with tweetInfo and text fields set (among others) but with a null (None) sectionList. Return type: Communication
-
concrete.util.twitter.
json_tweet_object_to_TweetInfo
(tweet)¶ Create
TweetInfo
object from deserialized JSON Tweet object.
Parameters: tweet (dict) – Object created by deserializing a JSON Tweet string Returns: concrete object representing twitter metadata from tweet Return type: TweetInfo
-
concrete.util.twitter.
json_tweet_string_to_Communication
(json_tweet_string, check_empty=False, check_delete=False)¶ Convert JSON Tweet string to Communication
Parameters: - json_tweet_string (str) – JSON Tweet string from Twitter API
- check_empty (bool) – If True, check if json_tweet_string is empty (return None if it is)
- check_delete (bool) – If True, check for presence of delete field in Tweet JSON, and if the ‘delete’ field is present, return None
Returns: Communication representing the Tweet, with tweetInfo and text fields set (among others) but with a null (None) sectionList.
Return type: Communication
-
concrete.util.twitter.
json_tweet_string_to_TweetInfo
(json_tweet_string)¶ Create
TweetInfo
object from JSON Tweet string.
Parameters: json_tweet_string (str) – JSON Tweet string from Twitter API Returns: concrete twitter metadata object with fields set from json_tweet_string Return type: TweetInfo
-
concrete.util.twitter.
snake_case_to_camelcase
(value)¶ Converts snake case to camel case
Implementation copied from this Stack Overflow post: http://goo.gl/SSgo9k
Parameters: value (str) – snake case (lower case with underscores) value Returns: camel case string corresponding to value (with isolated underscores stripped and sequences of two or more underscores reduced by one underscore) Return type: str
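One way to obtain the behaviour described for underscores between words is to split on underscores and capitalize every piece after the first. The sketch below is an assumption about how such a conversion can be written, not the library's source; snake_to_camel is a hypothetical name.

```python
def snake_to_camel(value):
    """Convert snake_case to camelCase (hypothetical sketch)."""
    def casers():
        yield str.lower           # the first word stays lower case
        while True:
            yield str.capitalize  # every later word is capitalized

    caser = casers()
    # Consecutive underscores produce empty pieces; each one is kept
    # as a single '_' so a run of n underscores shrinks to n - 1.
    return ''.join(
        next(caser)(piece) if piece else '_'
        for piece in value.split('_')
    )


print(snake_to_camel('in_reply_to_user_id'))
```

For the Tweet field name in_reply_to_user_id this prints inReplyToUserId.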
-
concrete.util.twitter.
twitter_lid_to_iso639_3
(twitter_lid)¶ Convert Twitter Language ID string to ISO639-3 code
Parameters: twitter_lid (str) – This can be an iso639-3 code (no-op), iso639-1 2-letter abbr (converted to 3), or combo (split by ‘-’, then first part converted)
Per the Twitter documentation, “The language code may be formatted as ISO 639-1 alpha-2 (en), ISO 639-3 alpha-3 (msa), or ISO 639-1 alpha-2 combined with an ISO 3166-1 alpha-2 localization (zh-tw).”
Returns: the ISO639-3 code corresponding to twitter_lid Return type: str
concrete.util.unnone module¶
-
concrete.util.unnone.
dun
(d)¶ If d is None return an empty dict, else return d. Simplifies iteration over dict fields that might be unset.
Parameters: d (dict) – input dict (or None) Returns: d, or an empty dict if d is None
-
concrete.util.unnone.
lun
(lst)¶ If lst is None return an empty list, else return lst. Simplifies iteration over list fields that might be unset.
Parameters: lst (list) – input list (or None) Returns: lst, or an empty list if lst is None
-
concrete.util.unnone.
sun
(s)¶ If s is None return an empty set, else return s. Simplifies iteration over set fields that might be unset.
Parameters: s (set) – input set (or None) Returns: s, or an empty set if s is None
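The three helpers share one pattern: replace None with an empty container so a loop over an unset optional field simply runs zero times. The definitions below are behavioural stand-ins written for illustration, not the library's source, and entity_set_list is just an example variable holding an unset field.

```python
def lun(lst):
    """Stand-in: return lst, or an empty list if lst is None."""
    return [] if lst is None else lst


def dun(d):
    """Stand-in: return d, or an empty dict if d is None."""
    return {} if d is None else d


def sun(s):
    """Stand-in: return s, or an empty set if s is None."""
    return set() if s is None else s


# An unset optional list field can be iterated without a None check.
entity_set_list = None
count = sum(1 for _ in lun(entity_set_list))
```

When the input is not None the same object is returned, so no copy is made.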
concrete.validate module¶
Library to validate a Concrete Communication
Validation info, error and warning messages are logged using the Python standard library’s logging module.
-
concrete.validate.
validate_communication
(comm)¶ Test if all objects in a
Communication
are valid. Calls
validate_thrift_deep()
to check for Concrete data structure fields that are required by the Concrete Thrift definitions. Then calls:
validate_token_offsets_for_section()
validate_token_offsets_for_sentence()
validate_constituency_parses()
validate_dependency_parses()
validate_token_taggings()
validate_entity_mention_ids()
validate_entity_mention_tokenization_ids()
validate_situations()
validate_situation_mentions()
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_communication_file
(communication_filename)¶ Test if the
Communication
in a file is valid. Deserializes a
Communication
file into memory, then calls validate_communication()
on the Communication object.
Parameters: communication_filename (str) – Name of file containing the Communication Returns: bool
-
concrete.validate.
validate_constituency_parses
(comm, tokenization)¶ Test a
Tokenization
’s constituency Parse
objects. Verifies that, for each constituent
Parse:
- none of the constituent IDs for the parse repeat
- the parse tree is a fully connected graph
- the parse “tree” is really a tree data structure
Parameters: - comm (Communication) –
- tokenization (Tokenization) –
Returns: bool
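The three checks listed for a constituent parse (unique IDs, connectedness, tree shape) can be sketched on a plain list of nodes. This is an illustration under assumptions, not the validator's code: Constituent is a hypothetical stand-in carrying only an id and a list of child ids, and is_valid_tree is a hypothetical name.

```python
from collections import namedtuple

# Hypothetical stand-in for a parse constituent: an id plus child ids.
Constituent = namedtuple('Constituent', ['id', 'childList'])


def is_valid_tree(constituents):
    """Return True if the constituents form a single rooted tree."""
    ids = [c.id for c in constituents]
    id_set = set(ids)
    if len(ids) != len(id_set):
        return False  # a constituent id repeats
    parents = {}
    for c in constituents:
        for child in c.childList:
            if child not in id_set or child in parents:
                return False  # unknown child, or a node with two parents
            parents[child] = c.id
    roots = [i for i in ids if i not in parents]
    if len(roots) != 1:
        return False  # no root at all, or several disconnected trees
    # Every node must be reachable from the single root.
    by_id = {c.id: c for c in constituents}
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(by_id[node].childList)
    return len(seen) == len(ids)
```

Requiring at most one parent per node and exactly one root is what separates a tree from a general connected graph.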
-
concrete.validate.
validate_dependency_parses
(tokenization)¶ Test a
Tokenization
’s DependencyParse
objects. Verifies that, for each
DependencyParse:
- the parse is a fully connected graph
- there are no nodes with a null governor node whose edgeType is not root
Parameters: tokenization (Tokenization) – Returns: bool
-
concrete.validate.
validate_entity_mention_ids
(comm)¶ Test if all
Entity
mentionIds are valid. Checks if all
Entity
mentionId UUIDs refer to an EntityMention
UUID
that exists in the Communication.
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_entity_mention_token_ref_sequences
(comm)¶ Test if all
EntityMention
objects have a valid TokenRefSequence
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_entity_mention_tokenization_ids
(comm)¶ Test tokenizationID field of every
EntityMention
Verifies that, for each
EntityMention
, the entityMention.tokens.tokenizationId UUID
field matches the UUID
of a Tokenization
that exists in this Communication.
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_situation_mentions
(comm)¶ Test every
SituationMention
in the Communication
A
SituationMention
has a list of MentionArgument
objects, and each MentionArgument
can point to an EntityMention, SituationMention
or TokenRefSequence.
Checks that each
MentionArgument
points to only one type of argument. Also checks validity of all EntityMention
and SituationMention
UUIDs.
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_situations
(comm)¶ Test every
Situation
in the Communication
Checks the validity of all
EntityMention
and SituationMention
UUIDs referenced by each Situation.
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_thrift
(thrift_object, indent_level=0)¶ Test if a Thrift object has all required fields.
This function calls the Thrift object’s validate() function. If an exception is raised because of missing required fields, the function catches the exception and logs the exception’s error message using the Python Standard Library’s logging module.
Parameters: - thrift_object –
- indent_level (int) – Text indentation level for logging error message
Returns: bool
-
concrete.validate.
validate_thrift_deep
(thrift_object, valid=True)¶ Deep validation of thrift messages.
Parameters: thrift_object – a Thrift object
The Python version of Thrift 0.9.1 does not support deep (recursive) validation, and none of the Thrift serialization/deserialization code calls even the shallow validation functions provided by Thrift.
This function implements deep validation. The code is adapted from:
See this blog post for more information:
A patch to implement deep validation was submitted to the Thrift repository in February of 2013:
but Thrift 0.9.1 - which was released on 2013-08-21 - does not include this functionality.
-
concrete.validate.
validate_thrift_object_required_fields
(thrift_object, indent_level=0)¶ DEPRECATED: Use
validate_thrift()
instead
-
concrete.validate.
validate_thrift_object_required_fields_recursively
(thrift_object, valid=True)¶ DEPRECATED. Use
validate_thrift_deep()
instead.
-
concrete.validate.
validate_token_offsets_for_section
(section)¶ Test if the
TextSpan
boundaries for all Sentence
objects in a Section
fall within the boundaries of the Section’s TextSpan.
Parameters: section (Section) – Returns: bool
-
concrete.validate.
validate_token_offsets_for_sentence
(sentence)¶ Test if the
TextSpan
boundaries for all Token
objects in a Sentence
fall within the boundaries of the Sentence’s TextSpan.
Parameters: sentence (Sentence) – Returns: bool
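Both offset checks (sentences within a section, tokens within a sentence) reduce to testing that every child span lies inside its parent span. The sketch below is a minimal illustration: the TextSpan tuple with start and ending fields is a stand-in defined here, and spans_within is a hypothetical name, not a function of this module.

```python
from collections import namedtuple

# Stand-in span type; the field names are assumptions for this sketch.
TextSpan = namedtuple('TextSpan', ['start', 'ending'])


def spans_within(parent, children):
    """Return True if every child span lies inside the parent span."""
    return all(
        parent.start <= child.start and child.ending <= parent.ending
        for child in children
    )


sentence_span = TextSpan(10, 30)
token_spans = [TextSpan(10, 15), TextSpan(16, 24), TextSpan(25, 30)]
all_inside = spans_within(sentence_span, token_spans)
```

A token span that starts before or ends after the sentence span would make the check return False.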
-
concrete.validate.
validate_token_ref_sequence
(comm, token_ref_sequence)¶ Check if a
TokenRefSequence
is valid. Verify that all token indices in the
TokenRefSequence
point to actual token indices in the corresponding Tokenization.
Parameters: - comm (Communication) –
- token_ref_sequence (TokenRefSequence) –
Returns: bool
-
concrete.validate.
validate_token_taggings
(tokenization)¶ Test if a
Tokenization
has any TokenTagging
objects with invalid token indices.
Parameters: tokenization (Tokenization) – Returns: bool
Low-level interface (Concrete schema)¶
Note that all data types defined by the Concrete schema, except for services, can be imported directly from the top-level concrete package. For example, instead of
from concrete.communication.ttypes import Communication
you can write
from concrete import Communication
concrete.access package¶
concrete.access.FetchCommunicationService module¶
-
class
concrete.access.FetchCommunicationService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.access.FetchCommunicationService.Iface
Service to fetch particular communications.-
fetch
(request)¶ - Parameters:- request
-
getCommunicationCount
()¶ - Get the number of Communications this service searches over. Implementations that do not provide this should throw an exception.
-
getCommunicationIDs
(offset, count)¶ - Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations that do not provide this should throw an exception. Parameters: - offset - count
-
recv_fetch
()¶
-
recv_getCommunicationCount
()¶
-
recv_getCommunicationIDs
()¶
-
send_fetch
(request)¶
-
send_getCommunicationCount
()¶
-
send_getCommunicationIDs
(offset, count)¶
-
-
class
concrete.access.FetchCommunicationService.
Iface
¶ Bases:
concrete.services.Service.Iface
Service to fetch particular communications.-
fetch
(request)¶ - Parameters:- request
-
getCommunicationCount
()¶ - Get the number of Communications this service searches over. Implementations that do not provide this should throw an exception.
-
getCommunicationIDs
(offset, count)¶ - Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations that do not provide this should throw an exception. Parameters: - offset - count
-
-
class
concrete.access.FetchCommunicationService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.access.FetchCommunicationService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_fetch
(seqid, iprot, oprot)¶
-
process_getCommunicationCount
(seqid, iprot, oprot)¶
-
process_getCommunicationIDs
(seqid, iprot, oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
fetch_args
(request=None)¶ Bases:
object
Attributes:- request-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
fetch_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
getCommunicationCount_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
getCommunicationCount_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.access.StoreCommunicationService module¶
-
class
concrete.access.StoreCommunicationService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.access.StoreCommunicationService.Iface
A service that exists so that clients can store Concrete data structures to implementing servers. Implement this if you are creating an analytic that wishes to store its results back to a server. That server may perform validation, write the new layers to a database, and so forth.
-
recv_store
()¶
-
send_store
(communication)¶
-
store
(communication)¶ - Store a communication to a server implementing this method. The communication that is stored should contain the new analytic layers you wish to append. You may also wish to call methods that unset annotations you feel the receiver would not find useful in order to reduce network overhead.
Parameters:
- communication
-
-
class
concrete.access.StoreCommunicationService.
Iface
¶ Bases:
concrete.services.Service.Iface
A service that exists so that clients can store Concrete data structures to implementing servers. Implement this if you are creating an analytic that wishes to store its results back to a server. That server may perform validation, write the new layers to a database, and so forth.
-
store
(communication)¶ - Store a communication to a server implementing this method. The communication that is stored should contain the new analytic layers you wish to append. You may also wish to call methods that unset annotations you feel the receiver would not find useful in order to reduce network overhead.
Parameters:
- communication
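One way to unset annotations before calling store is sketched below. It assumes that fields set to None are omitted when a Thrift-generated object is serialized, which is the usual behaviour of generated Python code but is not stated in this reference. The helper without_fields and the SimpleNamespace stand-in are hypothetical; a real Communication object would be used in practice.

```python
import copy
from types import SimpleNamespace


def without_fields(communication, field_names):
    """Return a shallow copy of communication with the named fields set to None."""
    trimmed = copy.copy(communication)
    for name in field_names:
        setattr(trimmed, name, None)
    return trimmed


# Hypothetical stand-in carrying a few of the documented Communication fields.
comm = SimpleNamespace(
    id="doc-1",
    text="Hello.",
    originalText="Bonjour.",
    entitySetList=["new layer"],
)
to_store = without_fields(comm, ["originalText"])
```

Working on a copy leaves the caller's object untouched, so the full communication remains available locally after the trimmed one has been sent.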
-
-
class
concrete.access.StoreCommunicationService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.access.StoreCommunicationService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_store
(seqid, iprot, oprot)¶
-
concrete.annotate package¶
concrete.annotate.AnnotateCommunicationService module¶
-
class
concrete.annotate.AnnotateCommunicationService.
Client
(iprot, oprot=None)¶ Bases:
concrete.annotate.AnnotateCommunicationService.Iface
Annotator service methods. For Concrete analytics that are to be stood up as independent services, accessible from any programming language.
-
annotate
(original)¶ - Main annotation method. Takes a communication as input and returns a new one as output. It is up to the implementing service to verify that the input communication is valid. Can throw a ConcreteThriftException upon error (invalid input, analytic exception, etc.).
Parameters:
- original
-
getDocumentation
()¶ - Return a detailed description of what the particular tool does, what inputs and outputs to expect, etc. Developers who are not familiar with the particular analytic should be able to read this string and understand the essential functions of the analytic.
-
getMetadata
()¶ - Return the tool’s AnnotationMetadata.
-
recv_annotate
()¶
-
recv_getDocumentation
()¶
-
recv_getMetadata
()¶
-
send_annotate
(original)¶
-
send_getDocumentation
()¶
-
send_getMetadata
()¶
-
send_shutdown
()¶
-
shutdown
()¶ - Indicate to the server it should shut down.
-
-
class
concrete.annotate.AnnotateCommunicationService.
Iface
¶ Bases:
object
Annotator service methods. For Concrete analytics that are to be stood up as independent services, accessible from any programming language.
-
annotate
(original)¶ - Main annotation method. Takes a communication as input and returns a new one as output. It is up to the implementing service to verify that the input communication is valid. Can throw a ConcreteThriftException upon error (invalid input, analytic exception, etc.).
Parameters:
- original
-
getDocumentation
()¶ - Return a detailed description of what the particular tool does, what inputs and outputs to expect, etc. Developers who are not familiar with the particular analytic should be able to read this string and understand the essential functions of the analytic.
-
getMetadata
()¶ - Return the tool’s AnnotationMetadata.
-
shutdown
()¶ - Indicate to the server it should shut down.
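The four Iface methods can be pictured with a toy handler such as the one below. This is only a sketch: LengthTagger is a hypothetical class, SimpleNamespace objects stand in for Communication, CommunicationTagging and AnnotationMetadata, and ValueError stands in for ConcreteThriftException. A real handler would typically be passed to the Processor documented in this module.

```python
import time
from types import SimpleNamespace


class LengthTagger:
    """Toy handler exposing the four methods documented on Iface."""

    def __init__(self):
        self.running = True

    def annotate(self, original):
        # Validate the input, then return a new object rather than mutating it.
        if original.text is None:
            raise ValueError("input communication has no text")
        tag = "long" if len(original.text) > 20 else "short"
        tagging = SimpleNamespace(taggingType="length", tagList=[tag], confidenceList=[1.0])
        return SimpleNamespace(id=original.id, text=original.text,
                               communicationTaggingList=[tagging])

    def getMetadata(self):
        # Stand-in for an AnnotationMetadata with tool and timestamp set.
        return SimpleNamespace(tool="length-tagger", timestamp=int(time.time()))

    def getDocumentation(self):
        return "Tags each communication as 'short' or 'long' based on its text length."

    def shutdown(self):
        self.running = False


handler = LengthTagger()
result = handler.annotate(SimpleNamespace(id="doc-1", text="Hi"))
handler.shutdown()
```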
-
-
class
concrete.annotate.AnnotateCommunicationService.
Processor
(handler)¶ Bases:
concrete.annotate.AnnotateCommunicationService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_annotate
(seqid, iprot, oprot)¶
-
process_getDocumentation
(seqid, iprot, oprot)¶
-
process_getMetadata
(seqid, iprot, oprot)¶
-
process_shutdown
(seqid, iprot, oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
annotate_args
(original=None)¶ Bases:
object
Attributes:
- original
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
annotate_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
getDocumentation_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
getDocumentation_result
(success=None)¶ Bases:
object
Attributes:
- success
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
getMetadata_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.annotate.AnnotateWithContextService module¶
-
class
concrete.annotate.AnnotateWithContextService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.annotate.AnnotateWithContextService.Iface
A service that provides an alternative to Annotate, with the ability to pass along an additional Context parameter that conveys additional information about the Communication.
-
annotate
(original, context)¶ - Takes a Communication and a Context as input and returns a new Communication as output. It is up to the implementing service to verify that the input communication is valid, as well as to interpret the Context in an appropriate manner. Can throw a ConcreteThriftException upon error (invalid input, analytic exception, etc.).
Parameters:
- original
- context
-
recv_annotate
()¶
-
send_annotate
(original, context)¶
-
-
class
concrete.annotate.AnnotateWithContextService.
Iface
¶ Bases:
concrete.services.Service.Iface
A service that provides an alternative to Annotate, with the ability to pass along an additional Context parameter that conveys additional information about the Communication.
-
annotate
(original, context)¶ - Takes a Communication and a Context as input and returns a new Communication as output. It is up to the implementing service to verify that the input communication is valid, as well as to interpret the Context in an appropriate manner. Can throw a ConcreteThriftException upon error (invalid input, analytic exception, etc.).
Parameters:
- original
- context
-
-
class
concrete.annotate.AnnotateWithContextService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.annotate.AnnotateWithContextService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_annotate
(seqid, iprot, oprot)¶
-
concrete.audio package¶
-
class
concrete.audio.ttypes.
Sound
(wav=None, mp3=None, sph=None, path=None)¶ Bases:
object
A sound wave. A separate optional field is defined for each supported format. Typically, a Sound object will only define a single field. Note: we may want to have separate fields for separate channels (left vs right), etc.
Attributes:
- wav
- mp3
- sph
- path: An absolute path to a file on disk where the sound file can be found. It is assumed that this path will be accessible from any machine that the system is run on (i.e., it should be a shared disk, or possibly a mirrored directory).
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.clustering package¶
-
class
concrete.clustering.ttypes.
Cluster
(clusterMemberIndexList=None, confidenceList=None, childIndexList=None)¶ Bases:
object
A set of items which are alike in some way. Has an implicit id which is the index of this Cluster in its parent Clustering’s ‘clusterList’.
Attributes:
- clusterMemberIndexList: The items in this cluster. Values are indices into the ‘clusterMemberList’ of the Clustering which contains this Cluster.
- confidenceList: Co-indexed with ‘clusterMemberIndexList’. The i-th value represents the confidence that mention clusterMemberIndexList[i] belongs to this cluster.
- childIndexList: A set of clusters (implicit ids/indices) from which this cluster was created. This cluster should represent the union of all the items in all of the child clusters. (For hierarchical clustering only.)
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.clustering.ttypes.
ClusterMember
(communicationId=None, setId=None, elementId=None)¶ Bases:
object
An item being clustered. Does not designate cluster membership, as in “item x belongs to cluster C”, but rather just the item (“x” in this example). Membership is indicated through Cluster objects. An item may be an Entity, EntityMention, Situation, SituationMention, or technically anything with a UUID.
Attributes:
- communicationId: UUID of the Communication which contains the item specified by ‘elementId’. This is ancillary info assuming UUIDs are indeed universally unique.
- setId: UUID of the Entity|Situation(Mention)Set which contains the item specified by ‘elementId’. This is ancillary info assuming UUIDs are indeed universally unique.
- elementId: UUID of the EntityMention, Entity, SituationMention, or Situation that this item represents. This is the characteristic field.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.clustering.ttypes.
Clustering
(uuid=None, metadata=None, clusterMemberList=None, clusterList=None, rootClusterIndexList=None)¶ Bases:
object
An (optionally) hierarchical clustering of items appearing across a set of Communications (intra-Communication clusterings are encoded by Entities and Situations). An item may be an Entity, EntityMention, Situation, SituationMention, or technically anything with a UUID.
Attributes:
- uuid: UUID for this Clustering object.
- metadata: Metadata for this Clustering object.
- clusterMemberList: The set of items being clustered.
- clusterList: Clusters of items. If this is a hierarchical clustering, this may contain clusters which are the set of smaller clusters. Clusters may not “overlap”, meaning (for all clusters X, Y): $X \cap Y \neq \emptyset \implies X \subset Y \vee Y \subset X$
- rootClusterIndexList: A set of disjoint clusters (indices in ‘clusterList’) which cover all items in ‘clusterMemberList’. This list must be specified for hierarchical clusterings and should not be specified for flat clusterings.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
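The no-overlap rule on ‘clusterList’ (two clusters that share an item must be nested) can be checked with a few lines of set arithmetic. The sketch below works on plain lists of member indices, as found in each Cluster’s ‘clusterMemberIndexList’; the function name is hypothetical, and identical clusters are treated as nested, which is an interpretation rather than something the reference states.

```python
from itertools import combinations


def clusters_are_nested_or_disjoint(member_index_lists):
    """Return True if every pair of clusters is disjoint or one contains the other."""
    sets = [set(members) for members in member_index_lists]
    for x, y in combinations(sets, 2):
        if x & y and not (x <= y or y <= x):
            return False
    return True


hierarchical = [[0, 1], [2], [0, 1, 2]]  # third cluster is the union of the first two
overlapping = [[0, 1], [1, 2]]           # share item 1, neither contains the other
```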
-
concrete.communication package¶
-
class
concrete.communication.ttypes.
Communication
(id=None, uuid=None, type=None, text=None, startTime=None, endTime=None, communicationTaggingList=None, metadata=None, keyValueMap=None, lidList=None, sectionList=None, entityMentionSetList=None, entitySetList=None, situationMentionSetList=None, situationSetList=None, originalText=None, sound=None, communicationMetadata=None)¶ Bases:
object
A single communication instance, containing linguistic content generated by a single speaker or author. This type is used for both inter-personal communications (such as phone calls or conversations) and third-party communications (such as news articles). Each communication instance is grounded by its original (unannotated) contents, which should be stored in either the “text” field (for text communications) or the “sound” field (for audio communications). If the communication is not available in its original form, then these fields should store the communication in the least-processed form available.
Attributes:
- id: Stable identifier for this communication, identifying both the name of the source corpus and the document that it corresponds to in that corpus.
- uuid: Universally unique identifier for this communication instance. This is generated randomly, and cannot be mapped back to the source corpus. It is used as a target for symbolic “pointers”.
- type: A short, corpus-specific term characterizing the nature of the communication; may change in a future version of Concrete. Often used for filtering. For example, Gigaword uses the type “story” to distinguish typical news articles from weekly summaries (“multi”), editorial advisories (“advis”), etc. At present, this value is typically a literal form from the originating corpus: as a result, a type marked ‘other’ may have different meanings across different corpora.
- text: The full text contents of this communication in its original form, or in the least-processed form available, if the original is not available.
- startTime: The time when this communication started (in Unix time UTC, i.e., seconds since January 1, 1970).
- endTime: The time when this communication ended (in Unix time UTC, i.e., seconds since January 1, 1970).
- communicationTaggingList: A list of CommunicationTagging objects that can support this Communication. CommunicationTagging objects can be used to annotate Communications with topics, gender identification, etc.
- metadata: metadata.AnnotationMetadata to support this particular communication. Communications derived from other communications should indicate in this metadata object their dependency on the original communication ID.
- keyValueMap: A catch-all store of keys and values. Use sparingly!
- lidList: Theories about the languages that are present in this communication.
- sectionList: Theory about the block structure of this communication.
- entityMentionSetList: Theories about which spans of text are used to mention entities in this communication.
- entitySetList: Theories about what entities are discussed in this communication, with pointers to individual mentions.
- situationMentionSetList: Theories about what situations are explicitly mentioned in this communication.
- situationSetList: Theories about what situations are asserted in this communication.
- originalText: Optional original text field that points back to an original communication. This field can be populated for the sake of convenience when creating “perspective” communications (communications that are based on highly destructive changes to an original communication [e.g., via MT]). This allows developers to quickly access the original text that this perspective communication is based on.
- sound: The full audio contents of this communication in its original form, or in the least-processed form available, if the original is not available.
- communicationMetadata: Metadata about this specific Communication, such as information about its author, information specific to this Communication or Communications like it (info from an API, for example), etc.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
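Since ‘startTime’ and ‘endTime’ are expressed in seconds since January 1, 1970 UTC, values can be produced and read back with the standard library alone. The helper names below are hypothetical and only illustrate the conversion.

```python
from datetime import datetime, timezone


def to_unix_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since 1970-01-01 UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def from_unix_seconds(seconds):
    """Convert seconds since 1970-01-01 UTC back to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


start_time = to_unix_seconds(datetime(2019, 1, 1, tzinfo=timezone.utc))
round_trip = from_unix_seconds(start_time)
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC.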
-
-
class
concrete.communication.ttypes.
CommunicationSet
(communicationIdList=None, corpus=None, entityMentionClusterList=None, entityClusterList=None, situationMentionClusterList=None, situationClusterList=None)¶ Bases:
object
A structure that represents a collection of Communications.
Attributes:
- communicationIdList: A list of Communication UUIDs that this CommunicationSet represents. This field may be absent if this CommunicationSet represents a large corpus. If absent, ‘corpus’ field should be present.
- corpus: The name of a corpus or other document body that this CommunicationSet represents. Should be present if ‘communicationIdList’ is absent.
- entityMentionClusterList: A list of Clustering objects that represent a group of EntityMentions that are a part of this CommunicationSet.
- entityClusterList: A list of Clustering objects that represent a group of Entities that are a part of this CommunicationSet.
- situationMentionClusterList: A list of Clustering objects that represent a group of SituationMentions that are a part of this CommunicationSet.
- situationClusterList: A list of Clustering objects that represent a group of Situations that are a part of this CommunicationSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.communication.ttypes.
CommunicationTagging
(uuid=None, metadata=None, taggingType=None, tagList=None, confidenceList=None)¶ Bases:
object
A structure that represents a ‘tagging’ of a Communication. These might be labels or annotations on a particular communication. For example, this structure might be used to describe the topics discussed in a Communication. The taggingType might be ‘topic’, and the tagList might include ‘politics’ and ‘science’.
Attributes:
- uuid: A unique identifier for this CommunicationTagging object.
- metadata: AnnotationMetadata to support this CommunicationTagging object.
- taggingType: A string that captures the type of this CommunicationTagging object. For example: ‘topic’ or ‘gender’.
- tagList: A list of strings that represent different tags related to the taggingType. For example, if the taggingType is ‘topic’, some example tags might be ‘politics’, ‘science’, etc.
- confidenceList: A list of doubles, parallel to the list of strings in tagList, that indicate the confidences of each tag.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
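Because ‘confidenceList’ runs parallel to ‘tagList’, the two are usually read together. A minimal sketch, using plain Python lists in place of a real CommunicationTagging object and a hypothetical helper name:

```python
def tags_with_confidence(tag_list, confidence_list):
    """Pair each tag with its confidence and order the pairs by decreasing confidence."""
    if len(tag_list) != len(confidence_list):
        raise ValueError("tagList and confidenceList must have the same length")
    return sorted(zip(tag_list, confidence_list), key=lambda pair: pair[1], reverse=True)


ranked = tags_with_confidence(["politics", "science"], [0.4, 0.9])
```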
-
concrete.context package¶
-
class
concrete.context.ttypes.
Context
(contents=None)¶ Bases:
object
A structure intended to convey context about a particular Concrete communication. Contexts are intended to be used to convey meta-communication information to analytics via an RPC method. It is expected that services consuming or producing Contexts are coupled, delivering an agreed upon format that is capable of being interpreted and processed between two particular services. Currently, it is being used to transmit hypotheses alongside Concrete communications for AIDA.
Attributes:
- contents: The contents of the Context. Services should agree upon what the expected format of the contents is (e.g. JSON, RDF) between themselves.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
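If two services agree on JSON as the format of ‘contents’, the encoding and decoding steps might look like the sketch below. The function names and the payload keys (hypothesis_id, weight) are invented for illustration; the actual schema is whatever the cooperating services agree on.

```python
import json


def encode_context(payload):
    """Serialize a dictionary to the string that would be placed in Context.contents."""
    return json.dumps(payload, sort_keys=True)


def decode_context(contents):
    """Parse the string found in Context.contents back into a dictionary."""
    return json.loads(contents)


contents = encode_context({"hypothesis_id": "h-001", "weight": 0.75})
decoded = decode_context(contents)
```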
-
concrete.email package¶
-
class
concrete.email.ttypes.
EmailAddress
(address=None, displayName=None)¶ Bases:
object
An email address, optionally accompanied by a display name. These values are typically extracted from strings such as: “John Smith” <john@xyz.com>. See RFC 2822: http://tools.ietf.org/html/rfc2822
Attributes:
- address
- displayName
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
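The two values can be extracted from a header-style string with the standard library’s email.utils.parseaddr. This is one possible approach, not necessarily how Concrete ingesters do it.

```python
from email.utils import parseaddr

# parseaddr returns a (display name, address) pair.
display_name, address = parseaddr('"John Smith" <john@xyz.com>')
```

The resulting strings correspond to the ‘displayName’ and ‘address’ fields.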
-
-
class
concrete.email.ttypes.
EmailCommunicationInfo
(messageId=None, contentType=None, userAgent=None, inReplyToList=None, referenceList=None, senderAddress=None, returnPathAddress=None, toAddressList=None, ccAddressList=None, bccAddressList=None, emailFolder=None, subject=None, quotedAddresses=None, attachmentPaths=None, salutation=None, signature=None)¶ Bases:
object
Extra information about an email communication instance.
Attributes:
- messageId
- contentType
- userAgent
- inReplyToList
- referenceList
- senderAddress
- returnPathAddress
- toAddressList
- ccAddressList
- bccAddressList
- emailFolder
- subject
- quotedAddresses
- attachmentPaths
- salutation
- signature
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.entities package¶
-
class
concrete.entities.ttypes.
Entity
(uuid=None, id=None, mentionIdList=None, rawMentionList=None, type=None, confidence=None, canonicalName=None)¶ Bases:
object
A single referent (or “entity”) that is referred to at least once in a given communication, along with pointers to all of the references to that referent. The referent’s type (e.g., is it a person, or a location, or an organization, etc.) is also recorded. Because each Entity contains pointers to all references to a referent within a given communication, an Entity can be thought of as a coreference set.
Attributes:
- uuid: Unique identifier for this entity.
- id: A corpus-specific and stable id such as a Freebase mid or a DBpedia id.
- mentionIdList: A list of pointers to all of the mentions of this Entity’s referent. (type=EntityMention)
- rawMentionList: A list of pointers to all of the sentences which contain a mention of this Entity.
- type: The basic type of this entity’s referent.
- confidence: Confidence score for this individual entity. You can also set a confidence score for an entire EntitySet using the EntitySet’s metadata.
- canonicalName: A string containing a representative, canonical, or “best” name for this entity’s referent. This string may match one of the mentions’ text strings, but it is not required to.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.entities.ttypes.
EntityMention
(uuid=None, tokens=None, entityType=None, phraseType=None, confidence=None, text=None, childMentionIdList=None)¶ Bases:
object
A span of text with a specific referent, such as a person, organization, or time. Things that can be referred to by a mention are called “entities.” It is left up to individual EntityMention taggers to decide which referent types and phrase types to identify. For example, some EntityMention taggers may only identify proper nouns, or may only identify EntityMentions that refer to people. Each EntityMention consists of a sequence of tokens. This sequence is usually annotated with information about the referent type (e.g., is it a person, or a location, or an organization, etc.) as well as the phrase type (is it a name, pronoun, common noun, etc.). EntityMentions typically consist of a single noun phrase; however, other phrase types may also be marked as mentions. For example, in the phrase “French hotel,” the adjective “French” might be marked as a mention for France.
Attributes:
- uuid
- tokens: Pointer to sequence of tokens. Special note: In the case of PRO-drop, where there is no explicit mention, but an EntityMention is needed for downstream Entity analysis, this field should be set to a TokenRefSequence with an empty tokenIndexList and the anchorTokenIndex set to the head/only token of the verb/predicate from which the PRO was dropped.
- entityType: The type of referent that is referred to by this mention.
- phraseType: The phrase type of the tokens that constitute this mention.
- confidence: A confidence score for this individual mention. You can also set a confidence score for an entire EntityMentionSet using the EntityMentionSet’s metadata.
- text: The text content of this entity mention. This field is typically redundant with the string formed by cross-referencing the ‘tokens.tokenIndexList’ field with this mention’s tokenization. This field may not be generated by all analytics.
- childMentionIdList: A list of pointers to the “child” EntityMentions of this EntityMention.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.entities.ttypes.
EntityMentionSet
(uuid=None, metadata=None, mentionList=None, linkingList=None)¶ Bases:
object
A theory about the set of entity mentions that are present in a message. See also: EntityMention. This type does not represent a coreference relationship, which is handled by Entity. This type is meant to represent the output of an entity-mention identifier, which is often a part of an in-doc coreference system.
Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- mentionList: List of mentions in this set.
- linkingList: Entity linking annotations associated with this EntityMentionSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.entities.ttypes.
EntitySet
(uuid=None, metadata=None, entityList=None, linkingList=None, mentionSetId=None)¶ Bases:
object
A theory about the set of entities that are present in a message. See also: Entity.
Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- entityList: List of entities in this set.
- linkingList: Entity linking annotations associated with this EntitySet.
- mentionSetId: An optional UUID pointer to an EntityMentionSet. If this field is present, consumers can assume that all Entity objects in this EntitySet have EntityMentions that are included in the named EntityMentionSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
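The pointers in an Entity’s ‘mentionIdList’ are resolved by looking the UUIDs up in an EntityMentionSet. The sketch below shows the idea with hypothetical stand-in objects and plain strings in place of UUID objects; with real data the lookup key would likely need to be the string form of each UUID, since generated Thrift objects may not be hashable.

```python
from types import SimpleNamespace


def mentions_of(entity, mention_set):
    """Resolve an entity's mentionIdList against the mentions of one mention set."""
    by_uuid = {mention.uuid: mention for mention in mention_set.mentionList}
    return [by_uuid[mention_id] for mention_id in entity.mentionIdList]


# Hypothetical stand-ins mirroring the documented field names.
mention_set = SimpleNamespace(
    uuid="set-1",
    mentionList=[
        SimpleNamespace(uuid="m-1", text="Ada Lovelace"),
        SimpleNamespace(uuid="m-2", text="London"),
        SimpleNamespace(uuid="m-3", text="she"),
    ],
)
entity = SimpleNamespace(uuid="e-1", canonicalName="Ada Lovelace",
                         mentionIdList=["m-1", "m-3"])
mention_texts = [mention.text for mention in mentions_of(entity, mention_set)]
```

When an EntitySet carries a ‘mentionSetId’, that value indicates which EntityMentionSet to pass as the second argument.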
-
concrete.exceptions package¶
concrete.language package¶
-
class
concrete.language.ttypes.
LanguageIdentification
(uuid=None, metadata=None, languageToProbabilityMap=None)¶ Bases:
object
A theory about what languages are present in a given communication or piece of communication. Note that it is possible to have more than one language present in a given communication.
Attributes:
- uuid: Unique identifier for this language identification.
- metadata: Information about where this language identification came from.
- languageToProbabilityMap: A list mapping from a language to the probability that that language occurs in a given communication. Each language code should occur at most once in this list. The probabilities do not need to sum to one. For example, if a single communication is known to contain both English and French, then it would be appropriate to assign a probability of 1 to both languages. (Manually annotated LanguageProb objects should always have probabilities of either zero or one; machine-generated LanguageProbs may have intermediate probabilities.) Note: The string key should represent the ISO 639-3 three-letter code.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
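A ‘languageToProbabilityMap’ can be represented as a dictionary keyed by ISO 639-3 codes. The checker below is a hypothetical helper that flags malformed keys and out-of-range probabilities; it only tests the shape of a code (three lowercase letters), not whether the code is actually registered.

```python
def language_map_problems(language_to_probability):
    """Return a list of human-readable problems found in the map (empty if none)."""
    problems = []
    for code, probability in language_to_probability.items():
        if not (len(code) == 3 and code.isalpha() and code.islower()):
            problems.append("%r is not a three-letter lowercase code" % code)
        if not 0.0 <= probability <= 1.0:
            problems.append("probability for %r is outside [0, 1]" % code)
    return problems


# English and French both known to be present: probabilities need not sum to one.
bilingual = {"eng": 1.0, "fra": 1.0}
```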
-
concrete.learn package¶
concrete.learn.ActiveLearnerClientService module¶
-
class
concrete.learn.ActiveLearnerClientService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.learn.ActiveLearnerClientService.Iface
The active learner client implements a method to accept new sorts of the annotation units.
-
recv_submitSort
()¶
-
send_submitSort
(sessionId, unitIds)¶
-
submitSort
(sessionId, unitIds)¶ - Submit a new sort of communications to the broker.
Parameters:
- sessionId
- unitIds
-
-
class
concrete.learn.ActiveLearnerClientService.
Iface
¶ Bases:
concrete.services.Service.Iface
The active learner client implements a method to accept new sorts of the annotation units.
-
submitSort
(sessionId, unitIds)¶ - Submit a new sort of communications to the broker.
Parameters:
- sessionId
- unitIds
-
-
class
concrete.learn.ActiveLearnerClientService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.learn.ActiveLearnerClientService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_submitSort
(seqid, iprot, oprot)¶
-
concrete.learn.ActiveLearnerServerService module¶
-
class
concrete.learn.ActiveLearnerServerService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.learn.ActiveLearnerServerService.Iface
The active learning server is responsible for sorting a list of communications. Users annotate communications based on the sort. Active learning is an asynchronous process. It is started by the client calling start(). At arbitrary times, the client can call addAnnotations(). When the server is done with a sort of the data, it calls submitSort() on the client. The server can perform additional sorts until stop() is called. The server must be preconfigured with the details of the data source from which to pull communications.
-
addAnnotations
(sessionId, annotations)¶ - Add annotations from the user to the learning process.
Parameters:
- sessionId
- annotations
-
recv_addAnnotations
()¶
-
recv_start
()¶
-
recv_stop
()¶
-
send_addAnnotations
(sessionId, annotations)¶
-
send_start
(sessionId, task, contact)¶
-
send_stop
(sessionId)¶
-
start
(sessionId, task, contact)¶ - Start an active learning session on these communications.
Parameters:
- sessionId
- task
- contact
-
stop
(sessionId)¶ - Stop the learning session.
Parameters:
- sessionId
-
-
class
concrete.learn.ActiveLearnerServerService.
Iface
¶ Bases:
concrete.services.Service.Iface
The active learning server is responsible for sorting a list of communications. Users annotate communications based on the sort. Active learning is an asynchronous process. It is started by the client calling start(). At arbitrary times, the client can call addAnnotations(). When the server is done with a sort of the data, it calls submitSort() on the client. The server can perform additional sorts until stop() is called. The server must be preconfigured with the details of the data source from which to pull communications.
-
addAnnotations
(sessionId, annotations)¶ - Add annotations from the user to the learning process.
Parameters:
- sessionId
- annotations
-
start
(sessionId, task, contact)¶ - Start an active learning session on these communications.
Parameters:
- sessionId
- task
- contact
-
stop
(sessionId)¶ - Stop the learning session.
Parameters:
- sessionId
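The call sequence between the two active learning services (start, addAnnotations, submitSort, stop) can be simulated in memory as below. This is a simplified, synchronous sketch: both classes are hypothetical, start takes a plain list of unit IDs instead of the documented task and contact arguments, annotations are reduced to unit IDs, and the ranking rule is invented. Real deployments exchange these calls asynchronously over Thrift.

```python
class ToyLearnerClient:
    """Receives sorts from the server (the ActiveLearnerClientService role)."""

    def __init__(self):
        self.sorts = []

    def submitSort(self, sessionId, unitIds):
        self.sorts.append((sessionId, list(unitIds)))


class ToyLearnerServer:
    """Re-sorts units whenever annotations arrive (the ActiveLearnerServerService role)."""

    def __init__(self, client):
        self.client = client
        self.sessions = {}

    def start(self, sessionId, units):
        self.sessions[sessionId] = {"units": list(units), "annotated": set()}
        self._push_sort(sessionId)
        return True

    def addAnnotations(self, sessionId, annotated_unit_ids):
        self.sessions[sessionId]["annotated"].update(annotated_unit_ids)
        self._push_sort(sessionId)

    def stop(self, sessionId):
        del self.sessions[sessionId]

    def _push_sort(self, sessionId):
        session = self.sessions[sessionId]
        # Toy ranking: units that still lack annotations come first (stable sort).
        ranked = sorted(session["units"], key=lambda unit: unit in session["annotated"])
        self.client.submitSort(sessionId, ranked)


client = ToyLearnerClient()
server = ToyLearnerServer(client)
server.start("s1", ["u1", "u2", "u3"])
server.addAnnotations("s1", ["u1"])
server.stop("s1")
```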
-
-
class
concrete.learn.ActiveLearnerServerService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.learn.ActiveLearnerServerService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_addAnnotations
(seqid, iprot, oprot)¶
-
process_start
(seqid, iprot, oprot)¶
-
process_stop
(seqid, iprot, oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
addAnnotations_args
(sessionId=None, annotations=None)¶ Bases:
object
Attributes:
- sessionId
- annotations
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
addAnnotations_result
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
start_args
(sessionId=None, task=None, contact=None)¶ Bases:
object
Attributes:
- sessionId
- task
- contact
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
start_result
(success=None)¶ Bases:
object
Attributes:
- success
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ttypes.
Annotation
(id=None, communication=None)¶ Bases:
object
Annotation on a communication.
Attributes:
- id: Identifier of the part of the communication being annotated.
- communication: Communication with the annotation stored in it. The location of the annotation depends on the annotation unit identifier.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ttypes.
AnnotationTask
(type=None, language=None, unitType=None, units=None)¶ Bases:
object
Annotation task including information for pulling data.
Attributes:
- type: Type of annotation task.
- language: Language of the data for the task.
- unitType: Entire communication or individual sentences.
- units: Identifiers for each annotation unit.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.linking package¶
-
class
concrete.linking.ttypes.
Link
(sourceId=None, linkTargetList=None)¶ Bases:
object
A structure that represents the origin of an entity linking annotation.
Attributes:
- sourceId: The “root” of this Link; points to an EntityMention UUID, Entity UUID, etc.
- linkTargetList: A list of LinkTarget objects that this Link contains.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.linking.ttypes.
LinkTarget
(confidence=None, targetId=None, dbId=None, dbName=None)¶ Bases:
object
A structure that represents the target of an entity linking annotation.
Attributes:
- confidence: Confidence of this LinkTarget object.
- targetId: A UUID that represents the target of this LinkTarget. This UUID should exist in the Entity/Situation(Mention)Set that the Linking object is contained in.
- dbId: A database ID that represents the target of this linking. This should be used if the target of the linking is not associated with an Entity|Situation(Mention)Set in Concrete, and therefore cannot be linked by a UUID internal to Concrete. If present, the other optional field ‘dbName’ should also be populated.
- dbName: The name of the database that represents the target of this linking. Together with the ‘dbId’, this can form a pointer to a target that is not represented inside Concrete. Should be populated alongside ‘dbId’.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
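A LinkTarget points either inside Concrete (via ‘targetId’) or to an external database (via ‘dbId’ together with ‘dbName’). The hypothetical helper below makes that either/or explicit using plain values; giving ‘targetId’ precedence when both are present is a choice made for this sketch, not a documented rule.

```python
def describe_link_target(target_id=None, db_id=None, db_name=None):
    """Classify a link target as internal ('uuid') or external ('external')."""
    if target_id is not None:
        return ("uuid", target_id)
    if db_id is not None and db_name is not None:
        return ("external", "%s:%s" % (db_name, db_id))
    raise ValueError("need targetId, or both dbId and dbName")


internal = describe_link_target(target_id="e-1")
external = describe_link_target(db_id="Q42", db_name="wikidata")
```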
-
-
class
concrete.linking.ttypes.
Linking
(metadata=None, linkList=None)¶ Bases:
object
A structure that represents entity linking annotations.
Attributes:
- metadata: Metadata related to this Linking object.
- linkList: A list of Link objects that this Linking object contains.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.metadata package¶
-
class
concrete.metadata.ttypes.
AnnotationMetadata
(tool=None, timestamp=None, digest=None, dependencies=None, kBest=1)¶ Bases:
object
Metadata associated with an annotation or a set of annotations, that identifies where those annotations came from.
Attributes:
- tool: The name of the tool that generated this annotation.
- timestamp: The time at which this annotation was generated (in Unix time UTC, i.e., seconds since January 1, 1970).
- digest: A Digest, carrying over any information the annotation metadata wishes to carry over.
- dependencies: The theories that supported this annotation. An empty field indicates that the theory has no dependencies (e.g., an ingester).
- kBest: An integer that represents a ranking for systems that output k-best lists. For systems that do not output k-best lists, the default value (1) should suffice.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
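As a minimal sketch of the timestamp and kBest conventions described for AnnotationMetadata, the snippet below builds keyword arguments named after the documented constructor parameters. `annotation_metadata_kwargs` and the tool name are hypothetical; nothing here imports concrete-python.

```python
import time


def annotation_metadata_kwargs(tool, k_best=1):
    """Build keyword arguments matching the AnnotationMetadata constructor."""
    return {
        "tool": tool,
        # Unix time in UTC: whole seconds since January 1, 1970.
        "timestamp": int(time.time()),
        # 1 is the documented default for systems without k-best output.
        "kBest": k_best,
    }


kwargs = annotation_metadata_kwargs("example-tokenizer")
```

With concrete-python installed, these could presumably be passed on as `AnnotationMetadata(**kwargs)`.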
-
-
class
concrete.metadata.ttypes.
CommunicationMetadata
(tweetInfo=None, emailInfo=None, nitfInfo=None)¶ Bases:
object
Metadata specific to a particular Communication object. This might include corpus-specific metadata (from the Twitter API), attributes associated with the Communication (the author), or other information about the Communication.
Attributes:
- tweetInfo: Extra information for communications where kind==TWEET: Information about this tweet that is provided by the Twitter API. For information about the Twitter API, see:
- emailInfo: Extra information for communications where kind==EMAIL
- nitfInfo: Extra information that may come from the NITF (News Industry Text Format) schema. See ‘nitf.thrift’.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.metadata.ttypes.
Digest
(bytesValue=None, int64Value=None, doubleValue=None, stringValue=None, int64List=None, doubleList=None, stringList=None)¶ Bases:
object
Analytic-specific information about an attribute or edge. Digests are used to combine information from multiple sources to generate a unified value. The digests generated by an analytic will only ever be used by that same analytic, so analytics can feel free to encode information in whatever way is convenient.
Attributes:
- bytesValue: The following fields define various ways you can store the digest data (for convenience). If none of these meets your needs, then serialize the digest to a byte sequence and store it in bytesValue.
- int64Value
- doubleValue
- stringValue
- int64List
- doubleList
- stringList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
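The bytesValue description suggests serializing a digest to a byte sequence when none of the typed fields fit. One possible encoding, sketched here with JSON from the standard library, is shown below; the choice of JSON, the helper names, and the sample payload are all assumptions rather than anything the schema prescribes.

```python
import json


def encode_digest(payload):
    """Serialize a JSON-compatible payload to bytes suitable for bytesValue."""
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_digest(raw):
    """Inverse of encode_digest."""
    return json.loads(raw.decode("utf-8"))


raw = encode_digest({"votes": [3, 1], "source": "example-analytic"})
```

Because a digest is only read back by the analytic that wrote it, any reversible encoding would do equally well.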
-
-
class
concrete.metadata.ttypes.
TheoryDependencies
(sectionTheoryList=None, sentenceTheoryList=None, tokenizationTheoryList=None, posTagTheoryList=None, nerTagTheoryList=None, lemmaTheoryList=None, langIdTheoryList=None, parseTheoryList=None, dependencyParseTheoryList=None, tokenAnnotationTheoryList=None, entityMentionSetTheoryList=None, entitySetTheoryList=None, situationMentionSetTheoryList=None, situationSetTheoryList=None, communicationsList=None)¶ Bases:
object
A struct that holds UUIDs for all theories that a particular annotation was based upon (and presumably requires).
Producers of TheoryDependencies should list all stages that they used in constructing their particular annotation. They do not, however, need to explicitly label each stage; they can label only the immediate stage before them.
Examples:
If you are producing a Tokenization, and only used the SentenceSegmentation in order to produce that Tokenization, list only the single SentenceSegmentation UUID in sentenceTheoryList. In this example, even though the SentenceSegmentation will have a dependency on some SectionSegmentation, it is not necessary for the Tokenization to list the SectionSegmentation UUID as a dependency.
If you are a producer of EntityMentions, and you use two POSTokenTagging and one NERTokenTagging objects, add the UUIDs for the POSTokenTagging objects to posTagTheoryList, and the UUID of the NER TokenTagging to the nerTagTheoryList. In this example, because multiple annotations influenced the new annotation, they should all be listed as dependencies.
Attributes:
- sectionTheoryList
- sentenceTheoryList
- tokenizationTheoryList
- posTagTheoryList
- nerTagTheoryList
- lemmaTheoryList
- langIdTheoryList
- parseTheoryList
- dependencyParseTheoryList
- tokenAnnotationTheoryList
- entityMentionSetTheoryList
- entitySetTheoryList
- situationMentionSetTheoryList
- situationSetTheoryList
- communicationsList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
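The second TheoryDependencies example above (an entity mention producer that used two POS taggings and one NER tagging) can be sketched as follows. The UUID strings generated here are stand-ins for Concrete UUID objects, and `new_id` is a hypothetical helper; only the dictionary keys come from the documented constructor parameters.

```python
import uuid


def new_id():
    """Return a fresh UUID string (stand-in for a Concrete UUID object)."""
    return str(uuid.uuid4())


pos_tagging_ids = [new_id(), new_id()]
ner_tagging_id = new_id()

# Keyword arguments named after the TheoryDependencies constructor parameters.
# Only the immediate inputs are listed; upstream stages are left out.
dependency_kwargs = {
    "posTagTheoryList": pos_tagging_ids,
    "nerTagTheoryList": [ner_tagging_id],
}
```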
concrete.nitf package¶
-
class
concrete.nitf.ttypes.
NITFInfo
(alternateURL=None, articleAbstract=None, authorBiography=None, banner=None, biographicalCategoryList=None, columnName=None, columnNumber=None, correctionDate=None, correctionText=None, credit=None, dayOfWeek=None, descriptorList=None, featurePage=None, generalOnlineDescriptorList=None, guid=None, kicker=None, leadParagraphList=None, locationList=None, nameList=None, newsDesk=None, normalizedByline=None, onlineDescriptorList=None, onlineHeadline=None, onlineLeadParagraph=None, onlineLocationList=None, onlineOrganizationList=None, onlinePeople=None, onlineSectionList=None, onlineTitleList=None, organizationList=None, page=None, peopleList=None, publicationDate=None, publicationDayOfMonth=None, publicationMonth=None, publicationYear=None, section=None, seriesName=None, slug=None, taxonomicClassifierList=None, titleList=None, typesOfMaterialList=None, url=None, wordCount=None)¶ Bases:
object
Attributes:
- alternateURL: This field specifies the URL of the article, if published online. In some cases, such as with the New York Times, when this field is present, it is preferred to the URL field on articles published on or after April 02, 2006, as the linked page will have richer content.
- articleAbstract: This field is a summary of the article, possibly written by an indexing service.
- authorBiography: This field specifies the biography of the author of the article. Generally, this field is specified for guest authors, and not for regular reporters, except to provide the author’s email address.
- banner: The banner field is used to indicate if there has been additional information appended to the article since its publication. Examples of banners include ‘Correction Appended’ and ‘Editor’s Note Appended’.
- biographicalCategoryList: When present, the biographical category field generally indicates that a document focuses on a particular individual. The value of the field indicates the area or category in which this individual is best known. This field is most often defined for Obituaries and Book Reviews. Examples include: Politics and Government (U.S.); Books and Magazines; Royalty.
- columnName: If the article is part of a regular column, this field specifies the name of that column. Sample column names: World News Briefs; WEDDINGS; The Accessories Channel.
- columnNumber: This field specifies the column in which the article starts in the print paper. A typical printed page in the paper has six columns numbered from right to left. As a consequence most, but not all, of the values for this field fall in the range 1-6.
- correctionDate: This field specifies the date on which a correction was made to the article. Generally, if the correction date is specified, the correction text will also be specified (and vice versa).
- correctionText: For articles corrected following publication, this field specifies the correction. Generally, if the correction text is specified, the correction date will also be specified (and vice versa).
- credit: This field indicates the entity that produced the editorial content of this document.
- dayOfWeek: This field specifies the day of week on which the article was published. Values: Monday; Tuesday; Wednesday; Thursday; Friday; Saturday; Sunday.
- descriptorList: The "descriptors" field specifies a list of descriptive terms drawn from a normalized controlled vocabulary corresponding to subjects mentioned in the article. Examples include: ECONOMIC CONDITIONS AND TRENDS; AIRPLANES; VIOLINS.
- featurePage: The feature page containing this article, such as Education Page or Fashion Page.
- generalOnlineDescriptorList: The "general online descriptors" field specifies a list of descriptors that are at a higher level of generality than the other tags associated with the article. Examples include: Surfing; Venice Biennale; Ranches.
- guid: The GUID field specifies an integer that is guaranteed to be unique for every document in the corpus.
- kicker: The kicker is an additional piece of information printed as an accompaniment to a news headline.
- leadParagraphList: The "lead Paragraph" field is the lead paragraph of the article. Generally this field is populated with the first two paragraphs from the article.
- locationList: The "locations" field specifies a list of geographic descriptors drawn from a normalized controlled vocabulary that correspond to places mentioned in the article. Examples include: Wellsboro (Pa); Kansas City (Kan); Park Slope (NYC).
- nameList: The "names" field specifies a list of names mentioned in the article. Examples include: Azza Fahmy; George C. Izenour; Chris Schenkel.
- newsDesk: This field specifies the desk in the newsroom that produced the article. The desk is related to, but is not the same as, the section in which the article appears.
- normalizedByline: The Normalized Byline field is the byline normalized to the form (last name, first name).
- onlineDescriptorList: This field specifies a list of descriptors from a normalized controlled vocabulary that correspond to topics mentioned in the article. Examples include: Marriages; Parks and Other Recreation Areas; Cooking and Cookbooks.
- onlineHeadline: This field specifies the headline displayed with the article online. Often this differs from the headline used in print.
- onlineLeadParagraph: This field specifies the lead paragraph for the online version.
- onlineLocationList: This field specifies a list of place names that correspond to geographic locations mentioned in the article. Examples include: Hollywood; Los Angeles; Arcadia.
- onlineOrganizationList: This field specifies a list of organizations that correspond to organizations mentioned in the article. Examples include: Nintendo Company Limited; Yeshiva University; Rose Center.
- onlinePeople: This field specifies a list of people that correspond to individuals mentioned in the article. Examples include: Lopez, Jennifer; Joyce, James; Robinson, Jackie.
- onlineSectionList: This field specifies the section(s) in which the online version of the article is placed. This may typically be populated from a semicolon (;) delimited list.
- onlineTitleList: This field specifies a list of authored works mentioned in the article. Examples include: Matchstick Men (Movie); Blades of Glory (Movie); Bridge and Tunnel (Play).
- organizationList: This field specifies a list of organization names drawn from a normalized controlled vocabulary that correspond to organizations mentioned in the article. Examples include: Circuit City Stores Inc; Delaware County Community College (Pa); CONNECTICUT GRAND OPERA.
- page: This field specifies the page of the section in the paper in which the article appears. This is not an absolute pagination. An article that appears on page 3 in section A occurs in the physical paper before an article that occurs on page 1 of section F. The section is encoded in the section field.
- peopleList: This field specifies a list of people from a normalized controlled vocabulary that correspond to individuals mentioned in the article. Examples include: REAGAN, RONALD WILSON (PRES); BEGIN, MENACHEM (PRIME MIN); COLLINS, GLENN.
- publicationDate: This field specifies the date of the article’s publication.
- publicationDayOfMonth: This field specifies the day of the month on which the article was published, always in the range 1-31.
- publicationMonth: This field specifies the month in which the article was published, in the range 1-12, where 1 is January, 2 is February, etc.
- publicationYear: This field specifies the year in which the article was published. This value is in the range 1987-2007 for this collection.
- section: This field specifies the section of the paper in which the article appears. This is not the name of the section, but rather a letter or number that indicates the section.
- seriesName: If the article is part of a regular series, this field specifies the name of that series.
- slug: The slug is a short string that uniquely identifies an article from all other articles published on the same day. Please note, however, that different articles on different days may have the same slug. Examples include: 30other; 12reunion.
- taxonomicClassifierList: This field specifies a list of taxonomic classifiers that place this article into a hierarchy of articles. The individual terms of each taxonomic classifier are separated with the ‘/’ character. Examples include: Top/Features/Travel/Guides/Destinations/North America/United States/Arizona; Top/News/U.S./Rockies; Top/Opinion.
- titleList: This field specifies a list of authored works that correspond to works mentioned in the article. Examples include: Greystoke: The Legend of Tarzan, Lord of the Apes (Movie); Law and Order (TV Program); BATTLEFIELD EARTH (BOOK).
- typesOfMaterialList: This field specifies a normalized list of terms describing the general editorial category of the article. Examples include: REVIEW; OBITUARY; ANALYSIS.
- url: This field specifies the location of the online version of the article. The "Alternative Url" field is preferred to this field on articles published on or after April 02, 2006, as the linked page will have richer content.
- wordCount: This field specifies the number of words in the body of the article, including the lead paragraph.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
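Two of the NITFInfo fields above describe delimiter conventions: taxonomic classifier terms are separated with ‘/’, and the online section list may come from a semicolon-delimited string. A minimal sketch of splitting such values is shown below; the helper names and the sample section string are hypothetical, while the classifier string is one of the documented examples.

```python
def split_taxonomic_classifier(classifier):
    """Split a taxonomic classifier into its terms on '/'."""
    return classifier.split("/")


def split_sections(raw_sections):
    """Split a semicolon-delimited section string, dropping blank entries."""
    return [part.strip() for part in raw_sections.split(";") if part.strip()]


terms = split_taxonomic_classifier("Top/News/U.S./Rockies")
sections = split_sections("Arts; Books")
```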
concrete.search package¶
concrete.search.FeedbackService module¶
-
class
concrete.search.FeedbackService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.search.FeedbackService.Iface
-
addCommunicationFeedback
(searchResultsId, communicationId, feedback)¶ - Provide feedback on the relevance of a particular communication to a search.
Parameters:
- searchResultsId
- communicationId
- feedback
-
addSentenceFeedback
(searchResultsId, communicationId, sentenceId, feedback)¶ - Provide feedback on the relevance of a particular sentence to a search.
Parameters:
- searchResultsId
- communicationId
- sentenceId
- feedback
-
recv_addCommunicationFeedback
()¶
-
recv_addSentenceFeedback
()¶
-
recv_startFeedback
()¶
-
send_addCommunicationFeedback
(searchResultsId, communicationId, feedback)¶
-
send_addSentenceFeedback
(searchResultsId, communicationId, sentenceId, feedback)¶
-
send_startFeedback
(results)¶
-
startFeedback
(results)¶ - Start providing feedback for the specified SearchResults. This causes the search and its results to be persisted.
Parameters:
- results
-
-
class
concrete.search.FeedbackService.
Iface
¶ Bases:
concrete.services.Service.Iface
-
addCommunicationFeedback
(searchResultsId, communicationId, feedback)¶ - Provide feedback on the relevance of a particular communication to a search.
Parameters:
- searchResultsId
- communicationId
- feedback
-
addSentenceFeedback
(searchResultsId, communicationId, sentenceId, feedback)¶ - Provide feedback on the relevance of a particular sentence to a search.
Parameters:
- searchResultsId
- communicationId
- sentenceId
- feedback
-
startFeedback
(results)¶ - Start providing feedback for the specified SearchResults. This causes the search and its results to be persisted.
Parameters:
- results
-
-
class
concrete.search.FeedbackService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.search.FeedbackService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_addCommunicationFeedback
(seqid, iprot, oprot)¶
-
process_addSentenceFeedback
(seqid, iprot, oprot)¶
-
process_startFeedback
(seqid, iprot, oprot)¶
-
-
class
concrete.search.FeedbackService.
addCommunicationFeedback_args
(searchResultsId=None, communicationId=None, feedback=None)¶ Bases:
object
Attributes:
- searchResultsId
- communicationId
- feedback
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.FeedbackService.
addCommunicationFeedback_result
(ex=None)¶ Bases:
object
Attributes:
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.FeedbackService.
addSentenceFeedback_args
(searchResultsId=None, communicationId=None, sentenceId=None, feedback=None)¶ Bases:
object
Attributes:
- searchResultsId
- communicationId
- sentenceId
- feedback
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.FeedbackService.
addSentenceFeedback_result
(ex=None)¶ Bases:
object
Attributes:
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.search.SearchProxyService module¶
-
class
concrete.search.SearchProxyService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.search.SearchProxyService.Iface
The search proxy service provides a single interface to multiple search providers.
-
getCapabilities
(provider)¶ - Get a list of search type and language pairs for a search provider.
Parameters:
- provider
-
getCorpora
(provider)¶ - Get a corpus list for a search provider.
Parameters:
- provider
-
getProviders
()¶ - Get a list of search providers behind the proxy
-
recv_getCapabilities
()¶
-
recv_getCorpora
()¶
-
recv_getProviders
()¶
-
recv_search
()¶
-
search
(query, provider)¶ - Specify the search provider when performing a search.
Parameters:
- query
- provider
-
send_getCapabilities
(provider)¶
-
send_getCorpora
(provider)¶
-
send_getProviders
()¶
-
send_search
(query, provider)¶
-
-
class
concrete.search.SearchProxyService.
Iface
¶ Bases:
concrete.services.Service.Iface
The search proxy service provides a single interface to multiple search providers.
-
getCapabilities
(provider)¶ - Get a list of search type and language pairs for a search provider.
Parameters:
- provider
-
getCorpora
(provider)¶ - Get a corpus list for a search provider.
Parameters:
- provider
-
getProviders
()¶ - Get a list of search providers behind the proxy
-
search
(query, provider)¶ - Specify the search provider when performing a search.
Parameters:
- query
- provider
-
-
class
concrete.search.SearchProxyService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.search.SearchProxyService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_getCapabilities
(seqid, iprot, oprot)¶
-
process_getCorpora
(seqid, iprot, oprot)¶
-
process_getProviders
(seqid, iprot, oprot)¶
-
process_search
(seqid, iprot, oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCapabilities_args
(provider=None)¶ Bases:
object
Attributes:
- provider
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCapabilities_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCorpora_args
(provider=None)¶ Bases:
object
Attributes:
- provider
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCorpora_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getProviders_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getProviders_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.search.SearchService module¶
-
class
concrete.search.SearchService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.search.SearchService.Iface
-
getCapabilities
()¶ - Get a list of search type-language pairs
-
getCorpora
()¶ - Get a corpus list from the search provider
-
recv_getCapabilities
()¶
-
recv_getCorpora
()¶
-
recv_search
()¶
-
search
(query)¶ - Perform a search specified by the query.
Parameters:
- query
-
send_getCapabilities
()¶
-
send_getCorpora
()¶
-
send_search
(query)¶
-
-
class
concrete.search.SearchService.
Iface
¶ Bases:
concrete.services.Service.Iface
-
getCapabilities
()¶ - Get a list of search type-language pairs
-
getCorpora
()¶ - Get a corpus list from the search provider
-
search
(query)¶ - Perform a search specified by the query.
Parameters:
- query
-
-
class
concrete.search.SearchService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.search.SearchService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_getCapabilities
(seqid, iprot, oprot)¶
-
process_getCorpora
(seqid, iprot, oprot)¶
-
process_search
(seqid, iprot, oprot)¶
-
-
class
concrete.search.SearchService.
getCapabilities_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchService.
getCapabilities_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchService.
getCorpora_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchService.
getCorpora_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.ttypes.
SearchCapability
(type=None, lang=None)¶ Bases:
object
A search provider describes its capabilities with a list of search type and language pairs.
Attributes:
- type: A type of search supported by the search provider.
- lang: Language that the search provider supports. Use ISO 639-2/T three letter codes.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.ttypes.
SearchFeedback
¶ Bases:
object
Feedback values.
-
NEGATIVE
= -1¶
-
NONE
= 0¶
-
POSITIVE
= 1¶
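As a small illustration of how the three SearchFeedback values might be used, the sketch below maps a user's relevance judgement to a feedback value. The constants are copied from the listing above rather than imported, and `feedback_from_judgement` is a hypothetical helper.

```python
# Values copied from the SearchFeedback listing above.
NEGATIVE, NONE, POSITIVE = -1, 0, 1


def feedback_from_judgement(relevant):
    """Map True/False/None (no judgement given) to a feedback value."""
    if relevant is None:
        return NONE
    return POSITIVE if relevant else NEGATIVE
```

The resulting value is the kind of argument that `addCommunicationFeedback` and `addSentenceFeedback` take as `feedback`.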
-
-
class
concrete.search.ttypes.
SearchQuery
(terms=None, questions=None, communicationId=None, tokens=None, rawQuery=None, auths=None, userId=None, name=None, labels=None, type=None, lang=None, corpus=None, k=None, communication=None)¶ Bases:
object
Wrapper for information relevant to a (possibly structured) search.
Attributes:
- terms: Individual words, or multiword phrases, e.g., ‘dog’, ‘blue cheese’. It is the responsibility of the implementation of Search* to tokenize multiword phrases, if so desired. Further, an implementation may choose to support advanced features such as wildcards, e.g.: ‘blue*’. This specification makes no commitment as to the internal structure of keywords and their semantics: that is the responsibility of the individual implementation.
- questions: e.g., “what is the capital of spain?” questions is a list so that possibly different phrasings of the question can be included, e.g.: “what is the name of spain’s capital?”
- communicationId: Refers to an optional communication that can provide context for the query.
- tokens: Refers to a sequence of tokens in the communication referenced by communicationId.
- rawQuery: The input from the user provided in the search box, unmodified.
- auths: Optional authorization mechanism.
- userId: Identifies the user who submitted the search query.
- name: Human readable name of the query.
- labels: Properties of the query or user. These labels can be used to group queries and results by a domain or group of users for training. An example usage would be assigning the geographical region as a label (“spain”). User labels could be based on organizational units (“hltcoe”).
- type: This search is over this type of data (communications, sentences, entities).
- lang: The language of the corpus that the user wants to search. Use ISO 639-2/T three letter codes.
- corpus: An identifier of the corpus that the search is to be performed over.
- k: The maximum number of candidates the search service should return.
- communication: An optional communication used as context for the query. If both this field and communicationId are populated, then it is assumed the ID of the communication is the same as communicationId.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.ttypes.
SearchResult
(uuid=None, searchQuery=None, searchResultItems=None, metadata=None, lang=None)¶ Bases:
object
Single wrapper for results from all the various Search* services.
Attributes:
- uuid: Unique identifier for the results of this search.
- searchQuery: The query that led to this result. Useful for capturing feedback or building training data.
- searchResultItems: The list is assumed sorted best to worst, which should be reflected by the values contained in the score field of each SearchResultItem, if that field is populated.
- metadata: The system that provided the response: likely use case for populating this field is for building training data. Presumably a system will not need/want to return this object in live use.
- lang: The dominant language of the search results. Use ISO 639-2/T three letter codes. Search providers should set this when possible to support downstream processing. Do not set if it is not known. If multilingual, use the string “multilingual”.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.ttypes.
SearchResultItem
(communicationId=None, sentenceId=None, score=None, tokens=None, entity=None)¶ Bases:
object
An individual element returned from a search. Most/all methods will return a communicationId, possibly with an associated score. For example, if the target element type of the search is Sentence then the sentenceId field should be populated.
Attributes:
- communicationId
- sentenceId: The UUID of the returned sentence, which appears in the communication referenced by communicationId.
- score: Values are not restricted in range (e.g., do not have to be within [0,1]). Higher is better.
- tokens: If SearchType=ENTITY_MENTIONS then this field should be populated. Otherwise, this field may be optionally populated in order to provide a hint to the client as to where to center a visualization, or the extraction of context, etc.
- entity: If SearchType=ENTITIES then this field should be populated.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
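The conventions described for SearchQuery, SearchResult and SearchResultItem (terms as input, at most k candidates, items sorted best to worst with higher scores better) can be sketched with a toy in-memory handler. Everything below is illustrative: `InMemorySearchHandler` is hypothetical, `types.SimpleNamespace` objects stand in for the Concrete structs, and the term-count scoring is an arbitrary choice, not how any real search provider ranks results.

```python
from types import SimpleNamespace


class InMemorySearchHandler:
    """Toy handler exposing method names from SearchService.Iface."""

    def __init__(self, documents):
        # documents: mapping of communication ID to plain text
        self.documents = documents

    def alive(self):
        return True

    def search(self, query):
        scored = []
        for comm_id, text in self.documents.items():
            words = text.lower().split()
            # Score is the total number of occurrences of the query terms.
            score = sum(words.count(term.lower()) for term in query.terms)
            if score > 0:
                scored.append(
                    SimpleNamespace(communicationId=comm_id, score=float(score))
                )
        # Sort best to worst; higher scores are better.
        scored.sort(key=lambda item: item.score, reverse=True)
        # Return at most k items (all of them if k is None).
        return SimpleNamespace(searchQuery=query, searchResultItems=scored[: query.k])


handler = InMemorySearchHandler({
    "doc-1": "blue cheese and more blue cheese",
    "doc-2": "a dog ate the cheese",
    "doc-3": "nothing relevant here",
})
query = SimpleNamespace(terms=["cheese", "blue"], k=2)
result = handler.search(query)
```

A real handler would return `concrete.search.ttypes.SearchResult` and `SearchResultItem` objects and be wrapped in the generated Processor.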
concrete.services package¶
concrete.services.results.ResultsServerService module¶
-
class
concrete.services.results.ResultsServerService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.services.results.ResultsServerService.Iface
-
getLatestSearchResult
(userId)¶ - Get the most recent search results for a user.
Parameters:
- userId
-
getNextChunk
(sessionId)¶ - Get the next chunk of data to annotate. The client should use the Retriever service to access the data.
Parameters:
- sessionId
-
getSearchResult
(searchResultId)¶ - Get a search result object.
Parameters:
- searchResultId
-
getSearchResults
(taskType, limit)¶ - Get a list of search results for a particular annotation task. Set the limit to 0 to get all relevant search results.
Parameters:
- taskType
- limit
-
getSearchResultsByUser
(taskType, userId, limit)¶ - Get a list of search results for a particular annotation task, filtered by a user ID. Set the limit to 0 to get all relevant search results.
Parameters:
- taskType
- userId
- limit
-
recv_getLatestSearchResult
()¶
-
recv_getNextChunk
()¶
-
recv_getSearchResult
()¶
-
recv_getSearchResults
()¶
-
recv_getSearchResultsByUser
()¶
-
recv_registerSearchResult
()¶
-
recv_startSession
()¶
-
recv_stopSession
()¶
-
recv_submitAnnotation
()¶
-
registerSearchResult
(result, taskType)¶ - Register the specified search result for annotation. If a name has not been assigned to the search query, one will be generated. This service also requires that the userId field be populated in the SearchQuery.
Parameters:
- result
- taskType
-
send_getLatestSearchResult
(userId)¶
-
send_getNextChunk
(sessionId)¶
-
send_getSearchResult
(searchResultId)¶
-
send_getSearchResults
(taskType, limit)¶
-
send_getSearchResultsByUser
(taskType, userId, limit)¶
-
send_registerSearchResult
(result, taskType)¶
-
send_startSession
(searchResultId, taskType)¶
-
send_stopSession
(sessionId)¶
-
send_submitAnnotation
(sessionId, unitId, communication)¶
-
startSession
(searchResultId, taskType)¶ - Start an annotation session. Returns a session ID used in future session calls.
Parameters:
- searchResultId
- taskType
-
stopSession
(sessionId)¶ - Stops an annotation session.
Parameters:
- sessionId
-
submitAnnotation
(sessionId, unitId, communication)¶ - Submit an annotation for a session.
Parameters:
- sessionId
- unitId
- communication
-
-
class
concrete.services.results.ResultsServerService.
Iface
¶ Bases:
concrete.services.Service.Iface
-
getLatestSearchResult
(userId)¶ - Get the most recent search results for a user.
Parameters:
- userId
-
getNextChunk
(sessionId)¶ - Get the next chunk of data to annotate. The client should use the Retriever service to access the data.
Parameters:
- sessionId
-
getSearchResult
(searchResultId)¶ - Get a search result object.
Parameters:
- searchResultId
-
getSearchResults
(taskType, limit)¶ - Get a list of search results for a particular annotation task. Set the limit to 0 to get all relevant search results.
Parameters:
- taskType
- limit
-
getSearchResultsByUser
(taskType, userId, limit)¶ - Get a list of search results for a particular annotation task, filtered by a user ID. Set the limit to 0 to get all relevant search results.
Parameters:
- taskType
- userId
- limit
-
registerSearchResult
(result, taskType)¶ - Register the specified search result for annotation. If a name has not been assigned to the search query, one will be generated. This service also requires that the userId field be populated in the SearchQuery.
Parameters:
- result
- taskType
-
startSession
(searchResultId, taskType)¶ - Start an annotation session. Returns a session ID used in future session calls.
Parameters:
- searchResultId
- taskType
-
stopSession
(sessionId)¶ - Stops an annotation session.
Parameters:
- sessionId
-
submitAnnotation
(sessionId, unitId, communication)¶ - Submit an annotation for a session.
Parameters:
- sessionId
- unitId
- communication
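The session methods above suggest a natural call order: startSession, repeated getNextChunk and submitAnnotation, then stopSession. The sketch below drives that loop against an in-memory stand-in client. Several things are assumptions not stated in this reference: that getNextChunk returns an iterable of unit IDs, that an empty chunk signals the end of the data, and the task type string "NER". `run_annotation_session`, `FakeClient` and the `annotate` callable (which stands in for fetching data via the Retriever service and annotating it) are hypothetical.

```python
def run_annotation_session(client, search_result_id, task_type, annotate):
    """Drive one annotation session using the documented method names."""
    session_id = client.startSession(search_result_id, task_type)
    submitted = 0
    try:
        while True:
            chunk = client.getNextChunk(session_id)
            if not chunk:  # assumption: an empty chunk means no data remains
                break
            for unit_id in chunk:
                client.submitAnnotation(session_id, unit_id, annotate(unit_id))
                submitted += 1
    finally:
        # Always close the session, even if annotation fails part-way.
        client.stopSession(session_id)
    return submitted


class FakeClient:
    """In-memory stand-in for a ResultsServerService client."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.submitted = []
        self.stopped = False

    def startSession(self, searchResultId, taskType):
        return "session-1"

    def getNextChunk(self, sessionId):
        return self.chunks.pop(0) if self.chunks else []

    def submitAnnotation(self, sessionId, unitId, communication):
        self.submitted.append((unitId, communication))

    def stopSession(self, sessionId):
        self.stopped = True


fake = FakeClient([["unit-a", "unit-b"], ["unit-c"]])
count = run_annotation_session(
    fake, "result-1", "NER", lambda unit: "annotated " + unit
)
```

Putting stopSession in a `finally` block is a design choice of this sketch, so that a failed annotation does not leave a session open.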
-
-
class
concrete.services.results.ResultsServerService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.services.results.ResultsServerService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_getLatestSearchResult
(seqid, iprot, oprot)¶
-
process_getNextChunk
(seqid, iprot, oprot)¶
-
process_getSearchResult
(seqid, iprot, oprot)¶
-
process_getSearchResults
(seqid, iprot, oprot)¶
-
process_getSearchResultsByUser
(seqid, iprot, oprot)¶
-
process_registerSearchResult
(seqid, iprot, oprot)¶
-
process_startSession
(seqid, iprot, oprot)¶
-
process_stopSession
(seqid, iprot, oprot)¶
-
process_submitAnnotation
(seqid, iprot, oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getLatestSearchResult_args
(userId=None)¶ Bases:
object
Attributes:
- userId
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getLatestSearchResult_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getNextChunk_args
(sessionId=None)¶ Bases:
object
Attributes:
- sessionId
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getNextChunk_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResult_args
(searchResultId=None)¶ Bases:
object
Attributes:
- searchResultId
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResult_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResultsByUser_args
(taskType=None, userId=None, limit=None)¶ Bases:
object
Attributes:
- taskType
- userId
- limit
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResultsByUser_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResults_args
(taskType=None, limit=None)¶ Bases:
object
Attributes:- taskType- limit-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResults_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
registerSearchResult_args
(result=None, taskType=None)¶ Bases:
object
Attributes:- result- taskType-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
registerSearchResult_result
(ex=None)¶ Bases:
object
Attributes:- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
startSession_args
(searchResultId=None, taskType=None)¶ Bases:
object
Attributes:- searchResultId- taskType-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
startSession_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
stopSession_args
(sessionId=None)¶ Bases:
object
Attributes:- sessionId-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
stopSession_result
(ex=None)¶ Bases:
object
Attributes:- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.services.Service module¶
-
class
concrete.services.Service.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Iface
Base service that all other services should inherit from-
about
()¶ - Get information about the service
-
alive
()¶ - Is the service alive?
-
recv_about
()¶
-
recv_alive
()¶
-
send_about
()¶
-
send_alive
()¶
-
-
class
concrete.services.Service.
Iface
¶ Bases:
object
Base service that all other services should inherit from-
about
()¶ - Get information about the service
-
alive
()¶ - Is the service alive?
-
-
class
concrete.services.Service.
Processor
(handler)¶ Bases:
concrete.services.Service.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_about
(seqid, iprot, oprot)¶
-
process_alive
(seqid, iprot, oprot)¶
-
-
class
concrete.services.ttypes.
AnnotationTaskType
¶ Bases:
object
Annotation task types-
NER
= 2¶
-
TOPICID
= 3¶
-
TRANSLATION
= 1¶
-
-
class
concrete.services.ttypes.
AnnotationUnitIdentifier
(communicationId=None, sentenceId=None)¶ Bases:
object
An annotation unit is the part of the communication to be annotated. It can be the entire communication or a particular sentence in the communication. If the sentenceId is null, the unit is the entire communication.
Attributes:
- communicationId: Communication identifier for loading data
- sentenceId: Sentence identifier if annotating sentences
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.ttypes.
AnnotationUnitType
¶ Bases:
object
An annotation unit is the part of the communication to be annotated.-
COMMUNICATION
= 1¶
-
SENTENCE
= 2¶
-
-
class
concrete.services.ttypes.
AsyncContactInfo
(host=None, port=None)¶ Bases:
object
Contact information for the asynchronous communications. When a client contacts a server for a job that takes a significant amount of time, it is often best to implement this asynchronously. We do this by having the client stand up a server to accept the results and passing that information to the original server. The server may want to create a new thrift client on every request or maintain a pool of clients for reuse.
Attributes:
- host
- port
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
exception
concrete.services.ttypes.
NotImplementedException
(message=None, serEx=None)¶ Bases:
thrift.Thrift.TException
An exception to be used when an invoked method has not been implemented by the service.
Attributes:
- message: The explanation (why the exception occurred)
- serEx: The serialized exception
-
classmethod
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
classmethod
-
class
concrete.services.ttypes.
ServiceInfo
(name=None, version=None, description=None)¶ Bases:
object
Each service is described by this info struct. It is for human consumption and for records of versions in deployments.
Attributes:
- name: Name of the service
- version: Version string of the service. It is preferred that the services implement semantic versioning (http://semver.org/), with version strings like x.y.z
- description: Description of the service
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
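As a rough illustration of how the about() and alive() methods and the ServiceInfo fields fit together, here is a minimal sketch. It deliberately avoids importing concrete or thrift: ServiceInfo below is a namedtuple stand-in with the same field names, and ExampleHandler, SEMVER_CORE and version_tuple are hypothetical names introduced only for this example. The version pattern covers plain x.y.z strings only, not the full semantic versioning grammar.

```python
import re
from collections import namedtuple

# Stand-in for concrete.services.ttypes.ServiceInfo (same field names).
ServiceInfo = namedtuple("ServiceInfo", ["name", "version", "description"])

# Simplified x.y.z pattern (assumption: no pre-release or build suffixes).
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


class ExampleHandler:
    """Hypothetical handler exposing the two base Service methods."""

    def about(self):
        # Describe the service for humans and deployment records.
        return ServiceInfo(
            name="example-annotator",
            version="1.2.3",
            description="Toy service used to illustrate ServiceInfo.",
        )

    def alive(self):
        # A real service might check its models or backends here.
        return True


def version_tuple(info):
    """Return (major, minor, patch) or None if the version is not x.y.z."""
    match = SEMVER_CORE.match(info.version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


handler = ExampleHandler()
info = handler.about()
print(info.name, version_tuple(info))
```

Parsing the version into a tuple makes deployment checks such as "at least 1.2.0" a plain tuple comparison.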
-
-
exception
concrete.services.ttypes.
ServicesException
(message=None, serEx=None)¶ Bases:
thrift.Thrift.TException
An exception to be used with Concrete services.
Attributes:
- message: The explanation (why the exception occurred)
- serEx: The serialized exception
-
classmethod
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
classmethod
concrete.situations package¶
-
class
concrete.situations.ttypes.
Argument
(role=None, entityId=None, situationId=None, propertyList=None)¶ Bases:
object
A situation argument, consisting of an argument role and a value. Argument values may be Entities or Situations.
Attributes:
- role: The relationship between this argument and the situation that owns it. The roles that a situation’s arguments can take depend on the type of the situation (including subtype information, such as event_type).
- entityId: A pointer to the value of this argument, if it is explicitly encoded as an Entity.
- situationId: A pointer to the value of this argument, if it is a situation.
- propertyList: For the BinarySRL task, there may be situations where more than one property is attached to a single participant. A list of these properties can be stored in this field.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
Justification
(justificationType=None, mentionId=None, tokenRefSeqList=None)¶ Bases:
object
Attributes:
- justificationType: An enumerated value used to describe the way in which the justification’s mention provides supporting evidence for the situation.
- mentionId: A pointer to the SituationMention itself.
- tokenRefSeqList: An optional list of pointers to tokens that are (especially) relevant to the way in which this mention provides justification for the situation. It is left up to individual analytics to decide what tokens (if any) they wish to include in this field.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
MentionArgument
(role=None, entityMentionId=None, situationMentionId=None, tokens=None, confidence=None, propertyList=None)¶ Bases:
object
A “concrete” argument that may be used by SituationMentions or EntityMentions to avoid conflicts where abstract Arguments were being used to support concrete Mentions.
Attributes:
- role: The relationship between this argument and the situation that owns it. The roles that a situation’s arguments can take depend on the type of the situation (including subtype information, such as event_type).
- entityMentionId: A pointer to the value of an EntityMention, if this is being used to support an EntityMention.
- situationMentionId: A pointer to the value of this argument, if it is a SituationMention.
- tokens: The location of this MentionArgument in the Communication. If this MentionArgument can be identified in a document using an EntityMention or SituationMention, then UUID references to those types should be preferred and this field left as null.
- confidence: Confidence of this argument belonging to its SituationMention
- propertyList: For the BinarySRL task, there may be situations where more than one property is attached to a single participant. A list of these properties can be stored in this field.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
Property
(value=None, metadata=None, polarity=None)¶ Bases:
object
Attached to Arguments to support situations where a ‘participant’ has more than one ‘property’ (in BinarySRL terms), whereas Arguments notionally only support one Role.
Attributes:
- value: The required value of the property.
- metadata: Metadata to support this particular property object.
- polarity: This value is typically boolean, 0.0 or 1.0, but we use a float in order to potentially capture cases where an annotator is highly confident that the value is underspecified, via a value of 0.5.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
Situation
(uuid=None, situationType=None, situationKind=None, argumentList=None, mentionIdList=None, justificationList=None, timeML=None, intensity=None, polarity=None, confidence=None)¶ Bases:
object
A single situation, along with pointers to situation mentions that provide evidence for the situation. “Situations” include events, relations, facts, sentiments, and beliefs. Each situation has a core type (such as EVENT or SENTIMENT), along with an optional subtype based on its core type (e.g., event_type=CONTACT_MEET), and a set of zero or more unordered arguments.
This struct may be used for a variety of “processed” Situations such as (but not limited to):
- SituationMentions which have been collapsed into a coreferential cluster
- Situations which are inferred and not directly supported by a textual mention
Attributes:
- uuid: Unique identifier for this situation.
- situationType: The core type of this situation (e.g., EVENT or SENTIMENT), or a coarse-grained situation type.
- situationKind: A fine-grained situation type that specifically describes the situation based on situationType above. It allows for more detailed description of the situation. Some examples: if situationType == EVENT, the event type for the situation; if situationType == STATE, the state type; if situationType == TEMPORAL_FACT, the temporal fact type. For Propbank, this field should be the predicate lemma and id, e.g. “strike.02”. For FrameNet, this should be the frame name, e.g. “Commerce_buy”. Different and more varied situationTypes may be added in the future.
- argumentList: The arguments for this situation. Each argument consists of a role and a value. It is possible for a situation to have multiple arguments with the same role. Arguments are unordered.
- mentionIdList: Ids of the mentions of this situation in a communication (type=SituationMention)
- justificationList: A list of pointers to SituationMentions that provide justification for this situation. These mentions may be either direct mentions of the situation, or indirect evidence.
- timeML: A wrapper for TimeML annotations.
- intensity: An “intensity” rating for this situation, typically ranging from 0 to 1. In the case of SENTIMENT situations, this is used to record the intensity of the sentiment.
- polarity: The polarity of this situation. In the case of SENTIMENT situations, this is used to record the polarity of the sentiment.
- confidence: A confidence score for this individual situation. You can also set a confidence score for an entire SituationSet using the SituationSet’s metadata.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
SituationMention
(uuid=None, text=None, situationType=None, situationKind=None, argumentList=None, intensity=None, polarity=None, tokens=None, confidence=None)¶ Bases:
object
A concrete mention of a situation, where “situations” include events, relations, facts, sentiments, and beliefs. Each situation has a core type (such as EVENT or SENTIMENT), along with an optional subtype based on its core type (e.g., event_type=CONTACT_MEET), and a set of zero or more unordered arguments.
This struct should be used for most types of SRL labelings (e.g. Propbank and FrameNet) because they are grounded in text.
Attributes:
- uuid: Unique identifier for this situation.
- text: The text content of this situation mention. This field is often redundant with the ‘tokens’ field, and may not be generated by all analytics.
- situationType: The core type of this situation (e.g., EVENT or SENTIMENT), or a coarse-grained situation type.
- situationKind: A fine-grained situation type that specifically describes the situation mention based on situationType above. It allows for more detailed description of the situation mention. Some examples: if situationType == EVENT, the event type for the situation mention; if situationType == STATE, the state type for this situation mention. For Propbank, this field should be the predicate lemma and id, e.g. “strike.02”. For FrameNet, this should be the frame name, e.g. “Commerce_buy”. Different and more varied situationTypes may be added in the future.
- argumentList: The arguments for this situation mention. Each argument consists of a role and a value. It is possible for a situation to have multiple arguments with the same role. Arguments are unordered.
- intensity: An “intensity” rating for the situation, typically ranging from 0 to 1. In the case of SENTIMENT situations, this is used to record the intensity of the sentiment.
- polarity: The polarity of this situation. In the case of SENTIMENT situations, this is used to record the polarity of the sentiment.
- tokens: An optional pointer to tokens that are (especially) relevant to this situation mention. It is left up to individual analytics to decide what tokens (if any) they wish to include in this field. In particular, it is not specified whether the arguments’ tokens should be included.
- confidence: A confidence score for this individual situation mention. You can also set a confidence score for an entire SituationMentionSet using the SituationMentionSet’s metadata.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
SituationMentionSet
(uuid=None, metadata=None, mentionList=None, linkingList=None)¶ Bases:
object
A theory about the set of situation mentions that are present in a message. See also: SituationMention
Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- mentionList: List of mentions in this set.
- linkingList: Entity linking annotations associated with this SituationMentionSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
SituationSet
(uuid=None, metadata=None, situationList=None, linkingList=None)¶ Bases:
object
A theory about the set of situations that are present in a message. See also: Situation
Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- situationList: List of situations in this set.
- linkingList: Entity linking annotations associated with this SituationSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
TimeML
(timeMLClass=None, timeMLTense=None, timeMLAspect=None)¶ Bases:
object
A wrapper for various TimeML annotations.
Attributes:
- timeMLClass: The TimeML class for situations representing TimeML events
- timeMLTense: The TimeML tense for situations representing TimeML events
- timeMLAspect: The TimeML aspect for situations representing TimeML events
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.spans package¶
-
class
concrete.spans.ttypes.
AudioSpan
(start=None, ending=None)¶ Bases:
object
A span of audio within a single communication, identified by a pair of time offsets. Time offsets are zero-based.
NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
Attributes:
- start: Start time (in seconds)
- ending: End time (in seconds)
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.spans.ttypes.
TextSpan
(start=None, ending=None)¶ Bases:
object
A span of text within a single communication, identified by a pair of zero-indexed character offsets into a Thrift string. Thrift strings are encoded using UTF-8. The offsets are character-based, not byte-based; a character with a three-byte UTF-8 representation only counts as one character.
NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
Attributes:
- start: Start character, inclusive.
- ending: End character, exclusive
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
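The distinction between character offsets and byte offsets described for TextSpan can be seen with plain Python strings. The following is a minimal sketch, not part of concrete-python: span_text and char_span_to_byte_span are hypothetical helper names, and the span is passed as two integers rather than as a TextSpan object so that the snippet runs without the library installed.

```python
def span_text(text, start, ending):
    """Return the substring covered by a (start inclusive, ending exclusive) span."""
    # Python 3 str indexing is character-based, matching TextSpan semantics.
    return text[start:ending]


def char_span_to_byte_span(text, start, ending):
    """Convert character offsets to offsets into the UTF-8 encoding of text."""
    byte_start = len(text[:start].encode("utf-8"))
    byte_ending = len(text[:ending].encode("utf-8"))
    return byte_start, byte_ending


# "naive cafe" with diacritics, followed by three CJK characters.
text = "na\u00efve caf\u00e9 \u65e5\u672c\u8a9e"

print(len(text))                             # number of characters
print(len(text.encode("utf-8")))             # number of UTF-8 bytes
print(span_text(text, 6, 10))                # the second word
print(char_span_to_byte_span(text, 6, 10))   # same span, in bytes
```

Because the two-byte character before the span shifts every later byte offset, slicing the encoded bytes with character offsets would return the wrong text.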
-
concrete.structure package¶
-
class
concrete.structure.ttypes.
Arc
(src=None, dst=None, token=None, weight=None)¶ Bases:
object
Type for arcs. For epsilon edges, leave ‘token’ blank.
Attributes:
- src
- dst
- token
- weight
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Constituent
(id=None, tag=None, childList=None, headChildIndex=-1, start=None, ending=None)¶ Bases:
object
A single parse constituent (or “phrase”).
Attributes:
- id: A parse-relative identifier for this constituent. Together with the UUID for a Parse, this can be used to define pointers to specific constituents.
- tag: A description of this constituency node, e.g. the category “NP”. For leaf nodes, this should be a word and for pre-terminal nodes this should be a POS tag.
- childList
- headChildIndex: The index of the head child of this constituent. I.e., the head child of constituent c is c.children[c.head_child_index]. A value of -1 indicates that no child head was identified.
- start: The first token (inclusive) of this constituent in the parent Tokenization. Almost certainly should be populated.
- ending: The last token (exclusive) of this constituent in the parent Tokenization. Almost certainly should be populated.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
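The headChildIndex convention has one pitfall in Python: a value of -1 means "no head identified", but indexing a list with -1 silently returns its last element. The sketch below illustrates a guarded lookup. It uses a namedtuple stand-in with a subset of the Constituent fields, and it assumes that childList holds the ids of the child constituents (the reference above does not describe that field). head_child_id and lexical_head are hypothetical helper names.

```python
from collections import namedtuple

# Stand-in for concrete.structure.ttypes.Constituent (subset of fields).
Constituent = namedtuple(
    "Constituent", ["id", "tag", "childList", "headChildIndex"])


def head_child_id(constituent):
    """Return the id of the head child, or None if no head was identified."""
    # Test -1 explicitly: childList[-1] would be the LAST child, not "none".
    if constituent.headChildIndex == -1 or not constituent.childList:
        return None
    return constituent.childList[constituent.headChildIndex]


def lexical_head(constituents_by_id, constituent):
    """Follow head children downwards until a node without a head child."""
    current = constituent
    while True:
        child_id = head_child_id(current)
        if child_id is None:
            return current
        current = constituents_by_id[child_id]


# (NP (DT the) (NN dog)), with the NN pre-terminal marked as head of the NP.
nodes = [
    Constituent(0, "NP", [1, 3], 1),
    Constituent(1, "DT", [2], 0),
    Constituent(2, "the", [], -1),
    Constituent(3, "NN", [4], 0),
    Constituent(4, "dog", [], -1),
]
by_id = {c.id: c for c in nodes}
print(lexical_head(by_id, by_id[0]).tag)
```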
-
-
class
concrete.structure.ttypes.
ConstituentRef
(parseId=None, constituentIndex=None)¶ Bases:
object
A reference to a Constituent within a Parse.
Attributes:
- parseId: The UUID of the Parse that this Constituent belongs to.
- constituentIndex: The index in the constituent list of this Constituent.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Dependency
(gov=-1, dep=None, edgeType=None)¶ Bases:
object
A syntactic edge between two tokens in a tokenized sentence.
Attributes:
- gov: The governor or the head token. 0 indexed.
- dep: The dependent token. 0 indexed.
- edgeType: The relation that holds between gov and dep.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
DependencyParse
(uuid=None, metadata=None, dependencyList=None, structureInformation=None)¶ Bases:
object
Represents a dependency parse with typed edges.
Attributes:
- uuid
- metadata
- dependencyList
- structureInformation
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
DependencyParseStructure
(isAcyclic=None, isConnected=None, isSingleHeaded=None, isProjective=None)¶ Bases:
object
Information about the structure of a dependency parse. This information is computable from the list of dependencies, but this allows the consumer to make (verified) assumptions about the dependencies being processed.
Attributes:
- isAcyclic: True iff there are no cycles in the dependency graph.
- isConnected: True iff the dependency graph forms a single connected component.
- isSingleHeaded: True iff every node in the dependency parse has at most one head/parent/governor.
- isProjective: True iff there are no crossing edges in the dependency parse.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
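Since the four flags are described as computable from the list of dependencies, the following sketch shows one way they might be derived. It is not the library's implementation. Dependency is a namedtuple stand-in, and several choices are assumptions made for the example: a gov of -1 (the default shown in the Dependency signature) is treated as a virtual root, connectivity is checked over token nodes only, and the root edge takes part in the crossing test at position -1.

```python
from collections import namedtuple
from itertools import combinations

# Stand-in for concrete.structure.ttypes.Dependency.
Dependency = namedtuple("Dependency", ["gov", "dep", "edgeType"])
ROOT = -1  # assumption: gov == -1 marks the edge from a virtual root


def is_single_headed(deps):
    """Every token appears as a dependent at most once."""
    dependents = [d.dep for d in deps]
    return len(dependents) == len(set(dependents))


def is_acyclic(deps):
    """Depth-first search for a back edge in the directed gov -> dep graph."""
    children = {}
    for d in deps:
        if d.gov != ROOT:
            children.setdefault(d.gov, []).append(d.dep)
    in_progress, done = set(), set()

    def visit(node):
        in_progress.add(node)
        for child in children.get(node, []):
            if child in in_progress:
                return False  # back edge: cycle found
            if child not in done and not visit(child):
                return False
        in_progress.discard(node)
        done.add(node)
        return True

    return all(visit(n) for n in list(children) if n not in done)


def is_connected(deps, n_tokens):
    """All tokens reachable from token 0 when edges are read as undirected."""
    if n_tokens == 0:
        return True
    neighbours = {i: set() for i in range(n_tokens)}
    for d in deps:
        if d.gov != ROOT:
            neighbours[d.gov].add(d.dep)
            neighbours[d.dep].add(d.gov)
    seen, stack = {0}, [0]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n_tokens


def is_projective(deps):
    """No two edges cross when drawn above the sentence."""
    spans = [tuple(sorted((d.gov, d.dep))) for d in deps]
    return not any(a < c < b < e or c < a < e < b
                   for (a, b), (c, e) in combinations(spans, 2))


# "The dog barked": barked is the root, dog its subject, The its determiner.
tree = [Dependency(ROOT, 2, "root"), Dependency(2, 1, "nsubj"),
        Dependency(1, 0, "det")]
crossing = [Dependency(ROOT, 0, "root"), Dependency(0, 2, "x"),
            Dependency(0, 1, "y"), Dependency(1, 3, "z")]
cyclic = [Dependency(0, 1, "x"), Dependency(1, 0, "y")]
print(is_single_headed(tree), is_acyclic(tree),
      is_connected(tree, 3), is_projective(tree))
```

The projectivity check is quadratic in the number of edges, which is usually acceptable for sentence-sized inputs.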
-
-
class
concrete.structure.ttypes.
LatticePath
(weight=None, tokenList=None)¶ Bases:
object
Attributes:- weight- tokenList-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Parse
(uuid=None, metadata=None, constituentList=None)¶ Bases:
object
A theory about the syntactic parse of a sentence.
Note: If we add support for parse forests in the future, then it will most likely be done by adding a new field (e.g. “forest_root”) that uses a new struct type to encode the forest. A “kind” field might also be added (analogous to Tokenization.kind) to indicate whether a parse is encoded using a simple tree or a parse forest.
Attributes:
- uuid
- metadata
- constituentList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Section
(uuid=None, sentenceList=None, textSpan=None, rawTextSpan=None, audioSpan=None, kind=None, label=None, numberList=None, lidList=None)¶ Bases:
object
A single “section” of a communication, such as a paragraph. Each section is defined using a text or audio span, and can optionally contain a list of sentences.
Attributes:
- uuid: The unique identifier for this section.
- sentenceList: The sentences of this “section.”
- textSpan: Location of this section in the communication text. NOTE: This text span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
- rawTextSpan: Location of this section in the raw text. NOTE: This text span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
- audioSpan: Location of this section in the original audio. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
- kind: A short, sometimes corpus-specific term characterizing the nature of the section; may change in a future version of concrete. This often acts as a coarse-grained descriptor that is used for filtering. For example, Gigaword uses the section kind “passage” to distinguish content-bearing paragraphs in the body of an article from other paragraphs, such as the headline and dateline.
- label: The name of the section. For example, a title of a section on Wikipedia.
- numberList: Position within the communication with respect to other Sections: the section number, e.g., 3, or 3.1, or 3.1.2, etc. Aimed at Communications with content organized in a hierarchy, such as a Book with multiple chapters, then sections, then paragraphs. Or even a dense Wikipedia page with subsections. Sections should still be arranged linearly, where reading these numbers should not be required to get a start-to-finish enumeration of the Communication’s content.
- lidList: An optional field to be used for multi-language documents. This field should be populated when a section is inside of a document that contains multiple languages. Minimally, each block of text in one language should be its own section. For example, if a paragraph is in English and the paragraph afterwards is in French, these should be separated into two different sections, allowing language-specific analytics to run on appropriate sections.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Sentence
(uuid=None, tokenization=None, textSpan=None, rawTextSpan=None, audioSpan=None)¶ Bases:
object
A single sentence or utterance in a communication.
Attributes:
- uuid
- tokenization: Theory about the tokens that make up this sentence. For text communications, these tokenizations will typically be generated by a tokenizer. For audio communications, these tokenizations will typically be generated by an automatic speech recognizer. The “Tokenization” message type is also used to store the output of machine translation systems and text normalization systems.
- textSpan: Location of this sentence in the communication text. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
- rawTextSpan: Location of this sentence in the raw text. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
- audioSpan: Location of this sentence in the original audio. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
SpanLink
(tokens=None, concreteTarget=None, externalTarget=None, linkType=None)¶ Bases:
object
A collection of tokens that represent a link to another resource. This resource might be another Concrete object (e.g., another Concrete Communication), represented with the ‘concreteTarget’ field, or it could link to a resource outside of Concrete via the ‘externalTarget’ field.
Attributes:
- tokens: The tokens that make up this SpanLink object.
- concreteTarget
- externalTarget
- linkType
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
TaggedToken
(tokenIndex=None, tag=None, confidence=None, tagList=None, confidenceList=None)¶ Bases:
object
Attributes:
- tokenIndex: A pointer to the token being tagged. Token indices are 0-based.
- tag: A string containing the annotation. If the tag set you are using is not case sensitive, then all part of speech tags should be normalized to upper case.
- confidence: Confidence of the annotation.
- tagList: A list of strings that represent a distribution of possible tags for this token. If populated, the ‘tag’ field should also be populated with the “best” value from this list.
- confidenceList: A list of doubles that represent confidences associated with the tags in the ‘tagList’ field. If populated, the ‘confidence’ field should also be populated with the confidence associated with the “best” tag in ‘tagList’.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
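The relationship between tagList, confidenceList, tag and confidence can be sketched as follows. This is an illustration under two assumptions that the reference does not state outright: the two lists are parallel (entry i of confidenceList belongs to entry i of tagList), and the “best” tag is the one with the highest confidence. best_tag is a hypothetical helper name.

```python
def best_tag(tag_list, confidence_list):
    """Return (tag, confidence) for the highest-confidence entry.

    Returns (None, None) for empty lists and raises ValueError if the
    two lists do not have the same length.
    """
    if len(tag_list) != len(confidence_list):
        raise ValueError("tagList and confidenceList must be parallel")
    if not tag_list:
        return None, None
    # Index of the largest confidence; ties resolve to the earliest entry.
    index = max(range(len(tag_list)), key=confidence_list.__getitem__)
    return tag_list[index], confidence_list[index]


# A distribution over part-of-speech tags for a single token.
tags = ["NN", "VB", "JJ"]
confidences = [0.2, 0.7, 0.1]
tag, confidence = best_tag(tags, confidences)
print(tag, confidence)
```

The returned pair is what would be written to the ‘tag’ and ‘confidence’ fields alongside the full distribution.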
-
-
class
concrete.structure.ttypes.
Token
(tokenIndex=None, text=None, textSpan=None, rawTextSpan=None, audioSpan=None)¶ Bases:
object
A single token (typically a word) in a communication. The exact definition of what counts as a token is left up to the tools that generate token sequences. Usually, each token will include at least a text string.
Attributes:
- tokenIndex: A 0-based tokenization-relative identifier for this token that represents the order that this token appears in the sentence. Together with the UUID for a Tokenization, this can be used to define pointers to specific tokens. If a Tokenization object contains multiple Token objects with the same id (e.g., in different n-best lists), then all of their other fields must be identical as well.
- text: The text associated with this token. Note: we may have a destructive tokenizer (e.g., Stanford rewriting) and as a result, we want to maintain this field.
- textSpan: Location of this token in this perspective’s text (.text field). In cases where this token does not correspond directly with any text span in the text (such as word insertion during MT), this field may be given a value indicating “approximately” where the token comes from. A span covering the entire sentence may be used if no more precise value seems appropriate. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the document, but is the annotation’s best effort at such a representation.
- rawTextSpan: Location of this token in the original, raw text (.originalText field). In cases where this token does not correspond directly with any text span in the original text (such as word insertion during MT), this field may be given a value indicating “approximately” where the token comes from. A span covering the entire sentence may be used if no more precise value seems appropriate. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original raw document, but is the annotation’s best effort at such a representation.
- audioSpan: Location of this token in the original audio. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
TokenLattice
(startState=0, endState=0, arcList=None, cachedBestPath=None)¶ Bases:
object
A lattice structure that assigns scores to a set of token sequences. The lattice is encoded as an FSA, where states are identified by integers, and each arc is annotated with an optional token and a weight. (Arcs with no tokens are “epsilon” arcs.) The lattice has a single start state and a single end state. (You can use epsilon edges to simulate multiple start states or multiple end states, if desired.)
The score of a path through the lattice is the sum of the weights of the arcs that make up that path. A path with a lower score is considered “better” than a path with a higher score. If possible, path scores should be negative log likelihoods (with base e: if P=1, then weight=0, and if P=0.5, then weight=0.693). Furthermore, if possible, the path scores should be globally normalized (i.e., they should encode probabilities). This will allow for them to be combined with other information in a reasonable way when determining confidences for system outputs.
TokenLattices should never contain any paths with cycles. Every arc in the lattice should be included in some path from the start state to the end state.
Attributes:
- startState
- endState
- arcList
- cachedBestPath
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
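The scoring rule above (path score = sum of arc weights, weights as natural-log negative likelihoods, lower is better) can be sketched with a small best-path search. This is an illustration only: Arc is a namedtuple stand-in with the same field names as the Arc struct, the token is a plain string rather than a Token object, and nll and best_path are hypothetical helper names. The search relies on the lattice being acyclic, as the description requires.

```python
import math
from collections import namedtuple

# Stand-in for concrete.structure.ttypes.Arc; token=None is an epsilon arc.
Arc = namedtuple("Arc", ["src", "dst", "token", "weight"])


def nll(probability):
    """Negative natural-log likelihood, the recommended arc weight."""
    return -math.log(probability)


def best_path(arcs, start_state, end_state):
    """Return (score, tokens) of the lowest-scoring start-to-end path.

    Returns (math.inf, None) if the end state cannot be reached.
    """
    outgoing = {}
    for arc in arcs:
        outgoing.setdefault(arc.src, []).append(arc)
    memo = {}

    def solve(state):
        if state == end_state:
            return 0.0, []
        if state not in memo:
            best = (math.inf, None)
            for arc in outgoing.get(state, []):
                rest_score, rest_tokens = solve(arc.dst)
                if rest_tokens is None:
                    continue  # dead end
                total = arc.weight + rest_score
                if total < best[0]:
                    # Epsilon arcs add weight but contribute no token.
                    head = [] if arc.token is None else [arc.token]
                    best = (total, head + rest_tokens)
            memo[state] = best
        return memo[state]

    return solve(start_state)


arcs = [
    Arc(0, 1, "recognize", nll(0.5)),
    Arc(0, 1, "wreck", nll(0.25)),
    Arc(1, 2, "speech", nll(0.8)),
    Arc(1, 2, "beach", nll(0.2)),
]
score, tokens = best_path(arcs, start_state=0, end_state=2)
print(tokens, round(math.exp(-score), 3))
```

With weights of this form, exp(-score) recovers the probability of the path, which is the quantity the Tokenization confidence formula multiplies by metadata.confidence.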
-
-
class
concrete.structure.ttypes.
TokenList
(tokenList=None)¶ Bases:
object
A wrapper around a list of tokens.
Attributes:
- tokenList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
TokenRefSequence
(tokenIndexList=None, anchorTokenIndex=-1, tokenizationId=None, textSpan=None, rawTextSpan=None, audioSpan=None, dependencies=None, constituent=None)¶ Bases:
object
A list of pointers to tokens that all belong to the same tokenization.
Attributes:
- tokenIndexList: The tokenization-relative identifiers for each token that is included in this sequence.
- anchorTokenIndex: An optional field that can be used to describe the root of a sentence (if this sequence is a full sentence), the head of a constituent (if this sequence is a constituent), or some other form of “canonical” token in this sequence if, for instance, it is not easy to map this sequence to another annotation that has a head. This field is defined with respect to the Tokenization given by tokenizationId, and not to this object’s tokenIndexList.
- tokenizationId: The UUID of the tokenization that contains the tokens.
- textSpan: The text span in the main text (.text field) associated with this TokenRefSequence. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
- rawTextSpan: The text span in the original text (.originalText field) associated with this TokenRefSequence. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original raw document, but is the annotation’s best effort at such a representation.
- audioSpan: The audio span associated with this TokenRefSequence. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.
- dependencies: Use this field to reference a dependency tree fragment such as a shortest path or all the dependents in a constituent.
- constituent: Use this field to specify an entire constituent in a parse tree. Prefer textSpan over this field unless a node in a tree is needed.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
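The note that anchorTokenIndex is relative to the Tokenization, and not to tokenIndexList, is easy to get wrong, so here is a small sketch of the two lookups. It uses a namedtuple stand-in with two of the TokenRefSequence fields and a plain list of strings in place of a Tokenization. Treating -1 (the default shown in the signature) as "no anchor" is an assumption, and referenced_tokens and anchor_token are hypothetical helper names.

```python
from collections import namedtuple

# Stand-in for concrete.structure.ttypes.TokenRefSequence (subset of fields).
TokenRefSequence = namedtuple(
    "TokenRefSequence", ["tokenIndexList", "anchorTokenIndex"])


def referenced_tokens(tokenization_tokens, ref_sequence):
    """Resolve every index in tokenIndexList against the tokenization."""
    return [tokenization_tokens[i] for i in ref_sequence.tokenIndexList]


def anchor_token(tokenization_tokens, ref_sequence):
    """Resolve the anchor against the tokenization, NOT tokenIndexList."""
    if ref_sequence.anchorTokenIndex == -1:  # assumption: -1 means unset
        return None
    return tokenization_tokens[ref_sequence.anchorTokenIndex]


tokens = ["The", "quick", "brown", "fox", "jumped"]
mention = TokenRefSequence(tokenIndexList=[1, 2, 3], anchorTokenIndex=3)

print(referenced_tokens(tokens, mention))
print(anchor_token(tokens, mention))
```

Indexing tokenIndexList with the anchor instead (mention.tokenIndexList[3]) would raise an IndexError here, since the list has only three entries.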
-
-
class
concrete.structure.ttypes.
TokenTagging
(uuid=None, metadata=None, taggedTokenList=None, taggingType=None)¶ Bases:
object
A theory about some token-level annotation. The TokenTagging consists of a mapping from tokens (using token ids) to string tags (e.g. part-of-speech tags or lemmas).
The mapping defined by a TokenTagging may be partial, i.e., some tokens may not be assigned any part of speech tags.
For lattice tokenizations, you may need to create multiple part-of-speech taggings (for different paths through the lattice), since the appropriate tag for a given token may depend on the path taken. For example, you might define a separate TokenTagging for each of the top K paths, which leaves all tokens that are not part of the path unlabeled.
Currently, we use strings to encode annotations. In the future, we may add fields for encoding specific tag sets (e.g., treebank tags), or for adding compound tags.
Attributes:
- uuid: The UUID of this TokenTagging object.
- metadata: Information about where the annotation came from. This should be used to tell between gold-standard annotations and automatically generated theories about the data
- taggedTokenList: The mapping from tokens to annotations. This may be a partial mapping.
- taggingType: An ontology-backed string that represents the type of token taggings this TokenTagging object produces.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
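Because the mapping may be partial, code that consumes a tagging usually has to cope with untagged tokens. The following is a minimal sketch of that idea using plain (token index, tag) pairs as a stand-in for taggedTokenList; the function name tags_by_token and the sample tags are assumptions, not part of concrete-python.

```python
def tags_by_token(num_tokens, tagged_tokens):
    """Expand a partial (token index, tag) mapping into one slot per token.

    Tokens that have no entry in the mapping are left as None.
    """
    tags = [None] * num_tokens
    for token_index, tag in tagged_tokens:
        tags[token_index] = tag
    return tags


# Token 2 is deliberately left untagged.
pos_tags = tags_by_token(4, [(0, "DT"), (1, "NN"), (3, "VBZ")])
print(pos_tags)
```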
-
class
concrete.structure.ttypes.
Tokenization
(uuid=None, metadata=None, tokenList=None, lattice=None, kind=None, tokenTaggingList=None, parseList=None, dependencyParseList=None, spanLinkList=None)¶ Bases:
object
A theory (or set of alternative theories) about the sequence of tokens that make up a sentence.
This message type is used to record the output not just of tokenizers, but also of a wide variety of other tools, including machine translation systems, text normalizers, part-of-speech taggers, and stemmers.
Each Tokenization is encoded using either a TokenList or a TokenLattice. (If you want to encode an n-best list, then you should store it as n separate Tokenization objects.) The “kind” field is used to indicate whether this Tokenization contains a list of tokens or a TokenLattice.
The confidence value for each sequence is determined by combining the confidence from the “metadata” field with confidence information from individual token sequences as follows:
- For n-best lists: metadata.confidence
- For lattices: metadata.confidence * exp(-sum(arc.weight))
Note: in some cases (such as the output of a machine translation tool), the order of the tokens in a token sequence may not correspond with the order of their original text span offsets.
Attributes:
- uuid
- metadata: Information about where this tokenization came from.
- tokenList: A wrapper around an ordered list of the tokens in this tokenization. This may also give easy access to the “reconstructed text” associated with this tokenization. This field should only have a value if kind==TOKEN_LIST.
- lattice: A lattice that compactly describes a set of token sequences that might make up this tokenization. This field should only have a value if kind==LATTICE.
- kind: Enumerated value indicating whether this tokenization is implemented using an n-best list or a lattice.
- tokenTaggingList
- parseList
- dependencyParseList
- spanLinkList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
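The confidence rule described for Tokenization can be written out as a small function. This is a sketch of the arithmetic only, working on plain numbers rather than on concrete objects; the function name sequence_confidence and its parameters are assumptions.

```python
import math


def sequence_confidence(metadata_confidence, arc_weights=None):
    """Combine metadata confidence with lattice arc weights.

    For an n-best list entry (no arc weights) the result is just
    metadata_confidence. For a lattice path it is
    metadata_confidence * exp(-sum(arc weights along the path)).
    """
    if arc_weights is None:
        return metadata_confidence
    return metadata_confidence * math.exp(-sum(arc_weights))


n_best = sequence_confidence(0.8)
lattice_path = sequence_confidence(0.8, arc_weights=[0.25, 0.75])
print(n_best, lattice_path)
```

Since the weights are negated inside the exponential, larger arc weights lower the confidence of a path.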
concrete.summarization package¶
concrete.summarization.SummarizationService module¶
-
class
concrete.summarization.SummarizationService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.summarization.SummarizationService.Iface
-
getCapabilities
()¶
-
recv_getCapabilities
()¶
-
recv_summarize
()¶
-
send_getCapabilities
()¶
-
send_summarize
(query)¶
-
summarize
(query)¶ - Parameters:- query
-
-
class
concrete.summarization.SummarizationService.
Iface
¶ Bases:
concrete.services.Service.Iface
-
getCapabilities
()¶
-
summarize
(query)¶ - Parameters:- query
-
-
class
concrete.summarization.SummarizationService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.summarization.SummarizationService.Iface
,thrift.Thrift.TProcessor
-
on_message_begin
(func)¶
-
process
(iprot, oprot)¶
-
process_getCapabilities
(seqid, iprot, oprot)¶
-
process_summarize
(seqid, iprot, oprot)¶
-
-
class
concrete.summarization.SummarizationService.
getCapabilities_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.summarization.SummarizationService.
getCapabilities_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.summarization.ttypes.
SummarizationCapability
(type=None, lang=None)¶ Bases:
object
Attributes:
- type
- lang
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.summarization.ttypes.
SummarizationRequest
(queryTerms=None, maximumTokens=None, maximumCharacters=None, sourceType=None, sourceIds=None, sourceCommunication=None)¶ Bases:
object
A request to summarize which specifies the length of the desired summary and the text data to be summarized.
Either set sourceCommunication or sourceType and sourceIds.
Attributes:
- queryTerms: Terms or features pertinent to the query. Can be empty, meaning summarize all source material with no a priori beliefs about what is important to summarize.
- maximumTokens: Limit on how long the returned summary can be in tokens.
- maximumCharacters: Limit on how long the returned summary can be in characters.
- sourceType: How to interpret the ids in sourceIds. May be null if sourceIds is null, otherwise must be populated.
- sourceIds: A list of concrete object ids which serve as the material to summarize.
- sourceCommunication: Alternative to sourceIds+sourceType: provide a Communication of text to summarize.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
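The constraints on the source fields can be checked before a request is sent. The following is a minimal sketch under stated assumptions: it takes the three field values as plain arguments instead of a SummarizationRequest object, the name check_request_sources is hypothetical, and it does not treat setting both alternatives as an error because the documentation does not say whether that is allowed.

```python
def check_request_sources(source_type=None, source_ids=None,
                          source_communication=None):
    """Return a list of problems with how a request names its sources."""
    problems = []
    # At least one of the two alternatives has to be provided.
    if source_communication is None and source_ids is None:
        problems.append(
            "set either sourceCommunication or sourceType and sourceIds")
    # sourceType may be null only when sourceIds is null.
    if source_ids is not None and source_type is None:
        problems.append("sourceType must be populated when sourceIds is set")
    return problems


print(check_request_sources(source_ids=["comm-1"]))
```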
-
class
concrete.summarization.ttypes.
Summary
(summaryCommunication=None, concepts=None)¶ Bases:
object
A shortened version of some text, possibly with some concepts annotated as justifications for why particular pieces of the summary were kept.
Attributes:
- summaryCommunication: Contains the text of the generated summary.
- concepts: Concepts mentioned in the summary which are believed to be interesting and/or worth highlighting.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.summarization.ttypes.
SummaryConcept
(tokens=None, concept=None, confidence=1.0, utility=1.0)¶ Bases:
object
A mention of a concept described in a summary which is thought to be informative. Concepts might be named entities, facts, or events which were determined to be salient in the text being summarized.
Attributes:
- tokens: Location in summaryCommunication of this concept.
- concept: Short description of the concept being evoked, e.g. “kbrel:bornIn” or “related:ACME_Corp”.
- confidence: How confident the system is that this concept was evoked by this mention, in [0,1].
- utility: How informative/important it is that this concept be included in the summary (non-negative).
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
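The documented ranges for confidence and utility can be enforced with a small check. This sketch operates on plain floats, and the function name check_concept_scores is an assumption; the defaults of 1.0 mirror the SummaryConcept constructor signature shown above.

```python
def check_concept_scores(confidence=1.0, utility=1.0):
    """Validate the documented ranges and return the pair unchanged."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if utility < 0.0:
        raise ValueError("utility must be non-negative")
    return confidence, utility


print(check_concept_scores(confidence=0.9, utility=2.5))
```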
concrete.twitter package¶
-
class
concrete.twitter.ttypes.
BoundingBox
(type=None, coordinateList=None)¶ Bases:
object
Attributes:
- type
- coordinateList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
HashTag
(text=None, startOffset=None, endOffset=None)¶ Bases:
object
Attributes:
- text
- startOffset
- endOffset
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
PlaceAttributes
(streetAddress=None, region=None, locality=None)¶ Bases:
object
Attributes:
- streetAddress
- region
- locality
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TweetInfo
(id=None, text=None, createdAt=None, user=None, truncated=None, entities=None, source=None, coordinates=None, place=None, favorited=None, retweeted=None, retweetCount=None, inReplyToScreenName=None, inReplyToStatusId=None, inReplyToUserId=None, retweetedScreenName=None, retweetedStatusId=None, retweetedUserId=None)¶ Bases:
object
Attributes:
- id
- text
- createdAt
- user
- truncated
- entities
- source
- coordinates
- place
- favorited
- retweeted
- retweetCount
- inReplyToScreenName
- inReplyToStatusId
- inReplyToUserId
- retweetedScreenName
- retweetedStatusId
- retweetedUserId
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterCoordinates
(type=None, coordinates=None)¶ Bases:
object
Attributes:
- type
- coordinates
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterEntities
(hashtagList=None, urlList=None, userMentionList=None)¶ Bases:
object
Attributes:
- hashtagList
- urlList
- userMentionList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterLatLong
(latitude=None, longitude=None)¶ Bases:
object
A Twitter geocoordinate.
Attributes:
- latitude
- longitude
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterPlace
(placeType=None, countryCode=None, country=None, fullName=None, name=None, id=None, url=None, boundingBox=None, attributes=None)¶ Bases:
object
Attributes:
- placeType
- countryCode
- country
- fullName
- name
- id
- url
- boundingBox
- attributes
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterUser
(id=None, name=None, screenName=None, lang=None, geoEnabled=None, createdAt=None, friendsCount=None, statusesCount=None, verified=None, listedCount=None, favouritesCount=None, followersCount=None, location=None, timeZone=None, description=None, utcOffset=None, url=None)¶ Bases:
object
Information about a Twitter user.
Attributes:
- id
- name
- screenName
- lang
- geoEnabled
- createdAt
- friendsCount
- statusesCount
- verified
- listedCount
- favouritesCount
- followersCount
- location
- timeZone
- description
- utcOffset
- url
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Development¶
Submitting a bug report¶
Please report any bugs to the GitLab (internal) or GitHub (public) issue trackers. Your issue will be resolved more quickly if you are able to provide a minimal working example, including an example concrete data file if applicable.
Contributing code¶
Ensure an issue has been created for your new feature/bugfix on GitLab (internal) or GitHub (public).
If you are adding a new feature, create a stub (placeholder implementation) of the desired argument/function/class/etc.
Write a test for your new feature/bugfix and run it, ensuring that it fails on the current implementation:
py.test tests/test_my_code.py
NameErrors, ImportErrors, SyntaxErrors, etc. do not count (they indicate the API is wrong).
Implement your new feature/bugfix.
Run the test again, ensuring that it now passes.
Run all tests and style checks, ensuring that they pass:
tox
Optionally, run integration tests (you must have Redis server version 2.8 or later in your path; do redis-server --version to check):
tox integration-tests
If you created a new module (file) or package (directory) in the library, please see “Adding new modules and packages” in the next section.
Push your changes to a feature branch on GitLab/GitHub (e.g., called n-issue-abbrev, where n is the issue number and issue-abbrev is a very short abbreviation of the issue title) and ensure that the build passes. The build is defined in .gitlab-ci.yml (.travis.yml and appveyor.yml for public builds); tox is configured in tox.ini. The build includes unit tests, integration tests, and style checks and runs on Python 3.5 across multiple platforms; if it fails, please find the error in the build log, fix it, and try again.
Add a line to CHANGELOG under the current version-in-progress describing your changes simply and concisely. Add yourself to AUTHORS if you are not already listed.
If you’ve made multiple commits, please squash them and git push -f to the feature branch.
Create a merge/pull request for your feature branch into main, referencing the GitLab/GitHub issue.
For maintainers¶
Adding new modules and packages¶
If a new module or package is created, either by hand or in the auto-generated code from Thrift, a small amount of additional configuration must be performed.
In either case, the name of the package (if it is a package and not a module) should be added to the packages parameter in setup.py. The name of the package or module should be added to the subpackage or submodule list in docs/concrete.rst, respectively. A new ReStructuredText file should also be created under docs/ for the package or module; follow the conventions set by the other packages and modules.
If the new module or package was written by hand, a guard should be added to autodoc_process_docstring in docs/conf.py so that the module or package is not ignored by the documentation parser. If it is a package, a guard should also be added to generate.bash so that the corresponding directory is not deleted when the auto-generated code is copied into concrete/ from the Thrift build directory.
If a new package was generated by Thrift, a corresponding exclude should be added to the flake8 configuration in setup.cfg and the new package’s ttypes module should be added to the star imports in concrete/__init__.py. If a new module (not package) was generated by Thrift, no action is necessary.
Branches, versions, and releases¶
The main branch is kept stable at all times. Before a commit is pushed to main, it should be checked by CI on another branch. The recommended way of maintaining this is to do all work in feature branches that are kept up-to-date with main and pushed to GitLab, waiting for CI to finish before merging.
We use zest.releaser to manage versions, the CHANGELOG, and releases. (Making a new release is a many-step process that requires great care; doing so by hand is strongly discouraged.) Using zest.releaser, stable versions are released to PyPI and main is kept on a development version number (so that a stable version number never represents more than one snapshot of the code). To make a new release, install zest.releaser (pip install zest.releaser) and run fullrelease.
Testing PyPI releases¶
To test how changes to concrete-python will show up on PyPI (for example, how the readme is rendered) you can use the PyPI testing site. To do so, set the following in ~/.pypirc:
repository = https://testpypi.python.org/pypi
You will also need to create a testpypi user account and you may need to request access to the concrete package on testpypi.
Testing documentation¶
The automated build checks for syntax errors in the documentation. When a push is made to the GitHub repository the online documentation is automatically re-generated. You can run the automatic validation and generate the HTML documentation locally by doing:
tox -e docs
The generated HTML documentation is stored in .tox/docs/tmp/html (relative to the top of your repository). Open this file path in a web browser to check how your changes will look when published online.
(Re)generating code from concrete¶
The Python code generated by the Thrift compiler on the schema defined in the concrete project is checked in to concrete-python manually after applying necessary patches. For trivial modifications to the schema this process is automated by generate.bash, which assumes concrete has been cloned alongside concrete-python (in the same parent directory):
bash generate.bash
After this succeeds, tests should be run and the changes should be manually inspected (git diff) for sanity. Note that this will not delete previously-generated files that are no longer produced by Thrift (whose entries were removed from the schema).
Note: Often generate.bash is not sufficient: the patches (in patches/) document where it (Thrift) falls short on the previously-compiled schema. Additionally, if new packages (namespaces) are added to the schema, they must be added to setup.py, setup.cfg, and concrete/__init__.py.
If generate.bash throws an error, the necessary changes should be performed manually and checked in to the index, at which point the generated code should be removed from the working tree, raw (unpatched) generated code should be generated, and new patches should be produced and stored in patches/ using git diff. See the arguments to generate.bash for generating the unpatched code.