Welcome to concrete’s documentation!

Python modules and scripts for working with Concrete, an HLT data specification defined using Thrift.

concrete package

Subpackages

concrete.access package

Submodules
concrete.access.FetchCommunicationService module
class concrete.access.FetchCommunicationService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.access.FetchCommunicationService.Iface


Service to fetch particular communications.

fetch(request)

Parameters:
- request

getCommunicationCount()

Get the number of Communications this service searches over. Implementations
that do not provide this should throw an exception.

getCommunicationIDs(offset, count)

Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations
that do not provide this should throw an exception.

Parameters:
- offset
- count

recv_fetch()
recv_getCommunicationCount()
recv_getCommunicationIDs()
send_fetch(request)
send_getCommunicationCount()
send_getCommunicationIDs(offset, count)
class concrete.access.FetchCommunicationService.Iface

Bases: concrete.services.Service.Iface


Service to fetch particular communications.

fetch(request)

Parameters:
- request

getCommunicationCount()

Get the number of Communications this service searches over. Implementations
that do not provide this should throw an exception.

getCommunicationIDs(offset, count)

Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations
that do not provide this should throw an exception.

Parameters:
- offset
- count
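
A minimal sketch of how a caller might page through IDs with getCommunicationIDs(offset, count) and then fetch one page. The classes below are in-memory stand-ins written for illustration, not the Thrift-generated concrete.access classes; InMemoryFetchHandler, iter_id_pages and the string payloads (in place of Communication objects) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class FetchRequest:
    # Stand-in that mirrors the documented FetchRequest attributes.
    communicationIds: List[str] = field(default_factory=list)
    auths: Optional[str] = None


@dataclass
class FetchResult:
    # Stand-in that mirrors the documented FetchResult attribute.
    communications: List[str] = field(default_factory=list)


class InMemoryFetchHandler:
    """Hypothetical handler exposing the three documented Iface methods."""

    def __init__(self, store: Dict[str, str]):
        self._store = dict(store)
        self._ids = sorted(self._store)

    def fetch(self, request: FetchRequest) -> FetchResult:
        # Unknown IDs are skipped here; a real service may raise instead.
        found = [self._store[i] for i in request.communicationIds
                 if i in self._store]
        return FetchResult(communications=found)

    def getCommunicationCount(self) -> int:
        return len(self._ids)

    def getCommunicationIDs(self, offset: int, count: int) -> List[str]:
        return self._ids[offset:offset + count]


def iter_id_pages(handler, page_size: int) -> Iterator[List[str]]:
    """Yield successive pages of Communication IDs."""
    total = handler.getCommunicationCount()
    for offset in range(0, total, page_size):
        yield handler.getCommunicationIDs(offset, page_size)


handler = InMemoryFetchHandler(
    {"doc-1": "first", "doc-2": "second", "doc-3": "third"})
pages = list(iter_id_pages(handler, page_size=2))
fetched = handler.fetch(FetchRequest(communicationIds=pages[0]))
```

With a real service, the same loop would presumably run against a Client connected over a Thrift transport, and implementations that do not support counting or listing would raise instead of returning.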

class concrete.access.FetchCommunicationService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.access.FetchCommunicationService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_fetch(seqid, iprot, oprot)
process_getCommunicationCount(seqid, iprot, oprot)
process_getCommunicationIDs(seqid, iprot, oprot)
class concrete.access.FetchCommunicationService.fetch_args(request=None)

Bases: object


Attributes:
- request

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.fetch_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationCount_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationCount_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationIDs_args(offset=None, count=None)

Bases: object


Attributes:
- offset
- count

read(iprot)
validate()
write(oprot)
class concrete.access.FetchCommunicationService.getCommunicationIDs_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
concrete.access.StoreCommunicationService module
class concrete.access.StoreCommunicationService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.access.StoreCommunicationService.Iface


A service that exists so that clients can store Concrete data
structures to implementing servers.

Implement this if you are creating an analytic that wishes to
store its results back to a server. That server may perform
validation, write the new layers to a database, and so forth.

recv_store()
send_store(communication)
store(communication)

Store a communication to a server implementing this method.

The communication that is stored should contain the new
analytic layers you wish to append. You may also wish to call
methods that unset annotations you feel the receiver would not
find useful in order to reduce network overhead.

Parameters:
- communication

class concrete.access.StoreCommunicationService.Iface

Bases: concrete.services.Service.Iface


A service that exists so that clients can store Concrete data
structures to implementing servers.

Implement this if you are creating an analytic that wishes to
store its results back to a server. That server may perform
validation, write the new layers to a database, and so forth.

store(communication)

Store a communication to a server implementing this method.

The communication that is stored should contain the new
analytic layers you wish to append. You may also wish to call
methods that unset annotations you feel the receiver would not
find useful in order to reduce network overhead.

Parameters:
- communication
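
The advice about unsetting annotations before calling store() can be sketched as follows. Communication here is a small stand-in dataclass carrying only a few of the documented fields, and RecordingStoreHandler, LAYER_FIELDS and strip_layers are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Communication:
    # Stand-in with a handful of the documented Communication fields.
    id: str
    text: Optional[str] = None
    sectionList: Optional[List[str]] = None
    entitySetList: Optional[List[str]] = None


class RecordingStoreHandler:
    """Hypothetical server side: remembers whatever store() receives."""

    def __init__(self):
        self.stored = []

    def store(self, communication):
        self.stored.append(communication)


# Optional layers that a sender may choose to unset (assumed list).
LAYER_FIELDS = ("sectionList", "entitySetList")


def strip_layers(comm, keep):
    """Return a copy of comm with every layer not named in keep set to None."""
    dropped = {name: None for name in LAYER_FIELDS if name not in keep}
    return replace(comm, **dropped)


comm = Communication(id="doc-1", text="Some text.",
                     sectionList=["existing-sections"],
                     entitySetList=["new-entities"])
server = RecordingStoreHandler()
server.store(strip_layers(comm, keep={"entitySetList"}))
```

The copy keeps the caller's original object intact, so the full set of layers is still available locally after the reduced version has been sent.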

class concrete.access.StoreCommunicationService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.access.StoreCommunicationService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_store(seqid, iprot, oprot)
class concrete.access.StoreCommunicationService.store_args(communication=None)

Bases: object


Attributes:
- communication

read(iprot)
validate()
write(oprot)
class concrete.access.StoreCommunicationService.store_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
concrete.access.constants module
concrete.access.ttypes module
class concrete.access.ttypes.FetchRequest(communicationIds=None, auths=None)

Bases: object


Struct representing a request for FetchCommunicationService.

Attributes:
- communicationIds: a list of Communication IDs
- auths: optional authorization mechanism

read(iprot)
validate()
write(oprot)
class concrete.access.ttypes.FetchResult(communications=None)

Bases: object


Struct containing Communications from the FetchCommunicationService service.

Attributes:
- communications: a list of Communication objects that represent the results of the request

read(iprot)
validate()
write(oprot)
Module contents

concrete.annotate package

Submodules
concrete.annotate.AnnotateCommunicationService module
class concrete.annotate.AnnotateCommunicationService.Client(iprot, oprot=None)

Bases: concrete.annotate.AnnotateCommunicationService.Iface


Annotator service methods, intended for Concrete analytics that
are to be stood up as independent services accessible
from any programming language.

annotate(original)

Main annotation method. Takes a communication as input
and returns a new one as output.

It is up to the implementing service to verify that
the input communication is valid.

Can throw a ConcreteThriftException upon error
(invalid input, analytic exception, etc.).

Parameters:
- original

getDocumentation()

Return a detailed description of what the particular tool
does, what inputs and outputs to expect, etc.

Developers who are not familiar with the particular
analytic should be able to read this string and
understand the essential functions of the analytic.

getMetadata()

Return the tool’s AnnotationMetadata.

recv_annotate()
recv_getDocumentation()
recv_getMetadata()
send_annotate(original)
send_getDocumentation()
send_getMetadata()
send_shutdown()
shutdown()

Indicate to the server it should shut down.

class concrete.annotate.AnnotateCommunicationService.Iface

Bases: object


Annotator service methods, intended for Concrete analytics that
are to be stood up as independent services accessible
from any programming language.

annotate(original)

Main annotation method. Takes a communication as input
and returns a new one as output.

It is up to the implementing service to verify that
the input communication is valid.

Can throw a ConcreteThriftException upon error
(invalid input, analytic exception, etc.).

Parameters:
- original

getDocumentation()

Return a detailed description of what the particular tool
does, what inputs and outputs to expect, etc.

Developers who are not familiar with the particular
analytic should be able to read this string and
understand the essential functions of the analytic.

getMetadata()

Return the tool’s AnnotationMetadata.

shutdown()

Indicate to the server it should shut down.
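
A rough sketch of what a handler implementing the four Iface methods might look like. Everything here is a stand-in: the Communication dataclass, the local ConcreteThriftException class, the TokenCountAnnotator name and the dict returned by getMetadata() (a real service returns AnnotationMetadata) are assumptions made so the example runs without the Thrift-generated code.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional


class ConcreteThriftException(Exception):
    """Local stand-in for concrete.exceptions.ttypes.ConcreteThriftException."""


@dataclass
class Communication:
    # Stand-in with a few of the documented Communication fields.
    id: str
    text: Optional[str] = None
    keyValueMap: Dict[str, str] = field(default_factory=dict)


class TokenCountAnnotator:
    """Hypothetical analytic that records a whitespace token count."""

    def __init__(self):
        self.running = True

    def annotate(self, original):
        # The implementing service verifies its own input.
        if not original.text:
            raise ConcreteThriftException("communication has no text")
        annotated = copy.deepcopy(original)  # return a new communication
        annotated.keyValueMap["tokenCount"] = str(len(original.text.split()))
        return annotated

    def getDocumentation(self):
        return "Counts whitespace-separated tokens in the text field."

    def getMetadata(self):
        return {"tool": "TokenCountAnnotator"}

    def shutdown(self):
        self.running = False


annotator = TokenCountAnnotator()
original = Communication(id="doc-1", text="a short text")
result = annotator.annotate(original)
annotator.shutdown()
```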

class concrete.annotate.AnnotateCommunicationService.Processor(handler)

Bases: concrete.annotate.AnnotateCommunicationService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_annotate(seqid, iprot, oprot)
process_getDocumentation(seqid, iprot, oprot)
process_getMetadata(seqid, iprot, oprot)
process_shutdown(seqid, iprot, oprot)
class concrete.annotate.AnnotateCommunicationService.annotate_args(original=None)

Bases: object


Attributes:
- original

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.annotate_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getDocumentation_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getDocumentation_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getMetadata_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.getMetadata_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.annotate.AnnotateCommunicationService.shutdown_args

Bases: object

read(iprot)
validate()
write(oprot)
concrete.annotate.constants module
concrete.annotate.ttypes module
Module contents

concrete.audio package

Submodules
concrete.audio.constants module
concrete.audio.ttypes module
class concrete.audio.ttypes.Sound(wav=None, mp3=None, sph=None, path=None)

Bases: object


A sound wave. A separate optional field is defined for each
supported format. Typically, a Sound object will only define
a single field.

Note: we may want to have separate fields for separate channels
(left vs right), etc.

Attributes:
- wav
- mp3
- sph
- path: An absolute path to a file on disk where the sound file can be
found. It is assumed that this path will be accessible from any
machine that the system is run on (i.e., it should be a shared
disk, or possibly a mirrored directory).

read(iprot)
validate()
write(oprot)
Module contents

concrete.clustering package

Submodules
concrete.clustering.constants module
concrete.clustering.ttypes module
class concrete.clustering.ttypes.Cluster(clusterMemberIndexList=None, confidenceList=None, childIndexList=None)

Bases: object


A set of items which are alike in some way. Has an implicit id which is the
index of this Cluster in its parent Clustering’s ‘clusterList’.

Attributes:
- clusterMemberIndexList: The items in this cluster. Values are indices into the
‘clusterMemberList’ of the Clustering which contains this Cluster.
- confidenceList: Co-indexed with ‘clusterMemberIndexList’. The i^{th} value represents the
confidence that mention clusterMemberIndexList[i] belongs to this cluster.
- childIndexList: A set of clusters (implicit ids/indices) from which this cluster was
created. This cluster should represent the union of all the items in all
of the child clusters. (For hierarchical clustering only).
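
Because ‘clusterMemberIndexList’ and ‘confidenceList’ are co-indexed, resolving a cluster's members is a matter of walking the two lists together. The helper below is a sketch using plain Python lists in place of the generated Cluster and Clustering objects; members_with_confidence is a hypothetical name.

```python
def members_with_confidence(cluster_member_list, member_index_list,
                            confidence_list):
    """Pair each member of a cluster with its confidence.

    cluster_member_list stands in for Clustering.clusterMemberList;
    the other two arguments stand in for the Cluster fields of the
    same names.
    """
    if len(member_index_list) != len(confidence_list):
        raise ValueError("confidenceList must be co-indexed with "
                         "clusterMemberIndexList")
    return [(cluster_member_list[i], conf)
            for i, conf in zip(member_index_list, confidence_list)]


all_members = ["mention-a", "mention-b", "mention-c"]
resolved = members_with_confidence(all_members, [2, 0], [0.9, 0.4])
```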

read(iprot)
validate()
write(oprot)
class concrete.clustering.ttypes.ClusterMember(communicationId=None, setId=None, elementId=None)

Bases: object


An item being clustered. Does not designate cluster _membership_, as in
“item x belongs to cluster C”, but rather just the item (“x” in this
example). Membership is indicated through Cluster objects. An item may be an
Entity, EntityMention, Situation, SituationMention, or technically anything
with a UUID.

Attributes:
- communicationId: UUID of the Communication which contains the item specified by ‘elementId’.
This is ancillary info assuming UUIDs are indeed universally unique.
- setId: UUID of the Entity|Situation(Mention)Set which contains the item specified by ‘elementId’.
This is ancillary info assuming UUIDs are indeed universally unique.
- elementId: UUID of the EntityMention, Entity, SituationMention, or Situation that
this item represents. This is the characteristic field.

read(iprot)
validate()
write(oprot)
class concrete.clustering.ttypes.Clustering(uuid=None, metadata=None, clusterMemberList=None, clusterList=None, rootClusterIndexList=None)

Bases: object


An (optionally) hierarchical clustering of items appearing across a set of
Communications (intra-Communication clusterings are encoded by Entities and
Situations). An item may be an Entity, EntityMention, Situation,
SituationMention, or technically anything with a UUID.

Attributes:
- uuid: UUID for this Clustering object.
- metadata: Metadata for this Clustering object.
- clusterMemberList: The set of items being clustered.
- clusterList: Clusters of items. If this is a hierarchical clustering, this may contain
clusters which are the set of smaller clusters.
Clusters may not “overlap”, meaning (for all clusters X,Y):
X \cap Y \neq \emptyset \implies X \subset Y \vee Y \subset X
- rootClusterIndexList: A set of disjoint clusters (indices in ‘clusterList’) which cover all
items in ‘clusterMemberList’. This list must be specified for hierarchical
clusterings and should not be specified for flat clusterings.
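
The rule that any two clusters sharing an item must be nested can be checked mechanically. The sketch below treats each cluster as a collection of member indices and reports the pairs that violate the rule. overlapping_pairs is a hypothetical helper, and it assumes that two clusters with identical members count as nested.

```python
from itertools import combinations


def overlapping_pairs(clusters):
    """Return (i, j) for clusters that intersect but are not nested."""
    sets = [set(c) for c in clusters]
    bad = []
    for (i, x), (j, y) in combinations(enumerate(sets), 2):
        # Sharing an item is only allowed when one cluster contains the other.
        if x & y and not (x <= y or y <= x):
            bad.append((i, j))
    return bad


# Hierarchical: cluster 2 is the union of clusters 0 and 1.
nested = overlapping_pairs([[0, 1], [2], [0, 1, 2]])
# Invalid: clusters 0 and 1 share item 1 but neither contains the other.
crossing = overlapping_pairs([[0, 1], [1, 2]])
```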

read(iprot)
validate()
write(oprot)
Module contents

concrete.communication package

Submodules
concrete.communication.constants module
concrete.communication.ttypes module
class concrete.communication.ttypes.Communication(id=None, uuid=None, type=None, text=None, startTime=None, endTime=None, communicationTaggingList=None, metadata=None, keyValueMap=None, lidList=None, sectionList=None, entityMentionSetList=None, entitySetList=None, situationMentionSetList=None, situationSetList=None, originalText=None, sound=None, communicationMetadata=None)

Bases: object


A single communication instance, containing linguistic content
generated by a single speaker or author. This type is used for
both inter-personal communications (such as phone calls or
conversations) and third-party communications (such as news
articles).

Each communication instance is grounded by its original
(unannotated) contents, which should be stored in either the
“text” field (for text communications) or the “sound” field (for
audio communications). If the communication is not available in
its original form, then these fields should store the
communication in the least-processed form available.

Attributes:
- id: Stable identifier for this communication, identifying both the
name of the source corpus and the document that it corresponds to
in that corpus.
- uuid: Universally unique identifier for this communication instance.
This is generated randomly, and cannot be mapped back to the
source corpus. It is used as a target for symbolic “pointers”.
- type: A short, corpus-specific term characterizing the nature of the
communication; may change in a future version of concrete.
Often used for filtering. For example, Gigaword uses
the type “story” to distinguish typical news articles from
weekly summaries (“multi”), editorial advisories (“advis”), etc.
At present, this value is typically a literal form from the
originating corpus: as a result, a type marked ‘other’ may have
different meanings across different corpora.
- text: The full text contents of this communication in its original
form, or in the least-processed form available, if the original
is not available.
- startTime: The time when this communication started (in unix time UTC –
i.e., seconds since January 1, 1970).
- endTime: The time when this communication ended (in unix time UTC –
i.e., seconds since January 1, 1970).
- communicationTaggingList: A list of CommunicationTagging objects that can support this
Communication. CommunicationTagging objects can be used to
annotate Communications with topics, gender identification, etc.
- metadata: metadata.AnnotationMetadata to support this particular communication.

Communications derived from other communications should
indicate in this metadata object their dependency
to the original communication ID.
- keyValueMap: A catch-all store of keys and values. Use sparingly!
- lidList: Theories about the languages that are present in this
communication.
- sectionList: Theory about the block structure of this communication.
- entityMentionSetList: Theories about which spans of text are used to mention entities
in this communication.
- entitySetList: Theories about what entities are discussed in this
communication, with pointers to individual mentions.
- situationMentionSetList: Theories about what situations are explicitly mentioned in this
communication.
- situationSetList: Theories about what situations are asserted in this
communication.
- originalText: Optional original text field that points back to an original
communication.

This field can be populated for the sake of convenience when creating
“perspective” communications (communications that are based on
highly destructive changes to an original communication [e.g.,
via MT]). This allows developers to quickly access the original
text that this perspective communication is based on.
- sound: The full audio contents of this communication in its original
form, or in the least-processed form available, if the original
is not available.
- communicationMetadata: Metadata about this specific Communication, such as information
about its author, information specific to this Communication
or Communications like it (info from an API, for example), etc.
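
Since ‘startTime’ and ‘endTime’ are seconds since January 1, 1970 UTC, they convert directly to timezone-aware datetimes with the standard library. The values below are made up for illustration, and to_utc_datetime is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc_datetime(unix_seconds):
    """Convert seconds since 1970-01-01 UTC into an aware datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


start_time = 86400       # illustrative startTime value
end_time = 86400 + 90    # illustrative endTime value

started = to_utc_datetime(start_time)
duration_seconds = end_time - start_time
```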

read(iprot)
validate()
write(oprot)
class concrete.communication.ttypes.CommunicationSet(communicationIdList=None, corpus=None, entityMentionClusterList=None, entityClusterList=None, situationMentionClusterList=None, situationClusterList=None)

Bases: object


A structure that represents a collection of Communications.

Attributes:
- communicationIdList: A list of Communication UUIDs that this CommunicationSet
represents.

This field may be absent if this CommunicationSet represents
a large corpus. If absent, ‘corpus’ field should be present.
- corpus: The name of a corpus or other document body that this
CommunicationSet represents.

Should be present if ‘communicationIdList’ is absent.
- entityMentionClusterList: A list of Clustering objects that represent a
group of EntityMentions that are a part of this
CommunicationSet.
- entityClusterList: A list of Clustering objects that represent a
group of Entities that are a part of this
CommunicationSet.
- situationMentionClusterList: A list of Clustering objects that represent a
group of SituationMentions that are a part of this
CommunicationSet.
- situationClusterList: A list of Clustering objects that represent a
group of Situations that are a part of this
CommunicationSet.

read(iprot)
validate()
write(oprot)
class concrete.communication.ttypes.CommunicationTagging(uuid=None, metadata=None, taggingType=None, tagList=None, confidenceList=None)

Bases: object


A structure that represents a ‘tagging’ of a Communication. These
might be labels or annotations on a particular communication.

For example, this structure might be used to describe the topics
discussed in a Communication. The taggingType might be ‘topic’, and
the tagList might include ‘politics’ and ‘science’.

Attributes:
- uuid: A unique identifier for this CommunicationTagging object.
- metadata: AnnotationMetadata to support this CommunicationTagging object.
- taggingType: A string that captures the type of this CommunicationTagging
object. For example: ‘topic’ or ‘gender’.
- tagList: A list of strings that represent different tags related to the taggingType.
For example, if the taggingType is ‘topic’, some example tags might be
‘politics’, ‘science’, etc.
- confidenceList: A list of doubles, parallel to the list of strings in tagList,
that indicate the confidences of each tag.

read(iprot)
validate()
write(oprot)
Module contents

concrete.email package

Submodules
concrete.email.constants module
concrete.email.ttypes module
class concrete.email.ttypes.EmailAddress(address=None, displayName=None)

Bases: object


An email address, optionally accompanied by a display name. These
values are typically extracted from strings such as:
“John Smith” <john@xyz.com>.


Attributes:
- address
- displayName
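
One way to obtain the two values from a header-style string is the standard library's email.utils.parseaddr. The sketch below stores the result in a plain dict whose keys mirror the documented attribute names; it does not use the generated EmailAddress class.

```python
from email.utils import parseaddr

# parseaddr returns a (display name, address) pair.
parsed = parseaddr('"John Smith" <john@xyz.com>')

# Keys mirror the documented EmailAddress attributes.
email_address = {"displayName": parsed[0], "address": parsed[1]}
```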

read(iprot)
validate()
write(oprot)
class concrete.email.ttypes.EmailCommunicationInfo(messageId=None, contentType=None, userAgent=None, inReplyToList=None, referenceList=None, senderAddress=None, returnPathAddress=None, toAddressList=None, ccAddressList=None, bccAddressList=None, emailFolder=None, subject=None, quotedAddresses=None, attachmentPaths=None, salutation=None, signature=None)

Bases: object


Extra information about an email communication instance.

Attributes:
- messageId
- contentType
- userAgent
- inReplyToList
- referenceList
- senderAddress
- returnPathAddress
- toAddressList
- ccAddressList
- bccAddressList
- emailFolder
- subject
- quotedAddresses
- attachmentPaths
- salutation
- signature

read(iprot)
validate()
write(oprot)
Module contents

concrete.entities package

Submodules
concrete.entities.constants module
concrete.entities.ttypes module
class concrete.entities.ttypes.Entity(uuid=None, mentionIdList=None, type=None, confidence=None, canonicalName=None)

Bases: object


A single referent (or “entity”) that is referred to at least once
in a given communication, along with pointers to all of the
references to that referent. The referent’s type (e.g., is it a
person, or a location, or an organization, etc) is also recorded.

Because each Entity contains pointers to all references to a
referent within a given communication, an Entity can be
thought of as a coreference set.

Attributes:
- uuid: Unique identifier for this entity.
- mentionIdList: A list of pointers to all of the mentions of this Entity’s
referent. (type=EntityMention)
- type: The basic type of this entity’s referent.
- confidence: Confidence score for this individual entity. You can also set a
confidence score for an entire EntitySet using the EntitySet’s
metadata.
- canonicalName: A string containing a representative, canonical, or “best” name
for this entity’s referent. This string may match one of the
mentions’ text strings, but it is not required to.

read(iprot)
validate()
write(oprot)
class concrete.entities.ttypes.EntityMention(uuid=None, tokens=None, entityType=None, phraseType=None, confidence=None, text=None, childMentionIdList=None)

Bases: object


A span of text with a specific referent, such as a person,
organization, or time. Things that can be referred to by a mention
are called “entities.”

It is left up to individual EntityMention taggers to decide which
referent types and phrase types to identify. For example, some
EntityMention taggers may only identify proper nouns, or may only
identify EntityMentions that refer to people.

Each EntityMention consists of a sequence of tokens. This sequence
is usually annotated with information about the referent type
(e.g., is it a person, or a location, or an organization, etc) as
well as the phrase type (is it a name, pronoun, common noun, etc.).

EntityMentions typically consist of a single noun phrase; however,
other phrase types may also be marked as mentions. For
example, in the phrase “French hotel,” the adjective “French” might
be marked as a mention for France.

Attributes:
- uuid
- tokens: Pointer to sequence of tokens.

Special note: In the case of PRO-drop, where there is no explicit
mention, but an EntityMention is needed for downstream Entity
analysis, this field should be set to a TokenRefSequence with an
empty tokenIndexList and the anchorTokenIndex set to the head/only
token of the verb/predicate from which the PRO was dropped.
- entityType: The type of referent that is referred to by this mention.
- phraseType: The phrase type of the tokens that constitute this mention.
- confidence: A confidence score for this individual mention. You can also
set a confidence score for an entire EntityMentionSet using the
EntityMentionSet’s metadata.
- text: The text content of this entity mention. This field is
typically redundant with the string formed by cross-referencing
the ‘tokens.tokenIndexList’ field with this mention’s
tokenization. This field may not be generated by all analytics.
- childMentionIdList: A list of pointers to the “child” EntityMentions of this
EntityMention.
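
The relationship between ‘tokens’ and ‘text’, and the PRO-drop convention, can be illustrated with a small stand-in. TokenRefSequence below is a local dataclass carrying only the two fields named above, mention_text and is_pro_drop are hypothetical helpers, and joining tokens with single spaces is an assumption about the tokenization.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TokenRefSequence:
    # Stand-in with only the two fields discussed in the text.
    tokenIndexList: List[int] = field(default_factory=list)
    anchorTokenIndex: Optional[int] = None


def mention_text(tokens, token_refs):
    """Rebuild mention text by looking up each referenced token."""
    return " ".join(tokens[i] for i in token_refs.tokenIndexList)


def is_pro_drop(token_refs):
    """True when there are no explicit tokens but an anchor is set."""
    return (not token_refs.tokenIndexList
            and token_refs.anchorTokenIndex is not None)


# Explicit mention: the adjective "French" as a mention of France.
english = ["The", "French", "hotel", "closed"]
french_ref = TokenRefSequence(tokenIndexList=[1], anchorTokenIndex=1)

# PRO-drop: Italian "Arriva domani" has no explicit subject, so the
# mention is anchored on the verb at index 0.
italian = ["Arriva", "domani"]
dropped_ref = TokenRefSequence(tokenIndexList=[], anchorTokenIndex=0)
```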

read(iprot)
validate()
write(oprot)
class concrete.entities.ttypes.EntityMentionSet(uuid=None, metadata=None, mentionList=None, linkingList=None)

Bases: object


A theory about the set of entity mentions that are present in a
message. See also: EntityMention

This type does not represent a coreference relationship, which is handled by Entity.
This type is meant to represent the output of an entity-mention identifier,
which is often a part of an in-doc coreference system.

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- mentionList: List of mentions in this set.
- linkingList: Entity linking annotations associated with this EntityMentionSet.

read(iprot)
validate()
write(oprot)
class concrete.entities.ttypes.EntitySet(uuid=None, metadata=None, entityList=None, linkingList=None, mentionSetId=None)

Bases: object


A theory about the set of entities that are present in a
message. See also: Entity.

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- entityList: List of entities in this set.
- linkingList: Entity linking annotations associated with this EntitySet.
- mentionSetId: An optional UUID pointer to an EntityMentionSet.

If this field is present, consumers can assume that all
Entity objects in this EntitySet have EntityMentions that are included
in the named EntityMentionSet.
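
A producer that sets ‘mentionSetId’ may want to confirm the guarantee it implies before publishing. The sketch below works on plain dicts and lists of UUID strings in place of the generated objects; mentions_missing_from_set is a hypothetical helper.

```python
def mentions_missing_from_set(entity_mention_ids, mention_set_ids):
    """Map each entity UUID to its mentions that the set does not contain.

    entity_mention_ids: dict of entity UUID -> list of mention UUIDs
        (standing in for Entity.mentionIdList).
    mention_set_ids: UUIDs of the mentions in the named EntityMentionSet.
    """
    known = set(mention_set_ids)
    missing = {}
    for entity_uuid, mention_ids in entity_mention_ids.items():
        absent = [m for m in mention_ids if m not in known]
        if absent:
            missing[entity_uuid] = absent
    return missing


entities = {"entity-1": ["m1", "m2"], "entity-2": ["m3", "m9"]}
report = mentions_missing_from_set(entities, ["m1", "m2", "m3"])
```

An empty report means consumers can rely on the named EntityMentionSet for every mention in the EntitySet.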

read(iprot)
validate()
write(oprot)
Module contents

concrete.exceptions package

Submodules
concrete.exceptions.constants module
concrete.exceptions.ttypes module
exception concrete.exceptions.ttypes.ConcreteThriftException(message=None, serEx=None)

Bases: thrift.Thrift.TException


An exception to be used with Concrete thrift
services.

Attributes:
- message
- serEx

read(iprot)
validate()
write(oprot)
Module contents

concrete.language package

Submodules
concrete.language.constants module
concrete.language.ttypes module
class concrete.language.ttypes.LanguageIdentification(uuid=None, metadata=None, languageToProbabilityMap=None)

Bases: object


A theory about what languages are present in a given communication
or piece of communication. Note that it is possible to have more
than one language present in a given communication.

Attributes:
- uuid: Unique identifier for this language identification.
- metadata: Information about where this language identification came from.
- languageToProbabilityMap: A map from a language to the probability that that
language occurs in a given communication. Each language code should
occur at most once in this map. The probabilities do not
need to sum to one – for example, if a single communication is known
to contain both English and French, then it would be appropriate
to assign a probability of 1 to both languages. (Manually
annotated LanguageProb objects should always have probabilities
of either zero or one; machine-generated LanguageProbs may have
intermediate probabilities.)

Note: The string key should represent the ISO 639-3 three-letter code.
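
A small sanity check for such a map might look like the sketch below. It uses a plain dict, checks only the shape of each key (three lowercase letters) rather than looking it up in the ISO 639-3 registry, and deliberately does not require the probabilities to sum to one. check_language_map is a hypothetical helper.

```python
def check_language_map(language_to_probability):
    """Return a list of human-readable problems found in the map."""
    problems = []
    for code, prob in language_to_probability.items():
        # Shape check only; this does not consult the ISO 639-3 registry.
        if not (len(code) == 3 and code.isalpha() and code.islower()):
            problems.append("not a three-letter code: %r" % code)
        if not 0.0 <= prob <= 1.0:
            problems.append("probability out of range for %r: %r"
                            % (code, prob))
    return problems


# English and French both known to be present: both get probability 1.
bilingual = check_language_map({"eng": 1.0, "fra": 1.0})
suspicious = check_language_map({"en": 0.5, "deu": 1.5})
```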

read(iprot)
validate()
write(oprot)
Module contents

concrete.learn package

Submodules
concrete.learn.ActiveLearnerClientService module
class concrete.learn.ActiveLearnerClientService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.learn.ActiveLearnerClientService.Iface


The active learner client implements a method to accept new sorts of the annotation units

recv_submitSort()
send_submitSort(sessionId, unitIds)
submitSort(sessionId, unitIds)

Submit a new sort of communications to the broker

Parameters:
- sessionId
- unitIds

class concrete.learn.ActiveLearnerClientService.Iface

Bases: concrete.services.Service.Iface


The active learner client implements a method to accept new sorts of the annotation units

submitSort(sessionId, unitIds)

Submit a new sort of communications to the broker

Parameters:
- sessionId
- unitIds

class concrete.learn.ActiveLearnerClientService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.learn.ActiveLearnerClientService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_submitSort(seqid, iprot, oprot)
class concrete.learn.ActiveLearnerClientService.submitSort_args(sessionId=None, unitIds=None)

Bases: object


Attributes:
- sessionId
- unitIds

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerClientService.submitSort_result

Bases: object

read(iprot)
validate()
write(oprot)
concrete.learn.ActiveLearnerServerService module
class concrete.learn.ActiveLearnerServerService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.learn.ActiveLearnerServerService.Iface


The active learning server is responsible for sorting a list of communications.
Users annotate communications based on the sort.

Active learning is an asynchronous process.
It is started by the client calling start().
At arbitrary times, the client can call addAnnotations().
When the server is done with a sort of the data, it calls submitSort() on the client.
The server can perform additional sorts until stop() is called.

The server must be preconfigured with the details of the data source to pull communications.

addAnnotations(sessionId, annotations)

Add annotations from the user to the learning process

Parameters:
- sessionId
- annotations

recv_addAnnotations()
recv_start()
recv_stop()
send_addAnnotations(sessionId, annotations)
send_start(sessionId, task, contact)
send_stop(sessionId)
start(sessionId, task, contact)

Start an active learning session on these communications

Parameters:
- sessionId
- task
- contact

stop(sessionId)

Stop the learning session

Parameters:
- sessionId

class concrete.learn.ActiveLearnerServerService.Iface

Bases: concrete.services.Service.Iface


The active learning server is responsible for sorting a list of communications.
Users annotate communications based on the sort.

Active learning is an asynchronous process.
It is started by the client calling start().
At arbitrary times, the client can call addAnnotations().
When the server is done with a sort of the data, it calls submitSort() on the client.
The server can perform additional sorts until stop() is called.

The server must be preconfigured with the details of the data source to pull communications.

addAnnotations(sessionId, annotations)

Add annotations from the user to the learning process

Parameters:
- sessionId
- annotations

start(sessionId, task, contact)

Start an active learning session on these communications

Parameters:
- sessionId
- task
- contact

stop(sessionId)

Stop the learning session

Parameters:
- sessionId
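
The call sequence described above (start, then addAnnotations at arbitrary times, with the server calling submitSort on the client after each sort, until stop) can be sketched with two in-process objects. This toy runs synchronously, whereas the real exchange is asynchronous over Thrift. The class names, the dict shapes used for the task and annotations, the boolean returned by start() and the "unannotated units first" ordering are all assumptions made for illustration.

```python
class RecordingClient:
    """Hypothetical client side: records every sort it is sent."""

    def __init__(self):
        self.sorts = []

    def submitSort(self, sessionId, unitIds):
        self.sorts.append((sessionId, list(unitIds)))


class ToyActiveLearner:
    """Hypothetical server side exposing start/addAnnotations/stop."""

    def __init__(self, client):
        self._client = client
        self._sessions = {}

    def start(self, sessionId, task, contact):
        # 'contact' would tell a real server how to reach the client.
        if sessionId in self._sessions:
            return False
        self._sessions[sessionId] = {"units": list(task["units"]),
                                     "annotated": set()}
        self._resort(sessionId)
        return True

    def addAnnotations(self, sessionId, annotations):
        session = self._sessions[sessionId]
        session["annotated"].update(a["id"] for a in annotations)
        self._resort(sessionId)

    def stop(self, sessionId):
        self._sessions.pop(sessionId, None)

    def _resort(self, sessionId):
        session = self._sessions[sessionId]
        # Stable sort: unannotated units first, original order otherwise.
        order = sorted(session["units"],
                       key=lambda unit: unit in session["annotated"])
        self._client.submitSort(sessionId, order)


client = RecordingClient()
server = ToyActiveLearner(client)
started = server.start("s1", {"units": ["u1", "u2", "u3"]}, contact=None)
server.addAnnotations("s1", [{"id": "u1"}])
server.stop("s1")
```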

class concrete.learn.ActiveLearnerServerService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.learn.ActiveLearnerServerService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_addAnnotations(seqid, iprot, oprot)
process_start(seqid, iprot, oprot)
process_stop(seqid, iprot, oprot)
class concrete.learn.ActiveLearnerServerService.addAnnotations_args(sessionId=None, annotations=None)

Bases: object


Attributes:
- sessionId
- annotations

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.addAnnotations_result

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.start_args(sessionId=None, task=None, contact=None)

Bases: object


Attributes:
- sessionId
- task
- contact

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.start_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.stop_args(sessionId=None)

Bases: object


Attributes:
- sessionId

read(iprot)
validate()
write(oprot)
class concrete.learn.ActiveLearnerServerService.stop_result

Bases: object

read(iprot)
validate()
write(oprot)
concrete.learn.constants module
concrete.learn.ttypes module
class concrete.learn.ttypes.Annotation(id=None, communication=None)

Bases: object


Annotation on a communication.

Attributes:
- id: Identifier of the part of the communication being annotated.
- communication: Communication with the annotation stored in it.
The location of the annotation depends on the annotation unit identifier.

read(iprot)
validate()
write(oprot)
class concrete.learn.ttypes.AnnotationTask(type=None, language=None, unitType=None, units=None)

Bases: object


Annotation task including information for pulling data.

Attributes:
- type: Type of annotation task
- language: Language of the data for the task
- unitType: Entire communication or individual sentences
- units: Identifiers for each annotation unit

read(iprot)
validate()
write(oprot)
Module contents

concrete.linking package

Submodules
concrete.linking.constants module
concrete.linking.ttypes module
class concrete.linking.ttypes.Link(sourceId=None, linkTargetList=None)

Bases: object


A structure that represents the origin of an entity linking annotation.

Attributes:
- sourceId: The “root” of this Link; points to an EntityMention UUID, Entity UUID, etc.
- linkTargetList: A list of LinkTarget objects that this Link contains.

read(iprot)
validate()
write(oprot)
class concrete.linking.ttypes.LinkTarget(confidence=None, targetId=None, dbId=None, dbName=None)

Bases: object


A structure that represents the target of an entity linking annotation.

Attributes:
- confidence: Confidence of this LinkTarget object.
- targetId: A UUID that represents the target of this LinkTarget. This
UUID should exist in the Entity/Situation(Mention)Set that the
Linking object is contained in.
- dbId: A database ID that represents the target of this linking.

This should be used if the target of the linking is not associated
with an Entity|Situation(Mention)Set in Concrete, and therefore cannot be linked by
a UUID internal to concrete.

If present, other optional field ‘dbName’ should also be populated.
- dbName: The name of the database that represents the target of this linking.

Together with the ‘dbId’, this can form a pointer to a target
that is not represented inside concrete.

Should be populated alongside ‘dbId’.

read(iprot)
validate()
write(oprot)
class concrete.linking.ttypes.Linking(metadata=None, linkList=None)

Bases: object


A structure that represents entity linking annotations.

Attributes:
- metadata: Metadata related to this Linking object.
- linkList: A list of Link objects that this Linking object contains.

read(iprot)
validate()
write(oprot)
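
As a rough illustration of how these three structures nest, the sketch below uses dataclass stand-ins that mirror the documented field names; it does not import the generated classes. Identifiers are plain strings here for brevity (the real fields hold UUID objects), and the values "Q29" and "wikidata" are made up for the example. The helper db_fields_consistent is hypothetical and only encodes the guidance that dbId and dbName should be populated together.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkTarget:            # stand-in mirroring the documented fields
    confidence: Optional[float] = None
    targetId: Optional[str] = None
    dbId: Optional[str] = None
    dbName: Optional[str] = None


@dataclass
class Link:                  # stand-in: one source and its candidate targets
    sourceId: Optional[str] = None
    linkTargetList: List[LinkTarget] = field(default_factory=list)


@dataclass
class Linking:               # stand-in: metadata is omitted in this sketch
    linkList: List[Link] = field(default_factory=list)


def db_fields_consistent(target):
    """True when dbId and dbName are either both set or both unset."""
    return (target.dbId is None) == (target.dbName is None)


mention_uuid = str(uuid.uuid4())          # would be an EntityMention UUID
internal = LinkTarget(confidence=0.9, targetId=str(uuid.uuid4()))
external = LinkTarget(confidence=0.4, dbId="Q29", dbName="wikidata")
linking = Linking(linkList=[Link(sourceId=mention_uuid,
                                 linkTargetList=[internal, external])])
```

The first target points inside Concrete through targetId; the second points outside it through the dbId and dbName pair.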
Module contents

concrete.metadata package

Submodules
concrete.metadata.constants module
concrete.metadata.ttypes module
class concrete.metadata.ttypes.AnnotationMetadata(tool=None, timestamp=None, digest=None, dependencies=None, kBest=1)

Bases: object


Metadata associated with an annotation or a set of annotations,
that identifies where those annotations came from.

Attributes:
- tool: The name of the tool that generated this annotation.
- timestamp: The time at which this annotation was generated (in unix time
UTC – i.e., seconds since January 1, 1970).
- digest: A Digest, carrying over any information the annotation metadata
wishes to carry over.
- dependencies: The theories that supported this annotation.

An empty field indicates that the theory has no
dependencies (e.g., an ingester).
- kBest: An integer that represents a ranking for systems
that output k-best lists.

For systems that do not output k-best lists,
the default value (1) should suffice.

read(iprot)
validate()
write(oprot)
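
A minimal sketch of filling in timestamp and kBest, using a dataclass stand-in rather than the generated class. The helper make_kbest_metadata and the tool names are hypothetical, and treating kBest=1 as the top-ranked output is an assumption based on the default value.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotationMetadata:    # stand-in with three of the documented fields
    tool: Optional[str] = None
    timestamp: Optional[int] = None
    kBest: int = 1


def make_kbest_metadata(tool, k, now=None):
    """Return k metadata records ranked 1..k that share one timestamp."""
    # Unix time in seconds since January 1, 1970 (UTC).
    stamp = int(time.time()) if now is None else int(now)
    return [AnnotationMetadata(tool=tool, timestamp=stamp, kBest=rank)
            for rank in range(1, k + 1)]


# A system without k-best output can rely on the default kBest of 1.
single = AnnotationMetadata(tool="my-tokenizer", timestamp=int(time.time()))

# A system with a 3-best list emits one record per rank.
ranked = make_kbest_metadata("my-parser", k=3, now=1_500_000_000)
```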
class concrete.metadata.ttypes.CommunicationMetadata(tweetInfo=None, emailInfo=None, nitfInfo=None)

Bases: object


Metadata specific to a particular Communication object.
This might include corpus-specific metadata (from the Twitter API),
attributes associated with the Communication (the author),
or other information about the Communication.

Attributes:
- tweetInfo: Extra information for communications where kind==TWEET:
Information about this tweet that is provided by the Twitter
API. For information about the Twitter API, see:
- emailInfo: Extra information for communications where kind==EMAIL
- nitfInfo: Extra information that may come from the NITF
(News Industry Text Format) schema. See ‘nitf.thrift’.

read(iprot)
validate()
write(oprot)
class concrete.metadata.ttypes.Digest(bytesValue=None, int64Value=None, doubleValue=None, stringValue=None, int64List=None, doubleList=None, stringList=None)

Bases: object


Analytic-specific information about an attribute or edge. Digests
are used to combine information from multiple sources to generate a
unified value. The digests generated by an analytic will only ever
be used by that same analytic, so analytics can feel free to encode
information in whatever way is convenient.

Attributes:
- bytesValue: The following fields define various ways you can store the
digest data (for convenience). If none of these meets your
needs, then serialize the digest to a byte sequence and store it
in bytesValue.
- int64Value
- doubleValue
- stringValue
- int64List
- doubleList
- stringList

read(iprot)
validate()
write(oprot)
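
When none of the typed fields fits, the digest can be serialized to bytes and stored in bytesValue. The sketch below uses JSON as the encoding purely as an assumption; any encoding works, since only the producing analytic reads the digest back. encode_digest and decode_digest are hypothetical helper names.

```python
import json


def encode_digest(payload):
    """Serialize a JSON-compatible payload to a byte sequence."""
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_digest(raw):
    """Inverse of encode_digest, for use by the same analytic later on."""
    return json.loads(raw.decode("utf-8"))


# The resulting bytes are what would be assigned to Digest.bytesValue.
bytes_value = encode_digest({"counts": [3, 1, 4], "mean": 2.5})
restored = decode_digest(bytes_value)
```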
class concrete.metadata.ttypes.TheoryDependencies(sectionTheoryList=None, sentenceTheoryList=None, tokenizationTheoryList=None, posTagTheoryList=None, nerTagTheoryList=None, lemmaTheoryList=None, langIdTheoryList=None, parseTheoryList=None, dependencyParseTheoryList=None, tokenAnnotationTheoryList=None, entityMentionSetTheoryList=None, entitySetTheoryList=None, situationMentionSetTheoryList=None, situationSetTheoryList=None, communicationsList=None)

Bases: object


A struct that holds UUIDs for all theories that a particular
annotation was based upon (and presumably requires).

Producers of TheoryDependencies should list all stages that they
used in constructing their particular annotation. They do not,
however, need to explicitly label each stage; they can label
only the immediate stage before them.

Examples:

If you are producing a Tokenization, and only used the
SentenceSegmentation in order to produce that Tokenization, list
only the single SentenceSegmentation UUID in sentenceTheoryList.

In this example, even though the SentenceSegmentation will have
a dependency on some SectionSegmentation, it is not necessary
for the Tokenization to list the SectionSegmentation UUID as a
dependency.

If you are a producer of EntityMentions, and you use two
POSTokenTagging and one NERTokenTagging objects, add the UUIDs for
the POSTokenTagging objects to posTagTheoryList, and the UUID of
the NER TokenTagging to the nerTagTheoryList.

In this example, because multiple annotations influenced the
new annotation, they should all be listed as dependencies.

Attributes:
- sectionTheoryList
- sentenceTheoryList
- tokenizationTheoryList
- posTagTheoryList
- nerTagTheoryList
- lemmaTheoryList
- langIdTheoryList
- parseTheoryList
- dependencyParseTheoryList
- tokenAnnotationTheoryList
- entityMentionSetTheoryList
- entitySetTheoryList
- situationMentionSetTheoryList
- situationSetTheoryList
- communicationsList

read(iprot)
validate()
write(oprot)
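
The two examples in the description above can be written out as follows. This is a sketch with a dataclass stand-in that models only three of the documented lists, and with UUIDs represented as plain strings; new_uuid is a hypothetical helper.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class TheoryDependencies:    # stand-in; only three of the documented lists
    sentenceTheoryList: List[str] = field(default_factory=list)
    posTagTheoryList: List[str] = field(default_factory=list)
    nerTagTheoryList: List[str] = field(default_factory=list)


def new_uuid():
    return str(uuid.uuid4())


# Example 1: a Tokenization built from a single SentenceSegmentation.
# Only the immediate dependency is listed, not the SectionSegmentation
# that the SentenceSegmentation itself depends on.
sentence_seg = new_uuid()
tokenization_deps = TheoryDependencies(sentenceTheoryList=[sentence_seg])

# Example 2: EntityMentions built from two POS taggings and one NER tagging.
# Every annotation that directly influenced the result is listed.
pos_a, pos_b, ner = new_uuid(), new_uuid(), new_uuid()
mention_deps = TheoryDependencies(posTagTheoryList=[pos_a, pos_b],
                                  nerTagTheoryList=[ner])
```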
Module contents

concrete.nitf package

Submodules
concrete.nitf.constants module
concrete.nitf.ttypes module
class concrete.nitf.ttypes.NITFInfo(alternateURL=None, articleAbstract=None, authorBiography=None, banner=None, biographicalCategoryList=None, columnName=None, columnNumber=None, correctionDate=None, correctionText=None, credit=None, dayOfWeek=None, descriptorList=None, featurePage=None, generalOnlineDescriptorList=None, guid=None, kicker=None, leadParagraphList=None, locationList=None, nameList=None, newsDesk=None, normalizedByline=None, onlineDescriptorList=None, onlineHeadline=None, onlineLeadParagraph=None, onlineLocationList=None, onlineOrganizationList=None, onlinePeople=None, onlineSectionList=None, onlineTitleList=None, organizationList=None, page=None, peopleList=None, publicationDate=None, publicationDayOfMonth=None, publicationMonth=None, publicationYear=None, section=None, seriesName=None, slug=None, taxonomicClassifierList=None, titleList=None, typesOfMaterialList=None, url=None, wordCount=None)

Bases: object


Attributes:
- alternateURL: This field specifies the URL of the article, if published online. In some
cases, such as with the New York Times, when this field is present,
this URL is preferred to the one in the url field on articles published on
or after April 02, 2006, as the linked page will have richer content.
- articleAbstract: This field is a summary of the article, possibly written by
an indexing service.
- authorBiography: This field specifies the biography of the author of the article.
Generally, this field is specified for guest authors, and not for
regular reporters, except to provide the author’s email address.
- banner: The banner field is used to indicate if there has been additional
information appended to the article since its publication. Examples of
banners include (‘Correction Appended’ and ‘Editor’s Note Appended’).
- biographicalCategoryList: When present, the biographical category field generally indicates that a
document focuses on a particular individual. The value of the field
indicates the area or category in which this individual is best known.
This field is most often defined for Obituaries and Book Reviews.

<ol>
<li>Politics and Government (U.S.)</li>
<li>Books and Magazines</li>
<li>Royalty</li>
</ol>
- columnName: If the article is part of a regular column, this field specifies the name
of that column.
<br>
Sample Column Names:
<br>
<ol>
<li>World News Briefs</li>
<li>WEDDINGS</li>
<li>The Accessories Channel</li>
</ol>

- columnNumber: This field specifies the column in which the article starts in the print
paper. A typical printed page in the paper has six columns numbered from
right to left. As a consequence most, but not all, of the values for this
field fall in the range 1-6.
- correctionDate: This field specifies the date on which a correction was made to the
article. Generally, if the correction date is specified, the correction
text will also be specified (and vice versa).
- correctionText: For articles corrected following publication, this field specifies the
correction. Generally, if the correction text is specified, the
correction date will also be specified (and vice versa).
- credit: This field indicates the entity that produced the editorial content of
this document.
- dayOfWeek: This field specifies the day of week on which the article was published.
<ul>
<li>Monday</li>
<li>Tuesday</li>
<li>Wednesday</li>
<li>Thursday</li>
<li>Friday</li>
<li>Saturday</li>
<li>Sunday</li>
</ul>
- descriptorList: The “descriptors” field specifies a list of descriptive terms drawn from
a normalized controlled vocabulary corresponding to subjects mentioned in
the article.
<br>
Examples Include:
<ol>
<li>ECONOMIC CONDITIONS AND TRENDS</li>
<li>AIRPLANES</li>
<li>VIOLINS</li>
</ol>
- featurePage: The feature page containing this article, such as
<ul>
<li>Education Page</li>
<li>Fashion Page</li>
</ul>
- generalOnlineDescriptorList: The “general online descriptors” field specifies a list of descriptors
that are at a higher level of generality than the other tags associated
with the article.
<br>
Examples Include:
<ol>
<li>Surfing</li>
<li>Venice Biennale</li>
<li>Ranches</li>
</ol>
- guid: The GUID field specifies an integer that is guaranteed to be unique for
every document in the corpus.
- kicker: The kicker is an additional piece of information printed as an
accompaniment to a news headline.
- leadParagraphList: The “lead paragraph” field is the lead paragraph of the article.
Generally this field is populated with the first two paragraphs from the
article.
- locationList: The “locations” field specifies a list of geographic descriptors drawn
from a normalized controlled vocabulary that correspond to places
mentioned in the article.
<br>
Examples Include:
<ol>
<li>Wellsboro (Pa)</li>
<li>Kansas City (Kan)</li>
<li>Park Slope (NYC)</li>
</ol>
- nameList: The “names” field specifies a list of names mentioned in the article.
<br>
Examples Include:
<ol>
<li>Azza Fahmy</li>
<li>George C. Izenour</li>
<li>Chris Schenkel</li>
</ol>
- newsDesk: This field specifies the desk in the newsroom that
produced the article. The desk is related to, but is not the same as the
section in which the article appears.
- normalizedByline: The Normalized Byline field is the byline normalized to the form (last
name, first name).
- onlineDescriptorList: This field specifies a list of descriptors from a normalized controlled
vocabulary that correspond to topics mentioned in the article.
<br>
Examples Include:
<ol>
<li>Marriages</li>
<li>Parks and Other Recreation Areas</li>
<li>Cooking and Cookbooks</li>
</ol>
- onlineHeadline: This field specifies the headline displayed with the article
online. Often this differs from the headline used in print.
- onlineLeadParagraph: This field specifies the lead paragraph for the online version.
- onlineLocationList: This field specifies a list of place names that correspond to geographic
locations mentioned in the article.
<br>
Examples Include:
<ol>
<li>Hollywood</li>
<li>Los Angeles</li>
<li>Arcadia</li>
</ol>
- onlineOrganizationList: This field specifies a list of organizations that correspond to
organizations mentioned in the article.
<br>
Examples Include:
<ol>
<li>Nintendo Company Limited</li>
<li>Yeshiva University</li>
<li>Rose Center</li>
</ol>
- onlinePeople: This field specifies a list of people that correspond to individuals
mentioned in the article.
<br>
Examples Include:
<ol>
<li>Lopez, Jennifer</li>
<li>Joyce, James</li>
<li>Robinson, Jackie</li>
</ol>
- onlineSectionList: This field specifies the section(s) in which the online version of the article
is placed. This is typically populated from a semicolon (;) delimited list.
- onlineTitleList: This field specifies a list of authored works mentioned in the article.
<br>
Examples Include:
<ol>
<li>Matchstick Men (Movie)</li>
<li>Blades of Glory (Movie)</li>
<li>Bridge and Tunnel (Play)</li>
</ol>
- organizationList: This field specifies a list of organization names drawn from a normalized
controlled vocabulary that correspond to organizations mentioned in the
article.
<br>
Examples Include:
<ol>
<li>Circuit City Stores Inc</li>
<li>Delaware County Community College (Pa)</li>
<li>CONNECTICUT GRAND OPERA</li>
</ol>
- page: This field specifies the page of the section in the paper in which the
article appears. This is not an absolute pagination. An article that
appears on page 3 in section A occurs in the physical paper before an
article that occurs on page 1 of section F. The section is encoded in
the <strong>section</strong> field.
- peopleList: This field specifies a list of people from a normalized controlled
vocabulary that correspond to individuals mentioned in the article.
<br>
Examples Include:
<ol>
<li>REAGAN, RONALD WILSON (PRES)</li>
<li>BEGIN, MENACHEM (PRIME MIN)</li>
<li>COLLINS, GLENN</li>
</ol>
- publicationDate: This field specifies the date of the article’s publication.
- publicationDayOfMonth: This field specifies the day of the month on which the article was
published, always in the range 1-31.
- publicationMonth: This field specifies the month in which the article was published, in the
range 1-12, where 1 is January, 2 is February, etc.
- publicationYear: This field specifies the year in which the article was published. This
value is in the range 1987-2007 for this collection.
- section: This field specifies the section of the paper in which the article
appears. This is not the name of the section, but rather a letter or
number that indicates the section.
- seriesName: If the article is part of a regular series, this field specifies the name
of that series.
- slug: The slug is a short string that uniquely identifies an article from all
other articles published on the same day. Please note, however, that
different articles on different days may have the same slug.
<ul>
<li>30other</li>
<li>12reunion</li>
</ul>
- taxonomicClassifierList: This field specifies a list of taxonomic classifiers that place this
article into a hierarchy of articles. The individual terms of each
taxonomic classifier are separated with the ‘/’ character.
<br>
Examples Include:
<ol>
<li>Top/Features/Travel/Guides/Destinations/North America/United
States/Arizona</li>
<li>Top/News/U.S./Rockies</li>
<li>Top/Opinion</li>
</ol>
- titleList: This field specifies a list of authored works that correspond to works
mentioned in the article.
<br>
Examples Include:
<ol>
<li>Greystoke: The Legend of Tarzan, Lord of the Apes (Movie)</li>
<li>Law and Order (TV Program)</li>
<li>BATTLEFIELD EARTH (BOOK)</li>
</ol>
- typesOfMaterialList: This field specifies a normalized list of terms describing the general
editorial category of the article.
<br>
Examples Include:
<ol>
<li>REVIEW</li>
<li>OBITUARY</li>
<li>ANALYSIS</li>
</ol>
- url: This field specifies the location of the online version of the article. The
“Alternative Url” field (alternateURL) is preferred to this field on articles published
on or after April 02, 2006, as the linked page will have richer content.
- wordCount: This field specifies the number of words in the body of the article,
including the lead paragraph.

read(iprot)
validate()
write(oprot)
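
Several of the fields above carry simple delimited strings. The helpers below are a sketch of how a consumer might split them; the function names are hypothetical, and the code assumes that individual terms never contain the delimiter themselves.

```python
def split_taxonomic_classifier(classifier):
    """Split one classifier such as 'Top/News/U.S./Rockies' into its terms."""
    return [term.strip() for term in classifier.split("/") if term.strip()]


def split_online_sections(raw):
    """Split a semicolon-delimited section string into a list of names."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def parse_normalized_byline(byline):
    """Return (last, first) from a byline in 'last name, first name' form."""
    last, _, first = byline.partition(",")
    return last.strip(), first.strip()


terms = split_taxonomic_classifier("Top/News/U.S./Rockies")
sections = split_online_sections("Arts; Books")
name = parse_normalized_byline("Collins, Glenn")
```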
Module contents

concrete.search package

Submodules
concrete.search.FeedbackService module
class concrete.search.FeedbackService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.search.FeedbackService.Iface

addCommunicationFeedback(searchResultsId, communicationId, feedback)

Provide feedback on the relevance of a particular communication to a search

Parameters:
- searchResultsId
- communicationId
- feedback

addSentenceFeedback(searchResultsId, communicationId, sentenceId, feedback)

Provide feedback on the relevance of a particular sentence to a search

Parameters:
- searchResultsId
- communicationId
- sentenceId
- feedback

recv_addCommunicationFeedback()
recv_addSentenceFeedback()
recv_startFeedback()
send_addCommunicationFeedback(searchResultsId, communicationId, feedback)
send_addSentenceFeedback(searchResultsId, communicationId, sentenceId, feedback)
send_startFeedback(results)
startFeedback(results)

Start providing feedback for the specified SearchResults.
This causes the search and its results to be persisted.

Parameters:
- results

class concrete.search.FeedbackService.Iface

Bases: concrete.services.Service.Iface

addCommunicationFeedback(searchResultsId, communicationId, feedback)

Provide feedback on the relevance of a particular communication to a search

Parameters:
- searchResultsId
- communicationId
- feedback

addSentenceFeedback(searchResultsId, communicationId, sentenceId, feedback)

Provide feedback on the relevance of a particular sentence to a search

Parameters:
- searchResultsId
- communicationId
- sentenceId
- feedback

startFeedback(results)

Start providing feedback for the specified SearchResults.
This causes the search and its results to be persisted.

Parameters:
- results
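
The call order implied by this interface (startFeedback first, then feedback on communications or sentences) can be sketched with an in-memory handler. InMemoryFeedbackHandler is a hypothetical name; it takes a bare results id where the real startFeedback takes a SearchResult, and raising KeyError for an unknown id is an assumption. The three constants repeat the values documented for SearchFeedback.

```python
NEGATIVE, NONE, POSITIVE = -1, 0, 1    # values documented for SearchFeedback


class InMemoryFeedbackHandler:
    """Hypothetical handler using the FeedbackService.Iface method names."""

    def __init__(self):
        self.comm_feedback = {}    # searchResultsId -> {communicationId: value}
        self.sent_feedback = {}    # searchResultsId -> {(commId, sentId): value}

    def startFeedback(self, results_id):
        # Persist the search so that later feedback has something to attach to.
        self.comm_feedback.setdefault(results_id, {})
        self.sent_feedback.setdefault(results_id, {})

    def addCommunicationFeedback(self, searchResultsId, communicationId, feedback):
        if searchResultsId not in self.comm_feedback:
            raise KeyError("startFeedback was not called for this result")
        self.comm_feedback[searchResultsId][communicationId] = feedback

    def addSentenceFeedback(self, searchResultsId, communicationId,
                            sentenceId, feedback):
        if searchResultsId not in self.sent_feedback:
            raise KeyError("startFeedback was not called for this result")
        key = (communicationId, sentenceId)
        self.sent_feedback[searchResultsId][key] = feedback


handler = InMemoryFeedbackHandler()
handler.startFeedback("r1")
handler.addCommunicationFeedback("r1", "comm-7", POSITIVE)
handler.addSentenceFeedback("r1", "comm-7", "sent-2", NEGATIVE)
```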

class concrete.search.FeedbackService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.search.FeedbackService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_addCommunicationFeedback(seqid, iprot, oprot)
process_addSentenceFeedback(seqid, iprot, oprot)
process_startFeedback(seqid, iprot, oprot)
class concrete.search.FeedbackService.addCommunicationFeedback_args(searchResultsId=None, communicationId=None, feedback=None)

Bases: object


Attributes:
- searchResultsId
- communicationId
- feedback

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.addCommunicationFeedback_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.addSentenceFeedback_args(searchResultsId=None, communicationId=None, sentenceId=None, feedback=None)

Bases: object


Attributes:
- searchResultsId
- communicationId
- sentenceId
- feedback

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.addSentenceFeedback_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.startFeedback_args(results=None)

Bases: object


Attributes:
- results

read(iprot)
validate()
write(oprot)
class concrete.search.FeedbackService.startFeedback_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
concrete.search.SearchProxyService module
class concrete.search.SearchProxyService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.search.SearchProxyService.Iface


The search proxy service provides a single interface to multiple search providers

getCapabilities(provider)

Get a list of search type and language pairs for a search provider

Parameters:
- provider

getCorpora(provider)

Get a corpus list for a search provider

Parameters:
- provider

getProviders()

Get a list of search providers behind the proxy

recv_getCapabilities()
recv_getCorpora()
recv_getProviders()
search(query, provider)

Specify the search provider when performing a search

Parameters:
- query
- provider

send_getCapabilities(provider)
send_getCorpora(provider)
send_getProviders()
class concrete.search.SearchProxyService.Iface

Bases: concrete.services.Service.Iface


The search proxy service provides a single interface to multiple search providers

getCapabilities(provider)

Get a list of search type and language pairs for a search provider

Parameters:
- provider

getCorpora(provider)

Get a corpus list for a search provider

Parameters:
- provider

getProviders()

Get a list of search providers behind the proxy

search(query, provider)

Specify the search provider when performing a search

Parameters:
- query
- provider
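
The proxy pattern described here amounts to routing each call to a named provider. The following sketch shows that routing with plain Python objects; ToyProvider, ToySearchProxy, the provider names and the substring "search" are all invented for illustration and say nothing about how a real provider ranks results.

```python
class ToyProvider:
    """Hypothetical provider exposing the SearchService method names."""

    def __init__(self, corpora, capabilities, documents):
        self._corpora = list(corpora)
        self._capabilities = list(capabilities)
        self._documents = list(documents)

    def getCorpora(self):
        return list(self._corpora)

    def getCapabilities(self):
        return list(self._capabilities)

    def search(self, query):
        # Toy matching: keep the documents that contain the query string.
        return [doc for doc in self._documents if query in doc]


class ToySearchProxy:
    """Routes each call to the provider named by the 'provider' argument."""

    def __init__(self, providers):
        self._providers = dict(providers)

    def getProviders(self):
        return sorted(self._providers)

    def getCorpora(self, provider):
        return self._providers[provider].getCorpora()

    def getCapabilities(self, provider):
        return self._providers[provider].getCapabilities()

    def search(self, query, provider):
        return self._providers[provider].search(query)


proxy = ToySearchProxy({
    "keyword": ToyProvider(["news"], [("COMMUNICATIONS", "eng")],
                           ["blue cheese", "dog"]),
    "tweets": ToyProvider(["twitter"], [("SENTENCES", "spa")], ["perro"]),
})
```

A client would typically call getProviders() first, inspect getCapabilities(provider) for the search type and language it needs, and only then call search(query, provider).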

class concrete.search.SearchProxyService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.search.SearchProxyService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_getCapabilities(seqid, iprot, oprot)
process_getCorpora(seqid, iprot, oprot)
process_getProviders(seqid, iprot, oprot)
class concrete.search.SearchProxyService.getCapabilities_args(provider=None)

Bases: object


Attributes:
- provider

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getCapabilities_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getCorpora_args(provider=None)

Bases: object


Attributes:
- provider

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getCorpora_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getProviders_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.getProviders_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.search_args(query=None, provider=None)

Bases: object


Attributes:
- query
- provider

read(iprot)
validate()
write(oprot)
class concrete.search.SearchProxyService.search_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
concrete.search.SearchService module
class concrete.search.SearchService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.search.SearchService.Iface

getCapabilities()

Get a list of search type-language pairs

getCorpora()

Get a corpus list from the search provider

recv_getCapabilities()
recv_getCorpora()
search(query)

Perform a search specified by the query

Parameters:
- query

send_getCapabilities()
send_getCorpora()
class concrete.search.SearchService.Iface

Bases: concrete.services.Service.Iface

getCapabilities()

Get a list of search type-language pairs

getCorpora()

Get a corpus list from the search provider

search(query)

Perform a search specified by the query

Parameters:
- query

class concrete.search.SearchService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.search.SearchService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_getCapabilities(seqid, iprot, oprot)
process_getCorpora(seqid, iprot, oprot)
class concrete.search.SearchService.getCapabilities_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.getCapabilities_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.getCorpora_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.getCorpora_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.search_args(query=None)

Bases: object


Attributes:
- query

read(iprot)
validate()
write(oprot)
class concrete.search.SearchService.search_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
concrete.search.constants module
concrete.search.ttypes module
class concrete.search.ttypes.SearchCapability(type=None, lang=None)

Bases: object


A search provider describes its capabilities with a list of search type and language pairs.

Attributes:
- type: A type of search supported by the search provider
- lang: Language that the search provider supports.
Use ISO 639-2/T three letter codes.

read(iprot)
validate()
write(oprot)
class concrete.search.ttypes.SearchFeedback

Bases: object


Feedback values

NEGATIVE = -1
NONE = 0
POSITIVE = 1
class concrete.search.ttypes.SearchQuery(terms=None, questions=None, communicationId=None, tokens=None, rawQuery=None, auths=None, userId=None, name=None, labels=None, type=None, lang=None, corpus=None, k=None, communication=None)

Bases: object


Wrapper for information relevant to a (possibly structured) search.

Attributes:
- terms: Individual words, or multiword phrases, e.g., ‘dog’, ‘blue
cheese’. It is the responsibility of the implementation of
Search* to tokenize multiword phrases, if so-desired. Further,
an implementation may choose to support advanced features such as
wildcards, e.g.: ‘blue*’. This specification makes no
commitment as to the internal structure of keywords and their
semantics: that is the responsibility of the individual
implementation.
- questions: e.g., “what is the capital of spain?”

questions is a list so that different phrasings of
the question can be included, e.g.: “what is the name of spain’s
capital?”
- communicationId: Refers to an optional communication that can provide context for the query.
- tokens: Refers to a sequence of tokens in the communication referenced by communicationId.
- rawQuery: The input from the user provided in the search box, unmodified
- auths: optional authorization mechanism
- userId: Identifies the user who submitted the search query
- name: Human readable name of the query.
- labels: Properties of the query or user.
These labels can be used to group queries and results by a domain or group of
users for training. An example usage would be assigning the geographical region
as a label (“spain”). User labels could be based on organizational units (“hltcoe”).
- type: This search is over this type of data (communications, sentences, entities)
- lang: The language of the corpus that the user wants to search.
Use ISO 639-2/T three letter codes.
- corpus: An identifier of the corpus that the search is to be performed over.
- k: The maximum number of candidates the search service should return.
- communication: An optional communication used as context for the query.
If both this field and communicationId are populated, then it is
assumed the ID of the communication is the same as communicationId.

read(iprot)
validate()
write(oprot)
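
A small sketch of a query that carries several phrasings of one question, together with a check of the communication/communicationId assumption stated above. The dataclasses are stand-ins that model only a subset of the documented fields, ids are plain strings, and context_is_consistent is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Communication:         # stand-in; only the id is modelled
    id: str


@dataclass
class SearchQuery:           # stand-in with a subset of the documented fields
    terms: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)
    communicationId: Optional[str] = None
    communication: Optional[Communication] = None
    lang: Optional[str] = None
    k: Optional[int] = None


def context_is_consistent(query):
    """When both context fields are set, their ids are expected to match."""
    if query.communication is None or query.communicationId is None:
        return True
    return query.communication.id == query.communicationId


query = SearchQuery(
    terms=["dog", "blue cheese"],
    questions=["what is the capital of spain?",
               "what is the name of spain's capital?"],
    communicationId="comm-1",
    communication=Communication(id="comm-1"),
    lang="eng",                # ISO 639-2/T three letter code
    k=10,
)
```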
class concrete.search.ttypes.SearchResult(uuid=None, searchQuery=None, searchResultItems=None, metadata=None, lang=None)

Bases: object


Single wrapper for results from all the various Search* services.

Attributes:
- uuid: Unique identifier for the results of this search.
- searchQuery: The query that led to this result.
Useful for capturing feedback or building training data.
- searchResultItems: The list is assumed sorted best to worst, which should be
reflected by the values contained in the score field of each
SearchResultItem, if that field is populated.
- metadata: The system that provided the response: likely use case for
populating this field is for building training data. Presumably
a system will not need/want to return this object in live use.
- lang: The dominant language of the search results.
Use ISO 639-2/T three letter codes.
Search providers should set this when possible to support downstream processing.
Do not set if it is not known.
If multilingual, use the string “multilingual”.

read(iprot)
validate()
write(oprot)
class concrete.search.ttypes.SearchResultItem(communicationId=None, sentenceId=None, score=None, tokens=None)

Bases: object


An individual element returned from a search. Most or all methods
will return a communicationId, possibly with an associated score.
For example, if the target element type of the search is Sentence,
then the sentenceId field should be populated.

Attributes:
- communicationId
- sentenceId: The UUID of the returned sentence, which appears in the
communication referenced by communicationId.
- score: Values are not restricted in range (e.g., do not have to be
within [0,1]). Higher is better.

- tokens: If the Search is meant to result in a tokenRefSequence, this is
that result. Otherwise, this field may be optionally populated
in order to provide a hint to the client as to where to center a
visualization, or the extraction of context, etc.

read(iprot)
validate()
write(oprot)
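
Because scores are unbounded and higher is better, ordering result items comes down to a descending sort. The sketch below uses a dataclass stand-in with three of the documented fields; rank_items is a hypothetical helper, and placing items without a score at the end is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResultItem:      # stand-in with three of the documented fields
    communicationId: str
    sentenceId: Optional[str] = None
    score: Optional[float] = None


def rank_items(items, k=None):
    """Sort best to worst by score, then keep at most k items."""
    scored = [item for item in items if item.score is not None]
    unscored = [item for item in items if item.score is None]
    ranked = sorted(scored, key=lambda item: item.score, reverse=True) + unscored
    return ranked if k is None else ranked[:k]


items = [SearchResultItem("c1", score=-2.5),
         SearchResultItem("c2", score=17.0),
         SearchResultItem("c3"),
         SearchResultItem("c4", score=0.3)]
top = rank_items(items, k=3)
```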
class concrete.search.ttypes.SearchType

Bases: object


The type of data being searched over.

COMMUNICATIONS = 0
ENTITIES = 3
ENTITY_MENTIONS = 4
SECTIONS = 1
SENTENCES = 2
SITUATIONS = 5
SITUATION_MENTIONS = 6
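
For reference, the same constants in numeric order, with a check of whether a provider advertises a given type and language pair. The IntEnum below is a stand-in written for this sketch, not the generated class, and provider_supports is a hypothetical helper that models each capability as a (type, language) tuple.

```python
from enum import IntEnum


class SearchType(IntEnum):
    """Mirror of the documented constants, listed in numeric order."""
    COMMUNICATIONS = 0
    SECTIONS = 1
    SENTENCES = 2
    ENTITIES = 3
    ENTITY_MENTIONS = 4
    SITUATIONS = 5
    SITUATION_MENTIONS = 6


def provider_supports(capabilities, search_type, lang):
    """capabilities is an iterable of (SearchType, ISO 639-2/T code) pairs."""
    return (search_type, lang) in set(capabilities)


capabilities = [(SearchType.COMMUNICATIONS, "eng"),
                (SearchType.SENTENCES, "spa")]
```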
Module contents

concrete.services package

Subpackages
concrete.services.results package
Submodules
concrete.services.results.ResultsServerService module
class concrete.services.results.ResultsServerService.Client(iprot, oprot=None)

Bases: concrete.services.Service.Client, concrete.services.results.ResultsServerService.Iface

getLatestSearchResult(userId)

Get the most recent search results for a user

Parameters:
- userId

getNextChunk(sessionId)

Get next chunk of data to annotate
The client should use the Retriever service to access the data

Parameters:
- sessionId

getSearchResult(searchResultId)

Get a search result object

Parameters:
- searchResultId

getSearchResults(taskType, limit)

Get a list of search results for a particular annotation task
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- limit

getSearchResultsByUser(taskType, userId, limit)

Get a list of search results for a particular annotation task filtered by a user id
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- userId
- limit
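
The limit convention shared by getSearchResults and getSearchResultsByUser (0 means no limit) could be implemented on the server side roughly as follows. apply_limit is a hypothetical helper, and rejecting negative limits is an assumption that the documentation does not address.

```python
def apply_limit(results, limit):
    """Return the first 'limit' results, or all of them when limit is 0."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    results = list(results)
    return results if limit == 0 else results[:limit]


everything = apply_limit(["r1", "r2", "r3"], 0)
first_two = apply_limit(["r1", "r2", "r3"], 2)
```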

recv_getLatestSearchResult()
recv_getNextChunk()
recv_getSearchResult()
recv_getSearchResults()
recv_getSearchResultsByUser()
recv_registerSearchResult()
recv_startSession()
recv_stopSession()
recv_submitAnnotation()
registerSearchResult(result, taskType)

Register the specified search result for annotation.

If a name has not been assigned to the search query, one will be generated.
This service also requires that the userId field be populated in the SearchQuery.

Parameters:
- result
- taskType

send_getLatestSearchResult(userId)
send_getNextChunk(sessionId)
send_getSearchResult(searchResultId)
send_getSearchResults(taskType, limit)
send_getSearchResultsByUser(taskType, userId, limit)
send_registerSearchResult(result, taskType)
send_startSession(searchResultId, taskType)
send_stopSession(sessionId)
send_submitAnnotation(sessionId, unitId, communication)
startSession(searchResultId, taskType)

Start an annotation session
Returns a session id used in future session calls

Parameters:
- searchResultId
- taskType

stopSession(sessionId)

Stops an annotation session

Parameters:
- sessionId

submitAnnotation(sessionId, unitId, communication)

Submit an annotation for a session

Parameters:
- sessionId
- unitId
- communication
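
The session methods above suggest a call order of startSession, getNextChunk, submitAnnotation, stopSession. The sketch below walks through that order against a hypothetical in-memory stand-in (FakeResultsServer) instead of a real Thrift client, so it runs without a server. The return type of getNextChunk (a list of unit ids here), the use of an empty list to signal the end of the data, and the annotate callback are all assumptions for illustration, not documented behavior.

```python
import uuid

NER = 2  # value documented for AnnotationTaskType.NER


class FakeResultsServer:
    """Hypothetical in-memory stand-in exposing the documented method names."""

    def __init__(self, units, chunk_size=2):
        self._units = list(units)
        self._chunk_size = chunk_size
        self._cursors = {}   # sessionId -> index of the next unit to hand out
        self.submitted = []  # (sessionId, unitId, communication) tuples

    def startSession(self, searchResultId, taskType):
        session_id = str(uuid.uuid4())
        self._cursors[session_id] = 0
        return session_id

    def getNextChunk(self, sessionId):
        start = self._cursors[sessionId]
        chunk = self._units[start:start + self._chunk_size]
        self._cursors[sessionId] = start + len(chunk)
        return chunk  # assumption: an empty list means nothing is left

    def submitAnnotation(self, sessionId, unitId, communication):
        self.submitted.append((sessionId, unitId, communication))

    def stopSession(self, sessionId):
        del self._cursors[sessionId]


def run_session(client, search_result_id, task_type, annotate):
    """Annotate every unit of one search result, then close the session."""
    session_id = client.startSession(search_result_id, task_type)
    try:
        while True:
            chunk = client.getNextChunk(session_id)
            if not chunk:
                break
            for unit_id in chunk:
                client.submitAnnotation(session_id, unit_id, annotate(unit_id))
    finally:
        # Stop the session even if annotation fails part-way through.
        client.stopSession(session_id)
    return session_id


server = FakeResultsServer(["unit-1", "unit-2", "unit-3"])
run_session(server, "result-42", NER, annotate=lambda unit: "annotated " + unit)
print([unit_id for _, unit_id, _ in server.submitted])
```

Wrapping the loop in try/finally is a design choice of this sketch: it keeps a failed annotation from leaving a session open on the server.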

class concrete.services.results.ResultsServerService.Iface

Bases: concrete.services.Service.Iface

getLatestSearchResult(userId)

Get the most recent search results for a user

Parameters:
- userId

getNextChunk(sessionId)

Get the next chunk of data to annotate.
The client should use the Retriever service to access the data.

Parameters:
- sessionId

getSearchResult(searchResultId)

Get a search result object

Parameters:
- searchResultId

getSearchResults(taskType, limit)

Get a list of search results for a particular annotation task
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- limit

getSearchResultsByUser(taskType, userId, limit)

Get a list of search results for a particular annotation task filtered by a user id
Set the limit to 0 to get all relevant search results

Parameters:
- taskType
- userId
- limit

registerSearchResult(result, taskType)

Register the specified search result for annotation.

If a name has not been assigned to the search query, one will be generated.
This service also requires that the user_id field be populated in the SearchQuery.

Parameters:
- result
- taskType

startSession(searchResultId, taskType)

Start an annotation session
Returns a session id used in future session calls

Parameters:
- searchResultId
- taskType

stopSession(sessionId)

Stops an annotation session

Parameters:
- sessionId

submitAnnotation(sessionId, unitId, communication)

Submit an annotation for a session

Parameters:
- sessionId
- unitId
- communication

class concrete.services.results.ResultsServerService.Processor(handler)

Bases: concrete.services.Service.Processor, concrete.services.results.ResultsServerService.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_getLatestSearchResult(seqid, iprot, oprot)
process_getNextChunk(seqid, iprot, oprot)
process_getSearchResult(seqid, iprot, oprot)
process_getSearchResults(seqid, iprot, oprot)
process_getSearchResultsByUser(seqid, iprot, oprot)
process_registerSearchResult(seqid, iprot, oprot)
process_startSession(seqid, iprot, oprot)
process_stopSession(seqid, iprot, oprot)
process_submitAnnotation(seqid, iprot, oprot)
class concrete.services.results.ResultsServerService.getLatestSearchResult_args(userId=None)

Bases: object


Attributes:
- userId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getLatestSearchResult_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getNextChunk_args(sessionId=None)

Bases: object


Attributes:
- sessionId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getNextChunk_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResult_args(searchResultId=None)

Bases: object


Attributes:
- searchResultId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResult_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResultsByUser_args(taskType=None, userId=None, limit=None)

Bases: object


Attributes:
- taskType
- userId
- limit

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResultsByUser_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResults_args(taskType=None, limit=None)

Bases: object


Attributes:
- taskType
- limit

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.getSearchResults_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.registerSearchResult_args(result=None, taskType=None)

Bases: object


Attributes:
- result
- taskType

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.registerSearchResult_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.startSession_args(searchResultId=None, taskType=None)

Bases: object


Attributes:
- searchResultId
- taskType

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.startSession_result(success=None, ex=None)

Bases: object


Attributes:
- success
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.stopSession_args(sessionId=None)

Bases: object


Attributes:
- sessionId

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.stopSession_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.submitAnnotation_args(sessionId=None, unitId=None, communication=None)

Bases: object


Attributes:
- sessionId
- unitId
- communication

read(iprot)
validate()
write(oprot)
class concrete.services.results.ResultsServerService.submitAnnotation_result(ex=None)

Bases: object


Attributes:
- ex

read(iprot)
validate()
write(oprot)
concrete.services.results.constants module
concrete.services.results.ttypes module
Module contents
Submodules
concrete.services.Service module
class concrete.services.Service.Client(iprot, oprot=None)

Bases: concrete.services.Service.Iface


Base service that all other services should inherit from

about()

Get information about the service

alive()

Is the service alive?

recv_about()
recv_alive()
send_about()
send_alive()
class concrete.services.Service.Iface

Bases: object


Base service that all other services should inherit from

about()

Get information about the service

alive()

Is the service alive?

class concrete.services.Service.Processor(handler)

Bases: concrete.services.Service.Iface, thrift.Thrift.TProcessor

process(iprot, oprot)
process_about(seqid, iprot, oprot)
process_alive(seqid, iprot, oprot)
class concrete.services.Service.about_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.services.Service.about_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
class concrete.services.Service.alive_args

Bases: object

read(iprot)
validate()
write(oprot)
class concrete.services.Service.alive_result(success=None)

Bases: object


Attributes:
- success

read(iprot)
validate()
write(oprot)
concrete.services.constants module
concrete.services.ttypes module
class concrete.services.ttypes.AnnotationTaskType

Bases: object


Annotation task types

NER = 2
TRANSLATION = 1
class concrete.services.ttypes.AnnotationUnitIdentifier(communicationId=None, sentenceId=None)

Bases: object


An annotation unit is the part of the communication to be annotated.
It can be the entire communication or a particular sentence in the communication.
If the sentenceId is null, the unit is the entire communication.

Attributes:
- communicationId: Communication identifier for loading data
- sentenceId: Sentence identifier if annotating sentences

read(iprot)
validate()
write(oprot)
class concrete.services.ttypes.AnnotationUnitType

Bases: object


An annotation unit is the part of the communication to be annotated.

COMMUNICATION = 1
SENTENCE = 2
class concrete.services.ttypes.AsyncContactInfo(host=None, port=None)

Bases: object


Contact information for the asynchronous communications.
When a client contacts a server for a job that takes a significant amount of time,
it is often best to implement this asynchronously.
We do this by having the client stand up a server to accept the results and
passing that information to the original server.
The server may want to create a new thrift client on every request or maintain
a pool of clients for reuse.

Attributes:
- host
- port

read(iprot)
validate()
write(oprot)
exception concrete.services.ttypes.NotImplementedException(message=None, serEx=None)

Bases: thrift.Thrift.TException


An exception to be used when an invoked method has
not been implemented by the service.

Attributes:
- message: The explanation (why the exception occurred)
- serEx: The serialized exception

read(iprot)
validate()
write(oprot)
class concrete.services.ttypes.ServiceInfo(name=None, version=None, description=None)

Bases: object


Each service is described by this info struct.
It is for human consumption and for records of versions in deployments.

Attributes:
- name: Name of the service
- version: Version string of the service.
It is preferred that the services implement semantic versioning: http://semver.org/
with version strings like x.y.z
- description: Description of the service

read(iprot)
validate()
write(oprot)
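
Because the version field is a free-form string, comparing two versions as text gives wrong answers (for example "4.10.2" sorts before "4.9.0"). The sketch below parses plain x.y.z strings into integer tuples before comparing. Both helper names are hypothetical, and pre-release or build suffixes allowed by semantic versioning are deliberately not handled.

```python
def parse_version(version):
    """Split a plain 'x.y.z' version string into a tuple of three ints."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError("expected x.y.z, got %r" % (version,))
    return tuple(int(part) for part in parts)


def is_compatible(deployed, required):
    """True when deployed has the same major version and is not older."""
    deployed_v = parse_version(deployed)
    required_v = parse_version(required)
    return deployed_v[0] == required_v[0] and deployed_v >= required_v


print(parse_version("4.10.2") > parse_version("4.9.0"))  # tuple comparison
print("4.10.2" > "4.9.0")                                # string comparison
print(is_compatible("4.10.2", "4.9.0"))
```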
exception concrete.services.ttypes.ServicesException(message=None, serEx=None)

Bases: thrift.Thrift.TException


An exception to be used with Concrete services.

Attributes:
- message: The explanation (why the exception occurred)
- serEx: The serialized exception

read(iprot)
validate()
write(oprot)
Module contents

concrete.situations package

Submodules
concrete.situations.constants module
concrete.situations.ttypes module
class concrete.situations.ttypes.Argument(role=None, entityId=None, situationId=None, propertyList=None)

Bases: object


A situation argument, consisting of an argument role and a value.
Argument values may be Entities or Situations.

Attributes:
- role: The relationship between this argument and the situation that
owns it. The roles that a situation’s arguments can take
depend on the type of the situation (including subtype
information, such as event_type).
- entityId: A pointer to the value of this argument, if it is explicitly
encoded as an Entity.
- situationId: A pointer to the value of this argument, if it is a situation.
- propertyList: For the BinarySRL task, there may be situations
where more than one property is attached to a single
participant. A list of these properties can be stored in this field.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.Justification(justificationType=None, mentionId=None, tokenRefSeqList=None)

Bases: object


Attributes:
- justificationType: An enumerated value used to describe the way in which the
justification’s mention provides supporting evidence for the
situation.
- mentionId: A pointer to the SituationMention itself.
- tokenRefSeqList: An optional list of pointers to tokens that are (especially)
relevant to the way in which this mention provides
justification for the situation. It is left up to individual
analytics to decide what tokens (if any) they wish to include
in this field.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.MentionArgument(role=None, entityMentionId=None, situationMentionId=None, tokens=None, constituent=None, confidence=None, propertyList=None)

Bases: object


A “concrete” argument, that may be used by SituationMentions or EntityMentions
to avoid conflicts where abstract Arguments were being used to support concrete Mentions.

Attributes:
- role: The relationship between this argument and the situation that
owns it. The roles that a situation’s arguments can take
depend on the type of the situation (including subtype
information, such as event_type).
- entityMentionId: A pointer to the value of an EntityMention, if this is being used to support
an EntityMention.
- situationMentionId: A pointer to the value of this argument, if it is a SituationMention.
- tokens: The location of this MentionArgument in the Communication.
If this MentionArgument can be identified in a document using an
EntityMention or SituationMention, then UUID references to those
types should be preferred and this field left as null.
- constituent: An alternative way to specify the same thing as tokens.
- confidence: Confidence of this argument belonging to its SituationMention
- propertyList: For the BinarySRL task, there may be situations
where more than one property is attached to a single
participant. A list of these properties can be stored in this field.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.Property(value=None, metadata=None, polarity=None)

Bases: object


Attached to Arguments to support situations where
a ‘participant’ has more than one ‘property’ (in BinarySRL terms),
whereas Arguments notionally only support one Role.

Attributes:
- value: The required value of the property.
- metadata: Metadata to support this particular property object.
- polarity: This value is typically boolean, 0.0 or 1.0, but we use a
float in order to potentially capture cases where an annotator is
highly confident that the value is underspecified, via a value of
0.5.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.Situation(uuid=None, situationType=None, situationKind=None, argumentList=None, mentionIdList=None, justificationList=None, timeML=None, intensity=None, polarity=None, confidence=None)

Bases: object


A single situation, along with pointers to situation mentions that
provide evidence for the situation. “Situations” include events,
relations, facts, sentiments, and beliefs. Each situation has a
core type (such as EVENT or SENTIMENT), along with an optional
subtype based on its core type (e.g., event_type=CONTACT_MEET), and
a set of zero or more unordered arguments.

This struct may be used for a variety of “processed” Situations such
as (but not limited to):
- SituationMentions which have been collapsed into a coreferential cluster
- Situations which are inferred and not directly supported by a textual mention

Attributes:
- uuid: Unique identifier for this situation.
- situationType: The core type of this situation (e.g. EVENT or SENTIMENT),
or a coarse-grained situation type.
- situationKind: A fine grain situation type that specifically describes the
situation based on situationType above. It allows for more
detailed description of the situation.

Some examples:

if situationType == EVENT, the event type for the situation
if situationType == STATE, the state type
if situationType == TEMPORAL_FACT, the temporal fact type

For Propbank, this field should be the predicate lemma and id,
e.g. “strike.02”. For FrameNet, this should be the frame name,
e.g. “Commerce_buy”.

Different and more varied situationTypes may be added
in the future.
- argumentList: The arguments for this situation. Each argument consists of a
role and a value. It is possible for a situation to have
multiple arguments with the same role. Arguments are
unordered.
- mentionIdList: Ids of the mentions of this situation in a communication
(type=SituationMention)
- justificationList: A list of pointers to SituationMentions that provide
justification for this situation. These mentions may be either
direct mentions of the situation, or indirect evidence.
- timeML: A wrapper for TimeML annotations.
- intensity: An “intensity” rating for this situation, typically ranging from
0-1. In the case of SENTIMENT situations, this is used to record
the intensity of the sentiment.
- polarity: The polarity of this situation. In the case of SENTIMENT
situations, this is used to record the polarity of the
sentiment.
- confidence: A confidence score for this individual situation. You can also
set a confidence score for an entire SituationSet using the
SituationSet’s metadata.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.SituationMention(uuid=None, text=None, situationType=None, situationKind=None, argumentList=None, intensity=None, polarity=None, tokens=None, constituent=None, confidence=None)

Bases: object


A concrete mention of a situation, where “situations” include
events, relations, facts, sentiments, and beliefs. Each situation
has a core type (such as EVENT or SENTIMENT), along with an
optional subtype based on its core type (e.g.,
event_type=CONTACT_MEET), and a set of zero or more unordered
arguments.

This struct should be used for most types of SRL labelings
(e.g. Propbank and FrameNet) because they are grounded in text.

Attributes:
- uuid: Unique identifier for this situation.
- text: The text content of this situation mention. This field is
often redundant with the ‘tokens’ field, and may not
be generated by all analytics.
- situationType: The core type of this situation (e.g. EVENT or SENTIMENT),
or a coarse-grained situation type.
- situationKind: A fine grain situation type that specifically describes the
situation mention based on situationType above. It allows for
more detailed description of the situation mention.

Some examples:

if situationType == EVENT, the event type for the sit. mention
if situationType == STATE, the state type for this sit. mention

For Propbank, this field should be the predicate lemma and id,
e.g. “strike.02”. For FrameNet, this should be the frame name,
e.g. “Commerce_buy”.

Different and more varied situationTypes may be added
in the future.
- argumentList: The arguments for this situation mention. Each argument
consists of a role and a value. It is possible for a situation
to have multiple arguments with the same role. Arguments are
unordered.
- intensity: An “intensity” rating for the situation, typically ranging from
0-1. In the case of SENTIMENT situations, this is used to record
the intensity of the sentiment.
- polarity: The polarity of this situation. In the case of SENTIMENT
situations, this is used to record the polarity of the
sentiment.
- tokens: An optional pointer to tokens that are (especially)
relevant to this situation mention. It is left up to individual
analytics to decide what tokens (if any) they wish to include in
this field. In particular, it is not specified whether the
arguments’ tokens should be included.
- constituent: An alternative way to specify the same thing as tokens.
- confidence: A confidence score for this individual situation mention. You
can also set a confidence score for an entire SituationMentionSet
using the SituationMentionSet’s metadata.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.SituationMentionSet(uuid=None, metadata=None, mentionList=None, linkingList=None)

Bases: object


A theory about the set of situation mentions that are present in a
message. See also: SituationMention

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- mentionList: List of mentions in this set.
- linkingList: Entity linking annotations associated with this SituationMentionSet.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.SituationSet(uuid=None, metadata=None, situationList=None, linkingList=None)

Bases: object


A theory about the set of situations that are present in a
message. See also: Situation

Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- situationList: List of situations in this set.
- linkingList: Entity linking annotations associated with this SituationSet.

read(iprot)
validate()
write(oprot)
class concrete.situations.ttypes.TimeML(timeMLClass=None, timeMLTense=None, timeMLAspect=None)

Bases: object


A wrapper for various TimeML annotations.

Attributes:
- timeMLClass: The TimeML class for situations representing TimeML events
- timeMLTense: The TimeML tense for situations representing TimeML events
- timeMLAspect: The TimeML aspect for situations representing TimeML events

read(iprot)
validate()
write(oprot)
Module contents

concrete.spans package

Submodules
concrete.spans.constants module
concrete.spans.ttypes module
class concrete.spans.ttypes.AudioSpan(start=None, ending=None)

Bases: object


A span of audio within a single communication, identified by a
pair of time offsets. Time offsets are zero-based.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

Attributes:
- start: Start time (in seconds)
- ending: End time (in seconds)

read(iprot)
validate()
write(oprot)
class concrete.spans.ttypes.TextSpan(start=None, ending=None)

Bases: object


A span of text within a single communication, identified by a pair
of zero-indexed character offsets into a Thrift string. Thrift strings
are encoded using UTF-8:
The offsets are character-based, not byte-based - a character with a
three-byte UTF-8 representation only counts as one character.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

Attributes:
- start: Start character, inclusive.
- ending: End character, exclusive

read(iprot)
validate()
write(oprot)
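
The difference between character offsets and byte offsets is easy to see in code. The sketch below uses a small stand-in dataclass with the two documented fields (not the Thrift-generated TextSpan) and assumes that "character" means a Unicode code point, which is how Python 3 indexes str. The helper span_text and the sample string are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """Stand-in with the two documented fields; not the generated class."""
    start: int   # inclusive character offset
    ending: int  # exclusive character offset


def span_text(text, span):
    """Return the characters covered by span (start inclusive, ending exclusive)."""
    return text[span.start:span.ending]


# "Zoë paid 5€ today": ë takes two bytes in UTF-8 and € takes three.
text = "Zo\u00eb paid 5\u20ac today"
span = TextSpan(start=9, ending=11)

print(span_text(text, span))            # slice by character offsets
encoded = text.encode("utf-8")
print(encoded[span.start:span.ending])  # same numbers used as byte offsets
print(len(text), len(encoded))
```

Slicing the str gives the two characters of the price, while applying the same numbers to the UTF-8 bytes lands three bytes too early, because the multi-byte characters shift every byte position after them.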
Module contents

concrete.structure package

Submodules
concrete.structure.constants module
concrete.structure.ttypes module
class concrete.structure.ttypes.Arc(src=None, dst=None, token=None, weight=None)

Bases: object


Type for arcs. For epsilon edges, leave ‘token’ blank.

Attributes:
- src
- dst
- token
- weight

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Constituent(id=None, tag=None, childList=None, headChildIndex=-1, start=None, ending=None)

Bases: object


A single parse constituent (or “phrase”).

Attributes:
- id: A parse-relative identifier for this constituent. Together
with the UUID for a Parse, this can be used to define
pointers to specific constituents.
- tag: A description of this constituency node, e.g. the category “NP”.
For leaf nodes, this should be a word and for pre-terminal nodes
this should be a POS tag.
- childList
- headChildIndex: The index of the head child of this constituent. I.e., the
head child of constituent c is
c.children[c.head_child_index]. A value of -1
indicates that no child head was identified.
- start: The first token (inclusive) of this constituent in the
parent Tokenization. Almost certainly should be populated.
- ending: The last token (exclusive) of this constituent in the
parent Tokenization. Almost certainly should be populated.

read(iprot)
validate()
write(oprot)
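
A short sketch of how headChildIndex and the start/ending token range might be read. It uses a stand-in dataclass with the documented fields and defaults, not the generated class. The assumption that childList holds the ids of child constituents in the same parse is not stated above, and head_child and covered_tokens are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Constituent:
    """Stand-in mirroring the documented fields and defaults."""
    id: int
    tag: str
    childList: List[int] = field(default_factory=list)  # assumed: child ids
    headChildIndex: int = -1
    start: Optional[int] = None
    ending: Optional[int] = None


def head_child(constituents, c):
    """Return the head child of c, or None when headChildIndex is -1."""
    if c.headChildIndex == -1:
        return None
    return constituents[c.childList[c.headChildIndex]]


def covered_tokens(tokens, c):
    """Tokens spanned by c: start is inclusive, ending is exclusive."""
    return tokens[c.start:c.ending]


tokens = ["the", "dog", "barks"]
constituents = {
    0: Constituent(0, "S", [1, 2], headChildIndex=1, start=0, ending=3),
    1: Constituent(1, "NP", start=0, ending=2),  # children omitted for brevity
    2: Constituent(2, "VP", start=2, ending=3),
}

print(head_child(constituents, constituents[0]).tag)
print(covered_tokens(tokens, constituents[1]))
```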
class concrete.structure.ttypes.ConstituentRef(parseId=None, constituentIndex=None)

Bases: object


A reference to a Constituent within a Parse.

Attributes:
- parseId: The UUID of the Parse that this Constituent belongs to.
- constituentIndex: The index in the constituent list of this Constituent.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Dependency(gov=-1, dep=None, edgeType=None)

Bases: object


A syntactic edge between two tokens in a tokenized sentence.

Attributes:
- gov: The governor or the head token. 0 indexed.
- dep: The dependent token. 0 indexed.
- edgeType: The relation that holds between gov and dep.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.DependencyParse(uuid=None, metadata=None, dependencyList=None, structureInformation=None)

Bases: object


Represents a dependency parse with typed edges.

Attributes:
- uuid
- metadata
- dependencyList
- structureInformation

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.DependencyParseStructure(isAcyclic=None, isConnected=None, isSingleHeaded=None, isProjective=None)

Bases: object


Information about the structure of a dependency parse.
This information is computable from the list of dependencies,
but this allows the consumer to make (verified) assumptions
about the dependencies being processed.

Attributes:
- isAcyclic: True iff there are no cycles in the dependency graph.
- isConnected: True iff the dependency graph forms a single connected component.
- isSingleHeaded: True iff every node in the dependency parse has at most
one head/parent/governor.
- isProjective: True iff there are no crossing edges in the dependency parse.

read(iprot)
validate()
write(oprot)
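
Since these flags are computable from the list of dependencies, a consumer can also recompute them as a check. The sketch below does so for three of the four flags, using a stand-in Dependency dataclass (not the generated class). It assumes that gov == -1 marks a root edge, which the default value suggests but the text above does not state. Root edges are counted as a head in is_single_headed and ignored in the other two checks. isConnected is left out to keep the example short.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dependency:
    """Stand-in mirroring the documented fields and defaults."""
    gov: int = -1  # assumed: -1 means the dependent is a root
    dep: Optional[int] = None
    edgeType: Optional[str] = None


def is_single_headed(deps):
    """True iff no token appears as the dependent of more than one edge."""
    return all(n <= 1 for n in Counter(d.dep for d in deps).values())


def is_acyclic(deps):
    """True iff the gov -> dep graph has no directed cycle (depth-first search)."""
    children = defaultdict(list)
    for d in deps:
        if d.gov != -1:
            children[d.gov].append(d.dep)
    unseen, active, done = 0, 1, 2
    state = defaultdict(int)

    def visit(node):
        state[node] = active
        for child in children.get(node, ()):
            if state[child] == active:
                return False  # back edge: we returned to a node on the stack
            if state[child] == unseen and not visit(child):
                return False
        state[node] = done
        return True

    return all(visit(n) for n in list(children) if state[n] == unseen)


def is_projective(deps):
    """True iff no two non-root edges cross when drawn above the sentence."""
    spans = [tuple(sorted((d.gov, d.dep))) for d in deps if d.gov != -1]
    return not any(
        lo1 < lo2 < hi1 < hi2 for lo1, hi1 in spans for lo2, hi2 in spans
    )


# "the dog barks": the <- dog <- barks, with barks as root.
parse = [Dependency(1, 0, "det"), Dependency(2, 1, "nsubj"), Dependency(-1, 2, "root")]
print(is_single_headed(parse), is_acyclic(parse), is_projective(parse))
```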
class concrete.structure.ttypes.LatticePath(weight=None, tokenList=None)

Bases: object


Attributes:
- weight
- tokenList

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Parse(uuid=None, metadata=None, constituentList=None)

Bases: object


A theory about the syntactic parse of a sentence.


Note: If we add support for parse forests in the future, then it
will most likely be done by adding a new field (e.g.
“forest_root”) that uses a new struct type to encode the
forest. A “kind” field might also be added (analogous to
Tokenization.kind) to indicate whether a parse is encoded
using a simple tree or a parse forest.

Attributes:
- uuid
- metadata
- constituentList

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Section(uuid=None, sentenceList=None, textSpan=None, rawTextSpan=None, audioSpan=None, kind=None, label=None, numberList=None, lidList=None)

Bases: object


A single “section” of a communication, such as a paragraph. Each
section is defined using a text or audio span, and can optionally
contain a list of sentences.

Attributes:
- uuid: The unique identifier for this section.
- sentenceList: The sentences of this “section.”
- textSpan: Location of this section in the communication text.

NOTE: This text span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- rawTextSpan: Location of this section in the raw text.

NOTE: This text span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- audioSpan: Location of this section in the original audio.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- kind: A short, sometimes corpus-specific term characterizing the nature
of the section; may change in a future version of concrete. This
often acts as a coarse-grained descriptor that is used for
filtering. For example, Gigaword uses the section kind “passage”
to distinguish content-bearing paragraphs in the body of an
article from other paragraphs, such as the headline and dateline.
- label: The name of the section. For example, a title of a section on
Wikipedia.
- numberList: Position within the communication with respect to other Sections:
The section number, E.g., 3, or 3.1, or 3.1.2, etc. Aimed at
Communications with content organized in a hierarchy, such as a Book
with multiple chapters, then sections, then paragraphs. Or even a
dense Wikipedia page with subsections. Sections should still be
arranged linearly, where reading these numbers should not be required
to get a start-to-finish enumeration of the Communication’s content.
- lidList: An optional field to be used for multi-language documents.

This field should be populated when a section is inside of
a document that contains multiple languages.

Minimally, each block of text in one language should be its own
section. For example, if a paragraph is in English and the
paragraph afterwards is in French, these should be separated into
two different sections, allowing language-specific analytics to
run on appropriate sections.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Sentence(uuid=None, tokenization=None, textSpan=None, rawTextSpan=None, audioSpan=None)

Bases: object


A single sentence or utterance in a communication.

Attributes:
- uuid
- tokenization: Theory about the tokens that make up this sentence. For text
communications, these tokenizations will typically be generated
by a tokenizer. For audio communications, these tokenizations
will typically be generated by an automatic speech recognizer.

The “Tokenization” message type is also used to store the output
of machine translation systems and text normalization
systems.
- textSpan: Location of this sentence in the communication text.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- rawTextSpan: Location of this sentence in the raw text.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
- audioSpan: Location of this sentence in the original audio.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

read(iprot)
validate()
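write(oprot)
class concrete.structure.ttypes.SpanLink(tokens=None, concreteTarget=None, externalTarget=None, linkType=None)

Bases: object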


A collection of tokens that represent a link to another resource.
This resource might be another Concrete object (e.g., another
Concrete Communication), represented with the ‘concreteTarget’
field, or it could link to a resource outside of Concrete via the
‘externalTarget’ field.

Attributes:
- tokens: The tokens that make up this SpanLink object.
- concreteTarget
- externalTarget
- linkType

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TaggedToken(tokenIndex=None, tag=None, confidence=None, tagList=None, confidenceList=None)

Bases: object


Attributes:
- tokenIndex: A pointer to the token being tagged.

Token indices are 0-based.
- tag: A string containing the annotation.
If the tag set you are using is not case sensitive,
then all part of speech tags should be normalized to upper case.
- confidence: Confidence of the annotation.
- tagList: A list of strings that represent a distribution of possible
tags for this token.

If populated, the ‘tag’ field should also be populated
with the “best” value from this list.
- confidenceList: A list of doubles that represent confidences associated with
the tags in the ‘tagList’ field.

If populated, the ‘confidence’ field should also be populated
with the confidence associated with the “best” tag in ‘tagList’.

read(iprot)
validate()
write(oprot)
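
The rule that tag and confidence should mirror the "best" entry of tagList and confidenceList can be sketched as follows. The dataclass is a stand-in with the documented fields, not the generated class. The helper fill_best_tag is hypothetical, and it assumes that the two lists are parallel and that "best" means highest confidence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaggedToken:
    """Stand-in mirroring the documented fields and defaults."""
    tokenIndex: Optional[int] = None
    tag: Optional[str] = None
    confidence: Optional[float] = None
    tagList: Optional[List[str]] = None
    confidenceList: Optional[List[float]] = None


def fill_best_tag(tagged):
    """Copy the highest-confidence entry of tagList into tag and confidence."""
    if not tagged.tagList or not tagged.confidenceList:
        return tagged  # nothing to choose from; leave tag/confidence alone
    if len(tagged.tagList) != len(tagged.confidenceList):
        raise ValueError("tagList and confidenceList must have equal length")
    best = max(range(len(tagged.tagList)), key=tagged.confidenceList.__getitem__)
    tagged.tag = tagged.tagList[best]
    tagged.confidence = tagged.confidenceList[best]
    return tagged


token = fill_best_tag(
    TaggedToken(tokenIndex=0, tagList=["NN", "VB", "JJ"], confidenceList=[0.2, 0.7, 0.1])
)
print(token.tag, token.confidence)
```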
class concrete.structure.ttypes.Token(tokenIndex=None, text=None, textSpan=None, rawTextSpan=None, audioSpan=None)

Bases: object


A single token (typically a word) in a communication. The exact
definition of what counts as a token is left up to the tools that
generate token sequences.

Usually, each token will include at least a text string.

Attributes:
- tokenIndex: A 0-based tokenization-relative identifier for this token that
represents the order that this token appears in the
sentence. Together with the UUID for a Tokenization, this can be
used to define pointers to specific tokens. If a Tokenization
object contains multiple Token objects with the same id (e.g., in
different n-best lists), then all of their other fields must be
identical as well.
- text: The text associated with this token.
Note - we may have a destructive tokenizer (e.g., Stanford rewriting)
and as a result, we want to maintain this field.
- textSpan: Location of this token in this perspective’s text (.text field).
In cases where this token does not correspond directly with any
text span in the text (such as word insertion during MT),
this field may be given a value indicating “approximately” where
the token comes from. A span covering the entire sentence may be
used if no more precise value seems appropriate.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the document, but is the annotation’s best
effort at such a representation.
- rawTextSpan: Location of this token in the original, raw text (.originalText
field). In cases where this token does not correspond directly
with any text span in the original text (such as word insertion
during MT), this field may be given a value indicating
“approximately” where the token comes from. A span covering the
entire sentence may be used if no more precise value seems
appropriate.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original raw document, but is the annotation’s best
effort at such a representation.
- audioSpan: Location of this token in the original audio.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TokenLattice(startState=0, endState=0, arcList=None, cachedBestPath=None)

Bases: object


A lattice structure that assigns scores to a set of token
sequences. The lattice is encoded as an FSA, where states are
identified by integers, and each arc is annotated with an
optional token and a weight. (Arcs with no token are
“epsilon” arcs.) The lattice has a single start state and a
single end state. (You can use epsilon edges to simulate
multiple start states or multiple end states, if desired.)

The score of a path through the lattice is the sum of the weights
of the arcs that make up that path. A path with a lower score
is considered “better” than a path with a higher score.

If possible, path scores should be negative log likelihoods
(with base e – e.g. if P=1, then weight=0; and if P=0.5, then
weight=0.693). Furthermore, if possible, the path scores should
be globally normalized (i.e., they should encode probabilities).
This will allow for them to be combined with other information
in a reasonable way when determining confidences for system
outputs.

TokenLattices should never contain any paths with cycles. Every
arc in the lattice should be included in some path from the start
state to the end state.
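
The scoring rules above can be sketched as follows. This is a minimal illustration, not the library's implementation: arcs are represented as hypothetical (src, dst, token, weight) tuples rather than Concrete objects, and the function names are made up for this example.

```python
import math
from collections import defaultdict


def arc_weight(probability):
    """Negative natural-log likelihood of an arc probability."""
    return -math.log(probability)


def best_path(arcs, start_state, end_state):
    """Return (score, tokens) for the lowest-scoring path, or None.

    `arcs` is a list of hypothetical (src, dst, token, weight) tuples;
    token is None for an epsilon arc. The lattice must be acyclic.
    """
    outgoing = defaultdict(list)
    for src, dst, token, weight in arcs:
        outgoing[src].append((dst, token, weight))

    memo = {}

    def search(state):
        # Best (score, tokens) from `state` to end_state, None if unreachable.
        if state == end_state:
            return 0.0, []
        if state not in memo:
            best = None
            for dst, token, weight in outgoing[state]:
                rest = search(dst)
                if rest is None:
                    continue
                score = weight + rest[0]  # path score is the sum of arc weights
                tokens = ([token] if token is not None else []) + rest[1]
                if best is None or score < best[0]:  # lower is better
                    best = (score, tokens)
            memo[state] = best
        return memo[state]

    return search(start_state)


arcs = [
    (0, 1, "new", arc_weight(0.5)),
    (0, 1, "knew", arc_weight(0.25)),
    (1, 2, "york", arc_weight(1.0)),
]
score, tokens = best_path(arcs, 0, 2)
```

Because the weights are negative log likelihoods, the lowest-scoring path is also the most probable one.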

Attributes:
- startState
- endState
- arcList
- cachedBestPath

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TokenList(tokenList=None)

Bases: object


A wrapper around a list of tokens.

Attributes:
- tokenList

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TokenRefSequence(tokenIndexList=None, anchorTokenIndex=-1, tokenizationId=None, textSpan=None, rawTextSpan=None, audioSpan=None)

Bases: object


A list of pointers to tokens that all belong to the same
tokenization.

Attributes:
- tokenIndexList: The tokenization-relative identifiers for each token that is
included in this sequence.
- anchorTokenIndex: An optional field that can be used to describe
the root of a sentence (if this sequence is a full sentence),
the head of a constituent (if this sequence is a constituent),
or some other form of “canonical” token in this sequence if,
for instance, it is not easy to map this sequence to another
annotation that has a head.

This field is defined with respect to the Tokenization given
by tokenizationId, and not to this object’s tokenIndexList.
- tokenizationId: The UUID of the tokenization that contains the tokens.
- textSpan: The text span in the main text (.text field) associated with this
TokenRefSequence.

NOTE: This span represents a best guess, or ‘provenance’: it
cannot be guaranteed that this text span matches the _exact_ text
of the original document, but is the annotation’s best effort at
such a representation.
- rawTextSpan: The text span in the original text (.originalText field)
associated with this TokenRefSequence.

NOTE: This span represents a best guess, or ‘provenance’: it
cannot be guaranteed that this text span matches the _exact_ text
of the original raw document, but is the annotation’s best effort
at such a representation.
- audioSpan: The audio span associated with this TokenRefSequence.

NOTE: This span represents a best guess, or ‘provenance’:
it cannot be guaranteed that this text span matches the _exact_
text of the original document, but is the annotation’s best
effort at such a representation.
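
The distinction between anchorTokenIndex and tokenIndexList can be illustrated with plain Python lists. The variable names below are hypothetical stand-ins for a Tokenization's tokens and a TokenRefSequence's fields, not Concrete objects.

```python
# Hypothetical stand-in for the tokens of one Tokenization (indices 0..4).
tokenization_tokens = ["The", "quick", "brown", "fox", "jumps"]

# Stand-ins for TokenRefSequence fields.
token_index_list = [1, 2, 3]   # refers to "quick", "brown", "fox"
anchor_token_index = 3         # head of the constituent: "fox"

# The anchor is looked up in the Tokenization itself.
anchor_text = tokenization_tokens[anchor_token_index]

# It is not a position within token_index_list; token_index_list[3]
# does not even exist in this example.
referenced_text = [tokenization_tokens[i] for i in token_index_list]
```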

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TokenTagging(uuid=None, metadata=None, taggedTokenList=None, taggingType=None)

Bases: object


A theory about some token-level annotation.
The TokenTagging consists of a mapping from tokens
(using token ids) to string tags (e.g. part-of-speech tags or lemmas).

The mapping defined by a TokenTagging may be partial –
i.e., some tokens may not be assigned any part of speech tags.

For lattice tokenizations, you may need to create multiple
part-of-speech taggings (for different paths through the lattice),
since the appropriate tag for a given token may depend on the path
taken. For example, you might define a separate
TokenTagging for each of the top K paths, which leaves all
tokens that are not part of the path unlabeled.

Currently, we use strings to encode annotations. In
the future, we may add fields for encoding specific tag sets
(e.g., treebank tags), or for adding compound tags.

Attributes:
- uuid: The UUID of this TokenTagging object.
- metadata: Information about where the annotation came from.
This should be used to tell between gold-standard annotations
and automatically-generated theories about the data
- taggedTokenList: The mapping from tokens to annotations.
This may be a partial mapping.
- taggingType: An ontology-backed string that represents the
type of token taggings this TokenTagging object
produces.

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.Tokenization(uuid=None, metadata=None, tokenList=None, lattice=None, kind=None, tokenTaggingList=None, parseList=None, dependencyParseList=None, spanLinkList=None)

Bases: object


A theory (or set of alternative theories) about the sequence of
tokens that make up a sentence.

This message type is used to record the output not just of
tokenizers, but also of a wide variety of other tools, including
machine translation systems, text normalizers, part-of-speech
taggers, and stemmers.

Each Tokenization is encoded using either a TokenList
or a TokenLattice. (If you want to encode an n-best list, then
you should store it as n separate Tokenization objects.) The
“kind” field is used to indicate whether this Tokenization contains
a list of tokens or a TokenLattice.

The confidence value for each sequence is determined by combining
the confidence from the “metadata” field with confidence
information from individual token sequences as follows:

- For n-best lists: metadata.confidence
- For lattices: metadata.confidence * exp(-sum(arc.weight))
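
As a worked example of the lattice formula, the following sketch computes the confidence of one path. The function name and the sample numbers are hypothetical; the arc weights are assumed to be negative natural-log likelihoods, as recommended for TokenLattice.

```python
import math


def lattice_path_confidence(metadata_confidence, arc_weights):
    """metadata.confidence * exp(-sum(arc.weight)) for one lattice path."""
    return metadata_confidence * math.exp(-sum(arc_weights))


# Two arcs with probabilities 0.5 and 0.25, encoded as negative log likelihoods.
weights = [-math.log(0.5), -math.log(0.25)]

# exp(-sum(weights)) recovers the path probability 0.5 * 0.25 = 0.125,
# which is then scaled by the metadata confidence.
confidence = lattice_path_confidence(0.8, weights)
```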

Note: in some cases (such as the output of a machine translation
tool), the order of the tokens in a token sequence may not
correspond with the order of their original text span offsets.

Attributes:
- uuid
- metadata: Information about where this tokenization came from.
- tokenList: A wrapper around an ordered list of the tokens in this tokenization.
This may also give easy access to the “reconstructed text” associated
with this tokenization.
This field should only have a value if kind==TOKEN_LIST.
- lattice: A lattice that compactly describes a set of token sequences that
might make up this tokenization. This field should only have a
value if kind==TOKEN_LATTICE.
- kind: Enumerated value indicating whether this tokenization is
implemented using a token list or a lattice.
- tokenTaggingList
- parseList
- dependencyParseList
- spanLinkList

read(iprot)
validate()
write(oprot)
class concrete.structure.ttypes.TokenizationKind

Bases: object


Enumerated types of Tokenizations

TOKEN_LATTICE = 2
TOKEN_LIST = 1
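
The rule that a Tokenization's ‘tokenList’ and ‘lattice’ fields should each only be set for the matching kind can be sketched as a small check. The constants mirror the enumerated values listed above; the function name is hypothetical and this check is not part of the library.

```python
# Values copied from TokenizationKind above.
TOKEN_LIST = 1
TOKEN_LATTICE = 2


def fields_match_kind(kind, token_list, lattice):
    """Return True if no field is populated for the wrong kind."""
    if token_list is not None and kind != TOKEN_LIST:
        return False
    if lattice is not None and kind != TOKEN_LATTICE:
        return False
    return True


ok = fields_match_kind(TOKEN_LIST, token_list=["a", "b"], lattice=None)
mismatch = fields_match_kind(TOKEN_LIST, token_list=None, lattice=object())
```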
Module contents

concrete.twitter package

Submodules
concrete.twitter.constants module
concrete.twitter.ttypes module
class concrete.twitter.ttypes.BoundingBox(type=None, coordinateList=None)

Bases: object


Attributes:
- type
- coordinateList

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.HashTag(text=None, startOffset=None, endOffset=None)

Bases: object


Attributes:
- text
- startOffset
- endOffset

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.PlaceAttributes(streetAddress=None, region=None, locality=None)

Bases: object


Attributes:
- streetAddress
- region
- locality

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TweetInfo(id=None, text=None, createdAt=None, user=None, truncated=None, entities=None, source=None, coordinates=None, place=None, favorited=None, retweeted=None, retweetCount=None, inReplyToScreenName=None, inReplyToStatusId=None, inReplyToUserId=None, retweetedScreenName=None, retweetedStatusId=None, retweetedUserId=None)

Bases: object


Attributes:
- id
- text
- createdAt
- user
- truncated
- entities
- source
- coordinates
- place
- favorited
- retweeted
- retweetCount
- inReplyToScreenName
- inReplyToStatusId
- inReplyToUserId
- retweetedScreenName
- retweetedStatusId
- retweetedUserId

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterCoordinates(type=None, coordinates=None)

Bases: object


Attributes:
- type
- coordinates

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterEntities(hashtagList=None, urlList=None, userMentionList=None)

Bases: object


Attributes:
- hashtagList
- urlList
- userMentionList

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterLatLong(latitude=None, longitude=None)

Bases: object


A twitter geocoordinate.

Attributes:
- latitude
- longitude

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterPlace(placeType=None, countryCode=None, country=None, fullName=None, name=None, id=None, url=None, boundingBox=None, attributes=None)

Bases: object


Attributes:
- placeType
- countryCode
- country
- fullName
- name
- id
- url
- boundingBox
- attributes

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.TwitterUser(id=None, name=None, screenName=None, lang=None, geoEnabled=None, createdAt=None, friendsCount=None, statusesCount=None, verified=None, listedCount=None, favouritesCount=None, followersCount=None, location=None, timeZone=None, description=None, utcOffset=None, url=None)

Bases: object


Information about a Twitter user.

Attributes:
- id
- name
- screenName
- lang
- geoEnabled
- createdAt
- friendsCount
- statusesCount
- verified
- listedCount
- favouritesCount
- followersCount
- location
- timeZone
- description
- utcOffset
- url

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.URL(startOffset=None, endOffset=None, expandedUrl=None, url=None, displayUrl=None)

Bases: object


Attributes:
- startOffset
- endOffset
- expandedUrl
- url
- displayUrl

read(iprot)
validate()
write(oprot)
class concrete.twitter.ttypes.UserMention(startOffset=None, endOffset=None, screenName=None, name=None, id=None)

Bases: object


Attributes:
- startOffset
- endOffset
- screenName
- name
- id

read(iprot)
validate()
write(oprot)
Module contents

concrete.util package

Submodules
concrete.util.access module
class concrete.util.access.CommunicationContainerFetchHandler(communication_container)

Bases: object

FetchCommunicationService implementation using Communication containers

Implements the FetchCommunicationService interface, retrieving Communications from a dict-like communication_container object that maps Communication ID strings to Communications. The communication_container could be an actual dict, or one of the container classes in concrete.util.comm_container, such as DirectoryBackedCommunicationContainer or ZipFileBackedCommunicationContainer.

Usage:

from concrete.util.access import CommunicationContainerFetchHandler
from concrete.util.access_wrapper import FetchCommunicationServiceWrapper

handler = CommunicationContainerFetchHandler(comm_container)
fetch_service = FetchCommunicationServiceWrapper(handler)
fetch_service.serve(host, port)
Parameters:communication_container – Dict-like object that maps Communication IDs to Communications
about()
alive()
fetch(fetch_request)
getCommunicationCount()
getCommunicationIDs(offset, count)
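
How a handler might answer getCommunicationCount() and getCommunicationIDs(offset, count) from a dict-like container can be sketched as below. SketchFetchHandler is a hypothetical stand-in, not the library class; in particular, it assumes IDs are paged in the container's iteration order, which the documentation does not specify.

```python
class SketchFetchHandler(object):
    """Hypothetical stand-in that pages over a dict-like container."""

    def __init__(self, communication_container):
        self.communication_container = communication_container

    def getCommunicationCount(self):
        return len(self.communication_container)

    def getCommunicationIDs(self, offset, count):
        # Assumption: IDs are returned in the container's iteration order.
        ids = list(self.communication_container)
        return ids[offset:offset + count]


handler = SketchFetchHandler({"doc1": "comm1", "doc2": "comm2", "doc3": "comm3"})
page = handler.getCommunicationIDs(1, 2)
```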
class concrete.util.access.DirectoryBackedStoreHandler(store_path)

Bases: object

Simple StoreCommunicationService implementation using a directory

Implements the StoreCommunicationService interface, storing Communications in a directory.

Parameters:store_path – Path where Communications should be stored
about()
alive()
store(communication)

Save Communication to a directory

Stored Communication files will be named [COMMUNICATION_ID].comm. If a file with that name already exists, it will be overwritten.
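
The naming and overwrite behavior can be sketched with the standard library alone. The helper store_bytes is hypothetical and takes already-serialized bytes; the real method takes a Communication object.

```python
import os


def store_bytes(store_path, communication_id, serialized_comm):
    """Write serialized bytes to [store_path]/[communication_id].comm.

    Hypothetical helper: opening with 'wb' truncates any existing file,
    which gives the overwrite behavior described above.
    """
    comm_path = os.path.join(store_path, communication_id + ".comm")
    with open(comm_path, "wb") as f:
        f.write(serialized_comm)
    return comm_path
```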

class concrete.util.access.RelayFetchHandler(host, port)

Bases: object

Implements a ‘relay’ to another FetchCommunicationService server.

A FetchCommunicationService that acts as a relay to a second FetchCommunicationService, where the second service is using the TSocket transport and TCompactProtocol protocol.

This class was designed for the use case where you have Thrift JavaScript code that needs to communicate with a FetchCommunicationService server, but the server does not support the same Thrift serialization protocol as the JavaScript client.

The de-facto standard for Concrete services is to use the TCompactProtocol serialization protocol over a TSocket connection. But as of Thrift 0.10.0, the Thrift JavaScript libraries only support using TJSONProtocol over HTTP.

The RelayFetchHandler class is intended to be used as server-side code by a web application. The JavaScript code will make FetchCommunicationService RPC calls to the web server using HTTP/TJSONProtocol, and the web application will then pass these RPC calls to another FetchCommunicationService using TSocket/TCompactProtocol RPC calls.

Parameters:
about()
alive()
fetch(request)
getCommunicationCount()
getCommunicationIDs(offset, count)
concrete.util.access_wrapper module
class concrete.util.access_wrapper.FetchCommunicationClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.access.FetchCommunicationService'>
class concrete.util.access_wrapper.FetchCommunicationServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.access.FetchCommunicationService'>
class concrete.util.access_wrapper.StoreCommunicationClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.access.StoreCommunicationService'>
class concrete.util.access_wrapper.StoreCommunicationServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.access.StoreCommunicationService'>
class concrete.util.access_wrapper.SubprocessFetchCommunicationServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of FetchCommunicationServiceWrapper

class concrete.util.access_wrapper.SubprocessStoreCommunicationServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of StoreCommunicationServiceWrapper

concrete.util.annotate_wrapper module
class concrete.util.annotate_wrapper.AnnotateCommunicationClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.annotate.AnnotateCommunicationService'>
class concrete.util.annotate_wrapper.AnnotateCommunicationServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.annotate.AnnotateCommunicationService'>
class concrete.util.annotate_wrapper.SubprocessAnnotateCommunicationServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of AnnotateCommunicationServiceWrapper

concrete.util.comm_container module

Communication Containers - mapping Communication IDs to Communications

Classes that behave like a read-only dictionary (implementing Python’s collections.Mapping interface) and map Communication ID strings to Communications.

The classes abstract away the storage backend. If you need to optimize for performance, you may not want to use a dictionary abstraction that retrieves one Communication at a time.
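
The read-only dictionary abstraction can be sketched with a minimal Mapping subclass. This is an illustration only: LazySketchContainer, ids, and loader are hypothetical names, with loader(comm_id) standing in for whatever a storage backend does to retrieve one Communication. The import uses collections.abc.Mapping, the Python 3 location of the interface referred to above as collections.Mapping.

```python
from collections.abc import Mapping


class LazySketchContainer(Mapping):
    """Read-only mapping that loads a value each time it is requested."""

    def __init__(self, ids, loader):
        self._ids = list(ids)
        self._id_set = set(self._ids)
        self._loader = loader

    def __getitem__(self, comm_id):
        if comm_id not in self._id_set:
            raise KeyError(comm_id)
        # Nothing is loaded until a specific ID is requested.
        return self._loader(comm_id)

    def __iter__(self):
        return iter(self._ids)

    def __len__(self):
        return len(self._ids)


container = LazySketchContainer(
    ["doc1", "doc2"], loader=lambda comm_id: "contents of " + comm_id)
value = container["doc2"]
```

Implementing only __getitem__, __iter__, and __len__ is enough; Mapping supplies keys(), items(), get(), and the in operator. Each lookup costs one backend call, which is why a one-at-a-time abstraction may be a poor fit for bulk retrieval.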

class concrete.util.comm_container.DirectoryBackedCommunicationContainer(directory_path, comm_extensions=[u'.comm', u'.concrete', u'.gz'])

Bases: _abcoll.Mapping

Maps Comm IDs to Comms, retrieving Comms from the filesystem

DirectoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from the filesystem.

Upon initialization, a DirectoryBackedCommunicationContainer instance will (recursively) search directory_path for any files that end with the specified comm_extensions. Files with matching extensions are assumed to be Communication files whose filename (sans extension) is the file’s Communication ID. So, for example, a file named ‘XIN_ENG_20101212.0120.concrete’ is assumed to be a Communication file with a Communication ID of ‘XIN_ENG_20101212.0120’.

Files with the extension .gz will be decompressed using gzip.

A DirectoryBackedCommunicationContainer will not be able to find any files that are added to directory_path after the container was initialized.
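
The filename-to-ID convention described above can be sketched as follows. The function find_comm_files is hypothetical and is not the class's implementation; it assumes the ID is the filename with the first matching extension removed. How names with two extensions (such as .concrete.gz) are treated is not stated in the text, so this sketch makes no claim about that case.

```python
import os


def find_comm_files(directory_path,
                    comm_extensions=(".comm", ".concrete", ".gz")):
    """Map Communication IDs to file paths found under directory_path.

    Sketch only: the ID is the filename with the first matching
    extension removed.
    """
    id_to_path = {}
    # os.walk visits directory_path and all of its subdirectories.
    for dirpath, _dirnames, filenames in os.walk(directory_path):
        for filename in filenames:
            for extension in comm_extensions:
                if filename.endswith(extension):
                    comm_id = filename[:-len(extension)]
                    id_to_path[comm_id] = os.path.join(dirpath, filename)
                    break
    return id_to_path
```

Because the scan happens once, a mapping built this way shares the limitation noted above: files added later are not seen.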

Parameters:
  • directory_path (str) – Path to directory containing Communications files
  • comm_extensions (str[]) – List of strings specifying filename extensions to be associated with Communications
class concrete.util.comm_container.FetchBackedCommunicationContainer(host, port)

Bases: _abcoll.Mapping

Maps Comm IDs to Comms, retrieving Comms from a FetchCommunicationService server

FetchBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from a FetchCommunicationService.

If you need to retrieve large amounts of data from a FetchCommunicationService, then you SHOULD NOT USE THIS CLASS. This class retrieves one Communication at a time using FetchCommunicationService.

Parameters:
class concrete.util.comm_container.MemoryBackedCommunicationContainer(communications_file, max_file_size=1073741824)

Bases: _abcoll.Mapping

Maps Comm IDs to Comms by loading all Comms in file into memory

MemoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. All Communications in communications_file will be read into memory using a CommunicationReader instance.

Parameters:
  • communications_file (str) – String specifying name of Communications file
  • max_file_size (int) – Maximum file size, in bytes
class concrete.util.comm_container.RedisHashBackedCommunicationContainer(redis_db, key)

Bases: _abcoll.Mapping

Maps Comm IDs to Comms, retrieving Comms from a Redis hash

RedisHashBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from a Redis hash.

Parameters:
  • redis_db (redis.Redis) – redis database connection
  • key (str) – Key in redis database where hash is located
class concrete.util.comm_container.ZipFileBackedCommunicationContainer(zipfile_path, comm_extensions=[u'.comm', u'.concrete'])

Bases: _abcoll.Mapping

Maps Comm IDs to Comms, retrieving Comms from a Zip file

ZipFileBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from a Zip file.

Parameters:
  • zipfile_path (str) – Path to Zip file containing Communications
  • comm_extensions (str[]) – List of strings specifying filename extensions associated with Communications
concrete.util.concrete_uuid module

Helper functions for generating Concrete UUID objects

class concrete.util.concrete_uuid.AnalyticUUIDGeneratorFactory(comm=None)

Bases: object

Factory for a compressible UUID generator.

One factory should be created per Communication, and a new generator should be created from that factory for each analytic processing the communication. Usually each program represents a single analytic, so common usage is:

augf = AnalyticUUIDGeneratorFactory(comm)
aug = augf.create()
for <each annotation object created by this analytic>:
    annotation.uuid = next(aug)
    <add annotation to communication>

or if you’re creating a new Communication:

augf = AnalyticUUIDGeneratorFactory()
aug = augf.create()
comm = <create communication>
comm.uuid = next(aug)
for <each annotation object created by this analytic>:
    annotation.uuid = next(aug)
    <add annotation to communication>

where the annotation objects might be objects of type Parse, DependencyParse, TokenTagging, CommunicationTagging, etc.
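
The factory/generator pattern can be sketched with the standard library. Everything here is an assumption made for illustration: the text does not spell out the compressible UUID scheme, so the sketch supposes that UUIDs compress well when all UUIDs from one Communication share a prefix and all UUIDs from one analytic share a longer prefix. The class name and the 12/8/12 hex-digit split are hypothetical, and the generator yields plain strings rather than Concrete UUID objects.

```python
import random

HEX_DIGITS = "0123456789abcdef"


class SketchUUIDGeneratorFactory(object):
    """Hypothetical factory: one per Communication."""

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else random.Random()
        # Assumed: 12 hex digits shared by every UUID in the Communication.
        self._comm_part = self._hex(12)

    def _hex(self, n):
        return "".join(self._rng.choice(HEX_DIGITS) for _ in range(n))

    def create(self):
        """Return a generator of UUID strings for one analytic."""
        # Assumed: 8 hex digits shared by every UUID from this analytic.
        analytic_part = self._hex(8)
        counter = int(self._hex(12), 16)
        return self._generate(analytic_part, counter)

    def _generate(self, analytic_part, counter):
        while True:
            tail = "%012x" % (counter % 16 ** 12)
            yield "-".join((self._comm_part[:8], self._comm_part[8:],
                            analytic_part[:4], analytic_part[4:], tail))
            counter += 1  # consecutive UUIDs differ only in the tail


factory = SketchUUIDGeneratorFactory()
aug = factory.create()
uuid_strings = [next(aug) for _ in range(3)]
```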

create()
Returns:A UUID generator for a new analytic.
class concrete.util.concrete_uuid.UUIDClustering(comm)

Bases: object

Representation of the UUID instance clusters in a concrete communication (each cluster represents the set of nested members of the communication that reference or are identified by a given UUID).

hashable_clusters()

Hashable version of UUIDClustering.

Two UUIDClusterings c1 and c2 are equivalent (the two underlying Communications’ UUID structures are equivalent) if and only if:

c1.hashable_clusters() == c2.hashable_clusters()
Returns:The set of unlabeled UUID clusters in a unique and hashable format.
class concrete.util.concrete_uuid.UUIDCompressor(single_analytic=False)

Bases: object

compress(comm)
Parameters:comm (Communication) –
Returns:Deep copy of comm with compressed UUIDs
Return type:Communication
concrete.util.concrete_uuid.bin_to_hex(b, n=None)
concrete.util.concrete_uuid.compress_uuids(comm, verify=False, single_analytic=False)

Create a copy of Communication comm with UUIDs converted according to the compressible UUID scheme

Parameters:
  • comm (Communication) –
  • verify (bool) – If True, use a heuristic to verify the UUID link structure is preserved in the new Communication
  • single_analytic (bool) – If True, use a single analytic prefix for all UUIDs in comm.
Returns:

A 2-tuple containing the new Communication (converted using the compressible UUID scheme) and the UUIDCompressor object used to perform the conversion.

Raises:

ValueError – If verify is True and comm has references added, raise because verification would cause an infinite loop.

concrete.util.concrete_uuid.generate_UUID()

Helper function for generating a Concrete UUID object

Returns:Concrete UUID object
Return type:UUID
concrete.util.concrete_uuid.generate_hex_unif(n)
concrete.util.concrete_uuid.generate_uuid_unif()
concrete.util.concrete_uuid.hex_to_bin(h)
concrete.util.concrete_uuid.join_uuid(xs, ys, zs)
concrete.util.concrete_uuid.split_uuid(u)
concrete.util.file_io module

Code for reading and writing Concrete Communications

class concrete.util.file_io.CommunicationReader(filename, add_references=True, filetype=0)

Bases: concrete.util.file_io.ThriftReader

Iterator/generator class for reading one or more Communications from a file

The iterator returns a (Communication, filename) tuple

Supported filetypes are:

  • a file with a single Communication
  • a file with multiple Communications concatenated together
  • a gzipped file with a single Communication
  • a gzipped file with multiple Communications concatenated together
  • a .tar.gz file with one or more Communications
  • a .zip file with one or more Communications

Sample usage:

for (comm, filename) in CommunicationReader('multiple_comms.tar.gz'):
    do_something(comm)
Parameters:
class concrete.util.file_io.CommunicationWriter(filename=None)

Bases: object

Class for writing one or more Communications to a file

Sample usage:

writer = CommunicationWriter('foo.concrete')
writer.write(existing_comm_object)
writer.close()
close()
open(filename)
Parameters:filename (str) –
write(comm)
Parameters:comm (Communication) –
class concrete.util.file_io.CommunicationWriterTGZ(tar_filename=None)

Bases: concrete.util.file_io.CommunicationWriterTar

Class for writing one or more Communications to a .TAR.GZ archive

Sample usage:

writer = CommunicationWriterTGZ('multiple_comms.tgz')
writer.write(comm_object_one, 'comm_one.concrete')
writer.write(comm_object_two, 'comm_two.concrete')
writer.write(comm_object_three, 'comm_three.concrete')
writer.close()
class concrete.util.file_io.CommunicationWriterTar(tar_filename=None, gzip=False)

Bases: object

Class for writing one or more Communications to a .TAR archive

Sample usage:

writer = CommunicationWriterTar('multiple_comms.tar')
writer.write(comm_object_one, 'comm_one.concrete')
writer.write(comm_object_two, 'comm_two.concrete')
writer.write(comm_object_three, 'comm_three.concrete')
writer.close()
Parameters:
  • tar_filename (str) – If a filename is given, open() will be called with the filename
  • gzip (bool) – Flag indicating if .TAR file should be compressed with gzip
close()
open(tar_filename)
Parameters:tar_filename (str) –
write(comm, comm_filename=None)
Parameters:
class concrete.util.file_io.ThriftReader(thrift_type, filename, postprocess=None, filetype=0)

Bases: object

Iterator/generator class for reading one or more Thrift structures from a file

The iterator returns an (obj, filename) tuple where obj is an object of type thrift_type.

Supported filetypes are:

  • a file with a single Thrift structure
  • a file with multiple Thrift structures concatenated together
  • a gzipped file with a single Thrift structure
  • a gzipped file with multiple Thrift structures concatenated together
  • a .tar.gz file with one or more Thrift structures
  • a .zip file with one or more Thrift structures

Sample usage:

for (comm, filename) in ThriftReader(Communication,
                                     'multiple_comms.tar.gz'):
    do_something(comm)
Parameters:
  • thrift_type – Class for Thrift type, e.g. Communication, TokenLattice
  • filename (str) –
  • postprocess (function) – A post-processing function that is called with the Thrift object as argument each time a Thrift object is read from the file
  • filetype (FileType) – Expected type of file. Default value is FileType.AUTO, where function will try to automatically determine file type.
next()
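
One way automatic file type detection could work is sketched below using only the standard library. This is not the reader's actual logic: guess_container and its string labels are hypothetical, and the sketch only distinguishes zip archives, tar archives (compressed or not), gzipped streams, and everything else.

```python
import gzip
import tarfile
import zipfile


def guess_container(filename):
    """Guess how a file of Thrift structures is packaged.

    Returns one of the hypothetical labels 'zip', 'tar', 'gzip', 'stream'.
    """
    if zipfile.is_zipfile(filename):
        return "zip"
    # tarfile.is_tarfile also recognizes compressed archives such as .tar.gz.
    if tarfile.is_tarfile(filename):
        return "tar"
    with open(filename, "rb") as f:
        # Gzip data starts with the two magic bytes 0x1f 0x8b.
        if f.read(2) == b"\x1f\x8b":
            return "gzip"
    # Otherwise assume one or more structures concatenated in a plain file.
    return "stream"
```

Inspecting the contents rather than the filename extension means a misnamed file is still classified by what it actually contains.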
concrete.util.file_io.read_communication_from_file(communication_filename, add_references=True)

Read a Communication from the file specified by communication_filename

Parameters:
Returns:

Return type:

Communication

concrete.util.file_io.read_thrift_from_file(thrift_obj, filename)

Instantiate Thrift object from contents of named file

The Thrift file is assumed to be encoded using TCompactProtocol

WARNING - Thrift deserialization tends to fail silently. For example, the Thrift libraries will not complain if you try to deserialize data from the file /dev/urandom.

Parameters:
  • thrift_obj – A Thrift object (e.g. a Communication object)
  • filename (str) – A filename string
Returns:

The Thrift object that was passed in as an argument

concrete.util.file_io.read_tokenlattice_from_file(tokenlattice_filename)

Read a TokenLattice from a file

Parameters:tokenlattice_filename (str) – Name of file containing serialized TokenLattice
Returns:
Return type:TokenLattice
concrete.util.file_io.write_communication_to_file(communication, communication_filename)

Write a Communication to a file

Parameters:
  • communication (Communication) –
  • communication_filename (str) –
concrete.util.file_io.write_thrift_to_file(thrift_obj, filename)

Write a Thrift object to a file

Parameters:
  • thrift_obj
  • filename (str) –
concrete.util.json_fu module

Convert Concrete objects to JSON strings

concrete.util.json_fu.communication_file_to_json(communication_filename, remove_timestamps=False, remove_uuids=False)

Get a “pretty-printed” JSON string representation for a Communication

Parameters:
  • communication_filename (str) – Communication filename
  • remove_timestamps (bool) – Flag for removing timestamps from JSON output
  • remove_uuids (bool) – Flag for removing UUID info from JSON output
Returns:

A “pretty-printed” JSON representation of the Communication

Return type:

str

concrete.util.json_fu.get_json_object_without_timestamps(json_object)

Create a copy of a JSON object created by json.loads(), with all representations of AnnotationMetadata timestamps (dictionary entries whose key is ‘timestamp’) recursively removed.

Parameters:json_object – Python object created from string by json.loads()
Returns:A copy of the input data structure with all timestamp objects removed
concrete.util.json_fu.get_json_object_without_uuids(json_object)

Create a copy of a JSON object created by json.loads(), with all representations of UUID objects (dictionaries containing a ‘uuidString’ key) recursively removed.

Parameters:json_object – Python object created from string by json.loads()
Returns:A copy of the input data structure with all UUID objects removed
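
The two recursive filters can be sketched with the standard library. The function names below are hypothetical and this is not the module's implementation. It assumes that timestamps appear as dictionary entries keyed ‘timestamp’ and that a UUID object is any nested dictionary containing a ‘uuidString’ key.

```python
import json


def without_keys_named(json_object, key_name):
    """Copy json_object, dropping every dictionary entry keyed key_name."""
    if isinstance(json_object, dict):
        return {k: without_keys_named(v, key_name)
                for k, v in json_object.items() if k != key_name}
    if isinstance(json_object, list):
        return [without_keys_named(v, key_name) for v in json_object]
    return json_object


def without_dicts_containing(json_object, marker_key):
    """Copy json_object, dropping nested dictionaries that hold marker_key."""
    def is_marked(value):
        return isinstance(value, dict) and marker_key in value

    if isinstance(json_object, dict):
        return {k: without_dicts_containing(v, marker_key)
                for k, v in json_object.items() if not is_marked(v)}
    if isinstance(json_object, list):
        return [without_dicts_containing(v, marker_key)
                for v in json_object if not is_marked(v)]
    return json_object


data = json.loads(
    '{"id": "doc1", "uuid": {"uuidString": "abc"},'
    ' "metadata": {"tool": "t", "timestamp": 1}}')
no_timestamps = without_keys_named(data, "timestamp")
no_uuids = without_dicts_containing(data, "uuidString")
```

Both helpers build new containers instead of mutating their input, so the object returned by json.loads() is left unchanged.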
concrete.util.json_fu.thrift_to_json(tobj, remove_timestamps=False, remove_uuids=False)

Get a “pretty-printed” JSON string representation for a Thrift object

Parameters:
  • tobj – A Thrift object
  • remove_timestamps (bool) – Flag for removing timestamps from JSON output
  • remove_uuids (bool) – Flag for removing UUID info from JSON output
Returns:

A “pretty-printed” JSON representation of the Thrift object

Return type:

str

concrete.util.json_fu.tokenlattice_file_to_json(toklat_filename)

Get a “pretty-printed” JSON string representation for a TokenLattice

Parameters:toklat_filename (str) – String specifying TokenLattice filename
Returns:A “pretty-printed” JSON representation of the TokenLattice
Return type:str
concrete.util.learn_wrapper module
class concrete.util.learn_wrapper.ActiveLearnerClientClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.learn.ActiveLearnerClientService'>
class concrete.util.learn_wrapper.ActiveLearnerClientServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.learn.ActiveLearnerClientService'>
class concrete.util.learn_wrapper.ActiveLearnerServerClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.learn.ActiveLearnerServerService'>
class concrete.util.learn_wrapper.ActiveLearnerServerServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.learn.ActiveLearnerServerService'>
class concrete.util.learn_wrapper.SubprocessActiveLearnerClientServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of ActiveLearnerClientServiceWrapper

class concrete.util.learn_wrapper.SubprocessActiveLearnerServerServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of ActiveLearnerServerServiceWrapper

concrete.util.locale module
concrete.util.locale.set_stdout_encoding()
concrete.util.mem_io module
concrete.util.mem_io.communication_deep_copy(comm)

Return deep copy of communication.

concrete.util.mem_io.read_communication_from_buffer(buf, add_references=True)

Deserialize buf (a binary string) and return resulting communication. Add references if requested.

concrete.util.mem_io.write_communication_to_buffer(comm)

Serialize communication to buffer (binary string) and return buffer.

concrete.util.metadata module
concrete.util.metadata.datetime_to_timestamp(dt)

Source: http://stackoverflow.com/questions/6999726/how-can-i-convert-a-datetime-object-to-milliseconds-since-epoch-unix-time-in-p
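
A minimal sketch of the epoch conversion discussed in the linked answer, assuming a naive datetime expressed in UTC and a result in whole seconds. The name `to_epoch_seconds` is hypothetical; the unit and rounding used by datetime_to_timestamp() itself are not stated here.

```python
from datetime import datetime

# Reference point: the Unix epoch as a naive UTC datetime.
EPOCH = datetime(1970, 1, 1)


def to_epoch_seconds(dt):
    """Return whole seconds elapsed between the epoch and dt (naive UTC)."""
    return int((dt - EPOCH).total_seconds())


print(to_epoch_seconds(datetime(1970, 1, 2)))
```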

concrete.util.metadata.get_index_of_tool(lst_of_conc, tool)

Return the index of the object in the provided list whose tool name matches tool.

If tool is None, return the first valid index into lst_of_conc.

This returns -1 if:
  • lst_of_conc is None, or
  • lst_of_conc has no entries, or
  • no object in lst_of_conc matches tool.

Args:

  • lst_of_conc: A list of Concrete objects, each of which has a .metadata field.
  • tool: A tool name to match.
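
The lookup rules above can be sketched as follows. `index_of_tool` and the SimpleNamespace stand-ins are hypothetical; real Concrete objects carry an AnnotationMetadata in their .metadata field.

```python
from types import SimpleNamespace


def index_of_tool(lst_of_conc, tool):
    """Return the index of the first object whose metadata.tool matches."""
    if not lst_of_conc:
        # Covers both None and an empty list.
        return -1
    if tool is None:
        # No filter requested: the first valid index is 0.
        return 0
    for i, obj in enumerate(lst_of_conc):
        if obj.metadata.tool == tool:
            return i
    return -1


def annotated(tool):
    # Stand-in for a Concrete object with a .metadata.tool field.
    return SimpleNamespace(metadata=SimpleNamespace(tool=tool))


items = [annotated('tokenizer-a'), annotated('tagger-b')]
print(index_of_tool(items, 'tagger-b'))
```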
concrete.util.metadata.now_timestamp()

Return timestamp representing the current time.

concrete.util.net module
concrete.util.net.find_port()

Find and return an available TCP port.

>>> find_port() > 1023
True
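
One common way to obtain an available TCP port is to bind a socket to port 0 and let the operating system choose. The sketch below shows that technique with the standard library only; `find_free_port` is a hypothetical name, and whether find_port() works this way is an assumption.

```python
import socket


def find_free_port():
    """Ask the OS for an unused TCP port on the loopback interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 tells the OS to pick any free port.
        sock.bind(('127.0.0.1', 0))
        return sock.getsockname()[1]
    finally:
        sock.close()


port = find_free_port()
print(port)
```

The port is released before the function returns, so another process could claim it before the caller binds to it.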
concrete.util.redis_io module
class concrete.util.redis_io.RedisCommunicationReader(redis_db, key, add_references=True, **kwargs)

Bases: concrete.util.redis_io.RedisReader

Iterable class for reading one or more Communications from redis. See RedisReader for further description.

Example usage:

from redis import Redis
redis_db = Redis(port=12345)
for comm in RedisCommunicationReader(redis_db, 'my-set-key'):
    do_something(comm)

Create communication reader for specified key in specified redis_db.

Parameters:
  • redis_db – object of class redis.Redis
  • key – name of redis key containing your communication(s)
  • add_references – boolean, True to fill in members in the communication according to UUID relationships (see concrete.util.add_references), False to return communication as-is (note: you may need this False if you are dealing with incomplete communications)

All other keyword arguments are passed through to RedisReader.

class concrete.util.redis_io.RedisCommunicationWriter(redis_db, key, uuid_hash_key=False, **kwargs)

Bases: concrete.util.redis_io.RedisWriter

Class for writing one or more Communications to redis. See RedisWriter for further description.

Example usage:

from redis import Redis
redis_db = Redis(port=12345)
w = RedisCommunicationWriter(redis_db, 'my-set-key')
w.write(comm)

Create communication writer for specified key in specified redis_db.

Parameters:
  • redis_db – object of class redis.Redis
  • key – name of redis key containing your communication(s)
  • uuid_hash_key – boolean, True to use the UUID as the hash key for a communication, False to use the id
class concrete.util.redis_io.RedisReader(redis_db, key, key_type=None, pop=False, block=False, right_to_left=True, block_timeout=0, temp_key_ttl=3600, temp_key_leaf_len=32, cycle_list=False, deserialize_func=None)

Bases: object

Iterable class for reading one or more objects from redis.

Supported input types are:

  • a set containing zero or more objects
  • a list containing zero or more objects
  • a hash containing zero or more key-object pairs

For list and set types, the reader can optionally pop (consume) its input; for lists only, the reader can moreover block on the input.

Note that iteration over a set or hash will create a temporary key in the redis database to maintain a set of elements scanned so far.

If pop is False and the key (in the database) is modified during iteration, behavior is undefined. If pop is True, modifications during iteration are encouraged.

Example usage:

from redis import Redis
redis_db = Redis(port=12345)
for obj in RedisReader(redis_db, 'my-set-key'):
    do_something(obj)

Create reader for specified key in specified redis_db.

Parameters:
  • redis_db – object of class redis.Redis
  • key – name of redis key containing your object(s)
  • key_type – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists, so probably not suitable for block and/or pop modes)
  • pop – boolean, True to remove objects from redis as we iterate over them, and False to leave redis unaltered
  • block – boolean, True to block for data (i.e., wait for something to be added to the list if it is empty), False to end iteration when there is no more data
  • right_to_left – boolean, True to iterate over and index in lists from right to left, False to iterate/index from left to right
  • deserialize_func – function, maps blobs from redis to some more friendly representation (e.g., if all your items are unicode strings, you might want to specify lambda s: s.decode('utf-8')); return blobs unchanged if deserialize_func is None
batch(n)

Return a batch of n objects. May be faster than one-at-a-time iteration, but currently only supported for non-popping, non-blocking set configurations. Support for popping, non-blocking sets is planned; see http://redis.io/commands/spop .

Parameters:n
class concrete.util.redis_io.RedisWriter(redis_db, key, key_type=None, right_to_left=True, serialize_func=None, hash_key_func=None)

Bases: object

Class for writing one or more objects to redis.

Supported input types are:

  • a set of objects
  • a list of objects
  • a hash of key-object pairs

Example usage:

from redis import Redis
redis_db = Redis(port=12345)
w = RedisWriter(redis_db, 'my-set-key')
w.write(obj)

Create object writer for specified key in specified redis_db.

Parameters:
  • redis_db – object of class redis.Redis
  • key – name of redis key containing your object(s)
  • key_type – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists)
  • right_to_left – boolean, True to write elements to the left end of lists, False to write to the right end
  • serialize_func – function, maps objects to blobs before sending to Redis (e.g., if everything you write will be a unicode string, you might want to use lambda u: u.encode('utf-8')); pass objects to Redis unchanged if serialize_func is None
  • hash_key_func – function, maps objects to keys when key_type is hash (None: use Python’s hash function)
clear()
write(obj)
concrete.util.redis_io.read_communication_from_redis_key(redis_db, key, add_references=True)

Return a serialized communication from a string key. If block is True, poll server until key appears at specified interval or until specified timeout (indefinitely if timeout is zero). Return None if block is False and key does not exist or if block is True and key does not exist after specified timeout.

Parameters:
  • redis_db
  • key
  • add_references
concrete.util.redis_io.write_communication_to_redis_key(redis_db, key, comm)

Serialize communication and store result in redis key.

concrete.util.references module

Add reference variables for each UUID “pointer” in a Communication

concrete.util.references.add_references_to_communication(comm)

Create references for each UUID ‘pointer’

Parameters:comm (Communication) – A Concrete Communication object

The Concrete schema uses UUID objects as internal pointers between Concrete objects. This function adds member variables to Concrete objects that are references to the Concrete objects identified by the UUID.

For example, each Entity has a mentionIdList that lists the UUIDs of the EntityMention objects for that Entity. This function adds a mentionList variable to the Entity that is a list of references to the actual EntityMention objects. This allows you to access the EntityMention objects using:

entity.mentionList

This function adds these reference variables:

And adds these lists of reference variables:

For variables that represent optional lists of UUID objects (e.g. situation.mentionIdList), Python Thrift will set the variable to None if the list is not provided. When this function adds a list-of-references variable (in this case, situation.mentionList) for an omitted optional list, it sets the new variable to None - it DOES NOT leave the variable undefined.
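
The pointer-to-reference idea can be illustrated with stand-in objects. In this sketch `add_mention_references` is a hypothetical helper, plain strings stand in for UUID objects, and SimpleNamespace stands in for Entity and EntityMention; it covers only the mentionIdList case described above.

```python
from types import SimpleNamespace


def add_mention_references(entities, mentions):
    """Attach a mentionList of object references next to each mentionIdList."""
    by_uuid = {mention.uuid: mention for mention in mentions}
    for entity in entities:
        if entity.mentionIdList is None:
            # Omitted optional list: define the new variable, but as None.
            entity.mentionList = None
        else:
            entity.mentionList = [by_uuid[u] for u in entity.mentionIdList]


mentions = [SimpleNamespace(uuid='m1', text='Alice'),
            SimpleNamespace(uuid='m2', text='she')]
entities = [SimpleNamespace(mentionIdList=['m1', 'm2']),
            SimpleNamespace(mentionIdList=None)]
add_mention_references(entities, mentions)
print([m.text for m in entities[0].mentionList])
```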

concrete.util.results_wrapper module
class concrete.util.results_wrapper.ResultsServerClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.services.results.ResultsServerService'>
class concrete.util.results_wrapper.ResultsServerServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.services.results.ResultsServerService'>
class concrete.util.results_wrapper.SubprocessResultsServerServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of ResultsServerServiceWrapper

concrete.util.search_wrapper module
class concrete.util.search_wrapper.FeedbackClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.search.FeedbackService'>
class concrete.util.search_wrapper.FeedbackServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.search.FeedbackService'>
class concrete.util.search_wrapper.SearchClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.search.SearchService'>
class concrete.util.search_wrapper.SearchProxyClientWrapper(host, port)

Bases: concrete.util.service_wrapper.ConcreteServiceClientWrapper

concrete_service_class = <module 'concrete.search.SearchProxyService'>
class concrete.util.search_wrapper.SearchProxyServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.search.SearchProxyService'>
class concrete.util.search_wrapper.SearchServiceWrapper(implementation)

Bases: concrete.util.service_wrapper.ConcreteServiceWrapper

concrete_service_class = <module 'concrete.search.SearchService'>
class concrete.util.search_wrapper.SubprocessFeedbackServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of FeedbackServiceWrapper

class concrete.util.search_wrapper.SubprocessSearchProxyServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of SearchProxyServiceWrapper

class concrete.util.search_wrapper.SubprocessSearchServiceWrapper(implementation, host, port, timeout=None)

Bases: concrete.util.service_wrapper.SubprocessConcreteServiceWrapper

concrete_service_wrapper_class

alias of SearchServiceWrapper

concrete.util.service_wrapper module
class concrete.util.service_wrapper.ConcreteServiceClientWrapper(host, port)

Bases: object

class concrete.util.service_wrapper.ConcreteServiceWrapper(implementation)

Bases: object

A sample wrapper around a Concrete service.

serve(host, port)
class concrete.util.service_wrapper.SubprocessConcreteServiceWrapper(implementation, host, port, timeout=None)

Bases: object

Concrete Service wrapper that runs server in a subprocess via a context manager interface.

SLEEP_INTERVAL = 0.1
concrete.util.simple_comm module

Create a simple (valid) Communication suitable for testing purposes

class concrete.util.simple_comm.SimpleCommTempFile(n=10, id_fmt=u'temp-%d', sentence_fmt=u'Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix=u'.concrete')

Bases: object

DEPRECATED. Please use create_comm() instead.

Class representing a temporary file of sample concrete objects. Designed to facilitate testing.

path

str – path to file

communications

Communication[] – List of communications that were written to file

Usage:

from concrete.util import CommunicationReader
with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
    reader = CommunicationReader(f.path)
    for (orig_comm, comm_path_pair) in zip(f.communications, reader):
        print(orig_comm.id)
        print(orig_comm.id == comm_path_pair[0].id)
        print(f.path == comm_path_pair[1])

Create temp file and write communications.

Parameters:
  • n – number of communications to write
  • id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
  • sentence_fmt – format string used to generate the sentence text of each communication; should contain one instance of %d, which will be replaced by the number of the communication
  • writer_class – CommunicationWriter or CommunicationWriterTGZ
  • suffix – file path suffix (you probably want to choose this to match writer_class)
concrete.util.simple_comm.add_annotation_level_argparse_argument(parser)

Add an ‘--annotation-level’ argument to an ArgumentParser

The ‘--annotation-level’ argument specifies the level of concrete annotation to infer from whitespace in text. See create_comm() for details.

Parameters:parser (argparse.ArgumentParser) –
concrete.util.simple_comm.create_comm(comm_id, text=u'', comm_type=u'article', section_kind=u'passage', metadata_tool=u'concrete-python', metadata_timestamp=None, annotation_level=u'token')

Create a simple, valid Communication from text.

By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections.

annotation_level controls the amount of annotation that is added:

  • AL_NONE: add no optional annotations (not even sections)
  • AL_SECTION: add sections but not sentences
  • AL_SENTENCE: add sentences but not tokens
  • AL_TOKEN: add all annotations, up to tokens (the default)
Parameters:
  • comm_id (str) –
  • text (str) –
  • comm_type (str) –
  • section_kind (str) –
  • metadata_tool (str) –
  • metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
  • annotation_level (str) –
Returns:

Return type:

Communication
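
The default splitting described for create_comm() can be sketched on plain strings. `split_text` is a hypothetical helper that returns nested lists rather than Concrete objects; skipping empty segments and tokenizing on whitespace are assumptions.

```python
def split_text(text):
    """Split text into sections, sentences and whitespace tokens."""
    sections = []
    # Double newlines separate sections.
    for sec_text in text.split('\n\n'):
        if not sec_text.strip():
            continue
        # Single newlines separate sentences within a section.
        sentences = [sen.split() for sen in sec_text.split('\n')
                     if sen.strip()]
        sections.append(sentences)
    return sections


layout = split_text('One two .\nThree .\n\nFour five six .')
for sec_index, section in enumerate(layout):
    print(sec_index, section)
```

Here the first section holds two sentences and the second holds one, mirroring the section and sentence levels that annotation_level controls.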

concrete.util.simple_comm.create_section(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Section from provided text and metadata.

Lower-level routine (called by create_comm()).

Parameters:
  • sec_text (str) –
  • sec_start (int) –
  • sec_end (int) –
  • section_kind (str) –
  • aug (_AnalyticUUIDGenerator) –
  • metadata_tool (str) –
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Returns:

Return type:

Section

concrete.util.simple_comm.create_sentence(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)

Create Sentence from provided text and metadata.

Lower-level routine (called indirectly by create_comm())

Parameters:
  • sen_text (str) –
  • sen_start (int) –
  • sen_end (int) –
  • aug (_AnalyticUUIDGenerator) –
  • metadata_tool (str) –
  • metadata_timestamp (int) – Time in seconds since the Epoch
  • annotation_level (str) – See create_comm() for details
Returns:

Return type:

Sentence

concrete.util.simple_comm.create_simple_comm(comm_id, sentence_string=u'Super simple sentence .')

Create a simple (valid) Communication suitable for testing purposes

The Communication will have a single Section containing a single Sentence.

Parameters:
  • comm_id (str) – Specifies a Communication ID
  • sentence_string (str) – String to be used for the sentence text. The string will be whitespace-tokenized.
Returns:

Return type:

Communication

concrete.util.thrift_factory module
class concrete.util.thrift_factory.ThriftFactory(transportFactory, protocolFactory)

Bases: object

Abstract factory to create Thrift objects for client and server.

createProtocol(transport)
createServer(processor, host, port)
createSocket(host, port)
createTransport(socket)
concrete.util.thrift_factory.is_accelerated()
concrete.util.tokenization module
exception concrete.util.tokenization.NoSuchTokenTagging(*args, **kwargs)

Bases: exceptions.Exception

concrete.util.tokenization.compute_lattice_expected_counts(lattice)

Given a TokenLattice in which the dst, src, token, and weight fields are set in each arc, compute and return a list of expected token log-probabilities.

Input arc weights are treated as unnormalized log-probabilities.

Parameters:lattice (TokenLattice) –
Returns:List of floats (expected log-probabilities) with the float at position i corresponding to the token with tokenIndex i.
concrete.util.tokenization.flatten(a)
Parameters:a (list) –
Returns:Flattened list
Return type:list
concrete.util.tokenization.get_comm_tokenizations(comm, tool=None)

Get list of Tokenization objects in a Communication

Parameters:
  • comm (Communication) –
  • tool (str) – If given, only return Tokenization objects whose metadata.tool field is equal to tool
Returns:

List of Tokenization objects

concrete.util.tokenization.get_comm_tokens(comm, sect_pred=None, suppress_warnings=False)

Get list of Token objects in Communication.

Parameters:
  • comm (Communication) –
  • sect_pred (function) – Function that takes a Section and returns false if the Section should be excluded.
  • suppress_warnings (bool) –
Returns:

List of Token objects in Communication, delegating to get_tokens() for each sentence.

concrete.util.tokenization.get_lemmas(t, tool=None)

Calls get_tagged_tokens() with a tagging_type of “LEMMA”

concrete.util.tokenization.get_ner(t, tool=None)

Calls get_tagged_tokens() with a tagging_type of “NER”

concrete.util.tokenization.get_pos(t, tool=None)

Calls get_tagged_tokens() with a tagging_type of “POS”

concrete.util.tokenization.get_tagged_tokens(tokenization, tagging_type, tool=None)

Return list of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.

Parameters:
  • tokenization (Tokenization) –
  • tagging_type (str) –
  • tool (str) – If tool is not None, filter the candidate TokenTaggings to those whose metadata.tool field matches tool.
Returns:

List of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.

Raises:
  • NoSuchTokenTagging – if there is no matching tagging
  • Exception – if there is more than one matching tagging.
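
The "unique choice" rule can be sketched with stand-in objects. `unique_tagging` and `NoSuchTagging` are hypothetical names, and the `taggedTokenList` attribute on the stand-ins is an assumption about where a tagging keeps its tokens.

```python
from types import SimpleNamespace


class NoSuchTagging(Exception):
    """Stand-in for NoSuchTokenTagging."""


def unique_tagging(taggings, tagging_type, tool=None):
    """Return the tagged tokens of the single tagging that matches."""
    candidates = [t for t in taggings
                  if t.taggingType == tagging_type
                  and (tool is None or t.metadata.tool == tool)]
    if not candidates:
        raise NoSuchTagging('no %s tagging found' % tagging_type)
    if len(candidates) > 1:
        raise Exception('more than one %s tagging found' % tagging_type)
    return candidates[0].taggedTokenList


def tagging(tagging_type, tool, tags):
    # Stand-in for a TokenTagging with metadata.
    return SimpleNamespace(taggingType=tagging_type,
                           metadata=SimpleNamespace(tool=tool),
                           taggedTokenList=tags)


taggings = [tagging('POS', 'tagger-a', ['DT', 'NN']),
            tagging('POS', 'tagger-b', ['DT', 'NNP'])]
print(unique_tagging(taggings, 'POS', tool='tagger-a'))
```

With two POS taggings present, passing tool is what makes the choice unique; omitting it raises the generic exception.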
concrete.util.tokenization.get_tokenizations(comm, tool=None)

Returns a flat list of all Tokenization objects in a Communication

Parameters:comm (Communication) –
Returns:A list of all Tokenization objects within the Communication
concrete.util.tokenization.get_tokens(tokenization, suppress_warnings=False)

Get list of Token objects for a Tokenization

Return list of Tokens from lattice.cachedBestPath, if Tokenization kind is TOKEN_LATTICE; else, return list of Tokens from tokenList.

Warn and return list of Tokens from tokenList if kind is not set.

Return None if kind is set but the respective data fields are not.

Parameters:
  • tokenization (Tokenization) –
  • suppress_warnings (bool) –
Returns:

List of Token objects, or None

concrete.util.tokenization.plus(x, y)
Returns:x + y
concrete.util.twitter module

Convert between JSON and Concrete representations of Tweets

The JSON fields used by the Twitter API are documented at:

concrete.util.twitter.capture_tweet_lid(tweet)

Attempts to capture the ‘lang’ field in the twitter API, if it exists.

Parameters:tweet (object) – Object created by deserializing a JSON Tweet string
Returns:List of LanguageIdentification objects, or None if the field is not present in the Tweet JSON
concrete.util.twitter.json_tweet_object_to_Communication(tweet)

Convert deserialized JSON Tweet object to Communication

Parameters:tweet (object) – Object created by deserializing a JSON Tweet string
Returns:
Return type:Communication
concrete.util.twitter.json_tweet_object_to_TweetInfo(tweet)

Create TweetInfo object from deserialized JSON Tweet object

Parameters:tweet (object) – Object created by deserializing a JSON Tweet string
Returns:
Return type:TweetInfo
concrete.util.twitter.json_tweet_string_to_Communication(json_tweet_string, check_empty=False, check_delete=False)

Convert JSON Tweet string to Communication

Parameters:
  • json_tweet_string (str) – JSON Tweet string from Twitter API
  • check_empty (bool) – If True, check if json_tweet_string is empty
  • check_delete (bool) – If True, check for presence of delete field in Tweet JSON, and if the ‘delete’ field is present, return None
Returns:

Return type:

Communication

concrete.util.twitter.json_tweet_string_to_TweetInfo(json_tweet_string)

Create TweetInfo object from JSON Tweet string

Parameters:json_tweet_string (str) – JSON Tweet string from Twitter API
Returns:
Return type:TweetInfo
concrete.util.twitter.snake_case_to_camelcase(value)

Converts snake case to camel case

Implementation copied from this Stack Overflow post: http://goo.gl/SSgo9k

Parameters:value (unicode) –
Returns:unicode
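
A minimal sketch of one way to perform this conversion, assuming lower camel case (the first word stays lowercase). `snake_to_camel` is a hypothetical name and is not the code from the linked post.

```python
def snake_to_camel(value):
    """Convert 'snake_case_name' to 'snakeCaseName'."""
    parts = value.split('_')
    # Keep the first part as-is and capitalize each following part.
    return parts[0] + ''.join(part.capitalize() for part in parts[1:])


print(snake_to_camel('in_reply_to_status_id'))
```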
concrete.util.twitter.twitter_lid_to_iso639_3(twitter_lid)

Convert Twitter Language ID string to ISO639-3 code

Ref: https://dev.twitter.com/rest/reference/get/help/languages

Parameters:twitter_lid (str) – This can be an iso639-3 code (no-op), iso639-1 2-letter abbr (converted to 3), or combo (split by ‘-’, then first part converted)
Returns:An ISO639-3 code
Return type:str
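
The three input cases can be sketched with a small lookup table. `lid_to_iso639_3` is a hypothetical name, and the table below is a hand-picked illustrative subset, not the mapping the library ships.

```python
# Illustrative subset of ISO 639-1 to ISO 639-3 codes.
ISO639_1_TO_3 = {
    'ar': 'ara',
    'en': 'eng',
    'es': 'spa',
    'fr': 'fra',
    'zh': 'zho',
}


def lid_to_iso639_3(twitter_lid):
    """Map a Twitter language ID to an ISO 639-3 code."""
    if len(twitter_lid) == 3:
        # Already a three-letter code: nothing to convert.
        return twitter_lid
    # For combined IDs such as 'zh-cn', convert only the first part.
    primary = twitter_lid.split('-')[0]
    return ISO639_1_TO_3[primary]


print(lid_to_iso639_3('zh-cn'))
```

A two-letter code missing from this small table raises KeyError in the sketch.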
concrete.util.unnone module
concrete.util.unnone.dun(d)

If d is None return an empty dict, else return d. Simplifies iteration over dict fields that might be unset.

concrete.util.unnone.lun(l)

If l is None return an empty list, else return l. Simplifies iteration over list fields that might be unset.

concrete.util.unnone.sun(s)

If s is None return an empty set, else return s. Simplifies iteration over set fields that might be unset.
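
The behavior described for these three helpers fits in a few lines. The definitions below are local stand-ins written from the descriptions, reusing the same names only for readability; the `section` object is hypothetical.

```python
from types import SimpleNamespace


def lun(l):
    """Return an empty list if l is None, else l."""
    return [] if l is None else l


def dun(d):
    """Return an empty dict if d is None, else d."""
    return {} if d is None else d


def sun(s):
    """Return an empty set if s is None, else s."""
    return set() if s is None else s


# An unset optional list field no longer needs a separate None check.
section = SimpleNamespace(sentenceList=None)
count = 0
for sentence in lun(section.sentenceList):
    count += 1
print(count)
```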

Module contents

Utility code for working with Concrete

concrete.uuid package

Submodules
concrete.uuid.constants module
concrete.uuid.ttypes module
class concrete.uuid.ttypes.UUID(uuidString=None)

Bases: object


Attributes:
- uuidString: A string representation of a UUID, in the format of:
    550e8400-e29b-41d4-a716-446655440000

read(iprot)
validate()
write(oprot)
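
A string can be checked against the hyphenated layout shown for uuidString using Python's standard uuid module. The helper name `is_canonical_uuid_string` is hypothetical, and this check is not something the UUID class above is documented to perform.

```python
import uuid


def is_canonical_uuid_string(s):
    """Return True if s is a UUID in 8-4-4-4-12 hyphenated hex form."""
    try:
        # str() of a parsed UUID is always lowercase and hyphenated.
        return str(uuid.UUID(s)) == s.lower()
    except ValueError:
        return False


print(is_canonical_uuid_string('550e8400-e29b-41d4-a716-446655440000'))
```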
Module contents

Submodules

concrete.inspect module

Functions used by concrete_inspect.py to print data in a Communication.

The function implementations provide useful examples of how to interact with many different Concrete data structures.

concrete.inspect.penn_treebank_for_parse(parse)

Get a Penn-Treebank style string for a Concrete Parse object

Parameters:parse (Parse) –
Returns:A string containing a Penn Treebank style parse tree representation
Return type:str
concrete.inspect.print_communication_taggings_for_communication(comm, tool=None)

Print information for CommunicationTagging objects

Parameters:
concrete.inspect.print_conll_style_tags_for_communication(comm, char_offsets=False, dependency=False, lemmas=False, ner=False, pos=False, other_tags=None, dependency_tool=None, lemmas_tool=None, ner_tool=None, pos_tool=None)

Print ‘CoNLL-style’ tags for the tokens in a Communication

Parameters:
  • comm (Communication) –
  • char_offsets (bool) – Flag for printing token text specified by a Token’s (optional) TextSpan
  • dependency (bool) – Flag for printing dependency parse HEAD tags
  • lemmas (bool) – Flag for printing lemma tags
  • ner (bool) – Flag for printing Named Entity Recognition tags
  • pos (bool) – Flag for printing Part-of-Speech tags
concrete.inspect.print_conll_style_tags_for_tokenization(tokenization, token_tag_lists)

Print ‘CoNLL-style’ tags for the tokens in a tokenization

Parameters:
  • tokenization (Tokenization) –
  • token_tag_lists – A list of lists of token tag strings
concrete.inspect.print_entities(comm, tool=None)

Print information for Entity objects and their associated EntityMention objects

Parameters:
  • comm (Communication) –
  • tool (str) – If not None, only print information for EntitySet objects with a matching metadata.tool field
concrete.inspect.print_id_for_communication(comm, tool=None)

Print ID field of Communication

Parameters:
concrete.inspect.print_metadata(comm, tool=None)

Print metadata for tools used to annotate Communication

Parameters:
concrete.inspect.print_penn_treebank_for_communication(comm, tool=None)

Print Penn-Treebank parse trees for all Tokenization objects

Parameters:
  • comm (Communication) –
  • tool (str) – If not None, only print information for Tokenization objects with a matching metadata.tool field
concrete.inspect.print_sections(comm, tool=None)

Print information for all Section objects, according to their spans.

Parameters:
  • comm (Communication) –
  • tool (str) – If not None, only print information for Section objects with a matching metadata.tool field
concrete.inspect.print_situation_mentions(comm, tool=None)

Print information for all SituationMention objects (some of which may not have a Situation)

Parameters:
concrete.inspect.print_situations(comm, tool=None)

Print information for all Situation objects and their associated SituationMention objects

Parameters:
  • comm (Communication) –
  • tool (str) – If not None, only print information for Situation objects with a matching metadata.tool field
concrete.inspect.print_text_for_communication(comm, tool=None)

Print text field of Communication

Parameters:
  • comm (Communication) –
  • tool (str) – If not None, only print text field of Communication objects with a matching metadata.tool field
concrete.inspect.print_tokens_for_communication(comm, tool=None)

Print token text for a Communication

Parameters:
  • comm (Communication) –
  • tool (str) – If not None, only print token text for Communication objects with a matching metadata.tool field
concrete.inspect.print_tokens_with_entityMentions(comm, tool=None)

Print information for Token objects that are part of an EntityMention

Parameters:

concrete.validate module

Library to validate a Concrete Communication

Validation info, error and warning messages are logged using the Python standard library’s logging module.

concrete.validate.validate_communication(comm)

Test if all objects in a Communication are valid.

Calls validate_thrift_deep() to check for Concrete data structure fields that are required by the Concrete Thrift definitions. Then calls:

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_communication_file(communication_filename)

Test if the Communication in a file is valid

Deserializes a Communication file into memory, then calls validate_communication() on the Communication object.

Parameters:communication_filename (str) – Name of file containing the Communication
Returns:bool
concrete.validate.validate_constituency_parses(comm, tokenization)

Test a Tokenization’s constituency Parse objects.

Verifies that, for each constituent Parse:

  • none of the constituent IDs for the parse repeat
  • the parse tree is a fully connected graph
  • the parse “tree” is really a tree data structure
Parameters:
Returns:

bool
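
The three structural checks listed above can be sketched on a plain representation of a parse. `is_valid_tree` is a hypothetical helper, and the list of (constituent ID, child ID list) pairs is a stand-in for the Constituent objects of a Parse.

```python
def is_valid_tree(constituents):
    """Check a list of (node_id, child_ids) pairs for tree structure."""
    ids = [node_id for node_id, _ in constituents]
    if len(set(ids)) != len(ids):
        return False  # a constituent ID repeats
    children = dict(constituents)
    parent_count = dict((node_id, 0) for node_id in ids)
    for _, child_ids in constituents:
        for child in child_ids:
            if child not in parent_count:
                return False  # child ID refers to no constituent
            parent_count[child] += 1
    roots = [n for n in ids if parent_count[n] == 0]
    if len(roots) != 1 or any(c > 1 for c in parent_count.values()):
        return False  # not exactly one root, or a node has two parents
    # Walk down from the root; a tree reaches every node exactly once.
    seen = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            return False
        seen.add(node)
        stack.extend(children[node])
    return len(seen) == len(ids)


print(is_valid_tree([(0, [1, 2]), (1, []), (2, [])]))
```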

concrete.validate.validate_dependency_parses(tokenization)

Test a Tokenization’s DependencyParse objects

Verifies that, for each DependencyParse:

  • the parse is a fully connected graph
  • there are no nodes with a null governor node whose edgeType is not root
Parameters:tokenization (Tokenization) –
Returns:bool
concrete.validate.validate_entity_mention_ids(comm)

Test if all Entity mentionIds are valid

Checks if all Entity mentionId UUIDs refer to an EntityMention UUID that exists in the Communication

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_entity_mention_token_ref_sequences(comm)

Test if all EntityMention objects have a valid TokenRefSequence

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_entity_mention_tokenization_ids(comm)

Test tokenizationId field of every EntityMention

Verifies that, for each EntityMention, the entityMention.tokens.tokenizationId UUID field matches the UUID of a Tokenization that exists in this Communication

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_situation_mentions(comm)

Test every SituationMention in the Communication

A SituationMention has a list of MentionArgument objects, and each MentionArgument can point to an EntityMention, SituationMention or TokenRefSequence.

Checks that each MentionArgument points to only one type of argument. Also checks validity of all EntityMention and SituationMention UUIDs.

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_situations(comm)

Test every Situation in the Communication

Checks the validity of all EntityMention and SituationMention UUIDs referenced by each Situation.

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_thrift(thrift_object, indent_level=0)

Test if a Thrift object has all required fields.

This function calls the Thrift object’s validate() function. If an exception is raised because of missing required fields, the function catches the exception and logs the exception’s error message using the Python Standard Library’s logging module.

Parameters:
  • thrift_object
  • indent_level (int) – Text indentation level for logging error message
Returns:

bool
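
The catch-and-log pattern described above looks roughly like the following. This is a sketch under stated assumptions: `MissingFieldError` and `FakeThriftObject` are hypothetical stand-ins for a Thrift-generated class and the exception its validate() raises, and the two-space indent per level is an arbitrary choice.

```python
import logging


class MissingFieldError(Exception):
    """Hypothetical stand-in for the exception raised by validate()."""


class FakeThriftObject:
    """Hypothetical object with one required field, `id`."""

    def __init__(self, id=None):
        self.id = id

    def validate(self):
        if self.id is None:
            raise MissingFieldError("Required field id is unset!")


def validate_shallow(thrift_object, indent_level=0):
    """Call validate(); log the error message and return False on failure."""
    try:
        thrift_object.validate()
    except MissingFieldError as exc:
        # Indent the message so nested objects read as an outline in the log.
        logging.error("%s%s", "  " * indent_level, exc)
        return False
    return True
```

Returning a bool instead of propagating the exception lets a caller run every check and collect all problems in one pass.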

concrete.validate.validate_thrift_deep(thrift_object, valid=True)

Deep validation of thrift messages.

Parameters:thrift_object – a Thrift object

The Python version of Thrift 0.9.1 does not support deep (recursive) validation, and none of the Thrift serialization/deserialization code calls even the shallow validation functions provided by Thrift.

This function implements deep validation. The code is adapted from:

See this blog post for more information:

A patch to implement deep validation was submitted to the Thrift repository in February of 2013:

but Thrift 0.9.1 - which was released on 2013-08-21 - does not include this functionality.
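
As an illustration of what deep validation means, the sketch below validates an object and then recurses into its attributes and containers. It is not the adapted code referred to above; `Node` and `validate_deep` are hypothetical names, and the sketch assumes the object graph contains no cycles.

```python
class Node:
    """Hypothetical stand-in for a Thrift struct with one required field."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []

    def validate(self):
        if self.name is None:
            raise ValueError("Required field name is unset!")


def validate_deep(obj):
    """Return True only if obj and everything nested inside it validates."""
    valid = True
    # Shallow check on this object, if it offers one.
    if hasattr(obj, "validate"):
        try:
            obj.validate()
        except ValueError:
            valid = False
    # Work out which nested values to visit next.
    if isinstance(obj, (list, tuple, set)):
        children = obj
    elif isinstance(obj, dict):
        children = list(obj.keys()) + list(obj.values())
    elif hasattr(obj, "__dict__"):
        children = vars(obj).values()
    else:
        children = ()  # strings, numbers, None: nothing to descend into
    for child in children:
        # Keep going after a failure so every nested object is visited.
        valid = validate_deep(child) and valid
    return valid
```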

concrete.validate.validate_thrift_object_required_fields(thrift_object, indent_level=0)

DEPRECATED. Use validate_thrift() instead.

concrete.validate.validate_thrift_object_required_fields_recursively(thrift_object, valid=True)

DEPRECATED. Use validate_thrift_deep() instead.

concrete.validate.validate_token_offsets_for_section(section)

Test if the TextSpan boundaries for all Sentence objects in a Section fall within the boundaries of the Section's TextSpan.

Parameters:section (Section) –
Returns:bool
concrete.validate.validate_token_offsets_for_sentence(sentence)

Test if the TextSpan boundaries for all Token objects in a Sentence fall within the boundaries of the Sentence's TextSpan.

Parameters:sentence (Sentence) –
Returns:bool
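
Both offset checks above reduce to span containment. The sketch below uses a hypothetical `TextSpan` tuple with `start` and `ending` offsets and treats a span that touches the outer boundary as contained, which is an assumption of this example.

```python
from collections import namedtuple

# Hypothetical stand-in for a text span given as character offsets.
TextSpan = namedtuple("TextSpan", ["start", "ending"])


def spans_within(outer, inner_spans):
    """Return True if every inner span lies inside the outer span."""
    return all(
        outer.start <= span.start and span.ending <= outer.ending
        for span in inner_spans
    )
```

For a sentence spanning offsets 10 to 30, tokens at (10, 14) and (15, 30) pass, while a token at (28, 31) fails.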
concrete.validate.validate_token_ref_sequence(comm, token_ref_sequence)

Check if a TokenRefSequence is valid

Verify that all token indices in the TokenRefSequence point to actual token indices in the corresponding Tokenization.

Parameters:
  • comm (Communication) –
  • token_ref_sequence (TokenRefSequence) –
Returns:

bool

concrete.validate.validate_token_taggings(tokenization)

Test if a Tokenization has any TokenTagging objects with invalid token indices

Parameters:tokenization (Tokenization) –
Returns:bool

concrete.version module

concrete.version.add_argparse_argument(parser)
concrete.version.concrete_library_version()
concrete.version.concrete_schema_version()

Module contents

Python modules and scripts for working with Concrete, an HLT data specification defined using Thrift.

Indices and tables