Welcome to concrete’s documentation!¶
Python modules and scripts for working with Concrete, an HLT data specification defined using Thrift.
concrete package¶
Subpackages¶
concrete.access package¶
Submodules¶
concrete.access.FetchCommunicationService module¶
-
class
concrete.access.FetchCommunicationService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.access.FetchCommunicationService.Iface
Service to fetch particular communications.
-
fetch
(request)¶ - Parameters:
- request
-
getCommunicationCount
()¶ - Get the number of Communications this service searches over. Implementations that do not provide this should throw an exception.
-
getCommunicationIDs
(offset, count)¶ - Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations that do not provide this should throw an exception.
Parameters:
- offset
- count
-
recv_fetch
()¶
-
recv_getCommunicationCount
()¶
-
recv_getCommunicationIDs
()¶
-
send_fetch
(request)¶
-
send_getCommunicationCount
()¶
-
send_getCommunicationIDs
(offset, count)¶
-
-
class
concrete.access.FetchCommunicationService.
Iface
¶ Bases:
concrete.services.Service.Iface
Service to fetch particular communications.
-
fetch
(request)¶ - Parameters:
- request
-
getCommunicationCount
()¶ - Get the number of Communications this service searches over. Implementations that do not provide this should throw an exception.
-
getCommunicationIDs
(offset, count)¶ - Get a list of ‘count’ Communication IDs starting at ‘offset’. Implementations that do not provide this should throw an exception.
Parameters:
- offset
- count
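The paging contract of getCommunicationCount() and getCommunicationIDs(offset, count) can be sketched as follows. This is a minimal illustration, not the generated Thrift client: InMemoryFetchService and iter_communication_ids are hypothetical names, and the stand-in keeps its IDs in a plain list instead of talking to a server.

```python
class InMemoryFetchService:
    """Hypothetical stand-in that mimics the two paging methods of Iface."""

    def __init__(self, ids):
        self._ids = list(ids)

    def getCommunicationCount(self):
        # Number of Communications this service searches over.
        return len(self._ids)

    def getCommunicationIDs(self, offset, count):
        # At most `count` IDs, starting at position `offset`.
        return self._ids[offset:offset + count]


def iter_communication_ids(service, page_size=2):
    """Yield every ID by requesting pages of at most page_size IDs."""
    total = service.getCommunicationCount()
    for offset in range(0, total, page_size):
        for comm_id in service.getCommunicationIDs(offset, page_size):
            yield comm_id


service = InMemoryFetchService(["doc-a", "doc-b", "doc-c"])
all_ids = list(iter_communication_ids(service))
```

A real implementation may raise an exception from either method if it does not support enumeration, so callers should be prepared to handle that case.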
-
-
class
concrete.access.FetchCommunicationService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.access.FetchCommunicationService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_fetch
(seqid, iprot, oprot)¶
-
process_getCommunicationCount
(seqid, iprot, oprot)¶
-
process_getCommunicationIDs
(seqid, iprot, oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
fetch_args
(request=None)¶ Bases:
object
Attributes:
- request
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
fetch_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
getCommunicationCount_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.access.FetchCommunicationService.
getCommunicationCount_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.access.StoreCommunicationService module¶
-
class
concrete.access.StoreCommunicationService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.access.StoreCommunicationService.Iface
A service that exists so that clients can store Concrete data structures to implementing servers. Implement this if you are creating an analytic that wishes to store its results back to a server. That server may perform validation, write the new layers to a database, and so forth.
-
recv_store
()¶
-
send_store
(communication)¶
-
store
(communication)¶ - Store a communication to a server implementing this method. The communication that is stored should contain the new analytic layers you wish to append. You may also wish to call methods that unset annotations you feel the receiver would not find useful in order to reduce network overhead.
Parameters:
- communication
-
-
class
concrete.access.StoreCommunicationService.
Iface
¶ Bases:
concrete.services.Service.Iface
A service that exists so that clients can store Concrete data structures to implementing servers. Implement this if you are creating an analytic that wishes to store its results back to a server. That server may perform validation, write the new layers to a database, and so forth.
-
store
(communication)¶ - Store a communication to a server implementing this method. The communication that is stored should contain the new analytic layers you wish to append. You may also wish to call methods that unset annotations you feel the receiver would not find useful in order to reduce network overhead.
Parameters:
- communication
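The store() workflow described above (validate on the server, send only the layers worth sending) might look roughly like this. It is a sketch under stated assumptions: InMemoryStoreHandler and strip_for_upload are hypothetical, and a plain dict stands in for a Communication object.

```python
class InMemoryStoreHandler:
    """Hypothetical handler exposing store(communication)."""

    def __init__(self):
        self.stored = {}

    def store(self, communication):
        # A server may validate before persisting; this check is illustrative.
        if not communication.get("id"):
            raise ValueError("communication is missing an id")
        self.stored[communication["id"]] = communication


def strip_for_upload(communication, keep):
    """Return a copy that only carries the fields named in keep."""
    return {key: value for key, value in communication.items() if key in keep}


annotated = {
    "id": "doc-1",
    "text": "Hello world.",
    "sectionList": ["existing layer"],
    "entitySetList": ["new layer"],
}
handler = InMemoryStoreHandler()
# Send only the identifier and the newly produced layer to cut network overhead.
handler.store(strip_for_upload(annotated, keep={"id", "entitySetList"}))
```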
-
-
class
concrete.access.StoreCommunicationService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.access.StoreCommunicationService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_store
(seqid, iprot, oprot)¶
-
concrete.access.constants module¶
concrete.access.ttypes module¶
Module contents¶
concrete.annotate package¶
Submodules¶
concrete.annotate.AnnotateCommunicationService module¶
-
class
concrete.annotate.AnnotateCommunicationService.
Client
(iprot, oprot=None)¶ Bases:
concrete.annotate.AnnotateCommunicationService.Iface
Annotator service methods, for Concrete analytics that are to be stood up as independent services accessible from any programming language.
-
annotate
(original)¶ - Main annotation method. Takes a communication as input and returns a new one as output. It is up to the implementing service to verify that the input communication is valid. Can throw a ConcreteThriftException upon error (invalid input, analytic exception, etc.).
Parameters:
- original
-
getDocumentation
()¶ - Return a detailed description of what the particular tool does, what inputs and outputs to expect, etc. Developers who are not familiar with the particular analytic should be able to read this string and understand the essential functions of the analytic.
-
getMetadata
()¶ - Return the tool’s AnnotationMetadata.
-
recv_annotate
()¶
-
recv_getDocumentation
()¶
-
recv_getMetadata
()¶
-
send_annotate
(original)¶
-
send_getDocumentation
()¶
-
send_getMetadata
()¶
-
send_shutdown
()¶
-
shutdown
()¶ - Indicate to the server it should shut down.
-
-
class
concrete.annotate.AnnotateCommunicationService.
Iface
¶ Bases:
object
Annotator service methods, for Concrete analytics that are to be stood up as independent services accessible from any programming language.
-
annotate
(original)¶ - Main annotation method. Takes a communication as input and returns a new one as output. It is up to the implementing service to verify that the input communication is valid. Can throw a ConcreteThriftException upon error (invalid input, analytic exception, etc.).
Parameters:
- original
-
getDocumentation
()¶ - Return a detailed description of what the particular tool does, what inputs and outputs to expect, etc. Developers who are not familiar with the particular analytic should be able to read this string and understand the essential functions of the analytic.
-
getMetadata
()¶ - Return the tool’s AnnotationMetadata.
-
shutdown
()¶ - Indicate to the server it should shut down.
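A handler covering the four Iface methods could be shaped like the sketch below. Everything here is a stand-in: UppercaseAnnotator is a hypothetical toy analytic, dicts replace Communication and AnnotationMetadata objects, and ValueError takes the place of ConcreteThriftException.

```python
import copy
import time


class UppercaseAnnotator:
    """Hypothetical handler with the four Iface methods."""

    def __init__(self):
        self.running = True

    def annotate(self, original):
        # The implementing service is responsible for validating its input.
        if not original.get("text"):
            raise ValueError("input communication has no text")
        # Return a new communication and leave the input untouched.
        annotated = copy.deepcopy(original)
        annotated.setdefault("keyValueMap", {})["upper"] = original["text"].upper()
        return annotated

    def getMetadata(self):
        return {"tool": "uppercase-annotator", "timestamp": int(time.time())}

    def getDocumentation(self):
        return ("Copies the input communication and stores an upper-cased "
                "copy of its text under keyValueMap['upper'].")

    def shutdown(self):
        self.running = False


annotator = UppercaseAnnotator()
source = {"id": "doc-1", "text": "hello"}
result = annotator.annotate(source)
annotator.shutdown()
```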
-
-
class
concrete.annotate.AnnotateCommunicationService.
Processor
(handler)¶ Bases:
concrete.annotate.AnnotateCommunicationService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_annotate
(seqid, iprot, oprot)¶
-
process_getDocumentation
(seqid, iprot, oprot)¶
-
process_getMetadata
(seqid, iprot, oprot)¶
-
process_shutdown
(seqid, iprot, oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
annotate_args
(original=None)¶ Bases:
object
Attributes:
- original
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
annotate_result
(success=None, ex=None)¶ Bases:
object
Attributes:
- success
- ex
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
getDocumentation_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
getDocumentation_result
(success=None)¶ Bases:
object
Attributes:
- success
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.annotate.AnnotateCommunicationService.
getMetadata_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.annotate.constants module¶
concrete.annotate.ttypes module¶
Module contents¶
concrete.audio package¶
Submodules¶
concrete.audio.constants module¶
concrete.audio.ttypes module¶
-
class
concrete.audio.ttypes.
Sound
(wav=None, mp3=None, sph=None, path=None)¶ Bases:
object
A sound wave. A separate optional field is defined for each supported format. Typically, a Sound object will only define a single field. Note: we may want to have separate fields for separate channels (left vs right), etc.
Attributes:
- wav
- mp3
- sph
- path: An absolute path to a file on disk where the sound file can be found. It is assumed that this path will be accessible from any machine that the system is run on (i.e., it should be a shared disk, or possibly a mirrored directory).
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Module contents¶
concrete.clustering package¶
Submodules¶
concrete.clustering.constants module¶
concrete.clustering.ttypes module¶
-
class
concrete.clustering.ttypes.
Cluster
(clusterMemberIndexList=None, confidenceList=None, childIndexList=None)¶ Bases:
object
A set of items which are alike in some way. Has an implicit id which is the index of this Cluster in its parent Clustering’s ‘clusterList’.
Attributes:
- clusterMemberIndexList: The items in this cluster. Values are indices into the ‘clusterMemberList’ of the Clustering which contains this Cluster.
- confidenceList: Co-indexed with ‘clusterMemberIndexList’. The i-th value represents the confidence that mention clusterMemberIndexList[i] belongs to this cluster.
- childIndexList: A set of clusters (implicit ids/indices) from which this cluster was created. This cluster should represent the union of all the items in all of the child clusters. (For hierarchical clustering only.)
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
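Because a Cluster stores indices rather than the items themselves, reading it means dereferencing into the parent Clustering’s ‘clusterMemberList’ and pairing each index with its co-indexed confidence. A minimal sketch, with plain lists and dicts standing in for the generated classes and made-up member names:

```python
# Stand-in for Clustering.clusterMemberList (hypothetical item names).
cluster_member_list = ["mention-a", "mention-b", "mention-c"]

# Stand-in for one Cluster: indices into the list above, plus confidences.
cluster = {
    "clusterMemberIndexList": [0, 2],
    "confidenceList": [0.9, 0.4],
}

# The i-th confidence belongs to the i-th member index.
members = [
    (cluster_member_list[index], confidence)
    for index, confidence in zip(cluster["clusterMemberIndexList"],
                                 cluster["confidenceList"])
]
```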
-
-
class
concrete.clustering.ttypes.
ClusterMember
(communicationId=None, setId=None, elementId=None)¶ Bases:
object
An item being clustered. Does not designate cluster _membership_, as in “item x belongs to cluster C”, but rather just the item (“x” in this example). Membership is indicated through Cluster objects. An item may be an Entity, EntityMention, Situation, SituationMention, or technically anything with a UUID.
Attributes:
- communicationId: UUID of the Communication which contains the item specified by ‘elementId’. This is ancillary info assuming UUIDs are indeed universally unique.
- setId: UUID of the Entity|Situation(Mention)Set which contains the item specified by ‘elementId’. This is ancillary info assuming UUIDs are indeed universally unique.
- elementId: UUID of the EntityMention, Entity, SituationMention, or Situation that this item represents. This is the characteristic field.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.clustering.ttypes.
Clustering
(uuid=None, metadata=None, clusterMemberList=None, clusterList=None, rootClusterIndexList=None)¶ Bases:
object
An (optionally) hierarchical clustering of items appearing across a set of Communications (intra-Communication clusterings are encoded by Entities and Situations). An item may be an Entity, EntityMention, Situation, SituationMention, or technically anything with a UUID.
Attributes:
- uuid: UUID for this Clustering object.
- metadata: Metadata for this Clustering object.
- clusterMemberList: The set of items being clustered.
- clusterList: Clusters of items. If this is a hierarchical clustering, this may contain clusters which are the set of smaller clusters. Clusters may not “overlap”, meaning (for all clusters X, Y): $X \cap Y \neq \emptyset \implies X \subset Y \vee Y \subset X$
- rootClusterIndexList: A set of disjoint clusters (indices in ‘clusterList’) which cover all items in ‘clusterMemberList’. This list must be specified for hierarchical clusterings and should not be specified for flat clusterings.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
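The non-overlap rule stated for ‘clusterList’ (any two clusters that share an item must be nested one inside the other) can be checked mechanically. The function below is a hypothetical helper, not part of the library; it takes each cluster as a list of member indices and treats identical clusters as nested, which is an assumption on top of the documented rule.

```python
from itertools import combinations


def clusters_are_nested_or_disjoint(clusters):
    """Return True if every pair of clusters is either disjoint or nested."""
    member_sets = [set(cluster) for cluster in clusters]
    for x, y in combinations(member_sets, 2):
        shares_items = bool(x & y)
        nested = x <= y or y <= x
        if shares_items and not nested:
            return False
    return True


# Hierarchical case: two leaves and a parent that is their union.
hierarchical_ok = clusters_are_nested_or_disjoint([[0, 1], [2], [0, 1, 2]])

# Invalid case: the clusters share item 1 but neither contains the other.
overlapping_ok = clusters_are_nested_or_disjoint([[0, 1], [1, 2]])
```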
-
Module contents¶
concrete.communication package¶
Submodules¶
concrete.communication.constants module¶
concrete.communication.ttypes module¶
-
class
concrete.communication.ttypes.
Communication
(id=None, uuid=None, type=None, text=None, startTime=None, endTime=None, communicationTaggingList=None, metadata=None, keyValueMap=None, lidList=None, sectionList=None, entityMentionSetList=None, entitySetList=None, situationMentionSetList=None, situationSetList=None, originalText=None, sound=None, communicationMetadata=None)¶ Bases:
object
A single communication instance, containing linguistic content generated by a single speaker or author. This type is used for both inter-personal communications (such as phone calls or conversations) and third-party communications (such as news articles).
Each communication instance is grounded by its original (unannotated) contents, which should be stored in either the “text” field (for text communications) or the “audio” field (for audio communications). If the communication is not available in its original form, then these fields should store the communication in the least-processed form available.
Attributes:
- id: Stable identifier for this communication, identifying both the name of the source corpus and the document that it corresponds to in that corpus.
- uuid: Universally unique identifier for this communication instance. This is generated randomly, and cannot be mapped back to the source corpus. It is used as a target for symbolic “pointers”.
- type: A short, corpus-specific term characterizing the nature of the communication; may change in a future version of concrete. Often used for filtering. For example, Gigaword uses the type “story” to distinguish typical news articles from weekly summaries (“multi”), editorial advisories (“advis”), etc. At present, this value is typically a literal form from the originating corpus: as a result, a type marked ‘other’ may have different meanings across different corpora.
- text: The full text contents of this communication in its original form, or in the least-processed form available, if the original is not available.
- startTime: The time when this communication started (in unix time UTC, i.e., seconds since January 1, 1970).
- endTime: The time when this communication ended (in unix time UTC, i.e., seconds since January 1, 1970).
- communicationTaggingList: A list of CommunicationTagging objects that can support this Communication. CommunicationTagging objects can be used to annotate Communications with topics, gender identification, etc.
- metadata: metadata.AnnotationMetadata to support this particular communication. Communications derived from other communications should indicate in this metadata object their dependency to the original communication ID.
- keyValueMap: A catch-all store of keys and values. Use sparingly!
- lidList: Theories about the languages that are present in this communication.
- sectionList: Theory about the block structure of this communication.
- entityMentionSetList: Theories about which spans of text are used to mention entities in this communication.
- entitySetList: Theories about what entities are discussed in this communication, with pointers to individual mentions.
- situationMentionSetList: Theories about what situations are explicitly mentioned in this communication.
- situationSetList: Theories about what situations are asserted in this communication.
- originalText: Optional original text field that points back to an original communication. This field can be populated for sake of convenience when creating “perspective” communications (communications that are based on highly destructive changes to an original communication [e.g., via MT]). This allows developers to quickly access the original text that this perspective communication is based on.
- sound: The full audio contents of this communication in its original form, or in the least-processed form available, if the original is not available.
- communicationMetadata: Metadata about this specific Communication, such as information about its author, information specific to this Communication or Communications like it (info from an API, for example), etc.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
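The identifier and timestamp conventions described for Communication (a stable corpus-derived id, a random uuid, and start/end times in unix seconds UTC) are illustrated below. A dict stands in for the generated class, the id value is invented, and storing the UUID as a bare string is an assumption made for brevity.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical stand-in for a Communication; field names follow the attribute list.
comm = {
    "id": "example-corpus/doc-0001",   # stable: names the corpus and the document
    "uuid": str(uuid.uuid4()),         # random: cannot be mapped back to the corpus
    "type": "story",
    "text": "A short news article.",
    "startTime": 0,                    # unix time in seconds, UTC
}

# Interpret startTime explicitly as UTC so the result does not depend on local time.
started = datetime.fromtimestamp(comm["startTime"], tz=timezone.utc)
```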
-
-
class
concrete.communication.ttypes.
CommunicationSet
(communicationIdList=None, corpus=None, entityMentionClusterList=None, entityClusterList=None, situationMentionClusterList=None, situationClusterList=None)¶ Bases:
object
A structure that represents a collection of Communications.
Attributes:
- communicationIdList: A list of Communication UUIDs that this CommunicationSet represents. This field may be absent if this CommunicationSet represents a large corpus. If absent, the ‘corpus’ field should be present.
- corpus: The name of a corpus or other document body that this CommunicationSet represents. Should be present if ‘communicationIdList’ is absent.
- entityMentionClusterList: A list of Clustering objects that represent a group of EntityMentions that are a part of this CommunicationSet.
- entityClusterList: A list of Clustering objects that represent a group of Entities that are a part of this CommunicationSet.
- situationMentionClusterList: A list of Clustering objects that represent a group of SituationMentions that are a part of this CommunicationSet.
- situationClusterList: A list of Clustering objects that represent a group of Situations that are a part of this CommunicationSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.communication.ttypes.
CommunicationTagging
(uuid=None, metadata=None, taggingType=None, tagList=None, confidenceList=None)¶ Bases:
object
A structure that represents a ‘tagging’ of a Communication. These might be labels or annotations on a particular communication. For example, this structure might be used to describe the topics discussed in a Communication. The taggingType might be ‘topic’, and the tagList might include ‘politics’ and ‘science’.
Attributes:
- uuid: A unique identifier for this CommunicationTagging object.
- metadata: AnnotationMetadata to support this CommunicationTagging object.
- taggingType: A string that captures the type of this CommunicationTagging object. For example: ‘topic’ or ‘gender’.
- tagList: A list of strings that represent different tags related to the taggingType. For example, if the taggingType is ‘topic’, some example tags might be ‘politics’, ‘science’, etc.
- confidenceList: A list of doubles, parallel to the list of strings in tagList, that indicate the confidences of each tag.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
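Since tagList and confidenceList are parallel lists, consumers typically zip them together before ranking or filtering. A short sketch with a dict standing in for CommunicationTagging; the tag values and scores are invented for illustration.

```python
tagging = {
    "taggingType": "topic",
    "tagList": ["politics", "science", "sports"],
    "confidenceList": [0.7, 0.9, 0.1],
}

# The two lists must line up one-to-one for the pairing to be meaningful.
if len(tagging["tagList"]) != len(tagging["confidenceList"]):
    raise ValueError("tagList and confidenceList differ in length")

# Pair each tag with its confidence and order from most to least confident.
ranked = sorted(
    zip(tagging["tagList"], tagging["confidenceList"]),
    key=lambda pair: pair[1],
    reverse=True,
)
best_tag, best_confidence = ranked[0]
```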
-
Module contents¶
concrete.email package¶
Submodules¶
concrete.email.constants module¶
concrete.email.ttypes module¶
-
class
concrete.email.ttypes.
EmailAddress
(address=None, displayName=None)¶ Bases:
object
An email address, optionally accompanied by a display name. These values are typically extracted from strings such as: “John Smith” <john@xyz.com>. See RFC 2822: http://tools.ietf.org/html/rfc2822
Attributes:
- address
- displayName
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
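Splitting a header-style string into the two EmailAddress fields can be done with the standard library’s email.utils.parseaddr. The sketch below only shows the extraction step; putting the result into a dict with the attribute names is an illustration, not the library’s own ingest code.

```python
from email.utils import parseaddr

# parseaddr returns a (display name, address) pair.
display_name, address = parseaddr('"John Smith" <john@xyz.com>')

# Hypothetical stand-in for an EmailAddress object.
email_address = {"address": address, "displayName": display_name}
```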
-
-
class
concrete.email.ttypes.
EmailCommunicationInfo
(messageId=None, contentType=None, userAgent=None, inReplyToList=None, referenceList=None, senderAddress=None, returnPathAddress=None, toAddressList=None, ccAddressList=None, bccAddressList=None, emailFolder=None, subject=None, quotedAddresses=None, attachmentPaths=None, salutation=None, signature=None)¶ Bases:
object
Extra information about an email communication instance.
Attributes:
- messageId
- contentType
- userAgent
- inReplyToList
- referenceList
- senderAddress
- returnPathAddress
- toAddressList
- ccAddressList
- bccAddressList
- emailFolder
- subject
- quotedAddresses
- attachmentPaths
- salutation
- signature
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Module contents¶
concrete.entities package¶
Submodules¶
concrete.entities.constants module¶
concrete.entities.ttypes module¶
-
class
concrete.entities.ttypes.
Entity
(uuid=None, mentionIdList=None, type=None, confidence=None, canonicalName=None)¶ Bases:
object
A single referent (or “entity”) that is referred to at least once in a given communication, along with pointers to all of the references to that referent. The referent’s type (e.g., is it a person, or a location, or an organization, etc.) is also recorded. Because each Entity contains pointers to all references to a referent within a given communication, an Entity can be thought of as a coreference set.
Attributes:
- uuid: Unique identifier for this entity.
- mentionIdList: A list of pointers to all of the mentions of this Entity’s referent. (type=EntityMention)
- type: The basic type of this entity’s referent.
- confidence: Confidence score for this individual entity. You can also set a confidence score for an entire EntitySet using the EntitySet’s metadata.
- canonicalName: A string containing a representative, canonical, or “best” name for this entity’s referent. This string may match one of the mentions’ text strings, but it is not required to.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.entities.ttypes.
EntityMention
(uuid=None, tokens=None, entityType=None, phraseType=None, confidence=None, text=None, childMentionIdList=None)¶ Bases:
object
A span of text with a specific referent, such as a person, organization, or time. Things that can be referred to by a mention are called “entities.”
It is left up to individual EntityMention taggers to decide which referent types and phrase types to identify. For example, some EntityMention taggers may only identify proper nouns, or may only identify EntityMentions that refer to people.
Each EntityMention consists of a sequence of tokens. This sequence is usually annotated with information about the referent type (e.g., is it a person, or a location, or an organization, etc.) as well as the phrase type (is it a name, pronoun, common noun, etc.).
EntityMentions typically consist of a single noun phrase; however, other phrase types may also be marked as mentions. For example, in the phrase “French hotel,” the adjective “French” might be marked as a mention for France.
Attributes:
- uuid
- tokens: Pointer to sequence of tokens. Special note: In the case of PRO-drop, where there is no explicit mention, but an EntityMention is needed for downstream Entity analysis, this field should be set to a TokenRefSequence with an empty tokenIndexList and the anchorTokenIndex set to the head/only token of the verb/predicate from which the PRO was dropped.
- entityType: The type of referent that is referred to by this mention.
- phraseType: The phrase type of the tokens that constitute this mention.
- confidence: A confidence score for this individual mention. You can also set a confidence score for an entire EntityMentionSet using the EntityMentionSet’s metadata.
- text: The text content of this entity mention. This field is typically redundant with the string formed by cross-referencing the ‘tokens.tokenIndexList’ field with this mention’s tokenization. This field may not be generated by all analytics.
- childMentionIdList: A list of pointers to the “child” EntityMentions of this EntityMention.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.entities.ttypes.
EntityMentionSet
(uuid=None, metadata=None, mentionList=None, linkingList=None)¶ Bases:
object
A theory about the set of entity mentions that are present in a message. See also: EntityMention. This type does not represent a coreference relationship, which is handled by Entity. This type is meant to represent the output of an entity-mention-identifier, which is often a part of an in-doc coreference system.
Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- mentionList: List of mentions in this set.
- linkingList: Entity linking annotations associated with this EntityMentionSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.entities.ttypes.
EntitySet
(uuid=None, metadata=None, entityList=None, linkingList=None, mentionSetId=None)¶ Bases:
object
A theory about the set of entities that are present in a message. See also: Entity.
Attributes:
- uuid: Unique identifier for this set.
- metadata: Information about where this set came from.
- entityList: List of entities in this set.
- linkingList: Entity linking annotations associated with this EntitySet.
- mentionSetId: An optional UUID pointer to an EntityMentionSet. If this field is present, consumers can assume that all Entity objects in this EntitySet have EntityMentions that are included in the named EntityMentionSet.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
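An Entity points at its mentions by UUID, so recovering the coreference set means looking each pointer up in the EntityMentionSet named by ‘mentionSetId’. The sketch below uses dicts as stand-ins and short invented strings in place of real UUID objects.

```python
# Hypothetical stand-ins for an EntityMentionSet and an EntitySet.
mention_set = {
    "uuid": "ms-1",
    "mentionList": [
        {"uuid": "m-1", "text": "Ada Lovelace"},
        {"uuid": "m-2", "text": "she"},
        {"uuid": "m-3", "text": "London"},
    ],
}
entity_set = {
    "uuid": "es-1",
    "mentionSetId": "ms-1",
    "entityList": [
        {"uuid": "e-1", "mentionIdList": ["m-1", "m-2"],
         "canonicalName": "Ada Lovelace"},
    ],
}

# Build a UUID index once, then dereference each entity's mention pointers.
mentions_by_uuid = {m["uuid"]: m for m in mention_set["mentionList"]}
coref_chains = {
    entity["canonicalName"]: [mentions_by_uuid[mid]["text"]
                              for mid in entity["mentionIdList"]]
    for entity in entity_set["entityList"]
}
```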
-
Module contents¶
concrete.language package¶
Submodules¶
concrete.language.constants module¶
concrete.language.ttypes module¶
-
class
concrete.language.ttypes.
LanguageIdentification
(uuid=None, metadata=None, languageToProbabilityMap=None)¶ Bases:
object
A theory about what languages are present in a given communication or piece of communication. Note that it is possible to have more than one language present in a given communication.
Attributes:
- uuid: Unique identifier for this language identification.
- metadata: Information about where this language identification came from.
- languageToProbabilityMap: A list mapping from a language to the probability that the language occurs in a given communication. Each language code should occur at most once in this list. The probabilities do not need to sum to one. For example, if a single communication is known to contain both English and French, then it would be appropriate to assign a probability of 1 to both languages. (Manually annotated LanguageProb objects should always have probabilities of either zero or one; machine-generated LanguageProbs may have intermediate probabilities.) Note: The string key should represent the ISO 639-3 three-letter code.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
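Because the probabilities in languageToProbabilityMap are per-language and need not sum to one, a consumer usually thresholds each entry independently instead of taking a single argmax. A small sketch; the threshold of 0.5 and the helper name languages_present are arbitrary choices for illustration.

```python
# Keys are ISO 639-3 codes; values are independent per-language probabilities.
language_to_probability = {"eng": 1.0, "fra": 1.0, "deu": 0.05}


def languages_present(probability_map, threshold=0.5):
    """Return the codes whose probability meets the threshold, sorted by code."""
    return sorted(code for code, probability in probability_map.items()
                  if probability >= threshold)


present = languages_present(language_to_probability)
total_mass = sum(language_to_probability.values())
```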
-
Module contents¶
concrete.learn package¶
Submodules¶
concrete.learn.ActiveLearnerClientService module¶
-
class
concrete.learn.ActiveLearnerClientService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.learn.ActiveLearnerClientService.Iface
The active learner client implements a method to accept new sorts of the annotation units.
-
recv_submitSort
()¶
-
send_submitSort
(sessionId, unitIds)¶
-
submitSort
(sessionId, unitIds)¶ - Submit a new sort of communications to the broker.
Parameters:
- sessionId
- unitIds
-
-
class
concrete.learn.ActiveLearnerClientService.
Iface
¶ Bases:
concrete.services.Service.Iface
The active learner client implements a method to accept new sorts of the annotation units.
-
submitSort
(sessionId, unitIds)¶ - Submit a new sort of communications to the broker.
Parameters:
- sessionId
- unitIds
-
-
class
concrete.learn.ActiveLearnerClientService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.learn.ActiveLearnerClientService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_submitSort
(seqid, iprot, oprot)¶
-
concrete.learn.ActiveLearnerServerService module¶
-
class
concrete.learn.ActiveLearnerServerService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.learn.ActiveLearnerServerService.Iface
The active learning server is responsible for sorting a list of communications. Users annotate communications based on the sort. Active learning is an asynchronous process. It is started by the client calling start(). At arbitrary times, the client can call addAnnotations(). When the server is done with a sort of the data, it calls submitSort() on the client. The server can perform additional sorts until stop() is called. The server must be preconfigured with the details of the data source from which to pull communications.
-
addAnnotations
(sessionId, annotations)¶ - Add annotations from the user to the learning process.
Parameters:
- sessionId
- annotations
-
recv_addAnnotations
()¶
-
recv_start
()¶
-
recv_stop
()¶
-
send_addAnnotations
(sessionId, annotations)¶
-
send_start
(sessionId, task, contact)¶
-
send_stop
(sessionId)¶
-
start
(sessionId, task, contact)¶ - Start an active learning session on these communications.
Parameters:
- sessionId
- task
- contact
-
stop
(sessionId)¶ - Stop the learning session.
Parameters:
- sessionId
-
-
class
concrete.learn.ActiveLearnerServerService.
Iface
¶ Bases:
concrete.services.Service.Iface
The active learning server is responsible for sorting a list of communications. Users annotate communications based on the sort. Active learning is an asynchronous process. It is started by the client calling start(). At arbitrary times, the client can call addAnnotations(). When the server is done with a sort of the data, it calls submitSort() on the client. The server can perform additional sorts until stop() is called. The server must be preconfigured with the details of the data source from which to pull communications.
-
addAnnotations
(sessionId, annotations)¶ - Add annotations from the user to the learning process.
Parameters:
- sessionId
- annotations
-
start
(sessionId, task, contact)¶ - Start an active learning session on these communications.
Parameters:
- sessionId
- task
- contact
-
stop
(sessionId)¶ - Stop the learning session.
Parameters:
- sessionId
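The start / addAnnotations / submitSort / stop exchange can be traced with two toy objects. This is a synchronous simulation of what the description calls an asynchronous process: RecordingClient and ToyLearnerServer are hypothetical, the “sort” simply moves annotated units to the back, the return value of start() is assumed to be a success flag, and the contact argument (which a real server would use to reach the client) is ignored.

```python
class RecordingClient:
    """Hypothetical client side: receives each new sort via submitSort."""

    def __init__(self):
        self.sorts = []

    def submitSort(self, sessionId, unitIds):
        self.sorts.append((sessionId, list(unitIds)))


class ToyLearnerServer:
    """Hypothetical server side: units without annotations are ranked first."""

    def __init__(self, client):
        self.client = client
        self.sessions = {}

    def start(self, sessionId, task, contact):
        self.sessions[sessionId] = {"units": list(task["units"]),
                                    "annotated": set()}
        self._resort(sessionId)
        return True  # assumed success flag

    def addAnnotations(self, sessionId, annotations):
        session = self.sessions[sessionId]
        session["annotated"].update(a["id"] for a in annotations)
        self._resort(sessionId)

    def stop(self, sessionId):
        del self.sessions[sessionId]

    def _resort(self, sessionId):
        session = self.sessions[sessionId]
        # Stable sort: False (not yet annotated) orders before True.
        ordered = sorted(session["units"],
                         key=lambda unit: unit in session["annotated"])
        self.client.submitSort(sessionId, ordered)


client = RecordingClient()
server = ToyLearnerServer(client)
server.start("s1", {"units": ["u1", "u2", "u3"]}, contact=None)
server.addAnnotations("s1", [{"id": "u1", "communication": None}])
server.stop("s1")
```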
-
-
class
concrete.learn.ActiveLearnerServerService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.learn.ActiveLearnerServerService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_addAnnotations
(seqid, iprot, oprot)¶
-
process_start
(seqid, iprot, oprot)¶
-
process_stop
(seqid, iprot, oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
addAnnotations_args
(sessionId=None, annotations=None)¶ Bases:
object
Attributes:
- sessionId
- annotations
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
addAnnotations_result
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
start_args
(sessionId=None, task=None, contact=None)¶ Bases:
object
Attributes:
- sessionId
- task
- contact
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ActiveLearnerServerService.
start_result
(success=None)¶ Bases:
object
Attributes:
- success
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.learn.constants module¶
concrete.learn.ttypes module¶
-
class
concrete.learn.ttypes.
Annotation
(id=None, communication=None)¶ Bases:
object
Annotation on a communication.
Attributes:
- id: Identifier of the part of the communication being annotated.
- communication: Communication with the annotation stored in it. The location of the annotation depends on the annotation unit identifier.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.learn.ttypes.
AnnotationTask
(type=None, language=None, unitType=None, units=None)¶ Bases:
object
Annotation task including information for pulling data.
Attributes:
- type: Type of annotation task.
- language: Language of the data for the task.
- unitType: Entire communication or individual sentences.
- units: Identifiers for each annotation unit.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Module contents¶
concrete.linking package¶
Submodules¶
concrete.linking.constants module¶
concrete.linking.ttypes module¶
-
class
concrete.linking.ttypes.
Link
(sourceId=None, linkTargetList=None)¶ Bases:
object
A structure that represents the origin of an entity linking annotation.
Attributes:
- sourceId: The “root” of this Link; points to an EntityMention UUID, Entity UUID, etc.
- linkTargetList: A list of LinkTarget objects that this Link contains.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.linking.ttypes.
LinkTarget
(confidence=None, targetId=None, dbId=None, dbName=None)¶ Bases:
object
A structure that represents the target of an entity linking annotation.
Attributes:
- confidence: Confidence of this LinkTarget object.
- targetId: A UUID that represents the target of this LinkTarget. This UUID should exist in the Entity/Situation(Mention)Set that the Linking object is contained in.
- dbId: A database ID that represents the target of this linking. This should be used if the target of the linking is not associated with an Entity|Situation(Mention)Set in Concrete, and therefore cannot be linked by a UUID internal to concrete. If present, the other optional field ‘dbName’ should also be populated.
- dbName: The name of the database that represents the target of this linking. Together with the ‘dbId’, this can form a pointer to a target that is not represented inside concrete. Should be populated alongside ‘dbId’.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
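The pairing rule for dbId and dbName can be checked with a small helper. The following is a minimal sketch rather than part of concrete: `check_link_target` is a hypothetical name, and `SimpleNamespace` objects stand in for LinkTarget instances so the example runs without the library installed.

```python
from types import SimpleNamespace


def check_link_target(target):
    """Return a list of problems found in a LinkTarget-like object.

    Hypothetical helper: it only reads the documented attributes
    dbId and dbName and is not part of the concrete package.
    """
    problems = []
    has_db_id = target.dbId is not None
    has_db_name = target.dbName is not None
    # dbId and dbName are documented as a pair: populate both or neither.
    if has_db_id != has_db_name:
        problems.append("dbId and dbName should be populated together")
    return problems


# Stand-in objects with the same attribute names as LinkTarget.
# The identifier values are made up for illustration.
internal = SimpleNamespace(confidence=0.9, targetId="uuid-1", dbId=None, dbName=None)
half_filled = SimpleNamespace(confidence=0.7, targetId=None, dbId="entry-42", dbName=None)

print(check_link_target(internal))
print(check_link_target(half_filled))
```

Because the helper is duck-typed, it should also accept real LinkTarget objects, assuming they expose the attributes listed above.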
-
-
class
concrete.linking.ttypes.
Linking
(metadata=None, linkList=None)¶ Bases:
object
A structure that represents entity linking annotations.
Attributes:
- metadata: Metadata related to this Linking object.
- linkList: A list of Link objects that this Linking object contains.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Module contents¶
concrete.metadata package¶
Submodules¶
concrete.metadata.constants module¶
concrete.metadata.ttypes module¶
-
class
concrete.metadata.ttypes.
AnnotationMetadata
(tool=None, timestamp=None, digest=None, dependencies=None, kBest=1)¶ Bases:
object
Metadata associated with an annotation or a set of annotations, that identifies where those annotations came from.
Attributes:
- tool: The name of the tool that generated this annotation.
- timestamp: The time at which this annotation was generated (in Unix time UTC, i.e., seconds since January 1, 1970).
- digest: A Digest, carrying over any information the annotation metadata wishes to carry over.
- dependencies: The theories that supported this annotation. An empty field indicates that the theory has no dependencies (e.g., an ingester).
- kBest: An integer that represents a ranking for systems that output k-best lists. For systems that do not output k-best lists, the default value (1) should suffice.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
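As an illustration of the timestamp and kBest conventions, the sketch below builds the keyword arguments for an AnnotationMetadata. `make_metadata_fields` is a hypothetical helper and the tool name is a placeholder; only the key names come from the documented constructor.

```python
import time


def make_metadata_fields(tool, k_best=1):
    """Build keyword arguments matching the AnnotationMetadata constructor.

    Hypothetical helper; the keys follow the documented attribute names.
    """
    return {
        "tool": tool,
        # Unix time in whole seconds since January 1, 1970 (UTC).
        "timestamp": int(time.time()),
        # 1 is the documented default for systems without k-best output.
        "kBest": k_best,
    }


fields = make_metadata_fields("example-tokenizer")
print(fields)
```

With concrete installed, the dictionary could presumably be passed on as `AnnotationMetadata(**fields)`, since the constructor accepts these names as keyword arguments.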
-
-
class
concrete.metadata.ttypes.
CommunicationMetadata
(tweetInfo=None, emailInfo=None, nitfInfo=None)¶ Bases:
object
Metadata specific to a particular Communication object. This might include corpus-specific metadata (from the Twitter API), attributes associated with the Communication (the author), or other information about the Communication.
Attributes:
- tweetInfo: Extra information for communications where kind==TWEET: Information about this tweet that is provided by the Twitter API. For information about the Twitter API, see:
- emailInfo: Extra information for communications where kind==EMAIL
- nitfInfo: Extra information that may come from the NITF (News Industry Text Format) schema. See ‘nitf.thrift’.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.metadata.ttypes.
Digest
(bytesValue=None, int64Value=None, doubleValue=None, stringValue=None, int64List=None, doubleList=None, stringList=None)¶ Bases:
object
Analytic-specific information about an attribute or edge. Digests are used to combine information from multiple sources to generate a unified value. The digests generated by an analytic will only ever be used by that same analytic, so analytics can feel free to encode information in whatever way is convenient.
Attributes:
- bytesValue: The following fields define various ways you can store the digest data (for convenience). If none of these meets your needs, then serialize the digest to a byte sequence and store it in bytesValue.
- int64Value
- doubleValue
- stringValue
- int64List
- doubleList
- stringList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
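When none of the typed fields fits, the digest can be serialized to bytes for bytesValue. The sketch below uses JSON purely as an assumed encoding; the text leaves the format to each analytic, and `encode_digest` and `decode_digest` are hypothetical names.

```python
import json


def encode_digest(payload):
    """Serialize a JSON-compatible payload to bytes suitable for bytesValue."""
    # sort_keys makes the byte sequence deterministic for equal payloads.
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_digest(raw):
    """Invert encode_digest for the analytic that wrote the digest."""
    return json.loads(raw.decode("utf-8"))


# Made-up payload for illustration.
raw = encode_digest({"counts": [3, 1], "source": "run-7"})
print(raw)
```

Since a digest is only read back by the analytic that produced it, any encoding works as long as that analytic can decode it.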
-
-
class
concrete.metadata.ttypes.
TheoryDependencies
(sectionTheoryList=None, sentenceTheoryList=None, tokenizationTheoryList=None, posTagTheoryList=None, nerTagTheoryList=None, lemmaTheoryList=None, langIdTheoryList=None, parseTheoryList=None, dependencyParseTheoryList=None, tokenAnnotationTheoryList=None, entityMentionSetTheoryList=None, entitySetTheoryList=None, situationMentionSetTheoryList=None, situationSetTheoryList=None, communicationsList=None)¶ Bases:
object
A struct that holds UUIDs for all theories that a particular annotation was based upon (and presumably requires).
Producers of TheoryDependencies should list all stages that they used in constructing their particular annotation. They do not, however, need to explicitly label each stage; they can label only the immediate stage before them.
Examples:
If you are producing a Tokenization, and only used the SentenceSegmentation in order to produce that Tokenization, list only the single SentenceSegmentation UUID in sentenceTheoryList. In this example, even though the SentenceSegmentation will have a dependency on some SectionSegmentation, it is not necessary for the Tokenization to list the SectionSegmentation UUID as a dependency.
If you are a producer of EntityMentions, and you use two POSTokenTagging and one NERTokenTagging objects, add the UUIDs for the POSTokenTagging objects to posTagTheoryList, and the UUID of the NER TokenTagging to the nerTagTheoryList. In this example, because multiple annotations influenced the new annotation, they should all be listed as dependencies.
Attributes:
- sectionTheoryList
- sentenceTheoryList
- tokenizationTheoryList
- posTagTheoryList
- nerTagTheoryList
- lemmaTheoryList
- langIdTheoryList
- parseTheoryList
- dependencyParseTheoryList
- tokenAnnotationTheoryList
- entityMentionSetTheoryList
- entitySetTheoryList
- situationMentionSetTheoryList
- situationSetTheoryList
- communicationsList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
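The second example in the TheoryDependencies description (two POS taggings and one NER tagging) can be sketched as follows. `dependency_lists` is a hypothetical helper and the UUID strings are placeholders; only the field names come from the attribute list.

```python
def dependency_lists(immediate_inputs):
    """Group UUIDs of directly used annotations by TheoryDependencies field.

    immediate_inputs is a list of (field_name, uuid) pairs. Only the
    annotations consumed directly are listed, not their own ancestors.
    """
    lists = {}
    for field_name, uuid in immediate_inputs:
        lists.setdefault(field_name, []).append(uuid)
    return lists


# An EntityMention producer that used two POS taggings and one NER tagging.
deps = dependency_lists([
    ("posTagTheoryList", "pos-uuid-a"),
    ("posTagTheoryList", "pos-uuid-b"),
    ("nerTagTheoryList", "ner-uuid"),
])
print(deps)
```

The resulting dictionary uses the constructor's keyword names, so it could presumably be expanded into `TheoryDependencies(**deps)` once the placeholder strings are replaced by real UUID objects.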
-
Module contents¶
concrete.nitf package¶
Submodules¶
concrete.nitf.constants module¶
concrete.nitf.ttypes module¶
-
class
concrete.nitf.ttypes.
NITFInfo
(alternateURL=None, articleAbstract=None, authorBiography=None, banner=None, biographicalCategoryList=None, columnName=None, columnNumber=None, correctionDate=None, correctionText=None, credit=None, dayOfWeek=None, descriptorList=None, featurePage=None, generalOnlineDescriptorList=None, guid=None, kicker=None, leadParagraphList=None, locationList=None, nameList=None, newsDesk=None, normalizedByline=None, onlineDescriptorList=None, onlineHeadline=None, onlineLeadParagraph=None, onlineLocationList=None, onlineOrganizationList=None, onlinePeople=None, onlineSectionList=None, onlineTitleList=None, organizationList=None, page=None, peopleList=None, publicationDate=None, publicationDayOfMonth=None, publicationMonth=None, publicationYear=None, section=None, seriesName=None, slug=None, taxonomicClassifierList=None, titleList=None, typesOfMaterialList=None, url=None, wordCount=None)¶ Bases:
object
Attributes:
- alternateURL: This field specifies the URL of the article, if published online. In some cases, such as with the New York Times, when this field is present, the URL is preferred to the URL field on articles published on or after April 02, 2006, as the linked page will have richer content.
- articleAbstract: This field is a summary of the article, possibly written by an indexing service.
- authorBiography: This field specifies the biography of the author of the article. Generally, this field is specified for guest authors, and not for regular reporters, except to provide the author’s email address.
- banner: The banner field is used to indicate if there has been additional information appended to the article since its publication. Examples of banners include ‘Correction Appended’ and ‘Editor’s Note Appended’.
- biographicalCategoryList: When present, the biographical category field generally indicates that a document focuses on a particular individual. The value of the field indicates the area or category in which this individual is best known. This field is most often defined for Obituaries and Book Reviews. Examples include: Politics and Government (U.S.); Books and Magazines; Royalty.
- columnName: If the article is part of a regular column, this field specifies the name of that column. Sample column names: World News Briefs; WEDDINGS; The Accessories Channel.
- columnNumber: This field specifies the column in which the article starts in the print paper. A typical printed page in the paper has six columns numbered from right to left. As a consequence most, but not all, of the values for this field fall in the range 1-6.
- correctionDate: This field specifies the date on which a correction was made to the article. Generally, if the correction date is specified, the correction text will also be specified (and vice versa).
- correctionText: For articles corrected following publication, this field specifies the correction. Generally, if the correction text is specified, the correction date will also be specified (and vice versa).
- credit: This field indicates the entity that produced the editorial content of this document.
- dayOfWeek: This field specifies the day of week on which the article was published. One of: Monday; Tuesday; Wednesday; Thursday; Friday; Saturday; Sunday.
- descriptorList: The "descriptors" field specifies a list of descriptive terms drawn from a normalized controlled vocabulary corresponding to subjects mentioned in the article. Examples include: ECONOMIC CONDITIONS AND TRENDS; AIRPLANES; VIOLINS.
- featurePage: The feature page containing this article, such as Education Page or Fashion Page.
- generalOnlineDescriptorList: The "general online descriptors" field specifies a list of descriptors that are at a higher level of generality than the other tags associated with the article. Examples include: Surfing; Venice Biennale; Ranches.
- guid: The GUID field specifies an integer that is guaranteed to be unique for every document in the corpus.
- kicker: The kicker is an additional piece of information printed as an accompaniment to a news headline.
- leadParagraphList: The "lead Paragraph" field is the lead paragraph of the article. Generally this field is populated with the first two paragraphs from the article.
- locationList: The "locations" field specifies a list of geographic descriptors drawn from a normalized controlled vocabulary that correspond to places mentioned in the article. Examples include: Wellsboro (Pa); Kansas City (Kan); Park Slope (NYC).
- nameList: The "names" field specifies a list of names mentioned in the article. Examples include: Azza Fahmy; George C. Izenour; Chris Schenkel.
- newsDesk: This field specifies the desk in the newsroom that produced the article. The desk is related to, but is not the same as, the section in which the article appears.
- normalizedByline: The Normalized Byline field is the byline normalized to the form (last name, first name).
- onlineDescriptorList: This field specifies a list of descriptors from a normalized controlled vocabulary that correspond to topics mentioned in the article. Examples include: Marriages; Parks and Other Recreation Areas; Cooking and Cookbooks.
- onlineHeadline: This field specifies the headline displayed with the article online. Often this differs from the headline used in print.
- onlineLeadParagraph: This field specifies the lead paragraph for the online version.
- onlineLocationList: This field specifies a list of place names that correspond to geographic locations mentioned in the article. Examples include: Hollywood; Los Angeles; Arcadia.
- onlineOrganizationList: This field specifies a list of organizations that correspond to organizations mentioned in the article. Examples include: Nintendo Company Limited; Yeshiva University; Rose Center.
- onlinePeople: This field specifies a list of people that correspond to individuals mentioned in the article. Examples include: Lopez, Jennifer; Joyce, James; Robinson, Jackie.
- onlineSectionList: This field specifies the section(s) in which the online version of the article is placed. This may typically be populated from a semicolon (;) delimited list.
- onlineTitleList: This field specifies a list of authored works mentioned in the article. Examples include: Matchstick Men (Movie); Blades of Glory (Movie); Bridge and Tunnel (Play).
- organizationList: This field specifies a list of organization names drawn from a normalized controlled vocabulary that correspond to organizations mentioned in the article. Examples include: Circuit City Stores Inc; Delaware County Community College (Pa); CONNECTICUT GRAND OPERA.
- page: This field specifies the page of the section in the paper in which the article appears. This is not an absolute pagination. An article that appears on page 3 in section A occurs in the physical paper before an article that occurs on page 1 of section F. The section is encoded in the section field.
- peopleList: This field specifies a list of people from a normalized controlled vocabulary that correspond to individuals mentioned in the article. Examples include: REAGAN, RONALD WILSON (PRES); BEGIN, MENACHEM (PRIME MIN); COLLINS, GLENN.
- publicationDate: This field specifies the date of the article’s publication.
- publicationDayOfMonth: This field specifies the day of the month on which the article was published, always in the range 1-31.
- publicationMonth: This field specifies the month in which the article was published, in the range 1-12, where 1 is January, 2 is February, and so on.
- publicationYear: This field specifies the year in which the article was published. This value is in the range 1987-2007 for this collection.
- section: This field specifies the section of the paper in which the article appears. This is not the name of the section, but rather a letter or number that indicates the section.
- seriesName: If the article is part of a regular series, this field specifies the name of that series.
- slug: The slug is a short string that uniquely identifies an article from all other articles published on the same day. Please note, however, that different articles on different days may have the same slug. Examples include: 30other; 12reunion.
- taxonomicClassifierList: This field specifies a list of taxonomic classifiers that place this article into a hierarchy of articles. The individual terms of each taxonomic classifier are separated with the ‘/’ character. Examples include: Top/Features/Travel/Guides/Destinations/North America/United States/Arizona; Top/News/U.S./Rockies; Top/Opinion.
- titleList: This field specifies a list of authored works that correspond to works mentioned in the article. Examples include: Greystoke: The Legend of Tarzan, Lord of the Apes (Movie); Law and Order (TV Program); BATTLEFIELD EARTH (BOOK).
- typesOfMaterialList: This field specifies a normalized list of terms describing the general editorial category of the article. Examples include: REVIEW; OBITUARY; ANALYSIS.
- url: This field specifies the location of the online version of the article. The "Alternative Url" field is preferred to this field on articles published on or after April 02, 2006, as the linked page will have richer content.
- wordCount: This field specifies the number of words in the body of the article, including the lead paragraph.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
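Two of the NITFInfo fields have delimiter conventions that are easy to illustrate: taxonomic classifiers separate their terms with ‘/’, and online sections may arrive as a semicolon-separated string. The helpers below are a hypothetical sketch, and the section string is a made-up sample.

```python
def split_taxonomic_classifier(classifier):
    """Split one taxonomic classifier into its hierarchy terms."""
    return classifier.split("/")


def split_online_sections(raw_sections):
    """Split a semicolon-separated section string, dropping empty entries."""
    return [part.strip() for part in raw_sections.split(";") if part.strip()]


terms = split_taxonomic_classifier("Top/News/U.S./Rockies")
sections = split_online_sections("Arts; Books;")
print(terms)
print(sections)
```

The first element of `terms` is the root of the hierarchy, so classifiers that share a prefix of terms belong to the same branch.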
-
Module contents¶
concrete.search package¶
Submodules¶
concrete.search.FeedbackService module¶
-
class
concrete.search.FeedbackService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.search.FeedbackService.Iface
-
addCommunicationFeedback
(searchResultsId, communicationId, feedback)¶ - Provide feedback on the relevance of a particular communication to a search. Parameters: searchResultsId, communicationId, feedback
-
addSentenceFeedback
(searchResultsId, communicationId, sentenceId, feedback)¶ - Provide feedback on the relevance of a particular sentence to a search. Parameters: searchResultsId, communicationId, sentenceId, feedback
-
recv_addCommunicationFeedback
()¶
-
recv_addSentenceFeedback
()¶
-
recv_startFeedback
()¶
-
send_addCommunicationFeedback
(searchResultsId, communicationId, feedback)¶
-
send_addSentenceFeedback
(searchResultsId, communicationId, sentenceId, feedback)¶
-
send_startFeedback
(results)¶
-
startFeedback
(results)¶ - Start providing feedback for the specified SearchResults. This causes the search and its results to be persisted. Parameters: results
-
-
class
concrete.search.FeedbackService.
Iface
¶ Bases:
concrete.services.Service.Iface
-
addCommunicationFeedback
(searchResultsId, communicationId, feedback)¶ - Provide feedback on the relevance of a particular communication to a search. Parameters: searchResultsId, communicationId, feedback
-
addSentenceFeedback
(searchResultsId, communicationId, sentenceId, feedback)¶ - Provide feedback on the relevance of a particular sentence to a search. Parameters: searchResultsId, communicationId, sentenceId, feedback
-
startFeedback
(results)¶ - Start providing feedback for the specified SearchResults. This causes the search and its results to be persisted. Parameters: results
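The call order implied by this interface (startFeedback first, then per-communication or per-sentence feedback) can be sketched with an in-memory stand-in. This is not the real service: the class name, the use of a plain dict with a "uuid" key for the results, and the KeyError on unknown ids are all assumptions made for the illustration.

```python
class InMemoryFeedbackHandler:
    """Toy handler with the same method names as FeedbackService.Iface."""

    def __init__(self):
        self.results = {}   # search results id -> persisted results
        self.feedback = {}  # (results id, communication id, sentence id) -> value

    def startFeedback(self, results):
        # Persist the results so later feedback calls can refer to them.
        self.results[results["uuid"]] = results

    def addCommunicationFeedback(self, searchResultsId, communicationId, feedback):
        self._record(searchResultsId, communicationId, None, feedback)

    def addSentenceFeedback(self, searchResultsId, communicationId, sentenceId, feedback):
        self._record(searchResultsId, communicationId, sentenceId, feedback)

    def _record(self, results_id, communication_id, sentence_id, feedback):
        # Assumed behaviour: feedback for unregistered results is rejected.
        if results_id not in self.results:
            raise KeyError("startFeedback was not called for %r" % results_id)
        self.feedback[(results_id, communication_id, sentence_id)] = feedback


handler = InMemoryFeedbackHandler()
handler.startFeedback({"uuid": "results-1"})
handler.addCommunicationFeedback("results-1", "comm-1", 1)
handler.addSentenceFeedback("results-1", "comm-1", "sent-3", -1)
print(handler.feedback)
```

A handler with these method names is the kind of object a Processor is constructed with, although a real handler would receive Thrift structs rather than dictionaries.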
-
-
class
concrete.search.FeedbackService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.search.FeedbackService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_addCommunicationFeedback
(seqid, iprot, oprot)¶
-
process_addSentenceFeedback
(seqid, iprot, oprot)¶
-
process_startFeedback
(seqid, iprot, oprot)¶
-
-
class
concrete.search.FeedbackService.
addCommunicationFeedback_args
(searchResultsId=None, communicationId=None, feedback=None)¶ Bases:
object
Attributes:- searchResultsId- communicationId- feedback-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.FeedbackService.
addCommunicationFeedback_result
(ex=None)¶ Bases:
object
Attributes:- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.FeedbackService.
addSentenceFeedback_args
(searchResultsId=None, communicationId=None, sentenceId=None, feedback=None)¶ Bases:
object
Attributes:- searchResultsId- communicationId- sentenceId- feedback-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.FeedbackService.
addSentenceFeedback_result
(ex=None)¶ Bases:
object
Attributes:- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.search.SearchProxyService module¶
-
class
concrete.search.SearchProxyService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.search.SearchProxyService.Iface
The search proxy service provides a single interface to multiple search providers.
-
getCapabilities
(provider)¶ - Get a list of search type and language pairs for a search provider. Parameters: provider
-
getCorpora
(provider)¶ - Get a corpus list for a search provider. Parameters: provider
-
getProviders
()¶ - Get a list of search providers behind the proxy
-
recv_getCapabilities
()¶
-
recv_getCorpora
()¶
-
recv_getProviders
()¶
-
recv_search
()¶
-
search
(query, provider)¶ - Specify the search provider when performing a search. Parameters: query, provider
-
send_getCapabilities
(provider)¶
-
send_getCorpora
(provider)¶
-
send_getProviders
()¶
-
send_search
(query, provider)¶
-
-
class
concrete.search.SearchProxyService.
Iface
¶ Bases:
concrete.services.Service.Iface
The search proxy service provides a single interface to multiple search providers.
-
getCapabilities
(provider)¶ - Get a list of search type and language pairs for a search provider. Parameters: provider
-
getCorpora
(provider)¶ - Get a corpus list for a search provider. Parameters: provider
-
getProviders
()¶ - Get a list of search providers behind the proxy
-
search
(query, provider)¶ - Specify the search provider when performing a search. Parameters: query, provider
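The idea of one interface in front of several providers can be sketched with a plain dispatch table. Everything here is an illustrative assumption: `InMemorySearchProxy` and `EchoProvider` are hypothetical classes, the capability tuple and corpus name are placeholders, and real providers would be Thrift clients rather than local objects.

```python
class InMemorySearchProxy:
    """Toy proxy with the same method names as SearchProxyService.Iface."""

    def __init__(self, providers):
        # providers maps a provider name to an object offering search(query),
        # getCapabilities() and getCorpora(), like SearchService.Iface.
        self._providers = dict(providers)

    def getProviders(self):
        # Sorted only to make the output deterministic.
        return sorted(self._providers)

    def getCapabilities(self, provider):
        return self._providers[provider].getCapabilities()

    def getCorpora(self, provider):
        return self._providers[provider].getCorpora()

    def search(self, query, provider):
        return self._providers[provider].search(query)


class EchoProvider:
    """Hypothetical provider used only for this demonstration."""

    def __init__(self, corpora):
        self._corpora = corpora

    def getCapabilities(self):
        return [("communications", "eng")]

    def getCorpora(self):
        return list(self._corpora)

    def search(self, query):
        return {"searchQuery": query, "searchResultItems": []}


proxy = InMemorySearchProxy({"beta": EchoProvider(["wiki"]), "alpha": EchoProvider([])})
print(proxy.getProviders())
print(proxy.getCorpora("beta"))
```

Each proxy method takes the provider name and forwards the remaining arguments, which mirrors how the proxy methods differ from their SearchService counterparts only by the extra provider parameter.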
-
-
class
concrete.search.SearchProxyService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.search.SearchProxyService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_getCapabilities
(seqid, iprot, oprot)¶
-
process_getCorpora
(seqid, iprot, oprot)¶
-
process_getProviders
(seqid, iprot, oprot)¶
-
process_search
(seqid, iprot, oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCapabilities_args
(provider=None)¶ Bases:
object
Attributes:- provider-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCapabilities_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCorpora_args
(provider=None)¶ Bases:
object
Attributes:- provider-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getCorpora_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getProviders_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchProxyService.
getProviders_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.search.SearchService module¶
-
class
concrete.search.SearchService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.search.SearchService.Iface
-
getCapabilities
()¶ - Get a list of search type-language pairs
-
getCorpora
()¶ - Get a corpus list from the search provider
-
recv_getCapabilities
()¶
-
recv_getCorpora
()¶
-
recv_search
()¶
-
search
(query)¶ - Perform a search specified by the query. Parameters: query
-
send_getCapabilities
()¶
-
send_getCorpora
()¶
-
send_search
(query)¶
-
-
class
concrete.search.SearchService.
Iface
¶ Bases:
concrete.services.Service.Iface
-
getCapabilities
()¶ - Get a list of search type-language pairs
-
getCorpora
()¶ - Get a corpus list from the search provider
-
search
(query)¶ - Perform a search specified by the query. Parameters: query
-
-
class
concrete.search.SearchService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.search.SearchService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_getCapabilities
(seqid, iprot, oprot)¶
-
process_getCorpora
(seqid, iprot, oprot)¶
-
process_search
(seqid, iprot, oprot)¶
-
-
class
concrete.search.SearchService.
getCapabilities_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchService.
getCapabilities_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchService.
getCorpora_args
¶ Bases:
object
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.SearchService.
getCorpora_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
concrete.search.constants module¶
concrete.search.ttypes module¶
-
class
concrete.search.ttypes.
SearchCapability
(type=None, lang=None)¶ Bases:
object
A search provider describes its capabilities with a list of search type and language pairs.
Attributes:
- type: A type of search supported by the search provider
- lang: Language that the search provider supports. Use ISO 639-2/T three letter codes.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.ttypes.
SearchFeedback
¶ Bases:
object
Feedback values
-
NEGATIVE
= -1¶
-
NONE
= 0¶
-
POSITIVE
= 1¶
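Because the three feedback values are -1, 0 and 1, they can be combined arithmetically. The sketch below redefines the constants in a stand-in class so that it runs on its own; `net_feedback` is a hypothetical helper, and summing is just one possible way to aggregate.

```python
class SearchFeedback:
    """Stand-in mirroring the documented constants."""
    NEGATIVE = -1
    NONE = 0
    POSITIVE = 1


def net_feedback(values):
    """Sum feedback values; NONE contributes nothing to the total."""
    return sum(values)


score = net_feedback([
    SearchFeedback.POSITIVE,
    SearchFeedback.POSITIVE,
    SearchFeedback.NEGATIVE,
    SearchFeedback.NONE,
])
print(score)
```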
-
-
class
concrete.search.ttypes.
SearchQuery
(terms=None, questions=None, communicationId=None, tokens=None, rawQuery=None, auths=None, userId=None, name=None, labels=None, type=None, lang=None, corpus=None, k=None, communication=None)¶ Bases:
object
Wrapper for information relevant to a (possibly structured) search.
Attributes:
- terms: Individual words, or multiword phrases, e.g., ‘dog’, ‘blue cheese’. It is the responsibility of the implementation of Search* to tokenize multiword phrases, if so desired. Further, an implementation may choose to support advanced features such as wildcards, e.g., ‘blue*’. This specification makes no commitment as to the internal structure of keywords and their semantics: that is the responsibility of the individual implementation.
- questions: Natural-language questions, e.g., “what is the capital of spain?” This is a list so that different phrasings of the question can be included, e.g., “what is the name of spain’s capital?”
- communicationId: Refers to an optional communication that can provide context for the query.
- tokens: Refers to a sequence of tokens in the communication referenced by communicationId.
- rawQuery: The input from the user provided in the search box, unmodified
- auths: optional authorization mechanism
- userId: Identifies the user who submitted the search query
- name: Human readable name of the query.
- labels: Properties of the query or user. These labels can be used to group queries and results by a domain or group of users for training. An example usage would be assigning the geographical region as a label (“spain”). User labels could be based on organizational units (“hltcoe”).
- type: This search is over this type of data (communications, sentences, entities)
- lang: The language of the corpus that the user wants to search. Use ISO 639-2/T three letter codes.
- corpus: An identifier of the corpus that the search is to be performed over.
- k: The maximum number of candidates the search service should return.
- communication: An optional communication used as context for the query. If both this field and communicationId are populated, then it is assumed the ID of the communication is the same as communicationId.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
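The assumption that ties communication to communicationId can be verified before a query is sent. The following is a minimal sketch: `context_is_consistent` is a hypothetical helper, the stand-in objects replace SearchQuery and Communication, and reading the identifier from an `id` attribute of the communication is an assumption of this example.

```python
from types import SimpleNamespace


def context_is_consistent(query):
    """Check the documented assumption linking communication and communicationId.

    When both are populated, the embedded communication's id is assumed
    to equal communicationId. Not part of the concrete package.
    """
    if query.communication is None or query.communicationId is None:
        return True
    return query.communication.id == query.communicationId


# Stand-ins carrying only the two attributes the helper reads.
matching = SimpleNamespace(communicationId="comm-1",
                           communication=SimpleNamespace(id="comm-1"))
mismatched = SimpleNamespace(communicationId="comm-1",
                             communication=SimpleNamespace(id="comm-2"))
id_only = SimpleNamespace(communicationId="comm-1", communication=None)

print(context_is_consistent(matching), context_is_consistent(mismatched))
```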
-
-
class
concrete.search.ttypes.
SearchResult
(uuid=None, searchQuery=None, searchResultItems=None, metadata=None, lang=None)¶ Bases:
object
Single wrapper for results from all the various Search* services.
Attributes:
- uuid: Unique identifier for the results of this search.
- searchQuery: The query that led to this result. Useful for capturing feedback or building training data.
- searchResultItems: The list is assumed sorted best to worst, which should be reflected by the values contained in the score field of each SearchResultItem, if that field is populated.
- metadata: The system that provided the response: likely use case for populating this field is for building training data. Presumably a system will not need/want to return this object in live use.
- lang: The dominant language of the search results. Use ISO 639-2/T three letter codes. Search providers should set this when possible to support downstream processing. Do not set if it is not known. If multilingual, use the string “multilingual”.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.search.ttypes.
SearchResultItem
(communicationId=None, sentenceId=None, score=None, tokens=None)¶ Bases:
object
An individual element returned from a search. Most/all methods will return a communicationId, possibly with an associated score. For example, if the target element type of the search is Sentence then the sentenceId field should be populated.
Attributes:
- communicationId
- sentenceId: The UUID of the returned sentence, which appears in the communication referenced by communicationId.
- score: Values are not restricted in range (e.g., do not have to be within [0,1]). Higher is better.
- tokens: If the Search is meant to result in a tokenRefSequence, this is that result. Otherwise, this field may be optionally populated in order to provide a hint to the client as to where to center a visualization, or the extraction of context, etc.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
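Since scores are unbounded and higher is better, ordering result items best to worst is a descending sort on score. The sketch below uses stand-in objects in place of SearchResultItem; `rank_items` is a hypothetical helper, and placing unscored items last is a choice made for this example, not something the text prescribes.

```python
from types import SimpleNamespace


def rank_items(items):
    """Order result items best to worst; higher scores are better.

    Items without a score are kept at the end in their original order.
    """
    scored = [item for item in items if item.score is not None]
    unscored = [item for item in items if item.score is None]
    return sorted(scored, key=lambda item: item.score, reverse=True) + unscored


# Made-up items; scores need not lie within [0, 1].
items = [
    SimpleNamespace(communicationId="a", score=2.5),
    SimpleNamespace(communicationId="b", score=-1.0),
    SimpleNamespace(communicationId="c", score=None),
    SimpleNamespace(communicationId="d", score=17.0),
]
ranked_ids = [item.communicationId for item in rank_items(items)]
print(ranked_ids)
```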
-
Module contents¶
concrete.services package¶
Subpackages¶
concrete.services.results package¶
-
class
concrete.services.results.ResultsServerService.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Client
,concrete.services.results.ResultsServerService.Iface
-
getLatestSearchResult
(userId)¶ - Get the most recent search results for a user. Parameters: userId
-
getNextChunk
(sessionId)¶ - Get the next chunk of data to annotate. The client should use the Retriever service to access the data. Parameters: sessionId
-
getSearchResult
(searchResultId)¶ - Get a search result object. Parameters: searchResultId
-
getSearchResults
(taskType, limit)¶ - Get a list of search results for a particular annotation task. Set the limit to 0 to get all relevant search results. Parameters: taskType, limit
-
getSearchResultsByUser
(taskType, userId, limit)¶ - Get a list of search results for a particular annotation task, filtered by a user id. Set the limit to 0 to get all relevant search results. Parameters: taskType, userId, limit
-
recv_getLatestSearchResult
()¶
-
recv_getNextChunk
()¶
-
recv_getSearchResult
()¶
-
recv_getSearchResults
()¶
-
recv_getSearchResultsByUser
()¶
-
recv_registerSearchResult
()¶
-
recv_startSession
()¶
-
recv_stopSession
()¶
-
recv_submitAnnotation
()¶
-
registerSearchResult
(result, taskType)¶ - Register the specified search result for annotation. If a name has not been assigned to the search query, one will be generated. This service also requires that the userId field be populated in the SearchQuery. Parameters: result, taskType
-
send_getLatestSearchResult
(userId)¶
-
send_getNextChunk
(sessionId)¶
-
send_getSearchResult
(searchResultId)¶
-
send_getSearchResults
(taskType, limit)¶
-
send_getSearchResultsByUser
(taskType, userId, limit)¶
-
send_registerSearchResult
(result, taskType)¶
-
send_startSession
(searchResultId, taskType)¶
-
send_stopSession
(sessionId)¶
-
send_submitAnnotation
(sessionId, unitId, communication)¶
-
startSession
(searchResultId, taskType)¶ - Start an annotation session. Returns a session id used in future session calls. Parameters: searchResultId, taskType
-
stopSession
(sessionId)¶ - Stops an annotation session. Parameters: sessionId
-
submitAnnotation
(sessionId, unitId, communication)¶ - Submit an annotation for a session. Parameters: sessionId, unitId, communication
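For getSearchResults and getSearchResultsByUser, a limit of 0 means "return everything" rather than "return nothing". A server-side helper honouring that convention might look like the sketch below; `apply_limit` is a hypothetical name, and rejecting negative limits is an assumption of this example.

```python
def apply_limit(results, limit):
    """Return the first `limit` results, or all of them when limit is 0."""
    if limit < 0:
        # Assumed behaviour; the documentation does not cover negative limits.
        raise ValueError("limit must be non-negative")
    return list(results) if limit == 0 else list(results[:limit])


all_results = apply_limit(["r1", "r2", "r3"], 0)
first_two = apply_limit(["r1", "r2", "r3"], 2)
print(all_results, first_two)
```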
-
-
class
concrete.services.results.ResultsServerService.
Iface
¶ Bases:
concrete.services.Service.Iface
-
getLatestSearchResult
(userId)¶ - Get the most recent search results for a user. Parameters: userId
-
getNextChunk
(sessionId)¶ - Get the next chunk of data to annotate. The client should use the Retriever service to access the data. Parameters: sessionId
-
getSearchResult
(searchResultId)¶ - Get a search result object. Parameters: searchResultId
-
getSearchResults
(taskType, limit)¶ - Get a list of search results for a particular annotation task. Set the limit to 0 to get all relevant search results. Parameters: taskType, limit
-
getSearchResultsByUser
(taskType, userId, limit)¶ - Get a list of search results for a particular annotation task, filtered by a user id. Set the limit to 0 to get all relevant search results. Parameters: taskType, userId, limit
-
registerSearchResult
(result, taskType)¶ - Register the specified search result for annotation. If a name has not been assigned to the search query, one will be generated. This service also requires that the userId field be populated in the SearchQuery. Parameters: result, taskType
-
startSession
(searchResultId, taskType)¶ - Start an annotation session. Returns a session id used in future session calls. Parameters: searchResultId, taskType
-
stopSession
(sessionId)¶ - Stops an annotation session. Parameters: sessionId
-
submitAnnotation
(sessionId, unitId, communication)¶ - Submit an annotation for a session. Parameters: sessionId, unitId, communication
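The session calls of this interface form a small lifecycle: startSession returns an id, getNextChunk hands out units to annotate, submitAnnotation stores results, and stopSession ends the session. The in-memory class below sketches that lifecycle under stated assumptions: the class name, the session id format, the fixed chunk size, and the KeyError for unknown sessions are all invented for the illustration.

```python
import itertools


class InMemoryResultsServer:
    """Toy handler covering the session calls of ResultsServerService.Iface."""

    def __init__(self, units_by_result, chunk_size=2):
        self._units_by_result = units_by_result  # search result id -> unit ids
        self._chunk_size = chunk_size            # assumed chunking policy
        self._sessions = {}                      # session id -> pending unit ids
        self._counter = itertools.count(1)
        self.annotations = {}

    def startSession(self, searchResultId, taskType):
        session_id = "session-%d" % next(self._counter)
        self._sessions[session_id] = list(self._units_by_result[searchResultId])
        return session_id

    def getNextChunk(self, sessionId):
        pending = self._sessions[sessionId]
        chunk = pending[:self._chunk_size]
        self._sessions[sessionId] = pending[self._chunk_size:]
        return chunk

    def submitAnnotation(self, sessionId, unitId, communication):
        if sessionId not in self._sessions:
            raise KeyError("unknown session %r" % sessionId)
        self.annotations[(sessionId, unitId)] = communication

    def stopSession(self, sessionId):
        del self._sessions[sessionId]


server = InMemoryResultsServer({"result-1": ["u1", "u2", "u3"]})
session = server.startSession("result-1", "NER")
first_chunk = server.getNextChunk(session)
second_chunk = server.getNextChunk(session)
server.submitAnnotation(session, "u1", {"id": "comm-1"})
server.stopSession(session)
print(first_chunk, second_chunk)
```

In the real service the chunk would presumably identify annotation units whose communications the client then fetches separately; here the unit ids are plain strings and the submitted communication is a placeholder dictionary.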
-
-
class
concrete.services.results.ResultsServerService.
Processor
(handler)¶ Bases:
concrete.services.Service.Processor
,concrete.services.results.ResultsServerService.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_getLatestSearchResult
(seqid, iprot, oprot)¶
-
process_getNextChunk
(seqid, iprot, oprot)¶
-
process_getSearchResult
(seqid, iprot, oprot)¶
-
process_getSearchResults
(seqid, iprot, oprot)¶
-
process_getSearchResultsByUser
(seqid, iprot, oprot)¶
-
process_registerSearchResult
(seqid, iprot, oprot)¶
-
process_startSession
(seqid, iprot, oprot)¶
-
process_stopSession
(seqid, iprot, oprot)¶
-
process_submitAnnotation
(seqid, iprot, oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getLatestSearchResult_args
(userId=None)¶ Bases:
object
Attributes:- userId-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getLatestSearchResult_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getNextChunk_args
(sessionId=None)¶ Bases:
object
Attributes:- sessionId-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getNextChunk_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResult_args
(searchResultId=None)¶ Bases:
object
Attributes:- searchResultId-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResult_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResultsByUser_args
(taskType=None, userId=None, limit=None)¶ Bases:
object
Attributes:- taskType- userId- limit-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResultsByUser_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResults_args
(taskType=None, limit=None)¶ Bases:
object
Attributes:- taskType- limit-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
getSearchResults_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
registerSearchResult_args
(result=None, taskType=None)¶ Bases:
object
Attributes:- result- taskType-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
registerSearchResult_result
(ex=None)¶ Bases:
object
Attributes:- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
startSession_args
(searchResultId=None, taskType=None)¶ Bases:
object
Attributes:- searchResultId- taskType-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
startSession_result
(success=None, ex=None)¶ Bases:
object
Attributes:- success- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
stopSession_args
(sessionId=None)¶ Bases:
object
Attributes:- sessionId-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.results.ResultsServerService.
stopSession_result
(ex=None)¶ Bases:
object
Attributes:- ex-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Submodules¶
concrete.services.Service module¶
-
class
concrete.services.Service.
Client
(iprot, oprot=None)¶ Bases:
concrete.services.Service.Iface
Base service that all other services should inherit from-
about
()¶ - Get information about the service
-
alive
()¶ - Is the service alive?
-
recv_about
()¶
-
recv_alive
()¶
-
send_about
()¶
-
send_alive
()¶
-
-
class
concrete.services.Service.
Iface
¶ Bases:
object
Base service that all other services should inherit from-
about
()¶ - Get information about the service
-
alive
()¶ - Is the service alive?
-
-
class
concrete.services.Service.
Processor
(handler)¶ Bases:
concrete.services.Service.Iface
,thrift.Thrift.TProcessor
-
process
(iprot, oprot)¶
-
process_about
(seqid, iprot, oprot)¶
-
process_alive
(seqid, iprot, oprot)¶
-
concrete.services.constants module¶
concrete.services.ttypes module¶
-
class
concrete.services.ttypes.
AnnotationTaskType
¶ Bases:
object
Annotation task types-
NER
= 2¶
-
TRANSLATION
= 1¶
-
-
class
concrete.services.ttypes.
AnnotationUnitIdentifier
(communicationId=None, sentenceId=None)¶ Bases:
object
An annotation unit is the part of the communication to be annotated. It can be the entire communication or a particular sentence in the communication. If the sentenceId is null, the unit is the entire communication. Attributes:- communicationId: Communication identifier for loading data- sentenceId: Sentence identifier if annotating sentences-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.ttypes.
AnnotationUnitType
¶ Bases:
object
An annotation unit is the part of the communication to be annotated.-
COMMUNICATION
= 1¶
-
SENTENCE
= 2¶
-
-
class
concrete.services.ttypes.
AsyncContactInfo
(host=None, port=None)¶ Bases:
object
Contact information for the asynchronous communications. When a client contacts a server for a job that takes a significant amount of time, it is often best to implement this asynchronously. We do this by having the client stand up a server to accept the results and passing that information to the original server. The server may want to create a new Thrift client on every request or maintain a pool of clients for reuse. Attributes:- host- port-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
exception
concrete.services.ttypes.
NotImplementedException
(message=None, serEx=None)¶ Bases:
thrift.Thrift.TException
An exception to be used when an invoked method has not been implemented by the service. Attributes:- message: The explanation (why the exception occurred)- serEx: The serialized exception-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.services.ttypes.
ServiceInfo
(name=None, version=None, description=None)¶ Bases:
object
Each service is described by this info struct. It is for human consumption and for records of versions in deployments. Attributes:- name: Name of the service- version: Version string of the service. It is preferred that the services implement semantic versioning (http://semver.org/) with version strings like x.y.z- description: Description of the service-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
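The preference for x.y.z version strings makes ServiceInfo.version easy to compare programmatically. The following is a minimal sketch, not part of concrete: parse_version is a hypothetical helper that handles only the three-number core of a semantic version (no pre-release or build suffixes) and returns None for anything else.

```python
import re

# Matches only the numeric core "x.y.z"; pre-release and build
# metadata suffixes are deliberately not handled in this sketch.
SEMVER_CORE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version):
    """Return (major, minor, patch) as ints, or None if not x.y.z."""
    match = SEMVER_CORE.match(version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# Tuples compare element-wise, so numeric ordering works where
# plain string comparison would not ("4.10.2" < "4.9.0" as strings).
newer = parse_version("4.10.2") > parse_version("4.9.0")
```

Because the field is described as being for human consumption, a service may report a free-form string; callers would need to handle the None case.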
-
-
exception
concrete.services.ttypes.
ServicesException
(message=None, serEx=None)¶ Bases:
thrift.Thrift.TException
An exception to be used with Concrete services. Attributes:- message: The explanation (why the exception occurred)- serEx: The serialized exception-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Module contents¶
concrete.situations package¶
Submodules¶
concrete.situations.constants module¶
concrete.situations.ttypes module¶
-
class
concrete.situations.ttypes.
Argument
(role=None, entityId=None, situationId=None, propertyList=None)¶ Bases:
object
A situation argument, consisting of an argument role and a value. Argument values may be Entities or Situations. Attributes:- role: The relationship between this argument and the situation that owns it. The roles that a situation’s arguments can take depend on the type of the situation (including subtype information, such as event_type).- entityId: A pointer to the value of this argument, if it is explicitly encoded as an Entity.- situationId: A pointer to the value of this argument, if it is a situation.- propertyList: For the BinarySRL task, there may be situations where more than one property is attached to a single participant. A list of these properties can be stored in this field.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
Justification
(justificationType=None, mentionId=None, tokenRefSeqList=None)¶ Bases:
object
Attributes:- justificationType: An enumerated value used to describe the way in which the justification’s mention provides supporting evidence for the situation.- mentionId: A pointer to the SituationMention itself.- tokenRefSeqList: An optional list of pointers to tokens that are (especially) relevant to the way in which this mention provides justification for the situation. It is left up to individual analytics to decide what tokens (if any) they wish to include in this field.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
MentionArgument
(role=None, entityMentionId=None, situationMentionId=None, tokens=None, constituent=None, confidence=None, propertyList=None)¶ Bases:
object
A “concrete” argument that may be used by SituationMentions or EntityMentions to avoid conflicts where abstract Arguments were being used to support concrete Mentions. Attributes:- role: The relationship between this argument and the situation that owns it. The roles that a situation’s arguments can take depend on the type of the situation (including subtype information, such as event_type).- entityMentionId: A pointer to the value of an EntityMention, if this is being used to support an EntityMention.- situationMentionId: A pointer to the value of this argument, if it is a SituationMention.- tokens: The location of this MentionArgument in the Communication. If this MentionArgument can be identified in a document using an EntityMention or SituationMention, then UUID references to those types should be preferred and this field left as null.- constituent: An alternative way to specify the same thing as tokens.- confidence: Confidence of this argument belonging to its SituationMention- propertyList: For the BinarySRL task, there may be situations where more than one property is attached to a single participant. A list of these properties can be stored in this field.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
Property
(value=None, metadata=None, polarity=None)¶ Bases:
object
Attached to Arguments to support situations where a ‘participant’ has more than one ‘property’ (in BinarySRL terms), whereas Arguments notionally only support one Role. Attributes:- value: The required value of the property.- metadata: Metadata to support this particular property object.- polarity: This value is typically boolean, 0.0 or 1.0, but we use a float in order to potentially capture cases where an annotator is highly confident that the value is underspecified, via a value of 0.5.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
Situation
(uuid=None, situationType=None, situationKind=None, argumentList=None, mentionIdList=None, justificationList=None, timeML=None, intensity=None, polarity=None, confidence=None)¶ Bases:
object
A single situation, along with pointers to situation mentions that provide evidence for the situation. “Situations” include events, relations, facts, sentiments, and beliefs. Each situation has a core type (such as EVENT or SENTIMENT), along with an optional subtype based on its core type (e.g., event_type=CONTACT_MEET), and a set of zero or more unordered arguments. This struct may be used for a variety of “processed” Situations such as (but not limited to):- SituationMentions which have been collapsed into a coreferential cluster- Situations which are inferred and not directly supported by a textual mention. Attributes:- uuid: Unique identifier for this situation.- situationType: The core type of this situation (e.g., EVENT or SENTIMENT), or a coarse-grained situation type.- situationKind: A fine-grained situation type that specifically describes the situation based on situationType above. It allows for more detailed description of the situation. Some examples: if situationType == EVENT, the event type for the situation; if situationType == STATE, the state type; if situationType == TEMPORAL_FACT, the temporal fact type. For Propbank, this field should be the predicate lemma and id, e.g. “strike.02”. For FrameNet, this should be the frame name, e.g. “Commerce_buy”. Different and more varied situationTypes may be added in the future.- argumentList: The arguments for this situation. Each argument consists of a role and a value. It is possible for a situation to have multiple arguments with the same role. Arguments are unordered.- mentionIdList: Ids of the mentions of this situation in a communication (type=SituationMention)- justificationList: A list of pointers to SituationMentions that provide justification for this situation. These mentions may be either direct mentions of the situation, or indirect evidence.- timeML: A wrapper for TimeML annotations.- intensity: An “intensity” rating for this situation, typically ranging from 0-1. In the case of SENTIMENT situations, this is used to record the intensity of the sentiment.- polarity: The polarity of this situation. In the case of SENTIMENT situations, this is used to record the polarity of the sentiment.- confidence: A confidence score for this individual situation. You can also set a confidence score for an entire SituationSet using the SituationSet’s metadata.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
SituationMention
(uuid=None, text=None, situationType=None, situationKind=None, argumentList=None, intensity=None, polarity=None, tokens=None, constituent=None, confidence=None)¶ Bases:
object
A concrete mention of a situation, where “situations” include events, relations, facts, sentiments, and beliefs. Each situation has a core type (such as EVENT or SENTIMENT), along with an optional subtype based on its core type (e.g., event_type=CONTACT_MEET), and a set of zero or more unordered arguments. This struct should be used for most types of SRL labelings (e.g. Propbank and FrameNet) because they are grounded in text. Attributes:- uuid: Unique identifier for this situation.- text: The text content of this situation mention. This field is often redundant with the ‘tokens’ field, and may not be generated by all analytics.- situationType: The core type of this situation (e.g., EVENT or SENTIMENT), or a coarse-grained situation type.- situationKind: A fine-grained situation type that specifically describes the situation mention based on situationType above. It allows for more detailed description of the situation mention. Some examples: if situationType == EVENT, the event type for the situation mention; if situationType == STATE, the state type for this situation mention. For Propbank, this field should be the predicate lemma and id, e.g. “strike.02”. For FrameNet, this should be the frame name, e.g. “Commerce_buy”. Different and more varied situationTypes may be added in the future.- argumentList: The arguments for this situation mention. Each argument consists of a role and a value. It is possible for a situation to have multiple arguments with the same role. Arguments are unordered.- intensity: An “intensity” rating for the situation, typically ranging from 0-1. In the case of SENTIMENT situations, this is used to record the intensity of the sentiment.- polarity: The polarity of this situation. In the case of SENTIMENT situations, this is used to record the polarity of the sentiment.- tokens: An optional pointer to tokens that are (especially) relevant to this situation mention. It is left up to individual analytics to decide what tokens (if any) they wish to include in this field. In particular, it is not specified whether the arguments’ tokens should be included.- constituent: An alternative way to specify the same thing as tokens.- confidence: A confidence score for this individual situation mention. You can also set a confidence score for an entire SituationMentionSet using the SituationMentionSet’s metadata.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
SituationMentionSet
(uuid=None, metadata=None, mentionList=None, linkingList=None)¶ Bases:
object
A theory about the set of situation mentions that are present in a message. See also: SituationMention. Attributes:- uuid: Unique identifier for this set.- metadata: Information about where this set came from.- mentionList: List of mentions in this set.- linkingList: Entity linking annotations associated with this SituationMentionSet.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
SituationSet
(uuid=None, metadata=None, situationList=None, linkingList=None)¶ Bases:
object
A theory about the set of situations that are present in a message. See also: Situation. Attributes:- uuid: Unique identifier for this set.- metadata: Information about where this set came from.- situationList: List of situations in this set.- linkingList: Entity linking annotations associated with this SituationSet.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.situations.ttypes.
TimeML
(timeMLClass=None, timeMLTense=None, timeMLAspect=None)¶ Bases:
object
A wrapper for various TimeML annotations. Attributes:- timeMLClass: The TimeML class for situations representing TimeML events- timeMLTense: The TimeML tense for situations representing TimeML events- timeMLAspect: The TimeML aspect for situations representing TimeML events-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Module contents¶
concrete.spans package¶
Submodules¶
concrete.spans.constants module¶
concrete.spans.ttypes module¶
-
class
concrete.spans.ttypes.
AudioSpan
(start=None, ending=None)¶ Bases:
object
A span of audio within a single communication, identified by a pair of time offsets. Time offsets are zero-based. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation. Attributes:- start: Start time (in seconds)- ending: End time (in seconds)-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.spans.ttypes.
TextSpan
(start=None, ending=None)¶ Bases:
object
A span of text within a single communication, identified by a pair of zero-indexed character offsets into a Thrift string. Thrift strings are encoded using UTF-8. The offsets are character-based, not byte-based: a character with a three-byte UTF-8 representation only counts as one character. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation. Attributes:- start: Start character, inclusive.- ending: End character, exclusive-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
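The distinction between character and byte offsets matters as soon as the text contains non-ASCII characters. The sketch below uses only plain Python strings, not concrete objects; the sample text and the helper char_span_to_byte_span are illustrative assumptions. It shows that a TextSpan-style pair (start inclusive, ending exclusive) slices a Python str directly, while the matching UTF-8 byte offsets differ.

```python
# Hypothetical communication text; "é" takes 2 bytes and "€" takes 3 in UTF-8.
text = "café costs €5"

# Character offsets in the TextSpan convention: zero-indexed,
# start inclusive, ending exclusive. This span covers "€5".
start, ending = 11, 13
covered = text[start:ending]


def char_span_to_byte_span(text, start, ending):
    """Convert character offsets into offsets within the UTF-8 encoding."""
    byte_start = len(text[:start].encode("utf-8"))
    byte_ending = len(text[:ending].encode("utf-8"))
    return byte_start, byte_ending


byte_start, byte_ending = char_span_to_byte_span(text, start, ending)
# Slicing the encoded bytes with the converted offsets recovers the same text.
round_trip = text.encode("utf-8")[byte_start:byte_ending].decode("utf-8")
```

Python 3 strings are indexed by code point, so character offsets of this kind can be used for slicing without conversion; the byte conversion is only needed when working with the encoded form.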
-
Module contents¶
concrete.structure package¶
Submodules¶
concrete.structure.constants module¶
concrete.structure.ttypes module¶
-
class
concrete.structure.ttypes.
Arc
(src=None, dst=None, token=None, weight=None)¶ Bases:
object
Type for arcs. For epsilon edges, leave ‘token’ blank. Attributes:- src- dst- token- weight-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Constituent
(id=None, tag=None, childList=None, headChildIndex=-1, start=None, ending=None)¶ Bases:
object
A single parse constituent (or “phrase”). Attributes:- id: A parse-relative identifier for this constituent. Together with the UUID for a Parse, this can be used to define pointers to specific constituents.- tag: A description of this constituency node, e.g. the category “NP”. For leaf nodes, this should be a word and for pre-terminal nodes this should be a POS tag.- childList- headChildIndex: The index of the head child of this constituent. I.e., the head child of constituent c is c.children[c.head_child_index]. A value of -1 indicates that no child head was identified.- start: The first token (inclusive) of this constituent in the parent Tokenization. Almost certainly should be populated.- ending: The last token (exclusive) of this constituent in the parent Tokenization. Almost certainly should be populated.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
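To illustrate how headChildIndex and the start/ending token range fit together, here is a small sketch using a plain dataclass rather than the generated Thrift class. ConstituentSketch and lexical_head are hypothetical names, and the sketch assumes that childList holds the ids of child constituents, which the reference above does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConstituentSketch:
    """Plain-Python stand-in mirroring the documented Constituent fields."""
    id: int
    tag: str
    childList: List[int] = field(default_factory=list)  # assumed: child ids
    headChildIndex: int = -1  # -1 means no head child was identified
    start: Optional[int] = None   # first token, inclusive
    ending: Optional[int] = None  # last token, exclusive


def lexical_head(by_id: Dict[int, ConstituentSketch],
                 node: ConstituentSketch) -> Optional[ConstituentSketch]:
    """Follow head children down to a leaf; None if a head is missing."""
    while node.childList:
        if node.headChildIndex == -1:
            return None
        node = by_id[node.childList[node.headChildIndex]]
    return node


tokens = ["the", "dog"]
nodes = [
    ConstituentSketch(0, "NP", [1, 2], headChildIndex=1, start=0, ending=2),
    ConstituentSketch(1, "DT", [3], headChildIndex=0, start=0, ending=1),
    ConstituentSketch(2, "NN", [4], headChildIndex=0, start=1, ending=2),
    ConstituentSketch(3, "the", start=0, ending=1),  # leaf: tag is the word
    ConstituentSketch(4, "dog", start=1, ending=2),
]
by_id = {node.id: node for node in nodes}

np_node = by_id[0]
head = lexical_head(by_id, np_node)
covered = tokens[np_node.start:np_node.ending]
```

Note that headChildIndex indexes into the constituent's own child list, not into the parse-wide constituent list.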
-
-
class
concrete.structure.ttypes.
ConstituentRef
(parseId=None, constituentIndex=None)¶ Bases:
object
A reference to a Constituent within a Parse. Attributes:- parseId: The UUID of the Parse that this Constituent belongs to.- constituentIndex: The index in the constituent list of this Constituent.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Dependency
(gov=-1, dep=None, edgeType=None)¶ Bases:
object
A syntactic edge between two tokens in a tokenized sentence. Attributes:- gov: The governor or the head token. 0-indexed.- dep: The dependent token. 0-indexed.- edgeType: The relation that holds between gov and dep.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
DependencyParse
(uuid=None, metadata=None, dependencyList=None, structureInformation=None)¶ Bases:
object
Represents a dependency parse with typed edges. Attributes:- uuid- metadata- dependencyList- structureInformation-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
DependencyParseStructure
(isAcyclic=None, isConnected=None, isSingleHeaded=None, isProjective=None)¶ Bases:
object
Information about the structure of a dependency parse. This information is computable from the list of dependencies, but this allows the consumer to make (verified) assumptions about the dependencies being processed. Attributes:- isAcyclic: True iff there are no cycles in the dependency graph.- isConnected: True iff the dependency graph forms a single connected component.- isSingleHeaded: True iff every node in the dependency parse has at most one head/parent/governor.- isProjective: True iff there are no crossing edges in the dependency parse.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
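Since these four flags are described as computable from the dependency list, the following sketch shows one way they could be derived from plain (gov, dep) pairs. It does not use concrete, and it makes assumptions the reference does not confirm: gov == -1 is treated as an edge from an artificial root, connectivity is checked over the tokens only (ignoring that root), and root edges take part in the crossing test at position -1. describe_structure is a hypothetical name.

```python
from itertools import combinations


def describe_structure(num_tokens, edges):
    """edges: list of (gov, dep) token indices; gov == -1 is a root edge."""
    children = {i: [] for i in range(num_tokens)}
    neighbours = {i: set() for i in range(num_tokens)}
    head_count = {i: 0 for i in range(num_tokens)}
    for gov, dep in edges:
        head_count[dep] += 1
        if gov >= 0:
            children[gov].append(dep)
            neighbours[gov].add(dep)
            neighbours[dep].add(gov)

    is_single_headed = all(count <= 1 for count in head_count.values())

    # Cycle check: depth-first search with white/grey/black colouring.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = [WHITE] * num_tokens

    def has_cycle(node):
        colour[node] = GREY
        for child in children[node]:
            if colour[child] == GREY:
                return True
            if colour[child] == WHITE and has_cycle(child):
                return True
        colour[node] = BLACK
        return False

    is_acyclic = not any(
        colour[n] == WHITE and has_cycle(n) for n in range(num_tokens)
    )

    # Connectivity: undirected traversal starting from token 0.
    seen = set()
    stack = [0] if num_tokens else []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(neighbours[node] - seen)
    is_connected = len(seen) == num_tokens

    # Projectivity: no two edges whose spans strictly interleave.
    spans = [tuple(sorted(edge)) for edge in edges]
    is_projective = not any(
        a < c < b < d or c < a < d < b
        for (a, b), (c, d) in combinations(spans, 2)
    )

    return {
        "isAcyclic": is_acyclic,
        "isConnected": is_connected,
        "isSingleHeaded": is_single_headed,
        "isProjective": is_projective,
    }


# "She saw the dog": saw is the root; She and dog depend on saw; the on dog.
tree = describe_structure(4, [(-1, 1), (1, 0), (1, 3), (3, 2)])
crossing = describe_structure(4, [(-1, 1), (1, 0), (0, 2), (1, 3)])
loop = describe_structure(2, [(0, 1), (1, 0)])
```

A consumer that trusts these stored flags can skip such checks; the sketch is mainly useful for producing or verifying them.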
-
-
class
concrete.structure.ttypes.
LatticePath
(weight=None, tokenList=None)¶ Bases:
object
Attributes:- weight- tokenList-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Parse
(uuid=None, metadata=None, constituentList=None)¶ Bases:
object
A theory about the syntactic parse of a sentence. Note: If we add support for parse forests in the future, then it will most likely be done by adding a new field (e.g. “forest_root”) that uses a new struct type to encode the forest. A “kind” field might also be added (analogous to Tokenization.kind) to indicate whether a parse is encoded using a simple tree or a parse forest. Attributes:- uuid- metadata- constituentList-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Section
(uuid=None, sentenceList=None, textSpan=None, rawTextSpan=None, audioSpan=None, kind=None, label=None, numberList=None, lidList=None)¶ Bases:
object
A single “section” of a communication, such as a paragraph. Each section is defined using a text or audio span, and can optionally contain a list of sentences. Attributes:- uuid: The unique identifier for this section.- sentenceList: The sentences of this “section.”- textSpan: Location of this section in the communication text. NOTE: This text span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.- rawTextSpan: Location of this section in the raw text. NOTE: This text span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.- audioSpan: Location of this section in the original audio. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.- kind: A short, sometimes corpus-specific term characterizing the nature of the section; may change in a future version of concrete. This often acts as a coarse-grained descriptor that is used for filtering. For example, Gigaword uses the section kind “passage” to distinguish content-bearing paragraphs in the body of an article from other paragraphs, such as the headline and dateline.- label: The name of the section. For example, a title of a section on Wikipedia.- numberList: Position within the communication with respect to other Sections: the section number, e.g., 3, or 3.1, or 3.1.2, etc. Aimed at Communications with content organized in a hierarchy, such as a book with multiple chapters, then sections, then paragraphs, or even a dense Wikipedia page with subsections. Sections should still be arranged linearly, where reading these numbers should not be required to get a start-to-finish enumeration of the Communication’s content.- lidList: An optional field to be used for multi-language documents. This field should be populated when a section is inside of a document that contains multiple languages. Minimally, each block of text in one language should be its own section. For example, if a paragraph is in English and the paragraph afterwards is in French, these should be separated into two different sections, allowing language-specific analytics to run on appropriate sections.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
Sentence
(uuid=None, tokenization=None, textSpan=None, rawTextSpan=None, audioSpan=None)¶ Bases:
object
A single sentence or utterance in a communication. Attributes:- uuid- tokenization: Theory about the tokens that make up this sentence. For text communications, these tokenizations will typically be generated by a tokenizer. For audio communications, these tokenizations will typically be generated by an automatic speech recognizer. The “Tokenization” message type is also used to store the output of machine translation systems and text normalization systems.- textSpan: Location of this sentence in the communication text. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.- rawTextSpan: Location of this sentence in the raw text. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.- audioSpan: Location of this sentence in the original audio. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
SpanLink
(tokens=None, concreteTarget=None, externalTarget=None, linkType=None)¶ Bases:
object
A collection of tokens that represent a link to another resource. This resource might be another Concrete object (e.g., another Concrete Communication), represented with the ‘concreteTarget’ field, or it could link to a resource outside of Concrete via the ‘externalTarget’ field. Attributes:- tokens: The tokens that make up this SpanLink object.- concreteTarget- externalTarget- linkType-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
TaggedToken
(tokenIndex=None, tag=None, confidence=None, tagList=None, confidenceList=None)¶ Bases:
object
Attributes:- tokenIndex: A pointer to the token being tagged. Token indices are 0-based.- tag: A string containing the annotation. If the tag set you are using is not case sensitive, then all part of speech tags should be normalized to upper case.- confidence: Confidence of the annotation.- tagList: A list of strings that represent a distribution of possible tags for this token. If populated, the ‘tag’ field should also be populated with the “best” value from this list.- confidenceList: A list of doubles that represent confidences associated with the tags in the ‘tagList’ field. If populated, the ‘confidence’ field should also be populated with the confidence associated with the “best” tag in ‘tagList’.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
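The relationship between tagList/confidenceList and the single tag/confidence fields can be sketched as follows. This is an illustration in plain Python, not library code: best_tag is a hypothetical helper, and it assumes that “best” means the entry with the highest confidence and that the two lists are parallel.

```python
def best_tag(tag_list, confidence_list):
    """Pick the (tag, confidence) pair with the highest confidence.

    Assumes tag_list and confidence_list are parallel, i.e. the
    confidence at position i belongs to the tag at position i.
    """
    if len(tag_list) != len(confidence_list):
        raise ValueError("tagList and confidenceList must have equal length")
    if not tag_list:
        raise ValueError("cannot choose a best tag from an empty list")
    best = max(range(len(tag_list)), key=confidence_list.__getitem__)
    return tag_list[best], confidence_list[best]


# A distribution over part-of-speech tags for one token.
tag_list = ["nn", "vb", "jj"]
confidence_list = [0.2, 0.7, 0.1]

tag, confidence = best_tag(tag_list, confidence_list)
# For a case-insensitive tag set, normalize to upper case as recommended.
tag = tag.upper()
```

The resulting values are what one would store in the ‘tag’ and ‘confidence’ fields alongside the full lists.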
-
-
class
concrete.structure.ttypes.
Token
(tokenIndex=None, text=None, textSpan=None, rawTextSpan=None, audioSpan=None)¶ Bases:
object
A single token (typically a word) in a communication. The exact definition of what counts as a token is left up to the tools that generate token sequences. Usually, each token will include at least a text string. Attributes:- tokenIndex: A 0-based tokenization-relative identifier for this token that represents the order that this token appears in the sentence. Together with the UUID for a Tokenization, this can be used to define pointers to specific tokens. If a Tokenization object contains multiple Token objects with the same id (e.g., in different n-best lists), then all of their other fields must be identical as well.- text: The text associated with this token. Note: we may have a destructive tokenizer (e.g., Stanford rewriting) and as a result, we want to maintain this field.- textSpan: Location of this token in this perspective’s text (.text field). In cases where this token does not correspond directly with any text span in the text (such as word insertion during MT), this field may be given a value indicating “approximately” where the token comes from. A span covering the entire sentence may be used if no more precise value seems appropriate. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the document, but is the annotation’s best effort at such a representation.- rawTextSpan: Location of this token in the original, raw text (.originalText field). In cases where this token does not correspond directly with any text span in the original text (such as word insertion during MT), this field may be given a value indicating “approximately” where the token comes from. A span covering the entire sentence may be used if no more precise value seems appropriate. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original raw document, but is the annotation’s best effort at such a representation.- audioSpan: Location of this token in the original audio. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
TokenLattice
(startState=0, endState=0, arcList=None, cachedBestPath=None)¶ Bases:
object
A lattice structure that assigns scores to a set of token sequences. The lattice is encoded as an FSA, where states are identified by integers, and each arc is annotated with an optional token and a weight. (Arcs with no token are “epsilon” arcs.) The lattice has a single start state and a single end state. (You can use epsilon edges to simulate multiple start states or multiple end states, if desired.) The score of a path through the lattice is the sum of the weights of the arcs that make up that path. A path with a lower score is considered “better” than a path with a higher score. If possible, path scores should be negative log likelihoods (with base e; e.g. if P=1, then weight=0, and if P=0.5, then weight=0.693). Furthermore, if possible, the path scores should be globally normalized (i.e., they should encode probabilities). This will allow for them to be combined with other information in a reasonable way when determining confidences for system outputs. TokenLattices should never contain any paths with cycles. Every arc in the lattice should be included in some path from the start state to the end state. Attributes:- startState- endState- arcList- cachedBestPath-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
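The scoring rule described for TokenLattice (path score is the sum of arc weights, lower is better, weights are natural-log negative likelihoods) can be sketched with plain tuples. The arc layout (src, dst, token, weight) mirrors the Arc fields, but best_path and the example lattice are illustrative assumptions, not library code; the recursion relies on the stated guarantee that lattices contain no cycles.

```python
import math
from functools import lru_cache


def best_path(arcs, start_state, end_state):
    """Return (score, tokens) for the lowest-scoring path in an acyclic lattice.

    arcs: iterable of (src, dst, token, weight); token None marks an epsilon arc.
    """
    outgoing = {}
    for src, dst, token, weight in arcs:
        outgoing.setdefault(src, []).append((dst, token, weight))

    @lru_cache(maxsize=None)
    def solve(state):
        if state == end_state:
            return 0.0, ()
        best = (math.inf, ())
        for dst, token, weight in outgoing.get(state, []):
            score, tokens = solve(dst)
            score += weight
            if token is not None:  # epsilon arcs contribute no token
                tokens = (token,) + tokens
            if score < best[0]:
                best = (score, tokens)
        return best

    return solve(start_state)


# Weights are negative natural-log probabilities, e.g. -ln(0.5) ≈ 0.693.
arcs = [
    (0, 1, "I", 0.0),                      # P = 1
    (1, 2, "scream", -math.log(0.75)),
    (1, 2, "scheme", -math.log(0.25)),
    (2, 3, None, 0.0),                     # epsilon arc to the end state
]

score, tokens = best_path(arcs, 0, 3)
probability = math.exp(-score)  # back from a score to a path probability
```

With weights of this form, exp(-score) recovers the probability of the path, which is why negative log likelihoods combine conveniently with other confidence information.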
-
-
class
concrete.structure.ttypes.
TokenList
(tokenList=None)¶ Bases:
object
A wrapper around a list of tokens. Attributes:- tokenList-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.structure.ttypes.
TokenRefSequence
(tokenIndexList=None, anchorTokenIndex=-1, tokenizationId=None, textSpan=None, rawTextSpan=None, audioSpan=None)¶ Bases:
object
A list of pointers to tokens that all belong to the same tokenization. Attributes:- tokenIndexList: The tokenization-relative identifiers for each token that is included in this sequence.- anchorTokenIndex: An optional field that can be used to describe the root of a sentence (if this sequence is a full sentence), the head of a constituent (if this sequence is a constituent), or some other form of “canonical” token in this sequence if, for instance, it is not easy to map this sequence to another annotation that has a head. This field is defined with respect to the Tokenization given by tokenizationId, and not to this object’s tokenIndexList.- tokenizationId: The UUID of the tokenization that contains the tokens.- textSpan: The text span in the main text (.text field) associated with this TokenRefSequence. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.- rawTextSpan: The text span in the original text (.originalText field) associated with this TokenRefSequence. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original raw document, but is the annotation’s best effort at such a representation.- audioSpan: The audio span associated with this TokenRefSequence. NOTE: This span represents a best guess, or ‘provenance’: it cannot be guaranteed that this text span matches the _exact_ text of the original document, but is the annotation’s best effort at such a representation.-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
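The note on anchorTokenIndex above is easy to misread, so here is a minimal sketch of the indexing rule. RefSeqSketch is a hypothetical stand-in written for this example, not the Thrift-generated class, and treating -1 as "no anchor" is an assumption based on the default value in the signature.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RefSeqSketch:
    """Hypothetical stand-in for TokenRefSequence (not the Thrift class)."""
    tokenIndexList: List[int]
    anchorTokenIndex: int = -1  # assumed here to mean "no anchor set"


def anchor_text(tokens, ref):
    """Return the anchor token's text, or None when no anchor is set."""
    if ref.anchorTokenIndex < 0:
        return None
    # Index into the Tokenization's token list, NOT into ref.tokenIndexList.
    return tokens[ref.anchorTokenIndex]


tokens = ["The", "quick", "brown", "fox", "jumps"]
ref = RefSeqSketch(tokenIndexList=[1, 2, 3], anchorTokenIndex=3)
covered = [tokens[i] for i in ref.tokenIndexList]
```

Here anchorTokenIndex=3 selects the fourth token of the tokenization, even though tokenIndexList itself has only three entries.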
-
-
class
concrete.structure.ttypes.
TokenTagging
(uuid=None, metadata=None, taggedTokenList=None, taggingType=None)¶ Bases:
object
A theory about some token-level annotation. The TokenTagging consists of a mapping from tokens (using token ids) to string tags (e.g. part-of-speech tags or lemmas). The mapping defined by a TokenTagging may be partial, i.e., some tokens may not be assigned any part-of-speech tags. For lattice tokenizations, you may need to create multiple part-of-speech taggings (for different paths through the lattice), since the appropriate tag for a given token may depend on the path taken. For example, you might define a separate TokenTagging for each of the top K paths, which leaves all tokens that are not part of the path unlabeled. Currently, we use strings to encode annotations. In the future, we may add fields for encoding specific tag sets (e.g. treebank tags), or for adding compound tags.
Attributes:
- uuid: The UUID of this TokenTagging object.
- metadata: Information about where the annotation came from. This should be used to tell between gold-standard annotations and automatically-generated theories about the data
- taggedTokenList: The mapping from tokens to annotations. This may be a partial mapping.
- taggingType: An ontology-backed string that represents the type of token taggings this TokenTagging object produces.
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
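Because the mapping in a TokenTagging may be partial, code that consumes it usually has to cope with untagged tokens. The sketch below illustrates that idea with plain (token index, tag) pairs standing in for taggedTokenList; the pair representation and the helper name are assumptions made for this example only.

```python
# Hypothetical stand-in: (token index, tag) pairs play the role of taggedTokenList.
tokens = ["Dogs", "bark", "loudly", "."]
tagged_token_list = [(0, "NNS"), (1, "VBP"), (3, ".")]  # token 2 is left untagged


def tags_by_token(n_tokens, tagged):
    """Expand a partial tagging into one slot per token; None marks untagged tokens."""
    tags = [None] * n_tokens
    for index, tag in tagged:
        tags[index] = tag
    return tags


aligned = tags_by_token(len(tokens), tagged_token_list)
```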
-
-
class
concrete.structure.ttypes.
Tokenization
(uuid=None, metadata=None, tokenList=None, lattice=None, kind=None, tokenTaggingList=None, parseList=None, dependencyParseList=None, spanLinkList=None)¶ Bases:
object
A theory (or set of alternative theories) about the sequence of tokens that make up a sentence. This message type is used to record the output not just of tokenizers, but also of a wide variety of other tools, including machine translation systems, text normalizers, part-of-speech taggers, and stemmers.
Each Tokenization is encoded using either a TokenList or a TokenLattice. (If you want to encode an n-best list, then you should store it as n separate Tokenization objects.) The “kind” field is used to indicate whether this Tokenization contains a list of tokens or a TokenLattice.
The confidence value for each sequence is determined by combining the confidence from the “metadata” field with confidence information from individual token sequences as follows:
- For n-best lists: metadata.confidence
- For lattices: metadata.confidence * exp(-sum(arc.weight))
Note: in some cases (such as the output of a machine translation tool), the order of the tokens in a token sequence may not correspond with the order of their original text span offsets.
Attributes:
- uuid
- metadata: Information about where this tokenization came from.
- tokenList: A wrapper around an ordered list of the tokens in this tokenization. This may also give easy access to the “reconstructed text” associated with this tokenization. This field should only have a value if kind==TOKEN_LIST.
- lattice: A lattice that compactly describes a set of token sequences that might make up this tokenization. This field should only have a value if kind==LATTICE.
- kind: Enumerated value indicating whether this tokenization is implemented using an n-best list or a lattice.
- tokenTaggingList
- parseList
- dependencyParseList
- spanLinkList
-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
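The two confidence rules stated for Tokenization can be written out as a short calculation. This is a sketch only: the function names are made up for the example, and the arc weights are passed as a plain list of floats rather than read from a TokenLattice.

```python
import math


def list_confidence(metadata_confidence):
    """n-best list entry: the metadata confidence is used as-is."""
    return metadata_confidence


def lattice_path_confidence(metadata_confidence, arc_weights):
    """Lattice path: metadata.confidence * exp(-sum(arc.weight))."""
    return metadata_confidence * math.exp(-sum(arc_weights))


# A path through three arcs with weights 0.1, 0.2 and 0.05.
path_conf = lattice_path_confidence(0.9, [0.1, 0.2, 0.05])
```

Larger arc weights lower the confidence of a path, and a path with no arcs keeps the metadata confidence unchanged.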
-
Module contents¶
concrete.twitter package¶
Submodules¶
concrete.twitter.constants module¶
concrete.twitter.ttypes module¶
-
class
concrete.twitter.ttypes.
BoundingBox
(type=None, coordinateList=None)¶ Bases:
object
Attributes:- type- coordinateList-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
HashTag
(text=None, startOffset=None, endOffset=None)¶ Bases:
object
Attributes:- text- startOffset- endOffset-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
PlaceAttributes
(streetAddress=None, region=None, locality=None)¶ Bases:
object
Attributes:- streetAddress- region- locality-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TweetInfo
(id=None, text=None, createdAt=None, user=None, truncated=None, entities=None, source=None, coordinates=None, place=None, favorited=None, retweeted=None, retweetCount=None, inReplyToScreenName=None, inReplyToStatusId=None, inReplyToUserId=None, retweetedScreenName=None, retweetedStatusId=None, retweetedUserId=None)¶ Bases:
object
Attributes:- id- text- createdAt- user- truncated- entities- source- coordinates- place- favorited- retweeted- retweetCount- inReplyToScreenName- inReplyToStatusId- inReplyToUserId- retweetedScreenName- retweetedStatusId- retweetedUserId-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterCoordinates
(type=None, coordinates=None)¶ Bases:
object
Attributes:- type- coordinates-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterEntities
(hashtagList=None, urlList=None, userMentionList=None)¶ Bases:
object
Attributes:- hashtagList- urlList- userMentionList-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterLatLong
(latitude=None, longitude=None)¶ Bases:
object
A twitter geocoordinate.Attributes:- latitude- longitude-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterPlace
(placeType=None, countryCode=None, country=None, fullName=None, name=None, id=None, url=None, boundingBox=None, attributes=None)¶ Bases:
object
Attributes:- placeType- countryCode- country- fullName- name- id- url- boundingBox- attributes-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
-
class
concrete.twitter.ttypes.
TwitterUser
(id=None, name=None, screenName=None, lang=None, geoEnabled=None, createdAt=None, friendsCount=None, statusesCount=None, verified=None, listedCount=None, favouritesCount=None, followersCount=None, location=None, timeZone=None, description=None, utcOffset=None, url=None)¶ Bases:
object
Information about a Twitter user.Attributes:- id- name- screenName- lang- geoEnabled- createdAt- friendsCount- statusesCount- verified- listedCount- favouritesCount- followersCount- location- timeZone- description- utcOffset- url-
read
(iprot)¶
-
validate
()¶
-
write
(oprot)¶
-
Module contents¶
concrete.util package¶
Submodules¶
concrete.util.access module¶
-
class
concrete.util.access.
CommunicationContainerFetchHandler
(communication_container)¶ Bases:
object
FetchCommunicationService implementation using Communication containers
Implements the
FetchCommunicationService
interface, retrieving Communications from a dict-like communication_container object that maps Communication ID strings to Communications. The communication_container could be an actual dict, or a container such as:
DirectoryBackedCommunicationContainer
FetchBackedCommunicationContainer
MemoryBackedCommunicationContainer
RedisHashBackedCommunicationContainer
ZipFileBackedCommunicationContainer
Usage:
    from concrete.util.access_wrapper import FetchCommunicationServiceWrapper

    handler = CommunicationContainerFetchHandler(comm_container)
    fetch_service = FetchCommunicationServiceWrapper(handler)
    fetch_service.serve(host, port)
Parameters: communication_container – Dict-like object that maps Communication IDs to Communications -
about
()¶
-
alive
()¶
-
fetch
(fetch_request)¶
-
getCommunicationCount
()¶
-
getCommunicationIDs
(offset, count)¶
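To make the dict-like contract concrete, the following sketch shows how fetch, getCommunicationCount and getCommunicationIDs could be answered from any mapping. DictFetchHandlerSketch is a hypothetical stand-in: the real handler exchanges Thrift request and result objects, whereas this version takes and returns plain lists, and both the sorted paging order and the skipping of unknown IDs are assumptions.

```python
class DictFetchHandlerSketch:
    """Hypothetical stand-in; the real handler works with Thrift request/result objects."""

    def __init__(self, communication_container):
        self.container = communication_container

    def fetch(self, communication_ids):
        # Skip IDs the container does not know about (an assumption of this sketch).
        return [self.container[cid] for cid in communication_ids
                if cid in self.container]

    def getCommunicationCount(self):
        return len(self.container)

    def getCommunicationIDs(self, offset, count):
        # Sorting gives stable paging over a plain dict; the real ordering may differ.
        return sorted(self.container)[offset:offset + count]


handler = DictFetchHandlerSketch(
    {"doc_a": "comm A", "doc_b": "comm B", "doc_c": "comm C"})
page = handler.getCommunicationIDs(1, 2)
```

Any object that supports len(), iteration over keys, membership tests and item lookup would work in place of the dict.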
-
class
concrete.util.access.
DirectoryBackedStoreHandler
(store_path)¶ Bases:
object
Simple StoreCommunicationService implementation using a directory
Implements the
StoreCommunicationService
interface, storing Communications in a directory.
Parameters: store_path – Path where Communications should be stored
-
about
()¶
-
alive
()¶
-
store
(communication)¶ Save Communication to a directory
Stored Communication files will be named [COMMUNICATION_ID].comm. If a file with that name already exists, it will be overwritten.
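The naming and overwrite behaviour described for store() can be illustrated with the standard library alone. store_sketch is a hypothetical helper that writes raw bytes; the real handler serializes a Communication, which is not shown here.

```python
import os
import tempfile


def store_sketch(store_path, communication_id, payload):
    """Write payload bytes to [COMMUNICATION_ID].comm, replacing any existing file."""
    path = os.path.join(store_path, communication_id + ".comm")
    with open(path, "wb") as f:  # "wb" truncates, so a second store overwrites the first
        f.write(payload)
    return path


with tempfile.TemporaryDirectory() as store_dir:
    store_sketch(store_dir, "doc_a", b"first version")
    stored_path = store_sketch(store_dir, "doc_a", b"second version")
    stored_name = os.path.basename(stored_path)
    with open(stored_path, "rb") as f:
        stored_bytes = f.read()
    stored_files = os.listdir(store_dir)
```

Storing the same Communication ID twice leaves a single file holding the most recent contents.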
-
-
class
concrete.util.access.
RelayFetchHandler
(host, port)¶ Bases:
object
Implements a ‘relay’ to another FetchCommunicationService server.
A FetchCommunicationService that acts as a relay to a second FetchCommunicationService, where the second service is using the TSocket transport and TCompactProtocol protocol.
This class was designed for the use case where you have Thrift JavaScript code that needs to communicate with a FetchCommunicationService server, but the server does not support the same Thrift serialization protocol as the JavaScript client.
The de facto standard for Concrete services is to use the TCompactProtocol serialization protocol over a TSocket connection. But as of Thrift 0.10.0, the Thrift JavaScript libraries only support using TJSONProtocol over HTTP.
The RelayFetchHandler class is intended to be used as server-side code by a web application. The JavaScript code will make FetchCommunicationService RPC calls to the web server using HTTP/TJSONProtocol, and the web application will then pass these RPC calls to another FetchCommunicationService using TSocket/TCompactProtocol RPC calls.
Parameters: - host (str) – Hostname of FetchCommunicationService server
- port (int) – Port # of FetchCommunicationService server
-
about
()¶
-
alive
()¶
-
fetch
(request)¶
-
getCommunicationCount
()¶
-
getCommunicationIDs
(offset, count)¶
concrete.util.access_wrapper module¶
-
class
concrete.util.access_wrapper.
FetchCommunicationClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.access.FetchCommunicationService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/access/FetchCommunicationService.pyc'>¶
-
-
class
concrete.util.access_wrapper.
FetchCommunicationServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.access.FetchCommunicationService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/access/FetchCommunicationService.pyc'>¶
-
-
class
concrete.util.access_wrapper.
StoreCommunicationClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.access.StoreCommunicationService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/access/StoreCommunicationService.pyc'>¶
-
-
class
concrete.util.access_wrapper.
StoreCommunicationServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.access.StoreCommunicationService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/access/StoreCommunicationService.pyc'>¶
-
-
class
concrete.util.access_wrapper.
SubprocessFetchCommunicationServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
FetchCommunicationServiceWrapper
-
-
class
concrete.util.access_wrapper.
SubprocessStoreCommunicationServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
StoreCommunicationServiceWrapper
-
concrete.util.annotate_wrapper module¶
-
class
concrete.util.annotate_wrapper.
AnnotateCommunicationClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.annotate.AnnotateCommunicationService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/annotate/AnnotateCommunicationService.pyc'>¶
-
-
class
concrete.util.annotate_wrapper.
AnnotateCommunicationServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.annotate.AnnotateCommunicationService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/annotate/AnnotateCommunicationService.pyc'>¶
-
-
class
concrete.util.annotate_wrapper.
SubprocessAnnotateCommunicationServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
AnnotateCommunicationServiceWrapper
-
concrete.util.comm_container module¶
Communication Containers - mapping Communication IDs to Communications
Classes that behave like a read-only dictionary (implementing Python’s collections.Mapping interface) and map Communication ID strings to Communications.
The classes abstract away the storage backend. If you need to optimize for performance, you may not want to use a dictionary abstraction that retrieves one Communication at a time.
-
class
concrete.util.comm_container.
DirectoryBackedCommunicationContainer
(directory_path, comm_extensions=[u'.comm', u'.concrete', u'.gz'])¶ Bases:
_abcoll.Mapping
Maps Comm IDs to Comms, retrieving Comms from the filesystem
DirectoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from the filesystem.
Upon initialization, a DirectoryBackedCommunicationContainer instance will (recursively) search directory_path for any files that end with the specified comm_extensions. Files with matching extensions are assumed to be Communication files whose filename (sans extension) is the file’s Communication ID. So, for example, a file named ‘XIN_ENG_20101212.0120.concrete’ is assumed to be a Communication file with a Communication ID of ‘XIN_ENG_20101212.0120’.
Files with the extension .gz will be decompressed using gzip.
A DirectoryBackedCommunicationContainer will not be able to find any files that are added to directory_path after the container was initialized.
Parameters: - directory_path (str) – Path to directory containing Communications files
- comm_extensions (str[]) – List of strings specifying filename extensions to be associated with Communications
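The indexing behaviour described above (scan once, derive the ID from the filename, load on demand) can be sketched with os.walk and collections.abc.Mapping. DirectoryIndexSketch is a hypothetical stand-in that returns raw file bytes instead of Communications and does not decompress .gz files; how the real class treats names with several extensions is not covered here.

```python
import os
import tempfile
from collections.abc import Mapping


class DirectoryIndexSketch(Mapping):
    """Hypothetical stand-in: maps IDs to raw file bytes instead of Communications."""

    def __init__(self, directory_path, comm_extensions=(".comm", ".concrete", ".gz")):
        self.paths = {}
        # The directory is scanned once, so files added later are never seen.
        for root, _dirs, files in os.walk(directory_path):
            for name in files:
                stem, ext = os.path.splitext(name)
                if ext in comm_extensions:
                    self.paths[stem] = os.path.join(root, name)

    def __getitem__(self, comm_id):
        # Lazy retrieval: the file is opened only when its ID is requested.
        with open(self.paths[comm_id], "rb") as f:
            return f.read()

    def __iter__(self):
        return iter(self.paths)

    def __len__(self):
        return len(self.paths)


with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "sub"))
    with open(os.path.join(d, "sub", "XIN_ENG_20101212.0120.concrete"), "wb") as f:
        f.write(b"payload")
    with open(os.path.join(d, "notes.txt"), "wb") as f:
        f.write(b"ignored")
    container = DirectoryIndexSketch(d)
    ids = list(container)
    data = container["XIN_ENG_20101212.0120"]
```

Subclassing Mapping supplies keys(), items(), get() and the in operator for free once the three methods above are defined.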
-
class
concrete.util.comm_container.
FetchBackedCommunicationContainer
(host, port)¶ Bases:
_abcoll.Mapping
Maps Comm IDs to Comms, retrieving Comms from a FetchCommunicationService server
FetchBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from a FetchCommunicationService.
If you need to retrieve large amounts of data from a FetchCommunicationService, then you SHOULD NOT USE THIS CLASS. This class retrieves one Communication at a time using FetchCommunicationService.
Parameters: - host (str) – Hostname of FetchCommunicationService server
- port (int) – Port # of FetchCommunicationService server
-
class
concrete.util.comm_container.
MemoryBackedCommunicationContainer
(communications_file, max_file_size=1073741824)¶ Bases:
_abcoll.Mapping
Maps Comm IDs to Comms by loading all Comms in file into memory
MemoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. All Communications in communications_file will be read into memory using a CommunicationReader instance.
Parameters: - communications_file (str) – String specifying name of Communications file
- max_file_size (int) – Maximum file size, in bytes
-
class
concrete.util.comm_container.
RedisHashBackedCommunicationContainer
(redis_db, key)¶ Bases:
_abcoll.Mapping
Maps Comm IDs to Comms, retrieving Comms from a Redis hash
RedisHashBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from a Redis hash.
Parameters: - redis_db (redis.Redis) – redis database connection
- key (str) – Key in redis database where hash is located
-
class
concrete.util.comm_container.
ZipFileBackedCommunicationContainer
(zipfile_path, comm_extensions=[u'.comm', u'.concrete'])¶ Bases:
_abcoll.Mapping
Maps Comm IDs to Comms, retrieving Comms from a Zip file
ZipFileBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily-retrieved from a Zip file.
Parameters: - zipfile_path (str) – Path to Zip file containing Communications
- comm_extensions (str[]) – List of strings specifying filename extensions associated with Communications
concrete.util.concrete_uuid module¶
Helper functions for generating Concrete UUID
objects
-
class
concrete.util.concrete_uuid.
AnalyticUUIDGeneratorFactory
(comm=None)¶ Bases:
object
Factory for a compressible UUID generator.
One factory should be created per Communication, and a new generator should be created from that factory for each analytic processing the communication. Usually each program represents a single analytic, so common usage is:
    augf = AnalyticUUIDGeneratorFactory(comm)
    aug = augf.create()
    for <each annotation object created by this analytic>:
        annotation.uuid = next(aug)
        <add annotation to communication>
or if you’re creating a new Communication:
    augf = AnalyticUUIDGeneratorFactory()
    aug = augf.create()
    comm = <create communication>
    comm.uuid = next(aug)
    for <each annotation object created by this analytic>:
        annotation.uuid = next(aug)
        <add annotation to communication>
where the annotation objects might be objects of type Parse, DependencyParse, TokenTagging, CommunicationTagging, etc.
-
create
()¶ Returns: A UUID generator for a new analytic.
-
-
class
concrete.util.concrete_uuid.
UUIDClustering
(comm)¶ Bases:
object
Representation of the UUID instance clusters in a concrete communication (each cluster represents the set of nested members of the communication that reference or are identified by a given UUID).
-
hashable_clusters
()¶ Hashable version of UUIDClustering.
Two UUIDClusterings c1 and c2 are equivalent (the two underlying Communications’ UUID structures are equivalent) if and only if:
    c1.hashable_clusters() == c2.hashable_clusters()
Returns: The set of unlabeled UUID clusters in a unique and hashable format.
-
-
class
concrete.util.concrete_uuid.
UUIDCompressor
(single_analytic=False)¶ Bases:
object
-
compress
(comm)¶ Parameters: comm (Communication) – Returns: Deep copy of comm with compressed UUIDs Return type: Communication
-
-
concrete.util.concrete_uuid.
bin_to_hex
(b, n=None)¶
-
concrete.util.concrete_uuid.
compress_uuids
(comm, verify=False, single_analytic=False)¶ Create a copy of Communication comm with UUIDs converted according to the compressible UUID scheme
Parameters: - comm (Communication) –
- verify (bool) – If True, use a heuristic to verify the UUID link structure is preserved in the new Communication
- single_analytic (bool) – If True, use a single analytic prefix for all UUIDs in comm.
Returns: A 2-tuple containing the new Communication (converted using the compressible UUID scheme) and the UUIDCompressor object used to perform the conversion.
Raises: ValueError – If verify is True and comm has references added, because verification would cause an infinite loop.
-
concrete.util.concrete_uuid.
generate_UUID
()¶ Helper function for generating a Concrete UUID object
Returns: Concrete UUID object Return type: UUID
-
concrete.util.concrete_uuid.
generate_hex_unif
(n)¶
-
concrete.util.concrete_uuid.
generate_uuid_unif
()¶
-
concrete.util.concrete_uuid.
hex_to_bin
(h)¶
-
concrete.util.concrete_uuid.
join_uuid
(xs, ys, zs)¶
-
concrete.util.concrete_uuid.
split_uuid
(u)¶
concrete.util.file_io module¶
Code for reading and writing Concrete Communications
-
class
concrete.util.file_io.
CommunicationReader
(filename, add_references=True, filetype=0)¶ Bases:
concrete.util.file_io.ThriftReader
Iterator/generator class for reading one or more Communications from a file
The iterator returns a (Communication, filename) tuple
Supported filetypes are:
- a file with a single Communication
- a file with multiple Communications concatenated together
- a gzipped file with a single Communication
- a gzipped file with multiple Communications concatenated together
- a .tar.gz file with one or more Communications
- a .zip file with one or more Communications
Sample usage:
    for (comm, filename) in CommunicationReader('multiple_comms.tar.gz'):
        do_something(comm)
Parameters: - filename (str) –
- add_references (bool) – If True, calls concrete.util.references.add_references_to_communication() on all Communication objects read from file
- filetype (FileType) – Expected type of file. Default value is FileType.AUTO, where function will try to automatically determine file type.
-
class
concrete.util.file_io.
CommunicationWriter
(filename=None)¶ Bases:
object
Class for writing one or more Communications to a file
Sample usage:
    writer = CommunicationWriter('foo.concrete')
    writer.write(existing_comm_object)
    writer.close()
-
close
()¶
-
open
(filename)¶ Parameters: filename (str) –
-
write
(comm)¶ Parameters: comm (Communication) –
-
-
class
concrete.util.file_io.
CommunicationWriterTGZ
(tar_filename=None)¶ Bases:
concrete.util.file_io.CommunicationWriterTar
Class for writing one or more Communications to a .TAR.GZ archive
Sample usage:
    writer = CommunicationWriterTGZ('multiple_comms.tgz')
    writer.write(comm_object_one, 'comm_one.concrete')
    writer.write(comm_object_two, 'comm_two.concrete')
    writer.write(comm_object_three, 'comm_three.concrete')
    writer.close()
-
class
concrete.util.file_io.
CommunicationWriterTar
(tar_filename=None, gzip=False)¶ Bases:
object
Class for writing one or more Communications to a .TAR archive
Sample usage:
    writer = CommunicationWriterTar('multiple_comms.tar')
    writer.write(comm_object_one, 'comm_one.concrete')
    writer.write(comm_object_two, 'comm_two.concrete')
    writer.write(comm_object_three, 'comm_three.concrete')
    writer.close()
Parameters: - tar_filename (str) – If a filename is given, open() will be called with the filename
- gzip (bool) – Flag indicating if .TAR file should be compressed with gzip
-
close
()¶
-
open
(tar_filename)¶ Parameters: tar_filename (str) –
-
write
(comm, comm_filename=None)¶ Parameters: - comm (Communication) –
- comm_filename (str) –
-
class
concrete.util.file_io.
ThriftReader
(thrift_type, filename, postprocess=None, filetype=0)¶ Bases:
object
Iterator/generator class for reading one or more Thrift structures from a file
The iterator returns a (obj, filename) tuple where obj is an object of type thrift_type.
Supported filetypes are:
- a file with a single Thrift structure
- a file with multiple Thrift structures concatenated together
- a gzipped file with a single Thrift structure
- a gzipped file with multiple Thrift structures concatenated together
- a .tar.gz file with one or more Thrift structures
- a .zip file with one or more Thrift structures
Sample usage:
    for (comm, filename) in ThriftReader(Communication, 'multiple_comms.tar.gz'):
        do_something(comm)
Parameters: - thrift_type – Class for Thrift type, e.g. Communication, TokenLattice
- filename (str) –
- postprocess (function) – A post-processing function that is called with the Thrift object as argument each time a Thrift object is read from the file
- filetype (FileType) – Expected type of file. Default value is FileType.AUTO, where function will try to automatically determine file type.
-
next
()¶
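The documentation says that FileType.AUTO tries to determine the file type automatically but does not say how. As a rough illustration of one possible approach, the sketch below separates the supported container kinds using only standard-library checks. guess_filetype and its return labels are hypothetical; this is not the library's detection logic.

```python
import gzip
import os
import tarfile
import tempfile
import zipfile


def guess_filetype(path):
    """Best-effort guess: 'zip', 'tar' (plain or compressed), 'gz', or 'stream'."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):  # also recognizes gzip-compressed tar archives
        return "tar"
    with open(path, "rb") as f:
        if f.read(2) == b"\x1f\x8b":  # gzip magic number
            return "gz"
    return "stream"  # assume raw concatenated Thrift structures


with tempfile.TemporaryDirectory() as d:
    payload = b"\x00placeholder bytes, not real Thrift data"
    raw = os.path.join(d, "one.comm")
    with open(raw, "wb") as f:
        f.write(payload)
    gz = os.path.join(d, "one.comm.gz")
    with gzip.open(gz, "wb") as f:
        f.write(payload)
    tgz = os.path.join(d, "many.tar.gz")
    with tarfile.open(tgz, "w:gz") as tar:
        tar.add(raw, arcname="one.comm")
    zp = os.path.join(d, "many.zip")
    with zipfile.ZipFile(zp, "w") as z:
        z.write(raw, arcname="one.comm")
    guesses = [guess_filetype(p) for p in (raw, gz, tgz, zp)]
```

The tar check has to run before the gzip magic-number check, because a .tar.gz file also starts with the gzip magic number.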
-
concrete.util.file_io.
read_communication_from_file
(communication_filename, add_references=True)¶ Read a Communication from the file specified by communication_filename
Parameters: - communication_filename (str) – String with filename
- add_references (bool) – If True, calls concrete.util.references.add_references_to_communication() on Communication read from file
Returns: Return type:
-
concrete.util.file_io.
read_thrift_from_file
(thrift_obj, filename)¶ Instantiate Thrift object from contents of named file
The Thrift file is assumed to be encoded using TCompactProtocol
WARNING - Thrift deserialization tends to fail silently. For example, the Thrift libraries will not complain if you try to deserialize data from the file /dev/urandom.
Parameters: - thrift_obj – A Thrift object (e.g. a Communication object)
- filename (str) – A filename string
Returns: The Thrift object that was passed in as an argument
-
concrete.util.file_io.
read_tokenlattice_from_file
(tokenlattice_filename)¶ Read a TokenLattice from a file
Parameters: tokenlattice_filename (str) – Name of file containing serialized TokenLattice
Returns: Return type: TokenLattice
-
concrete.util.file_io.
write_communication_to_file
(communication, communication_filename)¶ Write a Communication to a file
Parameters: - communication (Communication) –
- communication_filename (str) –
-
concrete.util.file_io.
write_thrift_to_file
(thrift_obj, filename)¶ Write a Thrift object to a file
Parameters: - thrift_obj –
- filename (str) –
concrete.util.json_fu module¶
Convert Concrete objects to JSON strings
-
concrete.util.json_fu.
communication_file_to_json
(communication_filename, remove_timestamps=False, remove_uuids=False)¶ Get a “pretty-printed” JSON string representation for a
Communication
Parameters: - communication_filename (str) – Communication filename
- remove_timestamps (bool) – Flag for removing timestamps from JSON output
- remove_uuids (bool) – Flag for removing
UUID
info from JSON output
Returns: A “pretty-printed” JSON representation of the Communication
Return type: str
-
concrete.util.json_fu.
get_json_object_without_timestamps
(json_object)¶ Create a copy of a JSON object created by json.loads(), with all representations of AnnotationMetadata timestamps (dictionary entries whose key is timestamp) recursively removed.
Parameters: json_object – Python object created from string by json.loads()
Returns: A copy of the input data structure with all timestamp objects removed
-
concrete.util.json_fu.
get_json_object_without_uuids
(json_object)¶ Create a copy of a JSON object created by json.loads(), with all representations of UUID objects (dictionaries containing a ‘uuidString’ key) recursively removed.
Parameters: json_object – Python object created from string by json.loads()
Returns: A copy of the input data structure with all UUID objects removed
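The two removal helpers above both walk a structure produced by json.loads() and return a filtered copy. A minimal sketch of that kind of recursion is shown below, assuming timestamps appear as dictionary entries keyed 'timestamp' and UUIDs as dictionaries holding a 'uuidString' key. The function names without_key and without_uuid_dicts are hypothetical and are not the library's implementation.

```python
import json


def without_key(json_object, key):
    """Return a copy of json_object with every dictionary entry named key removed."""
    if isinstance(json_object, dict):
        return {k: without_key(v, key) for k, v in json_object.items() if k != key}
    if isinstance(json_object, list):
        return [without_key(item, key) for item in json_object]
    return json_object  # strings, numbers, booleans, None


def without_uuid_dicts(json_object):
    """Return a copy with every dictionary that has a 'uuidString' key dropped."""
    def is_uuid(value):
        return isinstance(value, dict) and "uuidString" in value

    if isinstance(json_object, dict):
        return {k: without_uuid_dicts(v) for k, v in json_object.items()
                if not is_uuid(v)}
    if isinstance(json_object, list):
        return [without_uuid_dicts(v) for v in json_object if not is_uuid(v)]
    return json_object


doc = json.loads(
    '{"id": "doc_a", "uuid": {"uuidString": "1234"},'
    ' "metadata": {"tool": "demo", "timestamp": 1},'
    ' "sections": [{"kind": "body", "uuid": {"uuidString": "5678"},'
    ' "metadata": {"timestamp": 2}}]}')
no_timestamps = without_key(doc, "timestamp")
no_uuids = without_uuid_dicts(doc)
```

Both helpers build new containers at every level, so the object passed in is left unchanged.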
-
concrete.util.json_fu.
thrift_to_json
(tobj, remove_timestamps=False, remove_uuids=False)¶ Get a “pretty-printed” JSON string representation for a Thrift object
Parameters: - tobj – A Thrift object
- remove_timestamps (bool) – Flag for removing timestamps from JSON output
- remove_uuids (bool) – Flag for removing
UUID
info from JSON output
Returns: A “pretty-printed” JSON representation of the Thrift object
Return type: str
-
concrete.util.json_fu.
tokenlattice_file_to_json
(toklat_filename)¶ Get a “pretty-printed” JSON string representation for a
TokenLattice
Parameters: toklat_filename (str) – String specifying TokenLattice filename Returns: A “pretty-printed” JSON representation of the TokenLattice Return type: str
concrete.util.learn_wrapper module¶
-
class
concrete.util.learn_wrapper.
ActiveLearnerClientClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerClientService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/learn/ActiveLearnerClientService.pyc'>¶
-
-
class
concrete.util.learn_wrapper.
ActiveLearnerClientServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerClientService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/learn/ActiveLearnerClientService.pyc'>¶
-
-
class
concrete.util.learn_wrapper.
ActiveLearnerServerClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerServerService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/learn/ActiveLearnerServerService.pyc'>¶
-
-
class
concrete.util.learn_wrapper.
ActiveLearnerServerServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.learn.ActiveLearnerServerService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/learn/ActiveLearnerServerService.pyc'>¶
-
-
class
concrete.util.learn_wrapper.
SubprocessActiveLearnerClientServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
ActiveLearnerClientServiceWrapper
-
-
class
concrete.util.learn_wrapper.
SubprocessActiveLearnerServerServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
ActiveLearnerServerServiceWrapper
-
concrete.util.mem_io module¶
-
concrete.util.mem_io.
communication_deep_copy
(comm)¶ Return deep copy of communication.
-
concrete.util.mem_io.
read_communication_from_buffer
(buf, add_references=True)¶ Deserialize buf (a binary string) and return resulting communication. Add references if requested.
-
concrete.util.mem_io.
write_communication_to_buffer
(comm)¶ Serialize communication to buffer (binary string) and return buffer.
concrete.util.metadata module¶
-
concrete.util.metadata.
datetime_to_timestamp
(dt)¶
-
concrete.util.metadata.
get_index_of_tool
(lst_of_conc, tool)¶ Return the index of the object in the provided list whose tool name matches tool.
If tool is None, return the first valid index into lst_of_conc.
- This returns -1 if:
- lst_of_conc is None, or
- lst_of_conc has no entries, or
- no object in lst_of_conc matches tool.
Args:
- lst_of_conc: A list of Concrete objects, each of which has a .metadata field.
- tool: A tool name to match.
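The rules listed for get_index_of_tool translate into a few lines of Python. The sketch below is an illustration, not the library source: it assumes the tool name lives at obj.metadata.tool, returns the first match when several objects share a tool name, and uses SimpleNamespace objects in place of Concrete annotations.

```python
from types import SimpleNamespace


def index_of_tool_sketch(lst_of_conc, tool):
    """Return the index of the first object whose metadata.tool equals tool, else -1."""
    if not lst_of_conc:  # covers both None and an empty list
        return -1
    if tool is None:
        return 0  # first valid index
    for index, obj in enumerate(lst_of_conc):
        if obj.metadata.tool == tool:
            return index
    return -1


# Stand-ins for Concrete objects that carry a .metadata field.
taggings = [
    SimpleNamespace(metadata=SimpleNamespace(tool="tagger-a")),
    SimpleNamespace(metadata=SimpleNamespace(tool="tagger-b")),
]
found = index_of_tool_sketch(taggings, "tagger-b")
```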
-
concrete.util.metadata.
now_timestamp
()¶ Return timestamp representing the current time.
concrete.util.net module¶
-
concrete.util.net.
find_port
()¶ Find and return an available TCP port.
    >>> find_port() > 1023
    True
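A common way to find an available TCP port is to bind a socket to port 0 and let the operating system choose one. The sketch below shows that technique; it is an assumption about how such a helper can be written, not a copy of the library's find_port.

```python
import socket


def find_port_sketch():
    """Ask the OS for a free TCP port by binding to port 0, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


port = find_port_sketch()
```

The port is released before the function returns, so another process could claim it before the caller binds to it; callers that need a guarantee should be ready to retry.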
concrete.util.redis_io module¶
-
class
concrete.util.redis_io.
RedisCommunicationReader
(redis_db, key, add_references=True, **kwargs)¶ Bases:
concrete.util.redis_io.RedisReader
Iterable class for reading one or more Communications from redis. See RedisReader for further description.
Example usage:
    from redis import Redis
    redis_db = Redis(port=12345)
    for comm in RedisCommunicationReader(redis_db, 'my-set-key'):
        do_something(comm)
Create communication reader for specified key in specified redis_db.
Parameters: - redis_db – object of class redis.Redis
- key – name of redis key containing your communication(s)
- add_references – boolean, True to fill in members in the communication according to UUID relationships (see concrete.util.references), False to return communication as-is (note: you may need this False if you are dealing with incomplete communications)
All other keyword arguments are passed through to RedisReader.
-
class
concrete.util.redis_io.
RedisCommunicationWriter
(redis_db, key, uuid_hash_key=False, **kwargs)¶ Bases:
concrete.util.redis_io.RedisWriter
Class for writing one or more Communications to redis. See RedisWriter for further description.
Example usage:
from redis import Redis
redis_db = Redis(port=12345)
w = RedisCommunicationWriter(redis_db, 'my-set-key')
w.write(comm)
Create communication writer for specified key in specified redis_db.
Parameters: - redis_db – object of class redis.Redis
- key – name of redis key containing your communication(s)
- uuid_hash_key – boolean, True to use the UUID as the hash key for a communication, False to use the id
-
class
concrete.util.redis_io.
RedisReader
(redis_db, key, key_type=None, pop=False, block=False, right_to_left=True, block_timeout=0, temp_key_ttl=3600, temp_key_leaf_len=32, cycle_list=False, deserialize_func=None)¶ Bases:
object
Iterable class for reading one or more objects from redis.
Supported input types are:
- a set containing zero or more objects
- a list containing zero or more objects
- a hash containing zero or more key-object pairs
For list and set types, the reader can optionally pop (consume) its input; for lists only, the reader can moreover block on the input.
Note that iteration over a set or hash will create a temporary key in the redis database to maintain a set of elements scanned so far.
If pop is False and the key (in the database) is modified during iteration, behavior is undefined. If pop is True, modifications during iteration are encouraged.
Example usage:
from redis import Redis
redis_db = Redis(port=12345)
for obj in RedisReader(redis_db, 'my-set-key'):
    do_something(obj)
Create reader for specified key in specified redis_db.
Parameters: - redis_db – object of class redis.Redis
- key – name of redis key containing your object(s)
- key_type – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists, so probably not suitable for block and/or pop modes)
- pop – boolean, True to remove objects from redis as we iterate over them, and False to leave redis unaltered
- block – boolean, True to block for data (i.e., wait for something to be added to the list if it is empty), False to end iteration when there is no more data
- right_to_left – boolean, True to iterate over and index in lists from right to left, False to iterate/index from left to right
- deserialize_func – function, maps blobs from redis to some more friendly representation (e.g., if all your items are unicode strings, you might want to specify lambda s: s.decode(‘utf-8’)); return blobs unchanged if deserialize_func is None
-
batch
(n)¶ Return a batch of n objects. May be faster than one-at-a-time iteration, but currently only supported for non-popping, non-blocking set configurations. Support for popping, non-blocking sets is planned; see http://redis.io/commands/spop .
Parameters: n –
-
class
concrete.util.redis_io.
RedisWriter
(redis_db, key, key_type=None, right_to_left=True, serialize_func=None, hash_key_func=None)¶ Bases:
object
Class for writing one or more objects to redis.
Supported input types are:
- a set of objects
- a list of objects
- a hash of key-object pairs
Example usage:
from redis import Redis
redis_db = Redis(port=12345)
w = RedisWriter(redis_db, 'my-set-key')
w.write(obj)
Create object writer for specified key in specified redis_db.
Parameters: - redis_db – object of class redis.Redis
- key – name of redis key containing your object(s)
- key_type – ‘set’, ‘list’, ‘hash’, or None; if None, look up type in redis (only works if the key exists)
- right_to_left – boolean, True to write elements to the left end of lists, False to write to the right end
- serialize_func – function, maps objects to blobs before sending to Redis (e.g., if everything you write will be a unicode string, you might want to use lambda u: u.encode(‘utf-8’)); pass objects to Redis unchanged if serialize_func is None
- hash_key_func – function, maps objects to keys when key_type is hash (None: use Python’s hash function)
-
clear
()¶
-
write
(obj)¶
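The serialize_func and deserialize_func parameters described for RedisWriter and RedisReader are meant to be inverse mappings between objects and blobs. The sketch below checks such a pair for unicode strings without touching Redis; the function names `serialize` and `deserialize` are hypothetical.

```python
def serialize(u):
    # object -> blob, suitable as a serialize_func for unicode strings
    return u.encode('utf-8')


def deserialize(s):
    # blob -> object, the matching deserialize_func
    return s.decode('utf-8')


original = u'na\u00efve caf\u00e9'
blob = serialize(original)       # what would be sent to Redis
restored = deserialize(blob)     # what a reader would hand back
print(restored == original)
```

With a live Redis instance, the same two functions would presumably be passed as `serialize_func=serialize` to the writer and `deserialize_func=deserialize` to the reader.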
-
concrete.util.redis_io.
read_communication_from_redis_key
(redis_db, key, add_references=True)¶ Return a serialized communication from a string key. If block is True, poll server until key appears at specified interval or until specified timeout (indefinitely if timeout is zero). Return None if block is False and key does not exist or if block is True and key does not exist after specified timeout.
Parameters: - redis_db –
- key –
- add_references –
-
concrete.util.redis_io.
write_communication_to_redis_key
(redis_db, key, comm)¶ Serialize communication and store result in redis key.
concrete.util.references module¶
Add reference variables for each UUID
“pointer” in a
Communication
-
concrete.util.references.
add_references_to_communication
(comm)¶ Create references for each
UUID
‘pointer’
Parameters: comm (Communication) – A Concrete Communication object
The Concrete schema uses
UUID
objects as internal pointers between Concrete objects. This function adds member variables to Concrete objects that are references to the Concrete objects identified by the
UUID
.
For example, each
Entity
has a mentionIdList that lists the UUIDs of the
EntityMention
objects for that
Entity
. This function adds a mentionList variable to the
Entity
that is a list of references to the actual
EntityMention
objects. This allows you to access the
EntityMention
objects using:
entity.mentionList
This function adds these reference variables:
- tokenization to each
TokenRefSequence
- entityMention to each
Argument
- sentence backpointer to each
Tokenization
- parentMention backpointer to appropriate
EntityMention
And adds these lists of reference variables:
- mentionList to each
Entity
- situationMention to each
Argument
- mentionList to each
Situation
- childMentionList to each
EntityMention
For variables that represent optional lists of
UUID
objects (e.g. situation.mentionIdList), Python Thrift will set the variable to None if the list is not provided. When this function adds a list-of-references variable (in this case, situation.mentionList) for an omitted optional list, it sets the new variable to None; it DOES NOT leave the variable undefined.
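The idea of turning UUID pointers into object references, including the None handling for omitted optional lists, can be sketched with plain Python classes. The classes and the `add_mention_references` helper below are hypothetical stand-ins, with strings in place of real UUID objects; they are not the Concrete types or the library's implementation.

```python
class EntityMention(object):
    def __init__(self, uuid, text):
        self.uuid = uuid
        self.text = text


class Entity(object):
    def __init__(self, mention_id_list):
        # Mirrors an optional list of UUID pointers; may be None
        self.mentionIdList = mention_id_list


def add_mention_references(entities, mentions):
    """Attach a mentionList of real objects next to each mentionIdList."""
    by_uuid = {m.uuid: m for m in mentions}
    for entity in entities:
        if entity.mentionIdList is None:
            entity.mentionList = None    # defined, but None
        else:
            entity.mentionList = [by_uuid[u] for u in entity.mentionIdList]


mentions = [EntityMention('m1', 'Ada'), EntityMention('m2', 'she')]
with_ids = Entity(['m1', 'm2'])
without_ids = Entity(None)
add_mention_references([with_ids, without_ids], mentions)
print([m.text for m in with_ids.mentionList])
print(without_ids.mentionList)
```

Because the new attribute is always set, calling code can test `entity.mentionList is None` without risking an AttributeError.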
concrete.util.results_wrapper module¶
-
class
concrete.util.results_wrapper.
ResultsServerClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.services.results.ResultsServerService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/services/results/ResultsServerService.pyc'>¶
-
-
class
concrete.util.results_wrapper.
ResultsServerServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.services.results.ResultsServerService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/services/results/ResultsServerService.pyc'>¶
-
-
class
concrete.util.results_wrapper.
SubprocessResultsServerServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
ResultsServerServiceWrapper
-
concrete.util.search_wrapper module¶
-
class
concrete.util.search_wrapper.
FeedbackClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.search.FeedbackService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/search/FeedbackService.pyc'>¶
-
-
class
concrete.util.search_wrapper.
FeedbackServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.search.FeedbackService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/search/FeedbackService.pyc'>¶
-
-
class
concrete.util.search_wrapper.
SearchClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.search.SearchService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/search/SearchService.pyc'>¶
-
-
class
concrete.util.search_wrapper.
SearchProxyClientWrapper
(host, port)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceClientWrapper
-
concrete_service_class
= <module 'concrete.search.SearchProxyService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/search/SearchProxyService.pyc'>¶
-
-
class
concrete.util.search_wrapper.
SearchProxyServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.search.SearchProxyService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/search/SearchProxyService.pyc'>¶
-
-
class
concrete.util.search_wrapper.
SearchServiceWrapper
(implementation)¶ Bases:
concrete.util.service_wrapper.ConcreteServiceWrapper
-
concrete_service_class
= <module 'concrete.search.SearchService' from '/home/docs/checkouts/readthedocs.org/user_builds/concrete-python/envs/4.12.9/local/lib/python2.7/site-packages/concrete-4.12.9-py2.7.egg/concrete/search/SearchService.pyc'>¶
-
-
class
concrete.util.search_wrapper.
SubprocessFeedbackServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
FeedbackServiceWrapper
-
-
class
concrete.util.search_wrapper.
SubprocessSearchProxyServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
SearchProxyServiceWrapper
-
-
class
concrete.util.search_wrapper.
SubprocessSearchServiceWrapper
(implementation, host, port, timeout=None)¶ Bases:
concrete.util.service_wrapper.SubprocessConcreteServiceWrapper
-
concrete_service_wrapper_class
¶ alias of
SearchServiceWrapper
-
concrete.util.service_wrapper module¶
-
class
concrete.util.service_wrapper.
ConcreteServiceClientWrapper
(host, port)¶ Bases:
object
concrete.util.simple_comm module¶
Create a simple (valid) Communication suitable for testing purposes
-
class
concrete.util.simple_comm.
SimpleCommTempFile
(n=10, id_fmt=u'temp-%d', sentence_fmt=u'Super simple sentence %d .', writer_class=<class 'concrete.util.file_io.CommunicationWriter'>, suffix=u'.concrete')¶ Bases:
object
DEPRECATED. Please use
create_comm()
instead.
Class representing a temporary file of sample concrete objects. Designed to facilitate testing.
-
path
¶ str – path to file
-
communications
¶ Communication[] – List of communications that were written to file
Usage:
from concrete.util import CommunicationReader
with SimpleCommTempFile(n=3, id_fmt='temp-%d') as f:
    reader = CommunicationReader(f.path)
    for (orig_comm, comm_path_pair) in zip(f.communications, reader):
        print(orig_comm.id)
        print(orig_comm.id == comm_path_pair[0].id)
        print(f.path == comm_path_pair[1])
Create temp file and write communications.
Parameters: - n – number of communications to write
- id_fmt – format string used to generate communication IDs; should contain one instance of %d, which will be replaced by the number of the communication
- sentence_fmt – format string used to generate sentence text; should contain one instance of %d, which will be replaced by the number of the communication
- writer_class – CommunicationWriter or CommunicationWriterTGZ
- suffix – file path suffix (you probably want to choose this to match writer_class)
-
-
concrete.util.simple_comm.
add_annotation_level_argparse_argument
(parser)¶ Add an ‘--annotation-level’ argument to an ArgumentParser
The ‘--annotation-level’ argument specifies the level of concrete annotation to infer from whitespace in text. See
create_comm()
for details.
Parameters: parser (argparse.ArgumentParser) –
-
concrete.util.simple_comm.
create_comm
(comm_id, text=u'', comm_type=u'article', section_kind=u'passage', metadata_tool=u'concrete-python', metadata_timestamp=None, annotation_level=u'token')¶ Create a simple, valid
Communication
from text.
By default the text will be split by double-newlines into sections and then by single newlines into sentences within those sections.
annotation_level controls the amount of annotation that is added:
- AL_NONE: add no optional annotations (not even sections)
- AL_SECTION: add sections but not sentences
- AL_SENTENCE: add sentences but not tokens
- AL_TOKEN: add all annotations, up to tokens (the default)
Parameters: - comm_id (str) –
- text (str) –
- comm_type (str) –
- section_kind (str) –
- metadata_tool (str) –
- metadata_timestamp (int) – Time in seconds since the Epoch. If None, the current time will be used.
- annotation_level (str) –
Returns: Return type:
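The splitting scheme described for create_comm (double newlines separate sections, single newlines separate sentences, whitespace separates tokens) can be sketched without the library. `split_text` is a hypothetical helper that returns nested lists rather than Concrete objects, and its handling of empty segments is an assumption.

```python
def split_text(text):
    """Sections on blank lines, sentences on newlines, tokens on whitespace."""
    sections = []
    for sec_text in text.split('\n\n'):
        # Each non-empty line is one sentence; split() tokenizes on whitespace
        sentences = [line.split() for line in sec_text.split('\n') if line.strip()]
        if sentences:
            sections.append(sentences)
    return sections


text = 'First sentence .\nSecond one .\n\nNew section here .'
structure = split_text(text)
print(len(structure))         # number of sections
print(len(structure[0]))      # sentences in the first section
print(structure[1][0])        # tokens of the only sentence in section two
```

In these terms, the annotation levels correspond to how deep the nesting goes: sections only, sections and sentences, or all the way down to tokens.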
-
concrete.util.simple_comm.
create_section
(sec_text, sec_start, sec_end, section_kind, aug, metadata_tool, metadata_timestamp, annotation_level)¶ Create
Section
from provided text and metadata.
Lower-level routine (called by
create_comm()
).
Parameters: - sec_text (str) –
- sec_start (int) –
- sec_end (int) –
- section_kind (str) –
- aug (_AnalyticUUIDGenerator) –
- metadata_tool (str) –
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – See
create_comm()
for details
Returns: Return type:
-
concrete.util.simple_comm.
create_sentence
(sen_text, sen_start, sen_end, aug, metadata_tool, metadata_timestamp, annotation_level)¶ Create
Sentence
from provided text and metadata.
Lower-level routine (called indirectly by
create_comm()
).
Parameters: - sen_text (str) –
- sen_start (int) –
- sen_end (int) –
- aug (_AnalyticUUIDGenerator) –
- metadata_tool (str) –
- metadata_timestamp (int) – Time in seconds since the Epoch
- annotation_level (str) – See
create_comm()
for details
Returns: Return type:
-
concrete.util.simple_comm.
create_simple_comm
(comm_id, sentence_string=u'Super simple sentence .')¶ Create a simple (valid)
Communication
suitable for testing purposes
The Communication will have a single
Section
containing a single
Sentence
.
Parameters: - comm_id (str) – Specifies a Communication ID
- sentence_string (str) – String to be used for the sentence text. The string will be whitespace-tokenized.
Returns: Return type:
concrete.util.thrift_factory module¶
-
class
concrete.util.thrift_factory.
ThriftFactory
(transportFactory, protocolFactory)¶ Bases:
object
Abstract factory to create Thrift objects for client and server.
-
createProtocol
(transport)¶
-
createServer
(processor, host, port)¶
-
createSocket
(host, port)¶
-
createTransport
(socket)¶
-
-
concrete.util.thrift_factory.
is_accelerated
()¶
concrete.util.tokenization module¶
-
exception
concrete.util.tokenization.
NoSuchTokenTagging
(*args, **kwargs)¶ Bases:
exceptions.Exception
-
concrete.util.tokenization.
compute_lattice_expected_counts
(lattice)¶ Given a
TokenLattice
in which the dst, src, token, and weight fields are set in each arc, compute and return a list of expected token log-probabilities.
Input arc weights are treated as unnormalized log-probabilities.
Parameters: lattice (TokenLattice) –
Returns: List of floats (expected log-probabilities) with the float at position i corresponding to the token with tokenIndex i.
-
concrete.util.tokenization.
flatten
(a)¶ Parameters: a (list) – Returns: Flattened list Return type: list
-
concrete.util.tokenization.
get_comm_tokenizations
(comm, tool=None)¶ Get list of
Tokenization
objects in a
Communication
Parameters: - comm (Communication) –
- tool (str) – If given, only return
Tokenization
objects whose metadata.tool field is equal to tool
Returns: List of
Tokenization
objects
-
concrete.util.tokenization.
get_comm_tokens
(comm, sect_pred=None, suppress_warnings=False)¶ Get list of
Token
objects in
Communication
.
Parameters: - comm (Communication) –
- sect_pred (function) – Function that takes a
Section
and returns false if the
Section
should be excluded.
- suppress_warnings (bool) –
Returns: List of
Token
objects in
Communication
, delegating to
get_tokens()
for each sentence.
-
concrete.util.tokenization.
get_lemmas
(t, tool=None)¶ Calls
get_tagged_tokens()
with a tagging_type of “LEMMA”
-
concrete.util.tokenization.
get_ner
(t, tool=None)¶ Calls
get_tagged_tokens()
with a tagging_type of “NER”
-
concrete.util.tokenization.
get_pos
(t, tool=None)¶ Calls
get_tagged_tokens()
with a tagging_type of “POS”
-
concrete.util.tokenization.
get_tagged_tokens
(tokenization, tagging_type, tool=None)¶ Return list of
TaggedToken
objects of taggingType equal to tagging_type, if there is a unique choice.
Parameters: - tokenization (Tokenization) –
- tagging_type (str) –
- tool (str) – If tool is not None, filter the candidate TokenTaggings to those whose metadata.tool field matches tool.
Returns: List of
TaggedToken
objects of taggingType equal to tagging_type, if there is a unique choice.
Raises: - NoSuchTokenTagging – if there is no matching tagging
- Exception – if there is more than one matching tagging.
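The "unique choice" rule can be sketched as filter-then-count. The code below uses hypothetical stand-in objects and a hypothetical `NoSuchTagging` exception; the field names taggingType, taggedTokenList and metadata.tool are assumed to mirror the Concrete TokenTagging structure.

```python
from types import SimpleNamespace


class NoSuchTagging(Exception):
    """Hypothetical stand-in for NoSuchTokenTagging."""


def select_unique_tagging(taggings, tagging_type, tool=None):
    # Keep taggings of the requested type, optionally restricted to one tool
    candidates = [
        t for t in taggings
        if t.taggingType == tagging_type
        and (tool is None or t.metadata.tool == tool)
    ]
    if not candidates:
        raise NoSuchTagging('no %s tagging found' % tagging_type)
    if len(candidates) > 1:
        raise Exception('more than one %s tagging found' % tagging_type)
    return candidates[0].taggedTokenList


def _tagging(tagging_type, tool, tags):
    return SimpleNamespace(taggingType=tagging_type,
                           metadata=SimpleNamespace(tool=tool),
                           taggedTokenList=tags)


taggings = [_tagging('POS', 'tagger-a', ['DT', 'NN']),
            _tagging('POS', 'tagger-b', ['DET', 'NOUN'])]
print(select_unique_tagging(taggings, 'POS', tool='tagger-b'))
```

When several tools have annotated the same tokenization, passing tool is what makes the choice unique.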
-
concrete.util.tokenization.
get_tokenizations
(comm, tool=None)¶ Returns a flat list of all Tokenization objects in a Communication
Parameters: comm (Communication) – Returns: A list of all Tokenization objects within the Communication
-
concrete.util.tokenization.
get_tokens
(tokenization, suppress_warnings=False)¶ Get list of
Token
objects for a
Tokenization
Return list of Tokens from lattice.cachedBestPath, if Tokenization kind is TOKEN_LATTICE; else, return list of Tokens from tokenList.
Warn and return list of Tokens from tokenList if kind is not set.
Return None if kind is set but the respective data fields are not.
Parameters: - tokenization (Tokenization) –
- suppress_warnings (bool) –
Returns: List of
Token
objects, or None
-
concrete.util.tokenization.
plus
(x, y)¶ Returns: x + y
concrete.util.twitter module¶
Convert between JSON and Concrete representations of Tweets
The JSON fields used by the Twitter API are documented at:
-
concrete.util.twitter.
capture_tweet_lid
(tweet)¶ Attempts to capture the ‘lang’ field in the twitter API, if it exists.
Parameters: tweet (object) – Object created by deserializing a JSON Tweet string Returns: List of LanguageIdentification
objects, or None if the field is not present in the Tweet JSON
-
concrete.util.twitter.
json_tweet_object_to_Communication
(tweet)¶ Convert deserialized JSON Tweet object to
Communication
Parameters: tweet (object) – Object created by deserializing a JSON Tweet string Returns: Return type: Communication
-
concrete.util.twitter.
json_tweet_object_to_TweetInfo
(tweet)¶ Create
TweetInfo
object from deserialized JSON Tweet object
Parameters: tweet (object) – Object created by deserializing a JSON Tweet string
Returns: Return type: TweetInfo
-
concrete.util.twitter.
json_tweet_string_to_Communication
(json_tweet_string, check_empty=False, check_delete=False)¶ Convert JSON Tweet string to Communication
Parameters: - json_tweet_string (str) – JSON Tweet string from Twitter API
- check_empty (bool) – If True, check if json_tweet_string is empty
- check_delete (bool) – If True, check for presence of delete field in Tweet JSON, and if the ‘delete’ field is present, return None
Returns: Return type:
-
concrete.util.twitter.
json_tweet_string_to_TweetInfo
(json_tweet_string)¶ Create
TweetInfo
object from JSON Tweet string
Parameters: json_tweet_string (str) – JSON Tweet string from Twitter API
Returns: Return type: TweetInfo
-
concrete.util.twitter.
snake_case_to_camelcase
(value)¶ Converts snake case to camel case
Implementation copied from this Stack Overflow post: http://goo.gl/SSgo9k
Parameters: value (unicode) – Returns: unicode
-
concrete.util.twitter.
twitter_lid_to_iso639_3
(twitter_lid)¶ Convert Twitter Language ID string to ISO639-3 code
Ref: https://dev.twitter.com/rest/reference/get/help/languages
Parameters: twitter_lid (str) – This can be an iso639-3 code (no-op), iso639-1 2-letter abbr (converted to 3), or combo (split by ‘-‘, then first part converted) Returns: An ISO639-3 code Return type: str
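The three input cases described above (already ISO 639-3, two-letter ISO 639-1, or a hyphenated combination) can be sketched with a tiny lookup table. The table is an illustrative subset and `lid_to_iso639_3` is a hypothetical name; the library's actual mapping and its handling of unknown codes are not shown here.

```python
# Tiny illustrative subset of an ISO 639-1 -> ISO 639-3 table
ISO639_1_TO_3 = {'en': 'eng', 'es': 'spa', 'zh': 'zho'}


def lid_to_iso639_3(lid):
    if len(lid) == 3:
        return lid                    # already a three-letter code: no-op
    prefix = lid.split('-')[0]        # 'zh-cn' -> 'zh'; 'en' stays 'en'
    return ISO639_1_TO_3[prefix]      # KeyError for codes outside the subset


print(lid_to_iso639_3('eng'))
print(lid_to_iso639_3('en'))
print(lid_to_iso639_3('zh-cn'))
```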
concrete.util.unnone module¶
-
concrete.util.unnone.
dun
(d)¶ If d is None return an empty dict, else return d. Simplifies iteration over dict fields that might be unset.
-
concrete.util.unnone.
lun
(l)¶ If l is None return an empty list, else return l. Simplifies iteration over list fields that might be unset.
-
concrete.util.unnone.
sun
(s)¶ If s is None return an empty set, else return s. Simplifies iteration over set fields that might be unset.
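The pattern behind these helpers is useful because optional Thrift list fields are None when unset, so a bare for loop over them would raise TypeError. A minimal sketch, using the hypothetical name `list_or_empty` rather than the library function:

```python
def list_or_empty(l):
    # Same contract as documented for lun: None becomes an empty list
    return [] if l is None else l


unset_field = None
set_field = ['a', 'b']

seen = []
for item in list_or_empty(unset_field):   # loop body simply never runs
    seen.append(item)
for item in list_or_empty(set_field):
    seen.append(item)
print(seen)
```

A non-None argument is returned unchanged rather than copied, so the helper adds no overhead in the common case.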
Module contents¶
Utility code for working with Concrete
Submodules¶
concrete.inspect module¶
Functions used by concrete_inspect.py to print data in a Communication.
The function implementations provide useful examples of how to interact with many different Concrete datastructures.
-
concrete.inspect.
penn_treebank_for_parse
(parse)¶ Get a Penn-Treebank style string for a Concrete Parse object
Parameters: parse (Parse) – Returns: A string containing a Penn Treebank style parse tree representation Return type: str
-
concrete.inspect.
print_communication_taggings_for_communication
(comm, tool=None)¶ Print information for
CommunicationTagging
objects
Parameters: - comm (Communication) –
- tool (str) – If not None, only print information for
CommunicationTagging
objects with a matching metadata.tool field
Print ‘CoNLL-style’ tags for the tokens in a Communication
Parameters: - comm (Communication) –
- char_offsets (bool) – Flag for printing token text specified by a
Token
’s (optional)
TextSpan
- dependency (bool) – Flag for printing dependency parse HEAD tags
- lemmas (bool) – Flag for printing lemma tags
- ner (bool) – Flag for printing Named Entity Recognition tags
- pos (bool) – Flag for printing Part-of-Speech tags
Print ‘CoNLL-style’ tags for the tokens in a tokenization
Parameters: - tokenization (Tokenization) –
- token_tag_lists – A list of lists of token tag strings
-
concrete.inspect.
print_entities
(comm, tool=None)¶ Print information for
Entity
objects and their associated
EntityMention
objects
Parameters: - comm (Communication) –
- tool (str) – If not None, only print information for
EntitySet
objects with a matching metadata.tool field
-
concrete.inspect.
print_id_for_communication
(comm, tool=None)¶ Print ID field of
Communication
Parameters: - comm (Communication) –
- tool (str) – If not None, only print ID of
Communication
objects with a matching metadata.tool field
-
concrete.inspect.
print_metadata
(comm, tool=None)¶ Print metadata for tools used to annotate Communication
Parameters: - comm (Communication) –
- tool (str) – If not None, only print
AnnotationMetadata
information for objects with a matching metadata.tool field
-
concrete.inspect.
print_penn_treebank_for_communication
(comm, tool=None)¶ Print Penn-Treebank parse trees for all
Tokenization
objects
Parameters: - comm (Communication) –
- tool (str) – If not None, only print information for
Tokenization
objects with a matching metadata.tool field
-
concrete.inspect.
print_sections
(comm, tool=None)¶ Print information for all
Section
objects, according to their spans.
Parameters: - comm (Communication) –
- tool (str) – If not None, only print information for
Section
objects with a matching metadata.tool field
-
concrete.inspect.
print_situation_mentions
(comm, tool=None)¶ Print information for all
SituationMention
(some of which may not have a
Situation
)
Parameters: - comm (Communication) –
- tool (str) – If not None, only print information for
SituationMention
objects with a matching metadata.tool field
-
concrete.inspect.
print_situations
(comm, tool=None)¶ Print information for all
Situation
objects and their associated
SituationMention
objects
Parameters: - comm (Communication) –
- tool (str) – If not None, only print information for
Situation
objects with a matching metadata.tool field
-
concrete.inspect.
print_text_for_communication
(comm, tool=None)¶ Print text field of
Communication
Parameters: - comm (Communication) –
- tool (str) – If not None, only print text field of
Communication
objects with a matching metadata.tool field
-
concrete.inspect.
print_tokens_for_communication
(comm, tool=None)¶ Print token text for a
Communication
Parameters: - comm (Communication) –
- tool (str) – If not None, only print token text for
Communication
objects with a matching metadata.tool field
-
concrete.inspect.
print_tokens_with_entityMentions
(comm, tool=None)¶ Print information for
Token
objects that are part of an
EntityMention
Parameters: - comm (Communication) –
- tool (str) – If not None, only print information for tokens
that are associated with an
EntityMention
that is part of an
EntityMentionSet
with a matching metadata.tool field
concrete.validate module¶
Library to validate a Concrete Communication
Validation info, error and warning messages are logged using the Python standard library’s logging module.
-
concrete.validate.
validate_communication
(comm)¶ Test if all objects in a
Communication
are valid.
Calls
validate_thrift_deep()
to check for Concrete data structure fields that are required by the Concrete Thrift definitions. Then calls:
validate_token_offsets_for_section()
validate_token_offsets_for_sentence()
validate_constituency_parses()
validate_dependency_parses()
validate_token_taggings()
validate_entity_mention_ids()
validate_entity_mention_tokenization_ids()
validate_situations()
validate_situation_mentions()
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_communication_file
(communication_filename)¶ Test if the
Communication
in a file is valid
Deserializes a
Communication
file into memory, then calls
validate_communication()
on the Communication object.
Parameters: communication_filename (str) – Name of file containing
Returns: bool
-
concrete.validate.
validate_constituency_parses
(comm, tokenization)¶ Test a
Tokenization
’s constituency
Parse
objects.
Verifies that, for each constituent
Parse
:
- none of the constituent IDs for the parse repeat
- the parse tree is a fully connected graph
- the parse “tree” is really a tree data structure
Parameters: - comm (Communication) –
- tokenization (Tokenization) –
Returns: bool
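The three properties listed above (unique IDs, full connectivity, genuine tree shape) can be checked with a single traversal from the root. The sketch below works on plain (id, child_ids) pairs rather than Concrete Parse objects; `is_valid_tree` and its root_id parameter are hypothetical and not taken from the library.

```python
def is_valid_tree(constituents, root_id=0):
    """constituents: list of (constituent_id, list_of_child_ids) pairs."""
    ids = [cid for cid, _ in constituents]
    if len(ids) != len(set(ids)):
        return False                      # a constituent ID repeats
    children = dict(constituents)
    if root_id not in children:
        return False                      # no root to start from
    seen = set()
    stack = [root_id]
    while stack:
        node = stack.pop()
        if node in seen or node not in children:
            return False                  # second parent, cycle, or dangling child
        seen.add(node)
        stack.extend(children[node])
    return seen == set(ids)               # every constituent reachable from root


print(is_valid_tree([(0, [1, 2]), (1, []), (2, [])]))
print(is_valid_tree([(0, [1, 2]), (1, [2]), (2, [])]))
print(is_valid_tree([(0, [1]), (1, []), (2, [])]))
```

Visiting a node twice signals either a cycle or a node with two parents, so one pass covers both the connectivity and the tree checks.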
-
concrete.validate.
validate_dependency_parses
(tokenization)¶ Test a
Tokenization
’s
DependencyParse
objects
Verifies that, for each
DependencyParse
:
- the parse is a fully connected graph
- there are no nodes with a null governor node whose edgeType is not root
Parameters: tokenization (Tokenization) – Returns: bool
-
concrete.validate.
validate_entity_mention_ids
(comm)¶ Test if all
Entity
mentionIds are valid
Checks if all
Entity
mentionId
UUID
’s refer to an
EntityMention
UUID
that exists in the
Communication
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_entity_mention_token_ref_sequences
(comm)¶ Test if all
EntityMention
objects have a valid
TokenRefSequence
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_entity_mention_tokenization_ids
(comm)¶ Test tokenizationID field of every
EntityMention
Verifies that, for each
EntityMention
, the entityMention.tokens.tokenizationId
UUID
field matches the
UUID
of a
Tokenization
that exists in this
Communication
Parameters: comm (Communication) – Returns: bool
-
concrete.validate.
validate_situation_mentions
(comm)¶ Test every
SituationMention
in the
Communication
A
SituationMention
has a list of
MentionArgument
objects, and each
MentionArgument
can point to an
EntityMention
,
SituationMention
or
TokenRefSequence
.
Checks that each
MentionArgument
points to only one type of argument. Also checks validity of all
EntityMention
and
SituationMention
UUID
’s.
Parameters: comm (Communication) –
Returns: bool
-
concrete.validate.
validate_situations
(comm)¶ Test every
Situation
in the
Communication
Checks the validity of all
EntityMention
and
SituationMention
UUID
’s referenced by each
Situation
.
Parameters: comm (Communication) –
Returns: bool
-
concrete.validate.
validate_thrift
(thrift_object, indent_level=0)¶ Test if a Thrift object has all required fields.
This function calls the Thrift object’s validate() function. If an exception is raised because of missing required fields, the function catches the exception and logs the exception’s error message using the Python Standard Library’s logging module.
Parameters: - thrift_object –
- indent_level (int) – Text indentation level for logging error message
Returns: bool
-
concrete.validate.
validate_thrift_deep
(thrift_object, valid=True)¶ Deep validation of thrift messages.
Parameters: thrift_object – a Thrift object
The Python version of Thrift 0.9.1 does not support deep (recursive) validation, and none of the Thrift serialization/deserialization code calls even the shallow validation functions provided by Thrift.
This function implements deep validation. The code is adapted from:
See this blog post for more information:
A patch to implement deep validation was submitted to the Thrift repository in February of 2013:
but Thrift 0.9.1 - which was released on 2013-08-21 - does not include this functionality.
-
concrete.validate.
validate_thrift_object_required_fields
(thrift_object, indent_level=0)¶ DEPRECATED: Use
validate_thrift()
instead
-
concrete.validate.
validate_thrift_object_required_fields_recursively
(thrift_object, valid=True)¶ DEPRECATED. Use
validate_thrift_deep()
instead.
-
concrete.validate.
validate_token_offsets_for_section
(section)¶ Test if the
TextSpan
boundaries for all
Sentence
objects in a
Section
fall within the boundaries of the
Section
’s
TextSpan
Parameters: section (Section) – Returns: bool
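The containment test amounts to comparing span boundaries. The sketch below uses a namedtuple as a stand-in for TextSpan, with field names start and ending assumed to match the Concrete schema; `spans_within` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for a Concrete TextSpan (field names assumed: start, ending)
TextSpan = namedtuple('TextSpan', ['start', 'ending'])


def spans_within(outer, inner_spans):
    """True if every inner span lies inside the outer span's boundaries."""
    return all(outer.start <= span.start and span.ending <= outer.ending
               for span in inner_spans)


section_span = TextSpan(start=0, ending=30)
sentence_spans = [TextSpan(0, 16), TextSpan(17, 30)]
print(spans_within(section_span, sentence_spans))
print(spans_within(section_span, [TextSpan(25, 40)]))
```

The same comparison would apply one level down, with a sentence span as the outer boundary and token spans inside it.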
-
concrete.validate.
validate_token_offsets_for_sentence
(sentence)¶ Test if the
TextSpan
boundaries for all
Token
objects in a
Sentence
fall within the boundaries of the
Sentence
’s
TextSpan
.
Parameters: sentence (Sentence) –
Returns: bool
-
concrete.validate.
validate_token_ref_sequence
(comm, token_ref_sequence)¶ Check if a
TokenRefSequence
is valid
Verify that all token indices in the
TokenRefSequence
point to actual token indices in corresponding
Tokenization
Parameters: - comm (Communication) –
- token_ref_sequence (TokenRefSequence) –
Returns: bool
-
concrete.validate.
validate_token_taggings
(tokenization)¶ Test if a
Tokenization
has any
TokenTagging
objects with invalid token indices
Parameters: tokenization (Tokenization) –
Returns: bool