concrete.clustering package

class concrete.clustering.ttypes.Cluster(clusterMemberIndexList=None, confidenceList=None, childIndexList=None)

Bases: object


A set of items which are alike in some way. Has an implicit id which is the
index of this Cluster in its parent Clustering’s ‘clusterList’.

Attributes:
- clusterMemberIndexList: The items in this cluster. Values are indices into the
‘clusterMemberList’ of the Clustering which contains this Cluster.
- confidenceList: Co-indexed with ‘clusterMemberIndexList’. The i-th value represents the
confidence that the item at clusterMemberIndexList[i] belongs to this cluster.
- childIndexList: A set of clusters (implicit ids/indices) from which this cluster was
created. This cluster should represent the union of all the items in all
of the child clusters. (For hierarchical clustering only).
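
The co-indexing of ‘confidenceList’ and the union property of ‘childIndexList’ can be illustrated with a minimal sketch. The dataclass below is a hypothetical stand-in that only mirrors the documented field names; it is not the Thrift-generated class, and the helper functions are illustrative names, not part of the package. It also assumes that a value of None means the field is absent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    """Hypothetical stand-in mirroring the documented fields; not the Thrift class."""
    clusterMemberIndexList: Optional[List[int]] = None
    confidenceList: Optional[List[float]] = None
    childIndexList: Optional[List[int]] = None


def members_with_confidence(cluster):
    """Pair each member index with its confidence (None if no confidences are given)."""
    members = cluster.clusterMemberIndexList or []
    if cluster.confidenceList is None:
        return [(m, None) for m in members]
    if len(cluster.confidenceList) != len(members):
        raise ValueError("confidenceList must be co-indexed with clusterMemberIndexList")
    return list(zip(members, cluster.confidenceList))


def is_union_of_children(cluster, cluster_list):
    """Check that a parent cluster holds exactly the items of its child clusters."""
    if not cluster.childIndexList:
        return True  # flat cluster: nothing to check
    child_items = set()
    for child_index in cluster.childIndexList:
        child_items.update(cluster_list[child_index].clusterMemberIndexList or [])
    return child_items == set(cluster.clusterMemberIndexList or [])


# The clusterList of a hypothetical parent Clustering; a Cluster's id is its index here.
cluster_list = [
    Cluster(clusterMemberIndexList=[0, 1], confidenceList=[0.9, 0.7]),
    Cluster(clusterMemberIndexList=[2], confidenceList=[0.8]),
    Cluster(clusterMemberIndexList=[0, 1, 2], childIndexList=[0, 1]),
]
pairs = members_with_confidence(cluster_list[0])
parent_ok = is_union_of_children(cluster_list[2], cluster_list)
```

Here cluster 2 is the parent of clusters 0 and 1, and its member indices are exactly the union of theirs.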

read(iprot)
validate()
write(oprot)

class concrete.clustering.ttypes.ClusterMember(communicationId=None, setId=None, elementId=None)

Bases: object


An item being clustered. Does not designate cluster _membership_, as in
“item x belongs to cluster C”, but rather just the item (“x” in this
example). Membership is indicated through Cluster objects. An item may be an
Entity, EntityMention, Situation, SituationMention, or technically anything
with a UUID.

Attributes:
- communicationId: UUID of the Communication which contains the item specified by ‘elementId’.
This is ancillary info assuming UUIDs are indeed universally unique.
- setId: UUID of the Entity|Situation(Mention)Set which contains the item specified by ‘elementId’.
This is ancillary info assuming UUIDs are indeed universally unique.
- elementId: UUID of the EntityMention, Entity, SituationMention, or Situation that
this item represents. This is the characteristic field.
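
Because ‘elementId’ is the characteristic field, a lookup from ‘elementId’ to position in ‘clusterMemberList’ is a natural way to produce the indices that Cluster objects refer to. The sketch below assumes a hypothetical stand-in dataclass with plain strings in place of the package's UUID objects, and assumes each ‘elementId’ appears at most once in the list; ‘index_by_element_id’ is an illustrative helper, not part of the package.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClusterMember:
    """Hypothetical stand-in; UUIDs are plain strings here for brevity."""
    communicationId: Optional[str] = None
    setId: Optional[str] = None
    elementId: Optional[str] = None


def index_by_element_id(cluster_member_list):
    """Map each elementId to its position in clusterMemberList.

    Assumes elementIds are unique within the list; a repeated elementId
    would silently keep the last position.
    """
    return {
        member.elementId: position
        for position, member in enumerate(cluster_member_list)
    }


# Two mentions from the same Communication and the same mention set.
comm_id = str(uuid.uuid4())
set_id = str(uuid.uuid4())
mention_a = str(uuid.uuid4())
mention_b = str(uuid.uuid4())

cluster_member_list = [
    ClusterMember(communicationId=comm_id, setId=set_id, elementId=mention_a),
    ClusterMember(communicationId=comm_id, setId=set_id, elementId=mention_b),
]
positions = index_by_element_id(cluster_member_list)
```

The resulting positions are the values a Cluster would store in its ‘clusterMemberIndexList’.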

read(iprot)
validate()
write(oprot)

class concrete.clustering.ttypes.Clustering(uuid=None, metadata=None, clusterMemberList=None, clusterList=None, rootClusterIndexList=None)

Bases: object


An (optionally) hierarchical clustering of items appearing across a set of
Communications (intra-Communication clusterings are encoded by Entities and
Situations). An item may be an Entity, EntityMention, Situation,
SituationMention, or technically anything with a UUID.

Attributes:
- uuid: UUID for this Clustering object.
- metadata: Metadata for this Clustering object.
- clusterMemberList: The set of items being clustered.
- clusterList: Clusters of items. If this is a hierarchical clustering, this may contain
clusters that are unions of smaller clusters.
Clusters may not “overlap”, meaning (for all clusters X, Y):
X \cap Y \neq \emptyset \implies X \subset Y \vee Y \subset X
- rootClusterIndexList: A set of disjoint clusters (indices in ‘clusterList’) which cover all
items in ‘clusterMemberList’. This list must be specified for hierarchical
clusterings and should not be specified for flat clusterings.
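
The two structural constraints above (no overlapping clusters, and root clusters that are disjoint and cover every item) can be checked with a short sketch. It works on plain lists of member indices rather than the Thrift objects, and both function names are hypothetical. Note one assumption: the nesting test uses subset-or-equal, so two clusters with identical members are treated as nested.

```python
from itertools import combinations


def clusters_are_nested_or_disjoint(cluster_items):
    """True if every pair of clusters that shares an item is nested."""
    sets = [set(items) for items in cluster_items]
    for x, y in combinations(sets, 2):
        # A shared item is only allowed when one cluster contains the other.
        if x & y and not (x <= y or y <= x):
            return False
    return True


def roots_cover_disjointly(cluster_items, root_indices, num_members):
    """True if the root clusters are pairwise disjoint and cover all member indices."""
    seen = set()
    for root in root_indices:
        items = set(cluster_items[root])
        if seen & items:
            return False  # two roots share an item
        seen |= items
    return seen == set(range(num_members))


# Member indices per cluster, in clusterList order, for four clustered items.
# Cluster 2 is the union of clusters 0 and 1; cluster 3 is a singleton.
cluster_items = [[0, 1], [2], [0, 1, 2], [3]]
root_indices = [2, 3]

valid_shape = clusters_are_nested_or_disjoint(cluster_items)
roots_ok = roots_cover_disjointly(cluster_items, root_indices, num_members=4)

# Clusters {0, 1} and {1, 2} share item 1 but neither contains the other.
overlapping_ok = clusters_are_nested_or_disjoint([[0, 1], [1, 2]])
```

For a flat clustering, where ‘rootClusterIndexList’ should not be specified, only the first check is relevant.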

read(iprot)
validate()
write(oprot)