concrete.util.tokenization module
-
exception concrete.util.tokenization.NoSuchTokenTagging(*args, **kwargs)
Bases: Exception
Exception indicating that no TokenTagging annotation matches the given criteria in a given Concrete object.
-
concrete.util.tokenization.compute_lattice_expected_counts(lattice)
Given a TokenLattice in which the dst, src, token, and weight fields are set in each arc, compute and return a list of expected token log-probabilities. Input arc weights are treated as unnormalized log-probabilities.
Parameters: lattice (TokenLattice) – lattice to compute expected counts for
Returns: List of floats (expected log-probabilities), with the float at position i corresponding to the token with tokenIndex i.
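One standard way to compute such expected counts is the forward-backward algorithm in log space. The following is a minimal, self-contained sketch of that technique and is not the library's implementation. Several pieces are assumptions made for illustration: the `Arc` tuple, the explicit `start`/`end` state arguments, the helper names `log_add` and `expected_token_log_probs`, and the requirement that every arc satisfies `src < dst`, which makes sorting by `src` a topological order.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Arc(NamedTuple):
    """Hypothetical stand-in for a lattice arc (not the Concrete Arc type)."""
    src: int
    dst: int
    token_index: int
    weight: float  # unnormalized log-probability


def log_add(a: float, b: float) -> float:
    """Stable log(exp(a) + exp(b)); -inf plays the role of log(0)."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def expected_token_log_probs(arcs: List[Arc], start: int, end: int) -> List[float]:
    # Assumption: src < dst on every arc, so sorting by src is a topological order.
    order = sorted(arcs, key=lambda arc: arc.src)

    # Forward pass: alpha[s] = log of the total weight of all paths start -> s.
    alpha: Dict[int, float] = defaultdict(lambda: -math.inf)
    alpha[start] = 0.0
    for arc in order:
        alpha[arc.dst] = log_add(alpha[arc.dst], alpha[arc.src] + arc.weight)

    # Backward pass: beta[s] = log of the total weight of all paths s -> end.
    beta: Dict[int, float] = defaultdict(lambda: -math.inf)
    beta[end] = 0.0
    for arc in reversed(order):
        beta[arc.src] = log_add(beta[arc.src], arc.weight + beta[arc.dst])

    log_z = alpha[end]  # log partition function over all complete paths

    # Arc posterior = alpha(src) + weight + beta(dst) - log Z; arcs carrying
    # the same token index have their posteriors summed (in log space).
    counts = [-math.inf] * (max(arc.token_index for arc in arcs) + 1)
    for arc in arcs:
        posterior = alpha[arc.src] + arc.weight + beta[arc.dst] - log_z
        counts[arc.token_index] = log_add(counts[arc.token_index], posterior)
    return counts
```

Consider a lattice with two equally weighted paths, 0-1-2 (tokens 0 and 1) and 0-2 (token 2). Each token lies on exactly one of the two paths, so each entry of the result is log(0.5).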
-
concrete.util.tokenization.flatten(a)
Return a flattened version of the input list.
Parameters: a (list) – list of lists to flatten
Returns: Flattened list
Return type: list
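A one-level flatten of a list of lists can be sketched with `itertools.chain.from_iterable` from the standard library. This is an illustrative equivalent that assumes every element of `a` is itself a list. It does not necessarily reflect how the library implements `flatten`.

```python
from itertools import chain
from typing import List, TypeVar

T = TypeVar("T")


def flatten(a: List[List[T]]) -> List[T]:
    """Concatenate the inner lists of `a` in order (one level only)."""
    return list(chain.from_iterable(a))


print(flatten([[1, 2], [], [3], [4, 5]]))  # [1, 2, 3, 4, 5]
```

Only one level of nesting is removed. In `[[[1]], [2]]`, the inner `[1]` survives as an element.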
-
concrete.util.tokenization.get_comm_tokenizations(comm, tool=None)
Get list of Tokenization objects in a Communication.
Parameters:
- comm (Communication) – communication to extract tokenizations from
- tool (str) – If not None, only return Tokenization objects whose metadata.tool field is equal to tool
Returns: List of Tokenization objects
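The collection step can be pictured as a walk through sectionList, then each Section's sentenceList, keeping each Sentence's tokenization. The sketch below uses hypothetical dataclass stand-ins that only mirror those field names plus metadata.tool. It is not the library's code. The helper name `collect_tokenizations` and the choice to skip unset lists and missing tokenizations are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Metadata:
    tool: str


@dataclass
class Tokenization:
    metadata: Metadata


@dataclass
class Sentence:
    tokenization: Optional[Tokenization] = None


@dataclass
class Section:
    sentenceList: Optional[List[Sentence]] = None


@dataclass
class Communication:
    sectionList: Optional[List[Section]] = None


def collect_tokenizations(comm: Communication, tool: Optional[str] = None) -> List[Tokenization]:
    result: List[Tokenization] = []
    for section in comm.sectionList or []:  # assumption: treat unset lists as empty
        for sentence in section.sentenceList or []:
            tkzn = sentence.tokenization
            if tkzn is None:
                continue
            # Keep every tokenization when tool is None; otherwise require a match.
            if tool is None or tkzn.metadata.tool == tool:
                result.append(tkzn)
    return result
```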
-
concrete.util.tokenization.get_comm_tokens(comm, sect_pred=None, suppress_warnings=False)
Get list of Token objects in a Communication.
Parameters:
- comm (Communication) – communication to extract tokens from
- sect_pred (function) – Function that takes a Section and returns False if the Section should be excluded
- suppress_warnings (bool) – True to suppress warning messages that Tokenization.kind is None
Returns: List of Token objects in the Communication, obtained by delegating to get_tokens() for each sentence.
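A section predicate is any callable that takes a Section and returns a bool. The sketch below shows how such a predicate prunes whole sections before tokens are gathered. It uses hypothetical stand-ins: each Section has a `kind` string and sentences that already hold a plain `tokens` list. The kind values "passage" and "metadata", the `collect_tokens` helper, and the rule that a None predicate keeps every section are illustrative assumptions, not the library's behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Sentence:
    tokens: List[str]  # hypothetical: tokens already extracted for this sentence


@dataclass
class Section:
    kind: str
    sentenceList: List[Sentence] = field(default_factory=list)


def collect_tokens(sections: List[Section],
                   sect_pred: Optional[Callable[[Section], bool]] = None) -> List[str]:
    tokens: List[str] = []
    for section in sections:
        # Assumption: a None predicate keeps every section.
        if sect_pred is not None and not sect_pred(section):
            continue
        for sentence in section.sentenceList:
            tokens.extend(sentence.tokens)
    return tokens


sections = [
    Section("passage", [Sentence(["Hello", "world"])]),
    Section("metadata", [Sentence(["ID", "42"])]),
]
print(collect_tokens(sections, sect_pred=lambda s: s.kind != "metadata"))
# ['Hello', 'world']
```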
-
concrete.util.tokenization.get_lemmas(t, tool=None)
Return the result of get_tagged_tokens() with a tagging_type of "LEMMA".
Parameters:
- t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: List of "LEMMA"-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.get_ner(t, tool=None)
Return the result of get_tagged_tokens() with a tagging_type of "NER".
Parameters:
- t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: List of "NER"-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.get_pos(t, tool=None)
Return the result of get_tagged_tokens() with a tagging_type of "POS".
Parameters:
- t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: List of "POS"-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.get_tagged_tokens(tokenization, tagging_type, tool=None)
Return list of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.
Parameters:
- tokenization (Tokenization) – tokenization to return tagged tokens for
- tagging_type (str) – only return tagged tokens for TokenTagging objects whose taggingType field is equal to tagging_type
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: List of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.
Raises:
- NoSuchTokenTagging – if there is no matching tagging
- Exception – if there is more than one matching tagging
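The "unique choice" rule can be sketched as a filter on type and tool, followed by a count check. The dataclasses below are hypothetical stand-ins that only mirror the tokenTaggingList, taggingType, metadata.tool, and taggedTokenList field names, with tags reduced to plain strings. `tagged_tokens` and `pos` are illustrative names and not the library's functions. The exact-equality comparison follows the wording above but is an assumption about the real matching.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class NoSuchTokenTagging(Exception):
    """Stand-in for the module's exception of the same name."""


@dataclass
class Metadata:
    tool: str


@dataclass
class TokenTagging:
    taggingType: str
    metadata: Metadata
    taggedTokenList: List[str] = field(default_factory=list)  # tags as plain strings here


@dataclass
class Tokenization:
    tokenTaggingList: List[TokenTagging] = field(default_factory=list)


def tagged_tokens(tokenization: Tokenization, tagging_type: str,
                  tool: Optional[str] = None) -> List[str]:
    matches = [
        tt for tt in tokenization.tokenTaggingList
        if tt.taggingType == tagging_type
        and (tool is None or tt.metadata.tool == tool)
    ]
    if not matches:
        raise NoSuchTokenTagging(f"no {tagging_type} tagging (tool={tool!r})")
    if len(matches) > 1:
        raise Exception(f"{len(matches)} {tagging_type} taggings; pass tool to disambiguate")
    return matches[0].taggedTokenList


# Helpers in the style of get_lemmas/get_ner/get_pos just fix tagging_type.
def pos(t: Tokenization, tool: Optional[str] = None) -> List[str]:
    return tagged_tokens(t, "POS", tool)
```

Suppose two POS taggers have annotated the same tokenization. In that case, calling without tool is ambiguous and raises, while passing the tool name selects one tagging.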
-
concrete.util.tokenization.get_token_taggings(tokenization, tagging_type, case_sensitive=False)
Return list of TokenTagging objects of taggingType equal to tagging_type.
Parameters:
- tokenization (Tokenization) – tokenization from which taggings will be selected
- tagging_type (str) – value of taggingType to filter to
- case_sensitive (bool) – True to do case-sensitive matching on taggingType
Returns: List of TokenTagging objects of taggingType equal to tagging_type, in the same order as they appear in the tokenization. If no matching TokenTagging objects exist, return an empty list.
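The case_sensitive switch amounts to choosing between exact and case-normalized string comparison while preserving the original order. The sketch below assumes a hypothetical `TokenTagging` stand-in with only a taggingType field, and a hypothetical `token_taggings` helper. It normalizes with `str.casefold()`, which the library may not use (it could use `lower()`, for example). It also skips taggings whose taggingType is unset, which is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TokenTagging:
    taggingType: Optional[str]  # hypothetical stand-in; only the field used below


def token_taggings(taggings: List[TokenTagging], tagging_type: str,
                   case_sensitive: bool = False) -> List[TokenTagging]:
    if case_sensitive:
        return [tt for tt in taggings if tt.taggingType == tagging_type]
    # casefold() is a more aggressive lower() intended for caseless matching.
    wanted = tagging_type.casefold()
    return [tt for tt in taggings
            if tt.taggingType is not None and tt.taggingType.casefold() == wanted]
```

A list comprehension keeps the input order and yields an empty list when nothing matches, as the description above requires.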
-
concrete.util.tokenization.get_tokenizations(comm, tool=None)
Return a flat list of all Tokenization objects in a Communication.
Parameters:
- comm (Communication) – communication to get tokenizations from
- tool (str) – If not None, return only tokenizations whose metadata.tool field matches tool
Returns: A list of all Tokenization objects within the Communication matching tool (if it is not None)
-
concrete.util.tokenization.get_tokens(tokenization, suppress_warnings=False)
Get list of Token objects for a Tokenization.
If the Tokenization kind is TOKEN_LATTICE, return the list of Tokens from lattice.cachedBestPath; if it is TOKEN_LIST, return the list of Tokens from tokenList.
If kind is not set, warn and return the list of Tokens from tokenList.
Return None if kind is set but the respective data fields are not.
Parameters:
- tokenization (Tokenization) – tokenization to extract tokens from
- suppress_warnings (bool) – True to suppress warning messages that tokenization.kind is None
Returns: List of Token objects, or None
Raises: ValueError – if tokenization.kind is not a recognized tokenization kind
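The dispatch described above can be sketched as follows. The `TokenizationKind` enum and the flattened `Tokenization` dataclass are hypothetical stand-ins for the Concrete types. `token_list` stands in for the tokens under tokenList, and `best_path` stands in for the tokens under lattice.cachedBestPath. The helper name `tokens_of` and the warning text are likewise illustrative.

```python
import warnings
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class TokenizationKind(IntEnum):
    # Hypothetical stand-in for the two kinds named in the description above.
    TOKEN_LIST = 1
    TOKEN_LATTICE = 2


@dataclass
class Tokenization:
    kind: Optional[int] = None
    token_list: Optional[List[str]] = None  # stands in for the tokens under tokenList
    best_path: Optional[List[str]] = None   # stands in for lattice.cachedBestPath tokens


def tokens_of(tkzn: Tokenization, suppress_warnings: bool = False) -> Optional[List[str]]:
    if tkzn.kind is None:
        # Unset kind: fall back to the token list, warning unless suppressed.
        if not suppress_warnings:
            warnings.warn("tokenization.kind is None; assuming TOKEN_LIST")
        return tkzn.token_list
    if tkzn.kind == TokenizationKind.TOKEN_LIST:
        return tkzn.token_list  # None if the list field is not set
    if tkzn.kind == TokenizationKind.TOKEN_LATTICE:
        return tkzn.best_path   # None if no cached best path is set
    raise ValueError(f"unrecognized tokenization kind: {tkzn.kind!r}")
```

IntEnum members compare equal to their integer values, so this dispatch also works when kind is stored as a plain int.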
-
concrete.util.tokenization.plus(x, y)
Return concatenation of two lists.
Parameters:
- x (list) – first list
- y (list) – second list
Returns: Concatenation of x and y
Return type: list
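Because plus concatenates two lists, folding it over a list of lists with `functools.reduce` gives the same result as flatten. This is a minimal sketch. The `plus` body here is an assumed equivalent, and the initial value `[]` is chosen so that an empty input returns an empty list.

```python
from functools import reduce
from typing import List, TypeVar

T = TypeVar("T")


def plus(x: List[T], y: List[T]) -> List[T]:
    """Return a new list holding the elements of x followed by those of y."""
    return x + y


# Folding plus left-to-right over nested lists concatenates them in order.
nested = [[1, 2], [3], [], [4]]
print(reduce(plus, nested, []))  # [1, 2, 3, 4]
```

Each `x + y` copies the accumulated list, so this fold is quadratic in the total length. `itertools.chain` avoids that cost for large inputs.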