concrete.util.tokenization module
-
exception concrete.util.tokenization.NoSuchTokenTagging(*args, **kwargs)
Bases: exceptions.Exception
Exception indicating that no TokenTagging annotation matches the given criteria in a given concrete object.
-
concrete.util.tokenization.compute_lattice_expected_counts(lattice)
Given a TokenLattice in which the dst, src, token, and weight fields are set in each arc, compute and return a list of expected token log-probabilities. Input arc weights are treated as unnormalized log-probabilities.
Parameters: lattice (TokenLattice) – lattice to compute expected counts for
Returns: List of floats (expected log-probabilities) with the float at position i corresponding to the token with tokenIndex i.
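One common way to obtain per-token quantities like these from a weighted lattice is the forward-backward algorithm in log space. The sketch below is not the library's implementation. It uses hypothetical stand-in Arc and Token tuples carrying only the fields named above, and it assumes that state ids are integers in topological order (every arc has src < dst) and that the end state is reachable from the start state.

```python
import math
from collections import namedtuple

# Hypothetical stand-ins for the Concrete Arc and Token types; only the
# fields named in the description above are modelled.
Token = namedtuple("Token", ["tokenIndex"])
Arc = namedtuple("Arc", ["src", "dst", "token", "weight"])

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    values = list(values)
    m = max(values) if values else NEG_INF
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def expected_log_probs(arcs, start, end, num_tokens):
    """Forward-backward over an acyclic lattice (assumes src < dst)."""
    # alpha[s]: log-mass of all partial paths from start to s
    alpha = {start: 0.0}
    for arc in sorted(arcs, key=lambda a: a.src):
        reach = alpha.get(arc.src, NEG_INF) + arc.weight
        alpha[arc.dst] = logsumexp([alpha.get(arc.dst, NEG_INF), reach])

    # beta[s]: log-mass of all partial paths from s to end
    beta = {end: 0.0}
    for arc in sorted(arcs, key=lambda a: a.dst, reverse=True):
        reach = arc.weight + beta.get(arc.dst, NEG_INF)
        beta[arc.src] = logsumexp([beta.get(arc.src, NEG_INF), reach])

    log_z = alpha[end]  # log of the total unnormalized path mass

    # Sum the posterior mass of every arc carrying a given tokenIndex.
    per_token = [[] for _ in range(num_tokens)]
    for arc in arcs:
        score = alpha.get(arc.src, NEG_INF) + arc.weight \
            + beta.get(arc.dst, NEG_INF) - log_z
        per_token[arc.token.tokenIndex].append(score)
    return [logsumexp(scores) for scores in per_token]


# Two paths: 0 -> 1 -> 2 (tokens 0 and 1) and 0 -> 2 (token 0 only).
example_arcs = [
    Arc(0, 1, Token(0), 0.0),
    Arc(1, 2, Token(1), math.log(3.0)),
    Arc(0, 2, Token(0), 0.0),
]
example_result = expected_log_probs(example_arcs, start=0, end=2, num_tokens=2)
```

In this example every path contains token 0, while token 1 appears only on the path that carries three quarters of the total mass.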
-
concrete.util.tokenization.flatten(a)
Return flattened version of input list.
Parameters: a (list) – list to flatten
Returns: Flattened list
Return type: list
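The description does not say how deep the flattening goes. A minimal sketch, assuming the input is a list of lists and that exactly one level of nesting is removed by concatenating the sub-lists (the same operation plus() performs on two lists); the names concat and flatten_one_level are illustrative, not the library's:

```python
from functools import reduce


def concat(x, y):
    """Return the concatenation of two lists."""
    return x + y


def flatten_one_level(a):
    """Concatenate the sub-lists of a, removing one level of nesting."""
    return reduce(concat, a, [])


nested = [[1, 2], [3], [], [4, [5]]]
flat = flatten_one_level(nested)
```

Under this assumption deeper nesting is left alone: the inner [5] above stays a list.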
-
concrete.util.tokenization.get_comm_tokenizations(comm, tool=None)
Get list of Tokenization objects in a Communication.
Parameters:
- comm (Communication) – communication to extract tokenizations from
- tool (str) – If not None, only return Tokenization objects whose metadata.tool field is equal to tool
Returns: List of Tokenization objects
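The traversal this implies can be sketched without the library. The stand-in objects below are hypothetical; the attribute names sectionList, sentenceList, and tokenization reflect my reading of the Concrete schema and should be treated as assumptions, as should the choice to skip missing lists and sentences without a tokenization.

```python
from types import SimpleNamespace as NS


def tokenizations_in(comm, tool=None):
    """Collect tokenizations section by section, sentence by sentence."""
    result = []
    for section in comm.sectionList or []:
        for sentence in section.sentenceList or []:
            tkz = sentence.tokenization
            if tkz is None:
                continue  # sentence has not been tokenized
            if tool is None or tkz.metadata.tool == tool:
                result.append(tkz)
    return result


# Stand-in communication: two tokenized sentences and one untokenized one.
tkz_a = NS(metadata=NS(tool="tool-a"))
tkz_b = NS(metadata=NS(tool="tool-b"))
comm = NS(sectionList=[
    NS(sentenceList=[NS(tokenization=tkz_a), NS(tokenization=None)]),
    NS(sentenceList=[NS(tokenization=tkz_b)]),
])

all_tkzs = tokenizations_in(comm)
only_b = tokenizations_in(comm, tool="tool-b")
```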
-
concrete.util.tokenization.get_comm_tokens(comm, sect_pred=None, suppress_warnings=False)
Get list of Token objects in a Communication.
Parameters:
- comm (Communication) – communication to extract tokens from
- sect_pred (function) – Function that takes a Section and returns false if the Section should be excluded.
- suppress_warnings (bool) – True to suppress warning messages that Tokenization.kind is None
Returns: List of Token objects in the Communication, delegating to get_tokens() for each sentence.
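To illustrate how a section predicate filters the result, here is a small sketch with hypothetical stand-in objects. For brevity each stand-in sentence carries its tokens directly in a made-up tokens attribute, where the real function delegates to get_tokens(); the kind attribute on sections and its values are likewise assumptions for this example.

```python
from types import SimpleNamespace as NS


def tokens_in(comm, sect_pred=None):
    """Collect tokens from every section accepted by sect_pred."""
    if sect_pred is None:
        # No predicate given: keep every section.
        def sect_pred(section):
            return True
    tokens = []
    for section in comm.sectionList or []:
        if not sect_pred(section):
            continue  # predicate returned false: exclude this section
        for sentence in section.sentenceList or []:
            tokens.extend(sentence.tokens)
    return tokens


comm = NS(sectionList=[
    NS(kind="title", sentenceList=[NS(tokens=["Breaking", "news"])]),
    NS(kind="passage", sentenceList=[NS(tokens=["It", "rained"]),
                                     NS(tokens=["today"])]),
])

everything = tokens_in(comm)
passages_only = tokens_in(comm, sect_pred=lambda s: s.kind == "passage")
```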
-
concrete.util.tokenization.get_lemmas(t, tool=None)
Returns the result of get_tagged_tokens() with a tagging_type of “LEMMA”.
Parameters:
- t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: list of ‘LEMMA’-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.get_ner(t, tool=None)
Returns the result of get_tagged_tokens() with a tagging_type of “NER”.
Parameters:
- t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: list of ‘NER’-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.get_pos(t, tool=None)
Returns the result of get_tagged_tokens() with a tagging_type of “POS”.
Parameters:
- t (Tokenization) – tokenization to extract tagged tokens from
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: list of ‘POS’-tagged tokens matching tool (if specified)
-
concrete.util.tokenization.get_tagged_tokens(tokenization, tagging_type, tool=None)
Return list of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.
Parameters:
- tokenization (Tokenization) – tokenization to return tagged tokens for
- tagging_type (str) – only return tagged tokens for TokenTagging objects whose taggingType field is equal to tagging_type
- tool (str) – If not None, only return tagged tokens for TokenTagging objects whose metadata.tool field is equal to tool
Returns: List of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.
Raises:
- NoSuchTokenTagging – if there is no matching tagging
- Exception – if there is more than one matching tagging.
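The "unique choice" rule can be sketched as follows. This is an illustration, not the library's code: the objects are hypothetical stand-ins, NoMatchingTagging is a made-up substitute for NoSuchTokenTagging, the attribute names tokenTaggingList and taggedTokenList are my assumptions about the schema, and the taggingType comparison is simplified to an exact match.

```python
from types import SimpleNamespace as NS


class NoMatchingTagging(Exception):
    """Hypothetical stand-in for NoSuchTokenTagging."""


def unique_tagged_tokens(tokenization, tagging_type, tool=None):
    """Return the tagged tokens of the single matching tagging."""
    matches = [
        tagging for tagging in tokenization.tokenTaggingList or []
        if tagging.taggingType == tagging_type
        and (tool is None or tagging.metadata.tool == tool)
    ]
    if not matches:
        raise NoMatchingTagging("no tagging of type %r" % tagging_type)
    if len(matches) > 1:
        raise Exception("more than one tagging of type %r" % tagging_type)
    return matches[0].taggedTokenList


tokenization = NS(tokenTaggingList=[
    NS(taggingType="POS", metadata=NS(tool="tagger-1"),
       taggedTokenList=[NS(tokenIndex=0, tag="DT"), NS(tokenIndex=1, tag="NN")]),
    NS(taggingType="LEMMA", metadata=NS(tool="lemmatizer-1"),
       taggedTokenList=[NS(tokenIndex=0, tag="the"), NS(tokenIndex=1, tag="dog")]),
    NS(taggingType="LEMMA", metadata=NS(tool="lemmatizer-2"),
       taggedTokenList=[NS(tokenIndex=0, tag="the"), NS(tokenIndex=1, tag="dog")]),
])

pos_tags = [t.tag for t in unique_tagged_tokens(tokenization, "POS")]
```

With two LEMMA taggings present, asking for "LEMMA" without a tool is ambiguous and raises; passing tool="lemmatizer-2" narrows the choice to one tagging.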
-
concrete.util.tokenization.get_token_taggings(tokenization, tagging_type, case_sensitive=False)
Return list of TokenTagging objects of taggingType equal to tagging_type.
Parameters:
- tokenization (Tokenization) – tokenization from which taggings will be selected
- tagging_type (str) – value of taggingType to filter to
- case_sensitive (bool) – True to do case-sensitive matching on taggingType.
Returns: List of TokenTagging objects of taggingType equal to tagging_type, in same order as they appeared in the tokenization.
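A minimal sketch of the case_sensitive switch, using hypothetical stand-in objects. It assumes that case-insensitive matching means comparing lowercased strings and that taggings with no taggingType never match; neither detail is stated above.

```python
from types import SimpleNamespace as NS


def taggings_of_type(tokenization, tagging_type, case_sensitive=False):
    """Filter taggings by taggingType, preserving their original order."""
    def normalize(value):
        return value if case_sensitive else value.lower()

    wanted = normalize(tagging_type)
    return [
        tagging for tagging in tokenization.tokenTaggingList or []
        if tagging.taggingType is not None
        and normalize(tagging.taggingType) == wanted
    ]


tokenization = NS(tokenTaggingList=[
    NS(taggingType="POS", name="first"),
    NS(taggingType="NER", name="second"),
    NS(taggingType="pos", name="third"),
    NS(taggingType=None, name="fourth"),
])

loose = [t.name for t in taggings_of_type(tokenization, "POS")]
strict = [t.name for t in taggings_of_type(tokenization, "POS", case_sensitive=True)]
```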
-
concrete.util.tokenization.get_tokenizations(comm, tool=None)
Returns a flat list of all Tokenization objects in a Communication.
Parameters:
- comm (Communication) – communication to get tokenizations from
- tool (str) – if not None, return only tokenizations whose metadata.tool field matches tool
Returns: A list of all Tokenization objects within the Communication matching tool (if it is not None)
-
concrete.util.tokenization.get_tokens(tokenization, suppress_warnings=False)
Get list of Token objects for a Tokenization.
Return list of Tokens from lattice.cachedBestPath, if Tokenization kind is TOKEN_LATTICE; else, return list of Tokens from tokenList.
Warn and return list of Tokens from tokenList if kind is not set.
Return None if kind is set but the respective data fields are not.
Parameters:
- tokenization (Tokenization) – tokenization to extract tokens from
- suppress_warnings (bool) – True to suppress warning messages that tokenization.kind is None
Returns: List of Token objects, or None
Raises: ValueError – if tokenization.kind is not a recognized tokenization kind
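The branching described here can be sketched with stand-in objects. Everything below is an assumption made for illustration: the string constants replace the real tokenization kind enum, the nested attribute paths (tokenList.tokenList and lattice.cachedBestPath.tokenList) reflect my reading of the Concrete schema, and returning None when kind is unset and tokenList is missing is a guess.

```python
import warnings
from types import SimpleNamespace as NS

# Hypothetical stand-ins for the tokenization kind enum values.
TOKEN_LIST = "TOKEN_LIST"
TOKEN_LATTICE = "TOKEN_LATTICE"


def tokens_of(tokenization, suppress_warnings=False):
    """Pick the token source according to tokenization.kind."""
    kind = tokenization.kind
    if kind is None:
        if not suppress_warnings:
            warnings.warn("tokenization.kind is None; using tokenList")
        kind = TOKEN_LIST  # fall back to the plain token list

    if kind == TOKEN_LATTICE:
        lattice = tokenization.lattice
        path = lattice.cachedBestPath if lattice is not None else None
        return path.tokenList if path is not None else None
    if kind == TOKEN_LIST:
        token_list = tokenization.tokenList
        return token_list.tokenList if token_list is not None else None
    raise ValueError("unrecognized tokenization kind: %r" % (kind,))


list_tkz = NS(kind=TOKEN_LIST, lattice=None,
              tokenList=NS(tokenList=["a", "b"]))
lattice_tkz = NS(kind=TOKEN_LATTICE, tokenList=None,
                 lattice=NS(cachedBestPath=NS(tokenList=["c"])))
unset_tkz = NS(kind=None, lattice=None, tokenList=NS(tokenList=["d"]))
empty_tkz = NS(kind=TOKEN_LATTICE, tokenList=None, lattice=None)

from_list = tokens_of(list_tkz)
from_lattice = tokens_of(lattice_tkz)
from_unset = tokens_of(unset_tkz, suppress_warnings=True)
from_empty = tokens_of(empty_tkz)
```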
-
concrete.util.tokenization.plus(x, y)
Return concatenation of two lists.
Parameters:
- x (list) – first list
- y (list) – second list
Returns: list concatenation of x and y