concrete.util.tokenization module
-
concrete.util.tokenization.compute_lattice_expected_counts(lattice)
Given a TokenLattice in which the dst, src, token, and weight fields are set in each arc, compute and return a list of expected token log-probabilities. Input arc weights are treated as unnormalized log-probabilities.
Parameters: lattice (TokenLattice)
Returns: List of floats (expected log-probabilities) with the float at position i corresponding to the token with tokenIndex i.
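One way to read "expected token log-probabilities" is as a forward-backward computation in log space. The sketch below is not the library's implementation. It uses a hypothetical Arc tuple instead of the Concrete arc type, takes the start and end states as explicit arguments, and assumes states are numbered in topological order (src < dst on every arc) and that the end state is reachable from the start state.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a lattice arc; not the Concrete arc class.
Arc = namedtuple("Arc", ["src", "dst", "token_index", "weight"])

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflowing."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def expected_log_counts(arcs, start, end):
    """Log posterior probability that each token index is used on a path.

    Assumes src < dst for every arc, so that sorting arcs by state id
    visits states in topological order.
    """
    if not arcs:
        return []
    num_states = max(max(a.src, a.dst) for a in arcs) + 1
    num_tokens = max(a.token_index for a in arcs) + 1

    # Forward scores: log total weight of all paths from start to each state.
    alpha = [NEG_INF] * num_states
    alpha[start] = 0.0
    for a in sorted(arcs, key=lambda arc: arc.src):
        alpha[a.dst] = log_add(alpha[a.dst], alpha[a.src] + a.weight)

    # Backward scores: log total weight of all paths from each state to end.
    beta = [NEG_INF] * num_states
    beta[end] = 0.0
    for a in sorted(arcs, key=lambda arc: arc.dst, reverse=True):
        beta[a.src] = log_add(beta[a.src], a.weight + beta[a.dst])

    log_z = alpha[end]  # log of the normalizer over all complete paths

    # Sum, per token index, the posterior mass of every arc carrying it.
    totals = [NEG_INF] * num_tokens
    for a in arcs:
        score = alpha[a.src] + a.weight + beta[a.dst]
        totals[a.token_index] = log_add(totals[a.token_index], score)
    return [t - log_z for t in totals]


# Two paths: 0 -> 1 -> 2 (weight 1) and 0 -> 2 (weight 3).
example_arcs = [
    Arc(0, 1, 0, 0.0),
    Arc(1, 2, 1, 0.0),
    Arc(0, 2, 1, math.log(3.0)),
]
example_counts = expected_log_counts(example_arcs, start=0, end=2)
```

In this example token 0 lies only on the path with weight 1 out of a total of 4, so its entry is log(0.25), while token 1 lies on every path and its entry is 0.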
-
concrete.util.tokenization.flatten(a)
Parameters: a (list)
Returns: Flattened list
Return type: list
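The entry does not say how deep the flattening goes. A minimal sketch, assuming only one level of nesting is removed and that list concatenation is the combining step (both are assumptions, and the function names here are illustrative rather than the library's):

```python
from functools import reduce


def concat(x, y):
    """Concatenate two lists."""
    return x + y


def flatten_one_level(a):
    """Join the inner lists of a left to right, starting from an empty list."""
    return reduce(concat, a, [])


flat = flatten_one_level([[1, 2], [3], []])
```

Under this reading, deeper nesting is left alone: flatten_one_level([[1, [2]], [3]]) keeps the inner [2] as a single element.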
-
concrete.util.tokenization.get_comm_tokenizations(comm, tool=None)
Get list of Tokenization objects in a Communication.
Parameters:
- comm (Communication)
- tool (str) – If given, only return Tokenization objects whose metadata.tool field is equal to tool.
Returns: List of Tokenization objects
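The tool filter can be illustrated without the library. The sketch below uses SimpleNamespace stand-ins rather than real Concrete objects. The attribute names sectionList, sentenceList and tokenization are assumed to mirror the Concrete schema, and the traversal order is an assumption, not the documented behavior.

```python
from types import SimpleNamespace as NS


def comm_tokenizations(comm, tool=None):
    """Collect each sentence's tokenization, optionally filtered by tool."""
    found = []
    for section in comm.sectionList or []:
        for sentence in section.sentenceList or []:
            tokenization = sentence.tokenization
            if tokenization is None:
                continue
            # tool=None means "no filter"; otherwise require an exact match.
            if tool is None or tokenization.metadata.tool == tool:
                found.append(tokenization)
    return found


def make_tokenization(tool):
    """Build a stand-in tokenization that only carries metadata.tool."""
    return NS(metadata=NS(tool=tool))


demo_comm = NS(sectionList=[
    NS(sentenceList=[
        NS(tokenization=make_tokenization("toolA")),
        NS(tokenization=make_tokenization("toolB")),
    ]),
    NS(sentenceList=None),  # a section with no sentences is skipped
])

all_tokenizations = comm_tokenizations(demo_comm)
only_b = comm_tokenizations(demo_comm, tool="toolB")
```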
-
concrete.util.tokenization.get_comm_tokens(comm, sect_pred=None, suppress_warnings=False)
Get list of Token objects in Communication.
Parameters:
- comm (Communication)
- sect_pred (function) – Function that takes a Section and returns false if the Section should be excluded.
- suppress_warnings (bool)
Returns: List of Token objects in Communication, delegating to get_tokens() for each sentence.
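A section predicate of the kind sect_pred expects might look like the sketch below. It uses stand-in objects, not real Concrete types. The kind value "passage" and the per-sentence tokens attribute are hypothetical simplifications, since the real function delegates to get_tokens() for each sentence.

```python
from types import SimpleNamespace as NS


def is_passage(section):
    """Example predicate: keep only sections whose kind is "passage"."""
    return section.kind == "passage"


def tokens_in_sections(comm, sect_pred=None):
    """Gather tokens sentence by sentence, skipping excluded sections."""
    tokens = []
    for section in comm.sectionList or []:
        if sect_pred is not None and not sect_pred(section):
            continue  # the predicate returned false: exclude this section
        for sentence in section.sentenceList or []:
            tokens.extend(sentence.tokens)  # hypothetical stand-in field
    return tokens


demo = NS(sectionList=[
    NS(kind="title", sentenceList=[NS(tokens=["Report"])]),
    NS(kind="passage", sentenceList=[NS(tokens=["Hello", "world"])]),
])

passage_tokens = tokens_in_sections(demo, sect_pred=is_passage)
every_token = tokens_in_sections(demo)
```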
-
concrete.util.tokenization.get_lemmas(t, tool=None)
Calls get_tagged_tokens() with a tagging_type of “LEMMA”.
-
concrete.util.tokenization.get_ner(t, tool=None)
Calls get_tagged_tokens() with a tagging_type of “NER”.
-
concrete.util.tokenization.get_pos(t, tool=None)
Calls get_tagged_tokens() with a tagging_type of “POS”.
-
concrete.util.tokenization.get_tagged_tokens(tokenization, tagging_type, tool=None)
Return list of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.
Parameters:
- tokenization (Tokenization)
- tagging_type (str)
- tool (str) – If tool is not None, filter the candidate TokenTaggings to those whose metadata.tool field matches tool.
Returns: List of TaggedToken objects of taggingType equal to tagging_type, if there is a unique choice.
Raises: Exception – Raised if there is no matching tagging or more than one matching tagging.
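The "unique choice" rule can be sketched as follows. The code uses stand-in objects and is not the library's code. The attribute names tokenTaggingList and taggedTokenList are assumed to follow the Concrete schema, and the exact string comparison on taggingType is an assumption, since the entry does not say whether matching is case-sensitive.

```python
from types import SimpleNamespace as NS


def unique_tagged_tokens(tokenization, tagging_type, tool=None):
    """Return the tagged tokens of the single matching tagging, or raise."""
    candidates = [
        tagging
        for tagging in (tokenization.tokenTaggingList or [])
        if tagging.taggingType == tagging_type
        and (tool is None or tagging.metadata.tool == tool)
    ]
    if len(candidates) != 1:
        # Covers both "no match" and "more than one match".
        raise Exception(
            "expected exactly one %s tagging, found %d"
            % (tagging_type, len(candidates))
        )
    return candidates[0].taggedTokenList


def make_tagging(tagging_type, tool, tags):
    """Build a stand-in tagging with one tagged token per tag."""
    return NS(
        taggingType=tagging_type,
        metadata=NS(tool=tool),
        taggedTokenList=[NS(tokenIndex=i, tag=t) for i, t in enumerate(tags)],
    )


demo_tokenization = NS(tokenTaggingList=[
    make_tagging("POS", "a", ["NNP"]),
    make_tagging("POS", "b", ["NN"]),
    make_tagging("NER", "a", ["PER"]),
])

ner_tags = [t.tag for t in unique_tagged_tokens(demo_tokenization, "NER")]
pos_b_tags = [t.tag for t in unique_tagged_tokens(demo_tokenization, "POS", tool="b")]
```

Here asking for "POS" without a tool would raise, because two POS taggings match; passing tool="b" narrows the candidates to one.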
-
concrete.util.tokenization.get_tokens(tokenization, suppress_warnings=False)
Get list of Token objects for a Tokenization.
Return list of Tokens from lattice.cachedBestPath, if Tokenization kind is TOKEN_LATTICE; else, return list of Tokens from tokenList.
Warn and return list of Tokens from tokenList if kind is not set.
Return None if kind is set but the respective data fields are not.
Parameters:
- tokenization (Tokenization)
- suppress_warnings (bool)
Returns: List of Token objects, or None
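The branching described above can be sketched as below, using stand-in objects rather than the library's code. The string constants stand in for the real kind enum values, both containers are assumed to expose their tokens through a tokenList attribute, and returning None when kind is unset and tokenList is missing is a guess, since the entry does not cover that case.

```python
import warnings
from types import SimpleNamespace as NS

# Stand-ins for the real enum values of Tokenization.kind.
TOKEN_LIST = "TOKEN_LIST"
TOKEN_LATTICE = "TOKEN_LATTICE"


def tokens_of(tokenization, suppress_warnings=False):
    """Pick the token source according to kind, as the entry describes."""
    kind = tokenization.kind
    if kind == TOKEN_LATTICE:
        lattice = tokenization.lattice
        container = lattice.cachedBestPath if lattice is not None else None
    else:
        if kind is None and not suppress_warnings:
            warnings.warn("kind is not set; falling back to tokenList")
        container = tokenization.tokenList
    # Missing data fields yield None rather than an empty list.
    if container is None or container.tokenList is None:
        return None
    return container.tokenList


from_lattice = NS(
    kind=TOKEN_LATTICE,
    tokenList=None,
    lattice=NS(cachedBestPath=NS(tokenList=["a", "b"])),
)
from_list = NS(kind=TOKEN_LIST, tokenList=NS(tokenList=["c"]), lattice=None)
missing_data = NS(kind=TOKEN_LIST, tokenList=None, lattice=None)
kind_unset = NS(kind=None, tokenList=NS(tokenList=["d"]), lattice=None)
```

Callers should check for None before iterating, since a Tokenization whose kind is set but whose data is absent does not produce an empty list.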
-
concrete.util.tokenization.plus(x, y)
Returns: x + y