concrete.validate module
Library to validate a Concrete Communication.
Validation info, error and warning messages are logged using the Python standard library’s logging module.
-
concrete.validate.validate_communication(comm)
Test if all objects in a Communication are valid.
Calls validate_thrift_deep() to check for Concrete data structure fields that are required by the Concrete Thrift definitions. Then calls:
- validate_token_offsets_for_section()
- validate_token_offsets_for_sentence()
- validate_constituency_parses()
- validate_dependency_parses()
- validate_token_taggings()
- validate_entity_mention_ids()
- validate_entity_mention_tokenization_ids()
- validate_situations()
- validate_situation_mentions()
Parameters: comm (Communication)
Returns: bool
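The "run several checks, return one bool" pattern described above can be sketched in plain Python. This is only an illustration: run_checks, has_id and has_text are hypothetical names, the dict stands in for a Communication, and the choice to keep running after a failure (so that every problem gets logged) is an assumption rather than documented behavior of concrete.validate.

```python
import logging

logger = logging.getLogger("validate_sketch")


def run_checks(obj, checks):
    """Run every check on obj; return True only if all of them pass.

    Each check is a callable returning bool. All checks run even after a
    failure so that every problem is logged (an assumption of this sketch).
    """
    valid = True
    for check in checks:
        ok = check(obj)
        if not ok:
            logger.error("check %s failed", check.__name__)
        valid = valid and ok
    return valid


# Hypothetical stand-in checks operating on a plain dict.
def has_id(comm):
    return "id" in comm


def has_text(comm):
    return bool(comm.get("text"))
```

Calling run_checks({"id": "c1"}, [has_id, has_text]) logs one error for has_text and returns False.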
-
concrete.validate.validate_communication_file(communication_filename)
Test if the Communication in a file is valid.
Deserializes a Communication file into memory, then calls validate_communication() on the Communication object.
Parameters: communication_filename (str) – Name of file containing a Communication
Returns: bool
-
concrete.validate.validate_constituency_parses(comm, tokenization)
Test a Tokenization’s constituency Parse objects.
Verifies that, for each constituency Parse:
- none of the constituent IDs for the parse repeat
- the parse tree is a fully connected graph
- the parse “tree” is really a tree data structure
Parameters:
- comm (Communication)
- tokenization (Tokenization)
Returns: bool
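The three conditions listed for a constituency parse (unique IDs, connectivity, tree shape) can be checked together with a short graph walk. The sketch below uses a hypothetical Constituent stand-in with an id and a list of child ids; it is not the library's implementation, and treating an empty parse as valid is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a parse constituent: an id plus its child ids.
Constituent = namedtuple("Constituent", ["id", "child_ids"])


def is_valid_parse_tree(constituents):
    """Return True if ids are unique and the nodes form one rooted tree."""
    ids = [c.id for c in constituents]
    if len(ids) != len(set(ids)):
        return False  # a constituent id repeats
    if not ids:
        return True  # assumption: an empty parse is trivially valid

    children = {c.id: list(c.child_ids) for c in constituents}
    child_targets = [ch for kids in children.values() for ch in kids]
    if any(ch not in children for ch in child_targets):
        return False  # an edge points at an unknown constituent
    # A tree on n nodes has n - 1 edges and no node with two parents.
    if len(child_targets) != len(ids) - 1:
        return False
    if len(child_targets) != len(set(child_targets)):
        return False
    has_parent = set(child_targets)
    roots = [i for i in ids if i not in has_parent]
    if len(roots) != 1:
        return False
    # Every node must be reachable from the single root (connectivity).
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return len(seen) == len(ids)
```

The edge-count test alone is not enough: a lone root plus a two-node cycle also has n - 1 edges, which is why the reachability walk is kept.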
-
concrete.validate.validate_dependency_parses(tokenization)
Test a Tokenization’s DependencyParse objects.
Verifies that, for each DependencyParse:
- the parse is a fully connected graph
- there are no nodes with a null governor node whose edgeType is not root
Parameters: tokenization (Tokenization)
Returns: bool
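Both dependency-parse conditions can be illustrated with a small stand-in. In this sketch, Dependency is a hypothetical namedtuple with gov, dep and edge_type fields, the literal string "root" is assumed to mark root edges, and connectivity is tested on the undirected graph; none of this is taken from the library's code.

```python
from collections import namedtuple

# Hypothetical stand-in for one dependency edge; gov is None for the root.
Dependency = namedtuple("Dependency", ["gov", "dep", "edge_type"])


def null_governors_are_roots(dependencies):
    """True if every edge with a null governor has edge type 'root'."""
    return all(d.gov is not None or d.edge_type == "root"
               for d in dependencies)


def is_connected(dependencies):
    """True if all token indices are linked, ignoring edge direction."""
    adjacency = {}
    for d in dependencies:
        adjacency.setdefault(d.dep, set())
        if d.gov is not None:
            adjacency.setdefault(d.gov, set()).add(d.dep)
            adjacency[d.dep].add(d.gov)
    if not adjacency:
        return True  # assumption: an empty parse counts as connected
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return len(seen) == len(adjacency)
```

For a three-token parse with edges (None, 1, "root"), (1, 0, "nsubj") and (1, 2, "dobj"), both functions return True.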
-
concrete.validate.validate_entity_mention_ids(comm)
Test if all Entity mentionIds are valid.
Checks if all Entity mentionId UUIDs refer to an EntityMention UUID that exists in the Communication.
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_entity_mention_token_ref_sequences(comm)
Test if all EntityMention objects have a valid TokenRefSequence.
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_entity_mention_tokenization_ids(comm)
Test the tokenizationId field of every EntityMention.
Verifies that, for each EntityMention, the entityMention.tokens.tokenizationId UUID field matches the UUID of a Tokenization that exists in this Communication.
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_situation_mentions(comm)
Test every SituationMention in the Communication.
A SituationMention has a list of MentionArgument objects, and each MentionArgument can point to an EntityMention, SituationMention or TokenRefSequence.
Checks that each MentionArgument points to only one type of argument. Also checks validity of all EntityMention and SituationMention UUIDs.
Parameters: comm (Communication)
Returns: bool
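The "only one type of argument" rule amounts to counting how many of the three possible targets are set. A minimal sketch follows, using a hypothetical dataclass whose field names are illustrative only; requiring exactly one target (rather than at most one) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MentionArgument:
    """Hypothetical stand-in; field names are illustrative only."""
    entity_mention_id: Optional[str] = None
    situation_mention_id: Optional[str] = None
    tokens: Optional[List[int]] = None


def count_targets(arg):
    """Count how many of the three possible targets are set."""
    targets = (arg.entity_mention_id, arg.situation_mention_id, arg.tokens)
    return sum(t is not None for t in targets)


def points_to_one_type(arg):
    # Assumption of this sketch: exactly one target must be set.
    return count_targets(arg) == 1
```

An argument that sets both an entity mention id and a situation mention id is ambiguous and fails the check.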
-
concrete.validate.validate_situations(comm)
Test every Situation in the Communication.
Checks the validity of all EntityMention and SituationMention UUIDs referenced by each Situation.
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_thrift(thrift_object, indent_level=0)
Test if a Thrift object has all required fields.
This function calls the Thrift object’s validate() function. If an exception is raised because of missing required fields, the function catches the exception and logs the exception’s error message using the Python standard library’s logging module.
Parameters:
- thrift_object
- indent_level (int) – Text indentation level for logging error message
Returns: bool
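The catch-and-log behavior described for validate_thrift() can be sketched without Thrift installed. Everything named here is a stand-in: FakeThriftObject and MissingFieldError are hypothetical, and real Thrift-generated classes raise their own exception type from validate().

```python
import logging

logger = logging.getLogger("validate_sketch")


class MissingFieldError(Exception):
    """Hypothetical stand-in for the exception a Thrift validate() raises."""


class FakeThriftObject:
    """Hypothetical object with one required field."""

    def __init__(self, uuid_string=None):
        self.uuid_string = uuid_string

    def validate(self):
        if self.uuid_string is None:
            raise MissingFieldError("Required field uuid_string is unset!")


def validate_shallow(obj, indent_level=0):
    """Call obj.validate(); log the error and return False if it raises."""
    try:
        obj.validate()
    except MissingFieldError as err:
        # Indent the message so nested objects are easier to read in logs.
        logger.error("%s%s", "  " * indent_level, err)
        return False
    return True
```

The caller gets a plain bool and never sees the exception; the detail of what was missing ends up only in the log.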
-
concrete.validate.validate_thrift_deep(thrift_object, valid=True)
Deep validation of Thrift messages.
Parameters: thrift_object – a Thrift object
The Python version of Thrift 0.9.1 does not support deep (recursive) validation, and none of the Thrift serialization/deserialization code calls even the shallow validation functions provided by Thrift.
This function implements deep validation. The code is adapted from:
See this blog post for more information:
A patch to implement deep validation was submitted to the Thrift repository in February of 2013:
but Thrift 0.9.1 - which was released on 2013-08-21 - does not include this functionality.
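To make the difference between shallow and deep validation concrete, here is a rough sketch of a recursive walk. It is not the adapted code referred to above: Leaf and Branch are hypothetical classes, ValueError stands in for Thrift's exception, children are discovered through instance attributes rather than Thrift metadata, and the object graph is assumed to be acyclic.

```python
import logging

logger = logging.getLogger("validate_sketch")


class Leaf:
    """Hypothetical object with one required field."""

    def __init__(self, name=None):
        self.name = name

    def validate(self):
        if self.name is None:
            raise ValueError("Required field name is unset!")


class Branch:
    """Hypothetical container whose own validate() ignores its children."""

    def __init__(self, label, leaves):
        self.label = label
        self.leaves = leaves

    def validate(self):
        if self.label is None:
            raise ValueError("Required field label is unset!")


def validate_deep(obj, valid=True):
    """Validate obj, then recurse into containers and attributes."""
    if callable(getattr(obj, "validate", None)):
        try:
            obj.validate()
        except ValueError as err:
            logger.error("%s", err)
            valid = False
    if isinstance(obj, (list, tuple, set)):
        children = list(obj)
    elif isinstance(obj, dict):
        children = list(obj.keys()) + list(obj.values())
    elif hasattr(obj, "__dict__"):
        children = list(vars(obj).values())
    else:
        children = []  # strings, numbers, None: nothing to descend into
    for child in children:
        valid = validate_deep(child, valid)
    return valid
```

A Branch holding an invalid Leaf passes its own validate() call, which is exactly the case a shallow check misses.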
-
concrete.validate.validate_thrift_object_required_fields(thrift_object, indent_level=0)
DEPRECATED: Use validate_thrift() instead.
-
concrete.validate.validate_thrift_object_required_fields_recursively(thrift_object, valid=True)
DEPRECATED: Use validate_thrift_deep() instead.
-
concrete.validate.validate_token_offsets_for_section(section)
Test if the TextSpan boundaries for all Sentence objects in a Section fall within the boundaries of the Section’s TextSpan.
Parameters: section (Section)
Returns: bool
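The containment test is a pair of comparisons per span. A minimal sketch follows, assuming a stand-in TextSpan namedtuple with start and ending offsets; spans_within is a hypothetical helper, not part of the module.

```python
from collections import namedtuple

# Stand-in for a text span: character offsets into the Communication text.
TextSpan = namedtuple("TextSpan", ["start", "ending"])


def spans_within(outer, inner_spans):
    """True if every inner span lies inside the outer span's boundaries."""
    return all(outer.start <= span.start and span.ending <= outer.ending
               for span in inner_spans)
```

The same helper applies one level down, with a Sentence span as the outer span and its Token spans as the inner ones.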
-
concrete.validate.validate_token_offsets_for_sentence(sentence)
Test if the TextSpan boundaries for all Token objects in a Sentence fall within the boundaries of the Sentence’s TextSpan.
Parameters: sentence (Sentence)
Returns: bool
-
concrete.validate.validate_token_ref_sequence(comm, token_ref_sequence)
Check if a TokenRefSequence is valid.
Verify that all token indices in the TokenRefSequence point to actual token indices in the corresponding Tokenization.
Parameters:
- comm (Communication)
- token_ref_sequence (TokenRefSequence)
Returns: bool
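Checking token indices reduces to a set-membership test. In this sketch the two arguments are plain lists of integers standing in for the index list of a TokenRefSequence and the token indices of its Tokenization; invalid_token_indices is a hypothetical helper.

```python
def invalid_token_indices(referenced_indices, tokenization_indices):
    """Return the referenced indices that do not exist in the tokenization."""
    known = set(tokenization_indices)
    return [i for i in referenced_indices if i not in known]


def token_refs_are_valid(referenced_indices, tokenization_indices):
    """True if every referenced index points at an actual token."""
    return not invalid_token_indices(referenced_indices, tokenization_indices)
```

Returning the offending indices, rather than only a bool, makes it easy to log which references are broken.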
-
concrete.validate.validate_token_taggings(tokenization)
Test if a Tokenization has any TokenTagging objects with invalid token indices.
Parameters: tokenization (Tokenization)
Returns: bool