concrete.validate module
Library to validate a Concrete Communication.
Validation info, error and warning messages are logged using the Python standard library’s logging module.
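Because the messages go through the standard logging module, they only become visible once logging is configured. The following is a minimal sketch of capturing them. The name of the logger used by concrete.validate is not stated here, so the sketch attaches a handler to the root logger; the "demo" logger and its message are stand-ins, not real output from the library.

```python
import io
import logging

# Route log records (INFO and above) to an in-memory stream.
# In a script, logging.basicConfig(level=logging.INFO) would send
# them to stderr instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

# Stand-in for a message emitted during validation (hypothetical text).
logging.getLogger("demo").error("Communication is missing a required field")

captured = stream.getvalue()
```

Attaching the handler to the root logger means records from any module-level logger reach it through propagation, whatever name that logger has.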
-
concrete.validate.validate_communication(comm)
Test if all objects in a Communication are valid.
Calls validate_thrift_deep() to check for Concrete data structure fields that are required by the Concrete Thrift definitions. Then calls:
- validate_token_offsets_for_section()
- validate_token_offsets_for_sentence()
- validate_constituency_parses()
- validate_dependency_parses()
- validate_token_taggings()
- validate_entity_mention_ids()
- validate_entity_mention_tokenization_ids()
- validate_situations()
- validate_situation_mentions()
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_communication_file(communication_filename)
Test if the Communication in a file is valid.
Deserializes a Communication file into memory, then calls validate_communication() on the Communication object.
Parameters: communication_filename (str) – Name of file containing the Communication
Returns: bool
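Since the function returns a bool per file, checking a batch of files reduces to collecting those results. A minimal sketch, assuming the concrete package is installed when no validator is passed in; validate_files and invalid_files are hypothetical helper names, not part of concrete.validate.

```python
def validate_files(filenames, validator=None):
    """Map each filename to the bool returned by the validator."""
    if validator is None:
        # Imported lazily so the helper can be exercised with a stand-in
        # validator when the concrete package is not installed.
        from concrete.validate import validate_communication_file as validator
    return {name: validator(name) for name in filenames}


def invalid_files(results):
    """Return the filenames whose validation result was False, sorted."""
    return sorted(name for name, ok in results.items() if not ok)
```

The details of why a file failed are not in the return value; they are in the log messages.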
-
concrete.validate.validate_constituency_parses(comm, tokenization)
Test a Tokenization’s constituency Parse objects.
Verifies that, for each constituency Parse:
- none of the constituent IDs for the parse repeat
- the parse tree is a fully connected graph
- the parse “tree” is really a tree data structure
Parameters:
- comm (Communication)
- tokenization (Tokenization)
Returns: bool
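The three conditions above can be sketched on plain data. This is an illustration of the checks, not the library's implementation: each constituent is represented as a hypothetical (constituent_id, child_ids) pair rather than a Concrete object, and an empty parse is treated as invalid, which is a choice made for the sketch.

```python
def check_parse_tree(constituents):
    """constituents: list of (constituent_id, [child_ids]) pairs."""
    ids = [cid for cid, _ in constituents]
    if len(set(ids)) != len(ids):
        return False  # a constituent ID repeats

    children = dict(constituents)
    parent = {}
    for cid, kids in constituents:
        for kid in kids:
            if kid not in children or kid in parent:
                return False  # unknown child, or a node with two parents
            parent[kid] = cid

    roots = [cid for cid in ids if cid not in parent]
    if len(roots) != 1:
        return False  # no root at all, or several disconnected pieces

    # Every node must be reachable from the single root.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    return len(seen) == len(ids)
```

A graph with unique IDs, at most one parent per node, exactly one root, and every node reachable from that root is a tree, so the last two conditions are covered by the parent bookkeeping and the traversal together.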
-
concrete.validate.validate_dependency_parses(tokenization)
Test a Tokenization’s DependencyParse objects.
Verifies that, for each DependencyParse:
- the parse is a fully connected graph
- there are no nodes with a null governor node whose edgeType is not root
Parameters: tokenization (Tokenization)
Returns: bool
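The two conditions above can be sketched on plain tuples. This is an illustration only: each dependency is a hypothetical (gov, dep, edge_type) triple, connectivity is checked on the undirected graph, and the comparison against "root" is case-insensitive, which is an assumption of the sketch.

```python
def check_dependency_parse(dependencies):
    """dependencies: list of (gov, dep, edge_type); gov is None for a root."""
    neighbours = {}
    for gov, dep, edge_type in dependencies:
        neighbours.setdefault(dep, set())
        if gov is None:
            # Assumption: "root" is matched case-insensitively.
            if (edge_type or "").lower() != "root":
                return False  # null governor on a non-root edge
        else:
            neighbours.setdefault(gov, set()).add(dep)
            neighbours[dep].add(gov)

    if not neighbours:
        return True  # nothing to check

    # Traverse the undirected graph from an arbitrary node.
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for other in neighbours[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(neighbours)
```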
-
concrete.validate.validate_entity_mention_ids(comm)
Test if all Entity mentionIds are valid.
Checks if all Entity mentionId UUIDs refer to an EntityMention UUID that exists in the Communication.
Parameters: comm (Communication)
Returns: bool
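This kind of referential check amounts to a set-membership test. A minimal sketch on plain strings, not the library's code: check_mention_ids is a hypothetical helper, and UUIDs are represented as ordinary strings rather than Concrete UUID objects.

```python
def check_mention_ids(entity_mention_ids, known_mention_uuids):
    """Report mention IDs that do not match any known EntityMention UUID.

    entity_mention_ids: dict mapping an entity ID to its list of mention IDs.
    known_mention_uuids: iterable of UUID strings of existing EntityMentions.
    Returns (valid, missing), where missing maps entity ID -> dangling IDs.
    """
    known = set(known_mention_uuids)
    missing = {}
    for entity_id, mention_ids in entity_mention_ids.items():
        dangling = [m for m in mention_ids if m not in known]
        if dangling:
            missing[entity_id] = dangling
    return not missing, missing
```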
-
concrete.validate.validate_entity_mention_token_ref_sequences(comm)
Test if all EntityMention objects have a valid TokenRefSequence.
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_entity_mention_tokenization_ids(comm)
Test the tokenizationId field of every EntityMention.
Verifies that, for each EntityMention, the entityMention.tokens.tokenizationId UUID field matches the UUID of a Tokenization that exists in this Communication.
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_situation_mentions(comm)
Test every SituationMention in the Communication.
A SituationMention has a list of MentionArgument objects, and each MentionArgument can point to an EntityMention, SituationMention or TokenRefSequence.
Checks that each MentionArgument points to only one type of argument. Also checks the validity of all EntityMention and SituationMention UUIDs.
Parameters: comm (Communication)
Returns: bool
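The "only one type of argument" rule can be sketched with a stand-in class. Arg below is a hypothetical substitute for MentionArgument, and its three field names are assumptions about how the three kinds of target are stored. The sketch also treats an argument with no target at all as invalid, which the description above does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arg:
    """Stand-in for a MentionArgument (field names are assumptions)."""
    entityMentionId: Optional[str] = None
    situationMentionId: Optional[str] = None
    tokens: Optional[list] = None


TARGET_FIELDS = ("entityMentionId", "situationMentionId", "tokens")


def targets_set(arg):
    """Names of the target fields that are populated on this argument."""
    return [name for name in TARGET_FIELDS if getattr(arg, name) is not None]


def has_single_target(arg):
    """True if exactly one kind of target is set (sketch's assumption)."""
    return len(targets_set(arg)) == 1
```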
-
concrete.validate.validate_situations(comm)
Test every Situation in the Communication.
Checks the validity of all EntityMention and SituationMention UUIDs referenced by each Situation.
Parameters: comm (Communication)
Returns: bool
-
concrete.validate.validate_thrift(thrift_object, indent_level=0)
Test if a Thrift object has all required fields.
This function calls the Thrift object’s validate() function. If an exception is raised because of missing required fields, the function catches the exception and logs its error message using the Python standard library’s logging module.
Parameters:
- thrift_object
- indent_level (int) – Text indentation level for logging error message
Returns: bool
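The catch-and-log pattern described above can be sketched with stand-ins. FakeStruct and MissingFieldError are hypothetical and merely imitate an object whose validate() raises on a missing required field; validate_shallow is not the library's function, and its two-space indentation per level is an assumption.

```python
import logging


class MissingFieldError(Exception):
    """Stand-in for the exception raised by a generated validate()."""


class FakeStruct:
    """Stand-in for a Thrift object with one required field."""

    def __init__(self, text=None):
        self.text = text

    def validate(self):
        if self.text is None:
            raise MissingFieldError("Required field text is unset!")


def validate_shallow(obj, indent_level=0):
    """Call obj.validate(); log the error and return False if it raises."""
    try:
        obj.validate()
    except MissingFieldError as exc:
        # Assumption: indent the message by two spaces per level.
        logging.error("%s%s", "  " * indent_level, exc)
        return False
    return True
```

Returning a bool instead of re-raising lets a caller keep validating other objects and report every problem in one pass.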
-
concrete.validate.validate_thrift_deep(thrift_object, valid=True)
Deep validation of Thrift messages.
Parameters: thrift_object – a Thrift object
The Python version of Thrift 0.9.1 does not support deep (recursive) validation, and none of the Thrift serialization/deserialization code calls even the shallow validation functions provided by Thrift.
This function implements deep validation. The code is adapted from:
See this blog post for more information:
A patch to implement deep validation was submitted to the Thrift repository in February of 2013:
but Thrift 0.9.1 - which was released on 2013-08-21 - does not include this functionality.
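The general idea of deep validation can be sketched as a recursive walk that validates an object and then everything it contains. This is not the adapted code referred to above: validate_deep, Leaf and Node are hypothetical, the walk uses plain attribute inspection instead of Thrift metadata, and it does not guard against reference cycles.

```python
class Leaf:
    """Stand-in struct with one required field."""

    def __init__(self, value=None):
        self.value = value

    def validate(self):
        if self.value is None:
            raise ValueError("Required field value is unset!")


class Node:
    """Stand-in struct holding a list of nested structs."""

    def __init__(self, children):
        self.children = children

    def validate(self):
        if self.children is None:
            raise ValueError("Required field children is unset!")


def validate_deep(obj, valid=True):
    """Validate obj, then recurse into its fields and container items."""
    if hasattr(obj, "validate"):
        try:
            obj.validate()
        except ValueError:
            valid = False

    if isinstance(obj, (list, tuple, set)):
        children = list(obj)
    elif isinstance(obj, dict):
        children = list(obj.keys()) + list(obj.values())
    elif hasattr(obj, "__dict__"):
        children = list(vars(obj).values())
    else:
        children = []  # scalars and strings have nothing to descend into

    for child in children:
        valid = validate_deep(child, valid)
    return valid
```

Threading the valid flag through the recursion, as the valid=True parameter in the signature suggests, lets the walk continue past the first failure while still returning False overall.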
-
concrete.validate.validate_thrift_object_required_fields(thrift_object, indent_level=0)
DEPRECATED: Use validate_thrift() instead.
-
concrete.validate.validate_thrift_object_required_fields_recursively(thrift_object, valid=True)
DEPRECATED: Use validate_thrift_deep() instead.
-
concrete.validate.validate_token_offsets_for_section(section)
Test if the TextSpan boundaries for all Sentence objects in a Section fall within the boundaries of the Section’s TextSpan.
Parameters: section (Section)
Returns: bool
-
concrete.validate.validate_token_offsets_for_sentence(sentence)
Test if the TextSpan boundaries for all Token objects in a Sentence fall within the boundaries of the Sentence’s TextSpan.
Parameters: sentence (Sentence)
Returns: bool
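Both offset checks come down to span containment. A minimal sketch, not the library's code: Span is a hypothetical stand-in for TextSpan with assumed field names start and ending, and the sketch assumes an inner span may share a boundary with the outer one.

```python
from collections import namedtuple

# Stand-in for a TextSpan (field names are assumptions).
Span = namedtuple("Span", ["start", "ending"])


def spans_within(outer, inner_spans):
    """True if every inner span lies inside outer (shared edges allowed)."""
    return all(
        outer.start <= span.start and span.ending <= outer.ending
        for span in inner_spans
    )


def spans_outside(outer, inner_spans):
    """Return the inner spans that fall outside outer, for reporting."""
    return [
        span for span in inner_spans
        if span.start < outer.start or span.ending > outer.ending
    ]
```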
-
concrete.validate.validate_token_ref_sequence(comm, token_ref_sequence)
Check if a TokenRefSequence is valid.
Verifies that all token indices in the TokenRefSequence point to actual token indices in the corresponding Tokenization.
Parameters:
- comm (Communication)
- token_ref_sequence (TokenRefSequence)
Returns: bool
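The index check can be sketched on plain lists of integers. This is an illustration, not the library's code: bad_token_indices is a hypothetical helper, and looking up the Tokenization that a TokenRefSequence refers to is left out.

```python
def bad_token_indices(referenced_indices, tokenization_indices):
    """Return referenced indices that do not exist in the tokenization.

    referenced_indices: token indices listed in the reference sequence.
    tokenization_indices: token indices actually present in the tokenization.
    """
    present = set(tokenization_indices)
    return [index for index in referenced_indices if index not in present]


def is_valid_reference(referenced_indices, tokenization_indices):
    """True if every referenced index exists in the tokenization."""
    return not bad_token_indices(referenced_indices, tokenization_indices)
```

Comparing against the set of indices that are present, rather than against the token count, also handles tokenizations whose indices are not contiguous.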
-
concrete.validate.validate_token_taggings(tokenization)
Test if a Tokenization has any TokenTagging objects with invalid token indices.
Parameters: tokenization (Tokenization)
Returns: bool