concrete.validate module

Library for validating a Concrete Communication.

Validation info, error and warning messages are logged using the Python standard library’s logging module.
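Because the messages go through the logging module, they are only visible once logging is configured. The following is a minimal sketch of capturing them, assuming the records propagate to the root logger (the logger name used by concrete.validate is not stated here). fake_validator is a hypothetical stand-in, not a function in this module.

```python
import io
import logging

# Send log records to an in-memory stream so they can be inspected.
# Calling logging.basicConfig(level=logging.INFO) would print to stderr instead.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)


def fake_validator():
    """Hypothetical stand-in that logs an error and reports failure."""
    logging.error("Entity refers to a missing EntityMention")
    return False


result = fake_validator()
captured = log_stream.getvalue()
```

The boolean return value only says whether validation passed; the log output is where the reason for a failure would be found.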

concrete.validate.validate_communication(comm)

Test if all objects in a Communication are valid.

Calls validate_thrift_deep() to check for Concrete data structure fields that are required by the Concrete Thrift definitions. Then calls:

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_communication_file(communication_filename)

Test if the Communication in a file is valid.

Deserializes a Communication file into memory, then calls validate_communication() on the Communication object.

Parameters:communication_filename (str) – Name of file containing the Communication
Returns:bool
concrete.validate.validate_constituency_parses(comm, tokenization)

Test a Tokenization’s constituency Parse objects.

Verifies that, for each constituent Parse:

  • none of the constituent IDs for the parse repeat
  • the parse tree is a fully connected graph
  • the parse “tree” is really a tree data structure
Parameters:
  • comm (Communication)
  • tokenization (Tokenization)
Returns:

bool
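The three constituency checks listed above can be sketched on plain data. This is an illustration under stated assumptions, not the module's implementation: a parse is represented as a list of (constituent_id, child_ids) pairs, and the root constituent is assumed to have ID 0.

```python
def check_constituency_parse(constituents, root_id=0):
    """Return True if the (id, child_ids) pairs form a single tree.

    The pair representation and the root_id default are assumptions
    made for this sketch.
    """
    ids = [cid for cid, _ in constituents]
    # Check 1: no constituent ID may repeat.
    if len(ids) != len(set(ids)):
        return False
    children = dict(constituents)
    # Checks 2 and 3: walk down from the root. In a tree, every node is
    # reached exactly once and every child reference resolves.
    seen = set()
    stack = [root_id]
    while stack:
        node = stack.pop()
        if node in seen or node not in children:
            return False  # cycle, shared child, or dangling reference
        seen.add(node)
        stack.extend(children[node])
    # Any constituent not reached from the root is disconnected.
    return len(seen) == len(ids)


good = [(0, [1, 2]), (1, []), (2, [])]
repeated_id = [(0, [1]), (1, []), (1, [])]
disconnected = [(0, [1]), (1, []), (2, [])]
shared_child = [(0, [1, 2]), (1, [2]), (2, [])]
```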

concrete.validate.validate_dependency_parses(tokenization)

Test a Tokenization’s DependencyParse objects.

Verifies that, for each DependencyParse:

  • the parse is a fully connected graph
  • there are no nodes with a null governor node whose edgeType is not root
Parameters:tokenization (Tokenization) –
Returns:bool
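As a rough sketch of the two dependency checks, assume each dependency is a (gov, dep, edge_type) tuple, with gov set to None for a node that has no governor. The tuple layout and the case-insensitive comparison against "root" are assumptions of this example.

```python
def check_dependency_parse(dependencies):
    """Return True if the dependencies pass both checks (sketch only)."""
    neighbors = {}
    for gov, dep, edge_type in dependencies:
        if gov is None:
            # A missing governor is only acceptable on a root edge.
            if edge_type is None or edge_type.lower() != "root":
                return False
            neighbors.setdefault(dep, set())
            continue
        # Treat edges as undirected for the connectivity test.
        neighbors.setdefault(gov, set()).add(dep)
        neighbors.setdefault(dep, set()).add(gov)
    if not neighbors:
        return True
    # Connectivity: every node must be reachable from one start node.
    start = next(iter(neighbors))
    seen = {start}
    stack = [start]
    while stack:
        for other in neighbors[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(neighbors)


connected = [(None, 1, "root"), (1, 0, "nsubj"), (1, 2, "dobj")]
bad_root_label = [(None, 1, "dep"), (1, 0, "nsubj")]
two_components = [(None, 1, "root"), (1, 0, "nsubj"), (3, 2, "det")]
```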
concrete.validate.validate_entity_mention_ids(comm)

Test if all Entity mentionIds are valid.

Checks that every Entity mentionId UUID refers to an EntityMention UUID that exists in the Communication.

Parameters:comm (Communication) –
Returns:bool
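Conceptually this is a set-membership test. A minimal sketch, assuming UUIDs are available as plain strings (the helper name and arguments are hypothetical):

```python
def find_dangling_mention_ids(entity_mention_ids, known_mention_uuids):
    """Return the referenced mention IDs that match no known EntityMention."""
    known = set(known_mention_uuids)
    return [mid for mid in entity_mention_ids if mid not in known]


known_uuids = ["uuid-a", "uuid-b", "uuid-c"]
all_resolved = find_dangling_mention_ids(["uuid-a", "uuid-c"], known_uuids)
one_missing = find_dangling_mention_ids(["uuid-a", "uuid-z"], known_uuids)
```

An empty result corresponds to a passing check; returning the offending IDs rather than a bare bool makes it easier to log which references are broken.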
concrete.validate.validate_entity_mention_token_ref_sequences(comm)

Test if all EntityMention objects have a valid TokenRefSequence.

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_entity_mention_tokenization_ids(comm)

Test the tokenizationId field of every EntityMention.

Verifies that, for each EntityMention, the entityMention.tokens.tokenizationId UUID field matches the UUID of a Tokenization that exists in this Communication.

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_situation_mentions(comm)

Test every SituationMention in the Communication.

A SituationMention has a list of MentionArgument objects, and each MentionArgument can point to an EntityMention, SituationMention or TokenRefSequence.

Checks that each MentionArgument points to only one type of argument. Also checks the validity of all EntityMention and SituationMention UUIDs.

Parameters:comm (Communication) –
Returns:bool
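The "only one type of argument" rule can be illustrated with a stand-in class. Argument below is hypothetical and its field names are chosen for the example. This sketch also treats an argument that points to nothing as invalid, which is an assumption: the description above only rules out pointing to more than one type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Argument:
    """Hypothetical stand-in for a MentionArgument."""
    entity_mention_id: Optional[str] = None
    situation_mention_id: Optional[str] = None
    tokens: Optional[list] = None


def points_to_one_type(arg):
    """True if exactly one of the three possible targets is set."""
    targets = [arg.entity_mention_id, arg.situation_mention_id, arg.tokens]
    return sum(target is not None for target in targets) == 1


entity_only = Argument(entity_mention_id="uuid-a")
two_targets = Argument(entity_mention_id="uuid-a", tokens=[0, 1])
no_target = Argument()
```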
concrete.validate.validate_situations(comm)

Test every Situation in the Communication.

Checks the validity of all EntityMention and SituationMention UUIDs referenced by each Situation.

Parameters:comm (Communication) –
Returns:bool
concrete.validate.validate_thrift(thrift_object, indent_level=0)

Test if a Thrift object has all required fields.

This function calls the Thrift object’s validate() function. If an exception is raised because of missing required fields, the function catches the exception and logs the exception’s error message using the Python Standard Library’s logging module.

Parameters:
  • thrift_object
  • indent_level (int) – Text indentation level for logging error message
Returns:

bool
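The catch-and-log pattern described above might look like the following sketch. To keep it runnable without Thrift installed, MissingFieldError and FakeThriftObject are stand-ins invented for this example; the real exception type raised by a Thrift object's validate() is not named in this document.

```python
import logging


class MissingFieldError(Exception):
    """Stand-in for the exception raised on a missing required field."""


class FakeThriftObject:
    """Hypothetical object with one required field, 'text'."""

    def __init__(self, text=None):
        self.text = text

    def validate(self):
        if self.text is None:
            raise MissingFieldError("Required field text is unset!")


def validate_shallow(obj, indent_level=0):
    """Call obj.validate(); log the error and return False on failure."""
    try:
        obj.validate()
    except MissingFieldError as exc:
        # Indent the message so nested objects are readable in the log.
        logging.error("%s%s", "  " * indent_level, exc)
        return False
    return True


complete = validate_shallow(FakeThriftObject("hello"))
incomplete = validate_shallow(FakeThriftObject(), indent_level=1)
```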

concrete.validate.validate_thrift_deep(thrift_object, valid=True)

Deep validation of thrift messages.

Parameters:thrift_object – a Thrift object

The Python version of Thrift 0.9.1 does not support deep (recursive) validation, and none of the Thrift serialization/deserialization code calls even the shallow validation functions provided by Thrift.

This function implements deep validation. The code is adapted from:

See this blog post for more information:

A patch to implement deep validation was submitted to the Thrift repository in February of 2013:

but Thrift 0.9.1 - which was released on 2013-08-21 - does not include this functionality.
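To make the difference between shallow and deep validation concrete, here is a small recursive sketch. It is not the adapted code referred to above. Node is a hypothetical stand-in for a Thrift struct, and ValueError stands in for whatever exception validate() raises.

```python
import logging


class Node:
    """Hypothetical struct with a required 'name' and nested children."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []

    def validate(self):
        # Shallow: checks this object's own fields only.
        if self.name is None:
            raise ValueError("Required field name is unset!")


def validate_deep(obj):
    """Validate obj, then recurse into its fields and containers."""
    valid = True
    if hasattr(obj, "validate"):
        try:
            obj.validate()
        except ValueError as exc:
            logging.error("%s", exc)
            valid = False
    if isinstance(obj, (list, tuple, set)):
        children = list(obj)
    elif isinstance(obj, dict):
        children = list(obj.keys()) + list(obj.values())
    elif hasattr(obj, "__dict__"):
        children = list(vars(obj).values())
    else:
        children = []  # strings, numbers, None
    for child in children:
        # Keep going after a failure so every problem gets logged.
        if not validate_deep(child):
            valid = False
    return valid


healthy = Node("a", [Node("b")])
broken_grandchild = Node("a", [Node("b", [Node()])])
```

Here broken_grandchild.validate() raises nothing, because the shallow check never looks below the top-level object, while validate_deep(broken_grandchild) reports the failure.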

concrete.validate.validate_thrift_object_required_fields(thrift_object, indent_level=0)

DEPRECATED. Use validate_thrift() instead.

concrete.validate.validate_thrift_object_required_fields_recursively(thrift_object, valid=True)

DEPRECATED. Use validate_thrift_deep() instead.

concrete.validate.validate_token_offsets_for_section(section)

Test if the TextSpan boundaries for all Sentence objects in a Section fall within the boundaries of the Section’s TextSpan.

Parameters:section (Section) –
Returns:bool
concrete.validate.validate_token_offsets_for_sentence(sentence)

Test if the TextSpan boundaries for all Token objects in a Sentence fall within the boundaries of the Sentence’s TextSpan.

Parameters:sentence (Sentence) –
Returns:bool
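Both offset checks (sentences within a section, tokens within a sentence) come down to span containment. A minimal sketch, assuming each TextSpan is represented as a (start, ending) pair of character offsets; the helper name is hypothetical:

```python
def spans_within(outer, inner_spans):
    """True if every inner (start, ending) span lies inside the outer span."""
    outer_start, outer_ending = outer
    return all(
        outer_start <= start <= ending <= outer_ending
        for start, ending in inner_spans
    )


sentence_span = (10, 30)
tokens_inside = [(10, 14), (15, 22), (23, 30)]
token_overruns = [(10, 14), (28, 35)]
```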
concrete.validate.validate_token_ref_sequence(comm, token_ref_sequence)

Check if a TokenRefSequence is valid.

Verifies that all token indices in the TokenRefSequence point to actual token indices in the corresponding Tokenization.

Parameters:
  • comm (Communication)
  • token_ref_sequence (TokenRefSequence)
Returns:

bool
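The index check can be pictured as follows. This sketch assumes the referenced indices and the Tokenization's own token indices are available as plain lists of integers; looking up the Tokenization by UUID in the Communication is left out.

```python
def token_indices_valid(referenced_indices, tokenization_indices):
    """True if every referenced index names a token in the Tokenization."""
    available = set(tokenization_indices)
    return all(index in available for index in referenced_indices)


tokenization_indices = [0, 1, 2, 3]
in_range = token_indices_valid([1, 2], tokenization_indices)
out_of_range = token_indices_valid([2, 7], tokenization_indices)
```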

concrete.validate.validate_token_taggings(tokenization)

Test if a Tokenization has any TokenTagging objects with invalid token indices.

Parameters:tokenization (Tokenization) –
Returns:bool