Advanced Usage

In this section we demonstrate more advanced processing of Concrete Communications. We previously traversed Sections, Sentences, TokenLists, and TokenTaggings, which have a nested linear structure; we now demonstrate usage of DependencyParses, Entities, and SituationMentions, which are non-linear, higher-level annotations.

Print DependencyParses

The following code prints each token of a Communication alongside the index of its head in the dependency graph, in a CoNLL-style format similar to the output of concrete-inspect.py --dependency. Only the first dependency parse of each sentence is used. This example makes use of serif_dog-bites-man.concrete:

from concrete.util import read_communication_from_file, lun
comm = read_communication_from_file('serif_dog-bites-man.concrete')

for section in lun(comm.sectionList):
    for sentence in lun(section.sentenceList):
        if sentence.tokenization and sentence.tokenization.tokenList:
            # Columns of CoNLL-style output go here.
            taggings = []

            # Token text
            taggings.append([x.text for x in sentence.tokenization.tokenList.tokenList])

            if sentence.tokenization.dependencyParseList:
                # Read dependency arcs from the first parse. Token indices are
                # zero-based; a token with no incoming arc keeps head -1.
                head = [-1]*len(sentence.tokenization.tokenList.tokenList)
                for arc in sentence.tokenization.dependencyParseList[0].dependencyList:
                    head[arc.dep] = arc.gov

                # Add head index to taggings
                taggings.append(head)

            # Transpose the list. Format and print each row.
            for row in zip(*taggings):
                print('\t'.join('%15s' % x for x in row))

            print('')

There are many optional fields in Concrete, and here we have encountered several of them: Communication.sectionList, Section.sentenceList, Sentence.tokenization, Tokenization.tokenList, and Tokenization.dependencyParseList. An unset optional field has the value None. For some of these fields we used concrete.util.unnone.lun(), which returns its argument unchanged if it is not None and returns an empty list otherwise; the remaining fields we checked directly.

Expected output of the previous code:

     John                   1
    Smith                   9
        ,                  -1
  manager                   1
       of                   6
     ACMÉ                   6
      INC                   3
        ,                  -1
      was                   9
      bit                  -1
       by                  12
        a                  12
      dog                   9
       on                  14
    March                  12
     10th                  14
        ,                  -1
     2013                  12
        .                  -1

       He                   1
     died                  -1
        !                  -1

     John                   2
       's                   0
 daughter                   4
     Mary                   4
expressed                  -1
   sorrow                   4
        .                  -1
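
The head-column logic can be tried without a Concrete file. The following is a minimal sketch, not the library's code: Arc, the local lun, and head_column are hypothetical stand-ins that model only the gov and dep fields and the None-to-empty-list behavior described above. It reproduces the rows for the second sentence, "He died !".

```python
from collections import namedtuple

# Hypothetical stand-in for a Concrete dependency arc; only gov and dep are modeled.
Arc = namedtuple('Arc', ['gov', 'dep'])


def lun(value):
    """Stand-in for concrete.util.lun: return value, or [] if value is None."""
    return [] if value is None else value


def head_column(num_tokens, arcs):
    """Return one head index per token; tokens that receive no arc keep -1."""
    head = [-1] * num_tokens
    for arc in lun(arcs):
        head[arc.dep] = arc.gov
    return head


tokens = ['He', 'died', '!']
arcs = [Arc(gov=1, dep=0)]  # "He" (index 0) is governed by "died" (index 1)

heads = head_column(len(tokens), arcs)
for row in zip(tokens, heads):
    print('\t'.join('%15s' % x for x in row))

# An unset (None) arc list yields no arcs, so every head stays -1.
no_parse = head_column(len(tokens), None)
```

Wrapping the arc list in lun lets the same loop handle a missing parse without a separate None check.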

Print Entities

We now print Entities and their EntityMentions (which represent the result of coreference resolution). This example makes use of serif_dog-bites-man.concrete:

from concrete.util import read_communication_from_file, lun
comm = read_communication_from_file('serif_dog-bites-man.concrete')

for entitySet in lun(comm.entitySetList):
    for ei, entity in enumerate(entitySet.entityList):
        print('Entity %s (%s)' % (ei, entity.canonicalName))
        for i, mention in enumerate(entity.mentionList):
            print('  Mention %s: %s' % (i, mention.text))
        print('')
    print('')

Note that Entity.mentionList is not part of the Concrete schema. concrete.util.file_io.read_communication_from_file() adds this field after deserializing the original Communication. By default, concrete.util.references.add_references_to_communication() adds some extra fields to Concrete objects when they are deserialized; see that function’s documentation for details. For our purposes here, it is enough to know that add_references_to_communication gives each Entity a mentionList field containing the EntityMentions that reference that Entity.

Expected output of the previous code:

Entity 0 (None)
  Mention 0: John Smith
  Mention 1: John Smith, manager of ACMÉ INC,
  Mention 2: manager of ACMÉ INC
  Mention 3: He
  Mention 4: John

Entity 1 (None)
  Mention 0: ACMÉ INC

Entity 2 (None)
  Mention 0: John's daughter Mary
  Mention 1: daughter


Entity 0 (2013-03-10)
  Mention 0: March 10th, 2013
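
To illustrate how a mentionList back-reference of this kind could be built, here is a minimal sketch using stand-in objects. It assumes each entity stores the ids of its mentions (modeled here as a mentionIdList of plain strings rather than Concrete UUID objects); add_mention_lists is a hypothetical helper, and the library's actual implementation may differ.

```python
from types import SimpleNamespace

# Hypothetical stand-ins: plain string ids instead of Concrete UUID objects.
mentions = [
    SimpleNamespace(uuid='m0', text='John Smith'),
    SimpleNamespace(uuid='m1', text='He'),
    SimpleNamespace(uuid='m2', text='ACMÉ INC'),
]
entities = [
    SimpleNamespace(canonicalName=None, mentionIdList=['m0', 'm1']),
    SimpleNamespace(canonicalName=None, mentionIdList=['m2']),
]


def add_mention_lists(entities, mentions):
    """Attach a mentionList to each entity by resolving its mention ids."""
    mention_for_id = {m.uuid: m for m in mentions}
    for entity in entities:
        entity.mentionList = [mention_for_id[i] for i in entity.mentionIdList]


add_mention_lists(entities, mentions)

# Collect the mention text per entity, mirroring the loop shown earlier.
resolved = [[m.text for m in e.mentionList] for e in entities]
for ei, texts in enumerate(resolved):
    print('Entity %s: %s' % (ei, ', '.join(texts)))
```

Resolving the ids once, up front, is what lets the printing loop read entity.mentionList directly instead of looking up each mention by id.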

Print SituationMentions

We now print SituationMentions, the results of relation and event extraction. This example makes use of serif_example.concrete, on which BBN-SERIF’s relation and event extractor has been run:

from concrete.util import read_communication_from_file, lun
comm = read_communication_from_file('serif_example.concrete')

for i, situationMentionSet in enumerate(lun(comm.situationMentionSetList)):
    if situationMentionSet.metadata:
        print('Situation Set %d (%s):' % (i, situationMentionSet.metadata.tool))
    else:
        print('Situation Set %d:' % i)
    for j, situationMention in enumerate(situationMentionSet.mentionList):
        print('SituationMention %d-%d:' % (i, j))
        print('    text', situationMention.text)
        print('    situationType', situationMention.situationType)
        for k, arg in enumerate(lun(situationMention.argumentList)):
            print('    Argument %d:' % k)
            print('      role', arg.role)
            if arg.entityMention:
                print('      entityMention', arg.entityMention.text)
            if arg.situationMention:
                print('      situationMention:')
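                # Print the nested mention referenced by the argument,
                # not the enclosing situationMention.
                print('        text', arg.situationMention.text)
                print('        situationType', arg.situationMention.situationType)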
        print('')
    print('')

Expected output of the previous code:

Situation Set 0 (Serif: relations):
SituationMention 0-0:
    text None
    situationType ORG-AFF.Employment
    Argument 0:
      role Role.RELATION_SOURCE_ROLE
      entityMention manager of ACME INC
    Argument 1:
      role Role.RELATION_TARGET_ROLE
      entityMention ACME INC

SituationMention 0-1:
    text None
    situationType PER-SOC.Family
    Argument 0:
      role Role.RELATION_SOURCE_ROLE
      entityMention John
    Argument 1:
      role Role.RELATION_TARGET_ROLE
      entityMention daughter


Situation Set 1 (Serif: events):
SituationMention 1-0:
    text died
    situationType Life.Die
    Argument 0:
      role Victim
      entityMention He
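
An argument may reference an EntityMention, another SituationMention, or both fields may be unset. The sketch below shows that branching on stand-in objects only: describe_argument is a hypothetical helper, the objects are not Concrete types, and the 'Cause' role is invented for illustration. It is not output from serif_example.concrete.

```python
from types import SimpleNamespace


def describe_argument(arg):
    """Return printable lines for one argument (stand-in objects, not Concrete types)."""
    lines = ['role %s' % arg.role]
    if arg.entityMention:
        lines.append('entityMention %s' % arg.entityMention.text)
    if arg.situationMention:
        # Read the fields of the nested mention that the argument points at.
        nested = arg.situationMention
        lines.append('situationMention %s (%s)' % (nested.text, nested.situationType))
    return lines


died = SimpleNamespace(text='died', situationType='Life.Die')

entity_arg = SimpleNamespace(role='Victim',
                             entityMention=SimpleNamespace(text='He'),
                             situationMention=None)
nested_arg = SimpleNamespace(role='Cause',  # hypothetical role name
                             entityMention=None,
                             situationMention=died)

for arg in (entity_arg, nested_arg):
    print('\n'.join(describe_argument(arg)))
```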