concrete.util.concrete_uuid module

Helper functions for generating Concrete UUID objects

class concrete.util.concrete_uuid.AnalyticUUIDGeneratorFactory(comm=None)

Bases: object

Primary interface to generation of compressible UUIDs. Each compressible UUID takes the form

xxxxxxxx-xxxx-yyyy-yyyy-zzzzzzzzzzzz

where each instance of x, y, or z is a hexadecimal digit. The group of x digits is shared across all annotations in a Communication; the group of y digits is shared across all annotations generated by a given analytic (by convention, the AnnotationMetadata tool) in a given Communication; and the group of z digits is unique to each annotation generated by a given analytic. Thus all UUIDs in a Communication share the same first twelve hex digits, and UUIDs generated by the same analytic also share the same middle eight hex digits. Additionally, while the x and y components are generated uniformly at random, the z component for each analytic in a Communication starts at twelve hex digits drawn uniformly at random for the first annotation and increments by one for each annotation thereafter. Thus the UUIDs of a Communication likely have many substrings in common and are easily compressed. For example, we might find the following seven UUIDs in a Communication, corresponding to seven annotations split across two analytics:

1bccb123-be45-7288-028a-4fdf3181ab51
1bccb123-be45-7288-028a-4fdf3181ab52
1bccb123-be45-7288-028a-4fdf3181ab53
1bccb123-be45-df12-9c04-198eaa130a4e
1bccb123-be45-df12-9c04-198eaa130a4f
1bccb123-be45-df12-9c04-198eaa130a50
1bccb123-be45-df12-9c04-198eaa130a51
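The scheme described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names (random_hex, format_uuid, analytic_uuids) are hypothetical, and the wrap-around of z at twelve hex digits is an assumption of the sketch.

```python
import random

HEX = "0123456789abcdef"

def random_hex(n, rng):
    """Return n hexadecimal characters drawn uniformly at random."""
    return "".join(rng.choice(HEX) for _ in range(n))

def format_uuid(x, y, z):
    """Join 12 + 8 + 12 hex digits into xxxxxxxx-xxxx-yyyy-yyyy-zzzzzzzzzzzz."""
    return "-".join([x[:8], x[8:], y[:4], y[4:], z])

def analytic_uuids(x, rng):
    """Yield UUID strings for one analytic: fixed x, fresh y, incrementing z."""
    y = random_hex(8, rng)
    z = int(random_hex(12, rng), 16)
    while True:
        yield format_uuid(x, y, format(z, "012x"))
        # Wrap around so z always fits in twelve hex digits (assumption of this sketch).
        z = (z + 1) % 16 ** 12

rng = random.Random(0)
x = random_hex(12, rng)           # shared by the whole Communication
tool_a = analytic_uuids(x, rng)   # one generator per analytic
tool_b = analytic_uuids(x, rng)
uuids = [next(tool_a) for _ in range(3)] + [next(tool_b) for _ in range(4)]
for u in uuids:
    print(u)
```

All seven printed strings share their first thirteen characters, and consecutive strings from the same generator differ only in the last segment.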

One generator factory should be created per Communication, and a new generator should be created from that factory for each analytic processing the Communication. Often each program represents a single analytic, so common usage is:

augf = AnalyticUUIDGeneratorFactory(comm)
aug = augf.create()
for <each annotation object created by this analytic>:
    annotation.uuid = next(aug)
    <add annotation to communication>

or if you’re creating a new Communication:

augf = AnalyticUUIDGeneratorFactory()
aug = augf.create()
comm = <create communication>
comm.uuid = next(aug)
for <each annotation object created by this analytic>:
    annotation.uuid = next(aug)
    <add annotation to communication>

where the annotation objects might be objects of type Parse, DependencyParse, TokenTagging, CommunicationTagging, etc.

create()
Returns: A UUID generator for a new analytic.
class concrete.util.concrete_uuid.UUIDClustering(comm)

Bases: object

Representation of the UUID instance clusters in a Concrete Communication (each cluster represents the set of nested members of the Communication that reference or are identified by a given UUID).

hashable_clusters()

Hashable version of UUIDClustering.

Two UUIDClusterings c1 and c2 are equivalent (the two underlying Communications’ UUID structures are equivalent) if and only if:

c1.hashable_clusters() == c2.hashable_clusters()
Returns: The set of unlabeled UUID clusters in a unique and hashable format.
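To illustrate why comparing unlabeled clusters tests structural equivalence, here is a minimal sketch. It assumes a hypothetical representation in which each UUID maps to a list of location strings; the library's internal representation may differ, and the function name unlabeled_clusters is made up for this example.

```python
def unlabeled_clusters(uuid_locations):
    """Drop the UUID labels and return the clusters as a frozenset of frozensets."""
    return frozenset(frozenset(locs) for locs in uuid_locations.values())

# Hypothetical location paths: the same link structure under two UUID labelings.
original = {
    "aaaa": ["tokenization.uuid", "parse.tokenizationId"],
    "bbbb": ["sentence.uuid"],
}
relabeled = {
    "1111": ["parse.tokenizationId", "tokenization.uuid"],
    "2222": ["sentence.uuid"],
}
# Here the parse no longer points at the tokenization, so the structure differs.
broken = {
    "1111": ["tokenization.uuid"],
    "2222": ["sentence.uuid", "parse.tokenizationId"],
}

same = unlabeled_clusters(original) == unlabeled_clusters(relabeled)
different = unlabeled_clusters(original) != unlabeled_clusters(broken)
print(same, different)
```

Because the result is built from frozensets, it is hashable and independent of both the UUID values and the order in which members were visited.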
class concrete.util.concrete_uuid.UUIDCompressor(single_analytic=False)

Bases: object

Interface to replacing a Communication’s UUIDs with compressible UUIDs.

Parameters: single_analytic (bool) – True to generate new UUIDs using a single analytic for all annotations, False to use the annotation metadata tool name as the analytic id
compress(comm)

Return a copy of a Communication whose UUIDs have been replaced by compressible UUIDs using AnalyticUUIDGeneratorFactory. When this method returns, this object’s public member variable uuid_map will contain a dictionary mapping the original UUIDs to the new UUIDs.

Parameters: comm (Communication) – communication to be copied (the UUIDs of the copy will be made compressible)
Returns: Deep copy of comm with compressed UUIDs
Return type: Communication
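The role of uuid_map and of the single_analytic flag can be sketched with toy data. Everything here is a hypothetical stand-in: annotations are flat (tool, old UUID) pairs, the new identifiers are short strings in place of real compressible UUIDs, and build_uuid_map is not a library function.

```python
from itertools import count

def build_uuid_map(annotations, single_analytic=False):
    """Return {old_uuid: new_uuid} for a list of (tool, old_uuid) pairs.

    New identifiers are toy strings "<analytic index>-<counter>", standing in
    for the y and z components of a compressible UUID.
    """
    analytic_ids = {}   # analytic key -> small integer id
    counters = {}       # analytic key -> per-analytic counter
    uuid_map = {}
    for tool, old_uuid in annotations:
        # With single_analytic, every annotation is assigned to one shared analytic.
        key = "all" if single_analytic else tool
        if key not in analytic_ids:
            analytic_ids[key] = len(analytic_ids)
            counters[key] = count()
        uuid_map[old_uuid] = "%04x-%012x" % (analytic_ids[key], next(counters[key]))
    return uuid_map

annotations = [("tokenizer", "u1"), ("tokenizer", "u2"), ("parser", "u3")]
references = [("u3", "u1")]  # hypothetical: annotation u3 points at annotation u1

per_tool = build_uuid_map(annotations)
single = build_uuid_map(annotations, single_analytic=True)
# Rewriting references through the map keeps the link structure intact.
new_references = [(per_tool[src], per_tool[dst]) for src, dst in references]
print(per_tool)
print(new_references)
```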
concrete.util.concrete_uuid.bin_to_hex(b, n=None)

Return hexadecimal representation of binary value

Parameters:
  • b (int) – integer whose bit representation will be converted
  • n (int) – length of returned hexadecimal string (the string will be left-padded with 0s if it is originally shorter than n; a ValueError will be raised if it is longer; the string will be returned as-is if n is None)
Returns:

a string of hexadecimal characters representing the bit sequence in b, padded to be n characters long if n is not None

Raises:

ValueError – if n is not None and the hexadecimal string representing b is longer than n
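The padding and error behavior described above can be sketched as follows. This is a stand-in written for illustration (hence the _sketch suffix), assuming a non-negative integer input and lowercase output; it is not the library's code.

```python
def bin_to_hex_sketch(b, n=None):
    """Return b as a lowercase hex string, left-padded with zeros to n characters."""
    h = format(b, "x")
    if n is None:
        return h
    if len(h) > n:
        raise ValueError("hex string %r is longer than %d characters" % (h, n))
    return h.rjust(n, "0")

print(bin_to_hex_sketch(255))      # ff
print(bin_to_hex_sketch(255, 4))   # 00ff
```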

concrete.util.concrete_uuid.compress_uuids(comm, verify=False, single_analytic=False)

Create a copy of Communication comm with UUIDs converted according to the compressible UUID scheme

Parameters:
  • comm (Communication) – Communication to be copied and converted
  • verify (bool) – If True, use a heuristic to verify the UUID link structure is preserved in the new Communication
  • single_analytic (bool) – If True, use a single analytic prefix for all UUIDs in comm.
Returns:

A 2-tuple containing the new Communication (converted using the compressible UUID scheme) and the UUIDCompressor object used to perform the conversion.

Raises:

ValueError – if verify is True and comm has references added (verification would then cause an infinite loop)

concrete.util.concrete_uuid.generate_UUID()

Return a Concrete UUID object with a random UUID4 value.

Returns: a Concrete UUID object
concrete.util.concrete_uuid.generate_hex_unif(n)

Generate and return random string of n hexadecimal characters.

Parameters: n (int) – number of characters of string to return
Returns: string of n i.i.d. uniform hexadecimal characters
concrete.util.concrete_uuid.generate_uuid_unif()

Generate and return random UUID string whose characters are drawn uniformly from the hexadecimal alphabet.

Returns: string of hexadecimal characters drawn uniformly at random (delimited into five UUID-like segments by hyphens)
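A minimal sketch of the two random generators described above, using only the standard library. The function names and the optional rng parameter are assumptions added for reproducibility; they are not part of the documented signatures.

```python
import random

def hex_unif_sketch(n, rng=random):
    """Return n characters, each drawn independently and uniformly from 0-9a-f."""
    return "".join(rng.choice("0123456789abcdef") for _ in range(n))

def uuid_unif_sketch(rng=random):
    """Return 32 uniform hex characters split 8-4-4-4-12 by hyphens."""
    return "-".join(hex_unif_sketch(n, rng) for n in (8, 4, 4, 4, 12))

example = uuid_unif_sketch(random.Random(42))
print(example)
```

Since every character is drawn uniformly, a string produced this way need not carry the fixed version and variant bits of a standard UUID4.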
concrete.util.concrete_uuid.hex_to_bin(h)

Return binary encoding of hexadecimal string

Parameters: h (str) – string of hexadecimal characters
Returns: an integer whose bit representation corresponds to the hexadecimal representation in h
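Converting to an integer is what makes the z component of a compressible UUID easy to increment. The sketch below assumes the conversion is equivalent to Python's built-in base-16 parsing; hex_to_bin_sketch and next_z are hypothetical names.

```python
def hex_to_bin_sketch(h):
    """Return the integer whose hexadecimal representation is h."""
    return int(h, 16)

def next_z(z):
    """Increment a twelve-digit hex string by one, keeping the zero padding."""
    return format(hex_to_bin_sketch(z) + 1, "012x")

print(next_z("4fdf3181ab51"))  # 4fdf3181ab52
print(next_z("198eaa130a4f"))  # 198eaa130a50
```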
concrete.util.concrete_uuid.join_uuid(xs, ys, zs)

Given three hexadecimal strings of sizes 12, 8, and 12, join them into a UUID string (inserting hyphens appropriately) and return the result.

Parameters:
  • xs (str) – 12 hexadecimal characters that will form first two segments of the UUID string (size 8 and size 4 respectively)
  • ys (str) – 8 hexadecimal characters that will form the third and fourth segment of the UUID string (each of size 4)
  • zs (str) – 12 hexadecimal characters that will form the last segment of the UUID string (size 12)
Returns:

string of size 36 (12 + 8 + 12 = 32, plus four hyphens inserted appropriately) comprising UUID formed from xs, ys, and zs

Raises:

ValueError – if xs, ys, or zs have incorrect length
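The joining rule can be sketched as below, under the assumption that only the lengths of the inputs are checked. The function is a stand-in for illustration, not the library's code.

```python
def join_uuid_sketch(xs, ys, zs):
    """Join 12, 8 and 12 hex characters into a 36-character UUID string."""
    for name, part, size in (("xs", xs, 12), ("ys", ys, 8), ("zs", zs, 12)):
        if len(part) != size:
            raise ValueError("%s must have length %d, got %d" % (name, size, len(part)))
    # Hyphens fall after characters 8 and 12 of xs and after character 4 of ys.
    return "-".join([xs[:8], xs[8:], ys[:4], ys[4:], zs])

u = join_uuid_sketch("1bccb123be45", "7288028a", "4fdf3181ab51")
print(u)  # 1bccb123-be45-7288-028a-4fdf3181ab51
```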

concrete.util.concrete_uuid.split_uuid(u)

Split UUID string into three hexadecimal strings of sizes 12, 8, and 12, returning those three strings (with hyphens stripped) in a tuple.

Parameters: u (str) – UUID string
Returns: a tuple of three hexadecimal strings of sizes 12, 8, and 12, corresponding to the first two segments, middle two segments, and last segment of the input UUID string (with all hyphens stripped)
Raises: ValueError – if UUID string is malformed
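A sketch of the splitting rule, assuming that "malformed" means the string does not match the 8-4-4-4-12 hexadecimal layout. The regular expression and the function name are assumptions of this example; the library's validation may be stricter or looser.

```python
import re

# Five hyphen-separated groups of 8, 4, 4, 4 and 12 hexadecimal characters.
UUID_RE = re.compile(
    r"([0-9a-fA-F]{8})-([0-9a-fA-F]{4})-([0-9a-fA-F]{4})"
    r"-([0-9a-fA-F]{4})-([0-9a-fA-F]{12})"
)

def split_uuid_sketch(u):
    """Return the (x, y, z) components of a UUID string, hyphens stripped."""
    m = UUID_RE.fullmatch(u)
    if m is None:
        raise ValueError("malformed UUID string: %r" % (u,))
    a, b, c, d, e = m.groups()
    return (a + b, c + d, e)

parts = split_uuid_sketch("1bccb123-be45-df12-9c04-198eaa130a4e")
print(parts)  # ('1bccb123be45', 'df129c04', '198eaa130a4e')
```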