concrete.util.comm_container module
Communication Containers - mapping Communication IDs to Communications
Classes that behave like a read-only dictionary (implementing Python’s collections.Mapping interface) and map Communication ID strings to Communications.
The classes abstract away the storage backend. If you need to optimize for performance, you may not want to use a dictionary abstraction that retrieves one Communication at a time.
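The dictionary abstraction described above can be illustrated with a minimal sketch. It assumes Python 3, where the abstract base class is available as collections.abc.Mapping. The class name LazyLookup and the loader callable are hypothetical and are not part of concrete.util.comm_container.

```python
from collections.abc import Mapping


class LazyLookup(Mapping):
    """Hypothetical read-only mapping that loads each value on demand."""

    def __init__(self, ids, loader):
        self._ids = tuple(ids)               # keys are known up front
        self._id_set = frozenset(self._ids)
        self._loader = loader                # called once per lookup

    def __getitem__(self, key):
        if key not in self._id_set:
            raise KeyError(key)
        return self._loader(key)

    def __iter__(self):
        return iter(self._ids)

    def __len__(self):
        return len(self._ids)

    def __contains__(self, key):
        # Mapping's default __contains__ calls __getitem__, which would
        # trigger a load; answer from the key set instead.
        return key in self._id_set


loads = []


def fake_loader(comm_id):
    loads.append(comm_id)                    # record every backend access
    return "communication:" + comm_id


container = LazyLookup(["doc-1", "doc-2"], fake_loader)
print(len(container), "doc-2" in container, loads)  # nothing loaded so far
print(container["doc-1"], loads)
```

Only __getitem__, __iter__ and __len__ are required; Mapping supplies get, keys, items and values on top of them. In this sketch, iterating over items() or values() calls the loader once per key, which is the one-at-a-time cost mentioned above.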
class concrete.util.comm_container.DirectoryBackedCommunicationContainer(directory_path, comm_extensions=[u'.comm', u'.concrete', u'.gz'])
Bases: _abcoll.Mapping
Maps Comm IDs to Comms, retrieving Comms from the filesystem
DirectoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from the filesystem.
Upon initialization, a DirectoryBackedCommunicationContainer instance will (recursively) search directory_path for any files that end with the specified comm_extensions. Files with matching extensions are assumed to be Communication files whose filename (sans extension) is the file’s Communication ID. So, for example, a file named ‘XIN_ENG_20101212.0120.concrete’ is assumed to be a Communication file with a Communication ID of ‘XIN_ENG_20101212.0120’.
Files with the extension .gz will be decompressed using gzip.
A DirectoryBackedCommunicationContainer will not be able to find any files that are added to directory_path after the container was initialized.
Parameters:
- directory_path (str) – Path to directory containing Communications files
- comm_extensions (str[]) – List of strings specifying filename extensions to be associated with Communications
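The scan-once, read-on-access behavior described for this class can be sketched with the standard library alone. This is not the library's implementation: DirectoryIndex is a hypothetical stand-in that returns raw file bytes instead of deserialized Communications, and it assumes that exactly one matching trailing extension is stripped to form the ID.

```python
import gzip
import os
import tempfile
from collections.abc import Mapping


class DirectoryIndex(Mapping):
    """Hypothetical stand-in mapping filename-derived IDs to raw bytes."""

    def __init__(self, directory_path, extensions=(".comm", ".concrete", ".gz")):
        self._paths = {}
        # Scan once at construction; files added later are never indexed.
        for root, _dirs, files in os.walk(directory_path):
            for name in files:
                for ext in extensions:
                    if name.endswith(ext):
                        self._paths[name[:-len(ext)]] = os.path.join(root, name)
                        break

    def __getitem__(self, comm_id):
        path = self._paths[comm_id]          # KeyError for unknown IDs
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as handle:   # the file is read only on access
            return handle.read()

    def __iter__(self):
        return iter(self._paths)

    def __len__(self):
        return len(self._paths)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    with open(os.path.join(tmp, "XIN_ENG_20101212.0120.concrete"), "wb") as f:
        f.write(b"plain payload")
    with gzip.open(os.path.join(tmp, "nested", "doc-2.gz"), "wb") as f:
        f.write(b"gzipped payload")
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("ignored")

    index = DirectoryIndex(tmp)
    ids = sorted(index)
    plain = index["XIN_ENG_20101212.0120"]
    unzipped = index["doc-2"]

    # A file created after construction is not visible to the index.
    with open(os.path.join(tmp, "late.comm"), "wb") as f:
        f.write(b"late")
    sees_late_file = "late" in index
```

Building the ID-to-path table up front keeps lookups cheap, at the cost of the staleness noted above: the table reflects the directory only as it was when the container was created.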
class concrete.util.comm_container.FetchBackedCommunicationContainer(host, port)
Bases: _abcoll.Mapping
Maps Comm IDs to Comms, retrieving Comms from a FetchCommunicationService server
FetchBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a FetchCommunicationService.
If you need to retrieve large amounts of data from a FetchCommunicationService, then you SHOULD NOT USE THIS CLASS. This class retrieves one Communication at a time using FetchCommunicationService.
Parameters:
- host (str) – Hostname of FetchCommunicationService server
- port (int) – Port # of FetchCommunicationService server
class concrete.util.comm_container.MemoryBackedCommunicationContainer(communications_file, max_file_size=1073741824)
Bases: _abcoll.Mapping
Maps Comm IDs to Comms by loading all Comms in a file into memory
MemoryBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. All Communications in communications_file will be read into memory using a CommunicationReader instance.
Parameters:
- communications_file (str) – String specifying name of Communications file
- max_file_size (int) – Maximum file size, in bytes
class concrete.util.comm_container.RedisHashBackedCommunicationContainer(redis_db, key)
Bases: _abcoll.Mapping
Provides access to Communications stored in a Redis hash, assuming the key of each communication is its Communication id.
RedisHashBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a Redis hash.
Parameters:
- redis_db (redis.Redis) – Redis database connection object
- key (str) – Key in redis database where hash is located
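As a rough illustration of a mapping over a single Redis hash, the sketch below swaps a real connection for FakeRedis, an in-memory stand-in that imitates only the hset, hget, hkeys and hlen commands of redis-py. HashLookup is a hypothetical class; it returns the stored bytes as they are, whereas the container documented here maps IDs to Communications.

```python
from collections.abc import Mapping


class FakeRedis:
    """In-memory stand-in for the redis-py hash commands used below."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = value

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hkeys(self, name):
        return list(self._hashes.get(name, {}))

    def hlen(self, name):
        return len(self._hashes.get(name, {}))


class HashLookup(Mapping):
    """Hypothetical mapping over one hash; values are fetched per lookup."""

    def __init__(self, redis_db, key):
        self._db = redis_db
        self._key = key

    def __getitem__(self, comm_id):
        value = self._db.hget(self._key, comm_id)
        if value is None:                    # HGET yields None for a missing field
            raise KeyError(comm_id)
        return value

    def __iter__(self):
        return iter(self._db.hkeys(self._key))

    def __len__(self):
        return self._db.hlen(self._key)


db = FakeRedis()
db.hset("comms", "doc-1", b"serialized bytes")
lookup = HashLookup(db, "comms")
first = lookup["doc-1"]
```

With a real redis.Redis connection, field names come back as bytes unless the connection is created with decode_responses=True, so key handling would need to account for that.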
class concrete.util.comm_container.S3BackedCommunicationContainer(bucket, prefix_len=4)
Bases: _abcoll.Mapping
Provides access to Communications stored in an AWS S3 bucket, assuming the key of each communication is its Communication id (optionally prefixed with a fixed-length, random-looking but deterministic hash to improve performance).
S3BackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs (with or without prefixes) to Communications. Communications are lazily retrieved from an S3 bucket.
References
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Parameters:
- bucket (boto.s3.bucket.Bucket) – S3 bucket object
- prefix_len (int) – length of prefix in each Communication’s key in the bucket. This number of characters will be removed from the beginning of the key to determine the Communication id (without incurring the cost of fetching and deserializing the Communication). A prefix enables S3 to better partition the bucket contents, yielding higher performance and a lower chance of getting rate-limited by AWS.
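The relationship between a prefixed key and its Communication ID can be sketched as follows. The hash function is an assumption: this section only says the prefix is fixed-length, random-looking and deterministic, so SHA-256 hex digits are used here purely for illustration. Both helper functions are hypothetical.

```python
import hashlib

PREFIX_LEN = 4  # same value as the documented prefix_len default


def prefixed_key(comm_id, prefix_len=PREFIX_LEN):
    """Hypothetical helper: prepend a deterministic hash prefix to an ID."""
    digest = hashlib.sha256(comm_id.encode("utf-8")).hexdigest()
    return digest[:prefix_len] + comm_id


def unprefixed_id(key, prefix_len=PREFIX_LEN):
    """Recover the ID by dropping the first prefix_len characters."""
    return key[prefix_len:]


comm_id = "XIN_ENG_20101212.0120"
key = prefixed_key(comm_id)
recovered = unprefixed_id(key)
```

Because the prefix has a fixed length, the ID is recovered by slicing the key string alone; nothing has to be fetched from the bucket to list the available IDs.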
class concrete.util.comm_container.ZipFileBackedCommunicationContainer(zipfile_path, comm_extensions=[u'.comm', u'.concrete'])
Bases: _abcoll.Mapping
Maps Comm IDs to Comms, retrieving Comms from a Zip file
ZipFileBackedCommunicationContainer instances behave as dict-like data structures that map Communication IDs to Communications. Communications are lazily retrieved from a Zip file.
Parameters:
- zipfile_path (str) – Path to Zip file containing Communications
- comm_extensions (str[]) – List of strings specifying filename extensions associated with Communications
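Reading one archive member per lookup can be sketched with the standard zipfile module. ZipIndex is a hypothetical stand-in that returns raw member bytes, and it assumes that directory components inside the archive are ignored when deriving the ID; this section does not say how the library treats them.

```python
import io
import zipfile
from collections.abc import Mapping


class ZipIndex(Mapping):
    """Hypothetical stand-in mapping member-derived IDs to raw bytes."""

    def __init__(self, zipfile_path, extensions=(".comm", ".concrete")):
        self._zip = zipfile.ZipFile(zipfile_path, "r")
        self._members = {}
        for name in self._zip.namelist():
            base = name.rsplit("/", 1)[-1]   # assumption: drop directory parts
            for ext in extensions:
                if base.endswith(ext):
                    self._members[base[:-len(ext)]] = name
                    break

    def __getitem__(self, comm_id):
        # Only the requested member is decompressed.
        return self._zip.read(self._members[comm_id])

    def __iter__(self):
        return iter(self._members)

    def __len__(self):
        return len(self._members)


# Build a small archive in memory so the sketch needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("corpus/doc-1.concrete", b"first")
    archive.writestr("corpus/readme.txt", b"ignored")
buffer.seek(0)

index = ZipIndex(buffer)
payload = index["doc-1"]
```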