emtellipro package
Subpackages
Submodules
emtellipro.auth module
This module contains all the classes and methods necessary for authenticating with the API.
There’s no need to use any of these if using the Emtellipro class since that class uses this module internally.
- class emtellipro.auth.AuthenticationDetails(accesskey, secretkey)
Bases:
object
Conveniently stores the authentication details (access key, secret key) for use with all functions which require them.
- __init__(accesskey, secretkey)
- Parameters
accesskey – the API key used to identify the user
secretkey – the API key used to sign messages
- property accesskey
The access key stored in this object. Read-only.
- property secretkey
The secret key stored in this object. Read-only.
- class emtellipro.auth.EmtelliproAuth(authdetails)
Bases:
requests.auth.AuthBase
Attaches the necessary authentication headers to Request objects. Instances of this class are used as Request.auth attributes.
- __init__(authdetails)
- Parameters
authdetails – instance of AuthenticationDetails
- emtellipro.auth.compute_canonical_request(request)
Given a Request object, this computes the canonical request string which is then used for computing the signature of the request.
- emtellipro.auth.gen_auth_header(request, auth_details)
Generates the complete Authorization header, which contains the necessary signature for the request and the details needed to verify it.
- emtellipro.auth.string_to_sign(request)
Given a Request object, this computes the string which the client needs to sign to verify the validity of the request.
emtellipro.data module
This module contains all data structures used to represent documents and annotations, including all forms of associated metadata.
The only class that user code should instantiate is InputDocument. The other classes are used when parsing annotations returned by the server.
- class emtellipro.data.AnnotatedDocument(document_dict)
Bases:
object
The annotated document returned by the Emtellipro API. This provides access to all returned entities and relations.
- id
the ID for the associated document that was submitted for processing
- category
the category of the document as returned by the API, or None if not returned.
- Type
dict
- subcategory
the subcategory of the document, or None if not returned by API
- Type
dict
- concepts
List of
emtellipro.data.Concept
objects
- ontology_versions
mapping from ontology name to a dict with version information. Only release_version is guaranteed to be present; this key will also be present even if emtelliPro does not return any ontology version information.
- Type
dict
- found_entities
List of
emtellipro.data.FoundEntity
objects
- assumed_entities
List of
emtellipro.data.AssumedEntity
objects
- relations
the relations between entities found in the document. Keys are: experiencer, follow-up, measurement, imagelink, medication, qualifier, temporality, reportedevent
- Type
dict
- locations
locations found in the document. Keys are: sentences, sections
- Type
dict
- text
The text content of the document, if returned by the API. This will be populated for PDF documents. Will be None if not returned by the API.
- Type
str
- processing_status
the processing status of the report as returned by the API
- Type
str
- __init__(document_dict)
- Parameters
document_dict – a dictionary of the returned results for this document
- as_dict()
Returns the initial document_dict argument that was used to instantiate this object.
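The relations attribute described above is a dict keyed by relation type, so a per-type tally is a short dictionary comprehension. The following is a minimal sketch, assuming each value in the dict is a sized collection (the reference does not state the value type). The helper name count_relations and the stand-in document are hypothetical and exist only so the example runs without the SDK.

```python
from types import SimpleNamespace

# Relation keys as listed in the AnnotatedDocument.relations description.
RELATION_KEYS = (
    "experiencer", "follow-up", "measurement", "imagelink",
    "medication", "qualifier", "temporality", "reportedevent",
)


def count_relations(doc):
    """Return {relation key: number of relations} for one annotated document.

    Assumes each value in doc.relations supports len(); keys that are
    missing from the dict are counted as zero.
    """
    return {key: len(doc.relations.get(key, ())) for key in RELATION_KEYS}


# Stand-in object for illustration only; real code would pass an
# AnnotatedDocument returned by the SDK.
fake_doc = SimpleNamespace(
    relations={"medication": ["r1", "r2"], "qualifier": ["r3"]}
)
counts = count_relations(fake_doc)
print(counts["medication"], counts["follow-up"])
```

Using .get() with an empty default keeps the helper safe if a key is absent from a particular response.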
- class emtellipro.data.AssumedEntity(label, value, _doc)
Bases:
emtellipro.data.Entity
An entity that isn’t concretely present in the document text.
- value
the text representing this entity
- __init__(label, value, _doc)
- class emtellipro.data.Concept(concept_id, label, ontology, description, _doc)
Bases:
emtellipro.data._LabeledObject
Concept identified from input document.
- id
the ID of concept found in the associated ontology
- ontology
string containing the ontology’s name
- description
the description of this concept from the ontology
- __init__(concept_id, label, ontology, description, _doc)
- class emtellipro.data.Entity(doc, label)
Bases:
emtellipro.data._LabeledObject
Base class for entities found in submitted document.
- class emtellipro.data.ExperiencerRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
The experiencer/experienced relation between entities.
- __init__(label, attributes, args, _doc)
- experienced
The entity experienced by the experiencer
- experiencer
The entity experiencing the experienced
- polarity
The polarity of the relation as returned by the API, or None if it was not returned
- class emtellipro.data.FollowupRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
The requested follow-up found in the submitted document
- __init__(label, attributes, args, _doc)
- polarity
The polarity of the relation as returned by the API, or None if it was not returned
- procedures
list: List of entities representing the procedures referred to by this follow-up. May be empty.
- reasons
list: List of entities mentioned as reasons for the follow-up. May be empty.
- time_expression
The entity representing the follow-up time in the document text. May be None.
- class emtellipro.data.FoundEntity(label, entity_type, attributes, concept_links, locations, section_name, spans, _doc, text=None)
Bases:
emtellipro.data.Entity
Entities that were found concretely in the submitted document.
- type_name
entity type names from the different ontologies returned by the API. Note that not all ontologies are always present, so it’s safer to use .get() than [].
- Type
dict
- polarity
the polarity of the entity, or None if it was not returned by the API
- uncertainty
the uncertainty of the entity, or None if it was not returned by the API
- measurement_unit
a list of measurement units in the found entity, or None if the API returned null.
- known_ambiguity
the known ambiguity status of the entity, or None if it was not returned by the API
- question_status
the question status of the entity, or None if it was not returned by the API
- guidance
the guidance attribute of the entity, or None if it was not returned by the API
- section_name
the name of the document section this entity was found in
- spans
list of spans associated with this entity
- __init__(label, entity_type, attributes, concept_links, locations, section_name, spans, _doc, text=None)
- property attributes
The names of the instance attributes which represent found entity attributes from emtelliPro.
- property concepts
The concepts associated with this entity
- property locations
List of text locations that contain this entity
- property parts
A mapping of spans to text as returned by the API. If the text was not returned by the API, the values will be None.
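The advice on type_name above (prefer .get() over []) can be illustrated with a small lookup helper. This is a sketch only: entity_type_in is a hypothetical helper, and "example_ontology" is a made-up key, not an ontology name taken from emtelliPro.

```python
from types import SimpleNamespace


def entity_type_in(entity, ontology, default=None):
    """Look up an entity's type name for one ontology without raising KeyError."""
    return entity.type_name.get(ontology, default)


# Stand-in entity for illustration; real code would use a FoundEntity.
# "example_ontology" is a hypothetical key.
fake_entity = SimpleNamespace(type_name={"example_ontology": "Finding"})

present = entity_type_in(fake_entity, "example_ontology")
missing = entity_type_in(fake_entity, "other_ontology", default="unknown")
print(present, missing)
```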
- class emtellipro.data.ImageLinkRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
A reference to an image found in the text
- __init__(label, attributes, args, _doc)
- image_findings
List of entities representing findings in images. May be empty.
- Type
list
- references
List of entities containing references to images. May be empty.
- Type
list
- class emtellipro.data.InputDocument(id_, category, subcategory, text=None, type_=None, *, filepath=None, file_obj=None, filetype=None, encoding=None, section_label=None)
Bases:
object
A document that needs to be annotated. It can be initialized either from plain text by passing the ‘text’ parameter, or from a file when passing the ‘filepath’ parameter.
If the path to a plaintext file is provided, then the ‘text’ attribute will be populated with the contents of the file for convenience.
- __init__(id_, category, subcategory, text=None, type_=None, *, filepath=None, file_obj=None, filetype=None, encoding=None, section_label=None)
- Parameters
id_ – a unique ID representing this particular document. This is useful for figuring out which document the returned annotations are for when providing multiple documents.
category – the category of document
subcategory – the subcategory
text (optional) – the raw text of the document (as a string, not bytes). This can be set to None, in which case filepath or file_obj must be specified.
type_ (optional) – the document type to be used for this document. If not specified, the API submit call will set it.
filepath (optional) – the path to the file which is to be submitted. It must not be set if document text is provided. This parameter should be used for PDF files. NOTE: A ValueError will be raised for unsupported filetypes. Currently allowed are plaintext and PDF files. If filetype not set, it’ll be detected from the file suffix.
file_obj (optional) – file-like object containing the raw file data to be sent to the server. If this is set, then filetype will also need to be set.
filetype (optional) – the file type of file_obj or filepath if set. Can be one of ‘pdf’ or ‘txt’.
encoding (optional) – the encoding with which the file should be read. If None, Python’s defaults are used. This is currently only used for plaintext files.
section_label (optional) – the value sent as the process_with_section_labels header for this document.
- as_dict()
Returns a representation of this document as a dictionary for easy conversion to JSON.
- property file_data
The file contents as a file-like object.
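The filepath and filetype parameters above describe a fallback rule: an explicit filetype wins, otherwise the type is taken from the file suffix, and unsupported types raise ValueError. The sketch below restates that rule in code. It is not the SDK's implementation; resolve_filetype is a hypothetical helper, and treating the lower-cased suffix as the file type is an assumption.

```python
from pathlib import Path

# File types named in the InputDocument parameter description.
SUPPORTED_FILETYPES = ("pdf", "txt")


def resolve_filetype(filepath, filetype=None):
    """Return the file type to use, falling back to the file suffix.

    Illustrative only: the suffix-to-type mapping (strip the dot and
    lower-case) is an assumption, not documented SDK behaviour.
    """
    if filetype is None:
        filetype = Path(filepath).suffix.lstrip(".").lower()
    if filetype not in SUPPORTED_FILETYPES:
        raise ValueError(f"unsupported file type: {filetype!r}")
    return filetype


print(resolve_filetype("report.PDF"))
print(resolve_filetype("upload.bin", filetype="pdf"))
```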
- class emtellipro.data.Location(doc, label)
Bases:
emtellipro.data._LabeledObject
Base class for different location types
- class emtellipro.data.MeasurementRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
A measurement found in the document.
- __init__(label, attributes, args, _doc)
- subject
The subject being measured.
- value
The value of the measurement.
- class emtellipro.data.MedicationRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
A relation between a medication and its arguments.
- __init__(label, attributes, args, _doc)
- date_times
The date_time entities
- Type
list
- dosages
The dosage entities
- Type
list
- drug
The drug entity
- durations
The duration entities
- Type
list
- frequencies
The frequency entities
- Type
list
- indications
The indication entities
- Type
list
- modes
The mode entities
- Type
list
- modifiers
The modifier entities
- Type
list
- necessities
The necessity entities
- Type
list
- quantities
The quantity entities
- Type
list
- routes
The route entities
- Type
list
- class emtellipro.data.QualifierRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
A relation between a qualifier and the entity it qualifies.
- __init__(label, attributes, args, _doc)
- qualifier
The qualifier entity
- qualifier_type
The type of the qualifier relation
- qualifies
The entity being qualified
- class emtellipro.data.Relation(_doc, label, args, attributes)
Bases:
emtellipro.data._LabeledObject
Base class for all relation types.
- __init__(_doc, label, args, attributes)
- property arguments
The names of the instance attributes which represent relation arguments.
- property attributes
The names of the instance attributes which represent relation attributes.
- class emtellipro.data.ReportedEventRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
A relation capturing the notion of a communication between two groups of entities.
- __init__(label, attributes, args, _doc)
- category
The category of the relation
- from_entities
Sources of the communication
- Type
list
- methods
The methods of communication, such as in person or by telephone
- Type
list
- polarity
The polarity of the relation, or None if it was not returned by the API.
- subject
The subject of the communication
- time_expressions
Time expressions that indicate when the communication took place
- Type
list
- to_entities
Recipients of the communication
- Type
list
- class emtellipro.data.SectionLocation(label, spans, name, level, parent, _doc, text=None)
Bases:
emtellipro.data.Location
A section identified in the input document.
- spans
list of Span objects representing parts of the section
- text
list of strings containing text from the input document if returned by the API. If not, it will be None.
- name
the name of the section
- level
the level of the section in the document, starting with 0 for the outer-most section, and incrementing by 1 for each nested section.
- __init__(label, spans, name, level, parent, _doc, text=None)
- property parent
The parent section that contains this sub-section, or None if this is an outer-most section
- property parts
A mapping of spans to text as returned by the API. If the text was not returned by the API, the values will be None.
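Because each section exposes its name and its parent (None for an outer-most section), the full nesting path of a section can be recovered by walking the parent chain. A minimal sketch, using stand-in objects rather than real SectionLocation instances; section_path is a hypothetical helper and the section names are invented.

```python
from types import SimpleNamespace


def section_path(section):
    """Return section names from the outer-most section down to `section`."""
    names = []
    while section is not None:
        names.append(section.name)
        section = section.parent
    return list(reversed(names))


# Stand-in sections for illustration; real code would use SectionLocation
# objects taken from an annotated document.
outer = SimpleNamespace(name="FINDINGS", level=0, parent=None)
inner = SimpleNamespace(name="CHEST", level=1, parent=outer)
path = section_path(inner)
print(" > ".join(path))
```

With levels starting at 0, the length of the path is expected to be the section's level plus one.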
- class emtellipro.data.SentenceLocation(label, spans, sections, _doc, text=None)
Bases:
emtellipro.data.Location
A sentence identified in the input document. May be discontinuous.
- spans
list of Span objects representing parts of the sentence
- text
list of strings containing text from the input document if returned by the API. If not, it will be None.
- __init__(label, spans, sections, _doc, text=None)
- property parts
A mapping of spans to text as returned by the API. If the text was not returned by the API, the values will be None.
- property sections
All sections containing this sentence.
- class emtellipro.data.Span(start, end)
Bases:
object
A span representing a slice of the input document text.
>>> s = Span(1, 45)
>>> start, end = s
>>> start, end
(1, 45)
>>> slice(*s)
slice(1, 45, None)
- __init__(start, end)
- property end
The end index of the slice (non-inclusive)
- property start
The start index of the slice
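Since a span unpacks into (start, end) with a non-inclusive end, slice(*span) can be applied directly to the document text. The sketch below uses a stand-in class, SpanSketch, that only mirrors the unpacking behaviour shown in the doctest; it is not the SDK's Span class, and text_for_spans is a hypothetical helper.

```python
class SpanSketch:
    """Stand-in that mirrors the start/end properties and unpacking of Span."""

    def __init__(self, start, end):
        self._start = start
        self._end = end

    @property
    def start(self):
        return self._start

    @property
    def end(self):
        return self._end

    def __iter__(self):
        # Yielding start then end is what makes `slice(*span)` work.
        yield self._start
        yield self._end


def text_for_spans(text, spans):
    """Return the substring of `text` covered by each span, in order."""
    return [text[slice(*span)] for span in spans]


report = "No acute findings. Follow-up in 6 months."
pieces = text_for_spans(report, [SpanSketch(0, 8), SpanSketch(19, 28)])
print(pieces)
```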
- class emtellipro.data.TemporalityRelation(label, attributes, args, _doc)
Bases:
emtellipro.data.Relation
A relation between a temporal entity and the subject entity it applies to.
- __init__(label, attributes, args, _doc)
- category
The category of the temporality relation, or None if it was not returned by the API.
- modifiers
The modifier entities
- Type
list
- polarity
The polarity of the temporality relation, or None if it was not returned by the API.
- subject
The subject entity
- temporal_entity
The temporal entity
emtellipro.exceptions module
All the exceptions raised by code in this library are contained in this module. They all share a base class: EmtelliproError.
- exception emtellipro.exceptions.APIError(reason, task_id=None)
Bases:
emtellipro.exceptions.EmtelliproError
Exception raised for unexpected API-related errors
- reason
the reason for the error as provided by the server
- task_id
when available, the task ID the error is related to
- __init__(reason, task_id=None)
- exception emtellipro.exceptions.AuthenticationError(reason, task_id=None)
Bases:
emtellipro.exceptions.HTTPClientError
Exception for authentication errors.
- exception emtellipro.exceptions.CancelledError
Bases:
emtellipro.exceptions.EmtelliproError
Exception raised when an annotation task has been cancelled.
- exception emtellipro.exceptions.EmtelliproError
Bases:
Exception
Base class for all exceptions in this library.
- exception emtellipro.exceptions.HTTPClientError(reason, task_id=None)
Bases:
emtellipro.exceptions.APIError
Exception for generic client errors (i.e. HTTP 4xx errors).
- exception emtellipro.exceptions.ServerError(reason, task_id=None)
Bases:
emtellipro.exceptions.APIError
Exception for generic server errors (i.e. HTTP 5xx errors).
- exception emtellipro.exceptions.TaskFailedError(reason, task_id=None)
Bases:
emtellipro.exceptions.APIError
Exception raised when a task submitted has failed.
- exception emtellipro.exceptions.TaskNotFoundError(reason, task_id=None)
Bases:
emtellipro.exceptions.APIError
Exception indicating the server does not know about the task ID it’s been sent.
- reason
textual representation of why the task was not found as returned by the server
- exception emtellipro.exceptions.UnsupportedFormatForParsingError
Bases:
emtellipro.exceptions.EmtelliproError
Exception raised when attempting to parse a result format we don’t provide a parser for in this SDK. This will usually occur when specifying the result format; if you’re interested in using our parsers, leave the result format unspecified.
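The "Bases" entries above imply an ordering for handlers: more specific classes such as AuthenticationError must be checked before HTTPClientError, APIError, and finally EmtelliproError. The sketch below uses stand-in classes that copy only the documented inheritance and the reason/task_id attributes, so that it runs without the SDK; real code would import the classes from emtellipro.exceptions. The describe helper and its messages are hypothetical.

```python
# Stand-in classes mirroring the documented "Bases" relationships.
class EmtelliproError(Exception):
    pass


class APIError(EmtelliproError):
    def __init__(self, reason, task_id=None):
        super().__init__(reason)
        self.reason = reason
        self.task_id = task_id


class HTTPClientError(APIError):
    pass


class AuthenticationError(HTTPClientError):
    pass


class ServerError(APIError):
    pass


def describe(exc):
    """Map an exception to a short message, most specific class first."""
    if isinstance(exc, AuthenticationError):
        return "check access/secret keys"
    if isinstance(exc, HTTPClientError):
        return "client error: " + exc.reason
    if isinstance(exc, APIError):
        return "API error: " + exc.reason
    if isinstance(exc, EmtelliproError):
        return "SDK error"
    return "not an SDK error"


print(describe(AuthenticationError("bad signature")))
print(describe(ServerError("unavailable", task_id="task-1")))
```

Reversing the order of the checks would send every authentication failure down the generic API-error branch, since AuthenticationError is also an APIError.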
emtellipro.utils module
These are some convenience functions for some common tasks.
- emtellipro.utils.load_annotated_json(json_file)
Parses a JSON file to reload AnnotatedDocument instances that were saved using json.dumps([annotated_doc.as_dict()]).
- Parameters
json_file – file-like object containing JSON data
Returns: list of AnnotatedDocument instances
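The saving side of this round trip is a JSON list of as_dict() results. A minimal sketch, assuming as_dict() returns JSON-serializable data; dump_annotated is a hypothetical helper, and the stand-in document and its "id" key are invented for illustration.

```python
import io
import json
from types import SimpleNamespace


def dump_annotated(docs, fp):
    """Write annotated documents to `fp` as a JSON list of their dicts."""
    json.dump([doc.as_dict() for doc in docs], fp)


# Stand-in document; real code would pass AnnotatedDocument instances.
fake_doc = SimpleNamespace(as_dict=lambda: {"id": "doc-1"})

buffer = io.StringIO()
dump_annotated([fake_doc], buffer)
buffer.seek(0)
saved = json.load(buffer)
print(saved)
```

A file written this way has the list-of-dicts shape that load_annotated_json is described as accepting.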
- emtellipro.utils.read_files(file_paths, category='Radiology', subcategory='CT', encoding=None, section_label=None, doc_id_filepath=False, filetype=None)
Read files and create InputDocument objects from each.
- Parameters
file_paths – paths for text files containing documents to be submitted
category (optional) – category to set for all documents read from the files
subcategory (optional) – subcategory to set for all documents read from the files
encoding (optional) – encoding used when reading the files. Default is None, which means ‘autodetect’. If you know what the encoding is ahead of time, it’s best to set this to avoid the overhead of detecting the encoding.
section_label (optional) – if given, this is the section label used for all documents read in.
doc_id_filepath (optional) – if set to True, this will use the filepath of the input file as the document ID.
filetype (optional) – the filetype to use for input documents; if not provided, the file extension will be used.
- Returns: tuple containing list of InputDocument objects, and mapping from
file path to document ID
emtellipro.version module
Version information for package to be used both by the setup.py and the package itself.
Module contents
This is the SDK for interacting with the Emtellipro engine API. The main entry point is the Emtellipro class which will handle all the details for submitting documents and retrieving results.
- class emtellipro.Emtellipro(accesskey, secretkey, server='https://api.emtelligent.com:50001', max_retries=5)
Bases:
object
This is the main way of interacting with the API. It handles all the necessary details for submitting documents for processing and retrieving results.
- __init__(accesskey, secretkey, server='https://api.emtelligent.com:50001', max_retries=5)
- Parameters
accesskey – the API key used to identify the user
secretkey – the API key used to sign messages
server (optional) – the URL of the server to connect to. This is used only for testing.
max_retries (optional) – the number of times to retry failed API requests; there may be failures due to network issues, so this should be a positive integer.
- check_task(task_id)
Create a ResultFuture instance from a task_id.
- Parameters
task_id (str) – the task ID. This is available on ResultFuture instances as the .task_id attribute; however, it is simply the task ID as returned by the emtelliPro API’s /submit call, so it can be provided regardless of how it was obtained (e.g. if you obtained it from our Java SDK, that will work, too).
- submit(documents, features=None, document_type='plain', max_shard_size=inf) -> Iterable[emtellipro.ResultFuture]
Submit documents for annotating.
- Parameters
documents – an iterable of InputDocument objects to be annotated
features – a list of features to enable in the processing of the documents. Default of None means “enable all”.
document_type – string containing the document type to be used for all documents which don’t specify their own type. Default is 'plain'.
max_shard_size (int) – maximum number of documents to submit at once. Documents are sharded based on size regardless of this setting, but this is also useful to limit how many documents there will be in the response from the engine, to help control memory usage when parsing the results.
- Returns: an iterable of ResultFuture objects, each representing the current status of
the annotation process for one shard of documents.
- property user
The user information as known by the Emtellipro server.
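Going by the signature of submit(), the call yields one ResultFuture per shard, so collecting all annotated documents means iterating over the futures and then over each future's result(). The sketch below shows that nesting with stand-in objects so it runs without the SDK or credentials. The annotate helper is hypothetical, and whether result() waits for processing to finish is not stated here, so real code may need to check done() first.

```python
from types import SimpleNamespace


def annotate(client, documents):
    """Submit documents and yield parsed results from every returned future.

    `client` is expected to behave like an Emtellipro instance: submit()
    returns an iterable of futures, and each future's result() yields
    annotated documents.
    """
    for future in client.submit(documents):
        yield from future.result()


# Stand-ins for illustration only; strings take the place of documents.
fake_future = SimpleNamespace(result=lambda: iter(["annotated-1", "annotated-2"]))
fake_client = SimpleNamespace(submit=lambda documents: [fake_future])

results = list(annotate(fake_client, ["doc-1", "doc-2"]))
print(results)
```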
- class emtellipro.ResultFuture(task_id, api, num_docs=None)
Bases:
object
This class mimics asyncio.Future and concurrent.futures.Future in terms of some of the available methods and semantics, but you cannot await or yield from any of its methods.
ResultFuture instances are created by the Emtellipro class and should not be created directly.
Note: ResultFuture objects are pickle-able.
- task_id
The task ID as returned by the emtelliPro API. This can be stored and then used with Emtellipro.check_task() to recreate a ResultFuture instance.
- Type
str
- __init__(task_id, api, num_docs=None)
- Parameters
task_id – a string representing the ID of the task submitted to the server.
api – instance of _api.Api used for retrieving results and checking status.
num_docs – the number of documents this result future is handling, if known.
- cancel()
Cancel the task being represented by this ResultFuture.
- cancelled()
Return True if the task was cancelled.
- done(timeout=None) -> emtellipro.ResultFutureStatus
Return a ResultFutureStatus, which evaluates as True if the task was cancelled or finished processing.
- Parameters
timeout (int) – when checking the status of the job, the server may keep the connection open up to this amount of time or until the job is completed, whichever is quicker.
- property engine_version
The engine version as returned in the result. This can only be accessed after calling .result(); accessing this attribute before will result in an AttributeError.
- property num_docs
The total number of documents submitted for processing.
- raw_result(result_format='emtellipro-json-2')
Return the raw text result from the API, unparsed. The text is cached, so multiple calls to this method will not result in multiple API calls.
If you’d like the parsed result, use .result() instead.
- result()
Return a generator of AnnotatedDocument instances that were processed by Emtellipro API.
If you want to iterate over the generator multiple times, simply call this method multiple times and new generator instances will be created. The Emtellipro API response will be stored after the first call, so no extra API calls will be made.
If you’d like the unparsed raw API response, use .raw_result() instead.
- class emtellipro.ResultFutureStatus(status, prev_progress)
Bases:
object
This class is returned by ResultFuture.done, and contains information about the processing status of the job.
Note: this class implements __bool__ with the same semantics as .done
- done
whether the job has completed processing (this can be a successful, failed, or cancelled job; all are considered “done”)
- Type
bool
- success
whether the job has completed successfully; note if the job is not done, this will be false
- Type
bool
- progress
a number in [0, 1] representing the fraction of documents completed
- Type
float
- __init__(status, prev_progress)
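The done, success and progress attributes above lend themselves to a simple polling loop around ResultFuture.done(). The following is a sketch under stated assumptions: wait_until_done, max_polls and pause are hypothetical names introduced here, and FakeFuture is a stand-in that replays canned statuses so the example runs without the SDK.

```python
import time
from types import SimpleNamespace


def wait_until_done(future, timeout=10, max_polls=60, pause=0.0):
    """Poll future.done() until the job completes and return the final status.

    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        status = future.done(timeout=timeout)
        if status.done:
            return status
        print(f"progress: {status.progress:.0%}")
        time.sleep(pause)
    raise TimeoutError("job did not complete in time")


class FakeFuture:
    """Stand-in for ResultFuture that replays a fixed list of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def done(self, timeout=None):
        return next(self._statuses)


running = SimpleNamespace(done=False, success=False, progress=0.5)
finished = SimpleNamespace(done=True, success=True, progress=1.0)
final = wait_until_done(FakeFuture([running, finished]))
print(final.success)
```

Checking success on the returned status separates a completed job from a failed or cancelled one, since all three count as done.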