emtellipro package

Subpackages

Submodules

emtellipro.auth module

This module contains all the classes and methods necessary for authenticating with the API.

There’s no need to use any of these if using the Emtellipro class since that class uses this module internally.

class emtellipro.auth.AuthenticationDetails(accesskey, secretkey)

Bases: object

Conveniently stores the authentication details (access key, secret key) for use with all functions that require them.

__init__(accesskey, secretkey)
Parameters
  • accesskey – the API key used to identify the user

  • secretkey – the API key used to sign messages

property accesskey

The access key stored in this object. Read-only.

property secretkey

The secret key stored in this object. Read-only.

class emtellipro.auth.EmtelliproAuth(authdetails)

Bases: requests.auth.AuthBase

Attaches the necessary authentication headers to Request objects. Instances of this class are used as Request.auth attributes.

__init__(authdetails)
Parameters

authdetails – instance of AuthenticationDetails

emtellipro.auth.compute_canonical_request(request)

Given a Request object, this computes the canonical request string which is then used for computing the signature of the request.

emtellipro.auth.gen_auth_header(request, auth_details)

Generates the complete Authorization header, which contains the signature for the request and the details needed to verify it.

emtellipro.auth.string_to_sign(request)

Given a Request object, this computes the string which the client needs to sign to verify the validity of the request.
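The three helpers above suggest a pipeline: canonicalize the request, derive a string to sign, sign it with the secret key, and place the result in the Authorization header. The canonical form, hash algorithm, and header layout are not documented here, so everything in the following sketch (the field order, HMAC-SHA256, the EXAMPLE-HMAC scheme name, and all function names) is a hypothetical stand-in and not the library's actual scheme.

```python
import hashlib
import hmac

# Hypothetical stand-ins: none of these formats come from the emtellipro SDK.


def canonical_request(method, path, headers):
    """Build a deterministic text form of a request (assumed layout)."""
    # Sort headers by lowercase name so client and server build the same string.
    ordered = sorted(headers.items(), key=lambda kv: kv[0].lower())
    header_lines = [f"{name.lower()}:{value.strip()}" for name, value in ordered]
    return "\n".join([method.upper(), path, *header_lines])


def string_to_sign(canonical):
    """Hash the canonical request so the signed payload has a fixed size."""
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def auth_header(access_key, secret_key, method, path, headers):
    """Sign the request and format an Authorization header value."""
    to_sign = string_to_sign(canonical_request(method, path, headers))
    signature = hmac.new(
        secret_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"EXAMPLE-HMAC Credential={access_key}, Signature={signature}"


header = auth_header(
    "my-access-key",
    "my-secret-key",
    "post",
    "/submit",
    {"Host": "example.invalid", "Date": "Mon, 01 Jan 2024 00:00:00 GMT"},
)
```

The access key travels in the header to identify the user, while the secret key is only used as the HMAC key and never leaves the client.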

emtellipro.data module

This module contains all data structures used to represent documents and annotations, including all forms of associated metadata.

The only class that should be created by user code should be InputDocument. The other classes are used when parsing annotations returned by the server.

class emtellipro.data.AnnotatedDocument(document_dict)

Bases: object

The annotated document returned by the Emtellipro API. This provides access to all returned entities and relations.

id

the ID for the associated document that was submitted for processing

category

the category of the document as returned by the API, or None if not returned.

Type

dict

subcategory

the subcategory of the document, or None if not returned by API

Type

dict

concepts

List of emtellipro.data.Concept objects

ontology_versions

mapping from ontology name to a dict with version information. Only release_version is guaranteed to be present; this key will also be present even if emtelliPro does not return any ontology version information.

Type

dict

found_entities

List of emtellipro.data.FoundEntity objects

assumed_entities

List of emtellipro.data.AssumedEntity objects

relations

the relations between entities found in the document. Keys are: experiencer, follow-up, measurement, imagelink, medication, qualifier, temporality, reportedevent

Type

dict

locations

locations found in the document. Keys are: sentences, sections

Type

dict

text

The text content of the document, if returned by the API. This will be populated for PDF documents. Will be None if not returned by the API.

Type

str

processing_status

the processing status of the report as returned by the API

Type

str

__init__(document_dict)
Parameters

document_dict – a dictionary of the returned results for this document

as_dict()

Returns the initial document_dict argument that was used to instantiate this object.

class emtellipro.data.AssumedEntity(label, value, _doc)

Bases: emtellipro.data.Entity

An entity that isn’t concretely present in the document text.

value

the text representing this entity

__init__(label, value, _doc)
class emtellipro.data.Concept(concept_id, label, ontology, description, _doc)

Bases: emtellipro.data._LabeledObject

Concept identified from input document.

id

the ID of concept found in the associated ontology

ontology

string containing the ontology’s name

description

the description of this concept from the ontology

__init__(concept_id, label, ontology, description, _doc)
class emtellipro.data.Entity(doc, label)

Bases: emtellipro.data._LabeledObject

Base class for entities found in submitted document.

class emtellipro.data.ExperiencerRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

The experiencer/experienced relation between entities.

__init__(label, attributes, args, _doc)
experienced

The entity experienced by the experiencer

experiencer

The entity experiencing the experienced

polarity

The polarity of the relation as returned by the API, or None if it was not returned

class emtellipro.data.FollowupRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

The requested follow-up found in the submitted document

__init__(label, attributes, args, _doc)
polarity

The polarity of the relation as returned by the API, or None if it was not returned

procedures

list: List of entities representing the procedures referred to by this follow-up. May be empty.

reasons

list: List of entities mentioned as reasons for the follow-up. May be empty.

time_expression

The entity representing the follow-up time in the document text. May be None.

class emtellipro.data.FoundEntity(label, entity_type, attributes, concept_links, locations, section_name, spans, _doc, text=None)

Bases: emtellipro.data.Entity

Entities that were found concretely in the submitted document.

type_name

entity type names from the different ontologies returned by the API. Note that not all ontologies are always present, so it’s safer to use .get() than [].

Type

dict
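To illustrate the advice about .get(), the sketch below uses a plain dict standing in for type_name. The ontology names and values are invented for the example.

```python
# Plain dict standing in for FoundEntity.type_name; keys and values are made up.
type_name = {"ontology_a": "Finding"}

# .get() returns None (or a chosen default) when an ontology is absent,
# whereas type_name["ontology_b"] would raise KeyError.
present = type_name.get("ontology_a")
missing = type_name.get("ontology_b")
fallback = type_name.get("ontology_b", "unknown")
```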

polarity

the polarity of the entity, or None if it was not returned by the API

uncertainty

the uncertainty of the entity, or None if it was not returned by the API

measurement_unit

a list of measurement units in the found entity, or None if the API returned null.

known_ambiguity

the known ambiguity status of the entity, or None if it was not returned by the API

question_status

the question status of the entity, or None if it was not returned by the API

guidance

the guidance attribute of the entity, or None if it was not returned by the API

section_name

the name of the document section this entity was found in

spans

list of spans associated with this entity

__init__(label, entity_type, attributes, concept_links, locations, section_name, spans, _doc, text=None)
property attributes

The names of the instance attributes which represent found entity attributes from emtelliPro.

property concepts

The concepts associated with this entity

property locations

List of text locations that contain this entity

property parts

A mapping of spans to text as returned by the API. If the text was not returned by the API, the values will be None.

class emtellipro.data.ImageLinkRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

A reference to an image found in the text

__init__(label, attributes, args, _doc)
image_findings

List of entities representing findings in images. May be empty.

Type

list

references

List of entities containing references to images. May be empty.

Type

list

class emtellipro.data.InputDocument(id_, category, subcategory, text=None, type_=None, *, filepath=None, file_obj=None, filetype=None, encoding=None, section_label=None)

Bases: object

A document that needs to be annotated. It can be initialized either from plain text by passing the ‘text’ parameter, or from a file when passing the ‘filepath’ parameter.

If the path to a plaintext file is provided, then the ‘text’ attribute will be populated with the contents of the file for convenience.

__init__(id_, category, subcategory, text=None, type_=None, *, filepath=None, file_obj=None, filetype=None, encoding=None, section_label=None)
Parameters
  • id_ – a unique ID representing this particular document. This is useful for figuring out which document the returned annotations are for when providing multiple documents.

  • category – the category of document

  • subcategory – the subcategory

  • text (optional) – the raw text of the document (as a string, not bytes). This can be set to None, in which case filepath or file_obj must be specified.

  • type_ (optional) – the document type to be used for this document. If not specified, the API submit call will set it.

  • filepath (optional) – the path to the file which is to be submitted. It must not be set if document text is provided. This parameter should be used for PDF files. NOTE: A ValueError will be raised for unsupported filetypes. Currently allowed are plaintext and PDF files. If filetype not set, it’ll be detected from the file suffix.

  • file_obj (optional) – file-like object containing the raw file data to be sent to the server. If this is set, then filetype will also need to be set.

  • filetype (optional) – the file type of file_obj or filepath if set. Can be one of ‘pdf’ or ‘txt’.

  • encoding (optional) – the encoding with which the file should be read. If None, Python’s defaults are used. This is currently only used for plaintext files.

  • section_label (optional) – the value sent as the process_with_section_labels header for this document.
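The filetype rules described for filepath and filetype can be sketched as a small helper. The function name detect_filetype and the case-insensitive suffix handling are assumptions made for illustration, not the SDK's own code.

```python
from pathlib import Path

# Hypothetical helper mirroring the documented behaviour; not the SDK's code.
SUPPORTED_FILETYPES = {"pdf", "txt"}


def detect_filetype(filepath, filetype=None):
    """Return 'pdf' or 'txt', falling back to the file suffix when unset."""
    if filetype is None:
        # ".PDF" -> "pdf"; a path with no suffix yields an empty string.
        filetype = Path(filepath).suffix.lstrip(".").lower()
    if filetype not in SUPPORTED_FILETYPES:
        raise ValueError(f"unsupported filetype {filetype!r} for {filepath}")
    return filetype
```

An explicit filetype takes precedence over the suffix, which matches the note that file_obj inputs must always state their filetype.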

as_dict()

Returns a representation of this document as a dictionary for easy conversion to JSON.

property file_data

The file contents as a file-like object.

class emtellipro.data.Location(doc, label)

Bases: emtellipro.data._LabeledObject

Base class for different location types

class emtellipro.data.MeasurementRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

A measurement found in the document.

__init__(label, attributes, args, _doc)
subject

The subject being measured.

value

The value of the measurement.

class emtellipro.data.MedicationRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

A relation between a medication and its arguments.

__init__(label, attributes, args, _doc)
date_times

The date_time entities

Type

list

dosages

The dosage entities

Type

list

drug

The drug entity

durations

The duration entities

Type

list

frequencies

The frequency entities

Type

list

indications

The indication entities

Type

list

modes

The mode entities

Type

list

modifiers

The modifier entities

Type

list

necessities

The necessity entities

Type

list

quantities

The quantity entities

Type

list

routes

The route entities

Type

list

class emtellipro.data.QualifierRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

A relation between a qualifier and the entity it qualifies.

__init__(label, attributes, args, _doc)
qualifier

The qualifier entity

qualifier_type

The type of the qualifier relation

qualifies

The entity being qualified

class emtellipro.data.Relation(_doc, label, args, attributes)

Bases: emtellipro.data._LabeledObject

Base class for all relation types.

__init__(_doc, label, args, attributes)
property arguments

The names of the instance attributes which represent relation arguments.

property attributes

The names of the instance attributes which represent relation attributes.

class emtellipro.data.ReportedEventRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

A relation capturing the notion of a communication between two groups of entities.

__init__(label, attributes, args, _doc)
category

The category of the relation

from_entities

Sources of the communication

Type

list

methods

The methods of communication, such as in person or by telephone

Type

list

polarity

The polarity of the relation, or None if it was not returned by the API.

subject

The subject of the communication

time_expressions

Time expressions that indicate when the communication took place

Type

list

to_entities

Recipients of the communication

Type

list

class emtellipro.data.SectionLocation(label, spans, name, level, parent, _doc, text=None)

Bases: emtellipro.data.Location

A section identified in the input document.

spans

list of Span objects representing parts of the section

text

list of strings containing text from the input document if returned by the API. If not, it will be None.

name

the name of the section

level

the level of the section in the document, starting with 0 for the outer-most section, and incrementing by 1 for each nested section.

__init__(label, spans, name, level, parent, _doc, text=None)
property parent

The parent section that contains this sub-section, or None if this is an outer-most section

property parts

A mapping of spans to text as returned by the API. If the text was not returned by the API, the values will be None.
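Because each section exposes name, level and parent, the chain of enclosing sections can be recovered by following parent until it is None. The Section dataclass below is a stand-in carrying only those three documented attributes, and section_path is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    """Stand-in with the documented name, level and parent attributes."""
    name: str
    level: int
    parent: Optional["Section"] = None


def section_path(section):
    """Return section names from the outer-most ancestor down to `section`."""
    names = []
    while section is not None:
        names.append(section.name)
        section = section.parent  # None once the outer-most section is reached
    return list(reversed(names))


findings = Section("Findings", level=0)
chest = Section("Chest", level=1, parent=findings)
```

Here section_path(chest) gives the names from level 0 downwards, so the length of the path is always level + 1.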

class emtellipro.data.SentenceLocation(label, spans, sections, _doc, text=None)

Bases: emtellipro.data.Location

A sentence identified in the input document. May be discontinuous.

spans

list of Span objects representing parts of the sentence

text

list of strings containing text from the input document if returned by the API. If not, it will be None.

__init__(label, spans, sections, _doc, text=None)
property parts

A mapping of spans to text as returned by the API. If the text was not returned by the API, the values will be None.

property sections

All sections containing this sentence.

class emtellipro.data.Span(start, end)

Bases: object

A span representing a slice of the input document text.

>>> s = Span(1, 45)
>>> start, end = s
>>> start, end
(1, 45)
>>> slice(*s)
slice(1, 45, None)
__init__(start, end)
property end

The end index of the slice (non-inclusive)

property start

The start index of the slice
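Since a Span unpacks into (start, end) and can be expanded with slice(*s), it can be applied directly to the document text. The NamedTuple below is a stand-in for emtellipro.data.Span so the sketch runs on its own; the sample text is invented.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in for emtellipro.data.Span, used so the sketch runs on its own."""
    start: int
    end: int  # non-inclusive, like a Python slice


text = "No acute fracture. Mild degenerative change."
span = Span(3, 17)

# slice(*span) expands to slice(3, 17), so this picks out text[3:17].
covered = text[slice(*span)]
```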

class emtellipro.data.TemporalityRelation(label, attributes, args, _doc)

Bases: emtellipro.data.Relation

A temporality relation linking a subject entity to a temporal entity.

__init__(label, attributes, args, _doc)
category

The category of the temporality relation, or None if it was not returned by the API.

modifiers

The modifier entities

Type

list

polarity

The polarity of the temporality relation, or None if it was not returned by the API.

subject

The subject entity

temporal_entity

The temporal entity

class emtellipro.data.User(username)

Bases: object

Instances of this class represent the response of the /user API endpoint.

__init__(username)
Parameters

username – the username as known by the server

emtellipro.exceptions module

All the exceptions raised by code in this library are contained in this module. They all share a base class: EmtelliproError.

exception emtellipro.exceptions.APIError(reason, task_id=None)

Bases: emtellipro.exceptions.EmtelliproError

Exception raised for unexpected API-related errors

reason

the reason for the error as provided by the server

task_id

when available, the task ID the error is related to

__init__(reason, task_id=None)
exception emtellipro.exceptions.AuthenticationError(reason, task_id=None)

Bases: emtellipro.exceptions.HTTPClientError

Exception for authentication errors.

exception emtellipro.exceptions.CancelledError

Bases: emtellipro.exceptions.EmtelliproError

Exception raised when an annotation task has been cancelled.

exception emtellipro.exceptions.EmtelliproError

Bases: Exception

Base class for all exceptions in this library.

exception emtellipro.exceptions.HTTPClientError(reason, task_id=None)

Bases: emtellipro.exceptions.APIError

Exception for generic client errors (i.e. HTTP 4xx errors).

exception emtellipro.exceptions.ServerError(reason, task_id=None)

Bases: emtellipro.exceptions.APIError

Exception for generic server errors (i.e. HTTP 5xx errors).

exception emtellipro.exceptions.TaskFailedError(reason, task_id=None)

Bases: emtellipro.exceptions.APIError

Exception raised when a task submitted has failed.

exception emtellipro.exceptions.TaskNotFoundError(reason, task_id=None)

Bases: emtellipro.exceptions.APIError

Exception indicating the server does not know about the task ID it’s been sent.

reason

textual representation of why the task was not found as returned by the server

exception emtellipro.exceptions.UnsupportedFormatForParsingError

Bases: emtellipro.exceptions.EmtelliproError

Exception raised when attempting to parse a result format we don’t provide a parser for in this SDK. This will usually occur when specifying the result format; if you’re interested in using our parsers, leave the result format unspecified.
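Because these exceptions form a hierarchy, except clauses should list the most specific classes first. The sketch below redefines a few of the classes as stand-ins that mirror the “Bases:” lines above, so it runs without the SDK; describe is a hypothetical helper and its messages are invented.

```python
# Stand-ins that mirror the "Bases:" lines above; not the SDK's own classes.
class EmtelliproError(Exception):
    pass


class APIError(EmtelliproError):
    def __init__(self, reason, task_id=None):
        super().__init__(reason)
        self.reason = reason
        self.task_id = task_id


class HTTPClientError(APIError):
    pass


class AuthenticationError(HTTPClientError):
    pass


class ServerError(APIError):
    pass


def describe(exc):
    """Classify an error, testing the most specific classes first."""
    try:
        raise exc
    except AuthenticationError:
        return "check credentials"
    except HTTPClientError:
        return "fix the request"
    except APIError as err:
        return f"API error: {err.reason}"
    except EmtelliproError:
        return "other SDK error"
```

If the HTTPClientError clause came before AuthenticationError, authentication failures would be reported as generic client errors, since AuthenticationError is a subclass of HTTPClientError.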

emtellipro.utils module

These are some convenience functions for some common tasks.

emtellipro.utils.load_annotated_json(json_file)

Parse a JSON file to reload AnnotatedDocument instances that were saved using json.dumps([annotated_doc.as_dict()])

Parameters

json_file – file-like object containing JSON data

Returns: list of AnnotatedDocument instances
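The save format this function expects is a JSON list holding each document's as_dict() output. The round trip can be sketched with a stand-in class that exposes only __init__(document_dict) and as_dict(); the dictionary contents are invented, and the reload line only approximates what load_annotated_json is described as doing.

```python
import io
import json


class FakeAnnotatedDocument:
    """Stand-in exposing only __init__(document_dict) and as_dict()."""

    def __init__(self, document_dict):
        self._document_dict = document_dict

    def as_dict(self):
        return self._document_dict


docs = [FakeAnnotatedDocument({"id": "doc-1"}), FakeAnnotatedDocument({"id": "doc-2"})]

# Save: one JSON list holding each document's original dictionary.
saved = json.dumps([doc.as_dict() for doc in docs])

# Reload: parse the list from a file-like object and rebuild the instances.
json_file = io.StringIO(saved)
reloaded = [FakeAnnotatedDocument(d) for d in json.load(json_file)]
```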

emtellipro.utils.read_files(file_paths, category='Radiology', subcategory='CT', encoding=None, section_label=None, doc_id_filepath=False, filetype=None)

Read files and create InputDocument objects from each.

Parameters
  • file_paths – paths for text files containing documents to be submitted

  • category (optional) – category to set for all documents read from the files

  • subcategory (optional) – subcategory to set for all documents read from the files

  • encoding (optional) – encoding used when reading the files. Default is None, which means ‘autodetect’. If you know what the encoding is ahead of time, it’s best to set this to avoid the overhead of detecting the encoding.

  • section_label (optional) – if given, this is the section label used for all documents read in.

  • doc_id_filepath (optional) – if set to True, this will use the filepath of the input file as the document ID.

  • filetype (optional) – the filetype to use for input documents; if not provided, the file extension will be used.

Returns: tuple containing list of InputDocument objects, and mapping from file path to document ID

emtellipro.version module

Version information for package to be used both by the setup.py and the package itself.

Module contents

This is the SDK for interacting with the Emtellipro engine API. The main entry point is the Emtellipro class which will handle all the details for submitting documents and retrieving results.

class emtellipro.Emtellipro(accesskey, secretkey, server='https://api.emtelligent.com:50001', max_retries=5)

Bases: object

This is the main way of interacting with the API. It handles all the necessary details for submitting documents for processing and retrieving results.

__init__(accesskey, secretkey, server='https://api.emtelligent.com:50001', max_retries=5)
Parameters
  • accesskey – the API key used to identify the user

  • secretkey – the API key used to sign messages

  • server (optional) – the URL of the server to connect to. This is used only for testing.

  • max_retries (optional) – the number of times to retry failed API requests; there may be failures due to network issues, so this should be a positive integer.

check_task(task_id)

Create a ResultFuture instance from a task_id.

Parameters

task_id (str) – the task ID. This is available on ResultFuture instances as the .task_id attribute, however it is simply the task ID as returned by the emtelliPro API’s /submit call, so that can be provided regardless of how it was obtained (e.g. if you obtained it from our Java SDK, that will work, too).

submit(documents, features=None, document_type='plain', max_shard_size=inf) → Iterable[emtellipro.ResultFuture]

Submit documents for annotating.

Parameters
  • documents – an iterable of InputDocument objects to be annotated

  • features – a list of features to enable in the processing of the documents. Default of None means “enable all”.

  • document_type – string containing the document type to be used for all documents which don’t specify their own type. Default is 'plain'

  • max_shard_size (int) – maximum number of documents to submit at once. Documents are sharded based on size regardless of this setting, but this is also useful to limit how many documents there will be in the response from the engine, to help control memory usage when parsing the results.

Returns: an iterable of ResultFuture objects, each representing the current status of the annotation process for a submitted shard of documents.
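The effect of max_shard_size can be pictured as splitting the input iterable into batches of bounded length. The generator below is a minimal sketch of count-based sharding only; the name shard is hypothetical, and the SDK's additional size-based sharding is not modelled.

```python
from itertools import islice


def shard(documents, max_shard_size):
    """Yield lists of at most max_shard_size items from any iterable."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, max_shard_size))
        if not batch:
            return
        yield batch


shards = list(shard(["d1", "d2", "d3", "d4", "d5"], max_shard_size=2))
```

This sketch needs an integer limit, because islice does not accept a float such as inf; an unbounded default would have to be handled separately.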

property user

The user information as known by the Emtellipro server.

class emtellipro.ResultFuture(task_id, api, num_docs=None)

Bases: object

This class mimics asyncio.Future and concurrent.futures.Future in terms of some of the available methods and semantics, but you cannot await or yield from any of its methods.

ResultFuture instances are created by the Emtellipro class and should not be created directly.

Note: ResultFuture objects are pickle-able.

task_id

The task ID as returned by the emtelliPro API. This can be stored and then used with Emtellipro.check_task() to recreate a ResultFuture instance.

Type

str

__init__(task_id, api, num_docs=None)
Parameters
  • task_id – a string representing the ID of the task submitted to the server.

  • api – instance of _api.Api used for retrieving results and checking status.

  • num_docs – the number of documents this result future is handling, if known.

cancel()

Cancel the task being represented by this ResultFuture.

cancelled()

Return True if the task was cancelled.

done(timeout=None) → emtellipro.ResultFutureStatus

Return a ResultFutureStatus that evaluates to True if the task was cancelled or finished processing.

Parameters

timeout (int) – when checking the status of the job, the server may keep the connection open up to this amount of time or until the job is completed, whichever is quicker.
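Since the value returned by done() can be tested for truth, a caller can poll until the task finishes. The helper below is a sketch that works with any object exposing done(); wait_until_done, its parameters, and the FakeFuture test double are hypothetical and not part of the SDK. The fake returns a plain bool where the real method returns a ResultFutureStatus.

```python
import time


def wait_until_done(future, poll_interval=1.0, max_polls=60):
    """Poll future.done() until it is truthy or max_polls is exhausted."""
    for _ in range(max_polls):
        status = future.done()
        if status:  # relies on the truth value of the returned status
            return status
        time.sleep(poll_interval)
    raise TimeoutError("task did not finish within the polling budget")


class FakeFuture:
    """Test double that reports done on the third call; not part of the SDK."""

    def __init__(self):
        self.calls = 0

    def done(self, timeout=None):
        self.calls += 1
        return self.calls >= 3


fake = FakeFuture()
final_status = wait_until_done(fake, poll_interval=0)
```

With a real ResultFuture, passing a timeout to done() lets the server hold the connection open, which may reduce the number of polls needed.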

property engine_version

The engine version as returned in the result. This can only be accessed after calling .result(); accessing this attribute before will result in an AttributeError.

property num_docs

The total number of documents submitted for processing.

raw_result(result_format='emtellipro-json-2')

Return the raw text result from the API, unparsed. The text is cached, so multiple calls to this method will not result in multiple API calls.

If you’d like the parsed result, use .result() instead.

result()

Return a generator of AnnotatedDocument instances that were processed by Emtellipro API.

If you want to iterate over the generator multiple times, simply call this method multiple times and new generator instances will be created. The Emtellipro API response is stored after the first call, so no extra API calls will be made.

If you’d like the unparsed raw API response, use .raw_result() instead.

class emtellipro.ResultFutureStatus(status, prev_progress)

Bases: object

This class is returned by ResultFuture.done, and contains information about the processing status of the job.

Note: this class implements __bool__ with the same semantics as .done

done

whether the job has completed processing (a successful, failed, or cancelled job are all considered “done”)

Type

bool

success

whether the job has completed successfully; note if the job is not done, this will be false

Type

bool

progress

a number in [0, 1] representing the fraction of completed documents

Type

float

__init__(status, prev_progress)