Changelog
All notable changes to the emtelliPro Python SDK and associated files will be documented in this file. This SDK follows Semantic Versioning, although version identifiers adhere to standard Python packaging guidelines specified in PEP 440.
5.28.0 (2023-10-23)
Added
JSON input files can now contain a "filepath" item which points to a file containing the actual data for the document. This can be either a .txt or .pdf file. The path should not be in the same directory as the JSON file, or else there’s a risk the document will be processed twice.
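As a rough illustration of the "filepath" item, the sketch below writes a JSON input file that points at a separate data file and refuses to do so when both would sit in the same directory. Only the "filepath" key comes from this changelog; the helper name, the directory layout, and the assumption that no other keys are needed are hypothetical.

```python
import json
from pathlib import Path


def write_filepath_document(json_dir: Path, data_file: Path) -> Path:
    """Write a JSON input file whose "filepath" item points at data_file.

    Hypothetical helper: only the "filepath" key is taken from the changelog.
    """
    # Keep the data file out of the JSON file's directory so that the
    # document is not picked up twice.
    if data_file.resolve().parent == json_dir.resolve():
        raise ValueError("data file must not be in the same directory as the JSON file")

    json_path = json_dir / (data_file.stem + ".json")
    json_path.write_text(json.dumps({"filepath": str(data_file)}), encoding="utf-8")
    return json_path
```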
Changed
Updated JSON example in documentation to show all the possible metadata options.
New CCD config option report-subsection-sections in config section [file_handlers.ccd], to control which sections the report subsection feature is applied in. See new CCD config docs for details.
5.27.0 (2023-08-22)
Added
When the database client detects that it’s running outside a virtual environment it will print a warning to let the user know.
APIError exceptions (and subclasses) now contain a task_id attribute when available.
The JSON files used as input for the database client can now contain structured metadata. This metadata must be stored as a mapping in a "structured_metadata" key and will be saved to the documentstructuredmetadata table; if this key is missing, then no structured metadata will be stored, as before.
The database client will now include version numbers when listing installed plugins.
The CCD plugin has now been merged into the SDK and the database client now natively supports CCD files. The separate plugin is now ignored if it’s installed.
Deprecated
The database client’s --bulk-insert and --fast-postgres options are deprecated; the bulk-insert mode will be the only insert method moving forward, since it supports all database types and is faster.
5.26.0 (2023-03-17)
Database migration is required for this version.
Added
The radplaybook-ontology emtelliPro feature is now supported.
There’s now support for the date_time argument of medication relations. This is stored in emtellipro.data.MedicationRelation.date_times (as a list), and in the medicationrelationdatetime table.
The database client is now able to save the unprocessed results from emtelliPro to disk instead of in a database. You can use this by setting --database raw://SOME_DIRECTORY_PATH for the process command; the create-db command is not necessary because the process command will automatically create the target directory if missing.
Support has been added for the guidance attribute of found entities. This is now found in the SDK as FoundEntity.guidance (where it is stored as a string), and it’s stored in the database in the new foundentity.guidance column. The new feature is called entity-guidance. This change will require a database migration.
Support is now available for the ontology_versions returned by emtelliPro for each processed document. These are available as AnnotatedDocument.ontology_versions and in the ontologyversions table. The database migration will insert rows in the ontologyversions table based on the documentconcepts table, but with release_version = NULL since previous ontology version data is not available.
5.25.0 (2022-10-17)
Added
You can now use -q as an alias for --quiet for the database client.
Error log messages are now printed to STDERR when emtellipro-db-client process --quiet is used; to hide the error messages you can pass --quiet (or -q) twice. In the config file, this can now be controlled by setting quiet = 2 (using the old true and false values still works, but integer values are also supported).
In the emtellipro.data module, there are now FoundEntity.attributes, Relation.attributes, and Relation.arguments instance attributes which tell you the names of the instance attributes which are entity or relation attributes and relation arguments, respectively; each Relation subclass has these attributes. This provides some support for introspection to allow writing generic relation-handling code.
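The quiet levels described above can be pictured with a small sketch. It assumes that the old true and false values correspond to levels 1 and 0; that mapping, and the function names, are illustrative guesses rather than the client's actual implementation.

```python
def quiet_level(value) -> int:
    """Normalise a quiet setting to an integer level (assumed mapping)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return 1 if value else 0
    if isinstance(value, int):
        if value < 0:
            raise ValueError("quiet level cannot be negative")
        return value
    raise TypeError(f"unsupported quiet value: {value!r}")


def shows_progress(level: int) -> bool:
    # One --quiet hides the progress bars.
    return level < 1


def shows_errors(level: int) -> bool:
    # Error messages still reach STDERR until --quiet is given twice.
    return level < 2
```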
Changed
The JSON file loader in the database client now validates input JSON documents and provides descriptive error messages.
Document that Python 3.7 is now the minimum supported version, since Python 3.6 is no longer officially supported.
The emtellipro_flatten_json example uses the new data model metadata to look up attributes and arguments. New found entity attributes, new relation types, and new relation attributes and arguments will be handled automatically as long as their values fall into the general formats already supported. There are some changes to table names and column order. Relation tables are now produced only if non-empty.
When cancelling tasks, the HTTP PATCH method is now used instead of GET; both methods are still supported by the API, but GET is deprecated.
Fixed
Filenames with extra dots in them would previously not have the file extension parsed correctly, and so would not be read.
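For illustration, one common way to read the extension of a filename that contains extra dots is to take only the final suffix. This is a sketch using the standard library, not necessarily how the SDK parses filenames; lower-casing the result is an added assumption.

```python
from pathlib import Path


def file_extension(filename: str) -> str:
    """Return the final extension, ignoring earlier dots in the name."""
    # Path.suffix yields only the last ".xyz" component.
    return Path(filename).suffix.lower()
```

With this approach a name such as report.v2.final.txt yields ".txt" rather than being split at the first dot.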
5.24.0 (2022-07-04)
Added
When running emtellipro-db-client process --help, the defaults for all options are now printed.
The database client now takes a -c / --config option which allows you to pass in a configuration file containing options. All command-line options are supported (simply remove the -- from the beginning of an option to get the config file version).
The emtellipro-db-client process command now accepts a --filetype option which tells it to treat all input files as if they had the given file type (thus disabling detecting file type based on file extension). The associated config file option is called filetype. See process --help for available file types (since it depends on installed plugins).
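The naming rule for config file options (drop the leading -- from the command-line option) can be sketched as below. The helper is hypothetical and only illustrates the rule as stated; it makes no claim about the config file syntax itself.

```python
def config_key(cli_option: str) -> str:
    """Derive the config file option name from a long command-line option."""
    if not cli_option.startswith("--"):
        raise ValueError(f"expected a long option starting with '--': {cli_option!r}")
    # Only the leading "--" is removed; the rest of the name is unchanged.
    return cli_option[2:]
```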
Changed
API requests will now automatically time out after 90 seconds.
5.23.0 (2022-05-17)
Database migration is required for this version.
Added
The SDK’s version is stored in the new processingdetails.sdk_version column.
The emtellipro-db-client process --doc-id-filepath option now supports JSON and JSONL files.
Support for the “reported event” relation in examples/emtellipro_flatten_json/.
Support for the snomedicd10cm-ontology feature has now been added.
There is now support for the “reported event” relation. This is accessible using emtellipro.data.ReportedEventRelation, and in the database in the reportedeventrelation table, with arguments in the reportedeventrelationtoentity, reportedeventrelationfromentity, reportedeventrelationmodifier, and reportedeventrelationtimeexpression tables. The feature name is reportedevent-relations.
Example examples/emtellipro_flatten_json/ which reads emtelliPro JSON result files and uses the SDK’s data model to convert them to a flatter format.
The database client now stores the engine version returned by emtelliPro in the new processingdetails.engine_version column; if not returned, this will be NULL.
The documentmetadata table now has 2 new columns: subject_gender and requestor, which can be populated by providing the same field names in the input documents’ metadata.
Fixed
Some database errors were causing the database client to exit with code 0 instead of 5 (even though the error was printed and logged).
5.22.0 (2022-03-03)
Added
The database client now accepts a process --doc-id-filepath flag which tells it to use the filepath of the input document as the document ID when submitting to the server; this is currently only supported for .txt documents.
The database client now allows configuration of the logging path and level through the --log-file PATH and --log-level LEVEL options, which go in the same position as the --database option (i.e. in between emtellipro-db-client and the subcommand); see emtellipro-db-client --help for details.
The database client now takes a process --polling-frequency SECONDS option which adjusts how often it polls emtelliPro for status updates on the job processing status; by default it waits for 1 second between status checks, but for jobs with very large documents it might make sense to increase that value.
Fixed
Previously it was not possible to run multiple database clients in parallel using the --no-bulk-insert option due to uniqueness constraints on the concept table; this was not an issue with --bulk-insert, and now --no-bulk-insert has been fixed to allow multiple clients to insert into the concept table simultaneously.
5.21.0 (2022-01-27)
Database migration is required for this version.
Added
In addition to CSV and JSONL output from the database client, it now also supports JSON output. You can use this by setting --database json://SOME_DIRECTORY_PATH for the process and create-db commands.
The simple client now takes a --max-retries option in the same position as the --server option, and the database client now takes the same option for the process command. This option specifies the number of times to retry failed API requests, and is enabled by default, with a default value of 5, using exponential backoff between retries. There is a maximum wait time between retries of 120 seconds, so setting --max-retries to large values will eventually result in multiple retries at 120 second intervals. The intention behind this option is that API requests may fail due to network issues, so this will allow continued processing in the case of transient network errors.
The database client now accepts a process --store-failed option which will store details about documents that failed processing (i.e. ones where processing_status returned by emtelliPro is "error").
The database now contains a document.processing_status column which contains the processing status returned by emtelliPro for the document. This will usually be "success" unless --store-failed is passed to the database client’s process command, in which case it may be "error".
The database client has improved error handling and reporting about errors when submitting reports for processing.
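To make the --max-retries behaviour concrete, here is a minimal sketch of a capped exponential backoff schedule. The default of 5 retries and the 120 second ceiling come from the entry above; the 1 second starting delay and the doubling factor are assumptions chosen for illustration, not the SDK's actual values.

```python
def retry_delays(max_retries: int = 5,
                 base_delay: float = 1.0,
                 max_delay: float = 120.0) -> list:
    """Return the wait (in seconds) before each retry attempt.

    base_delay and the factor of 2 are assumed; max_delay caps every wait.
    """
    return [min(base_delay * 2 ** attempt, max_delay)
            for attempt in range(max_retries)]
```

Under these assumptions the waits grow until they reach the cap, after which every further retry happens at 120 second intervals, which is the effect the entry describes for large --max-retries values.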
Fixed
The database client would raise an error when saving temporality relations using --no-bulk-insert due to an incorrect relationship definition. This affects version 5.20.0 and later.
5.20.0 (2022-01-17)
Database migration is required for this version.
Added
Temporality relations are now supported. They’re found in emtellipro.data.TemporalityRelation, and in the database in the temporalityrelation table, with the modifiers in the temporalityrelationmodifier table.
5.19.0 (2021-12-08)
Database migration is required for this version.
Added
The documentmetadata table now has 3 new columns: author_name, subject_name, and subject_dob.
Fixed
The migrate command for the database client would raise an exception when trying to migrate CSV or JSON files (which is not supported). It now prints a simple error message instead.
There was a performance regression introduced in 5.17.0 when saving to SQL Server, which caused significantly reduced storage speed. This was improved in 5.18.0 and has now been completely fixed.
5.18.0 (2021-11-26)
Added
Support has been added for the question_status attribute of found entities. This is now found in the SDK as FoundEntity.question_status (where it is stored as a string), and it’s stored in the database in the new foundentity.question_status column. The new feature is called entity-question-status. This change will require a database migration.
Fixed
There are now some performance improvements when saving to SQL Server, especially when using Microsoft’s ODBC driver instead of FreeTDS.
Creating an InputDocument with an empty string as the text raised a ValueError, but now empty strings are allowed.
5.17.1 (2021-10-13)
Fixed
There was a bug introduced in 5.17.0 when exporting to CSV which caused a crash; this has now been fixed.
The --features option was not parsed properly in 5.17.0; this has now been fixed.
5.17.0 (2021-10-08)
Added
The database client’s process command now accepts a --max-save-shard-size option which specifies the number of reports to store to the database at once; this is ideally used for lowering the default value from 50, since large numbers can cause database errors.
You can now press CTRL-C when the database client is running and it will finish saving to the database what it has so far, and will exit cleanly. Pressing CTRL-C a second time will cause it to exit immediately.
Changed
The database client is now multi-threaded, so it can read files, submit them for processing, and store results to the database in parallel.
The database client’s --max-submit-shard-size default is now 100, to avoid having the database thread waiting too long for results.
5.16.0 (2021-09-23)
Added
Support for key-pair authentication for Snowflake has been added. All commands now take --snowflake-private-key-path as an option alongside --database; this is the path to the private key used for connecting to Snowflake. If using this option, omit the password in the connection URL. To be consistent with SnowSQL, the passphrase for this key file can be passed using SNOWFLAKE_PRIVATE_KEY_PASSPHRASE or SNOWSQL_PRIVATE_KEY_PASSPHRASE in the environment, or if those environment variables are not set, the client will prompt for the passphrase if necessary. Both encrypted and unencrypted key files are supported.
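The passphrase lookup described above might be sketched as follows. The two environment variable names come from the entry; the order in which they are checked, the prompt text, and the function itself are assumptions, and the real client only prompts when the key actually needs a passphrase.

```python
import os
from getpass import getpass

# Variable names as given in the changelog; checking them in this order is an assumption.
PASSPHRASE_VARIABLES = (
    "SNOWFLAKE_PRIVATE_KEY_PASSPHRASE",
    "SNOWSQL_PRIVATE_KEY_PASSPHRASE",
)


def key_passphrase(environ=None, prompt=getpass) -> str:
    """Return the key passphrase from the environment, else ask the user."""
    environ = os.environ if environ is None else environ
    for name in PASSPHRASE_VARIABLES:
        if name in environ:
            return environ[name]
    # Neither variable is set, so fall back to an interactive prompt.
    return prompt("Private key passphrase: ")
```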
5.15.0 (2021-09-20)
Added
There is now a plugin architecture based on signals.
Improved error handling when running create-db on an existing database, and checking whether the database schema version stored in the alembic_version table has been accidentally deleted.
If the alembic_version table has been accidentally cleared, the create-db command will now allow the user to re-insert the database schema version in that table.
The simple client now has a cancel command that allows cancelling one or more tasks. See emtellipro-client cancel --help for available options.
The simple client will now print the last unfinished task ID when it receives a SIGINT (e.g. from CTRL-C) while a job is in progress.
Fixed
There was a memory leak in the SDK which has now been fixed, so running the simple client or database client will now use a stable amount of memory.
5.14.1 (2021-08-09)
Fixed
The last version of the SDK introduced a bug where JSON documents weren’t counted properly for the progress bar, and JSON document IDs were not properly attached to filenames, leading to an error message. This is now fixed.
5.14.0 (2021-07-29)
Added
The database client now supports saving output to CSV or JSONL files. Simply use --database csv://PATH_TO_DIRECTORY or --database jsonl://PATH_TO_DIRECTORY for the create-db and process commands, instead of the regular database URLs. The directory will be populated with files named after each table in the regular database schema, and all ID columns will use UUIDs in standard format (36 characters, with the hyphens).
Changed
The database schema when saving to Snowflake now uses UUIDs defined as CHAR(36) for all the ID columns (and associated foreign keys). These are generated locally by the database client and will lead to significantly faster insertions for Snowflake users. This will require running the migrate command to update existing Snowflake databases. Other databases will continue using auto-incrementing integer IDs.
When using --max-submit-shard-size with the database client, it will now only load the specified number of input documents at once (previously all input documents were loaded unconditionally, and this option only limited how many were submitted in each shard). This option can now be used to reduce memory usage when there are many input documents.
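The memory benefit of loading only one shard at a time can be illustrated with a lazy batching generator. This is a generic sketch of the idea, not the client's code; the function name is hypothetical.

```python
from itertools import islice


def iter_shards(documents, max_shard_size: int):
    """Yield lists of at most max_shard_size documents, loading them lazily."""
    if max_shard_size < 1:
        raise ValueError("max_shard_size must be at least 1")
    iterator = iter(documents)
    while True:
        # Only the next shard is pulled from the source at any one time.
        shard = list(islice(iterator, max_shard_size))
        if not shard:
            return
        yield shard
```

Because the source is consumed through an iterator, at most one shard of documents is held in memory while it is being submitted.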
5.13.0 (2021-07-15)
Added
The database client’s process command now accepts a --skip-database-checks option which skips checks for database consistency (i.e. checking all the tables exist and have the correct columns).
Fixed
When reading input documents from JSON files and storing results using --bulk-insert, the database client failed to store document metadata into the documentmetadata table if the set of metadata keys wasn’t the same across all JSON files. This has now been fixed and JSON files can have differing metadata keys.
5.12.0 (2021-06-24)
Added
The database client will now use INSERT IGNORE in MySQL and INSERT OR IGNORE in SQLite to speed up inserts into the concepts table when using the --bulk-insert option.
The database client now accepts a --max-submit-shard-size option which allows restricting the number of documents submitted at once to the API. If unset, the client will continue determining shard size based on document size, so this is mostly useful for restricting shard size further than normal.
Fixed
On certain reports, when using the --bulk-insert option the database client would attempt to insert duplicate concepts in the concepts table which would fail due to the uniqueness constraints on the table. Inserted concepts are now de-duplicated before insertion to fix this issue.
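De-duplicating rows before a bulk insert can be sketched as below. The columns that make a concept unique are not stated in this changelog, so the (ontology, code) key used in the usage note is purely hypothetical.

```python
def deduplicate(rows, key):
    """Keep the first row for each distinct key, preserving input order."""
    seen = set()
    unique_rows = []
    for row in rows:
        row_key = key(row)
        if row_key not in seen:
            seen.add(row_key)
            unique_rows.append(row)
    return unique_rows
```

For example, deduplicate(concepts, key=lambda c: (c["ontology"], c["code"])) would drop repeated concepts under that assumed uniqueness rule.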
5.11.0 (2021-06-18)
Added
There is now support for the “duration” and “indication” arguments for medication relations. Like all the other medication relation arguments, these will also be stored as lists. They can be found in the MedicationRelation.duration and MedicationRelation.indication attributes and medicationrelationduration and medicationrelationindication tables, respectively; the new tables will require a database migration.
5.10.1 (2021-06-08)
Changed
Stopped Snowflake’s SQLAlchemy driver from logging INSERT queries to errors.log
Improved performance of Snowflake insertions
Fixed
Specified SQLAlchemy version must be at least 1.4
5.10.0 (2021-06-08)
Added
Snowflake databases are now supported for storing data using the database client
Bulk inserts (the --fast-postgres option) are now available for all databases, and the option is now called --bulk-insert; this will lead to some speed improvements especially in the cases of databases which support returning primary keys on inserted rows without an extra query (such as PostgreSQL and SQL Server). Postgres still has the best support in this case since it allows inserting missing entries into the concepts table in a single query.
Regular storing of documents one at a time (without --bulk-insert) also has received some speed improvements by grouping inserts more efficiently.
5.9.0 (2021-05-13)
Added
The foundentity and assumedentity tables now contain a document_id column with a foreign key referencing document.id; this should make it simpler to query found entities and assumed entities in a document without having to join to the entity table (although the entity table is unchanged so older queries will continue working). This will require a database schema migration using the migrate command.
Added AnnotatedDocument.processing_status attribute which stores the response of the engine for the “processing_status” JSON attribute. This is used for checking for failed reports in the database client (in the case that the entire task isn’t marked as failed).
Changed
Modified the type of the document.json_representation and document.text columns in MySQL from TEXT to LONGTEXT, since depending on the document size, TEXT might not be able to hold all of the data.
When you enable --fast-postgres with an unsupported database (i.e. anything other than postgres), you now get a clear error message.
Minimum required Python version is now 3.6; all previous versions are now end-of-life and are no longer receiving security updates.
5.8.0 (2021-03-09)
Added
If SQL Server has full-text search support installed, then the database client will now create a FULLTEXT index on the sectionlocation.text and sentencelocation.text columns. Note that you will need to run the migrate command to create these indexes. This matches the existing support for similar indexes in MySQL and Postgres. If full-text support is not installed, the migrate and create-db commands will indicate that no full-text indexes were created.
5.7.0 (2021-02-25)
Added
Support for the snomedicd10-ontology feature has now been added.
The new known_ambiguity attribute of found entities is now stored in the database as the foundentity.known_ambiguity column, and is available in the FoundEntity.known_ambiguity attribute in the SDK. You will need to run the database client’s migrate command when updating.
Fixed
The issue with the extra COMMIT being sent to SQL Server when outside of a transaction has now been fixed.
5.6.0 (2021-02-04)
Added
PDFs stored in JSON files are now supported; this requires the PDF to be Base64-encoded and stored under a pdf key in the JSON; the text option must be omitted in that case, or else the JSON will be considered to contain plaintext data instead.
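A minimal sketch of building such a JSON document is shown below. Only the pdf key and the Base64 encoding come from the entry above; the function name is hypothetical and any other keys the client may accept are left out.

```python
import base64
import json


def pdf_json_document(pdf_bytes: bytes) -> str:
    """Serialise raw PDF bytes as a JSON document with a Base64 "pdf" key."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    # No "text" key is written, so the document is not treated as plaintext.
    return json.dumps({"pdf": encoded})
```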
Fixed
The database client was storing document text when using the --fast-postgres option even when it shouldn’t have been. Ensure you’re using --store-reports if you’d like to store report text. If the input document is a PDF, the document text will be stored regardless, to ensure the spans make sense.
5.5.0 (2020-12-05)
Added
Both the simple client and the database client now support the --quiet flag when processing reports, which will hide the progress bars.
Both clients now support the timeout parameter on /status calls to the engine, potentially speeding up processing of single documents.
Fixed
The clients were making some unnecessary status calls at the end of processing, and that’s now improved. Calling ResultFuture.result() and ResultFuture.raw_result() may now raise TaskNotFoundError when attempting to retrieve a result for an invalid task ID.
5.4.3 (2020-11-27)
Fixed
Auth was signing all headers, and now only signs those required. This will avoid auth failures in some network configurations.
5.4.2 (2020-11-05)
Fixed
There was a typo in an index name which was confusing (ix_sectionlocationlocation_text, which has too many locations).
5.4.1 (2020-11-05)
Fixed
There was a misalignment of sentence locations and sentence spans when storing sentences in the database when using the --fast-postgres option; this has occurred since 5.3.0. This has now been fixed, and checks have been added to ensure this sort of issue will not recur with other similar tables.
5.4.0 (2020-11-03)
Added
Added two new Postgres GIN indexes on the sentencelocation.text and sectionlocation.text columns.
Changed
The Postgres B-tree indices on sentencelocation.text and sectionlocation.text were removed because there is a maximum size for these indices and sometimes the text in those columns exceeds the maximum size supported by the index.
5.3.0 (2020-10-19)
Added
When using the database client to ingest documents from a database, you can now provide the --text-is-pdf option to tell it the ‘text’ column contains raw PDF bytes, rather than plaintext.
Documentation has been added for the new PDF ingest from database column feature.
The database client now has a process --store-sections-and-sentences option which will store section and sentence text in sectionlocation.text and sentencelocation.text columns, respectively. In the case that sentences or sections are discontinuous, their different parts will simply be joined using single spaces.
There are now indices on the sectionlocation.text and sentencelocation.text columns. On MySQL these will be FULLTEXT indices, and on Postgres, they’re B-tree indices using the text_pattern_ops operator class.
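Joining the parts of a discontinuous sentence or section with single spaces can be pictured as follows. Representing each part as a half-open (start, end) character offset pair is an assumption made for this sketch; the changelog does not describe the span format.

```python
def span_text(document_text: str, spans) -> str:
    """Join the text of each (start, end) span with single spaces."""
    return " ".join(document_text[start:end] for start, end in spans)
```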
Fixed
The database storing progress bar was usually not very smooth when using the fast-postgres option. This was because the batches were too big and usually most docs fit into one batch. Now there’s a batch of size 50 when storing results which leads to a smoother progress bar.
5.2.1 (2020-07-27)
Fixed
There was a corner case where the database client detected that some documents failed to be processed, but didn’t log which ones.
5.2.0 (2020-07-03)
Added
There’s now support for arbitrary key-value metadata which can be returned by file handler plugins. This data is stored in the documentstructuredmetadata table where the keys and values are both unicode texts; you may want to cast the values to more useful types for queries.
5.1.0 (2020-06-09)
Added
DB client support for plugins that can add handlers for new file types.
More general DB client handling of file paths, especially to handle more than one JSON or JSONL file at once and to apply the recursive option to JSON, JSONL, or plugin-added file types.
5.0.0 (2020-05-01)
Added
Added support for umlsloinc-ontology as an available feature.
The necessity and modifier arguments for medications have been added, and are stored in the medicationrelationnecessity and medicationrelationmodifier tables, respectively.
Changed
ResultFuture.done() now returns a ResultFutureStatus object (which is truthy), and this object must be used now for checking progress and general status; ResultFuture.progress() has now been removed (use ResultFutureStatus.progress for this).
4.9.0 (2020-04-24)
Added
The database client can now store the JSON it received from the server for each document into a new document.json_representation column. This will require a migration using the migrate subcommand. Storing JSON can be enabled using the process --store-json command-line option.
4.8.0 (2020-04-04)
Added
All HTTP client error response codes (4xx) from the emtelliPro server are now handled using an HTTPClientError exception if there isn’t a more specific exception for them. The clients will continue to exit with code 4 for these errors (unless there’s a more specific exit code for them).
There is now support for process_with_section_labels. Both clients now accept a --section-label option which will be used for all documents being read in. This is passed verbatim to the server. Support for this option has also been added using the section_label parameter of InputDocument objects. The section label is now stored in a new document.section_label column in the database. NOTE: this change requires you to use the migrate command to add the column to your existing database.
4.7.0 (2020-03-23)
Added
Added support for medication relations. It’s controlled by the medication-relations feature. New tables are medicationrelation, medicationrelationdosage, medicationrelationfrequency, medicationrelationmode, medicationrelationquantity, and medicationrelationroute. It can also be accessed using AnnotatedDocument.relations['medication'].
4.6.0 (2020-03-18)
Added
Support for the entity-measurement-unit feature. This means there’s a new table where it gets stored: foundentitymeasurementunit, and the FoundEntity object now has a FoundEntity.measurement_unit attribute.
There’s now an emtellipro-db-client migrate sub-command which runs migration scripts against the database to upgrade it to the current schema, copying data as necessary; this currently only supports databases going back to v4.3.0, and going forward every new version of the database client will allow migrating the database. Please ensure you have a backup of the database just in case.
4.5.0 (2020-03-13)
Added
The database client now has a --version option.
Changed
The API server option --server on the two clients is now a required option with no default; this is to avoid accidentally sending data to the wrong jurisdiction.
Improved progress bars so they are displayed more reliably.
Fixed
It’s now possible to use --help on all subcommands without setting a value for --database on emtellipro-db-client.
4.4.0 (2020-02-20)
Added
The database client now checks if the target database contains all necessary columns before submitting documents, and exits early with an error message if any are missing. There is also a new exit code (5) for this and other database-related errors.
Changed
There’s now a foundentitytype table which contains the ontology and type name pairs for each found entity. This replaces the foundentity.type_name_* columns, and allows for any future ontologies to be added without changing any table’s schema.
The FoundEntity.type_name dictionary will now contain whatever was returned by the API, and no type name will ever be None. If it wasn’t returned by the API, it won’t be present in that dictionary.
4.3.0 (2020-02-04)
Added
Added support to the database client for different exit codes based on the error.
Improved error handling and logging of exceptions, so now almost every exception ends up being logged to error.log, including details about which documents failed. See the process --help command option for details.
4.2.3 (2020-01-28)
Fixed
When attempting to store chartdate in an SQLite database, there was an error about the sqlite driver only accepting Python datetime objects for storing in a DateTime column. Now when storing dates in an SQLite database, the dates are parsed using the dateutil library. For all other databases the dates are left as strings.
If a submission fails, the database client no longer crashes with an exception. It now tries to split each submission into individual documents to narrow down the failed document, and logs the failed document to errors.log.
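The approach of splitting a failed submission into individual documents can be sketched generically as below. The submit callable stands in for whatever sends documents for processing; it, the function name, and the returned structure are hypothetical and are not part of the SDK.

```python
def process_shard(documents, submit):
    """Submit a shard; on failure, retry one document at a time.

    Returns (results, failed), where failed lists (document, error) pairs
    for the documents that still fail when submitted on their own.
    """
    try:
        return list(submit(documents)), []
    except Exception:
        results, failed = [], []
        for document in documents:
            try:
                results.extend(submit([document]))
            except Exception as error:
                # A real client would log this document instead of crashing.
                failed.append((document, error))
        return results, failed
```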
4.2.2 (2020-01-21)
Fixed
There was an exception being raised when there weren’t any documents found. Now the client simply prints a message stating that zero documents were found.
4.2.1 (2020-01-16)
Fixed
The database client now stores the JSON/JSONL filename in the document.filename column.
There was an exception being raised when there weren’t any entities found in a document. This is now fixed.
4.2.0 (2020-01-15)
Added
The database client now accepts JSON input in a single JSON or JSONL file.
4.1.0 (2020-01-09)
Changed
Removed the --fetchall option from the database client, as it is no longer necessary for connecting to SQL Server. This also removes the dependency on records.
4.0.0 (2020-01-05)
Changed
Backwards incompatible: To ensure consistent naming of relations and locations, all relation and location types are now singular. This means that AnnotatedDocument.locations.keys() is now ['section', 'sentence'] rather than ['sections', 'sentences']. And for relations, AnnotatedDocument.relations['experiencers'] is now AnnotatedDocument.relations['experiencer']. All other relations were already singular, so experiencers was the odd one out.
If using the database client or emtellipro_db package, you can now find enums for all the type names in emtellipro_db.rowtypes.
The source_document_id column in the documentmetadata table is now a unicode string of length 255, rather than an integer.
Fixed
Inconsistent type_ values in the database when using --fast-postgres have now been fixed. They’re now all singular and match the names used without --fast-postgres.
Added
There’s now an emtellipro_db.rowtypes module that contains enums for all the different type names stored in type_ columns in the tables that have them.
3.1.1 (2019-12-25)
Added
A new option --fetchall to the Database client that is useful when processing documents taken from a database and the connection suffers from timeouts. This option will let you fetch all the documents in one query at the expense of a larger memory footprint. This option is currently required for retrieving documents from Microsoft SQL Server databases for processing.
3.1.0 (2019-12-18)
Added
A new database table for storing document metadata, called documentmetadata. It will be populated with extra fields provided using --sql-query for emtellipro-db-client.
The database client now accepts a --fast-postgres/--no-fast-postgres option which enables/disables a new batched insertion mode for saving to the database which uses PostgreSQL-specific SQL extensions. This should result in up to a 10x performance improvement.
emtellipro.data.InputDocument now takes a file_obj parameter, allowing callers to submit PDF files that don’t come from the file-system.
3.0.1 (2019-11-14)
Added
Updated documentation for the emtellipro-db-client Database Client
Other minor documentation updates/fixes
3.0.0 (2019-10-31)
Changed
Moved examples/database.py into its own module, and it’s now automatically installed when the complete package is installed. The database client is now callable using emtellipro-db-client or python3 -m emtellipro_db
Ported advanced client to use SQLAlchemy instead of peewee as the ORM, and also changed the database schema slightly to more easily allow for future additions
2.8.0 (2019-10-01)
Added
Added support for sections to the API. They can be accessed through AnnotatedDocument.locations['sections'].
Fixed
Since the API response may omit the category and subcategory attributes, the SDK now handles that case correctly by allowing those attributes to be None in AnnotatedDocument, and NULL in the database columns used by the advanced client. Previously, missing attributes would result in a KeyError.
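The difference between raising KeyError and falling back to None can be shown with a plain dictionary. This is a minimal sketch that assumes the response is a dict keyed by "category" and "subcategory"; the read_category helper is hypothetical and is not part of the SDK.

```python
def read_category(response: dict) -> tuple:
    """Return (category, subcategory), using None for any missing key."""
    # dict.get() returns None instead of raising KeyError when a key is absent.
    return response.get("category"), response.get("subcategory")


complete = {"category": "Radiology", "subcategory": "generic"}
partial = {}  # hypothetical response that omits both attributes

complete_result = read_category(complete)
partial_result = read_category(partial)
```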
2.7.0 (2019-09-10)
Added
Added support to the advanced client for ingesting documents for processing from a database
2.6.0 (2019-08-26)
Added
Added support for the umlshgnc-ontology feature
Added job IDs and job timestamps to advanced client
2.5.1 (2019-07-17)
Fixed
There was a typo in the mime-type for plaintext documents that caused an error in some situations on CentOS. The mime-type has now been fixed.
2.5.0 (2019-07-09)
Added
Added support for new features umlsnci-ontology and medcin-ontology.
2.4.0 (2019-06-13)
Added
There is now a new exception emtellipro.exceptions.TaskFailedError that is raised when trying to retrieve results of a task that failed.
The command-line clients now print some simple stats about how long it took to process the submitted documents.
2.3.0 (2019-04-04)
Added
Added support for qualifier relations to the SDK and database.py example. The relevant new class is emtellipro.data.QualifierRelation which is usable through annotated_document.relations['qualifier'].
Fixed
There was a bug which caused an empty list of features to be treated as requesting all features (only None means “all features”). This has been fixed and an empty list of features correctly requests no features.
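A truthiness check is one common way for None and an empty list to be conflated. The sketch below illustrates that pattern and its fix; the changelog does not say this was the actual cause, and the function names and feature names are hypothetical.

```python
ALL_FEATURES = ["entities", "relations", "sections"]  # hypothetical feature names


def resolve_features_buggy(features):
    # Bug: an empty list is falsy, so it is treated the same as None.
    if not features:
        return list(ALL_FEATURES)
    return list(features)


def resolve_features_fixed(features):
    # Only None means "all features"; an empty list means "no features".
    if features is None:
        return list(ALL_FEATURES)
    return list(features)
```

Testing identity against None keeps "not specified" distinct from "explicitly empty".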
2.2.0 (2019-03-08)
Added
Added a debug command to the example client which prints out internal debugging information from the state file.
Changed
The example client’s --access-key and --secret-key options now need to be set on the specific commands which use them (such as submit) instead of at the top level. This allows submit --help to work as expected and allows for commands which don’t need those options.
2.1.0 (2019-02-21)
Added
The type names dictionary for found entities now contains a “radlex” key. This has also been added to the relevant database table.
2.0.2 (2019-02-19)
Fixed
A bug was fixed where one couldn’t specifically request the followup relations.
2.0.1 (2019-02-14)
Fixed
A bug was fixed where one couldn’t specifically request the measurement and imagelink relations.
2.0.0 (2019-02-11)
Added
There is now support for two new relation types: measurements and image-links. They’re implemented similarly to the previous relation types.
Changed
The example client now shards reports, and produces a JSONL formatted output file.
The example client now allows continuing an unfinished submission using the continue subcommand.
The Emtellipro.submit() method now splits up the submitted documents so they fit in the maximum request size limit; this means that it now returns an iterable of ResultFuture instances, instead of just one.
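Splitting a submission to respect a size limit can be sketched as greedy batching. This is an illustration of the idea only, not the SDK's code: split_into_batches and the byte limit are hypothetical, and documents are represented as raw bytes for simplicity.

```python
from typing import Iterable, Iterator, List


def split_into_batches(documents: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Greedily group documents so each batch stays within max_bytes.

    A single document larger than max_bytes is yielded in a batch of its own.
    """
    batch: List[bytes] = []
    batch_size = 0
    for doc in documents:
        # Close the current batch if adding this document would exceed the limit.
        if batch and batch_size + len(doc) > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(doc)
        batch_size += len(doc)
    if batch:
        yield batch


docs = [b"a" * 40, b"b" * 40, b"c" * 40, b"d" * 100]
batches = list(split_into_batches(docs, max_bytes=100))
```

Each batch would correspond to one request, which is why a caller receives one result handle per batch rather than a single one.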
1.6.0 (2019-01-16)
Added
The example database.py script now saves the text of found entities in a foundentitytext table.
The example client now saves a mapping file along with the output file which contains the mapping from document ID to file path (space-separated).
The example client’s submit command learned --doc-id-filepath which sets the document ID to the filepath (for easier matching of files to annotations). If this flag isn’t set, the new mapping file can be used instead.
The example client’s submit command learned --recursive for recursively looking for files in any directories provided as input.
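Reading the space-separated mapping file back could look like the sketch below. It assumes one "document_id file_path" pair per line and document IDs that contain no spaces; neither detail is confirmed by the changelog, and parse_mapping is a hypothetical helper.

```python
def parse_mapping(lines):
    """Parse 'document_id file_path' lines into a dict."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first space only, so paths containing spaces survive.
        doc_id, path = line.split(" ", 1)
        mapping[doc_id] = path
    return mapping


sample = ["doc-1 reports/chest ct.txt\n", "doc-2 reports/mri.txt\n"]
mapping = parse_mapping(sample)
```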
1.5.1 (2019-01-10)
Fixed
The progress bar in the example client was incorrectly computed, and got to 100% too quickly. This has now been fixed so the progress displayed accurately represents the progress provided by the emtelliPro server.
1.5.0 (2018-12-24)
Changed
The document type can now be specified per InputDocument instance by passing the type_ parameter to __init__(). If not specified, the Emtellipro.submit() method will set it, so previous ways of setting the document type for all documents will continue to work.
utils.read_files() used a default encoding of UTF-8 for all files, but in some cases not all files will have the same encoding. It now autodetects the encoding for each file, unless explicitly told the encoding to use for all files.
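Per-file encoding handling with an explicit override can be approximated using only the standard library. Note that this sketch uses trial decoding over a short candidate list, which is a simpler technique than statistical autodetection and is not necessarily what utils.read_files() does; decode_with_fallback and its candidate list are hypothetical.

```python
def decode_with_fallback(raw: bytes, encoding=None, candidates=("utf-8", "cp1252")):
    """Decode raw bytes, honouring an explicit encoding if one is given."""
    if encoding is not None:
        return raw.decode(encoding), encoding
    for candidate in candidates:
        try:
            return raw.decode(candidate), candidate
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this final step never fails.
    return raw.decode("latin-1"), "latin-1"


utf8_text, utf8_enc = decode_with_fallback("naïve".encode("utf-8"))
legacy_text, legacy_enc = decode_with_fallback("naïve".encode("cp1252"))
```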
1.4.1 (2018-11-23)
Changed
Use Radiology/generic for default category/subcategory in database.py and example client.
Set default document type as ‘plain’ in database.py
1.4.0 (2018-11-20)
Changed
Use new multi-file format for submitting documents to emtelliPro server instead of older JSON-based one. There are no user-facing changes as a result of this.
The computation of document progress was changed to match changes to the API’s response for /status
Added sample_Discharge_summary_report.txt to example-data folder.
Added generic report subtype as default
1.3.0 (2018-11-15)
Added
examples/database.py gained a --store-reports flag on its process command which will enable storing of the original report text in the text column of the document table.
A foundentitylocation table was added to examples/database.py which maps found entities to locations (e.g. sentences) where they were found. These are the same locations as available in emtellipro.data.AnnotatedDocument.locations
Fixed
emtellipro.data.Span objects are now hashable, fixing an issue with their use as dictionary keys in FoundEntity.parts and SentenceLocation.parts
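Why hashability matters for dictionary keys can be shown with a small stand-in class. The Span below is a hypothetical frozen dataclass with assumed start and end fields; it is not the SDK's emtellipro.data.Span, whose definition is not shown in this changelog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a text span; not the SDK's actual class."""
    start: int
    end: int


# frozen=True (with the default eq=True) makes the dataclass generate __hash__,
# so equal spans hash equally and can be used as dictionary keys.
parts = {Span(0, 4): "lung", Span(5, 11): "nodule"}
lookup = parts[Span(0, 4)]
```

Without a hash consistent with equality, constructing the parts dictionary would raise TypeError for an unhashable type.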
1.2.0 (2018-11-09)
Initial post-Beta release