Changelog
All notable changes to the emtelliPro Python SDK and associated files will be documented in this file. This SDK follows Semantic Versioning, although version identifiers adhere to standard Python packaging guidelines specified in PEP 440.
5.28.0 (2023-10-23)
Added
JSON input files can now contain a "filepath" item which points to a file containing the actual data for the document. This can be either a .txt or .pdf file. The path should not be in the same directory as the JSON file, or else there’s a risk the document will be processed twice.
Changed
Updated JSON example in documentation to show all the possible metadata options.
New CCD config option report-subsection-sections in config section [file_handlers.ccd], to control what sections the report subsection feature is applied in. See new CCD config docs for details.
5.27.0 (2023-08-22)
Added
When the database client detects that it’s running outside a virtual environment it will print a warning to let the user know.
APIError exceptions (and subclasses) now contain a task_id attribute when available.
The JSON files used as input for the database client can now contain structured metadata. This metadata must be stored as a mapping in a "structured_metadata" key and will be saved to the documentstructuredmetadata table; if this key is missing, then no structured metadata will be stored, as before.
The database client will now include version numbers when listing installed plugins.
The CCD plugin has now been merged into the SDK and the database client now natively supports CCD files. The separate plugin is now ignored if it’s installed.
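As a rough illustration of the structured metadata entry above, the following sketch builds an input document as a Python dictionary and checks that "structured_metadata" is a mapping before serializing it as JSON. The helper name and the metadata keys and values (site, priority) are made up for illustration; the "text" key is the plaintext key mentioned in the 5.6.0 entry. The database client performs its own validation, which may differ from this check.

```python
import json


def has_valid_structured_metadata(document: dict) -> bool:
    """Return True if "structured_metadata" is absent or is a mapping.

    Hypothetical helper, not part of the SDK.
    """
    if "structured_metadata" not in document:
        return True  # key missing: no structured metadata is stored
    return isinstance(document["structured_metadata"], dict)


document = {
    "text": "Example report text.",
    # Illustrative keys and values only; not taken from the SDK docs.
    "structured_metadata": {"site": "clinic-a", "priority": "routine"},
}

# The JSON text that would be written to an input file.
serialized = json.dumps(document, indent=2)
```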
Deprecated
The database client’s --bulk-insert and --fast-postgres options are deprecated; the bulk-insert mode will be the only insert method moving forward, since it supports all database types and is faster.
5.26.0 (2023-03-17)
Database migration is required for this version.
Added
The radplaybook-ontology emtelliPro feature is now supported.
There’s now support for the date_time argument of medication relations. This is stored in emtellipro.data.MedicationRelation.date_times (as a list), and in the medicationrelationdatetime table.
The database client is now able to save the unprocessed results from emtelliPro to disk instead of in a database. You can use this by setting --database raw://SOME_DIRECTORY_PATH for the process command; the create-db command is not necessary because the process command will automatically create the target directory if missing.
Support has been added for the guidance attribute of found entities. This is now found in the SDK as FoundEntity.guidance (where it is stored as a string), and it’s stored in the database in the new foundentity.guidance column. The new feature is called entity-guidance. This change will require a database migration.
Support is now available for the ontology_versions returned by emtelliPro for each processed document. These are available as AnnotatedDocument.ontology_versions and in the ontologyversions table. The database migration will insert rows in the ontologyversions table based on the documentconcepts table, but with release_version = NULL since previous ontology version data is not available.
5.25.0 (2022-10-17)
Added
You can now use -q as an alias for --quiet for the database client.
Error log messages are now printed to STDERR when emtellipro-db-client process --quiet is used; to hide the error messages you can pass --quiet (or -q) twice. In the config file, this can now be controlled by setting quiet = 2 (using the old true and false values still works, but integer values are also supported).
In the emtellipro.data module, there are now FoundEntity.attributes, Relation.attributes, and Relation.arguments instance attributes which tell you the names of the instance attributes which are relation attributes and relation arguments, respectively; each Relation subclass has these attributes. This provides some support for introspection to allow writing generic relation-handling code.
Changed
The JSON file loader in the database client now validates input JSON documents and provides descriptive error messages.
Document that Python 3.7 is now the minimum supported version, since Python 3.6 is no longer officially supported.
The emtellipro_flatten_json example uses the new data model metadata to look up attributes and arguments. New found entity attributes, new relation types, and new relation attributes and arguments will be handled automatically as long as their values fall into the general formats already supported. There are some changes to table names and column order. Relation tables are now produced only if non-empty.
When cancelling tasks, the HTTP PATCH method is now used instead of GET; both methods are still supported by the API, but GET is deprecated.
Fixed
Filenames with extra dots in them previously did not have the file extension parsed correctly, and so were not read.
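To illustrate the kind of filename involved, here is a minimal sketch using the standard library's pathlib, which takes only the final dot-separated component as the extension. This is not the SDK's actual implementation (which the changelog does not show); the function name and filenames are hypothetical.

```python
from pathlib import Path


def file_extension(filename: str) -> str:
    """Return the lower-cased extension, based on the last dot only."""
    return Path(filename).suffix.lower()


# A filename with extra dots still resolves to its final extension.
extension = file_extension("report.2023.10.final.txt")
```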
5.24.0 (2022-07-04)
Added
When running emtellipro-db-client process --help, the defaults for all options are now printed.
The database client now takes a -c / --config option which allows you to pass in a configuration file containing options. All command-line options are supported (simply remove the -- from the beginning of an option to get the config file version).
The emtellipro-db-client process command now accepts a --filetype option which tells it to treat all input files as if they had the given file type (thus disabling detecting file type based on file extension). The associated config file option is called filetype. See process --help for available file types (since it depends on installed plugins).
Changed
API requests will now automatically time out after 90 seconds.
5.23.0 (2022-05-17)
Database migration is required for this version.
Added
The SDK’s version is stored in the new processingdetails.sdk_version column.
The emtellipro-db-client process --doc-id-filepath option now supports JSON and JSONL files.
Support for the “reported event” relation in examples/emtellipro_flatten_json/.
Support for the snomedicd10cm-ontology feature has now been added.
There is now support for the “reported event” relation. This is accessible using emtellipro.data.ReportedEventRelation, and in the database in the reportedeventrelation table, with arguments in the reportedeventrelationtoentity, reportedeventrelationfromentity, reportedeventrelationmodifier, and reportedeventrelationtimeexpression tables. The feature name is reportedevent-relations.
Example examples/emtellipro_flatten_json/ which reads emtelliPro JSON result files and uses the SDK’s data model to convert them to a flatter format.
The database client now stores the engine version returned by emtelliPro in the new processingdetails.engine_version column; if not returned, this will be NULL.
The documentmetadata table now has 2 new columns: subject_gender and requestor, which can be populated by providing the same field names in the input documents’ metadata.
Fixed
Some database errors were causing the database client to exit with code 0 instead of 5 (even though the error was printed and logged).
5.22.0 (2022-03-03)
Added
The database client now accepts a process --doc-id-filepath flag which tells it to use the filepath of the input document as the document ID when submitting to the server; this is currently only supported for .txt documents.
The database client now allows configuration of the logging path and level through the --log-file PATH and --log-level LEVEL options, which go in the same position as the --database option (i.e. in between emtellipro-db-client and the subcommand); see emtellipro-db-client --help for details.
The database client now takes a process --polling-frequency SECONDS option which adjusts how often it polls emtelliPro for status updates on the job processing status; by default it waits for 1 second between status checks, but for jobs with very large documents it might make sense to increase that value.
Fixed
Previously it was not possible to run multiple database clients in parallel using the --no-bulk-insert option due to uniqueness constraints on the concept table; this was not an issue with --bulk-insert, and now --no-bulk-insert has been fixed to allow multiple clients to insert into the concept table simultaneously.
5.21.0 (2022-01-27)
Database migration is required for this version.
Added
In addition to CSV and JSONL output from the database client, it now also supports JSON output. You can use this by setting --database json://SOME_DIRECTORY_PATH for the process and create-db commands.
The simple client now takes a --max-retries option in the same position as the --server option, and the database client now takes the same option for the process command. This option specifies the number of times to retry failed API requests, and is enabled by default, with a default value of 5, using exponential backoff between retries. There is a maximum wait time between retries of 120 seconds, so setting --max-retries to large values will eventually result in multiple retries at 120 second intervals. The intention behind this option is that API requests may fail due to network issues, so this will allow continued processing in the case of transient network errors.
The database client now accepts a process --store-failed option which will store details about documents that failed processing (i.e. ones where processing_status returned by emtelliPro is "error").
The database now contains a document.processing_status column which contains the processing status returned by emtelliPro for the document. This will usually be "success" unless --store-failed is passed to the database client’s process command, in which case it may be "error".
The database client has improved error handling and reporting about errors when submitting reports for processing.
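The retry behaviour described for --max-retries can be pictured with a small sketch of capped exponential backoff. The default of 5 retries and the 120 second cap come from the entry above; the 1 second starting delay and the doubling factor are assumptions made for illustration, since the changelog does not state them, and the function name is hypothetical.

```python
def retry_delays(max_retries: int = 5,
                 base_delay: float = 1.0,   # assumed starting delay
                 max_delay: float = 120.0) -> list:
    """Return the wait (in seconds) before each retry attempt.

    Delays double on every attempt (an assumed factor) and are capped
    at max_delay, so large retry counts settle at the cap.
    """
    return [min(base_delay * 2 ** attempt, max_delay)
            for attempt in range(max_retries)]


default_delays = retry_delays()
long_delays = retry_delays(max_retries=10)
```

Under these assumptions, the tail of long_delays sits at 120 seconds, which matches the note above that large --max-retries values eventually retry at 120 second intervals.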
Fixed
The database client would raise an error when saving temporality relations using --no-bulk-insert due to an incorrect relationship definition. This affects version 5.20.0 and later.
5.20.0 (2022-01-17)
Database migration is required for this version.
Added
Temporality relations are now supported. They’re found in emtellipro.data.TemporalityRelation, and in the database in the temporalityrelation table, with the modifiers in the temporalityrelationmodifier table.
5.19.0 (2021-12-08)
Database migration is required for this version.
Added
The documentmetadata table now has 3 new columns: author_name, subject_name, and subject_dob.
Fixed
The migrate command for the database client would raise an exception when trying to migrate CSV or JSON files (which is not supported). It now prints a simple error message instead.
There was a performance regression introduced in 5.17.0 when saving to SQL Server, which caused significantly reduced storage speed. This was improved in 5.18.0 and has now been completely fixed.
5.18.0 (2021-11-26)
Added
Support has been added for the question_status attribute of found entities. This is now found in the SDK as FoundEntity.question_status (where it is stored as a string), and it’s stored in the database in the new foundentity.question_status column. The new feature is called entity-question-status. This change will require a database migration.
Fixed
There are now some performance improvements when saving to SQL Server, especially when using Microsoft’s ODBC driver instead of FreeTDS.
Creating an InputDocument with an empty string as the text raised a ValueError, but now empty strings are allowed.
5.17.1 (2021-10-13)
Fixed
There was a bug introduced in 5.17.0 when exporting to CSV which caused a crash; this has now been fixed.
The --features option was not parsed properly in 5.17.0; this has now been fixed.
5.17.0 (2021-10-08)
Added
The database client’s process command now accepts a --max-save-shard-size option which specifies the number of reports to store to the database at once; this is ideally used for lowering the default value from 50, since large numbers can cause database errors.
You can now press CTRL-C when the database client is running and it will finish saving to the database what it has so far, and will exit cleanly. Pressing CTRL-C a second time will cause it to exit immediately.
Changed
The database client is now multi-threaded, so it can read files, submit them for processing, and store results to the database in parallel.
The database client’s --max-submit-shard-size default is now 100, to avoid having the database thread waiting too long for results.
5.16.0 (2021-09-23)
Added
Support for key-pair authentication for Snowflake has been added. All commands now take --snowflake-private-key-path as an option alongside --database which is the path to the private key used for connecting to Snowflake. If using this option, omit the password in the connection URL. To be consistent with SnowSQL, the passphrase for this key file can be passed using SNOWFLAKE_PRIVATE_KEY_PASSPHRASE or SNOWSQL_PRIVATE_KEY_PASSPHRASE in the environment, or if those environment variables are not set, the client will prompt for the passphrase if necessary. Both encrypted and unencrypted key files are supported.
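The passphrase lookup described above might look roughly like the following sketch. The two environment variable names come from the entry; the function name, the order in which the variables are checked, and the prompt text are assumptions. The sketch also does not cover the unencrypted-key case, where no passphrase is needed at all.

```python
import getpass
import os

# Variable names taken from the changelog entry; the lookup order is assumed.
PASSPHRASE_VARIABLES = (
    "SNOWFLAKE_PRIVATE_KEY_PASSPHRASE",
    "SNOWSQL_PRIVATE_KEY_PASSPHRASE",
)


def private_key_passphrase(environ=None, prompt=getpass.getpass) -> str:
    """Return the passphrase from the environment, else ask the user."""
    environ = os.environ if environ is None else environ
    for name in PASSPHRASE_VARIABLES:
        value = environ.get(name)
        if value:
            return value
    # Neither variable is set: fall back to an interactive prompt.
    return prompt("Private key passphrase: ")
```

Passing environ and prompt as parameters keeps the function easy to exercise without touching the real environment or terminal.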
5.15.0 (2021-09-20)
Added
There is now a plugin architecture based on signals.
Improved error handling when running create-db on an existing database, and checking whether the database schema version stored in the alembic_version table has been accidentally deleted.
If the alembic_version table has been accidentally cleared, the create-db command will now allow the user to re-insert the database schema version in that table.
The simple client now has a cancel command that allows cancelling one or more tasks. See emtellipro-client cancel --help for available options.
The simple client will now print the last unfinished task ID when it receives a SIGINT (e.g. from CTRL-C) while a job is in progress.
Fixed
There was a memory leak in the SDK which has now been fixed, so running the simple client or database client will now use a stable amount of memory.
5.14.1 (2021-08-09)
Fixed
The last version of the SDK introduced a bug where JSON documents weren’t counted properly for the progress bar, and JSON document IDs were not properly attached to filenames, leading to an error message. This is now fixed.
5.14.0 (2021-07-29)
Added
The database client now supports saving output to CSV or JSONL files. Simply use --database csv://PATH_TO_DIRECTORY or --database jsonl://PATH_TO_DIRECTORY for the create-db and process commands, instead of the regular database URLs. The directory will be populated with files named after each table in the regular database schema, and all ID columns will use UUIDs in standard format (36 characters, with the hyphens).
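For reference, the "standard format" mentioned above is the 36-character hyphenated form that Python's uuid module produces when a UUID is converted to a string. The sketch below uses a random (version 4) UUID; which UUID version the database client actually generates is not stated in the changelog, so that choice is an assumption.

```python
import uuid

# A random (version 4) UUID in its standard textual form.
# The UUID version used by the database client is an assumption here.
row_id = str(uuid.uuid4())

# 32 hexadecimal digits plus 4 hyphens.
id_length = len(row_id)
hyphen_count = row_id.count("-")
```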
Changed
The database schema when saving to Snowflake now uses UUIDs defined as CHAR(36) for all the ID columns (and associated foreign keys). These are generated locally by the database client and will lead to significantly faster insertions for Snowflake users. This will require running the migrate command to update existing Snowflake databases. Other databases will continue using auto-incrementing integer IDs.
When using --max-submit-shard-size with the database client, it will now only load the specified number of input documents at once (previously all input documents were loaded unconditionally, and this option only limited how many were submitted in each shard). This option can now be used to reduce memory usage when there are many input documents.
5.13.0 (2021-07-15)
Added
The database client’s process command now accepts a --skip-database-checks option which skips checks for database consistency (i.e. checking all the tables exist and have the correct columns).
Fixed
When reading input documents from JSON files and storing results using --bulk-insert the database client failed to store document metadata into the documentmetadata table if the set of metadata keys weren’t the same across all JSON files. This has now been fixed and JSON files can have differing metadata keys.
5.12.0 (2021-06-24)
Added
The database client will now use INSERT IGNORE in MySQL and INSERT OR IGNORE in SQLite to speed up inserts into the concepts table when using the --bulk-insert option.
The database client now accepts a --max-submit-shard-size option which allows restricting the number of documents submitted at once to the API. If unset, the client will continue determining shard size based on document size, so this is mostly useful for restricting shard size further than normal.
Fixed
On certain reports, when using the --bulk-insert option the database client would attempt to insert duplicate concepts in the concepts table which would fail due to the uniqueness constraints on the table. Inserted concepts are now de-duplicated before insertion to fix this issue.
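The de-duplication step can be sketched as follows. The representation of a concept as an (ontology, code) tuple and the example values are purely hypothetical, since the changelog does not describe the columns of the concepts table or how the client performs this step; the sketch only shows one order-preserving way to drop duplicates before a bulk insert.

```python
def deduplicate(concepts: list) -> list:
    """Drop repeated concepts while keeping first-seen order.

    Each concept must be hashable; here it is a hypothetical
    (ontology, code) tuple standing in for a row's unique key.
    """
    return list(dict.fromkeys(concepts))


# Made-up placeholder values, not real ontology codes.
pending = [
    ("ontology-a", "code-1"),
    ("ontology-b", "code-2"),
    ("ontology-a", "code-1"),  # duplicate that would violate uniqueness
]
unique_concepts = deduplicate(pending)
```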
5.11.0 (2021-06-18)
Added
There is now support for the “duration” and “indication” arguments for medication relations. Like all the other medication relation arguments, these will also be stored as lists. They can be found in the MedicationRelation.duration and MedicationRelation.indication attributes and medicationrelationduration and medicationrelationindication tables, respectively; the new tables will require a database migration.
5.10.1 (2021-06-08)
Changed
Stopped Snowflake’s SQLAlchemy driver from logging INSERT queries to errors.log
Improved performance of Snowflake insertions
Fixed
Specified SQLAlchemy version must be at least 1.4
5.10.0 (2021-06-08)
Added
Snowflake databases are now supported for storing data using the database client
Bulk inserts (the --fast-postgres option) are now available for all databases, and the option is now called --bulk-insert; this will lead to some speed improvements especially in the cases of databases which support returning primary keys on inserted rows without an extra query (such as PostgreSQL and SQL Server). Postgres still has the best support in this case since it allows inserting missing entries into the concepts table in a single query.
Regular storing of documents one at a time (without --bulk-insert) has also received some speed improvements by grouping inserts more efficiently.
5.9.0 (2021-05-13)
Added
The foundentity and assumedentity tables now contain a document_id column with a foreign key referencing document.id; this should make it simpler to query found entities and assumed entities in a document without having to join to the entity table (although the entity table is unchanged so older queries will continue working). This will require a database schema migration using the migrate command.
Added AnnotatedDocument.processing_status attribute which stores the response of the engine for the “processing_status” JSON attribute. This is used for checking for failed reports in the database client (in the case that the entire task isn’t marked as failed).
Changed
Modified the type of the document.json_representation and document.text columns in MySQL from TEXT to LONGTEXT, since depending on the document size, TEXT might not be able to hold all of the data.
When you enable --fast-postgres with an unsupported database (i.e. anything other than postgres), you now get a clear error message.
Minimum required Python version is now 3.6; all previous versions are now end-of-life and are no longer receiving security updates.
5.8.0 (2021-03-09)
Added
If SQL Server has full-text search support installed, then the database client will now create a FULLTEXT index on the sectionlocation.text and sentencelocation.text columns. Note that you will need to run the migrate command to create these indexes. This matches the existing support for similar indexes in MySQL and Postgres. If full-text support is not installed, the migrate and create-db commands will indicate that no full-text indexes were created.
5.7.0 (2021-02-25)
Added
Support for the snomedicd10-ontology feature has now been added.
The new known_ambiguity attribute of found entities is now stored in the database as the foundentity.known_ambiguity column, and is available in the FoundEntity.known_ambiguity attribute in the SDK. You will need to run the database client’s migrate command when updating.
Fixed
The issue with the extra COMMIT being sent to SQL Server when outside of a transaction has now been fixed.
5.6.0 (2021-02-04)
Added
PDFs stored in JSON files are now supported; this requires the PDF to be Base64-encoded and stored under a pdf key in the JSON; the text option must be omitted in that case, or else the JSON will be considered to contain plaintext data instead.
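A minimal sketch of preparing such a JSON document with the standard library is shown below. The pdf key and the omission of text follow the entry above; the function name is hypothetical, the use of the standard (not URL-safe) Base64 alphabet is an assumption, and the sample bytes are a stand-in rather than a valid PDF.

```python
import base64
import json


def pdf_json_document(pdf_bytes: bytes) -> str:
    """Return JSON text holding the PDF, Base64-encoded, under "pdf".

    No "text" key is included, so the document is not mistaken
    for plaintext.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"pdf": encoded})


# Placeholder bytes for illustration; a real file would be read from disk.
sample_bytes = b"%PDF-1.4 placeholder content"
json_text = pdf_json_document(sample_bytes)
```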
Fixed
The database client was storing document text when using the --fast-postgres option even when it shouldn’t have been. Ensure you’re using --store-reports if you’d like to store report text. If the input document is a PDF, the document text will be stored regardless, to ensure the spans make sense.
5.5.0 (2020-12-05)
Added
Both the simple client and the database client now support the --quiet flag when processing reports, which will hide the progress bars.
Both clients now support the timeout parameter on /status calls to the engine, potentially speeding up processing of single documents.
Fixed
The clients were making some unnecessary status calls at the end of processing, and that’s now improved. Calling ResultFuture.result() and ResultFuture.raw_result() may now raise TaskNotFoundError when attempting to retrieve a result for an invalid task ID.
5.4.3 (2020-11-27)
Fixed
Auth was signing all headers, and now only signs those required. This will avoid auth failures in some network configurations.
5.4.2 (2020-11-05)
Fixed
There was a typo in an index name which was confusing (ix_sectionlocationlocation_text, has too many locations)
5.4.1 (2020-11-05)
Fixed
There was a misalignment of sentence locations and sentence spans when storing sentences in the database when using the --fast-postgres option; this has occurred since 5.3.0. This has now been fixed, and checks have been added to ensure this sort of issue will not recur with other similar tables.
5.4.0 (2020-11-03)
Added
Added two new Postgres GIN indexes on the sentencelocation.text and sectionlocation.text columns.
Changed
The Postgres B-tree indices on sentencelocation.text and sectionlocation.text were removed because there is a maximum size for these indices and sometimes the text in those columns exceeds the maximum size supported by the index.
5.3.0 (2020-10-19)
Added
When using the database client to ingest documents from a database, you can now provide the --text-is-pdf option to tell it the ‘text’ column contains raw PDF bytes, rather than plaintext.
Documentation has been added for the new PDF ingest from database column feature.
The database client now has a process --store-sections-and-sentences option which will store section and sentence text in sectionlocation.text and sentencelocation.text columns, respectively. In the case that sentences or sections are discontinuous, their different parts will simply be joined using single spaces.
There are now indices on the sectionlocation.text and sentencelocation.text columns. On MySQL these will be FULLTEXT indices, and on Postgres, they’re B-tree indices using the text_pattern_ops operator class.
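The joining of discontinuous sentences or sections with single spaces, as described for --store-sections-and-sentences, can be illustrated with a small sketch. Representing each part as a (start, end) character offset pair is an assumption made for the example, as are the function name and sample text; the changelog does not describe how the client represents spans internally.

```python
def join_discontinuous(text: str, spans: list) -> str:
    """Join the parts of a discontinuous location with single spaces.

    spans is a list of (start, end) character offsets into text,
    a hypothetical representation chosen for this example.
    """
    return " ".join(text[start:end] for start, end in spans)


document_text = "No acute findings [page break] in the left lung."
# The location skips over the bracketed interruption.
sentence_text = join_discontinuous(document_text, [(0, 17), (31, 48)])
```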
Fixed
The database storing progress bar was usually not very smooth when using the fast-postgres option. This was because the batches were too big and usually most docs fit into one batch. Now there’s a batch of size 50 when storing results which leads to a smoother progress bar.
5.2.1 (2020-07-27)
Fixed
There was a corner case where the database client detected that some documents failed to be processed, but didn’t log which ones.
5.2.0 (2020-07-03)
Added
There’s now support for arbitrary key-value metadata which can be returned by file handler plugins. This data is stored in the documentstructuredmetadata table where the keys and values are both unicode texts; you may want to cast the values to more useful types for queries.
5.1.0 (2020-06-09)
Added
DB client support for plugins that can add handlers for new file types.
More general DB client handling of file paths, especially to handle more than one JSON or JSONL file at once and to apply the recursive option to JSON, JSONL, or plugin-added file types.
5.0.0 (2020-05-01)
Added
Added support for umlsloinc-ontology as an available feature.
The necessity and modifier arguments for medications have been added, and are stored in the medicationrelationnecessity and medicationrelationmodifier tables, respectively.
Changed
ResultFuture.done() now returns a ResultFutureStatus object (which is truthy), and this object must be used now for checking progress and general status; ResultFuture.progress() has now been removed (use ResultFutureStatus.progress for this).
4.9.0 (2020-04-24)
Added
The database client can now store the JSON it received from the server for each document into a new document.json_representation column. This will require a migration using the migrate subcommand. Storing JSON can be enabled using the process --store-json command-line option.
4.8.0 (2020-04-04)
Added
All HTTP client error response codes (4xx) from the emtelliPro server are now handled using an HTTPClientError exception if there isn’t a more specific exception for them. The clients will continue to exit with code 4 for these errors (unless there’s a more specific exit code for them).
There is now support for process_with_section_labels. Both clients now accept a --section-label option which will be used for all documents being read in. This is passed verbatim to the server. Support for this option has also been added using the section_label parameter of InputDocument objects. The section label is now stored in a new document.section_label column in the database. NOTE: this change requires you to use the migrate command to add the column to your existing database.
4.7.0 (2020-03-23)
Added
Added support for medication relations. It’s controlled by the medication-relations feature. New tables are medicationrelation, medicationrelationdosage, medicationrelationfrequency, medicationrelationmode, medicationrelationquantity, and medicationrelationroute. It can also be accessed using AnnotatedDocument.relations['medication'].
4.6.0 (2020-03-18)
Added
Support for the entity-measurement-unit feature. This means there’s a new table where it gets stored: foundentitymeasurementunit, and the FoundEntity object now has a FoundEntity.measurement_unit attribute.
There’s now an emtellipro-db-client migrate sub-command which runs migration scripts against the database to upgrade it to the current schema, copying data as necessary; this currently only supports databases going back to v4.3.0, and going forward every new version of the database client will allow migrating the database. Please ensure you have a backup of the database just in case.
4.5.0 (2020-03-13)
Added
The database client now has a --version option
Changed
The API server option --server on the two clients is now a required option with no default; this is to avoid accidentally sending data to the wrong jurisdiction.
Improved progress bars so they are displayed more reliably.
Fixed
It’s now possible to use --help on all subcommands without setting a value for --database on emtellipro-db-client.
4.4.0 (2020-02-20)
Added
The database client now checks if the target database contains all necessary columns before submitting documents, and exits early with an error message if any are missing. There is also a new exit code (5) for this and other database-related errors.
Changed
There’s now a foundentitytype table which contains the ontology and type name pairs for each found entity. This replaces the foundentity.type_name_* columns, and allows for any future ontologies to be added without changing any table’s schema.
The FoundEntity.type_name dictionary will now contain whatever was returned by the API, and no type name will ever be None. If it wasn’t returned by the API, it won’t be present in that dictionary.
4.3.0 (2020-02-04)
Added
Added support to the database client for different exit codes based on the error.
Improved error handling and logging of exceptions, so now almost every exception ends up being logged to error.log, including details about which documents failed. See the process --help command option for details.
4.2.3 (2020-01-28)
Fixed
When attempting to store chartdate in an SQLite database, there was an error about the sqlite driver only accepting Python datetime objects for storing in a DateTime column. Now when storing dates in an SQLite database, the dates are parsed using the dateutil library. For all other databases the dates are left as strings.
If a submission fails, the database client no longer crashes with an exception. It now tries to split each submission into individual documents to narrow down the failed document, and logs the failed document to errors.log.
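The narrowing-down strategy described above (retrying a failed submission one document at a time) can be sketched generically. Everything named here is hypothetical: submit stands in for whatever call sends documents for processing, documents are plain strings, and the real client's error handling and logging are not shown in the changelog.

```python
def find_failed_documents(documents: list, submit) -> list:
    """Return the documents that fail when submitted individually.

    submit is a hypothetical callable that takes a list of documents
    and raises an exception if the submission fails.
    """
    try:
        submit(documents)
        return []  # the whole batch succeeded
    except Exception:
        pass  # fall through and retry one document at a time

    failed = []
    for document in documents:
        try:
            submit([document])
        except Exception:
            failed.append(document)  # this is what would be logged
    return failed


def fake_submit(batch):
    """Stand-in submitter that rejects any batch containing 'bad'."""
    if "bad" in batch:
        raise ValueError("submission failed")


failed_documents = find_failed_documents(["ok-1", "bad", "ok-2"], fake_submit)
```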
4.2.2 (2020-01-21)
Fixed
There was an exception being raised when there weren’t any documents found. Now the client simply prints a message stating that zero documents were found.
4.2.1 (2020-01-16)
Fixed
The database client now stores the JSON/JSONL filename in the document.filename column
There was an exception being raised when there weren’t any entities found in a document. This is now fixed.
4.2.0 (2020-01-15)
Added
The database client now accepts JSON input in a single JSON or JSONL file.
4.1.0 (2020-01-09)
Changed
Removed the --fetchall option from the database client, as it is no longer necessary for connecting to SQL Server. This also removes the dependency on records.
4.0.0 (2020-01-05)
Changed
Backwards incompatible: To ensure consistent naming of relations and locations, all relation and location types are now singular. This means that AnnotatedDocument.locations.keys() is now ['section', 'sentence'] rather than ['sections', 'sentences']. And for relations, AnnotatedDocument.relations['experiencers'] is now AnnotatedDocument.relations['experiencer']. All other relations were already singular, so experiencers was the odd one out.
If using the database client or emtellipro_db package, you can now find enums for all the type names in emtellipro_db.rowtypes.
The source_document_id column in the documentmetadata table is now a unicode string of length 255, rather than an integer.
Fixed
Inconsistent type_ values in the database when using --fast-postgres have now been fixed. They’re now all singular and match the names used without --fast-postgres.
Added
There’s now an emtellipro_db.rowtypes module that contains enums for all the different type names stored in type_ columns in the tables that have them.
3.1.1 (2019-12-25)
Added
A new option --fetchall to the database client that is useful when processing documents taken from a database and the connection suffers from timeouts. This option will let you fetch all the documents in one query at the expense of a larger memory footprint. This option is currently required for retrieving documents from Microsoft SQL Server databases for processing.
3.1.0 (2019-12-18)
Added
A new database table for storing document metadata, called documentmetadata. It will be populated with extra fields provided using --sql-query for emtellipro-db-client.
The database client now accepts a --fast-postgres/--no-fast-postgres option which enables/disables a new batched insertion mode for saving to the database which uses PostgreSQL-specific SQL extensions. This should result in up to a 10x performance improvement.
emtellipro.data.InputDocument now takes a file_obj parameter, allowing callers to submit PDF files that don’t come from the file-system.
3.0.1 (2019-11-14)
Added
Updated documentation for the emtellipro-db-client Database Client
Other minor documentation updates/fixes
3.0.0 (2019-10-31)
Changed
Moved examples/database.py into its own module, and it’s now automatically installed when the complete package is installed. The database client is now callable using emtellipro-db-client or python3 -m emtellipro_db
Ported advanced client to use SQLAlchemy instead of peewee as the ORM, and also changed the database schema slightly to more easily allow for future additions
2.8.0 (2019-10-01)
Added
Added support for sections to the API. They can be accessed through AnnotatedDocument.locations['sections'].
Fixed
Since the API response may omit the category and subcategory attributes, the SDK now handles that case correctly by allowing those attributes to be None in AnnotatedDocument, and NULL in the database columns used by the advanced client. Previously, missing attributes would result in a KeyError.
2.7.0 (2019-09-10)
Added
Added support to the advanced client for ingesting documents for processing from a database
2.6.0 (2019-08-26)
Added
Added support for the umlshgnc-ontology feature
Added job IDs and job timestamps to advanced client
2.5.1 (2019-07-17)
Fixed
There was a typo in the mime-type for plaintext documents that caused an error in some situations on CentOS. The mime-type has now been fixed.
2.5.0 (2019-07-09)
Added
Added support for new features umlsnci-ontology and medcin-ontology.
2.4.0 (2019-06-13)
Added
There is now a new exception emtellipro.exceptions.TaskFailedError that is raised when trying to retrieve results of a task that failed.
The command-line clients now print some simple stats about how long it took to process the submitted documents.
2.3.0 (2019-04-04)
Added
Added support for qualifier relations to the SDK and database.py example. The relevant new class is emtellipro.data.QualifierRelation which is usable through annotated_document.relations['qualifier'].
Fixed
There was a bug which caused an empty list of features to be treated as requesting all features (only None means “all features”). This has been fixed and an empty list of features correctly requests no features.
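A bug of this kind can arise from a truthiness check where an identity check was intended. The sketch below shows that general pattern with hypothetical function and feature names; it is a guess at the shape of the problem, not the SDK's actual code.

```python
ALL_FEATURES = ["entities", "relations", "sections"]  # hypothetical names


def buggy_resolve(features):
    # Truthiness check: an empty list is falsy, so [] is treated like None.
    if not features:
        return list(ALL_FEATURES)
    return list(features)


def fixed_resolve(features):
    # Identity check: only None means "all features".
    if features is None:
        return list(ALL_FEATURES)
    return list(features)
```

With the identity check, fixed_resolve([]) returns an empty list, while fixed_resolve(None) still expands to every feature.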
2.2.0 (2019-03-08)
Added
Added a debug command to the example client which prints out internal debugging information from the state file.
Changed
The example client’s --access-key and --secret-key options now need to be set on the specific commands which use them (such as submit) instead of at the top level. This allows submit --help to work as expected and allows for commands which don’t need those options.
2.1.0 (2019-02-21)
Added
The type names dictionary for found entities now contains a “radlex” key. This has also been added to the relevant database table.
2.0.2 (2019-02-19)
Fixed
A bug was fixed where one couldn’t specifically request the followup relations.
2.0.1 (2019-02-14)
Fixed
A bug was fixed where one couldn’t specifically request the measurement and imagelink relations.
2.0.0 (2019-02-11)
Added
There is now support for two new relation types: measurements and image-links. They’re implemented similarly to the previous relation types.
Changed
The example client now shards reports, and produces a JSONL formatted output file.
The example client now allows continuing an unfinished submission using the continue subcommand.
The Emtellipro.submit() method now splits up the submitted documents so they fit in the maximum request size limit; this means that it now returns an iterable of ResultFuture instances, instead of just one.
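Splitting a submission so that each request stays under a size limit can be sketched as a simple greedy batching generator. The function name, the byte-based size measure, and the limit used here are all assumptions for illustration; the SDK's real splitting logic and limit are not described in this changelog.

```python
def split_into_batches(documents, max_bytes):
    """Yield lists of documents whose combined UTF-8 size fits in max_bytes.

    A single document larger than max_bytes is yielded in a batch of its own.
    """
    batch, batch_size = [], 0
    for text in documents:
        size = len(text.encode("utf-8"))
        # Start a new batch when adding this document would exceed the limit.
        if batch and batch_size + size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(text)
        batch_size += size
    if batch:
        yield batch


docs = ["aaaa", "bb", "cccccc", "d"]
batches = list(split_into_batches(docs, max_bytes=6))
```

Because one submission can now turn into several requests, code that previously handled a single result needs to loop over everything that is returned.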
1.6.0 (2019-01-16)
Added
The example database.py script now saves the text of found entities in a foundentitytext table.
The example client now saves a mapping file along with the output file which contains the mapping from document ID to file path (space-separated).
The example client’s submit command learned --doc-id-filepath which sets the document ID to the filepath (for easier matching of files to annotations). If this flag isn’t set, the new mapping file can be used instead.
The example client’s submit command learned --recursive for recursively looking for files in any directories provided as input.
1.5.1 (2019-01-10)
Fixed
The progress bar in the example client was incorrectly computed, and got to 100% too quickly. This has now been fixed so the progress displayed accurately represents the progress provided by the emtelliPro server.
1.5.0 (2018-12-24)
Changed
The document type can now be specified per InputDocument instance by passing the type_ parameter to __init__(). If not specified, the Emtellipro.submit() method will set it, so previous ways of setting the document type for all documents will continue to work.
utils.read_files() used a default encoding of UTF-8 for all files, but in some cases not all files will have the same encoding. It now autodetects the encoding for each file, unless an encoding to use for all files is given explicitly.
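Per-file encoding handling with an optional override can be sketched using only the standard library. Note that this is a simple trial-decoding fallback with a hypothetical helper name and candidate list, not real statistical detection, and not necessarily how utils.read_files() works.

```python
def decode_bytes(raw, encoding=None, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode raw bytes, returning (text, encoding_used)."""
    # An explicit encoding always wins and is applied as given.
    if encoding is not None:
        return raw.decode(encoding), encoding
    # Otherwise try each candidate in order and keep the first that works.
    for candidate in candidates:
        try:
            return raw.decode(candidate), candidate
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


utf8_text, utf8_used = decode_bytes("caf\u00e9".encode("utf-8"))
latin_text, latin_used = decode_bytes("caf\u00e9".encode("latin-1"))
forced_text, forced_used = decode_bytes(b"caf\xe9", encoding="latin-1")
```

Trying UTF-8 first is a deliberate choice in this sketch: it rejects most non-UTF-8 input, so a successful decode is a reasonably strong signal, whereas latin-1 accepts any byte sequence and only makes sense as a last resort.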
1.4.1 (2018-11-23)
Changed
Use Radiology/generic for default category/subcategory in database.py and example client.
Set default document type as ‘plain’ in database.py
1.4.0 (2018-11-20)
Changed
Use new multi-file format for submitting documents to emtelliPro server instead of older JSON-based one. There are no user-facing changes as a result of this.
The computation of document progress was changed to match changes to the API’s response for /status
Added sample_Discharge_summary_report.txt to example-data folder.
Added generic report subtype as default
1.3.0 (2018-11-15)
Added
examples/database.py gained a --store-reports flag on its process command which will enable storing of the original report text in the text column of the document table.
A foundentitylocation table was added to examples/database.py which maps found entities to locations (e.g. sentences) where they were found. These are the same locations as available in emtellipro.data.AnnotatedDocument.locations
Fixed
emtellipro.data.Span objects are now hashable, fixing an issue with their use as dictionary keys in FoundEntity.parts and SentenceLocation.parts
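Why hashability matters for dictionary keys can be shown with a small stand-in class. The Span below is a hypothetical frozen dataclass with illustrative fields; the real emtellipro.data.Span may be defined quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a text span; fields are illustrative."""
    start: int
    end: int


# frozen=True together with the default eq=True makes dataclasses generate
# __hash__ from the fields, so equal spans hash equally.
parts = {Span(0, 5): "chest", Span(6, 11): "x-ray"}
lookup = parts[Span(0, 5)]
```

A class that defines __eq__ without a matching __hash__ cannot be used as a dictionary key at all, which is the kind of failure this fix addresses.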
1.2.0 (2018-11-09)
Initial post-Beta release