Changelog

All notable changes to the emtelliPro Python SDK and associated files will be documented in this file. This SDK follows Semantic Versioning, although version identifiers adhere to standard Python packaging guidelines specified in PEP 440.

5.28.0 (2023-10-23)

Added

  • JSON input files can now contain a "filepath" item which points to a file containing the actual data for the document. This can be either a .txt or .pdf file. The path should not be in the same directory as the JSON file, or else there’s a risk the document will be processed twice.
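As a rough illustration of the "filepath" item described above, the sketch below writes a JSON input file that points at a document stored in a different directory. Only the "filepath" key comes from this changelog; the helper name write_filepath_json, the directory names, and the use of an absolute path are assumptions made for the example, and the real loader may require additional keys.

```python
import json
import tempfile
from pathlib import Path


def write_filepath_json(json_dir: Path, data_file: Path, name: str = "report.json") -> Path:
    """Write a JSON input file whose "filepath" item points at data_file.

    Hypothetical helper: it is not part of the SDK.
    """
    if data_file.suffix.lower() not in {".txt", ".pdf"}:
        raise ValueError("the referenced file must be a .txt or .pdf file")
    # Keep the data file out of the JSON file's directory so that the
    # document is not picked up and processed a second time.
    if data_file.resolve().parent == json_dir.resolve():
        raise ValueError("keep the data file outside the JSON file's directory")
    json_path = json_dir / name
    payload = {"filepath": str(data_file.resolve())}  # absolute path is an assumption
    json_path.write_text(json.dumps(payload), encoding="utf-8")
    return json_path


# Demo with throwaway directories.
root = Path(tempfile.mkdtemp())
json_dir = root / "json_input"
data_dir = root / "documents"
json_dir.mkdir()
data_dir.mkdir()
report = data_dir / "report.txt"
report.write_text("Example report text.", encoding="utf-8")
json_path = write_filepath_json(json_dir, report)
```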

Changed

  • Updated JSON example in documentation to show all the possible metadata options.

  • New CCD config option report-subsection-sections in config section [file_handlers.ccd], to control what sections the report subsection feature is applied in. See new CCD config docs for details.

5.27.0 (2023-08-22)

Added

  • When the database client detects that it’s running outside a virtual environment it will print a warning to let the user know.

  • APIError exceptions (and subclasses) now contain a task_id attribute when available.

  • The JSON files used as input for the database client can now contain structured metadata. This metadata must be stored as a mapping in a "structured_metadata" key and will be saved to the documentstructuredmetadata table; if this key is missing, then no structured metadata will be stored, as before.

  • The database client will now include version numbers when listing installed plugins.

  • The CCD plugin has now been merged into the SDK and the database client now natively supports CCD files. The separate plugin is now ignored if it’s installed.

Deprecated

  • The database client’s --bulk-insert and --fast-postgres options are deprecated; the bulk-insert mode will be the only insert method moving forward, since it supports all database types and is faster.

5.26.0 (2023-03-17)

Database migration is required for this version.

Added

  • The radplaybook-ontology emtelliPro feature is now supported.

  • There’s now support for the date_time argument of medication relations. This is stored in emtellipro.data.MedicationRelation.date_times (as a list), and in the medicationrelationdatetime table.

  • The database client is now able to save the unprocessed results from emtelliPro to disk instead of in a database. You can use this by setting --database raw://SOME_DIRECTORY_PATH for the process command; the create-db command is not necessary because the process command will automatically create the target directory if missing.

  • Support has been added for the guidance attribute of found entities. This is now found in the SDK as FoundEntity.guidance (where it is stored as a string), and it’s stored in the database in the new foundentity.guidance column. The new feature is called entity-guidance. This change will require a database migration.

  • Support is now available for the ontology_versions returned by emtelliPro for each processed document. These are available as AnnotatedDocument.ontology_versions and in the ontologyversions table. The database migration will insert rows in the ontologyversions table based on the documentconcepts table, but with release_version = NULL since previous ontology version data is not available.

5.25.0 (2022-10-17)

Added

  • You can now use -q as an alias for --quiet for the database client.

  • Error log messages are now printed to STDERR when emtellipro-db-client process --quiet is used; to hide the error messages you can pass --quiet (or -q) twice. In the config file, this can now be controlled by setting quiet = 2 (using the old true and false values still works, but integer values are also supported).

  • In the emtellipro.data module, there are now FoundEntity.attributes, Relation.attributes, and Relation.arguments instance attributes. The attributes values list the names of the instance attributes that are entity or relation attributes, and Relation.arguments lists the names of those that are relation arguments; each Relation subclass has these attributes. This provides some support for introspection to allow writing generic relation-handling code.
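The introspection support mentioned above can be pictured with a stand-in class. This sketch does not import the SDK: ExampleRelation is a hypothetical substitute for a real Relation subclass, and the attribute names polarity, from_entity, and to_entity, as well as the use of lists for attributes and arguments, are assumptions. Only the idea of reading names from .attributes and .arguments comes from the changelog.

```python
from typing import Any, Dict


class ExampleRelation:
    """Hypothetical stand-in for a Relation subclass (not the real class)."""

    def __init__(self, polarity: str, from_entity: str, to_entity: str) -> None:
        self.polarity = polarity
        self.from_entity = from_entity
        self.to_entity = to_entity
        # Names of the instance attributes that are relation attributes
        # and relation arguments, respectively.
        self.attributes = ["polarity"]
        self.arguments = ["from_entity", "to_entity"]


def describe_relation(relation: Any) -> Dict[str, Dict[str, Any]]:
    """Collect attribute and argument values without knowing the relation type."""
    return {
        "attributes": {name: getattr(relation, name) for name in relation.attributes},
        "arguments": {name: getattr(relation, name) for name in relation.arguments},
    }


summary = describe_relation(ExampleRelation("positive", "entity-1", "entity-2"))
```

Because describe_relation only relies on the two name lists, the same function would keep working for a new relation type as long as it exposes them in the same way.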

Changed

  • The JSON file loader in the database client now validates input JSON documents and provides descriptive error messages.

  • Document that Python 3.7 is now the minimum supported version, since Python 3.6 is no longer officially supported.

  • The emtellipro_flatten_json example uses the new data model metadata to look up attributes and arguments. New found entity attributes, new relation types, and new relation attributes and arguments will be handled automatically as long as their values fall into the general formats already supported. There are some changes to table names and column order. Relation tables are now produced only if non-empty.

  • When cancelling tasks, the HTTP PATCH method is now used instead of GET; both methods are still supported by the API, but GET is deprecated.

Fixed

  • Filenames with extra dots in them would previously not have the file extension parsed correctly, and so would not be read. This has now been fixed.

5.24.0 (2022-07-04)

Added

  • When running emtellipro-db-client process --help, the defaults for all options are now printed.

  • The database client now takes a -c / --config option which allows you to pass in a configuration file containing options. All command-line options are supported (simply remove the -- from the beginning of an option to get the config file version).

  • The emtellipro-db-client process command now accepts a --filetype option which tells it to treat all input files as if they had the given file type (thus disabling detecting file type based on file extension). The associated config file option is called filetype. See process --help for available file types (since it depends on installed plugins).
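The naming rule for config file options (drop the leading -- from the command-line option) can be expressed as a one-line helper. The function name config_key is hypothetical and exists only to illustrate the rule; it is not part of the database client.

```python
def config_key(option: str) -> str:
    """Return the config file name for a long command-line option."""
    if not option.startswith("--"):
        raise ValueError("expected a long option such as --filetype")
    return option[2:]  # strip only the leading "--"; inner hyphens are kept


filetype_key = config_key("--filetype")  # "filetype"
```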

Changed

  • API requests will now automatically time out after 90 seconds.

5.23.0 (2022-05-17)

Database migration is required for this version.

Added

  • The SDK’s version is stored in the new processingdetails.sdk_version column.

  • The emtellipro-db-client process --doc-id-filepath option now supports JSON and JSONL files.

  • Support for the “reported event” relation in examples/emtellipro_flatten_json/.

  • Support for the snomedicd10cm-ontology feature has now been added.

  • There is now support for the “reported event” relation. This is accessible using emtellipro.data.ReportedEventRelation, and in the database in the reportedeventrelation table, with arguments in the reportedeventrelationtoentity, reportedeventrelationfromentity, reportedeventrelationmodifier, reportedeventrelationtimeexpression tables. The feature name is reportedevent-relations.

  • Example examples/emtellipro_flatten_json/ which reads emtelliPro JSON result files and uses the SDK’s data model to convert them to a flatter format.

  • The database client now stores the engine version returned by emtelliPro in the new processingdetails.engine_version column; if not returned, this will be NULL.

  • The documentmetadata table now has 2 new columns: subject_gender, and requestor, which can be populated by providing the same field names in the input documents’ metadata.

Fixed

  • Some database errors were causing the database client to exit with code 0 instead of 5 (even though the error was printed and logged).

5.22.0 (2022-03-03)

Added

  • The database client now accepts a process --doc-id-filepath flag which tells it to use the filepath of the input document as the document ID when submitting to the server; this is currently only supported for .txt documents.

  • The database client now allows configuration of the logging path and level through the --log-file PATH and --log-level LEVEL options, which go in the same position as the --database option (i.e. in between emtellipro-db-client and the subcommand); see emtellipro-db-client --help for details.

  • The database client now takes a process --polling-frequency SECONDS option which adjusts how often it polls emtelliPro for status updates on the job processing status; by default it waits for 1 second between status checks, but for jobs with very large documents it might make sense to increase that value.

Fixed

  • Previously it was not possible to run multiple database clients in parallel using the --no-bulk-insert option due to uniqueness constraints on the concept table; this was not an issue with --bulk-insert, and now --no-bulk-insert has been fixed to allow multiple clients to insert into the concept table simultaneously.

5.21.0 (2022-01-27)

Database migration is required for this version.

Added

  • In addition to CSV and JSONL output from the database client, it now also supports JSON output. You can use this by setting --database json://SOME_DIRECTORY_PATH for the process and create-db commands.

  • The simple client now takes a --max-retries option in the same position as the --server option, and the database client now takes the same option for the process command. This option specifies the number of times to retry failed API requests, and is enabled by default, with a default value of 5, using exponential backoff between retries. There is a maximum wait time between retries of 120 seconds, so setting --max-retries to large values will eventually result in multiple retries at 120-second intervals. The intention behind this option is that API requests may fail due to network issues, so this will allow continued processing in the case of transient network errors.

  • The database client now accepts a process --store-failed option which will store details about documents that failed processing (i.e. ones where processing_status returned by emtelliPro is "error").

  • The database now contains a document.processing_status column which contains the processing status returned by emtelliPro for the document. This will usually be "success" unless --store-failed is passed to the database client’s process command, in which case it may be "error".

  • The database client has improved error handling and reporting about errors when submitting reports for processing.
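To make the --max-retries behaviour described above more concrete, here is a minimal sketch of exponential backoff with a 120-second cap. The default of 5 retries and the 120-second maximum come from this changelog; the 1-second starting delay and the doubling factor are assumptions, and the SDK's actual schedule may differ.

```python
from typing import List


def retry_delays(
    max_retries: int = 5,
    base_delay: float = 1.0,  # assumed starting delay, in seconds
    factor: float = 2.0,      # assumed growth factor
    max_wait: float = 120.0,  # cap stated in the changelog
) -> List[float]:
    """Return the wait before each retry, growing exponentially up to max_wait."""
    return [min(base_delay * factor ** attempt, max_wait) for attempt in range(max_retries)]


default_delays = retry_delays()             # [1.0, 2.0, 4.0, 8.0, 16.0]
long_delays = retry_delays(max_retries=10)  # the last three waits are all 120.0
```

With these assumed values the cap is reached on the eighth retry, after which every further retry waits the full 120 seconds.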

Fixed

  • The database client would raise an error when saving temporality relations using --no-bulk-insert due to an incorrect relationship definition. This affects version 5.20.0 and later.

5.20.0 (2022-01-17)

Database migration is required for this version.

Added

  • Temporality relations are now supported. They’re found in emtellipro.data.TemporalityRelation, and in the database in the temporalityrelation table, with the modifiers in the temporalityrelationmodifier table.

5.19.0 (2021-12-08)

Database migration is required for this version.

Added

  • The documentmetadata table now has 3 new columns: author_name, subject_name, and subject_dob.

Fixed

  • The migrate command for the database client would raise an exception when trying to migrate CSV or JSON files (which is not supported). It now prints a simple error message instead.

  • There was a performance regression introduced in 5.17.0 when saving to SQL Server, which caused significantly reduced storage speed. This was improved in 5.18.0 and has now been completely fixed.

5.18.0 (2021-11-26)

Added

  • Support has been added for the question_status attribute of found entities. This is now found in the SDK as FoundEntity.question_status (where it is stored as a string), and it’s stored in the database in the new foundentity.question_status column. The new feature is called entity-question-status. This change will require a database migration.

Fixed

  • There are now some performance improvements when saving to SQL Server, especially when using Microsoft’s ODBC driver instead of FreeTDS.

  • Creating an InputDocument with an empty string as the text raised a ValueError, but now empty strings are allowed.

5.17.1 (2021-10-13)

Fixed

  • There was a bug introduced in 5.17.0 when exporting to CSV which caused a crash; this has now been fixed.

  • The --features option was not parsed properly in 5.17.0; this has now been fixed.

5.17.0 (2021-10-08)

Added

  • The database client’s process command now accepts a --max-save-shard-size option which specifies the number of reports to store to the database at once; this is ideally used for lowering the default value from 50, since large numbers can cause database errors.

  • You can now press CTRL-C when the database client is running and it will finish saving to the database what it has so far, and will exit cleanly. Pressing CTRL-C a second time will cause it to exit immediately.

Changed

  • The database client is now multi-threaded, so it can read files, submit them for processing, and store results to the database in parallel.

  • The database client’s --max-submit-shard-size default is now 100, to avoid having the database thread waiting too long for results.

5.16.0 (2021-09-23)

Added

  • Support for key-pair authentication for Snowflake has been added. All commands now take --snowflake-private-key-path as an option alongside --database, which is the path to the private key used for connecting to Snowflake. If using this option, omit the password in the connection URL. To be consistent with SnowSQL, the passphrase for this key file can be passed using SNOWFLAKE_PRIVATE_KEY_PASSPHRASE or SNOWSQL_PRIVATE_KEY_PASSPHRASE in the environment, or if those environment variables are not set, the client will prompt for the passphrase if necessary. Both encrypted and unencrypted key files are supported.
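The passphrase lookup described above might look roughly like the following. This is a sketch under stated assumptions rather than the client's code: the function name find_passphrase is hypothetical, the order in which the two environment variables are checked is a guess, and the sketch assumes the key is encrypted (so that a passphrase is actually needed).

```python
import getpass
import os
from typing import Callable, Mapping

# Environment variable names taken from the changelog; the order is assumed.
ENV_NAMES = ("SNOWFLAKE_PRIVATE_KEY_PASSPHRASE", "SNOWSQL_PRIVATE_KEY_PASSPHRASE")


def find_passphrase(
    environ: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the passphrase from the environment, or prompt for it if unset."""
    for name in ENV_NAMES:
        value = environ.get(name)
        if value is not None:
            return value
    return prompt("Private key passphrase: ")


# Demo with a fake environment and a fake prompt, so nothing is read from the terminal.
fake_env = {"SNOWSQL_PRIVATE_KEY_PASSPHRASE": "example-passphrase"}
passphrase = find_passphrase(fake_env, prompt=lambda message: "typed-by-user")
```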

5.15.0 (2021-09-20)

Added

  • There is now a plugin architecture based on signals.

  • Improved error handling when running create-db on an existing database, and checking whether the database schema version stored in the alembic_version table has been accidentally deleted.

  • If the alembic_version table has been accidentally cleared, the create-db command will now allow the user to re-insert the database schema version in that table.

  • The simple client now has a cancel command that allows cancelling one or more tasks. See emtellipro-client cancel --help for available options.

  • The simple client will now print the last unfinished task ID when it receives a SIGINT (e.g. from CTRL-C) while a job is in progress

Fixed

  • There was a memory leak in the SDK which has now been fixed, so running the simple client or database client will now use a stable amount of memory.

5.14.1 (2021-08-09)

Fixed

  • The last version of the SDK introduced a bug where JSON documents weren’t counted properly for the progress bar, and JSON document IDs were not properly attached to filenames, leading to an error message. This is now fixed.

5.14.0 (2021-07-29)

Added

  • The database client now supports saving output to CSV or JSONL files. Simply use --database csv://PATH_TO_DIRECTORY or --database jsonl://PATH_TO_DIRECTORY for the create-db and process commands, instead of the regular database URLs. The directory will be populated with files named after each table in the regular database schema, and all ID columns will use UUIDs in standard format (36 characters, with the hyphens).
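For readers unfamiliar with the "standard format" of UUIDs mentioned above, the following sketch generates an ID of that shape with Python's standard library. Which UUID version the database client generates is not stated here; uuid4 is used purely as an assumption for the example.

```python
import uuid


def new_row_id() -> str:
    """Return a random UUID in the standard 36-character hyphenated form."""
    return str(uuid.uuid4())


row_id = new_row_id()
print(row_id, len(row_id))  # the length is always 36: 32 hex digits plus 4 hyphens
```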

Changed

  • The database schema when saving to Snowflake now uses UUIDs defined as CHAR(36) for all the ID columns (and associated foreign keys). These are generated locally by the database client and will lead to significantly faster insertions for Snowflake users. This will require running the migrate command to update existing Snowflake databases. Other databases will continue using auto-incrementing integer IDs.

  • When using --max-submit-shard-size with the database client, it will now only load the specified number of input documents at once (previously all input documents were loaded unconditionally, and this option only limited how many were submitted in each shard). This option can now be used to reduce memory usage when there are many input documents.

5.13.0 (2021-07-15)

Added

  • The database client’s process command now accepts a --skip-database-checks option which skips checks for database consistency (i.e. checking all the tables exist and have the correct columns).

Fixed

  • When reading input documents from JSON files and storing results using --bulk-insert, the database client failed to store document metadata into the documentmetadata table if the set of metadata keys wasn’t the same across all JSON files. This has now been fixed and JSON files can have differing metadata keys.

5.12.0 (2021-06-24)

Added

  • The database client will now use INSERT IGNORE in MySQL and INSERT OR IGNORE in SQLite to speed up inserts into the concepts table when using the --bulk-insert option

  • The database client now accepts a --max-submit-shard-size option which allows restricting the number of documents submitted at once to the API. If unset, the client will continue determining shard size based on document size, so this is mostly useful for restricting shard size further than normal.

Fixed

  • On certain reports, when using the --bulk-insert option the database client would attempt to insert duplicate concepts in the concepts table which would fail due to the uniqueness constraints on the table. Inserted concepts are now de-duplicated before insertion to fix this issue.

5.11.0 (2021-06-18)

Added

  • There is now support for the “duration” and “indication” arguments for medication relations. Like all the other medication relation arguments, these will also be stored as lists. They can be found in the MedicationRelation.duration and MedicationRelation.indication attributes and medicationrelationduration and medicationrelationindication tables, respectively; the new tables will require a database migration.

5.10.1 (2021-06-08)

Changed

  • Stopped Snowflake’s SQLAlchemy driver from logging INSERT queries to errors.log

  • Improved performance of Snowflake insertions

Fixed

  • Specified SQLAlchemy version must be at least 1.4

5.10.0 (2021-06-08)

Added

  • Snowflake databases are now supported for storing data using the database client

  • Bulk inserts (the --fast-postgres option) are now available for all databases, and the option is called --bulk-insert; this will lead to some speed improvements, especially for databases which support returning primary keys on inserted rows without an extra query (such as PostgreSQL and SQL Server). Postgres still has the best support in this case since it allows inserting missing entries into the concepts table in a single query.

  • Regular storing of documents one at a time (without --bulk-insert) also has received some speed improvements by grouping inserts more efficiently.

5.9.0 (2021-05-13)

Added

  • The foundentity and assumedentity tables now contain a document_id column with a foreign key referencing document.id; this should make it simpler to query found entities and assumed entities in a document without having to join to the entity table (although the entity table is unchanged so older queries will continue working). This will require a database schema migration using the migrate command.

  • Added AnnotatedDocument.processing_status attribute which stores the response of the engine for the “processing_status” JSON attribute. This is used for checking for failed reports in the database client (in the case that the entire task isn’t marked as failed).

Changed

  • Modified the type of the document.json_representation and document.text columns in MySQL from TEXT to LONGTEXT, since depending on the document size, TEXT might not be able to hold all of the data.

  • When you enable --fast-postgres with an unsupported database (i.e. anything other than postgres), you now get a clear error message.

  • Minimum required Python version is now 3.6; all previous versions are now end-of-life and are no longer receiving security updates.

5.8.0 (2021-03-09)

Added

  • If SQL Server has full-text search support installed, then the database client will now create a FULLTEXT index on the sectionlocation.text and sentencelocation.text columns. Note that you will need to run the migrate command to create these indexes.

    This matches the existing support for similar indexes in MySQL and Postgres. If full-text support is not installed, the migrate and create-db commands will indicate that no full-text indexes were created.

5.7.0 (2021-02-25)

Added

  • Support for the snomedicd10-ontology feature has now been added.

  • The new known_ambiguity attribute of found entities is now stored in the database as the foundentity.known_ambiguity column, and is available in the FoundEntity.known_ambiguity attribute in the SDK. You will need to run the database client’s migrate command when updating.

Fixed

  • The issue with the extra COMMIT being sent to SQL Server when outside of a transaction has now been fixed.

5.6.0 (2021-02-04)

Added

  • PDFs stored in JSON files are now supported; this requires the PDF to be Base64-encoded and stored under a pdf key in the JSON; the text option must be omitted in that case, or else the JSON will be considered to contain plaintext data instead.
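A JSON document carrying an embedded PDF, as described above, could be produced along these lines. The "pdf" key and the Base64 requirement come from this changelog; the helper name pdf_json_document is hypothetical, the sample bytes are a placeholder rather than a valid PDF, and any additional keys the loader may expect are not shown.

```python
import base64
import json


def pdf_json_document(pdf_bytes: bytes) -> str:
    """Serialize PDF bytes as a JSON document with a Base64-encoded "pdf" key."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    # No "text" key is included: if it were present, the JSON would be
    # treated as containing plaintext data instead of a PDF.
    return json.dumps({"pdf": encoded})


sample_bytes = b"%PDF-1.4 placeholder bytes, not a real PDF"
document_json = pdf_json_document(sample_bytes)
```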

Fixed

  • The database client was storing document text when using the --fast-postgres option even when it shouldn’t have been. Ensure you’re using --store-reports if you’d like to store report text. If the input document is a PDF, the document text will be stored regardless, to ensure the spans make sense.

5.5.0 (2020-12-05)

Added

  • Both the simple client and the database client now support the --quiet flag when processing reports, which will hide the progress bars.

  • Both clients now support the timeout parameter on /status calls to the engine, potentially speeding up processing of single documents.

Fixed

  • The clients were making some unnecessary status calls at the end of processing, and that’s now improved. Calling ResultFuture.result() and ResultFuture.raw_result() may now raise TaskNotFoundError when attempting to retrieve a result for an invalid task ID.

5.4.3 (2020-11-27)

Fixed

  • Auth was signing all headers, and now only signs those required. This will avoid auth failures in some network configurations.

5.4.2 (2020-11-05)

Fixed

  • There was a typo in an index name which was confusing (ix_sectionlocationlocation_text, has too many locations)

5.4.1 (2020-11-05)

Fixed

  • There was a misalignment of sentence locations and sentence spans when storing sentences in the database when using the --fast-postgres option; this has occurred since 5.3.0. This has now been fixed, and checks have been added to ensure this sort of issue will not recur with other similar tables.

5.4.0 (2020-11-03)

Added

  • Added two new Postgres GIN indexes on the sentencelocation.text and sectionlocation.text columns.

Changed

  • The Postgres B-tree indices on sentencelocation.text and sectionlocation.text were removed because there is a maximum size for these indices and sometimes the text in those columns exceeds the maximum size supported by the index.

5.3.0 (2020-10-19)

Added

  • When using the database client to ingest documents from a database, you can now provide the --text-is-pdf option to tell it the ‘text’ column contains raw PDF bytes, rather than plaintext.

  • Documentation has been added for the new PDF ingest from database column feature.

  • The database client now has a process --store-sections-and-sentences option which will store section and sentence text in sectionlocation.text and sentencelocation.text columns, respectively. In the case that sentences or sections are discontinuous, their different parts will simply be joined using single spaces.

  • There are now indices on the sectionlocation.text and sentencelocation.text columns. On MySQL these will be FULLTEXT indices, and on Postgres, they’re B-tree indices using the text_pattern_ops operator class.
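The handling of discontinuous sentences and sections described above (parts joined with single spaces) can be sketched as follows. Representing each part as a (start, end) character offset pair is an assumption made for the example, as is the function name join_parts.

```python
from typing import List, Tuple


def join_parts(text: str, spans: List[Tuple[int, int]]) -> str:
    """Join the text of each (start, end) span using single spaces."""
    return " ".join(text[start:end] for start, end in spans)


# Two parts of one location, separated by unrelated text in the middle.
joined = join_parts("abc def ghi", [(0, 3), (8, 11)])  # "abc ghi"
```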

Fixed

  • The database storing progress bar was usually not very smooth when using the fast-postgres option. This was because the batches were too big and usually most docs fit into one batch. Now there’s a batch of size 50 when storing results which leads to a smoother progress bar.
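The effect of storing results in batches of 50 can be illustrated with a small chunking helper. This is only a sketch of the general technique; the function name batches is hypothetical and the client's internal batching code is not shown in this changelog.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence, size: int = 50) -> Iterator[Sequence]:
    """Yield consecutive slices of items containing at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 120 results are stored in three steps, so a progress bar advances three times.
batch_sizes: List[int] = [len(batch) for batch in batches(list(range(120)))]  # [50, 50, 20]
```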

5.2.1 (2020-07-27)

Fixed

  • There was a corner case where the database client detected that some documents failed to be processed, but didn’t log which ones.

5.2.0 (2020-07-03)

Added

  • There’s now support for arbitrary key-value metadata which can be returned by file handler plugins. This data is stored in the documentstructuredmetadata table where the keys and values are both unicode texts; you may want to cast the values to more useful types for queries.

5.1.0 (2020-06-09)

Added

  • DB client support for plugins that can add handlers for new file types.

  • More general DB client handling of file paths, especially to handle more than one JSON or JSONL file at once and to apply the recursive option to JSON, JSONL, or plugin-added file types.

5.0.0 (2020-05-01)

Added

  • Added support for umlsloinc-ontology as an available feature.

  • The necessity and modifier arguments for medications have been added, and are stored in the medicationrelationnecessity and medicationrelationmodifier tables, respectively.

Changed

  • ResultFuture.done() now returns a ResultFutureStatus object (which is truthy), and this object must be used now for checking progress and general status; ResultFuture.progress() has now been removed (use ResultFutureStatus.progress for this)

4.9.0 (2020-04-24)

Added

  • The database client can now store the JSON it received from the server for each document into a new document.json_representation column. This will require a migration using the migrate subcommand. Storing JSON can be enabled using the process --store-json command-line option.

4.8.0 (2020-04-04)

Added

  • All HTTP client error response codes (4xx) from the emtelliPro server are now handled using an HTTPClientError exception if there isn’t a more specific exception for them. The clients will continue to exit with code 4 for these errors (unless there’s a more specific exit code for them).

  • There is now support for process_with_section_labels. Both clients now accept a --section-label option which will be used for all documents being read in. This is passed verbatim to the server. Support for this option has also been added using the section_label parameter of InputDocument objects. The section label is now stored in a new document.section_label column in the database. NOTE: this change requires you to use the migrate command to add the column to your existing database.

4.7.0 (2020-03-23)

Added

  • Added support for medication relations. It’s controlled by the medication-relations feature. New tables are medicationrelation, medicationrelationdosage, medicationrelationfrequency, medicationrelationmode, medicationrelationquantity, and medicationrelationroute. It can also be accessed using AnnotatedDocument.relations['medication'].

4.6.0 (2020-03-18)

Added

  • Support for the entity-measurement-unit feature. This means there’s a new table where it gets stored: foundentitymeasurementunit, and the FoundEntity object now has a FoundEntity.measurement_unit attribute.

  • There’s now an emtellipro-db-client migrate sub-command which runs migration scripts against the database to upgrade it to the current schema, copying data as necessary; this currently only supports databases going back to v4.3.0, and going forward every new version of the database client will allow migrating the database. Please ensure you have a backup of the database just in case.

4.5.0 (2020-03-13)

Added

  • The database client now has a --version option

Changed

  • The API server option --server on the two clients is now a required option with no default; this is to avoid accidentally sending data to the wrong jurisdiction.

  • Improved progress bars so they are displayed more reliably.

Fixed

  • It’s now possible to use --help on all subcommands without setting a value for --database on emtellipro-db-client.

4.4.0 (2020-02-20)

Added

  • The database client now checks if the target database contains all necessary columns before submitting documents, and exits early with an error message if any are missing. There is also a new exit code (5) for this and other database-related errors.

Changed

  • There’s now a foundentitytype table which contains the ontology and type name pairs for each found entity. This replaces the foundentity.type_name_* columns, and allows for any future ontologies to be added without changing any table’s schema.

  • The FoundEntity.type_name dictionary will now contain whatever was returned by the API, and no type name will ever be None. If it wasn’t returned by the API, it won’t be present in that dictionary.

4.3.0 (2020-02-04)

Added

  • Added support to the database client for different exit codes based on the error.

  • Improved error handling and logging of exceptions, so now almost every exception ends up being logged to error.log, including details about which documents failed. See the process --help command option for details.

4.2.3 (2020-01-28)

Fixed

  • When attempting to store chartdate in an SQLite database, there was an error about the sqlite driver only accepting Python datetime objects for storing in a DateTime column. Now when storing dates in an SQLite database, the dates are parsed using the dateutil library. For all other databases the dates are left as strings.

  • If a submission fails, the database client no longer crashes with an exception. It now tries to split each submission into individual documents to narrow down the failed document, and logs the failed document to errors.log.

4.2.2 (2020-01-21)

Fixed

  • There was an exception being raised when there weren’t any documents found. Now the client simply prints a message stating that zero documents were found.

4.2.1 (2020-01-16)

Fixed

  • The database client now stores the JSON/JSONL filename in the document.filename column

  • There was an exception being raised when there weren’t any entities found in a document. This is now fixed.

4.2.0 (2020-01-15)

Added

  • The database client now accepts JSON input in a single JSON or JSONL file.

4.1.0 (2020-01-09)

Changed

  • Removed the --fetchall option from the database client, as it is no longer necessary for connecting to SQL Server. This also removes the dependency on records.

4.0.0 (2020-01-05)

Changed

  • Backwards incompatible: To ensure consistent naming of relations and locations, all relation and location types are now singular. This means that AnnotatedDocument.locations.keys() is now ['section', 'sentence'] rather than ['sections', 'sentences']. And for relations, AnnotatedDocument.relations['experiencers'] is now AnnotatedDocument.relations['experiencer']. All other relations were already singular, so experiencers was the odd one out.

    If using the database client or emtellipro_db package, you can now find enums for all the type names in emtellipro_db.rowtypes.

  • The source_document_id column in the documentmetadata table is now a unicode string of length 255, rather than an integer.
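Code or data written against the pre-4.0.0 plural names can be adapted with a small lookup table. The three renames below are the ones listed in this entry; the helper upgrade_keys is hypothetical and operates on plain dictionaries rather than on SDK objects.

```python
from typing import Any, Dict

# Plural names used before 4.0.0, mapped to their singular replacements.
RENAMED_KEYS = {
    "sections": "section",
    "sentences": "sentence",
    "experiencers": "experiencer",
}


def upgrade_keys(mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of mapping with old plural keys replaced by singular ones."""
    return {RENAMED_KEYS.get(key, key): value for key, value in mapping.items()}


old_locations = {"sections": ["s1"], "sentences": ["t1", "t2"]}
new_locations = upgrade_keys(old_locations)  # keys are now "section" and "sentence"
```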

Fixed

  • Inconsistent type_ values in the database when using --fast-postgres have now been fixed. They’re now all singular and match the names used without --fast-postgres.

Added

  • There’s now an emtellipro_db.rowtypes module that contains enums for all the different type names stored in type_ columns in the tables that have them.

3.1.1 (2019-12-25)

Added

  • A new --fetchall option for the database client that is useful when processing documents taken from a database and the connection suffers from timeouts. This option will let you fetch all the documents in one query at the expense of a larger memory footprint. This option is currently required for retrieving documents from Microsoft SQL Server databases for processing.

3.1.0 (2019-12-18)

Added

  • A new database table for storing document metadata, called documentmetadata. It will be populated with extra fields provided using --sql-query for emtellipro-db-client.

  • The database client now accepts a --fast-postgres/--no-fast-postgres option which enables/disables a new batched insertion mode for saving to the database which uses PostgreSQL-specific SQL extensions. This should result in up to a 10x performance improvement.

  • emtellipro.data.InputDocument now takes a file_obj parameter, allowing callers to submit PDF files that don’t come from the file-system.

3.0.1 (2019-11-14)

Added

  • Updated documentation for the emtellipro-db-client Database Client

  • Other minor documentation updates/fixes

3.0.0 (2019-10-31)

Changed

  • Moved examples/database.py into its own module, and it’s now automatically installed when the complete package is installed. The database client is now callable using emtellipro-db-client or python3 -m emtellipro_db

  • Ported advanced client to use SQLAlchemy instead of peewee as the ORM, and also changed the database schema slightly to more easily allow for future additions

2.8.0 (2019-10-01)

Added

  • Added support for sections to the API. They can be accessed through AnnotatedDocument.locations['sections'].

Fixed

  • Since the API response may omit the category and subcategory attributes, the SDK now handles that case correctly by allowing those attributes to be None in AnnotatedDocument, and NULL in the database columns used by the advanced client. Previously, missing attributes would result in a KeyError.
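
The difference between the old and new behaviour for missing attributes can be illustrated with a plain dictionary. This is a sketch only: the payload shape and the function name are assumptions for illustration and do not reflect the SDK's internal parsing code.

```python
def read_categories(document_payload):
    """Return (category, subcategory), using None for any missing key.

    `document_payload` is a hypothetical dict standing in for one
    document of an API response.
    """
    # dict.get() returns None for a missing key, whereas
    # document_payload["category"] would raise KeyError.
    return document_payload.get("category"), document_payload.get("subcategory")


complete = read_categories({"category": "Radiology", "subcategory": "generic"})
partial = read_categories({"category": "Radiology"})
empty = read_categories({})
```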

2.7.0 (2019-09-10)

Added

  • Added support to the advanced client for ingesting documents for processing from a database

2.6.0 (2019-08-26)

Added

  • Added support for the umlshgnc-ontology feature

  • Added job IDs and job timestamps to advanced client

2.5.1 (2019-07-17)

Fixed

  • There was a typo in the mime-type for plaintext documents that caused an error in some situations on CentOS. The mime-type has now been fixed.

2.5.0 (2019-07-09)

Added

  • Added support for new features umlsnci-ontology and medcin-ontology.

2.4.0 (2019-06-13)

Added

  • There is now a new exception emtellipro.exceptions.TaskFailedError that is raised when trying to retrieve results of a task that failed.

  • The command-line clients now print some simple stats about how long it took to process the submitted documents.

2.3.0 (2019-04-04)

Added

  • Added support for qualifier relations to the SDK and database.py example. The relevant new class is emtellipro.data.QualifierRelation which is usable through annotated_document.relations['qualifier'].

Fixed

  • There was a bug which caused an empty list of features to be treated as requesting all features (only None means “all features”). This has been fixed and an empty list of features correctly requests no features.
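
This class of bug typically comes from testing truthiness where an explicit None check is needed. The sketch below contrasts the two; the feature names and function names are placeholders, not the SDK's actual feature list or API.

```python
# Placeholder feature names for illustration only.
ALL_FEATURES = ("feature-a", "feature-b")


def resolve_features_buggy(features=None):
    """Treats an empty list like None, because an empty list is falsy."""
    if not features:
        return list(ALL_FEATURES)
    return list(features)


def resolve_features(features=None):
    """Only None means "all features"; an empty list requests nothing."""
    if features is None:
        return list(ALL_FEATURES)
    return list(features)


buggy_result = resolve_features_buggy([])
fixed_result = resolve_features([])
default_result = resolve_features(None)
```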

2.2.0 (2019-03-08)

Added

  • Added a debug command to the example client which prints out internal debugging information from the state file.

Changed

  • The example client’s --access-key and --secret-key options now need to be set on the specific commands which use it (such as submit) instead of at the top level. This allows submit --help to work as expected and allows for commands which don’t need those options.

2.1.0 (2019-02-21)

Added

  • The type names dictionary for found entities now contains a “radlex” key. This has also been added to the relevant database table.

2.0.2 (2019-02-19)

Fixed

  • A bug was fixed where one couldn’t specifically request the followup relations.

2.0.1 (2019-02-14)

Fixed

  • A bug was fixed where one couldn’t specifically request the measurement and imagelink relations.

2.0.0 (2019-02-11)

Added

  • There is now support for two new relation types: measurements and image-links. They’re implemented similarly to the previous relation types.

Changed

  • The example client now shards reports, and produces a JSONL formatted output file.

  • The example client now allows continuing an unfinished submission using the continue subcommand.

  • The Emtellipro.submit() method now splits up the submitted documents so they fit in the maximum request size limit; this means that it now returns an iterable of ResultFuture instances, instead of just one.
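
Splitting submissions to fit a request size limit can be sketched as a greedy batching loop. This is an illustration of the general idea, not the SDK's implementation: the function name, the use of UTF-8 byte length as the size measure, and the max_bytes parameter are all assumptions.

```python
def split_into_batches(documents, max_bytes):
    """Yield lists of document texts whose combined UTF-8 size fits in max_bytes.

    A single document larger than max_bytes is yielded alone in its own
    batch rather than being dropped.
    """
    batch, batch_size = [], 0
    for text in documents:
        size = len(text.encode("utf-8"))
        # Start a new batch when adding this document would exceed the limit.
        if batch and batch_size + size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(text)
        batch_size += size
    if batch:
        yield batch


batches = list(split_into_batches(["aaaa", "bbbb", "cc"], max_bytes=8))
```

Because each batch becomes its own request, a caller gets back one result handle per batch, which is why the return value became an iterable.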

1.6.0 (2019-01-16)

Added

  • The example database.py script now saves the text of found entities in a foundentitytext table.

  • The example client now saves a mapping file along with the output file which contains the mapping from document ID to file path (space-separated).

  • The example client’s submit command learned --doc-id-filepath which sets the document ID to the filepath (for easier matching of files to annotations). If this flag isn’t set, the new mapping file can be used instead.

  • The example client’s submit command learned --recursive for recursively looking for files in any directories provided as input.
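
A space-separated mapping file of the kind described above can be read back with a few lines of Python. The sketch assumes one mapping per line, with the document ID first and containing no spaces, followed by the file path; the exact layout of the file is not specified here, so treat this as an illustration.

```python
def parse_mapping(lines):
    """Build a {document_id: file_path} dict from space-separated lines.

    Assumes the ID comes first and contains no spaces; everything after
    the first space is taken as the path, so paths may contain spaces.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        doc_id, _, path = line.partition(" ")
        mapping[doc_id] = path
    return mapping


example_lines = ["doc1 /data/a.txt\n", "doc2 /data/my report.txt\n", "\n"]
mapping = parse_mapping(example_lines)
```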

1.5.1 (2019-01-10)

Fixed

  • The progress bar in the example client was incorrectly computed, and got to 100% too quickly. This has now been fixed so the progress displayed accurately represents the progress provided by the emtelliPro server.

1.5.0 (2018-12-24)

Changed

  • The document type can now be specified per InputDocument instance by passing the type_ parameter to __init__(). If not specified, the Emtellipro.submit() method will set it, so previous ways of setting the document type for all documents will continue to work.

  • utils.read_files() previously used a default encoding of UTF-8 for all files, but in some cases not all files will have the same encoding. It now autodetects the encoding for each file, unless an encoding to use for all files is given explicitly.
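
Per-file encoding handling can be approximated with the standard library alone by trying a short list of candidate encodings in order. Note that this is a simple fallback strategy, not statistical detection, and it is not how utils.read_files() is implemented; the function name and the candidate list are assumptions.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252"), last_resort="latin-1"):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns (text, encoding_used). latin-1 maps every byte to a
    character, so the last resort cannot fail.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode(last_resort), last_resort


utf8_text, utf8_encoding = decode_with_fallback("café".encode("utf-8"))
legacy_text, legacy_encoding = decode_with_fallback("café".encode("cp1252"))
```

Passing a single-item encodings tuple corresponds to the case where one encoding is given explicitly for all files.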

1.4.1 (2018-11-23)

Changed

  • Use Radiology/generic for default category/subcategory in database.py and example client.

  • Set default document type as ‘plain’ in database.py

1.4.0 (2018-11-20)

Changed

  • Use new multi-file format for submitting documents to emtelliPro server instead of older JSON-based one. There are no user-facing changes as a result of this.

  • The computation of document progress was changed to match changes to the API’s response for /status

  • Added sample_Discharge_summary_report.txt to example-data folder.

  • Added generic report subtype as default

1.3.0 (2018-11-15)

Added

  • examples/database.py gained a --store-reports flag on its process command which will enable storing of the original report text in the text column of the document table.

  • A foundentitylocation table was added to examples/database.py which maps found entities to locations (e.g. sentences) where they were found. These are the same locations as available in emtellipro.data.AnnotatedDocument.locations

Fixed

  • emtellipro.data.Span objects are now hashable, fixing an issue with their use as dictionary keys in FoundEntity.parts and SentenceLocation.parts
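
Why hashability matters for dictionary keys can be shown with a stand-in class. The class below is hypothetical and is not the SDK's Span; its start and end fields are assumed for illustration. A frozen dataclass gets both value-based equality and a matching hash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleSpan:
    """Hypothetical stand-in for a text span; not emtellipro.data.Span."""
    start: int
    end: int


# Equal spans hash equally, so they can be used as dictionary keys.
parts = {ExampleSpan(0, 5): "chest", ExampleSpan(6, 10): "pain"}
looked_up = parts[ExampleSpan(0, 5)]
```

A plain class that defines __eq__ without __hash__ is unhashable in Python 3, which produces a TypeError when instances are used as dict keys.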

1.2.0 (2018-11-09)

  • Initial post-Beta release