Structured entries as metadata

CCD files also contain structured entries represented by XML elements. The client extracts information from these structured entries as extra metadata and stores it in the documentstructuredmetadata database table. Note that this information goes directly to the database and is not passed through emtelliPro for processing.

Each metadata item in the database is a JSON blob containing a single JSON object. Every item has a label field, which is used for cross-references between items. Fields in metadata correspond to XML attributes or closely related XML elements (with the names converted to lowercase with underscores) unless described otherwise here. See the metadata schema for more details.
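The name conversion can be illustrated with a minimal sketch. It assumes the XML names are camelCase (as in codeSystemName); the function name to_metadata_field_name is hypothetical and is not part of the client.

```python
import re


def to_metadata_field_name(xml_name: str) -> str:
    """Convert a camelCase XML name to lowercase with underscores.

    Illustrative only: the client's actual conversion rules may differ.
    """
    # Insert an underscore at each lowercase/digit-to-uppercase boundary,
    # then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", xml_name).lower()


print(to_metadata_field_name("codeSystemName"))
```

Names that are already lowercase, such as code, pass through unchanged.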

Each metadata item is based on two XML elements: one which carries coded information or implies a relationship, and one which the code or relationship is for. The tags of these two elements are stored in code_elt_tag and for_elt_tag respectively.

When there is coded information (ontology information), it is in fields code, code_system, code_system_name, and code_system_version, corresponding to the XML attributes.
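As a rough sketch of this correspondence, the snippet below reads the four attributes from a coded XML element. The sample element and the helper code_fields are illustrative assumptions, and storing a missing attribute as null (None) is also an assumption rather than documented client behaviour.

```python
import xml.etree.ElementTree as ET

# XML attribute name -> metadata field name
CODE_ATTRIBUTES = {
    "code": "code",
    "codeSystem": "code_system",
    "codeSystemName": "code_system_name",
    "codeSystemVersion": "code_system_version",
}


def code_fields(element: ET.Element) -> dict:
    """Collect the code fields from an element's XML attributes.

    Attributes that are absent come back as None (an assumption).
    """
    return {field: element.get(attr) for attr, field in CODE_ATTRIBUTES.items()}


# Hypothetical coded element, for illustration only.
sample = ET.fromstring(
    '<code code="8480-6" codeSystem="2.16.840.1.113883.6.1" '
    'codeSystemName="LOINC"/>'
)
fields = code_fields(sample)
print(fields)
```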

When a metadata item implies a relationship, parent_reference is the label of the other metadata item which it refers to, and relation_type_code corresponds to the XML typeCode attribute found for the relationship.

Some metadata items refer to coded elements which are translations of other coded elements. These have the translation_of field set to the label of the metadata item whose codes they translate, and their fields other than the code information are identical to those of that item.
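Cross-references such as parent_reference and translation_of can be followed by indexing items on label. The following is a minimal sketch, assuming the JSON blobs have already been read from the database; the labels, codes, and the helper resolve are hypothetical.

```python
import json

# Hypothetical JSON blobs; real items carry more fields than shown here.
raw_items = [
    '{"label": "m1", "code": "A", "code_system_name": "SystemA"}',
    '{"label": "m2", "code": "B", "code_system_name": "SystemB", "translation_of": "m1"}',
    '{"label": "m3", "parent_reference": "m1", "relation_type_code": "COMP"}',
]

items = [json.loads(raw) for raw in raw_items]
by_label = {item["label"]: item for item in items}


def resolve(item: dict, field: str):
    """Return the item whose label is stored in `field`, or None."""
    target = item.get(field)
    return by_label.get(target) if target is not None else None


original = resolve(items[1], "translation_of")
parent = resolve(items[2], "parent_reference")
print(original["code"], parent["label"])
```

Building the index once keeps each lookup constant-time, which matters when a document yields many metadata items.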

The section field is the label of the metadata item for the innermost containing <section> element.

Metadata items which refer to text have the text in text and the spans (into the plain text document as sent to emtelliPro and stored in the database) in text_spans. If text_spans is null then the text is not part of the plain text document, and text may not correspond to any substring of the plain text document. If the text was determined by a referenced separate XML element, that element’s ID is stored in text_id. text_source indicates where the text came from:

  • inline: There was an explicit text element for the element the metadata item is for or for its code element.

  • reference: Like inline, but the explicit text was given as a reference (to the element with ID text_id).

  • contained: There was no explicit text element, but the metadata item was for an element which does contain a text element at some level of nesting.
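The relationship between text, text_spans, and the plain text document can be sketched as follows. The span format is not specified here, so this example assumes each span is a [start, end] pair of character offsets with an exclusive end; check the metadata schema before relying on that. The helper text_fragments and the sample data are hypothetical.

```python
def text_fragments(item: dict, plain_text: str) -> list:
    """Return the substrings of plain_text covered by the item's spans.

    Assumes each span is [start, end) in character offsets (unverified).
    Returns an empty list when text_spans is null, because the item's
    text is then not part of the plain text document.
    """
    spans = item.get("text_spans")
    if spans is None:
        return []
    return [plain_text[start:end] for start, end in spans]


# Hypothetical document and metadata items, for illustration only.
plain_text = "Blood pressure: 120/80 mmHg"
in_document = {"text": "Blood pressure", "text_spans": [[0, 14]], "text_source": "inline"}
not_in_document = {"text": "Systolic", "text_spans": None, "text_source": "reference"}

print(text_fragments(in_document, plain_text))
print(text_fragments(not_in_document, plain_text))
```

When text_spans is null, fall back to the text field itself rather than searching the plain text document for it, since no matching substring is guaranteed.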

Note

If the client is not capturing the structured information you need, then contact Emtelligent.