Using the Database Client with Microsoft SQL Server

The emtelliPro database client works well with Microsoft SQL Server, but just due to how SQL Server works, there are a few more steps needed to establish connectivity, as well as one additional option needed at runtime. Also, if you’re using SQL Server, you may be using Microsoft Windows, so we’ve included instructions here for both Windows usage. Linux usage is fairly similar - but the drivers you use may be different (e.g. FreeTDS vs Microsoft’s drivers) and so the database URLs used with the drivers will have to be different, and the driver install and setup processes will necessarily be different.

This document assumes that either you, or someone in your organization can help you with the SQL Server setup/installation/connectivity bit. You may need to modify settings on your SQL Server installation to allow remote connectivity, deal with certificates, etc in order to establish remote connectivity to the SQL Server instance.

Instructions for Microsoft Windows

Step 1: Install the Required Software

NB - it’s assumed here you’re using Windows 7 or newer.

There are a few pieces of software you’ll need to install and have working in order to use the emtelliPro Database Client in Windows with connectivity with MS SQL Server:

  • Python 3

  • Microsoft ODBC Drivers for SQL Server

  • The emtelliPro Python Client/SDK

  • Some pip modules will be installed as part of installing the client

Let’s start off by installing Python 3. First, download a version of the Python 3 installer for Windows. The Windows Python installer can be downloaded from the python.org site. Run the installer, ensuring that you select the option to add Python to your path, and just using the default options are fine:

_images/install_python_windows.png

Next, download and install the Microsoft ODBC Drivers for SQL Server, which are available from Microsoft.. Make sure you install them with selecting the additional option to include the SDK as well.

The next step is to download and unpack the emtelliPro Python Client/SDK, minimum version 3.1.1, available from the emtelligent Download page. We have both .tar and .zip files available - we suggest you use whichever version that you’re comfortable with unpacking. If you’re on Windows and you need to untar a .tar file, and you double-click it in explorer and nothing happens, or it asks you what program you want to open this with, you’ll want to install some kind of compression/uncompression software that supports UNIX tar files. While this isn’t an official endorsement, 7-zip is open source and works well, and can be downloaded here.

The final step is to set up and install the database client. Start off by opening a command prompt window (you can do this by going to the start menu, and typing ‘cmd’ and hitting ‘enter’).

Next, go to the directory that you unpacked the emtelliPro Python SDK into. It might be something like C:\Users\jsmith\Documents\emtellipro-python-sdk if your Windows username is jsmith. If I wanted to get to this directory, I could do it using the Windows cd command:

cd c:\Users\jsmith\Documents\emtellipro-python-sdk

Now, we’ll set up a python virtual environment by running the following at the command line prompt:

python -m venv venv

This should create a directory under the current directory that contains the virtual environment (venv) files. Any time we want to run the emtellipro Python client(s), we need to be inside this venv; you do this by activating the venv with the following command, which should be executed from within the emtellipro-python-sdk directory:

venv\Scripts\activate.bat

If this has worked, it will change your command prompt - it should now have (venv) at the start of it and should look something like:

(venv) C:\Users\jsmith\Documents\emtellipro-python-sdk>

You’re now ready to install the client. These next steps install python packages using the pip installer program, and will pull down packages from pip archives on the internet - so make sure the machine you’re using has access to the internet. First, install the pip wheel package:

pip install wheel

Then, install the emtelliPro client. NB - the client name changes depending on the version - here, we’re installing version 3.1.1, so the filename is emtellipro-3.1.1-py3-none-any.whl:

pip install emtellipro-3.1.1-py3-none-any.whl

Finally, install the pyodbc package:

pip install pyodbc

This should be all that you need in terms of installed software. Next, we’ll go on to using the database client.

Step 2: Process Reports

A more exhaustive explanation of the database client is available here: Database Client, but basically the database client can either:

  • Retrieve text or PDF documents from a database table, send them to emtelliPro for processing, and store the output in a database

  • Retrieve text, PDF, or emtelliPro JSON-format documents stored disk, send them to emtelliPro for processing, and store the output in a database

Process Reports from a Database

The example below is for retrieving text documents from a database table, processing them with emtelliPro, and storing the output in a database. We are making some assumptions here:

  1. We have access to a SQL Server database of medical documents which we can query, and which we know the constraints of the query - e.g. reports from date A to date B, with a certain report description.

  2. We can create a new database into which to store the structure data from our emtelliPro processing run.

The first thing we need to figure out is the query and SQL server settings that we’re going to use for for #1 above: In this case, let’s say we have a table called reports in a database called reports. The SQL Server username we’re using , which has access to this reports database is emtellipro and the password is password. The SQL Server is running on a host with an IP address of 10.100.1.70, and SQL Server is listening on TCP port 1433. The SQL query we’re planning on using to retrieve our reports and metadata is:

SELECT TOP 10
    report_id AS id,
    report_id AS source_document_id,
    subject_id,
    description,
    report AS text,
    category,
    subcategory
    FROM reports
    WHERE category='Radiology' and subcategory='CT'
    ORDER BY report_id ASC

Please see the Database Client documentation for more info on how to do the field mapping for this kind of query.

The next step in preparing for processing is to create a database into which to store the emtelliPro output. First, go into SQL Server Management Console, and right-click on ‘Databases’ in the tree in the left-hand pane and choose ‘New Database…’:

_images/mssql_create_new_db_step1.png

In the window that opens in this case I’ll call this new database emtellipro_output and give it an owner of the emtellipro user account and click OK - no other special settings were required.

_images/mssql_create_new_db_step2.png

Our next step is at the command prompt, inside our venv that we created above - we need to now create the database tables in this new DB into which our emtelliPro output will be stored:

emtellipro-db-client --database "mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=ODBC+Driver+17+for+SQL+Server" create-db

If this works, we should see the client echo back Tables created.

Note

If you are using Windows Authentication (instead of the above example, which uses SQL Server Authentication), you need to use a slightly different connection string to use a trusted connection, which would look like (in this example, our fully qualified windows username is DESKTOP-HGX2AXQ\emtellipro):

emtellipro-db-client --database "mssql+pyodbc://DESKTOP-HGX2AXQ\emtellipro:str0nGP8%[email protected]:1433/emtellipro_output?driver=ODBC+Driver+17+for+SQL+Server&Trusted_Connection=Yes" create-db

Our next step is to just do the processing run:

emtellipro-db-client ^
   --database "mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=ODBC+Driver+17+for+SQL+Server" process ^
   --access-key <your-access-key>  ^
   --secret-key <your-secret-key>  ^
   --sql-query "SELECT TOP 10 report_id AS id, report_id AS source_document_id, subject_id, description, report AS text, category, subcategory FROM reports WHERE category='Radiology' and subcategory='CT' ORDER BY report_id ASC" "mssql+pyodbc://emtellipro:[email protected]:1433/reports?driver=ODBC+Driver+17+for+SQL+Server" ^
   --server https://api.emtelligent.com:50001  ^
   --store-reports  ^
   --store-sections-and-sentences  ^
   --document-type plain  ^
   --category Radiology  ^
   --subcategory CT  ^
   --features snomed-ontology,entity-polarity,entity-uncertainty,followup-relations,measurement-relations,text  ^
   --job-id ct_10

If this is successful, we should see something similar to the output below:

Submitting 10 documents...
Waiting for processing results  [###---------------------------------]   10%  00
Waiting for processing results  [#######-----------------------------]   20%  00
Waiting for processing results  [##########--------------------------]   30%  00
Waiting for processing results  [##############----------------------]   40%  00
Waiting for processing results  [##################------------------]   50%  00
Waiting for processing results  [#####################---------------]   60%  00
Waiting for processing results  [#########################-----------]   70%  00
Waiting for processing results  [############################--------]   80%  00
Waiting for processing results  [################################----]   90%  00
Waiting for processing results  [####################################]  100%

10 documents saved to database
Processed 10 reports in 3.93s for an average per report processing time of .392577s per report
Saved documents to database in 22.87s (2.29s / document)

Some important pointers about using the Database Client in Windows:

  • In the code block above, you’ll see some ^ characters; these are optional and they’re there just to let you enter multi-line commands; you can type it all as one long string but it gets unwieldy and difficult to debug

  • The database URLs are enclosed in double-quotes; single-quotes don’t work just to how the windows command deals with syntax

  • You don’t have to use MS SQL server in Windows - you could be using any database that SQLAlchemy supports - e.g. MySQL, SQLite, etc - your URL(s) would just be different than the examples I’ve provided above.

Process Reports from Files on Disk

If you plan to use the client to process multiple documents in Windows that are stored in files on disk, you should use Windows PowerShell due to its better handling of files and directories than the standard Windows Command Prompt. The examples below in this section all require the use of PowerShell.

Using the instructions from the above section, create an empty database in SQL Server (e.g. emtellipro-output) with a user account that you have access to (e.g. the emtellipro user account from the above section).

Our next step is at the command prompt, inside our venv that we created above - we need to now create the database tables in this new DB into which our emtelliPro output will be stored. Note that because we are using PowerShell, we need to use a different Python ‘activate’ script (activate.ps1 rather than activate.bat) to activate our venv:

.\venv\Scripts\activate.ps1

Next, we can create our database tables using the same command as above:

emtellipro-db-client --database "mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=ODBC+Driver+17+for+SQL+Server" create-db

If this works, we should see the client echo back Tables created.

Next we should go into the directory which contains the files that we want to process - in this case, we have a directory containing 1000 CT scan reports called 1k_radiology_CT:

cd .\1k_radiology_CT

Our next step is just to do the processing run. Note that because we are using PowerShell, the line continuation character is different; it is now a backtick instead of a caret. And different from the above section, because we are not querying a database to retrieve the documents there is no --sql-query command, and we are using (ls .) list the files in the current directory for processing:

emtellipro-db-client `
--database "mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=ODBC+Driver+17+forSQL+Server" `
process `
--access-key <your-access-key>  `
--secret-key <your-secret-key>  `
--server https://api.emtelligent.com:50001  `
--store-reports  `
--store-sections-and-sentences  `
--document-type plain  `
--category Radiology  `
--subcategory CT  `
--features snomed-ontology,entity-polarity,entity-uncertainty,entity-known-ambiguity,followup-relations,measurement-relations,text  `
--job-id ct_1k  `
(ls .)

If this is successful, we will see the client echo back the processing progress bars similar to the above.

Instructions for Linux

These are abridged instructions, but they should be enough to get users started.

Using the Database Client with FreeTDS on Ubuntu Linux 20.04

Step 1: Setup the Drivers

NB: Instructions for other Linuxes may use different package installers.

First, you need to install a number of packages:

sudo apt-get install unixodbc unixodbc-dev freetds-dev freetds-bin tdsodbc

Next, edit the /etc/odbcinst.ini file:

sudo nano /etc/odbcinst.ini

Create a code block in this file that reads:

[FreeTDS]
Driver=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Threading=1

Optionally, if you want to test connectivity to your SQL server to see if you have FreeTDS installed correctly, you can then use the tsql utility which should now be installed. In the example below, we’re connecting to a SQL Server instance with an IP address of 10.100.1.70, listening on port 1433, with a username of emtellipro.

tsql -H 10.100.1.70 -p 1433 -U emtellipro

Note

If you’re using newer versions of SQL server (e.g. SQL Server 2019), you’ll have to specify which TDS version you’re using, e.g.:

TDSVER=8.0  tsql -H 10.100.1.70 -p 1433 -U emtellipro

And if you’re using Windows authentication on your server, and/or if your password contains special characters, you should use single quotes around your username and password. You’ll need something like this next example if you’re using Windows Authentication with SQL Server 2019:

TDSVER=8.0  tsql -H 10.100.1.70 -p 1433 -U 'DESKTOP-HGX2ASQ\emtellipro' -P 'str0nGP8%$w0Rd!'

If we are able to connect, this should give a prompt that looks like:

locale is "en_US.UTF-8"
locale charset is "UTF-8"
using default charset "UTF-8"
1>

And you can execute queries similar to the following (customize the SQL query for your environment):

1> USE reports;
2> go
1> SELECT TOP 2 report_id, subject_id, category, subcategory, description FROM reports;
2> go
report_id   subject_id  category    subcategory description
1001113 11232   Radiology   CT  CT C-SPINE WITHOUT CONTRAST
1001114 11234   Radiology   CT  CT HEAD WITHOUT CONTRAST
(2 rows affected)

Assuming that this worked, this means that our drivers are at least partially properly installed and we have connectivity to the server.

Next, we want to download and install the database client. Please see the Database Client for more info on how to do this. The only different step is to install the pyodbc pip package. Make sure you’re in the python venv where you installed the client and run the following:

pip install pyodbc

Step 2: Using the Client

Please see the Windows section of this document above for more detail on creating an output database, but assuming that we have done this and that we know what SQL query we want to use to retrieve our reports for processing, our first step in a processing run would be to create the tables used to store the emtelliPro output:

emtellipro-db-client --database mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=FreeTDS create-db

Note

For newer versions of SQL Server (e.g. SQL Server 2019), we need to explicitly specify the TDS version in our connection string. And for Windows Authentication, we should use double quotes to enclose our connection string. Here is an example for both SQL Server 2019 using Windows Authentication:

emtellipro-db-client --database "mssql+pyodbc://DESKTOP-HGX2AXQ\emtellipro:str0nGP8%[email protected]:1433/emtellipro_output?driver=FreeTDS&TDS_VERSION=8.0" create-db

And next we can start a processing run with the following command:

emtellipro-db-client \
    --database 'mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=FreeTDS&TDS_Version=8.0&ClientCharset=UTF-8'  \
    process \
    --access-key <your-access-key> \
    --secret-key <your-secret-key>  \
    --sql-query "SELECT TOP 1000 report_id AS id, report_id AS source_document_id, subject_id, description, report AS text, category, subcategory FROM reports WHERE category='Radiology' and subcategory='CT' ORDER BY report_id ASC" 'mssql+pyodbc://emtellipro:[email protected]:1433/reports?driver=FreeTDS' \
    --server https://api.emtelligent.com:50001 \
    --store-reports \
    --store-sections-and-sentences \
    --document-type plain \
    --category Radiology \
    --subcategory CT \
    --features snomed-ontology,entity-polarity,entity-uncertainty,followup-relations,measurement-relations,text \
    --job-id ct_1000

If this works, you should see output similar to the Windows output from the section above. One important point to note when using FreeTDS is that the output database URL above ends in ?driver=FreeTDS&TDS_Version=8.0&ClientCharset=UTF-8 – the TDS_Version and ClientCharset directives are required for this to work otherwise you’ll get character encoding errors and the database inserts will fail.

Using the Database Client with MS ODBC Drivers on Ubuntu Linux 20.04

Step 1: Setup the Drivers

If you prefer not to (or for technical reasons you can not) use the FreeTDS drivers, you can use the Microsoft-provided drivers, which are available for download directly from Microsoft:.

Installing these drivers is well documented on the Microsoft site, but a summary of the commands required for installing these drivers is here:

sudo su
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
exit
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install msodbcsql17
sudo ACCEPT_EULA=Y apt-get install mssql-tools
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc

This install process should install a new section into /etc/odbcinst.ini with configuration information for these new drivers that looks something like:

[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.4.so.2.1
UsageCount=1

You can then test connectivity using the (newly-installed) sqlcmd utility:

sqlcmd -S 10.100.1.70 -U emtellipro -p password

Similar to FreeTDS, if your connection is successful, this will give you a 1> prompt which you can use to issue some SQL statements:

1> USE reports;
2> GO
Changed database context to 'reports'.
1> SELECT COUNT(*) FROM reports;
2> GO

-----------
       1000

(1 rows affected)
1> quit

Next, we want to download and install the emtelliPro database client. Please see the Database Client for more info on how to do this. The only different step is to install the pyodbc pip package. Make sure you’re in the python venv where you installed the client and run the following:

pip install pyodbc

Step 2: Using the Client

Please see the Windows section of this document above for more detail on creating an output database, but assuming that we have done this and that we know what SQL query we want to use to retrieve our reports for processing, our first step in a processing run would be to create the tables used to store the emtelliPro output:

emtellipro-db-client --database mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=ODBC+Driver+17+for+SQL+Server create-db

And next we can start a processing run with the following command:

emtellipro-db-client \
    --database mssql+pyodbc://emtellipro:[email protected]:1433/emtellipro_output?driver=ODBC+Driver+17+for+SQL+Server \
    process \
    --access-key <your access key> \
    --secret-key <your secret key>  \
    --sql-query "SELECT TOP 10 report_id AS id, report_id AS source_document_id, subject_id, description, report AS text, category, subcategory FROM reports WHERE category='Radiology' and subcategory='CT' ORDER BY report_id ASC" 'mssql+pyodbc://emtellipro:[email protected]:1433/reports?driver=ODBC+Driver+17+for+SQL+Server' \
    --server https://api.emtelligent.com:50001 \
    --store-reports \
    --store-sections-and-sentences \
    --document-type plain \
    --category Radiology \
    --subcategory CT \
    --features snomed-ontology,entity-polarity,entity-uncertainty,followup-relations,measurement-relations,text \
    --job-id ct_10

If this works, you should see output similar to the Windows output from the section above.

SQL Server Connectivity Troubleshooting

If you’re having difficulty connecting to your SQL server instance, by all means enlist some help from your local, friendly DBA. But if one isn’t available, then you might need to dig into it a bit yourself.

Start by ensuring that TCP/IP connections are enabled for the server. There are lots of websites that detail how to do this like this one. (ignore the part on named pipes and just enable TCP/IP service, and on the correct interface. Remember to restart the SQL server service after you’re done!)

You can then (or even before the above step) confirm that SQL server is listening to a TCP port:

First, open a DOS command window (Start Menu-> type ‘cmd’ and hit enter). Next, type netstat -atn. If SQL server is listening, you should see a line like the following (the important part is the 0.0.0.0:1433):

TCP    0.0.0.0:1433           0.0.0.0:0              LISTENING       InHost

Next, test that your machine has TCP/IP connectivity to the server, by trying to telnet to the SQL Server’s TCP port. You can do this by opening a DOS window (in Windows, click on the Start Menu -> type cmd and hit enter; in Linux open a terminal window):

telnet <ip address of server> 1433

In windows, if you see:

'telnet' is not recognized as an internal or external command,
operable program or batch file.

Then you have to enable telnet first – go to Control Panel -> Programs and Features -> Turn Windows Features on or Off -> Check the ‘Telnet Client’ box and hit ‘OK’. If you’re on a Mac, installing telnet is difficult - try googling ‘homebrew install telnet’. Linux should have telnet installed by default.

On Windows, if you’re successful, the dos window will scroll up and you’ll just see an empty dos box. You can close the window to get out of it. If you’re not successful, you’ll see something like:

Connecting To <ip address of server>...Could not open connection to the host, on port 1433: Connect failed

On Linux, if you’re successful, you should see something like:

Trying <ip address of server>...
Connected to <ip address of server.
Escape character is '^]'.

At this step, if you’re successful, it means that you have TCP/IP access to your SQL Server. You should try some of the connection examples with the emtellipro client above, and try looking at the SQL Server logs if you’re having difficulty logging in.

If you’re not successful with the steps above, i.e. you can’t prove that you have TCP/IP connectivity to the server, you may need to open port 1433 on the server’s Windows or Linux firewall, or some other internal network firewall may be blocking access. Adding an inbound rule to the Windows firewall is pretty straightforward, but you may want to talk to your sysadmin before making any global security changes like that.