Using the Database Client with Snowflake

Snowflake is a cloud-based database which can be used as an RDBMS for the emtelliPro database. Compatibility with the emtelliPro database client is achieved using the Snowflake plugin for SQLAlchemy, the ORM that the emtelliPro database client uses. This document aims to walk the user through installing the Snowflake plugin for SQLAlchemy, as well as creating a Snowflake-based emtelliPro database and schema and processing documents with the emtellPro client for storage in a Snowflake-based emtelliPro database.

Direct Insertion into Snowflake

The steps for this howto are:

Install the emtelliPro Database Client and Snowflake SQLAlchemy plugin
Create a database in Snowflake
Create the emtelliPro tables/schema in the new database
Process a document and store the output in the new database

Step 1: Install the Required Software

These instructions are for Linux; however performing these steps on MacOS or Windows will be similar but may require slight changes to the commands depending on the OS and shell used.

First, download and install the SDK/clients using the instrucions from Basic Client and Database Client. Or, for convenience, the commands to run to install the basic and database clients are:

python3 -m venv venv
source venv/bin/activate
pip install --upgrade wheel setuptools
pip install emtellipro-VERSION-py3-none-any.whl

(where VERSION in the filename in last command above is the version of the SDK you’re installing).

Now, let’s install the Snowflake plugin for SQLAlchemy:

pip install --upgrade snowflake-sqlalchemy

Finally, you should test your installation to ensure that the emtelliPro database client is working. Ensure that you’re still within the Python virtualenv and execute the following:

emtellipro-db-client --version

If you have a working emtelliPro database client, you should see output similar to:

emtellipro-db-client, version 5.10.1

Step 2: Create a Database in Snowflake

The screenshots and commands in this section may vary slightly for your version of Snowflake.

First, go to the ‘Warehouses’ tab in the Snowflake UI and ensure that the warehouse that you’re going to use is in status Started, as shown in the screenshot below:

Note that the name of our warehouse in this case is COMPUTE_WH as we will use this name again shortly. If the warehouse that you want to use is not in Started status, then you’ll need to start it before proceeding.

Next, click on the ‘Databases’ tab in the Snowflake UI and click on the ‘Create’ button to create a new database. In this case, we’ve called our new database EMTELLIPRO, as shown in the screenshot below:

For the purposes of this howto, we’re just going to use the sysadmin’s username to access this database; we don’t need to set any additional privileges or do anything else at this point.

Step 3: Create the emtelliPro Tables

Next, we’ll use emtellipro-db-client to create the tables for storing the emtelliPro output in our new database. For this step, make sure that you’re inside the Python virtual environment that we created in Step 1 above. If you have dropped out of the Python venv, you can get back inside of it with:

source venv/bin/activate

You should take a minute to gather your variables, as there are quite a few in this step. The variables we’re going to need are:

<snowflake_username> - the Snowflake username we’ll use for this database
<snowflake_password> - the Snowflake password for the account that we are using
<snowflake_account> - the Snowflake account that we are using
<snowflake_database> - the name of the database we created above (in this case, it is EMTELLIPRO)
<snowflake_schema> - the Snowflake schema name that we are using (we will use the default schema of PUBLIC)
<snowflake_warehouse> - the name for our Snowflake warehouse (in this case, it is COMPUTE_WH)
<snowflake_role> - the Snowflake role name that we are using (for this username, it is SYSADMIN)

Now, let’s build the Snowflake URL that we are going to use for accessing our Snowflake database, which we’ll use in this step and in Step 4 below. It has the following format:

snowflake://<snowflake_username>:<snowflake_password>@<snowflake_account>/<snowflake_database>/<snowflake_schema>?warehouse=<snowflake_warehouse>&role=<snowflake_role>

Once we have plugged our variables into the URL, we can use it with the emtelliPro database client’s create-db command as follows:

emtellipro-db-client \
    --database "snowflake://user:password@account/EMTELLIPRO/PUBLIC?warehouse=COMPUTE_WH&role=SYSADMIN" \
    create-db

Note that the use of double-quotes in the example above are necessary to get bash to parse the command properly, and that the line continuation character of \ is bash-specific; different shells (like the Windows command interpreter) use different characters for this. If this command is successful in creating the tables, the database client should echo something back like:

Using plugins: None
Tables created

And in the Snowflake UI, the tables that were just created will be visible in the EMTELLIPRO database as shown below (note that more tables are present than those shown in the screenshot):

Using private keys for authentication

All commands in the Python database client also take --snowflake-private-key-path as an option alongside --database. This option should be set to the path of the private key used for the connecting to Snowflake. If using this option, omit the password <snowflake_password> in the connection URL above. To be consistent with SnowSQL, the passphrase for this key file can be passed using SNOWFLAKE_PRIVATE_KEY_PASSPHRASE or SNOWSQL_PRIVATE_KEY_PASSPHRASE in the environment, or if those environment variables are not set, the client will prompt for the passphrase if necessary. Both encrypted and unencrypted key files are supported.

Step 4: Process a Sample Report and Store the Output in the Snowflake Database

In this processing example, we will use the Snowflake URL that we built in Step 3 above, and we will process the sample CT scan report that is in the Python SDK in the file example-data/sample_ct_report.txt.

Again, inside of the Python virtual environment that has been used in these steps, we will process the sample CT scan report with the command below. Note that the variables <access_key>, <secret_key>, and <server_url> are for the public access key, secret key, and the emtelliPro server API URL that you are using. These should be assigned to you by emtelligent or your system administrator.

emtellipro-db-client \
    --database "snowflake://user:password@account/EMTELLIPRO/PUBLIC?warehouse=COMPUTE_WH&role=SYSADMIN" \
    process \
    --bulk-insert \
    --access-key <access_key> \
    --secret-key <secret_key>  \
    --server <server_url> \
    --document-type plain \
    --category Radiology \
    --subcategory CT \
    --job-id snowflake_demo \
    ./example-data/sample_ct_report.txt

If this command is successful, we should see output similar to the following:

Using plugins: None
NOTE: You've enabled bulk-inserts when saving. Due to bulk insertion the Storing progress bar below will only work in increments of 50 reports.
Connecting as [email protected]
Submitting 1 documents...
Processing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00]
Storing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [06:47<00:00]
1 documents saved to database
Processed 1 reports in 10.83s for an average per report processing time of 10.827675s per report
Saved documents to database in 6:48s (6:48s / document)

The output above shows a successful processing and saving of a document to the Snowflake emtelliPro database that we have created. Note that the storage performance when using a hosted, cloud-based Snowflake instance can be less than that of a locally-hosted database due to network packetization and transit delays.

We can now go back to the Snowflake UI, and in the ‘Worksheets’ tab, select the EMTELLIPRO database and execute a simple query such as:

SELECT * FROM concept WHERE ontology='snomed';

From processing the sample_ct_report.txt document above, we would then expect to see output (based on the current version of emtelliPro) similar to the following:

If you can see data similar to the above with this query, then you have been able to successfully process and store data from emtelliPro into a Snowflake-based emtelliPro database. For further reading on SQL queries, please see the documentation on the emtelliPro database schema.

Access JSON data in Snowflake

Depending on the options provided to the Python database client, the entire JSON output from emtelliPro can be stored in the document table in the emtelliPro database. The JSON is stored in the JSON_REPRESENTATION column in the document table as a varchar type. In order to access the JSON data in Snowflake the parse_json SQL command is necessary.

SELECT parse_json(d.JSON_REPRESENTATION):id::string as id from document d;

This SQL statement returns a single column table of the task ids stored in the JSON.

CSV uploads to Snowflake

Creating sequences for unique numerical identifiers which is used in the emtelliPro database schema can be quite slow in Snowflake. One way around this is to use UUIDs for the identifiers and generate CSV files which can be uploaded to a Snowflake database. This process is much faster than the direct insertion into a Snowflake database described above. Using CSV uploads the concepts table might contain duplicates as Snowflake does not enforce the uniqueness constraint for file uploads. The user will have to run a post-process step to remove the duplicates or at least be aware of these duplicates in the concept table when writing the SQL queries against the database.

Steps 1 through 3 are the same as in the above section about direct insertion into a Snowflake database. Step 4 onwards will be different: we will create CSV files as the output emtelliPro database and then upload the CSV files to a Snowflake database.

Step 4: Process Data and Store the Output in CSV Files

This section provides an example processing run of some input data and storing the output in CSV files that can be then uploaded to the Snowflake database created in Steps 1 through 3 from the previous section.

Step 4a

First create a directory to store the CSV files. For this example, let’s call this the csv_output directory.

Step 4b

Use the emtellipro database client to initialize the CSV output files using create-db. I’m using the csv_output directory.

emtellipro-db-client \
  --database csv://csv_output \
  create-db

Step 4c

Process some Clinical text documents stored in the files reports*.txt and store the output as CSV files to the directory csv_output.

emtellipro-db-client \
  --database csv://csv_output \
  process \
  --access-key $(cat access.key) \
  --secret-key $(cat secret.key) \
  --server $(emtellipro-server) \
  --document-type plain \
  --category Clinical \
  --subcategory generic \
  --job-id csv \
  reports*.txt

Step 5: Use snowsql to upload the CSV files

Set up snowsql with the appropriate command line parameters or create a config file by following the Snowflake documentation for snowsql.

Make sure you are in the directory where the csv files were created (csv_output in our running example) or modify the script with the path to the csv files. You may want to timestamp the csv files but the script provided below simply removes the staged files after the insertion into the snowflake database is done. The sample script below does not do a lot of error checking and it just removes the uploaded csv files so please examine the script carefully and try it on sample data before running it on production data.

Assume that using Steps 1 through 3 of the previous section, the database EMTELLIPRO has been created on Snowflake and the emtelliPro tables have been created using the create-db option in the Python database client. Also we assume below you have a config file setup for snowsql for the EMTELLIPRO database on Snowflake.

Save the snowsql code provided at the end of this section to the file snowflake_upload.sql and then run the SQL code using snowsql.

cd csv_output
snowsql -c EMTELLIPRO

In the snowsql interpreter type in:

!load snowflake_upload.sql

The emtelliPro database will be available on Snowflake once the upload completes successfully.

The Snowflake database can be queried just like the database created through direct insertion into Snowflake from the previous section.

The SQL code for snowflake_upload.sql:

-- snowflake_upload.sql
-- Uploads csv to table stage
-- Usage: Ensure the file names are correct in the PUT and COPY command (find/replace as needed)
--        Current file path assumes snowsql session is started in the directory containing the csvs

CREATE STAGE IF NOT EXISTS ASSUMEDENTITY;
CREATE STAGE IF NOT EXISTS CONCEPT;
CREATE STAGE IF NOT EXISTS DOCUMENT;
CREATE STAGE IF NOT EXISTS DOCUMENTCONCEPT;
CREATE STAGE IF NOT EXISTS DOCUMENTMETADATA;
CREATE STAGE IF NOT EXISTS DOCUMENTSTRUCTUREDMETADATA;
CREATE STAGE IF NOT EXISTS ENTITY;
CREATE STAGE IF NOT EXISTS EXPERIENCERRELATION;
CREATE STAGE IF NOT EXISTS FOLLOWUPRELATION;
CREATE STAGE IF NOT EXISTS FOLLOWUPRELATIONPROCEDURE;
CREATE STAGE IF NOT EXISTS FOLLOWUPRELATIONREASON;
CREATE STAGE IF NOT EXISTS FOUNDENTITY;
CREATE STAGE IF NOT EXISTS FOUNDENTITYCONCEPT;
CREATE STAGE IF NOT EXISTS FOUNDENTITYLOCATION;
CREATE STAGE IF NOT EXISTS FOUNDENTITYMEASUREMENTUNIT;
CREATE STAGE IF NOT EXISTS FOUNDENTITYSPAN;
CREATE STAGE IF NOT EXISTS FOUNDENTITYTYPE;
CREATE STAGE IF NOT EXISTS IMAGELINKRELATION;
CREATE STAGE IF NOT EXISTS IMAGELINKRELATIONIMAGEFINDING;
CREATE STAGE IF NOT EXISTS IMAGELINKRELATIONREFERENCE;
CREATE STAGE IF NOT EXISTS LOCATION;
CREATE STAGE IF NOT EXISTS MEASUREMENTRELATION;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATION;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONDOSAGE;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONDURATION;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONFREQUENCY;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONINDICATION;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONMODE;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONMODIFIER;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONNECESSITY;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONQUANTITY;
CREATE STAGE IF NOT EXISTS MEDICATIONRELATIONROUTE;
CREATE STAGE IF NOT EXISTS PROCESSINGDETAILS;
CREATE STAGE IF NOT EXISTS QUALIFIERRELATION;
CREATE STAGE IF NOT EXISTS RELATION;
CREATE STAGE IF NOT EXISTS SECTIONLOCATION;
CREATE STAGE IF NOT EXISTS SECTIONLOCATIONSPAN;
CREATE STAGE IF NOT EXISTS SENTENCELOCATION;
CREATE STAGE IF NOT EXISTS SENTENCELOCATIONSPAN;
CREATE STAGE IF NOT EXISTS SENTENCESECTION;

PUT file://assumedentity.csv  @%ASSUMEDENTITY auto_compress=true;
PUT file://concept.csv  @%CONCEPT auto_compress=true;
PUT file://document.csv  @%DOCUMENT auto_compress=true;
PUT file://documentconcept.csv  @%DOCUMENTCONCEPT auto_compress=true;
PUT file://documentmetadata.csv  @%DOCUMENTMETADATA auto_compress=true;
PUT file://documentstructuredmetadata.csv  @%DOCUMENTSTRUCTUREDMETADATA auto_compress=true;
PUT file://entity.csv  @%ENTITY auto_compress=true;
PUT file://experiencerrelation.csv  @%EXPERIENCERRELATION auto_compress=true;
PUT file://followuprelation.csv  @%FOLLOWUPRELATION auto_compress=true;
PUT file://followuprelationprocedure.csv  @%FOLLOWUPRELATIONPROCEDURE auto_compress=true;
PUT file://followuprelationreason.csv  @%FOLLOWUPRELATIONREASON auto_compress=true;
PUT file://foundentity.csv  @%FOUNDENTITY auto_compress=true;
PUT file://foundentityconcept.csv  @%FOUNDENTITYCONCEPT auto_compress=true;
PUT file://foundentitylocation.csv  @%FOUNDENTITYLOCATION auto_compress=true;
PUT file://foundentitymeasurementunit.csv  @%FOUNDENTITYMEASUREMENTUNIT auto_compress=true;
PUT file://foundentityspan.csv  @%FOUNDENTITYSPAN auto_compress=true;
PUT file://foundentitytype.csv  @%FOUNDENTITYTYPE auto_compress=true;
PUT file://imagelinkrelation.csv  @%IMAGELINKRELATION auto_compress=true;
PUT file://imagelinkrelationimagefinding.csv  @%IMAGELINKRELATIONIMAGEFINDING auto_compress=true;
PUT file://imagelinkrelationreference.csv  @%IMAGELINKRELATIONREFERENCE auto_compress=true;
PUT file://location.csv  @%LOCATION auto_compress=true;
PUT file://measurementrelation.csv  @%MEASUREMENTRELATION auto_compress=true;
PUT file://medicationrelation.csv  @%MEDICATIONRELATION auto_compress=true;
PUT file://medicationrelationdosage.csv  @%MEDICATIONRELATIONDOSAGE auto_compress=true;
PUT file://medicationrelationduration.csv  @%MEDICATIONRELATIONDURATION auto_compress=true;
PUT file://medicationrelationfrequency.csv  @%MEDICATIONRELATIONFREQUENCY auto_compress=true;
PUT file://medicationrelationindication.csv  @%MEDICATIONRELATIONINDICATION auto_compress=true;
PUT file://medicationrelationmode.csv  @%MEDICATIONRELATIONMODE auto_compress=true;
PUT file://medicationrelationmodifier.csv  @%MEDICATIONRELATIONMODIFIER auto_compress=true;
PUT file://medicationrelationnecessity.csv  @%MEDICATIONRELATIONNECESSITY auto_compress=true;
PUT file://medicationrelationquantity.csv  @%MEDICATIONRELATIONQUANTITY auto_compress=true;
PUT file://medicationrelationroute.csv  @%MEDICATIONRELATIONROUTE auto_compress=true;
PUT file://processingdetails.csv  @%PROCESSINGDETAILS auto_compress=true;
PUT file://qualifierrelation.csv  @%QUALIFIERRELATION auto_compress=true;
PUT file://relation.csv  @%RELATION auto_compress=true;
PUT file://sectionlocation.csv  @%SECTIONLOCATION auto_compress=true;
PUT file://sectionlocationspan.csv  @%SECTIONLOCATIONSPAN auto_compress=true;
PUT file://sentencelocation.csv  @%SENTENCELOCATION auto_compress=true;
PUT file://sentencelocationspan.csv  @%SENTENCELOCATIONSPAN auto_compress=true;
PUT file://sentencesection.csv  @%SENTENCESECTION auto_compress=true;

COPY INTO ASSUMEDENTITY file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO CONCEPT file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO DOCUMENT file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO DOCUMENTCONCEPT file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO DOCUMENTMETADATA file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO DOCUMENTSTRUCTUREDMETADATA file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO ENTITY file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO EXPERIENCERRELATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOLLOWUPRELATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOLLOWUPRELATIONPROCEDURE file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOLLOWUPRELATIONREASON file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOUNDENTITY file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOUNDENTITYCONCEPT file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOUNDENTITYLOCATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOUNDENTITYMEASUREMENTUNIT file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOUNDENTITYSPAN file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO FOUNDENTITYTYPE file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO IMAGELINKRELATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO IMAGELINKRELATIONIMAGEFINDING file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO IMAGELINKRELATIONREFERENCE file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO LOCATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEASUREMENTRELATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONDOSAGE file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONDURATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONFREQUENCY file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONINDICATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONMODE file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONMODIFIER file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONNECESSITY file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONQUANTITY file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO MEDICATIONRELATIONROUTE file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO PROCESSINGDETAILS file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO QUALIFIERRELATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO RELATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO SECTIONLOCATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO SECTIONLOCATIONSPAN file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO SENTENCELOCATION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO SENTENCELOCATIONSPAN file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';
COPY INTO SENTENCESECTION file_format = (type = csv skip_header = 1 field_optionally_enclosed_by='"', NULL_IF=('NULL')) on_error = 'skip_file';

RM @~ pattern='.*\.csv\.gz';