Clinical data is often handled in specific formats such as ADaM (Analysis Data Model), which includes standard datasets such as ADSL for subject-level data, ADAE for adverse events, ADVS for vital signs, etc. These are derived from the SDTM (Study Data Tabulation Model), which organizes clinical trial data into a standardized structure. These are commonly stored in SAS, XPT (SAS Transport), or standardized JSON-format files.
Reading these formats directly in a Snowpark Notebook in Snowflake is useful because it allows users to process clinical trial data without moving it out of Snowflake. This demo contains notebooks that show how to read clinical dataset using a Snowpark notebook, write the data to Snowflake tables, and query the data from a view that combines various clinical attributed.
├── README.md # This file
├── LEGAL.md # Legal notice
├── LICENSE # License information
├── notebooks/ # SQL and configuration files
│ ├── READ_SAS_FILES.ipynb # notebook to read SAS files
│ ├── READ_XPT_FILES.ipynb # notebook to read XPT files
│ ├── environment.yml # environment config
├── scripts/ # SQL and configuration files
│ ├── setup.sql # Database setup script
│ ├── teardown.sql # Cleanup script
This demo works with ADaM and SDTM clinical trial datasets from external repositories:
-
SAS Format (.sas7bdat): r4csr ADaM datasets
- Provides standardized ADaM datasets in SAS binary format
- Includes datasets like ADSL (Subject Level), ADAE (Adverse Events), ADVS (Vital Signs), etc.
-
XPT/JSON Format (.xpt, .json): CDISC Pilot Project
- Official CDISC pilot submission package
- Contains datasets in both SAS Transport (XPT) and JSON formats
- Follows CDISC standards for clinical data interchange
- Snowflake Account - Active account with appropriate privileges
- SQL Knowledge - Basic understanding of SQL syntax and concepts
- Python Knowledge - Basic understanding of Python syntax and concepts
Follow these three steps to get the demo running in your Snowflake environment:
- Open Snowsight - Navigate to your Snowflake account via Snowsight (https://app.snowflake.com)
- Create a new SQL worksheet:
- Click + Create → SQL Worksheet in the top navigation
- Execute the setup script:
- Copy the contents of
scripts/setup.sql
into your worksheet - Run the entire script by clicking Run All or using Ctrl+Shift+Enter
- This creates all necessary objects: database, schema, warehouse, role, and stage
- Copy the contents of
-
Download data files from external sources:
- SAS files: Download from r4csr ADaM datasets
- XPT/JSON files: Download from CDISC Pilot Project
-
Navigate to the Database:
- In Snowsight, go to Data → Databases → ADAM_SDTM_SNOWPARK_DB → ANALYTICS
-
Access the Stage:
- Click on Stages in the left sidebar
- Select ADAM_STAGE from the list
-
Upload your data files:
- Click + Files or Upload Files
- Upload your clinical data files in the following structure:
json_datasets/ ├── adsl.json ├── adae.json ├── advs.json ├── adlbc.json ├── adlbh.json ├── adlbhy.json ├── adqsadas.json ├── adqscibc.json ├── adqsnpix.json └── adtte.json sas_datasets/ ├── adsl.sas7bdat ├── adae.sas7bdat ├── advs.sas7bdat ├── adlbc.sas7bdat ├── adlbh.sas7bdat ├── adlbhy.sas7bdat ├── adqsadas.sas7bdat ├── adqscibc.sas7bdat ├── adqsnpix.sas7bdat └── adtte.sas7bdat xpt_datasets/ ├── adsl.xpt ├── adae.xpt ├── advs.xpt ├── adlbc.xpt ├── adlbh.xpt ├── adlbhy.xpt ├── adqsadas.xpt ├── adqscibc.xpt ├── adqsnpix.xpt └── adtte.xpt
- Organize files into respective folders (json_datasets, sas_datasets, xpt_datasets)
- Navigate to Notebooks:
- In Snowsight, go to Projects → Notebooks in the left navigation
- Import the first notebook:
- Click + Notebook → Import from file
- Select and upload
notebooks/READ_SAS_FILES.ipynb
- Set the notebook properties:
- Database:
ADAM_SDTM_SNOWPARK_DB
- Schema:
ANALYTICS
- Warehouse:
ADAM_SDTM_SNOWPARK_WH
- Database:
- Import the second notebook:
- Repeat the process for
notebooks/READ_XPT_FILES.ipynb
- Use the same database, schema, and warehouse settings
- Repeat the process for
- Configure the environment:
- In each notebook, ensure the environment is set up with the packages from
notebooks/environment.yml
- The notebooks will automatically install required packages like
pandas
,snowpark
, etc.
- In each notebook, ensure the environment is set up with the packages from
- Run the notebooks:
- Execute each cell sequentially using Shift+Enter
- Or run all cells using Run All from the notebook toolbar
- The notebooks will read the data from the stage and create tables in Snowflake
After completing all steps, verify your setup:
- Check that tables were created: Data → Databases → ADAM_SDTM_SNOWPARK_DB → ANALYTICS → Tables
- Confirm data was loaded by querying the created tables
- Review any views that were created for combined clinical data analysis
This demo showcases how Snowflake's powerful data platform can seamlessly handle clinical trial data in multiple formats (SAS, XPT, JSON) without requiring data movement outside the platform. By leveraging Snowpark notebooks, clinical researchers and data scientists can efficiently process ADaM and SDTM datasets, perform analytics, and build insights directly within Snowflake's secure, scalable environment.
The combination of Snowflake's native support for diverse data formats and Snowpark's Python capabilities provides a robust foundation for clinical data analysis, enabling teams to focus on deriving insights rather than managing complex data pipelines.