Snowflake-Labs/sfguide-getting-started-with-adam-and-sdtm-in-snowpark

Snowflake - Certified

Using ADaM/SDTM Data in Snowpark

Description

Clinical data is often handled in specific formats such as ADaM (Analysis Data Model), which includes standard datasets such as ADSL for subject-level data, ADAE for adverse events, ADVS for vital signs, etc. These are derived from the SDTM (Study Data Tabulation Model), which organizes clinical trial data into a standardized structure. These are commonly stored in SAS, XPT (SAS Transport), or standardized JSON-format files.

Reading these formats directly in a Snowpark Notebook is useful because it allows users to process clinical trial data without moving it out of Snowflake. This demo contains notebooks that show how to read clinical datasets in a Snowpark notebook, write the data to Snowflake tables, and query the data from a view that combines various clinical attributes.
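As a rough illustration of the idea, each of the three formats maps to a pandas reader that works inside a Snowpark notebook. The helper below is a hypothetical sketch (the function name is illustrative, not from the repository); note that the JSON branch assumes flat record-array files, while CDISC Dataset-JSON files need extra unpacking.

```python
import io
import json

import pandas as pd


def read_clinical_file(buf: io.BytesIO, filename: str) -> pd.DataFrame:
    """Read a clinical dataset from an in-memory buffer based on its extension.

    pandas.read_sas handles both SAS binary (.sas7bdat) and SAS Transport
    (.xpt) files via its `format` argument.
    """
    name = filename.lower()
    if name.endswith(".sas7bdat"):
        return pd.read_sas(buf, format="sas7bdat")
    if name.endswith(".xpt"):
        return pd.read_sas(buf, format="xport")
    if name.endswith(".json"):
        # Assumes a flat list of records, e.g. [{"USUBJID": ..., "AGE": ...}]
        return pd.DataFrame(json.load(buf))
    raise ValueError(f"Unsupported clinical data format: {filename}")
```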

Repository Structure

├── README.md                   # This file
├── LEGAL.md                    # Legal notice
├── LICENSE                     # License information
├── notebooks/                  # Snowpark notebooks and environment config
│   ├── READ_SAS_FILES.ipynb    # notebook to read SAS files
│   ├── READ_XPT_FILES.ipynb    # notebook to read XPT files
│   ├── environment.yml         # environment config
├── scripts/                    # SQL and configuration files
│   ├── setup.sql               # Database setup script
│   ├── teardown.sql            # Cleanup script

📊 Data Sources

This demo works with ADaM and SDTM clinical trial datasets from external repositories:

  • SAS Format (.sas7bdat): r4csr ADaM datasets

    • Provides standardized ADaM datasets in SAS binary format
    • Includes datasets like ADSL (Subject Level), ADAE (Adverse Events), ADVS (Vital Signs), etc.
  • XPT/JSON Format (.xpt, .json): CDISC Pilot Project

    • Official CDISC pilot submission package
    • Contains datasets in both SAS Transport (XPT) and JSON formats
    • Follows CDISC standards for clinical data interchange
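If the CDISC pilot JSON files follow the Dataset-JSON 1.0 layout (column metadata under `items`, row values under `itemData`, nested below `clinicalData` → `itemGroupData`), they can be flattened into a DataFrame. This is a hedged sketch under that assumption; the helper name is illustrative, and the actual files in the pilot package may differ.

```python
import pandas as pd


def dataset_json_to_df(payload: dict) -> pd.DataFrame:
    """Flatten a Dataset-JSON 1.0 payload into a pandas DataFrame.

    Assumes one item group per file, with column names in `items` and
    row values as lists in `itemData`.
    """
    group = next(iter(payload["clinicalData"]["itemGroupData"].values()))
    columns = [item["name"] for item in group["items"]]
    return pd.DataFrame(group["itemData"], columns=columns)
```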

📋 Getting Started

Prerequisites

  • Snowflake Account - Active account with appropriate privileges
  • SQL Knowledge - Basic understanding of SQL syntax and concepts
  • Python Knowledge - Basic understanding of Python syntax and concepts

Setup Instructions

Follow these three steps to get the demo running in your Snowflake environment:

Step 1: Run the Setup Script

  1. Open Snowsight - Navigate to your Snowflake account via Snowsight (https://app.snowflake.com)
  2. Create a new SQL worksheet:
    • Click + Create → SQL Worksheet in the top navigation
  3. Execute the setup script:
    • Copy the contents of scripts/setup.sql into your worksheet
    • Run the entire script by clicking Run All or using Ctrl+Shift+Enter
    • This creates all necessary objects: database, schema, warehouse, role, and stage
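The statements in scripts/setup.sql are the source of truth; as a hypothetical sketch, the script runs statements along these lines (object names taken from this README; the role creation and grants are omitted here because the role name is not shown):

```python
# Illustrative statements mirroring what scripts/setup.sql creates.
SETUP_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS ADAM_SDTM_SNOWPARK_DB",
    "CREATE SCHEMA IF NOT EXISTS ADAM_SDTM_SNOWPARK_DB.ANALYTICS",
    "CREATE WAREHOUSE IF NOT EXISTS ADAM_SDTM_SNOWPARK_WH WITH WAREHOUSE_SIZE = 'XSMALL'",
    "CREATE STAGE IF NOT EXISTS ADAM_SDTM_SNOWPARK_DB.ANALYTICS.ADAM_STAGE",
]


def run_setup(session) -> None:
    """Run the setup statements through a Snowpark session (requires credentials)."""
    for stmt in SETUP_STATEMENTS:
        session.sql(stmt).collect()
```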

Step 2: Download and Upload Data to the Stage

  1. Download data files from external sources:

    • Get the files from the repositories listed under 📊 Data Sources above: r4csr for the .sas7bdat files, and the CDISC Pilot Project for the .xpt and .json files

  2. Navigate to the Database:

    • In Snowsight, go to Data → Databases → ADAM_SDTM_SNOWPARK_DB → ANALYTICS
  3. Access the Stage:

    • Click on Stages in the left sidebar
    • Select ADAM_STAGE from the list
  4. Upload your data files:

    • Click + Files or Upload Files
    • Upload your clinical data files in the following structure:
      json_datasets/
      ├── adsl.json
      ├── adae.json
      ├── advs.json
      ├── adlbc.json
      ├── adlbh.json
      ├── adlbhy.json
      ├── adqsadas.json
      ├── adqscibc.json
      ├── adqsnpix.json
      └── adtte.json
      
      sas_datasets/
      ├── adsl.sas7bdat
      ├── adae.sas7bdat
      ├── advs.sas7bdat
      ├── adlbc.sas7bdat
      ├── adlbh.sas7bdat
      ├── adlbhy.sas7bdat
      ├── adqsadas.sas7bdat
      ├── adqscibc.sas7bdat
      ├── adqsnpix.sas7bdat
      └── adtte.sas7bdat
      
      xpt_datasets/
      ├── adsl.xpt
      ├── adae.xpt
      ├── advs.xpt
      ├── adlbc.xpt
      ├── adlbh.xpt
      ├── adlbhy.xpt
      ├── adqsadas.xpt
      ├── adqscibc.xpt
      ├── adqsnpix.xpt
      └── adtte.xpt
      
    • Organize files into respective folders (json_datasets, sas_datasets, xpt_datasets)
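As an alternative to uploading through the Snowsight UI, the same layout could be produced from a Snowpark session with `session.file.put`. This is a sketch under the assumption that the stage is named ADAM_STAGE as above; the helper names are illustrative.

```python
from pathlib import Path

# Stage folder for each file extension, matching the layout above.
STAGE_FOLDERS = {
    ".json": "json_datasets",
    ".sas7bdat": "sas_datasets",
    ".xpt": "xpt_datasets",
}


def stage_target(filename: str, stage: str = "@ADAM_STAGE") -> str:
    """Return the stage folder a local clinical data file belongs in."""
    suffix = Path(filename).suffix.lower()
    if suffix not in STAGE_FOLDERS:
        raise ValueError(f"Unexpected file type: {filename}")
    return f"{stage}/{STAGE_FOLDERS[suffix]}"


def upload_all(session, local_dir: str) -> None:
    """Upload every recognized data file in local_dir (requires a session)."""
    for path in Path(local_dir).iterdir():
        if path.suffix.lower() in STAGE_FOLDERS:
            # auto_compress=False keeps the original file names on the stage
            session.file.put(str(path), stage_target(path.name), auto_compress=False)
```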

Step 3: Upload and Run Notebooks

  1. Navigate to Notebooks:
    • In Snowsight, go to Projects → Notebooks in the left navigation
  2. Import the first notebook:
    • Click + Notebook → Import from file
    • Select and upload notebooks/READ_SAS_FILES.ipynb
    • Set the notebook properties:
      • Database: ADAM_SDTM_SNOWPARK_DB
      • Schema: ANALYTICS
      • Warehouse: ADAM_SDTM_SNOWPARK_WH
  3. Import the second notebook:
    • Repeat the import steps with notebooks/READ_XPT_FILES.ipynb, using the same database, schema, and warehouse settings
  4. Configure the environment:
    • In each notebook, ensure the environment is set up with the packages from notebooks/environment.yml
    • The notebooks will automatically install required packages such as pandas and the Snowpark Python library
  5. Run the notebooks:
    • Execute each cell sequentially using Shift+Enter
    • Or run all cells using Run All from the notebook toolbar
    • The notebooks will read the data from the stage and create tables in Snowflake
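The stage-to-table flow the notebooks perform could be sketched roughly as below, using the Snowpark `session.file.get_stream` and `session.write_pandas` APIs. This is a minimal sketch, not the notebooks' actual code; the helper names are illustrative.

```python
import pandas as pd


def table_name_for(stage_path: str) -> str:
    """Derive a table name from a staged file path, e.g. .../adsl.sas7bdat -> ADSL."""
    return stage_path.rsplit("/", 1)[-1].split(".", 1)[0].upper()


def load_sas_file(session, stage_path: str) -> None:
    """Read one staged SAS/XPT file and save it as a Snowflake table (needs a session)."""
    fmt = "xport" if stage_path.endswith(".xpt") else "sas7bdat"
    with session.file.get_stream(stage_path) as stream:
        df = pd.read_sas(stream, format=fmt)
    # auto_create_table infers the table schema from the DataFrame
    session.write_pandas(df, table_name_for(stage_path),
                         auto_create_table=True, overwrite=True)
```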

Verification

After completing all steps, verify your setup:

  • Check that tables were created: Data → Databases → ADAM_SDTM_SNOWPARK_DB → ANALYTICS → Tables
  • Confirm data was loaded by querying the created tables
  • Review any views that were created for combined clinical data analysis

Conclusion

This demo showcases how Snowflake's powerful data platform can seamlessly handle clinical trial data in multiple formats (SAS, XPT, JSON) without requiring data movement outside the platform. By leveraging Snowpark notebooks, clinical researchers and data scientists can efficiently process ADaM and SDTM datasets, perform analytics, and build insights directly within Snowflake's secure, scalable environment.

The combination of Snowflake's native support for diverse data formats and Snowpark's Python capabilities provides a robust foundation for clinical data analysis, enabling teams to focus on deriving insights rather than managing complex data pipelines.
