Snowflake-Labs/sfguide-getting-started-with-adam-and-sdtm-in-snowpark

Snowflake - Certified

Using ADaM/SDTM Data in Snowpark

Description

Clinical data is often handled in specific formats such as ADaM (Analysis Data Model), which includes standard datasets such as ADSL for subject-level data, ADAE for adverse events, ADVS for vital signs, etc. These are derived from the SDTM (Study Data Tabulation Model), which organizes clinical trial data into a standardized structure. These are commonly stored in SAS, XPT (SAS Transport), or standardized JSON-format files.

Reading these formats directly in a Snowpark Notebook is useful because it allows users to process clinical trial data without moving it out of Snowflake. This demo contains notebooks that show how to read clinical datasets in a Snowpark notebook, write the data to Snowflake tables, and query the data from a view that combines various clinical attributes.
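As a rough illustration of the idea, each of the three formats maps to a pandas reader that works inside a Snowpark notebook. The helper below is a hypothetical sketch (the function name is illustrative, not from the repository); note that the JSON branch assumes flat record-array files, while CDISC Dataset-JSON files need extra unpacking.

```python
import io
import json

import pandas as pd


def read_clinical_file(buf: io.BytesIO, filename: str) -> pd.DataFrame:
    """Read a clinical dataset from an in-memory buffer based on its extension.

    pandas.read_sas handles both SAS binary (.sas7bdat) and SAS Transport
    (.xpt) files via its `format` argument.
    """
    name = filename.lower()
    if name.endswith(".sas7bdat"):
        return pd.read_sas(buf, format="sas7bdat")
    if name.endswith(".xpt"):
        return pd.read_sas(buf, format="xport")
    if name.endswith(".json"):
        # Assumes a flat list of records, e.g. [{"USUBJID": ..., "AGE": ...}]
        return pd.DataFrame(json.load(buf))
    raise ValueError(f"Unsupported clinical data format: {filename}")
```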

Repository Structure

├── README.md                   # This file
├── LEGAL.md                    # Legal notice
├── LICENSE                     # License information
├── notebooks/                  # Snowpark notebooks and environment config
│   ├── READ_SAS_FILES.ipynb    # notebook to read SAS files
│   ├── READ_XPT_FILES.ipynb    # notebook to read XPT files
│   ├── environment.yml         # environment config
├── scripts/                    # SQL and configuration files
│   ├── setup.sql               # Database setup script
│   ├── teardown.sql            # Cleanup script

📊 Data Sources

This demo works with ADaM and SDTM clinical trial datasets from external repositories:

  • SAS Format (.sas7bdat): r4csr ADaM datasets

    • Provides standardized ADaM datasets in SAS binary format
    • Includes datasets like ADSL (Subject Level), ADAE (Adverse Events), ADVS (Vital Signs), etc.
  • XPT/JSON Format (.xpt, .json): CDISC Pilot Project

    • Official CDISC pilot submission package
    • Contains datasets in both SAS Transport (XPT) and JSON formats
    • Follows CDISC standards for clinical data interchange
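If the CDISC pilot JSON files follow the Dataset-JSON 1.0 layout (column metadata under `items`, row values under `itemData`, nested below `clinicalData` → `itemGroupData`), they can be flattened into a DataFrame. This is a hedged sketch under that assumption; the helper name is illustrative, and the actual files in the pilot package may differ.

```python
import pandas as pd


def dataset_json_to_df(payload: dict) -> pd.DataFrame:
    """Flatten a Dataset-JSON 1.0 payload into a pandas DataFrame.

    Assumes one item group per file, with column names in `items` and
    row values as lists in `itemData`.
    """
    group = next(iter(payload["clinicalData"]["itemGroupData"].values()))
    columns = [item["name"] for item in group["items"]]
    return pd.DataFrame(group["itemData"], columns=columns)
```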

📋 Getting Started

Prerequisites

  • Snowflake Account - Active account with appropriate privileges
  • SQL Knowledge - Basic understanding of SQL syntax and concepts
  • Python Knowledge - Basic understanding of Python syntax and concepts

Setup Instructions

Follow these three steps to get the demo running in your Snowflake environment:

Step 1: Run the Setup Script

  1. Open Snowsight - Navigate to your Snowflake account via Snowsight (https://app.snowflake.com)
  2. Create a new SQL worksheet:
    • Click + Create → SQL Worksheet in the top navigation
  3. Execute the setup script:
    • Copy the contents of scripts/setup.sql into your worksheet
    • Run the entire script by clicking Run All or using Ctrl+Shift+Enter
    • This creates all necessary objects: database, schema, warehouse, role, and stage
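The statements in scripts/setup.sql are the source of truth; as a hypothetical sketch, the script runs statements along these lines (object names taken from this README; the role creation and grants are omitted here because the role name is not shown):

```python
# Illustrative statements mirroring what scripts/setup.sql creates.
SETUP_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS ADAM_SDTM_SNOWPARK_DB",
    "CREATE SCHEMA IF NOT EXISTS ADAM_SDTM_SNOWPARK_DB.ANALYTICS",
    "CREATE WAREHOUSE IF NOT EXISTS ADAM_SDTM_SNOWPARK_WH WITH WAREHOUSE_SIZE = 'XSMALL'",
    "CREATE STAGE IF NOT EXISTS ADAM_SDTM_SNOWPARK_DB.ANALYTICS.ADAM_STAGE",
]


def run_setup(session) -> None:
    """Run the setup statements through a Snowpark session (requires credentials)."""
    for stmt in SETUP_STATEMENTS:
        session.sql(stmt).collect()
```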

Step 2: Download and Upload Data to the Stage

  1. Download data files from external sources:

    • Get the files from the repositories listed under 📊 Data Sources above: r4csr for the .sas7bdat files, and the CDISC Pilot Project for the .xpt and .json files

  2. Navigate to the Database:

    • In Snowsight, go to Data → Databases → ADAM_SDTM_SNOWPARK_DB → ANALYTICS
  3. Access the Stage:

    • Click on Stages in the left sidebar
    • Select ADAM_STAGE from the list
  4. Upload your data files:

    • Click + Files or Upload Files
    • Upload your clinical data files in the following structure:
      json_datasets/
      ├── adsl.json
      ├── adae.json
      ├── advs.json
      ├── adlbc.json
      ├── adlbh.json
      ├── adlbhy.json
      ├── adqsadas.json
      ├── adqscibc.json
      ├── adqsnpix.json
      └── adtte.json
      
      sas_datasets/
      ├── adsl.sas7bdat
      ├── adae.sas7bdat
      ├── advs.sas7bdat
      ├── adlbc.sas7bdat
      ├── adlbh.sas7bdat
      ├── adlbhy.sas7bdat
      ├── adqsadas.sas7bdat
      ├── adqscibc.sas7bdat
      ├── adqsnpix.sas7bdat
      └── adtte.sas7bdat
      
      xpt_datasets/
      ├── adsl.xpt
      ├── adae.xpt
      ├── advs.xpt
      ├── adlbc.xpt
      ├── adlbh.xpt
      ├── adlbhy.xpt
      ├── adqsadas.xpt
      ├── adqscibc.xpt
      ├── adqsnpix.xpt
      └── adtte.xpt
      
    • Organize files into respective folders (json_datasets, sas_datasets, xpt_datasets)
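As an alternative to uploading through the Snowsight UI, the same layout could be produced from a Snowpark session with `session.file.put`. This is a sketch under the assumption that the stage is named ADAM_STAGE as above; the helper names are illustrative.

```python
from pathlib import Path

# Stage folder for each file extension, matching the layout above.
STAGE_FOLDERS = {
    ".json": "json_datasets",
    ".sas7bdat": "sas_datasets",
    ".xpt": "xpt_datasets",
}


def stage_target(filename: str, stage: str = "@ADAM_STAGE") -> str:
    """Return the stage folder a local clinical data file belongs in."""
    suffix = Path(filename).suffix.lower()
    if suffix not in STAGE_FOLDERS:
        raise ValueError(f"Unexpected file type: {filename}")
    return f"{stage}/{STAGE_FOLDERS[suffix]}"


def upload_all(session, local_dir: str) -> None:
    """Upload every recognized data file in local_dir (requires a session)."""
    for path in Path(local_dir).iterdir():
        if path.suffix.lower() in STAGE_FOLDERS:
            # auto_compress=False keeps the original file names on the stage
            session.file.put(str(path), stage_target(path.name), auto_compress=False)
```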

Step 3: Upload and Run Notebooks

  1. Navigate to Notebooks:
    • In Snowsight, go to Projects → Notebooks in the left navigation
  2. Import the first notebook:
    • Click + Notebook → Import from file
    • Select and upload notebooks/READ_SAS_FILES.ipynb
    • Set the notebook properties:
      • Database: ADAM_SDTM_SNOWPARK_DB
      • Schema: ANALYTICS
      • Warehouse: ADAM_SDTM_SNOWPARK_WH
  3. Import the second notebook:
    • Repeat the import steps with notebooks/READ_XPT_FILES.ipynb, using the same database, schema, and warehouse settings
  4. Configure the environment:
    • In each notebook, ensure the environment is set up with the packages from notebooks/environment.yml
    • The notebooks will automatically install required packages such as pandas and the Snowpark Python library
  5. Run the notebooks:
    • Execute each cell sequentially using Shift+Enter
    • Or run all cells using Run All from the notebook toolbar
    • The notebooks will read the data from the stage and create tables in Snowflake
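The stage-to-table flow the notebooks perform could be sketched roughly as below, using the Snowpark `session.file.get_stream` and `session.write_pandas` APIs. This is a minimal sketch, not the notebooks' actual code; the helper names are illustrative.

```python
import pandas as pd


def table_name_for(stage_path: str) -> str:
    """Derive a table name from a staged file path, e.g. .../adsl.sas7bdat -> ADSL."""
    return stage_path.rsplit("/", 1)[-1].split(".", 1)[0].upper()


def load_sas_file(session, stage_path: str) -> None:
    """Read one staged SAS/XPT file and save it as a Snowflake table (needs a session)."""
    fmt = "xport" if stage_path.endswith(".xpt") else "sas7bdat"
    with session.file.get_stream(stage_path) as stream:
        df = pd.read_sas(stream, format=fmt)
    # auto_create_table infers the table schema from the DataFrame
    session.write_pandas(df, table_name_for(stage_path),
                         auto_create_table=True, overwrite=True)
```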

Verification

After completing all steps, verify your setup:

  • Check that tables were created: Data → Databases → ADAM_SDTM_SNOWPARK_DB → ANALYTICS → Tables
  • Confirm data was loaded by querying the created tables
  • Review any views that were created for combined clinical data analysis

Conclusion

This demo showcases how Snowflake's powerful data platform can seamlessly handle clinical trial data in multiple formats (SAS, XPT, JSON) without requiring data movement outside the platform. By leveraging Snowpark notebooks, clinical researchers and data scientists can efficiently process ADaM and SDTM datasets, perform analytics, and build insights directly within Snowflake's secure, scalable environment.

The combination of Snowflake's native support for diverse data formats and Snowpark's Python capabilities provides a robust foundation for clinical data analysis, enabling teams to focus on deriving insights rather than managing complex data pipelines.
