PERADIGM: Phenotype Embedding Similarity-based Rare Disease Gene Mapping

YCSGP/PERADIGM

PERADIGM

Phenotype Embedding Similarity-based Rare Disease Gene Mapping

This repository contains the R code supporting the analysis described in the paper:
PERADIGM: Phenotype Embedding Similarity-based Rare Disease Gene Mapping

Overview

PERADIGM is a framework that integrates phenotype embedding and patient similarity to identify rare disease-associated genes using large-scale biobank data. This repository includes code to replicate the key analyses and figures from the study.

Contents

  • main.R: Main script for the analysis, including:

    • Data loading and preprocessing
    • Running phenotype-gene association tests
    • Generating similarity matrices and embeddings
    • Outputting statistical results
  • function.R: Contains all helper functions for:

    • Embedding computation
    • Similarity scoring
    • Regression-based testing
    • Carrier/control selection
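As a rough illustration of the "Similarity scoring" step, the sketch below computes pairwise cosine similarity between embedding vectors. This README does not say which similarity measure function.R uses, so cosine similarity is an assumption made purely for illustration; the helper name `cosine_similarity` and the toy codes `code_a`, `code_b`, `code_c` are hypothetical and do not come from the repository.

```r
# Hypothetical helper (not taken from function.R): pairwise cosine
# similarity between the rows of a numeric embedding matrix.
cosine_similarity <- function(emb) {
  norms <- sqrt(rowSums(emb^2))  # Euclidean norm of each row
  unit <- emb / norms            # recycling divides row i by norms[i]
  unit %*% t(unit)               # dot products of unit-length rows
}

# Toy 2-dimensional embeddings; real phenotype embeddings would have
# many more dimensions.
emb <- rbind(
  code_a = c(1, 0),
  code_b = c(0, 1),
  code_c = c(1, 1)
)

sim <- cosine_similarity(emb)
print(round(sim, 3))
```

Rows with zero norm would produce `NaN` here; a real pipeline would need to decide how to handle them.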

📁 Repository Structure

Place your data files using the following directory structure:

data/
├── R_doc/
│   ├── hesin_diag_all_new.RData
│   ├── eid_all.RData
│   ├── cov_adjust.RData
│   └── IC_hesin_500k.csv
├── icd_related/
│   └── ICD10_mapping.csv
├── generate_all_gene_pos/
│   └── gene_info.RData
├── embedding/
│   └── hesin_icd10_descrip_embed.txt
└── hesin_diag.txt   # Optional/redundant diagnosis file

🔧 Getting Started

To reproduce the analysis:

  1. Ensure R and required packages are installed.
  2. Place the data files in the correct subfolders as shown above.
  3. Run main.R to initiate the pipeline.
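Before running main.R, it may help to confirm that the input files are where the directory tree above expects them. The following is a minimal sketch, assuming the `data/` folder sits in the current working directory; the helper `missing_data_files` is hypothetical and not part of function.R. The optional `hesin_diag.txt` file is deliberately left out of the check.

```r
# Expected input files, copied from the directory tree in this README.
expected_files <- c(
  "R_doc/hesin_diag_all_new.RData",
  "R_doc/eid_all.RData",
  "R_doc/cov_adjust.RData",
  "R_doc/IC_hesin_500k.csv",
  "icd_related/ICD10_mapping.csv",
  "generate_all_gene_pos/gene_info.RData",
  "embedding/hesin_icd10_descrip_embed.txt"
)

# Hypothetical helper: return the expected files that are absent
# under `data_dir`.
missing_data_files <- function(data_dir = "data") {
  paths <- file.path(data_dir, expected_files)
  paths[!file.exists(paths)]
}

missing <- missing_data_files("data")
if (length(missing) > 0) {
  message("Missing data files:\n", paste(" -", missing, collapse = "\n"))
} else {
  message("All expected data files found; main.R can be run.")
}
```

If nothing is reported missing, the pipeline can then be started from the repository root with `source("main.R")` or `Rscript main.R`.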