Install dependencies with uv:

```bash
uv venv --python 3.13
source .venv/bin/activate
uv pip install -r requirements.txt
```

Additionally, conda is required to run the ldsc step.
Optional (recommended): download the processed data (9.1G; feel free to download only a subset):
- S-LDSC reference files, originally from https://zenodo.org/records/10515792 but also packaged here for convenience.
- Models
- Traits
- Results (models x traits)
```bash
mkdir -p results
uv run hf download songlab/ldsc --repo-type dataset --local-dir results/
cd results
tar -xf output.tar
```

Take a look at `rule all` in workflow/Snakefile for example targets.
The first one will run S-LDSC on one model on one trait.
```bash
uv run snakemake --cores all --use-conda

# Snakemake sometimes gets confused about which files it needs to rerun;
# this forces it not to rerun any existing file
uv run snakemake --cores all --touch

# to output an execution plan
uv run snakemake --cores all --dry-run
```

LDSC jobs (e.g. running model X on trait Y) are by default parallelized as 1 job per core.
This works as long as you have enough memory (e.g. when asking for a complete savio3_htc or savio4_htc node).
When running on a node with less memory, you should reduce the parallelization of the ld_score and run_ldsc_annot rules in workflow/rules/ldsc.smk.
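One crude way to pick a lower `--cores` value is from the node's available memory. The helper below is my own illustration, not part of the workflow, and the per-job memory figure is an assumption you should measure for your node:

```python
# Hypothetical helper: choose how many LDSC jobs to run in parallel given
# node memory. The per-job footprint (e.g. ~8 GB) is an assumed number,
# not measured from this workflow.
def max_parallel_jobs(node_mem_gb: float, per_job_gb: float, cores: int) -> int:
    # Never exceed the core count, and always allow at least one job.
    return max(1, min(cores, int(node_mem_gb // per_job_gb)))

# e.g. a 96 GB node with 32 cores, assuming ~8 GB per job
print(max_parallel_jobs(96, 8, 32))  # -> 12
```

You could then pass the result explicitly, e.g. `uv run snakemake --cores 12 --use-conda`, instead of `--cores all`.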
To add a new model {model}, place a parquet file with a column `score` in either of these locations:
- results/variant_scores/{model}.parquet: variants should be in the standard S-LDSC order (e.g. see the 9,997,231 variants in https://huggingface.co/datasets/songlab/ldsc)
- results/features/{model}.parquet: this corresponds to the variants above with `pos != -1`. Variants with `pos == -1` are a tiny fraction that we were not able to liftover from hg19 to hg38. This complexity arises because S-LDSC uses hg19, but we do most of our work with newer annotations in hg38.
To add a new trait {trait}, place it under results/sumstats_107/{trait}.sumstats.gz (please check the format of the other traits).
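As a sketch of the file layout only: LDSC sumstats files are typically gzipped, tab-separated tables with columns such as SNP, A1, A2, Z, and N. The trait name, rows, and exact column order below are illustrative assumptions; copy the header of an existing file under results/sumstats_107/ before relying on this:

```python
import csv
import gzip
from pathlib import Path

# Hypothetical sketch: write a toy {trait}.sumstats.gz in the usual LDSC
# munged-sumstats layout (tab-separated, gzipped). Column order and the
# trait name "my_trait" are assumptions; check an existing trait file.
Path("results/sumstats_107").mkdir(parents=True, exist_ok=True)

rows = [
    ("rs123", "A", "G", "1.3", "50000"),
    ("rs456", "C", "T", "-0.7", "50000"),
]
with gzip.open("results/sumstats_107/my_trait.sumstats.gz", "wt", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["SNP", "A1", "A2", "Z", "N"])
    writer.writerows(rows)
```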