---
license: apache-2.0
language:
  - en
library_name: geno-lewm
base_model:
  - HuggingFaceBio/Carbon-500M
datasets:
  - abdelstark/geno-lewm-data
tags:
  - genomics
  - bioinformatics
  - variant-effect-prediction
  - world-model
  - carbon-500m
  - research
---

# GenoLeWM model package

<p>
  <a href="https://huggingface.co/spaces/abdelstark/geno-lewm"><img alt="Space" src="https://img.shields.io/badge/Space-GenoLeWM-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000000"></a>
  <a href="https://huggingface.co/abdelstark/geno-lewm"><img alt="Model" src="https://img.shields.io/badge/Checkpoint-abdelstark%2Fgeno--lewm-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000000"></a>
  <a href="https://huggingface.co/abdelstark/geno-lewm-runs/tree/main/geno-lewm-v021-strong-4f36eef-10k-r1"><img alt="Run tree" src="https://img.shields.io/badge/Run%20Tree-v0.2.1-0B7285?style=for-the-badge&logo=huggingface&logoColor=ffffff"></a>
  <a href="https://github.com/AbdelStark/GenoLeWM"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-GenoLeWM-181717?style=for-the-badge&logo=github&logoColor=ffffff"></a>
</p>

GenoLeWM is an alpha research project for action-conditioned latent world
models over genomic edits. This repository contains the public v0.1 model
package: the trainable GenoLeWM predictor/action-encoder artifacts, calibration
file, training evidence, evaluation evidence, and checksums.

This is not a standard `transformers.AutoModel.from_pretrained()` package. The
checkpoint is loaded by the `geno-lewm` runtime. Carbon-500M is a frozen state
encoder dependency and is not bundled in this repository.

## Claim Boundary

Use this checkpoint as a research artifact for reproducible local scoring,
artifact inspection, and method development. Do not use it for clinical
diagnosis, clinical decision support, deployment readiness claims, privacy
claims, or broad claims that GenoLeWM outperforms Carbon. The measured results
below are narrow artifact-level evaluations.

## Published Artifacts

| Artifact | Location | Notes |
| --- | --- | --- |
| v0.1 release checkpoint | this repository | Stable public package `geno-lewm-v0.1.0-r1` |
| Generated package model card | [`model_card.md`](https://huggingface.co/abdelstark/geno-lewm/blob/main/model_card.md) | Checksum-bound output from `tools.release.model_package` |
| Training evidence | [`training_run_manifest.json`](https://huggingface.co/abdelstark/geno-lewm/blob/main/training_run_manifest.json), [`training_run_card.md`](https://huggingface.co/abdelstark/geno-lewm/blob/main/training_run_card.md), [`training_run_SHA256SUMS`](https://huggingface.co/abdelstark/geno-lewm/blob/main/training_run_SHA256SUMS) | Carbon-backed training run evidence |
| Evaluation evidence | [`eval_metrics.json`](https://huggingface.co/abdelstark/geno-lewm/blob/main/eval_metrics.json), [`eval_report.md`](https://huggingface.co/abdelstark/geno-lewm/blob/main/eval_report.md), [`eval_config.effective.yaml`](https://huggingface.co/abdelstark/geno-lewm/blob/main/eval_config.effective.yaml) | Held-out chr21 ClinVar evaluation |
| Efficiency evidence | [`efficiency_report.json`](https://huggingface.co/abdelstark/geno-lewm/blob/main/efficiency_report.json) | Release efficiency measurement |
| Integrity manifest | [`SHA256SUMS`](https://huggingface.co/abdelstark/geno-lewm/blob/main/SHA256SUMS) | Package file hashes |
| Interactive Space | [`abdelstark/geno-lewm`](https://huggingface.co/spaces/abdelstark/geno-lewm) | Artifact browser and checkpoint-backed scoring UI |
| Dataset package | [`abdelstark/geno-lewm-data`](https://huggingface.co/datasets/abdelstark/geno-lewm-data) | Public data snapshot and data card |
| v0.2.1 run tree | [`abdelstark/geno-lewm-runs`](https://huggingface.co/abdelstark/geno-lewm-runs/tree/main/geno-lewm-v021-strong-4f36eef-10k-r1) | Newer benchmark/demo checkpoint and result artifacts |

The generated `model_card.md` in this repository is intentionally terse because
it is part of the checksum-bound release package. This top-level card is the
human-facing Hugging Face model documentation.

## Model Identity

| Field | Value |
| --- | --- |
| Release id | `geno-lewm-v0.1.0-r1` |
| Model version | `0.1.0` |
| Manifest id | `sha256:861ec142cc87f3fac01751ef538553356dfba439e6da99064b4adb121e75c215` |
| Predictor artifact | `predictor.safetensors` |
| Predictor hash | `sha256:6642c604a1352727969c86664f291fd6d2193c1c65bc6f9baf9b716469c52731` |
| Action encoder hash | `sha256:8b2311d768855ab440b26dbbef5ddbda252cc8bb2c69509d28fa4bcf8eff025a` |
| Calibration hash | `sha256:d4cf4778ac8e5557d363aca43cd13723b0ed9983b83215ab164d2b642b886201` |
| Frozen encoder | Carbon-500M, mounted as `/carbon` in release jobs |
| Encoder revision | `5d31d59b3c845b288a13aedb1358934196852eec` |
| Dataset snapshot | `geno-lewm-data-v0.1.0-r1` |

The newer Space default checkpoint is separate:
`geno-lewm-v0.2.1-r1` in
[`geno-lewm-v021-strong-4f36eef-10k-r1/suite/model`](https://huggingface.co/abdelstark/geno-lewm-runs/tree/main/geno-lewm-v021-strong-4f36eef-10k-r1/suite/model).
It is published as run-tree evidence, not as a replacement for this stable v0.1
model package.
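The hashes in the table above can be checked locally after download. The following is a minimal sketch and is not part of the `geno-lewm` runtime: `sha256_of_file` and `verify_digest` are hypothetical helper names, and the sketch assumes each recorded value is the SHA-256 of the raw file bytes, written in the `sha256:<hex>` form used in the table.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_digest(path, expected):
    """Compare a file against a 'sha256:<hex>' (or bare hex) digest string."""
    # Assumption: the recorded value is the hash of the raw file bytes.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return sha256_of_file(path) == expected_hex
```

For the predictor, `path` would be `predictor.safetensors` inside the downloaded package and `expected` the predictor hash from the table. A `False` result means the local file differs from the recorded artifact.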

## Training Summary

The v0.1 checkpoint was trained as a JEPA-style predictor over frozen
Carbon-500M latent states.

| Field | Value |
| --- | --- |
| Run id | `first-snv-carbon-500m-r1` |
| Config | `training_config.effective.yaml` |
| Commit | `cd2bfccb33ec5a2df3c4707e8be8443f4682dad3` |
| Samples | 160,000 |
| Steps | 20,000 |
| Final training loss | 0.36124 |
| Status | completed |

## v0.1 Evaluation

Evaluation uses held-out ClinVar GRCh38 chr21 variants with binary labels:
pathogenic/likely pathogenic (P/LP) versus benign/likely benign (B/LB). Scores
use `sigma_raw`; intervals are deterministic stratified bootstrap confidence
intervals from `eval_metrics.json`.

| Split | N | Positives | Negatives | Metric | Value | 95% CI |
| --- | ---: | ---: | ---: | --- | ---: | --- |
| `eval_clinvar_chr21` | 3,000 | 494 | 2,506 | AUROC | 0.519160 | 0.491366 to 0.546846 |
| `eval_clinvar_chr21` | 3,000 | 494 | 2,506 | Average precision | 0.165174 | 0.155331 to 0.177035 |
| `eval_clinvar_chr21` | 3,000 | 494 | 2,506 | Balanced accuracy at 0.5 | 0.500000 | 0.500000 to 0.500000 |
| `eval_clinvar_chr21` | 3,000 | 494 | 2,506 | Accuracy at 0.5 | 0.164667 | 0.164667 to 0.164667 |

Negative finding: this v0.1 slice does not establish useful clinical
performance, non-coding performance, multi-edit behavior, or superiority over
Carbon.
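The two threshold metrics in the table are easier to interpret side by side. As a worked check (plain arithmetic on the reported counts, not output from the evaluation code), the sketch below shows that an accuracy of 0.164667 equals the positive prevalence 494/3,000, and that a hypothetical classifier labeling every variant positive at the 0.5 threshold would produce exactly that accuracy together with a balanced accuracy of 0.5. This is consistent with, but does not prove, all scores falling on one side of the threshold; the per-variant scores are not reproduced here.

```python
positives, negatives = 494, 2506
total = positives + negatives

# Hypothetical degenerate classifier: every variant is predicted positive.
true_positives, false_positives = positives, negatives
true_negatives, false_negatives = 0, 0

accuracy = (true_positives + true_negatives) / total
sensitivity = true_positives / positives
specificity = true_negatives / negatives
balanced_accuracy = (sensitivity + specificity) / 2

print(f"prevalence        = {positives / total:.6f}")
print(f"accuracy          = {accuracy:.6f}")
print(f"balanced accuracy = {balanced_accuracy:.6f}")
```

The reported average precision of 0.165174 is likewise close to the prevalence, which is roughly the value an uninformative ranking would produce.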

## v0.1 Efficiency

Measured by `tools.release.efficiency_report` on `cuda:NVIDIA H200`.

| Measurement | Value |
| --- | ---: |
| Single-variant latency | 494.056 ms |
| Batched throughput | 2.024 variants/s |
| Peak memory | 1,152,656,384 bytes |

## v0.2.1 Run-Tree Benchmark Evidence

The Space also exposes the newer `geno-lewm-v0.2.1-r1` checkpoint from the run
tree. Its benchmark suite is broader than v0.1 and includes Carbon zero-shot
comparisons, but the results are mixed and mostly negative relative to Carbon on
the measured slices.

| Slice | N | Metric | GenoLeWM | Baseline | Delta |
| --- | ---: | --- | ---: | ---: | ---: |
| ClinVar coding | 16 | AUROC | 0.734375 | 0.921875 | -0.187500 |
| ClinVar coding | 16 | Average precision | 0.852976 | 0.951923 | -0.098947 |
| ClinVar coding | 16 | Balanced accuracy | 0.750000 | 0.687500 | +0.062500 |
| ClinVar non-coding | 16 | AUROC | 0.562500 | 0.875000 | -0.312500 |
| ClinVar non-coding | 16 | Average precision | 0.605456 | 0.914423 | -0.308967 |
| ClinVar non-coding | 16 | Balanced accuracy | 0.437500 | 0.687500 | -0.250000 |
| BRCA2 saturation | 32 | Spearman rho | 0.149194 | 0.476906 | -0.327713 |
| TraitGym Mendelian | 32 | Spearman rho | -0.027965 | -0.083894 | +0.055929 |
| Phased-haplotype rollout | 8 | Cosine mean | 0.288861 | 0.997831 | -0.708970 |
| Synthetic edit-chain rollout | 8 | Cosine mean | 0.301608 | 0.991240 | -0.689631 |

The v0.2.1 readiness report is `ok=true` for artifact coverage and provenance.
That is not a model-quality success claim. The rollout speed report is
`ok=false`: k=5 measured a 2.41x speedup against a 2x target and met it, while
k=20 measured 2.47x against a 5x target and missed it.
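As a reading aid, the sketch below recomputes two things from the numbers reported above: the Delta column for a few rows, taken as GenoLeWM minus Baseline, and whether each rollout speedup meets its target. It assumes the speed report's `ok` flag requires every target to be met, which matches the reported values but is not confirmed by the report schema. The variable names are illustrative.

```python
# (slice, metric, GenoLeWM, baseline) copied from the benchmark table.
rows = [
    ("ClinVar coding", "AUROC", 0.734375, 0.921875),
    ("ClinVar coding", "Balanced accuracy", 0.750000, 0.687500),
    ("TraitGym Mendelian", "Spearman rho", -0.027965, -0.083894),
    ("Phased-haplotype rollout", "Cosine mean", 0.288861, 0.997831),
]
# Positive delta means GenoLeWM scored higher than the baseline.
deltas = {(s, m): round(g - b, 6) for s, m, g, b in rows}

# k (as named in the report) -> (measured speedup, target speedup).
speed = {5: (2.41, 2.0), 20: (2.47, 5.0)}
met = {k: measured >= target for k, (measured, target) in speed.items()}
# Assumption: the report is ok only if every target is met.
speed_ok = all(met.values())

for key, delta in deltas.items():
    print(key, f"{delta:+.6f}")
print(met, speed_ok)
```

Under that assumption, the k=20 miss alone is enough to make the speed report fail even though k=5 exceeds its target.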

The v0.2.1 efficiency report measured one sample with no warmup on
`cuda:NVIDIA H200`: 115,262.94 ms single-variant latency, 0.3095 variants/s
throughput, and 1,966,149,632 bytes peak memory. Treat that as run evidence, not
a production serving benchmark.

## Loading Artifacts

Install the package:

```bash
python -m pip install "geno-lewm[train,eval]==0.2.1"
```

Download the v0.1 model package:

```python
from huggingface_hub import snapshot_download

# Fetch the full v0.1 package into the local Hub cache and return its path.
model_dir = snapshot_download("abdelstark/geno-lewm")
```

Download the v0.2.1 run-tree model artifacts:

```python
from huggingface_hub import snapshot_download

# Restrict the download to the v0.2.1 model directory. run_dir is the snapshot
# root; the model files sit under the allow_patterns path inside it.
run_dir = snapshot_download(
    "abdelstark/geno-lewm-runs",
    allow_patterns="geno-lewm-v021-strong-4f36eef-10k-r1/suite/model/*",
)
```
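After the v0.1 download, it can help to confirm that the expected files are present before invoking the runtime. This is a minimal sketch: `missing_artifacts` is a hypothetical helper, and the file list is taken from the Published Artifacts and Model Identity tables, assuming those files sit at the repository root as their links suggest. The action encoder and calibration file names are not stated in this card, so they are not checked.

```python
from pathlib import Path

# File names as listed in this card for the v0.1 package.
EXPECTED_V01_FILES = (
    "predictor.safetensors",
    "model_card.md",
    "training_run_manifest.json",
    "training_run_card.md",
    "training_run_SHA256SUMS",
    "eval_metrics.json",
    "eval_report.md",
    "eval_config.effective.yaml",
    "efficiency_report.json",
    "SHA256SUMS",
)


def missing_artifacts(model_dir, expected=EXPECTED_V01_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

`missing_artifacts(model_dir)` returns an empty list when every listed file is present.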

For scoring, Carbon-500M must also be available. The release manifests record
the encoder as `/carbon` because training, evaluation, and demo jobs mounted
`HuggingFaceBio/Carbon-500M` there at revision
`5d31d59b3c845b288a13aedb1358934196852eec`. The Space can resolve and remap
that encoder from the Hub before scoring.

Example single-variant invocation once the model directory and Carbon encoder
are available:

```bash
geno-lewm-score \
  --model-dir "$MODEL_DIR" \
  --backend auto \
  --variant chrSynthetic:3073:A:T \
  --window ACGTACGTACGTACGT \
  --window-start-bp 3064 \
  --receipt receipt.json
```

The `REF` allele in `--variant` must match the supplied reference window at the
variant locus. If it does not, scoring fails before model inference.
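The `REF` check can be pictured as a simple lookup into the window. The sketch below is illustrative and is not the `geno-lewm` implementation: `ref_matches_window` is a hypothetical name, it handles only the `CHROM:POS:REF:ALT` form shown above, and it assumes `POS` is 1-based while `--window-start-bp` is the 0-based coordinate of the first window base. That convention is an inference from the example command, where it is the reading under which the `A` reference allele lines up with the window. The runtime's own documentation should be treated as authoritative.

```python
def ref_matches_window(variant, window, window_start_bp):
    """Check that REF in 'CHROM:POS:REF:ALT' matches the window sequence."""
    _chrom, pos, ref, _alt = variant.split(":")
    # Assumption: POS is 1-based and window_start_bp is 0-based.
    offset = int(pos) - 1 - window_start_bp
    if offset < 0 or offset + len(ref) > len(window):
        return False  # variant falls outside the supplied window
    return window[offset:offset + len(ref)].upper() == ref.upper()


# The example command: offset 8 in the window holds "A", so REF matches.
print(ref_matches_window("chrSynthetic:3073:A:T", "ACGTACGTACGTACGT", 3064))
# A wrong REF allele at the same locus does not match.
print(ref_matches_window("chrSynthetic:3073:G:T", "ACGTACGTACGTACGT", 3064))
```

Under the stated assumption the first call prints `True` and the second prints `False`, which mirrors the pass and fail cases described above.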

## Limitations

- Alpha research checkpoint; not a clinical, diagnostic, or deployment model.
- v0.1 evaluation is narrow: held-out chr21 ClinVar P/LP versus B/LB labels.
- v0.2.1 benchmark evidence is broader but mixed, with multiple negative deltas
  versus Carbon zero-shot and source-state rollout baselines.
- Carbon-500M is required at runtime and is resolved separately from this model
  package.
- Calibration is proof-scale and should be interpreted only within the reported
  artifact context.
- Fixture outputs and UI demos are not model-quality evidence.

## License

Apache-2.0.