clay-good/alleleforge

AlleleForge

Variant in, corrective edit out.

A variant-driven, multi-modality, uncertainty-aware CRISPR guide & edit design framework — across SpCas9 nuclease, base editors, and prime editors, with population-aware off-target nomination and a public benchmark.

CI · Python · License: MIT · Typed: mypy strict · Code style: ruff


Warning

AlleleForge is a research tool. It is not a medical device and does not provide medical advice. It produces ranked, explicitly uncertain design hypotheses. Every off-target nomination it makes is computational and must be experimentally validated before any wet-lab or therapeutic use. See Scope & responsible use.


Why AlleleForge

Most monogenic disease is, in effect, a copy-paste error at the allele level. The job of a genome editor is to forge the corrective edit. Today that job is fragmented across a dozen single-purpose tools — one to pick a guide, another to predict efficiency, a third to enumerate prime-editing extensions, a fourth to scan for off-targets — none of which speak the same language and few of which agree on what "uncertain" means.

AlleleForge unifies the journey behind one typed interface: you supply a variant, it returns a ranked, safety-annotated menu of candidate edits spanning every applicable modality, each carrying a calibrated uncertainty interval, a predicted edit outcome, and a population- and haplotype-aware off-target report.

The four-axis gap it fills

For prime editing in particular, no existing open-source tool combines all four of:

Axis PRIDICT2.0 PrimeDesign / PrimeVar CRISPRme AlleleForge
Therapeutic variant front-end
ML efficiency with calibrated uncertainty
Outcome / byproduct prediction partial
Population-aware off-target

AlleleForge's contribution is to wrap the best existing models (PRIDICT2.0, BE-Hive, BE-DICT, inDelphi, Cas-OFFinder, …) behind a unified, typed, uncertainty-honest interface and add value at the seams.


Design principles

  1. Variant-first. The canonical journey starts from what is broken, not from a guide.
  2. Honest uncertainty. Every numeric prediction ships with a calibrated interval. No scorer returns a bare float.
  3. Population-aware by default. Reference-only off-target analysis is a known safety gap (the Casgevy / BCL11A rs114518452 case is the canonical cautionary tale). AlleleForge searches population variation by default.
  4. Wrap, don't rebuild. Integrate proven tools; add new ML only at genuine coverage gaps.
  5. Reproducible to the byte. Pinned environments, versioned datasets, deterministic seeds, content-hashed checkpoints.
  6. Three audiences, one core. The library is the source of truth; CLI and web are thin shells over it.
  7. Typed and tested. mypy --strict, ruff, and Hypothesis property tests on all core logic.
  8. Cite everything. Every dataset, model, and scoring function carries a literature citation and a version.

Architecture

AlleleForge is strictly layered: lower layers know nothing about higher ones. The Designer is the only component that sees the whole pipeline; every domain service is independently testable and usable.

flowchart TB
    subgraph I["Interfaces"]
        PY["Python library"]
        CLI["aforge CLI"]
        WEB["Web UI (FastAPI + Next.js)"]
    end
    subgraph O["Orchestration"]
        DES["Designer: variant → routing → candidates → score → outcome → off-target → rank → report"]
    end
    subgraph D["Domain services"]
        VR["Variant resolver<br/>(HGVS, ClinVar)"]
        EN["Guide enumerators<br/>(cas9, base, prime)"]
        SC["Scoring<br/>(efficiency, outcome, uncertainty)"]
        OT["Off-target engine<br/>(population / haplotype)"]
    end
    subgraph F["Foundations"]
        GA["Genome access<br/>(FASTA, FM-index)"]
        DR["Data registry<br/>(DVC, gnomAD, ClinVar)"]
        MZ["Model zoo<br/>(ckpt hashing)"]
        CT["Core types and schemas"]
    end
    RUST["Rust / PyO3 — aforge_native: BWT off-target search · k-mer hashing · haplotype walking"]

    I --> O --> D --> F
    OT -.calls.-> RUST
    EN -.calls.-> RUST

The variant-first journey

sequenceDiagram
    actor U as User
    participant R as Resolver
    participant Rt as Router
    participant E as Enumerators
    participant S as Scorers
    participant X as Off-target engine
    participant K as Ranker

    U->>R: ClinVar / rsID / HGVS / VCF / coords
    R->>Rt: normalized Variant + consequence
    Rt->>E: eligible modalities (nuclease / base / prime)
    E->>S: candidate guides and pegRNAs
    S->>S: efficiency + outcome (calibrated Prediction)
    E->>X: spacers / nicks
    X->>X: reference → population → haplotype → patient VCF
    S->>K: scored candidates
    X->>K: ancestry-stratified off-target reports
    K-->>U: RankedMenu (+ Pareto front, provenance, disclaimer)

Build status & roadmap

AlleleForge is built in ordered phases (see SPEC.md, the authoritative build contract). Phases 0–5 establish the spine before any modality or ML code.

| Phase | Component | Status |
|---|---|---|
| 0 | Repo bootstrap, CI, packaging, Rust toolchain | ✅ done |
| 1 | Core domain types & schemas (types/) | ✅ done |
| 2 | Genome access & indexing (genome/) | ✅ done |
| 3 | Data registry & population datasets (data/) | ✅ done |
| 4 | Variant resolver (variant/) | ✅ done |
| 5 | Off-target engine, population & haplotype aware (offtarget/) | ✅ done |
| 6 | Scoring foundations: model zoo, embeddings, uncertainty (scoring/, model_zoo/) | ✅ done |
| 7 | Chemistry: SpCas9 nuclease (enumerate/, scoring/, design/) | ✅ done |
| 8 | Chemistry: base editing, ABE / CBE (enumerate/, scoring/, design/) | ✅ done |
| 9 | Chemistry: prime editing | ⏳ next |
| 10 | Designer: routing, candidate menu, ranking | ◻️ planned |
| 11 | Reporting & oligo output | ◻️ planned |
| 12 | CLI (aforge) | ◻️ planned |
| 13 | Web UI & API | ◻️ planned |
| 14 | CRISPR-Bench: benchmark, splits, leaderboard | ◻️ planned |
| 15 | Docs, examples, release | ◻️ planned |

Install

AlleleForge targets Python ≥ 3.11. The core install is deliberately light; heavy scientific, ML, and web stacks live in optional dependency groups so the base package installs fast and CI stays reliable.

# Core library (light: pydantic types, config, model-card parsing — no torch/numpy)
pip install alleleforge            # once published to PyPI

# From source, with the optional groups you need
git clone https://github.com/clay-good/alleleforge
cd alleleforge
pip install -e ".[core,genome,variant,ml,dev]"

Optional dependency groups

| Group | Pulls in | Needed for |
|---|---|---|
| core | polars, pyarrow, numpy | tabular I/O |
| genome | pyfaidx, pysam, cyvcf2, mappy, pyliftover | reference access, indexing (Phase 2) |
| variant | hgvs | HGVS resolution (Phase 4) |
| ml | torch, transformers, lightning, scikit-learn | real embedding backbones (Phase 6+); the uncertainty core needs none of these |
| web | fastapi, uvicorn | API server (Phase 13) |
| docs | mkdocs-material, mkdocstrings | documentation site |
| dev | ruff, mypy, pytest, hypothesis, maturin | development |

Native acceleration (optional)

The performance kernels live in a PyO3 crate built with maturin. AlleleForge imports and runs cleanly without it (pure-Python mode); build it for speed:

pip install maturin
cd rust && maturin develop --release      # builds & installs aforge_native

alleleforge._native.NATIVE_AVAILABLE reports whether the compiled extension is present.
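
The optional-extension pattern described above can be sketched as follows. This is a minimal illustration of a guarded import with a pure-Python fallback, not the contents of the real `_native.py`; the `count_kmers` helper is hypothetical and exists only to give the fallback something to do.

```python
"""Sketch of an optional native bridge (hypothetical, not the real _native.py)."""

try:
    import aforge_native  # compiled PyO3 extension; present only after `maturin develop`
    NATIVE_AVAILABLE = True
except ImportError:
    aforge_native = None
    NATIVE_AVAILABLE = False


def count_kmers(seq: str, k: int) -> dict[str, int]:
    """Pure-Python k-mer counter (hypothetical helper) that needs no native code."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


print("native kernels available:", NATIVE_AVAILABLE)
print(count_kmers("ACGACG", 3))
```

Because the import failure is caught at module load, callers can branch on the flag instead of wrapping every call in their own try/except.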


Quickstart

The end-to-end design pipeline lands incrementally across Phases 6–12. Today the package exposes the core vocabulary, genome access, the data registry, and the variant resolver, which together form the entire front half of the variant-first journey. Phases 5–8 (off-target engine, scoring substrate, SpCas9, base editing) have also shipped and are covered in their own sections below. The snippets below work now; the full design() call arrives once the remaining phases complete.

from alleleforge.types import DNASequence, Prediction, UncertaintyMethod

seq = DNASequence("ACGTRYN")           # validates IUPAC alphabet
print(seq.reverse_complement())        # ambiguity-aware: R↔Y, N↔N → "NRYACGT"

# Every numeric prediction carries a calibrated interval, never a bare float.
p = Prediction(value=0.72, interval=(0.61, 0.83), method=UncertaintyMethod.ENSEMBLE,
               in_distribution=True, calibrated=True)
print(p.interval_level)                # 0.80 by default
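
For readers who want to see what "ambiguity-aware" means in the reverse-complement call above, here is a standalone sketch using the standard IUPAC complement table. It is an illustration only and does not reproduce how `DNASequence` is implemented; the function name and error message are assumptions.

```python
# Standard IUPAC nucleotide complements (R = A/G pairs with Y = C/T, and so on).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a sequence, rejecting anything outside the IUPAC alphabet."""
    try:
        return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as exc:
        raise ValueError(f"not an IUPAC nucleotide code: {exc.args[0]!r}") from None


print(reverse_complement("ACGTRYN"))  # NRYACGT
```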

Resolve a variant — every input form normalizes to one canonical, left-aligned record:

from alleleforge.variant import resolve, RawTarget
from alleleforge.types import DNASequence

# A raw target sequence with a marked edit — no reference file needed.
rv = resolve(RawTarget(sequence=DNASequence("ACGTAACGTACGT"), position=4, ref="A", alt="G"))
print(rv.variant)            # target:4:A>G
print(rv.working_interval)   # 0-based half-open analysis window around it

# With a reference genome, indels are left-aligned and the asserted ref is
# validated against the build (a mismatch is a hard error — likely wrong build):
#   resolve("chr2:g.5226001del", reference=hg38, dbsnp=dbsnp_db)
#   resolve("VCV000012345", clinvar=clinvar_db)   # ClinVar accession → Variant
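
The left-alignment and ref-validation behaviour mentioned in the comments above can be sketched independently of the library. The function below follows the commonly used trim-right, extend-left, trim-left normalization procedure on a plain string reference; the name `left_align`, its signature, and the toy sequence are assumptions for illustration, not AlleleForge's resolver.

```python
def left_align(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim a variant.

    `pos` is the 0-based index in `ref_seq` of the first base of `ref`.
    Returns (pos, ref, alt) in a VCF-style anchored representation.
    """
    if ref == alt:
        raise ValueError("ref and alt are identical; nothing to normalize")
    if ref_seq[pos:pos + len(ref)] != ref:
        # Mirrors the hard error described above: probably the wrong build.
        raise ValueError("asserted ref allele does not match the reference")

    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pull in one reference base on the left.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True

    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Deleting the last "CA" of a CA repeat shifts to the leftmost equivalent position.
print(left_align("GGCACACATT", 6, "CA", ""))  # (1, 'GCA', 'G')
```

Two inputs that describe the same deletion at different repeat units normalize to the same record, which is what lets every input form converge on one canonical variant.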

Inspect the data registry — every external dataset is versioned and license-aware:

from alleleforge.data import DEFAULT_REGISTRY

print(DEFAULT_REGISTRY.names)                 # ('1000g', 'clinvar', 'dbsnp', 'encode', ...)
clinvar = DEFAULT_REGISTRY.get("clinvar")
print(clinvar.version, clinvar.license)       # 2024-05  public-domain (NCBI)
# Non-redistributable sources are never vendored; downloads are consent-gated
# and checksum-verified. See docs/data.md for the full provenance table.

The target journey (Phase 12 CLI):

# Variant → ranked, safety-annotated menu of candidate edits
aforge design --clinvar VCV000012345 --intent correct --populations all

# Standalone population/haplotype-aware off-target for a spacer
aforge offtarget --spacer GACGGAGGCTAAGCGTCGCAA --pam NGG

# Normalize any input form and show its consequence (debugging aid)
aforge resolve --hgvs "NM_000518.5:c.20A>T"

The variant-first front end (Phases 2–4, shipping now)

Phases 2–4 implement everything from an input to a validated, annotated variant with its genomic context — the foundation every modality plugs into.

flowchart LR
    subgraph IN["Accepted inputs"]
        A1["ClinVar accession"]
        A2["dbSNP rsID"]
        A3["HGVS g./c./p."]
        A4["VCF record"]
        A5["raw coordinates"]
        A6["raw target seq"]
    end
    R["resolve()"]
    subgraph NORM["Normalize"]
        N1["left-align + trim<br/>(bcftools-norm)"]
        N2["validate ref vs build<br/>(hard error on mismatch)"]
    end
    OUT["ResolvedVariant<br/>variant · working interval ·<br/>consequence · T2T recommendation"]

    A1 & A2 & A3 & A4 & A5 & A6 --> R --> NORM --> OUT
    R -. ClinVar/dbSNP/HGVS lookups .- DATA["Data registry<br/>(versioned, license-aware)"]
    NORM -. fetch + flag ambiguous loci .- GEN["Genome access<br/>(FASTA, FM-index, liftover)"]

Coordinate convention cheat-sheet. Internals are uniformly 0-based half-open; only I/O boundaries are 1-based. Every parser converts on read.

| Surface | System | Converted by |
|---|---|---|
| AlleleForge internals (GenomicInterval, Variant.pos) | 0-based half-open | none (canonical) |
| ClinVar / gnomAD / dbSNP VCF | 1-based | pos − 1 on read |
| GENCODE GTF | 1-based inclusive | [start − 1, end) on read |
| ENCODE bedGraph | 0-based half-open | unchanged |
| HGVS (g.), human-readable reports | 1-based | boundary helpers only |
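
The conversions in the table above reduce to a few lines of arithmetic. The sketch below uses a hypothetical `Interval` dataclass and helper names (they are not the library's `GenomicInterval` API) to show each boundary conversion once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """0-based half-open interval [start, end). Hypothetical stand-in type."""
    start: int
    end: int

    def __post_init__(self) -> None:
        if not 0 <= self.start <= self.end:
            raise ValueError(f"invalid interval [{self.start}, {self.end})")

    def __len__(self) -> int:
        return self.end - self.start


def from_vcf(pos_1based: int, ref: str) -> Interval:
    """VCF POS is 1-based and names the first base of REF."""
    start = pos_1based - 1
    return Interval(start, start + len(ref))


def from_gtf(start_1based: int, end_1based_inclusive: int) -> Interval:
    """GTF features are 1-based inclusive: [start - 1, end) internally."""
    return Interval(start_1based - 1, end_1based_inclusive)


def to_one_based_inclusive(iv: Interval) -> tuple[int, int]:
    """Convert a non-empty interval back to 1-based inclusive for reports."""
    if len(iv) == 0:
        raise ValueError("empty interval has no 1-based inclusive form")
    return iv.start + 1, iv.end


print(from_vcf(100, "A"))                        # Interval(start=99, end=100)
print(from_gtf(11, 20), len(from_gtf(11, 20)))   # Interval(start=10, end=20) 10
```

Keeping the conversion in named helpers at the I/O boundary means no internal code ever needs to ask which convention a number is in.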

Dataset provenance (pinned, versioned, citation-stamped — full table in docs/data.md):

| Dataset | Version | License | Role |
|---|---|---|---|
| ClinVar | 2024-05 | Public domain | accession → variant + significance |
| gnomAD | v4.1 | CC0-1.0 | per-population allele frequencies |
| 1000 Genomes | phase 3 high-cov | Public (IGSR) | phased common haplotypes |
| HGDP | gnomAD v3.1 | CC0-1.0 | ancestry breadth |
| dbSNP | b156 | Public domain | rsID ↔ locus |
| GENCODE | v47 | Open | gene models / transcripts |
| ENCODE | 2024 | Open | chromatin tracks |

The off-target engine (Phase 5, shipping now)

AlleleForge's safety core, and its clearest point of novelty: off-target nomination that is reference-, population-, and haplotype-aware for every chemistry, behind one search() call that returns an ancestry-stratified report. Reference-only off-target analysis has a known blind spot — a minor allele can create a de novo PAM the reference never shows — and because allele frequencies differ by ancestry, that blind spot concentrates risk in under-represented populations.
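
To make the blind spot concrete, the toy example below shows a single-nucleotide variant turning a non-PAM triplet into an NGG PAM. The sequence is invented for illustration and is not a real locus; the helper names are assumptions, not part of the off-target engine.

```python
# IUPAC codes needed for common PAM patterns (subset, for the sketch only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "Y": "CT", "V": "ACG"}


def matches_pam(site: str, pattern: str) -> bool:
    """True when every base of `site` is allowed by the IUPAC `pattern`."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern)
    )


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return `seq` with the 0-based position `pos` changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError("ref allele does not match the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


# Toy 20-nt protospacer followed by 'TGA', which is not an NGG PAM.
reference = "GACCATGCAACCTTGAACGT" + "TGA"
alt_haplotype = apply_snv(reference, 22, "A", "G")  # minor allele: TGA -> TGG

print(matches_pam(reference[20:23], "NGG"))      # False: invisible to a reference-only scan
print(matches_pam(alt_haplotype[20:23], "NGG"))  # True: a PAM that exists only on the alt allele
```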

flowchart TB
    SP["spacer + PAM"] --> S1
    subgraph ENG["search() — five stages"]
        direction TB
        S1["1 · Reference scan<br/>PAM-anchored · ≤4 mismatch · ≤1 DNA + ≤1 RNA bulge · both strands"]
        S2["2 · Population augmentation<br/>gnomAD alt-allele re-scan → de-novo PAMs / strengthened seed sites"]
        S3["3 · Haplotype walk<br/>common 1000G / HGDP haplotypes (variant combinations)"]
        S4["4 · Patient VCF (optional)<br/>personalize to one genome"]
        S5["5 · Score · threshold · de-dup · stratify"]
        S1 --> S5
        S2 --> S5
        S3 --> S5
        S4 --> S5
    end
    S5 --> R["OffTargetReport<br/>ancestry-stratified · every site tagged<br/>reference / population / patient + causal allele + freq"]

Every site records where it came from — the reference, a population variant (which allele, which populations, at what frequency), or a patient's VCF — so a nomination can be audited, not trusted blindly. The report's worst-case is computed against the worst-affected ancestry, never the average.
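
The difference between a worst-ancestry and an averaged summary can be sketched with plain dictionaries. All scores and frequencies below are made-up numbers, the record layout is hypothetical, and the 0.001 frequency floor simply borrows the project's stated population-inclusion default.

```python
from statistics import mean

# Hypothetical nominated sites: a score plus per-population allele frequency.
sites = [
    {"origin": "reference", "score": 0.25, "freq": {"afr": 1.0, "eur": 1.0, "eas": 1.0}},
    {"origin": "population", "score": 0.90, "freq": {"afr": 0.045, "eur": 0.0, "eas": 0.0}},
]


def per_ancestry_worst(sites: list[dict], min_freq: float = 0.001) -> dict[str, float]:
    """Highest site score that each population is actually exposed to."""
    worst: dict[str, float] = {}
    for site in sites:
        for pop, freq in site["freq"].items():
            worst.setdefault(pop, 0.0)
            if freq >= min_freq:
                worst[pop] = max(worst[pop], site["score"])
    return worst


def worst_ancestry(sites: list[dict]) -> tuple[str, float]:
    """Population with the highest worst-case score, and that score."""
    worst = per_ancestry_worst(sites)
    pop = max(worst, key=worst.get)
    return pop, worst[pop]


print(per_ancestry_worst(sites))
print(worst_ancestry(sites))
print(round(mean(per_ancestry_worst(sites).values()), 3))  # the average hides the risk
```

In this toy data the averaged figure sits well below the score the most affected population faces, which is the reason for reporting the maximum.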

Reference bias, reproduced

The canonical cautionary tale is the BCL11A enhancer variant rs114518452 (Cancellieri et al., Nat Genet 2023). AlleleForge reproduces it as an integration test: a reference-only scan returns zero sites, while the population-aware scan nominates the high-CFD off-target that the minor allele creates, ancestry-stratified and with its African-ancestry-enriched frequency recorded.

from alleleforge.offtarget import search
from alleleforge.types.guide import PAM

# Assumes spacer, hg38 (reference genome handle), and gnomad_db were loaded earlier.
report = search(spacer, PAM(pattern="NGG"), reference=hg38, gnomad=gnomad_db)
for site in report.sites:
    print(site.origin, round(site.score, 2), site.causal_allele, site.populations)
worst = report.worst_ancestry()        # ('afr', 1.0) — flagged, not averaged away

Specificity scoring cheat-sheet

| Score | Source | Status in AlleleForge |
|---|---|---|
| MIT / Hsu | Hsu et al., Nat Biotechnol 2013 | Exact: published 20-position weight table |
| CFD | Doench et al., Nat Biotechnol 2016 | Published PAM table; mismatch weights default to a transparent seed model, injectable with the exact Doench matrix |
| CFD-Cas12a | analog | Seed at the PAM-proximal 5' end, TTTV PAM |

All three sit behind one swappable OffTargetScorer protocol, so a Phase 6 ML scorer drops in without touching the engine. Reporting thresholds default to CFD ≥ 0.20 or MIT ≥ 0.10.
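
A swappable scorer and the reporting rule can be sketched with `typing.Protocol`. The protocol name `SiteScorer`, its method signature, and the toy scorer are assumptions made for this example; the real `OffTargetScorer` protocol may look different, and the toy scorer returns a bare float only to keep the sketch short. Only the two thresholds come from the text above.

```python
from typing import Protocol

CFD_THRESHOLD = 0.20
MIT_THRESHOLD = 0.10


class SiteScorer(Protocol):
    """Hypothetical structural interface: anything with a name and a score()."""
    name: str

    def score(self, spacer: str, site: str) -> float: ...


class MatchFraction:
    """Toy stand-in scorer: fraction of identical positions. NOT CFD or MIT."""
    name = "toy-match-fraction"

    def score(self, spacer: str, site: str) -> float:
        if len(spacer) != len(site):
            raise ValueError("spacer and site must have equal length")
        return sum(a == b for a, b in zip(spacer, site)) / len(spacer)


def score_sites(scorer: SiteScorer, spacer: str, sites: list[str]) -> list[tuple[str, float]]:
    """Works with any object satisfying SiteScorer; the caller never changes."""
    return [(site, scorer.score(spacer, site)) for site in sites]


def should_report(cfd: float, mit: float) -> bool:
    """Report a site when either score clears its default threshold."""
    return cfd >= CFD_THRESHOLD or mit >= MIT_THRESHOLD


print(score_sites(MatchFraction(), "ACGT", ["ACGT", "ACGA"]))
print(should_report(cfd=0.25, mit=0.0), should_report(cfd=0.10, mit=0.05))
```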

The genome-scale search is the Rust FM-index seed-and-extend kernel; until that crate is built, AlleleForge ships a correct pure-Python linear-scan fallback (CI never blocks on the native build).
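
As a rough picture of what a linear-scan fallback does, the sketch below slides a window over a sequence and reports NGG-anchored sites on both strands within a mismatch budget. It is deliberately simplified (no bulges, NGG only, plain strings) and is not the project's fallback implementation; all names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def linear_scan(genome: str, spacer: str, max_mismatches: int = 4) -> list[tuple[int, str, int]]:
    """Return (protospacer_start, strand, mismatches) for NGG-anchored sites.

    protospacer_start is the 0-based forward-strand index of the protospacer.
    """
    n = len(spacer)
    hits = []
    for i in range(len(genome) - (n + 3) + 1):
        window = genome[i:i + n + 3]
        # Forward strand: protospacer followed by NGG.
        if window[n + 1:] == "GG":
            mm = hamming(window[:n], spacer)
            if mm <= max_mismatches:
                hits.append((i, "+", mm))
        # Reverse strand: NGG appears as CCN ahead of the protospacer.
        if window[:2] == "CC":
            mm = hamming(revcomp(window[3:]), spacer)
            if mm <= max_mismatches:
                hits.append((i + 3, "-", mm))
    return hits


spacer = "GACCATGCAACCTTGAACGT"
genome = "TTT" + spacer + "AGG" + "TTT" + revcomp(spacer + "TGG") + "TT"
for hit in linear_scan(genome, spacer):
    print(hit)
```

The scan is linear in genome length per spacer, which is why an indexed seed-and-extend kernel matters at genome scale.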


The scoring substrate (Phase 6, shipping now)

Before any chemistry-specific predictor, AlleleForge establishes the reusable ML substrate: a license-gated model zoo, a swappable embedding backbone, and the calibrated-uncertainty machinery that realizes the honest-uncertainty principle. The whole substrate is pure stdlib in its core path — no numpy or torch — so it runs in CI on a weight-free stub embedder; real 500M-parameter backbones are gated behind the real_weights marker.

flowchart LR
    SEQ["DNA sequence"] --> EMB["SequenceEmbedder<br/>(NT v2 · Caduceus · Evo 2 · Stub)"]
    EMB --> CACHE["embedding cache<br/>(by sequence hash)"]
    EMB --> OOD["OODDetector<br/>distance vs training reference"]
    CACHE --> MODEL["scorer / ensemble"]
    MODEL --> U{"uncertainty"}
    U -->|N=5 default| ENS["deep ensemble<br/>mean ± z·σ (disagreement)"]
    U -->|fallback| EV["evidential<br/>aleatoric + epistemic"]
    U -->|if quantiles| QT["quantile interval"]
    ENS & EV & QT --> CAL["isotonic calibration<br/>(reduces ECE)"]
    OOD --> CAL
    CAL --> PRED["Prediction[float]<br/>value · 80% interval · method ·<br/>in_distribution · calibrated"]

No bare floats. Every scorer returns a Prediction, never a number; ensure_prediction is the runtime guard at the orchestration seam.

No undocumented models. Every checkpoint loads through the model zoo, which refuses a missing card, a license that forbids the use, or an unverifiable hash, and surfaces a ModelCheckpoint into result provenance.
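
The hash check that gates a checkpoint can be illustrated with the standard library alone. This sketch assumes SHA-256 and a hypothetical `verify_checkpoint` helper; the digest algorithm and the model zoo's actual interface are not specified in this README.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: Path, expected_sha256: str) -> str:
    """Return the digest when it matches the pinned value; refuse to proceed otherwise."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checkpoint hash mismatch for {path.name}: "
            f"expected {expected_sha256}, got {actual}"
        )
    return actual
```

A caller would pin the expected digest next to the model card and call `verify_checkpoint` before deserializing any weights.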

Uncertainty method cheat-sheet

| Method | Role | Interval |
|---|---|---|
| Deep ensemble (N=5) | default | mean ± z·σ from member disagreement; widens on OOD |
| Evidential (NIG) | single-model fallback | splits aleatoric (data) vs epistemic (model) variance |
| Quantile | when the model emits quantiles | read off the (1±level)/2 quantiles |
| Isotonic calibration | post-hoc, all of the above | PAV fit; expected_calibration_error quantifies the gain |

from alleleforge.scoring import DeepEnsemble, ensemble_prediction, OODDetector, StubEmbedder

# m1..m5 (trained members), features, and training_reference are assumed to exist already.
ens = DeepEnsemble([m1, m2, m3, m4, m5])                 # five members
emb = StubEmbedder().embed(["GACCATGCAACCTTGAACGT"])[0]   # NT v2 in production
ood = OODDetector(training_reference)                     # embedding-space density
pred = ensemble_prediction(ens.predict(features), in_distribution=ood.is_in_distribution(emb))
print(pred.value, pred.interval, pred.method, pred.in_distribution)   # honest by construction
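
The "mean ± z·σ" rule and the (1±level)/2 quantile levels can be worked through with the standard library. The sketch assumes a normal approximation and the sample standard deviation across members; both are choices made for this example, not confirmed details of `ensemble_prediction`.

```python
from statistics import NormalDist, mean, stdev


def quantile_levels(level: float = 0.80) -> tuple[float, float]:
    """Lower and upper quantile levels for a central interval, e.g. 0.10 and 0.90."""
    return (1 - level) / 2, (1 + level) / 2


def ensemble_interval(member_preds: list[float], level: float = 0.80) -> tuple[float, tuple[float, float]]:
    """Mean prediction and a central interval from member disagreement."""
    if len(member_preds) < 2:
        raise ValueError("need at least two ensemble members to measure disagreement")
    mu = mean(member_preds)
    sigma = stdev(member_preds)            # sample standard deviation across members
    _, upper = quantile_levels(level)
    z = NormalDist().inv_cdf(upper)        # about 1.28 for an 80% interval
    return mu, (mu - z * sigma, mu + z * sigma)


value, (low, high) = ensemble_interval([0.70, 0.72, 0.74, 0.71, 0.73])
print(round(value, 3), round(low, 3), round(high, 3))
```

When members agree closely the interval is narrow; when they disagree, as they tend to on out-of-distribution inputs, σ grows and the interval widens with it.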

The first chemistry: SpCas9 nuclease (Phase 7, shipping now)

The most mature chemistry, and the right one to prove the full vertical slice end to end. From a resolved variant, design_cas9 enumerates guides, scores efficiency and outcome with calibrated uncertainty, runs the population-aware off-target engine, and returns ranked candidates.

flowchart LR
    V["ResolvedVariant<br/>+ intent"] --> EN["enumerate_cas9<br/>PAM-anchored · strand-aware ·<br/>cut 3 bp 5' of PAM · actionable window"]
    EN --> EF["efficiency<br/>RS3 baseline / deep ensemble<br/>(80% interval + OOD)"]
    EN --> OUT["outcome<br/>microhomology / MMEJ +<br/>1-bp insertion spectrum"]
    EN --> OT["off-target<br/>(Phase 5 engine,<br/>ancestry-stratified)"]
    EF & OUT & OT --> C["DesignCandidate[]<br/>ranked: efficiency then safety"]
    EN -.precise intent.-> HDR["HDR donor template"]

Defaults & decisions. Primary PAM NGG; NG (SpCas9-NG) and NRN/NYN (SpRY) are emitted only when no NGG guide is actionable and opted in. Cut site 3 bp 5' of the PAM. The actionable window is tight around the edit for precise intents (HDR efficiency falls off with cut-to-edit distance) and the whole working interval for a knock-out, which marks frameshift outcomes as intended.
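
The PAM-anchored, strand-aware enumeration and the "3 bp 5' of the PAM" cut rule can be sketched on a plain string. The `Guide` record, `enumerate_ngg`, and the toy sequence are hypothetical and cover only NGG; this is not the `enumerate_cas9` implementation.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Guide:
    spacer: str  # 5'->3' on the guide's own strand
    strand: str  # "+" or "-"
    pam: str     # as read on the guide's own strand
    cut: int     # forward-strand boundary: the break falls between cut-1 and cut


def enumerate_ngg(seq: str, spacer_len: int = 20) -> list[Guide]:
    guides = []
    for i in range(len(seq) - (spacer_len + 3) + 1):
        window = seq[i:i + spacer_len + 3]
        # Forward strand: [protospacer][NGG]; cut 3 bp upstream of the PAM.
        if window[-2:] == "GG":
            guides.append(Guide(window[:spacer_len], "+", window[spacer_len:],
                                i + spacer_len - 3))
        # Reverse strand: [CCN][protospacer] in forward coordinates;
        # 3 bp 5' of the PAM on the guide strand is 3 bp into the protospacer here.
        if window[:2] == "CC":
            guides.append(Guide(revcomp(window[3:]), "-", revcomp(window[:3]),
                                i + 3 + 3))
    return guides


seq = "AAAAA" + "GACCATGCAACCTTGAACGT" + "TGG" + "AAAAA"
for guide in enumerate_ngg(seq):
    print(guide)
```

Storing the cut as a forward-strand boundary index keeps it comparable across strands, so the distance from cut to edit is a single subtraction.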

| Axis | Default (CI, weight-free) | Trained alternative (model zoo, ml extra) |
|---|---|---|
| Efficiency | RS3-style feature baseline + backbone deep ensemble | Rule Set 3; fine-tuned NT v2 ensemble |
| Outcome | microhomology/MMEJ + 1-bp insertion model | inDelphi (default) · Lindel · X-CRISP + agreement |
| Off-target | Phase 5 engine (pure-Python fallback) | Phase 5 engine (Rust FM-index) |

Every efficiency score carries an 80% interval and an OOD flag; every outcome is a normalized distribution over indel alleles; every candidate carries an ancestry-stratified off-target report — so a ranked menu is honest about what it does and does not know.


Base editing: the bystander problem (Phase 8, shipping now)

Base editors install a single transition (ABE: A·T→G·C; CBE: C·G→T·A) without a double-strand break, within a narrow activity window. The hard part is the window outcome: of the editable bases in the window, which get edited — and what bystanders ride along. AlleleForge enumerates every sgRNA placing the target base in-window per editor, predicts the window-allele distribution, and ranks by the probability of the exact intended allele while minimizing bystander burden.

flowchart LR
    V["ResolvedVariant<br/>(transition SNV)"] --> EL{"editor eligible?<br/>ABE: A·T→G·C<br/>CBE: C·G→T·A"}
    EL --> EN["enumerate_base_edits<br/>target base in window 4–8 ·<br/>strand-aware · bystanders flagged"]
    EN --> WO["window outcome<br/>per-position p(edit) × motif →<br/>2ᵏ allele distribution"]
    WO --> M["p_intended_exact<br/>+ bystander_burden"]
    EN --> OT["off-target<br/>(Phase 5, ancestry-stratified)"]
    M & OT --> C["DesignCandidate[]<br/>ranked: clean-edit then bystander<br/>cleanest = recommended"]

Declarative editor registry. ABE8e, CBE4max, and evoCDA1 ship as data; adding an editor (deaminase, chemistry, window, PAM, motif preference) is a one-descriptor change, not code.

| Editor | Deaminase | Edit | Window | Motif preference |
|---|---|---|---|---|
| ABE8e | TadA-8e | A→G | 4–8 | none (broad) |
| CBE4max | APOBEC1 | C→T | 4–8 | TC (prefers 5′ T) |
| evoCDA1 | evoCDA1 | C→T | 2–10 | none (broad window) |

Every candidate carries the tradeoff explicitly — bystander-present:N / clean, a bystander-burden score, the full window-allele distribution, and an ancestry-stratified off-target report — so the recommendation is the cleanest editor/guide combination, not just the first one found.
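
The bystander tradeoff becomes easier to reason about with a small worked model. The sketch below assumes each editable base in the window is edited independently with a given probability, which is a simplifying assumption made here (the motif term is omitted and the real window-outcome model is not specified in this README). The per-position probabilities are invented.

```python
from itertools import product


def window_positions(spacer: str, base: str, window: tuple[int, int] = (4, 8)) -> list[int]:
    """1-based protospacer positions inside the window that carry the editable base."""
    lo, hi = window
    return [pos for pos in range(lo, hi + 1) if spacer[pos - 1] == base]


def allele_distribution(p_edit: dict[int, float]) -> dict[frozenset[int], float]:
    """Probability of each of the 2^k edited-position subsets, assuming independence."""
    positions = sorted(p_edit)
    dist: dict[frozenset[int], float] = {}
    for mask in product((False, True), repeat=len(positions)):
        prob = 1.0
        for pos, edited in zip(positions, mask):
            prob *= p_edit[pos] if edited else 1.0 - p_edit[pos]
        dist[frozenset(pos for pos, edited in zip(positions, mask) if edited)] = prob
    return dist


def p_intended_exact(dist: dict[frozenset[int], float], target: int) -> float:
    """Probability that the target base, and only the target base, is edited."""
    return dist[frozenset({target})]


spacer = "GACCATGCAACCTTGAACGT"
editable = window_positions(spacer, "C")   # C at positions 4 and 8 for a CBE
dist = allele_distribution({4: 0.6, 8: 0.2})  # made-up per-position probabilities
print(editable, round(p_intended_exact(dist, target=4), 2))  # [4, 8] 0.48
```

Even a modest bystander probability lowers the chance of the exact intended allele, which is why ranking on the clean-edit probability rather than on target-base efficiency alone changes which guide comes first.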


Defaults cheat-sheet

Every default is overridable; these are the spec-mandated starting points.

| Topic | Default | Notes |
|---|---|---|
| Reference / coordinates | hg38, 0-based half-open | T2T-CHM13 auto-recommended for ambiguous loci; mm39 for mouse |
| Strand | always explicit | no implicit "default strand"; spacers stored 5'→3' |
| SpCas9 PAM | NGG (primary), NAG low-stringency | NG / SpRY opt-in when no NGG is actionable |
| Off-target search | ≤ 4 mismatches, ≤ 1 DNA + ≤ 1 RNA bulge | report CFD ≥ 0.20 or MIT ≥ 0.10 |
| Population inclusion | MAF ≥ 0.001, all populations | de-novo PAM & seed-mismatch changes always evaluated |
| Base-editing window | protospacer positions 4–8 | ABE8e (A→G), CBE4max / evoCDA1 (C→T); evoCDA1 registers a wider 2–10 window; bystanders always reported |
| Prime editing | PE5max + epegRNA (tevopreQ1) | PBS 8–17 nt, RTT 7–34 nt; PE3b nicking guide when seed-disrupting |
| Uncertainty | 80% predictive interval | deep ensemble (N=5) + isotonic calibration |
| Seed | 20240501 | threaded through every stochastic step, recorded in provenance |

Project layout

alleleforge/
├── pyproject.toml            # hatchling build, deps, ruff/mypy/pytest config
├── SPEC.md                   # the authoritative, phase-by-phase build contract
├── rust/                     # PyO3 crate: aforge_native (BWT, k-mer, haplotype)
├── src/alleleforge/
│   ├── config.py             # typed Settings (pydantic-settings), defaults, paths
│   ├── _native.py            # optional Rust bridge
│   ├── types/                # Phase 1: core domain vocabulary
│   ├── genome/               # Phase 2: reference access, FM-index, liftover
│   ├── data/                 # Phase 3: registry, ClinVar, gnomAD, 1000G/HGDP, dbSNP, annotations
│   ├── variant/              # Phase 4: resolver, HGVS adapter, consequence
│   ├── offtarget/            # Phase 5: population/haplotype-aware off-target
│   ├── model_zoo/            # Phase 6: license-gated model cards + checkpoints
│   ├── scoring/              # Phase 6: embeddings, uncertainty, Scorer (this release)
│   ├── enumerate/            # Phases 7–8: SpCas9 guide + base-editor window enumeration
│   ├── design/               # Phases 7–8: SpCas9 + base-editor verticals (designer: Phase 10)
│   ├── report/ cli/ web/     # Phases 11–13 (interfaces)
│   └── ...
├── tests/                    # mirrors src/; pytest + hypothesis
├── benchmark/                # CRISPR-Bench (Phase 14)
└── docs/                     # mkdocs-material site

Development

pip install -e ".[dev]"
ruff check src tests           # lint + import order + docstrings
ruff format --check src tests  # formatting
mypy src                       # strict type-check
pytest                         # tests + ≥85% coverage gate on core
cd rust && cargo test && maturin develop   # native crate

CI (GitHub Actions) runs lint, type-check, tests (Python 3.11 + 3.12 on Linux & macOS), the Rust build, and a docs build on every push and PR. See .github/workflows/ci.yml.

Contributions are welcome — please read CONTRIBUTING.md and the Contributor Covenant 2.1 code of conduct.


Scope & responsible use

  • Research use only. AlleleForge produces hypotheses and rankings, not medical advice or clinical decisions. Every generated report repeats this.
  • Off-target predictions require experimental validation. Computational nomination narrows the search; it does not replace GUIDE-seq / CHANGE-seq / amplicon confirmation.
  • No telemetry, no phone-home. All computation runs locally or on user-controlled infrastructure. User sequences are never transmitted externally.
  • Honest uncertainty over false confidence. Where models are out of distribution (e.g., prime-editing efficiency outside PRIDICT's HEK293T / K562 training context), AlleleForge flags it rather than hiding it.
  • Dual-use awareness. This is a design and safety-analysis tool for legitimate therapeutic and basic research. It contains no wet-lab protocols or synthesis instructions.

License

AlleleForge is released under the MIT License — all code, schemas, benchmark, and any first-party model weights. It is fully open source and free to use, modify, and redistribute.

Each wrapped third-party tool or model retains its own upstream license, recorded in its model/tool card; the registry refuses to bundle any component whose license is incompatible with redistribution and fetches it at runtime with the user's consent instead.

Citation

If you use AlleleForge, please cite it via CITATION.cff. A Zenodo DOI will be minted with the first tagged release.
