deploy: implement int8 predictor and int4 Carbon quantization #70

@AbdelStark

Description

Context

Quantizing the predictor, action encoder, and Carbon is how we meet the on-device memory and latency budget.

Scope

  • Per-tensor symmetric int8 for predictor + action encoder with calibration over 1k reference windows.
  • Q4_K_M for Carbon via llama.cpp toolchain.
  • Evaluate quality drop against RFC-0016 §3.3 budget.
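
A minimal sketch of the first scope item, per-tensor symmetric int8 with a seeded calibration set. It assumes max-abs calibration (the issue does not say which statistic to use; a percentile clip would be a drop-in change) and treats each reference window as a flat list of floats. All function names here are hypothetical and not taken from the codebase.

```python
import random


def sample_calibration_windows(pool, n_windows, seed):
    """Draw a fixed calibration subset; the same seed yields the same windows."""
    rng = random.Random(seed)
    return rng.sample(pool, n_windows)


def calibrate_scale(windows, num_bits=8):
    """Per-tensor symmetric scale: map the largest |x| seen to qmax (127 for int8)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for window in windows for v in window)
    # Guard against an all-zero calibration set.
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map integer codes back to floats; zero-point is 0 in the symmetric scheme."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Synthetic stand-in for the reference windows (assumption: real windows
    # would be activations or weights collected from the predictor).
    data_rng = random.Random(1234)
    pool = [[data_rng.uniform(-3.0, 3.0) for _ in range(16)] for _ in range(2000)]

    windows = sample_calibration_windows(pool, n_windows=1000, seed=0)
    scale = calibrate_scale(windows)
    codes = quantize(windows[0], scale)
    restored = dequantize(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(windows[0], restored))
    print(f"scale={scale:.6f} max_abs_error={max_err:.6f}")
```

Because the scale depends only on the sampled windows, fixing the seed and the pool order fixes the quantization parameters. For any value inside the calibrated range, the round-trip error is bounded by half a quantization step (`scale / 2`).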

Out of Scope

  • AWQ / GPTQ alternatives (Phase 4 if needed).

Design Reference

  • RFC: rfcs/0010-on-device-personal-genome-deployment.md §3.3
  • RFC: rfcs/0016-performance-budget.md §3.3

Acceptance Criteria

  • Int8 predictor: ≤ 1.0 AUROC drop on ClinVar coding
  • Int4 Carbon + int8 predictor: ≤ 2.0 AUROC drop on ClinVar coding
  • Quantization is reproducible from a seeded calibration set
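
The criteria above could be checked along the following lines. This is a sketch, not the project's evaluation harness: it assumes the thresholds are AUROC percentage points (so 1.0 means 0.01 on the 0 to 1 scale), which the issue does not state explicitly, and it assumes binary labels. The function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_drop_points(labels, baseline_scores, quantized_scores):
    """Drop in AUROC, in percentage points (assumed unit), baseline minus quantized."""
    return 100.0 * (auroc(labels, baseline_scores) - auroc(labels, quantized_scores))


def within_budget(drop_points, budget_points):
    """True when the measured drop does not exceed the allowed budget."""
    return drop_points <= budget_points


if __name__ == "__main__":
    # Toy data only; real inputs would be ClinVar coding labels and model scores.
    labels = [0, 0, 1, 1]
    baseline = [0.10, 0.20, 0.80, 0.90]
    quantized = [0.10, 0.85, 0.80, 0.90]
    drop = auroc_drop_points(labels, baseline, quantized)
    print(f"drop={drop:.2f} points, int8 budget met: {within_budget(drop, 1.0)}")
```

The pairwise AUROC is quadratic in the number of examples, which is fine for a sketch; a rank-based implementation would be preferable on a full benchmark. Reproducibility can be checked separately by running calibration twice with the same seed and comparing the resulting scales for exact equality.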

Parent tracking issue: #14
