deploy: implement int8 predictor and int4 Carbon quantization

## Context
Quantizing the predictor, action encoder, and Carbon is needed to hit the on-device memory and latency budget.

## Scope
- Per-tensor symmetric int8 for predictor + action encoder with calibration over 1k reference windows.
- Q4_K_M for Carbon via llama.cpp toolchain.
- Evaluate quality drop against RFC-0016 §3.3 budget.
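
For reference, a minimal sketch of what per-tensor symmetric int8 with calibration could look like. It assumes calibration activations (or weights) arrive as NumPy arrays and that the scale is taken from the maximum absolute value seen over the calibration windows. The function names `calibrate_scale`, `quantize_int8`, and `dequantize` are hypothetical and not taken from the codebase.

```python
import numpy as np

def calibrate_scale(calibration_batches):
    """Per-tensor symmetric scale: the largest |x| seen during calibration maps to 127."""
    max_abs = max(float(np.max(np.abs(batch))) for batch in calibration_batches)
    # Guard against an all-zero tensor, where any scale is equally valid.
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize_int8(x, scale):
    """Round to the nearest int8 step; symmetric range [-127, 127], zero-point fixed at 0."""
    q = np.clip(np.rint(x / scale), -127, 127)
    return q.astype(np.int8)

def dequantize(q, scale):
    """Map int8 codes back to float32 for quality comparison against the float model."""
    return q.astype(np.float32) * scale
```

Values outside the calibrated range saturate at ±127, so the choice of reference windows directly affects clipping error. A percentile-based scale is a possible alternative to max-abs if outliers dominate, but that is not part of the scope above.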

## Out of Scope
- AWQ / GPTQ alternatives (Phase 4 if needed).

## Design Reference
- RFC: rfcs/0010-on-device-personal-genome-deployment.md §3.3
- RFC: rfcs/0016-performance-budget.md §3.3

## Acceptance Criteria
- [ ] Int8 predictor: ≤ 1.0 AUROC drop on ClinVar coding
- [ ] Int4 Carbon + int8 predictor: ≤ 2.0 AUROC drop on ClinVar coding
- [ ] Quantization is reproducible from a seeded calibration set
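
One way the reproducibility criterion could be satisfied is sketched below, assuming reference windows are addressable by string IDs. The helper names `select_calibration_windows` and `calibration_fingerprint` are hypothetical. The selection depends only on the seed and the set of IDs, not on the order in which they were listed.

```python
import hashlib
import random

def select_calibration_windows(window_ids, n=1000, seed=0):
    """Pick n calibration windows deterministically from a seed."""
    # Sort first so the result does not depend on the input ordering.
    rng = random.Random(seed)
    return rng.sample(sorted(window_ids), n)

def calibration_fingerprint(selected_ids):
    """SHA-256 over the selected IDs, suitable for recording next to the quantized artifact."""
    joined = "\n".join(selected_ids).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()
```

Storing the seed and the fingerprint alongside the quantized weights would let a later run confirm it calibrated on the same windows.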

_Parent tracking issue: #14_
