Commit a5cb49c

committed

perf(embed): engage hybrid CPU twin under throughput and parallelize batch prep

Balance the throughput-profile hybrid split so the GPU arm and the CPU twin each receive a real share of every batch. The split target now adds the long-entity work to the GPU's share of the short work (long entities load the CPU side), so long-heavy batches keep the GPU saturated instead of emptying its subset. When the GPU subset is empty, the work runs on the CPU twin rather than being routed back to the GPU by the auto resolver, so the idle cores are used. Parallelize the embed prep hot path: extract tokenized ids and masks with a parallel iterator, then hold them as contiguous length-sorted arrays so each dispatch borrows its sub-batch instead of re-cloning token buffers per chunk. Consolidate the Metal/CPU dispatch into one routine shared by the plain and hybrid paths. The aggressive hybrid stays gated on the throughput resource profile; the proof path is unchanged. The batch budget continues to read the throughput plan's hardware-scaled token and attention-area caps, and now surfaces detected unified memory to the plan so the caps track the host. Result order is stable regardless of how a batch splits across the two arms. Add partition coverage/determinism unit tests and an ignored real-model parity test asserting the split is value- and order-preserving. Signed-off-by: Troy Fortin <troy@firelock.ai>

1 parent 2697c45 commit a5cb49cCopy full SHA for a5cb49c

2 files changed

crates/kin-db
- src/embed
  - mod.rs
- tests
  - embed_hybrid_parity.rs

Comments

(0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit a5cb49c

Uh oh!

File tree

0 commit comments