Hi, thanks for the impressive work!
The streaming benchmarks currently compare CASA against "Full Insertion", whose memory grows without bound as the stream continues. Have you also compared it against a Standard Insertion + KV Eviction baseline, i.e. one that still processes image tokens through the FFN but drops old visual KVs from the cache?
While CASA has a clear compute advantage by skipping FFNs, both methods share similar memory characteristics and RoPE handling (keeping position ID "gaps").
Do you have any performance comparisons against a simple KV eviction strategy?
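To make the baseline concrete, here is a minimal sketch of the eviction logic I have in mind. It is my own illustration, not code from CASA: the names `EvictingKVCache`, `KVEntry`, and `max_visual_tokens` are hypothetical, the oldest-first eviction policy is an assumption, and the actual key/value tensors are omitted so that only the bookkeeping is shown.

```python
from dataclasses import dataclass


@dataclass
class KVEntry:
    # RoPE position assigned when the token was inserted; never renumbered.
    position_id: int
    # True for image tokens, False for text tokens.
    is_visual: bool
    # A real cache would also hold the per-layer key/value tensors here.


class EvictingKVCache:
    """Hypothetical cache that keeps all text KVs but caps the visual KVs."""

    def __init__(self, max_visual_tokens: int):
        self.max_visual_tokens = max_visual_tokens
        self.entries: list[KVEntry] = []
        self.next_position = 0

    def append(self, num_tokens: int, is_visual: bool) -> None:
        # Tokens are inserted normally (and would pass through the FFN),
        # each receiving the next position ID in the stream.
        for _ in range(num_tokens):
            self.entries.append(KVEntry(self.next_position, is_visual))
            self.next_position += 1
        self._evict()

    def _evict(self) -> None:
        # Drop the oldest visual entries until the visual budget is met.
        visual_count = sum(e.is_visual for e in self.entries)
        excess = visual_count - self.max_visual_tokens
        if excess <= 0:
            return
        kept = []
        for entry in self.entries:
            if entry.is_visual and excess > 0:
                excess -= 1
                continue
            kept.append(entry)
        self.entries = kept

    def position_ids(self) -> list[int]:
        # Surviving entries keep their original IDs, so evictions leave gaps.
        return [e.position_id for e in self.entries]


cache = EvictingKVCache(max_visual_tokens=4)
cache.append(2, is_visual=False)  # text prompt
cache.append(4, is_visual=True)   # first frame
cache.append(2, is_visual=False)  # text
cache.append(4, is_visual=True)   # second frame evicts the first
print(cache.position_ids())       # [0, 1, 6, 7, 8, 9, 10, 11]
```

Positions 2 to 5 are missing after the second frame arrives, which is the position ID "gap" I am referring to above.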
Thanks!