Per-layer locks for compressed cache instead of single global Mutex #46

## Problem / Motivation

All transformer layers share a single `Arc<Mutex<dyn CompressedKVCache>>`. Every decode step acquires this mutex once per layer; with 32 layers, that's 32 sequential lock acquisitions per token. With batch size > 1, all sequences serialize on this one lock, even when they are accessing different layers.

Identified by Copilot review (Finding M1).

## Current code (`kv_cache/mod.rs`)
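```rust
use std::sync::{Arc, Mutex};

pub enum KvCache {
    Compressed {
        // Shared across ALL layers: every layer's handle points at the same Mutex.
        cache: Arc<Mutex<dyn CompressedKVCache>>,
        // Which layer this handle serves.
        layer: usize,
        // ... (other fields elided)
    },
}
```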
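To make the contention concrete, here is a minimal, self-contained model of the current layout. The trait `Cache`, the struct `AllLayers`, and the helpers `LayerView` / `layer_views` are hypothetical stand-ins, not the real `CompressedKVCache` API. The point is only that every layer's handle clones the same `Arc`, so holding the lock for layer 0 blocks layer 1 even though their storage is disjoint.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real `CompressedKVCache` trait.
pub trait Cache: Send {
    fn append(&mut self, layer: usize, token: u32);
}

/// Hypothetical cache whose storage is already split per layer.
#[derive(Default)]
pub struct AllLayers {
    per_layer: Vec<Vec<u32>>,
}

impl Cache for AllLayers {
    fn append(&mut self, layer: usize, token: u32) {
        if self.per_layer.len() <= layer {
            self.per_layer.resize_with(layer + 1, Vec::new);
        }
        self.per_layer[layer].push(token);
    }
}

/// Mirrors the current layout: each layer's handle clones the same Arc<Mutex<..>>.
pub struct LayerView {
    pub cache: Arc<Mutex<dyn Cache>>,
    pub layer: usize,
}

pub fn layer_views(num_layers: usize) -> Vec<LayerView> {
    let shared: Arc<Mutex<dyn Cache>> = Arc::new(Mutex::new(AllLayers::default()));
    (0..num_layers)
        .map(|layer| LayerView { cache: Arc::clone(&shared), layer })
        .collect()
}
```

While any handle holds the guard, every other layer's `lock()` waits, which is exactly the serialization described above.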

## Solution

Options (in order of preference):

**Option A: Per-layer storage within the cache**
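The `CompressedKVCache` trait already receives `layer: usize` in every method call, and the storage behind it is already per-layer. The serialization comes only from wrapping the shared trait object in a single Mutex. If the cache implementation uses internal per-layer locking (e.g., one `RwLock` per layer), the outer Mutex can be downgraded or removed: trait methods would take `&self`, and the shared handle could become a plain `Arc<dyn CompressedKVCache + Send + Sync>`.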
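A minimal sketch of Option A, assuming the trait can be changed so its methods take `&self`. `PerLayerKvCache`, `RwLockPerLayerCache`, `LayerStorage`, `append`, `num_key_elems`, and `shared` are hypothetical names chosen for illustration; the real trait's methods and storage format will differ.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for one layer's compressed K/V storage.
#[derive(Default)]
pub struct LayerStorage {
    keys: Vec<f32>,
    values: Vec<f32>,
}

/// Hypothetical trait shape: `&self` methods, so no outer Mutex is required.
pub trait PerLayerKvCache: Send + Sync {
    fn append(&self, layer: usize, k: &[f32], v: &[f32]);
    fn num_key_elems(&self, layer: usize) -> usize;
}

/// One RwLock per layer: different layers never contend with each other.
pub struct RwLockPerLayerCache {
    layers: Vec<RwLock<LayerStorage>>,
}

impl RwLockPerLayerCache {
    pub fn new(num_layers: usize) -> Self {
        Self {
            layers: (0..num_layers)
                .map(|_| RwLock::new(LayerStorage::default()))
                .collect(),
        }
    }
}

impl PerLayerKvCache for RwLockPerLayerCache {
    fn append(&self, layer: usize, k: &[f32], v: &[f32]) {
        // Exclusive lock on this layer only.
        let mut s = self.layers[layer].write().unwrap();
        s.keys.extend_from_slice(k);
        s.values.extend_from_slice(v);
    }

    fn num_key_elems(&self, layer: usize) -> usize {
        // Shared lock: concurrent readers of the same layer do not block each other.
        self.layers[layer].read().unwrap().keys.len()
    }
}

/// Every layer clones this handle; there is no outer Mutex.
pub fn shared(num_layers: usize) -> Arc<dyn PerLayerKvCache> {
    Arc::new(RwLockPerLayerCache::new(num_layers))
}
```

Two threads appending to different layers take different `RwLock`s and never wait on each other, and readers of the same layer can proceed in parallel. Writers to the same layer still serialize, which matches the per-layer granularity this issue asks for.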

**Option B: Vec of per-layer Mutexes**
```rust
// Instead of one Mutex for the whole cache:
cache: Vec<Arc<Mutex<LayerCache>>>,  // one per layer
```
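A sketch of Option B under the same caveat: `LayerCache`, `LayerHandle`, `per_layer_caches`, `handles`, and `clear_all` are hypothetical. Each layer's handle holds only its own `Arc<Mutex<LayerCache>>`. The `clear_all` helper shows one way to meet the "No deadlocks" criterion for operations that touch several layers: always acquire the locks in ascending layer order.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical per-layer storage (stand-in for the real compressed layer data).
#[derive(Default)]
pub struct LayerCache {
    tokens: Vec<u32>,
}

/// Hypothetical per-layer handle: holds only its own layer's lock.
pub struct LayerHandle {
    pub layer: usize,
    pub cache: Arc<Mutex<LayerCache>>,
}

impl LayerHandle {
    /// Locks only this layer; other layers are unaffected.
    pub fn push(&self, token: u32) {
        self.cache.lock().unwrap().tokens.push(token);
    }

    pub fn num_tokens(&self) -> usize {
        self.cache.lock().unwrap().tokens.len()
    }
}

/// One independently locked cache per layer.
pub fn per_layer_caches(num_layers: usize) -> Vec<Arc<Mutex<LayerCache>>> {
    (0..num_layers)
        .map(|_| Arc::new(Mutex::new(LayerCache::default())))
        .collect()
}

pub fn handles(caches: &[Arc<Mutex<LayerCache>>]) -> Vec<LayerHandle> {
    caches
        .iter()
        .enumerate()
        .map(|(layer, c)| LayerHandle { layer, cache: Arc::clone(c) })
        .collect()
}

/// Multi-layer operations lock in ascending layer order, so two such
/// operations can never deadlock by acquiring the same locks in opposite orders.
pub fn clear_all(caches: &[Arc<Mutex<LayerCache>>]) {
    let mut guards: Vec<_> = caches.iter().map(|c| c.lock().unwrap()).collect();
    for g in guards.iter_mut() {
        g.tokens.clear();
    }
}
```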

**Option C: Lock-free decode path**
If decode only reads from the committed cache and writes to a single new slot, it could use atomic operations (for example, a committed-length counter published with release/acquire ordering) instead of a mutex.
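A minimal sketch of Option C, assuming a single writer per layer appends fixed-width slots and readers only ever look at committed slots. `AppendOnlySlots` is hypothetical, and values are stored as `f32` bit patterns in `AtomicU32` for simplicity; a real compressed cache would store its own encoding. The writer fills the new slot, then publishes it by storing the new length with `Release`; readers load the length with `Acquire` and touch only slots below it, so they never observe a half-written slot.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

/// Hypothetical append-only slot buffer for one layer.
pub struct AppendOnlySlots {
    width: usize,           // values per slot (e.g. one token's compressed K/V)
    data: Vec<AtomicU32>,   // f32 values stored as raw bits
    committed: AtomicUsize, // number of fully written slots
}

impl AppendOnlySlots {
    pub fn with_capacity(slots: usize, width: usize) -> Self {
        Self {
            width,
            data: (0..slots * width).map(|_| AtomicU32::new(0)).collect(),
            committed: AtomicUsize::new(0),
        }
    }

    /// Single-writer append; returns false when full. Assumes exactly one
    /// thread calls this: two concurrent writers could claim the same slot.
    pub fn push(&self, slot: &[f32]) -> bool {
        assert_eq!(slot.len(), self.width);
        let n = self.committed.load(Ordering::Relaxed);
        if (n + 1) * self.width > self.data.len() {
            return false;
        }
        for (i, x) in slot.iter().enumerate() {
            self.data[n * self.width + i].store(x.to_bits(), Ordering::Relaxed);
        }
        // Release: the slot writes above happen-before any Acquire load seeing n + 1.
        self.committed.store(n + 1, Ordering::Release);
        true
    }

    pub fn committed(&self) -> usize {
        self.committed.load(Ordering::Acquire)
    }

    /// Returns slot `idx`, or None if it has not been committed yet.
    pub fn read(&self, idx: usize) -> Option<Vec<f32>> {
        // Acquire pairs with the writer's Release store in `push`.
        if idx >= self.committed.load(Ordering::Acquire) {
            return None;
        }
        let start = idx * self.width;
        Some(
            self.data[start..start + self.width]
                .iter()
                .map(|a| f32::from_bits(a.load(Ordering::Relaxed)))
                .collect(),
        )
    }
}
```

The single-writer assumption is essential to this design. With several appenders per layer, the writers would need their own coordination to reserve slots, and the scheme above would no longer be sufficient on its own.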

## Acceptance criteria
- [ ] Decode with batch_size > 1 does not serialize all layers
- [ ] No deadlocks
- [ ] All existing tests pass
- [ ] Benchmark: measurable improvement with batch_size >= 2
