Commit 97a2844
This PR:
- Removes a stale `xfail` on `test_greedy_output_matches` that was
originally added for issue #119.
- Aligns the test expectation with current `main` behavior now that the
paged-path fixes have merged.
- Keeps parity tracking accurate while leaving batched behavior to its
own tracking path.
## Context
Issue #119 reported token-mismatch parity failures between:
- standard MLX KV cache path, and
- Metal paged-attention path.
Since then, two key fixes landed:
- #125 corrected paged KV cache dtype inference/fallback behavior and KV
cache size accounting used by paged memory/block calculations.
- #136 replaced the HF/PyTorch kernel-bridge path with native MLX +
inline Metal JIT dispatch (`get_ops`/nanobind), removing cross-framework
bridge behavior from paged execution.
With those changes, the old greedy mismatch from #119 no longer
reproduces on `main`, so the greedy `xfail` is stale.
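For readers unfamiliar with what "greedy parity" means here, the check can be sketched roughly as below. This is a minimal illustration, not the code in `tests/test_metal_kernel_paged.py`: the helper names `first_mismatch` and `assert_greedy_parity` are hypothetical, and the token lists stand in for greedy decodes from the standard and paged paths.

```python
from typing import Optional, Sequence


def first_mismatch(expected: Sequence[int], actual: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if identical.

    Hypothetical helper for illustration; not part of the repository.
    """
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    # One sequence is a strict prefix of the other: they diverge where
    # the shorter one ends.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


def assert_greedy_parity(standard_tokens: Sequence[int], paged_tokens: Sequence[int]) -> None:
    """Fail with the divergence index if the two greedy decodes differ."""
    idx = first_mismatch(standard_tokens, paged_tokens)
    assert idx is None, f"greedy outputs diverge at token index {idx}"


# Placeholder token IDs; a real test would decode the same prompt
# through both cache paths with greedy sampling.
assert_greedy_parity([11, 42, 7], [11, 42, 7])
```

With the `xfail` marker in place, a check like this passing would be reported as an unexpected pass rather than a pass, which is why the marker no longer reflects `main`.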
## Verification
```bash
pytest -q tests/test_metal_kernel_paged.py::TestMetalKernelPagedVsStandard::test_greedy_output_matches -s
pytest -m slow -q tests/test_metal_kernel_paged.py
```
Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>
1 parent ea16013 commit 97a2844