Hello
vLLM and llama.cpp are the main inference engines available (vLLM is also written in Python). Is there any plan to open a PR to add KVzip support directly to those inference engines?
vLLM also added support for DCA (dual chunk attention) a few days ago.