A modular plugin library for vLLM.
[Preprint] vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM
vLLM.hook is a plugin library designed to let developers and researchers inspect, analyze, and steer the internal operations of large language models running under the vLLM inference engine.
This includes dynamic analysis of:
- attention patterns
- attention heads
- activations
- custom intervention behaviors
- Model-agnostic plugin system for vLLM engines
- Extensible worker/analyzer abstraction
- Easy to define new hooks, analyzers, and behaviors
- Introspection of model internals
- Interventions (activation steering, attention control, etc.)
- Example applications:
  - Safety guardrails
  - Reranking
  - Enhanced instruction following
For a detailed benchmark comparing vLLM-Hook against Native vLLM Eagle (ExampleHiddenStatesConnector) for hidden state extraction, see Numerical_Analysis/.
Key takeaways:
- vLLM-Hook (last_token) offers significantly lower and prompt-length-invariant latency when only the final-position representation is needed
- vLLM-Hook (all_tokens) is numerically equivalent to Native Eagle while avoiding its GPU memory overhead
- Native Eagle requires loading a speculative decoding drafter model, reducing available KV cache
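As a rough illustration of the two modes named above, the sketch below uses plain Python lists in place of hidden-state tensors. The function `select_hidden_states` is hypothetical and is not part of vLLM-Hook. It only shows that `last_token` keeps a single vector regardless of prompt length, while `all_tokens` keeps one vector per position. Whether this fully accounts for the latency difference is not claimed here; see Numerical_Analysis/ for the actual measurements.

```python
# Illustrative only: plain Python lists stand in for hidden-state tensors.
# select_hidden_states is a hypothetical helper, not a vLLM-Hook function.
def select_hidden_states(hidden_states, mode):
    """Return the part of a per-token hidden-state sequence that a mode keeps.

    hidden_states: list of per-token vectors, one entry per prompt position.
    mode: "last_token" or "all_tokens" (mode names taken from the text above).
    """
    if mode == "last_token":
        return hidden_states[-1:]   # a single vector, whatever the prompt length
    if mode == "all_tokens":
        return list(hidden_states)  # one vector per prompt position
    raise ValueError(f"unknown mode: {mode}")


hidden_size = 4
short_prompt = [[0.0] * hidden_size for _ in range(8)]
long_prompt = [[0.0] * hidden_size for _ in range(512)]

for name, states in (("short", short_prompt), ("long", long_prompt)):
    kept_last = len(select_hidden_states(states, "last_token"))
    kept_all = len(select_hidden_states(states, "all_tokens"))
    print(name, kept_last, kept_all)
```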
Each use case (e.g. attention tracker, activation steering, hidden states extraction, etc.) runs across a Cartesian product of configuration axes: execution path (offline / vllm serve), storage (rpc / disk / shm), disk format (pt / safetensors), and save mode (sync / async). See configs.md for code snippets showing how to select each config.
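To make the size of that configuration space concrete, the sketch below enumerates the Cartesian product of the four axes listed above. The dictionary keys (`execution_path`, `storage`, `disk_format`, `save_mode`) are illustrative labels chosen for this example, not option names from vLLM-Hook; configs.md remains the reference for how each setting is actually selected. The text does not say whether disk format applies to non-disk storage, so the sketch simply takes the full product.

```python
from itertools import product

# Axis values come from the paragraph above; the key names are hypothetical.
AXES = {
    "execution_path": ["offline", "vllm serve"],
    "storage": ["rpc", "disk", "shm"],
    "disk_format": ["pt", "safetensors"],
    "save_mode": ["sync", "async"],
}


def enumerate_configs(axes):
    """Return one dict per combination of axis values."""
    keys = list(axes)
    return [dict(zip(keys, values)) for values in product(*axes.values())]


configs = enumerate_configs(AXES)
print(len(configs))  # 2 * 3 * 2 * 2 = 24 combinations
print(configs[0])
```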
git clone https://github.com/IBM/vLLM-Hook.git
cd vLLM-Hook
conda create -n vllm_hook_env python=3.12 pip
conda activate vllm_hook_env
pip install -r requirement.txt
pip install -e vllm_hook_plugins
If you plan to use the notebooks under notebooks/, you may need to register your environment as a Jupyter kernel:
pip install ipykernel
python -m ipykernel install --user --name vllm_hook_env --display-name "vllm_hook_env"
Then inside Jupyter Lab:
Kernel → Change Kernel → vllm_hook_env
You can also use the included examples/ and/or notebooks/ directories to explore different functionalities.
Notebook: notebooks/demo_attntracker.ipynb
CLI:
python examples/demo_attntracker.py
Notebook: notebooks/demo_corer.ipynb
CLI:
python examples/demo_corer.py
Notebook: notebooks/demo_actsteer.ipynb
CLI:
python examples/demo_actsteer.py
You can customize model configurations in the model_configs/ folder, e.g.:
model_configs/<example_name>/<model_name>.json
For example, model_configs/attention_tracker/granite-3.1-8b-instruct.json.
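The path convention above can be sketched as a small helper. The functions `model_config_path` and `load_model_config` are hypothetical conveniences written for this example, not part of vLLM-Hook, and the sketch makes no assumption about which fields the JSON files contain.

```python
import json
from pathlib import Path


def model_config_path(example_name, model_name, root="model_configs"):
    """Build model_configs/<example_name>/<model_name>.json (hypothetical helper)."""
    return Path(root) / example_name / f"{model_name}.json"


def load_model_config(example_name, model_name, root="model_configs"):
    """Read the JSON config as a dict; the schema is left to the repository."""
    path = model_config_path(example_name, model_name, root)
    with path.open(encoding="utf-8") as f:
        return json.load(f)


path = model_config_path("attention_tracker", "granite-3.1-8b-instruct")
print(path.as_posix())
```

Calling `load_model_config("attention_tracker", "granite-3.1-8b-instruct")` from the repository root would then return the parsed contents of that file, assuming it exists at that location.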
The main package is structured as follows:
vllm_hook_plugins/
├── analyzers/
│   ├── attention_tracker_analyzer.py
│   └── core_reranker_analyzer.py
├── workers/
│   ├── probe_hookqk_worker.py
│   └── steer_activation_worker.py
├── hook_llm.py
└── registry.py
Each component handles a key stage of the plugin lifecycle:
- Registry: manages available hooks and extensions
- Workers: define execution behavior and orchestration
- Analyzers: optionally conduct analysis based on the saved statistics
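The division of roles above can be pictured with a toy sketch. Every name below (`ToyRegistry`, `ToyWorker`, `ToyAnalyzer`, and their methods) is hypothetical and invented for illustration; none of it is taken from `registry.py`, the workers, or the analyzers in vllm_hook_plugins, whose real interfaces may differ substantially.

```python
# Hypothetical sketch of a registry/worker/analyzer split.
# None of these names come from vllm_hook_plugins.
class ToyRegistry:
    def __init__(self):
        self._workers = {}
        self._analyzers = {}

    def register_worker(self, name, worker_cls):
        self._workers[name] = worker_cls

    def register_analyzer(self, name, analyzer_cls):
        self._analyzers[name] = analyzer_cls

    def get_worker(self, name):
        return self._workers[name]

    def get_analyzer(self, name):
        # Analyzers are optional, so a missing entry returns None.
        return self._analyzers.get(name)


class ToyWorker:
    def run(self, prompt):
        # Stand-in for statistics a real worker would save during inference.
        return {"num_tokens": len(prompt.split())}


class ToyAnalyzer:
    def analyze(self, stats):
        # Stand-in for an analysis over the saved statistics.
        return stats["num_tokens"] > 3


registry = ToyRegistry()
registry.register_worker("toy", ToyWorker)
registry.register_analyzer("toy", ToyAnalyzer)

stats = registry.get_worker("toy")().run("steer the model internals please")
analyzer_cls = registry.get_analyzer("toy")
result = analyzer_cls().analyze(stats) if analyzer_cls else None
print(stats, result)
```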
We welcome contributions from the community!
- Fork this repository
- Create a branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to your branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Contributors are encouraged to define new workers and analyzers, but should not modify hook_llm
- Include examples and documentation for new features
- The registry will be updated by the admin
@article{ko2026vllm,
title={vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM},
author={Ko, Ching-Yun and Chen, Pin-Yu},
journal={arXiv preprint arXiv:2603.06588},
year={2026}
}
vLLM.hook was started by IBM Research.
- Built for the vLLM ecosystem
- Inspired by community efforts to make LLMs more interpretable and controllable