🪝 vLLM.hook

A modular plugin library for vLLM.

📄 [Preprint] vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM

vLLM.hook is a plugin library designed to let developers and researchers inspect, analyze, and steer the internal operations of large language models running under the vLLM inference engine.

This includes dynamic analysis of:

- attention patterns
- attention heads
- activations
- custom intervention behaviors

🚀 Features

- Model-agnostic plugin system for vLLM engines
- Extensible worker/analyzer abstraction
  - Easy to define new hooks, analyzers, and behaviors
- Introspection of model internals
- Interventions (activation steering, attention control, etc.)
- Example applications:
  - Safety guardrails
  - Reranking
  - Enhanced instruction following

📊 Performance Analysis

For a detailed benchmark comparing vLLM-Hook against Native vLLM Eagle (ExampleHiddenStatesConnector) for hidden state extraction, see Numerical_Analysis/.

Key takeaways:

- vLLM-Hook (last_token) offers significantly lower and prompt-length-invariant latency when only the final-position representation is needed.
- vLLM-Hook (all_tokens) is numerically equivalent to Native Eagle while avoiding its GPU memory overhead.
- Native Eagle requires loading a speculative decoding drafter model, which reduces the memory available for the KV cache.
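
One plausible factor behind the last_token behavior is the amount of data each mode has to extract per layer. The sketch below is back-of-the-envelope arithmetic only, not a measurement and not part of vLLM-Hook: the function name, the hidden size of 4096, and the 2-byte value width (fp16/bf16) are all illustrative assumptions.

```python
def hidden_state_bytes(prompt_len: int, hidden_size: int,
                       bytes_per_value: int, mode: str) -> int:
    """Estimate the per-layer size of an extracted hidden-state tensor.

    Hypothetical helper for illustration; `mode` mirrors the two
    extraction modes named in the benchmark summary.
    """
    if mode == "last_token":
        positions = 1            # only the final position is kept
    elif mode == "all_tokens":
        positions = prompt_len   # one vector per prompt token
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return positions * hidden_size * bytes_per_value


# Assumed values: hidden size 4096, 2 bytes per value (fp16/bf16).
for prompt_len in (128, 1024, 8192):
    last = hidden_state_bytes(prompt_len, 4096, 2, "last_token")
    full = hidden_state_bytes(prompt_len, 4096, 2, "all_tokens")
    print(f"prompt_len={prompt_len}: last_token={last} B, all_tokens={full} B")
```

Under these assumptions the last_token payload stays constant while the all_tokens payload grows linearly with prompt length. Refer to Numerical_Analysis/ for the measured latencies.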

🧩 Supported Configurations

Each use case (e.g., attention tracker, activation steering, hidden states extraction) runs across a Cartesian product of configuration axes: execution path (offline / vllm serve), storage (rpc / disk / shm), disk format (pt / safetensors), and save mode (sync / async). See configs.md for code snippets showing how to select each config.
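
To make the size of that configuration space concrete, the following sketch enumerates the four axes with itertools.product. The dictionary keys and the enumerate_configs helper are hypothetical labels chosen for this example, not names from the plugin. The sketch also treats every combination as valid, which is an assumption (for instance, the disk format axis may only matter for disk storage), so configs.md remains the reference for what is actually supported.

```python
from itertools import product

# Axis values copied from the list above; the keys are illustrative labels.
AXES = {
    "execution_path": ["offline", "vllm serve"],
    "storage": ["rpc", "disk", "shm"],
    "disk_format": ["pt", "safetensors"],
    "save_mode": ["sync", "async"],
}


def enumerate_configs(axes: dict) -> list:
    """Return one dict per combination of axis values (full Cartesian product)."""
    names = list(axes)
    return [dict(zip(names, values)) for values in product(*axes.values())]


configs = enumerate_configs(AXES)
print(len(configs))   # 2 * 3 * 2 * 2 = 24 combinations
print(configs[0])
```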

📦 Installation

1. Clone the repository

git clone https://github.com/IBM/vLLM-Hook.git
cd vLLM-Hook

2. (Optional) Create an environment

conda create -n vllm_hook_env python=3.12 pip
conda activate vllm_hook_env

3. Install the plugin and dependencies

pip install -r requirement.txt
pip install -e vllm_hook_plugins

📕 Notebook Setup

If you plan to use the notebooks under notebooks/, you may need to register your environment as a Jupyter kernel:

pip install ipykernel
python -m ipykernel install --user --name vllm_hook_env --display-name "vllm_hook_env"

Then inside Jupyter Lab:

Kernel → Change Kernel → vllm_hook_env

👉 Usage Examples (Notebook / CLI)

The examples/ and notebooks/ directories contain demos of each use case, as CLI scripts and as notebooks respectively.

1. Attention Tracker (In-Model Safety Guardrail)

Notebook 📓: notebooks/demo_attntracker.ipynb
CLI 🧰:

python examples/demo_attntracker.py

2. Core Reranker (In-Model Relevance Ranking)

Notebook 📓: notebooks/demo_corer.ipynb
CLI 🧰:

python examples/demo_corer.py

3. Activation Steering (Enhanced Instruction Following)

Notebook 📓: notebooks/demo_actsteer.ipynb
CLI 🧰:

python examples/demo_actsteer.py

You can customize model configurations in the model_configs/ folder. Each file follows this path pattern:

model_configs/<example_name>/<model_name>.json

For example, model_configs/attention_tracker/granite-3.1-8b-instruct.json.
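
The path pattern above can be expressed as a small helper. This is a minimal sketch: model_config_path and load_model_config are hypothetical names, not functions shipped with the plugin, and no assumption is made about which keys the JSON file contains.

```python
import json
from pathlib import PurePosixPath


def model_config_path(example_name: str, model_name: str,
                      root: str = "model_configs") -> PurePosixPath:
    """Build <root>/<example_name>/<model_name>.json (hypothetical helper)."""
    return PurePosixPath(root) / example_name / f"{model_name}.json"


def load_model_config(example_name: str, model_name: str,
                      root: str = "model_configs") -> dict:
    """Read the config as a plain dict; the schema is left to the repository."""
    path = model_config_path(example_name, model_name, root)
    with open(path, encoding="utf-8") as f:
        return json.load(f)


path = model_config_path("attention_tracker", "granite-3.1-8b-instruct")
print(path)  # model_configs/attention_tracker/granite-3.1-8b-instruct.json
```

Calling load_model_config with the same arguments from the repository root would read that file, provided it exists in your checkout.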

🏠 Plugin Architecture

The main package is structured as follows:

vllm_hook_plugins/
├── analyzers/
│   ├── attention_tracker_analyzer.py
│   └── core_reranker_analyzer.py
├── workers/
│   ├── probe_hookqk_worker.py
│   └── steer_activation_worker.py
├── hook_llm.py
└── registry.py

Each component handles a key stage of the plugin lifecycle:

- Registry: manages available hooks and extensions
- Workers: define execution behavior and orchestration
- Analyzers: optionally conduct analysis based on the saved statistics
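
The division of labor among these three components can be illustrated with a toy registry pattern. Everything in this sketch is hypothetical: WORKERS, register_worker, ToyProbeWorker, and ToyAnalyzer are invented for illustration and do not reflect the actual contents of registry.py, hook_llm.py, or the shipped workers and analyzers.

```python
from typing import Dict, List

# Toy registry: maps a name to a worker class (illustrative only).
WORKERS: Dict[str, type] = {}


def register_worker(name: str):
    """Decorator that records a worker class under `name`."""
    def decorator(cls):
        WORKERS[name] = cls
        return cls
    return decorator


@register_worker("toy_probe")
class ToyProbeWorker:
    """Stands in for a worker: runs on an input and saves statistics."""

    def run(self, prompt: str) -> dict:
        return {"prompt": prompt, "num_chars": len(prompt)}


class ToyAnalyzer:
    """Stands in for an analyzer: post-processes the saved statistics."""

    def analyze(self, stats: List[dict]) -> float:
        return sum(s["num_chars"] for s in stats) / len(stats)


worker = WORKERS["toy_probe"]()                      # look up by name
stats = [worker.run(p) for p in ("hello", "hi")]     # worker stage
print(ToyAnalyzer().analyze(stats))                  # analyzer stage, prints 3.5
```

The point of the pattern is that new workers and analyzers can be added without changing the code that looks them up by name.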

🤝 Contributing

We welcome contributions from the community!

To contribute:

1. Fork this repository
2. Create a branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to your branch (git push origin feature/amazing-feature)
5. Open a Pull Request

Guidelines:

- Contributors are encouraged to define new workers and analyzers, but should not modify hook_llm.py.
- Include examples and documentation for new features.
- The registry will be updated by the admin.

🌟 Feeling Inspired

@article{ko2026vllm,
  title={vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM},
  author={Ko, Ching-Yun and Chen, Pin-Yu},
  journal={arXiv preprint arXiv:2603.06588},
  year={2026}
}

IBM ❤️ Open Source AI

vLLM.hook was started by IBM Research.

- Built for the vLLM ecosystem
- Inspired by community efforts to make LLMs more interpretable and controllable
