IBM/vLLM-Hook

πŸͺ vLLM.hook

A modular plugin library for vLLM.

📄 [Preprint] vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM

vLLM.hook is a plugin library designed to let developers and researchers inspect, analyze, and steer the internal operations of large language models running under the vLLM inference engine.

This includes dynamic analysis of:

  • attention patterns
  • attention heads
  • activations
  • custom intervention behaviors

🚀 Features

  • Model-agnostic plugin system for vLLM engines
  • Extensible worker/analyzer abstraction
    • Easy to define new hooks, analyzers, and behaviors
  • Introspection of model internals
  • Interventions (activation steering, attention control, etc.)
  • Example applications:
    • Safety guardrails
    • Reranking
    • Enhanced instruction following
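
To make the introspection and intervention ideas above concrete, here is a minimal, framework-free sketch of a forward hook that either records a layer's output or replaces it with a steered version. It is a toy under stated assumptions: `ToyLayer`, `record`, and `make_steer` are hypothetical names invented for illustration and are not part of vLLM or vLLM.hook.

```python
from typing import Callable, List, Optional

Vector = List[float]
# A hook sees the layer output and may return a replacement (or None to leave it).
Hook = Callable[[Vector], Optional[Vector]]


class ToyLayer:
    """Hypothetical stand-in for a model layer; it simply doubles its input."""

    def __init__(self) -> None:
        self.hooks: List[Hook] = []

    def forward(self, x: Vector) -> Vector:
        out = [2.0 * v for v in x]
        for hook in self.hooks:
            result = hook(out)
            if result is not None:  # the hook chose to replace the output
                out = result
        return out


captured: List[Vector] = []


def record(out: Vector) -> None:
    """Introspection: store a copy of the activation without changing it."""
    captured.append(list(out))
    return None


def make_steer(direction: Vector, alpha: float) -> Hook:
    """Intervention: add alpha * direction to the activation."""
    def steer(out: Vector) -> Vector:
        return [o + alpha * d for o, d in zip(out, direction)]
    return steer


layer = ToyLayer()
layer.hooks.append(record)
layer.hooks.append(make_steer([1.0, 0.0], alpha=0.5))

steered = layer.forward([1.0, 2.0])
print(captured)  # [[2.0, 4.0]]
print(steered)   # [2.5, 4.0]
```

The recording hook runs before the steering hook, so it captures the unmodified activation; swapping the order would capture the steered one instead.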

📊 Performance Analysis

For a detailed benchmark comparing vLLM-Hook against Native vLLM Eagle (ExampleHiddenStatesConnector) for hidden state extraction, see Numerical_Analysis/.

Key takeaways:

  • vLLM-Hook (last_token) offers significantly lower and prompt-length-invariant latency when only the final-position representation is needed
  • vLLM-Hook (all_tokens) is numerically equivalent to Native Eagle while avoiding its GPU memory overhead
  • Native Eagle requires loading a speculative decoding drafter model, reducing available KV cache

🧩 Supported Configurations

Each use case (e.g. attention tracker, activation steering, hidden states extraction) runs across a Cartesian product of configuration axes: execution path (offline / vllm serve), storage (rpc / disk / shm), disk format (pt / safetensors), and save mode (sync / async). See configs.md for code snippets showing how to select each config.
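
As a rough illustration of what "Cartesian product of configuration axes" means, the sketch below enumerates every combination of the axis values listed above. The axis values come from this README; the dictionary layout and the key names (`execution_path`, `storage`, `disk_format`, `save_mode`) are assumptions made for the example and are not the plugin's actual configuration schema (see configs.md for that). Whether every combination is meaningful, for instance a disk format when storage is not `disk`, is not stated here, so treat the count as an upper bound.

```python
from itertools import product

# Axis values are copied from the README; the key names are illustrative only.
AXES = {
    "execution_path": ["offline", "vllm serve"],
    "storage": ["rpc", "disk", "shm"],
    "disk_format": ["pt", "safetensors"],
    "save_mode": ["sync", "async"],
}


def enumerate_configs(axes):
    """Return every combination of axis values as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


configs = enumerate_configs(AXES)
print(len(configs))  # 2 * 3 * 2 * 2 = 24
print(configs[0])
```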


📦 Installation

1. Clone the repository

git clone https://github.com/IBM/vLLM-Hook.git
cd vLLM-Hook

2. (Optional) Create an environment

conda create -n vllm_hook_env python=3.12 pip
conda activate vllm_hook_env

3. Install the plugin and dependencies

pip install -r requirement.txt
pip install -e vllm_hook_plugins

📕 Notebook Setup

If you plan to use the notebooks under notebooks/, you may need to register your environment as a Jupyter kernel:

pip install ipykernel
python -m ipykernel install --user --name vllm_hook_env --display-name "vllm_hook_env"

Then inside Jupyter Lab:

Kernel → Change Kernel → vllm_hook_env

👉 Usage Examples (Notebook / CLI)

Use the included examples/ and notebooks/ directories to explore the different functionalities.

1. Attention Tracker (In-Model Safety Guardrail)

Notebook 📓: notebooks/demo_attntracker.ipynb
CLI 🧰 :

python examples/demo_attntracker.py

2. Core Reranker (In-Model Relevance Ranking)

Notebook 📓: notebooks/demo_corer.ipynb
CLI 🧰 :

python examples/demo_corer.py

3. Activation Steering (Enhanced instruction following via activation steering)

Notebook 📓: notebooks/demo_actsteer.ipynb
CLI 🧰 :

python examples/demo_actsteer.py

You can customize model configurations in the model_configs/ folder, e.g.:

model_configs/<example_name>/<model_name>.json

For example, model_configs/attention_tracker/granite-3.1-8b-instruct.json.
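
If you want to inspect one of these files before editing it, a small helper along the following lines may be handy. It only assumes the path layout shown above and that the file is valid JSON; the helper names `config_path` and `load_model_config` are hypothetical and not part of the plugin, and no particular keys inside the file are assumed.

```python
import json
from pathlib import Path


def config_path(example_name, model_name, root="model_configs"):
    """Build model_configs/<example_name>/<model_name>.json.

    The suffix is appended with an f-string because model names may contain
    dots (e.g. "granite-3.1-8b-instruct"), which Path.with_suffix would mangle.
    """
    return Path(root) / example_name / f"{model_name}.json"


def load_model_config(example_name, model_name, root="model_configs"):
    """Read the JSON config and return it as a Python object."""
    path = config_path(example_name, model_name, root)
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


path = config_path("attention_tracker", "granite-3.1-8b-instruct")
print(path.as_posix())

# Only try to read the file when run from a checkout that contains it.
if path.exists():
    cfg = load_model_config("attention_tracker", "granite-3.1-8b-instruct")
    print(json.dumps(cfg, indent=2))
```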


🏠 Plugin Architecture

The main package is structured as follows:

vllm_hook_plugins/
├── analyzers/
│   ├── attention_tracker_analyzer.py
│   ├── core_reranker_analyzer.py
├── workers/
│   ├── probe_hookqk_worker.py
│   ├── steer_activation_worker.py
├── hook_llm.py
├── registry.py

Each component handles a key stage of the plugin lifecycle:

  • Registry: manages available hooks and extensions
  • Workers: define execution behavior and orchestration
  • Analyzers: optionally analyze the statistics saved by the workers
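
The division of roles above can be pictured with a generic registry pattern. The sketch below is purely illustrative: `Registry`, `ToyProbeWorker`, and `ToyAnalyzer` are hypothetical names, and the real interfaces in registry.py, workers/, and analyzers/ may look quite different. It only shows the flow of looking up a worker by name, letting it produce statistics, and handing those to an optional analyzer.

```python
from typing import Callable, Dict, List


class Registry:
    """Hypothetical name-to-class registry (not the vllm_hook_plugins API)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Callable] = {}

    def register(self, name: str):
        def decorator(obj):
            if name in self._entries:
                raise KeyError(f"{name!r} is already registered")
            self._entries[name] = obj
            return obj
        return decorator

    def get(self, name: str):
        return self._entries[name]

    def names(self) -> List[str]:
        return sorted(self._entries)


WORKERS = Registry()
ANALYZERS = Registry()


@WORKERS.register("toy_probe")
class ToyProbeWorker:
    """Stands in for a worker: runs something and saves statistics."""

    def run(self, tokens: List[str]) -> Dict[str, int]:
        return {"num_tokens": len(tokens)}


@ANALYZERS.register("toy_probe")
class ToyAnalyzer:
    """Stands in for an analyzer: interprets the saved statistics."""

    def analyze(self, stats: Dict[str, int]) -> str:
        return f"saw {stats['num_tokens']} tokens"


worker = WORKERS.get("toy_probe")()
stats = worker.run(["hello", "world"])
report = ANALYZERS.get("toy_probe")().analyze(stats)
print(report)  # saw 2 tokens
```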

🤝 Contributing

We welcome contributions from the community!

To contribute:

  1. Fork this repository
  2. Create a branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to your branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Guidelines:

  • Contributors are encouraged to define new workers and analyzers, but should not modify hook_llm
  • Include examples and documentation for new features
  • The registry will be updated by the admin

🌟 Feeling Inspired

@article{ko2026vllm,
  title={vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM},
  author={Ko, Ching-Yun and Chen, Pin-Yu},
  journal={arXiv preprint arXiv:2603.06588},
  year={2026}
}

IBM ❤️ Open Source AI

vLLM.hook was started by IBM Research.

  • Built for the vLLM ecosystem
  • Inspired by community efforts to make LLMs more interpretable and controllable
