Releases: ndif-team/nnsight
v0.5.15
What's Changed
- refactor(config): streamline environment variable handling in AppConfigModel by @JadenFiotto-Kaufman in #592
  - Simplifies the logic for setting the API key and host from environment variables and Colab userdata.
Full Changelog: v0.5.14...v0.5.15
v0.5.14
NNsight v0.5.14 Release Notes
Release Date: January 2026
This release focuses on improving the remote execution experience, vLLM compatibility, developer documentation, and overall code quality. It includes 59 commits across 37 files, with significant enhancements to the job status display system, vLLM input handling, and comprehensive new documentation.
✨ New Features
Enhanced Remote Job Status Display
The remote execution logging system has been completely redesigned with a new JobStatusDisplay class that provides:
- Real-time visual feedback with Unicode spinners and status icons
- ANSI color support with automatic detection for terminals and notebooks
- In-place status updates that don't flood the console with repeated messages
- Elapsed time tracking per status phase
- Seamless Jupyter notebook integration with flicker-free HTML rendering using DisplayHandle
# New visual status display when running remote traces
with model.trace("Hello", remote=True):
    output = model.lm_head.output.save()

# Output now shows:
# ⠋ [job-id] QUEUED (2.3s)
# ● [job-id] RUNNING (0.5s)
# ✓ [job-id] COMPLETED (1.2s)

vLLM Token Input Compatibility
vLLM now accepts a broader range of input formats, matching the flexibility of LanguageModel:
- Token ID lists: model.trace([1, 2, 3, 4])
- HuggingFace tokenizer outputs: model.trace(tokenizer("Hello", return_tensors="pt"))
- Dictionaries with input_ids: model.trace({"input_ids": tensor, "attention_mask": mask})
from nnsight.modeling.vllm import VLLM
from transformers import AutoTokenizer

model = VLLM("gpt2", dispatch=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Now works with pre-tokenized inputs
tokens = tokenizer("Hello world", return_tensors="pt")
with model.trace(tokens, temperature=0.0):
    logits = model.logits.output.save()

vLLM Auto-Dispatch
vLLM models now automatically dispatch when entering a trace context without dispatch=True, matching the behavior of LanguageModel:
model = VLLM("gpt2") # No dispatch=True needed
# Automatically dispatches on first trace
with model.trace("Hello"):
output = model.logits.output.save()Envoy.devices Property
New property to retrieve all devices a model is distributed across:
model = LanguageModel("meta-llama/Llama-3.1-70B", device_map="auto")
print(model.devices)  # {device(type='cuda', index=0), device(type='cuda', index=1), ...}

Auto-Detect API Key
The API key is now automatically detected from multiple sources in order:
1. NDIF_API_KEY environment variable
2. Google Colab userdata (userdata.get("NDIF_API_KEY"))
3. Saved configuration
# No need to manually set if NDIF_API_KEY is in your environment
import os
os.environ["NDIF_API_KEY"] = "your-key"

# Works automatically
with model.trace("Hello", remote=True):
    output = model.output.save()

vLLM Optional Dependency
vLLM is now available as an optional dependency with a pinned Triton version for stability:
pip install nnsight[vllm]
# Installs: vllm>=0.12, triton==3.5.0

🐛 Bug Fixes
vLLM Auto-Dispatch Fix
Fixed an issue where vLLM would fail when tracing without explicitly setting dispatch=True. The model now auto-dispatches when needed.
NDIF Status Null Revision Handling
Fixed a bug where ndif_status() and is_model_running() would fail when a model's revision was null in the API response. Now properly defaults to "main".
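A minimal usage sketch (the function names come from this note; the top-level import path and the model-id argument are assumptions):

# Sketch only: import path and argument shape are assumptions
from nnsight import ndif_status, is_model_running

status = ndif_status()  # revisions that come back null now default to "main"
print(is_model_running("meta-llama/Llama-3.1-8B"))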
Type Annotations for _prepare_input
Corrected type annotations in LanguageModel._prepare_input() to properly reflect the accepted input types.
Attention Mask Handling
Fixed a bug where attention masks were incorrectly overwritten during batching. The attention mask is now only applied when explicitly provided.
HTTPS Configuration
Simplified API configuration by using full URLs (https://api.ndif.us) instead of separate HOST and SSL fields, reducing potential misconfiguration.
Performance: Automatic Attention Mask Creation
Removed automatic attention mask creation for language models, improving performance by avoiding unnecessary tensor operations when the mask isn't needed.
📚 Documentation
New Comprehensive Documentation Files
Two major documentation files have been added:
- CLAUDE.md (~1,800 lines): AI agent-focused guide covering all NNsight features with practical examples, common patterns, and gotchas
- NNsight.md (~3,800 lines): Deep technical documentation covering NNsight's internal architecture, including tracing, interleaving, the Envoy system, vLLM integration, and remote execution
README Improvements
- Redesigned header inspired by vLLM style
- Added out-of-order access warning with troubleshooting table
- Added a tracer.iter[:] footgun warning
- Fixed documentation examples
Walkthrough Updates
The NNsight_Walkthrough.ipynb has been streamlined and updated for clarity, with a restructured practical focus.
🔧 Internal Changes
Logging System Refactor
- Removed src/nnsight/log.py (old logging module)
- Consolidated job status display into RemoteBackend with the new JobStatusDisplay class
- Better separation of concerns between logging and remote execution
Configuration Refactor
- ConfigModel.load() now handles environment variable overrides internally
- Removed the deprecated SSL configuration field
- Host configuration now uses the full URL format
NDIF Status Improvements
- Added proper enums for Status, ModelStatus, and DeploymentType
- Better error handling with graceful fallbacks
- Improved docstrings with usage examples
vLLM Updates
- Updated to use cached_tokenizer_from_config (replacing the deprecated init_tokenizer_from_configs)
- Uses TokensPrompt for token-based inputs (see the sketch below)
- Proper pad token handling
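For reference, this is roughly how vLLM's TokensPrompt wraps pre-tokenized input (a sketch of plain vLLM usage, not NNsight internals; the token IDs shown are GPT-2's encoding of "Hello world"):

from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

llm = LLM(model="gpt2")
# Wrap raw token IDs so vLLM treats them as a pre-tokenized prompt
prompt = TokensPrompt(prompt_token_ids=[15496, 995])
outputs = llm.generate(prompt, SamplingParams(max_tokens=8))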
Test Suite Improvements
- New test files:
  - test_debug.py: Comprehensive debugging and exception tests
  - test_remote.py: Remote execution tests for NDIF
  - test_vllm_dispatch_bug.py: Regression test for vLLM auto-dispatch
  - conftest.py: Shared pytest fixtures
  - debug_demo.py, explore_remote.py, explore_remote_advanced.py: Development utilities
- Test reorganization: Following pytest best practices with shared fixtures and improved organization
- CI update: Limited pytest to test_lm.py and test_tiny.py for faster CI runs
⚠️ Breaking Changes
Configuration
CONFIG.API.SSL has been removed. CONFIG.API.HOST now includes the full URL with protocol (e.g., https://api.ndif.us).
Remote Logging
CONFIG.APP.REMOTE_LOGGING no longer triggers a callback when changed. The new JobStatusDisplay class handles all logging internally based on this setting.
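For example, to turn off the job status display for remote traces (a minimal sketch; assigning a boolean here is an assumption based on the note above):

from nnsight import CONFIG

# JobStatusDisplay reads this setting when a remote trace starts;
# no callback fires on assignment anymore.
CONFIG.APP.REMOTE_LOGGING = False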
📦 Dependencies
| Change | Details |
|---|---|
| New optional dependency | vllm>=0.12 (via nnsight[vllm]) |
| New optional dependency | triton==3.5.0 (pinned for vLLM stability) |
🙏 Contributors
- @Butanium - vLLM auto-dispatch fix, input compatibility, type annotations
- NDIF Team - Remote logging refactor, documentation, NDIF status improvements
Upgrade Guide
pip install --upgrade nnsight

# For vLLM support
pip install --upgrade nnsight[vllm]

If you were using CONFIG.API.SSL:

# Before (v0.5.13)
CONFIG.API.HOST = "api.ndif.us"
CONFIG.API.SSL = True

# After (v0.5.14)
CONFIG.API.HOST = "https://api.ndif.us"

Full Changelog: v0.5.13...v0.5.14
v0.5.13
Release Notes:
1. nnsight support for vLLM inference has been completely refactored and works with the latest version of vLLM, including tensor parallelism, enabling fast inference on multi-GPU models with NNsight interventions!
if __name__ == "__main__":
    from nnsight.modeling.vllm import VLLM

    model = VLLM("meta-llama/Llama-3.1-8B", dispatch=True, tensor_parallel_size=2)

    with model.trace(
        "The Eiffel Tower is located in the city of",
        temperature=0.8,
        max_tokens=30,
    ) as tracer:
        activations = list().save()
        logits = list().save()
        samples = list().save()

        with tracer.iter[:30]:
            activations.append(model.model.layers[16].mlp.down_proj.output[0].cpu())
            logits.append(model.logits.output)
            samples.append(model.samples.output)

        output = tracer.result.save()

Feedback on our vLLM integration would be much appreciated.
Works with vLLM>=0.12
2. Optimizations to interleaving yield performance improvements across the board, noticeable when doing a significant number of interventions.
In addition, there are three config flags you can set for much greater improvements, though they require code changes or are more experimental.
from nnsight import CONFIG as NNSIGHT_CONFIG

NNSIGHT_CONFIG.APP.PYMOUNT = False
NNSIGHT_CONFIG.APP.CROSS_INVOKER = False
NNSIGHT_CONFIG.APP.TRACE_CACHING = True

- PYMOUNT: Turning this flag off removes the ability to call .save() on arbitrary objects; instead you call nnsight.save:

from nnsight import save

with model.trace("Hello world"):
    output = save(model.output)

Mounting and un-mounting .save() onto Python objects has some performance cost.
- CROSS_INVOKER: Turning this off prevents sharing variables between invokers. This sharing has a performance cost, and most people don't use it anyway, so you should probably turn it off.
with model.trace() as tracer:
    with tracer.invoke("Hello world"):
        hs = model.model.layers[0].output
    with tracer.invoke("Hello world"):
        model.model.layers[1].output = hs  # ✗ UnboundVariable: hs is not defined

- TRACE_CACHING: This caches the source code of your trace, making future lookups much faster. If you have a trace in a loop or in a function called more than once, you'll see a significant improvement; see the sketch below.
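For instance, a repeated trace like this benefits directly (a minimal sketch; the GPT-2 module path is standard transformers structure, not from this release note):

from nnsight import CONFIG, LanguageModel

CONFIG.APP.TRACE_CACHING = True

model = LanguageModel("openai-community/gpt2", dispatch=True)

# The trace body is identical on every iteration, so after the first pass
# its source is cached and subsequent lookups are much faster.
for prompt in ["Hello", "Bonjour", "Hola"]:
    with model.trace(prompt):
        hidden = model.transformer.h[0].output[0].save()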
Full Changelog: v0.5.12...v0.5.13
v0.5.12
What's Changed
- Check for tuples and lists because when doing a non-blocking request… by @JadenFiotto-Kaufman in #557
Full Changelog: v0.5.11...v0.5.12
v0.5.11
What's Changed
- Refactor iter by @JadenFiotto-Kaufman in #547
- Dev by @JadenFiotto-Kaufman in #548
- bug by @JadenFiotto-Kaufman in #549
- Hf by @JadenFiotto-Kaufman in #553
- Dev by @JadenFiotto-Kaufman in #556
Full Changelog: v0.5.10...v0.5.11
v0.5.10
What's Changed
- Diffusion Max Iteration by @AdamBelfki3 in #544
- add iteration to operation envoy by @AdamBelfki3 in #543
- NDIF Status API by @AdamBelfki3 in #545
- Dev by @JadenFiotto-Kaufman in #546
Full Changelog: v0.5.9...v0.5.10
v0.5.9
What's Changed
- memory leak fix by @JadenFiotto-Kaufman in #540
- Dev by @JadenFiotto-Kaufman in #541
Full Changelog: v0.5.8...v0.5.9
v0.5.8
What's Changed
- tracer.result by @JadenFiotto-Kaufman in #537

Added tracer.result to grab the result from any method you're tracing. Same API as envoy.output/input, but on the Tracer object itself:
with model.generate("Hello world") as tracer:
    result = tracer.result.save()

Why this was added:
When you're using .trace(), you're tracing the forward pass of a module, where the result can be obtained normally via model.output. However, when you're tracing another method like .generate(), there's no module exposing the complete generation output; model.output refers only to a single iteration.
Previously this was addressed by adding a blank .generator module to LanguageModel, passing the output of .generate() through it so that it was accessible via model.generator.output. That had to be done explicitly for every method output we wanted to support.
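The old LanguageModel-only workaround looked roughly like this (a sketch for contrast):

# Before: generation output was exposed through a stub .generator module
with model.generate("Hello world") as tracer:
    result = model.generator.output.save()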
This is now addressed with tracer.result, which works generally with all methods on any model/Envoy. Here is wrapping a VLM with NNsight and getting the result of .generate():
import torch
from nnsight import NNsight
from transformers import AutoProcessor
from transformers import Qwen2_5_VLForConditionalGeneration

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct", trust_remote_code=True)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Who is the president of the United States?"},
        ],
    },
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = processor(text=[text], return_tensors="pt").to("cuda:0")
input_ids = prompt.pop("input_ids")

lm = NNsight(model)

with lm.generate(input_ids, **prompt) as tracer:
    result = tracer.result.save()

Full Changelog: v0.5.7...v0.5.8
v0.5.7
What's Changed
- Performance changes by @JadenFiotto-Kaufman in #529
- Dev by @JadenFiotto-Kaufman in #530
- Dev by @JadenFiotto-Kaufman in #531
Full Changelog: v0.5.6...v0.5.7
v0.5.6
What's Changed
- Remote classes by @JadenFiotto-Kaufman in #526
- Dev by @JadenFiotto-Kaufman in #527
Full Changelog: v0.5.5...v0.5.6