
Releases: ndif-team/nnsight

v0.5.15

13 Jan 00:40
79e1848


What's Changed

  • refactor(config): streamline environment variable handling in AppConfigModel by @JadenFiotto-Kaufman in #592
  • Simplifies the logic for setting the API key and host from environment variables and Colab userdata.

Full Changelog: v0.5.14...v0.5.15

v0.5.14

08 Jan 04:58
7ab8470


NNsight v0.5.14 Release Notes

Release Date: January 2026

This release focuses on improving the remote execution experience, vLLM compatibility, developer documentation, and overall code quality. It includes 59 commits across 37 files, with significant enhancements to the job status display system, vLLM input handling, and comprehensive new documentation.


✨ New Features

Enhanced Remote Job Status Display

The remote execution logging system has been completely redesigned with a new JobStatusDisplay class that provides:

  • Real-time visual feedback with Unicode spinners and status icons
  • ANSI color support with automatic detection for terminals and notebooks
  • In-place status updates that don't flood the console with repeated messages
  • Elapsed time tracking per status phase
  • Seamless Jupyter notebook integration with flicker-free HTML rendering using DisplayHandle
# New visual status display when running remote traces
with model.trace("Hello", remote=True):
    output = model.lm_head.output.save()

# Output now shows:
# ⠋ [job-id] QUEUED     (2.3s)
# ● [job-id] RUNNING    (0.5s) 
# ✓ [job-id] COMPLETED  (1.2s)

vLLM Token Input Compatibility

vLLM now accepts a broader range of input formats, matching the flexibility of LanguageModel:

  • Token ID lists: model.trace([1, 2, 3, 4])
  • HuggingFace tokenizer outputs: model.trace(tokenizer("Hello", return_tensors="pt"))
  • Dictionary with input_ids: model.trace({"input_ids": tensor, "attention_mask": mask})
from nnsight.modeling.vllm import VLLM

model = VLLM("gpt2", dispatch=True)

# Now works with pre-tokenized inputs
tokens = tokenizer("Hello world", return_tensors="pt")
with model.trace(tokens, temperature=0.0):
    logits = model.logits.output.save()

vLLM Auto-Dispatch

vLLM models now automatically dispatch when entering a trace context without dispatch=True, matching the behavior of LanguageModel:

model = VLLM("gpt2")  # No dispatch=True needed

# Automatically dispatches on first trace
with model.trace("Hello"):
    output = model.logits.output.save()

Envoy.devices Property

New property to retrieve all devices a model is distributed across:

model = LanguageModel("meta-llama/Llama-3.1-70B", device_map="auto")
print(model.devices)  # {device(type='cuda', index=0), device(type='cuda', index=1), ...}

Auto-Detect API Key

The API key is now automatically detected from multiple sources in order:

  1. NDIF_API_KEY environment variable
  2. Google Colab userdata (userdata.get("NDIF_API_KEY"))
  3. Saved configuration
# No need to manually set if NDIF_API_KEY is in your environment
import os
os.environ["NDIF_API_KEY"] = "your-key"

# Works automatically
with model.trace("Hello", remote=True):
    output = model.output.save()

vLLM Optional Dependency

vLLM is now available as an optional dependency with a pinned Triton version for stability:

pip install nnsight[vllm]
# Installs: vllm>=0.12, triton==3.5.0

🐛 Bug Fixes

vLLM Auto-Dispatch Fix

Fixed an issue where vLLM would fail when tracing without explicitly setting dispatch=True. The model now auto-dispatches when needed.

NDIF Status Null Revision Handling

Fixed a bug where ndif_status() and is_model_running() would fail when a model's revision was null in the API response. Now properly defaults to "main".
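
A minimal usage sketch, assuming ndif_status and is_model_running are importable from the nnsight package and that is_model_running takes the model's repo id; check your install for the exact import path and signature:

from nnsight import ndif_status, is_model_running  # assumed import path

status = ndif_status()  # deployment status from the NDIF API; null revisions now default to "main"

# Hypothetical argument: the HuggingFace repo id of the model you want to check.
if is_model_running("meta-llama/Llama-3.1-70B"):
    print("Model is deployed and ready for remote traces")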

Type Annotations for _prepare_input

Corrected type annotations in LanguageModel._prepare_input() to properly reflect the accepted input types.

Attention Mask Handling

Fixed a bug where attention masks were incorrectly overwritten during batching. The attention mask is now only applied when explicitly provided.
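
A minimal sketch of providing an explicit mask, assuming LanguageModel.trace accepts a dict of tokenizer outputs (the vLLM examples above use the same shape of input):

from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto")
model.tokenizer.pad_token = model.tokenizer.eos_token  # gpt2 has no pad token by default

inputs = model.tokenizer(["Hello world", "Hi"], return_tensors="pt", padding=True)

# The attention_mask passed here is now used as-is during batching
# instead of being overwritten.
with model.trace({"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]}):
    logits = model.lm_head.output.save()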

HTTPS Configuration

Simplified API configuration by using full URLs (https://api.ndif.us) instead of separate HOST and SSL fields, reducing potential misconfiguration.

Performance: Automatic Attention Mask Creation

Removed automatic attention mask creation for language models, improving performance by avoiding unnecessary tensor operations when the mask isn't needed.


📚 Documentation

New Comprehensive Documentation Files

Two major documentation files have been added:

  • CLAUDE.md (~1,800 lines): AI agent-focused guide covering all NNsight features with practical examples, common patterns, and gotchas
  • NNsight.md (~3,800 lines): Deep technical documentation covering NNsight's internal architecture including tracing, interleaving, Envoy system, vLLM integration, and remote execution

README Improvements

  • Redesigned header inspired by vLLM style
  • Added out-of-order access warning with troubleshooting table
  • Added tracer.iter[:] footgun warning
  • Fixed documentation examples

Walkthrough Updates

The NNsight_Walkthrough.ipynb has been streamlined and updated for clarity, and restructured with a more practical focus.


🔧 Internal Changes

Logging System Refactor

  • Removed src/nnsight/log.py (old logging module)
  • Consolidated job status display into RemoteBackend with the new JobStatusDisplay class
  • Better separation of concerns between logging and remote execution

Configuration Refactor

  • ConfigModel.load() now handles environment variable overrides internally
  • Removed deprecated SSL configuration field
  • Host configuration now uses full URL format

NDIF Status Improvements

  • Added proper enums for Status, ModelStatus, and DeploymentType
  • Better error handling with graceful fallbacks
  • Improved docstrings with usage examples

vLLM Updates

  • Updated to use cached_tokenizer_from_config (replacing deprecated init_tokenizer_from_configs)
  • Uses TokensPrompt for token-based inputs
  • Proper pad token handling

Test Suite Improvements

  • New test files:

    • test_debug.py: Comprehensive debugging and exception tests
    • test_remote.py: Remote execution tests for NDIF
    • test_vllm_dispatch_bug.py: Regression test for vLLM auto-dispatch
    • conftest.py: Shared pytest fixtures
    • debug_demo.py, explore_remote.py, explore_remote_advanced.py: Development utilities
  • Test reorganization: Following pytest best practices with shared fixtures and improved organization

  • CI update: Limited pytest to test_lm.py and test_tiny.py for faster CI runs


⚠️ Breaking Changes

Configuration

  • CONFIG.API.SSL has been removed. The CONFIG.API.HOST now includes the full URL with protocol (e.g., https://api.ndif.us)

Remote Logging

  • CONFIG.APP.REMOTE_LOGGING no longer triggers a callback when changed. The new JobStatusDisplay class handles all logging internally based on this setting.
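
For example, a minimal sketch of silencing the status display for remote traces, assuming the setting keeps the name shown above:

from nnsight import CONFIG

# JobStatusDisplay reads this setting directly; changing it no longer fires a callback.
CONFIG.APP.REMOTE_LOGGING = False

with model.trace("Hello", remote=True):
    output = model.output.save()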

📦 Dependencies

  • New optional dependency: vllm>=0.12 (via nnsight[vllm])
  • New optional dependency: triton==3.5.0 (pinned for vLLM stability)

🙏 Contributors

  • @Butanium - vLLM auto-dispatch fix, input compatibility, type annotations
  • NDIF Team - Remote logging refactor, documentation, NDIF status improvements

Upgrade Guide

pip install --upgrade nnsight

# For vLLM support
pip install --upgrade nnsight[vllm]

If you were using CONFIG.API.SSL:

# Before (v0.5.13)
CONFIG.API.HOST = "api.ndif.us"
CONFIG.API.SSL = True

# After (v0.5.14)
CONFIG.API.HOST = "https://api.ndif.us"

Full Changelog: v0.5.13...v0.5.14

v0.5.13

19 Dec 22:31
cc4161d


Release Notes:

v0.5.13

1. NNsight support for vLLM inference has been completely refactored and now works with the latest version of vLLM, including tensor parallelism, enabling fast inference on multi-GPU models with NNsight interventions!

if __name__ == "__main__":
    from nnsight.modeling.vllm import VLLM

    model = VLLM("meta-llama/Llama-3.1-8B", dispatch=True, tensor_parallel_size=2)

    with model.trace(
        "The Eiffel Tower is located in the city of",
        temperature=0.8,
        max_tokens=30,
    ) as tracer:

        activations = list().save()
        logits = list().save()
        samples = list().save()

        with tracer.iter[:30]:
            activations.append(model.model.layers[16].mlp.down_proj.output[0].cpu())
            logits.append(model.logits.output)
            samples.append(model.samples.output)

        output = tracer.result.save()

Feedback on our vLLM integration would be much appreciated.

Works with vLLM>=0.12

2. Optimizations to interleaving have produced performance improvements across the board, most noticeable when performing a significant number of interventions.

In addition, there are three config flags you can set that bring a much greater improvement, but they require code changes or are more experimental.

from nnsight import CONFIG as NNSIGHT_CONFIG

NNSIGHT_CONFIG.APP.PYMOUNT = False
NNSIGHT_CONFIG.APP.CROSS_INVOKER = False
NNSIGHT_CONFIG.APP.TRACE_CACHING = True
  • PYMOUNT: Turning this flag off removes the ability to call .save() on arbitrary objects; instead, you will need to call nnsight.save:
from nnsight import save
with model.trace("Hello world"):
    output = save(model.output)

Mounting and un-mounting .save() onto Python objects has some performance cost.

  • CROSS_INVOKER: Turning this off prevents sharing variables between invokers. This sharing has a performance cost, and most people don't use it anyway, so you should probably turn it off:
with model.trace() as tracer:

    with tracer.invoke("Hello world"):
        hs = model.model.layers[0].output
    with tracer.invoke("Hello world"):
        model.model.layers[1].output = hs  # raises UnboundVariable: hs is not defined (with CROSS_INVOKER off)
  • TRACE_CACHING: This caches the source code of your trace, making future lookups much faster. If you run a trace in a loop or in a function called more than once, you'll see a significant improvement; see the sketch below.
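
For example, a repeated trace like the following benefits from caching (a minimal sketch; the actual speedup depends on your model and interventions):

from nnsight import CONFIG

CONFIG.APP.TRACE_CACHING = True

hidden_states = []
for prompt in ["Hello", "Bonjour", "Hola"]:
    # After the first iteration the trace body's source is cached,
    # so later iterations skip the source lookup cost.
    with model.trace(prompt):
        hidden_states.append(model.model.layers[0].output.save())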

Full Changelog: v0.5.12...v0.5.13

v0.5.12

21 Nov 23:25
0040937


What's Changed

Full Changelog: v0.5.11...v0.5.12

v0.5.11

21 Nov 17:26
2470719


What's Changed

Full Changelog: v0.5.10...v0.5.11

v0.5.10

30 Oct 18:23
97ea0ad


What's Changed

Full Changelog: v0.5.9...v0.5.10

v0.5.9

14 Oct 22:31
079cf52


What's Changed

Full Changelog: v0.5.8...v0.5.9

v0.5.8

10 Oct 18:20
56f23e8


What's Changed

  • tracer.result by @JadenFiotto-Kaufman in #537

  • Added tracer.result to grab the result from any method you're tracing. Same API as envoy.output/envoy.input, but on the Tracer object itself:

with model.generate("Hello world") as tracer:

    result = tracer.result.save()

Why this was added:

When you're using .trace(), you're tracing the forward pass of a module, so the result can be obtained normally via model.output. However, when you're tracing another method like .generate, there's no module through which to access the complete generation output; model.output refers only to a single iteration.

Previously, this was addressed by adding a blank .generator module to LanguageModel, through which the output of .generate was passed, making it accessible via model.generator.output. This had to be done explicitly for every method output we wanted to support.

This is now addressed with tracer.result, which works generally with any method on any model/Envoy. Here is an example of wrapping a VLM with NNsight and getting the result of .generate():

import torch
from nnsight import NNsight
from transformers import AutoProcessor
from transformers import Qwen2_5_VLForConditionalGeneration

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct", trust_remote_code=True)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Who is the president of the United States?"},
        ],
    },
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = processor(text=[text], return_tensors="pt").to("cuda:0")

input_ids = prompt.pop("input_ids")

lm = NNsight(model)

with lm.generate(input_ids, **prompt) as tracer:
    
    result = tracer.result.save()

Full Changelog: v0.5.7...v0.5.8

v0.5.7

01 Oct 16:04
62cfc35


What's Changed

Full Changelog: v0.5.6...v0.5.7

v0.5.6

29 Sep 19:37
2e8d9ca


What's Changed

Full Changelog: v0.5.5...v0.5.6