HTML/SVG based LogitLens widget, plus "chunkier" backend for logit lens. #102
Open: davidbau wants to merge 13 commits into ndif-team:widget-refactoring from davidbau:kitwidget
@davidbau is attempting to deploy a commit to the NDIF Team on Vercel. A member of the Team first needs to authorize it.
Core features:
- V2 lens endpoint with rank and entropy data support
- LogitLens widget with heatmap, trajectory chart, and pin/group support
- React integration via LogitLensWidgetEmbed component
- Bidirectional hover sync between widget and React TokenArea

Widget fixes:
- Fix pinned row visibility with two-pass rendering algorithm
- Fix popup positioning when near right edge of viewport
- Fix hover trajectory display in rank mode
- Fix widget ID collisions in Jupyter notebooks

Co-Authored-By: Claude <[email protected]>

- Add test configuration with GPT-2 only for fast local testing
- Add pytest fixtures for test client and app state
- Add comprehensive tests for V2, grid, and line lens endpoints
- Tests run with REMOTE=false using local nnsight execution

Co-Authored-By: Claude <[email protected]>

- Add auto-pin last row with prominent token support
- Simplify show_logit_lens to use **kwargs
- Replace setEventHandlers with on/off event system
- Add rank and entropy support to collect_logit_lens
- Simplify pinned row serialization format
- Fix NDIF remote execution issues
- Unify collect_logit_lens between API and notebook module

Co-Authored-By: Claude <[email protected]>
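
For readers unfamiliar with the two extra metrics mentioned above, here is a minimal sketch of how a rank and an entropy value could be derived from one position's logits at one layer. It is not the code in collect_logit_lens; the helper names (`softmax`, `entropy_nats`, `token_rank`) and the choice of natural-log entropy and 1-based rank are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_nats(probs):
    """Shannon entropy of a distribution, in nats (assumed unit)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def token_rank(logits, token_id):
    """1-based rank of token_id: 1 means it has the highest logit (assumed convention)."""
    target = logits[token_id]
    return 1 + sum(1 for x in logits if x > target)

# Toy vocabulary of four tokens at a single (layer, position) cell.
logits = [2.0, 0.5, 1.0, -1.0]
probs = softmax(logits)
cell = {
    "prob": probs[2],               # probability of token 2
    "rank": token_rank(logits, 2),  # how many tokens beat it, plus one
    "entropy": entropy_nats(probs), # spread of the whole distribution
}
```

A low rank with high entropy suggests the token leads a still-diffuse distribution, which is why a widget might expose both alongside probability.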

Widget tests (Playwright):
- Initialization, rendering, hover interactions
- Pin/unpin tokens, metric switching (prob/rank)
- Dark mode, title editing, state serialization
- Visual regression tests

Module tests (pytest):
- Model architecture detection (GPT-2, Llama, Gemma)
- Data collection with collect_logit_lens()
- HTML/widget generation with show_logit_lens()

E2E tests:
- Full stack browser tests with real GPT-2 inference
- API endpoint validation

Co-Authored-By: Claude <[email protected]>

- Add smoke test notebook for quick validation
- Add tutorial notebook with interactive walkthrough
- Add Playwright-based Colab test runner
- Add auth setup flow for Google Colab authentication
- Include data size measurement utilities for Llama 70B

Tests verify widget renders correctly in real Colab environment with NDIF remote execution.

Co-Authored-By: Claude <[email protected]>

- LogitLens Python API documentation (collect and display functions)
- Data format specification (how data flows from model to widget)
- JavaScript widget API documentation for web embedding
- Frontend README with development and testing guides
- Testing guide with all test types and commands

Co-Authored-By: Claude <[email protected]>

- Add unified test runner (scripts/test.sh) for all test types
- Auto-start/stop servers as needed for different test suites
- Add architecture diagram to README
- Add Colab link to tutorial notebook
- Update package dependencies for testing
- Clean up project structure

Co-Authored-By: Claude <[email protected]>

Remove duplicate conversion logic from process_v2_results in lens.py. Now both API and notebook module use the same to_js_format function from workbench.logitlens.display for converting tensors to V2 JSON.

For local execution, result already contains vocab/model/input/layers. For remote execution (raw tensors only), we build missing metadata before calling to_js_format.

Co-Authored-By: Claude <[email protected]>

Use the real collect_logit_lens from workbench.logitlens.collect instead of maintaining a separate 145-line copy in the test file.

Co-Authored-By: Claude <[email protected]>

- Add svg() helper to utils.ts for cleaner SVG element creation
- Consolidate duplicated legend-building code into createLegendEntry()
- Refactor X-axis, Y-axis, and clip path construction to use svg() helper

Reduces chart.ts by ~60 lines while improving readability. The verbose setAttribute() calls are now declarative object literals.

Co-Authored-By: Claude <[email protected]>

- Simplify cell text color determination from 16 lines to 4 lines
- Fix memory leak: hint hover listeners were re-attached on every rebuild
- Move hint listener setup to initialization (runs once, not per rebuild)

Co-Authored-By: Claude <[email protected]>

- Fix bug where hovering row A didn't show pinned token trajectory when different row B was pinned (chart.ts positionsToShow logic)
- Add unit test for hover trajectory with pinned rows
- Add SQLite database initialization to test.sh and e2e.spec.ts
- Fix test.sh widget grep to exclude React Integration Tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

For long prompts, only return the last max_loc token positions. This reduces memory and bandwidth when a prompt has many tokens but only the final predictions matter.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
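
The truncation described in this commit can be sketched as a simple slice over the position axis. This is only an illustration under assumptions: the function name `keep_last_positions`, the list-of-lists layout (one row per layer, one entry per token position), and the returned start offset are all hypothetical and not taken from the PR.

```python
def keep_last_positions(tokens, layer_rows, max_loc=None):
    """Keep only the last max_loc token positions.

    tokens:     list of input tokens, one per position
    layer_rows: one list per layer, each with one entry per position
                (assumed layout, for illustration only)
    Returns (tokens, layer_rows, start), where start is the index of the
    first kept position in the original prompt.
    """
    if max_loc is None or max_loc >= len(tokens):
        # Nothing to trim: the prompt already fits.
        return tokens, layer_rows, 0
    if max_loc <= 0:
        raise ValueError("max_loc must be a positive integer")
    start = len(tokens) - max_loc
    return tokens[start:], [row[start:] for row in layer_rows], start

# Example: a four-token prompt with two layers, keeping the last two positions.
tokens = ["The", "cat", "sat", "on"]
rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
kept_tokens, kept_rows, start = keep_last_positions(tokens, rows, max_loc=2)
```

Returning the start offset lets a caller label the kept positions with their original indices, which may matter when the display shows token positions.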

Summary
This PR includes several enhancements to the LogitLens widget and adds backend test infrastructure:

Widget Improvements

Backend Fixes
- `all_entropy` (containing dead proxy objects) was used outside the trace context instead of the properly saved `entropy` tensor

Backend Test Infrastructure
- test.toml) with only GPT-2 for fast local testing
- `REMOTE=false` using local nnsight execution

Test plan
- `uv run pytest workbench/_api/tests/ -v` to verify all 9 tests pass

Manual testing:

🤖 Generated with Claude Code