@davidbau commented Jan 6, 2026

Summary

This PR improves the LogitLens widget, fixes an entropy bug in the V2 lens endpoint, and adds backend test infrastructure:

Widget Improvements

  • Bidirectional hover sync: Hovering tokens in the React TokenArea highlights the corresponding row in the widget heatmap, and vice versa
  • Popup positioning fix: Cell prediction popups now appear to the left when near the right edge of the viewport
  • Rank mode trajectory fix: Gray hover trajectory lines now correctly display rank data when in rank mode
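
The popup positioning rule can be pictured as a small pure function. The widget itself is written in TypeScript; this is only a language-neutral sketch in Python, and the function name, the `gap` parameter, and the clamping at the left edge are all assumptions rather than the widget's actual code.

```python
def popup_left(anchor_x: float, popup_width: float,
               viewport_width: float, gap: float = 8.0) -> float:
    """Return the x coordinate of the popup's left edge (hypothetical helper).

    Default placement is to the right of the clicked cell. If that would
    overflow the right edge of the viewport, flip the popup to the left
    of the cell instead.
    """
    right_placed = anchor_x + gap
    if right_placed + popup_width <= viewport_width:
        return right_placed
    # Flip to the left of the anchor; clamp so the popup never starts
    # off-screen (the clamp is an assumption, not taken from the PR).
    return max(0.0, anchor_x - gap - popup_width)
```

With a 1000px viewport and a 200px popup, a click at x=100 keeps the popup on the right, while a click at x=900 flips it to the left.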

Backend Fixes

  • V2 endpoint entropy fix: Fixed bug where all_entropy (containing dead proxy objects) was used outside the trace context instead of the properly saved entropy tensor
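
For context on what the entropy value represents, here is a minimal sketch that assumes it is the Shannon entropy (in nats) of the softmax distribution over the vocabulary at one layer and position. The PR does not spell out the formula, so treat the definition and the function names as assumptions. The real endpoint works on tensors inside an nnsight trace rather than on Python lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits):
    """Shannon entropy in nats of softmax(logits).

    Assumed definition; zero-probability terms are skipped so that
    log(0) is never evaluated.
    """
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

A uniform distribution over four tokens gives `log(4)` nats, and a sharply peaked distribution gives a value close to zero.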

Backend Test Infrastructure

  • Added pytest, pytest-asyncio, and httpx to dev dependencies
  • Created test config (test.toml) with only GPT-2 for fast local testing
  • Added comprehensive tests for lens endpoints (V2, grid, line)
  • Tests run with REMOTE=false using local nnsight execution
  • All 9 tests pass including V2 with entropy enabled

Test plan

  • Run `uv run pytest workbench/_api/tests/ -v` to verify all 9 tests pass
  • More automated tests needed.

Manual testing:

  • Test hover sync by hovering over tokens in the sidebar and confirming that the corresponding heatmap row highlights
  • Test popup positioning by clicking cells near the right edge of the screen
  • Test rank mode trajectory by switching to rank mode and hovering over cells

🤖 Generated with Claude Code

vercel bot commented Jan 6, 2026

@davidbau is attempting to deploy a commit to the NDIF Team on Vercel.

A member of the Team first needs to authorize it.

@davidbau requested a review from AdamBelfki3 January 6, 2026 22:12
@AdamBelfki3 changed the base branch from main to widget-refactoring January 7, 2026 19:24
davidbau and others added 7 commits January 8, 2026 05:50
Core features:
- V2 lens endpoint with rank and entropy data support
- LogitLens widget with heatmap, trajectory chart, and pin/group support
- React integration via LogitLensWidgetEmbed component
- Bidirectional hover sync between widget and React TokenArea

Widget fixes:
- Fix pinned row visibility with two-pass rendering algorithm
- Fix popup positioning when near right edge of viewport
- Fix hover trajectory display in rank mode
- Fix widget ID collisions in Jupyter notebooks

Co-Authored-By: Claude
- Add test configuration with GPT-2 only for fast local testing
- Add pytest fixtures for test client and app state
- Add comprehensive tests for V2, grid, and line lens endpoints
- Tests run with REMOTE=false using local nnsight execution

Co-Authored-By: Claude
- Add auto-pin last row with prominent token support
- Simplify show_logit_lens to use **kwargs
- Replace setEventHandlers with on/off event system
- Add rank and entropy support to collect_logit_lens
- Simplify pinned row serialization format
- Fix NDIF remote execution issues
- Unify collect_logit_lens between API and notebook module

Co-Authored-By: Claude
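
The rank data mentioned in this commit can be illustrated with a short sketch. It assumes that rank is 1-based and that tied logits share the best rank; the commit message states neither, and `token_rank` is a hypothetical name, not a function from `collect_logit_lens`.

```python
def token_rank(logits, token_id):
    """1-based rank of token_id among logits (hypothetical helper).

    Rank 1 means the token has the highest logit. Tokens with equal
    logits share the same rank (an assumption about tie handling).
    """
    target = logits[token_id]
    return 1 + sum(1 for x in logits if x > target)
```

For logits `[0.1, 2.0, 1.5]`, token 1 has rank 1 and token 0 has rank 3.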
Widget tests (Playwright):
- Initialization, rendering, hover interactions
- Pin/unpin tokens, metric switching (prob/rank)
- Dark mode, title editing, state serialization
- Visual regression tests

Module tests (pytest):
- Model architecture detection (GPT-2, Llama, Gemma)
- Data collection with collect_logit_lens()
- HTML/widget generation with show_logit_lens()

E2E tests:
- Full stack browser tests with real GPT-2 inference
- API endpoint validation

Co-Authored-By: Claude
- Add smoke test notebook for quick validation
- Add tutorial notebook with interactive walkthrough
- Add Playwright-based Colab test runner
- Add auth setup flow for Google Colab authentication
- Include data size measurement utilities for Llama 70B

Tests verify widget renders correctly in real Colab environment
with NDIF remote execution.

Co-Authored-By: Claude
- LogitLens Python API documentation (collect and display functions)
- Data format specification (how data flows from model to widget)
- JavaScript widget API documentation for web embedding
- Frontend README with development and testing guides
- Testing guide with all test types and commands

Co-Authored-By: Claude
- Add unified test runner (scripts/test.sh) for all test types
- Auto-start/stop servers as needed for different test suites
- Add architecture diagram to README
- Add Colab link to tutorial notebook
- Update package dependencies for testing
- Clean up project structure

Co-Authored-By: Claude
davidbau and others added 6 commits January 8, 2026 09:34
Remove duplicate conversion logic from process_v2_results in lens.py.
Now both API and notebook module use the same to_js_format function
from workbench.logitlens.display for converting tensors to V2 JSON.

For local execution, result already contains vocab/model/input/layers.
For remote execution (raw tensors only), we build missing metadata
before calling to_js_format.

Co-Authored-By: Claude
Use the real collect_logit_lens from workbench.logitlens.collect
instead of maintaining a separate 145-line copy in the test file.

Co-Authored-By: Claude
- Add svg() helper to utils.ts for cleaner SVG element creation
- Consolidate duplicated legend-building code into createLegendEntry()
- Refactor X-axis, Y-axis, and clip path construction to use svg() helper

Reduces chart.ts by ~60 lines while improving readability.
The verbose setAttribute() calls are now declarative object literals.

Co-Authored-By: Claude
- Simplify cell text color determination from 16 lines to 4 lines
- Fix memory leak: hint hover listeners were re-attached on every rebuild
- Move hint listener setup to initialization (runs once, not per rebuild)

Co-Authored-By: Claude
- Fix bug where hovering row A didn't show the pinned token's trajectory
  when a different row B was pinned (chart.ts positionsToShow logic)
- Add unit test for hover trajectory with pinned rows
- Add SQLite database initialization to test.sh and e2e.spec.ts
- Fix test.sh widget grep to exclude React Integration Tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
For long prompts, only return the last max_loc token positions.
This reduces memory and bandwidth use when a prompt has many tokens
but only the final predictions matter.

Co-Authored-By: Claude Opus 4.5
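
A minimal sketch of the max_loc truncation described in this commit, written over plain lists. Only the `max_loc` name comes from the commit message. The helper name, and the treatment of `None` as "no limit" and of non-positive values as "return nothing", are assumptions.

```python
def last_positions(per_position, max_loc=None):
    """Keep only the last max_loc entries of a per-position sequence.

    Hypothetical helper: per_position holds one item per prompt token,
    for example one row of lens results.
    """
    if max_loc is None or max_loc >= len(per_position):
        return list(per_position)
    if max_loc <= 0:
        # Guard explicitly: seq[-0:] would return the whole sequence.
        return []
    return list(per_position[-max_loc:])
```

For example, `last_positions(["a", "b", "c", "d", "e"], 2)` keeps only the final two positions, so the payload size no longer grows with prompt length once the prompt exceeds `max_loc` tokens.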