Feat/stand alone demo #30

Draft

hanneshapke wants to merge 27 commits into main from feat/stand-alone-demo

Conversation

@hanneshapke
Collaborator

No description provided.

hanneshapke and others added 9 commits March 26, 2026 14:46
- Switch subject model from Gemma-3-4B to Nemotron-3-Nano-30B
- Load SAE from HuggingFace Hub (davidnet/kiji-inspector-...) via
  SAE.from_pretrained(repo_id=...) instead of local file paths
- Add build_ui_data() to transform SAE analysis into UI-ready JSON
- Add index.html: interactive explainer showing tool decisions,
  SAE feature bars, comparison chart, and contrast theme cards
- HTML loads output/ui_data.json (live data) with mock fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The final recommendation prompt concatenated all 12 prior analyses,
creating a sequence too long for an 80GB GPU with the 30B model.
- Per-tool generation: 300 → 150 tokens
- Final prompt context: truncated to last 4000 chars
- Final generation: 800 → 500 tokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
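The context cap described above can be sketched as follows; the helper name and constant are illustrative assumptions, not this PR's actual code:

```python
# Illustrative sketch of the context cap; names are assumptions,
# not the code in this PR.
MAX_CONTEXT_CHARS = 4000  # final prompt keeps only this much tail

def build_final_context(analyses: list[str]) -> str:
    context = "\n\n".join(analyses)
    # Keep only the last 4000 characters so the final recommendation
    # prompt still fits the 30B model on a single 80GB GPU.
    return context[-MAX_CONTEXT_CHARS:]
```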
The naive Mamba path (no causal-conv1d on aarch64) materializes huge
intermediate tensors, and the long Phase 4 explanation prompts OOM on
a single 80GB GPU. Phase 4 is now skipped by default; opt in with --explain.

Also restores per-tool generation to 300 tokens and adds
torch.cuda.empty_cache() between phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows using a Docker vLLM server (or any OpenAI-compatible endpoint)
instead of in-process vLLM, bypassing torch/vllm ABI compatibility issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
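Talking to such a server can be sketched with the standard OpenAI-compatible /v1/chat/completions route; the base URL and model name below are placeholders, not values from this PR:

```python
import json
from urllib.request import Request

# Hedged sketch: build a request for any OpenAI-compatible
# /v1/chat/completions endpoint (e.g. a Docker vLLM server).
# The base URL and model name passed in are placeholders.
def build_chat_request(base_url: str, model: str, prompt: str) -> Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```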
Non-MoE models (like Gemma 4) fail when this flag is set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hanneshapke hanneshapke marked this pull request as draft April 3, 2026 21:16
hanneshapke and others added 18 commits April 6, 2026 09:48
Filters out contrastive pairs where anchor_tool == contrast_tool,
as these provide no decision signal for the SAE.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
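A minimal sketch of the filter, assuming the pipeline's pair records carry `anchor_tool` / `contrast_tool` keys (an assumption from the message above, not verified against the code):

```python
# Sketch of the degenerate-pair filter; field names are assumed from
# the commit message, not taken from the actual pipeline code.
def filter_degenerate_pairs(pairs: list[dict]) -> list[dict]:
    # Pairs naming the same tool on both sides carry no decision
    # signal for the SAE, so drop them.
    return [p for p in pairs if p["anchor_tool"] != p["contrast_tool"]]
```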
- Source vllm from nightly cu129 wheel index
- Switch pytorch indexes from cu128 to cu129
- Pin transformers==5.5.0 with uv override for gemma4 support
- Add unsafe-best-match index strategy for cross-index resolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LLM sometimes returns a bare int/string instead of a JSON object.
Now catches ValueError and AttributeError alongside JSONDecodeError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
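The hardened parse can be sketched like this; the helper name is illustrative, and the real code in this PR may differ:

```python
import json

# Sketch of the hardened parse: the LLM sometimes returns a bare
# int or string instead of a JSON object, so catch ValueError and
# AttributeError alongside JSONDecodeError and fall back to empty.
def parse_tool_json(raw: str) -> dict:
    try:
        obj = json.loads(raw)
        return dict(obj.items())  # AttributeError if not a mapping
    except (json.JSONDecodeError, ValueError, AttributeError):
        return {}
```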
Allows running the demo with a locally trained SAE checkpoint
instead of downloading from HuggingFace Hub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Load contrastive_features.json from pipeline output to annotate each
active SAE feature with its associated training themes (diy_vs_professional,
urgent_vs_planned, etc.) and compute per-step theme activation scores.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Split CrewAI into 4 single-tool agents + direct HFEngine synthesis
  to avoid Nemotron struggling with ReAct multi-step loops
- Custom prompt templates for tool agents, bypassing CrewAI's default
  ReAct scaffolding that confused the local model
- Fix step label detection: match tool names case-insensitively across
  all message content instead of fragile keyword matching
- Use completion-style synthesis prompt ("Complete this sentence:")
  to prevent model from meta-reasoning about format
- Only extract layer 20 activations (the SAE layer), not all 5 layers
- Default --sae-local-dir to output_merged/ for local SAE checkpoint
- Fix contrastive features path lookup for output_merged layout
- UI: rename "What the AI Noticed" to "What was the model thinking"
- UI: move explanation sentence above feature bars
- UI: sort features by activation strength descending
- UI: new index.html design (old saved as index_old.html)
- Generate tool-specific explanations tied to contrastive themes
- Pin vllm to v0.18.0 wheel, cap requires-python <3.14

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The model was echoing CrewAI scaffolding instead of generating a
recommendation. Reworded both synthesis prompts (CrewAI and scripted
flows) to use positive instructions, label research data as "Key
findings" so the model treats it as reference material, and remove
negation-based instructions that backfire on smaller models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The model was echoing CrewAI ReAct patterns (Thought/Action/Observation)
instead of generating recommendations because the research context
passed to the synthesis prompt contained those scaffolding lines.
Added _strip_scaffolding() to remove them from both the CrewAI and
scripted flows before building the final synthesis prompt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous approach stripped ReAct line patterns but left behind
CrewAI task descriptions, tool instructions, and expected_output text
embedded in the task output strings. The model was still picking up
on these as instructions. Now _strip_scaffolding() extracts only the
text after "Final Answer:" from each task output, with a fallback to
line-stripping if no Final Answer is found. Also applies stripping
per-task before joining, so each answer is isolated cleanly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
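The extraction-with-fallback described above can be sketched as follows; the real `_strip_scaffolding()` in this PR may differ in detail:

```python
# Hedged sketch of "Final Answer:" extraction with a line-stripping
# fallback; marker and prefixes follow CrewAI's ReAct conventions.
def strip_scaffolding(task_output: str) -> str:
    marker = "Final Answer:"
    if marker in task_output:
        # Keep only the text after the last Final Answer marker,
        # discarding task descriptions and tool instructions.
        return task_output.rsplit(marker, 1)[1].strip()
    # Fallback: drop lines that look like ReAct scaffolding.
    drop = ("Thought:", "Action:", "Observation:")
    lines = [line for line in task_output.splitlines()
             if not line.strip().startswith(drop)]
    return "\n".join(lines).strip()
```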
The model was echoing ReAct scaffolding because the research context
passed to synthesis contained CrewAI task output (full of Thought/Action
patterns and meta-instructions). Now both the CrewAI and scripted flows
build synthesis context directly from the raw tool data dictionaries
(_TOOL_SOURCES), which contain only clean JSON. This completely
eliminates the source of scaffolding contamination.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Nemotron starts a <think> block and meta-reasons about the instructions
instead of writing the recommendation. Fix: append "The agent recommends"
after the chat template so it appears as the start of the assistant's
response. The model must continue the sentence with actual content
instead of reasoning about what to do. Also simplified system prompt
and moved the task instruction into the user message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
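The pre-fill trick can be sketched as below: append an assistant-side prefix after the rendered chat template so the model must continue the sentence instead of opening a `<think>` block. The function name is illustrative:

```python
# Illustrative sketch of the pre-fill described above. The input is
# assumed to be the string output of something like
# tokenizer.apply_chat_template(messages, add_generation_prompt=True,
# tokenize=False); the helper name is not from this PR.
PREFILL = "The agent recommends"

def build_synthesis_prompt(templated_chat: str) -> str:
    # The prefix appears as the start of the assistant's response,
    # forcing the model to continue it with actual content.
    return templated_chat + PREFILL
```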
Nemotron emits <think>...</think> blocks even with pre-filled assistant
response. Added _clean_model_output() to strip both closed and unclosed
think tags. Also bumped synthesis max_tokens from 120 to 256 so the
recommendation doesn't get cut off mid-sentence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
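The cleanup can be sketched as follows; the real `_clean_model_output()` may differ, but the idea is to handle closed blocks, an unclosed trailing `<think>`, and a stray closing tag:

```python
import re

# Hedged sketch of think-tag stripping for Nemotron-style output.
def clean_model_output(text: str) -> str:
    # Remove closed <think>...</think> blocks.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Remove an unclosed <think> and everything after it.
    text = re.sub(r"<think>.*", "", text, flags=re.DOTALL)
    # Drop a stray closing tag leaked past a pre-filled response.
    text = text.replace("</think>", "")
    return text.strip()
```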
The pre-fill bypasses the opening <think> tag but the model still emits
a closing </think>, which leaked into the recommendation text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>