- Switch subject model from Gemma-3-4B to Nemotron-3-Nano-30B
- Load SAE from HuggingFace Hub (davidnet/kiji-inspector-...) via SAE.from_pretrained(repo_id=...) instead of local file paths
- Add build_ui_data() to transform SAE analysis into UI-ready JSON
- Add index.html: interactive explainer showing tool decisions, SAE feature bars, comparison chart, and contrast theme cards
- HTML loads output/ui_data.json (live data) with mock fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
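A minimal sketch of what the Hub-loading change might look like; the module path and exact signature are taken from this commit message, not verified against the repo, and the elided repo id stays a placeholder:

```python
# Hedged sketch: "sae" module path and from_pretrained signature are
# assumptions based on this commit message.
from sae import SAE  # assumption: the project's SAE class

# before: sae = SAE.load("output/sae.pt")   (local file path)
sae = SAE.from_pretrained(repo_id="davidnet/kiji-inspector-...")  # placeholder id
```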
The final recommendation prompt concatenated all 12 prior analyses, creating a sequence too long for an 80GB GPU with the 30B model.

- Per-tool generation: 300 → 150 tokens
- Final prompt context: truncated to last 4000 chars
- Final generation: 800 → 500 tokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
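A hedged sketch of those limits; identifiers here are illustrative, not the project's actual names:

```python
# Illustrative sketch of the truncation and token caps described above.
MAX_FINAL_CONTEXT_CHARS = 4000

prior_analyses = [f"analysis {i}: ..." for i in range(12)]  # placeholder content

per_tool_max_tokens = 150   # was 300
final_max_tokens = 500      # was 800

context = "\n\n".join(prior_analyses)
context = context[-MAX_FINAL_CONTEXT_CHARS:]  # keep only the last 4000 chars
final_prompt = context + "\n\nWrite the final recommendation."
```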
The naive Mamba path (no causal-conv1d on aarch64) materializes huge intermediate tensors, and the Phase 4 explanation prompts are long enough to OOM a single 80GB GPU. Phase 4 is now skipped by default; opt in with --explain. Also restores per-tool generation to 300 tokens and adds torch.cuda.empty_cache() between phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
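Roughly what the opt-in flag and inter-phase cache clearing might look like; apart from torch.cuda.empty_cache(), the names here are assumptions:

```python
import argparse
import torch

def run_phase(name: str) -> None:
    """Placeholder for one pipeline phase (names are illustrative)."""
    print(f"running {name}")

parser = argparse.ArgumentParser()
parser.add_argument("--explain", action="store_true",
                    help="opt in to the memory-hungry Phase 4 explanations")
args = parser.parse_args()

for name in ("phase-1", "phase-2", "phase-3"):
    run_phase(name)
    torch.cuda.empty_cache()  # release cached CUDA blocks between phases

if args.explain:  # Phase 4 OOMs on a single 80GB GPU, so it is opt-in
    run_phase("phase-4-explanations")
```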
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows using a Docker vLLM server (or any OpenAI-compatible endpoint) instead of in-process vLLM, bypassing torch/vllm ABI compatibility issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
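From the client side this might look like the sketch below. Pointing the standard OpenAI client at vLLM's OpenAI-compatible server is a documented pattern; the URL and model id are assumptions:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM ignores the key by default
)

resp = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-30B",   # assumption: whatever the server loaded
    messages=[{"role": "user", "content": "Which tool should the agent call?"}],
    max_tokens=150,
)
print(resp.choices[0].message.content)
```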
Non-MoE models (like Gemma 4) fail when this flag is set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Filters out contrastive pairs where anchor_tool == contrast_tool, as these provide no decision signal for the SAE. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
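A minimal sketch of that filter, using the field names from this message and an otherwise illustrative pair layout:

```python
pairs = [
    {"anchor_tool": "search", "contrast_tool": "calculator", "text": "..."},
    {"anchor_tool": "search", "contrast_tool": "search", "text": "..."},  # no signal
]

# Drop degenerate pairs: same tool on both sides gives the SAE nothing to contrast.
pairs = [p for p in pairs if p["anchor_tool"] != p["contrast_tool"]]
```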
- Source vllm from nightly cu129 wheel index
- Switch pytorch indexes from cu128 to cu129
- Pin transformers==5.5.0 with uv override for gemma4 support
- Add unsafe-best-match index strategy for cross-index resolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
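Roughly what the pyproject.toml side of this could look like, using uv's documented index-strategy, override-dependencies, and [[tool.uv.index]] settings; the index URLs are assumptions:

```toml
[tool.uv]
index-strategy = "unsafe-best-match"            # allow resolution across indexes
override-dependencies = ["transformers==5.5.0"] # force the gemma4-capable release

[[tool.uv.index]]
name = "vllm-nightly"
url = "https://wheels.vllm.ai/nightly"          # assumption: nightly cu129 wheels

[[tool.uv.index]]
name = "pytorch-cu129"
url = "https://download.pytorch.org/whl/cu129"  # was .../whl/cu128
```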
The LLM sometimes returns a bare int/string instead of a JSON object. Now catches ValueError and AttributeError alongside JSONDecodeError. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
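A sketch of the hardened parse (helper name illustrative). Note that json.JSONDecodeError already subclasses ValueError, so the broader catch mainly adds the bare-value case:

```python
import json

def parse_tool_json(raw: str) -> dict:
    """Parse an LLM reply that should be a JSON object, tolerating bare values."""
    try:
        obj = json.loads(raw)     # may raise JSONDecodeError (a ValueError)
        return dict(obj.items())  # bare int/str has no .items() -> AttributeError
    except (json.JSONDecodeError, ValueError, AttributeError):
        return {}                 # fall back to an empty decision
```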
Allows running the demo with a locally trained SAE checkpoint instead of downloading from HuggingFace Hub. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
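A sketch of the dispatch, reusing the SAE class from the earlier sketch; the flag name matches a later commit (--sae-local-dir), and the from_local call is an assumption:

```python
import argparse
from sae import SAE  # assumption, as in the earlier sketch

parser = argparse.ArgumentParser()
parser.add_argument("--sae-local-dir", default=None,
                    help="load the SAE from this directory instead of the Hub")
args = parser.parse_args()

if args.sae_local_dir:
    sae = SAE.from_local(args.sae_local_dir)  # assumption: local-checkpoint loader
else:
    sae = SAE.from_pretrained(repo_id="davidnet/kiji-inspector-...")  # placeholder id
```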
Load contrastive_features.json from pipeline output to annotate each active SAE feature with its associated training themes (diy_vs_professional, urgent_vs_planned, etc.) and compute per-step theme activation scores. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
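A hedged sketch of the theme scoring; the JSON layout beyond the fields named above is an assumption:

```python
import json
from collections import defaultdict

with open("output/contrastive_features.json") as f:
    feature_themes = json.load(f)  # assumed shape: {"1234": ["diy_vs_professional"], ...}

def theme_scores(active_features: dict[int, float]) -> dict[str, float]:
    """Sum activation mass per training theme for one agent step."""
    scores: dict[str, float] = defaultdict(float)
    for feat_id, activation in active_features.items():
        for theme in feature_themes.get(str(feat_id), []):
            scores[theme] += activation
    return dict(scores)
```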
- Split CrewAI into 4 single-tool agents + direct HFEngine synthesis
to avoid Nemotron struggling with ReAct multi-step loops
- Custom prompt templates for tool agents, bypassing CrewAI's default
ReAct scaffolding that confused the local model
- Fix step label detection: match tool names case-insensitively across
all message content instead of fragile keyword matching (see the sketch
after this commit message)
- Use completion-style synthesis prompt ("Complete this sentence:")
to prevent model from meta-reasoning about format
- Only extract layer 20 activations (the SAE layer), not all 5 layers
- Default --sae-local-dir to output_merged/ for local SAE checkpoint
- Fix contrastive features path lookup for output_merged layout
- UI: rename "What the AI Noticed" to "What was the model thinking"
- UI: move explanation sentence above feature bars
- UI: sort features by activation strength descending
- UI: new index.html design (old saved as index_old.html)
- Generate tool-specific explanations tied to contrastive themes
- Pin vllm to v0.18.0 wheel, cap requires-python <3.14
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
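A sketch of the case-insensitive step-label matching referenced in the list above; the tool names and message shape are illustrative:

```python
TOOLS = ["weather_lookup", "cost_estimator", "permit_checker", "pro_finder"]  # illustrative

def label_step(messages: list[str]) -> str | None:
    """Return the first tool whose name appears anywhere in the step's messages."""
    blob = " ".join(messages).lower()
    for tool in TOOLS:
        if tool.lower() in blob:
            return tool
    return None  # no tool name found; leave the step unlabeled
```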
The model was echoing CrewAI scaffolding instead of generating a recommendation. Reworded both synthesis prompts (CrewAI and scripted flows) to use positive instructions, label research data as "Key findings" so the model treats it as reference material, and remove negation-based instructions that backfire on smaller models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The model was echoing CrewAI ReAct patterns (Thought/Action/Observation) instead of generating recommendations because the research context passed to the synthesis prompt contained those scaffolding lines. Added _strip_scaffolding() to remove them from both the CrewAI and scripted flows before building the final synthesis prompt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous approach stripped ReAct line patterns but left behind CrewAI task descriptions, tool instructions, and expected_output text embedded in the task output strings. The model was still picking up on these as instructions. Now _strip_scaffolding() extracts only the text after "Final Answer:" from each task output, with a fallback to line-stripping if no Final Answer is found. Also applies stripping per-task before joining, so each answer is isolated cleanly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
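A sketch of what _strip_scaffolding() might look like at this stage, matching the behavior described; the exact regexes are assumptions:

```python
import re

_REACT_LINE = re.compile(r"^\s*(Thought|Action|Action Input|Observation)\s*:", re.I)

def _strip_scaffolding(task_output: str) -> str:
    """Keep only the text after 'Final Answer:'; fall back to line filtering."""
    _, sep, answer = task_output.partition("Final Answer:")
    if sep:
        return answer.strip()
    # Fallback: drop individual ReAct scaffolding lines.
    lines = [l for l in task_output.splitlines() if not _REACT_LINE.match(l)]
    return "\n".join(lines).strip()

# Applied per task before joining, so each answer is isolated cleanly, e.g.:
# context = "\n\n".join(_strip_scaffolding(t.raw) for t in crew_result.tasks_output)
```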
The model was echoing ReAct scaffolding because the research context passed to synthesis contained CrewAI task output (full of Thought/Action patterns and meta-instructions). Now both the CrewAI and scripted flows build synthesis context directly from the raw tool data dictionaries (_TOOL_SOURCES), which contain only clean JSON. This completely eliminates the source of scaffolding contamination. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
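A sketch of the final approach, with illustrative _TOOL_SOURCES contents:

```python
import json

_TOOL_SOURCES = {  # illustrative: clean JSON-able tool results only
    "cost_estimator": {"deck_repair_usd": [1200, 2400]},
    "permit_checker": {"permit_required": False},
}

context = "\n\n".join(
    f"{tool} results:\n{json.dumps(data, indent=2)}"
    for tool, data in _TOOL_SOURCES.items()
)
# No ReAct lines can leak in, because the context never touches agent transcripts.
```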
Nemotron starts a <think> block and meta-reasons about the instructions instead of writing the recommendation. Fix: append "The agent recommends" after the chat template so it appears as the start of the assistant's response. The model must continue the sentence with actual content instead of reasoning about what to do. Also simplified system prompt and moved the task instruction into the user message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
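A sketch of the pre-fill; apply_chat_template is the standard transformers API, while the model id and prompt text are assumptions:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("nvidia/Nemotron-3-Nano-30B")  # assumed model id

messages = [
    {"role": "system", "content": "You are a home-repair triage agent."},  # illustrative
    {"role": "user", "content": "Key findings:\n...\n\nWrite the recommendation."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "The agent recommends"  # pre-fill the assistant turn so generation continues it
```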
Nemotron emits <think>...</think> blocks even with pre-filled assistant response. Added _clean_model_output() to strip both closed and unclosed think tags. Also bumped synthesis max_tokens from 120 to 256 so the recommendation doesn't get cut off mid-sentence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pre-fill bypasses the opening <think> tag but the model still emits a closing </think>, which leaked into the recommendation text. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
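Taken together, the cleanup from these last two commits might look like this sketch; the regexes are assumptions matching the described behavior:

```python
import re

def _clean_model_output(text: str) -> str:
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.S)  # closed think blocks
    text = re.sub(r"<think>.*", "", text, flags=re.S)           # unclosed trailing block
    text = text.replace("</think>", "")                          # stray leaked closer
    return text.strip()
```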
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>