(dev -> main) App updates and backwards transformer ci functions #355

danbraunai-goodfire wants to merge 602 commits into main from dev
Conversation
@claude can you review this PR? Note that your review should cover the scope of this PR only. If you spot things unrelated to this PR, feel free to bring them up and we'll consider them for new issues. I want you to think about possible ways that the implementation could be improved. Including ways it could be re-written completely using different abstractions. I want critical comments, do feel comfortable to give me the hard truths.
Claude finished @danbraunai-goodfire's task

**PR Review: App updates and backwards transformer ci functions**

Review scope: 288 commits, 100 files changed, focusing on app updates and transformer CI functions.
**Critical Review & Hard Truths**

**🔥 Major Architectural Problems**

1. Backend Compute Layer is a God Object
2. Database Schema Fighting the Type System
3. Frontend State Management is Chaotic
**🚨 Code Quality Issues**

4. Layer Aliasing Global State

```ts
let cachedArchitecture: Architecture | null = null;
```
5. Bulk API Endpoint Design
6. Configuration Explosion
**💡 Missed Opportunities**

7. TypeScript Discriminated Unions Done Halfway

```ts
// Current: can still represent invalid states
type OptimizeConfigDraft = {
  loss: LossConfigDraft; // Could be invalid
  // ...
};

// Better: make invalid states unrepresentable
type OptimizeState =
  | { status: "draft"; config: Partial<OptimizeConfig> }
  | { status: "valid"; config: ValidOptimizeConfig }
  | { status: "computing"; config: ValidOptimizeConfig };
```

8. Database Caching Strategy
**🎯 Specific Technical Issues**

9. Performance Anti-patterns:
10. Error Handling Inconsistency:
**🔨 Complete Rewrite Suggestions**

Alternative 1: Event-Driven Architecture

```python
# Commands
@dataclass
class ComputeAttributionsCommand:
    prompt_id: int
    optimization_params: OptimParams | None

# Events
@dataclass
class AttributionsComputed:
    prompt_id: int
    graph_data: GraphData
```

Alternative 2: Functional Core + Imperative Shell

```python
def compute_attributions(model: Model, tokens: Tensor, config: Config) -> Graph:
    # Pure function - no side effects, easy to test
    pass

class AttributionService:
    # Imperative shell - handles caching, DB, progress callbacks
    pass
```

**🚧 Breaking Changes Needed**

The current architecture is trying to be too general. Consider:
**Bottom Line:** This codebase is evolving from research prototype to production-like complexity without the discipline. Either embrace the complexity with proper abstractions, or ruthlessly simplify back to research-code simplicity. The current middle ground is the worst of both worlds.
…n_contexts, correlations)
This reverts commit ef0e1e3.
…scribing

- Rewrite graphLayout.ts: 182 -> 95 lines, no ModelInfo dependency
- Remove ModelInfo type and getModelInfo() API call
- Remove modelInfo state from useRun.svelte.ts
- Remove displayNames/getLayerDisplayName from all components
- Node keys are now canonical addresses (e.g. '0.attn.q:3:5')
- Layout derived directly from address structure
…Info endpoint, fix canonical_str
…output node cap

- Cap output nodes to 15 per position (compute + display) to keep edge count tractable with large vocabularies (50k vs 4k)
- Handle missing harvest data gracefully: interpretations, activation contexts, component data bulk all return empty/null instead of 500
- Fix fetchJson to handle non-JSON error responses (raw tracebacks)
- Frontend: ActivationContextsTab shows helpful message when no harvest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onent_data

topology.py:
- Regex-based CanonicalWeight.parse() instead of manual string splitting
- Pull parse_target_path/render_canonical_weight into PathSchema base (5 copies → 1)
- Dict lookups in sublayer schemas instead of if/elif chains
- @override annotations — 0 basedpyright warnings

component_data.py:
- Replace try/except (AssertionError, FileNotFoundError) with explicit harvest.has_correlations() / has_token_stats() checks
- Add has_correlations() and has_token_stats() to HarvestCache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instruments: CI forward pass, CI-masked forward, gradient forward, alive info (with component counts), per-target edge computation, node extraction, build_out_probs, save_graph, process_edges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yload

- AppTokenizer: escape control chars for display (tab→⇥, newline→↵)
- Harvester: strip padding sentinels at write time, not in router
- Remove all padding handling from activation_contexts router
- Reduce bulk prefetch limits (100→10 examples, 20→10 correlations/stats)
- Fix TokenPillList duplicate key error (key by index, not token string)
- Truncate prompt previews in char space (60 chars) not token space

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
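The display-escaping behavior described above (tab→⇥, newline→↵) can be sketched as follows. `escape_for_display` and the mapping table are hypothetical names for illustration, not the actual AppTokenizer API:

```python
# Illustrative sketch of control-character escaping for token display.
# Function and table names are hypothetical, not the real AppTokenizer API.
_DISPLAY_ESCAPES = {"\t": "\u21e5", "\n": "\u21b5"}  # tab -> ⇥, newline -> ↵

def escape_for_display(token_text: str) -> str:
    """Replace control characters with visible glyphs for UI rendering."""
    return "".join(_DISPLAY_ESCAPES.get(ch, ch) for ch in token_text)
```

Doing this in the tokenizer keeps every consumer (pills, previews, tooltips) consistent instead of re-escaping per component.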
"wte" is a concrete module name (GPT-2's word token embedding), not a good canonical name. "embed" is model-agnostic. Concrete paths in PathSchema subclasses (embedding_path = "wte") are unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Thread RateLimiter through scoring pipelines (intruder, detection, fuzzing)
- Move MAX_REQUESTS_PER_MINUTE to llm_api module (single source of truth)
- Minor app cleanup (unused import, runs endpoint/API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Handles old harvest data that still has -1 padding sentinels on disk. The HF tokenizer overflows on -1 token IDs, so strip at the type boundary to protect all consumers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
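Stripping at the type boundary might look like the sketch below; the function name is hypothetical, and plain lists stand in for the token-ID tensors the real code handles:

```python
# Hypothetical sketch of sentinel stripping at the type boundary, per the
# commit above: drop -1 padding IDs before they can reach the HF tokenizer,
# which overflows on negative token IDs. Lists stand in for tensors.
def strip_padding_sentinels(token_ids: list[int], sentinel: int = -1) -> list[int]:
    """Remove padding sentinels so downstream consumers never see them."""
    return [t for t in token_ids if t != sentinel]
```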
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace HarvestCache (singleton with in-memory caching) with per-category repos that read through on every call. No restart required when files are written while the app is running.

- HarvestRepo: activation contexts, correlations, token stats
- InterpRepo: interpretations, eval scores (intruder/detection/fuzzing)
- AttributionRepo: dataset attribution matrix
- All routers migrated to use repos via loaded.harvest/interp/attributions
- Tensor data (.pt) stays as-is, component data stays as JSONL (for now)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the hand-rolled byte-offset JSONL index with a proper SQLite database (harvest.db). Component data stored as JSON blobs per row, config as key-value pairs. WAL mode for concurrent reads.

- New: spd/harvest/db.py (HarvestDB class, 151 lines)
- Removed: ~165 lines of byte-offset index + mmap bulk loader
- HarvestResult.save() writes to SQLite instead of JSONL + summary.json
- HarvestRepo + loaders.py rewritten to use HarvestDB
- Correlations/token_stats stay as .pt files (dense tensors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix RateLimiter lock-during-sleep bug that serialized all coroutines
- Bundle OpenRouter + RateLimiter + CostTracker into LLMClient with single chat() method that handles budget checks, rate limiting, retries, and cost tracking
- Rate limit per API call (not per task) — fixes unthrottled bursts from multi-trial scorers
- Remove pipeline.py abstraction — each caller (interpret, intruder, detection, fuzzing) owns its full pipeline flow inline
- Add BudgetExceededError for clean budget enforcement
- Add LLMClientConfig for client construction params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
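The lock-during-sleep bug is worth spelling out: if the limiter awaits its sleep while still holding the lock, every other coroutine queues behind the sleeper, serializing all requests. A minimal corrected sketch (illustrative, not the actual RateLimiter class):

```python
import asyncio
import time

class RateLimiter:
    """Sketch of the fix described above: hold the lock only long enough to
    reserve a time slot, then sleep with the lock released so other
    coroutines can reserve their own slots concurrently. Illustrative only."""

    def __init__(self, max_per_minute: int) -> None:
        self.interval = 60.0 / max_per_minute
        self.next_slot = 0.0
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self.lock:  # brief critical section: reserve a slot
            now = time.monotonic()
            slot = max(now, self.next_slot)
            self.next_slot = slot + self.interval
        await asyncio.sleep(max(0.0, slot - now))  # sleep outside the lock
```

With the buggy version, N concurrent callers take roughly N full sleeps end to end; with the fix, their sleeps overlap and throughput matches the configured rate.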
Replace JSONL files + complex directory scanning with a single SQLite database (interp.db) per run. Interpretations and all eval scores (intruder, detection, fuzzing) stored in one place.

- New: spd/autointerp/db.py (InterpDB class)
- interpret.py: writes to SQLite, resume via db.get_completed_keys()
- scoring scripts: write scores to SQLite instead of timestamped JSONL
- InterpRepo: backed by InterpDB with lazy init + save_interpretation()
- loaders.py: simplified to thin wrappers over InterpDB
- Migration script: scripts/migrate_autointerp_to_sqlite.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
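The resume-via-completed-keys pattern mentioned above can be sketched as follows; the table schema, key strings, and function names are hypothetical, not the real InterpDB:

```python
import sqlite3

# Illustrative resume pattern: query which component keys already have
# interpretations, then only process the remainder on re-run.
# Schema and names are assumptions, not the actual InterpDB API.
def get_completed_keys(conn: sqlite3.Connection) -> set[str]:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interpretations (key TEXT PRIMARY KEY, label TEXT)"
    )
    return {row[0] for row in conn.execute("SELECT key FROM interpretations")}

def pending(all_keys: list[str], conn: sqlite3.Connection) -> list[str]:
    """Keys still needing interpretation, preserving input order."""
    done = get_completed_keys(conn)
    return [k for k in all_keys if k not in done]
```

This makes a crashed or budget-capped run restartable with a single query, instead of re-scanning timestamped JSONL files to reconstruct progress.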
- Handle missing 'prompt' field in old InterpretationResult
- Handle missing 'component_acts' field in old ActivationExample
- Handle extra fields in old HarvestConfig / ComponentData
- Gracefully skip unparseable configs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All scripts use HarvestRepo/InterpRepo instead of loader functions
- run_interpret takes HarvestRepo instead of (run_id, correlations_dir, ci_threshold)
- correlations router uses loaded.harvest.get_component() directly
- dataset_attributions harvest uses HarvestRepo.get_summary()
- harvest/loaders.py deleted (zero remaining consumers)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables CI threshold slider to re-filter output nodes without recomputing. out_probs are now computed from logits at display time via filter_graph_for_display().

- StoredGraph: out_probs dict → ci_masked_out_logits + target_out_logits tensors
- DB: output_probs_data TEXT → output_logits BLOB (torch.save binary)
- Consolidate _add_pseudo_layer_nodes + process_edges + build_out_probs into single filter_graph_for_display() → FilteredGraph
- Remove output_prob_threshold from optimized graph endpoint

Breaking: delete .data/app/prompt_attr.db

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
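Deriving display probabilities from stored logits might look like the sketch below. The function name, threshold parameter, and the way the per-position cap is wired in are illustrative assumptions (the real code operates on torch tensors inside filter_graph_for_display()):

```python
import math

# Illustrative display-time filter: out_probs are derived from stored logits
# on each request, so a slider can re-filter without recomputing the graph.
# Names and the cap wiring are assumptions; the real code uses torch tensors.
def filter_output_nodes(
    logits: list[float], prob_threshold: float, cap: int = 15
) -> list[tuple[int, float]]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    z = sum(exps)
    probs = [(i, e / z) for i, e in enumerate(exps)]
    kept = [(i, p) for i, p in probs if p >= prob_threshold]
    kept.sort(key=lambda ip: ip[1], reverse=True)
    return kept[:cap]  # cap kept output nodes per position
```

Storing logits rather than probabilities is the key design choice: the softmax and threshold become cheap per-request operations, so only the expensive attribution pass is ever cached.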
- Handle None returns from get_correlations/get_token_stats in routers
- Fix activation_contexts: handle None from get_component
- Update test_server_api.py: HarvestCache → repos
- Remove unused autointerp_run_id CLI param
- Remove unused t_start variable in compute.py

0 errors, 0 warnings from basedpyright + ruff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… limit probing

Adds XML/single-line rendering variants for rich_examples, compact_skeptical, and dual_view strategies. Includes scripts for generating the prompt strategy gallery, sweep results dashboard, threshold sweep preparation, and provider rate limit probing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nvention, dataset context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Context tokens per side: 20
- EOT token: <|endoftext|>
- Seq len: 512
- Model params: ~42M
- Act values: "roughly in (-1, 1)" not "(0, 1)"
- Rephrase TODO sentence about high act / low CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix firing threshold: use actual harvest threshold, not hypothetical
- Clarify output PMI: explicitly about next-token predictions
- Remove "Pythia fashion" jargon, describe format directly
- Add "read direction" / "write direction" terminology for V and U

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New strategy with full SPD method explanation, CI-vs-act guidance, sign convention clarity, dynamic CI threshold from harvest config, and fixed XML rendering. Submitted for evaluation against the earlier 9-variant × 3-threshold sweep on Jose's 200-component subset. Also adds position distribution analysis script, Gemini prompt probe script, and the component subset file used for evaluation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop scratch utilities (position analysis, prompt probing, rate limit testing, sweep dashboard, strategy gallery, sample keys, intruder comparison), planning docs, and the subsets helper — all kept on disk but not part of the canon strategy PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…riptive framing

- Add input token stats (recall + PMI) and output recall to canon prompt
- Restructure: raw frequencies first, then PMI as "same data normalized by base rate"
- Drop precision metrics (saturated on sparse components)
- Remove input/output ontology priming and "activation patterns" framing
- Less prescriptive task instruction: lead with most salient aspect
- Fix V/U description accuracy, soften CI vs act language
- Pass input_token_stats through dispatch and interpret for canon strategy
- Restore accidentally deleted subsets.py (fixes type errors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…neighbors

- Canon-style XML examples with CI/act annotations (no shifted examples)
- Recall-first + PMI token stats (output-only for output pass, input-only for input pass, both for unification)
- Compact SPD preamble (no inaccurate V/U claims, just dimensions)
- Short layer descriptions (attention key proj, layer 1) instead of verbose "in the 2nd of 4 blocks"
- Architecture info factored into component header once
- Related components: label-first display, positive/negative split with explanations
- New `summary_for_neighbors` field: LLM writes a 1-2 sentence summary specifically for downstream/upstream components to read during their labeling
- Fix canon prompt V/U description to not claim what directions "do"
- Fix "what causes this component to fire" → neutral wording

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…neighbors

Unification pass is terminal — no downstream consumers need its summary. Output/input passes use LABEL_SCHEMA (with summary_for_neighbors), unification uses UNIFIED_LABEL_SCHEMA (label + reasoning only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Single CANON_RENDERING constant in config.py (was duplicated in 3 places)
- Shared token_stats_section helper in prompt_helpers.py (was inline in canon.py + graph_interp)
- Delete dead build_separated_examples
- Remove activation_threshold default in dispatch (caller always provides)
- Remove component_keys default in resolve_target_component_keys
- Cache pruned Gemini schema on first call instead of deepcopy per request
- Narrow _get_output_stats/_get_input_stats to return TokenPRLift (not | None)
- direction param typed as Literal["Input", "Output"]
- Net -71 lines

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
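The "cache the pruned schema on first call" change can be illustrated with `functools.cache`; the schema contents, field names, and function name here are hypothetical stand-ins for the real Gemini schema handling:

```python
import copy
import functools

# Hypothetical base schema; the real one comes from the canon strategy code.
BASE_SCHEMA = {"type": "object", "properties": {"label": {}, "reasoning": {}}}

@functools.cache
def pruned_gemini_schema() -> dict:
    """Prune the schema once and reuse the result, instead of
    deepcopy-and-prune on every request. Field names are illustrative."""
    schema = copy.deepcopy(BASE_SCHEMA)  # never mutate the shared base
    schema["properties"].pop("reasoning", None)  # drop fields the provider rejects
    return schema
```

The one caveat of caching a mutable dict is that callers must treat the returned schema as read-only, since every call shares the same object.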
Old harvest DBs without n_activation_examples column will need re-harvesting. No backwards-compat shims per project principles. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
Related Issue
Motivation and Context
How Has This Been Tested?
Does this PR introduce a breaking change?