(dev -> main) App updates and backwards transformer ci functions#355

Draft
danbraunai-goodfire wants to merge 602 commits into main from dev

Conversation

@danbraunai-goodfire
Collaborator

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Does this PR introduce a breaking change?

@danbraunai-goodfire
Collaborator Author

@claude can you review this PR? Note that your review should cover the scope of this PR only. If you spot things unrelated to this PR, feel free to bring them up and we'll consider them for new issues. I want you to think about possible ways the implementation could be improved, including ways it could be rewritten completely using different abstractions. I want critical comments; feel comfortable giving me the hard truths.

@claude
Contributor

claude bot commented Feb 4, 2026

Claude finished @danbraunai-goodfire's task


PR Review: App updates and backwards transformer ci functions

Review Scope: 288 commits, 100 files changed, focusing on app updates and transformer CI functions

Todo List

  • Activate virtual environment and setup
  • Analyze major architectural changes in the app
  • Review new abstractions and components
  • Examine backend API changes
  • Review frontend component refactoring
  • Analyze config changes across experiments
  • Provide critical feedback on implementation approach
  • Suggest alternative architectures and improvements

Critical Review & Hard Truths

🔥 Major Architectural Problems

1. Backend Compute Layer is a God Object (spd/app/backend/compute.py:1-827)

  • Problem: 827 lines doing gradient computation, edge calculation, intervention logic, AND result formatting
  • Anti-pattern: Multiple responsibilities violating SRP
  • Rewrite suggestion:
    # Split into focused modules:
    spd/app/backend/compute/
      ├── gradient_engine.py     # Core autograd operations
      ├── attribution_builder.py # Edge construction logic  
      ├── intervention_runner.py # Forward passes with masks
      └── result_formatters.py   # Output transformation

2. Database Schema Fighting the Type System (spd/app/backend/database.py)

  • Problem: StoredGraph uses discriminated unions in Python, but database still has nullable columns for all possible fields
  • Result: Invalid states are representable at DB level despite type safety in Python
  • Solution: Either use proper table inheritance OR embrace the union and validate constraints in application code; half-implementing both approaches gets the worst of each

3. Frontend State Management is Chaotic (PromptAttributionsTab.svelte:565+ lines)

  • Problem: No state machine - implicit transitions everywhere
  • Examples:
    • TabViewState + GraphComputeState + ComposerState can be in invalid combinations
    • UI can be in "loading" state while also in "draft" mode
  • Rewrite suggestion: Use XState or explicit state machine pattern. Current approach doesn't scale.

🚨 Code Quality Issues

4. Layer Aliasing Global State (spd/app/frontend/src/lib/layerAliasing.ts:57)

let cachedArchitecture: Architecture | null = null;
  • Problem: Global mutable cache that can become stale, not reset between contexts
  • Solution: Pass architecture context down explicitly or use a proper store

5. Bulk API Endpoint Design (spd/app/backend/routers/component_data.py)

  • Good: Eliminated GIL contention by batching requests
  • Bad: Still sequential execution inside the handler - missed opportunity for true parallelism
  • Alternative: Use asyncio.gather() to run the three data fetches concurrently
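A minimal sketch of the gather() suggestion. The fetch_* functions here are illustrative stand-ins for the handler's three data fetches, not the actual functions in component_data.py:

```python
import asyncio

# Hypothetical stand-ins for the three independent data fetches;
# asyncio.sleep(0) substitutes for real I/O.
async def fetch_interpretations(ids: list[int]) -> dict[int, str]:
    await asyncio.sleep(0)
    return {i: f"interp-{i}" for i in ids}

async def fetch_activation_contexts(ids: list[int]) -> dict[int, list]:
    await asyncio.sleep(0)
    return {i: [] for i in ids}

async def fetch_token_stats(ids: list[int]) -> dict[int, dict]:
    await asyncio.sleep(0)
    return {i: {} for i in ids}

async def bulk_component_data(ids: list[int]) -> dict:
    # Run the three fetches concurrently instead of sequentially.
    interps, contexts, stats = await asyncio.gather(
        fetch_interpretations(ids),
        fetch_activation_contexts(ids),
        fetch_token_stats(ids),
    )
    return {"interpretations": interps, "contexts": contexts, "stats": stats}

result = asyncio.run(bulk_component_data([1, 2]))
```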

6. Configuration Explosion (spd/configs.py)

  • Added complex global_reverse_residual CI config with 7+ required fields and complex validation
  • Problem: Config complexity is growing faster than the value it provides
  • Question: Why not use dataclasses with defaults instead of elaborate Pydantic validation?
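To make the dataclass alternative concrete, here is a hedged sketch; the field names are invented for illustration and are not the actual global_reverse_residual config fields:

```python
from dataclasses import dataclass, field

# Hypothetical config shape: defaults cover the common case, and the only
# validation is a short, local check rather than elaborate model validators.
@dataclass(frozen=True)
class ReverseResidualCIConfig:
    n_layers: int
    ci_threshold: float = 0.1
    normalize: bool = True
    layer_names: tuple[str, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        if not 0.0 <= self.ci_threshold <= 1.0:
            raise ValueError(f"ci_threshold out of range: {self.ci_threshold}")

cfg = ReverseResidualCIConfig(n_layers=4)
```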

💡 Missed Opportunities

7. TypeScript Discriminated Unions Done Halfway

// Current: Can still represent invalid states
type OptimizeConfigDraft = {
    loss: LossConfigDraft;  // Could be invalid
    // ...
}

// Better: Make invalid states unrepresentable  
type OptimizeState = 
  | { status: "draft"; config: Partial<OptimizeConfig> }
  | { status: "valid"; config: ValidOptimizeConfig }
  | { status: "computing"; config: ValidOptimizeConfig }

8. Database Caching Strategy

  • Current: Ad-hoc SQLite caching with complex uniqueness constraints
  • Alternative: Consider Redis/in-memory cache for computed graphs + persistent storage for configuration only
  • Benefit: Simpler schema, better performance for large graphs

🎯 Specific Technical Issues

9. Performance Anti-patterns:

  • compute_prompt_attributions() does full gradient computation even for display-only graphs
  • useComponentData and useComponentDataExpectCached are nearly identical - code duplication
  • Frontend doesn't virtualize large lists (could be 1000+ components)

10. Error Handling Inconsistency:

  • Some places use assert for validation (good, fail-fast)
  • Others use try/catch with graceful degradation (bad for research code)
  • Pick one approach: Research code should fail fast, not silently degrade

🔨 Complete Rewrite Suggestions

Alternative 1: Event-Driven Architecture
Replace current request/response with events:

# Commands
@dataclass
class ComputeAttributionsCommand:
    prompt_id: int
    optimization_params: OptimParams | None

# Events  
@dataclass
class AttributionsComputed:
    prompt_id: int
    graph_data: GraphData

Alternative 2: Functional Core + Imperative Shell
Move all computation to pure functions, keep I/O at edges:

def compute_attributions(model: Model, tokens: Tensor, config: Config) -> Graph:
    # Pure function - no side effects, easy to test
    pass

class AttributionService:
    # Imperative shell - handles caching, DB, progress callbacks
    pass

🚧 Breaking Changes Needed

The current architecture is trying to be too general. Consider:

  1. Remove the three graph types - just have "computed graphs" with optional optimization metadata
  2. Eliminate the database - store graphs as JSON files, much simpler
  3. Split the mega-components - PromptAttributionsTab should be 4-5 smaller components

Bottom Line: This codebase is evolving from research prototype to production-like complexity without the discipline. Either embrace the complexity with proper abstractions, or ruthlessly simplify back to research-code simplicity.

The current middle ground is the worst of both worlds.


@danbraunai-goodfire danbraunai-goodfire changed the title App updates and backwards transformer ci functions (dev -> main) App updates and backwards transformer ci functions Feb 6, 2026
claude-spd1 and others added 27 commits February 10, 2026 20:23
…scribing

- Rewrite graphLayout.ts: 182 -> 95 lines, no ModelInfo dependency
- Remove ModelInfo type and getModelInfo() API call
- Remove modelInfo state from useRun.svelte.ts
- Remove displayNames/getLayerDisplayName from all components
- Node keys are now canonical addresses (e.g. '0.attn.q:3:5')
- Layout derived directly from address structure
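The address-driven layout can be sketched as follows (in Python, though the real code is graphLayout.ts); the reading of the segments as `layer_path:component:position` is an assumption about the canonical key format:

```python
from typing import NamedTuple

# Hypothetical decomposition of a canonical node key like '0.attn.q:3:5'.
# The meaning of the last two fields (component index, token position)
# is an assumption, not taken from the actual graphLayout.ts.
class NodeAddress(NamedTuple):
    layer_path: str
    component: int
    position: int

def parse_address(key: str) -> NodeAddress:
    layer_path, component, position = key.split(":")
    return NodeAddress(layer_path, int(component), int(position))

addr = parse_address("0.attn.q:3:5")
```

Because every field the layout needs is recoverable from the key itself, no separate ModelInfo lookup is required.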
…output node cap

- Cap output nodes to 15 per position (compute + display) to keep edge
  count tractable with large vocabularies (50k vs 4k)
- Handle missing harvest data gracefully: interpretations, activation
  contexts, component data bulk all return empty/null instead of 500
- Fix fetchJson to handle non-JSON error responses (raw tracebacks)
- Frontend: ActivationContextsTab shows helpful message when no harvest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onent_data

topology.py:
- Regex-based CanonicalWeight.parse() instead of manual string splitting
- Pull parse_target_path/render_canonical_weight into PathSchema base (5 copies → 1)
- Dict lookups in sublayer schemas instead of if/elif chains
- `@override` annotations — 0 basedpyright warnings

component_data.py:
- Replace try/except(AssertionError, FileNotFoundError) with explicit
  harvest.has_correlations() / has_token_stats() checks
- Add has_correlations() and has_token_stats() to HarvestCache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
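The regex-based parse can be sketched like this; the pattern and field names are assumptions for illustration, not the actual CanonicalWeight format in topology.py:

```python
import re

# Hypothetical 'layer.sublayer' weight path, parsed with a named-group
# regex instead of manual string splitting.
_WEIGHT_RE = re.compile(r"^(?P<layer>\d+)\.(?P<sublayer>[\w.]+)$")

def parse_weight(path: str) -> tuple[int, str]:
    m = _WEIGHT_RE.match(path)
    if m is None:
        # Fail loudly on malformed paths instead of silently mis-splitting.
        raise ValueError(f"unparseable weight path: {path!r}")
    return int(m["layer"]), m["sublayer"]

layer, sublayer = parse_weight("2.attn.q")
```

A single compiled pattern centralizes the format, so a malformed path raises one well-defined error rather than failing differently in each of the former five copies.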
Instruments: CI forward pass, CI-masked forward, gradient forward,
alive info (with component counts), per-target edge computation,
node extraction, build_out_probs, save_graph, process_edges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yload

- AppTokenizer: escape control chars for display (tab→⇥, newline→↵)
- Harvester: strip padding sentinels at write time, not in router
- Remove all padding handling from activation_contexts router
- Reduce bulk prefetch limits (100→10 examples, 20→10 correlations/stats)
- Fix TokenPillList duplicate key error (key by index, not token string)
- Truncate prompt previews in char space (60 chars) not token space

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
"wte" is a concrete module name (GPT-2's word token embedding), not a
good canonical name. "embed" is model-agnostic. Concrete paths in
PathSchema subclasses (embedding_path = "wte") are unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Thread RateLimiter through scoring pipelines (intruder, detection, fuzzing)
- Move MAX_REQUESTS_PER_MINUTE to llm_api module (single source of truth)
- Minor app cleanup (unused import, runs endpoint/API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Handles old harvest data that still has -1 padding sentinels on disk.
The HF tokenizer overflows on -1 token IDs, so strip at the type
boundary to protect all consumers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
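The boundary-stripping idea reduces to something like this; the function name is illustrative, not the actual implementation:

```python
# Strip -1 padding sentinels once, at the type boundary, so no downstream
# consumer (e.g. the HF tokenizer) ever sees them.
def strip_padding(token_ids: list[int], sentinel: int = -1) -> list[int]:
    return [t for t in token_ids if t != sentinel]

clean = strip_padding([15496, 995, -1, -1])
```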
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace HarvestCache (singleton with in-memory caching) with per-category
repos that read through on every call. No restart required when files
are written while app is running.

- HarvestRepo: activation contexts, correlations, token stats
- InterpRepo: interpretations, eval scores (intruder/detection/fuzzing)
- AttributionRepo: dataset attribution matrix
- All routers migrated to use repos via loaded.harvest/interp/attributions
- Tensor data (.pt) stays as-is, component data stays as JSONL (for now)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the hand-rolled byte-offset JSONL index with a proper SQLite
database (harvest.db). Component data stored as JSON blobs per row,
config as key-value pairs. WAL mode for concurrent reads.

- New: spd/harvest/db.py (HarvestDB class, 151 lines)
- Removed: ~165 lines of byte-offset index + mmap bulk loader
- HarvestResult.save() writes to SQLite instead of JSONL + summary.json
- HarvestRepo + loaders.py rewritten to use HarvestDB
- Correlations/token_stats stay as .pt files (dense tensors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
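The described shape (JSON blobs per component row, key-value config, WAL mode) can be sketched with the stdlib sqlite3 module; table and column names here are assumptions, not HarvestDB's actual schema:

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical harvest.db layout: one JSON blob per component, WAL mode
# so readers are not blocked by a concurrent writer.
path = os.path.join(tempfile.mkdtemp(), "harvest.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE components (key TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
conn.execute(
    "INSERT INTO components VALUES (?, ?)",
    ("0.attn.q:3", json.dumps({"max_act": 0.7})),
)
conn.commit()

row = conn.execute(
    "SELECT data FROM components WHERE key = ?", ("0.attn.q:3",)
).fetchone()
component = json.loads(row[0])
```

Compared with a byte-offset JSONL index, lookups become a single indexed SELECT and the file stays consistent under concurrent reads.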
- Fix RateLimiter lock-during-sleep bug that serialized all coroutines
- Bundle OpenRouter + RateLimiter + CostTracker into LLMClient with single chat() method
  that handles budget checks, rate limiting, retries, and cost tracking
- Rate limit per API call (not per task) — fixes unthrottled bursts from multi-trial scorers
- Remove pipeline.py abstraction — each caller (interpret, intruder, detection, fuzzing)
  owns its full pipeline flow inline
- Add BudgetExceededError for clean budget enforcement
- Add LLMClientConfig for client construction params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
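The lock-during-sleep fix can be illustrated as follows; this is a minimal sketch, not the actual RateLimiter. The key move is to reserve a slot while holding the lock, then sleep after releasing it, so waiting coroutines are not serialized behind the sleeper:

```python
import asyncio
import time

# Illustrative limiter: the lock only guards the slot bookkeeping;
# the sleep happens outside it.
class RateLimiter:
    def __init__(self, max_per_minute: int):
        self.min_interval = 60.0 / max_per_minute
        self.next_slot = 0.0
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_slot - now)
            # Reserve this coroutine's slot before releasing the lock.
            self.next_slot = max(now, self.next_slot) + self.min_interval
        if wait > 0:
            await asyncio.sleep(wait)  # lock already released

async def main() -> bool:
    limiter = RateLimiter(max_per_minute=6000)
    # Five concurrent acquires; with the lock held during sleep these
    # would serialize, here they only wait their own reserved interval.
    await asyncio.gather(*(limiter.acquire() for _ in range(5)))
    return True

ok = asyncio.run(main())
```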
Replace JSONL files + complex directory scanning with a single SQLite
database (interp.db) per run. Interpretations and all eval scores
(intruder, detection, fuzzing) stored in one place.

- New: spd/autointerp/db.py (InterpDB class)
- interpret.py: writes to SQLite, resume via db.get_completed_keys()
- scoring scripts: write scores to SQLite instead of timestamped JSONL
- InterpRepo: backed by InterpDB with lazy init + save_interpretation()
- loaders.py: simplified to thin wrappers over InterpDB
- Migration script: scripts/migrate_autointerp_to_sqlite.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Handle missing 'prompt' field in old InterpretationResult
- Handle missing 'component_acts' field in old ActivationExample
- Handle extra fields in old HarvestConfig / ComponentData
- Gracefully skip unparseable configs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All scripts use HarvestRepo/InterpRepo instead of loader functions
- run_interpret takes HarvestRepo instead of (run_id, correlations_dir, ci_threshold)
- correlations router uses loaded.harvest.get_component() directly
- dataset_attributions harvest uses HarvestRepo.get_summary()
- harvest/loaders.py deleted (zero remaining consumers)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables CI threshold slider to re-filter output nodes without recomputing.
out_probs are now computed from logits at display time via filter_graph_for_display().

- StoredGraph: out_probs dict → ci_masked_out_logits + target_out_logits tensors
- DB: output_probs_data TEXT → output_logits BLOB (torch.save binary)
- Consolidate _add_pseudo_layer_nodes + process_edges + build_out_probs
  into single filter_graph_for_display() → FilteredGraph
- Remove output_prob_threshold from optimized graph endpoint

Breaking: delete .data/app/prompt_attr.db

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
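The display-time recomputation amounts to a softmax over the stored logits followed by a threshold filter; this is a sketch of that idea under the assumption that filter_graph_for_display works per position over a logit vector, not its actual code:

```python
import math

# Hypothetical display-time filter: probabilities are derived from stored
# logits on demand, so moving the threshold slider never re-runs the model.
def out_probs_from_logits(logits: list[float], threshold: float) -> dict[int, float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return {i: p for i, p in enumerate(probs) if p >= threshold}

kept = out_probs_from_logits([2.0, 0.0, -1.0], threshold=0.1)
```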
- Handle None returns from get_correlations/get_token_stats in routers
- Fix activation_contexts: handle None from get_component
- Update test_server_api.py: HarvestCache → repos
- Remove unused autointerp_run_id CLI param
- Remove unused t_start variable in compute.py

0 errors, 0 warnings from basedpyright + ruff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ocg-goodfire and others added 30 commits March 19, 2026 15:28
… limit probing

Adds XML/single-line rendering variants for rich_examples, compact_skeptical, and
dual_view strategies. Includes scripts for generating the prompt strategy gallery,
sweep results dashboard, threshold sweep preparation, and provider rate limit probing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nvention, dataset context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Context tokens per side: 20
- EOT token: <|endoftext|>
- Seq len: 512
- Model params: ~42M
- Act values: "roughly in (-1, 1)" not "(0, 1)"
- Rephrase TODO sentence about high act / low CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix firing threshold: use actual harvest threshold, not hypothetical
- Clarify output PMI: explicitly about next-token predictions
- Remove "Pythia fashion" jargon, describe format directly
- Add "read direction" / "write direction" terminology for V and U

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New strategy with full SPD method explanation, CI-vs-act guidance,
sign convention clarity, dynamic CI threshold from harvest config,
and fixed XML rendering. Submitted for evaluation against the
earlier 9-variant × 3-threshold sweep on Jose's 200-component subset.

Also adds position distribution analysis script, Gemini prompt probe
script, and the component subset file used for evaluation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop scratch utilities (position analysis, prompt probing, rate limit
testing, sweep dashboard, strategy gallery, sample keys, intruder
comparison), planning docs, and the subsets helper — all kept on disk
but not part of the canon strategy PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…riptive framing

- Add input token stats (recall + PMI) and output recall to canon prompt
- Restructure: raw frequencies first, then PMI as "same data normalized by base rate"
- Drop precision metrics (saturated on sparse components)
- Remove input/output ontology priming and "activation patterns" framing
- Less prescriptive task instruction: lead with most salient aspect
- Fix V/U description accuracy, soften CI vs act language
- Pass input_token_stats through dispatch and interpret for canon strategy
- Restore accidentally deleted subsets.py (fixes type errors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…neighbors

- Canon-style XML examples with CI/act annotations (no shifted examples)
- Recall-first + PMI token stats (output-only for output pass, input-only for input pass, both for unification)
- Compact SPD preamble (no inaccurate V/U claims, just dimensions)
- Short layer descriptions (attention key proj, layer 1) instead of verbose "in the 2nd of 4 blocks"
- Architecture info factored into component header once
- Related components: label-first display, positive/negative split with explanations
- New `summary_for_neighbors` field: LLM writes a 1-2 sentence summary specifically for downstream/upstream components to read during their labeling
- Fix canon prompt V/U description to not claim what directions "do"
- Fix "what causes this component to fire" → neutral wording

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…neighbors

Unification pass is terminal — no downstream consumers need its summary.
Output/input passes use LABEL_SCHEMA (with summary_for_neighbors),
unification uses UNIFIED_LABEL_SCHEMA (label + reasoning only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Single CANON_RENDERING constant in config.py (was duplicated in 3 places)
- Shared token_stats_section helper in prompt_helpers.py (was inline in canon.py + graph_interp)
- Delete dead build_separated_examples
- Remove activation_threshold default in dispatch (caller always provides)
- Remove component_keys default in resolve_target_component_keys
- Cache pruned Gemini schema on first call instead of deepcopy per request
- Narrow _get_output_stats/_get_input_stats to return TokenPRLift (not | None)
- direction param typed as Literal["Input", "Output"]
- Net -71 lines

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Old harvest DBs without n_activation_examples column will need re-harvesting.
No backwards-compat shims per project principles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>