(dev -> main) App updates and backwards transformer ci functions#355

Draft
danbraunai-goodfire wants to merge 602 commits into main from dev

Conversation

@danbraunai-goodfire
Collaborator

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Does this PR introduce a breaking change?

@danbraunai-goodfire
Collaborator Author

@claude can you review this PR? Note that your review should cover the scope of this PR only. If you spot things unrelated to this PR, feel free to bring them up and we'll consider them for new issues. I want you to think about possible ways the implementation could be improved, including ways it could be rewritten completely using different abstractions. I want critical comments; feel comfortable giving me the hard truths.

@claude
Contributor

claude bot commented Feb 4, 2026

Claude finished @danbraunai-goodfire's task


PR Review: App updates and backwards transformer ci functions

Review Scope: 288 commits, 100 files changed, focusing on app updates and transformer CI functions

Todo List

  • Activate virtual environment and setup
  • Analyze major architectural changes in the app
  • Review new abstractions and components
  • Examine backend API changes
  • Review frontend component refactoring
  • Analyze config changes across experiments
  • Provide critical feedback on implementation approach
  • Suggest alternative architectures and improvements

Critical Review & Hard Truths

🔥 Major Architectural Problems

1. Backend Compute Layer is a God Object (spd/app/backend/compute.py:1-827)

  • Problem: 827 lines doing gradient computation, edge calculation, intervention logic, AND result formatting
  • Anti-pattern: Multiple responsibilities violating SRP
  • Rewrite suggestion:
    # Split into focused modules:
    spd/app/backend/compute/
      ├── gradient_engine.py     # Core autograd operations
      ├── attribution_builder.py # Edge construction logic  
      ├── intervention_runner.py # Forward passes with masks
      └── result_formatters.py   # Output transformation

2. Database Schema Fighting the Type System (spd/app/backend/database.py)

  • Problem: StoredGraph uses discriminated unions in Python, but database still has nullable columns for all possible fields
  • Result: Invalid states are representable at DB level despite type safety in Python
  • Solution: Either use proper table inheritance OR embrace the union and validate constraints in application code; half-implementing both approaches gets the worst of each

3. Frontend State Management is Chaotic (PromptAttributionsTab.svelte:565+ lines)

  • Problem: No state machine - implicit transitions everywhere
  • Examples:
    • TabViewState + GraphComputeState + ComposerState can be in invalid combinations
    • UI can be in "loading" state while also in "draft" mode
  • Rewrite suggestion: Use XState or explicit state machine pattern. Current approach doesn't scale.

🚨 Code Quality Issues

4. Layer Aliasing Global State (spd/app/frontend/src/lib/layerAliasing.ts:57)

let cachedArchitecture: Architecture | null = null;
  • Problem: Global mutable cache that can become stale, not reset between contexts
  • Solution: Pass architecture context down explicitly or use a proper store

5. Bulk API Endpoint Design (spd/app/backend/routers/component_data.py)

  • Good: Eliminated GIL contention by batching requests
  • Bad: Still sequential execution inside the handler - missed opportunity for true parallelism
  • Alternative: Use asyncio.gather() to run the three data fetches concurrently
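A minimal sketch of the gather() suggestion. The fetch_* functions here are illustrative stand-ins for the handler's three data fetches, not the actual functions in component_data.py:

```python
import asyncio

# Hypothetical stand-ins for the three independent data fetches;
# asyncio.sleep(0) substitutes for real I/O.
async def fetch_interpretations(ids: list[int]) -> dict[int, str]:
    await asyncio.sleep(0)
    return {i: f"interp-{i}" for i in ids}

async def fetch_activation_contexts(ids: list[int]) -> dict[int, list]:
    await asyncio.sleep(0)
    return {i: [] for i in ids}

async def fetch_token_stats(ids: list[int]) -> dict[int, dict]:
    await asyncio.sleep(0)
    return {i: {} for i in ids}

async def bulk_component_data(ids: list[int]) -> dict:
    # Run the three fetches concurrently instead of sequentially.
    interps, contexts, stats = await asyncio.gather(
        fetch_interpretations(ids),
        fetch_activation_contexts(ids),
        fetch_token_stats(ids),
    )
    return {"interpretations": interps, "contexts": contexts, "stats": stats}

result = asyncio.run(bulk_component_data([1, 2]))
```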

6. Configuration Explosion (spd/configs.py)

  • Added complex global_reverse_residual CI config with 7+ required fields and complex validation
  • Problem: Config complexity is growing faster than the value it provides
  • Question: Why not use dataclasses with defaults instead of elaborate Pydantic validation?
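To make the dataclass alternative concrete, here is a hedged sketch; the field names are invented for illustration and are not the actual global_reverse_residual config fields:

```python
from dataclasses import dataclass, field

# Hypothetical config shape: defaults cover the common case, and the only
# validation is a short, local check rather than elaborate model validators.
@dataclass(frozen=True)
class ReverseResidualCIConfig:
    n_layers: int
    ci_threshold: float = 0.1
    normalize: bool = True
    layer_names: tuple[str, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        if not 0.0 <= self.ci_threshold <= 1.0:
            raise ValueError(f"ci_threshold out of range: {self.ci_threshold}")

cfg = ReverseResidualCIConfig(n_layers=4)
```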

💡 Missed Opportunities

7. TypeScript Discriminated Unions Done Halfway

// Current: Can still represent invalid states
type OptimizeConfigDraft = {
    loss: LossConfigDraft;  // Could be invalid
    // ...
}

// Better: Make invalid states unrepresentable  
type OptimizeState = 
  | { status: "draft"; config: Partial<OptimizeConfig> }
  | { status: "valid"; config: ValidOptimizeConfig }
  | { status: "computing"; config: ValidOptimizeConfig }

8. Database Caching Strategy

  • Current: Ad-hoc SQLite caching with complex uniqueness constraints
  • Alternative: Consider Redis/in-memory cache for computed graphs + persistent storage for configuration only
  • Benefit: Simpler schema, better performance for large graphs

🎯 Specific Technical Issues

9. Performance Anti-patterns:

  • compute_prompt_attributions() does full gradient computation even for display-only graphs
  • useComponentData and useComponentDataExpectCached are nearly identical - code duplication
  • Frontend doesn't virtualize large lists (could be 1000+ components)

10. Error Handling Inconsistency:

  • Some places use assert for validation (good, fail-fast)
  • Others use try/catch with graceful degradation (bad for research code)
  • Pick one approach: Research code should fail fast, not silently degrade

🔨 Complete Rewrite Suggestions

Alternative 1: Event-Driven Architecture
Replace current request/response with events:

# Commands
@dataclass
class ComputeAttributionsCommand:
    prompt_id: int
    optimization_params: OptimParams | None

# Events  
@dataclass
class AttributionsComputed:
    prompt_id: int
    graph_data: GraphData

Alternative 2: Functional Core + Imperative Shell
Move all computation to pure functions, keep I/O at edges:

def compute_attributions(model: Model, tokens: Tensor, config: Config) -> Graph:
    # Pure function - no side effects, easy to test
    pass

class AttributionService:
    # Imperative shell - handles caching, DB, progress callbacks
    pass

🚧 Breaking Changes Needed

The current architecture is trying to be too general. Consider:

  1. Remove the three graph types - just have "computed graphs" with optional optimization metadata
  2. Eliminate the database - store graphs as JSON files, much simpler
  3. Split the mega-components - PromptAttributionsTab should be 4-5 smaller components

Bottom Line: This codebase is evolving from research prototype to production-like complexity without the discipline. Either embrace the complexity with proper abstractions, or ruthlessly simplify back to research-code simplicity.

The current middle ground is the worst of both worlds.


@danbraunai-goodfire danbraunai-goodfire changed the title App updates and backwards transformer ci functions (dev -> main) App updates and backwards transformer ci functions Feb 6, 2026
claude-spd1 and others added 27 commits February 10, 2026 20:23
…scribing

- Rewrite graphLayout.ts: 182 -> 95 lines, no ModelInfo dependency
- Remove ModelInfo type and getModelInfo() API call
- Remove modelInfo state from useRun.svelte.ts
- Remove displayNames/getLayerDisplayName from all components
- Node keys are now canonical addresses (e.g. '0.attn.q:3:5')
- Layout derived directly from address structure
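The address-driven layout can be sketched as follows (in Python, though the real code is graphLayout.ts); the reading of the segments as `layer_path:component:position` is an assumption about the canonical key format:

```python
from typing import NamedTuple

# Hypothetical decomposition of a canonical node key like '0.attn.q:3:5'.
# The meaning of the last two fields (component index, token position)
# is an assumption, not taken from the actual graphLayout.ts.
class NodeAddress(NamedTuple):
    layer_path: str
    component: int
    position: int

def parse_address(key: str) -> NodeAddress:
    layer_path, component, position = key.split(":")
    return NodeAddress(layer_path, int(component), int(position))

addr = parse_address("0.attn.q:3:5")
```

Because every field the layout needs is recoverable from the key itself, no separate ModelInfo lookup is required.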
…output node cap

- Cap output nodes to 15 per position (compute + display) to keep edge
  count tractable with large vocabularies (50k vs 4k)
- Handle missing harvest data gracefully: interpretations, activation
  contexts, component data bulk all return empty/null instead of 500
- Fix fetchJson to handle non-JSON error responses (raw tracebacks)
- Frontend: ActivationContextsTab shows helpful message when no harvest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onent_data

topology.py:
- Regex-based CanonicalWeight.parse() instead of manual string splitting
- Pull parse_target_path/render_canonical_weight into PathSchema base (5 copies → 1)
- Dict lookups in sublayer schemas instead of if/elif chains
- `@override` annotations — 0 basedpyright warnings

component_data.py:
- Replace try/except(AssertionError, FileNotFoundError) with explicit
  harvest.has_correlations() / has_token_stats() checks
- Add has_correlations() and has_token_stats() to HarvestCache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
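The regex-based parse can be sketched like this; the pattern and field names are assumptions for illustration, not the actual CanonicalWeight format in topology.py:

```python
import re

# Hypothetical 'layer.sublayer' weight path, parsed with a named-group
# regex instead of manual string splitting.
_WEIGHT_RE = re.compile(r"^(?P<layer>\d+)\.(?P<sublayer>[\w.]+)$")

def parse_weight(path: str) -> tuple[int, str]:
    m = _WEIGHT_RE.match(path)
    if m is None:
        # Fail loudly on malformed paths instead of silently mis-splitting.
        raise ValueError(f"unparseable weight path: {path!r}")
    return int(m["layer"]), m["sublayer"]

layer, sublayer = parse_weight("2.attn.q")
```

A single compiled pattern centralizes the format, so a malformed path raises one well-defined error rather than failing differently in each of the former five copies.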
Instruments: CI forward pass, CI-masked forward, gradient forward,
alive info (with component counts), per-target edge computation,
node extraction, build_out_probs, save_graph, process_edges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yload

- AppTokenizer: escape control chars for display (tab→⇥, newline→↵)
- Harvester: strip padding sentinels at write time, not in router
- Remove all padding handling from activation_contexts router
- Reduce bulk prefetch limits (100→10 examples, 20→10 correlations/stats)
- Fix TokenPillList duplicate key error (key by index, not token string)
- Truncate prompt previews in char space (60 chars) not token space

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
"wte" is a concrete module name (GPT-2's word token embedding), not a
good canonical name. "embed" is model-agnostic. Concrete paths in
PathSchema subclasses (embedding_path = "wte") are unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Thread RateLimiter through scoring pipelines (intruder, detection, fuzzing)
- Move MAX_REQUESTS_PER_MINUTE to llm_api module (single source of truth)
- Minor app cleanup (unused import, runs endpoint/API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Handles old harvest data that still has -1 padding sentinels on disk.
The HF tokenizer overflows on -1 token IDs, so strip at the type
boundary to protect all consumers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
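The boundary-stripping idea reduces to something like this; the function name is illustrative, not the actual implementation:

```python
# Strip -1 padding sentinels once, at the type boundary, so no downstream
# consumer (e.g. the HF tokenizer) ever sees them.
def strip_padding(token_ids: list[int], sentinel: int = -1) -> list[int]:
    return [t for t in token_ids if t != sentinel]

clean = strip_padding([15496, 995, -1, -1])
```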
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace HarvestCache (singleton with in-memory caching) with per-category
repos that read through on every call. No restart required when files
are written while app is running.

- HarvestRepo: activation contexts, correlations, token stats
- InterpRepo: interpretations, eval scores (intruder/detection/fuzzing)
- AttributionRepo: dataset attribution matrix
- All routers migrated to use repos via loaded.harvest/interp/attributions
- Tensor data (.pt) stays as-is, component data stays as JSONL (for now)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the hand-rolled byte-offset JSONL index with a proper SQLite
database (harvest.db). Component data stored as JSON blobs per row,
config as key-value pairs. WAL mode for concurrent reads.

- New: spd/harvest/db.py (HarvestDB class, 151 lines)
- Removed: ~165 lines of byte-offset index + mmap bulk loader
- HarvestResult.save() writes to SQLite instead of JSONL + summary.json
- HarvestRepo + loaders.py rewritten to use HarvestDB
- Correlations/token_stats stay as .pt files (dense tensors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
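The described shape (JSON blobs per component row, key-value config, WAL mode) can be sketched with the stdlib sqlite3 module; table and column names here are assumptions, not HarvestDB's actual schema:

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical harvest.db layout: one JSON blob per component, WAL mode
# so readers are not blocked by a concurrent writer.
path = os.path.join(tempfile.mkdtemp(), "harvest.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE components (key TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
conn.execute(
    "INSERT INTO components VALUES (?, ?)",
    ("0.attn.q:3", json.dumps({"max_act": 0.7})),
)
conn.commit()

row = conn.execute(
    "SELECT data FROM components WHERE key = ?", ("0.attn.q:3",)
).fetchone()
component = json.loads(row[0])
```

Compared with a byte-offset JSONL index, lookups become a single indexed SELECT and the file stays consistent under concurrent reads.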
- Fix RateLimiter lock-during-sleep bug that serialized all coroutines
- Bundle OpenRouter + RateLimiter + CostTracker into LLMClient with single chat() method
  that handles budget checks, rate limiting, retries, and cost tracking
- Rate limit per API call (not per task) — fixes unthrottled bursts from multi-trial scorers
- Remove pipeline.py abstraction — each caller (interpret, intruder, detection, fuzzing)
  owns its full pipeline flow inline
- Add BudgetExceededError for clean budget enforcement
- Add LLMClientConfig for client construction params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
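The lock-during-sleep fix can be illustrated as follows; this is a minimal sketch, not the actual RateLimiter. The key move is to reserve a slot while holding the lock, then sleep after releasing it, so waiting coroutines are not serialized behind the sleeper:

```python
import asyncio
import time

# Illustrative limiter: the lock only guards the slot bookkeeping;
# the sleep happens outside it.
class RateLimiter:
    def __init__(self, max_per_minute: int):
        self.min_interval = 60.0 / max_per_minute
        self.next_slot = 0.0
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_slot - now)
            # Reserve this coroutine's slot before releasing the lock.
            self.next_slot = max(now, self.next_slot) + self.min_interval
        if wait > 0:
            await asyncio.sleep(wait)  # lock already released

async def main() -> bool:
    limiter = RateLimiter(max_per_minute=6000)
    # Five concurrent acquires; with the lock held during sleep these
    # would serialize, here they only wait their own reserved interval.
    await asyncio.gather(*(limiter.acquire() for _ in range(5)))
    return True

ok = asyncio.run(main())
```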
Replace JSONL files + complex directory scanning with a single SQLite
database (interp.db) per run. Interpretations and all eval scores
(intruder, detection, fuzzing) stored in one place.

- New: spd/autointerp/db.py (InterpDB class)
- interpret.py: writes to SQLite, resume via db.get_completed_keys()
- scoring scripts: write scores to SQLite instead of timestamped JSONL
- InterpRepo: backed by InterpDB with lazy init + save_interpretation()
- loaders.py: simplified to thin wrappers over InterpDB
- Migration script: scripts/migrate_autointerp_to_sqlite.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Handle missing 'prompt' field in old InterpretationResult
- Handle missing 'component_acts' field in old ActivationExample
- Handle extra fields in old HarvestConfig / ComponentData
- Gracefully skip unparseable configs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All scripts use HarvestRepo/InterpRepo instead of loader functions
- run_interpret takes HarvestRepo instead of (run_id, correlations_dir, ci_threshold)
- correlations router uses loaded.harvest.get_component() directly
- dataset_attributions harvest uses HarvestRepo.get_summary()
- harvest/loaders.py deleted (zero remaining consumers)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables CI threshold slider to re-filter output nodes without recomputing.
out_probs are now computed from logits at display time via filter_graph_for_display().

- StoredGraph: out_probs dict → ci_masked_out_logits + target_out_logits tensors
- DB: output_probs_data TEXT → output_logits BLOB (torch.save binary)
- Consolidate _add_pseudo_layer_nodes + process_edges + build_out_probs
  into single filter_graph_for_display() → FilteredGraph
- Remove output_prob_threshold from optimized graph endpoint

Breaking: delete .data/app/prompt_attr.db

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
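The display-time recomputation amounts to a softmax over the stored logits followed by a threshold filter; this is a sketch of that idea under the assumption that filter_graph_for_display works per position over a logit vector, not its actual code:

```python
import math

# Hypothetical display-time filter: probabilities are derived from stored
# logits on demand, so moving the threshold slider never re-runs the model.
def out_probs_from_logits(logits: list[float], threshold: float) -> dict[int, float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return {i: p for i, p in enumerate(probs) if p >= threshold}

kept = out_probs_from_logits([2.0, 0.0, -1.0], threshold=0.1)
```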
- Handle None returns from get_correlations/get_token_stats in routers
- Fix activation_contexts: handle None from get_component
- Update test_server_api.py: HarvestCache → repos
- Remove unused autointerp_run_id CLI param
- Remove unused t_start variable in compute.py

0 errors, 0 warnings from basedpyright + ruff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ocg-goodfire and others added 30 commits March 19, 2026 15:28
… limit probing

Adds XML/single-line rendering variants for rich_examples, compact_skeptical, and
dual_view strategies. Includes scripts for generating the prompt strategy gallery,
sweep results dashboard, threshold sweep preparation, and provider rate limit probing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nvention, dataset context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Context tokens per side: 20
- EOT token: <|endoftext|>
- Seq len: 512
- Model params: ~42M
- Act values: "roughly in (-1, 1)" not "(0, 1)"
- Rephrase TODO sentence about high act / low CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix firing threshold: use actual harvest threshold, not hypothetical
- Clarify output PMI: explicitly about next-token predictions
- Remove "Pythia fashion" jargon, describe format directly
- Add "read direction" / "write direction" terminology for V and U

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New strategy with full SPD method explanation, CI-vs-act guidance,
sign convention clarity, dynamic CI threshold from harvest config,
and fixed XML rendering. Submitted for evaluation against the
earlier 9-variant × 3-threshold sweep on Jose's 200-component subset.

Also adds position distribution analysis script, Gemini prompt probe
script, and the component subset file used for evaluation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop scratch utilities (position analysis, prompt probing, rate limit
testing, sweep dashboard, strategy gallery, sample keys, intruder
comparison), planning docs, and the subsets helper — all kept on disk
but not part of the canon strategy PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…riptive framing

- Add input token stats (recall + PMI) and output recall to canon prompt
- Restructure: raw frequencies first, then PMI as "same data normalized by base rate"
- Drop precision metrics (saturated on sparse components)
- Remove input/output ontology priming and "activation patterns" framing
- Less prescriptive task instruction: lead with most salient aspect
- Fix V/U description accuracy, soften CI vs act language
- Pass input_token_stats through dispatch and interpret for canon strategy
- Restore accidentally deleted subsets.py (fixes type errors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…neighbors

- Canon-style XML examples with CI/act annotations (no shifted examples)
- Recall-first + PMI token stats (output-only for output pass, input-only for input pass, both for unification)
- Compact SPD preamble (no inaccurate V/U claims, just dimensions)
- Short layer descriptions (attention key proj, layer 1) instead of verbose "in the 2nd of 4 blocks"
- Architecture info factored into component header once
- Related components: label-first display, positive/negative split with explanations
- New `summary_for_neighbors` field: LLM writes a 1-2 sentence summary specifically for downstream/upstream components to read during their labeling
- Fix canon prompt V/U description to not claim what directions "do"
- Fix "what causes this component to fire" → neutral wording

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…neighbors

Unification pass is terminal — no downstream consumers need its summary.
Output/input passes use LABEL_SCHEMA (with summary_for_neighbors),
unification uses UNIFIED_LABEL_SCHEMA (label + reasoning only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Single CANON_RENDERING constant in config.py (was duplicated in 3 places)
- Shared token_stats_section helper in prompt_helpers.py (was inline in canon.py + graph_interp)
- Delete dead build_separated_examples
- Remove activation_threshold default in dispatch (caller always provides)
- Remove component_keys default in resolve_target_component_keys
- Cache pruned Gemini schema on first call instead of deepcopy per request
- Narrow _get_output_stats/_get_input_stats to return TokenPRLift (not | None)
- direction param typed as Literal["Input", "Output"]
- Net -71 lines

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Old harvest DBs without n_activation_examples column will need re-harvesting.
No backwards-compat shims per project principles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>