feat: offline safetensors ablation, async vLLM benchmarking, and polysemantic eval by RUFFY-369 · Pull Request #11 · NousResearch/neural-steering

RUFFY-369 · 2026-05-20T19:54:13Z

📌 Overview

Introduces an offline ablation pipeline that writes the ablation directly into the model weights instead of relying on PyTorch runtime hooks. The ablated checkpoint can then be served natively by high-throughput engines (vLLM/TGI) with no added Time-To-First-Token (TTFT) or Inter-Token Latency (ITL) overhead. Also includes an async load generator and documents a newly observed capability-drift boundary condition (see Empirical Findings).

🛠️ Architectural Implementation

apply_surgery.py:
- Executes in-place, zero-copy matrix surgery on gate_proj, up_proj, and down_proj.
- Enforces aggressive garbage collection per shard to prevent host RAM OOM cliffs on 70B+ models.
- Injects the JSON ablation map directly into the .safetensors metadata header for architectural provenance.
stress_test.py:
- Async vLLM load generator utilizing AsyncOpenAI.
- Streams tokens to measure TTFT and concurrent throughput, retrying failed requests with exponential backoff.
extract_circuit.py:
- Parameterizes the contrastive discovery (CNA) pipeline into a CLI tool for dynamic target mapping.
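
To make the surgery on gate_proj, up_proj, and down_proj concrete, here is a minimal pure-Python sketch of ablating one intermediate neuron in a Llama-style gated MLP. It assumes the PyTorch `[out_features, in_features]` weight layout, so a neuron is a row of gate_proj and up_proj and a column of down_proj. The toy matrices and the `ablate_neurons` helper are hypothetical and are not taken from apply_surgery.py, which may handle dtypes, shards, and which projections it zeroes differently.

```python
import math

def silu(x):
    """SiLU activation used by Llama-style gated MLPs."""
    return x / (1.0 + math.exp(-x))

def matvec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def mlp_forward(gate, up, down, x):
    """Compute down @ (silu(gate @ x) * (up @ x))."""
    g = matvec(gate, x)
    u = matvec(up, x)
    act = [silu(gi) * ui for gi, ui in zip(g, u)]
    return matvec(down, act)

def ablate_neurons(gate, up, down, neuron_ids):
    """Zero the given intermediate neurons in place (hypothetical helper)."""
    for i in neuron_ids:
        gate[i][:] = [0.0] * len(gate[i])  # row i of gate_proj
        up[i][:] = [0.0] * len(up[i])      # row i of up_proj
        for row in down:
            row[i] = 0.0                   # column i of down_proj

# Toy weights: hidden size 2, intermediate size 3 (illustrative values only).
gate = [[0.5, -1.0], [1.0, 2.0], [-0.3, 0.8]]
up = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
down = [[1.0, 2.0, 3.0], [-1.0, 0.5, 0.25]]
x = [1.0, 2.0]

before = mlp_forward(gate, up, down, x)
ablate_neurons(gate, up, down, [1])
after = mlp_forward(gate, up, down, x)
print(before, after)
```

Because the zeros live in the weights themselves, the ablated neuron contributes nothing at inference time and no forward hook is needed, which is the property the offline pipeline relies on.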

🔬 Empirical Findings: Polysemantic Entanglement

Stress-testing the offline-ablated Llama-3.1-8B-Instruct (0.1% refusal circuit, localized to layers 30/31) against a 4.8k-token prefix revealed a semantic reasoning failure that standard n-gram repetition metrics do not detect.
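
As a rough illustration of why an n-gram repetition metric can miss this kind of failure, the sketch below scores three token sequences. The `repeated_ngram_fraction` function is one plausible definition of such a metric, assumed for illustration; the metric actually used in this PR is not specified. The sample strings are invented and are not taken from the logs.

```python
from collections import Counter

def repeated_ngram_fraction(tokens, n=3):
    """Fraction of n-grams that repeat an n-gram seen earlier in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)

# Invented examples: valid code, logically broken code, and a degenerate loop.
valid = 'int add ( int a , int b ) { return a + b ; }'.split()
broken = 'int add ( int a , int b ) { if return = "a" + ; }'.split()
looped = ["I", "cannot"] * 10

for name, toks in [("valid", valid), ("broken", broken), ("looped", looped)]:
    print(name, round(repeated_ngram_fraction(toks), 3))
```

The valid and broken snippets both score as repetition-free, so a check of this kind only flags the degenerate loop. Catching logically invalid code would need a semantic check such as compiling or executing the output.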

Warning

Observation: The ablation successfully bypassed the refusal state without triggering an EOS-avoidance loop. The model maintained perfect structural fluency (flawless Markdown and syntax). However, it suffered a complete semantic collapse, generating logically invalid code (e.g., hallucinated C functions, floating if-statements, string-literal misassignments).

Important

Conclusion: Late-layer refusal neurons are mathematically entangled with logical code-correctness circuits. Ablation maintains the "shape" of a valid response while quietly lobotomizing downstream reasoning.

📂 Verification & Reproducibility

Raw evidentiary logs demonstrating this semantic collapse are fully committed and preserved in:

cc @samherring99 @DamascusGit

… test
- Add extract_circuit.py for contrastive neuron attribution (CNA) discovery.
- Map canonical 458-neuron refusal circuit to canonical_indices.json.
- Implement apply_surgery.py to bypass runtime hooks and hard-ablate gate_proj, up_proj, and down_proj directly within .safetensors binaries.
- Add stress_test.py to validate long-context autoregressive stability via vLLM.
- Include post-mortem logs demonstrating polysemantic entanglement and code-correctness collapse at 4.8k tokens.
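
For readers unfamiliar with how a streaming client derives latency figures, here is a minimal asyncio sketch that measures TTFT and mean inter-token latency from a token stream. It uses a simulated stream (`fake_token_stream`, a hypothetical stand-in) rather than a real vLLM endpoint, and it is not the code in stress_test.py. A real client would iterate over the chunks of a streaming chat completion instead.

```python
import asyncio
import time

async def fake_token_stream(n_tokens, first_delay, step_delay):
    """Simulated server: waits first_delay, then yields tokens step_delay apart."""
    await asyncio.sleep(first_delay)
    for i in range(n_tokens):
        if i:
            await asyncio.sleep(step_delay)
        yield f"tok{i}"

async def measure_stream(stream):
    """Record arrival times and derive TTFT and mean inter-token latency."""
    start = time.perf_counter()
    arrivals = []
    async for _ in stream:
        arrivals.append(time.perf_counter())
    if not arrivals:
        return {"tokens": 0, "ttft": None, "mean_itl": None}
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return {
        "tokens": len(arrivals),
        "ttft": arrivals[0] - start,
        "mean_itl": sum(gaps) / len(gaps) if gaps else None,
    }

async def run_load(concurrency):
    """Run several simulated requests concurrently and collect their stats."""
    jobs = [measure_stream(fake_token_stream(5, 0.05, 0.01))
            for _ in range(concurrency)]
    return await asyncio.gather(*jobs)

results = asyncio.run(run_load(4))
for r in results:
    print(r)
```

Timing each chunk as it arrives is what separates TTFT from ITL. A non-streaming request would only expose total latency.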

…est to production-grade
- Add argparse interface to make paths portable across environments.
- Inject ablation indices directly into .safetensors header metadata for model provenance.
- Optimize host RAM safety in apply_surgery.py via strict garbage collection.
- Re-implement stress_test.py as an asynchronous streaming client with exponential backoff and load statistics.
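
The header-metadata injection can be sketched with the standard library alone, since a .safetensors file begins with an 8-byte little-endian header length followed by a JSON header whose optional `__metadata__` entry maps strings to strings. The `ablation_map` key name, the layer/index values, and both helper functions below are assumptions made for illustration, not the names used in apply_surgery.py. In practice the `safetensors` library's `save_file(..., metadata=...)` handles this for you.

```python
import json
import struct

def build_blob(tensor_entries, payload, metadata):
    """Assemble safetensors-style bytes: u64 header length, JSON header, data."""
    header = {"__metadata__": metadata, **tensor_entries}
    raw = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw + payload

def read_metadata(blob):
    """Read only the header and return its __metadata__ map."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    return header.get("__metadata__", {})

# Hypothetical ablation map: layer index -> ablated neuron indices.
ablation_map = {"30": [5, 17], "31": [42]}

# Metadata values must be strings, so the map is stored as a JSON string.
metadata = {"ablation_map": json.dumps(ablation_map)}

# One toy F32 tensor of shape [2, 2]: 16 bytes of zeroed data.
tensors = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
blob = build_blob(tensors, bytes(16), metadata)

recovered = json.loads(read_metadata(blob)["ablation_map"])
print(recovered)
```

Storing the map in the header means provenance can be checked by reading a few kilobytes, without loading any tensor data.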


RUFFY-369 added 5 commits May 21, 2026 00:44

docs: update README with offline safetensors ablation and polysemantic findings

65465d6

docs(ablation): add evidentiary log file references to README

931b39b

docs(ablation): hyper-link all referenced files in README

63cf954

RUFFY-369 wants to merge 5 commits into NousResearch:main from RUFFY-369:feat/offline-safetensors-ablation
