Discover and causally validate hallucination-associated FFN neurons (H-Neurons) in transformer LLMs.
Based on arXiv:2512.01797.
```bash
pip install hprobes
# or
uv add hprobes
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from hprobes import HProbe

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# samples: list of dicts with question, options, answer
probe = HProbe(model, tokenizer)
probe.fit(samples, options_key="choices", answer_key="answer")
print(probe.n_neurons_, probe.layer_distribution_)

results = probe.score()
print(f"AUROC {results['auroc']:.3f} gap {results['auroc_gap']:+.3f}")

probe.causal_validate()
```

```bash
# Fit and score on an MCQ dataset
hprobes run --model google/gemma-3-4b-it --data dataset.jsonl --samples 500

# Transfer: score a saved probe on a different model
hprobes transfer --probe results/probe --model google/gemma-3-4b --data dataset.jsonl

# Fit from pre-generated responses with judge labels
hprobes responses --model google/gemma-3-4b-it --data responses.jsonl
```

Input files: .jsonl, .json, .parquet
Auto-detected dataset formats: mmlu, medqa, medmcqa. For any other format, pass `options_key` and `answer_key` explicitly.
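As a rough illustration of a custom format, the sketch below writes a two-question dataset to a `.jsonl` file and reads it back using only the standard library. The field names `question`, `choices`, and `answer`, and the convention that `answer` is an index into `choices`, are assumptions made for this example; hprobes may expect a different answer encoding (for example a letter), so check this against your own data.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical MCQ samples; the key names are illustrative, not mandated by hprobes.
samples = [
    {"question": "Which organ produces insulin?",
     "choices": ["Liver", "Pancreas", "Kidney", "Spleen"],
     "answer": 1},
    {"question": "What is 2 + 2?",
     "choices": ["3", "4", "5", "6"],
     "answer": 1},
]

def write_jsonl(path, rows):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

path = Path(tempfile.mkdtemp()) / "dataset.jsonl"
write_jsonl(path, samples)
loaded = read_jsonl(path)
print(len(loaded), "samples written to", path)
```

With this layout, the loaded list would be passed as `samples` together with `options_key="choices"` and `answer_key="answer"`.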
| Parameter | Default | Description |
|---|---|---|
| `l1_C` | `0.01` | Inverse L1 regularization strength; lower values select fewer neurons |
| `contrastive` | `True` | 3-vs-1 labeling at the generated answer token |
| `layer_stride` | `1` | Sample every Nth layer (2 = faster) |
| `validation_split` | `0.2` | Holdout fraction for scoring |
| `max_tokens` | `1024` | Truncation length |
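To give some intuition for why a lower `l1_C` selects fewer neurons, here is a minimal, dependency-free sketch of L1-regularized logistic regression fitted by proximal gradient descent. It assumes `l1_C` follows the scikit-learn convention, where `C` multiplies the data-fit term so the effective penalty is proportional to `1 / C`; the solver hprobes actually uses is not stated here. The toy features stand in for neuron activations, and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fit_l1_logistic(X, y, C, lr=0.1, steps=500):
    """Minimize mean log-loss + (1 / (C * n)) * ||w||_1 (no intercept).

    Assumption: this mirrors scikit-learn's objective divided by C * n.
    """
    n, d = len(X), len(X[0])
    lam = 1.0 / (C * n)  # effective L1 penalty; grows as C shrinks
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w

def count_selected(w):
    """Number of features (neurons) with a nonzero weight."""
    return sum(1 for wj in w if wj != 0.0)

# Toy data: feature 0 separates the classes, the others are small noise.
X = [
    [1.0, 0.2, -0.1, 0.0], [0.9, -0.3, 0.2, 0.1],
    [1.1, 0.1, 0.0, -0.2], [0.8, -0.1, -0.2, 0.1],
    [-1.0, 0.3, 0.1, 0.0], [-0.9, -0.2, -0.1, -0.1],
    [-1.1, 0.0, 0.2, 0.2], [-0.8, -0.1, 0.0, -0.1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

for C in (0.01, 1.0, 100.0):
    print(f"C={C}: {count_selected(fit_l1_logistic(X, y, C))} nonzero weights")
```

On this toy problem a very small `C` zeroes every weight, while a large `C` lets at least the informative feature through. On real activations the useful range depends on the number of samples and on feature scale, so treat the default as a starting point rather than a fixed choice.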
```python
probe.save("results/gemma_medqa")  # writes .json + .pkl
probe = HProbe.load("results/gemma_medqa", model, tokenizer)
probe.score_on(new_samples, options_key="choices", answer_key="answer")
```

This research is conducted in collaboration with the Great Ormond Street Hospital DRIVE Unit.
- Huseyin Cavus — Core Contributor
- Dr. Pavithra Rajendran — Machine Learning Lead, GOSH DRIVE
- Sebin Sabu — Senior AI Scientist, GOSH DRIVE
- Jaskaran Singh Kawatra — ML Engineer, GOSH DRIVE