hprobes

Discover and causally validate hallucination-associated FFN neurons (H-Neurons) in transformer LLMs.

Install

pip install hprobes
# or
uv add hprobes

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer
from hprobes import HProbe

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# samples: list of dicts, each with a question, its answer options
# (under "choices" here), and the correct answer (under "answer")
probe = HProbe(model, tokenizer)
probe.fit(samples, options_key="choices", answer_key="answer")

print(probe.n_neurons_, probe.layer_distribution_)

results = probe.score()
print(f"AUROC {results['auroc']:.3f}  gap {results['auroc_gap']:+.3f}")

probe.causal_validate()
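
The Quickstart assumes a `samples` list is already in memory. As a minimal sketch, each item might look like the records below. The `question` field name, the example content, and the answer encoding (an index into `choices`) are assumptions for illustration, and `check_samples` is a hypothetical helper, not part of hprobes; check your dataset's actual schema before fitting.

```python
# Hypothetical MCQ samples. Only the "choices" and "answer" key names come from
# the Quickstart call to probe.fit; everything else here is an assumption.
samples = [
    {
        "question": "Which vitamin deficiency causes scurvy?",
        "choices": ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
        "answer": 2,  # assumed encoding: index into "choices"
    },
    {
        "question": "Which organ produces insulin?",
        "choices": ["Liver", "Pancreas", "Kidney", "Spleen"],
        "answer": 1,
    },
]


def check_samples(samples, options_key="choices", answer_key="answer"):
    """Return the indices of samples that lack either required key."""
    return [
        i
        for i, sample in enumerate(samples)
        if options_key not in sample or answer_key not in sample
    ]


missing = check_samples(samples)
print(f"{len(samples)} samples, {len(missing)} with missing keys")
```

Running a check like this before `probe.fit` surfaces schema problems early, before any model forward passes are spent.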

CLI

# Fit and score on an MCQ dataset
hprobes run --model google/gemma-3-4b-it --data dataset.jsonl --samples 500

# Transfer: score a saved probe on a different model
hprobes transfer --probe results/probe --model google/gemma-3-4b --data dataset.jsonl

# Fit from pre-generated responses with judge labels
hprobes responses --model google/gemma-3-4b-it --data responses.jsonl

Supported formats

Input files: .jsonl, .json, .parquet

Auto-detected dataset formats: mmlu, medqa, medmcqa. For any other format, pass options_key and answer_key explicitly.
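
For a dataset outside the auto-detected formats, it can help to inspect the file before choosing key names. The sketch below is a standard-library loader for .jsonl and .json only (Parquet is left out to avoid extra dependencies). `load_records` is a hypothetical helper, not an hprobes function, and the `stem`, `opts`, and `gold` field names are invented for the example.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load a list of dicts from a .jsonl or .json file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a top-level JSON array of records")
        return data
    raise ValueError(f"unsupported extension: {path.suffix}")


# Write a tiny hypothetical dataset with custom key names, then read it back.
rows = [{"stem": "2 + 2 = ?", "opts": ["3", "4", "5", "6"], "gold": 1}]
with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "dataset.jsonl"
    jsonl_path.write_text(
        "\n".join(json.dumps(row) for row in rows) + "\n", encoding="utf-8"
    )
    loaded = load_records(jsonl_path)

print(sorted(loaded[0].keys()))
```

Records shaped like these would then be passed with `options_key="opts"` and `answer_key="gold"`, mirroring the `probe.fit` call in the Quickstart.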

Key options

| Parameter | Default | Description |
| --- | --- | --- |
| `l1_C` | `0.01` | Inverse L1 strength (lower = fewer neurons) |
| `contrastive` | `True` | 3-vs-1 labeling at the generated answer token |
| `layer_stride` | `1` | Sample every Nth layer (2 = faster) |
| `validation_split` | `0.2` | Holdout fraction for scoring |
| `max_tokens` | `1024` | Truncation length |
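
To make `layer_stride` and `validation_split` concrete, here is a small sketch of what "every Nth layer" and a "holdout fraction" amount to arithmetically. It is one plausible reading, not the library's implementation: the starting layer index (0), the shuffling, the seed, and both helper names are assumptions.

```python
import random


def strided_layers(n_layers, layer_stride=1):
    """Layer indices kept when sampling every Nth layer (assumed to start at 0)."""
    return list(range(0, n_layers, layer_stride))


def holdout_split(items, validation_split=0.2, seed=0):
    """Shuffle a copy of items and hold out a fraction of them for scoring."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_split))
    return shuffled[n_val:], shuffled[:n_val]


layers = strided_layers(34, layer_stride=2)  # 34 is an arbitrary example depth
train, val = holdout_split(range(500), validation_split=0.2)
print(len(layers), len(train), len(val))  # prints "17 400 100"
```

With a stride of 2, half as many layers are kept, which is where the speedup noted in the table comes from; with 500 samples and the default split, 100 are held out for scoring.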

Save & load

probe.save("results/gemma_medqa")          # writes .json + .pkl
probe = HProbe.load("results/gemma_medqa", model, tokenizer)
probe.score_on(new_samples, options_key="choices", answer_key="answer")

Acknowledgements

This research is conducted in collaboration with the Great Ormond Street Hospital DRIVE Unit.

Contributors

Huseyin Cavus — Core Contributor
Dr. Pavithra Rajendran — Machine Learning Lead, GOSH DRIVE
Sebin Sabu — Senior AI Scientist, GOSH DRIVE
Jaskaran Singh Kawatra — ML Engineer, GOSH DRIVE
