White-Box Scorers for Uncertainty Quantification and Hallucination Detection in Large Language Models
This repository accompanies a master's thesis on white-box scoring methods for large language models. It combines research code, archived outputs, benchmark subsets, and reproduction notes for two linked experimental phases.
The repository is organized as an academic research artifact, not as a production package. For most readers, the best entry point is the archived outputs and the phase-specific documentation rather than a full rerun.
- phase_1_replication/: conceptual replication of white-box hallucination scoring methods on frozen TruthfulQA answers
- phase_2_medical/: transfer of selected white-box signals to constrained medical QA, framed as benchmark-defined prediction error detection on MedQA and PubMedQA
Across the two phases, the implemented methods include intrinsic logit-based scores such as LNTP and MTP, plus supervised white-box readouts based on hidden states and EGH-style feature families.
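The two intrinsic scores can be illustrated with a minimal sketch. It assumes that LNTP stands for length-normalized token probability (the geometric mean of the generated tokens' probabilities) and MTP for minimum token probability, which is how these acronyms are commonly used; the repository's own definitions in phase_1_replication/src/ take precedence. The function names and the example log-probabilities are hypothetical.

```python
import math
from typing import Sequence


def lntp(token_logprobs: Sequence[float]) -> float:
    """Length-normalized token probability (assumed definition):
    the geometric mean of the per-token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def mtp(token_logprobs: Sequence[float]) -> float:
    """Minimum token probability (assumed definition):
    the probability of the least likely generated token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(min(token_logprobs))


# Hypothetical log-probabilities for a four-token answer.
example = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.95)]
print(f"LNTP = {lntp(example):.3f}")
print(f"MTP  = {mtp(example):.3f}")
```

Under these definitions both scores lie in (0, 1], and MTP can never exceed LNTP, because the minimum of a set of values is at most its geometric mean.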
If you want to understand the repository quickly, start with:
- phase_1_replication/README.md
- phase_2_medical/README.md
- benchmarks_overview.md
- ARTIFACT_MANIFEST.md
For reproduction and data details, see:
- phase_1_replication/reproduce_phase1.md
- phase_1_replication/DATA.md
- phase_2_medical/reproduce_phase2.md
- phase_2_medical/DATA.md
phase_1_replication/
  benchmarks/   frozen TruthfulQA subset, labels, and annotation material
  src/          Phase 1 scoring pipeline
  analysis/     figure generation
  outputs/      archived results, manifests, and PDFs
phase_2_medical/
  benchmarks/   prepared MedQA and PubMedQA subsets
  src/          frozen-generation and scoring pipeline
  scripts/      maintained baseline and ablation runners
  analysis/     table and figure generation
  outputs/      frozen predictions, final baseline runs, ablations, figures, tables
Useful root-level files:
- benchmarks_overview.md: benchmark summary across both phases
- ARTIFACT_MANIFEST.md: inventory of the main archived artifacts
- requirements.txt: shared Python dependencies
- CITATION.cff: citation metadata
Some of the most useful files and directories are:
- phase_1_replication/benchmarks/truthfulqa_hallu_frozen_model_outputs_300.jsonl
- phase_1_replication/outputs/phase1_truthfulqa_hallu_results_300.jsonl
- phase_1_replication/outputs/phase1_run_manifest.json
- phase_1_replication/outputs/self_iaa_summary.json
- phase_1_replication/outputs/figs/
- phase_2_medical/outputs/frozen/
- phase_2_medical/outputs/final/
- phase_2_medical/outputs/figures_tables/
- phase_2_medical/outputs/ablations/
If you mainly want the released Phase 2 baseline results, start with phase_2_medical/outputs/final/. If you want the derived figures and tables, go to phase_2_medical/outputs/figures_tables/.
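For a first look at an archived results file without rerunning anything, a short script along the following lines may help. It is only a sketch: it assumes the .jsonl files hold one JSON object per line and makes no assumption about field names, which it simply counts. The helper names `read_jsonl` and `summarize_keys` are hypothetical and not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize_keys(records: list[dict]) -> Counter:
    """Count how often each top-level key appears across the records."""
    return Counter(key for record in records for key in record)


# Path taken from the file list above; run from the repository root.
results_path = Path(
    "phase_1_replication/outputs/phase1_truthfulqa_hallu_results_300.jsonl"
)
if results_path.exists():
    records = list(read_jsonl(results_path))
    print(f"{len(records)} records")
    for key, count in summarize_keys(records).most_common():
        print(f"{key}: present in {count} records")
else:
    print(f"{results_path} not found; run this from the repository root")
```

Listing the keys first avoids guessing at the schema; the authoritative field descriptions are in the phase-specific DATA.md files.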
Install dependencies from the repository root:
pip install -r requirements.txt

Regenerate analysis outputs from archived results:
python phase_1_replication/analysis/phase1_figures.py
python phase_2_medical/analysis/phase2_tables.py
python phase_2_medical/analysis/phase2_figures.py

Rerun the main pipelines:
python phase_1_replication/src/run_phase1_truthfulqa.py
bash phase_2_medical/scripts/run_baseline_all.sh

Phase 2 also includes maintained ablation runners under phase_2_medical/scripts/.
- Python 3.10+
- GPU is recommended for the heavier extraction and scoring steps
- full reruns may require access to models or datasets hosted via Hugging Face
- the maintained Phase 2 shell runners assume a Linux-like environment such as Linux or WSL
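Before attempting a full rerun, the requirements above can be checked with a small pre-flight script such as the sketch below. It is not part of the repository. It assumes PyTorch is the GPU backend, which is a guess that should be confirmed against requirements.txt, and the helper names are hypothetical.

```python
import importlib.util
import platform
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements list


def python_ok(version=None) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= MIN_PYTHON


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def gpu_available() -> bool:
    """Assumption: PyTorch is installed and used for GPU work."""
    if not module_available("torch"):
        return False
    import torch  # imported lazily so the check also runs without PyTorch

    return torch.cuda.is_available()


print(f"Python {platform.python_version()}: {'ok' if python_ok() else 'too old'}")
print(f"CUDA GPU visible: {gpu_available()}")
# WSL reports itself as Linux, so this also covers the WSL case.
print(f"Linux-like platform: {platform.system() == 'Linux'}")
```

A missing GPU does not block regenerating figures and tables from the archived outputs; it mainly matters for the extraction and scoring steps.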
Citation metadata is provided in CITATION.cff.
The repository is licensed under CC BY-NC 4.0; see LICENSE. External datasets and model checkpoints referenced by this project remain subject to their own licenses and terms.