This is the official code repository for the paper "Verifying Chain-of-Thought Reasoning via its Computational Graph".
To create the environment and install dependencies:

```bash
conda env create -f environment.yml
conda activate crv
```

All datasets used in the paper are located in the `data/` folder. You can also download the datasets directly from Hugging Face.
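For example, a minimal sketch of loading one of the released datasets with the `datasets` library; the repository id and split name below are placeholders, not the actual identifiers, so substitute the ones listed on the Hugging Face page:

```python
from datasets import load_dataset

# Hypothetical repository id -- replace with the actual dataset id from Hugging Face.
REPO_ID = "your-org/crv-datasets"

# The split name is also a placeholder.
ds = load_dataset(REPO_ID, split="test")
print(ds[0])
```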
We compare CRV against black-box (MaxProb, PPL, Entropy, Temperature Scaling, Energy) and gray-box (Chain-of-Embedding, CoT-Kinetics) methods.
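As a reference point, the black-box scores can be computed directly from the model's token-level probabilities. The snippet below is an illustrative sketch (not the code used in the paper) of MaxProb, perplexity, and mean predictive entropy for a generated answer:

```python
import torch
import torch.nn.functional as F

def blackbox_scores(logits: torch.Tensor, generated_ids: torch.Tensor) -> dict:
    """Illustrative black-box confidence scores from per-step logits.

    logits: [seq_len, vocab_size] logits at each generated position.
    generated_ids: [seq_len] ids of the tokens actually generated.
    """
    log_probs = F.log_softmax(logits, dim=-1)                     # [seq_len, vocab]
    token_logp = log_probs.gather(-1, generated_ids[:, None]).squeeze(-1)

    max_prob = log_probs.exp().max(dim=-1).values.mean()          # MaxProb (averaged over tokens)
    ppl = torch.exp(-token_logp.mean())                           # Perplexity of the generation
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()       # Mean predictive entropy

    return {"maxprob": max_prob.item(), "ppl": ppl.item(), "entropy": entropy.item()}
```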
To reproduce these baselines, please use our fork of the Chain-of-Embedding repository:
- Clone the repository:

```bash
git clone https://github.com/ncancedda/Chain-of-Embedding/
cd Chain-of-Embedding/
git checkout fair-mir
```

- Install the required dependencies:

```bash
conda create -n coeeval python=3.10
conda activate coeeval
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```

- Run the baselines:
```bash
python main.py \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --dataset /path/to/dataset \
    --print_model_parameter \
    --save_output \
    --save_hidden_states \
    --save_coe_score \
    --save_cotk_score \
    --use_cached_outputs \
    --token_aggregation "average" \
    --cotk_gamma 0.8 \
    --no_dialogs
```

- Evaluate results:
Results are saved to `OutputInfo/`. Run the evaluation script to compute AUROC, AUPR, and FPR@95:
```bash
python ./Evaluation/eval.py \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --dataset /path/to/dataset \
    --pos_label 0 \
    --token_aggregation "average"
```

Note: We set `pos_label=0` because we treat the incorrect label as the positive class for our metrics.
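For reference, these metrics can also be reproduced from saved scores with scikit-learn. The snippet below is a minimal sketch with the incorrect label as the positive class (FPR@95 is read off the ROC curve); the array names are chosen for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve

def detection_metrics(scores: np.ndarray, is_incorrect: np.ndarray) -> dict:
    """Illustrative AUROC / AUPR / FPR@95 with incorrect answers as the positive class.

    scores: higher values should indicate a more likely incorrect answer.
    is_incorrect: 1 for incorrect answers (positive class), 0 for correct ones.
    """
    auroc = roc_auc_score(is_incorrect, scores)
    aupr = average_precision_score(is_incorrect, scores)

    # False positive rate at 95% true positive rate.
    fpr, tpr, _ = roc_curve(is_incorrect, scores)
    fpr_at_95 = fpr[np.searchsorted(tpr, 0.95)]

    return {"auroc": auroc, "aupr": aupr, "fpr@95": fpr_at_95}
```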
This work uses Llama 3.1 8B Instruct with its MLPs replaced by trained Top-K transcoders.
- Weights: The per-layer transcoder weights can be downloaded from Hugging Face.
- Usage: To load and use the transcoders, please refer to the `models/` directory for instructions.
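For intuition, a Top-K transcoder is a sparse stand-in for an MLP block: it encodes the MLP input into a wide feature space, keeps only the K largest activations, and decodes back to the MLP output space. The sketch below illustrates the idea only; the class name, parameter names, and dimensions are assumptions for illustration, not the implementation shipped in `models/`:

```python
import torch
import torch.nn as nn

class TopKTranscoder(nn.Module):
    """Illustrative Top-K transcoder: maps MLP inputs to MLP outputs via sparse features."""

    def __init__(self, d_model: int, d_features: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        acts = torch.relu(self.encoder(x))                       # [..., d_features]
        # Keep only the K largest activations per token; zero out the rest.
        topk = torch.topk(acts, self.k, dim=-1)
        sparse = torch.zeros_like(acts).scatter(-1, topk.indices, topk.values)
        return self.decoder(sparse)                              # approximates the MLP output

# Illustrative (deliberately small) sizes, not the real model dimensions.
tc = TopKTranscoder(d_model=64, d_features=512, k=8)
out = tc(torch.randn(1, 16, 64))
```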
We use the Circuit Tracer library to construct the attribution graphs. Detailed instructions and the specific scripts used to generate graphs for our datasets are provided in the `attribution/` folder.
Once attribution graphs are generated, use the extraction script to convert graph structures into tabular features for classification.
- Script: `features/feature_extraction.py`
- Documentation: See `features/README.md` for details.
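As an illustration of what this step produces, the sketch below turns a single attribution graph (represented generically as nodes and weighted edges) into one row of tabular features. The specific feature names and node fields are examples, not the exact feature set computed by `features/feature_extraction.py`:

```python
import numpy as np

def graph_to_features(nodes: list[dict], edges: list[tuple[int, int, float]]) -> dict:
    """Illustrative structural features for one attribution graph.

    nodes: e.g. [{"id": 0, "layer": 3, "influence": 0.7}, ...]
    edges: (source_id, target_id, attribution_weight) triples.
    """
    n, m = len(nodes), len(edges)
    weights = np.array([w for _, _, w in edges]) if edges else np.zeros(1)
    influences = np.array([node.get("influence", 0.0) for node in nodes]) if nodes else np.zeros(1)

    return {
        "num_nodes": n,
        "num_edges": m,
        "edge_density": m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_edge_weight": float(weights.mean()),
        "max_edge_weight": float(weights.max()),
        "mean_node_influence": float(influences.mean()),
    }
```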
Finally, with all features extracted, you can train the diagnostic classifier. We recommend using the scikit-learn library for this purpose, as it provides implementations of a wide range of classification methods.
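For example, a minimal sketch of training and evaluating a diagnostic classifier on the extracted features; the choice of gradient-boosted trees, the file name, and the label column are illustrative assumptions:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature table produced by the extraction step: one row per attribution graph,
# with a binary "is_incorrect" label column.
df = pd.read_csv("features.csv")
X, y = df.drop(columns=["is_incorrect"]), df["is_incorrect"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

clf = GradientBoostingClassifier().fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # probability of the incorrect (positive) class
print("AUROC:", roc_auc_score(y_test, scores))
```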
If you find this work useful, please consider citing our paper:

```bibtex
@article{zhao2025verifying,
  title={Verifying Chain-of-Thought Reasoning via Its Computational Graph},
  author={Zheng Zhao and Yeskendir Koishekenov and Xianjun Yang and Naila Murray and Nicola Cancedda},
  year={2025},
  eprint={2510.09312},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.09312},
}
```
This project is licensed under the CC-BY-NC-4.0 License.