This repository provides a self-contained, Colab-friendly benchmark for comparing different reasoning strategies:
- CoT (Chain-of-Thought)
- CoT + Self-consistency
- RAG × CoT (mocked)
- ReAct (Tool-Augmented, mocked)
- FSM Controller (LangGraph-like controller, mocked)
It does not call any external LLM API: everything is implemented with small, deterministic toy logic, so the benchmark runs for free on Colab or any laptop.
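For intuition, the CoT + Self-consistency strategy listed above boils down to sampling several reasoning chains and majority-voting their final answers. Below is a minimal sketch of that idea in plain Python; the `sample_chain` callable is hypothetical and not part of this repository's API:

```python
# Minimal self-consistency sketch; `sample_chain` is a stand-in for any
# (possibly stochastic) chain-of-thought generator, not a class from reasoning_core/.
from collections import Counter
from typing import Callable, List, Tuple

def self_consistency_vote(
    sample_chain: Callable[[int], Tuple[List[str], str]],
    n_samples: int = 5,
) -> str:
    """Sample several reasoning chains and return the majority-voted final answer."""
    answers = []
    for seed in range(n_samples):
        _steps, answer = sample_chain(seed)  # each call returns (reasoning steps, final answer)
        answers.append(answer)
    return Counter(answers).most_common(1)[0][0]
```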
Run the benchmark with:

```bash
python scripts/run_benchmark.py
```

You should see a table similar to:

```
Model   | Outcome(EM) | Process(StepAcc) | Robustness(SC/Para) | Efficiency(Tokens/Steps)
--------+-------------+------------------+---------------------+-------------------------
CoT     | ...
...
```
The exact numbers will differ from the paper-style example, but the evaluation pipeline and comparison structure match what you would use in a NeurIPS/ICLR paper.
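The Outcome(EM) column is exact-match accuracy. A minimal version of such a metric could look like the sketch below; the function names and the whitespace/lowercase normalization are assumptions, not necessarily what evaluation/ implements:

```python
# Illustrative exact-match metric; the real implementation lives in evaluation/
# and may normalize answers differently.
from typing import List

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    def normalize(s: str) -> str:
        return " ".join(s.strip().lower().split())
    return float(normalize(prediction) == normalize(reference))

def em_score(predictions: List[str], references: List[str]) -> float:
    """Average exact match over a dataset."""
    assert len(predictions) == len(references) and references
    return sum(exact_match(p, r) for p, r in zip(predictions, references)) / len(references)
```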
The repository is organized as follows:

- `reasoning_core/` – core reasoning models (CoT, RAG×CoT, ReAct, FSM)
- `evaluation/` – metric computation and table/report generation
- `data/` – small JSONL sample datasets
- `scripts/` – CLI entrypoints (`run_benchmark.py`)
This is intended as a template: you can replace the mocked models with real LLM wrappers and keep all evaluation code intact.
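For example, a real LLM wrapper might only need to mirror the interface of the mocked models. The sketch below assumes a `run(question)`-style interface and a generic client with a `complete(prompt)` method; neither name is taken from this repository, so adapt them to whatever reasoning_core/ actually defines:

```python
# Sketch of swapping a mocked model for a real LLM wrapper.
# The ReasoningTrace shape and run(question) signature are assumptions about
# the mocked models' interface, not the repository's confirmed API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReasoningTrace:
    steps: List[str] = field(default_factory=list)
    answer: str = ""

class RealCoTReasoner:
    """Chain-of-thought reasoner backed by a real LLM client (hypothetical)."""

    def __init__(self, client, model_name: str = "your-model"):
        self.client = client            # any object exposing complete(prompt) -> str
        self.model_name = model_name

    def run(self, question: str) -> ReasoningTrace:
        prompt = f"Answer step by step.\nQuestion: {question}\nReasoning:"
        raw = self.client.complete(prompt)   # hypothetical client call
        steps = [line for line in raw.splitlines() if line.strip()]
        answer = steps[-1] if steps else ""
        return ReasoningTrace(steps=steps, answer=answer)
```

Keeping the wrapper's return type aligned with what the mocked models produce is what lets the evaluation code stay untouched.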
The default experiment is defined in configs/experiment_default.yaml:
```yaml
dataset:
  path: data/tasks/sample_tasks.jsonl

models:
  - name: CoT
    type: cot
  - name: CoT+SC
    type: cot_sc
  - name: RAG×CoT
    type: rag_cot
  - name: ReAct
    type: react
  - name: FSM
    type: fsm

random_seed: 42
```

You can create new configs pointing to different datasets or subsets of models without touching the Python code.
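For reference, a config like this could be consumed from Python roughly as follows; this is a sketch assuming PyYAML, not necessarily how scripts/run_benchmark.py actually parses the file:

```python
# Illustrative config loader; the real benchmark may read the YAML differently.
import random
import yaml  # PyYAML

def load_experiment(path: str = "configs/experiment_default.yaml") -> dict:
    """Load the experiment config and apply its random seed."""
    with open(path, "r", encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    random.seed(cfg.get("random_seed", 0))
    return cfg

cfg = load_experiment()
print("dataset:", cfg["dataset"]["path"])
print("models :", ", ".join(m["name"] for m in cfg["models"]))
```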
A minimal pytest smoke test is included under tests/:
```bash
pytest -q
```

This verifies that all models run end-to-end and that metrics stay in valid ranges.
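The real test file may look different, but a smoke test of this kind typically has the shape sketched below; the `build_model` and `evaluate_model` imports are assumptions about the package layout, not confirmed names:

```python
# Hypothetical shape of the smoke test; adjust the imports to the real package layout.
def test_models_run_and_metrics_are_valid():
    from reasoning_core import build_model   # assumed factory name
    from evaluation import evaluate_model    # assumed entrypoint name

    for model_type in ["cot", "cot_sc", "rag_cot", "react", "fsm"]:
        model = build_model(model_type)
        metrics = evaluate_model(model, "data/tasks/sample_tasks.jsonl")
        # Accuracy-style metrics should be proportions; efficiency counts non-negative.
        for name, value in metrics.items():
            assert value >= 0, f"{name} should be non-negative, got {value}"
```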