
BibbyAI/Bibby-AI---AI-Latex-Editor-for-Researchers-


Bibby AI


Bibby AI: LaTeXBench-500 Benchmark

The first open benchmark for AI-powered LaTeX compilation error detection & repair



On LaTeXBench-500, Bibby AI detects 91.4% of LaTeX errors before they silently break your paper:
13 points ahead of OpenAI Prism and 30 points ahead of Overleaf.


🖥️ The Editor


Bibby AI is an AI-native LaTeX editor with real-time error detection, smart citation search, and live PDF preview. No plugins. No copy-paste. Everything in one place.


📊 Benchmark Results (LaTeXBench-500)

LaTeXBench-500 is the first standardised benchmark for LaTeX compilation error detection and one-click repair, introduced in our arXiv paper (arXiv:2602.16432).

Overall Performance

| Tool | Detection Accuracy (DA%) | Fix Accuracy (FA%) | Pre-compilation detection? |
|---|---|---|---|
| 🥇 Bibby AI | 91.4% | 83.7% | ✅ Yes |
| 🥈 OpenAI Prism | 78.3% | 64.1% | Partial |
| 🥉 Overleaf (native) | 61.2% | n/a (no auto-fix) | ❌ No |

Per-Category Breakdown

| Error Category | Count | Bibby DA% | Prism DA% | Overleaf DA% |
|---|---|---|---|---|
| Undefined control sequences | 112 | 94.6% | 81.2% | 68.3% |
| Math mode errors | 98 | 92.8% | 79.4% | 63.1% |
| Table & figure errors | 86 | 90.1% | 77.9% | 59.4% |
| Reference errors | 79 | 91.2% | 78.8% | 61.7% |
| Package conflicts | 74 | 88.4% | 74.6% | 54.2% |
| Encoding & font errors | 51 | 87.3% | 72.1% | 52.8% |
| Total / Average | 500 | 91.4% | 78.3% | 61.2% |

DA% = Detection Accuracy: correct identification of error type AND location (file + line number)
FA% = Fix Accuracy: the suggested fix produces a clean, semantically correct compilation
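The sketch below shows one way the two metrics could be scored. It assumes a hypothetical per-item record layout (`GroundTruth`, `Prediction`) and requires an exact file-and-line match. `evaluation/metrics.py` may use a different schema, tolerate small line offsets, or compute FA% over a different denominator.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layouts; the real schema read by evaluation/metrics.py may differ.
@dataclass(frozen=True)
class GroundTruth:
    item_id: str
    category: str
    file: str
    line: int

@dataclass(frozen=True)
class Prediction:
    item_id: str
    category: str
    file: str
    line: int
    fix_ok: Optional[bool] = None  # None = no fix proposed (e.g. a tool without auto-fix)

def detection_accuracy(truth: list[GroundTruth], preds: dict[str, Prediction]) -> float:
    """DA%: a hit requires the right category AND the exact file + line."""
    hits = sum(
        1 for gt in truth
        if (p := preds.get(gt.item_id)) is not None
        and p.category == gt.category
        and (p.file, p.line) == (gt.file, gt.line)
    )
    return 100.0 * hits / len(truth)

def fix_accuracy(truth: list[GroundTruth], preds: dict[str, Prediction]) -> float:
    """FA% (assumed denominator: all items): share whose fix was judged clean and correct."""
    ok = sum(1 for gt in truth if (p := preds.get(gt.item_id)) is not None and p.fix_ok)
    return 100.0 * ok / len(truth)

truth = [GroundTruth("a", "Math mode errors", "main.tex", 10),
         GroundTruth("b", "Reference errors", "main.tex", 42)]
preds = {"a": Prediction("a", "Math mode errors", "main.tex", 10, fix_ok=True),
         "b": Prediction("b", "Reference errors", "main.tex", 40, fix_ok=False)}
print(detection_accuracy(truth, preds), fix_accuracy(truth, preds))  # 50.0 50.0
```

Item "b" has the right category but is two lines off, so it scores as a miss. The exact-line rule is deliberately strict here, and a real evaluator might accept a small window instead.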


🔬 What Is LaTeXBench-500?

500 authentic LaTeX compilation errors drawn from real-world arXiv preprints, across 6 error categories, each with:

  • Ground-truth error location (file + line number)
  • Error category label
  • Verified correct fix
  • Compilation validation (before and after fix)

All errors are silent failures: the document compiles without crashing but produces incorrect output. This is the hardest and most practically relevant class of LaTeX errors.
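Below is one way to represent a single annotated item in code. It assumes a hypothetical JSON Lines layout with the field names shown; the actual files in `benchmark/ground_truth/` may be organised differently. The six category names are taken from the per-category table above.

```python
import json
from dataclasses import dataclass

# The six categories from the per-category table above.
CATEGORIES = {
    "Undefined control sequences", "Math mode errors", "Table & figure errors",
    "Reference errors", "Package conflicts", "Encoding & font errors",
}

@dataclass(frozen=True)
class BenchmarkItem:
    """One annotated error. Field names are hypothetical, not the repo's schema."""
    item_id: str
    file: str              # ground-truth file containing the error
    line: int              # ground-truth line number (1-based)
    category: str          # error category label, one of CATEGORIES
    fix: str               # verified correct replacement text
    compiled_before: bool  # result of the compilation check on the original document
    compiled_after: bool   # result of the compilation check after applying the fix

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.line < 1:
            raise ValueError(f"line numbers are 1-based, got {self.line}")

def load_items(path: str) -> list[BenchmarkItem]:
    """Read a JSON Lines file (one object per line, blank lines skipped)."""
    with open(path, encoding="utf-8") as fh:
        return [BenchmarkItem(**json.loads(row)) for row in fh if row.strip()]
```

Validating in `__post_init__` means a mislabelled category is caught at load time, not silently counted under a seventh bucket.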


πŸ—‚οΈ Repository Structure

bibby-latex-benchmark/
├── assets/
│   ├── bibby-mascot.png          ← Bibby mascot
│   └── bibby-editor-screenshot.png ← Editor UI
├── benchmark/
│   ├── corpus/                   ← 500 LaTeX documents
│   ├── ground_truth/             ← Annotated error locations
│   └── error_categories.md       ← Full taxonomy
├── evaluation/
│   ├── metrics.py                ← DA% and FA% calculation
│   ├── run_benchmark.py          ← Main runner
│   └── results/                  ← Raw results per tool
├── analysis/
│   ├── figures/                  ← All paper figures (reproducible)
│   └── notebooks/                ← Jupyter analysis notebooks
├── BENCHMARK.md                  ← How to run on a new tool
├── CONTRIBUTING.md
└── README.md

🚀 Run the Benchmark

Prerequisites

pip install -r requirements.txt
# Requires: Python 3.10+, latexmk, biber
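You can confirm the non-Python prerequisites before installing by checking the interpreter version and looking up `latexmk` and `biber` on `PATH`. The `missing_prerequisites` helper below is a hypothetical convenience, not a script shipped in the repository.

```python
import shutil
import sys

def missing_prerequisites(tools=("latexmk", "biber"), min_python=(3, 10)) -> list[str]:
    """List unmet requirements; an empty list means everything was found."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for tool in tools:
        if shutil.which(tool) is None:  # searches the directories on PATH
            problems.append(f"{tool} not found on PATH")
    return problems

if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```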

Evaluate a tool

python evaluation/run_benchmark.py \
  --tool bibby \
  --corpus benchmark/corpus/ \
  --output evaluation/results/my_run/

# --tool options: bibby | prism | overleaf | custom
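To evaluate the three built-in tools in one go, you can wrap the command above in a short Python loop. The flags are exactly the documented ones. The per-tool output directory names (`bibby_run/` and so on) are an assumption. `custom` is left out because it presumably needs the extra setup described in BENCHMARK.md.

```python
import subprocess
import sys
from pathlib import Path

# Built-in tools from the --tool list above; "custom" is omitted (see BENCHMARK.md).
TOOLS = ("bibby", "prism", "overleaf")

def benchmark_command(tool: str, corpus: str, output: str) -> list[str]:
    """Argument vector for the documented run_benchmark.py invocation."""
    return [sys.executable, "evaluation/run_benchmark.py",
            "--tool", tool, "--corpus", corpus, "--output", output]

def run_all(corpus: str = "benchmark/corpus/", out_root: str = "evaluation/results/") -> None:
    for tool in TOOLS:
        out_dir = Path(out_root) / f"{tool}_run"  # hypothetical per-tool directory name
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if a run exits with a non-zero status
        subprocess.run(benchmark_command(tool, corpus, str(out_dir)), check=True)
```

Call `run_all()` from the repository root. Using `sys.executable` instead of a bare `python` ensures every run uses the same interpreter, and therefore the same installed requirements.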

Compute metrics

python evaluation/metrics.py \
  --results evaluation/results/my_run/ \
  --ground-truth benchmark/ground_truth/

Reproduce paper figures

jupyter notebook analysis/notebooks/paper_figures.ipynb

🧠 Why Bibby AI Outperforms

Three architectural reasons why Bibby AI's error detection is fundamentally different:

1. AST-grounded localisation
Bibby maintains a live Abstract Syntax Tree of your document. When compiler logs point to line 847, Bibby traces back through the AST to find the actual source, which is often 20 lines earlier. Other tools trust the log line number blindly.
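Bibby's AST is not public, but a much simpler stand-in can illustrate the trace-back idea. The sketch below keeps a stack of `\begin`/`\end` environments up to the line the log complains about, and returns the `\begin` that an out-of-order `\end` exposes. It is a toy: it strips `%` comments but ignores verbatim blocks and macros that open environments indirectly. It is not Bibby's implementation.

```python
import re

ENV = re.compile(r"\\(begin|end)\{([^}]+)\}")
COMMENT = re.compile(r"(?<!\\)%.*")  # an unescaped % starts a comment

def find_unclosed_environment(source: str, reported_line: int):
    """Scan lines 1..reported_line; return (line, env) of the \\begin exposed by a
    mismatched \\end, or None if no mismatch occurs in that range."""
    stack: list[tuple[int, str]] = []
    for lineno, text in enumerate(source.splitlines()[:reported_line], start=1):
        for kind, name in ENV.findall(COMMENT.sub("", text)):
            if kind == "begin":
                stack.append((lineno, name))
            elif stack and stack[-1][1] == name:
                stack.pop()  # properly nested \end
            else:
                # \end does not match the innermost \begin: blame that \begin
                # (or report the stray \end itself if nothing is open).
                return stack[-1] if stack else (lineno, name)
    return None

doc = r"""\documentclass{article}
\begin{document}
\begin{itemize}
  \item First point
  \item Second point
% \end{itemize}   <- commented out, so the list is never closed
Some paragraph text.
\end{document}"""
print(find_unclosed_environment(doc, reported_line=8))  # (3, 'itemize')
```

The log would point at line 8 (`\end{document}`), but the scan attributes the problem to line 3, where the list was opened.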

2. Package-aware reasoning
Bibby's error model is conditioned on curated documentation for 2,000+ LaTeX packages. When \pgfplotsset fails, Bibby knows whether you're missing a \usetikzlibrary call vs. using a deprecated option β€” not just that something broke.
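Here is a toy version of package-aware diagnosis. The sketch reads each `! Undefined control sequence.` entry from the TeX log and takes the last control sequence on the echoed `l.<n>` line. It then looks that macro up in a small hand-written macro-to-package table. The `DEFINED_BY` table and the `diagnose`/`loads_package` helpers are illustrative assumptions. They cover a handful of macros, not the 2,000+ packages described above, and they miss culprits that arise inside macro expansion.

```python
import re

# Tiny illustrative macro -> package table (hypothetical stand-in for a real knowledge base).
DEFINED_BY = {
    r"\includegraphics": "graphicx",
    r"\toprule": "booktabs",
    r"\midrule": "booktabs",
    r"\bottomrule": "booktabs",
    r"\mathbb": "amssymb",
    r"\pgfplotsset": "pgfplots",
    r"\SI": "siunitx",
}

def loads_package(preamble: str, pkg: str) -> bool:
    """True if some \\usepackage[...]{...} in the preamble lists `pkg`."""
    pattern = r"\\usepackage(\[[^\]]*\])?\{[^}]*\b" + re.escape(pkg) + r"\b"
    return re.search(pattern, preamble) is not None

def diagnose(log_text: str, preamble: str) -> list[str]:
    """Turn '! Undefined control sequence.' log entries into package-aware hints."""
    hints = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("! Undefined control sequence."):
            continue
        # TeX echoes the source as "l.<n> <text read so far>"; look a few lines ahead.
        context = None
        for follow in lines[i + 1:i + 6]:
            context = re.match(r"l\.(\d+) (.*)", follow)
            if context:
                break
        if context is None:
            continue
        macros = re.findall(r"\\[A-Za-z@]+", context.group(2))
        if not macros:
            continue
        lineno, cs = int(context.group(1)), macros[-1]  # culprit is usually the last macro read
        pkg = DEFINED_BY.get(cs)
        if pkg is None:
            hints.append(f"line {lineno}: {cs} is undefined (no package known for it)")
        elif not loads_package(preamble, pkg):
            hints.append(f"line {lineno}: {cs} needs \\usepackage{{{pkg}}}")
        else:
            hints.append(f"line {lineno}: {pkg} is already loaded; check load order or package version")
    return hints

log = "! Undefined control sequence.\nl.12 \\includegraphics\n                     [width=\\linewidth]{plot.pdf}"
for hint in diagnose(log, r"\documentclass{article}\usepackage{amsmath}"):
    print(hint)  # line 12: \includegraphics needs \usepackage{graphicx}
```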

3. Validated fix generation
Every suggested fix is compiled and validated before being shown to you. Bibby never surfaces a fix that fails to compile.
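The compile-then-show loop can be sketched as follows, assuming a single-file document and `latexmk` on `PATH`. Bibliographies and multi-file projects are not handled. The compiler is passed in as a function, so the filtering logic can be exercised without a TeX installation. The function names are hypothetical. A zero exit status only shows that a candidate compiles; judging whether the output is semantically correct (what FA% measures) needs a separate check.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional

def latexmk_compiles(source: str, timeout: int = 120) -> bool:
    """Build a single-file document in a scratch directory; True if latexmk exits with 0."""
    with tempfile.TemporaryDirectory() as tmp:
        tex = Path(tmp) / "main.tex"
        tex.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                ["latexmk", "-pdf", "-interaction=nonstopmode", "-halt-on-error", tex.name],
                cwd=tmp, capture_output=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a build that hangs counts as a failed fix
        return result.returncode == 0

def first_validated_fix(candidates: Iterable[str],
                        compiles: Callable[[str], bool] = latexmk_compiles) -> Optional[str]:
    """Return the first candidate document that compiles, or None so nothing is shown."""
    for candidate in candidates:
        if compiles(candidate):
            return candidate
    return None
```

Returning `None` rather than the "least bad" candidate is what makes the guarantee hold: an unvalidated fix is never surfaced.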


πŸ›οΈ Institutional Adoption

Bibby AI is used by researchers at:

| Institution | Use Case |
|---|---|
| Simons Foundation | Mathematical research papers |
| Allen Institute | Neuroscience & biology publications |
| Yale University | Academic dissertation writing |

✍️ Try Bibby AI


→ Try Bibby AI free at trybibby.com

No credit card. No installation. Open in your browser and start writing.


📚 Citation

If you use LaTeXBench-500 in your research, please cite:

@misc{jain2026bibby,
  title     = {Bibby AI --- AI LaTeX Editor writing assistant for researchers
               vs Overleaf Alternative vs OpenAI Prism},
  author    = {Jain, Nilesh and others},
  year      = {2026},
  eprint    = {2602.16432},
  archivePrefix = {arXiv},
  primaryClass  = {cs.DL},
  url       = {https://arxiv.org/abs/2602.16432}
}

🤝 Contributing

We welcome:

  • New tool evaluations: run the benchmark on any tool and submit results via PR
  • Additional error categories: open an issue to propose new LaTeX error types
  • Corpus extensions: more arXiv-derived documents with ground-truth annotations

See CONTRIBUTING.md for guidelines.


📄 License

Benchmark code & evaluation scripts: MIT License
Corpus documents: Derived from arXiv papers under their respective CC licenses
Paper: CC BY 4.0


Made with 💙 by the Bibby AI team

trybibby.com · arXiv:2602.16432 · @BibbyResearch


About

Welcome to Bibby AI, an open-source and cloud AI LaTeX editor for researchers.
