“We don't just read history. We translate time.”
Chronos-VL is a specialized Vision–Language Model system designed to decipher Early Modern Spanish Gothic manuscripts (c. 1545).
Built for the Baidu ERNIE AI Developer Challenge.
Millions of pages of legal, social, and cultural history remain locked in Spanish archives (e.g., The RODRIGO Corpus).
Standard OCR (Tesseract / base models) fails catastrophically (≈68% error rate) due to:
- Ink bleed-through
- Dense ligatures
- Gothic calligraphy
- The infamous Long S (ſ / f) ambiguity
The Gap
Historians need searchable, modernized text, not just noisy transcriptions.
Chronos-VL introduces a two-stage vision–language pipeline purpose-built for 16th-century manuscripts.
- Fine-tuned PaddleOCR-VL-0.9B
- Training via ERNIEKit (SFT)
- Optimized on NVIDIA A100 (80GB)
- Learns period-specific Gothic features and ligatures
- Chronos Engine post-processes OCR output
- Normalizes archaic spelling: dixo → dijo, facer → hacer
- Produces clean, modern Spanish
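The modernization step can be pictured as a small rule-based rewriter. The sketch below is illustrative only: the rule table and `modernize` function are assumptions for demonstration, not the actual mapping inside the repository's `ChronosPostProcessor`.

```python
import re

# Hypothetical rule table; the real ChronosPostProcessor in
# chronos_processing.py uses its own (larger) mapping.
ARCHAIC_RULES = [
    (r"\bdixo\b", "dijo"),    # x -> j in old forms of "decir"
    (r"\bfacer\b", "hacer"),  # word-initial f -> h
    (r"ſ", "s"),              # long s -> modern s
]

def modernize(text: str) -> str:
    """Apply each archaic-to-modern substitution in order."""
    for pattern, replacement in ARCHAIC_RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(modernize("El teſtigo dixo que era menester facer la obra"))
# -> El testigo dijo que era menester hacer la obra
```

In practice, ordering matters: token-level substitutions (dixo, facer) run before the character-level ſ → s pass so that word boundaries are matched on the original spellings.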
A/B testing was conducted on 100 unseen manuscript pages.
| Metric | PaddleOCR (Base) | Chronos-VL (Ours) | Improvement |
|---|---|---|---|
| Median Character Error Rate | 19.82% | 1.64% | 12× Better |
| Usable Output (<5% Error) | 1% | 77% | 76× Increase |
| Word Error Rate | 74.44% | 17.35% | 4× Better |
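Character Error Rate (CER) and Word Error Rate (WER) are standard edit-distance metrics. A minimal self-contained sketch of how they are computed (not the code from `Evaluation.ipynb`):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character-level edit distance, normalized by reference length.
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference: str, hypothesis: str) -> float:
    # Same distance computed over whitespace-split tokens.
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

print(f"CER: {cer('dijo', 'dixo'):.2%}")  # one substitution in four chars
```

A page counts toward "Usable Output" in the table above when its CER falls below 5%.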
Upload any 16th-century manuscript and see Chronos-VL in action.
Here is a breakdown of the core files in this repository:
| File | Description |
|---|---|
| 📓 Demo_Chronos_VL.ipynb | The Interactive App. A complete Colab notebook that launches the Gradio interface, allowing you to upload images and compare the baseline and fine-tuned models. |
| 📓 Evaluation.ipynb | The Proof. The script used to benchmark the model against 100 unseen images. Generates the CER/WER statistics and comparisons. |
| 🐍 chronos_processing.py | The Logic Layer. Contains the custom ChronosPostProcessor class for hallucination filtering and archaic-text modernization. |
| 📄 Finetuning_script.txt | The Training Protocol. The exact commands and configurations used with ERNIEKit to train the model on the NVIDIA A100 GPU. |
| 🖼️ Rodrigo_*.png | Sample Data. Authentic 1545 manuscript fragments from the test set. You can use these to test the demo immediately. |
| 📊 training_loss.jpg | Convergence Metrics. Visual evidence of the training process showing the reduction in loss over 400 steps. |
| 🖼️ finetune_success.jpg | Visual Evidence. A side-by-side comparison showing how well the fine-tuned model performs. |
- Base Model: PaddleOCR-VL-0.9B (Vision-Language)
- Training Framework: ERNIEKit (SFT)
- Hardware: NVIDIA A100 (80GB)
- Dataset: RODRIGO Corpus (1545) - 9,000 Text Lines
https://www.youtube.com/watch?v=PaK24VT_3Jk
- Baidu PaddlePaddle Team for the ERNIEKit framework.
- Universitat Politècnica de València for the RODRIGO dataset.
