Hi! I'd like to suggest adding CAJAL to this excellent curated list of scientific language models.
What is CAJAL?
CAJAL is a family of fine-tuned language models (4B–9B parameters, based on Qwen3.6-27B) specifically optimized for autonomous scientific paper generation. It produces full IMRaD-structured papers with verified citations, mathematical notation, and Lean 4 formal verification support.
Why it fits this list:
- 🧠 Domain: General science (multi-disciplinary — physics, chemistry, biology, materials, math)
- 📝 Task: End-to-end scientific writing — abstract, introduction, methods, results, discussion, conclusions
- 🔬 Unique feature: Integrates with Lean 4 for formal proof verification within generated papers
- 📊 Performance: Evaluated on internal benchmarks of scientific reasoning tasks
- 💻 Deployment: Runs locally on consumer hardware (2 GB of VRAM for the 4B model, 6 GB for the 9B model)
Links:
License: Apache 2.0
Would love to see CAJAL listed alongside other scientific LLMs like Galactica and Llemma. Happy to provide any additional details needed!
Submitted with respect for the curation standards of this list.