Add CAJAL: Scientific Language Model for Paper Generation (4B-9B, Open Source) #11

@Agnuxo1

Proposal: Add CAJAL — Scientific Language Model for Paper Generation

CAJAL is a family of open-source language models (4B-9B parameters) trained to generate structured scientific papers with real citations, LaTeX output, and domain-specific reasoning.

Why CAJAL fits this list:

  • Scientific Language Model — Trained on 500+ scientific papers with structured sections
  • Academic Domain — Generates Abstract, Introduction, Methods, Results, Discussion, Conclusions
  • Open Source — Qwen-based architecture, MIT licensed, full training scripts available
  • Local Execution — GGUF format for llama.cpp/Ollama, runs on 4-6GB VRAM
  • Citations — Integrates with OpenAlex/Semantic Scholar for real bibliography
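
To illustrate the citation point above, here is a minimal sketch of how a bibliography entry could be built from OpenAlex metadata. It is not CAJAL's actual integration code, which this issue does not show. The helper names `build_search_url` and `work_to_bibtex`, the citation-key scheme, and the sample record are all hypothetical. The sketch relies only on the public OpenAlex `/works` endpoint and its `title`, `publication_year`, `doi`, and `authorships` fields.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_search_url(query: str, per_page: int = 5) -> str:
    """Return an OpenAlex works-search URL for a free-text query."""
    params = {"search": query, "per-page": per_page}
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


def work_to_bibtex(work: dict) -> str:
    """Format one OpenAlex work record as a BibTeX @article entry.

    The citation key (first author's surname + year) is an assumption
    made for this sketch, not a convention taken from CAJAL.
    """
    authors = [a["author"]["display_name"] for a in work.get("authorships", [])]
    year = work.get("publication_year")
    surname = authors[0].split()[-1].lower() if authors else "anon"
    key = f"{surname}{year or ''}"

    fields = {
        "title": work.get("title") or "",
        "author": " and ".join(authors),
        "year": str(year) if year else "",
        # OpenAlex returns DOIs as full URLs; BibTeX expects the bare DOI.
        "doi": (work.get("doi") or "").removeprefix("https://doi.org/"),
    }
    # Skip empty fields so the entry never contains blank values.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{key},\n{body}\n}}"


# Placeholder record shaped like an OpenAlex result (not a real paper).
sample_work = {
    "title": "Example Title",
    "publication_year": 2020,
    "doi": "https://doi.org/10.0000/example",
    "authorships": [
        {"author": {"display_name": "Ada Example"}},
        {"author": {"display_name": "Bo Sample"}},
    ],
}

if __name__ == "__main__":
    print(build_search_url("graph neural networks"))
    print(work_to_bibtex(sample_work))
```

In practice the URL would be fetched with any HTTP client and each item of the response's `results` list passed to `work_to_bibtex`. Keeping the formatting step separate from the network call makes it easy to check that every citation comes from a retrieved record rather than from generated text.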

Model Specs:

| Model | Base | Size | Context | Format |
|---|---|---|---|---|
| CAJAL-4B | Qwen2.5-4B-Instruct | ~3GB (Q4_K_M) | 32K | GGUF, PyTorch |
| CAJAL-9B | Qwen3.6-9B-Instruct | ~5.5GB (Q5_K_M) | 32K | GGUF, PyTorch |

Links:

Suggested Section:

Scientific Paper Generation (new subsection)

| Model | Paper | GitHub | Size | Domain |
|---|---|---|---|---|
| CAJAL | | GitHub | 4B-9B | Multi-domain scientific papers |

Happy to submit a PR if there's interest!