LitGPT is designed as an installable product rather than a fragile dependency stack. Where large frameworks such as Hugging Face Transformers can run into version conflicts between their many components, LitGPT installs as a self‑contained environment with its own compatible dependencies. This makes it well suited to users who want a “black‑box” installation that just works.
LitGPT provides:
- A command‑line interface (CLI) for downloading models, fine‑tuning, pretraining, inference, and evaluation.
- A Python API for dynamic or programmatic workflows.
- Support for JSON datasets, YAML recipes, and custom data loaders.
- Ready‑made tutorials for fine‑tuning, pretraining, LoRA, QLoRA, and more.
Below is a practical guide to using LitGPT for Q&A‑style training, including command‑line examples, server‑based workflows, and card formats.
LitGPT installs cleanly:
```bash
pip install 'litgpt[all]'
```
This installs:
- LitGPT core
- Tokenizers
- Flash‑attention
- Training utilities
- CLI tools
No dependency juggling required.
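To confirm the install succeeded from Python, a minimal check using only the standard library (`litgpt` will only be found after the `pip install` above):

```python
import importlib.util

def is_installed(pkg: str) -> bool:
    """Return True when the package is importable in the current environment."""
    return importlib.util.find_spec(pkg) is not None

print("litgpt installed:", is_installed("litgpt"))
```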
LitGPT can download models directly from Hugging Face:
```bash
litgpt download microsoft/phi-2
```
You can list all available models:
```bash
litgpt download list
```
LitGPT fine‑tuning expects JSON datasets in Alpaca‑style or instruction‑style format.
Example command:
```bash
litgpt finetune microsoft/phi-2 \
  --data JSON \
  --data.json_path my_cards.json \
  --data.val_split_fraction 0.1
```
LitGPT supports Alpaca‑style entries:
```json
[
  {
    "instruction": "Question text here",
    "input": "",
    "output": "Answer text here"
  }
]
```
Or simple Q&A:
```json
[
  {
    "question": "What is X?",
    "answer": "X is ..."
  }
]
```
LitGPT automatically tokenizes and formats these.
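If your cards start in the simple question/answer shape, a few lines of Python can normalize them to Alpaca style up front (a sketch; the field names follow the examples above, and `my_cards.json` is just an illustrative filename):

```python
import json

def qa_to_alpaca(cards):
    """Map simple {"question", "answer"} cards to Alpaca-style entries."""
    return [
        {"instruction": c["question"], "input": "", "output": c["answer"]}
        for c in cards
    ]

qa_cards = [{"question": "What is X?", "answer": "X is ..."}]
with open("my_cards.json", "w", encoding="utf-8") as f:
    json.dump(qa_to_alpaca(qa_cards), f, indent=2)
```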
You can serve your Q&A cards from:
- A Flask server
- A static JSON endpoint
- A local file server
- An Anki export converted to JSON
LitGPT does not require a database — any JSON file or URL is acceptable.
A minimal Flask example:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/cards")
def cards():
    return jsonify([
        {"instruction": "What is 2+2?", "output": "4"},
        {"instruction": "Define entropy.", "output": "Entropy is ..."},
    ])

if __name__ == "__main__":
    app.run()
```
Then train with:
```bash
litgpt finetune microsoft/phi-2 \
  --data JSON \
  --data.json_path http://localhost:5000/cards
```
(If your LitGPT version rejects URLs here, save the response to a local file first.)

LitGPT has a Python API for dynamic dataset creation.
Example:
```python
from litgpt import Trainer, Config

cards = [
    {"instruction": "Explain gravity.", "output": "Gravity is ..."},
    {"instruction": "What is a neuron?", "output": "A neuron is ..."},
]

cfg = Config(model="microsoft/phi-2", data=cards)
trainer = Trainer(cfg)
trainer.finetune()
```
You can generate cards on the fly, load them from Anki, or synthesize them programmatically.
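Even without the Python API, you can synthesize cards programmatically and hand the resulting file to the CLI. A sketch, where the glossary data and template strings are made up for illustration:

```python
import json

# Hypothetical glossary to turn into Q&A cards.
glossary = {
    "gravity": "the attraction between masses",
    "neuron": "a cell that transmits electrical signals",
}

cards = [
    {"instruction": f"Define {term}.", "output": f"{term.capitalize()} is {meaning}."}
    for term, meaning in glossary.items()
]

with open("generated_cards.json", "w", encoding="utf-8") as f:
    json.dump(cards, f, indent=2)

# Then: litgpt finetune microsoft/phi-2 --data JSON --data.json_path generated_cards.json
```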
LitGPT supports several dataset formats:
| Format | Supported? | Notes |
|---|---|---|
| Q&A | ✔️ Yes | Simplest format; works everywhere |
| Instruction + Input + Output | ✔️ Yes | Alpaca‑style; widely used |
| Documentation → Q&A | ✔️ Yes | Must be converted to JSON |
| ChatML / Multi‑turn | ✔️ Partial | Requires custom data loader |
| Anki Q&A | ❌ Not directly | Must be exported → JSON |
Yes. Even instruction‑style datasets are fundamentally Q&A:
- Instruction = Question
- Output = Answer
LitGPT treats them as prompt → completion pairs.
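One common way this prompt → completion mapping is rendered is the Alpaca template. The sketch below shows that rendering; the exact template your LitGPT version applies may differ:

```python
def render_prompt(entry: dict) -> str:
    """Render an Alpaca-style entry as a single prompt string."""
    if entry.get("input"):
        return (
            "### Instruction:\n{instruction}\n\n"
            "### Input:\n{input}\n\n"
            "### Response:\n"
        ).format(**entry)
    return "### Instruction:\n{instruction}\n\n### Response:\n".format(**entry)

entry = {"instruction": "What is 2+2?", "input": "", "output": "4"}
prompt = render_prompt(entry)   # model input
completion = entry["output"]    # training target
```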
Not directly, but Anki exports to:
- CSV
- JSON (via add‑ons)
- TSV
These can be trivially converted to LitGPT JSON format.
Thus: Anki → JSON → LitGPT is the standard workflow.
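That conversion step is small. Anki's plain‑text export is tab‑separated (front, back), so a short script can produce LitGPT‑ready JSON (a sketch, assuming a two‑column export; the file names are illustrative):

```python
import csv
import json

def anki_tsv_to_json(tsv_path: str, json_path: str) -> list:
    """Convert an Anki plain-text export (front<TAB>back) to Alpaca-style JSON."""
    cards = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            # Skip Anki's header comments (e.g. "#separator:tab") and short rows.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            cards.append({"instruction": row[0], "input": "", "output": row[1]})
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(cards, f, indent=2)
    return cards
```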
In detail:
- Anki: your universal source format; it keeps the cards clear and correct.
- Conversion tooling: SpaCy, DatSu, Python scripts, or custom parsers.
- JSON: the final training dataset.
- CLI or Python API: depending on your automation needs.
The complete workflow, as CLI commands:

```bash
# Download the model
litgpt download microsoft/phi-2

# Fine-tune on your cards
litgpt finetune microsoft/phi-2 \
  --data JSON \
  --data.json_path my_cards.json

# Run inference
litgpt generate microsoft/phi-2 \
  --prompt "Explain entropy."

# Evaluate
litgpt evaluate microsoft/phi-2 \
  --data JSON \
  --data.json_path test_cards.json
```
- LitGPT Home: https://lightning.ai/docs/litgpt
- CLI Interface: https://deepwiki.com/Lightning-AI/litgpt/7.4-cli-interface
- Tutorials (finetuning, pretraining, inference):
  https://github.com/Lightning-AI/litgpt/tree/main/tutorials
- PyPI package: https://pypi.org/project/litgpt
Use:
```bash
litgpt finetune model-name --data JSON --data.json_path file.json
```
Yes — any JSON endpoint works.
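If your LitGPT release only accepts local paths for `--data.json_path` (behavior may vary by version), a small helper can mirror the endpoint to disk first (a sketch; the URL and filename are illustrative):

```python
import json
import urllib.request

def fetch_cards(url: str, out_path: str) -> list:
    """Download a JSON card list from any endpoint and save it locally."""
    with urllib.request.urlopen(url) as resp:
        cards = json.load(resp)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(cards, f, indent=2)
    return cards

# Example (assumes the Flask server from earlier is running):
# fetch_cards("http://localhost:5000/cards", "my_cards.json")
```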
Yes — Python API supports dynamic datasets.
Supported formats include:
- Q&A
- Instruction → Output
- Instruction + Input → Output
- Documentation converted to Q&A
Yes: all formats reduce to “prompt → completion.”
Not directly — but Anki → JSON → LitGPT works perfectly.