Current Status: ✅ READY TO TRAIN

| Metric | Value |
|---|---|
| Total Examples | 5,586,092 |
| Size | 13 GB |
| Dark Content | 20% (1.1M examples) |
| 10% Sample | 559K examples (1.27 GB) |
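
The counts above can be cross-checked against the files on disk. Below is a minimal sketch. It assumes one example per line (standard JSONL) and the dataset paths from the layout listed near the end of this guide. `count_examples` and `sample_share_pct` are hypothetical helpers, not part of the project's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: count JSONL examples and report the sample's share of the corpus.
# Assumes one example per line. wc -l counts newline characters, so a file whose
# last line lacks a trailing newline is undercounted by one.

count_examples() {
  wc -l < "$1" | tr -d ' '
}

# Prints the sample size as a percentage of the corpus, rounded to one decimal place.
sample_share_pct() {
  local corpus_n sample_n permille
  corpus_n=$(count_examples "$1")
  sample_n=$(count_examples "$2")
  (( corpus_n > 0 )) || { echo "corpus is empty: $1" >&2; return 1; }
  # Bash only does integer math: round to the nearest part per thousand,
  # i.e. floor((sample * 1000 + corpus / 2) / corpus), scaled by 2 to stay integral.
  permille=$(( (sample_n * 2000 + corpus_n) / (2 * corpus_n) ))
  printf '%d.%d\n' $(( permille / 10 )) $(( permille % 10 ))
}

# Usage:
#   sample_share_pct examples/datasets/FINAL_CORPUS_7M_PLUS_ESOTERIC.jsonl \
#     examples/datasets/LEVIATHAN_10PCT_SAMPLE.jsonl
```

With the figures in the table, the result should land close to 10.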

```bash
# Pre-flight check (optional but recommended)
./scripts/pre_flight_check.sh
```
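
This guide does not show what `scripts/pre_flight_check.sh` checks. As a rough idea of what such a check can cover, here is a minimal, hypothetical sketch. It confirms that `nvidia-smi` is available and that enough disk space is free for checkpoints. The function names and the 50 GB threshold in the usage line are illustrative assumptions, not values from this project.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not the project's pre_flight_check.sh).

# Report GPU name and total memory; fails if the NVIDIA driver tools are missing.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
}

# Print whole gigabytes available on the filesystem that holds the given path.
free_gb() {
  # -P: POSIX single-line output; -k: 1024-byte blocks; column 4 is "available".
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Succeeds only if at least the requested number of GB is free.
require_free_gb() {
  local path=$1 need=$2 have
  have=$(free_gb "$path") || return 1
  if (( have < need )); then
    echo "only ${have} GB free at $path (need ${need} GB)" >&2
    return 1
  fi
}

# Usage (the threshold is an example, not a project requirement):
#   check_gpu && require_free_gb . 50
```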

```bash
# Start the 10% test on the local A4000 (24-36 hours)
./scripts/train_10pct_a4000.sh
```

Or, on a rented Lambda Labs A100:

```bash
# 10% test (4-6 hours, ~$8)
./scripts/train_10pct_test.sh

# Full training (48-72 hours, ~$66)
./scripts/train_full.sh
```

```bash
# Watch logs
tail -f work/training/leviathan_10pct_a4000/logs/training_*.log

# Monitor GPU
watch -n 1 nvidia-smi
```
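
A small filter can pull the most recent training loss out of the logs, which is easier than reading `tail -f` output by eye. This sketch assumes the logs contain Hugging Face Trainer-style metric lines such as `{'loss': 1.4321, 'grad_norm': 0.52, 'learning_rate': 0.0002, 'epoch': 0.42}`. Check your actual log format before relying on it. `latest_loss` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last training loss found in one or more log files.
# Assumes Trainer-style lines like {'loss': 1.4321, ...}. The pattern requires a
# quote directly before "loss", so 'eval_loss' and 'train_loss' are not matched.
# Files are scanned in the order given; the last match overall wins.
latest_loss() {
  grep -hoE "'loss': [0-9]+(\.[0-9]+)?" "$@" | tail -n 1 | awk '{ print $2 }'
}

# Usage:
#   latest_loss work/training/leviathan_10pct_a4000/logs/training_*.log
```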

```bash
# Check latest checkpoint
source scripts/checkpoint_utils.sh
latest_checkpoint
```

```bash
# Pause: press Ctrl+C once
# Training saves a checkpoint and stops

# Resume: run the same command again
./scripts/train_10pct_a4000.sh
# Automatically resumes from the last checkpoint
```

```bash
# Load utilities
source scripts/checkpoint_utils.sh

# List all checkpoints
list_checkpoints

# View latest details
latest_checkpoint

# Clean up old checkpoints (keep 2 most recent)
cleanup_checkpoints work/training/leviathan_10pct_a4000 2
```

| Run | Duration | Cost (local A4000) |
|---|---|---|
| 10% test | 24-36 hours | Free (power ~$7) |
| Full training | 10-15 days | Free (power ~$70) |

| Run | Duration | Cost (Lambda A100) |
|---|---|---|
| 10% test | 4-6 hours | ~$8 |
| Full training | 48-72 hours | ~$66 |
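
Both tables are estimates. To redo the arithmetic with your own numbers, you can use a sketch like the one below. It multiplies hours by an hourly rate for cloud runs, and hours × average system power × electricity price for local runs. You supply the rate, wattage, and price; none of them come from this guide. `cloud_cost` and `power_cost` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical cost estimators; every rate is a caller-supplied assumption.

# cloud_cost HOURS USD_PER_HOUR
cloud_cost() {
  awk -v h="$1" -v r="$2" 'BEGIN { printf "%.2f\n", h * r }'
}

# power_cost HOURS AVG_WATTS USD_PER_KWH  (use whole-system draw, not just the GPU)
power_cost() {
  awk -v h="$1" -v w="$2" -v p="$3" 'BEGIN { printf "%.2f\n", h * w / 1000 * p }'
}

# Usage:
#   cloud_cost 72 HOURLY_RATE
#   power_cost 36 AVG_WATTS PRICE_PER_KWH
```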

```bash
# 1. Test on your A4000 (validate setup)
./scripts/train_10pct_a4000.sh  # 24-36 hrs, FREE

# 2. Verify it works
python3 scripts/test_inference.py \
  --adapter work/training/leviathan_10pct_a4000/checkpoint-final

# 3. If successful, rent a Lambda Labs A100 for full training
# Total cost: ~$70; total time: 3-4 days
```

```yaml
# To reduce GPU memory use, edit configs/leviathan_10pct_a4000.yaml
sequence_len: 1024  # was 2048
lora_r: 16          # was 32
```

```bash
# Training interrupted? Just restart; it auto-resumes
./scripts/train_10pct_a4000.sh
```

```bash
# Force-stop training
pkill -f axolotl
# Then resume when ready by rerunning the training script
```

Related guides:

- TRAINING_OPTIONS_SUMMARY.md - Choose your strategy
- A4000_TRAINING_GUIDE.md - A4000 optimization tips
- docs/CHECKPOINTING_GUIDE.md - Pause/resume details
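
Auto-resume and checkpoint cleanup both depend on ordering `checkpoint-<step>` directories by step number. A plain alphabetical sort puts `checkpoint-200` after `checkpoint-1000`. The sketch below is a hypothetical illustration of that ordering. It is not the implementation of `latest_checkpoint` or `cleanup_checkpoints` in `scripts/checkpoint_utils.sh`. It assumes checkpoint directories are named `checkpoint-<integer>`, which is the Hugging Face Trainer convention, and it skips non-numeric names such as `checkpoint-final`. `checkpoints_to_prune` only prints paths; it deletes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not the functions defined in scripts/checkpoint_utils.sh.

# Print "<step><TAB><path>" for every checkpoint-<integer> directory, sorted by step.
list_numeric_checkpoints() {
  local run_dir=$1 d step
  for d in "$run_dir"/checkpoint-*/; do
    [ -d "$d" ] || continue          # glob matched nothing
    d=${d%/}
    step=${d##*/checkpoint-}
    case $step in
      ''|*[!0-9]*) continue ;;       # skip checkpoint-final and similar
    esac
    printf '%s\t%s\n' "$step" "$d"
  done | sort -n -k1,1
}

# Newest checkpoint by step number; prints nothing if there are none.
find_latest_checkpoint() {
  list_numeric_checkpoints "$1" | tail -n 1 | cut -f2-
}

# Dry run: print every checkpoint except the KEEP most recent (default 2).
checkpoints_to_prune() {
  local keep=${2:-2}
  list_numeric_checkpoints "$1" |
    awk -v keep="$keep" '{ line[NR] = $0 } END { for (i = 1; i <= NR - keep; i++) print line[i] }' |
    cut -f2-
}

# Usage:
#   find_latest_checkpoint work/training/leviathan_10pct_a4000
#   checkpoints_to_prune work/training/leviathan_10pct_a4000 2
```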

Success indicators:

- Loss: <1.5 (good), <1.3 (excellent)
- Identity: Responds as Leviathan
- Dark content: Reduced refusal
- Coding/reasoning: Maintained quality
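
The loss thresholds above can be applied mechanically. Below is a minimal sketch. `rate_loss` is a hypothetical helper, and the "above target" label for losses of 1.5 or more is an assumption; the guide itself names only "good" and "excellent".

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a training loss against this guide's thresholds
# (< 1.3 excellent, < 1.5 good). "above target" is an illustrative label.
rate_loss() {
  # awk handles the floating-point comparison that bash arithmetic cannot.
  awk -v l="$1" 'BEGIN {
    x = l + 0
    if (x < 1.3)      print "excellent"
    else if (x < 1.5) print "good"
    else              print "above target"
  }'
}

# Usage:
#   rate_loss 1.42
```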

Expected model characteristics:

- Strong Leviathan persona
- Light-moderate refusal reduction (20% dark)
- Deep esoteric knowledge
- Maintained benchmarks (HumanEval ~40-50%, TruthfulQA ~50-60%)

```bash
# Start a screen session
screen -S leviathan

# Launch training inside it
./scripts/train_10pct_a4000.sh

# Detach: Ctrl+A, then D
# Reattach later: screen -r leviathan
```

```text
configs/
├── leviathan_10pct_a4000.yaml    ← A4000 config
├── leviathan_10pct_test.yaml     ← A100 test
└── leviathan_full_training.yaml  ← A100 full
examples/datasets/
├── FINAL_CORPUS_7M_PLUS_ESOTERIC.jsonl  ← 5.59M examples (13 GB)
└── LEVIATHAN_10PCT_SAMPLE.jsonl         ← 559K examples (1.27 GB)
work/training/
└── leviathan_10pct_a4000/               ← Training output
    ├── checkpoint-*/                    ← Saved checkpoints
    └── logs/                            ← Training logs
```
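
Before launching, you may want to confirm that the config and dataset files from this layout are actually in place. Below is a hypothetical sketch. The paths are copied from the tree above, and `check_layout` is not one of the project's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical check: report config/dataset files missing from the expected layout.
# Returns the number of missing or empty files (0 means everything is present).
check_layout() {
  local root=${1:-.} missing=0 p
  for p in \
    configs/leviathan_10pct_a4000.yaml \
    configs/leviathan_10pct_test.yaml \
    configs/leviathan_full_training.yaml \
    examples/datasets/FINAL_CORPUS_7M_PLUS_ESOTERIC.jsonl \
    examples/datasets/LEVIATHAN_10PCT_SAMPLE.jsonl
  do
    if [ ! -s "$root/$p" ]; then
      echo "missing or empty: $p" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Usage (from the repository root):
#   check_layout . && echo "layout OK"
```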

To start, run:

```bash
./scripts/train_10pct_a4000.sh
```

That's it. Everything else (checkpointing, resuming) is automatic.

Quick Reference • See SYSTEM_READY_STATUS.md for full details