A multiplayer prediction market game with autonomous AI agents and continuous RL training.
Babylon includes a complete continuous RL training system with two modes:
Local Mode: train on your own GPU - perfect for development and testing.
```bash
cd python

# Setup (no W&B key needed!)
export DATABASE_URL=postgresql://your-db-url
export TRAIN_RL_LOCAL=true

# Train
python -m src.training.babylon_trainer
```

Features:
- ✅ Uses your local GPU (free!)
- ✅ All data in YOUR PostgreSQL
- ✅ Local inference serving
- ✅ Perfect for development
Cost: $0
Cloud Mode: train on W&B's managed infrastructure - perfect for production.
```bash
cd python

# Setup (with W&B key)
export DATABASE_URL=postgresql://your-db-url
export WANDB_API_KEY=your-wandb-key  # Get from wandb.ai/settings
export TRAIN_RL_LOCAL=true

# Train
python -m src.training.babylon_trainer
```

Features:
- ✅ W&B manages all GPUs (no setup!)
- ✅ All data in YOUR PostgreSQL
- ✅ W&B hosted inference (automatic!)
- ✅ Perfect for production
Cost: ~$820-1720/month (vs $7,000+ self-managed)
```bash
cd python
pip install openpipe-art==0.5.1 asyncpg python-dotenv
```

```bash
# Copy template
cp env.template .env

# Edit with your values
nano .env
```

For Local Mode:

```bash
DATABASE_URL=postgresql://your-db-url
TRAIN_RL_LOCAL=true
# That's it! Will use local GPU
```

For Cloud Mode (add this):

```bash
WANDB_API_KEY=your-wandb-key
```

Then load the environment and run the database migration:

```bash
source .env
psql $DATABASE_URL -f migrations/002_add_self_hosted_tables.sql
```

List the training windows that are ready:

```bash
MODE=list python -m src.training.babylon_trainer
```

Output:
```
Ready windows (3):
  2025-01-15T10:00: 5 agents
  2025-01-15T11:00: 4 agents
  2025-01-15T12:00: 3 agents
```
Train on a single window:

```bash
MODE=single python -m src.training.babylon_trainer
```

- Local Mode: Trains on your GPU
- Cloud Mode: Trains on W&B serverless
| Feature | Local Mode | Cloud Mode |
|---|---|---|
| GPU | Your GPU | W&B managed (CoreWeave) |
| Setup | Zero | Zero |
| Cost | Free | ~$820/month |
| Training Time | Depends on your GPU | ~15 min |
| Inference | Local serving | W&B hosted |
| Deployment | Manual | Automatic |
| Best For | Development | Production |
| Requires | DATABASE_URL | DATABASE_URL + WANDB_API_KEY |
Both modes:
- ✅ All data in YOUR PostgreSQL
- ✅ No OpenPipe API dependency
- ✅ Local heuristic scoring
- ✅ Automatic inference setup
Your AI agents learn from their trading performance:
- Agents play → Trajectories logged to PostgreSQL
- Time windows → Group agents by hour (fair comparison)
- Local scoring → Heuristic scoring (P&L + win rate + activity); see the sketch below
- ART training → Fine-tune model with RL
- Inference ready → Agents use improved model
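The local scoring step is a plain heuristic rather than an external judge. A minimal sketch of the idea, with hypothetical field names and weights (the actual scorer lives in python/src/training/babylon_trainer.py):

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    """Per-agent stats aggregated over one hourly window (hypothetical shape)."""
    pnl: float        # realized profit/loss in the window
    win_rate: float   # fraction of resolved positions that were profitable, 0-1
    num_trades: int   # activity during the window

def heuristic_score(stats: WindowStats, max_abs_pnl: float) -> float:
    """Combine P&L, win rate, and activity into a single reward in [0, 1].

    Weights are illustrative assumptions, not the trainer's actual values.
    """
    # Normalize P&L to [0, 1] relative to the largest absolute P&L in the window.
    pnl_term = 0.5 * (stats.pnl / max_abs_pnl + 1) / 2 if max_abs_pnl else 0.25
    win_term = 0.3 * stats.win_rate
    activity_term = 0.2 * min(stats.num_trades / 10, 1.0)  # cap activity credit
    return pnl_term + win_term + activity_term
```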
End-to-end data flow:

```
TypeScript Agents (MMO)
        ↓  (log trajectories with window_id)
PostgreSQL (YOUR database)
        ↓  (query by window)
Python Trainer
        ├─ Collect from database
        ├─ Score locally (no external API)
        └─ Create ART trajectories
        ↓
ART ServerlessBackend
        ├─ Local Mode: Uses your GPU
        └─ Cloud Mode: W&B manages GPUs on CoreWeave
        ↓
Trained Model
        ├─ Local Mode: Checkpoint saved locally
        └─ Cloud Mode: W&B hosted inference endpoint
        ↓
Better Agents! 🎯
```
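The window_id in the diagram is simply an hourly bucket, and "query by window" is a GROUP BY over the trajectories table. A rough sketch using asyncpg (installed in the setup above); "window_id" and "agentId" are assumed column names for illustration, so check migrations/002_add_self_hosted_tables.sql for the real schema:

```python
import asyncio
import os
from datetime import datetime, timezone

import asyncpg  # installed in the setup step above

def window_id(ts: datetime) -> str:
    """Bucket a trajectory timestamp into its hourly window,
    matching the MODE=list output format (e.g. 2025-01-15T10:00)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00")

async def list_ready_windows(min_agents: int = 3) -> list[asyncpg.Record]:
    """List windows with enough distinct agents to train on (compare --min-agents)."""
    conn = await asyncpg.connect(os.environ["DATABASE_URL"])
    try:
        return await conn.fetch(
            '''
            SELECT "window_id", COUNT(DISTINCT "agentId") AS agents
            FROM trajectories
            GROUP BY "window_id"
            HAVING COUNT(DISTINCT "agentId") >= $1
            ORDER BY "window_id"
            ''',
            min_agents,
        )
    finally:
        await conn.close()

if __name__ == "__main__":
    for row in asyncio.run(list_ready_windows()):
        print(f'{row["window_id"]}: {row["agents"]} agents')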
File: python/src/training/babylon_trainer.py
Features:
- Follows ART ServerlessBackend pattern
- Local heuristic scoring (no external API)
- Automatic local/cloud mode switching
- All data in YOUR database
Run:
```bash
# Local mode (free)
MODE=list python -m src.training.babylon_trainer
MODE=single python -m src.training.babylon_trainer

# Cloud mode (add WANDB_API_KEY)
export WANDB_API_KEY=your-key
MODE=single python -m src.training.babylon_trainer
```

File: python/src/training/trainer.py
Features:
- Uses ART's `ruler_score_group()` for RULER scoring (see the sketch after this list)
- External LLM judges agents (higher quality)
- Production-tested
- Complete data bridge integration
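For orientation, RULER scoring hands a whole window's trajectory group to an external LLM judge. A hedged sketch of the call; the `art.rewards` import path, the exact signature of `ruler_score_group()`, and the judge model string are assumptions to verify against your installed openpipe-art version:

```python
import art
from art.rewards import ruler_score_group  # assumed import path - verify in your ART version

async def judge_window(group: art.TrajectoryGroup) -> art.TrajectoryGroup:
    """Have an external LLM rank the trajectories from one training window.

    "openai/o4-mini" is a placeholder judge model, not a project default.
    """
    return await ruler_score_group(group, "openai/o4-mini", debug=True)
```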
Run:
```bash
python -m src.training.trainer \
  --min-agents 3 \
  --lookback-hours 48 \
  --model Qwen/Qwen2.5-0.5B-Instruct
```

The beauty of ART's ServerlessBackend is that it switches modes automatically:
```bash
# Don't set WANDB_API_KEY
export DATABASE_URL=postgresql://...
export TRAIN_RL_LOCAL=true

python -m src.training.babylon_trainer
# → Uses local GPU automatically
```

```bash
# Add WANDB_API_KEY
export DATABASE_URL=postgresql://...
export WANDB_API_KEY=your-key
export TRAIN_RL_LOCAL=true

python -m src.training.babylon_trainer
# → Uses W&B serverless automatically
```

Same code, automatic switching! ✨
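Under the hood the switch comes down to which environment variables are set. A minimal sketch of that decision, using only the variables this README already documents (the real check happens inside babylon_trainer.py and ART, not in your own code):

```python
import os

def resolve_training_mode() -> str:
    """Return "cloud" when a W&B key is available, otherwise "local".

    Mirrors the switching behaviour described above.
    """
    if not os.environ.get("DATABASE_URL"):
        raise RuntimeError("DATABASE_URL is required in both modes")
    if os.environ.get("WANDB_API_KEY"):
        return "cloud"   # W&B serverless training + hosted inference
    return "local"       # your GPU + local checkpoint and serving
```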
Documentation:

- This file - Overview and quick start
- python/QUICK_START.md - 4-command setup
- python/README.md - Python package documentation
- python/TEST_BOTH_MODES.md - Local vs Cloud comparison
- RL_TRAINING_CONTINUOUS_MMO_SUMMARY.md - Architecture overview
"CUDA out of memory" β Use cloud mode instead:
export WANDB_API_KEY=your-key"No GPU available" β Install CUDA drivers or use cloud mode
"WANDB_API_KEY not set" β Get from: https://wandb.ai/settings
"W&B quota exceeded" β Check your W&B billing/quota settings
"No windows found" β Check database:
SELECT "scenarioId", COUNT(*) FROM trajectories GROUP BY "scenarioId";"Database connection failed" β Verify DATABASE_URL:
psql $DATABASE_URL -c "SELECT 1"- Setup: Your GPU (one-time)
- Training: Free (uses your GPU)
- Inference: Free (local serving)
- Monthly: $0
Best for: Development, testing, learning
Cloud Mode:

- Setup: Zero (W&B manages everything)
- Training: ~$1-2 per job (~15 min)
  - Daily (24 jobs): ~$24-48
  - Monthly: ~$720
- Inference: ~$0.001 per request
  - 100k requests: ~$100
  - 1M requests: ~$1,000
- Monthly Total: ~$820-1720 (see the arithmetic sketched below)
Best for: Production, scaling, no GPU management
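For transparency, here is how the ~$820-1720/month total falls out of the per-job and per-request estimates above; note the ~$720 training line assumes the low end ($1/job) of the job price:

```python
# Reproduce the cloud-mode estimate from the README's own ballpark figures
# (these are estimates, not quoted prices).
jobs_per_day = 24
cost_per_job = 1.0                  # low end of the ~$1-2 per job estimate
training_per_month = cost_per_job * jobs_per_day * 30       # ~$720
cost_per_request = 0.001            # ~$0.001 per inference request
inference_low = 100_000 * cost_per_request                  # ~$100 (100k requests)
inference_high = 1_000_000 * cost_per_request               # ~$1,000 (1M requests)

print(f"~${training_per_month + inference_low:,.0f} "
      f"to ~${training_per_month + inference_high:,.0f} per month")  # ~$820 to ~$1,720
```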
Self-managed alternative (for comparison):

- GPU Rental: $7,000+/month (24/7 A100s)
- DevOps: Ongoing work
- Monitoring: Custom setup
- Total: $10,000+/month
Savings with Cloud Mode: 75-85%!
```bash
# Test locally first (free!)
export DATABASE_URL=postgresql://...
export TRAIN_RL_LOCAL=true

MODE=list python -m src.training.babylon_trainer
MODE=single python -m src.training.babylon_trainer
```

Goal: Verify data collection, test training flow, iterate quickly
```bash
# Scale to cloud when ready
export WANDB_API_KEY=your-key
MODE=single python -m src.training.babylon_trainer
```

Goal: Production deployment, W&B managed, auto-scaling
- Game Setup: START_HERE.md
- Agent System: examples/babylon-typescript-agent/
- RL Training: python/README.md
- Architecture: TYPESCRIPT_INTEGRATION_MMO.md
RL Training Issues: See python/README.md
Database Issues: Check prisma/schema.prisma
W&B Issues: https://docs.wandb.ai/
Ready to train? See python/QUICK_START.md
🎉 Both local and cloud modes ready!