Release Date: January 24, 2026
🎉 First Public Release
This is the initial public release of the Crisis Response Data Pipeline - a production-ready system for generating synthetic crisis scenario data for fine-tuning large language models.
✨ Key Features
Core Functionality
- Synthetic Crisis Scenario Generation: Generate realistic crisis scenarios using LLMs
- Multi-Perspective Responses: Each scenario includes responses from both civilian and first responder perspectives
- Structured Output: Consistent format with facts, uncertainties, analysis, and guidance
- 40+ Crisis Categories: Comprehensive coverage from day-to-day emergencies to large-scale disasters
Quality Assurance
- Three Quality Levels: Choose the right balance of cost and quality
- Level 1: Free structure validation (default)
- Level 2: Optional content quality check (~$2-5 for 2000 samples)
- Level 3: Optional full critique (~$870 for 2000 samples)
Performance & Cost
- Optimized for Speed: ~9-13 seconds per sample
- Cost Effective: ~$11.18 for 2000 samples (default configuration)
- Parallel Processing: Configurable parallel sample generation
- Resume Capability: Automatically resume from interruptions
Training Ready
- Multiple Formats: Convert to instruction, conversational, or completion formats
- Hugging Face Ready: Complete dataset with 2000 examples included
- Fine-Tuning Compatible: Works with OpenAI, Anthropic, Hugging Face, and local models
📦 What's Included
Code
- Complete pipeline implementation
- CLI interface (
main.py) - Configuration system
- Validation and quality checks
- Training format conversion
Documentation
- Comprehensive README
- Windows Command Prompt guide
- Large dataset generation guides (1K and 2K)
- Cost estimation guide
- Quality assurance documentation
Dataset
- 2000 training examples (1000 scenarios × 2 perspectives)
- Hugging Face dataset preparation
- Complete dataset card and documentation
Tests
- Unit tests for all major components
- Test runner script
🚀 Quick Start
- Install dependencies
pip install -r requirements.txt
-
Configure API keys (create .env file)
Add your OPEN_API_KEY, ANTHROPIC_API_KEY, and GEMINI_API_KEY -
Generate your first samples
python main.py generate --n 10