Skip to content

Release v1.0.0 - Initial Release

Latest

Choose a tag to compare

@ianktoo ianktoo released this 25 Jan 06:58
· 1 commit to master since this release
b786421

Release Date: January 24, 2026

🎉 First Public Release

This is the initial public release of the Crisis Response Data Pipeline - a production-ready system for generating synthetic crisis scenario data for fine-tuning large language models.

✨ Key Features

Core Functionality

  • Synthetic Crisis Scenario Generation: Generate realistic crisis scenarios using LLMs
  • Multi-Perspective Responses: Each scenario includes responses from both civilian and first responder perspectives
  • Structured Output: Consistent format with facts, uncertainties, analysis, and guidance
  • 40+ Crisis Categories: Comprehensive coverage from day-to-day emergencies to large-scale disasters

Quality Assurance

  • Three Quality Levels: Choose the right balance of cost and quality
    • Level 1: Free structure validation (default)
    • Level 2: Optional content quality check (~$2-5 for 2000 samples)
    • Level 3: Optional full critique (~$870 for 2000 samples)

Performance & Cost

  • Optimized for Speed: ~9-13 seconds per sample
  • Cost Effective: ~$11.18 for 2000 samples (default configuration)
  • Parallel Processing: Configurable parallel sample generation
  • Resume Capability: Automatically resume from interruptions

Training Ready

  • Multiple Formats: Convert to instruction, conversational, or completion formats
  • Hugging Face Ready: Complete dataset with 2000 examples included
  • Fine-Tuning Compatible: Works with OpenAI, Anthropic, Hugging Face, and local models

📦 What's Included

Code

  • Complete pipeline implementation
  • CLI interface (main.py)
  • Configuration system
  • Validation and quality checks
  • Training format conversion

Documentation

  • Comprehensive README
  • Windows Command Prompt guide
  • Large dataset generation guides (1K and 2K)
  • Cost estimation guide
  • Quality assurance documentation

Dataset

  • 2000 training examples (1000 scenarios × 2 perspectives)
  • Hugging Face dataset preparation
  • Complete dataset card and documentation

Tests

  • Unit tests for all major components
  • Test runner script

🚀 Quick Start

  1. Install dependencies

pip install -r requirements.txt

  1. Configure API keys (create .env file)
    Add your OPEN_API_KEY, ANTHROPIC_API_KEY, and GEMINI_API_KEY

  2. Generate your first samples

python main.py generate --n 10