A comprehensive, AI-powered clinical trial protocol generation platform built with Streamlit. It uses a multi-agent debate system where six expert stakeholders argue, negotiate, and reach consensus on every protocol decision, enforcing real regulatory and scientific guidelines at each step.
- Overview
- How It Works
- The Expert Agent Panel
- Debate Process
- Generation Modes
- Validation System
- PDF Export
- Architecture
- File Structure
- Technology Stack
- Setup & Configuration
- Environment Variables
- Running the App
- Troubleshooting
Designing a clinical trial protocol is one of the most complex, high-stakes tasks in medicine. It requires input from medical experts, biostatisticians, regulatory specialists, site operations teams, patient advocates, and finance directors, all of whom have different (often conflicting) priorities.
This platform simulates that real-world multi-stakeholder process using AI agents. Instead of generating a generic protocol template, each major protocol decision goes through a live debate where six expert personas argue their positions, cite guidelines, push back on each other, and only settle when every stakeholder is satisfied.
The result is a protocol that reflects real-world trade-offs, is guideline-compliant, and is designed to contain no placeholder values.
1. Enter Study Parameters: The user fills out a guided wizard with:
   - Disease indication and therapeutic area
   - Study phase (Phase 1, 2, or 3)
   - Investigational drug/intervention name
   - Target enrollment and study duration
2. Choose Generation Mode: Select between Draft (fast), Review (balanced), or Submission (thorough) quality levels.
3. AI Agents Debate: Six expert agents debate five key protocol topics in sequence:
   - Primary Endpoint selection
   - Sample Size calculation
   - Eligibility Criteria definition
   - Visit Schedule design
   - Budget & Timeline planning
4. Consensus is Reached: Each debate runs through three structured rounds. Consensus is tracked dynamically (30% → 50% → 80–95%) until all agents agree on a decision that meets their non-negotiable requirements.
5. Protocol is Assembled: All debate decisions are compiled into a complete clinical protocol object using Pydantic data models.
6. Validation Report: The protocol is automatically scored across five dimensions: completeness, scientific validity, regulatory compliance, operational feasibility, and ethical considerations.
7. Download PDF: A fully formatted, professional PDF is generated containing all protocol sections with no TBD or placeholder values.
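The assembly step above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses a standard-library dataclass as a dependency-free stand-in for the Pydantic models, and `TOPICS`, `ProtocolDraft`, and `assemble_protocol` are hypothetical names.

```python
from dataclasses import dataclass

# The five debate topics, in the order the workflow lists them.
# The key names are assumptions made for this sketch.
TOPICS = (
    "primary_endpoint",
    "sample_size",
    "eligibility_criteria",
    "visit_schedule",
    "budget_timeline",
)


@dataclass(frozen=True)
class ProtocolDraft:
    """Hypothetical stand-in for the project's Pydantic protocol model."""
    indication: str
    phase: int
    decisions: dict


def assemble_protocol(indication, phase, decisions):
    """Build a draft only when every topic has a recorded decision."""
    missing = [topic for topic in TOPICS if not decisions.get(topic)]
    if missing:
        raise ValueError(f"no decision recorded for: {', '.join(missing)}")
    # Keep the decisions in debate order, ignoring any unrelated keys.
    ordered = {topic: decisions[topic] for topic in TOPICS}
    return ProtocolDraft(indication, phase, ordered)
```

Refusing to build a draft when a topic is missing is one way to keep empty sections from reaching the validation and PDF steps.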
Six AI personas represent real stakeholder roles found in clinical trial design. Each has a set of non-negotiable requirements they enforce during every debate.
Medical Monitor
- Focus: Patient safety, clinical meaningfulness of endpoints, safety monitoring frequency
- Won't budge on: Any decision that compromises patient safety or clinical validity
- Enforces: FDA safety assessment requirements, minimum monitoring frequency, benefit-risk standards
Statistician
- Focus: Statistical power, sample size methodology, endpoint measurability
- Won't budge on: Minimum 80% statistical power, Type I error ≤ 0.05, proper dropout adjustments
- Enforces: Power calculation standards, interim analysis rules for large trials (>300 patients), measurement validity requirements
Regulatory Affairs
- Focus: FDA compliance, regulatory precedent, submission readiness
- Won't budge on: Established regulatory frameworks for the indication, proper endpoint precedent, GCP compliance
- Enforces: FDA endpoint requirements for specific indications, regulatory submission standards, ethical review criteria
Site Feasibility
- Focus: Enrollment realism, site capabilities, operational logistics
- Flexible on: Site numbers (can be adjusted based on other constraints)
- Enforces: Realistic enrollment rates based on disease prevalence, site capacity constraints, screen failure rate adjustments (typically 20–30%)
Patient Advocate
- Focus: Patient burden, quality of life, visit frequency
- Won't budge on: Maximum 2 visits per month unless safety concern, procedures per visit < 4 hours, availability of home/remote options
- Enforces: Ethical treatment of participants, dropout risk reduction, patient-centric scheduling
Finance Director
- Focus: Budget ceilings, cost-effectiveness, resource optimization
- Won't budge on: Total cost must not exceed the budget ceiling; every dollar must be justified
- Enforces: Per-patient cost justification, screen failure cost inclusion, milestone-based spending controls
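To make the idea of non-negotiable requirements concrete, here is a small sketch of how an agent might check a proposal against its rules. The `Requirement` and `Agent` classes and the proposal keys are assumptions for illustration; the thresholds are the ones listed above, and the "unless safety concern" exception for visit frequency is deliberately left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Requirement:
    description: str
    check: Callable[[dict], bool]  # returns True when the proposal complies


@dataclass
class Agent:
    name: str
    must_enforce: list = field(default_factory=list)

    def violations(self, proposal: dict) -> list:
        """List the descriptions of every requirement the proposal breaks."""
        return [r.description for r in self.must_enforce if not r.check(proposal)]


statistician = Agent("Statistician", [
    Requirement("power of at least 80%", lambda p: p["power"] >= 0.80),
    Requirement("Type I error of at most 0.05", lambda p: p["alpha"] <= 0.05),
])

patient_advocate = Agent("Patient Advocate", [
    Requirement("at most 2 visits per month", lambda p: p["visits_per_month"] <= 2),
    Requirement("procedures per visit under 4 hours", lambda p: p["hours_per_visit"] < 4),
])

# A hypothetical proposal that satisfies the statistician but not the advocate.
proposal = {"power": 0.90, "alpha": 0.05, "visits_per_month": 3, "hours_per_visit": 2.5}
```

Here `statistician.violations(proposal)` is empty, while `patient_advocate.violations(proposal)` reports the visit-frequency rule, which is the kind of objection an agent would raise during a debate.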
Each of the five protocol topics goes through three structured debate rounds:
In Round 1 (Propose), each agent submits an initial proposal for the topic (e.g., "I propose OS as the primary endpoint because..."). Agents cite applicable guidelines and their own area of expertise.
In Round 2 (Check), agents review each other's proposals and flag any violations of their mandatory requirements. An agent may:
- Approve a proposal if it meets their requirements
- Reject it with specific reasons tied to guidelines
- Propose a compromise if they see a path to resolution
In Round 3 (Agree), agents revise or accept proposals based on the compliance feedback. The final agreed-upon decision is recorded with:
- The specific protocol decision (e.g., "Overall Survival at 24 months")
- The compliance rate (percentage of agents who approved)
- The consensus level reached
If guidelines conflict in a way that makes agreement impossible, agents state "no solution possible" and the system escalates to the fallback generator.
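The outcome of a round can be summarized with a small helper like the one below. It is only a sketch: the verdict strings, the `resolve` function, and the default threshold of 100% approval are assumptions, since the exact rule the debate engine uses is not shown here.

```python
def compliance_rate(verdicts):
    """Fraction of agents whose verdict is 'approve'."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    approvals = sum(1 for verdict in verdicts.values() if verdict == "approve")
    return approvals / len(verdicts)


def resolve(verdicts, required_rate=1.0):
    """Return 'escalate', 'agreed', or 'revise' for one set of verdicts."""
    # Any agent declaring a deadlock sends the topic to the fallback generator.
    if any(verdict == "no solution possible" for verdict in verdicts.values()):
        return "escalate"
    return "agreed" if compliance_rate(verdicts) >= required_rate else "revise"


verdicts = {
    "Medical Monitor": "approve",
    "Statistician": "approve",
    "Regulatory Affairs": "approve",
    "Site Feasibility": "approve",
    "Patient Advocate": "reject",
    "Finance Director": "approve",
}
```

With these verdicts, five of six agents approve, so `resolve(verdicts)` asks for another revision rather than recording a decision.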
Consensus is updated dynamically as the debate progresses:
- Start: ~30%
- After Round 2: ~50%
- After Round 3: 80–95% (depending on how contentious the topic is)
Draft mode uses the standard ProtocolGeneratorAgent to produce a protocol quickly without running debates. It is ideal for rough outlines or initial exploration and takes seconds.
Review mode runs the full guideline-based debate system with AI agents. Each topic gets a real multi-round debate, which takes several minutes depending on API response times. This is the main mode of the application.
Submission mode is similar to Review mode but adds validation passes and stricter completeness checks. It is intended for protocols approaching regulatory submission readiness.
After protocol generation, an automated validation report scores the protocol across five weighted dimensions:
| Dimension | Weight | What It Checks |
|---|---|---|
| Completeness | 25% | All required sections present and populated |
| Scientific Validity | 25% | Endpoints are clinically meaningful and powered |
| Regulatory Compliance | 20% | FDA and GCP requirements are met |
| Operational Feasibility | 15% | Enrollment rates, site capacity, timelines are realistic |
| Ethical Considerations | 15% | Patient safety, informed consent, burden management |
A protocol where agents reached consensus on all five topics and all guidelines are met will score 95%+.
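As an illustration of how the weights in the table combine, the sketch below computes a weighted average of per-dimension scores. The weights come from the table; the function name, the dictionary keys, the 0 to 100 scale, and the use of a plain weighted average are assumptions, not a description of the actual scoring code.

```python
# Weights from the validation table, expressed as fractions that sum to 1.
WEIGHTS = {
    "completeness": 0.25,
    "scientific_validity": 0.25,
    "regulatory_compliance": 0.20,
    "operational_feasibility": 0.15,
    "ethical_considerations": 0.15,
}


def overall_score(dimension_scores):
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    missing = sorted(set(WEIGHTS) - set(dimension_scores))
    if missing:
        raise ValueError(f"missing dimension scores: {', '.join(missing)}")
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


example = {
    "completeness": 100,
    "scientific_validity": 90,
    "regulatory_compliance": 100,
    "operational_feasibility": 80,
    "ethical_considerations": 100,
}
print(round(overall_score(example), 1))  # 94.5
```

Because completeness and scientific validity carry half the total weight, a shortfall in either of them moves the overall score more than the same shortfall in feasibility or ethics.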
The validation system is implemented in:
- src/validation/enhanced_validation_generator.py: Main scoring logic
- src/validation/protocol_validator.py: Field-level validation
- src/validation/placeholder_validator.py: Detects any remaining TBD / placeholder values
- src/validation/validation_report_generator.py: Report formatting and display
The PDF generator (src/export/pdf_generator.py) produces a fully formatted clinical trial protocol document containing:
- Title Page: Study title, clinical trial registry IDs, version tracking, regulatory identifiers, sponsor info, investigator details
- Protocol Summary: Key metrics including enrollment rates, per-patient costs, number of sites, and phase-specific highlights
- Study Design Section: Detailed design description, randomization methodology, blinding specifications, duration components
- Sample Size & Statistics: Full power analysis, evaluable sample size, dropout adjustments, interim analysis plan
- Eligibility Criteria: Inclusion/exclusion criteria with scientific rationale for each
- Visit Schedule: Detailed schedule with procedures listed at each visit
- Budget Appendix: 4-column breakdown of all cost categories with percentages and descriptions
All PDF sections are populated with real content derived from the debate decisions. No TBD, "Not specified", or "To be assigned" values appear anywhere in the output.
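A check for the placeholder strings named above could look like the following sketch. It is not the code in placeholder_validator.py; the pattern, the function name, and the nested-dictionary input are assumptions chosen to show the idea of scanning every string field before export.

```python
import re

# The three placeholder phrases mentioned in this section, matched case-insensitively.
PLACEHOLDER_PATTERN = re.compile(
    r"\b(TBD|not specified|to be assigned)\b", re.IGNORECASE
)


def find_placeholders(value, path=""):
    """Yield the path of every string in a nested structure that holds a placeholder."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from find_placeholders(item, f"{path}.{key}" if path else str(key))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_placeholders(item, f"{path}[{index}]")
    elif isinstance(value, str) and PLACEHOLDER_PATTERN.search(value):
        yield path


protocol = {
    "title": "Phase 3 study",
    "sponsor": {"name": "TBD"},
    "sites": ["Site A", "Not specified"],
}
print(list(find_placeholders(protocol)))  # ['sponsor.name', 'sites[1]']
```

Reporting the path of each offending field, rather than a simple yes/no, makes it easier to see which section still needs a concrete value.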
┌─────────────────────────────────────────────────────────┐
│                   Streamlit Frontend                    │
│                    main_enhanced.py                     │
│                                                         │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │  Wizard UI  │  │ Debate View  │  │ Validation UI  │  │
│  │  (5 steps)  │  │ (live feed)  │  │  (scorecard)   │  │
│  └─────────────┘  └──────────────┘  └────────────────┘  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                   Agent Debate System                   │
│            guideline_based_debate_system.py             │
│                                                         │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │   Medical   │  │ Statistician │  │   Regulatory   │  │
│  │   Monitor   │  │              │  │    Affairs     │  │
│  └─────────────┘  └──────────────┘  └────────────────┘  │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │    Site     │  │   Patient    │  │    Finance     │  │
│  │ Feasibility │  │   Advocate   │  │    Director    │  │
│  └─────────────┘  └──────────────┘  └────────────────┘  │
│                                                         │
│   Round 1: Propose → Round 2: Check → Round 3: Agree    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│             Protocol Assembly & Validation              │
│                                                         │
│  src/models/protocol.py → Pydantic data models          │
│  src/validation/        → Scoring and report generation │
│  src/export/            → PDF generation with ReportLab │
└─────────────────────────────────────────────────────────┘
Streamlit reruns the entire script on every user interaction. To prevent losing debate progress mid-generation, all debate decisions and the final protocol object are stored in st.session_state. This means:
- If the page rerenders during a long AI call, completed topic debates are not re-run
- If the user navigates away and returns, they can resume from where they left off
- A "View Generated Protocol" button appears in the sidebar after any successful generation
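The resume behavior described above can be sketched with a function that skips topics already stored in the session. This is an illustration under assumptions: `run_debates`, the `debate_decisions` key, and `debate_fn` are hypothetical names, and the function accepts any dictionary-like object so that st.session_state could be passed in a Streamlit app while a plain dict works for testing.

```python
def run_debates(state, topics, debate_fn):
    """Run debate_fn for each topic not already decided, storing results in state."""
    if "debate_decisions" not in state:
        state["debate_decisions"] = {}
    decisions = state["debate_decisions"]
    for topic in topics:
        if topic in decisions:
            continue  # finished on an earlier run of the script, do not repeat it
        # Store the result immediately so a rerun cannot lose it.
        decisions[topic] = debate_fn(topic)
    return decisions
```

If the script reruns after two topics have finished, a second call to `run_debates` with the same state only invokes `debate_fn` for the remaining topics.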
├── main_enhanced.py  # Main Streamlit app entry point
├── main.py  # Alternative simpler entry point
├── main_generator.py  # Standalone protocol generator
├── app.py  # Lightweight app wrapper
├── pyproject.toml  # Python package dependencies
│
├── src/
│   ├── agents/
│   │   ├── guideline_based_debate_system.py  # Core debate engine (6 agents, 3 rounds)
│   │   ├── comprehensive_generator.py  # Fallback protocol generator
│   │   ├── generator_agent.py  # ProtocolGeneratorAgent (Draft mode)
│   │   ├── real_debate_generator.py  # CrewAI-based debate (legacy)
│   │   ├── visual_debate_generator.py  # Visual debate UI helper
│   │   ├── mock_debate_generator.py  # Mock debates for testing
│   │   ├── simple_debate_system.py  # Simplified debate (no AI)
│   │   ├── review_coordinator.py  # Multi-agent review orchestration
│   │   ├── safety_validator.py  # Agent-level safety checks
│   │   ├── base_agent.py  # Base class for all agents
│   │   ├── agent_factory.py  # Agent instantiation factory
│   │   └── agent_configs.py  # Agent personalities and configs
│   │
│   ├── models/
│   │   ├── protocol.py  # ClinicalProtocol Pydantic model
│   │   ├── enhanced_protocol.py  # Extended protocol fields
│   │   ├── nih_protocol.py  # NIH-format protocol model
│   │   ├── validation.py  # Validation result models
│   │   └── debate.py  # Debate state models
│   │
│   ├── ui/
│   │   ├── progressive_wizard.py  # 5-step wizard UI (AIProtocolArchitect)
│   │   ├── modern_debate_ui.py  # Debate feed UI components
│   │   ├── validation_dashboard.py  # Validation score display
│   │   ├── visualizations.py  # Plotly charts and consensus meter
│   │   ├── components.py  # Reusable UI components
│   │   ├── database_manager_ui.py  # Protocol database browser
│   │   └── protocol_generator_ui.py  # Legacy generator UI
│   │
│   ├── validation/
│   │   ├── enhanced_validation_generator.py  # Main validation and scoring
│   │   ├── protocol_validator.py  # Field-level protocol validation
│   │   ├── placeholder_validator.py  # TBD/placeholder detection
│   │   └── validation_report_generator.py  # Report formatting
│   │
│   ├── export/
│   │   └── pdf_generator.py  # ReportLab PDF generation
│   │
│   ├── database/
│   │   └── (protocol storage and retrieval)
│   │
│   └── utils/
│       └── (shared helper utilities)
│
├── tests/
│   └── (test files)
│
├── .streamlit/
│   └── config.toml  # Streamlit server configuration
│
└── nextjs-ui/  # Experimental Next.js frontend (not the main app)
| Technology | Version | Purpose |
|---|---|---|
| Streamlit | ≥ 1.47 | Interactive web UI and session management |
| Anthropic Claude | claude-3-5-haiku-20241022 | AI agent debate generation |
| CrewAI | ≥ 0.28 | Multi-agent orchestration framework |
| Pydantic | ≥ 2.0 | Data validation and protocol modeling |
| ReportLab | ≥ 4.4 | Professional PDF document generation |
| Plotly | ≥ 6.2 | Interactive charts and consensus meter |
| pandas | ≥ 2.3 | Data manipulation and tabular display |
| NumPy / SciPy | latest | Statistical calculations and power analysis |
| python-docx | ≥ 1.2 | Word document export |
- Python 3.11 or higher
- An Anthropic API key (for live AI debates; optional, see fallback below)
Dependencies are managed via pyproject.toml and uv. On Replit, they are installed automatically. For local development:
pip install -e .
Or with uv:
uv sync
| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_API_KEY | Optional | Enables live AI agent debates using Claude 3.5 Haiku. Without this, the system uses an evidence-based fallback generator that produces high-quality protocols from clinical trial best practices and FDA guidelines. |
If no Anthropic API key is present (or if the key is invalid / quota is exhausted), the system automatically switches to _run_fallback_debate(), which:
- Uses phase-specific evidence-based defaults (e.g., OS for Phase 3 oncology, ORR for Phase 2)
- Calculates statistically valid sample sizes using standard power formulas
- Generates realistic budgets based on phase and indication
- Shows the same debate UI with structured "evidence-based" agent positions
- Produces identical PDF output with full content
No data is lost and no errors are shown to the user; the system is fully functional in fallback mode.
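For readers curious what a "standard power formula" looks like, the sketch below uses the normal-approximation formula for comparing two means with a standardized effect size, then inflates the result for dropout. Which formula the fallback generator actually applies is not shown here, so treat this as one common textbook choice; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def per_arm_sample_size(effect_size, alpha=0.05, power=0.80):
    """Evaluable patients per arm for a two-sided, two-arm comparison of means."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def inflate_for_dropout(evaluable, dropout_rate):
    """Patients to enroll so that `evaluable` remain after the expected dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(evaluable / (1 - dropout_rate))


evaluable = per_arm_sample_size(0.5)            # 63 per arm at 80% power
enrolled = inflate_for_dropout(evaluable, 0.15)  # 75 per arm with 15% dropout
```

The defaults match the statistician's minimums (80% power, Type I error of 0.05); a smaller effect size or higher power raises the required enrollment quickly because the effect size is squared in the denominator.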
The workflow is already configured. Click Run or start the Enhanced Protocol Generator workflow.
streamlit run main_enhanced.py --server.port 5000
To run locally instead:
streamlit run main_enhanced.py
The app will be available at http://localhost:8501 by default.
From the home screen, click Start New Protocol or select AI Protocol Architect from the navigation sidebar.
Fill out all five steps:
- Step 1: Disease indication and intervention name
- Step 2: Study design (phase, randomization, blinding, duration)
- Step 3: Patient population and eligibility criteria
- Step 4: Endpoints and assessments
- Step 5: Review and confirm before generation
A demo dataset (breast cancer Phase 3) can be loaded from the sidebar for quick exploration.
Select Draft for speed or Review/Submission for full AI debate. The mode selector is in the sidebar under Generation Settings.
Click Generate Protocol. The app will:
- Initialize the debate system
- Run agents through three rounds per topic
- Display live agent messages and consensus updates
- Assemble and validate the final protocol
After generation, a download button for the full protocol PDF appears at the top of the results screen.
The app can return to the ready screen unexpectedly if the AI debate takes longer than Streamlit's default session timeout, or if an error occurs mid-debate. The app handles this by:
- Storing each completed topic debate decision in st.session_state immediately after it finishes
- Catching exceptions at the debate topic level and falling back to evidence-based decisions rather than crashing
- Displaying a View Generated Protocol button in the sidebar if a protocol was previously completed
If you see the ready screen unexpectedly, check the sidebar for the View Generated Protocol button.
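The topic-level exception handling mentioned above follows a familiar pattern, sketched here. The function and parameter names are hypothetical, and both callables are stand-ins: the real app routes failures to its own fallback logic, whose signature is not shown in this document.

```python
def decide_topic(topic, live_debate, fallback_debate):
    """Return (decision, source), using the fallback when the live debate fails."""
    try:
        return live_debate(topic), "live"
    except Exception as exc:  # e.g. a missing key, exhausted quota, or timeout
        # Record why the fallback was used so the UI can explain it to the user.
        return fallback_debate(topic), f"fallback ({type(exc).__name__})"


def broken_live_debate(topic):
    raise TimeoutError("API call took too long")


def evidence_based_default(topic):
    return f"evidence-based default for {topic}"


decision, source = decide_topic("visit_schedule", broken_live_debate, evidence_based_default)
```

Catching the error per topic, rather than around the whole generation, means one failed debate costs only that topic's live result while earlier decisions stay intact.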
The app previously used heavy CSS effects (gradients, backdrop-filter, multiple box-shadow layers) that caused Largest Contentful Paint times exceeding 90 seconds. These have been replaced with solid-color equivalents that maintain the same dark theme without the performance cost.
If you still see slow loads, check:
- Whether a large number of Plotly charts are being rendered simultaneously
- Browser extensions that may be interfering with the WebSocket connection
If the Anthropic API is unavailable, the status banner at the top of the debate section will tell you why (no key, invalid key, quota exceeded, etc.). The system will automatically use evidence-based protocol generation. No action is needed; the output quality is still high.
The validation system will warn (not error) if it detects placeholder-like values. These warnings don't block PDF generation. If you see them, it typically means a fallback decision produced a less specific value for a particular field. The system is designed to handle this gracefully.
The codebase follows a clear layered architecture:
- UI layer (src/ui/): All Streamlit rendering code
- Agent layer (src/agents/): Debate and generation logic
- Model layer (src/models/): Data structures
- Validation layer (src/validation/): Scoring and quality checks
- Export layer (src/export/): Document generation
When adding a new agent or debate topic, add it to guideline_based_debate_system.py following the existing pattern of must_enforce requirements and three-round debate logic.
This project is intended for research and educational use in clinical trial design. Always ensure that any generated protocols are reviewed by qualified medical, statistical, and regulatory professionals before use in actual research.