Clinical Trial Protocol AI Designer

A comprehensive, AI-powered clinical trial protocol generation platform built with Streamlit. It uses a multi-agent debate system where six expert stakeholders argue, negotiate, and reach consensus on every protocol decision — enforcing real regulatory and scientific guidelines at each step.

Overview

Designing a clinical trial protocol is one of the most complex, high-stakes tasks in medicine. It requires input from medical experts, biostatisticians, regulatory specialists, site operations teams, patient advocates, and finance directors — all of whom have different (often conflicting) priorities.

This platform simulates that real-world multi-stakeholder process using AI agents. Instead of generating a generic protocol template, each major protocol decision goes through a live debate where six expert personas argue their positions, cite guidelines, push back on each other, and only settle when every stakeholder is satisfied.

The result is a protocol that reflects real-world trade-offs, is guideline-compliant, and contains zero placeholder values.

How It Works

Step-by-Step Flow

1. Enter Study Parameters: The user fills out a guided wizard with:
   - Disease indication and therapeutic area
   - Study phase (Phase 1, 2, or 3)
   - Investigational drug/intervention name
   - Target enrollment and study duration
2. Choose Generation Mode: Select between Draft (fast), Review (balanced), or Submission (thorough) quality levels.
3. AI Agents Debate: Six expert agents debate five key protocol topics in sequence:
   - Primary Endpoint selection
   - Sample Size calculation
   - Eligibility Criteria definition
   - Visit Schedule design
   - Budget & Timeline planning
4. Consensus is Reached: Each debate runs through three structured rounds. Consensus is tracked dynamically (30% → 50% → 80–95%) until all agents agree on a decision that meets their non-negotiable requirements.
5. Protocol is Assembled: All debate decisions are compiled into a complete clinical protocol object using Pydantic data models.
6. Validation Report: The protocol is automatically scored across five dimensions: completeness, scientific validity, regulatory compliance, operational feasibility, and ethical considerations.
7. Download PDF: A fully formatted, professional PDF is generated containing all protocol sections with no TBD or placeholder values.
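
The assembly step can be pictured as folding the five debate decisions into one typed object. The sketch below uses standard-library dataclasses as a stand-in for the Pydantic models in src/models/protocol.py. The names `DebateDecision`, `ProtocolDraft`, and `assemble_protocol` are hypothetical and are not taken from the codebase.

```python
from dataclasses import dataclass, field

# The five debate topics, in the order listed in the flow above.
TOPICS = (
    "primary_endpoint",
    "sample_size",
    "eligibility_criteria",
    "visit_schedule",
    "budget_timeline",
)


@dataclass(frozen=True)
class DebateDecision:
    """Hypothetical record of one finished topic debate."""
    topic: str
    decision: str
    consensus: float  # fraction of agreement, 0.0 to 1.0


@dataclass
class ProtocolDraft:
    """Hypothetical stand-in for the assembled protocol object."""
    indication: str
    phase: int
    decisions: dict = field(default_factory=dict)


def assemble_protocol(indication, phase, decisions):
    """Build a draft, refusing to proceed if any topic is undecided."""
    by_topic = {d.topic: d for d in decisions}
    missing = [t for t in TOPICS if t not in by_topic]
    if missing:
        raise ValueError(f"undecided topics: {missing}")
    return ProtocolDraft(indication, phase, {t: by_topic[t] for t in TOPICS})
```

Failing loudly on a missing topic is one way to keep undecided fields from reaching the PDF; the real assembly code may handle this differently.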

The Expert Agent Panel

Six AI personas represent real stakeholder roles found in clinical trial design. Each has a set of non-negotiable requirements they enforce during every debate.

1. Medical Monitor (Chief Medical Officer)

Focus: Patient safety, clinical meaningfulness of endpoints, safety monitoring frequency
Won't budge on: Any decision that compromises patient safety or clinical validity
Enforces: FDA safety assessment requirements, minimum monitoring frequency, benefit-risk standards

2. Statistician (Head of Biostatistics)

Focus: Statistical power, sample size methodology, endpoint measurability
Won't budge on: Minimum 80% statistical power, Type I error ≤ 0.05, proper dropout adjustments
Enforces: Power calculation standards, interim analysis rules for large trials (>300 patients), measurement validity requirements
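
As an illustration of the Statistician's constraints, here is a minimal sketch of a sample size calculation. It assumes a two-arm trial with equal allocation, a continuous endpoint, and the usual normal approximation. The function names are hypothetical, and the project's own calculation may use a different formula.

```python
import math
from statistics import NormalDist


def per_arm_sample_size(effect_size, alpha=0.05, power=0.80):
    """Evaluable patients per arm for a two-sided comparison of two means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def inflate_for_dropout(n_evaluable, dropout_rate):
    """Enroll enough patients that n_evaluable remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # round() guards against floating-point noise just above an integer.
    return math.ceil(round(n_evaluable / (1 - dropout_rate), 9))
```

With a standardized effect size of 0.5, the defaults give 63 evaluable patients per arm; at a 15% dropout rate that becomes 75 enrolled per arm.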

3. Regulatory Affairs (VP Regulatory Affairs)

Focus: FDA compliance, regulatory precedent, submission readiness
Won't budge on: Established regulatory frameworks for the indication, proper endpoint precedent, GCP compliance
Enforces: FDA endpoint requirements for specific indications, regulatory submission standards, ethical review criteria

4. Site Feasibility (Site Operations Director)

Focus: Enrollment realism, site capabilities, operational logistics
Flexible on: Site numbers (can be adjusted based on other constraints)
Enforces: Realistic enrollment rates based on disease prevalence, site capacity constraints, screen failure rate adjustments (typically 20–30%)
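
The screen failure adjustment is simple arithmetic. The sketch below shows one way to express it; the function names and the per-site enrollment rate are illustrative assumptions, not values from the codebase.

```python
import math


def patients_to_screen(target_enrollment, screen_failure_rate):
    """Patients to screen so that target_enrollment pass screening."""
    if not 0 <= screen_failure_rate < 1:
        raise ValueError("screen_failure_rate must be in [0, 1)")
    # round() guards against floating-point noise just above an integer.
    return math.ceil(round(target_enrollment / (1 - screen_failure_rate), 9))


def enrollment_months(target_enrollment, sites, patients_per_site_per_month):
    """Months needed to enroll, assuming a constant rate at every site."""
    if sites <= 0 or patients_per_site_per_month <= 0:
        raise ValueError("sites and rate must be positive")
    monthly = sites * patients_per_site_per_month
    return math.ceil(round(target_enrollment / monthly, 9))
```

For example, enrolling 300 patients at a 25% screen failure rate means screening 400, and 20 sites enrolling 1.5 patients per month each would need 10 months.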

5. Patient Advocate

Focus: Patient burden, quality of life, visit frequency
Won't budge on: A maximum of 2 visits per month unless a safety concern requires more, visits lasting under 4 hours, availability of home/remote options
Enforces: Ethical treatment of participants, dropout risk reduction, patient-centric scheduling

6. Finance Director

Focus: Budget ceilings, cost-effectiveness, resource optimization
Won't budge on: Total cost must not exceed the budget ceiling; every dollar must be justified
Enforces: Per-patient cost justification, screen failure cost inclusion, milestone-based spending controls
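
One way to model a persona and its non-negotiable requirements is a small record holding named rule checks. This is a sketch only: `AgentPersona`, `violations`, and the proposal keys are hypothetical, and the rules are simplified (the Patient Advocate's safety exception is omitted).

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Proposal = Mapping[str, float]


@dataclass(frozen=True)
class AgentPersona:
    """Hypothetical persona: a name, a title, and named rule checks."""
    name: str
    title: str
    must_enforce: Mapping[str, Callable[[Proposal], bool]]

    def violations(self, proposal):
        """Return the names of every rule the proposal fails."""
        return [rule for rule, ok in self.must_enforce.items() if not ok(proposal)]


STATISTICIAN = AgentPersona(
    name="Statistician",
    title="Head of Biostatistics",
    must_enforce={
        "power >= 80%": lambda p: p["power"] >= 0.80,
        "Type I error <= 0.05": lambda p: p["alpha"] <= 0.05,
    },
)

PATIENT_ADVOCATE = AgentPersona(
    name="Patient Advocate",
    title="Patient Advocate",
    must_enforce={
        "at most 2 visits per month": lambda p: p["visits_per_month"] <= 2,
        "visits under 4 hours": lambda p: p["hours_per_visit"] < 4,
    },
)
```

A proposal with 75% power and 3 visits per month would fail one rule for each of these two personas, which is the kind of conflict the debate rounds are meant to resolve.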

Debate Process

Each of the five protocol topics goes through three structured debate rounds:

Round 1 — Propose

Each agent submits an initial proposal for the topic (e.g., "I propose OS as the primary endpoint because..."). Agents cite applicable guidelines and their own area of expertise.

Round 2 — Check Compliance

Agents review each other's proposals and flag any violations of their mandatory requirements. An agent may:

Approve a proposal if it meets their requirements
Reject it with specific reasons tied to guidelines
Propose a compromise if they see a path to resolution

Round 3 — Final Agreement

Based on compliance feedback, agents revise or accept proposals. The final agreed-upon decision is recorded with:

The specific protocol decision (e.g., "Overall Survival at 24 months")
The compliance rate (percentage of agents who approved)
The consensus level reached
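
The compliance rate recorded with each decision can be sketched as follows. The `Verdict` enum mirrors the three Round 2 outcomes described above; the names themselves are hypothetical and not taken from the codebase.

```python
from enum import Enum


class Verdict(Enum):
    """The three responses an agent may give in Round 2."""
    APPROVE = "approve"
    REJECT = "reject"
    COMPROMISE = "compromise"


def compliance_rate(verdicts):
    """Percentage of agents whose verdict was APPROVE.

    `verdicts` maps an agent name to that agent's Verdict.
    """
    if not verdicts:
        raise ValueError("no verdicts recorded")
    approved = sum(1 for v in verdicts.values() if v is Verdict.APPROVE)
    return 100.0 * approved / len(verdicts)
```

With five of six agents approving and one proposing a compromise, the compliance rate is about 83.3%.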

If guidelines conflict in a way that makes agreement impossible, agents state "no solution possible" and the system escalates to the fallback generator.

Consensus Tracking

Consensus is updated dynamically as the debate progresses:

Start: ~30%
After Round 2: ~50%
After Round 3: 80–95% (depending on how contentious the topic is)
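
The progression above could be expressed as a small lookup. In this sketch the `contentiousness` parameter and the linear interpolation across the 80–95% range are assumptions made purely for illustration; the README does not say how the final value is chosen.

```python
CONSENSUS_START = 30.0
CONSENSUS_AFTER_ROUND_2 = 50.0
ROUND_3_RANGE = (80.0, 95.0)  # (most contentious, least contentious)


def consensus_level(stage, contentiousness=0.0):
    """Return the consensus percentage for a debate stage.

    stage is "start", "round_2", or "round_3". contentiousness in [0, 1]
    is a hypothetical knob: 0 maps to 95%, 1 maps to 80% after Round 3.
    """
    if stage == "start":
        return CONSENSUS_START
    if stage == "round_2":
        return CONSENSUS_AFTER_ROUND_2
    if stage == "round_3":
        if not 0 <= contentiousness <= 1:
            raise ValueError("contentiousness must be in [0, 1]")
        low, high = ROUND_3_RANGE
        return high - contentiousness * (high - low)
    raise ValueError(f"unknown stage: {stage!r}")
```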

Generation Modes

Draft (Fast)

Uses the standard ProtocolGeneratorAgent to produce a protocol quickly without running debates. Ideal for rough outlines or initial exploration. Takes seconds.

Review (Balanced) — Default

Runs the full guideline-based debate system with AI agents. Each topic gets a real multi-round debate. Takes several minutes depending on API response times. This is the main mode of the application.

Submission (Thorough)

Similar to Review mode but with additional validation passes and stricter completeness checks. Intended for protocols approaching regulatory submission readiness.
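
The differences between the three modes can be summarized in a few lines. This is a sketch under the descriptions above; `GenerationMode` and `plan_for` are hypothetical names, not the application's actual API.

```python
from enum import Enum


class GenerationMode(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    SUBMISSION = "submission"


DEFAULT_MODE = GenerationMode.REVIEW  # Review is described as the default


def plan_for(mode):
    """Return which optional stages run for a given mode."""
    return {
        # Draft skips the debates; Review and Submission run them.
        "runs_debate": mode is not GenerationMode.DRAFT,
        # Only Submission adds the extra validation passes.
        "extra_validation": mode is GenerationMode.SUBMISSION,
    }
```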

Validation System

After protocol generation, an automated validation report scores the protocol across five weighted dimensions:

| Dimension | Weight | What It Checks |
|---|---|---|
| Completeness | 25% | All required sections present and populated |
| Scientific Validity | 25% | Endpoints are clinically meaningful and powered |
| Regulatory Compliance | 20% | FDA and GCP requirements are met |
| Operational Feasibility | 15% | Enrollment rates, site capacity, timelines are realistic |
| Ethical Considerations | 15% | Patient safety, informed consent, burden management |

A protocol where agents reached consensus on all five topics and all guidelines are met will score 95%+.
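
Given the weights above, the overall score is a weighted average of the five dimension scores. The sketch below assumes each dimension is scored from 0 to 100; the dictionary keys and function name are hypothetical, and the real logic lives in src/validation/enhanced_validation_generator.py.

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "completeness": 0.25,
    "scientific_validity": 0.25,
    "regulatory_compliance": 0.20,
    "operational_feasibility": 0.15,
    "ethical_considerations": 0.15,
}


def overall_score(dimension_scores):
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)
```

For example, scores of 100, 90, 95, 80, and 100 (in table order) combine to 25 + 22.5 + 19 + 12 + 15 = 93.5.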

The validation system is implemented in:

src/validation/enhanced_validation_generator.py — Main scoring logic
src/validation/protocol_validator.py — Field-level validation
src/validation/placeholder_validator.py — Detects any remaining TBD / placeholder values
src/validation/validation_report_generator.py — Report formatting and display
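
Placeholder detection can be approximated with a regular expression scan over the protocol's fields. This is a minimal sketch: the phrase list uses the placeholder strings this README names ("TBD", "Not specified", "To be assigned"), and the actual patterns in placeholder_validator.py may differ.

```python
import re

# Phrases this README names as placeholder text; the real list may be longer.
PLACEHOLDER_PATTERN = re.compile(
    r"\b(TBD|Not specified|To be assigned)\b", re.IGNORECASE
)


def find_placeholders(fields):
    """Return {field_name: matched_text} for each field holding a placeholder."""
    hits = {}
    for name, value in fields.items():
        match = PLACEHOLDER_PATTERN.search(str(value))
        if match:
            hits[name] = match.group(0)
    return hits
```

An empty result means no listed phrase was found; it does not prove that every field holds meaningful content.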

PDF Export

The PDF generator (src/export/pdf_generator.py) produces a fully formatted clinical trial protocol document containing:

Title Page — Study title, clinical trial registry IDs, version tracking, regulatory identifiers, sponsor info, investigator details
Protocol Summary — Key metrics including enrollment rates, per-patient costs, number of sites, and phase-specific highlights
Study Design Section — Detailed design description, randomization methodology, blinding specifications, duration components
Sample Size & Statistics — Full power analysis, evaluable sample size, dropout adjustments, interim analysis plan
Eligibility Criteria — Inclusion/exclusion criteria with scientific rationale for each
Visit Schedule — Detailed schedule with procedures listed at each visit
Budget Appendix — 4-column breakdown of all cost categories with percentages and descriptions

All PDF sections are populated with real content derived from the debate decisions. No TBD, "Not specified", or "To be assigned" values appear anywhere in the output.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Streamlit Frontend                     │
│                    main_enhanced.py                       │
│                                                           │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │  Wizard UI  │  │  Debate View │  │  Validation UI │  │
│  │ (5 steps)   │  │  (live feed) │  │  (scorecard)   │  │
│  └─────────────┘  └──────────────┘  └────────────────┘  │
└─────────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│              Agent Debate System                          │
│         guideline_based_debate_system.py                  │
│                                                           │
│  ┌─────────────┐   ┌──────────────┐  ┌───────────────┐  │
│  │   Medical   │   │ Statistician │  │  Regulatory   │  │
│  │   Monitor   │   │              │  │   Affairs     │  │
│  └─────────────┘   └──────────────┘  └───────────────┘  │
│  ┌─────────────┐   ┌──────────────┐  ┌───────────────┐  │
│  │    Site     │   │   Patient    │  │   Finance     │  │
│  │ Feasibility │   │   Advocate   │  │   Director    │  │
│  └─────────────┘   └──────────────┘  └───────────────┘  │
│                                                           │
│   Round 1: Propose → Round 2: Check → Round 3: Agree    │
└─────────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│              Protocol Assembly & Validation               │
│                                                           │
│  src/models/protocol.py  — Pydantic data models          │
│  src/validation/         — Scoring and report generation │
│  src/export/             — PDF generation with ReportLab │
└─────────────────────────────────────────────────────────┘

Session State Management

Streamlit reruns the entire script on every user interaction. To avoid losing debate progress mid-generation, all debate decisions and the final protocol object are stored in st.session_state, which persists across reruns within the same browser session. This means:

If the script reruns during a long AI call, completed topic debates are not re-run
If the user navigates away and returns within the same session, they can resume from where they left off
A "View Generated Protocol" button appears in the sidebar after any successful generation
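
The caching pattern can be sketched without Streamlit by treating the state as any dict-like object. In the app, `st.session_state` would be passed as `state`. The key name "debate_decisions" and the function `run_topic_once` are hypothetical.

```python
def run_topic_once(state, topic, run_debate):
    """Run the debate for `topic` unless its decision is already cached.

    `state` is any dict-like store (st.session_state in the app);
    `run_debate` is a callable that takes a topic and returns a decision.
    """
    if "debate_decisions" not in state:
        state["debate_decisions"] = {}
    decisions = state["debate_decisions"]
    if topic not in decisions:
        # Store the result as soon as the debate finishes, so a later
        # rerun of the script finds it and skips the expensive call.
        decisions[topic] = run_debate(topic)
    return decisions[topic]
```

Calling this twice for the same topic invokes `run_debate` only once, which is what keeps a rerun from repeating completed debates.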

File Structure

├── main_enhanced.py              # Main Streamlit app entry point
├── main.py                       # Alternative simpler entry point
├── main_generator.py             # Standalone protocol generator
├── app.py                        # Lightweight app wrapper
├── pyproject.toml                # Python package dependencies
│
├── src/
│   ├── agents/
│   │   ├── guideline_based_debate_system.py   # Core debate engine (6 agents, 3 rounds)
│   │   ├── comprehensive_generator.py          # Fallback protocol generator
│   │   ├── generator_agent.py                  # ProtocolGeneratorAgent (Draft mode)
│   │   ├── real_debate_generator.py            # CrewAI-based debate (legacy)
│   │   ├── visual_debate_generator.py          # Visual debate UI helper
│   │   ├── mock_debate_generator.py            # Mock debates for testing
│   │   ├── simple_debate_system.py             # Simplified debate (no AI)
│   │   ├── review_coordinator.py               # Multi-agent review orchestration
│   │   ├── safety_validator.py                 # Agent-level safety checks
│   │   ├── base_agent.py                       # Base class for all agents
│   │   ├── agent_factory.py                    # Agent instantiation factory
│   │   └── agent_configs.py                    # Agent personalities and configs
│   │
│   ├── models/
│   │   ├── protocol.py                         # ClinicalProtocol Pydantic model
│   │   ├── enhanced_protocol.py                # Extended protocol fields
│   │   ├── nih_protocol.py                     # NIH-format protocol model
│   │   ├── validation.py                       # Validation result models
│   │   └── debate.py                           # Debate state models
│   │
│   ├── ui/
│   │   ├── progressive_wizard.py               # 5-step wizard UI (AIProtocolArchitect)
│   │   ├── modern_debate_ui.py                 # Debate feed UI components
│   │   ├── validation_dashboard.py             # Validation score display
│   │   ├── visualizations.py                   # Plotly charts and consensus meter
│   │   ├── components.py                       # Reusable UI components
│   │   ├── database_manager_ui.py              # Protocol database browser
│   │   └── protocol_generator_ui.py            # Legacy generator UI
│   │
│   ├── validation/
│   │   ├── enhanced_validation_generator.py    # Main validation and scoring
│   │   ├── protocol_validator.py               # Field-level protocol validation
│   │   ├── placeholder_validator.py            # TBD/placeholder detection
│   │   └── validation_report_generator.py      # Report formatting
│   │
│   ├── export/
│   │   └── pdf_generator.py                    # ReportLab PDF generation
│   │
│   ├── database/
│   │   └── (protocol storage and retrieval)
│   │
│   └── utils/
│       └── (shared helper utilities)
│
├── tests/
│   └── (test files)
│
├── .streamlit/
│   └── config.toml               # Streamlit server configuration
│
└── nextjs-ui/                    # Experimental Next.js frontend (not the main app)

Technology Stack

| Technology | Version | Purpose |
|---|---|---|
| Streamlit | ≥ 1.47 | Interactive web UI and session management |
| Anthropic Claude | claude-3-5-haiku-20241022 | AI agent debate generation |
| CrewAI | ≥ 0.28 | Multi-agent orchestration framework |
| Pydantic | ≥ 2.0 | Data validation and protocol modeling |
| ReportLab | ≥ 4.4 | Professional PDF document generation |
| Plotly | ≥ 6.2 | Interactive charts and consensus meter |
| pandas | ≥ 2.3 | Data manipulation and tabular display |
| NumPy / SciPy | latest | Statistical calculations and power analysis |
| python-docx | ≥ 1.2 | Word document export |

Setup & Configuration

Prerequisites

Python 3.11 or higher
An Anthropic API key (for live AI debates — optional, see fallback below)

Installation

Dependencies are managed via pyproject.toml and uv. On Replit, they are installed automatically. For local development:

pip install -e .

Or with uv:

uv sync

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | Optional | Enables live AI agent debates using Claude 3.5 Haiku. Without this, the system uses an evidence-based fallback generator that produces high-quality protocols from clinical trial best practices and FDA guidelines. |

What Happens Without an API Key

If no Anthropic API key is present (or if the key is invalid / quota is exhausted), the system automatically switches to _run_fallback_debate(), which:

Uses phase-specific evidence-based defaults (e.g., OS for Phase 3 oncology, ORR for Phase 2)
Calculates statistically valid sample sizes using standard power formulas
Generates realistic budgets based on phase and indication
Shows the same debate UI with structured "evidence-based" agent positions
Produces identical PDF output with full content

No data is lost and no errors are shown to the user — the system is fully functional in fallback mode.
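
The switch to fallback mode can be sketched as a key check followed by a guarded call. The README names `_run_fallback_debate()` but does not show its signature, so the `live_debate` and `fallback_debate` callables and the returned status strings below are hypothetical.

```python
import os


def has_api_key(env=None):
    """True if a non-empty ANTHROPIC_API_KEY is present in the environment."""
    env = os.environ if env is None else env
    return bool((env.get("ANTHROPIC_API_KEY") or "").strip())


def run_debate_with_fallback(topic, live_debate, fallback_debate, env=None):
    """Try the live debate; on a missing key or any error, use the fallback.

    Returns (decision, status) so the UI can explain which path ran.
    """
    if not has_api_key(env):
        return fallback_debate(topic), "fallback: no API key"
    try:
        return live_debate(topic), "live"
    except Exception as exc:  # e.g. invalid key, exhausted quota, network error
        return fallback_debate(topic), f"fallback: {exc}"
```

Returning a status alongside the decision is one way to drive a banner that tells the user why fallback mode was used.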

Running the App

On Replit

The workflow is already configured. Click Run or start the Enhanced Protocol Generator workflow.

streamlit run main_enhanced.py --server.port 5000

Locally

streamlit run main_enhanced.py

The app will be available at http://localhost:8501 by default.

Using the Application

1. Start a New Protocol

From the home screen, click Start New Protocol or select AI Protocol Architect from the navigation sidebar.

2. Complete the Wizard

Fill out all five steps:

Step 1: Disease indication and intervention name
Step 2: Study design (phase, randomization, blinding, duration)
Step 3: Patient population and eligibility criteria
Step 4: Endpoints and assessments
Step 5: Review and confirm before generation

A demo dataset (breast cancer Phase 3) can be loaded from the sidebar for quick exploration.

3. Choose Generation Mode

Select Draft for speed or Review/Submission for full AI debate. The mode selector is in the sidebar under Generation Settings.

4. Generate

Click Generate Protocol. The app will:

Initialize the debate system
Run agents through three rounds per topic
Display live agent messages and consensus updates
Assemble and validate the final protocol

5. Download PDF

After generation, a download button for the full protocol PDF appears at the top of the results screen.

Troubleshooting

Generation Resets to Ready Screen

This can happen if the AI debate takes longer than Streamlit's default session timeout, or if an error occurs mid-debate. The app handles this by:

Storing each completed topic debate decision in st.session_state immediately after it finishes
Catching exceptions at the debate topic level and falling back to evidence-based decisions rather than crashing
Displaying a View Generated Protocol button in the sidebar if a protocol was previously completed

If you see the ready screen unexpectedly, check the sidebar for the View Generated Protocol button.

Slow Initial Load (High LCP)

The app previously used heavy CSS effects (gradients, backdrop-filter, multiple box-shadow layers) that caused Largest Contentful Paint times exceeding 90 seconds. These have been replaced with solid-color equivalents that maintain the same dark theme without the performance cost.

If you still see slow loads, check:

Whether a large number of Plotly charts are being rendered simultaneously
Browser extensions that may be interfering with the WebSocket connection

AI Service Not Available

If the Anthropic API is unavailable, the status banner at the top of the debate section will tell you why (no key, invalid key, quota exceeded, etc.). The system will automatically use evidence-based protocol generation. No action is needed — the output quality is still high.

Protocol Contains Placeholder Warnings

The validation system warns (rather than raising an error) if it detects placeholder-like values, and these warnings don't block PDF generation. A warning typically means a fallback decision produced a less specific value for a particular field.

Contributing

The codebase follows a clear layered architecture:

UI layer (src/ui/) — All Streamlit rendering code
Agent layer (src/agents/) — Debate and generation logic
Model layer (src/models/) — Data structures
Validation layer (src/validation/) — Scoring and quality checks
Export layer (src/export/) — Document generation

When adding a new agent or debate topic, add it to guideline_based_debate_system.py following the existing pattern of must_enforce requirements and three-round debate logic.

License

This project is intended for research and educational use in clinical trial design. Always ensure that any generated protocols are reviewed by qualified medical, statistical, and regulatory professionals before use in actual research.
