
AKarode/Clinical-Trial-Review


Clinical Trial Protocol AI Designer

A comprehensive, AI-powered clinical trial protocol generation platform built with Streamlit. It uses a multi-agent debate system in which six expert stakeholders argue, negotiate, and reach consensus on every protocol decision, enforcing real regulatory and scientific guidelines at each step.



Overview

Designing a clinical trial protocol is one of the most complex, high-stakes tasks in medicine. It requires input from medical experts, biostatisticians, regulatory specialists, site operations teams, patient advocates, and finance directors, all of whom have different (and often conflicting) priorities.

This platform simulates that real-world multi-stakeholder process using AI agents. Instead of generating a generic protocol template, each major protocol decision goes through a live debate where six expert personas argue their positions, cite guidelines, push back on each other, and only settle when every stakeholder is satisfied.

The result is a protocol that reflects real-world trade-offs, is guideline-compliant, and contains zero placeholder values.


How It Works

Step-by-Step Flow

  1. Enter Study Parameters. The user fills out a guided wizard with:

    • Disease indication and therapeutic area
    • Study phase (Phase 1, 2, or 3)
    • Investigational drug/intervention name
    • Target enrollment and study duration
  2. Choose Generation Mode. Select between Draft (fast), Review (balanced), or Submission (thorough) quality levels.

  3. AI Agents Debate. Six expert agents debate five key protocol topics in sequence:

    • Primary Endpoint selection
    • Sample Size calculation
    • Eligibility Criteria definition
    • Visit Schedule design
    • Budget & Timeline planning
  4. Consensus is Reached. Each debate runs through three structured rounds. Consensus is tracked dynamically (30% → 50% → 80–95%) until all agents agree on a decision that meets their non-negotiable requirements.

  5. Protocol is Assembled. All debate decisions are compiled into a complete clinical protocol object using Pydantic data models.

  6. Validation Report. The protocol is automatically scored across five dimensions: completeness, scientific validity, regulatory compliance, operational feasibility, and ethical considerations.

  7. Download PDF. A fully formatted, professional PDF is generated containing all protocol sections with no TBD or placeholder values.


The Expert Agent Panel

Six AI personas represent real stakeholder roles found in clinical trial design. Each has a set of non-negotiable requirements they enforce during every debate.

1. Medical Monitor (Chief Medical Officer)

  • Focus: Patient safety, clinical meaningfulness of endpoints, safety monitoring frequency
  • Won't budge on: Any decision that compromises patient safety or clinical validity
  • Enforces: FDA safety assessment requirements, minimum monitoring frequency, benefit-risk standards

2. Statistician (Head of Biostatistics)

  • Focus: Statistical power, sample size methodology, endpoint measurability
  • Won't budge on: Minimum 80% statistical power, Type I error ≤ 0.05, proper dropout adjustments
  • Enforces: Power calculation standards, interim analysis rules for large trials (>300 patients), measurement validity requirements

3. Regulatory Affairs (VP Regulatory Affairs)

  • Focus: FDA compliance, regulatory precedent, submission readiness
  • Won't budge on: Established regulatory frameworks for the indication, proper endpoint precedent, GCP compliance
  • Enforces: FDA endpoint requirements for specific indications, regulatory submission standards, ethical review criteria

4. Site Feasibility (Site Operations Director)

  • Focus: Enrollment realism, site capabilities, operational logistics
  • Flexible on: Site numbers (can be adjusted based on other constraints)
  • Enforces: Realistic enrollment rates based on disease prevalence, site capacity constraints, screen failure rate adjustments (typically 20–30%)

5. Patient Advocate

  • Focus: Patient burden, quality of life, visit frequency
  • Won't budge on: Maximum 2 visits per month unless there is a safety concern, procedures per visit < 4 hours, availability of home/remote options
  • Enforces: Ethical treatment of participants, dropout risk reduction, patient-centric scheduling

6. Finance Director

  • Focus: Budget ceilings, cost-effectiveness, resource optimization
  • Won't budge on: Total cost must not exceed the budget ceiling; every dollar must be justified
  • Enforces: Per-patient cost justification, screen failure cost inclusion, milestone-based spending controls
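One way to picture the panel's non-negotiable requirements is as data that every proposal is checked against. The sketch below is hypothetical: `AgentPersona`, `Requirement`, and the proposal keys are illustrative names, not the project's actual classes in `agent_configs.py` or `guideline_based_debate_system.py`, and only a few of the rules listed above are modeled.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical proposal shape: numeric protocol parameters keyed by name.
Proposal = dict[str, float]


@dataclass
class Requirement:
    description: str
    check: Callable[[Proposal], bool]  # True when the proposal satisfies the rule


@dataclass
class AgentPersona:
    role: str
    must_enforce: list[Requirement] = field(default_factory=list)

    def violations(self, proposal: Proposal) -> list[str]:
        """Descriptions of every non-negotiable requirement the proposal fails."""
        return [r.description for r in self.must_enforce if not r.check(proposal)]


# Missing values default to something that fails the check, so gaps are flagged.
STATISTICIAN = AgentPersona("Statistician", [
    Requirement("power >= 0.80", lambda p: p.get("power", 0.0) >= 0.80),
    Requirement("alpha <= 0.05", lambda p: p.get("alpha", 1.0) <= 0.05),
])

# Simplified: the "unless there is a safety concern" exception is not modeled.
PATIENT_ADVOCATE = AgentPersona("Patient Advocate", [
    Requirement("<= 2 visits per month", lambda p: p.get("visits_per_month", 99) <= 2),
    Requirement("< 4 hours per visit", lambda p: p.get("hours_per_visit", 99) < 4),
])

FINANCE_DIRECTOR = AgentPersona("Finance Director", [
    Requirement(
        "total cost <= budget ceiling",
        lambda p: p.get("total_cost", float("inf")) <= p.get("budget_ceiling", 0.0),
    ),
])
```

A proposal with three visits per month would pass the Statistician and Finance Director but come back from `PATIENT_ADVOCATE.violations(...)` as `["<= 2 visits per month"]`, which is the kind of specific, guideline-tied rejection Round 2 relies on.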

Debate Process

Each of the five protocol topics goes through three structured debate rounds:

Round 1: Propose

Each agent submits an initial proposal for the topic (e.g., "I propose OS as the primary endpoint because..."). Agents cite applicable guidelines and their own area of expertise.

Round 2: Check Compliance

Agents review each other's proposals and flag any violations of their mandatory requirements. An agent may:

  • Approve a proposal if it meets their requirements
  • Reject it with specific reasons tied to guidelines
  • Propose a compromise if they see a path to resolution

Round 3: Final Agreement

Based on compliance feedback, agents revise or accept proposals. The final agreed-upon decision is recorded with:

  • The specific protocol decision (e.g., "Overall Survival at 24 months")
  • The compliance rate (percentage of agents who approved)
  • The consensus level reached

If guidelines conflict in a way that makes agreement impossible, agents state "no solution possible" and the system escalates to the fallback generator.
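The three rounds can be sketched as a small driver function. This is a simplified, hypothetical illustration: the callables stand in for the LLM-backed agents, `min_rate` is an assumed approval bar (1.0 means every agent must approve), and real agents would also revise proposals in Round 3 rather than only voting.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DebateOutcome:
    decision: str
    compliance_rate: float  # share of agents that approved the chosen proposal
    used_fallback: bool     # True when no proposal cleared the approval bar


def run_topic_debate(
    agents: list[str],
    propose: Callable[[str], str],         # Round 1: agent -> proposal
    approves: Callable[[str, str], bool],  # Round 2: (agent, proposal) -> approve?
    fallback: Callable[[], str],           # evidence-based default decision
    min_rate: float = 1.0,                 # hypothetical bar; 1.0 = unanimous
) -> DebateOutcome:
    # Round 1: each agent submits a proposal (duplicates collapsed, order kept).
    proposals = list(dict.fromkeys(propose(a) for a in agents))

    # Round 2: each agent checks each proposal against its own requirements.
    rates = {p: sum(approves(a, p) for a in agents) / len(agents) for p in proposals}

    # Round 3: keep the best-supported proposal if it clears the bar;
    # otherwise treat it as "no solution possible" and escalate to the fallback.
    best = max(proposals, key=lambda p: rates[p])
    if rates[best] >= min_rate:
        return DebateOutcome(best, rates[best], used_fallback=False)
    return DebateOutcome(fallback(), rates[best], used_fallback=True)
```

The recorded `compliance_rate` corresponds to the "percentage of agents who approved" stored with each final decision.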

Consensus Tracking

Consensus is updated dynamically as the debate progresses:

  • Start: ~30%
  • After Round 2: ~50%
  • After Round 3: 80–95% (depending on how contentious the topic is)

Generation Modes

Draft (Fast)

Uses the standard ProtocolGeneratorAgent to produce a protocol quickly without running debates. Ideal for rough outlines or initial exploration. Takes seconds.

Review (Balanced, Default)

Runs the full guideline-based debate system with AI agents. Each topic gets a real multi-round debate. Takes several minutes depending on API response times. This is the main mode of the application.

Submission (Thorough)

Similar to Review mode but with additional validation passes and stricter completeness checks. Intended for protocols approaching regulatory submission readiness.
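The three modes amount to different pipeline settings. A minimal sketch, assuming a simple enum-to-plan lookup; `GenerationMode`, `PipelinePlan`, and the numeric pass counts are illustrative, not the app's real configuration.

```python
from dataclasses import dataclass
from enum import Enum


class GenerationMode(Enum):
    DRAFT = "draft"            # single-pass ProtocolGeneratorAgent, no debates
    REVIEW = "review"          # full guideline-based debate (the default)
    SUBMISSION = "submission"  # debate plus stricter validation


@dataclass(frozen=True)
class PipelinePlan:
    run_debates: bool
    validation_passes: int     # illustrative numbers, not the app's real settings
    strict_completeness: bool


PLANS = {
    GenerationMode.DRAFT: PipelinePlan(False, 1, False),
    GenerationMode.REVIEW: PipelinePlan(True, 1, False),
    GenerationMode.SUBMISSION: PipelinePlan(True, 2, True),
}


def plan_for(mode: GenerationMode = GenerationMode.REVIEW) -> PipelinePlan:
    """Look up the pipeline settings for a mode; Review is used when none is given."""
    return PLANS[mode]
```

Keeping the modes as data rather than branching logic makes it easy to see that Submission is Review with extra checks layered on.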


Validation System

After protocol generation, an automated validation report scores the protocol across five weighted dimensions:

| Dimension | Weight | What It Checks |
|---|---|---|
| Completeness | 25% | All required sections present and populated |
| Scientific Validity | 25% | Endpoints are clinically meaningful and powered |
| Regulatory Compliance | 20% | FDA and GCP requirements are met |
| Operational Feasibility | 15% | Enrollment rates, site capacity, timelines are realistic |
| Ethical Considerations | 15% | Patient safety, informed consent, burden management |

A protocol where agents reached consensus on all five topics and all guidelines are met will score 95%+.
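Assuming the overall score is a weighted average of per-dimension scores on a 0 to 100 scale, using the weights in the table, the calculation is straightforward. The dimension keys below are hypothetical; the project's actual scoring lives in `enhanced_validation_generator.py`.

```python
# Weights from the validation table; they sum to 1.0.
WEIGHTS = {
    "completeness": 0.25,
    "scientific_validity": 0.25,
    "regulatory_compliance": 0.20,
    "operational_feasibility": 0.15,
    "ethical_considerations": 0.15,
}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
```

For example, dimension scores of 100, 96, 95, 90, and 92 (in table order) combine to about 95.3.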

The validation system is implemented in:

  • src/validation/enhanced_validation_generator.py: Main scoring logic
  • src/validation/protocol_validator.py: Field-level validation
  • src/validation/placeholder_validator.py: Detects any remaining TBD / placeholder values
  • src/validation/validation_report_generator.py: Report formatting and display
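Placeholder detection can be done with a handful of case-insensitive patterns. This is a minimal sketch, not the contents of `placeholder_validator.py`: the pattern list is an assumption built from the values this README says must never appear, plus a hypothetical bracketed-template-slot rule.

```python
import re

# Assumed patterns: the README's forbidden values plus bracketed template slots.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\bTBD\b", re.IGNORECASE),
    re.compile(r"not specified", re.IGNORECASE),
    re.compile(r"to be (assigned|determined)", re.IGNORECASE),
    re.compile(r"\[[A-Z_ ]+\]"),  # e.g. "[SPONSOR NAME]"
]


def find_placeholders(fields: dict[str, str]) -> list[tuple[str, str]]:
    """Return (field_name, matched_text) for every placeholder-like value."""
    hits = []
    for name, value in fields.items():
        for pattern in PLACEHOLDER_PATTERNS:
            match = pattern.search(value)
            if match:
                hits.append((name, match.group(0)))
    return hits
```

Because the validator warns rather than errors on placeholders, the returned hits would be reported (for instance through `warnings.warn`) instead of aborting PDF generation.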

PDF Export

The PDF generator (src/export/pdf_generator.py) produces a fully formatted clinical trial protocol document containing:

  • Title Page: Study title, clinical trial registry IDs, version tracking, regulatory identifiers, sponsor info, investigator details
  • Protocol Summary: Key metrics including enrollment rates, per-patient costs, number of sites, and phase-specific highlights
  • Study Design Section: Detailed design description, randomization methodology, blinding specifications, duration components
  • Sample Size & Statistics: Full power analysis, evaluable sample size, dropout adjustments, interim analysis plan
  • Eligibility Criteria: Inclusion/exclusion criteria with scientific rationale for each
  • Visit Schedule: Detailed schedule with procedures listed at each visit
  • Budget Appendix: 4-column breakdown of all cost categories with percentages and descriptions

All PDF sections are populated with real content derived from the debate decisions. No TBD, "Not specified", or "To be assigned" values appear anywhere in the output.
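The 4-column budget appendix boils down to turning cost lines into table rows with percentages of the total. A minimal sketch, assuming hypothetical `BudgetLine` records and USD amounts; the resulting list of rows is the plain list-of-lists shape that ReportLab's `platypus.Table` accepts as its data argument, though the project's `pdf_generator.py` may build it differently.

```python
from dataclasses import dataclass


@dataclass
class BudgetLine:
    category: str
    amount: float      # assumed to be USD
    description: str


def budget_table(lines: list[BudgetLine]) -> list[list[str]]:
    """Rows for a Category | Amount | % of Total | Description table, plus a total row."""
    total = sum(line.amount for line in lines)
    if total <= 0:
        raise ValueError("budget total must be positive")
    rows = [["Category", "Amount (USD)", "% of Total", "Description"]]
    for line in lines:
        rows.append([
            line.category,
            f"${line.amount:,.0f}",
            f"{100 * line.amount / total:.1f}%",
            line.description,
        ])
    rows.append(["Total", f"${total:,.0f}", "100.0%", ""])
    return rows
```

Computing percentages from the same list that is rendered keeps the appendix internally consistent: the category rows always add up to the stated total.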


Architecture

┌───────────────────────────────────────────────────────────┐
│                    Streamlit Frontend                     │
│                     main_enhanced.py                      │
│                                                           │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │   Wizard UI   │  │  Debate View  │  │ Validation UI │  │
│  │   (5 steps)   │  │  (live feed)  │  │  (scorecard)  │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
└───────────────────────────────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────┐
│                    Agent Debate System                    │
│             guideline_based_debate_system.py              │
│                                                           │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │    Medical    │  │ Statistician  │  │  Regulatory   │  │
│  │    Monitor    │  │               │  │    Affairs    │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │     Site      │  │    Patient    │  │    Finance    │  │
│  │  Feasibility  │  │   Advocate    │  │   Director    │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
│                                                           │
│    Round 1: Propose → Round 2: Check → Round 3: Agree     │
└───────────────────────────────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────┐
│              Protocol Assembly & Validation               │
│                                                           │
│  src/models/protocol.py  - Pydantic data models           │
│  src/validation/         - Scoring and report generation  │
│  src/export/             - PDF generation with ReportLab  │
└───────────────────────────────────────────────────────────┘

Session State Management

Streamlit reruns the entire script on every user interaction. To prevent losing debate progress mid-generation, all debate decisions and the final protocol object are stored in st.session_state. This means:

  • If the script reruns during a long AI call, completed topic debates are not re-run
  • If the user navigates to another page of the app and returns within the same browser session, they can resume from where they left off
  • A "View Generated Protocol" button appears in the sidebar after any successful generation
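The "don't re-run completed topics" behavior can be sketched as a loop that checks stored decisions before each debate. Assumptions: the `debate_decisions` key, the topic keys, and `run_remaining_topics` are hypothetical names; in the app `state` would be `st.session_state`, which supports the same `in` and item get/set operations as the plain dict used here.

```python
from collections.abc import Callable, MutableMapping

# Hypothetical topic keys, in the order the README lists the debates.
TOPICS = ["primary_endpoint", "sample_size", "eligibility", "visit_schedule", "budget"]


def run_remaining_topics(
    state: MutableMapping,          # st.session_state in the app, a plain dict in tests
    debate: Callable[[str], str],   # runs one topic debate and returns its decision
) -> dict:
    """Debate only topics without a stored decision; persist each one immediately."""
    if "debate_decisions" not in state:
        state["debate_decisions"] = {}
    decisions = state["debate_decisions"]
    for topic in TOPICS:
        if topic in decisions:
            continue  # completed before a rerun, so skip the expensive AI call
        decisions[topic] = debate(topic)
        state["debate_decisions"] = decisions  # write back after every topic
    return decisions
```

Saving after every topic, rather than once at the end, is what lets a rerun in the middle of topic four keep topics one through three.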

File Structure

├── main_enhanced.py              # Main Streamlit app entry point
├── main.py                       # Alternative simpler entry point
├── main_generator.py             # Standalone protocol generator
├── app.py                        # Lightweight app wrapper
├── pyproject.toml                # Python package dependencies
│
├── src/
│   ├── agents/
│   │   ├── guideline_based_debate_system.py   # Core debate engine (6 agents, 3 rounds)
│   │   ├── comprehensive_generator.py         # Fallback protocol generator
│   │   ├── generator_agent.py                 # ProtocolGeneratorAgent (Draft mode)
│   │   ├── real_debate_generator.py           # CrewAI-based debate (legacy)
│   │   ├── visual_debate_generator.py         # Visual debate UI helper
│   │   ├── mock_debate_generator.py           # Mock debates for testing
│   │   ├── simple_debate_system.py            # Simplified debate (no AI)
│   │   ├── review_coordinator.py              # Multi-agent review orchestration
│   │   ├── safety_validator.py                # Agent-level safety checks
│   │   ├── base_agent.py                      # Base class for all agents
│   │   ├── agent_factory.py                   # Agent instantiation factory
│   │   └── agent_configs.py                   # Agent personalities and configs
│   │
│   ├── models/
│   │   ├── protocol.py                        # ClinicalProtocol Pydantic model
│   │   ├── enhanced_protocol.py               # Extended protocol fields
│   │   ├── nih_protocol.py                    # NIH-format protocol model
│   │   ├── validation.py                      # Validation result models
│   │   └── debate.py                          # Debate state models
│   │
│   ├── ui/
│   │   ├── progressive_wizard.py              # 5-step wizard UI (AIProtocolArchitect)
│   │   ├── modern_debate_ui.py                # Debate feed UI components
│   │   ├── validation_dashboard.py            # Validation score display
│   │   ├── visualizations.py                  # Plotly charts and consensus meter
│   │   ├── components.py                      # Reusable UI components
│   │   ├── database_manager_ui.py             # Protocol database browser
│   │   └── protocol_generator_ui.py           # Legacy generator UI
│   │
│   ├── validation/
│   │   ├── enhanced_validation_generator.py   # Main validation and scoring
│   │   ├── protocol_validator.py              # Field-level protocol validation
│   │   ├── placeholder_validator.py           # TBD/placeholder detection
│   │   └── validation_report_generator.py     # Report formatting
│   │
│   ├── export/
│   │   └── pdf_generator.py                   # ReportLab PDF generation
│   │
│   ├── database/
│   │   └── (protocol storage and retrieval)
│   │
│   └── utils/
│       └── (shared helper utilities)
│
├── tests/
│   └── (test files)
│
├── .streamlit/
│   └── config.toml               # Streamlit server configuration
│
└── nextjs-ui/                    # Experimental Next.js frontend (not the main app)

Technology Stack

| Technology | Version | Purpose |
|---|---|---|
| Streamlit | ≥ 1.47 | Interactive web UI and session management |
| Anthropic Claude | claude-3-5-haiku-20241022 | AI agent debate generation |
| CrewAI | ≥ 0.28 | Multi-agent orchestration framework |
| Pydantic | ≥ 2.0 | Data validation and protocol modeling |
| ReportLab | ≥ 4.4 | Professional PDF document generation |
| Plotly | ≥ 6.2 | Interactive charts and consensus meter |
| pandas | ≥ 2.3 | Data manipulation and tabular display |
| NumPy / SciPy | latest | Statistical calculations and power analysis |
| python-docx | ≥ 1.2 | Word document export |
python-docx β‰₯ 1.2 Word document export

Setup & Configuration

Prerequisites

  • Python 3.11 or higher
  • An Anthropic API key (optional; needed only for live AI debates, see the fallback behavior below)

Installation

Dependencies are managed via pyproject.toml and uv. On Replit, they are installed automatically. For local development:

pip install -e .

Or with uv:

uv sync

Environment Variables

| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_API_KEY | Optional | Enables live AI agent debates using Claude 3.5 Haiku. Without this, the system uses an evidence-based fallback generator that produces high-quality protocols from clinical trial best practices and FDA guidelines. |
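Deciding between the live and fallback paths starts with checking whether the variable is set at all. A minimal sketch; `live_debates_enabled` is a hypothetical helper, and accepting an optional mapping simply makes it testable without touching the real environment.

```python
import os
from collections.abc import Mapping
from typing import Optional


def live_debates_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if ANTHROPIC_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())
```

A key that is present can still be invalid or out of quota, which only surfaces when an API call fails, so this check decides the starting path but cannot replace error handling around the calls themselves.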

What Happens Without an API Key

If no Anthropic API key is present (or if the key is invalid / quota is exhausted), the system automatically switches to _run_fallback_debate(), which:

  • Uses phase-specific evidence-based defaults (e.g., OS for Phase 3 oncology, ORR for Phase 2)
  • Calculates statistically valid sample sizes using standard power formulas
  • Generates realistic budgets based on phase and indication
  • Shows the same debate UI with structured "evidence-based" agent positions
  • Produces identical PDF output with full content

No data is lost and no errors are shown to the user; the system is fully functional in fallback mode.
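The "standard power formulas" used by the fallback are not spelled out here. As an illustration only, the textbook normal-approximation formula for a two-sided comparison of two means, n per arm = 2 (z(1 - alpha/2) + z(power))^2 * sd^2 / delta^2, can be combined with the dropout and screen-failure inflation that the Statistician and Site Feasibility agents require. Time-to-event endpoints such as OS are normally sized by required number of events instead, and the 15% dropout and 25% screen-failure defaults below are assumptions.

```python
import math
from statistics import NormalDist


def per_arm_sample_size(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Evaluable patients per arm for a two-sided, two-sample comparison of means."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd**2 / delta**2
    return math.ceil(n)


def enrollment_targets(evaluable_per_arm: int, arms: int = 2,
                       dropout: float = 0.15, screen_failure: float = 0.25) -> dict[str, int]:
    """Inflate the evaluable total for dropouts, then for screen failures."""
    evaluable = evaluable_per_arm * arms
    enrolled = math.ceil(evaluable / (1 - dropout))
    screened = math.ceil(enrolled / (1 - screen_failure))
    return {"evaluable": evaluable, "enrolled": enrolled, "screened": screened}
```

With a standardized effect size of 0.5 (delta = 0.5, sd = 1), alpha = 0.05 and 80% power, this gives 63 evaluable patients per arm, which becomes 126 evaluable, 149 enrolled, and 199 screened under the assumed rates.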


Running the App

On Replit

The workflow is already configured. Click Run or start the Enhanced Protocol Generator workflow.

streamlit run main_enhanced.py --server.port 5000

Locally

streamlit run main_enhanced.py

The app will be available at http://localhost:8501 by default.


Using the Application

1. Start a New Protocol

From the home screen, click Start New Protocol or select AI Protocol Architect from the navigation sidebar.

2. Complete the Wizard

Fill out all five steps:

  • Step 1: Disease indication and intervention name
  • Step 2: Study design (phase, randomization, blinding, duration)
  • Step 3: Patient population and eligibility criteria
  • Step 4: Endpoints and assessments
  • Step 5: Review and confirm before generation

A demo dataset (breast cancer Phase 3) can be loaded from the sidebar for quick exploration.

3. Choose Generation Mode

Select Draft for speed or Review/Submission for full AI debate. The mode selector is in the sidebar under Generation Settings.

4. Generate

Click Generate Protocol. The app will:

  • Initialize the debate system
  • Run agents through three rounds per topic
  • Display live agent messages and consensus updates
  • Assemble and validate the final protocol

5. Download PDF

After generation, a download button for the full protocol PDF appears at the top of the results screen.


Troubleshooting

Generation Resets to Ready Screen

This can happen if the AI debate takes longer than Streamlit's default session timeout, or if an error occurs mid-debate. The app handles this by:

  • Storing each completed topic debate decision in st.session_state immediately after it finishes
  • Catching exceptions at the debate topic level and falling back to evidence-based decisions rather than crashing
  • Displaying a View Generated Protocol button in the sidebar if a protocol was previously completed

If you see the ready screen unexpectedly, check the sidebar for the View Generated Protocol button.
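Topic-level exception handling of the kind described above might look like the following sketch. `decide_topic` and its two callables are hypothetical names; catching a broad `Exception` mirrors the "fall back rather than crash" behavior, at the cost of masking real bugs, which is why the failure is logged with its traceback.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def decide_topic(
    topic: str,
    live_debate: Callable[[str], str],             # AI-backed debate for one topic
    evidence_based_default: Callable[[str], str],  # deterministic fallback decision
) -> tuple[str, str]:
    """Return (decision, source) where source is 'debate' or 'fallback'; never raises."""
    try:
        return live_debate(topic), "debate"
    except Exception:  # API errors, timeouts, malformed model output, ...
        logger.exception("Debate for %s failed; using evidence-based default", topic)
        return evidence_based_default(topic), "fallback"
```

Returning the source alongside the decision lets the UI label fallback decisions, which helps explain the placeholder warnings covered below.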

Slow Initial Load (High LCP)

The app previously used heavy CSS effects (gradients, backdrop-filter, multiple box-shadow layers) that caused Largest Contentful Paint times exceeding 90 seconds. These have been replaced with solid-color equivalents that maintain the same dark theme without the performance cost.

If you still see slow loads, check:

  • Whether a large number of Plotly charts are being rendered simultaneously
  • Browser extensions that may be interfering with the WebSocket connection

AI Service Not Available

If the Anthropic API is unavailable, the status banner at the top of the debate section will tell you why (no key, invalid key, quota exceeded, etc.). The system will automatically use evidence-based protocol generation. No action is needed; the output quality is still high.

Protocol Contains Placeholder Warnings

The validation system will warn (not error) if it detects placeholder-like values. These warnings don't block PDF generation. If you see them, it typically means a fallback decision produced a less specific value for a particular field. The system is designed to handle this gracefully.


Contributing

The codebase follows a clear layered architecture:

  • UI layer (src/ui/): All Streamlit rendering code
  • Agent layer (src/agents/): Debate and generation logic
  • Model layer (src/models/): Data structures
  • Validation layer (src/validation/): Scoring and quality checks
  • Export layer (src/export/): Document generation

When adding a new agent or debate topic, add it to guideline_based_debate_system.py following the existing pattern of must_enforce requirements and three-round debate logic.


License

This project is intended for research and educational use in clinical trial design. Always ensure that any generated protocols are reviewed by qualified medical, statistical, and regulatory professionals before use in actual research.
