High-Level Architecture: AI Code Review & CI Diagnosis Platform

1. Overview

This document describes a high-level architecture for an AI-powered GitHub integration product that supports pull request review, CI failure diagnosis, and automated feedback through GitHub comments and checks.

The core flow is:

GitHub Event → GitHub Webhook → Webhook Handler → Agent Orchestrator → Specialized Agents → GitHub APIs

The system is event-driven. GitHub emits events such as pull request creation, pull request update, workflow failure, or check completion. The backend receives these events through a GitHub App webhook, validates and routes them, then delegates work to an orchestrator that coordinates specialized AI agents. Final results are written back to GitHub through GitHub APIs.


2. Architecture Diagram

flowchart TD
    %% External Systems
    GitHub[GitHub]
    GitHubApp[GitHub App]
    OpenAI[OpenAI / LLM Provider]

    %% Events
    PREvent[Pull Request Events]
    CIEvent[Workflow / CI Failure Events]
    PushEvent[Push / Commit Events]

    %% Backend Entry
    Webhook[GitHub Webhook Endpoint]
    Handler[Webhook Handler]
    EventRouter[Event Router]

    %% Async Processing
    Queue[Job Queue]
    Worker[Background Worker]

    %% Core Logic
    Orchestrator[Agent Orchestrator]

    %% Agents
    PRSummaryAgent[PR Summary Agent]
    CodeReviewAgent[Code Review Agent]
    CIDiagnosisAgent[CI Root Cause Agent]
    FixSuggestionAgent[Fix Suggestion Agent]

    %% Services
    GitHubClient[GitHub API Client]
    DiffService[Diff Fetching Service]
    LogService[CI Log Fetching Service]
    OutputService[Output Service]
    Storage[(Database / Storage)]

    %% Event Sources
    GitHub --> PREvent
    GitHub --> CIEvent
    GitHub --> PushEvent

    PREvent --> GitHubApp
    CIEvent --> GitHubApp
    PushEvent --> GitHubApp

    GitHubApp --> Webhook
    Webhook --> Handler
    Handler --> EventRouter
    EventRouter --> Queue
    Queue --> Worker
    Worker --> Orchestrator

    %% Orchestration
    Orchestrator --> PRSummaryAgent
    Orchestrator --> CodeReviewAgent
    Orchestrator --> CIDiagnosisAgent
    Orchestrator --> FixSuggestionAgent

    %% Agent Dependencies
    PRSummaryAgent --> DiffService
    CodeReviewAgent --> DiffService
    CIDiagnosisAgent --> LogService
    FixSuggestionAgent --> DiffService
    FixSuggestionAgent --> LogService

    DiffService --> GitHubClient
    LogService --> GitHubClient
    GitHubClient --> GitHub

    PRSummaryAgent --> OpenAI
    CodeReviewAgent --> OpenAI
    CIDiagnosisAgent --> OpenAI
    FixSuggestionAgent --> OpenAI

    %% Output
    Orchestrator --> OutputService
    OutputService --> GitHubClient
    OutputService --> Storage
    Handler --> Storage
    Worker --> Storage
    Orchestrator --> Storage

3. Component Responsibilities

3.1 GitHub App

The GitHub App is the official integration identity installed by users or organizations. It provides permissions, receives GitHub events, and allows the backend to access repositories through installation tokens.

Typical permissions include:

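  • Metadata: read (mandatory for every GitHub App)
  • Repository contents: read
  • Pull requests: read and write
  • Checks: read and write
  • Actions: read (needed to list workflow jobs and download their logs)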
  • Issues or pull request comments: write

3.2 GitHub Webhook Endpoint

The webhook endpoint is the public HTTP entrypoint exposed by the backend. GitHub sends event payloads to this endpoint whenever a subscribed event occurs.

Example endpoint:

POST /webhook/github

3.3 Webhook Handler

The webhook handler is a lightweight adapter layer. It should not run AI logic directly.

Its responsibilities are:

  • Verify the webhook signature (X-Hub-Signature-256 header)
  • Read the GitHub event headers (X-GitHub-Event, X-GitHub-Delivery)
  • Parse the raw payload
  • Extract required metadata
  • Deduplicate repeated webhook deliveries
  • Pass normalized events to the event router
  • Return 200 OK quickly
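GitHub signs each delivery with an HMAC-SHA256 of the raw request body, keyed by the webhook secret, and sends it as `sha256=<hex digest>` in the `X-Hub-Signature-256` header. A minimal sketch of the check, assuming the handler has access to the raw bytes before any JSON parsing (the function name `verify_signature` is illustrative):

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: bytes, raw_body: bytes, signature_header: Optional[str]) -> bool:
    """Check the X-Hub-Signature-256 header against an HMAC-SHA256 of the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    # Sign the raw bytes exactly as received; re-serialized JSON would not match.
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, so timing does not reveal how much matched
    # and a non-ASCII header cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Reject the request (for example with 401) when this returns False, before deduplication or routing.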

3.4 Event Router

The event router maps GitHub event types and actions to internal job types.

Example routing rules:

pull_request.opened        → PR_REVIEW_JOB
pull_request.synchronize   → PR_REVIEW_JOB
workflow_run.completed     → CI_DIAGNOSIS_JOB
check_suite.completed      → CI_DIAGNOSIS_JOB
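The routing rules above can be kept as a lookup table. This sketch also applies the "only diagnose failed runs" branch from the CI sequence diagram; the set of conclusions treated as failures is an assumption. Note that for GitHub Actions, `workflow_run.completed` and `check_suite.completed` may both fire for the same failure, so a deployment may want to subscribe to only one of them or deduplicate on commit SHA.

```python
from typing import Optional

# (event, action) -> internal job type, mirroring the routing rules above.
ROUTES = {
    ("pull_request", "opened"): "PR_REVIEW_JOB",
    ("pull_request", "synchronize"): "PR_REVIEW_JOB",
    ("workflow_run", "completed"): "CI_DIAGNOSIS_JOB",
    ("check_suite", "completed"): "CI_DIAGNOSIS_JOB",
}

# Assumed policy: which CI conclusions are worth diagnosing.
CI_FAILURE_CONCLUSIONS = {"failure", "timed_out"}


def route_event(event: str, payload: dict) -> Optional[str]:
    """Return the job type to enqueue, or None to ignore the event."""
    job_type = ROUTES.get((event, payload.get("action")))
    if job_type == "CI_DIAGNOSIS_JOB":
        run = payload.get("workflow_run") or payload.get("check_suite") or {}
        if run.get("conclusion") not in CI_FAILURE_CONCLUSIONS:
            return None  # successful or cancelled runs are recorded, not diagnosed
    return job_type
```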

3.5 Job Queue

The job queue decouples webhook handling from long-running AI processing. This prevents GitHub webhook timeouts and allows retry, backoff, and parallel processing.

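For MVP, this can be implemented with FastAPI background tasks, which run in the same process after the response is sent and therefore offer no persistence or retries. For production, use a dedicated queue such as RQ (Redis Queue), Celery, Dramatiq, or a cloud-native queue.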

3.6 Background Worker

Workers consume jobs from the queue and execute business workflows. A worker loads the required GitHub context, invokes the orchestrator, and persists execution status.
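A minimal sketch of the queue-plus-worker contract, using the standard-library `queue.Queue` as a stand-in for a real broker and a dict as a stand-in for the jobs table. The `Job` fields, the `handle` hook (which would invoke the orchestrator), and the status strings are hypothetical.

```python
import queue
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Job:
    job_id: str
    job_type: str                       # e.g. "PR_REVIEW_JOB"
    payload: dict = field(default_factory=dict)
    attempts: int = 0


def run_worker(
    jobs: "queue.Queue[Job]",
    handle: Callable[[Job], None],      # hypothetical hook that calls the orchestrator
    status: Dict[str, str],             # stand-in for persisted job status
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Drain the queue, retrying failed jobs with exponential backoff."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        job.attempts += 1
        status[job.job_id] = "running"
        try:
            handle(job)
            status[job.job_id] = "completed"
        except Exception:
            if job.attempts >= max_attempts:
                status[job.job_id] = "failed"
            else:
                status[job.job_id] = "retrying"
                sleep(base_delay * 2 ** (job.attempts - 1))  # 1s, 2s, 4s, ...
                jobs.put(job)
```

Sleeping inside the worker blocks every other job, which is acceptable only for a sketch; dedicated queue libraries typically schedule the delayed retry instead.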

3.7 Agent Orchestrator

The orchestrator coordinates specialized agents. It decides which agents should run based on the job type and repository context.

Example responsibilities:

  • Decide whether the event requires PR review or CI diagnosis
  • Fetch required context through services
  • Call one or more agents
  • Merge agent results
  • Send final result to the output service
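The responsibilities above reduce to a small dispatch loop. In this sketch, agents, context loading, and publishing are injected callables; the `Agent` signature, the `plan` mapping, and the result shape are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces for this sketch:
#   an agent takes a shared context dict and returns a list of finding dicts;
#   load_context fetches diffs or logs through the services for a job;
#   publish hands the merged result to the output service.
Agent = Callable[[dict], List[dict]]


class Orchestrator:
    def __init__(
        self,
        agents: Dict[str, Agent],
        plan: Dict[str, List[str]],
        load_context: Callable[[str, dict], dict],
        publish: Callable[[dict], None],
    ):
        self.agents = agents              # agent name -> agent callable
        self.plan = plan                  # job type -> agent names to run, in order
        self.load_context = load_context
        self.publish = publish

    def run(self, job_type: str, job_payload: dict) -> dict:
        agent_names = self.plan.get(job_type)
        if agent_names is None:
            raise ValueError(f"no workflow for job type {job_type!r}")
        context = self.load_context(job_type, job_payload)
        # Run each agent on the same context and merge results keyed by agent name.
        result = {
            "job_type": job_type,
            "results": {name: self.agents[name](context) for name in agent_names},
        }
        self.publish(result)
        return result
```

Agents run sequentially here; because each reads the same context independently, they could also be run concurrently.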

3.8 Specialized Agents

The system can contain multiple task-specific agents:

  • PR Summary Agent: Generates a high-level summary of the pull request.
  • Code Review Agent: Reviews diffs and produces actionable comments.
  • CI Root Cause Agent: Analyzes failed workflow logs and explains the likely cause.
  • Fix Suggestion Agent: Suggests code or configuration changes to fix issues.
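Each agent can follow the same shape: build a prompt, call the LLM, and parse the reply into typed findings. A sketch of the Code Review Agent under that assumption; the `complete` hook, the prompt wording, and the `Finding` fields are hypothetical and stand in for a real provider client.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    severity: str   # e.g. "info", "warning", "error"
    message: str


class CodeReviewAgent:
    """One task-specific agent: builds a prompt, calls the LLM, parses structured output."""

    PROMPT = (
        "You are reviewing a pull request diff. Reply with only a JSON array of "
        'objects with keys "path", "line", "severity", and "message".\n\n'
    )

    def __init__(self, complete: Callable[[str], str]):
        # Hypothetical LLM hook: prompt text in, model response text out.
        # In practice this would wrap the OpenAI (or other provider) client.
        self.complete = complete

    def run(self, diff: str) -> List[Finding]:
        raw = json.loads(self.complete(self.PROMPT + diff))
        return [
            Finding(str(item["path"]), int(item["line"]),
                    str(item["severity"]), str(item["message"]))
            for item in raw
        ]
```

`json.loads` raises on a non-JSON reply; a production agent would need a policy for that case, such as retrying once or returning no findings.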

3.9 GitHub API Client

The GitHub API client wraps GitHub REST or GraphQL API calls. It is responsible for:

  • Generating installation access tokens by exchanging a JWT signed with the App's private key
  • Fetching pull request diffs
  • Fetching workflow runs and logs
  • Creating PR comments
  • Creating inline review comments
  • Creating or updating check runs
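The client responsibilities map onto a handful of GitHub REST endpoints. This sketch only builds `urllib.request.Request` objects (sending is `urllib.request.urlopen(req)` or any HTTP client), which keeps it testable offline. The helper names are hypothetical; `repo` is assumed to be the `owner/name` full name.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def github_request(method: str, path: str, token: str,
                   body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub REST API request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        API_ROOT + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def installation_token(installation_id: int, app_jwt: str) -> urllib.request.Request:
    # Authenticated with the App's JWT, not with an installation token.
    return github_request("POST", f"/app/installations/{installation_id}/access_tokens", app_jwt)


def pull_request_files(repo: str, number: int, token: str) -> urllib.request.Request:
    return github_request("GET", f"/repos/{repo}/pulls/{number}/files", token)


def workflow_run_jobs(repo: str, run_id: int, token: str) -> urllib.request.Request:
    return github_request("GET", f"/repos/{repo}/actions/runs/{run_id}/jobs", token)


def job_logs(repo: str, job_id: int, token: str) -> urllib.request.Request:
    return github_request("GET", f"/repos/{repo}/actions/jobs/{job_id}/logs", token)


def pr_comment(repo: str, number: int, text: str, token: str) -> urllib.request.Request:
    # Conversation comments on a pull request go through the issue comments endpoint.
    return github_request("POST", f"/repos/{repo}/issues/{number}/comments", token, {"body": text})


def inline_comment(repo: str, number: int, commit_sha: str, path: str, line: int,
                   text: str, token: str) -> urllib.request.Request:
    return github_request("POST", f"/repos/{repo}/pulls/{number}/comments", token,
                          {"body": text, "commit_id": commit_sha, "path": path, "line": line})


def create_check_run(repo: str, head_sha: str, name: str, token: str) -> urllib.request.Request:
    return github_request("POST", f"/repos/{repo}/check-runs", token,
                          {"name": name, "head_sha": head_sha})
```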

3.10 Output Service

The output service converts agent results into GitHub-facing output.

Possible outputs include:

  • Pull request summary comment
  • Inline code review comments
  • CI failure diagnosis comment
  • GitHub Check Run result
  • Optional auto-fix pull request in future versions
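One way to keep summary comments from piling up on every push is to tag the app's comment with a hidden HTML comment (GitHub does not render it) and, on later runs, edit that comment through `PATCH /repos/{owner}/{repo}/issues/comments/{comment_id}` instead of posting a new one. The marker text and function names below are hypothetical; storing the comment ID in the database is an equivalent alternative.

```python
from typing import List, Optional

# Hypothetical marker identifying this app's summary comment.
MARKER = "<!-- ai-review:summary -->"


def render_summary_comment(summary: str, findings: List[str]) -> str:
    """Render agent output as the Markdown body of one PR summary comment."""
    lines = [MARKER, "## AI review summary", "", summary.strip()]
    if findings:
        lines += ["", "### Findings"] + [f"- {item}" for item in findings]
    return "\n".join(lines) + "\n"


def find_previous_comment(comments: List[dict]) -> Optional[int]:
    """Return the id of an earlier summary comment from this app, if any."""
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return comment["id"]
    return None
```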

3.11 Storage

The database stores persistent system state.

Typical data includes:

  • GitHub installation ID
  • Repository ID
  • Pull request number
  • Commit SHA
  • Job status
  • Processed webhook delivery IDs
  • Generated review results
  • GitHub comment IDs
  • User or organization configuration
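The data above fits into three tables. This schema is a sketch: SQLite is used only so it runs without a server (the document does not prescribe a database), and every table and column name is hypothetical.

```python
import sqlite3

# Illustrative schema covering the stored data listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS repo_config (
    repository_id    INTEGER PRIMARY KEY,
    installation_id  INTEGER NOT NULL,
    settings_json    TEXT NOT NULL DEFAULT '{}'   -- user or org configuration
);
CREATE TABLE IF NOT EXISTS processed_deliveries (
    delivery_id      TEXT PRIMARY KEY,            -- X-GitHub-Delivery header
    processed_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS jobs (
    id                INTEGER PRIMARY KEY,
    installation_id   INTEGER NOT NULL,
    repository_id     INTEGER NOT NULL,
    pr_number         INTEGER,                    -- NULL for CI runs without a PR
    commit_sha        TEXT NOT NULL,
    job_type          TEXT NOT NULL,              -- PR_REVIEW_JOB, CI_DIAGNOSIS_JOB
    status            TEXT NOT NULL DEFAULT 'queued',
    result_json       TEXT,                       -- generated review or diagnosis
    github_comment_id INTEGER                     -- lets later runs edit, not repost
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```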

4. Business Sequence Diagram: Pull Request Review

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub
    participant App as GitHub App
    participant WH as Webhook Handler
    participant Router as Event Router
    participant Queue as Job Queue
    participant Worker as Worker
    participant Orch as Agent Orchestrator
    participant Diff as Diff Service
    participant Agent as PR Review Agents
    participant LLM as LLM Provider
    participant Output as Output Service
    participant API as GitHub API

    Dev->>GH: Open or update pull request
    GH->>App: Emit pull_request event
    App->>WH: POST webhook payload
    WH->>WH: Verify signature
    WH->>WH: Extract repo, PR number, installation ID, commit SHA
    WH->>Router: Send normalized event
    Router->>Queue: Enqueue PR_REVIEW_JOB
    WH-->>App: Return 200 OK

    Queue->>Worker: Deliver PR_REVIEW_JOB
    Worker->>Orch: Start PR review workflow
    Orch->>Diff: Fetch PR diff and changed files
    Diff->>API: Request PR files / diff
    API->>GH: Fetch repository data
    GH-->>API: Return diff data
    API-->>Diff: Return normalized diff
    Diff-->>Orch: Return review context

    Orch->>Agent: Run summary and code review agents
    Agent->>LLM: Analyze diff and generate review
    LLM-->>Agent: Return review result
    Agent-->>Orch: Return structured findings

    Orch->>Output: Publish PR review result
    Output->>API: Create PR summary comment and optional inline comments
    API->>GH: Write comments to pull request
    GH-->>API: Confirm comments created
    API-->>Output: Return comment metadata
    Output-->>Orch: Output completed
    Orch-->>Worker: Mark job completed
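The "Extract repo, PR number, installation ID, commit SHA" step in the diagram reads a few fields from the `pull_request` webhook payload. A sketch of that normalization; the `PullRequestEvent` type and its field names are hypothetical, while the payload paths are those of GitHub App `pull_request` deliveries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestEvent:
    """Normalized fields the PR review workflow needs (names are illustrative)."""
    delivery_id: str
    action: str
    installation_id: int
    repo: str          # "owner/name"
    pr_number: int
    head_sha: str      # commit the review applies to


def normalize_pull_request(delivery_id: str, payload: dict) -> PullRequestEvent:
    pr = payload["pull_request"]
    return PullRequestEvent(
        delivery_id=delivery_id,
        action=payload["action"],
        installation_id=payload["installation"]["id"],
        repo=payload["repository"]["full_name"],
        pr_number=pr["number"],
        head_sha=pr["head"]["sha"],
    )
```

Keeping `head_sha` on the job makes it possible to discard a review whose commit has already been superseded by a later `synchronize` event.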

5. Business Sequence Diagram: CI Failure Diagnosis

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub
    participant App as GitHub App
    participant WH as Webhook Handler
    participant Router as Event Router
    participant Queue as Job Queue
    participant Worker as Worker
    participant Orch as Agent Orchestrator
    participant Logs as CI Log Service
    participant Agent as CI Root Cause Agent
    participant LLM as LLM Provider
    participant Output as Output Service
    participant API as GitHub API

    Dev->>GH: Push commit or update pull request
    GH->>GH: Run GitHub Actions workflow
    GH->>App: Emit workflow_run.completed event
    App->>WH: POST webhook payload
    WH->>WH: Verify signature
    WH->>WH: Extract repo, workflow run ID, conclusion, installation ID
    WH->>Router: Send normalized event

    alt Workflow failed
        Router->>Queue: Enqueue CI_DIAGNOSIS_JOB
    else Workflow succeeded
        Router-->>WH: Ignore or record event
    end

    WH-->>App: Return 200 OK

    Queue->>Worker: Deliver CI_DIAGNOSIS_JOB
    Worker->>Orch: Start CI diagnosis workflow
    Orch->>Logs: Fetch failed jobs and logs
    Logs->>API: Request workflow jobs and logs
    API->>GH: Fetch job logs
    GH-->>API: Return logs
    API-->>Logs: Return raw logs
    Logs->>Logs: Extract error sections and relevant context
    Logs-->>Orch: Return condensed failure context

    Orch->>Agent: Run CI root cause analysis
    Agent->>LLM: Analyze logs and infer likely cause
    LLM-->>Agent: Return diagnosis and suggested fix
    Agent-->>Orch: Return structured diagnosis

    Orch->>Output: Publish CI diagnosis result
    Output->>API: Create PR comment or Check Run annotation
    API->>GH: Write diagnosis result
    GH-->>API: Confirm output created
    API-->>Output: Return output metadata
    Output-->>Orch: Output completed
    Orch-->>Worker: Mark job completed
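For CI diagnosis, the handler extracts the run ID, conclusion, and installation ID from the `workflow_run` payload. The run's `pull_requests` list can be empty (for example, for a push with no open pull request), which is one way to decide between the two outputs in the diagram: a PR comment or a check run. Type and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WorkflowRunEvent:
    installation_id: int
    repo: str                       # "owner/name"
    run_id: int
    conclusion: Optional[str]       # None until the run completes
    head_sha: str
    pr_numbers: Tuple[int, ...]     # may be empty


def normalize_workflow_run(payload: dict) -> WorkflowRunEvent:
    run = payload["workflow_run"]
    return WorkflowRunEvent(
        installation_id=payload["installation"]["id"],
        repo=payload["repository"]["full_name"],
        run_id=run["id"],
        conclusion=run.get("conclusion"),
        head_sha=run["head_sha"],
        pr_numbers=tuple(pr["number"] for pr in run.get("pull_requests") or []),
    )


def output_target(event: WorkflowRunEvent) -> str:
    # Comment on the PR when one is linked; otherwise report through a check run.
    return "pr_comment" if event.pr_numbers else "check_run"
```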

6. Recommended MVP Scope

The first MVP should focus on the smallest complete feedback loop:

pull_request.opened → webhook → handler → queue → orchestrator → PR summary agent → GitHub comment

Recommended MVP features:

  1. Receive GitHub pull request webhook
  2. Verify webhook signature
  3. Fetch pull request diff
  4. Generate a PR summary using an LLM
  5. Post one summary comment back to the PR
  6. Store processed event ID to avoid duplicate processing

Avoid implementing CI diagnosis, inline comments, auto-fix commits, and complex multi-agent workflows in the first version. Add those after the basic GitHub integration loop is reliable.


7. Key Engineering Considerations

7.1 Webhook Timeout Avoidance

GitHub expects webhook receivers to return a 2XX response within 10 seconds; slower responses are treated as failed deliveries. The webhook handler should therefore enqueue work and return immediately instead of running AI analysis synchronously.
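A framework-agnostic sketch of a handler that stays within that budget: it verifies the signature, enqueues the event, and acknowledges. The `jobs` queue stands in for the real job queue, routing is left out to keep the focus on timing, and `headers` is assumed to be a mapping with exact header-name keys (real frameworks usually provide case-insensitive lookup).

```python
import hashlib
import hmac
import json
import queue
from typing import Mapping, Tuple


def handle_webhook(headers: Mapping[str, str], body: bytes, secret: bytes,
                   jobs: "queue.Queue[dict]") -> Tuple[int, str]:
    """Do only cheap, bounded work, then acknowledge; AI work happens in workers."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = headers.get("X-Hub-Signature-256", "")
    if not hmac.compare_digest(expected.encode(), received.encode()):
        return 401, "invalid signature"
    # Enqueue the event; routing, context fetching, and LLM calls happen later.
    jobs.put({
        "delivery_id": headers.get("X-GitHub-Delivery"),
        "event": headers.get("X-GitHub-Event"),
        "payload": json.loads(body),
    })
    return 200, "queued"
```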

7.2 Idempotency

The same delivery can arrive more than once, for example when a failed delivery is redelivered. The system should record each X-GitHub-Delivery ID and skip deliveries it has already processed.
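A sketch of the deduplication check, assuming SQLite and a hypothetical `processed_deliveries` table. It relies on the primary-key constraint rather than a separate SELECT, so two workers handling the same delivery concurrently cannot both pass the check.

```python
import sqlite3


def init_deliveries(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_deliveries ("
        " delivery_id TEXT PRIMARY KEY,"
        " processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def claim_delivery(conn: sqlite3.Connection, delivery_id: str) -> bool:
    """Record a delivery ID; return False if it was already recorded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_deliveries (delivery_id) VALUES (?)",
                (delivery_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: another request or worker already claimed it.
        return False
```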

7.3 Installation Token Management

GitHub App installation access tokens expire after one hour. The GitHub API client should cache them per installation and generate a new token shortly before the current one expires.
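A sketch of a per-installation token cache that refreshes a few minutes before expiry. The `fetch` hook is hypothetical: it would call `POST /app/installations/{installation_id}/access_tokens` and convert the response's `expires_at` timestamp to epoch seconds. The clock is injectable so expiry can be tested, and the cache is not thread-safe as written.

```python
import time
from typing import Callable, Dict, Tuple


class InstallationTokenCache:
    """Cache installation tokens and refresh them shortly before they expire."""

    def __init__(
        self,
        fetch: Callable[[int], Tuple[str, float]],   # hypothetical: -> (token, expires_at)
        margin: float = 300.0,                       # refresh when < 5 minutes remain
        clock: Callable[[], float] = time.time,
    ):
        self._fetch = fetch
        self._margin = margin
        self._clock = clock
        self._cache: Dict[int, Tuple[str, float]] = {}

    def get(self, installation_id: int) -> str:
        cached = self._cache.get(installation_id)
        if cached is None or cached[1] - self._clock() <= self._margin:
            cached = self._fetch(installation_id)
            self._cache[installation_id] = cached
        return cached[0]
```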

7.4 Large Diff and Log Handling

Pull request diffs and CI logs can be large. The system should support:

  • File filtering
  • Diff chunking
  • Log truncation
  • Error extraction
  • Token budget control
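Two of these techniques, error extraction from logs and budget-aware diff chunking, can be sketched with the standard library. Characters are used as a rough proxy for tokens, and the error regex is a hypothetical heuristic that would need tuning per toolchain.

```python
import re
from typing import List

# Hypothetical heuristic for lines that usually signal a failure.
ERROR_PATTERN = re.compile(r"error|failed|exception|traceback", re.IGNORECASE)


def extract_error_context(log: str, context: int = 3, max_chars: int = 4000) -> str:
    """Keep matching lines plus `context` lines around each, then truncate to the tail."""
    lines = log.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    if not keep:
        return log[-max_chars:]  # no obvious error lines: fall back to the log tail
    excerpt = "\n".join(lines[i] for i in sorted(keep))
    return excerpt[-max_chars:]  # assumption: the last errors are the most relevant


def chunk_diff_by_file(diff: str, budget: int) -> List[str]:
    """Split a unified diff at 'diff --git' headers and pack whole files into chunks."""
    files = re.split(r"(?m)^(?=diff --git )", diff)
    chunks: List[str] = []
    current = ""
    for part in files:
        if not part:
            continue
        if current and len(current) + len(part) > budget:
            chunks.append(current)
            current = ""
        current += part  # a single file larger than the budget becomes its own chunk
    if current:
        chunks.append(current)
    return chunks
```

Keeping whole files together preserves per-file context for the review agent; files that exceed the budget on their own would still need hunk-level splitting or filtering.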

7.5 Output Quality Control

Agent output should be structured before being posted to GitHub. The output service should validate formatting and avoid noisy or duplicate comments.
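A sketch of such a filter for review findings: it drops malformed entries and unknown severities, removes near-duplicates, and caps how many comments are posted, most severe first. The severity levels, the duplicate key, and the default cap are all assumed policy, not fixed requirements.

```python
from typing import List

# Assumed severity policy; lower number = posted first.
SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}


def filter_findings(findings: List[dict], max_comments: int = 10) -> List[dict]:
    """Drop malformed and duplicate findings and cap how many get posted."""
    seen = set()
    kept = []
    for f in findings:
        path, line, sev, msg = f.get("path"), f.get("line"), f.get("severity"), f.get("message")
        if not (isinstance(path, str) and isinstance(line, int)
                and isinstance(msg, str) and msg.strip()):
            continue  # malformed entry
        if sev not in SEVERITY_ORDER:
            continue  # unknown severity
        key = (path, line, msg.strip().lower())
        if key in seen:
            continue  # same comment on the same line
        seen.add(key)
        kept.append(f)
    kept.sort(key=lambda f: SEVERITY_ORDER[f["severity"]])  # stable within a severity
    return kept[:max_comments]
```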

7.6 Security

The system should:

  • Verify webhook signatures
  • Use least-privilege GitHub App permissions
  • Avoid logging secrets or private repository content unnecessarily
  • Encrypt sensitive credentials
  • Separate user-facing API authentication from GitHub App authentication
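For the logging point, a standard-library `logging.Filter` can scrub token-like strings before records reach any handler output. The regex covers GitHub's documented token prefixes (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`, `github_pat_`) and any `Bearer <value>` credential; it is a sketch and will not catch every secret format.

```python
import logging
import re

# GitHub token prefixes plus any "Bearer <value>" credential (e.g. an App JWT).
SECRET_PATTERN = re.compile(
    r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})\b|Bearer\s+\S+"
)


class RedactSecretsFilter(logging.Filter):
    """Rewrite log records so token-like strings never reach log storage."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        redacted = SECRET_PATTERN.sub("[REDACTED]", message)
        if redacted != message:
            record.msg, record.args = redacted, ()
        return True  # never drop the record, only scrub it
```

Attach the filter to handlers rather than loggers: a filter on a logger is not applied to records propagated up from its child loggers.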

8. Suggested Backend Module Structure

app/
├── main.py
├── config.py
├── api/
│   ├── routes.py
│   └── endpoints/
│       ├── github_webhook.py
│       └── health.py
├── github/
│   ├── auth.py
│   ├── client.py
│   ├── webhooks.py
│   └── event_router.py
├── jobs/
│   ├── queue.py
│   ├── pr_review_job.py
│   └── ci_diagnosis_job.py
├── agents/
│   ├── orchestrator.py
│   ├── pr_summary_agent.py
│   ├── code_review_agent.py
│   ├── ci_root_cause_agent.py
│   └── fix_suggestion_agent.py
├── services/
│   ├── diff_service.py
│   ├── ci_log_service.py
│   ├── output_service.py
│   └── review_service.py
├── storage/
│   ├── db.py
│   └── repositories.py
└── utils/
    ├── logging.py
    └── errors.py