This document describes a high-level architecture for an AI-powered GitHub integration product that supports pull request review, CI failure diagnosis, and automated feedback through GitHub comments and checks.
The core flow is:
GitHub Event → GitHub Webhook → Webhook Handler → Agent Orchestrator → Specialized Agents → GitHub APIs
The system is event-driven. GitHub emits events such as pull request creation, pull request update, workflow failure, or check completion. The backend receives these events through a GitHub App webhook, validates and routes them, then delegates work to an orchestrator that coordinates specialized AI agents. Final results are written back to GitHub through GitHub APIs.
```mermaid
flowchart TD
%% External Systems
GitHub[GitHub]
GitHubApp[GitHub App]
OpenAI[OpenAI / LLM Provider]
%% Events
PREvent[Pull Request Events]
CIEvent[Workflow / CI Failure Events]
PushEvent[Push / Commit Events]
%% Backend Entry
Webhook[GitHub Webhook Endpoint]
Handler[Webhook Handler]
EventRouter[Event Router]
%% Async Processing
Queue[Job Queue]
Worker[Background Worker]
%% Core Logic
Orchestrator[Agent Orchestrator]
%% Agents
PRSummaryAgent[PR Summary Agent]
CodeReviewAgent[Code Review Agent]
CIDiagnosisAgent[CI Root Cause Agent]
FixSuggestionAgent[Fix Suggestion Agent]
%% Services
GitHubClient[GitHub API Client]
DiffService[Diff Fetching Service]
LogService[CI Log Fetching Service]
OutputService[Output Service]
Storage[(Database / Storage)]
%% Event Sources
GitHub --> PREvent
GitHub --> CIEvent
GitHub --> PushEvent
PREvent --> GitHubApp
CIEvent --> GitHubApp
PushEvent --> GitHubApp
GitHubApp --> Webhook
Webhook --> Handler
Handler --> EventRouter
EventRouter --> Queue
Queue --> Worker
Worker --> Orchestrator
%% Orchestration
Orchestrator --> PRSummaryAgent
Orchestrator --> CodeReviewAgent
Orchestrator --> CIDiagnosisAgent
Orchestrator --> FixSuggestionAgent
%% Agent Dependencies
PRSummaryAgent --> DiffService
CodeReviewAgent --> DiffService
CIDiagnosisAgent --> LogService
FixSuggestionAgent --> DiffService
FixSuggestionAgent --> LogService
DiffService --> GitHubClient
LogService --> GitHubClient
GitHubClient --> GitHub
PRSummaryAgent --> OpenAI
CodeReviewAgent --> OpenAI
CIDiagnosisAgent --> OpenAI
FixSuggestionAgent --> OpenAI
%% Output
Orchestrator --> OutputService
OutputService --> GitHubClient
OutputService --> Storage
Handler --> Storage
Worker --> Storage
Orchestrator --> Storage
```
The GitHub App is the official integration identity installed by users or organizations. It provides permissions, receives GitHub events, and allows the backend to access repositories through installation tokens.
Typical permissions include:
- Repository contents: read
- Pull requests: read and write
- Checks: read and write
- Actions: read
- Issues or pull request comments: write
The webhook endpoint is the public HTTP entrypoint exposed by the backend. GitHub sends event payloads to this endpoint whenever a subscribed event occurs.
Example endpoint:
POST /webhook/github
The webhook handler is a lightweight adapter layer. It should not run AI logic directly.
Its responsibilities are:
- Verify GitHub webhook signature
- Read GitHub event headers (`X-GitHub-Event`, `X-GitHub-Delivery`, `X-Hub-Signature-256`)
- Parse the raw payload
- Extract required metadata
- Deduplicate repeated webhook deliveries
- Pass normalized events to the event router
- Return `200 OK` quickly
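A minimal sketch of the handler's first steps (signature verification and metadata extraction), using only the Python standard library. The header names, the `sha256=` prefix, and the HMAC-SHA256-over-the-raw-body scheme are GitHub's; `NormalizedEvent`, `verify_signature`, and `normalize` are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class NormalizedEvent:
    delivery_id: str
    event: str
    action: Optional[str]
    installation_id: Optional[int]
    repository: Optional[str]
    payload: dict


def verify_signature(secret: bytes, body: bytes, signature_header: Optional[str]) -> bool:
    """Check X-Hub-Signature-256, sent as "sha256=<hex HMAC of the raw body>"."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison so response timing does not reveal how much matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def normalize(headers: Mapping[str, str], payload: dict) -> NormalizedEvent:
    """Extract the metadata the router and workers need from headers and parsed JSON."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return NormalizedEvent(
        delivery_id=lowered.get("x-github-delivery", ""),
        event=lowered.get("x-github-event", ""),
        action=payload.get("action"),
        installation_id=(payload.get("installation") or {}).get("id"),
        repository=(payload.get("repository") or {}).get("full_name"),
        payload=payload,
    )
```

The signature must be computed over the exact bytes received. In a FastAPI endpoint those would come from `await request.body()`, read before any JSON parsing.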
The event router maps GitHub event types and actions to internal job types.
Example routing rules:
```text
pull_request.opened → PR_REVIEW_JOB
pull_request.synchronize → PR_REVIEW_JOB
workflow_run.completed → CI_DIAGNOSIS_JOB
check_suite.completed → CI_DIAGNOSIS_JOB
```
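The routing rules can be expressed as a lookup table keyed on `(event, action)`. This sketch assumes a hypothetical policy that CI jobs are created only for failed runs; it reads `conclusion` from the `workflow_run` or `check_suite` object in the payload. If both events are subscribed, the same failure might be enqueued twice, so a real router would likely also need to deduplicate, for example on the head commit SHA.

```python
from enum import Enum
from typing import Optional


class JobType(str, Enum):
    PR_REVIEW_JOB = "PR_REVIEW_JOB"
    CI_DIAGNOSIS_JOB = "CI_DIAGNOSIS_JOB"


# (event, action) -> internal job type, mirroring the routing rules above.
ROUTES = {
    ("pull_request", "opened"): JobType.PR_REVIEW_JOB,
    ("pull_request", "synchronize"): JobType.PR_REVIEW_JOB,
    ("workflow_run", "completed"): JobType.CI_DIAGNOSIS_JOB,
    ("check_suite", "completed"): JobType.CI_DIAGNOSIS_JOB,
}

# Hypothetical policy: only diagnose runs that actually failed.
FAILED_CONCLUSIONS = {"failure", "timed_out"}


def route(event: str, payload: dict) -> Optional[JobType]:
    """Return the job type to enqueue, or None if the event should be ignored."""
    job = ROUTES.get((event, payload.get("action")))
    if job is JobType.CI_DIAGNOSIS_JOB:
        # workflow_run / check_suite payloads nest the run under a key named after the event.
        conclusion = (payload.get(event) or {}).get("conclusion")
        if conclusion not in FAILED_CONCLUSIONS:
            return None
    return job
```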
The job queue decouples webhook handling from long-running AI processing. This prevents GitHub webhook timeouts and allows retry, backoff, and parallel processing.
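For an MVP, this can be implemented with FastAPI background tasks, which run in the same process after the response is sent and offer no persistence or retries. For production, use a real queue such as RQ (Redis Queue), Celery, Dramatiq, or a cloud-native queue.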
Workers consume jobs from the queue and execute business workflows. A worker loads the required GitHub context, invokes the orchestrator, and persists execution status.
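The retry-and-backoff behavior a worker needs can be sketched with the standard library's `queue.Queue` standing in for a real broker. `Job`, `run_worker`, and the `record_status` callback are hypothetical names; for simplicity the sketch sleeps in-line before re-enqueueing, whereas production queues typically schedule a delayed retry instead of blocking the worker.

```python
import queue
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    job_id: str
    job_type: str
    payload: dict
    attempts: int = 0


def run_worker(
    jobs: "queue.Queue[Job]",
    handler: Callable[[Job], None],
    record_status: Callable[[str, str], None],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Drain the queue, retrying failed jobs with exponential backoff."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # a long-running worker would block on jobs.get() instead
        job.attempts += 1
        try:
            handler(job)  # e.g. load GitHub context and invoke the orchestrator
            record_status(job.job_id, "completed")
        except Exception:
            if job.attempts >= max_attempts:
                record_status(job.job_id, "failed")
            else:
                record_status(job.job_id, "retrying")
                sleep(base_delay * 2 ** (job.attempts - 1))  # 1s, 2s, 4s, ...
                jobs.put(job)
        finally:
            jobs.task_done()
```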
The orchestrator coordinates specialized agents. It decides which agents should run based on the job type and repository context.
Example responsibilities:
- Decide whether the event requires PR review or CI diagnosis
- Fetch required context through services
- Call one or more agents
- Merge agent results
- Send final result to the output service
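A minimal sketch of that coordination, assuming agents are plain callables and that context fetching and publishing are injected as functions. `Orchestrator`, `AgentResult`, `fetch_context`, and `publish` are hypothetical names standing in for the orchestrator, the diff and CI log services, and the output service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentResult:
    agent: str
    summary: str
    findings: List[str] = field(default_factory=list)


# An agent is modeled here as a plain function from context to result.
Agent = Callable[[dict], AgentResult]


class Orchestrator:
    def __init__(
        self,
        plans: Dict[str, List[Agent]],
        fetch_context: Callable[[str, dict], dict],
        publish: Callable[[List[AgentResult]], None],
    ) -> None:
        self.plans = plans                  # job type -> agents to run, in order
        self.fetch_context = fetch_context  # stands in for the diff / CI log services
        self.publish = publish              # stands in for the output service

    def run(self, job_type: str, payload: dict) -> List[AgentResult]:
        agents = self.plans.get(job_type)
        if not agents:
            raise ValueError(f"no agents configured for job type {job_type!r}")
        context = self.fetch_context(job_type, payload)
        # Agents run sequentially on the same shared context; results keep plan order.
        results = [agent(context) for agent in agents]
        self.publish(results)
        return results
```

Running agents sequentially keeps the sketch simple. Independent agents such as summary and code review could also run concurrently, since neither depends on the other's output.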
The system can contain multiple task-specific agents:
- PR Summary Agent: Generates a high-level summary of the pull request.
- Code Review Agent: Reviews diffs and produces actionable comments.
- CI Root Cause Agent: Analyzes failed workflow logs and explains the likely cause.
- Fix Suggestion Agent: Suggests code or configuration changes to fix issues.
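One way to keep agents interchangeable is a small shared interface that returns validated, structured findings rather than free text. In this sketch the `llm` argument is a hypothetical `prompt -> response text` function standing in for whatever provider SDK is used. The JSON shape and the severity levels are assumptions, not a fixed contract.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


@dataclass(frozen=True)
class Finding:
    path: str
    line: Optional[int]
    severity: str
    message: str


class Agent(Protocol):
    name: str

    def run(self, context: dict) -> List[Finding]: ...


ALLOWED_SEVERITIES = {"info", "warning", "error"}  # assumed severity levels


class CodeReviewAgent:
    name = "code_review"

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm  # hypothetical: any function prompt -> model response text

    def run(self, context: dict) -> List[Finding]:
        prompt = (
            "Review this diff. Respond only with a JSON list of objects with keys "
            "path, line, severity (info|warning|error), message.\n\n" + context["diff"]
        )
        try:
            items = json.loads(self.llm(prompt))
        except json.JSONDecodeError:
            return []  # unparseable model output is treated as "no findings"
        if not isinstance(items, list):
            return []
        findings = []
        for item in items:
            # Drop any entry that does not match the expected schema.
            if not isinstance(item, dict) or item.get("severity") not in ALLOWED_SEVERITIES:
                continue
            if not isinstance(item.get("path"), str) or not isinstance(item.get("message"), str):
                continue
            line = item.get("line")
            if line is not None and (isinstance(line, bool) or not isinstance(line, int)):
                continue
            findings.append(Finding(item["path"], line, item["severity"], item["message"]))
        return findings
```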
The GitHub API client wraps GitHub REST or GraphQL API calls. It is responsible for:
- Generating installation tokens
- Fetching pull request diffs
- Fetching workflow runs and logs
- Creating PR comments
- Creating inline review comments
- Creating or updating check runs
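A minimal sketch of the REST side of the client, built on `urllib.request` so the requests can be inspected without sending them. The endpoint paths, the `application/vnd.github.diff` media type, and the `X-GitHub-Api-Version` header are GitHub REST conventions; the helper function names are hypothetical, and token acquisition is left out.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def github_request(
    method: str,
    path: str,
    token: str,
    body: Optional[dict] = None,
    accept: str = "application/vnd.github+json",
) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub REST request."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(API_ROOT + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", accept)
    req.add_header("X-GitHub-Api-Version", "2022-11-28")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def get_pr_diff(owner: str, repo: str, number: int, token: str) -> urllib.request.Request:
    # The diff media type makes the pull request endpoint return a unified diff instead of JSON.
    return github_request("GET", f"/repos/{owner}/{repo}/pulls/{number}", token,
                          accept="application/vnd.github.diff")


def list_pr_files(owner: str, repo: str, number: int, token: str) -> urllib.request.Request:
    return github_request("GET", f"/repos/{owner}/{repo}/pulls/{number}/files", token)


def create_pr_comment(owner: str, repo: str, number: int, token: str, text: str) -> urllib.request.Request:
    # Top-level PR conversation comments go through the issues comments endpoint.
    return github_request("POST", f"/repos/{owner}/{repo}/issues/{number}/comments", token, {"body": text})


def create_check_run(owner: str, repo: str, token: str, head_sha: str,
                     name: str, conclusion: str, summary: str) -> urllib.request.Request:
    return github_request("POST", f"/repos/{owner}/{repo}/check-runs", token, {
        "name": name,
        "head_sha": head_sha,
        "status": "completed",
        "conclusion": conclusion,  # required when status is "completed"
        "output": {"title": name, "summary": summary},
    })
```

A request is then sent with `urllib.request.urlopen(req)`. A production client would more likely use an HTTP library with connection pooling, plus rate-limit and pagination handling.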
The output service converts agent results into GitHub-facing output.
Possible outputs include:
- Pull request summary comment
- Inline code review comments
- CI failure diagnosis comment
- GitHub Check Run result
- Optional auto-fix pull request in future versions
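To avoid posting a new summary on every `synchronize` event, the output service can tag its comment with a hidden HTML marker and edit that comment on later runs (via `PATCH /repos/{owner}/{repo}/issues/comments/{comment_id}`) instead of creating another. The marker string and function names below are hypothetical.

```python
from typing import Iterable, List, Optional

MARKER = "<!-- ai-review:summary -->"  # hypothetical marker; invisible in rendered Markdown


def render_summary_comment(summary: str, findings: List[str], commit_sha: str) -> str:
    """Render a Markdown PR comment that can be found and updated later."""
    lines = [MARKER, "## AI PR Summary", "", summary.strip()]
    if findings:
        lines += ["", "### Notable findings"]
        lines += [f"- {finding}" for finding in findings]
    lines += ["", f"_Generated for commit `{commit_sha[:7]}`._"]
    return "\n".join(lines)


def find_existing_comment(comments: Iterable[dict]) -> Optional[int]:
    """Return the ID of a previously posted summary comment, if any."""
    for comment in comments:
        if MARKER in (comment.get("body") or ""):
            return comment["id"]
    return None
```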
The database stores persistent system state.
Typical data includes:
- GitHub installation ID
- Repository ID
- Pull request number
- Commit SHA
- Job status
- Processed webhook delivery IDs
- Generated review results
- GitHub comment IDs
- User or organization configuration
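A possible relational layout for part of this state, sketched with SQLite so it runs without a server. The table and column names are assumptions. The `UNIQUE (repository_id, commit_sha, job_type)` constraint prevents re-running the same analysis for the same commit. `INSERT OR IGNORE` is SQLite syntax; PostgreSQL would use `ON CONFLICT DO NOTHING`.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    installation_id INTEGER NOT NULL,
    repository_id   INTEGER NOT NULL,
    pr_number       INTEGER,               -- NULL for CI runs not tied to a PR
    commit_sha      TEXT    NOT NULL,
    job_type        TEXT    NOT NULL,      -- PR_REVIEW_JOB or CI_DIAGNOSIS_JOB
    status          TEXT    NOT NULL DEFAULT 'queued',
    UNIQUE (repository_id, commit_sha, job_type)
);
CREATE TABLE IF NOT EXISTS review_results (
    job_id            INTEGER NOT NULL REFERENCES jobs(id),
    result_json       TEXT    NOT NULL,
    github_comment_id INTEGER              -- lets the output service edit its comment later
);
CREATE TABLE IF NOT EXISTS repo_config (
    repository_id INTEGER PRIMARY KEY,
    config_json   TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def create_job(conn: sqlite3.Connection, installation_id: int, repository_id: int,
               pr_number: Optional[int], commit_sha: str, job_type: str) -> Optional[int]:
    """Insert a job; return its ID, or None if this commit already has this job type."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO jobs "
            "(installation_id, repository_id, pr_number, commit_sha, job_type) "
            "VALUES (?, ?, ?, ?, ?)",
            (installation_id, repository_id, pr_number, commit_sha, job_type),
        )
    return cur.lastrowid if cur.rowcount == 1 else None


def set_status(conn: sqlite3.Connection, job_id: int, status: str) -> None:
    with conn:
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
```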
```mermaid
sequenceDiagram
participant Dev as Developer
participant GH as GitHub
participant App as GitHub App
participant WH as Webhook Handler
participant Router as Event Router
participant Queue as Job Queue
participant Worker as Worker
participant Orch as Agent Orchestrator
participant Diff as Diff Service
participant Agent as PR Review Agents
participant LLM as LLM Provider
participant Output as Output Service
participant API as GitHub API
Dev->>GH: Open or update pull request
GH->>App: Emit pull_request event
App->>WH: POST webhook payload
WH->>WH: Verify signature
WH->>WH: Extract repo, PR number, installation ID, commit SHA
WH->>Router: Send normalized event
Router->>Queue: Enqueue PR_REVIEW_JOB
WH-->>App: Return 200 OK
Queue->>Worker: Deliver PR_REVIEW_JOB
Worker->>Orch: Start PR review workflow
Orch->>Diff: Fetch PR diff and changed files
Diff->>API: Request PR files / diff
API->>GH: Fetch repository data
GH-->>API: Return diff data
API-->>Diff: Return normalized diff
Diff-->>Orch: Return review context
Orch->>Agent: Run summary and code review agents
Agent->>LLM: Analyze diff and generate review
LLM-->>Agent: Return review result
Agent-->>Orch: Return structured findings
Orch->>Output: Publish PR review result
Output->>API: Create PR summary comment and optional inline comments
API->>GH: Write comments to pull request
GH-->>API: Confirm comments created
API-->>Output: Return comment metadata
Output-->>Orch: Output completed
Orch-->>Worker: Mark job completed
```
```mermaid
sequenceDiagram
participant Dev as Developer
participant GH as GitHub
participant App as GitHub App
participant WH as Webhook Handler
participant Router as Event Router
participant Queue as Job Queue
participant Worker as Worker
participant Orch as Agent Orchestrator
participant Logs as CI Log Service
participant Agent as CI Root Cause Agent
participant LLM as LLM Provider
participant Output as Output Service
participant API as GitHub API
Dev->>GH: Push commit or update pull request
GH->>GH: Run GitHub Actions workflow
GH->>App: Emit workflow_run.completed event
App->>WH: POST webhook payload
WH->>WH: Verify signature
WH->>WH: Extract repo, workflow run ID, conclusion, installation ID
WH->>Router: Send normalized event
alt Workflow failed
Router->>Queue: Enqueue CI_DIAGNOSIS_JOB
else Workflow succeeded
Router-->>WH: Ignore or record event
end
WH-->>App: Return 200 OK
Queue->>Worker: Deliver CI_DIAGNOSIS_JOB
Worker->>Orch: Start CI diagnosis workflow
Orch->>Logs: Fetch failed jobs and logs
Logs->>API: Request workflow jobs and logs
API->>GH: Fetch job logs
GH-->>API: Return logs
API-->>Logs: Return raw logs
Logs->>Logs: Extract error sections and relevant context
Logs-->>Orch: Return condensed failure context
Orch->>Agent: Run CI root cause analysis
Agent->>LLM: Analyze logs and infer likely cause
LLM-->>Agent: Return diagnosis and suggested fix
Agent-->>Orch: Return structured diagnosis
Orch->>Output: Publish CI diagnosis result
Output->>API: Create PR comment or Check Run annotation
API->>GH: Write diagnosis result
GH-->>API: Confirm output created
API-->>Output: Return output metadata
Output-->>Orch: Output completed
Orch-->>Worker: Mark job completed
```
The first MVP should focus on the smallest complete feedback loop:
pull_request.opened → webhook → handler → queue → orchestrator → PR summary agent → GitHub comment
Recommended MVP features:
- Receive GitHub pull request webhook
- Verify webhook signature
- Fetch pull request diff
- Generate a PR summary using an LLM
- Post one summary comment back to the PR
- Store the processed webhook delivery ID to avoid duplicate processing
Avoid implementing CI diagnosis, inline comments, auto-fix commits, and complex multi-agent workflows in the first version. Add those after the basic GitHub integration loop is reliable.
GitHub expects webhook receivers to respond with a 2XX status within 10 seconds and treats slower responses as failed deliveries. The webhook handler should therefore enqueue work and return immediately instead of running AI analysis synchronously.
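The same webhook delivery can arrive more than once, for example when a failed delivery is redelivered manually or through the REST API. The system should store delivery IDs (the `X-GitHub-Delivery` header) and skip events it has already processed.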
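The delivery-ID check can lean on a database uniqueness constraint, so that "check" and "record" happen in one atomic insert rather than a race-prone select-then-insert. This SQLite sketch uses a hypothetical `webhook_deliveries` table. Because the ID is claimed before processing, a crash right after claiming would drop that event; recording the ID together with the enqueued job narrows that window.

```python
import sqlite3


def init(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS webhook_deliveries (delivery_id TEXT PRIMARY KEY)"
    )


def claim_delivery(conn: sqlite3.Connection, delivery_id: str) -> bool:
    """Return True the first time a delivery ID is seen, False for a repeat."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO webhook_deliveries (delivery_id) VALUES (?)", (delivery_id,)
            )
        return True
    except sqlite3.IntegrityError:  # primary-key violation: already processed
        return False
```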
GitHub App installation tokens are short-lived: they expire after one hour. The GitHub API client should cache them per installation and refresh them shortly before they expire.
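A minimal per-installation token cache that refreshes a few minutes before expiry. `fetch` is a hypothetical callable returning `(token, expires_at_epoch_seconds)`. In practice it would sign a JWT with the App's private key and call `POST /app/installations/{installation_id}/access_tokens`, whose response includes `token` and an ISO 8601 `expires_at` that would need converting to epoch seconds. The 300-second margin is an arbitrary choice.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class InstallationTokenCache:
    def __init__(
        self,
        fetch: Callable[[int], Tuple[str, float]],
        refresh_margin: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch            # hypothetical: installation_id -> (token, expires_at)
        self._margin = refresh_margin  # refresh when fewer than this many seconds remain
        self._clock = clock
        self._tokens: Dict[int, Tuple[str, float]] = {}
        self._lock = threading.Lock()  # simple, but serializes fetches across installations

    def get(self, installation_id: int) -> str:
        with self._lock:
            cached = self._tokens.get(installation_id)
            if cached and cached[1] - self._clock() > self._margin:
                return cached[0]
            token, expires_at = self._fetch(installation_id)
            self._tokens[installation_id] = (token, expires_at)
            return token
```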
Pull request diffs and CI logs can be large. The system should support:
- File filtering
- Diff chunking
- Log truncation
- Error extraction
- Token budget control
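These techniques can be sketched with a few small functions: per-file diff splitting, glob-based file filtering, error-window extraction from logs with a tail fallback, and greedy packing under a token budget. The ignore patterns, the error regex, and the roughly-4-characters-per-token estimate are all assumptions. A real system would use the LLM provider's tokenizer for accurate counts.

```python
import fnmatch
import re
from typing import List

IGNORED = ["*.lock", "package-lock.json", "*.min.js", "dist/*"]  # assumed defaults
ERROR_RE = re.compile(r"error|failed|exception|traceback", re.IGNORECASE)


def split_diff_by_file(diff: str) -> List[str]:
    """Split a unified diff into one chunk per file."""
    parts = re.split(r"(?m)^(?=diff --git )", diff)
    return [part for part in parts if part.strip()]


def keep_file(path: str) -> bool:
    return not any(fnmatch.fnmatch(path, pattern) for pattern in IGNORED)


def approx_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return max(1, len(text) // 4)


def extract_error_context(log: str, before: int = 5, after: int = 5, tail: int = 50) -> str:
    """Keep a window of lines around each error-looking line; fall back to the log tail."""
    lines = log.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_RE.search(line):
            keep.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    if not keep:
        return "\n".join(lines[-tail:])
    return "\n".join(lines[i] for i in sorted(keep))


def pack_chunks(chunks: List[str], budget: int) -> List[str]:
    """Keep chunks in order, stopping at the first one that would exceed the budget."""
    packed, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        packed.append(chunk)
        used += cost
    return packed
```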
Agent output should be structured before being posted to GitHub. The output service should validate formatting and avoid noisy or duplicate comments.
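A sketch of the noise and duplicate filtering step for inline comments. Comments already posted are skipped, comments on lines outside the diff are dropped (GitHub typically rejects review comments anchored outside the diff), and the total is capped. `InlineComment`, `select_comments`, and the default cap of 10 are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class InlineComment:
    path: str
    line: int
    body: str


Key = Tuple[str, int, str]


def select_comments(
    candidates: Iterable[InlineComment],
    already_posted: Set[Key],
    changed_lines: Dict[str, Set[int]],
    max_comments: int = 10,
) -> List[InlineComment]:
    """Filter agent output down to new, in-diff, non-empty comments."""
    selected: List[InlineComment] = []
    seen = set(already_posted)
    for comment in candidates:
        body = comment.body.strip()
        key = (comment.path, comment.line, body)
        if not body or key in seen:
            continue  # empty or duplicate (this run or a previous one)
        if comment.line not in changed_lines.get(comment.path, set()):
            continue  # outside the diff
        seen.add(key)
        selected.append(comment)
        if len(selected) >= max_comments:
            break
    return selected
```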
The system should:
- Verify webhook signatures
- Use least-privilege GitHub App permissions
- Avoid logging secrets or private repository content unnecessarily
- Encrypt sensitive credentials
- Separate user-facing API authentication from GitHub App authentication
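As one way to avoid logging secrets, a `logging.Filter` attached to handlers can redact token-like strings before records are written. This sketch assumes tokens follow GitHub's documented prefixes (`ghs_` for installation tokens, `ghp_` for classic personal access tokens, and similar), plus any value after `Bearer`. It does not cover every secret format, such as fine-grained personal access tokens.

```python
import logging
import re

# Assumed patterns: GitHub-style token prefixes, and anything following "Bearer ".
REDACTIONS = [
    (re.compile(r"gh[pousr]_[A-Za-z0-9_]+"), "[REDACTED]"),
    (re.compile(r"(?i)(bearer\s+)\S+"), r"\1[REDACTED]"),
]


class RedactSecretsFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # apply %-formatting first so args are covered too
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, None
        return True  # never drop the record, only rewrite it
```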
```text
app/
├── main.py
├── config.py
├── api/
│   ├── routes.py
│   └── endpoints/
│       ├── github_webhook.py
│       └── health.py
├── github/
│   ├── auth.py
│   ├── client.py
│   ├── webhooks.py
│   └── event_router.py
├── jobs/
│   ├── queue.py
│   ├── pr_review_job.py
│   └── ci_diagnosis_job.py
├── agents/
│   ├── orchestrator.py
│   ├── pr_summary_agent.py
│   ├── code_review_agent.py
│   ├── ci_root_cause_agent.py
│   └── fix_suggestion_agent.py
├── services/
│   ├── diff_service.py
│   ├── ci_log_service.py
│   ├── output_service.py
│   └── review_service.py
├── storage/
│   ├── db.py
│   └── repositories.py
└── utils/
    ├── logging.py
    └── errors.py
```