CodeHawk

AI-assisted code review for GitHub pull requests.

CodeHawk receives GitHub App webhook events, fetches pull request context, runs repository indexing and static analysis jobs, and posts reviewer-focused summaries or inline code review comments back to GitHub.

Chinese documentation: README.zh-CN.md

Features

  • GitHub App webhook receiver with signature verification.
  • Pull request summary generation when a PR is opened or updated with new commits (the opened and synchronize webhook actions).
  • AI code review with structured review output and inline comments.
  • Static analysis pipeline using Ruff and Bandit.
  • Repository archive storage through MinIO.
  • Python code chunk indexing with PostgreSQL persistence.
  • In development: local CLI for reviewing staged, unstaged, or all uncommitted changes.
  • Docker Compose environment for backend, MinIO, and PostgreSQL.
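To make the "Python code chunk indexing" feature more concrete, here is a minimal sketch of one way to split a module into chunks with the standard library `ast` module. It is an illustration only: the `CodeChunk` record and `chunk_python_source` function are hypothetical names, and CodeHawk's actual chunker and PostgreSQL schema may work differently.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeChunk:
    """Hypothetical chunk record; the real index schema may differ."""
    name: str
    kind: str  # "function" or "class"
    start_line: int
    end_line: int
    source: str


def chunk_python_source(source: str) -> list[CodeChunk]:
    """Return one chunk per top-level function or class in a module."""
    tree = ast.parse(source)
    chunks: list[CodeChunk] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # imports, assignments, etc. are skipped in this sketch
        chunks.append(
            CodeChunk(
                name=node.name,
                kind=kind,
                start_line=node.lineno,
                end_line=node.end_lineno,
                source=ast.get_source_segment(source, node) or "",
            )
        )
    return chunks
```

Each chunk carries its line range, so a row per chunk could be persisted and later matched against the lines touched by a pull request.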

Tech Stack

  • Backend: Python 3.14, FastAPI, httpx, Pydantic, structlog.
  • AI agents: OpenAI Agents SDK.
  • Static analysis: Ruff, Bandit.
  • Storage: PostgreSQL, MinIO.
  • Container runtime: Docker, Docker Compose.
  • CLI: Go, Cobra.
  • Testing: pytest.
  • Package management: uv for Python, Go modules for CLI.

Architecture

GitHub Pull Request Event
        |
        v
FastAPI Webhook Endpoint
        |
        v
GithubHandler
        |
        +--> PR Summary Service
        |       +--> PR Summary Agent
        |       +--> GitHub Issue Comment
        |
        +--> Code Indexing Service
        |       +--> Git Fetcher Container --> MinIO repo archive
        |       +--> SAT Runner Container --> MinIO SAT report
        |       +--> Code Chunker + Reference Parser --> PostgreSQL code index
        |       +--> Embedding Builder --> PostgreSQL pgvector
        |       +--> Convention Indexing --> Repository Convention KB
        |
        +--> Code Review Service
                +--> Workflow Run/Step State --> PostgreSQL
                +--> SAT Summary JSON Agent
                +--> Code Review Planning Agent
                +--> Structural Evidence Agent --> Code index queries
                +--> Convention Evidence Agent --> Convention KB + semantic search
                +--> Evidence Validator
                +--> Code Review Writer Agent
                +--> Patch-line Review Output Validator
                +--> GitHub PR Review

Key backend modules:

  • codehawk/api: FastAPI routing and webhook endpoints.
  • codehawk/handlers: event routing from GitHub webhook payloads.
  • codehawk/services: workflow orchestration for PR summaries, reviews, and indexing.
  • codehawk/agents: PR summary, SAT summary, planning, evidence, and review writer agents.
  • codehawk/github: GitHub auth, token caching, and API client.
  • codehawk/minio: object storage URL generation.
  • codehawk/postgresql: database clients and repositories for code index, embeddings, convention KB, and review workflow state.
  • codehawk/code_index: repo fetching, SAT runner, Python code chunking, reference parsing, and embedding generation.
  • codehawk/models: DTOs and normalized context models.
  • codehawk/utils: deterministic validators and OpenAI retry helpers.
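As an illustration of what a deterministic validator such as the patch-line review output validator might check, the sketch below computes which new-file line numbers appear in a unified diff patch and flags inline comments that point elsewhere. The function names and the `{"line": ...}` comment shape are assumptions made for this example, not CodeHawk's actual data model.

```python
import re

# Matches hunk headers such as "@@ -10,2 +12,3 @@" and captures the
# starting line number on the new-file side.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def commentable_lines(patch: str) -> set[int]:
    """Return the new-file line numbers that appear in a unified diff patch."""
    lines: set[int] = set()
    new_line = None  # stays None until the first hunk header is seen
    for raw in patch.splitlines():
        match = HUNK_HEADER.match(raw)
        if match:
            new_line = int(match.group(1))
            continue
        if new_line is None or raw.startswith("\\"):
            continue  # preamble, or "\ No newline at end of file"
        if raw.startswith("-"):
            continue  # removed lines have no new-file line number
        lines.add(new_line)  # added ("+") or context (" ") line
        new_line += 1
    return lines


def invalid_comments(comments: list[dict], patch: str) -> list[dict]:
    """Return the comments whose "line" is not present in the patch."""
    valid = commentable_lines(patch)
    return [c for c in comments if c["line"] not in valid]
```

A check like this needs no model call, so it can reject agent output that references lines outside the diff before anything is posted to GitHub.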

More design context is available in dev_docs/high_level_architecture.md.

Installation

Prerequisites

  • Python 3.14+
  • uv
  • Docker and Docker Compose
  • Go 1.24+ for the CLI
  • A GitHub App with webhook secret and private key
  • An OpenAI API key

Backend setup

  1. Copy the environment template:
    cp .env.example .env
  2. Fill in the required values in .env:
    OPENAI_API_KEY=
    GITHUB_WEBHOOK_SECRET=
    GITHUB_APP_CLIENT_ID=
    GITHUB_PRIVATE_KEY_PATH=
    MINIO_ACCESS_KEY=
    MINIO_SECRET_KEY=
    MINIO_BUCKET=repos
    POSTGRES_DB=codehawk
    POSTGRES_USER=codehawk
    POSTGRES_PASSWORD=codehawk
  3. Install Python dependencies:
    uv sync
  4. Start the local infrastructure and backend:
    docker compose up --build

The backend is exposed through:

http://localhost:${CODEHAWK_BACKEND_PORT}

With the default .env.example value:

http://localhost:8300

Health check:

curl http://localhost:8300/health
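For scripted checks (for example, waiting for the container to come up), the same request can be made from Python. This is a small sketch using only the standard library; it assumes the endpoint answers with HTTP 200 when the backend is healthy, which the README does not state explicitly, and `health_url` and `is_healthy` are illustrative helper names.

```python
import os
import urllib.error
import urllib.request


def health_url(env=None) -> str:
    """Build the health check URL from CODEHAWK_BACKEND_PORT (default 8300)."""
    env = os.environ if env is None else env
    port = env.get("CODEHAWK_BACKEND_PORT", "8300")
    return f"http://localhost:{port}/health"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP response.
        return False
```

Calling `is_healthy(health_url())` returns False while the backend is still starting, so it can be polled in a loop.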

GitHub webhook endpoint:

POST /webhook/github

GitHub App configuration

Configure the GitHub App webhook URL to point to:

https://<your-public-host>/webhook/github

The app needs read access to repository contents and pull requests, and write access to pull request reviews and issue comments. GITHUB_PRIVATE_KEY_PATH in .env must point to the GitHub App private key file; Docker Compose mounts that file as a read-only secret inside the backend container.

CLI

The CLI lives in cli/.

The CLI is still under active development. Commands, configuration fields, and API integration behavior may change before a stable release.

Build it:

cd cli
go build -o codehawk .

Initialize repository config:

./codehawk config init

Review staged changes:

./codehawk review --type staged

Supported review types:

staged, unstaged, all

If CODEHAWK_API_URL is set, the CLI posts the review request to that endpoint. If it is not set, it prints the generated request payload.
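The behavior described above can be modeled in a few lines. The CLI itself is written in Go; this Python sketch only illustrates the logic. The mapping from review type to `git diff` invocation is an assumption about how the changes might be collected, and `diff_command`, `deliver`, and the `post` callback are hypothetical names.

```python
import json

# Assumed mapping; the real CLI may gather changes differently.
DIFF_COMMANDS = {
    "staged": ["git", "diff", "--cached"],  # index vs. HEAD
    "unstaged": ["git", "diff"],            # working tree vs. index
    "all": ["git", "diff", "HEAD"],         # working tree vs. HEAD
}


def diff_command(review_type: str) -> list[str]:
    """Return the git command for a review type, rejecting unknown types."""
    try:
        return DIFF_COMMANDS[review_type]
    except KeyError:
        supported = ", ".join(DIFF_COMMANDS)
        raise ValueError(
            f"unsupported review type {review_type!r}; use one of: {supported}"
        ) from None


def deliver(payload: dict, env: dict, post) -> str:
    """Post the payload if CODEHAWK_API_URL is set, otherwise print it.

    `post` is a caller-supplied function taking (url, body); an empty
    CODEHAWK_API_URL is treated as unset in this sketch.
    """
    body = json.dumps(payload, indent=2)
    url = env.get("CODEHAWK_API_URL")
    if url:
        post(url, body)
        return f"posted to {url}"
    print(body)
    return "printed"
```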

CLI configuration is stored in .codehawk.yaml. See cli/docs/configuration-reference.md.

Development

Run tests:

uv run pytest

Run the backend locally without Docker:

uv run fastapi dev codehawk/main.py

Run static checks manually:

uv run ruff check .
uv run bandit --format=json --quiet -r .
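The JSON report from the Bandit command above can be condensed for quick inspection. The sketch below counts findings by severity, relying on the `results` list and `issue_severity` field of Bandit's JSON output; `summarize_bandit` is an illustrative helper, not part of CodeHawk.

```python
import json
from collections import Counter


def summarize_bandit(report_json: str) -> dict[str, int]:
    """Count Bandit findings by severity from a JSON report string."""
    report = json.loads(report_json)
    counts = Counter(
        result.get("issue_severity", "UNDEFINED")
        for result in report.get("results", [])
    )
    return dict(counts)
```

For example, redirect the Bandit output to a file and pass its contents to `summarize_bandit` to get a mapping such as severity name to number of findings.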

Environment Variables

Important variables:

| Variable | Purpose |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key for agents. |
| GITHUB_WEBHOOK_SECRET | Secret used to verify GitHub webhook signatures. |
| GITHUB_APP_CLIENT_ID | GitHub App client ID. |
| GITHUB_PRIVATE_KEY_PATH | Local path to the GitHub App private key. |
| MINIO_ENDPOINT | MinIO API endpoint used by the backend container. |
| MINIO_ACCESS_KEY | MinIO access key. |
| MINIO_SECRET_KEY | MinIO secret key. |
| MINIO_BUCKET | Bucket for repo archives and SAT reports. |
| POSTGRES_HOST | PostgreSQL hostname. |
| POSTGRES_DB | PostgreSQL database name. |
| POSTGRES_USER | PostgreSQL user. |
| POSTGRES_PASSWORD | PostgreSQL password. |

See .env.example for the full list.
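Before starting the stack, it can help to confirm that no required value was left blank. The following is a minimal sketch of such a check. It treats the variables listed in the backend setup step as required, which is an assumption, and it uses a deliberately simple `KEY=VALUE` parser rather than a full dotenv implementation. `parse_env` and `missing_keys` are hypothetical helpers.

```python
# Assumption: the keys shown in the backend setup step are all required.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "GITHUB_WEBHOOK_SECRET",
    "GITHUB_APP_CLIENT_ID",
    "GITHUB_PRIVATE_KEY_PATH",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "MINIO_BUCKET",
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Reading `.env` and passing its contents to `missing_keys` yields an empty list once every required value is filled in.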

Current Status

CodeHawk is under active development. The current implementation focuses on GitHub PR summaries, static analysis, code indexing, and AI-assisted PR review workflows.