After making code changes, always run:
- mise run format - Format Go code using goimports and gofumpt
- mise run test - Run all tests to ensure nothing is broken

- mise run build - Build the proxy binary for local use
- mise run run - Run the proxy locally on port 9877
- mise run build-worker - Build for Cloudflare Workers deployment
- mise run wrangler-dev - Run local development server with Wrangler

- mise run test - Run all tests
- go test ./internal/server -v - Run specific package tests with verbose output
This is a Go proxy server that transforms standard Gemini API requests into Google's internal CloudCode format, which is used by the Gemini Code Assist API.
- server.go - Main HTTP server with routing and request handling
- admin_middleware.go - Admin API middleware (protected by ADMIN_API_KEY)
- chat_completions_handler.go - OpenAI-compatible chat completions endpoint
- stream_generate_content_handler.go - Gemini streaming/non-streaming content endpoints
- models_handler.go - OpenAI-style models listing/details endpoint
- gemini_helpers.go - Shared helpers (model normalization, path parsing, SSE unwrap)
- http_client*.go - HTTP client abstractions (separate Workers vs default implementations)
- provider.go - Interface for credential management
- file_provider.go - File-based credentials (local development)
- cloudflare_kv_provider.go - KV-based credentials (Workers deployment)
- Auto-handles OAuth token refresh when expired
- env.go - Environment variable access for standard Go
- env_workers.go - Environment variable access for Workers runtime
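
The credential provider abstraction listed above (one interface, with file-based and KV-based backends) can be pictured with a minimal sketch. The interface shape, the `Credentials` fields, and the in-memory `MemoryProvider` are all assumptions made for illustration; the real definitions live in provider.go and may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Credentials holds the OAuth tokens the proxy needs.
// The field names are illustrative, not taken from the codebase.
type Credentials struct {
	AccessToken  string
	RefreshToken string
}

// Provider is a hypothetical storage-agnostic interface. A file-backed
// and a KV-backed implementation would both satisfy it.
type Provider interface {
	Load() (*Credentials, error)
	Save(c *Credentials) error
}

// ErrNoCredentials is returned when nothing has been stored yet.
var ErrNoCredentials = errors.New("no credentials stored")

// MemoryProvider is an in-memory stand-in used only for this sketch.
type MemoryProvider struct {
	mu    sync.Mutex
	creds *Credentials
}

// Load returns a copy of the stored credentials.
func (p *MemoryProvider) Load() (*Credentials, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.creds == nil {
		return nil, ErrNoCredentials
	}
	c := *p.creds
	return &c, nil
}

// Save stores a copy so later caller mutations do not leak in.
func (p *MemoryProvider) Save(c *Credentials) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	cp := *c
	p.creds = &cp
	return nil
}

func main() {
	var p Provider = &MemoryProvider{}
	if _, err := p.Load(); err != nil {
		fmt.Println("before save:", err)
	}
	if err := p.Save(&Credentials{AccessToken: "a", RefreshToken: "r"}); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	c, _ := p.Load()
	fmt.Println("after save:", c.AccessToken)
}
```

Code written against an interface like this does not need to know which deployment mode it is running in; only the entry point chooses the backend.
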
The codebase supports two deployment modes:
- Local/Traditional (cmd/gemini-code-assist-proxy/) - Uses FileProvider for credentials
- Cloudflare Workers (cmd/gemini-proxy-worker/) - Uses CloudflareKVProvider for credentials
- URL Rewriting: /v1beta/models/MODEL:ACTION → /v1internal:ACTION
- Model Normalization: Any model containing "pro" → "gemini-2.5-pro", "flash" → "gemini-2.5-flash"
- Request Wrapping: Standard Gemini requests wrapped in CloudCode format with project ID
- Response Unwrapping: CloudCode responses unwrapped from "response" field
- SSE Streaming: Real-time transformation of Server-Sent Events for streaming responses
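
The transformation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in gemini_helpers.go: the function names are hypothetical, the envelope field names `model`, `project`, and `request` are assumed (the notes only confirm that a project ID is included and that responses arrive under `response`), and a model matching neither "pro" nor "flash" is passed through unchanged.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// rewritePath maps /v1beta/models/MODEL:ACTION to /v1internal:ACTION
// and returns MODEL separately so it can be placed in the request body.
func rewritePath(path string) (newPath, model string, err error) {
	const prefix = "/v1beta/models/"
	if !strings.HasPrefix(path, prefix) {
		return "", "", errors.New("unsupported path")
	}
	rest := strings.TrimPrefix(path, prefix)
	i := strings.LastIndex(rest, ":")
	if i <= 0 || i == len(rest)-1 {
		return "", "", errors.New("path must look like MODEL:ACTION")
	}
	return "/v1internal:" + rest[i+1:], rest[:i], nil
}

// normalizeModel applies the substring rule from the notes.
// Checking "pro" before "flash" is an assumption of this sketch.
func normalizeModel(model string) string {
	switch {
	case strings.Contains(model, "pro"):
		return "gemini-2.5-pro"
	case strings.Contains(model, "flash"):
		return "gemini-2.5-flash"
	default:
		return model
	}
}

// envelope is the assumed wrapper shape; field names are illustrative.
type envelope struct {
	Model   string          `json:"model"`
	Project string          `json:"project"`
	Request json.RawMessage `json:"request"`
}

// wrapRequest embeds a standard Gemini request body in the envelope.
func wrapRequest(body []byte, model, project string) ([]byte, error) {
	if !json.Valid(body) {
		return nil, errors.New("request body is not valid JSON")
	}
	return json.Marshal(envelope{Model: model, Project: project, Request: body})
}

// unwrapResponse extracts the payload stored under "response".
func unwrapResponse(body []byte) ([]byte, error) {
	var outer struct {
		Response json.RawMessage `json:"response"`
	}
	if err := json.Unmarshal(body, &outer); err != nil {
		return nil, err
	}
	if outer.Response == nil {
		return nil, errors.New(`missing "response" field`)
	}
	return outer.Response, nil
}

func main() {
	path, model, err := rewritePath("/v1beta/models/gemini-1.5-pro:generateContent")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	wrapped, err := wrapRequest([]byte(`{"contents":[]}`), normalizeModel(model), "my-project")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(path)
	fmt.Println(string(wrapped))
}
```

For streaming, the same unwrapping would be applied to the JSON payload of each SSE `data:` line rather than to the whole body.
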
- Uses OAuth credentials (access_token, refresh_token) to authenticate with CloudCode API
- Automatically refreshes expired tokens using refresh_token
- For Workers: Admin API allows secure credential upload/management
- Supports either an environment-provided project ID or auto-discovery via the CloudCode API
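
The refresh-on-expiry behavior described above could look something like the sketch below. The `token` type, the `refreshFunc` callback, and `validToken` are hypothetical names; the actual OAuth exchange is deliberately left behind the callback because its details are not given in these notes.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// token is an illustrative credential record with an expiry time.
type token struct {
	AccessToken  string
	RefreshToken string
	Expiry       time.Time
}

// refreshFunc exchanges a refresh token for a new token. In a real
// deployment this would call the OAuth token endpoint.
type refreshFunc func(refreshToken string) (token, error)

// validToken returns tok unchanged while it is still valid at now,
// and otherwise obtains a replacement through refresh.
func validToken(tok token, now time.Time, refresh refreshFunc) (token, error) {
	if now.Before(tok.Expiry) {
		return tok, nil
	}
	if tok.RefreshToken == "" {
		return token{}, errors.New("token expired and no refresh token available")
	}
	fresh, err := refresh(tok.RefreshToken)
	if err != nil {
		return token{}, fmt.Errorf("refresh failed: %w", err)
	}
	// Keep the old refresh token if the response did not include one.
	if fresh.RefreshToken == "" {
		fresh.RefreshToken = tok.RefreshToken
	}
	return fresh, nil
}

func main() {
	now := time.Now()
	expired := token{AccessToken: "old", RefreshToken: "r", Expiry: now.Add(-time.Minute)}

	// fakeRefresh stands in for the real OAuth exchange.
	fakeRefresh := func(string) (token, error) {
		return token{AccessToken: "new", Expiry: now.Add(time.Hour)}, nil
	}

	tok, err := validToken(expired, now, fakeRefresh)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("access token:", tok.AccessToken)
}
```

A caller would typically persist the refreshed token through the credential provider so that later requests do not repeat the exchange.
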
- Cannot access filesystem - uses CloudflareKVProvider instead of FileProvider
- HTTP client uses Workers-compatible fetch API (github.com/syumai/workers)
- Graceful fallback for missing http.Flusher support in streaming responses
- Admin API required for credential management (no file access)
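
The http.Flusher fallback mentioned above comes down to a type assertion on the response writer. The sketch below shows one way to do it; `writeEvent` and `bufferedWriter` are hypothetical names, and the handler in this repository may structure the check differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// writeEvent writes one SSE event and flushes it when the writer
// supports http.Flusher. When it does not, the event is still written
// and delivery is left to whatever buffering the runtime applies.
// The payload is assumed to be a single line (no embedded newlines).
func writeEvent(w http.ResponseWriter, payload string) (flushed bool, err error) {
	if _, err := fmt.Fprintf(w, "data: %s\n\n", payload); err != nil {
		return false, err
	}
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
		return true, nil
	}
	return false, nil
}

// bufferedWriter is a ResponseWriter without a Flush method, standing
// in for a runtime that does not expose http.Flusher.
type bufferedWriter struct {
	header http.Header
	body   strings.Builder
}

func (b *bufferedWriter) Header() http.Header {
	if b.header == nil {
		b.header = http.Header{}
	}
	return b.header
}

func (b *bufferedWriter) Write(p []byte) (int, error) { return b.body.Write(p) }

func (b *bufferedWriter) WriteHeader(int) {}

func main() {
	// httptest.ResponseRecorder implements http.Flusher.
	rec := httptest.NewRecorder()
	flushed, _ := writeEvent(rec, `{"text":"hi"}`)
	fmt.Println("recorder flushed:", flushed)

	plain := &bufferedWriter{}
	flushed, _ = writeEvent(plain, `{"text":"hi"}`)
	fmt.Println("plain writer flushed:", flushed)
}
```
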
- All logging uses zerolog (internal/logger) with structured logging
- Environment variables handled through internal/env abstraction
- Credential providers implement common interface for different storage backends
- Server supports both regular JSON and SSE streaming responses
- Middleware applied to admin-protected routes (credentials, streaming, chat)
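
As a rough picture of how middleware can guard the admin-protected routes, the sketch below wraps a handler with a key check. It assumes the key is presented as an `Authorization: Bearer <key>` header, which these notes do not confirm; `keyMatches`, `requireAdminKey`, and the `/admin/credentials` path are hypothetical, and admin_middleware.go may work differently.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// keyMatches reports whether authHeader carries adminKey as a bearer
// token. An empty adminKey never matches, so a missing ADMIN_API_KEY
// fails closed.
func keyMatches(authHeader, adminKey string) bool {
	const scheme = "Bearer "
	if adminKey == "" || !strings.HasPrefix(authHeader, scheme) {
		return false
	}
	presented := strings.TrimPrefix(authHeader, scheme)
	// Constant-time comparison avoids leaking how many bytes matched.
	return subtle.ConstantTimeCompare([]byte(presented), []byte(adminKey)) == 1
}

// requireAdminKey rejects requests that do not present the admin key.
func requireAdminKey(adminKey string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !keyMatches(r.Header.Get("Authorization"), adminKey) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	protected := requireAdminKey("secret", http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "ok") },
	))

	for _, auth := range []string{"", "Bearer wrong", "Bearer secret"} {
		req := httptest.NewRequest(http.MethodGet, "/admin/credentials", nil)
		if auth != "" {
			req.Header.Set("Authorization", auth)
		}
		rec := httptest.NewRecorder()
		protected.ServeHTTP(rec, req)
		fmt.Printf("%q -> %d\n", auth, rec.Code)
	}
}
```
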
- IMPORTANT: Request body compression is intentionally disabled. Testing revealed that CloudCode API has severe performance issues with gzip-compressed requests (50+ seconds vs 2.6 seconds without compression).
- The proxy sends all requests uncompressed to ensure optimal streaming performance.
- This was discovered through debugging where direct curl requests to CloudCode API (without compression) were significantly faster than proxy requests with compression.