@@ -13,7 +13,8 @@ The service is built with **FastAPI** and features automatic route discovery bas
1313- **Routing:** Dynamic (endpoints are auto-generated from the dbt `manifest.json`)
1414- **Documentation:** OpenAPI (Swagger UI) & ReDoc (auto-generated)
1515- **Security:** Header-based API Key authentication (`X-API-Key`)
16- - **Rate Limiting:** In-memory throttling per tier (Free/Pro/Unlimited) using `slowapi`
16+ - **Rate Limiting:** Tier-based per-pod in-memory throttling using `slowapi`
17+ - **Observability:** Structured JSON logging (stderr), Prometheus metrics (`/metrics`), Grafana dashboard
1718
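The "per-pod in-memory" wording in the rate-limiting bullet has a practical consequence worth illustrating. The following is a minimal standard-library sketch, not the `slowapi` configuration the service uses: the `InMemoryLimiter` class, the `TIER_LIMITS` values, and the fixed-window strategy are all assumptions made for illustration. Only the tier names come from the README.

```python
import time

# Hypothetical per-window limits; the real tier limits are not stated here.
TIER_LIMITS = {"free": 2, "pro": 5, "unlimited": None}


class InMemoryLimiter:
    """Fixed-window counter held in process memory (one instance per pod)."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._hits = {}  # key -> [window_start, count]

    def allow(self, key, tier):
        limit = TIER_LIMITS[tier]
        if limit is None:  # unlimited tier is never throttled
            return True
        now = self.clock()
        slot = self._hits.get(key)
        if slot is None or now - slot[0] >= self.window:
            # First request for this key, or the previous window expired.
            slot = [now, 0]
            self._hits[key] = slot
        if slot[1] >= limit:
            return False
        slot[1] += 1
        return True
```

Because each pod holds its own counters, two replicas running this sketch would each admit the full per-tier limit, so the effective cluster-wide limit grows with the replica count unless requests for a given key are always routed to the same pod.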
1819---
1920
@@ -23,17 +24,29 @@ The service is built with **FastAPI** and features automatic route discovery bas
2324/cerebro-api
2425├── Dockerfile # Multi-stage Docker build definition
2526├── requirements.txt # Python dependencies
27+ ├── requirements-dev.txt # Dev/test dependencies (pytest, httpx)
2628├── .env.example # Template for environment variables
2729├── api_keys.json # API keys configuration (git-ignored)
2830├── .gitignore # Git ignore rules
29- └── app
30- ├── main.py # App entry point
31- ├── config.py # Settings & Env var loading
32- ├── database.py # ClickHouse client wrapper
33- ├── security.py # Auth & Rate limiting logic
34- ├── manifest.py # Logic to download & parse dbt manifest
35- └── factory.py # ⚙️ The Engine: auto-generates routes
36- ````
31+ ├── app/
32+ │ ├── server.py # Process entrypoint (setup logging, start uvicorn)
33+ │ ├── main.py # FastAPI app, middleware, system endpoints
34+ │ ├── observability.py # JSON logging, Prometheus metrics, middleware
35+ │ ├── config.py # Settings & env var loading
36+ │ ├── database.py # ClickHouse client wrapper
37+ │ ├── security.py # Auth resolution, access enforcement, rate limiting
38+ │ ├── manifest.py # dbt manifest loader with structured logging
39+ │ ├── router_manager.py # Dynamic route lifecycle & background refresh
40+ │ ├── factory.py # Dynamic route generation engine
41+ │ └── api_metadata.py # Endpoint spec parsing & validation
42+ └── tests/
43+ ├── conftest.py # Shared fixtures (mocked DB, manifest, API keys)
44+ ├── test_endpoints.py # /, /health, /metrics tests
45+ ├── test_auth.py # Auth resolution & tier access tests
46+ ├── test_observability.py # JSON formatter, log_event, middleware tests
47+ ├── test_rate_limiting.py # Rate limit enforcement & metrics tests
48+ └── test_manifest.py # Manifest refresh & router_manager tests
49+ ```
3750
3851---
3952
@@ -117,15 +130,30 @@ Create an `api_keys.json` file in your project root:
117130### 5. Run the Server
118131
119132``` bash
120- uvicorn app.main:app --reload
133+ python -m app.server
134+ ```
135+
136+ For development with auto-reload:
137+
138+ ``` bash
139+ uvicorn app.main:app --reload --proxy-headers
121140```
122141
123142The API will be available at:
124143
125144* Root: ` http://127.0.0.1:8000 `
145+ * Health check: ` http://127.0.0.1:8000/health `
146+ * Prometheus metrics: ` http://127.0.0.1:8000/metrics `
126147* Interactive Docs (Swagger UI): ` http://127.0.0.1:8000/docs `
127148* Alternative Docs (ReDoc): ` http://127.0.0.1:8000/redoc `
128149
150+ ### 6. Run Tests
151+
152+ ``` bash
153+ pip install -r requirements-dev.txt
154+ ./venv/bin/pytest tests/
155+ ```
156+
129157---
130158
131159## API Authentication & Access Tiers
@@ -424,6 +452,54 @@ Create separate models for different time granularities:
424452
425453---
426454
455+ ## Observability
456+
457+ The API emits structured JSON logs to stderr and exposes Prometheus metrics at ` /metrics ` .
458+
459+ ### Structured Logging
460+
461+ All log output is JSON, one object per line, with fields: ` timestamp ` , ` level ` , ` logger ` , ` message ` , ` event ` , plus context-specific fields. Logs never contain raw API keys, SQL text, query parameters, or request bodies.
462+
463+ Key log events:
464+
465+ | Event | Description |
466+ | -------| -------------|
467+ | ` http_request ` | Every HTTP request (method, route, status, duration) |
468+ | ` api_access_denied ` | Auth failures (reason, required/provided tier) |
469+ | ` api_rate_limit ` | Rate-limit blocks (tier, identity kind) |
470+ | ` clickhouse_query ` | ClickHouse queries (category, resource, granularity, tier, row count, duration) |
471+ | ` manifest_refresh ` | Manifest reload lifecycle (trigger, status, model count) |
472+ | ` route_install ` | Dynamic route registration (path, model, tier, methods) |
473+
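As a rough illustration of the log shape described above, here is a minimal sketch built on the standard `logging` module. The names `JsonFormatter` and `log_event` mirror terms used in this README, but the bodies below are assumptions and not the actual contents of `app/observability.py`.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Attribute names every LogRecord carries; anything else came in via `extra`.
_RESERVED = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object (illustrative only)."""

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "event": getattr(record, "event", None),
        }
        # Copy context-specific fields passed through `extra`.
        for key, value in record.__dict__.items():
            if key not in _RESERVED and key not in payload:
                payload[key] = value
        return json.dumps(payload, default=str)


def log_event(logger, event, message, **fields):
    """Emit one structured event; callers pass only non-sensitive fields."""
    logger.info(message, extra={"event": event, **fields})


if __name__ == "__main__":
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(JsonFormatter())
    demo = logging.getLogger("cerebro.demo")
    demo.addHandler(handler)
    demo.setLevel(logging.INFO)
    log_event(demo, "http_request", "request completed",
              method="GET", route="/health", status=200, duration_ms=3.2)
```

Keeping sensitive values out of the logs is the caller's responsibility in this sketch: the formatter serializes whatever fields it is given, so API keys and SQL text must never be passed to `log_event`.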
474+ ### Prometheus Metrics
475+
476+ All metrics are prefixed with ` cerebro_api_ ` . Key metric families:
477+
478+ | Metric | Type | Labels |
479+ | --------| ------| --------|
480+ | ` http_requests_total ` | Counter | method, route, status |
481+ | ` http_request_duration_seconds ` | Histogram | method, route |
482+ | ` auth_resolutions_total ` | Counter | required_tier, result |
483+ | ` access_denied_total ` | Counter | required_tier, provided_tier, reason |
484+ | ` rate_limit_decisions_total ` | Counter | tier, result, identity_kind |
485+ | ` dynamic_requests_total ` | Counter | category, resource, granularity, tier, method, status |
486+ | ` dynamic_request_duration_seconds ` | Histogram | category, resource, granularity, tier, method |
487+ | ` clickhouse_query_duration_seconds ` | Histogram | category, resource, granularity, tier, status |
488+ | ` clickhouse_query_errors_total ` | Counter | category, resource, granularity, tier |
489+ | ` clickhouse_rows_returned ` | Histogram | category, resource, granularity, tier |
490+ | ` manifest_refresh_total ` | Counter | trigger, status |
491+ | ` manifest_models_loaded ` | Gauge | — |
492+ | ` dynamic_routes_registered ` | Gauge | — |
493+
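To make the metric-and-label layout concrete, the sketch below renders one counter family in the Prometheus text exposition format using only the standard library. `LabeledCounter` is a hypothetical stand-in written for this example; the service itself presumably relies on a Prometheus client library, and label-value escaping is omitted here for brevity.

```python
PREFIX = "cerebro_api_"  # metric prefix stated in the README


class LabeledCounter:
    """Tiny illustrative counter with a fixed label set (not a real client)."""

    def __init__(self, name, label_names):
        self.name = PREFIX + name
        self.label_names = tuple(label_names)
        self._values = {}  # tuple of label values -> running total

    def inc(self, amount=1.0, **labels):
        if set(labels) != set(self.label_names):
            raise ValueError(
                f"expected labels {self.label_names}, got {tuple(labels)}"
            )
        key = tuple(str(labels[n]) for n in self.label_names)
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self):
        """Return the family as Prometheus text exposition lines."""
        lines = [f"# TYPE {self.name} counter"]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(
                f'{n}="{v}"' for n, v in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests = LabeledCounter("http_requests_total",
                              ["method", "route", "status"])
    requests.inc(method="GET", route="/health", status=200)
    requests.inc(method="GET", route="/health", status=200)
    print(requests.render(), end="")
```

Each distinct combination of label values becomes its own time series, which is why the labels in the table are bounded sets (tier, status, granularity) rather than raw paths or API keys.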
494+ ### Kubernetes Integration
495+
496+ - **PodMonitor** scrapes `/metrics` on port `http`, so Prometheus discovers pods directly
497+ - ` /metrics ` is blocked at the public ALB with a fixed-response 403 rule
498+ - K8s probes use ` /health ` (returns 200 when ClickHouse is reachable, 503 otherwise)
499+ - A dedicated Grafana dashboard (` cerebro-api-observability ` ) provides real-time visibility across traffic, auth, rate limits, dynamic API, ClickHouse, manifest lifecycle, pod resources, and structured logs
500+
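The probe contract for `/health` (200 when ClickHouse is reachable, 503 otherwise) can be sketched independently of any web framework. The function name `health_response`, the `ping` callable, and the response bodies are assumptions for illustration; the real handler lives in `app/main.py` and may differ.

```python
from typing import Callable, Tuple


def health_response(ping: Callable[[], bool]) -> Tuple[int, dict]:
    """Map a ClickHouse reachability check to an HTTP status and body.

    `ping` is a hypothetical callable that returns True when the database
    answers; any exception is treated the same as an unreachable database.
    """
    try:
        reachable = bool(ping())
    except Exception:
        reachable = False
    if reachable:
        return 200, {"status": "ok"}
    return 503, {"status": "unavailable"}
```

Treating exceptions as "unreachable" keeps the probe from returning a 500, so Kubernetes sees a consistent 503 whenever the dependency is down.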
501+ ---
502+
427503## Deployment (Docker)
428504
429505This service is designed to run as a **stateless container** on Kubernetes.
@@ -532,13 +608,16 @@ curl -X POST http://localhost:8000/v1/system/manifest/refresh \
532608
533609| File | Purpose |
534610|------|---------|
535- | `app/main.py` | FastAPI app initialization |
611+ | `app/server.py` | Process entrypoint — sets up logging before app import |
612+ | `app/main.py` | FastAPI app, CORS, middleware, system endpoints (`/`, `/health`, `/metrics`) |
613+ | `app/observability.py` | JSON log formatter, Prometheus metrics, HTTP middleware, observer helpers |
536614| `app/config.py` | Settings & environment loading |
537615| `app/database.py` | ClickHouse client wrapper |
538616| `app/api_metadata.py` | dbt endpoint metadata parsing and validation |
539- | `app/security.py` | Authentication & tier access control |
540- | `app/manifest.py` | dbt manifest loader |
541- | `app/factory.py` | Dynamic route generation engine |
617+ | `app/security.py` | Auth resolution, access enforcement, rate-limit helpers |
618+ | `app/manifest.py` | dbt manifest loader with structured logging and model gauge |
619+ | `app/router_manager.py` | Dynamic route lifecycle, background refresh, manifest refresh metrics |
620+ | `app/factory.py` | Dynamic route generation engine with instrumentation |
542621
543622### Adding Custom Endpoints
544623