This repository contains the API service layer for dbt-cerebro. It exposes data transformed by dbt models (stored in ClickHouse) via a high-performance, metadata-driven REST API.
The service is built with FastAPI and features automatic route discovery based on your dbt manifest. It includes built-in documentation, rate limiting, and API key management.
- Framework: Python 3.11 + FastAPI (Async)
- Database: ClickHouse (via `clickhouse-connect`)
- Routing: Dynamic; endpoints are auto-generated from the dbt `manifest.json`
- Documentation: OpenAPI (Swagger UI) & ReDoc (auto-generated)
- Security: Header-based API Key authentication (`X-API-Key`)
- Rate Limiting: Tier-based per-pod in-memory throttling using `slowapi`
- Observability: Structured JSON logging (stderr), Prometheus metrics (`/metrics`), Grafana dashboard
/cerebro-api
├── Dockerfile # Multi-stage Docker build definition
├── requirements.txt # Python dependencies
├── requirements-dev.txt # Dev/test dependencies (pytest, httpx)
├── .env.example # Template for environment variables
├── api_keys.json # API keys configuration (git-ignored)
├── .gitignore # Git ignore rules
├── app/
│ ├── server.py # Process entrypoint (setup logging, start uvicorn)
│ ├── main.py # FastAPI app, middleware, system endpoints
│ ├── observability.py # JSON logging, Prometheus metrics, middleware
│ ├── config.py # Settings & env var loading
│ ├── database.py # ClickHouse client wrapper
│ ├── security.py # Auth resolution, access enforcement, rate limiting
│ ├── manifest.py # dbt manifest loader with structured logging
│ ├── router_manager.py # Dynamic route lifecycle & background refresh
│ ├── factory.py # Dynamic route generation engine
│ └── api_metadata.py # Endpoint spec parsing & validation
└── tests/
    ├── conftest.py # Shared fixtures (mocked DB, manifest, API keys)
    ├── test_endpoints.py # /, /health, /metrics tests
    ├── test_auth.py # Auth resolution & tier access tests
    ├── test_observability.py # JSON formatter, log_event, middleware tests
    ├── test_rate_limiting.py # Rate limit enforcement & metrics tests
    └── test_manifest.py # Manifest refresh & router_manager tests
Follow these steps to run the API locally without Docker for development or debugging.
- Python 3.10+
- Access to a ClickHouse instance (Local or Cloud)
# Create virtual environment
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install dependencies
pip install -r requirements.txt

Copy the example environment file and configure your ClickHouse credentials:
cp .env.example .env
# Edit .env with your actual credentials
nano .env

Key settings in `.env`:
| Variable | Description | Example |
|---|---|---|
| `CLICKHOUSE_URL` | ClickHouse Cloud hostname | `abc123.eu-central-1.aws.clickhouse.cloud` |
| `CLICKHOUSE_PORT` | ClickHouse port | `8443` |
| `CLICKHOUSE_USER` | Database username | `default` |
| `CLICKHOUSE_PASSWORD` | Database password | `your_password` |
| `CLICKHOUSE_DATABASE` | Database name | `default` |
| `CLICKHOUSE_SECURE` | Use HTTPS | `true` |
| `DBT_MANIFEST_URL` | URL to your live `manifest.json` | `https://gnosischain.github.io/dbt-cerebro/manifest.json` |
| `DBT_MANIFEST_PATH` | Fallback local path | `./manifest.json` |
| `DBT_MANIFEST_REFRESH_ENABLED` | Enable automatic manifest refresh | `true` |
| `DBT_MANIFEST_REFRESH_INTERVAL_SECONDS` | Refresh interval in seconds | `300` |
Create an api_keys.json file in your project root:
{
"sk_live_alice_abc123": {
"user": "alice",
"tier": "tier2",
"org": "Gnosis Core"
},
"sk_live_bob_xyz789": {
"user": "bob",
"tier": "tier1",
"org": "Partner Inc"
},
"sk_live_public_key": {
"user": "public",
"tier": "tier0",
"org": "Public"
},
"sk_live_internal_admin": {
"user": "admin",
"tier": "tier3",
"org": "Gnosis Internal"
}
}
⚠️ Security Note: Add `api_keys.json` to your `.gitignore` file!
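As a rough illustration of how a key file in this shape can be consumed, the sketch below looks up a presented key and returns its record. The function name `resolve_api_key` is hypothetical and is not taken from `app/security.py`; the error strings mirror the 403 responses documented in this README.

```python
import json
from typing import Optional


def resolve_api_key(keys: dict, presented: Optional[str]) -> dict:
    """Return the record for a presented key, or raise PermissionError.

    Hypothetical helper: the real service may structure this differently.
    """
    if not presented:
        raise PermissionError("Missing authentication header: X-API-Key")
    record = keys.get(presented)
    if record is None:
        raise PermissionError("Invalid API Key")
    return record


# Same shape as api_keys.json, inlined here so the example is self-contained.
keys = json.loads(
    '{"sk_live_bob_xyz789": {"user": "bob", "tier": "tier1", "org": "Partner Inc"}}'
)
bob = resolve_api_key(keys, "sk_live_bob_xyz789")
```

In a FastAPI handler the `PermissionError` would typically be translated into an HTTP 403 response.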
python -m app.server

For development with auto-reload:
uvicorn app.main:app --reload --proxy-headers

The API will be available at:
- Root: http://127.0.0.1:8000
- Health check: http://127.0.0.1:8000/health
- Prometheus metrics: http://127.0.0.1:8000/metrics
- Interactive Docs (Swagger UI): http://127.0.0.1:8000/docs
- Alternative Docs (ReDoc): http://127.0.0.1:8000/redoc
pip install -r requirements-dev.txt
./venv/bin/pytest tests/

All requests must include the `X-API-Key` header.
Higher tier users can access all endpoints at or below their tier level.
| Tier | Access Level | Rate Limit | Can Access |
|---|---|---|---|
| `tier0` | Public | 20 req/min | tier0 only |
| `tier1` | Partner | 100 req/min | tier0, tier1 |
| `tier2` | Premium | 500 req/min | tier0, tier1, tier2 |
| `tier3` | Internal | 10,000 req/min | All endpoints |
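The "at or below" rule in the table reduces to an ordered comparison. The following is a minimal sketch, assuming tiers are compared by their position in a fixed list; `TIER_ORDER` and `can_access` are illustrative names, not the service's actual code.

```python
# Tiers in ascending order of privilege, as listed in the table above.
TIER_ORDER = ["tier0", "tier1", "tier2", "tier3"]

# Per-minute rate limits copied from the table above.
RATE_LIMITS_PER_MIN = {"tier0": 20, "tier1": 100, "tier2": 500, "tier3": 10_000}


def can_access(user_tier: str, required_tier: str) -> bool:
    """True when the user's tier is at or above the endpoint's required tier."""
    return TIER_ORDER.index(user_tier) >= TIER_ORDER.index(required_tier)
```

For example, `can_access("tier2", "tier1")` is true, while `can_access("tier1", "tier2")` is false.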
curl -X 'GET' \
'http://localhost:8000/v1/consensus/blob_commitments/daily' \
-H 'accept: application/json' \
-H 'X-API-Key: sk_live_alice_abc123'

Missing API Key (403):
{"detail": "Missing authentication header: X-API-Key"}

Invalid API Key (403):
{"detail": "Invalid API Key"}

Insufficient Tier Access (403):
{"detail": "Access denied. This endpoint requires tier2 access. User 'bob' has tier1 access."}

The API is metadata-driven. You do not need to write Python code to add new endpoints.
A dbt model will be exposed as an API endpoint if it meets both conditions:
- ✅ Model has the `production` tag
- ✅ Model has an `api:` tag defining the resource name
Use dbt tags to control endpoint paths, Swagger UI grouping, and access control:
{{
config(
materialized='view',
tags=["production", "consensus", "tier1", "api:blob_commitments", "granularity:daily"]
)
}}

| Tag | Format | Purpose | Required |
|---|---|---|---|
| `production` | literal | Marks model for API exposure | ✅ Yes |
| Category | `consensus`, `execution`, etc. | Swagger UI section & URL prefix | ✅ Yes |
| Tier | `tier0`, `tier1`, `tier2`, `tier3` | Access control level | No (default: `tier0`) |
| Resource | `api:{resource_name}` | Explicit resource name in URL | ✅ Yes |
| Granularity | `granularity:{period}` | Time dimension suffix in URL | No |
The URL path is built from tags: /{category}/{resource}/{granularity?}
| Tags | Generated Path |
|---|---|
| `["production", "consensus", "api:blob_commitments", "granularity:daily"]` | `/consensus/blob_commitments/daily` |
| `["production", "consensus", "api:blob_commitments", "granularity:latest"]` | `/consensus/blob_commitments/latest` |
| `["production", "execution", "api:transactions"]` | `/execution/transactions` |
| `["production", "financial", "tier2", "api:treasury"]` | `/financial/treasury` |
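The tag-to-path mapping in the table can be sketched as follows. This is an illustration only: `build_path` is a hypothetical name, and the rule used to pick the category (the first tag that is not `production`, a tier, a materialization name, or a `key:value` tag) is an assumption based on the tag tables in this README rather than on `app/factory.py`.

```python
from typing import List, Optional

TIERS = {"tier0", "tier1", "tier2", "tier3"}
# Tags that never become a category (assumed from the tag reference table).
NON_CATEGORY = {"production", "view", "table", "incremental"}


def build_path(tags: List[str]) -> Optional[str]:
    """Return /{category}/{resource}/{granularity?}, or None if not exposed."""
    if "production" not in tags:
        return None
    resource = next((t.split(":", 1)[1] for t in tags if t.startswith("api:")), None)
    if not resource:
        return None
    granularity = next(
        (t.split(":", 1)[1] for t in tags if t.startswith("granularity:")), None
    )
    category = next(
        (t for t in tags if ":" not in t and t not in TIERS and t not in NON_CATEGORY),
        None,
    )
    if category is None:
        return None
    parts = [category, resource] + ([granularity] if granularity else [])
    return "/" + "/".join(parts)
```

Applied to the rows above, `build_path(["production", "financial", "tier2", "api:treasury"])` yields `/financial/treasury`.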
Model: api_consensus_blob_commitments_daily.sql
{{
config(
materialized='view',
tags=["production", "consensus", "tier1", "api:blob_commitments", "granularity:daily"]
)
}}
SELECT
date,
total_blob_commitments AS value
FROM {{ ref('int_consensus_blocks_daily') }}
ORDER BY date

Result:
- Endpoint: `GET /v1/consensus/blob_commitments/daily`
- Swagger Section: Consensus
- Access: `tier1` (Partner and above)
Endpoint behavior is now strict and metadata-only.
- No inferred filters.
- No inferred pagination.
- No inferred sort.
- `granularity:*` tags only affect the URL path segment.
- Swagger route listing is deterministic: grouped by `/{category}/{resource}` then granularity priority (none, `latest`, `daily`, `weekly`, `monthly`, `last_7d`, `last_30d`, `in_ranges`, `all_time`, unknowns last).
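The deterministic listing order described above can be expressed as a sort key. This is a sketch under the assumption that routes are represented as `(category, resource, granularity)` tuples; `route_sort_key` is a hypothetical name, and breaking ties between unknown granularities alphabetically is an added assumption.

```python
from typing import Optional, Tuple

# Priority order quoted from the list above; None means "no granularity".
GRANULARITY_PRIORITY = [
    None, "latest", "daily", "weekly", "monthly",
    "last_7d", "last_30d", "in_ranges", "all_time",
]


def route_sort_key(route: Tuple[str, str, Optional[str]]):
    """Group by /{category}/{resource}, then by granularity priority."""
    category, resource, granularity = route
    try:
        rank = GRANULARITY_PRIORITY.index(granularity)
    except ValueError:
        rank = len(GRANULARITY_PRIORITY)  # unknown granularities sort last
    return (f"/{category}/{resource}", rank, granularity or "")


routes = [
    ("consensus", "blob_commitments", "all_time"),
    ("consensus", "blob_commitments", "hourly"),  # not in the priority list
    ("consensus", "blob_commitments", "latest"),
    ("consensus", "blob_commitments", None),
]
ordered = sorted(routes, key=route_sort_key)
```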
- Legacy endpoint (`production` + `api:*`, no `meta.api`):
  - GET only
  - no declared filters
  - no pagination
  - no sort
  - full result returned
  - any query params return `400`
- Metadata-driven endpoint (`meta.api` present):
  - filters, methods, pagination, sort, and exclusion come only from `meta.api`
  - undeclared query params or body fields return `400`
  - `limit`/`offset` only work when pagination is enabled
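The strictness rules above amount to an allow-list check on incoming field names. A minimal sketch, assuming a hypothetical helper `undeclared_fields`; a non-empty result would correspond to a `400` response.

```python
from typing import Iterable, List


def undeclared_fields(
    supplied: Iterable[str], declared: Iterable[str], pagination_enabled: bool
) -> List[str]:
    """Return supplied field names that the endpoint's metadata does not allow."""
    allowed = set(declared)
    if pagination_enabled:
        # limit/offset are only accepted when pagination is enabled.
        allowed |= {"limit", "offset"}
    return sorted(set(supplied) - allowed)


# A legacy endpoint declares nothing, so every query param is rejected.
legacy_rejected = undeclared_fields(["symbol"], declared=[], pagination_enabled=False)

# A metadata-driven endpoint without pagination rejects `limit`.
meta_rejected = undeclared_fields(
    ["symbol", "limit"], declared=["symbol"], pagination_enabled=False
)
```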
| Field | Type | Default | Description |
|---|---|---|---|
| `methods` | list | `["GET"]` | Allowed HTTP methods (`GET`, `POST`) |
| `allow_unfiltered` | bool | `false` | Allow requests with no declared business filters |
| `require_any_of` | list | `[]` | At least one listed filter name must be present |
| `exclude_from_api` | bool | `false` | Skip route registration entirely so the endpoint is absent from the API/docs and requests return 404 |
| `parameters` | list | `[]` | Declared filter contract |
| `pagination` | object | disabled | Enables `limit` and `offset` |
| `sort` | list | `[]` | Explicit `ORDER BY` list |
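To make the interaction between `allow_unfiltered` and `require_any_of` concrete, here is one plausible reading of the two fields. The helper name `check_filter_policy` and the error wording are hypothetical; how the service actually combines the two checks is not stated in this README.

```python
from typing import Iterable, Optional


def check_filter_policy(
    provided: Iterable[str],
    allow_unfiltered: bool = False,
    require_any_of: Iterable[str] = (),
) -> Optional[str]:
    """Return an error message for a rejected request, or None if acceptable."""
    provided_set = set(provided)
    required = list(require_any_of)
    # require_any_of: at least one of the listed filter names must be present.
    if required and not (provided_set & set(required)):
        return "At least one of these filters is required: " + ", ".join(required)
    # allow_unfiltered=False: a request with no filters at all is rejected.
    if not allow_unfiltered and not provided_set:
        return "At least one filter is required"
    return None
```

With `require_any_of=["symbol", "address"]`, a request that only passes `start_date` would be rejected under this reading, while one that passes `symbol` would be accepted.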
Parameter object fields:
| Key | Required | Description |
|---|---|---|
| `name` | Yes | API field name used by clients |
| `column` | Yes | Final projected model column to filter on |
| `operator` | Yes | One of `=`, `>=`, `<=`, `ILIKE`, `IN` |
| `type` | Yes | One of `string`, `date`, `string_list` |
| `description` | No | OpenAPI description |
| `case` | No | `lower` or `upper` (`string` and `string_list` only) |
| `max_items` | No | Max list size (`string_list` only) |
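A `string_list` parameter with `case` and `max_items` could be normalized roughly as below. This is a sketch, not the service's parser: `normalize_string_list` is a hypothetical name, and treating repeated values and comma-separated values identically is an assumption consistent with the GET examples in this README.

```python
from typing import List, Optional


def normalize_string_list(
    raw_values: List[str], case: Optional[str] = None, max_items: Optional[int] = None
) -> List[str]:
    """Flatten repeated and comma-separated values, then apply case and size rules."""
    items = [
        part.strip()
        for raw in raw_values
        for part in raw.split(",")
        if part.strip()
    ]
    if case == "lower":
        items = [item.lower() for item in items]
    elif case == "upper":
        items = [item.upper() for item in items]
    if max_items is not None and len(items) > max_items:
        raise ValueError(f"too many items: {len(items)} > {max_items}")
    return items


# ?address=0xA&address=0xB and ?address=0xA,0xB normalize to the same list.
repeated = normalize_string_list(["0xA", "0xB"], case="lower")
csv_form = normalize_string_list(["0xA,0xB"], case="lower")
```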
Pagination object fields:
| Key | Required When Enabled | Description |
|---|---|---|
| `enabled` | Yes | Must be `true` to enable pagination |
| `default_limit` | Yes | Default row limit when client omits `limit` |
| `max_limit` | Yes | Hard upper bound for `limit` |
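The relationship between `default_limit` and `max_limit` can be illustrated with a small helper. `resolve_limit` is a hypothetical name, and clamping an oversized `limit` down to `max_limit` is an assumption; the service may instead reject such requests with a `400`.

```python
from typing import Optional


def resolve_limit(requested: Optional[int], default_limit: int, max_limit: int) -> int:
    """Pick the effective row limit for a paginated endpoint."""
    if requested is None:
        return default_limit  # client omitted `limit`
    if requested < 1:
        raise ValueError("limit must be a positive integer")
    # Assumed behavior: clamp to the hard upper bound rather than reject.
    return min(requested, max_limit)
```

With `default_limit=100` and `max_limit=5000`, an omitted limit resolves to 100 and a requested 10000 resolves to 5000 under this assumption.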
Sort object fields:
| Key | Required | Description |
|---|---|---|
| `column` | Yes | Column name present in final `SELECT` |
| `direction` | Yes | `ASC` or `DESC` |
{{
config(
materialized='view',
tags=['production', 'execution', 'tier1', 'api:token_balances', 'granularity:daily'],
meta={
"api": {
"methods": ["GET", "POST"],
"allow_unfiltered": false,
"require_any_of": ["symbol", "address"],
"exclude_from_api": false,
"parameters": [
{"name": "symbol", "column": "symbol", "operator": "=", "type": "string"},
{"name": "address", "column": "address", "operator": "IN", "type": "string_list", "case": "lower", "max_items": 200},
{"name": "start_date", "column": "date", "operator": ">=", "type": "date"},
{"name": "end_date", "column": "date", "operator": "<=", "type": "date"}
],
"pagination": {"enabled": true, "default_limit": 100, "max_limit": 5000},
"sort": [{"column": "date", "direction": "DESC"}]
}
}
)
}}

GET list filters accept repeated params and CSV:
curl "http://localhost:8000/v1/execution/token_balances/daily?symbol=ETH&address=0x1&address=0x2&limit=50" \
-H 'X-API-Key: sk_live_internal_admin'

curl "http://localhost:8000/v1/execution/token_balances/daily?symbol=ETH&address=0x1,0x2&limit=50" \
-H 'X-API-Key: sk_live_internal_admin'

POST accepts only JSON body fields declared in metadata:
curl -X POST "http://localhost:8000/v1/execution/token_balances/daily" \
-H "Content-Type: application/json" \
-H "X-API-Key: sk_live_internal_admin" \
-d '{
"symbol": "ETH",
"address": ["0x1", "0x2"],
"start_date": "2024-01-01",
"end_date": "2024-01-31",
"limit": 100,
"offset": 0
}'

- Prefer thin API-facing dbt views over exposing heavy internal models directly.
- Every `parameters[].column` and `sort[].column` must be present in the final `SELECT`.
- Do not rely on `granularity:latest`, `daily`, or `all_time` for behavior. They only shape URL paths.
- If you want date filters, declare them in `meta.api.parameters`.
- If you want pagination, declare `meta.api.pagination`.
- If you want ordering, declare `meta.api.sort`.
- If you want POST, include `"POST"` in `meta.api.methods`.
Create separate models for different time granularities:
-- api_consensus_blob_commitments_daily.sql
{{ config(tags=["production", "consensus", "tier1", "api:blob_commitments", "granularity:daily"]) }}
-- api_consensus_blob_commitments_latest.sql
{{ config(tags=["production", "consensus", "tier0", "api:blob_commitments", "granularity:latest"]) }}
-- api_consensus_blob_commitments_last_30d.sql
{{ config(tags=["production", "consensus", "tier1", "api:blob_commitments", "granularity:last_30d"]) }}
-- api_consensus_blob_commitments_all_time.sql
{{ config(tags=["production", "consensus", "tier2", "api:blob_commitments", "granularity:all_time"]) }}

Generated Endpoints:
- `GET /v1/consensus/blob_commitments/daily` (tier1)
- `GET /v1/consensus/blob_commitments/latest` (tier0)
- `GET /v1/consensus/blob_commitments/last_30d` (tier1)
- `GET /v1/consensus/blob_commitments/all_time` (tier2)
| Granularity | Use Case |
|---|---|
| `daily` | Daily aggregated data |
| `weekly` | Weekly aggregated data |
| `monthly` | Monthly aggregated data |
| `latest` | Most recent value(s) only |
| `last_7d` | Rolling 7-day window |
| `last_30d` | Rolling 30-day window |
| `in_ranges` | Data within specified ranges |
| `all_time` | Complete historical data |
| Tag Type | Examples | Purpose |
|---|---|---|
| Required | `production` | Marks model for API exposure |
| Category | `consensus`, `execution`, `financial` | First tag = Swagger UI section + URL prefix |
| Access | `tier0`, `tier1`, `tier2`, `tier3` | Required tier level (default: `tier0`) |
| Resource | `api:blob_commitments`, `api:validators` | Explicit resource name in URL |
| Granularity | `granularity:daily`, `granularity:weekly`, `granularity:monthly`, `granularity:latest`, `granularity:in_ranges`, `granularity:last_30d`, `granularity:last_7d`, `granularity:all_time` | Optional time/range suffix |
| Ignored | `view`, `table`, `incremental` | Filtered out from URL/grouping |
- Create Model: Name it descriptively (e.g., `api_consensus_blob_commitments_daily.sql`)
- Add Tags: Include `production` + category + `api:resource` + optional `granularity:` + tier
- Add `meta.api`: Define methods, filters, unfiltered policy, pagination, and sort
- Deploy: Merge PR, CI/CD updates `manifest.json`
- Result: API auto-discovers or updates the endpoint on the next manifest refresh
The API emits structured JSON logs to stderr and exposes Prometheus metrics at /metrics.
All log output is JSON, one object per line, with fields: timestamp, level, logger, message, event, plus context-specific fields. Logs never contain raw API keys, SQL text, query parameters, or request bodies.
Key log events:
| Event | Description |
|---|---|
| `http_request` | Every HTTP request (method, route, status, duration) |
| `api_access_denied` | Auth failures (reason, required/provided tier) |
| `api_rate_limit` | Rate-limit blocks (tier, identity kind) |
| `clickhouse_query` | ClickHouse queries (category, resource, granularity, tier, row count, duration) |
| `manifest_refresh` | Manifest reload lifecycle (trigger, status, model count) |
| `route_install` | Dynamic route registration (path, model, tier, methods) |
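For readers unfamiliar with one-object-per-line JSON logging, the sketch below shows how such a formatter can be built with the standard `logging` module. It is not the code in `app/observability.py`; the class name `JsonFormatter` and the `event`/`context` record attributes are assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "event": getattr(record, "event", None),
        }
        # Context-specific fields (assumed to be attached as a dict on the record).
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


# Build a record by hand so the example runs without configuring handlers.
record = logging.LogRecord(
    "cerebro.http", logging.INFO, "example.py", 0, "request served", None, None
)
record.event = "http_request"
record.context = {"method": "GET", "route": "/health", "status": 200}
line = JsonFormatter().format(record)
```

Attaching the formatter to a `logging.StreamHandler()` (which writes to stderr by default) would produce the one-object-per-line output described above.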
All metrics are prefixed with cerebro_api_. Key metric families:
| Metric | Type | Labels |
|---|---|---|
| `http_requests_total` | Counter | method, route, status |
| `http_request_duration_seconds` | Histogram | method, route |
| `auth_resolutions_total` | Counter | required_tier, result |
| `access_denied_total` | Counter | required_tier, provided_tier, reason |
| `rate_limit_decisions_total` | Counter | tier, result, identity_kind |
| `dynamic_requests_total` | Counter | category, resource, granularity, tier, method, status |
| `dynamic_request_duration_seconds` | Histogram | category, resource, granularity, tier, method |
| `clickhouse_query_duration_seconds` | Histogram | category, resource, granularity, tier, status |
| `clickhouse_query_errors_total` | Counter | category, resource, granularity, tier |
| `clickhouse_rows_returned` | Histogram | category, resource, granularity, tier |
| `manifest_refresh_total` | Counter | trigger, status |
| `manifest_models_loaded` | Gauge | (none) |
| `dynamic_routes_registered` | Gauge | (none) |
- PodMonitor scrapes `/metrics` on port `http`; Prometheus discovers pods directly
- `/metrics` is blocked at the public ALB with a fixed-response 403 rule
- K8s probes use `/health` (returns 200 when ClickHouse is reachable, 503 otherwise)
- A dedicated Grafana dashboard (`cerebro-api-observability`) provides real-time visibility across traffic, auth, rate limits, dynamic API, ClickHouse, manifest lifecycle, pod resources, and structured logs
This service is designed to run as a stateless container on Kubernetes.
docker build -t gnosis/cerebro-api:latest .

docker run -d \
--name cerebro-api \
-p 8000:8000 \
--env-file .env \
-v $(pwd)/api_keys.json:/code/api_keys.json:ro \
gnosis/cerebro-api:latest

When deploying to K8s, inject environment variables via ConfigMap or Secret.
Security Note: Never commit `api_keys.json` or `CLICKHOUSE_PASSWORD` to git. Always use K8s Secrets or a Secrets Manager (Vault / AWS SSM / etc).
Sample deployment.yaml snippet:
env:
  - name: CLICKHOUSE_URL
    value: "your-clickhouse-url.com"
  - name: CLICKHOUSE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: cerebro-secrets
        key: clickhouse_password
  - name: DBT_MANIFEST_URL
    value: "https://gnosischain.github.io/dbt-cerebro/manifest.json"
volumeMounts:
  - name: api-keys
    mountPath: /code/api_keys.json
    subPath: api_keys.json
    readOnly: true
volumes:
  - name: api-keys
    secret:
      secretName: cerebro-api-keys

| Variable | Required | Default | Description |
|---|---|---|---|
| `CLICKHOUSE_URL` | No | `None` | ClickHouse Cloud URL (takes precedence over HOST) |
| `CLICKHOUSE_HOST` | No | `localhost` | ClickHouse hostname |
| `CLICKHOUSE_PORT` | No | `8443` | ClickHouse port |
| `CLICKHOUSE_USER` | No | `default` | ClickHouse username |
| `CLICKHOUSE_PASSWORD` | Yes | `""` | ClickHouse password |
| `CLICKHOUSE_DATABASE` | No | `default` | ClickHouse database |
| `CLICKHOUSE_SECURE` | No | `true` | Use HTTPS connection |
| `DBT_MANIFEST_URL` | No | GitHub Pages URL | Remote manifest URL |
| `DBT_MANIFEST_PATH` | No | `./manifest.json` | Local manifest fallback |
| `DBT_MANIFEST_REFRESH_ENABLED` | No | `true` | Enable automatic manifest refresh |
| `DBT_MANIFEST_REFRESH_INTERVAL_SECONDS` | No | `300` | Refresh interval in seconds |
| `API_KEYS_FILE` | No | `./api_keys.json` | Path to API keys file |
| `DEFAULT_ENDPOINT_TIER` | No | `tier0` | Default tier for untagged endpoints |
The API polls the manifest URL automatically and rebuilds routes when it changes.
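This README does not say how a manifest change is detected. One common approach, shown here purely as an assumption, is to compare a content hash between polls; `ManifestWatcher` is a hypothetical class and is not part of `app/router_manager.py`.

```python
import hashlib
from typing import Optional


class ManifestWatcher:
    """Track whether the manifest content differs from the previous poll."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def has_changed(self, manifest_bytes: bytes) -> bool:
        digest = hashlib.sha256(manifest_bytes).hexdigest()
        changed = digest != self._last_digest
        self._last_digest = digest
        return changed


watcher = ManifestWatcher()
first = watcher.has_changed(b'{"nodes": {}}')        # first poll always counts as a change
second = watcher.has_changed(b'{"nodes": {}}')       # identical content, no rebuild needed
third = watcher.has_changed(b'{"nodes": {"a": 1}}')  # content changed, rebuild routes
```

A poller running every `DBT_MANIFEST_REFRESH_INTERVAL_SECONDS` would rebuild routes only when `has_changed` returns true.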
You can force an immediate refresh with a tier3 API key:
curl -X POST http://localhost:8000/v1/system/manifest/refresh \
-H 'X-API-Key: sk_live_internal_admin'

Each entry in `api_keys.json` has this shape:
{
"sk_live_<unique_id>": {
"user": "username",
"tier": "tier0|tier1|tier2|tier3",
"org": "Organization Name"
}
}

| File | Purpose |
|---|---|
| `app/server.py` | Process entrypoint; sets up logging before app import |
| `app/main.py` | FastAPI app, CORS, middleware, system endpoints (`/`, `/health`, `/metrics`) |
| `app/observability.py` | JSON log formatter, Prometheus metrics, HTTP middleware, observer helpers |
| `app/config.py` | Settings & environment loading |
| `app/database.py` | ClickHouse client wrapper |
| `app/api_metadata.py` | dbt endpoint metadata parsing and validation |
| `app/security.py` | Auth resolution, access enforcement, rate-limit helpers |
| `app/manifest.py` | dbt manifest loader with structured logging and model gauge |
| `app/router_manager.py` | Dynamic route lifecycle, background refresh, manifest refresh metrics |
| `app/factory.py` | Dynamic route generation engine with instrumentation |
api_config.yaml is now limited to route-level overrides only:
endpoints:
  - model: fct_custom_table
    path: /custom/endpoint
    summary: "Custom endpoint"
    tags: ["Custom"]
    tier: "tier1"

Behavior is never overridden from `api_config.yaml`.
- filters come only from `meta.api.parameters`
- methods come only from `meta.api.methods`
- pagination comes only from `meta.api.pagination`
- sort comes only from `meta.api.sort`
- unfiltered policy comes only from `meta.api.allow_unfiltered` and `meta.api.require_any_of`
- exclusion comes only from `meta.api.exclude_from_api`
MIT