Nydra is a production-grade template for ML APIs and pipelines. It couples FastAPI + MLflow + Metaflow with security, observability, and deployment patterns that work from laptop to enterprise.
- Inference/API: FastAPI with API keys (Argon2id), RBAC, optional OIDC/JWT, mTLS toggles, security headers/CSP, Redis-backed rate limiting (per-key/global/IP with circuit breaker), and OTEL tracing/logging.
- Traffic control: Champion-only, canary/A/B, and shadow routing with hot-reload from MLflow Model Registry.
- Model ops: Metaflow flows for training, evaluation/promotion guard (champion vs challenger), feature pipeline, drift monitoring, and key maintenance; promotion CLI and reports logged to MLflow.
- Data contracts: Versioned Pandera schemas for training/inference with compatibility checks and metrics for alerts.
- Observability: Prometheus metrics, Grafana dashboards + alert rules, OTLP hooks, structured logs with trace/span IDs; drift/eval metrics exposed for dashboards.
- Security & secrets: Vault optional, env loading with fallbacks, TLS toggles for Postgres/Redis/MLflow, and guidance for secret sidecars; security headers and strict CORS presets for prod.
- Deployment: Docker Compose dev/prod-ish stack, Helm/K8s manifests (ingress, HPA, PodSecurity, RBAC, NetworkPolicy), and Traefik guidance for shared self-hosted ingress.
- Integrations: Extras for PyTorch, TensorFlow, Transformers, Ultralytics, HuggingFace hub download, DuckDB/Snowflake/BigQuery stubs.
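
The traffic-control modes listed above can be pictured with a small routing function. This is only a sketch: the function name `pick_aliases` is hypothetical, and it assumes `PCT_CANARY` is a percentage between 0 and 100 and that the canary target is the `challenger` alias, which may not match the template's actual router.

```python
import random
from typing import Optional


def pick_aliases(
    strategy: str,
    pct_canary: float,
    rng: Optional[random.Random] = None,
) -> tuple[str, Optional[str]]:
    """Return (serving_alias, shadow_alias) for one request.

    Hypothetical helper; assumes pct_canary is a percentage in [0, 100].
    """
    rng = rng or random.Random()
    if strategy == "champion_only":
        return "champion", None
    if strategy == "canary":
        # Roughly pct_canary percent of requests are answered by the challenger.
        serving = "challenger" if rng.random() * 100 < pct_canary else "champion"
        return serving, None
    if strategy == "shadow":
        # The champion answers; the challenger is scored on the side and
        # its output is not returned to the caller.
        return "champion", "challenger"
    raise ValueError(f"unknown traffic strategy: {strategy!r}")
```

Passing a seeded `random.Random` makes the canary split reproducible in tests.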
- Generate: `cookiecutter https://github.com/matjsz/nydra.git`
- Configure: copy `.env.example` to `.env`; set `MODEL_NAME`, `MLFLOW_TRACKING_URI`, `MLFLOW_BACKEND_URI` (Postgres), `MLFLOW_ARTIFACT_ROOT` (S3/MinIO), `APP_KEY`, `TRAFFIC_STRATEGY`, `MODEL_FLAVOR`.
- Run dev stack: `docker compose --profile dev up --build -d` (API, MLflow, MinIO, Postgres auth + MLflow backend, Redis, Prometheus, Grafana).
- Train: `uv run python -m src.flows.train run --model_name ${MODEL_NAME}`.
- Issue key: `curl -H "X-App-Key: $APP_KEY" -X POST http://localhost:8000/auth/key -d '{"name":"dev","role":"admin"}'`.
- Call: `curl -H "X-API-Key: <key>" -H "Content-Type: application/json" -d '{"input":0.5}' http://localhost:8000/predict`.
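
The last two steps can also be scripted. The sketch below builds the same requests with the Python standard library; the helper names are hypothetical, the JSON `Content-Type` on the issuance call is an assumption, and the shape of the responses is not specified here, so `send` simply returns whatever JSON the API sends back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # dev stack address from the steps above


def issue_key_request(app_key: str, name: str = "dev", role: str = "admin") -> urllib.request.Request:
    """Build the POST /auth/key request (hypothetical helper)."""
    body = json.dumps({"name": name, "role": role}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/auth/key",
        data=body,
        method="POST",
        # Sending JSON explicitly is an assumption about what the endpoint expects.
        headers={"X-App-Key": app_key, "Content-Type": "application/json"},
    )


def predict_request(api_key: str, value: float) -> urllib.request.Request:
    """Build the POST /predict request with the issued API key."""
    body = json.dumps({"input": value}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/predict",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> object:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode())
```

With the dev stack running, `send(issue_key_request(app_key))` followed by `send(predict_request(key, 0.5))` mirrors the two `curl` calls.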
- Inference/model: `MODEL_NAME`, `MODEL_FLAVOR`, `TRAFFIC_STRATEGY` (`champion_only|canary|shadow`), `PCT_CANARY`, `MLFLOW_TRACKING_URI`, `MLFLOW_BACKEND_URI`, `MLFLOW_ARTIFACT_ROOT`.
- Auth/keys: `AUTH_DB_URL` (Postgres), `APP_KEY` (issuance), `ENABLE_KEY_MANAGER`, `KEY_DEFAULT_TTL_DAYS`, `KEY_ROTATION_GRACE_DAYS`, `KEY_NOTIFY_DAYS`.
- Rate limiting: `RATE_LIMIT_BACKEND` (`local|redis`), `REDIS_URL`, `GLOBAL_RATE_LIMIT_PER_SEC`, `IP_RATE_LIMIT_PER_SEC`; per-key limits are set on issuance.
- Security: `ALLOWED_ORIGINS` (required in prod), `CONTENT_SECURITY_POLICY` (if set), `ENABLE_OIDC` + issuer/audience/JWKS, `DB_SSL_MODE`/`DB_SSL_ROOT_CERT`, `REDIS_USE_SSL`/certs, mTLS toggles.
- Observability: `OTEL_EXPORTER_OTLP_ENDPOINT`/headers, Prometheus scrape/metrics, alert rules in `prom-rules.yml`.
- Secrets: `VAULT_ENABLED=true` with `VAULT_ADDR`, `VAULT_TOKEN`/`VAULT_TOKEN_FILE`, `VAULT_NAMESPACE`, `VAULT_KV_MOUNT`, `VAULT_KV_PATH`.
- SMTP (optional for key expiry): `SMTP_HOST`, `SMTP_PORT`, `SMTP_USER`, `SMTP_PASSWORD`, `SMTP_FROM`.
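
To show how a few of these variables interact, here is a minimal settings loader. It is a sketch, not the template's loader: the `Settings` class, the `load_settings` function, and every default value (`champion_only`, `local`, `0`) are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_name: str
    traffic_strategy: str
    pct_canary: float
    rate_limit_backend: str
    redis_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read and validate a subset of the variables listed above.

    Defaults here are illustrative assumptions, not the template's defaults.
    """
    if "MODEL_NAME" not in env:
        raise ValueError("MODEL_NAME is required")

    strategy = env.get("TRAFFIC_STRATEGY", "champion_only")
    if strategy not in {"champion_only", "canary", "shadow"}:
        raise ValueError(f"unsupported TRAFFIC_STRATEGY: {strategy!r}")

    backend = env.get("RATE_LIMIT_BACKEND", "local")
    if backend not in {"local", "redis"}:
        raise ValueError(f"unsupported RATE_LIMIT_BACKEND: {backend!r}")

    redis_url = env.get("REDIS_URL")
    if backend == "redis" and not redis_url:
        raise ValueError("REDIS_URL is required when RATE_LIMIT_BACKEND=redis")

    return Settings(
        model_name=env["MODEL_NAME"],
        traffic_strategy=strategy,
        pct_canary=float(env.get("PCT_CANARY", "0")),
        rate_limit_backend=backend,
        redis_url=redis_url,
    )
```

Accepting any mapping instead of reading `os.environ` directly keeps the loader easy to exercise in tests.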
- `src/api/security/key_store.py` stores Argon2id-hashed keys with prefixes; roles: `consumer|tester|admin`; per-key rate limits; validity windows (`valid_from`, `expires_at`); owner/contact metadata.
- Endpoints: `/auth/key` (issue via `APP_KEY` or admin key), `/auth/keys` (list, admin only), `/auth/keys/{id}/rotate`, `/auth/keys/{id}/deactivate`.
- Key Manager (`src/api/services/key_manager.py`) runs in-process to detect expiring/expired keys, emit Prometheus gauges, and email owners; optional Redis leader lock to avoid duplicate notifications.
- Scheduled maintenance flow: `src/flows/key_maintenance.py` for deactivation + notifications (Metaflow/cron style).
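
The prefix-plus-hash scheme and the validity window can be sketched as follows. Everything here is illustrative: `KeyRecord`, `issue_key`, and `verify_key` are hypothetical names, the `prefix.secret` key format is an assumption, and the hash is stdlib PBKDF2 standing in for Argon2id (which needs a third-party package such as `argon2-cffi`).

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class KeyRecord:
    prefix: str               # stored in clear, used only to find the row
    salt: bytes
    digest: bytes             # hash of the secret part; the secret itself is never stored
    role: str                 # consumer | tester | admin
    valid_from: datetime
    expires_at: Optional[datetime]


def _hash(secret: str, salt: bytes) -> bytes:
    # Stand-in KDF: PBKDF2 from the stdlib, NOT Argon2id.
    return hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, 100_000)


def issue_key(role: str, valid_from: datetime,
              expires_at: Optional[datetime] = None) -> tuple[str, KeyRecord]:
    """Create a key; the plaintext is returned once and only the hash is kept."""
    prefix = secrets.token_hex(4)
    secret = secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    record = KeyRecord(prefix, salt, _hash(secret, salt), role, valid_from, expires_at)
    return f"{prefix}.{secret}", record


def verify_key(presented: str, store: dict[str, KeyRecord],
               now: datetime) -> Optional[KeyRecord]:
    """Return the matching record if the key is known, correct, and in its window."""
    prefix, sep, secret = presented.partition(".")
    record = store.get(prefix)
    if not sep or record is None:
        return None
    if not hmac.compare_digest(_hash(secret, record.salt), record.digest):
        return None
    if now < record.valid_from:
        return None
    if record.expires_at is not None and now >= record.expires_at:
        return None
    return record


# Usage: issue a consumer key valid from now with no expiry.
plaintext, rec = issue_key("consumer", datetime.now(timezone.utc))
```

The prefix lets the store locate a single candidate row, so only one slow hash comparison is needed per request.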
- Train (`src/flows/train.py`): logs metrics to MLflow, registers model, sets aliases (champion/challenger), logs baseline stats.
- Eval (`src/flows/eval.py`): compares champion vs challenger on holdout/production slices, logs Markdown/HTML reports to MLflow, blocks promotion on regression; supports approval flags/Slack webhook.
- Features (`src/flows/features.py`): builds feature parquet, tags schema/version, optional Redis/Feast-style online push.
- Drift (`src/flows/drift.py`): compares current data to baseline stats and logs drift metrics.
- Promotion guard script: `scripts/promotion_guarded.py` to require fresh eval before alias changes.
- Key maintenance (`src/flows/key_maintenance.py`): scheduled expiry checks/notifications.
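
The promotion guard combines two checks: the evaluation must be recent, and the challenger must not regress against the champion. A minimal sketch of that decision follows; the names `EvalResult` and `may_promote`, the 24-hour freshness window, and the "higher score is better" convention are all assumptions rather than the behavior of `scripts/promotion_guarded.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EvalResult:
    champion_score: float
    challenger_score: float
    evaluated_at: datetime


def may_promote(
    result: EvalResult,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed freshness window
    min_gain: float = 0.0,                     # assumed: ties are allowed
) -> tuple[bool, str]:
    """Decide whether the challenger alias may replace the champion.

    Assumes a metric where higher is better.
    """
    if now - result.evaluated_at > max_age:
        return False, "stale evaluation"
    if result.challenger_score < result.champion_score + min_gain:
        return False, "challenger regresses against champion"
    return True, "ok"
```

Returning a reason alongside the boolean makes it straightforward to log why a promotion was blocked.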
- Prometheus metrics: request latency, inference counts/errors by alias + endpoint, shadow usage, rate-limit decisions, per-key usage, key-expiry gauges.
- Grafana dashboards provisioned in `grafana/provisioning/dashboards`; alert rules for latency/5xx/drift/eval in `prom-rules.yml`.
- OTEL: `src/api/utils/otel.py` instruments FastAPI and clients; logs include `trace_id`/`span_id` when tracing is enabled.
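
Attaching trace and span IDs to structured logs can be illustrated with the standard `logging` module alone. In this sketch a `contextvars.ContextVar` stands in for the OpenTelemetry span context, and `TraceFilter`, `JsonFormatter`, and `build_logger` are hypothetical names; the real wiring in `src/api/utils/otel.py` may differ.

```python
import contextvars
import io
import json
import logging

# Hypothetical stand-in for the active OTEL span context: (trace_id, span_id) or None.
current_trace = contextvars.ContextVar("current_trace", default=None)


class TraceFilter(logging.Filter):
    """Copy the active trace/span IDs onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = current_trace.get()
        record.trace_id, record.span_id = ctx if ctx else (None, None)
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line; IDs appear only when tracing is active."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        if getattr(record, "trace_id", None):
            payload["trace_id"] = record.trace_id
            payload["span_id"] = record.span_id
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TraceFilter())
    logger = logging.getLogger("nydra.sketch")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: the second line carries IDs, the first does not.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("tracing disabled")
token = current_trace.set(("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331"))
log.info("tracing enabled")
current_trace.reset(token)
```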
- Docker Compose: dev/prod-ish stack in `docker-compose.yml`; see `docs/self_hosted.md` for self-hosted guidance, Traefik shared-ingress pattern, and Vault usage.
- Helm/K8s: charts under `helm/` and manifests under `k8s/` include ingress, HPA, PodDisruptionBudget, PodSecurityContext, RBAC, NetworkPolicies, TLS toggles.
- Traefik shared ingress: create the `traefik-public` network once; run Traefik; attach each model stack with host-rule labels (see `docs/self_hosted.md`).
- Pandera schemas for inference/training in `src/api/utils/contracts.py`; versions tracked via `INFERENCE_SCHEMA_VERSION`, `TRAINING_SCHEMA_VERSION`, and `EXPECTED_TRAINING_SCHEMA_VERSION`.
- Compatibility checks enforced in training/inference; schema version metrics surface for alerting.
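
A compatibility check between schema versions might look like the sketch below. It assumes dotted numeric versions and a "same major, not older than expected" rule; neither the version format nor the rule is confirmed by the template, and `parse_version`, `is_compatible`, and `check_training_schema` are hypothetical names.

```python
from typing import Mapping


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.4.0' into (1, 4, 0). Assumes dotted numeric versions."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(actual: str, expected: str) -> bool:
    """Assumed rule: same major version, and actual is not older than expected."""
    a, e = parse_version(actual), parse_version(expected)
    return a[0] == e[0] and a >= e


def check_training_schema(env: Mapping[str, str]) -> None:
    """Raise if the training schema does not satisfy what inference expects."""
    actual = env["TRAINING_SCHEMA_VERSION"]
    expected = env["EXPECTED_TRAINING_SCHEMA_VERSION"]
    if not is_compatible(actual, expected):
        raise RuntimeError(
            f"training schema {actual} is incompatible with expected {expected}"
        )
```

A failed check is a natural place to increment a schema-mismatch metric so that alert rules can fire on it.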
- Security headers + CSP, strict CORS in prod, mTLS toggles for internal services, Redis-backed limiter default in prod with circuit breaker, optional OIDC alongside API keys.
- Secrets from Vault/env; guidance for rotation in `docs/rotation.md`; Alembic migration `alembic/versions/0001_init_auth.py` seeds auth schema; `scripts/db_upgrade.py` runs migrations.
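
The idea of a shared limiter guarded by a circuit breaker can be sketched without Redis. Below, `backend_allow` is a hypothetical callable standing in for the Redis-backed check; after repeated failures the limiter stops calling it for a cooldown period and relies on a local token bucket. Class names, thresholds, and the fallback policy are all assumptions, not the template's implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Local limiter: refills at rate_per_sec, holds at most one second of burst."""

    def __init__(self, rate_per_sec: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = rate_per_sec
        self.tokens = self.capacity
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class GuardedLimiter:
    """Prefer the shared backend; on repeated errors, open the breaker and go local."""

    def __init__(self, backend_allow: Callable[[], bool], local: TokenBucket,
                 max_failures: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backend_allow = backend_allow  # hypothetical Redis-backed check
        self.local = local
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.open_until = 0.0

    def allow(self) -> bool:
        if self.clock() < self.open_until:
            # Breaker open: skip the backend entirely during the cooldown.
            return self.local.allow()
        try:
            decision = self.backend_allow()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open_until = self.clock() + self.cooldown
                self.failures = 0
            return self.local.allow()
        self.failures = 0
        return decision
```

Falling back to a local bucket keeps some protection in place while the backend is down, at the cost of limits being enforced per process rather than globally.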
- First-day walkthrough: `docs/first_day.md`
- Architecture: `docs/architecture.md`
- Self-hosted guide + Traefik/Vault notes: `docs/self_hosted.md`
- Rotation/secrets: `docs/rotation.md`
- Flows overview: `docs/flows.md`
- Postman collection: `postman_collection.json`
- Alerts/dashboards: `prom-rules.yml`, `grafana/provisioning/dashboards/json/*.json`
- CI workflow runs smoke/e2e (`scripts/smoke.py`, `scripts/e2e.py`); unit/contract tests in `tests/`.
- Run locally: `uv run pytest`.