Nydra (Neural Hydra): End-to-end ML ops starter

Nydra is a production-grade template for ML APIs and pipelines. It couples FastAPI + MLflow + Metaflow with security, observability, and deployment patterns that work from laptop to enterprise.

Highlights

Inference/API: FastAPI with API keys (Argon2id), RBAC, optional OIDC/JWT, mTLS toggles, security headers/CSP, Redis-backed rate limiting (per-key/global/IP with circuit breaker), and OTEL tracing/logging.
Traffic control: Champion-only, canary/A/B, and shadow routing with hot-reload from MLflow Model Registry.
Model ops: Metaflow flows for training, evaluation/promotion guard (champion vs challenger), feature pipeline, drift monitoring, and key maintenance; promotion CLI and reports logged to MLflow.
Data contracts: Versioned Pandera schemas for training/inference with compatibility checks and metrics for alerts.
Observability: Prometheus metrics, Grafana dashboards + alert rules, OTLP hooks, structured logs with trace/span IDs; drift/eval metrics exposed for dashboards.
Security & secrets: optional Vault integration, env loading with fallbacks, TLS toggles for Postgres/Redis/MLflow, and guidance for secret sidecars; security headers and strict CORS presets for prod.
Deployment: Docker Compose dev/prod-ish stack, Helm/K8s manifests (ingress, HPA, PodSecurity, RBAC, NetworkPolicy), and Traefik guidance for shared self-hosted ingress.
Integrations: optional extras for PyTorch, TensorFlow, Transformers, Ultralytics, Hugging Face Hub downloads, and DuckDB/Snowflake/BigQuery stubs.
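A minimal sketch of how the three TRAFFIC_STRATEGY modes could pick model aliases per request. It assumes PCT_CANARY is an integer percentage and that the canary split is made by hashing a request or caller ID, so the same caller is routed consistently. The `Route` type and the `route`/`bucket` functions are hypothetical and are not Nydra's actual router. The alias names champion/challenger are the ones the Train flow sets.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    serve: str                    # alias whose prediction is returned to the caller
    shadow: Optional[str] = None  # alias called in the background, result only logged


def bucket(request_id: str) -> int:
    """Map a request id deterministically onto 0..99 so routing is sticky per caller."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(strategy: str, request_id: str, pct_canary: int = 0) -> Route:
    """Pick model aliases for one request under a given TRAFFIC_STRATEGY value."""
    if strategy == "champion_only":
        return Route(serve="champion")
    if strategy == "canary":
        # Roughly pct_canary percent of ids fall into a bucket below the threshold.
        serve = "challenger" if bucket(request_id) < pct_canary else "champion"
        return Route(serve=serve)
    if strategy == "shadow":
        # The champion answers; the challenger sees the same input for comparison only.
        return Route(serve="champion", shadow="challenger")
    raise ValueError(f"unknown TRAFFIC_STRATEGY: {strategy!r}")
```

Hashing instead of drawing a random number keeps a given caller on one model for the whole rollout. A plain A/B test could reuse the same bucketing with a 50/50 split.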

Quickstart

Generate: cookiecutter https://github.com/matjsz/nydra.git
Configure: copy .env.example to .env; set MODEL_NAME, MLFLOW_TRACKING_URI, MLFLOW_BACKEND_URI (Postgres), MLFLOW_ARTIFACT_ROOT (S3/MinIO), APP_KEY, TRAFFIC_STRATEGY, MODEL_FLAVOR.
Run dev stack: docker compose --profile dev up --build -d (API, MLflow, MinIO, Postgres for auth and the MLflow backend, Redis, Prometheus, Grafana).
Train: uv run python -m src.flows.train run --model_name ${MODEL_NAME}
Issue key: curl -H "X-App-Key: $APP_KEY" -H "Content-Type: application/json" -X POST http://localhost:8000/auth/key -d '{"name":"dev","role":"admin"}'
Call (with the issued key): curl -H "X-API-Key: <key>" -H "Content-Type: application/json" -d '{"input":0.5}' http://localhost:8000/predict
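The same two calls can be made from Python using only the standard library. This sketch assumes the endpoints, headers, and JSON bodies from the curl commands above and a dev stack on localhost:8000. The response bodies are returned as decoded JSON without further parsing because their shape is not documented here. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # dev stack started by docker compose


def issue_key_request(app_key: str, name: str, role: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /auth/key, authenticated with the issuance APP_KEY."""
    body = json.dumps({"name": name, "role": role}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/key",
        data=body,
        headers={"X-App-Key": app_key, "Content-Type": "application/json"},
        method="POST",
    )


def predict_request(api_key: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /predict, authenticated with an issued API key."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the stack running, `send(predict_request("<key>", {"input": 0.5}))` returns whatever JSON the API responds with.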

Configuration essentials

Inference/model: MODEL_NAME, MODEL_FLAVOR, TRAFFIC_STRATEGY (champion_only|canary|shadow), PCT_CANARY, MLFLOW_TRACKING_URI, MLFLOW_BACKEND_URI, MLFLOW_ARTIFACT_ROOT.
Auth/keys: AUTH_DB_URL (Postgres), APP_KEY (issuance), ENABLE_KEY_MANAGER, KEY_DEFAULT_TTL_DAYS, KEY_ROTATION_GRACE_DAYS, KEY_NOTIFY_DAYS.
Rate limiting: RATE_LIMIT_BACKEND (local|redis), REDIS_URL, GLOBAL_RATE_LIMIT_PER_SEC, IP_RATE_LIMIT_PER_SEC, per-key limits set on issuance.
Security: ALLOWED_ORIGINS (required in prod), CONTENT_SECURITY_POLICY (applied when set), ENABLE_OIDC + issuer/audience/JWKS, DB_SSL_MODE/DB_SSL_ROOT_CERT, REDIS_USE_SSL/certs, mTLS toggles.
Observability: OTEL_EXPORTER_OTLP_ENDPOINT/headers, Prometheus scrapes /metrics, alert rules in prom-rules.yml.
Secrets: VAULT_ENABLED=true with VAULT_ADDR, VAULT_TOKEN/VAULT_TOKEN_FILE, VAULT_NAMESPACE, VAULT_KV_MOUNT, VAULT_KV_PATH.
SMTP (optional, for key-expiry emails): SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASSWORD, SMTP_FROM.
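A minimal sketch of loading and validating a few of these settings at startup so that misconfiguration fails fast. It assumes that PCT_CANARY is an integer percentage from 0 to 100, that TRAFFIC_STRATEGY defaults to champion_only, and that VAULT_TOKEN takes precedence over VAULT_TOKEN_FILE. The `Settings` class and `load_settings` function are hypothetical, and Nydra's own loader may use different defaults and fallbacks.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

TRAFFIC_STRATEGIES = {"champion_only", "canary", "shadow"}
RATE_LIMIT_BACKENDS = {"local", "redis"}


@dataclass(frozen=True)
class Settings:
    model_name: str
    mlflow_tracking_uri: str
    traffic_strategy: str
    pct_canary: int
    rate_limit_backend: str
    redis_url: Optional[str]
    vault_token: Optional[str]


def _require(env: Mapping[str, str], name: str) -> str:
    value = env.get(name, "").strip()
    if not value:
        raise ValueError(f"{name} must be set")
    return value


def _vault_token(env: Mapping[str, str]) -> Optional[str]:
    # Assumed precedence: an inline VAULT_TOKEN wins over VAULT_TOKEN_FILE.
    if env.get("VAULT_ENABLED", "false").lower() != "true":
        return None
    if env.get("VAULT_TOKEN"):
        return env["VAULT_TOKEN"]
    token_file = env.get("VAULT_TOKEN_FILE")
    if token_file:
        return Path(token_file).read_text(encoding="utf-8").strip()
    raise ValueError("VAULT_ENABLED=true needs VAULT_TOKEN or VAULT_TOKEN_FILE")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    strategy = env.get("TRAFFIC_STRATEGY", "champion_only")
    if strategy not in TRAFFIC_STRATEGIES:
        raise ValueError(f"TRAFFIC_STRATEGY must be one of {sorted(TRAFFIC_STRATEGIES)}")
    pct = int(env.get("PCT_CANARY", "0"))
    if not 0 <= pct <= 100:
        raise ValueError("PCT_CANARY must be between 0 and 100")
    backend = env.get("RATE_LIMIT_BACKEND", "local")
    if backend not in RATE_LIMIT_BACKENDS:
        raise ValueError("RATE_LIMIT_BACKEND must be 'local' or 'redis'")
    redis_url = env.get("REDIS_URL")
    if backend == "redis" and not redis_url:
        raise ValueError("RATE_LIMIT_BACKEND=redis requires REDIS_URL")
    return Settings(
        model_name=_require(env, "MODEL_NAME"),
        mlflow_tracking_uri=_require(env, "MLFLOW_TRACKING_URI"),
        traffic_strategy=strategy,
        pct_canary=pct,
        rate_limit_backend=backend,
        redis_url=redis_url,
        vault_token=_vault_token(env),
    )
```

Passing an explicit mapping instead of reading os.environ directly keeps the loader easy to unit-test.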

Auth & key lifecycle

src/api/security/key_store.py stores Argon2id-hashed keys with prefixes; roles: consumer|tester|admin; per-key rate limits; validity windows (valid_from, expires_at); owner/contact metadata.
Endpoints: /auth/key (issue via APP_KEY or admin key), /auth/keys (list, admin only), /auth/keys/{id}/rotate, /auth/keys/{id}/deactivate.
Key Manager (src/api/services/key_manager.py) runs in-process to detect expiring/expired keys, emit Prometheus gauges, and email owners; an optional Redis leader lock avoids duplicate notifications.
Scheduled maintenance flow: src/flows/key_maintenance.py for deactivation + notifications (Metaflow/cron style).
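The lifecycle above (prefixed keys stored only as hashes, with a validity window and an expiry-notification window) can be sketched with the standard library. This sketch uses `hashlib.scrypt` as a stand-in for Argon2id so it runs without extra dependencies. argon2-cffi's `PasswordHasher` is the usual way to get Argon2id in Python. The `nyd_` prefix format, the `StoredKey` fields, and the status names are assumptions and are not key_store.py's real schema.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple

# Stand-in KDF parameters (scrypt); Nydra itself uses Argon2id.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def _hash(secret: str, salt: bytes) -> bytes:
    return hashlib.scrypt(secret.encode("utf-8"), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P)


@dataclass
class StoredKey:
    prefix: str          # non-secret lookup handle, safe to show in key listings
    salt: bytes
    digest: bytes        # hash of the full key; the plaintext is never stored
    role: str            # consumer | tester | admin
    valid_from: datetime
    expires_at: datetime


def issue_key(role: str, ttl_days: int, now: datetime) -> Tuple[str, StoredKey]:
    prefix = "nyd_" + secrets.token_hex(4)  # hypothetical prefix format
    plaintext = f"{prefix}.{secrets.token_urlsafe(32)}"
    salt = secrets.token_bytes(16)
    record = StoredKey(prefix, salt, _hash(plaintext, salt), role, now, now + timedelta(days=ttl_days))
    return plaintext, record  # plaintext is shown to the caller exactly once


def verify(presented: str, record: StoredKey, now: datetime) -> bool:
    """Prefix and validity-window checks, then a constant-time hash comparison."""
    if not presented.startswith(record.prefix + "."):
        return False
    if not (record.valid_from <= now < record.expires_at):
        return False
    return hmac.compare_digest(_hash(presented, record.salt), record.digest)


def expiry_status(record: StoredKey, now: datetime, notify_days: int) -> str:
    """Classify a key the way a key manager might before emitting gauges or emails."""
    if now >= record.expires_at:
        return "expired"
    if record.expires_at - now <= timedelta(days=notify_days):
        return "expiring"
    return "active"
```

In this sketch `ttl_days` plays the role of KEY_DEFAULT_TTL_DAYS and `notify_days` the role of KEY_NOTIFY_DAYS. The prefix lets the store look up a single record before it pays for the slow hash.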

Flows (Metaflow)

Train (src/flows/train.py): logs metrics to MLflow, registers model, sets aliases (champion/challenger), logs baseline stats.
Eval (src/flows/eval.py): compares champion vs challenger on holdout/production slices, logs Markdown/HTML reports to MLflow, blocks promotion on regression; supports approval flags/Slack webhook.
Features (src/flows/features.py): builds feature parquet, tags schema/version, optional Redis/Feast-style online push.
Drift (src/flows/drift.py): compares current data to baseline stats and logs drift metrics.
Promotion guard script: scripts/promotion_guarded.py to require fresh eval before alias changes.
Key maintenance (src/flows/key_maintenance.py): scheduled expiry checks/notifications.
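The eval flow and the promotion guard boil down to one decision: promote the challenger only if a recent evaluation exists and no tracked metric regressed beyond a tolerance. This sketch assumes non-negative, higher-is-better metrics, a relative tolerance, and a maximum evaluation age. The `EvalResult` type and the `promotion_decision` function are hypothetical and are not the interface of scripts/promotion_guarded.py.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EvalResult:
    evaluated_at: datetime
    champion: Dict[str, float]    # metric name -> value, higher is better (assumed)
    challenger: Dict[str, float]


def promotion_decision(
    result: EvalResult,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
    tolerance: float = 0.01,
) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); any reason blocks the alias change."""
    reasons: List[str] = []
    if now - result.evaluated_at > max_age:
        reasons.append(f"evaluation is older than {max_age}")
    for metric, champ_value in result.champion.items():
        chall_value = result.challenger.get(metric)
        if chall_value is None:
            reasons.append(f"challenger is missing metric {metric!r}")
        elif chall_value < champ_value * (1 - tolerance):
            # A relative drop larger than the tolerance counts as a regression.
            reasons.append(f"{metric} regressed: {chall_value:.4f} < {champ_value:.4f}")
    return (not reasons, reasons)
```

Returning the reasons along with the verdict makes it easy to include them in the Markdown/HTML report or in a Slack message when promotion is blocked.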

Observability

Prometheus metrics: request latency, inference counts/errors by alias + endpoint, shadow usage, rate-limit decisions, per-key usage, key-expiry gauges.
Grafana dashboards provisioned in grafana/provisioning/dashboards; alert rules for latency/5xx/drift/eval in prom-rules.yml.
OTEL: src/api/utils/otel.py instruments FastAPI and clients; logs include trace_id/span_id when tracing is enabled.
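Structured logs that carry trace_id/span_id can be produced with a `logging.Filter` that copies the active IDs onto each record. In Nydra those IDs come from OpenTelemetry through src/api/utils/otel.py. This standard-library sketch replaces the OTel span context with a hypothetical contextvar (`current_trace`), so it shows the log shape rather than the real instrumentation.

```python
import contextvars
import json
import logging

# Hypothetical stand-in for the active OTel span context: (trace_id, span_id) or None.
current_trace = contextvars.ContextVar("current_trace", default=None)


class TraceContextFilter(logging.Filter):
    """Attach trace_id/span_id to every record (empty strings when no span is active)."""

    def filter(self, record: logging.LogRecord) -> bool:
        ids = current_trace.get()
        record.trace_id, record.span_id = ids if ids else ("", "")
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", ""),
            "span_id": getattr(record, "span_id", ""),
        })


def build_logger(name: str, stream=None) -> logging.Logger:
    handler = logging.StreamHandler(stream)  # stderr when stream is None
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    token = current_trace.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
    build_logger("nydra.api").info("prediction served")
    current_trace.reset(token)
```

With the OpenTelemetry API installed, the filter would read `trace.get_current_span().get_span_context()` instead of the contextvar. It would then format the integer IDs as hex (32 characters for the trace ID, 16 for the span ID).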

Deployment

Docker Compose: dev/prod-ish stack in docker-compose.yml; see docs/self_hosted.md for self-hosted guidance, Traefik shared-ingress pattern, and Vault usage.
Helm/K8s: charts under helm/ and manifests under k8s/ include ingress, HPA, PodDisruptionBudget, PodSecurityContext, RBAC, NetworkPolicies, TLS toggles.
Traefik shared ingress: create the traefik-public Docker network once, run Traefik, then attach each model stack with host-rule labels (see docs/self_hosted.md).
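Attaching a model stack to the shared Traefik ingress mostly means adding a few Docker labels to the API service. The helper below generates them for one service. It assumes Traefik's Docker provider, an entrypoint named `websecure`, and an API listening on port 8000. The router/service name and hostname are placeholders, and docs/self_hosted.md remains the reference for the labels Nydra actually uses.

```python
from typing import Dict


def traefik_labels(name: str, host: str, port: int = 8000,
                   network: str = "traefik-public", entrypoint: str = "websecure") -> Dict[str, str]:
    """Docker labels that route Host(`host`) on the shared Traefik instance to this service."""
    router = f"traefik.http.routers.{name}"
    return {
        "traefik.enable": "true",
        # Docker network Traefik should use to reach this container.
        "traefik.docker.network": network,
        f"{router}.rule": f"Host(`{host}`)",
        f"{router}.entrypoints": entrypoint,  # assumed entrypoint name
        f"{router}.tls": "true",
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


def as_compose_labels(labels: Dict[str, str]) -> str:
    """Render the labels as a docker-compose `labels:` list."""
    return "\n".join(["labels:"] + [f'  - "{key}={value}"' for key, value in labels.items()])


if __name__ == "__main__":
    print(as_compose_labels(traefik_labels("churn-model", "churn.models.example.com")))
```

Each stack needs a unique router/service name. Its API service must also join the external traefik-public network in its compose file, or Traefik cannot reach it.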

Data contracts & schemas

Pandera schemas for inference/training in src/api/utils/contracts.py; versions tracked via INFERENCE_SCHEMA_VERSION, TRAINING_SCHEMA_VERSION, and EXPECTED_TRAINING_SCHEMA_VERSION.
Compatibility checks enforced in training/inference; schema version metrics surface for alerting.
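A minimal sketch of the training-side version check. It assumes schema versions are `MAJOR.MINOR` strings where a major bump is breaking and a minor bump only adds fields, so data with a newer minor version still satisfies older code. That policy and the helper names are assumptions, not Nydra's documented rules, and the Pandera validation itself is not reproduced here. The boolean result could also feed a 0/1 gauge for alerting.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Parse 'MAJOR.MINOR' (e.g. '2.1') into integers."""
    major, minor = version.strip().split(".")
    return int(major), int(minor)


def check_compatible(expected: str, actual: str) -> bool:
    """Same major version, and the data's minor version is at least the expected one."""
    exp_major, exp_minor = parse_version(expected)
    act_major, act_minor = parse_version(actual)
    return act_major == exp_major and act_minor >= exp_minor


def enforce(expected: str, actual: str) -> None:
    """Fail fast, e.g. comparing EXPECTED_TRAINING_SCHEMA_VERSION with TRAINING_SCHEMA_VERSION."""
    if not check_compatible(expected, actual):
        raise RuntimeError(
            f"training data schema {actual} is incompatible with expected {expected}"
        )
```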

Security posture

Security headers + CSP, strict CORS in prod, mTLS toggles for internal services, a Redis-backed rate limiter with circuit breaker (the default in prod), and optional OIDC alongside API keys.
Secrets come from Vault or env; rotation guidance is in docs/rotation.md. The Alembic migration alembic/versions/0001_init_auth.py initializes the auth schema, and scripts/db_upgrade.py runs migrations.
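The limiter's circuit breaker can be sketched like this. The limiter calls the shared Redis backend and counts consecutive errors. After a threshold it falls back to an in-process token bucket until a cooldown expires, then tries Redis again. All of this is illustrative. The `RemoteLimiter` protocol stands in for the Redis client, and the threshold, the cooldown, and the choice to fall back to local limits instead of rejecting traffic may differ from Nydra's implementation.

```python
import time
from typing import Callable, Dict, Protocol, Tuple


class RemoteLimiter(Protocol):
    def allow(self, key: str) -> bool: ...  # may raise on connection errors


class TokenBucket:
    """In-process token bucket: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self._state: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last refill time)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        self._state[key] = (tokens - 1 if allowed else tokens, now)
        return allowed


class BreakerLimiter:
    """Use the remote limiter; after `threshold` consecutive errors, use the local one for `cooldown` s."""

    def __init__(self, remote: RemoteLimiter, local: TokenBucket, threshold: int = 3,
                 cooldown: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.remote, self.local = remote, local
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.open_until = 0.0

    @property
    def is_open(self) -> bool:
        return self.clock() < self.open_until

    def allow(self, key: str) -> bool:
        if self.is_open:
            return self.local.allow(key)  # skip Redis entirely while the breaker is open
        try:
            allowed = self.remote.allow(key)
        except Exception:  # in practice, narrow this to the Redis client's connection errors
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = self.clock() + self.cooldown
                self.failures = 0
            return self.local.allow(key)
        self.failures = 0
        return allowed
```

Injecting the clock keeps the breaker and the bucket deterministic under test. Per-key, global, and per-IP limits would each use their own key namespace (for example, `ip:<addr>`).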

Docs & assets

First-day walkthrough: docs/first_day.md
Architecture: docs/architecture.md
Self-hosted guide + Traefik/Vault notes: docs/self_hosted.md
Rotation/secrets: docs/rotation.md
Flows overview: docs/flows.md
Postman collection: postman_collection.json
Alerts/dashboards: prom-rules.yml, grafana/provisioning/dashboards/json/*.json

CI/testing

CI workflow runs smoke/e2e (scripts/smoke.py, scripts/e2e.py); unit/contract tests in tests/.
Run locally: uv run pytest.

Built with sweat, scars and much love.
