Refactor : Backend API to integrate zenml integrations #501

Open
kshitijrajsharma wants to merge 16 commits into develop from refactor/zenml-integration

@kshitijrajsharma kshitijrajsharma commented May 5, 2026

What does this PR do?

Refactors the backend into a thin coordination layer over ZenML (ML pipelines) and STAC (model and dataset metadata). ZenML takes over pipeline execution from Celery, a Postgres-backed django.tasks worker replaces Redis as the task broker, and the monolithic core app is split into per-domain apps. Auth is unified on Authorization: Bearer <token> for both dev and Hanko.

  • ML orchestration: training and prediction submit ZenML pipelines via shared/integrations/zenml.py. ramp-worker and yolo-worker are archived; ZenML now owns pipeline execution.
  • STAC source of record: datasets, base-models, local-models live in STAC. DB stores stac_id references. shared/integrations/stac.py is the only client.
  • Per-domain apps: accounts, datasets, modelregistry, trainings, predictions, feedback, notifications, stars, workspace, system. Cross-cutting code in shared/.
  • Lightweight worker: django.tasks + django-tasks-db (Postgres-backed) replaces Redis as the task broker. Used for dataset build and status-sync re-enqueues.
  • Auth: accounts/ (renamed from login/). Hanko via hotosm_auth_django (JWT + cookie). Dev via static FAIR_DEV_TOKEN. Both read Authorization: Bearer <token>. OpenAPI security scheme is HTTP Bearer.
  • Tooling: uv, uv-build, ruff, ty, commitizen, pre-commit. backend/justfile exposes setup / lint / test / run / migrate / worker.
  • Env: pydantic-settings in config/env.py. Missing required values raise loudly at boot. Secrets use SecretStr. .env.example mirrors the README tables.
  • Docker: one multi-stage slim image at backend/Dockerfile. One top-level docker-compose.yml (postgres + api + worker).
  • Tests: focused suites (test_api.py, test_dataset_endpoints.py, test_training_endpoints.py, test_prediction_endpoints.py, test_zenml_wrapper.py). ZenML and STAC mocked. 120 passing.
  • Docs: backend/ARCHITECTURE.md (flow + API reference + curl walkthrough). Rewritten backend/README.md.
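
With auth unified on one header, both the dev and Hanko paths start by pulling a token out of Authorization: Bearer <token>. The sketch below illustrates only that first step. It does not show how hotosm_auth_django or the accounts app implement it; `extract_bearer_token` and `is_dev_token` are hypothetical names introduced for illustration.

```python
import hmac
from typing import Optional


def extract_bearer_token(header_value: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, else None."""
    if not header_value:
        return None
    parts = header_value.split()
    # Expect exactly two parts: the scheme name and the token itself.
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]


def is_dev_token(token: Optional[str], expected: str) -> bool:
    """Hypothetical dev-mode check: constant-time comparison with a static token."""
    if token is None or not expected:
        return False
    return hmac.compare_digest(token.encode(), expected.encode())
```

The scheme name is matched case-insensitively, and anything other than a two-part value is rejected rather than guessed at.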

Breaking

  • Clients must send Authorization: Bearer <token>. The legacy access-token header is retired in favour of the unified Bearer scheme.
  • /api/v1/ path layout is reorganised per app. See backend/ARCHITECTURE.md.
  • Required env: SECRET_KEY, DATABASE_URL, FRONTEND_URL, API_BASE_URL, FAIR_ZENML_STORE_URL, FAIR_ZENML_STORE_API_KEY, FAIR_STAC_API_URL, BUCKET_NAME, AWS_*. Missing values raise at boot.
  • celery and redis containers are archived; background work runs under just worker.
  • docker-compose.dev.yml and docker-compose.prod.yml are consolidated into a single root docker-compose.yml. Deployment topology lives in infra/ and fAIr-models/infra/.
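
The "missing values raise at boot" behaviour can be illustrated without the real settings module. This sketch uses only the standard library instead of pydantic-settings, so it is not the code in config/env.py. `REQUIRED` copies the names listed above, leaving out the AWS_* group because its individual names are not spelled out here, and `check_required_env` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Names taken from the "Required env" bullet; the AWS_* group is omitted on purpose.
REQUIRED = (
    "SECRET_KEY",
    "DATABASE_URL",
    "FRONTEND_URL",
    "API_BASE_URL",
    "FAIR_ZENML_STORE_URL",
    "FAIR_ZENML_STORE_API_KEY",
    "FAIR_STAC_API_URL",
    "BUCKET_NAME",
)


def check_required_env(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError naming every required variable that is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Collecting every missing name before raising means one failed boot reports the whole list instead of one variable at a time.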

How to test

This branch is deployed at https://api.fair.krschap.tech/api/docs/

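Every request uses -H "Authorization: Bearer $TOKEN". The commands assume $BASE points at the API root including the /api/v1 prefix (step 10 calls GET /api/v1/aois/ for what step 1 writes as $BASE/aois/) and $TOKEN holds a valid token. Substitute IDs returned from each step into the next.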
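
For scripting the walkthrough instead of pasting curl commands, the shared request shape can be captured in one helper. This is a minimal sketch using only the standard library; `build_request` is a hypothetical name, and it only constructs the request without sending it.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(
    base: str,
    path: str,
    token: str,
    payload: Optional[Any] = None,
    method: Optional[str] = None,
) -> urllib.request.Request:
    """Build a request carrying the Bearer header and an optional JSON body."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if method is None:
        # Default to POST when a body is present, GET otherwise.
        method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Pass the result to `urllib.request.urlopen` to send it. For POST calls without a body, such as the cancel and publish endpoints in the steps, pass `method="POST"` explicitly.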

  1. Create AOI

    curl -s -X POST "$BASE/aois/" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "type": "Feature",
        "geometry": {
          "type": "Polygon",
          "coordinates": [[
            [85.51678, 27.63133], [85.52323, 27.63133],
            [85.52323, 27.63743], [85.51678, 27.63743],
            [85.51678, 27.63133]
          ]]
        },
        "properties": { "dataset": null }
      }'

    Capture properties.id as AOI_ID.
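
The AOI body in step 1 is a GeoJSON Feature whose polygon is a closed rectangle. A small sketch of building the same body from a bounding box follows; `aoi_feature_from_bbox` is a hypothetical helper, and the only field set in properties is the one shown in the curl example.

```python
from typing import Any, Dict, Optional


def aoi_feature_from_bbox(
    min_lon: float,
    min_lat: float,
    max_lon: float,
    max_lat: float,
    dataset: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a GeoJSON Feature with a rectangular Polygon for the given bbox."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bbox must satisfy min < max on both axes")
    # A linear ring repeats its first position as the last one.
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"dataset": dataset},
    }
```

Calling it with `85.51678, 27.63133, 85.52323, 27.63743` reproduces the coordinates used in step 1.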

  2. Build dataset

    curl -s -X POST "$BASE/datasets/build/" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "title": "smoke-banepa",
        "description": "e2e test",
        "source_imagery": "https://tiles.openaerialmap.org/62d85d11d8499800053796c1/0/62d85d11d8499800053796c2/{z}/{x}/{y}",
        "zoom": 19,
        "aoi_ids": ['"$AOI_ID"'],
        "label_tasks": ["object-detection"],
        "label_classes": [{"name": "building", "classes": ["*"]}],
        "keywords": ["building"],
        "label_type": "vector",
        "geometry_type": "polygon"
      }'

    Capture id as DATASET_ID and stac_id as STAC_ID.

  3. Poll dataset build

    curl -s "$BASE/datasets/$DATASET_ID/?expand=stac" \
      -H "Authorization: Bearer $TOKEN" | jq .build_status

    Repeat until "published" (~2 min for this AOI). Confirms STAC item + presigned chips/labels URLs.
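
The "repeat until" instruction is a polling loop with a terminal status and a timeout. The sketch below shows one way to write it, with the HTTP call injected as `fetch_status` so the loop stays independent of any client. `poll_until` is a hypothetical helper, and any failure statuses are left for the caller to supply because they are not listed in this description.

```python
import time
from typing import Callable, Collection


def poll_until(
    fetch_status: Callable[[], str],
    done: Collection[str],
    failed: Collection[str] = (),
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns a status in `done`, or give up."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"terminal failure status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

For the dataset build, `done={"published"}` matches the status named above. The same loop fits the training poll in step 5 with `done={"completed"}` and a longer `timeout_s`.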

  4. Submit training

    curl -s -X POST "$BASE/trainings/submit/" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "base_model_stac_id": "yolo11n-detection",
        "dataset_stac_id": "'"$STAC_ID"'",
        "model_name": "yolo11n-detection-finetuned-banepa-smoke",
        "overrides": {"epochs": 3, "batch_size": 2, "learning_rate": 0.01, "chip_size": 640}
      }'

    Capture id as TR_ID and zenml_run_id as RUN_ID.

  5. Poll training, tail logs, cancel

    curl -s "$BASE/trainings/$TR_ID/" -H "Authorization: Bearer $TOKEN" | jq .status
    curl -s "$BASE/trainings/runs/$RUN_ID/logs/?tail=100" -H "Authorization: Bearer $TOKEN"
    # cancel test (run on a fresh training only):
    curl -s -X POST "$BASE/trainings/runs/$RUN_ID/cancel/?graceful=true" \
      -H "Authorization: Bearer $TOKEN"

    Repeat the first call until "completed" (~22 min).

  6. Publish local model

    curl -s -X POST "$BASE/trainings/$TR_ID/publish/" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"description": "smoke"}'

    Capture local_model_stac_id as LM_STAC_ID. Verify:

    curl -s "$BASE/local-models/" -H "Authorization: Bearer $TOKEN" | jq

  7. Submit prediction

    curl -s -X POST "$BASE/predictions/submit/" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "model_stac_id": "'"$LM_STAC_ID"'",
        "image_uri": "https://tiles.openaerialmap.org/62d85d11d8499800053796c1/0/62d85d11d8499800053796c2/{z}/{x}/{y}",
        "bbox": [85.51678, 27.63133, 85.52323, 27.63743],
        "zoom": 19,
        "params": {"confidence_threshold": 0.25}
      }'

    Capture id as PRED_ID.
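
The bbox in step 7 covers the same extent as the AOI polygon from step 1. Judging by the values, the order is [min_lon, min_lat, max_lon, max_lat]. A sketch of deriving it from the polygon's outer ring, so the two stay in sync, with `bbox_from_ring` as a hypothetical helper:

```python
from typing import List, Sequence


def bbox_from_ring(ring: Sequence[Sequence[float]]) -> List[float]:
    """Return [min_lon, min_lat, max_lon, max_lat] for a ring of [lon, lat] points."""
    if not ring:
        raise ValueError("ring must contain at least one position")
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return [min(lons), min(lats), max(lons), max(lats)]
```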

  8. Fetch prediction result

    curl -s "$BASE/predictions/$PRED_ID/" -H "Authorization: Bearer $TOKEN" | jq .results_ready
    # once true:
    curl -s "$BASE/predictions/$PRED_ID/result/" -H "Authorization: Bearer $TOKEN"

    Returns three presigned URLs (geojson, fgb, pmtiles). Open the geojson; expect ~150-200 building polygons.
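
The polygon-count check can be done without opening the file in a viewer. The sketch below tallies geometry types in the downloaded GeoJSON text, assuming the result is a FeatureCollection, which this description does not state. `geometry_counts` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Dict


def geometry_counts(geojson_text: str) -> Dict[str, int]:
    """Count features per geometry type in a GeoJSON FeatureCollection."""
    doc = json.loads(geojson_text)
    if doc.get("type") != "FeatureCollection":
        raise ValueError("expected a FeatureCollection")
    counts: Counter = Counter()
    for feature in doc.get("features", []):
        # A feature may legally carry a null geometry; count it under "null".
        geometry = feature.get("geometry") or {}
        counts[geometry.get("type", "null")] += 1
    return dict(counts)
```

For this AOI the "Polygon" count, plus any "MultiPolygon" entries, should land in the expected range of roughly 150 to 200.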

  9. Public publish toggle

    curl -s -X POST "$BASE/predictions/$PRED_ID/publish/" -H "Authorization: Bearer $TOKEN"
    curl -s -o /dev/null -w "%{http_code}\n" "$BASE/predictions/$PRED_ID/"     # expect 200 anonymously
    curl -s -X POST "$BASE/predictions/$PRED_ID/unpublish/" -H "Authorization: Bearer $TOKEN"
    curl -s -o /dev/null -w "%{http_code}\n" "$BASE/predictions/$PRED_ID/"     # expect 401/404

  10. Swagger: open http://127.0.0.1:8000/api/docs/, click Authorize, paste FAIR_DEV_TOKEN in the single Bearer field, run GET /api/v1/aois/. Expect 200.

TODO

  • Fix the permission for public endpoints
  • Write up doc
  • End-to-end testing with API endpoints only

@kshitijrajsharma kshitijrajsharma marked this pull request as ready for review May 18, 2026 22:28