jdrodjpl/urban-giggle

Frozon ISS Ingest Job

Sibling pipeline to czdt-iss-ingest-job, using Frozon's processing logic. First stage implemented: TIFF → Cloud Optimized GeoTIFF (COG) using the low-memory streaming approach from convert_to_cog_lowmem.py.

Pipeline Architecture

Data Flow

S3 TIFF input(s) → COG (gdal_translate, low-memory) → S3 upload → STAC cataloging

Components

  • frozon-iss-ingest-cog — per-acquisition-day DPS worker. Re-queries CMR for its date, downloads via EDL, mosaics multi-UTM granules to EPSG:3413 with gdalwarp, runs gdal_translate -of COG, uploads the COG, and emits a STAC catalog.
  • GH Actions cron (scripts/submit_cog_pipeline.py) — runs daily. Discovers acquisition dates via per-day CMR .hits(), drops the newest plus any below-threshold partial dates, pre-checks S3, and submits one worker per missing complete date. Also runs a retention sweep that keeps the N most recent date folders in S3.
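The cron's date selection and retention sweep described above can be sketched in plain Python. This is a minimal illustration, not the code in scripts/submit_cog_pipeline.py: the function names, the `min_hits` threshold parameter, and the shape of the inputs (a dict of per-day granule counts, a set of dates already present in S3) are all assumptions.

```python
from datetime import date


def select_dates_to_submit(hits_by_date, min_hits, already_in_s3):
    """Return acquisition dates that still need a worker, oldest first.

    hits_by_date:  {date: granule count from a per-day CMR query} (assumed shape)
    min_hits:      hypothetical threshold below which a date counts as partial
    already_in_s3: set of dates whose output already exists (the S3 pre-check)
    """
    dates = sorted(hits_by_date)
    # Drop the newest date, as the cron description says.
    candidates = dates[:-1]
    # Drop partial dates whose granule count is below the threshold.
    complete = [d for d in candidates if hits_by_date[d] >= min_hits]
    # Submit only dates that are missing from S3.
    return [d for d in complete if d not in already_in_s3]


def dates_to_expire(date_folders, keep):
    """Return the date folders that fall outside the `keep` most recent ones."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(date_folders, reverse=True)
    return sorted(newest_first[keep:])


hits = {
    date(2024, 1, 1): 40,
    date(2024, 1, 2): 12,  # below threshold, treated as partial
    date(2024, 1, 3): 41,
    date(2024, 1, 4): 5,   # newest, always dropped
}
to_submit = select_dates_to_submit(hits, min_hits=30, already_in_s3={date(2024, 1, 1)})
expired = dates_to_expire(list(hits), keep=2)
```

Here `to_submit` contains only 2024-01-03, and `expired` contains the two oldest dates. Keeping the selection as pure functions makes the policy easy to test without touching CMR or S3.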

Usage

The cron is the production path. For manual triggers see .github/workflows/daily-cog-ingest.yml (workflow_dispatch).

Worker (direct, local or DPS)

python src/ingest_cog.py \
  --input-s3 "s3://source-bucket/path/file.tif" \
  --collection-id "C123456789-FROZON" \
  --s3-bucket "target-bucket" \
  --s3-prefix "data/cogs" \
  --role-arn "arn:aws:iam::123456789:role/S3AccessRole" \
  --max-memory 512 --blocksize 512 --compress DEFLATE

Tuning

Flag                   Default   Notes
--max-memory           512       GDAL_CACHEMAX in MB for the conversion subprocess
--blocksize            512       256 / 512 / 1024; drop to 256 for tighter memory
--compress             DEFLATE   DEFLATE / LZW / JPEG / WEBP / NONE
--resampling           nearest   base data resampling
--overview-resampling  average   overview pyramid resampling
--filter                         glob applied to file basenames during prefix listing
--limit                          cap discovered inputs (testing)
--overwrite            off       replace existing S3 outputs
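To make the tuning flags concrete, here is a hedged sketch of how they might map onto a gdal_translate invocation. The function name and the exact mapping are assumptions, not taken from src/ingest_cog.py. In particular, passing --resampling through gdal_translate's -r option is a guess. The `-of COG` driver, its BLOCKSIZE, COMPRESS, and OVERVIEW_RESAMPLING creation options, and the GDAL_CACHEMAX config option are standard GDAL.

```python
def build_cog_command(src, dst, max_memory=512, blocksize=512,
                      compress="DEFLATE", resampling="nearest",
                      overview_resampling="average"):
    """Build an argument list for a COG conversion (illustrative mapping only)."""
    return [
        "gdal_translate",
        "-of", "COG",
        # Cap GDAL's raster block cache; small values are interpreted as MB.
        "--config", "GDAL_CACHEMAX", str(max_memory),
        # Assumed mapping of --resampling to gdal_translate's -r option.
        "-r", resampling,
        "-co", f"BLOCKSIZE={blocksize}",
        "-co", f"COMPRESS={compress}",
        "-co", f"OVERVIEW_RESAMPLING={overview_resampling.upper()}",
        src,
        dst,
    ]


cmd = build_cog_command("input.tif", "output_COG.tif", blocksize=256)
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues for paths with spaces.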

Environment

conda env update -f environment.yml
conda activate ingest

MAAP Deployment

One algorithm registers with MAAP DPS:

Algorithm                  Build / run         YAML
frozon-iss-ingest-cog:v2   .maap/ingest-cog/   .maap/sample-algo-configs/frozon-iss-ingest-cog.yml

The GH Actions cron submits jobs against algo_id="frozon-iss-ingest-cog" on the maap-dps-worker-32vcpu-64gb queue (full-Arctic daily mosaics need the high-vCPU/RAM worker). The orchestrator-as-DPS-job that used to live under .maap/cog-pipeline/ has been retired; orchestration runs directly in the GH Actions runner now.
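As a rough illustration of the per-date submission, the sketch below assembles the fields one job request might carry. The algorithm ID, version, and queue come from this README. The function name, the `identifier` format, and the `date` parameter name are hypothetical, and the real cron script may pass different worker parameters.

```python
from datetime import date


def build_job_request(acq_date, version="v2"):
    """Assemble the fields for one per-date worker submission (hypothetical shape)."""
    day = acq_date.isoformat()
    return {
        "identifier": f"frozon-cog-{day}",       # hypothetical job label
        "algo_id": "frozon-iss-ingest-cog",      # from the README
        "version": version,                      # from the algorithm table
        "queue": "maap-dps-worker-32vcpu-64gb",  # from the README
        "date": day,                             # hypothetical worker parameter
    }


request = build_job_request(date(2024, 3, 5))
```

A dict like this would presumably be passed as keyword arguments to the maap-py job submission call; check the maap-py documentation for the exact signature before relying on it.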

Output Products

  • Cloud Optimized GeoTIFFs in s3://<s3-bucket>/<s3-prefix>/<collection-id>/YYYY/MM/DD/<input>_COG.tif — date is derived from the STAC item's datetime (read from TIFF tags by rio_stac; falls back to current UTC time when absent).
  • STAC Items upserted into MMGIS under <collection-id>. Reruns merge into the existing collection by default (--upsert); pass --no-upsert to fail on item conflicts.
  • Product Notifications via the CMSS logger.
  • Optional post-STAC webhook — when --post-stac-webhook-url is set, the orchestrator POSTs {event, collection_id, item_id, asset_uri} to that URL after each successful item upsert, prompting the receiver to fetch the COG from S3. Failures are logged and swallowed. Provide --post-stac-webhook-token-secret-name to attach a bearer token.
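The output layout and the webhook behaviour listed above can be sketched as follows. This is an assumption-laden illustration rather than the pipeline's code: the function names are made up, `<input>` is taken to be the input file's basename without extension, and the `event` value is a placeholder since the README does not name it.

```python
import json
import logging
import urllib.request
from datetime import datetime, timezone
from pathlib import PurePosixPath

log = logging.getLogger(__name__)


def cog_uri(bucket, prefix, collection_id, input_name, item_datetime=None):
    """Build s3://<bucket>/<prefix>/<collection-id>/YYYY/MM/DD/<input>_COG.tif."""
    # Fall back to the current UTC time when the item carries no datetime.
    dt = item_datetime or datetime.now(timezone.utc)
    stem = PurePosixPath(input_name).stem  # assumed meaning of <input>
    key = f"{prefix.strip('/')}/{collection_id}/{dt:%Y/%m/%d}/{stem}_COG.tif"
    return f"s3://{bucket}/{key}"


def post_stac_webhook(url, collection_id, item_id, asset_uri, token=None):
    """POST the notification; log and swallow any failure. Returns True on success."""
    payload = {
        "event": "stac_item_upserted",  # placeholder event name
        "collection_id": collection_id,
        "item_id": item_id,
        "asset_uri": asset_uri,
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    try:
        req = urllib.request.Request(
            url, data=json.dumps(payload).encode("utf-8"),
            headers=headers, method="POST",
        )
        with urllib.request.urlopen(req, timeout=10):
            return True
    except Exception as exc:  # a webhook failure must not fail the ingest
        log.warning("post-STAC webhook failed: %s", exc)
        return False


uri = cog_uri("target-bucket", "data/cogs", "C123456789-FROZON",
              "s3://source-bucket/path/file.tif",
              datetime(2024, 3, 5, tzinfo=timezone.utc))
```

With the inputs shown, `uri` is `s3://target-bucket/data/cogs/C123456789-FROZON/2024/03/05/file_COG.tif`. Catching every exception in the webhook is deliberate here: the notification is optional, so a receiver outage should not block cataloging.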

Dependencies

Python 3.11+, GDAL, rasterio, rioxarray, pystac, rio-stac, boto3, maap-py.

License

Apache 2.0
