Sibling pipeline to czdt-iss-ingest-job, using Frozon's processing logic.
First stage implemented: TIFF → Cloud Optimized GeoTIFF (COG) using the
low-memory streaming approach from convert_to_cog_lowmem.py.
S3 TIFF input(s) → COG (gdal_translate, low-memory) → S3 upload → STAC cataloging
- frozon-iss-ingest-cog: per-acquisition-day DPS worker. Re-queries
  CMR for its date, downloads via EDL, mosaics multi-UTM granules to
  EPSG:3413 with gdalwarp, runs gdal_translate -of COG, uploads the COG,
  and emits a STAC catalog.
- GH Actions cron (scripts/submit_cog_pipeline.py): runs daily. Discovers
  acquisition dates via per-day CMR.hits(), drops the newest plus any
  below-threshold partial dates, pre-checks S3, and submits one worker per
  missing complete date. Also runs a retention sweep that keeps the N most
  recent date folders in S3.
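
The date-selection and retention rules described for the cron can be sketched as two small pure functions. This is a minimal illustration, not the code in scripts/submit_cog_pipeline.py: the function names, the `min_hits` parameter, and the reading of "below-threshold" as a minimum per-day CMR hit count are all assumptions.

```python
from datetime import date


def select_dates_to_submit(hits_by_date, existing_in_s3, min_hits):
    """Pick acquisition dates that should get a worker job.

    hits_by_date: {date: CMR hit count} (hypothetical input shape).
    existing_in_s3: set of dates whose COG already exists in S3.
    min_hits: assumed completeness threshold for a day.
    """
    if not hits_by_date:
        return []
    newest = max(hits_by_date)  # newest day may still be filling in
    complete = [
        day for day, hits in hits_by_date.items()
        if day != newest and hits >= min_hits
    ]
    # Submit only complete days that are missing from S3.
    return sorted(day for day in complete if day not in existing_in_s3)


def dates_to_prune(date_folders, keep):
    """Return the date folders that fall outside the `keep` most recent."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(date_folders, reverse=True)
    return sorted(newest_first[keep:])
```

Keeping both steps free of network calls makes the selection logic easy to unit test; the CMR query and S3 listing would feed these functions from the outside.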
The cron is the production path. For manual triggers see
.github/workflows/daily-cog-ingest.yml (workflow_dispatch).
    python src/ingest_cog.py \
      --input-s3 "s3://source-bucket/path/file.tif" \
      --collection-id "C123456789-FROZON" \
      --s3-bucket "target-bucket" \
      --s3-prefix "data/cogs" \
      --role-arn "arn:aws:iam::123456789:role/S3AccessRole" \
      --max-memory 512 --blocksize 512 --compress DEFLATE

| Flag | Default | Notes |
|---|---|---|
| --max-memory | 512 | GDAL_CACHEMAX in MB for the conversion subprocess |
| --blocksize | 512 | 256 / 512 / 1024; drop to 256 for tighter memory |
| --compress | DEFLATE | DEFLATE / LZW / JPEG / WEBP / NONE |
| --resampling | nearest | base data resampling |
| --overview-resampling | average | overview pyramid resampling |
| --filter | (none) | glob applied to file basenames during prefix listing |
| --limit | (none) | cap discovered inputs (testing) |
| --overwrite | off | replace existing S3 outputs |
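
To make the flag table concrete, here is a rough sketch of how these options could translate into a gdal_translate call. The mapping is an assumption, not taken from src/ingest_cog.py: the helper names are hypothetical, and passing --resampling as `-r` and the others as COG creation options is one plausible wiring among several.

```python
import os


def build_cog_command(src, dst, blocksize=512, compress="DEFLATE",
                      resampling="nearest", overview_resampling="average"):
    """Build a gdal_translate argument list for the COG driver."""
    return [
        "gdal_translate", "-of", "COG",
        "-r", resampling,                       # base data resampling
        "-co", f"BLOCKSIZE={blocksize}",        # internal tile size in pixels
        "-co", f"COMPRESS={compress}",
        "-co", f"OVERVIEW_RESAMPLING={overview_resampling.upper()}",
        src, dst,
    ]


def build_cog_env(max_memory_mb=512, base_env=None):
    """Copy the environment and cap GDAL's block cache (value in MB)."""
    env = dict(os.environ if base_env is None else base_env)
    env["GDAL_CACHEMAX"] = str(max_memory_mb)
    return env
```

The two results would typically be handed to `subprocess.run(cmd, env=env, check=True)`, so the cache limit applies only to the conversion subprocess rather than to the parent process.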
    conda env update -f environment.yml
    conda activate ingest

One algorithm registers with MAAP DPS:

| Algorithm | Build / run | YAML |
|---|---|---|
| frozon-iss-ingest-cog:v2 | .maap/ingest-cog/ | .maap/sample-algo-configs/frozon-iss-ingest-cog.yml |
The GH Actions cron submits jobs against algo_id="frozon-iss-ingest-cog"
on the maap-dps-worker-32vcpu-64gb queue (full-Arctic daily mosaics
need the high-vCPU/RAM worker). The orchestrator-as-DPS-job that used
to live under .maap/cog-pipeline/ has been retired; orchestration
runs directly in the GH Actions runner now.
- Cloud Optimized GeoTIFFs in
  s3://<s3-bucket>/<s3-prefix>/<collection-id>/YYYY/MM/DD/<input>_COG.tif.
  The date is derived from the STAC item's datetime (read from TIFF tags by
  rio_stac; falls back to current UTC time when absent).
- STAC Items upserted into MMGIS under <collection-id>. Reruns merge into
  the existing collection by default (--upsert); pass --no-upsert to fail
  on item conflicts.
- Product Notifications via the CMSS logger.
- Optional post-STAC webhook: when --post-stac-webhook-url is set, the
  orchestrator POSTs {event, collection_id, item_id, asset_uri} to that URL
  after each successful item upsert, prompting the receiver to fetch the COG
  from S3. Failures are logged and swallowed. Provide
  --post-stac-webhook-token-secret-name to attach a bearer token.
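
The output key layout and the webhook behaviour above can be illustrated with a short sketch. It assumes the payload fields listed in the last bullet; the function names, the `event` value, the timeout, and the injectable `opener` argument are hypothetical and exist only to keep the example self-contained and testable.

```python
import json
import logging
import urllib.request
from datetime import datetime, timezone
from pathlib import PurePosixPath


def cog_key(s3_prefix, collection_id, input_name, item_datetime=None):
    """Build <s3-prefix>/<collection-id>/YYYY/MM/DD/<input>_COG.tif."""
    # Fall back to current UTC time when the item carries no datetime.
    dt = item_datetime or datetime.now(timezone.utc)
    stem = PurePosixPath(input_name).stem
    return f"{s3_prefix.strip('/')}/{collection_id}/{dt:%Y/%m/%d}/{stem}_COG.tif"


def post_stac_webhook(url, collection_id, item_id, asset_uri, token=None,
                      event="stac_item_upserted",  # assumed event name
                      opener=urllib.request.urlopen):
    """POST the notification; log and swallow failures. Returns success."""
    payload = {"event": event, "collection_id": collection_id,
               "item_id": item_id, "asset_uri": asset_uri}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers=headers, method="POST")
    try:
        with opener(request, timeout=10) as response:
            return 200 <= response.status < 300
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        logging.warning("post-STAC webhook failed: %s", exc)
        return False
```

Returning a boolean instead of raising mirrors the "logged and swallowed" behaviour: a receiver outage does not fail an ingest whose COG and STAC item have already been written.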
Python 3.11+, GDAL, rasterio, rioxarray, pystac, rio-stac, boto3, maap-py.
Apache 2.0