DM-54533 Replace enqueue webhook with Kafka consumer #80
Merged
- Drop the Flask `/notify` endpoint, the gunicorn entrypoint, and `NOTIFICATION_SECRET`.
- Consume RGW S3 `ObjectCreated` notifications via an `aiokafka.AIOKafkaConsumer` driven by `asyncio.run(main())`. Messages are plain JSON with no Avro or SASL, matching the pattern in lsst-dm/prompt_processing `activator.py`.
- Add `object_names_from_notification()`, which filters `ObjectCreated:*` events and maps each record to `"<profile><bucket>/<urldecoded key>"` for `Info.from_path`.
- Idempotent enqueue: a `SET NX EX ENQ:<path>` guard runs before `lpush`, so Kafka redelivery (rebalance, pod restart) and operator re-trigger tooling do not double-enqueue. Skipped duplicates log "Skipping duplicate enqueue". The existing "Enqueued <path> to <bucket>" line is preserved so existing Loki gap-detection scripts continue to parse cleanly.
- At-least-once delivery: `enable_auto_commit=False` with a two-stage try block. Parse failures (malformed JSON, tombstones) are committed as poison pills; process failures (Redis errors) leave the offset uncommitted so the message is redelivered on the next poll or after a pod restart.
- `Dockerfile.enqueue`: drop gunicorn/flask and add aiokafka; the entrypoint is now a plain Python process.

Required env: `REDIS_HOST`, `REDIS_PASSWORD`, `KAFKA_CLUSTER`, `KAFKA_TOPIC`. Optional: `DATASET_REGEXP`, `PROFILE`, `KAFKA_GROUP_ID`, `KAFKA_OFFSET_RESET`.

Made-with: Cursor
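The notification mapping and the two-stage commit logic described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: it assumes the standard S3 event record layout (`Records[].eventName`, `s3.bucket.name`, `s3.object.key`), and `handle_message`, the `enqueue` callback, and the fallback defaults for `KAFKA_GROUP_ID`, `KAFKA_OFFSET_RESET` and `PROFILE` are hypothetical. One detail worth noting: a Kafka consumer advances its in-memory position even when nothing is committed, so the sketch calls `seek()` after a process failure to get redelivery on the next poll; without it, the uncommitted message would only come back after a restart or rebalance.

```python
import asyncio
import json
import logging
import os
from urllib.parse import unquote_plus

log = logging.getLogger("enqueue")


def object_names_from_notification(notification, profile):
    """Map S3 ObjectCreated records to "<profile><bucket>/<key>" strings.

    Assumes the standard S3 event layout:
    {"Records": [{"eventName": ..., "s3": {"bucket": {"name": ...},
                                           "object": {"key": ...}}}]}
    """
    names = []
    for record in notification.get("Records", []):
        event = record.get("eventName", "")
        if event.startswith("s3:"):  # tolerate an "s3:" prefix (assumption)
            event = event[len("s3:"):]
        if not event.startswith("ObjectCreated:"):
            continue  # ignore deletions and any other event types
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        names.append(f"{profile}{bucket}/{key}")
    return names


async def handle_message(consumer, msg, tp, enqueue, profile):
    """Handle one Kafka record with at-least-once semantics.

    Returns True once the offset is committed, or False if the consumer was
    rewound so the same record will be fetched again.
    """
    # Stage 1: parse. Tombstones, malformed JSON and unexpected layouts are
    # poison pills: retrying cannot fix them, so commit past them.
    try:
        if msg.value is None:
            raise ValueError("tombstone record")
        names = object_names_from_notification(json.loads(msg.value), profile)
    except Exception as exc:
        log.warning("Dropping unparseable message at offset %d: %s", msg.offset, exc)
        await consumer.commit({tp: msg.offset + 1})
        return True

    # Stage 2: process. Failures here (e.g. Redis unavailable) are transient,
    # so leave the offset uncommitted and rewind to this record.
    try:
        for name in names:
            enqueue(name)
    except Exception:
        log.exception("Enqueue failed at offset %d; will retry", msg.offset)
        consumer.seek(tp, msg.offset)
        return False

    await consumer.commit({tp: msg.offset + 1})
    return True


async def main(enqueue):
    # Imported lazily so the helpers above can be exercised without a broker.
    from aiokafka import AIOKafkaConsumer, TopicPartition

    consumer = AIOKafkaConsumer(
        os.environ["KAFKA_TOPIC"],
        bootstrap_servers=os.environ["KAFKA_CLUSTER"],
        group_id=os.environ.get("KAFKA_GROUP_ID", "enqueue"),  # default assumed
        auto_offset_reset=os.environ.get("KAFKA_OFFSET_RESET", "latest"),  # default assumed
        enable_auto_commit=False,  # offsets are committed by handle_message
    )
    profile = os.environ.get("PROFILE", "")  # default assumed
    await consumer.start()
    try:
        async for msg in consumer:
            tp = TopicPartition(msg.topic, msg.partition)
            if not await handle_message(consumer, msg, tp, enqueue, profile):
                await asyncio.sleep(1.0)  # crude backoff before the retry
    finally:
        await consumer.stop()
```

The entrypoint would then be along the lines of `asyncio.run(main(enqueue=...))`. Because a failure partway through a multi-record notification rewinds the whole message, some paths get enqueued twice on retry; the idempotent `ENQ:<path>` guard is what makes that harmless.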
The enqueue-side dedupe key used to inherit `FILE_RETENTION` (7 days), which is longer than the Kafka topic retention and longer than the operator workflow needs. Anything Kafka could possibly redeliver is bounded by the topic retention, and operators occasionally need to force a re-enqueue of a path that recently failed ingest; waiting a week is too long.

Introduce a dedicated `ENQUEUE_DEDUPE_TTL` = 24h so the marker decays on its own within a day. Operators can still force an immediate re-enqueue by deleting the `ENQ:<path>` key: the manual trigger script in usdf-embargo-deploy gains a `--force` flag that does this via `kubectl exec redis-0 -- redis-cli DEL`.

Co-authored-by: Cursor <cursoragent@cursor.com>
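The `SET NX EX` guard with the 24-hour TTL, plus the operator override, might look roughly like this with a redis-py client. This is a sketch under assumptions, not the PR's code: the `queue_key` parameter is hypothetical (the PR only says paths are enqueued "to <bucket>"), the wording after "Skipping duplicate enqueue" in the log line is assumed, `force_reenqueue` is a hypothetical Python analogue of the `--force` flag's `redis-cli DEL`, and the compensating `delete` after a failed `lpush` is an addition so that a Kafka redelivery is not mistaken for a duplicate.

```python
import logging

log = logging.getLogger("enqueue")

ENQUEUE_DEDUPE_TTL = 24 * 60 * 60  # 24 hours, in seconds


def enqueue(r, path, bucket, queue_key):
    """Push `path` onto the Redis list `queue_key` unless it was enqueued recently.

    `r` is expected to be a redis.Redis client (anything with set/lpush/delete
    works). Returns True if the path was pushed, False if it was a duplicate.
    """
    marker = f"ENQ:{path}"
    # SET ... NX EX: succeeds only if no marker exists; Redis expires it after the TTL.
    if not r.set(marker, 1, nx=True, ex=ENQUEUE_DEDUPE_TTL):
        log.info("Skipping duplicate enqueue of %s", path)
        return False
    try:
        r.lpush(queue_key, path)
    except Exception:
        # Best effort: drop the marker so the Kafka redelivery of this message
        # is not mistaken for a duplicate.
        try:
            r.delete(marker)
        except Exception:
            log.warning("Could not clear %s; it will expire after the TTL", marker)
        raise  # re-raise the lpush failure so the offset is not committed
    log.info("Enqueued %s to %s", path, bucket)
    return True


def force_reenqueue(r, path, bucket, queue_key):
    """Operator override: clear the marker (like `redis-cli DEL ENQ:<path>`), then enqueue."""
    r.delete(f"ENQ:{path}")
    return enqueue(r, path, bucket, queue_key)
```

If Redis is unreachable, the compensating `delete` fails too and the marker can linger for up to a day, so that redelivery would be skipped. Wrapping `SET NX EX` and `LPUSH` in a single Lua script would close that gap at the cost of a little more complexity.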
ctslater approved these changes on May 26, 2026
ctslater (Member) left a comment:
I like this a lot, the locking scheme is very simple. I just have one documentation request.
The author (Contributor) commented:

Merging it after reviews.