fix: prevent duplicate scheduled runs via deployment schedule lock + ID preservation#20921
Draft
devin-ai-integration[bot] wants to merge 7 commits intomainfrom
Draft
Conversation
…cate runs When a deployment is updated without schedule slugs, the server previously deleted all existing schedules and recreated them with new UUIDs. This changed the idempotency key for auto-scheduled flow runs (which includes the schedule ID), causing the scheduler to create duplicate runs when a race condition occurred between the scheduler reading stale schedule IDs and the deployment update creating new ones. This fix updates existing schedules in place by position rather than deleting and recreating them, preserving their IDs and keeping the idempotency keys stable across redeployments. Co-Authored-By: Nate Nowack <nate@prefect.io>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
The test previously asserted that schedule IDs change after update (the old delete-and-recreate behavior). Updated to validate the new behavior: - Schedule IDs are preserved (order-independent set comparison) - Schedule configs are correctly applied (order-independent lookup) Also fixed a pre-existing bug where the original assertion compared UUID objects to strings, which always evaluated to not-equal regardless of actual ID values. Co-Authored-By: Nate Nowack <nate@prefect.io>
Adds a distributed lock around create_deployment and update_deployment API handlers, matching Nebula's approach. Uses Redis lock when prefect-redis is configured as the messaging broker (HA deployments), falls back to per-deployment asyncio.Lock for single-server setups. Combined with the model-layer schedule ID preservation from the previous commit, this addresses both: 1. Concurrent update races (lock prevents interleaving) 2. Scheduler-vs-update races (preserved IDs keep idempotency keys stable) Co-Authored-By: Nate Nowack <nate@prefect.io>
Co-Authored-By: Nate Nowack <nate@prefect.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR fixes
This fixes duplicate auto-scheduled flow runs caused by deployment schedule races.
Root cause: when schedules were replaced, schedule IDs changed. Since scheduler idempotency keys include
deployment_schedule_id, races between scheduling and deployment updates could produce duplicates.Approach (short)
PATCH /deployments/{id}.POST /deploymentsupsert path as well (important forflow.serve/deployment.applyflows).POSTandPATCHcontend on the same identity key.503instead of silent in-memory fallback).Locking behavior
{flow_id}:{deployment_name}.PREFECT_MESSAGING_BROKERcontainsprefect_redis): use Redis lock withblocking=False, 30s timeout.asyncio.Lockwith fail-fast409behavior.503to avoid unsafe process-local fallback in distributed setups.No-slug schedule reconciliation
For both
PATCHupdates andPOSTupserts:This stabilizes schedule IDs across redeploys and keeps scheduler idempotency keys stable.
Test coverage added/updated
API tests (
tests/server/orchestration/api/test_deployments.py):test_create_deployment_upsert_preserves_schedule_ids_without_slugsfor POST upsert path.test_post_and_patch_contend_on_same_schedule_lock_key(cross-route lock contention).test_deployment_schedule_lock_fails_closed_when_redis_unavailable(503behavior).Integration test (
integration-tests/test_schedule_statefulness.py):test_schedule_id_stability_for_no_slug_redeploy.Known limitations / follow-ups
Link to Devin Session
Requested by: @zzstoatzz