Skip to content

Fix CANCEL_NEW strategy not cancelling late runs when limit is full#21086

Merged
zzstoatzz merged 1 commit intomainfrom
devin/1773241664-fix-cancel-new-late-runs
Mar 11, 2026
Merged

Fix CANCEL_NEW strategy not cancelling late runs when limit is full#21086
zzstoatzz merged 1 commit intomainfrom
devin/1773241664-fix-cancel-new-late-runs

Conversation

@devin-ai-integration
Copy link
Contributor

Closes #21060
Supersedes #21068

Summary

The CANCEL_NEW deployment concurrency strategy was only enforced by SecureFlowConcurrencySlots, which fires on * → PENDING transitions. When a runner/worker is at capacity, scheduled runs never reach PENDING — they stay Scheduled until the MarkLateRuns service marks them Late. These runs accumulated in a Late state instead of being cancelled.

This PR adds a new orchestration rule EnforceDeploymentConcurrencyOnLate to MarkLateRunsPolicy that intercepts Scheduled → Late transitions and rejects them with a Cancelled state when the deployment uses CANCEL_NEW and the concurrency limit is full.

Why an orchestration rule instead of a service-level check

PR #21068 solved this by adding the concurrency check directly into the MarkLateRuns service with force=True. This PR takes a different approach:

  • Keeps concurrency enforcement in the orchestration layer — alongside SecureFlowConcurrencySlots and the other deployment concurrency rules in core_policy.py
  • Preserves instrumentationMarkLateRunsPolicy includes InstrumentFlowRunStateTransitions, so the cancellation is properly tracked. The force=True path in Fix CANCEL_NEW strategy not cancelling late runs #21068 bypasses all orchestration rules including instrumentation.
  • Keeps the service focusedMarkLateRuns just proposes Late; the orchestration layer decides what actually happens
  • Uses reject_transition — the same mechanism SecureFlowConcurrencySlots already uses for CANCEL_NEW (line 641)

Key review points

  • The new rule uses FROM_STATES = {SCHEDULED}, TO_STATES = {SCHEDULED} because Late is a SCHEDULED state type with name "Late" — same pattern as EnsureOnlyScheduledFlowsMarkedLate
  • The deployment/concurrency lookup mirrors SecureFlowConcurrencySlots (including the isinstance(concurrency_options, dict) coercion)
  • reject_transition with a Cancelled state crosses state types (SCHEDULED → CANCELLED), which works because rejections bypass FROM/TO filtering on subsequent rules

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json. (N/A)
  • If this pull request adds functions or classes, it includes helpful docstrings.

Devin session | Requested by @zzstoatzz

Move deployment concurrency enforcement from the MarkLateRuns service into
a new orchestration rule (EnforceDeploymentConcurrencyOnLate) that is part
of MarkLateRunsPolicy. This keeps the CANCEL_NEW logic in the orchestration
layer alongside SecureFlowConcurrencySlots rather than embedding it in the
service, and preserves instrumentation via InstrumentFlowRunStateTransitions.

Closes #21060

Co-authored-by: Nate Nowack <nate@prefect.io>
@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added the bug Something isn't working label Mar 11, 2026
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 11, 2026

Merging this PR will not alter performance

✅ 2 untouched benchmarks


Comparing devin/1773241664-fix-cancel-new-late-runs (8cddfa0) with main (8e4f3fe)

Open in CodSpeed

@zzstoatzz
Copy link
Collaborator

what do you think about this @desertaxle?

Copy link
Contributor Author

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me!

Copy link
Collaborator

@zzstoatzz zzstoatzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@zzstoatzz zzstoatzz merged commit c796a54 into main Mar 11, 2026
78 of 79 checks passed
@zzstoatzz zzstoatzz deleted the devin/1773241664-fix-cancel-new-late-runs branch March 11, 2026 16:20
haziqishere pushed a commit to haziqishere/prefect that referenced this pull request Mar 11, 2026
…refectHQ#21086)

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nate Nowack <nate@prefect.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CANCEL_NEW concurrency strategy does not cancel Late runs, causing queue buildup

2 participants