Skip to content

Fix duplicate task execution race condition#209

Merged
chrisguidry merged 1 commit intomainfrom
fix-duplicate-execution-race
Nov 25, 2025
Merged

Fix duplicate task execution race condition#209
chrisguidry merged 1 commit intomainfrom
fix-duplicate-execution-race

Conversation

@chrisguidry
Copy link
Copy Markdown
Owner

Tasks without a Perpetual dependency could execute multiple times in parallel
because the known field is deleted when claim() is called, before the task
function runs. This opens a window where duplicate docket.add() calls with
the same key succeed.

The fix adds a check for state='running' in the schedule Lua script's
deduplication logic. Now a task "exists" if either the known field exists
(scheduled/queued) OR the state is running (currently executing).

Also updated the self-perpetuating test patterns to use replace() instead of
add() for rescheduling, which is the correct pattern (matching how Perpetual
tasks work).

🤖 Generated with Claude Code

Tasks without a Perpetual dependency could execute multiple times in parallel
because the `known` field is deleted when `claim()` is called, before the task
function runs. This opens a window where duplicate `docket.add()` calls with
the same key succeed.

The fix adds a check for `state='running'` in the schedule Lua script's
deduplication logic. Now a task "exists" if either the `known` field exists
(scheduled/queued) OR the `state` is `running` (currently executing).

Also updated the self-perpetuating test patterns to use `replace()` instead of
`add()` for rescheduling, which is the correct pattern (matching how Perpetual
tasks work).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Nov 25, 2025

Claude finished @chrisguidry's task —— View job


Review Complete

Review Checklist

  • Analyze the race condition fix in execution.py
  • Review test coverage in test_worker.py
  • Check test updates in test_fundamentals.py
  • Assess code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security implications
  • Verify 100% test coverage requirement

Summary

This PR effectively fixes a critical race condition where tasks without a Perpetual dependency could execute multiple times in parallel. The fix is minimal, well-targeted, and includes comprehensive test coverage.

Strengths

✅ Root Cause Analysis

The PR correctly identifies the race condition: the known field is deleted when claim() is called (lines 733 in execution.py), creating a window where duplicate add() calls can succeed while the original task is still running.

✅ Minimal and Effective Fix

The solution adds a check for state='running' in the schedule Lua script (lines 598-601), ensuring a task "exists" if either:

  • The known field exists (task is scheduled/queued), OR
  • The state is running (task is currently executing)

This elegantly closes the race window without major architectural changes.

✅ Comprehensive Test Coverage

  • test_duplicate_execution_race_condition_non_perpetual_task (lines 707-767) thoroughly reproduces the bug with proper timeline documentation
  • Tests both default TTL and execution_ttl=0 scenarios to ensure the fix doesn't depend on volatile result keys
  • Uses concurrent workers and timing controls to reliably trigger the race condition

✅ Pattern Updates

The test updates in test_fundamentals.py correctly switch from add() to replace() for self-perpetuating tasks (lines 908, 934, 960, 966), aligning with how Perpetual tasks work internally.

Code Quality

✅ Clean Implementation

  • The Lua script modification is minimal and focused
  • No performance overhead - just an additional field check
  • Maintains backwards compatibility with legacy key locations

✅ Documentation

  • Excellent inline comments explaining the race condition timeline
  • Clear test documentation of the bug scenario
  • Proper commit message explaining the issue and fix

Potential Concerns

⚠️ Minor: Legacy Code Cleanup

Lines 509-510, 576-580, 589-590, 602-604, 712-714, 735-736 contain TODO comments about removing legacy key support in v0.14.0. While not related to this fix, tracking these for future cleanup would be good.

⚠️ Edge Case: State Transitions

The fix assumes state='running' is reliably set before the known field is deleted. The atomic Lua script in claim() ensures this, but any future refactoring should maintain this invariant.

Performance Considerations

No Performance Impact: The additional check is a simple string comparison in the already-executing Lua script. No additional Redis operations are introduced.

Security Considerations

No Security Issues: The change doesn't introduce any new attack vectors or expose sensitive data.

Test Coverage

100% Coverage Maintained: The new test comprehensively covers the race condition scenario, and existing tests validate that the fix doesn't break normal task scheduling behavior.

Recommendation

APPROVE - This is a well-crafted fix for a critical bug. The solution is minimal, correct, and thoroughly tested. The switch to replace() for self-perpetuating patterns is the right approach and matches Perpetual task behavior.

@github-actions
Copy link
Copy Markdown

📚 Documentation has been built for this PR!

You can download the documentation directly here:
https://github.com/chrisguidry/docket/actions/runs/19682275839/artifacts/4678042892

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (a7c542b) to head (34835b8).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##              main      #209   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           48        48           
  Lines         7545      7567   +22     
  Branches       395       396    +1     
=========================================
+ Hits          7545      7567   +22     
Flag Coverage Δ
python-3.10 100.00% <100.00%> (ø)
python-3.11 99.00% <96.15%> (-0.02%) ⬇️
python-3.12 100.00% <100.00%> (ø)
python-3.13 100.00% <100.00%> (ø)
python-3.14 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/docket/execution.py 100.00% <ø> (ø)
tests/test_fundamentals.py 100.00% <100.00%> (ø)
tests/test_worker.py 100.00% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@chrisguidry chrisguidry merged commit e8f2981 into main Nov 25, 2025
25 checks passed
@chrisguidry chrisguidry deleted the fix-duplicate-execution-race branch November 25, 2025 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants