Fix duplicate task execution race condition by chrisguidry · Pull Request #209 · chrisguidry/docket

chrisguidry · 2025-11-25T19:52:31Z

Tasks without a Perpetual dependency could execute multiple times in parallel
because the known field is deleted when claim() is called, before the task
function runs. This opens a window where duplicate docket.add() calls with
the same key succeed.

The fix adds a check for state='running' in the schedule Lua script's
deduplication logic. Now a task "exists" if either the known field exists
(scheduled/queued) OR the state is running (currently executing).

Also updated the self-perpetuating test patterns to use replace() instead of
add() for rescheduling, which is the correct pattern (matching how Perpetual
tasks work).

🤖 Generated with Claude Code

Tasks without a Perpetual dependency could execute multiple times in parallel because the `known` field is deleted when `claim()` is called, before the task function runs. This opens a window where duplicate `docket.add()` calls with the same key succeed. The fix adds a check for `state='running'` in the schedule Lua script's deduplication logic. Now a task "exists" if either the `known` field exists (scheduled/queued) OR the `state` is `running` (currently executing). Also updated the self-perpetuating test patterns to use `replace()` instead of `add()` for rescheduling, which is the correct pattern (matching how Perpetual tasks work). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

claude · 2025-11-25T19:52:47Z

Claude finished @chrisguidry's task —— View job

Review Complete

Review Checklist

Analyze the race condition fix in execution.py
Review test coverage in test_worker.py
Check test updates in test_fundamentals.py
Assess code quality and best practices
Check for potential bugs or issues
Evaluate performance considerations
Review security implications
Verify 100% test coverage requirement

Summary

This PR effectively fixes a critical race condition where tasks without a Perpetual dependency could execute multiple times in parallel. The fix is minimal, well-targeted, and includes comprehensive test coverage.

Strengths

✅ Root Cause Analysis

The PR correctly identifies the race condition: the known field is deleted when claim() is called (lines 733 in execution.py), creating a window where duplicate add() calls can succeed while the original task is still running.

✅ Minimal and Effective Fix

The solution adds a check for state='running' in the schedule Lua script (lines 598-601), ensuring a task "exists" if either:

The known field exists (task is scheduled/queued), OR
The state is running (task is currently executing)

This elegantly closes the race window without major architectural changes.

✅ Comprehensive Test Coverage

test_duplicate_execution_race_condition_non_perpetual_task (lines 707-767) thoroughly reproduces the bug with proper timeline documentation
Tests both default TTL and execution_ttl=0 scenarios to ensure the fix doesn't depend on volatile result keys
Uses concurrent workers and timing controls to reliably trigger the race condition

✅ Pattern Updates

The test updates in test_fundamentals.py correctly switch from add() to replace() for self-perpetuating tasks (lines 908, 934, 960, 966), aligning with how Perpetual tasks work internally.

Code Quality

✅ Clean Implementation

The Lua script modification is minimal and focused
No performance overhead - just an additional field check
Maintains backwards compatibility with legacy key locations

✅ Documentation

Excellent inline comments explaining the race condition timeline
Clear test documentation of the bug scenario
Proper commit message explaining the issue and fix

Potential Concerns

⚠️ Minor: Legacy Code Cleanup

Lines 509-510, 576-580, 589-590, 602-604, 712-714, 735-736 contain TODO comments about removing legacy key support in v0.14.0. While not related to this fix, tracking these for future cleanup would be good.

⚠️ Edge Case: State Transitions

The fix assumes state='running' is reliably set before the known field is deleted. The atomic Lua script in claim() ensures this, but any future refactoring should maintain this invariant.

Performance Considerations

✅ No Performance Impact: The additional check is a simple string comparison in the already-executing Lua script. No additional Redis operations are introduced.

Security Considerations

✅ No Security Issues: The change doesn't introduce any new attack vectors or expose sensitive data.

Test Coverage

✅ 100% Coverage Maintained: The new test comprehensively covers the race condition scenario, and existing tests validate that the fix doesn't break normal task scheduling behavior.

Recommendation

APPROVE - This is a well-crafted fix for a critical bug. The solution is minimal, correct, and thoroughly tested. The switch to replace() for self-perpetuating patterns is the right approach and matches Perpetual task behavior.

github-actions · 2025-11-25T19:52:54Z

📚 Documentation has been built for this PR!

You can download the documentation directly here:
https://github.com/chrisguidry/docket/actions/runs/19682275839/artifacts/4678042892

codecov-commenter · 2025-11-25T19:53:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (a7c542b) to head (34835b8).

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #209   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           48        48           
  Lines         7545      7567   +22     
  Branches       395       396    +1     
=========================================
+ Hits          7545      7567   +22

Flag	Coverage Δ
python-3.10	`100.00% <100.00%> (ø)`
python-3.11	`99.00% <96.15%> (-0.02%)`	⬇️
python-3.12	`100.00% <100.00%> (ø)`
python-3.13	`100.00% <100.00%> (ø)`
python-3.14	`100.00% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
src/docket/execution.py	`100.00% <ø> (ø)`
tests/test_fundamentals.py	`100.00% <100.00%> (ø)`
tests/test_worker.py	`100.00% <100.00%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

chrisguidry merged commit e8f2981 into main Nov 25, 2025
25 checks passed

chrisguidry deleted the fix-duplicate-execution-race branch November 25, 2025 20:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix duplicate task execution race condition#209

Fix duplicate task execution race condition#209
chrisguidry merged 1 commit intomainfrom
fix-duplicate-execution-race

chrisguidry commented Nov 25, 2025

Uh oh!

claude bot commented Nov 25, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 25, 2025

Uh oh!

codecov-commenter commented Nov 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chrisguidry commented Nov 25, 2025

Uh oh!

claude bot commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review Complete

Review Checklist

Summary

Strengths

✅ Root Cause Analysis

✅ Minimal and Effective Fix

✅ Comprehensive Test Coverage

✅ Pattern Updates

Code Quality

✅ Clean Implementation

✅ Documentation

Potential Concerns

⚠️ Minor: Legacy Code Cleanup

⚠️ Edge Case: State Transitions

Performance Considerations

Security Considerations

Test Coverage

Recommendation

Uh oh!

github-actions bot commented Nov 25, 2025

Uh oh!

codecov-commenter commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Nov 25, 2025 •

edited

Loading

codecov-commenter commented Nov 25, 2025 •

edited

Loading