fix: resolve executor timeout due to lock contention in poll_work #13

Merged
phillipleblanc merged 1 commit into spiceai-51 from phillip/250127-fix-deadlock on Jan 26, 2026

Conversation

@phillipleblanc

The metrics instrumentation PR introduced a hot-path call to total_pending_tasks() after every scheduler event. This method acquires read locks on all execution graphs. Combined with concurrent poll_work calls, which acquire write locks, this caused deadlock-like behavior because of tokio's RwLock fairness semantics: the lock is write-preferring, so once a writer is queued, later readers wait behind it.
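
To make the cost concrete, here is a minimal sketch of the pattern described above. It is not the actual Ballista code: it uses `std::sync::RwLock` in place of tokio's async lock so that it compiles without dependencies, and `ExecutionGraph`, `ActiveJobs`, and the `pending_tasks` field are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for an execution graph; only the pending count matters here.
pub struct ExecutionGraph {
    pub pending_tasks: usize,
}

/// Hypothetical job registry: one lock per active job's graph.
pub type ActiveJobs = HashMap<String, Arc<RwLock<ExecutionGraph>>>;

/// Sums pending tasks by taking a read lock on every graph, one at a time.
/// Each `read()` can block while a writer holds that graph's lock, so the
/// cost of one call grows with the number of active jobs.
pub fn total_pending_tasks(jobs: &ActiveJobs) -> usize {
    jobs.values()
        .map(|graph| graph.read().expect("graph lock poisoned").pending_tasks)
        .sum()
}
```

Calling a function like this after every scheduler event means every event touches every graph's lock. With a write-preferring lock such as tokio's, a reader that arrives behind a queued writer has to wait until that writer has finished.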

Changes:

  • Remove total_pending_tasks() from event loop hot path
  • Add periodic background task (5s interval) for pending_tasks metric
  • Add task_binding_lock mutex to serialize poll_work calls
  • Remove diagnostic logging and unnecessary timeouts
  • Clean up unused imports
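
The first three changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: it uses std threads and `std::sync::Mutex` where the scheduler presumably uses tokio tasks and async locks, and `SchedulerState`, `pending_tasks_gauge`, `sample_pending_tasks`, and `spawn_pending_tasks_sampler` are hypothetical names. Only `task_binding_lock`, `poll_work`, and the 5s interval come from the description above.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical scheduler state; field names other than
/// `task_binding_lock` are illustrative only.
pub struct SchedulerState {
    /// Serializes task binding so only one poll_work runs at a time.
    task_binding_lock: Mutex<()>,
    /// Tasks waiting to be bound to an executor.
    pending: Mutex<Vec<u32>>,
    /// Last sampled value of the pending_tasks metric.
    pub pending_tasks_gauge: AtomicUsize,
}

impl SchedulerState {
    pub fn new(tasks: Vec<u32>) -> Self {
        Self {
            task_binding_lock: Mutex::new(()),
            pending: Mutex::new(tasks),
            pending_tasks_gauge: AtomicUsize::new(0),
        }
    }

    /// Hands out at most one task per call. Concurrent callers queue on
    /// `task_binding_lock` instead of competing for the task state.
    pub fn poll_work(&self) -> Option<u32> {
        let _binding = self
            .task_binding_lock
            .lock()
            .expect("task_binding_lock poisoned");
        self.pending.lock().expect("pending poisoned").pop()
    }

    /// Refreshes the gauge. Meant to be called from the background
    /// sampler, not from the event loop.
    pub fn sample_pending_tasks(&self) {
        let n = self.pending.lock().expect("pending poisoned").len();
        self.pending_tasks_gauge.store(n, Ordering::Relaxed);
    }
}

/// Spawns a thread that refreshes the gauge every `interval`.
/// It always samples once before checking `stop`, then exits once `stop` is set.
pub fn spawn_pending_tasks_sampler(
    state: Arc<SchedulerState>,
    interval: Duration,
    stop: Arc<AtomicBool>,
) -> JoinHandle<()> {
    thread::spawn(move || loop {
        state.sample_pending_tasks();
        if stop.load(Ordering::Relaxed) {
            break;
        }
        thread::sleep(interval);
    })
}
```

With the interval from this PR, the sampler would be started as `spawn_pending_tasks_sampler(state, Duration::from_secs(5), stop)`. The trade-off is that the metric can be up to one interval stale, in exchange for keeping the event loop free of the per-event lock sweep.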

@phillipleblanc phillipleblanc self-assigned this Jan 26, 2026
@phillipleblanc phillipleblanc added the bug Something isn't working label Jan 26, 2026
@phillipleblanc phillipleblanc force-pushed the phillip/250127-fix-deadlock branch from 89cf761 to 3b46cb9 Compare January 26, 2026 19:57
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Outdated
@phillipleblanc phillipleblanc force-pushed the phillip/250127-fix-deadlock branch 2 times, most recently from d5f1983 to c45c17b Compare January 26, 2026 20:03
…nt loop

The metrics instrumentation PR introduced a hot-path call to total_pending_tasks()
after every scheduler event. This method acquires read locks on all execution graphs,
which, combined with concurrent poll_work calls (acquiring write locks), caused
deadlock-like behavior due to tokio RwLock fairness semantics.

Changes:
- Remove total_pending_tasks() from event loop hot path
- Add periodic background task (5s interval) for pending_tasks metric
- Add 100ms timeout to lock acquisition in total_pending_tasks() as safety net
- Clean up unused imports
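
The "safety net" timeout in this commit message can be illustrated with a bounded lock acquisition. The sketch below is an assumption-laden stand-in: it polls `std::sync::RwLock::try_read` against a deadline, whereas async code would more typically wrap the `read()` future in `tokio::time::timeout`. The helper name `read_with_timeout` is hypothetical. The PR description lists "unnecessary timeouts" among the things removed, so this may not reflect the merged state.

```rust
use std::sync::{RwLock, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Tries to read the value behind `lock` for up to `budget`.
/// Returns `None` if a writer still holds the lock when the budget runs out,
/// so a metrics caller can skip the sample instead of stalling.
pub fn read_with_timeout<T: Copy>(lock: &RwLock<T>, budget: Duration) -> Option<T> {
    let deadline = Instant::now() + budget;
    loop {
        match lock.try_read() {
            Ok(guard) => return Some(*guard),
            // A poisoned lock still holds a value; read it anyway.
            Err(TryLockError::Poisoned(poisoned)) => return Some(*poisoned.into_inner()),
            Err(TryLockError::WouldBlock) => {
                if Instant::now() >= deadline {
                    return None;
                }
                // Back off briefly before retrying.
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}
```

With the value from the commit message, a caller would pass `Duration::from_millis(100)` and treat `None` as "skip this sample".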
@phillipleblanc phillipleblanc force-pushed the phillip/250127-fix-deadlock branch from c45c17b to 0c567ac Compare January 26, 2026 23:34
@phillipleblanc phillipleblanc merged commit 626181f into spiceai-51 Jan 26, 2026
29 checks passed
@phillipleblanc phillipleblanc deleted the phillip/250127-fix-deadlock branch January 26, 2026 23:53

2 participants