Skip to content

fix: broadcast job state correctly#30

Merged
peasee merged 5 commits into
spiceai-52.5from
peasee/260421-broadcast-job-state
Apr 22, 2026
Merged

fix: broadcast job state correctly#30
peasee merged 5 commits into
spiceai-52.5from
peasee/260421-broadcast-job-state

Conversation

@peasee
Copy link
Copy Markdown

@peasee peasee commented Apr 21, 2026

  • Broadcast job state AFTER storing the actual job status in Ballista job store. The previous method could result in a race where we broadcast a job as COMPLETE when we have not yet updated it in the Ballista job store. A consumer would then look up the job from the broadcast notification, find it is in fact not COMPLETE, then wait some time before checking the job again

@peasee peasee self-assigned this Apr 21, 2026
Copilot AI review requested due to automatic review settings April 21, 2026 07:49
@peasee peasee added the bug Something isn't working label Apr 21, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reorders job state broadcasting in the scheduler so terminal JobStateEvents (Completed/Failed/Cancelled) are emitted after attempting to persist the corresponding terminal status in the Ballista job store, reducing a race where subscribers observe a terminal event before the store reflects it.

Changes:

  • Move “Completed” broadcast to occur after task_manager.succeed_job is invoked.
  • Move “Failed” broadcast to occur after task_manager.abort_job is invoked (and clone fail_message to avoid move).
  • Move “Cancelled” broadcast to occur after task_manager.cancel_job is invoked.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Outdated
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Outdated
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Outdated
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Outdated
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Outdated
Copilot AI review requested due to automatic review settings April 22, 2026 00:07
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts scheduler job-state broadcasting to avoid races where subscribers observe a terminal JobStateEvent before the terminal JobStatus is persisted in the Ballista job store.

Changes:

  • Reordered terminal job-state broadcasts to occur after persisting succeed_job / abort_job / cancel_job outcomes.
  • Simplified several map iterations by switching from .iter() with unused keys to .values().

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Persist terminal job status before broadcasting Completed/Failed/Cancelled events to subscribers.
ballista/scheduler/src/state/executor_manager.rs Use .values() when iterating executor heartbeats where keys are unused.
ballista/scheduler/src/state/execution_graph.rs Use .values() when iterating stages where keys are unused in running_tasks().
ballista/scheduler/src/state/aqe/mod.rs Use .values() when iterating stages where keys are unused in running_tasks().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Copilot AI review requested due to automatic review settings April 22, 2026 01:11
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to eliminate a race between job-state broadcasts and persisted job status by broadcasting terminal job state only after writing the final status to the Ballista job store, so consumers can read the correct terminal state immediately.

Changes:

  • Reordered terminal job-state broadcasts (Failed, Completed, Cancelled) to occur after persisting job terminal status in the task manager.
  • Adjusted cancellation/task-cancellation event posting behavior and error handling in the scheduler event loop.
  • Simplified several HashMap iterations from .iter() to .values() where keys are unused.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
ballista/scheduler/src/state/executor_manager.rs Iterates heartbeats via .values() instead of .iter() where executor IDs aren’t needed.
ballista/scheduler/src/state/execution_graph.rs Iterates stages via .values() for running task enumeration.
ballista/scheduler/src/state/aqe/mod.rs Same .values() simplification for adaptive execution graph running task enumeration.
ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs Reorders terminal-state persistence vs broadcast; modifies cancellation event posting/error handling and cleanup timing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
Comment thread ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs
@peasee peasee merged commit 086bd0e into spiceai-52.5 Apr 22, 2026
33 checks passed
@peasee peasee deleted the peasee/260421-broadcast-job-state branch April 22, 2026 02:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-upstream

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants