[Jobs] Record submitted_at when a job is first queued#9980
Open
claude[bot] wants to merge 3 commits into
Open
Conversation
Previously `submitted_at` was written only at the PENDING -> STARTING transition. As a result: - No-op managed jobs (tasks with empty run commands) short-circuit straight to SUCCEEDED without ever passing through STARTING, so their `submitted_at` stayed NULL and the "Submitted" column showed blank in the dashboard and `sky jobs queue`. - Jobs that backed off and re-entered STARTING had their submission time overwritten with the latest attempt's timestamp, so "Submitted" drifted forward. Record `submitted_at` once, at queue-entry time, in `set_pending`. The STARTING transition now only backfills it via COALESCE (for legacy rows that predate this change), so it can no longer clobber the original value. `set_pending_cancelled` clears `submitted_at` back to NULL so a job cancelled while still PENDING (which never started) keeps no submission time, preserving the submitted-window filter's contract. Also harden the managed-jobs ORDER BY on `submitted_at` with NULLS LAST so any remaining NULL rows sort deterministically regardless of DB engine, for both ascending and descending order. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MMCFx7eBmLpkAnZcL9sLU1
cg505
reviewed
Jun 29, 2026
Comment on lines
+1095
to
+1101
| # A job cancelled while still PENDING never actually | ||
| # started, so it has no meaningful submission time. Clear | ||
| # the queue-entry `submitted_at` recorded by `set_pending` | ||
| # so such terminal rows stay NULL -- this preserves the | ||
| # submitted-window filter's contract that a terminal job | ||
| # with no submission time is excluded from the window. | ||
| spot_table.c.submitted_at: None, |
Collaborator
There was a problem hiding this comment.
I think we can probably skip this
cg505
reviewed
Jun 29, 2026
A job that was queued and then cancelled while still PENDING was still submitted, so it should keep the submitted_at recorded at queue-entry (set_pending). Remove the NULL-reset that set_pending_cancelled was doing, reverting it to its prior behavior and making submitted_at consistent across cancelled / failed-before-start jobs. Also: - Update the submitted-window filter comment: submitted_at is now set at queue-entry, so the NULL handling branch only applies to legacy (pre-upgrade) rows. Behavior for legacy NULL rows is unchanged. - Add a TODO marking the COALESCE(submitted_at, submit_time) backfill in set_starting_async for removal ~2 minor versions out, once legacy rows without a submitted_at no longer exist. - Update tests to match: cancelled-while-PENDING now retains submitted_at and is included by the submitted-window filter; NULL-row fixtures now simulate legacy pre-upgrade rows explicitly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MMCFx7eBmLpkAnZcL9sLU1
Contributor
Author
|
Addressed both review comments:
Also updated the now-stale window-filter comment to note that the NULL-handling branch only applies to legacy (pre-upgrade) rows, and updated the unit tests to match the new semantics. |
2 tasks
cg505
reviewed
Jun 29, 2026
cg505
reviewed
Jun 29, 2026
- Correct the COALESCE backcompat removal marker in set_starting_async to reference a real release (v0.14.0, ~2 minors out) instead of the nonexistent v1.2. - Drop the submitted_at-specific NULLS LAST ordering hardening. After this change all newly queued jobs record a non-NULL submitted_at, so NULLs only exist for transient legacy pre-upgrade rows, and the dashboard default sort is Job ID anyway. The submitted_at sort path now matches every other sort column. Remove the now-irrelevant NULLS-LAST ordering tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MMCFx7eBmLpkAnZcL9sLU1
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Record a managed job's
submitted_atwhen its task row is first inserted atPENDING(queue-entry time), instead of at thePENDING -> STARTINGtransition.Before:
submitted_atwas written only when a task movedPENDING -> STARTING. Two cases broke as a result:runcommands) short-circuit straight toSUCCEEDEDwithout ever passing throughSTARTING. Theirsubmitted_atstayedNULL, so the "Submitted" column rendered blank in the dashboard and insky jobs queue.STARTINGhad itssubmitted_atoverwritten with the latest attempt's timestamp, so the displayed submission time drifted forward over time.After:
submitted_atis recorded exactly once, at the moment the job enters the queue, so both no-op jobs and jobs that back off show a stable, correct submission time.How
set_pending(sky/jobs/state.py) now writessubmitted_at = time.time()when it inserts the task row.set_starting_async(sky/jobs/state.py) no longer unconditionally writessubmitted_at; it now only backfills it viaCOALESCE(submitted_at, submit_time), so legacy rows created before this change still get a value while existing values are never clobbered on a (re-)entry intoSTARTING.set_pending_cancelled(sky/jobs/state.py) does not touchsubmitted_at. A job cancelled while stillPENDINGkeeps thesubmitted_atrecorded at queue-entry — it was still submitted, so this is consistent with jobs that fail before ever starting.Secondary change
The managed-jobs
ORDER BY submitted_atis hardened withNULLS LAST(both the paginated subquery and the final query), so any remainingNULLrows sort deterministically regardless of the DB engine's default NULL ordering, in both ascending and descending order.Tests
bash format.sh(yapf + isort clean, pylint 10.00/10 on changed files)tests/unit_tests/test_sky/jobs/test_jobs_state.py:PENDINGjob has a non-NULLsubmitted_atsubmitted_atSTARTINGtransition does not clobber an existingsubmitted_at, but backfills a NULL onesubmitted_atsubmitted_atplaces NULL rows last for both asc and desc (paginated and non-paginated)tests/unit_tests/test_sky/skylet/test_managed_jobs_service.pyto reflect that a still-PENDING job now reports a non-NULLsubmitted_at.pytest tests/unit_tests/test_sky/jobs/test_jobs_state.py tests/unit_tests/test_sky/skylet/test_managed_jobs_service.py(116 passed)🤖 Generated with Claude Code