[Jobs] Record submitted_at when a job is first queued #9980

Open
claude[bot] wants to merge 3 commits into master from fix-noop-job-submitted-timestamp

claude Bot commented Jun 29, 2026

Contributor

What

Record a managed job's submitted_at when its task row is first inserted at PENDING (queue-entry time), instead of at the PENDING -> STARTING transition.

Before: submitted_at was written only when a task moved PENDING -> STARTING. Two cases broke as a result:

  • No-op jobs (tasks with empty run commands) short-circuit straight to SUCCEEDED without ever passing through STARTING. Their submitted_at stayed NULL, so the "Submitted" column rendered blank in the dashboard and in sky jobs queue.
  • Backoff/recovery re-entry: a job that backed off and re-entered STARTING had its submitted_at overwritten with the latest attempt's timestamp, so the displayed submission time drifted forward over time.

After: submitted_at is recorded exactly once, at the moment the job enters the queue, so both no-op jobs and jobs that back off show a stable, correct submission time.

How

  • set_pending (sky/jobs/state.py) now writes submitted_at = time.time() when it inserts the task row.
  • set_starting_async (sky/jobs/state.py) no longer unconditionally writes submitted_at; it now only backfills it via COALESCE(submitted_at, submit_time), so legacy rows created before this change still get a value while existing values are never clobbered on a (re-)entry into STARTING.
  • set_pending_cancelled (sky/jobs/state.py) does not touch submitted_at. A job cancelled while still PENDING keeps the submitted_at recorded at queue-entry — it was still submitted, so this is consistent with jobs that fail before ever starting.
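
The write-once behavior described above can be sketched with the standard-library `sqlite3` module. This is a simplified illustration, not the code in sky/jobs/state.py: the table name `spot`, its three columns, and the synchronous helper `set_starting` are assumptions made for the example. The only parts taken from the description are the insert-time write in `set_pending` and the `COALESCE(submitted_at, submit_time)` backfill.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory table with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spot ("
                 "job_id INTEGER PRIMARY KEY, "
                 "status TEXT, "
                 "submitted_at REAL)")
    return conn


def set_pending(conn, job_id, now=None):
    """Insert the task row and record submitted_at at queue-entry time."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO spot (job_id, status, submitted_at) "
        "VALUES (?, 'PENDING', ?)", (job_id, now))


def set_starting(conn, job_id, submit_time):
    """Move to STARTING; only backfill submitted_at if it is still NULL."""
    conn.execute(
        "UPDATE spot SET status = 'STARTING', "
        "submitted_at = COALESCE(submitted_at, ?) "
        "WHERE job_id = ?", (submit_time, job_id))


def get_submitted_at(conn, job_id):
    """Return the stored submitted_at for one job."""
    row = conn.execute("SELECT submitted_at FROM spot WHERE job_id = ?",
                       (job_id,)).fetchone()
    return row[0]


conn = make_db()

# New-style job: queued at t=100, then (re-)enters STARTING at t=250.
set_pending(conn, 1, now=100.0)
set_starting(conn, 1, submit_time=250.0)

# Legacy row created before the change: submitted_at starts out NULL.
conn.execute("INSERT INTO spot VALUES (2, 'PENDING', NULL)")
set_starting(conn, 2, submit_time=300.0)
```

In this sketch job 1 keeps its queue-entry timestamp across the STARTING transition, while the legacy row for job 2 picks up the backfilled value.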

Secondary change

The managed-jobs ORDER BY submitted_at is hardened with NULLS LAST (both the paginated subquery and the final query), so any remaining NULL rows sort deterministically regardless of the DB engine's default NULL ordering, in both ascending and descending order.
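
As a rough illustration of the ordering goal, the sketch below keeps NULL rows last in both directions using the standard-library `sqlite3` module. It deliberately swaps in a portable emulation (sorting on `submitted_at IS NULL` first) rather than the `NULLS LAST` clause itself, since older SQLite builds do not accept that clause. The table name `spot` and the helper `sorted_job_ids` are hypothetical and do not come from the PR.

```python
import sqlite3


def sorted_job_ids(conn, descending):
    """Return job ids ordered by submitted_at, with NULL rows always last."""
    # The direction comes from a fixed two-value choice, never user input.
    direction = "DESC" if descending else "ASC"
    query = ("SELECT job_id FROM spot "
             "ORDER BY (submitted_at IS NULL) ASC, "
             f"submitted_at {direction}, job_id ASC")
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spot (job_id INTEGER PRIMARY KEY, submitted_at REAL)")
conn.executemany("INSERT INTO spot VALUES (?, ?)", [
    (1, 100.0),
    (2, None),  # simulates a legacy row with no submission time
    (3, 300.0),
])

ascending_ids = sorted_job_ids(conn, descending=False)
descending_ids = sorted_job_ids(conn, descending=True)
```

The `(submitted_at IS NULL)` term evaluates to 0 for rows with a timestamp and 1 for NULL rows, so sorting on it first pins NULL rows to the end whichever direction the timestamp itself is sorted in.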

Tests

  • Code formatting: bash format.sh (yapf + isort clean, pylint 10.00/10 on changed files)
  • Added/updated unit tests in tests/unit_tests/test_sky/jobs/test_jobs_state.py:
    • a still-PENDING job has a non-NULL submitted_at
    • a no-op job (PENDING -> RUNNING -> SUCCEEDED, no STARTING) keeps its queue-entry submitted_at
    • the STARTING transition does not clobber an existing submitted_at, but backfills a NULL one
    • a job cancelled while PENDING keeps its queue-entry submitted_at
    • sorting by submitted_at places NULL rows last for both asc and desc (paginated and non-paginated)
  • Updated tests/unit_tests/test_sky/skylet/test_managed_jobs_service.py to reflect that a still-PENDING job now reports a non-NULL submitted_at.
  • Relevant individual tests: pytest tests/unit_tests/test_sky/jobs/test_jobs_state.py tests/unit_tests/test_sky/skylet/test_managed_jobs_service.py (116 passed)

🤖 Generated with Claude Code

Previously `submitted_at` was written only at the PENDING -> STARTING
transition. As a result:
- No-op managed jobs (tasks with empty run commands) short-circuit
  straight to SUCCEEDED without ever passing through STARTING, so their
  `submitted_at` stayed NULL and the "Submitted" column showed blank in
  the dashboard and `sky jobs queue`.
- Jobs that backed off and re-entered STARTING had their submission time
  overwritten with the latest attempt's timestamp, so "Submitted"
  drifted forward.

Record `submitted_at` once, at queue-entry time, in `set_pending`. The
STARTING transition now only backfills it via COALESCE (for legacy rows
that predate this change), so it can no longer clobber the original
value. `set_pending_cancelled` clears `submitted_at` back to NULL so a
job cancelled while still PENDING (which never started) keeps no
submission time, preserving the submitted-window filter's contract.

Also harden the managed-jobs ORDER BY on `submitted_at` with NULLS LAST
so any remaining NULL rows sort deterministically regardless of DB
engine, for both ascending and descending order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MMCFx7eBmLpkAnZcL9sLU1
Comment thread sky/jobs/state.py Outdated
Comment on lines +1095 to +1101
# A job cancelled while still PENDING never actually
# started, so it has no meaningful submission time. Clear
# the queue-entry `submitted_at` recorded by `set_pending`
# so such terminal rows stay NULL -- this preserves the
# submitted-window filter's contract that a terminal job
# with no submission time is excluded from the window.
spot_table.c.submitted_at: None,

Collaborator


I think we can probably skip this

Comment thread sky/jobs/state.py
A job that was queued and then cancelled while still PENDING was still
submitted, so it should keep the submitted_at recorded at queue-entry
(set_pending). Remove the NULL-reset that set_pending_cancelled was doing,
reverting it to its prior behavior and making submitted_at consistent across
cancelled / failed-before-start jobs.

Also:
- Update the submitted-window filter comment: submitted_at is now set at
  queue-entry, so the NULL handling branch only applies to legacy
  (pre-upgrade) rows. Behavior for legacy NULL rows is unchanged.
- Add a TODO marking the COALESCE(submitted_at, submit_time) backfill in
  set_starting_async for removal ~2 minor versions out, once legacy rows
  without a submitted_at no longer exist.
- Update tests to match: cancelled-while-PENDING now retains submitted_at and
  is included by the submitted-window filter; NULL-row fixtures now simulate
  legacy pre-upgrade rows explicitly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MMCFx7eBmLpkAnZcL9sLU1

claude Bot commented Jun 29, 2026

Contributor Author

Addressed both review comments:

  • Removed the submitted_at NULL-reset in set_pending_cancelled and reverted it to its prior behavior. A job that was queued and then cancelled while still PENDING was still submitted, so it now consistently keeps the submitted_at recorded at queue-entry (set_pending) — matching jobs that fail before starting. Such jobs are now included by the submitted-window filter rather than excluded.
  • Added a backcompat-removal TODO on the COALESCE(submitted_at, submit_time) in set_starting_async, marking it for removal ~2 minor versions out, once legacy rows without a submitted_at no longer exist.

Also updated the now-stale window-filter comment to note that the NULL-handling branch only applies to legacy (pre-upgrade) rows, and updated the unit tests to match the new semantics.
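
The window-filter semantics described here might look roughly like the sketch below. It is an assumption-heavy illustration, not the filter in sky/jobs/state.py: the `Job` dataclass, the `TERMINAL_STATUSES` set, and the function `in_submitted_window` are all hypothetical. Only two behaviors come from the discussion: a terminal row with no `submitted_at` is excluded, and a job cancelled while PENDING now carries a timestamp and is matched like any other job. Treating a non-terminal NULL row as included is a guess.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of terminal statuses, for illustration only.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELLED"}


@dataclass
class Job:
    job_id: int
    status: str
    submitted_at: Optional[float]


def in_submitted_window(job: Job, start: float, end: float) -> bool:
    """Decide whether a job falls inside the [start, end] submitted window."""
    if job.submitted_at is None:
        # Legacy (pre-upgrade) rows only: a terminal job with no submission
        # time is excluded. Including non-terminal NULL rows is an assumption.
        return job.status not in TERMINAL_STATUSES
    return start <= job.submitted_at <= end


jobs = [
    Job(1, "CANCELLED", 150.0),  # cancelled while PENDING, keeps timestamp
    Job(2, "CANCELLED", None),   # legacy terminal row without a timestamp
    Job(3, "SUCCEEDED", 500.0),  # submitted outside the window
]
matched_ids = [j.job_id for j in jobs if in_submitted_window(j, 100.0, 200.0)]
```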

cg505 marked this pull request as ready for review on June 29, 2026 21:32
Comment thread sky/jobs/state.py Outdated
Comment thread sky/jobs/state.py Outdated
- Correct the COALESCE backcompat removal marker in set_starting_async
  to reference a real release (v0.14.0, ~2 minors out) instead of the
  nonexistent v1.2.
- Drop the submitted_at-specific NULLS LAST ordering hardening. After
  this change all newly queued jobs record a non-NULL submitted_at, so
  NULLs only exist for transient legacy pre-upgrade rows, and the
  dashboard default sort is Job ID anyway. The submitted_at sort path
  now matches every other sort column. Remove the now-irrelevant
  NULLS-LAST ordering tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MMCFx7eBmLpkAnZcL9sLU1
cg505 requested a review from kevinmingtarja on June 29, 2026 22:52