Skip to content

[iris] Fix scheduling loop: filter reservation jobs at SQL level#4179

Merged
rjpower merged 1 commit intomainfrom
rjpower/20260326-iris-opt
Mar 26, 2026
Merged

[iris] Fix scheduling loop: filter reservation jobs at SQL level#4179
rjpower merged 1 commit intomainfrom
rjpower/20260326-iris-opt

Conversation

@rjpower
Copy link
Copy Markdown
Collaborator

@rjpower rjpower commented Mar 26, 2026

Add has_reservation column to jobs table so _claim_workers_for_reservations
can filter at SQL level instead of deserializing all active job protobufs.
On production DB (1,340 active jobs, 218MB of request_proto): p50 drops
from 156ms to 0.0ms, p95 from 2,669ms to 0.1ms. Adds before/after
comparison to the benchmark script.

Add has_reservation column to jobs table so _claim_workers_for_reservations
can filter at SQL level instead of deserializing all active job protobufs.
On production DB (1,340 active jobs, 218MB of request_proto): p50 drops
from 156ms to 0.0ms, p95 from 2,669ms to 0.1ms.
@rjpower rjpower added the agent-generated Created by automation/agent label Mar 26, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 26, 2026

Claude finished @rjpower's task in 1m 37s —— View job


PR Review: [iris] Fix scheduling loop: filter reservation jobs at SQL level

  • Read and analyze all changed files
  • Check migration correctness
  • Check query and controller logic
  • Post review

Adds a denormalized has_reservation column to the jobs table so _claim_workers_for_reservations can filter at SQL level instead of deserializing all active job protobufs. The implementation matches the stated objective and the performance claim is credible.

Findings:

  • Reservation holder child job INSERT (transitions.py:667-686) omits has_reservation: The child "reservation holder" job INSERT at line 667 doesn't include has_reservation in its column list. This is correct — the holder job's LaunchJobRequest is constructed without a reservation field, so has_reservation=0 (the DEFAULT) is the right value. Just noting this for reviewers since the column is explicitly included in the parent INSERT but silently defaulted here.

  • _jobs_with_reservations uses snapshot._fetchall (private method): The new function calls snapshot._fetchall() directly. This is consistent with how _tasks_with_attempts (already imported in controller.py) operates — it also builds raw SQL against snapshot internals. No issue, just noting the pattern.

  • Migration backfill is idempotent: The migration checks for existing has_reservation=1 rows before backfilling, and uses IF NOT EXISTS for the index. Restart-safe.

  • Partial index is well-designed: CREATE INDEX ... ON jobs(has_reservation, state) WHERE has_reservation = 1 will keep the index small since most jobs won't have reservations.

No bugs, correctness issues, or guideline violations found. Clean PR.

@rjpower rjpower requested a review from yonromai March 26, 2026 17:50
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a18b6333ed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@rjpower rjpower merged commit 1e301c8 into main Mar 26, 2026
50 checks passed
@rjpower rjpower deleted the rjpower/20260326-iris-opt branch March 26, 2026 17:55
ravwojdyla pushed a commit that referenced this pull request Mar 26, 2026
Add has_reservation column to jobs table so
_claim_workers_for_reservations
can filter at SQL level instead of deserializing all active job
protobufs.
On production DB (1,340 active jobs, 218MB of request_proto): p50 drops
from 156ms to 0.0ms, p95 from 2,669ms to 0.1ms. Adds before/after
comparison to the benchmark script.
rjpower added a commit that referenced this pull request Mar 27, 2026
Delete Table, Column, Predicate, SelectExpr, Order, Join, JoinedQuery
and all 16 module-level table constants from db.py. Replace ~100
callsites across 8 production files and 12 test files with raw SQL +
decode_rows/decode_one helpers. Every query is now visible as plain SQL
at its callsite. Net -556 lines.

Follows #4179 which fixed the immediate scheduling perf issue.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Helw150 pushed a commit that referenced this pull request Apr 8, 2026
Add has_reservation column to jobs table so
_claim_workers_for_reservations
can filter at SQL level instead of deserializing all active job
protobufs.
On production DB (1,340 active jobs, 218MB of request_proto): p50 drops
from 156ms to 0.0ms, p95 from 2,669ms to 0.1ms. Adds before/after
comparison to the benchmark script.
Helw150 pushed a commit that referenced this pull request Apr 8, 2026
Delete Table, Column, Predicate, SelectExpr, Order, Join, JoinedQuery
and all 16 module-level table constants from db.py. Replace ~100
callsites across 8 production files and 12 test files with raw SQL +
decode_rows/decode_one helpers. Every query is now visible as plain SQL
at its callsite. Net -556 lines.

Follows #4179 which fixed the immediate scheduling perf issue.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-generated Created by automation/agent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants