[iris] Fix scheduling loop: filter reservation jobs at SQL level by rjpower · Pull Request #4179 · marin-community/marin

rjpower · 2026-03-26T17:49:00Z

Add has_reservation column to jobs table so _claim_workers_for_reservations
can filter at SQL level instead of deserializing all active job protobufs.
On production DB (1,340 active jobs, 218MB of request_proto): p50 drops
from 156ms to 0.0ms, p95 from 2,669ms to 0.1ms. Adds before/after
comparison to the benchmark script.
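The idea in the description can be sketched with the standard-library `sqlite3` module. This is only an illustration: the table layout, the `ACTIVE_STATES` codes, and the `jobs_with_reservations` helper are assumptions, not the actual iris schema or code. It shows how a denormalized flag lets the database select reservation jobs without the caller ever reading `request_proto`.

```python
import sqlite3

# Hypothetical, simplified jobs table; the real iris schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        job_id TEXT PRIMARY KEY,
        state INTEGER NOT NULL,
        request_proto BLOB NOT NULL,
        has_reservation INTEGER NOT NULL DEFAULT 0
    )
    """
)

# Assumed state codes for "active" jobs, purely for illustration.
ACTIVE_STATES = (1, 2)

conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        ("job-a", 1, b"\x00" * 1024, 0),  # active, no reservation
        ("job-b", 1, b"\x00" * 1024, 1),  # active, has reservation
        ("job-c", 3, b"\x00" * 1024, 1),  # not active, has reservation
        ("job-d", 2, b"\x00" * 1024, 0),  # active, no reservation
    ],
)


def jobs_with_reservations(conn: sqlite3.Connection) -> list[str]:
    """Return active job ids that carry a reservation, filtered in SQL.

    The query never selects request_proto, so no protobuf bytes are
    loaded or deserialized on this path.
    """
    placeholders = ",".join("?" for _ in ACTIVE_STATES)
    sql = (
        "SELECT job_id FROM jobs "
        f"WHERE has_reservation = 1 AND state IN ({placeholders}) "
        "ORDER BY job_id"
    )
    return [row[0] for row in conn.execute(sql, ACTIVE_STATES)]


reserved = jobs_with_reservations(conn)
print(reserved)
```

With the flag in place, the cost of the scheduling step depends on the number of reservation jobs rather than on the total size of the stored request protos.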

claude · 2026-03-26T17:49:18Z

Claude finished @rjpower's task in 1m 37s.

PR Review: [iris] Fix scheduling loop: filter reservation jobs at SQL level

Read and analyze all changed files
Check migration correctness
Check query and controller logic
Post review

Adds a denormalized has_reservation column to the jobs table so _claim_workers_for_reservations can filter at SQL level instead of deserializing all active job protobufs. The implementation matches the stated objective and the performance claim is credible.

Findings:

Reservation holder child job INSERT (transitions.py:667-686) omits has_reservation: The child "reservation holder" job INSERT at line 667 doesn't include has_reservation in its column list. This is correct — the holder job's LaunchJobRequest is constructed without a reservation field, so has_reservation=0 (the DEFAULT) is the right value. Just noting this for reviewers since the column is explicitly included in the parent INSERT but silently defaulted here.
_jobs_with_reservations uses snapshot._fetchall (private method): The new function calls snapshot._fetchall() directly. This is consistent with how _tasks_with_attempts (already imported in controller.py) operates — it also builds raw SQL against snapshot internals. No issue, just noting the pattern.
Migration backfill is idempotent: The migration checks for existing has_reservation=1 rows before backfilling, and uses IF NOT EXISTS for the index. Restart-safe.
Partial index is well-designed: CREATE INDEX ... ON jobs(has_reservation, state) WHERE has_reservation = 1 will keep the index small since most jobs won't have reservations.
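The findings above (defaulted column, idempotent backfill, partial index) can be illustrated together in one small sketch. Everything here is an assumption made for illustration: the `migrate` function, the `fake_has_reservation` predicate standing in for protobuf decoding, the index name, and the column-existence check are hypothetical and do not reproduce the contents of `0013_has_reservation.py`.

```python
import sqlite3


def fake_has_reservation(request_proto: bytes) -> bool:
    # Stand-in for deserializing the request and checking its reservation
    # field. The b"R" prefix convention is invented for this sketch.
    return request_proto.startswith(b"R")


def migrate(conn: sqlite3.Connection, has_reservation_of) -> None:
    """Add has_reservation, backfill it once, and create a partial index."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    if "has_reservation" not in columns:
        conn.execute(
            "ALTER TABLE jobs ADD COLUMN has_reservation INTEGER NOT NULL DEFAULT 0"
        )

    # Skip the backfill if any row is already flagged (restart-safe).
    already = conn.execute(
        "SELECT 1 FROM jobs WHERE has_reservation = 1 LIMIT 1"
    ).fetchone()
    if already is None:
        rows = conn.execute("SELECT job_id, request_proto FROM jobs").fetchall()
        for job_id, proto in rows:
            if has_reservation_of(proto):
                conn.execute(
                    "UPDATE jobs SET has_reservation = 1 WHERE job_id = ?",
                    (job_id,),
                )

    # Partial index: only rows with has_reservation = 1 are indexed.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_jobs_has_reservation "
        "ON jobs(has_reservation, state) WHERE has_reservation = 1"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs ("
    "job_id TEXT PRIMARY KEY, state INTEGER NOT NULL, request_proto BLOB NOT NULL)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [("parent", 1, b"R:parent"), ("plain", 1, b"N:plain")],
)

migrate(conn, fake_has_reservation)
migrate(conn, fake_has_reservation)  # second run must be a no-op

# An INSERT that omits has_reservation, like the holder child job,
# picks up the column DEFAULT of 0.
conn.execute(
    "INSERT INTO jobs (job_id, state, request_proto) VALUES (?, ?, ?)",
    ("holder", 1, b"N:holder"),
)

flags = dict(conn.execute("SELECT job_id, has_reservation FROM jobs"))
index_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master "
    "WHERE type = 'index' AND name = 'idx_jobs_has_reservation'"
).fetchone()[0]
print(flags, index_count)
```

Running the migration twice leaves the same flags and a single index, and the row inserted without the column ends up with `has_reservation = 0`.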

No bugs, correctness issues, or guideline violations found. Clean PR.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a18b6333ed

Delete Table, Column, Predicate, SelectExpr, Order, Join, JoinedQuery and all 16 module-level table constants from db.py. Replace ~100 callsites across 8 production files and 12 test files with raw SQL + decode_rows/decode_one helpers. Every query is now visible as plain SQL at its callsite. Net -556 lines. Follows #4179 which fixed the immediate scheduling perf issue.

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
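To make the "raw SQL + decode helpers" pattern concrete, here is a minimal sketch. The helper names match those mentioned above, but their signatures and behavior here are guesses; the real `decode_rows`/`decode_one` in db.py are not shown on this page and may differ. `JobRow` is a hypothetical row type.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRow:
    # Hypothetical row type; field names must match the selected columns.
    job_id: str
    state: int
    has_reservation: int


def decode_rows(cls, cursor: sqlite3.Cursor) -> list:
    """Build one cls instance per row, keyed by the cursor's column names."""
    names = [col[0] for col in cursor.description]
    return [cls(**dict(zip(names, row))) for row in cursor.fetchall()]


def decode_one(cls, cursor: sqlite3.Cursor):
    """Return the first decoded row, or None if the query matched nothing."""
    rows = decode_rows(cls, cursor)
    return rows[0] if rows else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs ("
    "job_id TEXT PRIMARY KEY, state INTEGER NOT NULL, "
    "has_reservation INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [("job-a", 1, 0), ("job-b", 1, 1)],
)

# The query is plain SQL at the callsite; only decoding is shared.
reserved = decode_rows(
    JobRow,
    conn.execute(
        "SELECT job_id, state, has_reservation FROM jobs WHERE has_reservation = 1"
    ),
)
missing = decode_one(
    JobRow,
    conn.execute(
        "SELECT job_id, state, has_reservation FROM jobs WHERE job_id = ?",
        ("nope",),
    ),
)
print(reserved, missing)
```

The trade-off in this style is that column lists are repeated at each callsite, in exchange for every query being readable and greppable as SQL.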

rjpower added the agent-generated label (Created by automation/agent) Mar 26, 2026

rjpower requested a review from yonromai March 26, 2026 17:50

chatgpt-codex-connector Bot reviewed Mar 26, 2026

Comment thread lib/iris/src/iris/cluster/controller/migrations/0013_has_reservation.py

yonromai approved these changes Mar 26, 2026

rjpower merged commit 1e301c8 into main Mar 26, 2026
50 checks passed

rjpower deleted the rjpower/20260326-iris-opt branch March 26, 2026 17:55

rjpower mentioned this pull request Mar 26, 2026

[iris] Remove ORM query builder, replace with raw SQL #4181

Merged
