[iris] benchmark_controller: production-mix scenario over real RPC#5775
Conversation
Refactor lib/iris/scripts/benchmark_controller.py so every benchmark exercises the controller through its real Connect/HTTP surface rather than calling service/transitions helpers directly. The previous in-process call path missed serialization, ASGI dispatch, the @on_loop vs threadpool decision in AsyncServiceAdapter, and the GIL-shared cost of running clients alongside the server -- all of which dominate the production tail.

Major changes:

* New RpcHarness boots a real Controller(dry_run=True) backed by the cloned snapshot. The scheduler / polling / ping / heartbeat loops tick at their natural production cadence; only destructive writes are gated by dry_run.
* New SubprocessRpcHarness spawns the controller in a separate Python process via a new `serve` subcommand on the same script. The benchmark process owns the clients; the controller owns the event loop. This mirrors production, where ~200 task processes each push their own SetTaskStatusText from independent interpreters.
* New scenario abstraction (RpcLoad / Scenario / ScenarioRunner) that drives a *mix* of RPCs at independently configurable target rates. Each RpcLoad also carries an n_clients_min so we can match the production fleet's connection count, not just its throughput. (A sketch of this abstraction follows this description.)
* PRODUCTION_MIX_RPS preset derived from the controller stats page (SetTaskStatusText dominates at 316 rps; every other RPC runs at < 8 rps). Run via `... run --only scenario --scale 1.0 --scenario-duration 60`. --scale multiplies all rates uniformly for overload runs.
* 17 load_* factories cover every RPC in the production distribution. benchmark_rpcs now drives the per-call microbenches through the same factories, so request construction lives in one place.
* New benchmark_set_task_status_text group probes the @on_loop vs threadpool dispatch decision head-to-head and reports the blast radius a 1000 rps SetTaskStatusText storm imposes on a victim UpdateTaskStatus.

CLI is now a click.group: `... run [...]` for benchmarks, `... serve --db-path X --state-dir Y` for the controller subprocess.

Also bumps the _wait_until_job_drained backoff (0.05 -> 1.0 initial, 1.0 -> 10.0 max, 1.5 -> 2 factor) to reduce poll pressure on the heartbeat path during long replacement drains.
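To make the new abstraction concrete, here is a minimal sketch assuming plain dataclasses; `RpcLoad`, `Scenario`, `ScenarioRunner`, the per-RPC target rate, and `n_clients_min` come from the description above, while every other field name is illustrative:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RpcLoad:
    name: str                        # RPC name, e.g. "SetTaskStatusText"
    make_request: Callable[[], Any]  # one of the load_* factories
    target_rps: float                # independently configurable target rate
    n_clients_min: int = 1           # floor on concurrent client connections


@dataclass
class Scenario:
    loads: list[RpcLoad]
    duration_s: float = 60.0

    def scaled(self, scale: float) -> "Scenario":
        # --scale multiplies every load's rate uniformly for overload runs.
        return Scenario(
            loads=[
                RpcLoad(l.name, l.make_request, l.target_rps * scale, l.n_clients_min)
                for l in self.loads
            ],
            duration_s=self.duration_s,
        )
```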
Claude finished @rjpower's task in 4m 26s (code review).
Posted 5 inline comments on high-signal issues.
Items considered but not flagged (low signal or borderline).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da3d35b9fd
```python
line = self._proc.stdout.readline() if self._proc.stdout else ""
if not line:
```
Make subprocess startup timeout actually enforceable
SubprocessRpcHarness.__init__ uses self._proc.stdout.readline() inside the timeout loop, but readline() blocks until a newline/EOF. If the child process hangs before emitting a line, this call blocks indefinitely and the startup_timeout_s deadline is never checked, so benchmark_scenario can hang forever instead of failing fast.
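One way to make the deadline enforceable, sketched here rather than taken from the branch: pump the child's stdout from a daemon thread and wait on a queue with a real timeout, so a silent child fails fast instead of parking the caller forever in `readline()`. The helper name and ready-line protocol are assumptions.

```python
import queue
import subprocess
import threading
import time


def wait_for_ready_line(proc: subprocess.Popen, startup_timeout_s: float) -> str:
    lines: queue.Queue[str] = queue.Queue()

    def _pump() -> None:
        assert proc.stdout is not None
        for line in proc.stdout:  # blocks in the helper thread, not the caller
            lines.put(line)

    threading.Thread(target=_pump, daemon=True).start()
    deadline = time.monotonic() + startup_timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            proc.kill()
            raise TimeoutError(f"controller subprocess silent for {startup_timeout_s}s")
        try:
            return lines.get(timeout=remaining)
        except queue.Empty:
            continue  # re-check the deadline; the next pass raises
```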
```python
# Restore the function attribute to whatever the source file declares so
# subsequent groups in the same run aren't observably mutated.
_toggle_set_task_status_text_on_loop(False)
```
Restore SetTaskStatusText dispatch mode to source default
This cleanup call always forces set_task_status_text out of @on_loop mode for the rest of the process, but the source method is currently decorated with @on_loop. In the default run ordering, set_task_status_text runs before scenario, so the production-mix scenario is measured with the wrong dispatch mode and reports misleading latency/throughput numbers.
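A cleanup that restores the source default instead of hard-coding `False` could snapshot whatever the function declares before the group toggles it. The context-manager shape and raw attribute handling below are assumptions about how the `@on_loop` marker is stored:

```python
import contextlib

_UNSET = object()


@contextlib.contextmanager
def _on_loop_override(func, attr_name: str, enabled: bool):
    # Snapshot the source-declared value (or its absence) before mutating.
    original = getattr(func, attr_name, _UNSET)
    setattr(func, attr_name, enabled)
    try:
        yield
    finally:
        # Put back exactly what the source file declared, not a guess.
        if original is _UNSET:
            delattr(func, attr_name)
        else:
            setattr(func, attr_name, original)
```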
```python
tp_baseline, tp_under = _measure_update_task_status_under_storm(
    db, on_loop=False, storm_rps_target=storm_rps, storm_threads=storm_threads, victim_duration_s=victim_seconds
)
```
Run contention probe against a cloned controller DB
benchmark_set_task_status_text passes the shared db into _measure_update_task_status_under_storm, which repeatedly invokes update_task_status (a write path). Unlike the other write-heavy benchmark groups that clone the DB first, this mutates the same checkpoint backing the rest of the run, contaminating later groups (especially scenario) and the cached local benchmark DB state.
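The clone-first pattern the other write-heavy groups follow might look like this sketch; `_clone_checkpoint_db` is a hypothetical stand-in for whatever helper the script already uses to copy the checkpoint:

```python
import shutil
import tempfile
from pathlib import Path


def _clone_checkpoint_db(db_path: Path) -> Path:
    # Copy the checkpoint into a throwaway dir so the storm's
    # update_task_status writes never touch the shared benchmark DB.
    tmp_dir = Path(tempfile.mkdtemp(prefix="bench-db-clone-"))
    clone = tmp_dir / db_path.name
    shutil.copy2(db_path, clone)
    return clone
```

The probe would then open the clone and pass that handle into `_measure_update_task_status_under_storm`, leaving the cached checkpoint intact for later groups.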
```python
# Branch removed Tx.fetchall/fetchone; restore for this benchmark script.
if not hasattr(_Tx, "fetchall"):
    _Tx.fetchall = lambda self, stmt, params=None: self.execute(stmt, params).all()
    _Tx.fetchone = lambda self, stmt, params=None: self.execute(stmt, params).first()
from iris.cluster.controller.projections.endpoints import EndpointQuery, EndpointRow, EndpointsProjection
```
CLAUDE.md violation: ad-hoc compatibility hack via hasattr.
AGENTS.md says: "No ad-hoc compatibility hacks (hasattr(m, 'old_attr')); update code consistently." The leading comment ("Branch removed Tx.fetchall/fetchone; restore for this benchmark script.") makes this an exact instance of the banned pattern.
Two follow-on problems:
- The import on line 62 is displaced after the conditional block, breaking "All imports at the top of the file".
- Monkey-patching `_Tx` globally mutates the class for any code that imports it after this script runs (the `serve` subcommand is in the same module).
Fix: update this script's call sites to use `tx.execute(stmt, params).all()` / `.first()` directly (matching the rest of the codebase), then drop the patch and move the displaced import back to the top block.
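Sketched call-site shape for that fix; the SQL and table names below are placeholders, not the script's actual queries:

```python
def endpoint_rows(tx, job_id: str):
    # was: tx.fetchall(stmt, params)
    return tx.execute(
        "SELECT * FROM endpoints WHERE job_id = ?", (job_id,)
    ).all()


def endpoint_row(tx, name: str):
    # was: tx.fetchone(stmt, params)
    return tx.execute(
        "SELECT * FROM endpoints WHERE name = ?", (name,)
    ).first()
```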
```python
_SCENARIO_SCALE: float = 1.0
_SCENARIO_DURATION: float = 60.0
```
CLAUDE.md violation: mutable module globals used as an implicit parameter channel.
These look like top-level constants (ALL_CAPS) but are mutated by run_cmd via global at lines 2560-2562 and then read by benchmark_scenario at line 2325.
AGENTS.md requires: "Force explicit specification of critical parameters (no silent defaults). Centralize defaults in one canonical location. Prefer explicit constructor/config parameters over env vars." A global mutated by a CLI handler and read by a sibling benchmark function is the same anti-pattern this rule targets — the value flows invisibly between functions.
Suggested fix: thread scale / duration_s through benchmark_scenario(db, *, scale, duration_s) and the groups dispatch loop. The other group benchmarks don't need these params, so either accept **kwargs or branch on the group name to pass them only to benchmark_scenario.
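A sketch of threading the values explicitly; the signatures are assumptions about the script's current group-dispatch shape:

```python
def benchmark_scenario(db, *, scale: float, duration_s: float) -> None:
    # scale / duration_s are now explicit parameters, not module globals.
    ...


def run_groups(db, groups: dict, *, scale: float, duration_s: float) -> None:
    for name, bench in groups.items():
        if name == "scenario":
            bench(db, scale=scale, duration_s=duration_s)
        else:
            bench(db)  # the other groups don't consume these knobs
```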
| """ | ||
| from iris.cluster.controller.service import ControllerServiceImpl | ||
| from iris.rpc.async_adapter import _ON_LOOP_ATTR |
CLAUDE.md violation: local imports without a circular-dependency / optional-dep justification.
AGENTS.md requires: "All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps."
Neither symbol here is on a circular-import path:
- `ControllerServiceImpl` lives in `iris.cluster.controller.service`, which is already imported at lines 74-78 (for `USER_JOB_STATES` et al.).
- `_ON_LOOP_ATTR` is from `iris.rpc.async_adapter`, which the script does not yet import; adding it at the top is fine.
A similar local import sits in `_FakeProvider.reconcile_workers` at line 154 (`from iris.cluster.controller.worker_provider import WorkerReconcileResult`) — same pattern, please move that one too. There are also several `from iris.cluster.controller import db as db_mod` lines inside closures (around lines 1283, 1309, 1440, 1553) that should be hoisted.
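Hoisted, the top-of-file import block would gain these lines (paths taken from this review):

```python
from iris.cluster.controller import db as db_mod
from iris.cluster.controller.service import ControllerServiceImpl
from iris.cluster.controller.worker_provider import WorkerReconcileResult
from iris.rpc.async_adapter import _ON_LOOP_ATTR
```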
```python
@click.group()
def main() -> None:
    """Benchmark the Iris controller against a local checkpoint."""


@main.command("run")
```
The click.group() conversion makes the module docstring's Usage: examples stale.
The module docstring at lines 12-22 still shows:
```
uv run python lib/iris/scripts/benchmark_controller.py
uv run python lib/iris/scripts/benchmark_controller.py --db ./controller.sqlite3
uv run python lib/iris/scripts/benchmark_controller.py --only polling
```
After this change all three fail with `Error: No such option: --db` / `--only` — the flags now live on the `run` subcommand. Please update the docstring to use `... run --db ...` / `... run --only ...` (and consider listing the new `set_task_status_text` / `scenario` groups while you're there).
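A possible refresh of the docstring, assuming the flag spellings from the PR description; the `serve` arguments here are illustrative:

```python
"""Benchmark the Iris controller against a local checkpoint.

Usage:
    uv run python lib/iris/scripts/benchmark_controller.py run
    uv run python lib/iris/scripts/benchmark_controller.py run --db ./controller.sqlite3
    uv run python lib/iris/scripts/benchmark_controller.py run --only scenario --scale 1.0 --scenario-duration 60
    uv run python lib/iris/scripts/benchmark_controller.py run --only set_task_status_text
    uv run python lib/iris/scripts/benchmark_controller.py serve --db-path ./controller.sqlite3 --state-dir /tmp/iris-state
"""
```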
yonromai left a comment:
🤖 Requesting changes because auto-merge is enabled and there are clear correctness issues to resolve before merge. I did not repost the existing inline feedback, but I agree that the current branch can hang forever waiting for `serve` startup output, and that `benchmark_set_task_status_text` runs UpdateTaskStatus against the shared checkpoint DB instead of a clone, contaminating the cached/input benchmark state. I also verified the stale-usage comment: the script's documented pre-`run` `--db` / `--only` invocations are rejected by the new click group shape.
Generated with Codex
…cess

Every bench group now spawns the dry-run controller as a subprocess (the production shape: each client is its own interpreter). The in-process ``RpcHarness`` was kept only for ``benchmark_rpcs``'s endpoint-cache rehydrate trick, which we now drop — ``load_register_endpoint`` already mints unique endpoint names per call, so the table can grow during the bench without skewing measurements.

- Delete the in-process ``RpcHarness`` (and ``_find_free_port``), the ``_reset_endpoint`` callback, and the ``transitions``/``health``/``endpoints``/``worker_attrs`` projection exposure that only worked in-process.
- Rename ``SubprocessRpcHarness`` → ``RpcHarness``.
- Wire ``--on-loop`` through ``serve_cmd`` so the SetTaskStatusText ``@on_loop`` toggle now happens in the child before the controller boots (the previous in-parent patch can't reach the child interpreter). ``_run_set_status_text_variant`` and ``_measure_update_task_status_under_storm`` pass ``on_loop`` through the harness constructor.

Verification: ``scenario --scale 0.1 --scenario-duration 8``, the full ``rpcs`` group, and the full ``set_task_status_text`` group (both ``on_loop`` modes) all run end-to-end against the new subprocess harness.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
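The child-side toggle this commit describes might look like the following sketch. `_toggle_set_task_status_text_on_loop` is named earlier in this thread and the `--on-loop` flag spelling comes from the commit message, while `_boot_controller` is a hypothetical stand-in for the serve path's startup code:

```python
import click


@click.group()
def main() -> None:
    """Benchmark the Iris controller against a local checkpoint."""


@main.command("serve")
@click.option("--db-path", required=True)
@click.option("--state-dir", required=True)
@click.option(
    "--on-loop/--no-on-loop",
    default=True,
    help="Dispatch SetTaskStatusText via @on_loop (vs the threadpool).",
)
def serve_cmd(db_path: str, state_dir: str, on_loop: bool) -> None:
    # Flip the dispatch mode in the child interpreter, before the controller
    # boots; an in-parent patch can't reach this process.
    _toggle_set_task_status_text_on_loop(on_loop)
    _boot_controller(db_path=db_path, state_dir=state_dir)  # hypothetical helper
```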
yonromai left a comment:
🤖 Still requesting changes on f383e13dd8bf5402200d201a32e03d42c5896254. The follow-up commit fixes the shared-checkpoint mutation concern by cloning before the SetTaskStatusText probes, and it moves the `@on_loop` toggle into the subprocess path. The blocking startup issue is still present, though: `RpcHarness.__init__` still calls blocking `stdout.readline()` inside the timeout loop, so a child that hangs before writing a newline can still hang the benchmark forever. I also rechecked the CLI usage issue; the module doc still advertises `benchmark_controller.py --db ...` / `--only ...`, and those options are still rejected unless they are placed after `run`.
Generated with Codex
Summary
- Refactors `lib/iris/scripts/benchmark_controller.py` so every benchmark exercises the controller through its real Connect/HTTP surface (serialization, ASGI dispatch, `@on_loop` vs threadpool, real network) rather than calling service/transitions helpers directly in-process.
- New `scenario` abstraction (`RpcLoad` / `Scenario` / `ScenarioRunner`) that drives a mix of RPCs at independently-configurable target rates, with a `PRODUCTION_MIX_RPS` preset derived from the controller stats page (SetTaskStatusText dominates at 316 rps, every other RPC < 8 rps). `--scale` multiplies all rates for overload runs.
- Bumps `_wait_until_job_drained`'s backoff in `service.py` so long replacement drains don't hammer the heartbeat path.

What's new
- `RpcHarness`: boots `Controller(dry_run=True)` in-process; production-cadence loops, destructive writes gated.
- `SubprocessRpcHarness`: spawns the controller via the `serve` subcommand. Parent owns clients, child owns the event loop.
- `RpcLoad` / `Scenario` / `ScenarioRunner`: `n_clients_min` knob so e.g. SetTaskStatusText runs from 200 connection threads, matching the production fleet's connection count.
- `load_*` factories: `benchmark_rpcs` now drives its per-call microbenches through the same factories.
- `benchmark_scenario` group: `... run --only scenario --scale 1.0 --scenario-duration 60`.
- `benchmark_set_task_status_text` group: `@on_loop` vs threadpool comparison + blast-radius probe (1000 RPS SetTaskStatusText storm vs victim UpdateTaskStatus).

CLI is now a click.group:
- `... run [--only X] [--scale Y] [--scenario-duration Z]`: benchmark groups
- `... serve --db-path X --state-dir Y`: controller subprocess used by `SubprocessRpcHarness`
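On the parent side, spawning that subprocess might look like this sketch (command shape from the summary above; the `sys.executable` wiring and function name are assumptions):

```python
import subprocess
import sys


def spawn_controller(db_path: str, state_dir: str) -> subprocess.Popen:
    # Parent owns the clients; the child owns the controller's event loop.
    return subprocess.Popen(
        [
            sys.executable,
            "lib/iris/scripts/benchmark_controller.py",
            "serve",
            "--db-path", db_path,
            "--state-dir", state_dir,
        ],
        stdout=subprocess.PIPE,
        text=True,  # line-oriented ready/heartbeat output
    )
```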