[iris] Cache autoscaler pending hints per evaluate() cycle #4848

Merged
rjpower merged 1 commit into main from agent/20260416-fix-4844
Apr 16, 2026

@claude claude bot commented Apr 16, 2026

GetJobStatus rebuilt the full AutoscalerStatus proto on every dashboard poll, re-running routing_decision_to_proto over every demand entry and unmet entry. Cache the routing-decision proto and derived pending-hint dict on Autoscaler, invalidated in evaluate(). Repeated polls between scheduling cycles now hit cached objects instead of reserializing the whole plan.

Fixes #4844
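
For readers without the diff open, the caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in runtime.py: the `Autoscaler` class here is a stand-in, the routing decision is modeled as a plain dict of job id to reason, and `pending_hints()` and `build_count` are hypothetical names added for the example. Only `evaluate()` and the `_last_pending_hints` attribute name come from this PR's discussion.

```python
from __future__ import annotations

from typing import Optional


class Autoscaler:
    """Stand-in autoscaler showing a cache that lives for one evaluate() cycle."""

    def __init__(self) -> None:
        # Hypothetical model: the routing decision is a dict of
        # job id -> reason the job is still pending.
        self._last_routing_decision: dict[str, str] = {}
        self._last_pending_hints: Optional[dict[str, str]] = None
        self.build_count = 0  # illustration only: counts expensive rebuilds

    def evaluate(self, routing_decision: dict[str, str]) -> None:
        # A new scheduling cycle produces a new decision, so drop the cache.
        self._last_routing_decision = routing_decision
        self._last_pending_hints = None

    def pending_hints(self) -> dict[str, str]:
        # Rebuild at most once per cycle; later polls reuse the same dict.
        if self._last_pending_hints is None:
            self.build_count += 1
            self._last_pending_hints = dict(self._last_routing_decision)
        return self._last_pending_hints


autoscaler = Autoscaler()
autoscaler.evaluate({"job-1": "no capacity in zone"})
first_poll = autoscaler.pending_hints()
second_poll = autoscaler.pending_hints()  # served from the cache
```

In this sketch the cost of building the hints is paid once per evaluate() call rather than once per poll, which is the behavior the description claims for the real change.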

@claude claude bot added the agent-generated Created by automation/agent label Apr 16, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0b1d1045f

Comment thread lib/iris/src/iris/cluster/controller/autoscaler/runtime.py
@rjpower rjpower requested a review from ravwojdyla April 16, 2026 23:20

rjpower commented Apr 16, 2026

@claude rebase to main pls


claude bot commented Apr 16, 2026

Claude finished @rjpower's task in 1m 31s


Rebase to main ✅

  • Fetch latest main (had to unshallow)
  • Rebase branch onto origin/main (clean, no conflicts)
  • Force-push to agent/20260416-fix-4844

Summary: Rebased onto c3429008a ([iris] Move periodic checkpoint to its own thread...) and force-pushed. The single PR commit now sits on top of main as 3a2e49840. No conflicts had to be resolved.

No code changes made, so no tests were run. The outstanding Codex review comment about ordering around _last_scale_plan / _last_pending_hints / _last_routing_decision_proto in runtime.py:257 is unaddressed — let me know if you want me to tackle that next.
• Branch: agent/20260416-fix-4844

GetJobStatus rebuilt the full AutoscalerStatus proto on every dashboard
poll, re-running routing_decision_to_proto over every demand entry and
unmet entry. Cache the routing-decision proto and derived pending-hint
dict on Autoscaler, invalidated in evaluate().

Fixes #4844
@claude claude bot force-pushed the agent/20260416-fix-4844 branch from b0b1d10 to 3a2e498 Compare April 16, 2026 23:23
@rjpower rjpower merged commit 7bf6c3b into main Apr 16, 2026
37 of 39 checks passed
@rjpower rjpower deleted the agent/20260416-fix-4844 branch April 16, 2026 23:29
rjpower added a commit that referenced this pull request Apr 17, 2026
The hint was dropped from GetJobStatus because rebuilding the full
routing table per call was 35% of wall time. #4848 now caches the
per-job hint dict per evaluate() cycle, so the lookup is a single
dict get — attach this job's hint without serializing the routing
decision.
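
A rough sketch of the "single dict get" lookup this commit message describes, under stated assumptions: `job_status`, the `pending_reason` key, and the dict-shaped status are hypothetical stand-ins for the real GetJobStatus handler and its proto response, and `pending_hints` stands for the per-cycle cached dict.

```python
from __future__ import annotations


def job_status(job_id: str, state: str, pending_hints: dict[str, str]) -> dict[str, str]:
    """Build a status record, attaching this job's hint if one is cached."""
    status = {"job_id": job_id, "state": state}
    # One dictionary lookup; the routing decision is never serialized here.
    hint = pending_hints.get(job_id)
    if hint is not None:
        status["pending_reason"] = hint
    return status


# Hypothetical cached hints, as they might look after one evaluate() cycle.
cached_hints = {"job-1": "no capacity in zone"}
pending = job_status("job-1", "PENDING", cached_hints)
running = job_status("job-2", "RUNNING", cached_hints)
```

Jobs with no cached hint simply get a status without the extra field, so the handler does constant work per call regardless of how large the routing table is.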
Labels

agent-generated Created by automation/agent

Development

Successfully merging this pull request may close these issues.

iris: autoscaler pending hints are too expensive

2 participants