
[iris] Remove 8 GB Docker memory cap on build containers #4625

Merged
rjpower merged 2 commits into main from iris/remove-build-memory-limit
Apr 10, 2026



@rjpower
Collaborator

@rjpower rjpower commented Apr 10, 2026

uv sync on a large workspace OOMs against the fixed 8 GB _BUILD_MEMORY_LIMIT_BYTES ceiling. Confirmed via dmesg on the worker: CONSTRAINT_MEMCG killed uv at exactly 8 GB while the host had 1.4 TB free. Remove the artificial limit so build containers are bounded only by host memory.

uv sync on a large workspace exceeds the 8 GB _BUILD_MEMORY_LIMIT_BYTES
ceiling. The host OOM killer fires against the container cgroup, not host
memory (confirmed: CONSTRAINT_MEMCG in dmesg on a worker with 1.4 TB free).
Remove the artificial limit so builds use whatever the host has available.
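To tell a cgroup-limit kill apart from genuine host memory exhaustion, the field to check is constraint= in the kernel's oom-kill summary line. The sketch below assumes the comma-separated oom-kill:key=value format printed by recent Linux kernels (field layout can vary by version); parse_oom_kill and is_cgroup_oom are hypothetical helpers, and the sample line is constructed for illustration, not copied from the worker in question.

```python
from __future__ import annotations


def parse_oom_kill(line: str) -> dict[str, str] | None:
    """Return the key=value fields of a kernel oom-kill line, or None if absent.

    Assumes the "oom-kill:constraint=...,task=...,pid=..." layout; this is a
    sketch, not a robust dmesg parser.
    """
    marker = "oom-kill:"
    idx = line.find(marker)
    if idx == -1:
        return None
    fields: dict[str, str] = {}
    for part in line[idx + len(marker):].strip().split(","):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments without '='
            fields[key.strip()] = value.strip()
    return fields


def is_cgroup_oom(line: str) -> bool:
    """True if the kill came from a memory cgroup limit, not host exhaustion."""
    fields = parse_oom_kill(line)
    return fields is not None and fields.get("constraint") == "CONSTRAINT_MEMCG"


# Constructed sample line (illustrative only).
sample = (
    "[12345.678901] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),"
    "cpuset=/,mems_allowed=0,oom_memcg=/docker/abc,task_memcg=/docker/abc,"
    "task=uv,pid=4242,uid=0"
)
print(is_cgroup_oom(sample))           # True
print(parse_oom_kill(sample)["task"])  # uv
```

A CONSTRAINT_NONE kill would instead point at the host itself running out of memory, which is the case this PR rules out.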
@rjpower rjpower added the agent-generated Created by automation/agent label Apr 10, 2026
@claude
Contributor

claude Bot commented Apr 10, 2026

Claude finished @rjpower's task in 1m 34s.


Review of PR #4625

  • Read the diff and changed file
  • Review for correctness and safety
  • Post review feedback

LGTM — the change is correct and well-motivated. The 8 GB fixed ceiling was clearly too low for large-workspace uv sync, and the dmesg evidence (CONSTRAINT_MEMCG kill at exactly 8 GB with 1.4 TB free) is conclusive.

One subtlety to flag

Passing memory_limit_mb=None to _docker_create means the build container now falls through to:

effective_memory_mb = memory_limit_mb or config.get_memory_mb()  # line 683

This has two cases:

| Task memory configured? | Old behavior | New behavior |
|---|---|---|
| No (get_memory_mb() → None) | Build capped at 8 GB | No Docker --memory flag → bounded only by host cgroup ✅ |
| Yes (e.g. task requests 200 GB) | max(8 GB, 200 GB) = 200 GB | 200 GB (from task config) ✅ |
| Yes (e.g. task requests 4 GB) | max(8 GB, 4 GB) = 8 GB | 4 GB (from task config), lower than before |
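The three rows can be reproduced with a small model of the two code paths. This is a sketch under stated assumptions: old_build_cap_mb assumes the old code computed max(8 GB floor, task request) with a missing request treated as 0, and new_build_cap_mb mirrors the effective_memory_mb fallback quoted above. Neither function is the actual Iris source; both names are hypothetical.

```python
from __future__ import annotations

GIB_IN_MB = 1024
OLD_BUILD_FLOOR_MB = 8 * GIB_IN_MB  # the former 8 GB build limit, in MB


def old_build_cap_mb(task_memory_mb: int | None) -> int:
    # Assumed old path: build limit = max(8 GB floor, task request).
    return max(OLD_BUILD_FLOOR_MB, task_memory_mb or 0)


def new_build_cap_mb(task_memory_mb: int | None) -> int | None:
    # New path: the build passes memory_limit_mb=None, so the fallback
    # `memory_limit_mb or config.get_memory_mb()` picks up the task request.
    memory_limit_mb = None
    effective_memory_mb = memory_limit_mb or task_memory_mb
    # A falsy result means the `if effective_memory_mb:` guard omits --memory.
    return effective_memory_mb or None


for task_mb in (None, 200 * GIB_IN_MB, 4 * GIB_IN_MB):
    print(task_mb, old_build_cap_mb(task_mb), new_build_cap_mb(task_mb))
# None 8192 None
# 204800 204800 204800
# 4096 8192 4096
```

The last line is the regression case: a 4 GB task now builds under a 4096 MB cap instead of 8192 MB.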

The third case is a mild regression: if a task requests less than 8 GB, the build container now gets a lower cap than it used to. In practice this is probably fine — Iris tasks on beefy hosts almost certainly request more than 8 GB — but it's worth being aware of. If this could bite anyone, an easy fix is:

memory_limit_mb=None,  # → skip the docker --memory flag entirely for builds

and then either branch inside _docker_create on the phase label, or treat memory_limit_mb=None (as distinct from 0 or a positive int) as "do not pass --memory". As lines 683-685 stand, None or None → None, and the if effective_memory_mb: guard does skip --memory entirely, but that path is only taken when the task has no memory resource set.

If the intent is "builds should never have a Docker memory cap", the cleanest approach would be to add a skip_memory_limit: bool = False parameter (or sentinel) to _docker_create so the build path explicitly opts out rather than relying on the None-passthrough. But this is a nit — the current behavior is safe for the common case.
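A minimal sketch of that opt-out, assuming the --memory flag is assembled from the two inputs shown at line 683. docker_memory_args and its skip_memory_limit parameter are hypothetical names for illustration, not existing Iris code; the m suffix is Docker's unit for megabytes in --memory values.

```python
from __future__ import annotations


def docker_memory_args(
    memory_limit_mb: int | None,
    task_memory_mb: int | None,
    skip_memory_limit: bool = False,
) -> list[str]:
    """Return the --memory portion of a `docker create` command line.

    Hypothetical helper: skip_memory_limit is the explicit opt-out suggested
    in the review, not an existing _docker_create parameter.
    """
    if skip_memory_limit:
        # Build path: never pass --memory, so the container is bounded only
        # by the host (and any parent cgroup), regardless of task config.
        return []
    # Same fallback as line 683: explicit limit first, then the task request.
    effective_memory_mb = memory_limit_mb or task_memory_mb
    if effective_memory_mb:
        return ["--memory", f"{effective_memory_mb}m"]
    return []


# Build container for a task that requests 4 GB: no cap at all.
print(docker_memory_args(None, 4096, skip_memory_limit=True))  # []
# Run container for the same task: capped at the task request.
print(docker_memory_args(None, 4096))  # ['--memory', '4096m']
```

The explicit flag keeps the build path's intent visible at the call site, rather than overloading None to mean both "fall back to the task request" and "no limit".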

Summary

  • Constant removal is clean; no other references to _BUILD_MEMORY_LIMIT_BYTES remain.
  • Comment explaining the rationale is clear and includes the diagnostic evidence.
  • Run containers are unaffected (they never used _BUILD_MEMORY_LIMIT_BYTES).
  • No test changes needed — this is a runtime configuration change.

Ship it. 🚢


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2e2805e65


 command=["bash", "/app/_setup_env.sh"],
 label_suffix="_build",
-memory_limit_mb=build_memory_mb,
+memory_limit_mb=None,

P1: Skip memory fallback when removing build limits

Passing memory_limit_mb=None here does not actually remove the Docker memory cap: _docker_create() computes effective_memory_mb = memory_limit_mb or config.get_memory_mb() and still sets --memory whenever the task specifies resources.memory_bytes. That means build containers for memory-scoped tasks remain capped (and tasks below 8 GB now get a tighter limit than before), so uv sync can still OOM despite this change’s stated goal of running build containers without an artificial limit.


@rjpower
Collaborator Author

rjpower commented Apr 10, 2026

@claude let's just bump from 8G -> 32G instead

@claude
Contributor

claude Bot commented Apr 10, 2026

Claude finished @rjpower's task in 2m 37s.


Bumped build memory limit from 8 GB → 32 GB

  • Read current file state on branch
  • Restore _BUILD_MEMORY_LIMIT_BYTES constant at 32 GB and the max(limit, task_request) logic
  • Run pre-commit checks — all passed
  • Run relevant tests — 6/6 passed (lib/iris/tests/cluster/runtime/test_docker_runtime.py)
  • Commit and push

Replaced the "remove limit entirely" approach with a simple bump: _BUILD_MEMORY_LIMIT_BYTES goes from 8 * 1024**3 → 32 * 1024**3. The max(32 GB, task_request) logic is restored so build containers always get at least 32 GB, or whatever the task requests if higher. Updated the comment to note the old 8 GB OOM.
Branch: iris/remove-build-memory-limit
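The restored floor logic can be sketched as follows. Only the constant name _BUILD_MEMORY_LIMIT_BYTES comes from the PR; build_memory_limit_mb is a hypothetical helper, and it assumes the task request arrives in bytes (as resources.memory_bytes suggests) and that the Docker limit is passed in MB. The PR's "GB" figures are binary (1024**3 bytes).

```python
from __future__ import annotations

# Constant name from the PR; everything else here is an illustrative sketch.
_BUILD_MEMORY_LIMIT_BYTES = 32 * 1024**3


def build_memory_limit_mb(task_memory_bytes: int | None) -> int:
    """Build containers get at least 32 GiB, or the task request if higher."""
    limit_bytes = max(_BUILD_MEMORY_LIMIT_BYTES, task_memory_bytes or 0)
    return limit_bytes // (1024 * 1024)  # bytes → MB for docker --memory


print(build_memory_limit_mb(None))           # 32768
print(build_memory_limit_mb(4 * 1024**3))    # 32768
print(build_memory_limit_mb(200 * 1024**3))  # 204800
```

Unlike the None-passthrough, a floor never drops below 32 GB, so small-request tasks no longer get a tighter build cap than large ones.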

Restore the max(limit, task_request) logic but with a 32 GB floor
instead of 8 GB. The old ceiling caused uv sync OOMs on large
workspaces (confirmed via CONSTRAINT_MEMCG in dmesg).

Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
@rjpower rjpower merged commit 3aa1607 into main Apr 10, 2026
40 checks passed
@rjpower rjpower deleted the iris/remove-build-memory-limit branch April 10, 2026 20:04