fix(flotilla): wire min_cpu_per_task into TaskResourceRequest by XiaoHongbo-Hope · Pull Request #7125 · Eventual-Inc/Daft

XiaoHongbo-Hope · 2026-06-13T13:14:14Z

Changes Made

Fixes min_cpu_per_task (DAFT_MIN_CPU_PER_TASK) being a no-op in the flotilla scheduler — it had no readers.

Wiring: TaskResourceRequest::num_cpus() falls back to the configured min_cpu_per_task instead of a hardcoded 1.0, and
LocalPhysicalPlan::resource_request() leaves num_cpus unset so the fallback applies. Default bumped 0.5 → 1.0 to keep existing
behavior.
Autoscaler: Ray request_resources (pinned ray==2.55.1) only accepts integer bundles, so fractional CPU/GPU demand is
aggregated into ceil(sum) integer bundles — N tasks at 0.1 request ceil(0.1*N) units, not N (issue min_cpu_per_task config has no effect (unwired in flotilla scheduler) #7123). Sub-GPU tasks pack
into {"CPU","GPU":1} bundles that carry their co-located CPU; sub-CPU tasks into {"CPU":1} bundles; memory / whole-unit (>= 1)
tasks keep an individual bundle. The high-water mark tracks the post-aggregation integer request, so each cycle escalates by a whole
unit and the ramp converges instead of stalling.
Validation: reject non-finite / non-positive min_cpu_per_task (Python + env, shared rule) and non-finite / negative
num_cpus/num_gpus in ResourceRequest.

Related Issues

greptile-apps · 2026-06-13T13:16:44Z

Greptile Summary

This PR fixes a long-standing bug where min_cpu_per_task (also settable via DAFT_MIN_CPU_PER_TASK) had no effect on the flotilla/Swordfish distributed scheduler — TaskResourceRequest::num_cpus() always fell back to the hardcoded literal 1.0. The fix threads min_cpu_per_task from DaftExecutionConfig into TaskResourceRequest and uses it as a proper floor.

Core wiring (task.rs): SwordfishTaskBuilder now passes self.config.min_cpu_per_task to TaskResourceRequest::new, and num_cpus() uses .max(min_cpu_per_task) so the floor applies even when a plan supplies an explicit but smaller CPU value. Three new unit tests cover all cases.
Default fix (lib.rs): The config default is corrected from 0.5 → 1.0, restoring the original observable behaviour for users who never set the config; val > 0.0 validation rejects zero/negative env values.
Python validation (python.rs): set_execution_config(min_cpu_per_task=...) now raises PyValueError on ≤ 0 values; docstring updated in context.py.

Confidence Score: 5/5

Safe to merge; the change correctly restores the pre-existing 1.0 CPU default and wires the config field that was previously silently ignored.

The fix is narrow and well-tested: three new unit tests cover all branches of the floor logic, the default is restored to 1.0 (matching the literal that was there before), Python and env-var paths both validate the > 0 constraint, and MockTaskBuilder::with_resource_request now correctly preserves min_cpu_per_task. SwordfishTaskBuilder is the only production site constructing TaskResourceRequest, and it already carried the config it needed.

No files require special attention.

Important Files Changed

Filename	Overview
src/daft-distributed/src/scheduling/task.rs	Core fix: adds `min_cpu_per_task` field to `TaskResourceRequest`, changes `num_cpus()` to apply it as a floor via `.max()`, wires it from `SwordfishTaskBuilder.config` at task creation. Three new unit tests cover all cases.
src/common/daft-config/src/lib.rs	Default changed from 0.5 to 1.0 (restores the old TaskResourceRequest fallback behaviour). Env-var loading gains a `val > 0.0` guard that silently rejects zero/negative values.
src/common/daft-config/src/python.rs	Adds an explicit `PyValueError` guard rejecting `min_cpu_per_task <= 0.0` from Python.
daft/context.py	Docstring updated to reflect the new 1.0 default, floor semantics, and the `> 0` constraint.

_{Reviews (2): Last reviewed commit: "fix(clippy): collapse let-chain in min_c..." | Re-trigger Greptile}

greptile-apps · 2026-06-13T13:16:48Z

    pub fn num_cpus(&self) -> f64 {
-        self.resource_request.num_cpus().unwrap_or(1.0)
+        self.resource_request
+            .num_cpus()
+            .unwrap_or(self.min_cpu_per_task)
    }


Silent default-CPU reduction from 1.0 → 0.5

The old fallback was the literal 1.0; DaftExecutionConfig::default().min_cpu_per_task is 0.5. Any deployment that relied on the previous implicit 1.0-CPU bundle request will now submit 0.5-CPU bundles without any config change. On an autoscaler that uses CPU headroom to decide when to launch new nodes, halving the per-task CPU request could double the number of tasks packed per node before a scale-out fires, increasing the likelihood of OOM or throttled execution. The PR description notes this trade-off, but the change in default observable behaviour is worth confirming with the team before merging.

greptile-apps · 2026-06-13T13:16:49Z

        pub fn with_resource_request(mut self, resource_request: ResourceRequest) -> Self {
-            self.resource_request = TaskResourceRequest::new(resource_request);
+            self.resource_request = TaskResourceRequest::new(
+                resource_request,
+                DaftExecutionConfig::default().min_cpu_per_task,
+            );
            self


with_resource_request ignores the builder's own config

MockTaskBuilder::with_resource_request constructs a fresh DaftExecutionConfig::default() to obtain the min_cpu_per_task fallback instead of reusing whatever min_cpu_per_task the calling test may have injected into self.resource_request already. Any test that constructs a MockTask with a non-default min_cpu_per_task and later calls with_resource_request will silently reset the fallback to 0.5. This won't break current tests (none set a custom value), but it makes the helper subtly inconsistent and could mask future test regressions. Consider storing the min_cpu_per_task on the builder and reusing it here.

The min_cpu_per_task execution config field had no readers in the distributed scheduler: TaskResourceRequest::num_cpus() returned a hardcoded 1.0 when the plan's ResourceRequest had no num_cpus. This was wired up in Eventual-Inc#4506 for the legacy ray runner only; Eventual-Inc#5375 removed that runner along with its lone reader, and the flotilla scheduler was never wired in. Result: setting min_cpu_per_task via daft.set_execution_config or DAFT_MIN_CPU_PER_TASK had zero effect on autoscaler bundle requests. Plumb the value from DaftExecutionConfig (already in scope at SwordfishTaskBuilder::build) into TaskResourceRequest, and use it as the fallback in num_cpus() instead of the literal 1.0. Closes Eventual-Inc#7123

- Comment said 'Floor' but the implementation is 'Default-when-None' (explicit num_cpus is honored as-is). Reword to match. - Add two unit tests: * num_cpus falls back to min_cpu_per_task when ResourceRequest is empty * explicit num_cpus passes through unchanged

Before this PR, the flotilla scheduler fell back to a hardcoded 1.0 CPU when ResourceRequest had no num_cpus; the configured min_cpu_per_task default of 0.5 was inert. Wiring the field through (1d05985) without changing the default would silently halve the per-task CPU floor for every existing user, which Greptile's review flagged as a behaviour change risk on capacity-tuned clusters (e.g. KubeRay packs 2x more tasks before scale-out, increasing OOM risk). Move the default to 1.0 so the no-explicit-num_cpus path matches the pre-wiring behaviour exactly. Users who want a smaller floor set it explicitly via daft.set_execution_config(min_cpu_per_task=...) or DAFT_MIN_CPU_PER_TASK, which is the original purpose of the knob.

…uest MockTaskBuilder.with_resource_request was constructing a fresh DaftExecutionConfig::default() to fetch the min_cpu_per_task fallback, which silently reset any non-default value already set on the builder. Reuse self.resource_request.min_cpu_per_task instead so the fallback threads through chained .with_* calls correctly. Test-helper-only; no production behavior change. Addresses greptile P2 review comment.

…e values

…ntics

…fallback applies

…inite min_cpu_per_task Two follow-ups from review of the min_cpu_per_task wiring: - The Ray autoscaler bundle still rounded CPU up via `num_cpus().ceil()`, so a configured `min_cpu_per_task=0.1` reached `request_resources` as `{"CPU": 1}` (issue Eventual-Inc#7123). Add `TaskResourceRequest::autoscale_bundle` which keeps CPU fractional (GPU/memory stay integer, zero values dropped) and build the Ray bundle as a `PyDict` with a float CPU. Update the `try_autoscale` type hint accordingly. - `min_cpu_per_task` validation let NaN/inf through (`nan <= 0` and `inf <= 0` are both false). Use `is_finite() && > 0` on both the Python setter and the env path. Add Rust unit tests for the fractional bundle and env validation, and a Python test covering acceptance of fractional values and rejection of 0/-0.5/nan/inf.

…undles Ray's request_resources (<= 2.55; Daft pins ray==2.55.1) rejects non-integer bundle values: `isinstance(bundle[key], int)` raises `TypeError: each bundle key should be str and value as int.`. Sending a float CPU therefore crashes at runtime — even a whole 1.0, since pyo3 emits a Python float. Replace the per-task float bundle with `aggregate_ray_bundles`: CPU-only tasks have their fractional CPU summed and emitted as ceil(sum) unit {"CPU": 1} bundles, so N tasks at 0.1 CPU request ceil(0.1*N) CPUs instead of N (issue Eventual-Inc#7123), while never sending a non-integer value. Tasks carrying GPU or memory keep an individual bundle (CPU rounded up) since those resources pin placement to a node. Revert the try_autoscale type hint to dict[str, int]. Replace the fractional bundle unit tests with aggregation tests.

aggregate_ray_bundles packed every CPU-only task into unit {"CPU": 1} bundles. That is wrong for a task requesting num_cpus >= 1: a 4-CPU task runs on one worker, so splitting it into 4 spread bundles lets the autoscaler provision 4 single-CPU nodes and leaves the task unschedulable. It also turned CPU magnitude into the loop count, so a huge or non-finite explicit num_cpus (inf as i64 == i64::MAX) could hang/OOM, and a NaN poisoned the running sum and zeroed the batch's CPU request. Only pack sub-1.0 CPU-only tasks now; tasks with GPU, memory, or num_cpus >= 1 keep an individual bundle (CPU rounded up to at least 1). Non-finite / non-positive CPU contributes nothing. The packed sum is now bounded by task count, so the loop can no longer blow up.

…r mark The high-water mark recorded the fractional cpu_sum, but the request actually sent to Ray is the integer-aggregated bundle total. With min_cpu_per_task=0.1 the mark grew ~0.1 per cycle while ceil() only bumped the real CPU request every ~10 cycles, so scale-up for many pending tasks stalled for ~1/min_cpu_per_task cycles (≈50s at the default 5s interval) per extra CPU. Record the aggregated integer bundle totals (what Ray actually receives) as the mark instead. Because each cycle selects bundles until the fractional cpu_sum exceeds the integer mark, ceil() now bumps by at least one CPU every cycle, restoring the intended one-unit-per-cycle ramp while still never requesting less than before. Convergence is unchanged: once pending demand can no longer exceed the mark, the cycle is skipped. Verified: cargo test -p daft-distributed --lib (8 task tests pass), cargo check/clippy -p daft-distributed --features python clean.

try_new_internal only checked num_gpus for negativity, so an explicit num_cpus = inf/NaN (or negative) flowed through. Downstream that became a bundle of {"CPU": i64::MAX} (inf as i64 saturates) in the autoscaler. Require both num_cpus and num_gpus to be finite and nonnegative; this also catches a NaN num_gpus that the existing >1-must-be-integer check missed.

aggregate_ray_bundles only packed fractional CPU; fractional-GPU tasks (Daft supports num_gpus<1) were each emitted as a full {"GPU":1} bundle. Combined with the high-water mark comparing raw fractional gpu_sum against the integer mark, 11 tasks at 0.1 GPU would request 11 GPUs instead of ceil(1.1)=2 — likely tripping "requested bundles exceed max capacity, autoscaler refuses all". Sum sub-GPU demand into ceil(sum) {"CPU":1,"GPU":1} bundles (their small CPU rides on the GPU node, not double-counted); memory/multi-unit tasks still keep an individual placement-pinned bundle. Extract the select-then-aggregate ramp into a pure next_autoscale_request() in task.rs so it's unit-testable outside the python-gated worker manager; add tests for whole-unit CPU/GPU escalation, GPU packing, and the skip-below-mark case. Tighten comments. Verified: cargo test -p daft-distributed --lib (78 pass) and -p common-resource-request; cargo check/clippy --features python clean.

Packing sub-GPU tasks dropped their CPU demand entirely: a task with num_gpus<1 and the min_cpu_per_task fallback (e.g. @daft.cls(gpus=0.5) -> num_cpus=None -> 1.0 CPU) only contributed to fractional_gpu_sum, so the {CPU:1,GPU:1} bundles under-counted CPU. Worse, next_autoscale_request selects on raw cpu_sum while the high-water mark records the post-aggregation totals, so two such tasks at mark 1/1 kept re-requesting {CPU:1,GPU:1} without the mark ever growing — the cluster could stall and never provision the second CPU, leaving tasks unschedulable. Accumulate the packed tasks' CPU (gpu_cpu_sum) and spread it across the GPU bundles as ceil(gpu_cpu_sum / gpu_bundles). The request now reflects real CPU demand, so the recorded mark grows whenever selection triggers and the ramp converges. Add a regression test for the gpus=0.5 + fallback-CPU stall. Verified: cargo test -p daft-distributed --lib (13 task tests) and cargo check/clippy --features python clean.

The "finite and > 0" check for min_cpu_per_task was duplicated in the env path and the Python setter, which could drift. Extract it into a single DaftExecutionConfig::is_valid_min_cpu_per_task and call it from both.

Carrying GPU tasks' CPU as ceil(gpu_cpu_sum / gpu_bundles) produced bundles like {CPU:2, GPU:1}. As a single Ray request_resources shape that fits no standard 1-CPU/1-GPU node, so the autoscaler can't scale up — and the value is recorded as the high-water mark, stalling further attempts. Emit unit {CPU:1, GPU:1} bundles instead (a sub-GPU task's cpu and gpu are each <= 1, so one always fits a standard GPU node), with the count covering both dimensions: ceil(max(gpu_sum, gpu_cpu_sum)). Two 1-CPU/0.5-GPU tasks now request two {CPU:1,GPU:1} shapes (2 CPU / 2 GPU) rather than one unschedulable {CPU:2}. Assert the schedulable shape in the regression test.

XiaoHongbo-Hope requested a review from a team as a code owner June 13, 2026 13:14

github-actions Bot added the fix label Jun 13, 2026

XiaoHongbo-Hope marked this pull request as draft June 13, 2026 13:15

greptile-apps Bot reviewed Jun 13, 2026

View reviewed changes

XiaoHongbo-Hope force-pushed the fix/min-cpu-per-task-wiring branch from ca66e01 to 1d05985 Compare June 13, 2026 13:20

XiaoHongbo-Hope added 5 commits June 13, 2026 12:06

fix(flotilla): apply min_cpu_per_task as floor and reject non-positiv…

ab1e67a

…e values

fix(clippy): collapse let-chain in min_cpu_per_task env guard

48a4d2a

XiaoHongbo-Hope marked this pull request as ready for review June 19, 2026 15:07

Merge branch 'main' into fix/min-cpu-per-task-wiring

ebdf031

XiaoHongbo-Hope marked this pull request as draft June 19, 2026 15:26

XiaoHongbo-Hope added 12 commits June 19, 2026 09:27

fix(plan): leave num_cpus unset so min_cpu_per_task can lower task CPU

c36c458

docs: align min_cpu_per_task docstring with fallback (not floor) sema…

cd6ea1f

…ntics

fix(autoscale): use TaskResourceRequest wrappers so min_cpu_per_task …

710c932

…fallback applies

refactor(daft-config): share min_cpu_per_task validity rule

5f25d91

The "finite and > 0" check for min_cpu_per_task was duplicated in the env path and the Python setter, which could drift. Extract it into a single DaftExecutionConfig::is_valid_min_cpu_per_task and call it from both.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(flotilla): wire min_cpu_per_task into TaskResourceRequest#7125

fix(flotilla): wire min_cpu_per_task into TaskResourceRequest#7125
XiaoHongbo-Hope wants to merge 19 commits into
Eventual-Inc:mainfrom
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring

XiaoHongbo-Hope commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

XiaoHongbo-Hope commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes Made

Related Issues

Uh oh!

greptile-apps Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

XiaoHongbo-Hope commented Jun 13, 2026 •

edited

Loading

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading