

@anton-ubi commented Dec 4, 2025

Problem
Hosts with GPU capabilities (ALLOW_GPU=true) incorrectly reject frames that require zero GPU memory. This prevents efficient resource utilization: GPU-enabled hosts should accept both GPU and CPU-only workloads.

Root Cause
The getGpuJobs() method in CoreUnitDispatcher calls removeGpu() when no GPU-specific jobs are found. removeGpu() reduces the host's available resources (CPU cores, memory, and GPU resources) to "reserve space for future GPU frames." When normal frames are evaluated afterwards, they fail resource checks against these artificially reduced limits.

The problematic flow:

  1. Look for GPU jobs → none found
  2. Call removeGpu() → reduces idleCores (-100) and idleMemory (-4GB)
  3. Evaluate normal frames → rejected due to insufficient resources
  4. Call restoreGpu() → too late, frames already rejected
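The flow above can be reproduced with a small, self-contained Java model. This is a minimal sketch, not Cuebot code: HostResources, GpuReservationFlow, and the host and frame sizes are hypothetical; only the reservation amounts (100 core units and 4 GB) come from the description. It shows a frame that fits the idle host being rejected because the resource check runs while the reservation is still in place.

```java
/** Hypothetical stand-in for a host's idle resources (not Cuebot's class). */
final class HostResources {
    int idleCores;      // in core units, matching the "-100" quoted above
    long idleMemoryGb;  // idle memory in GB

    HostResources(int idleCores, long idleMemoryGb) {
        this.idleCores = idleCores;
        this.idleMemoryGb = idleMemoryGb;
    }

    /** Step 2: reserve space for future GPU frames (amounts from the PR text). */
    void removeGpu() {
        idleCores -= 100;
        idleMemoryGb -= 4;
    }

    /** Step 4: hand the reserved resources back. */
    void restoreGpu() {
        idleCores += 100;
        idleMemoryGb += 4;
    }

    /** True if a frame with these requirements fits the current idle resources. */
    boolean fits(int cores, long memoryGb) {
        return cores <= idleCores && memoryGb <= idleMemoryGb;
    }
}

/** Hypothetical model of steps 1-4 for a host where no GPU jobs were found. */
final class GpuReservationFlow {
    static boolean dispatchCpuFrame(HostResources host, int frameCores, long frameMemoryGb) {
        // Step 1: the GPU-job lookup found nothing, so step 2 reserves resources.
        host.removeGpu();
        // Step 3: the CPU-only frame is checked against the reduced limits.
        boolean accepted = host.fits(frameCores, frameMemoryGb);
        // Step 4: the reservation is undone, but the frame was already judged.
        host.restoreGpu();
        return accepted;
    }
}
```

A frame sized to the host's full idle capacity (for example 800 core units and 16 GB) passes fits() on the untouched host, yet dispatchCpuFrame() rejects it, even though restoreGpu() leaves the host exactly as it started.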

Solution
Add a configuration property, dispatcher.gpu.skip_resource_reservation, that disables GPU resource reservation when set to true. It defaults to false to preserve backward compatibility.
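One way the flag could gate the reservation step is sketched below. This is an illustration under assumptions, not the patch itself: SkipReservationSketch, Host, and acceptsCpuFrame() are hypothetical names, and the inlined arithmetic stands in for removeGpu() and restoreGpu() using the amounts quoted in the root-cause section.

```java
/**
 * Hypothetical sketch: only the property name dispatcher.gpu.skip_resource_reservation
 * comes from the PR; everything else is illustrative.
 */
final class SkipReservationSketch {

    /** Minimal stand-in for a host's idle resources. */
    static final class Host {
        int idleCores;      // core units
        long idleMemoryGb;  // GB

        Host(int idleCores, long idleMemoryGb) {
            this.idleCores = idleCores;
            this.idleMemoryGb = idleMemoryGb;
        }
    }

    // Value of dispatcher.gpu.skip_resource_reservation; false keeps the old behavior.
    private final boolean skipGpuReservation;

    SkipReservationSketch(boolean skipGpuReservation) {
        this.skipGpuReservation = skipGpuReservation;
    }

    /** Checks a CPU-only frame after the GPU-job lookup found nothing. */
    boolean acceptsCpuFrame(Host host, int frameCores, long frameMemoryGb) {
        boolean reserve = !skipGpuReservation;
        if (reserve) {
            host.idleCores -= 100;    // stands in for removeGpu()
            host.idleMemoryGb -= 4;
        }
        try {
            return frameCores <= host.idleCores && frameMemoryGb <= host.idleMemoryGb;
        } finally {
            if (reserve) {
                host.idleCores += 100; // stands in for restoreGpu()
                host.idleMemoryGb += 4;
            }
        }
    }
}
```

With the flag off, a full-size CPU-only frame is rejected exactly as before; with it on, the same frame is checked against the host's real idle resources. The try/finally keeps the host's counters unchanged in both cases.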

Configuration
The new behavior is controlled by the environment variable CUEBOT_DISPATCHER_SKIP_GPU_RESERVATION:

  • Default (false): Preserves the existing behavior
  • Skip (true): Disables GPU resource reservation, so CPU-only frames are checked against the host's full idle resources
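The description does not show how the environment variable reaches the property; one plausible wiring is a Spring-style placeholder such as dispatcher.gpu.skip_resource_reservation=${CUEBOT_DISPATCHER_SKIP_GPU_RESERVATION:false}. The Java sketch below assumes that behavior: an unset variable resolves to false, and only "true" (case-insensitive) enables skipping. SkipGpuReservationConfig is a hypothetical helper, not part of Cuebot.

```java
import java.util.Map;

/** Hypothetical helper showing the assumed default-false resolution rule. */
final class SkipGpuReservationConfig {
    static final String ENV_VAR = "CUEBOT_DISPATCHER_SKIP_GPU_RESERVATION";

    /** Returns true only when the variable is set to "true", ignoring case and surrounding spaces. */
    static boolean resolve(Map<String, String> env) {
        String raw = env.get(ENV_VAR);
        // Unset means the default (false): keep the existing reservation behavior.
        return raw != null && Boolean.parseBoolean(raw.trim());
    }

    /** Reads the real process environment. */
    static boolean resolve() {
        return resolve(System.getenv());
    }
}
```

Passing the map explicitly keeps the rule testable without touching the process environment. Note that Boolean.parseBoolean treats values such as "1" or "yes" as false, which errs toward the backward-compatible default.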

Recap

  • Before fix: GPU hosts may reject CPU-only frames despite having sufficient resources
  • After fix: with the option enabled, GPU hosts accept both GPU and CPU-only workloads whenever their idle resources allow.

Backward compatibility is preserved.
This resolves the counter-intuitive behavior where GPU-capable hosts reject valid CPU-only frames.

@anton-ubi changed the title from "skip_resource_reservation" to "Fix GPU Resource Reservation Preventing Non-GPU Frame Dispatch" on Dec 4, 2025