@itsomri
Collaborator

@itsomri itsomri commented Feb 5, 2026

Description

This PR fixes max pod tracking in scheduling simulations.

Changes:

  • "Pods" are now tracked as a scalar resource
  • Added an e2e test to verify preemption simulates pod limits correctly
  • Changed node max pods predicate to handle edge cases for reservation pods

Checklist

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Breaking Changes

Additional Notes

@coderabbitai

coderabbitai bot commented Feb 5, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@github-actions

github-actions bot commented Feb 5, 2026

📊 Performance Benchmark Results

Comparing PR (fix/max-pods-as-resource) vs main branch:

goos: linux
goarch: amd64
pkg: github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
                                    │ main-bench.txt │            pr-bench.txt            │
                                    │     sec/op     │    sec/op     vs base              │
AllocateAction_SmallCluster-4           107.8m ±  0%   108.0m ±  0%       ~ (p=0.132 n=6)
AllocateAction_MediumCluster-4          134.7m ±  2%   136.4m ±  2%       ~ (p=0.310 n=6)
AllocateAction_LargeCluster-4           223.9m ± 12%   221.1m ± 14%       ~ (p=0.937 n=6)
ReclaimAction_SmallCluster-4            102.8m ±  0%   102.8m ±  0%       ~ (p=0.310 n=6)
ReclaimAction_MediumCluster-4           105.5m ±  0%   105.6m ±  0%  +0.04% (p=0.041 n=6)
PreemptAction_SmallCluster-4            103.6m ±  0%   103.6m ±  0%       ~ (p=0.818 n=6)
PreemptAction_MediumCluster-4           112.9m ±  0%   113.6m ±  0%  +0.62% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4      113.1m ±  0%   113.9m ±  0%  +0.70% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4     199.9m ±  2%   206.3m ±  1%  +3.20% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4      104.9m ±  0%   105.2m ±  0%  +0.28% (p=0.002 n=6)
FullSchedulingCycle_MediumCluster-4     118.3m ±  1%   119.4m ±  1%  +0.92% (p=0.002 n=6)
FullSchedulingCycle_LargeCluster-4      156.3m ±  1%   158.9m ±  1%  +1.66% (p=0.002 n=6)
ManyQueues_MediumCluster-4              139.0m ±  1%   140.7m ±  0%  +1.20% (p=0.002 n=6)
GangScheduling_MediumCluster-4          156.3m ±  1%   157.4m ±  2%       ~ (p=0.699 n=6)
geomean                                 130.0m         130.9m        +0.68%

                                    │ main-bench.txt │            pr-bench.txt             │
                                    │      B/op      │     B/op      vs base               │
AllocateAction_SmallCluster-4           2.152Mi ± 0%   2.268Mi ± 0%   +5.39% (p=0.002 n=6)
AllocateAction_MediumCluster-4          11.84Mi ± 0%   12.31Mi ± 0%   +3.98% (p=0.002 n=6)
AllocateAction_LargeCluster-4           41.54Mi ± 0%   42.71Mi ± 0%   +2.82% (p=0.002 n=6)
ReclaimAction_SmallCluster-4            892.8Ki ± 1%   968.8Ki ± 1%   +8.51% (p=0.002 n=6)
ReclaimAction_MediumCluster-4           2.830Mi ± 0%   3.148Mi ± 0%  +11.22% (p=0.002 n=6)
PreemptAction_SmallCluster-4            1.003Mi ± 1%   1.064Mi ± 0%   +6.00% (p=0.002 n=6)
PreemptAction_MediumCluster-4           4.018Mi ± 0%   4.253Mi ± 0%   +5.86% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4      5.606Mi ± 0%   5.807Mi ± 0%   +3.58% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4     46.88Mi ± 0%   48.04Mi ± 0%   +2.46% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4      1.373Mi ± 1%   1.469Mi ± 0%   +6.96% (p=0.002 n=6)
FullSchedulingCycle_MediumCluster-4     6.836Mi ± 0%   7.229Mi ± 0%   +5.74% (p=0.002 n=6)
FullSchedulingCycle_LargeCluster-4      22.83Mi ± 0%   23.80Mi ± 0%   +4.23% (p=0.002 n=6)
ManyQueues_MediumCluster-4              16.31Mi ± 0%   16.78Mi ± 0%   +2.88% (p=0.002 n=6)
GangScheduling_MediumCluster-4          17.17Mi ± 0%   17.96Mi ± 0%   +4.62% (p=0.002 n=6)
geomean                                 6.331Mi        6.665Mi        +5.28%

                                    │ main-bench.txt │           pr-bench.txt            │
                                    │   allocs/op    │  allocs/op   vs base              │
AllocateAction_SmallCluster-4            36.21k ± 0%   36.79k ± 0%  +1.62% (p=0.002 n=6)
AllocateAction_MediumCluster-4           325.2k ± 0%   327.6k ± 0%  +0.73% (p=0.002 n=6)
AllocateAction_LargeCluster-4            1.394M ± 0%   1.400M ± 0%  +0.42% (p=0.002 n=6)
ReclaimAction_SmallCluster-4             8.399k ± 0%   8.790k ± 0%  +4.66% (p=0.002 n=6)
ReclaimAction_MediumCluster-4            26.54k ± 0%   28.14k ± 0%  +6.03% (p=0.002 n=6)
PreemptAction_SmallCluster-4             11.19k ± 0%   11.47k ± 0%  +2.56% (p=0.002 n=6)
PreemptAction_MediumCluster-4            38.77k ± 0%   39.95k ± 0%  +3.03% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4       73.58k ± 0%   74.57k ± 0%  +1.34% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4      685.9k ± 0%   691.7k ± 0%  +0.84% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4       21.36k ± 0%   21.85k ± 0%  +2.29% (p=0.002 n=6)
FullSchedulingCycle_MediumCluster-4      174.7k ± 0%   176.6k ± 0%  +1.13% (p=0.002 n=6)
FullSchedulingCycle_LargeCluster-4       727.3k ± 0%   732.1k ± 0%  +0.67% (p=0.002 n=6)
ManyQueues_MediumCluster-4               363.3k ± 0%   365.7k ± 0%  +0.65% (p=0.002 n=6)
GangScheduling_MediumCluster-4           597.0k ± 0%   601.0k ± 0%  +0.67% (p=0.002 n=6)
geomean                                  111.7k        113.8k       +1.89%

Legend

  • 📉 Negative delta = Performance improvement (faster)
  • 📈 Positive delta = Performance regression (slower)
  • p-value < 0.05 indicates statistically significant change

Raw benchmark data

PR branch:

goos: linux
goarch: amd64
pkg: github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkAllocateAction_SmallCluster-4         	      10	 107964268 ns/op	 2378014 B/op	   36792 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108000822 ns/op	 2375840 B/op	   36788 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107958022 ns/op	 2377573 B/op	   36792 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107813292 ns/op	 2380247 B/op	   36797 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108059418 ns/op	 2377868 B/op	   36793 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108016062 ns/op	 2380756 B/op	   36795 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136341950 ns/op	12912495 B/op	  327576 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136009187 ns/op	12910671 B/op	  327565 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 139159497 ns/op	12912111 B/op	  327573 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136915127 ns/op	12911844 B/op	  327569 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134800530 ns/op	12914372 B/op	  327569 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136467655 ns/op	12912016 B/op	  327574 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 225283137 ns/op	44778683 B/op	 1400178 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 214857149 ns/op	44799755 B/op	 1400170 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 250553044 ns/op	44790953 B/op	 1400163 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       4	 252986931 ns/op	44777242 B/op	 1400168 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 208366798 ns/op	44775929 B/op	 1400152 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 216995072 ns/op	44790492 B/op	 1400149 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102761977 ns/op	  979149 B/op	    8759 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102817413 ns/op	  992184 B/op	    8780 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102830359 ns/op	  995848 B/op	    8792 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102725261 ns/op	  991856 B/op	    8790 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102778855 ns/op	 1000620 B/op	    8791 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102725816 ns/op	  991548 B/op	    8789 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105806410 ns/op	 3298685 B/op	   28141 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105551395 ns/op	 3302476 B/op	   28142 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105728006 ns/op	 3302652 B/op	   28143 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105614976 ns/op	 3298762 B/op	   28141 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105578833 ns/op	 3302791 B/op	   28143 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105574430 ns/op	 3298724 B/op	   28141 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103496329 ns/op	 1111760 B/op	   11476 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103687537 ns/op	 1111434 B/op	   11474 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103615928 ns/op	 1115055 B/op	   11474 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103751032 ns/op	 1118196 B/op	   11475 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103609734 ns/op	 1115320 B/op	   11475 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103639715 ns/op	 1115517 B/op	   11476 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113386465 ns/op	 4455478 B/op	   39946 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113916736 ns/op	 4459700 B/op	   39948 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113513320 ns/op	 4460043 B/op	   39949 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113712242 ns/op	 4455727 B/op	   39947 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113598886 ns/op	 4460096 B/op	   39949 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113615933 ns/op	 4459778 B/op	   39948 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113669479 ns/op	 6085096 B/op	   74564 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114104789 ns/op	 6093007 B/op	   74644 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114062649 ns/op	 6090873 B/op	   74536 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113767953 ns/op	 6087001 B/op	   74587 allocs/op

Main branch:

goos: linux
goarch: amd64
pkg: github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkAllocateAction_SmallCluster-4         	      10	 107921950 ns/op	 2255004 B/op	   36205 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108088240 ns/op	 2256461 B/op	   36206 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107844962 ns/op	 2256343 B/op	   36206 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107734769 ns/op	 2259076 B/op	   36213 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107371011 ns/op	 2254937 B/op	   36204 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107369449 ns/op	 2257126 B/op	   36209 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137001857 ns/op	12434907 B/op	  325202 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137367295 ns/op	12418419 B/op	  325198 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 132240133 ns/op	12416951 B/op	  325195 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 133354854 ns/op	12418178 B/op	  325203 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135578308 ns/op	12416658 B/op	  325188 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 133808021 ns/op	12420204 B/op	  325193 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 221036575 ns/op	43565160 B/op	 1394298 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 215626883 ns/op	43557452 B/op	 1394296 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 234186589 ns/op	43558161 B/op	 1394305 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 207399643 ns/op	43558369 B/op	 1394293 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 250046299 ns/op	43556472 B/op	 1394290 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 226685843 ns/op	43556516 B/op	 1394292 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102811017 ns/op	  901709 B/op	    8369 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102856777 ns/op	  911058 B/op	    8388 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102763677 ns/op	  914247 B/op	    8399 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102843206 ns/op	  914272 B/op	    8399 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102680596 ns/op	  915364 B/op	    8398 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102883936 ns/op	  914429 B/op	    8400 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105606663 ns/op	 2965754 B/op	   26540 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105267720 ns/op	 2969552 B/op	   26541 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105543884 ns/op	 2965525 B/op	   26540 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105555132 ns/op	 2969735 B/op	   26542 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105566514 ns/op	 2969328 B/op	   26541 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105487439 ns/op	 2965712 B/op	   26541 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103727734 ns/op	 1056055 B/op	   11190 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103689914 ns/op	 1048229 B/op	   11187 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103670132 ns/op	 1051938 B/op	   11189 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103530339 ns/op	 1062661 B/op	   11192 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103472188 ns/op	 1052125 B/op	   11189 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103579559 ns/op	 1052087 B/op	   11189 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112891461 ns/op	 4215553 B/op	   38774 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112617726 ns/op	 4215112 B/op	   38772 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112936214 ns/op	 4214954 B/op	   38771 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112768385 ns/op	 4211169 B/op	   38773 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112975150 ns/op	 4206625 B/op	   38770 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112917057 ns/op	 4206638 B/op	   38770 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 112878271 ns/op	 5879450 B/op	   73585 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113308444 ns/op	 5878435 B/op	   73594 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113273929 ns/op	 5885516 B/op	   73570 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113053714 ns/op	 5878549 B/op	   73583 allocs/op
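
As a rough cross-check, the summary rows can be reproduced from the raw samples. The sketch below assumes the summary tool reports the median of the six runs per benchmark (an assumption, though it matches the numbers above) and recomputes the B/op row for AllocateAction_SmallCluster. The helper names `median` and `deltaPercent` are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs without modifying the caller's slice.
// It returns 0 for an empty input.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// deltaPercent returns the relative change from base to pr, in percent.
func deltaPercent(base, pr float64) float64 {
	return (pr/base - 1) * 100
}

func main() {
	// B/op samples for BenchmarkAllocateAction_SmallCluster-4, copied from the raw data.
	mainBytes := []float64{2255004, 2256461, 2256343, 2259076, 2254937, 2257126}
	prBytes := []float64{2378014, 2375840, 2377573, 2380247, 2377868, 2380756}

	const mib = 1 << 20 // bytes per MiB
	b, p := median(mainBytes), median(prBytes)
	fmt.Printf("main %.3fMi, PR %.3fMi, delta %+.2f%%\n", b/mib, p/mib, deltaPercent(b, p))
	// prints "main 2.152Mi, PR 2.268Mi, delta +5.39%"
}
```

The result agrees with the 2.152Mi, 2.268Mi and +5.39% shown in the B/op table.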

@itsomri
Collaborator Author

itsomri commented Feb 5, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Feb 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@itsomri force-pushed the fix/max-pods-as-resource branch 7 times, most recently from 1131ad9 to 39f9e25 on February 11, 2026 13:41
itsomri and others added 10 commits February 11, 2026 17:00
Alternative approach to fixing max pods predicate with releasing pods.
Instead of tracking allocated pod count separately, treat pods as a
scalar resource similar to CPU and memory.

Changes:
- Add pods to node allocatable resources (v1.ResourcePods)
- Add pods to task resource requirements (1 pod per task, 2 for shared GPU)
- Remove explicit max pods check from predicates (resource accounting handles it)
- Update test helpers to include pods in node resources
- Add test demonstrating releasing pods don't block preemption

This approach naturally handles releasing pods because they are tracked
in the Releasing resources, making their pod count available for new allocations.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
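
The idea in the commit message above, treating the pod count as one more scalar resource so that releasing pods free up slots automatically, could be sketched roughly as follows. This is a minimal illustration, not the KAI-scheduler implementation: `ResourceName`, `Resources`, `NodeAccounting` and `Fits` are hypothetical names, and it assumes `Used` still includes pods that are in the process of releasing.

```go
package main

import "fmt"

// ResourceName and Resources are hypothetical stand-ins for the scheduler's
// own resource types; they are not the real KAI-scheduler API.
type ResourceName string

const (
	CPU  ResourceName = "cpu"  // millicores
	Pods ResourceName = "pods" // pod slots, counted like any other scalar
)

type Resources map[ResourceName]int64

// NodeAccounting is a simplified per-node resource snapshot.
type NodeAccounting struct {
	Allocatable Resources // capacity available to pods, including the pod limit
	Used        Resources // assumed to include pods that are still releasing
	Releasing   Resources // resources held by pods that are being evicted
}

// Fits reports whether req fits once releasing resources are treated as free.
func (n NodeAccounting) Fits(req Resources) bool {
	for name, want := range req {
		free := n.Allocatable[name] - n.Used[name] + n.Releasing[name]
		if want > free {
			return false
		}
	}
	return true
}

func main() {
	node := NodeAccounting{
		Allocatable: Resources{CPU: 8000, Pods: 2},
		Used:        Resources{CPU: 2000, Pods: 2}, // node is at its pod limit
		Releasing:   Resources{CPU: 1000, Pods: 1}, // one pod is being preempted
	}
	task := Resources{CPU: 500, Pods: 1} // every task requests one pod slot

	fmt.Println(node.Fits(task)) // prints "true": the releasing pod's slot is reusable

	node.Releasing = Resources{}
	fmt.Println(node.Fits(task)) // prints "false": no pod slot is free
}
```

With this shape no separate max-pods check is needed: the pod limit is enforced by the same loop that enforces CPU and memory.
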
Implement predicate-based pod counting that accurately accounts for GPU
group reservation pods. Previously, all shared GPU tasks added +1 to pod
count, but the correct behavior is:
- First fractional pod on a GPU: +2 pods (task + reservation)
- Additional pods on same GPU: +1 pod (task only)

Changes:
- Remove blanket +1 pod count from pod_info ResReq
- Add checkMaxPodsWithGpuGroupReservation predicate that:
  * Determines if task creates new GPU group using allocation logic
  * Counts reservation pods being freed when GPU groups fully release
  * Only adds reservation pod count when new group is created
- Export GPU sharing functions for predicate access
- Store session in predicates plugin for GPU allocation logic access

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
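
The counting rule described in the commit message above can be illustrated with a small model. This is a sketch under stated assumptions, not the actual `checkMaxPodsWithGpuGroupReservation` predicate: `podSlotsNeeded`, `nodePods` and its fields are hypothetical, and the pod limit of 110 is only an example value.

```go
package main

import "fmt"

// podSlotsNeeded returns how many pod slots a task would consume on a node.
// A fractional-GPU task that opens a new GPU group also needs a reservation
// pod; a task joining an existing group, or a non-fractional task, needs one.
func podSlotsNeeded(fractionalGPU, createsNewGroup bool) int64 {
	if fractionalGPU && createsNewGroup {
		return 2 // task pod + reservation pod
	}
	return 1 // task pod only
}

// nodePods is a hypothetical snapshot of pod-slot usage on one node.
type nodePods struct {
	Max               int64 // the node's pod limit
	Used              int64 // pods currently counted against the limit
	ReleasingTasks    int64 // task pods that are being released
	FreedReservations int64 // reservation pods of GPU groups that fully release
}

// fits reports whether `needed` pod slots are available, counting slots that
// releasing task pods and freed reservation pods will give back.
func (n nodePods) fits(needed int64) bool {
	free := n.Max - n.Used + n.ReleasingTasks + n.FreedReservations
	return needed <= free
}

func main() {
	// Node at maxPods-1: a fractional task opening a new GPU group needs two slots.
	almostFull := nodePods{Max: 110, Used: 109}
	fmt.Println(almostFull.fits(podSlotsNeeded(true, true))) // prints "false"

	// Full node, one fractional victim releasing, its GPU group stays alive:
	// the new task reuses the existing reservation pod and needs one slot.
	full := nodePods{Max: 110, Used: 110, ReleasingTasks: 1}
	fmt.Println(full.fits(podSlotsNeeded(true, false))) // prints "true"
}
```
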
Add comprehensive e2e tests to validate accurate pod counting with
GPU sharing and max pods limits:

1. Simple case: Verify preemption works on node at max pods
2. Fractions allocation: Verify fraction pod cannot allocate on node
   at maxPods-1 (would need maxPods+1 with reservation pod)
3. Proper reservation calculation: Verify fraction can preempt another
   fraction and reuse the same GPU group reservation pod

These tests dynamically read the node's max pod capacity instead of
hardcoding values, making them portable across different cluster
configurations.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@itsomri force-pushed the fix/max-pods-as-resource branch from 6ed007d to 4d5651a on February 11, 2026 15:01