
refactor(scheduler): migrate NodeInfo, PodInfo, and plugin resource r…#1146

Merged
enoodle merged 10 commits into main from erez/migrate-node-info-pod-info-to-vectors
Mar 13, 2026


@enoodle

@enoodle enoodle commented Mar 5, 2026

Collaborator

…eads to vectors

Description

Convert all read-path methods from Resource to ResourceVector operations:

  • NodeInfo: IsTaskAllocatable, FittingError, GetSumOfIdleGPUs, IsCPUOnlyNode
  • External plugins: proportion, topology, nodeplacement, nodeavailability, resourcetype
  • Framework: session logging, statement references
  • Error handling: pod_errors, job_errors

Add AcceptedResourceVector to PodInfo. Add QuantifyVector util to proportion plugin. Rewrite topology calcNodeAccommodation from iterative pod probing to division-based vector approach.
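
To make the contrast with iterative pod probing concrete, here is a minimal sketch of a division-based capacity calculation. It is not the plugin's actual calcNodeAccommodation: the `ResourceVector` type below is a hypothetical `[]float64`, `accommodation` is an illustrative name, and treating a pod with no requests as fitting zero times is an assumption made only to keep the example total. The idea is to divide the free amount by the per-pod request in each requested dimension and take the minimum.

```go
package main

import (
	"fmt"
	"math"
)

// ResourceVector is a hypothetical stand-in for the scheduler's vector type:
// one float64 per resource dimension (for example CPU, memory, GPU).
type ResourceVector []float64

// accommodation returns how many identical pods with request req fit into
// free, by dividing per dimension and taking the minimum. Dimensions the pod
// does not request are ignored. It returns 0 if the vectors differ in length
// or if the pod requests nothing at all (an assumption of this sketch).
func accommodation(free, req ResourceVector) int {
	if len(free) != len(req) {
		return 0
	}
	fit := math.MaxInt
	requested := false
	for i, r := range req {
		if r <= 0 {
			continue // dimension not requested, so it cannot be the bottleneck
		}
		requested = true
		n := int(math.Floor(free[i] / r))
		if n < 0 {
			n = 0 // negative free capacity means nothing fits
		}
		if n < fit {
			fit = n
		}
	}
	if !requested {
		return 0
	}
	return fit
}

func main() {
	free := ResourceVector{16, 64, 4} // CPU cores, memory GiB, GPUs (illustrative values)
	req := ResourceVector{2, 16, 1}
	fmt.Println(accommodation(free, req)) // prints 4: memory and GPU are the bottlenecks
}
```

Compared with probing one pod at a time, this does a single pass over the dimensions regardless of how many pods would fit.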

Resource fields are still maintained via dual-write for backward compatibility until they are removed in subsequent commits.
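
As a rough illustration of what dual-write means here, the sketch below keeps a legacy struct and a vector side by side and updates both through a single method. All names (`Resource`, `Node`, `subtractIdle`, the index constants) are hypothetical and chosen for the example; the real NodeInfo has more fields and different write paths.

```go
package main

import "fmt"

// Resource is a hypothetical legacy representation with named fields.
type Resource struct {
	MilliCPU float64
	Memory   float64
	GPUs     float64
}

// Hypothetical fixed positions in the vector representation.
const (
	cpuIdx = iota
	memIdx
	gpuIdx
	numResources
)

// ResourceVector is a hypothetical fixed-size value type.
type ResourceVector [numResources]float64

// Node keeps both representations while the migration is in progress.
type Node struct {
	Idle       Resource       // legacy field, still written for unmigrated readers
	IdleVector ResourceVector // new field, used by migrated read paths
}

// subtractIdle is the single write path in this sketch: it updates both
// fields together so they cannot drift apart.
func (n *Node) subtractIdle(req Resource) {
	n.Idle.MilliCPU -= req.MilliCPU
	n.Idle.Memory -= req.Memory
	n.Idle.GPUs -= req.GPUs

	n.IdleVector[cpuIdx] -= req.MilliCPU
	n.IdleVector[memIdx] -= req.Memory
	n.IdleVector[gpuIdx] -= req.GPUs
}

func main() {
	n := Node{
		Idle:       Resource{MilliCPU: 4000, Memory: 8192, GPUs: 2},
		IdleVector: ResourceVector{4000, 8192, 2},
	}
	n.subtractIdle(Resource{MilliCPU: 1000, Memory: 2048, GPUs: 1})
	fmt.Println(n.Idle.GPUs, n.IdleVector[gpuIdx]) // both representations agree
}
```

Funnelling every mutation through one method is one way to keep the two views consistent until the legacy field is deleted.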

Related Issues

Fixes #

Checklist

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Summary by CodeRabbit

Release Notes

  • Refactor
    • Modernized resource tracking infrastructure by transitioning from scalar resource representations to vector-based resource tracking, enabling more precise allocation of GPUs, CPUs, memory, and custom resources.
    • Enhanced resource requirement comparisons and capacity calculations throughout the scheduler to use vector operations.
    • Updated error reporting for resource-related failures to provide more detailed, vector-aware diagnostics.

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 45ded0d3-3a3a-42b9-933e-8bef0f4fb484

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The pull request refactors the scheduler's resource representation system, migrating from scalar Resource pointers to vector-based ResourceVector value types with an accompanying ResourceVectorMap for indexed access. This systematic change affects resource allocation, topology calculations, error handling, and logging throughout multiple scheduler components and plugins.
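
The relationship between a vector and its index map can be sketched as follows. This is an assumption-laden toy, not the scheduler's implementation: `VectorMap` and `Vector` are hypothetical types, and only the method names `GetIndex` and `Get` mirror calls quoted elsewhere in this PR. Returning -1 for an unknown name and 0 for an out-of-range index are choices made for this example.

```go
package main

import "fmt"

// VectorMap is a hypothetical registry that assigns each resource name a
// stable position in every vector built against it.
type VectorMap struct {
	index map[string]int
	names []string
}

// NewVectorMap registers the given names in order.
func NewVectorMap(names ...string) *VectorMap {
	m := &VectorMap{index: make(map[string]int)}
	for _, n := range names {
		m.Add(n)
	}
	return m
}

// Add registers name if it is not yet known and returns its index.
func (m *VectorMap) Add(name string) int {
	if i, ok := m.index[name]; ok {
		return i
	}
	i := len(m.names)
	m.index[name] = i
	m.names = append(m.names, name)
	return i
}

// GetIndex returns the index for name, or -1 when the name is unknown
// (an assumption of this sketch).
func (m *VectorMap) GetIndex(name string) int {
	if i, ok := m.index[name]; ok {
		return i
	}
	return -1
}

// Vector is a hypothetical value type: one float64 per registered resource.
type Vector []float64

// Get returns the value at idx, or 0 when idx is out of range.
func (v Vector) Get(idx int) float64 {
	if idx < 0 || idx >= len(v) {
		return 0
	}
	return v[idx]
}

// LessEqual reports whether every dimension of v is <= the same dimension of other.
func (v Vector) LessEqual(other Vector) bool {
	for i := range v {
		if v[i] > other.Get(i) {
			return false
		}
	}
	return true
}

func main() {
	vm := NewVectorMap("cpu", "memory", "gpu")
	idle := Vector{8, 32, 2}
	req := Vector{4, 16, 1}
	fmt.Println(idle.Get(vm.GetIndex("gpu")), req.LessEqual(idle))
}
```

With this shape, a fit check becomes a single loop over positions instead of a series of per-field comparisons.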

Changes

Cohort / File(s) Summary
Core Resource Representation & Error Handling
pkg/scheduler/api/common_info/job_errors.go, pkg/scheduler/api/common_info/job_errors_test.go, pkg/scheduler/api/common_info/pod_errors.go, pkg/scheduler/api/common_info/pod_errors_test.go
Updated error constructors to accept ResourceVector and ResourceVectorMap instead of scalar Resource pointers. NewTopologyInsufficientResourcesError and NewFitErrorInsufficientResource now use vector-based resource parameters with indexed access for GPU, CPU, memory, and MIG resources. Test fixtures updated to use ToVector(vectorMap) for resource construction.
Node & Pod Info Structures
pkg/scheduler/api/node_info/node_info.go, pkg/scheduler/api/node_info/node_info_test.go, pkg/scheduler/api/pod_info/pod_info.go
Introduced AllocatableVector, IdleVector, ReleasingVector, and VectorMap fields to NodeInfo and AcceptedResourceVector to PodInfo. Updated initialization, cloning, and SetVectorMap logic to maintain vector representations. lessEqualTaskToNodeResources and isTaskAllocatableOnNonAllocatedResources now accept ResourceVector parameters.
GPU Sharing & Node Availability
pkg/scheduler/api/node_info/gpu_sharing_node_info.go, pkg/scheduler/cache/cluster_info/cluster_info.go, pkg/scheduler/plugins/nodeavailability/nodeavailability.go, pkg/scheduler/plugins/resourcetype/resourcetype.go
Replaced scalar GPU counting (Idle.GPUs(), Used.GPUs()) with vector-indexed access (IdleVector.Get(gpuIdx), UsedVector.Get(gpuIdx)) using VectorMap.GetIndex("gpu"). GPU resource accounting now fully vector-based.
Scheduler Framework & Session
pkg/scheduler/framework/session.go, pkg/scheduler/framework/statement.go
Updated GPU resource accounting and logging to use vector-based representations (IdleVector, UsedVector, ReleasingVector, ResReqVector). Replaced scalar Idle/Releasing/Used counts with indexed vector access.
Node Placement Plugins
pkg/scheduler/plugins/nodeplacement/pack.go, pkg/scheduler/plugins/nodeplacement/spread.go, pkg/scheduler/plugins/nodeplacement/nodepack_test.go, pkg/scheduler/plugins/nodeplacement/nodespread_test.go
Replaced Allocatable.Get(resourceName) with AllocatableVector.Get(node.VectorMap.GetIndex(string(resourceName))) for resource availability checks. Test setup now initializes AllocatableVector, IdleVector, ReleasingVector, and VectorMap.
Capacity & Proportion Plugins
pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go, pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy_test.go, pkg/scheduler/plugins/proportion/proportion.go, pkg/scheduler/plugins/proportion/proportion_test.go
Replaced per-task resource accounting with vector-based representation. QuantifyVector utility introduced for vector quantification. Refactored totalVictimsResources to store ResourceVector slices. Updated quota extraction and accumulation to use vector operations.
Reclaimability & Strategies
pkg/scheduler/plugins/proportion/reclaimable/reclaimer_info.go, pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go, pkg/scheduler/plugins/proportion/reclaimable/reclaimable_test.go, pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go, pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies_test.go
Updated ReclaimerInfo.RequiredResources to use ResourceVector and added VectorMap field. Reclaimable method signatures now accept ResourceVector and vectorMap. Updated all quota calculations to use QuantifyVector with vector map.
Utility Functions
pkg/scheduler/plugins/proportion/utils/utils.go
Added QuantifyVector function to convert ResourceVector to ResourceQuantities using a ResourceVectorMap for index-based resource extraction.
Topology Plugin
pkg/scheduler/plugins/topology/job_filtering.go, pkg/scheduler/plugins/topology/job_filtering_test.go, pkg/scheduler/plugins/topology/node_scoring_test.go, pkg/scheduler/plugins/topology/topology_plugin.go, pkg/scheduler/plugins/topology/topology_structs.go
Extensive refactoring replacing scalar Resource with ResourceVector throughout topology allocation logic. Updated getTasksAllocationMetadata, getJobAllocatableDomains, checkJobDomainFit, and related methods to use vector-based resources. DomainInfo.IdleOrReleasingResources replaced with IdleOrReleasingVector. Info struct now includes VectorMap field.
Scheduler Actions & Tests
pkg/scheduler/actions/common/allocate.go, pkg/scheduler/actions/common/feasible_nodes_test.go, pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go, pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go, pkg/scheduler/actions/common/solvers/pod_scenario_builder_test.go
Updated log messages to use ResReqVector instead of ResReq. Test initialization now includes ResourceVectorMap setup and vector initialization for nodes. Updated GPU accounting to use vector-indexed access.
Job Ordering Utilities
pkg/scheduler/actions/utils/job_order_by_queue_test.go
Updated test fixtures to include VectorMap, AcceptedResourceVector, and AllocatedVector alongside or replacing scalar resource fields. Test data structures now support vector-based resource representations.
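
Several rows above mention converting a positional vector back into named quantities (the QuantifyVector utility). A minimal sketch of that kind of conversion is shown below; `quantify` and its signature are hypothetical, a plain `[]string` stands in for the ResourceVectorMap, and a `map[string]float64` stands in for ResourceQuantities.

```go
package main

import "fmt"

// quantify converts a positional vector into named quantities. names[i] is
// the resource stored at position i. Positions missing from vec are reported
// as 0 (an assumption of this sketch).
func quantify(names []string, vec []float64) map[string]float64 {
	out := make(map[string]float64, len(names))
	for i, name := range names {
		if i < len(vec) {
			out[name] = vec[i]
		} else {
			out[name] = 0
		}
	}
	return out
}

func main() {
	names := []string{"cpu", "memory", "gpu"}
	q := quantify(names, []float64{4000, 8192, 2})
	fmt.Println(q["cpu"], q["memory"], q["gpu"])
}
```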

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Vectors now dance where scalars once stood
Resource maps guide the scheduler's wood
From pointers to values, the system takes flight
With indexed resource pools burning bright
A migration complete, the logic shines right!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check | ❓ Inconclusive | Title is truncated and incomplete. It reads 'refactor(scheduler): migrate NodeInfo, PodInfo, and plugin resource r…' which cuts off mid-word, making it unclear what specific resource concept is being addressed. | Complete the PR title to clearly convey the full scope, e.g., 'refactor(scheduler): migrate NodeInfo, PodInfo, and plugin resources to vectors' for clarity.

✅ Passed checks (1 passed)

Check name | Status | Explanation
Description check | ✅ Passed | The PR description covers the main changes adequately, explaining the conversion to ResourceVector operations and updated plugin behavior. However, it lacks detailed context for understanding the full scope and rationale for the architectural change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Mar 5, 2026


📊 Performance Benchmark Results

Comparing PR (erez/migrate-node-info-pod-info-to-vectors) vs main branch:

main-bench.txt:155: parsing iteration count: invalid syntax
pr-bench.txt:155: parsing iteration count: invalid syntax
goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
                                    │ main-bench.txt │           pr-bench.txt            │
                                    │     sec/op     │   sec/op     vs base              │
AllocateAction_SmallCluster-4           108.7m ±  1%   109.0m ± 0%       ~ (p=0.310 n=6)
AllocateAction_MediumCluster-4          137.6m ±  1%   138.0m ± 1%       ~ (p=0.818 n=6)
AllocateAction_LargeCluster-4           227.9m ± 21%   220.4m ± 8%       ~ (p=0.180 n=6)
ReclaimAction_SmallCluster-4            103.1m ±  0%   103.0m ± 0%  -0.12% (p=0.004 n=6)
ReclaimAction_MediumCluster-4           106.4m ±  0%   106.5m ± 1%       ~ (p=0.699 n=6)
PreemptAction_SmallCluster-4            103.8m ±  0%   103.7m ± 0%       ~ (p=0.180 n=6)
PreemptAction_MediumCluster-4           114.6m ±  0%   113.8m ± 0%  -0.70% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4      114.8m ±  0%   114.8m ± 0%       ~ (p=0.818 n=6)
ConsolidationAction_MediumCluster-4     207.8m ±  1%   207.0m ± 1%       ~ (p=0.240 n=6)
FullSchedulingCycle_SmallCluster-4      105.7m ±  0%   105.7m ± 0%       ~ (p=0.589 n=6)
FullSchedulingCycle_MediumCluster-4     120.8m ±  1%   121.0m ± 1%       ~ (p=0.132 n=6)
FullSchedulingCycle_LargeCluster-4      162.2m ±  1%   162.5m ± 0%       ~ (p=0.589 n=6)
ManyQueues_MediumCluster-4              143.0m ±  2%   143.6m ± 1%       ~ (p=0.310 n=6)
GangScheduling_MediumCluster-4          160.5m ±  1%   160.0m ± 2%       ~ (p=1.000 n=6)
geomean                                 132.3m         132.0m       -0.26%

                                    │ main-bench.txt │            pr-bench.txt             │
                                    │      B/op      │     B/op       vs base              │
AllocateAction_SmallCluster-4           2.346Mi ± 0%    2.372Mi ± 0%  +1.09% (p=0.002 n=6)
AllocateAction_MediumCluster-4          12.63Mi ± 0%    12.73Mi ± 0%  +0.80% (p=0.002 n=6)
AllocateAction_LargeCluster-4           43.48Mi ± 0%    43.72Mi ± 0%  +0.56% (p=0.002 n=6)
ReclaimAction_SmallCluster-4            998.9Ki ± 1%   1007.8Ki ± 1%  +0.89% (p=0.002 n=6)
ReclaimAction_MediumCluster-4           3.260Mi ± 0%    3.308Mi ± 0%  +1.47% (p=0.002 n=6)
PreemptAction_SmallCluster-4            1.095Mi ± 0%    1.111Mi ± 1%  +1.48% (p=0.002 n=6)
PreemptAction_MediumCluster-4           4.383Mi ± 0%    4.444Mi ± 0%  +1.40% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4      5.898Mi ± 0%    5.909Mi ± 0%  +0.20% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4     48.42Mi ± 0%    48.22Mi ± 0%  -0.41% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4      1.523Mi ± 1%    1.541Mi ± 1%  +1.17% (p=0.004 n=6)
FullSchedulingCycle_MediumCluster-4     7.445Mi ± 0%    7.517Mi ± 0%  +0.97% (p=0.002 n=6)
FullSchedulingCycle_LargeCluster-4      24.31Mi ± 0%    24.49Mi ± 0%  +0.72% (p=0.002 n=6)
ManyQueues_MediumCluster-4              17.10Mi ± 0%    17.20Mi ± 0%  +0.59% (p=0.002 n=6)
GangScheduling_MediumCluster-4          18.54Mi ± 0%    18.72Mi ± 0%  +1.01% (p=0.002 n=6)
geomean                                 6.840Mi         6.898Mi       +0.85%

                                    │ main-bench.txt │           pr-bench.txt            │
                                    │   allocs/op    │  allocs/op   vs base              │
AllocateAction_SmallCluster-4            37.85k ± 0%   38.48k ± 0%  +1.67% (p=0.002 n=6)
AllocateAction_MediumCluster-4           331.8k ± 0%   334.3k ± 0%  +0.77% (p=0.002 n=6)
AllocateAction_LargeCluster-4            1.410M ± 0%   1.417M ± 0%  +0.45% (p=0.002 n=6)
ReclaimAction_SmallCluster-4             9.241k ± 0%   9.421k ± 0%  +1.95% (p=0.002 n=6)
ReclaimAction_MediumCluster-4            29.92k ± 0%   30.67k ± 0%  +2.51% (p=0.002 n=6)
PreemptAction_SmallCluster-4             11.95k ± 0%   12.16k ± 0%  +1.71% (p=0.002 n=6)
PreemptAction_MediumCluster-4            41.83k ± 0%   42.68k ± 0%  +2.03% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4       75.92k ± 0%   77.09k ± 0%  +1.55% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4      697.1k ± 0%   706.9k ± 0%  +1.40% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4       22.61k ± 0%   23.01k ± 0%  +1.79% (p=0.002 n=6)
FullSchedulingCycle_MediumCluster-4      179.6k ± 0%   181.3k ± 0%  +0.92% (p=0.002 n=6)
FullSchedulingCycle_LargeCluster-4       739.4k ± 0%   743.4k ± 0%  +0.55% (p=0.002 n=6)
ManyQueues_MediumCluster-4               369.8k ± 0%   372.4k ± 0%  +0.69% (p=0.002 n=6)
GangScheduling_MediumCluster-4           608.7k ± 0%   613.6k ± 0%  +0.81% (p=0.002 n=6)
geomean                                  116.8k        118.3k       +1.34%

pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/integration_tests/reclaim
                            │ main-bench.txt │            pr-bench.txt             │
                            │     sec/op     │    sec/op      vs base              │
ReclaimLargeJobs_10Node-4      105.2m ± 0%     105.2m ± 0%         ~ (p=1.000 n=6)
ReclaimLargeJobs_50Node-4      145.2m ± 1%     144.4m ± 1%         ~ (p=0.180 n=6)
ReclaimLargeJobs_100Node-4     297.1m ± 1%     293.3m ± 2%         ~ (p=0.093 n=6)
ReclaimLargeJobs_200Node-4      1.199 ± 6%      1.176 ± 8%         ~ (p=0.180 n=6)
ReclaimLargeJobs_500Node-4      13.91 ± 2%      13.44 ± 1%    -3.42% (p=0.002 n=6)
ReclaimLargeJobs_1000Node-4     118.2 ±  ∞ ¹    112.8 ±  ∞ ¹  -4.52% (p=0.029 n=4)
geomean                         1.441           1.413         -1.98%
¹ need >= 6 samples for confidence interval at level 0.95

                            │ main-bench.txt │             pr-bench.txt             │
                            │      B/op      │      B/op       vs base              │
ReclaimLargeJobs_10Node-4     1.930Mi ± 2%     1.949Mi ± 2%    +0.96% (p=0.041 n=6)
ReclaimLargeJobs_50Node-4     18.01Mi ± 0%     18.17Mi ± 0%    +0.89% (p=0.002 n=6)
ReclaimLargeJobs_100Node-4    61.27Mi ± 0%     61.50Mi ± 0%    +0.36% (p=0.002 n=6)
ReclaimLargeJobs_200Node-4    241.4Mi ± 0%     240.7Mi ± 0%    -0.33% (p=0.002 n=6)
ReclaimLargeJobs_500Node-4    1.739Gi ± 0%     1.706Gi ± 0%    -1.89% (p=0.002 n=6)
ReclaimLargeJobs_1000Node-4   9.090Gi ±  ∞ ¹   8.778Gi ±  ∞ ¹  -3.44% (p=0.029 n=4)
geomean                       142.9Mi          142.1Mi         -0.59%
¹ need >= 6 samples for confidence interval at level 0.95

                            │ main-bench.txt │            pr-bench.txt             │
                            │   allocs/op    │   allocs/op    vs base              │
ReclaimLargeJobs_10Node-4      20.96k ± 3%     21.30k ± 3%    +1.66% (p=0.032 n=6)
ReclaimLargeJobs_50Node-4      240.6k ± 0%     244.2k ± 0%    +1.48% (p=0.002 n=6)
ReclaimLargeJobs_100Node-4     892.5k ± 0%     908.1k ± 0%    +1.75% (p=0.002 n=6)
ReclaimLargeJobs_200Node-4     3.752M ± 0%     3.843M ± 0%    +2.41% (p=0.002 n=6)
ReclaimLargeJobs_500Node-4     29.82M ± 0%     31.00M ± 0%    +3.96% (p=0.002 n=6)
ReclaimLargeJobs_1000Node-4    165.1M ±  ∞ ¹   174.0M ±  ∞ ¹  +5.41% (p=0.029 n=4)
geomean                        2.089M          2.147M         +2.77%
¹ need >= 6 samples for confidence interval at level 0.95

Legend

  • 📉 Negative delta = Performance improvement (faster)
  • 📈 Positive delta = Performance regression (slower)
  • p-value < 0.05 indicates statistically significant change
Raw benchmark data

PR branch:

goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkAllocateAction_SmallCluster-4         	      10	 109273946 ns/op	 2489708 B/op	   38487 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 109489666 ns/op	 2487984 B/op	   38480 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108921968 ns/op	 2486845 B/op	   38475 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108806685 ns/op	 2486906 B/op	   38477 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108477324 ns/op	 2487676 B/op	   38477 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 109037603 ns/op	 2484287 B/op	   38471 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137741901 ns/op	13351224 B/op	  334305 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 138122178 ns/op	13353842 B/op	  334306 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 138261042 ns/op	13356316 B/op	  334302 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137148778 ns/op	13350365 B/op	  334293 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137922872 ns/op	13351827 B/op	  334311 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 138443959 ns/op	13350875 B/op	  334300 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 220216053 ns/op	45841257 B/op	 1416677 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 220595146 ns/op	45841236 B/op	 1416679 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 225438484 ns/op	45842545 B/op	 1416689 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 237095781 ns/op	45843195 B/op	 1416691 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 216465480 ns/op	45842598 B/op	 1416695 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 215563287 ns/op	45842592 B/op	 1416690 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102948536 ns/op	 1026599 B/op	    9392 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102942604 ns/op	 1032141 B/op	    9413 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 103017752 ns/op	 1035535 B/op	    9422 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102921961 ns/op	 1031596 B/op	    9421 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102976030 ns/op	 1031781 B/op	    9422 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102957571 ns/op	 1040157 B/op	    9422 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 107474525 ns/op	 3464968 B/op	   30673 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106369094 ns/op	 3468938 B/op	   30675 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106518383 ns/op	 3468600 B/op	   30674 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106413789 ns/op	 3465013 B/op	   30674 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106546783 ns/op	 3468914 B/op	   30675 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106465065 ns/op	 3468792 B/op	   30675 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103698285 ns/op	 1160620 B/op	   12155 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103876873 ns/op	 1165132 B/op	   12159 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103814880 ns/op	 1164698 B/op	   12157 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103715880 ns/op	 1164684 B/op	   12157 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103706500 ns/op	 1167540 B/op	   12158 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103718186 ns/op	 1153090 B/op	   12154 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113331278 ns/op	 4660223 B/op	   42679 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113575609 ns/op	 4660296 B/op	   42680 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 114045023 ns/op	 4664676 B/op	   42681 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 114017141 ns/op	 4655960 B/op	   42679 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113851481 ns/op	 4664724 B/op	   42681 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113815696 ns/op	 4660248 B/op	   42679 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114661725 ns/op	 6195224 B/op	   77115 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114678887 ns/op	 6192058 B/op	   77083 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114879316 ns/op	 6205166 B/op	   77067 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114784503 ns/op	 6198615 B/op	   77067 allocs/op

Main branch:

goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkAllocateAction_SmallCluster-4         	      10	 109105328 ns/op	 2460451 B/op	   37847 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108528921 ns/op	 2462262 B/op	   37847 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108736170 ns/op	 2460352 B/op	   37845 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108752829 ns/op	 2462684 B/op	   37849 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 109129403 ns/op	 2459488 B/op	   37842 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108162610 ns/op	 2459590 B/op	   37842 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 138386301 ns/op	13250031 B/op	  331759 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 138826082 ns/op	13245079 B/op	  331749 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137374696 ns/op	13246263 B/op	  331758 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137161865 ns/op	13245141 B/op	  331748 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137189573 ns/op	13244921 B/op	  331746 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137866037 ns/op	13249361 B/op	  331755 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 235371265 ns/op	45588334 B/op	 1410406 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 227412025 ns/op	45586574 B/op	 1410399 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 215788986 ns/op	45584766 B/op	 1410385 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 225568758 ns/op	45606912 B/op	 1410388 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 276683213 ns/op	45593769 B/op	 1410389 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 228376009 ns/op	45584707 B/op	 1410379 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 103100941 ns/op	 1015770 B/op	    9212 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 103012439 ns/op	 1021196 B/op	    9234 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 103066084 ns/op	 1024546 B/op	    9242 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 103037009 ns/op	 1024724 B/op	    9243 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 103079712 ns/op	 1025671 B/op	    9242 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 103101372 ns/op	 1016767 B/op	    9240 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106792057 ns/op	 3418492 B/op	   29923 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106501557 ns/op	 3418636 B/op	   29923 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106384680 ns/op	 3422528 B/op	   29925 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106425652 ns/op	 3418588 B/op	   29923 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106415324 ns/op	 3418232 B/op	   29922 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106454429 ns/op	 3418524 B/op	   29923 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103748502 ns/op	 1144600 B/op	   11952 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103811303 ns/op	 1144290 B/op	   11950 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103931879 ns/op	 1148480 B/op	   11953 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103779959 ns/op	 1146983 B/op	   11949 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103850903 ns/op	 1148485 B/op	   11953 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103790413 ns/op	 1148391 B/op	   11953 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 114662368 ns/op	 4591650 B/op	   41830 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 114274848 ns/op	 4595860 B/op	   41831 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 114610867 ns/op	 4595901 B/op	   41832 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 115161465 ns/op	 4595760 B/op	   41831 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 114488079 ns/op	 4595917 B/op	   41831 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 114726353 ns/op	 4595862 B/op	   41831 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114494690 ns/op	 6181496 B/op	   75929 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114933610 ns/op	 6188520 B/op	   75973 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114801278 ns/op	 6188875 B/op	   75914 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114830426 ns/op	 6185913 B/op	   75952 allocs/op
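
The summary rows can be cross-checked against the raw samples. The sketch below recomputes the allocs/op delta for AllocateAction_SmallCluster from the six samples listed above for each branch, assuming the summary uses the median as its central value. The helper names are illustrative and this is not the benchmark tool's own code.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of samples (mean of the two middle values
// for an even count). The input slice is not modified.
func median(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// percentDelta returns the relative change from base to pr, in percent.
func percentDelta(base, pr float64) float64 {
	return (pr - base) / base * 100
}

func main() {
	// allocs/op samples for BenchmarkAllocateAction_SmallCluster-4, copied
	// from the raw benchmark data in this comment.
	mainAllocs := []float64{37847, 37847, 37845, 37849, 37842, 37842}
	prAllocs := []float64{38487, 38480, 38475, 38477, 38477, 38471}

	base, pr := median(mainAllocs), median(prAllocs)
	fmt.Printf("%.0f -> %.0f (%+.2f%%)\n", base, pr, percentDelta(base, pr))
	// prints: 37846 -> 38477 (+1.67%)
}
```

The result matches the 37.85k, 38.48k, and +1.67% shown in the allocs/op table after rounding.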

enoodle force-pushed the erez/migrate-node-info-pod-info-to-vectors branch from db49cb7 to 4a7b05d on March 5, 2026 12:24

enoodle commented Mar 5, 2026

Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/scheduler/actions/common/allocate.go (1)

177-190: ⚠️ Potential issue | 🟡 Minor

Fix log wording to match vector payload.

These logs now print task.ResReqVector (full resource vector), but the message still says “requires … GPUs”. That is misleading during debugging.

Suggested patch
-	log.InfraLogger.V(6).Infof("Binding Task <%v/%v> to node <%v>, requires: %v GPUs",
+	log.InfraLogger.V(6).Infof("Binding Task <%v/%v> to node <%v>, requires resources vector: %v",
 		task.Namespace, task.Name, node.Name, task.ResReqVector)

-	log.InfraLogger.V(6).Infof("Pipelining Task <%v/%v> to node <%v> requires: %v GPUs",
+	log.InfraLogger.V(6).Infof("Pipelining Task <%v/%v> to node <%v> requires resources vector: %v",
 		task.Namespace, task.Name, node.Name, task.ResReqVector)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/actions/common/allocate.go` around lines 177 - 190, Update the
log messages that currently print task.ResReqVector but say "requires: ... GPUs"
to accurately reflect the payload; change the wording in the binding log (the
log right before stmt.Allocate(...) inside the Allocate/Bind function) and in
pipelineTaskToNode(...) so they mention "requires resource vector" or "requires
resources" (or similar) instead of "requires: %v GPUs", keeping the same %v
placeholder for task.ResReqVector.
🧹 Nitpick comments (9)
pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go (1)

154-154: Use a single constant for the GPU vector key.

Both Line 154 and Line 177 hardcode "gpu". Centralizing this key reduces typo/drift risk while vector migration is still in progress.

Suggested refactor
 const (
 	idleGpuFilterName               = "AccumulatedIdleGpus"
 	nonAccumulatedScenarioBaseError = "accumulatedIdleGpus requires all the filters scenarios using the same instance to be based on the same scenario with accumulation of potential victims. "
 	requiredResourcesDiffError      = "The pending task %s didn't appear in the scenario given at the AccumulatedIdleGpus ctor"
 	recordedVictimsDiffError        = "The recorded victims should remain the same between the different scenario filtering. %d cache hits, pre update recorded tasks seen %d, post update recorded tasks seen %d"
 	potentialVictimsDiffError       = "The list of potential victims for the current scenario should contain the previous list of potential victims. Only %d out %d tasks are the contained in the current scenario"
+	gpuVectorKey                    = "gpu"
 )
@@
-		requiredResources = append(requiredResources, pod.ResReqVector.Get(pod.VectorMap.GetIndex("gpu")))
+		requiredResources = append(requiredResources, pod.ResReqVector.Get(pod.VectorMap.GetIndex(gpuVectorKey)))
@@
-	ig.nodesNameToIdleGpus[task.NodeName] += task.AcceptedResourceVector.Get(task.VectorMap.GetIndex("gpu"))
+	ig.nodesNameToIdleGpus[task.NodeName] += task.AcceptedResourceVector.Get(task.VectorMap.GetIndex(gpuVectorKey))

Also applies to: 177-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go`
at line 154, The code hardcodes "gpu" in multiple places (e.g., in the
requiredResources append using
pod.ResReqVector.Get(pod.VectorMap.GetIndex("gpu")) and the other occurrence at
line 177); introduce a single file-level constant (e.g., const gpuVectorKey =
"gpu") and replace all hardcoded "gpu" usages with that constant (use
pod.VectorMap.GetIndex(gpuVectorKey) wherever needed) so the key is centralized
and avoids drift during vector migration.
pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go (1)

28-28: Prefer per-test vector map instances over a shared package variable.

Line 28 introduces mutable shared test state. A per-test map keeps fixtures isolated and avoids hidden coupling as more vector keys are introduced in future tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go`
at line 28, Replace the package-level mutable testVectorMap with per-test
instances to avoid shared mutable state: remove the var testVectorMap =
resource_info.NewResourceVectorMap() and instantiate
resource_info.NewResourceVectorMap() inside each test (or a test setup helper)
that currently references testVectorMap (search for usages of testVectorMap in
idle_gpus_test.go) so each test gets its own fresh ResourceVectorMap fixture.
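
A minimal sketch of the per-test fixture idea, assuming a simplified stand-in for the vector map. The `vectorMap` type and the `newVectorMap`, `addResource`, and `getIndex` names are hypothetical; they only mimic the name-to-index role of `resource_info.ResourceVectorMap` and are not the project's API.

```go
package main

import "fmt"

// vectorMap is a hypothetical stand-in for resource_info.ResourceVectorMap:
// it assigns each resource name a stable slot index.
type vectorMap struct {
	index map[string]int
}

// newVectorMap builds a fresh map on every call, so callers never share state.
func newVectorMap(names ...string) *vectorMap {
	m := &vectorMap{index: make(map[string]int)}
	for _, name := range names {
		m.addResource(name)
	}
	return m
}

// addResource registers name if needed and returns its slot index.
func (m *vectorMap) addResource(name string) int {
	if i, ok := m.index[name]; ok {
		return i
	}
	i := len(m.index)
	m.index[name] = i
	return i
}

// getIndex returns the slot index for name, or -1 when it is not registered.
func (m *vectorMap) getIndex(name string) int {
	if i, ok := m.index[name]; ok {
		return i
	}
	return -1
}

func main() {
	// Each "test" owns its map; registering a key in one does not affect the other.
	first := newVectorMap("cpu", "memory", "gpu")
	second := newVectorMap("cpu", "memory", "gpu")
	first.addResource("example.com/custom")

	fmt.Println(first.getIndex("example.com/custom"))  // registered only in first
	fmt.Println(second.getIndex("example.com/custom")) // still unregistered in second
}
```

With a package-level map, the key added by the first test would stay visible to every later test; building the map inside each test (or in a setup helper like the one above) keeps the fixtures independent.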
pkg/scheduler/plugins/proportion/proportion_test.go (1)

623-627: Reduce duplicated resource fixture setup for ResReq/ResReqVector.

The same requirement is built twice per pod (once for ResReq, once for ResReqVector). Create it once and derive both fields from it to avoid fixture drift.

♻️ Example pattern
-ResReq:       common_info.BuildResourceRequirements("1", "1G"),
-ResReqVector: common_info.BuildResourceRequirements("1", "1G").ToVector(testVectorMap),
+req := common_info.BuildResourceRequirements("1", "1G")
+ResReq:       req,
+ResReqVector: req.ToVector(testVectorMap),

Also applies to: 634-638, 661-665, 672-676, 686-690, 713-717, 724-728, 738-742, 843-847, 875-879, 881-885, 887-891, 921-925, 927-931, 958-962, 964-968

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/plugins/proportion/proportion_test.go` around lines 623 - 627,
The test fixtures duplicate parsing by calling
common_info.BuildResourceRequirements twice for ResReq and ResReqVector; fix by
creating a single requirement variable (e.g., req :=
common_info.BuildResourceRequirements("2", "2G")) and reuse it for both fields
so ResReq: req and ResReqVector: req.ToVector(testVectorMap); apply the same
refactor to all similar blocks that build resources (lines referenced in the
comment) to avoid fixture drift and keep testVectorMap usage consistent.
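
The pattern above can be sketched in compilable form. Because a `req := ...` statement cannot appear inside a struct literal, this sketch moves it into a small helper. The `requirements`, `podFixture`, `buildRequirements`, and `newPodFixture` names are hypothetical stand-ins for illustration only, not the project's types.

```go
package main

import "fmt"

// requirements is a hypothetical stand-in for the value returned by
// common_info.BuildResourceRequirements.
type requirements struct {
	cpu    string
	memory string
}

// buildRequirements mimics the fixture constructor; it only records its inputs.
func buildRequirements(cpu, memory string) *requirements {
	return &requirements{cpu: cpu, memory: memory}
}

// toVector mimics ToVector by returning the values in a fixed order.
func (r *requirements) toVector() []string {
	return []string{r.cpu, r.memory}
}

// podFixture holds the two fields that the test data must keep in sync.
type podFixture struct {
	ResReq       *requirements
	ResReqVector []string
}

// newPodFixture builds the requirement once and derives both fields from it,
// so the scalar and vector forms cannot drift apart.
func newPodFixture(cpu, memory string) podFixture {
	req := buildRequirements(cpu, memory)
	return podFixture{
		ResReq:       req,
		ResReqVector: req.toVector(),
	}
}

func main() {
	pod := newPodFixture("1", "1G")
	fmt.Println(pod.ResReq.cpu, pod.ResReq.memory, pod.ResReqVector)
}
```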
pkg/scheduler/plugins/proportion/utils/utils.go (1)

24-24: Add a GoDoc comment for exported QuantifyVector.

Please add a short GoDoc comment starting with QuantifyVector ....

As per coding guidelines "Add GoDoc-style comments for exported functions and types."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/plugins/proportion/utils/utils.go` at line 24, QuantifyVector
is missing a GoDoc comment; add a GoDoc-style comment directly above the
exported function QuantifyVector that begins with "QuantifyVector ..." and
briefly describes what the function does, its parameters (vec and vectorMap) and
the returned rs.ResourceQuantities to satisfy the project's documentation
guidelines.
pkg/scheduler/plugins/proportion/proportion.go (1)

221-239: Refactor getResources to single-pass accumulation and nil-skip.

The current implementation allocates an intermediate slice and then iterates over it again. A single pass is simpler and also lets the loop skip any nil AcceptedResourceVector.

Suggested patch
 func getResources(ignoreReallocatedTasks bool, pods ...*pod_info.PodInfo) resource_info.ResourceVector {
-	var vectors []resource_info.ResourceVector
+	var total resource_info.ResourceVector
 	for _, task := range pods {
 		if ignoreReallocatedTasks && pod_status.IsActiveAllocatedStatus(task.Status) {
 			continue
 		}
-		vectors = append(vectors, task.AcceptedResourceVector)
-	}
-
-	if len(vectors) == 0 {
-		return nil
-	}
-
-	total := vectors[0].Clone()
-	for _, vec := range vectors[1:] {
+		vec := task.AcceptedResourceVector
+		if vec == nil {
+			continue
+		}
+		if total == nil {
+			total = vec.Clone()
+			continue
+		}
 		total.Add(vec)
 	}
 
 	return total
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/plugins/proportion/proportion.go` around lines 221 - 239, The
getResources function currently builds an intermediate slice then sums it;
instead, iterate once over pods, skip tasks where ignoreReallocatedTasks &&
pod_status.IsActiveAllocatedStatus(task.Status) and also skip when
task.AcceptedResourceVector is nil, and accumulate directly into a single
resource_info.ResourceVector named total by setting total =
task.AcceptedResourceVector.Clone() for the first non-nil vector and
total.Add(vec) for subsequent ones; if no non-nil vectors are seen return nil.
Use the existing function name getResources, types resource_info.ResourceVector
and pod_info.PodInfo, and the AcceptedResourceVector field/Clone/Add methods
referenced in the diff.
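
For readers who want to try the single-pass accumulation in isolation, here is a self-contained sketch. It assumes a slice-backed vector; `resourceVector`, `pod`, and `sumAccepted` are hypothetical stand-ins for `resource_info.ResourceVector`, `pod_info.PodInfo`, and `getResources`, and the `activeAllocated` flag stands in for the `pod_status.IsActiveAllocatedStatus` check.

```go
package main

import "fmt"

// resourceVector is a hypothetical stand-in for resource_info.ResourceVector.
type resourceVector []float64

// clone returns an independent copy so the caller's vector is never modified.
func (v resourceVector) clone() resourceVector {
	out := make(resourceVector, len(v))
	copy(out, v)
	return out
}

// add accumulates other into v in place; slots beyond the shorter vector are ignored.
func (v resourceVector) add(other resourceVector) {
	for i := 0; i < len(v) && i < len(other); i++ {
		v[i] += other[i]
	}
}

// pod is a hypothetical stand-in for pod_info.PodInfo.
type pod struct {
	activeAllocated bool
	accepted        resourceVector
}

// sumAccepted totals the accepted vectors in one pass, skipping filtered pods
// and nil vectors. It returns nil when nothing was accumulated.
func sumAccepted(ignoreActiveAllocated bool, pods ...*pod) resourceVector {
	var total resourceVector
	for _, p := range pods {
		if ignoreActiveAllocated && p.activeAllocated {
			continue
		}
		if p.accepted == nil {
			continue
		}
		if total == nil {
			// The first usable vector is cloned so later adds do not touch the pod.
			total = p.accepted.clone()
			continue
		}
		total.add(p.accepted)
	}
	return total
}

func main() {
	pods := []*pod{
		{accepted: resourceVector{1, 2}},
		{accepted: nil},
		{activeAllocated: true, accepted: resourceVector{10, 10}},
		{accepted: resourceVector{3, 4}},
	}
	fmt.Println(sumAccepted(true, pods...))
	fmt.Println(sumAccepted(false, pods...))
}
```

Cloning only the first usable vector keeps the inputs untouched while avoiding the intermediate slice that the original implementation builds.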
pkg/scheduler/cache/cluster_info/cluster_info.go (1)

285-285: Replace the hardcoded GPU key with the shared constant.

Using a string literal here is brittle when a canonical constant already exists in this file.

♻️ Proposed cleanup
-			if nodeInfo.AllocatableVector.Get(nodeInfo.VectorMap.GetIndex("gpu")) > 0 {
+			if nodeInfo.AllocatableVector.Get(nodeInfo.VectorMap.GetIndex(constants.GpuResource)) > 0 {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/cache/cluster_info/cluster_info.go` at line 285, The code uses
a hardcoded string literal "gpu" in nodeInfo.VectorMap.GetIndex("gpu") when
checking nodeInfo.AllocatableVector.Get(...); replace that literal with the
shared GPU constant used in the proposed cleanup (constants.GpuResource) so the
check reads nodeInfo.VectorMap.GetIndex(constants.GpuResource) (keep using
nodeInfo, AllocatableVector.Get and VectorMap.GetIndex as-is).
pkg/scheduler/plugins/topology/job_filtering.go (1)

113-124: Extract duplicated task-vector aggregation into one helper.

The same accumulation logic exists in getTasksAllocationMetadata and sortTreeFromRoot; consolidating it will reduce drift risk.

Also applies to: 431-440

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/plugins/topology/job_filtering.go` around lines 113 - 124, The
aggregation logic that builds a combined ResourceVector and task count is
duplicated in getTasksAllocationMetadata and in sortTreeFromRoot; extract it
into a single helper (e.g., aggregateTasksResources or
computeTasksResourceSummary) that accepts []*pod_info.PodInfo and returns
(resource_info.ResourceVector, int), move the loop that clones the first
task.ResReqVector and adds subsequent vectors into that helper, and replace the
bodies of getTasksAllocationMetadata and the code in sortTreeFromRoot (and the
duplicate at lines ~431-440) to call the new helper to obtain tasksResources and
tasksCount.
pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go (1)

18-24: Add a GoDoc comment for exported FitsReclaimStrategy.

The exported function signature was updated, but it still has no GoDoc comment.

As per coding guidelines "Add GoDoc-style comments for exported functions and types".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go` around
lines 18 - 24, Add a GoDoc comment for the exported function FitsReclaimStrategy
that briefly describes what the function does and explains its parameters and
return value: mention that it determines whether a reclaimer
(reclaimerResources) can reclaim resources from a reclaimee given the resource
vector map (vectorMap), queue attributes for reclaimer and reclaimee
(reclaimerQueue, reclaimeeQueue) and the reclaimee's remaining share
(reclaimeeRemainingShare), and that it returns a bool indicating whether the
reclaim fits; place the comment immediately above the FitsReclaimStrategy
declaration and follow GoDoc conventions (starts with the function name and is a
complete sentence).
pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go (1)

6-20: Reorder imports into standard / external / internal groups.

k8s.io/api/core/v1 is an external dependency and should be separated from internal imports.

♻️ Suggested import grouping
 import (
 	"fmt"
 	"maps"
 	"math"
 	"strings"
 
+	v1 "k8s.io/api/core/v1"
+
 	commonconstants "github.com/NVIDIA/KAI-scheduler/pkg/common/constants"
 	"github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/common_info"
 	"github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/resource_info"
 	"github.com/NVIDIA/KAI-scheduler/pkg/scheduler/log"
 	"github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies"
 	rs "github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/resource_share"
 	"github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/utils"
-	v1 "k8s.io/api/core/v1"
 )

As per coding guidelines "Organize imports in three groups separated by blank lines: (1) Standard library, (2) External dependencies, (3) Internal packages".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go` around lines 6 -
20, Reorder the import block in reclaimable.go into three groups separated by
blank lines: (1) standard library imports (fmt, maps, math, strings), (2)
external dependencies (v1 "k8s.io/api/core/v1"), and (3) internal packages
(commonconstants, common_info, resource_info, log, strategies, rs, utils) while
preserving existing aliases (e.g., v1 and rs) and import order within each group
to satisfy the project's import grouping guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/scheduler/api/common_info/pod_errors.go`:
- Around line 80-83: The code reads vectors using indices from
vectorMap.GetIndex without checking for -1; update the logic around the uses of
vectorMap.GetIndex for migProfile (and the analogous scalar/resource access
later) to first capture the index (e.g., migIdx :=
vectorMap.GetIndex(string(migProfile))) and guard with if migIdx >= 0 before
calling availableVector.Get(migIdx) or capacityVector.Get(migIdx); if the index
is < 0, return or surface an explicit error/log indicating the resource/MIG
profile is unregistered (so you don't silently treat it as zero), mirroring the
existing pattern used elsewhere in resource_vector.go.

In `@pkg/scheduler/api/node_info/node_info.go`:
- Around line 394-403: The helper lessEqualVectorsExcludingGPU mutates its input
ResourceVector values (a and b) which can race; change it to a non-mutating
comparison: get gpuIdx via ni.VectorMap.GetIndex(commonconstants.GpuResource)
and then loop over all valid indices (or use a length accessor) comparing
a.Get(i) <= b.Get(i) for each i except gpuIdx, returning false on the first
violation and true otherwise. Ensure you reference and use only
ResourceVector.Get and the VectorMap/GetIndex (no Set calls) so neither
task.ResReqVector nor node vectors are modified during the check.

In `@pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go`:
- Around line 79-83: getRequiredQuota iterates tasksToAllocate and blindly calls
utils.QuantifyVector(pod.ResReqVector, pod.VectorMap) which will nil-deref for
pods that haven't initialized vector fields; add a defensive check in
getRequiredQuota: if pod.ResReqVector==nil || pod.VectorMap==nil then compute
quantities from the pod's scalar resource fields (e.g., CPU/milliCPU, Memory,
GPU scalar fields the pod struct exposes) and add those to quota.GPU,
quota.MilliCPU and quota.Memory, otherwise call utils.QuantifyVector as before;
update variable names (quantities, quota.GPU/MilliCPU/Memory) accordingly so
behavior is identical for vectorized pods.

In `@pkg/scheduler/plugins/proportion/proportion_test.go`:
- Around line 811-817: The node's VectorMap and AllocatableVector are being
built from a newly created vectorMap, which can diverge from the test fixtures'
testVectorMap and break vector index alignment; change the setup so
testData.node.VectorMap is assigned the existing testVectorMap (use
testVectorMap instead of a locally created vectorMap) and compute
testData.node.AllocatableVector via
testData.node.Allocatable.ToVector(testVectorMap) so getNodeResources and pod
vectors share the same ResourceVectorMap instance.

In `@pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go`:
- Around line 269-289: getInvolvedResourcesNames currently only checks the
GpuResource slot; add a check for the MIG-backed GPU slot so MIG-only vectors
also mark GPU as involved: retrieve the MIG index from vectorMap (similar to
cpuIdx/memIdx/gpuIdx, e.g. migIdx :=
vectorMap.GetIndex(commonconstants.MigResource) or the project's MIG constant),
ensure the index is valid, and if vec.Get(migIdx) > 0 set
involvedResources[rs.GpuResource] = struct{}{} (keep the existing checks for
cpuIdx/memIdx/gpuIdx intact in the getInvolvedResourcesNames function).

In `@pkg/scheduler/plugins/topology/job_filtering_test.go`:
- Around line 31-32: Tests currently mix a global testVectorMap (created via
resource_info.NewResourceVectorMap()) with per-tree VectorMap instances causing
inconsistent vector/indices; replace the global testVectorMap by creating a
single VectorMap inside each test and pass that same VectorMap instance to every
topology/tree creation in that test (instead of creating new maps inside
functions), update usages referencing testVectorMap so all calls use the local
VectorMap, and remove any other per-tree NewResourceVectorMap() calls so each
test runs end-to-end with one consistent VectorMap.

In `@pkg/scheduler/plugins/topology/topology_plugin.go`:
- Around line 59-64: If nodes can be empty, ensure sharedVectorMap is
initialized to a non-nil default instead of staying nil: after the loop that
sets sharedVectorMap from nodeInfo.VectorMap, check if sharedVectorMap == nil
and, if so, create and assign a new resource_info.ResourceVectorMap (or call the
existing constructor/helper) before it is used or assigned into topology trees
(references: sharedVectorMap, nodes, resource_info.ResourceVectorMap); this
prevents nil derefs when later converting/indexing vectors.
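
The node_info.go item above asks for a comparison that reads its inputs without modifying them. A minimal sketch of that idea follows, assuming slice-backed vectors; `lessEqualExcluding` is a hypothetical helper, and treating slots missing from the second vector as zero is an assumption of this sketch rather than the project's behavior.

```go
package main

import "fmt"

// lessEqualExcluding reports whether a[i] <= b[i] for every slot except skipIdx.
// It only reads its inputs. A skipIdx of -1 (an unregistered resource) skips
// nothing, and slots missing from b are treated as zero (an assumption here).
func lessEqualExcluding(a, b []float64, skipIdx int) bool {
	for i, av := range a {
		if i == skipIdx {
			continue
		}
		var bv float64
		if i < len(b) {
			bv = b[i]
		}
		if av > bv {
			return false
		}
	}
	return true
}

func main() {
	request := []float64{1000, 2e9, 8} // cpu, memory, gpu (hypothetical slot order)
	idle := []float64{4000, 4e9, 0}
	const gpuIdx = 2

	fmt.Println(lessEqualExcluding(request, idle, gpuIdx)) // GPU slot ignored
	fmt.Println(lessEqualExcluding(request, idle, -1))     // every slot compared
}
```

Because the helper never writes to either slice, it can be called on a task's request vector and a node's vectors without the save-and-restore step that a mutating comparison would need.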

---

Outside diff comments:
In `@pkg/scheduler/actions/common/allocate.go`:
- Around line 177-190: Update the log messages that currently print
task.ResReqVector but say "requires: ... GPUs" to accurately reflect the
payload; change the wording in the binding log (the log right before
stmt.Allocate(...) inside the Allocate/Bind function) and in
pipelineTaskToNode(...) so they mention "requires resource vector" or "requires
resources" (or similar) instead of "requires: %v GPUs", keeping the same %v
placeholder for task.ResReqVector.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3560b2aa-a32c-4a45-a10b-4d963473d8f0

📥 Commits

Reviewing files that changed from the base of the PR and between 49dd391 and 4a7b05d.

📒 Files selected for processing (38)
  • pkg/scheduler/actions/common/allocate.go
  • pkg/scheduler/actions/common/feasible_nodes_test.go
  • pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go
  • pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go
  • pkg/scheduler/actions/common/solvers/pod_scenario_builder_test.go
  • pkg/scheduler/actions/utils/job_order_by_queue_test.go
  • pkg/scheduler/api/common_info/job_errors.go
  • pkg/scheduler/api/common_info/job_errors_test.go
  • pkg/scheduler/api/common_info/pod_errors.go
  • pkg/scheduler/api/common_info/pod_errors_test.go
  • pkg/scheduler/api/node_info/gpu_sharing_node_info.go
  • pkg/scheduler/api/node_info/node_info.go
  • pkg/scheduler/api/node_info/node_info_test.go
  • pkg/scheduler/api/pod_info/pod_info.go
  • pkg/scheduler/cache/cluster_info/cluster_info.go
  • pkg/scheduler/framework/session.go
  • pkg/scheduler/framework/statement.go
  • pkg/scheduler/plugins/nodeavailability/nodeavailability.go
  • pkg/scheduler/plugins/nodeplacement/nodepack_test.go
  • pkg/scheduler/plugins/nodeplacement/nodespread_test.go
  • pkg/scheduler/plugins/nodeplacement/pack.go
  • pkg/scheduler/plugins/nodeplacement/spread.go
  • pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go
  • pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy_test.go
  • pkg/scheduler/plugins/proportion/proportion.go
  • pkg/scheduler/plugins/proportion/proportion_test.go
  • pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go
  • pkg/scheduler/plugins/proportion/reclaimable/reclaimable_test.go
  • pkg/scheduler/plugins/proportion/reclaimable/reclaimer_info.go
  • pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go
  • pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies_test.go
  • pkg/scheduler/plugins/proportion/utils/utils.go
  • pkg/scheduler/plugins/resourcetype/resourcetype.go
  • pkg/scheduler/plugins/topology/job_filtering.go
  • pkg/scheduler/plugins/topology/job_filtering_test.go
  • pkg/scheduler/plugins/topology/node_scoring_test.go
  • pkg/scheduler/plugins/topology/topology_plugin.go
  • pkg/scheduler/plugins/topology/topology_structs.go

Comment thread pkg/scheduler/api/common_info/pod_errors.go Outdated
Comment thread pkg/scheduler/api/node_info/node_info.go Outdated
Comment thread pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go
Comment thread pkg/scheduler/plugins/proportion/proportion_test.go
Comment thread pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go
Comment thread pkg/scheduler/plugins/topology/job_filtering_test.go
Comment thread pkg/scheduler/plugins/topology/topology_plugin.go
@enoodle enoodle force-pushed the erez/migrate-node-info-pod-info-to-vectors branch from 4a7b05d to fb1e847 Compare March 6, 2026 16:00
@github-actions

github-actions Bot commented Mar 6, 2026

Merging this branch changes the coverage (4 decrease, 6 increase)

| Impacted Packages | Coverage Δ | 🤖 |
| --- | --- | --- |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common | 23.11% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common/solvers | 22.22% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus | 86.64% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/utils | 56.48% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/common_info | 62.57% (+0.32%) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/node_info | 72.80% (-0.20%) | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/pod_info | 66.52% (-0.29%) | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/cache/cluster_info | 83.80% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/framework | 33.25% (-0.12%) | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/nodeavailability | 88.89% (+1.39%) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/nodeplacement | 92.00% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion | 37.39% (+0.40%) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy | 98.61% (+0.02%) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable | 92.62% (-0.30%) | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies | 73.33% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/utils | 0.00% (ø) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/resourcetype | 90.91% (+0.91%) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/topology | 88.08% (+0.61%) | 👍 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
| --- | --- | --- | --- | --- | --- |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common/allocate.go | 0.00% (ø) | 123 | 0 | 123 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go | 77.53% (ø) | 89 | 69 | 20 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/common_info/job_errors.go | 96.92% (-0.34%) | 65 (-8) | 63 (-8) | 2 | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors.go | 48.35% (+3.79%) | 91 (-1) | 44 (+3) | 47 (-4) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/node_info/gpu_sharing_node_info.go | 76.62% (+0.46%) | 154 (+3) | 118 (+3) | 36 | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/node_info/node_info.go | 71.10% (-0.42%) | 346 (+23) | 246 (+15) | 100 (+8) | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/pod_info/pod_info.go | 62.35% (-0.31%) | 162 (+4) | 101 (+2) | 61 (+2) | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/cache/cluster_info/cluster_info.go | 85.87% (ø) | 269 | 231 | 38 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/framework/session.go | 0.00% (ø) | 189 (+3) | 0 | 189 (+3) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/framework/statement.go | 50.35% (ø) | 288 | 145 | 143 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/nodeavailability/nodeavailability.go | 88.89% (+1.39%) | 9 (+1) | 8 (+1) | 1 | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/pack.go | 96.88% (ø) | 32 | 31 | 1 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/spread.go | 100.00% (ø) | 11 | 11 | 0 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go | 100.00% (ø) | 26 (+1) | 26 (+1) | 0 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion.go | 37.39% (+0.40%) | 222 (+3) | 83 (+2) | 139 (+1) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go | 92.62% (-0.30%) | 122 (+9) | 113 (+8) | 9 (+1) | 👎 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimer_info.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go | 73.33% (ø) | 15 | 11 | 4 | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/utils/utils.go | 0.00% (ø) | 17 (+14) | 0 | 17 (+14) | |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/resourcetype/resourcetype.go | 90.91% (+0.91%) | 11 (+1) | 10 (+1) | 1 | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering.go | 93.99% (+0.56%) | 283 (+9) | 266 (+10) | 17 (-1) | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/topology/topology_plugin.go | 92.11% (+0.93%) | 38 (+4) | 35 (+4) | 3 | 👍 |
| github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/topology/topology_structs.go | 73.33% (ø) | 15 | 11 | 4 | |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common/feasible_nodes_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/common/solvers/pod_scenario_builder_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/actions/utils/job_order_by_queue_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/common_info/job_errors_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/node_info/node_info_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodepack_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodespread_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering_test.go
  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/plugins/topology/node_scoring_test.go

@enoodle enoodle force-pushed the erez/migrate-node-info-pod-info-to-vectors branch from fb1e847 to 51f1e17 Compare March 10, 2026 00:18
@github-actions

Merging this branch changes the coverage (4 decrease, 6 increase)

| Impacted Packages | Coverage Δ | 🤖 |
| --- | --- | --- |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common | 23.11% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers | 22.22% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus | 86.64% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/utils | 56.48% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info | 62.57% (+0.32%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info | 72.80% (-0.20%) | 👎 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info | 66.52% (-0.29%) | 👎 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info | 83.80% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework | 33.25% (-0.12%) | 👎 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeavailability | 88.89% (+1.39%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement | 92.00% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion | 37.39% (+0.40%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy | 98.61% (+0.02%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable | 92.62% (-0.30%) | 👎 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies | 73.33% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/utils | 0.00% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/resourcetype | 90.91% (+0.91%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology | 88.08% (+0.61%) | 👍 |

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/allocate.go 0.00% (ø) 123 0 123
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go 77.53% (ø) 89 69 20
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/job_errors.go 96.92% (-0.34%) 65 (-8) 63 (-8) 2 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors.go 48.35% (+3.79%) 91 (-1) 44 (+3) 47 (-4) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/gpu_sharing_node_info.go 76.62% (+0.46%) 154 (+3) 118 (+3) 36 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/node_info.go 71.10% (-0.42%) 346 (+23) 246 (+15) 100 (+8) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info/pod_info.go 62.35% (-0.31%) 162 (+4) 101 (+2) 61 (+2) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info/cluster_info.go 85.87% (ø) 269 231 38
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework/session.go 0.00% (ø) 189 (+3) 0 189 (+3)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework/statement.go 50.35% (ø) 288 145 143
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeavailability/nodeavailability.go 88.89% (+1.39%) 9 (+1) 8 (+1) 1 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/pack.go 96.88% (ø) 32 31 1
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/spread.go 100.00% (ø) 11 11 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go 100.00% (ø) 26 (+1) 26 (+1) 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion.go 37.39% (+0.40%) 222 (+3) 83 (+2) 139 (+1) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go 92.62% (-0.30%) 122 (+9) 113 (+8) 9 (+1) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimer_info.go 0.00% (ø) 0 0 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go 73.33% (ø) 15 11 4
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/utils/utils.go 0.00% (ø) 17 (+14) 0 17 (+14)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/resourcetype/resourcetype.go 90.91% (+0.91%) 11 (+1) 10 (+1) 1 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering.go 93.99% (+0.56%) 283 (+9) 266 (+10) 17 (-1) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/topology_plugin.go 92.11% (+0.93%) 38 (+4) 35 (+4) 3 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/topology_structs.go 73.33% (ø) 15 11 4

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/feasible_nodes_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/pod_scenario_builder_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/utils/job_order_by_queue_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/job_errors_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/node_info_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodepack_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodespread_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/node_scoring_test.go

Comment thread pkg/scheduler/actions/common/feasible_nodes_test.go
Comment thread pkg/scheduler/api/common_info/job_errors.go Outdated
Comment thread pkg/scheduler/api/common_info/pod_errors.go Outdated
@github-actions

Merging this branch changes the coverage (5 decrease, 4 increase)

Impacted Packages Coverage Δ 🤖
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common 23.11% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers 22.22% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus 86.64% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/utils 56.48% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info 61.96% (-0.29%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info 72.63% (-0.36%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info 66.52% (-0.29%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info 53.63% (+1.01%) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info 83.80% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework 34.88% (-0.09%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeavailability 87.50% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement 92.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion 37.39% (+0.40%) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy 98.61% (+0.02%) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable 92.44% (-0.48%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies 73.33% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/utils 0.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/resourcetype 90.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology 87.97% (+0.50%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/allocate.go 0.00% (ø) 123 0 123
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go 77.53% (ø) 89 69 20
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/job_errors.go 96.92% (-0.34%) 65 (-8) 63 (-8) 2 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors.go 46.59% (+2.03%) 88 (-4) 41 47 (-4) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/gpu_sharing_node_info.go 76.16% (ø) 151 115 36
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/node_info.go 71.04% (-0.47%) 335 (+12) 238 (+7) 97 (+5) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info/pod_info.go 62.35% (-0.31%) 162 (+4) 101 (+2) 61 (+2) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info/resource_vector.go 60.71% (+3.67%) 140 (-9) 85 55 (-9) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info/cluster_info.go 85.87% (ø) 269 231 38
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework/session.go 0.53% (-0.01%) 188 (+2) 1 187 (+2) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework/statement.go 54.51% (ø) 288 157 131
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeavailability/nodeavailability.go 87.50% (ø) 8 7 1
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/pack.go 96.88% (ø) 32 31 1
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/spread.go 100.00% (ø) 11 11 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go 100.00% (ø) 26 (+1) 26 (+1) 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion.go 37.39% (+0.40%) 222 (+3) 83 (+2) 139 (+1) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go 92.44% (-0.48%) 119 (+6) 110 (+5) 9 (+1) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimer_info.go 0.00% (ø) 0 0 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go 73.33% (ø) 15 11 4
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/utils/utils.go 0.00% (ø) 14 (+11) 0 14 (+11)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/resourcetype/resourcetype.go 90.00% (ø) 10 9 1
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering.go 93.91% (+0.48%) 279 (+5) 262 (+6) 17 (-1) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/topology_plugin.go 92.11% (+0.93%) 38 (+4) 35 (+4) 3 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/topology_structs.go 73.33% (ø) 15 11 4

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/feasible_nodes_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/pod_scenario_builder_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/utils/job_order_by_queue_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/job_errors_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/node_info_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodepack_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodespread_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/node_scoring_test.go

enoodle and others added 7 commits March 12, 2026 11:30
…eads to vectors

Convert all read-path methods from Resource to ResourceVector operations:
- NodeInfo: IsTaskAllocatable, FittingError, GetSumOfIdleGPUs, IsCPUOnlyNode
- External plugins: proportion, topology, nodeplacement, nodeavailability, resourcetype
- Framework: session logging, statement references
- Error handling: pod_errors, job_errors

Add AcceptedResourceVector to PodInfo. Add QuantifyVector util to proportion
plugin. Rewrite topology calcNodeAccommodation from iterative pod probing to
division-based vector approach.

Resource fields still maintained via dual-write for backward compatibility
until removal in subsequent commits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
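
The division-based approach mentioned in the commit above can be illustrated with a minimal sketch. It is not the actual `calcNodeAccommodation` code: the `ResourceVector` type, the `podsThatFit` helper, and the resource ordering are all hypothetical and only show the idea of replacing a "try one more pod" loop with a per-dimension division.

```go
package main

import (
	"fmt"
	"math"
)

// ResourceVector is a hypothetical stand-in for the scheduler's vector type:
// one float64 per resource, in a fixed index order.
type ResourceVector []float64

// podsThatFit returns how many copies of request fit into idle. It takes the
// minimum, over every requested dimension, of floor(idle[i] / request[i]).
func podsThatFit(idle, request ResourceVector) int {
	fit := math.MaxInt
	requested := false
	for i, req := range request {
		if req <= 0 {
			continue // dimension not requested, so it never limits the count
		}
		requested = true
		avail := 0.0
		if i < len(idle) {
			avail = idle[i]
		}
		if n := int(math.Floor(avail / req)); n < fit {
			fit = n
		}
	}
	if !requested || fit < 0 {
		return 0 // empty request or negative idle: nothing fits in this sketch
	}
	return fit
}

func main() {
	// Assumed order: CPU (millicores), memory (GiB), GPUs.
	idle := ResourceVector{8000, 64, 4}
	request := ResourceVector{2000, 8, 1}
	fmt.Println(podsThatFit(idle, request))
}
```

The result is computed in a single pass over the vector, independent of how many pods end up fitting, which is presumably the motivation for dropping iterative probing.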
- Thread vectorMap through getJobRatioToFreeResources, sortTree,
  sortTreeFromRoot to avoid allocating a new ResourceVectorMap on
  every call in the hot scheduling path
- Reorder gpuIdx guard before gpuRequest extraction for clarity
- Improve comment explaining releasing memory handling in fractional
  GPU capacity calculation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sons

- Fix log messages in allocate.go that said "requires: %v GPUs" but now
  print a full resource vector
- Rewrite lessEqualVectorsExcludingGPU to iterate and skip the GPU index
  instead of mutating input vectors via save/zero/compare/restore, which
  is not safe under concurrent access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
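
As a rough illustration of the non-mutating comparison described in the commit above, the sketch below iterates over both vectors and skips the GPU index, so neither input is written to. The `ResourceVector` type, the function name `lessEqualExcludingGPU`, and the index value are assumptions made for this example, not the repository's code.

```go
package main

import "fmt"

// ResourceVector is a hypothetical stand-in for the scheduler's vector type.
type ResourceVector []float64

// lessEqualExcludingGPU reports whether a[i] <= b[i] for every index except
// gpuIdx. It only reads its inputs, so concurrent readers of a and b never
// observe a temporarily zeroed GPU entry.
func lessEqualExcludingGPU(a, b ResourceVector, gpuIdx int) bool {
	for i := range a {
		if i == gpuIdx {
			continue // GPU is compared elsewhere in this sketch
		}
		var bv float64 // a dimension missing from b is treated as zero
		if i < len(b) {
			bv = b[i]
		}
		if a[i] > bv {
			return false
		}
	}
	return true
}

func main() {
	const gpuIdx = 2 // assumed position of the GPU entry
	request := ResourceVector{1000, 4, 8}
	idle := ResourceVector{2000, 8, 1}
	// The GPU request exceeds idle GPUs, but that dimension is skipped.
	fmt.Println(lessEqualExcludingGPU(request, idle, gpuIdx))
}
```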
Add CPUIndex, MemoryIndex, GPUIndex, and PodsIndex constants to replace
GetIndex calls for core resources, which are always at fixed positions
since NewResourceVectorMap guarantees their insertion order.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
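
The fixed-index idea in the commit above might look roughly like the following. This is a sketch under stated assumptions: the constant names come from the commit message, but the `ResourceVectorMap` layout, the resource name strings, and the exact insertion order are guesses made for illustration.

```go
package main

import "fmt"

// Fixed positions of the core resources. They are valid only because the
// constructor below always inserts the core resources in this order.
const (
	CPUIndex = iota
	MemoryIndex
	GPUIndex
	PodsIndex
)

// ResourceVectorMap maps resource names to vector positions (hypothetical layout).
type ResourceVectorMap struct {
	names []string
	index map[string]int
}

// NewResourceVectorMap pre-registers the core resources in a fixed order.
func NewResourceVectorMap() *ResourceVectorMap {
	m := &ResourceVectorMap{index: map[string]int{}}
	for _, name := range []string{"cpu", "memory", "nvidia.com/gpu", "pods"} {
		m.AddResource(name)
	}
	return m
}

// AddResource registers name if needed and returns its index.
func (m *ResourceVectorMap) AddResource(name string) int {
	if i, ok := m.index[name]; ok {
		return i
	}
	m.names = append(m.names, name)
	m.index[name] = len(m.names) - 1
	return m.index[name]
}

// GetIndex returns the index of name, or -1 if it is unknown.
func (m *ResourceVectorMap) GetIndex(name string) int {
	if i, ok := m.index[name]; ok {
		return i
	}
	return -1
}

func main() {
	m := NewResourceVectorMap()
	vec := make([]float64, len(m.names))
	vec[GPUIndex] = 2 // constant index, no map lookup needed
	fmt.Println(m.GetIndex("nvidia.com/gpu") == GPUIndex, vec[GPUIndex])
}
```

The constants trade a map lookup for a compile-time offset, at the cost of coupling them to the constructor's insertion order.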
@enoodle enoodle force-pushed the erez/migrate-node-info-pod-info-to-vectors branch from 9534600 to 8f7d6b3 Compare March 12, 2026 10:32

Comment thread pkg/scheduler/plugins/nodeplacement/pack.go Outdated
Comment thread pkg/scheduler/plugins/proportion/utils/utils.go
Comment thread pkg/scheduler/plugins/topology/job_filtering.go Outdated
enoodle added 3 commits March 13, 2026 09:42
Change GetIndex, AddResource, ResourceAt, and internal fields to use
v1.ResourceName instead of string, removing unnecessary string casts
at call sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
…MIG GPU summation

Add TotalGPUs method on ResourceVector that sums regular GPUs and MIG
device portions, replacing duplicated logic in GetSumOfIdleGPUs,
GetSumOfReleasingGPUs, and QuantifyVector.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
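
A possible shape for the `TotalGPUs` helper described in the commit above is sketched below. Only the method name comes from the commit message. The vector layout, the `migPortion` helper, and the rule that a MIG device counts as the number before `g` in its resource name (for example `nvidia.com/mig-3g.20gb` counting as 3) are assumptions made for this example and may differ from the real implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	gpuResource = "nvidia.com/gpu"
	migPrefix   = "nvidia.com/mig-"
)

// migPortion extracts the leading slice count from a MIG resource name,
// e.g. "nvidia.com/mig-3g.20gb" -> 3. ok is false for non-MIG names.
func migPortion(name string) (portion int, ok bool) {
	if !strings.HasPrefix(name, migPrefix) {
		return 0, false
	}
	rest := strings.TrimPrefix(name, migPrefix)
	end := strings.Index(rest, "g")
	if end <= 0 {
		return 0, false
	}
	n, err := strconv.Atoi(rest[:end])
	if err != nil {
		return 0, false
	}
	return n, true
}

// ResourceVector pairs each value with its resource name (hypothetical layout).
type ResourceVector struct {
	names  []string
	values []float64
}

// TotalGPUs sums regular GPUs plus MIG devices weighted by their portion.
func (v ResourceVector) TotalGPUs() float64 {
	total := 0.0
	for i, name := range v.names {
		if name == gpuResource {
			total += v.values[i]
		} else if portion, ok := migPortion(name); ok {
			total += float64(portion) * v.values[i]
		}
	}
	return total
}

func main() {
	v := ResourceVector{
		names:  []string{"cpu", "memory", gpuResource, "nvidia.com/mig-1g.5gb", "nvidia.com/mig-3g.20gb"},
		values: []float64{4000, 16, 2, 2, 1},
	}
	fmt.Println(v.TotalGPUs())
}
```

Keeping the summation in one method means callers such as `GetSumOfIdleGPUs`, `GetSumOfReleasingGPUs`, and `QuantifyVector` can share it instead of each repeating the MIG handling.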
…topology filtering

Extract the shared GPU fraction summation logic into
NodeInfo.AvailableSharedGPUFractions() and simplify
calcNodeAccommodation to use it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
@github-actions

Merging this branch changes the coverage (5 decrease, 4 increase)

Impacted Packages Coverage Δ 🤖
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common 23.11% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers 22.22% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus 86.64% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/utils 56.48% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info 61.96% (-0.29%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info 71.88% (-1.12%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info 66.52% (-0.29%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info 52.73% (+0.11%) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info 83.80% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework 34.88% (-0.09%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/gpu_sharing 43.24% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeavailability 87.50% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement 92.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/podaffinity 88.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion 37.39% (+0.40%) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy 98.61% (+0.02%) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable 92.44% (-0.48%) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies 73.33% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/utils 0.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/resourcetype 90.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology 87.95% (+0.48%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/allocate.go 0.00% (ø) 123 0 123
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus.go 77.53% (ø) 89 69 20
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/job_errors.go 96.92% (-0.34%) 65 (-8) 63 (-8) 2 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors.go 46.59% (+2.03%) 88 (-4) 41 47 (-4) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/gpu_sharing_node_info.go 70.55% (-5.61%) 163 (+12) 115 48 (+12) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/node_info.go 72.56% (+1.04%) 317 (-6) 230 (-1) 87 (-5) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info/pod_info.go 62.35% (-0.31%) 162 (+4) 101 (+2) 61 (+2) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info/resource_vector.go 57.43% (+0.39%) 148 (-1) 85 63 (-1) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info/cluster_info.go 85.87% (ø) 269 231 38
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework/session.go 0.53% (-0.01%) 188 (+2) 1 187 (+2) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/framework/statement.go 54.51% (ø) 288 157 131
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeavailability/nodeavailability.go 87.50% (ø) 8 7 1
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/pack.go 96.88% (ø) 32 31 1
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/spread.go 100.00% (ø) 11 11 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go 100.00% (ø) 26 (+1) 26 (+1) 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion.go 37.39% (+0.40%) 222 (+3) 83 (+2) 139 (+1) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable.go 92.44% (-0.48%) 119 (+6) 110 (+5) 9 (+1) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimer_info.go 0.00% (ø) 0 0 0
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies.go 73.33% (ø) 15 11 4
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/utils/utils.go 0.00% (ø) 4 (+1) 0 4 (+1)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/resourcetype/resourcetype.go 90.00% (ø) 10 9 1
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering.go 94.07% (+0.64%) 270 (-4) 254 (-2) 16 (-2) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/topology_plugin.go 92.11% (+0.93%) 38 (+4) 35 (+4) 3 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/topology_structs.go 73.33% (ø) 15 11 4

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/feasible_nodes_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/accumulated_scenario_filters/idle_gpus/idle_gpus_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/common/solvers/pod_scenario_builder_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/utils/job_order_by_queue_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/job_errors_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info/pod_errors_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/node_info_benchmark_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/node_info/node_info_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info/resource_vector_benchmark_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info/resource_vector_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/gpu_sharing/gpuSharing_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeavailability/nodeavailability_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodepack_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/nodeplacement/nodespread_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/podaffinity/podaffinity_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/proportion_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/reclaimable_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/proportion/reclaimable/strategies/strategies_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/resourcetype/resourcetype_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/job_filtering_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/plugins/topology/node_scoring_test.go

@enoodle enoodle enabled auto-merge March 13, 2026 09:48
@enoodle enoodle added this pull request to the merge queue Mar 13, 2026
Merged via the queue into main with commit 3effe53 Mar 13, 2026
11 of 13 checks passed
@enoodle enoodle deleted the erez/migrate-node-info-pod-info-to-vectors branch March 13, 2026 11:07
davidLif pushed a commit that referenced this pull request Apr 5, 2026
#1146)

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>