
scx_layered: StickyDynamic fails to allocate CPUs when initial LLC is saturated, despite hundreds of idle cores available on other LLCs #3233

@wongkq

Issue Description
Summary: I am running scx_layered on a large server (512 logical cores). I have configured two Confined layers for two VMs, both using the StickyDynamic growth algorithm. Crucially, I have NOT set any llcs or nodes constraints.

Both VMs start executing on the same NUMA node (likely node 0) due to standard kernel placement. The result: the first layer is successfully allocated 20 CPUs (saturating the local LLC), while the second layer receives 0 CPUs and falls back entirely to Open mode, even though the machine has vast amounts of idle capacity on other NUMA nodes/LLCs.

It appears StickyDynamic is too conservative: when the local domain is full, it fails to "spill over" or migrate the workload to other empty nodes to satisfy the cpus_range minimum.

System Environment:

Hardware: Dual AMD EPYC 9745 (Zen5c)
Topology: 512 logical cores total (SMT enabled).
LLC Layout: 32 logical cores per LLC.
Kernel: 6.18.0-rc4
scx_layered version: 1.0.22-g460ed4d5

Configuration (config.json): no specific topology constraints are defined, only cpus_range.

[
  {
    "name": "test0",
    "matches": [
      [
        {
          "CgroupRegex": ".*test0.*vcpu.*"
        }
      ],
      [
        {
           "CgroupRegex": ".*test0.*emulator.*"
        }
      ]
    ],
    "kind": {
      "Confined": {
        "cpus_range": [20, 32],
        "util_range": [0.1, 1.0],
        "growth_algo": "StickyDynamic",
        "common": { "preempt": true, "slice_us": 2000 }
      }
    }
  },
  {
    "name": "test1",
    "matches": [
      [
        {
          "CgroupRegex": ".*test1.*vcpu.*"
        }
      ],
      [
        {
           "CgroupRegex": ".*test1.*emulator.*"
        }
      ]
    ],
    "kind": {
      "Confined": {
        "cpus_range": [20, 32],
        "util_range": [0.1, 1.0],
        "growth_algo": "StickyDynamic",
        "common": { "preempt": true, "slice_us": 2000 }
      }
    }
  },
  { "name": "default", "matches": [ [] ], "kind": { "Open": {} } }
]
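
For comparison, the collision can presumably be avoided by pinning the two layers to disjoint LLCs explicitly. The fragment below is only a sketch: it assumes the llcs constraint mentioned above accepts a list of LLC indices and can sit alongside growth_algo inside the Confined block, and the index values are chosen purely for illustration (matches sections omitted for brevity, test0 analogous with a different index):

{
  "name": "test1",
  "kind": {
    "Confined": {
      "cpus_range": [20, 32],
      "util_range": [0.1, 1.0],
      "growth_algo": "StickyDynamic",
      "llcs": [1],
      "common": { "preempt": true, "slice_us": 2000 }
    }
  }
}

Even if this works, it is a workaround rather than a fix: it hand-codes exactly the placement decision that StickyDynamic is expected to make on its own when the sticky LLC cannot satisfy the cpus_range minimum.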

Steps to Reproduce:

- Start two VMs (test0 and test1), each with 20 vCPUs.
- Apply a heavy load (e.g., 20 threads) inside both VMs.
- Observe that initially, both VMs are likely scheduled on the first few cores (Node 0) by the default kernel scheduler.
- Start scx_layered with the configuration above.

Observed Behavior:
Layer test0: Allocates 20 CPUs successfully (likely taking up most of LLC 0).
Layer test1: Fails to allocate any CPUs. Monitor shows cpus=0.

###### Thu, 15 Jan 2026 06:19:58 -0500 ######
tot= 349732 local_sel/enq=22.11/ 1.61 open_idle= 0.00 affn_viol= 0.00 hi/lo= 0.00/40.85
busy=  4.9 util/hi/lo= 2352.0/ 0.00/988.4 fallback_cpu/util= 10/ 0.0 proc=15ms sys_util_ewma=  2.7
excl_coll=0.00 excl_preempt=0.00 excl_idle=0.00 excl_wakeup=0.00
skip_preempt=0 antistall=0 fixup_vtime=0 preempting_mismatch=0
gpu_tasks_affinitized=0 gpu_task_affinitization_time=0
  test0: util/open/frac=1360.4/ 0.00/   57.8 prot/prot_preempt= 0.01/ 0.00 tasks=    40
             tot= 206443 local_sel/enq=37.28/ 2.72 enq_dsq=57.28 wake/exp/reenq=60.00/ 0.00/ 0.00 dsq_ewma=33.62
             keep/max/busy= 0.00/ 0.00/ 0.00 yield/ign= 0.00/    0
             open_idle= 0.00 mig=76.59 xnuma_mig= 0.00 xllc_mig/skip= 0.00/ 0.00 affn_viol= 0.00
             preempt/first/xllc/xnuma/idle/fail= 0.00/ 0.00/ 0.00/ 0.00/ 0.00/ 0.00
             xlayer_wake/re= 0.69/ 0.10 llc_drain/try= 0.00/ 0.00 skip_rnode= 0.00
             slice=20ms min_exec= 0.00/   0.00ms
             cpus= 20 [ 20, 20] 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000003ff,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000003ff
             [LLC] nr_cpus: sched% lat_ms
             [000] 20:100.0%   0.18 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
             [004]  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
             [008]  0: 0.00%   0.31 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
             [012]  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
  test1: util/open/frac= 988.5/100.0/   42.0 prot/prot_preempt= 0.00/ 0.00 tasks=    45
             tot= 142861 local_sel/enq= 0.00/ 0.00 enq_dsq= 0.00 wake/exp/reenq=99.99/ 0.01/ 0.00 dsq_ewma= 0.00
             keep/max/busy= 0.00/ 0.00/ 0.00 yield/ign= 0.00/    0
             open_idle= 0.00 mig= 9.95 xnuma_mig= 0.00 xllc_mig/skip= 0.00/ 0.00 affn_viol= 0.00
             preempt/first/xllc/xnuma/idle/fail= 0.00/ 0.00/ 0.00/ 0.00/ 0.00/ 0.00
             xlayer_wake/re= 8.12/ 0.35 llc_drain/try= 0.00/ 0.00 skip_rnode= 0.00
             slice=20ms min_exec= 0.00/   0.00ms
             cpus=  0 [  0,  0] 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
             [LLC] nr_cpus: sched% lat_ms
             [000]  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
             [004]  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
             [008]  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
             [012]  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00 |  0: 0.00%   0.00
  default  : util/open/frac=   3.1/ 1.07/    0.1 prot/prot_preempt= 0.01/58.08 tasks=   289
             tot=    428 local_sel/enq=88.55/ 2.80 enq_dsq= 0.00 wake/exp/reenq= 8.64/ 0.00/ 0.00 dsq_ewma= 0.08
             keep/max/busy= 0.00/ 0.00/ 0.00 yield/ign= 0.23/    0
             open_idle= 0.00 mig= 6.31 xnuma_mig= 0.00 xllc_mig/skip= 0.00/ 0.00 affn_viol= 0.00
             preempt/first/xllc/xnuma/idle/fail= 0.00/ 0.00/ 0.00/ 0.00/ 0.00/ 0.00
             xlayer_wake/re= 4.67/ 0.47 llc_drain/try= 0.00/ 0.00 skip_rnode= 0.00
             slice=20ms min_exec= 0.00/   0.00ms
             cpus=492 [492,492] ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,fffffc00,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,fffffc00
             [LLC] nr_cpus: sched% lat_ms
             [000]  0: 0.00%   0.73 |  0: 0.00%   0.45 |  0: 0.00%   0.00 |  0: 0.00%   0.31
             [004]  0: 0.00%   0.00 |  0: 0.00%   0.16 |  0: 0.00%   0.45 |  0: 0.00%   0.00
             [008]  0: 0.00%   0.87 |  0: 0.00%   0.87 |  0: 0.00%   0.31 |  0: 0.00%   0.60
             [012]  0: 0.00%   0.31 |  0: 0.00%   0.00 |  0: 0.00%   0.31 |  0: 0.00%   0.31

Expected Behavior: Since cpus_range has a hard minimum of 20, and the machine has ~400 idle cores on other NUMA nodes/LLCs, StickyDynamic should detect that the current location is saturated and migrate the second layer to an empty LLC/node. Instead, it seems to strictly adhere to the "Sticky" principle, sees 0 available capacity locally, and gives up allocation entirely.
