Helper scripts for installing, enabling, monitoring, benchmarking, and
resetting scx_flow through the shared scx.service systemd unit.
scx_flow is a budget-based sched_ext scheduler with a small number of
bounded service paths and decayed confidence signals. In plain terms:
- sleeping tasks refill budget
- short responsive wakeups can get bounded faster service
- repeated good behavior strengthens locality and IPC confidence
- repeated exhaustion raises containment and latency pressure
- shared fallback work still runs so the machine does not become unfair
Read the diagram like this:
- start at the `Start` circle
- follow arrows from top to bottom
- normal rectangles are actions or scheduler steps
- diamond shapes are yes/no decisions
- arrow labels such as `Yes` and `No` tell you which branch to follow
- the loop at the bottom means the task goes back to sleep and the cycle begins again
```mermaid
flowchart TD
    Start((Start)) --> A[Task Sleeps]
    A --> B[Budget Refill + Signal Update]
    B --> C[Recompute Wake Profile]
    C --> D{Positive Budget?}
    D -- No --> Shared[Shared Path]
    D -- Yes --> E{Containment Active?}
    E -- Yes --> Contained[Contained Path]
    E -- No --> F{RT or Preempt Ready?}
    F -- Yes --> RT[Preempt + Tiny Local Slice]
    F -- No --> G{Latency Allowance or Pressure?}
    G -- Yes --> Latency[Latency / Urgent Latency Path]
    G -- No --> H{Locality or IPC Confidence?}
    H -- Yes --> Local[Bounded Local Fast Path]
    H -- No --> Reserved[Reserved Path]
    RT --> Dispatch[Dispatch Arbitration]
    Latency --> Dispatch
    Local --> Dispatch
    Reserved --> Dispatch
    Contained --> Dispatch
    Shared --> Dispatch
    Dispatch --> Run[Task Runs]
    Run --> I{Exhausted Budget?}
    I -- Yes --> Bad[Raise Containment + Latency Pressure]
    I -- No --> Good[Good Short Sleep Raises Locality and IPC Confidence]
    Bad --> EndCycle([Task Stops And Sleeps Again])
    Good --> EndCycle
    EndCycle --> A
```
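The decision order in the diagram can also be read as a chain of checks. A toy shell sketch of that chain (names and argument shapes are illustrative; the real logic lives in the BPF scheduler, not in shell):

```shell
# Toy sketch of the wake-path decision order from the diagram above.
# Arguments: budget (integer), then yes/no flags for containment, RT
# readiness, latency allowance, and locality/IPC confidence.
pick_path() {
    budget="$1"; contained="$2"; rt_ready="$3"; latency="$4"; confident="$5"
    [ "$budget" -le 0 ]    && { echo shared;    return; }
    [ "$contained" = yes ] && { echo contained; return; }
    [ "$rt_ready" = yes ]  && { echo rt;        return; }
    [ "$latency" = yes ]   && { echo latency;   return; }
    [ "$confident" = yes ] && { echo local;     return; }
    echo reserved
}
```

For example, `pick_path 0 no no no no` prints `shared` because a non-positive budget wins before any other check, matching the first diamond in the diagram.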
- `scx_flow` is not a plain FIFO scheduler
- it is not trying to be globally fair in one queue either
- it tries to keep wakeups responsive while still bounding interference from heavy tasks
If you are reading benchmark output, this mental model helps:
- strong latency numbers usually mean the bounded lanes are doing their job
- strong FPS numbers usually mean the locality-friendly and reserved paths are feeding short bursts well
- bad regressions often mean tasks are being classified into the wrong path
When checking whether scx_flow is healthy, focus on these signals first:
- Is `sched_ext` enabled right now?
- Is `scx_flow` the active scheduler right now?
- Is `scx.service` still running, or is it crash-looping?
- Are logs showing fresh runtime errors, or only old history from earlier runs?
Everything else is secondary.
In practice, trust the live kernel state before anything else:
```sh
cat /sys/kernel/sched_ext/state
cat /sys/kernel/sched_ext/root/ops
systemctl status scx.service
```

If those three look healthy, then scx_flow is running even if a helper script or benchmark step prints a separate tooling error.
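Those three sources can be combined into a single health gate. A minimal sketch (`check_scx_health` is a hypothetical helper, not one of the repo scripts; it takes the three values as arguments so it can be exercised without `/sys` access):

```shell
# Decide health from the three live signals: sched_ext state, root/ops
# content, and the scx.service unit state.
check_scx_health() {
    state="$1"; ops="$2"; svc="$3"
    case "$ops" in
        scx_flow*) ops_ok=1 ;;
        *)         ops_ok=0 ;;
    esac
    if [ "$state" = "enabled" ] && [ "$ops_ok" -eq 1 ] && [ "$svc" = "active" ]; then
        echo healthy
    else
        echo unhealthy
    fi
}

# Live usage against the real sources the text recommends:
# check_scx_health "$(cat /sys/kernel/sched_ext/state)" \
#                  "$(cat /sys/kernel/sched_ext/root/ops)" \
#                  "$(systemctl is-active scx.service)"
```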
Healthy scx_flow usually looks like this:
```sh
cat /sys/kernel/sched_ext/state
enabled

cat /sys/kernel/sched_ext/root/ops
scx_flow_*
```

And:

- `./status_scx_flow.sh` ends with `scx_flow is installed, configured, and currently active`
- `./monitor_scx_flow.sh` shows `sched_ext state: enabled`, `Service status: active (running)`, and a current `Main PID`
- `systemctl status scx.service` shows `active (running)`
- `journalctl -u scx.service` shows a recent `Starting scx_flow scheduler` line and no new repeated runtime errors
If `sudo ./benchmark.sh` prints:

```
Current scheduler: scx_flow_*
```

that is already a healthy scheduler signal. The benchmark step itself is a separate concern.
These are the main failure patterns to care about:
Bad signs:
```sh
cat /sys/kernel/sched_ext/state
disabled
```

or:

```sh
cat /sys/kernel/sched_ext/root/ops
(empty)
```

This means the scheduler is not currently active.
Bad signs:
- `systemctl status scx.service` shows `failed`
- `status_scx_flow.sh` reports inactive/disabled/none
- `journalctl -u scx.service -f` keeps showing repeated start/fail/start/fail cycles
Bad signs in logs:
- `runtime error`
- `invalid CPU ... from ops.select_cpu()`
- repeated `disabled (runtime error)` after the most recent start
Old errors in dmesg or journalctl do not matter by themselves if the
current service instance is healthy. Always check whether the latest run is
still active.
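The "only the latest run matters" rule can be sketched as a log filter that discards everything before the most recent start line (`fresh_errors` is a hypothetical helper, not one of the repo scripts):

```shell
# Print error lines that appear AFTER the most recent service start line
# in a log stream on stdin; older errors are dropped as historical.
fresh_errors() {
    awk '
        /Starting scx_flow scheduler/ { buf = "" ; next }
        /runtime error|invalid CPU/   { buf = buf $0 "\n" }
        END { printf "%s", buf }
    '
}

# Usage: journalctl -u scx.service | fresh_errors
```

An empty result means any errors you saw belong to a previous run and can be ignored while the current instance stays healthy.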
If you see:
```
cyclictest: command not found
```

that is not a scheduler failure. It means benchmark dependencies are not installed yet.
Run:
```sh
sudo ./install_benchmark_deps.sh
```

before `sudo ./benchmark.sh`.
On Arch/CachyOS, hackbench and lmbench may not exist as standalone official
pacman package targets. That is expected here.
This pattern is common and should not be mistaken for a scheduler failure:
```
Current scheduler: scx_flow_*
...
cyclictest: command not found
```

Meaning:

- `scx_flow` is active
- `sched_ext` is working
- the benchmark script stopped only because `rt-tests` is not installed
Fix:
```sh
sudo ./install_benchmark_deps.sh
sudo ./benchmark.sh
```

Only treat it as a scheduler problem if the live state also goes bad, for
example sched_ext becomes disabled, root/ops becomes empty, or
scx.service stops running.
If install_benchmark_deps.sh reports package errors such as:
```
error: target not found: hackbench
error: target not found: lmbench
```

that is a packaging issue, not a scheduler issue.
What happens now:
- the dependency installer only installs the official packages that exist on your system
- `benchmark.sh` checks tool availability before each benchmark
- if `hackbench` is missing, the script falls back to `sysbench`
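The fallback behavior can be sketched as a tool-selection helper (assumed shape; the actual benchmark.sh logic may differ):

```shell
# Prefer hackbench when it is on PATH, fall back to sysbench, and report
# "none" when neither throughput tool is installed.
pick_throughput_tool() {
    if command -v hackbench >/dev/null 2>&1; then
        echo hackbench
    elif command -v sysbench >/dev/null 2>&1; then
        echo sysbench
    else
        echo none
    fi
}
```

Probing with `command -v` before each benchmark is what lets a missing optional tool degrade gracefully instead of aborting the whole run.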
```sh
sudo ./install.sh --force
```

Expected:

- binary builds and installs
- `scx.service` restarts
- installer reports an active scheduler
```sh
./status_scx_flow.sh
```

Expected:

- installed binary
- installed version
- `scx.service state: active`
- `scx.service enabled: enabled`
- active scheduler matches `scx_flow` or `scx_flow_*`
```sh
./monitor_scx_flow.sh
```

Expected:

- current scheduler shown at top
- `sched_ext state: enabled`
- `scx.service` active
- logs from the current service activation, not stale old runs
```sh
sudo ./test_scx_flow.sh
```

Expected:
- uninstall works
- reinstall works
- `sched_ext state` is `enabled`
- active scheduler is `scx_flow_*`
- status helper reports active
```sh
sudo ./install_benchmark_deps.sh
sudo ./benchmark.sh
```

Expected:
- required benchmark tools exist
- benchmark log file is created
- no service crash during benchmark run
If the benchmark stops with command not found, install benchmark dependencies
first and rerun. That result does not mean scx_flow failed.
```sh
sudo ./validate_hooks_scx_flow.sh
```

Expected:

- `runnable()` shows non-zero activity under the wake-heavy test
- `cpu_release()` may or may not trigger depending on RT pressure timing, but the script shows whether it was actually exercised
- the monitor excerpt contains `runnable=` and `cpu_release=` fields from `scx_flow --monitor`
Use this when broad benchmarks look healthy but you specifically want to verify
that newer scx hooks are doing real work instead of just compiling.
```sh
sudo ./validate_lifecycle_scx_flow.sh
```

Expected:

- `init_task()` should go non-zero under short-lived task bursts
- `enable()` / `exit_task()` may stay zero depending on kernel behavior, and the script explains that distinction explicitly
- the output helps separate “optional lifecycle hook not exercised” from “task creation path is broken”
Use this when you want to verify lifecycle coverage after changing
init_task(), enable(), or exit_task().
```sh
sudo ./latency_stress_scx_flow.sh
```

Expected:

- creates a timestamped result directory in `latency-stress-results/`
- keeps only the newest three latency-stress result directories automatically
- runs a mixed-load phase with `cyclictest`, wake storms, and short-lived task churn
- runs an RT-interference phase when `taskset`, `chrt`, and `timeout` are available
- captures `scx_flow --monitor` output and writes a machine-readable summary env file
- records mixed and RT latency `p95`, `p99`, max, spike counts, and sample counts from the `cyclictest` samples
Use this when the broad benchmark looks good but you want a more adversarial latency-focused check before claiming the scheduler is review-ready.
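Tail metrics like the ones above can be pulled from raw samples with a small helper. A nearest-rank sketch (`percentiles` is a hypothetical helper, not the script's own implementation):

```shell
# Read one latency sample (us) per line from a file, sort numerically, and
# report nearest-rank p95/p99 plus the maximum.
percentiles() {
    sort -n "$1" | awk '
        { v[NR] = $0 }
        END {
            i95 = int(NR * 0.95); if (i95 < 1) i95 = 1
            i99 = int(NR * 0.99); if (i99 < 1) i99 = 1
            printf "p95=%s p99=%s max=%s\n", v[i95], v[i99], v[NR]
        }'
}
```

The point of reporting p95/p99 alongside max is that a single outlier inflates max while the percentile tails only move when lateness is repeated.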
```sh
sudo ./validate_containment_scx_flow.sh
```

Expected:

- long-lived bursty workers first behave like budget-exhausting hogs and then switch into a recovery phase
- `hog_contain` should go non-zero if the containment path is alive
- `hog_recover` should go non-zero if the same workers recover cleanly enough
- the output also shows `exhaust`, `pos_wake`, and `latency_enq` so you can tell whether the workload actually matched the trigger shape
Use this when a broad stress run fails to prove whether the new v2
containment logic is truly helping, truly dead, or just too conservative.
```sh
sudo ./mini_benchmarker.sh
```

Expected:

- it compares `baseline`, `scx_cosmos`, `scx_bpfland`, `scx_cake`, and `scx_flow` by default
- each scheduler gets its own raw benchmark log and summary file
- a comparison CSV, PNG, SVG, and Markdown report are generated
- only the newest three comparison result directories are kept automatically
- if one scheduler fails to activate, the run continues and saves diagnostics in `comparison-results/.../diagnostics/`
If you only want scx_cosmos vs scx_flow:
```sh
sudo ./mini_benchmarker.sh --schedulers "scx_cosmos scx_flow"
```

Pass `--hard-rt` to use the hard real-time cyclictest configuration (FIFO priority 99, SMP spread across all CPUs, 200us interval, histogram up to 20us) instead of the default single-CPU 1ms interval:

```sh
sudo ./mini_benchmarker.sh --hard-rt --schedulers "scx_flow scx_bpfland scx_cosmos"
```

```sh
sudo ./deadline_benchmarker.sh --runs 2 --schedulers "baseline scx_cosmos scx_flow"
```

Expected:
- it compares periodic frame-target wake deadline behavior across the selected schedulers
- each scheduler gets its own raw log, summary env, and raw JSON probe output
- a comparison CSV, PNG, SVG, and Markdown report are generated
- the report and charts now include `Deadline Jitter p99 (us)` for deadline-consistency comparisons
- only the newest three deadline comparison result directories are kept automatically
- you can add another installed scheduler such as `scx_pandemonium` directly in `--schedulers`
Use this when scx_flow already looks good in broad latency/FPS testing and
you want a tighter answer to “how often does a frame-like periodic task wake up
late enough to miss its deadline under load?”
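That question can be sketched in code: given lateness samples, what fraction exceeded the frame budget? (The helper name and the post-hoc classification are illustrative; the real probe uses absolute timers and a configurable period.)

```shell
# Read lateness samples (us) on stdin and report the percentage exceeding
# the deadline threshold passed as the first argument.
miss_ratio() {
    awk -v t="$1" '
        { n++; if ($0 + 0 > t + 0) miss++ }
        END { if (n) printf "%.2f%%\n", 100 * miss / n; else print "0.00%" }'
}

# Example: three wakes, one later than a 16666 us frame budget.
# printf '100\n200\n20000\n' | miss_ratio 16666
```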
```sh
./prepare_review_bundle.sh \
  --comparison-dir ./comparison-results/<timestamp> \
  --hook-log /path/to/hook-validation.log \
  --lifecycle-log /path/to/lifecycle-validation.log
```

Expected:

- generates a concise Markdown bundle from a `mini_benchmarker.sh` comparison snapshot
- includes optional hook/lifecycle validation maxima when those logs are provided
- automatically includes the newest latency-stress summary when one exists
- surfaces latency-stress tail metrics such as mixed/RT `p95` and `p99` when the summary provides them
- keeps the claims and the known limits in one review-friendly place
Use this when you want one artifact to share with senior engineers instead of pointing them at multiple directories and terminal transcripts.
```sh
sudo ./burst_benchmarker.sh --runs 2 --schedulers "baseline scx_cosmos scx_flow"
sudo ./burst_benchmarker.sh --strict --runs 2 --schedulers "baseline scx_flow"
```

Expected:
- it compares sudden load-spike tail latency across the selected schedulers
- each scheduler gets its own raw log, summary env, and raw JSON probe output
- a comparison CSV, PNG, SVG, and Markdown report are generated
- only the newest three burst comparison result directories are kept automatically
- you can add another installed scheduler such as `scx_pandemonium` directly in `--schedulers`
- `--strict` switches to a much longer burst run so ultra-low miss ratios such as `0.01%` and below are easier to measure credibly
Use this when you specifically want a local equivalent of the “Burst P99 (us)” style tables from other scheduler benchmark suites.
```sh
sudo ./mixed_benchmarker.sh
sudo ./mixed_benchmarker.sh --schedulers "scx_cosmos scx_pandemonium scx_flow"
sudo ./mixed_benchmarker.sh --schedulers "baseline scx_cosmos scx_flow"
```

Expected:
- it compares the mixed latency-stress workload across the selected schedulers
- each scheduler gets its own raw log, env summary, monitor log, and kernel log
- a comparison CSV, PNG, SVG, and Markdown report are generated
- the charts focus on mixed/RT `p95`, `p99`, and max latency plus kernel stall events
- only the newest three mixed comparison result directories are kept automatically
- the default run skips `baseline` to save time and avoid the plain-kernel RT-hog corner case during everyday mixed comparisons
- `baseline` is still supported explicitly when you do want a plain-kernel comparison in the same mixed table
Use this when you want a local equivalent of “Mixed Workload Latency P99 (us)” style tables without hand-comparing separate latency-stress result directories.
```sh
sudo ./longrun_benchmarker.sh
sudo ./longrun_benchmarker.sh --schedulers "baseline scx_cosmos scx_pandemonium scx_flow"
```

Expected:
- it compares sustained periodic wake latency under continuous background CPU load
- each scheduler gets its own raw log, env summary, and raw JSON probe output
- a comparison CSV, PNG, SVG, and Markdown report are generated
- the charts focus on long-run miss ratio, late-over-threshold ratio, and `p95`/`p99`/max
- the default longrun soft threshold is intentionally lower than the probe period so miss ratio and late-over-threshold ratio do not collapse into the same metric
- only the newest three longrun comparison result directories are kept automatically
Use this when you specifically want a local equivalent of “Long-Run Latency P99 (us)” style tables from other scheduler benchmark suites.
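The reason the soft threshold sits below the period can be shown directly: a sample can be late over the threshold without missing the full period, so the two ratios measure different things. A sketch with illustrative names (values in microseconds):

```shell
# Classify lateness samples on stdin against a period (arg 1) and a lower
# soft threshold (arg 2): over the period is a miss, over only the
# threshold is "late", everything else is on time.
classify() {
    awk -v p="$1" -v t="$2" '
        { n++; if ($0 + 0 > p + 0) miss++; else if ($0 + 0 > t + 0) late++ }
        END { printf "miss=%d late=%d total=%d\n", miss, late, n }'
}

# With period 16666 and threshold 2000, a 3000 us sample counts as late
# but not missed; only a sample past 16666 counts as a miss.
```

If the threshold equaled the period, every late sample would also be a miss and the two columns in the report would always agree, telling you nothing extra.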
```sh
sudo ./ipc_benchmarker.sh
sudo ./ipc_benchmarker.sh --schedulers "baseline scx_cosmos scx_pandemonium scx_flow"
```

Expected:
- it compares Unix socket ping-pong round-trip tails under background CPU load
- each scheduler gets its own raw log, env summary, and raw JSON probe output
- a comparison CSV, PNG, SVG, and Markdown report are generated
- the charts focus on IPC over-threshold ratio plus `p95`/`p99`/max round-trip latency
- only the newest three IPC comparison result directories are kept automatically
Use this when you want a local equivalent of an “IPC Round-Trip P99 (us)” table instead of inferring IPC behavior from broader mixed or longrun tests.
```sh
sudo ./app_launch_benchmarker.sh
sudo ./app_launch_benchmarker.sh --schedulers "baseline scx_cosmos scx_pandemonium scx_flow"
```

Expected:
- it compares repeated app-launch latency under background CPU load
- each scheduler gets its own raw log, env summary, and raw JSON probe output
- a comparison CSV, PNG, SVG, and Markdown report are generated
- the charts focus on app-launch over-threshold ratio plus `p95`/`p99`/max launch latency
- only the newest three app-launch comparison result directories are kept automatically
Use this when you want a direct local equivalent of an “App Launch P99 (us)” table instead of guessing from IPC or mixed-workload results.
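The raw sample behind an app-launch table is just wall time for one launch. A minimal sketch (`launch_ms` is a hypothetical helper; `date +%s%N` needs GNU coreutils):

```shell
# Time one launch of the given command, in whole milliseconds.
launch_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example: launch_ms sh -c 'sleep 0.05' should print roughly 50.
```

The probe repeats this under background CPU load and feeds the samples into the same over-threshold and p95/p99/max summaries as the other benchmarkers.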
```sh
sudo ./fork_thread_benchmarker.sh
sudo ./fork_thread_benchmarker.sh --schedulers "baseline scx_cosmos scx_pandemonium scx_flow"
```

Expected:

- it compares `perf bench sched messaging` elapsed time across schedulers
- each scheduler gets its own raw benchmark log plus raw `perf` stdout/stat paths
- a comparison CSV, PNG, SVG, and Markdown report are generated
- the charts focus on elapsed time, IPC, and cache misses
- only the newest three fork-thread comparison result directories are kept automatically
Use this when you want a local equivalent of the fork-thread throughput table instead of inferring cache behavior from unrelated latency benchmarks.
```sh
sudo ./keeper_validate_scx_flow.sh
```

Expected:
- runs the current "keeper" validation bundle in one go
- covers burst, mixed, deadline, longrun, and fork/thread comparisons
- is useful before freezing a scheduler checkpoint or preparing a reviewer-facing summary
Note:
- the example intentionally avoids placeholder paths like `<latest-timestamp>` because shells such as `fish` interpret angle brackets as redirection syntax rather than literal text
- if you do want to pass a specific latency-stress summary manually, use a real path, not a placeholder token
Use this table when reading terminal output:
| What you see | What it means | What to do |
|---|---|---|
| `state = enabled` and `root/ops = scx_flow_*` | Scheduler is healthy and active | Keep testing |
| `scx.service` is `active (running)` | Service is healthy right now | Keep monitoring |
| Old `invalid CPU ...` lines in older logs | Historical failure from a previous run | Ignore if current run is healthy |
| `cyclictest: command not found` | Missing benchmark dependency | Run `sudo ./install_benchmark_deps.sh` |
| `error: target not found: hackbench` | Arch/CachyOS package mismatch, not scheduler failure | Let the script use its fallback path |
| `state = disabled` or empty `root/ops` | Scheduler is not active | Reinstall or inspect logs |
| Repeated service restart/fail loops | Current runtime failure | Check `journalctl -u scx.service -f` immediately |
Builds scx_flow, installs /usr/bin/scx_flow, writes scx.service, updates
/etc/default/scx, restarts the service, and fails if scx_flow does not
become active.
By default it builds from the local scx tree at
scheds/experimental/scx_flow.
Self-contained installer that clones the scx scheduler source and the
testing repository, runs install.sh from the testing clone with
SCX_SOURCE_DIR set, then cleans up all temporary clones. Only leaves
/usr/bin/scx_flow, /etc/systemd/system/scx.service, and /etc/default/scx.
Unlike install.sh, this script does not require a pre-existing local scx
tree — it handles the full build pipeline from scratch.
```sh
./install_scx_flow_standalone.sh
```

Rewrites /etc/default/scx for scx_flow, restarts scx.service, and fails
if the active scheduler does not match scx_flow.
Shows the installed binary, version, service state, configured scheduler, configured flags, and active scheduler.
Shows current scheduler state, a short systemctl status, and follows
scx.service logs from the current activation time.
Stops scx.service, kills leftover scheduler processes, and waits for
sched_ext to become idle.
Runs cyclictest, a throughput benchmark (hackbench when available, otherwise
sysbench), stress-ng, and uptime, writing results to a timestamped log
file. It also supports machine-readable summary output for automation.
Runs targeted wake-heavy and RT-pressure checks while capturing
scx_flow --monitor output, so you can confirm whether runnable() and
cpu_release() are actually being exercised on your machine.
Runs short-lived task bursts while capturing scx_flow --monitor output so you
can verify the task-creation and lifecycle-related hooks separately from the
broad benchmark suite.
Runs long-lived burst workers that first accumulate budget exhaustions and then
shift into a recovery phase, so you can confirm whether the v2 hog
containment and recovery counters actually move on your machine.
Runs multi-scheduler comparisons using benchmark.sh, generates a CSV summary,
PNG/SVG charts, and a Markdown report, and rotates old comparison result
directories so only the latest three are kept by default.
Runs a single Aquarium + stress-ng benchmark against the current scheduler,
using Playwright to sample frame timing, FPS, and jank directly from the
WebGL Aquarium tab while the system is under load.
Runs multi-scheduler comparisons using aquarium_benchmark.sh, generates a
CSV summary, PNG/SVG charts, and a Markdown report, and rotates old Aquarium
result directories so only the latest three are kept by default. By default it
also performs one uncounted warmup run per scheduler before the measured runs
so browser/WebGL warmup does not pollute the shared charts.
Switches to a chosen scheduler, runs one Aquarium benchmark under perf sched,
captures turbostat when available, and writes a small Markdown trace report
plus raw trace artifacts. Use this when Aquarium FPS looks wrong and you need
evidence about run-time fragmentation or frequency behavior before changing
scx_flow.
Runs a targeted mixed-load and RT-interference latency check against the active
scx_flow, writes timestamped logs and a machine-readable summary env file,
captures scx_flow --monitor output, and rotates old result directories so
only the latest three are kept by default. It also records kernel
sched_ext/scx_flow events from the run window so runnable-task stalls are
called out explicitly instead of being mistaken for a clean pass. The summary
now includes mixed and RT latency p95/p99 tails in addition to max and
spike counts.
Runs the strict latency-stress validation several times in a row, then writes a
CSV, env summary, and Markdown report with median and worst-case mixed/RT
latency metrics, including p95, p99, max, and spikes over 100us. By
default it reinstalls scx_flow between runs, but it can
also manually launch another scheduler such as scx_cosmos for apples-to-apples
repeat validation:
```sh
sudo ./validate_latency_repeat_scx_flow.sh --runs 5
sudo ./validate_latency_repeat_scx_flow.sh --runs 5 --scheduler-name scx_cosmos --scheduler-bin "$(command -v scx_cosmos)"
```

Use this before tuning further so single noisy runs do not get mistaken for real progress. It also emits PNG/SVG charts so repeated tail behavior is easier to inspect quickly by eye.
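The aggregation idea behind "median and worst-case across runs" can be sketched as a small helper (illustrative nearest-rank median, not the validator's own code):

```shell
# Median of repeated tail values, one per line on stdin: sort numerically
# and take the middle element (lower middle for even counts).
median() {
    sort -n | awk '
        { v[NR] = $0 }
        END { print v[int((NR + 1) / 2)] }'
}

# Worst case across runs is simply: sort -n | tail -n 1
```

Reporting the median alongside the worst case is what keeps one noisy run from dominating the conclusion while still surfacing the worst observed tail.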
Runs the same latency-stress workload against multiple schedulers, currently
useful for direct scx_cosmos vs scx_flow comparisons, and writes a small
Markdown report plus CSV summary so you can see whether a stall is specific to
scx_flow or reproduces across schedulers while also comparing mixed/RT
latency tails such as p95 and p99. It also generates PNG/SVG comparison
charts in the result directory.
Runs a periodic absolute-timer wake probe using a frame-like target period
(default 16.666ms) and reports lateness tails plus deadline miss ratio. It is
the measurement core used by the deadline benchmark wrapper.
Runs the periodic frame-target deadline probe against the currently active
scheduler, optionally under stress-ng CPU load, and writes both a human log
and machine-readable summary env file plus raw JSON probe output.
Runs multi-scheduler deadline comparisons using deadline_benchmark.sh,
generates a CSV summary, PNG/SVG charts, and a Markdown report, and rotates old
deadline comparison result directories so only the latest three are kept by
default. You can include baseline, scx_flow, and any other installed
scheduler binary such as scx_pandemonium in the scheduler list.
Runs a fast periodic wake probe while controlled CPU burners turn on and off in short windows, then reports overall, idle, and burst-only lateness tails. This is the measurement core used by the burst benchmark wrapper.
Runs the burst-tail probe against the currently active scheduler and writes both
a human log and machine-readable summary env file plus raw JSON probe output.
Its --strict preset extends the run long enough that the summary can also
report a meaningful BURST_MISS_RATIO_RESOLUTION_PCT for tiny miss ratios.
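The resolution idea behind `--strict` is simple arithmetic: with N samples, the smallest non-zero miss ratio you can observe is a single miss, i.e. 100/N percent. A sketch (helper name is illustrative):

```shell
# Smallest non-zero miss ratio (in percent) resolvable from N samples.
miss_resolution_pct() {
    awk -v n="$1" 'BEGIN { printf "%.4f\n", 100 / n }'
}

# Measuring ratios near 0.01% credibly therefore needs well over ten
# thousand samples, which is why the strict preset lengthens the run.
```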
Runs multi-scheduler burst-tail comparisons using burst_benchmark.sh,
generates a CSV summary, PNG/SVG charts, and a Markdown report, and rotates old
burst comparison result directories so only the latest three are kept by
default. You can include baseline, scx_flow, and any other installed
scheduler binary such as scx_pandemonium in the scheduler list.
Runs a Unix socket ping-pong round-trip probe between paired worker CPUs and
reports over-threshold ratio plus p95, p99, and max round-trip latency.
Runs the IPC round-trip probe against the currently active scheduler under
optional stress-ng CPU load and writes both a human log and machine-readable
summary env file plus raw JSON probe output.
Runs multi-scheduler IPC comparisons using ipc_benchmark.sh, generates a CSV
summary, PNG/SVG charts, and a Markdown report, and rotates old IPC comparison
result directories so only the latest three are kept by default. You can
include baseline, scx_flow, and any other installed scheduler binary such
as scx_pandemonium in the scheduler list.
Runs repeated launches of a configured command and reports app-launch
over-threshold ratio plus p95, p99, and max launch latency.
Runs the app-launch probe against the currently active scheduler under optional
stress-ng CPU load and writes both a human log and machine-readable summary
env file plus raw JSON probe output.
Runs multi-scheduler app-launch comparisons using app_launch_benchmark.sh,
generates a CSV summary, PNG/SVG charts, and a Markdown report, and rotates old
app-launch comparison result directories so only the latest three are kept by
default. You can include baseline, scx_flow, and any other installed
scheduler binary such as scx_pandemonium in the scheduler list.
Runs perf bench sched messaging while collecting perf stat counters for
instructions, cycles, cache misses, and cache references, then writes a small
env summary for automation.
Runs multi-scheduler fork/thread throughput comparisons using
fork_thread_benchmark.sh, generates a CSV summary, PNG/SVG charts, and a
Markdown report, and rotates old fork-thread comparison result directories so
only the latest three are kept by default.
Renders PNG/SVG charts for the latency-stress comparison and repeat-validation CSV outputs so tail metrics can be scanned visually instead of only reading the Markdown/CSV summaries.
Runs a focused mixed workload while sampling cyclictest thread placement,
capturing CPU-to-LLC topology, recording scx_flow --monitor output, and
optionally collecting a small system-wide perf stat snapshot. Use this
before making topology-aware changes so you can see whether the remaining gap
actually looks like a locality problem.
Reads the summary env files written by mini_benchmarker.sh and renders the
human-friendly comparison artifacts.
Reads the summary env files written by aquarium_benchmarker.sh and renders
the Aquarium comparison artifacts.
Builds a compact review-facing Markdown summary from a mini_benchmarker.sh
comparison result directory and optional validation logs.
Runs the current keeper validation bundle in one shot so you can quickly reconfirm burst, mixed, deadline, longrun, and fork/thread behavior before freezing a checkpoint.
Installs the benchmark tools that are available from the local official package
repositories and leaves unsupported optional tools to graceful fallback logic in
benchmark.sh. It also installs python-matplotlib for chart generation.
Installs the local npm dependency for browser automation and downloads the Playwright Chromium build used by the Aquarium benchmark scripts.
Runs an uninstall/install cycle and prints only kernel log entries from the current test window.
- Active scheduler checks use `/sys/kernel/sched_ext/root/ops`.
- Your kernel may report the active scheduler as a fully qualified name such as `scx_flow_2.2.0_x86_64_unknown_linux_gnu`; that is still correct.
- The current documented reference line is `scx_flow v2.2.4`.
- `scx_flow` is intended for general-purpose production use. Treat these scripts as validation and regression tools, not as a claim that one benchmark result alone proves correctness under every possible workload.