PathFinder is a systematic, informative, and lightweight CXL.mem profiler. It leverages the capabilities of existing hardware performance monitoring units (PMUs) and dissects the CXL.mem protocol at appropriate granularities.
- Kernel version 6.5.0 or later.
- Check perf support: the machine model should exist in `tools/perf/pmu-events/arch/x86`.
- Check CHA PMU support for the CXL path:
  - Supported out of the box starting from the v6.11-rc1 kernel.
  - Otherwise, apply the patch: https://github.com/torvalds/linux/commit/a5a6ff3d639d088d4af7e2935e1ee0d8b4e817d4
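The kernel-version requirement above can be checked with a short script. This is only a sketch: the version-string parsing is simplified and ignores distro suffixes such as `-generic` or `-rc1`.

```python
# Sketch: check that the running kernel satisfies the minimum version (6.5.0).
import platform

def version_tuple(release):
    """Parse '6.11.0-rc1-generic' into (6, 11, 0), ignoring non-numeric parts."""
    fields = release.split("-")[0].split(".")
    return tuple(int(f) for f in fields if f.isdigit())

MIN_KERNEL = (6, 5, 0)

current = version_tuple(platform.release())
print("running kernel:", platform.release(), "meets minimum:", current >= MIN_KERNEL)
```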
Required Python libraries are listed in `requirements.txt`; install them with `pip install -r requirements.txt`.
Run the following command:

```
python3 analyse/detectTopo.py
```

This will automatically detect the socket, CPU, and PMU topology and write the configuration to `analyse/archconf.conf`. You may also manually modify the contents of this configuration file. Subsequent steps use it to determine which PMUs to monitor along the CXL.mem path.
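If you want to sanity-check the generated file programmatically, a minimal sketch is shown below. It assumes `archconf.conf` uses INI-style sections; the section and key names are hypothetical placeholders, not the actual schema emitted by `detectTopo.py`.

```python
# Minimal sketch: inspect a topology config, assuming INI-style syntax.
# The section/key names below are hypothetical placeholders.
import configparser

sample = """\
[socket1]
cpus = 32-47
cha = CHA0-31
"""

conf = configparser.ConfigParser()
conf.read_string(sample)  # for the real file: conf.read("analyse/archconf.conf")
for section in conf.sections():
    print(section, dict(conf[section]))
```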
Installing a database enables support for historical data analysis.
If you choose not to install one, online analysis of counter data is still available.
Once InfluxDB is running, set its address and port in the configuration file `analyse/config.conf`.
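For reference, the relevant entries might look like the following. The section and key names here are hypothetical; match them to the template shipped in the repository.

```ini
# analyse/config.conf -- hypothetical keys shown for illustration
[influxdb]
host = 127.0.0.1
port = 8086
```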
This section lists the available command-line options and their meanings.
| Option Name | Value Type | Description | Default Value | Allowed Values |
|---|---|---|---|---|
| `module` | string | Function selector: `monitor` enables perf and PEBS latency sampling, `build` uses PFBuilder to analyze historical data, `esti` uses PFEstimator, `analy` uses PFAnalyser, `mater` uses PFMaterializer. To enable both sampling and analysis, use the `online` parameter. | empty | monitor, build, esti, analy, mater |
| `path` | string | Select target mFlow path(s), comma-separated. Used in `monitor` to specify sampling targets, and in `build`/`esti`/`analy`/`mater` to specify analysis targets. | empty | DRd, RFO, HWPF, DWr |
| `pmu` | string | Select target component(s), comma-separated. `default` indicates all components. | empty | core32-SB, ..., socket1-CHA0-31 |
| `app` | string | In `monitor`: the app command to run and monitor. In `build`/`esti`/`analy`/`mater`: the app key to fetch historical data from the database. | — | — |
| `savename` | string | If set, used as the alias for storing and retrieving data in InfluxDB. | — | — |
| `file` | string | Export counter data in chronological order to an Excel `.xlsx` file. | — | — |
| `time` | number | Sampling time interval in seconds. | 5 | — |
| `app_exist` | string | In `monitor`: monitor an already running app. In `build`/`esti`/`analy`/`mater`: the app key to fetch historical data from the database. | — | — |
Notes on `path` values: `DRd`, `RFO`, `HWPF`, `DWr` for core and CHA components; `LD`, `ST` for iMC and M2PCIe PMUs.

Notes on `pmu` values (defined in `analyse/config.conf`):
- Supports multiple components via comma separation, e.g. `core32-SB,core32-L1D,core32-LFB,core32-L2,core33-LFB`
- Supports aggregating counters from multiple small components, e.g. `socket1-CHA0-31` aggregates all 32 CHAs on socket1
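To make the aggregate syntax concrete, the sketch below expands a range spec such as `socket1-CHA0-31` into its individual component names. This helper is illustrative only and is not part of PathFinder.

```python
# Illustrative helper (not part of PathFinder): expand an aggregate PMU spec
# like "socket1-CHA0-31" into its individual component names.
import re

def expand_pmu(spec):
    # Match "<prefix><lo>-<hi>" where the prefix ends in letters, e.g. "socket1-CHA0-31".
    m = re.fullmatch(r"(.+?-[A-Za-z]+)(\d+)-(\d+)", spec)
    if not m:
        return [spec]  # plain component, e.g. "core32-SB"
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]

print(expand_pmu("socket1-CHA0-31")[:2])  # -> ['socket1-CHA0', 'socket1-CHA1']
```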
Collect core and uncore counter data for mbw and store it in InfluxDB with the app=mbw1024 tag:

```
python3.8 main.py monitor -app "mbw -t1 -n 2000 1024" -savename mbw1024
```

PathFinder supports command-line parameters to specify the target app, module, mFlow path, and components.
By setting `module` to `build`, `esti`, `analy`, or `mater`, you can inspect different metrics of the monitored application via PFBuilder, PFEstimator, PFAnalyser, or PFMaterializer, respectively.
Use `path` to select DRd/RFO/HWPF/DWr (or LD/ST), and `pmu` to specify which components to trace: SB/L1D/LFB/L2/LLC/CHA/MC.
Analyze historical app counter data stored in the InfluxDB database.
```
python3.8 main.py build -path DRd,RFO,HWPF,DWr,LD,ST -pmu default -app 525.x264_r-memnode4
python3.8 main.py build -path DRd,RFO,HWPF,DWr,LD,ST -pmu core32-SB,core32-L1D,core32-LFB,core32-L2,core33-LFB,socket1-CHA0,socket1-CHA31,socket1-PCIe5 -app 525.x264_r-memnode4
python3.8 main.py esti -path DRd,RFO,HWPF,DWr -pmu default -app 525.x264_r-memnode4
python3.8 main.py esti -path DRd,RFO,HWPF,DWr -pmu core32-SB,core32-L1D,core32-LFB,core32-L2,core33-LFB,core47-SB,core47-L1D,core47-LFB,core47-L2,socket1-CHA0,socket1-CHA31 -app 525.x264_r-memnode4
python3.8 main.py analy -path DRd,RFO,HWPF,DWr -pmu default -app workloada-10mbw
python3.8 main.py analy -path DRd,RFO,HWPF,DWr -pmu core32-SB,core32-L1D,core32-LFB,core32-L2,core32-core_LLC,core33-LFB,core47-SB,core47-L1D,core47-LFB,core47-L2,socket1-CHA0,socket1-CHA31 -app workloada-10mbw
python3.8 main.py mater -path DWr,DRd,RFO,HWPF,LD,ST -app 525.x264_r-memnode4 -pmu core32-L2 -option cluster
python3.8 main.py mater -path DWr,DRd,RFO,HWPF,LD,ST -app 525.x264_r-memnode4 -pmu core32-L2 -option trend
```

Analyze counter data in real time as it is being collected:

```
python3.8 main.py monitor -app "sudo numactl --physcpubind=32 --membind=4 mbw -t1 -n 2000 1024" -online estimate -time 5
python3.8 main.py monitor -app "mbw -t1 -n 2000 1024" -online analy
```

| Component | Specification |
|---|---|
| Server Platform | 2U Supermicro |
| Processor | 2× Intel Xeon Gold 6438Y+ (Sapphire Rapids) |
| NUMA Configuration | Sub-NUMA Clustering (SNC) enabled |
| Memory | 256 GB DDR5 |
| CXL Device | 1× CXL Type-3 memory device (appears as CPU-less NUMA node) |
| Operating System | Linux with kernel 6.5 |
| PMU Support | CHA PMU support patch applied (https://github.com/torvalds/linux/commit/a5a6ff3d639d088d4af7e2935e1ee0d8b4e817d4) |
Take the example of monitoring PFBuilder output when SPEC CPU2017 649.fotonik3d_s accesses CXL memory.
Steps:

- Use `numactl` to direct the app to CXL memory:

  ```
  sudo numactl --membind=4 bin/runcpu --config=config 649.fotonik3d_s
  ```

- Use `online` mode to observe PFBuilder output during app execution, and store it in InfluxDB under the name `649.fotonik3d_s`.

- To view aggregated PFBuilder output on all paths and components:

  ```
  python3.8 main.py monitor -app "sudo numactl --physcpubind=32 --membind=4 path/to/SPEC/bin/runcpu --config=path/to/SPEC/config 649.fotonik3d_s" -online build -time 5 -savename 649.fotonik3d_s -path DRd,RFO,HWPF,DWr,LD,ST -pmu default
  ```

- To view PFBuilder output on the DRd path for specific core32 components:

  ```
  python3.8 main.py monitor -app "sudo numactl --physcpubind=32 --membind=4 path/to/SPEC/bin/runcpu --config=path/to/SPEC/config 649.fotonik3d_s" -online build -time 5 -savename 649.fotonik3d_s -path DRd -pmu core32-SB,core32-L1D,core32-LFB,core32-L2
  ```
Use mbw to generate interference traffic against a YCSB application running on the same CXL memory.
Steps:

- Run Redis on the CXL node:

  ```
  sudo numactl --membind=4 redis-server
  ```

- Run the YCSB-A workload:

  ```
  sudo numactl --physcpubind=47 --membind=4 path/to/ycsb/bin/ycsb run redis -s -P path/to/ycsb/workloads/workloada -p "redis.host=127.0.0.1" -p "redis.port=6379"
  ```

- Run `mbw` to access CXL memory and introduce contention:

  ```
  sudo numactl --physcpubind=32 --membind=4 mbw -t1 -n 2000 1024
  ```

- Use `online` mode to monitor traffic on host components, and store results in InfluxDB under the name `ycsb-1mbw`:

  ```
  python3.8 main.py monitor -app_exist ycsb -online esti -time 5 -savename ycsb-1mbw -path DRd,RFO,HWPF,DWr,LD,ST -pmu default
  ```
Use a PARSEC application to access local memory with 50% CPU utilization, while simultaneously running the GUPS benchmark on the same core to access CXL memory.
Steps:

- Run the PARSEC `raytrace` application with 50% CPU to access local memory:

  ```
  sudo cpulimit -l 50 -- numactl --physcpubind=32 --membind=4 /path/to/parsec/bin/parsecmgmt -a run -p raytrace -i native
  ```

- Run the GUPS benchmark on the same core (CPU 32) with 30% CPU to access CXL memory:

  ```
  sudo cpulimit -l 30 -- numactl --membind=4 /home/xiaoli/TPP/colloid-6.3/apps/gups/gups-r 1
  ```

- Monitor CXL-induced stalls on core 32 using online estimation mode:

  ```
  python3.8 main.py monitor -app_exist parsec_gups -online esti -savename parsec_gups -path DRd,RFO,HWPF,DWr,LD,ST -pmu default
  ```

- Or monitor the variation in queueing degree:

  ```
  python3.8 main.py monitor -app_exist parsec_gups -online analy -savename parsec_gups -path DRd,RFO,HWPF,DWr,LD,ST -pmu default
  ```