netlab-wisconsin/PathFinder

Introduction

PathFinder is a systematic, informative, and lightweight CXL.mem profiler. It leverages existing hardware performance monitoring units (PMUs) and dissects the CXL.mem protocol at adequate granularities.

Preparation

Check Kernel Version

PathFinder requires Linux kernel 6.5.0 or later.
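
The kernel requirement can be checked from Python before running anything else. The following is a minimal sketch, not part of PathFinder: the helper names are hypothetical, and treating 6.5.0 itself as acceptable is an assumption based on the 6.5 kernel used in the use-case system later in this README.

```python
import platform
import re

MIN_KERNEL = (6, 5, 0)  # assumption: 6.5.0 itself is accepted


def parse_kernel_release(release):
    """Return (major, minor, patch) from a string such as '6.5.0-35-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_is_supported(release, minimum=MIN_KERNEL):
    """Compare the parsed release against the minimum version tuple."""
    return parse_kernel_release(release) >= minimum


def check_running_kernel():
    """Report whether the kernel of the current machine meets the minimum."""
    release = platform.release()
    status = "OK" if kernel_is_supported(release) else "too old"
    return f"kernel {release}: {status}"
```

Calling `print(check_running_kernel())` on the target machine prints the running release together with the verdict.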

Getting Started

Install Dependencies

Required Python libraries are listed in requirements.txt; install them with pip, for example pip3 install -r requirements.txt.

Detecting and Verifying PMU Topology

Run the following command:

python3 analyse/detectTopo.py

This will automatically detect socket, CPU, and PMU topology
and write the configuration to the file analyse/archconf.conf.

You may also manually modify the contents of this configuration file.
Subsequent steps will use it to determine which PMUs to monitor along the CXL.mem path.
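
To cross-check a detected or hand-edited configuration, the same kind of information can be read directly from Linux sysfs. This is a minimal sketch and not the logic of analyse/detectTopo.py, which is not shown in this README. The function names are hypothetical, and the `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
from collections import defaultdict
from pathlib import Path


def cpus_by_socket(sysfs_root="/sys"):
    """Map physical package (socket) id to a sorted list of logical CPU ids."""
    sockets = defaultdict(list)
    cpu_dir = Path(sysfs_root) / "devices" / "system" / "cpu"
    for entry in cpu_dir.glob("cpu[0-9]*"):
        pkg_file = entry / "topology" / "physical_package_id"
        if not pkg_file.is_file():
            continue  # skip entries without topology information
        cpu_id = int(entry.name[3:])
        sockets[int(pkg_file.read_text().strip())].append(cpu_id)
    return {pkg: sorted(cpus) for pkg, cpus in sorted(sockets.items())}


def uncore_pmus(sysfs_root="/sys"):
    """List perf PMU device names that start with 'uncore_'."""
    pmu_dir = Path(sysfs_root) / "bus" / "event_source" / "devices"
    if not pmu_dir.is_dir():
        return []
    return sorted(p.name for p in pmu_dir.iterdir() if p.name.startswith("uncore_"))
```

On a machine with uncore PMU support, `uncore_pmus()` should list the uncore devices the kernel exposes to perf, which gives a quick way to confirm that the expected PMUs are visible before monitoring.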

Optional: Install InfluxDB to Record Data

Installing a database enables support for historical data analysis.
If you choose not to install one, online analysis of counter data is still available.

Once InfluxDB is running, set the address and port in the configuration file:

analyse/config.conf
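
Before starting a monitoring run that stores data, it can help to confirm that the configured address and port accept connections. The sketch below is an assumption-laden illustration: it only tests TCP reachability (it does not verify that the listener is InfluxDB), the function name is hypothetical, and the host and port must be the values you set in analyse/config.conf, whose key names are not documented here.

```python
import socket


def influxdb_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `influxdb_reachable("127.0.0.1", 8086)` checks InfluxDB's default HTTP port on the local machine; substitute your own values if they differ.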

Options

This section lists the available command-line options and their meanings.

| Option Name | Value Type | Description | Default Value | Option Value |
|---|---|---|---|---|
| module | string | Function selector: "monitor" enables perf and PEBS latency sampling, "build" uses PFBuilder to analyze historical data, "esti" uses PFEstimator, "analy" uses PFAnalyser, "mater" uses PFMaterializer. To enable both sampling and analysis, use the online parameter. | empty | monitor, build, esti, analy, mater |
| path | string | Select target mFlow path(s), comma-separated. Used in "monitor" to specify sampling targets, and in "build"/"esti"/"analy"/"mater" to specify analysis targets. | empty | DRd, RFO, HWPF, DWr |
| pmu | string | Select target component(s), comma-separated. 'default' indicates all components. | empty | core32-SB, ..., socket1-CHA0-31 |
| app | string | In "monitor": specify the app command to run and monitor. In "build"/"esti"/"analy"/"mater": specify the app key to fetch historical data from the database. | | |
| savename | string | If set, uses this as the alias for storing and retrieving data in InfluxDB. | | |
| file | string | Export counter data in chronological order to an Excel .xlsx file. | | |
| time | number | Sampling time interval in seconds. | 5 | |
| app_exist | string | In "monitor": monitor an already running app. In "build"/"esti"/"analy"/"mater": specify the app key to fetch historical data from the database. | | |
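
When scripting many runs, the options listed above can be assembled programmatically. The following is a minimal sketch under stated assumptions: `pathfinder_argv` and `as_shell` are hypothetical helpers, not part of PathFinder, and they only cover the module, -path, -pmu, -app, -savename, and -time options as they appear in this README's examples.

```python
import shlex

MODULES = ("monitor", "build", "esti", "analy", "mater")


def pathfinder_argv(module, app=None, paths=(), pmus=(), savename=None,
                    interval=None, python="python3.8", script="main.py"):
    """Build an argv list for main.py from the documented options."""
    if module not in MODULES:
        raise ValueError(f"unknown module {module!r}; expected one of {MODULES}")
    argv = [python, script, module]
    if paths:
        argv += ["-path", ",".join(paths)]  # comma-separated mFlow paths
    if pmus:
        argv += ["-pmu", ",".join(pmus)]    # comma-separated components
    if app is not None:
        argv += ["-app", app]
    if savename is not None:
        argv += ["-savename", savename]
    if interval is not None:
        argv += ["-time", str(interval)]
    return argv


def as_shell(argv):
    """Render an argv list as a single shell-quoted command line."""
    return shlex.join(argv)
```

The argv list can be passed to `subprocess.run` directly, while `as_shell` is useful for logging the exact command that was launched.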

monitor Parameters

path

  • DRd, RFO, HWPF, DWr for core and CHA components
  • LD, ST for iMC and M2PCIe PMUs
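
Because the valid path names depend on the component type, a comma-separated -path value can be sorted into the two groups before use. This is an illustrative sketch only; `split_paths` is a hypothetical helper and PathFinder's own validation, if any, is not shown in this README.

```python
CORE_CHA_PATHS = ("DRd", "RFO", "HWPF", "DWr")  # core and CHA components
IMC_M2PCIE_PATHS = ("LD", "ST")                 # iMC and M2PCIe PMUs


def split_paths(path_option):
    """Split a -path value into (core/CHA paths, iMC/M2PCIe paths)."""
    core_cha, imc_m2pcie, unknown = [], [], []
    for name in (part.strip() for part in path_option.split(",")):
        if not name:
            continue  # tolerate stray commas
        if name in CORE_CHA_PATHS:
            core_cha.append(name)
        elif name in IMC_M2PCIE_PATHS:
            imc_m2pcie.append(name)
        else:
            unknown.append(name)
    if unknown:
        raise ValueError(f"unknown path name(s): {', '.join(unknown)}")
    return core_cha, imc_m2pcie
```

Rejecting unknown names early avoids launching a long monitoring run with a misspelled path.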

component

Defined in analyse/config.conf

  • Supports multiple components via comma separation, e.g.
    core32-SB, core32-L1D, core32-LFB, core32-L2, core33-LFB
  • Supports aggregating counters from multiple small components, e.g.
    socket1-CHA0-31 aggregates all 32 CHAs on socket1
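
To see which individual components an aggregate name covers, the range suffix can be expanded. The sketch below infers the naming convention from the examples in this README (a trailing first-last range such as CHA0-31); the helper names are hypothetical, and how PathFinder combines the counters internally is not shown here.

```python
import re

# A component name that ends in <letters><first>-<last>, e.g. socket1-CHA0-31.
_RANGE = re.compile(r"^(?P<prefix>.*[A-Za-z])(?P<first>\d+)-(?P<last>\d+)$")


def expand_component(name):
    """Expand 'socket1-CHA0-31' into ['socket1-CHA0', ..., 'socket1-CHA31']."""
    match = _RANGE.match(name)
    if match is None:
        return [name]  # plain component such as core32-SB
    first, last = int(match["first"]), int(match["last"])
    if last < first:
        raise ValueError(f"descending range in component name: {name!r}")
    return [f"{match['prefix']}{index}" for index in range(first, last + 1)]


def expand_pmu_option(pmu_option):
    """Expand every entry of a comma-separated -pmu value."""
    expanded = []
    for name in (part.strip() for part in pmu_option.split(",")):
        if name:
            expanded.extend(expand_component(name))
    return expanded
```

For instance, `expand_component("socket1-CHA0-31")` returns the 32 CHA names on socket1, while a plain name such as core32-SB is returned unchanged.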

Collector Example

Collect core and uncore counter data for mbw and store it in InfluxDB with the tag app=mbw1024:

    python3.8 main.py monitor -app "mbw -t1 -n 2000 1024" -savename mbw1024

Analyse

PathFinder supports command-line parameters to specify the target app, module, mFlow path, and components.

By setting module to build (PFBuilder), esti (PFEstimator), analy (PFAnalyser), or mater (PFMaterializer), you can inspect different metrics of the monitored application.

Use path to select DRd/RFO/HWPF/DWr (or LD/ST), and pmu to specify which components to trace: SB/L1D/LFB/L2/LLC/CHA/MC.


Offline Analysis

Analyze historical app counter data stored in the InfluxDB database.

PFBuilder

    python3.8 main.py build -path DRd,RFO,HWPF,DWr,LD,ST -pmu default -app 525.x264_r-memnode4

    python3.8 main.py build -path DRd,RFO,HWPF,DWr,LD,ST -pmu core32-SB,core32-L1D,core32-LFB,core32-L2,core33-LFB,socket1-CHA0,socket1-CHA31,socket1-PCIe5 -app 525.x264_r-memnode4

PFEstimator

    python3.8 main.py esti -path DRd,RFO,HWPF,DWr -pmu default -app 525.x264_r-memnode4

    python3.8 main.py esti -path DRd,RFO,HWPF,DWr -pmu core32-SB,core32-L1D,core32-LFB,core32-L2,core33-LFB,core47-SB,core47-L1D,core47-LFB,core47-L2,socket1-CHA0,socket1-CHA31 -app 525.x264_r-memnode4

PFAnalyser

    python3.8 main.py analy -path DRd,RFO,HWPF,DWr -pmu default -app workloada-10mbw

    python3.8 main.py analy -path DRd,RFO,HWPF,DWr -pmu core32-SB,core32-L1D,core32-LFB,core32-L2,core32-core_LLC,core33-LFB,core47-SB,core47-L1D,core47-LFB,core47-L2,socket1-CHA0,socket1-CHA31 -app workloada-10mbw

PFMaterializer

    python3.8 main.py mater -path DWr,DRd,RFO,HWPF,LD,ST -app 525.x264_r-memnode4 -pmu core32-L2 -option cluster

    python3.8 main.py mater -path DWr,DRd,RFO,HWPF,LD,ST -app 525.x264_r-memnode4 -pmu core32-L2 -option trend

Online Analysis

Analyze counter data in real time as it is being collected:

    python3.8 main.py monitor -app "sudo numactl --physcpubind=32 --membind=4 mbw -t1 -n 2000 1024" -online esti -time 5

    python3.8 main.py monitor -app "mbw -t1 -n 2000 1024" -online analy

Use Case System Configuration

| Component | Specification |
|---|---|
| Server Platform | 2U Supermicro |
| Processor | 2× Intel Xeon Gold 6438Y+ (Sapphire Rapids) |
| NUMA Configuration | Sub-NUMA Clustering (SNC) enabled |
| Memory | 256 GB DDR5 |
| CXL Device | 1× CXL Type-3 memory device (appears as CPU-less NUMA node) |
| Operating System | Linux with 6.5 kernel |
| PMU Support | CHA PMU support patches applied (https://github.com/torvalds/linux/commit/a5a6ff3d639d088d4af7e2935e1ee0d8b4e817d4) |

Use Case 1: PFBuilder on SPEC CPU2017 Benchmark

This example observes PFBuilder output while SPEC CPU2017 649.fotonik3d_s accesses CXL memory.

Steps:

  1. Use numactl to bind the application's memory to the CXL node (NUMA node 4 in this setup):

    sudo numactl --membind=4 bin/runcpu --config=config 649.fotonik3d_s
    
  2. Use online mode to observe PFBuilder output during app execution, and store it in InfluxDB under the name 649.fotonik3d_s.

  3. To view aggregated PFBuilder output on all paths and components:

    python3.8 main.py monitor -app "sudo numactl --physcpubind=32 --membind=4 path/to/SPEC/bin/runcpu --config=path/to/SPEC/config 649.fotonik3d_s" -online build -time 5 -savename 649.fotonik3d_s -path DRd,RFO,HWPF,DWr,LD,ST -pmu default
    
  4. To view PFBuilder output on DRd path for specific core32 components:

    python3.8 main.py monitor -app "sudo numactl --physcpubind=32 --membind=4 path/to/SPEC/bin/runcpu --config=path/to/SPEC/config 649.fotonik3d_s" -online build -time 5 -savename 649.fotonik3d_s -path DRd -pmu core32-SB,core32-L1D,core32-LFB,core32-L2
    

Use Case 2: Concurrent CXL Access Contention on YCSB Benchmark

Run mbw against CXL memory to generate interference traffic for a YCSB application that uses the same CXL memory.

Steps:

  1. Run Redis on the CXL node:

    sudo numactl --membind=4 redis-server
    
  2. Run YCSB-A workload:

    sudo numactl --physcpubind=47 --membind=4 path/to/ycsb/bin/ycsb run redis -s -P path/to/ycsb/workloads/workloada -p "redis.host=127.0.0.1" -p "redis.port=6379"
    
  3. Run mbw to access CXL memory and introduce contention:

    sudo numactl --physcpubind=32 --membind=4 mbw -t1 -n 2000 1024
    
  4. Use online mode to monitor traffic on host components, and store results in InfluxDB under the name ycsb-1mbw:

    python3.8 main.py monitor -app_exist ycsb -online esti -time 5 -savename ycsb-1mbw -path DRd,RFO,HWPF,DWr,LD,ST -pmu default
    

Use Case 3: Interference Between GUPS and PARSEC

Use a PARSEC application to access local memory with 50% CPU utilization, while simultaneously running the GUPS benchmark on the same core to access CXL memory.


Steps:

  1. Run the PARSEC raytrace application with 50% CPU to access local memory:

    sudo cpulimit -l 50 -- numactl --physcpubind=32 --membind=4 /path/to/parsec/bin/parsecmgmt -a run -p raytrace -i native

  2. Run the GUPS benchmark on the same core (CPU 32) with 30% CPU to access CXL memory:

    sudo cpulimit -l 30 -- numactl --membind=4 /home/xiaoli/TPP/colloid-6.3/apps/gups/gups-r 1

  3. Monitor CXL-induced stalls on core 32 using online estimation mode:

    python3.8 main.py monitor -app_exist parsec_gups -online esti -savename parsec_gups -path DRd,RFO,HWPF,DWr,LD,ST -pmu default

  4. Alternatively, monitor the variation in queueing degree:

    python3.8 main.py monitor -app_exist parsec_gups -online analy -savename parsec_gups -path DRd,RFO,HWPF,DWr,LD,ST -pmu default
