This document provides comprehensive documentation for the PerFlow Python bindings, which expose the C++ performance analysis library to Python.
PerFlow provides two levels of Python APIs:
- Low-level API (this document): Direct bindings to C++ classes like `TreeBuilder`, `HotspotAnalyzer`, etc.
- Dataflow API (see Dataflow API Reference): High-level dataflow-based programming framework for composing analysis workflows.
For most use cases, we recommend using the Dataflow API as it provides:
- Fluent, user-friendly workflow composition
- Automatic parallel execution of independent tasks
- Result caching and lazy evaluation
- Pre-built analysis nodes
```bash
# Clone the repository
git clone https://github.com/yuyangJin/PerFlow.git
cd PerFlow

# Initialize submodules (required for pybind11)
git submodule update --init --recursive

# Build with Python bindings
mkdir build && cd build
cmake .. -DPERFLOW_BUILD_PYTHON=ON
make -j4

# The module will be built in python/perflow/
```

Install the Python package in editable mode:

```bash
cd PerFlow
pip install -e .
```

```python
from perflow.dataflow import WorkflowBuilder

# Create and execute a workflow in one fluent chain
results = (
    WorkflowBuilder("MyAnalysis")
    .load_data([('rank_0.pflw', 0), ('rank_1.pflw', 1)])
    .find_hotspots(top_n=10)
    .analyze_balance()
    .execute()
)

# Access results
for node_id, output in results.items():
    if 'hotspots' in output:
        for h in output['hotspots']:
            print(f"{h.function_name}: {h.self_percentage:.1f}%")
```

```python
import perflow

# Build a performance tree from sample files
builder = perflow.TreeBuilder()
builder.load_library_maps([('rank_0.libmap', 0), ('rank_1.libmap', 1)])
builder.build_from_files([('rank_0.pflw', 0), ('rank_1.pflw', 1)])

# Get the tree
tree = builder.tree

# Find performance hotspots
hotspots = perflow.HotspotAnalyzer.find_hotspots(tree, top_n=10)
for h in hotspots:
    print(f"{h.function_name}: {h.self_percentage:.1f}%")

# Analyze workload balance
balance = perflow.BalanceAnalyzer.analyze(tree)
print(f"Imbalance factor: {balance.imbalance_factor:.2f}")
```

Defines how call stacks are aggregated into a tree.
| Value | Description |
|---|---|
| `ContextFree` | Nodes with the same function are merged regardless of calling context |
| `ContextAware` | Nodes are distinguished by their full calling context |
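The difference between the two modes can be pictured with a toy aggregation over call-stack tuples. This is a conceptual sketch only, not PerFlow's implementation; the function names are made up:

```python
from collections import Counter

# Three sampled call stacks (root -> leaf); memcpy is reached
# through two different callers.
stacks = [
    ("main", "compress", "memcpy"),
    ("main", "network_send", "memcpy"),
    ("main", "compress", "memcpy"),
]

# ContextFree: merge by function name alone -> one memcpy node.
context_free = Counter(frame for stack in stacks for frame in stack)

# ContextAware: key each node by its full calling context ->
# two distinct memcpy nodes.
context_aware = Counter(stack[:i + 1] for stack in stacks for i in range(len(stack)))

print(context_free["memcpy"])                             # 3
print(context_aware[("main", "compress", "memcpy")])      # 2
print(context_aware[("main", "network_send", "memcpy")])  # 1
```

ContextFree trees are smaller and answer "how much time is in this function overall"; ContextAware trees keep call paths separate at the cost of more nodes.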
Defines how samples are counted during tree building.
| Value | Description |
|---|---|
| `Exclusive` | Only track self samples (samples at leaf nodes) |
| `Inclusive` | Only track total samples (all samples including children) |
| `Both` | Track both inclusive and exclusive samples |
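The counting modes can be illustrated with a pure-Python sketch (illustrative only; `main`, `foo`, and `bar` are made-up frames):

```python
from collections import Counter

# Four samples of the same stack: main -> foo -> bar (bar is the leaf).
stacks = [("main", "foo", "bar")] * 4

exclusive = Counter()  # self samples: leaf frame only
inclusive = Counter()  # total samples: every frame on the stack

for stack in stacks:
    exclusive[stack[-1]] += 1
    for frame in stack:
        inclusive[frame] += 1

print(exclusive["main"], exclusive["bar"])  # 0 4
print(inclusive["main"], inclusive["bar"])  # 4 4
```

With `Both`, a node carries both counters, which is what the `self_samples`/`total_samples` pair on `TreeNode` reflects.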
Represents a resolved stack frame with symbol information.
Attributes:
- `raw_address: int` - Original raw address
- `offset: int` - Offset within the library/binary
- `library_name: str` - Library or binary name
- `function_name: str` - Function name (if available)
- `filename: str` - Source file name (if available)
- `line_number: int` - Line number (if available)
Represents a vertex in the performance tree.
Properties:
- `frame: ResolvedFrame` - The resolved frame information
- `function_name: str` - The function name
- `library_name: str` - The library name
- `total_samples: int` - Total samples across all processes
- `self_samples: int` - Self samples (samples at this leaf)
- `sampling_counts: List[int]` - Per-process sampling counts
- `execution_times: List[float]` - Per-process execution times (microseconds)
- `parent: TreeNode` - Parent node (None if root)
- `children: List[TreeNode]` - List of child nodes
- `child_count: int` - Number of children
- `is_leaf: bool` - True if this is a leaf node
- `is_root: bool` - True if this is the root node
- `depth: int` - Depth in the tree (root = 0)
- `total_execution_time: float` - Total execution time across all processes
Methods:
- `sampling_count(process_id: int) -> int` - Get sampling count for a specific process
- `execution_time(process_id: int) -> float` - Get execution time for a specific process
- `siblings() -> List[TreeNode]` - Get all sibling nodes
- `get_path() -> List[str]` - Get path from root to this node as function names
- `find_child_by_name(func_name: str) -> TreeNode` - Find child by function name
- `get_call_count(child: TreeNode) -> int` - Get call count to a specific child
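For intuition, a `get_path`-style lookup amounts to walking `parent` links up to the root. A toy sketch with a hypothetical minimal node class (not the bound TreeNode):

```python
class Node:
    # Minimal stand-in for TreeNode: just a name and a parent link.
    def __init__(self, name, parent=None):
        self.function_name = name
        self.parent = parent

    def get_path(self):
        # Walk parent links up to the root, then reverse so the
        # path reads root -> ... -> this node.
        path, node = [], self
        while node is not None:
            path.append(node.function_name)
            node = node.parent
        return list(reversed(path))

root = Node("main")
mid = Node("solve", parent=root)
leaf = Node("dot_product", parent=mid)
print(leaf.get_path())  # ['main', 'solve', 'dot_product']
```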
Aggregates call stack samples into a tree structure.
Constructor:
```python
tree = PerformanceTree(
    mode: TreeBuildMode = TreeBuildMode.ContextFree,
    count_mode: SampleCountMode = SampleCountMode.Exclusive
)
```

Properties:
- `root: TreeNode` - The root node
- `process_count: int` - Number of processes
- `total_samples: int` - Total number of samples
- `build_mode: TreeBuildMode` - Tree build mode
- `sample_count_mode: SampleCountMode` - Sample count mode
- `node_count: int` - Total number of nodes
- `max_depth: int` - Maximum depth of the tree
- `all_nodes: List[TreeNode]` - All nodes in the tree
- `leaf_nodes: List[TreeNode]` - All leaf nodes
Methods:
- `clear()` - Clear all data from the tree
- `insert_call_stack(frames: List[ResolvedFrame], process_id: int, count: int = 1, time_us: float = 0.0)` - Insert a call stack
- `nodes_at_depth(depth: int) -> List[TreeNode]` - Get all nodes at a specific depth
- `find_nodes_by_name(func_name: str) -> List[TreeNode]` - Find nodes by function name
- `find_nodes_by_library(lib_name: str) -> List[TreeNode]` - Find nodes by library name
- `filter_by_samples(min_samples: int) -> List[TreeNode]` - Filter nodes by sample count
- `filter_by_self_samples(min_self_samples: int) -> List[TreeNode]` - Filter by self samples
- `traverse_preorder(visitor: Callable[[TreeNode], bool])` - Pre-order traversal
- `traverse_postorder(visitor: Callable[[TreeNode], bool])` - Post-order traversal
- `traverse_levelorder(visitor: Callable[[TreeNode], bool])` - Level-order traversal
Constructs performance trees from sample data files.
Constructor:
```python
builder = TreeBuilder(
    mode: TreeBuildMode = TreeBuildMode.ContextFree,
    count_mode: SampleCountMode = SampleCountMode.Exclusive
)
```

Properties:
- `tree: PerformanceTree` - The performance tree
- `build_mode: TreeBuildMode` - The build mode
- `sample_count_mode: SampleCountMode` - The sample count mode
Methods:
- `set_build_mode(mode: TreeBuildMode)` - Set the tree build mode
- `set_sample_count_mode(mode: SampleCountMode)` - Set the sample count mode
- `build_from_file(sample_file: str, process_id: int, time_per_sample: float = 1000.0) -> bool` - Build from a single file
- `build_from_files(sample_files: List[Tuple[str, int]], time_per_sample: float = 1000.0) -> int` - Build from multiple files
- `load_library_maps(libmap_files: List[Tuple[str, int]]) -> int` - Load library maps
- `clear()` - Clear all data
Contains workload balance information.
Attributes:
- `mean_samples: float` - Mean samples per process
- `std_dev_samples: float` - Standard deviation
- `min_samples: float` - Minimum samples
- `max_samples: float` - Maximum samples
- `imbalance_factor: float` - (max - min) / mean
- `most_loaded_process: int` - Process with most samples
- `least_loaded_process: int` - Process with least samples
- `process_samples: List[float]` - Per-process sample counts
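How the fields relate can be sketched in plain Python from the `(max - min) / mean` definition above. The sample counts are hypothetical, and whether PerFlow uses population or sample standard deviation is not specified here:

```python
import statistics

# Hypothetical per-process sample counts for a 4-rank run.
process_samples = [1200.0, 1000.0, 800.0, 1000.0]

mean_samples = statistics.mean(process_samples)
std_dev_samples = statistics.pstdev(process_samples)  # population std dev (assumption)
imbalance_factor = (max(process_samples) - min(process_samples)) / mean_samples
most_loaded_process = process_samples.index(max(process_samples))

print(f"Imbalance factor: {imbalance_factor:.2f}")  # 0.40
print(f"Most loaded rank: {most_loaded_process}")   # 0
```

An imbalance factor near 0 means work is evenly spread; here rank 0 carries 20% more samples than the mean.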
Describes a performance hotspot.
Attributes:
- `function_name: str` - Function name
- `library_name: str` - Library name
- `source_location: str` - Source file location
- `total_samples: int` - Total/inclusive samples
- `percentage: float` - Percentage of total samples
- `self_samples: int` - Self/exclusive samples
- `self_percentage: float` - Percentage of self samples
Analyzes workload distribution across processes.
Static Methods:
- `analyze(tree: PerformanceTree) -> BalanceAnalysisResult` - Analyze workload balance
Identifies performance bottlenecks.
Static Methods:
- `find_hotspots(tree: PerformanceTree, top_n: int = 10) -> List[HotspotInfo]` - Find top hotspots by self time
- `find_self_hotspots(tree: PerformanceTree, top_n: int = 10) -> List[HotspotInfo]` - Alias for find_hotspots
- `find_total_hotspots(tree: PerformanceTree, top_n: int = 10) -> List[HotspotInfo]` - Find hotspots by total time
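The self-vs-total distinction can be shown with toy records: a caller accumulates a large total (inclusive) count while the leaves it calls hold the self (exclusive) counts. Illustrative data only, not PerFlow output:

```python
# Toy (function, self_samples, total_samples) records out of 100 samples.
funcs = [
    ("main",        1,  100),
    ("solve",       5,   90),
    ("dot_product", 60,  60),
    ("axpy",        34,  34),
]

# Ranking by self time surfaces the leaves doing the actual work;
# ranking by total time surfaces the callers above them.
by_self = sorted(funcs, key=lambda f: f[1], reverse=True)[:2]
by_total = sorted(funcs, key=lambda f: f[2], reverse=True)[:2]

print([f[0] for f in by_self])   # ['dot_product', 'axpy']
print([f[0] for f in by_total])  # ['main', 'solve']
```

This is why `find_hotspots` (self time) and `find_total_hotspots` can return very different lists for the same tree.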
```python
results = perflow.analyze_samples(
    sample_files: List[Tuple[str, int]],         # List of (filepath, process_id)
    libmap_files: List[Tuple[str, int]] = None,  # Optional library maps
    top_n: int = 10,
    mode: TreeBuildMode = TreeBuildMode.ContextFree
) -> dict
```

Returns a dictionary with:
- `tree`: The PerformanceTree object
- `hotspots`: List of HotspotInfo objects
- `balance`: BalanceAnalysisResult object
```python
perflow.print_hotspots(
    tree: PerformanceTree,
    top_n: int = 10,
    show_inclusive: bool = False
)
```

Prints hotspot analysis results in a formatted table.
```python
perflow.print_balance(tree: PerformanceTree)
```

Prints workload balance analysis results.
```python
perflow.traverse_tree(
    tree: PerformanceTree,
    visitor: Callable[[TreeNode], bool],
    order: str = 'preorder'  # 'preorder', 'postorder', or 'levelorder'
)
```

Traverse the tree with a visitor function.
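A visitor-driven pre-order walk can be sketched in plain Python. Consistent with the usage examples in this document (`return True # Continue traversal`), this toy assumes a False return halts the traversal; that is an assumption about the binding's semantics, not a guarantee:

```python
# Minimal tree as nested dicts: function name -> children.
tree = {"main": {"solve": {"dot_product": {}, "axpy": {}}, "io": {}}}

visited = []

def traverse_preorder(children, visitor):
    # Visit each node before its children; propagate False upward
    # so the whole walk stops as soon as the visitor declines.
    for name, subtree in children.items():
        if not visitor(name):
            return False
        if not traverse_preorder(subtree, visitor):
            return False
    return True

def visitor(name):
    visited.append(name)
    return name != "axpy"  # stop once axpy is reached; "io" is never visited

traverse_preorder(tree, visitor)
print(visited)  # ['main', 'solve', 'dot_product', 'axpy']
```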
```python
import perflow

# Create a tree builder
builder = perflow.TreeBuilder()

# Build tree from sample files
builder.build_from_files([
    ('rank_0.pflw', 0),
    ('rank_1.pflw', 1),
    ('rank_2.pflw', 2),
    ('rank_3.pflw', 3),
])
tree = builder.tree

# Print hotspots
perflow.print_hotspots(tree, top_n=10)

# Print balance
perflow.print_balance(tree)
```

```python
import perflow

# Build tree...
tree = perflow.PerformanceTree()
# ...insert data...

# Find all functions with more than 1000 samples
def find_hot_functions(node):
    if node.total_samples > 1000:
        print(f"{node.function_name}: {node.total_samples} samples")
    return True  # Continue traversal

tree.traverse_preorder(find_hot_functions)
```

```python
import perflow

tree = builder.tree

# Get only leaf nodes (actual execution points)
leaves = tree.leaf_nodes
print(f"Found {len(leaves)} leaf functions")

# Get nodes with significant self time
hot_nodes = tree.filter_by_self_samples(min_self_samples=100)
for node in hot_nodes:
    print(f"{node.function_name}: {node.self_samples} self samples")

# Find all functions from a specific library
math_functions = tree.find_nodes_by_library("libm.so.6")
```