Merged
Changes from 33 commits
Commits
55 commits
47ff19f
[COPILOT] refactor extraction code to separate module
patricktnast Jan 6, 2026
3222987
format
patricktnast Jan 6, 2026
bb99066
[COPILOT] consolidate benchmark and phase configs.
patricktnast Jan 6, 2026
ee4c080
refactor extraction to create a 'configuration'
patricktnast Jan 7, 2026
8eacffe
remove unused imports
patricktnast Jan 7, 2026
be56673
fix method sig
patricktnast Jan 7, 2026
725cfb4
minor fixes
patricktnast Jan 7, 2026
ba57664
cleanup
patricktnast Jan 7, 2026
0ba03a9
add basic unit tests
patricktnast Jan 7, 2026
d740d88
add back result summary columns
patricktnast Jan 7, 2026
b98b547
make callpattern more ergonomic
patricktnast Jan 7, 2026
b6d861a
condense
patricktnast Jan 7, 2026
cd0520d
add cli for summarization
patricktnast Jan 7, 2026
12fe5fb
[COPILOT] Add tests
patricktnast Jan 8, 2026
95c86cc
edits for readability
patricktnast Jan 8, 2026
9ffd7d5
change nan check to warning
patricktnast Jan 8, 2026
266ffae
format
patricktnast Jan 8, 2026
ac6e2c2
add summarize run at the end of the run_benchmark loop
patricktnast Jan 8, 2026
881154e
[COPILOT] extract plotting functions to new module
patricktnast Jan 8, 2026
0b43719
[COPILOT] refactor plots
patricktnast Jan 8, 2026
d353529
adjust so that we only create fractions for bottleneck patterns, whic…
patricktnast Jan 8, 2026
708d155
make bottleneck patterns more strict
patricktnast Jan 8, 2026
305a9a1
Merge branch 'pnast/feature/mic-6519-agg' into pnast/refactor/mic-637…
patricktnast Jan 8, 2026
e391d47
[COPILOT] add nb generation
patricktnast Jan 8, 2026
6bd4573
adjust organization
patricktnast Jan 8, 2026
2643cb6
format
patricktnast Jan 8, 2026
13dd0e5
Merge branch 'pnast/refactor/mic-6373-plotting' into pnast/feature/mi…
patricktnast Jan 8, 2026
4f069fa
[COPILOT] add explicit extraction configuration
patricktnast Jan 8, 2026
e5a4ecd
remove preset
patricktnast Jan 9, 2026
70268a8
remove other examples
patricktnast Jan 9, 2026
5f07a2c
consolidate tests
patricktnast Jan 9, 2026
13c22fb
format
patricktnast Jan 9, 2026
0cb59be
add context manager
patricktnast Jan 9, 2026
4e9a02f
use a fixture instead
patricktnast Jan 12, 2026
5ae07b4
fix typo
patricktnast Jan 12, 2026
5ee0995
remove duplicate param
patricktnast Jan 13, 2026
7f1c7e5
rename callpattern
patricktnast Jan 14, 2026
232da87
add line number to extraction
patricktnast Jan 15, 2026
ab1a103
add test to ensure we can select correct line
patricktnast Jan 15, 2026
f68c810
use pipeline call as ex instead
patricktnast Jan 15, 2026
2a7244a
Merge branch 'pnast/feature/mic-6518-extraction' into pnast/feature/m…
patricktnast Jan 15, 2026
2a53ee9
Merge remote-tracking branch 'origin/main' into pnast/feature/mic-651…
patricktnast Jan 15, 2026
7a2705a
format
patricktnast Jan 15, 2026
5a103d7
updates
patricktnast Jan 15, 2026
134c980
format
patricktnast Jan 8, 2026
1b1b672
Merge branch 'pnast/feature/mic-6519-agg' into pnast/refactor/mic-637…
patricktnast Jan 15, 2026
b55733a
Merge branch 'pnast/refactor/mic-6373-plotting' of https://github.com…
patricktnast Jan 15, 2026
1b8da72
Merge remote-tracking branch 'origin/main' into pnast/refactor/mic-63…
patricktnast Jan 15, 2026
ff6e733
Merge branch 'pnast/refactor/mic-6373-plotting' into pnast/feature/mi…
patricktnast Jan 15, 2026
c16265e
Merge remote-tracking branch 'origin/main' into pnast/feature/mic-637…
patricktnast Jan 15, 2026
572005a
Merge branch 'pnast/feature/mic-6373-nb' into pnast/feature/pattern-c…
patricktnast Jan 15, 2026
08e82ff
Merge remote-tracking branch 'origin/main' into pnast/feature/pattern…
patricktnast Jan 15, 2026
9263d06
rename callpattern
patricktnast Jan 15, 2026
31eb746
adjust test
patricktnast Jan 15, 2026
b9efede
add name to tests
patricktnast Jan 16, 2026
145 changes: 145 additions & 0 deletions extraction_config_example.yaml
@@ -0,0 +1,145 @@
# Example Extraction Configuration
#
# This file demonstrates the YAML syntax for configuring extraction patterns
# that define which profiling metrics to extract from benchmark results.
#
# Usage:
# run_benchmark -m models/*.yaml -r 10 -b 20 --extraction-config extraction_config.yaml
# summarize results/profile_*/benchmark_results.csv --extraction-config extraction_config.yaml
#
# Each pattern defines:
# - What function to match in cProfile output (filename + function_name)
# - Which metrics to extract (cumtime, percall, ncalls) - all optional with defaults
# - How to name the resulting columns (templates) - optional with defaults
#
# Common use cases:
#
# 1. Bottlenecks: Extract all 3 metrics (cumtime, percall, ncalls)
# Use for performance hotspots you want to track in detail
#
# 2. Simulation phases: Extract only cumtime with "rt_{name}_s" column naming
# Use for high-level simulation phases
#
# 3. Custom patterns: Mix and match metrics and templates as needed

patterns:
  # ===========================================================================
  # BOTTLENECK PATTERNS
  # ===========================================================================
  # These extract cumtime, percall, and ncalls for detailed bottleneck analysis

  - name: gather_results
    filename: results/manager.py
    function_name: gather_results
    extract_cumtime: true
    extract_percall: true
    extract_ncalls: true
    # Generates columns: gather_results_cumtime, gather_results_percall, gather_results_ncalls

  - name: pipeline_call
    filename: values/pipeline.py
    function_name: __call__
    extract_cumtime: true
    extract_percall: true
    extract_ncalls: true
    # Generates columns: pipeline_call_cumtime, pipeline_call_percall, pipeline_call_ncalls

  - name: population_get
    filename: population/population_view.py
    function_name: get
    extract_cumtime: true
    extract_percall: true
    extract_ncalls: true
    # Generates columns: population_get_cumtime, population_get_percall, population_get_ncalls

  # ===========================================================================
  # SIMULATION PHASE PATTERNS
  # ===========================================================================
  # These extract only cumtime for high-level simulation phases
  # Uses custom template for runtime column naming: rt_{name}_s

  - name: setup
    filename: /vivarium/framework/engine.py
    function_name: setup
    extract_cumtime: true
    cumtime_template: "rt_{name}_s"
    # extract_percall and extract_ncalls default to false
    # Generates column: rt_setup_s

  - name: initialize_simulants
    filename: /vivarium/framework/engine.py
    function_name: initialize_simulants
    extract_cumtime: true
    cumtime_template: "rt_{name}_s"
    # Generates column: rt_initialize_simulants_s

  - name: run
    filename: /vivarium/framework/engine.py
    function_name: run
    extract_cumtime: true
    cumtime_template: "rt_{name}_s"
    # Generates column: rt_run_s

  - name: finalize
    filename: /vivarium/framework/engine.py
    function_name: finalize
    extract_cumtime: true
    cumtime_template: "rt_{name}_s"
    # Generates column: rt_finalize_s

  - name: report
    filename: /vivarium/framework/engine.py
    function_name: report
    extract_cumtime: true
    cumtime_template: "rt_{name}_s"
    # Generates column: rt_report_s

# ===========================================================================
# CUSTOM PATTERN EXAMPLES
# ===========================================================================

  # Example: Custom function with selective metric extraction and custom templates
  # - name: my_bottleneck
  #   filename: my/module.py
  #   function_name: my_function
  #   extract_cumtime: true
  #   extract_percall: true
  #   extract_ncalls: false
  #   cumtime_template: "{name}_total_time"
  #   percall_template: "{name}_avg_time"
  #   # Generates columns: my_bottleneck_total_time, my_bottleneck_avg_time

  # Example: Minimal pattern (only cumtime with default template)
  # - name: simple_func
  #   filename: simple/module.py
  #   function_name: simple_function
  #   # extract_cumtime defaults to true
  #   # extract_percall and extract_ncalls default to false
  #   # cumtime_template defaults to "{name}_cumtime"
  #   # Generates column: simple_func_cumtime

# ===========================================================================
# FIELD REFERENCE
# ===========================================================================
#
# Required fields:
# name: Logical name used in column name templates (e.g., "setup", "gather_results")
# filename: Path pattern to match the source file (can be partial path)
# function_name: Name of the function to match in cProfile output
#
# Optional fields (with defaults):
# extract_cumtime: Extract cumulative time (default: true)
# extract_percall: Extract time per call (default: false)
# extract_ncalls: Extract number of calls (default: false)
# cumtime_template: Column name template for cumtime (default: "{name}_cumtime")
# percall_template: Column name template for percall (default: "{name}_percall")
# ncalls_template: Column name template for ncalls (default: "{name}_ncalls")
#
# Template variables:
# {name}: Replaced with the pattern's name field
#
# File path matching:
# - Patterns match any path ending with the specified filename
# - Use forward slashes even on Windows
# - Special regex characters (., *, etc.) are automatically escaped
# - Example: "results/manager.py" matches "/full/path/to/results/manager.py"
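
The field reference that closes this YAML file can be illustrated with a short sketch showing how the documented defaults, the `{name}` template substitution, and the suffix-based path matching might fit together. `PatternSketch`, `column_names`, and `matches` are hypothetical names invented for this example. The real `ExtractionConfig` implementation is not part of this excerpt and may work differently.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class PatternSketch:
    """Hypothetical stand-in for one entry under `patterns:` (not the real class)."""

    name: str
    filename: str
    function_name: str
    # Defaults mirror the "Optional fields" list in the example YAML.
    extract_cumtime: bool = True
    extract_percall: bool = False
    extract_ncalls: bool = False
    cumtime_template: str = "{name}_cumtime"
    percall_template: str = "{name}_percall"
    ncalls_template: str = "{name}_ncalls"

    def column_names(self) -> List[str]:
        """Return the output column names for the metrics that are enabled."""
        enabled = [
            (self.extract_cumtime, self.cumtime_template),
            (self.extract_percall, self.percall_template),
            (self.extract_ncalls, self.ncalls_template),
        ]
        return [template.format(name=self.name) for on, template in enabled if on]

    def matches(self, path: str) -> bool:
        """True if `path` ends with `filename`; regex characters are escaped."""
        return re.search(re.escape(self.filename) + r"$", path) is not None


# A simulation-phase pattern: cumtime only, with the rt_{name}_s template.
setup = PatternSketch(
    name="setup",
    filename="/vivarium/framework/engine.py",
    function_name="setup",
    cumtime_template="rt_{name}_s",
)

# A bottleneck pattern: all three metrics with the default templates.
bottleneck = PatternSketch(
    name="gather_results",
    filename="results/manager.py",
    function_name="gather_results",
    extract_percall=True,
    extract_ncalls=True,
)

print(setup.column_names())  # ['rt_setup_s']
print(bottleneck.column_names())
print(bottleneck.matches("/full/path/to/results/manager.py"))  # True
```

Escaping the filename keeps the `.` in `manager.py` from acting as a regex wildcard, so a path such as `results/managerXpy` would not match in this sketch.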
2 changes: 2 additions & 0 deletions setup.py
@@ -54,6 +54,7 @@
"matplotlib",
"seaborn",
"scalene",
"nbformat>=5.0",
]

setup_requires = ["setuptools_scm"]
@@ -98,5 +99,6 @@
make_artifacts=vivarium_profiling.tools.cli:make_artifacts
run_benchmark=vivarium_profiling.tools.cli:run_benchmark
profile_sim=vivarium_profiling.tools.cli:profile_sim
summarize=vivarium_profiling.tools.cli:summarize
""",
)
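
The new `summarize=vivarium_profiling.tools.cli:summarize` line follows the `name=module:function` entry-point form, so installing the package should expose a `summarize` command that calls that function. The function body is not shown in this excerpt. As a rough sketch only, the snippet below uses `argparse` to parse the invocation given in the example YAML's usage comment (`summarize <results.csv> --extraction-config <file>`). `build_parser_sketch` and `summarize_sketch` are hypothetical names, the single positional argument is an assumption, and the real command may be built with a different CLI library.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser_sketch() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage line in the example YAML."""
    parser = argparse.ArgumentParser(prog="summarize")
    # Assumption: one benchmark results CSV is passed positionally.
    parser.add_argument("results", type=Path, help="Path to benchmark_results.csv")
    parser.add_argument(
        "--extraction-config",
        type=Path,
        default=None,
        help="Optional YAML file describing extraction patterns",
    )
    return parser


def summarize_sketch(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments only; a real command would go on to load and summarize."""
    return build_parser_sketch().parse_args(argv)


args = summarize_sketch(
    [
        "results/profile_x/benchmark_results.csv",
        "--extraction-config",
        "extraction_config.yaml",
    ]
)
print(args.results)
print(args.extraction_config)
```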
6 changes: 6 additions & 0 deletions src/vivarium_profiling/templates/__init__.py
@@ -0,0 +1,6 @@
"""Templates for vivarium_profiling."""

from pathlib import Path

TEMPLATES_DIR = Path(__file__).parent
ANALYSIS_NOTEBOOK_TEMPLATE = TEMPLATES_DIR / "analysis_template.ipynb"
216 changes: 216 additions & 0 deletions src/vivarium_profiling/templates/analysis_template.ipynb
@@ -0,0 +1,216 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "072e8e0a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from pathlib import Path\n",
"from vivarium_profiling.tools.extraction import ExtractionConfig\n",
"from vivarium_profiling.tools import plotting\n",
"\n",
"# Configure matplotlib for notebook\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"id": "b7058668",
"metadata": {},
"source": [
"## Load Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9241f5cb",
"metadata": {},
"outputs": [],
"source": [
"# Load benchmark results\n",
"benchmark_results_path = Path(r\"{{BENCHMARK_RESULTS_PATH}}\")\n",
"summary_path = Path(r\"{{SUMMARY_PATH}}\")\n",
"\n",
"raw = pd.read_csv(benchmark_results_path)\n",
"summary = pd.read_csv(summary_path)\n",
"\n",
"# Load extraction config\n",
"config = ExtractionConfig()\n",
"\n",
"print(f\"Loaded {len(raw)} raw benchmark results\")\n",
"print(f\"Loaded {len(summary)} model summaries\")\n",
"print(f\"\\nRaw data shape: {raw.shape}\")\n",
"print(f\"Summary data shape: {summary.shape}\")"
]
},
{
"cell_type": "markdown",
"id": "4c47df2b",
"metadata": {},
"source": [
"## Performance Analysis\n",
"\n",
"Overall runtime and memory usage comparison across models."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22bb73fb",
"metadata": {},
"outputs": [],
"source": [
"plotting.create_figures(\n",
" summary,\n",
" output_dir=None,\n",
" chart_title=\"performance_analysis\",\n",
" time_col=\"rt_s\",\n",
" mem_col=\"mem_mb\",\n",
" time_pdiff_col=\"rt_s_pdiff\",\n",
" save=False\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7e31e5f0",
"metadata": {},
"source": [
"## Phase Runtime Analysis\n",
"\n",
"Detailed analysis of individual simulation phases (setup, initialize_simulants, run, finalize, report)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee250c94",
"metadata": {},
"outputs": [],
"source": [
"# Get phase metrics from config\n",
"phase_patterns = [p for p in config.patterns if p.cumtime_template == \"rt_{name}_s\"]\n",
"\n",
"for pattern in phase_patterns:\n",
" time_col = pattern.cumtime_col\n",
" time_pdiff_col = f\"{time_col}_pdiff\"\n",
" \n",
" print(f\"\\n=== {pattern.name.upper()} ===\")\n",
" plotting.create_figures(\n",
" summary,\n",
" output_dir=None,\n",
" chart_title=f\"runtime_analysis_{pattern.name}\",\n",
" time_col=time_col,\n",
" mem_col=None,\n",
" time_pdiff_col=time_pdiff_col,\n",
" save=False\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "7f07476e",
"metadata": {},
"source": [
"## Non-Run Time Analysis\n",
"\n",
"Analysis of time spent outside the main run phase (setup, initialization, reporting, etc.)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bf6f0d7",
"metadata": {},
"outputs": [],
"source": [
"plotting.create_figures(\n",
" summary,\n",
" output_dir=None,\n",
" chart_title=\"runtime_analysis_non_run\",\n",
" time_col=\"rt_non_run_s\",\n",
" mem_col=None,\n",
" time_pdiff_col=\"rt_non_run_s_pdiff\",\n",
" save=False\n",
")"
]
},
{
"cell_type": "markdown",
"id": "aa16a06d",
"metadata": {},
"source": [
"## Bottleneck Cumulative Time Analysis\n",
"\n",
"Analysis of cumulative time spent in known bottleneck functions (gather_results, pipeline_call, population_get)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "691b5377",
"metadata": {},
"outputs": [],
"source": [
"# Get bottleneck patterns from config\n",
"bottleneck_patterns = [\n",
" p for p in config.patterns\n",
" if p.extract_cumtime and p.cumtime_col == f\"{p.name}_cumtime\"\n",
"]\n",
"\n",
"for pattern in bottleneck_patterns:\n",
" time_col = pattern.cumtime_col\n",
" time_pdiff_col = f\"{time_col}_pdiff\"\n",
" \n",
" print(f\"\\n=== {pattern.name.upper()} ===\")\n",
" plotting.create_figures(\n",
" summary,\n",
" output_dir=None,\n",
" chart_title=f\"bottleneck_runtime_analysis_{pattern.name}\",\n",
" time_col=time_col,\n",
" mem_col=None,\n",
" time_pdiff_col=time_pdiff_col,\n",
" save=False\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "4f267afb",
"metadata": {},
"source": [
"## Bottleneck Fractions vs Scale Factor\n",
"\n",
"Fraction of run() time spent in each bottleneck function, plotted against model scale factor."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ddcc58f6",
"metadata": {},
"outputs": [],
"source": [
"plotting.plot_bottleneck_fractions(\n",
" summary,\n",
" output_dir=None,\n",
" config=config,\n",
" metric=\"median\",\n",
" save=False\n",
")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
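
The template above contains two placeholders, `{{BENCHMARK_RESULTS_PATH}}` and `{{SUMMARY_PATH}}`, which presumably get replaced when a notebook is generated. The generation code is not part of this excerpt, so the following is only a sketch of one way to do the substitution, using the standard-library `json` module rather than `nbformat` to stay self-contained. `fill_template_sketch` and the small demo template are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def _substitute(text: str, replacements: Dict[str, str]) -> str:
    """Replace every placeholder key in `text` with its value."""
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    return text


def fill_template_sketch(
    template_path: Path, output_path: Path, replacements: Dict[str, str]
) -> None:
    """Copy a notebook template, substituting placeholders in every cell."""
    notebook = json.loads(template_path.read_text(encoding="utf-8"))
    for cell in notebook["cells"]:
        source = cell["source"]
        # Notebook JSON stores a cell's source as a string or a list of lines.
        if isinstance(source, str):
            cell["source"] = _substitute(source, replacements)
        else:
            cell["source"] = [_substitute(line, replacements) for line in source]
    output_path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")


# Demo with a one-cell template shaped like the "Load Data" cell above.
demo_template = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": [
                'benchmark_results_path = Path(r"{{BENCHMARK_RESULTS_PATH}}")\n',
                'summary_path = Path(r"{{SUMMARY_PATH}}")',
            ],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

with tempfile.TemporaryDirectory() as tmp:
    template_file = Path(tmp) / "template.ipynb"
    output_file = Path(tmp) / "analysis.ipynb"
    template_file.write_text(json.dumps(demo_template), encoding="utf-8")
    fill_template_sketch(
        template_file,
        output_file,
        {
            "{{BENCHMARK_RESULTS_PATH}}": "results/benchmark_results.csv",
            "{{SUMMARY_PATH}}": "results/summary.csv",
        },
    )
    filled = json.loads(output_file.read_text(encoding="utf-8"))

filled_source = "".join(filled["cells"][0]["source"])
print(filled_source)
```

Because the template wraps each placeholder in a raw string (`r"..."`), substituted Windows-style paths containing backslashes would not be interpreted as escape sequences.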
1 change: 1 addition & 0 deletions src/vivarium_profiling/tools/__init__.py
@@ -1,3 +1,4 @@
from .app_logging import configure_logging_to_terminal
from .extraction import ExtractionConfig
from .make_artifacts import build_artifacts
from .run_benchmark import run_benchmark_loop