Commit 2dcb9c0

Pnast/docs/mic 6522 docs (#13)

* [COPILOT] refactor extraction code to separate module
* format
* [COPILOT] consolidate benchmark and phase configs.
* refactor extraction to create a 'configuration'
* remove unused imports
* fix method sig
* minor fixes
* cleanup
* add basic unit tests
* add back result summary columns
* make callpattern more ergonomic
* condense
* add cli for summarization
* [COPILOT] Add tests
* edits for readability
* change nan check to warning
* format
* add summarize run at the end of the run_benchmark loop
* [COPILOT] extract plotting functions to new module
* [COPILOT] refactor plots
* adjust so that we only create fractions for bottleneck patterns, which are defined in a particular way.
* make bottleneck patterns more strict
* [COPILOT] add nb generation
* adjust organization
* format
* [COPILOT] add explicit extraction configuration
* remove preset
* remove other examples
* consolidate tests
* format
* add context manager
* use a fixture instead
* fix typo
* [COPILOT] generate documentation to readme
* trim down
* remove previous section
* remove duplicate param
* nits
* rename callpattern
* add line number to extraction
* add test to ensure we can select correct line
* use pipeline call as ex instead
* format
* updates
* format
* rename callpattern
* adjust test
* add name to tests

1 parent 72e7552

README.rst

Lines changed: 132 additions & 12 deletions

@@ -101,19 +101,139 @@ You'll find six directories inside the main

The diff removes the former ``Running Simulations`` section::

    Running Simulations
    -------------------

    Before running a simulation, you should have a model specification file.
    A model specification is a complete description of a vivarium model in
    a yaml format. An example model specification is provided with this
    repository in the ``model_specifications`` directory.

    With this model specification file and your conda environment active, you
    can then run simulations by, e.g.::

        (vivarium_profiling) :~$ simulate run -v /<REPO_INSTALLATION_DIRECTORY>/vivarium_profiling/src/vivarium_profiling/model_specifications/model_spec.yaml

    The ``-v`` flag will log verbosely, so you will get log messages every
    time step. For more ways to run simulations, see the tutorials at
    https://vivarium.readthedocs.io/en/latest/tutorials/running_a_simulation/index.html
    and https://vivarium.readthedocs.io/en/latest/tutorials/exploration.html

In its place, it adds the following ``Profiling and Benchmarking`` section:

Profiling and Benchmarking
--------------------------

This repository provides tools for profiling and benchmarking Vivarium
simulations to analyze their performance characteristics. See the tutorials at
https://vivarium.readthedocs.io/en/latest/tutorials/running_a_simulation/index.html
and https://vivarium.readthedocs.io/en/latest/tutorials/exploration.html
for general instructions on running simulations with Vivarium.

Configuring Scaling Simulations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This repository includes a custom ``MultiComponentParser`` plugin that allows
you to easily create scaling simulations by defining multiple instances of
diseases and risks using a simplified YAML syntax.

To use the parser, add it to your model specification::

    plugins:
        required:
            component_configuration_parser:
                controller: "vivarium_profiling.plugins.parser.MultiComponentParser"

Then use the ``causes`` and ``risks`` multi-config blocks:

**Causes Configuration**

Define multiple disease instances with automatic numbering::

    components:
        causes:
            lower_respiratory_infections:
                number: 4        # Creates 4 disease instances
                duration: 28     # Disease duration in days
                observers: True  # Auto-create DiseaseObserver components

This creates components named ``lower_respiratory_infections_1``,
``lower_respiratory_infections_2``, etc., each with its own observer if enabled.
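
With ``number: 4``, for instance, the generated component names would be::

    lower_respiratory_infections_1
    lower_respiratory_infections_2
    lower_respiratory_infections_3
    lower_respiratory_infections_4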

**Risks Configuration**

Define multiple risk instances and their effects on causes::

    components:
        risks:
            high_systolic_blood_pressure:
                number: 2
                observers: False  # Set False for continuous risks
                affected_causes:
                    lower_respiratory_infections:
                        effect_type: nonloglinear
                        measure: incidence_rate
                        number: 2  # Affects first 2 LRI instances

            unsafe_water_source:
                number: 2
                observers: True  # Set True for categorical risks
                affected_causes:
                    lower_respiratory_infections:
                        effect_type: loglinear
                        number: 2

See ``model_specifications/model_spec_scaling.yaml`` for a complete working
example of a scaling simulation configuration.
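
Assembled from the snippets above, a minimal end-to-end scaling specification
might look like the following sketch (parameter values are illustrative)::

    plugins:
        required:
            component_configuration_parser:
                controller: "vivarium_profiling.plugins.parser.MultiComponentParser"

    components:
        causes:
            lower_respiratory_infections:
                number: 4
                duration: 28
                observers: True
        risks:
            unsafe_water_source:
                number: 2
                observers: True
                affected_causes:
                    lower_respiratory_infections:
                        effect_type: loglinear
                        number: 2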

Running Benchmark Simulations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``profile_sim`` command profiles the runtime and memory usage of a single
simulation of a vivarium model, given a model specification file. The
underlying simulation model can be any vivarium-based model, including the
scaling simulations described above as well as models from a separate
repository. In addition to the standard simulation outputs, the command
generates profiling data whose form depends on the profiler backend provided.
By default, runtime profiling is performed with ``cProfile``, but you can also
use ``scalene`` for more detailed call stack analysis.
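
For example, a profiling run might be launched as follows (the positional
argument shown is an assumption about the interface; consult the command's
help for the exact options, including backend selection)::

    (vivarium_profiling) :~$ profile_sim model_specifications/model_spec.yaml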

The ``run_benchmark`` command runs multiple iterations of one or more model
specifications in order to compare the results. It requires at least one
baseline model (specified as ``model_spec_baseline.yaml``) for comparison,
plus any number of 'experiment' models to benchmark against the baseline,
which can be passed via glob patterns. You can configure the sample size
(number of runs) separately for the baseline and experiment models. The
command aggregates the profiling results and generates summary statistics and
visualizations for a default set of important function calls to help identify
performance bottlenecks.
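
A hypothetical invocation, assuming the baseline spec and an experiment glob
are passed as positional arguments (the exact flags, including those for
sample sizes, may differ)::

    (vivarium_profiling) :~$ run_benchmark model_spec_baseline.yaml "model_spec_scaling_*.yaml"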

The command creates a timestamped directory containing:

- ``benchmark_results.csv``: Raw profiling data for each run
- ``summary.csv``: Aggregated statistics (automatically generated)
- ``performance_analysis.png``: Performance charts (automatically generated)
- Additional analysis plots for runtime phases and bottlenecks

Analyzing Benchmark Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``summarize`` command processes benchmark results and creates
visualizations. It runs automatically at the end of ``run_benchmark``, but can
also be run manually for custom analysis after the fact.

By default, it creates the following files in the specified output directory:

- ``summary.csv``: Aggregated statistics with the mean, median, std, min, and
  max for all metrics, plus percent differences from the baseline
- ``performance_analysis.png``: Runtime and memory usage comparison charts
- ``runtime_analysis_*.png``: Individual phase runtime charts (setup, run, etc.)
- ``bottleneck_fraction_*.png``: Bottleneck fraction scaling analysis

Passing the ``--nb`` flag additionally generates an interactive Jupyter
notebook, ``analysis.ipynb``, in the output directory, containing the same
default plots and summary dataframe.
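
For example, a manual re-analysis might look like this (the results-directory
argument is an assumption; only the ``--nb`` flag is documented above)::

    (vivarium_profiling) :~$ summarize <BENCHMARK_RESULTS_DIRECTORY> --nb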

Customizing Result Extraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, the benchmarking tools extract standard profiling metrics:

- Simulation phases: setup, initialize_simulants, run, finalize, report
- Common bottlenecks: gather_results, pipeline calls, population views
- Memory usage and total runtime

You can customize which metrics to extract by creating an extraction config
YAML file. See ``extraction_config_example.yaml`` for a complete annotated
example.

**Basic Pattern Structure**::

    patterns:
      - name: my_function            # Logical name for the metric
        filename: my_module.py       # Source file containing the function
        function_name: my_function   # Function name to match
        extract_cumtime: true        # Extract cumulative time (default: true)
        extract_percall: false       # Extract time per call (default: false)
        extract_ncalls: false        # Extract number of calls (default: false)
In turn, this yaml can be passed to the ``run_benchmark`` and ``summarize`` commands
238+
using the ``--extraction_config`` flag. ``summarize`` will automatically create runtime
239+
analysis plots for the specified functions.
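
For example (the file names here are placeholders; only the
``--extraction_config`` flag itself comes from the documentation above)::

    (vivarium_profiling) :~$ run_benchmark model_spec_baseline.yaml --extraction_config extraction_config_example.yaml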
