Reduce peak memory in PostProcess.py PDF weight extraction #334

Open
dprim7 wants to merge 4 commits into LPC-HH:main from dprim7:reduce-peak-memory

Conversation

dprim7 (Collaborator) commented on Apr 7, 2026

Problem

load_process_run3_samples() consumes 17+ GB of transient RSS while extracting PDF weight columns for signal samples, contributing to OOM crashes during template production.

Cause

The PDF weights are pulled out one column at a time:

{f"pdf_weights_{i}": events_dict["pdf_weights"][i].to_numpy() for i in range(n_pdf_weights)}

For a 232K-event sample with 101 PDF columns, the per-column `df[i].to_numpy()` calls pile up enough pandas intermediates to push RSS to 19 GB before the GC catches up. The actual data is only ~188 MB.

Fix

Materialize the DataFrame once, then slice columns out of the numpy array:

```python
pdf_np = events_dict["pdf_weights"].to_numpy()
{f"pdf_weights_{i}": pdf_np[:, i].copy() for i in range(n_pdf_weights)}
```

The same pattern is applied to `scale_weights`. +4 / -7 lines.
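
For illustration, here is a minimal, self-contained sketch of the two patterns on synthetic data (the sizes match the benchmark; the DataFrame construction is a stand-in, not the repo's loader code):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for events_dict["pdf_weights"]: 232K events x 101 columns (~188 MB).
n_events, n_pdf_weights = 232_000, 101
pdf_df = pd.DataFrame(np.random.default_rng(0).standard_normal((n_events, n_pdf_weights)))

# Before: one pandas column access + conversion per weight index.
# Each pdf_df[i].to_numpy() goes through a Series intermediate, and those
# intermediates accumulate faster than the GC reclaims them.
before = {f"pdf_weights_{i}": pdf_df[i].to_numpy() for i in range(n_pdf_weights)}

# After: materialize the frame once, then slice the 2-D ndarray.
# .copy() detaches each column from the parent array so each can be
# freed independently once pdf_np goes out of scope.
pdf_np = pdf_df.to_numpy()
after = {f"pdf_weights_{i}": pdf_np[:, i].copy() for i in range(n_pdf_weights)}

# Both spellings produce the same values.
assert all(np.array_equal(before[k], after[k]) for k in before)
```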

Benchmarks

Isolated, single sample (vbfhh4b-k2v0, 232K events, 101 PDF columns):

|        | Peak RSS |
| ------ | -------- |
| before | 19.0 GB  |
| after  | 2.2 GB   |

Full pipeline (PostProcess.py --years 2022, 20 samples, glopart-v2):

|        | Peak RSS |
| ------ | -------- |
| before | 23.0 GB  |
| after  | 19.3 GB  |

The full-pipeline delta is smaller because the accumulated `events_dict_postprocess` baseline (~11 GB after the data sample) dominates. The fix removes the single largest transient spike, which is what trips the OOM on the 2024 dataset.

Correctness

All 90/90 template histograms are bit-identical between the before and after runs (checked with np.allclose, equal_nan=True).
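
For reference, a sketch of this kind of check, assuming the two runs dump their templates as pickled dicts of numpy arrays (the file names and layout here are hypothetical, not the repo's actual output format):

```python
import pickle
import numpy as np

# Hypothetical paths: {name: np.ndarray} template dicts saved by the
# before/after pipeline runs.
with open("templates_before.pkl", "rb") as f:
    before = pickle.load(f)
with open("templates_after.pkl", "rb") as f:
    after = pickle.load(f)

assert before.keys() == after.keys()
for name in before:
    # equal_nan=True so NaN bins compare equal, as in the check above.
    assert np.allclose(before[name], after[name], equal_nan=True), name
print(f"{len(before)}/{len(before)} histograms match")
```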

Things ruled out along the way

- glibc fragmentation — jemalloc and `MALLOC_TRIM_THRESHOLD_=0` both gave 22-23 GB
- `gc.collect()` + `malloc_trim(0)` between samples — the spike is intra-sample
- `del events_dict` after last use — the peak occurs during `more_vars` construction, while `events_dict` is still live

How to reproduce

```bash
PYTHONPATH=src /usr/bin/time -v micromamba run -n hh4b python bench_spike_isolate.py
PYTHONPATH=src /usr/bin/time -v micromamba run -n hh4b python bench_fix_validation.py
```

Compare `Maximum resident set size` between the two runs.
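
The same peak-RSS number can also be read in-process on Linux via the standard library (a sketch, not part of this PR; note ru_maxrss is in kilobytes on Linux but bytes on macOS):

```python
import resource

def peak_rss_gb() -> float:
    """Peak RSS of the current process in GB (Linux: ru_maxrss is in KB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2

# e.g. print at the end of either benchmark script:
print(f"peak RSS: {peak_rss_gb():.1f} GB")
```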

dprim7 and others added 4 commits April 2, 2026 17:06
…t contract

- calculate_trigger_weights: data returns ones, MC/signal apply SF with uncertainty
- calculate_txbb_weights: per-jet SF multiplication, data/signal/single-jet paths
- discretize_var: default/custom bins, clipping, integer output
- Output contract: define required columns per sample type (data/signal/ttbar/bg),
  verify event list columns are a subset, cross-check consistency

These tests guard against regressions when refactoring load_process_run3_samples
for memory optimization (early events_dict deletion).
Calling per-column df[i].to_numpy() 101 times in a loop creates 17+ GB
of transient allocations due to pandas column-access overhead. Bulk
df.to_numpy() followed by column slicing avoids this entirely.

Isolated benchmark (vbfhh4b-k2v0, 232K events, 101 PDF columns):
  Before: 19.0 GB peak RSS
  After:   2.2 GB peak RSS (-88%)

Full pipeline (2022, 20 samples):
  Before: 23.0 GB peak RSS
  After:  19.3 GB peak RSS (-16%)

Template output verified bit-identical (90/90 histograms).