feat(build-dsp): add per-processor profiling report by gipert · Pull Request #200 · legend-exp/dspeed

gipert · 2026-05-31T14:53:33Z

Summary

ProcessorManager.execute() now measures wall time (perf_counter) and RSS delta (psutil) around each processor call per block, accumulating total wall time, per-block max RSS delta, and per-block average RSS delta
ProcessingChain gains get_profile() returning the full stats dict; get_timing() delegates to it (backward compatible)
build_dsp logs an INFO-level profiling table after each table is processed, showing the top 15 processors by wall time with wall time (s + %), max RSS delta, and average RSS delta; remaining processors are collapsed into a single summary line
psutil added as a package dependency

Example output:

=== DSP profiling report (34 processors, 10000 entries) ===
 processor                                            wall_s       %    Δrss_max    Δrss_avg
────────────────────────────────────────────────────────────────────────────────────────────────────
 fft_convolve_wf(wf_blsub[:round(…)], cusp_k…)        1.234   25.0%   +12.30 MB    +0.41 MB
 fft_convolve_wf(wf_blsub[:round(…)], zac_ke…)        1.107   22.4%   +12.10 MB    +0.39 MB
 cusp_filter(db.cusp.sigma/wf_blsub.period, …)        0.781   15.8%    +8.40 MB    +0.28 MB
 ...
 [19 more processors]                                  0.512   10.4%         ...         ...
────────────────────────────────────────────────────────────────────────────────────────────────────
 total                                                  4.941  100.0%   +41.00 MB    +1.34 MB

Test plan

All existing tests pass (tests/test_processing_chain.py, tests/test_build_dsp.py)
Visually verify report appears in INFO log output when running build_dsp on a real file

After processing each table, log an INFO-level report showing wall time and RSS delta (max and rolling average) per processor, sorted by wall time descending. Top 15 processors are shown explicitly; any remainder is collapsed into a single summary line. Mechanics: ProcessorManager.execute() now snapshots RSS via psutil before/after each block call and accumulates wall time with perf_counter. ProcessingChain gains get_profile() returning the full stats dict; get_timing() delegates to it for backward compatibility. psutil added as a package dependency.

gipert · 2026-05-31T14:58:39Z

this would be very useful in the dataflow

iguinn · 2026-06-01T01:40:45Z

Hey Luigi, agreed that we want this kind of monitoring, but I don't think this is an accurate picture of the memory usage for processors. This is because most of the memory is allocated during setup. The memory buffers are allocated when the processor is added (these are just numpy arrays so you could probably just get the sizes directly; they also shouldn't be that big in the end) and the numba processors (I've never found a good way to figure out how to attribute this unfortunately). This will capture any added memory that isn't happening in place within buffers (I guess this is probably why the ffts are on top, since fft uses an additional buffer).

edit: I just realized I was misinterpretting things and the delta isn't maximum memory usage of the processor while it is running, but memory that is allocated and not immediately deallocated...Which could imply that there is some sort of a leak, but I'm curious if there isn't clean-up happening after the delta is calculated? Otherwise, adding this around add_processor would be very interesting

Similarly, the timing won't capture everything (particular IO), but I think it is probably capturing what is important (and the built in profilers do awful with numba functions).

Replace per-block RSS measurement in ProcessorManager.execute() with a setup-time snapshot in build_processing_chain. The new measurement wraps both the init_args factory call (where filter kernels are allocated) and add_processor (where output buffers are created), giving a meaningful per-processor memory footprint. Per-execution RSS was misleading since buffers are pre-allocated and only FFT temporaries showed non-zero deltas. The profile report column is renamed to setup_MB accordingly.

gipert · 2026-06-01T07:31:24Z

@iguinn does it make more sense now?

gipert requested a review from iguinn May 31, 2026 14:58

gipert requested a review from ggmarshall May 31, 2026 14:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(build-dsp): add per-processor profiling report#200

feat(build-dsp): add per-processor profiling report#200
gipert wants to merge 2 commits into
mainfrom
feat/processing-chain-profiling

gipert commented May 31, 2026

Uh oh!

gipert commented May 31, 2026

Uh oh!

iguinn commented Jun 1, 2026 •

edited

Loading

Uh oh!

gipert commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gipert commented May 31, 2026

Summary

Test plan

Uh oh!

gipert commented May 31, 2026

Uh oh!

iguinn commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gipert commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

iguinn commented Jun 1, 2026 •

edited

Loading