ENH: PyDarshan Report changes to enable name filtering #1017

shanedsnyder · 2025-03-11T20:22:12Z

Enable filtering of PyDarshan report record data to exclude/include record names matching some given filter patterns.

This functionality is integrated directly into DarshanReport objects: the relevant routines (e.g., the constructor, open(), read_all(), and the other log read routines) all take filter_patterns and filter_mode arguments. filter_patterns is a list of Python regex strings to match record names against. filter_mode is either "exclude" (don't load any records whose names match a pattern in filter_patterns) or "include" (only load records whose names match a pattern in filter_patterns).
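To make the exclude/include semantics concrete, here is a minimal standalone sketch of the filtering rule described above. It is not the PyDarshan implementation: filter_names is a hypothetical helper, and the use of re.search (rather than re.match) is an assumption, chosen because a pattern like \.txt$ needs to match at the end of a name.

```python
import re


def filter_names(names, filter_patterns, filter_mode="exclude"):
    """Return the record names that survive filtering.

    Hypothetical helper for illustration only; it mirrors the
    filter_patterns / filter_mode semantics described in this PR.
    """
    if filter_mode not in ("exclude", "include"):
        raise ValueError(f"unknown filter_mode: {filter_mode!r}")
    compiled = [re.compile(pattern) for pattern in filter_patterns]
    kept = []
    for name in names:
        # Assumption: a name "matches" if any pattern is found anywhere in it.
        matched = any(rx.search(name) for rx in compiled)
        # "include" keeps matches; "exclude" keeps non-matches.
        if (filter_mode == "include") == matched:
            kept.append(name)
    return kept


names = ["/file_dir/a.dat", "/home/user/notes.txt", "/tmp/scratch.bin"]
patterns = [r"^/file_dir/", r"\.txt$"]
included = filter_names(names, patterns, "include")
excluded = filter_names(names, patterns, "exclude")
```

With these inputs, "include" keeps the first two names and "exclude" keeps only the third.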

This functionality is exposed to the PyDarshan report summary tool via the --exclude_names and --include_names command line arguments. Only one of the two options may be used in a given invocation, but that option can be repeated. In either case, the strings supplied to the option form the list of regexes that record names are filtered (excluded or included) against. For example, the following could be used to generate a summary report containing only file records whose names start with /file_dir/ or end in .txt:

python -m darshan summary --include_names="^/file_dir/" --include_names="\.txt$" logfile.darshan
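As a rough illustration of how two repeatable, mutually exclusive options could be turned into a pattern list and a mode, here is an argparse sketch. It is an assumption about one reasonable design, not the actual summary tool code: build_parser and filter_args are hypothetical names, and treating "no option given" as an empty exclude list is also an assumption.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the summary tool's argument parser.
    parser = argparse.ArgumentParser(prog="summary-sketch")
    parser.add_argument("log_path")
    # Only one of the two options may appear in a single invocation.
    group = parser.add_mutually_exclusive_group()
    # action="append" lets the same option be repeated to build a list.
    group.add_argument("--exclude_names", action="append", default=None)
    group.add_argument("--include_names", action="append", default=None)
    return parser


def filter_args(args):
    """Map parsed options to a (filter_patterns, filter_mode) pair."""
    if args.include_names:
        return args.include_names, "include"
    if args.exclude_names:
        return args.exclude_names, "exclude"
    # Assumption: no option means nothing is excluded.
    return [], "exclude"


args = build_parser().parse_args(
    ["--include_names=^/file_dir/", r"--include_names=\.txt$", "logfile.darshan"]
)
patterns, mode = filter_args(args)
```

Passing both --exclude_names and --include_names to this sketch makes argparse exit with a usage error, which matches the "only one of these options" rule.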

Special logic was integrated into the job summary tool and lower-level aggregation/plotting routines to properly handle cases when all records for a given module have been filtered out (i.e., the Darshan log metadata indicates a module has data, but filtering has resulted in no records being stored in memory for the module).
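The situation described above can be pictured with a small sketch that separates modules that still have records in memory from modules that the log metadata lists but whose records were all filtered out. The data structures here are plain stand-ins (a list of module names and a dict of record lists), and split_modules is a hypothetical helper, not a routine from this PR.

```python
def split_modules(modules_in_metadata, records):
    """Partition modules by whether any records survived filtering.

    modules_in_metadata: module names the log metadata says have data.
    records: dict mapping module name to the list of loaded records.
    Both are illustrative stand-ins, not PyDarshan objects.
    """
    usable, filtered_out = [], []
    for mod in modules_in_metadata:
        if len(records.get(mod, [])) > 0:
            usable.append(mod)
        else:
            # The metadata says the module has data, but nothing was loaded.
            filtered_out.append(mod)
    return usable, filtered_out


modules_in_metadata = ["POSIX", "MPI-IO", "STDIO"]
records = {"POSIX": ["rec0", "rec1"], "MPI-IO": [], "STDIO": ["rec2"]}
usable, filtered_out = split_modules(modules_in_metadata, records)
```

A report generator built this way would produce sections only for the usable modules and skip, or note, the ones in filtered_out instead of failing on empty data.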

Tests are included for DarshanReport objects and the lower-level plotting routines.

github-actions bot added the pydarshan label Mar 11, 2025

Shane Snyder added 3 commits April 25, 2025 14:56

darshan report changes to enable name filtering

70d7287

updates job summary to handle name exclude/include

6139dec

keep helpful heatmap warning messages

a4aef23

shanedsnyder force-pushed the snyder/pydarshan-name-filters branch from 388fdeb to a4aef23 Compare April 25, 2025 20:20

Shane Snyder added 2 commits April 25, 2025 16:03

fix some typing hints

9c7276a

more tweaks to make sure warnings are printed

08a49b0

shanedsnyder changed the title ~~WIP: PyDarshan Report changes to enable name filtering~~ ENH: PyDarshan Report changes to enable name filtering Apr 26, 2025

testing additions

a9f1986

shanedsnyder added this to the 3.4.7 milestone Apr 26, 2025

shanedsnyder mentioned this pull request Apr 29, 2025

ENH: new PyDarshan CLI tools for job/file stats for many logs #1016

Merged

carns approved these changes Apr 30, 2025

shanedsnyder merged commit 4147a4a into main Apr 30, 2025
19 checks passed
