Skip to content

Optimize GenerateXdmf#7296

Open
hen-w wants to merge 4 commits into
sxs-collaboration:developfrom
hen-w:xdmf
Open

Optimize GenerateXdmf#7296
hen-w wants to merge 4 commits into
sxs-collaboration:developfrom
hen-w:xdmf

Conversation

@hen-w

@hen-w hen-w commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Proposed changes

Various optimizations of GenerateXdmf.py. This results in 60 times speed up for large volume files.

Upgrade instructions

Code review checklist

  • The code is documented and the documentation renders correctly. Run
    make doc to generate the documentation locally into BUILD_DIR/docs/html.
    Then open index.html.
  • The code follows the stylistic and code quality guidelines listed in the
    code review guide.
  • The PR lists upgrade instructions and is labeled bugfix or
    new feature if appropriate.
  • If a coding agent is used, have one of
    "Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com",
    "Co-Authored-by: Codex noreply@openai.com", or
    "Co-Authored-By: GitHub Copilot CLI noreply@microsoft.com"
    as the last line of the commit, depending on the agent.

Further comments

@hen-w hen-w added the small Only changes a few lines of code, does a rename or is otherwise quick to review label Jun 12, 2026
@hen-w hen-w marked this pull request as draft June 12, 2026 20:30
@hen-w hen-w force-pushed the xdmf branch 3 times, most recently from 0767ef4 to 0a7a69c Compare June 12, 2026 21:03
@hen-w hen-w removed the small Only changes a few lines of code, does a rename or is otherwise quick to review label Jun 12, 2026
hen-w and others added 2 commits June 12, 2026 17:05
Instead of loading and walking the mixed-topology connectivity
array in Python, read the cell count from
len(observation["ElementId"]) which is free HDF5 metadata.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoid repeated HDF5 metadata lookups by caching the component list
and dtypes. The cache is invalidated when the dataset count changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cache grid metadata (num_points, number_of_cells, connectivity lengths)
in _ObservationCache. In fast mode, all HDF5 metadata is read only once
and reused for every timestep. This gives a large speedup on slow
filesystems when the grid and variables are static.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hen-w hen-w marked this pull request as ready for review June 14, 2026 19:12
Bug 1: the cell count read once from 'ElementId' counts the *main* connectivity's cells and was
then reused for the pole-filling Mixed topology, whose 'pole_connectivity'
has a different, time-varying cell count. The pole topology therefore
reported the main grid's NumberOfElements. Restore counting the pole
connectivity's own cells (per observation, since it can change between
timesteps); the main connectivity keeps the free len(ElementId) count.

Bug 2: the
component/dtype cache was invalidated only when the *number* of datasets
changed. If the set of observed fields changes while the count stays the
same, the stale field list was reused,
emitting an attribute that points at a non-existent dataset and dropping
the field that is actually present. Caching is now gated entirely on
--fast-mode: the default mode re-reads component names and dtypes on every
observation (always correct), while --fast-mode reads them once (the user
already asserts that the grid and variables are static).

Adds unit tests that trigger both bugs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@wthrowe wthrowe left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Please squash in some manner so that there are not broken commits in the middle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants