-
Notifications
You must be signed in to change notification settings - Fork 473
A new X-MeshGraphNet example for reservoir simulation. #1186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds a new reservoir simulation example to PhysicsNeMo using X-MeshGraphNet (XMGN), located in examples/reservoir_simulation/. The implementation provides a complete workflow for training graph neural networks on subsurface flow data from ECLIPSE/OPM/IX simulators. The example includes: (1) sim_utils/ package for reading binary simulation files and constructing graph representations from reservoir grids, (2) preprocessing pipeline that converts raw simulator outputs to partitioned PyTorch Geometric graphs with train/val/test splits, (3) training infrastructure with distributed support, MLflow tracking, and early stopping, and (4) autoregressive inference that predicts multiple timesteps and exports results in both HDF5 and GRDECL formats. The code integrates with PhysicsNeMo's distributed training patterns and checkpoint management while adding domain-specific features like transmissibility-based edge weights, well completion handling, and nonlinear scaling for physical variables. The architecture mirrors existing PhysicsNeMo examples (Hydra configuration, modular utilities) while addressing reservoir simulation requirements such as handling inactive cells, temporal sequences, and industry-standard file formats.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
examples/reservoir_simulation/xmgn/src/train.py |
2/5 | Adds complete training pipeline but has critical issues: MLflow run never closed, gradient accumulation logic inside partition loop instead of outside, and potential race conditions in distributed trash cleanup |
examples/reservoir_simulation/xmgn/requirements.txt |
2/5 | Adds dependencies but contains version mismatch between torch (>=2.4.0) and PyG wheel instructions (torch-2.9.0), plus unused torchvision/torchaudio dependencies |
examples/reservoir_simulation/xmgn/src/data/__init__.py |
2/5 | Creates data package API but uses absolute import from sim_utils import ... that will fail since sim_utils is a sibling directory, should use relative import from ...sim_utils import ... |
examples/reservoir_simulation/xmgn/src/data/dataloader.py |
3/5 | Implements graph dataset and partitioning but missing null checks for edge_attr indexing and empty tensor validation before shape access could cause runtime crashes |
examples/reservoir_simulation/sim_utils/grid.py |
3/5 | Adds Grid class for reservoir connectivity but remapping logic uses slow Python loops, creates invalid (0,0) edges in edge cases, and has debug prints that should be converted to logging |
examples/reservoir_simulation/xmgn/src/preprocessor.py |
3.5/5 | Implements METIS-based graph partitioning with fallback, but swallows exceptions during partition organization and shadows loop variable i in nested loops |
examples/reservoir_simulation/xmgn/src/utils/mlflow_utils.py |
3/5 | Provides MLflow logging utilities but contains bare except clause catching system exits, unsafe attribute access leaving dataset_name unbound, and uses deprecated datetime.utcnow() |
examples/reservoir_simulation/sim_utils/ecl_reader.py |
4/5 | Adds ECLIPSE binary file parser with extensive functionality but loads entire files into memory and uses bare except clauses that could mask critical errors |
examples/reservoir_simulation/sim_utils/well.py |
4/5 | Introduces well/completion data structures with minor documentation inconsistencies (missing stat parameter, references nonexistent set_out_flow_rate method) |
examples/reservoir_simulation/xmgn/src/data/graph_builder.py |
4/5 | Comprehensive graph builder converting simulator outputs to PyTorch Geometric format with extensive validation, minor unused imports and potentially strict 20% failure threshold |
examples/reservoir_simulation/xmgn/src/inference.py |
4/5 | Autoregressive inference script with checkpoint loading and dual output formats (HDF5/GRDECL), contains bare except clause catching all exceptions including keyboard interrupts |
examples/reservoir_simulation/xmgn/src/utils/path_utils.py |
4/5 | Path management utilities with inconsistent validation - checks runspec.job_name but not dataset.sim_dir before access which could raise AttributeError |
examples/reservoir_simulation/xmgn/conf/config.yaml |
4/5 | Main Hydra configuration for XMGN with comprehensive settings, minor trailing whitespace issue and should validate array length constraints at runtime |
examples/reservoir_simulation/xmgn/conf/config_norne.yaml |
4/5 | Norne dataset-specific configuration with relative paths assuming specific directory structure and skip_graphs flag requiring pre-existing files |
examples/reservoir_simulation/xmgn/README.md |
4/5 | Documents complete workflow from data prep to inference with minor config path discrepancy (conf/config.yaml vs config/config.yaml) and incomplete Option 2 section |
examples/reservoir_simulation/sim_utils/__init__.py |
4/5 | Creates sim_utils package API with clean exports, assumes existence of sibling modules (ecl_reader, grid, well) |
examples/reservoir_simulation/sim_utils/README.md |
5/5 | Well-structured documentation for sim_utils package with practical code examples and clear usage patterns |
CHANGELOG.md |
5/5 | Adds single entry documenting new reservoir simulation example in correct format and location |
examples/reservoir_simulation/xmgn/.gitignore |
5/5 | Standard MLflow mlruns/ exclusion, correct and necessary |
examples/reservoir_simulation/xmgn/src/__init__.py |
5/5 | Empty package marker with proper license header, standard Python packaging |
examples/reservoir_simulation/xmgn/src/utils/__init__.py |
5/5 | Empty utils package initialization with license header and empty __all__ list |
Confidence score: 2/5
- This PR requires careful review due to multiple critical issues in core training and data loading logic that could cause runtime failures or incorrect behavior
- Score reflects critical bugs in train.py gradient accumulation, import path errors in data/init.py, version mismatches in requirements.txt, missing null checks in dataloader, and bare except clauses that could mask errors; additionally concerns about memory usage patterns (loading entire files), potential race conditions in distributed cleanup, and deprecated API usage
- Pay close attention to
examples/reservoir_simulation/xmgn/src/train.py(gradient accumulation logic, MLflow lifecycle),examples/reservoir_simulation/xmgn/src/data/__init__.py(import paths),examples/reservoir_simulation/xmgn/requirements.txt(version alignment),examples/reservoir_simulation/xmgn/src/data/dataloader.py(null safety), andexamples/reservoir_simulation/xmgn/src/utils/mlflow_utils.py(exception handling)
Sequence Diagram
sequenceDiagram
participant User
participant Preprocessor
participant EclReader
participant Grid
participant GraphBuilder
participant Trainer
participant Model
participant Inference
User->>Preprocessor: python src/preprocessor.py
Preprocessor->>Preprocessor: validate_config()
Preprocessor->>Preprocessor: check_existing_data()
Preprocessor->>GraphBuilder: ReservoirGraphBuilder(cfg)
GraphBuilder->>EclReader: find simulation files (.DATA/.AFI)
loop For each simulation case
GraphBuilder->>EclReader: read_init(static properties)
GraphBuilder->>EclReader: read_egrid(grid geometry)
GraphBuilder->>Grid: Grid(init_data, egrid_data)
Grid->>Grid: compute cell coordinates
Grid->>Grid: compute connections & transmissibilities
GraphBuilder->>EclReader: read_restart(dynamic properties)
GraphBuilder->>GraphBuilder: build_graph_from_simulation_data()
GraphBuilder->>GraphBuilder: apply_nonlinear_scaling()
GraphBuilder->>Preprocessor: return graph
end
Preprocessor->>Preprocessor: save_graphs()
Preprocessor->>Preprocessor: create_partitions_from_graphs()
Preprocessor->>Preprocessor: split_samples_by_case()
Preprocessor->>Preprocessor: organize_partitions_by_split()
Preprocessor->>Preprocessor: compute_global_statistics()
Preprocessor->>Preprocessor: save_dataset_metadata()
User->>Trainer: torchrun src/train.py
Trainer->>Trainer: InitializeLoggers(cfg)
Trainer->>Trainer: initialize_dataloaders()
Trainer->>Trainer: initialize_model(MeshGraphNet)
Trainer->>Trainer: initialize_optimizer(AdamW)
loop For each epoch
loop For each batch
Trainer->>Model: forward(x, edge_attr, graph)
Model->>Trainer: predictions
Trainer->>Trainer: compute_weighted_loss()
Trainer->>Trainer: backward()
Trainer->>Trainer: optimizer.step()
end
alt Validation epoch
Trainer->>Trainer: validate_epoch()
Trainer->>Trainer: denormalize_predictions()
Trainer->>Trainer: compute_per_variable_losses()
Trainer->>Trainer: check_early_stopping()
Trainer->>Trainer: save_checkpoint()
end
end
User->>Inference: python src/inference.py
Inference->>Inference: load best checkpoint
Inference->>Inference: load test dataset
loop For each test case
loop Autoregressive timesteps
Inference->>Model: forward(x, edge_attr, graph)
Model->>Inference: predictions
Inference->>Inference: update_node_features_with_predictions()
Inference->>Inference: denormalize_predictions()
end
end
Inference->>Inference: save_case_results_hdf5()
Inference->>Inference: run_post()
Inference->>Inference: generate GRDECL files
Inference->>User: inference results & visualizations
21 files reviewed, 39 comments
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The current diff contains documentation-only fixes to the well.py file within the reservoir simulation utilities. Three docstring errors were corrected: (1) removed a reference to a non-existent id attribute in the Completion class, (2) fixed a method name typo from set_out_flow_rate to the correct set_flow_rate, and (3) added the missing stat parameter to the Well.__init__ docstring. These changes align the API documentation with the actual implementation and address the three syntax issues flagged in the previous review (lines 33-42, 106-110).
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/sim_utils/well.py | 5/5 | Fixed three docstring inconsistencies: removed non-existent id reference, corrected method name typo, and added missing stat parameter documentation |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects documentation-only changes with no functional code modifications—these fixes correct previously flagged syntax issues in docstrings without touching any logic or introducing new code paths
- No files require special attention
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest update addresses two documentation and defensive programming issues identified in previous reviews. The first change corrects a path inconsistency in the README where the configuration file path was referenced as both config/config.yaml and conf/config.yaml in different locations—the fix standardizes all references to conf/config.yaml to match the actual project directory structure. The second change adds validation for the cfg.dataset.sim_dir configuration attribute in path_utils.py, mirroring the existing pattern for cfg.runspec.job_name validation. This prevents potential AttributeErrors when the configuration is incomplete, ensuring the utility function fails fast with a clear error message rather than propagating an obscure exception during path construction.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/README.md | 5/5 | Documentation fix correcting configuration file path from config/config.yaml to conf/config.yaml for internal consistency |
| examples/reservoir_simulation/xmgn/src/utils/path_utils.py | 5/5 | Added defensive validation check for cfg.dataset.sim_dir before access, preventing AttributeError on incomplete configuration |
Confidence score: 5/5
- This PR is safe to merge with minimal risk—both changes are minor defensive improvements with no functional side effects
- Score reflects that these are documentation and validation fixes addressing specific review feedback, with no new logic, API changes, or complex interactions that could introduce bugs
- No files require special attention—both changes are straightforward corrections following established patterns in the codebase
2 files reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review. The current change is a trivial formatting fix in examples/reservoir_simulation/xmgn/conf/config.yaml, removing trailing whitespace after the silu activation function specification on line 88. This addresses a linting violation previously flagged and ensures compliance with the repository's pre-commit hooks (which include ruff format and markdownlint). The change has zero functional impact—it simply removes invisible whitespace characters that violated the project's formatting standards.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/conf/config.yaml | 5/5 | Removed trailing whitespace after 'silu' on line 88 (pure formatting fix) |
Confidence score: 5/5
- This PR is safe to merge with minimal risk—it contains only a trivial whitespace removal
- Score reflects the non-functional nature of the change: removing invisible trailing spaces cannot introduce bugs or break existing functionality
- No files require special attention; this is a single-line formatting correction with no code logic changes
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest change updates the requirements.txt file to adjust PyTorch and PyTorch Geometric (PyG) extension versions and versioning strategy.
What changed: The torch version constraint was modified from an implicit dependency on torch 2.9.0+cu128 to an explicit minimum version torch>=2.4.0. The PyG extension URL comment was updated from torch 2.9.0+cu128 to torch 2.4.0+cu121 wheels, and minimum version constraints (>=) were added to torch-scatter, torch-sparse, torch-cluster, and torch-spline-conv packages. This change aligns the CUDA toolkit version from CUDA 12.8 to CUDA 12.1.
Why this matters: The developer is standardizing on torch 2.4.0+cu121 as the tested baseline, likely because torch 2.9.0 is not yet widely available or because PyG wheel compatibility is more reliable for 2.4.0. The change from exact pinning to minimum version constraints (>=) allows patch and minor updates while ensuring a tested baseline. However, this introduces a version coordination risk between the torch requirement and the PyG wheel URL.
How it fits: This requirements file defines the environment for the new X-MeshGraphNet reservoir simulation example (examples/reservoir_simulation/xmgn/). The dependency versions must align with the PhysicsNeMo core library's torch requirements (defined in the root pyproject.toml) and with available pre-built PyG wheels to ensure smooth installation.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/requirements.txt | 3/5 | Updated torch from 2.9.0+cu128 to >=2.4.0, changed PyG wheel URL to torch-2.4.0+cu121, and added minimum version constraints to PyG extensions |
Confidence score: 3/5
- This change reduces installation brittleness by moving from torch2.9.0 (potentially unavailable) to the more stable 2.4.0+cu121 baseline, but introduces version coordination risks between the torch constraint and PyG wheel URL.
- Score reflects a critical version mismatch issue: the torch requirement
torch>=2.4.0(line 5) permits any version ≥2.4.0, but the PyG install URL comment (line 17) points to wheels built specifically for torch 2.4.0+cu121. If users install torch 2.5+ or 2.9+, pip will fail to find matching PyG wheels and attempt error-prone source builds. Additionally, the>=constraints on PyG extensions lack upper bounds, risking future incompatibilities. - Pay close attention to
examples/reservoir_simulation/xmgn/requirements.txtlines 5 and 17-21: Consider pinning torch to==2.4.0or explicitly documenting that users must install torch 2.4.0 before running the PyG install command. The version mismatch between the flexible torch constraint and the fixed PyG wheel URL will cause installation failures in many environments.
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. Two files were modified: a minor exception handling improvement and a dependency version update. In examples/reservoir_simulation/xmgn/src/utils/mlflow_utils.py, a bare except: clause was changed to except Exception: on line 80, following Python best practices to avoid catching system-level signals like KeyboardInterrupt. This utility flattens Hydra configurations for MLflow logging in the X-MeshGraphNet reservoir simulation example. In examples/reservoir_simulation/xmgn/requirements.txt, minimum version constraints were added to four PyG extension packages (torch-scatter>=2.1.2, torch-sparse>=0.6.18, torch-cluster>=1.6.3, torch-spline-conv>=1.2.2) that previously had no version pins. These packages depend on PyG 2.6.0 and torch 2.4.0+cu121, and the version constraints improve reproducibility by preventing installation of older incompatible versions. Both changes are small incremental improvements to code quality and dependency management.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/src/utils/mlflow_utils.py | 5/5 | Changed bare except to except Exception: on line 80 to allow system signals to propagate |
| examples/reservoir_simulation/xmgn/requirements.txt | 4/5 | Added minimum version constraints to four PyG extension packages (torch-scatter, torch-sparse, torch-cluster, torch-spline-conv) |
Confidence score: 4/5
- This PR contains low-risk quality improvements with one potential installation issue that requires attention
- Score reflects that the exception handling fix is safe and correct, but the requirements.txt change creates a potential version mismatch between the hardcoded PyG wheel URL (torch-2.4.0+cu121.html) and the newly specified minimum versions which may not exist at that URL, potentially causing pip to fall back to source builds
- Pay close attention to examples/reservoir_simulation/xmgn/requirements.txt - verify that the minimum versions specified (torch-scatter>=2.1.2, torch-sparse>=0.6.18, torch-cluster>=1.6.3, torch-spline-conv>=1.2.2) are actually available at the PyG wheel repository URL mentioned in the installation comment, or update the comment to reflect the correct installation procedure
2 files reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The most recent update improves error handling in mlflow_utils.py by addressing variable scoping issues and deprecation warnings that were identified in previous reviews.
The changes pre-initialize job_name and dataset_name with fallback values before the try block, preventing UnboundLocalError exceptions when the exception handler needs to access these variables. Additionally, nested attribute access patterns are replaced with safer getattr() calls that provide defaults when config sections are missing. The deprecated datetime.utcnow() call is updated to datetime.now(timezone.utc) for Python 3.12+ compatibility. These defensive programming patterns align with PhysicsNeMo's approach of graceful degradation—the MLflow logging should capture as much metadata as possible without failing the entire training run if configuration fields are malformed or absent.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/src/utils/mlflow_utils.py | 5/5 | Pre-initializes fallback variables and replaces direct attribute access with safer getattr patterns to prevent UnboundLocalError and AttributeError exceptions; updates deprecated datetime API |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- All previous critical issues in this file have been addressed: variable scoping prevents
UnboundLocalError, defensive attribute access preventsAttributeError, and the deprecated datetime API is updated for forward compatibility - No files require special attention
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. Two minor fixes were applied based on previous code review feedback: one addresses a Python 3.12 deprecation in MLflow timestamp logging, and the other replaces a bare except: clause with a more specific exception handler in the inference script.
Changes:
examples/reservoir_simulation/xmgn/src/utils/mlflow_utils.py: Replaced deprecateddatetime.utcnow()withdatetime.now(timezone.utc)for timezone-aware timestamp generation in MLflow loggingexamples/reservoir_simulation/xmgn/src/inference.py: Changed bareexcept:on line 145 toexcept Exception:to prevent catching system-level exceptions likeKeyboardInterruptandSystemExit
Both changes follow Python best practices and align with patterns established elsewhere in the physicsnemo codebase. The MLflow fix future-proofs the code against Python 3.12+ where utcnow() will be removed, while the inference fix ensures proper exception handling without blocking graceful program termination.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/src/utils/mlflow_utils.py | 5/5 | Replaced deprecated datetime.utcnow() with timezone-aware datetime.now(timezone.utc) for MLflow timestamp logging |
| examples/reservoir_simulation/xmgn/src/inference.py | 5/5 | Fixed bare except: clause to except Exception: to allow system interrupts to propagate correctly |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects simple, well-understood deprecation and best-practice fixes with no functional impact or complex logic changes
- No files require special attention
2 files reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The most recent commit removes two unused return values from helper methods in examples/reservoir_simulation/xmgn/src/inference.py: the target_vars variable that was being discarded by all callers, and a redundant case_name extraction that was immediately re-computed at the call site. This is a straightforward code cleanup that simplifies tuple unpacking without changing behavior.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/src/inference.py | 5/5 | Removed unused return values (target_vars and case_name) from two private helper methods to eliminate dead code |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- The changes are pure refactoring—removing return values that were systematically ignored by all callers (evidenced by
_unpacking at lines 271 and 791), with no logic changes or new functionality - No files require special attention; this is straightforward dead-code elimination
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR.
A single targeted fix in the ECLIPSE binary file reader replaces a bare except: clause with explicit exception handling. The change on line 584 of ecl_reader.py now catches only struct.error and IndexError - the two specific exceptions that can occur when unpacking binary record headers during UNRST file parsing. This follows up on a previous review comment about exception-handling hygiene.
The _read_unrst method reads ECLIPSE reservoir simulation restart files by parsing binary records. Each record header is validated using struct.unpack to read a 4-byte integer. The new exception specification ensures that if malformed data or buffer boundary issues occur, the code silently continues pattern matching (as intended), but critical exceptions like KeyboardInterrupt and SystemExit will now propagate correctly. This is a standard practice for production code that processes potentially corrupted binary data streams.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/sim_utils/ecl_reader.py | 5/5 | Replaced bare except with specific struct.error and IndexError handling in binary file parser |
Confidence score: 5/5
- This is a safe, localized fix that improves exception handling without changing functionality
- The change is minimal (single line), well-scoped, and follows Python best practices for exception handling
- No files require special attention - this is a straightforward code quality improvement
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest change applies explicit log-level guards to all logging calls in the _read_unrst method of examples/reservoir_simulation/sim_utils/ecl_reader.py. Each logging.info() and logging.debug() statement is now wrapped in an if logging.getEffectiveLevel() <= logging.INFO/DEBUG: check to prevent string formatting overhead when the logger is set to WARNING or ERROR levels. This addresses a performance concern identified in earlier reviews for a data-loading hotpath that processes thousands of timesteps in ECLIPSE reservoir simulation restart files. While Python's logging module performs internal level checks, the explicit guards prevent f-string evaluation before the check, which can be meaningful when processing large datasets in production environments where verbose logging is disabled.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/sim_utils/ecl_reader.py | 4/5 | Added explicit log-level checks around every logging statement in _read_unrst to avoid string formatting overhead when verbose logging is disabled |
Confidence score: 4/5
- This PR is safe to merge with minimal risk; the changes are purely performance optimizations that do not alter functionality
- Score reflects that while the pattern introduces code duplication and reduces readability, it does not introduce bugs or breaking changes; the optimization is valid though the benefit may be marginal since Python's logging module already performs level checks internally
- Pay close attention to
examples/reservoir_simulation/sim_utils/ecl_reader.pyto verify the readability trade-off is acceptable for your team's coding standards, as the repeated guards make the method significantly more verbose
1 file reviewed, 2 comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest change is a minor documentation refinement to the README file for the X-MeshGraphNet reservoir simulation example. The update removes the word "wells" from the visualization section caption (since the images don't actually show wells) and makes the permeability label more specific by adding "(PERMX)" to clarify that the visualization shows permeability in the X-direction. This matches petroleum engineering conventions where permeability is directional (PERMX, PERMY, PERMZ) and improves consistency between the caption, the image filename, and the table cell description. The change is purely cosmetic and does not affect any code, configurations, or functionality.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/README.md | 5/5 | Documentation clarification removing misleading "wells" reference and adding "(PERMX)" to permeability label for precision |
Confidence score: 5/5
- This PR is safe to merge with minimal risk; it contains only documentation improvements with no functional changes
- Score reflects the purely cosmetic nature of the changes—only caption text was modified to improve accuracy and consistency
- No files require special attention; this is a straightforward documentation correction
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The current update addresses previously identified issues with minimal code changes across four files. The copyright year is updated to 2025 in three files (__init__.py, grid.py, train.py). The sim_utils package receives proper initialization with explicit imports, an __all__ declaration, version definition, and comprehensive documentation. A critical MLflow resource leak is fixed by removing redundant mlflow.start_run() call that conflicted with PhysicsNeMo's initialize_mlflow(). The fix_layernorm_compatibility() function is refactored to replace a lambda with a named function for better debuggability. These changes improve code maintainability and resolve integration issues with PhysicsNeMo's logging framework, while the bulk of the X-MeshGraphNet reservoir simulation example code remains unchanged from the previous review.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/sim_utils/init.py | 5/5 | Formalized package structure with explicit imports, __all__ declaration, version definition, and comprehensive documentation; copyright updated to 2025 |
| examples/reservoir_simulation/sim_utils/grid.py | 5/5 | Copyright year updated from2024 to 2025; no functional changes |
| examples/reservoir_simulation/xmgn/src/utils.py | 3/5 | Refactored lambda to named function _register_load_state_dict_pre_hook for improved debugging; potential method signature issue remains |
| examples/reservoir_simulation/xmgn/src/train.py | 5/5 | Fixed MLflow resource leak by removing redundant mlflow.start_run() call; copyright updated to2025 |
Confidence score: 4/5
- This PR is mostly safe to merge but requires careful review of one file due to a potential runtime issue
- Score reflects that three files are straightforward improvements (copyright updates, package initialization, resource leak fix) but
utils.pystill has a potentially problematic method signature in_register_load_state_dict_pre_hookthat may cause runtime errors when loading checkpoints. The function takes(self, hook)parameters but PyTorch'sregister_load_state_dict_pre_hooktypically expects different arguments. If checkpoint loading is not tested in this PR's scope, this could surface as a runtime error later. - Pay close attention to
examples/reservoir_simulation/xmgn/src/utils.pyline 38-44. Test checkpoint loading functionality to verify the compatibility shim works correctly, or confirm this is intentionally a no-op stub. The other three files are safe to merge without additional scrutiny.
4 files reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. This update is a documentation-only change that improves the README file for the X-MeshGraphNet reservoir simulation example. The changes add a comprehensive "Overview" section that provides domain context (explaining reservoir simulation, subsurface flow, and why X-MGN is suitable for this application), enhance the setup and configuration instructions with clearer guidance on config file usage across different stages (preprocessing, training, inference), and standardize placeholder naming conventions from <your_experiment_name> to <your-experiment-name> throughout command examples. These improvements help users understand the example's purpose and reduce confusion during setup, particularly around which configuration file to use at each stage of the workflow. The changes align with the repository's documentation standards and improve accessibility for users unfamiliar with reservoir simulation domains.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/README.md | 5/5 | Added overview section explaining reservoir simulation context, clarified config file usage across workflow stages, and standardized placeholder syntax from underscores to hyphens |
Confidence score: 5/5
- This PR is safe to merge with minimal risk as it contains only documentation improvements with no code or configuration changes
- Score reflects that all changes are purely informational and cannot introduce functional bugs, runtime errors, or behavioral changes to the codebase
- No files require special attention; the README improvements enhance clarity without any technical concerns
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. This update modifies the requirements.txt for the X-MeshGraphNet reservoir simulation example by consolidating PyG extension installation and pinning core dependencies to exact versions. The --find-links directive is now placed at the top of the file, streamlining the installation process by eliminating the need for a separate two-step pip install command with the -f flag. Core dependencies (torch, torchaudio, torchvision) are now pinned to exact versions (2.4.0/0.19.0) instead of using minimum version constraints (>=), and nvidia-physicsnemo is pinned to version 1.2.0. Installation instruction comments have been removed, and a note about scipy being auto-installed has been added. This change fits within the broader PhysicsNeMo examples structure (see /examples directory documentation), which provides domain-specific demonstrations with Hydra configuration and turnkey setup processes, aiming to reduce friction for users setting up the reservoir simulation example.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/requirements.txt | 3/5 | Consolidated PyG installation with --find-links directive and pinned all core dependencies to exact versions, removing flexible version constraints |
Confidence score: 3/5
- This PR is relatively safe to merge but has potential issues that should be addressed to improve flexibility and avoid future compatibility problems.
- Score reflects concerns about strict version pinning preventing security patches and potential version mismatch between the wheel repository URL (torch-2.4.0+cu121) and the declared torch version (==2.4.0), which could cause installation failures if the wheel repository expects exactly torch 2.4.0+cu121 but the requirements install torch 2.4.0 without the CUDA suffix.
- Pay close attention to
examples/reservoir_simulation/xmgn/requirements.txt- verify that the--find-linksURL (https://data.pyg.org/whl/torch-2.4.0+cu121.html) is compatible with the exact torch version (==2.4.0) specified, and consider whether strict pinning of all dependencies will cause issues with security patches and minor version updates.
1 file reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used in README?
docs/img/reservoir_simulation/xmgn/2D_Q5SP/sample_103_timestep_003_PRESSURE.png
Show resolved
Hide resolved
docs/img/reservoir_simulation/xmgn/2D_Q5SP/sample_103_timestep_003_SWAT.png
Show resolved
Hide resolved
| from .grid import Grid | ||
| from .well import Well, Completion | ||
|
|
||
| __all__ = ["EclReader", "Grid", "Well", "Completion"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify the code for all these utilities is written by NVIDIA. If the code has been taken from somewhere else, even if partially, we need to modify the license header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reused part of an open source code (my repository with Apache license: https://github.com/GEG-ETHZ/pyflowdiagnostics/tree/v0.1.0) and modified a bit. Do we need to add something to the license header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in that case, please include the original license header first, followed by the NVIDIA license header. Add #ignore_header_test at the beginning of the file to skip the ci header test.
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds a comprehensive X-MeshGraphNet example for reservoir simulation surrogate modeling. The implementation includes Eclipse/IX binary file readers (sim_utils), parallel graph construction from simulation data, METIS-based graph partitioning for distributed training, and complete training/inference pipelines with MLflow tracking.
Key additions:
sim_utilspackage for reading Eclipse binary formats (INIT, EGRID, UNRST) with proper endianness handling- Parallel graph builder processing multiple simulation cases using multiprocessing with progress tracking
- Preprocessor orchestrating graph generation, partitioning, data splits, and global statistics
- Training pipeline with distributed support, gradient checkpointing, and early stopping
- Inference with autoregressive rollout and GRDECL export for visualization
- Comprehensive documentation with examples for 2D waterflood and Norne field datasets
The code demonstrates good software engineering practices with proper error handling, configuration validation, and memory-efficient processing. Previous review comments have identified several style and documentation issues that should be addressed, but no critical blocking issues were found in this review round.
Confidence Score: 4/5
- This PR is safe to merge with minor style improvements recommended
- The implementation is well-structured with comprehensive error handling, proper validation, and good documentation. Previous review rounds identified numerous style issues (bare excepts, logging inconsistencies, unused imports) and some logic concerns (import paths, potential race conditions, monkey-patching) that should be addressed but are not critical blockers. The core functionality appears sound with appropriate use of multiprocessing, memory management, and distributed training patterns. No new critical issues were identified in this review.
- Previous comments highlight style improvements needed in
train.py,inference.py,utils.py,grid.py, andwell.py- primarily bare except clauses, logging consistency, and documentation fixes
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/src/data/graph_builder.py | 4/5 | New file implementing parallel graph construction from reservoir simulation data with multiprocessing, proper error handling, and efficient memory usage through immediate file saves |
| examples/reservoir_simulation/xmgn/src/preprocessor.py | 4/5 | New preprocessor orchestrating graph generation, partitioning, train/val/test splits, and global statistics computation with proper validation and user interaction |
| examples/reservoir_simulation/sim_utils/ecl_reader.py | 4/5 | New Eclipse binary file reader supporting INIT, EGRID, UNRST formats with endianness detection and comprehensive error handling |
Sequence Diagram
sequenceDiagram
participant User
participant Preprocessor
participant GraphBuilder
participant EclReader
participant Train
participant Inference
User->>Preprocessor: Run preprocessor.py
Preprocessor->>Preprocessor: Validate config
Preprocessor->>GraphBuilder: Create raw graphs
loop For each simulation case
GraphBuilder->>EclReader: Read INIT/EGRID/UNRST files
EclReader-->>GraphBuilder: Grid data + dynamic vars
GraphBuilder->>GraphBuilder: Build PyG graph per timestep
GraphBuilder->>GraphBuilder: Save graph to disk
end
GraphBuilder-->>Preprocessor: List of graph files
Preprocessor->>Preprocessor: Partition graphs (METIS)
Preprocessor->>Preprocessor: Split train/val/test
Preprocessor->>Preprocessor: Compute global statistics
Preprocessor-->>User: Preprocessing complete
User->>Train: Run train.py (distributed)
Train->>Train: Load partitioned graphs
Train->>Train: Train X-MGN model
Train->>Train: Validate & checkpoint
Train-->>User: Best model saved
User->>Inference: Run inference.py
Inference->>Inference: Load best checkpoint
loop Autoregressive rollout
Inference->>Inference: Predict next timestep
Inference->>Inference: Use prediction as input
end
Inference->>Inference: Export HDF5 + GRDECL
Inference-->>User: Results for visualization
6 files reviewed, no comments
Signed-off-by: Tsubasa Onishi <[email protected]>
Signed-off-by: Tsubasa Onishi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds a comprehensive new example implementing X-MeshGraphNet (X-MGN) for reservoir simulation surrogate modeling, targeting the petroleum engineering domain. The implementation includes ~5000 lines of new code across data preprocessing, training, inference, and visualization components.
Key Components Added:
sim_utilsmodule: Binary ECLIPSE/IX file readers (ecl_reader.py,grid.py) for industry-standard reservoir simulation data formats- Data pipeline: Graph construction from reservoir grids (
graph_builder.py), partitioning for distributed training, and PyTorch Geometric dataloaders - Training infrastructure: Multi-GPU training with MLflow tracking, gradient scaling, and early stopping
- Inference pipeline: Autoregressive rollout with HDF5 and GRDECL output formats for visualization in industry tools (Petrel, ResInsight)
- Documentation: Comprehensive README with two example datasets (2D five-spot, Norne Field)
Implementation Quality:
The code demonstrates solid engineering practices with proper licensing headers, extensive documentation, and integration with PhysicsNeMo's distributed training infrastructure. The autoregressive inference implementation correctly handles temporal dependencies and partition boundaries (halo nodes).
Previous Review Coverage:
This PR has already undergone extensive review with 70+ comments addressing style issues (unused imports, logging inconsistencies), potential logical issues (gradient accumulation timing, edge index remapping), and security concerns (weights_only=False in torch.load, bare except clauses). Most issues are non-critical style/best-practice suggestions.
Critical Issues Previously Identified:
- Grid edge remapping logic complexity in
sim_utils/grid.py(lines 239-247) - Gradient scaler placement inside partition loop in
train.py(line 674) - Import path inconsistency (
sim_utilsabsolute vs relative imports) torch.loadwithweights_only=Falseposes security risk if loading untrusted partition files (inference.py:713)
Confidence Score: 3/5
- This PR introduces substantial new functionality with some logical concerns that should be addressed, but no critical blocking issues for an example/research codebase
- Score of 3 reflects a balance between the PR's strengths and concerns. Positive factors: comprehensive documentation, proper integration with PhysicsNeMo infrastructure, CHANGELOG updated, and thorough examples. Concerns that lower confidence: (1) complex grid remapping logic with potential indexing bugs that could cause incorrect results, (2) gradient accumulation timing issue that may affect training convergence, (3) security risk from
weights_only=Falsein torch.load, (4) numerous style issues suggesting the code hasn't been fully production-hardened. Since this is an example/research code rather than core library functionality, these issues are manageable but should be addressed before wider adoption examples/reservoir_simulation/sim_utils/grid.py(edge remapping logic),examples/reservoir_simulation/xmgn/src/train.py(gradient scaler placement), andexamples/reservoir_simulation/xmgn/src/inference.py(torch.load security)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/xmgn/src/inference.py | 4/5 | Autoregressive inference implementation with checkpoint loading, HDF5 output, and GRDECL generation - uses weights_only=False in torch.load on line 713 |
| examples/reservoir_simulation/sim_utils/ecl_reader.py | 4/5 | ECLIPSE binary file reader supporting INIT, EGRID, UNRST formats - extensive functionality with some logging inconsistencies |
| examples/reservoir_simulation/sim_utils/grid.py | 3/5 | Grid processing with corner-point geometry and edge mapping - contains complex remapping logic with potential indexing issues |
| examples/reservoir_simulation/xmgn/src/train.py | 3/5 | Training script with distributed support, MLflow tracking, and early stopping - has gradient accumulation and scaler timing issues |
| examples/reservoir_simulation/xmgn/src/preprocessor.py | 4/5 | Data preprocessing pipeline for graph generation, partitioning, and train/val/test splitting |
| examples/reservoir_simulation/xmgn/conf/config_norne.yaml | 4/5 | Configuration for Norne Field dataset - contains relative path that may break with Hydra's chdir setting |
Sequence Diagram
sequenceDiagram
participant User
participant Preprocessor
participant EclReader
participant Grid
participant GraphBuilder
participant Partitioner
participant Trainer
participant Model as MeshGraphNet
participant Inference
participant PostProcessor
Note over User,PostProcessor: Phase 1: Data Preprocessing
User->>Preprocessor: python preprocessor.py --config-name=config
Preprocessor->>EclReader: read ECLIPSE binary files (.INIT, .EGRID, .UNRST)
EclReader-->>Preprocessor: grid data, properties, timesteps
Preprocessor->>Grid: create grid structure
Grid-->>Preprocessor: corner-point geometry, connections, active cells
Preprocessor->>GraphBuilder: build_graphs(cases, timesteps)
loop For each case and timestep
GraphBuilder->>GraphBuilder: create nodes (active cells) and edges (connections)
GraphBuilder->>GraphBuilder: extract features (PERMX, PORO, PRESSURE, SWAT)
GraphBuilder-->>Preprocessor: graph with node/edge features
end
Preprocessor->>Partitioner: partition graphs using METIS
Partitioner-->>Preprocessor: partitioned graphs with halo nodes
Preprocessor->>Preprocessor: split into train/val/test sets
Preprocessor-->>User: processed graphs saved to dataset_dir
Note over User,PostProcessor: Phase 2: Training
User->>Trainer: torchrun train.py --config-name=config
Trainer->>Trainer: initialize distributed training
Trainer->>Trainer: load partitioned graphs
Trainer->>Model: initialize MeshGraphNet
loop For each epoch
loop For each batch
loop For each partition
Trainer->>Model: forward(x, edge_attr, partition)
Model-->>Trainer: predictions
Trainer->>Trainer: calculate MSE loss on inner nodes
Trainer->>Trainer: backward and accumulate gradients
end
Trainer->>Trainer: optimizer step
end
Trainer->>Trainer: validation on val set
Trainer->>Trainer: check early stopping
end
Trainer-->>User: best checkpoint saved
Note over User,PostProcessor: Phase 3: Inference
User->>Inference: python inference.py --config-name=config
Inference->>Inference: load best checkpoint
Inference->>Inference: load test dataset
loop For each case (autoregressive)
loop For each timestep
alt timestep < prev_timesteps + 1
Inference->>Model: forward with true features
else
Inference->>Inference: update features with previous predictions
Inference->>Model: forward with predicted features
end
Model-->>Inference: predictions
Inference->>Inference: denormalize and store results
end
end
Inference->>PostProcessor: save HDF5 files (case_name.hdf5)
Inference->>PostProcessor: generate GRDECL files
PostProcessor->>Grid: get coordinates and actnum
PostProcessor->>PostProcessor: map active cells to full grid
PostProcessor-->>User: HDF5 and GRDECL files for visualization
4 files reviewed, no comments
| Save wells_data (list of lists of dicts of Well objects) to a pickle file. | ||
| """ | ||
| with open(self._output_path_well, "wb") as f: | ||
| pickle.dump(wells_data, f, protocol=pickle.HIGHEST_PROTOCOL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, pickle has a security issue and it is recommended not to use it. Would it be possible to switch to json?
| - **HDF5 files**: Contain predictions and targets for each simulation case, | ||
| organized by timestep and variable | ||
| - **GRDECL files**: Eclipse-compatible ASCII format that can be imported into | ||
| popular software such as Petrel and [ResInsight](https://resinsight.org/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the only way to visualize the results is to use these softwares, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. I chose this approach because: 1) It aligns with standard practices in the oil and gas industry. I am sure reservoir engineers will like it. 2) These visualization tools offer interactive features and the ability to display wells, faults, etc. (ResInsight is open source, similar to ParaView).
I’m happy to implement an alternative approach if you think it would make more sense.
PhysicsNeMo Pull Request
Description
Checklist
Dependencies