-
Notifications
You must be signed in to change notification settings - Fork 473
Remove zero-len arrays from test dataset #1198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove zero-len arrays from test dataset #1198
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Removed scalar fields air_density and stream_velocity from the synthetic_domino_data() function. These scalar values were causing issues when saved to zarr format as they were being treated as zero-length arrays.
The fix is correct because:
- The test suite already provides these values via the
keys_to_read_if_availableparameter (line 289-292) with hardcoded tensors[1.225, 10.0] CAEDatasethas a built-in mechanism to fill missing keys fromkeys_to_read_if_available(seecae_dataset.py:116-120)- The synthetic test data should only contain array data, not scalar parameters
- This aligns with the proper separation of concerns: test data files contain mesh/field arrays, while global parameters come from configuration
Confidence Score: 5/5
- This PR is safe to merge - it's a targeted bug fix that removes problematic scalar values from test data generation
- The change correctly addresses the zero-length array issue by removing scalar values that were improperly included in the synthetic test dataset. The existing
keys_to_read_if_availablefallback mechanism already handles these values properly, making the removal safe and correct. The fix is minimal, well-targeted, and doesn't affect production code or real data pipelines. - No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| test/datapipes/test_domino_datapipe.py | 5/5 | Removed scalar fields air_density and stream_velocity from synthetic test data generation - these values are already provided via keys_to_read_if_available fallback mechanism |
Sequence Diagram
sequenceDiagram
participant Test as Test Suite
participant SynthData as synthetic_domino_data()
participant Zarr as Zarr File
participant CAEDataset as CAEDataset
participant DataPipe as DoMINODataPipe
Note over Test,DataPipe: Before: Scalar values causing issues
Test->>SynthData: Generate test data
SynthData->>SynthData: Create out_dict with arrays
SynthData->>SynthData: Add air_density scalar (np.float32)
SynthData->>SynthData: Add stream_velocity scalar (np.float32)
SynthData->>Zarr: Save dict with scalars
Note over Zarr: Scalars treated as zero-length arrays
Test->>CAEDataset: Create dataset with keys_to_read_if_available
CAEDataset->>Zarr: Read data files
CAEDataset->>CAEDataset: Fill missing keys from keys_to_read_if_available
CAEDataset->>DataPipe: Provide data
Note over Test,DataPipe: After: Clean separation of concerns
Test->>SynthData: Generate test data
SynthData->>SynthData: Create out_dict with arrays only
Note over SynthData: Scalars removed from synthetic data
SynthData->>Zarr: Save dict (arrays only)
Test->>CAEDataset: Create dataset with keys_to_read_if_available
Note over CAEDataset: global_params from keys_to_read_if_available
CAEDataset->>Zarr: Read data files
CAEDataset->>CAEDataset: Fill global_params from fallback
CAEDataset->>DataPipe: Provide complete data
1 file reviewed, no comments
|
/blossom-ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Removes two unused scalar fields (air_density and stream_velocity) from the synthetic test dataset generation in test_domino_datapipe.py. These fields were never read by the DoMINO datapipe or used in any tests.
- The datapipe uses
keys_to_read_if_availablewith default values forglobal_params_valuesandglobal_params_reference(both[1.225, 10.0]) - The removed fields were only being written to test data files but never accessed
- This cleanup makes the test dataset more accurate by only including fields that are actually used
Confidence Score: 5/5
- This PR is safe to merge with no risk
- The change only removes unused test data fields that were never read by the datapipe or referenced in any tests. The removed fields (
air_densityandstream_velocity) were dead code that had no functional impact. All necessary data fields remain intact, and the datapipe uses default values fromkeys_to_read_if_availableinstead. - No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| test/datapipes/test_domino_datapipe.py | 5/5 | Removed unused scalar fields (air_density, stream_velocity) from synthetic test dataset generation - these fields were never read by the datapipe |
Sequence Diagram
sequenceDiagram
participant Test as Test Suite
participant SyntheticData as synthetic_domino_data()
participant DataFile as Test Data Files
participant Dataset as CAEDataset
participant DataPipe as DoMINODataPipe
Test->>SyntheticData: Generate test data
SyntheticData->>SyntheticData: Create mesh & field data
Note over SyntheticData: Removed: air_density, stream_velocity<br/>(unused scalar fields)
SyntheticData->>DataFile: Save to zarr/npz/npy
Test->>Dataset: Create CAEDataset
Note over Dataset: keys_to_read_if_available:<br/>global_params_values=[1.225, 10.0]<br/>global_params_reference=[1.225, 10.0]
Dataset->>DataFile: Read keys_to_read
DataFile-->>Dataset: Return data
Dataset->>Dataset: Fill missing keys with defaults
Test->>DataPipe: Create DoMINODataPipe
DataPipe->>Dataset: Request data
Dataset-->>DataPipe: Return complete dataset
DataPipe-->>Test: Process and return sample
1 file reviewed, no comments
PhysicsNeMo Pull Request
Fixes errors in the release candidate branch.
Description
Checklist
Dependencies