DAS-2418: Add exclusions for SMAP L3 string variables#33
Conversation
flamingbear
left a comment
There was a problem hiding this comment.
This looks good to me. Nice work. I had a couple of minor nits.
| def is_excluded_science_variable(var_info: VarInfoFromNetCDF4, var) -> bool: | ||
| """Returns True if variable is explicitly excluded by VarInfo configuration.""" | ||
| exclusions_pattern = re.compile( | ||
| '|'.join(var_info.cf_config.excluded_science_variables) | ||
| ) | ||
| return var_info.variable_is_excluded(var, exclusions_pattern) |
There was a problem hiding this comment.
So this is probably the time to move this to earthdata-varinfo. Because I'll bet Josie has a similar function in her PR.
| expected_output_netcdf4_file_test05, decode_times=False | ||
| ) as expected_datatree, | ||
| xr.open_datatree(temp_output_file_path, decode_times=False) as results_datatree, | ||
| ): |
There was a problem hiding this comment.
The first time I really hate black's formatting. 😬
| sample_datatree = xr.DataTree( | ||
| xr.Dataset( | ||
| data_vars={ | ||
| 'string_time_utc_seconds': xr.DataArray(np.ones((3, 3))), |
There was a problem hiding this comment.
There's nothing in your PR that explicitly ignores strings correct?
Separately, could this be a string variable?
There was a problem hiding this comment.
Correct, there is no explicit exclusion of strings.
I can see if I can make them strings to be more aligned with the variable name, but as I said, there is no specific handling for strings.
There was a problem hiding this comment.
Updated in 69d3374. Let me know if this is what you were looking for.
There was a problem hiding this comment.
Yes, just something with an actual string var. Thanks
flamingbear
left a comment
There was a problem hiding this comment.
Thanks for the updates. I verified all of the test steps. Files without Strings and the regression tests. I know there is a change coming for the regression tests due to a phony_dim_name numbering difference. But this can merge.
lyonthefrog
left a comment
There was a problem hiding this comment.
I ran the unit and regression tests, and they all appear to be passing! Added a couple little questions, but once those are answered I'll approve
There was a problem hiding this comment.
Just commenting on this file as a whole - I never thought about creating a test config file like this, kinda cool! I suppose I wonder if it might be an issue that we're not directly testing the changes to earthdata_varinfo_config.json, but if this is an accepted practice I'll accept it.
There was a problem hiding this comment.
While I think both approaches are okay, the earthdata_varinfo_test_config.json file includes configuration items for things like deleting attributes that don't currently exist in the earthdata_varinfo_config.json. I think using the test config file is beneficial for our unit tests as it won't change as desired collection specific output annotations evolve (ex. unit test using a configuration item that is later removed).
There was a problem hiding this comment.
Ah, okay got it! And that makes sense especially for the metadata annotator.
There was a problem hiding this comment.
Also, do you think it'd be worthwhile to test the specific SPL3FTA configuration, or nah?
There was a problem hiding this comment.
The precedent set in this repository has been to avoid tests for specific collection configuration rules. I think this is mainly because it would inflate the test suite (would require tests for every collection that is configured) and we'd also need to store granules for these tests in the repository.
We've been reliant on properly testing these collection specific configuration rules in the test instructions for the PRs that add them.
Description
This PR adds the ability to exclude variables from the output. Configuration rules can be defined in the
ExcludedScienceVariablessection of the earthdata-varinfo configuration file, and the metadata-annotator will exclude any variables matching these rules from the output.Configuration rules were added to specifically exclude SMAP L3 string variables.
Jira Issue ID
DAS-2418
Local Test Steps
Check out this branch and build the new image and run the tests
Ensure your HIAB configuration has:
Restart harmony to pick up the new image:
Run the DAS-2418.ipynb notebook attached to this ticket. Inspect the downloaded outputs for each collection and ensure the following variables have been excluded:
To ensure no regressions, run the applicable das-harmony-regression-tests notebooks. Make sure to set harmony_host_url = 'http://localhost:3000/'.
PR Acceptance Checklist
CHANGELOG.mdupdated to include high level summary of PR changes.docker/service_version.txtupdated if publishing a release.