TRT-635: Refactor retrieval of GeoTIFF variable names.#41
TRT-635: Refactor retrieval of GeoTIFF variable names.#41owenlittlejohns merged 1 commit intomainfrom
Conversation
|
|
||
| class DownloadError(HGAException): | ||
| """Raised when the Harmony GDAL Adapter cannot retrieve input data.""" | ||
| class DownloadError(HarmonyException): |
There was a problem hiding this comment.
I kind of wonder if the HGANoRetryException is an unnecessary intermediate layer. @flamingbear - you may have already said this when you realised that the service name wasn't actually being used as expected.
I didn't cut out the HGANoRetryException for now, but it feels like we could.
There was a problem hiding this comment.
I do think this is the right thing probably.
| layernames.append(layer_id) | ||
| filelist.append(filename) | ||
| else: | ||
| raise MissingVariableError(requested_variable.name) |
There was a problem hiding this comment.
This is a behaviour change: previously, if a requested variable could not be matched to anything in the GeoTIFF, the service would just carry on silently to the next requested variable. I think that goes against the general theme of: if a user explicitly asks for something, and a service can't do it, then we should fail and say so.
There was a problem hiding this comment.
Also, generally, I'm lacking test coverage of methods on this class. My plan is:
- Pull unnecessary things from the class, to reduce overall size. Add unit tests to things pulled out.
- Rely on the regression tests for now on the larger behaviour.
- Once the adapter is cut down to size, start adding more tests in the repository for methods still there.
If that just sounds lazy, though, let me know and I can look into adding some tests for this method.
There was a problem hiding this comment.
if a user explicitly asks for something, and a service can't do it, then we should fail and say so.
I'm full agreed with this, but... I think it might do users a disservice when they're using the EDSC. I am seeing folks requesting things they probably didn't mean to, and then the failure messages are not easily sensible (to a user). But that's a big bag of another day.
| if "netCDF" in gdalinfo_lines[0] or "HDF" in gdalinfo_lines[0]: | ||
| # netCDF/Network Common Data Format, HDF5/Hierarchical Data Format | ||
| # Release 5 | ||
| # Normal case of NetCDF / HDF, where variables are subdatasets | ||
| for subdataset in filter( | ||
| (lambda line: re.match(r"^\s*SUBDATASET_\d+_NAME=", line)), | ||
| gdalinfo_lines, | ||
| ): | ||
| result.append(HarmonyVariable({"name": re.split(r":", subdataset)[-1]})) |
There was a problem hiding this comment.
This is what started me off with this PR - I spotted code that was trying to accommodate netCDF4 input files. But those are no longer handled, so this code is unreachable.
| * The standard_name as retrieved from the band metadata. | ||
| * "BandN", if there is no standard_name. | ||
|
|
||
| """ |
There was a problem hiding this comment.
I think I like the new dictionary comprehension in place of a for loop with an if/else inside. But if it isn't as readable, then let me know.
There was a problem hiding this comment.
I know how you love a comprehension. I can read this, but I would have probably done this, but no need to change it.
result = {}
with OpenGDAL(filename) as dataset:
for band_index in range(1, dataset.RasterCount + 1):
result[f'Band{band_index}'] = (
dataset.GetRasterBand(band_index).GetMetadata().get("standard_name")
or f"Band{band_index}"
)
| if band is None: | ||
| # No standard name matched, now try raw band names, i.e. BandN | ||
| band = next( | ||
| ( | ||
| band_name | ||
| for band_name in geotiff_variables | ||
| if requested_variable.name.lower() in band_name.lower() | ||
| ), | ||
| None, | ||
| ) |
There was a problem hiding this comment.
I don't have a strong opinion, but an alternative implementation could be to combine these two searches down so that the standard_name and the band name ("BandN") are being checked at the same time. Meh.
flamingbear
left a comment
There was a problem hiding this comment.
This looks good. I was able to run the tests locally and the regressions against Harmony-In-A-Box. All succeeded. Changes make sense to me.
|
|
||
| class DownloadError(HGAException): | ||
| """Raised when the Harmony GDAL Adapter cannot retrieve input data.""" | ||
| class DownloadError(HarmonyException): |
There was a problem hiding this comment.
I do think this is the right thing probably.
| if band is None: | ||
| # No standard name matched, now try raw band names, i.e. BandN | ||
| band = next( | ||
| ( | ||
| band_name | ||
| for band_name in geotiff_variables | ||
| if requested_variable.name.lower() in band_name.lower() | ||
| ), | ||
| None, | ||
| ) |
| * The standard_name as retrieved from the band metadata. | ||
| * "BandN", if there is no standard_name. | ||
|
|
||
| """ |
There was a problem hiding this comment.
I know how you love a comprehension. I can read this, but I would have probably done this, but no need to change it.
result = {}
with OpenGDAL(filename) as dataset:
for band_index in range(1, dataset.RasterCount + 1):
result[f'Band{band_index}'] = (
dataset.GetRasterBand(band_index).GetMetadata().get("standard_name")
or f"Band{band_index}"
)
Description
This PR is something I spotted when about to convert the shell commands to be Python instead. I also threw in the use of the
NoRetryExceptionbecause that seems like a good thing to do.I've made it a patch release, but I'm also fine to stack this up with some other changes. No strong opinions on that, as this is a service that is not in production.
Jira Issue ID
TRT-635
Local Test Steps
./bin/build-image && ./bin/build-test && ./bin/run-test.LOCALLY_DEPLOYED_SERVICES=harmony-gdal-adapter../bin/bootstrap-harmony.mainbranch) activate thepapermill-hgaenvironment.harmony_host_url='http://localhost:3000'.PR Acceptance Checklist
Jira ticket acceptance criteria met.version.txtandCHANGELOG.mdupdated if any service code is changed.Documentation updated (if needed).