
TRT-635: Refactor retrieval of GeoTIFF variable names. #41

Merged
owenlittlejohns merged 1 commit into main from TRT-635-get-geotiff-variables
Oct 1, 2025

Conversation

@owenlittlejohns
Member

Description

This PR addresses something I spotted while preparing to convert the shell commands to Python. I also switched to using the NoRetryException, because that seems like a good thing to do.

I've made it a patch release, but I'm also fine to stack this up with some other changes. No strong opinions on that, as this is a service that is not in production.

Jira Issue ID

TRT-635

Local Test Steps

  • Pull this branch
  • Check the unit tests pass: ./bin/build-image && ./bin/build-test && ./bin/run-test.
  • Check the regression test suite passes against a local Harmony in a Box (HiaB) instance with this new image:
    • After building the image, make sure your HiaB has LOCALLY_DEPLOYED_SERVICES=harmony-gdal-adapter.
    • Start HiaB: ./bin/bootstrap-harmony.
    • In the regression test repository (main branch) activate the papermill-hga environment.
    • Start a Jupyter notebook server.
    • In your browser, in the notebook, set harmony_host_url='http://localhost:3000'.
    • Run all the cells in the notebook. The tests should all pass.

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • version.txt and CHANGELOG.md updated if any service code is changed.
  • Tests added/updated and passing.
  • Documentation updated (if needed).


class DownloadError(HGAException):
    """Raised when the Harmony GDAL Adapter cannot retrieve input data."""
class DownloadError(HarmonyException):
Member Author

I kind of wonder if the HGANoRetryException is an unnecessary intermediate layer. @flamingbear - you may have already said this when you realised that the service name wasn't actually being used as expected.

I didn't cut out the HGANoRetryException for now, but it feels like we could.

Member

I do think this is the right thing probably.
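
For illustration only, the flatter hierarchy being discussed in this thread might look something like the sketch below. The two base classes are local stand-ins written so the example runs on its own; they are not imports from the service library, and their constructor signature is an assumption. Which base class each HGA exception should use, and the error message text, are also assumptions rather than what this PR does.

```python
# Local stand-ins (assumptions) for the service library's base exceptions.
class HarmonyException(Exception):
    """Stand-in base exception; the real constructor may differ."""

    def __init__(self, message: str):
        super().__init__(message)
        self.message = message


class NoRetryException(HarmonyException):
    """Stand-in for an exception signalling that retrying will not help."""


# Without an intermediate HGANoRetryException layer, service exceptions
# inherit directly from whichever base class matches the retry behaviour.
class DownloadError(HarmonyException):
    """Raised when the Harmony GDAL Adapter cannot retrieve input data."""


class MissingVariableError(NoRetryException):
    """Raised when a requested variable is not found in the input file."""

    def __init__(self, variable_name: str):
        # Hypothetical message text, for illustration only.
        super().__init__(f"Requested variable not found: {variable_name}")
```

The trade-off is one fewer class to maintain against losing a single HGA-specific type to catch.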

    layernames.append(layer_id)
    filelist.append(filename)
else:
    raise MissingVariableError(requested_variable.name)
Member Author

This is a behaviour change: previously, if a requested variable could not be matched to anything in the GeoTIFF, the service would just carry on silently to the next requested variable. I think that goes against the general theme of: if a user explicitly asks for something, and a service can't do it, then we should fail and say so.

Member Author

Also, generally, I'm lacking test coverage of methods on this class. My plan is:

  • Pull unnecessary things from the class, to reduce overall size. Add unit tests to things pulled out.
  • Rely on the regression tests for now on the larger behaviour.
  • Once the adapter is cut down to size, start adding more tests in the repository for methods still there.

If that just sounds lazy, though, let me know and I can look into adding some tests for this method.

Member

"if a user explicitly asks for something, and a service can't do it, then we should fail and say so."

I fully agree with this, but... I think it might do users a disservice when they're using the EDSC. I am seeing folks requesting things they probably didn't mean to, and then the failure messages are not easy for a user to make sense of. But that's a big topic for another day.
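
The behaviour change in this thread can be sketched roughly as below. This is a minimal illustration, not the adapter's actual method: `select_bands` is a hypothetical helper, `MissingVariableError` is defined locally as a stand-in, and the dictionary shape (band identifier mapped to a standard name) is an assumption.

```python
class MissingVariableError(Exception):
    """Local stand-in for the service's exception of the same name."""


def select_bands(requested_names, geotiff_variables):
    """Return the band identifier matched by each requested variable name.

    `geotiff_variables` is assumed to map band identifiers such as "Band1"
    to a standard name.
    """
    selected = []
    for name in requested_names:
        band = next(
            (
                band_id
                for band_id, standard_name in geotiff_variables.items()
                if name.lower() in standard_name.lower()
            ),
            None,
        )
        if band is None:
            # Previously an unmatched variable was skipped silently;
            # now the request fails and names the missing variable.
            raise MissingVariableError(name)
        selected.append(band)
    return selected
```

With this shape, a request for a variable that is absent from the file stops at the first miss instead of returning a partial result.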

Comment on lines -703 to -711
if "netCDF" in gdalinfo_lines[0] or "HDF" in gdalinfo_lines[0]:
    # netCDF/Network Common Data Format, HDF5/Hierarchical Data Format
    # Release 5
    # Normal case of NetCDF / HDF, where variables are subdatasets
    for subdataset in filter(
        (lambda line: re.match(r"^\s*SUBDATASET_\d+_NAME=", line)),
        gdalinfo_lines,
    ):
        result.append(HarmonyVariable({"name": re.split(r":", subdataset)[-1]}))
Member Author

This is what started me off with this PR - I spotted code that was trying to accommodate netCDF4 input files. But those are no longer handled, so this code is unreachable.

* The standard_name as retrieved from the band metadata.
* "BandN", if there is no standard_name.

"""
Member Author

I think I like the new dictionary comprehension in place of a for loop with an if/else inside. But if it isn't as readable, then let me know.

Member

I know how you love a comprehension. I can read this, though I would probably have written it like this. No need to change it.

 result = {}
 with OpenGDAL(filename) as dataset:
     for band_index in range(1, dataset.RasterCount + 1):
         result[f'Band{band_index}'] = (
             dataset.GetRasterBand(band_index).GetMetadata().get("standard_name")
             or f"Band{band_index}"
         )            
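
For comparison with the loop above, a dictionary comprehension producing the same mapping might look like the sketch below. It is not the code from the diff: `get_geotiff_variables` is a hypothetical name, and the `FakeBand` and `FakeDataset` classes are stand-ins that mimic only the GDAL attributes used here (`RasterCount`, `GetRasterBand`, `GetMetadata`), so the example runs without GDAL installed.

```python
class FakeBand:
    """Stand-in for a GDAL raster band, exposing only GetMetadata()."""

    def __init__(self, metadata):
        self._metadata = metadata

    def GetMetadata(self):
        return self._metadata


class FakeDataset:
    """Stand-in for a GDAL dataset with 1-based band access."""

    def __init__(self, band_metadata):
        self._bands = [FakeBand(metadata) for metadata in band_metadata]
        self.RasterCount = len(self._bands)

    def GetRasterBand(self, band_index):
        return self._bands[band_index - 1]


def get_geotiff_variables(dataset):
    """Map "BandN" to the band's standard_name, or to "BandN" if absent."""
    return {
        f"Band{band_index}": (
            dataset.GetRasterBand(band_index).GetMetadata().get("standard_name")
            or f"Band{band_index}"
        )
        for band_index in range(1, dataset.RasterCount + 1)
    }


example = FakeDataset([{"standard_name": "sea_surface_temperature"}, {}])
variables = get_geotiff_variables(example)
```

Both forms build the same dictionary; the choice is purely one of readability.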

    

Comment on lines +260 to +269
if band is None:
    # No standard name matched, now try raw band names, i.e. BandN
    band = next(
        (
            band_name
            for band_name in geotiff_variables
            if requested_variable.name.lower() in band_name.lower()
        ),
        None,
    )
Member Author

I don't have a strong opinion, but an alternative implementation could be to combine these two searches down so that the standard_name and the band name ("BandN") are being checked at the same time. Meh.

Member

This is probably clearer.
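
For reference, the combined alternative mentioned in this thread could be sketched as a single pass that checks both names for each band. `find_band` is a hypothetical helper, and it assumes `geotiff_variables` maps "BandN" keys to a standard name (or to "BandN" when no standard name exists).

```python
def find_band(requested_name, geotiff_variables):
    """Return the first band whose standard name or band name matches.

    Matching is a case-insensitive substring test, as in the two-step
    version. Returns None if nothing matches.
    """
    needle = requested_name.lower()
    return next(
        (
            band_name
            for band_name, standard_name in geotiff_variables.items()
            if needle in standard_name.lower() or needle in band_name.lower()
        ),
        None,
    )
```

One difference to weigh: the two-step version tries every standard name before falling back to any band name, whereas a single pass returns the first band that matches on either, so the two can disagree when a request matches one band's name and a later band's standard name.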

Member

@flamingbear left a comment

This looks good. I was able to run the tests locally and the regressions against Harmony-In-A-Box. All succeeded. Changes make sense to me.

@owenlittlejohns merged commit 9666160 into main on Oct 1, 2025
4 checks passed
@owenlittlejohns deleted the TRT-635-get-geotiff-variables branch on October 1, 2025 17:45