TRT-635: Migrate GDAL commands to Python bindings.#42
Conversation
flamingbear
left a comment
There was a problem hiding this comment.
This is great, so much nicer. But I see at least one typo and had a question about using process from the Format.
| def execute_gdal_command(self, *args) -> list[str]: | ||
| """This is used to execute gdal* commands.""" | ||
| self.logger.info( | ||
| args[0] + " " + " ".join(["'{}'".format(arg) for arg in args[1:]]) | ||
| ) | ||
| result_str = check_output(args).decode("utf-8") | ||
| return result_str.split("\n") |
| def resize(self, layerid: str, srcfile: str, dstdir: str) -> str: | ||
| """Resizes the input layer and returns the name of the output file. | ||
|
|
||
| Uses a call to gdal_translate using information in message.format. |
There was a problem hiding this comment.
Well, it kind of does - osgeo.gdal.Translate is just a binding around gdal_translate - but I can improve the documentation string.
There was a problem hiding this comment.
I definitely thought gdal_translate was the executable. I didn't realize osgeo.gdal.Translate actually called it.
| Uses a call to gdal_translate using information in message.format. | ||
|
|
||
| """ | ||
| interpolation = self.message.format.process("interpolation") |
There was a problem hiding this comment.
Doesn't process mark this as being "used" or something? I thought process was a way to remove things from a message before downstream services use it. I only have a vague idea about this and I'm sure you told me. But just looking at this change and want to make sure we're not breaking something.
There was a problem hiding this comment.
You're right - process is meant to do that. From what I can tell:
- Using Message.format.process will drop that property from the Message.output_data dict.
- The output message from the service is then derived from the Message.output_data dict after processing
So:
- My first concern was that the message wouldn't be able to reuse the same property within the service, and some of the message properties are used multiple times (CRS, for example). This is irrelevant.
Message.output_datawon't preclude parameter reuse in the same service. - Do we want to strip things out of the message? I'm honestly not sure. I could see times when it's good (preventing repeat processing) and times when it's bad (the same parameter being needed in multiple services).
I guess I don't have an answer for what we should do, but definitely acknowledge this is now doing something different. What are your thoughts?
There was a problem hiding this comment.
I don't have strong opinions. This used to remove an "interpolation" value so it wouldn't be available to downstream services, but this is the service that should do interpolation probably. I don't think a lot of services are using the process mechanism. But I also don't know if they should be.
How is that for a non-answer?
There was a problem hiding this comment.
😆
I added a disclaimer in the method and function documentation strings that makes it clear the parameters aren't being removed in b37c762.
| resampleAlg=resample_algorithm, | ||
| height=output_height, | ||
| width=output_width, | ||
| ) |
There was a problem hiding this comment.
I think you're missing the unscale option from L476? (nevermind, we stopped supporting PNGs)
|
|
||
| interpolation = rgetattr(harmony_message, "format.interpolation", None) | ||
|
|
||
| if interpolation in get_args(ResampleAlgorithms): |
There was a problem hiding this comment.
I definitely had to google get_args
There was a problem hiding this comment.
Yeah - I think we might have used it in xarray when we were doing the DataTree migration, but it's not my most familiar go-to.
|
The unit tests failing are... interesting. The failures are imports of Update: The |
flamingbear
left a comment
There was a problem hiding this comment.
I don't have opinions on the change to the message format behavior. But if you leave it maybe note it as a difference in behavior? Thanks for fixing the other things. I do think keeping this smaller and adding tests and refactoring should be done in another PR.
|
|
||
| # Install GDAL | ||
| RUN conda run --name hga conda install gdal=3.6.2 | ||
| RUN conda run --name hga conda install gdal=3.6.2 --channel conda-forge --override-channels |
There was a problem hiding this comment.
That's a head scratcher that I won't get pulled into. But good catch.
Yup. I think that would be the best target for the next PR. (Not sure whether that comes before or after adding regression tests for projection. But these feel like the two next things to do) |
Description
Alright then. This is the big one! (Or at least one of the things we've been talking about for the longest)
This PR converts requests to GDAL over to using the available Python bindings, rather than executing shell commands directly. I'll leave some comments on the PR, but I think a couple of really helpful links:
Translate).gdal_translate).Also, I must admit, I initially leant on the Google AI generated answer for how to convert
ogr2ogrto Python, but the options in VectorTranslateOptions gave me a bit of confidence.I think we're also possibly missing some regression tests - looks like we don't really test reprojection or shape file workflows.
Jira Issue ID
TRT-635 - the neverending story
Local Test Steps
./bin/build-image && ./bin/build-test && ./bin/run-test. All tests should pass.LOCALLY_DEPLOYED_SERVICES=harmony-gdal-adapterpapermill-hgaconda environment:conda env create -f hga/environment.yml.conda activate papermill-hga.jupyter notebook.harmony_host_url='http://localhost:3000'.http://localhost:3000/jobsand pick a job with a GeoTIFF output.gdalinfoon that output.gdalinfooutput in your terminal to find a line:(This will be in the
Metadatasection, a little bit above the corner points.)PR Acceptance Checklist
Jira ticket acceptance criteria met.version.txtandCHANGELOG.mdupdated if any service code is changed.Documentation updated (if needed).