Skip to content

Conversation

@abradley60
Copy link
Collaborator

Major uplift to the aws pipeline. Key changes include:

  • The addition of collection as a parameter that can be specified at the top level of the workflow
  • Addition of pystac to requirements.txt
  • Improved docstrings for cli functions
  • Major overhall of STAC handling classes

@abradley60 abradley60 requested a review from caitlinadams March 18, 2025 22:03
Copy link
Collaborator

@caitlinadams caitlinadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good to me! Thanks for working on this! I've put a few questions/comments in, but for the most part I think this is a great first go at adding capability to produce stac metadata.

One question I have is whether you think any of the functionality you've added needs tests, and whether you want to write these as part of this PR, or come back to them?

config_path : Path,
make_folders : bool = True,
):
"""Download the required data for the RTC/opera and create a configuration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Click tends toward doc strings that match Unix command help utils, so we should probably try and follow that.

Specifically, this involves a slightly different formatting of help information for click arguments: https://click.palletsprojects.com/en/stable/documentation/#documenting-arguments

raise FileNotFoundError(f"Metadata file not found: {self.file_path}")

# add product stac extension properties
self.item.properties['product:type'] = 'NRB' # or RTC ?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we flag this somewhere as something we'll need to review in future?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just add TODO in front?


# remove polarizations we don't have from the required products
# e.g. don't try add HH if it did not exist in original source data
IGNORE_ASSETS = [f'_{p}.tif' for p in ['HH','HV','VV','VH'] if p not in self.item.properties['sar:polarizations']]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to simplify this logic to focus directly on only including files that are mentioned in sar:polarizations -- or will that miss something? There's a bit of double-negative going on here that might be easier to read if it's explicit inclusion rather than explicit exclusion. Not a big deal though, code is readable as is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Caitlin, agree it could be nicer. My thinking is that it was nice to have all of the files explicitly listed in the REQUIRED_ASSET_FILETYPES, then remove those that don't exist for a given product. I can also implement an append to this list that happens here instead. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misunderstood! I thought you could do something like INCLUDED_ASSETS = [f'_{p}.tif' for p in self.item.properties['sar:polarizations']], but missed that there are files other than polarisation that you're looking to include.

With your comment on the append, is the suggestion to remove the polarisations from the REQUIRED_ASSET_FILETYPES, and just append the identified polarisation files using something similar to my one line above? I think that would work, but I also now understand your logic better, so would be totally fine to leave it as is so the REQUIRED_ASSET_FILETYPES is complete.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you were understanding my suggestion right. Cool, I think I will leave it as is so the REQUIRED_ASSET_FILETYPES is complete.

@abradley60
Copy link
Collaborator Author

Thanks @caitlinadams. Re tests they are certainly needed for a lot of this. My thinking was they could come at a later date once things like STAC logic have been finalised. This was largely to get an updated version for Aman to test. My next PRs will focus on tests and better logging

@caitlinadams
Copy link
Collaborator

caitlinadams commented Mar 19, 2025

Re tests they are certainly needed for a lot of this. My thinking was they could come at a later date once things like STAC logic have been finalised.

Yep, totally reasonable! In that case, I think the full docs update of the CLI could also wait (I've had this on my to-do list for other CLI functions for a while anyway!) For the others, they're mostly about readability/user experience, so happy for those changes to come in a future version if you think it makes sense.

@abradley60
Copy link
Collaborator Author

Sounds good. I think I might update the CLI docs now anyway while I incorporate the other suggestions

@abradley60
Copy link
Collaborator Author

@caitlinadams I think I have covered all the changes including the improved cli docs

@caitlinadams caitlinadams merged commit c170a61 into main Mar 19, 2025
1 check passed
@caitlinadams caitlinadams deleted the aws-iw-pipeline-uplift branch March 19, 2025 21:39
abradley60 pushed a commit that referenced this pull request Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants