Add error handling in stac creation and ingestion #404


Open

kyle-lesinger wants to merge 3 commits into dev from feat/errorLogging

Member

kyle-lesinger commented Oct 9, 2025

Summary:

Partially addresses #325. These proposed changes add information about what the actual errors are and which files and IDs are associated with each error. The issue I ran into was only in submit_stac.py, which produced the errors listed below.

These changes improve error reporting within Airflow, so debugging is simpler if the process is killed. I've also incorporated the same logging that was completed in #380 for handler.py, which also logs errors raised during the STAC creation process.

Changes

Detailed list or prose of changes
Add logging.warning calls in handler.py that print the filename that caused the issue.
Add a logging.error call in submit_stac.py that prints the item ID and filename that caused the issue.
Both log directly in Airflow.
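
The pattern described in the list above can be sketched roughly as follows. This is a minimal illustration based on the snippets quoted in the review, not the actual contents of submit_stac.py: the helper names `describe_event` and `submit_with_logging`, and the `submit` callable, are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def describe_event(event):
    """Return (item_id, filename) for a STAC item event, for use in log messages."""
    item_id = event.get("id", "unknown")
    filename = "unknown"
    assets = event.get("assets") or {}
    if assets:
        # Use the basename of the first asset's href as the filename.
        first_asset = next(iter(assets.values()), {})
        href = first_asset.get("href", "")
        filename = href.split("/")[-1] if href else "unknown"
    return item_id, filename


def submit_with_logging(event, submit):
    """Call submit(event); on failure, log the item ID and filename, then re-raise.

    `submit` is a hypothetical callable standing in for the real submission step.
    """
    try:
        return submit(event)
    except Exception as ex:
        item_id, filename = describe_event(event)
        logger.error(
            "Failed to submit STAC item. Item ID: %s, Filename: %s, Error: %s: %s",
            item_id,
            filename,
            type(ex).__name__,
            ex,
        )
        raise
```

Re-raising after logging keeps the Airflow task failing as before, while the extra ERROR line records which item and file were involved.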

See the example below for the error output before these logging changes.

[2025-10-09, 14:15:55 UTC] {taskinstance.py:3313} ERROR - Task failed with exception

See below for the new error log output, which is easier to interpret.

[2025-10-09, 14:52:06 UTC] {submit_stac.py:133} ERROR - Failed to submit STAC item. Item ID: 202507_Flood_TX_CentralTX_S2_NDVI_Change_c2025-06-17_2025-07-17_day, Filename: 202507_Flood_TX_CentralTX_S2_NDVI_Change_c2025-06-17_2025-07-17_day.tif, Error: InvalidJSONError: Out of range float values are not JSON compliant
[2025-10-09, 14:56:55 UTC] {taskinstance.py:3313} ERROR - Task failed with exception
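
For context on the error text in the log above: "Out of range float values are not JSON compliant" is the message Python's json module uses when asked to serialize NaN or an infinite float with `allow_nan=False`, and requests appears to surface it as `InvalidJSONError`. The sketch below assumes the item contained such a value; the `item` dict and the `find_non_finite` helper are hypothetical and only illustrate how the offending field could be located.

```python
import json
import math


def find_non_finite(value, path="$"):
    """Yield JSON-style paths of every NaN or infinite float inside value."""
    if isinstance(value, float) and not math.isfinite(value):
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_non_finite(child, f"{path}.{key}")
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            yield from find_non_finite(child, f"{path}[{index}]")


# Hypothetical item with a NaN statistic, for illustration only.
item = {"id": "example-item", "properties": {"min": 0.0, "max": float("nan")}}

try:
    # Strict serialization rejects NaN and infinity.
    json.dumps(item, allow_nan=False)
except ValueError as ex:
    print(f"{type(ex).__name__}: {ex}")

print(list(find_non_finite(item)))  # ['$.properties.max']
```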

PR Checklist

Unit tests
Ad-hoc testing - Deploy changes and test manually
Integration tests
Infrastructure changes - test using terraform validate and terraform plan


          add error handling in stac creation and ingestion

7368d48

kyle-lesinger changed the title ~~add error handling in stac creation and ingestion~~ Add error handling in stac creation and ingestion


          update submit_stac

kyle-lesinger requested review from anayeaye and smohiudd

October 10, 2025 12:37

smohiudd mentioned this pull request

Granular error status reporting for item ingestion #407

Open

1 task

ividito approved these changes

Contributor

ividito left a comment

One nit (the same pattern is also repeated a couple other times in the PR), but overall this is good and should help us with debugging 👍

dags/veda_data_pipeline/utils/submit_stac.py

    
    # Extract filename/item_id from the event for error reporting
    item_id = event.get("id", "unknown")
    filename = "unknown"

Contributor

ividito Dec 11, 2025

Can this be None, to allow us to differentiate between the two types of "unknowns"?

Member Author

kyle-lesinger Dec 12, 2025

@ividito just for clarification, are you suggesting that we make either item_id or filename None?


          Merge branch 'dev' into feat/errorLogging

1fa6f14

smohiudd reviewed

dags/veda_data_pipeline/utils/build_stac/handler.py

    
    except Exception as ex:
        out_err: StacItemOutput = {"stac_item": {"error": f"{ex}", "event": event}}
        # Extract filename from first asset for better error reporting
        filename = "unknown"

Contributor

smohiudd Dec 19, 2025

Suggested change

filename = "unknown"

smohiudd reviewed

dags/veda_data_pipeline/utils/build_stac/handler.py

    
    if event.get("assets"):
        first_asset = next(iter(event["assets"].values()), {})
        href = first_asset.get("href", "")
        filename = href.split("/")[-1] if href else "unknown"

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - filename = href.split("/")[-1] if href else "unknown"
    + filename = href.split("/")[-1] if href else None

smohiudd reviewed

dags/veda_data_pipeline/utils/build_stac/handler.py

    
        href = first_asset.get("href", "")
        filename = href.split("/")[-1] if href else "unknown"
    item_id = event.get("item_id", "unknown")

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - item_id = event.get("item_id", "unknown")
    + item_id = event.get("item_id")

smohiudd reviewed

dags/veda_data_pipeline/utils/build_stac/handler.py

    
    error_breakdown[error_msg] = error_breakdown.get(error_msg, 0) + 1
    # Extract filename
    filename = failure.get('filename', 'unknown')

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - filename = failure.get('filename', 'unknown')
    + filename = failure.get('filename')

smohiudd reviewed

dags/veda_data_pipeline/utils/build_stac/handler.py

    
    # Extract filename
    filename = failure.get('filename', 'unknown')
    if filename == 'unknown' and 'event' in failure:

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - if filename == 'unknown' and 'event' in failure:
    + if filename is not None and 'event' in failure:

smohiudd reviewed

dags/veda_data_pipeline/utils/build_stac/handler.py

    
    if assets:
        first_asset = next(iter(assets.values()), {})
        href = first_asset.get('href', '')
        filename = href.split("/")[-1] if href else 'unknown'

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - filename = href.split("/")[-1] if href else 'unknown'
    + filename = href.split("/")[-1] if href else None

smohiudd reviewed

dags/veda_data_pipeline/utils/submit_stac.py

    
    )
    # Extract filename/item_id from the event for error reporting
    item_id = event.get("id", "unknown")

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - item_id = event.get("id", "unknown")
    + item_id = event.get("id")

smohiudd reviewed

dags/veda_data_pipeline/utils/submit_stac.py

    
    # Extract filename/item_id from the event for error reporting
    item_id = event.get("id", "unknown")
    filename = "unknown"

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - filename = "unknown"
    + filename = None

smohiudd reviewed

dags/veda_data_pipeline/utils/submit_stac.py

    
    if assets:
        first_asset = next(iter(assets.values()), {})
        href = first_asset.get("href", "")
        filename = href.split("/")[-1] if href else "unknown"

Contributor

smohiudd Dec 19, 2025

Suggested change

      
    - filename = href.split("/")[-1] if href else "unknown"
    + filename = href.split("/")[-1] if href else None

smohiudd requested changes

Contributor

smohiudd left a comment

Generally, I would recommend not using "unknown" and instead just defaulting to None. Python's dict.get method automatically returns None when the key doesn't exist.
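
A minimal sketch of the None-default approach recommended above, assuming the failure and event dictionaries have the shapes shown in the quoted hunks. The function names `extract_filename` and `resolve_filename` are hypothetical and do not come from handler.py or submit_stac.py.

```python
def extract_filename(event):
    """Return the basename of the first asset's href, or None if there is none."""
    assets = event.get("assets") or {}
    first_asset = next(iter(assets.values()), {})
    href = first_asset.get("href", "")
    return href.split("/")[-1] if href else None


def resolve_filename(failure):
    """Return the recorded filename, falling back to the event's first asset."""
    # dict.get returns None when the key is absent, so no sentinel string is needed.
    filename = failure.get("filename")
    # Run the fallback only when no filename was recorded.
    if filename is None and "event" in failure:
        filename = extract_filename(failure["event"])
    return filename


print(resolve_filename({"filename": "a.tif"}))  # a.tif
print(resolve_filename({"event": {"assets": {"cog": {"href": "s3://bucket/b.tif"}}}}))  # b.tif
print(resolve_filename({}))  # None
```

With None as the default, a guard that previously compared against the "unknown" string would test `filename is None` instead, and a file that happens to be named "unknown" can no longer be mistaken for a missing value.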
