@kyle-lesinger
Member

Summary:

Partially addresses #325. These changes add detail about what the actual errors are and which files and IDs are associated with each error. The issue I ran into was only in submit_stac.py, which produced the errors listed below.

These changes improve error reporting within Airflow and make debugging simpler if the process is killed. I've also incorporated the same logging that was added in #380 for handler.py, which logs when the STAC creation process hits an error.

Changes

  • Add logging.warning calls in handler.py that print the filename that caused the issue.
  • Add logging.error in submit_stac.py that prints the ID and filename that caused the issue.
  • These messages are logged directly in Airflow.
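
As a rough illustration of the pattern described in the list above, here is a minimal sketch of logging the item ID and filename when a submission fails. It is not the code from this PR: `first_asset_filename`, `submit_with_logging`, and the `submit` callable are hypothetical names, and the event layout (an `id` key plus an `assets` mapping with `href` values) is assumed from the snippets quoted in the review.

```python
import logging

logger = logging.getLogger("submit_stac_sketch")  # hypothetical logger name


def first_asset_filename(event):
    """Return the basename of the first asset's href, or None if there is none."""
    assets = event.get("assets") or {}
    first_asset = next(iter(assets.values()), {})
    href = first_asset.get("href", "")
    return href.split("/")[-1] if href else None


def submit_with_logging(event, submit):
    """Call submit(event); on failure, log the item ID and filename, then re-raise."""
    item_id = event.get("id")
    filename = first_asset_filename(event)
    try:
        return submit(event)
    except Exception as ex:
        # Log the identifying details before the task itself is marked as failed.
        logger.error(
            "Failed to submit STAC item. Item ID: %s, Filename: %s, Error: %s: %s",
            item_id,
            filename,
            type(ex).__name__,
            ex,
        )
        raise
```

Re-raising after logging keeps the task failure visible to the scheduler while still leaving an identifying line in the log.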

Before these changes, the log showed only the following error statement:

[2025-10-09, 14:15:55 UTC] {taskinstance.py:3313} ERROR - Task failed with exception

With these changes, the log also reports the item ID, filename, and underlying error, which is easier to interpret:

[2025-10-09, 14:52:06 UTC] {submit_stac.py:133} ERROR - Failed to submit STAC item. Item ID: 202507_Flood_TX_CentralTX_S2_NDVI_Change_c2025-06-17_2025-07-17_day, Filename: 202507_Flood_TX_CentralTX_S2_NDVI_Change_c2025-06-17_2025-07-17_day.tif, Error: InvalidJSONError: Out of range float values are not JSON compliant
[2025-10-09, 14:56:55 UTC] {taskinstance.py:3313} ERROR - Task failed with exception
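
The message in the new log line matches what Python's standard `json` module raises when asked to serialize NaN or infinity with `allow_nan=False`. The sketch below reproduces that message with the standard library alone. The item contents and the `replace_non_finite` helper are hypothetical, and the log does not say which field held the offending value or how the PR's code builds the request.

```python
import json
import math

# Hypothetical item with a NaN statistic, standing in for the failing STAC item.
item = {"id": "example-item", "properties": {"mean": float("nan")}}

message = None
try:
    json.dumps(item, allow_nan=False)
except ValueError as ex:
    # Strict JSON has no representation for NaN or infinity.
    message = str(ex)


def replace_non_finite(value):
    """Recursively replace NaN and infinite floats with None (JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: replace_non_finite(val) for key, val in value.items()}
    if isinstance(value, list):
        return [replace_non_finite(val) for val in value]
    return value


# After cleaning, strict serialization succeeds.
clean = json.dumps(replace_non_finite(item), allow_nan=False)
```

Whether replacing such values with null is appropriate depends on the dataset; the point here is only to show where the error message comes from.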

PR Checklist

  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests
  • Infrastructure changes - test using terraform validate and terraform plan

@kyle-lesinger changed the title from "add error handling in stac creation and ingestion" to "Add error handling in stac creation and ingestion" on Oct 9, 2025
Contributor

@ividito ividito left a comment

One nit (the same pattern is also repeated a couple other times in the PR), but overall this is good and should help us with debugging 👍


# Extract filename/item_id from the event for error reporting
item_id = event.get("id", "unknown")
filename = "unknown"
Contributor

Can this be None to allow us to differentiate between the two types of "unknowns"?

Member Author

@ividito just for clarification, are you suggesting that we make either item_id or filename None?

except Exception as ex:
    out_err: StacItemOutput = {"stac_item": {"error": f"{ex}", "event": event}}
    # Extract filename from first asset for better error reporting
    filename = "unknown"
Contributor

Suggested change
filename = "unknown"

if event.get("assets"):
    first_asset = next(iter(event["assets"].values()), {})
    href = first_asset.get("href", "")
    filename = href.split("/")[-1] if href else "unknown"
Contributor

Suggested change
filename = href.split("/")[-1] if href else "unknown"
filename = href.split("/")[-1] if href else None

href = first_asset.get("href", "")
filename = href.split("/")[-1] if href else "unknown"

item_id = event.get("item_id", "unknown")
Contributor

Suggested change
item_id = event.get("item_id", "unknown")
item_id = event.get("item_id")

error_breakdown[error_msg] = error_breakdown.get(error_msg, 0) + 1

# Extract filename
filename = failure.get('filename', 'unknown')
Contributor

Suggested change
filename = failure.get('filename', 'unknown')
filename = failure.get('filename')


# Extract filename
filename = failure.get('filename', 'unknown')
if filename == 'unknown' and 'event' in failure:
Contributor

Suggested change
if filename == 'unknown' and 'event' in failure:
if filename is None and 'event' in failure:

if assets:
    first_asset = next(iter(assets.values()), {})
    href = first_asset.get('href', '')
    filename = href.split("/")[-1] if href else 'unknown'
Contributor

Suggested change
filename = href.split("/")[-1] if href else 'unknown'
filename = href.split("/")[-1] if href else None

)

# Extract filename/item_id from the event for error reporting
item_id = event.get("id", "unknown")
Contributor

Suggested change
item_id = event.get("id", "unknown")
item_id = event.get("id")


# Extract filename/item_id from the event for error reporting
item_id = event.get("id", "unknown")
filename = "unknown"
Contributor

Suggested change
filename = "unknown"
filename = None

if assets:
    first_asset = next(iter(assets.values()), {})
    href = first_asset.get("href", "")
    filename = href.split("/")[-1] if href else "unknown"
Contributor

Suggested change
filename = href.split("/")[-1] if href else "unknown"
filename = href.split("/")[-1] if href else None

Contributor

@smohiudd smohiudd left a comment

Generally would recommend not using "unknown" and instead just defaulting to None. Python's dict.get method returns None by default when the key doesn't exist.
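
To make the point about `dict.get` concrete, here is a small sketch. The dictionaries and the `label` helper are made up for illustration, and reading the recommendation as "a string sentinel can collide with a real value" is one possible interpretation rather than something stated in the thread.

```python
event = {"assets": {}}

# dict.get returns None when the key is missing and no default is given.
item_id = event.get("id")
assert item_id is None

# With a string sentinel, a missing ID cannot be told apart from an item
# whose ID really is the string "unknown".
missing = {}.get("id", "unknown")
literal = {"id": "unknown"}.get("id", "unknown")
assert missing == literal

# With None, the two cases stay distinct.
assert {}.get("id") is None
assert {"id": "unknown"}.get("id") == "unknown"


def label(value):
    """Hypothetical helper: choose display text for a missing value at log time."""
    return value if value is not None else "<missing>"
```

Keeping None in the data and choosing the display text only when formatting the log message means downstream code can still test `is None`.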
