-
Notifications
You must be signed in to change notification settings - Fork 5
Add error handling in stac creation and ingestion #404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
ividito
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit (the same pattern is also repeated a couple other times in the PR), but overall this is good and should help us with debugging 👍
|
|
||
| # Extract filename/item_id from the event for error reporting | ||
| item_id = event.get("id", "unknown") | ||
| filename = "unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be None to allow us to differentiate between the two types of "unknowns"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ividito just for clarification, are you suggesting that we make either item_id or filename None?
| except Exception as ex: | ||
| out_err: StacItemOutput = {"stac_item": {"error": f"{ex}", "event": event}} | ||
| # Extract filename from first asset for better error reporting | ||
| filename = "unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| filename = "unknown" |
| if event.get("assets"): | ||
| first_asset = next(iter(event["assets"].values()), {}) | ||
| href = first_asset.get("href", "") | ||
| filename = href.split("/")[-1] if href else "unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| filename = href.split("/")[-1] if href else "unknown" | |
| filename = href.split("/")[-1] if href else None |
| href = first_asset.get("href", "") | ||
| filename = href.split("/")[-1] if href else "unknown" | ||
|
|
||
| item_id = event.get("item_id", "unknown") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| item_id = event.get("item_id", "unknown") | |
| item_id = event.get("item_id") |
| error_breakdown[error_msg] = error_breakdown.get(error_msg, 0) + 1 | ||
|
|
||
| # Extract filename | ||
| filename = failure.get('filename', 'unknown') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| filename = failure.get('filename', 'unknown') | |
| filename = failure.get('filename') |
|
|
||
| # Extract filename | ||
| filename = failure.get('filename', 'unknown') | ||
| if filename == 'unknown' and 'event' in failure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if filename == 'unknown' and 'event' in failure: | |
| if filename is not None and 'event' in failure: |
| if assets: | ||
| first_asset = next(iter(assets.values()), {}) | ||
| href = first_asset.get('href', '') | ||
| filename = href.split("/")[-1] if href else 'unknown' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| filename = href.split("/")[-1] if href else 'unknown' | |
| filename = href.split("/")[-1] if href else None |
| ) | ||
|
|
||
| # Extract filename/item_id from the event for error reporting | ||
| item_id = event.get("id", "unknown") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| item_id = event.get("id", "unknown") | |
| item_id = event.get("id") |
|
|
||
| # Extract filename/item_id from the event for error reporting | ||
| item_id = event.get("id", "unknown") | ||
| filename = "unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| filename = "unknown" | |
| filename = None |
| if assets: | ||
| first_asset = next(iter(assets.values()), {}) | ||
| href = first_asset.get("href", "") | ||
| filename = href.split("/")[-1] if href else "unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| filename = href.split("/")[-1] if href else "unknown" | |
| filename = href.split("/")[-1] if href else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally would recommend not using "unknown" and instead just defaulting to None. Python dict get method automatically returns None when the key doesn't exist.
Summary:
Addresses #325 partially. These proposed changes will add additional information regarding what the actual errors are and which files and ID's are associated with the file error. My current issue was only in the
submit_stac.pywhich had the errors I've listed below.These changes improve the error reporting within Airflow and if the process is killed then the debugging process is made simpler. I've also incorporated the same logging that was completed in #380 for
handler.pywhich also has logging in case the stac creation process has error.Changes
Add additional logging.warnings into
handler.pywhich prints filename that caused the issue.Add logging.error in
submit_stac.pywhich prints the ID and filename that caused the issue.These log directly in Airflow.
See example below for the pre-logging error statement.
See below for the new log errors statement. These are more interpretable.
PR Checklist
terraform validateandterraform plan