improvement(gce-utils): surface GCP audit log errors in operation failures #13672
Draft
Copilot AI changed the title from "[WIP] Enhance wait_for_extended_operation for better error reporting" to "improvement(gce-utils): extract operation metadata on timeout for better diagnostics" on Feb 19, 2026
Copilot AI changed the title from "improvement(gce-utils): extract operation metadata on timeout for better diagnostics" to "improvement(gce-utils): surface GCP audit log errors in operation failures" on Feb 19, 2026
…iled timeout error information

Enhanced the wait_for_extended_operation function to provide more detailed error information when operations time out. When a timeout occurs, the function now extracts operation details (operation ID, status, error codes, error messages, target links, and operation types) and includes them in the error message and logs.

This helps diagnose GCP instance creation failures by showing why the operation timed out, instead of reporting the timeout with no further context.

Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
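As a rough illustration of the kind of message this commit describes, the sketch below formats operation details into a timeout error string. The `OperationInfo` dataclass and `describe_operation` helper are hypothetical names introduced here for illustration; the PR's actual code and the exact fields it reads from the GCE operation object are not shown in this conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperationInfo:
    """Hypothetical stand-in for the fields read from a GCE operation."""
    name: str
    status: str
    operation_type: str
    target_link: str
    error_code: Optional[str] = None
    error_message: Optional[str] = None


def describe_operation(op: OperationInfo, timeout: int) -> str:
    """Build a timeout message that carries the operation's own details."""
    parts = [
        f"operation {op.name} timed out after {timeout}s",
        f"status={op.status}",
        f"type={op.operation_type}",
        f"target={op.target_link}",
    ]
    # Only mention an error when the operation actually reported one.
    if op.error_code or op.error_message:
        parts.append(f"error={op.error_code}: {op.error_message}")
    return ", ".join(parts)
```

Keeping the formatting in one small function makes it easy to reuse the same text for both the raised exception and the log line.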
Enhanced wait_for_extended_operation to query GCP audit logs when operations fail or time out. When instance creation fails, the function now fetches relevant audit log entries to provide additional context about the failure (e.g., BACKEND_ERROR, QUOTA_EXCEEDED).

Added get_operation_audit_logs method to GceLoggingClient to query activity logs by operation ID. The method queries cloudaudit activity logs, which contain error details such as status codes and messages that aren't available in the operation object itself.

This helps diagnose instance creation failures by surfacing errors from GCP's audit logs, such as backend errors, quota issues, or resource exhaustion, that may not be clearly indicated in the operation response.

Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
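A minimal sketch of what querying activity logs by operation ID could involve is shown below. It is not the PR's implementation of `get_operation_audit_logs`: the helper names `build_audit_log_filter` and `extract_errors` are hypothetical, and the entries are assumed to be plain dicts in the Cloud Logging `LogEntry` JSON shape (`protoPayload.status.code` and `protoPayload.status.message`), whereas the real method may work with client-library objects.

```python
from typing import Any, Dict, Iterable, List


def build_audit_log_filter(project_id: str, operation_id: str) -> str:
    """Build a Cloud Logging filter for one operation's activity log entries."""
    # The "/" in the log ID is URL-encoded as %2F inside logName.
    log_name = f"projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"
    return f'logName="{log_name}" AND operation.id="{operation_id}"'


def extract_errors(entries: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect non-OK statuses from audit log entries (assumed dict-shaped)."""
    errors = []
    for entry in entries:
        status = entry.get("protoPayload", {}).get("status", {})
        code = status.get("code", 0)
        if code:  # google.rpc.Code 0 means OK; anything else is an error
            errors.append(f"code {code}: {status.get('message', '')}")
    return errors
```

The filter string would be passed to whatever log-listing call the client uses; entries without a status, or with an OK status, are skipped so that only failures reach the error message.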
bf3dfaa to 633ce75
Description
GCP operations fail with generic timeout/error messages while critical diagnostics sit in audit logs. Example: `BACKEND_ERROR` with status code 13 from `cloudaudit.googleapis.com/activity` never surfaces to developers, requiring manual GCP console inspection.

Changes:
- `GceLoggingClient.get_operation_audit_logs()` queries activity logs by operation ID
- `wait_for_extended_operation()` automatically fetches and includes audit log errors when operations fail or time out
- `create_instance()` passes instance context for audit log queries

Before:
After:
Surfaces GCP-specific errors (BACKEND_ERROR, QUOTA_EXCEEDED, RESOURCE_EXHAUSTED) immediately instead of requiring audit log inspection.
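
The overall flow described above (wait for the operation, and on failure attach audit log errors to the raised exception) could look roughly like the sketch below. All names here (`OperationFailed`, `wait_with_audit_context`, the `is_done` and `fetch_audit_errors` callables) are hypothetical and chosen only for illustration; the real `wait_for_extended_operation()` signature is not shown in this PR conversation. The sketch also assumes that a failed audit log lookup should not hide the original timeout.

```python
import time
from typing import Callable, List


class OperationFailed(Exception):
    """Hypothetical exception type for this sketch."""


def wait_with_audit_context(
    is_done: Callable[[], bool],
    fetch_audit_errors: Callable[[], List[str]],
    timeout: float,
    poll_interval: float = 0.01,
) -> None:
    """Poll until done; on timeout, raise with audit log errors attached."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_done():
            return
        time.sleep(poll_interval)
    try:
        audit_errors = fetch_audit_errors()
    except Exception as exc:  # diagnostics must never mask the original failure
        audit_errors = [f"audit log lookup failed: {exc}"]
    details = "; ".join(audit_errors) or "no audit log errors found"
    raise OperationFailed(f"operation timed out after {timeout}s ({details})")
```

Injecting the audit log lookup as a callable keeps the waiting logic testable without GCP access, and lets the caller bind the project and operation ID however it likes.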
Testing
PR pre-checks (self review)
`backport` labels
Reminders
`sdcm/sct_config.py`)
`unit-test/` folder)