fix(observability): resolve A2A and plugin post-invoke silent-success traps#4238
Open
bogdanmariusc10 wants to merge 2 commits intomainfrom
Conversation
Two pre-existing bugs caused metrics_buffer.record_tool_metric(success=True)
to be recorded for responses the caller sees as errors, inflating tool success
rates and masking real failures in observability.
Trap 1 - A2A 200-OK branch ignores JSONRPC error envelopes:
- Added detection for JSONRPC error envelopes ({"error": {...}})
- Added detection for application-level error indicators ({"isError": true})
- Route A2A responses through _coerce_to_tool_result for uniform handling
- Compute success = not tool_result.is_error after coercion
- Explicitly set success = False in non-200 branch (no scope fall-through)
Trap 2 - Plugin TOOL_POST_INVOKE hook silently flips error state:
- Preserve is_error during ToolResult reconstruction from plugin payload
- Check both camelCase (isError) and snake_case (is_error) aliases
- Re-sync success = not tool_result.is_error after plugin post-invoke
- Ensures metrics reflect plugin intent (recovery/validation plugins)
Regression tests added:
- test_a2a_jsonrpc_error_envelope_records_success_false_in_metrics
- test_a2a_application_level_is_error_true_records_success_false
- test_a2a_non_200_response_explicitly_sets_success_false
- test_plugin_post_invoke_flips_error_to_success_records_success_true
- test_plugin_post_invoke_flips_success_to_error_records_success_false
Fixes symmetric with REST and direct-proxy paths fixed in #4202.
Natural follow-ups to #4202, addressing A2A and plugin-hook paths.
Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🔗 Related Issue
Closes #4211
📝 Summary
Fixes two pre-existing silent-success traps that caused
metrics_buffer.record_tool_metric(success=True)to be recorded for responses the caller sees as errors, inflating tool success rates and masking real failures in observability.Trap 1 - A2A 200-OK branch ignores JSONRPC error envelopes:
The A2A integration path unconditionally set
is_error=Falseandsuccess=Truefor all 200-OK responses, regardless of whether the response body contained a JSONRPC{"error": {...}}envelope or application-level error indicators like{"isError": true}.Trap 2 - Plugin TOOL_POST_INVOKE hook silently flips error state:
The plugin post-invoke reconstruction dropped
is_errorentirely (Pydantic defaults to False), and thesuccessflag computed before plugin invocation was never re-synced with the plugin-modified state. This caused validation plugins that flip success→error to have their intent erased, and recovery plugins that flip error→success to still recordsuccess=False.Both fixes are symmetric with the REST and direct-proxy paths corrected in #4202.
🏷️ Type of Change
🧪 Verification
make lintmake testmake coverage✅ Checklist
make black isort pre-commit)📓 Notes
Implementation Details
Trap 1 Fix (A2A) -
mcpgateway/services/tool_service.pylines 5536-5559:{"error": {...}}){"isError": true}or{"is_error": true})_coerce_to_tool_resultfor uniform handlingsuccess = not tool_result.is_errorafter coercionsuccess = Falsein non-200 branch (no scope fall-through)Trap 2 Fix (Plugin Post-Invoke) -
mcpgateway/services/tool_service.pylines 5584-5599:is_errorduring ToolResult reconstruction from plugin payloadisError) and snake_case (is_error) aliasessuccess = not tool_result.is_errorafter plugin post-invoke completesRegression Tests Added
Five new tests in
tests/unit/mcpgateway/services/test_tool_service_coverage.py(lines 7231-7820):A2A Error Handling:
test_a2a_jsonrpc_error_envelope_records_success_false_in_metrics- JSONRPC error envelope detectiontest_a2a_application_level_is_error_true_records_success_false- Application-level error indicator detectiontest_a2a_non_200_response_explicitly_sets_success_false- Non-200 HTTP status handlingPlugin Post-Invoke Error State:
4.
test_plugin_post_invoke_flips_error_to_success_records_success_true- Recovery plugin pattern5.
test_plugin_post_invoke_flips_success_to_error_records_success_false- Validation plugin patternImpact
Related PRs