Fix false client-termination detection and add channel close observability metrics #3200

Open
amakutunowicz wants to merge 5 commits into linkedin:master from amakutunowicz:artur/fix-netty-client-termination-observability

amakutunowicz (Contributor) commented Feb 15, 2026

Summary

This PR improves Netty error handling for client-termination detection and adds observability to distinguish true disconnects from false positives.

Behavioral fixes

  • Distinguish possible client termination on active vs inactive channels.
  • Return 500 Internal Server Error when the channel is still active (retryable false-positive path).
  • Classify client termination on a truly inactive channel as 400 Bad Request for accounting only, and short-circuit without attempting a write.
  • Ensure paths that proceed to response construction for possible client termination use 500, avoiding accidental 4xx responses to still-connected clients.
  • Fix potential NPE when building failure reason headers from exceptions with null messages.
  • Preserve existing keep-alive/connection-close behavior (no policy change to CLOSE_CONNECTION_ERROR_STATUSES).
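The status selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `ClientTerminationSketch`, the `Decision` type, and both method names are hypothetical, and the fallback to the exception's class name for a null message is an assumption about how the NPE could be avoided.

```java
// Hypothetical sketch; none of these names come from the Ambry code base.
final class ClientTerminationSketch {
  static final int INTERNAL_SERVER_ERROR = 500;
  static final int BAD_REQUEST = 400;

  /** Outcome of handling a possible client termination. */
  static final class Decision {
    final int status;           // status used for the response or for accounting
    final boolean attemptWrite; // whether an error response should be written

    Decision(int status, boolean attemptWrite) {
      this.status = status;
      this.attemptWrite = attemptWrite;
    }
  }

  /**
   * Active channel: the client is still connected, so treat the detection as a
   * false positive and answer with a retryable 500.
   * Inactive channel: record 400 for accounting only and skip the write.
   */
  static Decision decide(boolean channelActive) {
    if (channelActive) {
      return new Decision(INTERNAL_SERVER_ERROR, true);
    }
    return new Decision(BAD_REQUEST, false);
  }

  /**
   * Null-safe failure reason for a response header. Falling back to the
   * exception's simple class name is an assumption made for this sketch.
   */
  static String failureReason(Throwable cause) {
    String message = cause.getMessage();
    return message != null ? message : cause.getClass().getSimpleName();
  }
}
```

In a real Netty handler, the `channelActive` argument would typically come from `ctx.channel().isActive()`.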

Observability additions

  • Added counters for:
    • channelInactive events, split by whether a request was active.
    • Client termination, split by active vs inactive channel.
    • Error responses sent vs not sent.
    • Exception type breakdown (Connection reset, Broken pipe, ClosedChannelException, SSLException, other IOException).
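One way to picture the exception type breakdown is a small classifier feeding per-category counters. The sketch below is an assumption about how such a breakdown could work, not the PR's implementation: `CloseCauseSketch`, the `CloseCause` enum, and the message-substring matching for "Connection reset" and "Broken pipe" are all hypothetical.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import javax.net.ssl.SSLException;

// Hypothetical sketch; names and matching rules are illustrative assumptions.
final class CloseCauseSketch {
  enum CloseCause { CONNECTION_RESET, BROKEN_PIPE, CLOSED_CHANNEL, SSL, OTHER_IO, UNKNOWN }

  // One counter per category, created up front so lookups never return null.
  private final Map<CloseCause, AtomicLong> counters = new EnumMap<>(CloseCause.class);

  CloseCauseSketch() {
    for (CloseCause cause : CloseCause.values()) {
      counters.put(cause, new AtomicLong());
    }
  }

  /** Maps an exception to a category. Subclasses of IOException are checked first. */
  static CloseCause classify(Throwable cause) {
    if (cause instanceof ClosedChannelException) {
      return CloseCause.CLOSED_CHANNEL;
    }
    if (cause instanceof SSLException) {
      return CloseCause.SSL;
    }
    if (cause instanceof IOException) {
      String message = cause.getMessage(); // may be null
      if (message != null && message.contains("Connection reset")) {
        return CloseCause.CONNECTION_RESET;
      }
      if (message != null && message.contains("Broken pipe")) {
        return CloseCause.BROKEN_PIPE;
      }
      return CloseCause.OTHER_IO;
    }
    return CloseCause.UNKNOWN;
  }

  /** Classifies the exception and increments the matching counter. */
  CloseCause record(Throwable cause) {
    CloseCause category = classify(cause);
    counters.get(category).incrementAndGet();
    return category;
  }

  long count(CloseCause category) {
    return counters.get(category).get();
  }
}
```

The order of the `instanceof` checks matters here because `ClosedChannelException` and `SSLException` both extend `IOException`, so the generic branch has to come last.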

Logging improvements

  • Added richer context in exceptionCaught and client-termination logs, including channel active state.

Test updates

  • Updated clientEarlyTerminationTest to assert 500 on active channel false-positive detection.
  • Added clientTerminationOnInactiveChannelTest.
  • Added errorResponseMetricsTest.
  • Added errorResponseDeliveryMatrixTest covering 4xx/5xx behavior on active/inactive channels.
  • Updated MockNettyMessageProcessor expected response status for early client termination on active channel.
  • Updated errorResponseTrackingHeadersTest to correctly handle channel lifecycle based on close policy.

Local validation

  • :ambry-rest:test --tests com.github.ambry.rest.NettyResponseChannelTest
  • :ambry-rest:test --tests com.github.ambry.rest.NettyMessageProcessorTest
  • :ambry-rest:test

All tests above were executed with JDK 17 (/tmp/jdks/temurin17/jdk-17.0.18+8).

codecov-commenter commented Feb 15, 2026

Codecov Report

❌ Patch coverage is 78.46154% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.81%. Comparing base (52ba813) to head (8bfec48).
⚠️ Report is 344 commits behind head on master.

Files with missing lines                                  Patch %   Lines
...a/com/github/ambry/rest/NettyMessageProcessor.java     45.00%    8 Missing and 3 partials ⚠️
...va/com/github/ambry/rest/NettyResponseChannel.java     86.95%    1 Missing and 2 partials ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3200      +/-   ##
============================================
+ Coverage     64.24%   69.81%   +5.56%     
- Complexity    10398    12819    +2421     
============================================
  Files           840      930      +90     
  Lines         71755    79148    +7393     
  Branches       8611     9470     +859     
============================================
+ Hits          46099    55257    +9158     
+ Misses        23004    20939    -2065     
- Partials       2652     2952     +300     
