[Cosmos] HTTP/2 PING health check: replace per-channel ScheduledFuture with per-EventLoop scanner #49389

@jeet1995

Background

PR #49095 introduces an HTTP/2 PING-based broken-connection health check for the Cosmos gateway transport (Http2PingHandler). In the current design, each parent H2 channel installs its own handler, and handlerAdded schedules a periodic check via ctx.executor().scheduleAtFixedRate(...). This means one ScheduledFuture is created per parent H2 connection.

This is functionally correct and mirrors Netty's own IdleStateHandler (which is also per-channel), so it is a fine starting point. This issue tracks a scalability follow-up, not a defect in the merged PR.

Primary trigger: HTTP/2 flipping to default-on

The handler is installed whenever Http2PingHandler.isPingHealthEffectivelyEnabled returns true, which requires all three of: the PING kill-switch allowing it, a ping interval > 0, and Http2ConnectionConfig.isEffectivelyEnabled(). Beyond PING's own kill-switch and interval, the only gate is HTTP/2 being effectively enabled. There is no thin-client restriction; any connection on a client with H2 enabled installs the handler.

Today HTTP/2 is in preview and off by default (Configs.DEFAULT_HTTP2_ENABLED = false; Http2ConnectionConfig.setEnabled(...) javadoc: "the default value (false while in preview, true later) will be applied"). So the per-channel ScheduledFuture footprint is currently limited to clients that explicitly opt into H2.

When that default flips to true, the handler installs on every gateway H2 parent channel across every CosmosClient in the process — with no opt-in gate. That is the primary, near-certain amplifier of the scaling problem below; the multi-client scenario is a secondary case that stacks on top of it.
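The gating and the effect of the default flip can be sketched as below. This is a minimal illustration, not the SDK's code: the class, method, and parameter names are hypothetical, the millisecond unit for the interval is an assumption, and the override-then-fallback shape is inferred from the isEffectivelyEnabled() description in the References.

```java
public class PingGateSketch {
    /**
     * Hypothetical model of Http2ConnectionConfig.isEffectivelyEnabled():
     * a per-client override wins; null means "not set", so the global default applies.
     */
    static boolean http2EffectivelyEnabled(Boolean perClientOverride, boolean globalDefault) {
        return perClientOverride != null ? perClientOverride : globalDefault;
    }

    /** Hypothetical model of the three-way gate: kill-switch AND interval > 0 AND H2 enabled. */
    static boolean pingHandlerInstalled(boolean pingHealthAllowed,
                                        long pingIntervalMs, // unit is an assumption
                                        Boolean perClientOverride,
                                        boolean globalDefault) {
        return pingHealthAllowed
            && pingIntervalMs > 0
            && http2EffectivelyEnabled(perClientOverride, globalDefault);
    }

    public static void main(String[] args) {
        // Client with no explicit H2 setting, before and after the global default flips.
        System.out.println(pingHandlerInstalled(true, 10_000, null, false));
        System.out.println(pingHandlerInstalled(true, 10_000, null, true));
    }
}
```

In this model a client that never touched the H2 setting goes from "no handler" to "handler on every parent channel" purely because the global default changed.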

Problem

The number of scheduled timer tasks scales with channels, while the number of EventLoops is fixed and bounded by CPU (reactor-netty's default LoopResources, ~max(cores, 4) workers; the gateway client does not call .runOn(...), so it inherits the shared default loop group).

Because EventLoop count is a fixed denominator, channels concentrate onto a small set of loops rather than spreading out:

  • H2 pool ceiling is DEFAULT_HTTP2_MAX_CONNECTION_POOL_SIZE = 1000 connections per client.
  • Channel count further multiplies by distinct endpoints and by the number of CosmosClient instances in the process.
  • Each scheduled task runs on the EventLoop I/O thread, so periodic checks and socket I/O for all channels on that loop share one thread.

At high fan-out (e.g. a multi-tenant host with hundreds of CosmosClient instances), this produces a large per-loop scheduled-task-queue depth and a per-channel object/ScheduledFutureTask footprint that grows without any ceiling tied to the EventLoop count.
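To make the scaling concrete, here is a minimal JDK-only model, not Netty code. A single-threaded ScheduledThreadPoolExecutor stands in for one EventLoop, and each simulated channel schedules its own fixed-rate task, roughly as the current handlerAdded does. The class and method names are hypothetical.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PerChannelTimerModel {
    /** Schedules one periodic task per simulated channel and returns the scheduler's queue depth. */
    static int queueDepthFor(int channels) {
        // Assumption: one single-threaded scheduler approximates one EventLoop.
        ScheduledThreadPoolExecutor loop = new ScheduledThreadPoolExecutor(1);
        try {
            for (int i = 0; i < channels; i++) {
                // Long initial delay so no task becomes due while the queue is measured.
                loop.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.HOURS);
            }
            return loop.getQueue().size();
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("10 channels   -> queued tasks: " + queueDepthFor(10));
        System.out.println("1000 channels -> queued tasks: " + queueDepthFor(1000));
    }
}
```

The queue depth on the single loop tracks the channel count one-to-one, which is the growth the proposal below aims to remove.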

Proposed design: one scanner per EventLoop

Move scheduling off the per-channel handler and onto a shared per-EventLoop scanner so the timer count tracks loops (bounded) instead of channels (unbounded):

  • Keep Http2PingHandler in each channel's pipeline — it must still intercept inbound PING-ACK frames and track lastReadNanos. The per-channel state object is unavoidable.
  • Remove the per-handler scheduleAtFixedRate.
  • Introduce a shared registry ConcurrentMap<EventExecutor, LoopPingState>, where each LoopPingState holds a Set<Http2PingHandler> plus a single ScheduledFuture.
  • handlerAdded → computeIfAbsent(ctx.executor()), add self to the loop's set, and schedule the single scanner only when the state is newly created.
  • Scanner tick → iterate the loop's handlers and run the existing per-channel check logic (maybeSendPing, extracted into a runPingCheck(now) method).
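  • handlerRemoved / channelInactive → deregister; the scanner self-cancels when it finds its set empty, and removes its registry entry so that a later handlerAdded on the same loop creates fresh state and reschedules.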
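The steps above can be sketched with JDK types only. This is an illustration under stated assumptions, not the proposed implementation: ScheduledExecutorService stands in for Netty's EventExecutor, PingTarget stands in for Http2PingHandler, and register / deregister / scan are hypothetical names for what handlerAdded, handlerRemoved, and the scanner tick would do. Removing the registry entry on self-cancel is an assumption of this sketch, so that a later registration on the same loop reschedules.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PerLoopScannerSketch {
    /** Hypothetical stand-in for the per-channel state held by Http2PingHandler. */
    interface PingTarget {
        void runPingCheck(long nowNanos);
    }

    /** Per-loop state; only touched from its own loop thread, so a plain HashSet is enough. */
    static final class LoopPingState {
        final Set<PingTarget> targets = new HashSet<>();
        ScheduledFuture<?> scanner;
    }

    /** Only this top-level map is shared across loop threads. */
    static final ConcurrentMap<ScheduledExecutorService, LoopPingState> REGISTRY =
        new ConcurrentHashMap<>();

    /** Intended to run on the loop thread (mirrors handlerAdded). */
    static void register(ScheduledExecutorService loop, PingTarget target, long intervalMs) {
        LoopPingState state = REGISTRY.computeIfAbsent(loop, k -> new LoopPingState());
        state.targets.add(target);
        if (state.scanner == null) { // newly created state: schedule the single scanner
            state.scanner = loop.scheduleAtFixedRate(
                () -> scan(loop, state), intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Intended to run on the loop thread (mirrors handlerRemoved / channelInactive). */
    static void deregister(ScheduledExecutorService loop, PingTarget target) {
        LoopPingState state = REGISTRY.get(loop);
        if (state != null) {
            state.targets.remove(target);
        }
    }

    /** One scanner tick: self-cancel on an empty set, otherwise check every target. */
    static void scan(ScheduledExecutorService loop, LoopPingState state) {
        if (state.targets.isEmpty()) {
            state.scanner.cancel(false);
            REGISTRY.remove(loop, state);
            return;
        }
        long now = System.nanoTime();
        // Iterate over a copy: a check may close its channel and deregister mid-scan.
        for (PingTarget target : new ArrayList<>(state.targets)) {
            target.runPingCheck(now);
        }
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        PingTarget a = now -> { };
        PingTarget b = now -> { };
        loop.submit(() -> { register(loop, a, 10); register(loop, b, 10); }).get();
        System.out.println("scanners after register: " + REGISTRY.size());
        loop.submit(() -> { deregister(loop, a); deregister(loop, b); }).get();
        Thread.sleep(100); // allow a tick so the scanner can notice the empty set
        System.out.println("scanners after deregister: " + REGISTRY.size());
        loop.shutdownNow();
    }
}
```

Two targets on one loop share one scheduled task. The snapshot copy in scan is a design choice of the sketch: it avoids a ConcurrentModificationException if a check ends up closing its own channel.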

Why it is safe

The scanner runs on the loop thread, and so do handlerAdded / handlerRemoved for every channel assigned to that loop. Therefore the per-loop set and the empty→cancel decision are single-threaded per loop and require no locks. Only the top-level map is cross-thread, which ConcurrentHashMap covers.

Result: ScheduledFuture count = active EventLoops = O(cores), flat across both channels and clients.

Alternatives considered

  • Status quo (per-channel ScheduledFuture) — matches Netty IdleStateHandler; correct, but the timer/object count scales with channels. This is the same pattern high-fanout deployments typically outgrow.
  • HashedWheelTimer — also collapses the timer count, but runs on its own thread, forcing an eventLoop().execute(...) hop back onto the channel's loop for every channel-state access. The per-EventLoop scanner keeps everything on the loop thread and avoids that hop.
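The extra hop in the HashedWheelTimer alternative can be illustrated with a JDK-only model. This is a sketch, not Netty code: a separate single-threaded scheduler stands in for the wheel timer's thread, a second executor stands in for the channel's EventLoop, and the class name and thread names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimerHopSketch {
    /** Returns { thread the timer fired on, thread the channel check ran on }. */
    static String[] fireAndHop() {
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> new Thread(r, "timer"));
        ExecutorService loop =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "loop"));
        CompletableFuture<String[]> result = new CompletableFuture<>();
        timer.schedule(() -> {
            String firedOn = Thread.currentThread().getName();
            // Channel state is confined to the loop thread, so the timer thread
            // cannot touch it directly and must hand the check off.
            loop.execute(() -> result.complete(
                new String[] { firedOn, Thread.currentThread().getName() }));
        }, 10, TimeUnit.MILLISECONDS);
        try {
            return result.join();
        } finally {
            timer.shutdownNow();
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String[] threads = fireAndHop();
        System.out.println("fired on: " + threads[0] + ", check ran on: " + threads[1]);
    }
}
```

In this model every check costs one cross-thread hand-off; with a scanner scheduled on the loop itself, the tick already runs on the thread that owns the channel state.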

Acceptance criteria

  • Http2PingHandler no longer schedules a per-channel ScheduledFuture; scheduling is owned by a per-EventLoop scanner.
  • Active scheduled tasks for PING health = number of active EventLoops, independent of channel and client count (verifiable in a unit/diagnostic test).
  • PING send / ACK tracking, failure-threshold counting, and connection close-on-threshold behavior are unchanged.
  • Scanner is correctly cancelled when a loop has no remaining channels (no leak across client open/close cycles on the shared default loop group).
  • Existing PING health tests (including the network-fault lifecycle tests) continue to pass.

References

  • Follow-up to PR #49095, "Add HTTP/2 PING for broken connection detection."
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2PingHandler.java: per-channel scheduling in handlerAdded; pingTask field; maybeSendPing.
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/ReactorNettyClient.java: handler install path.
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Configs.java: DEFAULT_HTTP2_MAX_CONNECTION_POOL_SIZE, PING defaults, and DEFAULT_HTTP2_ENABLED = false (H2 preview default-off).
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/Http2ConnectionConfig.java: isEffectivelyEnabled() (per-client override falling back to the global flag); setEnabled(...) javadoc noting the default flips to true post-preview.

Metadata

    Labels

    Client, Cosmos, Service Attention, needs-team-attention
