Commit 87e5223
fix(csharp): add error resilience to heartbeat poller (#372)
## Summary
- Wraps the `PollOperationStatus` polling loop in a try-catch so that
transient exceptions (e.g. `ObjectDisposedException` from TLS connection
recycling) no longer kill the heartbeat poller silently
- Adds a **max consecutive failure limit** (`MaxConsecutiveFailures = 10`)
so persistent errors (auth expired, server gone) don't cause infinite
polling. At the default 60s heartbeat interval this gives ~10 minutes of
tolerance before the poller stops itself
- A single successful poll resets the failure counter, so intermittent
transient errors are handled gracefully
- Logs errors via `Activity.Current?.AddEvent()` telemetry with error
type, message, poll count, and consecutive failure count
- Properly handles `OperationCanceledException` from the cancellation
token to still allow graceful shutdown
- Updates the `StopsPollingOnException` test to
`ContinuesPollingOnException` to match the new resilient behavior
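
The behavior listed above can be illustrated with a minimal sketch. This is not the code from this commit: `HeartbeatPollerSketch`, `PollAsync`, the `pollOnce` delegate (a stand-in for one `GetOperationStatus` call), and the event and tag names are all hypothetical, and catching cancellation around `Task.Delay` is an assumption made only to keep the sketch self-contained.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class HeartbeatPollerSketch
{
    public const int MaxConsecutiveFailures = 10;

    // pollOnce is a hypothetical stand-in for one status request; it returns
    // true once the operation has reached a terminal state.
    // Returns the number of polls attempted.
    public static async Task<int> PollAsync(
        Func<CancellationToken, Task<bool>> pollOnce,
        TimeSpan interval,
        CancellationToken cancellationToken)
    {
        int pollCount = 0;
        int consecutiveFailures = 0;

        while (!cancellationToken.IsCancellationRequested)
        {
            try
            {
                pollCount++;
                bool terminal = await pollOnce(cancellationToken).ConfigureAwait(false);
                consecutiveFailures = 0; // a single success resets the counter
                if (terminal) break;
            }
            catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
            {
                break; // graceful shutdown requested by the caller
            }
            catch (Exception ex)
            {
                consecutiveFailures++;
                // Event and tag names are illustrative, not the driver's actual names.
                Activity.Current?.AddEvent(new ActivityEvent(
                    "heartbeat_poll_error",
                    tags: new ActivityTagsCollection
                    {
                        { "error.type", ex.GetType().Name },
                        { "error.message", ex.Message },
                        { "poll_count", pollCount },
                        { "consecutive_failures", consecutiveFailures },
                    }));
                if (consecutiveFailures >= MaxConsecutiveFailures) break;
            }

            // The delay sits after the try-catch, so every `break` above skips it.
            try
            {
                await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                break;
            }
        }

        return pollCount;
    }
}
```

With a `pollOnce` that throws once and then succeeds, the loop keeps going. With one that always throws, it stops after `MaxConsecutiveFailures` attempts.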
## Context
Without this fix, a single transient network error permanently stops the
heartbeat poller. The server-side `commandInactivityTimeout` (default 20
minutes) then expires because no `GetOperationStatus` calls refresh it,
causing the server to terminate the query. This manifests as CloudFetch
failures in Power BI (ES-1778880).
## Design decisions
- **Why not `finally` for `Task.Delay`?** A `finally` block runs even on
`break`, which would add an unnecessary 60s delay on every clean exit
path (terminal state, cancellation, null handle). Placing the delay
after the try-catch means it only executes when the loop continues.
- **Why 10 max failures?** At 60s intervals, 10 failures = ~10 minutes —
enough to ride out transient network issues but not so long that a
permanently broken connection wastes resources indefinitely.
- **Request timeouts are treated as transient errors.** A request that
exceeds the per-request `GetOperationStatusTimeoutToken` throws
`OperationCanceledException`, but the cancellation filter (`when
(cancellationToken.IsCancellationRequested)`) does not match because the
main token isn't cancelled, so the exception falls through to the general
catch.
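
The exception-filter routing described in the last bullet can be sketched in isolation. This is an illustration under assumptions, not the driver's code: `CancellationFilterSketch` and `ClassifyAsync` are hypothetical names, `Task.Delay` stands in for the status request, and building the per-request timeout as a linked `CancellationTokenSource` is assumed rather than confirmed by the source.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationFilterSketch
{
    // Returns "success" when the simulated request completes, "shutdown" when
    // the caller's token was cancelled, and "transient" when only the
    // per-request timeout fired.
    public static async Task<string> ClassifyAsync(
        TimeSpan requestTimeout,
        TimeSpan requestDuration,
        CancellationToken cancellationToken)
    {
        // Assumed shape of a per-request timeout token: linked to the main
        // token, with its own deadline.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(requestTimeout);

        try
        {
            // Stand-in for the status request.
            await Task.Delay(requestDuration, timeoutCts.Token).ConfigureAwait(false);
            return "success";
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // The filter matches only when the main token is cancelled.
            return "shutdown";
        }
        catch (Exception)
        {
            // A timeout also raises OperationCanceledException, but the filter
            // above rejects it, so it lands here with other transient errors.
            return "transient";
        }
    }
}
```

Both cases raise the same exception type, so the filter on the main token is what separates a shutdown request from a timed-out request.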
## Test plan
- [x] Verify build succeeds (confirmed locally, 0 warnings, 0 errors)
- [x] Update `StopsPollingOnException` → `ContinuesPollingOnException`
to assert `pollCount > 1`
- [ ] Verify existing unit tests pass
- [ ] Manual validation: inject a transient exception during polling and
confirm the poller recovers and continues heartbeating
- [ ] Verify cancellation still stops the poller gracefully
- [ ] Verify persistent errors stop the poller after ~10 consecutive
failures
This pull request was AI-assisted by Isaac.

1 parent adf4853, commit 87e5223
2 files changed (+71, -30 lines changed):
- csharp
  - src/Reader
  - test/Unit