Commit 62c0aa7
Remove redundant DataStreamer retry mechanism (#3957)
* Remove internal daprovider server for AnyTrust
This change is a backport of 0195abd.
There were significant changes in this area that prevent it from being
applied cleanly but the idea is the same.
The internal daprovider server was originally added in PR #2533 to unify
the code paths between internal AnyTrust and external DA providers by
making both use RPC clients. After working with it for a while it's
clear that the drawbacks outweigh any benefit from it: it added an
unnecessary HTTP/RPC layer for in-process communication, introduced
timeout configuration complexity, and made retry behavior difficult to
reason about.
The proper abstraction point is the daprovider.Writer and Reader
interfaces, not the transport layer. External DA providers need RPC
because they're remote, but internal AnyTrust components can communicate
directly. This change removes the internal server entirely and connects
the Nitro node directly to the AnyTrust aggregator writer and REST
aggregator reader.
* Remove redundant DataStreamer retry mechanism
The DataStreamer had its own retry logic (5 attempts with exponential
backoff) on top of the underlying RPC client's retry mechanism (4
attempts with configurable retry patterns). This created a nested retry
system where:
- RPC client retries transient errors (timeouts, connection failures)
- DataStreamer retried ALL errors indiscriminately, including permanent
ones like "method does not exist"
This redundancy caused problems in production:
1. Blocked fallback logic: When a DAS backend doesn't support
chunked streaming, it returns "method does not exist". The
DataStreamer would retry this error 5 times over 30 seconds instead
of failing fast, preventing the intended fallback to legacy store
from executing before the overall request timeout.
2. Excessive retry attempts: Each RPC method (start/chunk/finalize)
could be attempted up to 20 times (5 DataStreamer × 4 RPC client),
wasting time and resources on non-retryable errors.
3. Error masking complexity: To properly handle permanent errors
while keeping DataStreamer retries, we would need error filtering at
TWO levels:
- RPC client level (already handles transient vs permanent)
- DataStreamer level (would need to duplicate this logic)
This duplication adds complexity with no benefit since the RPC client
already provides appropriate retry behavior.
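
The multiplication in the nested retry system can be reproduced with a small sketch. It assumes a simplified `retry` helper that retries every error and omits backoff; `retry`, `countNestedCalls`, and `errBackend` are hypothetical names for illustration, not code from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// errBackend is a placeholder for an error that never goes away.
var errBackend = errors.New("connection refused")

// retry calls fn up to attempts times and stops at the first success.
// It retries every error indiscriminately and omits backoff for brevity.
func retry(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}

// countNestedCalls reports how often the backend is hit when an outer
// retry loop wraps an inner one and the backend always fails.
func countNestedCalls(outer, inner int) int {
	calls := 0
	backend := func() error {
		calls++
		return errBackend
	}
	_ = retry(outer, func() error { return retry(inner, backend) })
	return calls
}

func main() {
	fmt.Println(countNestedCalls(5, 4)) // nested layers: prints 20
	fmt.Println(countNestedCalls(1, 4)) // single layer: prints 4
}
```

With the outer layer reduced to a single attempt, the backend sees only the inner layer's attempts.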
The RPC client's retry mechanism is sufficient:
- Retries on context.DeadlineExceeded
- Retries on connection errors (configurable regex pattern)
- Immediately fails on application errors (e.g., "method does not exist")
- Configurable timeout and retry count per deployment
Removed CLI flags:
- --*.data-stream.base-retry-delay
- --*.data-stream.max-retry-delay
- --*.data-stream.max-retry-attempts

1 parent 64f6515 commit 62c0aa7
2 files changed, +9 -83 lines changed