Commit 881440a
feat: Add reset() method, gRPC keep-alive, and transport resilience (#40)
* fix: Use gRPC DnsNameResolver for periodic DNS re-resolution
Arrow Flight's default FlightClient.Builder uses
NettyChannelBuilder.forAddress(SocketAddress), which calls
Location.toSocketAddress() -> new InetSocketAddress(host, port).
This eagerly resolves DNS once at construction time and registers a
DirectAddressNameResolverProvider that never re-resolves.
For long-lived clients connecting to load-balanced endpoints (e.g.
AWS ALBs) where backend IPs can change, this causes the gRPC channel
to get stuck on stale IPs indefinitely. If the old IP is recycled to
a different service, the client sees TLS certificate mismatches and
cannot recover without being fully reconstructed.
This change builds the gRPC ManagedChannel directly using
NettyChannelBuilder.forTarget("dns:///host:port") instead of going
through Arrow's FlightClient.Builder. The "dns:///" target scheme
activates gRPC's DnsNameResolver, which periodically re-resolves the
hostname (default 30s cache TTL) and triggers re-resolution on
transient failures via its refresh() method.
The FlightClient is then created via FlightGrpcUtils.createFlightClient()
with the custom channel.
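
To make the difference concrete, here is a minimal JDK-only sketch (no gRPC or Arrow dependency) of eager versus deferred hostname resolution. The class name `DnsTargetSketch` and the helper `dnsTarget()` are illustrative and do not appear in the change itself; the helper assumes a plain hostname or IPv4 literal.

```java
import java.net.InetSocketAddress;

/** Illustrates eager vs. deferred hostname resolution using only the JDK. */
final class DnsTargetSketch {

    /** Builds a "dns:///host:port" target string (hypothetical helper, no IPv6 bracket handling). */
    static String dnsTarget(String host, int port) {
        return "dns:///" + host + ":" + port;
    }

    public static void main(String[] args) {
        // new InetSocketAddress(host, port) resolves inside the constructor, so the
        // resulting address stays pinned to whatever IP was found at that moment.
        InetSocketAddress eager = new InetSocketAddress("127.0.0.1", 50051);
        System.out.println("eager unresolved?    " + eager.isUnresolved());

        // createUnresolved() keeps only the hostname; nothing is looked up yet.
        InetSocketAddress deferred =
                InetSocketAddress.createUnresolved("example.invalid", 50051);
        System.out.println("deferred unresolved? " + deferred.isUnresolved());

        // A target string hands resolution (and re-resolution) to a name resolver.
        System.out.println(dnsTarget("example.invalid", 50051));
    }
}
```

Passing a target string rather than a pre-resolved socket address is what leaves room for a resolver to look the hostname up again later.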
* feat: Add reset() method, gRPC keep-alive, and transport resilience
- Add public reset() method to SpiceClient for recovering from
unrecoverable transport failures (SSL cert mismatch, persistent
UNAVAILABLE errors, stale connections pinned to wrong backend IPs)
- Extract buildFlightClient() for reuse during construction and reset
- Configure HTTP/2 keep-alive (30s interval, 10s timeout) on the
gRPC channel to detect dead/idle connections behind load balancers
- Add ensureFlightClient() guard in queryInternal() for safety
- Handle null flightClient in close() after reset
- Add ResetTest.java with 20 unit tests covering happy path, edge
cases, concurrency, construction variants, and integration
- Document transport resilience and reset() usage in README
* fix: Resolve merge conflicts and fix TimeUnit ambiguity
- Resolve merge conflicts from trunk merge (buildFlightClient extraction)
- Fix TimeUnit import ambiguity: fully qualify java.util.concurrent.TimeUnit
at keepAlive call sites to avoid conflict with Arrow's TimeUnit
- Add closed guard to reset(): it throws IllegalStateException after close()
- Make close() idempotent with early return on already-closed client
- Update testResetAfterClose to expect IllegalStateException
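
The lifecycle rules above (shared build step, closed guard in reset(), idempotent close()) can be sketched roughly as follows. This is not the SpiceClient code: `ResettableClientSketch`, `buildTransport()`, and `buildCount()` are stand-ins invented for illustration, and a `Runnable` plays the role of the Flight client.

```java
/** Lifecycle sketch: reset() swaps the transport, close() is idempotent. */
final class ResettableClientSketch implements AutoCloseable {
    private volatile boolean closed;   // volatile so unsynchronized readers see close()
    private Runnable transport;        // stand-in for the real Flight client
    private int builds;                // number of transports built (for the demo)

    ResettableClientSketch() {
        transport = buildTransport();
    }

    /** Shared by the constructor and reset(). */
    private Runnable buildTransport() {
        builds++;
        return () -> { };
    }

    synchronized void reset() {
        if (closed) {
            throw new IllegalStateException("client is closed");
        }
        // A real client would also shut down the old transport here.
        transport = buildTransport();
    }

    @Override
    public synchronized void close() {
        if (closed) {
            return;                    // second and later calls are no-ops
        }
        closed = true;
        transport = null;
    }

    synchronized int buildCount() {
        return builds;
    }

    public static void main(String[] args) {
        ResettableClientSketch client = new ResettableClientSketch();
        client.reset();
        client.close();
        client.close();                // safe: idempotent
        System.out.println("transports built: " + client.buildCount());
    }
}
```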
* fix: Ensure FlightStream is closed after query in ResetTest
* chore: Add CI quality checks, address PR review comments
- Add SpotBugs, OWASP dependency-check, JaCoCo, Enforcer, Checkstyle plugins
- Add 'quality' CI job running static analysis and dependency scanning
- Add JaCoCo coverage reporting to build_multi_os CI job
- Make 'closed' field volatile for thread-safety
- Synchronize close() to prevent race with reset()/buildFlightClient()
- Add channel cleanup (shutdownNow) on buildFlightClient() failure
- Remove unused VectorSchemaRoot import in ResetTest
- Fix stale/misleading comments in ResetTest
* chore: Add permissions block to quality CI job
Address CodeQL finding: restrict GITHUB_TOKEN to contents:read.
* fix: Add SpotBugs exclusion filter and synchronize queryInternal method
* perf: Use snapshot-under-lock in queryInternal for concurrent query throughput
Instead of synchronizing the entire queryInternal() method (which would
serialize all gRPC RPCs), snapshot flightClient and authCallOptions under
a short synchronized block, then execute RPCs without holding the lock.
This allows concurrent queries to run in parallel while still being safe
against concurrent reset() calls.
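
A minimal sketch of the snapshot-under-lock idea, using only the JDK. The names here are assumptions for illustration: a `Function<String, String>` stands in for flightClient and a plain `String` for authCallOptions.

```java
import java.util.function.Function;

/** Snapshot shared state under a short lock, then do the slow work outside it. */
final class SnapshotUnderLockSketch {
    private final Object lock = new Object();
    private Function<String, String> transport;  // stand-in for flightClient
    private String callOptions;                  // stand-in for authCallOptions

    SnapshotUnderLockSketch(String callOptions) {
        install(callOptions);
    }

    /** Replaces both fields atomically with respect to query(). */
    void reset(String newCallOptions) {
        install(newCallOptions);
    }

    private void install(String options) {
        Function<String, String> fresh = sql -> "result(" + sql + ")";
        synchronized (lock) {
            transport = fresh;
            callOptions = options;
        }
    }

    String query(String sql) {
        final Function<String, String> t;
        final String options;
        synchronized (lock) {          // held only long enough to copy two references
            t = transport;
            options = callOptions;
        }
        // The "RPC" runs without the lock, so concurrent queries do not serialize.
        return options + ":" + t.apply(sql);
    }

    public static void main(String[] args) {
        SnapshotUnderLockSketch client = new SnapshotUnderLockSketch("token-1");
        System.out.println(client.query("SELECT 1"));
        client.reset("token-2");
        System.out.println(client.query("SELECT 1"));
    }
}
```

A query that took its snapshot before a reset() finishes against the old pair of references, and because both are copied under the same lock it never sees a transport from one generation with call options from another.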
* perf: Add HTTP connect timeout and double-checked locking for ADBC init
- Set 10s connect timeout on the static HttpClient used for dataset
refresh, preventing threads from blocking indefinitely on unreachable
endpoints.
- Replace synchronized initADBCIfNeeded() with double-checked locking
using a volatile adbcInitialized flag. After warmup, parameterized
queries skip the monitor entirely (volatile read only), eliminating
contention at high concurrency.
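
The double-checked locking described above might look roughly like this. `LazyInitSketch`, `expensiveInit()`, and the call counter are hypothetical stand-ins; only the volatile-flag pattern is taken from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

/** Double-checked locking with a volatile flag: one init, lock-free afterwards. */
final class LazyInitSketch {
    private final Object lock = new Object();
    private volatile boolean initialized;
    private final AtomicInteger initCalls = new AtomicInteger();

    void initIfNeeded() {
        if (initialized) {
            return;                    // fast path: a single volatile read
        }
        synchronized (lock) {
            if (initialized) {
                return;                // another thread finished while we waited
            }
            expensiveInit();
            initialized = true;        // publish only after init has completed
        }
    }

    private void expensiveInit() {
        initCalls.incrementAndGet();   // stand-in for the one-time setup work
    }

    int initCalls() {
        return initCalls.get();
    }

    public static void main(String[] args) {
        LazyInitSketch sketch = new LazyInitSketch();
        IntStream.range(0, 64).parallel().forEach(i -> sketch.initIfNeeded());
        System.out.println("init ran " + sketch.initCalls() + " time(s)");
    }
}
```

The flag must be volatile: without it, a thread on the fast path could observe `initialized == true` without being guaranteed to see the state written by the initializing thread.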
* chore: Cap gRPC inbound message size at ~2 GiB and metadata at 16 MiB
Extract named constants MAX_INBOUND_MESSAGE_SIZE (Integer.MAX_VALUE ≈ 2 GiB)
and MAX_INBOUND_METADATA_SIZE (16 MiB) to make the caps explicit and prevent
unbounded metadata from consuming heap on large dataset transfers.
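
For reference, the arithmetic behind the two caps, written as a small sketch. The holder class `InboundLimitsSketch` is hypothetical; the constant names and values follow the description above.

```java
/** The two caps as named constants, with their byte values spelled out. */
final class InboundLimitsSketch {
    /** 2^31 - 1 bytes, i.e. one byte short of 2 GiB. */
    static final int MAX_INBOUND_MESSAGE_SIZE = Integer.MAX_VALUE;

    /** 16 MiB = 16 * 1024 * 1024 bytes. */
    static final int MAX_INBOUND_METADATA_SIZE = 16 * 1024 * 1024;

    public static void main(String[] args) {
        System.out.println("message cap (bytes):  " + MAX_INBOUND_MESSAGE_SIZE);
        System.out.println("metadata cap (bytes): " + MAX_INBOUND_METADATA_SIZE);
    }
}
```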
* perf: Fix medium-priority performance issues (#5-#8)
- #5: Eliminate intermediate Object[] and ArrowType[] arrays in parameter
binding; build schema fields directly in a single pass and read values
from the original params array during vector population.
- #6: Close AdbcStatement eagerly after executeQuery() returns the reader.
The ArrowReader holds its own Flight stream and no longer needs the
statement, freeing server-side resources immediately instead of waiting
for slow consumers.
- #7: Create auth middleware once per RPC in HeaderAuthMiddlewareFactory
instead of re-creating it in each callback (onBeforeSendingHeaders,
onHeadersReceived, onCallCompleted). Eliminates 2 redundant allocations
per RPC.
- #8: Replace message.contains() string scanning in ADBC retry logic with
AdbcException.getStatus() switch on AdbcStatusCode enum (IO, UNKNOWN,
TIMEOUT, INTERNAL). Avoids string allocation and scanning on the
exception hot path.
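
Item #8 can be illustrated with a status-code switch. To stay dependency-free, the sketch below defines its own `Status` enum as a stand-in for AdbcStatusCode, and it assumes the four codes listed above are the ones treated as retryable; the real retry logic may differ.

```java
/** Classifies failures by status code instead of scanning exception messages. */
final class RetryClassifierSketch {

    /** Local stand-in for the ADBC status enum (assumption, not the real type). */
    enum Status { IO, UNKNOWN, TIMEOUT, INTERNAL, INVALID_ARGUMENT, NOT_FOUND }

    static boolean isRetryable(Status status) {
        switch (status) {
            case IO:
            case UNKNOWN:
            case TIMEOUT:
            case INTERNAL:
                return true;           // possibly transient: worth another attempt
            default:
                return false;          // caller errors will not fix themselves
        }
    }

    public static void main(String[] args) {
        for (Status status : Status.values()) {
            System.out.println(status + " retryable? " + isRetryable(status));
        }
    }
}
```

A switch on an enum compiles to a table lookup, so no strings are built or scanned when an exception is classified.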
* fix: Address PR review comments (round 3)
- Add closed guard to ensureFlightClient() to prevent rebuilding
transport after client is closed.
- Use local temporaries in initADBCIfNeeded() to avoid leaking
partially created AdbcDatabase/AdbcConnection on failure.
- Wrap FlightStream and ArrowReader in try-with-resources in
ResetTest to prevent resource leaks during integration runs.
- Add bounded 30s timeout to CountDownLatch.await() in concurrent
tests to prevent indefinite hangs on regression.
- Scope SpotBugs exclusions: limit example package exclusion to
DLS_DEAD_LOCAL_STORE, scope CT_CONSTRUCTOR_THROW to specific
classes (SpiceClient, SpiceClientBuilder).
- Fix README example to use try-with-resources for FlightStream.
- Add OWASP Dependency-Check data caching and NVD API key support
in CI to improve reliability and speed.
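
The "local temporaries" point from this round can be sketched as a two-stage initialization that assigns fields only after both stages succeed. `TwoStageInitSketch` and its `Resource` interface are hypothetical stand-ins for the AdbcDatabase/AdbcConnection pair.

```java
import java.util.function.Supplier;

/** Two-stage init that never leaves a half-built pair behind. */
final class TwoStageInitSketch {

    /** Stand-in for AdbcDatabase / AdbcConnection; close() throws no checked exception. */
    interface Resource extends AutoCloseable {
        @Override
        void close();
    }

    private Resource database;
    private Resource connection;

    synchronized void init(Supplier<Resource> openDatabase,
                           Supplier<Resource> openConnection) {
        Resource db = openDatabase.get();   // local only; fields stay untouched
        Resource conn;
        try {
            conn = openConnection.get();
        } catch (RuntimeException e) {
            db.close();                     // release stage one before rethrowing
            throw e;
        }
        database = db;                      // publish only when both succeeded
        connection = conn;
    }

    synchronized boolean isInitialized() {
        return database != null && connection != null;
    }

    public static void main(String[] args) {
        TwoStageInitSketch sketch = new TwoStageInitSketch();
        try {
            sketch.init(
                    () -> () -> System.out.println("database closed"),
                    () -> { throw new IllegalStateException("connection failed"); });
        } catch (IllegalStateException e) {
            System.out.println("initialized? " + sketch.isInitialized());
        }
    }
}
```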
* fix: Remove outdated commit message template
* fix: Address PR review comments (round 4)
- Close allocator in SpiceClient constructor if buildFlightClient() or
initRetryers() throws, preventing off-heap memory leaks on failed
client construction.
- Replace raw Thread usage in concurrent tests with ExecutorService
and proper shutdownNow() cleanup, preventing thread leaks and JVM
hangs on test timeout.
- Refactor integration tests to probe server availability in setUp()
and gate with a boolean flag (matching TpchIntegrationTest pattern)
instead of brittle exception message substring matching.
- Clarify README example: add comment explaining isTransportFailure()
is application-defined and suggest which exception types to check.
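
A rough sketch of the concurrent-test shape described above: an ExecutorService instead of raw threads, a bounded wait on a CountDownLatch, and shutdownNow() in a finally block. `BoundedAwaitSketch` and `runAll()` are illustrative names, not taken from ResetTest.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Runs tasks on a pool and waits for them with an upper time bound. */
final class BoundedAwaitSketch {

    /** Returns true if all workers finished within the timeout; workers must be > 0. */
    static boolean runAll(int workers, Runnable task, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(workers);
        try {
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> {
                    try {
                        task.run();
                    } finally {
                        done.countDown();      // count down even if the task throws
                    }
                });
            }
            // Bounded wait: a regression shows up as a failure, not a hung build.
            return done.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();                // interrupt stragglers, release threads
        }
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        boolean finished = runAll(4, counter::incrementAndGet, 30);
        System.out.println("finished=" + finished + ", runs=" + counter.get());
    }
}
```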
* fix: Fix CI failures - SpotBugs exclusion, integration test tables, remove stale .commitmsg
- Add CNT_ROUGH_CONSTANT_VALUE SpotBugs exclusion for example code
(3.14/3.14159265359 are intentional demo values for float params)
- Change ResetTest integration tests to use taxi_trips table (available
in CI quickstart dataset) instead of tpch.customer (not available)
- Remove .commitmsg that was accidentally re-added
* fix: Address Copilot review comments
- Make server availability probe static/one-time in ResetTest (avoids
redundant client+query per test method)
- Use CopyOnWriteArrayList in concurrent reset+query test for thread safety
- Add closed guard in queryWithParams() for consistent IllegalStateException
instead of confusing NPE after close()
---------
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
1 parent 9cf7937, commit 881440a
7 files changed
Lines changed: 944 additions & 80 deletions
File tree
- .github/workflows
- src
- main/java/ai/spice
- test/java/ai/spice