Skip to content

Commit 881440a

Browse files
feat: Add reset() method, gRPC keep-alive, and transport resilience (#40)
* fix: Use gRPC DnsNameResolver for periodic DNS re-resolution Arrow Flight's default FlightClient.Builder uses NettyChannelBuilder.forAddress(SocketAddress), which calls Location.toSocketAddress() -> new InetSocketAddress(host, port). This eagerly resolves DNS once at construction time and registers a DirectAddressNameResolverProvider that never re-resolves. For long-lived clients connecting to load-balanced endpoints (e.g. AWS ALBs) where backend IPs can change, this causes the gRPC channel to get stuck on stale IPs indefinitely. If the old IP is recycled to a different service, the client sees TLS certificate mismatches and cannot recover without being fully reconstructed. This change builds the gRPC ManagedChannel directly using NettyChannelBuilder.forTarget("dns:///host:port") instead of going through Arrow's FlightClient.Builder. The "dns:///" target scheme activates gRPC's DnsNameResolver, which periodically re-resolves the hostname (default 30s cache TTL) and triggers re-resolution on transient failures via its refresh() method. The FlightClient is then created via FlightGrpcUtils.createFlightClient() with the custom channel. * feat: Add reset() method, gRPC keep-alive, and transport resilience - Add public reset() method to SpiceClient for recovering from unrecoverable transport failures (SSL cert mismatch, persistent UNAVAILABLE errors, stale connections pinned to wrong backend IPs) - Extract buildFlightClient() for reuse during construction and reset - Configure HTTP/2 keep-alive (30s interval, 10s timeout) on the gRPC channel to detect dead/idle connections behind load balancers - Add ensureFlightClient() guard in queryInternal() for safety - Handle null flightClient in close() after reset - Add ResetTest.java with 20 unit tests covering happy path, edge cases, concurrency, construction variants, and integration - Document transport resilience and reset() usage in README * fix: Resolve merge conflicts and fix TimeUnit ambiguity - Resolve merge conflicts from trunk merge (buildFlightClient extraction) - Fix TimeUnit import ambiguity: fully qualify java.util.concurrent.TimeUnit at keepAlive call sites to avoid conflict with Arrow's TimeUnit - Add closed guard to reset() — throws IllegalStateException after close() - Make close() idempotent with early return on already-closed client - Update testResetAfterClose to expect IllegalStateException * fix: Ensure FlightStream is closed after query in ResetTest * chore: Add CI quality checks, address PR review comments - Add SpotBugs, OWASP dependency-check, JaCoCo, Enforcer, Checkstyle plugins - Add 'quality' CI job running static analysis and dependency scanning - Add JaCoCo coverage reporting to build_multi_os CI job - Make 'closed' field volatile for thread-safety - Synchronize close() to prevent race with reset()/buildFlightClient() - Add channel cleanup (shutdownNow) on buildFlightClient() failure - Remove unused VectorSchemaRoot import in ResetTest - Fix stale/misleading comments in ResetTest * chore: Add permissions block to quality CI job Address CodeQL finding: restrict GITHUB_TOKEN to contents:read. * fix: Add SpotBugs exclusion filter and synchronize queryInternal method * perf: Use snapshot-under-lock in queryInternal for concurrent query throughput Instead of synchronizing the entire queryInternal() method (which would serialize all gRPC RPCs), snapshot flightClient and authCallOptions under a short synchronized block, then execute RPCs without holding the lock. This allows concurrent queries to run in parallel while still being safe against concurrent reset() calls. * perf: Add HTTP connect timeout and double-checked locking for ADBC init - Set 10s connect timeout on the static HttpClient used for dataset refresh, preventing threads from blocking indefinitely on unreachable endpoints. - Replace synchronized initADBCIfNeeded() with double-checked locking using a volatile adbcInitialized flag. After warmup, parameterized queries skip the monitor entirely (volatile read only), eliminating contention at high concurrency. * chore: Cap gRPC inbound message size at ~2 GiB and metadata at 16 MiB Extract named constants MAX_INBOUND_MESSAGE_SIZE (Integer.MAX_VALUE ≈ 2 GiB) and MAX_INBOUND_METADATA_SIZE (16 MiB) to make the caps explicit and prevent unbounded metadata from consuming heap on large dataset transfers. * perf: Fix medium-priority performance issues (#5-#8) - #5: Eliminate intermediate Object[] and ArrowType[] arrays in parameter binding; build schema fields directly in a single pass and read values from the original params array during vector population. - #6: Close AdbcStatement eagerly after executeQuery() returns the reader. The ArrowReader holds its own Flight stream and no longer needs the statement, freeing server-side resources immediately instead of waiting for slow consumers. - #7: Create auth middleware once per RPC in HeaderAuthMiddlewareFactory instead of re-creating it in each callback (onBeforeSendingHeaders, onHeadersReceived, onCallCompleted). Eliminates 2 redundant allocations per RPC. - #8: Replace message.contains() string scanning in ADBC retry logic with AdbcException.getStatus() switch on AdbcStatusCode enum (IO, UNKNOWN, TIMEOUT, INTERNAL). Avoids string allocation and scanning on the exception hot path. * fix: Address PR review comments (round 3) - Add closed guard to ensureFlightClient() to prevent rebuilding transport after client is closed. - Use local temporaries in initADBCIfNeeded() to avoid leaking partially created AdbcDatabase/AdbcConnection on failure. - Wrap FlightStream and ArrowReader in try-with-resources in ResetTest to prevent resource leaks during integration runs. - Add bounded 30s timeout to CountDownLatch.await() in concurrent tests to prevent indefinite hangs on regression. - Scope SpotBugs exclusions: limit example package exclusion to DLS_DEAD_LOCAL_STORE, scope CT_CONSTRUCTOR_THROW to specific classes (SpiceClient, SpiceClientBuilder). - Fix README example to use try-with-resources for FlightStream. - Add OWASP Dependency-Check data caching and NVD API key support in CI to improve reliability and speed. * fix: Remove outdated commit message template * fix: Address PR review comments (round 4) - Close allocator in SpiceClient constructor if buildFlightClient() or initRetryers() throws, preventing off-heap memory leaks on failed client construction. - Replace raw Thread usage in concurrent tests with ExecutorService and proper shutdownNow() cleanup, preventing thread leaks and JVM hangs on test timeout. - Refactor integration tests to probe server availability in setUp() and gate with a boolean flag (matching TpchIntegrationTest pattern) instead of brittle exception message substring matching. - Clarify README example: add comment explaining isTransportFailure() is application-defined and suggest which exception types to check. * fix: Fix CI failures - SpotBugs exclusion, integration test tables, remove stale .commitmsg - Add CNT_ROUGH_CONSTANT_VALUE SpotBugs exclusion for example code (3.14/3.14159265359 are intentional demo values for float params) - Change ResetTest integration tests to use taxi_trips table (available in CI quickstart dataset) instead of tpch.customer (not available) - Remove .commitmsg that was accidentally re-added * fix: Address Copilot review comments - Make server availability probe static/one-time in ResetTest (avoids redundant client+query per test method) - Use CopyOnWriteArrayList in concurrent reset+query test for thread safety - Add closed guard in queryWithParams() for consistent IllegalStateException instead of confusing NPE after close() --------- Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
1 parent 9cf7937 commit 881440a

7 files changed

Lines changed: 944 additions & 80 deletions

File tree

.github/workflows/build.yaml

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -97,6 +97,14 @@ jobs:
9797
path: target/surefire-reports/
9898
retention-days: 7
9999

100+
- name: Upload coverage report
101+
if: always()
102+
uses: actions/upload-artifact@v4
103+
with:
104+
name: coverage-report-${{ matrix.os }}
105+
path: target/site/jacoco/
106+
retention-days: 7
107+
100108
build:
101109
runs-on: ubuntu-latest
102110
timeout-minutes: 20
@@ -178,3 +186,52 @@ jobs:
178186
name: test-results-jdk${{ matrix.java.version }}-${{ matrix.java.distribution }}
179187
path: target/surefire-reports/
180188
retention-days: 7
189+
190+
quality:
191+
name: Code quality checks
192+
runs-on: ubuntu-latest
193+
timeout-minutes: 30
194+
permissions:
195+
contents: read
196+
steps:
197+
- uses: actions/checkout@v4
198+
199+
- name: Set up JDK 17 (Oracle)
200+
uses: actions/setup-java@v4
201+
with:
202+
java-version: 17
203+
distribution: oracle
204+
cache: maven
205+
206+
- name: Build
207+
run: mvn install -DskipTests=true -Dgpg.skip -B -V
208+
209+
- name: Maven Enforcer
210+
run: mvn validate -B
211+
212+
- name: SpotBugs
213+
run: mvn spotbugs:check -B
214+
215+
- name: Checkstyle
216+
run: mvn checkstyle:check -B
217+
218+
- name: Cache OWASP Dependency-Check data
219+
uses: actions/cache@v4
220+
with:
221+
path: ~/.m2/repository/org/owasp/dependency-check-data
222+
key: dependency-check-data-${{ runner.os }}-${{ hashFiles('**/pom.xml') }}
223+
restore-keys: |
224+
dependency-check-data-${{ runner.os }}-
225+
226+
- name: OWASP Dependency-Check
227+
env:
228+
NVD_API_KEY: ${{ secrets.NVD_API_KEY }}
229+
run: mvn dependency-check:check -B
230+
231+
- name: Upload dependency-check report
232+
if: always()
233+
uses: actions/upload-artifact@v4
234+
with:
235+
name: dependency-check-report
236+
path: target/dependency-check-report.html
237+
retention-days: 30

README.md

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -213,6 +213,44 @@ SpiceClient client = SpiceClient.builder()
213213
.build();
214214
```
215215

216+
### Long-lived Clients and Transport Resilience
217+
218+
The `SpiceClient` is designed for long-lived reuse. The underlying gRPC channel uses `dns:///` resolution, which periodically re-resolves hostnames so clients automatically recover from load-balancer IP rotation (e.g. AWS NLB). HTTP/2 keep-alive is enabled by default (30s interval, 10s timeout) to detect dead connections quickly.
219+
220+
For the rare case where the transport becomes permanently stuck (e.g. TLS handshake to a wrong backend, persistent `UNAVAILABLE` after retries), use `reset()` to discard the bad connection and immediately establish a fresh one:
221+
222+
```java
223+
SpiceClient client = SpiceClient.builder()
224+
.withApiKey(API_KEY)
225+
.withSpiceCloud()
226+
.build();
227+
228+
// Long-lived usage with transport recovery.
229+
// isTransportFailure() is application-defined; check for
230+
// io.grpc.StatusRuntimeException with Status.UNAVAILABLE,
231+
// SSLHandshakeException, or similar transport-level errors.
232+
try {
233+
try (FlightStream stream = client.query(sql)) {
234+
// process results...
235+
}
236+
} catch (ExecutionException e) {
237+
if (isTransportFailure(e.getCause())) {
238+
client.reset(); // discard bad transport, reconnect immediately
239+
try (FlightStream stream = client.query(sql)) {
240+
// process results with fresh connection...
241+
}
242+
} else {
243+
throw e;
244+
}
245+
}
246+
```
247+
248+
**DNS cache TTL:** The gRPC `DnsNameResolver` respects the JVM's DNS cache TTL. For more aggressive DNS refresh (recommended for cloud-deployed clients), set the JVM property:
249+
250+
```bash
251+
-Dnetworkaddress.cache.ttl=30
252+
```
253+
216254
### Iterating Through Results
217255

218256
For more control over query results, you can iterate through rows and access individual field values:

pom.xml

Lines changed: 78 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -89,7 +89,8 @@
8989
<artifactId>maven-surefire-plugin</artifactId>
9090
<version>3.5.4</version>
9191
<configuration>
92-
<argLine>--add-opens=java.base/java.nio=ALL-UNNAMED</argLine>
92+
<!-- @{argLine} is set by JaCoCo's prepare-agent goal -->
93+
<argLine>@{argLine} --add-opens=java.base/java.nio=ALL-UNNAMED</argLine>
9394
</configuration>
9495
</plugin>
9596
<plugin>
@@ -148,6 +149,82 @@
148149
<waitUntil>published</waitUntil> -->
149150
</configuration>
150151
</plugin>
152+
<!-- P0: Static bug detection — null derefs, concurrency, resource leaks -->
153+
<plugin>
154+
<groupId>com.github.spotbugs</groupId>
155+
<artifactId>spotbugs-maven-plugin</artifactId>
156+
<version>4.9.3.0</version>
157+
<configuration>
158+
<excludeFilterFile>spotbugs-exclude.xml</excludeFilterFile>
159+
</configuration>
160+
</plugin>
161+
<!-- P0: CVE scanning for transitive dependencies (Netty, gRPC, Arrow) -->
162+
<plugin>
163+
<groupId>org.owasp</groupId>
164+
<artifactId>dependency-check-maven</artifactId>
165+
<version>12.1.1</version>
166+
<configuration>
167+
<failBuildOnCVSS>7</failBuildOnCVSS>
168+
</configuration>
169+
</plugin>
170+
<!-- P1: Test coverage reporting -->
171+
<plugin>
172+
<groupId>org.jacoco</groupId>
173+
<artifactId>jacoco-maven-plugin</artifactId>
174+
<version>0.8.13</version>
175+
<executions>
176+
<execution>
177+
<id>prepare-agent</id>
178+
<goals>
179+
<goal>prepare-agent</goal>
180+
</goals>
181+
</execution>
182+
<execution>
183+
<id>report</id>
184+
<phase>test</phase>
185+
<goals>
186+
<goal>report</goal>
187+
</goals>
188+
</execution>
189+
</executions>
190+
</plugin>
191+
<!-- P2: Build hygiene — enforce Maven/JDK versions, dependency convergence -->
192+
<plugin>
193+
<groupId>org.apache.maven.plugins</groupId>
194+
<artifactId>maven-enforcer-plugin</artifactId>
195+
<version>3.5.0</version>
196+
<executions>
197+
<execution>
198+
<id>enforce</id>
199+
<goals>
200+
<goal>enforce</goal>
201+
</goals>
202+
<configuration>
203+
<rules>
204+
<requireMavenVersion>
205+
<version>[3.6.0,)</version>
206+
</requireMavenVersion>
207+
<requireJavaVersion>
208+
<version>[11,)</version>
209+
</requireJavaVersion>
210+
<banDuplicatePomDependencyVersions/>
211+
</rules>
212+
</configuration>
213+
</execution>
214+
</executions>
215+
</plugin>
216+
<!-- P3: Code style consistency (Google Java Style) -->
217+
<plugin>
218+
<groupId>org.apache.maven.plugins</groupId>
219+
<artifactId>maven-checkstyle-plugin</artifactId>
220+
<version>3.6.0</version>
221+
<configuration>
222+
<configLocation>google_checks.xml</configLocation>
223+
<consoleOutput>true</consoleOutput>
224+
<violationSeverity>warning</violationSeverity>
225+
<failOnViolation>false</failOnViolation>
226+
</configuration>
227+
</plugin>
151228
</plugins>
152229
</build>
153230
</project>

spotbugs-exclude.xml

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<!--
3+
SpotBugs exclusion filter for pre-existing findings.
4+
These are findings in code that predates the introduction of SpotBugs.
5+
They should be addressed in follow-up PRs.
6+
-->
7+
<FindBugsFilter>
8+
<!-- Example code: dead stores are intentional (demonstrating all param types) -->
9+
<Match>
10+
<Package name="ai.spice.example"/>
11+
<Bug pattern="DLS_DEAD_LOCAL_STORE"/>
12+
</Match>
13+
14+
<!-- Example code: rough constants like 3.14 are intentional demo values for float params -->
15+
<Match>
16+
<Package name="ai.spice.example"/>
17+
<Bug pattern="CNT_ROUGH_CONSTANT_VALUE"/>
18+
</Match>
19+
20+
<!-- Config: dead store in getUserAgent() -->
21+
<Match>
22+
<Class name="ai.spice.Config"/>
23+
<Bug pattern="DLS_DEAD_LOCAL_STORE"/>
24+
</Match>
25+
26+
<!-- HeaderAuthMiddlewareFactory: internal representation exposure (by design, internal class) -->
27+
<Match>
28+
<Class name="ai.spice.HeaderAuthMiddlewareFactory"/>
29+
<Bug pattern="EI_EXPOSE_REP2"/>
30+
</Match>
31+
32+
<!-- RefreshOptions: public fields are part of the public API -->
33+
<Match>
34+
<Class name="ai.spice.RefreshOptions"/>
35+
<Bug pattern="PA_PUBLIC_PRIMITIVE_ATTRIBUTE"/>
36+
</Match>
37+
38+
<!-- Constructor-throw warnings: these constructors intentionally validate/fail early -->
39+
<Match>
40+
<Class name="ai.spice.SpiceClient"/>
41+
<Bug pattern="CT_CONSTRUCTOR_THROW"/>
42+
</Match>
43+
<Match>
44+
<Class name="ai.spice.SpiceClientBuilder"/>
45+
<Bug pattern="CT_CONSTRUCTOR_THROW"/>
46+
</Match>
47+
</FindBugsFilter>

src/main/java/ai/spice/HeaderAuthMiddlewareFactory.java

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -21,21 +21,23 @@ public HeaderAuthMiddlewareFactory(ClientIncomingAuthHeaderMiddleware.Factory au
2121

2222
@Override
2323
public FlightClientMiddleware onCallStarted(CallInfo callInfo) {
24+
// Create the auth middleware once per RPC, not once per callback
25+
final FlightClientMiddleware authMiddleware = authFactory.onCallStarted(callInfo);
2426
return new FlightClientMiddleware() {
2527
@Override
2628
public void onBeforeSendingHeaders(CallHeaders callHeaders) {
27-
authFactory.onCallStarted(callInfo).onBeforeSendingHeaders(callHeaders);
29+
authMiddleware.onBeforeSendingHeaders(callHeaders);
2830
headers.forEach(callHeaders::insert);
2931
}
3032

3133
@Override
3234
public void onHeadersReceived(CallHeaders callHeaders) {
33-
authFactory.onCallStarted(callInfo).onHeadersReceived(callHeaders);
35+
authMiddleware.onHeadersReceived(callHeaders);
3436
}
3537

3638
@Override
3739
public void onCallCompleted(CallStatus callStatus) {
38-
authFactory.onCallStarted(callInfo).onCallCompleted(callStatus);
40+
authMiddleware.onCallCompleted(callStatus);
3941
}
4042
};
4143
}

0 commit comments

Comments
 (0)