Add last connection failure time diff and exception clarification to the pool timeout exception #2362

jrobe · 2025-10-10T05:32:14Z

I just spent a week attempting to diagnose why my services failed to reconnect to my database after an RDS Failover for Postgres.

What happened was the DNS cache was getting hits and misses, and I hit the race condition where the hikari pool was reaching out to the old unhealthy host after failover (but also after it has reached out to the new host once) - so my service was displaying "The database is not currently accepting connections" as the final "Caused by" in the Hikari stack-trace for connection pool timeouts for 1-2 hours after the failover had succeeded.

I didn't know that the last "Caused by" exception in the Hikari stack trace was actually the last connection failure that happened on a different thread; I thought it was actually part of the stack trace and running on the same thread - and I certainly didn't think it could be delayed by 1+ hours.

Because of this I dove down DNS rabbit holes, when the actual problem was my Postgres JDBC driver got stuck trying to log in (after the startup packet was acknowledged, but before hearing back from auth), that all could have been avoided if the exception message from Hikari was a bit more clear that the Failure shown in the stack trace might be stale and delayed.

I'm hoping that we can add the clarification that the failure happened some other time to Hikari, so some poor soul doesn't have the diagnostic issues I had.

… timeout exception

jrobe · 2025-10-10T05:46:56Z

I do recognize that the shown timestamp might not actually match the time that the exception shown happened (in the case of a race condition where the atomic reference with the throwable is updated before the atomicLong, and this thread reaches out and gets the new exception but the old timestamp). However, the message I think would still be helpful enough as is without needing to introduce an exception-timestamp tuple to explicitly pair them.

jrobe · 2025-10-10T06:29:22Z

This new message is displayed as below in the unit tests.

00:05:29,153456 [seq  107] [ari.pool.ExceptionTest:connection-closer] DEBUG PoolBase             - com.zaxxer.hikari.pool.ExceptionTest - Closing connection com.zaxxer.hikari.mocks.StubConnection@2f33eff8: (connection is dead)
java.sql.SQLTransientConnectionException: com.zaxxer.hikari.pool.ExceptionTest - Connection is not available, request timed out after 6006ms (total=1, active=1, idle=0, waiting=0) - The last connection failure (appended as the next exception below) happened 5ms999µs100ns ago.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:719)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:184)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:142)
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:99)
	at com.zaxxer.hikari.pool.ExceptionTest.testLastErrorTimeout(ExceptionTest.java:150)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:377)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:284)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:248)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:167)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:456)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:169)
	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:595)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:581)
Caused by: java.sql.SQLException: Exception 1
	at com.zaxxer.hikari.pool.ExceptionTest.lambda$testLastErrorTimeout$0(ExceptionTest.java:147)
	at com.zaxxer.hikari.mocks.StubConnection.setNetworkTimeout(StubConnection.java:535)
	at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:594)
	at com.zaxxer.hikari.pool.PoolBase.isConnectionDead(PoolBase.java:162)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:166)
	... 33 more

jrobe · 2025-10-13T20:01:04Z

Oh, and why did I hit a loginTimeout issue? I use driver-based connection (which spins up a DriverDataSource from the utils package) - and apparently hit this 8 year old bug where Postgres JDBC doesn't respect the DriverManager loginTimeout... pgjdbc/pgjdbc#879

jrobe added 2 commits October 9, 2025 22:28

Add last connection failure timestamp and clarification to connection…

24acbf2

… timeout exception

Fix whitespace

3cb81e4

jrobe changed the title ~~Add last connection failure timestamp and exception clarification to the pool timeout Exception~~ Add last connection failure timestamp and exception clarification to the pool timeout exception Oct 10, 2025

jrobe added 2 commits October 9, 2025 23:47

Change the time to a relative time

a45a5a7

Fix time diff to use lastConnectionFailureTimestamp

05a4c04

jrobe changed the title ~~Add last connection failure timestamp and exception clarification to the pool timeout exception~~ Add last connection failure time diff and exception clarification to the pool timeout exception Oct 10, 2025

Fix style to match above lines

b216e82

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add last connection failure time diff and exception clarification to the pool timeout exception #2362

Add last connection failure time diff and exception clarification to the pool timeout exception #2362

jrobe commented Oct 10, 2025 •

edited

Loading

Uh oh!

jrobe commented Oct 10, 2025 •

edited

Loading

Uh oh!

jrobe commented Oct 10, 2025 •

edited

Loading

Uh oh!

jrobe commented Oct 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Add last connection failure time diff and exception clarification to the pool timeout exception #2362

Are you sure you want to change the base?

Add last connection failure time diff and exception clarification to the pool timeout exception #2362

Conversation

jrobe commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrobe commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrobe commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrobe commented Oct 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jrobe commented Oct 10, 2025 •

edited

Loading

jrobe commented Oct 10, 2025 •

edited

Loading

jrobe commented Oct 10, 2025 •

edited

Loading