Skip to content

Commit 75def5f

Browse files
committed
docs: add postgres connection pool spec aligning broker with state
1 parent eac644c commit 75def5f

1 file changed

Lines changed: 299 additions & 0 deletions

File tree

specs/postgres_connection_pool.md

Lines changed: 299 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,299 @@
1+
# PostgreSQL Connection Pool Configuration (HikariCP)
2+
3+
The Broker already pools its PostgreSQL connections with HikariCP, but the pool
4+
is configured by hand: every tuning value is a hardcoded literal in
5+
`PostgresConnectivity.createDataSource()`, the pool is unnamed, the
6+
`DataSource` is never closed, and each repository opens its own
7+
`Database.connect(...)`. The companion service `sdkman-state` recently
8+
reworked the same concern as part of a throughput initiative and arrived at a
9+
more operable standard: pool tuning is externalised to config and environment
10+
variables with documented defaults, the pool is named, the `DataSource`
11+
lifecycle is managed, and a single Exposed `Database.connect` pins an explicit
12+
transaction isolation level. This spec brings the Broker's PostgreSQL pooling
13+
up to that standard so the two services — twin towers either side of the same
14+
Postgres schema — behave and are tuned consistently.
15+
16+
The scope here is **connection pooling and Exposed wiring only**. The Broker's
17+
query logic, the Mongo backend, the `postgres.*` config namespace, and the
18+
read-only contract with `sdkman-state` are all unchanged.
19+
20+
*Reference: `sdkman-state/specs/concurrency-optimisations.md` (R1, R2, R9 and
21+
the "Extra Considerations" notes on isolation level) — the upstream spec that
22+
established this standard.*
23+
24+
## Behaviour
25+
26+
After this change the Broker's pooled `javax.sql.DataSource` is configured
27+
entirely from `application.conf` with environment-variable overrides and the
28+
same defaults `sdkman-state` ships. The five tunables — maximum pool size,
29+
minimum idle, connection timeout, max lifetime, idle timeout — are no longer
30+
literals in code. The pool carries the name `sdkman-broker-pool` so it is
31+
identifiable in HikariCP log lines and JMX/metrics alongside
32+
`sdkman-state-pool`. The `DataSource` is constructed with
33+
`initializationFailTimeout = -1`, so the application boots even when Postgres
34+
is temporarily unreachable and surfaces the outage through `/meta/health`
35+
rather than failing at startup. On shutdown the pool is closed cleanly. Exposed
36+
is wired through a single `Database.connect` that sets
37+
`defaultIsolationLevel = TRANSACTION_READ_COMMITTED`, and every repository
38+
shares that one `Database` rather than connecting independently.
39+
40+
None of this is observable to a download client. The HTTP contract in
41+
`postgres_version_repository.md` is unchanged; this spec only changes how the
42+
connection pool is configured and wired.
43+
44+
## Business Rules
45+
46+
1. **Pool tuning is externalised, with `sdkman-state`'s defaults.** The five
47+
HikariCP tunables are read from configuration, not hardcoded. The Broker
48+
keeps its existing `postgres.*` namespace (renaming to state's `database.*`
49+
would break already-deployed environment variables), so the new keys live
50+
under `postgres.pool.*` with `POSTGRES_POOL_*` overrides. The structure,
51+
key names, and **default values match `sdkman-state` exactly**:
52+
53+
| Setting | Config key | Env override | Default | Meaning |
54+
|-----------------------|-------------------------------------|---------------------------------------|-------------|-----------------------------------------------------|
55+
| `maximumPoolSize` | `postgres.pool.maxSize` | `POSTGRES_POOL_MAX_SIZE` | `20` | Max total connections. |
56+
| `minimumIdle` | `postgres.pool.minIdle` | `POSTGRES_POOL_MIN_IDLE` | `2` | Warm idle connections kept ready. |
57+
| `connectionTimeout` | `postgres.pool.connectionTimeoutMs` | `POSTGRES_POOL_CONNECTION_TIMEOUT_MS` | `5000` | Fail fast when the pool is starved (ms). |
58+
| `maxLifetime` | `postgres.pool.maxLifetimeMs` | `POSTGRES_POOL_MAX_LIFETIME_MS` | `1800000` | Retire a connection after 30 min. |
59+
| `idleTimeout` | `postgres.pool.idleTimeoutMs` | `POSTGRES_POOL_IDLE_TIMEOUT_MS` | `600000` | Drop idle connections after 10 min. |
60+
61+
This changes two current values: `maximumPoolSize` moves from a hardcoded
62+
`10` to a default of `20`, and `connectionTimeout` from a hardcoded
63+
`30000` to a default of `5000`. Both now match state and both are
64+
overridable per environment.
65+
66+
2. **`AppConfig` exposes the pool tunables.** The `AppConfig` interface gains
67+
five members and `DefaultAppConfig` reads them with the existing
68+
`getIntOrDefault` / `getLongOrDefault` helpers so absent keys fall back to
69+
the defaults in Rule 1. Timeouts are `Long` (milliseconds); sizes are `Int`.
70+
`PostgresConnectivity` consumes these instead of literals.
71+
72+
3. **The pool is named.** `HikariConfig.poolName = "sdkman-broker-pool"`,
73+
mirroring state's `sdkman-state-pool`. This makes the two pools
74+
distinguishable in logs and metrics when both services share infrastructure.
75+
76+
4. **The pool boots even when Postgres is down.** Set
77+
`initializationFailTimeout = -1` so `HikariDataSource` construction does not
78+
block on or fail because of an unreachable database. This is a deliberate
79+
behaviour change: today `PostgresConnectivity.dataSource()` wraps
80+
construction in `Either.catch` and rethrows `IllegalStateException`, so a
81+
database that is down at boot kills the process. Under this rule the process
82+
starts and the outage is reported through the existing
83+
`PostgresHealthRepository` / `/meta/health` path instead. The `Either.catch`
84+
wrapper stays (it still guards against genuine misconfiguration such as a
85+
bad driver class), but connectivity failures no longer abort startup.
86+
87+
> **Why this value.** The choice is driven purely by **parity with
88+
> `sdkman-state`**, which sets `-1`, not by a resilience requirement —
89+
> managed Postgres downtime is effectively a non-event in this deployment.
90+
> Fail-fast-at-boot would be an equally defensible standard; what matters is
91+
> that both services use the *same* one. If the teams ever prefer fail-fast,
92+
> change it in both services together, never in just one.
93+
94+
5. **One Exposed `Database`, with an explicit isolation level.** Replace the
95+
per-repository `Database.connect(dataSource)` (currently created `by lazy`
96+
independently in `PostgresVersionRepository` and `PostgresAuditRepository`)
97+
with a single `Database.connect` performed once at wiring time and shared by
98+
all repositories:
99+
100+
```kotlin
101+
Database.connect(
102+
datasource = dataSource,
103+
databaseConfig = DatabaseConfig { defaultIsolationLevel = Connection.TRANSACTION_READ_COMMITTED },
104+
)
105+
```
106+
107+
Pinning `defaultIsolationLevel` matches state and makes the isolation level
108+
deterministic instead of driver-default; it also stops Exposed probing the
109+
database for its default isolation level on the first transaction.
110+
111+
> **Note on motivation.** In `sdkman-state` this setting is load-bearing: it
112+
> runs queries inside `newSuspendedTransaction`, where a first-transaction
113+
> metadata probe against an unreachable database deadlocks the request
114+
> coroutine (state R9). The Broker today uses **blocking** `transaction(db)`
115+
> and its health check uses **raw JDBC** (`dataSource.connection`), so it is
116+
> not exposed to that coroutine deadlock. Here the setting is adopted for
117+
> consistency, deterministic isolation, and one fewer startup round-trip —
118+
> and as cheap insurance should the Broker later move to suspended
119+
> transactions.
120+
121+
6. **The pool is closed on shutdown.** The `HikariDataSource` must be closed
122+
when the application stops. Because the Broker builds the `DataSource` in
123+
`App.main` (outside the Ktor module, before `embeddedServer(...).start`),
124+
register a JVM shutdown hook — or subscribe to Ktor's `ApplicationStopped`
125+
monitor event — to call `dataSource.close()`. State closes its pool via
126+
`monitor.subscribe(ApplicationStopped) { dataSource.close() }`; the Broker
127+
should achieve the equivalent given its wiring.
128+
129+
7. **The corrected SSL handling is preserved, not regressed.** The Broker's
130+
`buildConnectionString()` already appends a lowercase `?sslmode=...` (the
131+
parameter name the PostgreSQL JDBC driver actually honours) and is
132+
configurable via `postgres.sslmode` / `POSTGRES_SSLMODE`. This is **more
133+
correct** than state, which hardcodes `?sslMode=prefer` (camel-cased, and
134+
therefore likely ignored by pgjdbc). The Broker keeps its own handling
135+
unchanged; aligning state's URL is a separate, upstream concern (see Out of
136+
Scope).
137+
138+
## Examples
139+
140+
### `application.conf` — new `postgres.pool` block
141+
142+
The `postgres` block gains a nested `pool` block; the existing connection keys
143+
are unchanged.
144+
145+
```hocon
146+
postgres {
147+
database = "sdkman"
148+
database = ${?POSTGRES_DATABASE}
149+
host = "127.0.0.1"
150+
host = ${?POSTGRES_HOST}
151+
port = "5432"
152+
port = ${?POSTGRES_PORT}
153+
username = "postgres"
154+
username = ${?POSTGRES_USERNAME}
155+
password = "postgres"
156+
password = ${?POSTGRES_PASSWORD}
157+
sslmode = "disable"
158+
sslmode = ${?POSTGRES_SSLMODE}
159+
160+
pool {
161+
maxSize = 20
162+
maxSize = ${?POSTGRES_POOL_MAX_SIZE}
163+
minIdle = 2
164+
minIdle = ${?POSTGRES_POOL_MIN_IDLE}
165+
connectionTimeoutMs = 5000
166+
connectionTimeoutMs = ${?POSTGRES_POOL_CONNECTION_TIMEOUT_MS}
167+
maxLifetimeMs = 1800000
168+
maxLifetimeMs = ${?POSTGRES_POOL_MAX_LIFETIME_MS}
169+
idleTimeoutMs = 600000
170+
idleTimeoutMs = ${?POSTGRES_POOL_IDLE_TIMEOUT_MS}
171+
}
172+
}
173+
```
174+
175+
### `AppConfig` additions
176+
177+
```kotlin
178+
interface AppConfig {
179+
// existing postgres.* members …
180+
val postgresPoolMaxSize: Int
181+
val postgresPoolMinIdle: Int
182+
val postgresPoolConnectionTimeoutMs: Long
183+
val postgresPoolMaxLifetimeMs: Long
184+
val postgresPoolIdleTimeoutMs: Long
185+
}
186+
187+
class DefaultAppConfig : AppConfig {
188+
//
189+
override val postgresPoolMaxSize: Int = config.getIntOrDefault("postgres.pool.maxSize", 20)
190+
override val postgresPoolMinIdle: Int = config.getIntOrDefault("postgres.pool.minIdle", 2)
191+
override val postgresPoolConnectionTimeoutMs: Long =
192+
config.getLongOrDefault("postgres.pool.connectionTimeoutMs", 5_000L)
193+
override val postgresPoolMaxLifetimeMs: Long =
194+
config.getLongOrDefault("postgres.pool.maxLifetimeMs", 1_800_000L)
195+
override val postgresPoolIdleTimeoutMs: Long =
196+
config.getLongOrDefault("postgres.pool.idleTimeoutMs", 600_000L)
197+
}
198+
```
199+
200+
### `PostgresConnectivity.createDataSource` — before and after
201+
202+
```kotlin
203+
// before — hardcoded literals, no pool name, fail-fast on a down database
204+
private fun createDataSource(connectionString: String): DataSource =
205+
with(HikariConfig()) {
206+
jdbcUrl = connectionString
207+
config.postgresUsername.map { username = it }
208+
config.postgresPassword.map { password = it }
209+
driverClassName = "org.postgresql.Driver"
210+
maximumPoolSize = 10
211+
minimumIdle = 2
212+
connectionTimeout = 30000
213+
idleTimeout = 600000
214+
maxLifetime = 1800000
215+
HikariDataSource(this)
216+
}
217+
218+
// after — sourced from AppConfig, named, boots even if Postgres is down
219+
private fun createDataSource(connectionString: String): DataSource =
220+
with(HikariConfig()) {
221+
jdbcUrl = connectionString
222+
config.postgresUsername.map { username = it }
223+
config.postgresPassword.map { password = it }
224+
driverClassName = "org.postgresql.Driver"
225+
maximumPoolSize = config.postgresPoolMaxSize
226+
minimumIdle = config.postgresPoolMinIdle
227+
connectionTimeout = config.postgresPoolConnectionTimeoutMs
228+
maxLifetime = config.postgresPoolMaxLifetimeMs
229+
idleTimeout = config.postgresPoolIdleTimeoutMs
230+
poolName = "sdkman-broker-pool"
231+
initializationFailTimeout = -1
232+
HikariDataSource(this)
233+
}
234+
```
235+
236+
### Environment-variable overrides (operations)
237+
238+
| Env variable | Overrides | Default |
239+
|---------------------------------------|------------------------|-----------|
240+
| `POSTGRES_POOL_MAX_SIZE` | `maximumPoolSize` | `20` |
241+
| `POSTGRES_POOL_MIN_IDLE` | `minimumIdle` | `2` |
242+
| `POSTGRES_POOL_CONNECTION_TIMEOUT_MS` | `connectionTimeout` | `5000` |
243+
| `POSTGRES_POOL_MAX_LIFETIME_MS` | `maxLifetime` | `1800000` |
244+
| `POSTGRES_POOL_IDLE_TIMEOUT_MS` | `idleTimeout` | `600000` |
245+
246+
## Out of Scope
247+
248+
- **The `postgres.*``database.*` namespace rename.** State uses `database.*`;
249+
the Broker keeps `postgres.*` to avoid breaking deployed environment
250+
variables. Only the pool sub-structure, keys, defaults, and behaviour are
251+
aligned — not the prefix.
252+
- **Aligning the JDBC driver version.** The Broker is on `postgresql:42.7.7`,
253+
state on `42.7.2`. Bumping is unrelated to pooling; if the teams want a
254+
matched driver it should be a deliberate, separate change (state catching up
255+
is preferable to the Broker downgrading).
256+
- **Migrating to a Gradle version catalog.** State declares HikariCP via
257+
`libs.versions.toml`; the Broker declares it inline in `build.gradle.kts`.
258+
Both already pin HikariCP `5.1.0`, so this is a build-hygiene preference, not
259+
a pooling concern.
260+
- **Fixing state's `sslMode=prefer` typo.** That lives in
261+
`sdkman-state`'s `ConfigExtensions.kt` and is an upstream fix; the Broker's
262+
correct lowercase `sslmode` handling is preserved as-is (Business Rule 7).
263+
- **The Mongo backend.** `MongoConnectivity` and the Mongo repositories are
264+
untouched.
265+
- **Query logic, the UPSERT, and single-transaction write path** from state's
266+
spec (R3–R6). The Broker is a read-only consumer of the `versions` table and
267+
performs no writes to it, so those requirements do not apply.
268+
- **Production Flyway wiring.** State runs Flyway at startup against its pool;
269+
the Broker owns no migrations (Flyway is a `testImplementation` only) because
270+
the schema is owned by `sdkman-state`. No production migration wiring is
271+
added.
272+
273+
## Acceptance Criteria
274+
275+
- [ ] The five HikariCP tunables are read from `postgres.pool.*` config with
276+
`POSTGRES_POOL_*` env overrides; no pool tunable remains a literal in
277+
`PostgresConnectivity`.
278+
- [ ] Defaults applied when env vars are absent match `sdkman-state`: maxSize
279+
`20`, minIdle `2`, connectionTimeout `5000`, maxLifetime `1800000`,
280+
idleTimeout `600000`.
281+
- [ ] `AppConfig` / `DefaultAppConfig` expose the five pool members and a unit
282+
test verifies they load from HOCON with defaults applied when env vars
283+
are unset.
284+
- [ ] The pool is named `sdkman-broker-pool`.
285+
- [ ] `initializationFailTimeout = -1` is set, and the application starts
286+
successfully against an unreachable Postgres, with `/meta/health`
287+
reporting the database as unhealthy rather than the process aborting at
288+
boot.
289+
- [ ] Exposed is wired through a single `Database.connect` with
290+
`defaultIsolationLevel = TRANSACTION_READ_COMMITTED`, shared by all
291+
Postgres repositories (no per-repository `Database.connect`).
292+
- [ ] The `HikariDataSource` is closed on application shutdown.
293+
- [ ] The existing lowercase `sslmode` connection-string handling is unchanged.
294+
- [ ] An integration test opens more concurrent queries than a small configured
295+
`maxSize` and asserts they all complete (queued requests served as
296+
connections free up), proving the pool is the active path.
297+
- [ ] Existing download and health acceptance/integration specs pass unchanged.
298+
- [ ] All quality gates pass (`./gradlew clean check` — compile, detekt,
299+
ktlint, tests).

0 commit comments

Comments
 (0)