Fix: session leak, unbounded limits, missing config passthrough by devdgna · Pull Request #35 · Bhutan-NDI/ngotag-agent-controller

devdgna · 2026-03-10T09:58:52Z

Issue
The Credo agent's RDS instance requires some of the largest instance sizes available in cloud environments because of excessive CPU and RAM consumption. DB insights shows that the top SQL queries are all Askar SELECT ... ARRAY_AGG queries on the shared items/items_tags tables, driving very high DB load.

Root causes

Tenant session leak in ProofEvents
File: ProofEvents.ts
Every proof state change event for a multi-tenant agent calls getTenantAgent(), which acquires a database session (PostgreSQL connection + wallet profile lock), but never calls endSession(). These sessions accumulate indefinitely over the lifetime of the process. With continuous proof exchanges across tenants, this leaves hundreds of orphaned database connections consuming DB memory.
Unbounded tenant session limits
File: cliAgent.ts
sessionLimit and sessionAcquireTimeout both defaulted to Infinity. As a result, every concurrent tenant API request opened its own DB session with no backpressure: if 500 requests arrived at once, 500 sessions were opened against PostgreSQL.
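To show what a finite sessionLimit and sessionAcquireTimeout buy, here is a minimal sketch of a counting semaphore with an acquire timeout. BoundedSessionPool is a hypothetical name; this is not Credo's tenant session implementation, whose internals may differ. It only illustrates the backpressure idea: once the limit is reached, callers wait in a queue and fail after the timeout instead of opening yet another DB session.

```typescript
// Hypothetical bounded session pool (a counting semaphore with an acquire timeout).
// Not Credo's implementation; it only illustrates sessionLimit + sessionAcquireTimeout.
class BoundedSessionPool {
  private active = 0
  private readonly waiters: Array<() => void> = []
  private readonly sessionLimit: number
  private readonly acquireTimeoutMs: number

  constructor(sessionLimit: number, acquireTimeoutMs: number) {
    this.sessionLimit = sessionLimit
    this.acquireTimeoutMs = acquireTimeoutMs
  }

  get activeSessions(): number {
    return this.active
  }

  // Takes a free slot immediately, otherwise queues; rejects if no slot frees up in time.
  async acquire(): Promise<void> {
    if (this.active < this.sessionLimit) {
      this.active++
      return
    }
    await new Promise<void>((resolve, reject) => {
      const waiter = (): void => {
        clearTimeout(timer)
        resolve()
      }
      const timer = setTimeout(() => {
        // Drop this waiter from the queue so release() never hands it a slot.
        const index = this.waiters.indexOf(waiter)
        if (index !== -1) this.waiters.splice(index, 1)
        reject(new Error(`No session available within ${this.acquireTimeoutMs} ms`))
      }, this.acquireTimeoutMs)
      this.waiters.push(waiter)
    })
    // The slot was handed over by release(), so `active` is already counted.
  }

  // Hands the slot to the oldest waiter if there is one, otherwise frees it.
  release(): void {
    const next = this.waiters.shift()
    if (next) {
      next()
    } else {
      this.active--
    }
  }
}
```

Handing the slot directly to the next waiter (rather than decrementing and letting callers race) keeps the active count from ever exceeding the limit.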
No DB connection pool defaults
File: cli.ts
maxConnections, connectTimeout, and idleTimeout resolved to NaN when neither CLI args nor environment variables were set (since Number(undefined) returns NaN). Askar was left to open connections without any pool ceiling.
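A small sketch of how the NaN arises and how the PR's fallback restores a default. The two resolver functions are a hypothetical reconstruction of the resolution order (CLI value, then environment variable); the real cli.ts code may be shaped differently.

```typescript
// Hypothetical reconstruction of the pool-option resolution; the real cli.ts may differ.
type Env = Record<string, string | undefined>

// Before: with nothing set, the result is Number(undefined), i.e. NaN, which is passed on.
function resolveMaxConnectionsBefore(cliValue: number | undefined, env: Env): number {
  return cliValue || Number(env['MAX_CONNECTIONS'])
}

// After (the PR's approach): NaN is falsy, so `||` falls through to the default of 25.
function resolveMaxConnectionsAfter(cliValue: number | undefined, env: Env): number {
  return cliValue || Number(env['MAX_CONNECTIONS']) || 25
}

console.log(Number(undefined)) // NaN
console.log(resolveMaxConnectionsBefore(undefined, {})) // NaN
console.log(resolveMaxConnectionsAfter(undefined, {})) // 25
console.log(resolveMaxConnectionsAfter(undefined, { MAX_CONNECTIONS: '40' })) // 40
```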
Unbounded in-memory LRU cache
File: cliAgent.ts
The InMemoryLruCache defaulted to limit: Infinity, meaning cached entries (resolved DIDs, schemas, credential definitions) were never evicted. This caused unbounded memory growth in the Node.js process and GC pressure.
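To make the effect of the limit concrete, here is a minimal LRU cache built on Map insertion order. BoundedLruCache is hypothetical and is not Credo's InMemoryLruCache; it only illustrates the eviction policy. With limit set to Infinity the eviction loop never runs, which is the unbounded growth described above.

```typescript
// Hypothetical bounded LRU cache, to show what a finite `limit` changes.
// Not Credo's InMemoryLruCache; it only illustrates least-recently-used eviction.
class BoundedLruCache<K, V> {
  private readonly entries = new Map<K, V>()
  private readonly limit: number

  constructor(limit: number) {
    this.limit = limit
  }

  get size(): number {
    return this.entries.size
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key) as V
    // Re-insert to mark as most recently used (Map iterates in insertion order).
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: K, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    // Evict the least recently used entries (oldest in insertion order) beyond the limit.
    while (this.entries.size > this.limit) {
      const oldest = this.entries.keys().next().value as K
      this.entries.delete(oldest)
    }
  }
}
```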
walletScheme config never passed through
File: cli.ts
The wallet-scheme CLI option was parsed but never included in the object passed to runRestAgent(). The walletScheme parameter therefore always arrived as undefined, so the fallback AskarMultiWalletDatabaseScheme.ProfilePerWallet was always used regardless of what was configured.
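A hedged sketch of the passthrough bug. Only walletScheme, parsed['wallet-scheme'] and the ProfilePerWallet fallback come from the PR; the types and functions below are hypothetical, the string values merely mimic AskarMultiWalletDatabaseScheme, and the `??` fallback is an assumption about how the downstream code picks its default.

```typescript
// Hypothetical sketch of the passthrough bug; types and functions are invented.
type WalletScheme = 'ProfilePerWallet' | 'DatabasePerWallet'

interface ParsedArgs {
  'wallet-scheme'?: WalletScheme
}

interface RestAgentConfig {
  walletScheme?: WalletScheme
}

// Before: the parsed option never makes it into the config object.
function buildConfigBefore(_parsed: ParsedArgs): RestAgentConfig {
  return {}
}

// After: the option is copied across, as the PR does with parsed['wallet-scheme'].
function buildConfigAfter(parsed: ParsedArgs): RestAgentConfig {
  return { walletScheme: parsed['wallet-scheme'] }
}

// Assumed downstream fallback: an undefined walletScheme always selects ProfilePerWallet.
function effectiveScheme(config: RestAgentConfig): WalletScheme {
  return config.walletScheme ?? 'ProfilePerWallet'
}

const parsed: ParsedArgs = { 'wallet-scheme': 'DatabasePerWallet' }
console.log(effectiveScheme(buildConfigBefore(parsed))) // ProfilePerWallet
console.log(effectiveScheme(buildConfigAfter(parsed))) // DatabasePerWallet
```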
Unguarded DB queries in CredentialEvents
File: CredentialEvents.ts
connections.findById() and credentials.getFormatData() were called without error handling on every credential state change event. A failure in either query could crash the event handler and leave connections in an indeterminate state.

Changes

Fix session leak: ProofEvents.ts

Wrapped tenantAgent.proofs.getFormatData() in try/finally
Added await tenantAgent.endSession() in the finally block to ensure the DB session is always released
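The try/finally fix can be sketched with stand-in objects. TenantAgentLike, openFakeTenantAgent and the openSessions counter are hypothetical; they model only the acquire/release bookkeeping, not Credo's tenant agent. The point is that the session count returns to zero whether getFormatData() succeeds or throws.

```typescript
// Hypothetical stand-ins: just enough surface to show the try/finally pattern.
interface TenantAgentLike {
  proofs: { getFormatData(recordId: string): Promise<unknown> }
  endSession(): Promise<void>
}

let openSessions = 0

// Acquiring a (fake) tenant agent opens a session; endSession() closes it.
async function openFakeTenantAgent(formatData: () => Promise<unknown>): Promise<TenantAgentLike> {
  openSessions++
  return {
    proofs: { getFormatData: (_recordId: string) => formatData() },
    endSession: async () => {
      openSessions--
    },
  }
}

// Mirrors the fixed handler: the session is released whether or not enrichment succeeds.
async function enrichProofEvent(recordId: string, formatData: () => Promise<unknown>) {
  const body: { id: string; proofData?: unknown } = { id: recordId }
  const tenantAgent = await openFakeTenantAgent(formatData)
  try {
    body.proofData = await tenantAgent.proofs.getFormatData(recordId)
  } finally {
    await tenantAgent.endSession()
  }
  return body
}
```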

Set bounded defaults: cliAgent.ts

Changed sessionLimit from Infinity to 100 (configurable via SESSION_LIMIT env var)
Changed sessionAcquireTimeout from Infinity to 30000ms (configurable via SESSION_ACQUIRE_TIMEOUT env var)
Changed InMemoryLruCache limit from Infinity to 500 (configurable via INMEMORY_LRU_CACHE_LIMIT env var)

Set connection pool defaults and fix config passthrough: cli.ts

Added fallback defaults: maxConnections: 25, connectTimeout: 10000 ms, idleTimeout: 30000 ms
Added missing walletScheme: parsed['wallet-scheme'] to the config object passed to runRestAgent()

Guard DB queries: CredentialEvents.ts

Wrapped connections.findById() and credentials.getFormatData() in try/catch
On failure, logs a warning and still sends the webhook/websocket event with the data available (graceful degradation)

RECOMMENDED values:

SESSION_ACQUIRE_TIMEOUT=30000 (30s) - fail fast instead of hanging!
SESSION_LIMIT=500
INMEMORY_LRU_CACHE_LIMIT=2000 - covers all tenants, schemas, and DIDs with room to spare

windowMs=1000
maxRateLimit=800

CONNECT_TIMEOUT=10000
MAX_CONNECTIONS=100
IDLE_TIMEOUT=30000

Copilot

Pull request overview

This PR addresses excessive DB/CPU/memory usage in the multi-tenant Credo controller by ensuring tenant DB sessions are released, adding bounded defaults for tenant/session/cache/pool settings, passing through walletScheme, and making credential event enrichment resilient to query failures.

Changes:

Ensure tenant sessions are ended in ProofEvents via try/finally.
Add bounded defaults for tenant session limits/timeouts and in-memory LRU cache sizing.
Add sane wallet pool defaults and pass walletScheme through the CLI config; guard credential enrichment queries.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`src/events/ProofEvents.ts`	Ensures tenant session cleanup after proof format data enrichment.
`src/events/CredentialEvents.ts`	Wraps credential event enrichment in error handling to avoid handler crashes.
`src/cliAgent.ts`	Sets bounded defaults for LRU cache size and tenant session limits/timeouts.
`src/cli.ts`	Adds wallet pool defaults and passes `walletScheme` into `runRestAgent()` config.


Copilot · 2026-03-10T10:03:21Z

+        const data = await tenantAgent.proofs.getFormatData(record.id)
+        body.proofData = data
+      } finally {
+        await tenantAgent.endSession()


endSession() is awaited inside finally, but if endSession() rejects it will override any error from getFormatData() and will also prevent the webhook/websocket emission that follows. Consider wrapping tenantAgent.endSession() in its own try/catch (log and continue) so session cleanup failures don’t break event delivery or mask the original error.

Suggested change

-        await tenantAgent.endSession()
+        try {
+          await tenantAgent.endSession()
+        } catch (error) {
+          agent.config.logger.error('Failed to end tenant session', { error })
+        }
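A minimal sketch of the guarded-cleanup pattern Copilot describes. handleProofEvent, its injected enrich/emit callbacks, and the Logger shape are hypothetical stand-ins for the real handler. If endSession() rejects, the failure is logged and the event is still emitted; if enrichment itself fails, that original error still propagates instead of being masked by a cleanup error.

```typescript
// Hypothetical sketch: a failing endSession() is logged instead of propagating.
interface Logger {
  error(message: string, data?: Record<string, unknown>): void
}

async function handleProofEvent(
  tenantAgent: { endSession(): Promise<void> },
  enrich: () => Promise<unknown>,
  emit: (body: { proofData?: unknown }) => void,
  logger: Logger,
): Promise<void> {
  const body: { proofData?: unknown } = {}
  try {
    body.proofData = await enrich()
  } finally {
    try {
      await tenantAgent.endSession()
    } catch (error) {
      // Log and continue: cleanup failure must not mask an enrichment error or block delivery.
      logger.error('Failed to end tenant session', { error })
    }
  }
  // Reached only if enrichment succeeded; endSession() failures no longer prevent it.
  emit(body)
}
```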

Copilot · 2026-03-10T10:03:22Z

+
+      const data = await agent.credentials.getFormatData(record.id)
+      body.credentialData = data
+    } catch (error) {
+      agent.config.logger.warn('Failed to enrich credential event data', { error })


The single try/catch wraps both connections.findById() and credentials.getFormatData(). If the connection lookup throws (e.g., record not found / storage error), the code skips fetching credentialData, even though it may still be available. To match the “send event with the data available” behavior, handle these enrichments independently (separate try/catch blocks) so one failing query doesn’t prevent the other enrichment.

Suggested change

-      const data = await agent.credentials.getFormatData(record.id)
-      body.credentialData = data
-    } catch (error) {
-      agent.config.logger.warn('Failed to enrich credential event data', { error })
+    } catch (error) {
+      agent.config.logger.warn('Failed to enrich credential event data with connection information', { error })
+    }
+
+    try {
+      const data = await agent.credentials.getFormatData(record.id)
+      body.credentialData = data
+    } catch (error) {
+      agent.config.logger.warn('Failed to enrich credential event data with credential format data', { error })
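Copilot's point about handling the two enrichments independently can be sketched as follows. The lookups are injected callbacks standing in for connections.findById() and credentials.getFormatData(); enrichCredentialEvent and CredentialEventBody are hypothetical, and because the hunk does not show which connection field the real handler copies, outOfBandId is an assumption.

```typescript
// Hypothetical sketch: each enrichment has its own try/catch, so one failing
// lookup does not prevent the other. outOfBandId is an assumed field.
interface Logger {
  warn(message: string, data?: Record<string, unknown>): void
}

interface CredentialEventBody {
  id: string
  connectionId?: string
  outOfBandId?: string
  credentialData?: unknown
}

async function enrichCredentialEvent(
  record: { id: string; connectionId?: string },
  findConnection: (id: string) => Promise<{ outOfBandId?: string }>,
  getFormatData: (id: string) => Promise<unknown>,
  logger: Logger,
): Promise<CredentialEventBody> {
  const body: CredentialEventBody = { id: record.id, connectionId: record.connectionId }

  if (record.connectionId) {
    try {
      const connection = await findConnection(record.connectionId)
      body.outOfBandId = connection.outOfBandId
    } catch (error) {
      logger.warn('Failed to enrich credential event data with connection information', { error })
    }
  }

  try {
    body.credentialData = await getFormatData(record.id)
  } catch (error) {
    logger.warn('Failed to enrich credential event data with credential format data', { error })
  }

  // Returned (and sent) with whatever data was available, even if both lookups failed.
  return body
}
```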

Copilot · 2026-03-10T10:03:22Z

+      cache: new InMemoryLruCache({ limit: Number(process.env.INMEMORY_LRU_CACHE_LIMIT) || 500 }),
    }),


Using Number(process.env.INMEMORY_LRU_CACHE_LIMIT) || 500 means an explicit value of 0 (or any falsy number) will be ignored and replaced with the default, and non-numeric values will be silently treated as “unset”. If 0 is intended to disable caching (or if you want clearer misconfig handling), prefer explicit parsing/validation (e.g., nullish coalescing plus Number.isFinite checks) rather than ||.

Copilot · 2026-03-10T10:03:22Z

+      sessionAcquireTimeout: Number(process.env.SESSION_ACQUIRE_TIMEOUT) || 30000,
+      sessionLimit: Number(process.env.SESSION_LIMIT) || 100,
    }),


Number(process.env.SESSION_ACQUIRE_TIMEOUT) || 30000 and Number(process.env.SESSION_LIMIT) || 100 treat 0 as “unset” and silently fall back to defaults. If you want to support 0 (or fail fast on invalid values), switch to explicit parsing/validation (e.g., const v = Number(env); if (!Number.isFinite(v) || v < 1) ...). This also avoids silently accepting negative numbers.

Copilot · 2026-03-10T10:03:23Z

+          connectTimeout: parsed['wallet-connect-timeout'] || Number(process.env.CONNECT_TIMEOUT) || 10000,
+          maxConnections: parsed['wallet-max-connections'] || Number(process.env.MAX_CONNECTIONS) || 25,
+          idleTimeout: parsed['wallet-idle-timeout'] || Number(process.env.IDLE_TIMEOUT) || 30000,


These numeric config fallbacks use ||, so an explicit CLI/env value of 0 will be ignored and replaced with the next fallback. If 0 is a meaningful value (e.g., disable timeout), use nullish coalescing (??) and validate with Number.isFinite/range checks instead of relying on truthiness.
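Along the lines the review suggests, here is a hypothetical resolveNumber helper (not part of the PR) that checks the CLI value, then the environment value, then a default. It uses `??` so an explicit 0 is kept, and it throws on non-numeric, non-finite, or out-of-range input instead of silently falling back. Whether 0 is meaningful is option-specific, which is why the minimum is a parameter.

```typescript
// Hypothetical helper, not part of the PR: CLI value, then env value, then default.
// `??` keeps an explicit 0; invalid input throws instead of silently using the default.
function resolveNumber(
  name: string,
  cliValue: number | undefined,
  envValue: string | undefined,
  defaultValue: number,
  min = 0,
): number {
  // Treat a missing or blank env var as unset; anything else must parse as a number.
  const fromEnv = envValue !== undefined && envValue.trim() !== '' ? Number(envValue) : undefined
  const value = cliValue ?? fromEnv
  if (value === undefined) return defaultValue
  if (!Number.isFinite(value) || value < min) {
    throw new Error(`Invalid ${name}: ${String(cliValue ?? envValue)} (expected a finite number >= ${min})`)
  }
  return value
}
```

A call site might then read resolveNumber('SESSION_LIMIT', undefined, process.env.SESSION_LIMIT, 100, 1) for a limit that must be at least 1, or use min 0 for a timeout where 0 is acceptable.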

fix: Patch session leak, unbounded limits, missing config passthrough

d31b488

devdgna requested review from Copilot and kinxa0 March 10, 2026 09:59

devdgna self-assigned this Mar 10, 2026

Copilot started reviewing on behalf of devdgna March 10, 2026 10:00

Copilot AI reviewed Mar 10, 2026

devdgna wants to merge 1 commit into main from fix/optimisation