Merged
Changes from 2 commits
20 changes: 16 additions & 4 deletions sample-amazon-aurora-dsql-auth-session-mgmt/README.md
@@ -153,13 +153,25 @@ The `sessions.token_hash` column is intentionally NOT given an explicit `CREATE

### Production hardening checklist

This sample focuses on demonstrating Aurora DSQL patterns. Before running in production, add the following layers, none of which are DSQL-specific:
This sample focuses on demonstrating Aurora DSQL patterns. Before running in production, add the following layers:

- **Custom database role for the runtime.** The default `connection.ts` connects as `admin`, which is fine for the proof-of-concept but gives the runtime far more authority than it needs. For production, run the included setup script once to create a least-privilege role and map it to your runtime IAM principal:

```bash
AWS_REGION=us-east-1 \
DSQL_ENDPOINT=<cluster-id>.dsql.us-east-1.on.aws \
APP_ROLE_NAME=app_runtime \
APP_TASK_ROLE_ARN=arn:aws:iam::111122223333:role/auth-service-task-role \
npm run setup-runtime-role
```

Then change `connection.ts` to connect as `app_runtime` instead of `admin`, and attach an IAM task-role policy that grants only `dsql:DbConnect` (not `dsql:DbConnectAdmin`). Keep `admin` for one-off setup steps such as creating the role itself or running migrations. This follows Aurora DSQL's [Database roles and IAM authentication](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-database-roles.html) guidance.

Contributor


Quick note for the undo path, verified live: `DROP ROLE app_runtime` fails with `2BP01` ("cannot be dropped because some objects depend on it") unless you first run `AWS IAM REVOKE app_runtime FROM '<arn>'`. Worth one line at the end of this bullet so readers can roll the role back cleanly.

- **Rate limiting.** This sample does not include `express-rate-limit` or any throttling. At minimum, add per-IP rate limits to `/api/auth/register` and `/api/auth/login` to slow brute-force credential stuffing.
- **Trusted proxy / X-Forwarded-For handling.** `req.ip` is recorded in `client_metadata` for session listing. Behind a load balancer, this is the LB IP unless you configure `app.set('trust proxy', ...)` with the correct hop count. Configure it explicitly — never set `trust proxy: true` on a publicly exposed app, since that lets clients spoof `X-Forwarded-For`.
- **bcrypt cost factor.** `passwordHasher.ts` uses cost 10 (~80–100 ms per verify on a typical CPU), suitable for a proof-of-concept. Increase to 12 or higher in production after benchmarking your target hardware.
- **Logging and observability.** Replace the `console.warn` / `console.error` calls in `retryWithBackoff.ts` with a structured logger of your choice and forward to CloudWatch / your aggregator.
- **Trusted proxy / X-Forwarded-For handling.** `req.ip` is recorded in `client_metadata` for session listing. Behind a load balancer, this is the LB IP unless you configure `app.set('trust proxy', ...)` with the correct hop count. Configure it explicitly. Never set `trust proxy: true` on a publicly exposed app, since that lets clients spoof `X-Forwarded-For`.
- **bcrypt cost factor.** `passwordHasher.ts` uses cost 10 (~80-100 ms per verify on a typical CPU), suitable for a proof-of-concept. Increase to 12 or higher in production after benchmarking your target hardware.
- **Logging and observability.** Replace the `console.warn` / `console.error` calls in `retryWithBackoff.ts` with a structured logger of your choice and forward to CloudWatch or your aggregator.
- **Token storage on the client.** This sample returns the session token in JSON. In a browser app you'll typically want an HTTP-only, Secure cookie instead.
- **Periodic session purge.** Run `npm run housekeeping` on a schedule (cron, ECS scheduled task, EventBridge-triggered Lambda) to delete expired and long-revoked rows. Configurable via `SESSION_RETENTION_DAYS` (default 30 days). The script wraps each batch in `retryWithBackoff` so transient OCC conflicts don't fail the run.
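
To make the trusted-proxy bullet concrete, here is a minimal sketch of hop-count resolution. It is a simplified model of the idea behind a numeric `trust proxy` setting, not Express's actual implementation; `resolveClientIp` and its parameters are hypothetical names introduced for illustration.

```typescript
/**
 * Simplified model (an assumption, not Express source code): pick the client
 * IP by skipping a fixed number of trusted proxy hops, counted from the
 * socket peer backwards through X-Forwarded-For.
 *
 * socketAddress: address of the direct TCP peer (e.g. the load balancer).
 * forwardedFor:  raw X-Forwarded-For header value, if present.
 * trustedHops:   number of proxies you operate in front of the app.
 */
function resolveClientIp(
  socketAddress: string,
  forwardedFor: string | undefined,
  trustedHops: number,
): string {
  // Header entries run left to right: claimed origin first, nearest proxy last.
  const forwarded = (forwardedFor ?? '')
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

  // Full chain, from the (possibly forged) leftmost entry to the socket peer.
  const chain = [...forwarded, socketAddress];

  // Step back over the trusted hops; never walk past the start of the chain.
  const index = Math.max(0, chain.length - 1 - trustedHops);
  return chain[index] ?? socketAddress;
}

// A client forges "6.6.6.6"; the load balancer (10.0.0.5) appends the real peer.
const header = '6.6.6.6, 203.0.113.9';

// One trusted hop: the entry written by our own load balancer wins.
const withOneHop = resolveClientIp('10.0.0.5', header, 1);

// Trusting every hop (modeled here as Infinity) returns the forged entry.
const trustingEverything = resolveClientIp('10.0.0.5', header, Infinity);

console.log({ withOneHop, trustingEverything });
```

With one trusted hop the sketch returns the address the load balancer itself observed, while trusting every hop hands back whatever the client put first in the header. That is the spoofing risk the bullet warns about.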

## Cleanup

4 changes: 3 additions & 1 deletion sample-amazon-aurora-dsql-auth-session-mgmt/package.json
@@ -8,7 +8,9 @@
"start": "node dist/index.js",
"dev": "ts-node src/index.ts",
"test": "vitest run",
"test:watch": "vitest"
"test:watch": "vitest",
"housekeeping": "node dist/scripts/housekeeping.js",
"setup-runtime-role": "node dist/scripts/setup-runtime-role.js"
},
"keywords": [
"aurora-dsql",
@@ -14,6 +14,9 @@ const constructorCalls: unknown[] = [];
vi.mock('@aws/aurora-dsql-node-postgres-connector', () => {
class MockAuroraDSQLPool {
end = mockEnd;
// Stub method matching the real connector's signature so the
// typeof-check assertion in getPool() passes.
transaction = async <T>(cb: (client: unknown) => Promise<T>): Promise<T> => cb({});
constructor(config: unknown) {
constructorCalls.push(config);
}
@@ -65,6 +68,7 @@ describe('DSQL Connection Pool', () => {
database: 'postgres',
max: 10,
idleTimeoutMillis: 300_000,
maxLifetimeSeconds: 3300,
});
expect(pool).toBeDefined();
});
34 changes: 27 additions & 7 deletions sample-amazon-aurora-dsql-auth-session-mgmt/src/db/connection.ts
@@ -28,13 +28,15 @@ let pool: AuroraDSQLPool | null = null;
* Returns the shared DSQL connection pool, creating it on first access.
*
* The pool is configured with:
* - `host` — read from the `DSQL_ENDPOINT` environment variable
* - `user` — `'admin'` (DSQL's default IAM-authenticated user)
* - `database` — `'postgres'` (DSQL's fixed database name)
* - `max` — 10 concurrent connections
* - `idleTimeoutMillis` — 300 000 ms (5 minutes), well under the 1-hour
* DSQL connection timeout so connections are recycled
* before they expire
* - `host` — read from the `DSQL_ENDPOINT` environment variable
* - `user` — `'admin'` (DSQL's default IAM-authenticated user)
* - `database` — `'postgres'` (DSQL's fixed database name)
* - `max` — 10 concurrent connections
* - `idleTimeoutMillis` — 300 000 ms (5 minutes), well under the 1-hour
* DSQL connection timeout so connections are
* recycled before they expire
* - `maxLifetimeSeconds` — 3 300 s (55 minutes), so each connection retires
* ahead of DSQL's hard 1-hour cap
*
* @throws {Error} If `DSQL_ENDPOINT` is not set in the environment.
*/
@@ -56,8 +58,26 @@ export function getPool(): AuroraDSQLPool {
database: 'postgres',
max: 10,
idleTimeoutMillis: 300_000, // 5 minutes
// Matches the connector default in @aws/aurora-dsql-node-postgres-connector
// v0.1.9 (parsePgConfig sets maxLifetimeSeconds: 3300 unless overridden);
// kept here for visibility so readers see the 1-hour cap accommodation
// without having to grep the connector source.
maxLifetimeSeconds: 3300,

Contributor


FYI the connector's parsePgConfig already sets maxLifetimeSeconds: 3300 as a default (grep the published 0.1.9 build). Setting it explicitly here is harmless but redundant. I'd either drop it, or add a comment saying "matches connector default, kept for visibility" so the next person doesn't think it's load-bearing.

});

// Production guarantee: AuroraDSQLPool always exposes transaction(), and
// the repositories rely on it for OCC retry. Surface a clear error here
// if a future refactor accidentally returns a plain pg.Pool, rather than
// silently degrading to the manual BEGIN/COMMIT fallback that exists for
// unit-test mocks.
if (typeof (pool as { transaction?: unknown }).transaction !== 'function') {
throw new Error(
'Pool does not expose transaction(); expected AuroraDSQLPool from ' +
'@aws/aurora-dsql-node-postgres-connector. Repositories rely on ' +
'pool.transaction() for OCC retry.',
);
}

return pool;
}
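
The runtime guard at the end of `getPool()` can also be written as a TypeScript assertion function, so that callers get compile-time narrowing in addition to the runtime error. This is a minimal sketch, not the code in this PR: `TransactionCapable` and `assertHasTransaction` are hypothetical names, and the `transaction` signature is modeled on the stub in the connection test above rather than taken from the connector's published types.

```typescript
// Hypothetical shape: only the one method the repositories are said to rely on.
// The callback signature mirrors the test stub; it is an assumption, not the
// connector's documented type.
interface TransactionCapable {
  transaction<T>(cb: (client: unknown) => Promise<T>): Promise<T>;
}

/**
 * Throws if `candidate` has no callable `transaction` property. After a
 * successful call, TypeScript treats the argument as TransactionCapable.
 */
function assertHasTransaction(
  candidate: unknown,
): asserts candidate is TransactionCapable {
  const maybe = candidate as { transaction?: unknown } | null | undefined;
  if (typeof maybe?.transaction !== 'function') {
    throw new Error(
      'Pool does not expose transaction(); expected a pool with OCC-aware transaction support.',
    );
  }
}

// Usage: a plain object stands in for a pool that lacks transaction().
const plainPool: unknown = { query: async () => ({ rows: [] }) };

let guardMessage = 'no error';
try {
  assertHasTransaction(plainPool);
} catch (err) {
  guardMessage = err instanceof Error ? err.message : String(err);
}
console.log(guardMessage);
```

The trade-off is small: the inline `typeof` check in the diff is shorter, while an assertion function is reusable and removes the cast at call sites.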

125 changes: 81 additions & 44 deletions sample-amazon-aurora-dsql-auth-session-mgmt/src/db/migrate.test.ts
@@ -7,17 +7,25 @@ import { runMigrations, PoolLike, ClientLike } from './migrate';

/**
* Creates a mock client that records every SQL statement it receives.
* Optionally accepts a callback that can throw to simulate failures.
*
* Returns `{ rows: [{ job_id: 'fake-job-id' }] }` for `CREATE INDEX ASYNC`
* statements (matching the real DSQL behaviour that returns a job_id row),
* and `{ rows: [] }` for everything else. Tests that need a different shape
* can pass a custom `onQuery` to override.
*/
function createMockClient(
onQuery?: (sql: string) => void
onQuery?: (sql: string) => void,
): ClientLike & { queries: string[] } {
const queries: string[] = [];
return {
queries,
query: vi.fn(async (sql: string) => {
query: vi.fn(async (sql: string, _params?: unknown[]) => {
queries.push(sql);
onQuery?.(sql);
if (sql.startsWith('CREATE INDEX ASYNC')) {
return { rows: [{ job_id: 'fake-job-id' }] };
}
return { rows: [] };
}),
release: vi.fn(),
};
@@ -60,17 +68,22 @@ describe('runMigrations', () => {

await runMigrations(pool);

// We expect 3 separate transactions: users table, sessions table,
// and the user_id index. The token_hash UNIQUE constraint already
// creates a backing index, so we deliberately do NOT add a second.
expect(clients).toHaveLength(3);
// 3 DDL transactions (users table, sessions table, user_id index) plus
// a 4th client for the post-DDL `sys.wait_for_job` call after the
// async index. The token_hash UNIQUE constraint already creates a
// backing index, so we deliberately do NOT add a second.
expect(clients).toHaveLength(4);

// Each client should have received BEGIN → DDL → COMMIT
for (const client of clients) {
// First three clients ran BEGIN → DDL → COMMIT.
for (const client of clients.slice(0, 3)) {
expect(client.queries[0]).toBe('BEGIN');
expect(client.queries[2]).toBe('COMMIT');
expect(client.queries).toHaveLength(3);
}

// The 4th client ran the wait_for_job for the async index.
expect(clients[3].queries).toHaveLength(1);
expect(clients[3].queries[0]).toBe('SELECT sys.wait_for_job($1)');

Contributor


The mock client's `query` is `vi.fn(async (sql: string) => ...)`, so only `sql` gets recorded. A regression where the migrator passes `undefined` (or the wrong variable) for `$1` would therefore still pass this test. Cheap fix:

expect(clients[3].query).toHaveBeenCalledWith(
  'SELECT sys.wait_for_job($1)',
  ['fake-job-id'],
);

});

it('creates the users table first', async () => {
@@ -98,7 +111,7 @@ describe('runMigrations', () => {
expect(ddl).toContain('token_hash VARCHAR(64) NOT NULL UNIQUE');
expect(ddl).toContain('expires_at TIMESTAMPTZ NOT NULL');
expect(ddl).toContain('revoked_at TIMESTAMPTZ');
expect(ddl).toContain('client_metadata JSONB');
expect(ddl).toContain('client_metadata TEXT');
});

it('creates the user_id index third', async () => {
@@ -116,11 +129,34 @@ describe('runMigrations', () => {

await runMigrations(pool);

const allDdl = clients.map((c) => c.queries[1]).join('\n');
const allDdl = clients.slice(0, 3).map((c) => c.queries[1]).join('\n');
expect(allDdl).not.toContain('idx_sessions_token_hash');
expect(allDdl).not.toMatch(/CREATE INDEX[^\n]*ON sessions \(token_hash\)/);
});

it('waits for the async index job to complete', async () => {
const { pool, clients } = createMockPool();

await runMigrations(pool);

// The 4th client (after the 3 DDL clients) ran SELECT sys.wait_for_job($1).
// Assert on the params too so a regression that passes undefined for $1
// would fail the test.
expect(clients[3].query).toHaveBeenCalledWith(
'SELECT sys.wait_for_job($1)',
['fake-job-id'],
);
});

it('skips wait_for_job when waitForAsyncJobs is false', async () => {
const { pool, clients } = createMockPool();

await runMigrations(pool, { waitForAsyncJobs: false });

// Only the 3 DDL clients — no wait_for_job client.
expect(clients).toHaveLength(3);
});

it('releases every client back to the pool', async () => {
const { pool, clients } = createMockPool();

@@ -135,47 +171,48 @@ describe('runMigrations', () => {
// Error handling
// -----------------------------------------------------------------------

it('rolls back and re-throws when a DDL statement fails', async () => {
const ddlError = new Error('relation already exists');
let callCount = 0;

const failingPool: PoolLike = {
it('rolls back the transaction when a DDL statement fails', async () => {
const clients: ReturnType<typeof createMockClient>[] = [];
const pool: PoolLike = {
connect: vi.fn(async () => {
callCount++;
// Fail on the second DDL (sessions table)
if (callCount === 2) {
return createMockClient((sql) => {
if (sql !== 'BEGIN' && sql !== 'ROLLBACK') {
throw ddlError;
}
});
}
return createMockClient();
const indexBeingCreated = clients.length; // 0,1,2,...
const client = createMockClient((sql) => {
// Throw on the second client's DDL (the sessions table CREATE)
if (indexBeingCreated === 1 && sql.startsWith('CREATE TABLE')) {
throw new Error('simulated DDL failure');
}
});
clients.push(client);
return client;
}),
};

await expect(runMigrations(failingPool)).rejects.toThrow(
'relation already exists'
);
});
await expect(runMigrations(pool)).rejects.toThrow('simulated DDL failure');

it('rolls back the transaction on failure before releasing the client', async () => {
const ddlError = new Error('syntax error');
const failingClient = createMockClient((sql) => {
if (sql !== 'BEGIN' && sql !== 'ROLLBACK') {
throw ddlError;
}
});
// Second client should have run BEGIN, attempted DDL, and ROLLBACK.
const secondClient = clients[1];
expect(secondClient.queries).toContain('BEGIN');
expect(secondClient.queries).toContain('ROLLBACK');
});

const failingPool: PoolLike = {
connect: vi.fn(async () => failingClient),
it('still releases the client when the DDL fails', async () => {
const clients: ReturnType<typeof createMockClient>[] = [];
const pool: PoolLike = {
connect: vi.fn(async () => {
const indexBeingCreated = clients.length;
const client = createMockClient((sql) => {
// Fail the very first DDL (users table CREATE)
if (indexBeingCreated === 0 && sql.startsWith('CREATE TABLE')) {
throw new Error('simulated failure');
}
});
clients.push(client);
return client;
}),
};

await expect(runMigrations(failingPool)).rejects.toThrow('syntax error');
await expect(runMigrations(pool)).rejects.toThrow('simulated failure');

// Should have attempted ROLLBACK after the failure
expect(failingClient.queries).toContain('ROLLBACK');
// Client must still be released even after an error
expect(failingClient.release).toHaveBeenCalledTimes(1);
expect(clients[0].release).toHaveBeenCalledTimes(1);
});
});
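
For readers without `migrate.ts` open, the flow these tests exercise can be sketched as follows. This is a rough reconstruction from the assertions above, not the PR's actual implementation: `SketchClient`, `SketchPool`, `runDdlInTransaction`, `createIndexAndWait`, `createRecordingPool`, and the index name in the usage note are all hypothetical.

```typescript
// Minimal shapes, loosely modeled on the PoolLike / ClientLike names in the
// tests. The real definitions in migrate.ts are not shown in this diff.
interface SketchResult {
  rows: Array<Record<string, unknown>>;
}
interface SketchClient {
  query(sql: string, params?: unknown[]): Promise<SketchResult>;
  release(): void;
}
interface SketchPool {
  connect(): Promise<SketchClient>;
}

// Runs one DDL statement as BEGIN -> DDL -> COMMIT on its own client,
// rolling back on failure and always releasing the client.
async function runDdlInTransaction(pool: SketchPool, ddl: string): Promise<SketchResult> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const result = await client.query(ddl);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    try {
      await client.query('ROLLBACK');
    } catch {
      // Best effort: the original error is the one worth surfacing.
    }
    throw err;
  } finally {
    client.release();
  }
}

// Issues CREATE INDEX ASYNC, then blocks on the returned job id using a
// separate client, matching the "4th client" the tests expect.
async function createIndexAndWait(pool: SketchPool, ddl: string): Promise<void> {
  const { rows } = await runDdlInTransaction(pool, ddl);
  const jobId = rows[0]?.job_id;
  if (typeof jobId !== 'string') {
    throw new Error('CREATE INDEX ASYNC did not return a job_id row');
  }
  const client = await pool.connect();
  try {
    await client.query('SELECT sys.wait_for_job($1)', [jobId]);
  } finally {
    client.release();
  }
}

// Hand-rolled recording pool (no vitest needed). It records params as well
// as SQL, so an assertion on $1 is possible, as the review comment suggests.
interface RecordedCall {
  sql: string;
  params: unknown[] | undefined;
}
function createRecordingPool() {
  const calls: RecordedCall[] = [];
  const stats = { connects: 0, releases: 0 };
  const pool: SketchPool = {
    async connect() {
      stats.connects += 1;
      return {
        async query(sql: string, params?: unknown[]) {
          calls.push({ sql, params });
          return sql.startsWith('CREATE INDEX ASYNC')
            ? { rows: [{ job_id: 'fake-job-id' }] }
            : { rows: [] };
        },
        release() {
          stats.releases += 1;
        },
      };
    },
  };
  return { pool, calls, stats };
}
```

Calling `createIndexAndWait(pool, 'CREATE INDEX ASYNC idx_sessions_user_id ON sessions (user_id)')` against the recording pool should record four statements across two clients, with `['fake-job-id']` as the parameters of the final call.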