
S3 binary data: socket pool exhaustion causes workflow hangs after N iterations #26968

@dpbutter

Bug Description

Workflows that repeatedly read a binary file stored in S3 through a node that calls destroy() before fully consuming the stream will eventually hang with no error. The hang is silent (no timeout, no log output) and persists until the worker process is restarted. The number of iterations before the hang varies, proportional to maxSockets (default 50 in the AWS SDK's NodeHttpHandler).

A typical pattern that triggers it: a loop that reads a large CSV from S3 in batches using the Extract from File node, where each iteration reads the first X rows then stops.

Root cause

The NodeHttpHandler in AWS SDK v3 (@aws-sdk/client-s3) uses an http.Agent keep-alive pool with maxSockets: 50. When ObjectStoreService.get() is called with mode: 'stream', it returns the raw SDK response body (SdkStream). If the caller does not fully consume this stream, the socket stays in the agent's active set and is never returned to the free pool.

The Extract from File node (fromFile.operation.ts), when Max Row Count is set, deliberately reads only the first X rows of a CSV, then calls stream.destroy() to stop early. Calling .destroy() on the SDK response stream sends a TCP RST but does not correctly decrement the socket count in the http.Agent: the slot remains marked as occupied. After 50 such calls (one per batch iteration), all slots are phantom-occupied and every subsequent S3 request queues forever.

This is a known issue with AWS: aws/aws-sdk-js-v3#6691
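The "not fully consumed" state is observable on a plain Node Readable, with no AWS involvement. This stand-in sketch (stream contents are illustrative, not from the n8n codebase) shows the state a response body is left in after a partial read followed by destroy():

```typescript
import { Readable } from 'node:stream';

// Stand-in for the SDK response body: more data buffered than the
// caller will consume.
const body = new Readable({ read() {} });
body.push('row1\nrow2\n'); // 10 bytes buffered

const firstRow = body.read(5); // partial read: take only the first row
body.destroy();                // abandon the rest, as the node does

// readableEnded only becomes true when a stream is consumed to
// completion, so after an early destroy() it stays false. This is the
// condition a cleanup hook can key on to distinguish abandonment from
// a normal end of stream.
console.log(String(firstRow));   // "row1\n"
console.log(body.readableEnded); // false
```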

Possible solution

Note: nodes other than Extract from File also call destroy() on these streams, so a good solution would fix this in one place for all callers.

The AWS SDK v3 documentation explicitly states that the correct way to abandon a response stream mid-flight is via AbortController, not stream.destroy(). The abort path routes through the SDK's own request teardown machinery, which correctly handles agent bookkeeping.

In ObjectStoreService.get() (stream mode): Pass an AbortController signal to s3Client.send(), wrap the response body in a PassThrough, and on wrapper 'close' (triggered when the caller destroys the stream), call abortController.abort() if the body wasn't fully consumed:

	const command = new GetObjectCommand({
		Bucket: this.bucket,
		Key: fileId,
	});

	try {
		const abortController = new AbortController();
		const { Body: body } = await this.s3Client.send(command, {
			abortSignal: abortController.signal,
		});
		if (!body) throw new UnexpectedError('Received empty response body');

		if (mode === 'stream') {
			if (!(body instanceof Readable)) {
				throw new UnexpectedError(`Expected stream but received ${typeof body}.`);
			}

			const wrapper = new PassThrough();

			// When the wrapper closes without the body being fully consumed
			// (e.g. caller calls destroy() after a partial read), route cleanup
			// through the SDK's abort path, which correctly frees the socket slot
			// in the http.Agent. A raw destroy() on the response stream does not.
			wrapper.on('close', () => {
				if (!body.readableEnded) {
					abortController.abort();
					body.destroy();
				}
			});

			body.on('error', (err) => wrapper.destroy(err));
			body.pipe(wrapper);

			return wrapper;
		}

		return await streamToBuffer(body as Readable);
	} catch (e) {
		throw new UnexpectedError('Request to S3 failed', { cause: e });
	}
}

This fixes all callers at once — no changes needed to the individual nodes that consume streams.
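The wrapper wiring can be exercised in isolation with a plain Readable standing in for the SDK response body (the demo function and stream contents below are illustrative, not part of the n8n codebase):

```typescript
import { once } from 'node:events';
import { PassThrough, Readable } from 'node:stream';

async function demo(): Promise<boolean> {
	const abortController = new AbortController();

	// Stand-in for the SDK response body: a stream with more data
	// buffered than the caller will consume.
	const body = new Readable({ read() {} });
	body.push('row1\nrow2\nrow3\n');

	const wrapper = new PassThrough();
	wrapper.on('close', () => {
		// The body never ended, so clean up through the abort path,
		// mirroring the proposed ObjectStoreService.get() fix.
		if (!body.readableEnded) {
			abortController.abort();
			body.destroy();
		}
	});
	body.pipe(wrapper);

	// A caller that stops early, as Extract from File does with
	// Max Row Count set.
	wrapper.once('data', () => wrapper.destroy());

	await once(wrapper, 'close');
	return abortController.signal.aborted;
}

demo().then((aborted) => console.log('aborted:', aborted));
```

Destroying the wrapper (rather than the body) is what lets the cleanup run in one place: the caller never sees the raw SDK stream, so it cannot bypass the abort bookkeeping.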

(Acknowledgement to Claude Code for helping debug the problem and test a solution.)

To Reproduce

  1. Store a reasonably large CSV (e.g. 10 MB+) in S3 as a binary data object
  2. Create a workflow that loops over the file in batches using Extract from File (CSV format, Max Row Count set)
  3. Run the workflow — it succeeds for the first ~50 iterations, then hangs indefinitely

Expected behavior

The workflow runs to completion: abandoning a partial read should return the socket to the pool instead of hanging all subsequent S3 requests.

Debug Info

core

  • n8nVersion: 2.12.0
  • platform: docker (self-hosted)
  • nodeJsVersion: 24.13.1
  • nodeEnv: production
  • database: postgres
  • executionMode: scaling (single-main)
  • concurrency: -1
  • license: enterprise (production)
  • consumerId: 2951cab7-17a3-41ae-9aac-bbd7e1a15af4

storage

  • success: all
  • error: all
  • progress: false
  • manual: true
  • binaryMode: s3

pruning

  • enabled: true
  • maxAge: 336 hours
  • maxCount: 10000 executions

client

  • userAgent: mozilla/5.0 (macintosh; intel mac os x 10_15_7) applewebkit/537.36 (khtml, like gecko) chrome/145.0.0.0 safari/537.36
  • isTouchDevice: false

Generated at: 2026-03-13T00:31:01.100Z

Operating System

Docker (MacOS)

n8n Version

2.12.0

Node.js Version

24.13.1

Database

PostgreSQL

Execution mode

queue

Hosting

self hosted
