Support for true streaming of large file uploads #10845

@agionoja

Motivation

The @remix-run/multipart-parser is exceptionally fast and efficient for small-to-medium payloads. However, its current implementation buffers the entire content of each file part into memory before yielding it to the consumer. This behavior rules out what is arguably the most critical use case for a streaming parser: handling large file uploads in a memory-constrained environment.

This issue proposes introducing a true, end-to-end streaming API for file parts to make the parser robust for all use cases and align its implementation with the "Memory Efficient" promise in the README.

The Core Issue: Unbounded Memory Buffering

In a real-world test on a system with 16GB of RAM, the current buffering behavior proved to be a critical bottleneck:

  • Uploading a 1GB file caused the process's memory usage to spike to over 1GB.
  • Attempting to upload a 2.5GB file exhausted all available system memory, crashing the process.
  • In contrast, a library like busboy on the same system handled a 20GB file upload with a stable memory footprint of ~700MB (a sketch of the busboy pattern follows this list).
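
The flat memory profile in the busboy test comes from its event-driven streaming: file content is handed to the consumer as a Node.js Readable and written out chunk by chunk. A minimal sketch of that pattern (the exact test server is not shown in this issue; the port and upload path are illustrative):

import * as http from 'node:http';
import * as path from 'node:path';
import { createWriteStream } from 'node:fs';
import busboy from 'busboy';

http.createServer((req, res) => {
  const bb = busboy({ headers: req.headers });
  bb.on('file', (name, file, info) => {
    // `file` is a Readable that yields chunks as they arrive, so memory
    // stays roughly constant regardless of the upload size.
    file.pipe(createWriteStream(path.join('uploads', info.filename)));
  });
  bb.on('close', () => res.end('Upload complete'));
  req.pipe(bb);
}).listen(3002);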

The current API design encourages this memory-intensive pattern, as the entire file's content is loaded into part.bytes before it can be processed:

for await (let part of parseMultipartRequest(request)) {
  if (part.isFile) {
    // By the time this loop yields a `part`, its entire content is already
    // buffered in `part.bytes`, causing memory usage to spike to the size of the file.
    await saveFile(part.filename, part.bytes);
  }
}

This effectively turns a streaming transport layer into a buffered-per-part implementation at the application layer, negating the benefits of streaming for large files.

Steps to Reproduce

The memory exhaustion issue can be reliably reproduced using the bun-large-file demo within this repository.

  1. Clone the repository and navigate to the demo:

    git clone https://github.com/remix-run/remix.git
    cd remix/packages/multipart-parser/demos/bun-large-file
  2. Use a minimal server to isolate the issue: ensure that server.ts uses the standard parseMultipartRequest and accesses part.bytes.

    // packages/multipart-parser/demos/bun-large-file/server.ts
    import { parseMultipartRequest } from '@remix-run/multipart-parser'
    import * as fs from 'fs/promises'
    import * as path from 'path'
    
    const UPLOAD_DIR = path.resolve(__dirname, 'uploads')
    await fs.mkdir(UPLOAD_DIR, { recursive: true })
    
    Bun.serve({
      port: 3001,
      maxRequestBodySize: Infinity,
      async fetch(request) {
        if (request.method === 'POST') {
          try {
            for await (let part of parseMultipartRequest(request, { maxFileSize: Infinity })) {
              if (part.isFile) {
                const filePath = path.join(UPLOAD_DIR, part.filename!)
                // This line buffers the entire file into memory before writing.
                await fs.writeFile(filePath, part.bytes)
              }
            }
            return new Response('Upload complete', { status: 200 })
          } catch (error) {
            console.error(error)
            return new Response('Error', { status: 500 })
          }
        }
        return new Response('OK')
      },
    })
    console.log('Server listening on http://localhost:3001 ...')
  3. Install dependencies and start the server:

    pnpm install
    bun start
  4. Upload a file larger than available RAM:

    # Create a dummy 3GB file
    dd if=/dev/zero of=large_file.bin bs=1G count=3
    
    # Upload the file
    curl -X POST -F "file=@large_file.bin" http://localhost:3001
  5. Monitor memory usage: observe the bun process's memory consumption. It will grow linearly with the size of the upload, eventually leading to process or system instability (one way to observe this is sketched below).
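
To watch the growth, one option (an optional addition to the server.ts above, not part of the demo itself) is to log the resident set size once per second:

// Works in both Bun and Node.js.
setInterval(() => {
  const rssMb = process.memoryUsage().rss / (1024 * 1024);
  console.log(`rss: ${rssMb.toFixed(0)} MB`);
}, 1000);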

Proposed Solution: A True Streaming API for Parts

To address this, the content of a file part should be exposed as a ReadableStream, allowing the consumer to process the file in chunks as they arrive. This keeps memory usage low and constant, regardless of file size.
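
Concretely, a consumer could then read each part one chunk at a time, holding only a single chunk in memory. A sketch of that consumption model, assuming the part.stream property proposed below (handleChunk is a placeholder for whatever the application does with each chunk):

const reader = part.stream.getReader();
let received = 0;
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  received += value.byteLength;
  await handleChunk(value); // placeholder: write to disk, cloud storage, etc.
}
console.log(`Streamed ${received} bytes without buffering the whole file.`);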

Proposal 1: Expose a ReadableStream on the MultipartPart

This approach is idiomatic in modern JavaScript and preserves the ergonomic for await...of API.

import { createWriteStream } from 'node:fs';
import { Writable } from 'node:stream';

for await (let part of parseMultipartRequest(request)) {
  if (part.isFile) {
    // Get a stream of the file content
    const stream = part.stream; // or part.contentStream

    // Pipe it directly to a file on disk or a cloud storage service.
    // Writable.toWeb adapts the Node.js write stream to the web-standard
    // WritableStream that ReadableStream.pipeTo expects.
    await stream.pipeTo(Writable.toWeb(createWriteStream(part.filename)));
  } else {
    // Non-file parts can still be buffered as they are typically small
    console.log(part.name, await part.text());
  }
}

Implementation Considerations:

  • To prevent accidental buffering, accessing .bytes or .text() on a part that has had its stream consumed should throw an error.
  • Conversely, accessing .stream after .bytes has been read should yield an empty stream or throw.
  • This new property would only be necessary for file parts (isFile === true); a sketch of these one-shot semantics follows.
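
A minimal sketch of how that one-shot behavior could be enforced (names and shapes are hypothetical, not the library's actual internals; async iteration over ReadableStream assumes Node 18+ or Bun):

class FilePartSketch {
  #source: ReadableStream<Uint8Array>;
  #consumed = false;

  constructor(source: ReadableStream<Uint8Array>) {
    this.#source = source;
  }

  get stream(): ReadableStream<Uint8Array> {
    // Hand the underlying stream out exactly once.
    if (this.#consumed) throw new Error('Part content already consumed');
    this.#consumed = true;
    return this.#source;
  }

  async bytes(): Promise<Uint8Array> {
    // Buffering goes through `stream`, so this throws after the stream
    // has been handed out, and vice versa.
    const chunks: Uint8Array[] = [];
    for await (const chunk of this.stream) chunks.push(chunk);
    const out = new Uint8Array(chunks.reduce((n, c) => n + c.byteLength, 0));
    let offset = 0;
    for (const c of chunks) {
      out.set(c, offset);
      offset += c.byteLength;
    }
    return out;
  }
}
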
Proposal 2: An Event-Driven API (like Busboy)

Alternatively, an event-based approach is a well-established pattern for memory-efficient stream processing.

import { createWriteStream } from 'node:fs';
import { Writable } from 'node:stream';

const parser = createStreamingMultipartParser(request);

parser.on('file', (filename, stream, contentType) => {
  // 'stream' is a ReadableStream of the file content
  console.log(`Receiving file: ${filename}`);
  stream.pipeTo(Writable.toWeb(createWriteStream(filename)));
});

parser.on('field', (name, value) => {
  console.log(`Received field: ${name} = ${value}`);
});

await parser.done();

This pattern, while a larger departure from the current API, has proven highly effective for this use case. The two proposals are also not mutually exclusive: the event-driven API could likely be layered on top of Proposal 1, as sketched below.
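
A sketch of such an adapter (assuming the part.stream property from Proposal 1; the content-type property name is assumed, not confirmed against the library):

import { EventEmitter } from 'node:events';
import { parseMultipartRequest } from '@remix-run/multipart-parser';

function createStreamingMultipartParser(request: Request) {
  const emitter = new EventEmitter();
  const done = (async () => {
    for await (const part of parseMultipartRequest(request)) {
      if (part.isFile) {
        // NOTE: a real implementation would need to wait for the file
        // stream to be fully consumed before advancing, since parts
        // arrive sequentially on the wire.
        emitter.emit('file', part.filename, part.stream, part.contentType);
      } else {
        emitter.emit('field', part.name, await part.text());
      }
    }
  })();
  return Object.assign(emitter, { done: () => done });
}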

Conclusion

Implementing a true streaming primitive for file parts would solidify @remix-run/multipart-parser's position as a top-tier solution. It would combine its already benchmarked speed with the memory safety required for modern, production-grade applications, making it a clear and compelling choice for all multipart parsing needs.
