
File transfer via shared volume + REST or message queue #228

@StarmanMartin


For very large file transfers between services in the same Docker Compose environment, using REST to push the entire payload is often the least efficient option.

If both containers run on the same host (or same Docker network), sending multi-GB files over HTTP means:

  • serializing/deserializing request bodies
  • copying data through HTTP stacks
  • buffering
  • potential reverse proxy limits/timeouts
  • retry complexity on partial transfers

A shared volume + lightweight REST/events is often significantly faster and simpler for bulk data exchange.

Option 1: Keep full file transfer over REST

Pros

  • Simple mental model: one API call does everything.
  • Strong request/response semantics.
  • Easier authentication/authorization boundaries.
  • Language/framework agnostic.
  • Easy remote extension later (if services move to separate hosts).
  • Can stream data (chunked, multipart, resumable uploads).

Cons

  • Extra copies of data in memory/kernel/network stack.

  • HTTP overhead (headers, parsing, TLS if enabled).

  • Large request buffering depending on framework/proxy.

  • Timeouts:

    • client timeout
    • reverse proxy timeout
    • idle timeout
  • Harder resumability after failure unless explicitly implemented.

  • Can fill logs/metrics/tracing systems unexpectedly.

  • Higher CPU usage.

  • More disk I/O if temporary upload storage is used.

  • Large files can break load balancers/proxies (e.g. Nginx body limits).

When REST is still fine

  • Files <100–500 MB
  • Infrequent transfers
  • Need external compatibility
  • Already using streaming APIs properly

Option 2: Shared directory/volume + REST for metadata/triggers

Pattern:

  1. Producer writes file to shared volume
  2. Producer calls REST:
    POST /process {"path": "/shared/job123/file.bin"}
  3. Consumer reads file directly
  4. Consumer responds with the result or posts status updates

This is usually the best choice for your case.

Example Docker Compose:

services:
  producer:
    volumes:
      - shared-data:/shared

  consumer:
    volumes:
      - shared-data:/shared

volumes:
  shared-data:

Pros

Performance

  • Usually fastest on single host
  • No network transfer of actual file
  • Zero/minimal serialization
  • OS filesystem caching helps
  • Lower CPU

Reliability

  • File persists independently of service restarts
  • Consumer can retry reading later
  • Easier resume/recovery

Simplicity for huge files

  • No multipart upload complexity
  • No HTTP size limits
  • No proxy issues

Decoupling

  • Producer and consumer can operate asynchronously.

Cons

Shared-state complexity

Now you manage:

  • file naming
  • lifecycle
  • cleanup
  • retention
  • versioning

Without discipline, shared folders become digital attics full of forgotten gigabytes.

Race conditions

The consumer may start reading the file before the producer has finished writing it.

Strategies to prevent this:

  • temp file then atomic rename
  • lock files
  • completion marker

Example:

file.tmp
mv file.tmp file.done

or

file.bin
file.bin.ready
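
The completion-marker variant above could look roughly like this in Python. This is a minimal sketch, not the project's implementation: the `.ready` suffix follows the example above, while the helper names `publish` and `is_ready` are made up for illustration. It does not cover durability (no fsync) or cleanup of the marker.

```python
from pathlib import Path

READY_SUFFIX = ".ready"  # marker convention taken from the example above


def _marker_for(payload_path: Path) -> Path:
    # file.bin -> file.bin.ready
    return Path(str(payload_path) + READY_SUFFIX)


def publish(payload_path: Path, data: bytes) -> None:
    """Producer side (hypothetical helper): payload first, marker last."""
    payload_path.write_bytes(data)
    # The marker is created only after the payload has been written completely.
    _marker_for(payload_path).touch()


def is_ready(payload_path: Path) -> bool:
    """Consumer side (hypothetical helper): read the payload only if this is True."""
    return _marker_for(payload_path).exists()
```

The consumer never opens `file.bin` directly; it polls or is notified, then checks `is_ready(...)` first.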

Security/isolation

Shared volume weakens service boundaries:

  • accidental overwrite
  • unauthorized reads

Need permissions/read-only mounts where possible.
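
Because the trigger call carries a file path, the consumer may also want to refuse paths that point outside the shared mount. A minimal sketch, assuming the `/shared` mount point from the Compose example; `resolve_in_shared` is a hypothetical helper name, and this check complements rather than replaces file permissions and read-only mounts.

```python
from pathlib import Path

SHARED_ROOT = Path("/shared")  # mount point from the Compose example


def resolve_in_shared(requested: str, root: Path = SHARED_ROOT) -> Path:
    """Return the resolved path, or raise ValueError if it escapes root."""
    root = root.resolve()
    # Joining with an absolute path replaces root entirely, so absolute
    # inputs such as "/etc/passwd" are caught by the check below as well.
    candidate = (root / requested).resolve()
    # resolve() collapses ".." and follows symlinks, so the check applies
    # to the location that would actually be opened.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes shared root: {requested}")
    return candidate
```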

Harder horizontal scaling

Works great on one host.

Gets harder when:

  • multiple hosts
  • Kubernetes
  • cloud autoscaling

Then you need distributed/shared storage.

Cleanup required

Need janitor process/TTL cleanup.
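
A TTL janitor can be quite small. The sketch below assumes one directory per job directly under the shared root (as in `/shared/job123`) and uses the directory's modification time as its age; both are assumptions, and `remove_expired_jobs` is a made-up name. Note that a directory's mtime changes when entries are added or removed, not when an existing file is rewritten, so a real janitor might prefer an explicit timestamp in job metadata.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def remove_expired_jobs(root: Path, ttl_seconds: float,
                        now: Optional[float] = None) -> List[Path]:
    """Delete job directories under root that are older than ttl_seconds."""
    now = time.time() if now is None else now
    removed = []
    for job_dir in sorted(root.iterdir()):
        if not job_dir.is_dir():
            continue  # ignore stray files in the root
        age = now - job_dir.stat().st_mtime
        if age > ttl_seconds:
            shutil.rmtree(job_dir)
            removed.append(job_dir)
    return removed
```

This could run periodically from a small sidecar container or a scheduled task that mounts the same volume.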


Recommended improvements if using shared directory

Use workflow like:

write -> fsync -> atomic rename -> notify

Detailed:

  1. write /shared/job42/output.part
  2. flush + fsync, then close
  3. rename to /shared/job42/output.dat
  4. REST call: "job42 ready"

Atomic rename avoids half-written reads.
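
As a rough Python sketch of the write, fsync, rename part of this workflow (the notify step is left out): `write_atomically` is a hypothetical helper, and the `.part` suffix follows the steps above. The rename is only atomic when the temporary and final paths are on the same filesystem, which holds here because both live in the same job directory. Syncing the parent directory for full crash durability is not shown.

```python
import os
from pathlib import Path
from typing import Iterable


def write_atomically(final_path: Path, chunks: Iterable[bytes]) -> None:
    """Write chunks to '<stem>.part', fsync, then rename to final_path."""
    part_path = final_path.with_suffix(".part")  # output.dat -> output.part
    final_path.parent.mkdir(parents=True, exist_ok=True)
    with open(part_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to persist the data before renaming
    # Readers see either no output.dat or the complete one, never a partial file.
    os.replace(part_path, final_path)
```

Usage would be along the lines of `write_atomically(Path("/shared/job42/output.dat"), chunk_iterator)`, followed by the REST call.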

Add metadata:

{
  "job_id": "42",
  "path": "/shared/job42/output.dat",
  "checksum": "sha256:..."
}

Consumer validates checksum.
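
Checksum validation on the consumer side might look like this. It is a sketch that assumes the `sha256:<hex>` format and the field names from the metadata example above; `sha256_of` and `verify` are illustrative names. The file is hashed in chunks so multi-GB payloads are never loaded into memory at once.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in 1 MiB chunks and return 'sha256:<hex digest>'."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def verify(metadata: dict) -> bool:
    """Compare the file at metadata['path'] against metadata['checksum']."""
    return sha256_of(Path(metadata["path"])) == metadata["checksum"]
```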


Option 3: Shared volume + message queue (better than REST triggers)

Instead of REST triggers:

  • write file to shared volume
  • send message to queue

Tools:

  • RabbitMQ
  • Apache Kafka
  • Redis streams/pubsub

Flow:

Producer -> shared file
Producer -> queue message
Consumer -> receives event -> reads file
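
To illustrate the flow without committing to a broker, the sketch below uses Python's in-process `queue.Queue` as a stand-in for RabbitMQ, Kafka, or Redis. That substitution is an assumption made purely for illustration; a real setup would use the broker's client library, and the function names and message fields here are made up. The point is that only a small reference travels through the queue while the payload stays on the shared volume.

```python
import json
import queue
from pathlib import Path

events = queue.Queue()  # in-process stand-in for a real message broker


def produce(shared: Path, job_id: str, data: bytes) -> None:
    """Write the payload to the shared volume, then enqueue a reference."""
    job_dir = shared / job_id
    job_dir.mkdir(parents=True, exist_ok=True)
    path = job_dir / "file.bin"
    path.write_bytes(data)
    # The message carries the path only, never the file contents.
    events.put(json.dumps({"job_id": job_id, "path": str(path)}))


def consume_one(timeout: float = 1.0) -> bytes:
    """Receive one event and read the referenced file directly from disk."""
    event = json.loads(events.get(timeout=timeout))
    return Path(event["path"]).read_bytes()
```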

Pros

  • asynchronous
  • retries
  • backpressure
  • dead letter queues
  • less coupling than REST

Cons

  • extra infrastructure
  • operational complexity

Best for many jobs/high throughput.


Decision matrix

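| Approach              | Speed      | Complexity  | Scalability | Reliability |
|-----------------------|------------|-------------|-------------|-------------|
| REST full transfer    | Low–Medium | Low         | High        | Medium      |
| Shared volume + REST  | Very High  | Medium      | Low–Medium  | High        |
| Shared volume + Queue | Very High  | Medium–High | Medium      | High        |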

My recommendation for your setup

Since you already have:

  • same Docker Compose environment
  • huge files
  • REST in place for commands

Best architecture is probably:

Shared Docker volume for file payloads
+
REST (or queue) for commands/status/events

So:

Service A:
  writes /shared/job123/input.dat
  POST /jobs/job123/start

Service B:
  reads /shared/job123/input.dat
  processes
  writes /shared/job123/result.dat
  POST /jobs/job123/done

This gives:

  • fastest transfer
  • minimal code changes
  • keeps control plane in REST
  • data plane via filesystem

A nice split: REST for intentions, filesystem for bulk data.

That’s usually the sweet spot.
