Improve export scalability using asynchronous background processing #3266

@dsk-dev-ai

Problem

The current CSV export implementation has been improved using streaming
responses and chunked iteration. However, it still relies on synchronous
request handling.

For large datasets and concurrent users, this can lead to:

  • High CPU usage during export processing
  • Heavy database queries executed within the request lifecycle
  • Blocking HTTP requests for long durations
  • Potential server overload under concurrent usage

Proposed Solution

Move export processing to an asynchronous background job system:

  • Use a task queue such as Celery with Redis
  • Trigger export as a background task instead of processing in the request
  • Generate the CSV file in a worker process
  • Store the generated file (local storage or S3)
  • Return a response indicating “export in progress”
  • Provide a download link once processing is complete
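The flow above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: an in-memory `queue.Queue` and a thread stand in for Redis and a Celery worker, and `JOBS`, `fetch_rows`, `request_export`, `worker`, and `download_link` are hypothetical names.

```python
import csv
import os
import queue
import tempfile
import threading
import uuid

# In-memory stand-ins (assumptions): a real deployment would use a broker such
# as Redis for the queue and a database or result backend for the job registry.
JOBS = {}                 # job_id -> {"status": ..., "path": ...}
TASKS = queue.Queue()     # pending export jobs
EXPORT_DIR = tempfile.mkdtemp(prefix="exports-")


def fetch_rows(chunk_size=1000, total=2500):
    """Hypothetical data source that yields rows in chunks."""
    for start in range(0, total, chunk_size):
        end = min(start + chunk_size, total)
        yield [(i, f"item-{i}") for i in range(start, end)]


def request_export():
    """What the API endpoint would do: enqueue the job and return at once."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "pending", "path": None}
    TASKS.put(job_id)
    return {"job_id": job_id, "status": "export in progress"}


def worker():
    """What a worker would do: build the CSV outside the request lifecycle."""
    while True:
        job_id = TASKS.get()
        if job_id is None:          # sentinel used to stop the worker
            TASKS.task_done()
            break
        JOBS[job_id]["status"] = "processing"
        path = os.path.join(EXPORT_DIR, f"{job_id}.csv")
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["id", "name"])
            for chunk in fetch_rows():
                writer.writerows(chunk)
        JOBS[job_id].update(status="completed", path=path)
        TASKS.task_done()


def download_link(job_id):
    """Return the file location once the job is completed, else None."""
    job = JOBS[job_id]
    return job["path"] if job["status"] == "completed" else None
```

To try it, start `threading.Thread(target=worker, daemon=True)`, call `request_export()`, and poll `download_link(job_id)` until it returns a path. With Celery, `request_export` would enqueue a task instead, and `worker` would become the task body.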

Suggested Architecture

User Request → API Endpoint → Task Queue (Redis) → Worker (Celery) → File Storage → Download Link

Benefits

  • Eliminates request blocking
  • Distributes load across worker processes
  • Handles large datasets reliably
  • Improves scalability under concurrent usage
  • Reduces risk of server crashes

Possible Enhancements

  • Export job status tracking (pending, processing, completed)
  • Notification system (UI or email when export is ready)
  • Retry mechanism for failed exports
  • Download history for users
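Status tracking and retries could look roughly like the sketch below. It is an illustration under assumptions: `ExportJob`, `ExportStatus`, and `run_with_retry` are hypothetical names, and the `failed` state is not listed in this issue but is needed once retries run out. Celery tasks have their own retry options, which would likely replace the hand-written loop.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class ExportStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"  # assumption: extra state for jobs that exhaust retries


@dataclass
class ExportJob:
    job_id: str
    status: ExportStatus = ExportStatus.PENDING
    attempts: int = 0
    errors: list = field(default_factory=list)


def run_with_retry(job, build_csv, max_attempts=3, base_delay=0.0):
    """Run build_csv(job), retrying with exponential backoff on failure."""
    job.status = ExportStatus.PROCESSING
    for attempt in range(1, max_attempts + 1):
        job.attempts = attempt
        try:
            result = build_csv(job)
        except Exception as exc:  # broad on purpose: this is only a sketch
            job.errors.append(str(exc))
            if attempt < max_attempts:
                # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries
                time.sleep(base_delay * 2 ** (attempt - 1))
            continue
        job.status = ExportStatus.COMPLETED
        return result
    job.status = ExportStatus.FAILED
    return None
```

Recording `attempts` and `errors` on the job gives a later UI or notification step something to report without re-running the export.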

Acceptance Criteria

  • Export requests should not block the HTTP response lifecycle
  • Large dataset exports should complete without server errors
  • Multiple concurrent exports should not overload the server
  • Users should be able to download the generated file after completion
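The first and third criteria can be checked in miniature with a bounded worker pool. The sketch below is a stand-in, not a Celery test: `ThreadPoolExecutor` plays the role of the worker processes, `MAX_WORKERS` mimics worker concurrency, and `ConcurrencyProbe` and `submit_exports` are hypothetical helpers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2  # assumption: pool size standing in for worker concurrency


class ConcurrencyProbe:
    """Records the highest number of export jobs running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def export_job(self, release):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        release.wait(timeout=5)  # stands in for slow CSV generation
        with self._lock:
            self._active -= 1
        return "done"


def submit_exports(n, probe, release):
    """Submit n exports and return immediately, like an enqueue call would."""
    pool = ThreadPoolExecutor(max_workers=MAX_WORKERS)
    futures = [pool.submit(probe.export_job, release) for _ in range(n)]
    return pool, futures
```

Submitting returns futures before any job has finished, which mirrors a non-blocking request. The recorded peak never exceeds `MAX_WORKERS`, no matter how many exports are queued.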

Scope

This issue focuses on backend export processing.
UI enhancements (progress indicators, notifications, etc.) can be handled in follow-up issues.

Context

This complements the recent improvements using streaming responses,
which reduce memory usage but do not fully address heavy computation
or concurrency concerns in production environments.

Status: In progress