Worker Identification and Logger Binding #48

@plutopulp

Worker Identification and Logger Binding

Problem

Currently, download workers are anonymous. When running with max_workers=3, logs show interleaved messages from all workers without an easy way to distinguish which worker is doing what, other than by the URL they are processing.

DEBUG | Downloading https://example.com/file1.zip
DEBUG | Downloading https://example.com/file2.zip  <-- Which worker?
DEBUG | Download completed ...

Proposed Solution

Assign a unique ID (e.g., worker-1, worker-2) to each worker instance and bind it to the logger.

Implementation

  1. Modify WorkerPool.start() to pass an index/ID to create_worker:
# Count from 1 so the IDs read worker-1 ... worker-N, matching the example above
for i in range(1, self._max_workers + 1):
    worker = self.create_worker(client, worker_id=i)
  2. Update WorkerPool.create_worker() to bind the logger:
def create_worker(self, client: ClientSession, worker_id: int) -> BaseWorker:
    # Bind worker_id to the logger context
    worker_logger = self._logger.bind(worker_id=f"worker-{worker_id}")

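    # Pass the bound logger to the factory.
    # DownloadWorker needs no code changes; it simply uses the logger it is given.
    # Note: `emitter` is not defined in this snippet; presumably it is the
    # emitter that create_worker already passes to the factory today.
    return self._worker_factory(client, worker_logger, emitter)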
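
The two steps above can be tried out in isolation with a minimal sketch. The issue does not name the logging library; the .bind() call suggests something like loguru or structlog, so this sketch substitutes the standard library's logging.LoggerAdapter as a stand-in. BoundLogger, the simplified DownloadWorker and WorkerPool, and demo() are hypothetical names for illustration, not the project's actual classes. The client, factory, and emitter arguments are left out, and workers are numbered from 1 to match the worker-1, worker-2 example.

```python
import io
import logging


class BoundLogger(logging.LoggerAdapter):
    """Stand-in for a logger with .bind(): prefixes each message with worker_id."""

    def process(self, msg, kwargs):
        return f"[{self.extra['worker_id']}] {msg}", kwargs


class DownloadWorker:
    """Hypothetical worker: it only uses whatever logger it is handed."""

    def __init__(self, logger):
        self._logger = logger

    def download(self, url):
        self._logger.debug("Downloading %s", url)


class WorkerPool:
    """Hypothetical, simplified pool (no client, factory, or emitter)."""

    def __init__(self, logger, max_workers):
        self._logger = logger
        self._max_workers = max_workers
        self.workers = []

    def create_worker(self, worker_id):
        # Bind the worker ID once; the worker itself stays unchanged.
        bound = BoundLogger(self._logger, {"worker_id": f"worker-{worker_id}"})
        return DownloadWorker(bound)

    def start(self):
        self.workers = [
            self.create_worker(i) for i in range(1, self._max_workers + 1)
        ]


def demo():
    # Capture log output in memory so it can be inspected.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))
    base = logging.getLogger("worker_pool_demo")
    base.handlers.clear()
    base.addHandler(handler)
    base.setLevel(logging.DEBUG)
    base.propagate = False

    pool = WorkerPool(base, max_workers=2)
    pool.start()
    pool.workers[0].download("https://example.com/file1.zip")
    pool.workers[1].download("https://example.com/file2.zip")
    return stream.getvalue().splitlines()
```

Calling demo() returns one log line per download, each carrying the ID of the worker that emitted it, so the "Which worker?" question from the Problem section is answered by the log line itself.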

Benefits

  • Better Debugging: Easily filter logs by worker ID.
  • Concurrency Visibility: Clearer picture of how tasks are distributed.
  • Traceability: Follow a single worker's lifecycle through multiple downloads.
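
As a rough illustration of the filtering benefit, the sketch below attaches worker_id to each log record and keeps only one worker's messages. It again assumes the standard library's logging module rather than the project's actual logger; WorkerIdFilter, ListHandler, and demo_filter() are hypothetical names introduced for this example.

```python
import logging


class WorkerIdFilter(logging.Filter):
    """Pass only records whose worker_id attribute matches the given ID."""

    def __init__(self, worker_id):
        super().__init__()
        self._worker_id = worker_id

    def filter(self, record):
        return getattr(record, "worker_id", None) == self._worker_id


class ListHandler(logging.Handler):
    """Collect formatted messages in a list instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.worker_id} | {record.getMessage()}")


def demo_filter():
    base = logging.getLogger("worker_filter_demo")
    base.handlers.clear()
    base.setLevel(logging.DEBUG)
    base.propagate = False

    handler = ListHandler()
    handler.addFilter(WorkerIdFilter("worker-2"))
    base.addHandler(handler)

    # LoggerAdapter copies its extra dict onto every record it emits.
    w1 = logging.LoggerAdapter(base, {"worker_id": "worker-1"})
    w2 = logging.LoggerAdapter(base, {"worker_id": "worker-2"})
    w1.debug("Downloading https://example.com/file1.zip")
    w2.debug("Downloading https://example.com/file2.zip")
    w2.debug("Download completed")
    return handler.messages
```

With the filter set to worker-2, demo_filter() returns only that worker's two messages, which is the single-worker lifecycle view described under Traceability.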
