Optimise Event Emission Performance #29

@plutopulp

Problem

Worker emits progress and speed events on every chunk (potentially thousands per file), even when no handlers are subscribed. This creates unnecessary object allocation and async overhead.

Current code (worker.py, lines 304-327):

async for chunk in response.content.iter_chunked(chunk_size):
    # ... process chunk ...
    
    # Always emits, even if no listeners
    await self.emitter.emit("worker.progress", WorkerProgressEvent(...))
    await self.emitter.emit("worker.speed_updated", WorkerSpeedUpdatedEvent(...))

Impact:

  • 1 MB file with 1 KB chunks ≈ 1,000 chunks, so ≈ 1,000 events of each type
  • 100 MB file ≈ 100,000 events of each type
  • Each event creates a dataclass instance plus async call overhead
  • Worker already has a TODO comment about this (lines 73-75)

Proposed Solutions

Option A: Add listener check (simplest)

# In EventEmitter
def has_listeners(self, event_type: str) -> bool:
    # True only if at least one handler is registered for this event type
    return bool(self._handlers.get(event_type))

# In worker
if self.emitter.has_listeners("worker.progress"):
    # Build the event only when someone will receive it
    await self.emitter.emit("worker.progress", WorkerProgressEvent(...))
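
A minimal, self-contained sketch of Option A. The `EventEmitter` below is a stand-in written for illustration: the `_handlers` mapping, the `on()` subscription method, and `ProgressEvent` (a placeholder for `WorkerProgressEvent`) are assumptions, not the project's actual API.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]


@dataclass
class ProgressEvent:
    """Hypothetical stand-in for WorkerProgressEvent."""
    bytes_downloaded: int


class EventEmitter:
    """Illustrative emitter; the real class's internals are assumed."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def has_listeners(self, event_type: str) -> bool:
        # .get() avoids creating an empty list as a side effect of the check.
        return bool(self._handlers.get(event_type))

    async def emit(self, event_type: str, event: Any) -> None:
        for handler in self._handlers.get(event_type, []):
            await handler(event)


async def demo() -> list[int]:
    emitter = EventEmitter()
    received: list[int] = []

    async def record(event: ProgressEvent) -> None:
        received.append(event.bytes_downloaded)

    for downloaded in (1024, 2048, 3072):
        if downloaded == 2048:
            emitter.on("worker.progress", record)  # subscribe mid-download
        # The event object is only built when a handler will receive it.
        if emitter.has_listeners("worker.progress"):
            await emitter.emit("worker.progress", ProgressEvent(downloaded))
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The first chunk is skipped entirely because nothing is subscribed yet, so no event object is allocated for it.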

Option B: Event throttling

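# Only emit progress events every N chunks or N milliseconds
chunks_since_last_emit += 1
now = time.monotonic()
if chunks_since_last_emit >= 10 or now - last_emit_time >= 0.1:
    await self.emitter.emit("worker.progress", event)
    chunks_since_last_emit, last_emit_time = 0, now  # start a new window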
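
One way to keep the throttle state out of the download loop is a small helper object. `EmitThrottle` and its parameters are hypothetical names; the injectable `clock` is only there to make the behaviour testable without sleeping.

```python
import time
from typing import Callable


class EmitThrottle:
    """Hypothetical helper: allow an emit every `max_chunks` chunks or
    `min_interval` seconds, whichever comes first."""

    def __init__(
        self,
        max_chunks: int = 10,
        min_interval: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_chunks = max_chunks
        self.min_interval = min_interval
        self._clock = clock
        self._chunks_since_emit = 0
        self._last_emit = clock()

    def should_emit(self) -> bool:
        """Call once per chunk; returns True when an event is due."""
        self._chunks_since_emit += 1
        now = self._clock()
        due = (
            self._chunks_since_emit >= self.max_chunks
            or now - self._last_emit >= self.min_interval
        )
        if due:
            # Reset both triggers so the next window starts from this emit.
            self._chunks_since_emit = 0
            self._last_emit = now
        return due


if __name__ == "__main__":
    # Frozen clock: only the chunk counter can trigger an emit.
    throttle = EmitThrottle(max_chunks=10, min_interval=0.1, clock=lambda: 0.0)
    print(sum(throttle.should_emit() for _ in range(1000)))  # prints 100
```

Because the two conditions are joined with `or`, small chunks on a fast connection will hit the chunk counter long before the time limit, so this caps the reduction at a factor of `max_chunks`. A purely time-based limit bounds the event rate regardless of chunk size.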

Option C: Sampling-based emission

# Emit progress at most 10 times per second
if should_emit_progress(current_time, last_emit_time, min_interval=0.1):
    await self.emitter.emit(...)
    last_emit_time = current_time  # otherwise every later chunk also emits
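
The issue only names `should_emit_progress` and its arguments, so the body below is an assumed interpretation: a pure time comparison. `count_emits` is a hypothetical helper that replays chunk arrival times to show the effect.

```python
def should_emit_progress(
    current_time: float, last_emit_time: float, min_interval: float = 0.1
) -> bool:
    """Return True once at least `min_interval` has passed (assumed semantics).

    Units are arbitrary as long as all three arguments use the same one.
    """
    return current_time - last_emit_time >= min_interval


def count_emits(chunk_times: list[float], min_interval: float = 0.1) -> int:
    """Replay chunk arrival times and count how many events would be sent."""
    last_emit_time = float("-inf")  # guarantees the first chunk emits
    emitted = 0
    for now in chunk_times:
        if should_emit_progress(now, last_emit_time, min_interval):
            emitted += 1
            last_emit_time = now
    return emitted


if __name__ == "__main__":
    # 1,000 chunks arriving 1 ms apart; timestamps in whole milliseconds
    # keep the arithmetic exact.
    times_ms = list(range(1000))
    print(count_emits(times_ms, min_interval=100))  # prints 10
```

With this reading, one second of download produces 10 progress events instead of 1,000, independent of the chunk size.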

Leaning towards

Option A + Option B combined:

  1. Add has_listeners() check (eliminates overhead when no subscribers)
  2. Add throttling (reduces event spam even with subscribers)
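
A rough sketch of how the two steps above could sit together in the download loop. Everything here is illustrative: the emitter is a stand-in, the payload is a plain byte count rather than `WorkerProgressEvent`, and only the time-based half of Option B is used for brevity. The unconditional emit on the last chunk is an added assumption, so that subscribers still see the final total when throttling would otherwise swallow it.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]


class EventEmitter:
    """Illustrative stand-in; the real emitter's internals are assumed."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def has_listeners(self, event_type: str) -> bool:
        return bool(self._handlers.get(event_type))

    async def emit(self, event_type: str, event: Any) -> None:
        for handler in self._handlers.get(event_type, []):
            await handler(event)


async def download(
    emitter: EventEmitter,
    chunks: list[bytes],
    min_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Simulated worker loop; returns the total number of bytes seen."""
    total = 0
    last_emit = float("-inf")
    for index, chunk in enumerate(chunks):
        total += len(chunk)
        # Cheap check first: with no subscribers nothing else runs.
        if not emitter.has_listeners("worker.progress"):
            continue
        now = clock()
        is_last = index == len(chunks) - 1
        # Always report the final chunk so subscribers see completion.
        if is_last or now - last_emit >= min_interval:
            await emitter.emit("worker.progress", total)
            last_emit = now
    return total


async def demo() -> list[int]:
    emitter = EventEmitter()
    seen: list[int] = []

    async def record(total: int) -> None:
        seen.append(total)

    emitter.on("worker.progress", record)
    # Frozen clock: only the first and the final chunk are reported.
    await download(emitter, [b"x" * 1024] * 1000, clock=lambda: 0.0)
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```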

Tasks

  • Add has_listeners() to EventEmitter
  • Add throttling logic to worker progress/speed emissions
  • Add configuration for throttle interval
  • Benchmark before/after performance
  • Document event throttling behavior
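
For the benchmarking task, a starting point might look like the sketch below. It measures only the emission path with no subscribers, using a stand-in emitter and a hypothetical `ProgressEvent`; it does not exercise real network I/O, and absolute numbers will vary by machine.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class ProgressEvent:
    """Hypothetical stand-in for WorkerProgressEvent."""
    bytes_downloaded: int
    total_bytes: int


class NoListenerEmitter:
    """Stand-in emitter with no subscribers."""

    def has_listeners(self, event_type: str) -> bool:
        return False

    async def emit(self, event_type: str, event: object) -> None:
        return None


async def run(chunks: int, guarded: bool) -> float:
    """Return seconds spent on the emission path for `chunks` iterations."""
    emitter = NoListenerEmitter()
    start = time.perf_counter()
    for i in range(chunks):
        if guarded and not emitter.has_listeners("worker.progress"):
            continue
        await emitter.emit("worker.progress", ProgressEvent(i * 1024, chunks * 1024))
    return time.perf_counter() - start


def benchmark(chunks: int = 100_000) -> dict[str, float]:
    return {
        "unguarded": asyncio.run(run(chunks, guarded=False)),
        "guarded": asyncio.run(run(chunks, guarded=True)),
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f}s")
```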

Priority

Medium - Nice optimization but not critical. Can defer to 0.2.0.

Labels

enhancement, performance, medium-priority, optimization
