feat: Implement sliding window pipeline for concurrent downloading and processing #76
Description
This PR implements a "sliding window" pipeline to address Issue #73, replacing the previous batch-based approach with an asynchronous producer-consumer model. This allows data processing (CPU-bound) to occur concurrently with data downloading (I/O-bound), significantly improving throughput.
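As an illustration of the CPU/I/O overlap described above, the sketch below compares a serial baseline with a generic producer-consumer built on `asyncio.Queue`. This is not the PR's code: `fake_download`, `fake_process`, `serial`, `producer_consumer` and `timed` are hypothetical names, the sleeps only stand in for network I/O and CPU work, the serial loop is a simplification rather than the previous batch implementation, and the queue is a swapped-in technique, not the PR's `deque`-based buffer.

```python
import asyncio
import time
from collections.abc import Callable, Coroutine
from typing import Any, Optional


async def fake_download(i: int) -> int:
    """Hypothetical I/O-bound step: waits without blocking the event loop."""
    await asyncio.sleep(0.1)
    return i


def fake_process(i: int) -> int:
    """Hypothetical CPU-bound step: blocks whichever thread runs it."""
    time.sleep(0.1)
    return i * 2


async def serial(n: int) -> list[int]:
    # Baseline: each file is downloaded and then processed before the next starts.
    out = []
    for i in range(n):
        out.append(fake_process(await fake_download(i)))
    return out


async def producer_consumer(n: int) -> list[int]:
    queue: asyncio.Queue[Optional[int]] = asyncio.Queue(maxsize=2)

    async def producer() -> None:
        for i in range(n):
            await queue.put(await fake_download(i))
        await queue.put(None)  # sentinel: no more files

    async def consumer() -> list[int]:
        out = []
        while (raw := await queue.get()) is not None:
            # Run CPU work in a worker thread so the event loop stays free
            # and the producer can download the next file in the meantime.
            out.append(await asyncio.to_thread(fake_process, raw))
        return out

    _, results = await asyncio.gather(producer(), consumer())
    return results


def timed(
    runner: Callable[[int], Coroutine[Any, Any, list[int]]], n: int
) -> tuple[list[int], float]:
    """Run one pipeline to completion and return (results, wall-clock seconds)."""
    start = time.perf_counter()
    result = asyncio.run(runner(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for runner in (serial, producer_consumer):
        result, elapsed = timed(runner, 5)
        print(f"{runner.__name__}: {result} in {elapsed:.2f}s")
```

With these made-up timings (0.1 s per download and per processing step), five files take about 1.0 s serially but about 0.6 s overlapped, since each download after the first runs while the previous file is being processed.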
Key Changes
src/satellite_consumer/download_eumetsat.py:
- Uses a `deque`-based task buffer to ensure that, while downloads run in parallel, results are yielded to the consumer in strict chronological order.
src/satellite_consumer/consume.py:
- Converted to `async`, with an `async for` loop over the buffered stream.
- Runs the `process_raw` function in `asyncio.to_thread` to prevent blocking the event loop (and thus the background downloads).
src/satellite_consumer/cmd/main.py:
- Launches the now-async consumer with `asyncio.run()`.
- Adds a `concurrent_downloads` configuration option (defaulting to 5 workers).
Verification
- Ran the unit tests (`uv run python -m unittest discover ...`).
- Ran `mypy` checks on all modified files.
- Ran `ruff check .`.
- … `[Processing] num_files=1`.
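The sliding-window behaviour described under Key Changes can be sketched as below, assuming a simplified interface. This is an illustration, not the PR's code: `sliding_window`, `fake_download`, `process_file` and `consume` are hypothetical names, downloads are simulated with `asyncio.sleep`, and only the `concurrent_downloads` name and its default of 5 come from the description. The generator always awaits the oldest task in a `deque`, so results come out in input order, and it starts one new download each time one is consumed, so at most `concurrent_downloads` are in flight.

```python
import asyncio
from collections import deque
from collections.abc import AsyncIterator, Awaitable, Callable, Iterable


async def sliding_window(
    names: Iterable[str],
    download: Callable[[str], Awaitable[bytes]],
    concurrent_downloads: int = 5,
) -> AsyncIterator[bytes]:
    """Yield payloads in input order with at most `concurrent_downloads` in flight."""
    if concurrent_downloads < 1:
        raise ValueError("concurrent_downloads must be at least 1")
    remaining = iter(names)
    window: deque[asyncio.Task[bytes]] = deque()

    def start_next() -> None:
        # Schedule the next download (if any) at the back of the window.
        name = next(remaining, None)
        if name is not None:
            window.append(asyncio.create_task(download(name)))

    for _ in range(concurrent_downloads):
        start_next()
    try:
        while window:
            # Await the OLDEST task so output order matches input order,
            # even when a later download finishes first.
            payload = await window.popleft()
            start_next()  # slide the window forward by one
            yield payload  # downloads keep running while the consumer works
    finally:
        # If the generator is closed early or a download fails, cancel the rest.
        for task in window:
            task.cancel()


in_flight = 0
peak_in_flight = 0


async def fake_download(name: str) -> bytes:
    """Hypothetical download: later files finish sooner, to exercise ordering."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    index = int(name.removeprefix("file"))
    await asyncio.sleep(0.01 * (10 - index))
    in_flight -= 1
    return name.encode()


def process_file(payload: bytes) -> str:
    """Hypothetical CPU-bound step; runs in a worker thread."""
    return payload.decode().upper()


async def consume(names: list[str], concurrent_downloads: int = 5) -> list[str]:
    results = []
    async for payload in sliding_window(names, fake_download, concurrent_downloads):
        # Offload processing so the event loop can keep driving the downloads.
        results.append(await asyncio.to_thread(process_file, payload))
    return results


names = [f"file{i}" for i in range(8)]
print(asyncio.run(consume(names, concurrent_downloads=3)))
# ['FILE0', 'FILE1', 'FILE2', 'FILE3', 'FILE4', 'FILE5', 'FILE6', 'FILE7']
```

Because the generator refills the window before it yields, downloads continue while the consumer is busy in `asyncio.to_thread`. The cost of strict ordering is head-of-line blocking: one slow file holds back results that have already arrived behind it.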