Situation
My previous tests used only a small subset of the whole documentation set. But when using sles/*/en-us, the CPU went up to 100% even with 16 CPU cores. This looks suspicious, and I think there is a problem in the code.
Use Case
Making the subcommand work on the full documentation set while keeping the system responsive.
Background
If we have, for example, 50 deliverables, process_deliverable would attempt to launch 50 git, daps, and temporary-clone tasks concurrently. This leads to a classic "thundering herd" problem, where a massive number of tasks are spawned simultaneously, with several negative consequences:
- CPU Saturation
Even with many cores, trying to run hundreds of CPU-intensive tasks at once will overwhelm the system, causing CPU usage to spike to 100%.
- Memory Exhaustion
Each of those tasks consumes a significant amount of memory. Launching too many can exhaust the system's RAM.
- I/O Bottlenecks
The simultaneous creation of hundreds of temporary directories and repository clones can create a bottleneck on the disk's I/O, slowing everything down.
Possible Implementation
The most idiomatic way to control resource usage from within an asyncio application is to limit the number of concurrent tasks that can run simultaneously. Spawning hundreds of git and daps processes at once, even asynchronously, creates significant overhead for the OS scheduler and can easily consume all available CPU cores.
Some ideas:
- Using an asyncio.Semaphore to act as a gate, ensuring only a certain number of process_deliverable tasks are active at any given time. A sensible limit is the number of CPU cores on the machine, or a value from the app config.
- Using an asyncio.Queue to implement a producer/worker pattern. With a queue, you only create a small, fixed number of long-lived worker tasks. This can be more memory-efficient if the work items themselves are small, as you don't have hundreds of suspended coroutine objects waiting.
- Going through the code and identifying blocking calls.
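The semaphore idea could be sketched roughly like this. This is a minimal sketch, not the real docbuild code: process_deliverable here is a stand-in, and asyncio.sleep replaces the actual git/daps subprocess work.

```python
import asyncio
import os

# Assumption: a fixed concurrency cap; in the real app this could come
# from os.cpu_count() or the app config.
MAX_CONCURRENT = os.cpu_count() or 4

async def process_deliverable(name: str, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most MAX_CONCURRENT bodies run at the same time
        await asyncio.sleep(0.01)  # stand-in for the git/daps subprocess work
        return f"built {name}"

async def main() -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    names = [f"deliverable-{i:02d}" for i in range(50)]
    # All 50 coroutines are created up front, but the semaphore gates
    # how many actually do work concurrently.
    return await asyncio.gather(*(process_deliverable(n, sem) for n in names))

results = asyncio.run(main())
```

Note that all 50 coroutine objects still exist; only their active work is throttled.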
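The queue-based producer/worker pattern might look like the following sketch. The names (worker, NUM_WORKERS) and the sleep are assumptions standing in for the real build step.

```python
import asyncio

NUM_WORKERS = 4  # small, fixed pool of long-lived workers

async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    while True:
        name = await queue.get()
        try:
            await asyncio.sleep(0.01)  # stand-in for the real build step
            results.append(f"built {name}")
        finally:
            queue.task_done()

async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    workers = [asyncio.create_task(worker(queue, results))
               for _ in range(NUM_WORKERS)]
    for i in range(50):
        queue.put_nowait(f"deliverable-{i:02d}")
    await queue.join()   # block until every item has been processed
    for w in workers:    # queue is drained; tear the workers down
        w.cancel()
    return results

results = asyncio.run(main())
```

The memory advantage: only NUM_WORKERS suspended coroutines exist at any time, while the pending work sits in the queue as plain strings.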
Some ideas outside of Python:
- taskset pins a process to a specific set of CPU cores.
- cpulimit (openSUSE package name: cpulimit) monitors and throttles a process to keep its CPU usage below a certain percentage:

  # Limit the process to 200% CPU (i.e., two full cores)
  cpulimit --limit=200 -- python -m docbuild metadata ...
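A taskset invocation could look like this (a hypothetical sketch: taskset ships with util-linux, and the core range 0-3 is an arbitrary choice; echo stands in for the real docbuild command):

```shell
# Pin a command to the first four CPU cores (0-3); everything after the
# core list runs under that CPU affinity mask.
taskset -c 0-3 echo "pinned to cores 0-3"
```

Unlike cpulimit, taskset does not cap the percentage of CPU used; it only restricts which cores the process may run on.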