I’m trying to handle 5 concurrent clients using one model instance. However, I noticed voice distortion and heavy CPU spikes when requests run in parallel.
Is the run_stream logic thread-safe?
How should I handle multiple concurrent requests without degrading audio quality?
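
To make the question concrete, here is a minimal sketch of one possible pattern, assuming run_stream is not safe to call from several threads at once: serialize access to the single instance with a lock. DummyModel, handle_client, and serve are hypothetical names used only for illustration and are not the library's API; the real run_stream signature may differ.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class DummyModel:
    """Hypothetical stand-in for the real model (NOT the library's API)."""

    def __init__(self):
        self._active = 0       # callers currently inside run_stream
        self.max_active = 0    # highest overlap observed so far
        self._guard = threading.Lock()

    def run_stream(self, text):
        # Track how many callers are streaming at the same time.
        with self._guard:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
        try:
            for word in text.split():
                time.sleep(0.01)  # simulate per-chunk inference work
                yield word.upper()
        finally:
            with self._guard:
                self._active -= 1


model = DummyModel()
model_lock = threading.Lock()  # one lock shared by every client handler


def handle_client(text):
    # Hold the lock for the whole stream so only one request
    # touches the shared model instance at a time.
    with model_lock:
        return list(model.run_stream(text))


def serve(requests, workers=5):
    # Five worker threads accept requests, but model access is serialized.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_client, requests))


if __name__ == "__main__":
    results = serve([f"hello client {i}" for i in range(5)])
    print(len(results), "responses; max overlap:", model.max_active)
```

The trade-off is latency: requests queue behind each other instead of running in parallel. If that is too slow, the alternative I can think of is one model instance per worker, at the cost of more memory.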