Reworking the threading (at least in my last experience, the input thread was the bottleneck, not the actual computation)