Preliminary investigation shows that the token bucket used for throttling is not CPU efficient. As a result, performance is significantly worse with a throttle set higher than the achievable rate than with no throttle at all, even though such a throttle should never actually limit anything. The throttle implementation could be changed to something more efficient. Consider using bucket4j and removing unnecessary locks.
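
To make "removing unnecessary locks" concrete, here is a minimal sketch of a token bucket that keeps its whole state in one AtomicLong and updates it with compare-and-set instead of a lock. It is an illustration only: it is not the current throttle code and not bucket4j's implementation. The class name LockFreeTokenBucket, the method tryAcquire, and the injectable clock are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical lock-free token bucket (illustrative sketch, not the existing throttle). */
final class LockFreeTokenBucket {
    private final long capacityNanos;   // time needed to refill an empty bucket completely
    private final long nanosPerToken;   // refill interval for a single token
    private final LongSupplier clock;   // nanosecond clock; injectable so tests can control time

    // Timestamp at which the bucket was (notionally) empty.
    // Tokens available now = (now - emptyAt) / nanosPerToken, capped at capacity.
    private final AtomicLong emptyAt;

    LockFreeTokenBucket(long capacity, long tokensPerSecond, LongSupplier clock) {
        if (capacity <= 0 || tokensPerSecond <= 0 || tokensPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("capacity and rate must be positive; rate <= 1e9/s");
        }
        // Integer division: rates that do not divide 1e9 evenly are approximated.
        this.nanosPerToken = 1_000_000_000L / tokensPerSecond;
        this.capacityNanos = capacity * nanosPerToken;
        this.clock = clock;
        // Start with a full bucket.
        this.emptyAt = new AtomicLong(clock.getAsLong() - capacityNanos);
    }

    /** Takes the requested tokens if available; never blocks and never takes a lock. */
    boolean tryAcquire(long tokens) {
        if (tokens <= 0) {
            throw new IllegalArgumentException("tokens must be positive");
        }
        while (true) {
            long now = clock.getAsLong();
            long current = emptyAt.get();
            // Cap accumulated credit: the bucket cannot hold more than its capacity.
            long floor = now - capacityNanos;
            long base = (current - floor) < 0 ? floor : current;
            long next = base + tokens * nanosPerToken;
            if (next - now > 0) {
                return false;           // not enough tokens yet
            }
            if (emptyAt.compareAndSet(current, next)) {
                return true;            // this thread won the race
            }
            // Another thread updated the state first; re-read and retry.
        }
    }
}
```

With a real clock this would be constructed as, for example, new LockFreeTokenBucket(100, 50_000, System::nanoTime). When the limit is above the achievable rate, the uncontended path is one clock read and one compare-and-set per call, which is the case this issue is concerned with. Whether this or bucket4j performs better here would need to be measured.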