When the EPP flow controller is enabled in simulated-mode experiments, WVA scale-up is delayed because WVA does not see the traffic queued in EPP. As a result, the dropped request count seen by the GuideLLM client increases. If this is a current limitation, it should be documented, backed by experiments run on a real GPU cluster.
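To make the suspected mechanism concrete, here is a toy sketch of how a scaler that counts only the load visible at the model servers would under-provision while requests are held upstream. This is not WVA's actual algorithm. The function `desired_replicas`, its parameters, and the numbers are all hypothetical and only illustrate the reasoning.

```python
import math


def desired_replicas(in_flight: int,
                     queued_upstream: int,
                     per_replica_capacity: int,
                     include_upstream_queue: bool) -> int:
    """Toy sizing rule (hypothetical, not WVA's real logic).

    in_flight: requests the model servers currently see.
    queued_upstream: requests held in an upstream queue (e.g. a flow controller).
    per_replica_capacity: assumed number of requests one replica can serve.
    """
    # A scaler that cannot see the upstream queue sizes only for in-flight load.
    demand = in_flight + (queued_upstream if include_upstream_queue else 0)
    return max(1, math.ceil(demand / per_replica_capacity))


# Made-up numbers: 8 requests reach the servers, 24 wait upstream.
blind = desired_replicas(8, 24, 8, include_upstream_queue=False)
aware = desired_replicas(8, 24, 8, include_upstream_queue=True)
print(blind, aware)
```

With these made-up numbers, the queue-blind rule asks for fewer replicas than the queue-aware one, which matches the delayed scale-up described above. Whether WVA behaves this way on real hardware is exactly what the proposed experiments would need to confirm.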