
Sticky Load Balancing during inference not hitting all the Triton Servers sitting behind the Load Balancer #820

Open
@animikhaich

Description

I have an architecture that looks like this:

[Image: architecture diagram]

The main issue is that the load balancer is not distributing traffic evenly across the server VMs: at any given time only about 40-50% of the VMs are under load. I am closing the gRPC client after each request with client.close(), yet the sessions still appear to be sticky, which is likely the root cause.
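
A minimal sketch of the per-request flow described above, using tritonclient.grpc (the load balancer URL, model name, and tensor names below are placeholders, not my actual values):

```python
import numpy as np
import tritonclient.grpc as grpcclient

LB_URL = "my-load-balancer:8001"  # hypothetical LB frontend address

def infer_once(batch: np.ndarray) -> np.ndarray:
    # Fresh client (and gRPC channel) per request, closed afterwards.
    # Each call should open a new TCP connection, so an L4 balancer
    # in principle gets a chance to pick a different backend each time.
    client = grpcclient.InferenceServerClient(url=LB_URL)
    try:
        # "INPUT__0" / "OUTPUT__0" / "my_model" are placeholder names.
        inp = grpcclient.InferInput("INPUT__0", list(batch.shape), "FP32")
        inp.set_data_from_numpy(batch.astype(np.float32))
        result = client.infer(model_name="my_model", inputs=[inp])
        return result.as_numpy("OUTPUT__0")
    finally:
        client.close()
```

Even with this pattern, the traffic is not spreading across all backends.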

I am using Azure Load Balancer.

Would love to know if anyone has faced this issue and resolved it. If so, how?
