feat(haproxy-load-balancer): add haproxy load balancer #241

AWarno · 2025-09-29T11:45:15Z

Enabling Multi-Instance Deployment with HAProxy

Why HAProxy

HAProxy is a lightweight, reliable, and widely used load balancer. It is server-agnostic: it works the same way regardless of which inference server (for example, vLLM or SGLang) runs behind it. Using an external load balancer is officially recommended in the vLLM documentation (see vLLM Data Parallel Deployment). That documentation provides an example using NGINX, but HAProxy should work similarly.
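To make the idea concrete, here is a minimal sketch of how an HAProxy configuration for several identical server instances could be rendered. It is not the template shipped in this PR: the function name `render_haproxy_config`, the frontend/backend names, the timeouts, the default port `8000`, and the `/health` path are all assumptions chosen for illustration.

```python
from string import Template

# Hypothetical template: one frontend that round-robins over N backends,
# with an HTTP health check so unhealthy instances are taken out of rotation.
HAPROXY_TEMPLATE = Template("""\
global
    maxconn 4096

defaults
    mode http
    timeout connect 10s
    timeout client 600s
    timeout server 600s

frontend llm_frontend
    bind *:$listen_port
    default_backend llm_backends

backend llm_backends
    balance roundrobin
    option httpchk GET $health_path
$servers
""")


def render_haproxy_config(backends, listen_port=8000, health_path="/health"):
    """Render a config string for a list of (host, port) backend instances."""
    if not backends:
        raise ValueError("at least one backend is required")
    # One "server" line per instance; "check" enables the health check above.
    server_lines = "\n".join(
        f"    server srv{i} {host}:{port} check"
        for i, (host, port) in enumerate(backends)
    )
    return HAPROXY_TEMPLATE.substitute(
        listen_port=listen_port,
        health_path=health_path,
        servers=server_lines,
    )


if __name__ == "__main__":
    print(render_haproxy_config([("10.0.0.1", 8001), ("10.0.0.2", 8001)]))
```

Because the load balancer only needs host/port pairs and a health endpoint, the same rendering step would apply to any server type that exposes an HTTP API.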

Alternative Solutions

Ray
Ray is useful for multi-node deployments when a model is too large for a single node. It can also be used for multi-instance setups, but that requires knowing how to launch and manage each server type individually (for example, vLLM and SGLang may use different CLI arguments for this), so it does not generalize as well as an external load balancer. However, we may want to provide an example of how to use it for multi-node deployment of large models.
LiteLLM
Offers backend orchestration but is generally overkill for simple load balancing. The project evolves quickly, which may affect stability.
NGINX
Very similar to HAProxy for this use case and officially recommended in the vLLM documentation (see vLLM Data Parallel Deployment). In my experience, however, HAProxy is slightly simpler to use in practice.

Literature

TODO

Run longer tasks to validate stability and performance (only ifeval has been checked so far).
Check whether the HAProxy template is correctly included in the pip wheel (consider renaming it).
Add documentation.
Fix the dataclass in types.
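One way to check the wheel item above is to list the archive contents directly, since a wheel is a zip file. The following is a small sketch under assumptions: the helper name `find_in_wheel`, the wheel path, and the template file name `haproxy.cfg.template` are hypothetical and need to be replaced with the real ones.

```python
import zipfile


def find_in_wheel(wheel_path, filename_suffix):
    """Return all archive members of the wheel whose path ends with the suffix."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.endswith(filename_suffix)]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical names): python check_wheel.py dist/pkg.whl haproxy.cfg.template
    matches = find_in_wheel(sys.argv[1], sys.argv[2])
    print("\n".join(matches) if matches else "template not found in wheel")
```

An empty result would suggest that the template is missing from the package data configuration.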

Next Steps

Add a multi-node deployment example using Ray server. This will likely just require creating one example configuration file under examples/.

copy-pr-bot · 2025-09-29T11:45:18Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

1. Add total stats.
2. Add reasoning token stats (if provided): https://platform.openai.com/docs/guides/reasoning or "reasoning_tokens" in usage (completion_tokens_details, output_tokens_details).
3. Make stats cache-resistant: do not include stats if the response is from cache.

Signed-off-by: Anna Warno <[email protected]>
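The three points in this commit message can be sketched roughly as follows. This is an illustration only, not the code from that commit: the `TokenStats` class, the `update_stats` function, and the `from_cache` flag are hypothetical names, and the response is assumed to be a plain dict with an OpenAI-style `usage` field.

```python
from dataclasses import dataclass


@dataclass
class TokenStats:
    """Running totals over all non-cached responses."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    reasoning_tokens: int = 0
    counted_responses: int = 0


def extract_reasoning_tokens(usage):
    """Look for reasoning tokens at the top level or in the details objects."""
    if "reasoning_tokens" in usage:
        return usage["reasoning_tokens"] or 0
    for key in ("completion_tokens_details", "output_tokens_details"):
        details = usage.get(key) or {}
        if "reasoning_tokens" in details:
            return details["reasoning_tokens"] or 0
    return 0  # the provider did not report reasoning tokens


def update_stats(stats, response, from_cache):
    """Add one response to the totals, skipping responses served from cache."""
    if from_cache:
        return stats  # cache-resistant: cached responses do not change stats
    usage = response.get("usage") or {}
    stats.prompt_tokens += usage.get("prompt_tokens", 0)
    stats.completion_tokens += usage.get("completion_tokens", 0)
    stats.reasoning_tokens += extract_reasoning_tokens(usage)
    stats.counted_responses += 1
    return stats
```

Skipping cached responses keeps the totals tied to tokens that were actually generated during the run.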

This is a very basic migration of the readme content + adding a minimal toctree to the home index page so that the sphinx site produces a sidebar. The sidebar will mature and break out in the future into sections such as About, Get Started, etc. We will also add more sections/cards to this page after all other basic edits have been checked in, so it won't be a direct copy of the README, instead it will become a proper docs site home page. --------- Signed-off-by: Lawrence Lane <[email protected]> Signed-off-by: L.B. <[email protected]> Co-authored-by: jgerh <[email protected]> Signed-off-by: Anna Warno <[email protected]>

Docs update --------- Signed-off-by: Anna Warno <[email protected]> Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: AWarno <[email protected]> Co-authored-by: Oliver Koenig <[email protected]> Co-authored-by: Alexey Gronskiy <[email protected]> Co-authored-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

AWarno · 2025-10-23T10:44:01Z

/ok to test b14f3d7

copy-pr-bot · 2025-10-23T10:44:04Z

/ok to test b14f3d7

@AWarno, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

AWarno · 2025-10-23T10:45:37Z

/ok to test e2fb5ce

AWarno · 2025-10-27T13:41:26Z

/ok to test 10d1c02

AWarno · 2025-10-27T13:57:56Z

/ok to test f74f6c4

AWarno requested review from a team and agronskiy as code owners September 29, 2025 11:45

ko3n1g and others added 24 commits September 29, 2025 13:56

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.4

1326d00

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.4

3a846a9

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

chore(ci/release): enable cron run (#216)

098f3fb

Signed-off-by: Anna Warno <[email protected]>

(feat) Configure request method for progress tracking requests (#213)

da80190

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.5

f3e36e7

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.5

7f77d93

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

Update overview.md (#210)

6e81164

checkbox added Signed-off-by: AWarno <[email protected]> Signed-off-by: Anna Warno <[email protected]>

feat(multi-instance): haproxy

75aeeb6

Signed-off-by: Anna Warno <[email protected]>

fix(health-url): health url fixed

86a557d

Signed-off-by: Anna Warno <[email protected]>

fix(conflict): fix conflict

0681135

Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.4

710ca95

Signed-off-by: Oliver Koenig <[email protected]>

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.5

fc772a2

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.5

a5bee56

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

fix(executors): migrate to eval-factory cmd (#229)

4884615

This unblocks using the new Eval Factory containers in the launcher, which no longer have the `nv-eval`/`nv_eval` alias. Signed-off-by: Piotr Januszewski <[email protected]> Signed-off-by: Anna Warno <[email protected]>

(chore) Update container versions (#230)

d6cf567

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

(chore) Switch to referring to latest in the docs (#231)

de17a9e

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

(chore) Revert switch to latest (#232)

6aa2f8c

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.6

4375a60

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.6

a1fb564

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

ci(fix): Dependabot (#236)

c78bb73

Signed-off-by: oliver könig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

Add changes from PR 215 to the README (#239)

1452fce

Signed-off-by: Marta Stepniewska-Dziubinska <[email protected]> Signed-off-by: Anna Warno <[email protected]>

AWarno force-pushed the awarno/haproxy branch from 84c264e to 1452fce Compare September 29, 2025 12:55

copy-pr-bot bot temporarily deployed to test October 23, 2025 10:46 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 23, 2025 10:46 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci October 23, 2025 10:46 Failure

copy-pr-bot bot temporarily deployed to nemo-ci October 23, 2025 10:48 Inactive

AWarno added 2 commits October 23, 2025 13:48

fix(slurm-tests): fix slurm tests

e74f14f

Signed-off-by: Anna Warno <[email protected]>

Merge branch 'awarno/haproxy' of https://github.com/NVIDIA-NeMo/Evalu…

a1df099

…ator into awarno/haproxy

github-actions bot added the tests label Oct 23, 2025

Merge branch 'awarno/haproxy' of https://github.com/NVIDIA-NeMo/Evalu…

ad31a42

…ator into awarno/haproxy

fgalko-oss previously approved these changes Oct 27, 2025

AWarno dismissed fgalko-oss’s stale review via ad31a42 October 27, 2025 12:18

AWarno added 3 commits October 27, 2025 13:27

fix(conflicts): fix conflicts

291601e

Signed-off-by: Anna Warno <[email protected]>

fix(fix-conflicts): fix conflicts

ddb4e1e

Signed-off-by: Anna Warno <[email protected]>

feat(mater-ip): add master ip

10d1c02

Signed-off-by: Anna Warno <[email protected]>

feat(mater-ip): add master ip lint

f74f6c4

Signed-off-by: Anna Warno <[email protected]>

copy-pr-bot bot temporarily deployed to test October 27, 2025 13:58 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 27, 2025 13:59 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 27, 2025 14:00 Inactive

fix(multinode-deployment-health): fix multinode deployment health

3674487

Signed-off-by: Anna Warno <[email protected]>

fgalko-oss self-requested a review October 28, 2025 04:49

fgalko-oss previously approved these changes Oct 28, 2025

fix(missing-template): fix missing template

f3f02d7

Signed-off-by: Anna Warno <[email protected]>

AWarno dismissed fgalko-oss’s stale review via f3f02d7 October 28, 2025 13:20

fgalko-oss approved these changes Oct 29, 2025

Merge branch 'main' into awarno/haproxy

f2fa8e6
