
Enable reverse proxy in server-requesting pod#423

Open
delavet wants to merge 1 commit into llm-d-incubation:main from delavet:requester-reverse-proxy

Conversation


@delavet delavet commented Apr 14, 2026

Based on our discussion in #363, I have split PR #363 and am resubmitting this PR to introduce the first part of the changes: adding a reverse proxy for server-requesting pods.
I have re-conducted the reverse proxy overhead experiments, primarily updating the experimental parameter configurations and adding monitoring and analysis of pod resource usage. For details, please refer to the documentation below.

https://docs.google.com/document/d/1qX4KtcTJEfdatgVmtsveOMfjm883MG-HW-e7ddmlqkA/edit?usp=sharing

The part on dynamic port allocation in PR #363 can be considered for addition after Milestone 3.

@delavet force-pushed the requester-reverse-proxy branch from 1b84ee7 to 605acef on April 14, 2026 at 12:16
Comment thread pkg/spi/interface.go
// holds the starting position of a log chunk.
const LogStartPosParam = "startPos"

// InitProxy is the path for initializing the HTTP reverse proxy。

There's a strange period at the end here...


Yes. Please change it to a regular period (character code 0x2E).

Comment thread cmd/requester/main.go
var ready atomic.Bool

var wg sync.WaitGroup
wg.Add(2)

Shouldn't this be changed to wg.Add(3)?


@rubambiza rubambiza left a comment


A few suggested changes for debugging and code readability. Not major blockers.

url := fmt.Sprintf("http://%s:%s%s", requestingPod.Status.PodIP, adminPort, stubapi.InitProxy)
if err := doPostWithData(url, bytes.NewReader([]byte(fmt.Sprintf("{\"address\":\"%s\",\"port\":%d}",
launcherIP, desiredPort)))); err != nil {
logger.Error(err, "Failed to initialize requester proxy")

Perhaps we should include the requesting pod ip:port pair in the error for troubleshooting an environment with multiple requesters?


@MikeSpreitzer MikeSpreitzer Apr 15, 2026


The error message should include both host:port pairs.


// proxy is a lazy HTTP reverse proxy that only starts after receiving
// the first configuration request
type proxy struct {

The type being proxy and having a field named proxy is a bit confusing. I'd like to suggest making the type more specific to what it does in FMA, though I don't have an appropriate name off the top of my head.


I agree. But I think that this is moot because it should be a TCP proxy rather than an HTTP proxy.

Comment thread pkg/spi/interface.go
// The request body should contain a JSON object with "address"
// and "port" fields. After successful initialization,
// the proxy will forward requests to the configured target server.
const InitProxy = "/v1/proxy/init"

Dumb question, but is it possible for this single proxy server to also route requests to the launcher mgmt API (/v2/vllm/instances....*) running on port 8001, to get the status of the vllm instances, get logs, etc., in addition to the actual vllm inference endpoints?


The requester should not expose information about more than itself and its corresponding vllm instance.

Comment thread cmd/requester/main.go
defer wg.Done()
err := proxy.Run(ctx, proxyPort)
if err != nil {
logger.Error(err, "failed to start requester proxy server")

"start" or "run"?

@@ -0,0 +1,187 @@
/*
Copyright 2025 The llm-d Authors.

New files should be born with the current year here.

Comment on lines +36 to +39
type ConfigRequest struct {
Address string `json:"address"`
Port int `json:"port"`
}

This should be declared in a common place where the dual-pods controller can use it too.

Comment on lines +65 to +66
WriteTimeout: 5 * time.Minute, // Long timeout for inference requests
IdleTimeout: 120 * time.Second,

Maybe these two timeouts should be configurable?

}

// singleton instance initialized once at startup
var instance = &proxy{}

@MikeSpreitzer MikeSpreitzer Apr 15, 2026


The level of pointer indirection here is just unhelpful complexity. It could be just

var instance proxy

Comment on lines +124 to +128
// Try initialize server
if instance.initialized.Load() {
http.Error(w, "proxy already initialized", http.StatusConflict)
return
}

This code is unnecessary complexity. Just delete it.

Comment thread docs/dual-pods.md
```json
{"address": "10.244.1.5", "port": 8005}
```


The proxy also implements GET /v1/proxy/init, and that should be documented too.

Comment thread docs/dual-pods.md

The reverse proxy operates as follows:

1. **Initialization**: When the dual-pods controller binds a

@MikeSpreitzer MikeSpreitzer Apr 15, 2026


Since there is also a GET on the same path, the design should be modified to conform to REST. Define a schema for the resource at /v1/proxy/init (or maybe just /v1/proxy or /v1/proxy/config?), and define PUT and GET to write and read this resource. GET when uninitialized returns HTTP status 404.

Comment thread docs/dual-pods.md
include those details.

#### Requester Reverse Proxy


HTTP is too specific. It forecloses things like HTTPS and HTTP 2 or 3. The proxy should be simply a TCP proxy.


@MikeSpreitzer MikeSpreitzer left a comment


I have some comments on the study in https://docs.google.com/document/d/1qX4KtcTJEfdatgVmtsveOMfjm883MG-HW-e7ddmlqkA

"20 req/sec" is cited as a concurrency level, but that is actually a rate. Please be accurate and give a short sharp statement of the behavior of the benchmarking client(s).

Does every request go on a new TCP connection? If not then there is an additional thing to measure, the added latency in TCP connection setup. The measurement would be something like time from (a) client sending request to open connection to (b) client receiving first token from first request.

When doing performance studies we usually pay attention to the distribution of the result. Stuff like average, median, and high percentiles (90, 95, 99, 99.9). Of course, for high percentiles you need enough cycles to get statistically significant results.
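For the percentile part, a nearest-rank computation over the collected latency samples is enough; a minimal sketch (synthetic data, not from the study):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile computes the nearest-rank p-th percentile (0 < p <= 100)
// of a set of latency samples. Sketch of the summary statistics the
// benchmark report could include alongside the mean.
func percentile(samples []float64, p float64) float64 {
	s := append([]float64(nil), samples...) // copy: don't mutate caller's slice
	sort.Float64s(s)
	rank := int(math.Ceil(p * float64(len(s)) / 100))
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

func main() {
	// 1000 synthetic latency samples: 1ms, 2ms, ..., 1000ms.
	lat := make([]float64, 1000)
	for i := range lat {
		lat[i] = float64(i + 1)
	}
	fmt.Println(percentile(lat, 50), percentile(lat, 90), percentile(lat, 99))
	// prints: 500 900 990
}
```

As the comment notes, a p99.9 only means something with well over a thousand samples, so the run length should be chosen accordingly.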

4 participants