Enable reverse proxy in server-requesting pod by delavet · Pull Request #423 · llm-d-incubation/llm-d-fast-model-actuation

delavet · 2026-04-14T12:05:12Z

Based on our discussion in #363, I have split PR #363 and am resubmitting this PR to introduce the first part of the changes: adding a reverse proxy for server-requesting pods.
I have re-run the reverse proxy overhead experiments with updated parameter configurations and added monitoring and analysis of pod resource usage. For details, please refer to the document below.

https://docs.google.com/document/d/1qX4KtcTJEfdatgVmtsveOMfjm883MG-HW-e7ddmlqkA/edit?usp=sharing

The part on dynamic port allocation in PR #363 can be considered for addition after Milestone 3.

aavarghese · 2026-04-14T13:36:18Z

 // holds that starting position of a log chunk.
 const LogStartPosParam = "startPos"
+
+// InitProxy is the path for initializing the HTTP reverse proxy。


There's a strange period at the end here...

Yes. Please change it to a regular period (character code 0x2E).

rubambiza

A few suggested changes for debugging and code readability. Not major blockers.

rubambiza · 2026-04-15T13:43:35Z

+			url := fmt.Sprintf("http://%s:%s%s", requestingPod.Status.PodIP, adminPort, stubapi.InitProxy)
+			if err := doPostWithData(url, bytes.NewReader([]byte(fmt.Sprintf("{\"address\":\"%s\",\"port\":%d}",
+				launcherIP, desiredPort)))); err != nil {
+				logger.Error(err, "Failed to initialize requester proxy")


Perhaps we should include the requesting pod ip:port pair in the error for troubleshooting an environment with multiple requesters?

The error message should include both host:port pairs.
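
One way to carry both endpoints in the error is sketched below. This is only an illustration of the suggestion: the helper name `initProxyError`, the sentinel `errRefused`, and the sample addresses are hypothetical and not part of the PR.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// errRefused is a hypothetical underlying failure used only for the demo.
var errRefused = errors.New("connection refused")

// initProxyError wraps cause with both endpoints involved in the failed call:
// the requester's admin endpoint that was contacted and the launcher endpoint
// that the proxy was asked to forward to. The name is an assumption.
func initProxyError(cause error, requesterIP, adminPort, launcherIP string, launcherPort int) error {
	requester := net.JoinHostPort(requesterIP, adminPort)
	launcher := net.JoinHostPort(launcherIP, strconv.Itoa(launcherPort))
	return fmt.Errorf("failed to initialize requester proxy at %s for launcher %s: %w",
		requester, launcher, cause)
}

func main() {
	err := initProxyError(errRefused, "10.244.2.7", "8081", "10.244.1.5", 8005)
	fmt.Println(err)
	// %w keeps the original error reachable for errors.Is / errors.As.
	fmt.Println(errors.Is(err, errRefused))
}
```

With a structured logger the two pairs could instead be passed as key/value fields, which keeps the message text constant and makes filtering by requester easier.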

rubambiza · 2026-04-15T14:29:26Z

+
+// proxy is a lazy HTTP reverse proxy that only starts after receiving
+// the first configuration request
+type proxy struct {


The type being proxy and having a field named proxy is a bit confusing. I'd like to suggest making the type more specific to what it does in FMA, though I don't have an appropriate name off the top of my head.

I agree. But I think that this is moot because it should be a TCP proxy rather than an HTTP proxy.

MikeSpreitzer · 2026-04-15T19:10:53Z

+		defer wg.Done()
+		err := proxy.Run(ctx, proxyPort)
+		if err != nil {
+			logger.Error(err, "failed to start requester proxy server")


"start" or "run"?

IMO, “start” is used in error messages (describing the action of failing to start the server). “Run” as a function name indicates a blocking call—the server has started and continues running.
Both terms are semantically correct: the Run() function is invoked to start the server.

If the Run function really is blocking (and I mention this because Kubernetes code has a bad habit of using the wrong name) then saying "start" in the error message is, at best, risky code: it depends on Run only erring during the start phase --- and violations of this expectation are not checked (except by you, too late, when you are debugging). It would be correct to say "failed to run" in response to an error from Run.

MikeSpreitzer · 2026-04-15T19:50:29Z

@@ -0,0 +1,187 @@
+/*
+Copyright 2025 The llm-d Authors.


New files should be born with the current year here.

MikeSpreitzer · 2026-04-15T19:58:07Z

+type ConfigRequest struct {
+	Address string `json:"address"`
+	Port    int    `json:"port"`
+}


This should be declared in a common place where the dual-pods controller can use it too.
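
A minimal sketch of what sharing the type could buy: the controller side encodes the same struct that the requester side decodes, instead of hand-formatting JSON with `fmt.Sprintf`. The struct is copied from the diff above; the helper names `encode` and `decode`, and the idea that both live next to the shared declaration, are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConfigRequest is copied from the diff. In this sketch it stands in for a
// declaration in a package that both the requester and the dual-pods
// controller import; the exact location is left open here.
type ConfigRequest struct {
	Address string `json:"address"`
	Port    int    `json:"port"`
}

// encode is what the controller side might do before POSTing the body.
func encode(address string, port int) ([]byte, error) {
	return json.Marshal(ConfigRequest{Address: address, Port: port})
}

// decode is what the requester side might do with the received body.
func decode(body []byte) (ConfigRequest, error) {
	var cfg ConfigRequest
	err := json.Unmarshal(body, &cfg)
	return cfg, err
}

func main() {
	body, err := encode("10.244.1.5", 8005)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"address":"10.244.1.5","port":8005}

	cfg, err := decode(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s:%d\n", cfg.Address, cfg.Port)
}
```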

MikeSpreitzer · 2026-04-15T20:02:11Z

+		WriteTimeout: 5 * time.Minute, // Long timeout for inference requests
+		IdleTimeout:  120 * time.Second,


Maybe these two timeouts should be configurable?
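
If the timeouts were made configurable, one option is a pair of duration flags whose defaults match the values in the diff. The sketch below assumes the server is still an `http.Server`; the flag names and the `newServer` helper are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"time"
)

// newServer parses the given arguments and returns a server whose timeouts
// come from flags. Flag names are assumptions made for this sketch; the
// defaults are the values hard-coded in the diff.
func newServer(args []string) (*http.Server, error) {
	fs := flag.NewFlagSet("requester", flag.ContinueOnError)
	writeTimeout := fs.Duration("proxy-write-timeout", 5*time.Minute,
		"maximum duration for writing a proxied response")
	idleTimeout := fs.Duration("proxy-idle-timeout", 120*time.Second,
		"maximum time to keep an idle keep-alive connection open")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return &http.Server{
		WriteTimeout: *writeTimeout,
		IdleTimeout:  *idleTimeout,
	}, nil
}

func main() {
	srv, err := newServer([]string{"--proxy-write-timeout=10m"})
	if err != nil {
		panic(err)
	}
	fmt.Println(srv.WriteTimeout, srv.IdleTimeout) // 10m0s 2m0s
}
```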

MikeSpreitzer · 2026-04-15T20:04:36Z

+}
+
+// singleton instance initialized once at startup
+var instance = &proxy{}


The level of pointer indirection here is just unhelpful complexity. It could be just

var instance proxy
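
A small sketch of why the plain value works: methods with pointer receivers can be called on an addressable package-level variable, and the zero value of `atomic.Bool` is ready to use. The `proxy` struct here is a reduced stand-in containing only the `initialized` field visible in the diff, and `markInitialized` is a hypothetical method.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// proxy is a reduced stand-in for the type in the PR.
type proxy struct {
	initialized atomic.Bool // zero value means "not initialized"
}

// markInitialized flips the flag once and reports whether this call did it.
func (p *proxy) markInitialized() bool {
	return p.initialized.CompareAndSwap(false, true)
}

// instance is a value, not a pointer. Go takes its address implicitly when a
// pointer-receiver method is called on it.
var instance proxy

func main() {
	fmt.Println(instance.initialized.Load()) // false
	fmt.Println(instance.markInitialized())  // true
	fmt.Println(instance.markInitialized())  // false
}
```

The one thing to avoid with the value form is copying `instance`, since a struct holding atomics or mutexes should not be copied after first use.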

MikeSpreitzer · 2026-04-15T20:14:11Z

+	// Try initialize server
+	if instance.initialized.Load() {
+		http.Error(w, "proxy already initialized", http.StatusConflict)
+		return
+	}


This code is unnecessary complexity. Just delete it.

MikeSpreitzer · 2026-04-15T20:16:25Z

+   ```json
+   {"address": "10.244.1.5", "port": 8005}
+   ```
+


The proxy also implements GET /v1/proxy/init, and that should be documented too.

MikeSpreitzer · 2026-04-15T20:18:18Z

+
+The reverse proxy operates as follows:
+
+1. **Initialization**: When the dual-pods controller binds a


Since there is also a GET on the same path, the design should be modified to conform to REST. Define a schema for the resource at /v1/proxy/init (or maybe just /v1/proxy or /v1/proxy/config?), and define PUT and GET to write and read this resource. GET when uninitialized returns HTTP status 404.
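
The REST shape described here could look roughly like the handler below: PUT writes the resource, GET reads it, and GET before any PUT returns 404. This is a sketch under assumptions: the path `/v1/proxy/config` is one of the candidates floated above, and the names `proxyConfig`, `configResource`, and `do` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// proxyConfig is the resource schema, mirroring the JSON body in the diff.
type proxyConfig struct {
	Address string `json:"address"`
	Port    int    `json:"port"`
}

// configResource holds the current configuration; cfg is nil until the first PUT.
type configResource struct {
	mu  sync.Mutex
	cfg *proxyConfig
}

func (r *configResource) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	switch req.Method {
	case http.MethodGet:
		r.mu.Lock()
		cfg := r.cfg
		r.mu.Unlock()
		if cfg == nil {
			http.Error(w, "proxy not configured", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(cfg)
	case http.MethodPut:
		var cfg proxyConfig
		if err := json.NewDecoder(req.Body).Decode(&cfg); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		r.mu.Lock()
		r.cfg = &cfg // PUT replaces the whole resource, so repeating it is harmless
		r.mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	default:
		w.Header().Set("Allow", "GET, PUT")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// do issues one in-memory request and returns the status code and trimmed body.
func do(h http.Handler, method, body string) (int, string) {
	req := httptest.NewRequest(method, "/v1/proxy/config", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	res := &configResource{}
	fmt.Println(do(res, http.MethodGet, "")) // 404 proxy not configured
	fmt.Println(do(res, http.MethodPut, `{"address":"10.244.1.5","port":8005}`))
	fmt.Println(do(res, http.MethodGet, ""))
}
```

Because PUT is idempotent, the controller can safely retry after a timeout, which also removes the need for a separate "already initialized" conflict path.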

MikeSpreitzer · 2026-04-15T20:23:14Z

 include those details.

+#### Requester Reverse Proxy
+


HTTP is too specific. It forecloses things like HTTPS and HTTP 2 or 3. The proxy should be simply a TCP proxy.
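
For reference, a protocol-agnostic TCP relay is small. The sketch below is not the code from this PR; it is a minimal illustration, with hypothetical names (`serve`, `relay`, `pipe`, `demo`), of copying bytes in both directions so that HTTP, HTTPS, or HTTP/2 traffic passes through unchanged. It has no timeouts, logging, or target reconfiguration.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
)

// serve accepts connections on ln and relays each one to target.
// It returns when ln is closed.
func serve(ln net.Listener, target string) {
	for {
		client, err := ln.Accept()
		if err != nil {
			return
		}
		go relay(client, target)
	}
}

// relay copies bytes in both directions between client and target.
func relay(client net.Conn, target string) {
	defer client.Close()
	upstream, err := net.Dial("tcp", target)
	if err != nil {
		return // a real proxy would log this
	}
	defer upstream.Close()

	var wg sync.WaitGroup
	wg.Add(2)
	go pipe(upstream, client, &wg)
	go pipe(client, upstream, &wg)
	wg.Wait()
}

// pipe copies src to dst, then half-closes dst so the peer sees EOF.
func pipe(dst, src net.Conn, wg *sync.WaitGroup) {
	defer wg.Done()
	io.Copy(dst, src)
	if tcp, ok := dst.(*net.TCPConn); ok {
		tcp.CloseWrite()
	}
}

// demo starts a local echo backend and a proxy in front of it, sends msg
// through the proxy, and returns what comes back.
func demo(msg string) (string, error) {
	backend, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer backend.Close()
	go func() {
		conn, err := backend.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		io.Copy(conn, conn) // echo everything back
	}()

	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()
	go serve(front, backend.Addr().String())

	conn, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	fmt.Println(demo("ping"))
}
```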

MikeSpreitzer

I have some comments on the study in https://docs.google.com/document/d/1qX4KtcTJEfdatgVmtsveOMfjm883MG-HW-e7ddmlqkA

"20 req/sec" is cited as a concurrency level, but that is actually a rate. Please be accurate and give a short sharp statement of the behavior of the benchmarking client(s).

Does every request go on a new TCP connection? If not then there is an additional thing to measure, the added latency in TCP connection setup. The measurement would be something like time from (a) client sending request to open connection to (b) client receiving first token from first request.
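
The measurement described here, from opening the connection to receiving the first response byte, could be taken along these lines. This is only a sketch: the first byte stands in for the first token, the local echo server stands in for the proxy plus vLLM, and the names `timeToFirstByte` and `startEchoServer` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// timeToFirstByte opens a new TCP connection to addr, sends request, and
// returns the time from starting the dial to reading the first response byte.
func timeToFirstByte(addr string, request []byte) (time.Duration, error) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	if _, err := conn.Write(request); err != nil {
		return 0, err
	}
	first := make([]byte, 1)
	if _, err := io.ReadFull(conn, first); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

// startEchoServer starts a local stand-in server that echoes whatever it receives.
func startEchoServer() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() {
				defer conn.Close()
				io.Copy(conn, conn)
			}()
		}
	}()
	return ln, nil
}

func main() {
	ln, err := startEchoServer()
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	d, err := timeToFirstByte(ln.Addr().String(), []byte("ping"))
	fmt.Println(d, err)
}
```

Running the same measurement once directly against the backend and once through the proxy would isolate the connection-setup cost the proxy adds.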

When doing performance studies we usually pay attention to the distribution of the result. Stuff like average, median, and high percentiles (90, 95, 99, 99.9). Of course, for high percentiles you need enough cycles to get statistically significant results.
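
As an illustration of the summary statistics meant here, the sketch below computes the mean and nearest-rank percentiles over latency samples. The data is synthetic (1 ms through 1000 ms) and the function names are hypothetical; it says nothing about the actual results of the study.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// samples that are already sorted in ascending order.
func percentile(sorted []time.Duration, p float64) time.Duration {
	n := len(sorted)
	if n == 0 {
		return 0
	}
	// The small epsilon keeps float rounding from pushing an exact rank up by one.
	rank := int(math.Ceil(p*float64(n)/100 - 1e-9))
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return sorted[rank-1]
}

// mean returns the arithmetic mean of samples.
func mean(samples []time.Duration) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	var sum time.Duration
	for _, s := range samples {
		sum += s
	}
	return sum / time.Duration(len(samples))
}

// sortedCopy returns an ascending copy, leaving the input untouched.
func sortedCopy(samples []time.Duration) []time.Duration {
	out := append([]time.Duration(nil), samples...)
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// syntheticSamples returns n made-up latencies: n ms down to 1 ms.
func syntheticSamples(n int) []time.Duration {
	samples := make([]time.Duration, n)
	for i := range samples {
		samples[i] = time.Duration(n-i) * time.Millisecond
	}
	return samples
}

func main() {
	sorted := sortedCopy(syntheticSamples(1000))
	fmt.Println("mean:", mean(sorted))
	for _, p := range []float64{50, 90, 95, 99, 99.9} {
		fmt.Printf("p%v: %v\n", p, percentile(sorted, p))
	}
}
```

With 1000 samples the 99.9th percentile rests on a single observation, which is the statistical-significance concern raised above.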

delavet · 2026-04-21T02:29:47Z

Thanks! I’ve just refactored the proxy into a pure TCP proxy. I still need a bit more time to set up a new experiment and investigate these issues.

aavarghese · 2026-04-21T19:42:22Z

Can we add a new test case to our e2e suite here right after Multiple Instances Share One Launcher?

Claude-generated test below:

# ---------------------------------------------------------------------------
# Reverse Proxy Initialization and Forwarding
# ---------------------------------------------------------------------------

intro_case Reverse Proxy Initialization and Forwarding

# Verify the proxy is initialized and pointing at launcher1's IP
kubectl port-forward pod/"$req3" 28091:8081 -n "$NS" &
PF_SPI_PID=$!
sleep 2

proxy_resp=$(curl -sf "http://localhost:28091/v1/proxy/init" 2>/dev/null || echo "")
kill "$PF_SPI_PID" 2>/dev/null || true
wait "$PF_SPI_PID" 2>/dev/null || true

if ! echo "$proxy_resp" | grep -q "proxying to"; then
    echo "ERROR: expected proxy to be initialized, got: '$proxy_resp'" >&2
    exit 1
fi
echo "Proxy is initialized: $proxy_resp"

launcher1_ip=$(kubectl get pod "$launcher1" -n "$NS" -o jsonpath='{.status.podIP}')
if ! echo "$proxy_resp" | grep -qF "$launcher1_ip"; then
    echo "ERROR: proxy target does not contain launcher IP $launcher1_ip: '$proxy_resp'" >&2
    exit 1
fi
echo "Proxy target matches launcher IP: $launcher1_ip"

# On OpenShift the launcher runs a real vLLM — verify traffic actually flows
# through the TCP proxy port (8082) to the launcher's vLLM /health endpoint.
if [ "$E2E_PLATFORM" = "openshift" ]; then
    kubectl port-forward pod/"$req3" 28092:8082 -n "$NS" &
    PF_PROXY_PID=$!
    sleep 2

    health_status=$(curl -s -o /dev/null -w "%{http_code}" \
        "http://localhost:28092/health" 2>/dev/null || echo "000")
    kill "$PF_PROXY_PID" 2>/dev/null || true
    wait "$PF_PROXY_PID" 2>/dev/null || true

    if [ "$health_status" != "200" ]; then
        echo "ERROR: vLLM /health via proxy port returned $health_status (expected 200)" >&2
        exit 1
    fi
    echo "Proxy forwarding verified: /health via proxy port → 200"
fi

cheer Successful reverse proxy initialization and forwarding

enable reverse proxy in server-requesting pod

605acef

delavet force-pushed the requester-reverse-proxy branch from 1b84ee7 to 605acef Compare April 14, 2026 12:16

aavarghese reviewed Apr 14, 2026

Comment thread cmd/requester/main.go Outdated

aavarghese added this to the 3 - System with model swapping and sleep/wake milestone Apr 14, 2026

rubambiza reviewed Apr 15, 2026

aavarghese reviewed Apr 15, 2026

Comment thread pkg/spi/interface.go

MikeSpreitzer reviewed Apr 15, 2026

refactor the reverse proxy to TCP proxy

71f35cb

fix linting

683b802
