
Self-annotation on launcher pods to signal hosted instance changes#391

Merged
waltforme merged 11 commits into llm-d-incubation:main from waltforme:self-annotate
Mar 31, 2026

Conversation

@waltforme
Collaborator

@waltforme waltforme commented Mar 27, 2026

This PR starts to address stage 1 of #375.

Copilot AI review requested due to automatic review settings March 27, 2026 14:58
Contributor

Copilot AI left a comment


Pull request overview

Adds a launcher-pod sidecar that periodically polls the launcher’s /v2/vllm/instances endpoint and publishes a signature of the hosted vLLM instance state onto the enclosing Pod via an annotation, enabling the dual-pods controller to observe instance changes through Pod watch events (stage 1 of #375).

Changes:

  • Inject a vllm-instance-notifier sidecar into launcher pods during pod construction.
  • Add launcher_pod_notifier.py to compute a stable signature of the launcher's instance state and patch it onto the Pod as an annotation.
  • Update launcher Dockerfiles and e2e RBAC so the notifier script is present in images and can patch pod annotations in e2e.
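The self-annotation step described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: the annotation key, the POD_NAME environment variable, and the helper names are assumptions; the ServiceAccount token/namespace paths and the merge-patch content type are standard Kubernetes conventions.

```python
import json
import os
import urllib.request

# Hypothetical annotation key; the key actually used by the PR may differ.
ANNOTATION_KEY = "fast-model-actuation.example.com/instance-signature"


def build_annotation_patch(signature: str) -> bytes:
    """Build a JSON merge-patch body that sets the signature annotation."""
    patch = {"metadata": {"annotations": {ANNOTATION_KEY: signature}}}
    return json.dumps(patch).encode("utf-8")


def patch_own_pod(signature: str) -> None:
    """PATCH the enclosing Pod via the in-cluster API server, using the
    ServiceAccount token mounted into the sidecar (needs pods/patch RBAC)."""
    sa_dir = "/var/run/secrets/kubernetes.io/serviceaccount"
    with open(f"{sa_dir}/token") as f:
        token = f.read().strip()
    with open(f"{sa_dir}/namespace") as f:
        namespace = f.read().strip()
    pod_name = os.environ["POD_NAME"]  # assumed injected via the downward API
    url = f"https://kubernetes.default.svc/api/v1/namespaces/{namespace}/pods/{pod_name}"
    req = urllib.request.Request(
        url,
        data=build_annotation_patch(signature),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/merge-patch+json",
        },
    )
    urllib.request.urlopen(req)  # TLS/CA handling elided in this sketch
```

Because the change lands as a Pod annotation, the dual-pods controller sees it through its ordinary Pod watch, with no extra connection from the controller to the launcher.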

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File Description
test/e2e/run-launcher-based.sh Grants the launcher test ServiceAccount pods/get + pods/patch so the notifier can self-annotate in e2e.
pkg/controller/utils/pod-helper.go Unconditionally injects the notifier sidecar into launcher pods.
inference_server/launcher/launcher_pod_notifier.py Implements polling + signature computation + pod annotation patching logic.
dockerfiles/Dockerfile.launcher.cpu Copies the notifier script into the launcher CPU image.
dockerfiles/Dockerfile.launcher.benchmark Copies the notifier script into the launcher benchmark image.

Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment on lines +65 to +86
def canonicalize_launcher_state(payload: dict[str, Any]) -> dict[str, Any]:
    instances = payload.get("instances", [])
    canonical_instances: list[dict[str, str]] = []
    for instance in instances:
        if not isinstance(instance, dict):
            raise ValueError(f"unexpected instance entry: {instance!r}")
        instance_id = str(instance.get("instance_id", ""))
        status = str(instance.get("status", ""))
        canonical_instances.append({"instance_id": instance_id, "status": status})
    canonical_instances.sort(key=lambda item: (item["instance_id"], item["status"]))
    return {
        "total_instances": int(payload.get("total_instances", len(canonical_instances))),
        "running_instances": int(payload.get("running_instances", 0)),
        "instances": canonical_instances,
    }


def compute_signature(payload: dict[str, Any]) -> str:
    canonical = canonicalize_launcher_state(payload)
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


Copilot AI Mar 27, 2026


This new notifier introduces non-trivial logic (canonicalization + signature computation + publish-on-change behavior) but there are no accompanying unit tests. Given there is already a inference_server/launcher/tests/test_launcher.py suite, adding targeted tests for canonicalize_launcher_state/compute_signature (ordering stability, signature changes on status change, invalid payload handling) would help prevent regressions.

Copilot uses AI. Check for mistakes.
Collaborator


I think that additional test cases are indeed warranted. I would not complain if they are added in a follow-on PR.
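For reference, the kind of targeted tests suggested here could look like the sketch below. It uses plain asserts rather than the project's test framework, and inlines the two functions from the diff above so it stands alone.

```python
import hashlib
import json
from typing import Any


# The two functions under discussion, inlined from the diff above.
def canonicalize_launcher_state(payload: dict[str, Any]) -> dict[str, Any]:
    instances = payload.get("instances", [])
    canonical_instances: list[dict[str, str]] = []
    for instance in instances:
        if not isinstance(instance, dict):
            raise ValueError(f"unexpected instance entry: {instance!r}")
        canonical_instances.append(
            {"instance_id": str(instance.get("instance_id", "")),
             "status": str(instance.get("status", ""))}
        )
    canonical_instances.sort(key=lambda item: (item["instance_id"], item["status"]))
    return {
        "total_instances": int(payload.get("total_instances", len(canonical_instances))),
        "running_instances": int(payload.get("running_instances", 0)),
        "instances": canonical_instances,
    }


def compute_signature(payload: dict[str, Any]) -> str:
    canonical = canonicalize_launcher_state(payload)
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Ordering stability: instance order must not affect the signature.
a = {"instances": [{"instance_id": "i1", "status": "running"},
                   {"instance_id": "i2", "status": "loading"}]}
b = {"instances": list(reversed(a["instances"]))}
assert compute_signature(a) == compute_signature(b)

# A status change must change the signature.
c = {"instances": [{"instance_id": "i1", "status": "stopped"},
                   {"instance_id": "i2", "status": "loading"}]}
assert compute_signature(a) != compute_signature(c)

# Invalid payload entries raise ValueError.
try:
    canonicalize_launcher_state({"instances": ["not-a-dict"]})
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError")
```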

Comment thread inference_server/launcher/launcher_pod_notifier.py
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
)

const (
controllerQueuePerItemRetryMaxDelay = 20 * time.Second
Collaborator


Why is this needed?
Would a larger value be adequate?

Collaborator Author


More events from the launchers are now reported, so the per-item retry interval ramps up faster. Capping it is needed to reduce the slack between 'vLLM instance ready' and the controller's next retry.

Collaborator


I think that we need an even more aggressive solution to that problem. This latency is on the critical path that we want to minimize. OK if we address this in a later PR.
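To illustrate why this cap matters: a work queue's per-item backoff doubles on each failure, so without a cap the wait between retries quickly grows unbounded. A minimal sketch in the spirit of a capped exponential rate limiter (such as client-go's ItemExponentialFailureRateLimiter; the base delay and failure counts here are made up for illustration):

```python
def per_item_delay(failures: int, base: float = 0.005, cap: float = 20.0) -> float:
    """Exponential per-item backoff, capped at `cap` seconds."""
    return min(base * (2 ** failures), cap)


# After about a dozen consecutive failures the uncapped delay already
# exceeds 20s, so every later retry waits the full cap; the cap therefore
# bounds the worst-case slack between "vLLM instance ready" and the
# controller's next retry of that item.
assert per_item_delay(12) == 20.0
assert per_item_delay(0) == 0.005
```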

Comment thread pkg/controller/utils/pod-helper.go Outdated
Comment thread pkg/controller/utils/pod-helper.go Outdated
},
}

if idx >= 0 {
Collaborator


I would rather define this case as a user error, to be reflected in .status.errors of the LauncherConfig.

Collaborator Author


Make later in another PR?

Collaborator


OK with me.

Comment thread test/e2e/run-launcher-based.sh
Collaborator

@MikeSpreitzer MikeSpreitzer left a comment


I left some independent comments.

@MikeSpreitzer
Collaborator

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@waltforme
Collaborator Author

Force-push to 4d313ac was a rebase onto main.

@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@MikeSpreitzer
Collaborator

As demonstrated in https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23663224910/job/68939317606#step:18:860 , the pod log dumping job step is going to need enhancing so that the logs of both containers in the launcher Pod are dumped.

@waltforme
Collaborator Author

waltforme commented Mar 30, 2026

As demonstrated in https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23663224910/job/68939317606#step:18:860 , the pod log dumping job step is going to need enhancing so that the logs of both containers in the launcher Pod are dumped.

Enhanced.

Comment thread .github/workflows/ci-e2e-openshift.yaml
Comment thread .github/workflows/ci-e2e-openshift.yaml Outdated
Comment thread test/e2e/mkobjs-openshift.sh Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

Force-push to 3f4b6cb was due to a rebase onto main.

@MikeSpreitzer
Collaborator

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@MikeSpreitzer
Collaborator

What is causing this log noise? https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23764698262/job/69242210800#step:18:1625

/usr/lib/python3/dist-packages/blinker/base.py:96: SyntaxWarning: invalid escape sequence '*' sender= as a single positional argument and any **kwargs that

@MikeSpreitzer
Collaborator

It would be better if this error message indicated which URL was involved. https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23764698262/job/69242210800#step:18:1632

WARNING launcher_pod_notifier: Notifier loop failed: <urlopen error [Errno 111] Connection refused>
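One way to surface the URL is to wrap the fetch and re-raise with the endpoint attached. A sketch only; the notifier's actual structure, function names, and endpoint are assumptions here:

```python
import urllib.error
import urllib.request


def fetch_instances(url: str, timeout: float = 5.0) -> bytes:
    """Fetch the launcher's instance listing, naming the URL on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.URLError as exc:
        # Re-raise with the URL attached so the logged warning
        # identifies which endpoint was unreachable.
        raise RuntimeError(f"failed to fetch {url}: {exc}") from exc
```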

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@waltforme
Collaborator Author

Turns out that the notifier loop failed because the sidecar container checks the instances before the 'inference-server' container is ready to serve.

The SyntaxWarning noise was not reproduced in the kind-based test, though.
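A connection refusal before the server has ever answered is expected startup noise rather than a real failure, so it can be logged at a lower level. A sketch of that classification; the function name and the `server_seen_ready` flag are made up for illustration:

```python
import logging
import urllib.error

logger = logging.getLogger("launcher_pod_notifier")


def log_poll_failure(exc: Exception, server_seen_ready: bool) -> str:
    """Classify a poll failure: before the inference-server container has
    ever answered, a connection refusal is expected startup noise."""
    refused = (isinstance(exc, urllib.error.URLError)
               and "Connection refused" in str(exc))
    if refused and not server_seen_ready:
        logger.debug("Launcher not serving yet: %s", exc)
        return "startup"
    logger.warning("Notifier loop failed: %s", exc)
    return "failure"
```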

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Comment thread inference_server/launcher/launcher_pod_notifier.py
@waltforme
Collaborator Author

Looks like the noisy message predates this PR, e.g.
https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23666793449/job/68951201456#step:18:877

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
@MikeSpreitzer
Collaborator

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

Re https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23776211419/job/69278559364?pr=391#step:9:21 - IMHO that linter is more picky than we need.

It always is IMHO.

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Collaborator

@MikeSpreitzer MikeSpreitzer left a comment


This is good enough to merge.

@waltforme waltforme merged commit 162829c into llm-d-incubation:main Mar 31, 2026
25 checks passed
@waltforme waltforme deleted the self-annotate branch March 31, 2026 02:16


3 participants