
Self-annotation on launcher pods to signal hosted instance changes#391

Merged
waltforme merged 11 commits into llm-d-incubation:main from waltforme:self-annotate
Mar 31, 2026

Conversation

@waltforme
Collaborator

@waltforme waltforme commented Mar 27, 2026

This PR starts to address stage 1 of #375.

Copilot AI review requested due to automatic review settings March 27, 2026 14:58
Contributor

Copilot AI left a comment


Pull request overview

Adds a launcher-pod sidecar that periodically polls the launcher’s /v2/vllm/instances endpoint and publishes a signature of the hosted vLLM instance state onto the enclosing Pod via an annotation, enabling the dual-pods controller to observe instance changes through Pod watch events (stage 1 of #375).

Changes:

  • Inject a vllm-instance-notifier sidecar into launcher pods during pod construction.
  • Add launcher_pod_notifier.py to compute a stable signature of the launcher's instance state and patch it onto the Pod as an annotation.
  • Update launcher Dockerfiles and e2e RBAC so the notifier script is present in images and can patch pod annotations in e2e.
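The self-annotation step described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: the annotation key, the POD_NAME environment variable, and the helper names are assumptions; the ServiceAccount token/namespace paths and the merge-patch content type are standard Kubernetes conventions.

```python
import json
import os
import urllib.request

# Hypothetical annotation key; the key actually used by the PR may differ.
ANNOTATION_KEY = "fast-model-actuation.example.com/instance-signature"


def build_annotation_patch(signature: str) -> bytes:
    """Build a JSON merge-patch body that sets the signature annotation."""
    patch = {"metadata": {"annotations": {ANNOTATION_KEY: signature}}}
    return json.dumps(patch).encode("utf-8")


def patch_own_pod(signature: str) -> None:
    """PATCH the enclosing Pod via the in-cluster API server, using the
    ServiceAccount token mounted into the sidecar (needs pods/patch RBAC)."""
    sa_dir = "/var/run/secrets/kubernetes.io/serviceaccount"
    with open(f"{sa_dir}/token") as f:
        token = f.read().strip()
    with open(f"{sa_dir}/namespace") as f:
        namespace = f.read().strip()
    pod_name = os.environ["POD_NAME"]  # assumed injected via the downward API
    url = f"https://kubernetes.default.svc/api/v1/namespaces/{namespace}/pods/{pod_name}"
    req = urllib.request.Request(
        url,
        data=build_annotation_patch(signature),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/merge-patch+json",
        },
    )
    urllib.request.urlopen(req)  # TLS/CA handling elided in this sketch
```

Because the change lands as a Pod annotation, the dual-pods controller sees it through its ordinary Pod watch, with no extra connection from the controller to the launcher.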

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File Description
test/e2e/run-launcher-based.sh Grants the launcher test ServiceAccount pods/get + pods/patch so the notifier can self-annotate in e2e.
pkg/controller/utils/pod-helper.go Unconditionally injects the notifier sidecar into launcher pods.
inference_server/launcher/launcher_pod_notifier.py Implements polling + signature computation + pod annotation patching logic.
dockerfiles/Dockerfile.launcher.cpu Copies the notifier script into the launcher CPU image.
dockerfiles/Dockerfile.launcher.benchmark Copies the notifier script into the launcher benchmark image.

Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment on lines +65 to +86
def canonicalize_launcher_state(payload: dict[str, Any]) -> dict[str, Any]:
    instances = payload.get("instances", [])
    canonical_instances: list[dict[str, str]] = []
    for instance in instances:
        if not isinstance(instance, dict):
            raise ValueError(f"unexpected instance entry: {instance!r}")
        instance_id = str(instance.get("instance_id", ""))
        status = str(instance.get("status", ""))
        canonical_instances.append({"instance_id": instance_id, "status": status})
    canonical_instances.sort(key=lambda item: (item["instance_id"], item["status"]))
    return {
        "total_instances": int(payload.get("total_instances", len(canonical_instances))),
        "running_instances": int(payload.get("running_instances", 0)),
        "instances": canonical_instances,
    }


def compute_signature(payload: dict[str, Any]) -> str:
    canonical = canonicalize_launcher_state(payload)
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


Copilot AI Mar 27, 2026


This new notifier introduces non-trivial logic (canonicalization + signature computation + publish-on-change behavior) but there are no accompanying unit tests. Given there is already a inference_server/launcher/tests/test_launcher.py suite, adding targeted tests for canonicalize_launcher_state/compute_signature (ordering stability, signature changes on status change, invalid payload handling) would help prevent regressions.

Copilot uses AI. Check for mistakes.
Collaborator


I think that additional test cases are indeed warranted. I would not complain if they are added in a follow-on PR.
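For reference, the kind of targeted tests suggested here could look like the sketch below. It uses plain asserts rather than the project's test framework, and inlines the two functions from the diff above so it stands alone.

```python
import hashlib
import json
from typing import Any


# The two functions under discussion, inlined from the diff above.
def canonicalize_launcher_state(payload: dict[str, Any]) -> dict[str, Any]:
    instances = payload.get("instances", [])
    canonical_instances: list[dict[str, str]] = []
    for instance in instances:
        if not isinstance(instance, dict):
            raise ValueError(f"unexpected instance entry: {instance!r}")
        canonical_instances.append(
            {"instance_id": str(instance.get("instance_id", "")),
             "status": str(instance.get("status", ""))}
        )
    canonical_instances.sort(key=lambda item: (item["instance_id"], item["status"]))
    return {
        "total_instances": int(payload.get("total_instances", len(canonical_instances))),
        "running_instances": int(payload.get("running_instances", 0)),
        "instances": canonical_instances,
    }


def compute_signature(payload: dict[str, Any]) -> str:
    canonical = canonicalize_launcher_state(payload)
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Ordering stability: instance order must not affect the signature.
a = {"instances": [{"instance_id": "i1", "status": "running"},
                   {"instance_id": "i2", "status": "loading"}]}
b = {"instances": list(reversed(a["instances"]))}
assert compute_signature(a) == compute_signature(b)

# A status change must change the signature.
c = {"instances": [{"instance_id": "i1", "status": "stopped"},
                   {"instance_id": "i2", "status": "loading"}]}
assert compute_signature(a) != compute_signature(c)

# Invalid payload entries raise ValueError.
try:
    canonicalize_launcher_state({"instances": ["not-a-dict"]})
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError")
```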

Comment thread inference_server/launcher/launcher_pod_notifier.py
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
)

const (
controllerQueuePerItemRetryMaxDelay = 20 * time.Second
Collaborator


Why is this needed?
Would a larger value be adequate?

Collaborator Author


More events from the launchers are now reported, so the per-item retry interval ramps up faster. Capping it is needed to reduce the slack between 'vLLM instance ready' and the controller's next retry.

Collaborator


I think that we need an even more aggressive solution to that problem. This latency is on the critical path that we want to minimize. OK if we address this in a later PR.
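To illustrate why this cap matters: a work queue's per-item backoff doubles on each failure, so without a cap the wait between retries quickly grows unbounded. A minimal sketch in the spirit of a capped exponential rate limiter (such as client-go's ItemExponentialFailureRateLimiter; the base delay and failure counts here are made up for illustration):

```python
def per_item_delay(failures: int, base: float = 0.005, cap: float = 20.0) -> float:
    """Exponential per-item backoff, capped at `cap` seconds."""
    return min(base * (2 ** failures), cap)


# After about a dozen consecutive failures the uncapped delay already
# exceeds 20s, so every later retry waits the full cap; the cap therefore
# bounds the worst-case slack between "vLLM instance ready" and the
# controller's next retry of that item.
assert per_item_delay(12) == 20.0
assert per_item_delay(0) == 0.005
```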

Comment thread pkg/controller/utils/pod-helper.go Outdated
Comment thread pkg/controller/utils/pod-helper.go Outdated
},
}

if idx >= 0 {
Collaborator


I would rather define this case as a user error, to be reflected in .status.errors of the LauncherConfig.

Collaborator Author


Make later in another PR?

Collaborator


OK with me.

Comment thread test/e2e/run-launcher-based.sh
Collaborator

@MikeSpreitzer MikeSpreitzer left a comment


I left some independent comments.

@MikeSpreitzer
Collaborator

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@waltforme
Collaborator Author

Force-push to 4d313ac was a rebase onto main.

@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@MikeSpreitzer
Collaborator

As demonstrated in https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23663224910/job/68939317606#step:18:860 , the pod log dumping job step is going to need enhancing so that the logs of both containers in the launcher Pod are dumped.

@waltforme
Collaborator Author

waltforme commented Mar 30, 2026

As demonstrated in https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23663224910/job/68939317606#step:18:860 , the pod log dumping job step is going to need enhancing so that the logs of both containers in the launcher Pod are dumped.

Enhanced.

Comment thread .github/workflows/ci-e2e-openshift.yaml
Comment thread .github/workflows/ci-e2e-openshift.yaml Outdated
Comment thread test/e2e/mkobjs-openshift.sh Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

Force-push to 3f4b6cb was due to a rebase onto main.

@MikeSpreitzer
Collaborator

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@MikeSpreitzer
Collaborator

What is causing this log noise? https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23764698262/job/69242210800#step:18:1625

/usr/lib/python3/dist-packages/blinker/base.py:96: SyntaxWarning: invalid escape sequence '*' sender= as a single positional argument and any **kwargs that

@MikeSpreitzer
Collaborator

It would be better if this error message indicated which URL was involved. https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23764698262/job/69242210800#step:18:1632

WARNING launcher_pod_notifier: Notifier loop failed: <urlopen error [Errno 111] Connection refused>
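One way to surface the URL is to wrap the fetch and re-raise with the endpoint attached. A sketch only; the notifier's actual structure, function names, and endpoint are assumptions here:

```python
import urllib.error
import urllib.request


def fetch_instances(url: str, timeout: float = 5.0) -> bytes:
    """Fetch the launcher's instance listing, naming the URL on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.URLError as exc:
        # Re-raise with the URL attached so the logged warning
        # identifies which endpoint was unreachable.
        raise RuntimeError(f"failed to fetch {url}: {exc}") from exc
```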

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@waltforme
Collaborator Author

Turns out that the notifier loop failed because the sidecar container checks the instances before the 'inference-server' container is ready to serve.

The SyntaxWarning noise was not reproduced in the kind-based test, though.
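A connection refusal before the server has ever answered is expected startup noise rather than a real failure, so it can be logged at a lower level. A sketch of that classification; the function name and the `server_seen_ready` flag are made up for illustration:

```python
import logging
import urllib.error

logger = logging.getLogger("launcher_pod_notifier")


def log_poll_failure(exc: Exception, server_seen_ready: bool) -> str:
    """Classify a poll failure: before the inference-server container has
    ever answered, a connection refusal is expected startup noise."""
    refused = (isinstance(exc, urllib.error.URLError)
               and "Connection refused" in str(exc))
    if refused and not server_seen_ready:
        logger.debug("Launcher not serving yet: %s", exc)
        return "startup"
    logger.warning("Notifier loop failed: %s", exc)
    return "failure"
```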

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Comment thread inference_server/launcher/launcher_pod_notifier.py
@waltforme
Collaborator Author

Looks like the noisy message predates this PR, e.g.
https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23666793449/job/68951201456#step:18:877

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
Comment thread inference_server/launcher/launcher_pod_notifier.py Outdated
@MikeSpreitzer
Collaborator

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

Re https://github.com/llm-d-incubation/llm-d-fast-model-actuation/actions/runs/23776211419/job/69278559364?pr=391#step:9:21 - IMHO that linter is more picky than we need.

It always is IMHO.

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
@waltforme
Collaborator Author

/ok-to-test

@github-actions

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Collaborator

@MikeSpreitzer MikeSpreitzer left a comment


This is good enough to merge.

@waltforme waltforme merged commit 162829c into llm-d-incubation:main Mar 31, 2026
25 checks passed
@waltforme waltforme deleted the self-annotate branch March 31, 2026 02:16


3 participants