Garbage collect obsolete vLLM instances #428

waltforme wants to merge 3 commits into llm-d-incubation:main
Conversation
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
…unbinding
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Pull request overview
This PR adds controller-side garbage collection for vLLM instances that become obsolete after an InferenceServerConfig (ISC) change, covering both sleeping instances (cleanup after ISC update) and awake instances (delete on unbind rather than sleeping). It also extends the E2E suite to validate these behaviors.
Changes:
- Add async cleanup of obsolete sleeping launcher instances triggered by ISC spec updates.
- On unbind, detect obsolete awake launcher instances and delete them instead of sleeping.
- Extend E2E test-cases to validate obsolete-instance deletion scenarios.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| test/e2e/test-cases.sh | Adds helper + new E2E cases validating deletion of obsolete sleeping/awake instances. |
| pkg/controller/dual-pods/inference-server.go | Adds “delete obsolete instance on unbind” logic; threads context.Context into HTTP GET helpers. |
| pkg/controller/dual-pods/controller.go | Triggers obsolete sleeping-instance cleanup on ISC updates; adds internal state deletion helper. |
```go
resp, err := http.Post(sleepURL, "", nil)
if err != nil {
	return fmt.Errorf("failed to put provider %q to sleep, POST %s got error: %w", serverDat.ProvidingPodName, sleepURL, err)
}
```
ensureUnbound calls http.Post but doesn’t close resp.Body. This can leak connections/file descriptors over time in a long-running controller. Ensure resp.Body.Close() is called (typically via defer) after a successful POST, even when only checking StatusCode.
Suggested change:

```go
}
defer func() {
	_, _ = io.Copy(io.Discard, resp.Body)
	_ = resp.Body.Close()
}()
```
```shell
get_launcher_total_instances() {
    local launcher_pod="$1"
    kubectl exec -n "$NS" "$launcher_pod" -- python3 -c 'import json,urllib.request; print(json.load(urllib.request.urlopen("http://127.0.0.1:8001/v2/vllm/instances"))["total_instances"])'
}
```
get_launcher_total_instances uses a Python urlopen() with no timeout. Since it’s invoked inside expect loops, a hung HTTP call (or a stuck kubectl exec) can stall the entire E2E run indefinitely. Consider adding an explicit timeout (e.g., `urlopen(..., timeout=...)`) and/or wrapping kubectl exec with `timeout` / `--request-timeout`.
Suggested change:

```shell
kubectl --request-timeout=15s exec -n "$NS" "$launcher_pod" -- python3 -c 'import json,urllib.request; print(json.load(urllib.request.urlopen("http://127.0.0.1:8001/v2/vllm/instances", timeout=10))["total_instances"])'
```
```shell
# Mutate isc in a hash-relevant way so its sleeping instance becomes obsolete.
kubectl patch inferenceserverconfig "$isc" -n "$NS" --type=merge -p='{"spec":{"modelServerConfig":{"options":"--model HuggingFaceTB/SmolLM2-360M-Instruct --served-model-name obsolete-after-update"}}}'

expect '[ "$(get_launcher_total_instances "$launcher1")" == "$((old_total_instances - 1))" ]'
```
This test mutates the primary InferenceServerConfig ($isc) but never restores it. Even if the current script doesn’t reuse $isc, leaving shared test objects mutated makes the E2E script less self-contained and can complicate later cases/debugging. Consider capturing the original options (as done for $isc2) and restoring them after the assertion.
Suggested change:

```shell
original_isc_options="$(kubectl get inferenceserverconfig "$isc" -n "$NS" -o jsonpath='{.spec.modelServerConfig.options}')"
restore_primary_isc_options() {
    kubectl patch inferenceserverconfig "$isc" -n "$NS" --type=merge --patch "$(cat <<EOF
spec:
  modelServerConfig:
    options: >-
      $original_isc_options
EOF
)"
}
trap 'restore_primary_isc_options' EXIT
# Mutate isc in a hash-relevant way so its sleeping instance becomes obsolete.
kubectl patch inferenceserverconfig "$isc" -n "$NS" --type=merge -p='{"spec":{"modelServerConfig":{"options":"--model HuggingFaceTB/SmolLM2-360M-Instruct --served-model-name obsolete-after-update"}}}'
expect '[ "$(get_launcher_total_instances "$launcher1")" == "$((old_total_instances - 1))" ]'
restore_primary_isc_options
trap - EXIT
```
```go
ctl.mutex.Lock()
defer ctl.mutex.Unlock()
nodeDat := ctl.nodeNameToData[nodeName]
if nodeDat == nil {
	return
}
launcherDat := nodeDat.Launchers[launcherPodName]
if launcherDat == nil {
	return
}
delete(launcherDat.Instances, instanceID)
```
deleteInstanceFromLauncherData takes ctl.mutex and then reads nodeDat.Launchers / mutates launcherDat.Instances, but those maps are also accessed elsewhere without ctl.mutex (they’re intended to be accessed only from the nodeItem.process call hierarchy). Locking ctl.mutex here doesn’t make those maps safe and can still race with concurrent node processing, especially now that cleanup can run async. Align on one synchronization strategy (e.g., only touch Launchers/Instances inside nodeItem processing, or guard them with a shared mutex used by all code paths).
Suggested change:

```go
// Intentionally do not touch nodeDat.Launchers / launcherDat.Instances here.
// Those maps are maintained by the node-processing path and are also accessed
// without ctl.mutex elsewhere, so mutating them from async cleanup would race.
//
// The launcher instance has already been deleted from the cluster; controller
// state will be refreshed by the regular reconciliation flow.
_ = nodeName
_ = launcherPodName
_ = instanceID
```
```go
	return false
}
instState, err := lClient.GetInstanceState(ctx, serverDat.InstanceID)
if err != nil {
```
maybeDeleteObsoleteInstance returns false for any GetInstanceState error. If the instance was already deleted (e.g., by concurrent cleanup) and the launcher returns 404, ensureUnbound will fall through to the normal sleep path and likely fail the /sleep POST (connection refused), causing retries and noisy errors. Treat an instance-not-found response as “already gone”: update internal launcher state accordingly and skip the sleep call.
Suggested change:

```go
if err != nil {
	if IsInstanceNotFoundError(err) {
		nodeName, _ := getProviderNodeName(providingPod)
		ctl.deleteInstanceFromLauncherData(nodeName, providingPod.Name, serverDat.InstanceID)
		logger.V(2).Info("Instance already absent during unbinding; removed stale launcher state",
			"instanceID", serverDat.InstanceID)
		return true
	}
```
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
Good. Q4 and A2 are stated colloquially in terms of one relevant edge case, but the proper framing of the question and answer is state-based. What, if anything, should be done about a vLLM instance that was created by a non-current revision of the relevant InferenceServerConfig? Answer: deletion, if the vLLM instance is sleeping.
```diff
- // There are two types of item in the controller's work queue.
+ // There are three types of item in the controller's work queue.
```
This comment about queuing was already outdated before this PR: it mentions neither cmItem, unboundLauncherItem, nor nodeItem and its nested structure.
This PR carries out 'A2 to Q4' in #201 (comment).
This PR does more than that: it also covers corresponding instances that are not sleeping when an ISC changes; they are garbage collected at unbinding time.