Commit 8247e5a

Merge pull request #258 from zdtsw-forking/sync/upstream-0.8.0-rc2
sync: upstream v0.9.0-rc2
2 parents 4e12850 + 33eb8c8 commit 8247e5a

13 files changed

Lines changed: 125 additions & 82 deletions

.github/ISSUE_TEMPLATE/new-release.md

Lines changed: 67 additions & 39 deletions
@@ -25,19 +25,26 @@ This document defines the process for releasing llm-d-router.
 `refs/tags/v*` restricts who can push release tags, which is what triggers
 the release build.

-1. Set the required environment variables based on the expected release number:
+1. Choose whether you are releasing a release candidate or an official release, and set the environment variables accordingly:

-```shell
-export MAJOR=0
-export MINOR=1
-export PATCH=0
-export REMOTE=origin
-```
+- For a **Release Candidate** (e.g. `v0.9.0-rc.1`):
+```shell
+export VERSION=v0.9.0-rc.1
+export BRANCH_VERSION=0.9
+export REMOTE=origin
+```

-1. If creating a release candidate, set the release candidate number.
+- For an **Official Release** (e.g. `v0.9.0`):
+```shell
+export VERSION=v0.9.0
+export BRANCH_VERSION=0.9
+export REMOTE=origin
+```
+
+1. (Optional) If the latency predictor release version does **not** align with the router version, also set the expected tag (refer to the [latency predictor releases] to find the latest valid release tag):

 ```shell
-export RC=1
+export LATENCY_PREDICTOR_TAG=v0.8.0-rc.1
 ```
 1. If needed, clone the llm-d-router [repo].

@@ -53,54 +60,46 @@ This document defines the process for releasing llm-d-router.

 1. Release Branch Handling:
 - For a Release Candidate:
-Create a new release branch from the `main` branch. The branch should be named `release-${MAJOR}.${MINOR}`, for example, `release-0.1`:
+Create a new release branch from the `main` branch. The branch should be named `release-${BRANCH_VERSION}`, for example, `release-0.9`:

 ```shell
-git checkout -b release-${MAJOR}.${MINOR}
+git checkout -b release-${BRANCH_VERSION}
 ```

 - For a Major, Minor or Patch Release:
 A release branch should already exist. In this case, check out the existing branch:

 ```shell
-git checkout release-${MAJOR}.${MINOR} ${REMOTE}/release-${MAJOR}.${MINOR}
+git checkout release-${BRANCH_VERSION} ${REMOTE}/release-${BRANCH_VERSION}
 ```

-1. Push your release branch to the llm-d-router remote.
+1. By default, `LATENCY_PREDICTOR_TAG` in the `Makefile` resolves from the router release tag (via `BUILD_REF`). If the latency predictor tag does **not** align with the router version, update the default value of `LATENCY_PREDICTOR_TAG` in the `Makefile` to match your exported `${LATENCY_PREDICTOR_TAG}`.
+Commit the change (if modified):

 ```shell
-git push ${REMOTE} release-${MAJOR}.${MINOR}
+# Update LATENCY_PREDICTOR_TAG ?= vX.Y.Z in Makefile
+git commit -a -s -m "release: set LATENCY_PREDICTOR_TAG to ${LATENCY_PREDICTOR_TAG}"
 ```

-### Tag commit and trigger image build
-
-1. Tag the head of your release branch with the sem-ver release version.
-
-For a release candidate:
-
-```shell
-git tag -s -a v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} -m "llm-d-router v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} Release Candidate"
-```
-
-For a major, minor or patch release:
+1. Push your release branch to the llm-d-router remote.

 ```shell
-git tag -s -a v${MAJOR}.${MINOR}.${PATCH} -m "llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release"
+git push ${REMOTE} release-${BRANCH_VERSION}
 ```

-1. Push the tag to the llm-d-router repo.
+### Tag commit and trigger image build

-For a release candidate:
+1. Tag the head of your release branch with the version:

-```shell
-git push ${REMOTE} v${MAJOR}.${MINOR}.${PATCH}-rc.${RC}
-```
+```shell
+git tag -s -a ${VERSION} -m "llm-d-router ${VERSION} Release"
+```

-For a major, minor or patch release:
+1. Push the tag to the llm-d-router repo:

-```shell
-git push ${REMOTE} v${MAJOR}.${MINOR}.${PATCH}
-```
+```shell
+git push ${REMOTE} ${VERSION}
+```

 1. Pushing the tag triggers CI action to build and publish the EPP image (`ghcr.io/llm-d/llm-d-router-endpoint-picker`) and sidecar image (`ghcr.io/llm-d/llm-d-router-disagg-sidecar`) to the [ghcr registry].
 1. Verify the [CI release workflow] completed successfully before proceeding.
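
Reviewer note (not part of the commit): the updated instructions export `VERSION` and `BRANCH_VERSION` by hand, so the two values could drift apart. The following is a minimal sketch, assuming versions always look like `vMAJOR.MINOR.PATCH` with an optional `-rc.N` suffix, of deriving the branch version from the tag. The function names `derive_branch_version` and `is_release_candidate` are hypothetical and do not exist in the repository.

```shell
# Hypothetical helpers; not part of llm-d-router.

# Print MAJOR.MINOR for a tag such as v0.9.0 or v0.9.0-rc.1.
derive_branch_version() {
  case "$1" in
    v[0-9]*.[0-9]*.[0-9]*) ;;                       # loose shape check only
    *) echo "unexpected version format: $1" >&2; return 1 ;;
  esac
  _version="${1#v}"          # strip the leading "v":      0.9.0-rc.1
  _core="${_version%%-*}"    # strip any "-rc.N" suffix:   0.9.0
  echo "${_core%.*}"         # drop the patch component:   0.9
}

# Succeed only when the tag carries an -rc.N suffix.
is_release_candidate() {
  case "$1" in
    *-rc.*) return 0 ;;
    *) return 1 ;;
  esac
}

VERSION=v0.9.0-rc.1
BRANCH_VERSION="$(derive_branch_version "$VERSION")"
echo "release-${BRANCH_VERSION}"   # prints release-0.9
```

The same `is_release_candidate` check could also decide whether the "This is a pre-release" checkbox applies when the GitHub release is created.
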
@@ -111,21 +110,49 @@ This document defines the process for releasing llm-d-router.
 1. Create a [new release]:
 1. Choose the tag that you created for the release.
 1. Use the tag as the release title, e.g. `v0.1.0`.
-1. Click "Generate release notes" and preview the release body.
-1. Ensure the release body includes: highlights, breaking changes (if any), known issues, and upgrade steps.
+1. Click "Generate release notes" to auto-populate the list of PRs and contributors.
+1. Summarize the release notes using an LLM of your choice (e.g., Gemini, Copilot, ChatGPT). Provide the newly compiled release notes block from `RELEASE-NOTES.md` (or the unreleased fragments in `release-notes.d/unreleased/`) with the following prompt:
+
+```text
+Please summarize these release notes into three clear sections:
+1. Highlights (key features, performance wins, bug fixes)
+2. Upgrade Steps & Deprecations (configuration changes, deprecated flags/metrics)
+3. Known Issues (if any, otherwise omit)
+```
+
+Review the generated content, edit it if necessary to ensure accuracy, and then copy and prepend this summary at the very top of the release description box on GitHub.
 1. If this is a release candidate, select the "This is a pre-release" checkbox.
 1. If you find any bugs in this process, create an [issue].

 ## Announce the Release

 Use the following steps to announce the release.

-1. Send an announcement email to `llm-d-contributors@googlegroups.com` with the subject:
+1. Generate the announcement email content by running the following block in your terminal (make sure `${VERSION}` is set in your current shell):

 ```shell
-[ANNOUNCE] llm-d-router v${MAJOR}.${MINOR}.${PATCH} is released
+cat <<EOF
+Subject: [ANNOUNCE] llm-d-router ${VERSION} is released
+
+Hi all,
+
+We are pleased to announce the release of llm-d-router ${VERSION}!
+
+### Container Images
+* Endpoint Picker: ghcr.io/llm-d/llm-d-router-endpoint-picker:${VERSION}
+* Disaggregated Sidecar: ghcr.io/llm-d/llm-d-router-disagg-sidecar:${VERSION}
+
+### Helm Charts (OCI)
+* Standalone Chart: oci://ghcr.io/llm-d/charts/llm-d-router-standalone (version ${VERSION})
+* Gateway Chart: oci://ghcr.io/llm-d/charts/llm-d-router-gateway (version ${VERSION})
+
+### Release Notes
+For more details, please see the GitHub release notes: https://github.com/llm-d/llm-d-router/releases/tag/${VERSION}
+EOF
 ```

+1. Copy the generated subject and body, and send an email to `llm-d-contributors@googlegroups.com`.
+
 1. Add a link to the final release in this issue.

 1. Close this issue.
@@ -135,3 +162,4 @@ Use the following steps to announce the release.
 [new release]: https://github.com/llm-d/llm-d-router/releases/new
 [issue]: https://github.com/llm-d/llm-d-router/issues/new/choose
 [CI release workflow]: https://github.com/llm-d/llm-d-router/actions/workflows/ci-release.yaml
+[latency predictor releases]: https://github.com/orgs/llm-d/packages?repo_name=llm-d-latency-predictor
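
Reviewer note (not part of the commit): the announcement heredoc in this template uses an unquoted `EOF` delimiter, so the shell expands `${VERSION}` inside it, and an unset variable silently produces an empty string. The following is a minimal sketch of guarding against that. The function name `announcement_header` is hypothetical, and only the subject and release-notes lines from the template are reproduced.

```shell
# Hypothetical wrapper; not part of llm-d-router.
announcement_header() {
  # Refuse to generate anything when VERSION is unset or empty.
  if [ -z "${VERSION:-}" ]; then
    echo "VERSION is not set; export it before generating the announcement" >&2
    return 1
  fi
  # Unquoted delimiter: ${VERSION} is expanded by the shell.
  cat <<EOF
Subject: [ANNOUNCE] llm-d-router ${VERSION} is released
Release notes: https://github.com/llm-d/llm-d-router/releases/tag/${VERSION}
EOF
}

# Run in a subshell so the sample value does not leak into the caller.
(VERSION=v0.9.0; announcement_header)
```

Quoting the delimiter (`<<'EOF'`) would instead keep `${VERSION}` literal, which is not what the template wants here.
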

Makefile

Lines changed: 3 additions & 2 deletions
@@ -55,6 +55,7 @@ GIT_COMMIT_SHA ?= $(shell git rev-parse HEAD 2>/dev/null)
 # Match only root-level release tags (v[0-9]*) so submodule tags don't leak into image versions.
 ROOT_RELEASE_TAG_MATCH ?= v[0-9]*
 BUILD_REF ?= $(shell git describe --tags --match '$(ROOT_RELEASE_TAG_MATCH)' --abbrev=0 2>/dev/null)
+LATENCY_PREDICTOR_TAG ?= $(or $(EXTRA_TAG),$(BUILD_REF),latest)

 # Host directories for Go module and build caches, bind-mounted into the builder container.
 GO_MOD_CACHE_VOL ?= $(HOME)/.cache/llm-d-gomodcache
@@ -214,7 +215,7 @@ check-latest-tags-strict: ## Check ':latest' image tags in YAML (strict; fails o

 .PHONY: presubmit
 presubmit: LINT_NEW_ONLY=true
-presubmit: git-branch-check signed-commits-check go-mod-check format lint vulncheck check-latest-tags
+presubmit: git-branch-check signed-commits-check go-mod-check format lint vulncheck check-latest-tags-strict

 .PHONY: git-branch-check
 git-branch-check:
@@ -348,7 +349,7 @@ verify-helm-charts: helm-install kubectl-validate ## Render and validate Helm ch
 .PHONY: helm-push
 helm-push: yq helm-install ## Package and push a specified Helm chart. Usage: make helm-push CHART=<chart_name>
 @if [ -z "$(CHART)" ]; then echo "Error: CHART variable is required (e.g. CHART=llm-d-router-standalone)"; exit 1; fi
-CHART=$(CHART) EXTRA_TAG="$(EXTRA_TAG)" CHART_SUFFIX="$(CHART_SUFFIX)" EPP_RELEASE_IMAGE_REPOSITORY="$(EPP_RELEASE_IMAGE_REPOSITORY)" YQ="$(YQ)" HELM="$(HELM)" ./hack/push-chart.sh
+CHART=$(CHART) EXTRA_TAG="$(EXTRA_TAG)" CHART_SUFFIX="$(CHART_SUFFIX)" EPP_RELEASE_IMAGE_REPOSITORY="$(EPP_RELEASE_IMAGE_REPOSITORY)" LATENCY_PREDICTOR_TAG="$(LATENCY_PREDICTOR_TAG)" YQ="$(YQ)" HELM="$(HELM)" ./hack/push-chart.sh

 .PHONY: helm-push-gateway
 helm-push-gateway: ## Package and push the llm-d-router-gateway Helm chart.
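
Reviewer note (not part of the commit): the new `LATENCY_PREDICTOR_TAG` default uses GNU Make's `$(or ...)`, which yields the first argument that expands to a non-empty string. Because the assignment uses `?=`, a value supplied through the environment or the `make` command line still takes precedence. A rough shell analogue of the fallback chain is sketched below. The function name `resolve_latency_predictor_tag` is hypothetical and takes `EXTRA_TAG` and `BUILD_REF` as positional arguments for illustration.

```shell
# Hypothetical shell analogue of:
#   LATENCY_PREDICTOR_TAG ?= $(or $(EXTRA_TAG),$(BUILD_REF),latest)
resolve_latency_predictor_tag() {
  # $1 = EXTRA_TAG, $2 = BUILD_REF; ${x:-y} falls through on unset OR empty.
  echo "${1:-${2:-latest}}"
}

resolve_latency_predictor_tag "" "v0.9.0-rc.2"   # prints v0.9.0-rc.2
resolve_latency_predictor_tag "" ""              # prints latest
```

With this chain, a checkout that has no matching release tag and no `EXTRA_TAG` would resolve to `latest`.
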

config/manifests/sglang/gpu-deployment.yaml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ spec:
 spec:
 containers:
 - name: sglang
-image: lmsysorg/sglang:latest
+image: lmsysorg/sglang:v0.5.12
 command: ["python3", "-m", "sglang.launch_server"]
 args:
 - "--model-path=Qwen/Qwen3-32B"

config/manifests/vllm/gpu-deployment.yaml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ spec:
 spec:
 containers:
 - name: vllm
-image: "vllm/vllm-openai:latest"
+image: "vllm/vllm-openai:v0.21.0"
 imagePullPolicy: Always
 command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
 args:

config/manifests/vllm/gpu-grpc-deployment.yaml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ spec:
 spec:
 containers:
 - name: vllm-server
-image: vllm/vllm-openai:latest
+image: vllm/vllm-openai:v0.21.0
 command: ["python3", "-m", "vllm.entrypoints.grpc_server"]
 args:
 - "--model"

config/manifests/vllm/gpu-multilora-deployment.yaml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ spec:
 spec:
 containers:
 - name: vllm
-image: "vllm/vllm-openai:latest"
+image: "vllm/vllm-openai:v0.21.0"
 imagePullPolicy: Always
 command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
 args:

config/manifests/vllm/gpu-prefix-cache-deployment.yaml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ spec:
 spec:
 containers:
 - name: vllm
-image: "vllm/vllm-openai:latest"
+image: "vllm/vllm-openai:v0.21.0"
 imagePullPolicy: Always
 command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
 args:
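
Reviewer note (not part of the commit): the manifest changes above pin every `:latest` image to a fixed tag, which lines up with `presubmit` now depending on `check-latest-tags-strict`. The body of that Makefile target is not shown in this diff, so the following is only a hypothetical approximation of what a strict check could look like. The function name `find_latest_tags` and the regular expression are assumptions.

```shell
# Hypothetical approximation; the real check-latest-tags-strict target may differ.
# Prints every "image: ...:latest" line in *.yaml files under directory $1
# and returns non-zero when at least one is found.
find_latest_tags() {
  _pattern='image:[[:space:]]*"?[^"[:space:]]+:latest"?[[:space:]]*$'
  # /dev/null forces grep to print file names even for a single file.
  _matches="$(find "$1" -type f -name '*.yaml' \
    -exec grep -nE "$_pattern" /dev/null {} + 2>/dev/null || true)"
  if [ -n "$_matches" ]; then
    printf '%s\n' "$_matches"
    return 1
  fi
  return 0
}
```

Running something like `find_latest_tags config/manifests` against the post-commit tree would be expected to print nothing for the five files touched here; other manifests are outside this diff and are not covered by that expectation.
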

pkg/epp/framework/plugins/requestcontrol/dataproducer/preciseprefixcache/producer.go

Lines changed: 4 additions & 4 deletions
@@ -229,7 +229,7 @@ func (p *Producer) Produce(ctx context.Context,
 )
 defer span.End()

-span.SetAttributes(attribute.Int("llm_d.producer.candidate_endpoints", len(endpoints)))
+span.SetAttributes(attribute.Int("llm_d.epp.producer.candidate_endpoints", len(endpoints)))
 if request != nil {
 if request.TargetModel != "" {
 span.SetAttributes(attribute.String("gen_ai.request.model", request.TargetModel))
@@ -245,7 +245,7 @@ func (p *Producer) Produce(ctx context.Context,
 return fmt.Errorf("failed to compute block keys: %w", err)
 }
 if len(perPromptKeys) == 0 {
-span.SetAttributes(attribute.String("llm_d.producer.result", "skipped_no_tokens"))
+span.SetAttributes(attribute.String("llm_d.epp.producer.result", "skipped_no_tokens"))
 return nil
 }
251251

@@ -310,8 +310,8 @@ func (p *Producer) produceFromBlockKeys(ctx context.Context, span trace.Span,
 }

 span.SetAttributes(
-attribute.Int("llm_d.producer.total_blocks", totalBlocks),
-attribute.Int("llm_d.producer.max_match_blocks", maxMatch),
+attribute.Int("llm_d.epp.producer.total_blocks", totalBlocks),
+attribute.Int("llm_d.epp.producer.max_match_blocks", maxMatch),
 )

 logger.V(logging.TRACE).Info("Produce completed",

pkg/epp/framework/plugins/scheduling/profilehandler/disagg/disagg_profile_handler.go

Lines changed: 10 additions & 10 deletions
@@ -271,7 +271,7 @@ func (h *Handler) Pick(ctx context.Context, request *scheduling.InferenceRequest
 defer span.End()

 if request == nil {
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "complete_nil_request"))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "complete_nil_request"))
 return map[string]scheduling.SchedulerProfile{}
 }

@@ -284,18 +284,18 @@ func (h *Handler) Pick(ctx context.Context, request *scheduling.InferenceRequest
 if _, executed := profileResults[h.decodeProfile]; !executed {
 decodeProfile, ok := profiles[h.decodeProfile]
 if !ok {
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "error_missing_decode_profile"))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "error_missing_decode_profile"))
 return map[string]scheduling.SchedulerProfile{}
 }
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "run_decode"))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "run_decode"))
 return map[string]scheduling.SchedulerProfile{h.decodeProfile: decodeProfile}
 }

 decodeRes := profileResults[h.decodeProfile]
 if decodeRes == nil || len(decodeRes.TargetEndpoints) == 0 {
 span.SetAttributes(
-attribute.String("llm_d.profile_handler.decision", "complete"),
-attribute.Bool("llm_d.profile_handler.decode_failed", true),
+attribute.String("llm_d.epp.profile_handler.decision", "complete"),
+attribute.Bool("llm_d.epp.profile_handler.decode_failed", true),
 )
 return map[string]scheduling.SchedulerProfile{}
 }
@@ -304,25 +304,25 @@ func (h *Handler) Pick(ctx context.Context, request *scheduling.InferenceRequest
 if _, hasEncodeProfile := profiles[h.encodeProfile]; hasEncodeProfile {
 if _, executed := profileResults[h.encodeProfile]; !executed {
 if h.encodeDecider != nil && h.encodeDecider.disaggregate(ctx, request, decodeRes.TargetEndpoints[0]) {
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "run_encode"))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "run_encode"))
 return map[string]scheduling.SchedulerProfile{h.encodeProfile: profiles[h.encodeProfile]}
 }
 // Decider rejected encode - mark as evaluated so we don't re-run the decider.
 profileResults[h.encodeProfile] = nil
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "skip_encode"))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "skip_encode"))
 }
 }

 // ── Stage 3: Prefill (optional) ────────────────────────────────────────
 if _, hasPrefillProfile := profiles[h.prefillProfile]; hasPrefillProfile {
 if _, executed := profileResults[h.prefillProfile]; !executed {
 if h.pdDecider != nil && h.pdDecider.disaggregate(ctx, request, decodeRes.TargetEndpoints[0]) {
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "run_prefill"))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "run_prefill"))
 return map[string]scheduling.SchedulerProfile{h.prefillProfile: profiles[h.prefillProfile]}
 }
 // Decider rejected prefill - mark as evaluated so we don't re-run the decider.
 profileResults[h.prefillProfile] = nil
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "skip_prefill"))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "skip_prefill"))
 }
 }
328328

@@ -332,7 +332,7 @@ func (h *Handler) Pick(ctx context.Context, request *scheduling.InferenceRequest

 decision := DisaggDecisionType(encodeUsed, prefillUsed)
 RecordDisaggDecision(h.typedName.Name, h.typedName.Type, request.TargetModel, decision)
-span.SetAttributes(attribute.String("llm_d.profile_handler.decision", "complete_"+decision))
+span.SetAttributes(attribute.String("llm_d.epp.profile_handler.decision", "complete_"+decision))

 return map[string]scheduling.SchedulerProfile{}
 }

pkg/epp/framework/plugins/scheduling/profilehandler/disagg/pd_profile_handler.go

Lines changed: 9 additions & 9 deletions
@@ -167,8 +167,8 @@ func (h *PdProfileHandler) Pick(ctx context.Context, request *scheduling.Inferen

 // Set initial attributes
 span.SetAttributes(
-attribute.Int("llm_d.profile_handler.total_profiles", len(profiles)),
-attribute.Int("llm_d.profile_handler.executed_profiles", len(profileResults)),
+attribute.Int("llm_d.epp.profile_handler.total_profiles", len(profiles)),
+attribute.Int("llm_d.epp.profile_handler.executed_profiles", len(profileResults)),
 )

 // Set optional request attributes if request is not nil
@@ -184,8 +184,8 @@ func (h *PdProfileHandler) Pick(ctx context.Context, request *scheduling.Inferen
 if _, executed := profileResults[h.decodeProfile]; !executed {
 // if decode profile was not executed yet, first let the scheduler run the decode profile
 span.SetAttributes(
-attribute.String("llm_d.profile_handler.decision", "run_decode"),
-attribute.String("llm_d.profile_handler.selected_profile", h.decodeProfile),
+attribute.String("llm_d.epp.profile_handler.decision", "run_decode"),
+attribute.String("llm_d.epp.profile_handler.selected_profile", h.decodeProfile),
 )
 return map[string]scheduling.SchedulerProfile{
 h.decodeProfile: profiles[h.decodeProfile],
@@ -197,8 +197,8 @@ func (h *PdProfileHandler) Pick(ctx context.Context, request *scheduling.Inferen
 // check if all configured profiles have been executed, or if decode failed, no need to run more profiles.
 if len(profiles) == len(profileResults) || profileResults[h.decodeProfile] == nil {
 span.SetAttributes(
-attribute.String("llm_d.profile_handler.decision", "complete"),
-attribute.Bool("llm_d.profile_handler.decode_failed", profileResults[h.decodeProfile] == nil),
+attribute.String("llm_d.epp.profile_handler.decision", "complete"),
+attribute.Bool("llm_d.epp.profile_handler.decode_failed", profileResults[h.decodeProfile] == nil),
 )
 return map[string]scheduling.SchedulerProfile{}
 }
@@ -207,8 +207,8 @@ func (h *PdProfileHandler) Pick(ctx context.Context, request *scheduling.Inferen
 RecordPDDecision(h.typedName.Name, h.typedName.Type, request.TargetModel, DecisionTypePrefillDecode) //nolint:staticcheck // intentional: pd-profile-handler is itself deprecated
 // run the prefill profile
 span.SetAttributes(
-attribute.String("llm_d.profile_handler.decision", "prefill_decode"),
-attribute.String("llm_d.profile_handler.selected_profile", h.prefillProfile),
+attribute.String("llm_d.epp.profile_handler.decision", "prefill_decode"),
+attribute.String("llm_d.epp.profile_handler.selected_profile", h.prefillProfile),
 )
 return map[string]scheduling.SchedulerProfile{
 h.prefillProfile: profiles[h.prefillProfile],
@@ -217,7 +217,7 @@ func (h *PdProfileHandler) Pick(ctx context.Context, request *scheduling.Inferen

 RecordPDDecision(h.typedName.Name, h.typedName.Type, request.TargetModel, DecisionTypeDecodeOnly) //nolint:staticcheck // intentional: pd-profile-handler is itself deprecated
 span.SetAttributes(
-attribute.String("llm_d.profile_handler.decision", "decode_only"),
+attribute.String("llm_d.epp.profile_handler.decision", "decode_only"),
 )
 return map[string]scheduling.SchedulerProfile{} // do not run prefill
 }
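
Reviewer note (not part of the commit): the three Go files above move span attribute keys from the `llm_d.producer.*` and `llm_d.profile_handler.*` prefixes to `llm_d.epp.producer.*` and `llm_d.epp.profile_handler.*`. Any trace queries or dashboards keyed on the old names would presumably need the same rename. The following is a minimal sketch for spotting leftovers in Go sources. The function name `find_old_span_keys` is hypothetical, and the pattern only covers the two prefixes visible in this diff.

```shell
# Hypothetical helper; not part of llm-d-router.
# Lists lines in *.go files under directory $1 that still use the
# pre-rename attribute prefixes. Prints nothing when none remain.
find_old_span_keys() {
  # /dev/null forces grep to print file names even for a single file.
  find "$1" -type f -name '*.go' \
    -exec grep -nE '"llm_d\.(producer|profile_handler)\.' /dev/null {} + \
    2>/dev/null || true
}
```

For example, `find_old_span_keys pkg/epp` could be run from the repository root after the merge; the `pkg/epp` path is taken from the file paths in this diff.
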
