1. Push your release branch to the llm-d-router remote.
@@ -79,13 +79,13 @@ This document defines the process for releasing llm-d-router.
For a release candidate:
```shell
-git tag -s -a v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} -m 'llm-d-router v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} Release Candidate'
+git tag -s -a v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} -m "llm-d-router v${MAJOR}.${MINOR}.${PATCH}-rc.${RC} Release Candidate"
```
For a major, minor or patch release:
```shell
-git tag -s -a v${MAJOR}.${MINOR}.${PATCH} -m 'llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release'
+git tag -s -a v${MAJOR}.${MINOR}.${PATCH} -m "llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release"
```
1. Push the tag to the llm-d-router repo.
@@ -102,16 +102,17 @@ This document defines the process for releasing llm-d-router.
git push ${REMOTE} v${MAJOR}.${MINOR}.${PATCH}
```
-1. Pushing the tag triggers CI action to build and publish the [EPP image] and [sidecar image] to the [ghcr registry].
-1. Test the steps in the tagged quickstart guide after the PR merges. TODO add e2e tests!<!-- link to an e2e tests once we have such one -->
+1. Pushing the tag triggers a CI action that builds and publishes the EPP image (`ghcr.io/llm-d/llm-d-router-endpoint-picker`) and sidecar image (`ghcr.io/llm-d/llm-d-router-disagg-sidecar`) to the [ghcr registry].
+1. Verify the [CI release workflow] completed successfully before proceeding.
+1. Test the steps in the tagged quickstart guide after the PR merges.
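
The quoting change in the `git tag` commands above affects the tag message itself: in POSIX shells, single quotes suppress parameter expansion, so the old form would record the literal text `${MAJOR}` in the message. A minimal sketch of the difference, using hypothetical version values that do not refer to a real release:

```shell
# Hypothetical version components, for illustration only.
MAJOR=0
MINOR=1
PATCH=0

# Single quotes keep the placeholders literal.
single_quoted='llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release'
# Double quotes expand them before git ever sees the message.
double_quoted="llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release"

echo "$single_quoted"   # llm-d-router v${MAJOR}.${MINOR}.${PATCH} Release
echo "$double_quoted"   # llm-d-router v0.1.0 Release
```

After tagging, listing the tag with its annotation (for example `git tag -l -n1`) is one way to confirm the message was expanded as intended.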
### Create the release!
1. Create a [new release]:
1. Choose the tag that you created for the release.
-1. Use the tag as the release title, i.e.`v0.1.0` refer to previous release for the content of the release body.
+1. Use the tag as the release title, e.g. `v0.1.0`.
1. Click "Generate release notes" and preview the release body.
-1. Go to Gateway Inference Extension latest release and make sure to include the highlights in llm-d-router as well.
+1. Ensure the release body includes: highlights, breaking changes (if any), known issues, and upgrade steps.
1. If this is a release candidate, select the "This is a pre-release" checkbox.
1. If you find any bugs in this process, create an [issue].
@@ -131,7 +132,6 @@ Use the following steps to announce the release.
File: pkg/epp/framework/plugins/scheduling/filter/sessionaffinity/README.md (25 additions, 1 deletion)
@@ -7,17 +7,41 @@ Pins subsequent requests in a session to the same pod the first request was sent
The session is carried in a request header whose value is the base64-encoded `namespace/name` of the previously selected pod. As a [`ResponseHeaderProcessor`](../../../../interface/requestcontrol/plugins.go), the filter writes that same header on the response so the client can echo it back on the next request.
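
As a rough illustration of the token format described above, the sketch below encodes and decodes a pod reference. The pod name is hypothetical, and the use of the standard base64 alphabet is an assumption; the filter's actual encoding variant is not stated here.

```shell
# Hypothetical pod identity; the real value is the pod the scheduler selected.
pod_ref="default/vllm-decode-0"

# Encode without a trailing newline, as a header value would carry it.
# Assumption: standard (not URL-safe) base64.
token="$(printf '%s' "$pod_ref" | base64)"

# Decode the value a client would echo back on its next request.
decoded="$(printf '%s' "$token" | base64 --decode)"

echo "x-session-token: $token"
echo "decoded pod: $decoded"
```

The client never needs to interpret the token; it only has to return the header unchanged.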
## Parameters
-
+
| Name | Type | Default | Description |
|---|---|---|---|
|`headerName`| string |`x-session-token`| Request and response header carrying the session token. When set, only this header is read; the default is ignored. |
+|`profileName`| string || The name of the profile this instance is associated with. When set (e.g. `prefill`), the plugin looks up the target pod from the results of that profile in `SchedulingResult` during the response received phase. When empty, it defaults to the primary (decode) pod. |
To support session affinity with PD disaggregation, configure two separate instances of the filter: one for decode and one for prefill.
+
+```yaml
+# Instance for the decode profile (pins decode requests)
+- name: session-affinity-decode
+  type: session-affinity-filter
+  parameters:
+    headerName: x-session-token
+
+# Instance for the prefill profile (pins prefill requests)
+- name: session-affinity-prefill
+  type: session-affinity-filter
+  parameters:
+    headerName: x-session-token-prefill
+    profileName: prefill
+```
+
+The decode instance uses the default behavior (writing the decode pod to `x-session-token`). The prefill instance uses `profileName: prefill` to look up the prefill pod from the scheduling results and write it to `x-session-token-prefill`. This ensures that subsequent requests in the same session target both the same prefill pod and the same decode pod.
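
From the client's side, the two-instance setup means two independent tokens to carry forward. The sketch below simulates captured response headers and reads each token by exact header name, so the prefill header is never mistaken for the decode one. Pod names are hypothetical, and standard base64 is assumed.

```shell
# Hypothetical pods; real tokens are written by the filter instances, not the client.
decode_token="$(printf '%s' 'default/decode-pod-0' | base64)"
prefill_token="$(printf '%s' 'default/prefill-pod-0' | base64)"

# Simulated response headers, as a client might capture them.
response_headers="$(printf 'x-session-token: %s\r\nx-session-token-prefill: %s\r\n' \
  "$decode_token" "$prefill_token")"

# Print the value of the header whose lowercased name matches $1 exactly.
header_value() {
  printf '%s\n' "$response_headers" | tr -d '\r' |
    awk -v name="$1" -F': ' 'tolower($1) == name { print $2 }'
}

echo "decode pod:  $(header_value x-session-token | base64 --decode)"
echo "prefill pod: $(header_value x-session-token-prefill | base64 --decode)"
```

A client would send both values back unchanged under the same header names on its next request.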
+
## Relationship to the session affinity scorer
The [session affinity scorer](../../scorer/sessionaffinity/README.md) (`session-affinity-scorer`) provides the same affinity behavior as a soft preference and writes the same response header.