Commit 5eeec80

fix e2e tests
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
1 parent 0cf794a commit 5eeec80

4 files changed, 35 additions and 17 deletions

docs/architecture.md

Lines changed: 19 additions & 9 deletions
@@ -208,27 +208,37 @@ Selects the profiles to use when running with disaggregated prefill/decode
 - **Parameters**:
   - `decodeProfile`: specifies the name of the profile used for the decode scheduling. Only needed if the decode profile is not named `decode`.
   - `prefillProfile`: specifies the name of the profile used for the prefill scheduling. Only needed if the prefill profile is not named `prefill`.
-  - `decider`: specifies the name of the decider, which determines whether disaggregated PD should be executed
-    - `name`: decider name, currently supported values are: "prefix-based-disaggregation-decider" and "always-disaggregated-decider"
-    - `parameters`: parameters for this specific decider type
+  - `deciderPluginName`: specifies the name of the decider plugin. Decider determines whether disaggregated PD should be executed
   - `primaryPort`: the base port number used for data parallel communication.

 **Note:** When using this plugin you must also have a PrefixCachePlugin configured in the prefill and decode scheduling profiles.

-**Parameters for `prefix-based-disaggregation-decider`**
+---
+
+#### Prefix Based Decider Plugin
+
+Type: `prefix-based-pd-decider`
+
+**Parameters**
 - `nonCachedTokens`: length, in token, of the uncached part of the user input above which disaggregated PD is triggered.
-- `pluginName`: the prefix plugin name. Optional, required when overriding the default plugin name.
+
+Note: `prepareDataPlugins` feature gate should be enabled

 **Example**
 ```yaml
+kind: EndpointPickerConfig
+featureGates:
+- prepareDataPlugins
+plugins:
+- type: prefix-based-pd-decider
+  parameters:
+    nonCachedTokens: 4
 - type: pd-profile-handler
   parameters:
     primaryPort: 8000
-    decider:
-      name: prefix-based-disaggregation-decider
-      parameters:
-        nonCachedTokens: 10
+    deciderPluginName: prefix-based-pd-decider
 ```
+
 ---

 #### ByLabelSelector
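
The `nonCachedTokens` description in the hunk above can be read as a simple threshold on the uncached part of the prompt. Below is a minimal sketch of that reading, not the plugin's actual code: `shouldDisaggregate` and its arguments are hypothetical names, and the strict "greater than" comparison at the boundary is an assumption taken from the word "above" in the documentation.

```go
package main

import "fmt"

// shouldDisaggregate is a hypothetical illustration of the documented rule:
// disaggregated prefill/decode is triggered when the uncached part of the
// input is longer than nonCachedTokens. The strict ">" is an assumption.
func shouldDisaggregate(inputTokens, cachedTokens, nonCachedTokens int) bool {
	uncached := inputTokens - cachedTokens
	if uncached < 0 {
		// Defensive clamp: a cache hit cannot cover more than the input.
		uncached = 0
	}
	return uncached > nonCachedTokens
}

func main() {
	// 12-token input, 8 tokens already cached: 4 uncached, not above 4.
	fmt.Println(shouldDisaggregate(12, 8, 4)) // false
	// 12-token input, 4 tokens already cached: 8 uncached, above 4.
	fmt.Println(shouldDisaggregate(12, 4, 4)) // true
}
```

Under this reading, lowering `nonCachedTokens` sends more requests through the prefill profile, while a warm prefix cache pushes requests toward decode-only.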

docs/disagg_pd.md

Lines changed: 7 additions & 3 deletions
@@ -155,6 +155,8 @@ Below is a minimal `EndpointPickerConfig` that enables integration with workload
 ```yaml
 apiVersion: inference.networking.x-k8s.io/v1alpha1
 kind: EndpointPickerConfig
+featureGates:
+- prepareDataPlugins
 plugins:
 # Prefill selection: match Pods with label role=prefill
 - type: by-label
@@ -176,10 +178,12 @@ plugins:
     lruCapacityPerServer: 31250
 - type: max-score-picker
 - type: prefill-header-handler
-- type: pd-profile-handler
+- type: prefix-based-pd-decider
   parameters:
-    threshold: 0
-    hashBlockSize: 5
+    nonCachedTokens: 8
+- type: pd-profile-handler
+  parameters:
+    deciderPluginName: prefix-based-pd-decider
     primaryPort: 8000
 schedulingProfiles:
 - name: prefill
File renamed without changes.

test/e2e/e2e_test.go

Lines changed: 9 additions & 5 deletions
@@ -126,8 +126,8 @@ var _ = ginkgo.Describe("Run end to end tests", ginkgo.Ordered, func() {
 	labelFilter2 := fmt.Sprintf(`decision_type="decode-only",model_name="%s"`, modelName)
 	decodeOnlyCount := getCounterMetric(metricsURL, "llm_d_inference_scheduler_pd_decision_total", labelFilter2)

-	gomega.Expect(prefillDecodeCount).Should(gomega.Equal(6))
-	gomega.Expect(decodeOnlyCount).Should(gomega.Equal(0))
+	gomega.Expect(prefillDecodeCount).Should(gomega.Equal(4))
+	gomega.Expect(decodeOnlyCount).Should(gomega.Equal(2))

 	testutils.DeleteObjects(testConfig, epp)
 	testutils.DeleteObjects(testConfig, modelServers)
@@ -843,20 +843,24 @@ schedulingProfiles:
 // EPP configuration for running with P/D
 const pdConfig = `apiVersion: inference.networking.x-k8s.io/v1alpha1
 kind: EndpointPickerConfig
+featureGates:
+- prepareDataPlugins
 plugins:
 - type: prefill-header-handler
 - type: prefix-cache-scorer
   parameters:
-    blockSizeTokens: 10
+    blockSizeTokens: 4
     maxPrefixBlocksToMatch: 256
     lruCapacityPerServer: 256
 - type: prefill-filter
 - type: decode-filter
 - type: max-score-picker
+- type: prefix-based-pd-decider
+  parameters:
+    nonCachedTokens: 4
 - type: pd-profile-handler
   parameters:
-    hashBlockSize: 10
-    threshold: 40
+    deciderPluginName: prefix-based-pd-decider
 schedulingProfiles:
 - name: prefill
   plugins:
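
The updated assertions read the `llm_d_inference_scheduler_pd_decision_total` counter once per `decision_type` label. The repository's `getCounterMetric` helper is not part of this diff, so the following is only a sketch of how such a lookup might work against Prometheus text-format output. `counterValue` is a hypothetical name, the substring-based label match is a simplification, and the `prefill-decode` label value in the sample data is an assumption (only `decode-only` appears in the diff).

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// counterValue is a hypothetical helper, not the repository's getCounterMetric.
// It scans Prometheus text-format output and returns the value of the first
// sample whose name matches and whose label set contains labelFilter verbatim.
func counterValue(exposition, name, labelFilter string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		if !strings.HasPrefix(line, name+"{") || !strings.Contains(line, labelFilter) {
			continue
		}
		// The sample value is the first field after the closing brace.
		fields := strings.Fields(line[strings.LastIndex(line, "}")+1:])
		if len(fields) == 0 {
			return 0, fmt.Errorf("sample %q has no value", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, err
		}
		return int(v), nil
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("metric %s{%s} not found", name, labelFilter)
}

func main() {
	// Sample data shaped like the counts the updated test expects.
	// The "prefill-decode" label value is an assumption.
	sample := `# TYPE llm_d_inference_scheduler_pd_decision_total counter
llm_d_inference_scheduler_pd_decision_total{decision_type="prefill-decode",model_name="m"} 4
llm_d_inference_scheduler_pd_decision_total{decision_type="decode-only",model_name="m"} 2
`
	const metric = "llm_d_inference_scheduler_pd_decision_total"
	pd, err := counterValue(sample, metric, `decision_type="prefill-decode",model_name="m"`)
	if err != nil {
		panic(err)
	}
	dec, err := counterValue(sample, metric, `decision_type="decode-only",model_name="m"`)
	if err != nil {
		panic(err)
	}
	fmt.Println(pd, dec)
}
```

A substring match only works when the filter lists labels in the same order as the exposition output; a more robust helper would parse the label set before comparing.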
