You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
-52Lines changed: 0 additions & 52 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -129,58 +129,6 @@ Notes and assumptions
129
129
- EPP implementation forwards incoming request headers per the Gateway API specification for endpoint selection context.
130
130
- BBR implementation processes request bodies directly for model detection without external communication.
131
131
132
-
133
-
Future Enhancements / TODO
134
-
---------------------------
135
-
136
-
### KV Prefix Caching Support
137
-
138
-
To support KV prefix caching as described in the [Gateway API Inference Extension prefix-aware EPP configuration](https://gateway-api-inference-extension.sigs.k8s.io/guides/epp-configuration/prefix-aware/), the EPP service needs access to request bodies to compute prefix hashes.
139
-
140
-
**Background**:
141
-
The reference EPP implementation's [prefix cache plugin](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins/multi/prefix/plugin.go) extracts prompts directly from `request.Body` (via `getUserInputBytes()`) to compute prefix hashes for optimal routing. This means EPP services expect to receive the full request body via the Envoy ext-proc protocol.
0 commit comments