Commit 8263296

Move TODO to GH issue
1 parent 6e21dd9 commit 8263296

1 file changed, +0 -52 lines changed

README.md

Lines changed: 0 additions & 52 deletions
@@ -129,58 +129,6 @@ Notes and assumptions
 - EPP implementation forwards incoming request headers per the Gateway API specification for endpoint selection context.
 - BBR implementation processes request bodies directly for model detection without external communication.

-
-Future Enhancements / TODO
----------------------------
-
-### KV Prefix Caching Support
-
-To support KV prefix caching as described in the [Gateway API Inference Extension prefix-aware EPP configuration](https://gateway-api-inference-extension.sigs.k8s.io/guides/epp-configuration/prefix-aware/), the EPP service needs access to request bodies to compute prefix hashes.
-
-**Background**:
-The reference EPP implementation's [prefix cache plugin](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins/multi/prefix/plugin.go) extracts prompts directly from `request.Body` (via `getUserInputBytes()`) to compute prefix hashes for optimal routing. This means EPP services expect to receive the full request body via the Envoy ext-proc protocol.
-
-**Current State**:
-- EPP uses **headers-only mode** (`end_of_stream: true`, `request_body_mode: None`)
-- BBR uses `ngx_http_read_client_request_body()` to buffer the body into nginx's internal buffer chain
-- BBR's `read_request_body()` function extracts data from nginx buffers (memory + file-backed)
-
-**Implementation Strategy**:
-1. **Single Body Read**: Call `ngx_http_read_client_request_body()` once (coordinate between BBR and EPP)
-2. **Shared Access**: Both modules read from nginx's `request_body->bufs` chain
-3. **Body-Aware gRPC**: EPP switches to body-aware mode following Envoy ext-proc protocol:
-   - Send `ProcessingRequest` with `RequestHeaders` (set `end_of_stream: false`)
-   - Send `ProcessingRequest` with `RequestBody` containing the full body
-   - Set `request_body_mode: BodySendMode::Buffered`
-
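The body-aware message sequence in the removed strategy can be sketched roughly as follows. This is a minimal illustration, not the module's code: `ProcessingRequest` here is a hypothetical, simplified stand-in for the Envoy ext-proc protobuf message, `build_ext_proc_messages` is an invented helper name, and marking the single body message with `end_of_stream: true` is an assumption about how a fully buffered body would be sent.

```rust
/// Simplified, hypothetical stand-in for the Envoy ext-proc
/// `ProcessingRequest` message (the real one is a protobuf type).
#[derive(Debug, Clone, PartialEq)]
pub enum ProcessingRequest {
    RequestHeaders {
        headers: Vec<(String, String)>,
        end_of_stream: bool,
    },
    RequestBody {
        body: Vec<u8>,
        end_of_stream: bool,
    },
}

/// Builds the messages sent for one request.
/// `body` is `None` in headers-only mode and `Some(..)` in body-aware mode.
pub fn build_ext_proc_messages(
    headers: &[(String, String)],
    body: Option<&[u8]>,
) -> Vec<ProcessingRequest> {
    let mut messages = Vec::with_capacity(2);
    messages.push(ProcessingRequest::RequestHeaders {
        headers: headers.to_vec(),
        // The headers message ends the stream only when no body follows.
        end_of_stream: body.is_none(),
    });
    if let Some(bytes) = body {
        messages.push(ProcessingRequest::RequestBody {
            body: bytes.to_vec(),
            // Assumption: the whole buffered body goes out in one message.
            end_of_stream: true,
        });
    }
    messages
}
```

Calling the helper with `None` yields the single headers message of the current headers-only mode, so the same code path could serve both modes.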
-**Execution Flow**:
-```
-1. Request arrives
-2. If BBR or EPP needs body: Call ngx_http_read_client_request_body() once
-3. Body buffered by nginx into buffer chain (memory + file if large)
-4. If BBR enabled: Read from nginx buffers, extract model, set header
-5. If EPP enabled: Read from same nginx buffers, send via gRPC body message
-6. Continue to upstream proxy
-```
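The "read once, use twice" coordination in this flow could look something like the sketch below. Everything in it is hypothetical: `RequestState`, `body_once`, and `handle_request` are illustrative names, and the `read` closure stands in for the call to `ngx_http_read_client_request_body()` plus the copy out of nginx's buffers. The model extraction and gRPC send are reduced to recording which stage saw how many bytes.

```rust
/// Hypothetical per-request state; a real module would keep this in its
/// nginx request context.
#[derive(Debug, Default)]
pub struct RequestState {
    body: Option<Vec<u8>>,
    /// How many times the body was actually read from the client.
    pub body_reads: u32,
    /// Stages that ran, with the body length each one saw.
    pub steps: Vec<(&'static str, usize)>,
}

impl RequestState {
    /// Buffers the body on first use and returns the cached bytes afterwards.
    fn body_once<R: FnOnce() -> Vec<u8>>(&mut self, read: R) -> &[u8] {
        if self.body.is_none() {
            self.body_reads += 1;
            self.body = Some(read());
        }
        self.body.as_deref().unwrap_or_default()
    }
}

/// Runs steps 2 to 6 of the flow for one request.
pub fn handle_request<F: FnMut() -> Vec<u8>>(
    state: &mut RequestState,
    bbr_enabled: bool,
    epp_enabled: bool,
    mut read: F,
) {
    if bbr_enabled {
        // Placeholder for: extract model from the body, set header.
        let seen = state.body_once(&mut read).len();
        state.steps.push(("bbr", seen));
    }
    if epp_enabled {
        // Placeholder for: send the same bytes in a gRPC body message.
        let seen = state.body_once(&mut read).len();
        state.steps.push(("epp", seen));
    }
    state.steps.push(("upstream", 0));
}
```

With both modules enabled the closure runs exactly once, and with neither enabled it never runs, which matches the headers-only default.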
-
-**Configuration Options to Add**:
-- `inference_epp_send_body on|off` - enable body-aware mode (default: off for backward compatibility)
-- `inference_epp_body_max_size` - maximum body size for EPP (default: 10MB, same as BBR)
-- Consider unified `inference_max_body_size` if both BBR and EPP are commonly used together
-
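One way the two proposed directives might map onto module configuration is sketched here. The struct and function names (`EppBodyConfig`, `parse_flag`, `parse_size`) are hypothetical, the 10MB default is interpreted as 10 * 1024 * 1024 bytes, and the `k`/`m` suffix handling mimics nginx's size syntax by hand. A real nginx module would normally rely on nginx's own directive parsing instead.

```rust
/// Hypothetical configuration for the proposed EPP body directives.
#[derive(Debug, Clone, PartialEq)]
pub struct EppBodyConfig {
    /// `inference_epp_send_body on|off`
    pub send_body: bool,
    /// `inference_epp_body_max_size`, in bytes
    pub body_max_size: usize,
}

impl Default for EppBodyConfig {
    fn default() -> Self {
        Self {
            // Headers-only stays the default for backward compatibility.
            send_body: false,
            // Assumption: "10MB" means 10 * 1024 * 1024 bytes.
            body_max_size: 10 * 1024 * 1024,
        }
    }
}

/// Parses an `on|off` directive argument.
pub fn parse_flag(value: &str) -> Result<bool, String> {
    match value {
        "on" => Ok(true),
        "off" => Ok(false),
        other => Err(format!("expected \"on\" or \"off\", got \"{other}\"")),
    }
}

/// Parses a size such as `4096`, `512k`, or `10m` into bytes.
pub fn parse_size(value: &str) -> Result<usize, String> {
    let (digits, multiplier): (&str, usize) = match value.chars().last() {
        Some('k') | Some('K') => (&value[..value.len() - 1], 1024),
        Some('m') | Some('M') => (&value[..value.len() - 1], 1024 * 1024),
        _ => (value, 1),
    };
    let number: usize = digits
        .parse()
        .map_err(|_| format!("invalid size \"{value}\""))?;
    number
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size \"{value}\" overflows"))
}
```

A unified `inference_max_body_size` would reduce to sharing the single `body_max_size` value between BBR and EPP.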
-**Implementation Details**:
-- Refactor `read_request_body()` into a shared utility function in `src/modules/mod.rs`
-- Both BBR and EPP call the same body reading function
-- Handle execution order: if both enabled, read body once, process for both
-- EPP sends body in gRPC `RequestBody` message (standard Envoy ext-proc protocol)
-- Maintain backward compatibility: headers-only remains default
-
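A shared `read_request_body()` utility along these lines might walk a chain of memory and file-backed buffers while enforcing a size limit. The sketch below does not use the nginx bindings: `BodyBuf` is a hypothetical model of one link in `request_body->bufs`, any `Read` implementation stands in for a temp-file buffer, and the signature is an assumption rather than the existing function's.

```rust
use std::io::{self, Read};

/// Hypothetical model of one link in nginx's `request_body->bufs` chain.
pub enum BodyBuf {
    /// Data held in memory.
    Memory(Vec<u8>),
    /// Data nginx spilled to a temp file; any reader models that here.
    File(Box<dyn Read>),
}

/// Concatenates the chain into one contiguous body, rejecting bodies
/// larger than `max_size` bytes.
pub fn read_request_body(chain: Vec<BodyBuf>, max_size: usize) -> io::Result<Vec<u8>> {
    let mut body = Vec::new();
    for buf in chain {
        match buf {
            BodyBuf::Memory(bytes) => body.extend_from_slice(&bytes),
            BodyBuf::File(reader) => {
                // Read at most one byte past the remaining budget, so an
                // oversized file is detected without loading all of it.
                let budget = (max_size.saturating_sub(body.len()) as u64).saturating_add(1);
                reader.take(budget).read_to_end(&mut body)?;
            }
        }
        if body.len() > max_size {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "request body exceeds configured maximum size",
            ));
        }
    }
    Ok(body)
}
```

Both BBR and EPP could then borrow the returned bytes, keeping the body read in one place.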
-**Trade-offs**:
-- Higher latency: Must buffer entire body before routing decision (vs headers-only)
-- Required for prefix caching: This is the standard protocol, not a design choice
-- Memory efficient: No double buffering, shared nginx buffer chain
-
-
 Testing
 -------
