Move TODO to GH issue

cvandesande · cvandesande · commit 8263296cf12f · 2025-12-07T18:59:36.000Z
diff --git a/README.md b/README.md
@@ -129,58 +129,6 @@ Notes and assumptions
   - EPP implementation forwards incoming request headers per the Gateway API specification for endpoint selection context.
   - BBR implementation processes request bodies directly for model detection without external communication.
 
-
-Future Enhancements / TODO
----------------------------
-
-### KV Prefix Caching Support
-
-To support KV prefix caching as described in the [Gateway API Inference Extension prefix-aware EPP configuration](https://gateway-api-inference-extension.sigs.k8s.io/guides/epp-configuration/prefix-aware/), the EPP service needs access to request bodies to compute prefix hashes.
-
-**Background**:
-The reference EPP implementation's [prefix cache plugin](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins/multi/prefix/plugin.go) extracts prompts directly from `request.Body` (via `getUserInputBytes()`) to compute prefix hashes for optimal routing. This means EPP services expect to receive the full request body via the Envoy ext-proc protocol.
-
-**Current State**:
-- EPP uses **headers-only mode** (`end_of_stream: true`, `request_body_mode: None`)
-- BBR uses `ngx_http_read_client_request_body()` to buffer the body into nginx's internal buffer chain
-- BBR's `read_request_body()` function extracts data from nginx buffers (memory + file-backed)
-
-**Implementation Strategy**:
-1. **Single Body Read**: Call `ngx_http_read_client_request_body()` once (coordinate between BBR and EPP)
-2. **Shared Access**: Both modules read from nginx's `request_body->bufs` chain
-3. **Body-Aware gRPC**: EPP switches to body-aware mode following Envoy ext-proc protocol:
-   - Send `ProcessingRequest` with `RequestHeaders` (set `end_of_stream: false`)
-   - Send `ProcessingRequest` with `RequestBody` containing the full body
-   - Set `request_body_mode: BodySendMode::Buffered`
-
-**Execution Flow**:
-```
-1. Request arrives
-2. If BBR or EPP needs body: Call ngx_http_read_client_request_body() once
-3. Body buffered by nginx into buffer chain (memory + file if large)
-4. If BBR enabled: Read from nginx buffers, extract model, set header
-5. If EPP enabled: Read from same nginx buffers, send via gRPC body message
-6. Continue to upstream proxy
-```
-
-**Configuration Options to Add**:
-- `inference_epp_send_body on|off` - enable body-aware mode (default: off for backward compatibility)
-- `inference_epp_body_max_size` - maximum body size for EPP (default: 10MB, same as BBR)
-- Consider unified `inference_max_body_size` if both BBR and EPP are commonly used together
-
-**Implementation Details**:
-- Refactor `read_request_body()` into a shared utility function in `src/modules/mod.rs`
-- Both BBR and EPP call the same body reading function
-- Handle execution order: if both enabled, read body once, process for both
-- EPP sends body in gRPC `RequestBody` message (standard Envoy ext-proc protocol)
-- Maintain backward compatibility: headers-only remains default
-
-**Trade-offs**:
-- Higher latency: Must buffer entire body before routing decision (vs headers-only)
-- Required for prefix caching: This is the standard protocol, not a design choice
-- Memory efficient: No double buffering, shared nginx buffer chain
-
-
 Testing
 -------
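The removed TODO describes a single-read flow: read the body once, then let BBR and EPP both consume the same nginx buffer chain. That flow can be sketched in Rust as below. This is a minimal std-only sketch, not the module's code. `BodyBuf`, `SharedBody`, `bbr_step` and `epp_step` are hypothetical names. The buffer chain is modeled as a list of in-memory or file-backed parts rather than real `request_body->bufs` links, and the BBR step only checks for a `"model"` key instead of parsing JSON.

```rust
use std::cell::{Cell, OnceCell};
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for one link of nginx's `request_body->bufs` chain:
/// a buffer is either held in memory or spilled to a temporary file.
enum BodyBuf {
    Memory(Vec<u8>),
    File(PathBuf),
}

/// Hypothetical per-request body holder shared by BBR and EPP.
struct SharedBody {
    chain: Vec<BodyBuf>,
    max_size: usize,
    cached: OnceCell<Vec<u8>>,
    /// How many times the buffer chain was actually walked.
    reads: Cell<u32>,
}

impl SharedBody {
    fn new(chain: Vec<BodyBuf>, max_size: usize) -> Self {
        SharedBody { chain, max_size, cached: OnceCell::new(), reads: Cell::new(0) }
    }

    /// Returns the full body, walking the chain only on the first successful call.
    fn bytes(&self) -> io::Result<&[u8]> {
        if let Some(cached) = self.cached.get() {
            return Ok(cached.as_slice());
        }
        self.reads.set(self.reads.get() + 1);
        let mut out = Vec::new();
        for buf in &self.chain {
            match buf {
                BodyBuf::Memory(data) => out.extend_from_slice(data),
                BodyBuf::File(path) => out.extend_from_slice(&fs::read(path)?),
            }
            if out.len() > self.max_size {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "request body exceeds max size"));
            }
        }
        Ok(self.cached.get_or_init(|| out).as_slice())
    }
}

/// Hypothetical BBR step: only detects a "model" key; real code would parse JSON.
fn bbr_step(body: &SharedBody) -> io::Result<bool> {
    let bytes = body.bytes()?;
    Ok(bytes.windows(7).any(|w| w == b"\"model\""))
}

/// Hypothetical EPP step: returns the payload for the ext-proc `RequestBody` message.
fn epp_step(body: &SharedBody) -> io::Result<Vec<u8>> {
    Ok(body.bytes()?.to_vec())
}

fn main() -> io::Result<()> {
    // Simulate a body that nginx split across memory and a temp file.
    let path = std::env::temp_dir().join("shared_body_demo.part");
    fs::write(&path, b"\"prompt\": \"hi\"}")?;
    let body = SharedBody::new(
        vec![BodyBuf::Memory(b"{\"model\": \"llama\", ".to_vec()), BodyBuf::File(path.clone())],
        10 * 1024 * 1024, // mirrors the 10MB default named in the TODO
    );
    let has_model = bbr_step(&body)?; // BBR walks the chain...
    let payload = epp_step(&body)?; // ...EPP reuses the cached bytes
    fs::remove_file(&path)?;
    println!("model key: {has_model}, EPP payload: {} bytes, chain walked: {} time(s)", payload.len(), body.reads.get());
    Ok(())
}
```

Caching the flattened bytes in a `OnceCell` keeps the chain walk to a single pass no matter which module asks first. Flattening still copies the body once; reading directly from nginx's buffers, as the TODO proposes, would avoid that copy.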
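The removed TODO contrasts the current headers-only EPP exchange with a body-aware one. In the body-aware exchange, the headers message sets `end_of_stream: false` and a `RequestBody` message with the full body follows. The difference in message sequence can be illustrated with simplified types. The enums below are hypothetical stand-ins, not the protobuf-generated Envoy `ext_proc` types, and `epp_messages` is an illustrative helper; a real client would send these messages over the gRPC stream.

```rust
/// Hypothetical mirror of the two `request_body_mode` values named in the TODO.
#[derive(Debug, PartialEq)]
enum BodySendMode {
    None,
    Buffered,
}

/// Simplified, hypothetical mirror of the ext-proc `ProcessingRequest` variants used here.
#[derive(Debug, PartialEq)]
enum ProcessingRequest {
    RequestHeaders { headers: Vec<(String, String)>, end_of_stream: bool },
    RequestBody { body: Vec<u8>, end_of_stream: bool },
}

/// Builds the message sequence the EPP client would send for one request.
fn epp_messages(headers: &[(&str, &str)], body: &[u8], mode: BodySendMode) -> Vec<ProcessingRequest> {
    let headers: Vec<(String, String)> =
        headers.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect();
    match mode {
        // Headers-only: the headers message closes the stream.
        BodySendMode::None => vec![ProcessingRequest::RequestHeaders { headers, end_of_stream: true }],
        // Body-aware: headers keep the stream open, then the full body closes it.
        BodySendMode::Buffered => vec![
            ProcessingRequest::RequestHeaders { headers, end_of_stream: false },
            ProcessingRequest::RequestBody { body: body.to_vec(), end_of_stream: true },
        ],
    }
}

fn main() {
    let headers = [("content-type", "application/json")];
    let body = b"{\"model\": \"llama\", \"prompt\": \"hi\"}";
    for mode in [BodySendMode::None, BodySendMode::Buffered] {
        println!("{mode:?}:");
        for msg in epp_messages(&headers, body, mode) {
            println!("  {msg:?}");
        }
    }
}
```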