- Example configurations: [docs/examples/README.md](docs/examples/README.md)
Current behavior and defaults
-----------------------------
@@ -22,23 +24,26 @@ Current behavior and defaults
- Directive `inference_bbr_max_body_size` sets maximum body size for BBR processing in bytes (default 10MB).
- Directive `inference_bbr_default_model` sets the default model value when no model is found in request body (default `unknown`).
- Hybrid memory/file support: small bodies stay in memory, large bodies are read from NGINX temporary files.
-
- Memory allocation is capped at 1MB regardless of body size to prevent excessive memory usage.
+
- Memory pre-allocation is capped at 1MB to avoid large upfront allocations. Actual in-memory accumulation may grow up to the configured `inference_bbr_max_body_size` limit; large payloads spill to disk and are read incrementally.
- Directive `inference_epp_endpoint` sets the gRPC endpoint for standard EPP ext-proc server communication.
- Directive `inference_epp_header_name` configures the upstream header name to read from EPP responses (default `X-Inference-Upstream`).
+
- Directive `inference_epp_timeout_ms` sets the gRPC timeout for EPP communication (default `200ms`).
- EPP follows the Gateway API Inference Extension specification: performs headers-only exchange, reads header mutations from responses, and sets the upstream header for endpoint selection.
-
- The `$inference_upstream` NGINX variable exposes the EPP-selected endpoint and can be used in `proxy_pass` directives.
+
- The `$inference_upstream` NGINX variable exposes the EPP-selected endpoint (read from the header configured by `inference_epp_header_name`) and can be used in `proxy_pass` directives.
- Fail-open/closed:
-
  - `inference_bbr_failure_mode_allow on|off` and `inference_epp_failure_mode_allow on|off` control whether to fail-open when the ext-proc is unavailable or errors. Fail-closed returns `502 Bad Gateway`.
+
  - `inference_bbr_failure_mode_allow on|off` and `inference_epp_failure_mode_allow on|off` control fail-open vs fail-closed behavior.
+
  - In fail-closed mode, BBR enforces size limits and may return `413 Request Entity Too Large` or `500 Internal Server Error` on processing errors; EPP failures return `502 Bad Gateway`.
+
  - In fail-open mode, processing continues without terminating the request.
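
The BBR defaults and the fail-open/fail-closed rules listed above can be summarized in a small Python sketch. This is an illustration of the documented behavior only, not the module's implementation: the names `extract_model`, `failure_status`, and `BodyTooLarge` are hypothetical, and treating an unparseable body as "no model found" is an assumption.

```python
import json
from typing import Optional

DEFAULT_MAX_BODY_SIZE = 10 * 1024 * 1024  # documented default: 10MB
DEFAULT_MODEL = "unknown"                 # documented default model value


class BodyTooLarge(Exception):
    """Hypothetical error: the body exceeds the configured size limit."""


def extract_model(body: bytes,
                  max_body_size: int = DEFAULT_MAX_BODY_SIZE,
                  default_model: str = DEFAULT_MODEL) -> str:
    """Return the "model" field of a JSON body, or the default model."""
    if len(body) > max_body_size:
        raise BodyTooLarge(f"{len(body)} bytes exceeds limit {max_body_size}")
    try:
        parsed = json.loads(body)
    except ValueError:
        # Assumption: an unparseable body counts as "no model found".
        return default_model
    if isinstance(parsed, dict) and isinstance(parsed.get("model"), str):
        return parsed["model"]
    return default_model


def failure_status(stage: str, error: Exception,
                   failure_mode_allow: bool) -> Optional[int]:
    """Map a failure to an HTTP status, or None to continue (fail-open)."""
    if failure_mode_allow:
        return None  # fail-open: the request is not terminated
    if stage == "epp":
        return 502   # EPP failures: 502 Bad Gateway
    if isinstance(error, BodyTooLarge):
        return 413   # BBR size limit exceeded: 413 Request Entity Too Large
    return 500       # other BBR processing errors: 500 Internal Server Error
```

A caller would run `extract_model` on the request body and, if it raises, pass the exception to `failure_status` with the stage name (`"bbr"` or `"epp"`) and the value of the matching `*_failure_mode_allow` setting.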
NGINX configuration
-------------------
Example configuration snippet for a location using BBR followed by EPP:
-
```
-
# Load the compiled module (path depends on your build output)
+
```nginx
+
# Load the compiled module (Linux: .so path; macOS local build: .dylib)
- EPP follows the standard Gateway API specification with headers-only mode (no body streaming).
- BBR implements hybrid memory/file processing: small bodies (< client_body_buffer_size) stay in memory, larger bodies are read from NGINX temporary files.
-
- Memory allocation is capped at 1MB to prevent excessive memory usage regardless of request body size.
+
- Memory pre-allocation is capped at 1MB to avoid large upfront allocations. Actual in-memory accumulation may grow up to the configured `inference_bbr_max_body_size` limit; large payloads spill to disk and are read incrementally.
- BBR respects configurable size limits via the `inference_bbr_max_body_size` directive.
- Request headers to ext-proc:
@@ -129,7 +134,8 @@ For local development and testing without Docker:
2. **Start local services and run tests:**
```bash
-
# Start local nginx with the compiled module plus mock services
+
# Start local mock services (echo server on :8080 and mock ext-proc on :9001).
+
# NGINX is started automatically by 'make test-local'.
make start-local
# Run configuration tests locally
@@ -141,13 +147,13 @@ Troubleshooting
- If EPP endpoints are unreachable or not listening on gRPC, you may see `BAD_GATEWAY` when failure mode allow is off. Toggle `*_failure_mode_allow on` to fail-open during testing.
- Ensure your EPP implementation is configured to return a header mutation for the upstream endpoint. The module will parse response frames and search for `header_mutation` entries.
- BBR processes JSON directly in the module; ensure request bodies contain valid JSON with a `model` field.
-
- Use `error_log` and debug logging to verify module activation. The access-phase handler logs `ngx-inference: bbr_enable=<..> epp_enable=<..>` per request.
+
- Use `error_log` and debug logging to verify module activation. BBR logs body reading and size limit enforcement; EPP logs gRPC errors. Set `error_log` to `debug` to observe processing details.
Roadmap
-------
- Validate EPP and BBR implementations against Gateway API Inference Extension conformance tests.
- Align exact header names and semantics to the upstream specification and reference implementations.
-
- Add configurable maximum body size and back-pressure handling for BBR.
+
- Validate large body handling and back-pressure for BBR; refine chunked reads/writes and resource usage for very large payloads.
- TLS support for gRPC once available in the Gateway API specification.
- Connection pooling and caching for improved performance at scale.