Commit 53eacdb

Update README, remove unused deps
1 parent ff0b79d commit 53eacdb

3 files changed: 15 additions and 92 deletions
Cargo.lock

Lines changed: 0 additions & 81 deletions (generated file; diff not rendered)

Cargo.toml

Lines changed: 0 additions & 2 deletions
@@ -32,11 +32,9 @@ tonic = "0.14"
 tonic-prost = "0.14"
 prost = "0.14"
 prost-types = "0.14"
-serde = { version = "1.0", features = ["derive"] }
 serde_json = "1.0"
 libc = "0.2"
 tracing = "0.1"
-tracing-subscriber = { version = "0.3", features = ["fmt", "env-filter"] }
 
 [build-dependencies]
 tonic-prost-build = "0.14"

README.md

Lines changed: 15 additions & 9 deletions
@@ -12,6 +12,8 @@ It implements two standard components:
 Reference docs:
 - NGF design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md
 - EPP reference implementation: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/pkg/epp
+- Module configuration: [docs/configuration.md](docs/configuration.md)
+- Example configurations: [docs/examples/README.md](docs/examples/README.md)
 
 Current behavior and defaults
 -----------------------------
@@ -22,23 +24,26 @@ Current behavior and defaults
   - Directive `inference_bbr_max_body_size` sets maximum body size for BBR processing in bytes (default 10MB).
   - Directive `inference_bbr_default_model` sets the default model value when no model is found in request body (default `unknown`).
   - Hybrid memory/file support: small bodies stay in memory, large bodies are read from NGINX temporary files.
-  - Memory allocation is capped at 1MB regardless of body size to prevent excessive memory usage.
+  - Memory allocation pre-allocation is capped at 1MB to avoid large upfront allocations. Actual in-memory accumulation may grow up to the configured `inference_bbr_max_body_size` limit; large payloads spill to disk and are read incrementally.
 
 - EPP:
   - Directive `inference_epp on|off` enables/disables EPP functionality.
   - Directive `inference_epp_endpoint` sets the gRPC endpoint for standard EPP ext-proc server communication.
   - Directive `inference_epp_header_name` configures the upstream header name to read from EPP responses (default `X-Inference-Upstream`).
+  - Directive `inference_epp_timeout_ms` sets the gRPC timeout for EPP communication (default `200ms`).
   - EPP follows the Gateway API Inference Extension specification: performs headers-only exchange, reads header mutations from responses, and sets the upstream header for endpoint selection.
-  - The `$inference_upstream` NGINX variable exposes the EPP-selected endpoint and can be used in `proxy_pass` directives.
+  - The `$inference_upstream` NGINX variable exposes the EPP-selected endpoint (read from the header configured by `inference_epp_header_name`) and can be used in `proxy_pass` directives.
 
 - Fail-open/closed:
-  - `inference_bbr_failure_mode_allow on|off` and `inference_epp_failure_mode_allow on|off` control whether to fail-open when the ext-proc is unavailable or errors. Fail-closed returns `502 Bad Gateway`.
+  - `inference_bbr_failure_mode_allow on|off` and `inference_epp_failure_mode_allow on|off` control fail-open vs fail-closed behavior.
+  - In fail-closed mode, BBR enforces size limits and may return `413 Request Entity Too Large` or `500 Internal Server Error` on processing errors; EPP failures return `502 Bad Gateway`.
+  - In fail-open mode, processing continues without terminating the request.
 
 NGINX configuration
 -------------------
 Example configuration snippet for a location using BBR followed by EPP:
-```
-# Load the compiled module (path depends on your build output)
+```nginx
+# Load the compiled module (Linux: .so path; macOS local build: .dylib)
 load_module /usr/lib/nginx/modules/libngx_inference.so;
 
 http {
@@ -85,7 +90,7 @@ Notes and assumptions
 - Body processing:
   - EPP follows the standard Gateway API specification with headers-only mode (no body streaming).
   - BBR implements hybrid memory/file processing: small bodies (< client_body_buffer_size) stay in memory, larger bodies are read from NGINX temporary files.
-  - Memory allocation is capped at 1MB to prevent excessive memory usage regardless of request body size.
+  - Memory allocation pre-allocation is capped at 1MB to avoid large upfront allocations. Actual in-memory accumulation may grow up to the configured `inference_bbr_max_body_size` limit; large payloads spill to disk and are read incrementally.
   - BBR respects configurable size limits via `inference_bbr_max_body_size` directive.
 
 - Request headers to ext-proc:
@@ -129,7 +134,8 @@ For local development and testing without Docker:
 
 2. **Start local services and run tests:**
    ```bash
-   # Start local nginx with the compiled module plus mock services
+   # Start local mock services (echo server on :8080 and mock ext-proc on :9001).
+   # NGINX is started automatically by 'make test-local'.
    make start-local
 
    # Run configuration tests locally
@@ -141,13 +147,13 @@ Troubleshooting
 - If EPP endpoints are unreachable or not listening on gRPC, you may see `BAD_GATEWAY` when failure mode allow is off. Toggle `*_failure_mode_allow on` to fail-open during testing.
 - Ensure your EPP implementation is configured to return a header mutation for the upstream endpoint. The module will parse response frames and search for `header_mutation` entries.
 - BBR processes JSON directly in the module - ensure request bodies contain valid JSON with a "model" field.
-- Use `error_log` and debug logging to verify module activation. The access-phase handler logs `ngx-inference: bbr_enable=<..> epp_enable=<..>` per request.
+- Use `error_log` and debug logging to verify module activation. BBR logs body reading and size limit enforcement; EPP logs gRPC errors. Set `error_log` to `debug` to observe processing details.
 
 Roadmap
 -------
 - Validate EPP and BBR implementations against Gateway API Inference Extension conformance tests.
 - Align exact header names and semantics to the upstream specification and reference implementations.
-- Add configurable maximum body size and back-pressure handling for BBR.
+- Validate large body handling and back-pressure for BBR; refine chunked reads/writes and resource usage for very large payloads.
 - TLS support for gRPC once available in the Gateway API specification.
 - Connection pooling and caching for improved performance at scale.
 
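
The README change above separates two limits that the old wording conflated: a cap on the initial buffer reservation (1MB) and a cap on how much body data may accumulate (`inference_bbr_max_body_size`, default 10MB). The sketch below illustrates that distinction only. It is not the module's code: `BodyBuffer`, `BodyError`, `initial_capacity`, and both constants are hypothetical names, and the spill-to-disk path is omitted.

```rust
/// Hypothetical constants mirroring the values quoted in the README text.
pub const PREALLOC_CAP: usize = 1024 * 1024; // 1MB pre-allocation cap
pub const DEFAULT_MAX_BODY_SIZE: usize = 10 * 1024 * 1024; // 10MB default limit

/// Error returned when accumulated data would exceed the configured limit.
#[derive(Debug, PartialEq)]
pub enum BodyError {
    TooLarge { limit: usize },
}

/// Upfront reservation: never more than the cap, the hint, or the limit.
pub fn initial_capacity(content_length_hint: usize, max_body_size: usize) -> usize {
    content_length_hint.min(PREALLOC_CAP).min(max_body_size)
}

/// Hypothetical in-memory accumulator for a request body.
pub struct BodyBuffer {
    data: Vec<u8>,
    max_body_size: usize,
}

impl BodyBuffer {
    pub fn new(content_length_hint: usize, max_body_size: usize) -> Self {
        let cap = initial_capacity(content_length_hint, max_body_size);
        BodyBuffer { data: Vec::with_capacity(cap), max_body_size }
    }

    /// Appends a chunk; the buffer may grow past the pre-allocation cap,
    /// but never past `max_body_size`.
    pub fn push_chunk(&mut self, chunk: &[u8]) -> Result<(), BodyError> {
        if self.data.len().saturating_add(chunk.len()) > self.max_body_size {
            return Err(BodyError::TooLarge { limit: self.max_body_size });
        }
        self.data.extend_from_slice(chunk);
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

Under these assumptions, a request advertising a 50MB body reserves only 1MB up front, and the size limit is enforced as chunks arrive rather than at allocation time.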

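The revised fail-open/closed bullets in the README diff describe a small decision table: with `*_failure_mode_allow on` the request continues, and with it off the status depends on which stage failed. A minimal sketch of that table follows, assuming hypothetical `Failure` and `Outcome` types and an `on_failure` helper that do not come from the module. The README says BBR "may return" 413 or 500, so the exact mapping of BBR errors to those two codes is an assumption here.

```rust
/// Hypothetical failure categories, named after the cases the README lists.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Failure {
    /// BBR: request body exceeded `inference_bbr_max_body_size`.
    BbrBodyTooLarge,
    /// BBR: any other processing error (assumed to map to 500).
    BbrProcessing,
    /// EPP: ext-proc unreachable, timed out, or returned an error.
    Epp,
}

/// What the request handler does next.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    /// Fail-open: keep processing the request.
    Continue,
    /// Fail-closed: terminate with this HTTP status code.
    Reject(u16),
}

/// Maps a failure to an outcome given the relevant `*_failure_mode_allow` flag.
pub fn on_failure(failure: Failure, failure_mode_allow: bool) -> Outcome {
    if failure_mode_allow {
        return Outcome::Continue;
    }
    match failure {
        Failure::BbrBodyTooLarge => Outcome::Reject(413),
        Failure::BbrProcessing => Outcome::Reject(500),
        Failure::Epp => Outcome::Reject(502),
    }
}
```

Keeping the mapping in one pure function makes the documented behavior easy to check against tests, independent of NGINX request plumbing.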