Commit 3c554c1

Update README
1 parent 4a36428 commit 3c554c1

1 file changed: +7 -5 lines changed

README.md

Lines changed: 7 additions & 5 deletions
@@ -48,7 +48,6 @@ http {
 inference_bbr on;
 inference_bbr_max_body_size 52428800; # 50MB for AI workloads
 inference_bbr_default_model "gpt-3.5-turbo"; # Default model when none found
-inference_bbr_failure_mode_allow off; # Fail-closed for production

 # Configure the inference module for EPP (Endpoint Picker Processor)
 inference_epp on;
@@ -58,6 +57,9 @@ http {
 # inference_epp_tls off; # Disable TLS for development/testing
 # inference_epp_ca_file /etc/ssl/certs/ca.crt; # Custom CA file

+# Default upstream fallback when EPP fails and failure_mode_allow is on
+# inference_default_upstream "fallback-server:8080";
+
 # Proxy to the chosen upstream (will be determined by EPP)
 # Use the $inference_upstream variable set by the module
 proxy_set_header Host $host;
@@ -77,7 +79,6 @@ Current behavior and defaults
 - Directive `inference_bbr_header_name` configures the model header name to inject (default `X-Gateway-Model-Name`).
 - Directive `inference_bbr_max_body_size` sets maximum body size for BBR processing in bytes (default 10MB).
 - Directive `inference_bbr_default_model` sets the default model value when no model is found in request body (default `unknown`).
-- Directive `inference_bbr_failure_mode_allow on|off` controls fail-open vs fail-closed behavior (default `off`).
 - Hybrid memory/file support: small bodies stay in memory, large bodies are read from NGINX temporary files.
 - Memory allocation pre-allocation is capped at 1MB to avoid large upfront allocations. Actual in-memory accumulation may grow up to the configured `inference_bbr_max_body_size` limit; large payloads spill to disk and are read incrementally.

@@ -87,15 +88,16 @@ Current behavior and defaults
 - Directive `inference_epp_header_name` configures the upstream header name to read from EPP responses (default `X-Inference-Upstream`).
 - Directive `inference_epp_timeout_ms` sets the gRPC timeout for EPP communication (default `200ms`).
 - Directive `inference_epp_failure_mode_allow on|off` controls fail-open vs fail-closed behavior (default `off`).
+- Directive `inference_default_upstream` sets a fallback upstream when EPP fails and `inference_epp_failure_mode_allow` is `on`.
 - Directive `inference_epp_tls on|off` enables TLS for gRPC connections (default `on`).
 - Directive `inference_epp_ca_file /path/to/ca.crt` specifies CA certificate file path for TLS verification (optional).
 - EPP follows the Gateway API Inference Extension specification: performs headers-only exchange, reads header mutations from responses, and sets the upstream header for endpoint selection.
 - The `$inference_upstream` NGINX variable exposes the EPP-selected endpoint (read from the header configured by `inference_epp_header_name`) and can be used in `proxy_pass` directives.

 - Fail-open/closed:
-- `inference_bbr_failure_mode_allow on|off` and `inference_epp_failure_mode_allow on|off` control fail-open vs fail-closed behavior.
-- In fail-closed mode, BBR enforces size limits and may return `413 Request Entity Too Large` or `500 Internal Server Error` on processing errors; EPP failures return `502 Bad Gateway`.
-- In fail-open mode, processing continues without terminating the request.
+- `inference_epp_failure_mode_allow on|off` controls EPP fail-open vs fail-closed behavior.
+- EPP fail-closed mode returns `500 Internal Server Error` on EPP processing failures.
+- EPP fail-open mode continues processing when EPP fails. When `inference_epp_failure_mode_allow` is `on`, you can configure `inference_default_upstream` to specify a fallback upstream when EPP fails.
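For readers of this diff, the EPP fail-open behavior described in the added README lines could be configured roughly as below. This is a minimal sketch and not the project's reference configuration. The directive names and the `fallback-server:8080` value come from the diff; the `server`/`location` nesting, the `listen` port, the `/v1/` path, and the `http://` scheme in `proxy_pass` are assumptions made for illustration. It also assumes that `$inference_upstream` carries the fallback value when EPP fails, which the README text implies but does not state outright. Configuration of the EPP endpoint itself is omitted because this excerpt does not show it.

```nginx
http {
    server {
        listen 8080;                       # assumed port, not from the diff

        location /v1/ {                    # assumed path, not from the diff
            inference_epp on;

            # Fail-open: keep processing the request when EPP fails
            inference_epp_failure_mode_allow on;

            # Fallback upstream used when EPP fails and fail-open is enabled
            inference_default_upstream "fallback-server:8080";

            proxy_set_header Host $host;

            # Proxy to the endpoint exposed through $inference_upstream
            proxy_pass http://$inference_upstream;
        }
    }
}
```

With `inference_epp_failure_mode_allow off` (the documented default), the README states that EPP processing failures return `500 Internal Server Error` instead. Because `proxy_pass` takes a variable here, NGINX resolves the target at request time, so a hostname such as `fallback-server` would likely need a `resolver` directive or a matching `upstream` block.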
