docs/source/features/gateway-plugins.rst
71 additions & 3 deletions
@@ -159,6 +159,8 @@ Below are routing strategies gateway supports:
 * ``prefix-cache-preble``: routes request considering both prefix cache hits and pod load. The implementation is based on Preble: Efficient Distributed Prompt Scheduling for LLM Serving: https://arxiv.org/abs/2407.00023.
 * ``vtc-basic``: routes request using a hybrid score balancing fairness (user token count) and pod utilization. It is a simple variant of the Virtual Token Counter (VTC) algorithm. See more details at https://github.com/Ying1123/VTC-artifact
 
+Some routing strategies rely on metrics queried from the Prometheus HTTP API (PromQL). See :ref:`prometheus-api-access` for configuration.
+
 .. code-block:: bash
 
     curl -v http://${ENDPOINT}/v1/chat/completions \
@@ -205,7 +207,7 @@ How session affinity works:
 - This is especially useful for **multi-turn chat applications** where maintaining context on the same backend instance improves performance and consistency.
 
 .. note::
-    The x-session-id header is not a security token—it only encodes network location. Do not rely on it for authentication or authorization.
+    The x-session-id header is not a security token—it only encodes network location. Do not rely on it for authentication or authorization.
 
 Rate Limiting
Rate Limiting
211
213
-------------
@@ -233,7 +235,7 @@ To set up rate limiting, add the user header in the request, like this:
 
 
 External Filter
-===============
+---------------
 The ``external-filter`` header is evaluated **before** the routing strategy selects the optimal target pod. It allows users to dynamically restrict the target Pods using Kubernetes ``labelSelector`` expressions.
 
 The header value follows the Kubernetes label selector syntax:
+Some routing strategies rely on metrics queried from the Prometheus HTTP API (PromQL). Configure the API endpoint and optional Basic Auth with the following environment variables.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 40 18 60
+
+   * - Environment Variable
+     - Default
+     - Description
+   * - ``PROMETHEUS_ENDPOINT``
+     - (empty)
+     - Prometheus HTTP API base URL (for example: ``http://prometheus-operated.prometheus.svc:9090``). If empty, PromQL-based metrics are skipped.
+   * - ``PROMETHEUS_BASIC_AUTH_SECRET_NAME``
+     - (empty)
+     - Kubernetes Secret name that stores the Basic Auth credentials. When set, it takes precedence over the plaintext env vars below.
+   * - ``PROMETHEUS_BASIC_AUTH_SECRET_NAMESPACE``
+     - ``aibrix-system``
+     - Namespace of the Secret specified by ``PROMETHEUS_BASIC_AUTH_SECRET_NAME``.
+   * - ``PROMETHEUS_BASIC_AUTH_USERNAME_KEY``
+     - ``username``
+     - Key in ``Secret.data`` used as the Basic Auth username.
+   * - ``PROMETHEUS_BASIC_AUTH_PASSWORD_KEY``
+     - ``password``
+     - Key in ``Secret.data`` used as the Basic Auth password.
+   * - ``PROMETHEUS_BASIC_AUTH_USERNAME``
+     - (empty)
+     - Basic Auth username, used only when ``PROMETHEUS_BASIC_AUTH_SECRET_NAME`` is not set.
+   * - ``PROMETHEUS_BASIC_AUTH_PASSWORD``
+     - (empty)
+     - Basic Auth password, used only when ``PROMETHEUS_BASIC_AUTH_SECRET_NAME`` is not set.
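
The precedence rules described by these environment variables can be sketched in Python. This is a minimal illustration under stated assumptions, not the gateway's actual implementation: the function names ``resolve_basic_auth`` and ``build_query_request`` are hypothetical, and the ``secret_data`` argument stands in for the already-decoded contents of the Kubernetes Secret (fetching and base64-decoding the Secret is out of scope here). Only the environment variable names, their defaults, and the Prometheus ``/api/v1/query`` instant-query path come from the table and the Prometheus HTTP API.

```python
import base64
import urllib.parse
import urllib.request


def resolve_basic_auth(env, secret_data=None):
    """Return (username, password), or None when no credentials are configured.

    `env` is a mapping of environment variables; `secret_data` is a
    hypothetical dict holding the decoded Secret contents.
    """
    if env.get("PROMETHEUS_BASIC_AUTH_SECRET_NAME"):
        # The Secret takes precedence over the plaintext variables.
        if secret_data is None:
            raise ValueError("secret name is set but no Secret data was provided")
        user_key = env.get("PROMETHEUS_BASIC_AUTH_USERNAME_KEY", "username")
        pass_key = env.get("PROMETHEUS_BASIC_AUTH_PASSWORD_KEY", "password")
        return secret_data[user_key], secret_data[pass_key]

    # Plaintext fallback, used only when no Secret name is set.
    username = env.get("PROMETHEUS_BASIC_AUTH_USERNAME", "")
    password = env.get("PROMETHEUS_BASIC_AUTH_PASSWORD", "")
    if not username and not password:
        return None
    return username, password


def build_query_request(env, promql, secret_data=None):
    """Build an instant-query request, or return None if no endpoint is set."""
    endpoint = env.get("PROMETHEUS_ENDPOINT", "")
    if not endpoint:
        # Mirrors the documented behavior: PromQL-based metrics are skipped.
        return None

    query = urllib.parse.urlencode({"query": promql})
    request = urllib.request.Request(f"{endpoint.rstrip('/')}/api/v1/query?{query}")

    creds = resolve_basic_auth(env, secret_data)
    if creds is not None:
        token = base64.b64encode(f"{creds[0]}:{creds[1]}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request
```

Calling ``build_query_request(dict(os.environ), "up")`` would return a request that can be passed to ``urllib.request.urlopen``, or ``None`` when ``PROMETHEUS_ENDPOINT`` is empty. Passing the environment as a plain mapping keeps the precedence logic easy to test without touching the real process environment.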