docs/llm-gateway/bedrock-boto3.md (3 additions, 3 deletions)
@@ -41,7 +41,7 @@ For assume-role setup, see [AWS Bedrock - Assume Role Setup](./bedrock-assume-ro

 ## Configure boto3

-Set `endpoint_url` to the QuilrAI Bedrock Runtime endpoint. Use the same QuilrAI key for both `aws_access_key_id` and `aws_secret_access_key`; the gateway uses SigV4 to authenticate the request.
+Set `endpoint_url` to the closest regional QuilrAI Bedrock Runtime endpoint. Use the same QuilrAI key for both `aws_access_key_id` and `aws_secret_access_key`; the gateway uses SigV4 to authenticate the request.
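
As a minimal sketch of the configuration described in this hunk, the snippet below builds the keyword arguments for a boto3 `bedrock-runtime` client. The helper names `quilr_client_kwargs` and `make_client` are hypothetical, and the endpoint URL and key shown in the usage note are placeholders rather than real QuilrAI values. `region_name` is included because boto3 needs a region to sign requests with SigV4; the default chosen here is an assumption.

```python
def quilr_client_kwargs(quilr_key: str, endpoint_url: str, region: str = "us-east-1") -> dict:
    """Build boto3 client arguments for the gateway.

    The same QuilrAI key is used for both credential fields, as the docs describe.
    The default region is an assumption made for this sketch.
    """
    return {
        "service_name": "bedrock-runtime",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": quilr_key,
        "aws_secret_access_key": quilr_key,
        "region_name": region,
    }


def make_client(quilr_key: str, endpoint_url: str):
    """Create the boto3 client (hypothetical helper)."""
    import boto3  # imported lazily so quilr_client_kwargs works without boto3 installed

    return boto3.client(**quilr_client_kwargs(quilr_key, endpoint_url))
```

A call such as `make_client("sk-quilr-xxx", "https://<closest-regional-endpoint>")` would then return a client pointed at whichever regional endpoint is passed in; the angle-bracket value is a placeholder to be filled from the regions list.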
@@ -269,13 +269,13 @@ async function safeChat(userMessage: string): Promise<string> {

 ```bash
 # Check raw text
-curl -X POST https://guardrails.quilr.ai/sdk/v1/check \
+curl -X POST https://guardrails-usa-2.quilr.ai/sdk/v1/check \
   -H "Authorization: Bearer sk-quilr-xxx" \
   -H "Content-Type: application/json" \
   -d '{"text": "Call me at 555-867-5309", "type": "request"}'

 # Check a conversation
-curl -X POST https://guardrails.quilr.ai/sdk/v1/check \
+curl -X POST https://guardrails-usa-2.quilr.ai/sdk/v1/check \
   -H "Authorization: Bearer sk-quilr-xxx" \
   -H "Content-Type: application/json" \
   -d '{
@@ -305,7 +305,7 @@ Or copy `quilr_litellm_guardrails.py` into your project.
 | Variable | Required | Default | Description |
 |----------|----------|---------|-------------|
 | `QUILR_GUARDRAILS_KEY` | Yes | - | Your `quilr_sdk` API key |
-| `QUILR_GUARDRAILS_BASE_URL` | No | `https://guardrails.quilr.ai` | Override for self-hosted deployments |
+| `QUILR_GUARDRAILS_BASE_URL` | No | `https://guardrails.quilr.ai` | Override with the closest regional endpoint for production or with a self-hosted deployment URL |
 | `QUILR_GUARDRAILS_TIMEOUT` | No | `3` | Seconds before the check times out (request passes on timeout) |
 | `APPLY_QUILR_GUARDRAILS_FOR_MODELS` | No | (all) | Comma-separated list of models to restrict guardrails to |
 | `APPLY_QUILR_GUARDRAILS_FOR_KEY_NAMES` | No | (all) | Comma-separated list of LiteLLM key names to restrict guardrails to |
 Append the API format path to any base URL - for example, `https://guardrails-usa-1.quilr.ai/openai_compatible/`. See the [Integration Guide](./integration-guide) for all supported formats.

+For production traffic, choose the location-specific endpoint closest to your application as the primary base URL. Use `guardrails.quilr.ai` only when you explicitly want global auto-routing.
+
 :::info Expanding regions
 This list will continue to grow as we bring new regions online. Check this page or the [Integration Guide](./integration-guide) for the latest endpoints.
 :::

 ## Routing Architecture

-When you send a request to `guardrails.quilr.ai`, it automatically routes to the nearest available gateway server based on your geographic location. No configuration needed.
+If you use `guardrails.quilr.ai`, it automatically routes to the nearest available gateway server based on your geographic location. For predictable production routing, use a regional endpoint directly.
-    C -->|"US traffic"| D["guardrails-usa-1.quilr.ai"]
-    C -->|"India traffic"| E["guardrails-india-1.quilr.ai"]
-    C -->|"Future regions"| F["..."]
+    C -->|"US Central West traffic"| D["guardrails-usa-1.quilr.ai"]
+    C -->|"US East traffic"| E["guardrails-usa-2.quilr.ai"]
+    C -->|"India traffic"| F["guardrails-india-1.quilr.ai"]
+    C -->|"Future regions"| H["..."]
     D --> G["LLM Providers"]
     E --> G
     F --> G
+    H --> G
 ```

 Each regional server runs the full QuilrAI pipeline - validation, scanning, transformation, routing, and observability - so there is no functional difference between endpoints.
@@ -48,17 +53,17 @@ Each regional server runs the full QuilrAI pipeline - validation, scanning, tran
   {
     label: "Attempt 1",
     items: [
-      "→ guardrails.quilr.ai",
-      "Auto-routes to nearest ✓",
-      "Optimal latency ✓",
+      "→ guardrails-usa-2.quilr.ai",
+      "Direct to US East server ✓",
+      "Primary regional endpoint ✓",
     ],
   },
   {
     label: "Attempt 2",
     items: [
       "→ guardrails-usa-1.quilr.ai",
-      "Direct to US server ✓",
-      "Bypasses auto-routing ✓",
+      "Direct to US Central West server ✓",
+      "Host-level redundancy ✓",
     ],
   },
   {
@@ -71,24 +76,26 @@ Each regional server runs the full QuilrAI pipeline - validation, scanning, tran
   },
 ]} />

-Even though `guardrails.quilr.ai` auto-routes to the nearest healthy server, we recommend a three-tier retry strategy that falls back to explicit regional endpoints:
+For production retry logic, use explicit regional endpoints. Start with the location-specific endpoint closest to your application, then fail over to other regional hosts. Do not include the global auto-routed endpoint in the retry chain.
+
+Example order for a US East deployment:

-1. **First attempt** - `guardrails.quilr.ai` - Uses auto-routing for optimal latency under normal conditions.
-2. **Second attempt** - `guardrails-usa-1.quilr.ai` - Direct connection to the US server, bypassing the auto-routing layer entirely.
+1. **First attempt** - `guardrails-usa-2.quilr.ai` - Direct connection to the nearest regional server.
+2. **Second attempt** - `guardrails-usa-1.quilr.ai` - Direct connection to another US server for host-level redundancy.
 3. **Third attempt** - `guardrails-india-1.quilr.ai` - Targets a geographically distinct server for maximum redundancy.

 ### Why retry with regional endpoints?

-Auto-routing handles most failure scenarios transparently. However, explicit regional fallbacks protect against edge cases that auto-routing alone cannot cover:
+Explicit regional fallbacks protect against edge cases that auto-routing alone cannot cover:

-- **DNS or routing-layer issues** - If the global endpoint's routing layer itself is degraded, direct regional URLs bypass it entirely.
-- **Auto-routing detection latency** - The auto-router takes 3-7 seconds to detect a downed host. During this window, your request may still be routed to the unhealthy server. Retrying with an explicit regional URL immediately targets a different host, avoiding the detection delay.
+- **DNS or routing-layer issues** - Direct regional URLs bypass the global routing layer entirely.
+- **Deterministic failover** - Retrying with an explicit regional URL immediately targets a different host instead of letting the auto-router choose.
 - **Regional propagation delays** - A server that has just recovered may not yet be visible to the auto-router. Hitting it directly avoids propagation lag.
 - **Geographic redundancy** - Retrying across regions ensures your request reaches an entirely independent infrastructure stack, eliminating single points of failure.

 The overhead is minimal - two additional fallback URLs in your retry logic - but the resilience improvement is significant.

-We recommend **one retry per QuilrAI host**. If a request fails on a given endpoint, move on to the next one rather than retrying the same host. This maximizes the chance of hitting a healthy server quickly, especially during the 3-7 second window before auto-routing detects a failure.
+We recommend **one retry per QuilrAI host**. If a request fails on a given endpoint, move on to the next one rather than retrying the same host. This maximizes the chance of hitting a healthy server quickly.
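
The one-attempt-per-host rule can be sketched independently of any HTTP client. In the snippet below, `call_with_failover` and its `send` callback are hypothetical names introduced for illustration: `send` stands in for whatever function actually issues the request against a given base URL. The endpoint order follows the US East example in this diff; this is a sketch under those assumptions, not the documented implementation.

```python
from typing import Callable, Optional, Sequence

# Order taken from the US East example: nearest regional host first.
ENDPOINTS = [
    "https://guardrails-usa-2.quilr.ai",
    "https://guardrails-usa-1.quilr.ai",
    "https://guardrails-india-1.quilr.ai",
]


def call_with_failover(
    send: Callable[[str], str],
    endpoints: Sequence[str] = ENDPOINTS,
) -> str:
    """Try each endpoint exactly once, in order, and return the first success."""
    last_error: Optional[Exception] = None
    for base_url in endpoints:
        try:
            return send(base_url)
        except Exception as exc:  # a real client would catch only its own transport errors
            last_error = exc  # remember the failure and move on to the next host
    raise RuntimeError("all QuilrAI endpoints failed") from last_error
```

Because a failed host is never retried, the worst case is one request per entry in `ENDPOINTS`, and the global auto-routed endpoint is deliberately absent from the list.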

 ### Code Example

@@ -97,8 +104,8 @@ import time
 import httpx

 ENDPOINTS = [
-    "https://guardrails.quilr.ai",          # auto-routes to nearest
-    "https://guardrails-usa-1.quilr.ai",    # direct US fallback
+    "https://guardrails-usa-2.quilr.ai",    # primary US East endpoint
+    "https://guardrails-usa-1.quilr.ai",    # direct US Central West fallback
     "https://guardrails-india-1.quilr.ai",  # direct India fallback
 ]
@@ -124,8 +131,8 @@ import time
 from openai import OpenAI

 ENDPOINTS = [
-    "https://guardrails.quilr.ai/openai_compatible/v1",          # auto-routes to nearest
-    "https://guardrails-usa-1.quilr.ai/openai_compatible/v1",    # direct US fallback
+    "https://guardrails-usa-2.quilr.ai/openai_compatible/v1",    # primary US East endpoint
+    "https://guardrails-usa-1.quilr.ai/openai_compatible/v1",    # direct US Central West fallback
     "https://guardrails-india-1.quilr.ai/openai_compatible/v1",  # direct India fallback