
Commit e478b8f

add guardrails-usa-2
Signed-off-by: Praneeth Bedapudi <praneeth@bpraneeth.com>
1 parent 8cf8ded commit e478b8f

11 files changed

Lines changed: 85 additions & 74 deletions

docs/llm-gateway/architecture.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ How the QuilrAI LLM Gateway processes every request - from your application to t
 source={{
 label: "Your Application",
 code: `client = OpenAI(
-base_url='https://guardrails.quilr.ai/openai_compatible/',
+base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
 api_key='sk-quilr-xxx'
 )
 client.chat.completions.create(

docs/llm-gateway/bedrock-boto3.md

Lines changed: 3 additions & 3 deletions
@@ -41,7 +41,7 @@ For assume-role setup, see [AWS Bedrock - Assume Role Setup](./bedrock-assume-ro

 ## Configure boto3

-Set `endpoint_url` to the QuilrAI Bedrock Runtime endpoint. Use the same QuilrAI key for both `aws_access_key_id` and `aws_secret_access_key`; the gateway uses SigV4 to authenticate the request.
+Set `endpoint_url` to the closest regional QuilrAI Bedrock Runtime endpoint. Use the same QuilrAI key for both `aws_access_key_id` and `aws_secret_access_key`; the gateway uses SigV4 to authenticate the request.

 ```python
 import boto3
@@ -52,7 +52,7 @@ QUILR_KEY = "sk-quilr-xxx"
 bedrock = boto3.client(
 "bedrock-runtime",
 region_name="us-east-1",
-endpoint_url="https://guardrails.quilr.ai/bedrock-runtime",
+endpoint_url="https://guardrails-usa-2.quilr.ai/bedrock-runtime",
 aws_access_key_id=QUILR_KEY,
 aws_secret_access_key=QUILR_KEY,
 config=Config(read_timeout=300),
@@ -75,7 +75,7 @@ print(response["output"]["message"]["content"][0]["text"])
 You can also set `endpoint_url` to the service root:

 ```python
-endpoint_url="https://guardrails.quilr.ai"
+endpoint_url="https://guardrails-usa-2.quilr.ai"
 ```

 Both endpoint styles are accepted.

docs/llm-gateway/features/copilot-studio.md

Lines changed: 8 additions & 7 deletions
@@ -24,18 +24,19 @@ Microsoft external threat detection is called for generative agents that use gen

 ## Endpoint

-Create a QuilrAI key with provider `copilot_studio`, then use this endpoint as the external threat detection base URL:
+Create a QuilrAI key with provider `copilot_studio`, then use the closest regional endpoint as the external threat detection base URL:

 ```text
-https://guardrails.quilr.ai/copilot_studio/sk-quilr-xxx
+https://guardrails-usa-2.quilr.ai/copilot_studio/sk-quilr-xxx
 ```

-Use the regional base URL if your tenant uses a regional QuilrAI deployment:
+The example uses US East. Choose the nearest regional base URL for your tenant:

 | Endpoint | Region | Endpoint base |
 |----------|--------|---------------|
 | Global (auto-routed) | Nearest | `https://guardrails.quilr.ai/copilot_studio/sk-quilr-xxx` |
-| USA | US East | `https://guardrails-usa-1.quilr.ai/copilot_studio/sk-quilr-xxx` |
+| USA 1 | US Central West | `https://guardrails-usa-1.quilr.ai/copilot_studio/sk-quilr-xxx` |
+| USA 2 | US East | `https://guardrails-usa-2.quilr.ai/copilot_studio/sk-quilr-xxx` |
 | India | Mumbai | `https://guardrails-india-1.quilr.ai/copilot_studio/sk-quilr-xxx` |

 Treat this URL as a secret. The QuilrAI key is part of the path because Copilot Studio owns the webhook call shape.
@@ -49,11 +50,11 @@ Copilot Studio appends these paths to the endpoint base:
 | `POST /validate` | Checks that the QuilrAI endpoint is reachable and ready. |
 | `POST /analyze-tool-execution` | Sends proposed tool execution context for allow/block evaluation. |

-For example, if the endpoint base is `https://guardrails.quilr.ai/copilot_studio/sk-quilr-xxx`, Copilot Studio calls:
+For example, if the endpoint base is `https://guardrails-usa-2.quilr.ai/copilot_studio/sk-quilr-xxx`, Copilot Studio calls:

 ```text
-https://guardrails.quilr.ai/copilot_studio/sk-quilr-xxx/validate
-https://guardrails.quilr.ai/copilot_studio/sk-quilr-xxx/analyze-tool-execution
+https://guardrails-usa-2.quilr.ai/copilot_studio/sk-quilr-xxx/validate
+https://guardrails-usa-2.quilr.ai/copilot_studio/sk-quilr-xxx/analyze-tool-execution
 ```

 Copilot Studio may also include an `api-version` query parameter. QuilrAI ignores unknown query parameters.

docs/llm-gateway/features/prompt-store.md

Lines changed: 2 additions & 2 deletions
@@ -79,7 +79,7 @@ This applies uniformly across Chat Completions, Anthropic Messages (both the top
 from openai import OpenAI

 client = OpenAI(
-base_url='https://guardrails.quilr.ai/openai_compatible/',
+base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
 api_key='sk-quilr-xxx'
 )

@@ -101,7 +101,7 @@ response = client.chat.completions.create(
 import anthropic

 client = anthropic.Anthropic(
-base_url='https://guardrails.quilr.ai/anthropic_messages/',
+base_url='https://guardrails-usa-2.quilr.ai/anthropic_messages/',
 api_key='sk-quilr-xxx'
 )

docs/llm-gateway/features/request-routing.md

Lines changed: 2 additions & 2 deletions
@@ -135,7 +135,7 @@ Your code still sends `model="gpt-4.1"` - zero code changes, but requests get ro
 from openai import OpenAI

 client = OpenAI(
-base_url='https://guardrails.quilr.ai/openai_compatible/',
+base_url='https://guardrails-usa-2.quilr.ai/openai_compatible/',
 api_key='sk-quilr-xxx'
 )

@@ -150,7 +150,7 @@ print(response.choices[0].message.content)
 ### cURL

 ```bash
-curl https://guardrails.quilr.ai/openai_compatible/v1/chat/completions \
+curl https://guardrails-usa-2.quilr.ai/openai_compatible/v1/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer sk-quilr-xxx" \
 -d '{

docs/llm-gateway/features/sdk-mode.md

Lines changed: 6 additions & 6 deletions
@@ -118,7 +118,7 @@ A typical pattern: check the user message before sending it to your LLM, then ch
 ```python
 import httpx

-QUILR_BASE = "https://guardrails.quilr.ai"
+QUILR_BASE = "https://guardrails-usa-2.quilr.ai"
 QUILR_SDK_KEY = "sk-quilr-xxx"

 async def check_messages(messages: list[dict]) -> dict:
@@ -184,7 +184,7 @@ asyncio.run(safe_chat("What is my SSN?"))
 ```python
 import requests

-QUILR_BASE = "https://guardrails.quilr.ai"
+QUILR_BASE = "https://guardrails-usa-2.quilr.ai"
 QUILR_SDK_KEY = "sk-quilr-xxx"

 def check_text(text: str, type_: str = "response") -> dict:
@@ -212,7 +212,7 @@ match result["status"]:
 ### JavaScript / TypeScript - `fetch`

 ```typescript
-const QUILR_BASE = "https://guardrails.quilr.ai";
+const QUILR_BASE = "https://guardrails-usa-2.quilr.ai";
 const QUILR_SDK_KEY = "sk-quilr-xxx";

 async function checkMessages(messages: Array<{ role: string; content: string }>) {
@@ -269,13 +269,13 @@ async function safeChat(userMessage: string): Promise<string> {

 ```bash
 # Check raw text
-curl -X POST https://guardrails.quilr.ai/sdk/v1/check \
+curl -X POST https://guardrails-usa-2.quilr.ai/sdk/v1/check \
 -H "Authorization: Bearer sk-quilr-xxx" \
 -H "Content-Type: application/json" \
 -d '{"text": "Call me at 555-867-5309", "type": "request"}'

 # Check a conversation
-curl -X POST https://guardrails.quilr.ai/sdk/v1/check \
+curl -X POST https://guardrails-usa-2.quilr.ai/sdk/v1/check \
 -H "Authorization: Bearer sk-quilr-xxx" \
 -H "Content-Type: application/json" \
 -d '{
@@ -305,7 +305,7 @@ Or copy `quilr_litellm_guardrails.py` into your project.
 | Variable | Required | Default | Description |
 |----------|----------|---------|-------------|
 | `QUILR_GUARDRAILS_KEY` | Yes | - | Your `quilr_sdk` API key |
-| `QUILR_GUARDRAILS_BASE_URL` | No | `https://guardrails.quilr.ai` | Override for self-hosted deployments |
+| `QUILR_GUARDRAILS_BASE_URL` | No | `https://guardrails.quilr.ai` | Override with the closest regional endpoint for production or with a self-hosted deployment URL |
 | `QUILR_GUARDRAILS_TIMEOUT` | No | `3` | Seconds before the check times out (request passes on timeout) |
 | `APPLY_QUILR_GUARDRAILS_FOR_MODELS` | No | (all) | Comma-separated list of models to restrict guardrails to |
 | `APPLY_QUILR_GUARDRAILS_FOR_KEY_NAMES` | No | (all) | Comma-separated list of LiteLLM key names to restrict guardrails to |

docs/llm-gateway/ha-and-sla.md

Lines changed: 30 additions & 23 deletions
@@ -15,29 +15,34 @@ All endpoints are fully interchangeable - same API surface, same features, same
 | Endpoint | Region | Base URL |
 |----------|--------|----------|
 | **Global (auto-routed)** | Nearest | `https://guardrails.quilr.ai` |
-| **USA** | US East | `https://guardrails-usa-1.quilr.ai` |
+| **USA 1** | US Central West | `https://guardrails-usa-1.quilr.ai` |
+| **USA 2** | US East | `https://guardrails-usa-2.quilr.ai` |
 | **India** | Mumbai | `https://guardrails-india-1.quilr.ai` |

 Append the API format path to any base URL - for example, `https://guardrails-usa-1.quilr.ai/openai_compatible/`. See the [Integration Guide](./integration-guide) for all supported formats.

+For production traffic, choose the location-specific endpoint closest to your application as the primary base URL. Use `guardrails.quilr.ai` only when you explicitly want global auto-routing.
+
 :::info Expanding regions
 This list will continue to grow as we bring new regions online. Check this page or the [Integration Guide](./integration-guide) for the latest endpoints.
 :::

 ## Routing Architecture

-When you send a request to `guardrails.quilr.ai`, it automatically routes to the nearest available gateway server based on your geographic location. No configuration needed.
+If you use `guardrails.quilr.ai`, it automatically routes to the nearest available gateway server based on your geographic location. For predictable production routing, use a regional endpoint directly.

 ```mermaid
 flowchart TD
 A["Your Application"] --> B["guardrails.quilr.ai"]
 B --> C{"Auto-route to<br/>nearest server"}
-C -->|"US traffic"| D["guardrails-usa-1.quilr.ai"]
-C -->|"India traffic"| E["guardrails-india-1.quilr.ai"]
-C -->|"Future regions"| F["..."]
+C -->|"US Central West traffic"| D["guardrails-usa-1.quilr.ai"]
+C -->|"US East traffic"| E["guardrails-usa-2.quilr.ai"]
+C -->|"India traffic"| F["guardrails-india-1.quilr.ai"]
+C -->|"Future regions"| H["..."]
 D --> G["LLM Providers"]
 E --> G
 F --> G
+H --> G
 ```

 Each regional server runs the full QuilrAI pipeline - validation, scanning, transformation, routing, and observability - so there is no functional difference between endpoints.
@@ -48,17 +53,17 @@ Each regional server runs the full QuilrAI pipeline - validation, scanning, tran
 {
 label: "Attempt 1",
 items: [
-"→ guardrails.quilr.ai",
-"Auto-routes to nearest ✓",
-"Optimal latency ✓",
+"→ guardrails-usa-2.quilr.ai",
+"Direct to US East server ✓",
+"Primary regional endpoint ✓",
 ],
 },
 {
 label: "Attempt 2",
 items: [
 "→ guardrails-usa-1.quilr.ai",
-"Direct to US server ✓",
-"Bypasses auto-routing ✓",
+"Direct to US Central West server ✓",
+"Host-level redundancy ✓",
 ],
 },
 {
@@ -71,24 +76,26 @@ Each regional server runs the full QuilrAI pipeline - validation, scanning, tran
 },
 ]} />

-Even though `guardrails.quilr.ai` auto-routes to the nearest healthy server, we recommend a three-tier retry strategy that falls back to explicit regional endpoints:
+For production retry logic, use explicit regional endpoints. Start with the location-specific endpoint closest to your application, then fail over to other regional hosts. Do not include the global auto-routed endpoint in the retry chain.
+
+Example order for a US East deployment:

-1. **First attempt** - `guardrails.quilr.ai` - Uses auto-routing for optimal latency under normal conditions.
-2. **Second attempt** - `guardrails-usa-1.quilr.ai` - Direct connection to the US server, bypassing the auto-routing layer entirely.
+1. **First attempt** - `guardrails-usa-2.quilr.ai` - Direct connection to the nearest regional server.
+2. **Second attempt** - `guardrails-usa-1.quilr.ai` - Direct connection to another US server for host-level redundancy.
 3. **Third attempt** - `guardrails-india-1.quilr.ai` - Targets a geographically distinct server for maximum redundancy.

 ### Why retry with regional endpoints?

-Auto-routing handles most failure scenarios transparently. However, explicit regional fallbacks protect against edge cases that auto-routing alone cannot cover:
+Explicit regional fallbacks protect against edge cases that auto-routing alone cannot cover:

-- **DNS or routing-layer issues** - If the global endpoint's routing layer itself is degraded, direct regional URLs bypass it entirely.
-- **Auto-routing detection latency** - The auto-router takes 3-7 seconds to detect a downed host. During this window, your request may still be routed to the unhealthy server. Retrying with an explicit regional URL immediately targets a different host, avoiding the detection delay.
+- **DNS or routing-layer issues** - Direct regional URLs bypass the global routing layer entirely.
+- **Deterministic failover** - Retrying with an explicit regional URL immediately targets a different host instead of letting the auto-router choose.
 - **Regional propagation delays** - A server that has just recovered may not yet be visible to the auto-router. Hitting it directly avoids propagation lag.
 - **Geographic redundancy** - Retrying across regions ensures your request reaches an entirely independent infrastructure stack, eliminating single points of failure.

 The overhead is minimal - two additional fallback URLs in your retry logic - but the resilience improvement is significant.

-We recommend **one retry per QuilrAI host**. If a request fails on a given endpoint, move on to the next one rather than retrying the same host. This maximizes the chance of hitting a healthy server quickly, especially during the 3-7 second window before auto-routing detects a failure.
+We recommend **one retry per QuilrAI host**. If a request fails on a given endpoint, move on to the next one rather than retrying the same host. This maximizes the chance of hitting a healthy server quickly.

 ### Code Example

@@ -97,8 +104,8 @@ import time
 import httpx

 ENDPOINTS = [
-"https://guardrails.quilr.ai", # auto-routes to nearest
-"https://guardrails-usa-1.quilr.ai", # direct US fallback
+"https://guardrails-usa-2.quilr.ai", # primary US East endpoint
+"https://guardrails-usa-1.quilr.ai", # direct US Central West fallback
 "https://guardrails-india-1.quilr.ai", # direct India fallback
 ]

@@ -124,8 +131,8 @@ import time
 from openai import OpenAI

 ENDPOINTS = [
-"https://guardrails.quilr.ai/openai_compatible/v1", # auto-routes to nearest
-"https://guardrails-usa-1.quilr.ai/openai_compatible/v1", # direct US fallback
+"https://guardrails-usa-2.quilr.ai/openai_compatible/v1", # primary US East endpoint
+"https://guardrails-usa-1.quilr.ai/openai_compatible/v1", # direct US Central West fallback
 "https://guardrails-india-1.quilr.ai/openai_compatible/v1", # direct India fallback
 ]

@@ -146,8 +153,8 @@ def call_llm(messages: list) -> str:

 ```javascript
 const ENDPOINTS = [
-"https://guardrails.quilr.ai", // auto-routes to nearest
-"https://guardrails-usa-1.quilr.ai", // direct US fallback
+"https://guardrails-usa-2.quilr.ai", // primary US East endpoint
+"https://guardrails-usa-1.quilr.ai", // direct US Central West fallback
 "https://guardrails-india-1.quilr.ai", // direct India fallback
 ];

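As a reading aid, the retry order that this commit documents in `docs/llm-gateway/ha-and-sla.md` (one attempt per regional host, no global endpoint in the chain) could be sketched roughly as follows. This is not code from the commit: `call_with_failover` and the `send` callback are hypothetical names introduced for illustration, and the sketch assumes the caller supplies a function that performs the actual request and raises on failure.

```python
from typing import Callable, Sequence

# Regional hosts in the order the updated docs suggest for a US East deployment.
ENDPOINTS = [
    "https://guardrails-usa-2.quilr.ai",    # primary: US East
    "https://guardrails-usa-1.quilr.ai",    # fallback: US Central West
    "https://guardrails-india-1.quilr.ai",  # fallback: India
]


def call_with_failover(
    send: Callable[[str], str],
    endpoints: Sequence[str] = ENDPOINTS,
) -> str:
    """Try each endpoint exactly once, in order, and return the first success.

    `send` is a hypothetical caller-supplied function: it receives a base URL,
    performs the request, and raises an exception if the request fails.
    """
    errors = []
    for base_url in endpoints:
        try:
            return send(base_url)
        except Exception as exc:  # deliberately broad for a sketch; narrow in real code
            errors.append(f"{base_url}: {exc}")
    # Every host was tried once and all of them failed.
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```

Passing the request logic in as `send` keeps the ordering policy separate from the HTTP client, so the same loop could wrap `httpx`, the OpenAI client, or a test double.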