## Problem Statement
AI gateways and routers (OpenRouter, Portkey, LiteLLM, Martian, etc.) are becoming increasingly common in the GenAI ecosystem. These services act as an intermediary layer between applications and model providers, offering features like:
- Model routing and fallback logic
- Cost optimization and load balancing
- Unified API across multiple providers
- Usage tracking and rate limiting
- Data residency and compliance routing
- Bring Your Own Key (BYOK) support
The current GenAI semantic conventions don't have a standardized way to represent this gateway layer in telemetry. Users need to distinguish between issues at the gateway level vs. the underlying provider level for proper observability.
## Proposed Solution

Add a `gen_ai.gateway.*` attribute namespace to identify when requests are routed through an AI gateway/router.
## Attribute Categories

### 1. Core Attributes

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.name` | string | The name of the AI gateway/router service | `openrouter`, `portkey`, `litellm`, `martian` |
| `gen_ai.gateway.version` | string | Version of the gateway service | `1.0.0`, `2024.1` |
| `gen_ai.gateway.request.id` | string | Unique identifier for the routed request (distinct from `gen_ai.response.id`, which is the provider's ID) | `gen-abc123xyz` |
| `gen_ai.gateway.app.id` | string | Application identifier registered with the gateway | `app_123456` |
| `gen_ai.gateway.origin` | string | Client origin/referrer URL | `https://myapp.com/` |
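To make the shape concrete, here is a minimal sketch of how an instrumentation might assemble these core attributes before attaching them to a span. The `gateway_core_attributes` helper is hypothetical (not part of any SDK); only the attribute keys come from this proposal.

```python
def gateway_core_attributes(name, request_id, version=None, app_id=None, origin=None):
    """Assemble the proposed core gen_ai.gateway.* attributes into a dict.

    Hypothetical helper: `name` and `request_id` are assumed to always be
    known; the remaining fields are optional and omitted when absent.
    """
    attrs = {
        "gen_ai.gateway.name": name,
        "gen_ai.gateway.request.id": request_id,
    }
    if version is not None:
        attrs["gen_ai.gateway.version"] = version
    if app_id is not None:
        attrs["gen_ai.gateway.app.id"] = app_id
    if origin is not None:
        attrs["gen_ai.gateway.origin"] = origin
    return attrs


attrs = gateway_core_attributes(
    "openrouter", "gen-abc123xyz", origin="https://myapp.com/"
)
```

The resulting dict can then be passed to whatever span-attribute API the instrumentation uses.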
### 2. Model Resolution Attributes

Gateways often resolve model aliases, auto-select models, or map requested models to canonical identifiers.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.request.model` | string | The model identifier as requested through the gateway (may be an alias or auto-routed) | `auto`, `gpt-4-turbo`, `anthropic/claude-3-sonnet` |
| `gen_ai.gateway.response.models` | string[] | The actual model(s) resolved/selected by the gateway (includes fallback models if attempted) | `["anthropic/claude-3-5-sonnet-20241022"]`, `["openai/gpt-4", "anthropic/claude-3-5-sonnet"]` |
| `gen_ai.gateway.model.alias_resolved` | boolean | Whether an alias was resolved to a canonical model | `true`, `false` |
### 3. Routing Strategy Attributes

Gateways make intelligent decisions about where to send requests based on various strategies.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.route.strategy` | string | The routing strategy used | `lowest_cost`, `lowest_latency`, `highest_throughput`, `round_robin`, `weighted_random`, `manual_order`, `fallback` |
| `gen_ai.gateway.load_balance.enabled` | boolean | Whether load balancing was applied | `true`, `false` |
| `gen_ai.gateway.session.hit` | boolean | Whether a sticky-session cache was used (for cache affinity) | `true`, `false` |
| `gen_ai.gateway.session.id` | string | Session identifier for sticky routing (distinct from `gen_ai.conversation.id`, which is for chat context) | `sess_abc123` |
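To illustrate how a strategy value maps to a routing decision, here is a sketch of a selector for two of the proposed strategies. The `select_endpoint` function and the `cost`/`latency_p50` endpoint fields are hypothetical; only the strategy names come from this proposal.

```python
def select_endpoint(endpoints, strategy):
    """Pick an endpoint according to one of the proposed routing strategies.

    Illustrative only: each endpoint is a dict with hypothetical `cost`
    and `latency_p50` fields. Only a few strategies are sketched here;
    a real gateway would also implement weighted_random, round_robin, etc.
    """
    if strategy == "lowest_cost":
        return min(endpoints, key=lambda e: e["cost"])
    if strategy == "lowest_latency":
        return min(endpoints, key=lambda e: e["latency_p50"])
    if strategy == "manual_order":
        return endpoints[0]  # caller-supplied preference order
    raise ValueError(f"unknown strategy: {strategy}")


endpoints = [
    {"provider": "openai", "cost": 0.0030, "latency_p50": 0.5},
    {"provider": "anthropic", "cost": 0.0024, "latency_p50": 0.8},
]
chosen = select_endpoint(endpoints, "lowest_cost")
```

The strategy that was actually applied would then be recorded as `gen_ai.gateway.route.strategy` on the span.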
### 4. Fallback Attributes

Gateways implement fallback mechanisms when providers fail or are unavailable.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.fallback.used` | boolean | Whether a fallback provider/model was used | `true`, `false` |
| `gen_ai.gateway.fallback.reason` | string | Reason fallback was triggered | `rate_limit`, `timeout`, `provider_error`, `capacity`, `model_unavailable`, `endpoint_status`, `moderation_blocked` |
| `gen_ai.gateway.fallback.attempt_number` | int | Which attempt this is (1 = first try, 2 = first fallback, etc.) | `1`, `2`, `3` |
| `gen_ai.gateway.fallback.total_attempts` | int | Total number of attempts made | `3` |
| `gen_ai.gateway.fallback.latency_wasted` | double | Time spent on failed attempts before success (seconds) | `1.5` |
| `gen_ai.gateway.fallback.model_level` | boolean | Whether fallback occurred at the model level (vs. provider level) | `true`, `false` |
| `gen_ai.gateway.providers.attempted` | string[] | List of providers attempted, in order | `["openai", "anthropic"]` |
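As a sketch of how these values relate to each other, here is a hypothetical helper that derives the fallback attributes from an ordered attempt log. The `(provider, duration, succeeded)` record shape is an assumption for illustration; the attribute semantics follow the table above.

```python
def fallback_attributes(attempts):
    """Derive the proposed fallback attributes from an ordered attempt log.

    Each attempt is a hypothetical (provider, duration_seconds, succeeded)
    tuple. The final attempt is assumed to be the successful one, so the
    successful attempt_number equals the total attempt count.
    """
    providers = [provider for provider, _, _ in attempts]
    # latency_wasted counts only time spent on attempts that failed
    wasted = sum(duration for _, duration, ok in attempts if not ok)
    return {
        "gen_ai.gateway.fallback.used": len(attempts) > 1,
        "gen_ai.gateway.fallback.attempt_number": len(attempts),
        "gen_ai.gateway.fallback.total_attempts": len(attempts),
        "gen_ai.gateway.fallback.latency_wasted": round(wasted, 3),
        "gen_ai.gateway.providers.attempted": providers,
    }


# OpenAI attempt fails after 0.8 s, Anthropic fallback succeeds
attrs = fallback_attributes([("openai", 0.8, False), ("anthropic", 1.2, True)])
```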
### 5. Performance & Latency Attributes

Gateways track provider performance metrics and request latencies.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.latency` | double | Total gateway processing latency (seconds) | `0.511` |
| `gen_ai.gateway.latency.moderation` | double | Time spent on content moderation (seconds) | `0.214` |
| `gen_ai.gateway.latency.generation` | double | Time spent on upstream generation (seconds) | `0.719` |
| `gen_ai.gateway.endpoint.latency_p50` | double | 50th percentile latency for the selected endpoint (seconds) | `0.5` |
| `gen_ai.gateway.endpoint.latency_p99` | double | 99th percentile latency for the selected endpoint (seconds) | `2.1` |
| `gen_ai.gateway.endpoint.throughput_p50` | double | 50th percentile throughput (tokens/sec) | `150.0` |
| `gen_ai.gateway.endpoint.uptime_percent` | double | Provider endpoint uptime percentage (0-100) | `99.5` |
| `gen_ai.gateway.endpoint.status` | string | Current status of the selected endpoint | `default`, `degraded`, `down`, `deprioritized` |
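For clarity on how the percentile attributes might be populated, here is a sketch that computes p50/p99 from raw per-request latency samples using a simple nearest-rank percentile. The helper and the sampling approach are illustrative assumptions; real gateways may use streaming sketches or windowed histograms instead.

```python
def latency_percentiles(samples):
    """Compute the proposed p50/p99 endpoint latency attributes from raw
    per-request latency samples (seconds). Uses the nearest-rank method
    for simplicity; illustrative only."""
    ordered = sorted(samples)

    def pct(p):
        # nearest-rank: index of the sample covering the p-th percentile
        idx = max(0, int(round(p / 100 * len(ordered))) - 1)
        return ordered[idx]

    return {
        "gen_ai.gateway.endpoint.latency_p50": pct(50),
        "gen_ai.gateway.endpoint.latency_p99": pct(99),
    }


attrs = latency_percentiles([0.4, 0.5, 0.6, 2.1])
```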
### 6. Streaming & Request State Attributes

Gateways track the state and mode of requests.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.streamed` | boolean | Whether the response was streamed | `true`, `false` |
| `gen_ai.gateway.cancelled` | boolean | Whether the request was cancelled by the client | `true`, `false` |
### 7. Token Detail Attributes (Provider-Specific Breakdowns)

Standard token counts use the existing `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`. These gateway attributes capture additional token breakdowns that providers report.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.usage.tokens.reasoning` | int | Reasoning/thinking tokens (e.g., o1, Claude thinking) | `150` |
| `gen_ai.gateway.usage.tokens.cached` | int | Tokens served from the provider's prompt cache | `500` |
| `gen_ai.gateway.usage.tokens.output_images` | int | Image output tokens | `1024` |
### 8. Media & Tool Usage Attributes

Gateways track media inputs/outputs and tool usage.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.media.input_count` | int | Number of media items in the input (images, files) | `3` |
| `gen_ai.gateway.media.input_audio_count` | int | Number of audio inputs | `1` |
| `gen_ai.gateway.media.output_count` | int | Number of media items in the output | `2` |
| `gen_ai.gateway.search_results_count` | int | Number of web search results used | `5` |
### 9. Caching Attributes

Gateways can leverage prompt caching and session caching for efficiency.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.cache.hit` | boolean | Whether a gateway-level cache was hit | `true`, `false` |
| `gen_ai.gateway.cache.type` | string | Type of cache hit | `prompt`, `session`, `response` |
| `gen_ai.gateway.cache.tokens_saved` | int | Number of tokens served from cache | `1500` |
| `gen_ai.gateway.cache.cost_saved` | double | Cost savings from caching (USD) | `0.0015` |
| `gen_ai.gateway.cache.discount` | double | Cache discount applied (USD, negative value) | `-0.0005` |
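The relationship between `tokens_saved`, `cost_saved`, and the negative `discount` convention can be shown with simple arithmetic. This helper and its pricing inputs are hypothetical; only the attribute keys and the sign convention come from the table above.

```python
def cache_attributes(cached_tokens, price_per_token, discount_rate=1.0):
    """Derive the proposed cache attributes.

    Assumes (hypothetically) that cached tokens would otherwise be billed
    at `price_per_token`, and that `discount_rate` of that price is waived
    on a cache hit. `discount` is negative by the proposed convention.
    """
    saved = cached_tokens * price_per_token * discount_rate
    return {
        "gen_ai.gateway.cache.hit": cached_tokens > 0,
        "gen_ai.gateway.cache.tokens_saved": cached_tokens,
        "gen_ai.gateway.cache.cost_saved": round(saved, 6),
        "gen_ai.gateway.cache.discount": round(-saved, 6),
    }


# 1500 cached tokens at a hypothetical $0.000001/token
attrs = cache_attributes(1500, 0.000001)
```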
### 10. Rate Limiting Attributes

Gateways enforce rate limits at various levels.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.rate_limit.hit` | boolean | Whether a rate limit was triggered | `true`, `false` |
| `gen_ai.gateway.rate_limit.type` | string | Type of rate limit that was hit | `user`, `endpoint`, `api_key`, `ip`, `free_tier` |
| `gen_ai.gateway.rate_limit.name` | string | Name/identifier of the rate limit | `user_rpm`, `endpoint_rpd` |
| `gen_ai.gateway.rate_limit.remaining` | int | Remaining requests in the current window | `45` |
| `gen_ai.gateway.rate_limit.reset_at` | string | When the rate limit resets (ISO 8601 timestamp) | `2024-01-15T12:00:00Z` |
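Since `reset_at` is a string attribute, instrumentations need to format it consistently. Here is a sketch that derives the ISO 8601 UTC value from a hypothetical window start and window length; the helper and its inputs are assumptions, the timestamp format comes from the table above.

```python
from datetime import datetime, timedelta, timezone


def rate_limit_reset_at(window_start, window_seconds):
    """Format gen_ai.gateway.rate_limit.reset_at as an ISO 8601 UTC
    timestamp, given a hypothetical rate-limit window start and length."""
    reset = window_start + timedelta(seconds=window_seconds)
    return reset.strftime("%Y-%m-%dT%H:%M:%SZ")


# A 60-second window that opened at 11:59:00 UTC resets at 12:00:00 UTC
start = datetime(2024, 1, 15, 11, 59, 0, tzinfo=timezone.utc)
reset_at = rate_limit_reset_at(start, 60)
```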
### 11. Quota & Budget Attributes

Gateways track usage against quotas and budgets.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.quota.credits_remaining` | double | Remaining credits/budget | `50.25` |
| `gen_ai.gateway.quota.limit_type` | string | Type of quota limit | `total`, `daily`, `weekly`, `monthly` |
| `gen_ai.gateway.quota.limit_remaining` | double | Remaining quota value | `100.0` |
| `gen_ai.gateway.quota.guardrail_hit` | boolean | Whether a budget guardrail was triggered | `true`, `false` |
### 12. Cost Attributes

Gateways track costs at both the gateway and provider level.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.usage.cost` | double | Total cost charged by the gateway (may include markup) | `0.001296` |
| `gen_ai.gateway.usage.cost_currency` | string | Currency for cost values | `USD` |
| `gen_ai.gateway.usage.cost_upstream` | double | Cost charged by the underlying provider | `0.001296` |
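The split between gateway and upstream cost is what lets users compute markup in analysis. A minimal sketch, where the helper and the derived `markup` value are illustrative (markup is not a proposed attribute):

```python
def cost_attributes(gateway_cost, upstream_cost, currency="USD"):
    """Derive the proposed cost attributes plus an implied markup.

    The `markup` key is a derived analysis value for illustration,
    not one of the proposed attributes.
    """
    return {
        "gen_ai.gateway.usage.cost": gateway_cost,
        "gen_ai.gateway.usage.cost_currency": currency,
        "gen_ai.gateway.usage.cost_upstream": upstream_cost,
        "markup": round(gateway_cost - upstream_cost, 6),
    }


# Hypothetical request where the gateway charges slightly above upstream
attrs = cost_attributes(0.001425, 0.001296)
```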
### 13. BYOK (Bring Your Own Key) Attributes

Gateways support customers using their own provider API keys.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.byok.enabled` | boolean | Whether BYOK was used for this request | `true`, `false` |
| `gen_ai.gateway.byok.provider` | string | Which provider's customer key was used | `openai`, `anthropic` |
| `gen_ai.gateway.byok.fee` | double | Gateway service fee for BYOK (USD) | `0.0001` |
### 14. Compliance & Data Region Attributes

Gateways handle data residency and compliance requirements.

| Attribute | Type | Description | Example Values |
|---|---|---|---|
| `gen_ai.gateway.data_region` | string | Data region used for routing | `global`, `europe`, `us-east`, `asia-pacific` |
| `gen_ai.gateway.compliance.hipaa` | boolean | Whether HIPAA-compliant routing was used | `true`, `false` |
| `gen_ai.gateway.compliance.soc2` | boolean | Whether SOC 2-compliant routing was used | `true`, `false` |
| `gen_ai.gateway.data_policy` | string | Data policy applied | `no_logging`, `no_training`, `full_logging` |
## Relationship to Existing `gen_ai.*` Attributes

The gateway attributes complement (not replace) existing gen_ai semantic conventions. Use the standard attributes for provider-level data:

| Use This (Standard) | For | Gateway Adds |
|---|---|---|
| `gen_ai.provider.name` | The provider that handled the request | `gen_ai.gateway.providers.attempted` for the fallback chain |
| `gen_ai.request.model` | The resolved model sent to the provider | `gen_ai.gateway.request.model` for the original alias/request |
| `gen_ai.response.model` | The model that responded | `gen_ai.gateway.response.models` for the fallback chain |
| `gen_ai.response.id` | Provider's response ID | `gen_ai.gateway.request.id` for the gateway's tracking ID |
| `gen_ai.operation.name` | Operation type (chat, embeddings) | (use standard attribute) |
| `gen_ai.usage.input_tokens` | Input token count | `gen_ai.gateway.usage.tokens.cached` for cache breakdown |
| `gen_ai.usage.output_tokens` | Output token count | `gen_ai.gateway.usage.tokens.reasoning` for reasoning breakdown |
| `gen_ai.conversation.id` | Chat thread/session ID | `gen_ai.gateway.session.id` for routing affinity (different purpose) |
## Example Usage

### Standard Request

```text
# Standard GenAI attributes (use these for provider data)
gen_ai.provider.name = 'aws.bedrock'
gen_ai.request.model = 'anthropic/claude-4.5-haiku-20251001'
gen_ai.response.model = 'anthropic/claude-4.5-haiku-20251001'
gen_ai.response.id = '98b87d18-b2f5-4dcb-b0b0-f5a8e4f12ca4'
gen_ai.operation.name = 'chat'
gen_ai.usage.input_tokens = 841
gen_ai.usage.output_tokens = 22
gen_ai.response.finish_reasons = ['stop']

# Gateway layer attributes (additional context)
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.request.id = 'gen-1769536032-2L9dWejMDV7jMP8m7LF5'
gen_ai.gateway.origin = 'https://claude.ai/'
gen_ai.gateway.streamed = true
gen_ai.gateway.cancelled = false
gen_ai.gateway.latency = 0.511
gen_ai.gateway.latency.moderation = 0.214
gen_ai.gateway.latency.generation = 0.719
gen_ai.gateway.usage.tokens.reasoning = 0
gen_ai.gateway.usage.tokens.cached = 0
gen_ai.gateway.usage.cost = 0.001296
gen_ai.gateway.usage.cost_upstream = 0.001296
gen_ai.gateway.byok.enabled = false
gen_ai.gateway.fallback.used = false
gen_ai.gateway.endpoint.status = 'default'
```
### Fallback Scenario

```text
# Standard attributes reflect the final successful provider
gen_ai.provider.name = 'anthropic'
gen_ai.response.model = 'anthropic/claude-3-5-sonnet'

# Gateway attributes show the full story
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.response.models = ['openai/gpt-4', 'anthropic/claude-3-5-sonnet']
gen_ai.gateway.fallback.used = true
gen_ai.gateway.fallback.reason = 'endpoint_status'
gen_ai.gateway.fallback.attempt_number = 2
gen_ai.gateway.fallback.total_attempts = 2
gen_ai.gateway.fallback.latency_wasted = 0.8
gen_ai.gateway.providers.attempted = ['openai', 'anthropic']
gen_ai.gateway.route.strategy = 'fallback'
gen_ai.gateway.endpoint.status = 'down'
```
### BYOK Request

```text
gen_ai.provider.name = 'openai'
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.byok.enabled = true
gen_ai.gateway.byok.provider = 'openai'
gen_ai.gateway.byok.fee = 0.0001
gen_ai.gateway.usage.cost = 0.0001
gen_ai.gateway.usage.cost_upstream = 0.0024
```
### Request with Media & Search

```text
gen_ai.provider.name = 'openai'
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.media.input_count = 2
gen_ai.gateway.media.input_audio_count = 0
gen_ai.gateway.media.output_count = 1
gen_ai.gateway.search_results_count = 5
gen_ai.gateway.usage.tokens.output_images = 1024
```
## Use Cases

- **Debugging latency issues**: Separate gateway latency, moderation latency, and generation latency to pinpoint bottlenecks
- **Cost attribution**: Track costs at both the gateway and provider level, including markups and BYOK fees
- **Failure analysis**: Identify whether failures occurred at the gateway layer or provider layer; understand endpoint status
- **Fallback monitoring**: Track how often fallbacks are triggered, why (including `endpoint_status`), and how many attempts were needed
- **Model resolution tracking**: Understand how aliases/auto-routing resolve to actual models
- **Performance optimization**: Use endpoint performance heuristics to understand routing decisions
- **Rate limit debugging**: Identify which rate limits are being hit and when they reset
- **Compliance auditing**: Track data region routing and compliance requirements
- **BYOK cost analysis**: Separate BYOK service fees from upstream provider costs
- **Cache efficiency**: Track cache hit rates and cost savings from caching
## Prior Art

The spec already acknowledges that proxies and hosting platforms exist:

> "Multiple providers, including Azure OpenAI, Gemini, and AI hosting platforms are accessible using the OpenAI REST API and corresponding client libraries, but may proxy or host models from different providers."

However, there's no standardized attribute to identify these gateway/proxy layers.
## Additional Context

- AI gateways are distinct from providers: they don't host models, they route to them
- This is similar to how HTTP semantic conventions distinguish between proxies and origin servers
- Multiple gateway vendors would benefit from standardization here
- These attributes were derived from real-world gateway implementations

Happy to discuss naming alternatives or prioritization of which attributes are most critical for an initial release.