Proposal: Add gen_ai.gateway.* attributes for AI routing/gateway layers #3342

@seawatts

Problem Statement

AI gateways and routers (OpenRouter, Portkey, LiteLLM, Martian, etc.) are becoming increasingly common in the GenAI ecosystem. These services act as an intermediary layer between applications and model providers, offering features like:

  • Model routing and fallback logic
  • Cost optimization and load balancing
  • Unified API across multiple providers
  • Usage tracking and rate limiting
  • Data residency and compliance routing
  • Bring Your Own Key (BYOK) support

The current GenAI semantic conventions don't have a standardized way to represent this gateway layer in telemetry. Users need to distinguish between issues at the gateway level vs. the underlying provider level for proper observability.

Proposed Solution

Add a gen_ai.gateway.* attribute namespace to identify when requests are routed through an AI gateway/router.


Attribute Categories

1. Core Attributes

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.name | string | The name of the AI gateway/router service | openrouter, portkey, litellm, martian |
| gen_ai.gateway.version | string | Version of the gateway service | 1.0.0, 2024.1 |
| gen_ai.gateway.request.id | string | Unique identifier for the routed request (distinct from gen_ai.response.id which is the provider's ID) | gen-abc123xyz |
| gen_ai.gateway.app.id | string | Application identifier registered with the gateway | app_123456 |
| gen_ai.gateway.origin | string | Client origin/referrer URL | https://myapp.com/ |

2. Model Resolution Attributes

Gateways often resolve model aliases, auto-select models, or map requested models to canonical identifiers.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.request.model | string | The model identifier as requested through the gateway (may be an alias or auto-routed) | auto, gpt-4-turbo, anthropic/claude-3-sonnet |
| gen_ai.gateway.response.models | string[] | The actual model(s) resolved/selected by the gateway (includes fallback models if attempted) | ["anthropic/claude-3-5-sonnet-20241022"], ["openai/gpt-4", "anthropic/claude-3-5-sonnet"] |
| gen_ai.gateway.model.alias_resolved | boolean | Whether an alias was resolved to a canonical model | true, false |
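
As a rough illustration of how these three attributes relate, here is a minimal sketch of alias resolution. The `ALIASES` table and the `resolve_model` helper are hypothetical and not taken from any particular gateway; a real gateway would have its own resolution logic.

```python
from typing import Dict, List, Union

AttrValue = Union[str, bool, List[str]]

# Hypothetical alias table; a real gateway would load this from its own config.
ALIASES: Dict[str, str] = {
    "gpt-4-turbo": "openai/gpt-4-turbo",
}


def resolve_model(requested: str) -> Dict[str, AttrValue]:
    """Resolve a requested model string and return the proposed attributes."""
    # Fall back to the requested string when it is not a known alias.
    canonical = ALIASES.get(requested, requested)
    return {
        "gen_ai.gateway.request.model": requested,
        "gen_ai.gateway.response.models": [canonical],
        # True only when the alias table changed the model identifier.
        "gen_ai.gateway.model.alias_resolved": canonical != requested,
    }
```

Keeping the original request string in gen_ai.gateway.request.model means the standard gen_ai.request.model can carry the resolved identifier without losing what the client actually asked for.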

3. Routing Strategy Attributes

Gateways decide where to send each request according to a routing strategy.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.route.strategy | string | The routing strategy used | lowest_cost, lowest_latency, highest_throughput, round_robin, weighted_random, manual_order, fallback |
| gen_ai.gateway.load_balance.enabled | boolean | Whether load balancing was applied | true, false |
| gen_ai.gateway.session.hit | boolean | Whether sticky session cache was used (for cache affinity) | true, false |
| gen_ai.gateway.session.id | string | Session identifier for sticky routing (distinct from gen_ai.conversation.id which is for chat context) | sess_abc123 |
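
To make the strategy values concrete, here is a minimal sketch of how a gateway might pick an endpoint and report the decision. The endpoint fields (`cost_per_token`, `latency_p50`, `throughput_p50`, `status`) and the `select_endpoint` helper are assumptions made for illustration; only three of the listed strategies are covered.

```python
from typing import Callable, Dict, List, Tuple

Endpoint = Dict[str, object]

# Each strategy maps to a sort key; the smallest key wins.
_SORT_KEYS: Dict[str, Callable[[Endpoint], float]] = {
    "lowest_cost": lambda e: e["cost_per_token"],
    "lowest_latency": lambda e: e["latency_p50"],
    "highest_throughput": lambda e: -e["throughput_p50"],
}


def select_endpoint(
    endpoints: List[Endpoint], strategy: str
) -> Tuple[Endpoint, Dict[str, str]]:
    """Pick an endpoint for `strategy` and return it with gateway attributes."""
    if strategy not in _SORT_KEYS:
        raise ValueError(f"unsupported strategy: {strategy}")
    # Endpoints marked "down" are never selected in this sketch.
    available = [e for e in endpoints if e["status"] != "down"]
    if not available:
        raise LookupError("no available endpoint")
    chosen = min(available, key=_SORT_KEYS[strategy])
    attributes = {
        "gen_ai.gateway.route.strategy": strategy,
        "gen_ai.gateway.endpoint.status": str(chosen["status"]),
    }
    return chosen, attributes
```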

4. Fallback Attributes

Gateways implement fallback mechanisms when providers fail or are unavailable.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.fallback.used | boolean | Whether a fallback provider/model was used | true, false |
| gen_ai.gateway.fallback.reason | string | Reason fallback was triggered | rate_limit, timeout, provider_error, capacity, model_unavailable, endpoint_status, moderation_blocked |
| gen_ai.gateway.fallback.attempt_number | int | Which attempt this is (1 = first try, 2 = first fallback, etc.) | 1, 2, 3 |
| gen_ai.gateway.fallback.total_attempts | int | Total number of attempts made | 3 |
| gen_ai.gateway.fallback.latency_wasted | double | Time spent on failed attempts before success (seconds) | 1.5 |
| gen_ai.gateway.fallback.model_level | boolean | Whether fallback occurred at the model level (vs provider level) | true, false |
| gen_ai.gateway.providers.attempted | string[] | List of providers attempted in order | ["openai", "anthropic"] |

5. Performance & Latency Attributes

Gateways track provider performance metrics and request latencies.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.latency | double | Total gateway processing latency (seconds) | 0.511 |
| gen_ai.gateway.latency.moderation | double | Time spent on content moderation (seconds) | 0.214 |
| gen_ai.gateway.latency.generation | double | Time spent on upstream generation (seconds) | 0.719 |
| gen_ai.gateway.endpoint.latency_p50 | double | 50th percentile latency for selected endpoint (seconds) | 0.5 |
| gen_ai.gateway.endpoint.latency_p99 | double | 99th percentile latency for selected endpoint (seconds) | 2.1 |
| gen_ai.gateway.endpoint.throughput_p50 | double | 50th percentile throughput (tokens/sec) | 150.0 |
| gen_ai.gateway.endpoint.uptime_percent | double | Provider endpoint uptime percentage (0-100) | 99.5 |
| gen_ai.gateway.endpoint.status | string | Current status of the selected endpoint | default, degraded, down, deprioritized |

6. Streaming & Request State Attributes

Gateways track the state and mode of requests.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.streamed | boolean | Whether the response was streamed | true, false |
| gen_ai.gateway.cancelled | boolean | Whether the request was cancelled by the client | true, false |

7. Token Detail Attributes (Provider-Specific Breakdowns)

Standard token counts use the existing gen_ai.usage.input_tokens and gen_ai.usage.output_tokens attributes. The gateway attributes below capture additional token breakdowns that providers report.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.usage.tokens.reasoning | int | Reasoning/thinking tokens (e.g., o1, Claude thinking) | 150 |
| gen_ai.gateway.usage.tokens.cached | int | Tokens served from provider's prompt cache | 500 |
| gen_ai.gateway.usage.tokens.output_images | int | Image output tokens | 1024 |

8. Media & Tool Usage Attributes

Gateways track media inputs/outputs and tool usage.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.media.input_count | int | Number of media items in input (images, files) | 3 |
| gen_ai.gateway.media.input_audio_count | int | Number of audio inputs | 1 |
| gen_ai.gateway.media.output_count | int | Number of media items in output | 2 |
| gen_ai.gateway.search_results_count | int | Number of web search results used | 5 |

9. Caching Attributes

Gateways can leverage prompt caching and session caching for efficiency.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.cache.hit | boolean | Whether gateway-level cache was hit | true, false |
| gen_ai.gateway.cache.type | string | Type of cache hit | prompt, session, response |
| gen_ai.gateway.cache.tokens_saved | int | Number of tokens served from cache | 1500 |
| gen_ai.gateway.cache.cost_saved | double | Cost savings from caching (USD) | 0.0015 |
| gen_ai.gateway.cache.discount | double | Cache discount applied (USD, negative value) | -0.0005 |

10. Rate Limiting Attributes

Gateways enforce rate limits at various levels.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.rate_limit.hit | boolean | Whether a rate limit was triggered | true, false |
| gen_ai.gateway.rate_limit.type | string | Type of rate limit that was hit | user, endpoint, api_key, ip, free_tier |
| gen_ai.gateway.rate_limit.name | string | Name/identifier of the rate limit | user_rpm, endpoint_rpd |
| gen_ai.gateway.rate_limit.remaining | int | Remaining requests in the current window | 45 |
| gen_ai.gateway.rate_limit.reset_at | string | When the rate limit resets (ISO 8601 timestamp) | 2024-01-15T12:00:00Z |
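
Because gen_ai.gateway.rate_limit.reset_at is proposed as an ISO 8601 string, a consumer has to parse it before it can compute a back-off. A minimal sketch, assuming the attributes arrive as a plain dict; `seconds_until_reset` is a hypothetical helper name:

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def seconds_until_reset(
    attributes: Mapping[str, object], now: datetime
) -> Optional[float]:
    """Return seconds until the rate limit resets, or None if no limit was hit."""
    if not attributes.get("gen_ai.gateway.rate_limit.hit"):
        return None
    reset_at = str(attributes["gen_ai.gateway.rate_limit.reset_at"])
    # Older Python versions reject a trailing "Z" in fromisoformat, so normalise it.
    reset = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    # Never report a negative wait when the window has already reset.
    return max(0.0, (reset - now).total_seconds())
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the helper deterministic and easy to test.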

11. Quota & Budget Attributes

Gateways track usage against quotas and budgets.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.quota.credits_remaining | double | Remaining credits/budget | 50.25 |
| gen_ai.gateway.quota.limit_type | string | Type of quota limit | total, daily, weekly, monthly |
| gen_ai.gateway.quota.limit_remaining | double | Remaining quota value | 100.0 |
| gen_ai.gateway.quota.guardrail_hit | boolean | Whether a budget guardrail was triggered | true, false |

12. Cost Attributes

Gateways track costs at both the gateway and provider level.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.usage.cost | double | Total cost charged by the gateway (may include markup) | 0.001296 |
| gen_ai.gateway.usage.cost_currency | string | Currency for cost values | USD |
| gen_ai.gateway.usage.cost_upstream | double | Cost charged by the underlying provider | 0.001296 |
13. BYOK (Bring Your Own Key) Attributes

Gateways support customers using their own provider API keys.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.byok.enabled | boolean | Whether BYOK was used for this request | true, false |
| gen_ai.gateway.byok.provider | string | Which provider's customer key was used | openai, anthropic |
| gen_ai.gateway.byok.fee | double | Gateway service fee for BYOK (USD) | 0.0001 |

14. Compliance & Data Region Attributes

Gateways handle data residency and compliance requirements.

| Attribute | Type | Description | Example Values |
| --- | --- | --- | --- |
| gen_ai.gateway.data_region | string | Data region used for routing | global, europe, us-east, asia-pacific |
| gen_ai.gateway.compliance.hipaa | boolean | Whether HIPAA-compliant routing was used | true, false |
| gen_ai.gateway.compliance.soc2 | boolean | Whether SOC2-compliant routing was used | true, false |
| gen_ai.gateway.data_policy | string | Data policy applied | no_logging, no_training, full_logging |

Relationship to Existing gen_ai.* Attributes

The gateway attributes complement (not replace) existing gen_ai semantic conventions. Use the standard attributes for provider-level data:

| Use This (Standard) | For | Gateway Adds |
| --- | --- | --- |
| gen_ai.provider.name | The provider that handled the request | gen_ai.gateway.providers.attempted for fallback chain |
| gen_ai.request.model | The resolved model sent to provider | gen_ai.gateway.request.model for original alias/request |
| gen_ai.response.model | The model that responded | gen_ai.gateway.response.models for fallback chain |
| gen_ai.response.id | Provider's response ID | gen_ai.gateway.request.id for gateway's tracking ID |
| gen_ai.operation.name | Operation type (chat, embeddings) | - (use standard attribute) |
| gen_ai.usage.input_tokens | Input token count | gen_ai.gateway.usage.tokens.cached for cache breakdown |
| gen_ai.usage.output_tokens | Output token count | gen_ai.gateway.usage.tokens.reasoning for reasoning breakdown |
| gen_ai.conversation.id | Chat thread/session ID | gen_ai.gateway.session.id for routing affinity (different purpose) |
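
One way to read this mapping is that the standard attributes describe the final hop, while the gateway lists describe the whole chain. The sketch below checks that reading on a dict of span attributes. It assumes the last entry of each gateway list corresponds to the provider/model that actually responded, which is an inference from the fallback example rather than something the proposal states; `check_consistency` is a hypothetical helper.

```python
from typing import List, Mapping

# (standard attribute, gateway chain attribute) pairs from the mapping above.
_PAIRS = [
    ("gen_ai.provider.name", "gen_ai.gateway.providers.attempted"),
    ("gen_ai.response.model", "gen_ai.gateway.response.models"),
]


def check_consistency(attributes: Mapping[str, object]) -> List[str]:
    """Return human-readable problems; an empty list means nothing looked off."""
    problems: List[str] = []
    for standard_key, chain_key in _PAIRS:
        standard = attributes.get(standard_key)
        chain = attributes.get(chain_key)
        # Only compare when both sides are present and the chain is non-empty.
        if standard is None or not chain:
            continue
        if chain[-1] != standard:
            problems.append(
                f"{standard_key}={standard!r} does not match last entry of "
                f"{chain_key}={chain[-1]!r}"
            )
    return problems
```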

Example Usage

Standard Request

# Standard GenAI attributes (use these for provider data)
gen_ai.provider.name = 'aws.bedrock'
gen_ai.request.model = 'anthropic/claude-4.5-haiku-20251001'
gen_ai.response.model = 'anthropic/claude-4.5-haiku-20251001'
gen_ai.response.id = '98b87d18-b2f5-4dcb-b0b0-f5a8e4f12ca4'
gen_ai.operation.name = 'chat'
gen_ai.usage.input_tokens = 841
gen_ai.usage.output_tokens = 22
gen_ai.response.finish_reasons = ['stop']

# Gateway layer attributes (additional context)
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.request.id = 'gen-1769536032-2L9dWejMDV7jMP8m7LF5'
gen_ai.gateway.origin = 'https://claude.ai/'
gen_ai.gateway.streamed = true
gen_ai.gateway.cancelled = false
gen_ai.gateway.latency = 0.511
gen_ai.gateway.latency.moderation = 0.214
gen_ai.gateway.latency.generation = 0.719
gen_ai.gateway.usage.tokens.reasoning = 0
gen_ai.gateway.usage.tokens.cached = 0
gen_ai.gateway.usage.cost = 0.001296
gen_ai.gateway.usage.cost_upstream = 0.001296
gen_ai.gateway.byok.enabled = false
gen_ai.gateway.fallback.used = false
gen_ai.gateway.endpoint.status = 'default'

Fallback Scenario

# Standard attributes reflect final successful provider
gen_ai.provider.name = 'anthropic'
gen_ai.response.model = 'anthropic/claude-3-5-sonnet'

# Gateway attributes show the full story
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.response.models = ['openai/gpt-4', 'anthropic/claude-3-5-sonnet']
gen_ai.gateway.fallback.used = true
gen_ai.gateway.fallback.reason = 'endpoint_status'
gen_ai.gateway.fallback.attempt_number = 2
gen_ai.gateway.fallback.total_attempts = 2
gen_ai.gateway.fallback.latency_wasted = 0.8
gen_ai.gateway.providers.attempted = ['openai', 'anthropic']
gen_ai.gateway.route.strategy = 'fallback'
gen_ai.gateway.endpoint.status = 'down'
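
For reference, a fallback loop that produces attributes shaped like the scenario above might look roughly like this. It is only a sketch: `call_with_fallback`, `classify_failure`, and the `send` callback are hypothetical names, and the exception-to-reason mapping is an assumption rather than anything a specific gateway does.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


def classify_failure(exc: Exception) -> str:
    """Map an exception to a proposed fallback.reason value (assumed mapping)."""
    return "timeout" if isinstance(exc, TimeoutError) else "provider_error"


def call_with_fallback(
    candidates: List[Tuple[str, str]],
    send: Callable[[str, str], str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, Dict[str, object]]:
    """Try each (provider, model) pair in order; return (response, attributes)."""
    providers: List[str] = []
    models: List[str] = []
    wasted = 0.0
    reason: Optional[str] = None
    for attempt, (provider, model) in enumerate(candidates, start=1):
        providers.append(provider)
        models.append(model)
        started = clock()
        try:
            response = send(provider, model)
        except Exception as exc:
            # Time spent on a failed attempt counts as wasted latency.
            wasted += clock() - started
            if reason is None:
                reason = classify_failure(exc)
            continue
        attributes: Dict[str, object] = {
            # Standard attributes reflect the provider that finally succeeded.
            "gen_ai.provider.name": provider,
            "gen_ai.response.model": model,
            "gen_ai.gateway.response.models": models,
            "gen_ai.gateway.providers.attempted": providers,
            "gen_ai.gateway.fallback.used": attempt > 1,
            "gen_ai.gateway.fallback.attempt_number": attempt,
            "gen_ai.gateway.fallback.total_attempts": attempt,
            "gen_ai.gateway.fallback.latency_wasted": wasted,
        }
        if reason is not None:
            attributes["gen_ai.gateway.fallback.reason"] = reason
        return response, attributes
    raise RuntimeError("all candidates failed")
```

The `clock` parameter is injectable so that latency_wasted can be tested without real delays.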

BYOK Request

gen_ai.provider.name = 'openai'
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.byok.enabled = true
gen_ai.gateway.byok.provider = 'openai'
gen_ai.gateway.byok.fee = 0.0001
gen_ai.gateway.usage.cost = 0.0001
gen_ai.gateway.usage.cost_upstream = 0.0024
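
To show how the cost attributes could be combined downstream, here is a small sketch. It rests on two assumptions that the proposal does not spell out: with BYOK the provider bills the customer's own key, so gen_ai.gateway.usage.cost is only the service fee; without BYOK the gateway charge already includes the upstream cost. `cost_breakdown` is a hypothetical helper.

```python
from typing import Dict, Mapping


def cost_breakdown(attributes: Mapping[str, object]) -> Dict[str, float]:
    """Split a request's spend into the gateway's share and the provider's share."""
    cost = float(attributes.get("gen_ai.gateway.usage.cost", 0.0))
    upstream = float(attributes.get("gen_ai.gateway.usage.cost_upstream", 0.0))
    if attributes.get("gen_ai.gateway.byok.enabled", False):
        # Assumption: gateway fee and provider bill are paid separately.
        return {
            "gateway_share": cost,
            "provider_share": upstream,
            "total_spend": cost + upstream,
        }
    # Assumption: the gateway charge covers the upstream cost plus any markup.
    return {
        "gateway_share": cost - upstream,
        "provider_share": upstream,
        "total_spend": cost,
    }
```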

Request with Media & Search

gen_ai.provider.name = 'openai'
gen_ai.gateway.name = 'openrouter'
gen_ai.gateway.media.input_count = 2
gen_ai.gateway.media.input_audio_count = 0
gen_ai.gateway.media.output_count = 1
gen_ai.gateway.search_results_count = 5
gen_ai.gateway.usage.tokens.output_images = 1024

Use Cases

  1. Debugging latency issues: Separate gateway latency, moderation latency, and generation latency to pinpoint bottlenecks
  2. Cost attribution: Track costs at both the gateway and provider level, including markups and BYOK fees
  3. Failure analysis: Identify whether failures occurred at the gateway layer or provider layer; understand endpoint status
  4. Fallback monitoring: Track how often fallbacks are triggered, why (including endpoint_status), and how many attempts were needed
  5. Model resolution tracking: Understand how aliases/auto-routing resolve to actual models
  6. Performance optimization: Use endpoint performance heuristics to understand routing decisions
  7. Rate limit debugging: Identify which rate limits are being hit and when they reset
  8. Compliance auditing: Track data region routing and compliance requirements
  9. BYOK cost analysis: Separate BYOK service fees from upstream provider costs
  10. Cache efficiency: Track cache hit rates and cost savings from caching
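
As an illustration of the fallback-monitoring use case, the sketch below aggregates exported span attributes into a fallback rate and a per-reason count. It assumes spans are available as plain dicts of attributes; `fallback_summary` is a hypothetical helper and not part of any SDK.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def fallback_summary(spans: Iterable[Mapping[str, object]]) -> Dict[str, object]:
    """Summarise how often gateway requests fell back, and why."""
    # Only spans that went through a gateway are relevant here.
    gateway_spans = [s for s in spans if "gen_ai.gateway.name" in s]
    fallbacks = [s for s in gateway_spans if s.get("gen_ai.gateway.fallback.used")]
    reasons = Counter(
        str(s.get("gen_ai.gateway.fallback.reason", "unknown")) for s in fallbacks
    )
    rate = len(fallbacks) / len(gateway_spans) if gateway_spans else 0.0
    return {"fallback_rate": rate, "reasons": dict(reasons)}
```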

Prior Art

The spec already acknowledges that proxies and hosting platforms exist:

"Multiple providers, including Azure OpenAI, Gemini, and AI hosting platforms are accessible using the OpenAI REST API and corresponding client libraries, but may proxy or host models from different providers."

However, there's no standardized attribute to identify these gateway/proxy layers.


Additional Context

  • AI gateways are distinct from providers: they route requests to models rather than hosting them
  • This is similar to how HTTP semantic conventions distinguish between proxies and origin servers
  • Multiple gateway vendors would benefit from standardization here
  • These attributes were derived from real-world gateway implementations

Happy to discuss naming alternatives or prioritization of which attributes are most critical for an initial release.
