KV router statistics are automatically exposed by LLM workers and KV router components on the backend system status port (port 8081) with the `dynamo_component_kvstats_*` prefix. These metrics provide insights into GPU memory usage and cache efficiency:

- `dynamo_component_kvstats_active_blocks`: Number of KV cache blocks currently in use (gauge)
- `dynamo_component_kvstats_total_blocks`: Total number of KV cache blocks available (gauge)
- `dynamo_component_kvstats_gpu_cache_usage_percent`: GPU cache usage as a fraction in the range 0.0-1.0 (gauge)
- `dynamo_component_kvstats_gpu_prefix_cache_hit_rate`: GPU prefix cache hit rate as a fraction in the range 0.0-1.0 (gauge)

These metrics are published by:

- **LLM Workers**: vLLM and TRT-LLM backends publish these metrics through their respective publishers
- **KV Router**: The KV router component aggregates and exposes these metrics for load balancing decisions

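As a rough illustration of how these gauges might be consumed, the sketch below parses a Prometheus text-format payload and derives block utilization from the two block gauges. The sample payload, its label names and values, and the helper names `parse_kvstats` and `block_utilization` are hypothetical. In a live deployment the payload would come from an HTTP GET against port 8081; the exact path (commonly `/metrics`) is an assumption, since the text above names only the port.

```python
import re

# Hypothetical sample of what a scrape of the system status port might
# return (Prometheus text exposition format). Label names and values are
# invented for illustration only.
SAMPLE = """\
# TYPE dynamo_component_kvstats_active_blocks gauge
dynamo_component_kvstats_active_blocks{dynamo_component="backend"} 120
# TYPE dynamo_component_kvstats_total_blocks gauge
dynamo_component_kvstats_total_blocks{dynamo_component="backend"} 480
dynamo_component_kvstats_gpu_cache_usage_percent{dynamo_component="backend"} 0.25
dynamo_component_kvstats_gpu_prefix_cache_hit_rate{dynamo_component="backend"} 0.6
"""

PREFIX = "dynamo_component_kvstats_"
# Metric name, an optional {labels} group, then the sample value.
# Simplified: label values containing "}" are not handled.
LINE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def parse_kvstats(text: str) -> dict[str, float]:
    """Return {short_name: value} for every kvstats sample in `text`.

    If several series share a metric name (for example, one per worker),
    the last one seen wins; a real consumer would key on labels too.
    """
    stats = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match and match.group(1).startswith(PREFIX):
            stats[match.group(1)[len(PREFIX):]] = float(match.group(2))
    return stats


def block_utilization(stats: dict[str, float]) -> float:
    """Fraction of KV cache blocks in use, guarding against total == 0."""
    total = stats.get("total_blocks", 0.0)
    return stats.get("active_blocks", 0.0) / total if total else 0.0


stats = parse_kvstats(SAMPLE)
utilization = block_utilization(stats)
```

Deriving utilization from `active_blocks` and `total_blocks` gives a cross-check against the reported `gpu_cache_usage_percent` gauge; whether the two always agree depends on how each backend's publisher computes them.
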
### Specialized Component Metrics
Some components expose additional metrics specific to their functionality: