[Enhancement]: Server Self-Monitoring #67

@kornys

Related problem

Operators need to monitor the MCP server itself — not just Kafka clusters. No Micrometer metrics are currently exported. We need tool call counts/latencies, K8s API call metrics, guardrail activations, and active session tracking.

Suggested solution

Dependencies

strimzi-mcp/pom.xml — add:

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-micrometer-registry-prometheus</artifactId>
</dependency>

Metrics to Export

Tool metrics (via GuardrailInterceptor or dedicated MetricsFilter):

  • mcp_tool_calls_total{tool="list_kafka_clusters", status="success|error"} — counter
  • mcp_tool_call_duration_seconds{tool="list_kafka_clusters"} — timer/histogram

Guardrail metrics:

  • mcp_guardrail_rate_limit_rejected_total{category="log|metrics|general"} — counter
  • mcp_guardrail_response_truncated_total{tool="..."} — counter
  • mcp_guardrail_log_redactions_total — counter

K8s API metrics (Fabric8 client exposes some via Micrometer automatically):

  • Kubernetes client HTTP request metrics (if not auto-instrumented, add manual timers)

Implementation

common/src/main/java/io/streamshub/mcp/common/guardrail/MetricsFilter.java

  • New GuardrailFilter at @Priority(5) (very early, before audit at 10)
  • filterInput(): start timer
  • filterOutput(): stop timer, increment counter with tool name and status tags
  • Inject MeterRegistry from Micrometer
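The start/stop flow described above can be sketched as follows. This is only an illustration under stated assumptions: the real GuardrailFilter method signatures are not shown in this issue, so `filterInput(callId)` and `filterOutput(callId, tool, success)` are hypothetical shapes, and `ToolMetricsRecorder` is a hypothetical in-memory stand-in for the injected MeterRegistry so that the sketch runs on a bare JDK. In the actual filter the `record` call would presumably map to a Micrometer counter and timer with `tool` and `status` tags.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for MeterRegistry: counts calls and sums durations in memory. */
class ToolMetricsRecorder {
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    /** Records one finished tool call, keyed by tool name and status. */
    void record(String tool, String status, long elapsedNanos) {
        calls.computeIfAbsent(tool + "|" + status, k -> new LongAdder()).increment();
        nanos.computeIfAbsent(tool, k -> new LongAdder()).add(elapsedNanos);
    }

    long callCount(String tool, String status) {
        LongAdder adder = calls.get(tool + "|" + status);
        return adder == null ? 0 : adder.sum();
    }

    long totalNanos(String tool) {
        LongAdder adder = nanos.get(tool);
        return adder == null ? 0 : adder.sum();
    }
}

/** Hypothetical filter shape; not the project's actual GuardrailFilter contract. */
class MetricsFilterSketch {
    private final ToolMetricsRecorder recorder;
    // Start timestamps keyed by an assumed per-call identifier.
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

    MetricsFilterSketch(ToolMetricsRecorder recorder) {
        this.recorder = recorder;
    }

    /** Called before the tool runs: remember when the call started. */
    void filterInput(String callId) {
        startTimes.put(callId, System.nanoTime());
    }

    /** Called after the tool runs: stop the timer and count the call. */
    void filterOutput(String callId, String tool, boolean success) {
        Long start = startTimes.remove(callId);
        if (start == null) {
            return; // no matching filterInput(); nothing to record
        }
        recorder.record(tool, success ? "success" : "error", System.nanoTime() - start);
    }
}
```

Removing the start timestamp in `filterOutput()` keeps the map from growing when calls complete. A call that fails before `filterOutput()` runs would still leak an entry, so the real filter may need a cleanup path.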

Modify existing filters to emit metrics:

  • RateLimitFilter: increment mcp_guardrail_rate_limit_rejected_total when rejecting
  • LogRedactionFilter: increment mcp_guardrail_log_redactions_total when patterns matched
  • ResponseSizeLimitFilter: increment mcp_guardrail_response_truncated_total when truncating
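As a rough illustration of how an existing filter might emit one of these counters, the sketch below increments `mcp_guardrail_response_truncated_total` only on the truncating path. `GuardrailCounters` and `ResponseSizeLimitSketch` are hypothetical names, the character-based size limit is an assumption, and the in-memory map again stands in for Micrometer so the example stays JDK-only. `render()` prints a simplified version of the Prometheus sample lines.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-memory counters keyed by "metric{label="value"}". */
class GuardrailCounters {
    private final Map<String, LongAdder> samples = new ConcurrentSkipListMap<>();

    /** Increments a counter; pass null labelName for an unlabelled metric. */
    void increment(String metric, String labelName, String labelValue) {
        String key = labelName == null
                ? metric
                : metric + "{" + labelName + "=\"" + labelValue + "\"}";
        samples.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long value(String sampleKey) {
        LongAdder adder = samples.get(sampleKey);
        return adder == null ? 0 : adder.sum();
    }

    /** Simplified text rendering, one "name{labels} value" line per sample. */
    String render() {
        StringBuilder sb = new StringBuilder();
        samples.forEach((k, v) -> sb.append(k).append(' ').append(v.sum()).append('\n'));
        return sb.toString();
    }
}

/** Hypothetical truncating filter; the real ResponseSizeLimitFilter may differ. */
class ResponseSizeLimitSketch {
    private final int maxChars;
    private final GuardrailCounters counters;

    ResponseSizeLimitSketch(int maxChars, GuardrailCounters counters) {
        this.maxChars = maxChars;
        this.counters = counters;
    }

    /** Returns the response unchanged, or truncated with the counter incremented. */
    String apply(String tool, String response) {
        if (response.length() <= maxChars) {
            return response;
        }
        counters.increment("mcp_guardrail_response_truncated_total", "tool", tool);
        return response.substring(0, maxChars);
    }
}
```

The same pattern would apply to RateLimitFilter (label `category`) and LogRedactionFilter (no label): increment at the point where the guardrail actually takes effect, not on every invocation.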

Configuration

strimzi-mcp/src/main/resources/application.properties:

# Metrics endpoint
quarkus.micrometer.export.prometheus.enabled=true
quarkus.micrometer.export.prometheus.path=/q/metrics

Test Files

  • MetricsFilterTest.java — verify counters and timers are recorded
  • Integration test: invoke tools, scrape /q/metrics, verify expected metrics present
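For the integration test, the "verify expected metrics present" step could look roughly like the helper below. It takes the scraped text of /q/metrics, extracts sample names, and reports which expected metrics are missing. `MetricsScrapeCheck` is a hypothetical helper, and fetching the endpoint is left to the test. Matching is by prefix because Micrometer's Prometheus registry exports a timer as several series (for example `_count`, `_sum`, and `_max` suffixes) and never as a single sample with the bare name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for checking a Prometheus text-format scrape. */
class MetricsScrapeCheck {

    /** Extracts sample names, skipping blank lines and # HELP / # TYPE comments. */
    static Set<String> metricNames(String scrape) {
        Set<String> names = new TreeSet<>();
        for (String line : scrape.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The name ends at the first '{' (labels) or ' ' (value).
            int end = 0;
            while (end < trimmed.length()
                    && trimmed.charAt(end) != '{'
                    && trimmed.charAt(end) != ' ') {
                end++;
            }
            names.add(trimmed.substring(0, end));
        }
        return names;
    }

    /** Returns the expected metrics with no exact or suffixed match in the scrape. */
    static List<String> missing(String scrape, List<String> expected) {
        Set<String> names = metricNames(scrape);
        List<String> missing = new ArrayList<>();
        for (String want : expected) {
            boolean found = names.stream()
                    .anyMatch(n -> n.equals(want) || n.startsWith(want + "_"));
            if (!found) {
                missing.add(want);
            }
        }
        return missing;
    }
}
```

A test would then assert that `missing(body, expectedNames)` is empty after invoking a few tools, which gives a readable failure message listing exactly the absent metrics.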

Verification

  1. mvn clean test — new tests pass
  2. mvn quarkus:dev — invoke some tools, then curl localhost:8080/q/metrics | grep mcp_
  3. Verify tool call counts, latencies, and guardrail metrics appear
  4. Verify /q/metrics is accessible without auth (health policy covers /q/*)

Metadata

Labels: enhancement
Status: Backlog