
feat: Add Kafka internal metrics collection#3494

Closed
tippmar-nr wants to merge 12 commits into main from feature/kafka-internal-metrics

Conversation


@tippmar-nr tippmar-nr commented Mar 18, 2026

Summary

Adds comprehensive Kafka internal metrics collection to the .NET agent. This enables the New Relic Kafka UI to display critical monitoring data from .NET applications using the Confluent Kafka client (librdkafka).

Key changes:

  • Statistics callback integration: Non-invasive statistics collection via librdkafka's SetStatisticsHandler, preserving any customer-configured handlers
  • Scheduled drain architecture: Lightweight callback caches latest JSON per client; a single scheduled drain task parses and reports metrics once per harvest interval (not on every callback)
  • Gauge metric pipeline: New RecordGaugeMetric API on IAgentExperimental for reporting point-in-time values
  • Comprehensive metric coverage: Client-level, broker/node-level, topic-level, and partition-level metrics including:
    • Request/response counters, byte totals, message counts
    • Broker RTT latency averages and connection counts
    • Consumer group rebalance metrics, consumer lag, committed offsets
    • Producer batch size averages, record send totals, idempotent state
  • Delta computation for cumulative counters: Tracks previous values per client to report deltas for ever-increasing librdkafka counters
  • Allocation-optimized parsing: Reusable dictionaries, struct-based metric values, merged single-pass loops, no LINQ in hot paths
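
The per-client delta computation for cumulative counters can be sketched as follows. This is an illustrative Python model, not the agent's actual C# implementation; the class name `CumulativeDeltaTracker` is hypothetical. It also reflects the behavior noted in a later commit: on the first observation the raw value is reported, since librdkafka counters start at 0.

```python
class CumulativeDeltaTracker:
    """Tracks the last observed value of each ever-increasing counter,
    per client, so only the change since the previous harvest is reported."""

    def __init__(self):
        self._previous = {}  # (client_id, metric_name) -> last observed value

    def delta(self, client_id, metric_name, current_value):
        key = (client_id, metric_name)
        prev = self._previous.get(key)
        self._previous[key] = current_value
        if prev is None:
            # First observation: counters start at 0 when the client is
            # created, so the raw value is itself the delta so far.
            return current_value
        return current_value - prev
```

For example, observing `tx = 100` and then `tx = 150` for the same client reports 100 followed by 50, rather than the raw cumulative values.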

Metric format:

MessageBroker/Kafka/Internal/{type}-metrics/client/{clientId}/{metric-name}
MessageBroker/Kafka/Internal/{type}-node-metrics/node/{nodeId}/client/{clientId}/{metric-name}
MessageBroker/Kafka/Internal/{type}-topic-metrics/topic/{topic}/client/{clientId}/{metric-name}
MessageBroker/Kafka/Internal/{type}-metrics/topic/{topic}/partition/{id}/client/{clientId}/{metric-name}
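
The templates above can be composed as plain string formatting; a minimal sketch (the function names here are illustrative, not the agent's API):

```python
PREFIX = "MessageBroker/Kafka/Internal"

def client_metric_name(client_type, client_id, metric):
    # e.g. producer-metrics/client/my-producer/request-counter
    return f"{PREFIX}/{client_type}-metrics/client/{client_id}/{metric}"

def node_metric_name(client_type, node_id, client_id, metric):
    return (f"{PREFIX}/{client_type}-node-metrics/"
            f"node/{node_id}/client/{client_id}/{metric}")
```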

🤖 Generated with Claude Code

tippmar-nr and others added 8 commits March 18, 2026 15:09
…gent

Implements non-invasive statistics collection via librdkafka callbacks to generate
critical internal metrics that light up New Relic's Kafka UI, achieving parity
with the Java agent.

Key Features:
- Reflection-based statistics handler creation for Confluent.Kafka 1.4.0 compatibility
- Composite handler pattern preserves existing customer statistics callbacks
- JSON parsing via KafkaStatisticsHelper with proper Newtonsoft.Json access
- Automatic statistics enablement (5-second interval) when not customer-configured
- Support for both Producer and Consumer statistics collection

Metrics Generated:
- MessageBroker/Kafka/Internal/producer-metrics/client/{clientId}/request-counter
- MessageBroker/Kafka/Internal/producer-metrics/client/{clientId}/response-counter
- MessageBroker/Kafka/Internal/consumer-metrics/client/{clientId}/request-counter
- MessageBroker/Kafka/Internal/consumer-metrics/client/{clientId}/response-counter

Enhanced integration tests with long-lived consumers (15 seconds) to validate
statistics callbacks trigger correctly and metrics are collected as expected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes roughly 250 lines of unused test and debug code and optimizes logging levels
for production deployment while preserving all core functionality.

Code Cleanup:
- Remove unused configuration fallback methods (TrySetConfigMethod, TrySetMethod,
  TryConfigPropertyAccess, TryDictionaryAccess, TryAddMethod, TryReflectionConfigMethods)
- Simplify SetStatisticsIntervalOnBuilder to rely on customer configuration
- Remove complex reflection chains that were failing in practice

Logging Optimization:
- Demote routine operations from Debug to Finest (statistics callbacks, handler creation,
  bootstrap discovery, metric recording, parsing details)
- Keep Info level for important setup events and failures
- Keep Debug level for actual troubleshooting scenarios (parsing failures, errors)

Result: ~90% reduction in routine logging noise while maintaining full statistics
collection functionality for both producer and consumer internal metrics.

Integration tests pass, confirming all critical functionality preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pattern

Add comprehensive integration test to verify that customer-provided Kafka statistics
handlers continue to work correctly alongside our internal metrics collection.

Key changes:
- Added Test_CustomerStatisticsHandlers_WorkWithInternalMetrics integration test
- Enhanced KafkaTestApp with custom statistics handler endpoints and tracking
- Integrated custom statistics testing into fixture ExerciseApplication lifecycle
- Added CustomerStatisticsCallbacks helper class for callback count tracking
- Validated composite handler pattern preserves customer functionality

The test ensures our wrapper's composite statistics handler approach doesn't
interfere with existing customer callbacks while still collecting the internal
metrics needed to populate New Relic's Kafka UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement comprehensive statistics parsing in KafkaStatisticsHelper
- Add 31 unit tests covering all statistics parsing functionality
- Enable collection of 18 internal metrics matching Java agent UI requirements
- Add finest-level debugging to Kafka producer and consumer wrappers
- Fix test app topic name coordination between test harness and application
- Refactor integration tests to resolve constructor/topic name issues
- Add customer statistics handler validation to ensure composite pattern works

This provides New Relic Kafka UI parity by collecting critical internal metrics:
- Request/response counters for producers and consumers
- Node-level and topic-level operational metrics
- Partition-level message and byte counters
- All metrics follow MessageBroker/Kafka/Internal/ naming convention

Statistics collection runs every 5 seconds via librdkafka callbacks and
coexists with customer statistics handlers using composite pattern.
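
The composite pattern can be sketched as a simple wrapper: librdkafka invokes a single statistics callback, so the agent's handler must chain any customer-provided one rather than replace it. This is an illustrative Python model of the idea, not the agent's C# code:

```python
def make_composite_handler(customer_handler, agent_handler):
    """Wrap an optional customer statistics callback so the agent's
    collector runs alongside it rather than replacing it."""
    def composite(client, stats_json):
        if customer_handler is not None:
            customer_handler(client, stats_json)  # customer sees the raw JSON
        agent_handler(client, stats_json)         # then the agent caches it
    return composite
```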

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ilderWrapper

- Add ShouldSetStatisticsInterval() method to check existing customer configuration
- Only set default 5000ms interval if customer hasn't configured statistics
- Preserve customer settings when statistics.interval.ms is already set
- Use conservative approach: don't override if configuration cannot be determined
- Add appropriate logging for debugging customer configuration scenarios

This ensures our internal metrics collection doesn't interfere with customer
applications that have their own Kafka statistics configuration requirements.
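
The conservative defaulting logic described above can be sketched as follows (an assumption-laden Python model: the function name and the `None`-means-uninspectable convention are illustrative, not the agent's actual `ShouldSetStatisticsInterval()` signature):

```python
DEFAULT_STATISTICS_INTERVAL_MS = 5000

def resolve_statistics_interval(config):
    """Return the interval to apply, or None to leave the builder untouched.
    Only apply the agent default when the customer hasn't configured one."""
    if config is None:
        # Configuration can't be determined: be conservative, don't override.
        return None
    if "statistics.interval.ms" in config:
        return None  # preserve the customer's setting
    return DEFAULT_STATISTICS_INTERVAL_MS
```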

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l metrics

Add RecordGaugeMetric through the full agent pipeline (IAgentExperimental
→ Agent → AgentHealthReporter → MetricBuilder) with gauge wire format
[1, V, V, V, V, V²]. Implement scheduled drain pattern in KafkaBuilderWrapper
that caches librdkafka statistics JSON via Interlocked.Exchange on the callback
thread and parses/reports metrics once per 60s drain interval. Classify each
Kafka metric as Cumulative, Gauge, or WindowAvg per librdkafka STATISTICS.md,
computing deltas for cumulative counters and reporting raw values for gauges.
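
The cache-and-drain pattern described above can be modeled as follows. This Python sketch uses a lock where the agent's C# code uses `Interlocked.Exchange`, and the reported metric name is illustrative; the key property shown is that the callback path only stores the latest JSON, while parsing happens once per drain.

```python
import json
import threading

class StatisticsDrain:
    """Cache-and-drain sketch: the statistics callback stores only the most
    recent JSON per client; a periodic drain parses and reports it."""

    def __init__(self, report):
        self._latest = {}             # client_id -> raw stats JSON
        self._lock = threading.Lock() # stand-in for Interlocked.Exchange
        self._report = report         # callable(metric_name, value)

    def on_statistics(self, client_id, stats_json):
        with self._lock:
            self._latest[client_id] = stats_json  # cheap: no parsing here

    def drain(self):
        with self._lock:
            snapshot, self._latest = self._latest, {}
        for client_id, raw in snapshot.items():
            # Parse once per drain interval, off the callback hot path.
            stats = json.loads(raw)
            self._report(f"client/{client_id}/tx", stats.get("tx", 0))
```

Note that intermediate callbacks are intentionally dropped: only the latest statistics snapshot per client survives until the next drain.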

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
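The gauge wire format [1, V, V, V, V, V²] mentioned in the commit above encodes a point-in-time value as a single observation. A sketch, under the assumption that the six slots follow the usual count/total/exclusive/min/max/sum-of-squares metric-data layout:

```python
def gauge_metric_data(value):
    """Encode a point-in-time gauge as one observation: count of 1, with
    total, exclusive, min, and max all equal to the value, plus value²."""
    return [1, value, value, value, value, value * value]
```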
…improvements

- Track statistics per Kafka client instance to prevent producer/consumer overwrites
- Start drain timer exactly once via Interlocked.CompareExchange
- Tie drain interval to MetricsHarvestCycle configuration (10s in tests, 60s prod)
- Report raw value on first cumulative observation (counters start at 0)
- Add integration test assertions for Gauge, Cumulative, and WindowAvg metric types
- Reduce consumer duration from 15s to 5s and test fixture delays accordingly
- Use variable assertions for consume counts and TraceContext metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Eliminate 6 intermediate model classes (KafkaMetricsData, etc.);
  CreateMetricsDictionary works directly from deserialized KafkaStatistics
- Add PopulateMetricsDictionary for dictionary reuse across drain cycles
- Merge topic and partition iteration into single pass
- Remove duplicate client-level request-total/response-total metrics
- Replace LINQ batch-size averaging with running sum
- Use actual partition rxbytes for consumer topic bytes-consumed-total
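
The running-sum replacement for LINQ averaging mentioned above amounts to a single-pass loop; an illustrative equivalent:

```python
def average_batch_size(batch_sizes):
    """Single-pass running-sum average, behaviorally equivalent to an
    .Average() call but without iterator allocations on the hot path."""
    total = 0
    count = 0
    for size in batch_sizes:
        total += size
        count += 1
    return total / count if count else 0.0
```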

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tippmar-nr tippmar-nr force-pushed the feature/kafka-internal-metrics branch from bd7c617 to a08f37c on March 18, 2026 20:09
tippmar-nr and others added 4 commits March 19, 2026 13:08
…mer lag

Covers previously untested paths: negative node IDs (seed brokers),
coordinator node IDs, large non-coordinator node IDs, and topic-level
consumer records-lag-avg/consumed-total metrics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests nodeId == int.MaxValue edge case where coordinatorId computes
to 0, covering the false branch of the coordinatorId > 0 check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov-commenter

Codecov Report

❌ Patch coverage is 97.38220% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.90%. Comparing base (29da4dd) to head (644196a).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
src/Agent/NewRelic/Agent/Core/Agent.cs | 0.00% | 3 Missing ⚠️
....Agent.Extensions/Helpers/KafkaStatisticsHelper.cs | 98.88% | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3494      +/-   ##
==========================================
+ Coverage   81.77%   81.90%   +0.13%     
==========================================
  Files         508      509       +1     
  Lines       34220    34411     +191     
  Branches     4040     4077      +37     
==========================================
+ Hits        27984    28185     +201     
+ Misses       5269     5257      -12     
- Partials      967      969       +2     
Flag | Coverage Δ
Agent | 82.90% <97.38%> (+0.13%) ⬆️
Profiler | 71.75% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...elic/Agent/Core/AgentHealth/AgentHealthReporter.cs | 91.18% <100.00%> (+0.05%) ⬆️
.../NewRelic/Agent/Core/WireModels/MetricWireModel.cs | 96.78% <100.00%> (+0.02%) ⬆️
....Agent.Extensions/Helpers/KafkaStatisticsHelper.cs | 98.88% <98.88%> (ø)
src/Agent/NewRelic/Agent/Core/Agent.cs | 75.07% <0.00%> (-0.64%) ⬇️

... and 3 files with indirect coverage changes


@tippmar-nr tippmar-nr changed the title from "feat: Add Kafka internal metrics collection for UI parity with Java agent" to "feat: Add Kafka internal metrics collection" on Mar 19, 2026
@tippmar-nr tippmar-nr closed this Mar 20, 2026
