
feat: Add Kafka internal metrics collection#3494

Closed
tippmar-nr wants to merge 12 commits into main from feature/kafka-internal-metrics

Conversation


@tippmar-nr tippmar-nr commented Mar 18, 2026

Summary

Adds comprehensive Kafka internal metrics collection to the .NET agent. This enables the New Relic Kafka UI to display critical monitoring data from .NET applications using the Confluent Kafka client (librdkafka).

Key changes:

  • Statistics callback integration: Non-invasive statistics collection via librdkafka's SetStatisticsHandler, preserving any customer-configured handlers
  • Scheduled drain architecture: Lightweight callback caches latest JSON per client; a single scheduled drain task parses and reports metrics once per harvest interval (not on every callback)
  • Gauge metric pipeline: New RecordGaugeMetric API on IAgentExperimental for reporting point-in-time values
  • Comprehensive metric coverage: Client-level, broker/node-level, topic-level, and partition-level metrics including:
    • Request/response counters, byte totals, message counts
    • Broker RTT latency averages and connection counts
    • Consumer group rebalance metrics, consumer lag, committed offsets
    • Producer batch size averages, record send totals, idempotent state
  • Delta computation for cumulative counters: Tracks previous values per client to report deltas for ever-increasing librdkafka counters
  • Allocation-optimized parsing: Reusable dictionaries, struct-based metric values, merged single-pass loops, no LINQ in hot paths
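
The per-client delta computation for cumulative counters can be sketched as follows. This is an illustrative Python model, not the agent's actual C# implementation; the class name `CumulativeDeltaTracker` is hypothetical. It also reflects the behavior noted in a later commit: on the first observation the raw value is reported, since librdkafka counters start at 0.

```python
class CumulativeDeltaTracker:
    """Tracks the last observed value of each ever-increasing counter,
    per client, so only the change since the previous harvest is reported."""

    def __init__(self):
        self._previous = {}  # (client_id, metric_name) -> last observed value

    def delta(self, client_id, metric_name, current_value):
        key = (client_id, metric_name)
        prev = self._previous.get(key)
        self._previous[key] = current_value
        if prev is None:
            # First observation: counters start at 0 when the client is
            # created, so the raw value is itself the delta so far.
            return current_value
        return current_value - prev
```

For example, observing `tx = 100` and then `tx = 150` for the same client reports 100 followed by 50, rather than the raw cumulative values.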

Metric format:

MessageBroker/Kafka/Internal/{type}-metrics/client/{clientId}/{metric-name}
MessageBroker/Kafka/Internal/{type}-node-metrics/node/{nodeId}/client/{clientId}/{metric-name}
MessageBroker/Kafka/Internal/{type}-topic-metrics/topic/{topic}/client/{clientId}/{metric-name}
MessageBroker/Kafka/Internal/{type}-metrics/topic/{topic}/partition/{id}/client/{clientId}/{metric-name}
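
The templates above can be composed as plain string formatting; a minimal sketch (the function names here are illustrative, not the agent's API):

```python
PREFIX = "MessageBroker/Kafka/Internal"

def client_metric_name(client_type, client_id, metric):
    # e.g. producer-metrics/client/my-producer/request-counter
    return f"{PREFIX}/{client_type}-metrics/client/{client_id}/{metric}"

def node_metric_name(client_type, node_id, client_id, metric):
    return (f"{PREFIX}/{client_type}-node-metrics/"
            f"node/{node_id}/client/{client_id}/{metric}")
```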

🤖 Generated with Claude Code

tippmar-nr and others added 8 commits March 18, 2026 15:09
…gent

Implements non-invasive statistics collection via librdkafka callbacks to generate
critical internal metrics that light up New Relic's Kafka UI, achieving parity
with the Java agent.

Key Features:
- Reflection-based statistics handler creation for Confluent.Kafka 1.4.0 compatibility
- Composite handler pattern preserves existing customer statistics callbacks
- JSON parsing via KafkaStatisticsHelper with proper Newtonsoft.Json access
- Automatic statistics enablement (5-second interval) when not customer-configured
- Support for both Producer and Consumer statistics collection

Metrics Generated:
- MessageBroker/Kafka/Internal/producer-metrics/client/{clientId}/request-counter
- MessageBroker/Kafka/Internal/producer-metrics/client/{clientId}/response-counter
- MessageBroker/Kafka/Internal/consumer-metrics/client/{clientId}/request-counter
- MessageBroker/Kafka/Internal/consumer-metrics/client/{clientId}/response-counter

Enhanced integration tests with long-lived consumers (15 seconds) to validate
statistics callbacks trigger correctly and metrics are collected as expected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes roughly 250 lines of unused test and debug code and optimizes logging levels
for production deployment while preserving all core functionality.

Code Cleanup:
- Remove unused configuration fallback methods (TrySetConfigMethod, TrySetMethod,
  TryConfigPropertyAccess, TryDictionaryAccess, TryAddMethod, TryReflectionConfigMethods)
- Simplify SetStatisticsIntervalOnBuilder to rely on customer configuration
- Remove complex reflection chains that were failing in practice

Logging Optimization:
- Demote routine operations from Debug to Finest (statistics callbacks, handler creation,
  bootstrap discovery, metric recording, parsing details)
- Keep Info level for important setup events and failures
- Keep Debug level for actual troubleshooting scenarios (parsing failures, errors)

Result: ~90% reduction in routine logging noise while maintaining full statistics
collection functionality for both producer and consumer internal metrics.

Integration tests pass, confirming all critical functionality preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pattern

Add comprehensive integration test to verify that customer-provided Kafka statistics
handlers continue to work correctly alongside our internal metrics collection.

Key changes:
- Added Test_CustomerStatisticsHandlers_WorkWithInternalMetrics integration test
- Enhanced KafkaTestApp with custom statistics handler endpoints and tracking
- Integrated custom statistics testing into fixture ExerciseApplication lifecycle
- Added CustomerStatisticsCallbacks helper class for callback count tracking
- Validated composite handler pattern preserves customer functionality

The test ensures our wrapper's composite statistics handler approach doesn't
interfere with existing customer callbacks while still collecting the internal
metrics needed to populate New Relic's Kafka UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement comprehensive statistics parsing in KafkaStatisticsHelper
- Add 31 unit tests covering all statistics parsing functionality
- Enable collection of 18 internal metrics matching Java agent UI requirements
- Add finest-level debugging to Kafka producer and consumer wrappers
- Fix test app topic name coordination between test harness and application
- Refactor integration tests to resolve constructor/topic name issues
- Add customer statistics handler validation to ensure composite pattern works

This provides New Relic Kafka UI parity by collecting critical internal metrics:
- Request/response counters for producers and consumers
- Node-level and topic-level operational metrics
- Partition-level message and byte counters
- All metrics follow MessageBroker/Kafka/Internal/ naming convention

Statistics collection runs every 5 seconds via librdkafka callbacks and
coexists with customer statistics handlers using composite pattern.
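
The composite pattern can be sketched as a simple wrapper: librdkafka invokes a single statistics callback, so the agent's handler must chain any customer-provided one rather than replace it. This is an illustrative Python model of the idea, not the agent's C# code:

```python
def make_composite_handler(customer_handler, agent_handler):
    """Wrap an optional customer statistics callback so the agent's
    collector runs alongside it rather than replacing it."""
    def composite(client, stats_json):
        if customer_handler is not None:
            customer_handler(client, stats_json)  # customer sees the raw JSON
        agent_handler(client, stats_json)         # then the agent caches it
    return composite
```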

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ilderWrapper

- Add ShouldSetStatisticsInterval() method to check existing customer configuration
- Only set default 5000ms interval if customer hasn't configured statistics
- Preserve customer settings when statistics.interval.ms is already set
- Use conservative approach: don't override if configuration cannot be determined
- Add appropriate logging for debugging customer configuration scenarios

This ensures our internal metrics collection doesn't interfere with customer
applications that have their own Kafka statistics configuration requirements.
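
The conservative defaulting logic described above can be sketched as follows (an assumption-laden Python model: the function name and the `None`-means-uninspectable convention are illustrative, not the agent's actual `ShouldSetStatisticsInterval()` signature):

```python
DEFAULT_STATISTICS_INTERVAL_MS = 5000

def resolve_statistics_interval(config):
    """Return the interval to apply, or None to leave the builder untouched.
    Only apply the agent default when the customer hasn't configured one."""
    if config is None:
        # Configuration can't be determined: be conservative, don't override.
        return None
    if "statistics.interval.ms" in config:
        return None  # preserve the customer's setting
    return DEFAULT_STATISTICS_INTERVAL_MS
```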

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l metrics

Add RecordGaugeMetric through the full agent pipeline (IAgentExperimental
→ Agent → AgentHealthReporter → MetricBuilder) with gauge wire format
[1, V, V, V, V, V²]. Implement scheduled drain pattern in KafkaBuilderWrapper
that caches librdkafka statistics JSON via Interlocked.Exchange on the callback
thread and parses/reports metrics once per 60s drain interval. Classify each
Kafka metric as Cumulative, Gauge, or WindowAvg per librdkafka STATISTICS.md,
computing deltas for cumulative counters and reporting raw values for gauges.
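
The cache-and-drain pattern described above can be modeled as follows. This Python sketch uses a lock where the agent's C# code uses `Interlocked.Exchange`, and the reported metric name is illustrative; the key property shown is that the callback path only stores the latest JSON, while parsing happens once per drain.

```python
import json
import threading

class StatisticsDrain:
    """Cache-and-drain sketch: the statistics callback stores only the most
    recent JSON per client; a periodic drain parses and reports it."""

    def __init__(self, report):
        self._latest = {}             # client_id -> raw stats JSON
        self._lock = threading.Lock() # stand-in for Interlocked.Exchange
        self._report = report         # callable(metric_name, value)

    def on_statistics(self, client_id, stats_json):
        with self._lock:
            self._latest[client_id] = stats_json  # cheap: no parsing here

    def drain(self):
        with self._lock:
            snapshot, self._latest = self._latest, {}
        for client_id, raw in snapshot.items():
            # Parse once per drain interval, off the callback hot path.
            stats = json.loads(raw)
            self._report(f"client/{client_id}/tx", stats.get("tx", 0))
```

Note that intermediate callbacks are intentionally dropped: only the latest statistics snapshot per client survives until the next drain.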

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
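The gauge wire format [1, V, V, V, V, V²] mentioned in the commit above encodes a point-in-time value as a single observation. A sketch, under the assumption that the six slots follow the usual count/total/exclusive/min/max/sum-of-squares metric-data layout:

```python
def gauge_metric_data(value):
    """Encode a point-in-time gauge as one observation: count of 1, with
    total, exclusive, min, and max all equal to the value, plus value²."""
    return [1, value, value, value, value, value * value]
```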
…improvements

- Track statistics per Kafka client instance to prevent producer/consumer overwrites
- Start drain timer exactly once via Interlocked.CompareExchange
- Tie drain interval to MetricsHarvestCycle configuration (10s in tests, 60s prod)
- Report raw value on first cumulative observation (counters start at 0)
- Add integration test assertions for Gauge, Cumulative, and WindowAvg metric types
- Reduce consumer duration from 15s to 5s and test fixture delays accordingly
- Use variable assertions for consume counts and TraceContext metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Eliminate 6 intermediate model classes (KafkaMetricsData, etc.);
  CreateMetricsDictionary works directly from deserialized KafkaStatistics
- Add PopulateMetricsDictionary for dictionary reuse across drain cycles
- Merge topic and partition iteration into single pass
- Remove duplicate client-level request-total/response-total metrics
- Replace LINQ batch-size averaging with running sum
- Use actual partition rxbytes for consumer topic bytes-consumed-total
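
The running-sum replacement for LINQ averaging mentioned above amounts to a single-pass loop; an illustrative equivalent:

```python
def average_batch_size(batch_sizes):
    """Single-pass running-sum average, behaviorally equivalent to an
    .Average() call but without iterator allocations on the hot path."""
    total = 0
    count = 0
    for size in batch_sizes:
        total += size
        count += 1
    return total / count if count else 0.0
```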

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tippmar-nr tippmar-nr force-pushed the feature/kafka-internal-metrics branch from bd7c617 to a08f37c on March 18, 2026 20:09
tippmar-nr and others added 4 commits March 19, 2026 13:08
…mer lag

Covers previously untested paths: negative node IDs (seed brokers),
coordinator node IDs, large non-coordinator node IDs, and topic-level
consumer records-lag-avg/consumed-total metrics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests nodeId == int.MaxValue edge case where coordinatorId computes
to 0, covering the false branch of the coordinatorId > 0 check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov-commenter

Codecov Report

❌ Patch coverage is 97.38220% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.90%. Comparing base (29da4dd) to head (644196a).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
src/Agent/NewRelic/Agent/Core/Agent.cs | 0.00% | 3 Missing ⚠️
....Agent.Extensions/Helpers/KafkaStatisticsHelper.cs | 98.88% | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3494      +/-   ##
==========================================
+ Coverage   81.77%   81.90%   +0.13%     
==========================================
  Files         508      509       +1     
  Lines       34220    34411     +191     
  Branches     4040     4077      +37     
==========================================
+ Hits        27984    28185     +201     
+ Misses       5269     5257      -12     
- Partials      967      969       +2     
Flag | Coverage Δ
Agent | 82.90% <97.38%> (+0.13%) ⬆️
Profiler | 71.75% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...elic/Agent/Core/AgentHealth/AgentHealthReporter.cs | 91.18% <100.00%> (+0.05%) ⬆️
.../NewRelic/Agent/Core/WireModels/MetricWireModel.cs | 96.78% <100.00%> (+0.02%) ⬆️
....Agent.Extensions/Helpers/KafkaStatisticsHelper.cs | 98.88% <98.88%> (ø)
src/Agent/NewRelic/Agent/Core/Agent.cs | 75.07% <0.00%> (-0.64%) ⬇️

... and 3 files with indirect coverage changes


@tippmar-nr tippmar-nr changed the title from "feat: Add Kafka internal metrics collection for UI parity with Java agent" to "feat: Add Kafka internal metrics collection" on Mar 19, 2026
@tippmar-nr tippmar-nr closed this Mar 20, 2026
