feat: Add Kafka internal metrics collection (#3494)
Closed
tippmar-nr wants to merge 12 commits into main
Conversation
…gent
Implements non-invasive statistics collection via librdkafka callbacks to generate
critical internal metrics that light up New Relic's Kafka UI, achieving parity
with the Java agent.
Key Features:
- Reflection-based statistics handler creation for Confluent.Kafka 1.4.0 compatibility
- Composite handler pattern preserves existing customer statistics callbacks
- JSON parsing via KafkaStatisticsHelper with proper Newtonsoft.Json access
- Automatic statistics enablement (5-second interval) when not customer-configured
- Support for both Producer and Consumer statistics collection
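The composite handler pattern above can be sketched as follows. This is an illustrative Python sketch of the idea, not the agent's actual C# implementation: any customer-configured statistics callback is wrapped so that both it and the agent's internal collector receive every statistics payload.

```python
# Hypothetical sketch of the composite-handler pattern: wrap any existing
# customer statistics callback so both it and the agent's internal metrics
# collector see every librdkafka statistics payload. All names here are
# illustrative, not the agent's real API.
def make_composite_handler(customer_handler, agent_handler):
    def composite(client, stats_json):
        if customer_handler is not None:
            customer_handler(client, stats_json)  # customer callback runs unchanged
        agent_handler(client, stats_json)         # agent collects internal metrics
    return composite

# Usage: the wrapper installs `composite` in place of the original handler.
seen = []
handler = make_composite_handler(
    customer_handler=lambda c, s: seen.append(("customer", s)),
    agent_handler=lambda c, s: seen.append(("agent", s)),
)
handler(None, '{"tx": 1}')
```

The key design point is that the customer callback is invoked first and unmodified, so existing application behavior is preserved.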
Metrics Generated:
- MessageBroker/Kafka/Internal/producer-metrics/client/{clientId}/request-counter
- MessageBroker/Kafka/Internal/producer-metrics/client/{clientId}/response-counter
- MessageBroker/Kafka/Internal/consumer-metrics/client/{clientId}/request-counter
- MessageBroker/Kafka/Internal/consumer-metrics/client/{clientId}/response-counter
Enhanced integration tests with long-lived consumers (15 seconds) to validate that
statistics callbacks trigger correctly and metrics are collected as expected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes more than 250 lines of unused test/debug code and optimizes logging levels for production deployment while preserving all core functionality.
Code Cleanup:
- Remove unused configuration fallback methods (TrySetConfigMethod, TrySetMethod, TryConfigPropertyAccess, TryDictionaryAccess, TryAddMethod, TryReflectionConfigMethods)
- Simplify SetStatisticsIntervalOnBuilder to rely on customer configuration
- Remove complex reflection chains that were failing in practice
Logging Optimization:
- Demote routine operations from Debug to Finest (statistics callbacks, handler creation, bootstrap discovery, metric recording, parsing details)
- Keep Info level for important setup events and failures
- Keep Debug level for actual troubleshooting scenarios (parsing failures, errors)
Result: ~90% reduction in routine logging noise while maintaining full statistics collection functionality for both producer and consumer internal metrics. Integration tests pass, confirming all critical functionality is preserved.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pattern
Add comprehensive integration test to verify that customer-provided Kafka statistics handlers continue to work correctly alongside our internal metrics collection.
Key changes:
- Added Test_CustomerStatisticsHandlers_WorkWithInternalMetrics integration test
- Enhanced KafkaTestApp with custom statistics handler endpoints and tracking
- Integrated custom statistics testing into the fixture ExerciseApplication lifecycle
- Added CustomerStatisticsCallbacks helper class for callback count tracking
- Validated that the composite handler pattern preserves customer functionality
The test ensures our wrapper's composite statistics handler approach doesn't interfere with existing customer callbacks while still collecting the internal metrics needed to populate New Relic's Kafka UI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement comprehensive statistics parsing in KafkaStatisticsHelper
- Add 31 unit tests covering all statistics parsing functionality
- Enable collection of 18 internal metrics matching Java agent UI requirements
- Add finest-level debugging to Kafka producer and consumer wrappers
- Fix test app topic name coordination between test harness and application
- Refactor integration tests to resolve constructor/topic name issues
- Add customer statistics handler validation to ensure the composite pattern works
This provides New Relic Kafka UI parity by collecting critical internal metrics:
- Request/response counters for producers and consumers
- Node-level and topic-level operational metrics
- Partition-level message and byte counters
- All metrics follow the MessageBroker/Kafka/Internal/ naming convention
Statistics collection runs every 5 seconds via librdkafka callbacks and coexists with customer statistics handlers using the composite pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
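The client-level request/response counters can be sketched as below. The top-level `client_id`, `type`, `tx` (total requests sent), and `rx` (total responses received) fields come from librdkafka's STATISTICS.md; the function name and the restriction to just these two metrics are simplifying assumptions for illustration, not the actual KafkaStatisticsHelper code.

```python
import json

# Illustrative sketch: map librdkafka statistics JSON onto the
# MessageBroker/Kafka/Internal/* metric names used by this PR.
# Field names per librdkafka STATISTICS.md; helper shape is assumed.
def parse_request_response_counters(stats_json: str) -> dict:
    stats = json.loads(stats_json)
    # stats["type"] is "producer" or "consumer", matching the metric names above
    prefix = (f"MessageBroker/Kafka/Internal/{stats['type']}-metrics"
              f"/client/{stats['client_id']}")
    return {
        f"{prefix}/request-counter": stats["tx"],   # total requests sent to brokers
        f"{prefix}/response-counter": stats["rx"],  # total responses received
    }

sample = '{"client_id": "rdkafka", "type": "producer", "tx": 42, "rx": 40}'
metrics = parse_request_response_counters(sample)
```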
…ilderWrapper
- Add ShouldSetStatisticsInterval() method to check existing customer configuration
- Only set the default 5000ms interval if the customer hasn't configured statistics
- Preserve customer settings when statistics.interval.ms is already set
- Use a conservative approach: don't override if the configuration cannot be determined
- Add appropriate logging for debugging customer configuration scenarios
This ensures our internal metrics collection doesn't interfere with customer applications that have their own Kafka statistics configuration requirements.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l metrics
Add RecordGaugeMetric through the full agent pipeline (IAgentExperimental → Agent → AgentHealthReporter → MetricBuilder) with gauge wire format [1, V, V, V, V, V²]. Implement a scheduled drain pattern in KafkaBuilderWrapper that caches librdkafka statistics JSON via Interlocked.Exchange on the callback thread and parses/reports metrics once per 60s drain interval. Classify each Kafka metric as Cumulative, Gauge, or WindowAvg per librdkafka STATISTICS.md, computing deltas for cumulative counters and reporting raw values for gauges.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
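The scheduled-drain pattern and the gauge wire format can be sketched as follows. This is a Python approximation (a lock-guarded swap stands in for C#'s Interlocked.Exchange); class and method names are illustrative, and the six-element wire shape follows the [1, V, V, V, V, V²] format stated above.

```python
import threading

# Sketch of the drain pattern: the librdkafka callback thread only swaps in
# the latest stats JSON (cheap), and a scheduled drain parses/reports it
# once per interval. Names are illustrative, not the agent's API.
class StatsCache:
    def __init__(self):
        self._latest = None
        self._lock = threading.Lock()  # stands in for Interlocked.Exchange

    def on_statistics(self, stats_json):
        # Called on the callback thread: no parsing here, just cache the payload.
        with self._lock:
            self._latest = stats_json

    def drain(self):
        # Called once per drain interval: take the payload and reset the slot.
        with self._lock:
            latest, self._latest = self._latest, None
        return latest

def gauge_wire_format(value):
    # [call count, total, exclusive total, min, max, sum of squares]
    # i.e. the [1, V, V, V, V, V²] shape for a single point-in-time value
    return [1, value, value, value, value, value * value]
```

A useful property of the swap-and-reset drain is that intermediate statistics payloads are intentionally dropped: only the most recent snapshot per interval is parsed.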
…improvements
- Track statistics per Kafka client instance to prevent producer/consumer overwrites
- Start the drain timer exactly once via Interlocked.CompareExchange
- Tie the drain interval to the MetricsHarvestCycle configuration (10s in tests, 60s in prod)
- Report the raw value on the first cumulative observation (counters start at 0)
- Add integration test assertions for Gauge, Cumulative, and WindowAvg metric types
- Reduce consumer duration from 15s to 5s and adjust test fixture delays accordingly
- Use variable assertions for consume counts and TraceContext metrics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
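Two of the points above, per-client tracking and raw-value-on-first-observation, can be sketched together. This is an illustrative simplification: deltas are keyed by client id so a producer and a consumer in the same process never overwrite each other, and a missing key defaults to 0 because librdkafka's cumulative counters start at 0.

```python
# Sketch of per-client cumulative delta tracking. Names are illustrative.
class CumulativeDeltaTracker:
    def __init__(self):
        # (client_id, metric) -> last observed cumulative value
        self._previous = {}

    def delta(self, client_id, metric, value):
        key = (client_id, metric)
        # Missing key means first observation; defaulting to 0 reports the
        # raw value, which is correct because the counter started at 0.
        prev = self._previous.get(key, 0)
        self._previous[key] = value
        return value - prev
```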
- Eliminate 6 intermediate model classes (KafkaMetricsData, etc.); CreateMetricsDictionary works directly from the deserialized KafkaStatistics
- Add PopulateMetricsDictionary for dictionary reuse across drain cycles
- Merge topic and partition iteration into a single pass
- Remove duplicate client-level request-total/response-total metrics
- Replace LINQ batch-size averaging with a running sum
- Use actual partition rxbytes for consumer topic bytes-consumed-total
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
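The running-sum refactor can be sketched as below: instead of materializing a collection of per-broker batch sizes and averaging it afterwards (the LINQ-style approach), keep a sum and a count in a single pass. The broker dictionaries and the `batchsize_avg` field name here are illustrative placeholders, not librdkafka's actual statistics schema.

```python
# Sketch of the running-sum averaging that replaced LINQ-style
# collect-then-Average(). Field access is illustrative.
def average_batch_size(brokers):
    total, count = 0, 0
    for broker in brokers:          # single pass, no intermediate list
        size = broker.get("batchsize_avg")
        if size is not None:
            total += size
            count += 1
    return total / count if count else 0
```

Besides avoiding an intermediate allocation per drain cycle, the explicit `count` guard also handles the empty case that `Average()` would throw on.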
Force-pushed from bd7c617 to a08f37c
…mer lag
Covers previously untested paths: negative node IDs (seed brokers), coordinator node IDs, large non-coordinator node IDs, and topic-level consumer records-lag-avg/consumed-total metrics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lease-info" This reverts commit 02f31ba.
Tests the nodeId == int.MaxValue edge case where coordinatorId computes to 0, covering the false branch of the coordinatorId > 0 check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3494 +/- ##
==========================================
+ Coverage 81.77% 81.90% +0.13%
==========================================
Files 508 509 +1
Lines 34220 34411 +191
Branches 4040 4077 +37
==========================================
+ Hits 27984 28185 +201
+ Misses 5269 5257 -12
- Partials 967 969 +2
Summary
Adds comprehensive Kafka internal metrics collection to the .NET agent. This enables the New Relic Kafka UI to display critical monitoring data from .NET applications using the Confluent Kafka client (librdkafka).
Key changes:
- Hooks the Confluent.Kafka builders' SetStatisticsHandler, preserving any customer-configured handlers
- Adds a RecordGaugeMetric API on IAgentExperimental for reporting point-in-time values
Metric format:
🤖 Generated with Claude Code