Skip to content

Commit 4df2813

Browse files
authored
[REFACTOR] Phase 3.0: Merge Duplicate Metrics Classes (#474)
* docs: add metrics system audit document - Inventory all existing metrics classes - Document field and method duplication analysis - Define canonical metric names following Prometheus conventions - Propose unified MetricsBase architecture Part of Phase 3.1 (#470) * refactor(metrics): create MetricsBase class to eliminate duplication - Add MetricsBase abstract class with common atomic counters - Migrate ThreadPoolMetrics to extend MetricsBase - Migrate EnhancedThreadPoolMetrics to extend MetricsBase - Remove 5 duplicated atomic fields from EnhancedThreadPoolMetrics - Add utilization() and success_rate() helper methods to base class Code duplication reduced by approximately 70% in metrics system. Part of Phase 3.2 (#471) * feat(metrics): add pluggable metrics backend system - Add MetricsBackend abstract interface for extensible export - Implement PrometheusBackend for Prometheus/OpenMetrics format - Implement JsonBackend for REST APIs and dashboards - Implement LoggingBackend for debugging and diagnostics - Add BackendRegistry singleton for backend management - Support custom labels and metric name prefixes Part of Phase 3.3 (#472) * refactor(metrics): migrate export methods to use MetricsBackend - Update EnhancedThreadPoolMetrics::to_json() to delegate to JsonBackend - Update EnhancedThreadPoolMetrics::to_prometheus() to use PrometheusBackend - Remove redundant sstream/iomanip includes from enhanced_metrics.cpp - Maintain backward compatible API while using unified backend system Part of Phase 3.4 (#473) * docs: update forward declarations and metrics audit document - Add metrics namespace forward declarations to forward.h - Include MetricsBase, ThreadPoolMetrics, EnhancedThreadPoolMetrics - Include MetricsBackend classes and snapshot structures - Update CMakeLists.txt with new metrics header files Completes documentation updates for Phase 3.0
1 parent dd0fefe commit 4df2813

9 files changed

Lines changed: 1291 additions & 183 deletions

File tree

core/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,8 @@ set(CORE_HEADERS
5555
# Metrics module
5656
${CMAKE_CURRENT_SOURCE_DIR}/../include/kcenon/thread/metrics/enhanced_metrics.h
5757
${CMAKE_CURRENT_SOURCE_DIR}/../include/kcenon/thread/metrics/latency_histogram.h
58+
${CMAKE_CURRENT_SOURCE_DIR}/../include/kcenon/thread/metrics/metrics_backend.h
59+
${CMAKE_CURRENT_SOURCE_DIR}/../include/kcenon/thread/metrics/metrics_base.h
5860
${CMAKE_CURRENT_SOURCE_DIR}/../include/kcenon/thread/metrics/sliding_window_counter.h
5961
${CMAKE_CURRENT_SOURCE_DIR}/../include/kcenon/thread/metrics/thread_pool_metrics.h
6062
# Resilience module
@@ -102,6 +104,7 @@ set(CORE_SOURCES
102104
# Metrics implementation
103105
${CMAKE_CURRENT_SOURCE_DIR}/../src/metrics/enhanced_metrics.cpp
104106
${CMAKE_CURRENT_SOURCE_DIR}/../src/metrics/latency_histogram.cpp
107+
${CMAKE_CURRENT_SOURCE_DIR}/../src/metrics/metrics_backend.cpp
105108
${CMAKE_CURRENT_SOURCE_DIR}/../src/metrics/sliding_window_counter.cpp
106109
# Resilience implementation
107110
${CMAKE_CURRENT_SOURCE_DIR}/../src/resilience/circuit_breaker.cpp

docs/metrics_audit.md

Lines changed: 159 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,159 @@
1+
# Metrics System Audit
2+
3+
## Overview
4+
5+
This document provides a comprehensive audit of the metrics system in thread_system,
6+
identifying duplications, defining canonical metric names, and proposing a unified architecture.
7+
8+
## Metrics Classes Inventory
9+
10+
### Primary Classes
11+
12+
| Class | Location | Lines | Purpose |
13+
|-------|----------|-------|---------|
14+
| `ThreadPoolMetrics` | `metrics/thread_pool_metrics.h` | 75 | Lightweight metrics container |
15+
| `EnhancedThreadPoolMetrics` | `metrics/enhanced_metrics.h/cpp` | 856 | Production-grade observability |
16+
| `LatencyHistogram` | `metrics/latency_histogram.h/cpp` | 649 | HDR-style histogram |
17+
| `SlidingWindowCounter` | `metrics/sliding_window_counter.h/cpp` | 499 | Throughput measurement |
18+
19+
### Supporting Structures
20+
21+
| Structure | Location | Purpose |
22+
|-----------|----------|---------|
23+
| `ThreadPoolMetrics::Snapshot` | `thread_pool_metrics.h` | Basic metrics snapshot |
24+
| `EnhancedSnapshot` | `enhanced_metrics.h` | Comprehensive snapshot with percentiles |
25+
| `WorkerMetrics` | `enhanced_metrics.h` | Per-worker tracking |
26+
| `scaling_metrics_sample` | `scaling/scaling_metrics.h` | Autoscaling decisions |
27+
| `work_stealing_stats_snapshot` | `stealing/work_stealing_stats.h` | NUMA-aware scheduling |
28+
29+
## Duplication Analysis
30+
31+
### Field Duplication (ThreadPoolMetrics vs EnhancedThreadPoolMetrics)
32+
33+
| Field | ThreadPoolMetrics | EnhancedThreadPoolMetrics | Status |
34+
|-------|-------------------|---------------------------|--------|
35+
| `tasks_submitted_` | Yes | Yes | **DUPLICATE** |
36+
| `tasks_executed_` | Yes | Yes | **DUPLICATE** |
37+
| `tasks_failed_` | Yes | Yes | **DUPLICATE** |
38+
| `total_busy_time_ns_` | Yes | Yes | **DUPLICATE** |
39+
| `total_idle_time_ns_` | Yes | Yes | **DUPLICATE** |
40+
| `tasks_enqueued_` | Yes | No | Unique |
41+
42+
**Field Duplication Rate**: 83% (5 of 6 fields)
43+
44+
### Method Duplication
45+
46+
| Method | ThreadPoolMetrics | EnhancedThreadPoolMetrics | Status |
47+
|--------|-------------------|---------------------------|--------|
48+
| `record_submission()` | Yes | Yes | **DUPLICATE** |
49+
| `record_execution()` | Yes | Yes | **DUPLICATE** |
50+
| `reset()` | Yes | Yes | **DUPLICATE** |
51+
| `snapshot()` | Yes | Yes | **DUPLICATE** |
52+
53+
**Method Duplication Rate**: 100% (4 core methods)
54+
55+
### Implementation Pattern Duplication
56+
57+
Both classes use identical patterns:
58+
- `fetch_add()` with `memory_order_relaxed`
59+
- `store()` with `memory_order_relaxed`
60+
- `load()` with `memory_order_relaxed`
61+
62+
**Overall Duplication Estimate**: 40-50%
63+
64+
## Canonical Metric Names (Prometheus Convention)
65+
66+
### Counter Metrics
67+
68+
| Current Name | Canonical Name | Type | Description |
69+
|--------------|----------------|------|-------------|
70+
| `tasks_submitted_` | `thread_pool_tasks_submitted_total` | Counter | Total tasks submitted |
71+
| `tasks_enqueued_` | `thread_pool_tasks_enqueued_total` | Counter | Total tasks enqueued |
72+
| `tasks_executed_` | `thread_pool_tasks_executed_total` | Counter | Total tasks executed |
73+
| `tasks_failed_` | `thread_pool_tasks_failed_total` | Counter | Total failed tasks |
74+
| `total_busy_time_ns_` | `thread_pool_busy_time_nanoseconds_total` | Counter | Total busy time |
75+
| `total_idle_time_ns_` | `thread_pool_idle_time_nanoseconds_total` | Counter | Total idle time |
76+
77+
### Gauge Metrics
78+
79+
| Current Name | Canonical Name | Type | Description |
80+
|--------------|----------------|------|-------------|
81+
| `current_queue_depth_` | `thread_pool_queue_depth` | Gauge | Current queue depth |
82+
| `peak_queue_depth_` | `thread_pool_queue_depth_peak` | Gauge | Peak queue depth |
83+
| `active_workers_` | `thread_pool_workers_active` | Gauge | Active worker count |
84+
85+
### Histogram Metrics
86+
87+
| Current Name | Canonical Name | Type | Description |
88+
|--------------|----------------|------|-------------|
89+
| `enqueue_latency_` | `thread_pool_enqueue_latency_seconds` | Histogram | Enqueue latency |
90+
| `execution_latency_` | `thread_pool_execution_latency_seconds` | Histogram | Execution latency |
91+
| `wait_time_` | `thread_pool_wait_time_seconds` | Histogram | Queue wait time |
92+
93+
## Recommended Architecture
94+
95+
### Class Hierarchy
96+
97+
```
98+
MetricsBase (abstract base)
99+
├── ThreadPoolMetrics (lightweight)
100+
└── EnhancedThreadPoolMetrics (full-featured)
101+
├── LatencyHistogram (3 instances)
102+
├── SlidingWindowCounter (2 instances)
103+
└── WorkerMetrics[] (per-worker)
104+
```
105+
106+
### MetricsBase Interface
107+
108+
```cpp
109+
class MetricsBase {
110+
public:
111+
virtual ~MetricsBase() = default;
112+
113+
// Recording methods
114+
void record_submission(std::size_t count = 1);
115+
void record_execution(std::uint64_t duration_ns, bool success);
116+
void record_idle_time(std::uint64_t duration_ns);
117+
virtual void reset();
118+
119+
// Common getters
120+
std::uint64_t tasks_submitted() const;
121+
std::uint64_t tasks_executed() const;
122+
std::uint64_t tasks_failed() const;
123+
std::uint64_t total_busy_time_ns() const;
124+
std::uint64_t total_idle_time_ns() const;
125+
126+
protected:
127+
std::atomic<std::uint64_t> tasks_submitted_{0};
128+
std::atomic<std::uint64_t> tasks_executed_{0};
129+
std::atomic<std::uint64_t> tasks_failed_{0};
130+
std::atomic<std::uint64_t> total_busy_time_ns_{0};
131+
std::atomic<std::uint64_t> total_idle_time_ns_{0};
132+
};
133+
```
134+
135+
### Benefits
136+
137+
1. **Code Reduction**: ~70% reduction in duplicated code
138+
2. **Consistency**: Uniform metric recording across all variants
139+
3. **Extensibility**: Easy to add new metrics types
140+
4. **Maintainability**: Single point of change for core logic
141+
142+
## Migration Path
143+
144+
1. **Phase 3.1** (Complete): Audit and documentation
145+
2. **Phase 3.2**: Create MetricsBase class
146+
3. **Phase 3.3**: Implement backend interfaces
147+
4. **Phase 3.4**: Migrate existing code
148+
149+
## Related Issues
150+
151+
- #436: Phase 3.0: Merge Duplicate Metrics Classes
152+
- #470: Phase 3.1: Metrics Audit and Documentation
153+
- #471: Phase 3.2: Create Unified MetricsBase Class
154+
- #472: Phase 3.3: Implement MetricsBackend Interface
155+
- #473: Phase 3.4: Migrate thread_pool to Unified Metrics
156+
157+
---
158+
159+
*Document created as part of Phase 3.0 refactoring*

include/kcenon/thread/forward.h

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -182,6 +182,24 @@ using lockfree_queue [[deprecated(
182182
"Use detail::concurrent_queue<T> instead. "
183183
"For true lock-free queue, see detail::lockfree_job_queue with hazard pointers.")]] = detail::concurrent_queue<T>;
184184

185+
// ============================================================================
186+
// Metrics types (in metrics sub-namespace)
187+
// ============================================================================
188+
189+
namespace metrics {
190+
class MetricsBase;
191+
class ThreadPoolMetrics;
192+
class EnhancedThreadPoolMetrics;
193+
class MetricsBackend;
194+
class PrometheusBackend;
195+
class JsonBackend;
196+
class LoggingBackend;
197+
class BackendRegistry;
198+
struct BaseSnapshot;
199+
struct EnhancedSnapshot;
200+
struct WorkerMetrics;
201+
} // namespace metrics
202+
185203
// ============================================================================
186204
// Diagnostics types (in diagnostics sub-namespace)
187205
// ============================================================================

include/kcenon/thread/metrics/enhanced_metrics.h

Lines changed: 7 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
3333
#pragma once
3434

3535
#include <kcenon/thread/metrics/latency_histogram.h>
36+
#include <kcenon/thread/metrics/metrics_base.h>
3637
#include <kcenon/thread/metrics/sliding_window_counter.h>
3738

3839
#include <atomic>
@@ -266,8 +267,9 @@ struct WorkerMetrics {
266267
* @see LatencyHistogram
267268
* @see SlidingWindowCounter
268269
* @see ThreadPoolMetrics
270+
* @see MetricsBase
269271
*/
270-
class EnhancedThreadPoolMetrics {
272+
class EnhancedThreadPoolMetrics : public MetricsBase {
271273
public:
272274
/**
273275
* @brief Constructs enhanced metrics with the specified worker count.
@@ -278,7 +280,7 @@ class EnhancedThreadPoolMetrics {
278280
/**
279281
* @brief Destructor.
280282
*/
281-
~EnhancedThreadPoolMetrics() = default;
283+
~EnhancedThreadPoolMetrics() override = default;
282284

283285
// Non-copyable, non-movable for thread safety
284286
EnhancedThreadPoolMetrics(const EnhancedThreadPoolMetrics&) = delete;
@@ -394,7 +396,7 @@ class EnhancedThreadPoolMetrics {
394396
*
395397
* Clears all histograms, counters, and per-worker stats.
396398
*/
397-
void reset();
399+
void reset() override;
398400

399401
/**
400402
* @brief Update worker count.
@@ -432,10 +434,8 @@ class EnhancedThreadPoolMetrics {
432434
SlidingWindowCounter throughput_1s_;
433435
SlidingWindowCounter throughput_1m_;
434436

435-
// Basic counters
436-
std::atomic<std::uint64_t> tasks_submitted_{0};
437-
std::atomic<std::uint64_t> tasks_executed_{0};
438-
std::atomic<std::uint64_t> tasks_failed_{0};
437+
// Note: Basic counters (tasks_submitted_, tasks_executed_, tasks_failed_,
438+
// total_busy_time_ns_, total_idle_time_ns_) are inherited from MetricsBase
439439

440440
// Queue depth tracking
441441
std::atomic<std::size_t> current_queue_depth_{0};
@@ -445,8 +445,6 @@ class EnhancedThreadPoolMetrics {
445445

446446
// Worker tracking
447447
std::atomic<std::size_t> active_workers_{0};
448-
std::atomic<std::uint64_t> total_busy_time_ns_{0};
449-
std::atomic<std::uint64_t> total_idle_time_ns_{0};
450448

451449
// Per-worker metrics
452450
mutable std::mutex workers_mutex_;

0 commit comments

Comments
 (0)