Add pluggable stats registry for metrics collection#163

Open
equals215 wants to merge 8 commits into master from accept-stats-registry

equals215 commented Nov 24, 2025

To be reviewed after #160, as this branch includes the #160 changes.

Summary

This PR refactors the metrics collection system to use a pluggable StatsRegistry interface with Prometheus-style label support, replacing global atomic counters with a flexible registry pattern that allows integration with external monitoring systems.

Changes

Core Infrastructure

  • New StatsRegistry interface (stats.go): Defines Counter, Gauge, and Histogram interfaces with a StatsRegistry for registration
  • Prometheus-style label support: All metric types include a WithLabels() method for dimensional metrics
    • Labels are sorted by key for consistent metric identification
    • Each unique label combination creates a separate metric series
  • Removed global atomic variables: Eliminated global counters like DataTotal, CDXDedupeTotalBytes, etc. from client.go
  • HTTPClientSettings enhancement: Added StatsRegistry field to allow users to provide custom metrics implementations
  • Local registry implementation: Provides a thread-safe default implementation when no external registry is specified
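
The label rule above (sort by key, one series per unique label combination) can be illustrated with a minimal sketch. The helper name seriesKey, the key format, and the "type" label are assumptions made for illustration; the actual implementation in stats.go may build its identifiers differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a canonical identifier for a metric name plus its labels.
// Hypothetical helper: gowarc's real key format may differ.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	// Sort by label key so Go's random map iteration order never
	// changes the resulting identifier.
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	b.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			b.WriteByte(',')
		}
		fmt.Fprintf(&b, "%s=%q", k, labels[k])
	}
	b.WriteByte('}')
	return b.String()
}

func main() {
	// Same labels, different insertion order: the keys must match.
	a := seriesKey("proxy_requests_total", map[string]string{"proxy": "p1", "type": "mobile"})
	b := seriesKey("proxy_requests_total", map[string]string{"type": "mobile", "proxy": "p1"})
	fmt.Println(a == b) // true
	fmt.Println(a)      // proxy_requests_total{proxy="p1",type="mobile"}
}
```

Sorting before joining is what makes two calls with the same labels resolve to the same series, while any change in a label value yields a distinct key and therefore a separate series.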

Metrics Tracked

  • WARC Writing: total_data_written - Total bytes written to WARC files
  • Local Dedupe: local_deduped_bytes_total, local_deduped_total
  • Doppelganger Dedupe: doppelganger_deduped_bytes_total, doppelganger_deduped_total
  • CDX Dedupe: cdx_deduped_bytes_total, cdx_deduped_total
  • Proxy Stats (with proxy label):
    • proxy_requests_total - Total requests through each proxy
    • proxy_errors_total - Total errors for each proxy
    • proxy_last_used_nanoseconds - Last usage timestamp
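
As a rough sketch of how a per-proxy metric such as proxy_requests_total might be recorded, the example below uses a small in-memory registry. Only RegisterCounter's signature and the WithLabels() method come from this PR; the Add and Value methods, the localRegistry and localCounter types, and the proxy names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"sync/atomic"
)

// Counter is an assumed shape; only WithLabels is named in the PR description.
type Counter interface {
	Add(delta int64)
	Value() int64
	WithLabels(labels map[string]string) Counter
}

// localCounter is one metric series backed by an atomic integer.
type localCounter struct {
	reg  *localRegistry
	name string // base metric name, without labels
	val  atomic.Int64
}

func (c *localCounter) Add(delta int64) { c.val.Add(delta) }
func (c *localCounter) Value() int64    { return c.val.Load() }
func (c *localCounter) WithLabels(labels map[string]string) Counter {
	return c.reg.series(c.name, labels)
}

// localRegistry is a hypothetical stand-in for the default local registry.
type localRegistry struct {
	mu       sync.Mutex
	counters map[string]*localCounter
}

// series returns the counter for name+labels, creating it on first use.
func (r *localRegistry) series(name string, labels map[string]string) *localCounter {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	key := name + "{" + strings.Join(parts, ",") + "}"

	r.mu.Lock()
	defer r.mu.Unlock()
	c, ok := r.counters[key]
	if !ok {
		c = &localCounter{reg: r, name: name}
		r.counters[key] = c
	}
	return c
}

// RegisterCounter mirrors the signature shown in the PR; help and
// labelNames are ignored in this sketch.
func (r *localRegistry) RegisterCounter(name, help string, labelNames []string) Counter {
	return r.series(name, nil)
}

func main() {
	reg := &localRegistry{counters: map[string]*localCounter{}}
	requests := reg.RegisterCounter("proxy_requests_total",
		"Total requests through each proxy", []string{"proxy"})

	requests.WithLabels(map[string]string{"proxy": "proxy-a"}).Add(2)
	requests.WithLabels(map[string]string{"proxy": "proxy-b"}).Add(1)

	fmt.Println(requests.WithLabels(map[string]string{"proxy": "proxy-a"}).Value()) // 2
	fmt.Println(requests.WithLabels(map[string]string{"proxy": "proxy-b"}).Value()) // 1
}
```

Each proxy label value gets its own series, so traffic through one proxy never shows up in another proxy's count.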

Updated Components

  • Deduplication tracking: Refactored CDX, Doppelganger, and local dedupe metrics to use the registry pattern
  • Proxy stats: Added comprehensive statistics tracking for proxy usage with label-based proxy identification
  • Updated existing tests: Modified all client tests to use the new registry-based metrics access

Documentation

  • README.md: Added comprehensive "Metrics and Observability" section with:
    • Interface implementation examples
    • Complete list of available metrics
    • Label usage examples
    • Direct link to stats.go for interface contract

Test Coverage

  • stats_test.go: Comprehensive unit tests for registry, counters, gauges, histograms, and labels
  • dialer_test.go: Extensive proxy stats tests including:
    • Metric name generation with labels
    • Request count tracking
    • Last used timestamp tracking
    • Round-robin proxy selection with stats
    • Domain and proxy type filtering with stats
    • Nil registry safety
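
The round-robin and nil-safety cases listed above could look roughly like the following. This is a simplified sketch, not the dialer's actual code: the pool, proxy, and proxyStats types are hypothetical, and a nil stats pointer stands in for the "no registry configured" case.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// proxyStats holds per-proxy counters (hypothetical type).
type proxyStats struct {
	requests atomic.Int64
}

// recordRequest is safe to call on a nil receiver, so callers do not
// need to check whether stats collection is enabled.
func (s *proxyStats) recordRequest() {
	if s == nil {
		return
	}
	s.requests.Add(1)
}

type proxy struct {
	url   string
	stats *proxyStats // nil when no stats are collected
}

type pool struct {
	proxies []*proxy
	next    atomic.Uint64
}

// pick returns proxies in round-robin order and records one request
// against the chosen proxy.
func (p *pool) pick() *proxy {
	if len(p.proxies) == 0 {
		return nil
	}
	i := p.next.Add(1) - 1
	px := p.proxies[i%uint64(len(p.proxies))]
	px.stats.recordRequest()
	return px
}

func main() {
	p := &pool{proxies: []*proxy{
		{url: "proxy-a", stats: &proxyStats{}},
		{url: "proxy-b", stats: &proxyStats{}},
		{url: "proxy-c"}, // no stats: must not panic
	}}
	for i := 0; i < 6; i++ {
		p.pick()
	}
	fmt.Println(p.proxies[0].stats.requests.Load()) // 2
	fmt.Println(p.proxies[1].stats.requests.Load()) // 2
}
```

Putting the nil check inside the method keeps the selection path free of conditionals, which is one common way to make an optional metrics hook safe.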

Benefits

  • Pluggable metrics: Users can now integrate gowarc with their preferred metrics systems (Prometheus, Datadog, etc.)
  • Dimensional metrics: Label support enables rich multi-dimensional observability
  • Better encapsulation: Metrics are no longer global state, improving testability and preventing conflicts
  • Backward compatible: Automatically creates a local registry if none is provided (the local registry does not support histograms, as they are not easy to implement)
  • Thread-safe: Registry implementation uses proper locking for concurrent access
  • Standardized metric names: All metrics now have consistent naming and help text
  • Proxy observability: Track usage, errors, and performance of individual proxies

Interface Example

// Implement the StatsRegistry interface
type MyPrometheusRegistry struct {
    // Your Prometheus registry fields
}

func (r *MyPrometheusRegistry) RegisterCounter(name, help string, labelNames []string) warc.Counter {
    // Return a Counter that wraps your Prometheus counter.
    // The Counter interface requires a WithLabels() method for dimensional metrics.
    panic("not implemented") // placeholder so the stub compiles
}

// Use with gowarc
clientSettings := warc.HTTPClientSettings{
    StatsRegistry: &MyPrometheusRegistry{},
    // ... other settings
}

Test Plan

  • All existing tests pass with the new registry pattern
  • New unit tests verify Counter, Gauge, and Histogram implementations
  • Label support tests verify consistent metric identification
  • Concurrency tests ensure thread-safety of the registry
  • Local dedupe, CDX dedupe, and Doppelganger dedupe metrics work correctly
  • Proxy stats tests verify request counting, timestamp tracking, and round-robin distribution
  • Domain and proxy type filtering work correctly with stats

CorentinB and others added 3 commits November 20, 2025 18:36
- Support multiple proxies with round-robin selection
- Add ProxyNetwork enum (IPv4/IPv6 filtering)
- Add ProxyType enum (Mobile/Residential/Datacenter)
- Add per-domain routing with glob patterns
- Add per-proxy statistics (RequestCount, ErrorCount, LastUsed)
- Add context-based proxy type selection
- Breaking change: replace Proxy string with Proxies []ProxyConfig
equals215 self-assigned this Nov 24, 2025
equals215 added the enhancement (New feature or request) label Nov 24, 2025
Labels

enhancement New feature or request

2 participants