Skip to content

Remove use of solana_metrics::Counter and associated macro's #9185

@steviez

Description

@steviez

Problem

The metrics Counter and associated macro's need to go. They are "convenient" and "flexible" for tracking an individual metric that may get updated by multiple threads, but that convenience comes at the cost of hidden performance impact. Namely, each counter uses 5 atomics:

pub counts: AtomicUsize,
pub times: AtomicUsize,
/// last accumulated value logged
pub lastlog: AtomicUsize,
pub lograte: AtomicUsize,
pub metricsrate: AtomicU64,

and involves multiple operations on these fields when the field is updated:
pub fn inc(&self, level: log::Level, events: usize) {
let now = solana_time_utils::timestamp();
let counts = self.counts.fetch_add(events, Ordering::Relaxed);
let times = self.times.fetch_add(1, Ordering::Relaxed);
let lograte = self.lograte.load(Ordering::Relaxed);
let metricsrate = self.metricsrate.load(Ordering::Relaxed);
if times.is_multiple_of(lograte) && times > 0 && log_enabled!(level) {
log!(
level,
"COUNTER:{{\"name\": \"{}\", \"counts\": {}, \"samples\": {times}, \"now\": \
{now}, \"events\": {events}}}",
self.name,
counts + events,
);
}
let lastlog = self.lastlog.load(Ordering::Relaxed);
#[allow(deprecated)]
let prev = self
.lastlog
.compare_and_swap(lastlog, counts, Ordering::Relaxed);
if prev == lastlog {
let bucket = now / metricsrate;
let counter = CounterPoint {
name: self.name,
count: counts as i64 - lastlog as i64,
timestamp: SystemTime::now(),
};
submit_counter(counter, level, bucket);

This is not suitable for any hot path, and has mostly been removed from the codebase. There are still some instances left though.

Proposed Solution

On a case by case basis, update or remove all remaining uses of this type. From an initial skim, there look to be a couple categories:

  1. Counters that can/should get folded into a "metrics struct" that is already reported by some thread
  2. Counters for infrequent / error situations; these currently report immediately for the first instance and would be equivalent to using datapoint_
  3. Counters that can probably just be ripped out (trace or debug level, not useful, etc)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions