forked from solana-labs/solana
-
Notifications
You must be signed in to change notification settings - Fork 829
Open
Description
Problem
The metrics Counter and associated macro's need to go. They are "convenient" and "flexible" for tracking an individual metric that may get updated by multiple threads, but that convenience comes at the cost of hidden performance impact. Namely, each counter uses 5 atomics:
Lines 18 to 23 in c832757
| pub counts: AtomicUsize, | |
| pub times: AtomicUsize, | |
| /// last accumulated value logged | |
| pub lastlog: AtomicUsize, | |
| pub lograte: AtomicUsize, | |
| pub metricsrate: AtomicU64, |
and involves multiple operations on these fields when the field is updated:
Lines 169 to 198 in c832757
| pub fn inc(&self, level: log::Level, events: usize) { | |
| let now = solana_time_utils::timestamp(); | |
| let counts = self.counts.fetch_add(events, Ordering::Relaxed); | |
| let times = self.times.fetch_add(1, Ordering::Relaxed); | |
| let lograte = self.lograte.load(Ordering::Relaxed); | |
| let metricsrate = self.metricsrate.load(Ordering::Relaxed); | |
| if times.is_multiple_of(lograte) && times > 0 && log_enabled!(level) { | |
| log!( | |
| level, | |
| "COUNTER:{{\"name\": \"{}\", \"counts\": {}, \"samples\": {times}, \"now\": \ | |
| {now}, \"events\": {events}}}", | |
| self.name, | |
| counts + events, | |
| ); | |
| } | |
| let lastlog = self.lastlog.load(Ordering::Relaxed); | |
| #[allow(deprecated)] | |
| let prev = self | |
| .lastlog | |
| .compare_and_swap(lastlog, counts, Ordering::Relaxed); | |
| if prev == lastlog { | |
| let bucket = now / metricsrate; | |
| let counter = CounterPoint { | |
| name: self.name, | |
| count: counts as i64 - lastlog as i64, | |
| timestamp: SystemTime::now(), | |
| }; | |
| submit_counter(counter, level, bucket); |
This is not suitable for any hot path, and has mostly been removed from the codebase. There are still some instances left though.
Proposed Solution
On a case by case basis, update or remove all remaining uses of this type. From an initial skim, there look to be a couple categories:
- Counters that can/should get folded into a "metrics struct" that is already reported by some thread
- Counters for infrequent / error situations; these currently report immediately for the first instance and would be equivalent to using
datapoint_ - Counters that can probably just be ripped out (trace or debug level, not useful, etc)
Metadata
Metadata
Assignees
Labels
No labels