-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Consolidate process count metrics #19706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Consolidate process count metrics #19706
Conversation
@@ -217,7 +212,7 @@ def check_metrics(self, status): | |||
role_counts[rolename] = 1 | |||
|
|||
for role in role_counts: | |||
self.gauge("foundationdb.processes_per_role." + role, role_counts[role]) | |||
self.gauge("foundationdb.instances", role_counts[role], ["fdb_role:" + role]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that foundationdb.instances
was previously a count
, not a gauge
, but I think that was a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up, I still think this is a bug, but out of scope for this pull request.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Thinking about this a little more, this is definitely going to need dashboard updates, which I simply haven't done yet. I guess let's see if we agree on the basic idea first; if the general approach seems reasonable, then I'll dive into the dashboards/alerts/everything else! Also, I think we can actually collapse this down into a single
I think what will happen is that just querying The other nice thing about the "tag by role" approach (and I should have mentioned this earlier) is that it means that if new roles appear in future versions of FoundationDB, those new roles will just appear as new tags and operators won't have to add new metrics to their "how many processes do I have?" queries. Seem reasonable? |
d806243
to
b53229a
Compare
From the conversation in #19681 (comment), it sounds like the best way forward here would be to introduce a new metric rather than changing anything that already exists. I'm not in a position to do that this afternoon, but acknowledge it needs to get done and will address it as soon as I can! |
7c5e213
to
ef36263
Compare
I've revised this significantly in light of discussions elsewhere, and now I think it's much less controversial. I'll update the description to match. |
ef36263
to
c7c0690
Compare
@jon-signal Hello again! I'll take a look at this soon ™️ |
c7c0690
to
6471128
Compare
6471128
to
2959875
Compare
What does this PR do?
This introduces a new, single
foundationdb.processes_per_role
gauge that is tagged byfdb_role
andfdb_process_class
. It makes no changes to existing metrics.Motivation
There are two main motivations for this new gauge:
proxy
role, but in FoundationDB 7.x, that role is gone and has been replaced bygrv_proxy
andcommit_proxy
. Using tags (instead of individual gauges) to count processes by role gives this integration more flexibility to work with different versions of FoundationDB.storage
andlog
roles would be bad news, and operators might want to add a monitor for problematic duplicate role assignments.Review checklist (to be filled by reviewers)
qa/skip-qa
label if the PR doesn't need to be tested during QA.backport/<branch-name>
label to the PR and it will automatically open a backport PR once this one is merged