-
-
Notifications
You must be signed in to change notification settings - Fork 12k
[Feature]: Prometheus Metrics Abstraction #30689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Feature]: Prometheus Metrics Abstraction #30689
Conversation
Signed-off-by: Mladjan Gadzic <[email protected]>
|
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a well-designed abstraction layer for metrics, which will make it easier to support different backends like Prometheus and OpenTelemetry in the future. The refactoring of existing metric collection points to use this new abstraction is thorough. I've found one critical issue related to a leaky abstraction that needs to be addressed to ensure backend-agnostic behavior.
| self.labelname_max_lora: self.max_lora, | ||
| } | ||
| self.gauge_lora_info.labels(**lora_info_labels).set_to_current_time() | ||
| self.gauge_lora_info.labels(**lora_info_labels).set_to_current_time() # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method set_to_current_time() is specific to the prometheus-client and is not part of the AbstractGauge interface. Using it breaks the abstraction layer you've introduced, and it will cause a runtime error if a different metrics backend (like the prototype OTEL one) is used.
To fix this and adhere to the abstraction, you should use the set() method from the AbstractGauge interface with the current time. The type: ignore is still necessary due to how labels() is typed in the base class, which is a separate typing issue.
| self.gauge_lora_info.labels(**lora_info_labels).set_to_current_time() # type: ignore[attr-defined] | |
| self.gauge_lora_info.labels(**lora_info_labels).set(time.time()) # type: ignore[attr-defined] |
bbartels
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple follow up clean-up tasks would be good, ideally any publisher of metrics isn't aware of which backend is being exported to, so references to prometheus could be cleaned up in various places (assuming it doesn't break public API surface).
Worth updating docs (if there are any) to reflect these changes so developers are aware of the changes of how to register metrics
Purpose
Implemented feature #30394
Test Plan
Added tests.
Test Result
All metrics test passes.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.