-
Notifications
You must be signed in to change notification settings - Fork 198
collect more apt metrics (Closes: #220) #234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
539a748
to
6664cbd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great start, thanks for working on this!
could you give an example output before/after the change? i think you have a before in: https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/issues/19#note_3151581
or at least a sample of the new metrics on a live system, to give us an idea of what it looks like?
apt_info.py
Outdated
origins = sorted( | ||
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins} | ||
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins if o.archive !='now'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linter is unhappy with this one:
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins if o.archive !='now'} | |
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins if o.archive != 'now'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, is the count still correct with this change? I imagine it just removes the needless origin, but did you make sure the number of install packages metric total still correctly reflect the actual installed count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh interesting, I just saw the failed flake8 test in CI. I'm using pywright locally because it was simpler to integrate into my editor + plugins but it's showing nothing for the problems that flake8 found. I guess that means it's time I made the effort of switching away from pywright.
It'll be fixed in my next push
apt_info.py
Outdated
@@ -69,7 +73,7 @@ def _write_pending_upgrades(registry, cache): | |||
def _write_held_upgrades(registry, cache): | |||
held_candidates = { | |||
p.candidate for p in cache | |||
if p.is_upgradable and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD | |||
if p.is_upgradable and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD and not p.phasing_applied |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linter also yells here, not sure how to fix that, perhaps:
if p.is_upgradable and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD and not p.phasing_applied | |
if ( | |
p.is_upgradable | |
and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD | |
and not p.phasing_applied | |
) |
apt_info.py
Outdated
p.candidate.origins[0].origin in ['', "/var/lib/dpkg/status"]) | ||
)] | ||
|
||
g = Gauge('apt_obsolete_packages', "Apt packages which are obsolete", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder about that metric name. I think, on its own, it makes sense, but when compared to the other metrics generated here, less so. All other metrics here do not have a _packages
suffix, so I wonder if we shouldn't change the name here to be more coherent with the other metrics, which are:
apt_upgrades_pending
apt_upgrades_held
apt_autoremove_pending
apt_package_cache_timestamp_seconds
node_reboot_required
(It should be noted that most of the metrics here do not follow the proper naming convention, except maybe apt_package_cache_timestamp_seconds
, but that's probably because I added it recently. Extra credits for the node_reboot_required
metric, coming completely out of left field...)
Anyways. I think I'm convincing myself that the name you picked is actually not that bad, but if we're going to break the (bad) convention that's already there, let's make it follow a proper convention, and let's call it apt_packages_obsolete_count
, or maaybe apt_obsolete_count
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes you're right. I like apt_packages_obsolete_count
, let's go for that one
apt_info.py
Outdated
per_origin = _convert_candidates_to_upgrade_infos(installed_packages) | ||
|
||
if per_origin: | ||
g = Gauge('apt_installed_per_origin', "Numberof ppt packages installed per origin.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comment here: either make this coherent with the obsolete metric, or make the obsolete metric coherent with this. :)
i would call it apt_packages_per_origin_count
... heck, you could even call it apt_packages_count
, then we could add more labels (e.g. state=installed
) as we go along converting the other metrics... ;) but maybe that would be too ambiguous.
also, the description has a couple of typos here, maybe you mean:
g = Gauge('apt_installed_per_origin', "Numberof ppt packages installed per origin.", | |
g = Gauge('apt_installed_per_origin', "Number of packages installed per origin.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk about going with apt_packages_count
since some packages will be double+ counted since they can match more than one state. personally because of the name I would kind of expect the total sum for apt_packages_count
across all label variants to be the total installed packages, so if we have duplication it won't reflect that correctly.
let's go with apt_packages_per_origin_count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, then the metrics name is definitely better.. i was actually assuming we did not have duplicates and could use that metric to extract a total package count, good to know that isn't the case.
Ubuntu uses a progress upgrade throughout hosts called phasing upgrades. If a package needs to be upgraded but is marked as being delayed because of phasing, it shouldn't be listed as the packages needing upgrades since it's already being taken care of. Signed-off-by: Gabriel Filion <[email protected]>
Packages are obsolete if they either: * don't have a candidate version * their candidate version does not have an origin. * their candidate version has a single origin that points to local storage Signed-off-by: Gabriel Filion <[email protected]>
With this you can see a progression of how many packages are installed from each source used. Signed-off-by: Gabriel Filion <[email protected]>
6664cbd
to
39498fb
Compare
39e4c3f
to
c8d7c67
Compare
I always for get to include the signoff lines in the commits, sorry for the noise. @anarcat I've added two more commits on the patch series to complete the requested features: we can now enable debug logs with an env var and we can also exclude packages by name with a command-line argument. |
apt_info.py
Outdated
logger = logging.getLogger(__name__) | ||
if os.environ.get("DEBUG"): | ||
logger.setLevel(logging.DEBUG) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to sound like nitpicking, but i would avoid passing a logger
all over the place.
just skip setting a logger object and call logging.debug
directly, i would say. if we were in a larger, multi-file project with various modules and third-party libs, i would do it differently, but in this case... why bother?
if we do want to redact (say) logs from python-apt, then i would do that explicitly.
otherwise, if you really want a global logger, then do that. maybe something like:
if __name__ == "__main__":
logging.basicConfig(stream=sys.stderr)
logger = logging.getLogger(__name__)
if os.environ.get("DEBUG"):
logger.setLevel(logging.DEBUG)
_main()
that way you don't need to pass logger around... i understand a global might not be the best practice, but that's effectively what we're doing here anyways by passing the logger all over the place...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh this is not nitpicking it's a good recommendation, it'll remove some useless complexity in the code. I'll push a change that uses the root logger instead
apt_info.py
Outdated
@@ -147,13 +176,18 @@ def _write_reboot_required(registry): | |||
|
|||
|
|||
def _main(): | |||
logging.basicConfig(stream=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think stream=sys.stderr is pretty much the default, why do we need to do this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blah it is indeed. I'll remove the param
apt_info.py
Outdated
def _write_pending_upgrades(registry, cache, logger): | ||
def _write_pending_upgrades(registry, cache, logger, cli_args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again maybe nitpicking, but i would pass the exclude flag directly here, not the whole args namespace:
def _write_pending_upgrades(registry, cache, logger, exclude):
some nitpicking on the code, otherwise LGTM. |
This can help one in identifying what the cache filters identified in each category, which is useful both for debugging the code itself and for identifying what builtin command-line apt search filters may not be identifying in a similar manner. One example of this is the obsolete package filter, which can sometimes catch more packages than `apt list "?obsolete"` Signed-off-by: Gabriel Filion <[email protected]>
In some situations some packages may need to be excluded, e.g. if we're expecting some odd situation. In this case it would be nice to exclude the packages by name. Signed-off-by: Gabriel Filion <[email protected]>
c8d7c67
to
c8e8a48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
This patch series adds some more information about:
It also adds filtering for packages that are marked for phasing upgrades. In this situation, the upgrades are not really pending anymore but just delayed until they are actually processed.