Skip to content

collect more apt metrics (Closes: #220) #234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

lelutin
Copy link

@lelutin lelutin commented Jan 22, 2025

This patch series adds some more information about:

  • total installed packages, per arch and origin
  • obsolete packages -- those without a candidate, an origin or a with only one origin that points to local storage

It also adds filtering for packages that are marked for phasing upgrades. In this situation, the upgrades are not really pending anymore but just delayed until they are actually processed.

@lelutin lelutin force-pushed the 220_apt_more_metrics branch 2 times, most recently from 539a748 to 6664cbd Compare January 22, 2025 21:32
Copy link
Contributor

@anarcat anarcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great start, thanks for working on this!

could you give an example output before/after the change? i think you have a before in: https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/issues/19#note_3151581

or at least a sample of the new metrics on a live system, to give us an idea of what it looks like?

apt_info.py Outdated
origins = sorted(
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins}
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins if o.archive !='now'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linter is unhappy with this one:

Suggested change
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins if o.archive !='now'}
{f"{o.origin}:{o.codename}/{o.archive}" for o in candidate.origins if o.archive != 'now'}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, is the count still correct with this change? I imagine it just removes the needless origin, but did you make sure the number of install packages metric total still correctly reflect the actual installed count?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh interesting, I just saw the failed flake8 test in CI. I'm using pywright locally because it was simpler to integrate into my editor + plugins but it's showing nothing for the problems that flake8 found. I guess that means it's time I made the effort of switching away from pywright.

It'll be fixed in my next push

apt_info.py Outdated
@@ -69,7 +73,7 @@ def _write_pending_upgrades(registry, cache):
def _write_held_upgrades(registry, cache):
held_candidates = {
p.candidate for p in cache
if p.is_upgradable and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD
if p.is_upgradable and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD and not p.phasing_applied
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linter also yells here, not sure how to fix that, perhaps:

Suggested change
if p.is_upgradable and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD and not p.phasing_applied
if (
p.is_upgradable
and p._pkg.selected_state == apt_pkg.SELSTATE_HOLD
and not p.phasing_applied
)

apt_info.py Outdated
p.candidate.origins[0].origin in ['', "/var/lib/dpkg/status"])
)]

g = Gauge('apt_obsolete_packages', "Apt packages which are obsolete",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder about that metric name. I think, on its own, it makes sense, but when compared to the other metrics generated here, less so. All other metrics here do not have a _packages suffix, so I wonder if we shouldn't change the name here to be more coherent with the other metrics, which are:

apt_upgrades_pending
apt_upgrades_held
apt_autoremove_pending
apt_package_cache_timestamp_seconds
node_reboot_required

(It should be noted that most of the metrics here do not follow the proper naming convention, except maybe apt_package_cache_timestamp_seconds, but that's probably because I added it recently. Extra credits for the node_reboot_required metric, coming completely out of left field...)

Anyways. I think I'm convincing myself that the name you picked is actually not that bad, but if we're going to break the (bad) convention that's already there, let's make it follow a proper convention, and let's call it apt_packages_obsolete_count, or maaybe apt_obsolete_count.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes you're right. I like apt_packages_obsolete_count, let's go for that one

apt_info.py Outdated
per_origin = _convert_candidates_to_upgrade_infos(installed_packages)

if per_origin:
g = Gauge('apt_installed_per_origin', "Numberof ppt packages installed per origin.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment here: either make this coherent with the obsolete metric, or make the obsolete metric coherent with this. :)

i would call it apt_packages_per_origin_count... heck, you could even call it apt_packages_count, then we could add more labels (e.g. state=installed) as we go along converting the other metrics... ;) but maybe that would be too ambiguous.

also, the description has a couple of typos here, maybe you mean:

Suggested change
g = Gauge('apt_installed_per_origin', "Numberof ppt packages installed per origin.",
g = Gauge('apt_installed_per_origin', "Number of packages installed per origin.",

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk about going with apt_packages_count since some packages will be double+ counted since they can match more than one state. personally because of the name I would kind of expect the total sum for apt_packages_count across all label variants to be the total installed packages, so if we have duplication it won't reflect that correctly.

let's go with apt_packages_per_origin_count

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, then the metrics name is definitely better.. i was actually assuming we did not have duplicates and could use that metric to extract a total package count, good to know that isn't the case.

Gabriel Filion added 3 commits January 23, 2025 13:15
Ubuntu uses a progress upgrade throughout hosts called phasing upgrades.
If a package needs to be upgraded but is marked as being delayed because
of phasing, it shouldn't be listed as the packages needing upgrades
since it's already being taken care of.

Signed-off-by: Gabriel Filion <[email protected]>
Packages are obsolete if they either:

* don't have a candidate version
* their candidate version does not have an origin.
* their candidate version has a single origin that points to local
  storage

Signed-off-by: Gabriel Filion <[email protected]>
With this you can see a progression of how many packages are installed
from each source used.

Signed-off-by: Gabriel Filion <[email protected]>
@lelutin lelutin force-pushed the 220_apt_more_metrics branch from 6664cbd to 39498fb Compare January 23, 2025 18:17
@lelutin lelutin force-pushed the 220_apt_more_metrics branch from 39e4c3f to c8d7c67 Compare January 28, 2025 19:22
@lelutin
Copy link
Author

lelutin commented Jan 28, 2025

I always for get to include the signoff lines in the commits, sorry for the noise.

@anarcat I've added two more commits on the patch series to complete the requested features: we can now enable debug logs with an env var and we can also exclude packages by name with a command-line argument.

@lelutin lelutin requested a review from anarcat January 28, 2025 19:25
apt_info.py Outdated
Comment on lines 180 to 194
logger = logging.getLogger(__name__)
if os.environ.get("DEBUG"):
logger.setLevel(logging.DEBUG)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to sound like nitpicking, but i would avoid passing a logger all over the place.

just skip setting a logger object and call logging.debug directly, i would say. if we were in a larger, multi-file project with various modules and third-party libs, i would do it differently, but in this case... why bother?

if we do want to redact (say) logs from python-apt, then i would do that explicitly.

otherwise, if you really want a global logger, then do that. maybe something like:

if __name__ == "__main__":
    logging.basicConfig(stream=sys.stderr)
    logger = logging.getLogger(__name__)
    if os.environ.get("DEBUG"):
        logger.setLevel(logging.DEBUG)
    _main()

that way you don't need to pass logger around... i understand a global might not be the best practice, but that's effectively what we're doing here anyways by passing the logger all over the place...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh this is not nitpicking it's a good recommendation, it'll remove some useless complexity in the code. I'll push a change that uses the root logger instead

apt_info.py Outdated
@@ -147,13 +176,18 @@ def _write_reboot_required(registry):


def _main():
logging.basicConfig(stream=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think stream=sys.stderr is pretty much the default, why do we need to do this here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blah it is indeed. I'll remove the param

apt_info.py Outdated
Comment on lines 63 to 64
def _write_pending_upgrades(registry, cache, logger):
def _write_pending_upgrades(registry, cache, logger, cli_args):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again maybe nitpicking, but i would pass the exclude flag directly here, not the whole args namespace:

def _write_pending_upgrades(registry, cache, logger, exclude):

@anarcat
Copy link
Contributor

anarcat commented Jan 28, 2025

some nitpicking on the code, otherwise LGTM.

Gabriel Filion added 2 commits January 28, 2025 17:03
This can help one in identifying what the cache filters identified in
each category, which is useful both for debugging the code itself and
for identifying what builtin command-line apt search filters may not be
identifying in a similar manner.

One example of this is the obsolete package filter, which can sometimes
catch more packages than `apt list "?obsolete"`

Signed-off-by: Gabriel Filion <[email protected]>
In some situations some packages may need to be excluded, e.g. if we're
expecting some odd situation. In this case it would be nice to exclude
the packages by name.

Signed-off-by: Gabriel Filion <[email protected]>
@lelutin lelutin force-pushed the 220_apt_more_metrics branch from c8d7c67 to c8e8a48 Compare January 28, 2025 22:05
Copy link
Contributor

@anarcat anarcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants