tests/reporter: Add gauge scrape #597

Merged
kubevirt-bot merged 2 commits into k8snetworkplumbingwg:main from RamLavi:add_collision_data on Feb 10, 2026

Conversation

@RamLavi RamLavi (Member) commented Feb 5, 2026

What this PR does / why we need it:
When a test fails because the kmp_mac_collisions gauge does not have the expected value, the cause may be the current test, a remnant from an old test that did not clean up properly, or a design bug.
However, the current reporter does not log the kmp_mac_collisions gauge, so the failure cannot be debugged later.

This PR adds a log of the kmp_mac_collisions gauge after a test failure, and also prints the VMIs that are attached to it, for easy debugging.
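For context, a minimal sketch of the scraping step, assuming a reachable kubemacpool metrics endpoint; the helper name, signature, and URL handling here are illustrative and not the PR's exact code:

package tests

import (
	"bufio"
	"net/http"
	"strings"
)

// scrapeMACCollisionLines fetches the Prometheus text-format metrics page and
// returns only the kmp_mac_collisions sample lines, so they can be written to
// the test artifacts on failure. Hypothetical helper for illustration.
func scrapeMACCollisionLines(metricsURL string) ([]string, error) {
	resp, err := http.Get(metricsURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var lines []string
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if line := scanner.Text(); strings.HasPrefix(line, "kmp_mac_collisions") {
			lines = append(lines, line)
		}
	}
	return lines, scanner.Err()
}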

Special notes for your reviewer:

Release note:

NONE

@gemini-code-assist

Summary of Changes

Hello @RamLavi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the diagnostic capabilities of the test suite by integrating a detailed report of MAC address collisions into the failure logs. By capturing the kmp_mac_collisions gauge and linking specific MAC addresses to their respective VirtualMachineInstances (VMIs), the changes aim to streamline the debugging process for issues related to MAC address management and prevent flakiness in tests caused by lingering collision states.

Highlights

  • Enhanced Test Failure Logging: Introduced automatic logging of kmp_mac_collisions gauge values whenever a test fails, providing immediate insight into potential MAC address collision issues.
  • VMI Association for Debugging: The collision gauge log now includes a list of VirtualMachineInstances (VMIs) associated with each reported MAC address, significantly aiding in debugging and identifying the source of collisions.
  • New Utility Functions: Added several new Go functions to scrape Prometheus metrics, parse VMI MAC addresses using kubectl commands, and format the detailed collision report for easier analysis.
Changelog
  • tests/tests_suite_test.go
    • Added regexp and github.com/k8snetworkplumbingwg/kubemacpool/tests/kubectl package imports.
    • Integrated a call to logMACCollisionGauge within dumpKubemacpoolLogs to ensure collision metrics are logged on test failure.
    • Implemented logMACCollisionGauge to orchestrate the generation and logging of the kmp_mac_collisions report.
    • Introduced macCollisionGaugeReport to fetch Prometheus metrics and retrieve VMI data for MAC association.
    • Added getVMIByMAC function to query kubectl for VMI details and map MAC addresses to VMI identifiers.
    • Developed appendMACCollisionLines to format the collision report, including the associated VMIs for each MAC.
    • Created parseMACLabelValue to extract MAC addresses from metric lines using regular expressions (a sketch follows below).
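For readers unfamiliar with the Prometheus text format, here is a sketch of what such a MAC-label extraction could look like; the regex and signature are illustrative assumptions, not the exact code from this PR:

package tests

import "regexp"

// A kmp_mac_collisions sample in Prometheus text format looks roughly like:
//   kmp_mac_collisions{mac="02:00:00:00:00:01"} 1
// parseMACLabelValue pulls the mac label value out of such a line.
var macLabelRegex = regexp.MustCompile(`mac="([^"]+)"`)

func parseMACLabelValue(metricLine string) (string, bool) {
	match := macLabelRegex.FindStringSubmatch(metricLine)
	if match == nil {
		return "", false
	}
	return match[1], true
}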
Activity
  • No specific activity (comments, reviews, etc.) was provided in the context for this pull request.

@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request enhances test failure diagnostics by adding a log of the kmp_mac_collisions gauge. When a test fails, it now scrapes this metric and correlates the MAC addresses with running VMIs, which is a great addition for debugging potential MAC collision issues. The implementation is solid, but I've pointed out a small performance improvement opportunity in the new code.
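As an aside for reviewers, the MAC-to-VMI correlation could look roughly like the sketch below; the PR itself goes through the repo's tests/kubectl helper, so shelling out to kubectl directly and this exact signature are illustrative assumptions:

package tests

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// vmiList captures only the VMI fields needed to map MAC addresses to VMIs.
type vmiList struct {
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			Domain struct {
				Devices struct {
					Interfaces []struct {
						MacAddress string `json:"macAddress"`
					} `json:"interfaces"`
				} `json:"devices"`
			} `json:"domain"`
		} `json:"spec"`
	} `json:"items"`
}

// getVMIByMAC builds a MAC -> "namespace/name" index by listing all VMIs.
// Hypothetical sketch: the real test code uses the repo's kubectl helper.
func getVMIByMAC() (map[string][]string, error) {
	out, err := exec.Command("kubectl", "get", "vmi", "--all-namespaces", "-o", "json").Output()
	if err != nil {
		return nil, fmt.Errorf("kubectl get vmi failed: %w", err)
	}
	var list vmiList
	if err := json.Unmarshal(out, &list); err != nil {
		return nil, err
	}
	index := map[string][]string{}
	for _, vmi := range list.Items {
		for _, iface := range vmi.Spec.Domain.Devices.Interfaces {
			if iface.MacAddress != "" {
				key := vmi.Metadata.Namespace + "/" + vmi.Metadata.Name
				index[iface.MacAddress] = append(index[iface.MacAddress], key)
			}
		}
	}
	return index, nil
}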

@RamLavi RamLavi force-pushed the add_collision_data branch from 491c2f4 to 6f5a6f2 on February 8, 2026 07:46
@RamLavi RamLavi (Member, Author) commented Feb 8, 2026

Change: Address gemini code review and linter issues

@ormergi ormergi (Collaborator) left a comment

This PR seems to dump some metrics KubeMacPool generates in tests, then there is the scraped content parsing and formatting, and additional unrelated refactoring.

Please consider keeping this PR as simple as possible, leaving only the metric dump part.
Making the log nicer or doing additional refactoring can be done in a follow-up.

About making the dumped metric logs nicer:
It's not clear why it's necessary (i.e. before and after examples) or how often this log will be used, such that it requires additional processing.
Sounds like something we should avoid maintaining (e.g. content formatting changes, metric details being added, etc.).
I am sure there is tooling that can present Prometheus metrics nicely.

Comment on lines 601 to 607
if strings.TrimSpace(report) == "" {
	report = "(no kmp_mac_collisions metrics found)"
}

if err != nil {
	report = fmt.Sprintf("failed to build kmp mac collision gauge report: %v\n\n%s", err, report)
}
Collaborator

Looks like the convention for errors is to not write the file; why is it needed in this case?

@RamLavi RamLavi (Member, Author) Feb 8, 2026

You're right, I wanted to avoid an empty file with no explanation (or no file at all), but it makes better sense to keep the convention.
DONE

@RamLavi RamLavi (Member, Author) commented Feb 8, 2026

Please consider keeping this PR as simple as possible, leaving only the metric dump part.

Wish I could - but it failed the linter (the one function I added tipped the scale and triggered the "this function is too complex" linter).
If we want a clean PR with only the logging - I would need to reverse the order of the commits and issue it in a separate PR. But since no code logic is changed, I preferred to just add it as a chore commit.
Do you prefer I split that out into a different PR anyway?

It's not clear why it's necessary (i.e. before and after examples) or how often this log will be used, such that it requires additional processing.

I'm sorry I didn't make the intention clear in the PR description. Will fix.
But in general - the purpose of this log report is to see the internal machinery when a test fails.
We want to make sure that when a test fails - it's not related to an old test not cleaning up properly.
When a test fails on "I expect that there would be no collisions" - this report will tell me which VMIs were causing KMP to set the alert. IOW it will be a powerful debugging tool.

Sounds like something we should avoid maintaining (e.g. content formatting changes, metric details being added, etc.).
I am sure there is tooling that can present Prometheus metrics nicely.

There are promql tools that do the parsing for us, but I figured they require too many additional resources and too much code for the simple purpose of our scrape. IOW IMO using these clients would be huge overkill.
In the end, the metric we scrape is one that kmp owns, so the chance of it changing format without us noticing exists, but is slim.

Print the gauge scrape, and also print the VMIs that are attached to it,
for easy debugging.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the add_collision_data branch from 6f5a6f2 to 16d6c9b on February 8, 2026 11:38
@RamLavi RamLavi (Member, Author) commented Feb 8, 2026

Change: address @ormergi's review comment

@ormergi ormergi (Collaborator) commented Feb 9, 2026

Wish I could - but it failed the linter (the one function I added tipped the scale and triggered the "this function is too complex" linter). If we want a clean PR with only the logging - I would need to reverse the order of the commits and issue it in a separate PR. But since no code logic is changed, I preferred to just add it as a chore commit. Do you prefer I split that out into a different PR anyway?

Can we just move the part that processes the metric content to a different PR, that is the macCollisionGaugeReport() code? So that this PR's outcome is the metric being dumped as-is to a file in the test artifacts dir.

It's just that I am not sure we need or want to review the metric-processing part, do we really need it?
And I would rather avoid processing logs to avoid future maintenance (the raw log can change and break the log-processing code).

I'm sorry I didn't make the intention clear in the PR description. Will fix.
But in general - the purpose of this log report is to see the internal machinery when a test fails.
We want to make sure that when a test fails - it's not related to an old test not cleaning up properly.
When a test fails on "I expect that there would be no collisions" - this report will tell me which VMIs were causing KMP to set the alert. IOW it will be a powerful debugging tool.

Sounds like we need an assert before each test that ensures no collision occurred before the next test run,
instead of a log we will need to actively look for when something goes wrong on CI.
WDYT?
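For illustration, such a guard could be a suite-level hook along the lines below, assuming the suite's existing Ginkgo/Gomega dot-imports and the hypothetical scrapeMACCollisionLines helper sketched earlier in this thread; this is a suggestion sketch, not code from the PR:

// Assumes ginkgo/v2 and gomega are dot-imported by the suite, and that
// metricsURL and scrapeMACCollisionLines exist (both hypothetical here).
var _ = BeforeEach(func() {
	collisionLines, err := scrapeMACCollisionLines(metricsURL)
	Expect(err).ToNot(HaveOccurred())
	Expect(collisionLines).To(BeEmpty(),
		"kmp_mac_collisions is not clean before this test: %v", collisionLines)
})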

There are promql tools that do the parsing for us, but I figured they require too many additional resources and too much code for the simple purpose of our scrape. IOW IMO using these clients would be huge overkill. In the end, the metric we scrape is one that kmp owns, so the chance of it changing format without us noticing exists, but is slim.

I meant one could use such tools on their local env, not on CI.

@ormergi ormergi (Collaborator) commented Feb 10, 2026

Although I would avoid parsing logs because the expected output could change, looking at this again it's not harmful because it's done in tests, and it can improve debugging.

/lgtm

@RamLavi RamLavi (Member, Author) commented Feb 10, 2026

/approve

@kubevirt-bot kubevirt-bot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RamLavi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot merged commit a34551a into k8snetworkplumbingwg:main Feb 10, 2026
5 checks passed