tests/reporter: Add gauge scrape #597

Merged
kubevirt-bot merged 2 commits into k8snetworkplumbingwg:main from RamLavi:add_collision_data on Feb 10, 2026

Conversation

@RamLavi RamLavi (Member) commented Feb 5, 2026

What this PR does / why we need it:
When a test fails because the kmp_mac_collisions gauge does not have the expected value, the cause may be the current test, a remnant from an old test that did not clean up properly, or a design bug.
However, the current reporter does not log the kmp_mac_collisions gauge, so the failure cannot be debugged later.

This PR adds a log of the kmp_mac_collisions gauge after a test failure, and also prints the VMIs that are attached to it, for easy debugging.
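For context, a minimal sketch of the scraping step, assuming a reachable kubemacpool metrics endpoint; the helper name, signature, and URL handling here are illustrative and not the PR's exact code:

package tests

import (
	"bufio"
	"net/http"
	"strings"
)

// scrapeMACCollisionLines fetches the Prometheus text-format metrics page and
// returns only the kmp_mac_collisions sample lines, so they can be written to
// the test artifacts on failure. Hypothetical helper for illustration.
func scrapeMACCollisionLines(metricsURL string) ([]string, error) {
	resp, err := http.Get(metricsURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var lines []string
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if line := scanner.Text(); strings.HasPrefix(line, "kmp_mac_collisions") {
			lines = append(lines, line)
		}
	}
	return lines, scanner.Err()
}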

Special notes for your reviewer:

Release note:

NONE

@gemini-code-assist

Summary of Changes

Hello @RamLavi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the diagnostic capabilities of the test suite by integrating a detailed report of MAC address collisions into the failure logs. By capturing the kmp_mac_collisions gauge and linking specific MAC addresses to their respective VirtualMachineInstances (VMIs), the changes aim to streamline the debugging process for issues related to MAC address management and prevent flakiness in tests caused by lingering collision states.

Highlights

  • Enhanced Test Failure Logging: Introduced automatic logging of kmp_mac_collisions gauge values whenever a test fails, providing immediate insight into potential MAC address collision issues.
  • VMI Association for Debugging: The collision gauge log now includes a list of VirtualMachineInstances (VMIs) associated with each reported MAC address, significantly aiding in debugging and identifying the source of collisions.
  • New Utility Functions: Added several new Go functions to scrape Prometheus metrics, parse VMI MAC addresses using kubectl commands, and format the detailed collision report for easier analysis.
Changelog
  • tests/tests_suite_test.go
    • Added regexp and github.com/k8snetworkplumbingwg/kubemacpool/tests/kubectl package imports.
    • Integrated a call to logMACCollisionGauge within dumpKubemacpoolLogs to ensure collision metrics are logged on test failure.
    • Implemented logMACCollisionGauge to orchestrate the generation and logging of the kmp_mac_collisions report.
    • Introduced macCollisionGaugeReport to fetch Prometheus metrics and retrieve VMI data for MAC association.
    • Added getVMIByMAC function to query kubectl for VMI details and map MAC addresses to VMI identifiers.
    • Developed appendMACCollisionLines to format the collision report, including the associated VMIs for each MAC.
    • Created parseMACLabelValue to extract MAC addresses from metric lines using regular expressions (a sketch follows below).
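For readers unfamiliar with the Prometheus text format, here is a sketch of what such a MAC-label extraction could look like; the regex and signature are illustrative assumptions, not the exact code from this PR:

package tests

import "regexp"

// A kmp_mac_collisions sample in Prometheus text format looks roughly like:
//   kmp_mac_collisions{mac="02:00:00:00:00:01"} 1
// parseMACLabelValue pulls the mac label value out of such a line.
var macLabelRegex = regexp.MustCompile(`mac="([^"]+)"`)

func parseMACLabelValue(metricLine string) (string, bool) {
	match := macLabelRegex.FindStringSubmatch(metricLine)
	if match == nil {
		return "", false
	}
	return match[1], true
}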
Activity
  • No specific activity (comments, reviews, etc.) was provided in the context for this pull request.

@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request enhances test failure diagnostics by adding a log of the kmp_mac_collisions gauge. When a test fails, it now scrapes this metric and correlates the MAC addresses with running VMIs, which is a great addition for debugging potential MAC collision issues. The implementation is solid, but I've pointed out a small performance improvement opportunity in the new code.
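As an aside for reviewers, the MAC-to-VMI correlation could look roughly like the sketch below; the PR itself goes through the repo's tests/kubectl helper, so shelling out to kubectl directly and this exact signature are illustrative assumptions:

package tests

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// vmiList captures only the VMI fields needed to map MAC addresses to VMIs.
type vmiList struct {
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			Domain struct {
				Devices struct {
					Interfaces []struct {
						MacAddress string `json:"macAddress"`
					} `json:"interfaces"`
				} `json:"devices"`
			} `json:"domain"`
		} `json:"spec"`
	} `json:"items"`
}

// getVMIByMAC builds a MAC -> "namespace/name" index by listing all VMIs.
// Hypothetical sketch: the real test code uses the repo's kubectl helper.
func getVMIByMAC() (map[string][]string, error) {
	out, err := exec.Command("kubectl", "get", "vmi", "--all-namespaces", "-o", "json").Output()
	if err != nil {
		return nil, fmt.Errorf("kubectl get vmi failed: %w", err)
	}
	var list vmiList
	if err := json.Unmarshal(out, &list); err != nil {
		return nil, err
	}
	index := map[string][]string{}
	for _, vmi := range list.Items {
		for _, iface := range vmi.Spec.Domain.Devices.Interfaces {
			if iface.MacAddress != "" {
				key := vmi.Metadata.Namespace + "/" + vmi.Metadata.Name
				index[iface.MacAddress] = append(index[iface.MacAddress], key)
			}
		}
	}
	return index, nil
}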

@RamLavi RamLavi force-pushed the add_collision_data branch from 491c2f4 to 6f5a6f2 on February 8, 2026 07:46
@RamLavi RamLavi (Member, Author) commented Feb 8, 2026

Change: Address gemini code review and linter issues

@ormergi ormergi (Collaborator) left a comment

This PR seems to dump some metrics KubeMacPool generates in tests, then there is the scraped content parsing and formatting, and additional unrelated refactoring.

Please consider keeping this PR as simple as possible, leaving only the metric dump part.
Making the log nicer or doing additional refactoring can be done in a follow-up.

About making the dumped metric logs nicer:
It's not clear why it's necessary (i.e. before and after examples) or how often this log will be used, such that it requires additional processing.
Sounds like something we should avoid maintaining (e.g. content formatting changes, metric details being added, etc.).
I am sure there is tooling that can present Prometheus metrics nicely.

Comment on lines 601 to 607
if strings.TrimSpace(report) == "" {
	report = "(no kmp_mac_collisions metrics found)"
}

if err != nil {
	report = fmt.Sprintf("failed to build kmp mac collision gauge report: %v\n\n%s", err, report)
}
Collaborator

Looks like the convention for errors is to not write the file; why is it needed in this case?

@RamLavi RamLavi (Member, Author) Feb 8, 2026

You're right, I wanted to avoid an empty file with no explanation (or no file at all), but it makes better sense to keep the convention.
DONE

@RamLavi RamLavi (Member, Author) commented Feb 8, 2026

Please consider keeping this PR as simple as possible, leaving only the metric dump part.

Wish I could - but it failed the linter (the one function I added tipped the scale and triggered the "this function is too complex" linter).
If we want a clean PR with only the logging - I would need to reverse the order of the commits and issue it in a separate PR. But since no code logic is changed, I preferred to just add it as a chore commit.
Do you prefer I split that out into a different PR anyway?

It's not clear why it's necessary (i.e. before and after examples) or how often this log will be used, such that it requires additional processing.

I'm sorry I didn't make the intention clear in the PR description. Will fix.
But in general - the purpose of this log report is to see the internal machinery when a test fails.
We want to make sure that when a test fails - it's not related to an old test not cleaning up properly.
When a test fails on "I expect that there would be no collisions" - this report will tell me which VMIs were causing KMP to set the alert. IOW it will be a powerful debugging tool.

Sounds like something we should avoid maintaining (e.g. content formatting changes, metric details being added, etc.).
I am sure there is tooling that can present Prometheus metrics nicely.

There are promql tools that do the parsing for us, but I figured they require too many additional resources and too much code for the simple purpose of our scrape. IOW IMO using these clients would be huge overkill.
In the end, the metric we scrape is one that kmp owns, so the chance of it changing format without us noticing exists, but is slim.

Print the gauge scrape, and also print the VMIs that are attached to it,
for easy debugging.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the add_collision_data branch from 6f5a6f2 to 16d6c9b on February 8, 2026 11:38
@RamLavi RamLavi (Member, Author) commented Feb 8, 2026

Change: address @ormergi's review comment

@ormergi ormergi (Collaborator) commented Feb 9, 2026

Wish I could - but it failed the linter (the one function I added tipped the scale and triggered the "this function is too complex" linter). If we want a clean PR with only the logging - I would need to reverse the order of the commits and issue it in a separate PR. But since no code logic is changed, I preferred to just add it as a chore commit. Do you prefer I split that out into a different PR anyway?

Can we just move the part that processes the metric content to a different PR, that is the macCollisionGaugeReport() code? So that this PR's outcome is the metric being dumped as-is to a file in the test artifacts dir.

It's just that I am not sure we need or want to review the metric-processing part, do we really need it?
And I would rather avoid processing logs to avoid future maintenance (the raw log can change and break the log-processing code).

I'm sorry I didn't make the intention clear in the PR description. Will fix.
But in general - the purpose of this log report is to see the internal machinery when a test fails.
We want to make sure that when a test fails - it's not related to an old test not cleaning up properly.
When a test fails on "I expect that there would be no collisions" - this report will tell me which VMIs were causing KMP to set the alert. IOW it will be a powerful debugging tool.

Sounds like we need an assert before each test that ensures no collision occurred before the next test run,
instead of a log we will need to actively look for when something goes wrong on CI.
WDYT?
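For illustration, such a guard could be a suite-level hook along the lines below, assuming the suite's existing Ginkgo/Gomega dot-imports and the hypothetical scrapeMACCollisionLines helper sketched earlier in this thread; this is a suggestion sketch, not code from the PR:

// Assumes ginkgo/v2 and gomega are dot-imported by the suite, and that
// metricsURL and scrapeMACCollisionLines exist (both hypothetical here).
var _ = BeforeEach(func() {
	collisionLines, err := scrapeMACCollisionLines(metricsURL)
	Expect(err).ToNot(HaveOccurred())
	Expect(collisionLines).To(BeEmpty(),
		"kmp_mac_collisions is not clean before this test: %v", collisionLines)
})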

There are promql tools that do the parsing for us, but I figured they require too many additional resources and too much code for the simple purpose of our scrape. IOW IMO using these clients would be huge overkill. In the end, the metric we scrape is one that kmp owns, so the chance of it changing format without us noticing exists, but is slim.

I meant one could use such tools on their local env, not on CI.

@ormergi ormergi (Collaborator) commented Feb 10, 2026

Although I would avoid parsing logs because the expected output could change, looking at this again it's not harmful because it's done in tests, and it can improve debugging.

/lgtm

@RamLavi RamLavi (Member, Author) commented Feb 10, 2026

/approve

@kubevirt-bot kubevirt-bot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RamLavi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot merged commit a34551a into k8snetworkplumbingwg:main Feb 10, 2026
5 checks passed