Contributor

@vandah vandah commented Nov 12, 2025

Description

This PR adds a provider for testing devices with an NPU.
It is currently focused on Intel NPUs which use the accel kernel interface.
The driver for these NPUs is distributed through the intel-npu-driver snap, which also includes npu-umd-test, a gtest-based testing utility. The tests in this provider check that the appropriate firmware version is loaded and that the user has the appropriate permissions; the remaining jobs run individual tests from the npu-umd-test utility.

Known issues

Some of the test names coming from the npu-umd-test test suite are longer than 80 characters, which triggers a warning in checkbox.

Tests

Tests have been run on Meteor Lake and Arrow Lake devices.

@vandah vandah changed the title from "New: Add NPU provider" to "Add NPU provider (New)" Nov 12, 2025
@vandah vandah force-pushed the npu-provider branch 5 times, most recently from 40b763c to 3c0ef4f November 13, 2025 09:29
@vandah vandah marked this pull request as draft November 13, 2025 15:24
@vandah vandah marked this pull request as ready for review November 14, 2025 12:54
@@ -0,0 +1,5 @@
# Checkbox Provider - NPU
Contributor

Is there anything that should be done when creating a new provider? CI-wise, packaging-wise?

estimated_duration: 2s
command: check_accel_permissions.py
imports: from com.canonical.plainbox import manifest
requires: manifest.has_npu == 'True'
Contributor

I think that new manifest entries must now be submitted to the certification team to be added to C3's feature set.

Collaborator

C3 gets them from https://github.com/canonical/blueprints so you could also make a PR there, I think.

Contributor Author

vandah commented Dec 12, 2025

@fernando79513 and @tomli380576, can you please review this PR?

Contributor

@pseudocc pseudocc left a comment

Just a passerby reviewer; see inline comments.

@@ -0,0 +1,5 @@
# Checkbox Provider - NPU

This provider includes tests for devices with an NPU. As of right now, it is intended only for Intel NPUs. The tests only run as long as the manifest entry `has_npu` is set to `true`.
Contributor

We should check the module aliases instead of defining has_npu:

modinfo -Falias intel_vpu
pci:v00008086d0000FD3Esv*sd*bc*sc*i*
pci:v00008086d0000B03Esv*sd*bc*sc*i*
pci:v00008086d0000643Esv*sd*bc*sc*i*
pci:v00008086d0000AD1Dsv*sd*bc*sc*i*
pci:v00008086d00007D1Dsv*sd*bc*sc*i*

Contributor Author

Do you mean there should be no manifest entry at all?
I have now added a job to check modinfo output, is that what you had in mind?

Contributor

I was thinking about reading the modinfo output in a resource job, but now I think it may not be the ideal way.

Say we have:

  • 6.17 kernel and a Panther Lake CPU (modaliases for Panther Lake were added in 6.18).

Then all NPU tests would be skipped, which shouldn't be what we expect. Let's just keep the manifest implementation; it's good.
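
To make the concern concrete, here is a minimal sketch of what a modinfo-based check could look like. It is not code from this PR: the function names are hypothetical, and the only data used is the `modinfo -Falias intel_vpu` output quoted earlier in this thread. A device whose ID is missing from the installed module's alias list is reported as unknown, which is exactly the case where the tests would be skipped even though the hardware is present.

```python
import re

# Sample `modinfo -Falias intel_vpu` output, copied from the earlier comment.
MODINFO_OUTPUT = """\
pci:v00008086d0000FD3Esv*sd*bc*sc*i*
pci:v00008086d0000B03Esv*sd*bc*sc*i*
pci:v00008086d0000643Esv*sd*bc*sc*i*
pci:v00008086d0000AD1Dsv*sd*bc*sc*i*
pci:v00008086d00007D1Dsv*sd*bc*sc*i*
"""

# A PCI modalias starts with "pci:v<8 hex vendor>d<8 hex device>sv...".
# Only the low 16 bits of each field are kept here.
ALIAS_RE = re.compile(r"^pci:v0000([0-9A-Fa-f]{4})d0000([0-9A-Fa-f]{4})sv")


def supported_pci_ids(modinfo_output):
    """Return the set of (vendor, device) IDs listed in the alias output."""
    ids = set()
    for line in modinfo_output.splitlines():
        match = ALIAS_RE.match(line.strip())
        if match:
            ids.add((match.group(1).lower(), match.group(2).lower()))
    return ids


def driver_knows_device(vendor, device, modinfo_output):
    """Hypothetical helper: is this PCI ID claimed by the installed module?"""
    return (vendor.lower(), device.lower()) in supported_pci_ids(modinfo_output)


# An ID that appears in the quoted output.
print(driver_knows_device("8086", "7D1D", MODINFO_OUTPUT))
# "ffff" is a made-up placeholder for an ID the installed module does not list.
print(driver_knows_device("8086", "ffff", MODINFO_OUTPUT))
```

With a resource job built on something like this, the second case would cause every NPU job to be skipped silently, whereas the manifest entry keeps the jobs in the run so that they fail visibly.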

Contributor

@tomli380576 tomli380576 left a comment

Test scripts lgtm! Could you provide a bit of documentation on what the expectations are for NPU_UMD_TEST_CONFIG: where it's supposed to be placed, what permissions it needs, what exactly the expected contents are (for example, the tree output of a correct setup), etc.?

One small question: is the driver snap expected to come preinstalled? If not, I think we should print an error somewhere or mention it in the manifest.



def main():
    config_path = os.environ.get("NPU_UMD_TEST_CONFIG")
Contributor

I think NPU_UMD_TEST_CONFIG also needs to be an absolute path, since it's passed to dirname in one of the jobs:

assert Path(config_path).is_absolute()
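
One caveat with a bare `assert`: Python strips assertions when run with `-O`, and the failure message would not tell the tester what to fix. A slightly more explicit variant is sketched below. The helper name `get_config_path` and the error wording are hypothetical and not part of this PR; only the environment variable name comes from the diff.

```python
import os
from pathlib import Path


def get_config_path(environ=os.environ):
    """Return NPU_UMD_TEST_CONFIG as a Path, or exit with a readable reason.

    `environ` is injectable so the check can be unit-tested without
    touching the real environment.
    """
    raw = environ.get("NPU_UMD_TEST_CONFIG")
    if not raw:
        raise SystemExit("NPU_UMD_TEST_CONFIG is not set")
    path = Path(raw)
    if not path.is_absolute():
        raise SystemExit(
            "NPU_UMD_TEST_CONFIG must be an absolute path, got: {}".format(raw)
        )
    return path
```

Raising `SystemExit` with a string prints the message to stderr and exits with a non-zero status, so the job would fail with the reason visible in its output.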

Contributor

This file (along with the model file) might have to be placed inside the driver snap's current directory, or the umd tests will say "the file is bad" and trigger an AppArmor deny message. For example, if I put the config file in $HOME, this happens:

[Tue Dec 23 10:47:03 2025] audit: type=1400 audit(1766458023.464:331): apparmor="DENIED" operation="open" class="file" profile="snap.intel-npu-driver.npu-umd-test" name="/home/ubuntu/basic.yaml" pid=31451 comm="npu-umd-test" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000

Contributor Author

Currently this file is placed in the current directory by the script which installs the intel-npu-driver snap. The file is pretty much static but we're planning to have the file located directly inside the intel-npu-driver snap (since the format could change between versions of npu-umd-test which is already distributed as a binary in the intel-npu-driver snap).

Contributor

@tomli380576 tomli380576 Jan 23, 2026

Does this mean the model files would be bundled with the snap in the future? If so, I think you can use the path of the bundled files as the default inside the test case and only use the environment variable as an override. This would also make it easier to run the test since we won't have to specify NPU_UMD_TEST_CONFIG every time.
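
A rough sketch of that default-plus-override idea is shown below. The bundled path is an assumption for illustration only: the thread does not say where the snap will ship the file, and `resolve_config` is a hypothetical helper name.

```python
import os
from pathlib import Path

# ASSUMPTION: placeholder location. The real path inside the
# intel-npu-driver snap has not been decided in this thread.
DEFAULT_CONFIG = Path("/snap/intel-npu-driver/current/basic.yaml")


def resolve_config(environ=os.environ, default=DEFAULT_CONFIG):
    """Use NPU_UMD_TEST_CONFIG when it is set, else the bundled default."""
    override = environ.get("NPU_UMD_TEST_CONFIG")
    return Path(override) if override else default
```

An empty value is treated the same as an unset variable, so exporting `NPU_UMD_TEST_CONFIG=` falls back to the default rather than producing an empty path.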

Contributor Author

vandah commented Jan 23, 2026

> One small question: is the driver snap expected to come preinstalled? If not, I think we should print an error somewhere or mention it in the manifest.

The snap is not pre-installed but ideally the devices that do have the has_npu manifest entry should install the snap before running checkbox... Is there any way in checkbox to define this dependency and maybe even have the snap auto-installed by checkbox?

@vandah vandah requested a review from pseudocc January 23, 2026 07:55
pseudocc previously approved these changes Jan 23, 2026
Contributor

@pseudocc pseudocc left a comment

Looks good to me now, thanks!

@tomli380576
Contributor

To make this case depend on whether the driver snap exists, add snap.name == "intel-npu-driver" in the requires: section of the test job; but note that this would make checkbox "silently" skip the job and put it in the "job with failed dependencies" pile if the snap is not installed even if the manifest is set to true.
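
If a loud failure is preferred over a silent skip, one alternative is a small job that checks for the snap and fails with a message. The sketch below is not part of this PR: it assumes that `snap list <name>` exits with a non-zero status when the snap is not installed, and the function names and the `run` parameter (there only to make the check testable) are hypothetical.

```python
import subprocess
import sys

SNAP_NAME = "intel-npu-driver"


def snap_is_installed(name, run=subprocess.run):
    """Return True if `snap list <name>` exits successfully."""
    try:
        result = run(
            ["snap", "list", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The `snap` command itself is not available on this system.
        return False
    return result.returncode == 0


def main(run=subprocess.run):
    """Return 0 if the driver snap is present, 1 (with an error) if not."""
    if not snap_is_installed(SNAP_NAME, run=run):
        print(
            "The {} snap is not installed; install it before running "
            "the NPU tests.".format(SNAP_NAME),
            file=sys.stderr,
        )
        return 1
    print("{} snap found".format(SNAP_NAME))
    return 0
```

A script using this would end with `sys.exit(main())`, and the other NPU jobs could depend on that job, so a missing snap shows up as one explicit failure rather than as a pile of skipped jobs.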


codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.78%. Comparing base (278c069) to head (e5835e8).
⚠️ Report is 73 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2198      +/-   ##
==========================================
+ Coverage   53.34%   54.78%   +1.44%     
==========================================
  Files         399      407       +8     
  Lines       42907    44107    +1200     
  Branches     7945     8154     +209     
==========================================
+ Hits        22887    24163    +1276     
+ Misses      19214    19114     -100     
- Partials      806      830      +24     
| Flag | Coverage Δ |
|---|---|
| checkbox-ng | 71.63% <ø> (+0.22%) ⬆️ |
| checkbox-support | 69.55% <ø> (+4.26%) ⬆️ |
| provider-base | 31.72% <ø> (+1.89%) ⬆️ |
| provider-certification-client | 57.14% <ø> (ø) |
| provider-certification-server | 57.14% <ø> (ø) |
| provider-genio | 96.90% <ø> (ø) |
| provider-gpgpu | 93.14% <ø> (ø) |
| provider-iiotg | 100.00% <ø> (ø) |
| provider-resource | 39.57% <ø> (+0.26%) ⬆️ |
| provider-sru | 97.97% <ø> (ø) |

5 participants