Add NPU provider (New) #2198
40b763c to 3c0ef4f
| @@ -0,0 +1,5 @@ | |||
| # Checkbox Provider - NPU | |||
Is there anything that should be done when creating a new provider? CI-wise, packaging-wise?
| estimated_duration: 2s | ||
| command: check_accel_permissions.py | ||
| imports: from com.canonical.plainbox import manifest | ||
| requires: manifest.has_npu == 'True' |
I think that new manifest entries must now be submitted to the certification team to be added to C3's feature set
C3 gets them from https://github.com/canonical/blueprints so you could also make a PR there, I think.
@fernando79513 and @tomli380576, can you please review this PR?
pseudocc
left a comment
Just a passerby reviewer; see inline comments.
| @@ -0,0 +1,5 @@ | |||
| # Checkbox Provider - NPU | |||
| This provider includes tests for devices with an NPU. As of right now, it is intended only for Intel NPUs. The tests only run as long as the manifest entry `has_npu` is set to `true`. | |||
Should check this instead of defining has_npu
modinfo -Falias intel_vpu
pci:v00008086d0000FD3Esv*sd*bc*sc*i*
pci:v00008086d0000B03Esv*sd*bc*sc*i*
pci:v00008086d0000643Esv*sd*bc*sc*i*
pci:v00008086d0000AD1Dsv*sd*bc*sc*i*
pci:v00008086d00007D1Dsv*sd*bc*sc*i*
Do you mean there should be no manifest entry at all?
I have now added a job to check modinfo output, is that what you had in mind?
I was thinking about reading the modinfo output in a resource job, but now I think it may not be the ideal way.
Say we have:
- 6.17 kernel and a Panther Lake CPU (modaliases for Panther Lake were added in 6.18).
Then all NPU tests would be skipped, which shouldn't be what we expect. Let's just keep the manifest implementation; it's good.
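To illustrate the concern, here is a minimal sketch of what an alias-based check could look like. It only parses text in the format shown in the `modinfo -Falias intel_vpu` output above; the function names and the device ID used in the usage note are hypothetical and not part of this PR. An NPU whose PCI ID is missing from the installed module's alias list would be reported as absent, even though the hardware is there.

```python
import re

# Matches the vendor and device fields of a PCI modalias pattern such as
# "pci:v00008086d0000FD3Esv*sd*bc*sc*i*".
PCI_ALIAS_RE = re.compile(r"^pci:v([0-9A-Fa-f]{8})d([0-9A-Fa-f]{8})")


def parse_pci_aliases(modinfo_output):
    """Return the set of (vendor, device) IDs found in modinfo alias output."""
    ids = set()
    for line in modinfo_output.splitlines():
        match = PCI_ALIAS_RE.match(line.strip())
        if match:
            ids.add((int(match.group(1), 16), int(match.group(2), 16)))
    return ids


def driver_knows_device(modinfo_output, vendor, device):
    """True only if the installed module lists this PCI ID as an alias."""
    return (vendor, device) in parse_pci_aliases(modinfo_output)
```

With the five aliases listed above, `driver_knows_device(output, 0x8086, 0x7D1D)` is true, while an ID that an older kernel module does not list yet (for example a made-up `0x1234`) is false. That false result is the case that would silently skip every NPU test.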
Test scripts LGTM! Could you provide a bit of documentation on the expectations for NPU_UMD_TEST_CONFIG: where it is supposed to be placed, which permissions it needs, and what exactly the expected contents are (for example, the tree output of a correct setup)?
One small question: is the driver snap expected to come preinstalled? If not, I think we should print an error somewhere or mention it in the manifest.
| def main(): | ||
| config_path = os.environ.get("NPU_UMD_TEST_CONFIG") |
I think NPU_UMD_TEST_CONFIG also needs to be an absolute path since it's passed to dirname in one of the jobs
assert Path(config_path).is_absolute()
This file (along with the model file) might have to be placed inside the driver snap's current directory or the umd tests will say "the file is bad" and trigger an apparmor deny message. For example if I put the config file in $HOME, this happens:
[Tue Dec 23 10:47:03 2025] audit: type=1400 audit(1766458023.464:331): apparmor="DENIED" operation="open" class="file" profile="snap.intel-npu-driver.npu-umd-test" name="/home/ubuntu/basic.yaml" pid=31451 comm="npu-umd-test" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
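A minimal sketch of how the script could validate `NPU_UMD_TEST_CONFIG` up front, so that a missing, relative, or nonexistent path produces a clear message before `npu-umd-test` runs. The function name `resolve_test_config` and the error wording are hypothetical. Note that this cannot detect the AppArmor denial shown above: a file in `$HOME` passes these checks and is still refused by the snap's confinement.

```python
import os
from pathlib import Path


def resolve_test_config(environ=os.environ):
    """Return the config path from NPU_UMD_TEST_CONFIG or exit with a reason."""
    config_path = environ.get("NPU_UMD_TEST_CONFIG")
    if not config_path:
        raise SystemExit("NPU_UMD_TEST_CONFIG is not set")

    path = Path(config_path)
    # The value is passed to dirname in one of the jobs, so a relative
    # path would resolve against whatever the working directory is.
    if not path.is_absolute():
        raise SystemExit(
            "NPU_UMD_TEST_CONFIG must be an absolute path, got: " + config_path
        )
    if not path.is_file():
        raise SystemExit("NPU_UMD_TEST_CONFIG does not exist: " + config_path)
    return path
```

Exiting with a message instead of using a bare `assert` keeps the check active when Python runs with `-O` and puts the reason in the job output.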
Currently this file is placed in the current directory by the script which installs the intel-npu-driver snap. The file is pretty much static but we're planning to have the file located directly inside the intel-npu-driver snap (since the format could change between versions of npu-umd-test which is already distributed as a binary in the intel-npu-driver snap).
Does this mean the model files would be bundled with the snap in the future? If so, I think you can use the path of the bundled files as the default inside the test case and only use the environment variable as an override. This would also make it easier to run the test since we won't have to specify NPU_UMD_TEST_CONFIG every time.
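Roughly what I have in mind, as a sketch only. `DEFAULT_CONFIG` is a made-up location, since the thread does not say where the bundled file would end up inside the snap, and `config_path` is a hypothetical helper name.

```python
import os
from pathlib import Path

# Hypothetical default: the real location of the bundled config inside the
# intel-npu-driver snap is not decided in this thread.
DEFAULT_CONFIG = Path("/snap/intel-npu-driver/current/share/basic.yaml")


def config_path(environ=os.environ, default=DEFAULT_CONFIG):
    """Use NPU_UMD_TEST_CONFIG when set, otherwise fall back to the default."""
    override = environ.get("NPU_UMD_TEST_CONFIG")
    return Path(override) if override else default
```

An empty or unset variable falls back to the bundled file, so the environment variable is only needed when testing a non-default config.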
The snap is not pre-installed but ideally the devices that do have the has_npu manifest entry should install the snap before running checkbox... Is there any way in checkbox to define this dependency and maybe even have the snap auto-installed by checkbox? |
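Until there is a declared dependency, a script-side check could at least fail with a readable message. This is a sketch under the assumption that `snap list <name>` exits non-zero when the snap is not installed; `snap_is_installed` is a hypothetical helper, and the `run` parameter exists only so the check can be exercised without a real `snap` binary.

```python
import subprocess


def snap_is_installed(name, run=subprocess.run):
    """Return True if `snap list <name>` succeeds, False otherwise."""
    try:
        result = run(
            ["snap", "list", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The snap command itself is missing on this system.
        return False
    return result.returncode == 0
```

A job script could call `snap_is_installed("intel-npu-driver")` first and exit with an explicit "driver snap not installed" message instead of failing later inside `npu-umd-test`.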
pseudocc
left a comment
Looks good to me now, thanks!
To make this case depend on whether the driver snap exists, add |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2198 +/- ##
==========================================
+ Coverage 53.34% 54.78% +1.44%
==========================================
Files 399 407 +8
Lines 42907 44107 +1200
Branches 7945 8154 +209
==========================================
+ Hits 22887 24163 +1276
+ Misses 19214 19114 -100
- Partials 806 830 +24
Description
This PR adds a provider for testing devices with an NPU.
It is currently focused on Intel NPUs which use the accel kernel interface.
The driver for these NPUs is distributed through the `intel-npu-driver` snap, which also includes a gtest-based testing utility, `npu-umd-test`. The tests in this provider check that the appropriate firmware version is loaded and that the user has the appropriate permissions; the remaining jobs run individual tests from the `npu-umd-test` utility.
Known issues
Some of the test names coming from the `npu-umd-test` test suite are longer than 80 characters, which triggers a warning in checkbox.
Tests
Tests have been run on Meteor Lake and Arrow Lake devices.
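For the known issue above, a small sketch of how the over-long names could be listed. It assumes `npu-umd-test` accepts the standard GoogleTest flag `--gtest_list_tests`, which is not confirmed here, and it only parses that listing format; the helper names and the sample suite name in the usage note are hypothetical.

```python
def parse_gtest_list(listing):
    """Turn --gtest_list_tests style output into full 'Suite.Test' names."""
    names = []
    suite = ""
    for line in listing.splitlines():
        # Drop the trailing "# ..." notes gtest adds for parameterized tests.
        text = line.split("#", 1)[0].rstrip()
        if not text.strip():
            continue
        if not text.startswith(" "):
            # Suite lines are not indented and end with a dot, e.g. "Device."
            suite = text.strip()
        else:
            names.append(suite + text.strip())
    return names


def too_long(names, limit=80):
    """Return the names that exceed the character limit."""
    return [name for name in names if len(name) > limit]
```

Feeding the listing through `too_long(parse_gtest_list(listing))` gives the names that would trigger the warning, for example a `Device.` suite entry whose test name alone runs past 80 characters.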