
Commit 3a811c7

[Edge AI Suites] IRD and Metro apps: Add how to use NPU docs (open-edge-platform#1873)
Signed-off-by: Katakol, Rohit <rohit.katakol@intel.com>
Co-authored-by: Rajput, Sajeev <sajeev.rajput@intel.com>
1 parent 1b0dc7a commit 3a811c7

File tree: 11 files changed, +315 −3 lines

manufacturing-ai-suite/industrial-edge-insights-vision/docs/user-guide/pcb-anomaly-detection/how-to-guides.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -6,6 +6,7 @@ This section collects guides for PCB Anomaly Detection sample application.
 - [Manage pipelines](./how-to-guides/manage-pipelines.md)
 - [Run multiple AI pipelines](./how-to-guides/run-multiple-ai-pipelines.md)
 - [Use GPU For Inference](./how-to-guides/use-gpu-for-inference.md)
+- [Use NPU For Inference](./how-to-guides/use-npu-for-inference.md)
 - [Use Your AI Model and Video](./how-to-guides/use-your-ai-model-and-video.md)
 - [Change the Input Video Source](./how-to-guides/change-input-video-source.md)
 - [Scale Video Resolution](./how-to-guides/scale-video-resolution.md)
@@ -25,6 +26,7 @@ This section collects guides for PCB Anomaly Detection sample application.
 ./how-to-guides/manage-pipelines
 ./how-to-guides/run-multiple-ai-pipelines
 ./how-to-guides/use-gpu-for-inference
+./how-to-guides/use-npu-for-inference
 ./how-to-guides/use-your-ai-model-and-video
 ./how-to-guides/change-input-video-source
 ./how-to-guides/scale-video-resolution
```
Lines changed: 77 additions & 0 deletions
# How to use NPU for inference

## Pre-requisites

To take full advantage of hardware acceleration, pipelines can be designed so that different stages, such as decoding and inference, are executed on the most suitable hardware devices.

Low-power accelerators like a Neural Processing Unit (NPU) can offload neural network computation from the CPU or GPU, enabling more efficient resource utilization and improved overall system performance.

DL Streamer and the DL Streamer Pipeline Server support inference on NPU devices, allowing applications built on these frameworks to leverage NPU acceleration for improved efficiency and performance.

Before running inference on an NPU, ensure that:

- The host system includes a supported NPU device
- The required NPU drivers are installed and properly configured

For detailed setup instructions, refer to the [documentation](https://docs.openedgeplatform.intel.com/dev/edge-ai-libraries/dlstreamer/dev_guide/advanced_install/advanced_install_guide_prerequisites.html#optional-prerequisite-2-install-intel-npu-drivers).

For containerized applications, the following additional changes are required.
### Provide NPU access to the container

This can be done by making the following changes to the Docker Compose file.

```yaml
services:
  dlstreamer-pipeline-server:
    group_add:
      # render group ID for Ubuntu 22.04 host OS
      - "110"
      # render group ID for Ubuntu 24.04 host OS
      - "992"
    devices:
      # you can list specific devices here instead if you don't want to expose all of /dev
      - "/dev:/dev"
```
The changes above add the container user to the `render` group and provide access to the NPU devices.
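The render group ID varies across distributions and releases. One way to look up the value on your host (assuming a `render` group exists there) is:

```shell
# Print the numeric GID of the host's render group; use this value
# in the compose file's group_add entry.
render_gid=$(getent group render | cut -d: -f3)
echo "render GID: ${render_gid:-no render group on this host}"
```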
### Hardware specific encoder/decoders

Unlike the changes made for the container above, the following requires a modification to the media pipeline itself.

GStreamer provides a variety of hardware-specific encoder and decoder elements, such as the Intel VA-API elements, that you can benefit from by adding them to your media pipeline. Examples of such elements are `vah264dec`, `vah264enc`, `vajpegdec`, and `vajpegenc`.

Additionally, you can enforce zero-copy of buffers by adding the GStreamer caps (capabilities) `video/x-raw(memory:VAMemory)` to the pipeline for Intel NPUs.

Read the DL Streamer [docs](https://dlstreamer.github.io/dev_guide/gpu_device_selection.html) for more details.
### NPU specific element properties

DL Streamer inference elements also provide properties such as `device=NPU` and `pre-process-backend=va`, which should be used in pipelines with NPU memory. This maps buffers to system memory and uses the VA pre-processor. Read the DL Streamer [docs](https://dlstreamer.github.io/dev_guide/model_preparation.html#model-pre-and-post-processing) for more.
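As an illustrative sketch only (the element chain, model path, and input file are assumptions, not taken from this application's config), a pipeline description combining VA decode, zero-copy caps, and NPU inference might look like:

```
filesrc location=input.mp4 ! decodebin3 ! vapostproc ! \
  video/x-raw(memory:VAMemory) ! \
  gvadetect model=model.xml device=NPU pre-process-backend=va ! \
  gvafpscounter ! fakesink
```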
## Tutorial on how to use NPU specific pipelines

> Note: This sample application already provides a default `docker-compose.yml` file that includes the necessary NPU access for the containers.

The pipeline `pcb_anomaly_detection_npu` in `pipeline-server-config.json` contains NPU-specific elements and uses the NPU backend for inferencing. Follow the steps below to run the pipeline.
### Steps

1. Ensure that the sample application is up and running. If not, follow the steps [here](../get-started.md#set-up-the-application) to set up the application and then bring the services up:

   > If you're running multiple instances of the app, start the services using `./run.sh up` instead.

   ```sh
   docker compose up -d
   ```

2. Start the pipeline:

   ```sh
   ./sample_start.sh -p pcb_anomaly_detection_npu
   ```

   This will start the pipeline. The inference stream can be viewed over WebRTC in a browser at the following URL:

   > If you're running multiple instances of the app, include the `NGINX_HTTPS_PORT` number in the URL for that app instance, i.e. replace `<HOST_IP>` with `<HOST_IP>:<NGINX_HTTPS_PORT>`.

   ```bash
   https://<HOST_IP>/mediamtx/anomaly/
   ```

manufacturing-ai-suite/industrial-edge-insights-vision/docs/user-guide/weld-porosity/how-to-guides.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -6,6 +6,7 @@ This section collects guides for Weld Porosity sample application.
 - [Manage pipelines](./how-to-guides/manage-pipelines.md)
 - [Run multiple AI pipelines](./how-to-guides/run-multiple-ai-pipelines.md)
 - [Use GPU For Inference](./how-to-guides/use-gpu-for-inference.md)
+- [Use NPU For Inference](./how-to-guides/use-npu-for-inference.md)
 - [Use Your AI Model and Video](./how-to-guides/use-your-ai-model-and-video.md)
 - [Change the Input Video Source](./how-to-guides/change-input-video-source.md)
 - [Scale Video Resolution](./how-to-guides/scale-video-resolution.md)
@@ -25,6 +26,7 @@ This section collects guides for Weld Porosity sample application.
 ./how-to-guides/manage-pipelines
 ./how-to-guides/run-multiple-ai-pipelines
 ./how-to-guides/use-gpu-for-inference
+./how-to-guides/use-npu-for-inference
 ./how-to-guides/use-your-ai-model-and-video
 ./how-to-guides/change-input-video-source
 ./how-to-guides/scale-video-resolution
```
Lines changed: 77 additions & 0 deletions
# How to use NPU for inference

## Pre-requisites

To take full advantage of hardware acceleration, pipelines can be designed so that different stages, such as decoding and inference, are executed on the most suitable hardware devices.

Low-power accelerators like a Neural Processing Unit (NPU) can offload neural network computation from the CPU or GPU, enabling more efficient resource utilization and improved overall system performance.

DL Streamer and the DL Streamer Pipeline Server support inference on NPU devices, allowing applications built on these frameworks to leverage NPU acceleration for improved efficiency and performance.

Before running inference on an NPU, ensure that:

- The host system includes a supported NPU device
- The required NPU drivers are installed and properly configured

For detailed setup instructions, refer to the [documentation](https://docs.openedgeplatform.intel.com/dev/edge-ai-libraries/dlstreamer/dev_guide/advanced_install/advanced_install_guide_prerequisites.html#optional-prerequisite-2-install-intel-npu-drivers).

For containerized applications, the following additional changes are required.
### Provide NPU access to the container

This can be done by making the following changes to the Docker Compose file.

```yaml
services:
  dlstreamer-pipeline-server:
    group_add:
      # render group ID for Ubuntu 22.04 host OS
      - "110"
      # render group ID for Ubuntu 24.04 host OS
      - "992"
    devices:
      # you can list specific devices here instead if you don't want to expose all of /dev
      - "/dev:/dev"
```
The changes above add the container user to the `render` group and provide access to the NPU devices.
### Hardware specific encoder/decoders

Unlike the changes made for the container above, the following requires a modification to the media pipeline itself.

GStreamer provides a variety of hardware-specific encoder and decoder elements, such as the Intel VA-API elements, that you can benefit from by adding them to your media pipeline. Examples of such elements are `vah264dec`, `vah264enc`, `vajpegdec`, and `vajpegenc`.

Additionally, you can enforce zero-copy of buffers by adding the GStreamer caps (capabilities) `video/x-raw(memory:VAMemory)` to the pipeline for Intel NPUs.

Read the DL Streamer [docs](https://dlstreamer.github.io/dev_guide/gpu_device_selection.html) for more details.
### NPU specific element properties

DL Streamer inference elements also provide properties such as `device=NPU` and `pre-process-backend=va`, which should be used in pipelines with NPU memory. This maps buffers to system memory and uses the VA pre-processor. Read the DL Streamer [docs](https://dlstreamer.github.io/dev_guide/model_preparation.html#model-pre-and-post-processing) for more.
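As a sketch only (the element chain, model path, and input file are assumptions, not this application's actual config), a classification pipeline using these properties might be described as:

```
filesrc location=input.mp4 ! decodebin3 ! vapostproc ! \
  video/x-raw(memory:VAMemory) ! \
  gvaclassify model=model.xml device=NPU pre-process-backend=va ! \
  gvafpscounter ! fakesink
```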
## Tutorial on how to use NPU specific pipelines

> Note: This sample application already provides a default `docker-compose.yml` file that includes the necessary NPU access for the containers.

The pipeline `weld_porosity_classification_npu` in `pipeline-server-config.json` contains NPU-specific elements and uses the NPU backend for inferencing. Follow the steps below to run the pipeline.
### Steps

1. Ensure that the sample application is up and running. If not, follow the steps [here](../get-started.md#set-up-the-application) to set up the application and then bring the services up:

   > If you're running multiple instances of the app, start the services using `./run.sh up` instead.

   ```sh
   docker compose up -d
   ```

2. Start the pipeline:

   ```sh
   ./sample_start.sh -p weld_porosity_classification_npu
   ```

   This will start the pipeline. The inference stream can be viewed over WebRTC in a browser at the following URL:

   > If you're running multiple instances of the app, include the `NGINX_HTTPS_PORT` number in the URL for that app instance, i.e. replace `<HOST_IP>` with `<HOST_IP>:<NGINX_HTTPS_PORT>`.

   ```bash
   https://<HOST_IP>/mediamtx/weld/
   ```

metro-ai-suite/image-based-video-search/docs/user-guide/how-to-use-gpu-for-inference.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -7,7 +7,7 @@ if not already done.
 
 ### Volume mount GPU config
 
-Comment out CPU and NPU config and uncomment the GPU config present in [compose.yml](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/image-based-video-search/compose.yml)
+Comment out CPU and NPU volume mount and uncomment the GPU volume mount present in [compose.yml](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/image-based-video-search/compose.yml)
 file under `volumes` section as shown below:
 
 ```sh
@@ -19,7 +19,7 @@ file under `volumes` section as shown below:
 
 ### Start and run the application
 
-After the above changes to docker compose file, follow from step 3 as mentioned in the
+After the above changes to docker compose file, follow from step 3 till end of the section as mentioned in the
 [Get Started](./get-started.md#set-up-and-first-use) guide.
 
 ## Helm deployment
@@ -28,7 +28,7 @@ Follow step 1 mentioned in this [document](./get-started/deploy-with-helm.md#ste
 
 ### Update values.yaml
 
-In `values.yaml` file, change value of `pipeline` config present under
+In [`values.yaml`](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/image-based-video-search/chart/values.yaml) file, change value of `pipeline` config present under
 `dlstreamerpipelineserver` section as shown below:
 
 ```sh
````
Lines changed: 53 additions & 0 deletions
# How to use NPU for inference

## Docker deployment

Follow steps 1 and 2 mentioned in the [Get Started](./get-started.md#set-up-and-first-use) guide if not already done.

### Volume mount NPU config

Comment out the CPU and GPU volume mounts and uncomment the NPU volume mount present in the [compose.yml](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/image-based-video-search/compose.yml) file under the `volumes` section, as shown below:
```yaml
volumes:
  # - "./src/dlstreamer-pipeline-server/configs/filter-pipeline/config.cpu.json:/home/pipeline-server/config.json"
  # - "./src/dlstreamer-pipeline-server/configs/filter-pipeline/config.gpu.json:/home/pipeline-server/config.json"
  - "./src/dlstreamer-pipeline-server/configs/filter-pipeline/config.npu.json:/home/pipeline-server/config.json"
```
### Start and run the application

After the above changes to the Docker Compose file, follow from step 3 till the end of the section as mentioned in the [Get Started](./get-started.md#set-up-and-first-use) guide.

## Helm deployment

Follow step 1 mentioned in this [document](./get-started/deploy-with-helm.md#steps-to-deploy) if not already done.

### Update values.yaml

In the [values.yaml](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/image-based-video-search/chart/values.yaml) file, change the value of the `pipeline` config present under the `dlstreamerpipelineserver` section, as shown below:
```yaml
dlstreamerpipelineserver:
  # key: dlstreamerpipelineserver.repository
  repository:
    # key: dlstreamerpipelineserver.repository.image
    image: docker.io/intel/dlstreamer-pipeline-server
    # key: dlstreamerpipelineserver.repository.tag
    tag: 2025.2.0-ubuntu24
  # key: dlstreamerpipelineserver.replicas
  replicas: 1
  # key: dlstreamerpipelineserver.nodeSelector
  nodeSelector: {}
  # key: dlstreamerpipelineserver.pipeline
  pipeline: config.npu.json  # changed value from config.cpu.json to config.npu.json
```
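Equivalently, instead of editing the file, the same value can be overridden on the Helm command line. This is a sketch: the release name and chart path below are placeholders, not taken from this guide.

```sh
helm upgrade --install <release-name> <chart-path> \
  --set dlstreamerpipelineserver.pipeline=config.npu.json
```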
### Start the application

After the above changes to the `values.yaml` file, follow from step 2 as mentioned in the [Helm Deployment Guide](./get-started/deploy-with-helm.md#steps-to-deploy).

metro-ai-suite/image-based-video-search/docs/user-guide/index.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -87,6 +87,7 @@ continuously and appears in the UI as soon as the application starts.
 get-started
 how-it-works
 how-to-use-gpu-for-inference
+how-to-use-npu-for-inference
 troubleshooting
 release-notes
 
```

metro-ai-suite/metro-vision-ai-app-recipe/loitering-detection/docs/user-guide/how-to-guides.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -13,6 +13,7 @@ This section collects guides for the Loitering Detection sample application.
 
 ./how-to-guides/customize-application
 ./how-to-guides/use-gpu-for-inference
+./how-to-guides/use-npu-for-inference
 ./how-to-guides/view-telemetry-data
 ./how-to-guides/benchmark
 
```
Lines changed: 49 additions & 0 deletions
# Use NPU for Inference

## Pre-requisites

In order to benefit from hardware acceleration, pipelines can be constructed so that different stages, such as decoding and inference, make use of the most suitable devices. For containerized applications built using the DL Streamer Pipeline Server, we first need to give the container user access to the NPU device(s).
### Provide NPU access to the container

This can be done by making the following changes to the Docker Compose file.

```yaml
services:
  dlstreamer-pipeline-server:
    group_add:
      # render group ID for Ubuntu 22.04 host OS
      - "110"
      # render group ID for Ubuntu 24.04 host OS
      - "992"
    devices:
      # you can list specific devices here instead if you don't want to expose all of /dev
      - "/dev:/dev"
```
The changes above add the container user to the `render` group and provide access to the NPU devices.
### Hardware specific encoder/decoders

Unlike the changes made for the container above, the following requires a modification to the media pipeline itself.

GStreamer provides a variety of hardware-specific encoder and decoder elements, such as the Intel VA-API elements, that you can benefit from by adding them to your media pipeline. Examples of such elements are `vah264dec`, `vah264enc`, `vajpegdec`, and `vajpegenc`.
## Tutorial on how to use NPU specific pipelines

> **Note:** This sample application already provides a default `compose-without-scenescape.yml`
> file that includes the necessary NPU access for the containers.

The pipeline `object_tracking_npu` in the DL Streamer Pipeline Server's `config.json` contains NPU-specific elements and uses the NPU backend for inferencing. We can start the pipeline as follows:

```sh
./sample_start.sh npu
```

Go to Grafana as explained in [Get Started](../get-started.md) to view the dashboard.

metro-ai-suite/metro-vision-ai-app-recipe/smart-parking/docs/user-guide/how-to-guides.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -15,6 +15,7 @@ This section collects guides for the Smart Parking sample application.
 ./how-to-guides/customize-application
 ./how-to-guides/generate-offline-package
 ./how-to-guides/use-gpu-for-inference
+./how-to-guides/use-npu-for-inference
 ./how-to-guides/view-telemetry-data
 ./how-to-guides/benchmark
 
```
