Vision Language Model

The VLM submodule combines the abilities of vision and language models to handle both images and text on NXP i.MX9 applications processors.

Installation

1. Clone the repository (make sure git lfs is installed)

# Clone repository
git clone --single-branch -b release/v3.0 https://github.com/nxp-appcodehub/dm-eiq-genai-flow-demonstrator
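Since the clone step depends on Git LFS, a quick pre-flight check can save a failed or incomplete checkout. The following is a minimal sketch, not part of this repository: the helper names `git_lfs_available` and `git_lfs_version` are hypothetical, and the check assumes Git LFS is installed as the usual `git-lfs` executable on `PATH`.

```python
import shutil
import subprocess


def git_lfs_available(which=shutil.which):
    """Return True if both `git` and `git-lfs` executables are found on PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment (hypothetical helper, not part of the repository).
    """
    return which("git") is not None and which("git-lfs") is not None


def git_lfs_version():
    """Return the output of `git lfs version`, or None if it is unavailable."""
    if not git_lfs_available():
        return None
    result = subprocess.run(
        ["git", "lfs", "version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    if git_lfs_available():
        print("Git LFS found:", git_lfs_version())
    else:
        print("Git LFS not found; install it before cloning the repository.")
```

If the check reports that Git LFS is missing, install it with your distribution's package manager before running the clone command.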

2. Set up dependencies

cd vlm
./install.sh

Installation Warning:

The "Transformers" Python package has a transitive dependency on the "Pygments 2.19.2" package, which has a known vulnerability (CVE-2026-4539) with no fix available at the time of this release. Please verify whether a fix is available before integrating this dependency into your product.

Run VLM with Chat Interface GUI

Use the following command to run the VLM together with the GUI.

# Run VLM
./launch.sh

This starts the chat_interface and the main vlm process. The first run takes longer because the models have to be downloaded. launch.sh accepts the following options:

-m, --model
Specifies the VLM to use. Available models are:
- smolvlm-256M
- smolvlm-500M
-im, --input_image
Path to the image to caption. Default images (delivery and industry) are provided in test/data.

-p, --precision
Precision of the model. Available precisions are:
- fp32
- q8

You can choose which parts of the model run in fp32 and which in q8 by editing config.py.

-g
Use the GUI. Default: True.

# Example
./launch.sh -m smolvlm-500M -im path/to/your/image/image.png -p q8 -g

# Help
./launch.sh --help
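When launching the demo from another script, it can help to validate the options before calling `launch.sh`. The sketch below is an illustration only: `build_launch_command` is a hypothetical helper, the allowed values are copied from the option list above, and passing a bare `-g` to enable the GUI is an assumption taken from the example command.

```python
import shlex

# Allowed values copied from the option list in this README.
MODELS = ("smolvlm-256M", "smolvlm-500M")
PRECISIONS = ("fp32", "q8")


def build_launch_command(model, precision, input_image=None, gui=True):
    """Build the argument list for ./launch.sh (hypothetical helper).

    Raises ValueError for a model or precision that the README does not list.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision {precision!r}; expected one of {PRECISIONS}")

    cmd = ["./launch.sh", "-m", model]
    if input_image is not None:
        cmd += ["-im", input_image]
    cmd += ["-p", precision]
    # Assumption: a bare -g enables the GUI, as in the README example.
    # How to disable the GUI through launch.sh is not documented here,
    # so gui=False simply omits the flag.
    if gui:
        cmd.append("-g")
    return cmd


if __name__ == "__main__":
    command = build_launch_command(
        "smolvlm-500M", "q8", input_image="path/to/your/image/image.png"
    )
    # Print a shell-safe version of the command instead of executing it.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run` from the `vlm` directory. Building a list rather than a single string avoids quoting problems with image paths that contain spaces.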

Run without GUI

It is possible to run the code without the GUI.

python3 -m vlm

Performance on i.MX95 (CPU)

Model	Precision	Vision Encoder	Decoder (TTFT)	Decoder
SmolVLM2-256M	FP32	6.66 s	0.84 s	0.13 s - 0.16 s
SmolVLM2-256M	INT8	3.31 s	0.48 s	0.08 s - 0.09 s
SmolVLM2-500M	FP32	6.76 s	1.98 s	0.21 s - 0.25 s
SmolVLM2-500M	INT8	3.34 s	0.81 s	0.12 s - 0.19 s

SmolVLM2-256M and SmolVLM2-500M share the same vision encoder, so vision encoder performance is essentially the same for both models.
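To make the effect of quantization easier to read off, the sketch below computes the FP32-to-INT8 speedup for each stage from the timings in the table above. It also estimates the delay before the first generated token as the sum of the vision encoder time and the decoder TTFT. That sum is an assumption (the two stages running back to back with no other overhead), not a measured figure, and the names `TIMINGS`, `speedup`, and `time_to_first_token` are introduced here for illustration.

```python
# Timings in seconds, copied from the performance table (i.MX95, CPU).
TIMINGS = {
    "SmolVLM2-256M": {
        "FP32": {"vision": 6.66, "ttft": 0.84},
        "INT8": {"vision": 3.31, "ttft": 0.48},
    },
    "SmolVLM2-500M": {
        "FP32": {"vision": 6.76, "ttft": 1.98},
        "INT8": {"vision": 3.34, "ttft": 0.81},
    },
}


def speedup(model, stage):
    """Return how many times faster INT8 is than FP32 for one stage."""
    return TIMINGS[model]["FP32"][stage] / TIMINGS[model]["INT8"][stage]


def time_to_first_token(model, precision):
    """Estimate seconds until the first token.

    Assumption: the vision encoder and the decoder's first step run
    sequentially, so their times simply add up.
    """
    stage_times = TIMINGS[model][precision]
    return stage_times["vision"] + stage_times["ttft"]


if __name__ == "__main__":
    for model in TIMINGS:
        for stage in ("vision", "ttft"):
            print(f"{model} {stage}: INT8 is {speedup(model, stage):.2f}x faster")
        for precision in ("FP32", "INT8"):
            estimate = time_to_first_token(model, precision)
            print(f"{model} {precision}: ~{estimate:.2f} s to first token")
```

Under that assumption, the vision encoder dominates the wait for the first token at both precisions, so the roughly 2x encoder speedup from INT8 accounts for most of the improvement.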
