feat(model) : Add OpenVINO model for SAM3 #830
rajeshgangireddy wants to merge 18 commits into open-edge-platform:main
- Add SAM3OpenVINO class supporting ONNX and OpenVINO IR models with the same API as the PyTorch SAM3 model (fit/predict)
- Add conversion script: ONNX (usls v2 split) → OpenVINO IR
- Add quantization script: NNCF INT8/INT4 weight compression and usls pre-quantized ONNX (Q8/Q4F16/BNB4) download
- Add HuggingFace upload script with subdirectory support
- Add example script with 6 usage examples (text, box, combined prompts)
- Update package exports for SAM3OpenVINO and device_to_openvino_device
- Models hosted at rajeshgangireddy/sam3_openvino (FP16, NNCF-INT8, NNCF-INT4, ONNX-Q8 variants)
# Conflicts: # library/src/instantlearn/models/sam3/__init__.py -e Signed-off-by: rajeshgangireddy <rajesh.gangireddy@intel.com>
Signed-off-by: rajeshgangireddy <rajesh.gangireddy@intel.com>
…ce info Signed-off-by: rajeshgangireddy <rajesh.gangireddy@intel.com>
Resolve conflict in library/pyproject.toml: keep both 'quantize' (nncf) and 'demo' (gradio) dependency groups, add both to 'full' group.
Pull request overview
Adds an OpenVINO-based SAM3 inference path to the instantlearn library, alongside tooling (export/convert/quantize/benchmark) and documentation/examples so users can run SAM3 without PyTorch at inference time.
Changes:
- Introduces `SAM3OpenVINO` + `SAM3OVVariant` and wires them into the public `instantlearn.models` API.
- Adds SAM3 OpenVINO scripts for export/conversion/quantization/benchmarking plus end-to-end examples.
- Updates docs, dependencies (`huggingface_hub`, `quantize` extra), and adds unit tests for the OpenVINO model wrapper.
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `library/src/instantlearn/models/sam3/sam3_openvino.py` | New OpenVINO runtime-backed SAM3 model wrapper supporting classic + visual exemplar prompting. |
| `library/src/instantlearn/models/sam3/__init__.py` | Exposes SAM3OpenVINO / SAM3OVVariant from the SAM3 package. |
| `library/src/instantlearn/models/__init__.py` | Exposes OpenVINO SAM3 APIs at the top-level models package. |
| `library/src/instantlearn/models/sam3/export_openvino.py` | Adds ONNX/OpenVINO export wrappers/utilities for the 5-submodel split. |
| `library/src/instantlearn/utils/utils.py` | Extends device mapping to map XPU → OpenVINO GPU. |
| `library/src/instantlearn/utils/__init__.py` | Re-exports device_to_openvino_device. |
| `library/tests/unit/models/test_sam3_openvino.py` | Adds unit tests for initialization, prompt modes, utilities, and model-file discovery. |
| `library/examples/sam3_openvino_example.py` | Adds runnable examples covering text/box/point/combined prompts + visual exemplar mode. |
| `library/src/instantlearn/scripts/sam3/export_sam3_openvino.py` | CLI to export SAM3 (PyTorch → ONNX → OpenVINO IR) and validate. |
| `library/src/instantlearn/scripts/sam3/convert_sam3_to_openvino.py` | CLI to convert a 5-model ONNX split into OpenVINO IR + validate. |
| `library/src/instantlearn/scripts/sam3/quantize_sam3_openvino.py` | CLI to apply NNCF weight compression (INT8/INT4) to IR models + validate/compare sizes. |
| `library/src/instantlearn/scripts/sam3/benchmark_sam3_openvino.py` | Benchmark harness across variants/devices/prompt types; exports tables/Excel/charts. |
| `library/src/instantlearn/scripts/sam3/__init__.py` | Declares the SAM3 scripts subpackage. |
| `library/src/instantlearn/scripts/simple_script.py` | Adds an ad-hoc inference/visual comparison script. |
| `library/README.md` | Documents SAM3 OpenVINO usage, variants, and exemplar mode; updates install extras. |
| `library/docs/02-quick-start.md` | Adds a quick-start section for SAM3 OpenVINO + exemplar mode. |
| `library/docs/01-introduction.md` | Lists SAM3 and SAM3OpenVINO in supported models/foundations overview. |
| `library/pyproject.toml` | Adds huggingface_hub dependency and quantize extra; extends full extra. |
| `.gitignore` | Ignores benchmark/example outputs and a top-level models/ directory. |

    category_ids = sample.category_ids
    num_visual = max(len(bboxes), len(points))
    if num_visual and len(texts) != num_visual:
        texts = ["visual"] * num_visual

In classic `predict()`, when `categories`/`category_ids` don’t match the number of visual prompts (e.g., multiple bboxes/points but only one category), `texts` is expanded to `num_visual` but `category_ids` is left unchanged. That causes `zip_longest(..., category_ids, ...)` to yield `cat_id=None` for extra prompts and later crashes when building `pred_labels`. Ensure `category_ids` is also expanded (e.g., repeat 0 or derive from `texts`) whenever `texts` is adjusted for visual-only prompts.

Suggested change:

    - category_ids = sample.category_ids
    - num_visual = max(len(bboxes), len(points))
    - if num_visual and len(texts) != num_visual:
    -     texts = ["visual"] * num_visual
    + category_ids = list(sample.category_ids or [])
    + num_visual = max(len(bboxes), len(points))
    + if num_visual:
    +     if len(texts) != num_visual:
    +         texts = ["visual"] * num_visual
    +     if len(category_ids) != num_visual:
    +         default_category_id = category_ids[0] if category_ids else 0
    +         category_ids = [default_category_id] * num_visual
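
The failure mode described in this thread can be reproduced without the model. The following is a minimal sketch, assuming a hypothetical helper `normalize_category_ids` (not part of the PR) that mirrors the suggested change: `zip_longest` pads the shorter input with `None`, while normalizing the ID list first gives every prompt an integer label.

```python
from itertools import zip_longest


def normalize_category_ids(category_ids, num_prompts, default=0):
    """Return exactly num_prompts category IDs (hypothetical helper).

    If the lengths already match, the IDs are returned unchanged. Otherwise
    the first ID (or `default` when none is given) is repeated per prompt.
    """
    ids = list(category_ids or [])
    if len(ids) == num_prompts:
        return ids
    fill = ids[0] if ids else default
    return [fill] * num_prompts


# Three box prompts but only one category ID, as in the review comment.
bboxes = [(0, 0, 10, 10), (5, 5, 20, 20), (1, 1, 4, 4)]
category_ids = [7]

# Without normalization, zip_longest pads the shorter input with None.
unsafe = [cat_id for _, cat_id in zip_longest(bboxes, category_ids)]

# With normalization, every prompt gets an integer label.
safe = normalize_category_ids(category_ids, len(bboxes))
```

Here `unsafe` contains `None` entries that a later `torch.full(..., cat_id, ...)` call could not accept, whereas `safe` holds one integer per prompt.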

    all_masks.append(result[0]["masks"])
    all_boxes.append(boxes_with_scores)
    all_labels.append(torch.full((len(result[0]["boxes"]),), cat_id, dtype=torch.int64))

`cat_id` can be `None` here when `zip_longest()` extends beyond the provided `category_ids` (e.g., multiple prompts but only one ID). `torch.full(..., cat_id, ...)` will raise. This should be guarded by normalizing the `category_ids` length to match the prompts before the loop (or by choosing a default label when `cat_id` is `None`).

    prompt_mode: Sam3PromptMode = Sam3PromptMode.CLASSIC,
    drop_spatial_bias: bool = True,
    tokenizer_path: str | Path | None = None,
    variant: SAM3OVVariant = SAM3OVVariant.FP16,
    repo_id: str = _DEFAULT_HF_REPO,

`drop_spatial_bias` is accepted and stored, but it never affects which geometry encoder is used (exemplar always routes to `geometry-encoder-exemplar`). As-is, this parameter is a no-op and can mislead API consumers; either remove it or wire it to select the exemplar vs classic geometry encoder / model set.

    tic = time.time()
    model = SAM3(device=DEVICE)
    toc = time.time()
    sam3_init_time = toc - tic
    print(f"SAM3 initialization time: {sam3_init_time:.2f} seconds")

This module runs a full benchmark workflow at import time (model creation + inference). Because it lives under `instantlearn/scripts`, it’s importable and will execute unexpectedly (e.g., during tooling introspection). Wrap the executable code in a `main()` and guard it with `if __name__ == "__main__":`, and prefer the project logger over `print()` for output.
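
The restructuring requested here could look roughly like the sketch below. It assumes the standard-library `logging` module in place of the project logger, and `build_model` is a hypothetical stand-in for the expensive `SAM3(device=DEVICE)` call; neither name is taken from the PR.

```python
import logging
import time

logger = logging.getLogger(__name__)


def build_model() -> object:
    """Hypothetical stand-in for the expensive model constructor."""
    return object()


def main() -> float:
    """Run the timing workflow and return the initialization time in seconds."""
    tic = time.perf_counter()
    build_model()
    init_time = time.perf_counter() - tic
    # Lazy %-formatting lets the logging framework skip work when INFO is disabled.
    logger.info("Model initialization time: %.2f seconds", init_time)
    return init_time


if __name__ == "__main__":
    # Runs only when executed as a script, never on import.
    logging.basicConfig(level=logging.INFO)
    main()
```

With this layout, importing the module only defines `build_model` and `main`; no model is created until the file is run directly.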

    python scripts/benchmark_sam3_openvino.py

    # Auto-download INT8 quantised variant
    python scripts/benchmark_sam3_openvino.py --variants openvino-fp16

The usage text says “Auto-download INT8 quantised variant” but the example command uses `--variants openvino-fp16`. Update the command to the intended INT8 variant name (or adjust the description) so users can run the documented example successfully.

Suggested change:

    - python scripts/benchmark_sam3_openvino.py --variants openvino-fp16
    + python scripts/benchmark_sam3_openvino.py --variants openvino-int8

    msg = "pandas is required to export results. Install it with: pip install pandas openpyxl"
    raise ImportError(msg) from exc

The raised `ImportError` message instructs users to run `pip install ...`, which conflicts with the project’s uv-based dependency management. Adjust to `uv pip install pandas openpyxl` (or point to a `uv sync --extra ...` group if you add one).

    try:
        import pandas as pd  # noqa: PLC0415
    except ImportError as exc:
        msg = "pandas and openpyxl are required to save results. Install with: pip install pandas openpyxl"

Same as above: this error message recommends `pip install ...` even though the project uses uv. Please switch the guidance to `uv pip install ...` (or a documented `uv sync --extra ...` group).

Suggested change:

    - msg = "pandas and openpyxl are required to save results. Install with: pip install pandas openpyxl"
    + msg = "pandas and openpyxl are required to save results. Install with: uv pip install pandas openpyxl"
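
Since the same install hint appears in two places, one option is to centralize it. The sketch below assumes a hypothetical helper `require_optional` (not present in the PR) that imports an optional dependency and raises a single, consistently worded `ImportError`.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency or raise ImportError with an install hint.

    Hypothetical helper: keeps the install guidance in one place so that
    every call site reports the same command.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        msg = f"{module_name} is required for this feature. Install it with: {install_hint}"
        raise ImportError(msg) from exc
```

A call site would then read `pd = require_optional("pandas", "uv pip install pandas openpyxl")`, so changing the recommended command later touches only the hint string.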

    parser.add_argument(
        "--method",
        type=str,
        required=True,
        choices=["nncf-int8", "nncf-int4", "all"],

The module docstring describes multiple quantization methods (q8/q4f16/bnb4/all-usls, etc.), but the CLI only supports `nncf-int8`, `nncf-int4`, and `all`. Either implement the documented methods or trim the docstring/usage text and constants to match the actual CLI surface.
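
One way to keep help text and accepted values from drifting apart is to derive both from a single constant. This is a minimal sketch, assuming a hypothetical `METHODS` tuple and `build_parser` function; only the three method names come from the quoted snippet.

```python
import argparse

# Single source of truth for the supported methods (values from the quoted CLI snippet).
METHODS = ("nncf-int8", "nncf-int4", "all")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose description and choices both come from METHODS."""
    parser = argparse.ArgumentParser(
        description=f"Quantize SAM3 OpenVINO IR models. Methods: {', '.join(METHODS)}.",
    )
    parser.add_argument("--method", type=str, required=True, choices=METHODS)
    return parser
```

Adding or removing a method then means editing `METHODS` only, and the description shown by `--help` follows automatically.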

    def main() -> None:
        """CLI entry point for SAM3 PyTorch → ONNX → OpenVINO export."""
        parser = argparse.ArgumentParser(
            description="Export SAM3 PyTorch model to OpenVINO IR via ONNX (4-model split).",

The CLI description says “(4-model split)”, but this script exports 5 models (includes `geometry-encoder-exemplar`). Update the description string to avoid confusion when users look for the expected outputs.

Suggested change:

    - description="Export SAM3 PyTorch model to OpenVINO IR via ONNX (4-model split).",
    + description="Export SAM3 PyTorch model to OpenVINO IR via ONNX (5-model split).",

    side_effect=lambda k: {
        "fpn_feat_0": _RNG.standard_normal((1, 256, 288, 288)).astype(np.float32),
        "fpn_feat_1": _RNG.standard_normal((1, 256, 144, 144)).astype(np.float32),
        "fpn_feat_2": _RNG.standard_normal((1, 256, 72, 72)).astype(np.float32),
        "fpn_pos_2": _RNG.standard_normal((1, 256, 72, 72)).astype(np.float32),

These mocks allocate very large feature maps (e.g., 1×256×288×288), which can make the unit test suite slow and memory-hungry even though the values aren’t asserted. Consider shrinking the dummy tensor shapes to the minimum needed for control-flow validation (or using smaller placeholder arrays) to keep CI stable.
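
To put a number on the concern, the sketch below computes the float32 footprint of the four arrays quoted above and compares it with a set of reduced shapes. The reduced shapes are purely illustrative assumptions; whether they are acceptable depends on any shape checks in the code under test.

```python
from math import prod

FLOAT32_BYTES = 4
MIB = 1024 * 1024

# Shapes copied from the mocked outputs quoted in the review thread.
full_shapes = {
    "fpn_feat_0": (1, 256, 288, 288),
    "fpn_feat_1": (1, 256, 144, 144),
    "fpn_feat_2": (1, 256, 72, 72),
    "fpn_pos_2": (1, 256, 72, 72),
}

# Hypothetical reduced shapes that keep the same rank and 2x spatial pyramid.
small_shapes = {
    "fpn_feat_0": (1, 8, 8, 8),
    "fpn_feat_1": (1, 8, 4, 4),
    "fpn_feat_2": (1, 8, 2, 2),
    "fpn_pos_2": (1, 8, 2, 2),
}


def total_mib(shapes: dict) -> float:
    """Total size in MiB of float32 arrays with the given shapes."""
    return sum(prod(shape) for shape in shapes.values()) * FLOAT32_BYTES / MIB


full_mib = total_mib(full_shapes)    # 111.375 MiB for these four arrays
small_mib = total_mib(small_shapes)  # a few KiB
```

The four full-size arrays come to 111.375 MiB per mock invocation, and because the `side_effect` lambda rebuilds the dictionary on every call, that cost is paid each time the mock is queried.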
Nice PR! Some general remarks and some styling issues. Some tests are failing, and I see some good comments from the GitHub Copilot reviewer. Could you also add a PR description and some benchmark information? I'm especially interested in the performance difference between the FP and INT variants on XPU.

    def __init__(
        self,
        model_dir: str | Path | None = None,
        device: str = "CPU",

I think it might be best to use `AUTO` here to fully support the best XPU device, if present.
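
A device-resolution step with an `AUTO` default might be sketched as follows. This is an assumption-laden illustration: `to_openvino_device` is a hypothetical function, not the library's `device_to_openvino_device`, and the only mapping taken from this PR is XPU → OpenVINO GPU. `CPU`, `GPU`, and `AUTO` are standard OpenVINO device names.

```python
from __future__ import annotations

# Hypothetical mapping; only "xpu" -> "GPU" is stated in the PR summary.
_DEVICE_MAP = {"cpu": "CPU", "gpu": "GPU", "xpu": "GPU", "auto": "AUTO"}


def to_openvino_device(device: str | None = None) -> str:
    """Map a PyTorch-style device string to an OpenVINO device name.

    With no device given, fall back to "AUTO" so that the OpenVINO runtime
    picks among the devices it finds.
    """
    if device is None:
        return "AUTO"
    # Drop an optional index suffix such as "xpu:0".
    base = device.split(":", maxsplit=1)[0].lower()
    try:
        return _DEVICE_MAP[base]
    except KeyError:
        msg = f"Unsupported device: {device!r}"
        raise ValueError(msg) from None
```

Raising on unknown names, rather than silently falling back to `AUTO`, is a design choice made here so that typos in a device string surface early.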

    device: str = "CPU",
    confidence_threshold: float = 0.5,
    resolution: int = 1008,
    prompt_mode: Sam3PromptMode = Sam3PromptMode.CLASSIC,

It might be better to use `VISUAL_EXEMPLAR`, as we use this as the default throughout the library.

    prompt_mode: Sam3PromptMode = Sam3PromptMode.CLASSIC,
    drop_spatial_bias: bool = True,
    tokenizer_path: str | Path | None = None,
    variant: SAM3OVVariant = SAM3OVVariant.FP16,

How good is the performance of the INT8 variant? If it does not show large accuracy degradation, I would advise using it as the default.
- Clarified README installation instructions for quantization tools.
- Deleted obsolete simple_script.py as it is no longer needed.
- Adjusted model split description in export_openvino.py from 4 to 5 models for accuracy.
Pull Request
TODOs
Description
Type of Change
- feat - New feature
- fix - Bug fix
- docs - Documentation
- refactor - Code refactoring
- test - Tests
- chore - Maintenance

Related Issues
Breaking Changes
Examples
Screenshots