feat(model): Add OpenVINO model for SAM3 #830

Open
rajeshgangireddy wants to merge 18 commits into open-edge-platform:main from rajeshgangireddy:feature/sam3_onnx

@rajeshgangireddy commented Mar 9, 2026

Pull Request

TODOs

  • Check documentation: add an example of the OpenVINO SAM3 model
  • Make sure all variants are usable in:
    • Text prompt
    • Visual Prompt
    • Visual Prompt Exemplar Mode
  • Benchmarking
    • Visual examples showing difference in Quality
    • Averaged inference speeds across all variants on CPU and XPU
  • Any additional dependencies: add to pyproject.toml?
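For the "averaged inference speeds" item, a minimal timing sketch, assuming each variant/device combination can be wrapped in a zero-argument callable. The helper name `benchmark` and its `warmup`/`repeats` parameters are hypothetical and not part of instantlearn. Warm-up calls are left untimed because the first inferences on a device may include one-off initialization costs.

```python
import statistics
import time
from collections.abc import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> dict[str, float]:
    """Time ``fn`` after ``warmup`` untimed calls; return summary stats in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off initialization costs
    samples_ms = []
    for _ in range(repeats):
        tic = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - tic) * 1000.0)
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        # stdev needs at least two samples
        "stdev_ms": statistics.stdev(samples_ms) if repeats > 1 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call to the model's predict() for a real run.
    print(benchmark(lambda: sum(range(100_000))))
```

Reporting the median alongside the mean makes the comparison between CPU and XPU less sensitive to occasional slow outliers.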

Description

Type of Change

  • feat - New feature
  • 🐞 fix - Bug fix
  • 📚 docs - Documentation
  • ♻️ refactor - Code refactoring
  • 🧪 test - Tests
  • 🔧 chore - Maintenance

Related Issues

Breaking Changes


Examples

Screenshots

- Add SAM3OpenVINO class supporting ONNX and OpenVINO IR models
  with the same API as the PyTorch SAM3 model (fit/predict)
- Add conversion script: ONNX (usls v2 split) → OpenVINO IR
- Add quantization script: NNCF INT8/INT4 weight compression
  and usls pre-quantized ONNX (Q8/Q4F16/BNB4) download
- Add HuggingFace upload script with subdirectory support
- Add example script with 6 usage examples (text, box, combined prompts)
- Update package exports for SAM3OpenVINO and device_to_openvino_device
- Models hosted at rajeshgangireddy/sam3_openvino (FP16, NNCF-INT8,
  NNCF-INT4, ONNX-Q8 variants)
# Conflicts:
#	library/src/instantlearn/models/sam3/__init__.py
Signed-off-by: rajeshgangireddy <rajesh.gangireddy@intel.com>
…ce info

Resolve conflict in library/pyproject.toml: keep both the 'quantize' (nncf)
and 'demo' (gradio) dependency groups, and add both to the 'full' group.
@rajeshgangireddy marked this pull request as ready for review April 7, 2026 07:48
Copilot AI review requested due to automatic review settings April 7, 2026 07:48

Copilot AI left a comment

Pull request overview

Adds an OpenVINO-based SAM3 inference path to the instantlearn library, alongside tooling (export/convert/quantize/benchmark) and documentation/examples so users can run SAM3 without PyTorch at inference time.

Changes:

  • Introduces SAM3OpenVINO + SAM3OVVariant and wires them into the public instantlearn.models API.
  • Adds SAM3 OpenVINO scripts for export/conversion/quantization/benchmarking plus end-to-end examples.
  • Updates docs, dependencies (huggingface_hub, quantize extra), and adds unit tests for the OpenVINO model wrapper.

Reviewed changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 11 comments.

File: Description
  • library/src/instantlearn/models/sam3/sam3_openvino.py: New OpenVINO runtime-backed SAM3 model wrapper supporting classic + visual exemplar prompting.
  • library/src/instantlearn/models/sam3/__init__.py: Exposes SAM3OpenVINO / SAM3OVVariant from the SAM3 package.
  • library/src/instantlearn/models/__init__.py: Exposes OpenVINO SAM3 APIs at the top-level models package.
  • library/src/instantlearn/models/sam3/export_openvino.py: Adds ONNX/OpenVINO export wrappers/utilities for the 5-submodel split.
  • library/src/instantlearn/utils/utils.py: Extends device mapping to map XPU → OpenVINO GPU.
  • library/src/instantlearn/utils/__init__.py: Re-exports device_to_openvino_device.
  • library/tests/unit/models/test_sam3_openvino.py: Adds unit tests for initialization, prompt modes, utilities, and model-file discovery.
  • library/examples/sam3_openvino_example.py: Adds runnable examples covering text/box/point/combined prompts + visual exemplar mode.
  • library/src/instantlearn/scripts/sam3/export_sam3_openvino.py: CLI to export SAM3 (PyTorch → ONNX → OpenVINO IR) and validate.
  • library/src/instantlearn/scripts/sam3/convert_sam3_to_openvino.py: CLI to convert a 5-model ONNX split into OpenVINO IR + validate.
  • library/src/instantlearn/scripts/sam3/quantize_sam3_openvino.py: CLI to apply NNCF weight compression (INT8/INT4) to IR models + validate/compare sizes.
  • library/src/instantlearn/scripts/sam3/benchmark_sam3_openvino.py: Benchmark harness across variants/devices/prompt types; exports tables/Excel/charts.
  • library/src/instantlearn/scripts/sam3/__init__.py: Declares the SAM3 scripts subpackage.
  • library/src/instantlearn/scripts/simple_script.py: Adds an ad-hoc inference/visual comparison script.
  • library/README.md: Documents SAM3 OpenVINO usage, variants, and exemplar mode; updates install extras.
  • library/docs/02-quick-start.md: Adds a quick-start section for SAM3 OpenVINO + exemplar mode.
  • library/docs/01-introduction.md: Lists SAM3 and SAM3OpenVINO in supported models/foundations overview.
  • library/pyproject.toml: Adds huggingface_hub dependency and quantize extra; extends full extra.
  • .gitignore: Ignores benchmark/example outputs and a top-level models/ directory.
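The utils.py entry mentions mapping XPU to OpenVINO's GPU device. A minimal sketch of such a mapping, assuming PyTorch-style device strings (`cpu`, `xpu`, `xpu:1`) as input. The function name `to_openvino_device` and its exact rules are hypothetical and may differ from the PR's `device_to_openvino_device`; the OpenVINO device names it returns (`CPU`, `GPU`, `GPU.<n>`, `AUTO`) are standard OpenVINO identifiers.

```python
def to_openvino_device(device: str) -> str:
    """Map a PyTorch-style device string to an OpenVINO device name (hypothetical helper)."""
    name, _, index = device.strip().lower().partition(":")
    if name == "cpu":
        return "CPU"
    if name in {"xpu", "gpu"}:
        # Intel GPUs are exposed as "GPU" in OpenVINO; "GPU.<n>" selects a specific card.
        return f"GPU.{index}" if index else "GPU"
    if name == "auto":
        # Let OpenVINO's AUTO plugin choose the device.
        return "AUTO"
    raise ValueError(f"Unsupported device for OpenVINO: {device!r}")
```

Raising on unknown names (for example `cuda`) surfaces a misconfiguration immediately instead of silently falling back to CPU.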

Comment on lines +757 to +760
category_ids = sample.category_ids
num_visual = max(len(bboxes), len(points))
if num_visual and len(texts) != num_visual:
    texts = ["visual"] * num_visual

Copilot AI Apr 7, 2026

In classic predict(), when categories/category_ids don’t match the number of visual prompts (e.g., multiple bboxes/points but only one category), texts is expanded to num_visual but category_ids is left unchanged. That causes zip_longest(..., category_ids, ...) to yield cat_id=None for extra prompts and later crashes when building pred_labels. Ensure category_ids is also expanded (e.g., repeat 0 or derive from texts) whenever texts is adjusted for visual-only prompts.

Suggested change
-category_ids = sample.category_ids
-num_visual = max(len(bboxes), len(points))
-if num_visual and len(texts) != num_visual:
-    texts = ["visual"] * num_visual
+category_ids = list(sample.category_ids or [])
+num_visual = max(len(bboxes), len(points))
+if num_visual:
+    if len(texts) != num_visual:
+        texts = ["visual"] * num_visual
+    if len(category_ids) != num_visual:
+        default_category_id = category_ids[0] if category_ids else 0
+        category_ids = [default_category_id] * num_visual

Copilot uses AI. Check for mistakes.
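The suggested change can be exercised in isolation. The sketch below lifts it into a standalone function; the name `normalize_visual_prompts` and its signature are hypothetical, and plain lists stand in for the sample's `bboxes`/`points` prompt tensors.

```python
from collections.abc import Sequence
from typing import Optional


def normalize_visual_prompts(
    texts: list[str],
    category_ids: Optional[Sequence[int]],
    bboxes: Sequence[object],
    points: Sequence[object],
) -> tuple[list[str], list[int]]:
    """Make texts and category_ids match the number of visual prompts (mirrors the suggestion)."""
    ids = list(category_ids or [])
    num_visual = max(len(bboxes), len(points))
    if num_visual:
        if len(texts) != num_visual:
            texts = ["visual"] * num_visual
        if len(ids) != num_visual:
            # Reuse the first provided id, or 0 when none was given.
            default_id = ids[0] if ids else 0
            ids = [default_id] * num_visual
    return texts, ids
```

For example, three boxes with a single category id 7 yield `["visual"] * 3` and `[7, 7, 7]`, so a later `zip_longest` over texts and ids never pads with `None`.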
Comment on lines +848 to +851
all_masks.append(result[0]["masks"])
all_boxes.append(boxes_with_scores)
all_labels.append(torch.full((len(result[0]["boxes"]),), cat_id, dtype=torch.int64))

Copilot AI Apr 7, 2026

cat_id can be None here when zip_longest() extends beyond the provided category_ids (e.g., multiple prompts but only one ID). torch.full(..., cat_id, ...) will raise. This should be guarded by normalizing category_ids length to match the prompts before the loop (or by choosing a default label when cat_id is None).
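The `None` comes from `itertools.zip_longest`, which pads shorter iterables with `fillvalue` (default `None`). A self-contained sketch of a per-prompt guard, assuming a hypothetical `DEFAULT_LABEL = 0` fallback (the PR's actual default, if any, may differ); plain lists stand in for the `torch.full` label tensors so the example runs without PyTorch, and `counts` stands in for `len(result[0]["boxes"])` per prompt.

```python
from itertools import zip_longest

DEFAULT_LABEL = 0  # hypothetical fallback label


def build_labels(texts: list[str], category_ids: list[int], counts: list[int]) -> list[list[int]]:
    """Return one label list per prompt, replacing padded None ids with DEFAULT_LABEL."""
    labels = []
    for _text, cat_id, n in zip_longest(texts, category_ids, counts):
        # zip_longest pads missing category ids with None; guard before building labels.
        label = DEFAULT_LABEL if cat_id is None else cat_id
        labels.append([label] * (n or 0))
    return labels
```

In the real code the guarded value would feed `torch.full((n,), label, dtype=torch.int64)`; normalizing `category_ids` before the loop, as in the earlier suggestion, removes the need for this guard altogether.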

Comment on lines +199 to +203
prompt_mode: Sam3PromptMode = Sam3PromptMode.CLASSIC,
drop_spatial_bias: bool = True,
tokenizer_path: str | Path | None = None,
variant: SAM3OVVariant = SAM3OVVariant.FP16,
repo_id: str = _DEFAULT_HF_REPO,

Copilot AI Apr 7, 2026

drop_spatial_bias is accepted and stored, but it never affects which geometry encoder is used (exemplar always routes to geometry-encoder-exemplar). As-is, this parameter is a no-op and can mislead API consumers; either remove it or wire it to select the exemplar vs classic geometry encoder / model set.
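If the parameter is kept rather than removed, one way to make it take effect is to resolve the geometry-encoder submodel name from both settings. This sketch assumes (not verified against the PR) that `geometry-encoder-exemplar` is the export with the spatial bias dropped and that the classic submodel is named `geometry-encoder`; the enum is a hypothetical stand-in for `Sam3PromptMode`.

```python
from enum import Enum


class PromptMode(Enum):  # hypothetical stand-in for Sam3PromptMode
    CLASSIC = "classic"
    VISUAL_EXEMPLAR = "visual_exemplar"


def select_geometry_encoder(prompt_mode: PromptMode, drop_spatial_bias: bool) -> str:
    """Pick the geometry-encoder submodel so that drop_spatial_bias changes behavior."""
    # Assumed mapping: only exemplar mode with the bias dropped uses the exemplar export.
    if prompt_mode is PromptMode.VISUAL_EXEMPLAR and drop_spatial_bias:
        return "geometry-encoder-exemplar"
    return "geometry-encoder"
```

Under this sketch the flag only matters in exemplar mode; documenting that in the docstring would avoid the misleading-parameter issue the comment raises.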

Comment on lines +115 to +119
tic = time.time()
model = SAM3(device=DEVICE)
toc = time.time()
sam3_init_time = toc - tic
print(f"SAM3 initialization time: {sam3_init_time:.2f} seconds")

Copilot AI Apr 7, 2026

This module runs a full benchmark workflow at import time (model creation + inference). Because it lives under instantlearn/scripts, it’s importable and will execute unexpectedly (e.g., during tooling introspection). Wrap the executable code in a main() and guard it with if __name__ == "__main__":, and prefer the project logger over print() for output.
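A minimal restructuring along those lines, assuming the standard-library `logging` module stands in for the project logger; `build_model` and `DEVICE = "cpu"` are hypothetical placeholders for `SAM3(device=DEVICE)` and the script's device setting. `time.perf_counter()` replaces `time.time()` because it is a monotonic, high-resolution clock intended for measuring intervals.

```python
import logging
import time

logger = logging.getLogger(__name__)
DEVICE = "cpu"  # hypothetical default


def build_model(device: str) -> dict:
    """Hypothetical placeholder for SAM3(device=device)."""
    return {"device": device}


def main() -> float:
    """Run the benchmark workflow; nothing executes at import time."""
    tic = time.perf_counter()
    model = build_model(DEVICE)
    init_time = time.perf_counter() - tic
    logger.info("SAM3 initialization time: %.2f seconds (model=%r)", init_time, model)
    return init_time


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    main()
```

Importing the module now only defines functions; the workflow runs solely when the file is executed as a script.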

python scripts/benchmark_sam3_openvino.py

# Auto-download INT8 quantised variant
python scripts/benchmark_sam3_openvino.py --variants openvino-fp16

Copilot AI Apr 7, 2026

The usage text says “Auto-download INT8 quantised variant” but the example command uses --variants openvino-fp16. Update the command to the intended INT8 variant name (or adjust the description) so users can run the documented example successfully.

Suggested change
-python scripts/benchmark_sam3_openvino.py --variants openvino-fp16
+python scripts/benchmark_sam3_openvino.py --variants openvino-int8

Comment on lines +1221 to +1222
msg = "pandas is required to export results. Install it with: pip install pandas openpyxl"
raise ImportError(msg) from exc

Copilot AI Apr 7, 2026

The raised ImportError message instructs users to run pip install ..., which conflicts with the project’s uv-based dependency management. Adjust to uv pip install pandas openpyxl (or point to a uv sync --extra ... group if you add one).
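Both error messages could come from one helper so the install hint stays consistent. A sketch built on `importlib.import_module`; the helper name `require` and its message wording are hypothetical.

```python
import importlib
from types import ModuleType


def require(module: str, *packages: str) -> ModuleType:
    """Import an optional dependency or raise ImportError with uv install guidance."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        pkgs = " ".join(packages or (module,))
        msg = f"{module} is required for this feature. Install it with: uv pip install {pkgs}"
        raise ImportError(msg) from exc
```

Usage would be `pd = require("pandas", "pandas", "openpyxl")`. Note that pandas loads openpyxl lazily when writing `.xlsx` files, so importing pandas alone does not confirm that openpyxl is installed.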

try:
    import pandas as pd  # noqa: PLC0415
except ImportError as exc:
    msg = "pandas and openpyxl are required to save results. Install with: pip install pandas openpyxl"

Copilot AI Apr 7, 2026

Same as above: this error message recommends pip install ... even though the project uses uv. Please switch the guidance to uv pip install ... (or a documented uv sync --extra ... group).

Suggested change
-msg = "pandas and openpyxl are required to save results. Install with: pip install pandas openpyxl"
+msg = "pandas and openpyxl are required to save results. Install with: uv pip install pandas openpyxl"

Comment on lines +409 to +413
parser.add_argument(
    "--method",
    type=str,
    required=True,
    choices=["nncf-int8", "nncf-int4", "all"],

Copilot AI Apr 7, 2026

The module docstring describes multiple quantization methods (q8/q4f16/bnb4/all-usls, etc.), but the CLI only supports nncf-int8, nncf-int4, and all. Either implement the documented methods or trim the docstring/usage text and constants to match the actual CLI surface.
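One way to keep the docstring, help text and `choices` from drifting apart is to derive all of them from a single mapping. A sketch with a hypothetical `QUANT_METHODS` table covering only the two NNCF methods the CLI currently accepts, where `all` expands to every entry; `build_parser` and `resolve_methods` are illustrative names.

```python
import argparse

# Hypothetical single source of truth: method name -> short description.
QUANT_METHODS = {
    "nncf-int8": "NNCF INT8 weight compression",
    "nncf-int4": "NNCF INT4 weight compression",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Quantize SAM3 OpenVINO IR models.")
    methods_help = ", ".join(f"{name} ({desc})" for name, desc in QUANT_METHODS.items())
    parser.add_argument(
        "--method",
        type=str,
        required=True,
        choices=[*QUANT_METHODS, "all"],
        help=f"Quantization method: {methods_help}, or 'all'.",
    )
    return parser


def resolve_methods(method: str) -> list[str]:
    """Expand 'all' into every registered method."""
    return list(QUANT_METHODS) if method == "all" else [method]
```

Adding a method then means adding one dictionary entry, and the help output lists it automatically.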

def main() -> None:
    """CLI entry point for SAM3 PyTorch → ONNX → OpenVINO export."""
    parser = argparse.ArgumentParser(
        description="Export SAM3 PyTorch model to OpenVINO IR via ONNX (4-model split).",

Copilot AI Apr 7, 2026

The CLI description says “(4-model split)”, but this script exports 5 models (includes geometry-encoder-exemplar). Update the description string to avoid confusion when users look for the expected outputs.

Suggested change
-description="Export SAM3 PyTorch model to OpenVINO IR via ONNX (4-model split).",
+description="Export SAM3 PyTorch model to OpenVINO IR via ONNX (5-model split).",

Comment on lines +35 to +39
side_effect=lambda k: {
    "fpn_feat_0": _RNG.standard_normal((1, 256, 288, 288)).astype(np.float32),
    "fpn_feat_1": _RNG.standard_normal((1, 256, 144, 144)).astype(np.float32),
    "fpn_feat_2": _RNG.standard_normal((1, 256, 72, 72)).astype(np.float32),
    "fpn_pos_2": _RNG.standard_normal((1, 256, 72, 72)).astype(np.float32),

Copilot AI Apr 7, 2026

These mocks allocate very large feature maps (e.g., 1×256×288×288), which can make the unit test suite slow and memory-hungry even though the values aren’t asserted. Consider shrinking the dummy tensor shapes to the minimum needed for control-flow validation (or using smaller placeholder arrays) to keep CI stable.
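For scale: one 1×256×288×288 float32 array holds 256 × 288 × 288 = 21,233,664 values, i.e. 84,934,656 bytes (81 MiB). A sketch of a smaller fixture, assuming the code under test only depends on the keys, rank, dtype, channel count and the 2× ratio between pyramid levels, not on the real 288/144/72 sizes. The `_SHAPES` table, its sizes, and `fake_output` are hypothetical. Looking up one shape per call also matters: if the lambda builds the whole dict literal and then indexes it, every call allocates all four arrays.

```python
from unittest.mock import MagicMock

import numpy as np

_RNG = np.random.default_rng(0)

# Hypothetical reduced shapes: same keys, rank, channels and 2x ratios between levels.
_SHAPES = {
    "fpn_feat_0": (1, 256, 16, 16),
    "fpn_feat_1": (1, 256, 8, 8),
    "fpn_feat_2": (1, 256, 4, 4),
    "fpn_pos_2": (1, 256, 4, 4),
}


def fake_output(key: str) -> np.ndarray:
    """Allocate only the requested tensor, on demand."""
    return _RNG.standard_normal(_SHAPES[key]).astype(np.float32)


mock_fn = MagicMock(side_effect=fake_output)
```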

@Daankrol left a comment

Nice PR! I have some general remarks and some styling issues. Some tests are failing, and the GitHub Copilot reviewer left some good comments. Could you also add a PR description and some benchmark information? I'm especially interested in the performance difference between the FP and INT variants on XPU.

def __init__(
    self,
    model_dir: str | Path | None = None,
    device: str = "CPU",

I think it might be best to use AUTO here, so that the best available device (such as an XPU) is selected automatically when present.

device: str = "CPU",
confidence_threshold: float = 0.5,
resolution: int = 1008,
prompt_mode: Sam3PromptMode = Sam3PromptMode.CLASSIC,

It might be better to use VISUAL_EXEMPLAR, since we use it as the default throughout the library.

prompt_mode: Sam3PromptMode = Sam3PromptMode.CLASSIC,
drop_spatial_bias: bool = True,
tokenizer_path: str | Path | None = None,
variant: SAM3OVVariant = SAM3OVVariant.FP16,

How good is the performance of the INT8 variant? If it does not show large accuracy degradation, I would advise using it as the default.
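One way to answer this quantitatively is to run the FP16 and INT8 variants on the same images and prompts and measure how closely their masks agree. A minimal sketch of that metric, assuming both variants return boolean masks of identical shape for already-matched detections (pairing detections across variants is out of scope); `mask_iou` is a hypothetical helper.

```python
import numpy as np


def mask_iou(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Intersection-over-union of two binary masks; 1.0 when both are empty."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    ref = reference.astype(bool)
    cand = candidate.astype(bool)
    union = np.logical_or(ref, cand).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(ref, cand).sum() / union)
```

Averaging this IoU over a validation set, with FP16 as the reference, gives a single agreement number per quantized variant to weigh against its speed-up.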

- Clarified README installation instructions for quantization tools.
- Deleted obsolete simple_script.py as it is no longer needed.
- Adjusted model split description in export_openvino.py from 4 to 5 models for accuracy.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants