
refactor: centralize model metadata into MODEL_REGISTRY and replace SAMModelName enum with string IDs #607

Open
eugene123tw wants to merge 12 commits into open-edge-platform:main from eugene123tw:feature/model-registry

Conversation

@eugene123tw
Contributor

Pull Request

Description

Centralize model metadata into a single source of truth (MODEL_REGISTRY) and replace scattered enum-based model identification with string-based model IDs. This refactoring eliminates duplicate model definitions, simplifies the API, and provides a foundation for the /models/supported API endpoint.

Key changes:

  • Created model_registry.py with ModelMetadata dataclass containing all model information (id, type, family, weights_url, sha_sum, etc.)
  • Replaced SAMModelName enum with string model IDs (e.g., "sam-hq-tiny", "sam2-base")
  • Replaced AVAILABLE_IMAGE_ENCODERS dicts with registry-based validation
  • Removed MODEL_MAP from constants.py (now in registry)
  • Updated all model classes to use string IDs with registry validation

Type of Change

  • feat - New feature
  • 🐞 fix - Bug fix
  • 📚 docs - Documentation
  • ♻️ refactor - Code refactoring
  • 🧪 test - Tests
  • 🔧 chore - Maintenance

Related Issues

Breaking Changes

Yes - API signature changes:

| Component | Before | After |
| --- | --- | --- |
| Matcher.__init__ | sam: SAMModelName = SAMModelName.SAM_HQ_TINY | sam: str = "sam-hq-tiny" |
| PerDino.__init__ | sam: SAMModelName = SAMModelName.SAM_HQ_TINY | sam: str = "sam-hq-tiny" |
| SoftMatcher.__init__ | sam: SAMModelName = SAMModelName.SAM_HQ_TINY | sam: str = "sam-hq-tiny" |
| GroundedSAM.__init__ | sam: SAMModelName = SAMModelName.SAM_HQ_TINY | sam: str = "sam-hq-tiny" |
| SAMPredictor.__init__ | sam_model_name: SAMModelName | model_id: str |
| ImageEncoder.__init__ | model_id: str (underscore format) | model_id: str (hyphen format) |

Migration:

# Before
from getiprompt.utils.constants import SAMModelName
model = Matcher(sam=SAMModelName.SAM_HQ_TINY, encoder_model="dinov3_small")

# After
model = Matcher(sam="sam-hq-tiny", encoder_model="dinov3-small")
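
For code bases with many legacy call sites, the rename could be sketched as a small shim. This is only an illustration: to_model_id is a hypothetical helper, not part of getiprompt, and it assumes every legacy name maps to the new ID by lowercasing and swapping underscores for hyphens. That holds for the two names in the migration example above but is not confirmed for every model.

```python
def to_model_id(legacy_name: str) -> str:
    """Convert a legacy name to the hyphenated ID format.

    Hypothetical helper for illustration only; not part of getiprompt.
    Assumes the mapping is "lowercase, then underscores to hyphens".
    """
    return legacy_name.strip().lower().replace("_", "-")


# Enum member name and old encoder name from the migration example.
sam_id = to_model_id("SAM_HQ_TINY")
encoder_id = to_model_id("dinov3_small")
print(sam_id, encoder_id)
```

IDs that are already hyphenated pass through unchanged, so the shim can be applied to mixed input.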

Examples

New Model Registry Structure

from getiprompt.utils.model_registry import (
    MODEL_REGISTRY, ModelMetadata, ModelType,
    get_model, get_models_by_type
)

# Get a specific model
sam_model = get_model("sam-hq-tiny")
# ModelMetadata(id='sam-hq-tiny', type=<ModelType.SEGMENTER>, family='SAM-HQ', ...)

# Get all segmenters
segmenters = get_models_by_type(ModelType.SEGMENTER)
# [sam-hq, sam-hq-tiny, sam2-tiny, sam2-small, sam2-base, sam2-large]

# Get all encoders
encoders = get_models_by_type(ModelType.ENCODER)
# [dinov2-small, dinov2-base, ..., dinov3-small, dinov3-large, ...]
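
To make the shape of the registry concrete, here is a minimal standalone sketch of how such a module could be laid out. It is not the code from this PR: the field subset, the "segmenter" enum value, the SAM2 and DINOv3 family names, the placeholder URLs, and the choice to raise ValueError for unknown IDs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ModelType(str, Enum):
    """Model categories; the str mixin lets members compare equal to plain strings."""

    SEGMENTER = "segmenter"  # assumed value
    ENCODER = "encoder"


@dataclass(frozen=True)
class ModelMetadata:
    id: str
    type: ModelType
    family: str
    weights_url: str  # placeholder values below, not real URLs


# Static declaration: adding a model means adding an entry here.
MODEL_REGISTRY = (
    ModelMetadata("sam-hq-tiny", ModelType.SEGMENTER, "SAM-HQ", "https://example.invalid/sam-hq-tiny"),
    ModelMetadata("sam2-base", ModelType.SEGMENTER, "SAM2", "https://example.invalid/sam2-base"),
    ModelMetadata("dinov3-small", ModelType.ENCODER, "DINOv3", "https://example.invalid/dinov3-small"),
)


def get_model(model_id: str) -> ModelMetadata:
    """Return the metadata entry for one model ID (raising is an assumed behavior)."""
    for model in MODEL_REGISTRY:
        if model.id == model_id:
            return model
    known = ", ".join(m.id for m in MODEL_REGISTRY)
    raise ValueError(f"Unknown model ID {model_id!r}. Known IDs: {known}")


def get_models_by_type(model_type) -> list:
    """Return all entries of a type; accepts a ModelType member or its string value."""
    return [m for m in MODEL_REGISTRY if m.type == model_type]
```

Because every lookup goes through the same tuple, adding a model is a one-line change that all callers pick up.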

Model ID Format

All model IDs now use hyphenated format for consistency:

  • SAM models: sam-hq, sam-hq-tiny, sam2-tiny, sam2-small, sam2-base, sam2-large
  • DINOv2 encoders: dinov2-small, dinov2-base, dinov2-large, dinov2-giant
  • DINOv3 encoders: dinov3-small, dinov3-small-plus, dinov3-base, dinov3-large, dinov3-huge
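
A validation step built on these ID lists might look like the sketch below. validate_model_id is a hypothetical name, and the hint for underscore-style IDs is an assumption about what a helpful error could say; the PR's actual validation code is not shown in this conversation.

```python
SEGMENTER_IDS = frozenset(
    {"sam-hq", "sam-hq-tiny", "sam2-tiny", "sam2-small", "sam2-base", "sam2-large"}
)
ENCODER_IDS = frozenset(
    {
        "dinov2-small", "dinov2-base", "dinov2-large", "dinov2-giant",
        "dinov3-small", "dinov3-small-plus", "dinov3-base", "dinov3-large", "dinov3-huge",
    }
)


def validate_model_id(model_id: str, allowed: frozenset, role: str) -> str:
    """Return model_id if it is allowed for the given role, else raise ValueError.

    Hypothetical helper; the role string is only used in error messages.
    """
    if model_id in allowed:
        return model_id
    hyphenated = model_id.replace("_", "-")
    if hyphenated in allowed:
        # The caller is probably still using the pre-refactor underscore format.
        raise ValueError(
            f"{role} ID {model_id!r} uses the old underscore format; use {hyphenated!r}."
        )
    options = ", ".join(sorted(allowed))
    raise ValueError(f"Unknown {role} ID {model_id!r}. Expected one of: {options}")


validate_model_id("sam-hq-tiny", SEGMENTER_IDS, "segmenter")  # passes
```

Passing an encoder ID where a segmenter is expected fails the same way as a typo, which keeps constructor errors early and explicit.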

Files Changed

| File | Change |
| --- | --- |
| utils/model_registry.py | NEW - Central registry with ModelMetadata dataclass |
| utils/constants.py | Removed SAMModelName enum and MODEL_MAP |
| models/matcher/matcher.py | Uses string IDs, DEFAULT_SAM = "sam-hq-tiny" |
| models/per_dino.py | Uses string IDs with registry validation |
| models/soft_matcher.py | Uses Matcher.DEFAULT_SAM/DEFAULT_ENCODER |
| models/grounded_sam.py | Uses string IDs with registry validation |
| components/sam/*.py | SAMPredictor uses model_id: str |
| components/encoders/*.py | Registry-based validation, removed AVAILABLE_IMAGE_ENCODERS |
| scripts/benchmark.py | Renamed backbone to sam_model for clarity |
| tests/**/*.py | Updated to use string IDs |

…s, centralize model registry

- Removed SAMModelName enum and replaced it with string IDs for SAM models across the codebase.
- Introduced a centralized model registry to manage model metadata, including weights URLs and configurations.
- Updated model loading and validation functions to utilize the new registry.
- Adjusted argument parsing to accept model IDs instead of enum values.
- Modified relevant classes and functions to accommodate the new model handling approach.
Copilot AI review requested due to automatic review settings December 23, 2025 14:33
@eugene123tw eugene123tw changed the title from "Feature/model registry" to "refactor: centralize model metadata into MODEL_REGISTRY and replace SAMModelName enum with string IDs" Dec 23, 2025
@eugene123tw eugene123tw changed the title refactor: centralize model metadata into MODEL_REGISTRY and replace SAMModelName enum with string IDs refactor: centralize model metadata into MODEL_REGISTRY and replace SAMModelName enum with string IDs Dec 23, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors model metadata management by introducing a centralized model registry (MODEL_REGISTRY) as a single source of truth, replacing scattered enum-based model identification with string-based model IDs. The changes eliminate duplicate model definitions across the codebase, simplify the API by removing enum dependencies, and standardize model ID formatting with hyphens (e.g., "sam-hq-tiny", "dinov3-large").

Key changes:

  • Introduced model_registry.py with ModelMetadata dataclass and registry-based validation functions
  • Replaced SAMModelName enum with string model IDs throughout the codebase
  • Updated model classes (Matcher, PerDino, SoftMatcher, GroundedSAM) to validate against the registry

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| utils/model_registry.py | New centralized registry with model metadata and helper functions |
| utils/constants.py | Removed SAMModelName enum and MODEL_MAP dictionary |
| utils/benchmark.py | Updated parameter names from backbone to sam_model_id |
| utils/args.py | Updated argument parsing to use registry-based model lists |
| scripts/benchmark.py | Renamed variables for clarity and updated to use string model IDs |
| models/soft_matcher.py | Updated to use Matcher defaults and string model IDs |
| models/per_dino.py | Added registry validation and default model constants |
| models/matcher/matcher.py | Added registry validation and default model constants |
| models/matcher/inference.py | Updated to use string model IDs |
| models/grounded_sam.py | Added registry validation and default model constant |
| models/base.py | Added _validate_model static method for registry validation |
| components/sam/pytorch.py | Updated to use registry for model loading and validation |
| components/sam/openvino.py | Updated parameter names and docstrings |
| components/sam/base.py | Updated to use registry for model validation |
| components/encoders/timm.py | Removed local AVAILABLE_IMAGE_ENCODERS, uses registry |
| components/encoders/huggingface.py | Removed local AVAILABLE_IMAGE_ENCODERS, uses registry |
| components/encoders/base.py | Updated docstrings and examples |
| components/encoders/__init__.py | Removed AVAILABLE_IMAGE_ENCODERS exports |
| .github/workflows/library.yml | Fixed test execution to fail on errors |

eugene123tw and others added 5 commits December 23, 2025 14:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@mpryahin mpryahin left a comment

Thanks, Eugene! 🙌

@samet-akcay
Contributor

samet-akcay commented Dec 23, 2025

If the model types were StrEnum instead of Enum, sam="sam-hq-tiny" would still be possible, no?

Something like?

def get_model(model_type: str | ModelType = "sam-hq-tiny"):
    model_type = ModelType(model_type)    # to convert str to Enum

@samet-akcay
Contributor

Does MODEL_REGISTRY mean that each model is to be registered when it is first implemented?

@eugene123tw
Contributor Author

eugene123tw commented Jan 2, 2026

If the model types were StrEnum instead of Enum, sam="sam-hq-tiny" would still be possible, no?

Something like?

def get_model(model_type: str | ModelType = "sam-hq-tiny"):
    model_type = ModelType(model_type)    # to convert str to Enum

@samet-akcay Your example mixes two concepts. sam-hq-tiny is a model ID, not a model type, so ModelType("sam-hq-tiny") would fail.

get_model(model_id: str) retrieves metadata for a specific model by its ID. Were you suggesting we add a ModelID enum? Something like:

class ModelID(StrEnum):
    SAM_HQ_TINY = "sam-hq-tiny"
    DINOV2_BASE = "dinov2-base"
    # ...

For querying by model type, there's get_models_by_type(model_type: ModelType) which returns all models matching that type. Since ModelType is already a StrEnum, both of these work:

get_models_by_type("encoder")
get_models_by_type(ModelType.ENCODER)

No explicit conversion needed—StrEnum members inherit from str, so m.type == "encoder" evaluates to True.
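
The two points in this reply can be checked with a small standalone script. The ModelType below is a stand-in defined only for this example, with an assumed "segmenter" value, and the fallback class is there so the snippet also runs on interpreters older than Python 3.11, where enum.StrEnum does not exist.

```python
from enum import Enum

try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:
    class StrEnum(str, Enum):
        """Fallback for older interpreters; string equality behaves the same."""


class ModelType(StrEnum):
    SEGMENTER = "segmenter"  # assumed value
    ENCODER = "encoder"


# Members compare equal to their string values, so no conversion is needed.
matches_plain_string = ModelType.ENCODER == "encoder"

# Looking up by value returns the existing member.
same_member = ModelType("encoder") is ModelType.ENCODER

# A model ID is not a model type, so this lookup is rejected.
try:
    ModelType("sam-hq-tiny")
    id_accepted_as_type = True
except ValueError:
    id_accepted_as_type = False

print(matches_plain_string, same_member, id_accepted_as_type)
```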

@eugene123tw
Contributor Author

Does MODEL_REGISTRY mean that each model is to be registered when it is first implemented?

@samet-akcay

Yes, when adding a new model to the system, you add an entry to MODEL_REGISTRY. It's a static declaration, not runtime registration.

The registry serves the backend REST API—when a client sends a GET request, the backend returns available models from this list. It's designed for Geti Prompt App users (via the UI), not for library users who can instantiate models directly.

For library users, the registry is optional—they can use models without it. But for the app, it acts as the single source of truth for what's exposed to the frontend.
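
As a rough sketch of how a backend could turn the static registry into a response body, the snippet below serializes a toy registry to JSON. The payload shape, the supported_models_payload name, and the reduced field set are assumptions for illustration; the real endpoint and its schema are not shown in this conversation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelMetadata:
    id: str
    type: str
    family: str


# Toy registry with two entries; family names other than SAM-HQ are assumed.
MODEL_REGISTRY = (
    ModelMetadata(id="sam-hq-tiny", type="segmenter", family="SAM-HQ"),
    ModelMetadata(id="dinov3-small", type="encoder", family="DINOv3"),
)


def supported_models_payload() -> str:
    """Build a JSON body listing every registered model (hypothetical shape)."""
    return json.dumps({"models": [asdict(m) for m in MODEL_REGISTRY]})


print(supported_models_payload())
```

Since the payload is derived from the registry, the frontend list cannot drift from what the backend declares.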

mpryahin previously approved these changes Jan 5, 2026
Contributor

Is the registry really a core component that should sit directly under getiprompt?

  from getiprompt.components.sam.openvino import OpenVINOSAMPredictor
  from getiprompt.components.sam.pytorch import PyTorchSAMPredictor
- from getiprompt.utils.constants import MODEL_MAP, Backend, SAMModelName
+ from getiprompt.registry import ModelType, get_model, get_models_by_type
Contributor

I feel this should be getiprompt.models.registry, no?

Contributor

Somewhat duplicated, but more complete.

To improve the developer experience (DX), could we fold both lookups into the get_model function? get_models_by_type could still be used under the hood if needed.

# get model by id 
get_model(id="...")

# get model by type
get_model(type="")
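
One way the suggested single entry point could look is sketched below. This is an illustration of the reviewer's idea, not code from the PR: the toy registry, the keyword-only signature, and the error handling are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelMetadata:
    id: str
    type: str


MODEL_REGISTRY = (
    ModelMetadata(id="sam-hq-tiny", type="segmenter"),
    ModelMetadata(id="sam2-base", type="segmenter"),
    ModelMetadata(id="dinov3-small", type="encoder"),
)


def get_models_by_type(model_type: str) -> list:
    return [m for m in MODEL_REGISTRY if m.type == model_type]


# The keyword names mirror the suggestion above; they shadow the built-ins
# id and type inside this function only.
def get_model(*, id: Optional[str] = None, type: Optional[str] = None):
    """Look up one model by ID, or all models of a type (exactly one keyword)."""
    if (id is None) == (type is None):
        raise TypeError("Pass exactly one of id= or type=")
    if type is not None:
        return get_models_by_type(type)  # delegated "under the hood"
    for model in MODEL_REGISTRY:
        if model.id == id:
            return model
    raise ValueError(f"Unknown model ID {id!r}")
```

A trade-off to weigh: the return value is a single entry for id= but a list for type=, which is harder to type-annotate than two separate functions.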

  self,
  model_folder: str | Path,
- sam: SAMModelName = SAMModelName.SAM_HQ_TINY,
+ sam: str = Matcher.DEFAULT_SAM,
Contributor

The annotation is str, but the default value is a StrEnum, which is a bit hard to follow without navigating to Matcher.DEFAULT_SAM.

  num_foreground_points: int = 40,
  num_background_points: int = 2,
- encoder_model: str = "dinov3_large",
+ encoder_model: str = DEFAULT_ENCODER,
Contributor

I would prefer the previous one, as it is more explicit. It's hard to tell what the default encoder is.
