refactor: centralize model metadata into MODEL_REGISTRY and replace SAMModelName enum with string IDs#607
Conversation
…s, centralize model registry - Removed SAMModelName enum and replaced it with string IDs for SAM models across the codebase. - Introduced a centralized model registry to manage model metadata, including weights URLs and configurations. - Updated model loading and validation functions to utilize the new registry. - Adjusted argument parsing to accept model IDs instead of enum values. - Modified relevant classes and functions to accommodate the new model handling approach.
MODEL_REGISTRY and replace SAMModelName enum with string IDs
There was a problem hiding this comment.
Pull request overview
This PR refactors model metadata management by introducing a centralized model registry (MODEL_REGISTRY) as a single source of truth, replacing scattered enum-based model identification with string-based model IDs. The changes eliminate duplicate model definitions across the codebase, simplify the API by removing enum dependencies, and standardize model ID formatting with hyphens (e.g., "sam-hq-tiny", "dinov3-large").
Key changes:
- Introduced
model_registry.pywithModelMetadatadataclass and registry-based validation functions - Replaced
SAMModelNameenum with string model IDs throughout the codebase - Updated model classes (
Matcher,PerDino,SoftMatcher,GroundedSAM) to validate against the registry
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
utils/model_registry.py |
New centralized registry with model metadata and helper functions |
utils/constants.py |
Removed SAMModelName enum and MODEL_MAP dictionary |
utils/benchmark.py |
Updated parameter names from backbone to sam_model_id |
utils/args.py |
Updated argument parsing to use registry-based model lists |
scripts/benchmark.py |
Renamed variables for clarity and updated to use string model IDs |
models/soft_matcher.py |
Updated to use Matcher defaults and string model IDs |
models/per_dino.py |
Added registry validation and default model constants |
models/matcher/matcher.py |
Added registry validation and default model constants |
models/matcher/inference.py |
Updated to use string model IDs |
models/grounded_sam.py |
Added registry validation and default model constant |
models/base.py |
Added _validate_model static method for registry validation |
components/sam/pytorch.py |
Updated to use registry for model loading and validation |
components/sam/openvino.py |
Updated parameter names and docstrings |
components/sam/base.py |
Updated to use registry for model validation |
components/encoders/timm.py |
Removed local AVAILABLE_IMAGE_ENCODERS, uses registry |
components/encoders/huggingface.py |
Removed local AVAILABLE_IMAGE_ENCODERS, uses registry |
components/encoders/base.py |
Updated docstrings and examples |
components/encoders/__init__.py |
Removed AVAILABLE_IMAGE_ENCODERS exports |
.github/workflows/library.yml |
Fixed test execution to fail on errors |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
if we model types were Something like? def get_model(model_type: str | ModelType = "sam-hq-tiny"):
model_type = ModelType(model_type) # to convert str to Enum |
|
Does |
@samet-akcay Your example mixes two concepts.
class ModelID(StrEnum):
SAM_HQ_TINY = "sam-hq-tiny"
DINOV2_BASE = "dinov2-base"
# ...For querying by model type, there's get_models_by_type("encoder")
get_models_by_type(ModelType.ENCODER)No explicit conversion needed— |
Yes, when adding a new model to the system, you add an entry to The registry serves the backend REST API—when a client sends a GET request, the backend returns available models from this list. It's designed for Geti Prompt App users (via the UI), not for library users who can instantiate models directly. For library users, the registry is optional—they can use models without it. But for the app, it acts as the single source of truth for what's exposed to the frontend. |
There was a problem hiding this comment.
is registry really a main component to be just under getiprompt?
| from getiprompt.components.sam.openvino import OpenVINOSAMPredictor | ||
| from getiprompt.components.sam.pytorch import PyTorchSAMPredictor | ||
| from getiprompt.utils.constants import MODEL_MAP, Backend, SAMModelName | ||
| from getiprompt.registry import ModelType, get_model, get_models_by_type |
There was a problem hiding this comment.
I feel this should be getiprompt.models.registry, no?
There was a problem hiding this comment.
somewhat duplicated, but more complete..
to improve the dx, we could offload get model capabilities to get_model function? get_model_by_type could still be used under the hood if needed?
# get model by id
get_model(id="...")
# get model by type
get_model(type="")| self, | ||
| model_folder: str | Path, | ||
| sam: SAMModelName = SAMModelName.SAM_HQ_TINY, | ||
| sam: str = Matcher.DEFAULT_SAM, |
There was a problem hiding this comment.
Annotation is str, but default value is a StrEnum, which is a bit hard to follow without navigating to Matcher.DEFAULT_SAM`.
| num_foreground_points: int = 40, | ||
| num_background_points: int = 2, | ||
| encoder_model: str = "dinov3_large", | ||
| encoder_model: str = DEFAULT_ENCODER, |
There was a problem hiding this comment.
I would prefer the previous one, as it is more explicit. It's hard to tell what default encoder is ?
Pull Request
Description
Centralize model metadata into a single source of truth (
MODEL_REGISTRY) and replace scattered enum-based model identification with string-based model IDs. This refactoring eliminates duplicate model definitions, simplifies the API, and provides a foundation for the/models/supportedAPI endpoint.Key changes:
model_registry.pywithModelMetadatadataclass containing all model information (id, type, family, weights_url, sha_sum, etc.)SAMModelNameenum with string model IDs (e.g.,"sam-hq-tiny","sam2-base")AVAILABLE_IMAGE_ENCODERSdicts with registry-based validationMODEL_MAPfrom constants.py (now in registry)Type of Change
feat- New featurefix- Bug fixdocs- Documentationrefactor- Code refactoringtest- Testschore- MaintenanceRelated Issues
Breaking Changes
Yes - API signature changes:
Matcher.__init__sam: SAMModelName = SAMModelName.SAM_HQ_TINYsam: str = "sam-hq-tiny"PerDino.__init__sam: SAMModelName = SAMModelName.SAM_HQ_TINYsam: str = "sam-hq-tiny"SoftMatcher.__init__sam: SAMModelName = SAMModelName.SAM_HQ_TINYsam: str = "sam-hq-tiny"GroundedSAM.__init__sam: SAMModelName = SAMModelName.SAM_HQ_TINYsam: str = "sam-hq-tiny"SAMPredictor.__init__sam_model_name: SAMModelNamemodel_id: strImageEncoder.__init__model_id: str(underscore format)model_id: str(hyphen format)Migration:
Examples
New Model Registry Structure
Model ID Format
All model IDs now use hyphenated format for consistency:
sam-hq,sam-hq-tiny,sam2-tiny,sam2-small,sam2-base,sam2-largedinov2-small,dinov2-base,dinov2-large,dinov2-giantdinov3-small,dinov3-small-plus,dinov3-base,dinov3-large,dinov3-hugeFiles Changed
utils/model_registry.pyModelMetadatadataclassutils/constants.pySAMModelNameenum andMODEL_MAPmodels/matcher/matcher.pyDEFAULT_SAM = "sam-hq-tiny"models/per_dino.pymodels/soft_matcher.pyMatcher.DEFAULT_SAM/DEFAULT_ENCODERmodels/grounded_sam.pycomponents/sam/*.pySAMPredictorusesmodel_id: strcomponents/encoders/*.pyAVAILABLE_IMAGE_ENCODERSscripts/benchmark.pybackbone→sam_modelfor claritytests/**/*.py