"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986
"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986sun-9545sunoj wants to merge 5 commits intosktime:mainfrom
Conversation
…d update contributors
There was a problem hiding this comment.
Pull request overview
Implements lazy, programmatic lookup for MTYPE_SOFT_DEPS and SCITYPE_* registries to break circular imports between _registry.py and _check.py, and expands mtype registry tuples to carry soft dependency metadata.
Changes:
- Expanded
MTYPE_REGISTER_*entries from 3-tuples to 4-tuples to include optionalpython_dependencies. - Added PEP 562
__getattr__-based lazy generation + caching forMTYPE_SOFT_DEPS,SCITYPE_REGISTER, andSCITYPE_LIST, with a reentrancy guard. - Updated
_check.scitypeto avoid importingSCITYPE_LISTat module import time.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| skpro/datatypes/_registry.py | Adds lazy registry lookup via __getattr__, cache, and reentrancy guard; updates scitype/mtype logic to use dynamic registries. |
| skpro/datatypes/_check.py | Avoids eager import of SCITYPE_LIST; lazily imports it when needed in scitype(). |
| skpro/datatypes/_table/_registry.py | Extends Table mtype registry tuples to include soft dependency metadata (e.g., polars). |
| skpro/datatypes/_proba/_registry.py | Extends Proba mtype registry tuples to the new 4-element format. |
| .all-contributorsrc | Adds contributor entry and minor JSON formatting adjustment. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…d update contributors
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for mtype_tuple in MTYPE_REGISTER: | ||
| mtype = mtype_tuple[0] | ||
| scitype = mtype_tuple[1] | ||
| scitypes.add(scitype) | ||
| if len(mtype_tuple) >= 4 and mtype_tuple[3] is not None: | ||
| deps = mtype_tuple[3] | ||
| soft_deps[mtype] = ( | ||
| list(deps) if isinstance(deps, tuple) else deps | ||
| ) |
There was a problem hiding this comment.
The loop over MTYPE_REGISTER overwrites any previously-collected entry in soft_deps for the same mtype. This is problematic for polars mtypes: checker tags declare python_dependencies as ["polars", "pyarrow"], but the table registry currently supplies only "polars", so this code path will drop pyarrow and make softdeps="present" behave incorrectly when polars is installed but pyarrow is not. Prefer not overwriting an existing soft_deps[mtype] (or merge/union), or ensure the registry table lists the full dependency set.
skpro/datatypes/_table/_registry.py
Outdated
| ("polars_eager_table", "Table", "polars.DataFrame representation of a data table", "polars"), | ||
| ("polars_lazy_table", "Table", "polars.LazyFrame representation of a data table", "polars"), |
There was a problem hiding this comment.
polars_eager_table/polars_lazy_table are declared here with soft dependencies set to "polars", but the actual datatype checker tags and conversion guards require both polars and pyarrow (see skpro/datatypes/_table/_check.py and _table/_convert.py). This mismatch will cause soft-dependency filtering to report these mtypes as available when only polars is installed. Update the registry entries to include the full dependency set (e.g., ['polars', 'pyarrow']) to keep registry metadata consistent with runtime requirements.
| ("polars_eager_table", "Table", "polars.DataFrame representation of a data table", "polars"), | |
| ("polars_lazy_table", "Table", "polars.LazyFrame representation of a data table", "polars"), | |
| ("polars_eager_table", "Table", "polars.DataFrame representation of a data table", ["polars", "pyarrow"]), | |
| ("polars_lazy_table", "Table", "polars.LazyFrame representation of a data table", ["polars", "pyarrow"]), |
| if name in ["MTYPE_SOFT_DEPS", "SCITYPE_REGISTER", "SCITYPE_LIST"]: | ||
| # Pre-seed _CACHE to prevent infinite recursion during generation | ||
| # Re-entrant calls (e.g., from inspect) will receive these references | ||
| _CACHE["MTYPE_SOFT_DEPS"] = {} | ||
| _CACHE["SCITYPE_REGISTER"] = [] | ||
| _CACHE["SCITYPE_LIST"] = [] | ||
|
|
||
| try: | ||
| from skpro.datatypes._check import get_check_dict | ||
|
|
||
| check_dict = get_check_dict(soft_deps="all") | ||
|
|
||
| soft_deps = {} | ||
| scitypes = set() | ||
|
|
||
| for k, cls in check_dict.items(): | ||
| if hasattr(cls, "get_class_tag"): | ||
| mtype = cls.get_class_tag("name") | ||
| scitype = cls.get_class_tag("scitype") | ||
| deps = cls.get_class_tag("python_dependencies", None) | ||
| else: | ||
| mtype = k[0] | ||
| scitype = k[1] | ||
| deps = None | ||
|
|
||
| if deps is not None: | ||
| soft_deps[mtype] = ( | ||
| list(deps) if isinstance(deps, tuple) else deps | ||
| ) | ||
| if scitype is not None: | ||
| scitypes.add(scitype) | ||
|
|
||
| for mtype_tuple in MTYPE_REGISTER: | ||
| mtype = mtype_tuple[0] | ||
| scitype = mtype_tuple[1] | ||
| scitypes.add(scitype) | ||
| if len(mtype_tuple) >= 4 and mtype_tuple[3] is not None: | ||
| deps = mtype_tuple[3] | ||
| soft_deps[mtype] = ( | ||
| list(deps) if isinstance(deps, tuple) else deps | ||
| ) | ||
|
|
There was a problem hiding this comment.
This new dynamic registry path is not currently covered by a targeted test: in particular, there’s no assertion that MTYPE_SOFT_DEPS only contains optional dependency mtypes and that the dependency lists are complete (e.g., polars requires both polars and pyarrow). Since _registry.py already has lookup tests, consider adding a regression test that validates the expected contents of MTYPE_SOFT_DEPS and the behavior of scitype_to_mtype(..., softdeps='exclude'|'present') against a known baseline.
skpro/datatypes/_registry.py
Outdated
| mtype = cls.get_class_tag("name") | ||
| scitype = cls.get_class_tag("scitype") | ||
| deps = cls.get_class_tag("python_dependencies", None) | ||
| else: | ||
| mtype = k[0] | ||
| scitype = k[1] | ||
| deps = None | ||
|
|
||
| if deps is not None: | ||
| soft_deps[mtype] = ( | ||
| list(deps) if isinstance(deps, tuple) else deps | ||
| ) |
There was a problem hiding this comment.
_get_registry currently treats any python_dependencies tag found on datatype checker classes as a soft dependency (adds it to MTYPE_SOFT_DEPS). In this codebase many core mtypes have python_dependencies set to required packages like numpy/pandas (see skpro/datatypes/_table/_check.py), so this will incorrectly mark core mtypes as requiring soft deps and change the behavior of scitype_to_mtype(..., softdeps="exclude"|"present"). Consider restricting this extraction to true optional deps only (e.g., only when the tag is a list/tuple of extras), or exclusively derive MTYPE_SOFT_DEPS from the 4th element of MTYPE_REGISTER (now that the registry tables carry that information) and avoid using checker class tags for this purpose.
| mtype = cls.get_class_tag("name") | |
| scitype = cls.get_class_tag("scitype") | |
| deps = cls.get_class_tag("python_dependencies", None) | |
| else: | |
| mtype = k[0] | |
| scitype = k[1] | |
| deps = None | |
| if deps is not None: | |
| soft_deps[mtype] = ( | |
| list(deps) if isinstance(deps, tuple) else deps | |
| ) | |
| # obtain scitype from checker class metadata | |
| scitype = cls.get_class_tag("scitype") | |
| else: | |
| # fall back to scitype from the check_dict key | |
| scitype = k[1] |
Reference Issues/PRs
Closes #9704
What does this implement/fix? Explain your changes.
This PR implements a programmatic lookup for MTYPE and SCITYPE registries to resolve long-standing circular dependency issues between _registry.py and _check.py.
Dynamic Attributes: Replaced static dictionary/list definitions with PEP 562 getattr logic to allow module-level attributes to be generated on-demand.
Recursion Guard: Implemented a global _IS_GENERATING reentrancy lock to prevent infinite recursion loops during attribute lookup, ensuring compatibility with IDE autocompletion and inspection tools.
Metadata Extraction: Updated the tabular registry (MTYPE_REGISTER_TABLE) to 4-element tuples to support lazy extraction of python_dependencies without triggering ModuleNotFoundError for missing soft dependencies like polars.
Contributor Update: Added myself to the .all-contributorsrc file with code and bug badges.
Does your contribution introduce a new dependency? If yes, which one?
No.
What should a reviewer concentrate their feedback on?
The robustness of the _IS_GENERATING guard in handling edge cases during registry assembly.
The expansion of the MTYPE_REGISTER_TABLE tuples to ensure they correctly map to the new dynamic parsing logic.
Did you add any tests for the change?
I verified the changes using the existing datatype test suite. Run pytest skpro/datatypes/tests/ -v showing 107 passed tests. The logic successfully handles environments with and without optional soft dependencies.
Any other comments?
This cleanup provides a more maintainable foundation for the datatypes module and simplifies the internal import structure of the package.