Skip to content

"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986

Open
sun-9545sunoj wants to merge 5 commits intosktime:mainfrom
sun-9545sunoj:fix/9704-programmatic-lookup
Open

"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986
sun-9545sunoj wants to merge 5 commits intosktime:mainfrom
sun-9545sunoj:fix/9704-programmatic-lookup

Conversation

@sun-9545sunoj
Copy link

Reference Issues/PRs
Closes #9704

What does this implement/fix? Explain your changes.

This PR implements a programmatic lookup for MTYPE and SCITYPE registries to resolve long-standing circular dependency issues between _registry.py and _check.py.

Dynamic Attributes: Replaced static dictionary/list definitions with PEP 562 getattr logic to allow module-level attributes to be generated on-demand.

Recursion Guard: Implemented a global _IS_GENERATING reentrancy lock to prevent infinite recursion loops during attribute lookup, ensuring compatibility with IDE autocompletion and inspection tools.

Metadata Extraction: Updated the tabular registry (MTYPE_REGISTER_TABLE) to 4-element tuples to support lazy extraction of python_dependencies without triggering ModuleNotFoundError for missing soft dependencies like polars.

Contributor Update: Added myself to the .all-contributorsrc file with code and bug badges.

Does your contribution introduce a new dependency? If yes, which one?
No.

What should a reviewer concentrate their feedback on?

The robustness of the _IS_GENERATING guard in handling edge cases during registry assembly.

The expansion of the MTYPE_REGISTER_TABLE tuples to ensure they correctly map to the new dynamic parsing logic.

Did you add any tests for the change?
I verified the changes using the existing datatype test suite. Run pytest skpro/datatypes/tests/ -v showing 107 passed tests. The logic successfully handles environments with and without optional soft dependencies.

Any other comments?
This cleanup provides a more maintainable foundation for the datatypes module and simplifies the internal import structure of the package.

Copilot AI review requested due to automatic review settings March 22, 2026 08:23
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements lazy, programmatic lookup for MTYPE_SOFT_DEPS and SCITYPE_* registries to break circular imports between _registry.py and _check.py, and expands mtype registry tuples to carry soft dependency metadata.

Changes:

  • Expanded MTYPE_REGISTER_* entries from 3-tuples to 4-tuples to include optional python_dependencies.
  • Added PEP 562 __getattr__-based lazy generation + caching for MTYPE_SOFT_DEPS, SCITYPE_REGISTER, and SCITYPE_LIST, with a reentrancy guard.
  • Updated _check.scitype to avoid importing SCITYPE_LIST at module import time.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
skpro/datatypes/_registry.py Adds lazy registry lookup via __getattr__, cache, and reentrancy guard; updates scitype/mtype logic to use dynamic registries.
skpro/datatypes/_check.py Avoids eager import of SCITYPE_LIST; lazily imports it when needed in scitype().
skpro/datatypes/_table/_registry.py Extends Table mtype registry tuples to include soft dependency metadata (e.g., polars).
skpro/datatypes/_proba/_registry.py Extends Proba mtype registry tuples to the new 4-element format.
.all-contributorsrc Adds contributor entry and minor JSON formatting adjustment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sun-9545sunoj sun-9545sunoj requested a review from Copilot March 22, 2026 08:56
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sun-9545sunoj sun-9545sunoj requested a review from Copilot March 24, 2026 05:53
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +114 to +122
for mtype_tuple in MTYPE_REGISTER:
mtype = mtype_tuple[0]
scitype = mtype_tuple[1]
scitypes.add(scitype)
if len(mtype_tuple) >= 4 and mtype_tuple[3] is not None:
deps = mtype_tuple[3]
soft_deps[mtype] = (
list(deps) if isinstance(deps, tuple) else deps
)
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop over MTYPE_REGISTER overwrites any previously-collected entry in soft_deps for the same mtype. This is problematic for polars mtypes: checker tags declare python_dependencies as ["polars", "pyarrow"], but the table registry currently supplies only "polars", so this code path will drop pyarrow and make softdeps="present" behave incorrectly when polars is installed but pyarrow is not. Prefer not overwriting an existing soft_deps[mtype] (or merge/union), or ensure the registry table lists the full dependency set.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
("polars_eager_table", "Table", "polars.DataFrame representation of a data table", "polars"),
("polars_lazy_table", "Table", "polars.LazyFrame representation of a data table", "polars"),
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

polars_eager_table/polars_lazy_table are declared here with soft dependencies set to "polars", but the actual datatype checker tags and conversion guards require both polars and pyarrow (see skpro/datatypes/_table/_check.py and _table/_convert.py). This mismatch will cause soft-dependency filtering to report these mtypes as available when only polars is installed. Update the registry entries to include the full dependency set (e.g., ['polars', 'pyarrow']) to keep registry metadata consistent with runtime requirements.

Suggested change
("polars_eager_table", "Table", "polars.DataFrame representation of a data table", "polars"),
("polars_lazy_table", "Table", "polars.LazyFrame representation of a data table", "polars"),
("polars_eager_table", "Table", "polars.DataFrame representation of a data table", ["polars", "pyarrow"]),
("polars_lazy_table", "Table", "polars.LazyFrame representation of a data table", ["polars", "pyarrow"]),

Copilot uses AI. Check for mistakes.
Comment on lines +82 to +123
if name in ["MTYPE_SOFT_DEPS", "SCITYPE_REGISTER", "SCITYPE_LIST"]:
# Pre-seed _CACHE to prevent infinite recursion during generation
# Re-entrant calls (e.g., from inspect) will receive these references
_CACHE["MTYPE_SOFT_DEPS"] = {}
_CACHE["SCITYPE_REGISTER"] = []
_CACHE["SCITYPE_LIST"] = []

try:
from skpro.datatypes._check import get_check_dict

check_dict = get_check_dict(soft_deps="all")

soft_deps = {}
scitypes = set()

for k, cls in check_dict.items():
if hasattr(cls, "get_class_tag"):
mtype = cls.get_class_tag("name")
scitype = cls.get_class_tag("scitype")
deps = cls.get_class_tag("python_dependencies", None)
else:
mtype = k[0]
scitype = k[1]
deps = None

if deps is not None:
soft_deps[mtype] = (
list(deps) if isinstance(deps, tuple) else deps
)
if scitype is not None:
scitypes.add(scitype)

for mtype_tuple in MTYPE_REGISTER:
mtype = mtype_tuple[0]
scitype = mtype_tuple[1]
scitypes.add(scitype)
if len(mtype_tuple) >= 4 and mtype_tuple[3] is not None:
deps = mtype_tuple[3]
soft_deps[mtype] = (
list(deps) if isinstance(deps, tuple) else deps
)

Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new dynamic registry path is not currently covered by a targeted test: in particular, there’s no assertion that MTYPE_SOFT_DEPS only contains optional dependency mtypes and that the dependency lists are complete (e.g., polars requires both polars and pyarrow). Since _registry.py already has lookup tests, consider adding a regression test that validates the expected contents of MTYPE_SOFT_DEPS and the behavior of scitype_to_mtype(..., softdeps='exclude'|'present') against a known baseline.

Copilot uses AI. Check for mistakes.
Comment on lines +99 to +110
mtype = cls.get_class_tag("name")
scitype = cls.get_class_tag("scitype")
deps = cls.get_class_tag("python_dependencies", None)
else:
mtype = k[0]
scitype = k[1]
deps = None

if deps is not None:
soft_deps[mtype] = (
list(deps) if isinstance(deps, tuple) else deps
)
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_registry currently treats any python_dependencies tag found on datatype checker classes as a soft dependency (adds it to MTYPE_SOFT_DEPS). In this codebase many core mtypes have python_dependencies set to required packages like numpy/pandas (see skpro/datatypes/_table/_check.py), so this will incorrectly mark core mtypes as requiring soft deps and change the behavior of scitype_to_mtype(..., softdeps="exclude"|"present"). Consider restricting this extraction to true optional deps only (e.g., only when the tag is a list/tuple of extras), or exclusively derive MTYPE_SOFT_DEPS from the 4th element of MTYPE_REGISTER (now that the registry tables carry that information) and avoid using checker class tags for this purpose.

Suggested change
mtype = cls.get_class_tag("name")
scitype = cls.get_class_tag("scitype")
deps = cls.get_class_tag("python_dependencies", None)
else:
mtype = k[0]
scitype = k[1]
deps = None
if deps is not None:
soft_deps[mtype] = (
list(deps) if isinstance(deps, tuple) else deps
)
# obtain scitype from checker class metadata
scitype = cls.get_class_tag("scitype")
else:
# fall back to scitype from the check_dict key
scitype = k[1]

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants