Skip to content

Conversation

@JustKuzya
Copy link
Collaborator

Added the consistent uniform way to handle entity-types in the code. Rebased to the dev as of Aug 19, resolved all merge conflicts.

JustKuzya and others added 21 commits July 30, 2025 21:06
- added enum
- went over the Dioptra error interfaces to modify common exceptions and errors
- cleaned the imports and started enum unit-test
This refactor changes the following:

- Removes name() method from ArtifactTaskInterface
- Updated code to use the name of the class in place of the result from name() method
- Updated implementations in extra/plugins/artifacts/tasks.py
- Updated Examples
- fixed the missing fall-backs to str at the object-creation places
- added monkey-patch extension of the dumps default method to dispatch enums out in a uniform way
- find all merge-caused issues
- make sure that enum serialization monkey-patch is ignored by mypy
- verify that all tests and mypy work as was expected
- apparently, isort disliker "import A as A" constructs
- fixed excessive map use replacing it with the simple list comprehension
- added enum
- went over the Dioptra error interfaces to modify common exceptions and errors
- cleaned the imports and started enum unit-test
- resolved merge conflicts
- rebased and merge-resolved divergencies with div branch
- verifying rebas-merge correctly applied to the branch
- added enum
- went over the Dioptra error interfaces to modify common exceptions and errors
- cleaned the imports and started enum unit-test
- rebased to dev branch
- resolved all the merge conflicts
@JustKuzya JustKuzya force-pushed the jk-i651-entity-types-cleanup branch from a665543 to a139f0e Compare August 19, 2025 16:57
Copy link
Collaborator

@keithmanville keithmanville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only two representations of a resource are the db name and the display name, why not use a standard Enum with db name as the the name and the display name as the value?

Does not address API routes. I thought this was one of the constants we would want to include.

_type_: The EntityTypes Enum
"""
if not resource_type_name:
# print(f"1. {resource_type_name=} ")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove debug prints before merging

return EntityTypes[name]
# print(f"{'!' * 80}\n\t{resource_type_name=} Was Not Found in ENUM {'!' * 80}")
return EntityTypes.UNDEFINED
# ----------------------------------------------------------------------
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these "separator" comments. Not used anywhere else in codebase.

Returns:
str: returns inst-ENUM.all-lower-case - snake
"""
if self.db_schema_name == EntityTypes.to_snake_case(self.db_schema_name):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just

return EntityTypes.to_snake_case(self.db_schema_name)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed from the initial iteration.

new_instance._value_ = (
original_name # Set default value to make sure the Enum base works
)
new_instance.db_schema_name = original_name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable names are confusing. We set db_schema_name to original_name and then the get_original_name() returns the db_schema_name. Are we trying to make some distinction between what we call this internally/externally from the enum?

if resource is None:
raise EntityDoesNotExistError(self._resource_type, resource_id=resource_id)
raise EntityDoesNotExistError(
EntityTypes.get_from_string(self._resource_type),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in some cases were are translating to and from string representations when it's not needed. In this case, _resource_type ultimately comes from the calls to generate_resource_tags_endpoint where we pass RESOURCE_TYPE.get_db_schema_name().

We could just store the enum object directly and not need to go back and forth from string representations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and perhaps the way to do this is to do something like this in the ResourceIdService:

    @inject
    def __init__(self, resource_type: EntityTypes):
        self._resource_type = resource_type

And then the ResourceTagsIdService would get this modification:

    @inject
    def __init__(
        self,
        resource_type: EntityTypes,
        resource_id_service: ClassAssistedBuilder["ResourceIdService"],
        tag_id_service: TagIdService,
    ) -> None:
        self._resource_id_service = resource_id_service.build(
            resource_type=resource_type
        )
        self._tag_id_service = tag_id_service

And so on.

Of course, you'll also need to find where the resource_type argument is actually passed in each service layer definition and change that to the appropriate enum.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, with storing the enum reference as much as possible and only output to string when needed.

ML_MODEL = "ml_model", "M.L. Model"
ML_MODEL_VERSION = "ml_model_version", "M.L. Model Version"
# --- Artifact-Related Entities ---
HAS_DRAFT = "has_draft", "Has Draft"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing several entries that aren't Resources. I thought the intent was to capture only Resources.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also what I thought. There's a mixture of REST API entities, database tables that contain attributes about a resource type that is a backend implementation detail (the artifact parameters, for example), request parameters like PASSWORD, and HAS_DRAFT which is a derived attribute that we attach to resources in the app's responses.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until the end I was not sure if the third or fourth member would be needed for Errors and/or Logs. Neither was needed, but tuples would be easier to expand if I missed some use-cases later on.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make a strong distinction between REST Resources and internal Entity Types which generally map to a base object. If there are other ways or object that were referred to as an entity type, we need to resolve that. @JustKuzya let me know if you see others and let's discuss.

@keithmanville
Copy link
Collaborator

My review mostly posed questions, but it might be more helpful make a recommendation and see what everyone thinks:

  • Make EntityTypes include only Resources
  • Add the API routes to the EntityTypes enum
  • Reduce reliance on get_from_string -- I think in most places we can store the enum directly. The place I'm thinking this doesn't work is where we have a user provided resource type as a string like in drafts.

I think those three would address my biggest questions with this PR.

LOGGER: BoundLogger = structlog.stdlib.get_logger()

RESOURCE_TYPE: Final[str] = "artifact"
ARTIFACT_RESOURCE_TYPE: Final[EntityTypes] = EntityTypes.ARTIFACT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wherever we use EntityTypes now it's cleaner to

from dioptra.restapi.v1.entity_types import EntityTypes

and then access the EntityTypes needed as EntityTypes.ARTIFACT.
Basically we don't need the extra ARTIFACT_RESOURCE_TYPE constants floating around anymore.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with this.

return EntityTypes.to_snake_case(self.db_schema_name)
# ---------------------------------------------------------------------

def get_original_name(self) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove getters and access properties directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To that end, if you want to make these properties "read-only" (well, as much as you can do that in Python), prefer this pattern

    def __new__(
        cls,
        original_name: str,
        readable_name: str,
    ):
        """Default constructor for stuffing EntityTypes Enum

        Args:
            original_name (str): DB-schema-safe entity type name
            readable_name (str): The name to be read by user in UI/UX/Logs/Errors

        Returns:
            _type_: The created const-style immutable tuple-member the enum
        """
        new_instance = object.__new__(cls)
        new_instance._value_ = (
            original_name  # Set default value to make sure the Enum base works
        )
        new_instance._db_schema_name = original_name
        new_instance._ui_print_name = readable_name
        return new_instance

    @property
    def db_schema_name(self) -> str:
        return EntityTypes.to_snake_case(self._db_schema_name)

    @property
    def ui_print_name(self) -> str:
        return EntityTypes.to_snake_case(self._ui_print_name)

You would also need to remove these from the class definition to make room for the above:

    db_schema_name: str  # DB-compatible name representation
    ui_print_name: str  # Yser-friendly name representation

Copy link
Collaborator

@jkglasbrenner jkglasbrenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review focused on the scope of where EntityTypes should (and shouldn't) be used, some implementation details, and code organization. I also left comments on the threads started by @keithmanville when I had something to add.

Comment on lines +23 to +54
# This is the serializer overwrite for the Enum-Derived objects
# Derived from stack overflow article:
# https://stackoverflow.com/questions/36699512/is-it-possible-to-dump-an-enum-in-json-without-passing-an-encoder-to-json-dumps
#
_saved_default = (
# Persisting the current state of the JSON serializer
JSONEncoder().default
) # Save the original default method for json-dumps


def _new_default(self, obj) -> Any:
"""Default function that would also handle the Enums in serialization
Args:
obj (_type_): Python object to serialize
Returns:
Any: The method to serialize the object,
with special handler for the Enum
"""
if isinstance(obj, Enum):
# OUR SPECIAL CASE WHEN THE OBJECT IS ENUM-DERIVED
return obj.name # Cough up the default property .name or .value or ._value_
else:
# Dispatch bask to the original mechanism of json-dumping for non-Enum objects
return _saved_default


# Glue-up the new default method with the patch-back,
# and tell MyPy not to wrinkle it's nose at the "monkey patch"
JSONEncoder.default = _new_default # type: ignore
# End of the monkey-patching the serialization fot he Enums
# ======================================================================
Copy link
Collaborator

@jkglasbrenner jkglasbrenner Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of monkeypatch seems brittle and unpredictable, as it only is applied once this file is imported. Depending on how code is imported and called, it's very possible for json.dump to use different versions of the JSONEncoder default during the lifecycle of the application. The original default also is never restored, so this is a permanent override.

The usual way to do this kind of thing is to inherit from the JSONEncoder class, override the default method, and then pass it to json.dump or json.dumps explicitly:

import json
from enum import Enum


class EnumJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Enum):
            return obj.name

        # Dispatch bask to the original mechanism of json-dumping for non-Enum objects
        return super().default(obj)


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


print(json.dumps({"favorite_color": Color.BLUE, "name": "First", "id": 1}, cls=EnumJSONEncoder))

which prints

{"favorite_color": "BLUE", "name": "First", "id": 1}

Stepping back, where is json.dump or json.dumps being used that requires this approach? Is it an invocation of json.dump or json.dumps that we control in the codebase? Or is it being invoked by one of our dependent libraries, and there's no way to provide the custom encoder?

If we truly need to override this for the entire application like this, then we should consider doing something similar to dioptra.restapi.patches.monkey_patch_flask_restx, which is called within dioptra.restapi.app.create_app. This would better control where in the application lifecycle the behavior gets changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the instance switch suggested, but I don't remember what was breaking without the patch. I agree that probably a uniform way of doing it once and for all enums is better way to handle the Enums serialization.

Comment on lines 194 to 228
@staticmethod
def to_snake_case(text_to_snake: str) -> str:
"""Converts a string to lower-case snake_case format.
Handles various input formats like camelCase, PascalCase,
and strings with spaces, comas, hyphens, colons,
semicolons, or hyphens.
Args:
text_to_snake (str): text to format to snake case
Returns:
str: Snake-case formatted name-type string
"""
# Bail out for empty input
if not text_to_snake:
return ""
# Replace punctuation with spaces
text_to_snake = (
text_to_snake.replace("-", " ")
.replace(",", " ")
.replace(".", " ")
.replace(";", " ")
.replace(":", " ")
)
# Replace spaces with underscores and convert to lowercase
# Insert underscore before uppercase letters in camelCase/PascalCase
# Handles cases like:
# "PascalCase" -> "Pascal_Case" and
# "varInCamelCase" -> "var_In_Camel_Case"
text_to_snake = re.sub(r"(?<!^)(?=[A-Z])", " ", text_to_snake)
# Clean multiple spaces by collapsing them, if they happen
text_to_snake = re.sub(r"\s+", " ", text_to_snake).strip()
# Replace spaces with underscores
return text_to_snake.replace(" ", "_").lower()
# ------------------------------------------------------------------
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this to be a general utility function instead of a method that is specific to the details of this class. I recommend extracting this into dioptra.restapi.utils, it would fit nicely right below the slugify function.

Also, after moving the function, I would move the regular expressions into module-level level constants and compile them, since in principle this function could be called many times.

CAMEL_PASCAL_SEP_REGEX = re.compile(r"(?<!^)(?=[A-Z])")
MULTI_SPACE_REGEX = re.compile(r"\s+")

# Used like
CAMEL_PASCAL_SEP_REGEX.sub(" ", text_to_snake)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Really good thought. Except for one thing: I always prefer to use a common prefix, in this care REGEX_... for similar purpose constants - this way the code assist becomes more helpful as all similar use constants have same prefix and pop up as a suggestion, developer only needs to read the suggestions carefully and choose. I know that most of the developers are "writers", not "readers" ☺ , but still it is more helpful than not for the most cases.

return EntityTypes.to_snake_case(self.db_schema_name)
# ---------------------------------------------------------------------

def get_original_name(self) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To that end, if you want to make these properties "read-only" (well, as much as you can do that in Python), prefer this pattern

    def __new__(
        cls,
        original_name: str,
        readable_name: str,
    ):
        """Default constructor for stuffing EntityTypes Enum

        Args:
            original_name (str): DB-schema-safe entity type name
            readable_name (str): The name to be read by user in UI/UX/Logs/Errors

        Returns:
            _type_: The created const-style immutable tuple-member the enum
        """
        new_instance = object.__new__(cls)
        new_instance._value_ = (
            original_name  # Set default value to make sure the Enum base works
        )
        new_instance._db_schema_name = original_name
        new_instance._ui_print_name = readable_name
        return new_instance

    @property
    def db_schema_name(self) -> str:
        return EntityTypes.to_snake_case(self._db_schema_name)

    @property
    def ui_print_name(self) -> str:
        return EntityTypes.to_snake_case(self._ui_print_name)

You would also need to remove these from the class definition to make room for the above:

    db_schema_name: str  # DB-compatible name representation
    ui_print_name: str  # Yser-friendly name representation

ML_MODEL = "ml_model", "M.L. Model"
ML_MODEL_VERSION = "ml_model_version", "M.L. Model Version"
# --- Artifact-Related Entities ---
HAS_DRAFT = "has_draft", "Has Draft"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also what I thought. There's a mixture of REST API entities, database tables that contain attributes about a resource type that is a backend implementation detail (the artifact parameters, for example), request parameters like PASSWORD, and HAS_DRAFT which is a derived attribute that we attach to resources in the app's responses.

Comment on lines +178 to +182
ENTRYPOINT = "entry_point", "Entry Point"
ENTRYPOINT_PLUGIN = "entry_point_plugin", "Entry Point Plugin"
# --- Entry-Point Entities ---
ENTRY_POINT = "entry_point", "Entry Point"
ENTRY_POINT_PLUGIN = "entry_point_plugin", "Entry Point Plugin"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use one of these everywhere instead of aliasing the two variants? I know there's inconsistency in naming in different sections of the code base (entrypoint is found in the Flask code, while entry_point is used in the SQLAlchemy ORM code), but will that impact which of these get used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DB names seem to be set in stone (or rather "Philosophers Stone") of SQLAlchemy. Meanwhile the Textual representation is supposed to be human readable

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole idea of the entity was to allow a "translation" between the entity type as it is in the DB, the names it references, and the human readable description. Having an Enum also will likely prove useful for the repository pattern allowing a Repository instance to have a reference to the Entity that it is providing.

Comment on lines 326 to 330
class QueryParameterNotUniqueError(QueryParameterValidationError):
"""Query Parameters failed unique validatation check."""

def __init__(self, type: str, **kwargs):
def __init__(self, type: EntityTypes, **kwargs):
super().__init__(type, "unique", **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I agree with changing this error to use an enum like this. Query parameters aren't entities, they're input data that's attached to a request sent to the endpoint.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with this. The context is the REST resource, not an entity resource.

Comment on lines +282 to +285
def __init__(self, type: EntityTypes, column: str, **kwargs):
super().__init__(
f"The sort parameter, {column}, for {type.get_print_name()} is not sortable."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place I'm not sure I would agree we should use EntityTypes. The sorting parameters map to attributes/properties on an Entity (or resource), they are not entities themselves.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type was passed in as a string value to provide context. I am not sure we should have two ways of providing this context. Do you have a suggestion how we can provide the proper context to indicate the properties that are (attempted) for sorting?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after thinking about this a little bit, perhaps the better way to think about this, as not so much mapping to an entity but to a resource, with resource being the actual REST API Path we are mapping to. Therefore the mapping is not to an entity, but to the resource path.

"""Input parameters failed validation."""

def __init__(self, type: str, constraint: str, **kwargs):
def __init__(self, type: EntityTypes, constraint: str, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that EntityTypes belongs here, see my other comment about query parameters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of EntityType, this should be the REST resource on which this query parameter exists.

if resource is None:
raise EntityDoesNotExistError(self._resource_type, resource_id=resource_id)
raise EntityDoesNotExistError(
EntityTypes.get_from_string(self._resource_type),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and perhaps the way to do this is to do something like this in the ResourceIdService:

    @inject
    def __init__(self, resource_type: EntityTypes):
        self._resource_type = resource_type

And then the ResourceTagsIdService would get this modification:

    @inject
    def __init__(
        self,
        resource_type: EntityTypes,
        resource_id_service: ClassAssistedBuilder["ResourceIdService"],
        tag_id_service: TagIdService,
    ) -> None:
        self._resource_id_service = resource_id_service.build(
            resource_type=resource_type
        )
        self._tag_id_service = tag_id_service

And so on.

Of course, you'll also need to find where the resource_type argument is actually passed in each service layer definition and change that to the appropriate enum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create an EntityType class to encapsulate constants associated with Dioptra Resources

5 participants