Jk i651 entity types cleanup #929

JustKuzya · 2025-08-19T16:46:43Z

Added the consistent uniform way to handle entity-types in the code. Rebased to the dev as of Aug 19, resolved all merge conflicts.

- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test

This refactor changes the following: - Removes name() method from ArtifactTaskInterface - Updated code to use the name of the class in place of the result from name() method - Updated implementations in extra/plugins/artifacts/tasks.py - Updated Examples

- fixed the missing fall-backs to str at the object-creation places - added monkey-patch extension of the dumps default method to dispatch enums out in a uniform way

- find all merge-caused issues - make sure that enum serialization monkey-patch is ignored by mypy - verify that all tests and mypy work as was expected

- apparently, isort disliker "import A as A" constructs

- fixed excessive map use replacing it with the simple list comprehension

- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test - resolved merge conflicts

- rebased and merge-resolved divergencies with div branch

- verifying rebas-merge correctly applied to the branch

- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test - rebased to dev branch - resolved all the merge conflicts

keithmanville

If the only two representations of a resource are the db name and the display name, why not use a standard Enum with db name as the the name and the display name as the value?

Does not address API routes. I thought this was one of the constants we would want to include.

keithmanville · 2025-08-27T13:39:39Z

src/dioptra/restapi/v1/entity_types.py

+            _type_: The EntityTypes Enum
+        """
+        if not resource_type_name:
+            # print(f"1. {resource_type_name=} ")


remove debug prints before merging

keithmanville · 2025-08-27T13:40:15Z

src/dioptra/restapi/v1/entity_types.py

+                    return EntityTypes[name]
+        # print(f"{'!' * 80}\n\t{resource_type_name=} Was Not Found in ENUM {'!' * 80}")
+        return EntityTypes.UNDEFINED
+        # ----------------------------------------------------------------------


remove these "separator" comments. Not used anywhere else in codebase.

src/dioptra/restapi/v1/entity_types.py

keithmanville · 2025-08-27T14:07:39Z

src/dioptra/restapi/v1/entity_types.py

+        Returns:
+            str: returns inst-ENUM.all-lower-case - snake
+        """
+        if self.db_schema_name == EntityTypes.to_snake_case(self.db_schema_name):


why not just

return EntityTypes.to_snake_case(self.db_schema_name)

Missed from the initial iteration.

keithmanville · 2025-08-27T14:09:47Z

src/dioptra/restapi/v1/entity_types.py

+        new_instance._value_ = (
+            original_name  # Set default value to make sure the Enum base works
+        )
+        new_instance.db_schema_name = original_name


variable names are confusing. We set db_schema_name to original_name and then the get_original_name() returns the db_schema_name. Are we trying to make some distinction between what we call this internally/externally from the enum?

src/dioptra/restapi/v1/entity_types.py

keithmanville · 2025-08-27T14:29:45Z

src/dioptra/restapi/v1/shared/tags/service.py

        if resource is None:
-            raise EntityDoesNotExistError(self._resource_type, resource_id=resource_id)
+            raise EntityDoesNotExistError(
+                EntityTypes.get_from_string(self._resource_type),


I think in some cases were are translating to and from string representations when it's not needed. In this case, _resource_type ultimately comes from the calls to generate_resource_tags_endpoint where we pass RESOURCE_TYPE.get_db_schema_name().

We could just store the enum object directly and not need to go back and forth from string representations.

I agree, and perhaps the way to do this is to do something like this in the ResourceIdService:

@inject def __init__(self, resource_type: EntityTypes): self._resource_type = resource_type

And then the ResourceTagsIdService would get this modification:

@inject def __init__( self, resource_type: EntityTypes, resource_id_service: ClassAssistedBuilder["ResourceIdService"], tag_id_service: TagIdService, ) -> None: self._resource_id_service = resource_id_service.build( resource_type=resource_type ) self._tag_id_service = tag_id_service

And so on.

Of course, you'll also need to find where the resource_type argument is actually passed in each service layer definition and change that to the appropriate enum.

agree, with storing the enum reference as much as possible and only output to string when needed.

keithmanville · 2025-08-27T14:40:42Z

src/dioptra/restapi/v1/entity_types.py

+    ML_MODEL = "ml_model", "M.L. Model"
+    ML_MODEL_VERSION = "ml_model_version", "M.L. Model Version"
+    # --- Artifact-Related Entities ---
+    HAS_DRAFT = "has_draft", "Has Draft"


I'm seeing several entries that aren't Resources. I thought the intent was to capture only Resources.

This is also what I thought. There's a mixture of REST API entities, database tables that contain attributes about a resource type that is a backend implementation detail (the artifact parameters, for example), request parameters like PASSWORD, and HAS_DRAFT which is a derived attribute that we attach to resources in the app's responses.

Until the end I was not sure if the third or fourth member would be needed for Errors and/or Logs. Neither was needed, but tuples would be easier to expand if I missed some use-cases later on.

I think we need to make a strong distinction between REST Resources and internal Entity Types which generally map to a base object. If there are other ways or object that were referred to as an entity type, we need to resolve that. @JustKuzya let me know if you see others and let's discuss.

keithmanville · 2025-08-27T14:59:00Z

My review mostly posed questions, but it might be more helpful make a recommendation and see what everyone thinks:

Make EntityTypes include only Resources
Add the API routes to the EntityTypes enum
Reduce reliance on get_from_string -- I think in most places we can store the enum directly. The place I'm thinking this doesn't work is where we have a user provided resource type as a string like in drafts.

I think those three would address my biggest questions with this PR.

keithmanville · 2025-08-27T15:04:51Z

src/dioptra/restapi/v1/artifacts/service.py

 LOGGER: BoundLogger = structlog.stdlib.get_logger()

-RESOURCE_TYPE: Final[str] = "artifact"
+ARTIFACT_RESOURCE_TYPE: Final[EntityTypes] = EntityTypes.ARTIFACT


I think wherever we use EntityTypes now it's cleaner to

from dioptra.restapi.v1.entity_types import EntityTypes

and then access the EntityTypes needed as EntityTypes.ARTIFACT.
Basically we don't need the extra ARTIFACT_RESOURCE_TYPE constants floating around anymore.

agree with this.

keithmanville · 2025-08-27T15:14:20Z

src/dioptra/restapi/v1/entity_types.py

+            return EntityTypes.to_snake_case(self.db_schema_name)
+        # ---------------------------------------------------------------------
+
+    def get_original_name(self) -> str:


remove getters and access properties directly.

To that end, if you want to make these properties "read-only" (well, as much as you can do that in Python), prefer this pattern

def __new__( cls, original_name: str, readable_name: str, ): """Default constructor for stuffing EntityTypes Enum Args: original_name (str): DB-schema-safe entity type name readable_name (str): The name to be read by user in UI/UX/Logs/Errors Returns: _type_: The created const-style immutable tuple-member the enum """ new_instance = object.__new__(cls) new_instance._value_ = ( original_name # Set default value to make sure the Enum base works ) new_instance._db_schema_name = original_name new_instance._ui_print_name = readable_name return new_instance @property def db_schema_name(self) -> str: return EntityTypes.to_snake_case(self._db_schema_name) @property def ui_print_name(self) -> str: return EntityTypes.to_snake_case(self._ui_print_name)

You would also need to remove these from the class definition to make room for the above:

db_schema_name: str # DB-compatible name representation ui_print_name: str # Yser-friendly name representation

jkglasbrenner

My review focused on the scope of where EntityTypes should (and shouldn't) be used, some implementation details, and code organization. I also left comments on the threads started by @keithmanville when I had something to add.

jkglasbrenner · 2025-08-27T16:58:41Z

src/dioptra/restapi/v1/entity_types.py

+# This is the serializer overwrite for the Enum-Derived objects
+# Derived from stack overflow article:
+# https://stackoverflow.com/questions/36699512/is-it-possible-to-dump-an-enum-in-json-without-passing-an-encoder-to-json-dumps
+#
+_saved_default = (
+    # Persisting the current state of the JSON serializer
+    JSONEncoder().default
+)  # Save the original default method for json-dumps
+
+
+def _new_default(self, obj) -> Any:
+    """Default function that would also handle the Enums in serialization
+
+    Args:
+        obj (_type_): Python object to serialize
+    Returns:
+        Any: The method to serialize the object,
+            with special handler for the Enum
+    """
+    if isinstance(obj, Enum):
+        # OUR SPECIAL CASE WHEN THE OBJECT IS ENUM-DERIVED
+        return obj.name  # Cough up the default property .name or .value or ._value_
+    else:
+        # Dispatch bask to the original mechanism of json-dumping for non-Enum objects
+        return _saved_default
+
+
+# Glue-up the new default method with the patch-back,
+# and tell MyPy not to wrinkle it's nose at the "monkey patch"
+JSONEncoder.default = _new_default  # type: ignore
+# End of the monkey-patching the serialization fot he Enums
+# ======================================================================


This kind of monkeypatch seems brittle and unpredictable, as it only is applied once this file is imported. Depending on how code is imported and called, it's very possible for json.dump to use different versions of the JSONEncoder default during the lifecycle of the application. The original default also is never restored, so this is a permanent override.

The usual way to do this kind of thing is to inherit from the JSONEncoder class, override the default method, and then pass it to json.dump or json.dumps explicitly:

import json from enum import Enum class EnumJSONEncoder(json.JSONEncoder): def default(self, obj): if isinstance(obj, Enum): return obj.name # Dispatch bask to the original mechanism of json-dumping for non-Enum objects return super().default(obj) class Color(Enum): RED = 1 GREEN = 2 BLUE = 3 print(json.dumps({"favorite_color": Color.BLUE, "name": "First", "id": 1}, cls=EnumJSONEncoder))

which prints

{"favorite_color": "BLUE", "name": "First", "id": 1}

Stepping back, where is json.dump or json.dumps being used that requires this approach? Is it an invocation of json.dump or json.dumps that we control in the codebase? Or is it being invoked by one of our dependent libraries, and there's no way to provide the custom encoder?

If we truly need to override this for the entire application like this, then we should consider doing something similar to dioptra.restapi.patches.monkey_patch_flask_restx, which is called within dioptra.restapi.app.create_app. This would better control where in the application lifecycle the behavior gets changed.

I like the instance switch suggested, but I don't remember what was breaking without the patch. I agree that probably a uniform way of doing it once and for all enums is better way to handle the Enums serialization.

jkglasbrenner · 2025-08-27T17:13:55Z

src/dioptra/restapi/v1/entity_types.py

+    @staticmethod
+    def to_snake_case(text_to_snake: str) -> str:
+        """Converts a string to lower-case snake_case format.
+        Handles various input formats like camelCase, PascalCase,
+        and strings with spaces, comas, hyphens, colons,
+        semicolons, or hyphens.
+
+        Args:
+            text_to_snake (str):  text to format to snake case
+
+        Returns:
+            str: Snake-case formatted name-type string
+        """
+        # Bail out for empty input
+        if not text_to_snake:
+            return ""
+        # Replace punctuation with spaces
+        text_to_snake = (
+            text_to_snake.replace("-", " ")
+            .replace(",", " ")
+            .replace(".", " ")
+            .replace(";", " ")
+            .replace(":", " ")
+        )
+        # Replace spaces with underscores and convert to lowercase
+        # Insert underscore before uppercase letters in camelCase/PascalCase
+        # Handles cases like:
+        #   "PascalCase" -> "Pascal_Case" and
+        #   "varInCamelCase" -> "var_In_Camel_Case"
+        text_to_snake = re.sub(r"(?<!^)(?=[A-Z])", " ", text_to_snake)
+        # Clean multiple spaces by collapsing them, if they happen
+        text_to_snake = re.sub(r"\s+", " ", text_to_snake).strip()
+        # Replace spaces with underscores
+        return text_to_snake.replace(" ", "_").lower()
+        # ------------------------------------------------------------------


I consider this to be a general utility function instead of a method that is specific to the details of this class. I recommend extracting this into dioptra.restapi.utils, it would fit nicely right below the slugify function.

Also, after moving the function, I would move the regular expressions into module-level level constants and compile them, since in principle this function could be called many times.

CAMEL_PASCAL_SEP_REGEX = re.compile(r"(?<!^)(?=[A-Z])") MULTI_SPACE_REGEX = re.compile(r"\s+") # Used like CAMEL_PASCAL_SEP_REGEX.sub(" ", text_to_snake)

Done. Really good thought. Except for one thing: I always prefer to use a common prefix, in this care REGEX_... for similar purpose constants - this way the code assist becomes more helpful as all similar use constants have same prefix and pop up as a suggestion, developer only needs to read the suggestions carefully and choose. I know that most of the developers are "writers", not "readers" ☺ , but still it is more helpful than not for the most cases.

jkglasbrenner · 2025-08-27T17:27:10Z

src/dioptra/restapi/v1/entity_types.py

+            return EntityTypes.to_snake_case(self.db_schema_name)
+        # ---------------------------------------------------------------------
+
+    def get_original_name(self) -> str:


To that end, if you want to make these properties "read-only" (well, as much as you can do that in Python), prefer this pattern

def __new__( cls, original_name: str, readable_name: str, ): """Default constructor for stuffing EntityTypes Enum Args: original_name (str): DB-schema-safe entity type name readable_name (str): The name to be read by user in UI/UX/Logs/Errors Returns: _type_: The created const-style immutable tuple-member the enum """ new_instance = object.__new__(cls) new_instance._value_ = ( original_name # Set default value to make sure the Enum base works ) new_instance._db_schema_name = original_name new_instance._ui_print_name = readable_name return new_instance @property def db_schema_name(self) -> str: return EntityTypes.to_snake_case(self._db_schema_name) @property def ui_print_name(self) -> str: return EntityTypes.to_snake_case(self._ui_print_name)

You would also need to remove these from the class definition to make room for the above:

db_schema_name: str # DB-compatible name representation ui_print_name: str # Yser-friendly name representation

jkglasbrenner · 2025-08-27T17:35:20Z

src/dioptra/restapi/v1/entity_types.py

+    ML_MODEL = "ml_model", "M.L. Model"
+    ML_MODEL_VERSION = "ml_model_version", "M.L. Model Version"
+    # --- Artifact-Related Entities ---
+    HAS_DRAFT = "has_draft", "Has Draft"


This is also what I thought. There's a mixture of REST API entities, database tables that contain attributes about a resource type that is a backend implementation detail (the artifact parameters, for example), request parameters like PASSWORD, and HAS_DRAFT which is a derived attribute that we attach to resources in the app's responses.

jkglasbrenner · 2025-08-27T17:38:54Z

src/dioptra/restapi/v1/entity_types.py

+    ENTRYPOINT = "entry_point", "Entry Point"
+    ENTRYPOINT_PLUGIN = "entry_point_plugin", "Entry Point Plugin"
+    # --- Entry-Point Entities ---
+    ENTRY_POINT = "entry_point", "Entry Point"
+    ENTRY_POINT_PLUGIN = "entry_point_plugin", "Entry Point Plugin"


Why not just use one of these everywhere instead of aliasing the two variants? I know there's inconsistency in naming in different sections of the code base (entrypoint is found in the Flask code, while entry_point is used in the SQLAlchemy ORM code), but will that impact which of these get used?

The DB names seem to be set in stone (or rather "Philosophers Stone") of SQLAlchemy. Meanwhile the Textual representation is supposed to be human readable

the whole idea of the entity was to allow a "translation" between the entity type as it is in the DB, the names it references, and the human readable description. Having an Enum also will likely prove useful for the repository pattern allowing a Repository instance to have a reference to the Entity that it is providing.

jkglasbrenner · 2025-08-27T17:52:12Z

src/dioptra/restapi/errors.py

 class QueryParameterNotUniqueError(QueryParameterValidationError):
    """Query Parameters failed unique validatation check."""

-    def __init__(self, type: str, **kwargs):
+    def __init__(self, type: EntityTypes, **kwargs):
        super().__init__(type, "unique", **kwargs)


I don't think I agree with changing this error to use an enum like this. Query parameters aren't entities, they're input data that's attached to a request sent to the endpoint.

agree with this. The context is the REST resource, not an entity resource.

jkglasbrenner · 2025-08-27T17:56:27Z

src/dioptra/restapi/errors.py

+    def __init__(self, type: EntityTypes, column: str, **kwargs):
+        super().__init__(
+            f"The sort parameter, {column}, for {type.get_print_name()} is not sortable."
+        )


Another place I'm not sure I would agree we should use EntityTypes. The sorting parameters map to attributes/properties on an Entity (or resource), they are not entities themselves.

The type was passed in as a string value to provide context. I am not sure we should have two ways of providing this context. Do you have a suggestion how we can provide the proper context to indicate the properties that are (attempted) for sorting?

after thinking about this a little bit, perhaps the better way to think about this, as not so much mapping to an entity but to a resource, with resource being the actual REST API Path we are mapping to. Therefore the mapping is not to an entity, but to the resource path.

jkglasbrenner · 2025-08-27T17:57:21Z

src/dioptra/restapi/errors.py

    """Input parameters failed validation."""

-    def __init__(self, type: str, constraint: str, **kwargs):
+    def __init__(self, type: EntityTypes, constraint: str, **kwargs):


I don't agree that EntityTypes belongs here, see my other comment about query parameters.

Instead of EntityType, this should be the REST resource on which this query parameter exists.

src/dioptra/restapi/v1/entity_types.py

jkglasbrenner · 2025-08-27T19:54:38Z

src/dioptra/restapi/v1/shared/tags/service.py

        if resource is None:
-            raise EntityDoesNotExistError(self._resource_type, resource_id=resource_id)
+            raise EntityDoesNotExistError(
+                EntityTypes.get_from_string(self._resource_type),


I agree, and perhaps the way to do this is to do something like this in the ResourceIdService:

@inject def __init__(self, resource_type: EntityTypes): self._resource_type = resource_type

And then the ResourceTagsIdService would get this modification:

@inject def __init__( self, resource_type: EntityTypes, resource_id_service: ClassAssistedBuilder["ResourceIdService"], tag_id_service: TagIdService, ) -> None: self._resource_id_service = resource_id_service.build( resource_type=resource_type ) self._tag_id_service = tag_id_service

And so on.

Of course, you'll also need to find where the resource_type argument is actually passed in each service layer definition and change that to the appropriate enum.

refactor: move regexes and functionality to module

JustKuzya and others added 21 commits July 30, 2025 21:06

feat(i561) add enumerated entity types for better code quality

63de324

- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test

fixed some import-induced errors

b447814

fix(tests): fix error in the test_drafts

40cb7a1

fix(tests) dispatched the json.dumps extension

b93e48e

- fixed the missing fall-backs to str at the object-creation places - added monkey-patch extension of the dumps default method to dispatch enums out in a uniform way

fix(merge) fix merge conflicts

badc0e1

fix(test): fix failing tests and mypy type error

1f91fd7

- find all merge-caused issues - make sure that enum serialization monkey-patch is ignored by mypy - verify that all tests and mypy work as was expected

fix(isort): fix iSort complaints

ba52990

fix(isort): remove DUPLICATE aliasing from imports

c904bf9

- apparently, isort disliker "import A as A" constructs

fix(isort) another fix of isort quircks

d051e40

fix(isort) another swing at the import sorting

51c731f

chore: merge dev into jk-i651-entity-types-cleanup

e0a73a6

fix(651): fix ruff suggested map use

407e16f

- fixed excessive map use replacing it with the simple list comprehension

feat(i561) add enumerated entity types for better code quality

a4c3521

- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test - resolved merge conflicts

fix(merge) fix merge conflicts

47f0a0d

chore(rebase) rebase/merged the div

773d2dd

- rebased and merge-resolved divergencies with div branch

chore(rebase): verify rebase-merge

a1be0d3

- verifying rebas-merge correctly applied to the branch

chore(style) fix simport style issues

79789dc

fix(i561) add enumerated entity types for better code quality

36023bd

- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test - rebased to dev branch - resolved all the merge conflicts

fix(test) fix code testing and sort imports

11b7729

chore(doc): add python functional documentation

a139f0e

JustKuzya force-pushed the jk-i651-entity-types-cleanup branch from a665543 to a139f0e Compare August 19, 2025 16:57

fix(code): fix code type mismatch

96fb069

JustKuzya mentioned this pull request Aug 19, 2025

Create an EntityType class to encapsulate constants associated with Dioptra Resources #651

Open

5 tasks

hbooth self-requested a review August 26, 2025 13:08

hbooth self-assigned this Aug 26, 2025

keithmanville linked an issue Aug 27, 2025 that may be closed by this pull request

Create an EntityType class to encapsulate constants associated with Dioptra Resources #651

Open

5 tasks

keithmanville reviewed Aug 27, 2025

View reviewed changes

jkglasbrenner requested changes Aug 27, 2025

View reviewed changes

JustKuzya added 3 commits September 8, 2025 22:12

nit: clean up comments

d13f537

refactor: move regexes and functionality to module

minor: remove unnecessary if

f888897

fix: correct lint unused variable warning

546f713

Jk i651 entity types cleanup #929

Are you sure you want to change the base?

Jk i651 entity types cleanup #929

Conversation

JustKuzya commented Aug 19, 2025

Uh oh!

keithmanville left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

keithmanville commented Aug 27, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkglasbrenner left a comment

Choose a reason for hiding this comment

Uh oh!

jkglasbrenner Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkglasbrenner Aug 27, 2025 •

edited

Loading