-
Notifications
You must be signed in to change notification settings - Fork 47
Jk i651 entity types cleanup #929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test
This refactor changes the following: - Removes name() method from ArtifactTaskInterface - Updated code to use the name of the class in place of the result from name() method - Updated implementations in extra/plugins/artifacts/tasks.py - Updated Examples
- fixed the missing fall-backs to str at the object-creation places - added monkey-patch extension of the dumps default method to dispatch enums out in a uniform way
- find all merge-caused issues - make sure that enum serialization monkey-patch is ignored by mypy - verify that all tests and mypy work as was expected
- apparently, isort disliker "import A as A" constructs
- fixed excessive map use replacing it with the simple list comprehension
- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test - resolved merge conflicts
- rebased and merge-resolved divergencies with div branch
- verifying rebas-merge correctly applied to the branch
- added enum - went over the Dioptra error interfaces to modify common exceptions and errors - cleaned the imports and started enum unit-test - rebased to dev branch - resolved all the merge conflicts
a665543 to
a139f0e
Compare
keithmanville
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only two representations of a resource are the db name and the display name, why not use a standard Enum with db name as the the name and the display name as the value?
Does not address API routes. I thought this was one of the constants we would want to include.
| _type_: The EntityTypes Enum | ||
| """ | ||
| if not resource_type_name: | ||
| # print(f"1. {resource_type_name=} ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove debug prints before merging
| return EntityTypes[name] | ||
| # print(f"{'!' * 80}\n\t{resource_type_name=} Was Not Found in ENUM {'!' * 80}") | ||
| return EntityTypes.UNDEFINED | ||
| # ---------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove these "separator" comments. Not used anywhere else in codebase.
| Returns: | ||
| str: returns inst-ENUM.all-lower-case - snake | ||
| """ | ||
| if self.db_schema_name == EntityTypes.to_snake_case(self.db_schema_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just
return EntityTypes.to_snake_case(self.db_schema_name)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed from the initial iteration.
| new_instance._value_ = ( | ||
| original_name # Set default value to make sure the Enum base works | ||
| ) | ||
| new_instance.db_schema_name = original_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variable names are confusing. We set db_schema_name to original_name and then the get_original_name() returns the db_schema_name. Are we trying to make some distinction between what we call this internally/externally from the enum?
| if resource is None: | ||
| raise EntityDoesNotExistError(self._resource_type, resource_id=resource_id) | ||
| raise EntityDoesNotExistError( | ||
| EntityTypes.get_from_string(self._resource_type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in some cases were are translating to and from string representations when it's not needed. In this case, _resource_type ultimately comes from the calls to generate_resource_tags_endpoint where we pass RESOURCE_TYPE.get_db_schema_name().
We could just store the enum object directly and not need to go back and forth from string representations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and perhaps the way to do this is to do something like this in the ResourceIdService:
@inject
def __init__(self, resource_type: EntityTypes):
self._resource_type = resource_typeAnd then the ResourceTagsIdService would get this modification:
@inject
def __init__(
self,
resource_type: EntityTypes,
resource_id_service: ClassAssistedBuilder["ResourceIdService"],
tag_id_service: TagIdService,
) -> None:
self._resource_id_service = resource_id_service.build(
resource_type=resource_type
)
self._tag_id_service = tag_id_serviceAnd so on.
Of course, you'll also need to find where the resource_type argument is actually passed in each service layer definition and change that to the appropriate enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, with storing the enum reference as much as possible and only output to string when needed.
| ML_MODEL = "ml_model", "M.L. Model" | ||
| ML_MODEL_VERSION = "ml_model_version", "M.L. Model Version" | ||
| # --- Artifact-Related Entities --- | ||
| HAS_DRAFT = "has_draft", "Has Draft" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing several entries that aren't Resources. I thought the intent was to capture only Resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also what I thought. There's a mixture of REST API entities, database tables that contain attributes about a resource type that is a backend implementation detail (the artifact parameters, for example), request parameters like PASSWORD, and HAS_DRAFT which is a derived attribute that we attach to resources in the app's responses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until the end I was not sure if the third or fourth member would be needed for Errors and/or Logs. Neither was needed, but tuples would be easier to expand if I missed some use-cases later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make a strong distinction between REST Resources and internal Entity Types which generally map to a base object. If there are other ways or object that were referred to as an entity type, we need to resolve that. @JustKuzya let me know if you see others and let's discuss.
|
My review mostly posed questions, but it might be more helpful make a recommendation and see what everyone thinks:
I think those three would address my biggest questions with this PR. |
| LOGGER: BoundLogger = structlog.stdlib.get_logger() | ||
|
|
||
| RESOURCE_TYPE: Final[str] = "artifact" | ||
| ARTIFACT_RESOURCE_TYPE: Final[EntityTypes] = EntityTypes.ARTIFACT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think wherever we use EntityTypes now it's cleaner to
from dioptra.restapi.v1.entity_types import EntityTypesand then access the EntityTypes needed as EntityTypes.ARTIFACT.
Basically we don't need the extra ARTIFACT_RESOURCE_TYPE constants floating around anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with this.
| return EntityTypes.to_snake_case(self.db_schema_name) | ||
| # --------------------------------------------------------------------- | ||
|
|
||
| def get_original_name(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove getters and access properties directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To that end, if you want to make these properties "read-only" (well, as much as you can do that in Python), prefer this pattern
def __new__(
cls,
original_name: str,
readable_name: str,
):
"""Default constructor for stuffing EntityTypes Enum
Args:
original_name (str): DB-schema-safe entity type name
readable_name (str): The name to be read by user in UI/UX/Logs/Errors
Returns:
_type_: The created const-style immutable tuple-member the enum
"""
new_instance = object.__new__(cls)
new_instance._value_ = (
original_name # Set default value to make sure the Enum base works
)
new_instance._db_schema_name = original_name
new_instance._ui_print_name = readable_name
return new_instance
@property
def db_schema_name(self) -> str:
return EntityTypes.to_snake_case(self._db_schema_name)
@property
def ui_print_name(self) -> str:
return EntityTypes.to_snake_case(self._ui_print_name)You would also need to remove these from the class definition to make room for the above:
db_schema_name: str # DB-compatible name representation
ui_print_name: str # Yser-friendly name representation
jkglasbrenner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My review focused on the scope of where EntityTypes should (and shouldn't) be used, some implementation details, and code organization. I also left comments on the threads started by @keithmanville when I had something to add.
| # This is the serializer overwrite for the Enum-Derived objects | ||
| # Derived from stack overflow article: | ||
| # https://stackoverflow.com/questions/36699512/is-it-possible-to-dump-an-enum-in-json-without-passing-an-encoder-to-json-dumps | ||
| # | ||
| _saved_default = ( | ||
| # Persisting the current state of the JSON serializer | ||
| JSONEncoder().default | ||
| ) # Save the original default method for json-dumps | ||
|
|
||
|
|
||
| def _new_default(self, obj) -> Any: | ||
| """Default function that would also handle the Enums in serialization | ||
| Args: | ||
| obj (_type_): Python object to serialize | ||
| Returns: | ||
| Any: The method to serialize the object, | ||
| with special handler for the Enum | ||
| """ | ||
| if isinstance(obj, Enum): | ||
| # OUR SPECIAL CASE WHEN THE OBJECT IS ENUM-DERIVED | ||
| return obj.name # Cough up the default property .name or .value or ._value_ | ||
| else: | ||
| # Dispatch bask to the original mechanism of json-dumping for non-Enum objects | ||
| return _saved_default | ||
|
|
||
|
|
||
| # Glue-up the new default method with the patch-back, | ||
| # and tell MyPy not to wrinkle it's nose at the "monkey patch" | ||
| JSONEncoder.default = _new_default # type: ignore | ||
| # End of the monkey-patching the serialization fot he Enums | ||
| # ====================================================================== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of monkeypatch seems brittle and unpredictable, as it only is applied once this file is imported. Depending on how code is imported and called, it's very possible for json.dump to use different versions of the JSONEncoder default during the lifecycle of the application. The original default also is never restored, so this is a permanent override.
The usual way to do this kind of thing is to inherit from the JSONEncoder class, override the default method, and then pass it to json.dump or json.dumps explicitly:
import json
from enum import Enum
class EnumJSONEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, Enum):
return obj.name
# Dispatch bask to the original mechanism of json-dumping for non-Enum objects
return super().default(obj)
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 3
print(json.dumps({"favorite_color": Color.BLUE, "name": "First", "id": 1}, cls=EnumJSONEncoder))which prints
{"favorite_color": "BLUE", "name": "First", "id": 1}Stepping back, where is json.dump or json.dumps being used that requires this approach? Is it an invocation of json.dump or json.dumps that we control in the codebase? Or is it being invoked by one of our dependent libraries, and there's no way to provide the custom encoder?
If we truly need to override this for the entire application like this, then we should consider doing something similar to dioptra.restapi.patches.monkey_patch_flask_restx, which is called within dioptra.restapi.app.create_app. This would better control where in the application lifecycle the behavior gets changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the instance switch suggested, but I don't remember what was breaking without the patch. I agree that probably a uniform way of doing it once and for all enums is better way to handle the Enums serialization.
| @staticmethod | ||
| def to_snake_case(text_to_snake: str) -> str: | ||
| """Converts a string to lower-case snake_case format. | ||
| Handles various input formats like camelCase, PascalCase, | ||
| and strings with spaces, comas, hyphens, colons, | ||
| semicolons, or hyphens. | ||
| Args: | ||
| text_to_snake (str): text to format to snake case | ||
| Returns: | ||
| str: Snake-case formatted name-type string | ||
| """ | ||
| # Bail out for empty input | ||
| if not text_to_snake: | ||
| return "" | ||
| # Replace punctuation with spaces | ||
| text_to_snake = ( | ||
| text_to_snake.replace("-", " ") | ||
| .replace(",", " ") | ||
| .replace(".", " ") | ||
| .replace(";", " ") | ||
| .replace(":", " ") | ||
| ) | ||
| # Replace spaces with underscores and convert to lowercase | ||
| # Insert underscore before uppercase letters in camelCase/PascalCase | ||
| # Handles cases like: | ||
| # "PascalCase" -> "Pascal_Case" and | ||
| # "varInCamelCase" -> "var_In_Camel_Case" | ||
| text_to_snake = re.sub(r"(?<!^)(?=[A-Z])", " ", text_to_snake) | ||
| # Clean multiple spaces by collapsing them, if they happen | ||
| text_to_snake = re.sub(r"\s+", " ", text_to_snake).strip() | ||
| # Replace spaces with underscores | ||
| return text_to_snake.replace(" ", "_").lower() | ||
| # ------------------------------------------------------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consider this to be a general utility function instead of a method that is specific to the details of this class. I recommend extracting this into dioptra.restapi.utils, it would fit nicely right below the slugify function.
Also, after moving the function, I would move the regular expressions into module-level level constants and compile them, since in principle this function could be called many times.
CAMEL_PASCAL_SEP_REGEX = re.compile(r"(?<!^)(?=[A-Z])")
MULTI_SPACE_REGEX = re.compile(r"\s+")
# Used like
CAMEL_PASCAL_SEP_REGEX.sub(" ", text_to_snake)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Really good thought. Except for one thing: I always prefer to use a common prefix, in this care REGEX_... for similar purpose constants - this way the code assist becomes more helpful as all similar use constants have same prefix and pop up as a suggestion, developer only needs to read the suggestions carefully and choose. I know that most of the developers are "writers", not "readers" ☺ , but still it is more helpful than not for the most cases.
| return EntityTypes.to_snake_case(self.db_schema_name) | ||
| # --------------------------------------------------------------------- | ||
|
|
||
| def get_original_name(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To that end, if you want to make these properties "read-only" (well, as much as you can do that in Python), prefer this pattern
def __new__(
cls,
original_name: str,
readable_name: str,
):
"""Default constructor for stuffing EntityTypes Enum
Args:
original_name (str): DB-schema-safe entity type name
readable_name (str): The name to be read by user in UI/UX/Logs/Errors
Returns:
_type_: The created const-style immutable tuple-member the enum
"""
new_instance = object.__new__(cls)
new_instance._value_ = (
original_name # Set default value to make sure the Enum base works
)
new_instance._db_schema_name = original_name
new_instance._ui_print_name = readable_name
return new_instance
@property
def db_schema_name(self) -> str:
return EntityTypes.to_snake_case(self._db_schema_name)
@property
def ui_print_name(self) -> str:
return EntityTypes.to_snake_case(self._ui_print_name)You would also need to remove these from the class definition to make room for the above:
db_schema_name: str # DB-compatible name representation
ui_print_name: str # Yser-friendly name representation| ML_MODEL = "ml_model", "M.L. Model" | ||
| ML_MODEL_VERSION = "ml_model_version", "M.L. Model Version" | ||
| # --- Artifact-Related Entities --- | ||
| HAS_DRAFT = "has_draft", "Has Draft" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also what I thought. There's a mixture of REST API entities, database tables that contain attributes about a resource type that is a backend implementation detail (the artifact parameters, for example), request parameters like PASSWORD, and HAS_DRAFT which is a derived attribute that we attach to resources in the app's responses.
| ENTRYPOINT = "entry_point", "Entry Point" | ||
| ENTRYPOINT_PLUGIN = "entry_point_plugin", "Entry Point Plugin" | ||
| # --- Entry-Point Entities --- | ||
| ENTRY_POINT = "entry_point", "Entry Point" | ||
| ENTRY_POINT_PLUGIN = "entry_point_plugin", "Entry Point Plugin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use one of these everywhere instead of aliasing the two variants? I know there's inconsistency in naming in different sections of the code base (entrypoint is found in the Flask code, while entry_point is used in the SQLAlchemy ORM code), but will that impact which of these get used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DB names seem to be set in stone (or rather "Philosophers Stone") of SQLAlchemy. Meanwhile the Textual representation is supposed to be human readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole idea of the entity was to allow a "translation" between the entity type as it is in the DB, the names it references, and the human readable description. Having an Enum also will likely prove useful for the repository pattern allowing a Repository instance to have a reference to the Entity that it is providing.
| class QueryParameterNotUniqueError(QueryParameterValidationError): | ||
| """Query Parameters failed unique validatation check.""" | ||
|
|
||
| def __init__(self, type: str, **kwargs): | ||
| def __init__(self, type: EntityTypes, **kwargs): | ||
| super().__init__(type, "unique", **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I agree with changing this error to use an enum like this. Query parameters aren't entities, they're input data that's attached to a request sent to the endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with this. The context is the REST resource, not an entity resource.
| def __init__(self, type: EntityTypes, column: str, **kwargs): | ||
| super().__init__( | ||
| f"The sort parameter, {column}, for {type.get_print_name()} is not sortable." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place I'm not sure I would agree we should use EntityTypes. The sorting parameters map to attributes/properties on an Entity (or resource), they are not entities themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type was passed in as a string value to provide context. I am not sure we should have two ways of providing this context. Do you have a suggestion how we can provide the proper context to indicate the properties that are (attempted) for sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after thinking about this a little bit, perhaps the better way to think about this, as not so much mapping to an entity but to a resource, with resource being the actual REST API Path we are mapping to. Therefore the mapping is not to an entity, but to the resource path.
| """Input parameters failed validation.""" | ||
|
|
||
| def __init__(self, type: str, constraint: str, **kwargs): | ||
| def __init__(self, type: EntityTypes, constraint: str, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree that EntityTypes belongs here, see my other comment about query parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of EntityType, this should be the REST resource on which this query parameter exists.
| if resource is None: | ||
| raise EntityDoesNotExistError(self._resource_type, resource_id=resource_id) | ||
| raise EntityDoesNotExistError( | ||
| EntityTypes.get_from_string(self._resource_type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and perhaps the way to do this is to do something like this in the ResourceIdService:
@inject
def __init__(self, resource_type: EntityTypes):
self._resource_type = resource_typeAnd then the ResourceTagsIdService would get this modification:
@inject
def __init__(
self,
resource_type: EntityTypes,
resource_id_service: ClassAssistedBuilder["ResourceIdService"],
tag_id_service: TagIdService,
) -> None:
self._resource_id_service = resource_id_service.build(
resource_type=resource_type
)
self._tag_id_service = tag_id_serviceAnd so on.
Of course, you'll also need to find where the resource_type argument is actually passed in each service layer definition and change that to the appropriate enum.
refactor: move regexes and functionality to module
Added the consistent uniform way to handle entity-types in the code. Rebased to the dev as of Aug 19, resolved all merge conflicts.