TODO in Metafile.get_class() (storage/model/metafile.py) #501
sahil-03 wants to merge 1 commit into ray-project:2.0
Conversation
# Sample new implementation:
# if "_type" in serialized_dict:
#     type_code = serialized_dict["_type"]
#     if type_code in SOME_MAPPING:
#         return SOME_MAPPING[type_code]
#     else:
#         raise an error
# else:
#     raise an error
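For reference, the commented pseudocode above could be made runnable roughly as follows. This is only a sketch under stated assumptions: `TYPE_CODE_TO_CLASS` stands in for `SOME_MAPPING`, the `Namespace` and `Table` classes are empty placeholders rather than the real model classes, and `ValueError` is an arbitrary choice for the unspecified "raise an error" branches.

```python
from typing import Any, Dict, Type


# Placeholder stand-ins for the real metafile model classes (assumption).
class Namespace(dict):
    pass


class Table(dict):
    pass


# Hypothetical mapping from the "_type" field to a class; the PR calls
# this SOME_MAPPING and does not define its contents.
TYPE_CODE_TO_CLASS: Dict[str, Type[dict]] = {
    "namespace": Namespace,
    "table": Table,
}


def get_class(serialized_dict: Dict[str, Any]) -> Type[dict]:
    """Resolve the class of a serialized metafile from its "_type" field."""
    if "_type" not in serialized_dict:
        raise ValueError("serialized metafile has no '_type' field")
    type_code = serialized_dict["_type"]
    if type_code not in TYPE_CODE_TO_CLASS:
        raise ValueError(f"unknown metafile type code: {type_code!r}")
    return TYPE_CODE_TO_CLASS[type_code]
```

Because the lookup keys on an explicit field rather than on class or key names, renaming a class only requires updating the mapping, not the serialized data.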
I like the proposal - it's definitely an improvement on what's currently here. Space optimality of metafiles isn't the biggest concern right now, since we generally expect things like S3 put/get latency to outweigh the cost of downloading the bytes as long as they all stay in the KiB size range.
With that being said, I think I'd still prefer to just use an integer-based enum here so we don't waste more space than we need to when serializing with msgpack. This will probably be most relevant for Deltas, which will commonly be present in the millions or billions per table, so extra bytes will start to add up here (which reminds me of another important TODO: store the largest part of the Delta, the Manifest, in a separate file whenever its in-memory size is larger than, say, 128 KiB).
WDYT about an enum like this that we add to storage/model/types.py:
from enum import Enum

class MetafileType(int, Enum):
    NAMESPACE = 1
    TABLE = 2
    TABLE_VERSION = 3
    STREAM = 4
    PARTITION = 5
    DELTA = 6
# usage examples
>>> MetafileType(1)
<MetafileType.NAMESPACE: 1>
>>> MetafileType(1).value
1
>>> MetafileType(1).name
'NAMESPACE'
>>> MetafileType.NAMESPACE
<MetafileType.NAMESPACE: 1>
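To make the suggestion concrete, here is one way the integer enum might plug into `get_class()`. It is a sketch, not the project's implementation: the `_type` field name is taken from the PR's pseudocode, while `METAFILE_TYPE_TO_CLASS`, `tag_metafile`, and the empty `Namespace` and `Delta` classes are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import Any, Dict, Type


class MetafileType(int, Enum):
    NAMESPACE = 1
    TABLE = 2
    TABLE_VERSION = 3
    STREAM = 4
    PARTITION = 5
    DELTA = 6


# Placeholder stand-ins for the real metafile model classes (assumption).
class Namespace(dict):
    pass


class Delta(dict):
    pass


# Hypothetical registry; only two types are registered to keep this short.
METAFILE_TYPE_TO_CLASS: Dict[MetafileType, Type[dict]] = {
    MetafileType.NAMESPACE: Namespace,
    MetafileType.DELTA: Delta,
}


def tag_metafile(metafile: Dict[str, Any], metafile_type: MetafileType) -> Dict[str, Any]:
    """Return a copy of the metafile with its type stored as a plain int."""
    tagged = dict(metafile)
    tagged["_type"] = metafile_type.value
    return tagged


def get_class(serialized_dict: Dict[str, Any]) -> Type[dict]:
    """Resolve the class of a serialized metafile from its integer "_type"."""
    if "_type" not in serialized_dict:
        raise ValueError("serialized metafile has no '_type' field")
    # MetafileType(...) raises ValueError for integers outside the enum.
    metafile_type = MetafileType(serialized_dict["_type"])
    if metafile_type not in METAFILE_TYPE_TO_CLASS:
        raise ValueError(f"no class registered for {metafile_type.name}")
    return METAFILE_TYPE_TO_CLASS[metafile_type]
```

Storing `metafile_type.value` keeps the serialized field a small integer; msgpack encodes integers from 0 to 127 in a single byte, whereas a short string such as "delta" takes one header byte plus one byte per character.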
Summary
Addresses the TODO in the Metafile.get_class() function in storage/model/metafile.py.
Rationale
This PR proposes a more robust get_class method. Right now, it is vulnerable to name changes and is somewhat "hard-coded". To make the implementation more reliable, I suggest adding a new field to each object by which its type can be reliably and easily recognized.
Changes
Changes will (potentially) include a new field being added to each object. So far, I have provided a design suggestion and sample implementation.
Impact
Discuss any potential impacts the changes may have on existing functionalities.
Testing
Describe how the changes have been tested, including both automated and manual testing strategies.
If this is a bugfix, explain how the fix has been tested to ensure the bug is resolved without introducing new issues.
Regression Risk
If this is a bugfix, assess the risk of regression caused by this fix and steps taken to mitigate it.
Checklist
Unit tests covering the changes have been added
E2E testing has been performed
Additional Notes
Any additional information or context relevant to this PR.