[#5202] feat(client-python): Support Gravitino Type Serdes - serializer #6903
for implementing various customized DataClassJson serializer/deserializer apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
aimed at facilitating customized DataClassJson serializer/deserializer apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add write data type in SerdesUtils apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add TypeSerializer apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for TypesSerializer apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add write struct type in SerdesUtils apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for serializing struct type apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
def test_list_type_of_primitive_and_none_types_serdes(self):
    for simple_string, type_ in self._primitive_and_none_types.items():
        list_type = Types.ListType.of(element_type=type_, element_nullable=False)
        serialized_result = TypesSerdes.serialize(list_type)
        self.assertEqual(serialized_result.get(SerdesUtils.TYPE), SerdesUtils.LIST)
        self.assertEqual(
            serialized_result.get(SerdesUtils.LIST_ELEMENT_TYPE), simple_string
        )
        self.assertEqual(
            serialized_result.get(SerdesUtils.LIST_ELEMENT_NULLABLE), False
        )
apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for serializing list type apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add write map type in SerdesUtils apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for serializing map type apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add write union type in SerdesUtils apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for serializing union type apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add write external type in SerdesUtils apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for serializing external type apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add write unparsed type in SerdesUtils apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for serializing unparsed type apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
revise TypeVar names to conform with naming patterns apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
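For readers following the commits above, here is a minimal, stdlib-only sketch of the kind of output the list-type test checks: a primitive type serializes to its simple string, and a list type serializes to a dict carrying the type tag, the element type, and the element nullability. The classes `PrimitiveType` and `ListType`, the function `write_data_type`, and the key strings are hypothetical stand-ins, not the actual `Types` or `SerdesUtils` definitions from the PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union

# Hypothetical key names standing in for the SerdesUtils constants;
# the real values live in the PR and may differ.
TYPE = "type"
LIST = "list"
LIST_ELEMENT_TYPE = "elementType"
LIST_ELEMENT_NULLABLE = "containsNull"


@dataclass(frozen=True)
class PrimitiveType:
    """Stand-in for a primitive type, identified by its simple string."""

    simple_string: str


@dataclass(frozen=True)
class ListType:
    """Stand-in for a list type: an element type plus its nullability."""

    element_type: Union["PrimitiveType", "ListType"]
    element_nullable: bool


def write_data_type(data_type: Union[PrimitiveType, ListType]) -> Union[str, Dict[str, Any]]:
    """Serialize a type to a JSON-compatible value, recursing into list elements."""
    if isinstance(data_type, PrimitiveType):
        return data_type.simple_string
    if isinstance(data_type, ListType):
        return {
            TYPE: LIST,
            LIST_ELEMENT_TYPE: write_data_type(data_type.element_type),
            LIST_ELEMENT_NULLABLE: data_type.element_nullable,
        }
    raise TypeError(f"Unsupported type: {data_type!r}")


serialized = write_data_type(ListType(PrimitiveType("integer"), element_nullable=False))
```

Nested lists fall out of the recursion: the element type of the outer list is itself serialized to a dict.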
@tsungchih Could you explain why you need to implement a
Reference: |
@unknowntpo Thanks for your comments. That is because all of the current Gravitino IMHO, here comes with two solutions to serialize/deserialize
I'm going to elaborate more by taking
We may have it to support for
Note that, however, the Regards |
apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
due to moving json_serdes apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
as one JsonSerializable interface apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
due to interface change apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
for TypeSerdes apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
@xunliu @unknowntpo Based on the comments, I made the following changes:
In the end, the
from gravitino.api.types.json_serdes.type_serdes import TypeSerdes

class ColumnDTO(Column):
    _data_type: Type = field(
        metadata=config(
            field_name="data_type",
            encoder=TypeSerdes.serialize,
            decoder=TypeSerdes.deserialize,
        )
    )

Please take a look at it when you're available. Thanks!
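As an illustration of what per-field encoder and decoder hooks do, the following stdlib-only sketch stores two callables in a dataclass field's metadata and applies them by hand during conversion. It does not use dataclasses_json and is not how that library is implemented internally; `SimpleType`, `ColumnSketch`, `to_dict`, and `from_dict` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass(frozen=True)
class SimpleType:
    """Hypothetical stand-in for a Gravitino type."""

    name: str


def serialize_type(data_type: SimpleType) -> str:
    """Stand-in for a serializer hook: type object -> JSON value."""
    return data_type.name


def deserialize_type(raw: str) -> SimpleType:
    """Stand-in for a deserializer hook: JSON value -> type object."""
    return SimpleType(raw)


@dataclass
class ColumnSketch:
    name: str
    # The hooks travel with the field, so the conversion code stays generic.
    data_type: SimpleType = field(
        metadata={"encoder": serialize_type, "decoder": deserialize_type}
    )


def to_dict(obj: Any) -> Dict[str, Any]:
    """Convert a dataclass instance to a dict, applying any field encoder."""
    out: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        encoder = f.metadata.get("encoder")
        out[f.name] = encoder(value) if encoder else value
    return out


def from_dict(cls: type, data: Dict[str, Any]) -> Any:
    """Build a dataclass instance from a dict, applying any field decoder."""
    kwargs: Dict[str, Any] = {}
    for f in fields(cls):
        decoder = f.metadata.get("decoder")
        raw = data[f.name]
        kwargs[f.name] = decoder(raw) if decoder else raw
    return cls(**kwargs)


column = ColumnSketch(name="id", data_type=SimpleType("integer"))
as_dict = to_dict(column)
restored = from_dict(ColumnSketch, as_dict)
```

The point of the pattern is that the DTO declares which serializer applies to which field, while the type-specific logic stays in one place.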
@classmethod
@abstractmethod
def serialize(cls, data_type: GravitinoTypeT) -> DeserializedTypeT:
@tsungchih Can we use Json to add some constraints to GravitinoTypeT and DeserializedTypeT?
I found that the config.encoder and config.decoder parameters carry a comment about the type of the Callable: it says B is bound as a JSON type, and dataclasses_json.core has a Json type.
So I'm wondering whether we can use it to constrain GravitinoTypeT and DeserializedTypeT.
# in dataclasses-json cfg.py
def config(metadata: Optional[dict] = None, *,
           # TODO: these can be typed more precisely
           # Specifically, a Callable[A, B], where `B` is bound as a JSON type
           encoder: Optional[Callable] = None,
           decoder: Optional[Callable] = None,
@unknowntpo Sure, it would be better to have an appropriate type bound. I wasn't aware of the Json type, so allow me to investigate it. Thanks for your prompt comments.
@unknowntpo I submitted a commit that binds DeserializedTypeT to the Json type and GravitinoTypeT to Type. Thanks for the comments.
@tsungchih OK, but I found another problem:
def serialize(cls, data_type: GravitinoTypeT) -> DeserializedTypeT:
This method serializes data_type to a Json type, so I think DeserializedTypeT should be named SerializedTypeT. Or we can simply use the Json type:
def serialize(cls, data_type: GravitinoTypeT) -> Json:
@unknowntpo Yes, you're right. I didn't notice that. I changed it to use Json instead. Thanks for the comments.
LGTM.
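The outcome of the review thread above can be sketched as follows: the input type variable is bound to a base `Type` class, and `serialize` returns a plain `Json` alias instead of a second type variable. This is a self-contained approximation, not the code merged in the PR. The `Json` alias is defined locally to mirror the one in `dataclasses_json.core` as I understand it, and `Type`, `IntegerType`, and `IntegerTypeSerdes` are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar, Union

# Local alias, assumed to match dataclasses_json.core.Json.
Json = Union[dict, list, str, int, float, bool, None]


class Type(ABC):
    """Hypothetical stand-in for the Gravitino Type base class."""

    @abstractmethod
    def simple_string(self) -> str:
        """Return the short textual name of the type."""


class IntegerType(Type):
    def simple_string(self) -> str:
        return "integer"


# Bound so that a type checker rejects serializers for non-Type inputs.
GravitinoTypeT = TypeVar("GravitinoTypeT", bound=Type)


class JsonSerializable(ABC, Generic[GravitinoTypeT]):
    """Interface pairing a serializer with its deserializer."""

    @classmethod
    @abstractmethod
    def serialize(cls, data_type: GravitinoTypeT) -> Json:
        """Convert a type object to a JSON-compatible value."""

    @classmethod
    @abstractmethod
    def deserialize(cls, data: Json) -> GravitinoTypeT:
        """Convert a JSON-compatible value back to a type object."""


class IntegerTypeSerdes(JsonSerializable[IntegerType]):
    @classmethod
    def serialize(cls, data_type: IntegerType) -> Json:
        return data_type.simple_string()

    @classmethod
    def deserialize(cls, data: Json) -> IntegerType:
        if data != "integer":
            raise ValueError(f"Cannot deserialize {data!r} as IntegerType")
        return IntegerType()


encoded = IntegerTypeSerdes.serialize(IntegerType())
decoded = IntegerTypeSerdes.deserialize(encoded)
```

Returning `Json` directly avoids the naming confusion raised in the thread, since a return-only type variable adds no constraint that the alias does not already express.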
for GravitinoTypeT and DeserializedTypeT apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
Hi @tsungchih, thank you for your contributions.
apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
apache#5202 Signed-off-by: George T. C. Lai <[email protected]>
The title has been changed. Thanks for the suggestion.
@xunliu LGTM.
LGTM, thanks @tsungchih for your work.
…rializer (apache#6903)

### What changes were proposed in this pull request?

This is the second part (totally 4 planned) of implementation to the following classes from Java to support Column and its default value, including:

- JsonUtils
- TypeSerializer

The `TypeSerializer` will be used in the incoming `ColumnDTO` implementation to serialize `data_type` field.

### Why are the changes needed?

We need to support Column and its default value in python client. apache#5202

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

---------

Signed-off-by: George T. C. Lai <[email protected]>
What changes were proposed in this pull request?
This is the second of four planned parts porting the following Java classes to support Column and its default value:
- JsonUtils
- TypeSerializer
The TypeSerializer will be used in the upcoming ColumnDTO implementation to serialize the data_type field.
Why are the changes needed?
We need to support Column and its default value in the Python client.
#5202
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests