Fix pandas codec #2080

Merged
RobertSamoilescu merged 22 commits into SeldonIO:master from RobertSamoilescu:fix/pandas-codec
Mar 21, 2025

Conversation

@RobertSamoilescu
Contributor

@RobertSamoilescu RobertSamoilescu commented Mar 17, 2025

PandasCodec encodes columns of numeric lists incorrectly: the codec reports the datatype as BYTES, but the content remains a list of numeric values. SC2 then decodes the payload incorrectly (see function here) and panics when it tries to convert a numeric value (e.g., a float) to bytes.

This PR fixes the encoding of object-typed pandas columns so that their content is encoded as str/bytes, matching the datatype declared in the Response/Request. SC2 can then decode the payload correctly.

In addition, this PR adds support for further column types supported by the mlflow runtime: Array, Map, Object, Any. All of them are JSON-convertible objects, which justifies encoding them as JSON.
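As a rough illustration of the idea (not the actual PandasCodec code), the sketch below JSON-encodes every non-string element of an object column into bytes, so the content matches a BYTES datatype. The helper names `encode_object_column` and `decode_object_column` are hypothetical, and a plain Python list stands in for the values of a pandas object column.

```python
import json
from typing import Any, Iterable, List


def encode_object_column(values: Iterable[Any]) -> List[bytes]:
    """Hypothetical helper: turn each element of an object column into bytes.

    bytes pass through, str is UTF-8 encoded, and anything else
    (lists, dicts, numbers) is serialised to JSON first.
    """
    encoded = []
    for value in values:
        if isinstance(value, bytes):
            encoded.append(value)
        elif isinstance(value, str):
            encoded.append(value.encode("utf-8"))
        else:
            encoded.append(json.dumps(value).encode("utf-8"))
    return encoded


def decode_object_column(payload: Iterable[bytes]) -> List[Any]:
    """Hypothetical inverse; assumes every element was JSON-encoded."""
    return [json.loads(item.decode("utf-8")) for item in payload]


# Stand-in for a column of numeric lists, the case that triggered the panic.
column = [[1.0, 2.5], [3.0, 4.5]]
encoded = encode_object_column(column)
decoded = decode_object_column(encoded)
```

With this approach every element sent under the BYTES datatype really is bytes, so a consumer never has to convert a raw float into bytes.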

Note that defining a dedicated JSONCodec might not be possible because it can clash with HuggingfaceListJSONCodec or HuggingfaceJSONCodec: multiple codecs can match the same payload through their can_encode method.
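The clash can be pictured with a toy lookup. Everything in this sketch is hypothetical (the class names, the `find_codecs` helper, and the matching rules); it does not reproduce the MLServer codec registry, it only shows how two `can_encode` implementations can both accept the same payload.

```python
import json
from typing import Any, List


class ListJSONCodecSketch:
    """Hypothetical codec that claims any Python list."""

    name = "list_json"

    @classmethod
    def can_encode(cls, payload: Any) -> bool:
        return isinstance(payload, list)


class GenericJSONCodecSketch:
    """Hypothetical codec that claims anything JSON-serialisable."""

    name = "generic_json"

    @classmethod
    def can_encode(cls, payload: Any) -> bool:
        try:
            json.dumps(payload)
        except (TypeError, ValueError):
            return False
        return True


def find_codecs(payload: Any, codecs: List[type]) -> List[str]:
    """Return the names of all codecs whose can_encode accepts the payload."""
    return [codec.name for codec in codecs if codec.can_encode(payload)]


codecs = [ListJSONCodecSketch, GenericJSONCodecSketch]
# A list of dicts is accepted by both codecs, so the lookup is ambiguous.
matches = find_codecs([{"a": 1}], codecs)
```

Because both sketches accept the list payload, a lookup based only on `can_encode` cannot tell which codec was intended. This is the kind of ambiguity the PR avoids by keeping the JSON handling inside PandasCodec.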

@RobertSamoilescu RobertSamoilescu marked this pull request as draft March 18, 2025 15:21
@RobertSamoilescu RobertSamoilescu marked this pull request as ready for review March 18, 2025 16:54
Contributor

@sakoush sakoush left a comment

LGTM

@RobertSamoilescu RobertSamoilescu merged commit 05365b8 into SeldonIO:master Mar 21, 2025
50 checks passed
RobertSamoilescu added a commit that referenced this pull request Apr 11, 2025
* Implemented JSONCodec

* Encode object type with JSONCodec for pandas

* Included more types for mlflow

* Implemented can_encode function for JSON codec

* Included test for JSON codec

* Fixed pandas codec to treat objects differently

* Included tests for mlflow metadata

* Included test for mlflow runtime

* Fixed can_encode for JSONCodec and a pandas test

* Fixed code registry but need to remove duplicated json encoding code from hf runtime

* Fixed linting issues

* Refactored duplicated code and simplified logic for can_encode

* Renamed util function

* Fixed support to encode numpy and reverted can_encode function

* Fixed bug in pandas encoding

* Reverted comment in pandas codec

* Moved env and cli tests to sequential.

* Included sleep before retrieving metrics

* Run metrics tests seq

* Simplified code by removing JSONCodec

* Included more tests for pandas codecs

* Moved PandasJsonContentType inside PandasCodec.