UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128) #2260

Open
@mon95

Describe the bug
I'm seeing the following error during conversion: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128). The code in tf_utils.py (https://github.com/onnx/tensorflow-onnx/blob/main/tf2onnx/tf_utils.py#L57) appears to treat this error as expected, but its fallback, np.vectorize(lambda x: x.decode('UTF-8')), also fails with a similar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1024: invalid start byte
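
For anyone trying to reproduce the two messages outside the converter, here is a minimal standard-library sketch. The payload is made up (1024 ASCII bytes followed by 0xff) because the actual tensor contents are not known, and the helper name decode_error is hypothetical. It only illustrates why switching from ASCII to UTF-8 cannot help here: 0xff never occurs in valid UTF-8, so both codecs stop at the same offset.

```python
# Hypothetical payload: 1024 ASCII bytes, then 0xff, so the bad byte sits at the
# same offset as in the report. The real tensor contents are unknown.
payload = b"a" * 1024 + b"\xff" + b"tail"


def decode_error(data: bytes, encoding: str):
    """Return the UnicodeDecodeError raised by data.decode(encoding), or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc
    return None


ascii_err = decode_error(payload, "ascii")
utf8_err = decode_error(payload, "utf-8")
print(ascii_err)
print(utf8_err)

# For inspection only: escape undecodable bytes instead of raising, so the
# neighbourhood of the failing offset can be read.
escaped = payload[1020:1030].decode("utf-8", errors="backslashreplace")
print(repr(escaped))
```

Looking at the bytes around the failing position this way may help tell whether the string constant is really text or some binary blob, which is only a guess at this point.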

Urgency
N/A

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04*): GCP VM
  • TensorFlow Version: 2.9.00
  • Python version: 3.7.12
  • ONNX version (if applicable, e.g. 1.11*): onnx-1.14.1 (installed via pip install git+https://github.com/onnx/tensorflow-onnx)
  • ONNXRuntime version (if applicable, e.g. 1.11*): onnxruntime-1.14.1

To Reproduce
The model is a custom DCN v2 model built using libraries from the tensorflow ecosystem. This includes TFRS (recommender systems), TFR (ranking), TF Text, TF IO, and TF Transform. The model is saved using tf.saved_model.save(..).

Screenshots
[Image: Screen Shot 2023-10-25 at 2 00 33 PM]

Additional context

  1. I found that a whole set of ops in the model don't seem to be present in the supported list of ops. But based on the troubleshooting guide, the error I'm seeing here looks different from the one mentioned in the guide. Is it possible that the decode errors are due to the unsupported ops?
  • Missing ops:
    • Bucketize
    • AssignVariableOp
    • InitializeTableFromTextFileV2
    • LookupTableImportV2
    • MergeV2Checkpoints
    • ReadVariableOp
    • ResourceGather
    • RestoreV2
    • SaveV2
    • ShardedFilename
    • StatefulPartitionedCall
    • StaticRegexFullMatch
    • VarHandleOp
    • TFText>WhitespaceTokenizeWithOffsetsV2
    • VarIsInitializedOp
       
  • Supported via ai.onnx.contrib:
    • StaticRegexReplace
    • StringJoin
    • StringSplitV2
    • StringToHashBucketFast
  2. Not sure if this is relevant, but I previously found that the conversion does not proceed unless tensorflow_text is explicitly imported. To work around this, I use a custom script (shared below) that invokes tf2onnx.convert.main().
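import tensorflow as tf
# The next two imports are unused by name; they are here only for their side
# effects (the conversion does not proceed without tensorflow_text).
import tensorflow_text as tf_text
import tensorflow_transform as tft

import tf2onnx.convert

print("Done importing custom tf modules...")
print("Invoking tf2onnx.convert.main()...")
# Runs the tf2onnx command-line entry point with this script's arguments.
tf2onnx.convert.main()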
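
For completeness, a sketch of how such a wrapper might be driven with explicit arguments instead of the shell command line. The paths are placeholders, the helper names build_convert_argv and run_conversion are hypothetical, and it assumes tf2onnx.convert.main() parses sys.argv like the regular CLI. The --extra_opset value is the one the tf2onnx README gives for the ai.onnx.contrib ops; whether it has any effect on the decode error is not known.

```python
import sys


def build_convert_argv(saved_model_dir, output_path, extra_opset="ai.onnx.contrib:1"):
    """Build an argv list for the tf2onnx CLI; the paths are placeholders."""
    return [
        "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--output", output_path,
        "--extra_opset", extra_opset,
    ]


def run_conversion(argv):
    """Invoke the tf2onnx CLI entry point (requires tf2onnx and tensorflow_text)."""
    import tensorflow_text  # noqa: F401  (side-effect import, as in the script above)
    import tf2onnx.convert

    sys.argv = argv  # assumption: main() reads its arguments from sys.argv
    tf2onnx.convert.main()


argv = build_convert_argv("path/to/saved_model", "model.onnx")
print(" ".join(argv))
# run_conversion(argv)  # uncomment in an environment with the packages installed
```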

I've also tried switching my numpy version to 1.20, as suggested in one of the GitHub issues, but that doesn't resolve the error either. Would appreciate your help with this!

Labels

  • bug: An unexpected problem or unintended behavior
  • pending on user response: Waiting for more information or validation from user
