UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128)

**Describe the bug**
I'm seeing `UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128)`. The code in tf_utils.py (https://github.com/onnx/tensorflow-onnx/blob/main/tf2onnx/tf_utils.py#L57) seems to treat this error as expected, but its fallback, `np.vectorize(lambda x: x.decode('UTF-8'))`, fails with a similar error: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1024: invalid start byte`.
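
For reference, here is a minimal sketch that reproduces both messages outside of tf2onnx. The payload `raw` and the helper `try_decode` are hypothetical and made up for illustration; the actual bytes in my model may differ. The point is that a `0xff` byte is valid in neither ASCII nor UTF-8, so falling back from one codec to the other cannot succeed for such a value.

```python
# Hypothetical payload: 1024 ASCII bytes followed by a single 0xff byte.
# 0xff never appears in valid UTF-8, and it is outside the ASCII range.
raw = b"a" * 1024 + b"\xff"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return str(err)


ascii_result = try_decode(raw, "ascii")
utf8_result = try_decode(raw, "utf-8")

print(ascii_result)
print(utf8_result)

# Decoding with errors="replace" does not raise; the bad byte becomes U+FFFD.
# This is lossy and shown only to illustrate that the input is not clean text.
lossy = raw.decode("utf-8", errors="replace")
```

If the tensor really holds binary data rather than text, a lossy decode like the last line would hide the problem rather than fix it, so I have not tried patching the fallback this way.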

**Urgency**
N/A

**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 18.04*): GCP VM
- TensorFlow Version: 2.9.00
- Python version: 3.7.12
- ONNX version (if applicable, e.g. 1.11*): onnx-1.14.1 (installed via pip install git+https://github.com/onnx/tensorflow-onnx)
- ONNXRuntime version (if applicable, e.g. 1.11*): onnxruntime-1.14.1


**To Reproduce**
The model is a custom DCN v2 model built using libraries from the tensorflow ecosystem. This includes TFRS (recommender systems), TFR (ranking), TF Text, TF IO, and TF Transform. The model is saved using `tf.saved_model.save(..)`.


**Screenshots**
![Screen Shot 2023-10-25 at 2 00 33 PM](https://github.com/onnx/tensorflow-onnx/assets/5566004/0322d6e9-679d-4037-aedd-76ac5e365ded)


**Additional context**

1. I found that a number of ops in the model are not in the list of supported ops. However, the error I'm seeing here looks different from the one described in the troubleshooting guide. Is it possible that the decode errors are caused by the unsupported ops?

* Missing ops:
    * Bucketize
    * AssignVariableOp
    * InitializeTableFromTextFileV2
    * LookupTableImportV2
    * MergeV2Checkpoints
    * ReadVariableOp
    * ResourceGather
    * RestoreV2
    * SaveV2
    * ShardedFilename
    * StatefulPartitionedCall
    * StaticRegexFullMatch
    * VarHandleOp
    * TFText>WhitespaceTokenizeWithOffsetsV2
    * VarIsInitializedOp
 
* Supported via ai.onnx.contrib:
    * StaticRegexReplace
    * StringJoin
    * StringSplitV2
    * StringToHashBucketFast

2. Not sure if this is relevant, but I previously found that the conversion doesn't proceed unless I explicitly import `tensorflow_text`. To do this, I use a custom script (shared below) that invokes `tf2onnx.convert.main()`.

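```
import tensorflow as tf
# Imported only for their side effects (e.g. tensorflow_text registers its
# custom ops on import); the names themselves are not used below.
import tensorflow_text as tf_text
import tensorflow_transform as tft

import tf2onnx.convert

print("Done importing custom tf modules...")
print("Invoking tf2onnx.convert.main()...")
# main() reads its options from the command-line arguments of this script.
tf2onnx.convert.main()
```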


I've tried switching my NumPy version to 1.20, as suggested in one of the GitHub issues, but this doesn't work either. Would appreciate your help with this!

