TextVectorization: output_mode={multi_hot, count} promise int arrays but output floats #711

Open
@divyashreepathihalli

Issue filed in Keras by @nicdumz - keras-team/keras#18973

Documentation for output_mode currently reads:

"multi_hot": Outputs a single int array per batch, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item.
"count": Like "multi_hot", but the int array contains a count of the number of times the token at that index appeared in the batch item.
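To make the documented contract concrete, here is a minimal pure-Python sketch of what the two modes describe for a single batch item. It is not the Keras implementation; the helper names `count_vector` and `multi_hot_vector`, and the example token indices, are hypothetical and used only for illustration.

```python
from collections import Counter


def count_vector(token_ids, vocab_size):
    """Hypothetical helper: how many times each vocabulary index occurs."""
    counts = Counter(token_ids)
    return [counts.get(i, 0) for i in range(vocab_size)]


def multi_hot_vector(token_ids, vocab_size):
    """Hypothetical helper: 1 where an index occurs at least once, else 0."""
    return [1 if c > 0 else 0 for c in count_vector(token_ids, vocab_size)]


# Assumed example: a batch item whose tokens map to indices 2, 3 and 3
# in a vocabulary of size 4.
ids = [2, 3, 3]
count_vec = count_vector(ids, 4)
multi_hot_vec = multi_hot_vector(ids, 4)
print(count_vec, multi_hot_vec)
```

Both results are lists of Python `int`s, which is the behavior the quoted documentation leads a reader to expect.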

repro

import tensorflow as tf, tensorflow.version as tv

print(f"{tv.VERSION}, {tv.COMPILER_VERSION}, {tv.GIT_VERSION}")

v = tf.keras.layers.TextVectorization(output_mode="count")
v.adapt(["foo", "bar", "baz"])
print(v(["bar baz"]).dtype)

output

2.15.0, Ubuntu Clang 17.0.2 (++20231003073124+b2417f51dbbd-1~exp1~20231003073217.50), v2.15.0-2-g0b15fdfcb3f
<dtype: 'float32'>
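Until the documentation or the layer changes, one possible workaround is an explicit cast on the layer output. This is only a sketch under the same setup as the repro; choosing `tf.int64` as the target dtype is an assumption, not something the documentation specifies.

```python
import tensorflow as tf

v = tf.keras.layers.TextVectorization(output_mode="count")
v.adapt(["foo", "bar", "baz"])

# The layer output is whatever dtype the layer produces (float32 in the
# report above); cast it explicitly to get an integer tensor.
counts = v(tf.constant(["bar baz"]))
counts_int = tf.cast(counts, tf.int64)  # target dtype is an assumption
print(counts.dtype, counts_int.dtype)
```

The cast is lossless here because the counts are whole numbers.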
