Support for bit precision in the Inference API text_embedding task #111747

Open
@jimczi

Description

Some inference API providers now support embedding models in which each dimension is encoded as a single bit. For example, the v3 models from Cohere offer this capability. Since we already handle the bit element type in the dense_vector field, it would be beneficial to extend this support so the text_embedding task of the inference API can output vectors with bit precision.

Typically, bit vectors are paired with float or byte vectors to improve recall by rescoring the hits from bit vectors with higher precision vectors. To support this, we suggest allowing the text_embedding task to generate multiple vectors for the same input at different precisions (e.g., bits + floats or bits + int8). While this functionality is already available in the Cohere API, implementing it in the inference API would optimize performance by eliminating the need to make two separate API calls for each precision, thereby reducing costs for users.
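The two-phase retrieval described above can be sketched outside the inference API. This is an illustrative toy example, not the Cohere or Elasticsearch implementation: float vectors are binarized by sign, candidates are retrieved cheaply by Hamming distance over the bit vectors, and the top candidates are rescored with full-precision dot products.

```python
import numpy as np

def binarize(vec):
    """Quantize a float vector to packed bits: 1 where the component is > 0."""
    return np.packbits(vec > 0)

def hamming(a, b):
    """Hamming distance between two packed bit vectors."""
    return int(np.unpackbits(a ^ b).sum())

# Toy corpus of float embeddings standing in for model output.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 64)).astype(np.float32)
query = rng.standard_normal(64).astype(np.float32)

corpus_bits = np.array([binarize(v) for v in corpus])
query_bits = binarize(query)

# Phase 1: cheap candidate retrieval with Hamming distance over bit vectors.
dists = np.array([hamming(query_bits, d) for d in corpus_bits])
candidates = np.argsort(dists)[:50]

# Phase 2: rescore the candidates with full-precision dot products.
scores = corpus[candidates] @ query
reranked = candidates[np.argsort(-scores)]
```

Generating both precisions in one model call, as proposed here, is what avoids paying for two embedding requests per input.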

This would require the mapping to be defined with two fields, each corresponding to a different precision.
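As a rough sketch of such a mapping (field names are hypothetical, and the dims value is only an example), using the element types dense_vector already supports:

```json
{
  "mappings": {
    "properties": {
      "embedding_bits": {
        "type": "dense_vector",
        "element_type": "bit",
        "dims": 1024
      },
      "embedding_floats": {
        "type": "dense_vector",
        "element_type": "float",
        "dims": 1024
      }
    }
  }
}
```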

Additionally, we should evaluate whether the semantic_text field could natively support this scenario.
