Description
Feature request
```python
import asyncio

from infinity_emb import AsyncEmbeddingEngine, AsyncEngineArray, EngineArgs
from infinity_emb.primitives import InferenceEngine

# Define sentences for embedding
sentences = ["Embed this sentence via Infinity.", "Paris is in France."]

# Initialize the embedding engine with model specifications
array = AsyncEngineArray.from_args([
    EngineArgs(
        model_name_or_path=r"./models/bce-embedding-base_v1",
        served_model_name="bce-embedding-base_v1",
        lengths_via_tokenize=True,
        engine=InferenceEngine("optimum"),
    )
])

async def embed_text(engine: AsyncEmbeddingEngine):
    await engine.astart()  # initializes the engine
    job1 = asyncio.create_task(engine.embed(sentences=sentences))
    # submit a second job in parallel
    job2 = asyncio.create_task(engine.embed(sentences=["Hello world"]))
    # usage is the total token count according to the tokenizer
    embeddings, usage = await job1
    embeddings2, usage2 = await job2
    # Embeddings are now available for use - they ran in the same batch.
    print(f"for {sentences}, generated {len(embeddings)} embeddings with tot_tokens={usage}")
    await engine.astop()

asyncio.run(embed_text(array["bce-embedding-base_v1"]))
```
```
*************** EP Error ***************
EP Error D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:505 onnxruntime::python::RegisterTensorRTPluginsAsCustomOps Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported. when using ['TensorrtExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
****************************************
```
If onnxruntime fails to load the TensorrtExecutionProvider, it falls back to the CPUExecutionProvider, which causes much slower inference. So we need a parameter in AsyncEmbeddingEngine that passes `providers=['CUDAExecutionProvider', 'TensorrtExecutionProvider']` through to the "optimum" engine, so that only the GPU is used.
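As a rough sketch of the requested behaviour (the `select_providers` helper and `ProviderNotAvailableError` below are hypothetical, not part of the current infinity_emb or onnxruntime API): the caller lists acceptable execution providers in priority order, and the engine raises instead of silently falling back to the CPU when none of them is available.

```python
from typing import Sequence


class ProviderNotAvailableError(RuntimeError):
    """Raised when none of the requested execution providers is available."""


def select_providers(requested: Sequence[str], available: Sequence[str]) -> list[str]:
    """Return the requested providers that the runtime actually offers,
    preserving the caller's priority order. Raises instead of silently
    falling back to CPUExecutionProvider."""
    usable = [p for p in requested if p in available]
    if not usable:
        raise ProviderNotAvailableError(
            f"None of {list(requested)} is available; runtime offers {list(available)}. "
            "Refusing to fall back to CPUExecutionProvider."
        )
    return usable


# In a real integration, `available` would come from
# onnxruntime.get_available_providers(); hard-coded here for illustration.
available = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
print(select_providers(["CUDAExecutionProvider", "TensorrtExecutionProvider"], available))
```

Because `CPUExecutionProvider` is never in the requested list, a machine without working TensorRT/CUDA libraries would fail loudly at startup rather than degrade to slow CPU inference.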
Motivation
To prevent the "optimum" engine from silently falling back to the CPUExecutionProvider.
Your contribution
no