Description
Feature request
```python
import asyncio

from infinity_emb import AsyncEmbeddingEngine, AsyncEngineArray, EngineArgs
from infinity_emb.primitives import InferenceEngine

# Define sentences for embedding
sentences = ["Embed this sentence via Infinity.", "Paris is in France."]

# Initialize the embedding engine with model specifications
array = AsyncEngineArray.from_args([
    EngineArgs(
        model_name_or_path=r"./models/bce-embedding-base_v1",
        served_model_name="bce-embedding-base_v1",
        lengths_via_tokenize=True,
        engine=InferenceEngine("optimum"),
    )
])

async def embed_text(engine: AsyncEmbeddingEngine):
    await engine.astart()  # initializes the engine
    job1 = asyncio.create_task(engine.embed(sentences=sentences))
    # submit a second job in parallel
    job2 = asyncio.create_task(engine.embed(sentences=["Hello world"]))
    # usage is the total token count according to the tokenizer
    embeddings, usage = await job1
    embeddings2, usage2 = await job2
    # Embeddings are now available for use - they ran in the same batch.
    print(f"for {sentences}, generated {len(embeddings)} embeddings with tot_tokens={usage}")
    await engine.astop()

asyncio.run(embed_text(array["bce-embedding-base_v1"]))
```
```
*************** EP Error ***************
EP Error D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:505 onnxruntime::python::RegisterTensorRTPluginsAsCustomOps Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported. when using ['TensorrtExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
****************************************
```
If onnxruntime fails to load the TensorrtExecutionProvider, it falls back to the CPUExecutionProvider, which causes much slower inference. So we need a parameter in AsyncEmbeddingEngine that passes `providers=['CUDAExecutionProvider', 'TensorrtExecutionProvider']` through to the "optimum" engine, so that only the GPU is used.
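As a rough sketch of the requested behaviour (the `select_providers` helper and `ProviderNotAvailableError` below are hypothetical, not part of the current infinity_emb or onnxruntime API): the caller lists acceptable execution providers in priority order, and the engine raises instead of silently falling back to the CPU when none of them is available.

```python
from typing import Sequence


class ProviderNotAvailableError(RuntimeError):
    """Raised when none of the requested execution providers is available."""


def select_providers(requested: Sequence[str], available: Sequence[str]) -> list[str]:
    """Return the requested providers that the runtime actually offers,
    preserving the caller's priority order. Raises instead of silently
    falling back to CPUExecutionProvider."""
    usable = [p for p in requested if p in available]
    if not usable:
        raise ProviderNotAvailableError(
            f"None of {list(requested)} is available; runtime offers {list(available)}. "
            "Refusing to fall back to CPUExecutionProvider."
        )
    return usable


# In a real integration, `available` would come from
# onnxruntime.get_available_providers(); hard-coded here for illustration.
available = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
print(select_providers(["CUDAExecutionProvider", "TensorrtExecutionProvider"], available))
```

Because `CPUExecutionProvider` is never in the requested list, a machine without working TensorRT/CUDA libraries would fail loudly at startup rather than degrade to slow CPU inference.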
Motivation
To prevent the "optimum" engine from silently falling back to the CPUExecutionProvider.
Your contribution
no