Is there a way to use NIM API calls instead of having a model deployed on the GPU?
Would changing the values.yaml file when deploying the Helm chart be enough?
Is there an alternative for this, and for the other models such as the reranker and the embedding model?
nemollm-embedding-embedding-deployment-6bdc968784-9v2mm
nemollm-inference-nemollm-infer-deployment-5b88bf7bc-cmxsz
ranking-ms-ranking-deployment-5c7768d88b-zvtt5
These pods correspond to the NVIDIA models, so switching to NIM API calls could also save costs.
Any help would be appreciated.