feat(nvidia-nim): dynamic model discovery via integrate.api.nvidia.com (#1099) #1177

0xghost42 wants to merge 1 commit into
Conversation
…a.com (Gitlawb#1099)

Mirror the Gitlawb#1143 Groq hybrid-catalog pattern for the NVIDIA NIM gateway: replace the single-entry static catalog with discovery against https://integrate.api.nvidia.com/v1/models and a filter that excludes embedding, retriever, reranker, ASR (whisper, parakeet, canary, riva), TTS, image-gen (SDXL, flux, stable-diffusion, kosmos, florence), safety (llama-guard, nemoguard, content-safety), and reward models, so the `/model` picker only surfaces chat/instruct ids.

Settings match Groq's:

- `catalog.source: hybrid` (keep the existing Nemotron 70B as the static fallback when discovery is unavailable)
- `discoveryCacheTtl: 1d`
- `discoveryRefreshMode: background-if-stale`
- `allowManualRefresh: true`

Adds a focused gateway test (`nvidia-nim.test.ts`) pinning the filter regex against representative real NVIDIA model ids — keeps, embedding drops, ASR drops, image-gen drops, safety drops, inactive drops, plus `context_window` forwarding — so the filter does not silently start admitting non-chat models as NVIDIA's catalog grows.

The existing `src/utils/model/nvidiaNimModels.ts` env-var path (used when users set `NVIDIA_NIM` or detect via `OPENAI_BASE_URL`) is unchanged for now; its hand-rolled list keeps working. Wiring that path through the discovery service is a separate, larger change.
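The filter described above could be sketched roughly as follows. This is illustrative only: the pattern name `NVIDIA_NON_CHAT_PATTERN` matches the PR text, but the exact alternation and the `mapModel` signature are assumptions, not the merged source.

```typescript
// Sketch of the non-chat blacklist described above; the real pattern in
// nvidia-nim.ts may differ in detail. Case-insensitive substring match.
const NVIDIA_NON_CHAT_PATTERN =
  /embed|retriev|rerank|whisper|parakeet|canary|riva|tts|sdxl|flux|stable-diffusion|kosmos|florence|nvclip|guard|content-safety|reward/i;

interface NvidiaModelEntry {
  id: string;
  active?: boolean;
  context_window?: number;
}

// Maps a raw /v1/models entry to a catalog entry, or null to drop it.
function mapModel(
  entry: NvidiaModelEntry,
): { id: string; contextWindow?: number } | null {
  if (!entry.id || entry.active === false) return null; // empty id / inactive
  if (NVIDIA_NON_CHAT_PATTERN.test(entry.id)) return null; // non-chat model
  return { id: entry.id, contextWindow: entry.context_window };
}
```

Returning `null` from `mapModel` is what keeps an entry out of the `/model` picker; `context_window` is forwarded when the catalog provides it.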
jatmn left a comment
Findings
- [P1] Keep live non-chat NVIDIA entries out of discovery

  `src/integrations/gateways/nvidia-nim.ts:65`

  The blacklist does not catch several non-chat models that the live https://integrate.api.nvidia.com/v1/models endpoint currently returns, so this change still pollutes `/model` with entries that are not usable chat/instruct choices. For example, running this PR's `mapModel` against the live catalog accepts `baai/bge-m3`, `google/deplot`, `nvidia/ai-synthetic-video-detector`, `nvidia/gliner-pii`, and `nvidia/ising-calibration-1-35b-a3b` as normal catalog entries. That directly undercuts the PR goal of showing only chat ids and will lead users to select models that fail later. Please make the filter positive for known chat/instruct/reasoning/code patterns, or otherwise expand the mapping/tests with real current NVIDIA non-chat ids before enabling hybrid discovery.
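One way to implement the positive filter this finding asks for is an allowlist keyed on chat-style naming. The pattern below is a sketch with an illustrative, non-exhaustive alternation — real chat-model naming on NVIDIA's catalog is broader, and the pattern name is hypothetical:

```typescript
// Allowlist sketch: accept only ids that look like chat/instruct/reasoning/
// code models, instead of blacklisting every non-chat family.
// The alternation here is illustrative, not exhaustive.
const NVIDIA_CHAT_PATTERN = /instruct|chat|nemotron|reasoning|coder|-it\b/i;

function isChatModel(id: string): boolean {
  return NVIDIA_CHAT_PATTERN.test(id);
}
```

Under this approach the live non-chat ids cited above (`baai/bge-m3`, `google/deplot`, `nvidia/gliner-pii`, …) fall through naturally because they never match, at the cost of having to extend the allowlist when NVIDIA introduces a new chat-model naming scheme.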
- [P2] Let inline `/model <id>` accept discovered NVIDIA models

  `src/utils/model/validateModel.ts:53`

  The picker can now surface dynamically discovered NVIDIA models, but the inline `/model some-id` path still validates NVIDIA NIM only against the legacy hard-coded `nvidiaNimModels.ts` list and returns before it can use descriptor discovery or an API probe. I reproduced this with a live discovered id: `abacusai/dracarys-llama-3.1-70b-instruct` is accepted by this PR's `mapModel`, but `validateModel()` rejects it with "not found in NVIDIA NIM catalog" because it is not in the old static list. Users who type a discovered model name, or scripts that use `/model <id>`, will be blocked even though the descriptor catalog now knows about the model. Please route NVIDIA validation through the descriptor discovery cache/catalog, or at least fall through to the existing API validation for ids outside the legacy list.
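A fallthrough along the lines suggested here might look like the following. All names (`validateNvidiaModel`, the `Catalog` shape, the parameters) are hypothetical, not the actual `validateModel.ts` API — the point is only the ordering: legacy list first, then the discovery catalog, and only then a rejection.

```typescript
// Hypothetical sketch: check the legacy static list first, then fall
// through to the descriptor discovery catalog before rejecting.
// None of these helper names are from the actual codebase.
type Catalog = { has(id: string): boolean };

function validateNvidiaModel(
  id: string,
  legacyList: string[],
  discoveredCatalog: Catalog | null,
): { ok: boolean; reason?: string } {
  if (legacyList.includes(id)) return { ok: true };
  // New step: consult the discovery cache instead of rejecting outright.
  if (discoveredCatalog?.has(id)) return { ok: true };
  return { ok: false, reason: `not found in NVIDIA NIM catalog: ${id}` };
}
```

A further fallback — probing the live API for ids the cache has never seen — would slot in after the catalog check, before the error return.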
Blockers
Non-Blocking
Looks Good
Verdict: Changes Requested — filter needs to be more comprehensive, and inline validation needs to accept discovered models.

Verified @jatmn's findings.
Summary
Closes the dynamic-discovery half of #1099 by mirroring #1143's Groq hybrid-catalog pattern on the NVIDIA NIM gateway.
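The hybrid-catalog settings this PR enables amount to roughly the following shape. This is a sketch: the field names mirror the PR text, but the nesting and the surrounding descriptor type are assumptions, not quoted from `nvidia-nim.ts`.

```typescript
// Sketch of the descriptor settings; exact shape in nvidia-nim.ts is assumed.
const nvidiaNimDescriptor = {
  catalog: { source: "hybrid" as const }, // static Nemotron 70B entry stays as fallback
  discoveryEndpoint: "https://integrate.api.nvidia.com/v1/models",
  discoveryCacheTtl: "1d",                // re-discover at most once a day
  discoveryRefreshMode: "background-if-stale" as const,
  allowManualRefresh: true,               // users can force a refresh
};
```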
`src/integrations/gateways/nvidia-nim.ts` — `catalog.source: 'static'` (single entry) → `catalog.source: 'hybrid'` with a `discovery.mapModel` filter against https://integrate.api.nvidia.com/v1/models.

- `NVIDIA_NON_CHAT_PATTERN` excludes the non-chat catalog NVIDIA returns: embeddings, retrievers, rerankers, ASR (whisper, parakeet, canary, riva), TTS / voice, image-gen (SDXL, flux, stable-diffusion, kosmos, florence), `nvclip`, safety / guard / `nemoguard` / content-safety, reward models.
- `discoveryCacheTtl: '1d'`, `discoveryRefreshMode: 'background-if-stale'`, `allowManualRefresh: true`.

Out of scope
`src/utils/model/nvidiaNimModels.ts` (the parallel env-var path used when `NVIDIA_NIM=1` or `OPENAI_BASE_URL` contains `nvidia`) is unchanged. That path has its own hand-rolled list; wiring it through the discovery service is a separate, larger change. This PR addresses the descriptor-backed half so profile users get dynamic discovery without touching the env-var path's sync API.

Tests
`src/integrations/gateways/nvidia-nim.test.ts` (9 tests, 39 expect calls) pinning the filter regex against representative real NVIDIA model ids:

- keeps chat/instruct ids
- drops embedding, ASR, image-gen, and safety ids
- drops `active: false` and empty-id entries
- forwards `context_window` when present

Local:
`bun test src/integrations/` → 94/94 pass. `bun run build` clean.

Test plan
- Against https://integrate.api.nvidia.com/v1/models with `NVIDIA_API_KEY` set, confirm the `/model` picker shows only chat ids.
- `bun test src/integrations/` green in CI.