feat(nvidia-nim): dynamic model discovery via integrate.api.nvidia.com (#1099) #1177

Open
0xghost42 wants to merge 1 commit into Gitlawb:main from 0xghost42:feat/1099-nvidia-dynamic-discovery


Conversation

@0xghost42
Contributor

Summary

Closes the dynamic-discovery half of #1099 by mirroring #1143's Groq hybrid-catalog pattern on the NVIDIA NIM gateway.

  • src/integrations/gateways/nvidia-nim.ts: catalog.source: 'static' (single entry) → catalog.source: 'hybrid' with a discovery.mapModel filter against https://integrate.api.nvidia.com/v1/models.
  • Filter regex NVIDIA_NON_CHAT_PATTERN excludes the non-chat catalog NVIDIA returns: embeddings, retrievers, rerankers, ASR (whisper, parakeet, canary, riva), TTS / voice, image-gen (SDXL, flux, stable-diffusion, kosmos, florence), nvclip, safety / guard / nemoguard / content-safety, reward models.
  • Settings match the Groq descriptor: discoveryCacheTtl: '1d', discoveryRefreshMode: 'background-if-stale', allowManualRefresh: true.
  • Existing Nemotron 70B static entry stays as the hybrid fallback when discovery is unavailable.
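The descriptor change above can be sketched roughly as follows. The interface names and exact field shapes are illustrative, not the repo's actual descriptor API, and the regex is an approximation of NVIDIA_NON_CHAT_PATTERN reconstructed from the description above:

```typescript
// Illustrative sketch of the hybrid descriptor; shapes approximate the
// PR description, not the repo's actual types.
interface DiscoveredModel {
  id: string;
  active?: boolean;
  context_window?: number;
}

interface CatalogEntry {
  id: string;
  contextWindow?: number;
}

// Approximation of NVIDIA_NON_CHAT_PATTERN from the PR description.
const NVIDIA_NON_CHAT_PATTERN =
  /embed|retriev|rerank|whisper|parakeet|canary|riva|tts|voice|sdxl|flux|stable-diffusion|kosmos|florence|nvclip|safety|guard|reward/i;

const nvidiaNimCatalog = {
  source: 'hybrid' as const,
  // Existing Nemotron 70B entry stays as the fallback when discovery fails.
  staticModels: [{ id: 'nvidia/llama-3.1-nemotron-70b-instruct' }] as CatalogEntry[],
  discovery: {
    endpoint: 'https://integrate.api.nvidia.com/v1/models',
    cacheTtl: '1d',
    refreshMode: 'background-if-stale',
    allowManualRefresh: true,
    // Drop empty-id, inactive, and non-chat entries; forward context_window.
    mapModel(raw: DiscoveredModel): CatalogEntry | null {
      if (!raw.id || raw.active === false) return null;
      if (NVIDIA_NON_CHAT_PATTERN.test(raw.id)) return null;
      return { id: raw.id, contextWindow: raw.context_window };
    },
  },
};
```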

Out of scope

src/utils/model/nvidiaNimModels.ts (the parallel env-var path used when NVIDIA_NIM=1 or OPENAI_BASE_URL contains nvidia) is unchanged. That path has its own hand-rolled list; wiring it through the discovery service is a separate, larger change. This PR addresses the descriptor-backed half so profile users get dynamic discovery without touching the env-var path's sync API.

Tests

  • New src/integrations/gateways/nvidia-nim.test.ts (9 tests, 39 expect calls) pinning the filter regex against representative real NVIDIA model ids:
    • Keeps chat / instruct / reasoning / code (Nemotron, Llama, Qwen, DeepSeek, Mixtral, Phi, Gemma, Kimi, vision-instruct, QwQ).
    • Drops embedding / retriever / rerank.
    • Drops ASR / TTS / voice.
    • Drops image-gen / vision-only embedding.
    • Drops safety / guard / reward.
    • Drops active: false and empty-id entries.
    • Forwards context_window when present.

Local: bun test src/integrations/ → 94/94 pass; bun run build is clean.

Test plan

  • Live run against https://integrate.api.nvidia.com/v1/models with NVIDIA_API_KEY set, confirm /model picker shows only chat ids.
  • bun test src/integrations/ green in CI.
  • Existing static fallback still resolves when the live endpoint is unreachable.

…a.com (Gitlawb#1099)

Mirror the Gitlawb#1143 Groq hybrid-catalog pattern for the NVIDIA NIM
gateway: replace the single-entry static catalog with discovery
against https://integrate.api.nvidia.com/v1/models and a filter
that excludes embedding, retriever, reranker, ASR (whisper,
parakeet, canary, riva), TTS, image-gen (SDXL, flux, stable-diffusion,
kosmos, florence), safety (llama-guard, nemoguard, content-safety),
and reward models so the /model picker only surfaces chat/instruct
ids.

Settings match Groq's:
- catalog.source: hybrid (keep the existing Nemotron 70B as the
  static fallback when discovery is unavailable)
- discoveryCacheTtl: 1d
- discoveryRefreshMode: background-if-stale
- allowManualRefresh: true

Adds a focused gateway test (`nvidia-nim.test.ts`) pinning the
filter regex against representative real NVIDIA model ids — keeps,
embedding drops, ASR drops, image-gen drops, safety drops, inactive
drops, plus context_window forwarding — so the filter does not
silently start admitting non-chat models as NVIDIA's catalog grows.

The existing `src/utils/model/nvidiaNimModels.ts` env-var path
(used when users set NVIDIA_NIM or detect via OPENAI_BASE_URL) is
unchanged for now; its hand-rolled list keeps working. Wiring that
path through the discovery service is a separate, larger change.
Collaborator

@jatmn left a comment


Findings

  • [P1] Keep live non-chat NVIDIA entries out of discovery
    src/integrations/gateways/nvidia-nim.ts:65
    The blacklist does not catch several non-chat models that the live https://integrate.api.nvidia.com/v1/models endpoint currently returns, so this change still pollutes /model with entries that are not usable chat/instruct choices. For example, using this PR's mapModel against the live catalog accepts baai/bge-m3, google/deplot, nvidia/ai-synthetic-video-detector, nvidia/gliner-pii, and nvidia/ising-calibration-1-35b-a3b as normal catalog entries. That directly undercuts the PR goal of showing only chat ids and will lead users to select models that fail later. Please make the filter positive for known chat/instruct/reasoning/code patterns, or otherwise expand the mapping/tests with real current NVIDIA non-chat ids before enabling hybrid discovery.
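    A positive-allowlist filter along these lines could look roughly like the sketch below. The family tokens are illustrative (drawn from the keep-list in the PR's own tests, not a vetted production list), and the exclusion regex approximates the PR's NVIDIA_NON_CHAT_PATTERN; keeping it as a backstop matters because ids like nv-rerankqa-mistral-4b contain a chat-family token:

```typescript
// Illustrative allowlist-first filter: admit only ids matching known
// chat/instruct/reasoning/code families, with the existing exclusion
// regex as a backstop. Tokens are examples, not an exhaustive list.
const CHAT_FAMILIES =
  /instruct|chat|nemotron|qwen|qwq|deepseek|mixtral|mistral|phi-|gemma|kimi/i;

const NON_CHAT =
  /embed|retriev|rerank|whisper|parakeet|canary|riva|tts|voice|sdxl|flux|stable-diffusion|kosmos|florence|nvclip|safety|guard|reward/i;

function isChatModel(id: string): boolean {
  // Allowlist first: unknown families are rejected by default, so new
  // non-chat ids in NVIDIA's catalog fail closed instead of leaking in.
  return CHAT_FAMILIES.test(id) && !NON_CHAT.test(id);
}
```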

  • [P2] Let inline /model <id> accept discovered NVIDIA models
    src/utils/model/validateModel.ts:53
    The picker can now surface dynamically discovered NVIDIA models, but the inline /model some-id path still validates NVIDIA NIM only against the legacy hard-coded nvidiaNimModels.ts list and returns before it can use descriptor discovery or an API probe. I reproduced this with a live discovered id: abacusai/dracarys-llama-3.1-70b-instruct is accepted by this PR's mapModel, but validateModel() rejects it with "not found in NVIDIA NIM catalog" because it is not in the old static list. Users who type a discovered model name, or scripts that use /model <id>, will be blocked even though the descriptor catalog now knows about the model. Please route NVIDIA validation through the descriptor discovery cache/catalog, or at least fall through to the existing API validation for ids outside the legacy list.
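    The suggested fall-through could look roughly like this sketch; the function and variable names are hypothetical and the real validateModel.ts signature may differ:

```typescript
// Hypothetical sketch of routing NVIDIA validation through the
// descriptor discovery catalog before rejecting. Names are illustrative.
const legacyNvidiaModels = new Set<string>([
  'nvidia/llama-3.1-nemotron-70b-instruct',
]);

function validateNvidiaModel(id: string, discoveredIds: Set<string>): boolean {
  if (legacyNvidiaModels.has(id)) return true; // 1. legacy hard-coded list
  if (discoveredIds.has(id)) return true;      // 2. descriptor discovery cache
  return false; // 3. a live API probe could be the final fallback here
}
```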

@Vasanthdev2004
Collaborator

Blockers

  1. Blacklist doesn't catch all non-chat models — The filter still accepts models like baai/bge-m3, google/deplot, nvidia/ai-synthetic-video-detector, etc. This pollutes /model with unusable entries.

  2. Inline /model <id> doesn't accept discovered models — The picker shows discovered models, but typing /model some-id rejects them because it only validates against the legacy static list.

Non-Blocking

  • Out of scope identified (env-var path has its own hand-rolled list).

Looks Good

  • Mirrors the Groq hybrid-catalog pattern
  • 9 tests with 39 expect calls
  • 94 tests passing
  • Good filter regex for non-chat models (embeddings, ASR, TTS, image-gen, safety)

Verdict: Changes Requested — filter needs to be more comprehensive, and inline validation needs to accept discovered models.

@gnanam1990
Collaborator

Verified @jatmn's findings. The NVIDIA_NON_CHAT_PATTERN blacklist is the core issue: it's an exclusion regex, so any non-chat id that doesn't contain one of its tokens passes through — baai/bge-m3, google/deplot, nvidia/gliner-pii, nvidia/ising-calibration-… all match none of the patterns and would be admitted to /model. A positive allowlist for known chat/instruct/reasoning/code patterns (as @jatmn suggested) is the more robust direction here. His second point also holds — validateModel.ts isn't touched, so inline /model <id> still validates NVIDIA only against the legacy static list and will reject discovered ids the picker now shows. Routing NVIDIA validation through the descriptor discovery cache (or falling through to API validation outside the legacy list) would close that gap. Appreciate the work — looking forward to the follow-up.
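The gap is easy to demonstrate mechanically. The regex below approximates the PR's NVIDIA_NON_CHAT_PATTERN, and the ids are the live ones cited above:

```typescript
// None of these live non-chat ids contain an excluded token, so an
// exclusion-only filter admits every one of them.
const EXCLUDE =
  /embed|retriev|rerank|whisper|parakeet|canary|riva|tts|voice|sdxl|flux|stable-diffusion|kosmos|florence|nvclip|safety|guard|reward/i;

const leaked = [
  'baai/bge-m3',                       // embedding model, no "embed" token in id
  'google/deplot',                     // chart-to-text model
  'nvidia/gliner-pii',                 // PII extraction
  'nvidia/ai-synthetic-video-detector', // deepfake detection
].filter((id) => !EXCLUDE.test(id));
// leaked still contains all four ids
```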
