feat(nvidia-nim): dynamic model discovery via integrate.api.nvidia.com (#1099) #1177

Open
0xghost42 wants to merge 1 commit into Gitlawb:main from 0xghost42:feat/1099-nvidia-dynamic-discovery


Conversation

@0xghost42
Contributor

Summary

Closes the dynamic-discovery half of #1099 by mirroring #1143's Groq hybrid-catalog pattern on the NVIDIA NIM gateway.

  • src/integrations/gateways/nvidia-nim.ts: catalog.source: 'static' (single entry) → catalog.source: 'hybrid' with a discovery.mapModel filter against https://integrate.api.nvidia.com/v1/models.
  • Filter regex NVIDIA_NON_CHAT_PATTERN excludes the non-chat catalog NVIDIA returns: embeddings, retrievers, rerankers, ASR (whisper, parakeet, canary, riva), TTS / voice, image-gen (SDXL, flux, stable-diffusion, kosmos, florence), nvclip, safety / guard / nemoguard / content-safety, reward models.
  • Settings match the Groq descriptor: discoveryCacheTtl: '1d', discoveryRefreshMode: 'background-if-stale', allowManualRefresh: true.
  • Existing Nemotron 70B static entry stays as the hybrid fallback when discovery is unavailable.
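The descriptor change above can be sketched roughly as follows. The interface names and exact field shapes are illustrative, not the repo's actual descriptor API, and the regex is an approximation of NVIDIA_NON_CHAT_PATTERN reconstructed from the description above:

```typescript
// Illustrative sketch of the hybrid descriptor; shapes approximate the
// PR description, not the repo's actual types.
interface DiscoveredModel {
  id: string;
  active?: boolean;
  context_window?: number;
}

interface CatalogEntry {
  id: string;
  contextWindow?: number;
}

// Approximation of NVIDIA_NON_CHAT_PATTERN from the PR description.
const NVIDIA_NON_CHAT_PATTERN =
  /embed|retriev|rerank|whisper|parakeet|canary|riva|tts|voice|sdxl|flux|stable-diffusion|kosmos|florence|nvclip|safety|guard|reward/i;

const nvidiaNimCatalog = {
  source: 'hybrid' as const,
  // Existing Nemotron 70B entry stays as the fallback when discovery fails.
  staticModels: [{ id: 'nvidia/llama-3.1-nemotron-70b-instruct' }] as CatalogEntry[],
  discovery: {
    endpoint: 'https://integrate.api.nvidia.com/v1/models',
    cacheTtl: '1d',
    refreshMode: 'background-if-stale',
    allowManualRefresh: true,
    // Drop empty-id, inactive, and non-chat entries; forward context_window.
    mapModel(raw: DiscoveredModel): CatalogEntry | null {
      if (!raw.id || raw.active === false) return null;
      if (NVIDIA_NON_CHAT_PATTERN.test(raw.id)) return null;
      return { id: raw.id, contextWindow: raw.context_window };
    },
  },
};
```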

Out of scope

src/utils/model/nvidiaNimModels.ts (the parallel env-var path used when NVIDIA_NIM=1 or OPENAI_BASE_URL contains nvidia) is unchanged. That path has its own hand-rolled list; wiring it through the discovery service is a separate, larger change. This PR addresses the descriptor-backed half so profile users get dynamic discovery without touching the env-var path's sync API.

Tests

  • New src/integrations/gateways/nvidia-nim.test.ts (9 tests, 39 expect calls) pinning the filter regex against representative real NVIDIA model ids:
    • Keeps chat / instruct / reasoning / code (Nemotron, Llama, Qwen, DeepSeek, Mixtral, Phi, Gemma, Kimi, vision-instruct, QwQ).
    • Drops embedding / retriever / rerank.
    • Drops ASR / TTS / voice.
    • Drops image-gen / vision-only embedding.
    • Drops safety / guard / reward.
    • Drops active: false and empty-id entries.
    • Forwards context_window when present.

Local: bun test src/integrations/ → 94/94 pass; bun run build is clean.

Test plan

  • Live run against https://integrate.api.nvidia.com/v1/models with NVIDIA_API_KEY set, confirm /model picker shows only chat ids.
  • bun test src/integrations/ green in CI.
  • Existing static fallback still resolves when the live endpoint is unreachable.

…a.com (Gitlawb#1099)

Mirror the Gitlawb#1143 Groq hybrid-catalog pattern for the NVIDIA NIM
gateway: replace the single-entry static catalog with discovery
against https://integrate.api.nvidia.com/v1/models and a filter
that excludes embedding, retriever, reranker, ASR (whisper,
parakeet, canary, riva), TTS, image-gen (SDXL, flux, stable-diffusion,
kosmos, florence), safety (llama-guard, nemoguard, content-safety),
and reward models so the /model picker only surfaces chat/instruct
ids.

Settings match Groq's:
- catalog.source: hybrid (keep the existing Nemotron 70B as the
  static fallback when discovery is unavailable)
- discoveryCacheTtl: 1d
- discoveryRefreshMode: background-if-stale
- allowManualRefresh: true

Adds a focused gateway test (`nvidia-nim.test.ts`) pinning the
filter regex against representative real NVIDIA model ids — keeps,
embedding drops, ASR drops, image-gen drops, safety drops, inactive
drops, plus context_window forwarding — so the filter does not
silently start admitting non-chat models as NVIDIA's catalog grows.

The existing `src/utils/model/nvidiaNimModels.ts` env-var path
(used when users set NVIDIA_NIM or detect via OPENAI_BASE_URL) is
unchanged for now; its hand-rolled list keeps working. Wiring that
path through the discovery service is a separate, larger change.
Collaborator

@jatmn left a comment


Findings

  • [P1] Keep live non-chat NVIDIA entries out of discovery
    src/integrations/gateways/nvidia-nim.ts:65
    The blacklist does not catch several non-chat models that the live https://integrate.api.nvidia.com/v1/models endpoint currently returns, so this change still pollutes /model with entries that are not usable chat/instruct choices. For example, using this PR's mapModel against the live catalog accepts baai/bge-m3, google/deplot, nvidia/ai-synthetic-video-detector, nvidia/gliner-pii, and nvidia/ising-calibration-1-35b-a3b as normal catalog entries. That directly undercuts the PR goal of showing only chat ids and will lead users to select models that fail later. Please make the filter positive for known chat/instruct/reasoning/code patterns, or otherwise expand the mapping/tests with real current NVIDIA non-chat ids before enabling hybrid discovery.
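    A positive-allowlist filter along these lines could look roughly like the sketch below. The family tokens are illustrative (drawn from the keep-list in the PR's own tests, not a vetted production list), and the exclusion regex approximates the PR's NVIDIA_NON_CHAT_PATTERN; keeping it as a backstop matters because ids like nv-rerankqa-mistral-4b contain a chat-family token:

```typescript
// Illustrative allowlist-first filter: admit only ids matching known
// chat/instruct/reasoning/code families, with the existing exclusion
// regex as a backstop. Tokens are examples, not an exhaustive list.
const CHAT_FAMILIES =
  /instruct|chat|nemotron|qwen|qwq|deepseek|mixtral|mistral|phi-|gemma|kimi/i;

const NON_CHAT =
  /embed|retriev|rerank|whisper|parakeet|canary|riva|tts|voice|sdxl|flux|stable-diffusion|kosmos|florence|nvclip|safety|guard|reward/i;

function isChatModel(id: string): boolean {
  // Allowlist first: unknown families are rejected by default, so new
  // non-chat ids in NVIDIA's catalog fail closed instead of leaking in.
  return CHAT_FAMILIES.test(id) && !NON_CHAT.test(id);
}
```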

  • [P2] Let inline /model <id> accept discovered NVIDIA models
    src/utils/model/validateModel.ts:53
    The picker can now surface dynamically discovered NVIDIA models, but the inline /model some-id path still validates NVIDIA NIM only against the legacy hard-coded nvidiaNimModels.ts list and returns before it can use descriptor discovery or an API probe. I reproduced this with a live discovered id: abacusai/dracarys-llama-3.1-70b-instruct is accepted by this PR's mapModel, but validateModel() rejects it with "not found in NVIDIA NIM catalog" because it is not in the old static list. Users who type a discovered model name, or scripts that use /model <id>, will be blocked even though the descriptor catalog now knows about the model. Please route NVIDIA validation through the descriptor discovery cache/catalog, or at least fall through to the existing API validation for ids outside the legacy list.
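    The suggested fall-through could look roughly like this sketch; the function and variable names are hypothetical and the real validateModel.ts signature may differ:

```typescript
// Hypothetical sketch of routing NVIDIA validation through the
// descriptor discovery catalog before rejecting. Names are illustrative.
const legacyNvidiaModels = new Set<string>([
  'nvidia/llama-3.1-nemotron-70b-instruct',
]);

function validateNvidiaModel(id: string, discoveredIds: Set<string>): boolean {
  if (legacyNvidiaModels.has(id)) return true; // 1. legacy hard-coded list
  if (discoveredIds.has(id)) return true;      // 2. descriptor discovery cache
  return false; // 3. a live API probe could be the final fallback here
}
```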

@Vasanthdev2004
Collaborator

Blockers

  1. Blacklist doesn't catch all non-chat models — The filter still accepts models like baai/bge-m3, google/deplot, nvidia/ai-synthetic-video-detector, etc. This pollutes /model with unusable entries.

  2. Inline /model <id> doesn't accept discovered models — The picker shows discovered models, but typing /model some-id rejects them because it only validates against the legacy static list.

Non-Blocking

  • Out of scope identified (env-var path has its own hand-rolled list).

Looks Good

  • Mirrors the Groq hybrid-catalog pattern
  • 9 tests with 39 expect calls
  • 94 tests passing
  • Good filter regex for non-chat models (embeddings, ASR, TTS, image-gen, safety)

Verdict: Changes Requested — filter needs to be more comprehensive, and inline validation needs to accept discovered models.

@gnanam1990
Collaborator

Verified @jatmn's findings. The NVIDIA_NON_CHAT_PATTERN blacklist is the core issue: it's an exclusion regex, so any non-chat id that doesn't contain one of its tokens passes through — baai/bge-m3, google/deplot, nvidia/gliner-pii, nvidia/ising-calibration-… all match none of the patterns and would be admitted to /model. A positive allowlist for known chat/instruct/reasoning/code patterns (as @jatmn suggested) is the more robust direction here. His second point also holds — validateModel.ts isn't touched, so inline /model <id> still validates NVIDIA only against the legacy static list and will reject discovered ids the picker now shows. Routing NVIDIA validation through the descriptor discovery cache (or falling through to API validation outside the legacy list) would close that gap. Appreciate the work — looking forward to the follow-up.
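The gap is easy to demonstrate mechanically. The regex below approximates the PR's NVIDIA_NON_CHAT_PATTERN, and the ids are the live ones cited above:

```typescript
// None of these live non-chat ids contain an excluded token, so an
// exclusion-only filter admits every one of them.
const EXCLUDE =
  /embed|retriev|rerank|whisper|parakeet|canary|riva|tts|voice|sdxl|flux|stable-diffusion|kosmos|florence|nvclip|safety|guard|reward/i;

const leaked = [
  'baai/bge-m3',                       // embedding model, no "embed" token in id
  'google/deplot',                     // chart-to-text model
  'nvidia/gliner-pii',                 // PII extraction
  'nvidia/ai-synthetic-video-detector', // deepfake detection
].filter((id) => !EXCLUDE.test(id));
// leaked still contains all four ids
```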
