NVIDIA-AI-Blueprints/nim-usage-scanner
NIM Usage Scanner

A static code analyzer that scans Git repositories to discover and catalog NVIDIA NIM (Inference Microservice) usage.

Features

  • Multi-repo Scanning: Clone and scan multiple repositories from a configuration file
  • Local NIM Detection: Find nvcr.io/nim/* Docker image references
  • Hosted NIM Detection: Find hosted endpoints and model references (publisher-whitelisted)
  • Source Classification: Distinguish between source code and GitHub Actions workflow usage
  • NGC API Enrichment: Resolve latest tags and fetch Function details
  • Query Mode: Directly query NIM information by model/image name

Quick Start

Prerequisites

  • Rust 1.70+ (for building from source)
  • NVIDIA API Key from NGC (optional; used for enrichment)
  • GitHub Token (optional; required only for cloning private repositories)

Installation

# Build from source
cd nim-usage-scanner
cargo build --release

# Binary will be at ./target/release/nim-usage-scanner

Basic Usage

1. Scan Repositories

# Set environment variables (optional)
export NVIDIA_API_KEY="nvapi-xxx"
export GITHUB_TOKEN="ghp_xxx"

# Scan repositories defined in repos.yaml
./target/release/nim-usage-scanner scan -c config/repos.yaml

# Regenerate repos.yaml from Build Page before scanning
./target/release/nim-usage-scanner scan -c config/repos.yaml --refresh-repos

# Use a persistent workdir and keep repos after scan (recommended for repeated runs)
# First run: clones into /tmp/blueprint-scan. Second and later runs: reuses existing dirs and pulls latest (no full clone).
./target/release/nim-usage-scanner scan -c config/repos.yaml --workdir /tmp/blueprint-scan --keep-repos --jobs 4
# Add --refresh-repos only when you want to regenerate repos.yaml from Build Page before scanning

# Output will be in ./output/report.json, ./output/report.csv, and ./output/report_aggregate.json

2. Query NIM Information

# Query Hosted NIM details
./target/release/nim-usage-scanner query hosted-nim \
  --model "nvidia/llama-3.1-nemotron-70b-instruct" \
  --ngc-api-key "nvapi-xxx"

# Query Local NIM details
./target/release/nim-usage-scanner query local-nim \
  --image "nvidia/llama-3.2-nv-embedqa-1b-v2" \
  --ngc-api-key "nvapi-xxx"

Configuration

Create a repos.yaml file:

version: "1.0"

defaults:
  branch: main
  depth: 1

repos:
  - name: NVIDIA/GenerativeAIExamples
    url: https://github.com/NVIDIA/GenerativeAIExamples.git

  - name: my-org/my-private-repo
    url: https://github.com/my-org/my-private-repo.git
    branch: develop
    enabled: true   # optional, defaults to true; set false to skip
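For illustration, the way defaults and the enabled flag combine can be sketched in Python. This mirrors the behaviour documented here (per-repo keys override defaults; enabled defaults to true), not the scanner's actual Rust code:

```python
# Sketch: expanding repos.yaml entries with the `defaults` section.
# Field names come from the example config above; the merge behaviour
# is an assumption based on the comments in that example.

def expand_repos(config: dict) -> list[dict]:
    """Apply `defaults` to each repo and drop entries with enabled: false."""
    defaults = config.get("defaults", {})
    expanded = []
    for repo in config.get("repos", []):
        if not repo.get("enabled", True):   # enabled defaults to true
            continue
        merged = {**defaults, **repo}       # per-repo keys win over defaults
        expanded.append(merged)
    return expanded

config = {
    "version": "1.0",
    "defaults": {"branch": "main", "depth": 1},
    "repos": [
        {"name": "NVIDIA/GenerativeAIExamples",
         "url": "https://github.com/NVIDIA/GenerativeAIExamples.git"},
        {"name": "my-org/my-private-repo",
         "url": "https://github.com/my-org/my-private-repo.git",
         "branch": "develop", "enabled": False},
    ],
}

repos = expand_repos(config)
# The disabled repo is skipped; the other inherits branch and depth.
```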

Generate repos.yaml from Build Blueprints (optional)

You can generate config/repos.yaml directly from the Build API and each endpoint's spec ("View GitHub" link). This uses the NGC catalog resources API (/v2/search/catalog/resources/BLUEPRINT with query and pageSize) to list all blueprints in one request, then /v2/blueprints/{orgName}/{name}/spec for each spec.

python scripts/generate_repos_from_ngc.py

Optional flags:

  • --label blueprint
  • --page-size 1000
  • --branch main
  • --depth 1
  • --output config/repos.yaml
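The two requests the generator script makes can be sketched with the stdlib alone. Only the endpoint paths and the query/pageSize parameters come from this README; the base URL, exact parameter names, and the repo-entry shape are assumptions for illustration:

```python
from urllib.parse import urlencode

NGC_BASE = "https://api.ngc.nvidia.com"  # assumed base URL

def list_blueprints_url(query: str = "", page_size: int = 1000) -> str:
    """One request listing all blueprints via the catalog resources API."""
    params = urlencode({"query": query, "pageSize": page_size})
    return f"{NGC_BASE}/v2/search/catalog/resources/BLUEPRINT?{params}"

def spec_url(org_name: str, name: str) -> str:
    """Per-blueprint spec, which carries the 'View GitHub' link."""
    return f"{NGC_BASE}/v2/blueprints/{org_name}/{name}/spec"

def repo_entry(github_url: str, branch: str = "main", depth: int = 1) -> dict:
    """Turn a spec's GitHub link into a repos.yaml-shaped entry."""
    name = github_url.removeprefix("https://github.com/").removesuffix(".git")
    return {"name": name, "url": github_url, "branch": branch, "depth": depth}
```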

Extra repos (repos.githubonly.yaml)

When you run with --refresh-repos, the scanner first overwrites repos.yaml with repos from the NGC Build API, then merges any repos from repos.githubonly.yaml in the same directory. Use this to always scan a fixed set of custom repos without changing the command.

  • Put repos.githubonly.yaml next to your config, e.g. config/repos.githubonly.yaml when using -c config/repos.yaml.
  • Same format as repos.yaml (version, defaults, repos). Repos are merged by name; duplicates (same name as in NGC list) are skipped so only extra repos are added.
  • Copy from config/repos.githubonly.yaml.example and add your entries under repos:.
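The merge described above amounts to a name-keyed union where the NGC list wins. A minimal sketch of that behaviour (an assumption matching the stated rules, not the scanner's source):

```python
# NGC-listed repos take precedence; entries from repos.githubonly.yaml
# are appended only when their `name` is not already present.

def merge_repos(ngc_repos: list[dict], extra_repos: list[dict]) -> list[dict]:
    seen = {r["name"] for r in ngc_repos}
    merged = list(ngc_repos)
    for repo in extra_repos:
        if repo["name"] in seen:   # duplicate of an NGC-listed repo: skipped
            continue
        merged.append(repo)
        seen.add(repo["name"])
    return merged

ngc_list = [{"name": "NVIDIA/GenerativeAIExamples",
             "url": "https://github.com/NVIDIA/GenerativeAIExamples.git"}]
extras = [{"name": "NVIDIA/GenerativeAIExamples", "url": "duplicate-skipped"},
          {"name": "my-org/extra-repo",
           "url": "https://github.com/my-org/extra-repo.git"}]
merged = merge_repos(ngc_list, extras)
```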

Commands

scan - Scan Repositories

nim-usage-scanner scan [OPTIONS] -c <CONFIG> [--ngc-api-key <KEY>] [--github-token <TOKEN>]
| Option | Description |
| --- | --- |
| -c, --config | Path to repos.yaml (required) |
| -o, --output | Output directory (default: ./output) |
| -w, --workdir | Working directory for cloning repos (optional; uses a temp dir if omitted) |
| --keep-repos | Keep cloned repositories after scanning; with --workdir, the next run reuses and pulls instead of cloning (default: false) |
| -j, --jobs | Maximum number of parallel jobs (optional) |
| --refresh-repos | Regenerate repos.yaml from the Build Page, then merge repos from repos.githubonly.yaml in the same directory as the config (default: false) |
| --ngc-api-key | NVIDIA API Key (optional; or set the NVIDIA_API_KEY env var) |
| --github-token | GitHub Token (optional; or set the GITHUB_TOKEN env var) |
| -v, --verbose | Increase logging verbosity |

query - Query NIM Information

query hosted-nim

Query Hosted NIM (cloud-hosted inference service) information.

nim-usage-scanner query hosted-nim --model <MODEL> --ngc-api-key <KEY>

Returns: Function ID, status, containerImage, inference URL, etc.

query local-nim

Query Local NIM (Docker container) information.

nim-usage-scanner query local-nim --image <IMAGE> --ngc-api-key <KEY>

Returns: Latest tag (actual version), description, publisher, etc.

⚠️ Important Limitations

Query Feature Differences

Hosted NIM and Local NIM are fundamentally different architectures, so the available information differs:

| Information | Hosted NIM | Local NIM | Reason |
| --- | --- | --- | --- |
| Function ID | ✅ | ❌ | Only Hosted NIMs run on NVIDIA Cloud Functions (NVCF) |
| Status (ACTIVE/INACTIVE) | ✅ | ❌ | Hosted NIMs are managed cloud services |
| Container Image | ✅ | ❌ | Refers to the underlying container of a Hosted NIM |
| Latest Tag → Actual Version | ❌ | ✅ | Local NIMs are Docker images with tags |
| Description | ❌ | ✅ | Comes from NGC Container Registry metadata |
| Inference URL | ✅ | ❌ | Hosted NIMs have cloud API endpoints |

Why This Limitation Exists

  • Hosted NIM: runs on NVIDIA's cloud infrastructure (NVCF). Each Hosted NIM has a unique Function ID that tracks its deployment status, container image, and API endpoint.

  • Local NIM: a Docker image that you pull and run locally. It has no Function ID or status because it is not a managed service; you manage it yourself.

Practical Implications

# ✅ This works - get Hosted NIM function details
nim-usage-scanner query hosted-nim --model "nvidia/llama-3.1-nemotron-70b-instruct"
# Returns: functionId, status, containerImage, inferenceUrl...

# ✅ This works - get Local NIM image details
nim-usage-scanner query local-nim --image "nvidia/llama-3.2-nv-embedqa-1b-v2"
# Returns: latestTag, description, publisher...

# ❌ Cannot get "status" for Local NIM - it's not a managed service
# ❌ Cannot get "latestTag" for Hosted NIM - it's not a Docker image

How Detection Works

Local NIM (Docker Images)

Local NIMs are detected by scanning file contents for Docker image references:

  • Full image with tag: nvcr.io/nim/<namespace>/<name>:<tag>
  • Image without tag: nvcr.io/nim/<namespace>/<name> (tag defaults to latest)

Additional behavior:

  • YAML tag context: In .yaml/.yml, if an image is found with latest, the scanner looks up to 3 lines ahead for a tag: field and uses it when present.
  • File types: The scanner checks common source and config formats: py, yaml/yml, json, toml, env, Dockerfile (or any filename starting with Dockerfile), md, ipynb, sh, bash, js, ts, jsx, tsx, cfg, ini, conf.
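The detection rules above can be sketched with two regexes. These are illustrative assumptions, not the scanner's actual patterns; the function also mimics the documented YAML behaviour of looking up to 3 lines ahead for a tag: field when the tag is latest:

```python
import re

# Full or tagless nvcr.io/nim image references; tag defaults to "latest".
IMAGE_RE = re.compile(
    r"nvcr\.io/nim/(?P<ns>[\w.-]+)/(?P<name>[\w.-]+)(?::(?P<tag>[\w.-]+))?"
)
# A YAML `tag:` field, optionally quoted.
TAG_RE = re.compile(r"^\s*tag:\s*[\"']?(?P<tag>[\w.-]+)")

def find_local_nims(lines: list[str], is_yaml: bool = False) -> list[tuple[int, str, str]]:
    """Return (line_number, image, tag) for each Local NIM reference."""
    findings = []
    for i, line in enumerate(lines):
        m = IMAGE_RE.search(line)
        if not m:
            continue
        tag = m.group("tag") or "latest"        # missing tag -> latest
        if is_yaml and tag == "latest":
            for ahead in lines[i + 1 : i + 4]:  # up to 3 lines ahead
                t = TAG_RE.match(ahead)
                if t:
                    tag = t.group("tag")
                    break
        image = f"nvcr.io/nim/{m.group('ns')}/{m.group('name')}"
        findings.append((i + 1, image, tag))
    return findings

dockerfile = ["FROM nvcr.io/nim/nvidia/llama-3.1-8b-instruct:1.2.0"]
compose = ["    image: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2",
           '    tag: "1.5.0"']
```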

Hosted NIM (API Endpoints + Model Names)

Hosted NIMs are detected by scanning for:

  • API endpoints matching https://{integrate|ai|build}.api.nvidia.com/...
  • Model fields such as model = "org/name", model: "org/name", or model_name: "org/name" (e.g. in YAML/docs)
  • Known client patterns like ChatNVIDIA(...), NVIDIAEmbeddings(...), NVIDIARerank(...)
  • Environment or config assignments such as os.environ["APP_EMBEDDINGS_MODELNAME"] = "org/model" (e.g. in notebooks)
  • Build Page links like https://build.nvidia.com/org/model
  • Prose in docs, such as "for nvidia/llama-3.2-nv-embedqa-1b-v2 model", including the typo form nvidia/llama-3.2-nv-embedqa-1b-v2model (the org must be in the runtime publisher whitelist)

For all of the above, the org in org/model can be any publisher name; only those in the runtime publisher whitelist (from the NGC filters API) are counted as Hosted NIM.

  • In source/config files (e.g. .py, .yaml), if a model name is not present on a line but an endpoint URL is, the scanner may try to extract org/model from the URL path.
  • For YAML files, if an endpoint is found without a model name, the scanner searches up to 10 lines around it for a model or model_name field.
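A few of these signals can be captured with simple regexes. The patterns below are illustrative assumptions sketching the idea, not the scanner's real patterns; the publisher-whitelist check described next is applied to the extracted org separately:

```python
import re

# Endpoint URLs: https://{integrate|ai|build}.api.nvidia.com/...
ENDPOINT_RE = re.compile(r"https://(?:integrate|ai|build)\.api\.nvidia\.com/\S*")

# model / model_name fields: model = "org/name", model: "org/name", ...
MODEL_RE = re.compile(
    r"\bmodel(?:_name)?\s*[:=]\s*[\"'](?P<model>[\w.-]+/[\w.-]+)[\"']"
)

# Build Page links: https://build.nvidia.com/org/model
BUILD_LINK_RE = re.compile(
    r"https://build\.nvidia\.com/(?P<org>[\w.-]+)/(?P<model>[\w.-]+)"
)

def hosted_models_in(line: str) -> list[str]:
    """Collect org/model candidates from one line of source or docs."""
    models = [m.group("model") for m in MODEL_RE.finditer(line)]
    models += [
        f"{m.group('org')}/{m.group('model')}"
        for m in BUILD_LINK_RE.finditer(line)
    ]
    return models
```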

Publisher whitelist:

  • The model prefix (org in org/model) must be in a publisher whitelist to be counted.
  • The whitelist is fetched at runtime from the NGC catalog filters API (/v2/search/catalog/filters/ENDPOINT), which is separate from the catalog resources API (/v2/search/catalog/resources/BLUEPRINT) used for listing blueprints (e.g. with --refresh-repos).
  • From the filters response, only the filterValue field of the filterCategory: "publisher" entries is used. The API may return publishers such as nvidia, meta, mistralai, microsoft, google, qwen, deepseek_ai.
  • If the API is unavailable or returns no publishers, a built-in fallback list is used (nvidia, meta, mistralai, google, deepseek, stg).
  • Matching is case-insensitive: values are stored and compared in lowercase.
  • This whitelist applies to all file types, including md and ipynb.
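The whitelist logic above can be sketched as follows. The filters-response shape (a list of entries carrying filterCategory / filterValue) is an assumption based on the field names in this README:

```python
# Documented fallback, used when the filters API yields no publishers.
FALLBACK_PUBLISHERS = {"nvidia", "meta", "mistralai", "google", "deepseek", "stg"}

def publisher_whitelist(filter_entries: list[dict]) -> set[str]:
    """Lowercased publishers from the filters API, or the fallback list."""
    publishers = {
        e["filterValue"].lower()
        for e in filter_entries
        if e.get("filterCategory") == "publisher"
    }
    return publishers or set(FALLBACK_PUBLISHERS)

def is_hosted_nim(model: str, whitelist: set[str]) -> bool:
    """Case-insensitive check of the org prefix in org/model."""
    org, _, name = model.partition("/")
    return bool(name) and org.lower() in whitelist
```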

Output Formats

JSON Report (report.json)

{
  "scan_time": "2025-01-21T10:30:00Z",
  "total_repos": 5,
  "source_code": {
    "local_nim": [...],
    "hosted_nim": [...]
  },
  "actions_workflow": {
    "local_nim": [...],
    "hosted_nim": [...]
  },
  "aggregated": {
    "local_nim": [...],
    "hosted_nim": [...]
  },
  "summary": {...}
}

CSV Report (report.csv)

Unified CSV with all findings:

source_type,nim_type,repository,file_path,line_number,image_url,tag,resolved_tag,endpoint_url,model_name,function_id,status,container_image,match_context
source_code,local_nim,NVIDIA/Example,Dockerfile,5,nvcr.io/nim/nvidia/llama,latest,1.10.0,,,,,,"FROM nvcr.io/nim/..."
source_code,hosted_nim,NVIDIA/Example,src/main.py,42,,,,https://ai.api.nvidia.com,nvidia/llama,abc-123,ACTIVE,nvcr.io/...,"model=..."
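Because the CSV is unified, downstream filtering is a one-liner with the stdlib. This sketch uses the two sample rows above; only the header and those rows come from this README:

```python
import csv
import io

SAMPLE = """\
source_type,nim_type,repository,file_path,line_number,image_url,tag,resolved_tag,endpoint_url,model_name,function_id,status,container_image,match_context
source_code,local_nim,NVIDIA/Example,Dockerfile,5,nvcr.io/nim/nvidia/llama,latest,1.10.0,,,,,,FROM nvcr.io/nim/...
source_code,hosted_nim,NVIDIA/Example,src/main.py,42,,,,https://ai.api.nvidia.com,nvidia/llama,abc-123,ACTIVE,nvcr.io/...,model=...
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# e.g. keep only Local NIM findings:
local = [r for r in rows if r["nim_type"] == "local_nim"]
```

In real use, replace io.StringIO(SAMPLE) with open("output/report.csv").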

Environment Variables

| Variable | Description |
| --- | --- |
| NVIDIA_API_KEY | NGC API Key (optional; used for tag resolution and query enrichment) |
| GITHUB_TOKEN | GitHub Token (optional; required only for cloning private repositories) |
| RUST_LOG | Log level: debug, info, warn, error |

License

Apache 2.0
