(retriever) use vLLM for nemotron-parse inference by edknv · Pull Request #1764 · NVIDIA/NeMo-Retriever

edknv · 2026-04-01T18:32:57Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

jperez999

Let me know if those comments make sense. Otherwise looks good.

nemo_retriever/src/nemo_retriever/graph/ingestor_runtime.py

nemo_retriever/src/nemo_retriever/nim/nim.py

edknv and others added 10 commits March 31, 2026 11:19

use vllm for nemotron-parse inference

bb0f544

checkpoint

7f538d2

checkpoint; nim works

073e222

add additioal parameters

b145dd5

Merge branch 'main' into edwardk/retriever-parse-vllm

629a172

use graph

f893aa1

clean up

5354a9d

lint

0dedc2a

fix tests

af67cd5

Merge branch 'main' into edwardk/retriever-parse-vllm

3280235

edknv requested review from jdye64 and jperez999 April 2, 2026 16:26

edknv marked this pull request as ready for review April 2, 2026 16:26

edknv requested review from a team as code owners April 2, 2026 16:26

edknv and others added 7 commits April 2, 2026 19:33

Merge branch 'main' into edwardk/retriever-parse-vllm

c38e8b5

Merge branch 'edwardk/retriever-parse-vllm' of github.com:edknv/nv-in…

988a049

…gest into edwardk/retriever-parse-vllm

Merge branch 'main' into edwardk/retriever-parse-vllm

ee01798

set batch size

ea2299c

Merge branch 'main' into edwardk/retriever-parse-vllm

f8f1d92

ArchetypeOperator

8404b16

fix batch size

d483408

jperez999 approved these changes Apr 3, 2026

nemo_retriever/src/nemo_retriever/graph/ingestor_runtime.py (Outdated)

nemo_retriever/src/nemo_retriever/nim/nim.py (Outdated)

edknv added 4 commits April 3, 2026 17:57

send correct request format to build endpoint

b3258d8

correct request/response format for build endpoint

dacde94

Merge branch 'edwardk/retriever-parse-vllm' of github.com:edknv/nv-in…

a9a4e13

…gest into edwardk/retriever-parse-vllm

move open ai tools to chat_completions.py

bb0fec4

edknv merged commit 349ce96 into NVIDIA:main Apr 4, 2026
5 checks passed

edknv deleted the edwardk/retriever-parse-vllm branch April 4, 2026 01:31

edknv mentioned this pull request Apr 4, 2026

Fix nemotron_parse: disable KV cache and suppress pdfium text #1746

Closed

edknv merged 21 commits into NVIDIA:main from edknv:edwardk/retriever-parse-vllm

2 participants
