(retriever) fix image ingestion pipeline and restore api key resolution #1790

Merged
edknv merged 7 commits into NVIDIA:main from edknv:edwardk/retriever-fix-remote-endpoint on Apr 3, 2026
edknv (Collaborator) commented Apr 3, 2026

Description

  • GPU scheduling for image extraction: MultiTypeExtractOperator now inherits from GPUOperator so it gets GPU resources when running local models. The executor also checks nested ExtractParams for invoke URLs (not just top-level kwargs) to correctly detect remote vs local mode.
  • Restore NVIDIA_API_KEY auto-resolution: The refactor in "Refactor step" (#1778) dropped the resolve_remote_api_key() calls from the ingestor API. They are re-added in GraphIngestor so that NVIDIA_API_KEY is automatically picked up from the environment.
  • Image captioning for standalone images: ImageLoadActor now includes the full image in the images list so CaptionActor can caption standalone image files (not just images extracted from PDFs).
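
To illustrate the first two items, here is a minimal sketch of the two checks they describe: looking for an invoke URL in nested params as well as top-level kwargs, and falling back to the NVIDIA_API_KEY environment variable. It is not the code from this PR. The `ExtractParams` dataclass below is a hypothetical stand-in with assumed field names, and `has_invoke_url` and `resolve_api_key` are illustrative helpers, not the project's `resolve_remote_api_key()`.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ExtractParams:
    """Hypothetical stand-in for the nested params object (field names assumed)."""

    invoke_url: Optional[str] = None
    api_key: Optional[str] = None


def has_invoke_url(kwargs: Mapping[str, Any]) -> bool:
    """Return True if a top-level or nested value carries a non-empty invoke URL."""
    for key, value in kwargs.items():
        # Top-level kwarg whose name ends in "invoke_url" and is set.
        if key.endswith("invoke_url") and value:
            return True
        # Nested params object: the case the description says was missed.
        if isinstance(value, ExtractParams) and value.invoke_url:
            return True
    return False


def resolve_api_key(explicit: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed key, else fall back to NVIDIA_API_KEY."""
    return explicit or os.environ.get("NVIDIA_API_KEY") or None


def needs_local_gpu(kwargs: Mapping[str, Any]) -> bool:
    """Assumed policy: no invoke URL anywhere means local models, hence a GPU."""
    return not has_invoke_url(kwargs)
```

Under these assumptions, `needs_local_gpu({"extract_params": ExtractParams(invoke_url="https://example.invalid/v1")})` is False because the nested URL marks the stage as remote, while `needs_local_gpu({"extract_params": ExtractParams()})` is True.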

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

edknv requested review from a team as code owners April 3, 2026 01:43
edknv requested review from charlesbluca and jperez999 April 3, 2026 01:43


- class MultiTypeExtractOperator(AbstractOperator):
+ class MultiTypeExtractOperator(AbstractOperator, GPUOperator):
Collaborator:

Why does this need the GPU operator, specifically? Which extractor underneath uses the GPU?

edknv (Collaborator, Author) Apr 3, 2026

This was before the introduction of ArchetypeOperator. I separated it into a CPU operator and a GPU operator in 1b2196f.

edknv force-pushed the edwardk/retriever-fix-remote-endpoint branch from 1900031 to 30061d2 April 3, 2026 19:10
edknv merged commit 19d5541 into NVIDIA:main Apr 3, 2026
6 checks passed
edknv deleted the edwardk/retriever-fix-remote-endpoint branch April 3, 2026 19:50

2 participants