
feat: add pgvector vector_io provider#156

Merged
mergify[bot] merged 1 commit into opendatahub-io:main from Ygnas:pgvector
Dec 9, 2025

Conversation

Contributor

@Ygnas Ygnas commented Dec 9, 2025

RHAIENG-2277

What does this PR do?

Adds the PGVector provider to the midstream distro image.

Test Plan

Summary by CodeRabbit

  • New Features

    • Added pgvector as an opt-in remote vector storage provider (enabled via ENABLE_PGVECTOR).
    • Includes a local registry backend for pgvector persistence.
  • Chores

    • Added required database and async packages to support pgvector (Postgres and SQLite clients and asyncio DB tooling).

@Ygnas Ygnas requested a review from kelbrown20 as a code owner December 9, 2025 11:54
@coderabbitai
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

Adds a pgvector remote provider: documents it, declares provider_type and additional pip packages in build.yaml, and configures a conditional provider entry plus a sqlite-backed persistence backend in run.yaml.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| pgvector provider support: distribution/README.md, distribution/build.yaml, distribution/run.yaml | Introduced a remote::pgvector provider entry in README and run.yaml (conditional on ENABLE_PGVECTOR), added provider_type: remote::pgvector to build.yaml, added additional_pip_packages (aiosqlite, sqlalchemy[asyncio], asyncpg, psycopg2-binary), configured provider config (host, port, db, user, password) with persistence using a new kv_pgvector sqlite backend at /opt/app-root/src/.llama/distributions/rh/pgvector_registry.db. |
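
As a rough illustration of the build.yaml side of this change, the fragment below sketches how the provider declaration and the extra pip packages might look. Only `provider_type: remote::pgvector` and the four package names come from the change summary; the surrounding keys (`distribution_spec`, `providers`, `vector_io`) and the top-level placement of `additional_pip_packages` are assumptions about the file layout, not a copy of the actual diff.

```yaml
# Sketch only: the surrounding structure is assumed, not taken from the PR diff.
distribution_spec:
  providers:
    vector_io:
      # Provider declaration added by this PR
      - provider_type: remote::pgvector
additional_pip_packages:
  - aiosqlite            # asyncio client for SQLite
  - sqlalchemy[asyncio]  # SQLAlchemy with its asyncio extras
  - asyncpg              # asyncio client for PostgreSQL
  - psycopg2-binary      # PostgreSQL client shipped as prebuilt wheels
```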

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify pip package names and compatibility (aiosqlite, sqlalchemy[asyncio], asyncpg, psycopg2-binary).
  • Confirm environment variable gating (ENABLE_PGVECTOR) and any referenced PG-related env var names/usage.
  • Validate provider config fields align with pgvector/remote provider expectations.
  • Check sqlite backend path, permissions, and persistence namespace consistency.

Poem

🐇 I hopped in with a vector cheer,
New rows and keys now draw me near,
Host, port, creds — a tidy nest,
SQLite keeps our registry best,
Pip-packed and ready — hop to the test!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a pgvector vector_io provider to the distribution configuration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 722e5ef and 6a106ee.

📒 Files selected for processing (3)
  • distribution/README.md (1 hunks)
  • distribution/build.yaml (1 hunks)
  • distribution/run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • distribution/README.md
  • distribution/build.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (2)
distribution/run.yaml (2)

238-240: Storage backend configuration is consistent with other vector_io providers.

The kv_pgvector backend uses kv_sqlite with an appropriate db_path, matching the pattern established by kv_faiss, kv_milvus_inline, and kv_milvus_remote backends.
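
For orientation, a three-line backend entry of the kind described here could look roughly like the sketch below. The backend name, the `kv_sqlite` type, and the database path are taken from the walkthrough; the `storage.backends` nesting is an assumption, and the real file may build the path from an environment variable rather than hard-coding it.

```yaml
# Sketch only: the `storage.backends` nesting is assumed.
storage:
  backends:
    kv_pgvector:
      type: kv_sqlite
      # Path as reported in the walkthrough; the actual entry may be templated.
      db_path: /opt/app-root/src/.llama/distributions/rh/pgvector_registry.db
```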


96-106: pgvector provider configuration is syntactically correct and follows established patterns.

The conditional provider_id syntax (${env.ENABLE_PGVECTOR:+pgvector}), remote provider type, and persistence backend configuration align with existing vector_io providers. The PostgreSQL defaults (localhost, port 5432) and environment variable naming convention are appropriate.
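
A minimal sketch of what such a provider entry might look like follows. The `provider_id` expression, the provider type, the config field names, the `localhost`/`5432` defaults, and the `kv_pgvector` backend name come from this review; every `PGVECTOR_*` variable name, the empty defaults for `db`, `user`, and `password`, and the `namespace` value are hypothetical placeholders, since the review does not spell them out.

```yaml
# Sketch only: all env var names except ENABLE_PGVECTOR are hypothetical.
providers:
  vector_io:
    - provider_id: ${env.ENABLE_PGVECTOR:+pgvector}  # yields "pgvector" only when ENABLE_PGVECTOR is set
      provider_type: remote::pgvector
      config:
        host: ${env.PGVECTOR_HOST:=localhost}        # default from the review comment
        port: ${env.PGVECTOR_PORT:=5432}             # default from the review comment
        db: ${env.PGVECTOR_DB:=}                     # assumed variable name and empty default
        user: ${env.PGVECTOR_USER:=}                 # assumed variable name and empty default
        password: ${env.PGVECTOR_PASSWORD:=}         # assumed variable name and empty default
        persistence:
          namespace: vector_io::pgvector             # assumed namespace
          backend: kv_pgvector
```

Under this pattern the provider stays opt-in: when `ENABLE_PGVECTOR` is unset, the `:+` form is expected to expand to an empty id, so the entry would not be activated.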


@Ygnas Ygnas changed the title from "feat: add pgvector vectio_io provider" to "feat: add pgvector vectior_io provider" Dec 9, 2025
@Ygnas Ygnas changed the title from "feat: add pgvector vectior_io provider" to "feat: add pgvector vector_io provider" Dec 9, 2025
Collaborator

@skamenan7 skamenan7 left a comment

LGTM, thanks.

@nathan-weinberg nathan-weinberg requested a review from a team December 9, 2025 14:32
@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Dec 9, 2025
@nathan-weinberg nathan-weinberg removed the do-not-merge Apply to PRs that should not be merged (yet) label Dec 9, 2025
@mergify mergify bot merged commit 9d5e171 into opendatahub-io:main Dec 9, 2025
6 checks passed

4 participants