feat: add Valkey vector store integration #2476
Open
daric93 wants to merge 8 commits into
@daric93 is attempting to deploy a commit to the Arc53 Team on Vercel. A member of the Team first needs to authorize it.
c9976d7 to 677fc12
Author
Hi @dartpain, I'd appreciate a review on this when you get a chance, provided a Valkey vector store backend aligns with the project's roadmap. If someone else is better suited to review the storage/vector store layer, could you point me their way? Thanks!
Hey @daric93 👋 I've opened a PR against your … This sends a …
…plication/vectorstore/ using valkey-glide-sync and the valkey-search module for HNSW vector similarity search.

- Add ValkeyStore class extending BaseVectorStore with full interface: search, add_texts, delete_index, get_chunks, add_chunk, delete_chunk
- Register 'valkey' in VectorCreator factory
- Add VALKEY_* settings to Settings class (host, port, password, tls, index_name, prefix)
- Add valkey-glide-sync==2.3.1 to requirements.txt
- Add Valkey config examples to .env-template
- Add 22 unit tests (mocked, no external deps)
- Add 7 integration tests (requires running Valkey with search module)

Signed-off-by: Daria Korenieva <daric2612@gmail.com>
…aracters in source_id for tag queries (prevents malformed FT.SEARCH queries with dots, hyphens, slashes, etc.)

- Use explicit password check (is not None and != '') instead of truthiness to handle empty-string env vars correctly
- Add request_timeout=5000ms to GlideClientConfiguration
- Use ReturnField to avoid fetching embedding blobs in search/get_chunks
- Paginate delete_index and get_chunks to handle >10k documents
- Batch DELETE calls (100 keys per call) for efficiency
- Improve error message in add_texts to report partial write count
- Add unit tests for tag escaping, batch delete, pagination, and password handling edge cases

Signed-off-by: Daria Korenieva <daric2612@gmail.com>
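The tag escaping described in this commit could look roughly like the sketch below. The helper names `escape_tag` and `source_filter` are hypothetical, and escaping every character other than letters, digits, and underscore is an assumption; the escape set actually used in the PR may be narrower.

```python
import re

# Assumption: escape everything except letters, digits, and underscore.
_TAG_SPECIAL = re.compile(r"([^A-Za-z0-9_])")


def escape_tag(value: str) -> str:
    """Backslash-escape characters that could break a tag query."""
    return _TAG_SPECIAL.sub(r"\\\1", value)


def source_filter(source_id: str) -> str:
    """Build a tag filter on the source_id field for use in FT.SEARCH."""
    return "@source_id:{" + escape_tag(source_id) + "}"
```

With this sketch, a source_id such as `docs/v1.2` is sent with its slash and dot escaped rather than being parsed as query syntax.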
…ALKEY_DISTANCE_METRIC (cosine/l2/ip), VALKEY_VECTOR_TYPE (float32), and VALKEY_VECTOR_ALGORITHM (hnsw/flat) settings with safe defaults

- Log chosen config at index creation time
- Fall back to defaults with a warning if unrecognized values are provided
- Add '|' to tag escape character set (was missing)
- Support FLAT vector algorithm as alternative to HNSW
- Add unit tests for all resolver methods and pipe escaping
- Update .env-template and integration test fixture

Signed-off-by: Daria Korenieva <daric2612@gmail.com>
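The "fall back to defaults with a warning" behaviour in this commit can be illustrated with a small resolver. This is a sketch only: the function name `resolve_distance_metric` is hypothetical, and treating cosine as the default is an assumption, since the commit message does not say which defaults are used.

```python
import logging

logger = logging.getLogger(__name__)

# Setting value -> keyword used in the index definition.
_DISTANCE_METRICS = {"cosine": "COSINE", "l2": "L2", "ip": "IP"}


def resolve_distance_metric(raw, default="cosine"):
    """Map a setting value to an index keyword, warning on unknown input."""
    key = (raw or "").strip().lower()
    if key in _DISTANCE_METRICS:
        return _DISTANCE_METRICS[key]
    # Assumption: the default is cosine; unknown values never raise.
    logger.warning(
        "Unrecognized distance metric %r; falling back to %r", raw, default
    )
    return _DISTANCE_METRICS[default]
```

The same pattern would apply to the vector type and algorithm settings, each with its own table of accepted values.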
… tests

- Add close() method and __del__ to release GlideClient TCP connection
- _paginated_source_scan now uses ReturnField('source_id') to avoid fetching full document content (only key names needed for deletion)
- _ensure_index_exists widens error matching to also catch 'index already' phrasing, reducing brittleness on different Valkey versions
- Add 6 new unit tests: close() lifecycle (3), FLAT algorithm path (1), already-exists handling (1), unknown error re-raise (1)
- Skipped lazy-import refactor; imports remain at module top per project style

Signed-off-by: Daria Korenieva <daric2612@gmail.com>
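The already-exists handling and unknown-error re-raise described in this commit might be sketched as follows. The names `is_index_already_exists` and `ensure_index` are hypothetical, the "already exists" phrasing is an assumption (only 'index already' is quoted in the commit message), and `create_index` stands in for whatever call actually issues the index creation.

```python
def is_index_already_exists(error: Exception) -> bool:
    """Return True if the error message says the index is already present."""
    message = str(error).lower()
    # Assumption: servers phrase this as "already exists" or "index already".
    return "already exists" in message or "index already" in message


def ensure_index(create_index) -> bool:
    """Call create_index(); return True if created, False if it existed."""
    try:
        create_index()
        return True
    except Exception as exc:  # a later commit narrows this to RequestError
        if is_index_already_exists(exc):
            return False
        raise  # unknown errors are not swallowed
```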
…re, Settings, and Postgres Migration pages to list Valkey alongside the other supported vector store backends.

Signed-off-by: Daria Korenieva <daric2612@gmail.com>
…, tag escaping, typed exceptions

Signed-off-by: Daria Korenieva <daric2612@gmail.com>
…guard in vector_creator.py (fragility concern)

- Catch RequestError instead of bare Exception in _ensure_index_exists
- Re-raise exceptions in delete_index so callers can handle failures
- Extract shared _paginated_search generator with max-iterations guard
- Add __enter__/__exit__ for deterministic connection cleanup
- Cap search k to [1, 100] to prevent memory exhaustion
- Make request_timeout configurable via VALKEY_REQUEST_TIMEOUT setting
- Refactor integration test fixture to use real Settings instance
- Update tests to match new behavior, add context manager + k bounds tests

All 63 tests pass (56 unit + 7 integration).

Signed-off-by: Daria Korenieva <daric2612@gmail.com>
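A paginated generator with a max-iterations guard, as mentioned in this commit, can be sketched like this. It is not the PR's `_paginated_search`: the name `paginated_search`, the `fetch_page(offset, limit)` callback, and the default sizes are all assumptions made for illustration.

```python
from typing import Callable, Iterator, List


def paginated_search(
    fetch_page: Callable[[int, int], List[str]],
    page_size: int = 1000,
    max_iterations: int = 10_000,
) -> Iterator[str]:
    """Yield results page by page, stopping on a short or empty page."""
    offset = 0
    for _ in range(max_iterations):
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # last page reached
        offset += len(page)
    # Guard: a server that keeps returning full pages cannot loop forever.
    raise RuntimeError("pagination did not finish within max_iterations")
```

The guard turns a potential infinite loop into an explicit error, which fits the commit's choice to re-raise failures so callers can handle them.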
677fc12 to 6b6a76f
Signed-off-by: Daria Korenieva <daric2612@gmail.com>
Implement ValkeyStore in application/vectorstore/ using valkey-glide-sync and the valkey-search module for HNSW vector similarity search.

Fixes: #2475
What kind of change does this PR introduce?
Feature — adds Valkey as a new vector store backend option.
Why was this change needed?
DocsGPT users who run Valkey as their primary data store had no way to use its vector search capabilities within DocsGPT, so they had to run a separate vector database. This integration provides a high-performance, low-latency vector search option using Valkey's native HNSW indexing.
Changes
- Add ValkeyStore class extending BaseVectorStore with full interface: search, add_texts, delete_index, get_chunks, add_chunk, delete_chunk
- Register "valkey" in VectorCreator factory
- Add VALKEY_* settings to Settings class (host, port, password, tls, index_name, prefix, request_timeout, distance_metric, vector_type, vector_algorithm)
- Add valkey-glide-sync==2.3.1 to requirements.txt
- Add Valkey config examples to .env-template

Other information
- … valkey-glide-sync), consistent with all other vector store implementations in the project. The Flask backend and Celery workers are synchronous; no async runtime exists in the application.
- … valkey-search module loaded
- … VECTOR_STORE=valkey to enable
- … valkey-glide-sync is not installed
- k is capped at [1, 100] to prevent server memory exhaustion
- delete_index re-raises exceptions so callers can handle failures during re-indexing
- … with ValkeyStore(...) as store:) for deterministic connection cleanup
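The k cap and the context-manager cleanup listed above can be illustrated with a stand-in class. `SketchStore` is hypothetical and holds a plain flag instead of a real client connection; it only shows the shape of the behaviour, not the actual ValkeyStore code.

```python
class SketchStore:
    """Hypothetical stand-in for ValkeyStore with a fake connection flag."""

    MIN_K, MAX_K = 1, 100

    def __init__(self):
        self.connected = True  # a real store would open a client here

    def clamp_k(self, k: int) -> int:
        """Bound the requested result count to [MIN_K, MAX_K]."""
        return max(self.MIN_K, min(self.MAX_K, k))

    def close(self):
        """Release the connection; safe to call more than once."""
        self.connected = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions from the with-block


with SketchStore() as store:
    top_k = store.clamp_k(500)  # bounded to 100 before any query is built
```

Leaving the `with` block closes the connection even if the body raises, which is the deterministic cleanup the description refers to.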