Open
added 17 commits
September 8, 2025 17:06
- Added Solr to the list of supported databases in the main README.
- Implemented Solr-specific command-line arguments in `cmdline_args.py` for configuring Solr URL, index name, and index configuration.
- Updated the `Database` enum in `vsb/databases/__init__.py` to include Solr and return the `SolrDB` class.
- Created necessary files and logic to support Solr database operations, including index creation and management.
- Ensured compatibility with existing VSB workflows and command-line interface.
…nvert existing parquet files to the format Pinecone requires; the other to start the import itself.
… continuing a workload that failed.
…esume
- Add persistent `requests.Session` with `HTTPAdapter` + `Retry`; centralize HTTP helpers.
- Ensure schema on startup:
  - add/replace `knn_vector` fieldType with correct `vectorDimension` and `similarityFunction`
  - ensure `id` + `values` fields
  - add typed, multiValued dynamic fields (`*_s`, `*_i`, `*_f`, `*_b`)
- Core lifecycle:
  - `core_exists`, `_wait_for_core_loaded`, `_recreate_core`, `_unload_hard`
  - `create_index` now cleanly creates core and refuses to recreate if it already exists
- Ingest robustness:
  - `_filter_existing_ids` to skip already-present docs (RTG via `/select`)
  - `_normalize_docs` + type inference + field auto-creation
  - batched add with retries and `commitWithin=60000`
  - `delete_all`, `delete_index`, and explicit final `commit()`
- Query/filters:
  - `_to_solr_fq` builds typed `fq` from dicts/lists/bools/nums
  - `search` defaults `ef=200`, returns `id,score` only, supports dict `fq`
- API/behavior changes in wrappers:
  - `SolrClient.__init__` now `core=<name>`, keyword-only; supports `start_from`, `overwrite`, retry settings
  - `SolrNamespace.insert_batch` respects `skip_populate`, passes `start_from`/`overwrite`; `query` returns list of IDs directly
  - `SolrDB` passes args by name; handles `overwrite` vs resume; lowers `max_batch_size` to 100; commits after finalize

BREAKING CHANGES:
- `SolrClient.__init__` signature changed: use `core=<name>` and keyword args; `index_name` removed.
- `create_index` errors if core already exists (use `overwrite`/drop if you need a fresh core).
- `search` no longer returns `metadata` in `fl` (now `id,score`); update callers if they relied on metadata.
- `SolrDB` expects new config keys: `start_from`, `solr_max_retries`, `solr_retry_delay`.
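For reviewers unfamiliar with the typed dynamic fields mentioned in this commit, here is a minimal sketch of how metadata could be mapped onto the `*_s`, `*_i`, `*_f`, `*_b` suffixes. It is not the code from `solr.py`: the function names `field_suffix` and `normalize_doc` are hypothetical, and picking the suffix from the first element of a list is a simplifying assumption.

```python
from __future__ import annotations

from typing import Any


def field_suffix(value: Any) -> str:
    """Pick a dynamic-field suffix for one value (assumed mapping)."""
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "_b"
    if isinstance(value, int):
        return "_i"
    if isinstance(value, float):
        return "_f"
    return "_s"


def normalize_doc(doc_id: str, values: list[float], metadata: dict[str, Any]) -> dict[str, Any]:
    """Flatten metadata into top-level, type-suffixed, multi-valued fields."""
    doc: dict[str, Any] = {"id": doc_id, "values": values}
    for key, raw in metadata.items():
        # Treat scalars as one-element lists so every field is multi-valued.
        items = list(raw) if isinstance(raw, (list, tuple)) else [raw]
        items = [v for v in items if v is not None]
        if not items:
            continue  # nothing to index for this key
        # Assumption: the first element decides the type of the whole field.
        suffix = field_suffix(items[0])
        if suffix == "_s":
            items = [str(v) for v in items]
        doc[f"{key}{suffix}"] = items
    return doc
```

With this sketch, `normalize_doc("a1", [0.1, 0.2], {"year": 2024, "active": True})` yields a document with `year_i` and `active_b` fields next to `id` and `values`.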
…esources
- Added skeleton configsets for Solr to ensure custom settings for cache are used
- Revised the solr.py module to better handle queries with Solr tuning
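As an illustration of the query path described in these commits (`_to_solr_fq` building typed `fq` clauses, and `search` returning `id,score` only), the following sketch builds request parameters for Solr's `{!knn}` query parser. The helper names `to_solr_fq` and `knn_params` are hypothetical and the escaping is deliberately minimal. The commits do not show how `ef` is passed to Solr, so it is left out here.

```python
from __future__ import annotations

from typing import Any


def _fmt(value: Any) -> str:
    """Render one filter value as a Solr query term."""
    if isinstance(value, bool):  # check bool before int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Quote strings and escape backslashes and embedded quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_solr_fq(filters: dict[str, Any]) -> list[str]:
    """Turn {field: value or list of values} into a list of fq clauses."""
    clauses = []
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            joined = " OR ".join(_fmt(v) for v in value)
            clauses.append(f"{field}:({joined})")
        else:
            clauses.append(f"{field}:{_fmt(value)}")
    return clauses


def knn_params(vector: list[float], top_k: int,
               filters: dict[str, Any] | None = None,
               field: str = "values") -> dict[str, Any]:
    """Build /select parameters for a dense-vector KNN query."""
    params: dict[str, Any] = {
        "q": f"{{!knn f={field} topK={top_k}}}{list(vector)}",
        "fl": "id,score",  # metadata is not returned
        "rows": top_k,
    }
    if filters:
        params["fq"] = to_solr_fq(filters)
    return params
```

Each entry in `fq` is sent as a separate filter query, so Solr can cache the filters independently of one another.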
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 20844949 | Triggered | Generic Password | 3c4f073 | tests/integration/test_solr.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace your secret and store it safely, following best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
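One way to address the flagged password in `tests/integration/test_solr.py` is to read credentials from the environment instead of hardcoding them. This is only a sketch: the variable names `SOLR_TEST_USER` and `SOLR_TEST_PASSWORD` and the helper `solr_credentials` are assumptions, not names used in this PR.

```python
from __future__ import annotations

import os
from typing import Mapping


def solr_credentials(env: Mapping[str, str] | None = None) -> tuple[str, str] | None:
    """Return (user, password) from the environment, or None if unset."""
    # Hypothetical variable names; use whatever your CI secret store provides.
    env = os.environ if env is None else env
    user = env.get("SOLR_TEST_USER")
    password = env.get("SOLR_TEST_PASSWORD")
    if not user or not password:
        return None  # callers can skip auth-dependent tests
    return user, password
```

A test can call `solr_credentials()` and skip itself when the result is `None`, which keeps the secret out of the repository.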
vkrishna1084
approved these changes
Sep 19, 2025
Contributor
Reviewed and merging the changes.
added 2 commits
September 19, 2025 14:24
Problem
We need to add first-class Solr support to VSB while ensuring robustness, configurability, and code quality.
Solution
- `SolrClient` & `SolrNamespace` in `vsb/databases/solr/solr.py` with robust retry, backoff, schema management, core lifecycle, and metadata flattening that promotes fields to top-level.
- `--query-limit` CLI option (`vsb/cmdline_args.py`) and plumbed it through workload builders (`locustfile.py`, `ParquetWorkload`) to allow small test runs.
- … (`vsb/metrics.py`) to guarantee correct recall.
- `deleteIndex`, `deleteDataDir`, `deleteInstanceDir`, proper polling, and Docker volume adjustments to avoid stale data.
- `StopUser` for clean Locust shutdown.
- … (`SOLR_JAVA_MEM`), G1GC flags, ulimits, CPU/memory quotas in `docker/solr/docker-compose.yml`.
- `solrconfig.xml` & `managed-schema.xml` with tuned `filterCache`, `queryResultCache`, `docValuesCache`, thread pools, and HNSW parameters (`ef`, `beamWidth`) via bind mounts.
- `tests/integration` for Solr mirroring the existing common test framework.
- `vsb/databases/solr/README.md` on configset seeding and Docker usage.

Type of Change
Test Plan
- `pytest tests/unit` and `pytest tests/integration --db solr` pass with query-limit and recall > 0.
- … `ef`, `fq` parameters return expected docs.
- `locust -f vsb/locustfile.py --query-limit 100 --database solr` and verify correct op/s and recall metrics.
- `black . --check` returns no changes.
- … `solrconfig.xml` without deleting `/var/solr/data`.
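For context on the recall check in the test plan, recall for a single query is commonly computed as the fraction of expected neighbours that appear in the returned IDs. The sketch below uses a hypothetical `recall_at_k` helper and may differ from what `vsb/metrics.py` actually does.

```python
from __future__ import annotations


def recall_at_k(retrieved_ids: list[str], expected_ids: list[str], k: int) -> float:
    """Fraction of the top-k expected IDs found in the top-k retrieved IDs."""
    expected = set(expected_ids[:k])
    if not expected:
        return 0.0  # assumption: no ground truth counts as zero recall
    # dict.fromkeys drops duplicate IDs while preserving order,
    # so a repeated hit is only counted once.
    unique_retrieved = dict.fromkeys(retrieved_ids[:k])
    hits = sum(1 for doc_id in unique_retrieved if doc_id in expected)
    return hits / len(expected)
```

Because `search` returns only `id` and `score`, comparing ID lists like this is enough to verify recall without fetching metadata.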