
Harden vector store query filter construction#21899

Open
NguyenCong2k wants to merge 2 commits into run-llama:main from NguyenCong2k:fix-vector-store-query-filters

@NguyenCong2k

Summary

  • parameterize Azure Cosmos NoSQL vector store delete queries
  • escape query/filter string literals before Azure AI Search and Alibaba Cloud OpenSearch deletes
  • validate and escape AnalyticDB metadata filter clauses
  • add targeted regression coverage for affected filter construction paths
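For Azure AI Search the delete filter is an OData expression, and OData escapes a single quote inside a string literal by doubling it. The sketch below shows that rule in isolation. `escape_single_quoted`, `ref_doc_filter`, and the `doc_id` field name are hypothetical, not the integration's code. The sketch also assumes, without confirming it, that the Alibaba Cloud OpenSearch filter literal uses the same quoting rule. That assumption is worth checking against that service's filter syntax.

```python
def escape_single_quoted(value: str) -> str:
    """Escape a value for use inside a '...' filter literal by doubling quotes."""
    return value.replace("'", "''")


def ref_doc_filter(field_name: str, ref_doc_id: str) -> str:
    """Build a hypothetical OData equality filter on `field_name`."""
    # Only the value is escaped; field_name is assumed to come from trusted config.
    return f"{field_name} eq '{escape_single_quoted(ref_doc_id)}'"


# A crafted id stays inside one literal instead of adding an `or` clause.
print(ref_doc_filter("doc_id", "x' or true or 'y"))
# doc_id eq 'x'' or true or ''y'
```

The sketch escapes only the value because, in this setup, the field name comes from the store's configuration rather than from caller input.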

Tests

```shell
uv run --no-project \
  --with ./llama-index-core \
  --with ./llama-index-integrations/vector_stores/llama-index-vector-stores-analyticdb \
  --with ./llama-index-integrations/vector_stores/llama-index-vector-stores-azurecosmosnosql \
  --with ./llama-index-integrations/vector_stores/llama-index-vector-stores-azureaisearch \
  --with ./llama-index-integrations/vector_stores/llama-index-vector-stores-alibabacloud-opensearch \
  --with pytest --with pytest-asyncio \
  python -m pytest --import-mode=importlib \
  llama-index-integrations/vector_stores/llama-index-vector-stores-analyticdb/tests/test_analyticdb.py \
  llama-index-integrations/vector_stores/llama-index-vector-stores-azurecosmosnosql/tests/test_azurecosmosnosql.py \
  llama-index-integrations/vector_stores/llama-index-vector-stores-azureaisearch/tests/test_azureaisearch.py \
  llama-index-integrations/vector_stores/llama-index-vector-stores-alibabacloud-opensearch/tests/test_vector_stores_alibabacloud_opensearch.py \
  -q
```

Result: 23 passed, 4 skipped.

Copilot AI review requested due to automatic review settings June 6, 2026 12:08
dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Jun 6, 2026
Copilot AI (Contributor) left a comment


Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

Hardens multiple vector store implementations against injection-style issues by parameterizing/escaping ref_doc_id (and other filter inputs) and adding regression tests to validate the safer behavior.

Changes:

  • Azure Cosmos DB NoSQL: parameterize delete() query instead of interpolating ref_doc_id.
  • Azure AI Search + Alibaba Cloud OpenSearch: escape single quotes in ref_doc_id used in filter expressions for (a)sync deletes.
  • AnalyticDB: validate metadata keys and escape SQL string values when building filter clauses; add tests for escaping and unsafe keys.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| llama-index-integrations/vector_stores/llama-index-vector-stores-azurecosmosnosql/tests/test_azurecosmosnosql.py | Adds a regression test ensuring delete() uses Cosmos parameterized queries. |
| llama-index-integrations/vector_stores/llama-index-vector-stores-azurecosmosnosql/llama_index/vector_stores/azurecosmosnosql/base.py | Switches delete() to use query parameters for ref_doc_id. |
| llama-index-integrations/vector_stores/llama-index-vector-stores-azureaisearch/tests/test_azureaisearch.py | Adds sync/async tests verifying ref_doc_id escaping in OData filters. |
| llama-index-integrations/vector_stores/llama-index-vector-stores-azureaisearch/llama_index/vector_stores/azureaisearch/base.py | Escapes ' in ref_doc_id before embedding it into the OData filter. |
| llama-index-integrations/vector_stores/llama-index-vector-stores-analyticdb/tests/test_analyticdb.py | Adds tests for SQL escaping and rejecting unsafe metadata keys. |
| llama-index-integrations/vector_stores/llama-index-vector-stores-analyticdb/llama_index/vector_stores/analyticdb/base.py | Adds metadata key validation and SQL string escaping in the filter clause builder. |
| llama-index-integrations/vector_stores/llama-index-vector-stores-alibabacloud-opensearch/tests/test_vector_stores_alibabacloud_opensearch.py | Adds an async delete test verifying ref_doc_id escaping in the filter. |
| llama-index-integrations/vector_stores/llama-index-vector-stores-alibabacloud-opensearch/llama_index/vector_stores/alibabacloud_opensearch/base.py | Escapes ' in ref_doc_id before building the OpenSearch filter. |


Comment on lines 63 to 66
```diff
 elif filter_.operator == FilterOperator.CONTAINS:
     return (
-        f"metadata_::jsonb->'{filter_.key}' {adb_operator} '[\"{filter_.value}\"]'"
+        f"metadata_::jsonb->'{key}' {adb_operator} '[\"{_escape_sql_str(filter_.value)}\"]'"
     )
```
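In the CONTAINS branch the value lands inside a JSON array literal, which itself sits inside a SQL string literal, so it passes through two grammars. The body of `_escape_sql_str` is not shown here. If it only doubles single quotes (an assumption), a value containing `"` or `\` would still produce invalid JSON. One way to cover both layers is to JSON-encode first and SQL-escape second, as in this sketch. `build_contains_clause` is hypothetical, and `@>` is assumed to be what the CONTAINS operator maps to.

```python
import json


def escape_sql_literal(text: str) -> str:
    # Standard SQL string-literal escaping: double every single quote.
    # Assumes standard_conforming_strings = on, so backslashes stay literal.
    return text.replace("'", "''")


def build_contains_clause(key: str, value: str, operator: str = "@>") -> str:
    """Hypothetical builder for a jsonb containment filter on metadata_."""
    # Layer 1: JSON-encode the array so quotes/backslashes in `value` stay valid JSON.
    json_array = json.dumps([value])
    # Layer 2: escape the JSON text (and the key) for the surrounding SQL literals.
    return (
        f"metadata_::jsonb->'{escape_sql_literal(key)}' "
        f"{operator} '{escape_sql_literal(json_array)}'"
    )


print(build_contains_clause("tags", 'say "hi"'))
# metadata_::jsonb->'tags' @> '["say \"hi\""]'
```

The order matters. SQL-escaping before JSON-encoding would put the doubled quotes into the JSON value itself, so the filter would no longer match the stored text.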
Comment on lines +44 to +48
```python
def _validate_metadata_key(key: str) -> str:
    # The key becomes part of the SQL/JSON-path text; allow a safe identifier charset only.
    if not re.fullmatch(r"[A-Za-z0-9_]+", str(key)):
        raise ValueError(f"Invalid metadata filter key: {key!r}")
    return str(key)
```
Comment on lines +315 to +319
```python
query=(
    "SELECT c.id, c.id AS partitionKey FROM c "
    f"WHERE c.{self._metadata_key}.ref_doc_id = @ref_doc_id"
),
parameters=[{"name": "@ref_doc_id", "value": ref_doc_id}],
```
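The Cosmos change keeps `ref_doc_id` out of the query text entirely by binding it as `@ref_doc_id`. The property path (`self._metadata_key`) is still interpolated into the text. The sketch below wraps the same query in a hypothetical helper, `find_ids_for_ref_doc`. As an extra assumption not made in the PR, it also restricts that path segment to a plain identifier. It calls `query_items(query=..., parameters=..., enable_cross_partition_query=...)` as on an azure-cosmos v4 `ContainerProxy`. It accepts any object with that method, so it can be exercised without a live account.

```python
import re
from typing import Any, Dict, Iterable, List


def find_ids_for_ref_doc(container: Any, metadata_key: str, ref_doc_id: str) -> List[str]:
    """Hypothetical helper: ids of items whose <metadata_key>.ref_doc_id matches."""
    # Extra hardening (assumption, not in the PR): the property path is part of
    # the query text, so accept only a plain identifier.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", metadata_key):
        raise ValueError(f"Invalid metadata key: {metadata_key!r}")
    query = (
        "SELECT c.id, c.id AS partitionKey FROM c "
        f"WHERE c.{metadata_key}.ref_doc_id = @ref_doc_id"
    )
    # ref_doc_id is sent as a bound parameter, never spliced into `query`.
    items: Iterable[Dict[str, Any]] = container.query_items(
        query=query,
        parameters=[{"name": "@ref_doc_id", "value": ref_doc_id}],
        enable_cross_partition_query=True,
    )
    return [item["id"] for item in items]
```

Because the hostile input only ever appears in `parameters`, a regression test can check two things: the query string is byte-for-byte constant, and the raw `ref_doc_id` shows up only in the parameter list.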

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d269ce042d


Comment on lines +46 to +47
```python
if not re.fullmatch(r"[A-Za-z0-9_]+", str(key)):
    raise ValueError(f"Invalid metadata filter key: {key!r}")
```

P2: Preserve valid JSON metadata keys

When an existing metadata key contains a non-identifier character, as in source.file, content-type, or a key with a space, this new validation makes every AnalyticDB query that filters on that key raise ValueError. Those keys are valid in BaseNode.metadata and are stored in the metadata_ JSON object. Since the key is already used as a quoted JSON key (metadata_->>'...'), escaping quotes in the key would harden the SQL without rejecting legitimate metadata schemas.
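The suggestion amounts to treating the key like any other string literal. With a text operand, `->>` looks up a single top-level key by its exact name, so the `.` in `source.file` or the `-` in `content-type` are ordinary characters inside the quotes. The sketch below assumes PostgreSQL string semantics with `standard_conforming_strings` on (the default since 9.1), under which doubling `'` is sufficient. `sql_quote` and `metadata_eq_clause` are hypothetical names, not the integration's functions.

```python
def sql_quote(text: str) -> str:
    """Render `text` as a SQL string literal, doubling embedded single quotes."""
    # Assumes standard_conforming_strings = on, so backslashes need no escaping.
    return "'" + str(text).replace("'", "''") + "'"


def metadata_eq_clause(key: str, value: str) -> str:
    """Hypothetical equality filter on a top-level key of the metadata_ JSON column.

    `->>` with a text operand looks up one key by its exact name, so dots,
    dashes and spaces in `key` are ordinary characters, not path syntax.
    """
    return f"metadata_->>{sql_quote(key)} = {sql_quote(value)}"


print(metadata_eq_clause("source.file", "report.pdf"))
# metadata_->>'source.file' = 'report.pdf'
```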


NguyenCong2k (Author) commented Jun 6, 2026

Updated AnalyticDB filter construction so metadata keys and values are escaped before being embedded in SQL filter fragments. This preserves valid metadata keys such as source.file, content-type, and keys with spaces while hardening generated query text. Focused vector store tests pass: 23 passed, 4 skipped.

dosubot (bot) added the size:S label (This PR changes 10-29 lines, ignoring generated files.) and removed the size:M label (This PR changes 30-99 lines, ignoring generated files.) Jun 6, 2026

Labels

size:S This PR changes 10-29 lines, ignoring generated files.


2 participants