OpenSearchVectorStore similaritySearch Metadata filtering not working for exact match #6867

Open
5 tasks done
stevedaskam opened this issue Sep 23, 2024 · 4 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

const docs = [
  new Document({
    metadata: { user: "U12345-1234" },
    pageContent: "Amazon Simple Storage Service (Amazon S3) is a cloud storage service that allows users to store and retrieve data of any amount from anywhere.",
  }),
  new Document({
    metadata: { user: "U12345-5678" },
    pageContent: "Amazon Elastic Compute Cloud (Amazon EC2) provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.",
  }),
];

const vectorStore = await OpenSearchVectorStore.fromDocuments(docs, embeddings, {
  client: client,
  indexName: 'my-index',
  service: 'aoss'
});

const query = "What is S3?";
const k = 5; // Number of results to retrieve
const filter = {
  user: "U12345-1234"
};

// Returns no documents
const results = await vectorStore.similaritySearch(query, k, filter);

Error Message and Stack Trace (if applicable)

No response

Description

OpenSearchVectorStore similaritySearch with a metadata filter returns no documents, even though the filter value exactly matches the user field in the metadata of one of the indexed documents.

System Info

AWS Lambda running Node.js 20.x

dosubot bot commented Sep 23, 2024

I found a similar unsolved discussion and a related closed issue that might help you:

  1. How to filter metadata with OpenSearch similaritySearch [1].
  2. ElasticVectorSearch: exclude metadata filters not working due to syntax error in filter creation, which was closed by a pull request: community[patch]: fix: ElasticVectorSearch: exclude metadata filters … [2].

The issue might be due to the way the metadata filtering is being applied in the similarity search. Ensure that you are using the correct field name and match criteria in your searchOptions. Here is an example:

const exactValueResult = await store.similaritySearch(query, 4, {
  fields: ["metadata.user"],
  searchOptions: {
    query: { field: "metadata.user", match: "U12345-1234" },
  },
});

Additionally, verify that the metadata is correctly indexed and stored in the vector store. If the metadata is not properly indexed, the search might not be able to filter based on the specified criteria.
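One way to check how the metadata is indexed is to look at the index mapping. The sketch below is a minimal illustration, not part of LangChain.js: describeMetadataField is a hypothetical helper, and it assumes the metadata is stored under a top-level "metadata" object field and that its input has the shape of the body returned by client.indices.getMapping({ index }) in @opensearch-project/opensearch. The sample mapping is made up to show what default dynamic mapping typically produces for a string value.

```javascript
// Hypothetical helper: report how a metadata key is mapped in an index.
// `mappingBody` is assumed to be the `body` of an indices.getMapping response.
function describeMetadataField(mappingBody, indexName, key, metadataField = "metadata") {
  const properties = mappingBody?.[indexName]?.mappings?.properties;
  const field = properties?.[metadataField]?.properties?.[key];
  if (!field) {
    return { found: false, type: null, hasKeywordSubfield: false };
  }
  return {
    found: true,
    // Object fields carry no explicit `type` in a mapping, so fall back to "object".
    type: field.type ?? "object",
    hasKeywordSubfield: field.fields?.keyword?.type === "keyword",
  };
}

// Made-up example of a mapping body (assumption, not taken from the reporter's index).
const sampleMapping = {
  "my-index": {
    mappings: {
      properties: {
        metadata: {
          properties: {
            user: {
              type: "text",
              fields: { keyword: { type: "keyword", ignore_above: 256 } },
            },
          },
        },
      },
    },
  },
};

console.log(describeMetadataField(sampleMapping, "my-index", "user"));
```

Against a live cluster, the first argument would come from (await client.indices.getMapping({ index: "my-index" })).body. If the reported type is text rather than keyword, exact-match filters on that field may behave differently from what the filter object suggests.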

Make sure your OpenSearch instance is correctly set up and running. You can use the official Docker image to get started with OpenSearch. Here is an example of how to set up and query documents in OpenSearch:

import { Client } from "@opensearch-project/opensearch";
import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import { OpenSearchVectorStore } from "@langchain/community/vectorstores/opensearch";

const client = new Client({
  nodes: [process.env.OPENSEARCH_URL ?? "http://127.0.0.1:9200"],
});
const docs = [
  new Document({
    metadata: { user: "U12345-1234" },
    pageContent: "Amazon Simple Storage Service (Amazon S3) is a cloud storage service that allows users to store and retrieve data of any amount from anywhere.",
  }),
  new Document({
    metadata: { user: "U12345-5678" },
    pageContent: "Amazon Elastic Compute Cloud (Amazon EC2) provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.",
  }),
];

const vectorStore = await OpenSearchVectorStore.fromDocuments(docs, new OpenAIEmbeddings(), {
  client,
  indexName: 'my-index',
});

To query the documents:

const results = await vectorStore.similaritySearch(query, k, {
  fields: ["metadata.user"],
  searchOptions: {
    query: { field: "metadata.user", match: "U12345-1234" },
  },
});
console.log(JSON.stringify(results, null, 2));

Make sure your OpenSearch instance is running and properly configured to handle the metadata filtering [3][4][5][6].

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Sep 23, 2024
@stevedaskam

@dosu When I run this, I still don't get any documents back.

dosubot bot commented Sep 23, 2024

The issue might be due to the structure or content of the OpenSearch index. The similaritySearch method with metadata filtering requires a well-structured OpenSearch index where documents are enriched with both text and metadata properties. If the index is not properly structured or the documents do not contain the expected metadata properties, the search may not return any documents.

Here is a sample code snippet that demonstrates how to use the similaritySearch method with metadata filtering:

import { Client } from "@opensearch-project/opensearch";
import { OpenAIEmbeddings } from "@langchain/openai";
import { OpenSearchVectorStore } from "@langchain/community/vectorstores/opensearch";

const client = new Client({
  nodes: ["http://localhost:9200"], // URL for the OpenSearch instance
});

const indexName = "test_index";

const embeddings = new OpenAIEmbeddings();
const store = new OpenSearchVectorStore(embeddings, { client, indexName });
await store.deleteIfExists();

await store.addDocuments([
  { pageContent: "hello", metadata: { a: 2 } },
  { pageContent: "car", metadata: { a: 1 } },
  { pageContent: "adjective", metadata: { a: 1 } },
  { pageContent: "hi", metadata: { a: 1 } },
]);

const filter = { a: 1 };
// The third argument is the metadata filter object itself.
const results = await store.similaritySearch("hello!", 1, filter);

console.log(results);

/*
  [ Document { pageContent: 'car', metadata: { a: 1 } } ]
*/

await store.close();

Ensure that:

  1. The OpenSearch index is properly structured.
  2. Documents contain the expected metadata properties.
  3. The filter criteria match the metadata properties in the index.

If these conditions are met and the issue persists, further investigation into the index structure and content may be necessary [1][2].
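As one possible line of investigation, the sketch below illustrates why an exact-match lookup can miss when a string is indexed as an analyzed text field rather than as a keyword field. It is only an illustration under assumptions: roughAnalyze is a crude, hypothetical stand-in for a text analyzer (it splits on non-alphanumeric characters and lowercases), not the actual OpenSearch analyzer, and whether the index in this issue maps metadata.user as text is not confirmed anywhere in this thread.

```javascript
// Rough stand-in for an analyzer on a `text` field: split on anything that is
// not a letter or digit, then lowercase. Real analyzers follow Unicode
// segmentation rules and differ in detail.
function roughAnalyze(value) {
  return value
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

// A term-style lookup compares the query value, unanalyzed, with the indexed terms.
function termMatches(indexedTerms, queryValue) {
  return indexedTerms.includes(queryValue);
}

const stored = "U12345-1234";
const textTerms = roughAnalyze(stored); // terms indexed for an analyzed field
const keywordTerms = [stored]; // a keyword field indexes the value as-is

console.log(textTerms);
console.log(termMatches(textTerms, "U12345-1234")); // analyzed field
console.log(termMatches(keywordTerms, "U12345-1234")); // keyword field
```

In this simplified model the hyphenated, upper-case value survives intact only in the keyword case, which is why checking the field type in the mapping is a reasonable first step.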

stevedaskam commented Sep 24, 2024

@dosu The above code does not work, as "term" queries no longer seem to return results from OpenSearch for metadata fields. Is there any way to bypass this and pass in my own filter query?
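A possible workaround, sketched under assumptions rather than confirmed by the maintainers: build the search body by hand and send it through the OpenSearch client directly, so the filter clause is entirely under the caller's control. buildFilteredKnnBody is a hypothetical helper, not a LangChain.js API. The vector field name "embedding", the "metadata.user.keyword" subfield, and the three-element placeholder vector are all assumptions that would need to match the actual index.

```javascript
// Hypothetical helper: combine a k-NN clause with a caller-supplied filter clause.
// Field names are assumptions and must match the real index mapping.
function buildFilteredKnnBody(queryVector, k, filterClause, vectorField = "embedding") {
  return {
    size: k,
    query: {
      bool: {
        filter: [filterClause],
        must: [{ knn: { [vectorField]: { vector: queryVector, k } } }],
      },
    },
  };
}

const body = buildFilteredKnnBody(
  [0.1, 0.2, 0.3], // placeholder; a real vector would come from embeddings.embedQuery(query)
  5,
  { term: { "metadata.user.keyword": "U12345-1234" } } // assumes a keyword subfield exists
);

console.log(JSON.stringify(body, null, 2));
```

The resulting object could then be passed as client.search({ index: "my-index", body }). The hits returned this way are raw OpenSearch documents, so they would still need to be mapped back to Document instances by hand.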
