docs: opensearch connector feature (#12998)

mendonk · aimurphy · web-flow · commit b707c9a41dfb · 2026-05-07T18:28:04.000Z
* docs-add-opensearch-provider-and-adjust-kb-docs

* docs-combine-kb-config-sections-and-update-partial

* add-release-note

* Apply suggestions from code review

Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>

* docs-clarify-embedding-model-step

---------

Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
diff --git a/docs/docs/Develop/knowledge.mdx b/docs/docs/Develop/knowledge.mdx
@@ -49,33 +49,6 @@ import PartialKbSummary from '@site/docs/_partial-kb-summary.mdx';
 
 <PartialKbSummary />
 
-### Knowledge base storage locations
-
-Each knowledge base is a [ChromaDB](https://docs.trychroma.com/docs/overview/introduction) vector database.
-Each database is stored in a separate directory that contains the following:
-
-- **Vector embeddings**: Embeddings are stored using the Chroma vector database.
-- **Metadata files**: Configuration and embedding model information.
-- **Source data**: The original data used to create the knowledge base.
-
-Knowledge bases are stored local to your Langflow instance.
-The default storage location depends on your operating system and installation method:
-
-- **Langflow Desktop**:
-    - **macOS**: `/Users/<username>/.langflow/knowledge_bases`
-    - **Windows**: `C:\Users\<name>\AppData\Roaming\com.LangflowDesktop\knowledge_bases`
-- **Langflow OSS**:
-    - **macOS/Windows/Linux/WSL with `uv pip install`**: `<path_to_venv>/lib/python3.12/site-packages/langflow/knowledge_bases` (Python version can vary. Knowledge bases aren't shared between virtual environments.)
-    - **macOS/Windows/Linux/WSL with `git clone`**: `<path_to_clone>/src/backend/base/langflow/knowledge_bases`
-
-If you set the `LANGFLOW_CONFIG_DIR` environment variable, the `knowledge_bases` subdirectory is created relative to that path.
-
-To change the default `knowledge_bases` directory path, set the `LANGFLOW_KNOWLEDGE_BASES_DIR` environment variable:
-
-```bash
-export LANGFLOW_KNOWLEDGE_BASES_DIR="/path/to/parent/directory"
-```
-
 ### Create a knowledge base
 
 In this example, you'll create a knowledge base of chunked customer orders.
@@ -84,15 +57,16 @@ To follow along with this example, download [`customer-orders.csv`](/files/custo
 1. On the [**Projects** page](/concepts-flows#projects) page, click <Icon name="Library" aria-hidden="true"/>**Knowledge** below the list of projects to view and manage your knowledge bases.
 
 2. To create a new knowledge base, click <Icon name="Plus" aria-hidden="true"/>**Add Knowledge**.
-3. In the **Create Knowledge Base** pane, enter a name for your knowledge base, and select an embedding model.
+3. In the **Create Knowledge Base** pane, enter a name for your knowledge base, select an embedding model, and select a **DB Provider**.
     <PartialGlobalModelProviders />
+    The **DB Provider** determines where embeddings are stored. It defaults to the provider configured in **Settings → DB Providers**. Existing knowledge bases keep their original backend — changing the global DB Provider only affects new knowledge bases.
 4. To configure sources for your knowledge base, click **Configure Sources**.
 Optionally, to create an empty knowledge base, click **Create**.
 5. In the **Configure Sources** pane, configure the sources for your knowledge base's data, and also how the embedded data will be chunked for vector search retrieval.
     For this example, click <Icon name="Upload" aria-hidden="true"/>**Add Sources**, and then select the downloaded [`customer-orders.csv`](/files/customer_orders.csv) file from your local machine.
     The default settings for **Chunk Size**, **Chunk Overlap**, and **Separator** are fine.
     To continue, click **Next Step**.
-6. The **Review & Build** pane allows you to preview your first chunk before you commit to spending tokens to embedall of the data into the knowledge base.
+6. The **Review & Build** pane allows you to preview your first chunk before you commit to spending tokens to embed all of the data into the knowledge base.
     If the chunk isn't what you want to embed, click **Back** to configure your chunking strategy.
     To embed this data, click **Create**.
 7. Your data is embedded as a **Knowledge**.
@@ -113,6 +87,17 @@ For each knowledge base, you can see the following information:
 * The average length and size of chunks
 * The knowledge base's status
 
+The icon next to the knowledge base name indicates the source file type:
+
+* <Icon name="File" aria-hidden="true"/> Red — PDF
+* <Icon name="FileChartColumn" aria-hidden="true"/> Green — CSV
+* <Icon name="FileType" aria-hidden="true"/> Purple — plain text (`.txt`)
+* <Icon name="FileText" aria-hidden="true"/> Fuchsia — Markdown (`.md`, `.mdx`)
+* <Icon name="FileCode" aria-hidden="true"/> Yellow — HTML
+* <Icon name="FileCode" aria-hidden="true"/> Blue — code files (`.py`, `.js`, `.ts`)
+* <Icon name="FileJson" aria-hidden="true"/> Indigo — JSON
+* <Icon name="Layers" aria-hidden="true"/> — multiple source types
+
 Chunking behavior is determined by the embedding model, and the embedding model is set when you create the knowledge base.
 If you need to change the embedding model, you must delete and recreate the knowledge base.
 
@@ -125,6 +110,121 @@ If any flows use the deleted knowledge base, you must update them to use a diffe
 
 For more information on using knowledge bases in a flow, see the [**Knowledge Base** component](/knowledge-base) documentation.
 
+### Configure vector database providers
+
+**DB Providers** are the vector databases where your knowledge bases store and search embeddings.
+To configure these providers, go to **Settings → DB Providers**.
+The selected provider applies to all new knowledge bases you create.
+Existing knowledge bases continue to use the provider that was active when they were created.
+
+#### Chroma (default)
+
+By default, knowledge bases use [ChromaDB](https://docs.trychroma.com/docs/overview/introduction) as a local vector store, with no additional setup required.
+Knowledge bases are stored local to your Langflow instance.
+The default storage location depends on your operating system and installation method:
+
+- **Langflow Desktop**:
+    - **macOS**: `/Users/<username>/.langflow/knowledge_bases`
+    - **Windows**: `C:\Users\<name>\AppData\Roaming\com.LangflowDesktop\knowledge_bases`
+- **Langflow OSS**:
+    - **macOS/Windows/Linux/WSL with `uv pip install`**: `<path_to_venv>/lib/python3.12/site-packages/langflow/knowledge_bases` (Python version can vary. Knowledge bases aren't shared between virtual environments.)
+    - **macOS/Windows/Linux/WSL with `git clone`**: `<path_to_clone>/src/backend/base/langflow/knowledge_bases`
+
+If you set the `LANGFLOW_CONFIG_DIR` environment variable, the `knowledge_bases` subdirectory is created relative to that path.
+
+To change the default `knowledge_bases` directory path, set the `LANGFLOW_KNOWLEDGE_BASES_DIR` environment variable:
+
+```bash
+export LANGFLOW_KNOWLEDGE_BASES_DIR="/path/to/parent/directory"
+```
+
+#### OpenSearch
+
+To use OpenSearch as a database provider, you need a running OpenSearch cluster that is accessible to your Langflow instance.
+This example uses an OpenSearch container running locally, but you can also use a remote OpenSearch instance.
+
+1. For this example, start a local OpenSearch container with security disabled. This allows you to connect without a username, password, or TLS. This configuration is for example purposes only; it _isn't_ recommended in production environments.
+
+    ```bash
+    podman run -d \
+      --name opensearch \
+      -p 9200:9200 \
+      -p 9600:9600 \
+      -e "discovery.type=single-node" \
+      -e "plugins.security.disabled=true" \
+      -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=YOUR_OPENSEARCH_PASSWORD" \
+      opensearchproject/opensearch:latest
+    ```
+
+    :::note
+    OpenSearch 3.x requires `OPENSEARCH_INITIAL_ADMIN_PASSWORD` to be set even when security is disabled.
+
+    If the password fails validation, container startup exits immediately with `Password failed validation`.
+
+    The password must adhere to the [OpenSearch password complexity requirements](https://docs.opensearch.org/latest/security/configuration/demo-configuration/#setting-up-a-custom-admin-password).
+    :::
+
+2. Verify the cluster is reachable:
+
+    ```bash
+    curl -s http://localhost:9200
+    ```
+
+    A successful response indicates that the container has started and can receive requests:
+
+    ```json
+    {
+      "name" : "your-node-name",
+      "cluster_name" : "docker-cluster",
+      "version" : {
+        "distribution" : "opensearch",
+        "number" : "3.6.0"
+      },
+      "tagline" : "The OpenSearch Project: https://opensearch.org/"
+    }
+    ```
+
+    If you get no response or a connection error, the container might still be starting. Wait a few seconds and try again.
+
+3. To connect the OpenSearch cluster to Langflow as a DB provider, click **Settings**, and then click **DB Providers**.
+4. Select **OpenSearch**.
+5. Enter the following values for the local OpenSearch container:
+
+    - **Cluster URL**: Enter `http://localhost:9200`.
+    - **Username**: Leave blank if security is disabled. Otherwise, enter your basic auth username.
+    - **Password**: Leave blank if security is disabled. Otherwise, enter your basic auth password.
+    - **Default Index name**: Enter `langflow_knowledge`. This is the OpenSearch index that Langflow writes to and reads from. The index is created in the later ingestion step, so it isn't immediately available.
+    - **Vector field**: Enter `vector_field`. This is the document field that stores the embedding vector.
+    - **Text field**: Enter `text`. This is the document field that stores the chunk text.
+    - **Use TLS (HTTPS)**: Turn off. Enable if your cluster uses HTTPS.
+    - **Verify TLS certificate**: Turn off. Enable if your cluster uses CA-signed certificates.
+
+6. Click **Save and Use OpenSearch**.
+
+    Optionally, click **Test Connection** to verify that Langflow can reach your OpenSearch cluster before saving.
+
+    The OpenSearch cluster is now configured as a DB provider in Langflow, so you can create a knowledge base that stores its embeddings in OpenSearch.
+
+7. Click <Icon name="Library" aria-hidden="true"/> **Knowledge**, and then click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**.
+
+8. Enter a name for this knowledge base. The name can be anything, and doesn't need to match the OpenSearch index name.
+The name becomes the internal label used to scope searches to this knowledge base within the shared OpenSearch index.
+
+9. Select an embedding model.
+When you create a knowledge base in Langflow, you choose one of your configured embedding model providers. After you create a knowledge base, you can't change its embedding model unless you delete and recreate the knowledge base. For more information, see [Embedding Model](/components-embedding-models).
+
+10. Optional: Add **Custom Metadata Fields** to tag every chunk with additional context. For example, if you're ingesting files from multiple teams, add a field `team` with a value of `support`. When the **Knowledge Base** component searches, you can then filter results to only return chunks where `team` equals `support` to keep results scoped to the support team's content.
+
+11. Click **Next Step**.
+
+12. Add your source files and configure chunking settings, then click **Next Step**.
+
+13. In the **Review & Build** pane, preview the first chunk of your data and confirm the chunk size is appropriate for your use case. A typical chunk size is 512–1000 characters. Smaller chunks support more granular retrieval but they can lose context across chunks.
+
+14. Click **Create**.
+
+The knowledge base is now available to use in a flow with the **Knowledge Ingestion** and **Knowledge Base** components.
+
 ## See also
 
 * [Use Langflow agents](/agents)
diff --git a/docs/docs/Support/release-notes.mdx b/docs/docs/Support/release-notes.mdx
@@ -51,6 +51,14 @@ To avoid the impact of potential breaking changes and test new versions, the Lan
 Highlights of this release include the following changes.
 For all changes, see the [Changelog](https://github.com/langflow-ai/langflow/releases).
 
+### New features and enhancements
+
+- Database connectors for knowledge bases
+
+    Knowledge bases now support configurable vector database backends through **DB Providers** configured in **Settings → DB Providers**.
+
+    For setup instructions and configuration details, see [Manage vector data](/knowledge).
+
 ### Deprecations
 
 - Voice mode is removed
diff --git a/docs/docs/_partial-kb-summary.mdx b/docs/docs/_partial-kb-summary.mdx
@@ -1,12 +1,14 @@
-A Langflow knowledge base is a local vector database that is stored in Langflow storage.
+A Langflow knowledge base is a vector database that stores embeddings for use in your flows.
+By default, knowledge bases use Chroma as a local vector store, but you can configure an external vector database provider such as OpenSearch.
+For more information, see [Configure vector database providers](/knowledge#configure-vector-database-providers).
 
-Because knowledge bases are local, the data isn't remotely requested and re-ingested with every flow run.
-This can be more efficient than using a remote vector database, and it is a good choice for flows that use custom, domain-specific datasets, like slices of customer and product data.
+Because knowledge bases don't re-ingest data with every flow run, they can be more efficient than using a remote vector database.
+They are a good choice for flows that use custom, domain-specific datasets, like slices of customer and product data.
 
 You can use knowledge base components in much the same way that you use vector store components.
 However, there are several key differences:
 
-* **Local storage**: Langflow knowledge bases are exclusively local.
+* **Local storage by default**: Langflow knowledge bases use Chroma local storage by default.
 In contrast, only some vector store components support local databases.
 * **Built-in embedding models**: Langflow knowledge bases include built-in support for several embedding models.
 Other models aren't supported for use with knowledge bases.
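
The cluster check in the OpenSearch setup steps above (`curl -s http://localhost:9200`) can be scripted so that it passes only when the response actually comes from OpenSearch. This is a minimal sketch, not part of Langflow or OpenSearch: the function name `is_opensearch_response` is hypothetical, and it assumes the root endpoint returns the `"distribution" : "opensearch"` field shown in the sample response.

```bash
#!/usr/bin/env bash

# Hypothetical helper (an assumption for illustration, not a Langflow or OpenSearch command).
# Reads a root-endpoint JSON response on stdin and exits 0 only if the response
# reports the "opensearch" distribution; exits non-zero for anything else,
# including empty input from a cluster that is still starting.
is_opensearch_response() {
  grep -Eq '"distribution"[[:space:]]*:[[:space:]]*"opensearch"'
}
```

To use it against the local container from the example, pipe the `curl` output into the function: `curl -s http://localhost:9200 | is_opensearch_response && echo "cluster reachable"`. Matching on the response body rather than only on the `curl` exit status avoids a false pass when some other service is listening on port 9200.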