Commit b707c9a

mendonk and aimurphy authored
docs: opensearch connector feature (#12998)
* docs-add-opensearch-provider-and-adjust-kb-docs
* docs-combine-kb-config-sections-and-update-partial
* add-release-note
* Apply suggestions from code review

  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestions from code review

  Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
* docs-clarify-embedding-model-step

---------

Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
1 parent 406b372 commit b707c9a

3 files changed

Lines changed: 143 additions & 33 deletions

docs/docs/Develop/knowledge.mdx

Lines changed: 129 additions & 29 deletions
Original file line number | Diff line number | Diff line change
@@ -49,33 +49,6 @@ import PartialKbSummary from '@site/docs/_partial-kb-summary.mdx';
4949

5050
<PartialKbSummary />
5151

52-
### Knowledge base storage locations
53-
54-
Each knowledge base is a [ChromaDB](https://docs.trychroma.com/docs/overview/introduction) vector database.
55-
Each database is stored in a separate directory that contains the following:
56-
57-
- **Vector embeddings**: Embeddings are stored using the Chroma vector database.
58-
- **Metadata files**: Configuration and embedding model information.
59-
- **Source data**: The original data used to create the knowledge base.
60-
61-
Knowledge bases are stored local to your Langflow instance.
62-
The default storage location depends on your operating system and installation method:
63-
64-
- **Langflow Desktop**:
65-
- **macOS**: `/Users/<username>/.langflow/knowledge_bases`
66-
- **Windows**: `C:\Users\<name>\AppData\Roaming\com.LangflowDesktop\knowledge_bases`
67-
- **Langflow OSS**:
68-
- **macOS/Windows/Linux/WSL with `uv pip install`**: `<path_to_venv>/lib/python3.12/site-packages/langflow/knowledge_bases` (Python version can vary. Knowledge bases aren't shared between virtual environments.)
69-
- **macOS/Windows/Linux/WSL with `git clone`**: `<path_to_clone>/src/backend/base/langflow/knowledge_bases`
70-
71-
If you set the `LANGFLOW_CONFIG_DIR` environment variable, the `knowledge_bases` subdirectory is created relative to that path.
72-
73-
To change the default `knowledge_bases` directory path, set the `LANGFLOW_KNOWLEDGE_BASES_DIR` environment variable:
74-
75-
```bash
76-
export LANGFLOW_KNOWLEDGE_BASES_DIR="/path/to/parent/directory"
77-
```
78-
7952
### Create a knowledge base
8053

8154
In this example, you'll create a knowledge base of chunked customer orders.
@@ -84,15 +57,16 @@ To follow along with this example, download [`customer-orders.csv`](/files/custo
8457
1. On the [**Projects** page](/concepts-flows#projects), click <Icon name="Library" aria-hidden="true"/>**Knowledge** below the list of projects to view and manage your knowledge bases.
8558

8659
2. To create a new knowledge base, click <Icon name="Plus" aria-hidden="true"/>**Add Knowledge**.
87-
3. In the **Create Knowledge Base** pane, enter a name for your knowledge base, and select an embedding model.
60+
3. In the **Create Knowledge Base** pane, enter a name for your knowledge base, select an embedding model, and select a **DB Provider**.
8861
<PartialGlobalModelProviders />
62+
The **DB Provider** determines where embeddings are stored. It defaults to the provider configured in **Settings → DB Providers**. Existing knowledge bases keep their original backend — changing the global DB Provider only affects new knowledge bases.
8963
4. To configure sources for your knowledge base, click **Configure Sources**.
9064
Optionally, to create an empty knowledge base, click **Create**.
9165
5. In the **Configure Sources** pane, configure the sources for your knowledge base's data, and also how the embedded data will be chunked for vector search retrieval.
9266
For this example, click <Icon name="Upload" aria-hidden="true"/>**Add Sources**, and then select the downloaded [`customer-orders.csv`](/files/customer_orders.csv) file from your local machine.
9367
The default settings for **Chunk Size**, **Chunk Overlap**, and **Separator** are fine.
9468
To continue, click **Next Step**.
95-
6. The **Review & Build** pane allows you to preview your first chunk before you commit to spending tokens to embedall of the data into the knowledge base.
69+
6. The **Review & Build** pane allows you to preview your first chunk before you commit to spending tokens to embed all of the data into the knowledge base.
9670
If the chunk isn't what you want to embed, click **Back** to configure your chunking strategy.
9771
To embed this data, click **Create**.
9872
7. Your data is embedded as a **Knowledge**.
@@ -113,6 +87,17 @@ For each knowledge base, you can see the following information:
11387
* The average length and size of chunks
11488
* The knowledge base's status
11589

90+
The icon next to the knowledge base name indicates the source file type:
91+
92+
* <Icon name="File" aria-hidden="true"/> Red — PDF
93+
* <Icon name="FileChartColumn" aria-hidden="true"/> Green — CSV
94+
* <Icon name="FileType" aria-hidden="true"/> Purple — plain text (`.txt`)
95+
* <Icon name="FileText" aria-hidden="true"/> Fuchsia — Markdown (`.md`, `.mdx`)
96+
* <Icon name="FileCode" aria-hidden="true"/> Yellow — HTML
97+
* <Icon name="FileCode" aria-hidden="true"/> Blue — code files (`.py`, `.js`, `.ts`)
98+
* <Icon name="FileJson" aria-hidden="true"/> Indigo — JSON
99+
* <Icon name="Layers" aria-hidden="true"/> — multiple source types
100+
116101
Chunking behavior is determined by the embedding model, and the embedding model is set when you create the knowledge base.
117102
If you need to change the embedding model, you must delete and recreate the knowledge base.
118103

@@ -125,6 +110,121 @@ If any flows use the deleted knowledge base, you must update them to use a diffe
125110

126111
For more information on using knowledge bases in a flow, see the [**Knowledge Base** component](/knowledge-base) documentation.
127112

113+
### Configure vector database providers
114+
115+
**DB Providers** are the vector databases where your knowledge bases store and search embeddings.
116+
To configure these providers, go to **Settings → DB Providers**.
117+
The selected provider applies to all new knowledge bases you create.
118+
Existing knowledge bases continue to use the provider that was active when they were created.
119+
120+
#### Chroma (default)
121+
122+
By default, knowledge bases use [ChromaDB](https://docs.trychroma.com/docs/overview/introduction) as a local vector store, with no additional setup required.
123+
Knowledge bases are stored local to your Langflow instance.
124+
The default storage location depends on your operating system and installation method:
125+
126+
- **Langflow Desktop**:
127+
- **macOS**: `/Users/<username>/.langflow/knowledge_bases`
128+
- **Windows**: `C:\Users\<name>\AppData\Roaming\com.LangflowDesktop\knowledge_bases`
129+
- **Langflow OSS**:
130+
- **macOS/Windows/Linux/WSL with `uv pip install`**: `<path_to_venv>/lib/python3.12/site-packages/langflow/knowledge_bases` (Python version can vary. Knowledge bases aren't shared between virtual environments.)
131+
- **macOS/Windows/Linux/WSL with `git clone`**: `<path_to_clone>/src/backend/base/langflow/knowledge_bases`
132+
133+
If you set the `LANGFLOW_CONFIG_DIR` environment variable, the `knowledge_bases` subdirectory is created relative to that path.
134+
135+
To change the default `knowledge_bases` directory path, set the `LANGFLOW_KNOWLEDGE_BASES_DIR` environment variable:
136+
137+
```bash
138+
export LANGFLOW_KNOWLEDGE_BASES_DIR="/path/to/parent/directory"
139+
```
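
As a minimal sketch of applying this setting, the following helper creates the directory and exports the variable in the current shell. The helper name `use_kb_dir` and the example path are hypothetical, not Langflow defaults. Whether Langflow treats the value as the `knowledge_bases` directory itself or as its parent isn't stated here, so confirm the resulting layout after the first run.

```bash
# Hypothetical helper: point Langflow at a custom knowledge base directory.
# The directory passed in is an illustrative assumption.
use_kb_dir() {
  local dir="$1"
  # Create the directory (and any missing parents) before Langflow starts.
  mkdir -p "$dir" || return 1
  # Export so that a Langflow process started from this shell inherits it.
  export LANGFLOW_KNOWLEDGE_BASES_DIR="$dir"
  printf '%s\n' "$LANGFLOW_KNOWLEDGE_BASES_DIR"
}

# Usage (in the same shell session that starts Langflow):
#   use_kb_dir "$HOME/langflow-data/knowledge_bases"
#   langflow run
```

Call the helper directly rather than inside `$(...)`, because a command substitution runs in a subshell and the export would not reach the shell that starts Langflow.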
140+
141+
#### OpenSearch
142+
143+
To use OpenSearch as a database provider, you need a running OpenSearch cluster that is accessible to your Langflow instance.
144+
This example uses an OpenSearch container running locally, but you can also use a remote OpenSearch instance.
145+
146+
1. For this example, start a local OpenSearch container with security disabled. This allows you to connect without a username, password, or TLS. This configuration is for example purposes only; it _isn't_ recommended in production environments.
147+
148+
```bash
149+
podman run -d \
150+
--name opensearch \
151+
-p 9200:9200 \
152+
-p 9600:9600 \
153+
-e "discovery.type=single-node" \
154+
-e "plugins.security.disabled=true" \
155+
-e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=YOUR_OPENSEARCH_PASSWORD" \
156+
opensearchproject/opensearch:latest
157+
```
158+
159+
:::note
160+
OpenSearch 3.x requires `OPENSEARCH_INITIAL_ADMIN_PASSWORD` to be set even when security is disabled.
161+
162+
If the password fails validation, container startup exits immediately with `Password failed validation`.
163+
164+
The password must adhere to the [OpenSearch password complexity requirements](https://docs.opensearch.org/latest/security/configuration/demo-configuration/#setting-up-a-custom-admin-password).
165+
:::
166+
167+
2. Verify the cluster is reachable:
168+
169+
```bash
170+
curl -s http://localhost:9200
171+
```
172+
173+
A successful response indicates that the container has started and can receive requests:
174+
175+
```json
176+
{
177+
"name" : "your-node-name",
178+
"cluster_name" : "docker-cluster",
179+
"version" : {
180+
"distribution" : "opensearch",
181+
"number" : "3.6.0"
182+
},
183+
"tagline" : "The OpenSearch Project: https://opensearch.org/"
184+
}
185+
```
186+
187+
If you get no response or a connection error, the container might still be starting. Wait a few seconds and try again.
188+
189+
3. To set OpenSearch as the DB provider in Langflow, click **Settings**, and then click **DB Providers**.
190+
4. Select **OpenSearch**.
191+
5. Enter the following values for the local OpenSearch container:
192+
193+
- **Cluster URL**: Enter `http://localhost:9200`.
194+
- **Username**: Leave blank if security is disabled. Otherwise, enter your basic auth username.
195+
- **Password**: Leave blank if security is disabled. Otherwise, enter your basic auth password.
196+
- **Default Index name**: Enter `langflow_knowledge`. This is the OpenSearch index that Langflow writes to and reads from. The index is created during the later ingestion step, so it isn't immediately available.
197+
- **Vector field**: Enter `vector_field`. This is the document field that stores the embedding vector.
198+
- **Text field**: Enter `text`. This is the document field that stores the chunk text.
199+
- **Use TLS (HTTPS)**: Turn off for this example. Turn on if your cluster uses HTTPS.
200+
- **Verify TLS certificate**: Turn off for this example. Turn on if your cluster uses CA-signed certificates.
201+
202+
6. Click **Save and Use OpenSearch**.
203+
204+
Optionally, click **Test Connection** to verify that Langflow can reach your OpenSearch cluster before saving.
205+
206+
OpenSearch is now configured as the DB provider in Langflow, so you can create a knowledge base that stores its embeddings in OpenSearch.
207+
208+
7. Click <Icon name="Library" aria-hidden="true"/> **Knowledge**, and then click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**.
209+
210+
8. Enter a name for this knowledge base. The name can be anything, and doesn't need to match the OpenSearch index name.
211+
The name becomes the internal label used to scope searches to this knowledge base within the shared OpenSearch index.
212+
213+
9. Select an embedding model.
214+
When you create a knowledge base in Langflow, you can choose one of your configured embedding model providers. Once you create a knowledge base, you cannot change its provider unless you recreate the knowledge base. For more information, see [Embedding Model](/components-embedding-models).
215+
216+
10. Optional: Add **Custom Metadata Fields** to tag every chunk with additional context. For example, if you're ingesting files from multiple teams, add a field `team` with a value of `support`. When the **Knowledge Base** component searches, you can then filter results to only return chunks where `team` equals `support` to keep results scoped to the support team's content.
217+
218+
11. Click **Next Step**.
219+
220+
12. Add your source files and configure chunking settings, then click **Next Step**.
221+
222+
13. In the **Review & Build** pane, preview the first chunk of your data and confirm the chunk size is appropriate for your use case. A typical chunk size is 512–1000 characters. Smaller chunks support more granular retrieval, but they can lose context that spans multiple chunks.
223+
224+
14. Click **Create**.
225+
226+
The knowledge base is now available to use in a flow with the **Knowledge Ingestion** and **Knowledge Base** components.
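
The wait-and-retry advice in step 2, plus a check that ingestion wrote to the index, can be sketched as follows. This assumes the local, security-disabled container and the `langflow_knowledge` index name from this example. The function names are hypothetical, and `_count` is the standard OpenSearch document count endpoint.

```bash
# Sketch only: assumes the unauthenticated local cluster from this example.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Poll the cluster root endpoint until it answers or the attempts run out.
# Arguments: number of attempts (default 10), delay in seconds (default 3).
wait_for_opensearch() {
  local attempts="${1:-10}" delay="${2:-3}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; --max-time bounds each try.
    if curl -sf --max-time 5 "$OPENSEARCH_URL" >/dev/null; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Build the URL of the document count endpoint for an index.
count_url() {
  printf '%s/%s/_count\n' "${OPENSEARCH_URL%/}" "$1"
}

# Usage:
#   wait_for_opensearch 10 3 && curl -s "$(count_url langflow_knowledge)"
```

A non-zero `count` in the response suggests that chunks were written. Because the index is shared, the count covers every knowledge base that uses it, not only the one you just created.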
227+
128228
## See also
129229

130230
* [Use Langflow agents](/agents)

docs/docs/Support/release-notes.mdx

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,14 @@ To avoid the impact of potential breaking changes and test new versions, the Lan
5151
Highlights of this release include the following changes.
5252
For all changes, see the [Changelog](https://github.com/langflow-ai/langflow/releases).
5353

54+
### New features and enhancements
55+
56+
- Database connectors for knowledge bases
57+
58+
Knowledge bases now support configurable vector database backends through **DB Providers** configured in **Settings → DB Providers**.
59+
60+
For setup instructions and configuration details, see [Manage vector data](/knowledge).
61+
5462
### Deprecations
5563

5664
- Voice mode is removed

docs/docs/_partial-kb-summary.mdx

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,14 @@
1-
A Langflow knowledge base is a local vector database that is stored in Langflow storage.
1+
A Langflow knowledge base is a vector database that stores embeddings for use in your flows.
2+
By default, knowledge bases use Chroma as a local vector store, but you can configure an external vector database provider such as OpenSearch.
3+
For more information, see [Configure vector database providers](/knowledge#configure-vector-database-providers).
24

3-
Because knowledge bases are local, the data isn't remotely requested and re-ingested with every flow run.
4-
This can be more efficient than using a remote vector database, and it is a good choice for flows that use custom, domain-specific datasets, like slices of customer and product data.
5+
Because knowledge bases don't re-ingest data with every flow run, they can be more efficient than using a remote vector database.
6+
They are a good choice for flows that use custom, domain-specific datasets, like slices of customer and product data.
57

68
You can use knowledge base components in much the same way that you use vector store components.
79
However, there are several key differences:
810

9-
* **Local storage**: Langflow knowledge bases are exclusively local.
11+
* **Local storage by default**: Langflow knowledge bases use Chroma local storage by default.
1012
In contrast, only some vector store components support local databases.
1113
* **Built-in embedding models**: Langflow knowledge bases include built-in support for several embedding models.
1214
Other models aren't supported for use with knowledge bases.
