- **macOS/Windows/Linux/WSL with `uv pip install`**: `<path_to_venv>/lib/python3.12/site-packages/langflow/knowledge_bases` (Python version can vary. Knowledge bases aren't shared between virtual environments.)
69
-
- **macOS/Windows/Linux/WSL with `git clone`**: `<path_to_clone>/src/backend/base/langflow/knowledge_bases`
70
-
71
-
If you set the `LANGFLOW_CONFIG_DIR` environment variable, the `knowledge_bases` subdirectory is created relative to that path.
72
-
73
-
To change the default `knowledge_bases` directory path, set the `LANGFLOW_KNOWLEDGE_BASES_DIR` environment variable:
In this example, you'll create a knowledge base of chunked customer orders.
@@ -84,15 +57,16 @@ To follow along with this example, download [`customer-orders.csv`](/files/custo
84
57
1. On the [**Projects** page](/concepts-flows#projects), click <Icon name="Library" aria-hidden="true"/> **Knowledge** below the list of projects to view and manage your knowledge bases.
85
58
86
59
2. To create a new knowledge base, click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**.
87
-
3. In the **Create Knowledge Base** pane, enter a name for your knowledge base, and select an embedding model.
60
+
3. In the **Create Knowledge Base** pane, enter a name for your knowledge base, select an embedding model, and select a **DB Provider**.
88
61
<PartialGlobalModelProviders />
62
+
The **DB Provider** determines where embeddings are stored. It defaults to the provider configured in **Settings → DB Providers**. Existing knowledge bases keep their original backend — changing the global DB Provider only affects new knowledge bases.
89
63
4. To configure sources for your knowledge base, click **Configure Sources**.
90
64
Optionally, to create an empty knowledge base, click **Create**.
91
65
5. In the **Configure Sources** pane, configure the sources for your knowledge base's data and how that data is chunked for vector search retrieval.
92
66
For this example, click <Icon name="Upload" aria-hidden="true"/> **Add Sources**, and then select the downloaded [`customer-orders.csv`](/files/customer_orders.csv) file from your local machine.
93
67
The default settings for **Chunk Size**, **Chunk Overlap**, and **Separator** are fine.
94
68
To continue, click **Next Step**.
95
-
6. The **Review & Build** pane allows you to preview your first chunk before you commit to spending tokens to embedall of the data into the knowledge base.
69
+
6. The **Review & Build** pane allows you to preview your first chunk before you commit to spending tokens to embed all of the data into the knowledge base.
96
70
If the chunk isn't what you want to embed, click **Back** to configure your chunking strategy.
97
71
To embed this data, click **Create**.
98
72
7. Your data is embedded as a **Knowledge**.
@@ -113,6 +87,17 @@ For each knowledge base, you can see the following information:
113
87
* The average length and size of chunks
114
88
* The knowledge base's status
115
89
90
+
The icon next to the knowledge base name indicates the source file type:
91
+
92
+
* <Icon name="File" aria-hidden="true"/> Red: PDF
93
+
* <Icon name="FileChartColumn" aria-hidden="true"/> Green: CSV
94
+
* <Icon name="FileType" aria-hidden="true"/> Purple: plain text (`.txt`)
Chunking behavior is determined by the embedding model, and the embedding model is set when you create the knowledge base.
117
102
If you need to change the embedding model, you must delete and recreate the knowledge base.
118
103
@@ -125,6 +110,121 @@ If any flows use the deleted knowledge base, you must update them to use a diffe
125
110
126
111
For more information on using knowledge bases in a flow, see the [**Knowledge Base** component](/knowledge-base) documentation.
127
112
113
+
### Configure vector database providers
114
+
115
+
**DB Providers** are the vector databases where your knowledge bases store and search embeddings.
116
+
To configure these providers, go to **Settings → DB Providers**.
117
+
The selected provider applies to all new knowledge bases you create.
118
+
Existing knowledge bases continue to use the provider that was active when they were created.
119
+
120
+
#### Chroma (default)
121
+
122
+
By default, knowledge bases use [ChromaDB](https://docs.trychroma.com/docs/overview/introduction) as a local vector store, with no additional setup required.
123
+
Knowledge bases are stored local to your Langflow instance.
124
+
The default storage location depends on your operating system and installation method:
- **macOS/Windows/Linux/WSL with `uv pip install`**: `<path_to_venv>/lib/python3.12/site-packages/langflow/knowledge_bases` (Python version can vary. Knowledge bases aren't shared between virtual environments.)
131
+
- **macOS/Windows/Linux/WSL with `git clone`**: `<path_to_clone>/src/backend/base/langflow/knowledge_bases`
132
+
133
+
If you set the `LANGFLOW_CONFIG_DIR` environment variable, the `knowledge_bases` subdirectory is created relative to that path.
134
+
135
+
To change the default `knowledge_bases` directory path, set the `LANGFLOW_KNOWLEDGE_BASES_DIR` environment variable:
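As a minimal sketch of setting this variable in a POSIX shell: the directory path below is a hypothetical placeholder and not a value taken from the Langflow documentation, so substitute a location that suits your environment.

```bash
# Hypothetical location; replace with any writable directory you prefer.
export LANGFLOW_KNOWLEDGE_BASES_DIR="$HOME/langflow-knowledge-bases"

# Confirm the value that Langflow will inherit from this shell.
echo "Knowledge bases directory: $LANGFLOW_KNOWLEDGE_BASES_DIR"
```

Start Langflow from the same shell session so that the process inherits the exported variable.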
To use OpenSearch as a database provider, you need a running OpenSearch cluster that is accessible to your Langflow instance.
144
+
This example uses an OpenSearch container running locally, but you can also use a remote OpenSearch instance.
145
+
146
+
1. For this example, start a local OpenSearch container with security disabled. This allows you to connect without a username, password, or TLS. This configuration is for example purposes only; it _isn't_ recommended in production environments.
OpenSearch 3.x requires `OPENSEARCH_INITIAL_ADMIN_PASSWORD` to be set even when security is disabled.
161
+
162
+
If the password fails validation, container startup exits immediately with `Password failed validation`.
163
+
164
+
The password must adhere to the [OpenSearch password complexity requirements](https://docs.opensearch.org/latest/security/configuration/demo-configuration/#setting-up-a-custom-admin-password).
165
+
:::
166
+
167
+
2. Verify the cluster is reachable:
168
+
169
+
```bash
curl -s http://localhost:9200
```
172
+
173
+
A successful response indicates that the container has started and can receive requests:
174
+
175
+
```json
{
  "name" : "your-node-name",
  "cluster_name" : "docker-cluster",
  "version" : {
    "distribution" : "opensearch",
    "number" : "3.6.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
```
186
+
187
+
If you get no response or a connection error, the container might still be starting. Wait a few seconds and try again.
188
+
189
+
3. To connect the OpenSearch database to Langflow as a DB provider, click **Settings**, and then click **DB Providers**.
190
+
4. Select **OpenSearch**.
191
+
5. Enter the following values for the local OpenSearch container:
192
+
193
+
- **Cluster URL**: Enter `http://localhost:9200`.
194
+
- **Username**: Leave blank if security is disabled. Otherwise, enter your basic auth username.
195
+
- **Password**: Leave blank if security is disabled. Otherwise, enter your basic auth password.
196
+
- **Default Index name**: Enter `langflow_knowledge`. This is the OpenSearch index that Langflow writes to and reads from. The index is created in the later ingestion step, so it isn't immediately available.
197
+
- **Vector field**: Enter `vector_field`. This is the document field that stores the embedding vector.
198
+
- **Text field**: Enter `text`. This is the document field that stores the chunk text.
199
+
- **Use TLS (HTTPS)**: Turn off. Enable if your cluster uses HTTPS.
200
+
- **Verify TLS certificate**: Turn off. Enable if your cluster uses CA-signed certificates.
201
+
202
+
6. Click **Save and Use OpenSearch**.
203
+
204
+
Optionally, click **Test Connection** to verify that Langflow can reach your OpenSearch cluster before saving.
205
+
206
+
OpenSearch is now connected to Langflow as a DB provider, so you can create a knowledge base that stores its embeddings in OpenSearch.
207
+
208
+
7. Click <Icon name="Library" aria-hidden="true"/> **Knowledge**, and then click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**.
209
+
210
+
8. Enter a name for this knowledge base. The name can be anything, and doesn't need to match the OpenSearch index name.
211
+
The name becomes the internal label used to scope searches to this knowledge base within the shared OpenSearch index.
212
+
213
+
9. Select an embedding model.
214
+
When you create a knowledge base in Langflow, you can choose one of your configured embedding model providers. Once you create a knowledge base, you cannot change its provider unless you recreate the knowledge base. For more information, see [Embedding Model](/components-embedding-models).
215
+
216
+
10. Optional: Add **Custom Metadata Fields** to tag every chunk with additional context. For example, if you're ingesting files from multiple teams, add a field `team` with a value of `support`. When the **Knowledge Base** component searches, you can then filter results to only return chunks where `team` equals `support` to keep results scoped to the support team's content.
217
+
218
+
11. Click **Next Step**.
219
+
220
+
12. Add your source files and configure chunking settings, then click **Next Step**.
221
+
222
+
13. In the **Review & Build** pane, preview the first chunk of your data and confirm the chunk size is appropriate for your use case. A typical chunk size is 512–1000 characters. Smaller chunks support more granular retrieval but they can lose context across chunks.
223
+
224
+
14. Click **Create**.
225
+
226
+
The knowledge base is now available to use in a flow with the **Knowledge Ingestion** and **Knowledge Base** components.
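Step 2 above suggests waiting a few seconds and retrying if the cluster doesn't respond. One way to automate that retry is sketched here. The `wait_for_opensearch` function, its arguments, and its default values are hypothetical names introduced for illustration, and the sketch assumes `curl` is installed and that the cluster runs without TLS or authentication, as in this example.

```bash
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Arguments: URL (default http://localhost:9200), attempts (default 10),
# delay between attempts in seconds (default 3).
wait_for_opensearch() {
  url="${1:-http://localhost:9200}"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output; -f makes curl exit non-zero on HTTP errors.
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "OpenSearch is reachable at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_opensearch http://localhost:9200 10 3` checks the local container up to ten times, pausing three seconds between attempts, and returns a non-zero status if the cluster never responds.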
docs/docs/_partial-kb-summary.mdx (6 additions & 4 deletions)
@@ -1,12 +1,14 @@
1
-
A Langflow knowledge base is a local vector database that is stored in Langflow storage.
1
+
A Langflow knowledge base is a vector database that stores embeddings for use in your flows.
2
+
By default, knowledge bases use Chroma as a local vector store, but you can configure an external vector database provider such as OpenSearch.
3
+
For more information, see [Configure vector database providers](/knowledge#configure-vector-database-providers).
2
4
3
-
Because knowledge bases are local, the data isn't remotely requested and re-ingested with every flow run.
4
-
This can be more efficient than using a remote vector database, and it is a good choice for flows that use custom, domain-specific datasets, like slices of customer and product data.
5
+
Because knowledge bases don't re-ingest data with every flow run, they can be more efficient than using a remote vector database.
6
+
They are a good choice for flows that use custom, domain-specific datasets, like slices of customer and product data.
5
7
6
8
You can use knowledge base components in much the same way that you use vector store components.
7
9
However, there are several key differences:
8
10
9
-
* **Local storage**: Langflow knowledge bases are exclusively local.
11
+
* **Local storage by default**: Langflow knowledge bases use Chroma local storage by default.
10
12
In contrast, only some vector store components support local databases.
11
13
* **Built-in embedding models**: Langflow knowledge bases include built-in support for several embedding models.
12
14
Other models aren't supported for use with knowledge bases.