docs/advanced/adding_data.mdx (+6 -6)
@@ -6,20 +6,20 @@ title: '➕ Adding Data'

 - This step assumes that you have already created an `app` instance by either using `App`, `OpenSourceApp` or `CustomApp`. We are calling our app instance as `naval_chat_bot` 🤖
-naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."), add_config)
+naval_chat_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."), config=add_config)

 query_config = QueryConfig()
-print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))
+print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", config=query_config))
 ```

 ### Custom prompt template
@@ -53,7 +53,7 @@ einstein_chat_bot = App()

 # Embed Wikipedia page
 page = wikipedia.page("Albert Einstein")
-einstein_chat_bot.add("text", page.content)
+einstein_chat_bot.add(page.content)

 # Example: use your own custom template with `$context` and `$query`
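The custom prompt template in `adding_data.mdx` uses `$context` and `$query` placeholders. Assuming the template is filled in with Python's `string.Template` (which the `$name` placeholder style suggests, though this diff does not confirm it), substitution works like the minimal sketch below. The prompt wording, `PROMPT`, and `render_prompt` are hypothetical; only the two placeholder names come from the docs.

```python
from string import Template

# Hypothetical prompt text; only the $context and $query placeholder names come from the docs.
PROMPT = Template(
    "Use the following pieces of context to answer the query.\n"
    "Context: $context\n"
    "Query: $query\n"
    "Answer:"
)


def render_prompt(context: str, query: str) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a typo in the template surfaces immediately instead of reaching the LLM.
    return PROMPT.substitute(context=context, query=query)


if __name__ == "__main__":
    print(render_prompt("Albert Einstein developed the theory of relativity.",
                        "What did Einstein develop?"))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing `$query` or `$context` fails loudly instead of leaving a literal `$query` in the prompt.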

docs/advanced/data_types.mdx (+47 -18)
@@ -2,22 +2,48 @@
 title: '📋 Supported data formats'
 ---

-Embedchain supports following data formats:
+## Automatic data type detection
+The `add` method automatically tries to detect the `data_type` based on the value you pass as the `source` argument. So `app.add('https://www.youtube.com/watch?v=dQw4w9WgXcQ')` is enough to embed a YouTube video.
+
+This detection is implemented for all formats. It is based on factors such as whether the source is a URL or a local file, its data type, etc.
+
+### Debugging automatic detection
+
+
+Set `log_level=DEBUG` (in [AppConfig](/advanced/query_configuration#appconfig)) and check that detection works as intended.
+
+Otherwise, you will not know when, for instance, an invalid file path is interpreted as raw text instead.
+
+### Forcing a data type
+
+To avoid any issues with data type detection, you can **force** a `data_type` by passing it as an argument to the `add` method.
+The examples below show you the keyword to force the respective `data_type`.
+
+Forcing can also be used for edge cases, such as treating a sitemap as a `web_page` to read its raw text instead of following its links.
+
+## Remote Data Types
+
+<Tip>
+**Use local files in remote data types**
+
+Some data_types are meant for remote content and only work with URLs.
+You can pass local files by formatting the path with the `file:` [URI scheme](https://en.wikipedia.org/wiki/File_URI_scheme), e.g. `file:///info.pdf`.
+</Tip>

 ### Youtube video

 To add any youtube video to your app, use the data_type (first argument to `.add()` method) as `youtube_video`. Eg:
 To supply your own text, use the data_type as `text` and enter a string. The text is not processed, this can be very versatile. Eg:

 ```python
-app.add_local('text', 'Seek wealth, not money or status. Wealth is having assets that earn while you sleep. Money is how we transfer time and wealth. Status is your place in the social hierarchy.')
+app.add('Seek wealth, not money or status. Wealth is having assets that earn while you sleep. Money is how we transfer time and wealth. Status is your place in the social hierarchy.', data_type='text')
 ```

 Note: This is not used in the examples because in most cases you will supply a whole paragraph or file, which did not fit.
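The new "Automatic data type detection" section says detection depends on whether the source is a URL, a local file, or some other value, and warns that a bad file path can silently become raw text. The sketch below is a hypothetical heuristic of that kind, written only to illustrate the idea; it is not Embedchain's implementation. The function name `guess_data_type`, the suffix rules, and the type names `pdf_file` and `sitemap` are assumptions; `youtube_video`, `web_page`, `qna_pair`, and `text` appear in the docs.

```python
import logging
import os
from typing import Any, Optional
from urllib.parse import urlparse
from urllib.request import url2pathname

logger = logging.getLogger("data_type_sketch")


def guess_data_type(source: Any, data_type: Optional[str] = None) -> str:
    """Hypothetical detection heuristic; NOT Embedchain's actual implementation."""
    # A forced data_type always wins (see "Forcing a data type").
    if data_type is not None:
        return data_type
    # A (question, answer) tuple is treated as a QnA pair.
    if isinstance(source, tuple) and len(source) == 2:
        return "qna_pair"
    if not isinstance(source, str):
        raise TypeError(f"unsupported source type: {type(source).__name__}")

    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        path = parsed.path.lower()
        if host == "youtube.com" or host.endswith(".youtube.com") or host == "youtu.be":
            return "youtube_video"
        if path.endswith(".pdf"):
            return "pdf_file"  # assumed type name
        if path.endswith(".xml"):
            return "sitemap"  # assumed type name
        return "web_page"

    # file: URIs and plain paths both resolve to a local filesystem path.
    local_path = url2pathname(parsed.path) if parsed.scheme == "file" else source
    if os.path.isfile(local_path) and local_path.lower().endswith(".pdf"):
        return "pdf_file"

    # Everything else, including a mistyped file path, silently becomes raw text;
    # only a DEBUG-level log line reveals that this fallback happened.
    logger.debug("no rule matched %r; treating it as raw text", source)
    return "text"
```

Calling `logging.basicConfig(level=logging.DEBUG)` before `guess_data_type("/no/such/file.pdf")` makes the silent fallback visible, which is the same reason the docs recommend `log_level=DEBUG` when checking detection.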
@@ -81,7 +110,7 @@ Note: This is not used in the examples because in most cases you will supply a w
 To supply your own QnA pair, use the data_type as `qna_pair` and enter a tuple. Eg:
 You can reuse the local index with the same code, but without adding new documents:
@@ -107,6 +136,6 @@ naval_chat_bot = App()
 print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
 ```

-###More formats (coming soon!)
+## More formats (coming soon!)

-- If you want to add any other format, please create an [issue](https://github.com/embedchain/embedchain/issues) and we will add it to the list of supported formats.
+- If you want to add any other format, please create an [issue](https://github.com/embedchain/embedchain/issues) and we will add it to the list of supported formats.
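The Tip added to `data_types.mdx` suggests passing local files to URL-only data types through the `file:` URI scheme, e.g. `file:///info.pdf`. Note that `file:///info.pdf` names `/info.pdf` at the filesystem root, not `info.pdf` in the working directory, so relative paths need resolving first. A minimal standard-library sketch of building and reading such URIs (POSIX paths assumed; the helper names `to_file_uri` and `from_file_uri` are hypothetical, and whether a particular Embedchain loader accepts the result is not verified here):

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def to_file_uri(path: str) -> str:
    # Path.as_uri() rejects relative paths, so resolve against the working directory first.
    # It also percent-encodes characters such as spaces.
    return Path(path).resolve().as_uri()


def from_file_uri(uri: str) -> str:
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file: URI: {uri!r}")
    # url2pathname undoes the percent-encoding (e.g. %20 becomes a space).
    return url2pathname(parsed.path)


if __name__ == "__main__":
    uri = to_file_uri("info.pdf")  # absolute file:/// URI for ./info.pdf
    print(uri)
    print(from_file_uri(uri))
```

The resulting string could then be passed as the source together with a forced `data_type`, as the docs describe for remote data types.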

docs/introduction.mdx (+8 -8)
@@ -7,7 +7,7 @@ description: '📝 Embedchain is a framework to easily create LLM powered bots o

 Embedchain abstracts the entire process of loading a dataset, chunking it, creating embeddings, and storing it in a vector database.

-You can add a single or multiple datasets using the .add and .add_local functions. Then, simply use the .query function to find answers from the added datasets.
+You can add a single or multiple datasets using the `.add` method. Then, simply use the `.query` method to find answers from the added datasets.

 If you want to create a Naval Ravikant bot with a YouTube video, a book in PDF format, two blog posts, and a question and answer pair, all you need to do is add the respective links. Embedchain will take care of the rest, creating a bot for you.
-naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
+naval_chat_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))

 naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
 # Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
@@ -32,7 +32,7 @@ naval_chat_bot.query("What unique capacity does Naval argue humans possess when

 Creating a chat bot over any dataset involves the following steps:

-1.Load the data
+1. Detect the data type and load the data
 2. Create meaningful chunks
 3. Create embeddings for each chunk
 4. Store the chunks in a vector database
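The introduction lists the steps Embedchain automates: detect and load the data, chunk it, embed each chunk, and store the chunks in a vector database for querying. The toy sketch below walks through steps 2 to 4 plus retrieval using only the standard library, so it runs offline. It swaps in fixed-size overlapping word windows for chunking and a hashed bag-of-words vector in place of a real embedding model and vector database; all names (`chunk_words`, `embed`, `ToyVectorStore`) are hypothetical and nothing here reflects Embedchain's internals.

```python
import hashlib
import math
import re
from collections import Counter
from typing import List, Tuple

DIM = 256  # size of the toy hashed embedding vector


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Step 2: split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once a window would only repeat words already covered by the previous one.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def embed(text: str) -> List[float]:
    """Step 3 (toy stand-in for an embedding model): hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        # md5 gives a stable bucket across runs, unlike the salted built-in hash().
        bucket = int.from_bytes(hashlib.md5(token.encode()).digest()[:4], "big") % DIM
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorStore:
    """Step 4 (toy stand-in for a vector database): (chunk, vector) pairs kept in memory."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        # Step 1 (detecting the type and loading) is assumed done: `text` is already plain text.
        for chunk in chunk_words(text):
            self.rows.append((chunk, embed(chunk)))

    def query(self, question: str, k: int = 1) -> List[str]:
        """Return the k stored chunks most similar (cosine) to the question."""
        q = embed(question)
        # Vectors are unit length, so the dot product equals cosine similarity.
        ranked = sorted(self.rows, key=lambda row: sum(a * b for a, b in zip(q, row[1])), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

For example, after `store.add(...)` on a few texts, `store.query("Who is the entrepreneur and investor?")` returns the best-matching chunk; a real pipeline would then pass the retrieved chunks to an LLM as context, which this sketch leaves out.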
@@ -53,4 +53,4 @@ The process of loading the dataset and querying involves multiple steps, each wi

 Embedchain takes care of all these nuances and provides a simple interface to create bots over any dataset.

-In the first release, we make it easier for anyone to get a chatbot over any dataset up and running in less than a minute. Just create an app instance, add the datasets using the `.add()` function, and use the `.query()` function to get the relevant answers.
+In the first release, we make it easier for anyone to get a chatbot over any dataset up and running in less than a minute. Just create an app instance, add the datasets using the `.add` method, and use the `.query` method to get the relevant answers.