Draft
1 change: 1 addition & 0 deletions docs.json
@@ -115,6 +115,7 @@
"pages": [
"ui/document-elements",
"ui/partitioning",
"ui/data-extractor",
"ui/chunking",
{
"group": "Enriching",
Binary file added img/ui/data-extractor/house-plant-care.png
Binary file added img/ui/data-extractor/invoice.png
Binary file added img/ui/data-extractor/medical-invoice.png
30 changes: 30 additions & 0 deletions snippets/general-shared-text/data-extractor.mdx
@@ -0,0 +1,30 @@
1. In the **Schema settings** pane, for **Method**, choose one of the following to extract the source's data into a custom-defined, structured output format:

- Choose **LLM** to use a large language model (LLM).
- Choose **Regex** (or **JSONPath**) to use regular expressions (or JSONPath expressions).

2. If you chose **LLM** for **Method**, continue with step 3 in this procedure.

If you chose **Regex** (or **JSONPath**) for **Method** instead, skip ahead to step 6 in this procedure.

3. In the **Provider** and **Model** drop-down lists, choose the LLM provider and model that you want to use for the data extraction.
4. For **Extraction fields**, do one of the following:

- Choose **Suggested** to start with a set of fields that the selected LLM has suggested for the data extraction.
As needed, you can add, change, or delete any of these suggested fields' names, data types, descriptions, or their relationships to other fields within the same schema.
- Choose **Prompt** to provide a prompt that the selected LLM uses to generate a set of suggested fields for the data extraction.
To generate the list of suggested fields, click **Generate schema** next to **Prompt**.
As needed, you can add, change, or delete any of these suggested fields' names, data types, descriptions, or their relationships to other fields within the same schema.
- Choose **Create** to manually specify the set of fields for the selected LLM to use for the data extraction. You can specify each field's name, data type, description, and its relationships to other fields within the same schema.

5. Skip ahead to step 7 in this procedure.
6. If you chose **Regex** (or **JSONPath**) under **Method**, then do one of the following:

- Choose **Suggested** to start with a set of fields that the default LLM has suggested for the data extraction.
As needed, you can add, change, or delete any of these suggested fields' names, regular expressions (or JSONPath expressions), or their relationships to other fields within the same schema.
- Choose **Prompt** to provide a prompt that the default LLM uses to generate a set of suggested fields for the data extraction.
To generate the list of suggested fields, click **Generate schema** next to **Prompt**.
As needed, you can add, change, or delete any of these suggested fields' names, regular expressions (or JSONPath expressions), or their relationships to other fields within the same schema.
- Choose **Create** to manually specify the set of fields for the default LLM to use for the data extraction. You can specify each field's name, regular expression (or JSONPath expression), and its relationships to other fields within the same schema.

7. Click **Run** to extract the source's data into the custom-defined, structured output format.
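
To make the **Regex** method more concrete, the following is a minimal Python sketch of the general idea: each field in a schema is paired with a regular expression, and the first match for each expression becomes that field's value in the structured output. The field names, patterns, sample text, and the `extract_fields` helper are all hypothetical and chosen only for illustration; they are not taken from the product and do not describe how it is implemented.

```python
import re

# Hypothetical schema: field name -> regular expression with one capture group.
# These names and patterns are illustrative assumptions, not product defaults.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice\s*#\s*(\w+)",
    "total": r"Total:\s*\$([\d.,]+)",
}


def extract_fields(text, patterns):
    """Map each field name to its first captured match in text, or None."""
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


# Hypothetical source text standing in for a document's extracted content.
sample = "Invoice # A1023\nTotal: $1,250.00"
extracted = extract_fields(sample, FIELD_PATTERNS)
print(extracted)
```

A field whose pattern finds no match is set to `None` in this sketch, so the output always has the same keys as the schema.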