Unstructured-IO · Paul-Cornell · Sep 18, 2025 · Sep 19, 2025 · Sep 24, 2025 · Sep 30, 2025
diff --git a/docs.json b/docs.json
@@ -115,6 +115,7 @@
             "pages": [
               "ui/document-elements",
               "ui/partitioning",
+              "ui/data-extractor",
               "ui/chunking",
               {
                 "group": "Enriching",

diff --git a/img/ui/data-extractor/house-plant-care.png b/img/ui/data-extractor/house-plant-care.png
diff --git a/img/ui/data-extractor/invoice.png b/img/ui/data-extractor/invoice.png
diff --git a/img/ui/data-extractor/medical-invoice.png b/img/ui/data-extractor/medical-invoice.png
diff --git a/snippets/general-shared-text/data-extractor.mdx b/snippets/general-shared-text/data-extractor.mdx
@@ -0,0 +1,30 @@
+1. In the **Schema settings** pane, for **Method**, choose one of the following to extract the source's data into a custom-defined, structured output format:
+
+   - Choose **LLM** to use a large language model (LLM).
+   - Choose **Regex** (or **JSONPath**) to use regular expressions (or JSONPath expressions).
+
+2. If you chose **LLM** under **Method**, then continue with step 3 in this procedure.
+
+   If you chose **Regex** (or **JSONPath**) under **Method** instead, then skip ahead to step 6 in this procedure.
+
+3. If you chose **LLM** under **Method**, then in the **Provider** and **Model** drop-down lists, choose the LLM provider and model that you want to use for the data extraction.
+4. For **Extraction fields**, do one of the following:
+
+   - Choose **Suggested** to start with a set of fields that the selected LLM has suggested for the data extraction. 
+     As needed, you can add, change, or delete any of these suggested fields' names, data types, descriptions, or their relationships to other fields within the same schema.
+   - Choose **Prompt** to give the selected LLM an AI prompt that it uses to generate a set of suggested fields for the data extraction.
+     To generate the list of suggested fields, click **Generate schema** next to **Prompt**. 
+     As needed, you can add, change, or delete any of these suggested fields' names, data types, descriptions, or their relationships to other fields within the same schema. 
+   - Choose **Create** to manually specify the set of fields for the selected LLM to use for the data extraction. You can specify each field's name, data type, description, and its relationships to other fields within the same schema.
+
+5. Skip ahead to step 7 in this procedure.
+6. If you chose **Regex** (or **JSONPath**) under **Method**, then do one of the following:
+
+   - Choose **Suggested** to start with a set of fields that the default LLM has suggested for the data extraction. 
+     As needed, you can add, change, or delete any of these suggested fields' names, regular expressions (or JSONPath expressions), or their relationships to other fields within the same schema.
+   - Choose **Prompt** to give the default LLM an AI prompt that it uses to generate a set of suggested fields for the data extraction.
+     To generate the list of suggested fields, click **Generate schema** next to **Prompt**. 
+     As needed, you can add, change, or delete any of these suggested fields' names, regular expressions (or JSONPath expressions), or their relationships to other fields within the same schema. 
+   - Choose **Create** to manually specify the set of fields for the default LLM to use for the data extraction. You can specify each field's name, regular expression (or JSONPath expression), and its relationships to other fields within the same schema.
+
+7. Click **Run** to extract the source's data into the custom-defined, structured output format.
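
As a rough illustration of the idea behind the **Regex** method in step 6, where each field pairs a name with a regular expression, here is a minimal Python sketch. It is not Unstructured's implementation. The field names, the patterns, the sample text, and the `extract_fields` helper are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical field definitions: each field name maps to a regular
# expression with one capture group. These names and patterns are
# assumptions for illustration, not part of the product.
FIELDS = {
    "invoice_number": r"Invoice\s*#\s*(\w+)",
    "total": r"Total:\s*\$([\d.]+)",
}


def extract_fields(text, fields):
    """Return a dict of field name -> first captured value, or None if no match."""
    result = {}
    for name, pattern in fields.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


# Hypothetical source text standing in for a partitioned document.
sample = "Invoice # A1001\nTotal: $42.50"
print(extract_fields(sample, FIELDS))
```

A field whose pattern does not match comes back as `None` rather than raising an error, which keeps the structured output the same shape for every document.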