Merged
10 changes: 9 additions & 1 deletion src/langsmith/evaluation-concepts.mdx
@@ -284,7 +284,15 @@ Generate additional examples from existing ones. Works best when starting with s

**Splits**

Partition datasets into subsets for targeted evaluation. Use splits for performance optimization (smaller splits for rapid iteration) and interpretability (evaluate different input types separately).
Splits are named subsets of a dataset used to segment examples into separate groups. Common patterns include:

- **ML-style splits**: divide examples into training, validation, and test sets to avoid overfitting, where a model performs well on training data but poorly on unseen data.
- **Category-based splits**: evaluate different input types separately when a dataset spans multiple task categories.
- **Staged rollout**: keep exploratory examples isolated until you're ready to include them in the main evaluation set.

Splits differ from metadata: use splits for high-level organizational grouping for evaluation, and metadata for per-example information such as tags and provenance.

In machine learning, best practice is for each example to belong to exactly one split. LangSmith allows examples to belong to multiple splits, which is useful when an example fits several evaluation categories.

Learn how to [create and manage dataset splits](/langsmith/manage-datasets-in-application#create-and-manage-dataset-splits).
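
As a rough illustration of the multi-split behavior described above, the sketch below models examples as plain dictionaries and groups them by split. The `splits` field name, the example ids, and the split names are assumptions made up for this illustration; this is not LangSmith's data model or API.

```python
from collections import defaultdict

# Hypothetical in-memory examples. The "splits" field is an assumed name,
# used here only to show that one example may belong to several splits.
examples = [
    {"id": "ex-1", "inputs": {"question": "What is 2 + 2?"}, "splits": ["test", "math"]},
    {"id": "ex-2", "inputs": {"question": "Capital of France?"}, "splits": ["test"]},
    {"id": "ex-3", "inputs": {"question": "Integrate x^2."}, "splits": ["math", "staging"]},
]


def group_by_split(items):
    """Map each split name to the ids of the examples that belong to it."""
    groups = defaultdict(list)
    for item in items:
        for split in item["splits"]:
            groups[split].append(item["id"])
    return dict(groups)


groups = group_by_split(examples)
```

Here `ex-1` appears under both `test` and `math`, which is the overlap that a strict ML-style train/validation/test partition would not allow.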

Binary file removed src/langsmith/images/add-metadata.gif
Binary file not shown.
Binary file removed src/langsmith/images/add-to-dataset-from-aq.png
Binary file not shown.
Binary file removed src/langsmith/images/add-to..dataset.png
Binary file not shown.
Binary file removed src/langsmith/images/confirmation.png
Binary file not shown.
Binary file removed src/langsmith/images/multiselect-add-to-dataset.png
Binary file not shown.
Binary file removed src/langsmith/images/playground-dataset.png
Binary file not shown.
76 changes: 36 additions & 40 deletions src/langsmith/manage-datasets-in-application.mdx
@@ -1,11 +1,12 @@
---
title: Create and manage datasets in the UI
sidebarTitle: With the UI
icon: "browser"
---

[_Datasets_](/langsmith/evaluation-concepts#datasets) enable you to perform repeatable evaluations over time using consistent data. Datasets are made up of [_examples_](/langsmith/evaluation-concepts#examples), which store inputs, outputs, and optionally, reference outputs.

This page outlines the various methods for [creating](#create-a-dataset-and-add-examples) and [managing](#manage-a-dataset) datasets in the [LangSmith UI](https://smith.langchain.com).
This page outlines the various methods for [creating](#create-a-dataset-and-add-examples) and [managing](#manage-a-dataset) datasets in the [UI](https://smith.langchain.com).

## Create a dataset and add examples

@@ -24,58 +25,52 @@ The following sections explain the different ways you can create a dataset in La
A common pattern for constructing datasets is to convert notable traces from your application into dataset examples. This approach requires that you have [configured tracing to LangSmith](/langsmith/observability-concepts#tracing-configuration).

<Check>
A technique to build datasets is to filter the most interesting traces, such as traces that were tagged with poor user feedback, and add them to a dataset. For tips on how to filter traces, refer to [Filter traces](/langsmith/filter-traces-in-application) guide.
A technique to build datasets is to filter the most interesting traces, such as traces that were tagged with poor user feedback, and add them to a dataset. For tips on how to filter traces, refer to the [Filter traces](/langsmith/filter-traces-in-application) guide.
</Check>

There are two ways to add data manually from a tracing project to datasets. Navigate to **Tracing Projects** and select a project.

1. Multi-select runs from the runs table. On the **Runs** tab, multi-select runs. At the bottom of the page, click <Icon icon="database" /> **Add to Dataset**.
1. On the **Runs** tab, select a run from the table. On the individual run details page, select **Add to** -> **Dataset** in the top right corner.

![The Runs table with a run selected and the Add to Dataset button visible at the bottom of the page.](/langsmith/images/multiselect-add-to-dataset.png)

2. On the **Runs** tab, select a run from the table. On the individual run details page, select **Add to** -> **Dataset** in the top right corner.

![Add to dataset](/langsmith/images/add-to..dataset.png)

When you select a dataset from the run details page, a modal will pop up letting you know if any [transformations](/langsmith/dataset-transformations) were applied or if schema validation failed. For example, the screenshot below shows a dataset that is using transformations to optimize for collecting LLM runs.

![Confirmation](/langsmith/images/confirmation.png)
When you select a dataset from the run details page, a modal will pop up letting you know if any [transformations](/langsmith/dataset-transformations) were applied or if schema validation failed.

You can then optionally edit the run before adding it to the dataset.

### Automatically from a tracing project

You can use [run rules](/langsmith/rules) to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that are [tagged](/langsmith/observability-concepts#tags) with a specific use case or have a [low feedback score](/langsmith/observability-concepts#feedback).
You can use [run rules](/langsmith/rules) to add traces automatically to a dataset based on certain conditions. For example, you could add all traces that are [tagged](/langsmith/observability-concepts#tags) with a specific use case or have a [low feedback score](/langsmith/observability-concepts#feedback).

### From examples in an annotation queue

<Check>
If you rely on subject matter experts to build meaningful datasets, use [annotation queues](/langsmith/annotation-queues) to provide a streamlined view for reviewers. Human reviewers can optionally modify the inputs/outputs/reference outputs from a trace before it is added to the dataset.
</Check>

Annotation queues can be optionally configured with a default dataset, though you can add runs to any dataset by using the dataset switcher on the bottom of the screen. Once you select the right dataset, click **Add to Dataset** or hit the hot key `D` to add the run to it.
You can optionally configure annotation queues with a default dataset, though you can add runs to any dataset by using the dataset switcher on the bottom of the screen. Once you select the right dataset, click **Add to Dataset** or hit the hot key `D` to add the run to it.

Any modifications you make to the run in your annotation queue will carry over to the dataset, and all metadata associated with the run will also be copied.

![Add to dataset from annotation queue](/langsmith/images/add-to-dataset-from-aq.png)

Note you can also set up rules to add runs that meet specific criteria to an annotation queue using [automation rules](/langsmith/rules).
<Tip>
You can also set up rules to add runs that meet specific criteria to an annotation queue using [automation rules](/langsmith/rules).
</Tip>

### From the Prompt Playground

On the [**Prompt Playground**](/langsmith/observability-concepts#prompt-playground) page, select **Set up Evaluation**, click **+New** if you're starting a new dataset or select from an existing dataset.
On the [**Playground**](/langsmith/observability-concepts#prompt-playground) page:

<Note>
Creating datasets inline in the playground is not supported for datasets that have nested keys. In order to add/edit examples with nested keys, you must edit [from the datasets page](/langsmith/manage-datasets-in-application#from-the-datasets-page).
</Note>
1. Select **Set up Evaluation**.
1. Click **+New** if you're starting a new dataset or select from an existing dataset.

To edit the examples:
<Note>
Creating datasets inline in the playground is not supported for datasets that have nested keys. In order to add/edit examples with nested keys, you must edit [from the datasets page](/langsmith/manage-datasets-in-application#from-the-datasets-page).
</Note>

* Use **+Row** to add a new example to the dataset
* Delete an example using the **⋮** dropdown on the right hand side of the table
* If you're creating a reference-free dataset remove the "Reference Output" column using the **x** button in the column. Note: this action is not reversible.
1. Edit the examples:

![Create a dataset in the playground](/langsmith/images/playground-dataset.png)
- Use **+Row** to add a new example to the dataset.
- Delete an example using the **⋮** dropdown on the right-hand side of the table.
- If you're creating a reference-free dataset, remove the **Reference Output** column using the **x** button in the column. Note that this action is not reversible.

### Import a dataset from a CSV or JSONL file

@@ -103,7 +98,6 @@ In **Generate examples**, do the following:
1. Enter the number of synthetic examples you want to generate.
1. Click **Generate**.

<div style={{ textAlign: 'center' }}>
<img
className="block dark:hidden"
src="/langsmith/images/generate-synthetic-light.png"
@@ -115,12 +109,10 @@ In **Generate examples**, do the following:
src="/langsmith/images/generate-synthetic-dark.png"
alt="The AI-Generated Examples configuration window. Selections for manual and automatic and number of examples to generate."
/>
</div>

1. The examples will appear on the **Select generated examples** page. Choose which examples to add to your dataset, with the option to edit them before finalizing. Click **Save Examples**.
1. Each example will be validated against your specified dataset schema and tagged as **synthetic** in the source metadata.

<div style={{ textAlign: 'center' }}>
<img
className="block dark:hidden"
src="/langsmith/images/select-generated-examples-light.png"
@@ -132,17 +124,16 @@ In **Generate examples**, do the following:
src="/langsmith/images/select-generated-examples-dark.png"
alt="Select generated examples page with generated examples selected and Save examples button."
/>
</div>

## Manage a dataset

### Create a dataset schema

LangSmith datasets store arbitrary JSON objects. We recommend (but do not require) that you define a schema for your dataset to ensure that its examples conform to a specific JSON schema. Dataset schemas are defined with standard [JSON schema](https://json-schema.org/), with the addition of a few [prebuilt types](/langsmith/dataset-json-types) that make it easier to type common primitives like messages and tools.

Certain fields in your schema have a `+ Transformations` option. Transformations are preprocessing steps that, if enabled, update your examples when you add them to the dataset. For example the `convert to OpenAI messages` transformation will convert message-like objects, like LangChain messages, to OpenAI message format.
Certain fields in your schema have a `+ Transformations` option. Transformations are preprocessing steps that, if enabled, update your examples when you add them to the dataset. For example, the `convert to OpenAI messages` transformation will convert message-like objects, like LangChain messages, to OpenAI message format.

For the full list of available transformations, see [our reference](/langsmith/dataset-transformations).
For the full list of available transformations, refer to the [Dataset transformations reference](/langsmith/dataset-transformations).

<Note>
If you plan to collect production traces in your dataset from LangChain [ChatModels](https://python.langchain.com/do/langsmith/observability-concepts/chat_models/) or from OpenAI calls using the [LangSmith OpenAI wrapper](/langsmith/annotate-code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry-standard OpenAI formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case.
@@ -152,27 +143,32 @@ Please see the [dataset transformations reference](/langsmith/dataset-transforma

### Create and manage dataset splits

Dataset splits are divisions of your dataset that you can use to segment your data. For example, it is common in machine learning workflows to split datasets into training, validation, and test sets. This can be useful to prevent overfitting - where a model performs well on the training data but poorly on unseen data. In evaluation workflows, it can be useful to do this when you have a dataset with multiple categories that you may want to evaluate separately; or if you are testing a new use case that you may want to include in your dataset in the future, but want to keep separate for now. Note that the same effect can be achieved manually via metadata - but we expect splits to be used for higher level organization of your dataset to split it into separate groups for evaluation, whereas metadata would be used more for storing information on your examples like tags and information about its origin.
For an overview of when and why to use splits, refer to [Dataset organization](/langsmith/evaluation-concepts#dataset-organization).

In machine learning, it is best practice to keep your splits separate (each example belongs to exactly one split). However, we allow you to select multiple splits for the same example in LangSmith because it can make sense for some evaluation workflows - for example, if an example falls into multiple categories on which you may want to evaluate your application.
To create and manage splits in the UI:

In order to create and manage splits in the app, you can select some examples in your dataset and click "Add to Split". From the resulting popup menu, you can select and unselect splits for the selected examples, or create a new split.
1. Select examples in your dataset.
1. Click **Add to Split**.
1. From the resulting popup menu, you can select and unselect splits for the selected examples, or create a new split.

![Add to Split](/langsmith/images/add-to-split2.png)

### Edit example metadata

You can add metadata to your examples by clicking on an example and then clicking "Edit" on the top righthand side of the popover. From this page, you can update/delete existing metadata, or add new metadata. You may use this to store information about your examples, such as tags or version info, which you can then [group by](/langsmith/analyze-an-experiment#group-results-by-metadata) when analyzing experiment results or [filter by](/langsmith/manage-datasets-programmatically#list-examples-by-metadata) when you call `list_examples` in the SDK.
To add metadata to your examples:

1. Click on an example and then click **Edit** on the top right-hand side of the popover.
1. From this page, update or delete existing metadata, or add new metadata.

![Add Metadata](/langsmith/images/add-metadata.gif)
You may use this to store information about your examples, such as tags or version info, which you can then [group by](/langsmith/analyze-an-experiment#group-results-by-metadata) when analyzing experiment results or [filter by](/langsmith/manage-datasets-programmatically#list-examples-by-metadata) when you call `list_examples` in the SDK.
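
A minimal sketch of the SDK side of this, assuming the Python `langsmith` client and its `list_examples` method with `dataset_name` and `metadata` keyword arguments. The helper name `list_examples_by_metadata`, the dataset name, and the metadata key are hypothetical and chosen only for illustration; check the SDK reference for the exact signature in your installed version.

```python
def list_examples_by_metadata(client, dataset_name, metadata):
    """Return the examples in `dataset_name` whose metadata matches `metadata`.

    `client` is assumed to be a `langsmith.Client` (or any object exposing a
    compatible `list_examples` method). `metadata` is a dict of key/value
    pairs, such as the ones you set when editing an example in the UI.
    """
    # list_examples returns an iterator, so materialize it into a list.
    return list(client.list_examples(dataset_name=dataset_name, metadata=metadata))


# Usage (requires the `langsmith` package and a configured API key):
#   from langsmith import Client
#   examples = list_examples_by_metadata(Client(), "my-dataset", {"version": "v2"})
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object in tests, without network access.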

### Filter examples

You can filter examples by split, metadata key/value or perform full-text search over examples. These filtering options are available to the top left of the examples table.
You can filter examples by split, metadata key/value or perform full-text search over examples. These filtering options are available to the top left of the examples table:

* **Filter by split**: Select split > Select a split to filter by
* **Filter by metadata**: Filters > Select "Metadata" from the dropdown > Select the metadata key and value to filter on
* **Full-text search**: Filters > Select "Full Text" from the dropdown > Enter your search criteria
- **Filter by split**: Select split > Select a split to filter by.
- **Filter by metadata**: Filters > Select **Metadata** from the dropdown > Select the metadata key and value to filter on.
- **Full-text search**: Filters > Select **Full Text** from the dropdown > Enter your search criteria.

You may add multiple filters, and only examples that satisfy all of the filters will be displayed in the table.
