Merged
65 changes: 48 additions & 17 deletions src/langsmith/evaluate-with-attachments.mdx
@@ -3,16 +3,19 @@ title: Run an evaluation with multimodal content
sidebarTitle: Run an evaluation with multimodal content
---

LangSmith lets you create dataset examples with file attachments—like images, audio files, or documents—so you can reference them when evaluating an application that uses multimodal inputs or outputs.
LangSmith lets you create dataset examples with file attachments—like images, audio files, or documents—and use them in your prompts and evaluators when running evaluations with multimodal content.

While you can include multimodal data in your examples by base64 encoding it, this approach is inefficient - the encoded data takes up more space than the original binary files, resulting in slower transfers to and from LangSmith. Using attachments instead provides two key benefits:
While you can include multimodal data in your examples by base64 encoding it, this approach is inefficient—the encoded data takes up more space than the original binary files, resulting in slower transfers to and from LangSmith. Using attachments instead provides two key benefits:

1. Faster upload and download speeds due to more efficient binary file transfers
2. Enhanced visualization of different file types in the LangSmith UI
1. Enhanced visualization of different file types in the LangSmith UI
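The size penalty of base64 is easy to verify: the encoding emits 4 output bytes for every 3 input bytes, roughly a 33% increase before the transfer even starts.

```python
import base64

# Simulate a 3 MB binary attachment (an image, audio clip, etc.).
raw = bytes(3 * 1024 * 1024)

encoded = base64.b64encode(raw)

# Base64 maps every 3 input bytes to 4 output bytes: ~33% larger on the wire.
print(f"raw: {len(raw):,} B  base64: {len(encoded):,} B  "
      f"overhead: {len(encoded) / len(raw):.2f}x")
```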

## SDK
This guide covers how to create examples with attachments, build multimodal prompts and evaluators that use those attachments, and run evaluations with multimodal content.

### 1. Create examples with attachments
<Tabs>
<Tab title="SDK" icon="code">

## 1. Create examples with attachments

To upload examples with attachments using the SDK, use the [create_examples](https://docs.smith.langchain.com/reference/python/client/langsmith.client.Client#langsmith.client.Client.create_examples) / [update_examples](https://docs.smith.langchain.com/reference/python/client/langsmith.client.Client#langsmith.client.Client.update_examples) Python methods or the [uploadExamplesMultipart](https://docs.smith.langchain.com/reference/js/classes/client.Client#uploadexamplesmultipart) / [updateExamplesMultipart](https://docs.smith.langchain.com/reference/js/classes/client.Client#updateexamplesmultipart) TypeScript methods.
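As a rough Python sketch (the dataset name, file bytes, and `UPLOAD_TO_LANGSMITH` opt-in flag are placeholders, and the `(mime_type, bytes)` attachment format is our reading of the linked reference; confirm the exact signature there), an example with an attachment looks like:

```python
import os

# Each example carries an "attachments" mapping: name -> (mime_type, bytes).
# The SDK streams these as multipart form data rather than base64-inlining them.
pdf_bytes = b"%PDF-1.4 fake-pdf-bytes"  # in practice: open("report.pdf", "rb").read()

example = {
    "inputs": {"question": "Summarize this document."},
    "outputs": {"answer": "A short summary."},
    "attachments": {"my_pdf": ("application/pdf", pdf_bytes)},
}

# Upload only when explicitly opted in (hypothetical flag for this sketch);
# a real run also needs LANGSMITH_API_KEY set in the environment.
if os.environ.get("UPLOAD_TO_LANGSMITH"):
    from langsmith import Client

    client = Client()
    dataset = client.create_dataset(dataset_name="my-attachments-dataset")
    client.create_examples(dataset_id=dataset.id, examples=[example])
```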

@@ -430,7 +433,7 @@ const resp = await evaluate(fileQA, {

</CodeGroup>

## Update examples with attachments
## 3. Update examples with attachments

In the code above, we showed how to add examples with attachments to a dataset. It is also possible to update these same examples using the SDK.

@@ -503,9 +506,10 @@ await langsmithClient.updateExamplesMultipart(dataset.id, [exampleUpdate]);

</CodeGroup>

## UI
</Tab>
<Tab title="UI" icon="click">

### 1. Create examples with attachments
## 1. Create examples with attachments

You can add examples with attachments to a dataset in a few different ways.

@@ -523,7 +527,7 @@ You can create examples with attachments directly from the LangSmith UI. Click t

Once uploaded, you can view examples with attachments in the LangSmith UI. Each attachment will be rendered with a preview for easy inspection. ![Attachments with examples](/langsmith/images/attachments-with-examples.png)

### 2. Create a multimodal prompt
## 2. Create a multimodal prompt

The LangSmith UI allows you to include attachments in your prompts when evaluating multimodal models:

@@ -534,20 +538,44 @@ First, click the file icon in the message where you want to add multimodal conte

![Adding multimodal variable](/langsmith/images/adding-multimodal-variable.gif)

### Define custom evaluators
## 3. Define custom evaluators

<Note>
The LangSmith playground does not currently support pulling multimodal content into evaluators. If this would be helpful for your use case, please let us know in the [LangChain Forum](https://forum.langchain.com/) (sign up [here](https://www.langchain.com/join-community) if you're not already a member)!
</Note>
You can create evaluators that use multimodal content from your dataset examples.

Since your dataset already has examples with attachments (added in step 1), you can reference them directly in your evaluator. To do so:

1. Select **+ Evaluator** from the dataset page.
1. In the **Template variables** editor, add a variable for the attachment(s) to include:
- For a single attachment type: Use the suggested variable name. All examples must have an attachment with this name.
- For multiple attachments or if attachment names vary across examples: Use the `All attachments` variable to include all available attachments for each example.
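For example, a judge prompt that grades an image description against the attached image might look like the following sketch (the `{{output}}` variable name is illustrative; map it to your own run field):

```
You are grading whether a model's description matches the attached image.

Image(s) under review:
{{attachments}}

Model's description:
{{output}}

Return a score from 0 to 1, where 1 means the description is fully accurate.
```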
**Contributor:**

Not entirely accurate in terms of the recommendation. I would say something like:

- If you want to include all attachments, use the `{{attachments}}` variable.
- If you want to include a specific attachment, you can use the suggested variable name, such as `{{attachment.file_name}}`; this maps the file named `file_name` in the attachment list and passes it to the evaluator.

(This is because for a single attachment, they can still use `{{attachments}}`, which is probably easiest.)

**Contributor Author:**

Done!


You can evaluate a model's text output by adding an evaluator that takes in the example's inputs and outputs. Even without multimodal support in your evaluators, you can still run text-only evaluations. For example:
<img
className="block dark:hidden"
src="/langsmith/images/evaluator-attach-file-light.png"
alt="Create evaluator modal with an audio attachment selected for output variable."
/>

* OCR → text correction: Use a vision model to extract text from a document, then evaluate the accuracy of the extracted output.
* Speech-to-text → transcription quality: Use a voice model to transcribe audio to text, then evaluate the transcription against your reference.
<img
className="hidden dark:block"
src="/langsmith/images/evaluator-attach-file-dark.png"
alt="Create evaluator modal with an audio attachment selected for output variable."
/>


The evaluator can then use these attachments along with the model's outputs to judge quality. For example, you could create an evaluator that:

- Checks if an image description matches the actual image content.
- Verifies if a transcription accurately reflects the audio.
- Validates if extracted text from a PDF is correct.

You can also create text-only evaluators that don't use attachments but evaluate the model's text output:

- OCR → text correction: Use a vision model to extract text from a document, then evaluate the accuracy of the extracted output.
- Speech-to-text → transcription quality: Use a voice model to transcribe audio to text, then evaluate the transcription against your reference.
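A minimal sketch of the second pattern, written as a custom code evaluator with the SDK rather than in the UI: score a transcription against a reference using plain string similarity. The field names and the `difflib` scoring are illustrative assumptions, not a recommendation.

```python
from difflib import SequenceMatcher

def transcription_quality(outputs: dict, reference_outputs: dict) -> dict:
    """Score a model transcription against a reference transcript (0.0-1.0)."""
    produced = outputs.get("transcription", "")
    expected = reference_outputs.get("transcription", "")
    score = SequenceMatcher(None, produced.lower(), expected.lower()).ratio()
    return {"key": "transcription_quality", "score": score}

# A near-miss transcription scores high but below 1.0.
result = transcription_quality(
    {"transcription": "The quick brown fox jumps over the lazy dog"},
    {"transcription": "The quick brown fox jumped over the lazy dog"},
)
print(result)
```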
**Contributor:**

Can we add a section on how it will take in any base64 content?

Not all customers will use attachments to render traces, so we allow you to pass in any base64 format. For example, if part of their input/output is in base64 format and they follow this guide to set up their traces correctly (https://docs.langchain.com/langsmith/log-multimodal-traces), they will not only be able to visualize them in the UI; if I do variable mapping with `{{input}}` in the evaluator, the content will also be passed correctly to an LLM that supports it. I know this section is about attachments, but I think this may be worth calling out here.

We can use the images that I sent in Slack.

**Contributor Author:**

Done!


For more information on defining custom evaluators, see the [LLM as Judge](/langsmith/llm-as-judge) guide.

### Update examples with attachments
## 4. Update examples with attachments

<Note>
Attachments are limited to 20MB in size in the UI.
Expand All @@ -562,3 +590,6 @@ When editing an example in the UI, you can:
Changes are not saved until you click submit.

![Attachment editing](/langsmith/images/attachment-editing.gif)

</Tab>
</Tabs>
30 changes: 24 additions & 6 deletions src/langsmith/online-evaluations-llm-as-judge.mdx
@@ -17,13 +17,13 @@ In the [LangSmith UI](https://smith.langchain.com), head to the **Tracing Projec

## Configure online evaluators

#### 1. Navigate to online evaluators
### 1. Navigate to online evaluators

Head to the **Tracing Projects** tab and select a tracing project. Click on **+ New** in the top right corner of the tracing project page, then click on **New Evaluator**. Select the evaluator you want to configure.

#### 2. Name your evaluator
### 2. Name your evaluator

#### 3. Create a filter
### 3. Create a filter

For example, you may want to apply specific evaluators based on:

@@ -37,11 +37,11 @@ Filters on evaluators work the same way as when you're filtering traces in a pro
It's often helpful to inspect runs as you're creating a filter for your evaluator. With the evaluator configuration panel open, you can inspect runs and apply filters to them. Any filters you apply to the runs table will automatically be reflected in filters on your evaluator.
</Tip>

#### 4. (Optional) Configure a sampling rate
### 4. (Optional) Configure a sampling rate

Configure a sampling rate to control the percentage of filtered runs that trigger the automation action. For example, to control costs, you may want to apply the evaluator to only 10% of traces; to do this, set the sampling rate to 0.1.

#### 5. (Optional) Apply rule to past runs
### 5. (Optional) Apply rule to past runs

Apply the rule to past runs by toggling **Apply to past runs** and entering a "Backfill from" date. This is only possible when the rule is created.

@@ -55,10 +55,28 @@ To track the progress of the backfill, you can view logs for your evaluator
- Optionally filter runs that you would like to apply your evaluator on or configure a sampling rate.
- Select **Apply Evaluator**.

#### 6. Configure the LLM-as-a-judge evaluator
### 6. Configure the LLM-as-a-judge evaluator

View this guide to configure an [LLM-as-a-judge evaluator](/langsmith/llm-as-judge?mode=ui#pre-built-evaluators-1).

### 7. (Optional) Map multimodal content to evaluator

If your traces contain multimodal content like images, audio, or documents, you can include this content in your evaluator prompts. There are two approaches:
katmayb marked this conversation as resolved.

- **Using base64-encoded content from traces**: If your application logs multimodal content as base64-encoded data in the trace (for example, in the input or output of a run), you can reference this content directly in your evaluator prompt using template variables. The evaluator will extract the base64 data from the trace and pass it to the LLM.
- **Using attachments from traces**: Similar to [offline evaluations with attachments](/langsmith/evaluate-with-attachments), you can use attachments from your traces in online evaluations. Since your traces already include attachments logged via the SDK, you can reference them directly in your evaluator. To do so:

1. In your evaluator configuration, click the file icon in the evaluator message where you want to add multimodal content.
1. In the **Template variables** tab, add a variable for the attachment(s) to include:
**Contributor:**

Update this section to mirror the one above.

**Contributor Author:**

Done!

- For a single attachment type: Use the suggested variable name. All traces must have an attachment with this name.
- For multiple attachments or if attachment names vary across traces: Use the `All attachments` variable to include all available attachments for each trace.

The evaluator can then access these attachments when evaluating the trace. This is useful for evaluators that need to:

- Verify if an image description matches the actual image in the trace.
- Check if a transcription accurately reflects the audio input.
- Validate if extracted text from a document is correct.
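As a sketch of the base64 approach above, a traced run can inline an image as a data URL in its input; an evaluator variable mapped to that input then carries the image along. The OpenAI-style content blocks below are one assumed message format, not the only one; see the multimodal tracing guide for the exact setup.

```python
import base64

image_bytes = b"\x89PNG\r\n\x1a\nfake-image-bytes"  # placeholder image data
b64 = base64.b64encode(image_bytes).decode("utf-8")

# The run input as it would be logged to LangSmith. An evaluator variable
# mapped to this input passes the inlined image on to a multimodal judge.
run_input = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ]
}
```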

## Video guide
<iframe
className="w-full aspect-video rounded-xl"