feat: multimodal content in evaluators #2876
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,16 +3,19 @@ title: Run an evaluation with multimodal content | |
| sidebarTitle: Run an evaluation with multimodal content | ||
| --- | ||
|
|
||
| LangSmith lets you create dataset examples with file attachments—like images, audio files, or documents—so you can reference them when evaluating an application that uses multimodal inputs or outputs. | ||
| LangSmith lets you create dataset examples with file attachments—like images, audio files, or documents—and use them in your prompts and evaluators when running evaluations with multimodal content. | ||
|
|
||
| While you can include multimodal data in your examples by base64 encoding it, this approach is inefficient - the encoded data takes up more space than the original binary files, resulting in slower transfers to and from LangSmith. Using attachments instead provides two key benefits: | ||
| While you can include multimodal data in your examples by base64 encoding it, this approach is inefficient—the encoded data takes up more space than the original binary files, resulting in slower transfers to and from LangSmith. Using attachments instead provides two key benefits: | ||
|
|
||
| 1. Faster upload and download speeds due to more efficient binary file transfers | ||
| 2. Enhanced visualization of different file types in the LangSmith UI | ||
| 1. Enhanced visualization of different file types in the LangSmith UI | ||
|
|
||
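The size overhead behind the first benefit is easy to demonstrate with the standard library: base64 emits 4 output bytes for every 3 input bytes, so encoded payloads are roughly a third larger than the raw binary.

```python
import base64

raw = bytes(range(256)) * 12        # 3,072 bytes of binary data
encoded = base64.b64encode(raw)     # 4 output bytes per 3 input bytes

overhead = len(encoded) / len(raw)  # ~1.33 before any padding
print(len(raw), len(encoded), round(overhead, 2))
```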
| ## SDK | ||
| This guide covers how to create examples with attachments, build multimodal prompts and evaluators that use those attachments, and run evaluations with multimodal content. | ||
|
|
||
| ### 1. Create examples with attachments | ||
| <Tabs> | ||
| <Tab title="SDK" icon="code"> | ||
|
|
||
| ## 1. Create examples with attachments | ||
|
|
||
| To upload examples with attachments using the SDK, use the [create_examples](https://docs.smith.langchain.com/reference/python/client/langsmith.client.Client#langsmith.client.Client.create_examples) / [update_examples](https://docs.smith.langchain.com/reference/python/client/langsmith.client.Client#langsmith.client.Client.update_examples) Python methods or the [uploadExamplesMultipart](https://docs.smith.langchain.com/reference/js/classes/client.Client#uploadexamplesmultipart) / [updateExamplesMultipart](https://docs.smith.langchain.com/reference/js/classes/client.Client#updateexamplesmultipart) TypeScript methods. | ||
|
|
||
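As a minimal Python sketch: an example payload carries an `attachments` mapping of name to a `(mime_type, raw_bytes)` pair alongside the usual `inputs` and `outputs`. The client call is shown but not executed here, and the field names follow the linked reference; verify them against the SDK version you use.

```python
# Sketch of one example with an attachment, in the shape accepted by
# create_examples. Names like "my_image" are illustrative.

def build_example(question: str, answer: str, image_bytes: bytes) -> dict:
    return {
        "inputs": {"question": question},
        "outputs": {"answer": answer},
        # attachments map a name to a (mime_type, raw_bytes) pair
        "attachments": {"my_image": ("image/png", image_bytes)},
    }

example = build_example("What is shown?", "A cat.", b"\x89PNG\r\n\x1a\n")

# With a configured client (requires LANGSMITH_API_KEY):
# from langsmith import Client
# Client().create_examples(dataset_id=dataset.id, examples=[example])
```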
|
|
@@ -430,7 +433,7 @@ const resp = await evaluate(fileQA, { | |
|
|
||
| </CodeGroup> | ||
|
|
||
| ## Update examples with attachments | ||
| ## 3. Update examples with attachments | ||
|
|
||
| In the code above, we showed how to add examples with attachments to a dataset. It is also possible to update these same examples using the SDK. | ||
|
|
||
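A hedged sketch of an update payload: the attachment shape is assumed to mirror the one used on create (name to `(mime_type, raw_bytes)`), with the example's id identifying what to update. Check the `update_examples` / `updateExamplesMultipart` references above for the exact call signature.

```python
import uuid

def build_example_update(example_id: str, audio_bytes: bytes) -> dict:
    # Assumed update shape: attachments reuse the same
    # name -> (mime_type, raw_bytes) mapping as on create.
    return {
        "id": example_id,
        "attachments": {"my_audio": ("audio/wav", audio_bytes)},
    }

update = build_example_update(str(uuid.uuid4()), b"RIFF\x00\x00WAVE")

# Sketched client call (not executed; verify against the SDK reference):
# Client().update_examples(dataset_id=dataset.id, updates=[update])
```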
|
|
@@ -503,9 +506,10 @@ await langsmithClient.updateExamplesMultipart(dataset.id, [exampleUpdate]); | |
|
|
||
| </CodeGroup> | ||
|
|
||
| ## UI | ||
| </Tab> | ||
| <Tab title="UI" icon="click"> | ||
|
|
||
| ### 1. Create examples with attachments | ||
| ## 1. Create examples with attachments | ||
|
|
||
| You can add examples with attachments to a dataset in a few different ways. | ||
|
|
||
|
|
@@ -523,7 +527,7 @@ You can create examples with attachments directly from the LangSmith UI. Click t | |
|
|
||
| Once uploaded, you can view examples with attachments in the LangSmith UI. Each attachment will be rendered with a preview for easy inspection.  | ||
|
|
||
| ### 2. Create a multimodal prompt | ||
| ## 2. Create a multimodal prompt | ||
|
|
||
| The LangSmith UI allows you to include attachments in your prompts when evaluating multimodal models: | ||
|
|
||
|
|
@@ -534,20 +538,44 @@ First, click the file icon in the message where you want to add multimodal conte | |
|
|
||
|  | ||
|
|
||
| ### Define custom evaluators | ||
| ## 3. Define custom evaluators | ||
|
|
||
| <Note> | ||
| The LangSmith playground does not currently support pulling multimodal content into evaluators. If this would be helpful for your use case, please let us know in the [LangChain Forum](https://forum.langchain.com/) (sign up [here](https://www.langchain.com/join-community) if you're not already a member)! | ||
| </Note> | ||
| You can create evaluators that use multimodal content from your dataset examples. | ||
|
|
||
| Since your dataset already has examples with attachments (added in step 1), you can reference them directly in your evaluator. To do so: | ||
|
|
||
| 1. Select **+ Evaluator** from the dataset page. | ||
| 1. In the **Template variables** editor, add a variable for the attachment(s) to include: | ||
| - For a single attachment type: Use the suggested variable name. All examples must have an attachment with this name. | ||
| - For multiple attachments or if attachment names vary across examples: Use the `All attachments` variable to include all available attachments for each example. | ||
|
|
||
| You can evaluate a model's text output by adding an evaluator that takes in the example's inputs and outputs. Even without multimodal support in your evaluators, you can still run text-only evaluations. For example: | ||
| <img | ||
| className="block dark:hidden" | ||
| src="/langsmith/images/evaluator-attach-file-light.png" | ||
| alt="Create evaluator modal with an audio attachment selected for output variable." | ||
| /> | ||
|
|
||
| * OCR → text correction: Use a vision model to extract text from a document, then evaluate the accuracy of the extracted output. | ||
| * Speech-to-text → transcription quality: Use a voice model to transcribe audio to text, then evaluate the transcription against your reference. | ||
| <img | ||
| className="hidden dark:block" | ||
| src="/langsmith/images/evaluator-attach-file-dark.png" | ||
| alt="Create evaluator modal with an audio attachment selected for output variable." | ||
| /> | ||
|
|
||
|
|
||
| The evaluator can then use these attachments along with the model's outputs to judge quality. For example, you could create an evaluator that: | ||
|
|
||
| - Checks if an image description matches the actual image content. | ||
| - Verifies if a transcription accurately reflects the audio. | ||
| - Validates if extracted text from a PDF is correct. | ||
|
|
||
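Conceptually, such an evaluator receives the attachments alongside the outputs. The Python sketch below assumes each attachment entry exposes a presigned URL for the stored file; treat that shape, and all names here, as illustrative rather than the SDK's exact API.

```python
def image_description_check(outputs: dict, attachments: dict) -> dict:
    # Assumed attachment shape: name -> metadata including a presigned URL
    # for the stored file (hedged; check the SDK reference for specifics).
    url = attachments["my_image"]["presigned_url"]
    description = outputs.get("description", "")
    # In practice you would pass `url` and `description` to a vision LLM
    # judge; this sketch only checks that both pieces are present.
    score = 1.0 if url and description else 0.0
    return {"key": "description_present", "score": score}

result = image_description_check(
    {"description": "A cat sitting on a mat."},
    {"my_image": {"presigned_url": "https://example.com/cat.png"}},
)
```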
| You can also create text-only evaluators that don't use attachments but evaluate the model's text output: | ||
|
|
||
| - OCR → text correction: Use a vision model to extract text from a document, then evaluate the accuracy of the extracted output. | ||
| - Speech-to-text → transcription quality: Use a voice model to transcribe audio to text, then evaluate the transcription against your reference. | ||
|
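The transcription-quality idea can be sketched without any model at all, using plain string similarity as a stand-in for an LLM judge. The function and field names below are illustrative, not part of the LangSmith API.

```python
import difflib

def transcription_quality(outputs: dict, reference_outputs: dict) -> dict:
    # Stand-in judge: ratio of matching characters between the model's
    # transcript and the reference, in [0, 1].
    predicted = outputs.get("transcript", "")
    reference = reference_outputs.get("transcript", "")
    score = difflib.SequenceMatcher(None, predicted, reference).ratio()
    return {"key": "transcription_quality", "score": round(score, 3)}

perfect = transcription_quality(
    {"transcript": "hello world"}, {"transcript": "hello world"}
)
partial = transcription_quality(
    {"transcript": "helo world"}, {"transcript": "hello world"}
)
```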
Contributor: Can we add a component on how it will take in any base64 content? Not all customers will use attachments to render traces, so we also allow passing in any base64 format. For example, if part of their input/output is base64-encoded and they follow this guide to set up their traces correctly (https://docs.langchain.com/langsmith/log-multimodal-traces), they will not only be able to visualize it in the UI; if I do variable mapping with {{input}} in the evaluator, it will also be passed correctly to an LLM that supports it. I know this section is about attachments, but I think this is worth calling out here. We can use the images I sent in Slack.

Author: Done!
||
|
|
||
| For more information on defining custom evaluators, see the [LLM as Judge](/langsmith/llm-as-judge) guide. | ||
|
|
||
| ### Update examples with attachments | ||
| ## 4. Update examples with attachments | ||
|
|
||
| <Note> | ||
| Attachments are limited to 20MB in size in the UI. | ||
|
|
@@ -562,3 +590,6 @@ When editing an example in the UI, you can: | |
| Changes are not saved until you click submit. | ||
|
|
||
|  | ||
|
|
||
| </Tab> | ||
| </Tabs> | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,13 +17,13 @@ In the [LangSmith UI](https://smith.langchain.com), head to the **Tracing Projec | |
|
|
||
| ## Configure online evaluators | ||
|
|
||
| #### 1. Navigate to online evaluators | ||
| ### 1. Navigate to online evaluators | ||
|
|
||
| Head to the **Tracing Projects** tab and select a tracing project. Click on **+ New** in the top right corner of the tracing project page, then click on **New Evaluator**. Select the evaluator you want to configure. | ||
|
|
||
| #### 2. Name your evaluator | ||
| ### 2. Name your evaluator | ||
|
|
||
| #### 3. Create a filter | ||
| ### 3. Create a filter | ||
|
|
||
| For example, you may want to apply specific evaluators based on: | ||
|
|
||
|
|
@@ -37,11 +37,11 @@ Filters on evaluators work the same way as when you're filtering traces in a pro | |
| It's often helpful to inspect runs as you're creating a filter for your evaluator. With the evaluator configuration panel open, you can inspect runs and apply filters to them. Any filters you apply to the runs table will automatically be reflected in filters on your evaluator. | ||
| </Tip> | ||
|
|
||
| #### 4. (Optional) Configure a sampling rate | ||
| ### 4. (Optional) Configure a sampling rate | ||
|
|
||
| Configure a sampling rate to control the percentage of filtered runs that trigger the automation action. For example, to control costs, you may want to set a filter to only apply the evaluator to 10% of traces. In order to do this, you would set the sampling rate to 0.1. | ||
|
|
||
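The sampling behavior amounts to a per-run coin flip, which can be sketched as follows (the function name is illustrative, not part of LangSmith):

```python
import random

def should_run_evaluator(sampling_rate: float, rng: random.Random) -> bool:
    # A sampling rate of 0.1 evaluates roughly 10% of filtered runs.
    return rng.random() < sampling_rate

rng = random.Random(0)  # seeded for reproducibility in this sketch
hits = sum(should_run_evaluator(0.1, rng) for _ in range(10_000))
```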
| #### 5. (Optional) Apply rule to past runs | ||
| ### 5. (Optional) Apply rule to past runs | ||
|
|
||
| Apply the rule to past runs by toggling **Apply to past runs** and entering a "Backfill from" date. This is only possible upon rule creation. ||
|
|
||
|
|
@@ -55,10 +55,28 @@ In order to track progress of the backfill, you can view logs for your evaluator | |
| - Optionally filter runs that you would like to apply your evaluator on or configure a sampling rate. | ||
| - Select **Apply Evaluator**. | ||
|
|
||
| #### 6. Configure the LLM-as-a-judge evaluator | ||
| ### 6. Configure the LLM-as-a-judge evaluator | ||
|
|
||
| View this guide to configure an [LLM-as-a-judge evaluator](/langsmith/llm-as-judge?mode=ui#pre-built-evaluators-1). | ||
|
|
||
| ### 7. (Optional) Map multimodal content to evaluator | ||
|
|
||
| If your traces contain multimodal content like images, audio, or documents, you can include this content in your evaluator prompts. There are two approaches: | ||
|
|
||
|
|
||
| - **Using base64-encoded content from traces**: If your application logs multimodal content as base64-encoded data in the trace (for example, in the input or output of a run), you can reference this content directly in your evaluator prompt using template variables. The evaluator will extract the base64 data from the trace and pass it to the LLM. | ||
|
|
||
| - **Using attachments from traces**: Similar to [offline evaluations with attachments](/langsmith/evaluate-with-attachments), you can use attachments from your traces in online evaluations. Since your traces already include attachments logged via the SDK, you can reference them directly in your evaluator. To do so: | ||
|
|
||
| 1. In your evaluator configuration, click the file icon in the evaluator message where you want to add multimodal content. | ||
| 1. In the **Template variables** tab, add a variable for the attachment(s) to include: | ||
|
Contributor: update this section to mirror the above

Author: Done!
||
| - For a single attachment type: Use the suggested variable name. All traces must have an attachment with this name. | ||
| - For multiple attachments or if attachment names vary across traces: Use the `All attachments` variable to include all available attachments for each trace. | ||
|
|
||
| The evaluator can then access these attachments when evaluating the trace. This is useful for evaluators that need to: | ||
|
|
||
| - Verify if an image description matches the actual image in the trace. | ||
| - Check if a transcription accurately reflects the audio input. | ||
| - Validate if extracted text from a document is correct. | ||
|
|
||
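The base64 route above can be sketched with an OpenAI-style multimodal content block, as used in the multimodal-tracing guide; inlining the image as a base64 data URL lets it both render in traces and flow into an evaluator prompt via variable mapping. Names here are illustrative.

```python
import base64

def image_user_message(image_bytes: bytes, question: str) -> dict:
    # Inline the image as a base64 data URL inside an OpenAI-style
    # content block, so it renders in the trace and can be mapped into
    # an evaluator prompt with a template variable like {{input}}.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

msg = image_user_message(b"\x89PNG\r\n\x1a\n", "Describe this image.")
```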
| ## Video guide | ||
| <iframe | ||
| className="w-full aspect-video rounded-xl" | ||
|
|
||
Contributor: Not entirely accurate in terms of recommendation. I would say something like: (this is because for a single attachment, they can still do {{attachments}}, which is probably easiest.)

Author: Done!