feat: multimodal content in evaluators #2876
Merged
Changes from 3 commits
6 commits
c785d38 Add docs for multimodal content evaluators (katmayb)
b89e17c Feedback (katmayb)
1de59b9 Merge branch 'main' into multimodal-content-evaluators (katmayb)
8324c0e Catherine feedback (katmayb)
e4fa082 Catherine feedback (katmayb)
6d229ed Merge branch 'main' into multimodal-content-evaluators (katmayb)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,19 +11,19 @@ sidebarTitle: LLM-as-a-judge | |
|
|
||
| ## View online evaluators | ||
|
|
||
| In the [LangSmith UI](https://smith.langchian.com), head to the **Tracing Projects** tab and select a tracing project. To view existing online evaluators for that project, click on the **Evaluators** tab. | ||
| In the [LangSmith UI](https://smith.langchain.com), head to the **Tracing Projects** tab and select a tracing project. To view existing online evaluators for that project, click on the **Evaluators** tab. | ||
|
|
||
|  | ||
|
|
||
| ## Configure online evaluators | ||
|
|
||
| #### 1. Navigate to online evaluators | ||
| ### 1. Navigate to online evaluators | ||
|
|
||
| Head to the **Tracing Projects** tab and select a tracing project. Click on **+ New** in the top right corner of the tracing project page, then click on **New Evaluator**. Select the evaluator you want to configure. | ||
|
|
||
| #### 2. Name your evaluator | ||
| ### 2. Name your evaluator | ||
|
|
||
| #### 3. Create a filter | ||
| ### 3. Create a filter | ||
|
|
||
| For example, you may want to apply specific evaluators based on: | ||
|
|
||
|
|
@@ -37,11 +37,11 @@ Filters on evaluators work the same way as when you're filtering traces in a pro | |
| It's often helpful to inspect runs as you're creating a filter for your evaluator. With the evaluator configuration panel open, you can inspect runs and apply filters to them. Any filters you apply to the runs table will automatically be reflected in filters on your evaluator. | ||
| </Tip> | ||
|
|
||
| #### 4. (Optional) Configure a sampling rate | ||
| ### 4. (Optional) Configure a sampling rate | ||
|
|
||
| Configure a sampling rate to control the percentage of filtered runs that trigger the automation action. For example, to control costs, you may want to apply the evaluator to only 10% of the filtered traces. To do this, set the sampling rate to 0.1. | ||
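As a rough illustration of what a sampling rate of 0.1 means, the sketch below keeps each filtered run with probability 0.1. It is not LangSmith's implementation: `should_evaluate` and the run IDs are hypothetical names introduced only for this example.

```python
import random


def should_evaluate(sampling_rate: float, rng: random.Random) -> bool:
    """Hypothetical helper: return True for roughly `sampling_rate` of calls."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so this is True with probability sampling_rate.
    return rng.random() < sampling_rate


rng = random.Random(0)  # seeded only so the sketch is reproducible
filtered_runs = [f"run-{i}" for i in range(10_000)]  # stand-ins for runs that passed the filter
sampled = [run for run in filtered_runs if should_evaluate(0.1, rng)]
print(f"{len(sampled)} of {len(filtered_runs)} filtered runs would be evaluated")
```

Because each run is sampled independently, the evaluated share is close to 10% but not exactly 10%.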
|
|
||
| #### 5. (Optional) Apply rule to past runs | ||
| ### 5. (Optional) Apply rule to past runs | ||
|
|
||
| Apply the rule to past runs by toggling **Apply to past runs** and entering a "Backfill from" date. This is only possible when you create the rule. | ||
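Conceptually, a backfill selects the past runs that started on or after the "Backfill from" date. The sketch below assumes that interpretation; the run dictionaries, the `start_time` field, and `runs_to_backfill` are hypothetical and only illustrate the selection.

```python
from datetime import datetime, timezone


def runs_to_backfill(runs: list[dict], backfill_from: datetime) -> list[dict]:
    """Hypothetical helper: keep runs that started on or after `backfill_from`."""
    return [run for run in runs if run["start_time"] >= backfill_from]


# Illustrative past runs with timezone-aware start times.
past_runs = [
    {"id": "run-a", "start_time": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"id": "run-b", "start_time": datetime(2024, 2, 1, tzinfo=timezone.utc)},
    {"id": "run-c", "start_time": datetime(2024, 3, 10, tzinfo=timezone.utc)},
]
backfill_from = datetime(2024, 2, 1, tzinfo=timezone.utc)
selected = runs_to_backfill(past_runs, backfill_from)
print([run["id"] for run in selected])
```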
|
|
||
|
|
@@ -55,10 +55,40 @@ In order to track progress of the backfill, you can view logs for your evaluator | |
| - Optionally filter runs that you would like to apply your evaluator on or configure a sampling rate. | ||
| - Select **Apply Evaluator**. | ||
|
|
||
| #### 6. Configure the LLM-as-a-judge evaluator | ||
| ### 6. Configure the LLM-as-a-judge evaluator | ||
|
|
||
| View this guide to configure an [LLM-as-a-judge evaluator](/langsmith/llm-as-judge?mode=ui#pre-built-evaluators-1). | ||
|
|
||
| ### 7. (Optional) Map multimodal content to evaluator | ||
|
|
||
| If your traces contain multimodal content like images, audio, or documents, you can include this content in your evaluator prompts. There are two approaches: | ||
|
|
||
| - **Using base64-encoded content from traces**: If your application logs multimodal content as base64-encoded data in the trace (for example, in the input or output of a run), you can reference this content directly in your evaluator prompt using template variables. The evaluator will extract the base64 data from the trace and pass it to the LLM. | ||
|
katmayb marked this conversation as resolved.
|
||
| - **Using attachments from traces**: Similar to [offline evaluations with attachments](/langsmith/evaluate-with-attachments), you can use attachments from your traces in online evaluations. Since your traces already include attachments logged via the SDK, you can reference them directly in your evaluator. To do so: | ||
|
|
||
| 1. In your evaluator configuration, click the file icon in the evaluator message where you want to add multimodal content. | ||
| 1. In the **Template variables** tab, add a variable for the attachment(s) to include: | ||
|
Review comment (Contributor): Update this section to mirror the above.
Reply (Contributor, Author): Done!
|
||
| - For a single attachment type: Use the suggested variable name. All traces must have an attachment with this name. | ||
| - For multiple attachments or if attachment names vary across traces: Use the `All attachments` variable to include all available attachments for each trace. | ||
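The difference between the two variable choices above can be sketched as follows. This is an illustrative model, not LangSmith's code: `resolve_attachment_variable` and the attachment names are hypothetical, and only the `All attachments` label comes from the text.

```python
ALL_ATTACHMENTS = "All attachments"  # variable label as shown in the UI text above


def resolve_attachment_variable(
    attachments: dict[str, bytes], variable: str
) -> dict[str, bytes]:
    """Hypothetical helper: pick the attachments a template variable refers to."""
    if variable == ALL_ATTACHMENTS:
        # Works even when attachment names differ from trace to trace.
        return dict(attachments)
    if variable not in attachments:
        # A named variable requires every trace to carry an attachment with that name.
        raise KeyError(f"trace has no attachment named {variable!r}")
    return {variable: attachments[variable]}


trace_attachments = {"image": b"<image bytes>", "audio": b"<audio bytes>"}
print(sorted(resolve_attachment_variable(trace_attachments, "image")))
print(sorted(resolve_attachment_variable(trace_attachments, ALL_ATTACHMENTS)))
```

The named variable fails for a trace that lacks that attachment, which is why `All attachments` is the safer choice when names vary.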
|
|
||
| <img | ||
| className="block dark:hidden" | ||
| src="/langsmith/images/variable-multimodal-content-light.png" | ||
| alt="Edit evaluator modal with an image attachment selected for the input." | ||
| /> | ||
|
|
||
| <img | ||
| className="hidden dark:block" | ||
| src="/langsmith/images/variable-multimodal-content-dark.png" | ||
| alt="Edit evaluator modal with an image attachment selected for the input." | ||
| /> | ||
|
|
||
| The evaluator can then access these attachments when evaluating the trace. This is useful for evaluators that need to: | ||
|
|
||
| - Verify if an image description matches the actual image in the trace. | ||
| - Check if a transcription accurately reflects the audio input. | ||
| - Validate if extracted text from a document is correct. | ||
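For the base64 approach described earlier, an application might log inputs shaped like the sketch below. The field names (`question`, `image`), the data-URL wrapping, and `build_run_inputs` are assumptions made for illustration; the source only states that the content is logged as base64-encoded data in the run's input or output.

```python
import base64


def build_run_inputs(image_bytes: bytes, mime_type: str, question: str) -> dict:
    """Hypothetical helper: build run inputs that carry an image as base64 text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "question": question,
        # Assumed layout: a data URL that keeps the MIME type next to the payload.
        "image": f"data:{mime_type};base64,{encoded}",
    }


placeholder_bytes = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a complete image
inputs = build_run_inputs(placeholder_bytes, "image/png", "What does the image show?")
print(inputs["image"][:22])
```

An evaluator prompt could then reference the `image` field through a template variable, so the judge model receives the same image the application saw.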
|
|
||
| ## Video guide | ||
| <iframe | ||
| className="w-full aspect-video rounded-xl" | ||
|
|
||