| title | Overview |
|---|---|
| seoTitle | Evaluation of LLM Applications |
| description | With Langfuse you can capture all your LLM evaluations in one place. You can combine a variety of evaluation metrics, such as model-based evaluations (LLM-as-a-Judge), human annotations, or fully custom evaluation workflows via API/SDKs. This allows you to measure quality, tonality, factual accuracy, completeness, and other dimensions of your LLM application. |
Evals give you a repeatable check of your LLM application's behavior. You replace guesswork with data.
They also help you catch regressions before you ship a change. You tweak a prompt to handle an edge case, run your eval, and immediately see if it affected the behavior of your application in unintended ways.
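The regression check described above can be sketched as a small script: run a fixed test set through your application, score each output, and rerun after every prompt change. Everything here (`run_app`, `TEST_CASES`, the exact-match metric) is a hypothetical placeholder for your own application, dataset, and metrics, not Langfuse API code.

```python
# Minimal sketch of a repeatable eval. All names are illustrative stand-ins.

TEST_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_app(prompt: str) -> str:
    # Stand-in for your LLM application; replace with a real model call.
    return {"2 + 2": "4", "capital of France": "Paris"}[prompt]

def exact_match(output: str, expected: str) -> float:
    # Simple deterministic metric; swap in LLM-as-a-Judge or human
    # annotation for subjective dimensions like tonality.
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_eval() -> float:
    # Average score across the test set; compare across runs to spot regressions.
    scores = [exact_match(run_app(c["input"]), c["expected"]) for c in TEST_CASES]
    return sum(scores) / len(scores)

print(run_eval())  # rerun after each prompt tweak
```

In a real setup you would report each score back to Langfuse (via its SDKs or API) so results are tracked alongside your traces rather than printed locally.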
Watch this walkthrough of Langfuse Evaluation to see how to use it to improve your LLM application.
Follow the Get Started guide to set up your first evaluation. It helps you pick the right approach — automated monitoring, structured experiments, or human review — and walks you through the setup step by step.
If you're new to LLM evaluation concepts, explore the Core Concepts page first for background on scores, evaluation methods, and experiments.
Looking for something specific? Browse Evaluation Methods and Experiments for guides on individual topics.
import { GhDiscussionsPreview } from "@/components/gh-discussions/GhDiscussionsPreview";
<GhDiscussionsPreview labels={["feat-scores"]} />