add error analysis guide by annabellscha · Pull Request #2892 · langfuse/langfuse-docs

annabellscha · 2026-05-05T14:49:38Z

Disclaimer: Experimental PR review

Greptile Summary

Adds a new cookbook guide on error analysis for LLM applications, covering open coding, failure taxonomy clustering, labeling, and deciding when to build evaluators versus fix prompts. The guide also registers the new page in meta.json.

A 360-line step-by-step guide is added, using a "Dad Tech Support" chatbot as a worked example through the full five-step error analysis process.
The guide integrates with the existing Langfuse annotation queue and scoring workflow and references the Claude Code Langfuse skill for an interactive walkthrough.

Confidence Score: 4/5

Safe to merge; the only flag is a draft note in the example data that should be cleaned up before the guide goes live.

The guide is well-structured and the process it describes is technically sound. The one thing worth fixing before publishing is the parenthetical in Step 4.2 that reveals the example bar chart is based on only 19 of 100 traces — readers may lose confidence in the example data if that note ships as-is.

content/guides/cookbook/error-analysis-llm-applications.mdx — specifically the Step 4.2 failure rates table and its incomplete-data caveat.

Important Files Changed

Filename	Overview
content/guides/cookbook/error-analysis-llm-applications.mdx	New 360-line guide walking through a full error analysis workflow; one draft-style note about incomplete example data remains in the published Step 4.2 results section.
content/guides/cookbook/meta.json	New guide entry added at the top of the pages list; change is straightforward and correct.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Choose what to annotate\nTrace vs. GENERATION observation] --> B[Select ~100 representative traces\nby latency, cost, tags, multi-turn]
    B --> C[Create annotation queue\nwith open_coding + pass_fail_assessment]
    C --> D[Open code first 30-50 traces\nFree-text observations, no pre-defined categories]
    D --> E{New failure types\nstill appearing?}
    E -- Yes --> D
    E -- No --> F[Cluster into 5-10 named failure categories\nSplit by root cause, merge by same root cause]
    F --> G[Create boolean score configs per category\nNew queue with all 10 score configs]
    G --> H[Label all 100 traces]
    H --> I[Compute failure rates\nLangfuse Dashboard - Scores widget]
    I --> J{For each category:\nCan we just fix it?}
    J -- Yes --> K[Prompt / tool / code fix]
    J -- No --> L{Worth building\nan evaluator?}
    L -- Yes --> M[LLM-as-judge or code-based check]
    L -- No --> N[Monitor / defer]
    K & M & N --> O[Re-run after next\nprompt rewrite, model switch, or incident]
```
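The "Compute failure rates" node reduces to simple arithmetic: for each category, the number of traces labeled as failing divided by the number of labeled traces. The guide does this in the Langfuse Dashboard Scores widget; the Python below is only a minimal offline sketch, assuming the boolean labels have already been exported as one dict per trace (category name → `True` if that failure is present). The function name and the category names are hypothetical, and the input is toy data, not the guide's results.

```python
from collections import Counter


def failure_rates(labels: list[dict[str, bool]]) -> dict[str, float]:
    """Per-category failure rate over all labeled traces (hypothetical helper).

    Each element of `labels` holds one trace's boolean labels, keyed by
    failure category. A category missing from a trace counts as "not failed".
    Results are sorted from most to least frequent failure.
    """
    if not labels:
        return {}
    failures: Counter = Counter()
    categories: set[str] = set()
    for trace_labels in labels:
        for category, failed in trace_labels.items():
            categories.add(category)
            if failed:
                failures[category] += 1
    total = len(labels)
    rates = {category: failures[category] / total for category in categories}
    # Highest rate first; ties broken alphabetically for stable output.
    return dict(sorted(rates.items(), key=lambda item: (-item[1], item[0])))


# Toy input: four traces labeled against two made-up categories.
example = [
    {"wrong_device_assumed": True, "too_technical": False},
    {"wrong_device_assumed": False, "too_technical": True},
    {"wrong_device_assumed": True, "too_technical": False},
    {"wrong_device_assumed": False, "too_technical": False},
]
print(failure_rates(example))  # {'wrong_device_assumed': 0.5, 'too_technical': 0.25}
```

Using the total number of labeled traces as the denominator assumes every trace was labeled against every category, as in the flowchart's "Label all 100 traces" step.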

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
content/guides/cookbook/error-analysis-llm-applications.mdx:275
**Incomplete-data caveat left in published guide**

The parenthetical "(from 19 labeled traces, rates will shift once all 100 are done)" reads as a draft note that was never removed. Readers following the guide will see a bar chart with rates derived from 19 of 100 traces and an explicit admission that the numbers are preliminary — undermining confidence in the example data. Either finish labeling all 100 traces before publishing, or replace this note with the final, stable numbers.

Reviews (1): Last reviewed commit: "add error analysis guide"

vercel · 2026-05-05T14:49:43Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
langfuse-docs	Ready	Preview, Comment	May 5, 2026 7:50pm

github-actions · 2026-05-05T14:49:49Z

@claude review

review-notebook-app · 2026-05-05T18:31:49Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

claude

All prior feedback addressed (notebook source added, frontmatter complete, skill link fixed, illustrative-data caveat reworded).

Extended reasoning...

Overview

This documentation-only PR adds a new error-analysis cookbook guide. It includes a hand-written MDX, a corresponding source notebook in cookbook/, a _routes.json registration, a meta.json entry, and a small frameborder -> frameBorder JSX casing fix in a related blog post. No code paths, build scripts, or runtime behavior are changed.

Security risks

None. This is content-only — no auth, network, or data-handling code involved. External links go to langfuse.com pages (verified to exist) and to public references like hamel.dev.

Level of scrutiny

Light scrutiny appropriate for a docs-only change. The earlier higher-stakes risk (regen script silently deleting a hand-edited cookbook MDX with no notebook source) is now eliminated because a backing notebook was added and registered in cookbook/_routes.json.

Other factors

All four issues I raised on earlier revisions have been addressed in the latest commit: backing notebook present, frontmatter has category: Evaluation and sidebarTitle, the Claude Code skill link now points at a path that actually resolves (content/docs/api-and-data-platform/features/agent-skill.mdx), and Step 4.2's draft-style parenthetical has been reworded to frame the partial-sample numbers as illustrative. The Greptile bot's confidence is 4/5 and the only outstanding flag from external review is the same illustrative-data caveat that has now been reworded. Bug hunting system found no bugs.

annabellscha · 2026-05-05T18:38:07Z

+| **5. Update and improve** | Update your setup based on learnings or add automated evaluators |
+
+---
+


Also add Step 1: Gather a diverse dataset.

claude · 2026-05-05T19:29:36Z

+
+---
+
+### Step 4.1: Label all traces against the categories


🟡 Step 4 is missing a parent heading even though it has sub-steps (Step 4.1 at line 249 and Step 4.2 at line 261), unlike Step 1, which has a parent `### Step 1: Gather a diverse dataset` (line 64) before sub-steps Step 1.1/1.2/1.3. Readers scanning headings jump from `### Step 3` straight to `### Step 4.1` with no Step 4 introduction. Either add a `### Step 4: Label and quantify` parent before line 249 (mirroring Step 1) or drop the `Step 1:` parent for consistency.

Extended reasoning...

What the bug is

The new guide's section heading hierarchy is structurally inconsistent. Step 1 is introduced with a parent heading and then split into three sub-steps, but Step 4 jumps directly to its sub-steps with no umbrella heading.

Verified in source:

- `### Step 1: Gather a diverse dataset` (line 64): parent
- `### Step 1.1: Choose what to annotate` (line 68)
- `### Step 1.2: Select a representative sample` (line 97)
- `### Step 1.3: Create an annotation queue` (line 128)
- `### Step 2: Open code your first 30-50 traces` (line 161): parent, no sub-steps
- `### Step 3: Cluster into failure categories` (line 195): parent, no sub-steps
- `### Step 4.1: Label all traces against the categories` (line 249): no Step 4 parent
- `### Step 4.2: Compute failure rates` (line 261)
- `### Step 5: Decide what to do about each category` (line 286): parent, no sub-steps

Step 4 is the only multi-substep section in the guide that lacks a parent heading. The intro process table at lines 54-60 explicitly lists "4. Label and quantify" as the umbrella for what becomes 4.1 and 4.2, so the umbrella is named in the table of contents but not represented as a heading in the body.

Step-by-step proof

1. A reader skims the right-hand TOC or scrolls through section headings to navigate the guide.
2. They see Step 1 (parent) → Step 1.1 / 1.2 / 1.3 → Step 2 (parent) → Step 3 (parent).
3. They expect Step 4 next as a sibling parent. Instead, the next heading is Step 4.1, with no introductory framing.
4. They scroll to Step 4.2, then Step 5 (parent), confirming that Step 4 alone is missing its umbrella.

Why existing code does not prevent it

There is no markdown lint or schema check that requires sub-numbered headings to have a parent at the same depth — every heading is rendered at h3, so MDX/Next.js produces a valid page regardless. The intro process table compensates somewhat, but the inconsistency remains visible in the rendered TOC.
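The check described as missing could be approximated with a small script. The sketch below is hypothetical and not part of langfuse-docs: it assumes step headings follow the `### Step N:` and `### Step N.M:` patterns listed above, and it reports every step number that has sub-step headings but no parent heading.

```python
import re

# Matches "### Step 4: ..." (parent) and "### Step 4.1: ..." (sub-step) at line start.
STEP_HEADING = re.compile(r"^###\s+Step\s+(\d+)(?:\.(\d+))?\s*:", re.MULTILINE)


def steps_missing_parent(mdx: str) -> list[int]:
    """Return step numbers that have sub-step headings but no parent heading."""
    parents: set[int] = set()
    with_substeps: set[int] = set()
    for match in STEP_HEADING.finditer(mdx):
        step = int(match.group(1))
        if match.group(2) is None:
            parents.add(step)
        else:
            with_substeps.add(step)
    return sorted(with_substeps - parents)


# Heading skeleton as listed in this review comment.
headings = """\
### Step 1: Gather a diverse dataset
### Step 1.1: Choose what to annotate
### Step 1.2: Select a representative sample
### Step 1.3: Create an annotation queue
### Step 2: Open code your first 30-50 traces
### Step 3: Cluster into failure categories
### Step 4.1: Label all traces against the categories
### Step 4.2: Compute failure rates
### Step 5: Decide what to do about each category
"""
print(steps_missing_parent(headings))  # [4]
```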

Addressing the refutation

One verifier argued this is purely editorial: all headings render at h3 so there is no broken hierarchy, the intro table names Step 4, the page renders fine, and varying numbering depth is a reasonable editorial choice. That is a fair characterisation of the severity — there is no functional defect — and is the reason this is filed as a nit, not a normal bug. But the inconsistency is real, observable, and trivial to fix in a docs PR. The maintainer (annabellscha) has already left an inline comment on line 63 asking to add Step 1's parent — i.e. they care about parent headings being consistent — so it is reasonable to flag the matching gap at Step 4 in the same review pass rather than having it discovered later.

Fix

One-line addition before line 249:

`### Step 4: Label and quantify`, optionally followed by a one-sentence intro mirroring Step 1's framing.

Alternatively, flatten Step 1 by removing its `Step 1:` parent heading so that both multi-part steps are structured the same way. The Step 1-style fix is preferable, since Steps 2, 3, and 5 already use parent headings without sub-steps.

add error analysis guide

31add73

dosubot Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) May 5, 2026

dosubot Bot added the documentation label (Improvements or additions to documentation) May 5, 2026

vercel Bot deployed to Preview May 5, 2026 14:51

Merge branch 'main' into update-error-analysis-blogpost

e3ebd15

vercel Bot deployed to Preview May 5, 2026 14:59

claude Bot reviewed May 5, 2026

Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx

Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated

Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx

claude Bot reviewed May 5, 2026

Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated

annabellscha added 3 commits May 5, 2026 20:16

add link to agent skill

91adfe8

mini fix

126bf6f

edits and file reatructuring

f730cfb

vercel Bot deployed to Preview May 5, 2026 18:35

claude Bot reviewed May 5, 2026

annabellscha commented May 5, 2026

add images

46cb9eb

vercel Bot deployed to Preview May 5, 2026 19:23

claude Bot reviewed May 5, 2026

annabellscha added 2 commits May 5, 2026 21:34

edit prompt

2c63ef2

update in mdx

e4cc3da

vercel Bot deployed to Preview May 5, 2026 19:41

annabellscha added 2 commits May 5, 2026 21:43

edit prompt

f9b2de6

swap image

c94d81a

vercel Bot deployed to Preview May 5, 2026 19:50

annabellscha wants to merge 10 commits into main from update-error-analysis-blogpost

annabellscha commented May 5, 2026 (edited by greptile-apps Bot)


vercel Bot commented May 5, 2026 (edited)
