
add error analysis guide #2892

Open
annabellscha wants to merge 10 commits into main from update-error-analysis-blogpost


@annabellscha annabellscha commented May 5, 2026

Disclaimer: Experimental PR review

Greptile Summary

Adds a new cookbook guide on error analysis for LLM applications, covering open coding, failure taxonomy clustering, labeling, and deciding when to build evaluators versus fix prompts. The guide also registers the new page in meta.json.

  • A 360-line step-by-step guide is added, using a "Dad Tech Support" chatbot as a worked example through the full five-step error analysis process.
  • The guide integrates with the existing Langfuse annotation queue and scoring workflow and references the Claude Code Langfuse skill for an interactive walkthrough.

Confidence Score: 4/5

Safe to merge; the only flag is a draft note in the example data that should be cleaned up before the guide goes live.

The guide is well-structured and the process it describes is technically sound. The one thing worth fixing before publishing is the parenthetical in Step 4.2 that reveals the example bar chart is based on only 19 of 100 traces — readers may lose confidence in the example data if that note ships as-is.

content/guides/cookbook/error-analysis-llm-applications.mdx — specifically the Step 4.2 failure rates table and its incomplete-data caveat.

Important Files Changed

| Filename | Overview |
| --- | --- |
| content/guides/cookbook/error-analysis-llm-applications.mdx | New 360-line guide walking through a full error analysis workflow; one draft-style note about incomplete example data remains in the published Step 4.2 results section. |
| content/guides/cookbook/meta.json | New guide entry added at the top of the pages list; change is straightforward and correct. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Choose what to annotate\nTrace vs. GENERATION observation] --> B[Select ~100 representative traces\nby latency, cost, tags, multi-turn]
    B --> C[Create annotation queue\nwith open_coding + pass_fail_assessment]
    C --> D[Open code first 30-50 traces\nFree-text observations, no pre-defined categories]
    D --> E{New failure types\nstill appearing?}
    E -- Yes --> D
    E -- No --> F[Cluster into 5-10 named failure categories\nSplit by root cause, merge by same root cause]
    F --> G[Create boolean score configs per category\nNew queue with all 10 score configs]
    G --> H[Label all 100 traces]
    H --> I[Compute failure rates\nLangfuse Dashboard - Scores widget]
    I --> J{For each category:\nCan we just fix it?}
    J -- Yes --> K[Prompt / tool / code fix]
    J -- No --> L{Worth building\nan evaluator?}
    L -- Yes --> M[LLM-as-judge or code-based check]
    L -- No --> N[Monitor / defer]
    K & M & N --> O[Re-run after next\nprompt rewrite, model switch, or incident]
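The labeling and failure-rate steps in the flowchart (one boolean score per category, then a rate per category) can be sketched in plain Python. This is a minimal illustration, not the guide's code: the category names and the `labels` data are hypothetical, and in practice the Langfuse dashboard's Scores widget would compute these rates rather than a script.

```python
from collections import Counter

# Hypothetical labels: one dict per trace, mapping a category name
# to True when that failure was observed in the trace.
labels = [
    {"wrong_device_assumed": True, "too_much_jargon": False},
    {"wrong_device_assumed": False, "too_much_jargon": False},
    {"wrong_device_assumed": True, "too_much_jargon": True},
    {"wrong_device_assumed": False, "too_much_jargon": False},
]


def failure_rates(labels):
    """Return {category: fraction of labeled traces where the category is True}."""
    failures = Counter()
    seen = Counter()
    for trace in labels:
        for category, failed in trace.items():
            seen[category] += 1
            failures[category] += bool(failed)
    return {category: failures[category] / seen[category] for category in seen}


rates = failure_rates(labels)
# Print categories from most to least frequent failure.
for category, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {rate:.0%}")
```

Sorting by rate puts the most common failure first, which is the order in which the "can we just fix it?" decision is usually worth making.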
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
content/guides/cookbook/error-analysis-llm-applications.mdx:275
**Incomplete-data caveat left in published guide**

The parenthetical "(from 19 labeled traces, rates will shift once all 100 are done)" reads as a draft note that was never removed. Readers following the guide will see a bar chart with rates derived from 19 of 100 traces and an explicit admission that the numbers are preliminary — undermining confidence in the example data. Either finish labeling all 100 traces before publishing, or replace this note with the final, stable numbers.
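To make the concern about preliminary numbers concrete, the sketch below compares the uncertainty of a failure rate estimated from 19 traces with one estimated from 100. It uses the Wilson score interval, which is a choice made here for illustration and not something the guide prescribes; the failure counts are hypothetical.

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion failures / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical counts: roughly the same observed rate (~21%) at two sample sizes.
low_19, high_19 = wilson_interval(4, 19)
low_100, high_100 = wilson_interval(21, 100)

print(f"n=19:  {low_19:.2f} to {high_19:.2f}")
print(f"n=100: {low_100:.2f} to {high_100:.2f}")
```

The interval at n=19 is considerably wider than at n=100, which is why rates from a partial sample are better framed as illustrative than as final.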

Reviews (1): Last reviewed commit: "add error analysis guide" | Re-trigger Greptile

@vercel

vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| langfuse-docs | Ready | Preview, Comment | May 5, 2026 7:50pm |


@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 5, 2026
@github-actions

github-actions Bot commented May 5, 2026

@claude review

@dosubot dosubot Bot added the documentation Improvements or additions to documentation label May 5, 2026
Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx
Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated
Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx
Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB


@claude claude Bot left a comment


All prior feedback addressed (notebook source added, frontmatter complete, skill link fixed, illustrative-data caveat reworded).

Extended reasoning...

Overview

This documentation-only PR adds a new error-analysis cookbook guide. It includes a hand-written MDX, a corresponding source notebook in cookbook/, a _routes.json registration, a meta.json entry, and a small frameborder -> frameBorder JSX casing fix in a related blog post. No code paths, build scripts, or runtime behavior are changed.

Security risks

None. This is content-only — no auth, network, or data-handling code involved. External links go to langfuse.com pages (verified to exist) and to public references like hamel.dev.

Level of scrutiny

Light scrutiny appropriate for a docs-only change. The earlier higher-stakes risk (regen script silently deleting a hand-edited cookbook MDX with no notebook source) is now eliminated because a backing notebook was added and registered in cookbook/_routes.json.

Other factors

All four issues I raised on earlier revisions have been addressed in the latest commit: backing notebook present, frontmatter has category: Evaluation and sidebarTitle, the Claude Code skill link now points at a path that actually resolves (content/docs/api-and-data-platform/features/agent-skill.mdx), and Step 4.2's draft-style parenthetical has been reworded to frame the partial-sample numbers as illustrative. The Greptile bot's confidence is 4/5 and the only outstanding flag from external review is the same illustrative-data caveat that has now been reworded. Bug hunting system found no bugs.

Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated
Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated
| **5. Update and improve** | Update your setup based on learnings or add automated evaluators |

---

Contributor Author


Also add Step 1: Gather a diverse dataset.

Contributor Author


  • some info

Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx
Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated
Comment thread content/guides/cookbook/error-analysis-llm-applications.mdx Outdated

---

### Step 4.1: Label all traces against the categories

🟡 Step 4 is missing a parent heading even though it has sub-steps (Step 4.1 at line 249 and Step 4.2 at line 261), unlike Step 1 which has a parent ### Step 1: Gather a diverse dataset (line 64) before sub-steps Step 1.1/1.2/1.3. Readers scanning headings jump from ### Step 3 straight to ### Step 4.1 with no Step 4 introduction. Either add a ### Step 4: Label and quantify parent before line 249 (mirroring Step 1) or drop the Step 1: parent for consistency.

Extended reasoning...

What the bug is

The new guide's section heading hierarchy is structurally inconsistent. Step 1 is introduced with a parent heading and then split into three sub-steps, but Step 4 jumps directly to its sub-steps with no umbrella heading.

Verified in source:

  • ### Step 1: Gather a diverse dataset (line 64) — parent
  • ### Step 1.1: Choose what to annotate (line 68)
  • ### Step 1.2: Select a representative sample (line 97)
  • ### Step 1.3: Create an annotation queue (line 128)
  • ### Step 2: Open code your first 30-50 traces (line 161) — parent, no sub-steps
  • ### Step 3: Cluster into failure categories (line 195) — parent, no sub-steps
  • ### Step 4.1: Label all traces against the categories (line 249) — no Step 4 parent
  • ### Step 4.2: Compute failure rates (line 261)
  • ### Step 5: Decide what to do about each category (line 286) — parent, no sub-steps

Step 4 is the only multi-substep section in the guide that lacks a parent heading. The intro process table at lines 54-60 explicitly advertises 4. Label and quantify as the umbrella for what becomes 4.1 and 4.2, so the umbrella is named in the table-of-contents but not represented as a heading in the body.

Step-by-step proof

  1. A reader skims the right-hand TOC / scrolls section headings to navigate the guide.
  2. They see Step 1 (parent) → Step 1.1 / 1.2 / 1.3 → Step 2 (parent) → Step 3 (parent).
  3. They expect Step 4 next as a sibling parent. Instead the next heading is Step 4.1 with no introductory framing.
  4. They scroll to Step 4.2, then Step 5 (parent), confirming Step 4 was uniquely missing its umbrella.

Why existing code does not prevent it

There is no markdown lint or schema check that requires sub-numbered headings to have a parent at the same depth — every heading is rendered at h3, so MDX/Next.js produces a valid page regardless. The intro process table compensates somewhat, but the inconsistency remains visible in the rendered TOC.
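A check of this kind is small enough to sketch. The snippet below is a hypothetical helper, not an existing lint rule in the repository: it scans heading lines of the form `Step N` and `Step N.M` and reports any step that has sub-steps but no parent heading. The `sample` headings are copied from the list above, trimmed to a few lines.

```python
import re

# Matches markdown headings such as "### Step 4: ..." or "### Step 4.1: ...".
HEADING = re.compile(r"^#{1,6}\s+Step\s+(\d+)(?:\.(\d+))?\b")


def missing_step_parents(markdown_text):
    """Return step numbers that have sub-steps (e.g. 4.1) but no parent heading (Step 4)."""
    parents, with_substeps = set(), set()
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        major, minor = match.group(1), match.group(2)
        if minor is None:
            parents.add(int(major))
        else:
            with_substeps.add(int(major))
    return sorted(with_substeps - parents)


sample = """\
### Step 1: Gather a diverse dataset
### Step 1.1: Choose what to annotate
### Step 3: Cluster into failure categories
### Step 4.1: Label all traces against the categories
### Step 4.2: Compute failure rates
### Step 5: Decide what to do about each category
"""

print(missing_step_parents(sample))  # [4]
```

The sketch does not skip fenced code blocks, so a heading-like line inside a code sample would also be counted; a real lint rule would need to handle that.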

Addressing the refutation

One verifier argued this is purely editorial: all headings render at h3 so there is no broken hierarchy, the intro table names Step 4, the page renders fine, and varying numbering depth is a reasonable editorial choice. That is a fair characterisation of the severity — there is no functional defect — and is the reason this is filed as a nit, not a normal bug. But the inconsistency is real, observable, and trivial to fix in a docs PR. The maintainer (annabellscha) has already left an inline comment on line 63 asking to add Step 1's parent — i.e. they care about parent headings being consistent — so it is reasonable to flag the matching gap at Step 4 in the same review pass rather than having it discovered later.

Fix

One-line addition before line 249:

### Step 4: Label and quantify

[optional one-sentence intro mirroring Step 1's framing]

Alternatively, flatten Step 1 by removing the Step 1: parent so all step headings are sub-numbered consistently. The Step-1-style fix is preferable since Steps 2, 3, and 5 already use parent headings without sub-steps.
