Proposal: RAG flow failure analysis tutorial using WFGY 16-problem ProblemMap #20795

@onestardao

Describe the current behavior

Many Prefect users are now orchestrating RAG / LLM pipelines with flows that look roughly like:

ingest → chunk → embed → store in vector DB → retrieve → call LLM → post-process

When these systems misbehave (hallucinations, missing context, unstable answers), the failure is rarely confined to one task. It is usually a pattern that spans several stages: chunking, retriever behaviour, routing, guardrails, and so on.

Right now the Prefect documentation has great examples for building flows and integrating with LLMs, but there is no single place that:

  • names the most common “RAG failure modes” across the whole flow, and
  • shows how to instrument a Prefect flow to quickly localise which stage is failing.

So users often debug by trial and error on the LLM prompt rather than by inspecting the flow structure.
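
As a rough illustration of what "localising the failing stage" could mean, the sketch below runs a toy version of the pipeline in plain Python and records one signal per stage. All names here (`chunk`, `retrieve`, `run_pipeline`) are hypothetical, and the retriever is a word-overlap stand-in for a real vector store. In an actual Prefect flow each stage would presumably be a `@task`, with the signals written through `get_run_logger()`.

```python
from typing import Dict, List


def chunk(text: str, size: int) -> List[str]:
    """Split text into fixed-size word windows (toy chunker, an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def overlap(query: str, passage: str) -> int:
    """Count distinct query words that also appear in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query: str, chunks: List[str], k: int) -> List[str]:
    """Rank chunks by word overlap with the query (stand-in for vector search)."""
    ranked = sorted(chunks, key=lambda c: overlap(query, c), reverse=True)
    return ranked[:k]


def run_pipeline(doc: str, query: str, size: int = 5, k: int = 2) -> Dict[str, int]:
    """Run chunk -> retrieve and collect one signal per stage."""
    signals: Dict[str, int] = {}
    chunks = chunk(doc, size)
    signals["chunk_count"] = len(chunks)          # ingestion-side signal
    hits = retrieve(query, chunks, k)
    signals["retrieved"] = len(hits)              # retrieval-side signal
    signals["top_hit_overlap"] = overlap(query, hits[0]) if hits else 0
    return signals
```

Running the same document and query with two chunk sizes and comparing `top_hit_overlap` gives a quick hint about whether the query terms are being split across chunks, without touching the prompt at all.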

Describe the proposed behavior

I would like to propose a documentation page or tutorial:

“Debugging RAG flows with a 16-problem failure map (WFGY ProblemMap)”

The idea is to use an open-source MIT-licensed checklist called the WFGY 16-problem ProblemMap as the backbone. It classifies typical RAG / LLM pipeline failures into 16 modes (retriever behaviour, chunking, vector stores, routing, hallucinations, evaluation gaps, etc.) and is currently referenced by several open-source / research projects.

In Prefect docs, this could become:

  • a “How-to” or “Guides” page under the LLM / RAG area,
  • with a simple demo flow (ingestion → vector store → retrieval → LLM), and
  • a table that maps each of the 16 failure modes to:
    • which Prefect tasks or flow edges to instrument,
    • what signals to log (e.g. chunk count, top-k distribution, coverage of relevant documents),
    • and what quick experiments to run (e.g. swap retriever, change chunk size, run evaluation flow).
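
One possible way to keep such a table in sync with the demo flow is to store it as data and render it for the docs page. The sketch below is only an illustration: the class `FailureModeRow` and the helpers are hypothetical, only the three problem numbers named in this issue are included, and the pairing of signals and experiments to each problem is an assumption rather than the ProblemMap's own mapping.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FailureModeRow:
    problem_no: int
    name: str
    instrument: str            # which task or flow edge to instrument
    signals: Tuple[str, ...]   # what to log there
    experiment: str            # quick experiment to run


# Illustrative rows only; the signal/experiment pairings are assumptions.
TABLE: List[FailureModeRow] = [
    FailureModeRow(3, "chunking / context fragmentation", "ingestion task",
                   ("chunk count",), "change chunk size"),
    FailureModeRow(5, "retriever mis-prioritisation", "retrieval task",
                   ("top-k distribution",), "swap retriever"),
    FailureModeRow(9, "evaluation blind spots", "evaluation flow",
                   ("coverage of relevant documents",), "run evaluation flow"),
]


def rows_for_stage(stage: str) -> List[FailureModeRow]:
    """Return the failure modes whose instrumentation point is `stage`."""
    return [row for row in TABLE if row.instrument == stage]


def to_markdown(rows: List[FailureModeRow]) -> str:
    """Render rows as a Markdown table for a docs page."""
    lines = [
        "| No. | Failure mode | Instrument | Signals to log | Quick experiment |",
        "| --- | --- | --- | --- | --- |",
    ]
    for r in rows:
        lines.append(
            f"| {r.problem_no} | {r.name} | {r.instrument} "
            f"| {', '.join(r.signals)} | {r.experiment} |"
        )
    return "\n".join(lines)
```

Keeping the rows as data means the same structure could drive both the rendered table and a lookup from a failing stage to the candidate failure modes.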

This would be a docs-only addition. No changes to Prefect core are required; it is essentially a structured troubleshooting tutorial built on top of existing patterns.
I am happy to draft the initial PR for the tutorial, diagrams and example flow.

Example Use

Example scenario:

  • A user builds a RAG assistant with Prefect that helps support agents answer tickets.
  • The flow runs end-to-end without errors, but agents report that:
    • answers are sometimes off-topic,
    • sometimes too generic, and
    • sometimes completely miss obviously relevant documents.

Using the proposed tutorial, they could:

  1. Open the “16-problem RAG failure map” table and see that their symptoms match:
    • Problem No. 3 – chunking / context fragmentation,
    • Problem No. 5 – retriever mis-prioritisation, and
    • Problem No. 9 – evaluation blind spots.
  2. Copy small code snippets that:
    • log chunk statistics in the ingestion task,
    • log retrieved document IDs / scores in the retrieval task,
    • run a simple evaluation flow on a small labelled set.
  3. Iterate on the right parts of the flow (chunking parameters, retriever config, evaluation) instead of only changing the LLM prompt.
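
The first two snippets from step 2 might look something like the following. This is a minimal sketch in plain Python: `chunk_stats` and `retrieval_summary` are hypothetical helper names, and the `(doc_id, score)` tuple shape for retrieval hits is an assumption about what the vector store returns. Inside a Prefect task the returned dictionaries could be passed to the logger obtained from `get_run_logger()`.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def chunk_stats(chunks: List[str]) -> Dict[str, float]:
    """Summarise chunk sizes in words, for logging in the ingestion task."""
    lengths = [len(c.split()) for c in chunks]
    if not lengths:
        return {"count": 0, "min_words": 0, "median_words": 0, "max_words": 0}
    return {
        "count": len(lengths),
        "min_words": min(lengths),
        "median_words": median(lengths),
        "max_words": max(lengths),
    }


def retrieval_summary(hits: List[Tuple[str, float]]) -> Dict[str, object]:
    """Summarise retrieved (doc_id, score) pairs, assumed sorted best-first."""
    doc_ids = [doc_id for doc_id, _ in hits]
    scores = [score for _, score in hits]
    top: Optional[float] = scores[0] if scores else None
    gap: Optional[float] = (scores[0] - scores[-1]) if scores else None
    return {
        "doc_ids": doc_ids,
        "distinct_docs": len(set(doc_ids)),  # low values: top-k dominated by one doc
        "top_score": top,
        "score_gap": gap,                    # small gap: retriever barely discriminates
    }
```

A very wide spread between `min_words` and `max_words`, or a top-k that collapses onto a single document, would point at the chunking or retriever stages rather than at the prompt.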

The goal is to give Prefect users a reproducible way to debug RAG flows using a shared vocabulary of failure modes, while keeping everything in the Prefect ecosystem.
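
For the "simple evaluation flow on a small labelled set" mentioned in step 2, one reproducible starting point could be recall@k over retrieved document IDs. The sketch below assumes a labelled set that maps each query to its set of relevant document IDs; the function names and data shapes are hypothetical, and recall@k is just one reasonable metric, not something the ProblemMap prescribes.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(
    labelled: Dict[str, Set[str]],
    results: Dict[str, List[str]],
    k: int,
) -> Dict[str, float]:
    """Compute recall@k per labelled query; missing results count as empty."""
    return {
        query: recall_at_k(results.get(query, []), relevant, k)
        for query, relevant in labelled.items()
    }


def mean_recall(per_query: Dict[str, float]) -> float:
    """Average recall across queries (0.0 for an empty evaluation set)."""
    if not per_query:
        return 0.0
    return sum(per_query.values()) / len(per_query)
```

Queries whose recall stays at zero across prompt changes are a sign that relevant documents never reach the LLM, which is a retrieval or chunking problem rather than a prompting one.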

Additional context

No response
