Description
Describe the current behavior
Many Prefect users are now orchestrating RAG / LLM pipelines with flows that look roughly like:
ingest → chunk → embed → store in vector DB → retrieve → call LLM → post-process
When these systems misbehave (hallucinations, missing context, unstable answers) the failure is rarely in one task. It is a pattern across several tasks: chunking, retriever behavior, routing, guardrails, etc.
Right now the Prefect documentation has great examples for building flows and integrating with LLMs, but there is no single place that:
- names the most common “RAG failure modes” across the whole flow, and
- shows how to instrument a Prefect flow to quickly localise which stage is failing.
So users often debug by trial-and-error on the LLM prompt instead of on the flow structure.
Describe the proposed behavior
I would like to propose a documentation page or tutorial:
“Debugging RAG flows with a 16-problem failure map (WFGY ProblemMap)”
The idea is to use an open-source, MIT-licensed checklist called the WFGY 16-problem ProblemMap as the backbone. It classifies typical RAG / LLM pipeline failures into 16 modes (retriever behaviour, chunking, vector stores, routing, hallucinations, evaluation gaps, etc.) and is currently referenced by several open-source / research projects.
In Prefect docs, this could become:
- a “How-to” or “Guides” page under the LLM / RAG area,
- with a simple demo flow (ingestion → vector store → retrieval → LLM), and
- a table that maps each of the 16 failure modes to:
  - which Prefect tasks or flow edges to instrument,
  - what signals to log (e.g. chunk count, top-k distribution, coverage of relevant documents),
  - and what quick experiments to run (e.g. swap retriever, change chunk size, run evaluation flow).
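To make the "signals to log" idea concrete, here is a minimal sketch of two such signals: chunk statistics and a summary of top-k scores. The function names (`chunk_stats`, `topk_score_stats`, `log_signals`) and the choice of statistics are illustrative assumptions, not part of Prefect or of the ProblemMap. The helpers use only the standard library and take a logger as an argument, so inside a Prefect task the logger returned by `get_run_logger()` could be passed in.

```python
import statistics
from typing import Sequence


def chunk_stats(chunks: Sequence[str]) -> dict:
    """Summarise chunk sizes (in characters) for one ingestion run."""
    lengths = [len(c) for c in chunks]
    if not lengths:
        return {"chunk_count": 0, "min_len": 0, "mean_len": 0.0, "max_len": 0}
    return {
        "chunk_count": len(lengths),
        "min_len": min(lengths),
        "mean_len": statistics.fmean(lengths),
        "max_len": max(lengths),
    }


def topk_score_stats(scores: Sequence[float]) -> dict:
    """Summarise the similarity scores of the top-k results for one query."""
    if not scores:
        return {"k": 0, "top_score": None, "score_spread": None}
    return {
        "k": len(scores),
        "top_score": max(scores),
        # A small spread can hint that the retriever barely separates
        # relevant from irrelevant chunks (an assumption worth checking).
        "score_spread": max(scores) - min(scores),
    }


def log_signals(logger, stage: str, signals: dict) -> None:
    """Emit one structured log line per stage.

    `logger` is any object with an `.info()` method, for example a
    standard-library logger or the one returned by Prefect's get_run_logger().
    """
    logger.info("stage=%s signals=%s", stage, signals)
```

Keeping the statistics in plain functions means they can be unit-tested without a running flow and reused across the ingestion and retrieval tasks.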
This would be a docs-only addition. No changes to Prefect core are required; it is essentially a structured troubleshooting tutorial built on top of existing patterns.
I am happy to draft the initial PR for the tutorial, diagrams and example flow.
Example Use
Example scenario:
- A user builds a RAG assistant with Prefect that helps support agents answer tickets.
- The flow runs end-to-end without errors, but agents report that:
  - answers are sometimes off-topic,
  - sometimes too generic, and
  - sometimes completely miss obviously relevant documents.
Using the proposed tutorial, they could:
- Open the “16-problem RAG failure map” table and see that their symptoms match:
  - Problem No. 3 – chunking / context fragmentation,
  - Problem No. 5 – retriever mis-prioritisation, and
  - Problem No. 9 – evaluation blind spots.
- Copy small code snippets that:
  - log chunk statistics in the ingestion task,
  - log retrieved document IDs / scores in the retrieval task,
  - run a simple evaluation flow on a small labelled set.
- Iterate on the right parts of the flow (chunking parameters, retriever config, evaluation) instead of only changing the LLM prompt.
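As an illustration of the evaluation step in the list above, the following is a rough sketch of a retrieval check over a small labelled set. Everything here is an assumption for illustration: the names `recall_at_k` and `evaluate_retriever`, the shape of the labelled examples (`query`, `relevant_ids`), and the retriever returning `(doc_id, score)` pairs. In a real tutorial, `evaluate_retriever` would likely be wrapped in a Prefect `@flow` and the per-query records logged with `get_run_logger()`.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Fraction of the labelled relevant documents found in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def evaluate_retriever(
    retrieve: Callable[[str], List[Tuple[str, float]]],
    labelled_set: Sequence[dict],
    k: int = 5,
) -> dict:
    """Run the retriever over a labelled set and collect per-query records.

    Each labelled example is assumed to look like:
        {"query": "...", "relevant_ids": ["doc-1", "doc-7"]}
    """
    per_query = []
    for example in labelled_set:
        results = retrieve(example["query"])
        ids = [doc_id for doc_id, _score in results]
        per_query.append(
            {
                "query": example["query"],
                # Keeping the retrieved IDs makes missed documents easy to inspect.
                "retrieved_ids": ids[:k],
                "recall": recall_at_k(ids, example["relevant_ids"], k),
            }
        )
    mean_recall = (
        sum(r["recall"] for r in per_query) / len(per_query) if per_query else 0.0
    )
    return {"k": k, "mean_recall_at_k": mean_recall, "per_query": per_query}
```

A low recall with healthy-looking answers on other queries would point at the retriever or chunking stages rather than the prompt, which is the kind of localisation the proposed tutorial is after.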
The goal is to give Prefect users a reproducible way to debug RAG flows using a shared vocabulary of failure modes, while keeping everything in the Prefect ecosystem.