Hi Helicone team,
Helicone is already a powerful gateway and observability layer for LLM applications. Many of those apps are RAG or RAG+agent pipelines, and users often lack a clean taxonomy to describe why a given request failed.
I maintain the WFGY RAG 16 Problem Map, an open-source project focused on RAG / LLM failure modes and diagnostics.
Repo (MIT):
https://github.com/onestardao/WFGY
Main RAG failure map page:
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md
WFGY includes:
- A 16-label RAG failure taxonomy (retrieval, prompt, structure, infra)
- A triage prompt that takes a request/trace and assigns one of those labels
- Structured fix suggestions for each label
The same map is already used or cited by:
- RAGFlow and LlamaIndex RAG debugging docs
- ToolUniverse – Harvard MIMS Lab
- Rankify – University of Innsbruck
- Multimodal RAG Survey – QCRI LLM Lab
- And curated lists like Awesome LLM Apps and Awesome Data Science – academic
Proposal
Add WFGY’s 16-problem map as an optional tagging scheme inside Helicone, for example:
- A small recipe / docs section showing how to:
  - Export or sample failing Helicone traces for a RAG app.
  - Run the WFGY triage prompt on each trace and compute a `rag_failure_type` tag.
  - Push those tags back into Helicone metadata.
- Optionally, a template dashboard that:
  - Breaks down requests by `rag_failure_type`.
  - Helps users see whether they mostly suffer from retrieval issues, prompt design issues, infra issues, etc.
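To make the recipe concrete, here is a rough Python sketch of the tagging and breakdown steps. It is only an illustration under assumptions: the trace dictionaries, the `tag_traces` and `breakdown` helpers, and the `stub_triage` function are hypothetical names, the stub stands in for a real call to the WFGY triage prompt, and only the four coarse groups named above are used rather than the full 16 labels. Exporting traces from Helicone and writing the tags back are deliberately left out, since those depend on the Helicone API.

```python
from collections import Counter
from typing import Callable, Iterable

# Coarse groups mentioned in this issue; the full map has 16 labels,
# which are not reproduced here.
GROUPS = ("retrieval", "prompt", "structure", "infra")


def tag_traces(traces: Iterable[dict], triage: Callable[[dict], str]) -> list[dict]:
    """Build one metadata update per trace, keyed by rag_failure_type."""
    updates = []
    for trace in traces:
        label = triage(trace)
        if label not in GROUPS:
            # Never store a label outside the agreed taxonomy.
            label = "unknown"
        updates.append(
            {"request_id": trace["id"], "key": "rag_failure_type", "value": label}
        )
    return updates


def breakdown(updates: Iterable[dict]) -> Counter:
    """Count requests per rag_failure_type, as a dashboard would."""
    return Counter(update["value"] for update in updates)


def stub_triage(trace: dict) -> str:
    """Hypothetical stand-in for the WFGY triage prompt (no LLM call)."""
    return "retrieval" if not trace.get("retrieved_docs") else "prompt"


if __name__ == "__main__":
    sample = [
        {"id": "a", "retrieved_docs": []},
        {"id": "b", "retrieved_docs": ["doc-1"]},
    ]
    tagged = tag_traces(sample, stub_triage)
    print(tagged)
    print(breakdown(tagged))
```

In a real guide, `stub_triage` would be replaced by a function that sends the trace to an LLM with the triage prompt, and each update would be pushed back to Helicone as request metadata.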
This would give Helicone users a practical, standardized language for RAG failures on top of the existing observability features.
If this sounds aligned with the roadmap, I’d be happy to draft a short guide or example for a PR.