Hi, and thanks for adept-agentic-framework-core.
I am the maintainer of WFGY, an MIT-licensed framework that distills 16 common failure modes of RAG systems and multi-agent LLM pipelines into a compact "problem map":
- WFGY ProblemMap (RAG and agents): https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md
The WFGY repo sits at around 1.5k GitHub stars, and the problem map has already been referenced by:
- Harvard MIMS Lab ToolUniverse
- QCRI / HBKU LLM Lab Multimodal RAG Survey
- University of Innsbruck Rankify
Since adept-agentic-framework-core focuses on benchmarking generative models and LLM-centric workloads on HPC systems, I think it could be useful to expose WFGY's failure taxonomy as a qualitative axis alongside the existing performance metrics.
What problem this feature would solve
adept-agentic-framework-core gives users a way to measure:
- latency and throughput,
- hardware utilization,
- and scalability of LLM workloads.
However, when those workloads involve RAG or agentic reasoning, users also care about:
- which kind of reasoning failures appear under resource pressure,
- whether some configurations are more prone to specific failure types,
- and how failure patterns change when scaling models / hardware.
WFGY’s 16-problem map offers a lightweight vocabulary for issues such as:
- retrieval drift vs. retrieval void,
- multi-step reasoning collapse,
- inconsistent tool calls across parallel traces,
- or context trimming that silently drops critical evidence.
Surfacing this vocabulary in adept-agentic-framework-core would help users interpret performance results in terms of the tension between speed / cost and the observed failure profile.
Proposed solution
A minimal integration could include:
- Short documentation page
  - “Interpreting RAG / agent workloads with WFGY 16-problem map”.
  - One table listing the 16 failure modes and examples of adept-agentic-framework-core workloads where they might show up.
- Tagging guidance
  - Suggestions on how users can:
    - tag their adept-agentic-framework-core runs with WFGY problem IDs (e.g. in metadata or log filenames),
    - and record observed failure counts per problem in their own analysis scripts.
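As a concrete sketch of what that tagging guidance could look like in practice: the snippet below tags run metadata with WFGY problem IDs and counts observed failures per problem. Note that the problem IDs (e.g. "WFGY-03"), run names, and the "wfgy_problems" metadata key are all illustrative assumptions, not a schema defined by adept-agentic-framework-core or published by WFGY.

```python
# Hypothetical sketch: IDs like "WFGY-03" and the "wfgy_problems" key are
# illustrative, not part of either project's actual schema.
from collections import Counter

def tag_run(metadata: dict, problem_ids: list[str]) -> dict:
    """Attach WFGY problem IDs to a benchmark run's metadata."""
    tagged = dict(metadata)
    tagged["wfgy_problems"] = sorted(problem_ids)
    return tagged

def failure_counts(runs: list[dict]) -> Counter:
    """Count observed failures per WFGY problem ID across runs."""
    counts = Counter()
    for run in runs:
        counts.update(run.get("wfgy_problems", []))
    return counts

runs = [
    tag_run({"run_id": "rag-a100-bs8"}, ["WFGY-03", "WFGY-07"]),
    tag_run({"run_id": "rag-a100-bs32"}, ["WFGY-03"]),
]
print(failure_counts(runs))  # Counter({'WFGY-03': 2, 'WFGY-07': 1})
```

Keeping the tags inside each run's own metadata (rather than in a central registry) means adept-agentic-framework-core itself stays neutral: users who do not care about the taxonomy can simply ignore the extra key.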
- Optional example notebook
  - A small notebook showing:
    - a RAG-style benchmark run,
    - manual or semi-automatic tagging of a subset of outputs by WFGY problem type,
    - and a simple plot relating failure-type frequencies to hardware / config choices.
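The core of such a notebook would be a small aggregation relating failure-type frequencies to hardware / config choices. A minimal sketch of that aggregation follows; the run data, config names, and problem IDs are invented for illustration, and a real notebook would feed the resulting table into a plotting library such as matplotlib.

```python
# Hypothetical sketch: all run data, config names, and problem IDs below are
# illustrative placeholders, not output from adept-agentic-framework-core.
from collections import defaultdict

runs = [
    {"config": "1xA100", "wfgy_problems": ["WFGY-03", "WFGY-07"]},
    {"config": "1xA100", "wfgy_problems": ["WFGY-03"]},
    {"config": "4xA100", "wfgy_problems": ["WFGY-11"]},
]

# Frequency of each failure type, grouped by hardware/config choice.
freq = defaultdict(lambda: defaultdict(int))
for run in runs:
    for pid in run["wfgy_problems"]:
        freq[run["config"]][pid] += 1

for config, counts in sorted(freq.items()):
    print(config, dict(counts))
```

From here, a grouped bar chart (configs on the x-axis, one bar per problem ID) would make it easy to eyeball whether certain failure types cluster around specific hardware or scaling choices.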
This keeps adept-agentic-framework-core neutral and flexible, while giving users a ready-made, MIT-licensed taxonomy if they want one.
Alternatives considered
- Leaving failure-mode vocabularies entirely to each project.
- Introducing a bespoke adept-agentic-framework-core-only taxonomy.
Both options would make it harder to compare results or share lessons across teams. Reusing an existing taxonomy that is already referenced in other LLM tooling seems lower-friction.
Willingness to contribute
If this seems aligned with adept-agentic-framework-core’s goals, I would be glad to:
- draft the docs page and example notebook,
- and open a PR so you can review and adjust scope / terminology.
All WFGY content is MIT-licensed, so you can freely adapt or trim it as needed.