[Feature] Add WFGY 16-problem taxonomy as an agentic debugging guide #8

@onestardao

Description

Hi and thanks for adept-agentic-framework-core.

I am the maintainer of WFGY, an MIT-licensed framework that summarizes 16 common failure modes of RAG systems and multi-agent LLM pipelines into a compact “problem map”.

The WFGY repo sits at around 1.5k GitHub stars, and the problem map has already been referenced by:

  • Harvard MIMS Lab ToolUniverse
  • QCRI / HBKU LLM Lab Multimodal RAG Survey
  • University of Innsbruck Rankify

Since adept-agentic-framework-core focuses on benchmarking generative models and LLM-centric workloads on HPC systems, I think it could be useful to expose WFGY’s failure taxonomy as a qualitative axis in addition to existing performance metrics.


What problem this feature would solve

adept-agentic-framework-core gives users a way to measure:

  • latency and throughput,
  • hardware utilization,
  • and scalability of LLM workloads.

However, when those workloads involve RAG or agentic reasoning, users also care about:

  • which kind of reasoning failures appear under resource pressure,
  • whether some configurations are more prone to specific failure types,
  • and how failure patterns change when scaling models / hardware.

WFGY’s 16-problem map offers a lightweight vocabulary for issues such as:

  • retrieval drift vs. retrieval void,
  • multi-step reasoning collapse,
  • inconsistent tool calls across parallel traces,
  • or context trimming that silently drops critical evidence.

Surfacing this vocabulary in adept-agentic-framework-core would help users interpret performance results in terms of the trade-off between speed/cost and failure profile.


Proposed solution

A minimal integration could include:

  1. Short documentation page

    • “Interpreting RAG / agent workloads with WFGY 16-problem map”.
    • One table listing the 16 failure modes and examples of adept-agentic-framework-core workloads where they might show up.
  2. Tagging guidance

    • Suggestions on how users can:
      • tag their adept-agentic-framework-core runs with WFGY problem IDs (e.g. in metadata or log filenames),
      • and record observed failure counts per problem in their own analysis scripts.
  3. Optional example notebook

    • A small notebook showing:
      • a RAG-style benchmark run,
      • manual or semi-automatic tagging of a subset of outputs by WFGY problem type,
      • and a simple plot relating failure-type frequencies to hardware / config choices.
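To make the tagging idea in item 2 concrete, here is a minimal sketch of what the analysis side could look like. The problem IDs (`P01`, `P02`, `P07`), run IDs, and config names are hypothetical placeholders, not the actual WFGY identifiers or adept-agentic-framework-core metadata schema:

```python
from collections import Counter

# Hypothetical example: each benchmark run carries WFGY problem IDs
# (manually or semi-automatically assigned) in its metadata.
runs = [
    {"run_id": "rag-a100-bs8-001", "config": "a100-bs8", "wfgy_tags": ["P01"]},
    {"run_id": "rag-a100-bs8-002", "config": "a100-bs8", "wfgy_tags": []},
    {"run_id": "rag-h100-bs32-001", "config": "h100-bs32", "wfgy_tags": ["P02", "P07"]},
]

def failure_counts_by_config(runs):
    """Count observed WFGY failure types per hardware/config choice."""
    counts = {}
    for run in runs:
        cfg_counter = counts.setdefault(run["config"], Counter())
        cfg_counter.update(run["wfgy_tags"])
    return counts

counts = failure_counts_by_config(runs)
# counts maps each config to a Counter of failure-type frequencies,
# ready to feed into a simple bar plot per config.
```

This keeps the taxonomy entirely in user-side metadata, so adept-agentic-framework-core itself stays neutral.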

This keeps adept-agentic-framework-core neutral and flexible, while giving users an easy, MIT-licensed taxonomy if they want one.


Alternatives considered

  • Leaving failure-mode vocabularies entirely to each project.
  • Introducing a bespoke adept-agentic-framework-core-only taxonomy.

Both options make it harder to compare results or share lessons across teams. Reusing an existing taxonomy that is already referenced by other LLM tooling seems lower-friction.


Willingness to contribute

If this seems aligned with adept-agentic-framework-core’s goals, I would be glad to:

  • draft the docs page and example notebook,
  • and open a PR so you can review and adjust scope / terminology.

All WFGY content is MIT-licensed, so you can freely adapt or trim it as needed.
