
Feature request: add an option to summarize and analyze agent traces with an LLM after each run #108

@KaminariOS

Description


#101

I have developed some scripts to do this, but it may be valuable to integrate the analysis into the eval pipeline.

Currently, we lack observability into agent behavior:

  • Traces are long, and nobody bothers to read them.
  • Traces alone are not enough; we also need the evaluation and fault-injection logic to understand a run.
  • Often the agent did the right thing, but the failure comes from a bug in the benchmark itself.

For summarization and analysis, we can feed the full run context to the most capable LLM available, one with stronger reasoning and a longer context window, at low cost (because it is a single one-shot call).
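A minimal sketch of what such a post-run hook could look like, assuming the OpenAI Python SDK; the `summarize_run` helper, the prompt, and the file-path parameters are all hypothetical, not part of the existing scripts:

```python
import pathlib
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ANALYSIS_PROMPT = """\
You are reviewing one run of an agent benchmark.
Given the agent trace, the evaluation logic, and the fault-injection
configuration, produce:
1. A short summary of what the agent did.
2. Whether the agent's behavior was actually correct.
3. Whether the failure (if any) looks like a bug in the benchmark itself.
"""

def summarize_run(trace_path: str, eval_src_path: str, faults_path: str) -> str:
    """Feed one run's full context to a single LLM call and return the analysis."""
    # Concatenate the trace plus the benchmark's own logic, so the model can
    # distinguish genuine agent failures from bugs in the benchmark.
    context = "\n\n".join(
        f"### {label}\n{pathlib.Path(path).read_text()}"
        for label, path in [
            ("Agent trace", trace_path),
            ("Evaluation logic", eval_src_path),
            ("Fault-injection config", faults_path),
        ]
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any large-context, strong-reasoning model would do
        messages=[
            {"role": "system", "content": ANALYSIS_PROMPT},
            {"role": "user", "content": context},
        ],
    )
    return response.choices[0].message.content
```

The eval pipeline could call this once after each run and attach the returned analysis to the run's artifacts, keeping the per-run cost to a single model call.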
