feat(benchmark): support agent api execute benchmark dataset #2945

chenliang15405 · 2025-12-08T13:56:42Z

Description

Support creating an agent benchmark task that executes the Falcon text2sql evaluation dataset by invoking the agent remotely through its HTTP API.
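For reviewers who want a mental model of the flow, here is a minimal sketch of sending each dataset question to a remote agent over HTTP and collecting the returned SQL. It is not the code in this PR: the `BenchmarkCase` type, the `run_cases` helper, the `question` request field, and the `sql` response field are all assumptions made for illustration, and the real agent API contract may differ.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib import request

# Hypothetical transport signature: (url, payload) -> decoded JSON reply.
PostFn = Callable[[str, dict], dict]


def http_post_json(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON payload using only the standard library and decode the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


@dataclass
class BenchmarkCase:
    """One text2sql evaluation item (hypothetical shape)."""

    case_id: str
    question: str


def run_cases(
    agent_url: str,
    cases: List[BenchmarkCase],
    post: PostFn = http_post_json,
) -> Dict[str, str]:
    """Send every question to the agent and map case_id -> predicted SQL."""
    results: Dict[str, str] = {}
    for case in cases:
        # Assumed request/response fields: "question" in, "sql" out.
        reply = post(agent_url, {"question": case.question})
        results[case.case_id] = reply.get("sql", "")
    return results
```

Passing the transport in as `post` keeps the loop testable without a running agent; a stub function returning a fixed reply is enough to exercise it.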

How Has This Been Tested?

Step 1: Create an evaluation task and select the Agent to evaluate.
Step 2: Wait for the execution to complete.
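
The two steps above could also be scripted instead of done by hand. The following is a rough sketch of the waiting step only, assuming the task status can be fetched by some callable; the `wait_until_done` helper and the status strings `running`, `complete`, and `failed` are hypothetical and not taken from this PR.

```python
import time
from typing import Callable


def wait_until_done(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal state or the poll budget runs out."""
    terminal_states = {"complete", "failed"}  # assumed status names
    for _ in range(max_polls):
        status = get_status()
        if status in terminal_states:
            return status
        # Not finished yet: pause before asking again.
        sleep(poll_interval)
    raise TimeoutError(f"task did not finish within {max_polls} polls")
```

The `sleep` parameter exists so that a test can replace the real delay with a no-op.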

Snapshots:

Include snapshots for easier review.

Checklist:

My code follows the style guidelines of this project
I have already rebased the commits and made the commit messages conform to the project standard.
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
Any dependent changes have been merged and published in downstream modules


Aries-ckt

LGTM

fangyinc

LGTM

alan.cl added 4 commits December 5, 2025 19:53

feat(benchmark): support agent execute benchmark dataset

50441cc

Merge branch 'main' into feat_benchmark_agent

087d6ff

# Conflicts: # packages/dbgpt-serve/src/dbgpt_serve/evaluate/service/benchmark/benchmark_service.py

chore: make code format

7b228ce

fix: create benchmark task first

c9ccdca

github-actions bot added the enhancement (New feature or request) label on Dec 8, 2025

chore: code fmt

3c7cfba

Aries-ckt approved these changes Dec 9, 2025


fangyinc approved these changes Dec 10, 2025


fangyinc merged commit 19c2cee into eosphoros-ai:main Dec 10, 2025
3 checks passed
