AFlow: Automating Agentic Workflow Generation

If you run into difficulties using or reproducing the code, please contact me directly (Email: [email protected], WeChat: 18831933368). Some Operators may still have bugs introduced during the migration from MetaGPT to this repository.

AFlow is a framework for automatically generating and optimizing Agentic Workflows. It uses Monte Carlo tree search in a code-represented workflow space to find effective workflows, replacing manual development with machine effort. Our approach shows potential to outperform handcrafted workflows on various tasks.

We're building it to support more benchmarks and open-ended tasks! If you have any questions, please open an issue or email us!

Framework Components

Node: Basic unit of LLM invocation. See metagpt_core/action_nodes/action_node.py for a flexible interface to control LLM, temperature, format, and prompt.
Operator: Predefined combinations of Nodes to enhance search efficiency. Encapsulates common operations like Generate, Format, Review, Revise, Ensemble, Test, and Programmer. See operator.py for details. You can customize your own Operator by referencing the implementations in this code.
Workflow: A sequence of LLM-invoking nodes connected by edges. Can be represented as graphs, neural networks, or code to express various execution structures. See workflow.py for our implementation.
Optimizer: Uses LLMs within a Monte Carlo Tree Search variant to explore and refine workflows. Iteratively selects, expands, evaluates, and updates workflows based on performance. See optimizer.py for details.
Evaluator: Assesses workflow performance on given tasks. Provides feedback to guide the optimization process towards more effective workflows. See evaluator.py for details.
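
The select, expand, evaluate, update cycle described above can be illustrated with a toy loop. This is only a minimal sketch under simplifying assumptions, not the code in optimizer.py: a workflow is reduced to a list of operator names, `propose_edit` stands in for the LLM that rewrites a workflow, `toy_score` stands in for the Evaluator, and selection is a plain score-weighted choice among the best rounds. All function and class names here are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

OPERATORS = ["Generate", "Format", "Review", "Revise", "Ensemble", "Test"]


@dataclass
class Round:
    """One explored workflow, its score, and the index of the round it came from."""
    workflow: List[str]
    score: float
    parent: Optional[int] = None


def select(rounds: List[Round], rng: random.Random, top_k: int = 3) -> int:
    """Return the index of a parent round, sampled from the top-k by score."""
    order = sorted(range(len(rounds)), key=lambda i: rounds[i].score, reverse=True)[:top_k]
    weights = [rounds[i].score + 1e-6 for i in order]  # keep weights strictly positive
    return rng.choices(order, weights=weights, k=1)[0]


def propose_edit(workflow: List[str], rng: random.Random) -> List[str]:
    """Stand-in for the LLM expansion step: append or replace one operator."""
    new = list(workflow)
    if len(new) < 4 and rng.random() < 0.5:
        new.append(rng.choice(OPERATORS))
    else:
        new[rng.randrange(len(new))] = rng.choice(OPERATORS)
    return new


def toy_score(workflow: List[str]) -> float:
    """Stand-in for the Evaluator: rewards distinct operators, result in (0, 1]."""
    return len(set(workflow)) / len(OPERATORS)


def optimize(evaluate: Callable[[List[str]], float], max_rounds: int = 20, seed: int = 0) -> Round:
    """Run the toy search and return the best round found."""
    rng = random.Random(seed)
    initial = ["Generate"]
    rounds = [Round(initial, evaluate(initial))]
    for _ in range(max_rounds):
        parent_idx = select(rounds, rng)                       # selection
        child = propose_edit(rounds[parent_idx].workflow, rng)  # expansion
        score = evaluate(child)                                # evaluation
        rounds.append(Round(child, score, parent_idx))         # update the search history
    return max(rounds, key=lambda r: r.score)


if __name__ == "__main__":
    best = optimize(toy_score)
    print(best.workflow, best.score)
```

In the real framework the expansion step edits workflow code and prompts, and the score comes from running the workflow on validation data; the sketch only shows how the four phases fit together.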

Datasets

Experimental Datasets

We conducted experiments on six datasets (HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP) and provide evaluation code for each. The data can be found in this datasets link, or you can download it using data/download_data.py.

Custom Datasets

For custom tasks, use the code in the benchmarks folder as a reference. Inherit from the BaseBenchmark class and implement evaluate_problem, calculate_score, and get_result_columns to add your own dataset benchmark. Then register your benchmark name in evaluator.py and optimizer.py so the optimizer can search for effective workflows on your dataset.
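
As a rough illustration of the three methods named above, here is a self-contained sketch of an exact-match benchmark. The `BaseBenchmark` defined below is a hypothetical stand-in written for this example; the real class in the repository may use different constructor arguments, method signatures, and return types, so check it before copying anything. `ExactMatchBenchmark` and `echo_workflow` are likewise made up for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, Dict, List, Tuple

Workflow = Callable[[str], Awaitable[str]]


class BaseBenchmark(ABC):
    """Hypothetical stand-in for the repository's BaseBenchmark (signatures assumed)."""

    @abstractmethod
    async def evaluate_problem(self, problem: Dict[str, str], workflow: Workflow) -> Tuple[Any, ...]:
        """Run the workflow on one problem and return one result row."""

    @abstractmethod
    def calculate_score(self, expected: str, prediction: str) -> float:
        """Compare a prediction with the expected answer."""

    @abstractmethod
    def get_result_columns(self) -> List[str]:
        """Name the fields of the row returned by evaluate_problem."""


class ExactMatchBenchmark(BaseBenchmark):
    """Scores 1.0 when the prediction equals the answer, ignoring case and whitespace."""

    async def evaluate_problem(self, problem, workflow):
        prediction = await workflow(problem["question"])
        score = self.calculate_score(problem["answer"], prediction)
        return problem["question"], prediction, problem["answer"], score

    def calculate_score(self, expected, prediction):
        return 1.0 if prediction.strip().lower() == expected.strip().lower() else 0.0

    def get_result_columns(self):
        return ["question", "prediction", "expected", "score"]


async def echo_workflow(question: str) -> str:
    """Toy workflow for this example only; a real one would call an LLM."""
    return "4" if question == "2 + 2 = ?" else "unknown"


if __name__ == "__main__":
    bench = ExactMatchBenchmark()
    row = asyncio.run(bench.evaluate_problem({"question": "2 + 2 = ?", "answer": "4"}, echo_workflow))
    print(dict(zip(bench.get_result_columns(), row)))
```

Keeping the row returned by evaluate_problem aligned with get_result_columns makes it straightforward to write per-problem results to a table later.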

Quick Start

Set up the Python environment:

# Create and activate a Python 3.9 virtual environment
conda create -n <your_env_name> python=3.9
conda activate <your_env_name>

# Install dependencies
pip install -r requirements.txt

Configure optimization parameters:

Use command line arguments or modify default parameters in run.py:

--dataset              # (Required) Dataset type (HumanEval/MBPP/GSM8K/MATH/HotpotQA/DROP)
--sample 4             # Sample count - number of workflows to be resampled
--optimized_path PATH  # Optimized result save path
--initial_round 1      # Initial round
--max_rounds 20        # Maximum number of optimization rounds for AFlow
--check_convergence    # Whether to enable early stopping
--validation_rounds 5  # Number of validation rounds for AFlow
--if_first_optimize    # Set True for the first optimization, False afterwards
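
For readers who want to see how flags like these might be parsed, here is a minimal argparse sketch. It is an illustration, not a copy of run.py: the defaults for --sample, --initial_round, --max_rounds, and --validation_rounds come from the list above, while the defaults for --optimized_path, --check_convergence, and --if_first_optimize are assumptions, and `str2bool` and `build_parser` are hypothetical helper names.

```python
import argparse

DATASETS = ["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]


def str2bool(value: str) -> bool:
    """Parse true/false style strings.

    argparse's type=bool would treat any non-empty string, including "False", as True.
    """
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="AFlow optimizer options (illustrative sketch)")
    parser.add_argument("--dataset", required=True, choices=DATASETS, help="Dataset type")
    parser.add_argument("--sample", type=int, default=4, help="Number of workflows to resample")
    # Assumed placeholder default; the real default path is not stated in this README.
    parser.add_argument("--optimized_path", default="workspace", help="Where optimized results are saved")
    parser.add_argument("--initial_round", type=int, default=1)
    parser.add_argument("--max_rounds", type=int, default=20)
    # A bare flag means True; an explicit value such as "False" is also accepted.
    parser.add_argument("--check_convergence", type=str2bool, nargs="?", const=True, default=True)
    parser.add_argument("--validation_rounds", type=int, default=5)
    parser.add_argument("--if_first_optimize", type=str2bool, nargs="?", const=True, default=True)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing booleans through an explicit converter avoids the common pitfall where `--if_first_optimize False` is silently read as True.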

Configure LLM parameters in config/config2.yaml (see config/config2.example.yaml for reference)
Set up operators in run.py, operator.py, and optimized_path/template/operator.json. You can use our implementation as a reference when adding operators for specific datasets.
For first-time use, download datasets and initial rounds by setting download(["datasets"]) in run.py
(Optional) Add your custom dataset and corresponding evaluation function following the Custom Datasets section
(Optional) If you want to use a portion of the validation data, you can set va_list in evaluator.py

Run the optimization:

# Using default parameters
python run.py --dataset MATH

# Or with custom parameters
python run.py --dataset MATH --sample n --optimized_path xxx ...
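
If you want to queue one run per dataset, a small dry-run helper can assemble the command lines first. This is a convenience sketch that uses only the flags listed in this README; `build_command` is a hypothetical helper, and the script prints the commands rather than executing them.

```python
import shlex
import sys
from typing import List, Optional

DATASETS = ["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]


def build_command(dataset: str, sample: int = 4, max_rounds: int = 20,
                  optimized_path: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one optimization run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    cmd = [sys.executable, "run.py", "--dataset", dataset,
           "--sample", str(sample), "--max_rounds", str(max_rounds)]
    if optimized_path is not None:
        cmd += ["--optimized_path", optimized_path]
    return cmd


if __name__ == "__main__":
    # Dry run: print each command; pass the list to subprocess.run to execute it.
    for name in DATASETS:
        print(shlex.join(build_command(name)))
```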

Reproduce the Results in the Paper

We provide the raw data obtained from our experiments in this link, including the workflows and prompts generated in each iteration, as well as their trajectories on the validation dataset. We also provide the optimal workflow for each dataset and the corresponding data on the test dataset. You can download these data using data/download_data.py.
You can reproduce our experimental results directly by selecting a different ExperimentConfig in run.py.

Roadmap

Support multiple search algorithms
Support multi-model search in workflows
Support LeaderBoard
Support more benchmarks
Support multimodality tasks

Citation

If you use AFlow in your research, please cite our paper:

@inproceedings{
   zhang2025aflow,
   title={{AF}low: Automating Agentic Workflow Generation},
   author={Jiayi Zhang and Jinyu Xiang and Zhaoyang Yu and Fengwei Teng and Xiong-Hui Chen and Jiaqi Chen and Mingchen Zhuge and Xin Cheng and Sirui Hong and Jinlin Wang and Bingnan Zheng and Bang Liu and Yuyu Luo and Chenglin Wu},
   booktitle={The Thirteenth International Conference on Learning Representations},
   year={2025},
   url={https://openreview.net/forum?id=z5uVAKwmjf}
}
