FlowSteer: Towards Agents Designing Agentic Workflows via Reinforced Progressive Canvas Editing

Agent Designing Agentic Workflows with an executable Workflow Canvas

Overview

FlowSteer studies Agent Designing Agentic Workflows: a lightweight policy agent designs a workflow graph, and a downstream executor LLM runs that workflow to solve the task. The current repository is aligned with the arXiv v4 formulation, which centers on three ideas:

Workflow Canvas: an executable graph-state environment that maintains the workflow, checks each atomic edit, executes operators, and returns feedback.
Designer--Executor decoupling: the Flow-Director designs the workflow, while a pluggable executor backend runs the designed graph.
Reinforced Progressive Canvas Editing: the Flow-Director commits one atomic edit per turn and is trained end-to-end with a canvas-masked GRPO objective and diversity-constrained reward.

Method

At each turn, the Flow-Director observes the task, operator library, workflow state, and canvas feedback. It emits a brief reflection plus exactly one action. The canvas applies that action, validates the graph, executes available nodes when needed, and appends feedback for the next turn.

Repository Layout

train_interactive.py              training entry point for multi-turn canvas editing
eval_only.py                      inference/evaluation entry point
merge_and_upload.py               LoRA merge and upload utility
config/training_interactive.yaml  paper-aligned training configuration
config/aflow_llm.yaml.example     executor backend configuration template
config/operator.json              operator descriptions
scripts/operators.py              operator implementations
src/interactive/workflow_env.py   Workflow Canvas environment
src/interactive/workflow_graph.py graph state and structure checks
src/interactive/action_parser.py  XML/action parsing
src/interactive/grpo_trainer.py   GRPO utilities
src/interactive/trajectory_reward.py diversity-gated reward
figs/                             figures synchronized with the arXiv v4 manuscript

Requirements

Python 3.10+
CUDA-capable GPU
vLLM with LoRA serving enabled
A local or API executor backend configured through config/aflow_llm.yaml

The paper experiments use Qwen3-8B as the Flow-Director policy model, LoRA fine-tuning, bfloat16 precision, and a GPT-OSS-120B executor backend.

Installation

git clone https://github.com/beita6969/FlowSteer.git
cd FlowSteer

conda create -n flowsteer python=3.10 -y
conda activate flowsteer
pip install -r requirements.txt
pip install "vllm>=0.6.0"

Dataset

The hosted dataset can be downloaded from Hugging Face:

python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="beita6969/FlowSteer-Dataset",
    repo_type="dataset",
    local_dir="data",
    allow_patterns=["train/train_12k.jsonl", "eval/*.jsonl"],
    endpoint="https://huggingface.co",
)
PY

The paper evaluates 12 datasets: six IID datasets for training/testing and six OOD datasets for generalization.

IID: GSM8K, MATH, HotPotQA, SQuAD v2, MBPP, HumanEval
OOD: TriviaQA, NaturalQuestions, MathQA, AIME 2025, APPS, DS-1000

The arXiv v4 appendix specifies the paper training recipe as 10,778 IID training instances: 2,560 each from GSM8K, MATH, HotPotQA, and SQuAD v2, plus 374 MBPP and 164 HumanEval examples. The public dataset repository also provides evaluation JSONL files under data/eval/ for the 12 benchmark families.

Configure Executor Backend

Create the executor configuration from the template:

cp config/aflow_llm.yaml.example config/aflow_llm.yaml

For an OpenAI-compatible local executor service, set:

models:
  gpt-oss-120b:
    api_type: openai
    base_url: http://127.0.0.1:8004/v1
    api_key: EMPTY
    model_name: gpt-oss-120b
    temperature: 0
    top_p: 1
    max_tokens: 4096

Then ensure config/training_interactive.yaml points to the same executor model name:

aflow_executor_model: "gpt-oss-120b"

Start Flow-Director vLLM Service

CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
  --model /path/to/Qwen3-8B \
  --served-model-name Qwen3-8B \
  --port 8003 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 16384 \
  --enable-lora \
  --max-loras 2 \
  --max-lora-rank 64 \
  --trust-remote-code \
  --dtype bfloat16

Training

CUDA_VISIBLE_DEVICES=0 python train_interactive.py \
  --config config/training_interactive.yaml

Important paper-aligned defaults are already set in config/training_interactive.yaml:

Category	Setting
Policy model	Qwen3-8B
LoRA	rank 64, alpha 64, dropout 0.05, q/k/v/o projections
RL objective	GRPO with canvas token mask
Samples per group	36
Clip / KL	0.20 / 0.005
Generation	temperature 0.6, top-p 0.95, top-k 20, max new tokens 2048
Interaction	max 20 rounds
Reward	base -1.0, diversity cap 1.0, correctness released only after full structural reward
Executor timeout	600 seconds

Evaluation

Evaluate a single benchmark file:

python eval_only.py \
  --config config/training_interactive.yaml \
  --data data/eval/gsm8k.jsonl \
  --num-samples 128 \
  --workers 16

Evaluate with a served LoRA adapter by starting vLLM with the adapter first, then passing the served adapter name:

python eval_only.py \
  --config config/training_interactive.yaml \
  --data data/eval/humaneval.jsonl \
  --vllm-model flowsteer-adapter \
  --workers 16

--checkpoint is recorded for diagnosis only; the adapter must already be loaded by the vLLM server.

Model Weights

The released Flow-Director model is hosted at:

https://huggingface.co/beita6969/FlowSteer-8b

License

This repository is released for research use. Please also follow the licenses and terms of the upstream models, datasets, and benchmark suites used with FlowSteer.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FlowSteer: Towards Agents Designing Agentic Workflows via Reinforced Progressive Canvas Editing

Agent Designing Agentic Workflows with an executable Workflow Canvas

Overview

Method

Repository Layout

Requirements

Installation

Dataset

Configure Executor Backend

Start Flow-Director vLLM Service

Training

Evaluation

Model Weights

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
config		config
figs		figs
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
eval_only.py		eval_only.py
merge_and_upload.py		merge_and_upload.py
requirements.txt		requirements.txt
start_vllm_server.sh		start_vllm_server.sh
train_interactive.py		train_interactive.py

Folders and files

Latest commit

History

Repository files navigation

FlowSteer: Towards Agents Designing Agentic Workflows via Reinforced Progressive Canvas Editing

Agent Designing Agentic Workflows with an executable Workflow Canvas

Overview

Method

Repository Layout

Requirements

Installation

Dataset

Configure Executor Backend

Start Flow-Director vLLM Service

Training

Evaluation

Model Weights

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages