
Details on SFT Data Generation and Training Pipeline #4

@VoyageWang

Description


Hi @authors, thanks for the excellent work on AORCHESTRA! The framework design is very insightful, especially the four-tuple abstraction for dynamic agent creation.

I'm trying to reproduce the supervised fine-tuning (SFT) results reported in the paper (Section 3.3 and Table 1), but some implementation details of the SFT process are not fully described in the current version. To ensure correct reproduction, I'd appreciate it if you could clarify the following:

  1. Training Data Construction
  • Source of trajectories: Are the training trajectories collected from the training-free version of AORCHESTRA, or from human demonstrations, or from another method?
  • Data format: What is the exact format of the training samples? For example, is it:
    • Prompt: the full history s_t
    • Completion: the action a_t (i.e., the Delegate(Φ) call with its four-tuple configuration)?
    • Or is a different format used?
  • Positive/negative sampling: The paper mentions behavior cloning from expert trajectories. How are "expert" actions selected? Are they drawn only from successful trajectories, or do you also include failure cases with corrected labels?
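To make the data-format question concrete, here is a minimal sketch of the sample layout I am guessing at. Every field name and the four-tuple element names (role, tools, context, output spec) are my assumptions, not something stated in the paper; please correct whichever parts are wrong.

```python
import json

# Hypothetical SFT sample -- all field names are guesses for illustration.
# Prompt: the full interaction history s_t.
# Completion: the Delegate action a_t with its four-tuple configuration Phi.
sample = {
    "prompt": "<full history s_t: task description + prior agent outputs>",
    "completion": {
        "action": "Delegate",
        "config": {  # elements of the four-tuple Phi -- names assumed
            "role": "web_researcher",
            "tools": ["search", "browse"],
            "context": "<subtask instructions>",
            "output_spec": "<expected return format>",
        },
    },
}

# One JSON object per line (JSONL), a common layout for SFT datasets.
print(json.dumps(sample))
```

Is each sample roughly one (s_t, a_t) pair serialized like this, or do you pack entire trajectories into a single sequence?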

Are there plans to release:

  • The training data generation scripts?
  • The processed SFT dataset?
  • The training configuration and launcher scripts?
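In the meantime, my working guess for generating data from the training-free version is roughly the following: roll out trajectories, keep only the successful ones, and flatten each into (state, action) behavior-cloning pairs. The field names and the success flag are assumptions on my part; a pointer to the actual generation logic would settle this.

```python
# Hypothetical trajectory-to-SFT conversion -- all fields assumed for
# illustration; this is one of the two sampling options asked about above.
trajectories = [
    {"steps": [{"state": "s_0 ...", "action": "Delegate(...)"}], "success": True},
    {"steps": [{"state": "s_0 ...", "action": "Delegate(...)"}], "success": False},
]

def to_sft_samples(trajs, keep_failures=False):
    """Flatten each kept trajectory into prompt/completion pairs."""
    samples = []
    for traj in trajs:
        if not traj["success"] and not keep_failures:
            continue  # success-only filtering (pure behavior cloning)
        for step in traj["steps"]:
            samples.append({"prompt": step["state"], "completion": step["action"]})
    return samples

print(len(to_sft_samples(trajectories)))  # success-only: 1 sample
```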
