
Details on SFT Data Generation and Training Pipeline #4

@VoyageWang

Description


Hi @authors, thanks for the excellent work on AORCHESTRA! The framework design is very insightful, especially the four-tuple abstraction for dynamic agent creation.

I'm trying to reproduce the supervised fine-tuning (SFT) results reported in the paper (Section 3.3 and Table 1), but some implementation details of the SFT process are not fully described in the current version. To ensure correct reproduction, I'd appreciate it if you could clarify the following:

  1. Training Data Construction
  • Source of trajectories: Are the training trajectories collected from the training-free version of AORCHESTRA, or from human demonstrations, or from another method?
  • Data format: What is the exact format of the training samples? For example, is it:
    • Prompt: the full history s_t
    • Completion: the action a_t (i.e., the Delegate(Φ) call with its four-tuple configuration)?
    • Or is a different format used?
  • Positive/negative sampling: The paper mentions behavior cloning from expert trajectories. How are "expert" actions selected? Are they drawn only from successful trajectories, or do you also include failure cases with corrected labels?
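To make the data-format question concrete, here is a minimal sketch of the sample layout I am guessing at. Every field name and the four-tuple element names (role, tools, context, output spec) are my assumptions, not something stated in the paper; please correct whichever parts are wrong.

```python
import json

# Hypothetical SFT sample -- all field names are guesses for illustration.
# Prompt: the full interaction history s_t.
# Completion: the Delegate action a_t with its four-tuple configuration Phi.
sample = {
    "prompt": "<full history s_t: task description + prior agent outputs>",
    "completion": {
        "action": "Delegate",
        "config": {  # elements of the four-tuple Phi -- names assumed
            "role": "web_researcher",
            "tools": ["search", "browse"],
            "context": "<subtask instructions>",
            "output_spec": "<expected return format>",
        },
    },
}

# One JSON object per line (JSONL), a common layout for SFT datasets.
print(json.dumps(sample))
```

Is each sample roughly one (s_t, a_t) pair serialized like this, or do you pack entire trajectories into a single sequence?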

Are there plans to release:

  • The training data generation scripts?
  • The processed SFT dataset?
  • The training configuration and launcher scripts?
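In the meantime, my working guess for generating data from the training-free version is roughly the following: roll out trajectories, keep only the successful ones, and flatten each into (state, action) behavior-cloning pairs. The field names and the success flag are assumptions on my part; a pointer to the actual generation logic would settle this.

```python
# Hypothetical trajectory-to-SFT conversion -- all fields assumed for
# illustration; this is one of the two sampling options asked about above.
trajectories = [
    {"steps": [{"state": "s_0 ...", "action": "Delegate(...)"}], "success": True},
    {"steps": [{"state": "s_0 ...", "action": "Delegate(...)"}], "success": False},
]

def to_sft_samples(trajs, keep_failures=False):
    """Flatten each kept trajectory into prompt/completion pairs."""
    samples = []
    for traj in trajs:
        if not traj["success"] and not keep_failures:
            continue  # success-only filtering (pure behavior cloning)
        for step in traj["steps"]:
            samples.append({"prompt": step["state"], "completion": step["action"]})
    return samples

print(len(to_sft_samples(trajectories)))  # success-only: 1 sample
```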
