
Why does dataset merge fail when tools have different parameters? #7869

@hitszxs

Description

Hi, I have a question about SFT (Supervised Fine-tuning) for an agent model.

Suppose I want to fine-tune an agent model that may receive two different tools: tool1 and tool2. These tools have different parameters and types in their schema definitions.
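
For concreteness, here is a rough sketch of what such a pair of tools might look like. The names and fields below are hypothetical, loosely modeled on the error message further down:

```python
# Hypothetical tool definitions (illustrative only).
# Note that the two `parameters` structs have different fields.
tool1 = {
    "name": "refund_order",
    "parameters": {
        "refundFee": {"description": "Amount to refund", "type": "string"},
        "templateId": {"description": "Refund template id", "type": "string"},
    },
}

tool2 = {
    "name": "transfer_to_servicer",
    "parameters": {
        "refundFee": {"description": "Amount to refund", "type": "string"},
        "servicerId": {"description": "Target servicer id", "type": "string"},
    },
}
```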

When I try to merge datasets containing different tool definitions, I get the following error:

TypeError: Couldn't cast array of type
struct<refundFee: struct<description: string, type: string>, ... , servicerId: struct<description: string, type: string>>
to
{
'refundFee': {'description': Value(dtype='string'), 'type': Value(dtype='string')},
...
'templateId': {'description': Value(dtype='string'), 'type': Value(dtype='string')}
}
From my understanding, the merge fails because the nested structure of the tools column differs across datasets: for example, one struct contains a field servicerId that the other lacks. This causes HuggingFace Datasets (and the Apache Arrow schema underneath it) to reject the merge.
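
As a minimal sketch of how this mismatch can arise (assuming the `datasets` library; file names and field values are made up for illustration), loading two JSON files whose nested tools structs differ typically fails with an error like the one above, since the schema is inferred from the first file and later batches are cast to it. The exact exception text can vary by `datasets` version:

```python
import json

from datasets import load_dataset

# Two samples whose nested `tools` structs have different fields.
sample1 = {"tools": {"refundFee": {"description": "fee", "type": "string"},
                     "templateId": {"description": "template", "type": "string"}}}
sample2 = {"tools": {"refundFee": {"description": "fee", "type": "string"},
                     "servicerId": {"description": "servicer", "type": "string"}}}

with open("tool1_data.json", "w") as f:
    json.dump([sample1], f)
with open("tool2_data.json", "w") as f:
    json.dump([sample2], f)

# The Arrow schema is inferred from the first file; casting the second file's
# `tools` struct (servicerId instead of templateId) to that schema fails with
# a "Couldn't cast array of type struct<...>" error.
ds = load_dataset("json", data_files=["tool1_data.json", "tool2_data.json"])
```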

My question is: why is it designed this way?

1. Is this strict schema matching a hard requirement of the library?
2. Is there a recommended way to merge datasets with different tool schemas (different parameters and types)? (One idea I had is sketched below.)
3. For an agent model supporting multiple tools, what's the best practice for preparing/merging training data without losing flexibility?
Any guidance or design rationale would be greatly appreciated. Thanks!
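
For question 2, one idea I had (I'm not sure it is the intended approach) is to serialize the tools column to a JSON string before merging, so every dataset shares the same flat schema, and decode it back during preprocessing. Roughly:

```python
import json

from datasets import Dataset, concatenate_datasets

# Store each dataset's tool schema as a JSON-encoded string instead of a nested dict.
ds1 = Dataset.from_list([{"tools": json.dumps(
    {"refundFee": {"description": "fee", "type": "string"},
     "templateId": {"description": "template", "type": "string"}})}])
ds2 = Dataset.from_list([{"tools": json.dumps(
    {"refundFee": {"description": "fee", "type": "string"},
     "servicerId": {"description": "servicer", "type": "string"}})}])

# Both `tools` columns are plain strings, so Arrow sees identical schemas
# and the merge succeeds.
merged = concatenate_datasets([ds1, ds2])

# Decode the schema back when formatting a training sample.
tools_of_first_row = json.loads(merged[0]["tools"])
```

Is this the recommended pattern, or does it lose something the library needs?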
