Question about dataset structure

Hi, thanks for sharing your amazing work!

I'm trying to fine-tune a pre-trained Octo model for a single basic manipulation task (dragging an item to a starting point). I built a small dataset and converted it to RLDS format (following [this repository](https://github.com/kpertsch/rlds_dataset_builder/tree/main) mentioned in other issues), but I'm encountering an error during training.

**Error:**
```shell
Traceback (most recent call last):
...
  File ".../workspaces/octo_ws/src/octo/octo/data/dataset.py", line 418, in make_dataset_from_rlds
    != dataset_statistics["action"]["mean"].shape[-1]
IndexError: tuple index out of range

``` 

**Problem analysis:**
After investigating, I found that the `dataset_statistics.json` file is being generated incorrectly. My statistics file contains scalar values for the action statistics:
```
{
  "action": {
    "mean": 0.1428595632314682,
    "std": 0.3499261140823364,
    ...
  }
}
``` 

File: [dataset_statistics_71281e479d3992908389f15e31893db4dcde2e44e45cc98f894592f9e7d7ab73.json](https://github.com/user-attachments/files/19938065/dataset_statistics_71281e479d3992908389f15e31893db4dcde2e44e45cc98f894592f9e7d7ab73.json)
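
For reference, here is a minimal sketch of why scalar statistics would produce this exact `IndexError`. It assumes the loader turns the JSON values into NumPy arrays before reading `.shape[-1]`; the helper `last_dim` is hypothetical and not part of Octo, and the vector is just the three values visible in the working file.

```python
import numpy as np


def last_dim(value):
    """Return the size of the trailing axis of a statistics entry.

    Hypothetical helper: it mimics reading `.shape[-1]` on a loaded statistic.
    """
    return np.asarray(value).shape[-1]


scalar_mean = 0.1428595632314682              # value from my broken statistics file
vector_mean = [0.001595, -0.001056, -0.004569]  # first values from a working file

print("scalar shape:", np.asarray(scalar_mean).shape)  # empty shape tuple
print("vector trailing dim:", last_dim(vector_mean))

try:
    last_dim(scalar_mean)
except IndexError as err:
    # Indexing an empty shape tuple with [-1] fails
    print("IndexError:", err)
```

A 0-dimensional array has an empty shape tuple, so any `[-1]` lookup on it fails regardless of the value stored.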

It should instead contain per-dimension statistics, like:
```
{
  "action": {
    "mean": [0.001595, -0.001056, -0.004569, ...],
    "std": [...],
    ...
  }
}
``` 
File: [dataset_statistics_34f7ac35b1a9adf8733b60530fff5fea9f13163e21c34aee36510802806fe696.json](https://github.com/user-attachments/files/19938151/dataset_statistics_34f7ac35b1a9adf8733b60530fff5fea9f13163e21c34aee36510802806fe696.json)
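
To illustrate the difference between the two files, here is a small NumPy sketch. It assumes the statistics are reduced over the step axis (axis 0) of all stacked actions; I have not confirmed that this is how Octo computes them, and `action_stats` is a hypothetical stand-in. The data is random and only the shapes matter.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 steps where each action is an 8-dimensional vector
vector_actions = rng.normal(size=(100, 8)).astype(np.float32)

# 100 steps where each action was (incorrectly) stored as a single scalar
scalar_actions = rng.normal(size=(100,)).astype(np.float32)


def action_stats(actions):
    """Reduce over the step axis only (assumed behavior, not Octo's code)."""
    return {"mean": actions.mean(axis=0), "std": actions.std(axis=0)}


vector_stats = action_stats(vector_actions)
scalar_stats = action_stats(scalar_actions)

print("per-dimension mean shape:", vector_stats["mean"].shape)
print("scalar mean shape:", scalar_stats["mean"].shape)
```

Under that assumption, scalar statistics would mean the stored actions have no trailing dimension at all, which points at the feature shape declared during conversion rather than at the values themselves.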

**Dataset Structure:**
My original dataset before RLDS conversion consists of:
- RGB images (256×256×3)
- 8-dimensional action vectors
- Text task descriptions

**Questions:**
1. What is the correct way to structure the dataset so that the statistics are computed per action dimension?
2. Are there any additional requirements for the action space format?
3. Could you provide a minimal example of a correctly formatted RLDS dataset for fine-tuning?

**Additional Information:**
- I'm using the standard `finetune.py` script from the repository
- My action space is 8-dimensional: [x, y, z, roll, pitch, yaw, gripper, terminate], i.e. the TCP position and rotation, a binary gripper variable, and a terminate-step variable
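
For concreteness, this is roughly the per-step layout I am aiming for, written as a NumPy-only sketch with a shape check. The key names (`observation`, `image`, `action`, `language_instruction`) are my assumption based on the example builder template and may not match what Octo expects; `make_dummy_step` and `check_step` are hypothetical helpers.

```python
import numpy as np


def make_dummy_step(instruction):
    """Build one placeholder step with the shapes described above."""
    return {
        "observation": {
            "image": np.zeros((256, 256, 3), dtype=np.uint8),  # RGB frame
        },
        "action": np.zeros(8, dtype=np.float32),  # 8-dimensional action vector
        "language_instruction": instruction,       # text task description
    }


def check_step(step, action_dim=8):
    """Raise ValueError if the action or image does not have the intended shape."""
    action = np.asarray(step["action"])
    if action.shape != (action_dim,):
        raise ValueError(
            f"expected action shape ({action_dim},), got {action.shape}"
        )
    image = np.asarray(step["observation"]["image"])
    if image.shape != (256, 256, 3):
        raise ValueError(f"expected image shape (256, 256, 3), got {image.shape}")


step = make_dummy_step("drag the item to the starting point")
check_step(step)  # passes silently when the shapes are as intended
```

Running a check like this on steps before conversion would at least rule out actions being written as scalars.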
