
save_episode fails for shape=(1,) numeric features due to scalar conversion assumptions #3343

@SevenFo


Ticket Type

What kind of ticket are you opening?

  • 🐛 Bug Report (Something isn't working)

Environment & System Info

  • LeRobot: current main (tested from source)
  • OS: Linux
  • Python: 3.12
  • datasets: 4.7.0 / 4.8.4
  • numpy: behavior differs across versions (failures seen with newer numpy releases in combination with the packages above)
  • pyarrow: 23.0.1 in isolated repro env

Description

When a numeric feature is declared with shape=(1,), save_episode() can fail depending on the installed dependency stack.

LeRobot maps shape=(1,) numeric features to datasets.Value(...), i.e. a scalar column. In the save path, however, the values can reach datasets.Dataset.from_dict as an (N, 1) array, so encoding each row relies on a shape-(1,) array being implicitly converted to a Python scalar downstream.
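A minimal numpy-only sketch of the shapes involved. It assumes, as described above, that each frame holds `np.array([x])` and that frames are stacked along a new leading axis before `from_dict`; the variable names are illustrative and not taken from LeRobot's code.

```python
import numpy as np

# Per-frame values as add_frame accepts them for a shape=(1,) feature.
frames = [np.array([x], dtype=np.float32) for x in (1.0, 2.0, 3.0)]

# Stacking adds a leading episode axis: the column is (N, 1), not (N,).
column = np.stack(frames)
print(column.shape)  # (3, 1)

# Each row is still a 1-D array of shape (1,), not a 0-d scalar, so a
# datasets.Value column has to convert it implicitly when encoding.
first_row = column[0]
print(first_row.ndim, first_row.shape)  # 1 (1,)

# Dropping the trailing axis yields one 0-d scalar per row.
flat = column.reshape(-1)
print(flat.shape, flat[0].ndim)  # (3,) 0
```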

On stricter stacks this fails with `TypeError: only 0-dimensional arrays can be converted to Python scalars`.
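Whether an environment counts as "stricter" plausibly comes down to NumPy: NumPy 1.25 deprecated treating arrays with ndim > 0 as scalars (e.g. `float(np.array([3.14]))`), and the TypeError above is what that conversion looks like once it is rejected outright. The probe below is a minimal sketch that reports which behavior the installed NumPy has; `scalar_conversion_mode` is a hypothetical name, and the probe does not exercise datasets itself.

```python
import warnings

import numpy as np


def scalar_conversion_mode() -> str:
    """Report how this NumPy build handles float() on a shape-(1,) array.

    Returns "ok" if the conversion succeeds silently, "deprecated" if it
    succeeds but emits a DeprecationWarning, and "error" if it raises TypeError.
    """
    value = np.array([1.0], dtype=np.float32)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record warnings instead of hiding them
        try:
            float(value)
        except TypeError:
            return "error"
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return "deprecated"
    return "ok"


print(np.__version__, scalar_conversion_mode())
```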

Context & Reproduction

Minimal reproduction at the datasets layer (using the same schema LeRobot assumes for shape=(1,)):

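```python
import numpy as np
import datasets

# Schema LeRobot uses for a shape=(1,) float32 feature: one scalar per row.
ft = datasets.Features({"x": datasets.Value("float32")})

cases = {
    # Per-row arrays of shape (1,), as accepted by add_frame.
    "list_shape1": [np.array([1.0], dtype=np.float32), np.array([2.0], dtype=np.float32)],
    # The same values stacked into a single (N, 1) array.
    "stacked_N1": np.stack([np.array([1.0], dtype=np.float32), np.array([2.0], dtype=np.float32)]),
    # Plain Python scalars: the form datasets.Value expects.
    "list_scalar": [1.0, 2.0],
}

for k, v in cases.items():
    try:
        ds = datasets.Dataset.from_dict({"x": v}, features=ft, split="train")
        print(k, "PASS", list(ds["x"]))
    except Exception as e:  # report the failure mode instead of stopping
        print(k, "FAIL", type(e).__name__, e)
```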

Observed:

  • list_shape1 / stacked_N1: may fail in stricter stacks
  • list_scalar: passes consistently

LeRobot-side impact:

  • add_frame accepts np.array([x]) for shape=(1,)
  • save_episode may fail due to (N,1) -> scalar conversion assumptions

Relevant logs or stack trace

Representative traceback:

```
File ".../datasets/features/features.py", line 563, in encode_example
    return float(value)
TypeError: only 0-dimensional arrays can be converted to Python scalars
```

Checklist

  • I have searched existing tickets to ensure this isn't a duplicate.
  • I am using the latest version of the main branch.
  • I have verified this is not an environment-specific problem.

Additional Info / Workarounds

Proposed fix:

  • In LeRobot save path, normalize shape=(1,) numeric columns from (N,1) to (N,) before Dataset.from_dict.
  • Keep add_frame input contract unchanged (np.array([x]) still accepted).
  • Add a regression test to ensure the scalarized encoding path remains stable.
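The first and third bullets could look roughly like the sketch below. It is not LeRobot code: `normalize_scalar_columns` and `test_shape1_column_is_flattened` are hypothetical names, and the feature spec dicts (with "dtype" and "shape" keys) are an assumed, simplified stand-in for LeRobot's feature metadata. The idea is to call such a helper on the episode buffer right before `datasets.Dataset.from_dict`, so only the save path changes.

```python
import numpy as np


def normalize_scalar_columns(columns: dict, features: dict) -> dict:
    """Hypothetical helper: squeeze shape=(1,) numeric columns from (N, 1) to (N,).

    `columns` maps feature names to per-episode values; `features` maps names to
    spec dicts with a "shape" key. Anything that is not an (N, 1) numeric array
    for a shape=(1,) feature is passed through unchanged.
    """
    out = {}
    for name, values in columns.items():
        spec = features.get(name, {})
        if tuple(spec.get("shape", ())) != (1,):
            out[name] = values  # features with other shapes are untouched
            continue
        arr = np.asarray(values)
        if np.issubdtype(arr.dtype, np.number) and arr.ndim == 2 and arr.shape[1] == 1:
            out[name] = arr.reshape(-1)  # (N, 1) -> (N,): one scalar per row
        else:
            out[name] = values  # already (N,), some other shape, or not numeric
    return out


# Frames still arrive as np.array([x]), so the add_frame contract is unchanged.
features = {
    "x": {"dtype": "float32", "shape": (1,)},
    "state": {"dtype": "float32", "shape": (2,)},
}
columns = {
    "x": np.stack([np.array([v], dtype=np.float32) for v in (1.0, 2.0)]),
    "state": np.zeros((2, 2), dtype=np.float32),
}
normalized = normalize_scalar_columns(columns, features)
print(normalized["x"].shape)      # (2,)
print(normalized["state"].shape)  # (2, 2)


def test_shape1_column_is_flattened():
    """Regression check: stacked shape=(1,) frames become a 1-D column."""
    col = np.stack([np.array([v], dtype=np.float32) for v in (1.0, 2.0, 3.0)])
    out = normalize_scalar_columns({"x": col}, {"x": {"dtype": "float32", "shape": (1,)}})
    assert out["x"].shape == (3,)
    assert out["x"].tolist() == [1.0, 2.0, 3.0]
```

Reshaping only arrays that are exactly (N, 1), rather than calling `np.squeeze` on everything, keeps a single-frame episode (shape (1, 1)) from collapsing to a 0-d array.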
