RAM exhaustion when training on custom paired dataset (H2R works fine) #4

@AryanSethi06

Description

Hey! First off, thanks for releasing the Human2Robot (H2R) dataset and training code — I fine-tuned the model on the provided H2R data from Hugging Face and the results were incredible.

I’m now trying to replicate the setup with my own paired Human→Robot dataset. However, when I switch to my data, training consistently hits RAM exhaustion very early on (the dataset is only 20 episodes each for human and robot), even though the H2R dataset loads and trains without any issues on the same machine.

This makes me suspect there may be implicit assumptions or constraints on the dataset format, video properties, or loading pipeline that my custom data violates.

I wanted to ask:

Are there specific requirements or recommended specs for recording / preprocessing custom paired datasets?

Are there known failure modes that could cause excessive RAM usage with certain video formats or dataset structures?
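For what it's worth, here is my back-of-the-envelope estimate of decoded frame memory, assuming the loader holds full episodes in RAM as RGB uint8 frames (the width/height/fps/duration values below are just placeholders, not my actual recording specs):

```python
def decoded_episode_bytes(width: int, height: int, fps: float, seconds: float) -> int:
    """Rough size of one episode decoded to RGB uint8 frames held in RAM."""
    bytes_per_frame = width * height * 3  # 3 channels, 1 byte each
    n_frames = int(fps * seconds)
    return bytes_per_frame * n_frames

# Example: a 60 s episode recorded at 1920x1080, 30 fps
size = decoded_episode_bytes(1920, 1080, 30, 60)
print(f"{size / 1e9:.1f} GB per episode")  # ~11.2 GB
```

If my recordings are longer or higher-resolution than the H2R episodes, that alone could explain the blow-up even though the on-disk (compressed) sizes look comparable, so downscaling and FPS-capping to match H2R's specs would be my obvious first test.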

Below I've attached two of the video samples. Even after heavy compression, I'm still not able to load them.

episode_000000_compressed_compressed.mp4
episode_000000_compressed_compressed.mp4
