[Proposal] Improvements to the amount of data displayed by the minari list remote and minari list local CLI commands #100

Open
@balisujohn

Proposal

Add the Gymnasium env ID, file size, and dataset group to the table displayed when running the command minari list remote or minari list local.

This would mean adding new columns named something like "env_id", "size on disk", and "dataset group".

Right now, the datasets do not have a dataset_group value, so for backwards compatibility the PR should check for a dataset_group attribute and, if there is none, use the string "Unknown" as a placeholder value.
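A minimal sketch of the fallback described above, assuming the dataset metadata is available as a plain mapping. The helper name get_dataset_group and the constant UNKNOWN_GROUP are hypothetical and not part of Minari's current API:

```python
from typing import Any, Mapping

# Placeholder shown for datasets created before dataset_group existed.
UNKNOWN_GROUP = "Unknown"


def get_dataset_group(metadata: Mapping[str, Any]) -> str:
    """Return the dataset group, or "Unknown" if the attribute is missing or empty.

    `metadata` is assumed to be a dict-like view of the dataset attributes;
    how Minari actually exposes them may differ.
    """
    group = metadata.get("dataset_group")
    return str(group) if group else UNKNOWN_GROUP
```

With this, older datasets would show "Unknown" in the new column instead of raising a KeyError.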

This Stack Overflow question should be a useful starting point for getting the file size of remote datasets: https://stackoverflow.com/questions/50875461/google-cloud-storage-get-object-size-api
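As a rough sketch of how the size column could be computed: the helpers below sum the sizes of a dataset's remote objects, measure a local dataset directory, and format the result for display. It assumes the remote objects are blob-like values with a size attribute in bytes (as exposed by blobs in the google-cloud-storage client); the function names are hypothetical, and the actual listing call in hosting.py may look different:

```python
import os
from typing import Any, Iterable


def total_remote_size(blobs: Iterable[Any]) -> int:
    """Sum the sizes (in bytes) of blob-like objects exposing a `size` attribute.

    A missing or None size is counted as 0 bytes.
    """
    return sum((getattr(blob, "size", None) or 0) for blob in blobs)


def local_dir_size(path: str) -> int:
    """Sum the sizes (in bytes) of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def format_size(num_bytes: float) -> str:
    """Format a byte count with binary (1024-based) steps, e.g. 1536 -> '1.5 KB'."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TB"
```

Keeping the formatting separate from the measurement means the same "size on disk" column can be filled for both minari list remote and minari list local.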

To get started, it would be useful to look at the code in cli.py, local.py, and hosting.py.

The docs will also need to be updated to reflect the new dataset_group field: definitely on this page, https://minari.farama.org/main/content/dataset_standards/, and probably also on the individual dataset pages.

Motivation

This partially addresses #79. It is also useful to know how large each dataset is, to get an idea of how long a particular dataset will take to download or process.

Checklist

  • I have checked that there is no similar issue in the repo (required)
