Description
Proposal
Add the Gymnasium environment ID, file size, and dataset group to the table displayed when running the commands minari list remote or minari list local.
This would mean adding new columns named something like "env_id", "size on disk", and "dataset group".
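The sketch below shows one way the "size on disk" value could be computed for a local dataset. It assumes that a dataset is stored as a directory of files. The names directory_size_bytes and format_size are hypothetical and do not exist in Minari. The binary (1024-based) units are a choice made for this sketch only.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Return the total size in bytes of all regular files under `path`.

    Hypothetical helper: assumes a local dataset is a directory of files.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def format_size(num_bytes: int) -> str:
    """Render a byte count as a human-readable string using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit where the value drops below 1024 (or at TiB).
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"  # unreachable; keeps type checkers satisfied
```

For example, format_size(1536) gives "1.5 KiB", which could be used directly as the cell value in the new column.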
Right now, the datasets do not have a dataset_group value, so for backwards compatibility the PR should check for a dataset_group attribute and, if there is none, use the string "Unknown" as a placeholder value.
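The sketch below shows the fallback described above. It assumes that dataset metadata can be read as a dict-like mapping from attribute names to string values. The get_dataset_group helper is hypothetical. A missing key and an empty or None value are both treated as "Unknown".

```python
from typing import Any, Mapping

UNKNOWN_GROUP = "Unknown"


def get_dataset_group(metadata: Mapping[str, Any]) -> str:
    """Return the dataset_group value, or "Unknown" for datasets without one.

    Hypothetical helper: assumes metadata behaves like a dict of attributes.
    """
    group = metadata.get("dataset_group")
    # Older datasets lack the key entirely; also guard against None or "".
    return str(group) if group else UNKNOWN_GROUP
```

Treating empty values the same as a missing key keeps the column readable if a dataset was created with the field present but unset.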
This should be a useful starting point for getting the file size of remote datasets: https://stackoverflow.com/questions/50875461/google-cloud-storage-get-object-size-api
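For remote datasets, one option is to sum the sizes of all objects stored under a dataset's prefix. The sketch below assumes that each listed object has a size attribute giving its size in bytes, or None when unknown. Blob objects returned by google-cloud-storage's list_blobs are expected to behave this way. The total_remote_size helper is hypothetical, and the bucket name and prefix in the commented usage are placeholders, not Minari's actual storage layout.

```python
from typing import Iterable, Optional, Protocol


class HasSize(Protocol):
    """Anything with a byte-size attribute, e.g. a listed storage object."""

    size: Optional[int]


def total_remote_size(blobs: Iterable[HasSize]) -> int:
    """Sum the byte sizes of remote objects, treating unknown sizes as 0.

    Hypothetical helper: assumes each object exposes `.size` in bytes.
    """
    return sum(blob.size or 0 for blob in blobs)


# Hypothetical usage with google-cloud-storage (placeholders, not real names):
# from google.cloud import storage
# client = storage.Client.create_anonymous_client()
# blobs = client.list_blobs("<bucket-name>", prefix="<dataset-id>/")
# size_bytes = total_remote_size(blobs)
```

Summing over a listing avoids one metadata request per file. It does assume that every file belonging to a dataset lives under a single prefix.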
To get started, it would be useful to look at the code in cli.py, local.py, and hosting.py.
The documentation will also need to be updated to reflect the new dataset_group field: definitely on this page, https://minari.farama.org/main/content/dataset_standards/, and probably also on the individual dataset pages.
Motivation
This partially addresses #79. It is also useful to know how large each dataset is, to get an idea of how long it will take to download or process.
Checklist
- I have checked that there is no similar issue in the repo (required)