While there are ways to pass custom data loaders (https://docs.scvi-tools.org/en/1.3.2/user_guide/use_case/custom_dataloaders.html), these approaches seem like overkill. On the other hand, having to concatenate your data (if you have multiple datasets) gets cumbersome very quickly.
It would be nice if you could just create a custom PyTorch dataset and pass it to the SCVI API. This dataset would have a `__getitem__` method that just returns the needed transcriptome and, say, the batch, etc.
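
To make the request concrete, here is a rough sketch of the kind of dataset I have in mind. It is only an illustration: the class name `MultiSourceDataset` and the `"X"` / `"batch"` keys are made up for this example and are not part of the scvi-tools API, and a real version would return tensors and read from AnnData objects rather than nested lists. It follows PyTorch's map-style dataset protocol (`__len__` plus `__getitem__`) and falls back to a plain class if PyTorch is not installed, so the snippet runs either way.

```python
from bisect import bisect_right
from itertools import accumulate

try:
    from torch.utils.data import Dataset  # real base class when PyTorch is installed
except ImportError:
    Dataset = object  # fallback so this sketch still runs without PyTorch


class MultiSourceDataset(Dataset):
    """Hypothetical map-style dataset over several expression matrices.

    Each source is a sequence of per-cell count vectors. The position of the
    source in the list doubles as the batch label, so nothing has to be
    concatenated up front.
    """

    def __init__(self, sources):
        self.sources = list(sources)
        # Cumulative sizes: sources of length 2 and 3 give [2, 5].
        self.offsets = list(accumulate(len(s) for s in self.sources))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find which source holds this global index, then the local row.
        batch = bisect_right(self.offsets, idx)
        start = self.offsets[batch - 1] if batch > 0 else 0
        return {"X": self.sources[batch][idx - start], "batch": batch}


dataset = MultiSourceDataset([
    [[0, 3, 1], [2, 0, 0]],             # first dataset: 2 cells, 3 genes
    [[1, 1, 4], [0, 5, 2], [7, 0, 1]],  # second dataset: 3 cells, 3 genes
])
```

Here `dataset[2]` would be the first cell of the second dataset, with `batch` set to 1. PyTorch's own `ConcatDataset` does similar index bookkeeping, but it does not attach a batch label, which is the part SCVI would need.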