This repository was archived by the owner on Nov 17, 2023. It is now read-only.
How to train with parquet files? #20341
Unanswered · MikkelWorkF asked this question in Q&A
Replies: 2 comments 1 reply
-
@szha Do you have someone to help with this question?
-
If you are already familiar with petastorm, you can use its plain Python reader and wrap the data it yields into MXNet arrays with mx.nd.array.
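A minimal sketch of that approach is below; the dataset URL and the column names ("features", "label") are assumptions for illustration, not taken from this discussion:

```python
from petastorm import make_batch_reader
import mxnet as mx

# make_batch_reader streams plain Parquet files row group by row group,
# so the full dataset never has to fit in memory.
with make_batch_reader("file:///data/train/") as reader:   # hypothetical path
    for batch in reader:
        # Each batch is a namedtuple of numpy arrays, one field per Parquet column.
        x = mx.nd.array(batch.features)  # wrap into MXNet NDArrays
        y = mx.nd.array(batch.label)
        # ... run the usual forward/backward pass on (x, y) here ...
```

With this pattern the Parquet files stay on disk (or S3/HDFS) and only one row group at a time is materialized before being handed to MXNet.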
-
Hello
How do I train MXNet with Parquet files?
I have the training data stored in a bunch of Parquet files (hundreds of them) and they cannot fit in memory (2TB+). Until now we have been able to avoid the issue because we could keep the training data in memory (we ran on SageMaker instances with 728GB of memory, but that is no longer sufficient).
We have been looking for a solution for a long time, but nothing seems to work. We are considering switching to PyTorch, since it can consume a petastorm reader, which should work with Parquet files. However, we feel there has to be a solution we are not seeing.