Partitioning with a memory-size (bytes) threshold is currently quite slow. After discussing this with Olivia, we came up with a few ideas that could improve performance:
- For fixed-length columns (int, float, etc.), use the known size instead of computing it per row
- Use vectorized methods to get the size/length of dynamic-length columns, if any exist
- Do size calculations per histogram pixel instead of per row
We should benchmark these approaches, along with any other ideas, and adopt whichever ones actually improve performance.
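
To make the three ideas above concrete, here is a rough sketch of how they might combine. It assumes the data is available as a pandas DataFrame, that any non-numeric column holds strings, and that each row already has an integer histogram pixel index. The function names `estimate_row_bytes` and `bytes_per_pixel` are hypothetical and not part of the existing code; this is a starting point for benchmarking, not a proposed final implementation.

```python
import numpy as np
import pandas as pd


def estimate_row_bytes(df: pd.DataFrame) -> np.ndarray:
    """Estimate per-row payload bytes without a Python-level loop over rows."""
    total = np.zeros(len(df), dtype=np.int64)
    for name in df.columns:
        col = df[name]
        if col.dtype.kind in "biufcmM":
            # Fixed-width dtype: every value costs the same number of bytes,
            # so add the known item size once for the whole column.
            total += col.dtype.itemsize
        else:
            # Variable-length column (assumed here to hold strings):
            # vectorized UTF-8 byte length, with missing values counted as 0.
            lengths = col.astype("string").str.encode("utf-8").str.len()
            total += lengths.fillna(0).to_numpy(dtype=np.int64)
    return total


def bytes_per_pixel(pixels: np.ndarray, row_bytes: np.ndarray, n_pixels: int) -> np.ndarray:
    """Sum the estimated row sizes into one total per histogram pixel."""
    sums = np.bincount(pixels, weights=row_bytes, minlength=n_pixels)
    return sums.astype(np.int64)
```

If every column is fixed-width, the per-row estimate is a constant, so the per-pixel byte totals reduce to the existing row-count histogram multiplied by that constant, which may avoid the per-row pass entirely.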