Improve performance of memory size partitioning #671

@smcguire-cmu

Currently, partitioning by a memory-size (bytes) threshold is quite slow. After discussing this with Olivia, we came up with a few potential ways to improve performance:

  • For fixed-length columns (int, float, etc.), use the known size instead of computing it per row.
  • If there are variable-length columns, use vectorized methods to get their size/length.
  • Do size calculations per histogram pixel instead of per row.
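
To make the first two ideas concrete, here is a minimal NumPy sketch. It is not the current implementation: the function names, column names, and sample data are all hypothetical, and it assumes the chunk is available as NumPy arrays. Fixed-width columns take their size from `dtype.itemsize` without touching any row, and string sizes are computed array-wise as UTF-8 byte lengths.

```python
import numpy as np


def row_sizes_fixed(columns, num_rows):
    """Per-row byte size contributed by fixed-width columns.

    The width comes from each dtype's itemsize, so no row is inspected.
    """
    bytes_per_row = sum(col.dtype.itemsize for col in columns.values())
    return np.full(num_rows, bytes_per_row, dtype=np.int64)


def row_sizes_strings(values):
    """Per-row UTF-8 byte length of a string column, computed array-wise."""
    encoded = np.char.encode(np.asarray(values, dtype=str), "utf-8")
    return np.char.str_len(encoded).astype(np.int64)


# Hypothetical catalog chunk: two fixed-width columns and one string column.
fixed = {
    "ra": np.array([10.0, 10.1, 250.3], dtype=np.float64),
    "obs_count": np.array([3, 1, 7], dtype=np.int32),
}
names = ["a", "bcd", "efghi"]

sizes = row_sizes_fixed(fixed, num_rows=3) + row_sizes_strings(names)
print(sizes)  # [13 15 17]
```

Each row costs 12 bytes for the fixed-width columns (8 + 4) plus the length of its string. If the data is Arrow-backed, `pyarrow.compute.binary_length` may be a faster route to the same per-row string lengths; that would need benchmarking.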

We should benchmark these approaches, adopt whichever ones actually improve performance, and stay open to any other ideas.
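
As a starting point for that benchmark, the per-pixel idea could be compared against a per-row baseline roughly as follows. This is a sketch under assumptions: the pixel indices, sizes, and function names are made up for illustration, and `np.bincount` stands in for whatever grouping the real histogram code uses.

```python
import timeit

import numpy as np


def pixel_bytes_per_row(pixels, row_sizes, num_pixels):
    """Baseline: accumulate each row's size into its pixel in a Python loop."""
    hist = np.zeros(num_pixels, dtype=np.int64)
    for pix, size in zip(pixels, row_sizes):
        hist[pix] += size
    return hist


def pixel_bytes_grouped(pixels, fixed_bytes_per_row, var_sizes, num_pixels):
    """Per-pixel totals: row count times fixed width, plus summed variable sizes."""
    counts = np.bincount(pixels, minlength=num_pixels)
    var_totals = np.bincount(pixels, weights=var_sizes, minlength=num_pixels)
    return counts * fixed_bytes_per_row + var_totals.astype(np.int64)


rng = np.random.default_rng(0)
num_pixels, num_rows = 12, 20_000
pixels = rng.integers(0, num_pixels, size=num_rows)  # hypothetical pixel index per row
var_sizes = rng.integers(0, 40, size=num_rows)       # hypothetical string byte lengths
fixed_bytes_per_row = 12                             # e.g. one float64 plus one int32

row_sizes = var_sizes + fixed_bytes_per_row
baseline = pixel_bytes_per_row(pixels, row_sizes, num_pixels)
grouped = pixel_bytes_grouped(pixels, fixed_bytes_per_row, var_sizes, num_pixels)
assert np.array_equal(baseline, grouped)  # both approaches must agree

for name, fn in [
    ("per row", lambda: pixel_bytes_per_row(pixels, row_sizes, num_pixels)),
    ("per pixel", lambda: pixel_bytes_grouped(pixels, fixed_bytes_per_row, var_sizes, num_pixels)),
]:
    print(f"{name}: {timeit.timeit(fn, number=3):.4f} s")
```

Checking that both versions produce the same histogram before timing them guards against a fast but wrong result. The timings will vary by machine and data shape, so they should be measured on representative inputs.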

Labels

performance: For slow queries or compute bottlenecks

Projects

Status
In Progress
