Offer methods for Dataset to cover the most common mechanisms for moving data between partitions #245

Open
@karlhigley

Description

The proposed methods would be shuffle_by_keys, sort_by_keys, and group_by_keys. Right now, we only have shuffle_by_keys.

@rjzamora says:

exposing a clear space for documentation is probably the best reason to add it. That documentation should also clarify that these global operations (requiring inter-partition data movement) should be avoided unless absolutely necessary 🙂
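
For readers unfamiliar with the three operations, here is a minimal sketch of what each one would mean semantically. It is not the library's implementation: it assumes a hypothetical stand-in where a dataset is a plain list of partitions and each partition is a list of row dicts. The function signatures, the `value` and `agg` parameters, and the sample data are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical stand-in: a "dataset" is a list of partitions,
# and each partition is a list of row dicts.


def _key(row, keys):
    """Build a hashable, sortable key tuple from the given columns."""
    return tuple(row[k] for k in keys)


def shuffle_by_keys(partitions, keys, npartitions=None):
    """Move rows so that all rows sharing a key land in the same partition."""
    n = npartitions or len(partitions)
    out = [[] for _ in range(n)]
    for part in partitions:
        for row in part:
            out[hash(_key(row, keys)) % n].append(row)
    return out


def sort_by_keys(partitions, keys):
    """Globally sort rows, then split them into contiguous ordered partitions."""
    rows = sorted((r for p in partitions for r in p), key=lambda r: _key(r, keys))
    n = len(partitions)
    size = -(-len(rows) // n) if rows else 0  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n)]


def group_by_keys(partitions, keys, value, agg=sum):
    """Shuffle first, then aggregate each partition independently."""
    result = []
    for part in shuffle_by_keys(partitions, keys):
        groups = defaultdict(list)
        for row in part:
            groups[_key(row, keys)].append(row[value])
        result.append({k: agg(v) for k, v in groups.items()})
    return result


partitions = [
    [{"user": 1, "clicks": 3}, {"user": 2, "clicks": 1}],
    [{"user": 1, "clicks": 4}, {"user": 3, "clicks": 2}],
]
shuffled = shuffle_by_keys(partitions, ["user"])
sorted_parts = sort_by_keys(partitions, ["user"])
totals = group_by_keys(partitions, ["user"], "clicks")
```

In this sketch every operation touches every row and may move it to a different partition, which is the inter-partition data movement the comment above warns about.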

Labels: api (Changes or tweaks to the Core API), chore (Maintenance for the repository), clean up
