Optimized shuffle for typesize=12 #649

@froody

Describe the bug
This is a feature request. I'm happy to contribute code, but I don't know if my solutions will be optimal. I compress a lot of data with typesize=12, and with shuffle enabled, decompression falls back to unshuffle_generic, which is slow. It would be nice to have 12-byte variants of all the platform-specific shuffle code. They might not be as fast as a power-of-2 typesize, but they would still be much faster than generic.
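To make the request concrete, here is a minimal scalar sketch of what the byte shuffle layout looks like and how a fixed 12-byte unshuffle differs from a run-time-typesize one. All function names (`sketch_shuffle`, `sketch_unshuffle_generic`, `sketch_unshuffle12`) are hypothetical and written for illustration; this is not the c-blosc2 source, and a real platform-specific variant would use SIMD intrinsics instead of plain loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference byte shuffle: byte j of element i is stored in plane j at
 * position i, so the output holds typesize planes of nelem bytes each. */
static void sketch_shuffle(size_t typesize, size_t nelem,
                           const uint8_t *src, uint8_t *dest) {
    for (size_t j = 0; j < typesize; j++)
        for (size_t i = 0; i < nelem; i++)
            dest[j * nelem + i] = src[i * typesize + j];
}

/* Run-time typesize: the inner trip count is unknown to the compiler. */
static void sketch_unshuffle_generic(size_t typesize, size_t nelem,
                                     const uint8_t *src, uint8_t *dest) {
    for (size_t i = 0; i < nelem; i++)
        for (size_t j = 0; j < typesize; j++)
            dest[i * typesize + j] = src[j * nelem + i];
}

/* Hypothetical 12-byte variant: the inner loop has a constant trip count
 * of 12, which gives the compiler the chance to unroll it. */
static void sketch_unshuffle12(size_t nelem,
                               const uint8_t *src, uint8_t *dest) {
    const uint8_t *plane[12];
    for (size_t j = 0; j < 12; j++)
        plane[j] = src + j * nelem;   /* start of each byte plane */
    for (size_t i = 0; i < nelem; i++) {
        uint8_t *out = dest + i * 12; /* start of element i */
        for (size_t j = 0; j < 12; j++)
            out[j] = plane[j][i];
    }
}
```

Both unshuffle functions produce the same bytes; the only difference is that the second one bakes the typesize in at compile time. How much that helps in practice is not measured here.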

To Reproduce
Decompress any data that was compressed using shuffle with typesize=12, and observe that unshuffle_generic dominates the overall time.

Expected behavior
Unshuffle for typesize=12 should be approximately as fast as for typesize=8 or typesize=16.

Additional context
I think it would be nice to support all possible typesizes up to some limit, since for most of them the speedup over the generic implementation could be quite significant.
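One way to cover many typesizes without hand-writing each one is to stamp out fixed-size variants from a single template and dispatch on typesize, falling back to the generic loop otherwise. The sketch below assumes this approach in plain scalar C; the macro `SKETCH_DEFINE_UNSHUFFLE`, the function names, and the chosen sizes (3, 6, 12) are all hypothetical and not part of c-blosc2.

```c
#include <stddef.h>
#include <stdint.h>

/* Generates a scalar unshuffle with a compile-time typesize N. */
#define SKETCH_DEFINE_UNSHUFFLE(N)                                      \
    static void sketch_unshuffle_##N(size_t nelem, const uint8_t *src,  \
                                     uint8_t *dest) {                   \
        for (size_t i = 0; i < nelem; i++)                              \
            for (size_t j = 0; j < (N); j++)                            \
                dest[i * (N) + j] = src[j * nelem + i];                 \
    }

SKETCH_DEFINE_UNSHUFFLE(3)
SKETCH_DEFINE_UNSHUFFLE(6)
SKETCH_DEFINE_UNSHUFFLE(12)

/* Fallback for typesizes without a dedicated variant. */
static void sketch_unshuffle_any(size_t typesize, size_t nelem,
                                 const uint8_t *src, uint8_t *dest) {
    for (size_t i = 0; i < nelem; i++)
        for (size_t j = 0; j < typesize; j++)
            dest[i * typesize + j] = src[j * nelem + i];
}

/* Returns 1 if a fixed-size variant handled the call, 0 if the
 * generic fallback was used. */
static int sketch_unshuffle_dispatch(size_t typesize, size_t nelem,
                                     const uint8_t *src, uint8_t *dest) {
    switch (typesize) {
    case 3:  sketch_unshuffle_3(nelem, src, dest);  return 1;
    case 6:  sketch_unshuffle_6(nelem, src, dest);  return 1;
    case 12: sketch_unshuffle_12(nelem, src, dest); return 1;
    default: sketch_unshuffle_any(typesize, nelem, src, dest); return 0;
    }
}
```

The same dispatch shape would apply to SIMD variants; only the bodies generated per typesize would change.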

Here's my attempt at avx512-unshuffle: #648
