recode! slow when src is AbstractArray #354

Open
@ahnlabb

In my answer to a recent question on StackOverflow, I noted that recode(a, pairs...) and recode!(a, pairs...), when a is a Vector, are an order of magnitude slower than unwrap.(recode(a, pairs...)). The problem is even worse on master (perhaps because of #345). The following benchmark shows a slowdown of roughly three orders of magnitude:

using BenchmarkTools, CategoricalArrays, Random
Random.seed!(596551)

a = CategoricalArray(rand(string.('X':'Z'), 100000))

@btime unwrap.(recode($a, "X"=>1, "Y"=>2, "Z"=>3));
@btime recode!($(similar(a, Int)), $a, "X"=>1, "Y"=>2, "Z"=>3);
@btime recode!($(similar(a, Int)), $(unwrap.(a)), "X"=>1, "Y"=>2, "Z"=>3);
@btime recode(unwrap.($a), "X"=>1, "Y"=>2, "Z"=>3);

Result:

  178.212 μs (47 allocations: 1.15 MiB)
  130.184 ms (1098374 allocations: 32.00 MiB)
  155.115 ms (998374 allocations: 28.95 MiB)
  154.162 ms (998378 allocations: 30.48 MiB)

Since recode! also allocates far more than unwrap.(recode(a)) when dest is a Vector (32.00 MiB versus 1.15 MiB in the benchmark above), it seems reasonable to replace it with:

dest .= unwrap.(recode(src, default, pairs...))

for this case.
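As a rough illustration of that replacement, here is a minimal sketch of a wrapper for the case where src is a CategoricalArray and dest is a plain Vector. The name recode_unwrapped! is hypothetical and not part of CategoricalArrays; the sketch omits the default argument and assumes that src has no missing values and that every level is covered by a pair.

```julia
using CategoricalArrays

# Hypothetical helper, not part of CategoricalArrays: recode the categorical
# `src` first (the fast path in the benchmark above), then unwrap the result
# into the plain destination vector.
# Assumes no missing values and that every level of `src` is covered by `pairs`.
function recode_unwrapped!(dest::AbstractVector, src::CategoricalArray, pairs::Pair...)
    dest .= unwrap.(recode(src, pairs...))
    return dest
end

a = categorical(["X", "Y", "Z", "X"])
dest = Vector{Int}(undef, length(a))
recode_unwrapped!(dest, a, "X" => 1, "Y" => 2, "Z" => 3)
```

This still allocates the intermediate recoded array, so it trades memory for speed compared with a true in-place method.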

There is likely more room for optimization.
