Here is the experiment.
Given the dataframe and functions f0, f1 below:
using DataFrames, DataFramesMeta, StatsBase
df = DataFrame(a=1:10_000) # I know the df is small but big enough to show the issue
f0(df::DataFrame) = begin
    @chain df begin
        @rtransform(:b = :a * 10)
        @rtransform(:c = mean(:b))
        @rtransform(:d = :b - :c)
        @select(:a, :d)
    end
end
f1(df::DataFrame) = begin
    @chain df begin
        @rtransform @astable begin
            b = :a * 10
            c = mean(b)
            :d = b - c
        end
    end
end
We get an improvement in performance in f1, which is what one would expect given that it does not need to create columns b, c.
@time f0(df)
0.001146 seconds (728 allocations: 898.516 KiB)
@time f1(df)
0.000503 seconds (161 allocations: 243.609 KiB)
However, if one runs the same code outside a function (see below), it becomes 46 times slower, making it unusable for larger datasets.
@time @chain df begin
    @rtransform @astable begin
        b = :a * 10
        c = mean(b)
        :d = b - c
    end
end
-> 2.331518 seconds (335.93 k allocations: 13.028 MiB, 4.69% compilation time)
@time @chain df begin
    @rtransform(:b = :a * 10)
    @rtransform(:c = mean(:b))
    @rtransform(:d = :b - :c)
    @select(:a, :d)
end
-> 0.056910 seconds (34.81 k allocations: 3.137 MiB, 95.06% compilation time)
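For anyone trying to reproduce this, a minimal sketch of how the run time could be separated from one-off compilation cost is to wrap the pipeline in a function, call it once to warm up, and time the second call. The name `astable_pipeline` is hypothetical and the body is the same as f1 above. I am assuming the top-level slowdown comes from the macro-generated code being processed again on every evaluation, which I have not verified.

```julia
using DataFrames, DataFramesMeta, StatsBase

df = DataFrame(a = 1:10_000)

# Hypothetical wrapper (same body as f1): defining it once lets Julia
# compile the generated code a single time and reuse it on later calls.
function astable_pipeline(df::DataFrame)
    @chain df begin
        @rtransform @astable begin
            b = :a * 10
            c = mean(b)
            :d = b - c
        end
    end
end

astable_pipeline(df)        # warm-up call, includes compilation
@time astable_pipeline(df)  # second call, mostly run time
```

If the second timing is close to the f1 figure above, the gap seen at top level is overhead from evaluating the macro expression in global scope rather than from the transformation itself.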
Thanks for the great work :)