
Very slow @astable macro outside a function #363

Open
@mbataillou


Here is the experiment.

Given the DataFrame and the functions `f0` and `f1` below:

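```julia
using DataFrames, DataFramesMeta, StatsBase

df = DataFrame(a=1:10_000)  # I know the df is small, but it is big enough to show the issue

# Version that materializes the intermediate columns :b and :c
f0(df::DataFrame) = begin
    @chain df begin
        @rtransform(:b = :a * 10)
        @rtransform(:c = mean(:b))   # row-wise, so :c has the same value as :b in each row
        @rtransform(:d = :b - :c)
        @select(:a, :d)
    end
end

# Version where b and c are only local variables inside @astable
f1(df::DataFrame) = begin
    @chain df begin
        @rtransform @astable begin
            b = :a * 10
            c = mean(b)
            :d = b - c
        end
    end
end
```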

As expected, `f1` is faster, since it does not need to create the intermediate columns `b` and `c`:

```
@time f0(df)
  0.001146 seconds (728 allocations: 898.516 KiB)
@time f1(df)
  0.000503 seconds (161 allocations: 243.609 KiB)
```
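The saving comes from how `@astable` works: per the DataFramesMeta documentation, it gathers the `:d = ...` assignments into a `NamedTuple` and spreads that into columns via `AsTable`, so `b` and `c` only ever exist as local variables. A rough hand-written analogue in plain DataFrames.jl is sketched below. It is not the macro's literal expansion, `f1_manual` and `rowfun` are hypothetical names, and it uses `Statistics.mean` instead of the StatsBase import.

```julia
using DataFrames, Statistics

# Hypothetical hand-written analogue of f1 (not DataFramesMeta's actual expansion):
# a row-wise function keeps b and c as locals and returns only d as a NamedTuple.
function f1_manual(df::DataFrame)
    rowfun(a) = begin
        b = a * 10
        c = mean(b)      # mean of a single number is that number (as a Float64)
        (d = b - c,)     # one-field NamedTuple; AsTable turns its fields into columns
    end
    # transform keeps the existing column :a and adds :d
    transform(df, :a => ByRow(rowfun) => AsTable)
end

df = DataFrame(a = 1:10_000)
out = f1_manual(df)
```

As in `f1`, no `b` or `c` column is ever allocated; only the returned `d` values are collected into a new column.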

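However, if the same code is run outside a function (see below), it becomes dramatically slower: about 2.3 seconds with `@astable`, versus about 0.06 seconds for the equivalent chain without it, i.e. roughly 40 times slower. This makes it unusable for larger datasets.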

```julia
@time @chain df begin
    @rtransform @astable begin
        b = :a * 10
        c = mean(b)
        :d = b - c
    end
end
```
```
2.331518 seconds (335.93 k allocations: 13.028 MiB, 4.69% compilation time)
```

For comparison, the same computation without `@astable`:

```julia
@time @chain df begin
    @rtransform(:b = :a * 10)
    @rtransform(:c = mean(:b))
    @rtransform(:d = :b - :c)
    @select(:a, :d)
end
```
```
0.056910 seconds (34.81 k allocations: 3.137 MiB, 95.06% compilation time)
```
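To check whether the top-level cost recurs on every evaluation or is a one-off, one option is to time the same top-level expression twice and compare it with a function wrapper called after a warm-up. The sketch below only measures; it does not establish the cause. It assumes the `@rtransform df @astable begin ... end` form (without `@chain`), uses `@elapsed` instead of `@time` so the numbers can be compared directly, and `astable_fn`, `t_top1`, `t_top2`, and `t_fn` are hypothetical names.

```julia
using DataFrames, DataFramesMeta, Statistics

df = DataFrame(a = 1:10_000)

# Hypothetical wrapper: the same @astable computation inside a function.
astable_fn(df) = @rtransform df @astable begin
    b = :a * 10
    c = mean(b)
    :d = b - c
end

# Two separate top-level evaluations of the same expression.
t_top1 = @elapsed @rtransform df @astable begin
    b = :a * 10
    c = mean(b)
    :d = b - c
end
t_top2 = @elapsed @rtransform df @astable begin
    b = :a * 10
    c = mean(b)
    :d = b - c
end

astable_fn(df)                  # warm-up call: includes compilation
t_fn = @elapsed astable_fn(df)  # times the already-compiled function

println((top_level_1 = t_top1, top_level_2 = t_top2, in_function = t_fn))
```

If both top-level timings stay large while `t_fn` is small, the slowdown is tied to evaluating the macro at top level rather than to the transformation itself.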

Thanks for the great work :)
