In some practical cases SentinelVector is much slower than Vector. For example, for the data tested in https://bkamins.github.io/julialang/2022/12/23/duckdb.html we have:
julia> summary(posts)
"42710197×3 DataFrame"
julia> typeof.(eachcol(posts))
3-element Vector{DataType}:
SentinelArrays.ChainedVector{Union{Missing, Int64}, SentinelArrays.SentinelVector{Int64, Int64, Missing, Vector{Int64}}}
SentinelArrays.ChainedVector{Union{Missing, Int64}, SentinelArrays.SentinelVector{Int64, Int64, Missing, Vector{Int64}}}
SentinelArrays.ChainedVector{Union{Missing, Int64}, SentinelArrays.SentinelVector{Int64, Int64, Missing, Vector{Int64}}}
julia> @time dropmissing(posts);
0.819397 seconds (137 allocations: 1.822 GiB)
julia> @time dropmissing(copy(posts));
0.560146 seconds (130 allocations: 2.657 GiB)
As you can see, it is faster to copy the data frame (which changes the sentinel vectors to plain Vector columns) and then call dropmissing than to call dropmissing directly.
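To make the comparison easier to repeat, here is a minimal sketch of a timing helper. The name compare_dropmissing is hypothetical (it is not part of DataFrames.jl or SentinelArrays.jl), and the small demo frame is a synthetic stand-in for posts that uses ordinary Vector columns, so it only shows the call pattern and is not expected to reproduce the slowdown.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl or SentinelArrays.jl):
# time dropmissing on df directly and on copy(df), as in the report above.
function compare_dropmissing(df::AbstractDataFrame)
    # Warm-up calls so compilation time is not included in the measurements.
    dropmissing(df)
    dropmissing(copy(df))
    direct = @elapsed dropmissing(df)
    via_copy = @elapsed dropmissing(copy(df))
    return (direct = direct, via_copy = via_copy)
end

# Small synthetic stand-in for posts; these columns are plain Vectors,
# so no difference between the two paths is expected here.
demo = DataFrame(a = [1, missing, 3], b = [missing, 2, 3])
timings = compare_dropmissing(demo)
println(timings)
```

Calling compare_dropmissing(posts) on the data frame from the linked post should give timings comparable to the two @time results above, assuming the columns are still ChainedVectors of SentinelVectors.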