Skip to content

sum / reduce(+, …) over symbolic vectors is O(n²); route to add_worker for O(n) #983

Description

@oameye

Building a big symbolic sum by reducing a vector of terms is quadratic right now. sum(v) and reduce(+, v) go through Base's pairwise binary +, and every binary + rebuilds the whole Add dictionary, so folding n terms copies 1 + 2 + … + n entries.

The interesting part is that the O(n) fix already exists: add_worker (with the task-local AddWorkerBuffer) threads one dictionary through the entire reduction in a single pass. But today it's only reachable through the variadic +, i.e. a literal a + b + c + … chain. A sum/reduce over a vector never gets there. add_worker already takes a collection (it's called with a Vector internally in code.jl and polyform.jl), so the only thing missing is the dispatch.

Here's the gap, summing n distinct terms k * xₖ at the Num layer (fresh variables per row, min of many runs):

n foldl(+, v) sum(v) add_worker(v) speedup
8 26 µs 25 µs 10 µs ~2.6×
50 342 µs 348 µs 48 µs ~7×
100 1.16 ms 1.14 ms 82 µs ~14×
250 6.87 ms 7.86 ms 202 µs ~34×
500 21.8 ms 16.6 ms 270 µs ~70×
1000 70.9 ms 71.6 ms 598 µs ~119×

foldl/sum grow quadratically (~61× for 10× the terms), add_worker grows linearly. The raw BasicSymbolic layer is the same story.

The straightforward fix is to route the collection reductions to add_worker, e.g. in addsub.jl:

function Base.sum(a::AbstractArray{S}) where {S <: NonTreeSym}
    isempty(a) && return zero_of_vartype(vartype(S))
    return add_worker(vartype(S), a)
end
function Base.sum(f, a::AbstractArray{S}) where {S <: NonTreeSym}
    isempty(a) && return zero_of_vartype(vartype(S))
    return add_worker(vartype(S), map(f, a))
end
# Vector (not AbstractArray) avoids an ambiguity with StaticArrays.reduce.
function Base.reduce(::typeof(+), a::Vector{S}) where {S <: NonTreeSym}
    isempty(a) && return zero_of_vartype(vartype(S))
    return add_worker(vartype(S), a)
end

plus the Num mirror in Symbolics (unwrap, add_worker, rewrap; vartype taken from the first symbolic element so a vector mixing constants and symbolics still works). This is byte-identical to foldl(+, v) since add_worker is what a + b + c already calls, so it's purely a complexity change. Checking locally on SymbolicUtils 4.35.5 / Symbolics 7.28.1: results match foldl for distinct, collecting, mixed-constant and all-constant vectors up to n=1000; detect_ambiguities stays at 485 with the methods added; and sum(M; dims=1) still goes through Base's array path, so matrix reductions are untouched.

Since this changes Base.sum/reduce behavior I wanted to check here before sending a PR. If you'd rather not touch Base, the lower-risk alternative is to just document/export add_worker (or a thin fast_sum) as public API and let callers opt in. Happy to open the PR either way, whichever you prefer. One scoping note: this covers SymReal/SafeReal only, TreeReal falls through to the default.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions