Building a big symbolic sum by reducing a vector of terms is quadratic right now. sum(v) and reduce(+, v) go through Base's pairwise binary +, and every binary + rebuilds the whole Add dictionary, so folding n terms copies 1 + 2 + … + n entries.
The interesting part is that the O(n) fix already exists: add_worker (with the task-local AddWorkerBuffer) threads one dictionary through the entire reduction in a single pass. But today it's only reachable through the variadic +, i.e. a literal a + b + c + … chain. A sum/reduce over a vector never gets there. add_worker already takes a collection (it's called with a Vector internally in code.jl and polyform.jl), so the only thing missing is the dispatch.
Here's the gap, summing n distinct terms k * xₖ at the Num layer (fresh variables per row, min of many runs):
| n |
foldl(+, v) |
sum(v) |
add_worker(v) |
speedup |
| 8 |
26 µs |
25 µs |
10 µs |
~2.6× |
| 50 |
342 µs |
348 µs |
48 µs |
~7× |
| 100 |
1.16 ms |
1.14 ms |
82 µs |
~14× |
| 250 |
6.87 ms |
7.86 ms |
202 µs |
~34× |
| 500 |
21.8 ms |
16.6 ms |
270 µs |
~70× |
| 1000 |
70.9 ms |
71.6 ms |
598 µs |
~119× |
foldl/sum grow quadratically (~61× for 10× the terms), add_worker grows linearly. The raw BasicSymbolic layer is the same story.
The straightforward fix is to route the collection reductions to add_worker, e.g. in addsub.jl:
function Base.sum(a::AbstractArray{S}) where {S <: NonTreeSym}
isempty(a) && return zero_of_vartype(vartype(S))
return add_worker(vartype(S), a)
end
function Base.sum(f, a::AbstractArray{S}) where {S <: NonTreeSym}
isempty(a) && return zero_of_vartype(vartype(S))
return add_worker(vartype(S), map(f, a))
end
# Vector (not AbstractArray) avoids an ambiguity with StaticArrays.reduce.
function Base.reduce(::typeof(+), a::Vector{S}) where {S <: NonTreeSym}
isempty(a) && return zero_of_vartype(vartype(S))
return add_worker(vartype(S), a)
end
plus the Num mirror in Symbolics (unwrap, add_worker, rewrap; vartype taken from the first symbolic element so a vector mixing constants and symbolics still works). This is byte-identical to foldl(+, v) since add_worker is what a + b + c already calls, so it's purely a complexity change. Checking locally on SymbolicUtils 4.35.5 / Symbolics 7.28.1: results match foldl for distinct, collecting, mixed-constant and all-constant vectors up to n=1000; detect_ambiguities stays at 485 with the methods added; and sum(M; dims=1) still goes through Base's array path, so matrix reductions are untouched.
Since this changes Base.sum/reduce behavior I wanted to check here before sending a PR. If you'd rather not touch Base, the lower-risk alternative is to just document/export add_worker (or a thin fast_sum) as public API and let callers opt in. Happy to open the PR either way, whichever you prefer. One scoping note: this covers SymReal/SafeReal only, TreeReal falls through to the default.
Building a big symbolic sum by reducing a vector of terms is quadratic right now.
sum(v)andreduce(+, v)go through Base's pairwise binary+, and every binary+rebuilds the wholeAdddictionary, so foldingnterms copies1 + 2 + … + nentries.The interesting part is that the O(n) fix already exists:
add_worker(with the task-localAddWorkerBuffer) threads one dictionary through the entire reduction in a single pass. But today it's only reachable through the variadic+, i.e. a literala + b + c + …chain. Asum/reduceover a vector never gets there.add_workeralready takes a collection (it's called with aVectorinternally incode.jlandpolyform.jl), so the only thing missing is the dispatch.Here's the gap, summing
ndistinct termsk * xₖat theNumlayer (fresh variables per row, min of many runs):foldl(+, v)sum(v)add_worker(v)foldl/sumgrow quadratically (~61× for 10× the terms),add_workergrows linearly. The rawBasicSymboliclayer is the same story.The straightforward fix is to route the collection reductions to
add_worker, e.g. inaddsub.jl:plus the
Nummirror in Symbolics (unwrap,add_worker, rewrap; vartype taken from the first symbolic element so a vector mixing constants and symbolics still works). This is byte-identical tofoldl(+, v)sinceadd_workeris whata + b + calready calls, so it's purely a complexity change. Checking locally on SymbolicUtils 4.35.5 / Symbolics 7.28.1: results matchfoldlfor distinct, collecting, mixed-constant and all-constant vectors up to n=1000;detect_ambiguitiesstays at 485 with the methods added; andsum(M; dims=1)still goes through Base's array path, so matrix reductions are untouched.Since this changes
Base.sum/reducebehavior I wanted to check here before sending a PR. If you'd rather not touchBase, the lower-risk alternative is to just document/exportadd_worker(or a thinfast_sum) as public API and let callers opt in. Happy to open the PR either way, whichever you prefer. One scoping note: this coversSymReal/SafeRealonly,TreeRealfalls through to the default.