Skip to content

Commit 56ffd5f

Browse files
perf: add parallel branch to add_subtract_columns function in the numerical_utils module (#1006)
# Rationale for this change The performance of the `add_subtract_columns` function in the `numerical_utils` module can be improved with a parallel branch of execution when the project is built with Rayon. The parallel branch also eliminates the cost of temporarily adding the `Scalar` vectors to the heap. See note below on why the non-Rayon build needs to keep the temporary vectors on the heap. On the Multi-A100 VM with a table size of `1,000,000`, the current implementation took `11.83`ms in the Sum Count benchmark query. <img width="1440" height="179" alt="image" src="https://github.com/user-attachments/assets/06fefdcd-6311-4124-8186-f36fc43eff61" /> With this PR, the same call is now `7.78`ms - a `1.52`x improvement <img width="1440" height="179" alt="image" src="https://github.com/user-attachments/assets/3eb1ad40-6752-497e-93d0-fec921fe2717" /> NOTE, if we eliminate the `Scalar` memory allocation from the original code and use `scalar_at(i).unwrap()`, the performance regresses. For example, in the Filter benchmark query, making a temporary copy of the `Scalars` on the heap took `11.63`ms <img width="1258" height="120" alt="image" src="https://github.com/user-attachments/assets/d7b8023b-823c-4ce0-becf-c778d16f43d2" /> Updating the code to eliminate the temporary copies of the vectors, and using `scalars_at(i).unwrap()` took `120.61`ms <img width="1691" height="125" alt="image" src="https://github.com/user-attachments/assets/ed3cab66-df43-4ae7-b2f8-01beb71d2255" /> Adding an `unsafe_scalar_at` function that doesn't return an `Option` did not help either, the same section of code using `unsafe_scalar_at(i)` took `106.31`ms <img width="1662" height="125" alt="image" src="https://github.com/user-attachments/assets/99a14ce2-974d-4d28-a65d-3661b5cc199d" /> # What changes are included in this PR? - A parallel branch for Rayon builds is added to the `add_subtract_columns` function in the `numerical_utils` module # Are these changes tested? Yes
2 parents e005c43 + eb03e13 commit 56ffd5f

File tree

1 file changed

+28
-9
lines changed

1 file changed

+28
-9
lines changed

crates/proof-of-sql/src/sql/proof_exprs/numerical_util.rs

Lines changed: 28 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,8 @@ use num_traits::{NumCast, PrimInt};
1414
use rayon::iter::{IndexedParallelIterator, IntoParallelRefMutIterator, ParallelIterator};
1515

1616
/// Add or subtract two columns together.
17+
/// # Panics
18+
/// Panics if: `lhs` and `rhs` are not of the same length
1719
#[tracing::instrument(level = "debug", skip_all)]
1820
pub(crate) fn add_subtract_columns<'a, S: Scalar>(
1921
lhs: Column<'a, S>,
@@ -27,16 +29,33 @@ pub(crate) fn add_subtract_columns<'a, S: Scalar>(
2729
lhs_len == rhs_len,
2830
"lhs and rhs should have the same length"
2931
);
30-
let lhs_scalar = lhs.to_scalar();
31-
let rhs_scalar = rhs.to_scalar();
32-
let result = alloc.alloc_slice_fill_with(lhs_len, |i| {
33-
if is_subtract {
34-
lhs_scalar[i] - rhs_scalar[i]
35-
} else {
36-
lhs_scalar[i] + rhs_scalar[i]
32+
if_rayon!(
33+
{
34+
let result = alloc.alloc_slice_fill_with(lhs_len, |_| S::ZERO);
35+
if is_subtract {
36+
result.par_iter_mut().enumerate().for_each(|(i, val)| {
37+
*val = lhs.scalar_at(i).unwrap() - rhs.scalar_at(i).unwrap();
38+
});
39+
} else {
40+
result.par_iter_mut().enumerate().for_each(|(i, val)| {
41+
*val = lhs.scalar_at(i).unwrap() + rhs.scalar_at(i).unwrap();
42+
});
43+
}
44+
result
45+
},
46+
{
47+
let lhs_scalar = lhs.to_scalar();
48+
let rhs_scalar = rhs.to_scalar();
49+
let result = alloc.alloc_slice_fill_with(lhs_len, |i| {
50+
if is_subtract {
51+
lhs_scalar[i] - rhs_scalar[i]
52+
} else {
53+
lhs_scalar[i] + rhs_scalar[i]
54+
}
55+
});
56+
result
3757
}
38-
});
39-
result
58+
)
4059
}
4160

4261
/// Multiply two columns together.

0 commit comments

Comments
 (0)