Skip to content
Open
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 85 additions & 7 deletions sdp/challenger_sdp.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,92 @@
import math
import numpy as np
from scipy.special import lambertw
from scipy.fft import next_fast_len
from scipy.fft._pocketfft.basic import r2c, c2r


def _compute_block_size(m, n, block_size=None):
"""
Return a block size for the overlap-add method.
"""
if block_size is None:
# Find optimal block_size based on m and n
if m >= n / 2:
block_size = n # i.e. no blocking
else:
overlap = m - 1
opt_size = -overlap * lambertw(-1 / (2 * math.e * overlap), k=-1).real
block_size = next_fast_len(math.ceil(opt_size), real=True)

block_size = max(block_size, m)

return min(block_size, n)


def _rfft_irfft_r2c2r_block(Q, T, block_size):
Copy link
Collaborator Author

@NimaSarajpoor NimaSarajpoor Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should choose a better name for the function. This function performs convolution between each block chunks of T and Q. So, how about _fftconvolve_block ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I see that this functions has parts in the end that look VERY similar to https://github.com/stumpy-dev/sliding_dot_product/blob/main/sdp/pocketfft_r2c_c2r_sdp.py

It feels like we should consolidate and:

  1. Create a function that determines the block/step sizes
  2. Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing
  3. Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

Sounds better 👍

Copy link
Collaborator Author

@NimaSarajpoor NimaSarajpoor Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw

  1. Create a function that determines the block/step sizes

Done.

  1. Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing

Note: In oaconvolve, chunk_size is = block_size - (len(Q) - 1). The array T will be chunked by the chunk_size, and each chunk will be padded with len(Q)-1 zeros. So, the length of each chunk after padding is block_size.

Something like the following?

def chunk_generator(arr, chunk_size):
    n = len(arr)
    for i in range(0, n, chunk_size):
        yield arr[i: i+ chunk_size]

I am curious to know how you are planning to use this. Are you thinking about handling very large arrays in the future?

Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

It feels like we should consolidate

Should we return convolution instead? I've revised the module pocketfft_r2c_c2r_sdp, and it now has the function _pocketfft_convolve

I think the convolution is the common component.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like the following?

Instead of passing around arrays, I am thinking that the generator would yield the start_idx and stop_idx for the array

I am curious to know how you are planning to use this.

I am still not clear on how oaconvolve works but I am thinking that the _pocketfft_convolve is generic for ANY Q and T and then your generator gives you the appropriate index ranges to process. Without understanding everything, I could be giving, admittedly, bad advice here.

Should we return convolution instead? I've revised the module pocketfft_r2c_c2r_sdp, and it now has the function _pocketfft_convolve

Remind me, what is the difference between "convolution" and "sliding dot product"? Is the sliding dot product simply a slice of the convolution? If so, then, yes, perhaps we should maybe add a docstring to _pocketfft_convolve to explain how one gets SDP from this convolution (and also how this convolution compares to scipy.signal.convolve).

Copy link
Collaborator Author

@NimaSarajpoor NimaSarajpoor Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not clear on how oaconvolve works

Created PR #36 to help us with that.

and then your generator gives you the appropriate index ranges to process. Without understanding everything, I could be giving, admittedly, bad advice here.

overlap-add breaks down T into NON-overlapping chunks, and call RFFT on all chunks at once. So, I think we should check if r2c on N arrays with same size in one call is faster than N calls of r2c, one call per array. I guess the former should be faster because it should use the same transformation (twiddle factors). If that is true (we should check), then maybe we should pass a bunch of ranges to be processed together. (And, therefore, maybe we do batch processing when T is VERY large, where each batch contains a set of ranges ??)

Remind me, what is the difference between "convolution" and "sliding dot product"? Is the sliding dot product simply a slice of the convolution?

It is simply that.

If so, then, yes, perhaps we should maybe add a docstring to _pocketfft_convolve to explain how one gets SDP from this convolution (and also how this convolution compares to scipy.signal.convolve).

Going to keep this comment open, and will get back to this after finalizing #36, which should help us get some clarity.

m = Q.shape[0]
n = T.shape[0]
T_step = block_size - (m - 1)
n_step = math.ceil(n / T_step)
last_chunk_start = (n_step - 1) * T_step

tmp = np.empty((n_step + 1, block_size), dtype=np.float64)

# fill with T, block-wise
tmp[: n_step - 1, :T_step] = T[:last_chunk_start].reshape(n_step - 1, T_step)
tmp[: n_step - 1, T_step:] = 0.0
tmp[n_step - 1, : n - last_chunk_start] = T[last_chunk_start:]
tmp[n_step - 1, n - last_chunk_start :] = 0.0

# fill with Q[::-1]
tmp[n_step, :m] = Q[::-1]
tmp[n_step, m:] = 0.0

fft_2d = r2c(True, tmp, axis=-1)

return c2r(False, np.multiply(fft_2d[:-1], fft_2d[[-1]]), n=block_size)


def _sliding_dot_product_r2c2r(Q, T):
n = len(T)
m = len(Q)
next_fast_n = next_fast_len(n, real=True)

tmp = np.empty((2, next_fast_n))
tmp[0, :m] = Q[::-1]
tmp[0, m:] = 0.0
tmp[1, :n] = T
tmp[1, n:] = 0.0
fft_2d = r2c(True, tmp, axis=-1)

return c2r(False, np.multiply(fft_2d[0], fft_2d[1]), n=next_fast_n)[m - 1 : n]


def _sliding_dot_product(Q, T, block_size):
m = Q.shape[0]
n = T.shape[0]

overlap = m - 1
ret = _rfft_irfft_r2c2r_block(Q, T, block_size)
out = ret[:, :-overlap]
out[1:, :overlap] += ret[:-1, -overlap:]
out = np.reshape(out, (-1,))

return out[m - 1 : n]


def setup(Q, T):
return


def sliding_dot_product(Q, T):
m = len(Q)
l = T.shape[0] - m + 1
out = np.empty(l)
for i in range(l):
out[i] = np.dot(Q, T[i : i + m])
return out
def sliding_dot_product(Q, T, block_size=None):
m = Q.shape[0]
n = T.shape[0]
if m == n:
return np.dot(Q, T)

block_size = _compute_block_size(m, n, block_size=block_size)
if block_size >= n:
return _sliding_dot_product_r2c2r(Q, T)
else:
return _sliding_dot_product(Q, T, block_size)
15 changes: 15 additions & 0 deletions test.py
Original file line number Diff line number Diff line change
Expand Up @@ -210,3 +210,18 @@ def test_pyfftw_sdp_max_n():
np.testing.assert_allclose(comp, ref)

return


def test_oaconvolve_sdp_blocksize():
from sdp.challenger_sdp import sliding_dot_product
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line needs to be modified if, at a later time, we decide to move the proposal to a new file (module).


T = np.random.rand(2**10)
Q = np.random.rand(2**8)
block_size = 2**9

comp = sliding_dot_product(Q, T, block_size=block_size)
ref = naive_sliding_dot_product(Q, T)

np.testing.assert_allclose(comp, ref)

return
Loading