
@neutrinoceros
Owner

Close #85
Incidentally, close #114
I checked that binary size doesn't noticeably increase with this refactor, which addresses the concern I raised in #85.

@neutrinoceros neutrinoceros added this to the Next release milestone Jul 6, 2025
@neutrinoceros neutrinoceros force-pushed the perf/special-case-single-it branch 3 times, most recently from 431356a to 86938ae on July 6, 2025 10:57
@neutrinoceros
Owner Author

neutrinoceros commented Jul 6, 2025

This is stable. Before I undraft, squash and merge, I want to address two points:

@neutrinoceros
Owner Author

I'm actually seeing a 5% performance regression with this patch. I'll re-submit the independent parts of this PR piecemeal, so I can untangle them one at a time.

@neutrinoceros
Owner Author

I think I've been looking at this from the wrong angle. I had assumed that my convolve_once was somehow less efficient than convolve_iteratively, but in fact I'm seeing 5% overhead with both implementations. So either the dispatching itself is generating overhead (which would be pretty surprising), or there's something else I'm missing.
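For readers following along, here is a minimal sketch of the dispatch pattern under discussion. Only the names convolve_once and convolve_iteratively come from this thread; their bodies, plus the helpers convolve_step and convolve, are hypothetical. It assumes the general path starts from an owned copy of the input, and it uses NumPy's np.convolve as a stand-in for the real convolution kernel, which this PR does not show.

```python
import numpy as np


def convolve_step(src: np.ndarray, kernel: np.ndarray, out: np.ndarray) -> None:
    # Hypothetical single convolution pass: reads `src`, writes into `out`.
    # np.convolve(mode="same") stands in for the real (unknown) kernel.
    out[:] = np.convolve(src, kernel, mode="same")


def convolve_once(data: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Single pass: read the input directly and write into a fresh output,
    # so no working copy of `data` is made.
    out = np.empty_like(data)
    convolve_step(data, kernel, out)
    return out


def convolve_iteratively(data: np.ndarray, kernel: np.ndarray, iterations: int) -> np.ndarray:
    # Multi-pass (assumed shape): copy the input into a working buffer so
    # `data` is never modified, then ping-pong between two buffers.
    src = data.copy()
    dst = np.empty_like(data)
    for _ in range(iterations):
        convolve_step(src, kernel, dst)
        src, dst = dst, src  # the latest result is now in `src`
    return src


def convolve(data: np.ndarray, kernel: np.ndarray, iterations: int = 1) -> np.ndarray:
    # Hypothetical dispatcher: special-case iterations == 1 to skip the copy.
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    if iterations == 1:
        return convolve_once(data, kernel)
    return convolve_iteratively(data, kernel, iterations)
```

The sketch only shows where the input copy is saved on the single-iteration path; it does not reproduce or explain the overhead reported in this thread.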

@neutrinoceros
Owner Author

(dispatching at the Python level instead doesn't improve performance)

@neutrinoceros
Owner Author

(forcing inlining on convolve_once and convolve_iteratively doesn't help either)

@neutrinoceros neutrinoceros force-pushed the perf/special-case-single-it branch from d518f4f to 02a7c83 on July 6, 2025 15:12
@neutrinoceros
Owner Author

The good news is that #172 more than compensates for the performance loss from this patch, but I'd still like to figure out why this patch is slower than main.

@neutrinoceros neutrinoceros force-pushed the perf/special-case-single-it branch 2 times, most recently from 55c57d1 to 2df507c on July 7, 2025 15:25
@neutrinoceros neutrinoceros removed this from the Next release milestone Jul 7, 2025
@neutrinoceros neutrinoceros force-pushed the perf/special-case-single-it branch 2 times, most recently from 7b1779c to e471400 on July 8, 2025 10:25
@neutrinoceros neutrinoceros force-pushed the perf/special-case-single-it branch from e471400 to 63b4b16 on July 21, 2025 16:03
@neutrinoceros
Owner Author

Rebased, but I'm still seeing about 2% unexplained overhead.

@neutrinoceros neutrinoceros force-pushed the perf/special-case-single-it branch from 63b4b16 to fb67ea9 on November 18, 2025 08:17


Development

Successfully merging this pull request may close these issues.

PERF: avoiding an array copy for single-pass convolution
RFC: unifying APIs for owned arrays and views

2 participants