All notable changes to this project will be documented in this file.
The largest release since the initial publish: a SIMD backend with runtime CPU dispatch, a real-valued FFT, optional multi-threading, and a unified public API. Treat this as a fresh start — nearly every entry point was renamed.
- The public API was renamed and unified. Every entry point now spells the
precision as the Rust float type and names the FFT algorithm explicitly:
- Complex FFT (split real/imag arrays):
fft_64/fft_32are nowfft_f64_dit/fft_f32_dit. The full-controlfft_64_with_opts_and_plan/fft_32_with_opts_and_planare nowfft_f64_dit_with_planner_and_opts/fft_f32_dit_with_planner_and_opts, and a middlefft_f{64,32}_dit_with_plannertier was added. - Interleaved
ComplexFFT (featurecomplex-nums):fft_64_interleaved/fft_32_interleavedare nowfft_f64_dit_interleaved/fft_f32_dit_interleaved(plus_with_planner/_with_planner_and_optstiers). - Planner types
Planner64/Planner32are nowPlannerDit64/PlannerDit32, and are direction-agnostic and reusable — one instance drives both forward and inverse transforms.
- Complex FFT (split real/imag arrays):
Direction::Reversewas renamed toDirection::Inverse.- The minimum supported Rust version is now declared (
rust-version = "1.88").
- Real-valued FFT —
r2c_fft_f32/r2c_fft_f64and the inversec2r_fft_f32/c2r_fft_f64, producing/consuming the compactN/2 + 1spectrum; roughly 2x faster than a zero-imaginary complex FFT (#105). - Optional multi-threading via the
parallelfeature (Rayon): threaded bit reversal and a cache-oblivious parallel recursive FFT. Options::smallest_parallel_chunk_sizeto tune the parallel split point.- Fused multi-stage codelets — FFT-16 for
f64, FFT-32 forf32(#101). Debug/PartialEq/Eq/Hashimplementations across the public types.
- SIMD backend migrated to
fearless_simdwith runtime CPU-feature dispatch.-C target-cpu=nativeis no longer needed, and the crate now builds on stable Rust — the nightly requirement is gone. - The FFT core is now a recursive, cache-blocked decimation-in-time algorithm; CO-BRAVO provides cache-optimal SIMD bit reversal (#106).
- The public
cobrabit-reversal module, superseded by the internal CO-BRAVO implementation. - The nightly-toolchain requirement.
- Cache-blocked recursive DIT, fused first-stage codelets, FMA butterfly kernels, and SIMD-accelerated CO-BRAVO bit reversal.
- Inverse FFT output ordering and assorted correctness fixes.
- Add DIT FFT algorithm & bit reversal control
- Bump
bytemuckto latest version
- Add example usage of interleaved fft
- Add benchmark using criterion
- Add a more robust round-trip FFT test
- Add git hooks and contributing guide
- Add deinterleaving function
- Add criterion group for forward FFT f32
- Add new python benchmarking "framework"
- Add #![feature(doc_cfg)] to make rustdoc actually accept doc_cfg
- Fix formatting
- Fix formatting
- Fix formatting
- Fix formatting
- Fix docsrs config according to Clippy
tbh not sure if Clippy is right in this case
- Avx512 is now stable, so placate clippy
- Update
criterionbenchmarks
- Vectorize deinterleaving of AoS --> SoA
Use bytemuck + SIMD::deinterleave to rearrange input data from a slice of Complex values into 2 slices of f32 or f64 values
- Put macro definition begind feature flag
- Account for different signal sizes
- Forgot to add benchmark file
- Don't gate feature documentation on docs.rs, we're on nightly anyway
- Undo duplicate docs.rs all-features
-
Make sure benchmark runs
-
Make planner reusable
-
Planner should be re-usable so it can be re-used for FFT's of the same size
-
Add regression tests to make sure
fft_64/fft_32gives the same results asfft_64_with_opts_and_plan/fft_32_with_opts_and_plan -
Move planner completely outside of bench function
-
Avoid cloning twiddles
-
Make examples output time elapsed in nanoseconds
-
Use new de-interleaving function
-
Make required cfg show up on docs.rs
-
Make docs.rs build docs with all features enabled
- Update the normalization section
- Remove index tracker from hot path
- Updates to benchmarks without planners
- Remove
array_chunks - Update pre-commit hook to run clippy everywhere
- Update dependencies and their usage
- Revert "Make sure benchmark runs"
This reverts commit 7011dfc040b866cd773185667915a9920b9c5a80.
- Revert "Forgot to add benchmark file"
This reverts commit b70dd4b318f59af32740c4bd4c9e75a69bcd690b.
- Add a function to separate AoS to SoA
- Add reverse separate fn and add test
- Add fft_*_interleaved impls and tests
- Fix formatting
- Fix formatting
- Fix formatting
- Fixes inverse FFT ouput order issue
-
Advertise runtime feature selection
-
Transition
num-complexdependency to optional -
Make num-complex optional and non-default, and all functions that take and test interleaved FFT are now under the same feature
-
Bump num-complex version 0.4.5 --> 0.4.6
-
Enable num-complex for docs/docs.rs
-
Update github action workflow to run tests for all features
-
Fix formatting in docs to fix links
-
Placate clippy's complaints about assign op
-
Simplify separate_re_im
-
Move twiddles fwd * rev = identity test
-
Testing that all the values in the forward twiddle factors pointwise multiplied by all the values in the inverse twiddle factors is more of a unit test for twiddles. It made no sense to keep it in the planner.
-
The previous commit modifies the planner in a way that this test would no longer pass. We can get the same, good coverage by keeping this test under twiddles.
- Update docs for interleaved fft
-
Add badges for docs and latest published version
-
Add more examples and fix typos in readme/plots
-
Added rust bindings of FFTW as an example, which will be used for benchmarks
-
Add fftw (rust bindings) crate as a dev-dependency
-
Add an example of using pyphastft to reproduce an example use case of FFT from the FFT wikipedia page
-
Fix typos in the README and distinguish pyphastft from phastft in the python benchmarks plots
-
Add todo for improving twiddle generation perf
-
Add normalization & output order info. closes #13
-
Add tests for 32 bit planner
-
Add audio visualization as example for pyphastft
-
Add automatic CPU feature detection
-
Fix formatting with
cargo fmt -
Fix formatting
-
Fix lint issue brought up by CI
-
Fixed
clippy::needless_doctest_mainissue in README
-
Swapped out use of sincos() for more portability across platforms
-
Implement SIMDized twiddle generation using macro
-
Finish adding test macros for f32/f64
-
Make float comparison generic for f32/f64
-
Add f32/f64 tests for twiddles using macros
- Make cobra mod public for fast bit reversal
- Use 16 lanes for f32 in SIMD butterfly kernel
- Run
cargo fmt
- Update README and docs
- Updated benchmark instructions and fixed typos
- Update python benchmark plot title
- Update benchmark plots
- Update benchmark, plotting, and profiling scripts
- Remove duplicate public functions
- Bump black from 24.1.1 to 24.3.0 in /benches
Bumps black from 24.1.1 to 24.3.0.
updated-dependencies:
- dependency-name: black dependency-type: direct:production ...
Signed-off-by: dependabot[bot] support@github.com
- Update example and demo video
- Cleanup docs and examples
- Update codecov uploader to v4
- Update benchmarking readme
- Update benchmark plots
- Update pyphastft benchmark plots
- Release v0.1.1 and update name for python wrapper
- Update README.md
Just skip to features, right away
-
Add cargo config
-
Add rayon back
-
Add chunk size 4 kernel
-
Add opt flags
-
Add check for dups
-
Add COBRA initial impl pre-build twiddle factors
-
Add basic tests for bit reversal
-
Add bash script for profiling and update readme
-
Add a sequential bit reversal permutation
-
Add iterator for roots of unity
-
Add cobra implementation and integrate into FFT
-
Add separate mod for kernels
-
Add separate kernel for size 8 butterflies
-
Add utils
-
Add benchmarks for pyfftw, numpy, and PHFT
-
Add test for the SIMD impl
-
Add github workflow for CI
-
Add status badge
-
Add tests for bit rev; rand signal; touch-up docs
-
Add codecov job and codecov config for 90% target
-
Add code coverage badge
-
Add benchmarking scripts
-
Add dir with instructions & code to run benchmarks
-
Add FFTW3 benchmark
-
Add Options struct
-
Add fftw3 to benchmark script
-
Add a public function that accepts caller-provided
Options -
Add benchmark plots
-
Add plots to readme
-
Add
Plannerand ammend examples, tests, etc. -
Add planner mod
-
Add bench system config info
-
Add license files
-
Add asert to check input length is 2^n
-
Add a regression test to make sure non-power-of-two FFTs are not allowed for the time being
-
Add instructions for reproducing benchmarks
-
Add tests and make API breaking changes
-
Add a test to make sure inverse twiddle factors multiplied by forward twiddle factors always gives 1.0
-
fftnow takes the real/imaginary input and theDirectionof thefft -
Pre-built planners can be used with the
fft_with_opts_and_planfunction -
Add regression test for planner/fft mismatch
-
Add regression docs to explain regression test
-
Add requirements.txt for benchmarks
-
Fixes for profiling
-
Fix for prebuilt twiddles and portable SIMD
-
Fix assertion
-
Fix overhead from bit reversal threading
-
Fix typo in benchmark logging
-
Fix build
-
Fix typos in benchmarking code
-
Fix bug in phast benchmark & add fft direction
-
Fix FFTW benchmark to use wall time
-
Fix perf regression for small input sizes
-
Increased block size increases performance for larger input sizes, but the increase caused a regression on a AMD 7950x machine, for smaller input sizes. The
Plannerneeds to be updated to computes an ideal block width. In the interim, we will use 2 * CACHE_LINE_SIZE. On x86-64, this is usually 2 * 64 = 128. -
Fix assertion in
fft_with_opts_and_plan -
Twiddles should always be half the size of input
-
Fix typo
- Don't precompute twiddle factors for large chunks
- Split into two binaries for profiling
- Parallelize bit reverse permutation with 2 threads
- Experiments
- Pre-compute twiddle factors for chunk n
- Bug fix -- clear twiddle factor cache after use
- Abstract out twiddle factor generation
- Prebuild only half the required twiddles
- Try out generating twiddles lazily
- Restructure into lib, add examples, scripts, etc.
- Proof of concept of caching 1/4 of twiddle factors
- Unroll for half of twiddle factors
- Reuse generated twiddle factors by filtering
- Go back to only generating half of twiddles
- Faster twiddles hopefully workable concept this time
- Plug SIMD twiddle generation in
- Generate random signal for py benchmarks
- Increase
BLOCK_SIZEto 256 - Rewrite README
- Tweaks to "Features" in README
- Wire up guessing the options to the main FFT function
- Placate Clippy
- Exponentially decrease # of iters in benchmark
- Change plot ordering so colors are consistent
- Print human readable bytes without decimal
- Adjust pybindings to work with new API
- Expand on the Python installation instructions
- Mention the license in the README
- Set repository and unset documentation field; it will default to docs.rs
- Mention lower memory use in the "vs" section
- Automate findings of latest benchmark results
- Format benchmark scripts
- Expose
cobra_applyfor fast bit reverseal
- Use std
reverse_bitsfor performance improvement - Use faster impl of bit reversal implementation
- Replace vec with array since it's < 512_000 bytes
- Switch back to pre-built cache using vec
- Use in-place bit reversal in COBRA for small N
- Move kernels to separate mod
- Simplify size 2 butterfly
- Use numpy crate to avoid overhead in pybindings
- Use scipy to generate random signal
- Move rustfft benchmark to examples
- Use random signal for benchmarking rustfft
- Use random signal generator to benchmark phastft
- Move utility functions to separate lib
- Simplify options down to a single multi-threading knob
- Update README
- Update README with basic skeleton
- Update READMEs
- Update README
- Update README
- Update README text to match latest benchmarks
- Update documentation and readme
- Update README.md
- Update README.md
-
Optimize twiddles pre-compute
-
Improve recursive bit-reversal perf for testing
-
Tests run much faster now with a bit of loop unrolling, loop fusion, and pre-allocating vectors with the right sizes
-
Add documentation and cite references
-
Improve assertion functions
- Remove rayon for profiling
- Cleanup for profiling
- Remove rayon for profiling
- Remove unused mods
- Update for benchmarks
- Update profiling section in readme
- Update profile script, add profiling to examples
- Remove useless files
- Remove benchmark mod from lib
- Cleanup all warnings and unused code
- Remove rayon dep, add bar plot, rename to PhastFT
- Update workflow
- Remove old BRAVO implementation written in python
- Updates to examples and plotting script
- Update benchmark plotting
- Update benchmark scripts
- Remove workspace
- Cleanup x-axis of plots
- Update plots for latest benchmarks
- Update the license in Cargo.toml
- Drop unused author fields
- Update MIT license for future contributors
- Cleanup plots and split bar plot for py benches
- Update benchmark plots
- Update saved figure names in benches
- Update benchmark readme
- Revert to creating twiddle factors for large n
- Revert "Prebuild only half the required twiddles"
This reverts commit e552fb1ea2b5202c47e4cf1e8392adcefe1d476b.
- Revert "Proof of concept of caching 1/4 of twiddle factors"
This reverts commit de2cca02d99c93b76e3b42bf09d29deeb2eda62f.
- Revert "Add separate kernel for size 8 butterflies"
This reverts commit 249be77548a8b2f23aca4739a5d3aad536939cba.