The mixed-radix FFT algorithm's performance isn't too bad (at the time of writing, powers of three are something like 60% slower than FFTW on my machine).
However, the mixed-radix algorithm (at least as it exists right now) isn't the best choice for powers of two, where it is approximately 3x slower than FFTW on my machine. My guess is that the gap comes from FFTW using something like a modified split-radix algorithm for these sizes. I'd probably be happy with a standard split-radix implementation, for simplicity.
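For reference, here is roughly what a standard (unmodified) split-radix step looks like. This is only a minimal recursive sketch in Python, not code from this project: the name `split_radix_fft` is made up for illustration, it assumes a power-of-two length and the usual forward-transform sign convention, and it recomputes twiddle factors on every call. A real implementation would most likely be iterative or in-place with precomputed twiddles.

```python
import cmath


def split_radix_fft(x):
    """Forward DFT of a power-of-two-length sequence (illustrative sketch)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]

    # One half-size transform over the even samples, and two quarter-size
    # transforms over the samples at indices 1 mod 4 and 3 mod 4.
    u = split_radix_fft(x[0::2])
    z = split_radix_fft(x[1::4])
    zp = split_radix_fft(x[3::4])

    q = n // 4
    out = [0j] * n
    for k in range(q):
        # Twiddle factors w^k and w^(3k), with w = exp(-2*pi*i / n).
        w1 = cmath.exp(-2j * cmath.pi * k / n)
        w3 = cmath.exp(-6j * cmath.pi * k / n)
        a = w1 * z[k]
        b = w3 * zp[k]
        s = a + b
        d = -1j * (a - b)
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] + d
        out[k + 3 * q] = u[k + q] - d
    return out
```

The appeal over plain radix-2 is that the odd-indexed half is split once more into two quarter-size transforms, which cuts down the number of nontrivial twiddle multiplications without complicating the structure much.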