
Commit 07d82e4

fix(standalone): enable flush-to-zero on the JACK RT thread
Denormal (subnormal) float arithmetic is extremely slow, especially on ARM (Raspberry Pi). As signals decay toward silence, the IR convolver and filter tails can drive intermediate values into the denormal range. The result is erratic CPU spikes that don't track IR length: some IRs run fine, others struggle, with no relation to how heavily they're trimmed.

There was no global flush-to-zero anywhere. Only a few amp stages manually flush their state at 1e-20, which is itself a normal f32 and does not cover the convolver. The VST3/CLAP plugin already gets FTZ from nih-plug's process wrapper, but the standalone JACK thread set nothing.

Set the CPU flush-to-zero flag on the JACK process thread (MXCSR bit 15 on x86 SSE, FPCR bit 24 on AArch64), mirroring nih-plug's approach via inline asm since Rust 1.75 deprecated the _mm_setcsr intrinsics. The call is idempotent and cheap, so it runs on every process callback. The per-stage manual flushes stay as belt-and-suspenders.

Refs #251.
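Two claims in the message can be checked with a toy model: that a feedback tail ringing out toward silence really does reach the subnormal range, and that a 1e-20 threshold is still a normal f32. The sketch below is a minimal illustration under stated assumptions, not code from this repository: `first_subnormal_step` and `MANUAL_FLUSH_THRESHOLD` are hypothetical names, the "filter" is a bare one-pole decay `y = coeff * y`, and it runs in the default IEEE mode with FTZ off.

```rust
/// Hypothetical stand-in for the amp stages' manual flush threshold
/// (the commit message says they flush their state at 1e-20).
const MANUAL_FLUSH_THRESHOLD: f32 = 1e-20;

/// Let a one-pole feedback state ring out with no input (`y = coeff * y`,
/// starting from 1.0) and return the first step at which `y` is subnormal,
/// or `None` if that never happens within `max_steps`.
///
/// When `flush_below` is `Some(t)`, the state is zeroed as soon as its
/// magnitude drops under `t`, imitating a manual per-stage flush.
fn first_subnormal_step(coeff: f32, flush_below: Option<f32>, max_steps: u32) -> Option<u32> {
    let mut y: f32 = 1.0;
    for step in 1..=max_steps {
        y *= coeff;
        if let Some(threshold) = flush_below {
            if y.abs() < threshold {
                // Manual flush: 0.0 is an exact zero, not a subnormal.
                y = 0.0;
            }
        }
        // is_subnormal() is true for nonzero values below f32::MIN_POSITIVE (2^-126).
        if y.is_subnormal() {
            return Some(step);
        }
    }
    None
}
```

Halving from 1.0 first produces a subnormal at step 127 (2^-127, just below `f32::MIN_POSITIVE` = 2^-126). With the 1e-20 flush, the state is zeroed at step 67 (2^-67 ≈ 6.8e-21) and never becomes subnormal. A flush like this only protects the state it is applied to, though, which is why the convolver still needed the thread-wide flag.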
1 parent a8cc930 commit 07d82e4

3 files changed

Lines changed: 56 additions & 0 deletions

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+//! Flush-to-zero (FTZ) setup for the real-time audio thread.
+//!
+//! Denormal (subnormal) floating-point arithmetic is catastrophically slow: up to
+//! ~10–100× on some CPUs, and especially bad on ARM (Raspberry Pi). As signals decay
+//! toward silence, the IR convolver and filter tails can drive intermediate values into
+//! the denormal range, causing erratic CPU spikes that don't track IR length. Enabling
+//! the CPU's flush-to-zero flag makes denormal results flush to zero, keeping cost
+//! consistent.
+//!
+//! The VST3/CLAP plugin already gets this from nih-plug's process wrapper; the standalone
+//! JACK process thread must set it itself. The flag is per-thread, so this is called from
+//! inside the JACK process callback.
+//!
+//! The implementation mirrors nih-plug's `ScopedFtz`. Rust 1.75 deprecated the
+//! `_mm_setcsr` intrinsics, so this uses inline assembly: MXCSR bit 15 on x86 SSE, FPCR
+//! bit 24 on AArch64. On other targets it is a no-op.
+
+/// Enable flush-to-zero for denormals on the current thread. Idempotent and cheap (a
+/// register read plus a conditional write), so it is safe to call every process callback.
+#[inline]
+pub fn enable_flush_to_zero() {
+    #[cfg(target_feature = "sse")]
+    {
+        // MXCSR bit 15 = Flush-To-Zero.
+        const SSE_FTZ_BIT: u32 = 1 << 15;
+        let mut mxcsr: u32 = 0;
+        // SAFETY: stmxcsr/ldmxcsr only read/write the current thread's MXCSR register.
+        unsafe {
+            std::arch::asm!("stmxcsr [{}]", in(reg) std::ptr::addr_of_mut!(mxcsr));
+            if mxcsr & SSE_FTZ_BIT == 0 {
+                let updated = mxcsr | SSE_FTZ_BIT;
+                std::arch::asm!("ldmxcsr [{}]", in(reg) std::ptr::addr_of!(updated));
+            }
+        }
+    }
+
+    #[cfg(target_arch = "aarch64")]
+    {
+        // FPCR bit 24 = Flush-to-zero mode.
+        const AARCH64_FTZ_BIT: u64 = 1 << 24;
+        let mut fpcr: u64;
+        // SAFETY: FPCR is EL0-accessible; this reads then conditionally sets the FZ bit.
+        unsafe {
+            std::arch::asm!("mrs {}, fpcr", out(reg) fpcr);
+            if fpcr & AARCH64_FTZ_BIT == 0 {
+                std::arch::asm!("msr fpcr, {}", in(reg) fpcr | AARCH64_FTZ_BIT);
+            }
+        }
+    }
+}
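The "register read plus a conditional write" step is what makes calling this on every callback idempotent. The bit arithmetic can be checked without touching hardware by modelling the register as a plain integer. This is a sketch only: `ftz_update` is a hypothetical helper, not part of the module. 0x1F80 is the documented x86 reset value of MXCSR (all exceptions masked, round-to-nearest, FTZ off), and the FPCR value of 0 is just an illustrative starting point.

```rust
/// x86 MXCSR reset value: all exceptions masked, round-to-nearest, FTZ and DAZ off.
/// MXCSR is 32 bits wide; it is widened to u64 here so one helper covers both registers.
const MXCSR_DEFAULT: u64 = 0x1F80;
/// MXCSR bit 15 (flush-to-zero), as used in the module.
const SSE_FTZ_BIT: u64 = 1 << 15;
/// FPCR bit 24 (FZ), as used in the module.
const AARCH64_FTZ_BIT: u64 = 1 << 24;

/// Model of "read, then write only if the bit is clear", applied to an integer
/// instead of a live register. Returns `Some(new_value)` when a write is needed
/// and `None` when the bit is already set (the fast path, with no write).
fn ftz_update(current: u64, ftz_bit: u64) -> Option<u64> {
    if current & ftz_bit == 0 {
        Some(current | ftz_bit)
    } else {
        None
    }
}
```

Starting from the default, the first call turns 0x1F80 into 0x9F80. Feeding that result back in returns `None`. Assuming nothing else clears the flag, every later process callback takes that path: one register read, no write.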

rustortion-standalone/src/audio/jack.rs

Lines changed: 5 additions & 0 deletions
@@ -64,6 +64,11 @@ impl ProcessHandler {
 
 impl jack::ProcessHandler for ProcessHandler {
     fn process(&mut self, _client: &jack::Client, ps: &jack::ProcessScope) -> jack::Control {
+        // Denormals are extremely slow (esp. on ARM/Pi) and the IR convolver + filter
+        // tails can produce them as signals decay. The plugin gets FTZ from nih-plug;
+        // the standalone must set it on its own RT thread. Idempotent and cheap.
+        crate::audio::denormals::enable_flush_to_zero();
+
         let input = self.ports.get_input(ps);
 
         if let Err(e) = self.audio_engine.process(input, self.buffer.as_mut_slice()) {
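The callback sets FTZ on its own RT thread because the flag is per-thread. One way to observe that on x86-64 or AArch64 is to probe with a multiply whose IEEE result is subnormal. This is a sketch, not part of the commit: `ftz_active` and `ftz_on_spawned_thread` are hypothetical helpers, and `enable_flush_to_zero` is adapted from the module above (doc comments dropped) so the block compiles on its own.

```rust
use std::hint::black_box;

/// Adapted from the module's `enable_flush_to_zero`.
#[inline]
pub fn enable_flush_to_zero() {
    #[cfg(target_feature = "sse")]
    {
        const SSE_FTZ_BIT: u32 = 1 << 15;
        let mut mxcsr: u32 = 0;
        // SAFETY: stmxcsr/ldmxcsr only read/write the current thread's MXCSR register.
        unsafe {
            std::arch::asm!("stmxcsr [{}]", in(reg) std::ptr::addr_of_mut!(mxcsr));
            if mxcsr & SSE_FTZ_BIT == 0 {
                let updated = mxcsr | SSE_FTZ_BIT;
                std::arch::asm!("ldmxcsr [{}]", in(reg) std::ptr::addr_of!(updated));
            }
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        const AARCH64_FTZ_BIT: u64 = 1 << 24;
        let fpcr: u64;
        // SAFETY: FPCR is EL0-accessible; this reads then conditionally sets the FZ bit.
        unsafe {
            std::arch::asm!("mrs {}, fpcr", out(reg) fpcr);
            if fpcr & AARCH64_FTZ_BIT == 0 {
                std::arch::asm!("msr fpcr, {}", in(reg) fpcr | AARCH64_FTZ_BIT);
            }
        }
    }
}

/// Returns true if, on the calling thread, a multiply whose IEEE result would be
/// subnormal comes back as exactly 0.0, i.e. flush-to-zero is in effect here.
pub fn ftz_active() -> bool {
    // Halving the smallest normal f32 (2^-126) gives the subnormal 2^-127 under IEEE rules.
    // black_box keeps the compiler from folding the multiply at build time, where the
    // runtime FTZ flag would be ignored.
    let product = black_box(f32::MIN_POSITIVE) * black_box(0.5_f32);
    black_box(product) == 0.0
}

/// Enable FTZ on a freshly spawned thread and report whether it took effect there.
pub fn ftz_on_spawned_thread() -> bool {
    std::thread::spawn(|| {
        enable_flush_to_zero();
        ftz_active()
    })
    .join()
    .expect("probe thread panicked")
}
```

On a supported target, the spawned thread reports FTZ on, while calling `ftz_active()` from the main thread afterwards still sees the subnormal. The flag set on one thread does not leak to others, so a call made at startup on the main thread would not reach JACK's process thread.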
Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+pub mod denormals;
 pub mod jack;
 pub mod manager;
 pub mod ports;
