
Agc improvements and improve gain control stability #882

Draft
UnknownSuperficialNight wants to merge 5 commits into RustAudio:master from UnknownSuperficialNight:agc-improvements-and-bug-fix

UnknownSuperficialNight (Contributor) commented May 11, 2026

This PR focuses mainly on improving AGC stability through a new slowdown_factor, along with miscellaneous improvements.

I've been experimenting with the AGC to find ways to stabilise it. This is the result.

compute_slowdown_factor adds a third control layer, alongside the standard RMS and peak metrics, that measures how close the current gain is to the target gain. It acts as a dynamic throttle, adjusting the AGC's rate of change according to that distance. The slowdown logic activates only when the current gain falls within the combined RMS+peak tolerance window around the target. When the input is loud, the tolerance window widens; with quieter signals, it contracts.

Inside this window, exponential scaling prevents the harsh jumps and oscillations that occurred with fixed-rate adjustments: as the gain approaches the target, the slowdown increases, reducing the AGC's rate of change and producing smoother behaviour. Outside this window, the AGC uses its normal responsiveness, which allows more rapid correction when needed.
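A minimal sketch of such a throttle, assuming the tolerance window is simply `rms + peak` and that the returned factor multiplies the per-sample gain change. The signature, the `CURVE` steepness and the `MIN_FACTOR` floor are hypothetical choices for illustration, not the values used in this PR, and the sketch uses the standard library's `exp` rather than `fast_exp`.

```rust
/// Floor so the gain never stops moving entirely (hypothetical value).
const MIN_FACTOR: f32 = 0.05;
/// Steepness of the exponential curve inside the window (hypothetical value).
const CURVE: f32 = 4.0;

/// Returns a multiplier in [MIN_FACTOR, 1.0] for the AGC's per-sample gain change.
fn compute_slowdown_factor(current_gain: f32, target_gain: f32, rms: f32, peak: f32) -> f32 {
    // Louder input (larger RMS + peak) widens the tolerance window
    let tolerance = rms + peak;
    let distance = (target_gain - current_gain).abs();

    // No usable window, or the gain is outside it: normal responsiveness
    if tolerance.is_nan() || tolerance <= 0.0 || distance >= tolerance {
        return 1.0;
    }

    // Normalised distance to the target, in [0, 1)
    let x = distance / tolerance;
    // Exponential ease: 0.0 at the target, rising to 1.0 at the window edge
    let eased = (1.0 - (-CURVE * x).exp()) / (1.0 - (-CURVE).exp());
    eased.max(MIN_FACTOR)
}
```

Dividing by `1 - exp(-CURVE)` makes the factor reach exactly 1.0 at the window edge, so there is no jump in responsiveness when the gain crosses into or out of the window.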

By managing these ranges, the system enables faster attack times without flattening audio dynamics. Previously, aggressive speeds would normalise all sounds to a flat line. Now the AGC can accelerate adjustments when far from the target but slows down exponentially as it approaches the goal. This preserves audio depth while maintaining stability: quick reactions when needed, with gradual stabilisation near the final level, preventing gain overshoot and sudden volume spikes that can occur with fixed-rate adjustments.

update_peak_level Optimisation

This function was a performance hotspot due to per-sample branching. Previously, we selected a coefficient for every sample: a fast attack coefficient (0.0) when the sample exceeded the peak, and a slow release coefficient otherwise.

I've replaced this with a branchless implementation that uses a fixed release_coefficient (which is always cached), eliminating the per-sample if branch.

Before (slower, branching):

// The coefficient was selected with a branch on every sample
let coeff = if sample_value > self.peak_level {
    // Fast attack for rising peaks
    0.0
} else {
    // Slow release for falling peaks
    release_coeff
};
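For comparison, here is one way the branch can be removed, sketched as standalone functions. It assumes a one-pole follower of the form `coeff * peak + (1 - coeff) * sample` on the absolute sample value; that update line and the function names are assumptions, since only the coefficient selection is shown above. For `release_coeff` in [0, 1], the released value never exceeds a louder sample and never drops below a quieter one, so taking the `max` reproduces the instant attack (coefficient 0.0) without an `if`, which compilers can usually lower to branch-free code.

```rust
/// Original shape: choose the coefficient with a branch on every sample.
fn peak_branching(peak: f32, sample: f32, release_coeff: f32) -> f32 {
    let sample = sample.abs();
    let coeff = if sample > peak { 0.0 } else { release_coeff };
    coeff * peak + (1.0 - coeff) * sample
}

/// Branchless shape: always apply the release smoothing, then `max` snaps the
/// peak up to the sample when it is louder (same instant attack as coeff 0.0).
fn peak_branchless(peak: f32, sample: f32, release_coeff: f32) -> f32 {
    let sample = sample.abs();
    let released = release_coeff * peak + (1.0 - release_coeff) * sample;
    sample.max(released)
}
```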

Other changes in this PR

  • CircularBufferRMS now uses a sum of squares internally and has been cleaned up.
  • Attack and release are now specified as time values rather than precomputed coefficients.
  • Added a div_or_fallback helper that divides only when the divisor is finite (non-NaN, non-infinite) and positive.
  • Added NaN guards to the RMS and peak logic so that neither can be corrupted.
  • Added a fast_exp helper that approximates exp(x) using Horner's method, used in compute_slowdown_factor.
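A sketch of the sum-of-squares idea behind `CircularBufferRMS`, assuming a fixed-size ring of squared samples: each push adds the new square and subtracts the one it evicts, so the RMS costs O(1) per sample instead of re-summing the window. The struct layout is hypothetical, and treating non-finite input as silence is one possible NaN guard, not necessarily the one in this PR.

```rust
/// Hypothetical sketch: RMS over a ring buffer with a running sum of squares.
struct CircularBufferRMS {
    squares: Vec<f32>,   // squared samples currently in the window
    index: usize,        // next slot to overwrite
    sum_of_squares: f32, // running sum of `squares`
}

impl CircularBufferRMS {
    fn new(size: usize) -> Self {
        assert!(size > 0, "window must hold at least one sample");
        Self { squares: vec![0.0; size], index: 0, sum_of_squares: 0.0 }
    }

    /// Adds a sample, evicting the oldest one, and returns the window's RMS.
    fn push(&mut self, sample: f32) -> f32 {
        // NaN guard: non-finite input counts as silence so the sum stays valid
        let sample = if sample.is_finite() { sample } else { 0.0 };
        let squared = sample * sample;

        // O(1) update: add the new square, subtract the evicted one
        self.sum_of_squares += squared - self.squares[self.index];
        self.squares[self.index] = squared;
        self.index = (self.index + 1) % self.squares.len();

        // Rounding can leave a tiny negative sum; clamp before the square root
        let mean = self.sum_of_squares.max(0.0) / self.squares.len() as f32;
        mean.sqrt()
    }
}
```

Over very long runs a running `f32` sum can drift from the true window sum through accumulated rounding; periodically recomputing it from the buffer is one common remedy.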

Benchmarks

Benchmarks before:

Timer precision: 20 ns
effects         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ agc_enabled  12.38 ms      │ 13.49 ms      │ 12.48 ms      │ 12.54 ms      │ 100     │ 100

Benchmarks after the changes and redesign (median 12.48 ms → 9.209 ms, about 26% less time):

Timer precision: 20 ns
effects         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ agc_enabled  9.145 ms      │ 12.98 ms      │ 9.209 ms      │ 9.408 ms      │ 100     │ 100

Concerns

The Libopus decoder can output samples above 1.0, such as 1.1, 1.064, and similar values, for both RMS and peak readings depending on the track. This behaviour is not observed with the FLAC decoder.

These out-of-range samples cause problems downstream, particularly by pushing the current gain below 1.0 while targeting 1.0. I've added .min(1.0) so that the tracked RMS and peak levels never exceed the 1.0 cap.

As far as I can tell, the root cause is the Libopus decoder, which should not output values above 1.0 in the first place.

This is probably worth investigating: is this behaviour by design in Libopus, or is there something wrong upstream of the effect?

Potential Improvements

  • Lookahead Buffer: Rodio does not natively support lookahead, but adding a buffer would allow the gain to be adjusted gradually before a spike/kick occurs.
  • Dynamic Buffer Size: Adjust the buffer size based on the sample rate to maintain a consistent ~20ms window (e.g., 2048 for 96kHz), so the window duration stays the same across sample rates.
    Example:
fn buffer_size(sample_rate: u32) -> usize {
    match sample_rate {
        96_000 => 2048,  // ~21.3 ms
        192_000 => 4096, // ~21.3 ms
        _ => 1024,       // ~23.2 ms at 44.1 kHz, ~21.3 ms at 48 kHz
    }
}
  • Speech Profile: Adding a dedicated speech profile to AutomaticGainControlSettings might be a good idea.
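The table in the `buffer_size` example can also be derived from the sample rate. This hypothetical helper (not part of Rodio) takes 20 ms worth of samples (`sample_rate / 50`) and rounds up to the next power of two, which reproduces the values above: 1024 for 44.1 and 48 kHz, 2048 for 96 kHz and 4096 for 192 kHz. Because it rounds up, the resulting window always lands between 20 ms and 40 ms.

```rust
/// Hypothetical helper: a power-of-two buffer covering at least ~20 ms.
fn buffer_size_for(sample_rate: u32) -> usize {
    // 20 ms is sample_rate / 50 samples; round up to the next power of two
    ((sample_rate / 50) as usize).next_power_of_two()
}

/// Duration of a window of `size` samples, in milliseconds.
fn window_ms(size: usize, sample_rate: u32) -> f64 {
    size as f64 * 1000.0 / sample_rate as f64
}
```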

Video Comparison

Before:

before_normal.mp4

After:

after_normal.mp4

Before near the loudness limit

near_limit_before.mp4

After near the loudness limit

near_limit_after.mp4

Additional notes

This can be tuned back to how it worked originally if users prefer the more normalised sound.
It might even be worth adding a toggle for the slowdown so that it can be disabled.

…bility

- Replace coefficient-based `attack/release` with direct `Duration` types
- Reduce `RMS_WINDOW_SIZE` from `8192` to `512` samples to lower latency
- Switch RMS calculation from mean-based buffer (`CircularBuffer`) to sum-of-squares approach in `CircularBufferRMS` for accurate root-mean-square values
- Introduce `SlowDownState` struct that manages timing and caching: counts samples in 2ms blocks, computes adaptive `slowdown_factor` using `compute_slowdown_factor` and caches the result for reuse
- Implement `fast_exp` using Horner's method for efficient exponential approximation of release coefficients (third-order Taylor polynomial)
- Add `NaN` handling in RMS calculation to prevent invalid values
- Add rate limiting to gain changes: clamp gain change per sample based on dynamic attack/release duration to prevent overshooting
- Add new `peak_tracking_window` setting to control peak level smoothing
- Tune default timing parameters: 500ms attack, 0.5ms release, 10ms peak tracking window for balanced behaviour
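A sketch of the caching pattern described for `SlowDownState`: the slowdown factor is recomputed once per block of roughly 2 ms and reused for the samples in between. The field names and the closure-based API are hypothetical; the actual struct may also store the inputs to `compute_slowdown_factor`. `sample_rate / 500` gives 2 ms worth of samples (96 at 48 kHz), counting individual samples rather than multi-channel frames.

```rust
/// Hypothetical sketch of the caching idea behind `SlowDownState`.
struct SlowDownState {
    block_len: u32,     // samples per ~2 ms block
    counter: u32,       // position within the current block
    cached_factor: f32, // factor reused until the next block starts
}

impl SlowDownState {
    fn new(sample_rate: u32) -> Self {
        // 2 ms = sample_rate / 500 samples (96 at 48 kHz); never zero
        let block_len = (sample_rate / 500).max(1);
        Self { block_len, counter: 0, cached_factor: 1.0 }
    }

    /// Returns the cached factor, calling `compute` only at the start of a block.
    fn factor(&mut self, compute: impl FnOnce() -> f32) -> f32 {
        if self.counter == 0 {
            self.cached_factor = compute();
        }
        self.counter += 1;
        if self.counter == self.block_len {
            self.counter = 0;
        }
        self.cached_factor
    }
}
```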
…calculation

- Replace hardcoded `1.0` fallback with `self.current_gain` when `RMS` equals `0.0`
- Add comment explaining this keeps gain stable or allows gradual decay instead of sudden drops
- Cap peak tracking at 1.0 to handle out-of-bounds decoder samples
- Ensure that samples from decoders that are not normalised, like `libopus`, do not push the tracked peak out of bounds
- Cap RMS tracking at 1.0 to handle out-of-bounds decoder samples
- Ensure that samples from decoders that are not normalised, like `libopus`, do not push the tracked RMS out of bounds
- Change `RMS_WINDOW_SIZE` constant from `512` to `1024`
- 1024 samples provides ~23ms window at 44.1kHz / ~21ms at 48kHz for stable RMS estimation
Comment thread src/source/agc.rs
release_time: Duration::from_secs(0), // Recommended release time
absolute_max_gain: 7.0, // Recommended max gain
target_level: 1.0, // Default to original level
attack_time: Duration::from_millis(500), // Recommended attack time
Contributor Author

This might be too low. I found 500ms or 800ms to be quite nice; I'd like some feedback on this if possible.

Member

Sorry, I have no idea what works best for the new algorithm. For speech, quite fast was useful.

Comment thread src/source/agc.rs
}
}

impl<I> Iterator for AutomaticGainControl<I>
Contributor Author

I'm not sure I implemented the changes to this part correctly for the new algorithm. I think I did, but it would be nice to have someone more familiar with it check.

Member

looks good to me!

Member

Looks good!

Comment thread src/source/agc.rs
/// It provides a good balance between speed and accuracy, resulting in
/// faster benchmark times compared to the standard `exp` function.
#[inline]
fn fast_exp(x: Float) -> Float {
UnknownSuperficialNight (Contributor Author), May 11, 2026

Might be worth moving this to math.rs?

Member

Go ahead, seems like a good addition!
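For reference, a third-order Taylor polynomial for exp(x), evaluated with Horner's method as the commit message describes, could look like the sketch below. It is an illustration rather than the PR's code: the coefficients follow directly from the Taylor series 1 + x + x²/2 + x³/6, but the approximation is only accurate for small |x|, such as the exponent `-1 / (time * sample_rate)` of a smoothing coefficient. For large negative x it diverges and can even turn negative (about -12.3 at x = -5).

```rust
/// Third-order Taylor approximation of exp(x), in Horner form:
/// 1 + x * (1 + x * (1/2 + x * (1/6))). Accurate only near x = 0.
#[inline]
fn fast_exp(x: f32) -> f32 {
    1.0 + x * (1.0 + x * (0.5 + x * (1.0 / 6.0)))
}
```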

yara-blue (Member) left a comment

I also like the idea of multiple profiles. Ideally we would also give the "current" default a name, maybe "Music" and "Speech"?

Comment thread src/source/agc.rs
absolute_max_gain: 7.0, // Recommended max gain
target_level: 1.0, // Default to original level
attack_time: Duration::from_millis(500), // Recommended attack time
release_time: Duration::from_nanos(500000), // Recommended release time
Member

I'd use from_micros here :)

roderickvd (Member)

> The Libopus decoder can output samples above 1.0, such as 1.1, 1.064, and similar values, for both RMS and peak readings depending on the track. This behaviour is not observed with the FLAC decoder.

> These out-of-range samples cause errors downstream, particularly when offsetting the current gain below 1.0 while targeting 1.0. I've added .min(1.0) to ensure the gain never exceeds the cap/limit for RMS and peak.

Values outside of -1.0..=1.0 aren't out-of-range for a DSP pipeline. Strange as it may be from libopus itself, the beauty of working in normalized floating point is that it never clips until it's finally converted to integer. A chain of Rodio filters itself could also return values > 1.0 even if the decoder wouldn't.

Long story short, we should deal with such values without clipping them.
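A small sketch of the point above, assuming 16-bit integer output: a sample pushed above 1.0 by one stage and brought back down by a later one survives intact in floating point, whereas capping it early loses information. Clamping happens only in the final conversion; `to_i16` is a hypothetical helper, not Rodio's API.

```rust
/// Converts a normalised sample to 16-bit PCM. This is the only place a
/// value outside -1.0..=1.0 gets clamped.
fn to_i16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16
}
```

For example, `0.8 * 1.5 / 1.5` comes back as 0.8 (to within rounding), while `(0.8 * 1.5).min(1.0) / 1.5` is only about 0.667.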

UnknownSuperficialNight (Contributor Author) commented May 12, 2026

> The Libopus decoder can output samples above 1.0, such as 1.1, 1.064, and similar values, for both RMS and peak readings depending on the track. This behaviour is not observed with the FLAC decoder.
> These out-of-range samples cause errors downstream, particularly when offsetting the current gain below 1.0 while targeting 1.0. I've added .min(1.0) to ensure the gain never exceeds the cap/limit for RMS and peak.

> Values outside of -1.0..=1.0 aren't out-of-range for a DSP pipeline. Strange as it may be from libopus itself, the beauty of working in normalized floating point is that it never clips until it's finally converted to integer. A chain of Rodio filters itself could also return values > 1.0 even if the decoder wouldn't.

> Long story short, we should deal with such values without clipping them.

Any ideas on this?

The first thing that comes to mind, though I could be wrong, is something like self.peak_level.max(1.0), computed per sample, basically like this:

let full_scale = self.peak_level.max(1.0);

// Calculate max gain change per sample based on dynamic attack/release times
let max_attack_gain_change_per_sample = full_scale / (dynamic_attack_time * sample_rate);
let max_release_gain_change_per_sample = full_scale / (release_duration * sample_rate);

Basically go through and compute a new max per sample and scale for that.

Just throwing ideas out there.

We would probably have to go through it all again and remove the 1.0 assumption.
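A standalone sketch of the per-sample rate limit from the snippet above, keeping its `full_scale` and `max_*_gain_change_per_sample` names. It assumes that attack limits gain increases while release limits gain decreases, matching the slow 500 ms attack / fast 0.5 ms release defaults; the function name and parameters are hypothetical.

```rust
/// Hypothetical sketch: clamp how far the gain may move in one sample.
/// Assumes attack limits gain increases and release limits gain decreases.
fn limit_gain_step(
    current_gain: f32,
    desired_gain: f32,
    peak_level: f32,
    attack_secs: f32,
    release_secs: f32,
    sample_rate: f32,
) -> f32 {
    // Scale the step by the tracked peak instead of assuming a 1.0 ceiling
    let full_scale = peak_level.max(1.0);

    // Maximum change per sample; a zero time gives an infinite limit (instant)
    let max_attack_gain_change_per_sample = full_scale / (attack_secs * sample_rate);
    let max_release_gain_change_per_sample = full_scale / (release_secs * sample_rate);

    let delta = desired_gain - current_gain;
    let step = if delta > 0.0 {
        delta.min(max_attack_gain_change_per_sample)
    } else {
        delta.max(-max_release_gain_change_per_sample)
    };
    current_gain + step
}
```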

yara-blue (Member)

> Any ideas on this?

> Long story short, we should deal with such values without clipping them.

AGC maps input to the range [-1.0, 1.0]. To do so without clipping, it needs the width of the input range. It can't look ahead to see what other samples will be emitted and thus what the peak is. All I can think of is to assume the input range to be unrealistically big, say [-1.5, 1.5]? Is that unrealistically big?

> Basically go through and compute a new max per sample and scale for that.

Let's look at an extreme: what would happen if, halfway through playback, one single sample peaks really high, let's say 10.0? Would everything get quieter after that sample?

roderickvd (Member)

Please excuse me responding a bit theoretically without recent study of the current implementation:

The gain calculation should fundamentally be the ratio target / measured, regardless of whether the peak is 0.8, 1.0, or 1.1. No fixed ceiling should be needed. Instead of clipping with .min(1.0) we should track the true peak. If the peak is 1.1, the AGC should respond with a gain below 1.0.

If the root cause is that the code assumes 1.0 as ceiling, then ideally we should remove those assumptions.
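A sketch of the gain as a plain ratio with no fixed ceiling, reusing the `div_or_fallback` helper named in the PR description. Its signature here is a guess based on that description (divide only by finite, positive values), and falling back to the current gain mirrors the commit that keeps `self.current_gain` when the RMS is 0.0. A measured level of 1.1 yields a gain below 1.0, as suggested above.

```rust
/// Hypothetical signature for `div_or_fallback`: divide only when the divisor
/// is finite and positive, otherwise return `fallback`.
fn div_or_fallback(numerator: f32, divisor: f32, fallback: f32) -> f32 {
    if divisor.is_finite() && divisor > 0.0 {
        numerator / divisor
    } else {
        fallback
    }
}

/// Desired gain as target / measured, with no 1.0 cap on `measured`.
/// Silence or invalid input keeps the current gain.
fn desired_gain(target: f32, measured: f32, current_gain: f32) -> f32 {
    div_or_fallback(target, measured, current_gain)
}
```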

UnknownSuperficialNight (Contributor Author)

> Any ideas on this?

> Long story short, we should deal with such values without clipping them.

> AGC maps input to the range [-1.0, 1.0]. To do so without clipping, it needs the width of the input range. It can't look ahead to see what other samples will be emitted and thus what the peak is. All I can think of is to assume the input range to be unrealistically big, say [-1.5, 1.5]? Is that unrealistically big?

I was thinking about a running maximum: we track each sample, and if a sample exceeds the current maximum, it replaces the old running maximum. That was my original idea, anyway.

> Basically go through and compute a new max per sample and scale for that.

> Let's look at an extreme: what would happen if, halfway through playback, one single sample peaks really high, let's say 10.0? Would everything get quieter after that sample?

Possibly; I guess that would depend on the implementation. Maybe there could be a peak decay after n samples, etc.

Though neither of these seems like a proper solution, I must admit.
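One way to address the 10.0-spike question is the running maximum combined with the peak decay mentioned above: a new maximum replaces the old one immediately, is held for `hold_samples`, and then decays exponentially, so a single outlier only lowers the gain temporarily. This is a hypothetical sketch; the struct, the hold length and the decay time are placeholders, not tuned values.

```rust
/// Hypothetical sketch: running maximum with a hold period and exponential decay.
struct DecayingPeak {
    peak: f32,
    hold_samples: u32, // samples to hold a new maximum before decaying
    since_peak: u32,   // samples elapsed since the last new maximum
    decay: f32,        // per-sample decay multiplier in (0, 1)
}

impl DecayingPeak {
    fn new(hold_samples: u32, decay_secs: f32, sample_rate: f32) -> Self {
        // After `decay_secs` of decay the peak has fallen to about 1/e (~37%)
        let decay = (-1.0 / (decay_secs * sample_rate)).exp();
        Self { peak: 0.0, hold_samples, since_peak: 0, decay }
    }

    fn push(&mut self, sample: f32) -> f32 {
        let level = sample.abs();
        if level.is_finite() && level >= self.peak {
            // New maximum: replace the old one and restart the hold
            self.peak = level;
            self.since_peak = 0;
        } else if self.since_peak < self.hold_samples {
            // Still holding the last maximum
            self.since_peak += 1;
        } else {
            // Hold expired: let the maximum decay
            self.peak *= self.decay;
        }
        self.peak
    }
}
```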

UnknownSuperficialNight (Contributor Author) commented May 13, 2026

> Please excuse me responding a bit theoretically without recent study of the current implementation:

> The gain calculation should fundamentally be the ratio target / measured, regardless of whether the peak is 0.8, 1.0, or 1.1. No fixed ceiling should be needed. Instead of clipping with .min(1.0) we should track the true peak. If the peak is 1.1, the AGC should respond with a gain below 1.0.

This is what happens without the min(1.0): when the RMS or peak goes over 1.0, the AGC gain dips below 1.0, and/or the volume drops in spikes.

This could be an issue, as we could get, for example, dips to 0.7 gain and a noticeable drop in volume.

My approach was, by default, to limit the gain to 1.0/target. In other words, if the signal is at or very close to max, the gain should stay around the source level so that the sound is the same as the original. Then, if the input clips, the AGC output clips too (staying at source gain), and if the input does not clip, the AGC does not clip either; in other words, it defaults to the original sound.

One thing we could do here is remove the min(1.0) and let the gain fall below 1.0 when the calculated RMS or peak goes above 1.0, but by default use the limiter to stop the gain at 1.0. That way, if people want the gain to drop below 1.0, they can set the limiter to 0.0, while by default it stops at the source level (i.e. 1.0).
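The proposal above can be sketched as an uncapped ratio followed by a configurable floor. `min_gain` is a hypothetical name for the limiter setting: 1.0 keeps the gain at or above source level (the proposed default), while 0.0 lets inputs above full scale pull the gain below 1.0. Returning unity gain when nothing is measured is a placeholder choice for this sketch.

```rust
/// Hypothetical sketch: ratio without a 1.0 cap on `measured`, then a floor.
fn floored_gain(target: f32, measured: f32, min_gain: f32) -> f32 {
    // Plain ratio; unity gain when nothing usable is measured
    let gain = if measured > 0.0 { target / measured } else { 1.0 };
    // The floor acts as the limiter: 1.0 = stop at source level, 0.0 = no floor
    gain.max(min_gain)
}
```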

> If the root cause is that the code assumes 1.0 as ceiling, then ideally we should remove those assumptions.

It would be ideal, though. However, how would we scale the peak and RMS then? There needs to be a scale somewhere. I guess a true peak would be it, but that would only really be possible with a running maximum, as we cannot pre-process the file to find a true peak, nor can we look ahead with a look-ahead buffer.


Labels: None yet

Projects: None yet


3 participants