AGC improvements and improved gain control stability #882
UnknownSuperficialNight wants to merge 5 commits into …
…bility
- Replace coefficient-based `attack`/`release` with direct `Duration` types
- Reduce `RMS_WINDOW_SIZE` from `8192` to `512` samples to lower latency
- Switch RMS calculation from a mean-based buffer (`CircularBuffer`) to a sum-of-squares approach in `CircularBufferRMS` for accurate root-mean-square values
- Introduce `SlowDownState` struct that manages timing and caching: it counts samples in 2 ms blocks, computes an adaptive `slowdown_factor` using `compute_slowdown_factor`, and caches the result for reuse
- Implement `fast_exp` using Horner's method for an efficient exponential approximation of release coefficients (third-order Taylor polynomial)
- Add `NaN` handling in RMS calculation to prevent invalid values
- Add rate limiting to gain changes: clamp the gain change per sample based on the dynamic attack/release duration to prevent overshooting
- Add new `peak_tracking_window` setting to control peak level smoothing
- Tune default timing parameters: 500 ms attack, 0.5 ms release, 10 ms peak tracking window for balanced behaviour
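For reference, the sum-of-squares idea behind `CircularBufferRMS` can be sketched as below. This is a minimal illustration, not the PR's code: `RunningRms` is a hypothetical name, the window length is a const generic, and non-finite samples are assumed to count as silence so that a single `NaN` cannot poison the running sum.

```rust
/// Fixed-size ring buffer that keeps a running sum of squared samples,
/// so the RMS over the last `N` samples costs O(1) per new sample.
/// `N` must be greater than zero.
struct RunningRms<const N: usize> {
    squares: [f32; N],
    index: usize,
    sum_of_squares: f32,
}

impl<const N: usize> RunningRms<N> {
    fn new() -> Self {
        Self { squares: [0.0; N], index: 0, sum_of_squares: 0.0 }
    }

    /// Pushes one sample and returns the RMS over the current window.
    fn push(&mut self, sample: f32) -> f32 {
        // Assumption: non-finite input is treated as silence.
        let sample = if sample.is_finite() { sample } else { 0.0 };
        let square = sample * sample;
        // Swap the oldest square out of the running sum and the new one in.
        self.sum_of_squares += square - self.squares[self.index];
        self.squares[self.index] = square;
        self.index = (self.index + 1) % N;
        // Rounding drift can leave the sum slightly negative; clamp before `sqrt`.
        (self.sum_of_squares / N as f32).max(0.0).sqrt()
    }
}
```

With `f32` the running sum can drift over very long playback; periodically recomputing it from the buffer, or accumulating in `f64`, are common remedies.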
…calculation
- Replace hardcoded `1.0` fallback with `self.current_gain` when `RMS` equals `0.0`
- Add comment explaining this keeps gain stable or allows gradual decay instead of sudden drops
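A minimal sketch of this fallback, assuming it is expressed through a helper shaped like the `div_or_fallback` mentioned in the PR description. The signature and the `desired_gain` wrapper are assumptions for illustration; the point is that a zero, `NaN` or infinite RMS (or an overflowing quotient) keeps the current gain instead of snapping to a hardcoded `1.0`.

```rust
/// Divides `numerator` by `denominator` only when the denominator is finite and
/// strictly positive and the quotient is finite; otherwise returns `fallback`.
#[inline]
fn div_or_fallback(numerator: f32, denominator: f32, fallback: f32) -> f32 {
    if denominator.is_finite() && denominator > 0.0 {
        let quotient = numerator / denominator;
        if quotient.is_finite() {
            return quotient;
        }
    }
    fallback
}

/// Hypothetical gain computation: silence (RMS of 0.0) keeps the current gain.
fn desired_gain(target_level: f32, rms: f32, current_gain: f32) -> f32 {
    div_or_fallback(target_level, rms, current_gain)
}
```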
- Cap peak tracking at 1.0 to handle out-of-bounds decoder samples
- Ensure that samples from non-normalised decoders such as `libopus` do not cause out-of-bounds values to be tracked
- Cap RMS tracking at 1.0 to handle out-of-bounds decoder samples
- Ensure that samples from non-normalised decoders such as `libopus` do not cause out-of-bounds values to be tracked
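The capping described in the two commits above amounts to clamping the rectified sample before it reaches the peak and RMS trackers. A small sketch with `capped_level` as a hypothetical helper; note that `f32::min` returns the non-`NaN` operand, so `NaN.min(1.0)` would silently yield `1.0`, which is why `NaN` is handled explicitly here.

```rust
/// Rectifies a sample and caps it at full scale before it reaches the
/// level trackers. `NaN` is mapped to 0.0 explicitly because `f32::min`
/// would otherwise return the other operand (1.0).
#[inline]
fn capped_level(sample: f32) -> f32 {
    if sample.is_nan() {
        0.0
    } else {
        sample.abs().min(1.0)
    }
}
```

In this sketch only the level estimate is capped; the sample itself is not modified.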
- Change `RMS_WINDOW_SIZE` constant from `512` to `1024`
- 1024 samples provides a ~23 ms window at 44.1 kHz (~21 ms at 48 kHz) for stable RMS estimation
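The window durations quoted above follow from `window_size / sample_rate`. The sketch below checks that arithmetic and, as a hypothetical extension in line with the "2048 for 96kHz" idea under Potential Improvements, derives a power-of-two window from a 1024-sample reference at 48 kHz. `window_ms` and `rms_window_size` are illustrative names, and the `channels` parameter assumes the buffer is filled with interleaved samples, in which case each channel sees proportionally less time.

```rust
/// Time span, in milliseconds, covered by `window_size` samples.
/// Assumption: if the buffer holds interleaved samples, each channel
/// only gets `window_size / channels` of them.
fn window_ms(window_size: usize, sample_rate: u32, channels: u16) -> f64 {
    window_size as f64 * 1000.0 / (sample_rate as f64 * channels as f64)
}

/// Hypothetical: scale a 1024-sample window at 48 kHz to other rates,
/// rounding up to the next power of two (e.g. 2048 at 96 kHz).
fn rms_window_size(sample_rate: u32) -> usize {
    const REFERENCE_RATE: u64 = 48_000;
    const REFERENCE_SIZE: u64 = 1024;
    // Ceiling division so the window never covers less time than the reference.
    let scaled = (REFERENCE_SIZE * sample_rate as u64 + REFERENCE_RATE - 1) / REFERENCE_RATE;
    (scaled.max(1) as usize).next_power_of_two()
}
```

A window whose size depends on the sample rate would need a heap-allocated buffer (or the largest size allocated up front) rather than a fixed-size array.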
```rust
release_time: Duration::from_secs(0), // Recommended release time
absolute_max_gain: 7.0, // Recommended max gain
target_level: 1.0, // Default to original level
attack_time: Duration::from_millis(500), // Recommended attack time
```
This might be too low. I found 500 ms or 800 ms to work quite well; I'd appreciate some feedback on this if possible.
Sorry, I have no idea what works best for the new algorithm. For speech, quite fast was useful.
| } | ||
| } | ||
|
|
||
| impl<I> Iterator for AutomaticGainControl<I> |
I'm not sure I implemented the changes in this part correctly for the new algorithm. I think I did, but it would be nice to have someone more familiar with it check.
```rust
/// It provides a good balance between speed and accuracy, resulting in
/// faster benchmark times compared to the standard `exp` function.
#[inline]
fn fast_exp(x: Float) -> Float {
```
Go ahead, seems like a good addition!
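Since only the doc comment and signature of `fast_exp` appear above, here is what a third-order Taylor approximation of `e^x` evaluated with Horner's method looks like. It's a sketch using `f32` in place of the crate's `Float` alias and may not match the PR's exact body.

```rust
/// Third-order Taylor polynomial for e^x around 0, evaluated with Horner's method:
/// 1 + x + x^2/2 + x^3/6  ==  1 + x * (1 + x * (1/2 + x/6))
/// Accurate only for small |x|; the error grows quickly away from 0.
#[inline]
fn fast_exp(x: f32) -> f32 {
    1.0 + x * (1.0 + x * (0.5 + x * (1.0 / 6.0)))
}
```

Horner's form needs three multiplications and three additions. The trade-off is range: at `x = 0.1` the error is about `4e-6`, at `x = 1.0` it is about `0.05`, and at `x = -3.0` the polynomial returns `-2.0`, so callers should keep the argument small or clamp the result.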
yara-blue left a comment
I also like the idea of multiple profiles. Ideally we would also give the "current" default a name; maybe "Music" and "Speech"?
| } | ||
| } | ||
|
|
||
| impl<I> Iterator for AutomaticGainControl<I> |
```rust
absolute_max_gain: 7.0, // Recommended max gain
target_level: 1.0, // Default to original level
attack_time: Duration::from_millis(500), // Recommended attack time
release_time: Duration::from_nanos(500000), // Recommended release time
```
| } | ||
| } | ||
|
|
||
| impl<I> Iterator for AutomaticGainControl<I> |
| release_time: Duration::from_secs(0), // Recommended release time | ||
| absolute_max_gain: 7.0, // Recommended max gain | ||
| target_level: 1.0, // Default to original level | ||
| attack_time: Duration::from_millis(500), // Recommended attack time |
There was a problem hiding this comment.
Sorry I have no idea what works best for the new algorithm. For speech quiet fast was useful
| /// It provides a good balance between speed and accuracy, resulting in | ||
| /// faster benchmark times compared to the standard `exp` function. | ||
| #[inline] | ||
| fn fast_exp(x: Float) -> Float { |
There was a problem hiding this comment.
go ahead seems like a good addition!
Values outside of … Long story short, we should deal with such values without clipping them.
Any ideas on this? The first thing that comes to mind, though I could be wrong, is something like this:

```rust
// Treat the observed peak as full scale when the input exceeds 1.0
let full_scale = self.peak_level.max(1.0);
// Calculate max gain change per sample based on dynamic attack/release times
let max_attack_gain_change_per_sample = full_scale / (dynamic_attack_time * sample_rate);
let max_release_gain_change_per_sample = full_scale / (release_duration * sample_rate);
```

Basically, go through and compute a new maximum per sample and scale for that. Just throwing ideas out there. We would probably have to go through it all again and possibly remove the …
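Building on that idea, a self-contained version might look like the sketch below. `step_gain` and its parameters are hypothetical; durations are passed as seconds in `f32` so they can be multiplied by the sample rate (a `Duration` would need `as_secs_f32()` first), and the attack/release direction is an assumption mirroring the existing defaults (slow 500 ms attack for rising gain, fast 0.5 ms release for falling gain).

```rust
/// Moves `current_gain` towards `desired_gain` by at most one per-sample step.
/// The step limit is scaled by `full_scale = peak_level.max(1.0)`, so input that
/// peaks above 1.0 gets proportionally larger steps.
/// Assumption: the attack time governs gain increases, the release time decreases.
fn step_gain(
    current_gain: f32,
    desired_gain: f32,
    peak_level: f32,
    attack_secs: f32,
    release_secs: f32,
    sample_rate: f32,
) -> f32 {
    let full_scale = peak_level.max(1.0);
    let secs = if desired_gain > current_gain { attack_secs } else { release_secs };
    // `.max(1.0)` guards against a zero duration: the step is then `full_scale`.
    let max_step = full_scale / (secs * sample_rate).max(1.0);
    current_gain + (desired_gain - current_gain).clamp(-max_step, max_step)
}
```

At 48 kHz with a 500 ms attack this allows a rise of `1 / 24000` per sample for normalised input, while a 0.5 ms release allows a drop of `1 / 24` (or `1.1 / 24` when the peak is at `1.1`).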
AGC maps input to the range [-1.0, 1.0]. To do so without clipping, it needs to know the width of the input range. It can't look ahead to see which samples will be emitted, and thus what the peak is. All I can think of is to assume an unrealistically big input range, say [-1.5, 1.5]. Is that unrealistically big?
Let's look at an extreme: what would happen if, halfway through playback, a single sample peaks really high, say at 10.0? Would everything get quieter after that sample?
Please excuse me for responding a bit theoretically, without having recently studied the current implementation: the gain calculation should fundamentally be the ratio … If the root cause is that the code assumes 1.0 as the ceiling, then ideally we should remove those assumptions.
I was thinking about a running maximum: we track each sample, and if a sample exceeds the current maximum, it replaces the old running maximum. That was my original idea, anyway.
Possibly; I guess that would depend on the implementation. Maybe there is a peak decay after … Though I must admit neither of these seems like a proper solution.
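To make the trade-off discussed above concrete, the sketch below (hypothetical `PeakHold` type) compares a pure running maximum with one that decays geometrically per sample. With `decay = 1.0` a single spike to 10.0 is held for the rest of playback, which in an AGC that limits gain by this peak would keep everything quieter from then on; with `decay < 1.0` the tracker recovers once the spike has faded.

```rust
/// Running peak tracker. `decay = 1.0` gives a pure running maximum;
/// `decay < 1.0` shrinks the held peak geometrically on every sample.
struct PeakHold {
    peak: f32,
    decay: f32,
}

impl PeakHold {
    fn new(decay: f32) -> Self {
        Self { peak: 0.0, decay }
    }

    /// Feeds one sample and returns the current held peak.
    fn push(&mut self, sample: f32) -> f32 {
        // A louder sample replaces the decayed peak immediately.
        self.peak = (self.peak * self.decay).max(sample.abs());
        self.peak
    }
}
```

The decay rate is a tuning choice; it could, for example, be derived from a hold time in the same way as a release coefficient.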
This is what happens without the … Though this could be an issue, as we could get, for example, dips to … My approach was to, by default, limit the gain to … One thing we could do here is remove the …
It would be ideal, though. However, how would we scale the …
This PR focusses mostly on adding stability to AGC through the `slowdown_factor` and miscellaneous improvements.

I've been experimenting with the AGC to find ways to stabilise it. This is the result.
The `compute_slowdown_factor` function serves as a third control layer that measures proximity to the target gain alongside the standard `RMS` and `peak` metrics. It acts as a dynamic throttle, adjusting the AGC rate of change based on how close the signal is to the desired level. The slowdown logic activates only when the current gain falls within the combined `RMS` + `peak` tolerance window relative to the target. When the input is loud, the tolerance window widens; with quieter signals, it contracts.

Inside this boundary, exponential scaling prevents the harsh jumps and oscillations that occurred with fixed-rate adjustments. As the signal approaches the target, the slowdown increases, reducing the AGC rate of change and producing smoother behaviour. Outside this zone, the AGC uses normal responsiveness, which allows more rapid correction when needed. The tolerance window is bounded by the combined `RMS` + `peak` metric.

By managing these ranges, the system enables faster attack times without flattening audio dynamics. Previously, aggressive speeds would normalise all sounds to a flat line. Now the AGC can accelerate adjustments when far from the target but slows down exponentially as it approaches the goal. This preserves audio depth while maintaining stability: it reacts quickly when needed and stabilises gradually near the final level, preventing the gain overshoot and sudden volume spikes that can occur with fixed-rate adjustments.
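The behaviour described above can be approximated with the sketch below. It's an illustration under stated assumptions, not the PR's `compute_slowdown_factor` or `SlowDownState`: the tolerance is taken to be `rms + peak`, the easing curve is `e^(k(p - 1))` with a hypothetical steepness `k = 4`, and `SlowdownCache` is a hypothetical stand-in for the 2 ms block caching. It uses the standard `exp` because a third-order Taylor `fast_exp` is inaccurate at arguments as far from 0 as -4.

```rust
/// Hypothetical proximity-based slowdown. Returns a multiplier in (0, 1] for the
/// per-sample gain step: 1.0 outside the tolerance window, shrinking
/// exponentially as `current_gain` approaches `desired_gain`.
fn slowdown_factor(current_gain: f32, desired_gain: f32, rms: f32, peak: f32) -> f32 {
    // Assumption: the window is the combined RMS + peak level, so louder input widens it.
    let tolerance = (rms + peak).max(f32::EPSILON);
    let distance = (desired_gain - current_gain).abs();
    if distance >= tolerance {
        return 1.0; // Far from the target: normal responsiveness.
    }
    // 0.0 at the target, 1.0 at the edge of the window.
    let proximity = distance / tolerance;
    const STEEPNESS: f32 = 4.0; // Hypothetical tuning constant.
    // e^(k * (p - 1)) is 1.0 at the edge (continuous with the outside) and e^(-k) at the target.
    (STEEPNESS * (proximity - 1.0)).exp()
}

/// Recomputes the factor once per 2 ms block and reuses the cached value in between.
struct SlowdownCache {
    samples_per_block: u32,
    counter: u32,
    cached: f32,
}

impl SlowdownCache {
    fn new(sample_rate: u32) -> Self {
        // 2 ms = sample_rate / 500 samples (at least one).
        Self { samples_per_block: (sample_rate / 500).max(1), counter: 0, cached: 1.0 }
    }

    fn get(&mut self, compute: impl FnOnce() -> f32) -> f32 {
        if self.counter == 0 {
            self.cached = compute();
        }
        self.counter = (self.counter + 1) % self.samples_per_block;
        self.cached
    }
}
```

Multiplying the per-sample step limit by this factor gives full speed outside the window and a smooth taper inside it, since the curve reaches exactly 1.0 at the window edge. At 48 kHz the cache recomputes every 96 samples.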
`update_peak_level` Optimisation
This function was a performance hotspot due to per-sample allocation and branching. Previously, we computed a conditional coefficient for each sample: a fast attack coefficient (`0.0`) when the sample exceeded the peak, and a slow release coefficient otherwise.
I've replaced this with a branchless implementation that uses a fixed `release_coefficient` (which is always cached), eliminating the per-sample `if` branch and allocation.
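Assuming the old code followed the one-pole pattern described above, the branchless version can be sketched like this. `update_peak_level` here is a free function rather than the PR's method, and `release_coefficient` uses the common `exp(-1 / (time * sample_rate))` form as an assumption. `f32::max` is typically compiled without a conditional jump, though that depends on the target.

```rust
/// One-pole release coefficient for a time constant of `release_secs`.
/// Assumption: standard `exp(-1 / (time * rate))` form; a zero time yields 0.0 (instant).
fn release_coefficient(release_secs: f32, sample_rate: f32) -> f32 {
    (-1.0 / (release_secs * sample_rate)).exp()
}

/// Branchless peak follower: instant attack, exponential release.
/// The blend of `peak` and `level` always lies between them, so:
/// - if `level > peak`, the blend is below `level` and `max` picks `level` (attack);
/// - otherwise the blend is at least `level` and `max` keeps the blend (release).
/// This matches the old branch that used a coefficient of 0.0 on attack.
#[inline]
fn update_peak_level(peak: f32, sample: f32, release_coeff: f32) -> f32 {
    let level = sample.abs();
    let blended = release_coeff * peak + (1.0 - release_coeff) * level;
    blended.max(level)
}
```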
Before (Slow, Branching + Allocation):
Other changes in this PR
- `CircularBufferRMS` now uses sum-of-squares internally and has been cleaned up.
- `div_or_fallback` helper to safely divide by non-NaN, non-infinite, positive values.
- `fast_exp` helper using Horner's method for `exp(x)` approximation in `compute_slowdown_factor`.
Benchmarks
Benchmarks before:
Benchmarks after the changes and redesign:
Concerns
The Libopus decoder can output samples above `1.0`, such as `1.1`, `1.064`, and similar values, for both `RMS` and `peak` readings depending on the track. This behaviour is not observed with the FLAC decoder.

These out-of-range samples cause errors downstream, particularly when offsetting the current gain below `1.0` while targeting `1.0`. I've added `.min(1.0)` to cap the tracked `RMS` and `peak` values so they never exceed the limit.

The root cause, as far as I can tell, is the Libopus decoder, which should not output values above `1.0` in the first place.

This is probably worth investigating: is this behaviour by design in Libopus, or is something wrong upstream of the effect?
Potential Improvements
… `2048` for `96kHz`). This ensures the buffer remains consistent.
Pseudocode Example:
… `AutomaticGainControlSettings` might be a good idea.
Video Comparison
Before:
before_normal.mp4
After:
after_normal.mp4
Before near the loudness limit
near_limit_before.mp4
After near the loudness limit
near_limit_after.mp4
Additional notes
This can be tuned back to how it worked originally if users preferred the more normalised sound.
It might even be worth adding a toggle for the slowdown so that it can be disabled.