Skip to content

AudioBridge audio limiter#3593

Open
m08pvv wants to merge 54 commits intomeetecho:masterfrom
m08pvv:ab_limiter
Open

AudioBridge audio limiter#3593
m08pvv wants to merge 54 commits intomeetecho:masterfrom
m08pvv:ab_limiter

Conversation

@m08pvv
Copy link
Contributor

@m08pvv m08pvv commented Oct 31, 2025

Worked a bit with AudioBridge and it was painful to hear all crackling noises when multiple streams summed up, so I decided to implement an audio limiter. I took AGC2 from Google's WebRTC (BSD 3-clause license that allows modifications and usage as long as license file stays in repo) and translated it to C (didn't write in C since school, so it might be not clean enough), adapted it for AudioBridge and also added SSE 4.2 and AVX2 versions (with proper initialization from available at runtime).

Also added a parameter to AudioBridge: use_limiter (boolean, default=false), which enables limiter. If limiter is disabled - only clamping (saturation to int16 boundaries) applied.

Here are some results of many participants speaking (5 participants speaking and the record is from incoming ab track):
image

Top - current Janus release with AudioBridge
Bottom - version with limiter and use_limiter = true
Full-spectrum lines - artifacts (crackling noise and clipping)

As you can see, current Janus AB implementation (just sum up all tracks) results in artifacts (values outside int16 range) whereas limiter reduces volume dynamically to normal level.

How it works:

  • On ab initialization we set references to the most efficient (from available) implementation (scalar, sse 4.2, avx2)
  • For each segment of audio:
    • If we mix 2 or more tracks (recording 2+ or talking 3+) - calculate scaling factors
    • If we record 2+ tracks mixed - scale and clamp, otherwise just clamp
    • For each participant:
      • If there are 2 or more tracks besides this participant - scale and clamp, otherwise just clamp

Usage:
By default limiter is disabled and only clamping is applied.
Set use_limiter property in create audiobridge request to true to enable limiter.

Any suggestions are welcome!

@lminiero
Copy link
Member

I'm trying to better understand the scope for this effort, and after giving it some thought, I think that this patch may be trying to do the right thing but in the wrong place.

If what we want is to prevent saturation on the mix itself, then the patch in #3601 is probably enough, but what we want a limiter to do is "limit" the contribution of each individual participant, especially noisy ones, which probably means the limiter code should be applied to the audio we decode from participants (so before the mix) rather than on the full mix itself. This would be in line with how the denoiser works, in fact: for denoising too, it's individual participants we denoise before mixing, rather than denoising a mix after the fact, so that we end up with a mix that is as clean as possible. By limiting individual participants, I assume we'd have more or less normalized contrbutions, and in case their sum goes too loud, #3601 kicks in to stay within the -0.9/0.9 limit. This would also allow us to selectively choose which participants should be limited, if any, with a room default that can be used in case no property is explicitly set for participants (again, which is how denoising works too).

Does this make sense? What do you think @m08pvv @whatisor ?

@m08pvv
Copy link
Contributor Author

m08pvv commented Nov 13, 2025

@lminiero in current master the resulting mix is a sum, so if we have 5 speakers (A, B, C, D, E) then:
recording is A+B+C+D+E
participant A receives B+C+D+E
...
participant E receives A+B+C+D

We assume that all tracks (A, B, C, D, E) do not exceed 100% volume (otherwise the audio is already broken).
The sum of 4 tracks will be in range [0, 400%] of volume and sum of 5 tracks will be in [0, 500%], so we need to reduce resulting tracks at least in 4 and 5 times respectively.
The main idea was that we can calculate one set of scale factors (for the whole mix, in this case it's A+B+C+D+E) and then apply it to each resulting track (A+B+C+D, ..., B+C+D+E), so the audiobridge would not introduce the x4-x5 gain if all participants speaking.

Normalizing every track individually would be N times more CPU-intensive and usually the volume of individual speakers adjusted by volume_gain (at least I thought so).

So, if calculating N set of factors and applying them to each track individually is ok (shouldn't take much CPU, but it's a hot path in audiobridge) - then I can do so, but the sum of tracks should be normalized anyway because otherwise even if we have 5 participants that are in range [0, 50%] - resulting tracks would be [0, 200%] and [0, 250%] for participants and recording respectively.

@lminiero
Copy link
Member

Yeah I get all that, but my point is that if we have 5 participants whose sum gets to 500%, we already have that other PR that brings the mix down to 90% (instead of cutting to 100% as we did before), since we're using 32-bit samples for mixing so we're not actually losing anything. That should already be equivalent to a gain reduction in each sample. I'm personally more interested in limiting individual participants that may be too loud, as what's most boresome in my experience is not the actual sum of things, but the relative dominance of some speakers over others due to their potentially different individual gains.

@lminiero
Copy link
Member

Normalizing every track individually would be N times more CPU-intensive and usually the volume of individual speakers adjusted by volume_gain (at least I thought so).

It's true that we have the volume gain for that, but that's manual. I was thinking of something automated, which is what I assumed a limiter could bring to the table. That would pretty much be the same as audio compression (which is what I assume any limiter implements, more or less).

@m08pvv
Copy link
Contributor Author

m08pvv commented Nov 13, 2025

  1. In Fix overflow mixing #3601 the sum is not scaled to 0.9 it's just clamped at that level, so if we have e.g. a sine wave - we will get almost a square wave (meander), which is not what we want I guess. So then it needs to also scale it somehow before or after sum.

@lminiero
Copy link
Member

even if we have 5 participants that are in range [0, 50%] - resulting tracks would be [0, 200%] and [0, 250%] for participants and recording respectively.

I think this is where we may have a source of confusion. It's true that when you mix many contributions you can exceed the 100%, but that's exactly why we use an int32_t for the mixing buffer: that is to allow a sum even if it goes beyond the limits, without losing anything. When we then snap those samples back to int16_t, the more basic code in that other PR brings it back to max 90% no matter what it was, which is more or less equivalent to a gain reduction. I guess that a better way to do that would be to figure out much the loudest sample exceeds the mix, and apply the same division to all samples in those 20ms, but it's short enough snippets that I don't think it would matter much.

Anyway, I'll need to find some simple ways to test both patches, possibly using some automated participants in the AudioBridge that provide some talking. Once I do that I'll have a clearer picture of the whole thing.

@lminiero
Copy link
Member

  1. In Fix overflow mixing #3601 the sum is not scaled to 0.9 it's just clamped at that level, so if we have e.g. a sine wave - we will get almost a square wave (meander), which is not what we want I guess. So then it needs to also scale it somehow before or after sum.

Yep, makes sense, which is what I mentioned in a subsequent comment too. I'll need to make some tests.

@m08pvv
Copy link
Contributor Author

m08pvv commented Nov 13, 2025

Just for visual representation of clipping:
image
If we just limit min and max values without actual scaling, we get clipping artifacts.

@m08pvv
Copy link
Contributor Author

m08pvv commented Nov 14, 2025

@whatisor what rnnoise version do you use? I built from f6662ad41f5bf7bf244967a04e95334c81e5af4c with 55+MB rnnoise_data.tar.gz and I have zero clipping even using default demo's 16kHz audiobridge room.

@lminiero
Copy link
Member

I still need to prepare a test with a few automated AudioBridge participants for testing purposes, but I was thinking that maybe we could achieve the same kind of result in a "simpler" way, and without the additional dependency:

  1. we mix as we do know (sum to an int32 array)
  2. once we have a sum of all participants, find the "loudest" sample, and see what negative gain needs to be applied to stay within the -0.9/0.9 range (if any: we may be in the range already)
  3. apply that gain to the mix and save it to a different array, that we'll use for all passive participants (non-talkers, recording, forwarders)
  4. for participants that are talking, as we do now, remove their contribution and do the same thing (find the loudest sample, find the gain, apply the gain to the mix for that participant)

This way we don't need to worry about how many people are talking, what's silence or not, since we always get a mix that is loud enough for everyone involved, whether they're active or not. Most importantly, we don't needlessly bring the whole mix down any time more than one person talks, since (if I understood our patch correctly) as your patch reduces the volume of active participants depending on how many there are, with 4 people you decrease each of their volumes to ~25%, which will get the mix for non-active participants to ~100%, but active speakers will get a much quieter mix (75%, since their contribution will not be a part of that). Besides, silence detection as it's done is tricky, as simple background noise will not be detected as silence but will be low enough not to contribute anything meaningful and still bring the whole volume down (people talking will suddenly be quieter if someone's unmuted but not really saying anything).

Anyway, this was just me thinking of a possible alternative solution to a problem that does exist, but that's not to say that it will be THE solution. I'll need to setup some testbed to make some tests (with different fake people speaking, as me alone wouldn't cut it), and once I have it I can prepare a tentative patch that implements my idea and I can do some comparisons.

@lminiero
Copy link
Member

@m08pvv @whatisor I prepared a first version of my proposed idea in the PR above. I did a few tests using a bunch of fake participants sending pre-recorded talking audio (plus my ugly and noisy voice) and it seems to be doing something: if I look at the recording, it seems to be doing its job, but of course I may be wrong. Considering both of you have made some tests with multiple participants, could you let me know if this more or less does what you need it to? Thanks!

@m08pvv
Copy link
Contributor Author

m08pvv commented Nov 19, 2025

This (adapted from WebRTC) implementation is an AGC (Adaptive Gain Control), so it uses piecewise linear interpolation with pre-calculated gain curves, providing precise control over different signal levels and it includes attack/decay filtering that prevents abrupt gain changes, avoiding audible artifacts. Simple peak scaling can introduce pumping or breathing effects.
This limiter calculates per-sample scaling factors for smooth transitions, especially important during attack phases.
Yes it uses sophisticated envelope detection with lookahead to anticipate signal peaks, but it allows for more graceful gain reduction compared to reactive peak detection. It also divides audio into sub-frames for more granular control, enabling better transient preservation and reduced distortion.
The gain curves are specifically designed to minimize harmonic distortion while maintaining signal integrity, based on extensive psychoacoustic research (done by WebRTC team).

And yes, this code we already run in production and it sounds great in calls with 15 people.

spscream pushed a commit to spscream/janus-gateway that referenced this pull request Nov 19, 2025
@whatisor
Copy link

ce we're using 32-bit samples for

I am using https://github.com/xiph/rnnoise/releases/tag/v0.2

@m08pvv
Copy link
Contributor Author

m08pvv commented Nov 24, 2025

@whatisor, I tried 0.2 (with ~21 MB model) and it sounds terrible, try latest master of rnnoise (it might fade-in beginnings of phrases but at least it doesn't produce noise)

@m08pvv
Copy link
Contributor Author

m08pvv commented Dec 17, 2025

@lminiero, did you have a chance to look at the code?

@lminiero
Copy link
Member

Not yet, sorry... if not in the next few days, I'll definitely make this a priority as soon as I get back to the office after the holidays.

@m08pvv
Copy link
Contributor Author

m08pvv commented Jan 20, 2026

@lminiero, any luck to look at this PR?

@lminiero
Copy link
Member

Ouch, I haven't checked it yet, apologies! I'm currently busy updating the MoQ library (new version of the draft is out), I'll 100% have a deep dive right after FOSDEM next week. If I don't feel free to shout at me 😅

Copy link
Member

@lminiero lminiero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay, post-FOSDEM flu knocked me out for a few days. I've done a first quick review of the code, but there's a few things I still don't understand. More details inline.

@@ -0,0 +1,855 @@
/* The code in this file contains code adapted from WebRTC project. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be helpful to link to exactly the source file you modified. If it's not a single file because it's spread in multiple ones, then maybe the folder, so that people (me included) have a reference.

I won't review the content of this file as it's ~1000 lines and I honestly don't know what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added link to the original repo and directory with AGC2 sources (it's distributed across several files)

@@ -0,0 +1,99 @@
/* The code in this file contains code adapted from WebRTC project. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

#ifdef HAVE_RNNOISE
#include <rnnoise.h>
#endif
/* We ship our own version of */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of what? 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was tough day, somehow missed.
Fixed.

}
}
/* If we use limiter we should initialize it if we have more than 2 tracks mixed or we have more than 1 track mixed and recording */
if(audiobridge->use_limiter && (mix_count > 2 || (audiobridge->recording && mix_count > 1)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why more than two? Don't you risk clipping when there's just two very noise speakers as well? I understand that the N-1 algorithm will get both speakers a "clean" mix (just the other one), but passive speakers will still get a sum of both.

Not sure why the recording makes any difference here. Also, why are RTP forwarders not considered, since just as recordings they are a mix of everyone else? Actually, with groups support, RTP forwarders can get even more complicated than that, since they can be a sub-mix of only selected participants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified condition, now it's just audiobridge->use_limiter && mix_count > 1

Regarding forwarders - I didn't work with them and I need a deeper dive into the code, will come back later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified initialization logic a bit more and also replaced buffer with fullmix_buffer for rorwarders

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lminiero can you please take a look with a fresh view if I correctly passed fullmix_buffer to rtp-forwarders?

scale_buffer(buffer, samples, per_sample_scaling_factors, fullmix_buffer);
} else {
/* Otherwise just clamp values that are outside int16 boundaries */
clamp_buffer(buffer, samples, fullmix_buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is clamping done only when there's silent participants? Isn't clamping what we always do now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's initialization of fullmix_buffer which is later used for all silent participants and contains a mix of all tracks.

@m08pvv
Copy link
Contributor Author

m08pvv commented Mar 3, 2026

Is there anything else I can do to help to complete this PR?

@lminiero
Copy link
Member

lminiero commented Mar 4, 2026

Is there anything else I can do to help to complete this PR?

No need at the moment, thanks! The ball is in my court, so it's on me to find time to review this. I'll probably only be able to do that towards the end of the month, as we have the IETF meeting next week.

Apologies if this is taking a while, but this is a big and heavy change on the AudioBridge fundamentals. Considering we use the AB heavily in production ourselves, I can't risk merging something I'm not very familiar with, or I'm not on board 100%. I'll need time to properly and thoroughly review everything, as there were things I had doubts about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants