Conversation
|
I'm trying to better understand the scope for this effort, and after giving it some thought, I think that this patch may be trying to do the right thing but in the wrong place. If what we want is to prevent saturation on the mix itself, then the patch in #3601 is probably enough, but what we want a limiter to do is "limit" the contribution of each individual participant, especially noisy ones, which probably means the limiter code should be applied to the audio we decode from participants (so before the mix) rather than on the full mix itself. This would be in line with how the denoiser works, in fact: for denoising too, it's individual participants we denoise before mixing, rather than denoising a mix after the fact, so that we end up with a mix that is as clean as possible. By limiting individual participants, I assume we'd have more or less normalized contributions, and in case their sum gets too loud, #3601 kicks in to stay within the -0.9/0.9 limit. This would also allow us to selectively choose which participants should be limited, if any, with a room default that can be used in case no property is explicitly set for participants (again, which is how denoising works too). |
|
@lminiero in current master the resulting mix is a sum, so if we have 5 speakers (A, B, C, D, E) then: We assume that all tracks (A, B, C, D, E) do not exceed 100% volume (otherwise the audio is already broken). Normalizing every track individually would be N times more CPU-intensive, and usually the volume of individual speakers is adjusted via volume_gain (at least I thought so). So, if calculating N sets of factors and applying them to each track individually is OK (it shouldn't take much CPU, but it's a hot path in the AudioBridge), then I can do so, but the sum of tracks should be normalized anyway, because otherwise even if we have 5 participants that are in range [0, 50%], the resulting tracks would be [0, 200%] and [0, 250%] for participants and recording respectively. |
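As a toy illustration of those percentages (hypothetical names, not the actual AudioBridge code): summing five tracks at ~50% of full scale into a 32-bit accumulator gives ~250% of the int16 range for the recording, and ~200% for the N-1 mix a participant receives (their own track excluded).

```c
#include <stdint.h>

/* Toy sketch of the mixing math above: samples are summed into a 32-bit
 * accumulator, so five speakers at ~50% of full scale overflow the int16
 * output range. `skip` excludes one participant's own track (the N-1 mix);
 * pass skip < 0 to mix everyone (what the recording gets). */
static int32_t toy_sum(const int16_t *tracks, int n, int skip) {
    int32_t sum = 0;
    for (int i = 0; i < n; i++) {
        if (i != skip)
            sum += tracks[i];
    }
    return sum;
}
```

With five tracks at 16000 (~50% of 32767), the full mix is 80000 and the N-1 mix is 64000, both well outside the [-32768, 32767] range, which is why the sum must be normalized or clamped before being written out as int16.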
|
Yeah I get all that, but my point is that if we have 5 participants whose sum gets to 500%, we already have that other PR that brings the mix down to 90% (instead of cutting to 100% as we did before), since we're using 32-bit samples for mixing, so we're not actually losing anything. That should already be equivalent to a gain reduction in each sample. I'm personally more interested in limiting individual participants that may be too loud, as what's most bothersome in my experience is not the actual sum of things, but the relative dominance of some speakers over others due to their potentially different individual gains. |
It's true that we have the volume gain for that, but that's manual. I was thinking of something automated, which is what I assumed a limiter could bring to the table. That would pretty much be the same as audio compression (which is what I assume any limiter implements, more or less). |
|
I think this is where we may have a source of confusion. It's true that when you mix many contributions you can exceed 100%, but that's exactly why we use 32-bit samples for mixing. Anyway, I'll need to find some simple ways to test both patches, possibly using some automated participants in the AudioBridge that provide some talking. Once I do that I'll have a clearer picture of the whole thing. |
Yep, makes sense, which is what I mentioned in a subsequent comment too. I'll need to make some tests. |
|
@whatisor what rnnoise version do you use? I built from f6662ad41f5bf7bf244967a04e95334c81e5af4c with 55+MB rnnoise_data.tar.gz and I have zero clipping even using default demo's 16kHz audiobridge room. |
|
I still need to prepare a setup with a few automated AudioBridge participants for testing purposes, but I was thinking that maybe we could achieve the same kind of result in a "simpler" way, and without the additional dependency:
This way we don't need to worry about how many people are talking, or what's silence or not, since we always get a mix that is loud enough for everyone involved, whether they're active or not. Most importantly, we don't needlessly bring the whole mix down any time more than one person talks, since (if I understood your patch correctly) it reduces the volume of active participants depending on how many there are: with 4 people you decrease each of their volumes to ~25%, which will get the mix for non-active participants to ~100%, but active speakers will get a much quieter mix (~75%, since their own contribution will not be part of it). Besides, silence detection as it's done is tricky, as simple background noise will not be detected as silence, but will be low enough not to contribute anything meaningful and still bring the whole volume down (people talking will suddenly be quieter if someone's unmuted but not really saying anything). Anyway, this was just me thinking of a possible alternative solution to a problem that does exist, but that's not to say that it will be THE solution. I'll need to set up some testbed to make some tests (with different fake people speaking, as me alone wouldn't cut it), and once I have it I can prepare a tentative patch that implements my idea and I can do some comparisons. |
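The arithmetic in the paragraph above can be sketched like this (a simplified toy model with hypothetical names, one sample per participant; this is not the actual patch being discussed):

```c
#include <stdint.h>

/* Toy model of per-active-participant scaling: with n_active speakers
 * each scaled to 1/n_active, a passive listener hears the full scaled
 * sum, while an active speaker hears the sum minus their own
 * contribution (the N-1 mix), so their mix ends up quieter. */
typedef struct {
    int32_t sample;  /* one representative sample for this participant */
    int active;      /* non-zero if currently speaking */
} toy_participant;

/* Full mix heard by a passive (non-sending) participant */
static int32_t passive_mix(const toy_participant *p, int n, int n_active) {
    int32_t sum = 0;
    for (int i = 0; i < n; i++) {
        if (p[i].active)
            sum += p[i].sample / n_active;  /* each active track scaled down */
    }
    return sum;
}

/* N-1 mix heard by active participant `self` (own audio removed) */
static int32_t active_mix(const toy_participant *p, int n, int n_active, int self) {
    int32_t own = p[self].active ? p[self].sample / n_active : 0;
    return passive_mix(p, n, n_active) - own;
}
```

With four active participants all at the same level, a passive listener gets 100% of one participant's level, while each active speaker only gets 75%, matching the numbers in the comment above.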
|
@m08pvv @whatisor I prepared a first version of my proposed idea in the PR above. I did a few tests using a bunch of fake participants sending pre-recorded talking audio (plus my ugly and noisy voice): looking at the recording, it appears to be doing its job, but of course I may be wrong. Considering both of you have made some tests with multiple participants, could you let me know if this more or less does what you need it to? Thanks! |
|
This implementation (adapted from WebRTC) is an AGC (Automatic Gain Control): it uses piecewise linear interpolation with pre-calculated gain curves, providing precise control over different signal levels, and it includes attack/decay filtering that prevents abrupt gain changes, avoiding audible artifacts. Simple peak scaling can introduce pumping or breathing effects. And yes, we already run this code in production and it sounds great in calls with 15 people. |
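A minimal sketch of the attack/decay idea mentioned above (this is not the WebRTC AGC2 code; names and coefficients are illustrative, not tuned values): instead of jumping straight to the target gain computed from the signal level, the applied gain moves toward it at different speeds, fast when the gain must drop (attack) and slow when it recovers (decay), which is what avoids audible pumping.

```c
/* Toy one-pole gain smoother: coeff close to 1 reacts instantly,
 * coeff close to 0 reacts slowly. Illustrative values only. */
typedef struct {
    float gain;          /* currently applied gain */
    float attack_coeff;  /* e.g. 0.5f: react quickly to loud peaks */
    float decay_coeff;   /* e.g. 0.02f: recover slowly after peaks */
} toy_smoother;

static float smooth_gain(toy_smoother *s, float target_gain) {
    /* Gain going down (limiting harder) uses the fast attack
     * coefficient, gain going back up uses the slow decay one */
    float coeff = (target_gain < s->gain) ? s->attack_coeff : s->decay_coeff;
    s->gain += coeff * (target_gain - s->gain);
    return s->gain;
}
```

A loud peak asking for 0.5x gain pulls the applied gain down quickly, but once the peak passes, the gain creeps back toward 1.0 over many frames instead of snapping back.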
I am using https://github.com/xiph/rnnoise/releases/tag/v0.2 |
|
@whatisor, I tried 0.2 (with the ~21 MB model) and it sounds terrible; try the latest master of rnnoise (it might fade in the beginnings of phrases, but at least it doesn't produce noise) |
|
@lminiero, did you have a chance to look at the code? |
|
Not yet, sorry... if not in the next few days, I'll definitely make this a priority as soon as I get back to the office after the holidays. |
|
@lminiero, any chance you had a look at this PR? |
|
Ouch, I haven't checked it yet, apologies! I'm currently busy updating the MoQ library (a new version of the draft is out), but I'll 100% have a deep dive right after FOSDEM next week. If I don't, feel free to shout at me 😅
lminiero
left a comment
Apologies for the delay, the post-FOSDEM flu knocked me out for a few days. I've done a first quick review of the code, but there are a few things I still don't understand. More details inline.
@@ -0,0 +1,855 @@
/* The code in this file contains code adapted from WebRTC project. */
It may be helpful to link to exactly the source file you modified. If it's not a single file because it's spread across multiple ones, then maybe the folder, so that people (me included) have a reference.
I won't review the content of this file as it's ~1000 lines and I honestly don't know what it does.
Added link to the original repo and directory with AGC2 sources (it's distributed across several files)
@@ -0,0 +1,99 @@
/* The code in this file contains code adapted from WebRTC project. */
src/plugins/janus_audiobridge.c
Outdated
#ifdef HAVE_RNNOISE
#include <rnnoise.h>
#endif
/* We ship our own version of */
It was a tough day, somehow I missed that.
Fixed.
src/plugins/janus_audiobridge.c
Outdated
	}
}
/* If we use limiter we should initialize it if we have more than 2 tracks mixed or we have more than 1 track mixed and recording */
if(audiobridge->use_limiter && (mix_count > 2 || (audiobridge->recording && mix_count > 1)))
Why more than two? Don't you risk clipping when there are just two very noisy speakers as well? I understand that the N-1 algorithm will get both speakers a "clean" mix (just the other one), but passive speakers will still get a sum of both.
Not sure why the recording makes any difference here. Also, why are RTP forwarders not considered, since just as recordings they are a mix of everyone else? Actually, with groups support, RTP forwarders can get even more complicated than that, since they can be a sub-mix of only selected participants.
Simplified the condition, now it's just `audiobridge->use_limiter && mix_count > 1`
Regarding forwarders - I didn't work with them and I need a deeper dive into the code, will come back later.
Simplified the initialization logic a bit more and also replaced `buffer` with `fullmix_buffer` for forwarders
@lminiero can you please take a fresh look and check whether I correctly passed `fullmix_buffer` to the RTP forwarders?
src/plugins/janus_audiobridge.c
Outdated
	scale_buffer(buffer, samples, per_sample_scaling_factors, fullmix_buffer);
} else {
	/* Otherwise just clamp values that are outside int16 boundaries */
	clamp_buffer(buffer, samples, fullmix_buffer);
Why is clamping done only when there are silent participants? Isn't clamping what we always do now?
It's the initialization of fullmix_buffer, which is later used for all silent participants and contains a mix of all tracks.
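For reference, a rough sketch of what the two paths in this hunk could look like (a hypothetical simplification, not the PR's actual `scale_buffer`/`clamp_buffer` code): with the limiter active, each 32-bit mixed sample is multiplied by a per-sample gain factor; otherwise the mix is simply saturated to the int16 range.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical limiter path: apply per-sample gain factors (computed by
 * the limiter, < 1.0 on loud peaks), with a safety clamp after scaling */
static void toy_scale_buffer(const int32_t *mix, size_t n,
                             const float *factors, int16_t *out) {
    for (size_t i = 0; i < n; i++) {
        float v = (float)mix[i] * factors[i];
        if (v > INT16_MAX) v = INT16_MAX;
        else if (v < INT16_MIN) v = INT16_MIN;
        out[i] = (int16_t)v;
    }
}

/* Hypothetical fallback path: just saturate to int16 boundaries */
static void toy_clamp_buffer(const int32_t *mix, size_t n, int16_t *out) {
    for (size_t i = 0; i < n; i++) {
        int32_t v = mix[i];
        if (v > INT16_MAX) v = INT16_MAX;
        else if (v < INT16_MIN) v = INT16_MIN;
        out[i] = (int16_t)v;
    }
}
```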
|
Is there anything else I can do to help to complete this PR? |
No need at the moment, thanks! The ball is in my court, so it's on me to find time to review this. I'll probably only be able to do that towards the end of the month, as we have the IETF meeting next week. Apologies if this is taking a while, but this is a big and heavy change to the AudioBridge fundamentals. Considering we use the AB heavily in production ourselves, I can't risk merging something I'm not very familiar with, or that I'm not 100% on board with. I'll need time to properly and thoroughly review everything, as there were things I had doubts about. |

Worked a bit with AudioBridge and it was painful to hear all the crackling noises when multiple streams summed up, so I decided to implement an audio limiter. I took AGC2 from Google's WebRTC (BSD 3-clause license, which allows modifications and usage as long as the license file stays in the repo) and translated it to C (I haven't written C since school, so it might not be clean enough), adapted it for AudioBridge, and also added SSE 4.2 and AVX2 versions (with proper runtime selection of the best implementation available).
Also added a parameter to AudioBridge: use_limiter (boolean, default=false), which enables the limiter. If the limiter is disabled, only clamping (saturation to int16 boundaries) is applied.
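The runtime selection mentioned above is commonly done with GCC/Clang's `__builtin_cpu_supports`; here is a sketch under that assumption (function names are illustrative, and the three variants share a scalar body so the sketch compiles, whereas the real SSE 4.2/AVX2 versions would be vectorized):

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*limit_fn)(const int32_t *in, size_t n, int16_t *out);

/* Portable scalar fallback: saturate 32-bit mix samples to int16 */
static void limit_scalar(const int32_t *in, size_t n, int16_t *out) {
    for (size_t i = 0; i < n; i++) {
        int32_t v = in[i];
        if (v > INT16_MAX) v = INT16_MAX;
        else if (v < INT16_MIN) v = INT16_MIN;
        out[i] = (int16_t)v;
    }
}

/* Stand-ins: real versions would use SSE 4.2 / AVX2 intrinsics */
static void limit_sse42(const int32_t *in, size_t n, int16_t *out) {
    limit_scalar(in, n, out);
}
static void limit_avx2(const int32_t *in, size_t n, int16_t *out) {
    limit_scalar(in, n, out);
}

/* Pick the widest implementation the CPU supports, once at startup */
static limit_fn select_limit_impl(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2"))
        return limit_avx2;
    if (__builtin_cpu_supports("sse4.2"))
        return limit_sse42;
#endif
    return limit_scalar;
}
```

The function pointer returned by `select_limit_impl()` would typically be stored once at plugin init and then called from the hot mixing path without re-checking CPU features.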
Here are some results with many participants speaking (5 participants speaking; the recording is from the incoming AudioBridge track):

Top - current Janus release with AudioBridge
Bottom - version with limiter and use_limiter = true
Full-spectrum lines - artifacts (crackling noise and clipping)
As you can see, the current Janus AB implementation (which just sums up all tracks) results in artifacts (values outside the int16 range), whereas the limiter dynamically reduces the volume to a normal level.
How it works:
Usage:
By default limiter is disabled and only clamping is applied.
Set the `use_limiter` property in the `create` audiobridge request to `true` to enable the limiter.
Any suggestions are welcome!
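Assuming the standard AudioBridge `create` request shape (the room number and description here are made up for illustration), enabling the limiter would look something like:

```json
{
	"request" : "create",
	"room" : 1234,
	"description" : "Room with limiter enabled",
	"use_limiter" : true
}
```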