Add microphone access to Input
#105244
Conversation
Does this assume a mono mic? I have a multichannel mic. Does this break our attempts to do automatic libsamplerate conversion? Related to the last question: does it make a promise about the structure of the audio stream, so that we can never break that promise? It can still be seen as grabbing a packed Vector2 array of some fixed size. How would we design a feature where the mic is injected back into the Godot engine audio bus? For example, we may want to proxy the audio channel for voice cancellation. TL;DR: it's a promising idea.
This assumes a stereo mic, exactly as specified in the internal microphone buffer. (On platforms that have a mono mic the samples are duplicated into the two channels.) @fire, please send details about your multichannel mic if it has more than two channels. (I have two mics on my laptop PC on either side of the camera, 10 cm apart, and I have plotted a visual correlation between the two channels to prove that the offsets are consistent with the speed of sound -- 8 mm/sample at 44.1 kHz.) There could be a scenario where we have multiple independent microphones plugged into the system as inputs. I'm not sure what this would be good for, since there's already a lot of audio equipment for dealing with that.
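(For reference, the arithmetic behind that figure:

$$\frac{c}{f_s} \approx \frac{343\ \mathrm{m/s}}{44100\ \mathrm{Hz}} \approx 7.8\ \mathrm{mm/sample},$$

so a 10 cm mic spacing corresponds to a maximum inter-channel offset of about 13 samples.)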
The twovoip library has its own resampler for robustness and isolation from the audio system, and its output is fed directly to the RNNoise and Opus libraries, which require different sample rates. It's unusual for the mic to have a wildly different sample rate from the output. (The variation I have observed is below one percent.)
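For illustration, a one-shot conversion of that kind could be done with libsamplerate's simple API. This is a sketch only: twovoip's internal resampler is its own code, and the 44.1 kHz mono input and 48 kHz target (the rate RNNoise expects) are example values.

```cpp
#include <samplerate.h>
#include <vector>

// Resample a mono 44.1 kHz block to 48 kHz using libsamplerate's
// one-shot API (src_simple processes the whole buffer in one call).
std::vector<float> resample_to_48k(std::vector<float> in) {
	std::vector<float> out(in.size() * 48000 / 44100 + 16); // room for the longer output
	SRC_DATA d = {};
	d.data_in = in.data();
	d.input_frames = (long)in.size();
	d.data_out = out.data();
	d.output_frames = (long)out.size();
	d.src_ratio = 48000.0 / 44100.0;
	src_simple(&d, SRC_SINC_MEDIUM_QUALITY, 1); // 1 channel; returns 0 on success
	out.resize((size_t)d.output_frames_gen);    // trim to the frames actually generated
	return out;
}
```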
The audio stream can be whatever it likes internally, but
This is the minimal API for now. We can add those other functions (eg
I need to do some experiments using the

Regarding voice cancellation: isn't this about subtracting the speaker output from the microphone input, to prevent feedback where you hear your own voice round-tripping across the network to another player and then back to you? I imagine that would be done by running a noise-cancelling process against the captured output stream from the Master bus and the microphone input.
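As a toy illustration of that subtraction (not engine code; real acoustic echo cancellation would use an adaptive filter such as NLMS, and the fixed delay and gain here are assumed example values):

```cpp
#include <cstddef>
#include <vector>

// Naive echo suppression: subtract a delayed, attenuated copy of the
// speaker output from the microphone input. Only shows the data flow
// being discussed, not a production-quality canceller.
std::vector<float> cancel_echo(const std::vector<float> &mic,
		const std::vector<float> &speaker_out,
		size_t delay_samples, float gain) {
	std::vector<float> out(mic.size());
	for (size_t i = 0; i < mic.size(); i++) {
		float echo = (i >= delay_samples) ? speaker_out[i - delay_samples] * gain : 0.0f;
		out[i] = mic[i] - echo;
	}
	return out;
}
```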
So the current behavior is to pause the entire game engine?
No, it just returns an empty array if there are not enough samples -- same as the equivalent function in AudioEffectCapture.
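For clarity, the intended calling pattern is something like this (an engine-internal C++ sketch; the block size of 512 frames is just an example value):

```cpp
// Poll for a fixed block each frame and treat an empty return as
// "not enough data yet" rather than an error.
void poll_microphone() {
	PackedVector2Array frames = Input::get_singleton()->get_microphone_buffer(512);
	if (frames.is_empty()) {
		return; // fewer than 512 stereo frames buffered; try again next frame
	}
	// ... hand the 512 stereo frames to an encoder, visualizer, etc.
}
```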
You need to implement the AudioEffectCapture internal workflow, then.
I don't understand. Can you explain? Also, was my answer to your previous question adequate?
Let's get together in a voice call at some point, on Discord or somewhere else.
I'm afraid I cannot satisfy @lyuma's challenge of sharing code between the two functions that draw data from the AudioDriver::input_buffer. The functions are:

`int AudioStreamPlaybackMicrophone::_mix_internal(AudioFrame *p_buffer, int p_frames)`

This:
`PackedVector2Array Input::get_microphone_buffer(int p_frames)`

This:
Basically, my 20-line function collapses two complicated buffer-copying functions (into and out of the audio system) into one very simple one. The complexity is caused by the intermediate buffer being part of the audio system.
Here is the much-simplified demo project for this feature, which I only just remembered to update:
I think there was a misunderstanding with what @lyuma and I suggested a few audio meetings ago.
We suggested adding the microphone to Input (as it isn't necessarily an audio device) in order to dissociate microphones from the audio stack, and to be able to get data from arbitrary devices.
Unfortunately, the current PR is based on AudioDriver, which itself blends every microphone input together.
So I don't think it should be merged as it is currently.
(posted wrong place)
As it stands, there are hundreds of lines of code in the AudioDriver module that manage access to the microphone data across 5 different hardware platforms, and it all converges into that one ring buffer. That is the single common bottleneck point that makes this PR so trivial and safe to implement, because it makes no fundamental change to anything. All it does is expose read-only access to a ringbuffer that already exists. The result is a totally robust microphone implementation that runs for hours on the Android platform, where formerly it would work for at most 5 minutes at a time without cutting out.

The current microphone input code is very much embedded and enmeshed in the AudioDriver stack. It would be a very significant undertaking to strip it out, then replicate and debug it in an all-new InputAudio module, not least because of the number of hardware implementations: macOS, PulseAudio, WASAPI, Web, and Android. I need a bit of help with drawing up a work-plan for this job, as it's not obvious where to start, or whether it can be broken into smaller chunks, such as first getting the Android platform to work, since it is the most buggy.
Is there any way to speed up the process of fixing this issue? We are using TwoVoIP from @goatchurchprime and are experiencing the Android microphone dropout/stutter described in other threads. The workaround mentioned elsewhere, of manually restarting the microphone stream, seems to kill it entirely instead. Happy to support if there's anything we can do.
After another spirited discussion with @lyuma last night, the sticking points now seem to be:

We therefore have a clash between Point #1 ("The problem always comes first") and Point #2 ("To solve the problem, it has to exist in the first place"), also characterized by "future proofing" and "I think it would be useful for users to...", as expressed in the best practices for engine development document. Since my Existing Problem is that VoIP is unreliable on Android/stand-alone XR hardware, I don't have a use case for multiple microphones. So, if the reviewers require a "big and flexible solution", there is a high risk it will not be "flexible enough for all users" -- to quote from Point #5 ("To each problem, its own solution").

One path forward is to see if we can all agree that multiple microphones (as well as Ambisonic microphones, which I am certain will come up in the next round) constitute Rare Use Cases, in the words of Point #6 ("Cater to common use cases, leave the door open for the rare ones"). Compelling evidence of its rarity (aside from the fact that there are no proposal requests for it) can be found in the Audacity manual, where it says:
This does seem in conflict with the Godot ethos of pragmatism.

#1: The problem always comes first

On Android, we can't deliver working software that requires long-running voice input. This is a concrete issue that needs a solution.

#3: The problem has to be complex or frequent

#5: To each problem, its own solution

I'm keen to get something underway here, as it's a blocking problem our development team is facing. Whether the solution is exposing microphone input directly, or finding some other way to avoid the input/output buffer mismatch that @goatchurchprime identified, we are ready to support.
I would like a new issue to track bugs in AudioStreamMicrophone. I think those should be fixed irrespective of what new APIs we choose to create. Then there is what I call "Emergency cases": I think we can all agree that these Should Not Happen, but if they do, we should mitigate the downside consequences. Sometimes solutions need to be multi-layered.
Okay, and for the underlying issues, given the current audio bus architecture:
I understand you want to choose how to spend your time, and you could say, "I don't want to fix the bugs in the existing audio bus architecture; instead I'll develop a new feature that bypasses the audio bus." This is fair and is your choice, but then you should be in the mindset that you are building a new feature and not fixing the audio bus, and we should agree on the design of the new feature.

Speaking of design, let's either repurpose the old proposal from Dec 2024 and update it based on what we discussed at the audio meetings, or make a new proposal. Here is some of what we discussed:
So, in summary, we should be able to solve these issues. I would feel better discussing in terms of concrete issues and proposals.
The above is a list of 4 issues with the current implementation that can be papered over with somewhat questionable hacks, which will degrade the microphone audio and are likely to suffer a number of implementation bugs due to their extreme complexity (particularly point 4). We can avoid this developmental nightmare and Fix The Problem Today, 100%, in relation to a common Use Case, with a 15-line highly performant function that implements a simple API modeled on the one in AudioEffectCapture and has zero risk of breaking anything.

Now I know that adding new functions to the API is generally a Bad Thing(tm), but there is a balance to be struck. If I were being asked to re-write a million lines of code to avoid changing a single parameter in one function that nobody uses, we wouldn't be having a debate. In my opinion, the particulars of this issue mean it easily falls on the side of "better to change the API" versus "attempting to work around a flawed implementation".

I have a related opinion, which is that I strongly doubt the proposed project of coding work-arounds for each of the various flaws in the current implementation would actually succeed. One piece of evidence for holding this opinion is PR #93200 (Fix audio input gets muted after a while on Android), which was submitted a year ago and shows no signs of further consideration. Multiply this rate of response by the scale of the task, and it's clearly not going to happen.

I have already answered Point 5 on the matter of "duplicate code" and "technical debt" in relation to these two functions in this comment above. I have provided evidence that multiple microphones is a Rare Case at the end of this other comment above.
Given that this PR grabs frames from the buffer in
Good idea. Unfortunately, the location of the function isn't the sticking point for the reviewers, or I'd fix it right away. Between a request to handle multiple microphones and a non-negotiable ban on making API changes, I don't know what to do next.
This PR is an alternative to #100508 that answers godotengine/godot-proposals#11347
We add the following four functions to the `Input` singleton to manage and access the microphone data stream independently of the Audio System.

The current means of accessing the microphone data involves chaining the following units together in a sequence:
The `AudioEffectCapture` in the middle intercepts the audio data as it flows from the source stream to the output stream and copies it into a buffer.

This structure is problematic because it locks two real-time serial devices (the microphone and the speakers) into the same chain, so that any systematic drift between them, no matter how insignificant, will eventually overwhelm any buffer.
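To make the drift argument concrete, here is a small back-of-the-envelope calculation (the 0.1% clock mismatch and 4096-frame buffer are assumed example numbers, not measurements from the engine):

```cpp
#include <cstdio>

// If the mic and speaker clocks differ by a constant fraction, the
// intermediate buffer gains or loses samples at a constant rate and
// must eventually over- or under-run, no matter how large it is.
int main() {
	const double rate = 44100.0;          // nominal sample rate
	const double drift = 0.001;           // 0.1% clock mismatch
	const double buffer_frames = 4096.0;  // intermediate buffer size
	double frames_per_sec = rate * drift; // net gain/loss per second (~44 frames)
	printf("buffer over/under-runs after ~%.0f seconds\n",
			buffer_frames / frames_per_sec); // ~93 s
	return 0;
}
```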
Issues that this may have caused are #80173, #95120 and #86428. The problem is most consistent on Android, where the microphone will either enter an endless loop or shut down after several minutes of use.
Additional changes are made to the platform implementations of `AudioDriver.input_start()` to make them safe to call multiple times, which can happen if there is more than one `AudioStreamMicrophone` present in the project.

The full test of these functions is in the `godot-demo-projects/audio/mic_input` project of godotengine/godot-demo-projects#1172. This demo includes an option to push the samples straight into an `AudioStreamGenerator` to simulate a loop-back. The observed delay is about the same (1/3 of a second) as in the original case of using an `AudioStreamMicrophone` as input.

The code that extracts the buffer is, in outline, as follows:
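(A sketch rather than the verbatim diff: the AudioDriver ring-buffer accessor names and the consumed-sample bookkeeping here are assumptions; see the PR diff for the exact code.)

```cpp
// Drain p_frames stereo frames from AudioDriver's interleaved int32
// ring buffer into a PackedVector2Array, or return an empty array if
// not enough samples have accumulated yet.
PackedVector2Array Input::get_microphone_buffer(int p_frames) {
	PackedVector2Array ret;
	AudioDriver *ad = AudioDriver::get_singleton();
	ad->lock();
	Vector<int32_t> buf = ad->get_input_buffer(); // interleaved L,R samples
	int bufsize = buf.size();
	int avail = (int)ad->get_input_size();        // samples currently buffered
	if (bufsize > 0 && avail >= p_frames * 2) {
		// Start at the oldest available sample, just behind the write head.
		int ofs = ((int)ad->get_input_position() + bufsize - avail) % bufsize;
		ret.resize(p_frames);
		Vector2 *w = ret.ptrw();
		for (int i = 0; i < p_frames; i++) {
			float l = (buf[ofs] >> 16) / 32768.0f; // same int32->float scaling as
			ofs = (ofs + 1) % bufsize;             // AudioStreamPlaybackMicrophone
			float r = (buf[ofs] >> 16) / 32768.0f;
			ofs = (ofs + 1) % bufsize;
			w[i] = Vector2(l, r);
		}
		// The real function would also mark these samples as consumed.
	}
	ad->unlock();
	return ret; // empty if fewer than p_frames stereo frames were available
}
```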
I have tested it on Windows, Android and Linux and it is perfectly designed to work with https://github.com/goatchurchprime/two-voip-godot-4