
Add microphone access to Input #105244


Open · wants to merge 1 commit into master

Conversation

@goatchurchprime (Contributor) commented Apr 10, 2025

This PR is an alternative to #100508 that answers godotengine/godot-proposals#11347

We add the following four functions to the Input singleton to manage and access the microphone data stream independently of the Audio System.

Error start_microphone()
Error stop_microphone()
int get_microphone_frames_available()
PackedVector2Array get_microphone_buffer(int p_frames)
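A minimal lifecycle sketch using these proposed functions (the error handling shown is illustrative only):

func _ready() -> void:
    # Proposed call: open the microphone stream, independently of any audio bus.
    if Input.start_microphone() != OK:
        push_error("Could not start the microphone")

func _exit_tree() -> void:
    # Proposed call: release the device when this node leaves the tree.
    Input.stop_microphone()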

The current means of accessing the microphone data involves chaining the following units together in a sequence:

AudioStreamPlayer [ AudioStreamMicrophone ]
-> AudioBus [ AudioCaptureEffect ]
-> AudioOutput

The AudioEffectCapture in the middle intercepts the audio data as it flows from the source stream to the output stream and copies it into a buffer.
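In GDScript, this chain is typically assembled like the following sketch (written against the current stable API; the "MicCapture" bus name is my own):

func _ready() -> void:
    # Requires "audio/driver/enable_input" to be on in the Project Settings.
    # Create a muted bus that carries the microphone through a capture effect.
    var bus_idx := AudioServer.get_bus_count()
    AudioServer.add_bus(bus_idx)
    AudioServer.set_bus_name(bus_idx, "MicCapture")
    AudioServer.add_bus_effect(bus_idx, AudioEffectCapture.new())
    AudioServer.set_bus_mute(bus_idx, true)  # keep the mic from playing out of the speakers

    # Route an AudioStreamMicrophone player into that bus.
    var player := AudioStreamPlayer.new()
    add_child(player)
    player.stream = AudioStreamMicrophone.new()
    player.bus = "MicCapture"
    player.play()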

This structure is problematic because it locks two real-time serial devices (the microphone and the speakers) into the same chain, so that any systematic drift between them, no matter how insignificant, will eventually overwhelm any buffer.

Issues that this may have caused include #80173, #95120, and #86428. The problem is most consistent on Android, where the microphone will either enter an endless loop or shut down after several minutes of use.

Additional changes are made to the platform implementations of AudioDriver.input_start() to make them safe to call multiple times, which can happen if there is more than one AudioStreamMicrophone present in the project.

The full test of these functions is in the godot-demo-projects/audio/mic_input project of godotengine/godot-demo-projects#1172. This demo includes an option to push the samples straight into an AudioStreamGenerator to simulate a loop-back. The observed delay is about the same (roughly 1/3 of a second) as in the original case of using an AudioStreamMicrophone as input.
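That loop-back push might look like the following sketch ($GeneratorPlayer is a hypothetical, already-playing AudioStreamPlayer whose stream is an AudioStreamGenerator):

@onready var playback: AudioStreamGeneratorPlayback = $GeneratorPlayer.get_stream_playback()

func _process(_delta: float) -> void:
    var available := Input.get_microphone_frames_available()
    if available > 0 and playback.can_push_buffer(available):
        playback.push_buffer(Input.get_microphone_buffer(available))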

The code that extracts the buffer is as follows:

var recording_buffer := []      # collected chunks of stereo frames
var audio_sample_size := 882    # 20 ms at 44.1 kHz (44100 * 0.02)

func _process(_delta: float) -> void:
    # Drain the microphone ring buffer in fixed-size chunks.
    while Input.get_microphone_frames_available() >= audio_sample_size:
        var audio_samples: PackedVector2Array = Input.get_microphone_buffer(audio_sample_size)
        if not audio_samples.is_empty():
            recording_buffer.append(audio_samples)
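Since each chunk is a stereo PackedVector2Array (the stereo layout is confirmed in the discussion below), the recording can later be downmixed to mono floats with a helper along these lines (illustrative only, not part of the PR):

func to_mono(chunks: Array) -> PackedFloat32Array:
    var mono := PackedFloat32Array()
    for chunk in chunks:
        for frame in chunk:
            mono.append((frame.x + frame.y) * 0.5)  # average the left and right channels
    return mono

A call such as to_mono(recording_buffer) then yields a flat mono signal.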

I have tested it on Windows, Android, and Linux, and it is designed to work with https://github.com/goatchurchprime/two-voip-godot-4.

@fire (Member) commented Apr 10, 2025

Does this assume a mono mic? I have a multichannel mic.

Does this break our attempts to do automatic libsamplerate conversion?

Added to the last question: does it make a promise about the structure of the audio stream that we can never break?

I would recommend the PackedVector2Array get_microphone_buffer(int p_frames) have an API similar to the ring buffer API in https://docs.godotengine.org/en/stable/classes/class_audioeffectcapture.html

It can still be seen as grabbing a PackedVector2Array of some fixed size.

How would we design a feature where the mic is injected back into the Godot engine audio bus? For example, we want to proxy the audio channel for voice cancellation.

TL;DR: it's a promising idea.

@goatchurchprime (Contributor Author)

Does this assume a mono mic? I have a multichannel mic.

This assumes a stereo mic, exactly as specified in the internal microphone buffer. (On platforms that have a mono mic, the samples are duplicated into the two channels.)

@fire Please send details about your multichannel mic if it has more than two channels.

(I have two mics on my laptop PC on either side of the camera, 10 cm apart, and I have plotted a visual correlation between the two channels to prove that the offsets are consistent with the speed of sound -- roughly 8 mm per sample at 44.1 kHz, i.e. 343 m/s ÷ 44100 Hz.)

There could be a scenario where we have multiple independent microphones plugged into the system as inputs. Not sure what this would be good for since there's already a lot of audio equipment for dealing with that.

Does this break our attempts to do automatic libsamplerate conversion?

The twovoip library has its own resampler for robustness and isolation from the audio system, and its output is fed directly to the RNNoise and Opus libraries, which require different sample rates. It's unusual for the mic to have a wildly different sample rate from the output. (The variation I have observed is below one percent.)

Added to the last question: does it make a promise about the structure of the audio stream that we can never break?

The audio stream can be whatever it likes internally, but get_microphone_buffer() has to return a PackedVector2Array or else everything that uses it will break. This "promise" is a consequence of being an API.

I would recommend the PackedVector2Array get_microphone_buffer(int p_frames) have an API similar to the ring buffer API in https://docs.godotengine.org/en/stable/classes/class_audioeffectcapture.html

This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.
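For comparison, the equivalent polling loop against AudioEffectCapture's existing ring-buffer API looks like this (a sketch reusing the variables from the snippet in the PR description; the "MicCapture" bus name and effect slot 0 are assumptions):

@onready var capture: AudioEffectCapture = AudioServer.get_bus_effect(AudioServer.get_bus_index("MicCapture"), 0)

func _process(_delta: float) -> void:
    while capture.can_get_buffer(audio_sample_size):
        recording_buffer.append(capture.get_buffer(audio_sample_size))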

How would we design a feature where the mic is injected back into the Godot engine audio bus? For example, we want to proxy the audio channel for voice cancellation.

I need to do some experiments using the AudioStreamGenerator to check if the obvious re-injection method is any good.

Regarding voice cancellation: isn't this about subtracting the speaker output from the microphone input to prevent feedback, where you hear your own voice round-tripping across the network to another player and then back to you? I imagine that would be done by running a noise-cancelling process against the captured output stream from the Master Bus and the microphone input.

@fire (Member) commented Apr 12, 2025

This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.

So the current behavior is to pause the entire game engine.

@goatchurchprime (Contributor Author) commented Apr 15, 2025

This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.

So the current behavior is to pause the entire game engine.

No, it just returns an empty array if there are not enough samples -- the same as the equivalent function in AudioEffectCapture.

@fire (Member) commented Apr 16, 2025

You need to implement the AudioEffectCapture internal workflow, then.

@goatchurchprime (Contributor Author)

You need to implement the AudioEffectCapture internal workflow, then.

I don't understand. Can you explain?

Also, was my answer to your previous question adequate?

@fire (Member) commented Apr 16, 2025

Let's get together in a voice call at some point, on Discord or somewhere else.

@AThousandShips changed the title from "Add in the microphone access to the Input object" to "Add microphone access to Input" Apr 17, 2025
@goatchurchprime (Contributor Author) commented Apr 25, 2025

I've made a diagram to help me when I'm explaining this PR to anyone in future.
I drew the outputs at the top and the inputs at the bottom to make it feel like the system pulls rather than pushes the audio data.
[diagram image]

@goatchurchprime (Contributor Author)

I'm afraid I cannot satisfy @lyuma's challenge of sharing code between the two functions that draw data from the AudioDriver::input_buffer.

The functions are:

int AudioStreamPlaybackMicrophone::_mix_internal(AudioFrame *p_buffer, int p_frames)

This:

  • Runs in the audio thread and applies thread-locking
  • Outputs into a pre-allocated AudioFrame* array
  • Pads the very start of its stream with 50 ms of zeros so that the buffer grows large enough to run for a few minutes on Android before the mismatch between the input and output rates can starve it.

PackedVector2Array Input::get_microphone_buffer(int p_frames)

This:

  • Does not use thread-locking
  • Reserves and outputs a PackedVector2Array
  • Does nothing more than copy values from a ring buffer to a flat array

Basically, my 20-line function collapses two complicated buffer-copying functions (into and out of the audio system) into one very simple one. The complexity is caused by the intermediate buffer being part of the audio system.

@akien-mga removed request for a team May 13, 2025 10:55
@Ivorforce requested review from a team May 13, 2025 10:56
@goatchurchprime (Contributor Author)

Here is the much-simplified demo project for this feature, which I only just remembered to update:
https://github.com/goatchurchprime/godot-demo-projects/tree/gtch/micplot/audio/mic_input

[screenshot image]

@adamscott (Member) left a comment

I think there was a misunderstanding with what @lyuma and I suggested a few audio meetings ago.

We suggested adding the microphone to Input (as it isn't really necessarily an audio device) in order to dissociate microphones from the audio stack, and to be able to get data from arbitrary devices.

Unfortunately, the current PR is based on AudioDriver, which itself blends every microphone input together.

So I don't think it should be merged as it is currently.

@goatchurchprime (Contributor Author) commented Jun 7, 2025

(posted wrong place)

@goatchurchprime (Contributor Author)

As it stands, there are hundreds of lines of code in the AudioDriver module that manage access to the microphone data across five different hardware platforms, and it all converges into that one Vector<int32_t> input_buffer.

That is the single common bottleneck point that makes this PR so trivial and safe to implement, because it makes no fundamental change to anything. All it does is expose read-only access to a ringbuffer that already exists. The result is a totally robust microphone implementation that runs for hours on the Android platform, where formerly it would work for at most 5 minutes at a time without cutting out.

The current microphone input code is very much embedded and enmeshed in the AudioDriver stack. It would be a very significant undertaking to strip it out, then replicate and debug it in an all-new InputAudio module, not least because of the number of hardware implementations: macOS, PulseAudio, WASAPI, Web, and Android.

I need a bit of help with drawing up a work plan for this job, as it's not obvious where to start, or whether it can be broken into smaller chunks, such as first getting the Android platform to work, since it is the most buggy.

@danielkanda

Is there any way to speed up the process of fixing this issue?

We are using TwoVoIP from @goatchurchprime and are experiencing Android microphone dropout / stutter as described in other threads. The workaround mentioned elsewhere of manually restarting the microphone stream seems to kill it entirely instead.

Happy to support if there's anything that we can do.

@goatchurchprime (Contributor Author)

After another spirited discussion with @lyuma last night, the sticking points now seem to be:
(a) I should have tried much harder to debug what I've determined is a flawed implementation, and
(b) my proposed implementation hasn't been properly thought through because it doesn't, for example, support multiple microphones.

We therefore have a clash between Point#1 (The problem always comes first) and Point#2 (To solve the problem, it has to exist in the first place), as characterized by the warnings against "future proofing" and "I think it would be useful for users to..." in the best practices for engine development document.

Since my Existing Problem is that VoIP is unreliable on Android/stand-alone XR hardware, I don't have a use case for multiple microphones. So, if the reviewers require me to design a "big and flexible solution", there is a high risk it will not be "flexible enough for all users" -- to quote from Point#5 (To each problem, its own solution).

One path forward is to see if we can all agree that multiple microphones (as well as Ambisonic microphones, which I am certain will come up in the next round) constitute Rare Use Cases in the words of Point#6 (Cater to common use cases, leave the door open for the rare ones).

Compelling evidence of its rarity (aside from the fact that there are no proposal requests for it) can be found in the Audacity manual, which says:

  • Hardware support: you need a sound card or external audio interface which has enough Analog to Digital Converters (ADCs) to do multi-channel recording. Most consumer cards only have one stereo pair of ADCs that is switched between various inputs such as Line-In and "Mic". You will need at least a semi-professional device to find support for multi-channel recording.
  • Driver support: the drivers for the device must make it possible to record more than two channels at once. This is more problematic than it might seem, because the standard sound interfaces for many operating systems were designed long before multi-channel recording was possible, and so only allow for up to two channels of recording. Also, consumer-level systems are not designed to achieve the low latencies and high throughputs needed for high-quality multi-channel recordings.

@danielkanda

This does seem in conflict with the Godot ethos of pragmatism.

#1: The problem always comes first
#2: To solve the problem, it has to exist in the first place
Let's put aside the ergonomics of having to pipe microphone streams into audio busses to obtain data to send or analyse via effects. Long-running microphone streams are unstable.

On Android, we can't deliver working software that requires long-running voice input. This is a concrete issue that needs a solution.

#3: The problem has to be complex or frequent
In our experience, microphone streams dropping out and entering a broken state happens regularly on devices like the Meta Quest.

#5: To each problem, its own solution
#6: Cater to common use cases, leave the door open for the rare ones
What is being suggested here seems like a pragmatic solution to a real problem. Would it be reasonable to say this can be modified to accommodate more rare cases in the future? Multi-microphone / multi-channel setups are relatively rare, especially on consumer hardware.

I'm keen to get something underway here, as it's a blocking problem our development team is facing.

Whether the solution is exposing microphone input directly, or finding some other way to avoid the input/output buffer mismatch that @goatchurchprime identified, we are ready to support.

@lyuma (Contributor) commented Jun 14, 2025

I would like a new issue to track bugs in AudioStreamMicrophone. I think those should be fixed irrespective of what new APIs we choose to create.
From my understanding, there are a few bugs in AudioStreamMicrophone:

What I call "Emergency cases": I think we can all agree that these Should Not Happen, but if they do, we should mitigate the downside consequences. Sometimes solutions need to be multi-layered.

  1. Too few samples / dropout / ending the stream early: my understanding is that it arbitrarily kills the stream entirely if one read call has too few samples. This should never happen, and either we should allow streams to continue with too few samples, or we should guarantee the stream is padded even in the worst case. No matter what, an AudioStreamMicrophone should never end, except maybe if the device disconnects.
  2. Too many samples / increasing delay: the buffer size should be limited. This is bad, but the delay should have an upper bound. If the ring buffer limit is reached, it may be good to clear it entirely, or something similar, to prevent a garbled mess afterwards.

Okay, and for the underlying issues, given the current audio bus architecture:

  1. It should perhaps be possible to sync the audio bus clock to the audio device clock. I think you made a good case that we cannot properly play audio if the clocks are desynced. However, if the user wishes to use the audio bus, it can make sense that the audio bus runs at the correct sample rate. Or,
  2. Measure the divergence between the device clock and the audio bus clock, and resample from one to the other. This could cause aliasing artifacts even if the resample difference is small, so it might need to be a preference, or we need to experiment with resampling algorithms, or simply duplicate or delete samples occasionally to make up the difference ("nearest sampling"; see the sketch below).
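A crude sketch of that nearest-sampling correction (my own illustration, not from this PR): drop or repeat one frame per block according to the measured drift.

func compensate_drift(frames: PackedVector2Array, drift_frames: int) -> PackedVector2Array:
    if frames.is_empty():
        return frames
    var out := frames.duplicate()
    if drift_frames > 0:
        out.resize(out.size() - 1)  # input clock runs fast: drop a frame
    elif drift_frames < 0:
        out.append(out[out.size() - 1])  # input clock runs slow: repeat a frame
    return out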

I understand you want to choose how to spend your time, and you could say "I don't want to fix the bugs in the existing audio bus architecture, and will instead develop a new feature that bypasses the audio bus." That is fair and is your choice, but then you should be in the mindset that you are building a new feature, not fixing the audio bus, and we should agree on the design of the new feature.

Speaking of design, let's either repurpose the old proposal from Dec 2024 and update it based on what we discussed at the audio meetings, or make a new proposal. Here is some of what we discussed:

  1. For example, the ring buffer should allow multiple consumers (read pointer tracked in a RefCounted state object, or passed in) to support migrating AudioStreamMicrophone to it; see the sketch after this list. (This will not fix the above bugs in AudioStreamMicrophone, but it avoids adding duplicate code and contributing to code debt.)
  2. And we talked about an approach to support multiple device inputs at the API level without needing operating system drivers to implement this support at first: you would check whether someone has already opened a given input device, and return an error if the user asks to open another input device without closing the first one.
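A rough sketch of the per-consumer read pointer from point 1 (hypothetical names; the real state would live inside the engine's AudioDriver rather than in a script):

class_name MicReadCursor
extends RefCounted
# Each consumer keeps its own read position into the shared ring buffer.

var read_pos: int = 0

func read(ring: PackedVector2Array, write_pos: int, max_frames: int) -> PackedVector2Array:
    var out := PackedVector2Array()
    while read_pos != write_pos and out.size() < max_frames:
        out.append(ring[read_pos])
        read_pos = (read_pos + 1) % ring.size()
    return out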

So, in summary, we should be able to solve these issues. I would feel better discussing this in terms of concrete issues and proposals.

@goatchurchprime
Copy link
Contributor Author

The above is a list of four issues with the current implementation that can be papered over with somewhat questionable hacks, which will degrade the microphone audio and are likely to introduce a number of implementation bugs due to their extreme complexity (particularly point 4).

We can avoid this developmental nightmare and Fix The Problem Today, 100%, in relation to a common Use Case, with a 15-line, highly performant function that implements a simple API modeled on the one in AudioEffectCapture and has zero risk of breaking anything.

Now I know that adding new functions to the API is generally a Bad Thing(tm), but there is a balance to be struck. If I were being asked to rewrite a million lines of code to avoid changing a single parameter in one function that nobody uses, we wouldn't be having a debate. In my Opinion, the particulars of this issue mean it falls easily on the side of "better to change the API" rather than "attempt to work around a flawed implementation".

I have a related Opinion, which is that I strongly doubt the proposed project of coding workarounds for each of the various flaws in the current implementation would actually succeed. One piece of evidence for this opinion is PR #93200 (Fix audio input gets muted after a while on Android), which was submitted a year ago and shows no signs of further consideration. Multiply this rate of response by the scale of the task, and it's clearly not going to happen.

I have already answered Point 5 on the matter of "duplicate code" and "technical debt" in relation to these two functions in this comment above.

I have provided evidence that multiple microphones are a Rare Case at the end of this other comment above.

@Kaleb-Reid (Contributor)

Given that this PR grabs frames from the buffer in AudioDriver, is there any reason these methods can't be added to AudioServer, which already appears to be intertwined with AudioDriver, instead of making Input dependent on AudioServer/AudioDriver?

@goatchurchprime (Contributor Author) commented Jun 18, 2025

Good idea. Unfortunately the location of the function isn't the sticking point for the reviewers, or I'd fix it right away.

Between a request to handle multiple microphones and a non-negotiable ban on making API changes, I don't know what to do next.
