
Add microphone access to Input #105244


Open · wants to merge 1 commit into master

Conversation

@goatchurchprime (Contributor) commented Apr 10, 2025

This PR is an alternative to #100508 that answers godotengine/godot-proposals#11347

We add the following four functions to the Input singleton to manage and access the microphone data stream independently of the Audio System.

Error start_microphone()
Error stop_microphone()
int get_microphone_frames_available()
PackedVector2Array get_microphone_buffer(int p_frames)
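A minimal lifecycle sketch using these proposed functions (the error handling shown is illustrative only):

func _ready() -> void:
    # Proposed call: open the microphone stream, independently of any audio bus.
    if Input.start_microphone() != OK:
        push_error("Could not start the microphone")

func _exit_tree() -> void:
    # Proposed call: release the device when this node leaves the tree.
    Input.stop_microphone()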

The current means of accessing the microphone data involves chaining the following units together in a sequence:

AudioStreamPlayer [ AudioStreamMicrophone ]
-> AudioBus [ AudioCaptureEffect ]
-> AudioOutput

The AudioEffectCapture in the middle intercepts the audio data as it flows from the source stream to the output stream and copies it into a buffer.
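In GDScript, this chain is typically assembled like the following sketch (written against the current stable API; the "MicCapture" bus name is my own):

func _ready() -> void:
    # Requires "audio/driver/enable_input" to be on in the Project Settings.
    # Create a muted bus that carries the microphone through a capture effect.
    var bus_idx := AudioServer.get_bus_count()
    AudioServer.add_bus(bus_idx)
    AudioServer.set_bus_name(bus_idx, "MicCapture")
    AudioServer.add_bus_effect(bus_idx, AudioEffectCapture.new())
    AudioServer.set_bus_mute(bus_idx, true)  # keep the mic from playing out of the speakers

    # Route an AudioStreamMicrophone player into that bus.
    var player := AudioStreamPlayer.new()
    add_child(player)
    player.stream = AudioStreamMicrophone.new()
    player.bus = "MicCapture"
    player.play()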

This structure is problematic because it locks two real-time serial devices (the microphone and the speakers) into the same chain, so that any systematic drift between them, no matter how insignificant, will eventually overwhelm any buffer.

Issues that this may have caused include #80173, #95120, and #86428. The problem is most consistent on Android, where the microphone will either enter an endless loop or shut down after several minutes of use.

Additional changes are made to the platform implementations of AudioDriver.input_start() to make them safe to call multiple times, which can happen if there is more than one AudioStreamMicrophone present in the project.

The full test of these functions is in the godot-demo-projects/audio/mic_input project of godotengine/godot-demo-projects#1172. This demo includes an option to push the samples straight into an AudioStreamGenerator to simulate a loop-back. The observed delay is about the same (roughly 1/3 of a second) as in the original case of using an AudioStreamMicrophone as input.
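That loop-back push might look like the following sketch ($GeneratorPlayer is a hypothetical, already-playing AudioStreamPlayer whose stream is an AudioStreamGenerator):

@onready var playback: AudioStreamGeneratorPlayback = $GeneratorPlayer.get_stream_playback()

func _process(_delta: float) -> void:
    var available := Input.get_microphone_frames_available()
    if available > 0 and playback.can_push_buffer(available):
        playback.push_buffer(Input.get_microphone_buffer(available))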

The code that extracts the buffer is as follows:

var recording_buffer := []      # collected chunks of stereo frames
var audio_sample_size := 882    # 20 ms at 44.1 kHz (44100 * 0.02)

func _process(_delta: float) -> void:
    # Drain the microphone ring buffer in fixed-size chunks.
    while Input.get_microphone_frames_available() >= audio_sample_size:
        var audio_samples: PackedVector2Array = Input.get_microphone_buffer(audio_sample_size)
        if not audio_samples.is_empty():
            recording_buffer.append(audio_samples)
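Since each chunk is a stereo PackedVector2Array (the stereo layout is confirmed in the discussion below), the recording can later be downmixed to mono floats with a helper along these lines (illustrative only, not part of the PR):

func to_mono(chunks: Array) -> PackedFloat32Array:
    var mono := PackedFloat32Array()
    for chunk in chunks:
        for frame in chunk:
            mono.append((frame.x + frame.y) * 0.5)  # average the left and right channels
    return mono

A call such as to_mono(recording_buffer) then yields a flat mono signal.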

I have tested it on Windows, Android, and Linux, and it is designed to work with https://github.com/goatchurchprime/two-voip-godot-4.

@fire (Member) commented Apr 10, 2025

Does this assume a mono mic? I have a multichannel mic.

Does this break our attempts to do automatic libsamplerate conversion?

Added to the last question: does it make a promise about the structure of the audio stream that we can never break?

I would recommend the PackedVector2Array get_microphone_buffer(int p_frames) have an API similar to the ring buffer API in https://docs.godotengine.org/en/stable/classes/class_audioeffectcapture.html

It can still be seen as grabbing a PackedVector2Array of some fixed size.

How would we design a feature where the mic is injected back into the Godot engine audio bus? For example, we want to proxy the audio channel for voice cancellation.

TL;DR: it's a promising idea.

@goatchurchprime (Contributor Author)

Does this assume a mono mic? I have a multichannel mic.

This assumes a stereo mic, exactly as specified in the internal microphone buffer. (On platforms that have a mono mic, the samples are duplicated into the two channels.)

@fire Please send details about your multichannel mic if it has more than two channels.

(I have two mics on my laptop PC on either side of the camera, 10 cm apart, and I have plotted a visual correlation between the two channels to prove that the offsets are consistent with the speed of sound -- roughly 8 mm per sample at 44.1 kHz, i.e. 343 m/s ÷ 44100 Hz.)

There could be a scenario where we have multiple independent microphones plugged into the system as inputs. Not sure what this would be good for since there's already a lot of audio equipment for dealing with that.

Does this break our attempts to do automatic libsamplerate conversion?

The twovoip library has its own resampler for robustness and isolation from the audio system, and its output is fed directly to the RNNoise and Opus libraries, which require different sample rates. It's unusual for the mic to have a wildly different sample rate from the output. (The variation I have observed is below one percent.)

Added to the last question: does it make a promise about the structure of the audio stream that we can never break?

The audio stream can be whatever it likes internally, but get_microphone_buffer() has to return a PackedVector2Array or else everything that uses it will break. This "promise" is a consequence of being an API.

I would recommend the PackedVector2Array get_microphone_buffer(int p_frames) have an API similar to the ring buffer API in https://docs.godotengine.org/en/stable/classes/class_audioeffectcapture.html

This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.
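For comparison, the equivalent polling loop against AudioEffectCapture's existing ring-buffer API looks like this (a sketch reusing the variables from the snippet in the PR description; the "MicCapture" bus name and effect slot 0 are assumptions):

@onready var capture: AudioEffectCapture = AudioServer.get_bus_effect(AudioServer.get_bus_index("MicCapture"), 0)

func _process(_delta: float) -> void:
    while capture.can_get_buffer(audio_sample_size):
        recording_buffer.append(capture.get_buffer(audio_sample_size))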

How would we design a feature where the mic is injected back into the Godot engine audio bus? For example, we want to proxy the audio channel for voice cancellation.

I need to do some experiments using the AudioStreamGenerator to check if the obvious re-injection method is any good.

Regarding voice cancellation: isn't this about subtracting the speaker output from the microphone input to prevent feedback, where you hear your own voice round-tripping across the network to another player and then back to you? I imagine that would be done by running a noise-cancelling process against the captured output stream from the Master Bus and the microphone input.

@fire (Member) commented Apr 12, 2025

This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.

So the current behavior is to pause the entire game engine.

@goatchurchprime (Contributor Author) commented Apr 15, 2025

This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.

So the current behavior is to pause the entire game engine.

No, it just returns an empty array if there are not enough samples -- the same as the equivalent function in AudioEffectCapture.

@fire (Member) commented Apr 16, 2025

You need to implement the AudioEffectCapture internal workflow, then.

@goatchurchprime (Contributor Author)

You need to implement the AudioEffectCapture internal workflow, then.

I don't understand. Can you explain?

Also, was my answer to your previous question adequate?

@fire (Member) commented Apr 16, 2025

Let's get together in a voice call at some point, on Discord or somewhere else.

@AThousandShips changed the title from "Add in the microphone access to the Input object" to "Add microphone access to Input" Apr 17, 2025
@goatchurchprime (Contributor Author) commented Apr 25, 2025

I've made a diagram to help me when I'm explaining this PR to anyone in future.
I drew the outputs at the top and the inputs at the bottom to make it feel like the system pulls rather than pushes the audio data.
[diagram image]

@goatchurchprime (Contributor Author)

I'm afraid I cannot satisfy @lyuma's challenge of sharing code between the two functions that draw data from the AudioDriver::input_buffer.

The functions are:

int AudioStreamPlaybackMicrophone::_mix_internal(AudioFrame *p_buffer, int p_frames)

This:

  • Runs in the audio thread and applies thread-locking
  • Outputs into a pre-allocated AudioFrame* array
  • Pads the very start of its stream with 50 ms of zeros so that the buffer grows large enough to run for a few minutes on Android before the mismatch between the input and output rates can starve it.

PackedVector2Array Input::get_microphone_buffer(int p_frames)

This:

  • Does not use thread-locking
  • Reserves and outputs a PackedVector2Array
  • Does nothing more than copy values from a ring buffer to a flat array

Basically, my 20-line function collapses two complicated buffer-copying functions (into and out of the audio system) into one very simple one. The complexity is caused by the intermediate buffer being part of the audio system.

@akien-mga removed request for a team May 13, 2025 10:55
@Ivorforce requested review from a team May 13, 2025 10:56
@goatchurchprime (Contributor Author)

Here is the much-simplified demo project for this feature, which I only just remembered to update:
https://github.com/goatchurchprime/godot-demo-projects/tree/gtch/micplot/audio/mic_input

[screenshot image]

@adamscott (Member) left a comment

I think there was a misunderstanding with what @lyuma and I suggested a few audio meetings ago.

We suggested adding the microphone to Input (as it isn't really necessarily an audio device) in order to dissociate microphones from the audio stack, and to be able to get data from arbitrary devices.

Unfortunately, the current PR is based on AudioDriver, which itself blends every microphone input together.

So I don't think it should be merged as it is currently.

@goatchurchprime (Contributor Author) commented Jun 7, 2025

(posted wrong place)

@goatchurchprime (Contributor Author)

As it stands, there are hundreds of lines of code in the AudioDriver module that manage access to the microphone data across five different hardware platforms, and it all converges into that one Vector<int32_t> input_buffer.

That is the single common bottleneck point that makes this PR so trivial and safe to implement, because it makes no fundamental change to anything. All it does is expose read-only access to a ringbuffer that already exists. The result is a totally robust microphone implementation that runs for hours on the Android platform, where formerly it would work for at most 5 minutes at a time without cutting out.

The current microphone input code is very much embedded and enmeshed in the AudioDriver stack. It would be a very significant undertaking to strip it out, then replicate and debug it in an all-new InputAudio module, not least because of the number of hardware implementations: macOS, PulseAudio, WASAPI, Web, and Android.

I need a bit of help with drawing up a work plan for this job, as it's not obvious where to start, or whether it can be broken into smaller chunks, such as first getting the Android platform to work, since it is the most buggy.

@danielkanda

Is there any way to speed up the process of fixing this issue?

We are using TwoVoIP from @goatchurchprime and are experiencing Android microphone dropout / stutter as described in other threads. The workaround mentioned elsewhere of manually restarting the microphone stream seems to kill it entirely instead.

Happy to support if there's anything that we can do.

@goatchurchprime (Contributor Author)

After another spirited discussion with @lyuma last night, the sticking points now seem to be:
(a) I should have tried much harder to debug what I've determined is a flawed implementation, and
(b) my proposed implementation hasn't been properly thought through because it doesn't, for example, support multiple microphones.

We therefore have a clash between Point#1 (The problem always comes first) and Point#2 (To solve the problem, it has to exist in the first place), as characterized by the warnings against "future proofing" and "I think it would be useful for users to..." in the best practices for engine development document.

Since my Existing Problem is that VoIP is unreliable on Android/stand-alone XR hardware, I don't have a use case for multiple microphones. So, if the reviewers require me to design a "big and flexible solution", there is a high risk it will not be "flexible enough for all users" -- to quote from Point#5 (To each problem, its own solution).

One path forward is to see if we can all agree that multiple microphones (as well as Ambisonic microphones, which I am certain will come up in the next round) constitute Rare Use Cases in the words of Point#6 (Cater to common use cases, leave the door open for the rare ones).

Compelling evidence of its rarity (aside from the fact that there are no proposal requests for it) can be found in the Audacity manual, which says:

  • Hardware support: you need a sound card or external audio interface which has enough Analog to Digital Converters (ADCs) to do multi-channel recording. Most consumer cards only have one stereo pair of ADCs that is switched between various inputs such as Line-In and "Mic". You will need at least a semi-professional device to find support for multi-channel recording.
  • Driver support: the drivers for the device must make it possible to record more than two channels at once. This is more problematic than it might seem, because the standard sound interfaces for many operating systems were designed long before multi-channel recording was possible, and so only allow for up to two channels of recording. Also, consumer-level systems are not designed to achieve the low latencies and high throughputs needed for high-quality multi-channel recordings.

@danielkanda

This does seem in conflict with the Godot ethos of pragmatism.

#1: The problem always comes first
#2: To solve the problem, it has to exist in the first place
Let's put aside the ergonomics of having to pipe microphone streams into audio busses to obtain data to send or analyse via effects. Long-running microphone streams are unstable.

On Android, we can't deliver working software that requires long-running voice input. This is a concrete issue that needs a solution.

#3: The problem has to be complex or frequent
In our experience, microphone streams dropping out and entering a broken state happens regularly on devices like the Meta Quest.

#5: To each problem, its own solution
#6: Cater to common use cases, leave the door open for the rare ones
What is being suggested here seems like a pragmatic solution to a real problem. Would it be reasonable to say this can be modified to accommodate more rare cases in the future? Multi-microphone / multi-channel setups are relatively rare, especially on consumer hardware.

I'm keen to get something underway here, as it's a blocking problem our development team is facing.

Whether the solution is exposing microphone input directly, or finding some other way to avoid the input/output buffer mismatch that @goatchurchprime identified, we are ready to support.

@lyuma (Contributor) commented Jun 14, 2025

I would like a new issue to track bugs in AudioStreamMicrophone. I think those should be fixed irrespective of what new APIs we choose to create.
From my understanding, there are a few bugs in AudioStreamMicrophone:

What I call "Emergency cases": I think we can all agree that these Should Not Happen, but if they do, we should mitigate the downside consequences. Sometimes solutions need to be multi-layered.

  1. Too few samples / dropout / ending the stream early: my understanding is that it arbitrarily kills the stream entirely if one read call has too few samples. This should never happen, and either we should allow streams to continue with too few samples, or we should guarantee the stream is padded even in the worst case. No matter what, an AudioStreamMicrophone should never end, except maybe if the device disconnects.
  2. Too many samples / increasing delay: the buffer size should be limited. This is bad, but the delay should have an upper bound. If the ring buffer limit is reached, it may be good to clear it entirely, or something similar, to prevent a garbled mess afterwards.

Okay, and for the underlying issues, given the current audio bus architecture:

  1. It should perhaps be possible to sync the audio bus clock to the audio device clock. I think you made a good case that we cannot properly play audio if the clocks are desynced. However, if the user wishes to use the audio bus, it can make sense that the audio bus runs at the correct sample rate. Or,
  2. Measure the divergence between the device clock and the audio bus clock, and resample from one to the other. This could cause aliasing artifacts even if the resample difference is small, so it might need to be a preference, or we need to experiment with resampling algorithms, or simply duplicate or delete samples occasionally to make up the difference ("nearest sampling"; see the sketch below).
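A crude sketch of that nearest-sampling correction (my own illustration, not from this PR): drop or repeat one frame per block according to the measured drift.

func compensate_drift(frames: PackedVector2Array, drift_frames: int) -> PackedVector2Array:
    if frames.is_empty():
        return frames
    var out := frames.duplicate()
    if drift_frames > 0:
        out.resize(out.size() - 1)  # input clock runs fast: drop a frame
    elif drift_frames < 0:
        out.append(out[out.size() - 1])  # input clock runs slow: repeat a frame
    return out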

I understand you want to choose how to spend your time, and you could say "I don't want to fix the bugs in the existing audio bus architecture, and will instead develop a new feature that bypasses the audio bus." That is fair and is your choice, but then you should be in the mindset that you are building a new feature, not fixing the audio bus, and we should agree on the design of the new feature.

Speaking of design, let's either repurpose the old proposal from Dec 2024 and update it based on what we discussed at the audio meetings, or make a new proposal. Here is some of what we discussed:

  1. For example, the ring buffer should allow multiple consumers (read pointer tracked in a RefCounted state object, or passed in) to support migrating AudioStreamMicrophone to it; see the sketch after this list. (This will not fix the above bugs in AudioStreamMicrophone, but it avoids adding duplicate code and contributing to code debt.)
  2. And we talked about an approach to support multiple device inputs at the API level without needing operating system drivers to implement this support at first: you would check whether someone has already opened a given input device, and return an error if the user asks to open another input device without closing the first one.
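A rough sketch of the per-consumer read pointer from point 1 (hypothetical names; the real state would live inside the engine's AudioDriver rather than in a script):

class_name MicReadCursor
extends RefCounted
# Each consumer keeps its own read position into the shared ring buffer.

var read_pos: int = 0

func read(ring: PackedVector2Array, write_pos: int, max_frames: int) -> PackedVector2Array:
    var out := PackedVector2Array()
    while read_pos != write_pos and out.size() < max_frames:
        out.append(ring[read_pos])
        read_pos = (read_pos + 1) % ring.size()
    return out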

So, in summary, we should be able to solve these issues. I would feel better discussing this in terms of concrete issues and proposals.

@goatchurchprime
Copy link
Contributor Author

The above is a list of four issues with the current implementation that can be papered over with somewhat questionable hacks, which will degrade the microphone audio and are likely to introduce a number of implementation bugs due to their extreme complexity (particularly point 4).

We can avoid this developmental nightmare and Fix The Problem Today, 100%, in relation to a common Use Case, with a 15-line, highly performant function that implements a simple API modeled on the one in AudioEffectCapture and has zero risk of breaking anything.

Now I know that adding new functions to the API is generally a Bad Thing(tm), but there is a balance to be struck. If I were being asked to rewrite a million lines of code to avoid changing a single parameter in one function that nobody uses, we wouldn't be having a debate. In my Opinion, the particulars of this issue mean it falls easily on the side of "better to change the API" rather than "attempt to work around a flawed implementation".

I have a related Opinion, which is that I strongly doubt the proposed project of coding workarounds for each of the various flaws in the current implementation would actually succeed. One piece of evidence for this opinion is PR #93200 (Fix audio input gets muted after a while on Android), which was submitted a year ago and shows no signs of further consideration. Multiply this rate of response by the scale of the task, and it's clearly not going to happen.

I have already answered Point 5 on the matter of "duplicate code" and "technical debt" in relation to these two functions in this comment above.

I have provided evidence that multiple microphones are a Rare Case at the end of this other comment above.

@Kaleb-Reid (Contributor)

Given that this PR grabs frames from the buffer in AudioDriver, is there any reason these methods can't be added to AudioServer, which already appears to be intertwined with AudioDriver, instead of making Input dependent on AudioServer/AudioDriver?

@goatchurchprime (Contributor Author) commented Jun 18, 2025

Good idea. Unfortunately the location of the function isn't the sticking point for the reviewers, or I'd fix it right away.

Between a request to handle multiple microphones and a non-negotiable ban on making API changes, I don't know what to do next.
