
Add microphone access to Input #105244


Open · wants to merge 1 commit into master

Conversation

@goatchurchprime (Contributor) commented Apr 10, 2025

This PR is an alternative to #100508 and addresses godotengine/godot-proposals#11347.

We add the following four functions to the Input singleton to manage and access the microphone data stream independently of the audio system:

Error start_microphone()
Error stop_microphone()
int get_microphone_frames_available()
PackedVector2Array get_microphone_buffer(int p_frames)
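
For reference, a minimal usage sketch of these functions (the error handling shown is only illustrative):

# Minimal usage sketch of the new Input microphone API.
func _ready() -> void:
    var err := Input.start_microphone()
    if err != OK:
        push_error("Could not start the microphone (error %d)" % err)

func _exit_tree() -> void:
    Input.stop_microphone()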

The current means of accessing the microphone data involves chaining the following units together in a sequence:

AudioStreamPlayer [ AudioStreamMicrophone ]
-> AudioBus [ AudioEffectCapture ]
-> AudioOutput

The AudioEffectCapture in the middle intercepts the audio data as it flows from the source stream to the output stream and copies it into a buffer.

This structure is problematic because it locks two real-time serial devices (the microphone and the speakers) into the same chain, so any systematic drift between their clocks, however small, will eventually overflow or starve any intermediate buffer.
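
(For a sense of scale: a clock mismatch of just 0.1% at 44.1 kHz amounts to roughly 44 frames of drift per second, so a typical 0.1 s / 4410-frame capture buffer would overflow or starve after only a minute or two.)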

Issues that this may have caused include #80173, #95120, and #86428. The problem is most consistent on Android, where the microphone will either enter an endless loop or shut down after several minutes of use.

Additional changes are made to the platform implementations of AudioDriver.input_start() to make them safe to call multiple times, which can happen if there is more than one AudioStreamMicrophone present in the project.

The full test of these functions is in the godot-demo-projects/audio/mic_input project of godotengine/godot-demo-projects#1172. This demo includes an option to push the samples straight into an AudioStreamGenerator to simulate a loop-back. The observed delay is about the same (roughly a third of a second) as in the original case of using an AudioStreamMicrophone as input.
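
The loop-back in the demo roughly follows this pattern (a sketch only, not the demo's exact code; the $Player node and the buffer sizing are assumptions):

var generator_playback : AudioStreamGeneratorPlayback

func _ready() -> void:
    Input.start_microphone()
    $Player.play()   # $Player: an AudioStreamPlayer whose stream is an AudioStreamGenerator
    generator_playback = $Player.get_stream_playback()

func _process(_delta : float) -> void:
    # Push as many microphone frames as the generator can currently accept.
    var frames : int = min(Input.get_microphone_frames_available(), generator_playback.get_frames_available())
    if frames > 0:
        generator_playback.push_buffer(Input.get_microphone_buffer(frames))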

The code that extracts the buffer is as follows:

var recording_buffer : Array[PackedVector2Array] = []
var audio_sample_size := 882   # 20 ms at 44.1 kHz

func _process(_delta : float) -> void:
    # Drain the microphone ring buffer in fixed-size chunks.
    while Input.get_microphone_frames_available() >= audio_sample_size:
        var audio_samples : PackedVector2Array = Input.get_microphone_buffer(audio_sample_size)
        if not audio_samples.is_empty():
            recording_buffer.append(audio_samples)
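
If the whole recording is later needed as one contiguous array (for example to hand to a processing library), the chunks can be flattened with a helper like this (not part of the PR, just an illustration):

func get_recorded_samples() -> PackedVector2Array:
    var flat := PackedVector2Array()
    for chunk in recording_buffer:
        flat.append_array(chunk)
    return flat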

I have tested it on Windows, Android, and Linux, and it works well with https://github.com/goatchurchprime/two-voip-godot-4

@fire (Member) commented Apr 10, 2025

Does this assume a mono mic? I have a multichannel mic.

Does this break our attempts to do automatic libsamplerate conversion?

Adding to the last question: does it make a promise about the structure of the audio stream that we can never break?

I would recommend the PackedVector2Array get_microphone_buffer(int p_frames) have an API similar to the ring buffer API in https://docs.godotengine.org/en/stable/classes/class_audioeffectcapture.html

It can still be seen as grabbing a PackedVector2Array of some fixed size.

How would we design a feature where the mic is injected back into the Godot engine audio bus? For example, we want to proxy the audio channel for voice cancellation.

TL;DR: it's a promising idea.

@goatchurchprime (Contributor, Author) commented:

> Does this assume a mono mic? I have a multichannel mic.

This assumes a stereo mic, exactly as specified in the internal microphone buffer. (On platforms that have a mono mic the samples are duplicated into the two channels.)

@fire Please send details about your multichannel mic if it has more than two channels.

(I have two mics on my laptop PC on either side of the camera, 10 cm apart, and I have plotted a visual correlation between the two channels to prove that the offsets are consistent with the speed of sound -- about 8 mm per sample at 44.1 kHz.)

There could be a scenario where we have multiple independent microphones plugged into the system as inputs. I am not sure what this would be good for, since there is already a lot of audio equipment for dealing with that.
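
(If a caller only needs mono, the stereo frames in the returned PackedVector2Array can simply be averaged; a quick sketch, not part of this PR:)

func stereo_to_mono(frames : PackedVector2Array) -> PackedFloat32Array:
    var mono := PackedFloat32Array()
    mono.resize(frames.size())
    for i in frames.size():
        # Average the left (.x) and right (.y) channels.
        mono[i] = (frames[i].x + frames[i].y) * 0.5
    return mono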

> Does this break our attempts to do automatic libsamplerate conversion?

The twovoip library has its own resampler for robustness and isolation from the audio system, and the output is fed directly to the RNNoise and Opus libraries, which require different sample rates. It's unusual for the mic to have a wildly different sample rate from the output. (The variation I have observed is below one percent.)

> Adding to the last question: does it make a promise about the structure of the audio stream that we can never break?

The audio stream can be whatever it likes internally, but get_microphone_buffer() has to return a PackedVector2Array or else everything that uses it will break. This "promise" is a consequence of being an API.

> I would recommend the PackedVector2Array get_microphone_buffer(int p_frames) have an API similar to the ring buffer API in https://docs.godotengine.org/en/stable/classes/class_audioeffectcapture.html

This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.
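
In the meantime, a can_get_buffer-style check is trivial to write on the script side (a hypothetical wrapper, not part of the PR):

func can_get_microphone_buffer(frames : int) -> bool:
    return Input.get_microphone_frames_available() >= frames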

> How would we design a feature where the mic is injected back into the Godot engine audio bus? For example, we want to proxy the audio channel for voice cancellation.

I need to do some experiments using the AudioStreamGenerator to check if the obvious re-injection method is any good.

Regarding voice cancellation: isn't this about subtracting the speaker output from the microphone input to prevent feedback, where you hear your own voice round-tripping across the network to another player and then back to you? I imagine that would be done by running a noise-cancelling process against the captured output stream from the Master bus and the microphone input.

@fire (Member) commented Apr 12, 2025

> This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.

So the current behavior is to pause the entire game engine.

@goatchurchprime (Contributor, Author) commented Apr 15, 2025

> This is the minimal API for now. We can add those other functions (e.g. can_get_buffer(frames: int)) at a later date.

> So the current behavior is to pause the entire game engine.

No, it just returns an empty array if there are not enough samples -- same as the equivalent function in AudioEffectCapture.

@fire (Member) commented Apr 16, 2025

You need to implement the AudioEffectCapture internal workflow, then.

@goatchurchprime (Contributor, Author) commented:

> You need to implement the AudioEffectCapture internal workflow, then.

I don't understand. Can you explain?

Also, was my answer to your previous question adequate?

@fire (Member) commented Apr 16, 2025

Let's get together in a voice call at some point, on Discord or somewhere else.

@AThousandShips changed the title from "Add in the microphone access to the Input object" to "Add microphone access to Input" on Apr 17, 2025
@goatchurchprime (Contributor, Author) commented Apr 25, 2025

I've made a diagram to help me when I'm explaining this PR to anyone in the future. I drew the outputs at the top and the inputs at the bottom to make it feel like the system pulls rather than pushes the audio data.

[diagram: audio data flow, with outputs at the top and inputs at the bottom]

@goatchurchprime (Contributor, Author) commented:

I'm afraid I cannot satisfy @lyuma's challenge of sharing code between the two functions that draw data from the AudioDriver::input_buffer.

The functions are:

int AudioStreamPlaybackMicrophone::_mix_internal(AudioFrame *p_buffer, int p_frames)

This:

  • Runs in the audio thread and applies thread-locking
  • Outputs into a pre-allocated AudioFrame* array
  • Pads the very start of its stream with 50 ms of zeros so the buffer can grow large enough to keep working for a few minutes on Android before the mismatch between the input and output rates potentially starves it.

PackedVector2Array Input::get_microphone_buffer(int p_frames)

This:

  • Does not use thread-locking
  • Reserves and outputs a PackedVector2Array
  • Does nothing more than copy values from a ring buffer to a flat array

Basically, my 20-line function collapses two complicated buffer-copying functions (into and out of the audio system) into one very simple one. The complexity is caused by the intermediate buffer being part of the audio system.
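
For illustration only, this is the shape of that ring-buffer-to-flat-array copy expressed in GDScript (the real implementation is C++ inside Input; the ring and read_pos names here are hypothetical):

func copy_from_ring(ring : PackedVector2Array, read_pos : int, frames : int) -> PackedVector2Array:
    var out := PackedVector2Array()
    out.resize(frames)
    for i in frames:
        # Wrap around the end of the ring buffer.
        out[i] = ring[(read_pos + i) % ring.size()]
    return out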

@akien-mga akien-mga removed request for a team May 13, 2025 10:55
@Ivorforce Ivorforce requested review from a team May 13, 2025 10:56
@goatchurchprime (Contributor, Author) commented:

Here is the much-simplified demo project for this feature, which I only just remembered to update:
https://github.com/goatchurchprime/godot-demo-projects/tree/gtch/micplot/audio/mic_input

[screenshot: the mic_input demo project]
