Adding voice consent gate blogpost #3152
voice-consent-gate.md
Outdated
| - Alternatively, it’s possible to modify the code we provide in the demo to model the speaker’s voice using a variety of _different_ uploaded voice files that the speaker is consenting to – for example, when providing consent for using online recordings. Prompts and consent phrases should be altered accordingly. | ||
| - It’s also possible to save the consent audio to be used by a given system, for example, when the speaker is consenting to have their voice used for arbitrary utterances in the future. This can be done using the `huggingface_hub` upload capability. [Read how to do this here](https://huggingface.co/docs/huggingface_hub/en/guides/upload). Again, prompts and consent phrases for the speaker to say should account for this context of use. | ||
| Check our demo out! The code is modular so it can be sliced and diced in different ways to incorporate into your own projects. We’ll be working on making this more robust and secure over time, and we’re curious to hear your ideas on how to improve. |
Perhaps we can link to the code here for easier reference: https://huggingface.co/spaces/society-ethics/RepeatAfterMe/blob/main/app.py
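As a rough illustration of the "save the consent audio" option mentioned in the post, the sketch below uploads a recorded consent clip with `huggingface_hub`. The repository layout (`consent/<speaker>/<timestamp>.wav`), the function names, and the use of a dataset repo are assumptions made for this example; they are not taken from the demo code.

```python
from datetime import datetime, timezone


def consent_audio_path(speaker_id: str, when: datetime) -> str:
    """Build a repo path such as consent/<speaker>/<UTC timestamp>.wav (hypothetical layout)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"consent/{speaker_id}/{stamp}.wav"


def upload_consent_audio(local_wav: str, repo_id: str, speaker_id: str) -> str:
    """Upload one consent recording and return the path it was stored under."""
    # Imported lazily so the path helper above works without the library installed.
    from huggingface_hub import HfApi

    path_in_repo = consent_audio_path(speaker_id, datetime.now(timezone.utc))
    HfApi().upload_file(
        path_or_fileobj=local_wav,
        path_in_repo=path_in_repo,
        repo_id=repo_id,      # assumption: a repo you control, ideally private
        repo_type="dataset",  # assumption: recordings are kept in a dataset repo
    )
    return path_in_repo
```

Storing the clip under a timestamped, per-speaker path is one way to keep the consent recording traceable to the context it was given in; the upload itself requires a logged-in token with write access.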
merveenoyan
left a comment
looks good, feel free to skip my questions if you think they don't make sense!
| # Voice Cloning with Consent | ||
| <img src="https://huggingface.co/spaces/society-ethics/RepeatAfterMe/resolve/main/assets/voice_consent_gate.png" alt="Line-drawing/clipart of a gate, where the family name says Consent" width="50%"/> |
is there a reason why you repeat thumbnail here?
| # Voice Cloning with Consent | ||
few words of what the blog talks about with what motivation would be great as people have attention span of 5 secs
Done.
| ## Ethics in Practice: Consent as System Infrastructure | ||
| The voice consent gate is a bit of infrastructure we're exploring that provides methods for ethical principles like **consent** to be embedded directly into AI system workflows. By requiring consent to be spoken and recognized before proceeding, the gate turns an ethical principle into a computational condition. This creates a traceable, auditable interaction: An AI model can only run after an unambiguous act of consent. |
it's a bit abstract here imo would be great to materialize a little
| The voice consent gate is a bit of infrastructure we're exploring that provides methods for ethical principles like **consent** to be embedded directly into AI system workflows. By requiring consent to be spoken and recognized before proceeding, the gate turns an ethical principle into a computational condition. This creates a traceable, auditable interaction: An AI model can only run after an unambiguous act of consent. | |
| The voice consent gate is part of an infrastructure we're exploring that provides methods for ethical principles like **consent** to be embedded directly into AI system workflows. By requiring consent to be spoken and recognized before proceeding, the gate turns an ethical principle into a computational condition. This creates a traceable, auditable interaction: An AI model can only run after an unambiguous act of consent. |
Changing to 'piece'.
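One way to make "an ethical principle as a computational condition" concrete is a small gate function that the cloning step must pass through. The sketch below assumes a transcript of the spoken consent is already available from some speech recognizer; the function names, the fuzzy-match approach, and the 0.9 threshold are illustrative assumptions, not the demo's actual implementation.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so minor transcription differences don't matter."""
    return " ".join(re.sub(r"[^a-z0-9\s]", " ", text.lower()).split())


def consent_given(expected: str, transcript: str, threshold: float = 0.9) -> bool:
    """Return True when the transcript closely matches the expected consent sentence."""
    ratio = SequenceMatcher(None, normalize(expected), normalize(transcript)).ratio()
    return ratio >= threshold


def run_if_consented(expected: str, transcript: str, model_fn):
    """Call model_fn only when the gate passes; otherwise refuse to run the model."""
    if not consent_given(expected, transcript):
        raise PermissionError("Spoken consent was not recognized; model not run.")
    return model_fn()
```

Because the model call sits behind `run_if_consented`, every run can be logged together with the sentence that was requested and the transcript that was recognized, which is what makes the interaction auditable.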
| ### Approach | ||
| **The consent bit:** To create a voice consent gate in an English voice cloning system, generate a short, natural-sounding English utterance (~20 words) for a person to read aloud that clearly states their informed consent in the current context. We recommend explicitly including _a consent phrase_ and _the model name_, such as “I give my consent to use the < MODEL > voice cloning model with my voice”. We also recommend using an audio recording that cannot be uploaded, but that instead comes directly from a microphone, to make sure that the sentence isn’t part of an earlier recording that’s been manipulated. Pairing this with a novel (previously unsaid) sentence further helps to directly index the current consent context - supporting explicit, active, context-specific, informed consent. |
I feel like one can also generate this phrase with a TTS model, do you want to touch on how to validate if the consent phrase is originally made?
We mention that it comes directly from the microphone, but it's not bulletproof -- it's just an initial idea for the moment.
| **The suitable-for-voice-cloning bit:** Previous work on voice cloning has shown that the phrases provided by the speaker must have _phonetic variety_, covering [_diverse vowels and consonants_](https://proceedings.neurips.cc/paper_files/paper/2018/file/6832a7b24bc06775d02b7406880b93fc-Paper.pdf); have a [_“neutral” or polite tone_](https://dl.acm.org/doi/10.5555/3666122.3666982), without background noise and with the speaker in a comfortable position; and have _a clear start and end_ (i.e., don’t trim the clip mid-word). | ||
| To enact both of these aspects within the demo, we prompt a language model to create pairs of sentences: one expressing explicit consent, and another neutral sentence that adds phonetic diversity (covering different vowels, consonants, and tones). Each prompt utilizes a randomly-chosen everyday topic (like the weather, food, or music) to keep the sentences varied and comfortable to say, aiding in creating recordings that are clear, natural, and phonetically rich, while also containing an unambiguous statement of consent. For example, the language model might generate: _“I give my consent to use my voice for generating audio with the model EchoVoice. The weather is bright and calm this morning.”_ This approach ensures that every sample used for cloning contains verifiable, explicit consent, while remaining suitable as technical input for high-quality voice synthesis. (Note: It's not required that the language model be a "large" language model, which brings its own consent issues.) |
is there a reason why it's not just two sentences we can provide? I'm a bit confused on LM side
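To make the language-model step easier to picture, here is a minimal sketch of how a per-request prompt with a randomly chosen topic might be assembled. The topic list comes from the examples in the post; the prompt wording and the `build_prompt` helper are assumptions for illustration and are not the demo's actual prompt. Generating a fresh pair each time, rather than shipping two fixed sentences, is what keeps the sentence novel for each consent context.

```python
import random

# Everyday topics named as examples in the post.
TOPICS = ["the weather", "food", "music"]


def build_prompt(model_name: str, rng: random.Random) -> tuple[str, str]:
    """Pick a topic and return (topic, prompt) for a sentence-pair request."""
    topic = rng.choice(TOPICS)
    prompt = (
        "Write two short, natural-sounding English sentences (about 20 words total) "
        "for a person to read aloud. "
        f"Sentence 1 must state explicit consent to use their voice with the model {model_name}. "
        f"Sentence 2 must be a neutral sentence about {topic} with varied vowels and consonants."
    )
    return topic, prompt
```

The returned prompt would then be passed to whichever language model the system uses; the generated pair becomes the text the speaker reads into the microphone.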
Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.
Preparing the Article
You're not quite done yet, though. Please make sure to follow this process (as documented here):
`md` file. You can also specify `guest` or `org` for the authors. Here is an example of a complete PR: #2382
Getting a Review
Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.
Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.