@meg-huggingface
Contributor

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • [x] Add an entry to _blog.yml.
  • [x] Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • [x] Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • [x] Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • [x] Ensure the publication date is correct.
  • [x] Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish, this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

- Alternatively, it’s possible to modify the code we provide in the demo to model the speaker’s voice using a variety of _different_ uploaded voice files that the speaker is consenting to – for example, when providing consent for using online recordings. Prompts and consent phrases should be altered accordingly.
- It’s also possible to save the consent audio to be used by a given system, for example, when the speaker is consenting to have their voice used for arbitrary utterances in the future. This can be done using the `huggingface_hub` upload capability. [Read how to do this here](https://huggingface.co/docs/huggingface_hub/en/guides/upload). Again, prompts and consent phrases for the speaker to say should account for this context of use.
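A minimal sketch of the upload step described in the second bullet, assuming the consent recording has already been written to a local WAV file. The function names, the `consent/<speaker>/<timestamp>.wav` layout, and the choice of a dataset repo are illustrative assumptions, not taken from the demo; only `HfApi.upload_file` is a `huggingface_hub` call, and it assumes you are already authenticated (for example via a stored token).

```python
from datetime import datetime, timezone


def consent_path_in_repo(speaker_id: str, recorded_at: datetime) -> str:
    """Build a repo path for a consent clip (hypothetical naming scheme)."""
    stamp = recorded_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"consent/{speaker_id}/{stamp}.wav"


def upload_consent_audio(local_path: str, speaker_id: str, repo_id: str) -> str:
    """Upload a recorded consent clip to a Hub repo and return its path there."""
    # Imported lazily so the path helper above works without the dependency.
    from huggingface_hub import HfApi

    path_in_repo = consent_path_in_repo(speaker_id, datetime.now(timezone.utc))
    HfApi().upload_file(
        path_or_fileobj=local_path,
        path_in_repo=path_in_repo,
        repo_id=repo_id,          # assumption: an existing repo you can write to
        repo_type="dataset",      # assumption: consent clips are kept in a dataset repo
    )
    return path_in_repo
```

Because a consent recording is personal data, a private repo is probably the safer default for anything stored this way.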

Check our demo out! The code is modular so it can be sliced and diced in different ways to incorporate into your own projects. We’ll be working on making this more robust and secure over time, and we’re curious to hear your ideas on how to improve.
Member

Perhaps we can link to the code here for easier reference: https://huggingface.co/spaces/society-ethics/RepeatAfterMe/blob/main/app.py

@meg-huggingface meg-huggingface merged commit e5ae70a into main Oct 28, 2025
1 check passed
@meg-huggingface meg-huggingface deleted the meg/voice-consent-gate branch October 28, 2025 16:42
Contributor

@merveenoyan merveenoyan left a comment

looks good, feel free to skip my questions if you think they don't make sense!

# Voice Cloning with Consent


<img src="https://huggingface.co/spaces/society-ethics/RepeatAfterMe/resolve/main/assets/voice_consent_gate.png" alt="Line-drawing/clipart of a gate, where the family name says Consent" width="50%"/>
Contributor

Is there a reason why you repeat the thumbnail here?



# Voice Cloning with Consent

Contributor

A few words on what the blog covers and the motivation behind it would be great here, as people have an attention span of about 5 seconds.

Contributor Author

Done.


## Ethics in Practice: Consent as System Infrastructure

The voice consent gate is a bit of infrastructure we're exploring that provides methods for ethical principles like **consent** to be embedded directly into AI system workflows. By requiring consent to be spoken and recognized before proceeding, the gate turns an ethical principle into a computational condition. This creates a traceable, auditable interaction: An AI model can only run after an unambiguous act of consent.
Contributor

It's a bit abstract here imo; it would be great to make it a little more concrete.

Suggested change
- The voice consent gate is a bit of infrastructure we're exploring that provides methods for ethical principles like **consent** to be embedded directly into AI system workflows. By requiring consent to be spoken and recognized before proceeding, the gate turns an ethical principle into a computational condition. This creates a traceable, auditable interaction: An AI model can only run after an unambiguous act of consent.
+ The voice consent gate is part of an infrastructure we're exploring that provides methods for ethical principles like **consent** to be embedded directly into AI system workflows. By requiring consent to be spoken and recognized before proceeding, the gate turns an ethical principle into a computational condition. This creates a traceable, auditable interaction: An AI model can only run after an unambiguous act of consent.

Contributor Author

Changing to 'piece'.


### Approach

**The consent bit:** To create a voice consent gate in an English voice cloning system, generate a short, natural-sounding English utterance (~20 words) for a person to read aloud that clearly states their informed consent in the current context. We recommend explicitly including _a consent phrase_ and _the model name_, such as “I give my consent to use the < MODEL > voice cloning model with my voice”. We also recommend using an audio recording that cannot be uploaded, but that instead comes directly from a microphone, to make sure that the sentence isn’t part of an earlier recording that’s been manipulated. Pairing this with a novel (previously unsaid) sentence further helps to directly index the current consent context - supporting explicit, active, context-specific, informed consent.
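One way to picture the gate as a "computational condition" is the sketch below. It assumes a speech recognizer has already turned the microphone recording into a transcript string; the function names, the fuzzy-match approach, and the 0.9 threshold are illustrative assumptions rather than the demo's actual implementation. As noted in the thread, a check like this does not by itself prove the audio was spoken live.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(text.split())


def consent_given(expected: str, transcript: str, threshold: float = 0.9) -> bool:
    """Return True when the transcript closely matches the expected consent sentence."""
    ratio = SequenceMatcher(None, normalize(expected), normalize(transcript)).ratio()
    return ratio >= threshold


def run_gated(expected: str, transcript: str, clone_voice):
    """Call `clone_voice` only if consent was recognized; otherwise refuse."""
    if not consent_given(expected, transcript):
        raise PermissionError("Consent phrase not recognized; voice cloning is blocked.")
    return clone_voice()
```

The fuzzy match tolerates small transcription slips (casing, punctuation, a dropped letter), while an unrelated sentence falls well below the threshold and the cloning step never runs.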
Contributor

@merveenoyan merveenoyan Oct 28, 2025

I feel like one could also generate this phrase with a TTS model. Do you want to touch on how to validate that the consent phrase was actually spoken by the person rather than synthesized?

Contributor Author

We mention that it comes directly from the microphone, but it's not bulletproof; it's just an initial idea for the moment.


**The suitable-for-voice-cloning bit:** Previous work on voice cloning has shown that the phrases provided by the speaker must have _phonetic variety_, covering [_diverse vowels and consonants_](https://proceedings.neurips.cc/paper_files/paper/2018/file/6832a7b24bc06775d02b7406880b93fc-Paper.pdf); have a [_“neutral” or polite tone_](https://dl.acm.org/doi/10.5555/3666122.3666982), without background noise and with the speaker in a comfortable position; and have _a clear start and end_ (i.e., don’t trim the clip mid-word).

To enact both of these aspects within the demo, we prompt a language model to create pairs of sentences: one expressing explicit consent, and another neutral sentence that adds phonetic diversity (covering different vowels, consonants, and tones). Each prompt utilizes a randomly-chosen everyday topic (like the weather, food, or music) to keep the sentences varied and comfortable to say, aiding in creating recordings that are clear, natural, and phonetically rich, while also containing an unambiguous statement of consent. For example, the language model might generate: _“I give my consent to use my voice for generating audio with the model EchoVoice. The weather is bright and calm this morning.”_ This approach ensures that every sample used for cloning contains verifiable, explicit consent, while remaining suitable as technical input for high-quality voice synthesis. (Note: It's not required that the language model be a "large" language model, which brings its own consent issues.)
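The prompting step could look roughly like the sketch below. The prompt wording, the `build_prompt` name, and the three-item topic list (taken from the examples in the paragraph above) are assumptions for illustration; the demo's real prompt may differ, and the call to the language model itself is left out.

```python
import random

# Everyday topics mentioned in the post; a real list would likely be longer.
TOPICS = ["the weather", "food", "music"]


def build_prompt(model_name: str, rng: random.Random) -> tuple[str, str]:
    """Pick a random topic and build a prompt asking for a consent/neutral sentence pair."""
    topic = rng.choice(TOPICS)
    prompt = (
        "Write exactly two short, natural-sounding English sentences "
        "(about 20 words in total) for a person to read aloud.\n"
        f"1. A sentence giving explicit consent to use their voice with the model {model_name}.\n"
        f"2. A neutral sentence about {topic} that covers diverse vowels and consonants.\n"
        "Return only the two sentences."
    )
    return topic, prompt
```

Drawing a fresh topic on every request is what makes each sentence pair novel, which is harder to achieve with a fixed pair of sentences that could be recorded once and replayed.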
Contributor

Is there a reason why we can't just provide two fixed sentences? I'm a bit confused about the LM side.
