Update spec with latest on-device changes #157

Merged
merged 14 commits into from
May 27, 2025
134 changes: 88 additions & 46 deletions index.bs
@@ -118,6 +118,8 @@ This does not preclude adding support for this as a future API enhancement, and
</li>

<li>The user agent may also give the user a longer explanation the first time speech input is used, to let the user know what it is and how they can tune their privacy settings to disable speech recording if required.</li>

<li>To mitigate the risk of fingerprinting, user agents MUST NOT personalize speech recognition when performing speech recognition on a {{MediaStreamTrack}}.</li>
</ol>

<h3 id="implementation-considerations">Implementation considerations</h3>
@@ -141,8 +143,8 @@ This does not preclude adding support for this as a future API enhancement, and
<h3 id="speechreco-section">The SpeechRecognition Interface</h3>

<p>The speech recognition interface is the scripted web API for controlling a given recognition.</p>
The term "final result" indicates a SpeechRecognitionResult in which the final attribute is true.
The term "interim result" indicates a SpeechRecognitionResult in which the final attribute is false.
The term "final result" indicates a {{SpeechRecognitionResult}} in which the {{SpeechRecognitionResult/isFinal}} attribute is true.
The term "interim result" indicates a {{SpeechRecognitionResult}} in which the {{SpeechRecognitionResult/isFinal}} attribute is false.

{{SpeechRecognition}} has the following internal slots:

@@ -153,9 +155,9 @@ The term "interim result" indicates a SpeechRecognitionResult in which the final
</dl>

<dl dfn-type=attribute dfn-for="SpeechRecognition">
: <dfn>[[mode]]</dfn>
: <dfn>[[processLocally]]</dfn>
::
A {{SpeechRecognitionMode}} enum to determine where speech recognition takes place. The initial value is <code>ondevice-preferred</code>.
A boolean flag indicating whether recognition <em class="rfc2119" title="MUST">MUST</em> be performed locally. The initial value is <code>false</code>.
</dl>

<dl dfn-type=attribute dfn-for="SpeechRecognition">
@@ -174,16 +176,16 @@ interface SpeechRecognition : EventTarget {
attribute boolean continuous;
attribute boolean interimResults;
attribute unsigned long maxAlternatives;
attribute SpeechRecognitionMode mode;
attribute boolean processLocally;
attribute SpeechRecognitionPhraseList phrases;

// methods to drive the speech interaction
undefined start();
undefined start(MediaStreamTrack audioTrack);
undefined stop();
undefined abort();
static Promise<AvailabilityStatus> availableOnDevice(DOMString lang);
static Promise<boolean> installOnDevice(DOMString lang);
static Promise<AvailabilityStatus> available(SpeechRecognitionOptions options);
static Promise<boolean> install(SpeechRecognitionOptions options);

// event methods
attribute EventHandler onaudiostart;
@@ -199,6 +201,11 @@ interface SpeechRecognition : EventTarget {
attribute EventHandler onend;
};

dictionary SpeechRecognitionOptions {
required sequence<DOMString> langs;
boolean processLocally = false;
};

enum SpeechRecognitionErrorCode {
"no-speech",
"aborted",
@@ -210,12 +217,6 @@ enum SpeechRecognitionErrorCode {
"phrases-not-supported"
};

enum SpeechRecognitionMode {
"ondevice-preferred", // On-device speech recognition if available, otherwise use Cloud speech recognition as a fallback.
"ondevice-only", // On-device speech recognition only. Returns an error if on-device speech recognition is not available.
"cloud-only", // Cloud speech recognition only.
};

enum AvailabilityStatus {
"unavailable",
"downloadable",
@@ -314,20 +315,10 @@ interface SpeechRecognitionPhraseList {
<dd>This attribute will set the maximum number of {{SpeechRecognitionAlternative}}s per result.
The default value is 1.</dd>

<dt><dfn attribute for=SpeechRecognition>mode</dfn> attribute</dt>
<dd>
This attribute represents where speech recognition takes place.
</dd>
<dd>
The getter steps are to return the value of {{SpeechRecognition/[[mode]]}}.
</dd>
<dd>
The setter steps are:
1. If the {{SpeechRecognitionPhraseList/length}} of {{SpeechRecognition/phrases}} is greater than 0
and the system using the given value for {{SpeechRecognition/[[mode]]}} does not support contextual biasing,
throw a {{SpeechRecognitionErrorEvent}} with the {{SpeechRecognitionErrorCode/phrases-not-supported}}
error code and abort these steps.
1. Set {{SpeechRecognition/[[mode]]}} to the given value.
<dt><dfn attribute for=SpeechRecognition>processLocally</dfn> attribute</dt>
<dd>This attribute, when set to true, indicates a requirement that the speech recognition process <em class="rfc2119" title="MUST">MUST</em> be performed locally on the user's device.
If set to false, the user agent can choose between local and remote processing.
The default value is false.
</dd>

<dt><dfn attribute for=SpeechRecognition>phrases</dfn> attribute</dt>
@@ -389,46 +380,93 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072
The user agent must raise an <a event for=SpeechRecognition>end</a> event once the speech service is no longer connected.
If the abort method is called on an object which is already stopped or aborting (that is, start was never called on it, the <a event for=SpeechRecognition>end</a> or <a event for=SpeechRecognition>error</a> event has fired on it, or abort was previously called on it), the user agent must ignore the call.</dd>

<dt><dfn method for=SpeechRecognition>availableOnDevice({{DOMString}} lang)</dfn> method</dt>
<dt><dfn method for=SpeechRecognition>available({{SpeechRecognitionOptions}} options)</dfn> method</dt>
<dd>
The {{SpeechRecognition/availableOnDevice}} method returns a {{Promise}} that resolves to a {{AvailabilityStatus}} indicating the on-device speech recognition availability for a given [[!BCP47]] language tag.
The {{SpeechRecognition/available}} method returns a {{Promise}} that resolves to an {{AvailabilityStatus}} indicating whether speech recognition is available for the languages and processing preference given in the {{SpeechRecognitionOptions}} argument.

When invoked, run these steps:
1. Let <var>promise</var> be <a>a new promise</a>.
1. Run the <a>on-device availability algorithm</a> with <var>lang</var> and <var>promise</var>. If it returns an exception, throw it and abort these steps.
1. Run the <a>availability algorithm</a> with <var>options</var> and <var>promise</var>. If it returns an exception, throw it and abort these steps.
1. Return <var>promise</var>.
</dd>

<dt><dfn method for=SpeechRecognition>installOnDevice({{DOMString}} lang)</dfn> method</dt>
<dt><dfn method for=SpeechRecognition>install({{SpeechRecognitionOptions}} options)</dfn> method</dt>
<dd>
The {{SpeechRecognition/installOnDevice}} method returns a {{Promise}} that resolves to a {{boolean}} when and whether the installation of on-device speech recognition for a given [[!BCP47]] language tag succeeded.
The {{SpeechRecognition/install}} method attempts to install speech recognition language packs for all languages specified in `options.langs`.
It returns a {{Promise}} that resolves to a {{boolean}}.
The promise resolves to `true` when all installation attempts for requested and supported languages succeed (or the languages were already installed).
The promise resolves to `false` if `options.langs` is empty, if not all of the requested languages are supported, or if any installation attempt for a supported language fails.

When invoked, run these steps:
1. If the [=current settings object=]'s [=relevant global object=]'s [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}} and abort these steps.
1. If <var>lang</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
1. If the on-device speech recognition language pack for <var>lang</var> is unsupported, return a resolved {{Promise}} with false and skip the rest of these steps.
1. If any <var>lang</var> in {{SpeechRecognitionOptions/langs}} of <var>options</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
1. If the on-device speech recognition language pack for any <var>lang</var> in {{SpeechRecognitionOptions/langs}} of <var>options</var> is unsupported, return a resolved {{Promise}} with false and skip the rest of these steps.
1. Let <var>promise</var> be <a>a new promise</a>.
1. Initiate the download of the on-device speech recognition language for <var>lang</var>.
1. For each <var>lang</var> in {{SpeechRecognitionOptions/langs}} of <var>options</var>, initiate the download of the on-device speech recognition language for <var>lang</var>.
<p class=note>
Note: The user agent can prompt the user for explicit permission to download the on-device speech recognition language pack.
</p>
1. [=Queue a task=] on the [=relevant global object=]'s [=task queue=] to run the following step:
- If the download succeeds, resolve <var>promise</var> with <code>true</code>, otherwise resolve it with <code>false</code>.
- When the download of all languages specified by {{SpeechRecognitionOptions/langs}} of <var>options</var> succeeds, resolve <var>promise</var> with <code>true</code>, otherwise resolve it with <code>false</code>.
<p class="note">
Note: The <code>false</code> resolution of the Promise does not indicate the specific cause of failure. User agents are encouraged to provide more detailed information about the failure in developer tools console messages. However, this detailed error information is not exposed to the script.
</p>
1. Return <var>promise</var>.
<p class=note>
{{SpeechRecognitionOptions/processLocally}} of <var>options</var> is not used in this algorithm.
</p>
</dd>

</dl>
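
Illustrative sketch, not part of the merged diff: the resolution rules described for the new `install()` method can be modeled as a small pure function. The names `installOutcome`, `isSupported`, and `downloadSucceeded` are hypothetical stand-ins for user agent internals, and language tag validation (the {{SyntaxError}} step) is deliberately left out.

```javascript
/**
 * Hypothetical model of the boolean that install() resolves with.
 * `isSupported` and `downloadSucceeded` are assumed predicates that stand in
 * for user agent internals; they are not part of the Web Speech API.
 */
function installOutcome(langs, isSupported, downloadSucceeded) {
  // An empty `langs` sequence resolves to false.
  if (langs.length === 0) return false;
  // If any requested language is unsupported, resolve to false
  // without starting any download.
  if (!langs.every((lang) => isSupported(lang))) return false;
  // Resolve to true only when every download succeeds
  // (or the language pack was already installed).
  return langs.every((lang) => downloadSucceeded(lang));
}

// Example with made-up data: one supported language that installs cleanly.
const supported = new Set(["en-US", "fr-FR"]);
console.log(installOutcome(["en-US"], (l) => supported.has(l), () => true)); // true
```

Because a `false` result does not say which condition failed, a caller that needs more detail would have to query `available()` per language.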
<p>When the <dfn>on-device availability algorithm</dfn> with <var>lang</var> and <var>promise</var> is invoked, the user agent MUST run the following steps:

<h4 id="availability-status-values">AvailabilityStatus Enum Values</h4>
<p>The {{AvailabilityStatus}} enum indicates the availability of speech recognition capabilities. Its values are:</p>
<dl>
<dt><dfn enum-value for="AvailabilityStatus">"unavailable"</dfn></dt>
<dd>Indicates that speech recognition is not available for the specified language(s) and processing preference.
If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `true`, this means on-device recognition is not supported by the user agent for at least one of the specified languages.
If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `false`, this means neither local nor remote recognition is available for at least one of the specified languages.</dd>

<dt><dfn enum-value for="AvailabilityStatus">"downloadable"</dfn></dt>
<dd>Indicates that on-device speech recognition for the specified language(s) is supported by the user agent but not yet installed. It can potentially be installed using the {{SpeechRecognition/install()}} method. This status is primarily relevant when {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is true.</dd>

<dt><dfn enum-value for="AvailabilityStatus">"downloading"</dfn></dt>
<dd>Indicates that on-device speech recognition for the specified language(s) is currently in the process of being downloaded. This status is primarily relevant when {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is true.</dd>

<dt><dfn enum-value for="AvailabilityStatus">"available"</dfn></dt>
<dd>Indicates that speech recognition is available for all specified language(s) and the given processing preference.
If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is true, this means on-device recognition is installed and ready.
If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is false, it means recognition (which could be local or remote) is available.</dd>
</dl>
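
Illustrative sketch, not part of the merged diff: one way a page might react to each {{AvailabilityStatus}} value when it requires on-device recognition. The helper name `ensureLocalRecognition` and the injected `api` parameter are assumptions made so the sketch can run outside a browser; in a page, `api` would be the `SpeechRecognition` interface object, and the feature check matters because not every user agent exposes these static methods.

```javascript
/**
 * Hypothetical helper: resolves to true when on-device recognition for all
 * `langs` is ready to use, attempting an install when the status allows it.
 * `api` is assumed to expose the static available() and install() methods
 * shown in the IDL above.
 */
async function ensureLocalRecognition(api, langs) {
  // Feature-detect: the static methods may be missing entirely.
  if (typeof api.available !== "function" || typeof api.install !== "function") {
    return false;
  }
  const options = { langs, processLocally: true };
  const status = await api.available(options);
  switch (status) {
    case "available":
      return true; // Installed and ready.
    case "downloadable":
      // Supported but not installed: try to install, which may prompt the user.
      return api.install(options);
    case "downloading":
      // Not ready yet; this sketch leaves retrying to the caller.
      return false;
    default:
      // "unavailable" or an unknown value.
      return false;
  }
}

// In a browser this might be called as:
//   ensureLocalRecognition(SpeechRecognition, ["en-US"]).then((ready) => { ... });
```

Returning `false` for `"downloading"` is a design choice of this sketch; the specification text does not prescribe how callers should wait for an in-progress download.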

<p>When the <dfn>availability algorithm</dfn> with <var>options</var> and <var>promise</var> is invoked, the user agent MUST run the following steps:
1. If the [=current settings object=]'s [=relevant global object=]'s [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}} and abort these steps.
1. If <var>lang</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
1. Determine the availability status for <var>lang</var>:
- If the on-device speech recognition language pack for <var>lang</var> is unsupported, let <var>status</var> be {{AvailabilityStatus/unavailable}}.
- Else if the on-device speech recognition language pack for <var>lang</var> is supported but not installed, let <var>status</var> be {{AvailabilityStatus/downloadable}}.
- Else if the on-device speech recognition language pack for <var>lang</var> is downloading, let <var>status</var> be {{AvailabilityStatus/downloading}}.
- Else if the on-device speech recognition language pack for <var>lang</var> is installed, let <var>status</var> be {{AvailabilityStatus/available}}.
1. Let <var>langs</var> be {{SpeechRecognitionOptions/langs}} of <var>options</var>.
1. If any <var>lang</var> in <var>langs</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
1. If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `false`:
1. If <var>langs</var> is an empty sequence, let <var>status</var> be {{AvailabilityStatus/unavailable}}.
1. Else if speech recognition (which may be remote) is available for every <var>language</var> in <var>langs</var>, let <var>status</var> be {{AvailabilityStatus/available}}.
1. Else, let <var>status</var> be {{AvailabilityStatus/unavailable}}.
1. If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `true`:
<ol type=a>
<li>If <var>langs</var> is an empty sequence, let <var>status</var> be {{AvailabilityStatus/unavailable}}.</li>
<li>Else:
<ol type=i>
<li>Let <var>finalStatus</var> be {{AvailabilityStatus/available}}.</li>
<li>For each <var>language</var> in <var>langs</var>:
<ol>
<li>Let <var>currentLanguageStatus</var> be null.</li>
<li>If on-device speech recognition for <var>language</var> is installed, set <var>currentLanguageStatus</var> to {{AvailabilityStatus/available}}.</li>
<li>Else if on-device speech recognition for <var>language</var> is currently being downloaded, set <var>currentLanguageStatus</var> to {{AvailabilityStatus/downloading}}.</li>
<li>Else if on-device speech recognition for <var>language</var> is supported by the user agent but not yet installed, set <var>currentLanguageStatus</var> to {{AvailabilityStatus/downloadable}}.</li>
<li>Else (on-device speech recognition for <var>language</var> is not supported), set <var>currentLanguageStatus</var> to {{AvailabilityStatus/unavailable}}.</li>
<li>If <var>currentLanguageStatus</var> comes after <var>finalStatus</var> in the ordered list `[{{AvailabilityStatus/available}}, {{AvailabilityStatus/downloading}}, {{AvailabilityStatus/downloadable}}, {{AvailabilityStatus/unavailable}}]`, set <var>finalStatus</var> to <var>currentLanguageStatus</var>.</li>
</ol>
</li>
<li>Let <var>status</var> be <var>finalStatus</var>.</li>
</ol>
</li>
</ol>
1. [=Queue a task=] on the [=relevant global object=]'s [=task queue=] to run the following step:
- Resolve <var>promise</var> with <var>status</var>.
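
Illustrative sketch, not part of the merged diff: the `processLocally` is `true` branch of the availability algorithm reduces the per-language statuses to the worst one in the ordered list. The function name `combineLocalAvailability` and the `statusFor` callback are hypothetical; a real user agent would consult its own language pack state instead.

```javascript
// Ordered from best to worst, mirroring the ordered list in the algorithm.
const STATUS_ORDER = ["available", "downloading", "downloadable", "unavailable"];

/**
 * Hypothetical helper: combines per-language on-device statuses into one.
 * `statusFor` is an assumed callback returning an AvailabilityStatus string
 * for a single language tag.
 */
function combineLocalAvailability(langs, statusFor) {
  // An empty sequence is reported as "unavailable".
  if (langs.length === 0) return "unavailable";
  let finalStatus = "available";
  for (const language of langs) {
    const current = statusFor(language);
    // Keep whichever status comes later (is worse) in the ordered list.
    if (STATUS_ORDER.indexOf(current) > STATUS_ORDER.indexOf(finalStatus)) {
      finalStatus = current;
    }
  }
  return finalStatus;
}

// Example with made-up data: one installed pack, one that is only downloadable.
const packs = new Map([["en-US", "available"], ["fr-FR", "downloadable"]]);
const lookup = (lang) => packs.get(lang) ?? "unavailable";
console.log(combineLocalAvailability(["en-US", "fr-FR"], lookup)); // downloadable
```

The result is `"available"` only when every requested language is installed, which matches the description of that enum value.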

@@ -439,9 +477,13 @@ following steps:
1. If the [=current settings object=]'s [=relevant global object=]'s
[=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}}
and abort these steps.
1. If {{[[started]]}} is `true` and no <a event
for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event
have fired, throw an {{InvalidStateError}} and abort these steps.
1. If {{SpeechRecognition/[[started]]}} is `true` and no <a event
for=SpeechRecognition>error</a> event or <a event for=SpeechRecognition>end</a> event
has fired on it, throw an {{InvalidStateError}} and abort these steps.
1. If this.{{SpeechRecognition/[[processLocally]]}} is `true`:
a. If the user agent determines that local speech recognition is not available for this.{{SpeechRecognition/lang}}, or if it cannot fulfill the local processing requirement for other reasons:
i. [=Queue a task=] to [=fire an event=] named `error` at `this`. The event's `error` attribute <em class="rfc2119" title="MUST">MUST</em> be {{SpeechRecognitionErrorCode/service-not-allowed}}. The event's `message` attribute <em class="rfc2119" title="MUST">MUST</em> provide an implementation-defined string detailing the reason.
ii. Abort these steps.
1. Set {{[[started]]}} to `true`.
1. If |requestMicrophonePermission| is `true` and [=request
permission to use=] "`microphone`" is [=permission/"denied"=], abort