
Commit e8a36c6

evanbliu, evliu-google and padenot authored
Update spec with latest on-device changes (#157)
* Update availableOnDevice and installOnDevice parameters
* Update availableOnDevice and installOnDevice parameters
* Update availableOnDevice and installOnDevice parameters
* Update availableOnDevice and installOnDevice parameters
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>
* Update index.bs Co-authored-by: Paul Adenot <[email protected]>

---------

Co-authored-by: Evan Liu <[email protected]>
Co-authored-by: Paul Adenot <[email protected]>
1 parent ff12a0d commit e8a36c6


1 file changed, +88 -46 lines changed


index.bs

Lines changed: 88 additions & 46 deletions
```diff
@@ -118,6 +118,8 @@ This does not preclude adding support for this as a future API enhancement, and
 </li>
 
 <li>The user agent may also give the user a longer explanation the first time speech input is used, to let the user know what it is and how they can tune their privacy settings to disable speech recording if required.</li>
+
+<li>To mitigate the risk of fingerprinting, user agents MUST NOT personalize speech recognition when performing speech recognition on a {{MediaStreamTrack}}.</li>
 </ol>
 
 <h3 id="implementation-considerations">Implementation considerations</h3>
```
```diff
@@ -141,8 +143,8 @@ This does not preclude adding support for this as a future API enhancement, and
 <h3 id="speechreco-section">The SpeechRecognition Interface</h3>
 
 <p>The speech recognition interface is the scripted web API for controlling a given recognition.</p>
-The term "final result" indicates a SpeechRecognitionResult in which the final attribute is true.
-The term "interim result" indicates a SpeechRecognitionResult in which the final attribute is false.
+The term "final result" indicates a {{SpeechRecognitionResult}} in which the {{SpeechRecognitionResult/isFinal}} attribute is true.
+The term "interim result" indicates a {{SpeechRecognitionResult}} in which the {{SpeechRecognitionResult/isFinal}} attribute is false.
 
 {{SpeechRecognition}} has the following internal slots:
 
```
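The isFinal distinction defined above is what page script usually branches on when it handles results. Below is a minimal TypeScript sketch, not part of the spec. `AlternativeLike`, `ResultLike`, `ResultListLike` and `splitTranscripts` are hypothetical names, and the structural types stand in for the browser's result objects so the code can run outside a browser. The function reads the top alternative (`[0].transcript`) of each result and routes it by `isFinal`.

```typescript
// Structural stand-ins (assumed for illustration) for the alternative,
// result and result-list objects a page receives in a "result" event.
interface AlternativeLike {
  readonly transcript: string;
  readonly confidence: number;
}

interface ResultLike {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: AlternativeLike;
}

interface ResultListLike {
  readonly length: number;
  readonly [index: number]: ResultLike;
}

// Concatenates the top alternative of every result, keeping final results
// (isFinal === true) separate from interim ones (isFinal === false).
function splitTranscripts(results: ResultListLike): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (result.length === 0) {
      continue; // no alternatives to read
    }
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText, interimText };
}
```

In a page, the `results` list of a `result` event could be passed to `splitTranscripts`, with `interimText` shown as provisional text.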
```diff
@@ -153,9 +155,9 @@ The term "interim result" indicates a SpeechRecognitionResult in which the final
 </dl>
 
 <dl dfn-type=attribute dfn-for="SpeechRecognition">
-: <dfn>[[mode]]</dfn>
+: <dfn>[[processLocally]]</dfn>
 ::
-A {{SpeechRecognitionMode}} enum to determine where speech recognition takes place. The initial value is <code>ondevice-preferred</code>.
+A boolean flag indicating whether recognition <em class="rfc2119" title="MUST">MUST</em> be performed locally. The initial value is <code>false</code>.
 </dl>
 
 <dl dfn-type=attribute dfn-for="SpeechRecognition">
```
```diff
@@ -174,16 +176,16 @@ interface SpeechRecognition : EventTarget {
 attribute boolean continuous;
 attribute boolean interimResults;
 attribute unsigned long maxAlternatives;
-attribute SpeechRecognitionMode mode;
+attribute boolean processLocally;
 attribute SpeechRecognitionPhraseList phrases;
 
 // methods to drive the speech interaction
 undefined start();
 undefined start(MediaStreamTrack audioTrack);
 undefined stop();
 undefined abort();
-static Promise<AvailabilityStatus> availableOnDevice(DOMString lang);
-static Promise<boolean> installOnDevice(DOMString lang);
+static Promise<AvailabilityStatus> available(SpeechRecognitionOptions options);
+static Promise<boolean> install(SpeechRecognitionOptions options);
 
 // event methods
 attribute EventHandler onaudiostart;
```
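With `availableOnDevice(lang)` and `installOnDevice(lang)` replaced by `available(options)` and `install(options)`, a page that needs on-device recognition would typically check availability first and call `install()` only when the status is `"downloadable"`. This is a minimal sketch of that flow. It makes two assumptions: the types are hand-written from the new WebIDL, since TypeScript's DOM typings may not include them, and a hypothetical `RecognitionStatics` interface stands in for the static side of `SpeechRecognition` so the flow can run against a stub.

```typescript
// Hand-written types for the WebIDL in this commit (assumed not to be built in).
type AvailabilityStatus = "unavailable" | "downloadable" | "downloading" | "available";

interface SpeechRecognitionOptions {
  langs: string[];
  processLocally?: boolean; // WebIDL default: false
}

// Hypothetical stand-in for the static methods of SpeechRecognition.
interface RecognitionStatics {
  available(options: SpeechRecognitionOptions): Promise<AvailabilityStatus>;
  install(options: SpeechRecognitionOptions): Promise<boolean>;
}

// Resolves to true when on-device recognition is ready for every language in
// `langs`, installing language packs first if they are only downloadable.
async function ensureOnDevice(api: RecognitionStatics, langs: string[]): Promise<boolean> {
  const options: SpeechRecognitionOptions = { langs, processLocally: true };
  const status = await api.available(options);
  if (status === "available") {
    return true;
  }
  if (status === "downloadable") {
    // install() resolves to false on any failure; the cause is not exposed to script.
    return api.install(options);
  }
  // "downloading": a download is already in progress, so report "not ready yet".
  // "unavailable": on-device recognition is unsupported for at least one language.
  return false;
}
```

Treating `"downloading"` as "not ready" is a design choice of this sketch. A page might instead poll `available()` again later.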
```diff
@@ -199,6 +201,11 @@ interface SpeechRecognition : EventTarget {
 attribute EventHandler onend;
 };
 
+dictionary SpeechRecognitionOptions {
+required sequence<DOMString> langs;
+boolean processLocally = false;
+};
+
 enum SpeechRecognitionErrorCode {
 "no-speech",
 "aborted",
```
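The new dictionary makes `langs` required and gives `processLocally` a default of `false`. As a result, `SpeechRecognition.available({ langs: ["en-US"] })` asks about recognition that may be local or remote. The sketch below is a simplified approximation of how a WebIDL binding converts such an argument, and `resolveOptions` and its types are hypothetical names. Two simplifications apply: real `sequence<>` conversion accepts any iterable, not just arrays, and string conversion has edge cases this ignores.

```typescript
// What a caller might pass; members are `unknown` because conversion happens here.
interface RawSpeechRecognitionOptions {
  langs?: unknown;
  processLocally?: unknown;
}

// Converted result, mirroring:
//   dictionary SpeechRecognitionOptions {
//     required sequence<DOMString> langs;
//     boolean processLocally = false;
//   };
interface ResolvedSpeechRecognitionOptions {
  langs: string[];
  processLocally: boolean;
}

// Hypothetical helper: a missing required member is a TypeError, and an absent
// member with a declared default takes that default.
function resolveOptions(raw: RawSpeechRecognitionOptions): ResolvedSpeechRecognitionOptions {
  if (raw.langs === undefined) {
    throw new TypeError("SpeechRecognitionOptions: required member langs is missing");
  }
  if (!Array.isArray(raw.langs)) {
    // Simplification: arrays only, to keep the sketch short.
    throw new TypeError("SpeechRecognitionOptions: langs must be a sequence");
  }
  // Coerce each entry to a string (simplified DOMString conversion).
  const langs = raw.langs.map((lang) => String(lang));
  // Absent -> declared default false; otherwise boolean conversion.
  const processLocally = raw.processLocally === undefined ? false : Boolean(raw.processLocally);
  return { langs, processLocally };
}
```

An empty `langs` passes this conversion. The `available()` and `install()` algorithms handle that case themselves: `available()` resolves to `"unavailable"`, and the `install()` prose says it resolves to `false`.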
```diff
@@ -210,12 +217,6 @@ enum SpeechRecognitionErrorCode {
 "phrases-not-supported"
 };
 
-enum SpeechRecognitionMode {
-"ondevice-preferred", // On-device speech recognition if available, otherwise use Cloud speech recognition as a fallback.
-"ondevice-only", // On-device speech recognition only. Returns an error if on-device speech recognition is not available.
-"cloud-only", // Cloud speech recognition only.
-};
-
 enum AvailabilityStatus {
 "unavailable",
 "downloadable",
```
```diff
@@ -314,20 +315,10 @@ interface SpeechRecognitionPhraseList {
 <dd>This attribute will set the maximum number of {{SpeechRecognitionAlternative}}s per result.
 The default value is 1.</dd>
 
-<dt><dfn attribute for=SpeechRecognition>mode</dfn> attribute</dt>
-<dd>
-This attribute represents where speech recognition takes place.
-</dd>
-<dd>
-The getter steps are to return the value of {{SpeechRecognition/[[mode]]}}.
-</dd>
-<dd>
-The setter steps are:
-1. If the {{SpeechRecognitionPhraseList/length}} of {{SpeechRecognition/phrases}} is greater than 0
-and the system using the given value for {{SpeechRecognition/[[mode]]}} does not support contextual biasing,
-throw a {{SpeechRecognitionErrorEvent}} with the {{SpeechRecognitionErrorCode/phrases-not-supported}}
-error code and abort these steps.
-1. Set {{SpeechRecognition/[[mode]]}} to the given value.
+<dt><dfn attribute for=SpeechRecognition>processLocally</dfn> attribute</dt>
+<dd>This attribute, when set to true, indicates a requirement that the speech recognition process <em class="rfc2119" title="MUST">MUST</em> be performed locally on the user's device.
+If set to false, the user agent can choose between local and remote processing.
+The default value is false.
 </dd>
 
 <dt><dfn attribute for=SpeechRecognition>phrases</dfn> attribute</dt>
```
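Code written against the removed `mode` attribute has to be mapped onto the new boolean. Going only by the definitions in this commit, the mapping is as follows:

* `"ondevice-only"` corresponds to `processLocally = true`.
* `processLocally = false` (the user agent may choose local or remote) is the closest setting to `"ondevice-preferred"`, but it no longer expresses a preference for local processing.
* Nothing corresponds to `"cloud-only"`.

This is a hypothetical lookup table and helper, `migrateMode`, that capture that reading. They are not part of the spec.

```typescript
// The enum removed in this commit, kept only as an input type.
type SpeechRecognitionMode = "ondevice-preferred" | "ondevice-only" | "cloud-only";

// Hypothetical migration table: old mode -> new processLocally value.
// null marks a mode the new attribute cannot express.
const MODE_TO_PROCESS_LOCALLY: Readonly<Record<SpeechRecognitionMode, boolean | null>> = {
  "ondevice-only": true,       // local processing becomes a hard requirement
  "ondevice-preferred": false, // UA may pick local or remote; no stated preference
  "cloud-only": null,          // no equivalent: false still allows local processing
};

// Returns the processLocally value to set, or throws for modes with no equivalent.
function migrateMode(mode: SpeechRecognitionMode): boolean {
  const value = MODE_TO_PROCESS_LOCALLY[mode];
  if (value === null) {
    throw new Error(`mode "${mode}" has no processLocally equivalent`);
  }
  return value;
}
```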
```diff
@@ -389,46 +380,93 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072
 The user agent must raise an <a event for=SpeechRecognition>end</a> event once the speech service is no longer connected.
 If the abort method is called on an object which is already stopped or aborting (that is, start was never called on it, the <a event for=SpeechRecognition>end</a> or <a event for=SpeechRecognition>error</a> event has fired on it, or abort was previously called on it), the user agent must ignore the call.</dd>
 
-<dt><dfn method for=SpeechRecognition>availableOnDevice({{DOMString}} lang)</dfn> method</dt>
+<dt><dfn method for=SpeechRecognition>available({{SpeechRecognitionOptions}} options)</dfn> method</dt>
 <dd>
-The {{SpeechRecognition/availableOnDevice}} method returns a {{Promise}} that resolves to a {{AvailabilityStatus}} indicating the on-device speech recognition availability for a given [[!BCP47]] language tag.
+The {{SpeechRecognition/available}} method returns a {{Promise}} that resolves to a {{AvailabilityStatus}} indicating the recognition availability matching the {{SpeechRecognitionOptions}} argument.
 
 When invoked, run these steps:
 1. Let <var>promise</var> be <a>a new promise</a>.
-1. Run the <a>on-device availability algorithm</a> with <var>lang</var> and <var>promise</var>. If it returns an exception, throw it and abort these steps.
+1. Run the <a>availability algorithm</a> with <var>options</var> and <var>promise</var>. If it returns an exception, throw it and abort these steps.
 1. Return <var>promise</var>.
 </dd>
 
-<dt><dfn method for=SpeechRecognition>installOnDevice({{DOMString}} lang)</dfn> method</dt>
+<dt><dfn method for=SpeechRecognition>install({{SpeechRecognitionOptions}} options)</dfn> method</dt>
 <dd>
-The {{SpeechRecognition/installOnDevice}} method returns a {{Promise}} that resolves to a {{boolean}} when and whether the installation of on-device speech recognition for a given [[!BCP47]] language tag succeeded.
+The {{SpeechRecognition/install}} method attempts to install speech recognition language packs for all languages specified in `options.langs`.
+It returns a {{Promise}} that resolves to a {{boolean}}.
+The promise resolves to `true` when all installation attempts for requested and supported languages succeed (or the languages were already installed).
+The promise resolves to `false` if `options.langs` is empty, if not all of the requested languages are supported, or if any installation attempt for a supported language fails.
 
 When invoked, run these steps:
 1. If the [=current settings object=]'s [=relevant global object=]'s [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}} and abort these steps.
-1. If <var>lang</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
-1. If the on-device speech recognition language pack for <var>lang</var> is unsupported, return a resolved {{Promise}} with false and skip the rest of these steps.
+1. If any <var>lang</var> in {{SpeechRecognitionOptions/langs}} of <var>options</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
+1. If the on-device speech recognition language pack for any <var>lang</var> in {{SpeechRecognitionOptions/langs}} of <var>options</var> is unsupported, return a resolved {{Promise}} with false and skip the rest of these steps.
 1. Let <var>promise</var> be <a>a new promise</a>.
-1. Initiate the download of the on-device speech recognition language for <var>lang</var>.
+1. For each <var>lang</var> in {{SpeechRecognitionOptions/langs}} of <var>options</var>, initiate the download of the on-device speech recognition language for <var>lang</var>.
 <p class=note>
 Note: The user agent can prompt the user for explicit permission to download the on-device speech recognition language pack.
 </p>
 1. [=Queue a task=] on the [=relevant global object=]'s [=task queue=] to run the following step:
-- If the download succeeds, resolve <var>promise</var> with <code>true</code>, otherwise resolve it with <code>false</code>.
+- When the download of all languages specified by {{SpeechRecognitionOptions/langs}} of <var>options</var> succeeds, resolve <var>promise</var> with <code>true</code>, otherwise resolve it with <code>false</code>.
 <p class="note">
 Note: The <code>false</code> resolution of the Promise does not indicate the specific cause of failure. User agents are encouraged to provide more detailed information about the failure in developer tools console messages. However, this detailed error information is not exposed to the script.
 </p>
 1. Return <var>promise</var>.
+<p class=note>
+{{SpeechRecognitionOptions/processLocally}} of <var>options</var> is not used in this algorithm.
+</p>
 </dd>
 
 </dl>
-<p>When the <dfn>on-device availability algorithm</dfn> with <var>lang</var> and <var>promise</var> is invoked, the user agent MUST run the following steps:
+
+<h4 id="availability-status-values">AvailabilityStatus Enum Values</h4>
+<p>The {{AvailabilityStatus}} enum indicates the availability of speech recognition capabilities. Its values are:</p>
+<dl>
+<dt><dfn enum-value for="AvailabilityStatus">"unavailable"</dfn></dt>
+<dd>Indicates that speech recognition is not available for the specified language(s) and processing preference.
+If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `true`, this means on-device recognition for the language is not supported by the user agent.
+If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `false`, it means neither local nor remote recognition is available for at least one of the specified languages.</dd>
+
+<dt><dfn enum-value for="AvailabilityStatus">"downloadable"</dfn></dt>
+<dd>Indicates that on-device speech recognition for the specified language(s) is supported by the user agent but not yet installed. It can potentially be installed using the {{SpeechRecognition/install()}} method. This status is primarily relevant when {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is true.</dd>
+
+<dt><dfn enum-value for="AvailabilityStatus">"downloading"</dfn></dt>
+<dd>Indicates that on-device speech recognition for the specified language(s) is currently in the process of being downloaded. This status is primarily relevant when {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is true.</dd>
+
+<dt><dfn enum-value for="AvailabilityStatus">"available"</dfn></dt>
+<dd>Indicates that speech recognition is available for all specified language(s) and the given processing preference.
+If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is true, this means on-device recognition is installed and ready.
+If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is false, it means recognition (which could be local or remote) is available.</dd>
+</dl>
+
+<p>When the <dfn>availability algorithm</dfn> with <var>options</var> and <var>promise</var> is invoked, the user agent MUST run the following steps:
 1. If the [=current settings object=]'s [=relevant global object=]'s [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}} and abort these steps.
-1. If <var>lang</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
-1. Determine the availability status for <var>lang</var>:
-- If the on-device speech recognition language pack for <var>lang</var> is unsupported, let <var>status</var> be {{AvailabilityStatus/unavailable}}.
-- Else if the on-device speech recognition language pack for <var>lang</var> is supported but not installed, let <var>status</var> be {{AvailabilityStatus/downloadable}}.
-- Else if the on-device speech recognition language pack for <var>lang</var> is downloading, let <var>status</var> be {{AvailabilityStatus/downloading}}.
-- Else if the on-device speech recognition language pack for <var>lang</var> is installed, let <var>status</var> be {{AvailabilityStatus/available}}.
+1. Let <var>langs</var> be {{SpeechRecognitionOptions/langs}} of <var>options</var>.
+1. If any <var>lang</var> in <var>langs</var> is not a valid [[!BCP47]] language tag, throw a {{SyntaxError}} and abort these steps.
+1. If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `false`:
+1. If <var>langs</var> is an empty sequence, let <var>status</var> be {{AvailabilityStatus/unavailable}}.
+1. Else if speech recognition (which may be remote) is available for all <var>language</var> in <var>langs</var>, let <var>status</var> be {{AvailabilityStatus/available}}.
+1. Else, let <var>status</var> be {{AvailabilityStatus/unavailable}}.
+1. If {{SpeechRecognitionOptions/processLocally}} of <var>options</var> is `true`:
+<ol type=a>
+<li>If <var>langs</var> is an empty sequence, let <var>status</var> be {{AvailabilityStatus/unavailable}}.</li>
+<li>Else:
+<ol type=i>
+<li>Let <var>finalStatus</var> be {{AvailabilityStatus/available}}.</li>
+<li>For each <var>language</var> in <var>langs</var>:
+<ol>
+<li>Let <var>currentLanguageStatus</var>.</li>
+<li>If on-device speech recognition for <var>language</var> is installed, set <var>currentLanguageStatus</var> to {{AvailabilityStatus/available}}.</li>
+<li>Else if on-device speech recognition for <var>language</var> is currently being downloaded, set <var>currentLanguageStatus</var> to {{AvailabilityStatus/downloading}}.</li>
+<li>Else if on-device speech recognition for <var>language</var> is supported by the user agent but not yet installed, set <var>currentLanguageStatus</var> to {{AvailabilityStatus/downloadable}}.</li>
+<li>Else (on-device speech recognition for <var>language</var> is not supported), set <var>currentLanguageStatus</var> to {{AvailabilityStatus/unavailable}}.</li>
+<li>If <var>currentLanguageStatus</var> comes after <var>finalStatus</var> in the ordered list `[{{AvailabilityStatus/available}}, {{AvailabilityStatus/downloading}}, {{AvailabilityStatus/downloadable}}, {{AvailabilityStatus/unavailable}}]`, set <var>finalStatus</var> to <var>currentLanguageStatus</var>.</li>
+</ol>
+</li>
+<li>Let <var>status</var> be <var>finalStatus</var>.</li>
+</ol>
+</li>
+</ol>
 1. [=Queue a task=] on the [=relevant global object=]'s [=task queue=] to run the following step:
 - Resolve <var>promise</var> with <var>status</var>.
 
```
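This hunk introduces two multi-language rules that are easy to misread:

* With `processLocally` set to `true`, `available()` reports the worst per-language status in the order `available`, `downloading`, `downloadable`, `unavailable`.
* `install()` resolves to `true` only if every requested language is supported and installs (or was already installed).

Below is a TypeScript sketch of both rules. It assumes hypothetical callbacks (`perLanguage`, `isSupported`, `downloadSucceeded`) in place of the user agent's language-pack state. It computes results only and does not model task queuing or the downloads themselves.

```typescript
type AvailabilityStatus = "unavailable" | "downloadable" | "downloading" | "available";

// Ordered list from the availability algorithm; a later entry is a worse status.
const STATUS_ORDER: readonly AvailabilityStatus[] = [
  "available",
  "downloading",
  "downloadable",
  "unavailable",
];

// processLocally === true branch: start from "available" and keep the worst
// per-language status. `perLanguage` is a hypothetical hook for the UA's
// knowledge of each on-device language pack.
function localAvailability(
  langs: readonly string[],
  perLanguage: (lang: string) => AvailabilityStatus,
): AvailabilityStatus {
  if (langs.length === 0) {
    return "unavailable";
  }
  let finalStatus: AvailabilityStatus = "available";
  for (const lang of langs) {
    const current = perLanguage(lang);
    if (STATUS_ORDER.indexOf(current) > STATUS_ORDER.indexOf(finalStatus)) {
      finalStatus = current;
    }
  }
  return finalStatus;
}

// Final value of install()'s promise per its prose description. Both callbacks
// are hypothetical; downloadSucceeded should also return true for a language
// that was already installed.
function installResult(
  langs: readonly string[],
  isSupported: (lang: string) => boolean,
  downloadSucceeded: (lang: string) => boolean,
): boolean {
  if (langs.length === 0) {
    return false;
  }
  if (!langs.every((lang) => isSupported(lang))) {
    return false;
  }
  return langs.every((lang) => downloadSucceeded(lang));
}
```

The `install()` prose says an empty `langs` resolves to `false`, but the numbered `install()` steps in this hunk do not spell that case out. Read literally, they would arguably resolve to `true`, since no downloads fail. The sketch follows the prose.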
```diff
@@ -439,9 +477,13 @@ following steps:
 1. If the [=current settings object=]'s [=relevant global object=]'s
 [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}}
 and abort these steps.
-1. If {{[[started]]}} is `true` and no <a event
-for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event
-have fired, throw an {{InvalidStateError}} and abort these steps.
+1. If {{SpeechRecognition/[[started]]}} is `true` and no <a event
+for=SpeechRecognition>error</a> event or <a event for=SpeechRecognition>end</a> event
+has fired on it, throw an {{InvalidStateError}} and abort these steps.
+1. If this.{{SpeechRecognition/[[processLocally]]}} is `true`:
+a. If the user agent determines that local speech recognition is not available for this.{{SpeechRecognition/lang}}, or if it cannot fulfill the local processing requirement for other reasons:
+i. [=Queue a task=] to [=fire an event=] named `error` at `this`. The event's `error` attribute <em class="rfc2119" title="MUST">MUST</em> be {{SpeechRecognitionErrorCode/service-not-allowed}}. The event's `message` attribute <em class="rfc2119" title="MUST">MUST</em> provide an implementation-defined string detailing the reason.
+ii. Abort these steps.
 1. Set {{[[started]]}} to `true`.
 1. If |requestMicrophonePermission| is `true` and [=request
 permission to use=] "`microphone`" is [=permission/"denied"=], abort
```
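The new `start()` step means a recognizer with `processLocally` set to `true` does not fall back to remote processing. If local recognition is not available for `lang`, an `error` event with `service-not-allowed` is queued instead. The check runs before `[[started]]` is set, so a failed local check leaves `[[started]]` false. Below is a sketch of these early steps as a pure function. `StartState`, `StartOutcome` and `checkStart` are hypothetical names, and each boolean stands for a condition the user agent would evaluate.

```typescript
// Possible results of the early start() steps as revised in this commit.
type StartOutcome =
  | { kind: "throw"; name: "InvalidStateError" }
  | { kind: "error-event"; error: "service-not-allowed" }
  | { kind: "proceed" };

// Hypothetical snapshot of the conditions those steps check.
interface StartState {
  documentFullyActive: boolean;
  started: boolean;                   // [[started]]
  errorOrEndFired: boolean;           // an error or end event has fired on the object
  processLocally: boolean;            // [[processLocally]]
  localRecognitionAvailable: boolean; // UA's determination for this.lang
}

function checkStart(state: StartState): StartOutcome {
  if (!state.documentFullyActive) {
    return { kind: "throw", name: "InvalidStateError" };
  }
  if (state.started && !state.errorOrEndFired) {
    return { kind: "throw", name: "InvalidStateError" };
  }
  if (state.processLocally && !state.localRecognitionAvailable) {
    // Not thrown: the spec queues a task to fire an error event, then aborts
    // before [[started]] is set.
    return { kind: "error-event", error: "service-not-allowed" };
  }
  // The real algorithm continues: set [[started]] to true, request the microphone, and so on.
  return { kind: "proceed" };
}
```

In page script, the corresponding check is an `error` listener that compares `event.error` with `"service-not-allowed"`. Per the step above, the accompanying `message` is implementation-defined.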
