Commit 2a9e568

V1.2 (#8)
v1.2
1 parent 6f39d6f commit 2a9e568

67 files changed (+252 / -178 lines)

README.md

Lines changed: 67 additions & 17 deletions
@@ -2,7 +2,7 @@
 
 Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 
-Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from speech commands within a given context of
+Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a given context of
 interest in real-time. For example, given a speech command "*Can I have a small double-shot espresso with a lot of sugar
 and some milk*" it infers that the user wants to *order a drink* with the following specific requirements.
 
@@ -20,16 +20,15 @@ Rhino is
 
 * intuitive. It allows users to utter their intention in a natural and conversational fashion.
 * using deep neural networks trained in **real-world situations**.
-* compact and computationally-efficient making it suitable for **IoT** applications. It can run with as low as 100 KB of RAM.
-* cross-platform. It is implemented in fixed-point ANSI C. Currently **ARM Cortex-M**, **ARM Cortex-A**,
-**Raspberry Pi**, **Android**, **iOS**, **watchOS**, **Linux**, **Mac**, **Windows**, and **WebAssembly** are supported.
+* compact and computationally-efficient making it suitable for **IoT** applications. It can run with as low as 90 KB of
+RAM on an MCU.
+* cross-platform. It is implemented in fixed-point ANSI C. Currently **Raspberry Pi**, **Beagle Bone** **Android**,
+**iOS**, **Linux**, **Mac**, **Windows**, and **web browsers** (**WebAssembly**) are supported. Additionally support for
+various **ARM Cortex-A**, **ARM Cortex-M** (M4/M7) and **DSP cores** is available for commercial customers.
 * customizable. It can be customized for any given domain.
 
 [![Rhino in Action](https://img.youtube.com/vi/WadKhfLyqTQ/0.jpg)](https://www.youtube.com/watch?v=WadKhfLyqTQ)
 
-NOTE: Currently Raspberry Pi, Android, and Linux builds are available to the open-source community. But we do have plans
-to make other platforms available as well in upcoming releases.
-
 ## Table of Contents
 * [Try It Out](#try-it-out)
 * [Motivation](#motivation)
@@ -43,9 +42,11 @@ to make other platforms available as well in upcoming releases.
 * [Running Demo Applications](#running-demo-applications)
 * [Running Python Demo Application](#running-python-demo-application)
 * [Running C Demo Application](#running-c-demo-application)
+* [Runnning Android Demo Application](#running-android-demo-application)
 * [Integration](#integration)
 * [C](#c)
 * [Python](#python)
+* [Android](#android)
 * [Releases](#releases)
 * [License](#license)
 
@@ -64,17 +65,17 @@ requires significant CPU and memory for an on-device implementation.
 
 Rhino solves this problem by providing a tightly-coupled speech recognition and NLU engine that are jointly optimized
 for a specific domain (use case). Rhino is quite lean and can even run on small embedded processors
-(think ARM Cortex-M or fixed-point DSPs) with very limited RAM (as low as 100 KB) making it ideal for
+(think ARM Cortex-M or fixed-point DSPs) with very limited RAM (as low as 90 KB) making it ideal for
 resource-constrained IoT applications.
 
 ## Metrics
 
-The table shows the average CPU usage on three different platforms (1) Raspberry Pi zero, (2) Raspberry Pi 3, and an
-Ubuntu box (i5-6500 CPU @ 3.20GHz). You can recreate this using the [C demo application](/demo/c).
+The table shows the average CPU usage on two different platforms (1) Raspberry Pi zero and (2) Raspberry Pi 3. You can
+recreate this using the [C demo application](/demo/c).
 
-Raspberry Pi zero | Raspberry Pi 3 | Ubuntu Desktop (i5-6500 CPU @ 3.20GHz)
-:---: | :---: | :---:
-48.7% | 8.9% | 1.2%
+Raspberry Pi zero | Raspberry Pi 3
+:---: | :---:
+46.4% | 7.2%
 
 ## Terminology
 
@@ -93,8 +94,8 @@ of spoken commands:
 
 ### Expression
 
-A context is made of a collection of spoken commands mapped to the user's intent. An expression is an entity that defines a mapping between
-a (or a set of) spoken commands and its (their) corresponding intent. For example
+A context is made of a collection of spoken commands mapped to the user's intent. An expression is an entity that defines
+a mapping between a (or a set of) spoken commands and its (their) corresponding intent. For example
 
 * {turnCommand} the lights. -> {turnIntent}
 * Make the {location} light {intensityChange}. -> {changeIntensityIntent}
@@ -143,7 +144,7 @@ python demo/python/rhino_demo.py \
 --rhino_context_file_path ./resources/contexts/linux/coffee_maker_linux.rhn \
 --porcupine_library_path ./resources/porcupine/lib/linux/x86_64/libpv_porcupine.so \
 --porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey\ pico_linux.ppn
 ```
 
 The following runs the engine on a *Raspberry Pi 3* to infer intent within the context of smart lighting system
@@ -155,14 +156,19 @@ python demo/python/rhino_demo.py \
 --rhino_context_file_path ./resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn \
 --porcupine_library_path ./resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so \
 --porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey\ pico_raspberrypi.ppn
 ```
 
 ### Running C Demo Application
 
 This [demo application](demo/c) is mainly used to show how Rhino can be integrated into an efficient C/C++ application.
 Furthermore it can be used to measure runtime metrics of the engine on various supported platforms.
 
+### Running Android Demo Application
+
+Using Android Studio open [demo/android](/demo/android) as an Android project and then run the application. Note that
+you need an android phone with developer options enabled connected to your machine in order to run the application.
+
 ## Integration
 
 Below are code snippets showcasing how Rhino can be integrated into different applications.
@@ -282,8 +288,52 @@ collector.
 rhino.delete()
 ```
 
+### Android
+
+Rhino provides a binding for Android using JNI. It can be initialized using.
+
+```java
+final String modelFilePath = ... // It is available at lib/common/rhino_params.pv
+final String contextFilePath = ...
+
+Rhino rhino = new Rhino(modelFilePath, contextFilePath);
+```
+
+once initialized `rhino` can be used for intent inference.
+
+
+```java
+private short[] getNextAudioFrame();
+
+while (rhino.process(getNextAudioFrame()));
+
+if (rhino.isUnderstood()) {
+    RhinoIntent intent = rhino.getIntent();
+    // logic to perform an action given the intent object.
+} else {
+    // logic for handling out of context or unrecognized command
+}
+```
+
+when finalized the processing be sure to reset the object before processing a new stream of audio via
+
+```java
+rhino.reset()
+```
+
+finally, prior to exiting the application be sure to release resources acquired via
+
+```java
+rhino.delete()
+```
+
 ## Releases
 
+### v1.2.0 April 26, 2019
+
+* Accuracy improvements.
+* Runtime optimizations.
+
 ### v1.1.0 December 23rd, 2018
 
 * Accuracy improvements.
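The Android snippets added to the README above are written as fragments: `private short[] getNextAudioFrame();` is a method declaration rather than something that can sit next to the loop, and `rhino.reset()` / `rhino.delete()` lack semicolons. The sketch below arranges the same sequence (process frames until finalized, check `isUnderstood`, read the intent, `reset`, and finally `delete`) into Java that compiles on its own. To stay runnable without the native library it is written against a hypothetical `IntentEngine` interface that only mirrors the binding's method names; `IntentEngine`, `FakeEngine`, `IntentLoop`, and the use of `Map<String, String>` in place of `RhinoIntent` are all assumptions of this sketch. With the real binding, each call can additionally throw the checked `RhinoException`.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Hypothetical interface mirroring the method names of the Android binding.
interface IntentEngine {
    boolean process(short[] pcm);     // true once intent extraction is finalized
    boolean isUnderstood();
    Map<String, String> getIntent();  // stands in for RhinoIntent in this sketch
    void reset();
    void delete();
}

// Stand-in engine that "finalizes" after a fixed number of frames.
final class FakeEngine implements IntentEngine {
    private final int framesUntilFinalized;
    private int seen = 0;
    boolean deleted = false;

    FakeEngine(int framesUntilFinalized) {
        this.framesUntilFinalized = framesUntilFinalized;
    }

    public boolean process(short[] pcm) { return ++seen >= framesUntilFinalized; }
    public boolean isUnderstood() { return true; }
    public Map<String, String> getIntent() { return Collections.singletonMap("intent", "orderDrink"); }
    public void reset() { seen = 0; }
    public void delete() { deleted = true; }
}

final class IntentLoop {
    // Feeds frames until the engine finalizes; returns the intent, or null if the
    // command was not understood or the audio ran out before finalization.
    static Map<String, String> inferOnce(IntentEngine engine, Iterator<short[]> frames) {
        boolean finalized = false;
        while (!finalized && frames.hasNext()) {
            finalized = engine.process(frames.next());
        }
        Map<String, String> result = (finalized && engine.isUnderstood()) ? engine.getIntent() : null;
        engine.reset(); // prepare the engine for the next stream of audio
        return result;
    }

    public static void main(String[] args) {
        IntentEngine engine = new FakeEngine(3);
        try {
            short[] frame = new short[512]; // assumed frame length; use frameLength() in practice
            Map<String, String> intent = inferOnce(engine, Collections.nCopies(4, frame).iterator());
            System.out.println(intent != null ? intent : "not understood");
        } finally {
            engine.delete(); // release (native) resources exactly once, even on error
        }
    }
}
```

Putting `delete()` in a `finally` block reflects the binding's note that it does not rely on the garbage collector: the release happens even if inference throws.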

binding/android/rhino/src/main/java/ai/picovoice/rhino/Rhino.java

Lines changed: 44 additions & 35 deletions
@@ -17,19 +17,18 @@
 
 package ai.picovoice.rhino;
 
-import java.util.HashMap;
+import java.util.LinkedHashMap;
 import java.util.Map;
 
 /**
- * Binding for Picovoice's speech-to-intent engine (aka Rhino).
- * The object directly infers intent from speech commands within a given context of interest in
- * real-time. It processes incoming audio in consecutive frames (chunks) and at the end of each
- * frame indicates if the intent extraction is finalized. When finalized, the intent can be
- * retrieved as structured data in form of an intent string and pairs of slots and values
- * representing arguments (details) of intent. The number of samples per frame can be attained by
- * calling {@link #frameLength()}. The incoming audio needs to have a sample rate equal to
- * {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore, Rhino operates on single
- * channel audio.
+ * Binding for Picovoice's speech-to-intent engine (aka Rhino). The object directly infers intent
+ * from speech commands within a given context of interest in real-time. It processes incoming audio
+ * in consecutive frames (chunks) and at the end of each frame indicates if the intent extraction is
+ * finalized. When finalized, the intent can be retrieved as structured data in form of an intent
+ * string and pairs of slots and values representing arguments (details) of intent. The number of
+ * samples per frame can be attained by calling {@link #frameLength()}. The incoming audio needs to
+ * have a sample rate equal to {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore,
+ * Rhino operates on single channel audio.
  */
 public class Rhino {
     static {
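The class Javadoc above says audio is consumed in consecutive frames of `frameLength()` samples of 16-bit, single-channel PCM at `sampleRate()`. The sketch below shows one way to cut a longer recording into such frames before handing each one to `process`. The helper `PcmFramer` is hypothetical (not part of the binding), the values 512 and 16000 in `main` are assumptions standing in for what `frameLength()` and `sampleRate()` would return, and zero-padding the final partial frame is a design choice of this sketch rather than something the binding prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits mono 16-bit PCM into fixed-size frames.
final class PcmFramer {
    private PcmFramer() {}

    // Returns ceil(pcm.length / frameLength) frames. Arrays.copyOfRange pads with
    // zeros when the requested range runs past the end of pcm, so the last frame
    // still has exactly frameLength samples.
    static List<short[]> toFrames(short[] pcm, int frameLength) {
        if (frameLength <= 0) {
            throw new IllegalArgumentException("frameLength must be positive");
        }
        List<short[]> frames = new ArrayList<>();
        for (int start = 0; start < pcm.length; start += frameLength) {
            frames.add(Arrays.copyOfRange(pcm, start, start + frameLength));
        }
        return frames;
    }

    // Duration of one frame in milliseconds.
    static double frameDurationMs(int frameLength, int sampleRate) {
        return 1000.0 * frameLength / sampleRate;
    }

    public static void main(String[] args) {
        int frameLength = 512;   // assumed; query rhino.frameLength() in practice
        int sampleRate = 16000;  // assumed; query rhino.sampleRate() in practice
        short[] oneSecond = new short[sampleRate]; // one second of silence
        List<short[]> frames = toFrames(oneSecond, frameLength);
        System.out.println(frames.size() + " frames of "
                + frameDurationMs(frameLength, sampleRate) + " ms");
    }
}
```

Under those assumed values, one second of audio becomes 32 frames of 32 ms each (16000 / 512 = 31.25, rounded up by the padded last frame).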
@@ -40,10 +39,11 @@ public class Rhino {
 
     /**
      * Constructor.
-     * @param modelFilePath Absolute path to file containing model parameters.
-     * @param contextFilePath Absolute path to file containing context parameters. A context
-     * represents the set of expressions (commands), intents, and intent
-     * arguments (slots) within a domain of interest.
+     *
+     * @param modelFilePath Absolute path to file containing model parameters.
+     * @param contextFilePath Absolute path to file containing context parameters. A context
+     * represents the set of expressions (commands), intents, and intent
+     * arguments (slots) within a domain of interest.
      * @throws RhinoException On failure.
      */
     public Rhino(String modelFilePath, String contextFilePath) throws RhinoException {
@@ -55,7 +55,8 @@ public Rhino(String modelFilePath, String contextFilePath) throws RhinoException
     }
 
     /**
-     * Destructor. This is needs to be called explicitly as we do not rely on garbage collector.
+     * Destructor. This needs to be called explicitly as we do not rely on garbage collector.
+     *
      * @throws RhinoException On failure.
      */
     public void delete() throws RhinoException {
@@ -69,7 +70,8 @@ public void delete() throws RhinoException {
     /**
      * Processes a frame of audio and emits a flag indicating if the engine has finalized intent
      * extraction. When finalized, {@link #isUnderstood()} should be called to check if the command
-     * was valid (is within context of interest).
+     * was valid (is within context of interest) and is understood.
+     *
      * @param pcm A frame of audio samples. The number of samples per frame can be attained by
      * calling {@link #frameLength()}. The incoming audio needs to have a sample rate
      * equal to {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore,
@@ -79,7 +81,7 @@ public void delete() throws RhinoException {
      */
     public boolean process(short[] pcm) throws RhinoException {
         try {
-            return process(object, pcm) == 1;
+            return process(object, pcm);
         } catch (Exception e) {
             throw new RhinoException(e);
         }
@@ -88,13 +90,14 @@ public boolean process(short[] pcm) throws RhinoException {
     /**
      * Indicates if the spoken command is valid, is within the domain of interest (context), and the
      * engine understood it.
+     *
      * @return Flag indicating if the spoken command is valid, is within the domain of interest
      * (context), and the engine understood it.
      * @throws RhinoException On failure.
      */
     public boolean isUnderstood() throws RhinoException {
         try {
-            return isUnderstood(object) == 1;
+            return isUnderstood(object);
         } catch (Exception e) {
             throw new RhinoException(e);
         }
@@ -105,21 +108,22 @@ public boolean isUnderstood() throws RhinoException {
      * string and pairs of slots and their values. It should be called only after intent extraction
      * is finalized and it is verified that the spoken command is valid and understood via calling
      * {@link #isUnderstood()}.
+     *
      * @return Inferred intent object.
      * @throws RhinoException On failure.
      */
     public RhinoIntent getIntent() throws RhinoException {
         final String intentPacked = getIntent(object);
         String[] parts = intentPacked.split(",");
         if (parts.length == 0) {
-            throw new RhinoException(String.format("Failed to retrieve intent from %s", intentPacked));
+            throw new RhinoException(String.format("failed to retrieve intent from %s", intentPacked));
         }
 
-        Map<String, String> slots = new HashMap<>();
+        Map<String, String> slots = new LinkedHashMap<>();
         for (int i = 1; i < parts.length; i++) {
             String[] slotAndValue = parts[i].split(":");
             if (slotAndValue.length != 2) {
-                throw new RhinoException(String.format("Failed to retrieve intent from %s", intentPacked));
+                throw new RhinoException(String.format("failed to retrieve intent from %s", intentPacked));
             }
             slots.put(slotAndValue[0], slotAndValue[1]);
         }
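`getIntent` above unpacks the native string by splitting on `,` for the intent and on `:` for each slot, and this commit swaps `HashMap` for `LinkedHashMap`. A `LinkedHashMap` iterates in insertion order while a `HashMap` makes no ordering guarantee, so slots now come back in the order the native layer emitted them. The standalone sketch below reproduces that parsing outside the binding so it can be exercised without the native library. The packed format `intent,slot:value,...` is inferred from the `split` calls in the diff; the class `PackedIntent` and the coffee-maker slot names are hypothetical. It also rejects an empty intent name, which is stricter than the original check: `String.split` drops trailing empty strings, so `parts.length == 0` only occurs for input consisting solely of commas, while an empty input yields one empty element.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone version of the parsing done in Rhino.getIntent().
// Assumes the packed format "intent,slot1:value1,slot2:value2,...".
final class PackedIntent {
    final String intent;
    final Map<String, String> slots;

    private PackedIntent(String intent, Map<String, String> slots) {
        this.intent = intent;
        this.slots = slots;
    }

    static PackedIntent parse(String packed) {
        String[] parts = packed.split(",");
        // split() drops trailing empty strings, so also guard against an empty intent name.
        if (parts.length == 0 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("failed to retrieve intent from " + packed);
        }
        // LinkedHashMap keeps slots in the order they appear in the packed string.
        Map<String, String> slots = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] slotAndValue = parts[i].split(":");
            if (slotAndValue.length != 2) {
                throw new IllegalArgumentException("failed to retrieve intent from " + packed);
            }
            slots.put(slotAndValue[0], slotAndValue[1]);
        }
        return new PackedIntent(parts[0], slots);
    }

    public static void main(String[] args) {
        // Hypothetical coffee-maker intent; real slot names come from the context file.
        PackedIntent p = parse("orderDrink,size:small,numberOfShots:double,sugarAmount:lots");
        System.out.println(p.intent + " " + p.slots);
    }
}
```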
@@ -130,6 +134,7 @@ public RhinoIntent getIntent() throws RhinoException {
     /**
      * Resets the internal state of the engine. It should be called before the engine can be used to
      * infer intent from a new stream of audio.
+     *
      * @throws RhinoException On failure.
      */
     public void reset() throws RhinoException {
@@ -143,6 +148,7 @@ public void reset() throws RhinoException {
     /**
      * Getter for expressions. Each expression maps a set of spoken phrases to an intent and
      * possibly a number of slots (intent arguments).
+     *
      * @return Expressions.
      * @throws RhinoException On failure.
      */
@@ -154,35 +160,38 @@ public String getContextExpressions() throws RhinoException {
         }
     }
 
-    private native long init(String model_file_path, String context_file_path);
-
-    private native long delete(long object);
-
-    private native int process(long object, short[] pcm);
-
-    private native int isUnderstood(long object);
-
-    private native String getIntent(long object);
-
-    private native boolean reset(long object);
-
-    private native String contextExpressions(long object);
-
     /**
      * Getter for length (number of audio samples) per frame.
+     *
      * @return Frame length.
      */
     public native int frameLength();
 
     /**
      * Audio sample rate accepted by Picovoice.
+     *
      * @return Sample rate.
      */
     public native int sampleRate();
 
     /**
      * Getter for version string.
+     *
      * @return Version string.
      */
     public native String version();
+
+    private native long init(String model_file_path, String context_file_path);
+
+    private native void delete(long object);
+
+    private native boolean process(long object, short[] pcm);
+
+    private native boolean isUnderstood(long object);
+
+    private native String getIntent(long object);
+
+    private native boolean reset(long object);
+
+    private native String contextExpressions(long object);
 }

binding/python/rhino.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 
 
 class Rhino(object):
-    """Python binding for Picovoice's Speech to Intent (a.k.a Rhino) engine."""
+    """Python binding for Picovoice's Speech-to-Intent (a.k.a Rhino) engine."""
 
     class PicovoiceStatuses(Enum):
         """Status codes corresponding to 'pv_status_t' defined in 'include/picovoice.h'"""

binding/python/test_rhino.py

Lines changed: 18 additions & 9 deletions
@@ -88,25 +88,34 @@ def _library_path(cls):
         system = platform.system()
         machine = platform.machine()
 
-        if system == 'Linux':
+        if system == 'Darwin':
+            return cls._abs_path('lib/mac/x86_64/libpv_rhino.dylib')
+        elif system == 'Linux':
             if machine == 'x86_64':
                 return cls._abs_path('lib/linux/x86_64/libpv_rhino.so')
             elif machine.startswith('arm'):
                 return cls._abs_path('lib/raspberry-pi/arm11/libpv_rhino.so')
-
-        raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
+        elif system == 'Windows':
+            return cls._abs_path('lib/windows/amd64/libpv_rhino.dll')
+        else:
+            raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
 
     @classmethod
     def _context_file_path(cls):
         system = platform.system()
         machine = platform.machine()
 
-        if system == 'Linux' and machine == 'x86_64':
-            return cls._abs_path('resources/contexts/linux/coffee_maker_linux.rhn')
-        elif system == 'Linux' and machine.startswith('arm'):
-            return cls._abs_path('resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn')
-
-        raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
+        if system == 'Darwin':
+            return cls._abs_path('resources/contexts/mac/coffee_maker_mac.rhn')
+        elif system == 'Linux':
+            if machine == 'x86_64':
+                return cls._abs_path('resources/contexts/linux/coffee_maker_linux.rhn')
+            elif machine.startswith('arm'):
+                return cls._abs_path('resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn')
+        elif system == 'Windows':
+            return cls._abs_path('resources/contexts/windows/coffee_maker_windows.rhn')
+        else:
+            raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
 
 
 if __name__ == '__main__':
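In the updated `_library_path` above, a Linux machine whose architecture is neither `x86_64` nor an `arm*` string (for example `aarch64`) passes through every branch without returning, so the method returns `None` instead of raising `NotImplementedError`. The same gap exists in `_context_file_path`. The sketch below restates the library-path selection in Java as a pure function that always either returns a path or throws, which makes that case explicit and testable. The class `LibraryPaths` is hypothetical, the relative paths are copied from the diff, and the mapping from the JVM's `os.name`/`os.arch` properties to Python-style names in `main` is an assumption (the JVM reports values such as `Mac OS X` and `amd64` rather than `Darwin` and `x86_64`).

```java
import java.util.Locale;

// Hypothetical Java restatement of the Python test helper's library selection.
final class LibraryPaths {
    private LibraryPaths() {}

    // system/machine follow Python's platform.system()/platform.machine() naming.
    static String libraryPath(String system, String machine) {
        switch (system) {
            case "Darwin":
                return "lib/mac/x86_64/libpv_rhino.dylib";
            case "Linux":
                if (machine.equals("x86_64")) {
                    return "lib/linux/x86_64/libpv_rhino.so";
                }
                if (machine.startsWith("arm")) {
                    return "lib/raspberry-pi/arm11/libpv_rhino.so";
                }
                break; // unsupported Linux architecture: fall through to the error below
            case "Windows":
                return "lib/windows/amd64/libpv_rhino.dll";
            default:
                break;
        }
        throw new UnsupportedOperationException(
                String.format("Rhino is not supported on %s/%s yet!", system, machine));
    }

    public static void main(String[] args) {
        // Assumed mapping from JVM properties to Python-style platform names.
        String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
        String system = os.startsWith("mac") ? "Darwin"
                : os.startsWith("windows") ? "Windows"
                : os.startsWith("linux") ? "Linux" : os;
        String arch = System.getProperty("os.arch");
        String machine = arch.equals("amd64") ? "x86_64" : arch;
        try {
            System.out.println(libraryPath(system, machine));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```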
