# SEPIA Speech-To-Text Server

SEPIA Speech-To-Text (STT) Server is a WebSocket-based, full-duplex Python server for realtime automatic speech recognition (ASR) that supports multiple open-source ASR engines.
It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results.

One goal of this project is to offer a **standardized, secure, realtime interface** for all the great open-source ASR tools out there.
The server works on all major platforms, including single-board devices like the Raspberry Pi 4.

NOTE: This is a complete **rewrite** (2021) of the original STT Server (2018). The code of the old version has been moved to the [LEGACY SERVER](legacy-server) folder.
If you are using custom models built for the 2018 version you can easily [convert them to new models](https://github.com/fquirin/kaldi-adapt-lm/blob/master/4a-build-vosk-model.sh) (please ask for details via the issues section).

<p align="center">
  <img src="screenshots/stt-recorder-demo.png" alt="SEPIA STT Recorder Demo"/>
</p>

## Features

* WebSocket server (Python FastAPI) that can **receive audio streams and send transcribed text at the same time**
* Modular architecture to **support multiple ASR engines** like Vosk (reference implementation), Coqui, DeepSpeech, Scribosermo, ...
* Optional **post-processing** of results (e.g. via [text2num](https://github.com/allo-media/text2num) and custom modules)
* **Standardized API for all engines** and support for individual engine features (speaker identification, grammar, confidence scores, word timestamps, alternative results, etc.)
* **On-the-fly server and engine configuration** via the HTTP REST API and the WebSocket 'welcome' event (including custom grammar, if supported by engine and model)
* **User authentication** via a simple common token or individual tokens for multiple users
* Docker containers with **support for all major platform architectures**: x86 64Bit (amd64), ARM 32Bit (armv7l) and ARM 64Bit (aarch64)
* Fast enough to **run in realtime even on a Raspberry Pi 4 (2GB)** (depending on engine and model configuration)
* Compatible with the [SEPIA Framework client](https://github.com/SEPIA-Framework/sepia-html-client-app) (v0.24+)
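
The optional post-processing step runs on the transcription before it is returned to the client. As a rough, illustrative sketch only (the function and mapping below are made up for this example and are not the server's real module interface), such a module could normalize spoken number words, similar in spirit to what text2num does:

```python
# Illustrative post-processing sketch: replace simple number words with digits.
# NOTE: this is NOT the server's actual module API; the names here are
# invented for demonstration only.

WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def post_process(transcript: str) -> str:
    """Map standalone number words in a final transcript to digits."""
    return " ".join(WORD_TO_DIGIT.get(word, word) for word in transcript.split())

print(post_process("call me at nine five five"))  # → call me at 9 5 5
```

The real server delegates this kind of normalization to text2num and custom modules, as listed above.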
## Integrated ASR Engines

- [Vosk](https://github.com/alphacep/vosk-api) - Status: Ready. Includes tiny EN and DE models.
- [Coqui](https://github.com/coqui-ai/STT) - Status: Planned.
- [Scribosermo](https://gitlab.com/Jaco-Assistant/Scribosermo) - Status: Help wanted.
- [TensorFlowASR](https://github.com/TensorSpeech/TensorFlowASR) - Status: Help wanted.
- If you want to see additional engines please create a new [issue](https://github.com/SEPIA-Framework/sepia-stt-server/issues). Pull requests are welcome ;-)

## Quick-Start

The easiest way to get started is to use a Docker container for your platform:
- x86 64Bit systems (desktop PCs, Linux servers etc.): `docker pull sepia/stt-server:v2_amd64_beta`
- ARM 32Bit (Raspberry Pi 4 32Bit OS): `docker pull sepia/stt-server:v2_armv7l_beta`
- ARM 64Bit (RPi 4 64Bit, Jetson Nano(?)): `docker pull sepia/stt-server:v2_aarch64_beta`

After the download is complete simply start the container, for example via:
```bash
sudo docker run --name=sepia-stt -p 20741:20741 -it sepia/stt-server:[platform-tag]
```

To test the server visit `http://localhost:20741` if you are on the same machine, or `http://[server-IP]:20741` if you are in the same network (NOTE: custom recordings via microphone will only work via localhost or an HTTPS URL!).

## Server Settings

Most of the settings can be handled easily via the [server.conf settings file](src/server.conf). Please check out the file to see what's possible.

ENV variables:
- `SEPIA_STT_SETTINGS`: Overwrites the default path to the settings file

Commandline options:
- Use `python -m launch -h` to see all commandline options
- Use `python -m launch -s [path-to-file]` to use a custom settings file

NOTE: Commandline options always overrule the settings file, but in most scenarios it makes sense to simply create a new settings file and use the `-s` flag.

## ASR Engine Settings

As soon as the server is running you can check the current setup via the HTTP REST interface at `http://localhost:20741/settings` or the test page (see quick-start above).

Individual settings for the active engine can be changed on-the-fly during the WebSocket 'welcome' event. See the [API docs](API.md) for more info or check out the 'Engine Settings' section of the test page.
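
As a purely hypothetical sketch of such a configuration message (the field names below are assumptions for illustration only; the actual message schema is defined in the [API docs](API.md)):

```python
import json

# Hypothetical 'welcome' message with engine options. All field names and
# values here are illustrative assumptions; consult API.md for the real schema.
welcome_msg = {
    "type": "welcome",
    "token": "test1234",       # auth token (example value)
    "samplerate": 16000,       # sample rate of the audio stream
    "engineOptions": {
        "language": "en-US",
        "words": True,         # e.g. request word timestamps
        "alternatives": 1,
    },
}

# A client would serialize this to JSON and send it as the first message
# after the WebSocket connection is established.
payload = json.dumps(welcome_msg)
print(json.loads(payload)["type"])  # → welcome
```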

## How to use with SEPIA Client

The [SEPIA Client](https://github.com/SEPIA-Framework/sepia-html-client-app) will support the new STT server out-of-the-box from version 0.24.0 on.
Simply open the client's settings, look for 'ASR engine (STT)' and select `SEPIA`. The server address will be set automatically relative to your SEPIA Server host.
If your SEPIA server proxy has not been updated yet to forward requests to the SEPIA STT-Server, you can enter the direct URL via the STT settings page, e.g. `http://localhost:20741` or `http://localhost:20726/sepia/stt`.
The settings will also allow you to select a specific ASR model for each client language (if you don't want to use the language defaults set by your STT server config).

NOTE: Keep in mind that the client's microphone will [only work in a secure environment](https://github.com/SEPIA-Framework/sepia-docs/wiki/SSL-for-your-Server) (that is localhost or HTTPS),
and thus the link to your server must be secure as well (e.g. use a real domain with an SSL certificate, a self-signed SSL certificate or a proxy running on localhost).

## Develop your own client

See the separate [API docs](API.md) or check out the [Javascript client class](src/www/audio-modules/shared/sepia-stt-socket-client.js) and the [test page](src/www/test-page.html) source code.
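
For a feel of the streaming side: a client typically captures 16-bit mono PCM audio and sends it in small binary chunks over the WebSocket. A minimal sketch of the chunking logic (the chunk size is an arbitrary example value, not a server requirement):

```python
import struct

# Sketch: split 16-bit mono PCM samples into fixed-size binary chunks for
# streaming. CHUNK_SAMPLES is an arbitrary example value.
CHUNK_SAMPLES = 2048

def chunk_pcm(samples):
    """Yield raw little-endian int16 byte chunks from a list of samples."""
    for i in range(0, len(samples), CHUNK_SAMPLES):
        frame = samples[i:i + CHUNK_SAMPLES]
        yield struct.pack("<%dh" % len(frame), *frame)

# One second of silence at 16 kHz -> 8 chunks (the last one shorter)
chunks = list(chunk_pcm([0] * 16000))
print(len(chunks))  # → 8
```

Each yielded byte string would then be sent as one binary WebSocket frame.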
Demo clients:
- Server test page(s): `http://localhost:20741` (with microphone) or `http://[server-IP]:20741` (no microphone due to "insecure" origin)
- [SEPIA Client app](https://sepia-framework.github.io/app/) (v0.24+, simply skip the login, go to settings and enter your server URL)

## Adapt ASR models

Open-source ASR has improved a lot in the last few years, but sometimes it makes sense to adapt a model to your own, specific use-case and vocabulary to improve accuracy.
The language model adaptation process will be integrated into the server in the near future. Until then please check out the following links:

- Language model adaptation made easy with [kaldi-adapt-lm](https://github.com/fquirin/kaldi-adapt-lm)