
Commit 71714d7

Merge pull request #4 from SEPIA-Framework/dev
Move 2021 rework of STT server to master; v0.9.5
2 parents 66c183b + ae79d9b commit 71714d7


71 files changed (+8232 / -97 lines changed)

.gitignore

Lines changed: 40 additions & 2 deletions
```diff
@@ -20,6 +20,7 @@ parts/
 sdist/
 var/
 wheels/
+share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
@@ -38,14 +39,17 @@ pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
+.nox/
 .coverage
 .coverage.*
 .cache
 nosetests.xml
 coverage.xml
 *.cover
+*.py,cover
 .hypothesis/
 .pytest_cache/
+cover/
 
 # Translations
 *.mo
@@ -55,6 +59,7 @@ coverage.xml
 *.log
 local_settings.py
 db.sqlite3
+db.sqlite3-journal
 
 # Flask stuff:
 instance/
@@ -67,16 +72,34 @@ instance/
 docs/_build/
 
 # PyBuilder
+.pybuilder/
 target/
 
 # Jupyter Notebook
 .ipynb_checkpoints
 
+# IPython
+profile_default/
+ipython_config.py
+
 # pyenv
-.python-version
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
 
-# celery beat schedule file
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
 celerybeat-schedule
+celerybeat.pid
 
 # SageMath parsed files
 *.sage.py
@@ -102,3 +125,18 @@ venv.bak/
 
 # mypy
 .mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PROJECT
+recordings/
+models/
```

API.md

Lines changed: 18 additions & 0 deletions
```diff
@@ -0,0 +1,18 @@
+# SEPIA Speech-To-Text Server API
+
+This document describes the API to communicated with SEPIA Speech-To-Text (STT) Server.
+
+[UNDER CONSTRUCTION: Please create an issue to push me and update this :-p]
+In the meantime follow the discussion: https://github.com/SEPIA-Framework/sepia-docs/discussions/112
+
+## Client connection and 'welcome' event
+
+TBD
+
+## Sending chunks of audio
+
+TBD
+
+## Transcription Results
+
+TBD
```

LICENSE

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2017 Nexmo Community, 2018 Florian Quirin (bytemind.de)
+Copyright (c) 2021 Florian Quirin (bytemind.de) for SEPIA Framework
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
```

README.md

Lines changed: 72 additions & 93 deletions
````diff
@@ -1,113 +1,92 @@
 # SEPIA Speech-To-Text Server
+
+SEPIA Speech-To-Text (STT) Server is a WebSocket based, full-duplex Python server for realtime automatic speech recognition (ASR) supporting multiple open-source ASR engines.
+It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results.
 
-[BETA - UNDER CONSTRUCTION]
+One goal of this project is to offer a **standardized, secure, realtime interface** for all the great open-source ASR tools out there.
+The server works on all major platforms including single-board devices like Raspberry Pi (4).
 
-This server supports streaming audio over a WebSocket connection with integration of an open-source ASR decoder like the Kaldi speech recognition toolkit. It can handle full-duplex messaging during the decoding process for intermediate results. The REST interface of the server allows to switch the ASR model on-the-fly.
+NOTE: This is a complete **rewrite** (2021) of the original STT Server (2018). Code of the old version has been moved to the [LEGACY SERVER](legacy-server) folder.
+If you are using custom models built for the 2018 version you can easily [convert them to new models](https://github.com/fquirin/kaldi-adapt-lm/blob/master/4a-build-vosk-model.sh) (please ask for details via the issues section).
+
+<p align="center">
+<img src="screenshots/stt-recorder-demo.png" alt="SEPIA STT Recorder Demo"/>
+</p>
 
 ## Features
-* Websocket server (Python Tornado) that can receive (and send) audio streams
-* Compatible to [SEPIA Framework client](https://github.com/SEPIA-Framework/sepia-html-client-app)
-* Integration of [Zamia Speech](https://github.com/gooofy/zamia-speech) (python-kaldiasr) to use Kalid ASR in Python
-* Roughly based on [nexmo-community/audiosocket_framework](https://github.com/nexmo-community/audiosocket_framework)
 
-## Using the Docker image
+* WebSocket server (Python Fast-API) that can **receive audio streams and send transcribed text at the same time**
+* Modular architecture to **support multiple ASR engines** like Vosk (reference implementation), Coqui, Deepspeech, Scribosermo, ...
+* Optional **post processing** of result (e.g. via [text2num](https://github.com/allo-media/text2num) and custom modules)
+* **Standardized API for all engines** and support for individual engine features (speaker identification, grammar, confidence score, word timestamps, alternative results, etc.)
+* **On-the-fly server and engine configuration** via HTTP REST API and WebSocket 'welcome' event (including custom grammar, if supported by engine and model)
+* **User authentication** via simple common token or individual tokens for multiple users
+* Docker containers with **support for all major platform architectures**: x86 64Bit (amd64), ARM 32Bit (armv7l) and ARM 64Bit (aarch64)
+* Fast enough to **run even on Raspberry Pi 4 (2GB) in realtime** (depending on engine and model configuration)
+* Compatible to [SEPIA Framework client](https://github.com/SEPIA-Framework/sepia-html-client-app) (v0.24+)
 
-Make sure you have Docker installed then pull the image via the command-line:
-```bash
-docker pull sepia/stt-server:beta2.1
-```
-Once the image has finished downloading (~700MB, extracted ~2GB) you can run it using:
-```bash
-docker run --rm --name=sepia_stt -d -p 9000:8080 sepia/stt-server:beta2.1
-```
-This will start the STT server (with internal proxy running on port 8080 with path '/stt') and expose it to port 9000 (choose whatever you need here).
-To test if the server is working you can call the settings interface with:
-```bash
-curl http://localhost:9000/stt/settings && echo
-```
-You should see a JSON response indicating the ASR model and server version.
-To stop the server use:
-```bash
-docker stop sepia_stt
-```
-To change the server settings, add your own ASR models, do language model customization or to capture your recordings for later you can use the internal 'share' folder like this:
-```bash
-wget -O share-folder.zip https://github.com/SEPIA-Framework/sepia-stt-server/blob/master/share-folder.zip?raw=true
-unzip share-folder.zip -d /home/[my user]/sepia-stt-share/
-docker run --rm --name=sepia_stt -d -p 9000:8080 -v /home/[my user]/sepia-stt-share:/apps/share sepia/stt-server:beta2.1
-```
-where `/home/[my user]/sepia-stt-share` is just an example for any folder you would like to use (e.g. in Windows it could be C:/sepia/stt-share).
-When setup like this the server will load it's configuration from the app.conf in your shared folder.
-
-For SEPIA app/client settings see below.
+## Integrated ASR Engines
+
+- [Vosk](https://github.com/alphacep/vosk-api) - Status: Ready. Includes tiny EN and DE models.
+- [Coqui](https://github.com/coqui-ai/STT) - Status: Planned.
+- [Scribosermo](https://gitlab.com/Jaco-Assistant/Scribosermo) - Status: Help wanted.
+- [TensorFlowASR](https://github.com/TensorSpeech/TensorFlowASR) - Status: Help wanted.
+- If you want to see additional engines please create a new [issue](https://github.com/SEPIA-Framework/sepia-stt-server/issues). Pull requests are welcome ;-)
 
-## Custom installation (tested on Debian9 64bit)
+## Quick-Start
 
-### Requirements
-Make sure you have at least Python 2.7 with pip (e.g.: sudo apt-get install python-pip) installed. You may also need header files for Python and OpenSSL depending on your operating system.
-If you are good to go install a few dependencies via pip:
-```bash
-pip install tornado webrtcvad numpy
+The easiest way to get started is to use a Docker container for your platform:
+- x86 64Bit Systeme (Desktop PCs, Linux server etc.): `docker pull sepia/stt-server:v2_amd64_beta`
+- ARM 32Bit (Raspberry Pi 4 32Bit OS): `docker pull sepia/stt-server:v2_armv7l_beta`
+- ARM 64Bit (RPi 4 64Bit, Jetson Nano(?)): `docker pull sepia/stt-server:v2_aarch64_beta`
+
+After the download is complete simply start the container, for example via:
 ```
-Then get the Python Kaldi bindings from [Zamia Speech](https://github.com/gooofy/zamia-speech) (Debian9 64bit example, see link for details):
-```bash
-echo "deb http://goofy.zamia.org/repo-ai/debian/stretch/amd64/ ./" >/etc/apt/sources.list.d/zamia-ai.list
-wget -qO - http://goofy.zamia.org/repo-ai/debian/stretch/amd64/bofh.asc | sudo apt-key add -
-apt-get update
-apt-get install python-kaldiasr
+sudo docker run --name=sepia-stt -p 20741:20741 -it sepia/stt-server:[platform-tag]
 ```
-Download one (or more) of their great ASR models too! I recommend 'kaldi-generic-en-tdnn_sp'.
 
-### Install STT server and run
-```bash
-git clone https://github.com/SEPIA-Framework/sepia-stt-server.git
-cd sepia-stt-server
-python sepia_stt_server.py
-```
-You can check if the server is reachable by calling `http://localhost:20741/ping`
+To test the server visit: `http://localhost:20741` if you are on the same machine or `http://[server-IP]:20741` if you are in the same network (NOTE: custom recordings via microphone will only work using localhost or a HTTPS URL!).
 
-### Configuration
-The application reads its configuration on start-up from the app.conf file that can be located in several different locations (checked in this order):
-* Home folder of the user: `~/share/sepia_stt_server/app.conf`
-* App folder: `/apps/share/sepia_stt_server/app.conf`
-* Base folder of the server app: `./app.conf`
-
-The most important settings are:
-* port: Port of the server, default is 20741. You can use `ngrok http 20741` to tunnel to the SEPIA STT-Server for testing
-* recordings_path: This is where the framework application will store audio files it records, default is "./recordings/"
-* kaldi_model_path: This is where the ASR models for Kaldi are stored, default is "/opt/kaldi/model/kaldi-generic-en-tdnn_sp" as used by Zamia Speech
+## Server Settings
 
-## How to set-up the SEPIA client
-Open your client (or e.g. the [official public client](https://sepia-framework.github.io/app/index.html)), go to settings and look for 'ASR server' (page 2). If you are using the Docker image (see above) your entry should look something like this:
-* `ws://127.0.0.1:9000/stt/socket` (when running Docker on same machine and used the example command to start the image)
-* `wss://secure.example.com/stt/socket` (when using a secure server and proxy)
+Most of the settings can be handled easily via the [server.conf settings file](src/server.conf). Please check out the file to see whats possible.
 
-After you've set the correct server check the 'ASR engine' selector. If your browser supports the 'MediaDevices' interface you will be able to select 'Custom (WebSocket)' here.
-
-Some browsers might require a secure HTTPS connection. If you don't have your [own secure web-server](https://github.com/SEPIA-Framework/sepia-docs/wiki/SSL-for-your-Server) you can use tools like [Ngrok](https://ngrok.com/docs) for testing, e.g.:
-```bash
-./ngrok http 9000
-```
-Choose the right port depending on your app.conf and your Docker run command (in case you are using the Docker image) and then set your 'ASR server' like this:
-* `wss://[MY-NGROK-ADDRESS].nkrok.io/socket` (if you run the server directly) or
-* `wss://[MY-NGROK-ADDRESS].nkrok.io/stt/socket` (if you're using the Docker image).
+ENV variables:
+- `SEPIA_STT_SETTINGS`: Overwrites default path to settings file
+
+Commandline options:
+- Use `python -m launch -h` to see all commandline options
+- Use `python -m launch -s [path-to-file]` to use custom settings
+
+NOTE: Commandline options always overrule the settings file but in most scenarios it makes sense to simply create a new settings file and use the `-s` flag.
+
+## ASR Engine Settings
+
+As soon as the server is running you can check the current setup via the HTTP REST interface: `http://localhost:20741//settings` or the test page (see quick-start above).
 
-Finally test the speech recognition in your client via the microphone button :-)
+Individual settings for the active engine can be changed on-the-fly during the WebSocket 'welcome' event. See the [API docs](API.md) file for more info or check out the 'Engine Settings' section of the test page.
 
-## REST Interface
-The configuration can be changed while the server is running.
+## How to use with SEPIA Client
+
+The [SEPIA Client](https://github.com/SEPIA-Framework/sepia-html-client-app) will support the new STT server out-of-the-box from version 0.24.0 on.
+Simply open the client's settings, look for 'ASR engine (STT)' and select `SEPIA`. The server address will be set automatically relative to your SEPIA Server host.
+If your SEPIA server proxy has not been updated yet to forward requests to the SEPIA STT-Server you can enter the direct URL via the STT settings page, e.g.: `http://localhost:20741` or `http://localhost:20726/sepia/stt`.
+The settings will allow you to select a specific ASR model for each client language as well (if you don't want to use the language defaults set by your STT server config).
 
-Get the current configuration via HTTP GET to (custom server):
-```
-curl -X GET http://localhost:20741/settings
-```
-Note: Replace localhost by your server or localhost:port with the web-server/proxy/Ngrok address. When you are using the Docker image your server is using a proxy! Add: '/stt/settings' to the path like in the client setup.
+NOTE: Keep in mind that the client's microphone will [only work in a secure environment](https://github.com/SEPIA-Framework/sepia-docs/wiki/SSL-for-your-Server) (that is localhost or HTTPS)
+and thus the link to your server must be secure as well (e.g. use a real domain and SSL certificate, self-signed SSL or a proxy running on localhost).
+
+## Develop your own client
+
+See the separate [API docs](API.md) file or check out the [Javascript client class](src/www/audio-modules/shared/sepia-stt-socket-client.js) and the [test page](src/www/test-page.html) source-code.
 
-Set a different Kaldi model via HTTP POST, e.g.:
-```
-curl -X POST http://localhost:20741/settings \
--H 'Content-Type: application/json' \
--d '{"token":"test", "kaldi_model":"/home/user/share/kaldi_models/my-own-model"}'
-```
-(Note: token=test is a placeholder for future authentication process)
+Demo clients:
+- Server test page(s): `http://localhost:20741` (with microphone) or `http://[server-IP]:20741` (no microphone due to "insecure" origin)
+- [SEPIA Client app](https://sepia-framework.github.io/app/) (v0.24+, simply skip the login, go to settings and enter your server URL)
+
+## Adapt ASR models
+
+Open-source ASR has improved a lot in the last years but sometimes it makes sense to adapt the models to your own, specific use-case and vocabulary to improve accuracy.
+The language model adaptation process will be integrated into the server in the near future. Until then please check out the following links:
 
+- Language model adaptation made easy with [kaldi-adapt-lm](https://github.com/fquirin/kaldi-adapt-lm)
````
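The updated README points to the HTTP REST interface for checking the running setup (the added line writes `http://localhost:20741//settings`; a single slash is assumed to be intended). A minimal standard-library Python sketch of that check follows. It assumes the default port 20741, that `/settings` answers a plain GET, and that the body is JSON (the removed 2018 README described a JSON response; the 2021 text does not say). `settings_url` and `fetch_settings` are hypothetical helper names, not part of the server.

```python
import json
import urllib.request

# Default port of the 2021 server, as given in the README (assumption: unchanged in server.conf).
DEFAULT_BASE_URL = "http://localhost:20741"


def settings_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Join the base URL and the settings path without producing a double slash."""
    return base_url.rstrip("/") + "/settings"


def fetch_settings(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> dict:
    """GET the settings endpoint and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(settings_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running container, `fetch_settings()` should return the current server configuration. Passing a proxied base URL such as `http://localhost:20726/sepia/stt` assumes the SEPIA proxy forwards `/settings` as well.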

engines/vosk/Dockerfile

Lines changed: 82 additions & 0 deletions
```diff
@@ -0,0 +1,82 @@
+FROM debian:buster-slim
+
+# Default to UTF-8 file.encoding
+ENV LANG C.UTF-8
+
+# Run 1
+RUN echo 'Installing dependencies...' && \
+#
+# Dependencies
+apt-get update && \
+apt-get install -y --no-install-recommends \
+sudo git wget curl nano unzip zip procps \
+build-essential \
+python3-pip python3-dev python3-setuptools python3-wheel \
+libffi-dev && \
+#
+# Vosk and Fast-API
+pip3 install cffi && \
+pip3 install fastapi uvicorn[standard] aiofiles && \
+#
+# Clean up
+apt-get remove -y build-essential && \
+apt-get install libatomic1 && \
+apt-get clean && apt-get autoclean && apt-get autoremove -y && \
+#
+# Create user
+useradd --create-home --shell /bin/bash admin && \
+adduser admin sudo && \
+echo "admin ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
+#
+# ENV
+#SOME_ENV_VAR=/...my-stuff
+
+# USER
+USER admin
+
+# Run 1
+RUN echo "Installing Vosk ..." && \
+mkdir -p /home/admin/install && \
+mkdir -p /home/admin/sepia-stt/models && \
+cd /home/admin/install && \
+#pip3 install cffi && \
+#pip3 install fastapi uvicorn[standard] aiofiles && \
+if [ -n "$(uname -m | grep aarch64)" ]; then \
+echo "Downloading Vosk 0.3.30 for aarch64"; \
+wget https://github.com/alphacep/vosk-api/releases/download/0.3.30/vosk-0.3.30-py3-none-linux_aarch64.whl; \
+pip3 install vosk-0.3.30-py3-none-linux_aarch64.whl; \
+elif [ -n "$(uname -m | grep armv7l)" ]; then \
+echo "Downloading Vosk 0.3.30 for armv7l"; \
+wget https://github.com/alphacep/vosk-api/releases/download/0.3.30/vosk-0.3.30-py3-none-linux_armv7l.whl; \
+pip3 install vosk-0.3.30-py3-none-linux_armv7l.whl; \
+else \
+echo "Downloading Vosk 0.3.30 for x86_64"; \
+wget https://github.com/alphacep/vosk-api/releases/download/0.3.30/vosk-0.3.30-py3-none-linux_x86_64.whl; \
+pip3 install vosk-0.3.30-py3-none-linux_x86_64.whl; \
+fi && \
+wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip && \
+wget https://alphacephei.com/vosk/models/vosk-model-small-de-0.15.zip && \
+wget https://alphacephei.com/vosk/models/vosk-model-spk-0.4.zip && \
+unzip vosk-model-small-en-us-0.15.zip && \
+mv vosk-model-small-en-us-0.15 /home/admin/sepia-stt/models/vosk-model-small-en-us && \
+unzip vosk-model-small-de-0.15.zip && \
+mv vosk-model-small-de-0.15 /home/admin/sepia-stt/models/vosk-model-small-de && \
+unzip vosk-model-spk-0.4.zip && \
+mv vosk-model-spk-0.4 /home/admin/sepia-stt/models/vosk-model-spk && \
+#
+echo "Installing SEPIA STT ..." && \
+SEPIA_STT_BRANCH=dev && \
+git clone --single-branch --depth 1 -b $SEPIA_STT_BRANCH https://github.com/SEPIA-Framework/sepia-stt-server.git && \
+mv sepia-stt-server/src /home/admin/sepia-stt/server && \
+#
+# Clean up install folder
+cd /home/admin && \
+sudo rm -rf /home/admin/install && \
+#
+# TODO: install proxy with self-signed certs?
+#
+echo "#!/bin/bash" > on-docker.sh && echo "cd sepia-stt/server && python3 -m launch" >> on-docker.sh
+
+# Start
+WORKDIR /home/admin
+CMD bash on-docker.sh
```
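The Dockerfile installs the Vosk 0.3.30 wheel for the build architecture and unpacks a small English model, a small German model and the speaker model under `/home/admin/sepia-stt/models`. The sketch below illustrates the chunk-by-chunk "partial and final results" pattern from the README using Vosk's own Python API (`Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result`, `PartialResult`, `FinalResult`) directly. It is not the SEPIA server's code and not its WebSocket protocol, which API.md still leaves as TBD. Assumptions: mono 16-bit PCM WAV input, a 16 kHz sample rate and 4000 frames per chunk; `iter_pcm_chunks`, `transcribe_chunks` and `open_recognizer` are hypothetical names.

```python
import io
import json
import wave
from typing import Callable, Iterable, Iterator, List, Optional, Union

# Model folder created by the Dockerfile (path inside the container).
MODEL_PATH = "/home/admin/sepia-stt/models/vosk-model-small-en-us"


def iter_pcm_chunks(wav_source: Union[str, bytes], frames_per_chunk: int = 4000) -> Iterator[bytes]:
    """Yield raw PCM chunks from a mono 16-bit WAV given as a file path or as bytes."""
    source = io.BytesIO(wav_source) if isinstance(wav_source, (bytes, bytearray)) else wav_source
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def transcribe_chunks(
    recognizer,
    chunks: Iterable[bytes],
    on_partial: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Feed chunks to a Vosk-style recognizer and collect the final texts in order."""
    texts: List[str] = []
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # True: an utterance endpoint was detected, Result() holds its final text.
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                texts.append(text)
        elif on_partial is not None:
            # Still inside an utterance: PartialResult() holds the current hypothesis.
            on_partial(json.loads(recognizer.PartialResult()).get("partial", ""))
    # Flush whatever audio was buffered after the last detected endpoint.
    final = json.loads(recognizer.FinalResult()).get("text", "")
    if final:
        texts.append(final)
    return texts


def open_recognizer(model_path: str = MODEL_PATH, sample_rate: int = 16000):
    """Create a Vosk recognizer; needs the vosk wheel installed by the Dockerfile."""
    # Imported lazily so the helpers above work without vosk installed.
    from vosk import KaldiRecognizer, Model

    return KaldiRecognizer(Model(model_path), sample_rate)
```

Inside the container, `transcribe_chunks(open_recognizer(), iter_pcm_chunks("sample.wav"), on_partial=print)` would transcribe a 16 kHz mono WAV file (`sample.wav` is a placeholder). The sample rate passed to the recognizer should match the file's frame rate.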

engines/vosk/build_container.sh

Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+#!/bin/bash
+# TODO: make version number variable
+if [ -n "$(uname -m | grep aarch64)" ]; then
+echo "Building Vosk Docker container for aarch64"
+sudo docker build -t sepia/stt-server:vosk_aarch64 .
+elif [ -n "$(uname -m | grep armv7l)" ]; then
+echo "Building Vosk Docker container for armv7l"
+sudo docker build -t sepia/stt-server:vosk_armv7l .
+else
+# NOTE: x86 32bit build not supported atm
+echo "Building Vosk Docker container for amd64"
+sudo docker build -t sepia/stt-server:vosk_amd64 .
+fi
```
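`build_container.sh` (like the Vosk step in the Dockerfile) branches on `uname -m` with substring matches. The same selection can be sketched in Python, e.g. for a helper script that has to pick the right image. It maps the machine string to the `vosk_*` tags built by `build_container.sh` and to the prebuilt `v2_*_beta` tags listed in the README quick-start. The function names are hypothetical. Like the script, anything that is neither `aarch64` nor `armv7l` falls through to `amd64`, including 32-bit x86 (which the script notes is unsupported) and macOS on Apple silicon, where `platform.machine()` reports `arm64`.

```python
import platform
from typing import Optional


def arch_suffix(machine: Optional[str] = None) -> str:
    """Map `uname -m` style output to the architecture suffix used in the image tags.

    Mirrors build_container.sh: substring checks for aarch64 and armv7l,
    everything else falls through to amd64.
    """
    m = machine if machine is not None else platform.machine()
    if "aarch64" in m:
        return "aarch64"
    if "armv7l" in m:
        return "armv7l"
    return "amd64"


def build_tag(machine: Optional[str] = None) -> str:
    """Tag that build_container.sh passes to `docker build -t` on this machine."""
    return f"sepia/stt-server:vosk_{arch_suffix(machine)}"


def pull_tag(machine: Optional[str] = None) -> str:
    """Prebuilt beta image tag from the README quick-start for this machine."""
    return f"sepia/stt-server:v2_{arch_suffix(machine)}_beta"
```

For example, `pull_tag()` on a Raspberry Pi 4 with a 32-bit OS yields the `v2_armv7l_beta` tag used in the quick-start's `docker pull` line.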
