This REST API is built on top of Mozilla's DeepSpeech and is based on examples provided by Mozilla. It accepts HTTP methods such as GET and POST as well as WebSocket connections. Transcription over HTTP is appropriate for relatively short audio files, while the WebSocket endpoint can also handle longer recordings.
The instructions below are for Unix/OS X; they will need to be adapted to run the code on Windows.
- Clone the repository to your local machine and change directory to
deepspeech-rest-api
$ git clone https://github.com/fabricekwizera/deepspeech-rest-api.git
$ cd deepspeech-rest-api
- Create a virtual environment, activate it (assuming it is installed on your machine), and install the project in editable mode (locally).
$ python -m venv venv
$ source venv/bin/activate
$ python -m pip install -U pip==21.0.0 wheel
$ python -m pip install --editable .
- Download the model and the scorer. For the English model and scorer, use the links below:
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm \
-O deepspeech_model.pbmm
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer \
-O deepspeech_model.scorer
For the Mandarin Chinese model and scorer:
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models-zh-CN.pbmm \
-O deepspeech_model.pbmm
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models-zh-CN.scorer \
-O deepspeech_model.scorer
For other languages, place the two files in the current working directory under the names deepspeech_model.pbmm for the model and deepspeech_model.scorer for the scorer.
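Optionally, you can check that the downloaded files load correctly. Below is a minimal sketch using the deepspeech Python package (assuming it is installed in your virtual environment) and one of the sample WAV files from the repository's audio folder:
# Sanity check: load the downloaded model and scorer and transcribe
# one of the sample WAV files shipped in the audio folder.
import wave
import numpy as np
from deepspeech import Model

model = Model('deepspeech_model.pbmm')
model.enableExternalScorer('deepspeech_model.scorer')

with wave.open('audio/8455-210777-0068.wav', 'rb') as wav:
    frames = wav.readframes(wav.getnframes())

# DeepSpeech expects 16-bit, 16 kHz mono PCM samples.
print(model.stt(np.frombuffer(frames, dtype=np.int16)))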
- Migrations are done using Alembic
You will need a database set up. To use a basic database with the same connection string as the one currently configured in the .env file, run
$ docker-compose up -d
within the postgres_docker folder to create a database.
To create the database tables and user, run
$ alembic upgrade head
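Before running the server, you can optionally verify that the database is reachable. This is a hypothetical sketch: it assumes python-dotenv and SQLAlchemy are available and that .env holds a SQLAlchemy-style connection string under a key such as DATABASE_URL (the actual key name in this project may differ).
# Hypothetical connectivity check. Assumes .env defines something like
# DATABASE_URL=postgresql://user:password@localhost:5432/dbname
# (the real key name and value in this project may differ).
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine, text

load_dotenv()
engine = create_engine(os.environ['DATABASE_URL'])
with engine.connect() as conn:
    print(conn.execute(text('SELECT 1')).scalar())  # prints 1 if the DB is reachable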
- Running the server
$ python run.py
- Register a new user and request a JWT token to access the API
$ curl -X POST \
http://0.0.0.0:8000/users \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"email": "[email protected]",
"password": "yourpassword"
}'
API response:
{
"message": "User forrestgump is successfully created."
}
To generate a JWT token to access the API:
$ curl -X POST \
http://0.0.0.0:8000/token \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"password": "yourpassword"
}'
If both steps are done correctly, you should get a token in the format below:
{
"access_token": "JWT_token"
}
With this JWT_token, you have access to the different endpoints of the API.
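For convenience, the same flow can be scripted. Below is a minimal sketch with the requests library, mirroring the curl examples above (the credentials are the placeholder values shown there):
# Registration and token flow in Python, mirroring the curl examples.
import requests

base_url = 'http://0.0.0.0:8000'
credentials = {'username': 'forrestgump', 'password': 'yourpassword'}

# Register a new user (only needed once).
requests.post(base_url + '/users',
              json={**credentials, 'email': '[email protected]'})

# Request a JWT token and build the Authorization header for later calls.
token = requests.post(base_url + '/token', json=credentials).json()['access_token']
headers = {'Authorization': 'Bearer ' + token}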
Change directory to audio and use the WAV files provided for testing.
Note the usage of hot-words and their boosts in the request.
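For context, hot-word boosting is a DeepSpeech 0.9 feature exposed through Model.addHotWord. How this API wires the form fields to that call internally is an assumption, but a minimal sketch of the underlying mechanism looks like this:
from deepspeech import Model

model = Model('deepspeech_model.pbmm')
model.enableExternalScorer('deepspeech_model.scorer')  # a scorer is required for hot-words
model.addHotWord('power', 1000.0)     # positive boost: more likely to be transcribed
model.addHotWord('paris', -1000.0)    # negative boost: suppressed
model.addHotWord('parents', -1000.0)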
- STT the HTTP way
cURL
$ curl -X POST \
http://0.0.0.0:8000/api/v1/stt/http \
-H 'Authorization: Bearer JWT_token' \
-F '[email protected]' \
-F 'paris=-1000' \
-F 'power=1000' \
-F 'parents=-1000'
Python
import requests

jwt_token = 'JWT_token'
headers = {'Authorization': 'Bearer ' + jwt_token}
url = 'http://0.0.0.0:8000/api/v1/stt/http'
# Hot-words and their boosts: a positive boost makes a word more likely
# in the transcript, a negative boost suppresses it.
hot_words = {'paris': -1000, 'power': 1000, 'parents': -1000}
audio_filename = 'audio/8455-210777-0068.wav'
# Send the WAV file as multipart form data along with the hot-words.
with open(audio_filename, 'rb') as f:
    audio = [('audio', f)]
    response = requests.post(url, data=hot_words, files=audio, headers=headers)
print(response.json())
- STT the WebSocket way (simple test)
curl does not support WebSockets. To take advantage of this feature, you will have to write a web app that sends requests to the endpoint /api/v1/stt/ws.
The command below can be used to check that the WebSocket endpoint is running.
$ python client_audio_file_stt.py
In both cases (HTTP and WebSocket), you should get a result in the format below.
{
"message": "experience proves this",
"time": 1.4718825020026998
}
The command below can be used to stream speech over the WebSocket endpoint api/v1/mic. Here as well, a web app will need to implement something similar to (or far better than) the code below.
$ python client_audio_file_stt.py
Now you can stream speech to your server and see the result in the client's shell. An implementation of VAD (Voice Activity Detection) will be released soon.
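For reference, here is a minimal sketch of such a WebSocket client written with the websockets package. The wire format (raw WAV bytes in, one JSON message out) is my assumption; consult client_audio_file_stt.py in the repository for the actual protocol.
# Hypothetical WebSocket client for /api/v1/stt/ws. The wire format is
# an assumption; see client_audio_file_stt.py for the working client.
import asyncio
import websockets

async def transcribe(filename, jwt_token):
    uri = 'ws://0.0.0.0:8000/api/v1/stt/ws'
    headers = {'Authorization': 'Bearer ' + jwt_token}
    # extra_headers is the parameter name in websockets' classic client.
    async with websockets.connect(uri, extra_headers=headers) as ws:
        with open(filename, 'rb') as f:
            await ws.send(f.read())   # send the whole file in one message
        print(await ws.recv())        # JSON result from the server

asyncio.run(transcribe('audio/8455-210777-0068.wav', 'JWT_token'))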