This REST API is built on top of Mozilla's DeepSpeech and is based on examples provided by Mozilla. It accepts HTTP methods such as GET and POST as well as WebSocket connections. Transcription over HTTP is appropriate for relatively short audio files, while the WebSocket endpoint can also handle longer recordings.
The instructions below are for Unix/OS X; they will need to be adapted to run the code on Windows.
- Clone the repository to your local machine and change directory to
deepspeech-rest-api
$ git clone https://github.com/fabricekwizera/deepspeech-rest-api.git
$ cd deepspeech-rest-api
- Create a virtual environment, activate it (assuming it is installed on your machine), and install the project in editable mode (locally).
$ python -m venv venv
$ source venv/bin/activate
$ python -m pip install -U pip==21.0.0 wheel
$ python -m pip install --editable .
- Download the model and the scorer. For the English model and scorer, use the links below:
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm \
-O deepspeech_model.pbmm
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer \
-O deepspeech_model.scorer
For the Mandarin Chinese model and scorer:
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models-zh-CN.pbmm \
-O deepspeech_model.pbmm
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models-zh-CN.scorer \
-O deepspeech_model.scorer
For other languages, place the two files in the current working directory under the names deepspeech_model.pbmm for the model and deepspeech_model.scorer for the scorer.
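Optionally, you can check that the downloaded files load correctly. Below is a minimal sketch using the deepspeech Python package (assuming it is installed in your virtual environment) and one of the sample WAV files from the repository's audio folder:
# Sanity check: load the downloaded model and scorer and transcribe
# one of the sample WAV files shipped in the audio folder.
import wave
import numpy as np
from deepspeech import Model

model = Model('deepspeech_model.pbmm')
model.enableExternalScorer('deepspeech_model.scorer')

with wave.open('audio/8455-210777-0068.wav', 'rb') as wav:
    frames = wav.readframes(wav.getnframes())

# DeepSpeech expects 16-bit, 16 kHz mono PCM samples.
print(model.stt(np.frombuffer(frames, dtype=np.int16)))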
- Migrations are done using Alembic
You will need a database set up. To use a basic database with the same connection string as the one currently configured in the .env file, run
$ docker-compose up -d
within the postgres_docker folder to create a database.
To create the database tables and user, run
$ alembic upgrade head
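Before running the server, you can optionally verify that the database is reachable. This is a hypothetical sketch: it assumes python-dotenv and SQLAlchemy are available and that .env holds a SQLAlchemy-style connection string under a key such as DATABASE_URL (the actual key name in this project may differ).
# Hypothetical connectivity check. Assumes .env defines something like
# DATABASE_URL=postgresql://user:password@localhost:5432/dbname
# (the real key name and value in this project may differ).
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine, text

load_dotenv()
engine = create_engine(os.environ['DATABASE_URL'])
with engine.connect() as conn:
    print(conn.execute(text('SELECT 1')).scalar())  # prints 1 if the DB is reachable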
- Running the server
$ python run.py
- Register a new user and request a JWT token to access the API
$ curl -X POST \
http://0.0.0.0:8000/users \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"email": "[email protected]",
"password": "yourpassword"
}'
API response:
{
"message": "User forrestgump is successfully created."
}
To generate a JWT token to access the API:
$ curl -X POST \
http://0.0.0.0:8000/token \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"password": "yourpassword"
}'
If both steps are done correctly, you should get a token in the format below:
{
"access_token": "JWT_token"
}
With this JWT_token, you have access to the different endpoints of the API.
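For convenience, the same flow can be scripted. Below is a minimal sketch with the requests library, mirroring the curl examples above (the credentials are the placeholder values shown there):
# Registration and token flow in Python, mirroring the curl examples.
import requests

base_url = 'http://0.0.0.0:8000'
credentials = {'username': 'forrestgump', 'password': 'yourpassword'}

# Register a new user (only needed once).
requests.post(base_url + '/users',
              json={**credentials, 'email': '[email protected]'})

# Request a JWT token and build the Authorization header for later calls.
token = requests.post(base_url + '/token', json=credentials).json()['access_token']
headers = {'Authorization': 'Bearer ' + token}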
Change directory to audio and use the WAV files provided for testing.
Note the usage of hot-words and their boosts in the request.
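For context, hot-word boosting is a DeepSpeech 0.9 feature exposed through Model.addHotWord. How this API wires the form fields to that call internally is an assumption, but a minimal sketch of the underlying mechanism looks like this:
from deepspeech import Model

model = Model('deepspeech_model.pbmm')
model.enableExternalScorer('deepspeech_model.scorer')  # a scorer is required for hot-words
model.addHotWord('power', 1000.0)     # positive boost: more likely to be transcribed
model.addHotWord('paris', -1000.0)    # negative boost: suppressed
model.addHotWord('parents', -1000.0)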
- STT the HTTP way
cURL
$ curl -X POST \
http://0.0.0.0:8000/api/v1/stt/http \
-H 'Authorization: Bearer JWT_token' \
-F '[email protected]' \
-F 'paris=-1000' \
-F 'power=1000' \
-F 'parents=-1000'
Python
import requests

jwt_token = 'JWT_token'
headers = {'Authorization': 'Bearer ' + jwt_token}
url = 'http://0.0.0.0:8000/api/v1/stt/http'
# Hot-words and their boosts: a positive boost makes a word more likely
# in the transcript, a negative boost suppresses it.
hot_words = {'paris': -1000, 'power': 1000, 'parents': -1000}
audio_filename = 'audio/8455-210777-0068.wav'
# Send the WAV file as multipart form data along with the hot-words.
with open(audio_filename, 'rb') as f:
    audio = [('audio', f)]
    response = requests.post(url, data=hot_words, files=audio, headers=headers)
print(response.json())
- STT the WebSocket way (simple test)
curl does not support WebSockets. To take advantage of this feature, you will have to write a web app that sends requests to the endpoint /api/v1/stt/ws.
The command below can be used to check that the WebSocket endpoint is running.
$ python client_audio_file_stt.py
In both cases (HTTP and WebSocket), you should get a result in the format below.
{
"message": "experience proves this",
"time": 1.4718825020026998
}
The command below can be used to stream speech over the WebSocket endpoint api/v1/mic. Here as well, a web app will need to implement something similar to (or far better than) the code below.
$ python client_audio_file_stt.py
Now you can stream speech to your server and see the result in the client's shell. An implementation of VAD (Voice Activity Detection) will be released soon.
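For reference, here is a minimal sketch of such a WebSocket client written with the websockets package. The wire format (raw WAV bytes in, one JSON message out) is my assumption; consult client_audio_file_stt.py in the repository for the actual protocol.
# Hypothetical WebSocket client for /api/v1/stt/ws. The wire format is
# an assumption; see client_audio_file_stt.py for the working client.
import asyncio
import websockets

async def transcribe(filename, jwt_token):
    uri = 'ws://0.0.0.0:8000/api/v1/stt/ws'
    headers = {'Authorization': 'Bearer ' + jwt_token}
    # extra_headers is the parameter name in websockets' classic client.
    async with websockets.connect(uri, extra_headers=headers) as ws:
        with open(filename, 'rb') as f:
            await ws.send(f.read())   # send the whole file in one message
        print(await ws.recv())        # JSON result from the server

asyncio.run(transcribe('audio/8455-210777-0068.wav', 'JWT_token'))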