When run, this project generates a Google Speech API JSON response file from the videos you provide.
This project has not been updated in a very long time. I have been really busy, but I am trying to revive it. Feel free to suggest improvements. I am available on Twitter as @Abhi347.
- A Google Cloud Platform (GCP) project.
- The gcloud SDK, installed and initialised with the proper GCP project configuration. Check the gcloud configuration using the `gcloud config list` command. If no configuration is found, run `gcloud init`.
- Three separate Google Cloud Storage (GCS) buckets with regional or multi-regional settings in the same GCP project. Add the names of the buckets to `app-config.json`.
- Access to Google Cloud Functions.
- Run `npm install`.
- Replace the `video`, `audio` and `speechResponse` bucket names in the `app-config.json` file with the names of your own GCS buckets.
- Deploy both functions using the `npm run deploy` command.
- Upload a video to the GCS bucket that you named as your video bucket in `app-config.json`.
- After some time, the audio of the video file is extracted into your `audio` bucket as a FLAC file with the same name as the video file.
- After a few more minutes, depending on the size of the video, the Google Speech API response JSON becomes available in your `speechResponse` bucket.
- Upload more videos to get multiple responses.
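Since every step above depends on the three bucket names in `app-config.json`, it can help to sanity-check the file before deploying. The following is a minimal sketch, not part of this repo: `findConfigProblems` is a hypothetical helper, the bucket names are placeholders, and it assumes the `video`, `audio` and `speechResponse` keys sit at the top level of the config object.

```javascript
// Hypothetical helper (not part of this repo): checks that a parsed
// app-config.json names three separate buckets.
// Assumption: the three keys live at the top level of the config object.
const REQUIRED_BUCKET_KEYS = ['video', 'audio', 'speechResponse'];

function findConfigProblems(config) {
  const safeConfig = config || {};
  const problems = [];

  // Every key must be present and hold a non-empty string.
  for (const key of REQUIRED_BUCKET_KEYS) {
    const value = safeConfig[key];
    if (typeof value !== 'string' || value.trim() === '') {
      problems.push(`"${key}" must be a non-empty bucket name`);
    }
  }

  // The buckets must be separate, so reject repeated names.
  const names = REQUIRED_BUCKET_KEYS
    .map((key) => safeConfig[key])
    .filter((value) => typeof value === 'string' && value.trim() !== '');
  if (new Set(names).size !== names.length) {
    problems.push('video, audio and speechResponse must be three different buckets');
  }

  return problems;
}

// Example with placeholder bucket names.
const exampleConfig = {
  video: 'my-video-bucket',
  audio: 'my-audio-bucket',
  speechResponse: 'my-speech-bucket',
};
console.log(findConfigProblems(exampleConfig)); // prints []
```

An empty array means the config looks usable. The check only inspects the file's contents and does not confirm that the buckets actually exist in your GCP project.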
- The Cloud Function `extractAudio` watches for any new upload in the `video` bucket.
- As soon as a new file is uploaded, `extractAudio` extracts the audio from the file and uploads it to the `audio` bucket as a FLAC file with the same name as the video file.
- The second Cloud Function, `transcribeAudio`, watches for any new upload in the `audio` bucket.
- As soon as `extractAudio` uploads the extracted FLAC audio file to the `audio` bucket, `transcribeAudio` is triggered.
- `transcribeAudio` then sends the audio file to the Google Speech API.
- As soon as the response is received from the Google Speech API, it is converted to a JSON file and uploaded to the `speechResponse` bucket.
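The flow above can be sketched as two small planning steps that map an uploaded object to its output location. This is only an illustration, not the repo's actual code: `replaceExtension`, `planExtractAudio` and `planTranscribeAudio` are hypothetical names, the bucket names are placeholders, and naming the response file after the audio file with a `.json` extension is an assumption (the text above only states that the FLAC file keeps the video's name).

```javascript
const path = require('path');

// Swap the file extension while keeping folder and base name,
// e.g. talks/intro.mp4 -> talks/intro.flac.
function replaceExtension(objectName, newExtension) {
  const parsed = path.posix.parse(objectName);
  return path.posix.join(parsed.dir, `${parsed.name}${newExtension}`);
}

// Hypothetical step 1: where extractAudio would write the FLAC file.
function planExtractAudio(videoObjectName, buckets) {
  return {
    bucket: buckets.audio,
    name: replaceExtension(videoObjectName, '.flac'),
  };
}

// Hypothetical step 2: where transcribeAudio would write the response.
// Assumption: the JSON file reuses the audio file's base name.
function planTranscribeAudio(audioObjectName, buckets) {
  return {
    bucket: buckets.speechResponse,
    name: replaceExtension(audioObjectName, '.json'),
  };
}

// Example with placeholder bucket names.
const buckets = {
  video: 'my-video-bucket',
  audio: 'my-audio-bucket',
  speechResponse: 'my-speech-bucket',
};
const audioTarget = planExtractAudio('talks/intro.mp4', buckets);
const jsonTarget = planTranscribeAudio(audioTarget.name, buckets);
console.log(audioTarget.name); // prints talks/intro.flac
console.log(jsonTarget.name); // prints talks/intro.json
```

Because the second function is triggered by the first function's output, the two steps chain without any direct call between them; the `audio` bucket acts as the hand-off point.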
- Currently, neither function checks whether the uploaded file is of video or audio type. The function will run and throw an error if you upload anything other than a video file to the `video` bucket, or anything other than a FLAC audio file to the `audio` bucket.
- Extracting the audio is fast, but the same can't be said about transcribing the audio using the Google Speech API. The function will time out, so you'll have to go to your GCP console and increase the timeout of at least the `transcribeAudio` Cloud Function by editing it.
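One possible way to address the first limitation is a guard that each function runs before doing any work. The sketch below is not implemented in this repo: `isVideoObject` and `isFlacObject` are hypothetical helpers, the extension list is an arbitrary assumption, and it assumes the trigger payload exposes the object's `name` and `contentType` fields.

```javascript
const path = require('path');

// Assumed list of video extensions; adjust to the formats you expect.
const VIDEO_EXTENSIONS = new Set(['.mp4', '.mov', '.mkv', '.avi', '.webm']);

// Hypothetical guard for extractAudio: accept an object if its content
// type is video/*, or failing that, if its extension looks like a video.
function isVideoObject(object) {
  const contentType = object.contentType || '';
  if (contentType.startsWith('video/')) {
    return true;
  }
  const extension = path.extname(object.name || '').toLowerCase();
  return VIDEO_EXTENSIONS.has(extension);
}

// Hypothetical guard for transcribeAudio: accept only .flac files.
function isFlacObject(object) {
  return path.extname(object.name || '').toLowerCase() === '.flac';
}

console.log(isVideoObject({ name: 'talk.mp4', contentType: 'video/mp4' })); // prints true
console.log(isVideoObject({ name: 'notes.txt', contentType: 'text/plain' })); // prints false
console.log(isFlacObject({ name: 'talk.flac' })); // prints true
```

A function could return early when its guard fails, so stray uploads are skipped instead of causing an error.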
The original idea and most of the source are taken from the Hackernoon article mentioned below. I created this repo because the article didn't provide any working code, just code fragments. I modularised the code and wrote the deploy script.