Speech Recognition for Document Interaction

Overview

The speech_rec.py script is an advanced utility for real-time speech recognition, designed to interact with documents by taking notes through voice commands and integrating them into a form. This script uses Google Cloud Speech-to-Text for voice recognition, Tesseract OCR for reading text from images, and custom utility functions for PDF manipulation.

Features

Real-time speech recognition with Google Cloud Speech-to-Text.
Document analysis using Tesseract OCR.
Note-taking by voice command, integrated directly into document forms.
Automatic PDF generation from collected notes.
Environment variable setup for Google Cloud authentication.

Prerequisites

Google Cloud account with Speech-to-Text API access.
Tesseract OCR installed on your system.
Python with packages: pytesseract, pyaudio, google-cloud-speech, and nltk.

Setup

Install the required Python packages:

pip install pytesseract pyaudio google-cloud-speech nltk

Set up Google Cloud credentials:

export GOOGLE_APPLICATION_CREDENTIALS="path_to_your_credentials.json

Install Tesseract OCR and ensure it's in your system's PATH, or set it in the script:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Usage

Run the script from your command line:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Speak into the microphone to take notes. Say "banana" to integrate notes into the document, and "exit" or "quit" to finish the session and save the output.

Configuration

Modify the passphrase variable to a keyword that triggers note integration. Update the RATE and CHUNK parameters to match your audio recording preferences.

Output

The script saves recognized speech as notes, which are then written into a form image and saved as a PDF. Additionally, the raw notes are saved in a text file.

Note

This script is a prototype and is intended for use in controlled environments. Real-world application may require additional error handling and optimizations.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.idea		.idea
data		data
output		output
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
destination.pdf		destination.pdf
output_image.jpg		output_image.jpg
pdf2img.py		pdf2img.py
requirements.txt		requirements.txt
simple_form.pdf		simple_form.pdf
speech_rec.py		speech_rec.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speech Recognition for Document Interaction

Overview

Features

Prerequisites

Setup

Usage

Configuration

Output

Note

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Speech Recognition for Document Interaction

Overview

Features

Prerequisites

Setup

Usage

Configuration

Output

Note

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages