Description
Describe the problem
Until now, ChromaDB has only supported text. With this feature it would become a solution for videos as well, enabling use cases like "Chat With Video" / "Talk To Video".
Describe the proposed solution
A function that creates embeddings for videos.
Steps:
video ==> audio ==> text ==> nltk (sent_tokenize) ==> vectors (the same steps already used for text), creating a collection for the sentences in the video.
Sample code:

```python
import speech_recognition as sr
from moviepy import VideoFileClip  # moviepy >= 2.0; for 1.x use `from moviepy.editor import VideoFileClip`


def extract_audio_from_video(video_file_path, audio_file_path):
    """Extract the audio track from video_file_path and save it to audio_file_path."""
    video = VideoFileClip(video_file_path)
    video.audio.write_audiofile(audio_file_path)


def transcribe_audio_to_text(audio_file_path, text_file_path):
    """Transcribe the audio file, save the text to text_file_path, and return it."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file_path) as source:
        audio_data = recognizer.record(source)
    text = recognizer.recognize_google(audio_data)
    with open(text_file_path, "w") as file:
        file.write(text)
    return text


video_file_path = "samples.mp4"
audio_file_path = "temp_audio.wav"
text_file_path = "text_of_video.txt"

extract_audio_from_video(video_file_path, audio_file_path)
transcription = transcribe_audio_to_text(audio_file_path, text_file_path)
print("Transcription:")
print(transcription)
```
I would like to contribute this feature.
Alternatives considered
No response
Importance
nice to have
Additional Information
I would like to work on it!