
Nebula: Introduce Nebula Transcription Service (Local Whisper + GPT Vision Integration) #108


Open
wants to merge 11 commits into main

Conversation

Anishyou

@Anishyou Anishyou commented Apr 28, 2025

Overview:

  • Adds the initial version of the Nebula Transcription backend service.
  • Provides fully local transcription with Whisper and slide-number extraction using GPT-4o Vision.

Main Features:

  1. Download lecture videos from a given URL using ffmpeg.
  2. Extract audio from video and convert it into .wav format.
  3. Transcribe the extracted audio locally using openai-whisper (default model: base).
  4. Automatically use the GPU (CUDA) if available; otherwise fall back to the CPU.
  5. Capture frames at transcript segment timestamps.
  6. Crop the bottom 5% of each frame, where slide numbers are typically shown.
  7. Detect slide numbers from frames using OpenAI GPT-4o Vision model.
  8. Return a structured JSON response with:
    • Start time, end time, transcribed text, and detected slide number for each segment.
  9. Automatically clean up temporary files after processing.
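The core of steps 3–6 can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the function names are hypothetical, and the Whisper/GPU calls are shown only as comments.

```python
# Sketch of the transcription pipeline (steps 3-6 above).
# Function names are illustrative, not taken from the PR.

def crop_box_bottom(width: int, height: int, fraction: float = 0.05):
    """Return (x, y, w, h) for the bottom `fraction` strip of a frame,
    where slide numbers are typically shown."""
    strip = max(1, int(height * fraction))
    return (0, height - strip, width, strip)

def segment_timestamps(segments):
    """Collect the start time of each Whisper segment; a frame is
    captured at each of these points."""
    return [round(seg["start"], 2) for seg in segments]

# The service would then do roughly (hypothetical, not executed here):
#   import torch, whisper
#   device = "cuda" if torch.cuda.is_available() else "cpu"
#   model = whisper.load_model("base", device=device)
#   result = model.transcribe("audio.wav")
#   for t in segment_timestamps(result["segments"]):
#       capture a frame at t (e.g. ffmpeg -ss), crop it with
#       crop_box_bottom(...), and send it to GPT-4o Vision.
```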

API Endpoint:
POST /start-transcribe

Input:
{
  "videoUrl": "<video-link>",
  "lectureId": 123,
  "lectureUnitId": 456
}

Output: JSON containing a list of transcribed segments with slide numbers.
Example output:

{
  "language": "en",
  "segments": [
    {
      "startTime": 0.0,
      "endTime": 5.2,
      "text": "Welcome to today's lecture...",
      "slideNumber": 1
    },
    ...
  ]
}
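The endpoint can be exercised with a plain HTTP client. The host and port below are assumptions (the PR description does not specify where the service listens):

```shell
# Assumed host/port; adjust to wherever the Nebula service is running.
curl -X POST http://localhost:5000/start-transcribe \
  -H "Content-Type: application/json" \
  -d '{"videoUrl": "<video-link>", "lectureId": 123, "lectureUnitId": 456}'
```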

How to start the Nebula Transcriber:
See README.MD.

@Anishyou Anishyou changed the title Lecture trasncription ´Nebula´:Lecture trasncription using Whisper and Gpt4-0 Apr 28, 2025
@Anishyou Anishyou changed the title ´Nebula´:Lecture trasncription using Whisper and Gpt4-0 Nebula:Lecture trasncription using Whisper and Gpt4-0 Apr 28, 2025
@Anishyou Anishyou changed the title Nebula:Lecture trasncription using Whisper and Gpt4-0 Nebula: Introduce Nebula Transcription Service (Local Whisper + GPT Vision Integration) Apr 28, 2025
Member

@alexjoham alexjoham left a comment


The first version looks good in my opinion. I added some comments. Also consider adding doc comments, and maybe splitting the functionality across more files instead of keeping it all in one. Please also have a look at the linting checks, as some are failing.

Comment on lines 115 to 116
data = request.get_json()
video_url = data.get("videoUrl")
Member


Another small suggestion: consider working with DTOs here, as you do in Artemis.
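One way to apply the DTO suggestion, sketched with a Python dataclass. The class and field names are hypothetical; the PR itself reads the fields from `request.get_json()` directly.

```python
# Hypothetical DTO for the /start-transcribe request body.
from dataclasses import dataclass

@dataclass
class TranscriptionRequestDTO:
    video_url: str
    lecture_id: int
    lecture_unit_id: int

    @classmethod
    def from_json(cls, data: dict) -> "TranscriptionRequestDTO":
        # Fail fast with a KeyError if a required field is missing,
        # instead of silently getting None from dict.get().
        return cls(
            video_url=data["videoUrl"],
            lecture_id=int(data["lectureId"]),
            lecture_unit_id=int(data["lectureUnitId"]),
        )

# In the Flask handler this would replace the raw dict access:
#   dto = TranscriptionRequestDTO.from_json(request.get_json())
```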

Author


Need to look into it.

@Anishyou Anishyou marked this pull request as ready for review May 3, 2025 15:57
@Anishyou Anishyou requested review from a team as code owners May 3, 2025 15:57