
Nebula: Introduce Nebula Transcription Service #108

Open — wants to merge 76 commits into base: main

Commits (76)
bf64035
integrated transcription for lectures
Apr 20, 2025
1a0cbf9
Merge branch 'main' into nebula/feature/transcript
Anishyou Apr 28, 2025
a155c04
Local transcription system
Apr 28, 2025
82faf13
README.MD
Apr 28, 2025
ae8d0ad
migrated transcription module to src/transcript
May 2, 2025
42eefac
added an example llm config yml file
May 2, 2025
bf1a7dc
updated README.MD file
May 2, 2025
cc70c21
Merge branch 'main' into nebula/feature/transcript
Anishyou May 2, 2025
36c9942
changed to fastapi
May 3, 2025
29226cd
updated requirements.txt
May 3, 2025
aca21b8
linting
May 3, 2025
22ede99
Moved readme to transcript
May 6, 2025
4761fd2
Using Whisper API. Added Authentication token and Health endpoint
May 9, 2025
2e44879
Adding audio_utils.py
May 9, 2025
0638f0d
Refactored transcript into NEBULA.
May 12, 2025
3265a5b
Added dto.py
May 13, 2025
aeebc39
update README
May 13, 2025
093cbfa
updated dto
May 13, 2025
20d2aa3
Reversing language change. Earlier one was correct
May 13, 2025
842f308
Updating dto as lectureID is not needed
May 13, 2025
ed06049
Added poetry , so easier to run. Updated Readme.md
May 15, 2025
32f35b5
Updated docker, Now transcriber can be ran through docker or poetry
May 15, 2025
11d9c12
deleting old Dockerfile
May 15, 2025
576ebc2
update README
May 15, 2025
4c6f874
reformatted
May 15, 2025
4df505c
Merge branch 'main' into nebula/feature/transcript
Anishyou May 15, 2025
a1a5bb4
fixing lower case
May 15, 2025
b94c1f0
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
May 15, 2025
95fb271
Added mock tests. Updated docker compose yml
May 16, 2025
bf64cc7
use example conifg as test config, so changing workflow
May 16, 2025
7f002d2
Merge branch 'main' into nebula/feature/transcript
Anishyou May 16, 2025
0e2d4f2
Adding pre commit in .toml
May 16, 2025
7f8e49d
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
May 16, 2025
c936bca
Adding logging statements
May 16, 2025
ed1b690
Linting issues
May 16, 2025
414c027
Linting issues
May 16, 2025
0c67671
Made changes for linting and other issues with code quality
May 17, 2025
4be98c3
Due to flake and black contradicting each other, had to put ignore in…
May 18, 2025
4b5ddd9
lazyloading apikeys
May 18, 2025
c07e840
deleting unecessary import
May 18, 2025
52299ec
fixing tests
May 18, 2025
09b3c1a
Remove stale config files causing coverage issues
May 18, 2025
a0ea546
updating pyproject.toml
May 18, 2025
9f914ec
Fixed temp cleaning issue
May 19, 2025
b5b2dd9
comment
May 19, 2025
ada4f59
fix test
May 19, 2025
f87b300
fix test
May 21, 2025
ba86581
fix test
May 21, 2025
7d02cb3
Added a common authentication system. Also added a jobs where now mul…
May 21, 2025
305b5f8
fix lint
May 21, 2025
d0f20dd
Updated dockerfile and docker-compose.yml
May 21, 2025
68644b9
update README.md
May 22, 2025
9641c6f
Merge branch 'main' into nebula/feature/transcript
Anishyou May 26, 2025
c2befd1
change port for mac
Jun 1, 2025
42b71c4
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
Jun 1, 2025
73862a2
Remove stuff
cremertim Jun 4, 2025
0dc47fc
add variant again
cremertim Jun 4, 2025
16b7194
Replacing print with logging
Jun 23, 2025
7961c48
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
Jun 23, 2025
3f95600
Merge branch 'main' into nebula/feature/transcript
Anishyou Jun 23, 2025
2c6667d
adding llm files in .gitignore , updaing cleanup logic
Jun 23, 2025
672fd10
Fix issue by regenerating files
cremertim Jun 25, 2025
0e7e11f
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
cremertim Jun 25, 2025
9be133f
Added .env to configure ports. Refactored logging
cremertim Jun 25, 2025
3d18702
Add LLM to faq_service
cremertim Jun 25, 2025
493c8ba
Move files for better structure
cremertim Jun 25, 2025
4b6d71a
remove empty folder
cremertim Jun 25, 2025
d43979d
rename variables
cremertim Jun 25, 2025
142783f
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
Jun 28, 2025
d3a8442
Make custom gateway obsolet
cremertim Jun 29, 2025
bf248dd
Add status update rewriting example
cremertim Jun 29, 2025
25764ad
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
Jun 29, 2025
bd18ffd
Change package
cremertim Jul 1, 2025
cb44011
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
Jul 2, 2025
263b3e1
Refined retry logic for azure whisper, update routes url , update tra…
Jul 3, 2025
e415d3a
Add openAI api availability
Jul 4, 2025
12 changes: 11 additions & 1 deletion .github/workflows/nebula_test.yml
@@ -73,12 +73,22 @@ jobs:
run: poetry install
working-directory: nebula

- name: Use example config as test config
run: cp application_local.example.nebula.yml application_local.nebula.yml
working-directory: nebula

- name: Clean old coverage data
run: |
rm -f .coverage
rm -f coverage.xml
working-directory: nebula

- name: Run Tests
run: poetry run coverage run -m pytest
working-directory: nebula

- name: Generate Coverage Report
run: poetry run coverage xml
run: poetry run coverage xml --ignore-errors
Contributor
💡 Verification agent

🧩 Analysis chain

Verify the necessity of the --ignore-errors flag.

While the --ignore-errors flag improves CI robustness, it might mask legitimate coverage issues. Consider monitoring if this flag is actually needed or if the underlying issues can be resolved.

To better understand if this flag is necessary, you could run the following to check for common coverage issues:


🏁 Script executed:

#!/bin/bash
# Description: Check for potential coverage configuration issues
# Expected: Find coverage configuration and identify potential issues

# Look for coverage configuration
fd -e cfg -e ini -e toml | xargs grep -l coverage 2>/dev/null || echo "No coverage config found"

# Check for any .coveragerc files
fd ".coveragerc"

# Look for pyproject.toml coverage settings
if [ -f "nebula/pyproject.toml" ]; then
    echo "=== Coverage settings in pyproject.toml ==="
    grep -A 10 -B 2 "\[tool\.coverage" nebula/pyproject.toml || echo "No coverage settings found in pyproject.toml"
fi

Length of output: 400


Remove the unnecessary --ignore-errors flag from coverage XML generation

No coverage configuration was found (no .coveragerc and no [tool.coverage] section in nebula/pyproject.toml), so coverage xml should succeed by default. The --ignore-errors flag can mask real issues and isn’t needed unless you’re actively seeing failures in CI.

  • File: .github/workflows/nebula_test.yml, line 91

Suggested diff:

- run: poetry run coverage xml --ignore-errors
+ run: poetry run coverage xml

If you do encounter failures after removing the flag, consider adding a proper coverage config (.coveragerc or [tool.coverage] in pyproject.toml) rather than silencing errors.
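The review's closing advice can be made concrete: if `coverage xml` does fail after dropping the flag, a `[tool.coverage]` section in `nebula/pyproject.toml` would be the place to fix it rather than silencing errors. A minimal sketch — the `source` path is an assumption about this repo's package layout, not a committed config:

```toml
# Hypothetical [tool.coverage] section for nebula/pyproject.toml;
# the source path is an assumption about the package layout.
[tool.coverage.run]
source = ["src/nebula"]
omit = ["*/tests/*"]

[tool.coverage.report]
show_missing = true

[tool.coverage.xml]
output = "coverage.xml"
```

With `source` scoped to the package, stale `.coverage` data from moved or deleted files is less likely to break XML generation in the first place.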


working-directory: nebula

#- name: Upload Coverage Report
1 change: 1 addition & 0 deletions iris/src/iris/web/routers/pipelines.py
@@ -129,6 +129,7 @@ def run_exercise_chat_pipeline(
# Additional validation for ChatGPT wrapper variant
validate_pipeline_variant(dto.settings, ChatGPTWrapperPipeline)
thread = Thread(target=run_chatgpt_wrapper_pipeline_worker, args=(dto, variant))

else:
thread = Thread(
target=run_exercise_chat_pipeline_worker,
1 change: 1 addition & 0 deletions nebula/.flake8
@@ -1,5 +1,6 @@
[flake8]
max-line-length = 120
extend-ignore = E203
exclude =
.git,
__pycache__,
7 changes: 7 additions & 0 deletions nebula/.gitignore
@@ -177,3 +177,10 @@ cython_debug/
.idea/

.DS_Store

# Ignore transcription temp files
nebula/src/nebula/transcript/temp/

# Ignore real config files
nebula/llm_config.nebula.yml
nebula/application_local.nebula.yml
20 changes: 0 additions & 20 deletions nebula/Dockerfile

This file was deleted.

3 changes: 0 additions & 3 deletions nebula/README.MD

This file was deleted.

5 changes: 5 additions & 0 deletions nebula/README.md
@@ -0,0 +1,5 @@
# Nebula

This is the central orchestration repository for all Nebula services.

- [Transcript README](./src/nebula/transcript/README.md)
2 changes: 2 additions & 0 deletions nebula/application_local.example.nebula.yml
@@ -0,0 +1,2 @@
api_keys:
- token: "nebula-secret"
47 changes: 47 additions & 0 deletions nebula/docker-compose.yml
@@ -0,0 +1,47 @@
version: "3.9"

services:
transcriber:
build:
context: .
dockerfile: docker/transcriber/Dockerfile
container_name: nebula-transcriber
expose:
- "5000" # internal communication with Envoy
environment:
- APPLICATION_YML_PATH=/app/application_local.nebula.yml
- LLM_CONFIG_PATH=/app/llm_config.nebula.yml
volumes:
- ./application_local.nebula.yml:/app/application_local.nebula.yml:ro
- ./llm_config.nebula.yml:/app/llm_config.nebula.yml:ro
- ./temp:/app/temp
restart: unless-stopped

faq:
build:
context: .
dockerfile: docker/faq/Dockerfile
container_name: ${FAQ_SERVICE_NAME}
expose:
- "${FAQ_SERVICE_PORT}"
environment:
- APPLICATION_YML_PATH=/app/application_local.nebula.yml
- LLM_CONFIG_PATH=/app/llm_config.nebula.yml
- FAQ_SERVICE_PORT=${FAQ_SERVICE_PORT}
volumes:
- ./application_local.nebula.yml:/app/application_local.nebula.yml:ro
- ./llm_config.nebula.yml:/app/llm_config.nebula.yml:ro
restart: unless-stopped

envoy:
image: envoyproxy/envoy:v1.30-latest
container_name: envoy
volumes:
- ./envoy.yaml:/etc/envoy/envoy.yaml:ro
ports:
- "8000:8000" # HTTP for FastAPI
- "50051:50051" # gRPC
- "9901:9901" # Admin interface
depends_on:
- transcriber
- faq
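The `${FAQ_SERVICE_NAME}` and `${FAQ_SERVICE_PORT}` placeholders above are interpolated from a `.env` file next to `docker-compose.yml` (per the "Added .env to configure ports" commit). A sketch consistent with the rest of this PR — the values mirror `envoy.yaml`, which targets `nebula-faq:50052`, and the faq Dockerfile's `EXPOSE 50052`, rather than any committed `.env` file:

```env
# Example .env; values are inferred from envoy.yaml and the faq
# Dockerfile, not copied from a committed file
FAQ_SERVICE_NAME=nebula-faq
FAQ_SERVICE_PORT=50052
```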
44 changes: 44 additions & 0 deletions nebula/docker/faq/Dockerfile
@@ -0,0 +1,44 @@
FROM python:3.12-slim

WORKDIR /app

# Install Poetry and project dependencies
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --only main --no-root

# Copy source code and config
COPY src/nebula ./nebula
COPY application_local.nebula.yml .
COPY llm_config.nebula.yml .

# COMPILE PROTO FROM ROOT CONTEXT!
# This makes relative imports (from . import faq_pb2) work inside grpc_stubs
WORKDIR /app

# Compile the .proto file into grpc_stubs
RUN poetry run python -m grpc_tools.protoc \
-I=nebula/protos \
--python_out=nebula/grpc_stubs \
--grpc_python_out=nebula/grpc_stubs \
nebula/protos/faq.proto && \
echo "✅ Proto compiled!" && \
ls -l nebula/grpc_stubs

RUN sed -i 's/^import faq_pb2/import nebula.grpc_stubs.faq_pb2/' nebula/grpc_stubs/faq_pb2_grpc.py


# Debug output: list grpc_stubs content
RUN echo "Contents of grpc_stubs:" && ls -l nebula/grpc_stubs

# Ensure grpc_stubs is a proper Python package
RUN touch nebula/grpc_stubs/__init__.py

# Set PYTHONPATH so packages can be resolved
ENV PYTHONPATH=/app

# Expose the gRPC port
EXPOSE 50052

# Start the FAQ gRPC server
CMD ["poetry", "run", "python", "nebula/faq/faq_server.py"]
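The `faq.proto` compiled above is not shown in this diff. A minimal sketch of its likely shape — only the package and service name are grounded (they must match the `/de.tum.cit.aet.artemis.nebula.FAQService/` route prefix in `envoy.yaml`); the RPC and message definitions are assumptions:

```protobuf
// Hypothetical sketch of nebula/protos/faq.proto. The package and service
// name are taken from the envoy.yaml route prefix; the RPC and messages
// are illustrative assumptions.
syntax = "proto3";

package de.tum.cit.aet.artemis.nebula;

service FAQService {
  rpc RewriteFaq (FaqRequest) returns (FaqResponse);
}

message FaqRequest {
  string question = 1;
}

message FaqResponse {
  string answer = 1;
}
```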
34 changes: 34 additions & 0 deletions nebula/docker/gateway/Dockerfile
@@ -0,0 +1,34 @@
# docker/gateway/Dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install Poetry + dependencies
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --only main --no-root

# Copy source code (except protos and stubs)
COPY src/nebula ./nebula
COPY application_local.nebula.yml ./

# Copy proto definitions
COPY src/nebula/protos ./nebula/protos

# Compile proto to grpc_stubs. This should be moved into the compose setup later.
RUN poetry run python -m grpc_tools.protoc \
-I=nebula/protos \
--python_out=nebula/grpc_stubs \
--grpc_python_out=nebula/grpc_stubs \
nebula/protos/faq.proto && \
echo "✅ Proto compiled!" && \
ls -l nebula/grpc_stubs


RUN sed -i 's/^import faq_pb2/import nebula.grpc_stubs.faq_pb2/' nebula/grpc_stubs/faq_pb2_grpc.py


# Set Python path so modules are discoverable
ENV PYTHONPATH=/app

# Start FastAPI (or combined with gRPC later)
CMD ["poetry", "run", "uvicorn", "nebula.gateway.main:app", "--host", "0.0.0.0", "--port", "8000"]
36 changes: 36 additions & 0 deletions nebula/docker/transcriber/Dockerfile
@@ -0,0 +1,36 @@
# Use Python 3.12 slim image
FROM python:3.12-slim

# System dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
tesseract-ocr \
libgl1 \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*

# Set working directory inside the container
WORKDIR /app

# Copy dependency files first
COPY pyproject.toml poetry.lock ./

# Install poetry
RUN pip install poetry

# Install only main dependencies
RUN poetry install --only main --no-root

# Copy source code and config files
COPY src/nebula ./src/nebula
COPY application_local.nebula.yml .
COPY llm_config.nebula.yml .

# Set PYTHONPATH so Python can find src/nebula
ENV PYTHONPATH=/app/src

# Expose FastAPI port
EXPOSE 5000

# Run app
CMD ["poetry", "run", "uvicorn", "nebula.transcript.app:app", "--host", "0.0.0.0", "--port", "5000"]
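Note that clients are meant to reach this service through Envoy's `/transcribe` route on port 8000 rather than port 5000 directly. A minimal Python sketch that builds such a request — the URL and token come from `envoy.yaml` and `application_local.example.nebula.yml` in this PR, but the JSON body shape and the Bearer header scheme are assumptions; the real endpoint may expect a multipart audio upload instead:

```python
# Hypothetical client sketch for the transcriber behind Envoy. The URL and
# token are taken from files in this PR; the JSON body shape and the Bearer
# scheme are assumptions about an API not shown in the diff.
import json
import urllib.request

NEBULA_URL = "http://localhost:8000"  # Envoy HTTP listener port
API_TOKEN = "nebula-secret"           # token from the example config


def build_transcribe_request(audio_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /transcribe route."""
    body = json.dumps({"audio_path": audio_path}).encode()
    return urllib.request.Request(
        f"{NEBULA_URL}/transcribe",
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the compose stack is up, sending the request is just `urllib.request.urlopen(build_transcribe_request("lecture.wav"))`.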
119 changes: 119 additions & 0 deletions nebula/envoy.yaml
@@ -0,0 +1,119 @@
static_resources:
listeners:
# HTTP listener (for services like FastAPI)
- name: listener_http
address:
socket_address: { address: 0.0.0.0, port_value: 8000 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
route_config:
name: http_route
virtual_hosts:
- name: fastapi
domains: ["*"]
routes:
- match: { prefix: "/transcribe" }
route: { cluster: transcriber }

# Add new HTTP service routes here if needed
# - match: { prefix: "/another-api" }
# route: { cluster: another_http_service }

http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

# gRPC listener on port 50051
- name: listener_grpc
address:
socket_address:
address: 0.0.0.0
port_value: 50051
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
codec_type: AUTO
stat_prefix: ingress_grpc
route_config:
name: grpc_route
virtual_hosts:
- name: grpc_services
domains: ["*"]
routes:
- match:
prefix: "/de.tum.cit.aet.artemis.nebula.FAQService/"
route:
cluster: faq

# Add a new gRPC service route by its fully-qualified service name
# - match:
# prefix: "/lectureservice.LectureService/"
# route:
# cluster: lecture

http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

clusters:
# Cluster for the FastAPI-based transcriber service
- name: transcriber
connect_timeout: 0.5s
type: logical_dns
lb_policy: round_robin
load_assignment:
cluster_name: transcriber
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: transcriber
port_value: 5000

# Cluster for the FAQ gRPC service
- name: faq
connect_timeout: 0.5s
type: logical_dns
lb_policy: round_robin
http2_protocol_options: {} # Required for gRPC
load_assignment:
cluster_name: faq
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: nebula-faq
port_value: 50052

# Add a new gRPC cluster for another microservice (e.g., lecture)
# - name: lecture
# connect_timeout: 0.5s
# type: logical_dns
# lb_policy: round_robin
# http2_protocol_options: {} # Required for gRPC
# load_assignment:
# cluster_name: lecture
# endpoints:
# - lb_endpoints:
# - endpoint:
# address:
# socket_address:
# address: nebula-lecture # Docker container name or DNS
# port_value: 50052 # Port used by the lecture gRPC server

admin:
access_log_path: /tmp/envoy_admin.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901