Nebula: Introduce Nebula Transcription Service #108
Open
Anishyou wants to merge 76 commits into main from nebula/feature/transcript
+3,617
−368
76 commits
bf64035
integrated transcription for lectures
1a0cbf9
Merge branch 'main' into nebula/feature/transcript
Anishyou a155c04
Local transcription system
82faf13
README.MD
ae8d0ad
migrated transcription module to src/transcript
42eefac
added an example llm config yml file
bf1a7dc
updated README.MD file
cc70c21
Merge branch 'main' into nebula/feature/transcript
Anishyou 36c9942
changed to fastapi
29226cd
updated requirements.txt
aca21b8
linting
22ede99
Moved readme to transcript
4761fd2
Using Whisper API. Added Authentication token and Health endpoint
2e44879
Adding audio_utils.py
0638f0d
Refactored transcript into NEBULA.
3265a5b
Added dto.py
aeebc39
update README
093cbfa
updated dto
20d2aa3
Reversing language change. Earlier one was correct
842f308
Updating dto as lectureID is not needed
ed06049
Added poetry , so easier to run. Updated Readme.md
32f35b5
Updated docker, Now transcriber can be ran through docker or poetry
11d9c12
deleting old Dockerfile
576ebc2
update README
4c6f874
reformatted
4df505c
Merge branch 'main' into nebula/feature/transcript
Anishyou a1a5bb4
fixing lower case
b94c1f0
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
95fb271
Added mock tests. Updated docker compose yml
bf64cc7
use example conifg as test config, so changing workflow
7f002d2
Merge branch 'main' into nebula/feature/transcript
Anishyou 0e2d4f2
Adding pre commit in .toml
7f8e49d
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
c936bca
Adding logging statements
ed1b690
Linting issues
414c027
Linting issues
0c67671
Made changes for linting and other issues with code quality
4be98c3
Due to flake and black contradicting each other, had to put ignore in…
4b5ddd9
lazyloading apikeys
c07e840
deleting unecessary import
52299ec
fixing tests
09b3c1a
Remove stale config files causing coverage issues
a0ea546
updating pyproject.toml
9f914ec
Fixed temp cleaning issue
b5b2dd9
comment
ada4f59
fix test
f87b300
fix test
ba86581
fix test
7d02cb3
Added a common authentication system. Also added a jobs where now mul…
305b5f8
fix lint
d0f20dd
Updated dockerfile and docker-compose.yml
68644b9
update README.md
9641c6f
Merge branch 'main' into nebula/feature/transcript
Anishyou c2befd1
change port for mac
42b71c4
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
73862a2
Remove stuff
cremertim 0dc47fc
add variant again
cremertim 16b7194
Replacing print with logging
7961c48
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
3f95600
Merge branch 'main' into nebula/feature/transcript
Anishyou 2c6667d
adding llm files in .gitignore , updaing cleanup logic
672fd10
Fix issue by regenerating files
cremertim 0e7e11f
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
cremertim 9be133f
Added .env to configure ports. Refactored logging
cremertim 3d18702
Add LLM to faq_service
cremertim 493c8ba
Move files for better structure
cremertim 4b6d71a
remove empty folder
cremertim d43979d
rename variables
cremertim 142783f
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
d3a8442
Make custom gateway obsolet
cremertim bf248dd
Add status update rewriting example
cremertim 25764ad
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
bd18ffd
Change package
cremertim cb44011
Merge remote-tracking branch 'origin/nebula/feature/transcript' into …
263b3e1
Refined retry logic for azure whisper, update routes url , update tra…
e415d3a
Add openAI api availability
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
[flake8] | ||
max-line-length = 120 | ||
extend-ignore = E203 | ||
exclude = | ||
.git, | ||
__pycache__, | ||
|
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Nebula | ||
|
||
This is the central orchestration repository for all Nebula services. | ||
|
||
- [Transcript README](./src/nebula/transcript/README.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
api_keys: | ||
- token: "nebula-secret" |
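How the service consumes this `api_keys` list is not shown in this part of the diff. The following is a minimal sketch of one way a request token could be checked against it, assuming the YAML has already been parsed into a dictionary. The `CONFIG` constant and the `is_authorized` helper are hypothetical names used for illustration only.

```python
import hmac

# Parsed form of the config above. In the service this would presumably come
# from a YAML loader (assumption: the loader is not part of this excerpt).
CONFIG = {"api_keys": [{"token": "nebula-secret"}]}


def is_authorized(presented_token, config=CONFIG):
    """Return True if the presented token matches any configured api_keys entry."""
    if not presented_token:
        return False
    presented = presented_token.encode("utf-8")
    for entry in config.get("api_keys", []):
        expected = str(entry.get("token", "")).encode("utf-8")
        # compare_digest performs a constant-time comparison, which avoids
        # leaking how many leading characters matched.
        if expected and hmac.compare_digest(presented, expected):
            return True
    return False


if __name__ == "__main__":
    print(is_authorized("nebula-secret"))
    print(is_authorized("something-else"))
```

A constant-time comparison is a common choice for shared-secret tokens; whether the actual service does this is not visible here.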
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
version: "3.9" | ||
|
||
services: | ||
transcriber: | ||
build: | ||
context: . | ||
dockerfile: docker/transcriber/Dockerfile | ||
container_name: nebula-transcriber | ||
expose: | ||
- "5000" # internal communication with Envoy | |
environment: | ||
- APPLICATION_YML_PATH=/app/application_local.nebula.yml | ||
- LLM_CONFIG_PATH=/app/llm_config.nebula.yml | ||
volumes: | ||
- ./application_local.nebula.yml:/app/application_local.nebula.yml:ro | ||
- ./llm_config.nebula.yml:/app/llm_config.nebula.yml:ro | ||
- ./temp:/app/temp | ||
restart: unless-stopped | ||
|
||
faq: | ||
build: | ||
context: . | ||
dockerfile: docker/faq/Dockerfile | ||
container_name: ${FAQ_SERVICE_NAME} | ||
expose: | ||
- "${FAQ_SERVICE_PORT}" | ||
environment: | ||
- APPLICATION_YML_PATH=/app/application_local.nebula.yml | ||
- LLM_CONFIG_PATH=/app/llm_config.nebula.yml | ||
- FAQ_SERVICE_PORT=${FAQ_SERVICE_PORT} | ||
volumes: | ||
- ./application_local.nebula.yml:/app/application_local.nebula.yml:ro | ||
- ./llm_config.nebula.yml:/app/llm_config.nebula.yml:ro | ||
restart: unless-stopped | ||
|
||
envoy: | ||
image: envoyproxy/envoy:v1.30-latest | ||
container_name: envoy | ||
volumes: | ||
- ./envoy.yaml:/etc/envoy/envoy.yaml:ro | ||
ports: | ||
- "8000:8000" # HTTP for FastAPI | ||
- "50051:50051" # gRPC | ||
- "9901:9901" # Admin interface | ||
depends_on: | ||
- transcriber | ||
- faq |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
FROM python:3.12-slim | ||
|
||
WORKDIR /app | ||
|
||
# Install Poetry and project dependencies | ||
COPY pyproject.toml poetry.lock ./ | ||
RUN pip install poetry && poetry install --only main --no-root | ||
|
||
# Copy source code and config | ||
COPY src/nebula ./nebula | ||
COPY application_local.nebula.yml . | ||
COPY llm_config.nebula.yml . | ||
|
||
# COMPILE PROTO FROM ROOT CONTEXT! | ||
# This makes relative imports (from . import faq_pb2) work inside grpc_stubs | ||
WORKDIR /app | ||
|
||
# Compile the .proto file into grpc_stubs | |
RUN poetry run python -m grpc_tools.protoc \ | ||
-I=nebula/protos \ | ||
--python_out=nebula/grpc_stubs \ | ||
--grpc_python_out=nebula/grpc_stubs \ | ||
nebula/protos/faq.proto && \ | ||
echo "✅ Proto compiled!" && \ | ||
ls -l nebula/grpc_stubs | ||
|
||
RUN sed -i 's/^import faq_pb2/import nebula.grpc_stubs.faq_pb2/' nebula/grpc_stubs/faq_pb2_grpc.py | ||
|
||
|
||
# Debug output: list grpc_stubs content | ||
RUN echo "Contents of grpc_stubs:" && ls -l nebula/grpc_stubs | ||
|
||
# Ensure grpc_stubs is a proper Python package | ||
RUN touch nebula/grpc_stubs/__init__.py | ||
|
||
# Set PYTHONPATH so packages can be resolved | ||
ENV PYTHONPATH=/app | ||
|
||
# Expose the gRPC port | ||
EXPOSE 50052 | ||
|
||
# Start the FAQ gRPC server | ||
CMD ["poetry", "run", "python", "nebula/faq/faq_server.py"] |
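The `sed` step in this Dockerfile rewrites the top-level `import faq_pb2` line that `grpc_tools.protoc` writes into `faq_pb2_grpc.py`, so that the stub resolves as part of the `nebula.grpc_stubs` package. As a rough illustration, the same substitution can be sketched in Python. The `rewrite_stub_import` helper is a hypothetical name, and the sample input is a hand-written stand-in for the generated file.

```python
import re


def rewrite_stub_import(source, package="nebula.grpc_stubs"):
    """Turn a line starting with 'import faq_pb2' into a package-qualified import.

    Mirrors: sed -i 's/^import faq_pb2/import nebula.grpc_stubs.faq_pb2/'
    Like sed, only the first match on each line is replaced.
    """
    return re.sub(
        r"^import faq_pb2",
        f"import {package}.faq_pb2",
        source,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    # Stand-in for the first lines of a generated faq_pb2_grpc.py.
    generated = "import grpc\nimport faq_pb2 as faq__pb2\n"
    print(rewrite_stub_import(generated))
```

Because the pattern is anchored at the start of a line, imports that are indented or written in another form are left untouched, which matches the behavior of the `sed` expression.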
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# docker/gateway/Dockerfile | ||
FROM python:3.12-slim | ||
|
||
WORKDIR /app | ||
|
||
# Install Poetry + dependencies | ||
COPY pyproject.toml poetry.lock ./ | ||
RUN pip install poetry && poetry install --only main --no-root | ||
|
||
# Copy source code (except protos and stubs) | |
COPY src/nebula ./nebula | ||
COPY application_local.nebula.yml ./ | ||
|
||
# Copy proto definitions | ||
COPY src/nebula/protos ./nebula/protos | ||
|
||
# Compile proto to grpc_stubs. This should be done in the compose later on | ||
RUN poetry run python -m grpc_tools.protoc \ | ||
-I=nebula/protos \ | ||
--python_out=nebula/grpc_stubs \ | ||
--grpc_python_out=nebula/grpc_stubs \ | ||
nebula/protos/faq.proto && \ | ||
echo "✅ Proto compiled!" && \ | ||
ls -l nebula/grpc_stubs | ||
|
||
|
||
RUN sed -i 's/^import faq_pb2/import nebula.grpc_stubs.faq_pb2/' nebula/grpc_stubs/faq_pb2_grpc.py | ||
|
||
|
||
# Set Python path so modules are discoverable | ||
ENV PYTHONPATH=/app | ||
|
||
# Start FastAPI (or later combined with gRPC) | |
CMD ["poetry", "run", "uvicorn", "nebula.gateway.main:app", "--host", "0.0.0.0", "--port", "8000"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# Use Python 3.12 slim image | ||
FROM python:3.12-slim | ||
|
||
# System dependencies | ||
RUN apt-get update && apt-get install -y \ | ||
ffmpeg \ | ||
tesseract-ocr \ | ||
libgl1 \ | ||
libglib2.0-0 \ | ||
&& rm -rf /var/lib/apt/lists/* | ||
|
||
# Set working directory inside the container | ||
WORKDIR /app | ||
|
||
# Copy dependency files first | ||
COPY pyproject.toml poetry.lock ./ | ||
|
||
# Install poetry | ||
RUN pip install poetry | ||
|
||
# Install only main dependencies | ||
RUN poetry install --only main --no-root | ||
|
||
# Copy source code and config files | ||
COPY src/nebula ./src/nebula | ||
COPY application_local.nebula.yml . | ||
COPY llm_config.nebula.yml . | ||
|
||
# Set PYTHONPATH so Python can find src/nebula | ||
ENV PYTHONPATH=/app/src | ||
|
||
# Expose FastAPI port | ||
EXPOSE 5000 | ||
|
||
# Run app | ||
CMD ["poetry", "run", "uvicorn", "nebula.transcript.app:app", "--host", "0.0.0.0", "--port", "5000"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
static_resources: | ||
listeners: | ||
# HTTP listener (for services like FastAPI) | ||
- name: listener_http | ||
address: | ||
socket_address: { address: 0.0.0.0, port_value: 8000 } | ||
filter_chains: | ||
- filters: | ||
- name: envoy.filters.network.http_connection_manager | ||
typed_config: | ||
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager | ||
stat_prefix: ingress_http | ||
route_config: | ||
name: http_route | ||
virtual_hosts: | ||
- name: fastapi | ||
domains: ["*"] | ||
routes: | ||
- match: { prefix: "/transcribe" } | ||
route: { cluster: transcriber } | ||
|
||
# Add new HTTP service routes here if needed | ||
# - match: { prefix: "/another-api" } | ||
# route: { cluster: another_http_service } | ||
|
||
http_filters: | ||
- name: envoy.filters.http.router | ||
typed_config: | ||
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router | ||
|
||
# gRPC listener on port 50051 | ||
- name: listener_grpc | ||
address: | ||
socket_address: | ||
address: 0.0.0.0 | ||
port_value: 50051 | ||
filter_chains: | ||
- filters: | ||
- name: envoy.filters.network.http_connection_manager | ||
typed_config: | ||
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager | ||
codec_type: AUTO | ||
stat_prefix: ingress_grpc | ||
route_config: | ||
name: grpc_route | ||
virtual_hosts: | ||
- name: grpc_services | ||
domains: ["*"] | ||
routes: | ||
- match: | ||
prefix: "/de.tum.cit.aet.artemis.nebula.FAQService/" | ||
route: | ||
cluster: faq | ||
|
||
# Add a new gRPC service route by its fully-qualified service name | ||
# - match: | ||
# prefix: "/lectureservice.LectureService/" | ||
# route: | ||
# cluster: lecture | ||
|
||
http_filters: | ||
- name: envoy.filters.http.router | ||
typed_config: | ||
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router | ||
|
||
clusters: | ||
# Cluster for the FastAPI-based transcriber service | ||
- name: transcriber | ||
connect_timeout: 0.5s | ||
type: logical_dns | ||
lb_policy: round_robin | ||
load_assignment: | ||
cluster_name: transcriber | ||
endpoints: | ||
- lb_endpoints: | ||
- endpoint: | ||
address: | ||
socket_address: | ||
address: transcriber | ||
port_value: 5000 | ||
|
||
# Cluster for the FAQ gRPC service | ||
- name: faq | ||
connect_timeout: 0.5s | ||
type: logical_dns | ||
lb_policy: round_robin | ||
http2_protocol_options: {} # Required for gRPC | ||
load_assignment: | ||
cluster_name: faq | ||
endpoints: | ||
- lb_endpoints: | ||
- endpoint: | ||
address: | ||
socket_address: | ||
address: nebula-faq | ||
port_value: 50052 | ||
|
||
# Add a new gRPC cluster for another microservice (e.g., lecture) | ||
# - name: lecture | ||
# connect_timeout: 0.5s | ||
# type: logical_dns | ||
# lb_policy: round_robin | ||
# http2_protocol_options: {} # Required for gRPC | ||
# load_assignment: | ||
# cluster_name: lecture | ||
# endpoints: | ||
# - lb_endpoints: | ||
# - endpoint: | ||
# address: | ||
# socket_address: | ||
# address: nebula-lecture # Docker container name or DNS | ||
# port_value: 50052 # Port used by the lecture gRPC server | ||
|
||
admin: | ||
access_log_path: /tmp/envoy_admin.log | ||
address: | ||
socket_address: | ||
address: 0.0.0.0 | ||
port_value: 9901 |
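The two listeners above route by path prefix: the HTTP listener on port 8000 forwards `/transcribe` to the `transcriber` cluster, and the gRPC listener on port 50051 forwards calls whose path starts with the fully-qualified service name to the `faq` cluster. The sketch below models that lookup in plain Python to make the rule explicit. It is a simplified illustration, not Envoy's implementation; `LISTENER_ROUTES`, `match_cluster`, and the method name `SomeMethod` are hypothetical.

```python
# Listener port -> ordered list of (path prefix, cluster), copied from the config above.
LISTENER_ROUTES = {
    8000: [("/transcribe", "transcriber")],
    50051: [("/de.tum.cit.aet.artemis.nebula.FAQService/", "faq")],
}


def match_cluster(port, path, listener_routes=LISTENER_ROUTES):
    """Return the cluster of the first route whose prefix matches, else None.

    Routes are checked in the order they are listed, so the first match wins.
    None stands for "no route matched" (Envoy would answer with a 404).
    """
    for prefix, cluster in listener_routes.get(port, []):
        if path.startswith(prefix):
            return cluster
    return None


if __name__ == "__main__":
    # gRPC request paths have the form /<package>.<Service>/<Method>.
    grpc_path = "/de.tum.cit.aet.artemis.nebula.FAQService/SomeMethod"
    print(match_cluster(50051, grpc_path))
    print(match_cluster(8000, "/transcribe/start"))
    print(match_cluster(8000, "/unknown"))
```

Since gRPC request paths begin with the service name, adding another gRPC service only needs one more prefix entry and one more cluster, as the commented-out `lecture` example in the config suggests.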
💡 Verification agent
🧩 Analysis chain

Verify the necessity of the `--ignore-errors` flag.

While the `--ignore-errors` flag improves CI robustness, it might mask legitimate coverage issues. Consider monitoring if this flag is actually needed or if the underlying issues can be resolved.

To better understand if this flag is necessary, you could run the following to check for common coverage issues:

🏁 Script executed:
Length of output: 400

Remove the unnecessary `--ignore-errors` flag from coverage XML generation

No coverage configuration was found (no `.coveragerc` and no `[tool.coverage]` section in `nebula/pyproject.toml`), so `coverage xml` should succeed by default. The `--ignore-errors` flag can mask real issues and isn’t needed unless you’re actively seeing failures in CI.

`.github/workflows/nebula_test.yml`, line 91

Suggested diff:

If you do encounter failures after removing the flag, consider adding a proper coverage config (`.coveragerc` or `[tool.coverage]` in `pyproject.toml`) rather than silencing errors.