The AI-Box is an encapsulated solution for general AI tasks, including:
- Generative AI with function calling (in progress)
- Proxy to OpenAI
- Open-source models:
  - gorilla-openfunctions-v2
- Audio processing, including:
  - Segmentation
  - Diarization
- Real-time audio stream processing (in progress)
To build the Docker image locally:

```shell
docker build -t ai-box:latest .
```
To run the Docker container, use the following command:

```shell
docker run -it \
  -v local_path_for_video_files:/usr/src/app/download \
  -p 8765:8765 \
  ghcr.io/symfa-inc/ai-box:latest
```
Mount your local directory to the container's `/usr/src/app/download` directory so that the transcriber can access the files for processing.
Alternatively, run the service with Docker Compose:

```yaml
version: "3.8"
services:
  aibox:
    image: ghcr.io/symfa-inc/ai-box:latest
    container_name: aibox
    ports:
      - "8765:8765"
    environment:
      SPEAKER: segmentation
      MODE: CPU
      QUALITY: LOW
      PARALLELISM: 1
    volumes:
      - local_path_for_video_files:/usr/src/app/download
```
You can change the server's parameters to find the optimal performance/quality compromise for your solution:

- `SPEAKER`: `segmentation` or `diarization`. Diarization is better, as it can tell you who said what and when; segmentation only splits the audio into segments by speaker, but it can be a better choice for CPU processing.
- `MODE`: `CPU` or `GPU`.
- `QUALITY`: transcription quality level.
  - `DEBUG`: not an acceptable level of quality for most cases, but can be useful in debug environments.
  - `LOW`: the optimal level for CPU.
  - `MEDIUM`
  - `HIGH`
- `PARALLELISM`: integer, default `1`. How many files the transcriber can process in parallel.
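As a sketch, the allowed values above can be sanity-checked before starting the container. The helper below is hypothetical (not part of AI-Box); it simply encodes the options this README lists:

```python
# Hypothetical sanity check for AI-Box environment settings.
# The allowed values are taken from the parameter list above.
ALLOWED_ENV = {
    "SPEAKER": {"segmentation", "diarization"},
    "MODE": {"CPU", "GPU"},
    "QUALITY": {"DEBUG", "LOW", "MEDIUM", "HIGH"},
}

def check_env(env):
    """Raise ValueError if a known setting has an unsupported value."""
    for key, allowed in ALLOWED_ENV.items():
        if key in env and env[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    parallelism = env.get("PARALLELISM", "1")
    if not (parallelism.isdigit() and int(parallelism) >= 1):
        raise ValueError("PARALLELISM must be a positive integer")
```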
All requests should follow the JSON format. Below is an example of a request to process a file:
```json
{
  "file_path": "video.mp4",
  "speaker": "diarization",
  "mode": "cpu",
  "quality": "medium"
}
```
- `file_path` (required): Specifies the path to the input file.
- `speaker` (optional): Specifies the desired speaker processing, either `diarization` or `segmentation`.
- `mode` (optional): Specifies the processing mode, either `gpu` or `cpu`.
- `quality` (optional): Specifies the quality of processing, one of `debug`, `low`, `medium`, or `high`.
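For illustration, a client could build and validate the request payload before sending it. `build_request` below is a hypothetical helper that only encodes the field names and values documented above:

```python
import json

# Allowed values for the optional request fields, as documented above.
ALLOWED = {
    "speaker": {"diarization", "segmentation"},
    "mode": {"gpu", "cpu"},
    "quality": {"debug", "low", "medium", "high"},
}

def build_request(file_path, **options):
    """Return the JSON text for an AI-Box processing request."""
    payload = {"file_path": file_path}
    for key, value in options.items():
        if key not in ALLOWED:
            raise ValueError(f"unknown option: {key}")
        if value not in ALLOWED[key]:
            raise ValueError(f"invalid {key!r} value: {value!r}")
        payload[key] = value
    return json.dumps(payload)
```

For example, `build_request("video.mp4", quality="medium")` produces a payload with only the fields you supplied.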
Responses from the server are also in JSON format. Below is an example response to the processing request:
```json
{
  "type": "recording_processed",
  "file_name": "video.wav",
  "data": "{transcriptionText}"
}
```
- `type`: Indicates the status of the processing. It can be `recording_queued` when the file is queued, `recording_processed` when the file is processed successfully, or `recording_errored` if processing failed.
- `file_name`: Specifies the name of the input file.
- `data` (optional): Contains additional information. If processing is successful, it contains the result of the processing; if processing fails, it contains an error message explaining the failure.
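As a sketch, a client might dispatch on the `type` field like this; `handle_response` is a hypothetical helper, not part of AI-Box:

```python
import json

def handle_response(raw):
    """React to each AI-Box response type described above (sketch)."""
    msg = json.loads(raw)
    kind = msg["type"]
    if kind == "recording_queued":
        return f"queued: {msg['file_name']}"
    if kind == "recording_processed":
        return msg.get("data", "")  # the transcription text
    if kind == "recording_errored":
        raise RuntimeError(msg.get("data", "unknown error"))
    raise ValueError(f"unexpected response type: {kind}")
```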
Example client in Node.js (using the `ws` package):

```javascript
const WebSocket = require('ws');

// Connect to the WebSocket server
const ws = new WebSocket('ws://localhost:8765');

ws.on('open', function open() {
  console.log('Connected to server');
  // Send a request to process the file. The video file is expected
  // to be in {local_path_for_video_files}.
  ws.send(JSON.stringify({ file_path: 'video.mp4' }));
});

ws.on('message', function incoming(message) {
  // e.g. {"type": "recording_processed", "file_name": "video.wav", "data": "..."}
  const response = JSON.parse(message);
  console.log('Message from server: %s', response.data);
});
```
Example client in Python (using the `websockets` package):

```python
import asyncio
import json

import websockets


async def talk():
    uri = "ws://localhost:8765"
    async with websockets.connect(uri) as websocket:
        print("Connected to server")
        message = {
            "file_path": "video.mp4",
            "speaker": "segmentation",
            "quality": "medium",
        }
        await websocket.send(json.dumps(message))
        response = await websocket.recv()
        response_json = json.loads(response)
        # e.g. {"type": "recording_processed", "file_name": "video.wav", "data": "..."}
        print(f"Message from server: {response_json['data']}")


asyncio.run(talk())
```