Description
🚀 Feature
Support websocket endpoints to allow two-way real-time data communication.
Motivation
Currently, requests are processed with the expectation that the data is complete and stateless. However, the input isn't always available all at once for use cases like speech-to-text, text-to-speech, and audio/speech understanding, especially in time-sensitive situations. With the recent release of the Realtime API from OpenAI and a new family of voice AI models (ultravox, mini-omni, llama-omni, moshi), support for streaming input and output could benefit the community in many ways and unlock even more creative uses of AI models.
Pitch
Support streaming input and output via websocket (or any other mechanism) to enable real-time AI applications.
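For a sense of what this could look like from the client side, here is a rough sketch, assuming a hypothetical `/ws` endpoint that accepts raw audio chunks and streams back partial results as JSON. The endpoint, the chunking scheme, and the reply format are all assumptions for illustration, not an existing API; it uses the third-party `websockets` package.

```python
import asyncio
import json

import websockets  # pip install websockets


async def stream_audio(path: str, url: str = "ws://localhost:8000/ws"):
    async with websockets.connect(url) as ws:
        with open(path, "rb") as f:
            # Send the audio in small chunks instead of one complete request.
            while chunk := f.read(3200):  # ~100 ms of 16 kHz 16-bit mono audio
                await ws.send(chunk)
                reply = json.loads(await ws.recv())
                print(reply)  # e.g. a partial transcript


asyncio.run(stream_audio("speech.wav"))
```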
Alternatives
A typical FastAPI websocket implementation is mostly boilerplate:

```python
import logging

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from starlette.websockets import WebSocketState

logger = logging.getLogger(__name__)
app = FastAPI()


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            message = await websocket.receive_bytes()
            # process data, e.g.
            # results = model(parse(message))
            results = {"num_bytes": len(message)}  # placeholder
            await websocket.send_json(results)
    except WebSocketDisconnect:
        logger.error("WebSocket disconnected")
    except Exception as e:
        logger.error(f"Error: {e}")
        if websocket.client_state != WebSocketState.DISCONNECTED:
            await websocket.close(code=1001)
    finally:
        # clean up per-connection state here
        pass
```
However, this per-connection loop might make batching impossible, or at least complicated.
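One possible workaround, purely a rough sketch and not something the server currently provides, is to decouple the websocket handlers from inference: every connection pushes its incoming frames into a shared asyncio queue, and a single background worker groups pending frames into batches before calling the model. The names `Job`, `batch_queue`, `batching_worker`, and `model_batch` below are hypothetical placeholders.

```python
import asyncio
from dataclasses import dataclass, field

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


def model_batch(chunks: list[bytes]) -> list[dict]:
    # Hypothetical stand-in for the real model: one result per input chunk.
    return [{"num_bytes": len(c)} for c in chunks]


@dataclass
class Job:
    payload: bytes
    result: asyncio.Future = field(default_factory=asyncio.Future)


# Shared queue that every websocket connection feeds into.
batch_queue: asyncio.Queue = asyncio.Queue()


async def batching_worker(max_batch: int = 8, max_wait: float = 0.01):
    """Collect pending jobs from all connections and run the model once per batch."""
    loop = asyncio.get_running_loop()
    while True:
        jobs = [await batch_queue.get()]
        deadline = loop.time() + max_wait
        while len(jobs) < max_batch:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                jobs.append(await asyncio.wait_for(batch_queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        for job, res in zip(jobs, model_batch([j.payload for j in jobs])):
            job.result.set_result(res)


@app.on_event("startup")
async def start_worker():
    asyncio.create_task(batching_worker())


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            job = Job(payload=await websocket.receive_bytes())
            await batch_queue.put(job)
            # Wait for this connection's slice of the batched result,
            # then stream it straight back to the client.
            await websocket.send_json(await job.result)
    except WebSocketDisconnect:
        pass
```

The trade-off is the usual latency/throughput knob: `max_wait` bounds how long a chunk sits in the queue before the worker stops waiting for a fuller batch.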
I am new to this repo, so if there is a workaround for enabling websockets by hacking the server/spec/API, I would be more than happy to contribute. If this is a duplicate or irrelevant, sorry for the trouble.
Thanks a million for open sourcing this awesome project. ❤️