Summary
vLLM-Omni already provides the multi-stage serving framework with pluggable connectors, Cache-DiT acceleration, and OpenAI-compatible APIs. AIBrix provides Kubernetes-native inference infrastructure with gateway routing, autoscaling, PrisKV KV cache store, and StormService orchestration.
This proposal integrates them deeply — making AIBrix the best platform to run multi-modal models powered by vLLM-Omni.
Motivation
Multi-modal AI is moving from research demos to production workloads. Applications now combine text chat, image generation, video generation, speech recognition (ASR), and text-to-speech (TTS) in a single user experience. Serving these workloads efficiently requires solving problems that neither standalone LLM inference nor single-model diffusion serving addresses:
- Heterogeneous pipeline stages — An omni pipeline chains ASR (1.7B params, lightweight) → LLM (235B params, compute-heavy prefill, memory-bound decode) → DiT (burst GPU for diffusion steps) → TTS (real-time audio streaming). Each stage has fundamentally different resource profiles.
- Inter-stage data transfer — KV caches, visual tokens, and audio embeddings must flow between stages with minimal latency. CPU-staged copies are a bottleneck.
- Independent scaling — Image generation traffic spikes don't correlate with text chat load. Scaling the entire pipeline uniformly wastes GPUs.
- GPU cost — A naive deployment dedicates one GPU per model. A full omni pipeline (ASR + LLM + DiT + TTS) therefore needs a minimum of four GPUs, even when most models sit idle most of the time.
- Cloud-native orchestration — Production deployments run on Kubernetes. Ray adds a second distributed runtime on top of K8s, increasing operational complexity.
Proposed Change
TODO: a detailed design proposal will follow.
Alternatives Considered
No response