This repo contains several ways to serve OpenAI's Whisper speech-recognition model:
- With a KServe model using Hugging Face (`transformers`) pipelines
- With a KServe model using vLLM
- With Triton Inference Server using vLLM (not yet mature)

It also includes an example of fetching input files from S3 as part of preprocessing.
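The S3 step can be sketched roughly as below. This is not the repo's implementation. It assumes a KServe v1-style request body (`{"instances": [...]}`) where each instance carries a hypothetical `audio_uri` field. The S3 client is passed in so the logic stays testable. In practice this would be a boto3 client (`boto3.client("s3")`), whose `get_object(Bucket=..., Key=...)` returns a dict with a streaming `Body`. In a KServe model, a function like this would typically be called from the model's `preprocess` hook before the audio reaches the Whisper pipeline.

```python
from typing import Any, Protocol


class S3Client(Protocol):
    """Minimal subset of a boto3-style S3 client used by this sketch."""

    def get_object(self, *, Bucket: str, Key: str) -> dict[str, Any]: ...


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/file.wav' into ('bucket', 'path/to/file.wav').

    Splits manually instead of using urllib so that keys containing
    '?' or '#' are kept intact.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key in URI: {uri!r}")
    return bucket, key


def fetch_audio_bytes(uri: str, client: S3Client) -> bytes:
    """Download a single S3 object and return its raw bytes."""
    bucket, key = parse_s3_uri(uri)
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def preprocess(payload: dict[str, Any], client: S3Client) -> list[bytes]:
    """Resolve each instance's 'audio_uri' (hypothetical field) into audio bytes."""
    return [fetch_audio_bytes(inst["audio_uri"], client) for inst in payload["instances"]]
```

With boto3 installed and credentials configured, usage might look like `preprocess({"instances": [{"audio_uri": "s3://my-bucket/sample.wav"}]}, boto3.client("s3"))`. The bucket and key in that call are placeholders. The returned bytes could then be decoded and passed to the ASR pipeline.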