Hi,
I’d like to contribute by adding support for serving a local embedding model.
The idea is to serve the embedding model via a vLLM container.
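To make the proposal concrete, here is a minimal sketch of how a client could query the served model, assuming the embedding model is already running behind vLLM's OpenAI-compatible server (for example, started from the `vllm/vllm-openai` Docker image with `--task embed` on a recent vLLM version). The model name, port, and API key below are placeholders, not project decisions:

```python
# Minimal sketch (assumption): an embedding model is already served locally, e.g. via
#   docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
#       --model <korean-embedding-model> --task embed
# Model name, port, and API key are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.embeddings.create(
    model="<korean-embedding-model>",     # hypothetical placeholder
    input=["안녕하세요", "로컬 임베딩 테스트 문장입니다."],
)

vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))      # number of inputs, embedding dimension
```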
Specifically, I’m thinking of adding support for small Korean embedding models that can run on low-spec GPUs, such as:
Would this be a welcome contribution? Please let me know if there are any guidelines or preferences I should follow.
Thanks!