Description
We are very interested in integrating open-source, self-hosted multi-modal models into LLMWare. We have been watching the space closely and are looking for ideas and contributions to support multi-modal models that work in conjunction with RAG and Agent-based automation pipelines.
Our key criteria are that the model must address a use case tied to a business objective (e.g., not just image generation), must work reasonably well, and must be self-hostable (e.g., a maximum of 10-15B parameters).
To implement this, the key focus will be the construction of a new MultiModal model class and the design of the preprocessors and postprocessors required to handle multi-modal content, along with support for the underlying model packaging (e.g., GGUF, PyTorch, ONNX, OpenVINO). We would look to collaborate and will support the underlying inferencing technology required.
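As a rough illustration of the shape this could take, here is a minimal Python sketch of a MultiModal model class with pluggable preprocessors, an inference backend, and a postprocessor. All class and method names here are illustrative assumptions for discussion, not part of the existing llmware API.

```python
# Hypothetical sketch only -- names below are assumptions for discussion,
# not the existing llmware API or a committed design.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class MultiModalInput:
    """Container pairing text with optional non-text content (e.g., image bytes)."""
    text: str
    images: List[bytes] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


class MultiModalModel:
    """Sketch of a model wrapper that routes multi-modal content through
    modality-specific preprocessors, a pluggable inference backend
    (e.g., GGUF, PyTorch, ONNX, OpenVINO), and a postprocessor."""

    def __init__(self,
                 backend: Callable[[Dict[str, Any]], Dict[str, Any]],
                 preprocessors: Optional[Dict[str, Callable]] = None,
                 postprocessor: Optional[Callable[[Dict[str, Any]], str]] = None):
        self.backend = backend                      # underlying inference engine
        self.preprocessors = preprocessors or {}    # keyed by modality, e.g. "image"
        self.postprocessor = postprocessor or (lambda out: out.get("text", ""))

    def inference(self, prompt: MultiModalInput) -> str:
        # 1. Preprocess each modality into backend-ready inputs.
        payload: Dict[str, Any] = {"text": prompt.text}
        if prompt.images and "image" in self.preprocessors:
            payload["images"] = [self.preprocessors["image"](img) for img in prompt.images]

        # 2. Run the packaged model (GGUF / PyTorch / ONNX / OpenVINO behind this callable).
        raw_output = self.backend(payload)

        # 3. Postprocess into plain text usable by RAG / Agent pipelines.
        return self.postprocessor(raw_output)


# Example wiring with stub components (no real model is loaded here):
if __name__ == "__main__":
    def stub_backend(payload):
        n = len(payload.get("images", []))
        return {"text": f"described {n} image(s) for: {payload['text']}"}

    model = MultiModalModel(backend=stub_backend,
                            preprocessors={"image": lambda b: b[:16]})
    result = model.inference(MultiModalInput(text="summarize this invoice",
                                             images=[b"\x89PNG..."]))
    print(result)
```

The intent of keeping the backend and the pre/postprocessors as separate pluggable pieces is that the same class could wrap different model packagings while the RAG and Agent pipelines only ever see plain text in and out.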