Ollama sucks:

1. Chat templates are broken most of the time.
2. What versioning? There is none.
3. It doesn't add any features of its own.

With [llama-server](https://github.com/ggerganov/llama.cpp/tree/master/examples/server) we can do the same as Ollama. For chat, prefer to start with the [ChatLlamaCpp](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/llamacpp.py) integration.
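As a minimal sketch of replacing Ollama with llama-server: the server exposes an OpenAI-compatible chat endpoint, so plain stdlib HTTP is enough to talk to it. The URL and port below are assumptions (they match llama-server's defaults, e.g. after `llama-server -m model.gguf --port 8080`).

```python
import json
import urllib.request

# Assumed address of a locally running llama-server instance; adjust as needed.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(user_message, temperature=0.7):
    """Build an OpenAI-style chat-completion payload accepted by llama-server."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }

def chat(user_message):
    """POST the payload to llama-server and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(user_message)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires a llama-server instance running at SERVER_URL.
    print(chat("Hello!"))
```

ChatLlamaCpp from `langchain_community` instead loads a GGUF file directly through llama-cpp-python, which is convenient when you don't want a separate server process.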