Replies: 2 comments 1 reply
-
@sujee please feel free to tag in anyone else you think could help with this discussion.
-
Hi Steve, I am also working on adding vLLM support as a contribution, and it seems non-trivial (especially for CPU-only machines). Were you able to add vLLM to Allycat? Also, is there a specific reason for using vLLM instead of Ollama (perhaps you want local GPU inference)? Ollama seems to work well for inference on local CPU-only machines (correct me if I am wrong). I have looked into adding vLLM and have noted my thoughts here. Any help is appreciated.
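For what it's worth, one option that might sidestep the backend choice entirely: both Ollama and vLLM can expose an OpenAI-compatible HTTP API, so the client side could stay backend-agnostic and only the server you run would change. Here is a minimal sketch, not Allycat's actual wiring; the base URLs are the servers' usual defaults and the model name is just an example:

```python
# Minimal sketch (assumed setup, not Allycat's actual code): both Ollama
# and vLLM can serve an OpenAI-compatible API, so the same client code
# can talk to either backend by changing base_url.
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint (its default port is 11434).
# For a local vLLM server, this would typically be "http://localhost:8000/v1".
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.2",  # example model name; use whatever your server has loaded
    messages=[{"role": "user", "content": "Summarize what Allycat does."}],
)
print(response.choices[0].message.content)
```

If that pattern works for the maintainers, vLLM support might mostly reduce to documenting how to start the server, rather than adding new client code.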
-
Hey folks - I just spun up the default Allycat with Docling and several different open-weights models. Kudos to those who have contributed before: it basically worked right out of the box, with a nice, simple flow. I also love that you included a pure-code version of every Jupyter notebook.
A couple of questions:
a. Is that welcome or should I just leave it in my own fork?
b. If it is welcome, how should I approach contributing it back?