YAOCr is a RAG (retrieval-augmented generation) tool for Windows desktop (WinUI 3) that lets you chat with an LLM and query your own documents (.txt, .json, ...).
- LLM Backend: Llama.cpp
- Vector DB: Qdrant
- LLM model: Gemma 3
- Embedding model: Embedding Gemma
- Replace `d:\llama.cpp\models` with the path where your models are saved
## Llama.cpp

```
docker run --name llama_cpp -p 8010:8010 -v d:\llama.cpp\models:/models ghcr.io/ggml-org/llama.cpp:server -m /models/gemma-3-4b-it-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8010
```
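To verify the chat server is up, you can hit llama.cpp's built-in HTTP API. This is just a smoke test, assuming a bash-style shell (e.g. WSL or Git Bash) and the default port from the command above:

```
# Health check (llama.cpp server exposes a /health endpoint)
curl http://localhost:8010/health

# Test chat request via the OpenAI-compatible endpoint
curl http://localhost:8010/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64}'
```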
## Llama.cpp GPU version

```
docker run --name llama_cpp_gpu -p 8010:8010 -v d:\llama.cpp\models:/models --gpus all ghcr.io/ggml-org/llama.cpp:server-cuda -m /models/gemma-3-4b-it-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8010 --n-gpu-layers 99
```
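If the container starts but inference is still slow, check that the layers were actually offloaded; llama.cpp reports the offload count in its startup log. A quick check, assuming the container name from the command above (the NVIDIA Container Toolkit must be installed for `--gpus all` to work):

```
# llama.cpp logs how many layers were offloaded to the GPU at startup
docker logs llama_cpp_gpu 2>&1 | grep -i offload
```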
## Embeddings

```
docker run --name embeddings -p 8001:8001 -v d:\llama.cpp\models:/models ghcr.io/ggml-org/llama.cpp:server -m /models/embeddinggemma-300M-BF16.gguf -c 2048 -ub 2048 --host 0.0.0.0 --port 8001 --embeddings -ngl 99
```
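A quick smoke test for the embedding server, again assuming a bash-style shell; the response should contain a float vector (EmbeddingGemma produces 768-dimensional embeddings):

```
# Request an embedding via llama.cpp's OpenAI-compatible endpoint
curl http://localhost:8001/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world"}'
```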
## Embeddings GPU version

```
docker run --name embeddings_gpu -p 8001:8001 -v d:\llama.cpp\models:/models --gpus all ghcr.io/ggml-org/llama.cpp:server-cuda -m /models/embeddinggemma-300M-BF16.gguf -c 2048 -ub 2048 --host 0.0.0.0 --port 8001 --embeddings -ngl 99
```
## Qdrant

See the [Qdrant quickstart](https://qdrant.tech/documentation/quickstart/).

```
docker run -p 6333:6333 -p 6334:6334 -v "$(pwd)/qdrant_storage:/qdrant/storage:z" qdrant/qdrant
```
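To confirm Qdrant is reachable, you can create a test collection sized for EmbeddingGemma's 768-dimensional vectors via Qdrant's REST API. The collection name `documents` is only an example here, not something YAOCr requires:

```
# Create a collection with 768-dim vectors and cosine distance
curl -X PUT http://localhost:6333/collections/documents \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 768, "distance": "Cosine"}}'
```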
[2025-11-25] Document parsing has been externalized to plugins. See the YAOCr.Plugins.PlainText project as a reference implementation.
Demo: YAOCr_demo.webm