Ollama-RAG-hpp

A lightweight Retrieval-Augmented Generation (RAG) system in C++ using Ollama-hpp for local language model inference and embedding-based retrieval.

Features

Uses Ollama-hpp, a modern C++ wrapper for the Ollama API.
Implements vector-based text retrieval using cosine similarity.
Supports embedding generation and retrieval of relevant knowledge before LLM interaction.
Provides streaming chat responses for real-time AI interactions.

Why C++ for RAG?

While most Retrieval-Augmented Generation (RAG) implementations are written in Python, C++ offers significant advantages:

Performance & Efficiency – C++ eliminates Python's runtime overhead, making inference and retrieval faster.
Lower Memory Usage – Direct memory control avoids Python's garbage collection inefficiencies.
Real-time & Embedded AI – Ideal for applications in gaming, robotics, and edge computing.
Scalability – Faster execution makes it more suitable for high-load production environments.
No Python Dependency – Runs natively without needing Python-based AI stacks.

With Ollama-RAG-hpp, C++ developers can now leverage the power of local LLMs with efficient retrieval, proving that AI is not just for Python.

Installation

Requirements:

C++11 or later.
Ollama server running (sudo systemctl status ollama)
A compatible LLM and embedding model (e.g., llama3.2-latest, bge-base-en-v1.5-gguf)

Prerequisite:

Install Ollama.
LLM ollama pull llama3.2 (Or modify file for different model.)
A dedicated embedder. ollama pull hf.co/CompendiumLabs/bge-base-en-v1.5-gguf Or any LLM with embedding ability.

Setup:

Clone the repository:
Download Ollama-hpp from https://github.com/jmont-dev/ollama-hpp and place it in the project folder:
Compile the program:

Run it

Usage:

Load a dataset: Place a text file (cat-facts.txt) in the same directory. The program will embed and store the text for retrieval.
Ask a question: When prompted, enter a query. The system will retrieve relevant information and generate a response using Ollama.

Example

Ask me a question (quit to exit): How fast do cat run?
Retrieved knowledge:
  - (similarity: 0.803109) A cat can travel at a top speed of approximately 31 mph (49 km) over a short distance.
Chatbot respond:
Cats can travel at a top speed of approximately 31 mph (49 km) over a short distance, typically when they are chasing prey or running to catch up with something. However, their average running speed is usually much slower, around 5-8 mph (8-13 km/h), depending on the individual cat and its age, size, and health.

License

This project is licensed under the MIT License—you are free to use, modify, and distribute it, but attribution is appreciated.

Future Improvements

Support for different embedding distance metrics (Euclidean, Manhattan, etc.)
Multithreading for faster retrieval and generation
Integration with a vector database (e.g., FAISS, SQLite)

This project is among the first C++ RAG implementations—contributions and feedback are welcome!

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
bin		bin
include		include
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
example.cpp		example.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Ollama-RAG-hpp

Features

Why C++ for RAG?

Installation

Requirements:

Prerequisite:

Setup:

Run it

Usage:

Example

License

Future Improvements

About

Uh oh!

Releases

Packages

Languages

License

JG-Adams/Ollama-RAG-hpp

Folders and files

Latest commit

History

Repository files navigation

Ollama-RAG-hpp

Features

Why C++ for RAG?

Installation

Requirements:

Prerequisite:

Setup:

Run it

Usage:

Example

License

Future Improvements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages