Simple, local, and free RAG using Python, ChromaDB, and an Ollama server to ingest your PDFs and answer your questions.
RAG stands for Retrieval-Augmented Generation. The idea here is to have an Ollama server running locally via Docker (instead of an online service such as OpenAI or Gemini) and to use your local PDFs as context when answering your questions.
There is also a test script where the model validates its own result, using a prompt like this:
```
Expected Response: {expected_response}
Actual Response: {actual_response}
---
(Answer with 'true' or 'false') Does the actual response match the expected response?
```
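This is roughly what such a self-validating test can look like in pytest. A minimal sketch, assuming query_data.py exposes a `query_rag` helper (an assumption; adjust the import to the repo's actual API):

```python
# test_rag.py -- hedged sketch of a self-validating test.
from langchain_community.llms import Ollama

from query_data import query_rag  # assumed helper; adapt to the actual code

EVAL_PROMPT = """
Expected Response: {expected_response}
Actual Response: {actual_response}
---
(Answer with 'true' or 'false') Does the actual response match the expected response?
"""

def test_nba_winner_2021():
    expected = "The Milwaukee Bucks won the NBA championship in 2021."
    actual = query_rag("Who was the NBA winner in 2021?")

    # Ask the model itself to judge whether the answers match.
    verdict = Ollama(model="mistral").invoke(
        EVAL_PROMPT.format(expected_response=expected, actual_response=actual)
    )
    assert "true" in verdict.strip().lower()
```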
Start the Ollama server:

```sh
cd ollama_server
docker compose up -d
```

Then run `curl http://localhost:11434` and you should see `Ollama is running` as the answer.
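If you prefer checking from Python, here is a small sketch using `requests` against Ollama's standard endpoints (the root path returns a plain-text banner, and `/api/tags` lists the locally available models):

```python
import requests

# The root endpoint answers with the plain-text banner "Ollama is running".
print(requests.get("http://localhost:11434").text)

# /api/tags lists the models currently pulled on the server.
for model in requests.get("http://localhost:11434/api/tags").json()["models"]:
    print(model["name"])
```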
There are two ways to load models into the Ollama server:
- Using the API: open the `ollama_server/ollama_api.http` file to see the API details and use the `donebd.rest-client-api` VSCode plugin to call the APIs.
- Using the CLI:

```sh
docker exec -it ollama sh -c "ollama pull mistral && ollama run mistral"
```
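For reference, pulling a model through the API boils down to a single POST to `/api/pull` (this mirrors what the `.http` file does); a minimal Python sketch:

```python
import json
import requests

# POST /api/pull streams newline-delimited JSON status updates
# while the model downloads.
with requests.post(
    "http://localhost:11434/api/pull",
    json={"name": "mistral"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status"))
```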
By the way:
- Mistral was the best in my tests, but there are lots of models you can try! ;)
- The models are huge and you need a good machine to run them; on mine it works, but loading and answering take a long time.
- If you don't have a good machine or don't want to wait that long, consider using an online model such as Gemini, OpenAI, or others.
To populate the database with your documents:

```sh
python populate_database.py
```

or use `--reset` to clean the database before loading the documents:

```sh
python populate_database.py --reset
```
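For orientation, the populate step in this kind of stack loads the PDFs, splits them into chunks, embeds them through the Ollama server, and persists everything into ChromaDB. A minimal sketch, assuming a `data` directory and the `nomic-embed-text` embedding model (see populate_database.py for the actual logic):

```python
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every PDF from the data directory (the path is an assumption).
documents = PyPDFDirectoryLoader("data").load()

# Split into overlapping chunks so retrieval can match small passages.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=800, chunk_overlap=80
).split_documents(documents)

# Embed via the local Ollama server and persist into ChromaDB.
Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="chroma",
)
```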
To ask a question:

```sh
python query_data.py "Who was the NBA winner in 2021?"
```
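Under the hood, the query step retrieves the most similar chunks from ChromaDB and feeds them to the model as context. A sketch under the same assumptions as above:

```python
import sys

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

question = sys.argv[1]

# Reopen the persisted database with the same embedding model used to build it.
db = Chroma(
    persist_directory="chroma",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
)

# Retrieve the top matching chunks and stuff them into the prompt as context.
context = "\n\n".join(
    doc.page_content for doc in db.similarity_search(question, k=5)
)
prompt = (
    "Answer the question based only on the following context:\n\n"
    f"{context}\n\nQuestion: {question}"
)

print(Ollama(model="mistral").invoke(prompt))
```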
To run the tests:

```sh
pytest
```
To fix: after editing an existing document, you currently have to rebuild the database from scratch.
Built with:
- Python
- Langchain
- Ollama server
- ChromaDB
- Docker
- DevContainer - VSCode plugin
- API Client Lite - VSCode plugin
Contact:
- Twitter: @mcostacurta
- LinkedIn: @mcostacurta
License:
- MIT

