This repository provides a small, end-to-end Retrieval-Augmented Generation (RAG) demo inspired by our work on DMP-Chef.
RAG is a practical approach for improving the accuracy and trustworthiness of LLM outputs by retrieving relevant context from a local knowledge base (e.g., web pages, PDFs, notes) and using that context during generation.
In this demo, we collect a small snapshot (~20 pages) from the FAIR Data Innovations Hub website, save it locally as .txt, build a FAISS vector index using Ollama embeddings, and compare No-RAG vs RAG answers side by side.
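At the heart of the RAG step is vector similarity search: embed the question and every document chunk, then retrieve the chunks whose vectors are closest to the question's. The sketch below illustrates that idea with toy hand-written vectors standing in for real embeddings (the demo itself uses `nomic-embed-text` via Ollama and FAISS for the index):

```python
# Toy illustration of the retrieval step in RAG. Real embeddings come
# from a model such as nomic-embed-text; here tiny hand-made vectors
# stand in so the mechanics are visible.
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, chunk_vecs, k=1):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Three "chunk embeddings" and one "question embedding".
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
question = [1.0, 0.05, 0.0]
print(top_k(question, chunks, k=2))  # chunks 0 and 2 are closest
```

FAISS performs the same nearest-neighbor lookup, just at scale and with an on-disk index.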
👉 Blog post: How to Quickly Set Up a RAG System: A Practical Guide Inspired by Our Work on DMP-Chef
👉 Code repo: fairdataihub/LLM-RAG-demo
The overall codebase is organized in alignment with the FAIR-BioRS guidelines. The Python code in the primary Jupyter notebook, main.ipynb, follows PEP 8 style conventions (including comments and docstrings). All required dependencies are listed in requirements.txt.
```
git clone https://github.com/fairdataihub/LLM-RAG-demo.git
cd LLM-RAG-demo
```

Create and activate a virtual environment.

macOS/Linux:

```
python3 -m venv .venv
source .venv/bin/activate
```

Windows (PowerShell):

```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

Upgrade pip (recommended):

```
python -m pip install --upgrade pip
```

Install the dependencies:

```
pip install -r requirements.txt
```

If you don't have a `requirements.txt` yet, create one (recommended) and include:
- langchain, langchain-community, langchain-ollama
- faiss-cpu
- requests, beautifulsoup4
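A minimal `requirements.txt` covering the packages above could look like this (unpinned here for simplicity; pin versions for reproducibility):

```
langchain
langchain-community
langchain-ollama
faiss-cpu
requests
beautifulsoup4
```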
Install Ollama: https://ollama.com
Make sure Ollama is running:

```
ollama serve
```

Pull the models used in this demo:

```
ollama pull llama3.2
ollama pull nomic-embed-text
```

Optional: confirm Ollama is reachable:
- Open http://localhost:11434/api/tags in your browser, or
- Run `curl http://localhost:11434/api/tags`

This repo may include the demo as a script, notebook, or both. Use whichever exists in your repo structure.
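The reachability check above can also be done from Python. Below is a stdlib-only sketch (the `requests` package from `requirements.txt` would work equally well); port 11434 is Ollama's default local endpoint:

```python
# Check that the local Ollama server answers on /api/tags, the same
# endpoint you can open in a browser or hit with curl.
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the Ollama server responds on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, connection refused, timeouts
        return False


print("Ollama reachable:", ollama_is_up())
```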
If you have a notebook (RAG_Application.ipynb):
```
pip install notebook
jupyter notebook
```

Then open the notebook and run cells in order:
- Crawl website pages into `data/fairdata_texts/`
- Split into chunks
- Build FAISS index (saved locally)
- Compare No-RAG vs RAG outputs
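To make the "split into chunks" step concrete, here is an illustrative plain-Python version; the notebook may use LangChain's text splitters instead. Overlapping chunks preserve context that straddles a boundary:

```python
# Illustrative character-based chunker with overlap. Chunk size and
# overlap values here are examples, not the notebook's exact settings.
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


page = "FAIR data should be findable, accessible, interoperable, reusable. " * 20
pieces = chunk_text(page, chunk_size=200, overlap=20)
print(len(pieces), "chunks; first 40 chars:", pieces[0][:40])
```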
After crawling:
data/fairdata_texts/*.txt
After indexing:
`faiss_index_fairdata/` (or a similar folder)
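The `.txt` files above come from stripping each crawled page's HTML down to visible text. The repo lists `beautifulsoup4` for this; the sketch below shows the same idea using only the standard library's `html.parser`:

```python
# Stdlib-only sketch of HTML-to-text extraction; the demo's crawler
# likely uses BeautifulSoup (per requirements.txt) for the same job.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


sample = "<html><body><h1>FAIR</h1><script>x=1</script><p>Data</p></body></html>"
print(html_to_text(sample))  # prints "FAIR" then "Data"
```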
When you run the comparison step, you should see side-by-side answers for:
- No-RAG (LLM answers without documents)
- RAG (LLM answers using retrieved context)
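The only difference between the two runs is the prompt sent to the model: No-RAG sends the bare question, while RAG prepends the retrieved chunks as context. The template below is illustrative, not the notebook's exact wording:

```python
# Sketch of the prompt difference between the No-RAG and RAG paths.
def no_rag_prompt(question):
    """No-RAG: the question goes to the model as-is."""
    return question


def rag_prompt(question, retrieved_chunks):
    """RAG: retrieved chunks are prepended as grounding context."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


docs = ["DMP-Chef helps researchers draft data management plans."]
q = "What is DMP-Chef?"
print(no_rag_prompt(q))
print(rag_prompt(q, docs))
```

Both prompts go to the same model (`llama3.2` in this demo), so any difference in the answers comes from the retrieved context alone.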
This work is licensed under the MIT License. See LICENSE for more information.
Use GitHub Issues to submit feedback, report problems, or suggest improvements.
You can also fork the repository and submit a Pull Request with your changes.
If you use this code, please cite this repository using the versioned DOI on Zenodo for the specific release you used (see the CITATION.cff file).