
LLM-RAG-demo

About

This repository provides a small, end-to-end Retrieval-Augmented Generation (RAG) demo inspired by our work on DMP-Chef.
RAG is a practical approach for improving the accuracy and trustworthiness of LLM outputs by retrieving relevant context from a local knowledge base (e.g., web pages, PDFs, notes) and using that context during generation.

In this demo, we collect a small snapshot (~20 pages) of the FAIR Data Innovations Hub website, save the pages locally as .txt files, build a FAISS vector index using Ollama embeddings, and compare No-RAG vs. RAG answers side by side.

👉 Blog post: How to Quickly Set Up a RAG System: A Practical Guide Inspired by Our Work on DMP-Chef
👉 Code repo: fairdataihub/LLM-RAG-demo


Standards followed

The overall codebase is organized in alignment with the FAIR-BioRS guidelines. The Python code in the primary Jupyter notebook, main.ipynb, follows PEP 8 style conventions (including comments and docstrings). All required dependencies are listed in requirements.txt.


1) Clone the repository

git clone https://github.com/fairdataihub/LLM-RAG-demo.git
cd LLM-RAG-demo

2) Create and activate a Python environment

macOS / Linux

python3 -m venv .venv
source .venv/bin/activate

Windows (PowerShell)

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Upgrade pip (recommended):

python -m pip install --upgrade pip

3) Install dependencies

pip install -r requirements.txt

If your copy of the repository does not include a requirements.txt, create one (recommended) with at least the following packages; a sample file is shown after this list:

  • langchain, langchain-community, langchain-ollama
  • faiss-cpu
  • requests, beautifulsoup4
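
A minimal requirements.txt along those lines could look like the following (unpinned for simplicity; pin exact versions if you need reproducible installs):

langchain
langchain-community
langchain-ollama
faiss-cpu
requests
beautifulsoup4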

4) Install and run Ollama

Install Ollama: https://ollama.com

Make sure Ollama is running:

ollama serve

Pull the models used in this demo:

ollama pull llama3.2
ollama pull nomic-embed-text

Optional: confirm Ollama is reachable

  • Open http://localhost:11434/api/tags in your browser, or
  • Run: curl http://localhost:11434/api/tags
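
You can also check from Python. A minimal sketch, assuming the default local Ollama endpoint on port 11434 and the requests package from requirements.txt:

import requests

# Query Ollama's local API for the list of pulled models.
response = requests.get("http://localhost:11434/api/tags", timeout=5)
response.raise_for_status()

# Print the model names; llama3.2 and nomic-embed-text should appear
# once both pulls above have completed.
for model in response.json().get("models", []):
    print(model["name"])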

5) Run the demo (recommended order)

The demo may be provided as a Python script, a Jupyter notebook, or both; use whichever is present in the repository.

Run as a Jupyter notebook

If you have a notebook (RAG_Application.ipynb):

pip install notebook
jupyter notebook

Then open the notebook and run the cells in order (a sketch of the indexing step follows the list):

  1. Crawl website pages into data/fairdata_texts/
  2. Split into chunks
  3. Build FAISS index (saved locally)
  4. Compare No-RAG vs RAG outputs
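
For orientation, the split-and-index steps (2–3) look roughly like the sketch below. It is not the notebook's exact code: the directory and index names follow the layout described in this README, and the chunking parameters are illustrative.

import pathlib

from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the crawled .txt snapshots of the website.
docs_dir = pathlib.Path("data/fairdata_texts")
pages = [p.read_text(encoding="utf-8") for p in sorted(docs_dir.glob("*.txt"))]

# Split each page into overlapping chunks so every embedding covers a focused passage.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.create_documents(pages)

# Embed the chunks with the local Ollama embedding model and build the FAISS index.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = FAISS.from_documents(chunks, embeddings)

# Persist the index so later runs can load it without re-embedding.
vectorstore.save_local("faiss_index_fairdata")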

6) Expected outputs

After crawling:

  • data/fairdata_texts/*.txt

After indexing:

  • faiss_index_fairdata/ (or similar folder)

When you run the comparison step, you should see side-by-side answers (sketched below) for:

  • No-RAG (LLM answers without documents)
  • RAG (LLM answers using retrieved context)
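
Conceptually, that comparison works along the lines of the following sketch. The prompt wording, retrieval settings, and example question are illustrative rather than the notebook's exact code.

from langchain_community.vectorstores import FAISS
from langchain_ollama import ChatOllama, OllamaEmbeddings

question = "What is the FAIR Data Innovations Hub?"
llm = ChatOllama(model="llama3.2")

# No-RAG: ask the model directly, without any supporting documents.
no_rag_answer = llm.invoke(question).content

# RAG: load the saved FAISS index, retrieve the most relevant chunks,
# and include them as context in the prompt.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = FAISS.load_local(
    "faiss_index_fairdata", embeddings, allow_dangerous_deserialization=True
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))

rag_prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
rag_answer = llm.invoke(rag_prompt).content

print("--- No-RAG ---\n" + no_rag_answer)
print("--- RAG ---\n" + rag_answer)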

License

This work is licensed under the MIT License. See LICENSE for more information.


Feedback and contribution

Use GitHub Issues to submit feedback, report problems, or suggest improvements.
You can also fork the repository and submit a Pull Request with your changes.


How to cite

If you use this code, please cite this repository using the versioned Zenodo DOI for the specific release you used (see the CITATION.cff file).
