This repository provides a small, end-to-end Retrieval-Augmented Generation (RAG) demo inspired by our work on DMP-Chef.
RAG is a practical approach for improving the accuracy and trustworthiness of LLM outputs by retrieving relevant context from a local knowledge base (e.g., web pages, PDFs, notes) and using that context during generation.
In this demo, we collect a small snapshot (~20 pages) from the FAIR Data Innovations Hub website, save it locally as .txt, build a FAISS vector index using Ollama embeddings, and compare No-RAG vs RAG answers side by side.
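At the heart of the RAG step is vector similarity search: embed the question and every document chunk, then retrieve the chunks whose vectors are closest to the question's. The sketch below illustrates that idea with toy hand-written vectors standing in for real embeddings (the demo itself uses `nomic-embed-text` via Ollama and FAISS for the index):

```python
# Toy illustration of the retrieval step in RAG. Real embeddings come
# from a model such as nomic-embed-text; here tiny hand-made vectors
# stand in so the mechanics are visible.
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, chunk_vecs, k=1):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Three "chunk embeddings" and one "question embedding".
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
question = [1.0, 0.05, 0.0]
print(top_k(question, chunks, k=2))  # chunks 0 and 2 are closest
```

FAISS performs the same nearest-neighbor lookup, just at scale and with an on-disk index.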
👉 Blog post: How to Quickly Set Up a RAG System: A Practical Guide Inspired by Our Work on DMP-Chef
👉 Code repo: fairdataihub/LLM-RAG-demo
The overall codebase is organized in alignment with the FAIR-BioRS guidelines. The Python code in the primary Jupyter notebook, main.ipynb, follows PEP 8 style conventions (including comments and docstrings). All required dependencies are listed in requirements.txt.
```
git clone https://github.com/fairdataihub/LLM-RAG-demo.git
cd LLM-RAG-demo
```

Create and activate a virtual environment.

macOS/Linux:

```
python3 -m venv .venv
source .venv/bin/activate
```

Windows (PowerShell):

```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

Upgrade pip (recommended):

```
python -m pip install --upgrade pip
```

Install the dependencies:

```
pip install -r requirements.txt
```

If you don't have a `requirements.txt` yet, create one (recommended) and include:
- langchain, langchain-community, langchain-ollama
- faiss-cpu
- requests, beautifulsoup4
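A minimal `requirements.txt` covering the packages above could look like this (unpinned here for simplicity; pin versions for reproducibility):

```
langchain
langchain-community
langchain-ollama
faiss-cpu
requests
beautifulsoup4
```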
Install Ollama: https://ollama.com
Make sure Ollama is running:

```
ollama serve
```

Pull the models used in this demo:

```
ollama pull llama3.2
ollama pull nomic-embed-text
```

Optional: confirm Ollama is reachable:
- Open http://localhost:11434/api/tags in your browser, or
- Run `curl http://localhost:11434/api/tags`

This repo may include the demo as a script, notebook, or both. Use whichever exists in your repo structure.
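The reachability check above can also be done from Python. Below is a stdlib-only sketch (the `requests` package from `requirements.txt` would work equally well); port 11434 is Ollama's default local endpoint:

```python
# Check that the local Ollama server answers on /api/tags, the same
# endpoint you can open in a browser or hit with curl.
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the Ollama server responds on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, connection refused, timeouts
        return False


print("Ollama reachable:", ollama_is_up())
```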
If you have a notebook (RAG_Application.ipynb):
```
pip install notebook
jupyter notebook
```

Then open the notebook and run cells in order:
- Crawl website pages into `data/fairdata_texts/`
- Split into chunks
- Build FAISS index (saved locally)
- Compare No-RAG vs RAG outputs
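To make the "split into chunks" step concrete, here is an illustrative plain-Python version; the notebook may use LangChain's text splitters instead. Overlapping chunks preserve context that straddles a boundary:

```python
# Illustrative character-based chunker with overlap. Chunk size and
# overlap values here are examples, not the notebook's exact settings.
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


page = "FAIR data should be findable, accessible, interoperable, reusable. " * 20
pieces = chunk_text(page, chunk_size=200, overlap=20)
print(len(pieces), "chunks; first 40 chars:", pieces[0][:40])
```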
After crawling:
data/fairdata_texts/*.txt
After indexing:
`faiss_index_fairdata/` (or a similar folder)
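The `.txt` files above come from stripping each crawled page's HTML down to visible text. The repo lists `beautifulsoup4` for this; the sketch below shows the same idea using only the standard library's `html.parser`:

```python
# Stdlib-only sketch of HTML-to-text extraction; the demo's crawler
# likely uses BeautifulSoup (per requirements.txt) for the same job.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


sample = "<html><body><h1>FAIR</h1><script>x=1</script><p>Data</p></body></html>"
print(html_to_text(sample))  # prints "FAIR" then "Data"
```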
When you run the comparison step, you should see side-by-side answers for:
- No-RAG (LLM answers without documents)
- RAG (LLM answers using retrieved context)
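The only difference between the two runs is the prompt sent to the model: No-RAG sends the bare question, while RAG prepends the retrieved chunks as context. The template below is illustrative, not the notebook's exact wording:

```python
# Sketch of the prompt difference between the No-RAG and RAG paths.
def no_rag_prompt(question):
    """No-RAG: the question goes to the model as-is."""
    return question


def rag_prompt(question, retrieved_chunks):
    """RAG: retrieved chunks are prepended as grounding context."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


docs = ["DMP-Chef helps researchers draft data management plans."]
q = "What is DMP-Chef?"
print(no_rag_prompt(q))
print(rag_prompt(q, docs))
```

Both prompts go to the same model (`llama3.2` in this demo), so any difference in the answers comes from the retrieved context alone.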
This work is licensed under the MIT License. See LICENSE for more information.
Use GitHub Issues to submit feedback, report problems, or suggest improvements.
You can also fork the repository and submit a Pull Request with your changes.
If you use this code, please cite this repository using the versioned DOI on Zenodo for the specific release you used (see the CITATION.cff file).