This project evaluates different Retrieval-Augmented Generation (RAG) approaches on biomedical datasets (BioASQ, PubMedQA) using the OpenWebUI deployment of the University of Freiburg.
- Python 3.11
- Uni Freiburg VPN: An active VPN connection to the University of Freiburg is required to access the OpenWebUI deployment (https://openwebui.uni-freiburg.de).
- Docker: Required to run the Graph RAG pipeline (Neo4j).
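The prerequisites above can be checked from Python before running anything. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and a successful TCP connection to port 443 is only a rough stand-in for "the VPN is active and OpenWebUI is reachable".

```python
import shutil
import socket
import sys


def python_is_3_11(version_info=sys.version_info) -> bool:
    """Return True when the interpreter version is 3.11.x."""
    return tuple(version_info[:2]) == (3, 11)


def docker_on_path() -> bool:
    """Return True when a `docker` executable can be found on PATH."""
    return shutil.which("docker") is not None


def host_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `host_reachable("openwebui.uni-freiburg.de")` without an active VPN connection would be expected to return `False`, although the exact behaviour depends on how the university network handles external requests.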
- Clone the repository.
- Create and activate a virtual environment:
python3.11 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
The entire setup is centrally controlled via the config.yaml file. Here you can:
- Select Dataset: Change active_dataset to bioasq or pubmedqa.
- Configure Experiments: Adjust paths and parameters for the various RAG pipelines.
Example:
active_dataset: "bioasq"
# ...
The scripts automatically load the configuration regardless of the directory from which they are started.
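One common way to make configuration loading independent of the working directory is to search for config.yaml starting from the script's own location instead of the current directory. The sketch below illustrates that idea. It is an assumption about how such loading can work, not a copy of the project's code; `find_config`, `load_config`, and `VALID_DATASETS` are hypothetical names, and PyYAML is assumed to be available.

```python
from pathlib import Path

# The two dataset names mentioned in this README.
VALID_DATASETS = {"bioasq", "pubmedqa"}


def find_config(start: Path, filename: str = "config.yaml") -> Path:
    """Walk upwards from the directory `start` until `filename` is found."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{filename} not found in {start} or any parent directory")


def load_config(start: Path) -> dict:
    """Parse the nearest config.yaml and validate active_dataset."""
    import yaml  # PyYAML; assumed to be installed, imported lazily

    with find_config(start).open(encoding="utf-8") as handle:
        config = yaml.safe_load(handle) or {}  # an empty file parses to None
    dataset = config.get("active_dataset")
    if dataset not in VALID_DATASETS:
        raise ValueError(f"active_dataset must be one of {sorted(VALID_DATASETS)}, got {dataset!r}")
    return config
```

A script in graph_rag/ could then call `load_config(Path(__file__).resolve().parent)` and would find the same file whether it is started from the repository root or from inside the subfolder.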
A Neo4j database is required for the Graph RAG pipeline (scripts in the graph_rag/ folder). Start it with Docker using the following command to enable the necessary APOC plugins:
docker run \
-p 7474:7474 -p 7687:7687 \
-v $PWD/data:/data -v $PWD/plugins:/plugins \
--name neo4j-apoc \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
neo4j:latest
Note: The password used for this setup is mygraph12345. Use this password when connecting (the default user is usually neo4j).
This setup is based on the LlamaIndex GraphRAG v2 Cookbook.
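The container needs some time to start before the graph scripts can connect. The sketch below waits for the Bolt port published by the command above (7687) and then checks the login with the official neo4j Python driver. Both function names are hypothetical, and the driver package is assumed to be installed. Note also that the docker command shown does not pass NEO4J_AUTH, so on a fresh container the password presumably has to be set to the value above at first login.

```python
import socket
import time


def wait_for_bolt(host: str = "localhost", port: int = 7687,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to the Bolt port succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def verify_login(uri: str = "bolt://localhost:7687",
                 user: str = "neo4j", password: str = "mygraph12345") -> None:
    """Raise an exception if the server cannot be reached with these credentials."""
    from neo4j import GraphDatabase  # Neo4j Python driver; assumed installed

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        driver.verify_connectivity()
```

A typical use would be to call `wait_for_bolt()` right after `docker run` and to proceed to `verify_login()` only when it returns `True`.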
Execute the desired evaluation scripts. Ensure that config.yaml is correctly configured.
- Baseline (No RAG):
python run_ai_without_rag.py
- Vector RAG:
python run_vector_rag_bioasq.py
- Graph RAG:
  - Extraction/Construction:
python graph_rag/run_graph_rag.py
  - Community Detection:
python graph_rag/run_community_local.py
- Plotting:
python plotting/plot_retrieval_metrics.py
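The commands above can also be chained from a small driver script. This is a sketch under assumptions: it takes the listed order as the intended run order, assumes the scripts need no command-line arguments, and assumes it is started from the repository root. `build_commands` and `run_all` are hypothetical helpers, not files in the repository.

```python
import subprocess
import sys


def build_commands(include_graph: bool = True) -> list[list[str]]:
    """Return the evaluation commands in the order listed in this README."""
    scripts = ["run_ai_without_rag.py", "run_vector_rag_bioasq.py"]
    if include_graph:
        # These two steps need the Neo4j container to be running.
        scripts += ["graph_rag/run_graph_rag.py", "graph_rag/run_community_local.py"]
    scripts.append("plotting/plot_retrieval_metrics.py")
    # sys.executable keeps every step inside the active virtual environment.
    return [[sys.executable, script] for script in scripts]


def run_all(include_graph: bool = True) -> None:
    """Run each script in turn and stop at the first failure."""
    for command in build_commands(include_graph):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)  # raises CalledProcessError on a non-zero exit
```

Passing `include_graph=False` skips the two Graph RAG steps, which can be convenient when Docker is not available.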
The files graph_rag/graph_extractor.py and graph_rag/graph_rag_store.py were adapted from the LlamaIndex GraphRAG v2 Cookbook to work with this project and the centralized configuration.