This repository demonstrates how to set up a Retrieval-Augmented Generation (RAG) pipeline on an IBM Power LPAR environment.
It includes environment setup, model integration, and vector database management for AI inference using IBM Granite models.
- Power LPAR – Target environment for deployment
- Micromamba + Python – Lightweight package and environment management
- Gradio – Web-based UI for chatbot interaction
- ChromaDB – Vector database for document embeddings
- Hugging Face Granite (4-bit, GGUF) – Large Language Model for inference
- LangChain + Docling – Document chunking and RAG integration
- Optional: Ansible – Automation support
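To see how these pieces fit together, here is a minimal sketch of the query path: retrieve matching chunks from ChromaDB, then pass them as context to the Granite model through Ollama's OpenAI-compatible endpoint. This is an illustration, not the repository's code; the collection name `power_docs`, the store path `db`, and the question are placeholders.

```python
# Minimal RAG round trip (illustrative sketch, not the repository's code).
# Assumes a persisted Chroma store in ./db and Ollama listening on port 11434.
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="db")
collection = chroma.get_collection("power_docs")  # placeholder collection name

question = "Which processors does the Power E1050 support?"
hits = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(hits["documents"][0])  # top-3 matching chunks

llm = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # Ollama ignores the key
answer = llm.chat.completions.create(
    model="granite4:tiny-h",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)
```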
Install required system packages:
```
sudo dnf install git mesa-libGL bzip2 gcc g++ zlib-devel vim gcc-toolset-12
```

Clone the project repository:
```
git clone https://github.com/HenrikMader/RAG_public
cd RAG_public
```

Install micromamba and verify the installation:
```
cd ~
curl -Ls https://micro.mamba.pm/api/micromamba/linux-ppc64le/latest | tar -xvj bin/micromamba
eval "$(micromamba shell hook --shell bash)"
micromamba --version
```

Create and activate the Python environment, then install the conda packages:
```
micromamba create -n rag_env python=3.11
micromamba activate rag_env
micromamba install -c rocketce -c defaults pytorch-cpu pyyaml httptools onnxruntime "pandas<1.6.0" tokenizers
```

Then install additional packages via pip:
```
pip install -U --extra-index-url https://repo.fury.io/mgiessing --prefer-binary streamlit chromadb transformers psutil langchain sentence_transformers gradio==3.50.2 llama-cpp-python scikit-learn docling einops openai
```

Check installed packages:
```
pip list
```

With the script `converted_docling.py` you can convert a folder containing PDF files to Markdown files:
```
python converted_docling.py
```

When prompted for the path to your PDF files, enter the absolute path of the folder that contains them. The output folder for the Markdown files does not need to exist.
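For reference, the kind of conversion the script performs can be reproduced with Docling's `DocumentConverter`. The sketch below is illustrative, not the script itself; both paths are placeholders.

```python
# Illustrative PDF-to-Markdown conversion with Docling (not converted_docling.py itself).
from pathlib import Path
from docling.document_converter import DocumentConverter

pdf_dir = Path("/absolute/path/to/pdfs")     # placeholder input folder
md_dir = Path("/absolute/path/to/markdown")  # created if it does not exist
md_dir.mkdir(parents=True, exist_ok=True)

converter = DocumentConverter()
for pdf in sorted(pdf_dir.glob("*.pdf")):
    result = converter.convert(pdf)  # parse the PDF
    (md_dir / f"{pdf.stem}.md").write_text(result.document.export_to_markdown())
```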
Navigate to the project directory and remove any existing database:
```
cd ~/RAG_public
rm -rf db
```
Populate the database:
```
python chromaDB_md.py
```

When prompted, enter the full path to the converted Markdown files. Afterwards, enter a name for the collection you are creating.
This process may take several minutes.
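Conceptually, ingestion chunks each Markdown file and writes the chunks into a persistent Chroma collection. A minimal sketch under that assumption (not `chromaDB_md.py` itself; the path, chunk sizes, and collection name are placeholders):

```python
# Sketch of Markdown ingestion into a persistent Chroma collection
# (illustrative; not chromaDB_md.py itself).
from pathlib import Path
import chromadb
from langchain.text_splitter import MarkdownTextSplitter

client = chromadb.PersistentClient(path="db")
collection = client.get_or_create_collection("power_docs")  # placeholder name
splitter = MarkdownTextSplitter(chunk_size=1000, chunk_overlap=100)

for md_file in sorted(Path("/absolute/path/to/markdown").glob("*.md")):
    chunks = splitter.split_text(md_file.read_text())
    collection.add(
        documents=chunks,
        ids=[f"{md_file.stem}-{i}" for i in range(len(chunks))],
        metadatas=[{"source": md_file.name}] * len(chunks),
    )
```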
Run Ollama (based on llama.cpp) as a container:
```
podman run -d --name ollama --replace -p 11434:11434 -v ollama:/root/.ollama quay.io/anchinna/ollama:v3
podman exec -it ollama /opt/ollama/ollama pull granite4:tiny-h
```
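Before starting a frontend, you can check that the model answers through Ollama's OpenAI-compatible endpoint. A quick smoke test (the API key is a dummy value, since Ollama ignores it):

```python
# Smoke test against Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is unused
resp = client.chat.completions.create(
    model="granite4:tiny-h",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(resp.choices[0].message.content)
```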
Start the chatbot application:
```
streamlit run streamlit_adv.py --server.port 7680
```

Note: the old Gradio frontend does not need to be started if you want to use the new Streamlit frontend:
```
python run_model_openai_backend.py
```

Access the web UI:
`http://<IP_of_your_machine>:7680`
Stop the chatbot (Ctrl + C) and start the admin interface:
```
python admin_database.py
```

Access the admin UI at:
`http://<IP_of_your_machine>:8082`
From here, you can:
- List collections
- Add or remove Markdown files
- View chunk statistics per collection
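The same statistics can also be read straight from the persisted store in a Python shell; a small sketch (assumes the default `db` path and a chromadb 0.4/0.5 release, where `list_collections()` returns collection objects):

```python
# List collections and their chunk counts directly from the persisted store.
# Assumes chromadb 0.4/0.5, where list_collections() returns Collection objects.
import chromadb

client = chromadb.PersistentClient(path="db")
for col in client.list_collections():
    print(f"{col.name}: {col.count()} chunks")
```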
Example: Add a new document to the Power10 collection:
```
./files_for_database/db_files_md/IBM Power E1050 Technical Overview and Introduction - redp5684.md
```

After ingestion, restart the chatbot to query the new data.
Re-launch the chatbot app:
```
streamlit run streamlit_adv.py --server.port 7680
```

Then open:
`http://<IP_of_your_machine>:7680`
Now you can ask questions about all loaded documents (e.g., Power10, Power9, E1050).