Commit 43201a7

Blog: Add post on leveraging Katib for efficient RAG optimization. (#161)
Signed-off-by: Varsha Prasad Narsing <[email protected]>
1 parent c665e2a commit 43201a7


3 files changed

+289
-0
lines changed

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
---
toc: true
layout: post
categories: [ katib ]
description: "Leveraging Katib for efficient RAG optimization."
comments: true
title: "Optimizing RAG Pipelines with Katib: Hyperparameter Tuning for Better Retrieval & Generation"
hide: false
permalink: /katib/rag/
author: "Varsha Prasad Narsing (@varshaprasad96)"
---

# Introduction

As artificial intelligence and machine learning models become more
sophisticated, optimizing their performance remains a critical challenge.
Kubeflow provides a robust component, [Katib][Katib], designed for
hyperparameter optimization and neural architecture search. As part of the
Kubeflow ecosystem, Katib enables scalable, automated tuning of machine
learning models, reducing the manual effort required for parameter selection
while improving model performance across diverse ML workflows.

With Retrieval-Augmented Generation ([RAG][rag]) becoming an increasingly
popular approach for improving search and retrieval quality, optimizing its
parameters is essential to achieving high-quality results. RAG pipelines involve
multiple hyperparameters that influence retrieval accuracy, hallucination
reduction, and language generation quality. In this post, we will explore how
Katib can be used to tune a RAG pipeline by systematically adjusting its key
hyperparameters.

# Let's Get Started!

## STEP 1: Setup

Since compute resources are scarcer than a perfectly labeled dataset :), we’ll
use a lightweight [Kind (Kubernetes in Docker)][kind_documentation] cluster to
run this example locally. Rest assured, this setup can scale to larger clusters
as the dataset size and the number of hyperparameters to tune grow.

To get started, we'll first install the Katib control plane in our cluster by
following the steps outlined [in the documentation][katib_installation].

## STEP 2: Implementing RAG pipeline

In this implementation, a [retriever model][retriever_model_paper] encodes
queries and documents into vector representations and fetches the documents
most relevant to a query. A generator model then uses the retrieved documents
to produce a coherent text response.

### Implementation Details:

1. Retriever: Sentence Transformer & FAISS (Facebook AI Similarity Search) Index
   - A SentenceTransformer model (`paraphrase-MiniLM-L6-v2`) encodes predefined
     documents into vector representations.
   - [FAISS][FAISS] is used to index these document embeddings and perform
     efficient similarity searches to retrieve the most relevant documents.
2. Generator: Pre-trained GPT-2 Model
   - A Hugging Face GPT-2 text generation pipeline (which can be replaced with
     any other model) is used to generate responses based on the retrieved
     documents. I chose GPT-2 for this example as it is lightweight enough to
     run on my local machine while still generating coherent responses.
3. Query Processing & Response Generation
   - When a query is submitted, the retriever encodes it and searches the FAISS
     index for the top-k most similar documents.
   - These retrieved documents are concatenated to form the input context, which
     is then passed to the GPT-2 model to generate a response.
4. Evaluation: [BLEU][bleu] (Bilingual Evaluation Understudy) Score Calculation
   - To assess the quality of generated responses, we use the BLEU score, a
     popular metric for evaluating text generation.
   - The evaluation step takes a query, retrieves documents, generates a
     response, and compares it against a ground-truth reference to compute a
     BLEU score with smoothing functions from the `nltk` library.

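To make the retrieval step concrete, here is a minimal pure-Python sketch of what an exhaustive (flat) L2 index does conceptually: compare the query vector against every document vector and keep the `k` closest. The helper names (`squared_l2`, `top_k_search`) and the toy 2-D vectors are assumptions made for illustration; they are not part of FAISS or of the pipeline in this post.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_search(query_vec, doc_vecs, k):
    """Return (distances, indices) of the k nearest documents, closest first."""
    scored = sorted(
        (squared_l2(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)
    )
    nearest = scored[:k]
    return [dist for dist, _ in nearest], [idx for _, idx in nearest]


# Toy 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
documents = ["doc about Paris", "doc about Rome", "doc about Tokyo"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_vec = [1.0, 0.1]

distances, indices = top_k_search(query_vec, doc_vecs, k=2)
retrieved_docs = [documents[i] for i in indices]
print(retrieved_docs)
```

A library index does the same comparison with optimized vector code, which matters once the document set grows beyond a handful of entries.
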
To run Katib, we will use the [Katib SDK][Katib_SDK], which provides a
programmatic interface for defining and running hyperparameter tuning
experiments in Kubeflow.

Katib requires an [objective][katib_running_experiment] function, which:

1. Defines what we want to optimize (e.g., BLEU score for text generation quality).
2. Executes the RAG pipeline with different hyperparameter values.
3. Reports an evaluation metric (printed to standard output) so Katib can compare different hyperparameter configurations.

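For intuition about the metric being optimized, the following is a simplified, standard-library sketch of a BLEU-style score: clipped n-gram precision combined with a brevity penalty. It is not the `nltk` implementation used in the objective function. This version stops at bigrams and applies no smoothing, so its values will differ from `sentence_bleu`. The function names (`ngrams`, `modified_precision`, `simple_bleu`) are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(reference, candidate, max_n=2):
    """Geometric mean of 1..max_n-gram precisions times a brevity penalty."""
    precisions = [modified_precision(reference, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_mean)


reference = "The Eiffel Tower is a famous landmark in Paris.".split()
exact = simple_bleu(reference, reference)
partial = simple_bleu(reference, "The Eiffel Tower is in Paris.".split())
unrelated = simple_bleu(reference, "Completely different words here".split())
print(exact, partial, unrelated)
```

The short candidate is penalized twice: once for the missing bigram and once for being shorter than the reference, which is why long, rambling generations and very short ones both tend to score poorly.
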
```python
def objective(parameters):
    # Import dependencies inside the function (required for Katib)
    import numpy as np
    import faiss
    from sentence_transformers import SentenceTransformer
    from transformers import pipeline
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    # Function to fetch documents (Modify as needed)
    def fetch_documents():
        """Returns a predefined list of documents or loads them from a file."""
        return [
            ...
        ]
        # OR, to load from a file (also add `import json` above):
        # with open("/path/to/documents.json", "r") as f:
        #     return json.load(f)

    # Define the RAG pipeline within the function
    def rag_pipeline_execute(query, top_k, temperature):
        """Retrieves relevant documents and generates a response using GPT-2."""

        # Initialize retriever
        retriever_model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

        # Sample documents (provide at least top_k of them)
        documents = fetch_documents()

        # Encode documents and build an exhaustive L2 index over the embeddings
        doc_embeddings = retriever_model.encode(documents)
        index = faiss.IndexFlatL2(doc_embeddings.shape[1])
        index.add(np.array(doc_embeddings))

        # Encode query and retrieve top-k documents
        query_embedding = retriever_model.encode([query])
        distances, indices = index.search(query_embedding, top_k)
        retrieved_docs = [documents[i] for i in indices[0]]

        # Generate response using GPT-2
        generator = pipeline("text-generation", model="gpt2", tokenizer="gpt2")
        context = " ".join(retrieved_docs)
        generated = generator(
            context,
            max_new_tokens=50,  # bounds the generated text; max_length would also count the prompt
            do_sample=True,  # temperature only takes effect when sampling is enabled
            temperature=temperature,
            num_return_sequences=1,
        )

        return generated[0]["generated_text"]

    # TODO: Provide queries and ground truth directly here or load them dynamically from a file/external volume.
    query = ""  # Example: "Tell me about the Eiffel Tower."
    ground_truth = ""  # Example: "The Eiffel Tower is a famous landmark in Paris."

    # Extract hyperparameters
    top_k = int(parameters["top_k"])
    temperature = float(parameters["temperature"])

    # Generate response
    response = rag_pipeline_execute(query, top_k, temperature)

    # Compute BLEU score
    reference = [ground_truth.split()]  # Tokenized reference
    candidate = response.split()  # Tokenized candidate response
    smoothie = SmoothingFunction().method1
    bleu_score = sentence_bleu(reference, candidate, smoothing_function=smoothie)

    # Print BLEU score in Katib-compatible format
    print(f"BLEU={bleu_score}")
```
_Note_: Make sure to print the metric in the format `<metric-name>=<value>` so
that Katib's metrics collector can parse it. More ways to configure the output
are described in the [Katib Metrics Collector][Katib_metrics_collector] guide.

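To illustrate why the `name=value` format matters, here is a small sketch of how such lines could be picked out of a trial's standard output. The regular expression and the `parse_metrics` helper are hypothetical and written only for this example; Katib's own collector and its default filter may behave differently.

```python
import re

# Hypothetical pattern for illustration; Katib's actual filter may differ.
METRIC_LINE = re.compile(
    r"^\s*([\w-]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s*$"
)


def parse_metrics(log_text, wanted=("BLEU",)):
    """Collect the last reported value for each wanted metric name."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line)
        if match and match.group(1) in wanted:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


sample_log = "Loading model...\nBLEU=0.047040418725887996\nBLEU score: 0.05\n"
parsed = parse_metrics(sample_log)
print(parsed)
```

Only the line that follows the `name=value` shape is picked up; the free-form `BLEU score: 0.05` line is ignored, which is the failure mode the note above warns about.
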
## STEP 3: Run a Katib Experiment

Once our pipeline is encapsulated within the objective function, we can configure Katib to optimize the `BLEU` score by
tuning the hyperparameters:

1. `top_k`: The number of documents retrieved (e.g., between 10 and 20).
2. `temperature`: The randomness of text generation (e.g., between 0.5 and 1.0).

```python
import kubeflow.katib as katib

# Define hyperparameter search space
parameters = {
    "top_k": katib.search.int(min=10, max=20),
    "temperature": katib.search.double(min=0.5, max=1.0, step=0.1)
}
```

Let's submit the experiment! We'll use the [`tune` API][tune_api], which runs multiple trials to find the optimal `top_k`
and `temperature` values for our RAG pipeline.

```python
katib_client = katib.KatibClient(namespace="kubeflow")

name = "rag-tuning-experiment"
katib_client.tune(
    name=name,
    objective=objective,
    parameters=parameters,
    algorithm_name="grid",  # Grid search for hyperparameter tuning
    objective_metric_name="BLEU",
    objective_type="maximize",
    objective_goal=0.8,
    max_trial_count=10,  # Run up to 10 trials
    parallel_trial_count=2,  # Run 2 trials in parallel
    resources_per_trial={"cpu": "1", "memory": "2Gi"},
    base_image="python:3.10-slim",
    packages_to_install=[
        "transformers==4.36.0",
        "sentence-transformers==2.2.2",
        "faiss-cpu==1.7.4",
        "numpy==1.23.5",
        "huggingface_hub==0.20.0",
        "nltk==3.9.1"
    ]
)
```

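It can help to check how large the grid is compared with the trial budget. The sketch below enumerates the search space defined earlier, assuming the integer range is inclusive with an implicit step of 1 and the double range includes both endpoints; those are assumptions about how the grid is expanded, not confirmed Katib behavior.

```python
from itertools import product

# Assumption: the integer range is inclusive with an implicit step of 1.
top_k_values = list(range(10, 21))
# Build temperatures from integers to avoid floating-point drift.
temperature_values = [round(0.5 + 0.1 * i, 1) for i in range(6)]

grid = list(product(top_k_values, temperature_values))
max_trial_count = 10
coverage = max_trial_count / len(grid)

print(f"{len(grid)} combinations, {coverage:.0%} covered by {max_trial_count} trials")
```

Under these assumptions, 10 trials explore only a fraction of the grid, so raising `max_trial_count` or narrowing the ranges is worth considering when a more exhaustive search matters.
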
Once the experiment is submitted, we can see output indicating that Katib has started the trials:

```commandline
Experiment Trials status: 0 Trials, 0 Pending Trials, 0 Running Trials, 0 Succeeded Trials, 0 Failed Trials, 0 EarlyStopped Trials, 0 MetricsUnavailable Trials
Current Optimal Trial:
 {'best_trial_name': None,
 'observation': {'metrics': None},
 'parameter_assignments': None}
Experiment conditions:
 [{'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'message': 'Experiment is created',
 'reason': 'ExperimentCreated',
 'status': 'True',
 'type': 'Created'}]
Waiting for Experiment: kubeflow/rag-tuning-experiment to reach Succeeded condition

.....

Experiment Trials status: 9 Trials, 0 Pending Trials, 2 Running Trials, 7 Succeeded Trials, 0 Failed Trials, 0 EarlyStopped Trials, 0 MetricsUnavailable Trials
Current Optimal Trial:
 {'best_trial_name': 'rag-tuning-experiment-66tmh9g7',
 'observation': {'metrics': [{'latest': '0.047040418725887996',
                              'max': '0.047040418725887996',
                              'min': '0.047040418725887996',
                              'name': 'BLEU'}]},
 'parameter_assignments': [{'name': 'top_k', 'value': '10'},
                           {'name': 'temperature', 'value': '0.6'}]}
Experiment conditions:
 [{'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'message': 'Experiment is created',
 'reason': 'ExperimentCreated',
 'status': 'True',
 'type': 'Created'}, {'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 52, tzinfo=tzutc()),
 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 52, tzinfo=tzutc()),
 'message': 'Experiment is running',
 'reason': 'ExperimentRunning',
 'status': 'True',
 'type': 'Running'}]
Waiting for Experiment: kubeflow/rag-tuning-experiment to reach Succeeded condition
```

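Every value in the "Current Optimal Trial" structure is reported as a string. The sketch below converts it into typed Python values, treating the printed structure as a plain dictionary, which is an assumption made for illustration (the SDK may hand back model objects instead). The `summarize_optimal_trial` helper is hypothetical.

```python
def summarize_optimal_trial(optimal):
    """Convert the string-valued optimal-trial structure into typed values."""
    metrics = {
        m["name"]: float(m["latest"])
        for m in (optimal["observation"]["metrics"] or [])
    }
    params = {
        p["name"]: p["value"]
        for p in (optimal["parameter_assignments"] or [])
    }
    return optimal["best_trial_name"], metrics, params


# Copied from the experiment output shown above.
optimal_trial = {
    "best_trial_name": "rag-tuning-experiment-66tmh9g7",
    "observation": {"metrics": [{"latest": "0.047040418725887996",
                                 "max": "0.047040418725887996",
                                 "min": "0.047040418725887996",
                                 "name": "BLEU"}]},
    "parameter_assignments": [{"name": "top_k", "value": "10"},
                              {"name": "temperature", "value": "0.6"}],
}

trial_name, metrics, params = summarize_optimal_trial(optimal_trial)
best_top_k = int(params["top_k"])
best_temperature = float(params["temperature"])
print(trial_name, metrics["BLEU"], best_top_k, best_temperature)
```

The `or []` guards cover the state right after submission, when no trial has finished and both fields are still `None`.
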
We can also see the experiment and the trials being run to search for the optimal parameters:

```commandline
kubectl get experiments.kubeflow.org -n kubeflow
NAME                    TYPE      STATUS   AGE
rag-tuning-experiment   Running   True     10m
```

```commandline
kubectl get trials --all-namespaces
NAMESPACE   NAME                             TYPE      STATUS   AGE
kubeflow    rag-tuning-experiment-7wskq9b9   Running   True     10m
kubeflow    rag-tuning-experiment-cll6bt4z   Running   True     10m
kubeflow    rag-tuning-experiment-hzxrzq2t   Running   True     10m
```

The Katib UI lists the completed trials and their results, as shown below.
Steps to access the Katib UI are available [in the documentation][katib_ui]:

![completed_runs](/images/2025-02-21-katib-rag-optimization/katib_experiment_run.jpeg)
![trial details](/images/2025-02-21-katib-rag-optimization/katib_ui.jpeg)

# Conclusion

In this experiment, we leveraged Kubeflow Katib to optimize a
Retrieval-Augmented Generation (RAG) pipeline, systematically tuning key
hyperparameters like `top_k` and `temperature` to improve retrieval precision
and generative response quality.

For anyone working with RAG systems or hyperparameter optimization, Katib is a
powerful tool that enables scalable, efficient, and intelligent tuning of
machine learning models. We hope this tutorial helps you streamline
hyperparameter tuning and unlock new efficiencies in your ML workflows!

[Katib]: https://www.kubeflow.org/docs/components/katib/
[kind_documentation]: https://kind.sigs.k8s.io/
[rag]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
[katib_installation]: https://www.kubeflow.org/docs/components/katib/installation/
[retriever_model_paper]: https://www.sciencedirect.com/topics/computer-science/retrieval-model
[FAISS]: https://ai.meta.com/tools/faiss/
[bleu]: https://huggingface.co/spaces/evaluate-metric/bleu
[Katib_metrics_collector]: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/#pull-based-metrics-collector
[katib_ui]: https://www.kubeflow.org/docs/components/katib/user-guides/katib-ui/
[Katib_SDK]: https://www.kubeflow.org/docs/components/katib/installation/#installing-python-sdk
[tune_api]: https://github.com/kubeflow/katib/blob/c18035e1041ca1b87ea7eb7c01cb81b5e2b922b3/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L178
[katib_running_experiment]: https://www.kubeflow.org/docs/components/katib/user-guides/hp-tuning/configure-experiment/#configuring-the-experiment