---
toc: true
layout: post
categories: [ katib ]
description: "Leveraging Katib for efficient RAG optimization."
comments: true
title: "Optimizing RAG Pipelines with Katib: Hyperparameter Tuning for Better Retrieval & Generation"
hide: false
permalink: /katib/rag/
author: "Varsha Prasad Narsing (@varshaprasad96)"
---

# Introduction

As artificial intelligence and machine learning models become more
sophisticated, optimizing their performance remains a critical challenge.
Kubeflow provides a robust component, [Katib][Katib], designed for
hyperparameter optimization and neural architecture search. As part of the
Kubeflow ecosystem, Katib enables scalable, automated tuning of machine
learning models, reducing the manual effort required for parameter
selection while improving model performance across diverse ML workflows.

With Retrieval-Augmented Generation ([RAG][rag]) becoming an increasingly
popular approach for improving search and retrieval quality, optimizing its
parameters is essential to achieving high-quality results. RAG pipelines involve
multiple hyperparameters that influence retrieval accuracy, hallucination
reduction, and language generation quality. In this blog, we will explore how
Katib can be leveraged to fine-tune a RAG pipeline, ensuring optimal performance
by systematically adjusting key hyperparameters.

# Let's Get Started!

## STEP 1: Setup

Since compute resources are scarcer than a perfectly labeled dataset :), we'll
use a lightweight [Kind (Kubernetes in Docker)][kind_documentation] cluster to
run this example locally. Rest assured, this setup can seamlessly scale to
larger clusters by increasing the dataset size and the number of
hyperparameters to tune.

To get started, we'll first install the Katib control plane in our cluster by
following the steps outlined [in the documentation][katib_installation].
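
For reference, a minimal local setup might look like the sketch below. The kind
command is standard; the install command assumes Katib's standalone kustomize
manifests as described in the documentation, so check the docs for the exact
version to pin:

```commandline
# Create a local Kubernetes cluster with kind
kind create cluster

# Install the standalone Katib control plane
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"

# Verify that the Katib components come up in the kubeflow namespace
kubectl get pods -n kubeflow
```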

## STEP 2: Implementing the RAG Pipeline

In this implementation, we use a [retriever model][retriever_model_paper],
which encodes queries and documents into vector representations, to fetch the
most relevant documents for a given query, and a generator model to produce
coherent text responses.

### Implementation Details

1. Retriever: Sentence Transformer & FAISS (Facebook AI Similarity Search) Index
   - A SentenceTransformer model (`paraphrase-MiniLM-L6-v2`) encodes predefined
     documents into vector representations.
   - [FAISS][FAISS] is used to index these document embeddings and perform
     efficient similarity searches to retrieve the most relevant documents.
2. Generator: Pre-trained GPT-2 Model
   - A Hugging Face GPT-2 text generation pipeline (which can be replaced with
     any other model) is used to generate responses based on the retrieved
     documents. I chose GPT-2 for this example as it is lightweight enough to
     run on my local machine while still generating coherent responses.
3. Query Processing & Response Generation
   - When a query is submitted, the retriever encodes it and searches the FAISS
     index for the top-k most similar documents.
   - These retrieved documents are concatenated to form the input context, which
     is then passed to the GPT-2 model to generate a response.
4. Evaluation: [BLEU][bleu] (Bilingual Evaluation Understudy) Score Calculation
   - To assess the quality of generated responses, we use the BLEU score, a
     popular metric for evaluating text generation.
   - The evaluate function takes a query, retrieves documents, generates a
     response, and compares it against a ground-truth reference to compute a
     BLEU score with smoothing functions from the `nltk` library.

To run Katib, we will use the [Katib SDK][Katib_SDK], which provides a programmatic interface for defining and running
hyperparameter tuning experiments in Kubeflow.
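
If you don't already have the SDK, it can be installed from PyPI:

```commandline
pip install kubeflow-katib
```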

Katib requires an [objective][katib_running_experiment] function, which:

1. Defines what we want to optimize (e.g., BLEU score for text generation quality).
2. Executes the RAG pipeline with different hyperparameter values.
3. Reports an evaluation metric so Katib can compare different hyperparameter configurations.

```python
def objective(parameters):
    # Import dependencies inside the function (required for Katib)
    import numpy as np
    import faiss
    from sentence_transformers import SentenceTransformer
    from transformers import pipeline
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    # Function to fetch documents (modify as needed)
    def fetch_documents():
        """Returns a predefined list of documents or loads them from a file."""
        return [
            ...
        ]
        # OR, to load from a file:
        # import json
        # with open("/path/to/documents.json", "r") as f:
        #     return json.load(f)

    # Define the RAG pipeline within the function
    def rag_pipeline_execute(query, top_k, temperature):
        """Retrieves relevant documents and generates a response using GPT-2."""

        # Initialize retriever
        retriever_model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

        # Sample documents
        documents = fetch_documents()

        # Encode documents and build the FAISS index
        doc_embeddings = retriever_model.encode(documents)
        index = faiss.IndexFlatL2(doc_embeddings.shape[1])
        index.add(np.array(doc_embeddings))

        # Encode query and retrieve top-k documents
        query_embedding = retriever_model.encode([query])
        distances, indices = index.search(query_embedding, top_k)
        retrieved_docs = [documents[i] for i in indices[0]]

        # Generate response using GPT-2. max_new_tokens bounds only the generated
        # continuation, so a long retrieved context cannot exceed the length limit.
        generator = pipeline("text-generation", model="gpt2", tokenizer="gpt2")
        context = " ".join(retrieved_docs)
        generated = generator(context, max_new_tokens=50, temperature=temperature, num_return_sequences=1)

        return generated[0]["generated_text"]

    # TODO: Provide queries and ground truth directly here or load them dynamically from a file/external volume.
    query = ""  # Example: "Tell me about the Eiffel Tower."
    ground_truth = ""  # Example: "The Eiffel Tower is a famous landmark in Paris."

    # Extract hyperparameters
    top_k = int(parameters["top_k"])
    temperature = float(parameters["temperature"])

    # Generate response
    response = rag_pipeline_execute(query, top_k, temperature)

    # Compute BLEU score against the ground-truth reference
    reference = [ground_truth.split()]  # Tokenized reference
    candidate = response.split()  # Tokenized candidate response
    smoothie = SmoothingFunction().method1
    bleu_score = sentence_bleu(reference, candidate, smoothing_function=smoothie)

    # Print BLEU score in Katib-compatible format
    print(f"BLEU={bleu_score}")
```
_Note_: Make sure to print the metric in the `<metric-name>=<value>` format so
that Katib's metrics collector can parse it. More ways to configure the output
are available in the [Katib Metrics
Collector][Katib_metrics_collector] guide.

## STEP 3: Run a Katib Experiment

Once our pipeline is encapsulated within the objective function, we can
configure Katib to optimize the `BLEU` score by tuning the hyperparameters:

1. `top_k`: The number of documents retrieved (e.g., between 10 and 20).
2. `temperature`: The randomness of text generation (e.g., between 0.5 and 1.0).

Next, we define the hyperparameter search space:

```python
import kubeflow.katib as katib

parameters = {
    "top_k": katib.search.int(min=10, max=20),
    "temperature": katib.search.double(min=0.5, max=1.0, step=0.1)
}
```

Let's submit the experiment! We'll use the [`tune` API][tune_api], which will
run multiple trials to find the optimal `top_k` and `temperature` values for
our RAG pipeline.

```python
katib_client = katib.KatibClient(namespace="kubeflow")

name = "rag-tuning-experiment"
katib_client.tune(
    name=name,
    objective=objective,
    parameters=parameters,
    algorithm_name="grid",  # Grid search for hyperparameter tuning
    objective_metric_name="BLEU",
    objective_type="maximize",
    objective_goal=0.8,
    max_trial_count=10,  # Run up to 10 trials
    parallel_trial_count=2,  # Run 2 trials in parallel
    resources_per_trial={"cpu": "1", "memory": "2Gi"},
    base_image="python:3.10-slim",
    packages_to_install=[
        "transformers==4.36.0",
        "sentence-transformers==2.2.2",
        "faiss-cpu==1.7.4",
        "numpy==1.23.5",
        "huggingface_hub==0.20.0",
        "nltk==3.9.1"
    ]
)
```
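
To monitor the experiment from the same script, the SDK provides helpers for
waiting on experiment conditions and reading back the best trial. The calls
below are a minimal sketch based on the `KatibClient` API; verify the method
names against the SDK version you have installed:

```python
# Block until the experiment reaches the Succeeded condition,
# printing status updates along the way
katib_client.wait_for_experiment_condition(name=name)

# Fetch the best hyperparameter assignment observed so far
print(katib_client.get_optimal_hyperparameters(name=name))
```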

Once the experiment is submitted, we can see output indicating that Katib has started the trials:

```commandline
Experiment Trials status: 0 Trials, 0 Pending Trials, 0 Running Trials, 0 Succeeded Trials, 0 Failed Trials, 0 EarlyStopped Trials, 0 MetricsUnavailable Trials
Current Optimal Trial:
 {'best_trial_name': None,
 'observation': {'metrics': None},
 'parameter_assignments': None}
Experiment conditions:
 [{'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'message': 'Experiment is created',
 'reason': 'ExperimentCreated',
 'status': 'True',
 'type': 'Created'}]
Waiting for Experiment: kubeflow/rag-tuning-experiment to reach Succeeded condition

.....

Experiment Trials status: 9 Trials, 0 Pending Trials, 2 Running Trials, 7 Succeeded Trials, 0 Failed Trials, 0 EarlyStopped Trials, 0 MetricsUnavailable Trials
Current Optimal Trial:
 {'best_trial_name': 'rag-tuning-experiment-66tmh9g7',
 'observation': {'metrics': [{'latest': '0.047040418725887996',
                              'max': '0.047040418725887996',
                              'min': '0.047040418725887996',
                              'name': 'BLEU'}]},
 'parameter_assignments': [{'name': 'top_k', 'value': '10'},
                           {'name': 'temperature', 'value': '0.6'}]}
Experiment conditions:
 [{'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()),
 'message': 'Experiment is created',
 'reason': 'ExperimentCreated',
 'status': 'True',
 'type': 'Created'}, {'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 52, tzinfo=tzutc()),
 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 52, tzinfo=tzutc()),
 'message': 'Experiment is running',
 'reason': 'ExperimentRunning',
 'status': 'True',
 'type': 'Running'}]
Waiting for Experiment: kubeflow/rag-tuning-experiment to reach Succeeded condition
```

We can also watch the experiment and its trials as Katib searches for the
optimal parameters:

```commandline
kubectl get experiments.kubeflow.org -n kubeflow
NAME                    TYPE      STATUS   AGE
rag-tuning-experiment   Running   True     10m
```

```commandline
kubectl get trials --all-namespaces
NAMESPACE   NAME                             TYPE      STATUS   AGE
kubeflow    rag-tuning-experiment-7wskq9b9   Running   True     10m
kubeflow    rag-tuning-experiment-cll6bt4z   Running   True     10m
kubeflow    rag-tuning-experiment-hzxrzq2t   Running   True     10m
```
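
Because experiments and trials are Kubernetes custom resources, standard
kubectl commands also work for inspecting an individual trial, for example:

```commandline
kubectl describe trials.kubeflow.org rag-tuning-experiment-7wskq9b9 -n kubeflow
```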

The list of completed trials and their results can also be viewed in the Katib
UI. Steps to access the Katib UI are available [in the documentation][katib_ui].
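
One way to reach the UI locally is port-forwarding; the service name and port
below assume the standard standalone install, so adjust for your deployment:

```commandline
kubectl port-forward svc/katib-ui -n kubeflow 8080:80
```

Then open `http://localhost:8080/katib/` in a browser.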
# Conclusion

In this experiment, we leveraged Kubeflow Katib to optimize a
Retrieval-Augmented Generation (RAG) pipeline, systematically tuning key
hyperparameters like `top_k` and `temperature` to enhance retrieval precision
and generative response quality.

For anyone working with RAG systems or hyperparameter optimization, Katib is a
powerful tool, enabling scalable, efficient, and intelligent tuning of machine
learning models. We hope this tutorial helps you streamline hyperparameter
tuning and unlock new efficiencies in your ML workflows!

[Katib]: https://www.kubeflow.org/docs/components/katib/
[kind_documentation]: https://kind.sigs.k8s.io/
[rag]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
[katib_installation]: https://www.kubeflow.org/docs/components/katib/installation/
[retriever_model_paper]: https://www.sciencedirect.com/topics/computer-science/retrieval-model
[FAISS]: https://ai.meta.com/tools/faiss/
[bleu]: https://huggingface.co/spaces/evaluate-metric/bleu
[Katib_metrics_collector]: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/#pull-based-metrics-collector
[katib_ui]: https://www.kubeflow.org/docs/components/katib/user-guides/katib-ui/
[Katib_SDK]: https://www.kubeflow.org/docs/components/katib/installation/#installing-python-sdk
[tune_api]: https://github.com/kubeflow/katib/blob/c18035e1041ca1b87ea7eb7c01cb81b5e2b922b3/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L178
[katib_running_experiment]: https://www.kubeflow.org/docs/components/katib/user-guides/hp-tuning/configure-experiment/#configuring-the-experiment