generated from fastai/fastpages
-
Notifications
You must be signed in to change notification settings - Fork 56
Blog: Add post on leveraging Katib for efficient RAG optimization. #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
google-oss-prow
merged 1 commit into
kubeflow:master
from
varshaprasad96:post/rag_katib
Mar 14, 2025
+289
−0
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,289 @@ | ||
| --- | ||
| toc: true | ||
| layout: post | ||
| categories: [ katib ] | ||
| description: "Leveraging Katib for efficient RAG optimization." | ||
| comments: true | ||
| title: "Optimizing RAG Pipelines with Katib: Hyperparameter Tuning for Better Retrieval & Generation" | ||
| hide: false | ||
| permalink: /katib/rag/ | ||
| author: "Varsha Prasad Narsing (@varshaprasad96)" | ||
| --- | ||
|
|
||
| # Introduction | ||
|
|
||
| As artificial intelligence and machine learning models become more | ||
| sophisticated, optimising their performance remains a critical challenge. | ||
| Kubeflow provides a robust component, [Katib][Katib], designed for | ||
| hyperparameter optimization and neural architecture search. As a part of the | ||
| Kubeflow ecosystem, Katib enables scalable, automated tuning of underlying | ||
| machine learning models, reducing the manual effort required for parameter | ||
| selection while improving model performance across diverse ML workflows. | ||
|
|
||
| With Retrieval-Augmented Generation ([RAG][rag]) becoming an increasingly | ||
| popular approach for improving search and retrieval quality, optimizing its | ||
| parameters is essential to achieving high-quality results. RAG pipelines involve | ||
| multiple hyperparameters that influence retrieval accuracy, hallucination | ||
| reduction, and language generation quality. In this blog, we will explore how | ||
| Katib can be leveraged to fine-tune a RAG pipeline, ensuring optimal performance | ||
| by systematically adjusting key hyperparameters. | ||
|
|
||
| # Let's Get Started! | ||
|
|
||
| ## STEP 1: Setup | ||
|
|
||
| Since compute resources are scarcer than a perfectly labeled dataset :), we’ll | ||
| use a lightweight [Kind cluster (Kubernetes in Docker)][kind_documentation] | ||
| cluster to run this example locally. Rest assured, this setup can seamlessly | ||
| scale to larger clusters by increasing the dataset size and the number of | ||
| hyperparameters to tune. | ||
|
|
||
| To get started, we'll first install the Katib control plane in our cluster by | ||
| following the steps outlined [in the documentation][katib_installation]. | ||
|
|
||
| ## STEP 2: Implementing RAG pipeline | ||
|
|
||
| In this implementation, we use a [retriever model][retriever_model_paper], which | ||
| encodes queries and documents into vector representations to find the most | ||
| relevant matches, to fetch relevant documents based on a query and a generator | ||
| model to produce coherent text responses. | ||
|
|
||
| ### Implementation Details: | ||
|
|
||
| 1. Retriever: Sentence Transformer & FAISS (Facebook AI Similarity Search) Index | ||
| - A SentenceTransformer model (paraphrase-MiniLM-L6-v2) encodes predefined | ||
| documents into vector representations. | ||
| - [FAISS][FAISS] is used to index these document embeddings and perform | ||
| efficient similarity searches to retrieve the most relevant documents. | ||
| 2. Generator: Pre-trained GPT-2 Model | ||
| - A Hugging Face GPT-2 text generation pipeline (which can be replaced with | ||
| any other model) is used to generate responses based on the retrieved | ||
| documents. I chose GPT-2 for this example as it is lightweight enough to | ||
| run on my local machine while still generating coherent responses. | ||
| 3. Query Processing & Response Generation | ||
| - When a query is submitted, the retriever encodes it and searches the FAISS | ||
| index for the top-k most similar documents. | ||
| - These retrieved documents are concatenated to form the input context, which | ||
| is then passed to the GPT-2 model to generate a response. | ||
| 4. Evaluation: [BLEU][bleu] (Bilingual Evaluation Understudy) Score Calculation | ||
| - To assess the quality of generated responses, we use the BLEU score, a | ||
| popular metric for evaluating text generation. | ||
| - The evaluate function takes a query, retrieves documents, generates a | ||
| response, and compares it against a ground-truth reference to compute a | ||
| BLEU score with smoothing functions from the nltk library. | ||
|
|
||
| To run Katib, we will use the [Katib SDK][Katib_SDK], which provides a programmatic interface for defining and running | ||
| hyperparameter tuning experiments in Kubeflow. | ||
|
|
||
| Katib requires an [objective][katib_running_experiment] function, which: | ||
|
|
||
| 1. Defines what we want to optimize (e.g., BLEU score for text generation quality). | ||
| 2. Executes the RAG pipeline with different hyperparameter values. | ||
| 3. Returns an evaluation metric so Katib can compare different hyperparameter configurations. | ||
|
|
||
| ```python | ||
| def objective(parameters): | ||
| # Import dependencies inside the function (required for Katib) | ||
| import numpy as np | ||
| import faiss | ||
| from sentence_transformers import SentenceTransformer | ||
| from transformers import pipeline | ||
| from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction | ||
|
|
||
| # Function to fetch documents (Modify as needed) | ||
| def fetch_documents(): | ||
| """Returns a predefined list of documents or loads them from a file.""" | ||
| return [ | ||
| ... | ||
| ] | ||
| # OR, to load from a file: | ||
| # with open("/path/to/documents.json", "r") as f: | ||
| # return json.load(f) | ||
|
|
||
| # Define the RAG pipeline within the function | ||
| def rag_pipeline_execute(query, top_k, temperature): | ||
| """Retrieves relevant documents and generates a response using GPT-2.""" | ||
|
|
||
| # Initialize retriever | ||
| retriever_model = SentenceTransformer("paraphrase-MiniLM-L6-v2") | ||
|
|
||
| # Sample documents | ||
| documents = fetch_documents() | ||
|
|
||
| # Encode documents | ||
| doc_embeddings = retriever_model.encode(documents) | ||
| index = faiss.IndexFlatL2(doc_embeddings.shape[1]) | ||
| index.add(np.array(doc_embeddings)) | ||
|
|
||
| # Encode query and retrieve top-k documents | ||
| query_embedding = retriever_model.encode([query]) | ||
| distances, indices = index.search(query_embedding, top_k) | ||
| retrieved_docs = [documents[i] for i in indices[0]] | ||
|
|
||
| # Generate response using GPT-2 | ||
| generator = pipeline("text-generation", model="gpt2", tokenizer="gpt2") | ||
| context = " ".join(retrieved_docs) | ||
| generated = generator(context, max_length=50, temperature=temperature, num_return_sequences=1) | ||
|
|
||
| return generated[0]["generated_text"] | ||
|
|
||
| # TODO: Provide queries and ground truth directly here or load them dynamically from a file/external volume. | ||
| query = "" # Example: "Tell me about the Eiffel Tower." | ||
| ground_truth = "" # Example: "The Eiffel Tower is a famous landmark in Paris." | ||
|
|
||
| # Extract hyperparameters | ||
| top_k = int(parameters["top_k"]) | ||
| temperature = float(parameters["temperature"]) | ||
|
|
||
| # Generate response | ||
| response = rag_pipeline_execute(query, top_k, temperature) | ||
|
|
||
| # Compute BLEU score | ||
| reference = [ground_truth.split()] # Tokenized reference | ||
| candidate = response.split() # Tokenized candidate response | ||
| smoothie = SmoothingFunction().method1 | ||
| bleu_score = sentence_bleu(reference, candidate, smoothing_function=smoothie) | ||
|
|
||
| # Print BLEU score in Katib-compatible format | ||
| print(f"BLEU={bleu_score}") | ||
| ``` | ||
| _Note_: Make sure to return the result in the format of `<parameter>=<value>` | ||
| for Katib's metrics collector to be able to utilize it. More ways to configure | ||
| the output are available in [Katib Metrics | ||
| Collector][Katib_metrics_collector] guide. | ||
|
|
||
| ## STEP 3: Run a Katib Experiment | ||
|
|
||
| Once our pipeline is encapsulated within the objective function, we can configure Katib to optimize the `BLEU` score by | ||
| tuning the hyperparameters: | ||
|
|
||
| 1. `top_k`: The number of documents retrieved (eg. between 10 and 20). | ||
| 2. `temperature`: The randomness of text generation (eg. between 0.5 and 1.0). | ||
|
|
||
| # Define hyperparameter search space | ||
| ```python | ||
| parameters = { | ||
| "top_k": katib.search.int(min=10, max=20), | ||
| "temperature": katib.search.double(min=0.5, max=1.0, step=0.1) | ||
| } | ||
| ``` | ||
|
|
||
| Let's submit the experiment! We'll use the [`tune` API ][tune_api] that will run multiple trials to find the optimal `top_k` | ||
| and `temperature` values for our RAG pipeline. | ||
|
|
||
| ```python | ||
| katib_client = katib.KatibClient(namespace="kubeflow") | ||
|
|
||
| name = "rag-tuning-experiment" | ||
| katib_client.tune( | ||
| name=name, | ||
| objective=objective, | ||
| parameters=parameters, | ||
| algorithm_name="grid", # Grid search for hyperparameter tuning | ||
| objective_metric_name="BLEU", | ||
| objective_type="maximize", | ||
| objective_goal=0.8, | ||
| max_trial_count=10, # Run up to 10 trials | ||
| parallel_trial_count=2, # Run 2 trials in parallel | ||
| resources_per_trial={"cpu": "1", "memory": "2Gi"}, | ||
| base_image="python:3.10-slim", | ||
| packages_to_install=[ | ||
| "transformers==4.36.0", | ||
| "sentence-transformers==2.2.2", | ||
| "faiss-cpu==1.7.4", | ||
| "numpy==1.23.5", | ||
| "huggingface_hub==0.20.0", | ||
| "nltk==3.9.1" | ||
| ] | ||
| ) | ||
| ``` | ||
|
|
||
| Once the experiment is submitted, we can see output indicating that Katib has started the trials: | ||
|
|
||
| ```commandline | ||
| Experiment Trials status: 0 Trials, 0 Pending Trials, 0 Running Trials, 0 Succeeded Trials, 0 Failed Trials, 0 EarlyStopped Trials, 0 MetricsUnavailable Trials | ||
| Current Optimal Trial: | ||
| {'best_trial_name': None, | ||
| 'observation': {'metrics': None}, | ||
| 'parameter_assignments': None} | ||
| Experiment conditions: | ||
| [{'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()), | ||
| 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()), | ||
| 'message': 'Experiment is created', | ||
| 'reason': 'ExperimentCreated', | ||
| 'status': 'True', | ||
| 'type': 'Created'}] | ||
| Waiting for Experiment: kubeflow/rag-tuning-experiment to reach Succeeded condition | ||
|
|
||
| ..... | ||
|
|
||
| Experiment Trials status: 9 Trials, 0 Pending Trials, 2 Running Trials, 7 Succeeded Trials, 0 Failed Trials, 0 EarlyStopped Trials, 0 MetricsUnavailable Trials | ||
| Current Optimal Trial: | ||
| {'best_trial_name': 'rag-tuning-experiment-66tmh9g7', | ||
| 'observation': {'metrics': [{'latest': '0.047040418725887996', | ||
| 'max': '0.047040418725887996', | ||
| 'min': '0.047040418725887996', | ||
| 'name': 'BLEU'}]}, | ||
| 'parameter_assignments': [{'name': 'top_k', 'value': '10'}, | ||
| {'name': 'temperature', 'value': '0.6'}]} | ||
| Experiment conditions: | ||
| [{'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()), | ||
| 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 32, tzinfo=tzutc()), | ||
| 'message': 'Experiment is created', | ||
| 'reason': 'ExperimentCreated', | ||
| 'status': 'True', | ||
| 'type': 'Created'}, {'last_transition_time': datetime.datetime(2025, 3, 13, 19, 40, 52, tzinfo=tzutc()), | ||
| 'last_update_time': datetime.datetime(2025, 3, 13, 19, 40, 52, tzinfo=tzutc()), | ||
| 'message': 'Experiment is running', | ||
| 'reason': 'ExperimentRunning', | ||
| 'status': 'True', | ||
| 'type': 'Running'}] | ||
| Waiting for Experiment: kubeflow/rag-tuning-experiment to reach Succeeded condition | ||
| ``` | ||
|
|
||
| We can also see the experiments and trials being run to search for the optimized parameter: | ||
|
|
||
| ```commandline | ||
| kubectl get experiments.kubeflow.org -n kubeflow | ||
| NAME TYPE STATUS AGE | ||
| rag-tuning-experiment Running True 10m | ||
| ``` | ||
|
|
||
| ```commandline | ||
| kubectl get trials --all-namespaces | ||
| NAMESPACE NAME TYPE STATUS AGE | ||
| kubeflow rag-tuning-experiment-7wskq9b9 Running True 10m | ||
| kubeflow rag-tuning-experiment-cll6bt4z Running True 10m | ||
| kubeflow rag-tuning-experiment-hzxrzq2t Running True 10m | ||
| ``` | ||
|
|
||
| The list of completed trials and their results will be shown in the UI like | ||
| below. Steps to access Katib UI are available [in the documentation][katib_ui]: | ||
|
|
||
|  | ||
|  | ||
|
|
||
| # Conclusion | ||
|
|
||
| In this experiment, we leveraged Kubeflow Katib to optimize a | ||
| Retrieval-Augmented Generation (RAG) pipeline, systematically tuning key | ||
| hyperparameters like top_k and temperature to enhance retrieval precision and | ||
| generative response quality. | ||
|
|
||
| For anyone working with RAG systems or hyperparameter optimization, Katib is a | ||
| powerful tool—enabling scalable, efficient, and intelligent tuning of machine | ||
| learning models! We hope this tutorial helps you streamline hyperparameter | ||
| tuning and unlock new efficiencies in your ML workflows! | ||
|
|
||
| [Katib]: https://www.kubeflow.org/docs/components/katib/ | ||
| [kind_documentation]: https://kind.sigs.k8s.io/ | ||
| [rag]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation | ||
| [katib_installation]: https://www.kubeflow.org/docs/components/katib/installation/ | ||
| [retriever_model_paper]: https://www.sciencedirect.com/topics/computer-science/retrieval-model | ||
| [FAISS]: https://ai.meta.com/tools/faiss/ | ||
| [bleu]: https://huggingface.co/spaces/evaluate-metric/bleu | ||
| [Katib_metrics_collector]: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/#pull-based-metrics-collector | ||
| [katib_ui]: https://www.kubeflow.org/docs/components/katib/user-guides/katib-ui/ | ||
| [Katib_SDK]: https://www.kubeflow.org/docs/components/katib/installation/#installing-python-sdk | ||
| [tune_api]: https://github.com/kubeflow/katib/blob/c18035e1041ca1b87ea7eb7c01cb81b5e2b922b3/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L178 | ||
| [katib_running_experiment]: https://www.kubeflow.org/docs/components/katib/user-guides/hp-tuning/configure-experiment/#configuring-the-experiment | ||
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.