fix: eliminate redundant RAG model reloading that defeated paralleliz… by ARYANPATEL-BIT · Pull Request #382 · kubeedge/ianvs

ARYANPATEL-BIT · 2026-04-10T19:52:38Z

Overview

This PR resolves a critical architectural issue in the Government RAG example where redundant model initialization inside a mutex completely negated parallel execution.

Problem

The process_query() function instantiated a new GovernmentRAG object for nearly every query. Each instantiation:

Reloaded the bge-m3 embedding model
Reinitialized ChromaDB connections

This logic was executed inside a self.gpu_lock, causing:

Thread serialization: All threads in ThreadPoolExecutor(max_workers=4) were forced to wait for the lock
Severe performance degradation: Expensive model loading occurred repeatedly per query
Resource inefficiency: Excessive VRAM usage, memory fragmentation, and unnecessary I/O overhead

As a result, the system behaved like a single-threaded pipeline despite being designed for parallel execution.

Solution

This PR aligns the implementation with a standard single-instance retrieval architecture:

Single initialization
- GovernmentRAG is initialized once during preprocess()
- The same instance is reused across all queries
Optimized lock scope
- gpu_lock now only protects the vector similarity search (self.rag.query())
- Removes unnecessary blocking around model setup
True parallelism for LLM calls
- get_model_response() is moved outside the lock
- Enables concurrent API calls and response generation
Fail-safe initialization
- Adds a fallback initialization in predict() if preprocess() is skipped

Impact

Restores actual multi-threaded performance
Eliminates redundant model reloads
Reduces GPU/VRAM pressure and prevents memory fragmentation
Significantly improves benchmark execution speed and stability

Summary

This change removes a fundamental bottleneck that was silently disabling parallelism, ensuring the benchmarking pipeline performs as intended under concurrent workloads.

##Fixes: #380 381

kubeedge-bot · 2026-04-10T19:52:52Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ARYANPATEL-BIT
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ARYANPATEL-BIT · 2026-04-10T19:53:06Z

/assign @jaypume

gemini-code-assist

Code Review

This pull request optimizes the RAG benchmarking process by initializing the model once during preprocessing to improve parallel performance. However, the reviewer noted that this change removes the logic for handling different retrieval scopes (global, local, and other), which breaks the benchmarking comparisons. Additional feedback pointed out a hardcoded absolute path that hinders portability and suggested consolidating redundant initialization logic.

gemini-code-assist · 2026-04-10T19:54:18Z

examples/government_rag/singletask_learning_bench/testalgorithms/basemodel.py

            else:
+                # Run the embedding-based retrieval under the GPU lock.
+                # This is a quick vector search, not a full model reload.
                with self.gpu_lock:
-                    if rag_type == "[global]":
-                        if self.rag is None:
-                            self.rag = GovernmentRAG(model_name="/home/icyfeather/models/bge-m3", device="cuda", persist_directory="./chroma_db")
-                    elif rag_type == "[local]":
-                        self.rag = GovernmentRAG(model_name="/home/icyfeather/models/bge-m3", device="cuda", persist_directory="./chroma_db", provinces=[location])
-                    else:  # [other]
-                        all_locations = set(self.all_locations)
-                        self.rag = GovernmentRAG(model_name="/home/icyfeather/models/bge-m3", device="cuda", persist_directory="./chroma_db", provinces=list(all_locations - set([location])))
-
                    relevant_docs = self.rag.query(query, k=1)


The logic for handling different rag_type values ([local], [other]) has been removed. Previously, these types triggered the creation of a GovernmentRAG instance with specific provinces filters. Now, all RAG queries use the same global instance initialized in preprocess(), which ignores the location and rag_type parameters for retrieval. This breaks the benchmarking logic intended to compare different retrieval scopes (global vs local vs other).

gemini-code-assist · 2026-04-10T19:54:18Z

examples/government_rag/singletask_learning_bench/testalgorithms/basemodel.py

+        # ChromaDB each time. That made the ThreadPoolExecutor useless
+        # since all threads were serialized by the gpu_lock while waiting
+        # for the slow model loading to finish.
        self.rag = GovernmentRAG(model_name="/home/icyfeather/models/bge-m3", device="cuda", persist_directory="./chroma_db")


The model_name parameter is set to a hardcoded absolute path (/home/icyfeather/models/bge-m3). This makes the code non-portable and will cause it to fail on any system where this specific path does not exist. Consider using a configuration parameter or a relative path.

gemini-code-assist · 2026-04-10T19:54:18Z

examples/government_rag/singletask_learning_bench/testalgorithms/basemodel.py

+        if self.rag is None:
+            LOGGER.info("RAG not initialized yet, loading now...")
+            self.rag = GovernmentRAG(
+                model_name="/home/icyfeather/models/bge-m3",
+                device="cuda",
+                persist_directory="./chroma_db"
+            )


The initialization logic for GovernmentRAG is duplicated here. To ensure consistency and avoid repeating hardcoded parameters, you should call self.preprocess() instead.

if self.rag is None: LOGGER.info("RAG not initialized yet, loading now...") self.preprocess()

…ation Signed-off-by: Aryan Patel <aryan.patel7291@gmail.com>

kubeedge-bot requested review from MooreZheng and Poorunga April 10, 2026 19:52

kubeedge-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 10, 2026

kubeedge-bot assigned jaypume Apr 10, 2026

gemini-code-assist bot reviewed Apr 10, 2026

View reviewed changes

fix: eliminate redundant RAG model reloading that defeated paralleliz…

462a739

…ation Signed-off-by: Aryan Patel <aryan.patel7291@gmail.com>

ARYANPATEL-BIT force-pushed the perf/gov-rag-model-loading branch from f04e3a9 to 462a739 Compare April 10, 2026 20:06

MooreZheng requested review from IcyFeather233 and hsj576 and removed request for Poorunga April 13, 2026 11:01

MooreZheng unassigned jaypume Apr 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: eliminate redundant RAG model reloading that defeated paralleliz…#382

fix: eliminate redundant RAG model reloading that defeated paralleliz…#382
ARYANPATEL-BIT wants to merge 1 commit intokubeedge:mainfrom
ARYANPATEL-BIT:perf/gov-rag-model-loading

ARYANPATEL-BIT commented Apr 10, 2026 •

edited

Loading

Uh oh!

kubeedge-bot commented Apr 10, 2026

Uh oh!

ARYANPATEL-BIT commented Apr 10, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 10, 2026

Uh oh!

gemini-code-assist bot Apr 10, 2026

Uh oh!

gemini-code-assist bot Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ARYANPATEL-BIT commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Problem

Solution

Impact

Summary

Uh oh!

kubeedge-bot commented Apr 10, 2026

Uh oh!

ARYANPATEL-BIT commented Apr 10, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ARYANPATEL-BIT commented Apr 10, 2026 •

edited

Loading