Shift embedding flow to litellm project #2310
base: master
Conversation
Important: Review skipped. Selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (1)

📝 Walkthrough

The pull request refactors LLM embedding retrieval in the processor from in-process litellm calls to REST-based requests via ActionUtility. Text truncation is added before sending to the LLM, with configurable timeouts and strict HTTP status validation. Tests are updated to mock the new LLMProcessor.get_embedding method and verify the REST-based request flow.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client
participant LLMProcessor
participant ActionUtility
participant LLMAPIServer
rect rgb(200, 230, 200)
Note over Client,LLMAPIServer: New REST-Based Embedding Flow
end
Client->>LLMProcessor: get_embedding(texts, user)
activate LLMProcessor
LLMProcessor->>LLMProcessor: truncate_text(texts)
LLMProcessor->>LLMProcessor: Build request body<br/>{text, user, kwargs}
LLMProcessor->>ActionUtility: execute_request_async(url, method, body, timeout)
activate ActionUtility
ActionUtility->>LLMAPIServer: POST /embedding<br/>(timeout: 30s)
activate LLMAPIServer
alt HTTP Status Success
LLMAPIServer-->>ActionUtility: 200-204 response<br/>with embeddings
ActionUtility-->>LLMProcessor: response_dict
rect rgb(220, 240, 220)
Note over LLMProcessor: Validate & Return
end
LLMProcessor->>LLMProcessor: if list: return first<br/>else: return response
LLMProcessor-->>Client: embeddings
else HTTP Status Error
LLMAPIServer-->>ActionUtility: 4xx/5xx status
ActionUtility-->>LLMProcessor: exception
LLMProcessor-->>Client: raise exception
end
deactivate LLMAPIServer
deactivate ActionUtility
deactivate LLMProcessor
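
For orientation, the following is a minimal sketch of what the REST-based get_embedding flow described above might look like, pieced together from the diff hunks and proposed fixes quoted later in this review. The URL pattern, the Utility.environment['llm'] config keys, the is_single_text handling, and the response shape are assumptions drawn from those snippets, not a verbatim copy of the implementation.

import logging
import urllib.parse
from http import HTTPStatus

from kairon.shared.actions.utils import ActionUtility
from kairon.shared.utils import Utility


async def get_embedding(self, texts, user, **kwargs):
    """Sketch only: `self` stands in for the LLMProcessor instance."""
    # Whether the caller passed a single string rather than a list of texts
    # (the real method determines this earlier; isinstance is a stand-in here).
    is_single_text = isinstance(texts, str)

    # Truncate the texts before building the request payload.
    truncated_texts = self.truncate_text(texts)
    kwargs["truncated_texts"] = truncated_texts
    kwargs["api_key"] = self.llm_secret_embedding.get("api_key")

    body = {"text": texts, "user": user, "kwargs": kwargs}
    timeout = Utility.environment["llm"].get("request_timeout", 30)

    # Delegate the HTTP call to ActionUtility instead of calling litellm in-process.
    http_response, status_code, elapsed_time, _ = await ActionUtility.execute_request_async(
        http_url=f"{Utility.environment['llm']['url']}/"
                 f"{urllib.parse.quote(self.bot)}/aembedding/{self.llm_type}",
        request_method="POST",
        request_body=body,
        timeout=timeout,
    )
    logging.info(f"LLM request completed in {elapsed_time} for bot: {self.bot}")

    # Strict status validation: anything outside 200-204 is treated as a failure.
    if status_code not in [200, 201, 202, 203, 204]:
        raise Exception(HTTPStatus(status_code).phrase)

    # Single-text calls unwrap the first embedding from the list response.
    if is_single_text and isinstance(http_response, list):
        return http_response[0]
    return http_response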
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 6
🧹 Nitpick comments (3)
tests/unit_test/vector_embeddings/qdrant_test.py (1)
33-74: Test correctly updated to reflect REST-based embedding flow.

The test changes properly align with the implementation refactoring:

- Mocking shifted from litellm.aembedding to LLMProcessor.get_embedding (line 37)
- Using autospec=True for type safety
- Mock return value correctly set (line 63)
- Assertions verify the mock was called (lines 72-73)

Optional enhancement: Consider verifying the arguments passed to mock_get_embedding to ensure the test validates the correct data flow:

mock_get_embedding.assert_called_once_with(
    mock.ANY,   # self parameter
    'Hi',       # text
    user=user,  # user parameter
)

This would catch regressions if the calling code changes the arguments passed to get_embedding.

tests/unit_test/data_processor/data_processor_test.py (2)
3712-3712: Consider verifying mock arguments for stronger test coverage.

The assertion only checks that get_embedding was called once, but doesn't verify the texts being embedded. Given this PR migrates the embedding flow, consider using assert_called_once_with or checking call_args to ensure the correct data is sent for embedding.

# Example: Verify the texts passed for embedding
call_args = mock_get_embedding.call_args
assert len(call_args[0][1]) == 2  # Verify number of texts
20288-20336: Consider adding cleanup for test data isolation.

The test creates CollectionData documents but doesn't clean them up after the test. While the test verifies the stale item is removed, the remaining item_1 document persists and could affect other tests.

🔎 Proposed fix

 assert "item_2" not in remaining_ids
+
+CollectionData.objects(collection_name=collection_name).delete()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- kairon/shared/llm/processor.py
- tests/integration_test/action_service_test.py
- tests/integration_test/services_test.py
- tests/unit_test/data_processor/data_processor_test.py
- tests/unit_test/llm_test.py
- tests/unit_test/vector_embeddings/qdrant_test.py
🧰 Additional context used
🧬 Code graph analysis (2)
kairon/shared/llm/processor.py (1)
kairon/shared/actions/utils.py (1)
execute_request_async(42-84)
tests/unit_test/data_processor/data_processor_test.py (2)
kairon/shared/data/constant.py (2)
SyncType(330-332)
SYNC_STATUS(121-136)

kairon/shared/cognition/processor.py (2)

upsert_data(697-817)
CognitionDataProcessor(26-927)
🪛 Ruff (0.14.10)
kairon/shared/llm/processor.py
182-182: Create your own exception
(TRY002)
tests/unit_test/data_processor/data_processor_test.py
6797-6797: Unused method argument: mock_vec_client
(ARG002)
20234-20234: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Analyze (python)
- GitHub Check: Python CI
truncated_texts = self.truncate_text(texts)
kwargs["truncated_texts"] = truncated_texts
kwargs["api_key"] = self.llm_secret_embedding.get("api_key")

result = await litellm.aembedding(
    model="text-embedding-3-large",
    input=truncated_texts,
    metadata={'user': user, 'bot': self.bot, 'invocation': kwargs.get("invocation")},
    api_key=self.llm_secret_embedding.get('api_key'),
    num_retries=3
)

embeddings = [embedding["embedding"] for embedding in result["data"]]

if is_single_text:
    return embeddings[0]
body = {
    "text": texts,
    "user": user,
    "kwargs": kwargs,
}
Redundant data transmission: sending both original and truncated texts.
The request body includes both the original texts (line 168) and truncated_texts (line 164 added to kwargs, line 170). Since the remote LLM service should only process the truncated version, sending the original texts is wasteful and increases payload size unnecessarily.
Recommendation: Send only the truncated texts in the request body, or clarify if the remote service requires both versions for a specific reason.
🔎 Proposed fix to eliminate redundant data
truncated_texts = self.truncate_text(texts)
-kwargs["truncated_texts"] = truncated_texts
kwargs["api_key"] = self.llm_secret_embedding.get("api_key")
body = {
- "text": texts,
+ "text": truncated_texts,
"user": user,
"kwargs": kwargs,
 }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

truncated_texts = self.truncate_text(texts)
kwargs["truncated_texts"] = truncated_texts
kwargs["api_key"] = self.llm_secret_embedding.get("api_key")
result = await litellm.aembedding(
    model="text-embedding-3-large",
    input=truncated_texts,
    metadata={'user': user, 'bot': self.bot, 'invocation': kwargs.get("invocation")},
    api_key=self.llm_secret_embedding.get('api_key'),
    num_retries=3
)
embeddings = [embedding["embedding"] for embedding in result["data"]]
if is_single_text:
    return embeddings[0]
body = {
    "text": texts,
    "user": user,
    "kwargs": kwargs,
}

Suggested:

truncated_texts = self.truncate_text(texts)
kwargs["api_key"] = self.llm_secret_embedding.get("api_key")
body = {
    "text": truncated_texts,
    "user": user,
    "kwargs": kwargs,
}
🤖 Prompt for AI Agents
In kairon/shared/llm/processor.py around lines 163 to 171, the request body is
sending both the original `texts` and the `truncated_texts` (via kwargs), which
is redundant and increases payload size; change the body to include only the
truncated texts (replace "text": texts with "text": truncated_texts or remove
the original `texts` entry) and ensure any callers or the remote service receive
`truncated_texts` in the expected key (adjust key name if the remote API expects
a different field), while retaining `user` and `kwargs` as needed.
kwargs["api_key"] = self.llm_secret_embedding.get("api_key")

result = await litellm.aembedding(
    model="text-embedding-3-large",
    input=truncated_texts,
    metadata={'user': user, 'bot': self.bot, 'invocation': kwargs.get("invocation")},
    api_key=self.llm_secret_embedding.get('api_key'),
    num_retries=3
)

embeddings = [embedding["embedding"] for embedding in result["data"]]

if is_single_text:
    return embeddings[0]
body = {
    "text": texts,
    "user": user,
    "kwargs": kwargs,
}
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file and get context around lines 165-171
head -n 180 kairon/shared/llm/processor.py | tail -n 30
Length of output: 1176
🏁 Script executed:
# Let's get more comprehensive context
sed -n '150,200p' kairon/shared/llm/processor.py
Length of output: 2249
🏁 Script executed:
# Find the ActionUtility.execute_request_async method to understand how it handles requests
rg -n "def execute_request_async" --type py -A 20
Length of output: 1700
🏁 Script executed:
# Search for any logging of request bodies in the codebase
rg -n "request_body|request body" kairon/ --type py | head -20
Length of output: 1744
🏁 Script executed:
# Get full execute_request_async implementation
sed -n '42,150p' kairon/shared/actions/utils.py
Length of output: 4892
🏁 Script executed:
# Check where that logging happens and verify if request_body gets logged with the API key
sed -n '1390,1410p' kairon/shared/utils.py
Length of output: 938
🏁 Script executed:
# Check if the aembedding endpoint is local or external and what it expects
grep -r "aembedding" --type py | head -20
Length of output: 182
🏁 Script executed:
# Look for other usages of execute_request_async to see if headers are commonly used
rg "execute_request_async" --type py -B 2 -A 5 | head -100
Length of output: 6624
🏁 Script executed:
# Search for aembedding endpoint definition/documentation
rg "aembedding" . --type py -B 5 -A 5
Length of output: 50377
🏁 Script executed:
# Look for how other API key usage is handled in the codebase
rg "api_key.*Bearer|Authorization.*api_key" . --type py | head -20
Length of output: 200
🏁 Script executed:
# Check if there's any documentation about the llm service/embedding service expectations
find . -name "*.md" -o -name "*.txt" | xargs grep -l "aembedding\|embedding.*api" 2>/dev/null | head -5
Length of output: 47
🏁 Script executed:
# Look at how other services pass API keys (to see if body or header is standard in this codebase)
rg "Authorization.*Bearer|api.key.*header" . --type py -A 2 -B 2 | head -50
Length of output: 5294
API key in request body is a security best practice violation; move to headers.
The API key is included in the request body via kwargs, which increases the risk of exposure if the service logs, caches, or proxies the request body. While execute_request_async supports headers, the current implementation doesn't use them. Pass the API key via an Authorization header instead.
🔎 Proposed fix
truncated_texts = self.truncate_text(texts)
-kwargs["truncated_texts"] = truncated_texts
-kwargs["api_key"] = self.llm_secret_embedding.get("api_key")
body = {
"text": texts,
"user": user,
+ "truncated_texts": truncated_texts,
- "kwargs": kwargs,
}
timeout = Utility.environment["llm"].get("request_timeout", 30)
+headers = {"Authorization": f"Bearer {self.llm_secret_embedding.get('api_key')}"}
http_response, status_code, elapsed_time, _ = await ActionUtility.execute_request_async(
http_url=f"{Utility.environment['llm']['url']}/{urllib.parse.quote(self.bot)}/aembedding/{self.llm_type}",
request_method="POST",
request_body=body,
+ headers=headers,
timeout=timeout,
 )

Ensure the receiving service expects the API key in the Authorization header.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In kairon/shared/llm/processor.py around lines 165 to 171, the code places the
API key into kwargs that are included in the request body; instead remove
api_key from the body and set an Authorization header (e.g., "Authorization":
f"Bearer {api_key}") when calling execute_request_async. Update the body to
exclude api_key, build a headers dict with the Authorization value (and preserve
any existing headers), and pass that headers dict into execute_request_async so
the key is sent in headers rather than in the request body; ensure the receiving
service accepts Authorization headers.
if status_code not in [200, 201, 202, 203, 204]:
    raise Exception(HTTPStatus(status_code).phrase)
Use a custom exception instead of generic Exception.
Raising a generic Exception with only the HTTP status phrase provides limited context for debugging. The static analysis tool correctly flags this as a code smell.
🔎 Proposed fix using AppException
logging.info(f"LLM request completed in {elapsed_time} for bot: {self.bot}")
if status_code not in [200, 201, 202, 203, 204]:
- raise Exception(HTTPStatus(status_code).phrase)
+ raise AppException(f"LLM embedding request failed with status {status_code}: {HTTPStatus(status_code).phrase}")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

if status_code not in [200, 201, 202, 203, 204]:
    raise Exception(HTTPStatus(status_code).phrase)

Suggested:

if status_code not in [200, 201, 202, 203, 204]:
    raise AppException(f"LLM embedding request failed with status {status_code}: {HTTPStatus(status_code).phrase}")
🧰 Tools
🪛 Ruff (0.14.10)
182-182: Create your own exception
(TRY002)
🤖 Prompt for AI Agents
In kairon/shared/llm/processor.py around lines 181-182 the code raises a generic
Exception with only HTTPStatus(...).phrase; replace this with the project's
AppException (or a suitable custom exception) so callers get structured error
info: construct and raise AppException including the HTTP status code, phrase
and any relevant response body or error payload (if available) to provide
context, and ensure the exception type is imported at the top of the file.
if is_single_text and isinstance(http_response, list):
    return http_response[0]
return http_response
Validate http_response structure before returning.
The code assumes http_response is either a list or the correct format without validation. If the remote service returns None, an unexpected structure, or an error payload, this could cause issues downstream.
Recommendation: Add validation to ensure http_response has the expected structure (e.g., list of embeddings) and raise an appropriate exception if not.
🔎 Proposed fix with response validation
logging.info(f"LLM request completed in {elapsed_time} for bot: {self.bot}")
if status_code not in [200, 201, 202, 203, 204]:
raise AppException(f"LLM embedding request failed with status {status_code}: {HTTPStatus(status_code).phrase}")
+
+if http_response is None:
+ raise AppException("LLM embedding request returned None response")
+
+if not isinstance(http_response, (list, dict)):
+ raise AppException(f"Unexpected response type from LLM service: {type(http_response)}")
+
if is_single_text and isinstance(http_response, list):
return http_response[0]
 return http_response

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In kairon/shared/llm/processor.py around lines 183 to 185, the code returns
http_response without validating its structure; update it to (1) verify
http_response is not None, (2) if is_single_text ensure http_response is a
non-empty list and the first element has the expected shape/type (or raise
ValueError/RuntimeError if not), (3) if not single-text ensure http_response is
a list of the expected items (or a dict/structure you expect) and validate each
entry, (4) detect common error payloads (e.g., dict with "error" key or
unexpected types) and raise a descriptive exception rather than returning them;
return the validated element (http_response[0]) or the validated response only
after these checks.
@patch("kairon.shared.rest_client.AioRestClient.request", autospec=True)
@patch("kairon.shared.account.processor.AccountProcessor.get_bot", autospec=True)
@patch("kairon.train.train_model_for_bot", autospec=True)
@patch.object(LLMProcessor, "get_embedding", autospec=True)
def test_start_training_with_llm_faq(
-    self, mock_train, mock_bot, mock_vec_client, mock_openai
+    self, mock_get_embedding, mock_train, mock_bot, mock_vec_client
Remove unused mock decorator and parameter.
The mock_vec_client parameter is flagged as unused by static analysis. The @patch("kairon.shared.rest_client.AioRestClient.request", autospec=True) decorator at line 6792 appears to be leftover from the previous implementation and can be removed along with its corresponding parameter.
🔎 Proposed fix
- @patch("kairon.shared.rest_client.AioRestClient.request", autospec=True)
@patch("kairon.shared.account.processor.AccountProcessor.get_bot", autospec=True)
@patch("kairon.train.train_model_for_bot", autospec=True)
@patch.object(LLMProcessor, "get_embedding", autospec=True)
def test_start_training_with_llm_faq(
- self, mock_get_embedding, mock_train, mock_bot, mock_vec_client
+ self, mock_get_embedding, mock_train, mock_bot
 ):

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

@patch("kairon.shared.rest_client.AioRestClient.request", autospec=True)
@patch("kairon.shared.account.processor.AccountProcessor.get_bot", autospec=True)
@patch("kairon.train.train_model_for_bot", autospec=True)
@patch.object(LLMProcessor, "get_embedding", autospec=True)
def test_start_training_with_llm_faq(
    self, mock_train, mock_bot, mock_vec_client, mock_openai
    self, mock_get_embedding, mock_train, mock_bot, mock_vec_client

Suggested:

@patch("kairon.shared.account.processor.AccountProcessor.get_bot", autospec=True)
@patch("kairon.train.train_model_for_bot", autospec=True)
@patch.object(LLMProcessor, "get_embedding", autospec=True)
def test_start_training_with_llm_faq(
    self, mock_get_embedding, mock_train, mock_bot
):
🧰 Tools
🪛 Ruff (0.14.10)
6797-6797: Unused method argument: mock_vec_client
(ARG002)
🤖 Prompt for AI Agents
In tests/unit_test/data_processor/data_processor_test.py around lines 6792 to
6797, the test has an unused patch decorator
@patch("kairon.shared.rest_client.AioRestClient.request", autospec=True) and its
corresponding parameter mock_vec_client; remove that decorator and delete the
mock_vec_client parameter from the test signature so the remaining patches and
parameters line up correctly (ensure the order of @patch decorators still
matches the remaining test parameters).
result = list(
    processor.list_cognition_data(
        bot=bot,
        data="bot",
        start_idx=0,
        page_size=10
    )
)
Address unused variable result.
The result variable is assigned but never used, as flagged by static analysis. Either assert on the result value or prefix with underscore to indicate it's intentionally unused.
🔎 Proposed fix
- result = list(
+ _ = list(
processor.list_cognition_data(
bot=bot,
data="bot",
start_idx=0,
page_size=10
)
 )

Or alternatively, add an assertion:
result = list(
processor.list_cognition_data(
bot=bot,
data="bot",
start_idx=0,
page_size=10
)
)
+ assert result == []
 mock_queryset.search_text.assert_called_once_with("bot")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

result = list(
    processor.list_cognition_data(
        bot=bot,
        data="bot",
        start_idx=0,
        page_size=10
    )
)

Suggested:

_ = list(
    processor.list_cognition_data(
        bot=bot,
        data="bot",
        start_idx=0,
        page_size=10
    )
)
🧰 Tools
🪛 Ruff (0.14.10)
20234-20234: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
🤖 Prompt for AI Agents
In tests/unit_test/data_processor/data_processor_test.py around lines 20234 to
20241, the local variable `result` is assigned but never used; change this by
either asserting on the expected outcome of processor.list_cognition_data (e.g.,
assert its length or contents) or, if the value is intentionally not used,
rename the variable to `_result` (or prefix with an underscore) to silence the
unused-variable warning; update the test to include a concrete assertion when
appropriate so the test verifies behavior rather than merely calling the method.
Summary by CodeRabbit
Release Notes