fix: Improve LiteLLM service and add embeddings support by avonian · Pull Request #719 · emcie-co/parlant

avonian · 2026-01-22T12:23:53Z

Summary

Fixes several issues with the LiteLLM service and adds configurable embeddings support.

Bug fixes:

Use async acompletion() instead of blocking completion() to avoid blocking the event loop
Pass api_key only if LITELLM_PROVIDER_API_KEY is set; otherwise let LiteLLM auto-detect provider-specific keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
Implement do_generate() directly instead of wrapping _do_generate() (fixes abstract method error)
Add trust_remote_code=True to HuggingFace model/tokenizer loading to fix JinaAI embeddings with newer transformers versions

New feature:

Add LiteLLMEmbedder class that uses litellm.aembedding() for provider-agnostic embeddings
New env var LITELLM_EMBEDDING_MODEL_NAME - when set, uses LiteLLM for embeddings; otherwise falls back to local JinaAI
This allows users to use cloud embedding APIs (OpenAI, Cohere, etc.) without the heavy torch/transformers dependencies

Note: This PR supersedes #517 and includes its functionality. The LITELLM_PROVIDER_BASE_URL env var (already in the codebase) provides the same self-hosted LLM support that #517 was adding via LITELLM_PROVIDER_URL. This PR also addresses the feedback on #517 requesting separate LLM and embedding model configuration.

Test plan

Added unit tests for LiteLLMEmbedder initialization and API calls
Added tests for service embedder selection based on env var

iwr-redmond · 2026-01-23T01:08:39Z

Could tests be accomplished without requiring trust_remote_code=True?

avonian · 2026-01-23T16:13:17Z

Could tests be accomplished without requiring trust_remote_code=True?

@iwr-redmond it's needed because Parlant uses JinaAI as the local fallback embedding model, and without trust_remote_code=True the Jina model loads as a standard BertModel instead of JinaBertModel (Jina's custom BERT variant).

The custom implementation lives in the model repo and HuggingFace won't execute it without the flag; and because the architectures differ and the checkpoint weights don't map correctly it produces garbage embeddings that break semantic search.

Tests would still technically pass without trust_remote_code=True, because they mock the embedder and never actually load the model, but in real use the embeddings would be effectively useless.

I get the security concern of course.

Is it a must to use JinaAI? Because if not, we could replace it with a different sentence-transformers model that works properly (e.g., all-MiniLM-L6-v2, all-mpnet-base-v2, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5)

mc-dorzo · 2026-01-25T08:33:15Z


+        # Only pass api_key if explicitly set; otherwise let LiteLLM auto-detect
+        # provider-specific keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
+        api_key = os.environ.get("LITELLM_PROVIDER_API_KEY") or None


Why not just
api_key = os.environ.get("LITELLM_PROVIDER_API_KEY")

mc-dorzo · 2026-01-25T08:34:21Z

+    @property
+    @override
+    def max_tokens(self) -> int:
+        return 8192


Can it be dynamically fetched from an env variable if it exists, with this default value?

mc-dorzo · 2026-01-25T08:34:32Z

+    @override
+    def dimensions(self) -> int:
+        # Common dimensions for popular models; may need adjustment per model
+        return 1536


mc-dorzo · 2026-01-25T08:35:07Z

+        base_url: str | None = None,
+    ) -> None:
+        super().__init__(logger, tracer, meter, model_name)
+        self._model_name = model_name


BaseEmbedder already configures self.model_name, so just use that.

mc-dorzo · 2026-01-25T08:35:29Z

+        hints: Mapping[str, Any] = {},
+    ) -> EmbeddingResult:
+        # Only pass api_key if explicitly set
+        api_key = os.environ.get("LITELLM_PROVIDER_API_KEY") or None


api_key = os.environ.get("LITELLM_PROVIDER_API_KEY")

mc-dorzo · 2026-01-25T08:36:28Z

Overall, looking good mate! Just added some tiny comments.

iwr-redmond · 2026-01-25T08:43:27Z

Off-topic @mc-dorzo but important: you accidentally closed #710 instead of merging it 😉

avonian · 2026-01-25T20:52:03Z

@mc-dorzo thank you for the feedback, I think everything's been addresses take a look at 4f6caa0

standing by

kichanyurd · 2026-01-27T19:05:10Z

@@ -0,0 +1,179 @@
+# Copyright 2026 Emcie Co Ltd.


I'm not sure we need this (I presume generated?) test file. The tests don't seem particularly meaningful, and ultimately, the maintenance and testing of specific provider adapters is IMO better done manually.

Perhaps the LiteLLM tests would be more useful if they tested the environment variables like the Qwen tests? LiteLLM is a very flexible service, which means reading the vars is critical for the functionality to work as expected.

noted, let me see what i can do, will report back

hey guys, just pushed updated tests, tested quite a bit manually too (setting different env params to change models and embedding models) seems to be working well and jinaAI as fallback embedding model as well

LiteLLM service fixes: - Use async acompletion() instead of sync completion() to avoid blocking - Pass api_key only if LITELLM_PROVIDER_API_KEY is set; otherwise let LiteLLM auto-detect provider-specific keys (OPENAI_API_KEY, etc.) - Implement do_generate() directly instead of wrapping _do_generate() - Make LITELLM_PROVIDER_API_KEY optional in verify_environment HuggingFace model loading fixes: - Add trust_remote_code=True to AutoModel.from_pretrained() and AutoTokenizer.from_pretrained() to fix JinaAI embeddings model loading with newer transformers versions Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>

Add LiteLLMEmbedder class that uses litellm.aembedding() to support various embedding providers (OpenAI, Cohere, etc.) through LiteLLM. New environment variable LITELLM_EMBEDDING_MODEL_NAME: - When set, uses LiteLLM for embeddings with the specified model - When not set, falls back to local JinaAI embeddings This allows users to avoid the heavy torch/transformers dependencies required for local embeddings, and enables using cloud embedding APIs with self-hosted LLMs. Includes tests for the new embedder and service configuration. Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>

- Remove redundant 'or None' from os.environ.get calls - Use inherited model_name from BaseEmbedder instead of duplicating - Make max_tokens and dimensions configurable via env vars (LITELLM_EMBEDDING_MAX_TOKENS, LITELLM_EMBEDDING_DIMENSIONS) Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>

Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>

avonian force-pushed the fix/litellm-service-improvements branch 10 times, most recently from 7073f97 to 5f4bda6 Compare January 22, 2026 14:13

mc-dorzo reviewed Jan 25, 2026

View reviewed changes

avonian force-pushed the fix/litellm-service-improvements branch from 0ac1034 to 4f6caa0 Compare January 25, 2026 20:48

iwr-redmond mentioned this pull request Jan 26, 2026

[Enhancement] xllamacpp NLP adapter #678

Open

kichanyurd reviewed Jan 27, 2026

View reviewed changes

avonian force-pushed the fix/litellm-service-improvements branch 2 times, most recently from d68065c to 264af8d Compare January 28, 2026 07:10

avonian added 3 commits January 28, 2026 08:14

avonian force-pushed the fix/litellm-service-improvements branch from 264af8d to e73826c Compare January 28, 2026 07:15

Rework LiteLLM tests to test env vars

cf3af0a

Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>

avonian force-pushed the fix/litellm-service-improvements branch from 708b123 to cf3af0a Compare January 29, 2026 16:41

mc-dorzo merged commit 68b1ca8 into emcie-co:develop Jan 29, 2026
1 check passed

avonian deleted the fix/litellm-service-improvements branch February 14, 2026 17:59

Conversation

avonian commented Jan 22, 2026

Summary

Test plan

Uh oh!

iwr-redmond commented Jan 23, 2026

Uh oh!

avonian commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mc-dorzo Jan 25, 2026

Choose a reason for hiding this comment

Uh oh!

mc-dorzo Jan 25, 2026

Choose a reason for hiding this comment

Uh oh!

mc-dorzo Jan 25, 2026

Choose a reason for hiding this comment

Uh oh!

mc-dorzo Jan 25, 2026

Choose a reason for hiding this comment

Uh oh!

mc-dorzo Jan 25, 2026

Choose a reason for hiding this comment

Uh oh!

mc-dorzo commented Jan 25, 2026

Uh oh!

iwr-redmond commented Jan 25, 2026

Uh oh!

avonian commented Jan 25, 2026

Uh oh!

kichanyurd Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

iwr-redmond Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

avonian Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

avonian Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

avonian commented Jan 23, 2026 •

edited

Loading

iwr-redmond Jan 27, 2026 •

edited

Loading

avonian Jan 29, 2026 •

edited

Loading