fix: Improve LiteLLM service and add embeddings support#719
Conversation
7073f97 to
5f4bda6
Compare
|
Could tests be accomplished without requiring |
@iwr-redmond it's needed because Parlant uses JinaAI as the local fallback embedding model, and without The custom implementation lives in the model repo and HuggingFace won't execute it without the flag; and because the architectures differ and the checkpoint weights don't map correctly it produces garbage embeddings that break semantic search. Tests would still technically pass without I get the security concern of course. Is it a must to use JinaAI? Because if not, we could replace it with a different sentence-transformers model that works properly (e.g., all-MiniLM-L6-v2, all-mpnet-base-v2, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5) |
|
|
||
| # Only pass api_key if explicitly set; otherwise let LiteLLM auto-detect | ||
| # provider-specific keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) | ||
| api_key = os.environ.get("LITELLM_PROVIDER_API_KEY") or None |
There was a problem hiding this comment.
Why not just
api_key = os.environ.get("LITELLM_PROVIDER_API_KEY")
| @property | ||
| @override | ||
| def max_tokens(self) -> int: | ||
| return 8192 |
There was a problem hiding this comment.
Can it be dynamically fetched from an env variable if it exists, with this default value?
| @override | ||
| def dimensions(self) -> int: | ||
| # Common dimensions for popular models; may need adjustment per model | ||
| return 1536 |
| base_url: str | None = None, | ||
| ) -> None: | ||
| super().__init__(logger, tracer, meter, model_name) | ||
| self._model_name = model_name |
There was a problem hiding this comment.
BaseEmbedder already configures self.model_name, so just use that.
| hints: Mapping[str, Any] = {}, | ||
| ) -> EmbeddingResult: | ||
| # Only pass api_key if explicitly set | ||
| api_key = os.environ.get("LITELLM_PROVIDER_API_KEY") or None |
There was a problem hiding this comment.
api_key = os.environ.get("LITELLM_PROVIDER_API_KEY")
|
Overall, looking good mate! Just added some tiny comments. |
0ac1034 to
4f6caa0
Compare
| @@ -0,0 +1,179 @@ | |||
| # Copyright 2026 Emcie Co Ltd. | |||
There was a problem hiding this comment.
I'm not sure we need this (I presume generated?) test file. The tests don't seem particularly meaningful, and ultimately, the maintenance and testing of specific provider adapters is IMO better done manually.
There was a problem hiding this comment.
Perhaps the LiteLLM tests would be more useful if they tested the environment variables like the Qwen tests? LiteLLM is a very flexible service, which means reading the vars is critical for the functionality to work as expected.
There was a problem hiding this comment.
noted, let me see what i can do, will report back
There was a problem hiding this comment.
hey guys, just pushed updated tests, tested quite a bit manually too (setting different env params to change models and embedding models) seems to be working well and jinaAI as fallback embedding model as well
d68065c to
264af8d
Compare
LiteLLM service fixes: - Use async acompletion() instead of sync completion() to avoid blocking - Pass api_key only if LITELLM_PROVIDER_API_KEY is set; otherwise let LiteLLM auto-detect provider-specific keys (OPENAI_API_KEY, etc.) - Implement do_generate() directly instead of wrapping _do_generate() - Make LITELLM_PROVIDER_API_KEY optional in verify_environment HuggingFace model loading fixes: - Add trust_remote_code=True to AutoModel.from_pretrained() and AutoTokenizer.from_pretrained() to fix JinaAI embeddings model loading with newer transformers versions Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>
Add LiteLLMEmbedder class that uses litellm.aembedding() to support various embedding providers (OpenAI, Cohere, etc.) through LiteLLM. New environment variable LITELLM_EMBEDDING_MODEL_NAME: - When set, uses LiteLLM for embeddings with the specified model - When not set, falls back to local JinaAI embeddings This allows users to avoid the heavy torch/transformers dependencies required for local embeddings, and enables using cloud embedding APIs with self-hosted LLMs. Includes tests for the new embedder and service configuration. Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>
- Remove redundant 'or None' from os.environ.get calls - Use inherited model_name from BaseEmbedder instead of duplicating - Make max_tokens and dimensions configurable via env vars (LITELLM_EMBEDDING_MAX_TOKENS, LITELLM_EMBEDDING_DIMENSIONS) Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>
264af8d to
e73826c
Compare
Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>
708b123 to
cf3af0a
Compare
Summary
Fixes several issues with the LiteLLM service and adds configurable embeddings support.
Bug fixes:
acompletion()instead of blockingcompletion()to avoid blocking the event loopapi_keyonly ifLITELLM_PROVIDER_API_KEYis set; otherwise let LiteLLM auto-detect provider-specific keys (OPENAI_API_KEY,ANTHROPIC_API_KEY, etc.)do_generate()directly instead of wrapping_do_generate()(fixes abstract method error)trust_remote_code=Trueto HuggingFace model/tokenizer loading to fix JinaAI embeddings with newer transformers versionsNew feature:
LiteLLMEmbedderclass that useslitellm.aembedding()for provider-agnostic embeddingsLITELLM_EMBEDDING_MODEL_NAME- when set, uses LiteLLM for embeddings; otherwise falls back to local JinaAINote: This PR supersedes #517 and includes its functionality. The
LITELLM_PROVIDER_BASE_URLenv var (already in the codebase) provides the same self-hosted LLM support that #517 was adding viaLITELLM_PROVIDER_URL. This PR also addresses the feedback on #517 requesting separate LLM and embedding model configuration.Test plan