
fix: Improve LiteLLM service and add embeddings support #719

Merged
mc-dorzo merged 4 commits into emcie-co:develop from avonian:fix/litellm-service-improvements
Jan 29, 2026

@avonian
Contributor

@avonian avonian commented Jan 22, 2026

Summary

Fixes several issues with the LiteLLM service and adds configurable embeddings support.

Bug fixes:

  • Use async acompletion() instead of blocking completion() to avoid blocking the event loop
  • Pass api_key only if LITELLM_PROVIDER_API_KEY is set; otherwise let LiteLLM auto-detect provider-specific keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
  • Implement do_generate() directly instead of wrapping _do_generate() (fixes abstract method error)
  • Add trust_remote_code=True to HuggingFace model/tokenizer loading to fix JinaAI embeddings with newer transformers versions
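
To illustrate the first fix, here is a minimal sketch of why a blocking call inside a coroutine stalls other tasks. `time.sleep` and `asyncio.sleep` are stand-ins for the synchronous `completion()` and the awaited `acompletion()`; the function names are hypothetical and no LiteLLM call is made.

```python
import asyncio
import time


async def blocking_call(delay: float) -> None:
    # Stand-in for a synchronous completion(): holds the event loop for `delay`.
    time.sleep(delay)


async def non_blocking_call(delay: float) -> None:
    # Stand-in for an awaited acompletion(): yields control while waiting.
    await asyncio.sleep(delay)


async def run_pair(call, delay: float) -> float:
    """Run two calls concurrently and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    await asyncio.gather(call(delay), call(delay))
    return time.perf_counter() - start


def measure(delay: float = 0.2) -> tuple[float, float]:
    """Return (elapsed with blocking calls, elapsed with non-blocking calls)."""
    blocking = asyncio.run(run_pair(blocking_call, delay))
    non_blocking = asyncio.run(run_pair(non_blocking_call, delay))
    return blocking, non_blocking
```

With the blocking stand-in the two calls run back to back, so the pair takes roughly twice the delay; with the awaited stand-in they overlap.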

New feature:

  • Add LiteLLMEmbedder class that uses litellm.aembedding() for provider-agnostic embeddings
  • New env var LITELLM_EMBEDDING_MODEL_NAME - when set, uses LiteLLM for embeddings; otherwise falls back to local JinaAI
  • This allows users to use cloud embedding APIs (OpenAI, Cohere, etc.) without the heavy torch/transformers dependencies
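
The selection rule described above could look roughly like the sketch below. The function name and the returned labels are hypothetical, and treating an empty value the same as an unset one is an assumption, not something the PR states.

```python
import os
from typing import Mapping, Optional


def select_embedder(
    env: Optional[Mapping[str, str]] = None,
) -> tuple[str, Optional[str]]:
    """Return (embedder kind, model name) based on LITELLM_EMBEDDING_MODEL_NAME."""
    env = os.environ if env is None else env
    model_name = env.get("LITELLM_EMBEDDING_MODEL_NAME")
    if model_name:
        # Variable set: route embeddings through LiteLLM with that model.
        return ("litellm", model_name)
    # Variable unset (or empty, by assumption): use the local JinaAI fallback.
    return ("jina", None)
```

Passing the environment as a mapping keeps the rule easy to exercise without touching the real process environment.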

Note: This PR supersedes #517 and includes its functionality. The LITELLM_PROVIDER_BASE_URL env var (already in the codebase) provides the same self-hosted LLM support that #517 was adding via LITELLM_PROVIDER_URL. This PR also addresses the feedback on #517 requesting separate LLM and embedding model configuration.

Test plan

  • Added unit tests for LiteLLMEmbedder initialization and API calls
  • Added tests for service embedder selection based on env var

@avonian avonian force-pushed the fix/litellm-service-improvements branch 10 times, most recently from 7073f97 to 5f4bda6 Compare January 22, 2026 14:13
@iwr-redmond

Could tests be accomplished without requiring trust_remote_code=True?

@avonian
Contributor Author

avonian commented Jan 23, 2026

Could tests be accomplished without requiring trust_remote_code=True?

@iwr-redmond it's needed because Parlant uses JinaAI as the local fallback embedding model, and without trust_remote_code=True the Jina model loads as a standard BertModel instead of JinaBertModel (Jina's custom BERT variant).

The custom implementation lives in the model repo, and HuggingFace won't execute it without the flag. Because the architectures differ, the checkpoint weights don't map correctly, so the model produces garbage embeddings that break semantic search.
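
For reference, a hedged sketch of what loading with the flag looks like. The helper name is hypothetical and this is not the exact code in the PR. The optional `revision` argument is an addition here: pinning it to a commit hash fixes which repo code gets executed, which may ease the security concern.

```python
from typing import Any, Optional, Tuple


def load_embedding_model(model_id: str, revision: Optional[str] = None) -> Tuple[Any, Any]:
    """Load a tokenizer and model whose architecture is defined in the model repo."""
    # Imported lazily so the heavy dependency is only needed for local embeddings.
    from transformers import AutoModel, AutoTokenizer

    # trust_remote_code=True allows transformers to run the repo's own modeling code.
    kwargs: dict = {"trust_remote_code": True}
    if revision is not None:
        # Assumption: the caller supplies a commit hash to pin the executed code.
        kwargs["revision"] = revision

    tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
    model = AutoModel.from_pretrained(model_id, **kwargs)
    return tokenizer, model
```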

Tests would still technically pass without trust_remote_code=True, because they mock the embedder and never actually load the model, but in real use the embeddings would be effectively useless.

I get the security concern of course.

Is JinaAI a hard requirement? If not, we could replace it with a different sentence-transformers model that works properly (e.g., all-MiniLM-L6-v2, all-mpnet-base-v2, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5).


# Only pass api_key if explicitly set; otherwise let LiteLLM auto-detect
# provider-specific keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
api_key = os.environ.get("LITELLM_PROVIDER_API_KEY") or None
Contributor

Why not just
api_key = os.environ.get("LITELLM_PROVIDER_API_KEY")
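
For context, the two forms differ only when the variable is set to an empty string, since `get()` already returns `None` for a missing key. A small sketch, using plain dicts in place of `os.environ`:

```python
env_unset: dict[str, str] = {}
env_empty = {"LITELLM_PROVIDER_API_KEY": ""}

# Variable unset: both forms give None.
plain_unset = env_unset.get("LITELLM_PROVIDER_API_KEY")
with_or_unset = env_unset.get("LITELLM_PROVIDER_API_KEY") or None

# Variable set to "": only the `or None` form normalizes it to None.
plain_empty = env_empty.get("LITELLM_PROVIDER_API_KEY")
with_or_empty = env_empty.get("LITELLM_PROVIDER_API_KEY") or None
```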

@property
@override
def max_tokens(self) -> int:
    return 8192
Contributor

Can it be dynamically fetched from an env variable if it exists, with this default value?
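
One way this suggestion could be sketched is shown below. The env var names are the ones given in the follow-up commit message (LITELLM_EMBEDDING_MAX_TOKENS, LITELLM_EMBEDDING_DIMENSIONS); the helper and the class are hypothetical and stand in for the real embedder.

```python
import os


def int_from_env(name: str, default: int) -> int:
    """Read an integer from the environment, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)  # raises ValueError on a non-numeric value


class EmbedderLimits:
    """Hypothetical stand-in for the embedder's two configurable properties."""

    @property
    def max_tokens(self) -> int:
        return int_from_env("LITELLM_EMBEDDING_MAX_TOKENS", 8192)

    @property
    def dimensions(self) -> int:
        return int_from_env("LITELLM_EMBEDDING_DIMENSIONS", 1536)
```

Reading the variable on each property access, rather than once at import, means a change to the environment is picked up without reloading the module; whether that is desirable is a design choice.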

@override
def dimensions(self) -> int:
    # Common dimensions for popular models; may need adjustment per model
    return 1536
Contributor

Also here

    base_url: str | None = None,
) -> None:
    super().__init__(logger, tracer, meter, model_name)
    self._model_name = model_name
Contributor

BaseEmbedder already configures self.model_name, so just use that.

    hints: Mapping[str, Any] = {},
) -> EmbeddingResult:
    # Only pass api_key if explicitly set
    api_key = os.environ.get("LITELLM_PROVIDER_API_KEY") or None
Contributor

api_key = os.environ.get("LITELLM_PROVIDER_API_KEY")

@mc-dorzo
Contributor

Overall, looking good mate! Just added some tiny comments.

@iwr-redmond

Off-topic @mc-dorzo but important: you accidentally closed #710 instead of merging it 😉

@avonian avonian force-pushed the fix/litellm-service-improvements branch from 0ac1034 to 4f6caa0 Compare January 25, 2026 20:48
@avonian
Contributor Author

avonian commented Jan 25, 2026

@mc-dorzo thank you for the feedback. I think everything's been addressed; take a look at 4f6caa0

standing by

@@ -0,0 +1,179 @@
# Copyright 2026 Emcie Co Ltd.
Contributor

I'm not sure we need this (I presume generated?) test file. The tests don't seem particularly meaningful, and ultimately, the maintenance and testing of specific provider adapters is IMO better done manually.

@iwr-redmond iwr-redmond Jan 27, 2026

Perhaps the LiteLLM tests would be more useful if they tested the environment variables like the Qwen tests? LiteLLM is a very flexible service, which means reading the vars is critical for the functionality to work as expected.
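
A rough sketch of what such an environment-driven test could look like, using `unittest.mock.patch.dict` to control `os.environ`. The `completion_kwargs` helper and its dictionary keys are hypothetical stand-ins for however the service actually reads its settings; only the two env var names come from this PR.

```python
import os
from typing import Any
from unittest.mock import patch


def completion_kwargs() -> dict[str, Any]:
    """Hypothetical helper: collect optional LiteLLM settings from the environment."""
    kwargs: dict[str, Any] = {}
    api_key = os.environ.get("LITELLM_PROVIDER_API_KEY")
    if api_key:
        # Only forwarded when set, so provider-specific keys can be auto-detected.
        kwargs["api_key"] = api_key
    base_url = os.environ.get("LITELLM_PROVIDER_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


def test_api_key_omitted_when_unset() -> None:
    # clear=True gives the test an empty environment, restored on exit.
    with patch.dict(os.environ, {}, clear=True):
        assert "api_key" not in completion_kwargs()


def test_settings_forwarded_when_set() -> None:
    env = {
        "LITELLM_PROVIDER_API_KEY": "sk-test",
        "LITELLM_PROVIDER_BASE_URL": "http://localhost:4000",
    }
    with patch.dict(os.environ, env, clear=True):
        assert completion_kwargs() == {
            "api_key": "sk-test",
            "base_url": "http://localhost:4000",
        }
```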

Contributor Author

noted, let me see what i can do, will report back

Contributor Author

@avonian avonian Jan 29, 2026

hey guys, just pushed updated tests. I also tested quite a bit manually (setting different env vars to change the LLM and embedding models); it seems to be working well, including JinaAI as the fallback embedding model

@avonian avonian force-pushed the fix/litellm-service-improvements branch 2 times, most recently from d68065c to 264af8d Compare January 28, 2026 07:10
LiteLLM service fixes:
- Use async acompletion() instead of sync completion() to avoid blocking
- Pass api_key only if LITELLM_PROVIDER_API_KEY is set; otherwise let
  LiteLLM auto-detect provider-specific keys (OPENAI_API_KEY, etc.)
- Implement do_generate() directly instead of wrapping _do_generate()
- Make LITELLM_PROVIDER_API_KEY optional in verify_environment

HuggingFace model loading fixes:
- Add trust_remote_code=True to AutoModel.from_pretrained() and
  AutoTokenizer.from_pretrained() to fix JinaAI embeddings model
  loading with newer transformers versions

Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>

Add LiteLLMEmbedder class that uses litellm.aembedding() to support
various embedding providers (OpenAI, Cohere, etc.) through LiteLLM.

New environment variable LITELLM_EMBEDDING_MODEL_NAME:
- When set, uses LiteLLM for embeddings with the specified model
- When not set, falls back to local JinaAI embeddings

This allows users to avoid the heavy torch/transformers dependencies
required for local embeddings, and enables using cloud embedding APIs
with self-hosted LLMs.

Includes tests for the new embedder and service configuration.

Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>

- Remove redundant 'or None' from os.environ.get calls
- Use inherited model_name from BaseEmbedder instead of duplicating
- Make max_tokens and dimensions configurable via env vars
  (LITELLM_EMBEDDING_MAX_TOKENS, LITELLM_EMBEDDING_DIMENSIONS)

Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>
@avonian avonian force-pushed the fix/litellm-service-improvements branch from 264af8d to e73826c Compare January 28, 2026 07:15
Signed-off-by: Ara Kevonian <5542980+avonian@users.noreply.github.com>
@avonian avonian force-pushed the fix/litellm-service-improvements branch from 708b123 to cf3af0a Compare January 29, 2026 16:41
@mc-dorzo mc-dorzo merged commit 68b1ca8 into emcie-co:develop Jan 29, 2026
1 check passed
@avonian avonian deleted the fix/litellm-service-improvements branch February 14, 2026 17:59
4 participants