Releases: BerriAI/litellm
v1.65.4.dev8
What's Changed
- fix: claude haiku cache read pricing per token by @hewliyang in #9834
- Add service annotations to litellm-helm chart by @mlhynfield in #9840
- Reflect key and team update in UI by @crisshaker in #9825
- Add user alias to API endpoint by @Jacobh2 in #9859
- Update Azure Phi-4 pricing by @emerzon in #9862
- feat: add enterpriseWebSearch tool for vertex-ai by @qvalentin in #9856
- VertexAI non-jsonl file storage support by @krrishdholakia in #9781
- [Bug Fix] Add support for UploadFile on LLM Pass through endpoints (OpenAI, Azure etc) by @ishaan-jaff in #9853
- [Feat SSO] Debug route - allow admins to debug SSO JWT fields by @ishaan-jaff in #9835
New Contributors
- @hewliyang made their first contribution in #9834
- @mlhynfield made their first contribution in #9840
- @crisshaker made their first contribution in #9825
- @qvalentin made their first contribution in #9856
Full Changelog: v1.65.4.dev6...v1.65.4.dev8
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.4.dev8
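Once the container is running, the proxy exposes an OpenAI-compatible API on port 4000. Below is a minimal sketch of calling its /chat/completions route with the OpenAI Python SDK; the model alias `gpt-4o` and the virtual key `sk-1234` are placeholders you would replace with values configured on your proxy.

```python
# Minimal sketch: call the LiteLLM proxy started with the docker command above.
# Assumes the proxy listens on localhost:4000, a model alias "gpt-4o" is configured,
# and "sk-1234" is a valid LiteLLM virtual key (all placeholders for illustration).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # point the SDK at the proxy instead of api.openai.com
    api_key="sk-1234",                 # LiteLLM virtual key, not a provider API key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the LiteLLM proxy!"}],
)
print(response.choices[0].message.content)
```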
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 240.0 | 271.9459418950253 | 6.118160191369328 | 0.0 | 1829 | 0 | 215.48997299998973 | 3681.300501999999 |
Aggregated | Passed ✅ | 240.0 | 271.9459418950253 | 6.118160191369328 | 0.0 | 1829 | 0 | 215.48997299998973 | 3681.300501999999 |
v1.65.4.dev6
What's Changed
- build: bump litellm-proxy-extras version by @krrishdholakia in #9771
- Update model_prices by @aoaim in #9768
- Move daily user transaction logging outside of 'disable_spend_logs' flag - different tables by @krrishdholakia in #9772
- Add inference providers support for Hugging Face (#8258) (#9738) by @krrishdholakia in #9773
- [UI Bug fix] Don't show duplicate models on Team Admin models page by @ishaan-jaff in #9775
- [UI QA/Bug Fix] - Don't change team, key, org, model values on scroll by @ishaan-jaff in #9776
- [UI Polish] - Polish login screen by @ishaan-jaff in #9778
- Litellm 04 05 2025 release notes by @krrishdholakia in #9785
- feat: add offline swagger docs by @devdev999 in #7653
- fix(gemini/transformation.py): handle file_data being passed in by @krrishdholakia in #9786
- Realtime API Cost tracking by @krrishdholakia in #9795
- fix(vertex_ai.py): move to only passing in accepted keys by vertex ai response schema by @krrishdholakia in #8992
- fix(databricks/chat/transformation.py): remove reasoning_effort from … by @krrishdholakia in #9811
- Handle pydantic base model in message tool calls + Handle tools = [] + handle fireworks ai w/ 'strict' param in function call + support fake streaming on tool calls for meta.llama3-3-70b-instruct-v1:0 by @krrishdholakia in #9774
- Allow passing `thinking` param to litellm proxy via client sdk + Code QA Refactor on get_optional_params (get correct values) by @krrishdholakia in #9386 (see the sketch after this list)
- [Feat] LiteLLM Tag/Policy Management by @ishaan-jaff in #9813
- Remove redundant `apk update` in Dockerfiles (cc #5016) by @PeterDaveHello in #9055
- [Security fix - CVE-2025-0330] - Leakage of Langfuse API keys in team exception handling by @ishaan-jaff in #9830
- [Security Fix CVE-2024-6825] Fix remote code execution in post call rules by @ishaan-jaff in #9826
- Bump next from 14.2.25 to 14.2.26 in /ui/litellm-dashboard by @dependabot in #9716
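For the `thinking` pass-through added in #9386 above, here is a minimal sketch of sending that param through the LiteLLM Python SDK against a proxy. The proxy URL, virtual key, model alias, and token budget are placeholders, and routing via the `litellm_proxy/` prefix is an assumption about the intended client-side setup.

```python
# Minimal sketch (assumptions noted above): forward the `thinking` param to an
# Anthropic-style reasoning model served behind a LiteLLM proxy.
import litellm

response = litellm.completion(
    model="litellm_proxy/claude-3-7-sonnet",  # placeholder alias served by the proxy
    api_base="http://localhost:4000",         # placeholder proxy URL
    api_key="sk-1234",                        # placeholder virtual key
    messages=[{"role": "user", "content": "Think step by step: what is 17 * 23?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},  # forwarded to the provider
)
print(response.choices[0].message.content)
```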
New Contributors
Full Changelog: v1.65.4-nightly...v1.65.4.dev6
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.4.dev6
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 230.0 | 264.1081772527121 | 6.162437450043016 | 0.0 | 1844 | 0 | 200.65376200000173 | 5098.356198000033 |
Aggregated | Passed ✅ | 230.0 | 264.1081772527121 | 6.162437450043016 | 0.0 | 1844 | 0 | 200.65376200000173 | 5098.356198000033 |
v1.65.4-stable
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.4-stable
pip install LiteLLM Proxy
pip install litellm==1.65.4.post1
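Besides running the proxy, the same package can be used directly as a Python SDK. A minimal sketch, assuming `OPENAI_API_KEY` is set in the environment; the model name is a placeholder.

```python
# Minimal sketch of a direct SDK call after `pip install litellm`.
# Assumes OPENAI_API_KEY is set in the environment; the model name is a placeholder.
import litellm

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```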
What's Changed
- Add support to Vertex AI transformation for anyOf union type with null fields by @NickGrab in #9618
- fix(logging): add json formatting for uncaught exceptions (#9615) by @krrishdholakia in #9619
- fix: wrong indentation of ttlSecondsAfterFinished in chart by @Dbzman in #9611
- Fix anthropic thinking + response_format by @krrishdholakia in #9594
- Add support to Vertex AI transformation for anyOf union type with null fields by @NickGrab in #9625
- fix(openrouter/chat/transformation.py): raise informative message for openrouter key error by @krrishdholakia in #9626
- [Reliability] - Reduce DB Deadlocks by storing spend updates in Redis and then committing to DB by @ishaan-jaff in #9608
- [Refactor] - Use single class for managing DB update spend transactions by @ishaan-jaff in #9600
- Add bedrock latency optimized inference support + Vertex AI Multimodal embedding cost tracking by @krrishdholakia in #9623
- build(pyproject.toml): add new dev dependencies - for type checking by @krrishdholakia in #9631
- install prisma migration files - connects litellm proxy to litellm's prisma migration files by @krrishdholakia in #9637
- update docs for openwebui by @tan-yong-sheng in #9636
- Add gemini audio input support + handle special tokens in sagemaker response by @krrishdholakia in #9640
- [Docs - Release notes v0] v1.65.0-stable by @ishaan-jaff in #9643
- [Feat] - MCP improvements, add support for using SSE MCP servers by @ishaan-jaff in #9642
- [FIX] - Add password to sync sentinel client by @jmarshall-medallia in #9622
- fix: Anthropic prompt caching on GCP Vertex AI by @sammcj in #9605
- Fixes Databricks llama3.3-70b endpoint and add databricks claude 3.7 sonnet endpoint by @anton164 in #9661
- fix(docs): update xAI Grok vision model reference by @colesmcintosh in #9286
- docs(gemini): fix typo by @GabrielLoiseau in #9581
- Update all_caches.md by @KPCOFGS in #9562
- [Bug fix] - Sagemaker endpoint with inference component streaming by @ishaan-jaff in #9515
- Revert "Correct Databricks llama3.3-70b endpoint and add databricks c… by @krrishdholakia in #9668
- Revert "fix: Anthropic prompt caching on GCP Vertex AI" by @krrishdholakia in #9670
- [Refactor] - Expose litellm.messages.acreate() and litellm.messages.create() to make LLM API calls in Anthropic API spec by @ishaan-jaff in #9567
- Openrouter streaming fixes + Anthropic 'file' message support by @krrishdholakia in #9667
- fix(cost_calculator.py): allows checking received + sent model name w… by @krrishdholakia in #9669
- Revert "Revert "Correct Databricks llama3.3-70b endpoint and add databricks c…" by @krrishdholakia in #9676
- Update model_prices_and_context_window.json add gemini-2.5-pro-exp-03-25 by @superpoussin22 in #9650
- fix(proxy_server.py): Fix "Circular reference detected" error when max_parallel_requests = 0 by @krrishdholakia in #9671
- UI (new_usage.tsx): Report 'total_tokens' + report success/failure calls by @krrishdholakia in #9675
- [Reliability] - Ensure new Redis + DB architecture tracks spend accurately by @ishaan-jaff in #9673
- [Bug fix] - Service accounts - only apply `service_account_settings.enforced_params` on service accounts by @ishaan-jaff in #9683
- UI - New Usage Tab fixes by @krrishdholakia in #9696
- [Reliability Fixes] - Ensure no deadlocks occur when updating `DailyUserSpendTransaction` by @ishaan-jaff in #9690
- Virtual key based policies in Aim Guardrails by @hxtomer in #9499
- fix(streaming_handler.py): fix completion start time tracking + Anthropic 'reasoning_effort' param mapping by @krrishdholakia in #9688
- Litellm user daily activity allow non admin usage by @krrishdholakia in #9695
- fix(model_management_endpoints.py): fix allowing team admins to update team models by @krrishdholakia in #9697
- Add support for max_completion_tokens to the Cohere chat transformati… by @simha104 in #9701
- fix(gemini/): add gemini/ route embedding optional param mapping support by @krrishdholakia in #9677
- Add Google AI Studio `/v1/files` upload API support by @krrishdholakia in #9645
- [Docs] High Availability Setup (Resolve DB Deadlocks) by @ishaan-jaff in #9714
- Bump image-size from 1.1.1 to 1.2.1 in /docs/my-website by @dependabot in #9708
- [Bug fix] Azure o-series tool calling by @ishaan-jaff in #9694
- [Reliability Fix] - Use Redis for PodLock Manager instead of PG (ensures no deadlocks occur) by @ishaan-jaff in #9715
- Ban hardcoded numbers - merge of #9513 by @krrishdholakia in #9709
- [Feat] Add VertexAI gemini-2.0-flash by @Dobiasd in #9723
- Fix: Use request body in curl log for Gemini streaming mode by @fengjiajie in #9736
- LiteLLM Minor Fixes & Improvements (04/02/2025) by @krrishdholakia in #9725
- fix:Gemini Flash 2.0 implementation is not returning the logprobs by @sajdakabir in #9713
- UI Improvements + Fixes - remove 'default key' on user signup + fix showing user models available for personal key creation by @krrishdholakia in #9741
- Fix prompt caching for Anthropic tool calls by @aorwall in #9706
- passthrough kwargs during acompletion, and unwrap extra_body for openrouter by @adrianlyjak in #9747
- [Feat] UI - Test Key v2 page - allow testing image endpoints + polish the page by @ishaan-jaff in #9748
- [Feat] Allow assigning SSO users to teams on MSFT SSO by @ishaan-jaff in #9745
- Fix VertexAI Credential Caching issue by @krrishdholakia in #9756
- [Reliability] v2 DB Deadlock Reduction Architecture – Add Max Size for In-Memory Queue + Backpressure Mechanism by @ishaan-jaff in #9759
- fix(router.py): support reusable credentials via passthrough router by @krrishdholakia in #9758
- Allow team members to see team models by @krrishdholakia in #9742
- fix(xai/chat/transformation.py): filter out 'name' param for xai non-… by @krrishdholakia in #9761
- Gemini image generation output support by @krrishdholakia in #9646
- [Fix] issue that metadata key exist, but value is None by @chaosddp in #9764
- fix(asr-groq): add groq whisper models to model cost map by @liuhu in #9648
- Update model_prices_and_context_window.json by @caramulrooney in #9620
- [Reliability] Emit operational metrics for new DB Transaction architecture by @ishaan-jaff in #9719
- [Security feature] Allow adding authentication on /metrics endpoints by @ishaan-jaff in #9766
- [Reliability] Prometheus emit llm provider on failure metric - make it easy to differentiate litellm error vs llm api error by @ishaan-jaff in #9760
- Fix prisma migrate deploy to use correct directory by @krrishdholakia in #9767
- Add DBRX Anthropic w/ thinking + response_format support by @krrishdholakia in #9744
- build: bump litellm-proxy-extras version by @krrishdholakia in #9771
- Update model_prices by @aoaim in #9768
- Move daily user transaction logging outside of 'disable_spend_logs' flag - different tables by @krrishdholakia in https://gith...
v1.65.4-nightly
What's Changed
- UI Improvements + Fixes - remove 'default key' on user signup + fix showing user models available for personal key creation by @krrishdholakia in #9741
- Fix prompt caching for Anthropic tool calls by @aorwall in #9706
- passthrough kwargs during acompletion, and unwrap extra_body for openrouter by @adrianlyjak in #9747
- [Feat] UI - Test Key v2 page - allow testing image endpoints + polish the page by @ishaan-jaff in #9748
- [Feat] Allow assigning SSO users to teams on MSFT SSO by @ishaan-jaff in #9745
- Fix VertexAI Credential Caching issue by @krrishdholakia in #9756
- [Reliability] v2 DB Deadlock Reduction Architecture – Add Max Size for In-Memory Queue + Backpressure Mechanism by @ishaan-jaff in #9759
- fix(router.py): support reusable credentials via passthrough router by @krrishdholakia in #9758
- Allow team members to see team models by @krrishdholakia in #9742
- fix(xai/chat/transformation.py): filter out 'name' param for xai non-… by @krrishdholakia in #9761
- Gemini image generation output support by @krrishdholakia in #9646
- [Fix] issue that metadata key exist, but value is None by @chaosddp in #9764
- fix(asr-groq): add groq whisper models to model cost map by @liuhu in #9648
- Update model_prices_and_context_window.json by @caramulrooney in #9620
- [Reliability] Emit operational metrics for new DB Transaction architecture by @ishaan-jaff in #9719
- [Security feature] Allow adding authentication on /metrics endpoints by @ishaan-jaff in #9766
- [Reliability] Prometheus emit llm provider on failure metric - make it easy to differentiate litellm error vs llm api error by @ishaan-jaff in #9760
- Fix prisma migrate deploy to use correct directory by @krrishdholakia in #9767
- Add DBRX Anthropic w/ thinking + response_format support by @krrishdholakia in #9744
New Contributors
- @aorwall made their first contribution in #9706
- @adrianlyjak made their first contribution in #9747
- @chaosddp made their first contribution in #9764
- @liuhu made their first contribution in #9648
- @caramulrooney made their first contribution in #9620
Full Changelog: v1.65.3.dev5...v1.65.4-nightly
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.4-nightly
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 200.0 | 223.7736452877303 | 6.155321270706562 | 0.0 | 1842 | 0 | 181.8848560000106 | 4326.022138999974 |
Aggregated | Passed ✅ | 200.0 | 223.7736452877303 | 6.155321270706562 | 0.0 | 1842 | 0 | 181.8848560000106 | 4326.022138999974 |
v1.65.3.dev5
Full Changelog: v1.65.3-nightly...v1.65.3.dev5
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.3.dev5
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 240.0 | 257.21842034089684 | 6.187487225938014 | 0.0 | 1851 | 0 | 214.41921800010277 | 1901.027192000015 |
Aggregated | Passed ✅ | 240.0 | 257.21842034089684 | 6.187487225938014 | 0.0 | 1851 | 0 | 214.41921800010277 | 1901.027192000015 |
v1.65.3-nightly.post1
Full Changelog: v1.65.3-nightly...v1.65.3-nightly.post1
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.3-nightly.post1
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 190.0 | 229.62687951995775 | 6.194402083103671 | 0.0 | 1854 | 0 | 166.3033429999814 | 4963.725549999992 |
Aggregated | Passed ✅ | 190.0 | 229.62687951995775 | 6.194402083103671 | 0.0 | 1854 | 0 | 166.3033429999814 | 4963.725549999992 |
v1.65.3-nightly
What's Changed
- Litellm user daily activity allow non admin usage by @krrishdholakia in #9695
- fix(model_management_endpoints.py): fix allowing team admins to update team models by @krrishdholakia in #9697
- Add support for max_completion_tokens to the Cohere chat transformati… by @simha104 in #9701
- fix(gemini/): add gemini/ route embedding optional param mapping support by @krrishdholakia in #9677
- Add Google AI Studio `/v1/files` upload API support by @krrishdholakia in #9645
- [Docs] High Availability Setup (Resolve DB Deadlocks) by @ishaan-jaff in #9714
- Bump image-size from 1.1.1 to 1.2.1 in /docs/my-website by @dependabot in #9708
- [Bug fix] Azure o-series tool calling by @ishaan-jaff in #9694
- [Reliability Fix] - Use Redis for PodLock Manager instead of PG (ensures no deadlocks occur) by @ishaan-jaff in #9715
- Ban hardcoded numbers - merge of #9513 by @krrishdholakia in #9709
- [Feat] Add VertexAI gemini-2.0-flash by @Dobiasd in #9723
- Fix: Use request body in curl log for Gemini streaming mode by @fengjiajie in #9736
- LiteLLM Minor Fixes & Improvements (04/02/2025) by @krrishdholakia in #9725
- fix:Gemini Flash 2.0 implementation is not returning the logprobs by @sajdakabir in #9713
New Contributors
- @simha104 made their first contribution in #9701
- @Dobiasd made their first contribution in #9723
- @sajdakabir made their first contribution in #9713
Full Changelog: v1.65.2.dev1...v1.65.3-nightly
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.3-nightly
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 250.0 | 280.0203592110988 | 6.144626182419757 | 0.0 | 1838 | 0 | 216.55763899997282 | 5015.033350000011 |
Aggregated | Passed ✅ | 250.0 | 280.0203592110988 | 6.144626182419757 | 0.0 | 1838 | 0 | 216.55763899997282 | 5015.033350000011 |
v1.65.2.dev1
What's Changed
- Openrouter streaming fixes + Anthropic 'file' message support by @krrishdholakia in #9667
- fix(cost_calculator.py): allows checking received + sent model name w… by @krrishdholakia in #9669
- Revert "Revert "Correct Databricks llama3.3-70b endpoint and add databricks c…" by @krrishdholakia in #9676
- Update model_prices_and_context_window.json add gemini-2.5-pro-exp-03-25 by @superpoussin22 in #9650
- fix(proxy_server.py): Fix "Circular reference detected" error when max_parallel_requests = 0 by @krrishdholakia in #9671
- UI (new_usage.tsx): Report 'total_tokens' + report success/failure calls by @krrishdholakia in #9675
- [Reliability] - Ensure new Redis + DB architecture tracks spend accurately by @ishaan-jaff in #9673
- [Bug fix] - Service accounts - only apply `service_account_settings.enforced_params` on service accounts by @ishaan-jaff in #9683
- UI - New Usage Tab fixes by @krrishdholakia in #9696
- [Reliability Fixes] - Ensure no deadlocks occur when updating `DailyUserSpendTransaction` by @ishaan-jaff in #9690
- Virtual key based policies in Aim Guardrails by @hxtomer in #9499
- fix(streaming_handler.py): fix completion start time tracking + Anthropic 'reasoning_effort' param mapping by @krrishdholakia in #9688
Full Changelog: v1.65.1-nightly...v1.65.2.dev1
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.2.dev1
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 200.0 | 228.05634387180868 | 6.2817559363541715 | 0.0 | 1880 | 0 | 183.1938070000092 | 4938.761445000011 |
Aggregated | Passed ✅ | 200.0 | 228.05634387180868 | 6.2817559363541715 | 0.0 | 1880 | 0 | 183.1938070000092 | 4938.761445000011 |
v1.65.1-nightly
What's Changed
- Litellm fix db testing by @krrishdholakia in #9593
- Litellm new UI build by @krrishdholakia in #9601
- Support max_completion_tokens on Mistral by @Cmancuso in #9589
- Revert "Support max_completion_tokens on Mistral" by @krrishdholakia in #9604
- fix(mistral_chat_transformation.py): add missing comma by @krrishdholakia in #9606
- Support discovering gemini, anthropic, xai models by calling their `/v1/model` endpoint by @krrishdholakia in #9530
- Connect UI to "LiteLLM_DailyUserSpend" spend table - enables usage tab to work at 1m+ spend logs by @krrishdholakia in #9603
- Update README.md by @krrishdholakia in #9616
- Add support to Vertex AI transformation for anyOf union type with null fields by @NickGrab in #9618
- fix(proxy_server.py): get master key from environment, if not set in … by @krrishdholakia in #9617
- fix(logging): add json formatting for uncaught exceptions (#9615) by @krrishdholakia in #9619
- fix: wrong indentation of ttlSecondsAfterFinished in chart by @Dbzman in #9611
- Fix anthropic thinking + response_format by @krrishdholakia in #9594
- Add support to Vertex AI transformation for anyOf union type with null fields by @NickGrab in #9625
- fix(openrouter/chat/transformation.py): raise informative message for openrouter key error by @krrishdholakia in #9626
- [Reliability] - Reduce DB Deadlocks by storing spend updates in Redis and then committing to DB by @ishaan-jaff in #9608
- Add bedrock latency optimized inference support + Vertex AI Multimodal embedding cost tracking by @krrishdholakia in #9623
- build(pyproject.toml): add new dev dependencies - for type checking by @krrishdholakia in #9631
- install prisma migration files - connects litellm proxy to litellm's prisma migration files by @krrishdholakia in #9637
- update docs for openwebui by @tan-yong-sheng in #9636
- Add gemini audio input support + handle special tokens in sagemaker response by @krrishdholakia in #9640
- [Docs - Release notes v0] v1.65.0-stable by @ishaan-jaff in #9643
- [Feat] - MCP improvements, add support for using SSE MCP servers by @ishaan-jaff in #9642
- [FIX] - Add password to sync sentinel client by @jmarshall-medallia in #9622
- fix: Anthropic prompt caching on GCP Vertex AI by @sammcj in #9605
- Fixes Databricks llama3.3-70b endpoint and add databricks claude 3.7 sonnet endpoint by @anton164 in #9661
- fix(docs): update xAI Grok vision model reference by @colesmcintosh in #9286
- docs(gemini): fix typo by @GabrielLoiseau in #9581
- Update all_caches.md by @KPCOFGS in #9562
- [Bug fix] - Sagemaker endpoint with inference component streaming by @ishaan-jaff in #9515
- Revert "Correct Databricks llama3.3-70b endpoint and add databricks c… by @krrishdholakia in #9668
- Revert "fix: Anthropic prompt caching on GCP Vertex AI" by @krrishdholakia in #9670
- [Refactor] - Expose litellm.messages.acreate() and litellm.messages.create() to make LLM API calls in Anthropic API spec by @ishaan-jaff in #9567
New Contributors
- @Cmancuso made their first contribution in #9589
- @Dbzman made their first contribution in #9611
- @tan-yong-sheng made their first contribution in #9636
- @jmarshall-medallia made their first contribution in #9622
- @GabrielLoiseau made their first contribution in #9581
- @KPCOFGS made their first contribution in #9562
Full Changelog: v1.64.1.dev1...v1.65.1-nightly
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.1-nightly
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 220.0 | 261.03979166611845 | 6.112143157921839 | 0.0 | 1827 | 0 | 196.8891020000001 | 5075.201525000011 |
Aggregated | Passed ✅ | 220.0 | 261.03979166611845 | 6.112143157921839 | 0.0 | 1827 | 0 | 196.8891020000001 | 5075.201525000011 |
v1.65.0-stable
What's Changed
- Fix route check for non-proxy admins on jwt auth by @krrishdholakia in #9454
- docs(predibase): fix typo by @luisegarduno in #9464
- build(deps): bump next from 14.2.21 to 14.2.25 in /ui/litellm-dashboard by @dependabot in #9458
- [Feat] Add OpenAI Web Search Tool Call Support - Initial support by @ishaan-jaff in #9465
- Refactor vertex ai passthrough routes - fixes unpredictable behaviour w/ auto-setting default_vertex_region on router model add by @krrishdholakia in #9467
- [Feat] Add testing for `litellm.supports_web_search()` and render supports_web_search on model hub by @ishaan-jaff in #9469 (see the sketch after this list)
- Litellm dev 03 22 2025 release note by @krrishdholakia in #9475
- build: add new vertex text embedding model by @krrishdholakia in #9476
- enables viewing all wildcard models on /model/info by @krrishdholakia in #9473
- Litellm redis semantic caching by @tylerhutcherson in #9356
- Log 'api_base' on spend logs by @krrishdholakia in #9509
- [Fix] Use StandardLoggingPayload for GCS Pub Sub Logging Integration by @ishaan-jaff in #9508
- [Feat] Support for exposing MCP tools on litellm proxy by @ishaan-jaff in #9426
- fix(invoke_handler.py): remove hard coded final usage chunk on bedrock streaming usage by @krrishdholakia in #9512
- Add vertexai topLogprobs support by @krrishdholakia in #9518
- Update model_prices_and_context_window.json by @superpoussin22 in #9459
- fix vertex ai multimodal embedding translation by @krrishdholakia in #9471
- ci(publish-migrations.yml): add action for publishing prisma db migrations by @krrishdholakia in #9537
- [Feat - New Model] Add VertexAI `gemini-2.0-flash-lite` and Google AI Studio `gemini-2.0-flash-lite` by @ishaan-jaff in #9523
- Support `litellm.api_base` for vertex_ai + gemini/ across completion, embedding, image_generation by @krrishdholakia in #9516
- Nova Canvas complete image generation tasks (#9177) by @krrishdholakia in #9525
- [Feature]: Support for Fine-Tuned Vertex AI LLMs by @ishaan-jaff in #9542
- feat(prisma-migrations): add baseline db migration file by @krrishdholakia in #9565
- Add Daily User Spend Aggregate view - allows UI Usage tab to work > 1m rows by @krrishdholakia in #9538
- Support Gemini audio token cost tracking + fix openai audio input token cost tracking by @krrishdholakia in #9535
- [Reliability Fixes] - Gracefully handle exceptions when DB is having an outage by @ishaan-jaff in #9533
- [Reliability Fix] - Allow Pods to startup + passing /health/readiness when `allow_requests_on_db_unavailable: True` and DB is down by @ishaan-jaff in #9569
- Add OpenAI gpt-4o-transcribe support by @krrishdholakia in #9517
- Allow viewing keyinfo on request logs by @krrishdholakia in #9568
- Allow team admins to add/update/delete models on UI + show api base and model id on request logs by @krrishdholakia in #9572
- Litellm fix db testing by @krrishdholakia in #9593
- Litellm new UI build by @krrishdholakia in #9601
- Support max_completion_tokens on Mistral by @Cmancuso in #9589
- Revert "Support max_completion_tokens on Mistral" by @krrishdholakia in #9604
- fix(mistral_chat_transformation.py): add missing comma by @krrishdholakia in #9606
- Support discovering gemini, anthropic, xai models by calling their `/v1/model` endpoint by @krrishdholakia in #9530
- Connect UI to "LiteLLM_DailyUserSpend" spend table - enables usage tab to work at 1m+ spend logs by @krrishdholakia in #9603
- Update README.md by @krrishdholakia in #9616
- fix(proxy_server.py): get master key from environment, if not set in … by @krrishdholakia in #9617
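Related to the `litellm.supports_web_search()` item (#9469) in the list above, here is a minimal sketch of gating web-search usage on that capability check; the model names are placeholders and the keyword-argument form is an assumption rather than something taken from the PR.

```python
# Minimal sketch: check litellm's web-search capability flag before enabling the tool.
# Model names are placeholders; the exact argument form is assumed, not taken from #9469.
import litellm

for model in ["openai/gpt-4o-search-preview", "openai/gpt-4o-mini"]:
    if litellm.supports_web_search(model=model):
        print(f"{model}: web search tool supported")
    else:
        print(f"{model}: no web search support reported")
```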
New Contributors
- @luisegarduno made their first contribution in #9464
- @Cmancuso made their first contribution in #9589
Full Changelog: v1.63.14-stable.patch1...v1.65.0-stable
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.65.0-stable
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 200.0 | 233.43193575834258 | 6.214443976298119 | 0.0 | 1858 | 0 | 180.17820199997914 | 4614.819022000006 |
Aggregated | Passed ✅ | 200.0 | 233.43193575834258 | 6.214443976298119 | 0.0 | 1858 | 0 | 180.17820199997914 | 4614.819022000006 |