Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

The PR adds comprehensive vLLM provider support to Bifrost, including a complete OpenAI-compatible provider implementation, integration into the core system, UI components, configuration schema updates, and extensive documentation. vLLM is configured as a keyless, self-hosted inference provider.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes. Pre-merge checks: ✅ 4 passed.
🧪 Test Suite Available: This PR can be tested by a repository admin.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
core/utils.go (1)
51-91: ⚠️ Potential issue | 🟠 Major: Allow optional API keys for vLLM; current keyless path skips auth.
Line 90 now treats vLLM as keyless, so key selection never runs. That means authenticated vLLM instances won’t receive Authorization headers even if a key is configured, which conflicts with the PR’s “authenticated/unauthenticated” support. Consider an “optional key” flow (use a key when present, but allow none) instead of fully bypassing key selection.
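For illustration, the optional-key behavior being suggested could look roughly like this. It is a self-contained sketch, not Bifrost's key-selection code: `Key` is a simplified stand-in for `schemas.Key`, and the function name is hypothetical.

```go
package sketch

// Key is a simplified stand-in for schemas.Key; only the fields this sketch
// needs are included.
type Key struct {
	Value  string
	Weight int
}

// resolveOptionalKey sketches an "optional key" flow: a keyless-capable
// provider skips auth only when no key is configured; if a key is configured,
// it is still selected so Authorization headers are sent.
func resolveOptionalKey(keys []Key, keylessCapable bool) (*Key, bool) {
	if len(keys) == 0 {
		// ok is true only when the provider may run unauthenticated.
		return nil, keylessCapable
	}
	// A key is configured even though the provider can run keyless: pick one
	// (highest weight here) so authenticated instances still get a header.
	best := &keys[0]
	for i := range keys {
		if keys[i].Weight > best.Weight {
			best = &keys[i]
		}
	}
	return best, true
}
```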
core/internal/llmtests/account.go (1)
741-1335: ⚠️ Potential issue | 🟡 Minor: VLLM is missing from `AllProviderConfigs`, but the supporting evidence in this review is inaccurate.

VLLM is indeed absent from `AllProviderConfigs` despite having a full provider implementation and a dedicated test file (core/providers/vllm/vllm_test.go). However, the review incorrectly states that "similarly self-hosted ones like Ollama and SGL" have entries in the suite. While Ollama is present, SGL is also missing from `AllProviderConfigs`, making the inconsistency broader than stated. Both VLLM and SGL require external servers yet are excluded, while Ollama is included. If this omission is intentional, a comment explaining why would help future contributors.
🤖 Fix all issues with AI agents
In `@core/internal/llmtests/account.go`:
- Around line 361-369: The vLLM test key incorrectly advertises batch support:
in the switch case for schemas.VLLM that returns []schemas.Key (the struct with
Value, Models, Weight, UseForBatchAPI), change UseForBatchAPI from
bifrost.Ptr(true) to bifrost.Ptr(false) or remove the field entirely so keys do
not claim batch capability; update the value on the schemas.Key returned by that
case to reflect that vLLM does not support batch operations (refer to
UseForBatchAPI and the schemas.VLLM case).
In `@core/providers/vllm/vllm_test.go`:
- Around line 34-55: The Transcription scenario is enabled but vLLM in this PR
does not support transcription; update the llmtests.TestScenarios in
vllm_test.go to set Transcription to false (and ensure TranscriptionStream
remains false) so the test suite does not run transcription tests against a
standard vLLM server; locate the TestScenarios literal in the vllm_test.go file
and change the Transcription field accordingly.
In `@core/providers/vllm/vllm.go`:
- Around line 89-106: TextCompletionStream (and likewise ChatCompletionStream)
currently builds the request URL using provider.networkConfig.BaseURL +
"/v1/completions" which ignores context overrides; change the URL construction
in TextCompletionStream to use provider.networkConfig.BaseURL +
providerUtils.GetPathFromContext(ctx, "/v1/completions") (and do the analogous
change in ChatCompletionStream to use GetPathFromContext(ctx,
"/v1/chat/completions")) so streaming calls honor context path overrides while
keeping the rest of the call to openai.HandleOpenAITextCompletionStreaming
unchanged.
🧹 Nitpick comments (2)
core/providers/vllm/vllm.go (2)
125-144: `ChatCompletionStream` hardcodes `schemas.VLLM` instead of using `provider.GetProviderKey()`.

Line 136 passes `schemas.VLLM` directly, while all other methods in this file consistently use `provider.GetProviderKey()`. This breaks the pattern recommended by the codebase learnings: when a custom provider alias is configured, hardcoded constants won't reflect the actual provider key.

Proposed fix

 return openai.HandleOpenAIChatCompletionStreaming(
 	ctx,
 	provider.client,
-	provider.networkConfig.BaseURL+"/v1/chat/completions",
+	provider.networkConfig.BaseURL+providerUtils.GetPathFromContext(ctx, "/v1/chat/completions"),
 	request,
 	nil,
 	provider.networkConfig.ExtraHeaders,
 	providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest),
 	providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse),
-	schemas.VLLM,
+	provider.GetProviderKey(),
 	postHookRunner,

Based on learnings: "When handling unsupported operations across providers, avoid hardcoding provider constants (e.g., schemas.Bedrock). Use the provider.GetProviderKey() (or equivalent API) to obtain the actual provider key from configuration."
196-209: Consider whether conditional transcription support should be gated by model type validation.

vLLM's OpenAI-compatible server does support the `/v1/audio/transcriptions` endpoint, but only when serving speech-to-text/ASR models (e.g., Whisper-class models). The endpoint won't be exposed if the loaded model isn't recognized as supporting the "transcription" task.

The current implementation unconditionally delegates to the OpenAI handler without validating whether the deployed vLLM instance actually supports transcription. If a user deploys vLLM with a standard text LLM, callers will receive vLLM server errors rather than a clear "unsupported operation" response. Consider either:
- Adding runtime validation that the model supports transcription before attempting the operation (a rough sketch follows after this list), or
- Documenting that this method only works with ASR models
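For the first option, the gating could start as simple as a model-name heuristic. The sketch below is standalone and illustrative only; the helper name and marker list are assumptions, not part of this PR, and real capability detection would need to query the vLLM server or be made explicit in configuration.

```go
package vllm

import "strings"

// likelySupportsTranscription is a hypothetical helper: a coarse model-name
// heuristic for deciding whether to attempt /v1/audio/transcriptions at all.
func likelySupportsTranscription(model string) bool {
	m := strings.ToLower(model)
	for _, marker := range []string{"whisper"} {
		if strings.Contains(m, marker) {
			return true
		}
	}
	return false
}
```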
237e761 to 09e4e1b
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/providers/vllm/vllm.go`:
- Around line 125-144: Replace the hardcoded schemas.VLLM argument in
VLLMProvider.ChatCompletionStream with the provider.GetProviderKey() call so the
actual provider key/alias is used consistently; locate the ChatCompletionStream
method on VLLMProvider and change the parameter currently passing schemas.VLLM
to provider.GetProviderKey() to match other methods in this file and support
provider aliasing.
09e4e1b to 588ebc8
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@core/providers/openai/transcription.go`:
- Around line 114-139: The parseVLLMTranscriptionStreamChunk function currently
returns delta-type responses with only Text populated, causing downstream
consumer checks (which look at chunk.Delta) to skip them; update the delta
return to also set Delta to point to the same content string (i.e., populate the
Delta *string alongside Text) so downstream accumulation in
frameworks/streaming/transcription.go sees and appends the chunk; ensure the
returned schemas.BifrostTranscriptionStreamResponse for Type
TranscriptionStreamResponseTypeDelta sets both Text and Delta (reference:
parseVLLMTranscriptionStreamChunk, schemas.BifrostTranscriptionStreamResponse,
field Delta).
In `@core/providers/vllm/vllm.go`:
- Around line 89-105: The streaming methods are not passing the selected auth
key into the OpenAI streaming helpers (they pass nil), causing Authorization to
be omitted; update VLLMProvider.TextCompletionStream (and likewise the
ChatCompletionStream at the other location) to pass the provided key variable
instead of nil into the openai.HandleOpenAITextCompletionStreaming (and the
corresponding chat streaming call) so the Authorization header is included for
authenticated vLLM servers.
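The change itself is just forwarding the selected key instead of `nil`; the intended effect inside the streaming path is roughly the following. This is a simplified sketch, not Bifrost's actual helper: the function name is hypothetical, and only the header behavior is shown.

```go
package sketch

import "github.com/valyala/fasthttp"

// setOptionalAuth illustrates the desired behavior: when a key was selected
// for the vLLM provider, send it as a Bearer token; when the server is
// unauthenticated and no key is configured, omit the header entirely.
func setOptionalAuth(req *fasthttp.Request, apiKey string) {
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
}
```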
🧹 Nitpick comments (4)
core/internal/llmtests/account.go (1)
634-647: Consider aligning retry backoff with other self-hosted providers.

The retry config uses 5s initial / 3min max backoff, which matches cloud providers like Cerebras but is much higher than Ollama (250ms/4s), another self-hosted, OpenAI-compatible provider. For a local vLLM instance, shorter backoffs would give faster test feedback. Not blocking since this is test infrastructure.

core/providers/openai/transcription.go (1)
123-128: Content in the final usage chunk would be silently dropped.

If vLLM sends a chunk containing both `Usage` and `Choices[0].Delta.Content`, the current logic prioritizes usage and returns a `Done` response, discarding any accompanying text content. In practice, OpenAI-compatible APIs typically send usage in a separate final chunk with null content, so this is unlikely to be an issue, but it is worth a comment noting the assumption.

core/providers/openai/openai.go (1)
2343-2357: Hardcoded `schemas.VLLM` check in shared handler breaks custom provider aliasing.

`HandleOpenAITranscriptionStreamRequest` is a shared function reused by multiple providers. Checking `providerName == schemas.VLLM` directly means a custom provider wrapping vLLM (with a different alias) will never hit the vLLM parsing path.

Consider an extensible approach, e.g., accept an optional custom chunk parser function (similar to the existing `postResponseConverter` pattern), or pass a flag/option indicating the vLLM parsing mode. This keeps the shared handler agnostic to specific provider identities.

Suggested approach: add an optional parser parameter

 func HandleOpenAITranscriptionStreamRequest(
 	ctx *schemas.BifrostContext,
 	client *fasthttp.Client,
 	url string,
 	request *schemas.BifrostTranscriptionRequest,
 	authHeader map[string]string,
 	extraHeaders map[string]string,
 	sendBackRawResponse bool,
 	providerName schemas.ModelProvider,
 	postHookRunner schemas.PostHookRunner,
 	postRequestConverter func(*OpenAITranscriptionRequest) *OpenAITranscriptionRequest,
 	postResponseConverter func(*schemas.BifrostTranscriptionStreamResponse) *schemas.BifrostTranscriptionStreamResponse,
+	customChunkParser func([]byte) (*schemas.BifrostTranscriptionStreamResponse, bool),
 	logger schemas.Logger,
 ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) {

Then in the streaming goroutine:

 	var response schemas.BifrostTranscriptionStreamResponse
-	if providerName == schemas.VLLM {
-		if parsed, ok := parseVLLMTranscriptionStreamChunk([]byte(jsonData)); ok && parsed != nil {
+	if customChunkParser != nil {
+		if parsed, ok := customChunkParser([]byte(jsonData)); ok && parsed != nil {
 			response = *parsed
 		} else {

The vLLM provider would pass `parseVLLMTranscriptionStreamChunk` as the argument; OpenAI and others pass `nil`.

Based on learnings: "When handling unsupported operations across providers, avoid hardcoding provider constants (e.g., schemas.Bedrock). Use the provider.GetProviderKey() (or equivalent API) to obtain the actual provider key from configuration, ensuring errors and messages adapt to custom provider names."
core/providers/vllm/vllm_test.go (1)
73-77: Trim env values to avoid whitespace-only model names.

This avoids accidental spaces when env vars are set via CI or shell scripts.
♻️ Suggested tweak
 func getEnvWithDefault(key, def string) string {
-	if v := os.Getenv(key); v != "" {
-		return v
+	if v := strings.TrimSpace(os.Getenv(key)); v != "" {
+		return v
 	}
 	return def
 }
588ebc8 to 4d077a8
Code context:

	}
	}

	var errorResp schemas.BifrostError
Any reason for this? We're already handling the error below, right?
vLLM sends an ErrorField object as the error (with HTTP 200); below we are only handling unmarshalling errors.
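In other words, a 200 body may actually carry an error object and needs to be probed before normal decoding. Schematically (a simplified sketch with stand-in types, not the actual Bifrost structs):

```go
package sketch

import "encoding/json"

// errorEnvelope is a stand-in for the error-shaped body some servers return
// with HTTP 200; the field layout follows the common OpenAI-style payload.
type errorEnvelope struct {
	Error *struct {
		Message string `json:"message"`
		Type    string `json:"type"`
	} `json:"error"`
}

// embeddedError reports whether a 200 body actually carries an error object,
// so the caller can surface it instead of failing later during unmarshalling.
func embeddedError(body []byte) (string, bool) {
	var env errorEnvelope
	if err := json.Unmarshal(body, &env); err == nil && env.Error != nil {
		return env.Error.Message, true
	}
	return "", false
}
```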
Pratham-Mishra04 left a comment: added comments
4d077a8 to 63a5243
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@core/providers/openai/openai.go`:
- Around line 2347-2355: In the customChunkParser branch (customChunkParser,
response, schemas.BifrostTranscriptionStreamResponse, logger.Warn), remove the
undefined err reference and guard against nil: call
customChunkParser([]byte(jsonData)), then check if ok && customChunk != nil
before doing response = *customChunk; otherwise log a simple warning like
"Failed to parse stream response" (no err var) and continue. Ensure the code
never dereferences a nil customChunk.
In `@core/providers/vllm/utils.go`:
- Around line 24-38: The delta branch currently leaves response.Type empty;
update the block handling chunk.Choices (in utils.go) so that when
chunk.Choices[0].Delta.Content is present but not a "stop" finish, you assign
response.Type = schemas.TranscriptionStreamResponseTypeDelta (and still set
response.Delta = chunk.Choices[0].Delta.Content); when the finish reason is
"stop" keep setting response.Type = schemas.TranscriptionStreamResponseTypeDone
as already done. Adjust the logic around chunk.Choices[0].Delta.Content,
FinishReason/StopReason and response.Type to ensure every emitted delta response
has a non-empty Type.
In `@core/providers/vllm/vllm.go`:
- Around line 24-41: The doc comment for NewVLLMProvider incorrectly claims a
default base URL; update the comment to state that config.NetworkConfig.BaseURL
is required (no default) to match the code path that returns an error when
BaseURL is empty, referencing NewVLLMProvider and the
config.NetworkConfig.BaseURL check; ensure you remove the "Default base URL is
http://localhost:8000" text and instead note that API key is optional and
BaseURL must be provided (or alternatively, if you prefer Option A, implement
setting config.NetworkConfig.BaseURL to "http://localhost:8000" before the
BaseURL check and remove the error branch).
🧹 Nitpick comments (2)
core/internal/llmtests/account.go (2)
634-647: Retry backoff seems high for a self-hosted service.

The `RetryBackoffMax` of 3 minutes is quite aggressive for a self-hosted vLLM instance. Compare with other self-hosted providers:
- Ollama: 4s max backoff
- SGL: 15s max backoff
A 3-minute max backoff may cause unnecessarily long waits in test (and real) scenarios when hitting a local vLLM server. Consider aligning with SGL's settings (e.g., `RetryBackoffInitial: 1s`, `RetryBackoffMax: 15s`) since both are self-hosted OpenAI-compatible servers.

Suggested adjustment

 case schemas.VLLM:
 	return &schemas.ProviderConfig{
 		NetworkConfig: schemas.NetworkConfig{
 			BaseURL:                        os.Getenv("VLLM_BASE_URL"),
 			DefaultRequestTimeoutInSeconds: 120,
-			MaxRetries:                     10, // vllm is stable
-			RetryBackoffInitial:            5 * time.Second,
-			RetryBackoffMax:                3 * time.Minute,
+			MaxRetries:                     10, // vllm (self-hosted) is generally stable
+			RetryBackoffInitial:            1 * time.Second,
+			RetryBackoffMax:                15 * time.Second,
 		},
634-647: No `ComprehensiveTestConfig` entry in `AllProviderConfigs` for vLLM.

There's no entry for vLLM in the `AllProviderConfigs` slice (which starts at line 732), so the comprehensive integration test suite won't exercise vLLM scenarios. If dedicated tests exist in core/providers/vllm/vllm_test.go, this may be intentional, but adding an entry (even with minimal scenarios like `SimpleChat`, `CompletionStream`, and `Embedding`) would provide coverage through the shared test harness.
 	var response schemas.BifrostTranscriptionStreamResponse
-	if err := sonic.Unmarshal([]byte(jsonData), &response); err != nil {
-		logger.Warn("Failed to parse stream response: %v", err)
-		continue
+	if customChunkParser != nil {
+		customChunk, ok := customChunkParser([]byte(jsonData))
+		if ok {
+			response = *customChunk
+		} else {
+			logger.Warn("Failed to parse stream response: %v", err)
+			continue
+		}
Fix undefined variable in custom chunk parser branch.
Line 2353 references err, which is undefined in this scope (compile error). Also guard against ok == true with a nil chunk to prevent panics.
🛠️ Proposed fix
- customChunk, ok := customChunkParser([]byte(jsonData))
- if ok {
- response = *customChunk
- } else {
- logger.Warn("Failed to parse stream response: %v", err)
- continue
- }
+ customChunk, ok := customChunkParser([]byte(jsonData))
+ if !ok || customChunk == nil {
+ logger.Warn("customChunkParser returned no chunk")
+ continue
+ }
+ response = *customChunk

🤖 Prompt for AI Agents
In `@core/providers/openai/openai.go` around lines 2347 - 2355, In the
customChunkParser branch (customChunkParser, response,
schemas.BifrostTranscriptionStreamResponse, logger.Warn), remove the undefined
err reference and guard against nil: call customChunkParser([]byte(jsonData)),
then check if ok && customChunk != nil before doing response = *customChunk;
otherwise log a simple warning like "Failed to parse stream response" (no err
var) and continue. Ensure the code never dereferences a nil customChunk.
// Delta chunk: has choices[].delta.content
if len(chunk.Choices) == 0 || chunk.Choices[0].Delta.Content == nil {
	return nil, false
}
if len(chunk.Choices) > 0 {
	reason := chunk.Choices[0].FinishReason
	if reason == nil && chunk.Choices[0].StopReason != nil {
		reason = chunk.Choices[0].StopReason
	}
	if reason != nil && *reason == "stop" {
		response.Text = *chunk.Choices[0].Delta.Content
		response.Type = schemas.TranscriptionStreamResponseTypeDone
	}
	response.Delta = chunk.Choices[0].Delta.Content
}
Set a non-empty Type for delta chunks.
Delta responses currently return with Type == "", which can emit invalid stream events (Type is required in BifrostTranscriptionStreamResponse). Set it to the delta event type used elsewhere in schemas/transcriptions.go.
✅ Suggested patch
if reason != nil && *reason == "stop" {
response.Text = *chunk.Choices[0].Delta.Content
response.Type = schemas.TranscriptionStreamResponseTypeDone
}
response.Delta = chunk.Choices[0].Delta.Content
+ if response.Type == "" {
+ response.Type = schemas.TranscriptionStreamResponseTypeDelta
+ }
}

🤖 Prompt for AI Agents
In `@core/providers/vllm/utils.go` around lines 24 - 38, The delta branch
currently leaves response.Type empty; update the block handling chunk.Choices
(in utils.go) so that when chunk.Choices[0].Delta.Content is present but not a
"stop" finish, you assign response.Type =
schemas.TranscriptionStreamResponseTypeDelta (and still set response.Delta =
chunk.Choices[0].Delta.Content); when the finish reason is "stop" keep setting
response.Type = schemas.TranscriptionStreamResponseTypeDone as already done.
Adjust the logic around chunk.Choices[0].Delta.Content, FinishReason/StopReason
and response.Type to ensure every emitted delta response has a non-empty Type.
// NewVLLMProvider creates a new vLLM provider instance.
// Default base URL is http://localhost:8000. API key is optional for local instances.
func NewVLLMProvider(config *schemas.ProviderConfig, logger schemas.Logger) (*VLLMProvider, error) {
	config.CheckAndSetDefaults()

	client := &fasthttp.Client{
		ReadTimeout:         time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
		WriteTimeout:        time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
		MaxConnsPerHost:     5000,
		MaxIdleConnDuration: 60 * time.Second,
		MaxConnWaitTimeout:  10 * time.Second,
	}

	client = providerUtils.ConfigureProxy(client, config.ProxyConfig, logger)

	if config.NetworkConfig.BaseURL == "" {
		return nil, fmt.Errorf("base_url is required for vllm provider")
	}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if CheckAndSetDefaults sets a default BaseURL for vLLM
echo "=== Looking for CheckAndSetDefaults and BaseURL default logic ==="
rg -n "CheckAndSetDefaults" core/schemas/ -A 20 | head -100
echo ""
echo "=== vLLM provider constructor context ==="
rg -n "NewVLLMProvider" core/providers/vllm/ -A 30 | head -50
echo ""
echo "=== Comparing with Ollama provider ==="
rg -n "NewOllamaProvider" core/providers/ollama/ -A 30 | head -50
echo ""
echo "=== Comparing with SGL provider ==="
rg -n "NewSGLProvider" core/providers/sgl/ -A 30 | head -50Repository: maximhq/bifrost
Length of output: 8949
Fix doc comment—it claims a default base URL that the code doesn't provide.
The comment on line 25 states "Default base URL is http://localhost:8000", but lines 39–41 return an error if BaseURL is empty with no fallback. Other similar providers (Ollama, SGL) also require base_url and explicitly state it in comments. Either set the default here or correct the comment to say base_url is required.
Option A: Set the default (if intended)
config.CheckAndSetDefaults()
+
+ if config.NetworkConfig.BaseURL == "" {
+ config.NetworkConfig.BaseURL = "http://localhost:8000"
+ }
client := &fasthttp.Client{

Then remove the error check on lines 39–41.
Option B: Fix the comment to match code behavior
// NewVLLMProvider creates a new vLLM provider instance.
-// Default base URL is http://localhost:8000. API key is optional for local instances.
+// base_url is required in the provider config. API key is optional for local instances.
func NewVLLMProvider(config *schemas.ProviderConfig, logger schemas.Logger) (*VLLMProvider, error) {

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// NewVLLMProvider creates a new vLLM provider instance.
// base_url is required in the provider config. API key is optional for local instances.
func NewVLLMProvider(config *schemas.ProviderConfig, logger schemas.Logger) (*VLLMProvider, error) {
	config.CheckAndSetDefaults()
	client := &fasthttp.Client{
		ReadTimeout:         time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
		WriteTimeout:        time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
		MaxConnsPerHost:     5000,
		MaxIdleConnDuration: 60 * time.Second,
		MaxConnWaitTimeout:  10 * time.Second,
	}
	client = providerUtils.ConfigureProxy(client, config.ProxyConfig, logger)
	if config.NetworkConfig.BaseURL == "" {
		return nil, fmt.Errorf("base_url is required for vllm provider")
	}
🤖 Prompt for AI Agents
In `@core/providers/vllm/vllm.go` around lines 24 - 41, The doc comment for
NewVLLMProvider incorrectly claims a default base URL; update the comment to
state that config.NetworkConfig.BaseURL is required (no default) to match the
code path that returns an error when BaseURL is empty, referencing
NewVLLMProvider and the config.NetworkConfig.BaseURL check; ensure you remove
the "Default base URL is http://localhost:8000" text and instead note that API
key is optional and BaseURL must be provided (or alternatively, if you prefer
Option A, implement setting config.NetworkConfig.BaseURL to
"http://localhost:8000" before the BaseURL check and remove the error branch).

Summary
Added support for vLLM, an OpenAI-compatible provider for self-hosted inference. This integration allows users to connect to vLLM servers for chat completions, text completions, embeddings, and streaming capabilities.
Changes
- New `vllm` provider package that implements the Provider interface

Type of change
Affected areas
How to test
Breaking changes
Related issues
Adds support for vLLM as requested in several community discussions.
Security considerations
The vLLM provider is designed to work with both authenticated and unauthenticated vLLM instances. For local development, API keys are optional, but production deployments should consider using authentication.
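As a hedged sketch, wiring a local unauthenticated instance through the core library might look roughly like the following, based on the constructor and config fields shown in the review above; the wrapper function is illustrative, and `logger` is assumed to be an existing `schemas.Logger` implementation.

```go
// newLocalVLLM builds a vLLM provider pointed at a local server. BaseURL is
// required (the constructor returns an error without it). For authenticated
// deployments, configure a key for schemas.VLLM in the account layer so
// requests carry an Authorization header.
func newLocalVLLM(logger schemas.Logger) (*vllm.VLLMProvider, error) {
	cfg := &schemas.ProviderConfig{
		NetworkConfig: schemas.NetworkConfig{
			BaseURL:                        "http://localhost:8000",
			DefaultRequestTimeoutInSeconds: 120,
		},
	}
	return vllm.NewVLLMProvider(cfg, logger)
}
```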
Checklist