
feat: add vllm provider support#1649

Open
Radheshg04 wants to merge 1 commit into main from 02-14-feat_add_vllm_provider_support

Conversation

Radheshg04 (Contributor) commented on Feb 14, 2026

Summary

Added support for vLLM, an OpenAI-compatible provider for self-hosted inference. This integration lets users connect to vLLM servers for chat completions, text completions, embeddings, and streaming.
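
For readers unfamiliar with vLLM's OpenAI-compatible server, here is a minimal sketch of a direct request to it (this talks to the vLLM server itself, not through Bifrost; the base URL and model name are placeholders for whatever your server is actually serving):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Placeholder base URL; matches the default vLLM serve address used in the tests below.
	baseURL := "http://localhost:8000"

	// Placeholder model name; use whatever model your vLLM server was started with.
	body, _ := json.Marshal(map[string]any{
		"model": "meta-llama/Llama-3.1-8B-Instruct",
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello in one sentence."},
		},
	})

	// vLLM exposes the OpenAI-style route /v1/chat/completions.
	resp, err := http.Post(baseURL+"/v1/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```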

Changes

  • Added a new vllm provider package that implements the Provider interface
  • Implemented OpenAI-compatible endpoints for chat, text completions, and embeddings
  • Added support for streaming responses
  • Created documentation for the vLLM provider
  • Added UI constants for vLLM provider
  • Updated test infrastructure to support vLLM testing

Type of change

  • Feature
  • Bug fix
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

# Set environment variables
export VLLM_BASE_URL="http://localhost:8000"  # Optional: defaults to this value

# Run tests
go test ./core/providers/vllm/...

# Or run all tests
go test ./...

Breaking changes

  • Yes
  • No

Related issues

Adds support for vLLM as requested in several community discussions.

Security considerations

The vLLM provider is designed to work with both authenticated and unauthenticated vLLM instances. For local development, API keys are optional, but production deployments should consider using authentication.
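
As a rough illustration of this optional-auth model, a minimal sketch (the helper below is hypothetical, not Bifrost's key-selection code):

```go
package main

import "fmt"

// buildHeaders sketches the "optional key" behaviour described above: attach a
// Bearer token only when a key is configured, so the same code path serves
// unauthenticated local vLLM servers and authenticated production deployments.
func buildHeaders(apiKey string) map[string]string {
	headers := map[string]string{"Content-Type": "application/json"}
	if apiKey != "" {
		headers["Authorization"] = "Bearer " + apiKey
	}
	return headers
}

func main() {
	fmt.Println(buildHeaders(""))           // local dev: no auth header
	fmt.Println(buildHeaders("secret-key")) // production: Bearer token attached
}
```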

Checklist

  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable


coderabbitai bot commented Feb 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added vLLM provider support for OpenAI-compatible self-hosted inference
    • Supports chat completions, text completions, embeddings, transcription, and model listing
    • Configurable via base URL without requiring authentication keys
  • Documentation

    • Added comprehensive vLLM provider documentation with setup and configuration examples

Walkthrough

The PR adds comprehensive vLLM provider support to Bifrost, including a complete OpenAI-compatible provider implementation, integration into the core system, UI components, configuration schema updates, and extensive documentation. vLLM is configured as a keyless, self-hosted inference provider.

Changes

  • Core vLLM Provider Implementation (core/providers/vllm/vllm.go, core/providers/vllm/types.go, core/providers/vllm/utils.go): New VLLMProvider struct with constructor, GetProviderKey method, and 32+ handler methods for chat completion, text completion, embeddings, streaming, transcription, and other operations. Unsupported operations return appropriate errors. Includes custom transcription chunk parsing and text accumulation for streaming.
  • Core System Integration (core/bifrost.go, core/schemas/bifrost.go, core/utils.go, core/internal/llmtests/account.go): Wires vLLM into the provider factory, adds the VLLM constant to schemas and the StandardProviders list, marks vLLM as a keyless provider, and adds test configuration with timeout, retry, and concurrency settings.
  • Test Suite (core/providers/vllm/vllm_test.go): Integration test harness using environment-driven configuration, executing a comprehensive test matrix covering chat, text completion, embeddings, list models, transcription, and tool-calling scenarios.
  • Supporting Provider Updates (core/providers/openai/openai.go, core/providers/utils/utils.go): Adds customizable transcription stream chunk parsing and text accumulation to the OpenAI provider; improves error response pre-checking in HandleProviderResponse.
  • UI Configuration (ui/lib/constants/config.ts, ui/lib/constants/icons.tsx, ui/lib/constants/logs.ts, ui/app/workspace/providers/views/utils.ts): Adds a vLLM provider placeholder, icon SVG component, and display label, and marks the provider as not requiring API keys.
  • Configuration & Schema (transports/config.schema.json, core/changelog.md): Extends the JSON schema with a vllm provider definition and semantic cache plugin support; adds a changelog entry.
  • Documentation (docs/providers/supported-providers/vllm.mdx, docs/docs.json, docs/overview.mdx, docs/providers/supported-providers/overview.mdx): Comprehensive vLLM documentation including an overview, supported operations matrix, authentication, and configuration examples, plus provider links; updates the doc index and overview cards.

Poem

🐰 A vLLM hops into the warren today,
Self-hosted and swift, in the Bifrost gateway!
With streaming streams flowing and chat chambers calling,
The provider extends—no API walls stalling!
From schemas to icons, all systems align,
Another fine friend in the inference line! ✨

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 4 passed

  • Title check ✅ Passed: The title accurately and concisely describes the main change, adding support for the vLLM provider. It is specific, clear, and directly relates to the changeset.
  • Description check ✅ Passed: The description includes a clear summary, detailed changes, correct type-of-change selection, affected areas, testing instructions, security considerations, and completed checklist items. It follows the template structure well.
  • Docstring Coverage ✅ Passed: Docstring coverage is 80.00%, which meets the required threshold of 80.00%.
  • Merge Conflict Detection ✅ Passed: No merge conflicts detected when merging into main.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Contributor

🧪 Test Suite Available

This PR can be tested by a repository admin.

Run tests for PR #1649

Radheshg04 marked this pull request as ready for review on February 14, 2026 at 06:17

coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/utils.go (1)

51-91: ⚠️ Potential issue | 🟠 Major

Allow optional API keys for vLLM; current keyless path skips auth.

Line 90 now treats vLLM as keyless, so key selection never runs. That means authenticated vLLM instances won’t receive Authorization headers even if a key is configured, which conflicts with the PR’s “authenticated/unauthenticated” support. Consider an “optional key” flow (use a key when present, but allow none) instead of fully bypassing key selection.

core/internal/llmtests/account.go (1)

741-1335: ⚠️ Potential issue | 🟡 Minor

VLLM is missing from AllProviderConfigs, but the supporting evidence in this review is inaccurate.

VLLM is indeed absent from AllProviderConfigs despite having a full provider implementation and a dedicated test file (core/providers/vllm/vllm_test.go). However, the review incorrectly states that "similarly self-hosted ones like Ollama and SGL" have entries in the suite. While Ollama is present, SGL is also missing from AllProviderConfigs, making the inconsistency broader than stated. Both VLLM and SGL require external servers yet are excluded, while Ollama is included. If this omission is intentional, a comment explaining why would help future contributors.

🤖 Fix all issues with AI agents
In `@core/internal/llmtests/account.go`:
- Around line 361-369: The vLLM test key incorrectly advertises batch support:
in the switch case for schemas.VLLM that returns []schemas.Key (the struct with
Value, Models, Weight, UseForBatchAPI), change UseForBatchAPI from
bifrost.Ptr(true) to bifrost.Ptr(false) or remove the field entirely so keys do
not claim batch capability; update the value on the schemas.Key returned by that
case to reflect that vLLM does not support batch operations (refer to
UseForBatchAPI and the schemas.VLLM case).

In `@core/providers/vllm/vllm_test.go`:
- Around line 34-55: The Transcription scenario is enabled but vLLM in this PR
does not support transcription; update the llmtests.TestScenarios in
vllm_test.go to set Transcription to false (and ensure TranscriptionStream
remains false) so the test suite does not run transcription tests against a
standard vLLM server; locate the TestScenarios literal in the vllm_test.go file
and change the Transcription field accordingly.

In `@core/providers/vllm/vllm.go`:
- Around line 89-106: TextCompletionStream (and likewise ChatCompletionStream)
currently builds the request URL using provider.networkConfig.BaseURL +
"/v1/completions" which ignores context overrides; change the URL construction
in TextCompletionStream to use provider.networkConfig.BaseURL +
providerUtils.GetPathFromContext(ctx, "/v1/completions") (and do the analogous
change in ChatCompletionStream to use GetPathFromContext(ctx,
"/v1/chat/completions")) so streaming calls honor context path overrides while
keeping the rest of the call to openai.HandleOpenAITextCompletionStreaming
unchanged.
🧹 Nitpick comments (2)
core/providers/vllm/vllm.go (2)

125-144: ChatCompletionStream hardcodes schemas.VLLM instead of using provider.GetProviderKey().

Line 136 passes schemas.VLLM directly, while all other methods in this file consistently use provider.GetProviderKey(). This breaks the pattern recommended by the codebase learnings: when a custom provider alias is configured, hardcoded constants won't reflect the actual provider key.

Proposed fix
 	return openai.HandleOpenAIChatCompletionStreaming(
 		ctx,
 		provider.client,
-		provider.networkConfig.BaseURL+"/v1/chat/completions",
+		provider.networkConfig.BaseURL+providerUtils.GetPathFromContext(ctx, "/v1/chat/completions"),
 		request,
 		nil,
 		provider.networkConfig.ExtraHeaders,
 		providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest),
 		providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse),
-		schemas.VLLM,
+		provider.GetProviderKey(),
 		postHookRunner,

Based on learnings: "When handling unsupported operations across providers, avoid hardcoding provider constants (e.g., schemas.Bedrock). Use the provider.GetProviderKey() (or equivalent API) to obtain the actual provider key from configuration."


196-209: Consider whether conditional transcription support should be gated by model type validation.

vLLM's OpenAI-compatible server does support the /v1/audio/transcriptions endpoint, but only when serving speech-to-text/ASR models (e.g., Whisper-class models). The endpoint won't be exposed if the loaded model isn't recognized as supporting the "transcription" task.

The current implementation unconditionally delegates to the OpenAI handler without validating whether the deployed vLLM instance actually supports transcription. If a user deploys vLLM with a standard text LLM, callers will receive vLLM server errors rather than a clear "unsupported operation" response. Consider either:

  1. Adding runtime validation that the model supports transcription before attempting the operation, or
  2. Documenting that this method only works with ASR models
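
As a rough illustration of option 1 above, a sketch that probes whether the server mounts the transcription route at all and maps its absence to a clear error. It assumes vLLM answers 404 when the route is not exposed, and it is not the provider's actual code:

```go
package main

import (
	"fmt"
	"net/http"
)

// transcriptionSupported is a rough probe, not Bifrost code: vLLM only mounts
// /v1/audio/transcriptions when the served model is an ASR (Whisper-class)
// model, so a 404 on that route can be mapped to a clear "unsupported
// operation" error instead of surfacing a raw server error to the caller.
func transcriptionSupported(baseURL string) (bool, error) {
	// An empty POST is enough to see whether the route exists; a real
	// implementation would reuse the provider's HTTP client and headers.
	resp, err := http.Post(baseURL+"/v1/audio/transcriptions", "application/json", nil)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotFound {
		return false, nil // route not mounted: model does not support transcription
	}
	return true, nil // route exists (the empty request itself may still be rejected)
}

func main() {
	ok, err := transcriptionSupported("http://localhost:8000")
	fmt.Println(ok, err)
}
```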

Radheshg04 force-pushed the 02-14-feat_add_vllm_provider_support branch from 237e761 to 09e4e1b on February 14, 2026 at 12:42

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@core/providers/vllm/vllm.go`:
- Around line 125-144: Replace the hardcoded schemas.VLLM argument in
VLLMProvider.ChatCompletionStream with the provider.GetProviderKey() call so the
actual provider key/alias is used consistently; locate the ChatCompletionStream
method on VLLMProvider and change the parameter currently passing schemas.VLLM
to provider.GetProviderKey() to match other methods in this file and support
provider aliasing.

Radheshg04 force-pushed the 02-14-feat_add_vllm_provider_support branch from 09e4e1b to 588ebc8 on February 14, 2026 at 13:39

coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@core/providers/openai/transcription.go`:
- Around line 114-139: The parseVLLMTranscriptionStreamChunk function currently
returns delta-type responses with only Text populated, causing downstream
consumer checks (which look at chunk.Delta) to skip them; update the delta
return to also set Delta to point to the same content string (i.e., populate the
Delta *string alongside Text) so downstream accumulation in
frameworks/streaming/transcription.go sees and appends the chunk; ensure the
returned schemas.BifrostTranscriptionStreamResponse for Type
TranscriptionStreamResponseTypeDelta sets both Text and Delta (reference:
parseVLLMTranscriptionStreamChunk, schemas.BifrostTranscriptionStreamResponse,
field Delta).

In `@core/providers/vllm/vllm.go`:
- Around line 89-105: The streaming methods are not passing the selected auth
key into the OpenAI streaming helpers (they pass nil), causing Authorization to
be omitted; update VLLMProvider.TextCompletionStream (and likewise the
ChatCompletionStream at the other location) to pass the provided key variable
instead of nil into the openai.HandleOpenAITextCompletionStreaming (and the
corresponding chat streaming call) so the Authorization header is included for
authenticated vLLM servers.
🧹 Nitpick comments (4)
core/internal/llmtests/account.go (1)

634-647: Consider aligning retry backoff with other self-hosted providers.

The retry config uses 5s initial / 3min max backoff, which matches cloud providers like Cerebras but is much higher than Ollama (250ms / 4s) — another self-hosted, OpenAI-compatible provider. For a local vLLM instance, shorter backoffs would give faster test feedback. Not blocking since this is test infrastructure.

core/providers/openai/transcription.go (1)

123-128: Content in the final usage chunk would be silently dropped.

If vLLM sends a chunk containing both Usage and Choices[0].Delta.Content, the current logic prioritizes usage and returns a Done response, discarding any accompanying text content. In practice, OpenAI-compatible APIs typically send usage in a separate final chunk with null content, so this is unlikely to be an issue — but worth a comment noting the assumption.

core/providers/openai/openai.go (1)

2343-2357: Hardcoded schemas.VLLM check in shared handler breaks custom provider aliasing.

HandleOpenAITranscriptionStreamRequest is a shared function reused by multiple providers. Checking providerName == schemas.VLLM directly means a custom provider wrapping vLLM (with a different alias) will never hit the vLLM parsing path.

Consider an extensible approach — e.g., accept an optional custom chunk parser function (similar to the existing postResponseConverter pattern), or pass a flag/option indicating the vLLM parsing mode. This keeps the shared handler agnostic to specific provider identities.

Suggested approach: add an optional parser parameter
 func HandleOpenAITranscriptionStreamRequest(
 	ctx *schemas.BifrostContext,
 	client *fasthttp.Client,
 	url string,
 	request *schemas.BifrostTranscriptionRequest,
 	authHeader map[string]string,
 	extraHeaders map[string]string,
 	sendBackRawResponse bool,
 	providerName schemas.ModelProvider,
 	postHookRunner schemas.PostHookRunner,
 	postRequestConverter func(*OpenAITranscriptionRequest) *OpenAITranscriptionRequest,
 	postResponseConverter func(*schemas.BifrostTranscriptionStreamResponse) *schemas.BifrostTranscriptionStreamResponse,
+	customChunkParser func([]byte) (*schemas.BifrostTranscriptionStreamResponse, bool),
 	logger schemas.Logger,
 ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) {

Then in the streaming goroutine:

 		var response schemas.BifrostTranscriptionStreamResponse
-		if providerName == schemas.VLLM {
-			if parsed, ok := parseVLLMTranscriptionStreamChunk([]byte(jsonData)); ok && parsed != nil {
+		if customChunkParser != nil {
+			if parsed, ok := customChunkParser([]byte(jsonData)); ok && parsed != nil {
 				response = *parsed
 			} else {

The vLLM provider would pass parseVLLMTranscriptionStreamChunk as the argument; OpenAI and others pass nil.

Based on learnings: "When handling unsupported operations across providers, avoid hardcoding provider constants (e.g., schemas.Bedrock). Use the provider.GetProviderKey() (or equivalent API) to obtain the actual provider key from configuration, ensuring errors and messages adapt to custom provider names."
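
As a standalone sketch of that optional-parser pattern (all names here are simplified stand-ins, encoding/json stands in for sonic, and this is not Bifrost's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamChunk is a simplified stand-in for schemas.BifrostTranscriptionStreamResponse.
type streamChunk struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// chunkParser mirrors the optional customChunkParser parameter suggested above.
type chunkParser func(data []byte) (*streamChunk, bool)

// parseChunk uses the provider-supplied parser when one is given and falls back
// to plain unmarshalling otherwise; providers with no special needs pass nil.
func parseChunk(data []byte, custom chunkParser) (*streamChunk, error) {
	if custom != nil {
		if parsed, ok := custom(data); ok && parsed != nil {
			return parsed, nil
		}
		return nil, fmt.Errorf("custom parser could not handle chunk")
	}
	var chunk streamChunk
	if err := json.Unmarshal(data, &chunk); err != nil {
		return nil, err
	}
	return &chunk, nil
}

func main() {
	// Default path: no custom parser, the chunk already matches the schema.
	chunk, err := parseChunk([]byte(`{"type":"delta","text":"hel"}`), nil)
	fmt.Println(chunk, err)

	// vLLM-style path: the chunk is an OpenAI chat delta, so a provider-specific
	// parser converts it before it reaches the shared streaming pipeline.
	vllmParser := func(data []byte) (*streamChunk, bool) {
		var raw struct {
			Choices []struct {
				Delta struct {
					Content *string `json:"content"`
				} `json:"delta"`
			} `json:"choices"`
		}
		if err := json.Unmarshal(data, &raw); err != nil ||
			len(raw.Choices) == 0 || raw.Choices[0].Delta.Content == nil {
			return nil, false
		}
		return &streamChunk{Type: "delta", Text: *raw.Choices[0].Delta.Content}, true
	}
	chunk, err = parseChunk([]byte(`{"choices":[{"delta":{"content":"lo"}}]}`), vllmParser)
	fmt.Println(chunk, err)
}
```

Passing the parser as a value keeps the shared handler unaware of specific provider identities, which addresses the same aliasing concern raised above.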

core/providers/vllm/vllm_test.go (1)

73-77: Trim env values to avoid whitespace-only model names.

This avoids accidental spaces when env vars are set via CI or shell scripts.

♻️ Suggested tweak
 func getEnvWithDefault(key, def string) string {
-	if v := os.Getenv(key); v != "" {
-		return v
+	if v := strings.TrimSpace(os.Getenv(key)); v != "" {
+		return v
 	}
 	return def
 }

Radheshg04 force-pushed the 02-14-feat_add_vllm_provider_support branch from 588ebc8 to 4d077a8 on February 14, 2026 at 14:47
}
}

var errorResp schemas.BifrostError
Collaborator


Any reason for this? We are already handling errors below, right?

Contributor Author


vLLM sends an ErrorField object as the error body (with HTTP 200); the code below only handles unmarshalling errors.
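
For readers following the thread, a simplified sketch of that pre-check, where the body is probed for an error object before being decoded as a success (type names are illustrative stand-ins, and encoding/json stands in for sonic):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorPayload and chatResponse are simplified stand-ins for Bifrost's schema
// types; vLLM can return an error object in the body even with HTTP 200.
type errorPayload struct {
	Error *struct {
		Message string `json:"message"`
	} `json:"error"`
}

type chatResponse struct {
	ID      string `json:"id"`
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	} `json:"choices"`
}

// decodeResponse probes the body for an error object first, then decodes the
// success shape, so a 200-with-error from vLLM surfaces as an error rather
// than an empty response.
func decodeResponse(body []byte) (*chatResponse, error) {
	var probe errorPayload
	if err := json.Unmarshal(body, &probe); err == nil && probe.Error != nil {
		return nil, fmt.Errorf("provider error: %s", probe.Error.Message)
	}
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return &resp, nil
}

func main() {
	_, err := decodeResponse([]byte(`{"error":{"message":"model not found"}}`))
	fmt.Println(err)

	resp, _ := decodeResponse([]byte(`{"id":"1","choices":[{"message":{"content":"hi"}}]}`))
	fmt.Println(resp.Choices[0].Message.Content)
}
```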


Pratham-Mishra04 (Collaborator) left a comment


added comments

Radheshg04 force-pushed the 02-14-feat_add_vllm_provider_support branch from 4d077a8 to 63a5243 on February 15, 2026 at 09:23

coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@core/providers/openai/openai.go`:
- Around line 2347-2355: In the customChunkParser branch (customChunkParser,
response, schemas.BifrostTranscriptionStreamResponse, logger.Warn), remove the
undefined err reference and guard against nil: call
customChunkParser([]byte(jsonData)), then check if ok && customChunk != nil
before doing response = *customChunk; otherwise log a simple warning like
"Failed to parse stream response" (no err var) and continue. Ensure the code
never dereferences a nil customChunk.

In `@core/providers/vllm/utils.go`:
- Around line 24-38: The delta branch currently leaves response.Type empty;
update the block handling chunk.Choices (in utils.go) so that when
chunk.Choices[0].Delta.Content is present but not a "stop" finish, you assign
response.Type = schemas.TranscriptionStreamResponseTypeDelta (and still set
response.Delta = chunk.Choices[0].Delta.Content); when the finish reason is
"stop" keep setting response.Type = schemas.TranscriptionStreamResponseTypeDone
as already done. Adjust the logic around chunk.Choices[0].Delta.Content,
FinishReason/StopReason and response.Type to ensure every emitted delta response
has a non-empty Type.

In `@core/providers/vllm/vllm.go`:
- Around line 24-41: The doc comment for NewVLLMProvider incorrectly claims a
default base URL; update the comment to state that config.NetworkConfig.BaseURL
is required (no default) to match the code path that returns an error when
BaseURL is empty, referencing NewVLLMProvider and the
config.NetworkConfig.BaseURL check; ensure you remove the "Default base URL is
http://localhost:8000" text and instead note that API key is optional and
BaseURL must be provided (or alternatively, if you prefer Option A, implement
setting config.NetworkConfig.BaseURL to "http://localhost:8000" before the
BaseURL check and remove the error branch).
🧹 Nitpick comments (2)
core/internal/llmtests/account.go (2)

634-647: Retry backoff seems high for a self-hosted service.

The RetryBackoffMax of 3 minutes is quite aggressive for a self-hosted vLLM instance. Compare with other self-hosted providers:

  • Ollama: 4s max backoff
  • SGL: 15s max backoff

A 3-minute max backoff may cause unnecessarily long waits in test (and real) scenarios when hitting a local vLLM server. Consider aligning with SGL's settings (e.g., RetryBackoffInitial: 1s, RetryBackoffMax: 15s) since both are self-hosted OpenAI-compatible servers.

Suggested adjustment
 	case schemas.VLLM:
 		return &schemas.ProviderConfig{
 			NetworkConfig: schemas.NetworkConfig{
 				BaseURL:                        os.Getenv("VLLM_BASE_URL"),
 				DefaultRequestTimeoutInSeconds: 120,
-				MaxRetries:                     10, // vllm is stable
-				RetryBackoffInitial:            5 * time.Second,
-				RetryBackoffMax:                3 * time.Minute,
+				MaxRetries:                     10, // vllm (self-hosted) is generally stable
+				RetryBackoffInitial:            1 * time.Second,
+				RetryBackoffMax:                15 * time.Second,
 			},
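
To make the backoff comparison above concrete, a small sketch that prints both wait schedules, assuming a simple doubling backoff capped at the configured maximum (Bifrost's actual retry strategy, including any jitter, may differ):

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns per-attempt waits for a doubling backoff capped at
// max. The doubling rule is an assumption for illustration only.
func backoffSchedule(initial, max time.Duration, retries int) []time.Duration {
	waits := make([]time.Duration, 0, retries)
	d := initial
	for i := 0; i < retries; i++ {
		if d > max {
			d = max
		}
		waits = append(waits, d)
		d *= 2
	}
	return waits
}

func total(waits []time.Duration) time.Duration {
	var sum time.Duration
	for _, w := range waits {
		sum += w
	}
	return sum
}

func main() {
	current := backoffSchedule(5*time.Second, 3*time.Minute, 10)  // 5s initial, 3m cap
	proposed := backoffSchedule(1*time.Second, 15*time.Second, 10) // 1s initial, 15s cap
	fmt.Println("current  (5s..3m): ", current, "total:", total(current))
	fmt.Println("proposed (1s..15s):", proposed, "total:", total(proposed))
}
```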

634-647: No ComprehensiveTestConfig entry in AllProviderConfigs for vLLM.

There's no entry for vLLM in the AllProviderConfigs slice (which starts at line 732), so the comprehensive integration test suite won't exercise vLLM scenarios. If dedicated tests exist in core/providers/vllm/vllm_test.go, this may be intentional—but adding an entry (even with minimal scenarios like SimpleChat, CompletionStream, and Embedding) would provide coverage through the shared test harness.

Comment on lines 2347 to +2355
 		var response schemas.BifrostTranscriptionStreamResponse
-		if err := sonic.Unmarshal([]byte(jsonData), &response); err != nil {
-			logger.Warn("Failed to parse stream response: %v", err)
-			continue
+		if customChunkParser != nil {
+			customChunk, ok := customChunkParser([]byte(jsonData))
+			if ok {
+				response = *customChunk
+			} else {
+				logger.Warn("Failed to parse stream response: %v", err)
+				continue
+			}


⚠️ Potential issue | 🔴 Critical

Fix undefined variable in custom chunk parser branch.

Line 2353 references err, which is undefined in this scope (compile error). Also guard against ok == true with a nil chunk to prevent panics.

🛠️ Proposed fix
-			customChunk, ok := customChunkParser([]byte(jsonData))
-			if ok {
-				response = *customChunk
-			} else {
-				logger.Warn("Failed to parse stream response: %v", err)
-				continue
-			}
+			customChunk, ok := customChunkParser([]byte(jsonData))
+			if !ok || customChunk == nil {
+				logger.Warn("customChunkParser returned no chunk")
+				continue
+			}
+			response = *customChunk

Comment on lines +24 to +38
	// Delta chunk: has choices[].delta.content
	if len(chunk.Choices) == 0 || chunk.Choices[0].Delta.Content == nil {
		return nil, false
	}
	if len(chunk.Choices) > 0 {
		reason := chunk.Choices[0].FinishReason
		if reason == nil && chunk.Choices[0].StopReason != nil {
			reason = chunk.Choices[0].StopReason
		}
		if reason != nil && *reason == "stop" {
			response.Text = *chunk.Choices[0].Delta.Content
			response.Type = schemas.TranscriptionStreamResponseTypeDone
		}
		response.Delta = chunk.Choices[0].Delta.Content
	}


⚠️ Potential issue | 🟠 Major

Set a non-empty Type for delta chunks.

Delta responses currently return with Type == "", which can emit invalid stream events (Type is required in BifrostTranscriptionStreamResponse). Set it to the delta event type used elsewhere in schemas/transcriptions.go.

✅ Suggested patch
 		if reason != nil && *reason == "stop" {
 			response.Text = *chunk.Choices[0].Delta.Content
 			response.Type = schemas.TranscriptionStreamResponseTypeDone
 		}
 		response.Delta = chunk.Choices[0].Delta.Content
+		if response.Type == "" {
+			response.Type = schemas.TranscriptionStreamResponseTypeDelta
+		}
 	}

Comment on lines +24 to +41
// NewVLLMProvider creates a new vLLM provider instance.
// Default base URL is http://localhost:8000. API key is optional for local instances.
func NewVLLMProvider(config *schemas.ProviderConfig, logger schemas.Logger) (*VLLMProvider, error) {
	config.CheckAndSetDefaults()

	client := &fasthttp.Client{
		ReadTimeout:         time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
		WriteTimeout:        time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
		MaxConnsPerHost:     5000,
		MaxIdleConnDuration: 60 * time.Second,
		MaxConnWaitTimeout:  10 * time.Second,
	}

	client = providerUtils.ConfigureProxy(client, config.ProxyConfig, logger)

	if config.NetworkConfig.BaseURL == "" {
		return nil, fmt.Errorf("base_url is required for vllm provider")
	}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if CheckAndSetDefaults sets a default BaseURL for vLLM
echo "=== Looking for CheckAndSetDefaults and BaseURL default logic ==="
rg -n "CheckAndSetDefaults" core/schemas/ -A 20 | head -100

echo ""
echo "=== vLLM provider constructor context ==="
rg -n "NewVLLMProvider" core/providers/vllm/ -A 30 | head -50

echo ""
echo "=== Comparing with Ollama provider ==="
rg -n "NewOllamaProvider" core/providers/ollama/ -A 30 | head -50

echo ""
echo "=== Comparing with SGL provider ==="
rg -n "NewSGLProvider" core/providers/sgl/ -A 30 | head -50

Repository: maximhq/bifrost

Length of output: 8949


Fix doc comment—it claims a default base URL that the code doesn't provide.

The comment on line 25 states "Default base URL is http://localhost:8000", but lines 39–41 return an error if BaseURL is empty with no fallback. Other similar providers (Ollama, SGL) also require base_url and explicitly state it in comments. Either set the default here or correct the comment to say base_url is required.

Option A: Set the default (if intended)
 	config.CheckAndSetDefaults()
+
+	if config.NetworkConfig.BaseURL == "" {
+		config.NetworkConfig.BaseURL = "http://localhost:8000"
+	}
 
 	client := &fasthttp.Client{

Then remove the error check on lines 39–41.

Option B: Fix the comment to match code behavior
 // NewVLLMProvider creates a new vLLM provider instance.
-// Default base URL is http://localhost:8000. API key is optional for local instances.
+// base_url is required in the provider config. API key is optional for local instances.
 func NewVLLMProvider(config *schemas.ProviderConfig, logger schemas.Logger) (*VLLMProvider, error) {
