vscode-openai hanging #351

@rickmcgeer

I'm using vscode-openai 1.6.18 with code-server 4.100.3 (Code 1.100.3), working through a LiteLLM proxy (1.52.0). When I try to use the conversation feature, the message "waiting for response" appears, but no response ever arrives.
The only log message that appears (log level is set to Debug) is:

2025-06-23 16:08:28.234 [info]		chat-completion - event properties
{
  "service_provider": "Custom-OpenAI",
  "default_model": "gpt-4o-2024-08-06",
  "tokens_prompt": "0",
  "tokens_completion": "0",
  "tokens_total": "12",
  "tokens_session": "12"
}

The server reports receiving the POST and gives a 200 response. When the command

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "What models do you have?"}
    ]
  }' \
  https://bearborg.berkeley.edu:4433/v1/chat/completions

is run from the command line, I get this response:

{"id":"chatcmpl-Ble48wFvPUsFLABp0SA46tbQVAOPY","choices":[{"finish_reason":"stop","index":0,"message":{"content":"I am based on OpenAI's GPT-3.5 architecture, which is designed to understand and generate human-like text. If you're referring to specific versions or variations within the GPT models, I primarily operate using the capabilities and updates provided up to the GPT-3.5 series. If you have questions about specific functionalities or need assistance with something, feel free to ask!","role":"assistant","tool_calls":null,"function_call":null}}],"created":1750695128,"model":"gpt-4o-2024-08-06","object":"chat.completion","system_fingerprint":"fp_ee1d74bde0","usage":{"completion_tokens":76,"prompt_tokens":13,"total_tokens":89,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0}},"service_tier":null,"prompt_filter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"jailbreak":{"filtered":false,"detected":false},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}]}

The settings for vscode-openai are:

{
    "vscode-openai.serviceProvider": "Custom-OpenAI",
    "vscode-openai.baseUrl": "https://bearborg.berkeley.edu:4433/v1/",
    // "vscode-openai.defaultModel": "openai/gpt-4o" // Set this to the *exact* name of the model you defined in your LiteLLM config, e.g., "gpt-4o" or "gemini-1.5-pro-litellm"
    "vscode-openai.defaultModel": "gpt-4o-2024-08-06",
    "vscode-openai.logLevel": "Debug",
    "vscode-openai.conversation-configuration.max-tokens": 50,
    "vscode-openai.conversation-configuration.summary-threshold": 3

}

My guess is that vscode-openai isn't detecting termination of the response for some reason.
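
A minimal sketch for testing this guess, assuming the extension requests a streamed response (`"stream": true`), which is not confirmed here. OpenAI-style streaming ends with a `data: [DONE]` line, so one check is to capture a streamed reply from the proxy and look for that terminator. The helper name `has_done_marker` and the file name `stream.txt` are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: succeeds if a captured server-sent-event stream
# contains the OpenAI-style "data: [DONE]" terminator line.
# No end-of-line anchor is used, so lines ending in CRLF also match.
has_done_marker() {
  grep -q '^data: \[DONE\]' "$1"
}

# Usage (same endpoint and model as the curl command above, with streaming on):
# curl -sN -X POST \
#   -H "Content-Type: application/json" \
#   -H "Authorization: Bearer $OPENAI_API_KEY" \
#   -d '{"model": "openai/gpt-4o", "stream": true, "messages": [{"role": "user", "content": "What models do you have?"}]}' \
#   https://bearborg.berkeley.edu:4433/v1/chat/completions > stream.txt
# has_done_marker stream.txt && echo "terminator present" || echo "terminator missing"
```

If the terminator is missing from the captured stream, the proxy side would be the place to look; if it is present, the problem is more likely in how the extension consumes the stream.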
