InternVL3.5 is a vision-language model developed by Shanghai AI Laboratory. This guide describes how to deploy InternVL3.5 with vLLM and provides some simple examples of how to use the API.
```shell
uv venv
source .venv/bin/activate
# CUDA
uv pip install -U vllm --torch-backend auto
# ROCm
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
```

Then launch the server:

```shell
vllm serve OpenGVLab/InternVL3_5-8B --trust-remote-code
```

- You can set `export VLLM_ROCM_USE_AITER=1` for better performance on AMD GPUs. The default is `export VLLM_ROCM_USE_AITER=0`.
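Before running the examples below, you can verify that the server is reachable by querying the `/v1/models` endpoint. This is a small sketch, not part of the official tooling; the `server_ready` helper name is ours, and the default URL matches the `vllm serve` command above.

```python
import json
import urllib.request

def server_ready(base_url: str = 'http://0.0.0.0:8000/v1', timeout: float = 2.0) -> bool:
    # Probe the OpenAI-compatible /models endpoint; print the served
    # model IDs if the server answers, otherwise report it as down.
    try:
        with urllib.request.urlopen(f'{base_url}/models', timeout=timeout) as resp:
            ids = [m['id'] for m in json.load(resp).get('data', [])]
            print('Serving:', ids)
            return True
    except OSError:
        return False
```

If this returns `False`, check that `vllm serve` finished loading the model before sending requests.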
Text-only example:

```python
from openai import OpenAI

client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': '9.11 and 9.8, which is greater?',
        }],
    }],
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)
```

Single-image example:

```python
from openai import OpenAI

client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'Describe the image.',
        }, {
            'type': 'image_url',
            'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'},
        }],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
```

Multi-image example:

```python
from openai import OpenAI

client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'Describe these two images.',
        }, {
            'type': 'image_url',
            'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg'},
        }, {
            'type': 'image_url',
            'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'},
        }],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
```

To enable thinking mode, please set the system prompt to our Thinking System Prompt. When enabling thinking mode, we recommend setting `temperature=0.6` to mitigate undesired repetition.
```python
from openai import OpenAI

client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

THINKING_SYSTEM_PROMPT = """
You are an AI assistant that rigorously follows this response protocol:
1. First, conduct a detailed analysis of the question. Consider different angles, potential solutions, and reason through the problem step-by-step. Enclose this entire thinking process within <think> and </think> tags.
2. After the thinking section, provide a clear, concise, and direct answer to the user's question. Separate the answer from the think section with a newline.
Ensure that the thinking process is thorough but remains focused on the query. The final answer should be standalone and not reference the thinking section.
""".strip()

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'system',
        'content': [{
            'type': 'text',
            'text': THINKING_SYSTEM_PROMPT,
        }],
    }, {
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': '9.11 and 9.8, which is greater?',
        }],
    }],
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)
```
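With the thinking system prompt, the reply should contain a `<think>...</think>` section followed by the final answer on its own line. A small helper can separate the two; this is our own sketch, assuming the output follows the format the prompt above requests:

```python
def split_thinking(reply: str) -> tuple[str, str]:
    # Split a reply of the form '<think>...</think>\nanswer' into
    # (thinking, answer); if no think block is found, thinking is ''.
    start, end = reply.find('<think>'), reply.find('</think>')
    if start == -1 or end == -1:
        return '', reply.strip()
    thinking = reply[start + len('<think>'):end].strip()
    answer = reply[end + len('</think>'):].strip()
    return thinking, answer
```

For example, `split_thinking(response.choices[0].message.content)` returns the reasoning trace and the standalone answer separately, which is convenient if you only want to display the answer.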