Commit 62583d7

feat: add LLaVA-OneVision2 chat model wrapper
Register llava_onevision2_chat (key: llava_onevision2) targeting the released checkpoint lmms-lab-encoder/LLaVA-OneVision2-8B-Instruct. The wrapper loads via AutoModelForImageTextToText with trust_remote_code so the bundled processing pipeline (patch_positions, RoPE block layout, frame sampling + smart_resize, per-frame timestamp expansion) is used exactly as during training.

- New: lmms_eval/models/chat/llava_onevision2.py
- Register in lmms_eval/models/__init__.py
- Example launch script: examples/models/llava_onevision2.sh
- Documented under docs/advanced/throughput_metrics.md as a backend that logs throughput via log_metrics().
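To illustrate the loading path the commit message describes, here is a minimal sketch of loading a checkpoint through `AutoModelForImageTextToText` with `trust_remote_code=True`. The helper names `build_load_kwargs` and `load_model_and_processor` are hypothetical and are not taken from the wrapper; the code in `lmms_eval/models/chat/llava_onevision2.py` may be organized quite differently.

```python
from typing import Any, Dict, Optional

CHECKPOINT = "lmms-lab-encoder/LLaVA-OneVision2-8B-Instruct"


def build_load_kwargs(attn_implementation: Optional[str] = None) -> Dict[str, Any]:
    """Collect keyword arguments for from_pretrained (hypothetical helper)."""
    # trust_remote_code lets transformers run the modeling and processing
    # code that ships with the checkpoint instead of a built-in class.
    kwargs: Dict[str, Any] = {"trust_remote_code": True}
    if attn_implementation is not None:
        kwargs["attn_implementation"] = attn_implementation
    return kwargs


def load_model_and_processor(
    pretrained: str = CHECKPOINT,
    attn_implementation: Optional[str] = None,
):
    """Load the model and its bundled processor (hypothetical helper)."""
    # Imported lazily so build_load_kwargs works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model = AutoModelForImageTextToText.from_pretrained(
        pretrained, **build_load_kwargs(attn_implementation)
    )
    processor = AutoProcessor.from_pretrained(pretrained, trust_remote_code=True)
    return model.eval(), processor
```

Calling `load_model_and_processor(attn_implementation="flash_attention_2")` would mirror the attention setting used in the example launch script, assuming the checkpoint and its remote code are reachable.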
1 parent 9c7a55a commit 62583d7

4 files changed

Lines changed: 438 additions & 1 deletion

docs/advanced/throughput_metrics.md

Lines changed: 2 additions & 1 deletion
@@ -57,12 +57,13 @@ All chat backends listed below log throughput-oriented metrics (`total_gen_token
 - `llava_hf` (`/lmms_eval/models/chat/llava_hf.py`)
 - `internvl_hf` (`/lmms_eval/models/chat/internvl_hf.py`)
 - `llava_onevision1_5` (`/lmms_eval/models/chat/llava_onevision1_5.py`)
+- `llava_onevision2` (`/lmms_eval/models/chat/llava_onevision2.py`)
 - `thyme` (`/lmms_eval/models/chat/thyme.py`)

 TTFT/TPOT coverage is narrower:

 - **Native TTFT/TPOT in run summary**: `vllm`, `vllm_generate`
-- **Throughput-only (no native TTFT/TPOT in summary)**: `sglang`, `openai`, `async_openai`, `huggingface`, `qwen2_5_vl`, `qwen3_vl`, `llava_hf`, `internvl_hf`, `llava_onevision1_5`, `thyme`
+- **Throughput-only (no native TTFT/TPOT in summary)**: `sglang`, `openai`, `async_openai`, `huggingface`, `qwen2_5_vl`, `qwen3_vl`, `llava_hf`, `internvl_hf`, `llava_onevision1_5`, `llava_onevision2`, `thyme`

 ## Usage

examples/models/llava_onevision2.sh

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+export HF_HOME="~/.cache/huggingface"
+
+# pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
+# pip install qwen-vl-utils
+
+# Example: MLVU-dev with best config (min_pixels = max_pixels = 102400, max_num_frames = 384)
+accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
+    --model=llava_onevision2 \
+    --model_args=pretrained=lmms-lab-encoder/LLaVA-OneVision2-8B-Instruct,attn_implementation=flash_attention_2,messages_format=timestamp,max_new_tokens=16,fps=1,max_num_frames=384,min_pixels=102400,max_pixels=102400 \
+    --tasks=mlvu_dev \
+    --batch_size=1
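
The `--model_args` value in the launch script is a comma-separated list of `key=value` pairs. The sketch below shows one way such a string can be turned into keyword arguments. It is an illustrative parser rather than lmms-eval's own implementation, and `parse_model_args` and `_coerce` are hypothetical names.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def _coerce(value: str) -> Value:
    """Convert a raw string to int or float when possible, else keep it as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_model_args(arg_string: str) -> Dict[str, Value]:
    """Split 'k1=v1,k2=v2' into a dict (hypothetical helper)."""
    parsed: Dict[str, Value] = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        # partition splits on the first '=' only, so values may contain '='.
        key, _, value = pair.partition("=")
        parsed[key] = _coerce(value)
    return parsed


MODEL_ARGS = (
    "pretrained=lmms-lab-encoder/LLaVA-OneVision2-8B-Instruct,"
    "attn_implementation=flash_attention_2,messages_format=timestamp,"
    "max_new_tokens=16,fps=1,max_num_frames=384,"
    "min_pixels=102400,max_pixels=102400"
)

args = parse_model_args(MODEL_ARGS)
print(args["max_num_frames"], args["min_pixels"], args["max_pixels"])
```

In this configuration `min_pixels` and `max_pixels` are equal, and 102400 pixels is the area of a 320×320 image. Presumably this pins every frame to a fixed pixel budget, though the exact resizing behaviour depends on the checkpoint's bundled processor.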

lmms_eval/models/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -133,6 +133,7 @@
     "async_hf_model": "AsyncHFModel",
     "longvila": "LongVila",
     "llava_onevision1_5": "Llava_OneVision1_5",
+    "llava_onevision2": "Llava_OneVision2",
 }

 MODEL_ALIASES: dict[str, tuple[str, ...]] = {
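
The registry entry above pairs a model key with a class name. The sketch below shows one way such a mapping could be resolved to an importable class. It assumes the module file is named after the key and lives under `lmms_eval/models/chat/`, which matches the paths listed in this commit. The names `CHAT_MODELS`, `resolve_target`, and `load_model_class` are hypothetical and do not describe lmms-eval's actual loader.

```python
import importlib
from typing import Tuple

# Illustrative subset of a key -> class-name registry.
CHAT_MODELS = {
    "llava_onevision1_5": "Llava_OneVision1_5",
    "llava_onevision2": "Llava_OneVision2",
}


def resolve_target(key: str, package: str = "lmms_eval.models.chat") -> Tuple[str, str]:
    """Return (module path, class name) for a registered key."""
    if key not in CHAT_MODELS:
        raise ValueError(f"Unknown model key: {key!r}")
    # Assumption: the module file is named exactly like the registry key.
    return f"{package}.{key}", CHAT_MODELS[key]


def load_model_class(key: str):
    """Import the module lazily and fetch the class by name."""
    module_path, class_name = resolve_target(key)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


print(resolve_target("llava_onevision2"))
```

Deferring the import until a key is requested means that registering a model does not pull in its dependencies when the package itself is imported.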
