Hey, thank you for putting together this valuable collection!
I would appreciate it if you could include our recent work:
MUSE: A Unified Agentic Harness for MLLMs
Unlike most existing harness optimization methods, which are primarily developed and evaluated on text-centric domains (e.g., coding, math reasoning, and Terminal-Bench-style tasks), MUSE investigates harness design for multimodal large language models (MLLMs).
The paper presents a unified agentic harness and evaluates it across a diverse set of vision-centric tasks, including visual spatial planning, visual perception, multimodal reasoning, and fine-grained visual discrimination.
The results demonstrate that harness optimization can provide substantial gains beyond language-only settings and can serve as a general framework for improving multimodal intelligence.
Since this repository already covers harness engineering and agent optimization research, we believe MUSE would be a relevant addition.
Thank you for your consideration.
@article{lu2026museunifiedagenticharness,
title={MUSE: A Unified Agentic Harness for MLLMs},
author={Jianglin Lu and Hailing Wang and Xu Ma and Qihua Dong and Mingyuan Zhang and Yizhou Wang and Yun Fu},
year={2026},
journal={arXiv preprint arXiv:2606.03005},
url={https://arxiv.org/abs/2606.03005},
}