title

Qwen2.5 Technical Report

source

arxiv

arxiv_id

2412.15115

url

https://arxiv.org/abs/2412.15115

authors

An Yang

Baosong Yang

Beichen Zhang

Binyuan Hui

Bo Zheng

Bowen Yu

Chengyuan Li

Dayiheng Liu

Fei Huang

Haoran Wei

Huan Lin

Jian Yang

Jianhong Tu

Jianwei Zhang

Jianxin Yang

Jiaxi Yang

Jingren Zhou

Junyang Lin

Kai Dang

Keming Lu

Keqin Bao

Kexin Yang

Le Yu

Mei Li

Mingfeng Xue

Pei Zhang

Qin Zhu

Rui Men

Runji Lin

Tianhao Li

Tianyi Tang

Tingyu Xia

Xingzhang Ren

Xuancheng Ren

Yang Fan

Yang Su

Yichang Zhang

Yu Wan

Yuqiong Liu

Zeyu Cui

Zhenru Zhang

Zihan Qiu

published

2024-12-19

Abstract

In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.

Why it matters

Qwen2.5 scales its high-quality pre-training data from 7 trillion to 18 trillion tokens, which the authors credit with a strong foundation for common sense, expert knowledge, and reasoning across the model family.
Post-training uses over 1 million supervised fine-tuning samples plus multistage reinforcement learning, substantially improving instruction following, long-text generation, and structured data analysis.
The open-weight flagship, Qwen2.5-72B-Instruct, performs competitively with Llama-3-405B-Instruct, a model roughly 5 times larger, which points to strong parameter efficiency.
The MoE-based hosted variants, Qwen2.5-Turbo and Qwen2.5-Plus, perform competitively against GPT-4o-mini and GPT-4o respectively at lower cost. Separately, the Qwen2.5 models serve as the foundation for specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.

Source: https://arxiv.org/abs/2412.15115. This entry is the paper's abstract + metadata; read the full paper at the link.
