MiniCPM Legacy Topics

Auxiliary topics for earlier MiniCPM releases. For the current flagship model, see the main README.

Chinese version: README-legacy-cn.md


BitCPM4: Quantization

BitCPM4 is a family of ternary-quantized models derived from the MiniCPM series through quantization-aware training (QAT), with significant gains in both training efficiency and parameter efficiency.

  • Improvements to the training method
    • Hyperparameters are searched with a "wind tunnel" on a small model.
    • Training is two-stage: high-precision training first, then QAT. This reuses the trained high-precision model and significantly reduces the compute required for the QAT phase.
  • High parameter efficiency
    • With a bit width of only 1.58 bits, BitCPM4 achieves performance comparable to full-precision models of similar parameter count.
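The 1.58-bit figure follows from each weight taking one of three values, since log2(3) ≈ 1.585. The sketch below illustrates ternary quantization with a shared scale. The absmean scaling rule and the names `ternary_quantize` and `dequantize` are assumptions made for illustration (borrowed from the BitNet b1.58 style of quantizer); this README does not specify BitCPM4's actual quantizer.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling, not necessarily what BitCPM4 uses.
    """
    # Shared scale: mean absolute value of the weight group.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round w / scale to the nearest integer, then clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from ternary codes."""
    return [c * scale for c in codes]

# Three possible states carry log2(3) bits of information per weight.
BITS_PER_WEIGHT = math.log2(3)

weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternary_quantize(weights)
print(codes)                      # [1, 0, 1, -1, 0, -1]
print(round(BITS_PER_WEIGHT, 2))  # 1.58
```

In QAT, a quantizer of this kind would be applied in the forward pass while gradients update the underlying high-precision weights; that training loop is omitted here.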

BitCPM4 Evaluation

BitCPM4 performs comparably to full-precision models of the same size.

[Figure: bitcpm-benchmark]

BitCPM4 Inference

BitCPM4's parameters are stored in a fake-quantized format, so the models can be run directly with the Hugging Face framework.
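One common reading of "fake-quantized" (an assumption here, since this README does not define the term) is that the checkpoint keeps ordinary floating-point tensors whose values are already restricted to {-s, 0, +s} for some scale s. Standard dense kernels then work unchanged, at the cost of not saving memory on disk. The hypothetical helper `ternary_scale` below checks whether a list of floats has that property.

```python
def ternary_scale(weights, tol=1e-6):
    """Return the shared scale if `weights` only uses {-s, 0, +s}, else None."""
    # Collect magnitudes of all non-zero entries.
    magnitudes = [abs(w) for w in weights if abs(w) > tol]
    if not magnitudes:
        return 0.0  # an all-zero tensor is trivially ternary
    scale = magnitudes[0]
    # Every non-zero entry must share the same magnitude.
    if all(abs(m - scale) <= tol for m in magnitudes):
        return scale
    return None

# Stored as plain floats, but only three distinct values occur.
fake_quantized = [0.5, 0.0, -0.5, 0.5, -0.5, 0.0]
full_precision = [0.51, 0.03, -0.47, 0.62, -0.5, 0.0]

print(ternary_scale(fake_quantized))  # 0.5
print(ternary_scale(full_precision))  # None
```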

MiniCPM4 Application


MiniCPM4-Survey: Trustworthy Survey Generation

MiniCPM4-Survey is an open-source LLM agent model jointly developed by THUNLP, Renmin University of China and ModelBest. Built on MiniCPM4-8B, it accepts users' queries as input and autonomously generates trustworthy, long-form survey papers.

Key features include:

  • Plan-Retrieve-Write Survey Generation Framework: We propose a multi-agent generation framework that operates through three core stages: planning (defining the overall structure of the survey), retrieval (generating appropriate retrieval keywords), and writing (synthesizing the retrieved information into coherent section-level content).

  • High-Quality Dataset Construction: We gather and process a large number of expert-written survey papers to construct a high-quality training dataset. We also collect a large number of research papers to build a retrieval database.

  • Multi-Aspect Reward Design: We design a reward system with three aspects (structure, content, and citations) to evaluate the quality of the surveys. It serves as the reward function in the RL training stage.

  • Multi-Step RL Training Strategy: We propose a Context Manager that retains essential information while keeping reasoning efficient, and we construct a Parallel Environment to maintain efficient RL training cycles.
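The plan, retrieve, and write stages described above can be sketched as a simple loop over outline sections. This is a toy illustration only: the `Survey` class, `generate_survey`, the stub stage functions, and the in-memory `CORPUS` are all hypothetical names, and in the real system each stage would be driven by the model and a retrieval database rather than by hard-coded stubs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Survey:
    query: str
    outline: List[str] = field(default_factory=list)
    sections: Dict[str, str] = field(default_factory=dict)

def generate_survey(query, plan, make_keywords, search, write):
    """Run planning once, then retrieval and writing per section."""
    survey = Survey(query=query)
    survey.outline = plan(query)                       # planning
    for title in survey.outline:
        keywords = make_keywords(query, title)         # retrieval: keywords
        papers = search(keywords)                      # retrieval: lookup
        survey.sections[title] = write(title, papers)  # writing
    return survey

# --- Hypothetical stand-ins for the model-driven stages ---
CORPUS = {
    "methods": ["Paper A", "Paper B"],
    "open problems": ["Paper C"],
}

def toy_plan(query):
    return ["Introduction", "Methods", "Open Problems"]

def toy_keywords(query, title):
    return [query.lower(), title.lower()]

def toy_search(keywords):
    hits = []
    for kw in keywords:
        hits.extend(CORPUS.get(kw, []))
    return hits

def toy_write(title, papers):
    cites = ", ".join(papers) if papers else "no sources found"
    return f"{title}: drafted from {cites}."

survey = generate_survey("ternary LLMs", toy_plan, toy_keywords,
                         toy_search, toy_write)
print(survey.sections["Methods"])  # Methods: drafted from Paper A, Paper B.
```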

Demo and Quick Start

See here

Performance Evaluation

| Method | Relevance | Coverage | Depth | Novelty | Avg. | Fact Score |
|---|---|---|---|---|---|---|
| Naive RAG (driven by G2FT) | 3.25 | 2.95 | 3.35 | 2.60 | 3.04 | 43.68 |
| AutoSurvey (driven by G2FT) | 3.10 | 3.25 | 3.15 | 3.15 | 3.16 | 46.56 |
| Webthinker (driven by WTR1-7B) | 3.30 | 3.00 | 2.75 | 2.50 | 2.89 | -- |
| Webthinker (driven by QwQ-32B) | 3.40 | 3.30 | 3.30 | 2.50 | 3.13 | -- |
| OpenAI Deep Research (driven by GPT-4o) | 3.50 | 3.95 | 3.55 | 3.00 | 3.50 | -- |
| MiniCPM-4-Survey | 3.45 | 3.70 | 3.85 | 3.00 | 3.50 | 68.73 |
| MiniCPM-4-Survey w/o RL | 3.55 | 3.35 | 3.30 | 2.25 | 3.11 | 50.24 |

Performance comparison of the survey generation systems. "G2FT" stands for Gemini-2.0-Flash-Thinking, and "WTR1-7B" denotes Webthinker-R1-7B. FactScore evaluation was omitted for Webthinker, as it does not include citation functionality, and for OpenAI Deep Research, which does not provide citations when exporting the results.
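The Avg. column appears to be the unweighted mean of the four quality scores (Relevance, Coverage, Depth, Novelty), rounded to two decimals. That is an inference from the numbers rather than something the caption states. The short check below recomputes it for every row; the shortened row labels are just dictionary keys.

```python
# (Relevance, Coverage, Depth, Novelty), reported Avg.
ROWS = {
    "Naive RAG (G2FT)":          ([3.25, 2.95, 3.35, 2.60], 3.04),
    "AutoSurvey (G2FT)":         ([3.10, 3.25, 3.15, 3.15], 3.16),
    "Webthinker (WTR1-7B)":      ([3.30, 3.00, 2.75, 2.50], 2.89),
    "Webthinker (QwQ-32B)":      ([3.40, 3.30, 3.30, 2.50], 3.13),
    "OpenAI Deep Research":      ([3.50, 3.95, 3.55, 3.00], 3.50),
    "MiniCPM-4-Survey":          ([3.45, 3.70, 3.85, 3.00], 3.50),
    "MiniCPM-4-Survey w/o RL":   ([3.55, 3.35, 3.30, 2.25], 3.11),
}

def mean(values):
    return sum(values) / len(values)

# Allow half a unit in the last reported digit, plus float slack.
TOLERANCE = 0.0051

mismatches = [
    name for name, (scores, reported) in ROWS.items()
    if abs(mean(scores) - reported) > TOLERANCE
]
print(mismatches)  # []
```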

MiniCPM4-MCP: Tool Use with Model Context Protocol

MiniCPM4-MCP is an open-source on-device LLM agent model jointly developed by THUNLP, Renmin University of China and ModelBest, built on MiniCPM-4 with 8 billion parameters. It can solve a wide range of real-world tasks by interacting with various tools and data resources through MCP. As of now, MiniCPM4-MCP supports the following:

  • Utilization of tools across 16 MCP servers: These servers span various categories, including office, lifestyle, communication, information, and work management.

  • Single-tool-calling capability: It can perform single- or multi-step tool calls using a single tool that complies with the MCP.

  • Cross-tool-calling capability: It can perform single- or multi-step tool calls using different tools that comply with the MCP.
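For readers unfamiliar with MCP, a tool call is a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name and its arguments. The sketch below builds a two-step, cross-tool sequence of such requests. The tool names (`search_listings`, `send_message`) and their arguments are hypothetical and are not taken from the 16 servers listed above; transport and response handling are omitted.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids

def make_tool_call(name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# A cross-tool plan: two steps, each using a different (hypothetical) tool.
steps = [
    make_tool_call("search_listings", {"location": "Beijing", "guests": 2}),
    make_tool_call("send_message", {"channel": "#travel",
                                    "text": "Found 3 listings"}),
]

for step in steps:
    print(json.dumps(step))
```

A single-tool, multi-step plan would look the same except that every request names the same tool.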

Demo

A demo is available at this link.

Performance Evaluation

| MCP Server | gpt-4o func | gpt-4o param | gpt-4o value | qwen3 func | qwen3 param | qwen3 value | minicpm4 func | minicpm4 param | minicpm4 value |
|---|---|---|---|---|---|---|---|---|---|
| Airbnb | 89.3 | 67.9 | 53.6 | 92.8 | 60.7 | 50.0 | 96.4 | 67.9 | 50.0 |
| Amap-Maps | 79.8 | 77.5 | 50.0 | 74.4 | 72.0 | 41.0 | 89.3 | 85.7 | 39.9 |
| Arxiv-MCP-Server | 85.7 | 85.7 | 85.7 | 81.8 | 54.5 | 50.0 | 57.1 | 57.1 | 52.4 |
| Calculator | 100.0 | 100.0 | 20.0 | 80.0 | 80.0 | 13.3 | 100.0 | 100.0 | 6.67 |
| Computor-Control-MCP | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 86.7 |
| Desktop-Commander | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Filesystem | 63.5 | 63.5 | 31.3 | 69.7 | 69.7 | 26.0 | 83.3 | 83.3 | 42.7 |
| Github | 92.0 | 80.0 | 58.0 | 80.5 | 50.0 | 27.7 | 62.8 | 25.7 | 17.1 |
| Gaode | 71.1 | 55.6 | 17.8 | 68.8 | 46.6 | 24.4 | 68.9 | 46.7 | 15.6 |
| MCP-Code-Executor | 85.0 | 80.0 | 70.0 | 80.0 | 80.0 | 70.0 | 90.0 | 90.0 | 65.0 |
| MCP-Docx | 95.8 | 86.7 | 67.1 | 94.9 | 81.6 | 60.1 | 95.1 | 86.6 | 76.1 |
| PPT | 72.6 | 49.8 | 40.9 | 85.9 | 50.7 | 37.5 | 91.2 | 72.1 | 56.7 |
| PPTx | 64.2 | 53.7 | 13.4 | 91.0 | 68.6 | 20.9 | 91.0 | 58.2 | 26.9 |
| Simple-Time-Server | 90.0 | 70.0 | 70.0 | 90.0 | 90.0 | 90.0 | 90.0 | 60.0 | 60.0 |
| Slack | 100.0 | 90.0 | 70.0 | 100.0 | 100.0 | 65.0 | 100.0 | 100.0 | 100.0 |
| Whisper | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 30.0 |
| Average | 80.2 | 70.2 | 49.1 | 83.5 | 67.7 | 43.8 | 88.3 | 76.1 | 51.2 |
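The table does not define its func, param, and value columns. One plausible reading, assumed here and not confirmed by this README, is a cascade: func counts calls that pick the right tool, param additionally requires the right argument names, and value additionally requires the right argument values. That reading is at least consistent with every row being non-increasing from func to value. The hypothetical scorer below implements it on four made-up calls.

```python
def score_call(predicted, reference):
    """Return (func, param, value) correctness flags for one tool call.

    Assumed cascade: each level requires the previous one to pass.
    """
    func_ok = predicted["name"] == reference["name"]
    param_ok = func_ok and (
        set(predicted["arguments"]) == set(reference["arguments"]))
    value_ok = param_ok and predicted["arguments"] == reference["arguments"]
    return func_ok, param_ok, value_ok

def accuracy(pairs):
    """Percentage of calls passing each level, rounded to one decimal."""
    flags = [score_call(pred, ref) for pred, ref in pairs]
    n = len(flags)
    return tuple(round(100.0 * sum(f[i] for f in flags) / n, 1)
                 for i in range(3))

# Made-up reference call and four made-up predictions.
ref = {"name": "add", "arguments": {"a": 1, "b": 2}}
pairs = [
    ({"name": "add", "arguments": {"a": 1, "b": 2}}, ref),  # all correct
    ({"name": "add", "arguments": {"a": 1, "b": 3}}, ref),  # wrong value
    ({"name": "add", "arguments": {"x": 1, "y": 2}}, ref),  # wrong params
    ({"name": "mul", "arguments": {"a": 1, "b": 2}}, ref),  # wrong tool
]
print(accuracy(pairs))  # (75.0, 50.0, 25.0)
```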

MiniCPM Intel AIPC Client: A New Edge Large Model Powerhouse

Developed jointly by Mianbi Intelligence and Intel, the MiniCPM Intel AIPC Client is an edge large model client designed for devices equipped with Intel Core Ultra series processors. It delivers a low-latency, high-efficiency, and privacy-preserving local large model experience for developers, researchers, and AI enthusiasts.

Key Features
  • Deep Intel Hardware Adaptation
    Fully compatible with Intel Core Ultra series processors, enabling deep integration with hardware to unleash peak performance. Users can run large models smoothly on local devices without relying on cloud services.

  • Extreme Optimization Based on OpenVINO
    Deeply optimized with the OpenVINO inference framework, it significantly boosts inference efficiency, reaching up to 80 tokens per second. This ensures rapid model response for both quick queries and complex task processing.

  • Privacy and Security Assurance
    Adopting local deployment, all data processing is completed on the device, eliminating privacy risks from cloud uploads. This provides users with peace of mind, especially for scenarios with high data privacy requirements.

  • Catering to Diverse User Groups
    Whether for developers chasing cutting-edge technologies, researchers focused on academic studies, or enthusiasts eager to explore AI applications, the MiniCPM Intel AIPC Client enables easy access to the power of local large models, opening the door to personalized AI exploration.

System Requirements
  • Recommended processor: Intel Core Ultra 7 or higher (mobile version)
  • Recommended RAM: 32GB or above

Download

download