Evaluation

Paper link.

Usage Guide

To run a supported model on this benchmark, use the following command:

python main.py \
    --model_name [model_name] \
    --knowledge_type [theory|moral|cases] \
    --question_type [single|multiple|mix|essay] \
    --scope [second|third|all] \
    --sleep_time [sleep_time] \
    --local [True|False] \
    --all_in_one

--model_name: the name of the model to evaluate. Defaults to "qwen1.5-7b-chat" (available through the Qwen API or locally).

--knowledge_type: the knowledge category that the questions examine. Defaults to "theory".

--question_type: the answer format of the questions. Defaults to "single". Note that the theory and moral knowledge types take single or multiple, while the cases knowledge type takes mix or essay.

--scope: the level range of the questions (second, third, or all). Defaults to "all".

--sleep_time: the time to sleep after each question is completed, mainly to stay within the QPS limit when calling an API. Defaults to "0.1".

--local: whether the model to evaluate is deployed locally. Defaults to "False".

--all_in_one: run the experiments for all question types in a single pass. Defaults to "False".
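The pairing rule between --knowledge_type and --question_type can be checked before launching a run. The following is a minimal sketch, not code from main.py: the names ALLOWED_QUESTION_TYPES, is_valid_combination, and all_combinations are hypothetical, and it is only an assumption that --all_in_one covers exactly the pairs listed here.

```python
# Hypothetical helper, not part of main.py: encodes the documented pairing
# between --knowledge_type and --question_type.
ALLOWED_QUESTION_TYPES = {
    "theory": ("single", "multiple"),
    "moral": ("single", "multiple"),
    "cases": ("mix", "essay"),
}


def is_valid_combination(knowledge_type, question_type):
    """Return True if the question type is documented for the knowledge type."""
    return question_type in ALLOWED_QUESTION_TYPES.get(knowledge_type, ())


def all_combinations():
    """List every documented (knowledge_type, question_type) pair.

    Assumption: this is the set of experiments that --all_in_one would cover.
    """
    return [
        (knowledge, question)
        for knowledge, questions in ALLOWED_QUESTION_TYPES.items()
        for question in questions
    ]
```

For example, is_valid_combination("cases", "single") returns False, so that pairing could be rejected before any API calls are made.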

Notice: Model files are not included in this repository. To evaluate a locally deployed model (e.g. qwen1.5-7b-chat), download the corresponding model files from Hugging Face and place them in the models folder in the root directory. To access LLMs through an API, apply for an API key on the official website of Qwen or Qianfan.
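Before running with --local True, it can help to confirm that the model files are in place. The sketch below is illustrative only: the helper names are hypothetical, and it assumes each model lives in a subfolder of models named after its --model_name value, which this README does not confirm.

```python
from pathlib import Path


def local_model_dir(model_name, root="."):
    """Path where the model files are assumed to live: <root>/models/<model_name>."""
    return Path(root) / "models" / model_name


def is_model_available(model_name, root="."):
    """Return True if the assumed model folder exists and contains at least one entry."""
    model_dir = local_model_dir(model_name, root)
    return model_dir.is_dir() and any(model_dir.iterdir())
```

Calling is_model_available("qwen1.5-7b-chat") from the repository root returns False until the downloaded files are placed in the assumed folder.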

Supported Models

qwen1.5-7b-chat (Qwen API | local)
qwen1.5-14b-chat (Qwen API | local)
baichuan2-7b-chat (local)
baichuan2-13b-chat (local)
chatglm3-6b-32k (local)
qianfan-chinese-llama-2-7b (Qianfan API)
qianfan-chinese-llama-2-13b (Qianfan API)
chinese-alpaca-2-7b (local)
chinese-alpaca-2-13b (local)
yi-6b-chat (local)
yi-34b-chat (Qianfan API)
... (To be continued)
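The access modes in the list above can be captured in a small lookup to catch a mismatched --local setting early. This is a sketch under assumptions: SUPPORTED_BACKENDS and backend_matches are hypothetical names, the backend labels are invented for illustration, and main.py may handle this differently.

```python
# Hypothetical lookup built from the list above; labels are illustrative only.
SUPPORTED_BACKENDS = {
    "qwen1.5-7b-chat": {"qwen_api", "local"},
    "qwen1.5-14b-chat": {"qwen_api", "local"},
    "baichuan2-7b-chat": {"local"},
    "baichuan2-13b-chat": {"local"},
    "chatglm3-6b-32k": {"local"},
    "qianfan-chinese-llama-2-7b": {"qianfan_api"},
    "qianfan-chinese-llama-2-13b": {"qianfan_api"},
    "chinese-alpaca-2-7b": {"local"},
    "chinese-alpaca-2-13b": {"local"},
    "yi-6b-chat": {"local"},
    "yi-34b-chat": {"qianfan_api"},
}


def backend_matches(model_name, local):
    """Return True if the model is listed with the requested access mode.

    local=True requires a "local" entry; local=False requires some API entry.
    Raises ValueError for models that are not in the list.
    """
    backends = SUPPORTED_BACKENDS.get(model_name)
    if backends is None:
        raise ValueError(f"unsupported model: {model_name}")
    if local:
        return "local" in backends
    return bool(backends - {"local"})
```

For instance, backend_matches("yi-34b-chat", local=True) returns False because that model is listed only under the Qianfan API.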
