# Ko-MuSR

Code for the paper **Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean**.
## Requirements

```
openai==1.78.0
lm-eval==0.4.8  # installed with the [api] configuration
```

```bash
pip install -r requirements.txt
```

## Installing LM Evaluation Harness (use the [api] configuration)
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e ".[api]"
```

- For data synthesis, make your OpenAI API key available via the `OPENAI_API_KEY` environment variable:

  ```bash
  export OPENAI_API_KEY="<your-openai-api-key>"
  ```

- For evaluation, prepare your language model servers. Offline inference with lm-evaluation-harness might work, but this feature has not been tested.
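For example, an OpenAI-compatible endpoint can be hosted locally with a server such as vLLM. This is only a sketch: vLLM is not a dependency of this repository, and the model name and port below are placeholders.

```bash
# Minimal sketch, assuming vLLM is installed separately; any OpenAI-compatible server works.
pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
# The server then exposes an OpenAI-compatible API at http://localhost:8000/v1
```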
## Generating Random Domains

To generate random domains, use the following two scripts:

```bash
# Before running the scripts, make sure your OpenAI API key is set!
export OPENAI_API_KEY="<your-openai-api-key>"

python sample_madlib_op.py  # generate random domains for the object placements task
python sample_madlib_ta.py  # generate random domains for the team allocation task
```
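Both the sampling scripts above and the generation scripts below read the key from the environment, so it can help to fail fast if it is missing. The one-liner below is only an illustration, not part of the repository:

```bash
# Hypothetical sanity check: abort early if OPENAI_API_KEY is not set in the current shell.
python -c "import os, sys; sys.exit(0 if os.environ.get('OPENAI_API_KEY') else 'OPENAI_API_KEY is not set')"
```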
## Generating Data Instances

To generate data instances, use the following three scripts:

```bash
# Before running the scripts, make sure your OpenAI API key is set!
export OPENAI_API_KEY="<your-openai-api-key>"

bash create_mm.sh  # generate murder mysteries data
bash create_op.sh  # generate object placements data
bash create_ta.sh  # generate team allocations data
```
## Evaluation

We use LM Evaluation Harness for evaluation. Task descriptions are provided in the `musr-tasks` directory.

An evaluation example that uses local inference servers is provided in the `evaluation.sh` script. See the lm-evaluation-harness repository for advanced usage.
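For reference, a run against a local OpenAI-compatible endpoint could look like the sketch below. The model name, port, and task name are placeholders: the actual task names are defined by the configs in `musr-tasks`, and `evaluation.sh` remains the authoritative example.

```bash
# Minimal sketch, assuming a local OpenAI-compatible server on port 8000 (e.g., started as shown above).
# Replace <ko-musr-task> with a task name defined in the musr-tasks configs.
lm_eval \
  --model local-completions \
  --model_args model=meta-llama/Llama-3.1-8B-Instruct,base_url=http://localhost:8000/v1/completions,num_concurrent=8 \
  --include_path ./musr-tasks \
  --tasks <ko-musr-task> \
  --output_path results/
```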
## License

This repository is licensed under the MIT license. For the original code, see the MuSR repository.