
Issue running single judgement with references #20

@sersoage

Hi guys, first of all thank you for the great paper. I am trying the single-answer scenario, where I have a question, a model-generated answer, and a reference answer. Looking at the code, I am using gen_model_judgement_single.py.
The first thing I did was to generate the answer dataset in the desired format:

```python
{
    "question_id": i,
    "question_body": question["question"],
    "decoding_method": "top_p_sampling",  # placeholder value
    "model": "alpaca-native",             # placeholder value
    "text": answer,
    "scores": {"logprobs": -7.0179795026779175},  # placeholder
}
```
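For context, here is a minimal sketch of how I build and write these entries; the `questions`/`answers` inputs and the output file name are stand-ins for my real data:

```python
import json

# Stand-in inputs; in my real script these come from my own dataset.
questions = [{"question": "What is the capital of France?"}]
answers = ["Paris is the capital of France."]

# Illustrative file name; one JSON object per line (JSONL).
with open("combined_questions_answers.jsonl", "w") as f:
    for i, (question, answer) in enumerate(zip(questions, answers)):
        entry = {
            "question_id": i,
            "question_body": question["question"],
            "decoding_method": "top_p_sampling",  # placeholder value
            "model": "alpaca-native",             # placeholder value
            "text": answer,
            "scores": {"logprobs": -7.0179795026779175},  # placeholder
        }
        f.write(json.dumps(entry) + "\n")
```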
I also generated the reference answer dataset like this:

```python
combined_entry = {
    "question_id": i,
    "question_body": question["question"],
    "decoding_method": "top_p_sampling",  # placeholder value
    "model": "alpaca-native",             # placeholder value
    "reference": {
        "text": answer  # you can update this with the correct reference text
    },
    "scores": {
        "logprobs": -7.0179795026779175  # placeholder
    },
}
```
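And the matching sketch for the reference file (again, the names and values are illustrative; the real loop runs over my whole dataset):

```python
import json

# Stand-in single pair; the real loop runs over the whole dataset.
i = 0
question = {"question": "What is 2 + 2?"}
answer = "2 + 2 equals 4."

combined_entry = {
    "question_id": i,
    "question_body": question["question"],
    "decoding_method": "top_p_sampling",  # placeholder value
    "model": "alpaca-native",             # placeholder value
    "reference": {"text": answer},        # reference text nested under "reference"
    "scores": {"logprobs": -7.0179795026779175},  # placeholder
}

with open("combined_questions_answers_ref.jsonl", "w") as f:
    f.write(json.dumps(combined_entry) + "\n")
```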
Then, as stated in the repo, I ran judgelm_preprocess.py, which generated a JSONL file with the following format:

```json
{"question_id": 0, "score": [{"logprobs": -7.0179795026779175}, {"logprobs": -7.0179795026779175}], "question_body": "question", "answer1_body": "generated answer", "answer2_body": "reference answer", "answer1_model_id": "alpaca-native", "answer2_model_id": "alpaca-native", "answer1_metadata": {"decoding_method": "top_p_sampling"}, "answer2_metadata": {"decoding_method": "top_p_sampling"}}
```
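As a sanity check, I parse one merged line back and confirm the layout (field names and placeholder values taken from the sample above):

```python
import json

# One merged record in the shape produced above (values are placeholders).
merged_line = json.dumps({
    "question_id": 0,
    "score": [{"logprobs": -7.0179795026779175}, {"logprobs": -7.0179795026779175}],
    "question_body": "question",
    "answer1_body": "generated answer",
    "answer2_body": "reference answer",
    "answer1_model_id": "alpaca-native",
    "answer2_model_id": "alpaca-native",
    "answer1_metadata": {"decoding_method": "top_p_sampling"},
    "answer2_metadata": {"decoding_method": "top_p_sampling"},
})

record = json.loads(merged_line)
# In this layout the reference answer ends up in slot 2.
assert record["answer1_body"] == "generated answer"
assert record["answer2_body"] == "reference answer"
assert len(record["score"]) == 2
```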
My first question: is it OK for `answer2_body` to be the reference answer?

Then, having this dataset, I ran:

```shell
python ./judgelm/llm_judge/gen_model_judgement_single.py \
    --model-path "BAAI/JudgeLM-7B-v1.0" \
    --model-id 7b-full-model \
    --question-file /root/JudgeLM/judgelm/data/judgelm-val-5k-judge-samples.jsonl \
    --answer-file /root/JudgeLM/judgelm/data/JudgeLM/output \
    --num-gpus-per-model 1 \
    --num-gpus-total 1 \
    --temperature 0 \
    --reference-file /root/JudgeLM/judgelm/data/JudgeLM/combined_questions_answers_ref.jsonl \
    --if-fast-eval 1
```
The first issue I ran into: since I was using references, the copy function of the conversation template expects an answer_num argument, but this is the single-answer case, so I had to change this line:

```python
conv = conv_judge_single.copy() if references is None else conv_judge_single_w_reference.copy()
```

to this:

```python
conv = conv_judge_single.copy() if references is None else conv_judge_single_w_reference.copy(answer_num=answer_num_value)
```

passing 1 as answer_num_value. So I do not know whether this is a bug, or whether my change is OK.
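To show the pattern I mean, here is a minimal hypothetical stand-in for the conversation template (not the actual JudgeLM class): `copy()` optionally takes `answer_num` and falls back to the template's default.

```python
from dataclasses import dataclass, replace

@dataclass
class Conversation:
    # Hypothetical minimal stand-in for the JudgeLM conversation template.
    system: str = "You are a helpful and precise judge."
    answer_num: int = 2  # pairwise judging by default

    def copy(self, answer_num=None):
        # The change I applied: let callers override answer_num when copying.
        new_num = self.answer_num if answer_num is None else answer_num
        return replace(self, answer_num=new_num)

conv_judge_single_w_reference = Conversation()
conv = conv_judge_single_w_reference.copy(answer_num=1)  # single-answer case
```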
After changing this, I got the code to run; however, I do not see any judgment in the output. Here is a sample output:
```json
{"question_id": 0, "score": [{"logprobs": -7.0179795026779175}, {"logprobs": -7.0179795026779175}], "question_body": "question", "answer1_body": "generated_answer", "answer2_body": "reference_answer", "answer1_model_id": "alpaca-native", "answer2_model_id": "alpaca-native", "answer1_metadata": {"decoding_method": "top_p_sampling"}, "answer2_metadata": {"decoding_method": "top_p_sampling"}, "pred_id": "ie5CkG9JTxcCYmAwt3pwrj", "pred_text": "10", "pred_model_id": "7b-full-model", "tstamp": 1703790064.0357897, "reference": "reference_answer"}
```
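For completeness, this is how I read the `pred_text` field back from the output (layout assumed from the sample above); it only ever contains a bare score like "10", with no judgment text:

```python
import json

# One output record in the (assumed) shape shown above, trimmed to the
# judgment-related fields.
output_line = json.dumps({
    "question_id": 0,
    "pred_text": "10",
    "pred_model_id": "7b-full-model",
})

record = json.loads(output_line)
score = float(record["pred_text"])  # here pred_text holds a single bare score
```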
I was wondering if you could help me run this code properly and point out anything I am doing wrong.

Best,
Sergio
