Hi! I ran into a bug in Step 3 (Principle Engraving). To fine-tune the base model, I used self_align_merged.json, which is created from "self_align_32shards_*.jsonl" and "vicuna_dummy_data.json".
However, I found that the items in vicuna_dummy_data.json do not have an 'example_id' field. This causes an error when the function "extract_dromedary_dataset" is executed:
def extract_dromedary_dataset(example, meta_prompts):
    assert "example_id" in example
    total_meta_prompt = len(meta_prompts)
    meta_prompt = meta_prompts[int(example["example_id"]) % total_meta_prompt]
    if example.get("input", "") != "":
        prompt_format = DROMEDARY_PROMPT_DICT["prompt_input"]
    else:
        prompt_format = DROMEDARY_PROMPT_DICT["prompt_no_input"]
    return {
        "input": prompt_format.format(meta_prompt=meta_prompt, **example),
        "output": "\n" + example["output"],
    }
In the merged file, every vicuna_dummy_data item ends up with "example_id" = None, so the int(example["example_id"]) call fails (int(None) raises a TypeError).
How should I handle this so that the vicuna_dummy_data items get correct example_ids? Thanks a lot for your reply!
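For reference, one possible workaround is sketched below. It is only an assumption about what might be acceptable, not the repository's intended fix: it assigns fresh integer ids to any item whose "example_id" is missing or None before the data reaches extract_dromedary_dataset. The helper name fill_missing_example_ids and the sample records are hypothetical. Whether arbitrary ids are appropriate depends on how the maintainers want meta prompts distributed across the dummy data.

```python
def fill_missing_example_ids(examples, start_id=None):
    """Return copies of `examples` in which every record has an integer example_id.

    Hypothetical helper: records that already carry an id keep it; records whose
    id is missing or None receive consecutive ids starting after the largest
    existing id (or at `start_id`, if given).
    """
    existing = [
        int(ex["example_id"]) for ex in examples if ex.get("example_id") is not None
    ]
    if start_id is not None:
        next_id = start_id
    else:
        next_id = max(existing) + 1 if existing else 0

    filled = []
    for ex in examples:
        ex = dict(ex)  # copy so the caller's records are left untouched
        if ex.get("example_id") is None:
            ex["example_id"] = next_id
            next_id += 1
        filled.append(ex)
    return filled


# Made-up records standing in for the merged file's contents.
merged = [
    {"example_id": 0, "instruction": "a", "input": "", "output": "x"},
    {"example_id": None, "instruction": "b", "input": "", "output": "y"},
    {"instruction": "c", "input": "", "output": "z"},
]
patched = fill_missing_example_ids(merged)
```

With ids filled in this way, the modulo in extract_dromedary_dataset simply cycles through the meta prompts for the dummy items as it does for the self-align items.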