Skip to content

About vicuna_dummy_data.json lack 'example_id' #14

Open
@Harry-mic

Description

Hi! I encounter a bug when doing the step3 (Principle Engraving). I used the self_align_merged.json which is created with "self_align_32shards_*.jsonl" and "vicuna_dummy_data.json" to finetune the base model.

However, I find that vicuna_dummy_data.json file items do not have 'example_id' labels. It results in a bug when execute function "extract_dromedary_dataset":

def extract_dromedary_dataset(example, meta_prompts):
    assert "example_id" in example
    total_meta_prompt = len(meta_prompts)
    meta_prompt = meta_prompts[int(example["example_id"]) % total_meta_prompt]
    if example.get("input", "") != "":
        prompt_format = DROMEDARY_PROMPT_DICT["prompt_input"]
    else:
        prompt_format = DROMEDARY_PROMPT_DICT["prompt_no_input"]
    return {
        "input": prompt_format.format(meta_prompt=meta_prompt, **example),
        "output": "\n" + example["output"],
    }

The vicuna_dummy_data are all labeled "example_id" = None, and result in a int error.

Therefore, I wonder how to deal with this issue and correctly get the vicuna_dummy_data example_ids.Thanks a lot for your reply!

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions