Guide to finetune on custom dataset #251

@vishalk2999

I have created a dataset in the following format:

- Dataset_folder
    - videos
        - video1.mp4
        - video2.mp4
    - train.json

train.json is in the following format:

[
    {
        "video":"videos/calling.mp4",
        "QA":[{
            "i":"Go through the video and understand all the actions performed in the video",
            "q":"Describe the video",
            "a":"The person is making a phone call and talking on the phone"
        }]
    }
]
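
Before pointing the training config at a file like this, it may help to sanity-check it. The sketch below is only an illustration under stated assumptions: `check_annotations` is a hypothetical helper (not part of Ask-Anything), it treats the keys shown above (`video`, `QA`, `i`, `q`, `a`) as required, and `/nonexistent_root` is a placeholder for the real Dataset_folder path. It also shows that Python's `json` module rejects a trailing comma after the last list element.

```python
import json
import os


def check_annotations(records, data_root):
    """Return a list of human-readable problems found in the records.

    Hypothetical helper for illustration; not part of the repository.
    """
    problems = []
    for idx, record in enumerate(records):
        video = record.get("video")
        if not video:
            problems.append(f"record {idx}: missing 'video'")
        elif not os.path.isfile(os.path.join(data_root, video)):
            problems.append(f"record {idx}: video not found: {video}")

        qa_list = record.get("QA")
        if not isinstance(qa_list, list) or not qa_list:
            problems.append(f"record {idx}: 'QA' must be a non-empty list")
            continue
        for turn_idx, turn in enumerate(qa_list):
            for key in ("i", "q", "a"):
                if key not in turn:
                    problems.append(
                        f"record {idx}, QA {turn_idx}: missing '{key}'"
                    )
    return problems


# Strict JSON does not allow a trailing comma after the last element.
try:
    json.loads('[{"video": "videos/calling.mp4"},]')
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False

sample = json.loads("""
[
    {
        "video": "videos/calling.mp4",
        "QA": [{"i": "instruction", "q": "Describe the video", "a": "answer"}]
    }
]
""")

# Placeholder root: replace with the real path to Dataset_folder.
problems = check_annotations(sample, data_root="/nonexistent_root")
print(trailing_comma_accepted)
print(problems)
```

With the placeholder root, the only reported problem is the missing video file; an empty list would mean every record has the expected keys and every video path resolves.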

How should I prepare a custom dataset, and what changes do I need to make in order to train on it for stage3 finetuning?

I have set the train_file variable in config_7b_stage3.py to the path of this train.json, and I get the following error:

2024-12-07T07:52:41 | __main__: train_file: /home/ubuntu/Custom_Data/train.json
2024-12-07T07:52:41 | __main__: Creating dataset for it
2024-12-07T07:52:41 | dataset.it_dataset: Load json file
Traceback (most recent call last):
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 221, in <module>
    main(cfg)
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 138, in main
    train_loaders, train_media_types = setup_dataloaders(
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 105, in setup_dataloaders
    train_datasets = create_dataset(f"{mode}_train", config)
  File "/home/ubuntu/Ask-Anything/video_chat2/dataset/__init__.py", line 174, in create_dataset
    datasets.append(dataset_cls(**dataset_kwargs))
  File "/home/ubuntu/Ask-Anything/video_chat2/dataset/it_dataset.py", line 37, in __init__
    with open(self.label_file, 'r') as f:
IsADirectoryError: [Errno 21] Is a directory: '/'
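
One possible reading of this traceback, offered as a guess rather than a confirmed diagnosis: the path being opened is `'/'`, which is exactly the first character of the configured train_file. That would happen if the loader expects each train_file entry to be a list such as [annotation file, data root, media type] and unpacks it by slicing, while the config supplies a bare string. The sketch below reproduces that effect in plain Python. `unpack_ann_file` is a hypothetical stand-in for the loader, and the list layout, the data root, and the `"video"` media type are all assumptions.

```python
def unpack_ann_file(ann_file):
    """Hypothetical stand-in for how a loader might split a train_file entry.

    Assumes the entry is sliced into (label_file, data_root); this is a guess
    about the repository's behavior, not a copy of its code.
    """
    label_file, data_root = ann_file[:2]
    return label_file, data_root


# A bare string: slicing yields its first two characters.
as_string = "/home/ubuntu/Custom_Data/train.json"

# A list entry: slicing yields the annotation path and the data root.
# The data root and the "video" media type are assumed values.
as_list = [
    "/home/ubuntu/Custom_Data/train.json",
    "/home/ubuntu/Custom_Data",
    "video",
]

string_result = unpack_ann_file(as_string)
list_result = unpack_ann_file(as_list)

print(string_result)  # ('/', 'h')
print(list_result)
```

If this guess is right, opening the first value from the string case would fail with the same "Is a directory: '/'" error, and writing train_file as a list entry would be the thing to try next.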

Could you please help me understand the steps and changes required to train on a custom dataset?
