
A Question Regarding Performance Reproduction and Discrepancy Report #15

@koliose

Hello,

I had the pleasure of reading this paper during its review process and came to check out the official code. However, I have run into several issues trying to reproduce the results, and the numbers I get do not match those reported.

I am running my experiments with llava-phi3 on an A100 GPU, the same environment described in your paper, so the setup should be directly comparable.

Here are the specific issues I've observed:

  1. Difficulty Reproducing Table 1 Performance:
    When I fine-tune the model from scratch following the procedure for the learning stage (Stage I), the performance does not reach the scores reported in Table 1. I have been using the exact hyperparameter settings detailed in Table 7 of the paper.
    If any of my settings differ from yours, please let me know.

  2. Question About the Released "Retain" Model Checkpoint:
    I inspected the configuration file for the released "retain" model (vlm_unlearning_ft_llava_phi_3_mini_retain). The configuration appears to load a model from the following path: model_id: ./models/final_ft_10_epochs_lr2e-05_llava-phi_retain.
    The paper describes the fine-tuning process as 10 epochs. Does this mean the released checkpoint was first trained for 10 epochs and then for an additional 2 epochs before being saved for release? I would appreciate clarification on the exact training procedure used for the provided checkpoints.

  3. Discrepancy in eval_results Logs:
    I also checked the logs available in the eval_results folder of the repository. I found that the performance scores reported in those logs also differ from the results published in the paper's tables.
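For reference on point 2, the directory name in that model_id seems to encode the training settings. Below is a minimal sketch that extracts them, assuming the naming pattern final_ft_<epochs>_epochs_lr<lr>_<model>_<split>. This pattern is my own guess from this single path, not a documented convention of the repository, and parse_ckpt_name is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical pattern inferred from one path; the repo does not document it.
CKPT_NAME = re.compile(
    r"final_ft_(?P<epochs>\d+)_epochs_lr(?P<lr>[0-9.eE+-]+)_(?P<model>.+)_(?P<split>[^_]+)$"
)


def parse_ckpt_name(model_id: str) -> Optional[dict]:
    """Extract epochs, learning rate, model name and split from a checkpoint path."""
    # Keep only the last path component, e.g. "final_ft_10_epochs_lr2e-05_llava-phi_retain".
    name = model_id.rstrip("/").split("/")[-1]
    m = CKPT_NAME.match(name)
    if m is None:
        return None  # name does not follow the assumed pattern
    return {
        "epochs": int(m["epochs"]),
        "lr": float(m["lr"]),
        "model": m["model"],
        "split": m["split"],
    }


print(parse_ckpt_name("./models/final_ft_10_epochs_lr2e-05_llava-phi_retain"))
# {'epochs': 10, 'lr': 2e-05, 'model': 'llava-phi', 'split': 'retain'}
```

This only reads the name, so it cannot tell whether the checkpoint was trained further after that directory was written, which is exactly what I am asking about.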

Could you please provide some insight into these points? Any clarification would be immensely helpful for me and others in the community trying to engage with your work.
