Dear Author,
Thank you for sharing the code implementation of your work. I have been carefully studying your paper and codebase, but I have come across a discrepancy that I would like to clarify.
In your paper, the `reduce_sim` term is described as part of the loss function, where it supplies the gradient signal that optimizes the prompt keys used for prompt selection. In the provided code, however, `reduce_sim` is computed but never added to the loss (the relevant line is commented out). Since no remaining loss term depends on the keys, `prompt_key_dict`, which is randomly initialized, receives no gradient updates during training.
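For reference, here is a minimal sketch of how I would expect `reduce_sim` to enter the loss, based on the paper's description; the name `pull_constraint_coeff` is my own placeholder, not something taken from your repository:

```python
import torch.nn.functional as F

def total_loss(logits, targets, reduce_sim, pull_constraint_coeff=0.1):
    ce = F.cross_entropy(logits, targets)
    # Subtracting reduce_sim (the mean query-key similarity of the selected
    # prompts) is what lets gradients reach prompt_key_dict; with this term
    # commented out, the keys never move from their random initialization.
    return ce - pull_constraint_coeff * reduce_sim
```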
Here are my specific questions:
1. Was the exclusion of `reduce_sim` from the loss function intentional in the provided code? If so, could you elaborate on the reasoning behind this decision?
2. How does the omission of `reduce_sim` affect the effectiveness of the learned prompts, especially since the keys (`prompt_key_dict`) remain randomly initialized and are never updated?
3. If this was an oversight, could you provide guidance on how to properly incorporate `reduce_sim` into the loss function so that `prompt_key_dict` is updated? (See the sketch after this list for what I have in mind.)
4. Did the implementation consider the diversity of keys during training, e.g. by tracking how frequently each key is selected? For example, is there a mechanism that penalizes frequently selected keys, so that less frequently used keys still get chosen and adapted? (Also sketched below.)
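To make questions 3 and 4 concrete, here is a rough sketch of the selection-plus-frequency mechanism I have in mind. Everything here (`select_keys`, `key_counts`, the `1 - freq` down-weighting) is an illustrative assumption on my part, not a claim about your implementation:

```python
import torch
import torch.nn.functional as F

def select_keys(query, prompt_keys, key_counts, top_k=5):
    """query: (B, D) pooled features; prompt_keys: (P, D) learnable keys;
    key_counts: (P,) float buffer tracking how often each key was picked."""
    sim = F.cosine_similarity(query.unsqueeze(1), prompt_keys.unsqueeze(0), dim=-1)  # (B, P)
    # Down-weight keys in proportion to how often they were selected so far,
    # so rarely used keys still get chosen and therefore adapted.
    freq = key_counts / key_counts.sum().clamp(min=1.0)
    idx = (sim * (1.0 - freq)).topk(top_k, dim=1).indices  # (B, top_k)
    # reduce_sim is the mean similarity of the selected query-key pairs;
    # subtracting it from the training loss is what updates the keys.
    reduce_sim = sim.gather(1, idx).mean()
    key_counts.scatter_add_(0, idx.flatten(),
                            torch.ones_like(idx.flatten(), dtype=key_counts.dtype))
    return idx, reduce_sim
```

If something along these lines matches your intent, the training loop would then only need the `ce - pull_constraint_coeff * reduce_sim` combination from the first sketch above.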
I appreciate your time and effort in addressing these queries, as they are critical to understanding and reproducing the results presented in your paper.
Thank you!
Best regards,
Jason