Dear Author,
Thank you for sharing your excellent work. I have a couple of questions regarding the methodology:
- Is the proposed method applicable to models beyond the one used in the paper? Specifically, can it be adapted to other architectures, or is it tightly coupled to the model described in your experiments?
- Are there any constraints or limitations on the parameters that can be updated? For example, are there specific layers where the update ratio should not be modified, or any parts of the model that are sensitive to such changes?
I appreciate your time and would be grateful for any clarification.
Best regards,