Description
Hi,
I'm trying to reproduce the Sana-Sprint experiment and have two questions regarding the configuration and dataset used for training the student model:
- **Teacher Checkpoint for Student Initialization:**
  As I understand from the Sana-Sprint paper, the student model should be initialized from a fine-tuned teacher model. However, in the configuration file (`SanaSprint_1600M_1024px_allqknorm_bf16_scm_ladd.yaml`), the parameter is set as follows:
  `model.load_from: hf://Efficient-Large-Model/Sana_Sprint_1.6B_1024px/checkpoints/Sana_Sprint_1.6B_1024px.pth`
  Should this checkpoint actually be the teacher checkpoint (for example, something like `Sana_Sprint_1.6B_1024px_teacher.pth`), which would then be used to initialize the student model? Please confirm whether the current setting is correct (see the sketch after this list for what I would have expected).
- **Dataset Details for Student Training:**
  There isn’t enough information available about the dataset used for training the student model. Could you provide details on:
  - The number of image-caption pairs used.
  - The sources of the images (e.g., LAION, SA-1B, internal databases, etc.).
  - How the captions were generated (e.g., using models like GPT-4 or other methods).
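For reference, here is a minimal sketch of what I would have expected the config to contain if the student were initialized from a dedicated teacher checkpoint. This is only my assumption: the `Sana_Sprint_1.6B_1024px_teacher.pth` filename is hypothetical (taken from my example above), and I'm reading the dotted path `model.load_from` as a nested YAML key, which may not match the actual config layout.

```yaml
model:
  # Hypothetical teacher checkpoint used to initialize the student; the
  # *_teacher.pth filename is my guess, not a confirmed file on the Hub.
  load_from: hf://Efficient-Large-Model/Sana_Sprint_1.6B_1024px/checkpoints/Sana_Sprint_1.6B_1024px_teacher.pth
```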
Any clarification or additional documentation on these points would be greatly appreciated.
Thank you!