Hello, I ran an experiment incorporating Vary-tiny (loading Vary-toy weights) into InternVL-7B, but I used llava-558K to train the projector. The final model outputs a lot of irrelevant content, so I suspect the alignment stage failed. Has anyone tried aligning Vary using only llava-558K, and is it necessary for me to use the 4M samples from LAION-COCO for alignment?