Open
Labels: question (Further information is requested)
Description
Required prerequisites
- I have read the documentation https://align-anything.readthedocs.io.
- I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Questions
I am using the Janus text-to-image training code and have some questions about implementation details.
First, I am using the Janus code from the Align_Anything_Janus repo that you recommend. In ../Align_Anything_Janus/janus/models/modeling_vlm.py, I found the following training code:
elif task == "generation":
    image_token_num_per_image = 576
    cfg_weight = 5
    temperature = 1
    tokens = torch.zeros((2*input_ids.size(0), input_ids.size(1)), dtype=torch.int).cuda()
    for i in range(2):
        tokens[i*input_ids.size(0):(i+1)*input_ids.size(0), :] = input_ids
        if i % 2 != 0:
            tokens[i*input_ids.size(0):(i+1)*input_ids.size(0), 1:-1] = 100015 # pad_id
    inputs_embeds = self.language_model.get_input_embeddings()(tokens)
    print("Embedding size:", self.language_model.get_input_embeddings().weight.size(0))
    print("Max token id in input_ids:", input_ids.max())
    outputs = self.language_model.model(inputs_embeds=inputs_embeds, use_cache=True, past_key_values=None)
    hidden_states = outputs.last_hidden_state
    logits = self.gen_head(hidden_states)
    logits_cond = logits[0::2, :]
    logits_uncond = logits[1::2, :]
    all_logits = logits_uncond + cfg_weight * (logits_cond - logits_uncond)
In this snippet, input_ids contains both text token IDs and image token IDs, but the image token IDs appear to be embedded with the text embedding layer in inputs_embeds = self.language_model.get_input_embeddings()(tokens). Why is that? I wonder whether the image tokens should instead be embedded with mmgpt.prepare_gen_img_embeds, which Janus provides.
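To make concrete what I mean by embedding the two kinds of IDs separately, here is a minimal sketch. It is not the Janus or align-anything implementation: `text_embed`, `image_embed`, `embed_mixed`, and all sizes are hypothetical stand-ins, and I assume the text and image IDs can be split into two tensors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes; all of these are assumptions for illustration only.
TEXT_VOCAB, IMAGE_VOCAB, HIDDEN = 100, 16, 8

# Hypothetical stand-in for the language model's text embedding table.
text_embed = nn.Embedding(TEXT_VOCAB, HIDDEN)
# Hypothetical stand-in for a separate image-token path (codebook + projection).
image_embed = nn.Sequential(
    nn.Embedding(IMAGE_VOCAB, HIDDEN),
    nn.Linear(HIDDEN, HIDDEN),
)


def embed_mixed(text_ids: torch.Tensor, image_ids: torch.Tensor) -> torch.Tensor:
    """Embed text and image ids with their own modules, then join them
    along the sequence axis."""
    text_part = text_embed(text_ids)      # (batch, text_len, HIDDEN)
    image_part = image_embed(image_ids)   # (batch, image_len, HIDDEN)
    return torch.cat([text_part, image_part], dim=1)


text_ids = torch.randint(0, TEXT_VOCAB, (2, 5))
image_ids = torch.randint(0, IMAGE_VOCAB, (2, 3))
inputs_embeds = embed_mixed(text_ids, image_ids)
print(inputs_embeds.shape)  # torch.Size([2, 8, 8])
```

In the quoted code, by contrast, a single lookup through the text embedding layer is applied to the whole tokens tensor.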
I may be wrong, but I would like to understand why the tokens are handled this way and why self.language_model.get_input_embeddings()(tokens) is called only once.
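As far as I can tell, the token handling itself is the setup for classifier-free guidance: the batch is duplicated, the second copy has everything except its first and last token replaced by the pad ID, and the two sets of logits are then combined. The sketch below reproduces only that layout and the combination formula on toy tensors. `build_cfg_tokens` and `apply_cfg` are hypothetical helper names of mine, and the example uses a batch size of 1.

```python
import torch

PAD_ID = 100015   # pad id taken from the quoted snippet
CFG_WEIGHT = 5.0  # cfg_weight from the quoted snippet


def build_cfg_tokens(input_ids: torch.Tensor) -> torch.Tensor:
    """Stack a conditional copy on top of an unconditional (padded) copy."""
    batch = input_ids.size(0)
    tokens = input_ids.repeat(2, 1)  # rows [0, batch) and [batch, 2*batch)
    tokens[batch:, 1:-1] = PAD_ID    # keep first and last token, pad the rest
    return tokens


def apply_cfg(logits_cond: torch.Tensor, logits_uncond: torch.Tensor,
              weight: float = CFG_WEIGHT) -> torch.Tensor:
    """Push the unconditional logits toward the conditional ones."""
    return logits_uncond + weight * (logits_cond - logits_uncond)


input_ids = torch.tensor([[1, 11, 12, 2]])
tokens = build_cfg_tokens(input_ids)
print(tokens)  # second row is [1, 100015, 100015, 2]

cond = torch.tensor([1.0, 2.0])
uncond = torch.tensor([0.5, 1.0])
print(apply_cfg(cond, uncond))  # tensor([3., 6.])
```

This only illustrates the batch layout, in which one embedding lookup covers both copies. It does not settle whether the image token IDs should go through a different embedding path.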