[Question] Janus text-to-image training code #188

@miyapeng

Questions

I am using the Janus text-to-image training code and have some questions about the implementation details.

I am using the Janus code named Align_Anything_Janus from the repo you recommend. In ../Align_Anything_Janus/janus/models/modeling_vlm.py, I found the following training code:

elif task == "generation":
    image_token_num_per_image = 576
    cfg_weight = 5
    temperature = 1
    tokens = torch.zeros((2*input_ids.size(0), input_ids.size(1)), dtype=torch.int).cuda()
    for i in range(2):
        tokens[i*input_ids.size(0):(i+1)*input_ids.size(0), :] = input_ids
        if i % 2 != 0:
            tokens[i*input_ids.size(0):(i+1)*input_ids.size(0), 1:-1] = 100015 # pad_id

    inputs_embeds = self.language_model.get_input_embeddings()(tokens)
    print("Embedding size:", self.language_model.get_input_embeddings().weight.size(0))
    print("Max token id in input_ids:", input_ids.max())
    outputs = self.language_model.model(inputs_embeds=inputs_embeds, use_cache=True, past_key_values=None)
    hidden_states = outputs.last_hidden_state
    logits = self.gen_head(hidden_states)
    logits_cond = logits[0::2, :]
    logits_uncond = logits[1::2, :]

    all_logits = logits_uncond + cfg_weight * (logits_cond - logits_uncond)

Here, input_ids contains both text and image token IDs, but the image token IDs appear to be embedded with the text embedding layer in inputs_embeds = self.language_model.get_input_embeddings()(tokens). Why is that? I wonder whether the image tokens should instead go through the mmgpt.prepare_gen_img_embeds method provided by Janus.
I may be wrong, but I would like to understand why the tokens are handled this way, and why self.language_model.get_input_embeddings()(tokens) is called only once.
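
To make the concern concrete, here is a minimal toy sketch of the difference I mean. It is not the Janus or Align_Anything_Janus implementation: the tables, sizes, and the helpers embed_mixed and embed_text_only are hypothetical, and plain Python lists stand in for tensors so it runs without PyTorch. It also assumes image token IDs index a separate image codebook, which may not match how input_ids is actually built in the training code.

```python
# Toy sketch: plain Python lists stand in for tensors.
# All names and sizes are hypothetical, not the Janus or Align-Anything API.

TEXT_VOCAB = 8   # toy text vocabulary size
IMAGE_VOCAB = 4  # toy image codebook size
DIM = 3          # toy hidden size

# Two separate lookup tables, standing in for a text embedding layer
# and a dedicated image-token embedding path.
text_table = [[float(i)] * DIM for i in range(TEXT_VOCAB)]
image_table = [[100.0 + i] * DIM for i in range(IMAGE_VOCAB)]


def embed_mixed(token_ids, is_image):
    """Embed each position with the table that matches its token type."""
    if len(token_ids) != len(is_image):
        raise ValueError("token_ids and is_image must have the same length")
    embeds = []
    for tok, img in zip(token_ids, is_image):
        table = image_table if img else text_table
        if not 0 <= tok < len(table):
            raise IndexError(f"token id {tok} is out of range for its table")
        embeds.append(list(table[tok]))
    return embeds


def embed_text_only(token_ids):
    """Embed every position with the text table, as one single lookup does."""
    return [list(text_table[tok]) for tok in token_ids]


# Text prompt [1, 5] followed by image tokens [2, 3] from the image codebook.
token_ids = [1, 5, 2, 3]
is_image = [False, False, True, True]

mixed = embed_mixed(token_ids, is_image)
text_only = embed_text_only(token_ids)
```

In this toy setup the two results agree on the text positions and differ on the image positions, which is the mismatch I am worried about if the image token IDs in input_ids are looked up in the text embedding table.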
