First of all, congrats on the amazing work! I have been working on something like this myself for the last week, so seeing this is a real relief: it confirms the idea works and there is already an implementation ready.
Hugging Face recently merged support for this into their main branch: https://huggingface.co/blog/poedator/4d-masks
With that, there should be no need for the custom model; it should just work. However, the input attention masks would need to be 4D, while yours are 2D. I'm sure the two are mathematically equivalent; I just need to understand how to build the 4D version.
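Roughly, what I have in mind for going from 2D to 4D is the sketch below. It is untested, and it assumes the 2D data is a per-token sequence-id layout (`segment_ids`), which may not match your format exactly; it also assumes the model accepts a `(batch, 1, seq_len, seq_len)` mask of 0s and 1s, as in the blog example.

```python
import torch

def segment_ids_to_4d_mask(segment_ids: torch.Tensor) -> torch.Tensor:
    """Turn per-token sequence ids (batch, seq) into a 4D mask (batch, 1, seq, seq).

    Assumption: tokens sharing an id belong to the same packed sequence.
    Position (i, j) is 1 only if token i may attend to token j, i.e. same
    packed sequence and j <= i (causal); 0 everywhere else.
    """
    batch, seq = segment_ids.shape
    same_segment = segment_ids.unsqueeze(2) == segment_ids.unsqueeze(1)  # (batch, seq, seq)
    causal = torch.tril(
        torch.ones(seq, seq, dtype=torch.bool, device=segment_ids.device)
    )
    return (same_segment & causal).unsqueeze(1).long()  # (batch, 1, seq, seq)

# Example: one row packing a length-3 and a length-2 sequence
segment_ids = torch.tensor([[1, 1, 1, 2, 2]])
independent_mask_4d = segment_ids_to_4d_mask(segment_ids)
```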
Here is an example of how to use 4D masks in HF.
The way you call the model is quite similar as well:
packed_outputs = custom_model(
input_ids=packed_tokens.to(device),
attention_mask=independent_mask.to(device),
position_ids=restart_positions.to(device),
return_dict=True,
output_hidden_states=True,
)
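Presumably, once the mask is 4D, essentially the same call would work on a stock model. Something like the sketch below is what I imagine (untested; the model name is just a placeholder, and I'm reusing `packed_tokens` / `restart_positions` from your snippet plus the `independent_mask_4d` built above):

```python
import torch
from transformers import AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").to(device)

packed_outputs = model(
    input_ids=packed_tokens.to(device),
    attention_mask=independent_mask_4d.to(device),  # (batch, 1, seq, seq) instead of 2D
    position_ids=restart_positions.to(device),      # positions restart at 0 per packed sequence
    return_dict=True,
    output_hidden_states=True,
)
```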
Do you have any pointers on how I could reuse your data processor directly with Hugging Face 4D masks, so that I don't need a custom model and can train any model in HF that supports this API?