Description
Describe the bug
The MultiHeadAttention layer throws a "The buffer cannot be subscribed" error during model compilation. The failure occurs when the layer is used to process sequence data and is added to the model as follows:
    model.add(
        hugectr.DenseLayer(
            layer_type=hugectr.Layer_t.MultiHeadAttention,
            bottom_names=["query_embedding_reshape", "key_embedding_reshape", "value_embedding_reshape"],
            top_names=["attention_out"],
            num_attention_heads=4,
        )
    )
To Reproduce
sudo docker run --gpus=all -it --cap-add SYS_NICE -v /home/ubuntu/hugectr:/home/ubuntu -w /home/ubuntu nvcr.io/nvidia/merlin/merlin-hugectr:24.06
Expected behavior
The model should compile successfully, and the MultiHeadAttention layer should correctly process the input query, key, and value tensors and output the attention computation results.
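For reference, the computation I expect from the layer can be sketched in NumPy. This is not HugeCTR's implementation, only a minimal scaled dot-product attention over 4 heads under assumed shapes: the inputs are taken to be `(batch, seq_len, hidden)` with `hidden` divisible by the number of heads, and the function name `multi_head_attention` is hypothetical. The actual tensor shapes in my model are not shown in this report.

```python
import numpy as np


def multi_head_attention(query, key, value, num_heads):
    """Reference multi-head attention without learned projections.

    Assumed (not taken from HugeCTR): inputs are (batch, seq_len, hidden)
    and hidden is divisible by num_heads.
    """
    batch, q_len, hidden = query.shape
    if hidden % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    head_dim = hidden // num_heads

    def split_heads(x):
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        b, s, _ = x.shape
        return x.reshape(b, s, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = split_heads(query), split_heads(key), split_heads(value)

    # Scaled dot-product scores: (batch, heads, q_len, k_len)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)

    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, then merge heads back into the hidden axis
    context = weights @ v
    return context.transpose(0, 2, 1, 3).reshape(batch, q_len, hidden)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 5, 16))
    out = multi_head_attention(x, x, x, num_heads=4)
    print(out.shape)
```

With `num_attention_heads=4`, the output should have the same leading dimensions as the query and the same hidden size, which is what `attention_out` is expected to hold once compilation succeeds.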
Screenshots
Environment (please complete the following information):
nvcr.io/nvidia/merlin/merlin-hugectr:24.06
