This repository was archived by the owner on May 7, 2025. It is now read-only.

MatMul Broadcasting #197

@AmineDiro

Description

Describe the bug
Hello,
Thanks for this amazing project!

I was trying to run the sentence-transformers/all-MiniLM-L6-v2 model and I had the following error:

Can't create session: 
GpuError(CompileError { node: "/encoder/layer.0/attention/self/query/MatMul", 
    error: UnimplementedVariant { 
    variant: "broadcasting for two stacks of matrixes (left side has shape 1x512x384:f32, 
                right side has shape 384x384:f32)", 
    op: "MatMul" } 
})

It seems that wonnx can't compute 1x512x384 @ 384x384 = 1x512x384, i.e. broadcast a stack of matrices against a plain 2-D matrix; I don't know whether both operands are required to have the same rank.
I understand that MatMul doesn't support shape inference, and I don't know whether this issue is related to that. I might also be doing something wrong when preparing the model, so I would appreciate help on this matter. all-MiniLM-L6-v2 is basically a BERT model, and I saw that BERT runs successfully using wonnx.
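For reference, the multiplication itself is well-defined under NumPy-style broadcasting (which the ONNX MatMul spec follows): the 2-D weight matrix is broadcast across the leading batch dimension of the stacked operand. A quick NumPy check with the exact shapes from the error message:

```python
import numpy as np

# The two operand shapes from the error message.
a = np.ones((1, 512, 384), dtype=np.float32)  # stacked left operand
b = np.ones((384, 384), dtype=np.float32)     # plain 2-D right operand

# NumPy broadcasts the 2-D operand across the leading batch dimension,
# so this multiplication is well-defined.
c = a @ b
print(c.shape)  # (1, 512, 384)
```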

To Reproduce

  1. Transform sentence-transformers/all-MiniLM-L6-v2 to ONNX format using optimum lib
  2. Ran onnx-simplifier:
python -m onnxsim model.onnx sim_model.onnx --overwrite-input-shape "input_ids:1,512" "attention_mask:1,512" "token_type_ids:1,512"

Note: in another test I also ran nnx prepare model.onnx model-prepared.onnx --set batch=1 --set sequence=512, but it didn't solve the issue.

  3. Ran the following test (PATH_XXX is a placeholder for the model path):
    use std::collections::HashMap;

    use tokio_test;
    use wonnx::{utils::InputTensor, Session};

    #[test]
    fn test_load_model() {
        let session = tokio_test::block_on(Session::from_path("PATH_XXX.onnx"))
            .expect("Can't create session");

        let mut input: HashMap<String, InputTensor> = HashMap::new();
        let tokens = vec![1f32; 512];
        let attention_mask = vec![1f32; 512];
        let token_type_ids = vec![0f32; 512];
        // Input order for now: ['input_ids', 'token_type_ids', 'attention_mask']
        input.insert("input_ids".to_string(), tokens[..].into());
        input.insert("attention_mask".to_string(), attention_mask[..].into());
        input.insert("token_type_ids".to_string(), token_type_ids[..].into());
        let output = tokio_test::block_on(session.run(&input)).unwrap();
        assert!(output.len() > 1);
    }
Expected behavior
Run session inference

Desktop (please complete the following information):

  • OS: macOS 13.5.1

Thanks for your help! I'm also open to contributing a fix if you point me in the right direction 😄
