Describe the issue
I have a transformer model from which I'm exporting all the modules (source embedding, positional encoding, encoder, decoder, projection layer, etc.) separately to ONNX. For simplicity, I am going to focus on just one module, the encoder. The non-quantized encoder module was 75.7 MB and took around 110 milliseconds per inference in ONNX Runtime Web (JavaScript). I used the following code to quantize the module:
```python
# encoder
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input=f'{common_dir}/encoder.onnx',
    model_output=f'{common_dir}/quantized/encoder.onnx',
    weight_type=QuantType.QUInt8,
)
```
The generated quantized model is 19.2 MB. However, web inference still takes roughly the same time, meaning the quantization has had no impact on inference latency.
This is the inference code:

```javascript
const src_encoder_out = await session.src_encode.run({
  input_1: src_pos_out,
  input_2: src_mask,
}).then((res) => res[871]);
```
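In case the timings above come from a single run call, a more stable comparison would be a warm-up followed by an averaged loop. A minimal sketch, reusing the session and input tensors from the snippet above (the warm-up and iteration counts are arbitrary):

```javascript
// Warm up first so one-time costs (wasm compilation, allocations) don't skew
// the numbers, then average the latency over a fixed number of runs.
const feeds = { input_1: src_pos_out, input_2: src_mask };

for (let i = 0; i < 3; i++) {
  await session.src_encode.run(feeds); // warm-up runs, results discarded
}

const iterations = 20;
const start = performance.now();
for (let i = 0; i < iterations; i++) {
  await session.src_encode.run(feeds);
}
const avgMs = (performance.now() - start) / iterations;
console.log(`average encoder latency: ${avgMs.toFixed(1)} ms`);
```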
This is the session configuration:

```javascript
const sessionOptions = {
  executionProviders: ['wasm'],
  enableCpuMemArena: true,
  // enableGraphCapture: true,
  executionMode: "parallel",
  enableMemPattern: true,
  intraOpNumThreads: 4,
  graphOptimizationLevel: "extended"
};

// create the session variable
const session = {
  // ...
  src_encode: await ort.InferenceSession.create("./models/encoder.onnx", sessionOptions),
  // ...
};
```
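One note on the wasm execution provider: the speedup from dynamic quantization can depend heavily on whether the int8/uint8 kernels get SIMD and multiple threads, which in onnxruntime-web are governed by the global ort.env.wasm flags (set before the first session is created) in addition to the per-session options, and multi-threading only takes effect when the page is cross-origin isolated. A minimal sketch, assuming onnxruntime-web's ort.env.wasm API (values are examples):

```javascript
// These flags must be set before the first InferenceSession.create() call.
// (Assumption: onnxruntime-web's ort.env.wasm API; multi-threading requires
// the page to be cross-origin isolated so SharedArrayBuffer is available.)
ort.env.wasm.simd = true;     // use the WASM SIMD build if the browser supports it
ort.env.wasm.numThreads = 4;  // size of the wasm thread pool
```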
Why is the quantized model smaller but no faster at inference than the non-quantized model?
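To see where the time is going inside the quantized graph, ONNX Runtime's per-node profiling could be enabled on the session. A minimal sketch (the quantized model path is assumed from the snippets above; enableProfiling and endProfiling() come from the InferenceSession API, and how the profile output is surfaced depends on the backend/build):

```javascript
// Create a session with profiling on, exercise it a few times, then dump the
// profile; per-node timings show whether the int8 kernels are actually running
// and where the time is spent. (Verbose logging is set here on the assumption
// that the web build emits the profile through its logger.)
const profiledSession = await ort.InferenceSession.create(
  "./models/quantized/encoder.onnx",
  { ...sessionOptions, enableProfiling: true, logSeverityLevel: 0 },
);

for (let i = 0; i < 5; i++) {
  await profiledSession.run({ input_1: src_pos_out, input_2: src_mask });
}

profiledSession.endProfiling();
```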
To reproduce
Unfortunately, the ONNX files are too big to upload here.
Urgency
No response
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
ONNX Runtime Web v1.18.0
Execution Provider
'wasm'/'cpu' (WebAssembly CPU)