Optimize inference performance of ERNIE on CPU #180

@tensor-tang

Owners

@tensor-tang @GaoWei8

Machine model

6148

Commit

based on #164

Initial performance

10 samples

I0820 10:33:36.597270 35686 inference.cc:211] Load 10 samples from /home/tangjian/ernie/Inference/c++/ernie/seq128_data/test_ds_10
I0820 10:33:37.552497 35686 inference.cc:351] Run 10 samples, average latency: 95.519 ms per sample.
I0820 10:33:37.552565 35686 inference.cc:356] Run 9 samples, average latency [exclude 1 warmup steps]: 89.8265 ms per sample.
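
The two averages in the log imply how expensive the warmup step was. Here is a minimal sketch of that arithmetic, assuming both averages come from the same run and that the 9-sample figure simply drops the first (warmup) sample. The variable names are illustrative and do not come from inference.cc.

```python
# Values copied from the two "average latency" log lines above.
n_all, avg_all = 10, 95.519        # all samples, ms per sample
n_steady, avg_steady = 9, 89.8265  # excluding 1 warmup step, ms per sample

# Total time is count * mean; the difference is the time spent in warmup.
total_all = n_all * avg_all
total_steady = n_steady * avg_steady
warmup_ms = (total_all - total_steady) / (n_all - n_steady)

print(f"implied warmup latency: {warmup_ms:.2f} ms")  # -> 146.75 ms
```

Under that assumption the warmup sample took roughly 1.6x the steady-state latency, so the 89.8265 ms figure is the more useful baseline for tracking optimizations.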

Profile results

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::fc                 740         625.813     0.013066    2.59008     0.845693    0.511826
thread0::load               202         269.789     0.009506    168.902     1.33559     0.220649
thread0::elementwise_add    380         78.8811     0.045364    29.1685     0.207582    0.0645135
thread0::transpose2         480         63.5249     0.089001    4.08297     0.132343    0.0519543
thread0::dropout            380         51.7229     0.01217     0.262424    0.136113    0.0423019
thread0::layer_norm         250         43.0832     0.150904    0.226994    0.172333    0.0352359
thread0::matmul             250         36.4239     0.033627    14.6704     0.145696    0.0297895
thread0::relu               120         22.7715     0.130891    1.77192     0.189762    0.0186238
thread0::scale              140         11.0508     0.006102    0.105016    0.0789342   0.00903797
thread0::softmax            120         9.73205     0.050275    0.451451    0.0811004   0.00795943
thread0::reshape2           480         4.47205     0.006964    0.022523    0.00931677  0.0036575
thread0::lookup_table       30          2.67894     0.074823    0.105928    0.089298    0.00219099
thread0::stack              10          1.43889     0.130984    0.154692    0.143889    0.00117681
thread0::tanh               10          0.986778    0.084346    0.191761    0.0986778   0.000807043
thread0::slice              10          0.12234     0.009367    0.033865    0.012234    0.000100057
thread0::feed               40          0.109835    0.001013    0.005219    0.00274588  8.98293e-05
thread0::fetch              10          0.106458    0.006874    0.011848    0.0106458   8.70674e-05
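
The Ratio column is each op's Total divided by the sum of all Totals, so the one-off `load` entry dilutes the per-op shares. The sketch below recomputes the shares with and without `load`. It assumes `load` is model-parameter loading rather than per-sample work (its call count, 202, is not a multiple of the 10 samples) and that the times are in milliseconds; neither assumption is stated in the table. The helper `shares` is hypothetical.

```python
# (op, calls, total) copied from the profile table above.
rows = [
    ("fc", 740, 625.813), ("load", 202, 269.789),
    ("elementwise_add", 380, 78.8811), ("transpose2", 480, 63.5249),
    ("dropout", 380, 51.7229), ("layer_norm", 250, 43.0832),
    ("matmul", 250, 36.4239), ("relu", 120, 22.7715),
    ("scale", 140, 11.0508), ("softmax", 120, 9.73205),
    ("reshape2", 480, 4.47205), ("lookup_table", 30, 2.67894),
    ("stack", 10, 1.43889), ("tanh", 10, 0.986778),
    ("slice", 10, 0.12234), ("feed", 40, 0.109835),
    ("fetch", 10, 0.106458),
]
NUM_SAMPLES = 10  # from the "Run 10 samples" log line


def shares(rows, exclude=()):
    """Return {op: fraction of summed Total}, skipping ops in `exclude`."""
    kept = [(op, total) for op, _, total in rows if op not in exclude]
    grand_total = sum(total for _, total in kept)
    return {op: total / grand_total for op, total in kept}


with_load = shares(rows)                       # reproduces the Ratio column
without_load = shares(rows, exclude={"load"})  # assumption: load is one-off

calls = {op: n for op, n, _ in rows}
for op in ("fc", "elementwise_add", "transpose2", "dropout"):
    print(f"{op:16s} calls/sample={calls[op] / NUM_SAMPLES:5.1f}  "
          f"ratio={with_load[op]:.4f}  ratio_without_load={without_load[op]:.4f}")
```

With `load` excluded, `fc` accounts for about two thirds of the remaining profiled time, which makes it the most obvious first target for optimization.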

TODO
