Description
Owner
Machine model
6148
Commit
based on #164
Initial performance
10 samples
I0820 10:33:36.597270 35686 inference.cc:211] Load 10 samples from /home/tangjian/ernie/Inference/c++/ernie/seq128_data/test_ds_10
I0820 10:33:37.552497 35686 inference.cc:351] Run 10 samples, average latency: 95.519 ms per sample.
I0820 10:33:37.552565 35686 inference.cc:356] Run 9 samples, average latency [exclude 1 warmup steps]: 89.8265 ms per sample.
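The two averages in the log above are enough to recover the cost of the single warmup run. A minimal Python sketch of that arithmetic follows; the helper name `warmup_latency_ms` is hypothetical and is not part of `inference.cc`.

```python
def warmup_latency_ms(avg_all, n_all, avg_rest, n_rest):
    """Average latency of the runs excluded as warmup.

    avg_all / n_all:   average and count over every run.
    avg_rest / n_rest: average and count after dropping the warmup runs.
    """
    n_warmup = n_all - n_rest
    if n_warmup <= 0:
        raise ValueError("n_all must be larger than n_rest")
    # Total time of all runs minus total time of the non-warmup runs.
    return (avg_all * n_all - avg_rest * n_rest) / n_warmup


# Values copied from the log lines above.
first_run = warmup_latency_ms(95.519, 10, 89.8265, 9)
print(f"warmup run: {first_run:.2f} ms")
```

With only one warmup step, the first run accounts for most of the gap between the two averages, so the 89.8265 ms figure is the more representative baseline.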
Profile results
Event                     Calls  Total     Min.      Max.      Ave.        Ratio.
thread0::fc               740    625.813   0.013066  2.59008   0.845693    0.511826
thread0::load             202    269.789   0.009506  168.902   1.33559     0.220649
thread0::elementwise_add  380    78.8811   0.045364  29.1685   0.207582    0.0645135
thread0::transpose2       480    63.5249   0.089001  4.08297   0.132343    0.0519543
thread0::dropout          380    51.7229   0.01217   0.262424  0.136113    0.0423019
thread0::layer_norm       250    43.0832   0.150904  0.226994  0.172333    0.0352359
thread0::matmul           250    36.4239   0.033627  14.6704   0.145696    0.0297895
thread0::relu             120    22.7715   0.130891  1.77192   0.189762    0.0186238
thread0::scale            140    11.0508   0.006102  0.105016  0.0789342   0.00903797
thread0::softmax          120    9.73205   0.050275  0.451451  0.0811004   0.00795943
thread0::reshape2         480    4.47205   0.006964  0.022523  0.00931677  0.0036575
thread0::lookup_table     30     2.67894   0.074823  0.105928  0.089298    0.00219099
thread0::stack            10     1.43889   0.130984  0.154692  0.143889    0.00117681
thread0::tanh             10     0.986778  0.084346  0.191761  0.0986778   0.000807043
thread0::slice            10     0.12234   0.009367  0.033865  0.012234    0.000100057
thread0::feed             40     0.109835  0.001013  0.005219  0.00274588  8.98293e-05
thread0::fetch            10     0.106458  0.006874  0.011848  0.0106458   8.70674e-05
TODO
- Remove load @tensor-tang: it is not counted toward the inference time, so it can be ignored
- Multi-threaded dropout @GaoWei8
- fuse @intel
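Since load is not counted toward the inference time, it can help to recompute each operator's share with load left out. The Python sketch below assumes the Ratio column is each event's Total divided by the sum of all Totals (this matches the `fc` row of the profile); the `shares` helper is hypothetical and is not part of the profiler.

```python
# Total column copied from the profile results, keyed by event name.
totals = {
    "fc": 625.813, "load": 269.789, "elementwise_add": 78.8811,
    "transpose2": 63.5249, "dropout": 51.7229, "layer_norm": 43.0832,
    "matmul": 36.4239, "relu": 22.7715, "scale": 11.0508,
    "softmax": 9.73205, "reshape2": 4.47205, "lookup_table": 2.67894,
    "stack": 1.43889, "tanh": 0.986778, "slice": 0.12234,
    "feed": 0.109835, "fetch": 0.106458,
}


def shares(totals, exclude=()):
    """Fraction of the summed Total time per event, skipping `exclude`."""
    kept = {name: t for name, t in totals.items() if name not in exclude}
    denom = sum(kept.values())
    return {name: t / denom for name, t in kept.items()}


with_load = shares(totals)
without_load = shares(totals, exclude=("load",))

# Compare the largest remaining events before and after dropping load.
for name in ("fc", "elementwise_add", "transpose2", "dropout"):
    print(f"{name:16s} {with_load[name]:.4f} -> {without_load[name]:.4f}")
```

With load excluded, `fc` makes up roughly two thirds of the remaining profiled time, which is why the other TODO items target the operators around it.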