PyTorch quantization and inference optimization #890
Hello! I am using a PyTorch model in my Kotlin application, but I need to make inference time faster. I noticed that there is a quantize method in the Model class, but it is not implemented. I also tried to load a dynamically quantized model into DJL, but could not see any improvement. So I have 2 questions:
Replies: 1 comment
To make predictions faster, you can take a look at our inference performance optimization document for some ideas.
DJL doesn't affect the weights of an imported PyTorch model. The model is loaded entirely inside the native C++ PyTorch engine (the same one underlying the PyTorch Python code), and DJL simply relies on it.
It may be possible to use static quantization, but I haven't looked into it too much. If you can quantize your model and then save it in the quantized format, executing the model through DJL may run it quantized.
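If it helps, here is a minimal sketch of what the DJL side of that workflow could look like. It assumes you have already quantized the model in Python (for example with torch.quantization.quantize_dynamic) and exported it as a TorchScript file (model.pt) into a local directory; the model path, input shape, and the NoopTranslator NDList-in/NDList-out setup are placeholders for illustration, not part of any specific model.

```kotlin
import ai.djl.ndarray.NDList
import ai.djl.ndarray.NDManager
import ai.djl.ndarray.types.Shape
import ai.djl.repository.zoo.Criteria
import ai.djl.translate.NoopTranslator
import java.nio.file.Paths

fun main() {
    // Hypothetical path to a TorchScript model that was quantized and saved in Python.
    val criteria = Criteria.builder()
        .setTypes(NDList::class.java, NDList::class.java)
        .optModelPath(Paths.get("build/quantized_model")) // directory containing model.pt
        .optEngine("PyTorch")                             // force the native PyTorch engine
        .optTranslator(NoopTranslator())                  // raw NDList in, raw NDList out
        .build()

    criteria.loadModel().use { model ->
        model.newPredictor().use { predictor ->
            NDManager.newBaseManager().use { manager ->
                // Replace with your real input shape; the native engine runs the
                // TorchScript graph exactly as it was saved from Python.
                val input = NDList(manager.ones(Shape(1, 3, 224, 224)))
                val output = predictor.predict(input)
                println(output)
            }
        }
    }
}
```

With this approach DJL never touches the weights, so any speedup depends on whether the quantized operators in your saved graph are supported by the libtorch build that DJL bundles for your platform.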