
PyTorch quantization and inference optimization #890

Answered by zachgk
evelina-gudauskayte asked this question in Q&A

To make predictions faster, you can take a look at our inference performance optimization document for some ideas.

DJL doesn't affect the weights of an imported PyTorch model. The model is loaded entirely by the native C++ PyTorch engine (the same one underlying the PyTorch Python API), so we can simply rely on it behaving the same way it does in Python.
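
For reference, a traced TorchScript file is the kind of artifact DJL loads through that native engine. A minimal sketch, assuming a torchvision ResNet-18 as a stand-in for your model:

```python
import torch
import torchvision

# Any float PyTorch model works here; ResNet-18 is just a stand-in.
model = torchvision.models.resnet18(pretrained=True).eval()

# Trace with a representative input to produce a TorchScript module.
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# DJL loads this .pt file through the same native engine, weights untouched.
traced.save("resnet18.pt")
```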

It may be possible to use static quantization, but I haven't looked into it too much. If you quantize your model and then save it in the quantized format, executing the model through DJL may run it quantized.
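
If you want to try that route, below is a rough sketch of eager-mode static quantization followed by a TorchScript export. The module, the "fbgemm" backend choice, the calibration loop, and the file name are all illustrative assumptions, not a tested DJL workflow:

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """Illustrative model; quant/dequant stubs mark the int8 region."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = SmallNet().eval()

# Pick a quantization backend ("fbgemm" targets x86 servers).
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")

# Insert observers, then calibrate with representative inputs.
prepared = torch.quantization.prepare(model)
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(1, 3, 224, 224))

# Convert to int8 and save as TorchScript so DJL can load the result.
quantized = torch.quantization.convert(prepared)
traced = torch.jit.trace(quantized, torch.randn(1, 3, 224, 224))
traced.save("quantized_model.pt")
```

On the DJL side, the saved file should load like any other TorchScript model; whether the quantized kernels actually run presumably depends on the native PyTorch build.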
