Hey @cceyda,
I came here via https://cceyda.github.io/blog/dali/cv/image_processing/2020/11/10/nvidia_dali.html
nice blog post!
Did you use pillow-simd built against libjpeg-turbo, or just the vanilla version? I find libjpeg-turbo speeds things up considerably (even compared to OpenCV it's 3x faster in my benchmarks). Also check out simplejpeg, which so far is the fastest JPEG loader I've come across.
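
If you want to compare the decoders on your own images, something along these lines might do as a starting point. It is only a rough sketch, not the script behind the numbers above: the helper names (`time_decoder`, `available_decoders`, `benchmark`) are made up for illustration, and each library is only timed if it happens to be installed.

```python
import io
import time


def time_decoder(decode, data, repeats=20):
    """Return the best wall-clock time (seconds) over `repeats` calls of decode(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decode(data)
        best = min(best, time.perf_counter() - start)
    return best


def available_decoders():
    """Collect decode functions for whichever JPEG libraries can be imported."""
    decoders = {}
    try:
        from PIL import Image

        def pil_decode(data):
            img = Image.open(io.BytesIO(data))
            img.load()  # Image.open is lazy; load() forces the actual decode
            return img

        decoders["pillow"] = pil_decode
    except ImportError:
        pass
    try:
        import cv2
        import numpy as np

        def cv2_decode(data):
            return cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)

        decoders["opencv"] = cv2_decode
    except ImportError:
        pass
    try:
        import simplejpeg

        decoders["simplejpeg"] = simplejpeg.decode_jpeg
    except ImportError:
        pass
    return decoders


def benchmark(data, decoders=None, repeats=20):
    """Time every decoder on the same in-memory JPEG bytes."""
    if decoders is None:
        decoders = available_decoders()
    return {name: time_decoder(fn, data, repeats) for name, fn in decoders.items()}
```

Read a JPEG into memory once (e.g. `data = open("sample.jpg", "rb").read()`, where `sample.jpg` is a placeholder) and call `benchmark(data)`, so that disk I/O stays out of the timings. Whether the "pillow" entry reflects libjpeg-turbo depends on how your Pillow or pillow-simd was built.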