Optimize BatchNormalization by avoiding tensor slicing #661

robertknight · 2025-04-12T06:56:20Z

Since the tensor is contiguous and we only need the data for each chunk, we can replace N * C `slice_mut` calls with a much cheaper `chunks_mut` iterator.

This made BatchNormalization ~10% faster on a MobileNet v4 model where there are layers with many channels but a relatively small number of elements per channel.
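
As a rough illustration of the idea (not the code from this PR), the sketch below applies batch normalization in place over a contiguous NCHW buffer using the standard library's `chunks_mut`. The function name `batch_norm_in_place`, its parameters, and the flat `&mut [f32]` representation are all assumptions made for the example; the real operator works on the library's tensor types.

```rust
/// Hypothetical helper, for illustration only: in-place batch normalization
/// over a contiguous NCHW buffer stored as a flat slice.
///
/// `chan_len` is the number of elements per channel (H * W). Because the
/// buffer is contiguous, each consecutive run of `chan_len` elements is one
/// (batch, channel) pair, so a single `chunks_mut` iterator visits all
/// N * C chunks without building a tensor view for each one.
fn batch_norm_in_place(
    data: &mut [f32],
    channels: usize,
    chan_len: usize,
    scale: &[f32],
    bias: &[f32],
    mean: &[f32],
    var: &[f32],
    epsilon: f32,
) {
    assert!(channels > 0 && chan_len > 0);
    assert_eq!(data.len() % (channels * chan_len), 0);
    assert!(
        scale.len() == channels
            && bias.len() == channels
            && mean.len() == channels
            && var.len() == channels
    );

    for (i, chunk) in data.chunks_mut(chan_len).enumerate() {
        // Chunk `i` belongs to channel `i % channels` of batch item `i / channels`.
        let c = i % channels;

        // Fold the per-channel parameters into one multiply-add:
        // y = (x - mean) / sqrt(var + eps) * scale + bias
        let scaled_std_inv = scale[c] / (var[c] + epsilon).sqrt();
        let offset = bias[c] - mean[c] * scaled_std_inv;

        for x in chunk.iter_mut() {
            *x = *x * scaled_std_inv + offset;
        }
    }
}
```

The per-chunk overhead here is just advancing a pointer, which is why this matters most when there are many channels and few elements per channel: the fixed cost of creating each slice is then a larger share of the total work.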

robertknight merged commit 0fb9130 into main Apr 12, 2025
2 checks passed

robertknight deleted the batch-norm-opt branch April 12, 2025 06:58
