
Incorrect gradients of batchnorm in testmode #548

Open
@phaim

Description


I have been trying to take the derivative of a Flux model in testmode, and noticed that the BatchNorm layer behaves incorrectly for 2D, 4D, and 5D CUDA arrays.
Here is a minimal example of this behaviour, computing the gradient of the BatchNorm layer for inputs reshaped to different numbers of dimensions:

using Flux, CUDA, Zygote

function gradient_varying_shape(m, x, n_dims, device)
    m = m |> device
    Flux.testmode!(m)   # use the stored running statistics, not batch statistics

    # reshape x to an all-ones shape with n_dims dimensions, e.g. (1, 1, 1, 1) for n_dims = 4
    x = reshape(x, ntuple(i -> 1, n_dims)) |> device
    return gradient(input -> sum(m(input).^2), x)[1] |> cpu
end

model = BatchNorm(1)
x = [1f0]

for i in 2:7
    cpu_gradient = gradient_varying_shape(model, x, i, cpu)
    gpu_gradient = gradient_varying_shape(model, x, i, gpu)
    println("n_dim=$i, cpu: $(cpu_gradient[1]), gpu: $(gpu_gradient[1])")
end

This gives the following output for me:

n_dim=2, cpu: 1.99998, gpu: 0.0
n_dim=3, cpu: 1.99998, gpu: 1.99998
n_dim=4, cpu: 1.99998, gpu: 0.0
n_dim=5, cpu: 1.99998, gpu: 0.0
n_dim=6, cpu: 1.99998, gpu: 1.99998
n_dim=7, cpu: 1.99998, gpu: 1.99998
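
The CPU value is what I would expect analytically: in testmode the layer applies the elementwise map y = γ * (x - μ) / sqrt(σ² + ε) + β per channel, so the gradient of sum(y.^2) with respect to x is 2y * γ / sqrt(σ² + ε). A quick sanity check, assuming the default parameters of a freshly constructed BatchNorm(1):

# Sanity check of the expected testmode gradient, assuming the Flux defaults
# for BatchNorm(1): γ = 1, β = 0, running mean μ = 0, running variance σ² = 1, ε = 1f-5.
γ, β, μ, σ², ε = 1f0, 0f0, 0f0, 1f0, 1f-5
x = 1f0
y = γ * (x - μ) / sqrt(σ² + ε) + β      # testmode forward pass, elementwise
expected = 2 * y * γ / sqrt(σ² + ε)     # d/dx sum(y^2) ≈ 1.99998, matching the CPU result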

Looking through the code, I found that the implementation of the CUDA backward pass for batchnorm here ignores the training argument. Could this be the origin of this behaviour?
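
In case it helps others hitting this, a possible workaround is to merge the spatial dimensions into one before applying the layer and reshape back afterwards. This is only a sketch based on the output above (where the 3D case gives the correct gradient), not a confirmed fix; in testmode the layer acts elementwise per channel, so the reshape should not change the result. The function name is my own.

using Flux

# Workaround sketch: route a 4D input through a 3D reshape so that the channel
# dimension stays second-to-last, apply the layer, then restore the original shape.
function batchnorm_testmode_workaround(m::BatchNorm, x::AbstractArray{T,4}) where T
    w, h, c, n = size(x)
    y = m(reshape(x, (w * h, c, n)))
    return reshape(y, size(x))
end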

I'm using Julia 1.9.3 with NNlib version 0.9.7 and this environment:

[052768ef] CUDA v5.0.0
[587475ba] Flux v0.14.6
[e88e6eb3] Zygote v0.6.66
[02a925ec] cuDNN v1.2.0
