-
I was wondering if there's a preferred way of performing L1/L2 regularization on neural network weights in Flax? I couldn't find an example in the documentation, but I'm basically trying to replicate what the `kernel_regularizer` argument does in TensorFlow. Thanks!
-
Hey! You can do something like this to get global L2 regularization:

```python
def l2_loss(x, alpha):
    return alpha * (x ** 2).sum()

def loss_fn(...):
    ...
    loss = ...
    loss += sum(
        l2_loss(w, alpha=0.001)
        for w in jax.tree_util.tree_leaves(variables["params"])
    )
```
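Since a fuller example is requested later in the thread, here is a minimal end-to-end sketch of plugging `l2_loss` into a training step. The toy model, the use of `optax` for optimization, and the data shapes are illustrative assumptions, not part of the original reply:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class MLP(nn.Module):  # hypothetical toy model
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(32)(x))
        return nn.Dense(1)(x)

def l2_loss(x, alpha):
    return alpha * (x ** 2).sum()

model = MLP()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 8)))["params"]
tx = optax.sgd(1e-2)
opt_state = tx.init(params)

@jax.jit
def train_step(params, opt_state, x, y):
    def loss_fn(params):
        preds = model.apply({"params": params}, x)
        loss = jnp.mean((preds - y) ** 2)
        # Add the global L2 penalty over every parameter leaf.
        loss += sum(
            l2_loss(w, alpha=0.001)
            for w in jax.tree_util.tree_leaves(params)
        )
        return loss

    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = tx.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```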
-
I prefer @cgarciae's approach, but if you need to control how/which params get regularized from inside the model, it's easier to `sow` the regularization loss; a sketch of that pattern follows.
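The original comment's code did not survive, so here is a minimal sketch of the `sow` pattern, assuming a hypothetical layer; the `"losses"` collection name and the `0.001` coefficient are illustrative choices, not from the original reply:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class RegularizedDense(nn.Module):  # hypothetical layer
    features: int

    @nn.compact
    def __call__(self, x):
        kernel = self.param(
            "kernel", nn.initializers.lecun_normal(), (x.shape[-1], self.features)
        )
        # Stash this layer's penalty in a mutable 'losses' collection.
        self.sow("losses", "l2", 0.001 * (kernel ** 2).sum())
        return x @ kernel

model = RegularizedDense(features=4)
x = jnp.ones((1, 8))
params = model.init(jax.random.PRNGKey(0), x)["params"]

# Mark the collection mutable so apply returns the sown values.
y, state = model.apply({"params": params}, x, mutable=["losses"])
reg_loss = sum(jax.tree_util.tree_leaves(state["losses"]))
```

The advantage of this pattern is that each `Module` decides for itself what to regularize, and the training loop just sums whatever was sown.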
-
As a supplement to @cgarciae's answer, you can filter the param tree so that only specific parameters (e.g. kernels but not biases) get regularized:

```python
import jax
from collections.abc import Iterable

def find_params_by_node_name(params, node_name):
    def _is_leaf_fun(x):
        # Treat a dict whose values are all arrays as a leaf node.
        return isinstance(x, Iterable) and jax.tree_util.all_leaves(x.values())

    def _get_key_finder(key):
        def _finder(x):
            value = x.get(key)
            return None if value is None else {key: value}
        return _finder

    filtered_params = jax.tree_util.tree_map(
        _get_key_finder(node_name), params, is_leaf=_is_leaf_fun
    )
    filtered_params = [
        x for x in jax.tree_util.tree_leaves(filtered_params) if x is not None
    ]
    return filtered_params

model = MyModel()
params = model.init(...)["params"]
kernels = find_params_by_node_name(params, "kernel")
```
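To tie this back to the first reply, the filtered list can be fed straight into `l2_loss` so that only kernels are penalized (the `alpha` value here is illustrative):

```python
loss += sum(l2_loss(w, alpha=0.001) for w in kernels)
```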
-
Can you give a full example of using the `l2_loss` update?
-
Specifically with convolutions: I was testing out WeightNorm and came up with a wrapper that might work. It basically wraps another conv cell and uses a new kernel param for the wrapped convolution so I can perform WS before invoking it. I like the idea of modeling this into the Modules themselves, but @cgarciae's implementation just fits the Flax paradigms better.
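The wrapper itself was not posted. As a rough sketch of the idea, here is one way to write a weight-normalized conv in Flax; rather than wrapping an existing `nn.Conv`, this version owns the raw kernel params and calls `jax.lax.conv_general_dilated` directly, so everything here (names, shapes, the epsilon) is an assumption rather than the original author's code:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class WeightNormConv(nn.Module):
    """Hypothetical weight-normalized 2D conv (NHWC inputs)."""
    features: int
    kernel_size: tuple = (3, 3)

    @nn.compact
    def __call__(self, x):
        kshape = self.kernel_size + (x.shape[-1], self.features)  # HWIO layout
        v = self.param("kernel_v", nn.initializers.lecun_normal(), kshape)
        g = self.param("kernel_g", nn.initializers.ones, (self.features,))
        # Direction/magnitude split: normalize v per output channel, scale by g.
        norm = jnp.sqrt(jnp.sum(v ** 2, axis=(0, 1, 2)) + 1e-6)
        kernel = v * (g / norm)
        return jax.lax.conv_general_dilated(
            x, kernel,
            window_strides=(1, 1),
            padding="SAME",
            dimension_numbers=("NHWC", "HWIO", "NHWC"),
        )
```

Recent Flax releases also ship a built-in `nn.WeightNorm` wrapper, which may be preferable if your version includes it.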