Replies: 3 comments
-
The answer presently is "kind of". Enzyme currently supports activity annotations on individual parameters. This information is fed into Enzyme's activity analysis to deduce whether any given instruction (such as a load/store) will impact the derivative computation. We could extend the representation to mark not just a whole value as inactive, but the value at a given pointer offset as inactive. That would allow what you describe (assuming your parameters live in arrays; if they are separate arguments you're good to go). Alternatively, you might get some of those benefits by writing a wrapper function which itself has two arguments: one holding the inactive parameters (marked accordingly) and another holding the active ones. Do you have a simple test case you can demonstrate this with? I'm specifically curious to see what potential performance boost you're hoping for -- and how that compares with the added analysis code and time complexity. Alternatively, if it's very few inputs, forward-mode differentiation might be what you're looking for.
-
Sorry for the late reply. I made a simple test case as follows. In this example, I split the parameters into two arrays and explicitly specified which one requires gradients and which doesn't.

```cpp
#include <algorithm>
#include "timer.h"

using namespace std;

void __enzyme_autodiff(...);
int enzyme_dup, enzyme_const, enzyme_out, enzyme_dupnoneed;

double sum(double *x1, int size1,
           double *x2, int size2)
{
    double ret = 0.;
    for (int j = 0; j < size1; j++)
        ret += x1[j];
    for (int j = 0; j < size2; j++)
        ret += x2[j];
    return ret;
}

// Both x1 and x2 are marked active (enzyme_dup).
void dsum1(double *x1, double *dx1, int size1,
           double *x2, double *dx2, int size2)
{
    __enzyme_autodiff((void *)(sum),
                      enzyme_dup, x1, dx1,
                      enzyme_const, size1,
                      enzyme_dup, x2, dx2,
                      enzyme_const, size2);
}

// Only x1 is active; x2 is marked inactive (enzyme_const).
void dsum2(double *x1, double *dx1, int size1,
           double *x2, int size2)
{
    __enzyme_autodiff((void *)(sum),
                      enzyme_dup, x1, dx1,
                      enzyme_const, size1,
                      enzyme_const, x2,
                      enzyme_const, size2);
}

int main()
{
    const int size1 = 100;
    const int size2 = 100;

    double x1[size1];
    double dx1[size1];
    std::fill_n(x1, size1, 1.);
    std::fill_n(dx1, size1, 0.);

    double x2[size2];
    double dx2[size2];
    std::fill_n(x2, size2, 1.);
    std::fill_n(dx2, size2, 0.);

    dsum1(x1, dx1, size1,
          x2, dx2, size2);
    dsum2(x1, dx1, size1,
          x2, size2);
    return 0;
}
```

I also tested the forward-mode autodiff. It seems that the current version cannot handle the Eigen data types like Eigen::Vector.
-
@tgymnich re forward mode

The other thing we could do, if the data structure is fixed, is introduce a parameterized activity analysis (e.g. specify somewhere that the data at byte offsets 0, 8, and 16 is inactive, whereas it is active at offset 24). These offsets would need to be known constants, though. The other thing for this case in particular is that you might be able to use one parameter and two for loops -- the second one with an
-
Hi,
I'm trying to differentiate a function with millions of parameters, of which only a few require gradients. However, Enzyme computes the gradients for all of them. I wondered whether it is necessary to explicitly specify which variables require gradients in order to achieve better performance.