
add ForwardDiff@1 #378

Open · wants to merge 9 commits into main from py/forwarddiff-1

Conversation

penelopeysm (Member) commented Mar 28, 2025:

Hooray, all tests passing (except known Enzyme failures!)

Closes #377
Closes #376

penelopeysm (Member, Author) commented Mar 30, 2025:

The test environment still doesn't resolve to FD=1, since that would apparently downgrade a bunch of other packages. I'm slowly tracking them down, starting with NNlib in FluxML/NNlib.jl#637 (comment), but it might be a while until everything's updated.

In the meantime, I removed ForwardDiff 0.10 from the test compat, to force the tests to run on FD=1.
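
(For illustration, that compat edit is roughly equivalent to the following Pkg operations; the actual change in this PR is a manual edit of the test compat entry, so treat this as a hedged sketch.)

using Pkg

# Point Pkg at the test environment and narrow ForwardDiff's compat
# entry to major version 1, so the resolver can no longer pick 0.10.
Pkg.activate("test")
Pkg.compat("ForwardDiff", "1")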

penelopeysm (Member, Author) commented Mar 30, 2025:

ForwardDiff test fails at test/interface.jl:189. To reproduce the failing test:

using Distributions, Bijectors, ForwardDiff, LinearAlgebra, Test
dist = Dirichlet([1000 * one(Float64), eps(Float64)])
b = Bijectors.SimplexBijector()
r = rand(dist)
x = if any(r .> 0.9999)
    [0.0, 1.0][sortperm(r)]
else
    r
end
y = b(x)
ForwardDiff.jacobian(inverse(b), y)[1:(end - 1), :]
@test logabsdet(ForwardDiff.jacobian(inverse(b), y)[1:(end - 1), :])[1] ≈
    logabsdetjac(inverse(b), y)

I bisected the failure to JuliaDiff/ForwardDiff.jl#695: it seems that the new (more correct) inequality comparisons are interfering with

function _clamp(x, a, b)
    T = promote_type(typeof(x), typeof(a), typeof(b))
    ϵ = _eps(T)
    clamped_x = ifelse(x < a, convert(T, a), ifelse(x > b, convert(T, b), x))
    DEBUG && _debug("x = $x, bounds = $((a, b)), clamped_x = $clamped_x")
    return clamped_x
end
because a and b are reals, whereas x is a ForwardDiff.Dual. Specifically, in this instance, we have

x = Dual(1.0, eps(Float64))
a = 0
b = 1

and before that PR, x < a and x > b both return false, so clamped_x is just set to x.

On the other hand, after that PR, x > b now returns true, so clamped_x is now convert(T, b) = Dual(1.0, 0).

It seems to me, mathematically, that _clamp is not differentiable at the points $x = a$ or $x = b$, so the new behaviour (derivative set to 0, true for $x \to a^-$ or $x \to b^+$) is no less correct than the old behaviour (derivative is preserved, true for $x \to a^+$ or $x \to b^-$).
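
(As a sanity check, here is a minimal standalone sketch of that behaviour; clamp01 is a hypothetical stand-in for _clamp with fixed bounds 0 and 1, not the library code.)

using ForwardDiff

# A fixed-bounds analogue of _clamp: out-of-range inputs are replaced by
# zero(x)/one(x), which for a Dual carry zero partials.
clamp01(x) = ifelse(x < 0, zero(x), ifelse(x > 1, one(x), x))

ForwardDiff.derivative(clamp01, 0.5)  # 1.0: x is returned, partial preserved
ForwardDiff.derivative(clamp01, 2.0)  # 0.0: the bound is returned, partial dropped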


Resolved by changing the parameters of the Dirichlet distribution so that it doesn't generate samples like [0, 1] for which the derivative is undefined.

penelopeysm (Member, Author) commented Mar 31, 2025:

Failing tests in test/transform.jl were reported in JuliaDiff/ForwardDiff.jl#738 and fixed in JuliaDiff/ForwardDiff.jl#739.

@penelopeysm penelopeysm marked this pull request as draft March 31, 2025 23:41
@penelopeysm penelopeysm marked this pull request as ready for review April 3, 2025 19:31
@penelopeysm penelopeysm force-pushed the py/forwarddiff-1 branch 2 times, most recently from f257ca9 to 2583065, April 3, 2025 21:03
@yebai yebai requested review from torfjelde and sunxd3 April 8, 2025 05:39
@sunxd3 sunxd3 requested a review from Copilot April 8, 2025 10:53
@Copilot Copilot AI left a comment:

Copilot reviewed 2 out of 5 changed files in this pull request and generated 1 comment.

Files not reviewed (3)
  • src/Bijectors.jl: Language not supported
  • test/interface.jl: Language not supported
  • test/transform.jl: Language not supported

sunxd3 (Member) commented Apr 8, 2025:

On Buildkite: a short while ago I asked the JuliaGPU folks to add this, since GPU tests need to run on a server with a GPU, which standard GitHub CI doesn't provide. We don't currently have GPU testing set up, and I think that's why the Buildkite check is failing. (It doesn't look great to have a failing check, though; I'll fix it soon.)

sunxd3 (Member) left a comment:

Looks great!

@@ -145,12 +145,10 @@ end
 @testset "Multivariate" begin
     vector_dists = [
         Dirichlet(2, 3),
-        Dirichlet([1000 * one(Float64), eps(Float64)]),
-        Dirichlet([eps(Float64), 1000 * one(Float64)]),
+        Dirichlet([10.0, 0.1]),
sunxd3 (Member) commented:
I am not sure these new tests can replace the current ones. Tests like

Dirichlet([1000 * one(Float64), eps(Float64)]),
Dirichlet([eps(Float64), 1000 * one(Float64)]),

are aimed at the numerical stability of very extreme Dirichlet distributions, i.e. ones where one axis has a very tiny probability mass on average.

penelopeysm (Member, Author) commented Apr 10, 2025:

It's actually the sample that's the problem. For the sample x = [1.0, 0.0], the transformed variable is y = [36.0436], which is outside the range in which Float64 is numerically stable.

The issue comes from these lines:

@inbounds z = LogExpFunctions.logistic(y[1] - log(T(K - 1)))
@inbounds x[1] = _clamp((z - ϵ) / (one(T) - 2ϵ), 0, 1)

As y[1] tends to +Inf, z tends to 1, and the expression (z - ϵ) / (one(T) - 2ϵ) tends towards 1.0000000000000002. If that expression is greater than 1, then it gets _clamped to 1, and the derivative is set to 0.

The difference between FD 0.10 and FD 1.0 is that the new version sets the derivative to 0 if (z - ϵ) / (one(T) - 2ϵ) is greater than, or equal to, 1. That in turn means there is a larger range of y[1] for which the derivative gets clamped. Unfortunately, y[1] = 36.0436 in Float64 falls into that category (35.8 would have been fine; alternatively, BigFloat is fine up until around 175).
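
(To get a feel for that window, here is a hedged standalone sketch for the K = 2 case, where the log(T(K - 1)) term vanishes; saturates is a hypothetical helper, not library code.)

using LogExpFunctions

# true once the clamp boundary is reached, i.e. once the derivative
# would be zeroed under the FD 1.0 behaviour described above
function saturates(y1)
    ϵ = eps(typeof(y1))
    return (logistic(y1) - ϵ) / (one(y1) - 2ϵ) >= 1
end

saturates(30.0)       # false: logistic(30) is still comfortably below 1
saturates(40.0)       # true: logistic(40) rounds to 1.0 in Float64
saturates(big"40.0")  # false: BigFloat only saturates around y1 ≈ 175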

As far as I can tell, the fact that it used to work with FD 0.10 might have been a happy accident. I wrote more about this in a comment above, but (to me) it makes sense for FD to set the derivative to 0 at the point (z - ϵ) / (one(T) - 2ϵ) == 1.

penelopeysm (Member, Author) commented Apr 10, 2025:

I am still not fully sure how to resolve this though, which is why I haven't really come back to this PR. Obviously changing the sample fixes the tests (and the easiest way to change the sample was to change the distribution from which it was drawn), but I can't tell if there's a workaround in the code that makes it work again for (z - ϵ) / (one(T) - 2ϵ) == 1.0, or more generally for large y.

penelopeysm (Member, Author) commented:

Pinging @devmotion for your thoughts too :)

A member commented:

Hmm, I'm not sure. I've always had the feeling that this stick-breaking transform (explained in, e.g., the Stan docs) can be numerically problematic, and I've always thought these eps workarounds are unsatisfying. But I'm not sure what exactly would break if they were removed; it might be interesting to see.
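
(For readers unfamiliar with the transform, here is a minimal unclamped sketch of the K = 2 stick-breaking pair, following the construction in the Stan docs; sb_forward and sb_inverse are hypothetical names, not Bijectors' API.)

using LogExpFunctions

# Forward: first simplex coordinate x₁ ∈ (0, 1) → unconstrained y₁
sb_forward(x1) = logit(x1)    # for K = 2 the log(K - 1) shift is zero
# Inverse: unconstrained y₁ → first simplex coordinate
sb_inverse(y1) = logistic(y1)

sb_inverse(sb_forward(0.3))   # ≈ 0.3: round trip works away from the boundary
sb_inverse(sb_forward(1.0))   # logit(1.0) == Inf; logistic saturates to 1.0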

yebai (Member) commented Apr 23, 2025:

@penelopeysm, please feel free to merge.
