Improve `expected_loglikelihood` differentiability and type-stability

For reference, the old implementation of `expected_loglikelihood` (Gauss-Hermite quadrature method) looked like this:
https://github.com/JuliaGaussianProcesses/GPLikelihoods.jl/blob/e9b7da99e46f56859209ff27bbb36d12512d0ad4/src/expectations.jl#L83-L109

#90 introduced a workaround for two (possibly interrelated) issues with that implementation:
- Computing the gradient with Zygote was 1-3 orders of magnitude slower than expected when called with `NegativeBinomialLikelihood`.
- The function was generally type-unstable (see #77).

The type instability of the broadcast could be related to https://github.com/JuliaLang/julia/issues/45748. See also https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/issues/458, which documents strange behavior in which inference depends on previous evaluations. I observed the same behavior in the `Broadcast` construction but could not resolve it.

I therefore tried to get rid of the `Broadcast` entirely, hoping that type stability would improve performance. First, I tried a custom implementation of a pairwise sum for two function arguments: I implemented `sum(f, X, Y)`, which is equivalent to `mapreduce(f, +, X, Y)`, based on the implementation of `mapreduce(f, +, X)` in Base (see #77 for the reason behind this). That implementation can be found [here](https://github.com/simsurace/GPLikelihoods.jl/blob/8bd9a04641b8aff0b081430ae3f8721fc256eef6/src/expectations.jl#L83-L151).
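
To illustrate the idea of a two-argument pairwise sum, here is a minimal sketch. It is not the code from the linked branch: the names `pairwise_sum` and `_pairwise_sum` and the `blocksize` keyword are made up for this example, and the structure only loosely follows the divide-and-conquer scheme Base uses for `mapreduce(f, +, X)`.

```julia
# Hypothetical helper: sum f(x, y) over two vectors with identical indices,
# without allocating an intermediate array.
function pairwise_sum(f, X::AbstractVector, Y::AbstractVector; blocksize::Int=1024)
    axes(X, 1) == axes(Y, 1) || throw(DimensionMismatch("X and Y must share indices"))
    isempty(X) && throw(ArgumentError("cannot sum over empty collections"))
    blocksize >= 1 || throw(ArgumentError("blocksize must be at least 1"))
    return _pairwise_sum(f, X, Y, firstindex(X), lastindex(X), blocksize)
end

function _pairwise_sum(f, X, Y, lo::Int, hi::Int, blocksize::Int)
    if hi - lo < blocksize
        # Base case: sequential accumulation over a short block.
        s = f(X[lo], Y[lo])
        for i in (lo + 1):hi
            s += f(X[i], Y[i])
        end
        return s
    else
        # Recursive case: split the index range in half and add the partial sums.
        mid = (lo + hi) >> 1
        return _pairwise_sum(f, X, Y, lo, mid, blocksize) +
               _pairwise_sum(f, X, Y, mid + 1, hi, blocksize)
    end
end

# Usage: the result should agree with the allocating `sum(map(f, X, Y))`.
total = pairwise_sum((x, y) -> x * y, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Because the accumulator is initialized from the first element instead of a typed zero, the return type is determined by `f` alone, which is the property the type-stability discussion is concerned with.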

Although that implementation makes the function type-stable, it was still not very performant. For this reason, in #90 I chose an implementation that allocates an explicit array, which I believed to be more Zygote-friendly. The large improvements over the old version seen in the benchmarks confirmed this intuition (see https://github.com/JuliaGaussianProcesses/GPLikelihoods.jl/pull/90#issuecomment-1163515226).
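
The "explicit array" idea can be sketched as follows. This is an illustration under assumptions and not necessarily the code merged in #90: `gaussian_logpdf` is a stand-in for `loglikelihood(lik(f), y)`, the function name `expected_loglik_eager` is hypothetical, and the three-point Gauss-Hermite rule is hard-coded to keep the example free of package dependencies.

```julia
# Stand-in observation model (assumption): unit-variance Gaussian log-density.
gaussian_logpdf(f, y) = -0.5 * (log(2π) + (y - f)^2)

# Eagerly materialize an (observations x quadrature points) array, then sum it.
function expected_loglik_eager(xs, ws, μ::AbstractVector, σ::AbstractVector, y::AbstractVector)
    # Change of variable f = sqrt(2) * σ * x + μ for every (observation, node) pair.
    F = sqrt(2) .* σ .* xs' .+ μ
    # Weighted log-likelihood values; this allocation is the cost paid in the forward pass.
    A = gaussian_logpdf.(F, y) .* ws'
    return sum(A) / sqrt(π)
end

# Three-point Gauss-Hermite nodes and weights (weight function exp(-x^2)).
xs = [-sqrt(1.5), 0.0, sqrt(1.5)]
ws = [sqrt(π) / 6, 2 * sqrt(π) / 3, sqrt(π) / 6]

μ = [0.0, 1.0]
σ = [1.0, 0.5]
y = [0.5, -1.0]
value = expected_loglik_eager(xs, ws, μ, σ, y)
```

For this Gaussian stand-in the integrand is quadratic in `f`, so the three-point rule should reproduce the closed form `-0.5 * (log(2π) + (y - μ)^2 + σ^2)` summed over observations, up to floating-point error.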

However, the solution is not very clean, as it pays for this performance improvement with additional allocations in the forward pass. It is also unclear whether this implementation will be as beneficial for, or generalize to, other AD backends. A potentially better approach would be to define an `rrule` for the broadcasted sum, as suggested in https://github.com/JuliaGaussianProcesses/GPLikelihoods.jl/pull/90#issuecomment-1163548553 (although even then it would be nice for the function to be type-stable).

	# Compute the expected_loglikelihood over a collection of observations and marginal distributions
	function expected_loglikelihood(
	    gh::GaussHermiteExpectation, lik, q_f::AbstractVector{<:Normal}, y::AbstractVector
	)
	    # Compute the expectation via Gauss-Hermite quadrature
	    # using a reparameterisation by change of variable
	    # (see e.g. en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature)
	    return sum(Broadcast.instantiate(
	        Broadcast.broadcasted(y, q_f) do yᵢ, q_fᵢ # Loop over every pair
	            # of marginal distribution q(fᵢ) and observation yᵢ
	            expected_loglikelihood(gh, lik, q_fᵢ, yᵢ)
	        end,
	    ))
	end

	# Compute the expected_loglikelihood for one observation and one marginal distribution
	function expected_loglikelihood(gh::GaussHermiteExpectation, lik, q_f::Normal, y)
	    μ = mean(q_f)
	    σ̃ = sqrt2 * std(q_f)
	    return invsqrtπ * sum(Broadcast.instantiate(
	        Broadcast.broadcasted(gh.xs, gh.ws) do x, w # Loop over every
	            # pair of Gauss-Hermite point x with weight w
	            f = σ̃ * x + μ
	            loglikelihood(lik(f), y) * w
	        end,
	    ))
	end
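
For readers unfamiliar with the reparameterisation mentioned in the code comments, the change of variable can be written out as follows. The notation is chosen for this note: $\mu$ and $\sigma$ are the mean and standard deviation of `q_f`, and $x_i$, $w_i$ are the Gauss-Hermite points and weights (`gh.xs`, `gh.ws`). Substituting $f = \sqrt{2}\,\sigma x + \mu$ gives

$$
\mathbb{E}_{q(f)}\left[\log p(y \mid f)\right]
= \int \log p(y \mid f)\, \mathcal{N}(f; \mu, \sigma^2)\, \mathrm{d}f
= \frac{1}{\sqrt{\pi}} \int e^{-x^2} \log p\left(y \mid \sqrt{2}\,\sigma x + \mu\right) \mathrm{d}x
\approx \frac{1}{\sqrt{\pi}} \sum_i w_i \log p\left(y \mid \sqrt{2}\,\sigma x_i + \mu\right)
$$

which corresponds to `σ̃ = sqrt2 * std(q_f)`, `f = σ̃ * x + μ`, and the leading factor `invsqrtπ` in the single-observation method.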
