|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Technical Details\n", |
| 8 | + "\n", |
| 9 | + "As with all packages, there are numerous technical details that are abstracted away from the user. Now in order to ensure a clean interface, this abstraction is entirely necessary. However, it can sometimes be confusing when navigating a package's source code to pin down what's going on when there's so many _under the hood_ operations taking place. In this notebook I'll aim to shed some light on all of tricks that we do in GPJax in order to help elucidate the code to anyone wishing to extend GPJax for their own uses.\n" |
| 10 | + ] |
| 11 | + }, |
| 12 | + { |
| 13 | + "cell_type": "markdown", |
| 14 | + "metadata": {}, |
| 15 | + "source": [ |
| 16 | + "## Parameter Transformations" |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "markdown", |
| 21 | + "metadata": {}, |
| 22 | + "source": [ |
| 23 | + "### Motivations\n", |
| 24 | + "\n", |
| 25 | + "Many parameters in a Gaussian process are what we call a _constrained parameter_. By this, we mean that the parameters value is only defined on a subset of $\\mathbb{R}$. One example of this is the lengthscale parameter in any of the stationary kernels. It would not make sense to have a negative lengthscale, and as such the parameter's value is constrained to exist only on the positive real line. \n", |
| 26 | + "\n", |
| 27 | + "Whilst mathematically correct, constrained parameters can become a pain when optimising as many optimisers are designed to operate on an unconstrained space. Further, it can often be computationally inefficient to restrict the search space of an optimiser. For these reasons, we instead transform the constrained parameter to exist in an unconstrained space. Optimisation is then done on this unconstrained parameter before we transform it back when we need to evaluate its value. \n", |
| 28 | + "\n", |
| 29 | + "Only bijective transformations are valid as we cannot afford to lose our original parameter value when transforming. As such, we have to be careful about which transformations we use. Some common choices include the log-exponential bijection and the softplus transform. We, by default, opt for the softplus transformation in GPJax as it less prone to overflowing in comparison to log-exp transformations.\n" |
| 30 | + ] |
| 31 | + }, |
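| 32 | +  { |
| 33 | +   "cell_type": "markdown", |
| 34 | +   "metadata": {}, |
| 35 | +   "source": [ |
| 36 | +    "As a quick numerical illustration of this (a standalone sketch, independent of GPJax's own classes), consider mapping a large unconstrained value back to the constrained space under each bijection" |
| 37 | +   ] |
| 38 | +  }, |
| 39 | +  { |
| 40 | +   "cell_type": "code", |
| 41 | +   "execution_count": null, |
| 42 | +   "metadata": {}, |
| 43 | +   "outputs": [], |
| 44 | +   "source": [ |
| 45 | +    "import jax.numpy as jnp\n", |
| 46 | +    "from jax.nn import softplus\n", |
| 47 | +    "\n", |
| 48 | +    "unconstrained = jnp.array(100.0)\n", |
| 49 | +    "# Log-exp bijection: exponentiation overflows to inf in 32-bit precision.\n", |
| 50 | +    "print(jnp.exp(unconstrained))\n", |
| 51 | +    "# Softplus bijection: grows only linearly in its input, so no overflow.\n", |
| 52 | +    "print(softplus(unconstrained))" |
| 53 | +   ] |
| 54 | +  }, |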
| 32 | + { |
| 33 | + "cell_type": "markdown", |
| 34 | + "metadata": {}, |
| 35 | + "source": [ |
| 36 | + "### Implementation\n", |
| 37 | + "\n", |
| 38 | + "When it comes to implementations, we attach the transformation directly to the `Parameter` class. It is an optional argument that one can specify when instantiating their parameter. To see this, simply consider the following example" |
| 39 | + ] |
| 40 | + }, |
| 41 | + { |
| 42 | + "cell_type": "code", |
| 43 | + "execution_count": 6, |
| 44 | + "metadata": {}, |
| 45 | + "outputs": [], |
| 46 | + "source": [ |
| 47 | + "from gpjax.parameters import Parameter\n", |
| 48 | + "from gpjax.transforms import Softplus\n", |
| 49 | + "import jax.numpy as jnp\n", |
| 50 | + "\n", |
| 51 | + "x = Parameter(jnp.array(1.0), transform=Softplus())" |
| 52 | + ] |
| 53 | + }, |
| 54 | + { |
| 55 | + "cell_type": "markdown", |
| 56 | + "metadata": {}, |
| 57 | + "source": [ |
| 58 | + "Now we know that the softplus transformation operation on an input $x \\in \\mathbb{R}_{>0}$ can be written as \n", |
| 59 | + "$$\\alpha(x) = \\log(\\exp(x)-1)$$\n", |
| 60 | + "where $\\alpha(x) \\in \\mathbb{R}$. In this instance, it can be seen that $\\alpha(1)=0.54$. Now this unconstrained value is stored within the parameter's `value` property" |
| 61 | + ] |
| 62 | + }, |
| 63 | + { |
| 64 | + "cell_type": "code", |
| 65 | + "execution_count": 7, |
| 66 | + "metadata": {}, |
| 67 | + "outputs": [ |
| 68 | + { |
| 69 | + "name": "stdout", |
| 70 | + "output_type": "stream", |
| 71 | + "text": [ |
| 72 | + "0.541324854612918\n" |
| 73 | + ] |
| 74 | + } |
| 75 | + ], |
| 76 | + "source": [ |
| 77 | + "print(x.value)" |
| 78 | + ] |
| 79 | + }, |
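| 80 | +  { |
| 81 | +   "cell_type": "markdown", |
| 82 | +   "metadata": {}, |
| 83 | +   "source": [ |
| 84 | +    "As a quick sanity check, computing $\\alpha(1)$ directly recovers the same value" |
| 85 | +   ] |
| 86 | +  }, |
| 87 | +  { |
| 88 | +   "cell_type": "code", |
| 89 | +   "execution_count": null, |
| 90 | +   "metadata": {}, |
| 91 | +   "outputs": [], |
| 92 | +   "source": [ |
| 93 | +    "# Inverse softplus applied to the original constrained value of 1.0.\n", |
| 94 | +    "print(jnp.log(jnp.exp(1.0) - 1.0))" |
| 95 | +   ] |
| 96 | +  }, |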
| 80 | + { |
| 81 | + "cell_type": "markdown", |
| 82 | + "metadata": {}, |
| 83 | + "source": [ |
| 84 | + "whilst the original constrained value can be computed by accesing the parameter's `untransform` property" |
| 85 | + ] |
| 86 | + }, |
| 87 | + { |
| 88 | + "cell_type": "code", |
| 89 | + "execution_count": 8, |
| 90 | + "metadata": {}, |
| 91 | + "outputs": [ |
| 92 | + { |
| 93 | + "name": "stdout", |
| 94 | + "output_type": "stream", |
| 95 | + "text": [ |
| 96 | + "1.0\n" |
| 97 | + ] |
| 98 | + } |
| 99 | + ], |
| 100 | + "source": [ |
| 101 | + "print(x.untransform)" |
| 102 | + ] |
| 103 | + }, |
| 104 | + { |
| 105 | + "cell_type": "markdown", |
| 106 | + "metadata": {}, |
| 107 | + "source": [ |
| 108 | + "### Custom transformation\n", |
| 109 | + "\n", |
| 110 | + "Should you wish to define your own custom transformation, then this can very easily be done by simply extending the `Transform` class within `gpjax.transforms` and defining a forward transformation and a backward transformation." |
| 111 | + ] |
| 112 | + }, |
| 113 | + { |
| 114 | + "cell_type": "code", |
| 115 | + "execution_count": 9, |
| 116 | + "metadata": {}, |
| 117 | + "outputs": [], |
| 118 | + "source": [ |
| 119 | + "class Transform:\n", |
| 120 | + " def __init__(self, name=\"Transformation\"):\n", |
| 121 | + " self.name = name\n", |
| 122 | + "\n", |
| 123 | + " @staticmethod\n", |
| 124 | + " def forward(x: jnp.ndarray) -> jnp.ndarray:\n", |
| 125 | + " raise NotImplementedError\n", |
| 126 | + "\n", |
| 127 | + " @staticmethod\n", |
| 128 | + " def backward(x: jnp.ndarray) -> jnp.ndarray:\n", |
| 129 | + " raise NotImplementedError" |
| 130 | + ] |
| 131 | + }, |
| 132 | + { |
| 133 | + "cell_type": "markdown", |
| 134 | + "metadata": {}, |
| 135 | + "source": [ |
| 136 | + "The `forward` method is the transformation that maps from a constrained space to an unconstrained space, whilst the `backward` method is the transformation that reverses this. A nice example of this can be seen for the earlier used softplus transformation" |
| 137 | + ] |
| 138 | + }, |
| 139 | + { |
| 140 | + "cell_type": "code", |
| 141 | + "execution_count": 10, |
| 142 | + "metadata": {}, |
| 143 | + "outputs": [], |
| 144 | + "source": [ |
| 145 | + "from jax.nn import softplus\n", |
| 146 | + "\n", |
| 147 | + "class Softplus(Transform):\n", |
| 148 | + " def __init__(self):\n", |
| 149 | + " super().__init__(name='Softplus')\n", |
| 150 | + "\n", |
| 151 | + " @staticmethod\n", |
| 152 | + " def forward(x: jnp.ndarray) -> jnp.ndarray:\n", |
| 153 | + " return jnp.log(jnp.exp(x) - 1.)\n", |
| 154 | + "\n", |
| 155 | + " @staticmethod\n", |
| 156 | + " def backward(x: jnp.ndarray) -> jnp.ndarray:\n", |
| 157 | + " return softplus(x)" |
| 158 | + ] |
| 159 | + }, |
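| 160 | +  { |
| 161 | +   "cell_type": "markdown", |
| 162 | +   "metadata": {}, |
| 163 | +   "source": [ |
| 164 | +    "Following the same pattern, the log-exp bijection mentioned earlier could be written as a custom transformation. Note that this is a hypothetical sketch: `LogExp` is not part of GPJax, and we assume here that `Parameter` will accept any such `Transform` subclass" |
| 165 | +   ] |
| 166 | +  }, |
| 167 | +  { |
| 168 | +   "cell_type": "code", |
| 169 | +   "execution_count": null, |
| 170 | +   "metadata": {}, |
| 171 | +   "outputs": [], |
| 172 | +   "source": [ |
| 173 | +    "class LogExp(Transform):\n", |
| 174 | +    "    def __init__(self):\n", |
| 175 | +    "        super().__init__(name='LogExp')\n", |
| 176 | +    "\n", |
| 177 | +    "    @staticmethod\n", |
| 178 | +    "    def forward(x: jnp.ndarray) -> jnp.ndarray:\n", |
| 179 | +    "        # Log maps the positive (constrained) space onto the real line.\n", |
| 180 | +    "        return jnp.log(x)\n", |
| 181 | +    "\n", |
| 182 | +    "    @staticmethod\n", |
| 183 | +    "    def backward(x: jnp.ndarray) -> jnp.ndarray:\n", |
| 184 | +    "        # Exp maps the real line back onto the positive space.\n", |
| 185 | +    "        return jnp.exp(x)\n", |
| 186 | +    "\n", |
| 187 | +    "y = Parameter(jnp.array(1.0), transform=LogExp())" |
| 188 | +   ] |
| 189 | +  }, |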
| 160 | + { |
| 161 | + "cell_type": "markdown", |
| 162 | + "metadata": {}, |
| 163 | + "source": [ |
| 164 | + "## Prior distributions" |
| 165 | + ] |
| 166 | + }, |
| 167 | + { |
| 168 | + "cell_type": "markdown", |
| 169 | + "metadata": {}, |
| 170 | + "source": [ |
| 171 | + "### Motivations\n", |
| 172 | + "\n", |
| 173 | + "Often when we use Gaussian processes, we do so as they facilitate easily incorporation of prior information into the model. Implicitly, by the very use of a Gaussian process we are incorporating our prior inforation around the functional behaviour of the latent function that we are seeking to recover. However, we can take this one step further by placing priors on the hyperparameters of the Gaussian process. Going into the details of which priors are recommended and how to go about selecting them goes beyond the scope of this article, but it's suffice to say that doing so can greatly enhance the utility of a Gaussian process. \n", |
| 174 | + "\n", |
| 175 | + "At least in my own experience, when priors are placed on the hyperparameters of a Gaussian process they are specified with respect to the constrained parameter value. As an example of this, consider the lengthscale parameter $\\ell \\in \\mathbb{R}_{>0}$. When specifying a prior distribution $p_{0}(\\ell)$, I would typically select a distribution that has support on the positive real line, such as the Gamma distribution. An opposing approach would be to transform the parameter so that it is defined on the entire real line (as discussed in §1) and then specify a prior distribution such as a Gaussian that has an unconstrained support. Deciding which of these two approaches to adopt in GPJax is somewhat a moot point to me, so I've opted for priors to be defined on the constrained parameter. That being said, I'd be more than open to altering this opinion is people felt strongly that priors should be defined on the unconstrained parameter value." |
| 176 | + ] |
| 177 | + }, |
| 178 | + { |
| 179 | + "cell_type": "markdown", |
| 180 | + "metadata": {}, |
| 181 | + "source": [ |
| 182 | + "### Implementation\n", |
| 183 | + "\n", |
| 184 | + "Regarding the implementational details of enabling prior specification, this is hopefully a more lucid concept upon code inspection. As with the earlier discussed parameter transformations, the notion of a prior distribution is acknolwedged in the definition of a parameter. To exactly specify a prior distribution, one should simply call in the relevant distribution from TensorFlow probability's distributions module. For an example of this, consider the parameter `x` that was earlier defined." |
| 185 | + ] |
| 186 | + }, |
| 187 | + { |
| 188 | + "cell_type": "code", |
| 189 | + "execution_count": 22, |
| 190 | + "metadata": {}, |
| 191 | + "outputs": [], |
| 192 | + "source": [ |
| 193 | + "from tensorflow_probability.substrates.jax import distributions as tfd\n", |
| 194 | + "\n", |
| 195 | + "x.prior = tfd.Gamma(concentration = 3., rate = 2.)" |
| 196 | + ] |
| 197 | + }, |
| 198 | + { |
| 199 | + "cell_type": "markdown", |
| 200 | + "metadata": {}, |
| 201 | + "source": [ |
| 202 | + "If we momentarily pause to consider the state of this parameter now, then we have a constrained parameter value with corresponding prior distribution. When it comes to deriving our posterior distribution, then we know that it is proportional to the product of the likelihood and the prior density function. As addition is less prone to numerical overflow than multiplication, we take the log of this produce. The log of a product is just a sum of logs, meaning that our log-posterior is then proportional to the sum of our log-likelihood and the log-prior density. Therefore, to connect the value of our parameter and its respective prior distribution, the only implementational point left to cover is how to evaluate the parameters log-prior density. This can be done through the following `@property`" |
| 203 | + ] |
| 204 | + }, |
| 205 | + { |
| 206 | + "cell_type": "code", |
| 207 | + "execution_count": 24, |
| 208 | + "metadata": {}, |
| 209 | + "outputs": [ |
| 210 | + { |
| 211 | + "name": "stdout", |
| 212 | + "output_type": "stream", |
| 213 | + "text": [ |
| 214 | + "-0.613706111907959\n" |
| 215 | + ] |
| 216 | + } |
| 217 | + ], |
| 218 | + "source": [ |
| 219 | + "print(x.log_density)" |
| 220 | + ] |
| 221 | + }, |
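| 222 | +  { |
| 223 | +   "cell_type": "markdown", |
| 224 | +   "metadata": {}, |
| 225 | +   "source": [ |
| 226 | +    "Note that the density here is evaluated at the parameter's constrained value. As a quick check, evaluating the Gamma prior's log-density directly at the constrained value of 1.0 recovers the same number" |
| 227 | +   ] |
| 228 | +  }, |
| 229 | +  { |
| 230 | +   "cell_type": "code", |
| 231 | +   "execution_count": null, |
| 232 | +   "metadata": {}, |
| 233 | +   "outputs": [], |
| 234 | +   "source": [ |
| 235 | +    "# Log-density of a Gamma(concentration=3, rate=2) prior at x = 1.0.\n", |
| 236 | +    "print(tfd.Gamma(concentration=3., rate=2.).log_prob(1.0))" |
| 237 | +   ] |
| 238 | +  }, |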
| 222 | + { |
| 223 | + "cell_type": "markdown", |
| 224 | + "metadata": {}, |
| 225 | + "source": [ |
| 226 | + "Naturally, should one wish to evaluate the prior density of the parameter, then the exponent can be taken" |
| 227 | + ] |
| 228 | + }, |
| 229 | + { |
| 230 | + "cell_type": "code", |
| 231 | + "execution_count": 26, |
| 232 | + "metadata": {}, |
| 233 | + "outputs": [ |
| 234 | + { |
| 235 | + "name": "stdout", |
| 236 | + "output_type": "stream", |
| 237 | + "text": [ |
| 238 | + "0.5413408768770793\n" |
| 239 | + ] |
| 240 | + } |
| 241 | + ], |
| 242 | + "source": [ |
| 243 | + "print(jnp.exp(x.log_density))" |
| 244 | + ] |
| 245 | + }, |
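| 246 | +  { |
| 247 | +   "cell_type": "markdown", |
| 248 | +   "metadata": {}, |
| 249 | +   "source": [ |
| 250 | +    "Putting the pieces together, the log-posterior targeted during optimisation is then, up to an additive constant, a simple sum. The sketch below is hypothetical: `log_lik` stands in for whichever log-likelihood value is being used, and `params` for a collection of `Parameter` objects" |
| 251 | +   ] |
| 252 | +  }, |
| 253 | +  { |
| 254 | +   "cell_type": "code", |
| 255 | +   "execution_count": null, |
| 256 | +   "metadata": {}, |
| 257 | +   "outputs": [], |
| 258 | +   "source": [ |
| 259 | +    "def log_posterior(log_lik, params):\n", |
| 260 | +    "    # Log-posterior = log-likelihood + sum of log-prior densities (+ const.).\n", |
| 261 | +    "    return log_lik + sum(p.log_density for p in params)" |
| 262 | +   ] |
| 263 | +  }, |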
| 246 | + { |
| 247 | + "cell_type": "markdown", |
| 248 | + "metadata": {}, |
| 249 | + "source": [ |
| 250 | + "## Cholesky decomposition\n", |
| 251 | + "\n" |
| 252 | + ] |
| 253 | + }, |
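| 254 | +  { |
| 255 | +   "cell_type": "markdown", |
| 256 | +   "metadata": {}, |
| 257 | +   "source": [ |
| 258 | +    "The Gram matrices produced by a kernel are symmetric and positive (semi-)definite, so rather than inverting them directly we can work with their Cholesky factor. What follows is a minimal sketch of the standard approach, not necessarily GPJax's exact implementation: a small _jitter_ term is added to the diagonal so that the factorisation succeeds numerically, and solves against the Gram matrix are then carried out as a pair of cheap triangular solves." |
| 259 | +   ] |
| 260 | +  }, |
| 261 | +  { |
| 262 | +   "cell_type": "code", |
| 263 | +   "execution_count": null, |
| 264 | +   "metadata": {}, |
| 265 | +   "outputs": [], |
| 266 | +   "source": [ |
| 267 | +    "import jax.numpy as jnp\n", |
| 268 | +    "from jax.scipy.linalg import solve_triangular\n", |
| 269 | +    "\n", |
| 270 | +    "def cholesky_solve(K: jnp.ndarray, y: jnp.ndarray, jitter: float = 1e-6) -> jnp.ndarray:\n", |
| 271 | +    "    # Jitter the diagonal so that factorising a near-singular K succeeds.\n", |
| 272 | +    "    L = jnp.linalg.cholesky(K + jitter * jnp.eye(K.shape[0]))\n", |
| 273 | +    "    # Solve K x = y as two triangular solves: L z = y, then L^T x = z.\n", |
| 274 | +    "    z = solve_triangular(L, y, lower=True)\n", |
| 275 | +    "    return solve_triangular(L.T, z, lower=False)\n", |
| 276 | +    "\n", |
| 277 | +    "# The log-determinant needed in the marginal log-likelihood also falls\n", |
| 278 | +    "# out of the factor: log|K| = 2 * sum(log(diag(L)))." |
| 279 | +   ] |
| 280 | +  } |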
| 261 | + ], |
| 262 | + "metadata": { |
| 263 | + "kernelspec": { |
| 264 | + "display_name": "gpblocks", |
| 265 | + "language": "python", |
| 266 | + "name": "gpblocks" |
| 267 | + }, |
| 268 | + "language_info": { |
| 269 | + "codemirror_mode": { |
| 270 | + "name": "ipython", |
| 271 | + "version": 3 |
| 272 | + }, |
| 273 | + "file_extension": ".py", |
| 274 | + "mimetype": "text/x-python", |
| 275 | + "name": "python", |
| 276 | + "nbconvert_exporter": "python", |
| 277 | + "pygments_lexer": "ipython3", |
| 278 | + "version": "3.8.5" |
| 279 | + }, |
| 280 | + "toc-autonumbering": false, |
| 281 | + "toc-showcode": false, |
| 282 | + "toc-showmarkdowntxt": false, |
| 283 | + "toc-showtags": false |
| 284 | + }, |
| 285 | + "nbformat": 4, |
| 286 | + "nbformat_minor": 4 |
| 287 | +} |