Description
Implement one or more forms of the matrix normal distribution. The general distribution is of the form

```
matrix_normal(matrix[N, P] y | matrix[N, P] mu, cov_matrix[N, N] SigmaRow, cov_matrix[P, P] SigmaCol),
```

where `SigmaRow` and `SigmaCol` are positive definite covariance matrices. The covariance matrix `SigmaRow` controls the covariance of the rows of `y`, whereas `SigmaCol` controls the covariance of the columns.
There are three design decisions about how `SigmaRow` and `SigmaCol` are parameterized, which are whether to use

- diagonal or dense matrices,
- covariance or precision (inverse covariance), and
- Cholesky factors or full matrices.
There is also the issue of how to resolve the non-identifiability of scale between the two matrices, because

```
matrix_normal(y | mu, SigmaRow, SigmaCol) == matrix_normal(y | mu, c * SigmaRow, (1 / c) * SigmaCol).
```
The log density can be expressed as follows (note that the determinant of the `N x N` row covariance picks up the factor `P` and vice versa), but this is not how we'd compute it.

```
log matrix_normal(y | mu, SigmaRow, SigmaCol)
  = - 1 / 2 * tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
    - P / 2 * log det(SigmaRow)
    - N / 2 * log det(SigmaCol)
    - N * P / 2 * log(2 * pi())
```
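For reference, a naive transcription of that formula into a Stan user-defined function might look like the following. The function name and signature here are just illustrative, and this version would only be useful for testing more efficient implementations against.

```stan
functions {
  // Naive reference implementation of the matrix normal log density.
  // For testing only: it inverts and takes determinants of both
  // covariance matrices directly, which is slow and less stable.
  real matrix_normal_naive_lpdf(matrix y, matrix mu,
                                matrix SigmaRow, matrix SigmaCol) {
    int N = rows(y);
    int P = cols(y);
    return -0.5 * trace(inverse_spd(SigmaCol) * (y - mu)'
                        * inverse_spd(SigmaRow) * (y - mu))
           - 0.5 * P * log_determinant(SigmaRow)
           - 0.5 * N * log_determinant(SigmaCol)
           - 0.5 * N * P * log(2 * pi());
  }
}
```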
If `LRow` and `LCol` are Cholesky factors of `SigmaRow` and `SigmaCol` so that, e.g., `SigmaRow = LRow * LRow'`, and `z` is an `N x P` matrix with entries `z[i, j] ~ normal(0, 1)`, then

```
y = mu + LRow * z * LCol' ~ matrix_normal(mu, SigmaRow, SigmaCol)
```
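A minimal Stan sketch of an RNG built on that representation; the name `matrix_normal_cholesky_rng` and the signature are my choices, not a settled design:

```stan
functions {
  // Draws one N x P variate given the mean and the Cholesky factors
  // LRow, LCol of SigmaRow and SigmaCol. Name and signature are
  // illustrative, not an existing Stan API.
  matrix matrix_normal_cholesky_rng(matrix mu, matrix LRow, matrix LCol) {
    int N = rows(mu);
    int P = cols(mu);
    matrix[N, P] z;
    for (j in 1:P)
      for (i in 1:N)
        z[i, j] = normal_rng(0, 1);
    return mu + LRow * z * LCol';
  }
}
```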
As shown on the Wikipedia page for matrix normal, we can evaluate that trace term as

```
tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
  = (vec(y) - vec(mu))' * inv(SigmaCol kronecker SigmaRow) * (vec(y) - vec(mu))
```

where `vec(y)` is the column-major flattening of `y`. We then want to use the property of Kronecker products that

```
inv(SigmaCol kronecker SigmaRow) = inv(SigmaCol) kronecker inv(SigmaRow).
```
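This identity also gives a way to test any implementation against `multi_normal_lpdf` on the flattened matrix. Stan has no built-in Kronecker product, so a check would need a helper like the following (the name is illustrative); this is only practical for small `N` and `P`:

```stan
functions {
  // Kronecker product A (x) B, for testing the vec identity on small
  // matrices; not part of Stan's built-in library.
  matrix kronecker_prod(matrix A, matrix B) {
    matrix[rows(A) * rows(B), cols(A) * cols(B)] C;
    int m = rows(B);
    int n = cols(B);
    for (i in 1:rows(A))
      for (j in 1:cols(A))
        C[(i - 1) * m + 1 : i * m, (j - 1) * n + 1 : j * n] = A[i, j] * B;
    return C;
  }
}
```

Then `matrix_normal_naive_lpdf(y | mu, SigmaRow, SigmaCol)` should match `multi_normal_lpdf(to_vector(y) | to_vector(mu), kronecker_prod(SigmaCol, SigmaRow))`, since `to_vector` flattens column-major.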
@bgoodri suggested going further with Cholesky factorizations `LRow` and `LCol`, using the rule for the inverse of a product of square matrices, `inv(A * B) = inv(B) * inv(A)`, and the trace-of-a-product rule, `tr(A' * B) = vec(A)' * vec(B)`, to derive

```
tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
  = tr(inv(LCol * LCol') * (y - mu)' * inv(LRow * LRow') * (y - mu))
  = tr(inv(LCol') * inv(LCol) * (y - mu)' * inv(LRow') * inv(LRow) * (y - mu))
  = tr((LCol' \ LCol \ (y - mu)') * (LRow' \ LRow \ (y - mu)))
  = dot_product(vec((LCol' \ LCol \ (y - mu)')'), vec(LRow' \ LRow \ (y - mu)))
```

where the last step applies the trace rule with `A = (LCol' \ LCol \ (y - mu)')'`, so that both arguments to `vec` are `N x P`.
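Putting the pieces together, a sketch of the Cholesky-parameterized density in Stan might look like this. The name `matrix_normal_cholesky_lpdf` is my choice, and the trace term is computed as a sum of squares of a single solved matrix, which is a slight rearrangement of the dot product above:

```stan
functions {
  // Matrix normal log density parameterized by Cholesky factors.
  // Uses only triangular solves and log-diagonals: no explicit
  // inverses or determinants. Name and signature are illustrative.
  real matrix_normal_cholesky_lpdf(matrix y, matrix mu,
                                   matrix LRow, matrix LCol) {
    int N = rows(y);
    int P = cols(y);
    // W = inv(LRow) * (y - mu)
    matrix[N, P] W = mdivide_left_tri_low(LRow, y - mu);
    // Vt = inv(LCol) * W', so dot_self(to_vector(Vt)) is the trace term
    matrix[P, N] Vt = mdivide_left_tri_low(LCol, W');
    // log det Sigma = 2 * sum(log(diagonal(L))) when Sigma = L * L'
    return -0.5 * dot_self(to_vector(Vt))
           - P * sum(log(diagonal(LRow)))
           - N * sum(log(diagonal(LCol)))
           - 0.5 * N * P * log(2 * pi());
  }
}
```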
Identifiability will be left up to the user. One way to achieve it is to define one of the matrices to be a correlation matrix with a simplex of scales. That's most efficient when done with a simple product with a Cholesky factor rather than a quadratic form for the full correlation matrix; see the sketch below. A second, more symmetric way to achieve a proper posterior is to decompose both covariance matrices into scales times Cholesky factors and put priors on both sets of scales.
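A minimal sketch of the first approach, assuming the column covariance is the one pinned down (all names here are illustrative):

```stan
data {
  int<lower=1> N;
  int<lower=1> P;
}
parameters {
  cholesky_factor_cov[N] LRow;       // rows carry the overall scale
  cholesky_factor_corr[P] LColCorr;  // columns: correlation only
  simplex[P] col_scales;             // summing to 1 pins the column scale
}
transformed parameters {
  // Cholesky factor of SigmaCol via a cheap diagonal pre-multiply
  // rather than forming the full correlation matrix.
  matrix[P, P] LCol = diag_pre_multiply(col_scales, LColCorr);
}
```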
There is also a matrix Student-t distribution, which should just follow whatever matrix normal does.