From vector calculus we're familiar with the notion of the gradient of a function \(f:\RR^n\to\RR\), \(\nabla f\). In Cartesian coordinates it is the vector of partial derivatives, \((\partial f/\partial x_1,\partial f/\partial x_2,\dots,\partial f/\partial x_n)^\top\), and points in the direction of the greatest rate of increase of \(f\). If we think of \(f\) as a real-valued function of real \(m\times n\) matrices, \(f:\text{Mat}_{m,n}(\RR)\to\RR\), then \(\nabla f\) is an \(m\times n\) matrix, \begin{equation*}\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}}\\\frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{2n}}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}}\end{pmatrix}.\end{equation*}
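To make the matrix form of the gradient concrete, here is a minimal numerical sketch. It assumes NumPy and uses an example function chosen purely for illustration, the squared Frobenius norm \(f(X)=\sum_{ij}x_{ij}^2\), whose gradient should be \(2X\). The helper name `numerical_gradient` and the step size `h` are hypothetical choices, not anything from the post itself.

```python
import numpy as np

def f(X):
    """Illustrative scalar function of a matrix: the squared Frobenius norm."""
    return np.sum(X ** 2)

def numerical_gradient(func, X, h=1e-6):
    """Approximate each partial derivative df/dx_ij by a central difference.

    The result has the same m x n shape as X, matching the matrix of
    partial derivatives described above.
    """
    grad = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X, dtype=float)
            E[i, j] = h  # perturb only the (i, j) entry
            grad[i, j] = (func(X + E) - func(X - E)) / (2 * h)
    return grad

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
G = numerical_gradient(f, X)
print(G)  # should be close to 2 * X
```

The point of the sketch is only that the gradient inherits the shape of its argument: a \(2\times 3\) input gives a \(2\times 3\) matrix of partial derivatives.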
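If an \(N\)-dimensional random vector \(\mathbf{X}=(X_1,X_2,\dots,X_N)^\top\) is distributed according to a multivariate Gaussian distribution with mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\mathbf{\Sigma}\) such that \(\Sigma_{ij}=\cov(X_i,X_j)\) we write \(\mathbf{X}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})\) and the probability density function is given by \begin{equation*}p(\mathbf{x}|\boldsymbol{\mu},\mathbf{\Sigma})=\frac{1}{(2\pi)^{N/2}|\mathbf{\Sigma}|^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).\end{equation*}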
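As a rough illustration, the multivariate Gaussian density can be evaluated in a few lines. This sketch assumes NumPy and the standard form of the density, \((2\pi)^{-N/2}|\mathbf{\Sigma}|^{-1/2}\exp\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)\); the function name `gaussian_pdf` and the example values are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of an N-dimensional Gaussian with mean mu and covariance Sigma."""
    N = mu.shape[0]
    diff = x - mu
    # Solve Sigma z = diff instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Illustrative values: a standard Gaussian in two dimensions.
mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
p0 = gaussian_pdf(mu, mu, Sigma)  # density at the mean
print(p0)
```

For the standard two-dimensional case the density at the mean reduces to \(1/(2\pi)\), which gives a quick sanity check on the normalising constant.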
By way of "warming up", in this and the next two posts we'll review some of the foundational material (mostly from probability and stats) we'll need. This review is in no way comprehensive. On the contrary it is highly selective and brief.
Bayes' theorem
Bayes' theorem is absolutely critical so let's kick off by recalling it, \begin{equation*}P(A|B)=\frac{P(B|A)P(A)}{P(B)}.\end{equation*}
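A quick numerical sketch may help fix ideas. The numbers below are made up for illustration (a diagnostic test with a 1% prior, 95% sensitivity and a 5% false-positive rate), and the variable names are hypothetical; \(P(B)\) is obtained from the law of total probability.

```python
# Illustrative, made-up probabilities: A = "has the condition", B = "test is positive".
p_A = 0.01             # prior P(A)
p_B_given_A = 0.95     # likelihood P(B|A)
p_B_given_notA = 0.05  # false-positive rate P(B|not A)

# Law of total probability gives the evidence P(B).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)
```

With these assumed numbers the posterior comes out at roughly 16%, far below the 95% sensitivity, because the prior \(P(A)\) is small.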