Fisher Information Matrix

Learning a parameter vector $\theta$ by maximum likelihood:

$\theta^{*}=\mathop{\arg\max}_{\theta}\, p(x\vert\theta)$

Here $p(x\vert\theta)$ is the likelihood, which we need to maximize w.r.t. $\theta$.

Score function (the gradient of the log-likelihood, taken w.r.t. $\theta$):

$s(\theta)=\nabla\log p(x\vert\theta)$

Claim. The expected value of the score $s(\theta)=\nabla\log p(x\vert\theta)$ under $p(x\vert\theta)$ is $0$.

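Proof. Assuming the gradient and the integral can be interchanged:

$\mathop{\mathbb{E}}_{p(x\vert\theta)}\left[\nabla\log p(x\vert\theta)\right] =\int p(x\vert\theta)\,\frac{\nabla p(x\vert\theta)}{p(x\vert\theta)}\,\text{d}x =\int\nabla p(x\vert\theta)\,\text{d}x =\nabla\int p(x\vert\theta)\,\text{d}x =\nabla 1 = 0$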
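As a numerical sanity check of the claim (not a substitute for the proof), the sketch below evaluates the expected score exactly for a Bernoulli model, where the expectation is a two-term sum. The Bernoulli model and the helper names `bernoulli_score` and `expected_score` are illustrative assumptions, not part of the original notes.

```python
def bernoulli_score(x, theta):
    """d/dtheta of log p(x|theta), with p(x|theta) = theta**x * (1 - theta)**(1 - x)."""
    return x / theta - (1 - x) / (1 - theta)


def expected_score(theta):
    """Exact expectation of the score under p(x|theta), summing over x in {0, 1}."""
    probs = {0: 1.0 - theta, 1: theta}
    return sum(p * bernoulli_score(x, theta) for x, p in probs.items())


# The expected score should vanish (up to floating-point error) for any valid theta.
for theta in (0.1, 0.5, 0.9):
    print(theta, expected_score(theta))
```

The score itself is nonzero for each outcome; only its average under the model's own distribution cancels.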

The Fisher Information Matrix is the covariance of the score function. Since the score has zero mean, this is:

$\text F=\mathop{\mathbb{E}}_{p(x\vert\theta)}\left[ (s(\theta) - 0)(s(\theta) - 0)^{\text{T}}\right] =\mathop{\mathbb{E}}_{p(x\vert\theta)}\left[\nabla\log p(x\vert\theta)\nabla\log p(x\vert\theta)^{\text{T}}\right] $
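To make the definition concrete, here is a minimal sketch for a case where the expectation can be computed exactly: a categorical distribution parameterized by softmax logits, so the expectation is a finite sum over classes. The model choice and the names `softmax`, `score`, and `exact_fisher` are assumptions made for illustration. For this model the score of class $c$ is $e_c - p$, and the resulting matrix should match the closed form $\text{diag}(p) - pp^{\text{T}}$.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score(c, logits):
    """Gradient of log p(x = c | theta) w.r.t. the logits: one_hot(c) - softmax(theta)."""
    p = softmax(logits)
    return [(1.0 if j == c else 0.0) - p[j] for j in range(len(logits))]


def exact_fisher(logits):
    """F = sum_c p(c|theta) * s_c s_c^T, the exact expectation over all classes."""
    k = len(logits)
    p = softmax(logits)
    F = [[0.0] * k for _ in range(k)]
    for c in range(k):
        s = score(c, logits)
        for i in range(k):
            for j in range(k):
                F[i][j] += p[c] * s[i] * s[j]
    return F


theta = [0.5, -1.0, 2.0]
F = exact_fisher(theta)
for row in F:
    print(row)
```

Each row of this particular matrix sums to zero, because adding a constant to all logits leaves the distribution unchanged; the Fisher matrix is therefore singular for this parameterization.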

Calculating the exact expectation can be hard, so we approximate it using the empirical distribution of the data. Given training data $X =\{ x_1, x_2,\cdots, x_N\}$, the Empirical Fisher Information Matrix is:

$\text F=\frac{1}{N}\sum_{i=1}^{N}\nabla\log p(x_i\vert\theta)\nabla\log p(x_i\vert\theta)^{\text{T}}$
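The empirical Fisher Information Matrix, taken here as the average of the outer products of per-example scores, can be sketched in a few lines. The example assumes a univariate Gaussian with parameters $(\mu, \sigma)$ and draws the data from the model itself, so the result can be compared against the known exact matrix $\text{diag}(1/\sigma^2,\ 2/\sigma^2)$; with real training data the two need not agree. The function names, sample size, and seed are illustrative choices.

```python
import random


def gaussian_score(x, mu, sigma):
    """Gradient of log N(x | mu, sigma^2) w.r.t. (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3
    return [d_mu, d_sigma]


def empirical_fisher(data, mu, sigma):
    """Average of the outer products s_i s_i^T over the data set."""
    k = 2  # number of parameters: (mu, sigma)
    F = [[0.0] * k for _ in range(k)]
    for x in data:
        s = gaussian_score(x, mu, sigma)
        for i in range(k):
            for j in range(k):
                F[i][j] += s[i] * s[j]
    n = len(data)
    return [[v / n for v in row] for row in F]


random.seed(0)
mu, sigma = 1.0, 2.0
# Assumption: the data come from the model at the same theta we evaluate at.
data = [random.gauss(mu, sigma) for _ in range(50_000)]

F_hat = empirical_fisher(data, mu, sigma)
F_exact = [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]
print("empirical:", F_hat)
print("exact:    ", F_exact)
```

The double loop over `i` and `j` inside the loop over the data makes the cost of accumulating the outer products explicit: one $k \times k$ update per example.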

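Note: given the per-example score vectors, calculating the empirical FIM takes $O(Nk^2)$ time, where $N$ is the size of the training set and $k$ is the number of parameters, since it sums $N$ outer products of $k$-dimensional vectors. This does not count the cost of computing the gradients themselves, and storing the matrix takes $O(k^2)$ memory.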