Question 1
For a GMM, $D = \{x_1,\dots,x_N\}$, $\theta = \{\pi_k,\mu_k,\Sigma_k\}_{k=1}^K$
- What is the ML solution of $\theta$ ?
- If $\pi\sim\text{Dir}(N_{10},\dots,N_{K0})$ and $\mu_k\sim\mathcal N(\mu_k\vert m_{k0},\Sigma_{k0})$, what is the MAP solution of $\theta$ ?
- What is $p(x_{N+1}\vert\theta_\text{MAP})$ ?
Solution 1.1
The GMM:
$$p(x\vert\theta)=\sum_{k=1}^K \pi_k\,\mathcal N(x\vert\mu_k,\Sigma_k)$$
The log likelihood function:
$$\ln p(D\vert\theta)=\sum_{n=1}^N \ln\left(\sum_{k=1}^K \pi_k\,\mathcal N(x_n\vert\mu_k,\Sigma_k)\right)$$
Throughout, write $\gamma(z_{nk})=\dfrac{\pi_k\,\mathcal N(x_n\vert\mu_k,\Sigma_k)}{\sum_{j=1}^K \pi_j\,\mathcal N(x_n\vert\mu_j,\Sigma_j)}$ for the responsibilities and $N_k=\sum_{n=1}^N\gamma(z_{nk})$.
Derivative with respect to $\pi_k$ (with a Lagrange multiplier $\lambda$ enforcing $\sum_k \pi_k=1$):
$$\frac{\partial}{\partial \pi_k}\left[\ln p(D\vert\theta)+\lambda\Big(\sum_{j=1}^K \pi_j-1\Big)\right]=\sum_{n=1}^N \frac{\mathcal N(x_n\vert\mu_k,\Sigma_k)}{\sum_{j}\pi_j\,\mathcal N(x_n\vert\mu_j,\Sigma_j)}+\lambda$$
Setting the derivative equal to zero (which forces $\lambda=-N$):
$$\pi_k=\frac{N_k}{N}$$
Derivative with respect to $\mu_k$:
$$\frac{\partial \ln p(D\vert\theta)}{\partial \mu_k}=\sum_{n=1}^N \gamma(z_{nk})\,\Sigma_k^{-1}(x_n-\mu_k)$$
Setting the derivative equal to zero:
$$\mu_k=\frac{1}{N_k}\sum_{n=1}^N \gamma(z_{nk})\,x_n$$
Derivative with respect to $\Sigma_k$, proceeding in the same way, and setting the derivative equal to zero:
$$\Sigma_k=\frac{1}{N_k}\sum_{n=1}^N \gamma(z_{nk})(x_n-\mu_k)(x_n-\mu_k)^\top$$
These equations are not a closed-form solution: the responsibilities $\gamma(z_{nk})$ on the right-hand sides depend on $\theta$ itself. For this non-linear system we can use the EM algorithm to obtain the ML solution of $\theta$ (a runnable sketch follows the steps below):
1. Initialize $\theta$, and evaluate the initial value of the log likelihood.
2. E step. Evaluate the responsibilities using the current parameter values:
$$\gamma(z_{nk})=\frac{\pi_k\,\mathcal N(x_n\vert\mu_k,\Sigma_k)}{\sum_{j=1}^K \pi_j\,\mathcal N(x_n\vert\mu_j,\Sigma_j)}$$
3. M step. Re-estimate the parameters using the current responsibilities:
$$\mu_k^\text{new}=\frac{1}{N_k}\sum_{n=1}^N \gamma(z_{nk})\,x_n,\qquad \Sigma_k^\text{new}=\frac{1}{N_k}\sum_{n=1}^N \gamma(z_{nk})(x_n-\mu_k^\text{new})(x_n-\mu_k^\text{new})^\top,\qquad \pi_k^\text{new}=\frac{N_k}{N}$$
4. Evaluate the log likelihood and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2.
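As an illustration, here is a minimal NumPy/SciPy sketch of this loop. The function name `em_gmm`, the initialization scheme, and the small ridge added to the covariances are choices of this sketch, not from the original.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Minimal EM for a GMM; X is an (N, D) data matrix."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                      # mixing weights
    mu = X[rng.choice(N, K, replace=False)]       # init means from data points
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities gamma(z_nk)
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)        # (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: closed-form updates
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
        # Convergence check on the log likelihood
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma
```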
Solution 1.2
To solve $\pi_k$, maximize the log likelihood plus $\ln\text{Dir}(\pi\vert N_{10},\dots,N_{K0})$ subject to $\sum_k\pi_k=1$; setting the derivative to zero gives
$$\pi_k=\frac{N_k+N_{k0}-1}{N+\sum_{j=1}^K N_{j0}-K}$$
To solve $\mu_k$, set $\frac{\partial}{\partial\mu_k}\big[\ln p(D\vert\theta)+\ln\mathcal N(\mu_k\vert m_{k0},\Sigma_{k0})\big]=0$, which gives
$$\mu_k=\big(N_k\Sigma_k^{-1}+\Sigma_{k0}^{-1}\big)^{-1}\Big(\Sigma_k^{-1}\sum_{n=1}^N\gamma(z_{nk})\,x_n+\Sigma_{k0}^{-1}m_{k0}\Big)$$
To solve $\Sigma_k$, note that no prior is placed on $\Sigma_k$, so its update is the ML one:
$$\Sigma_k=\frac{1}{N_k}\sum_{n=1}^N\gamma(z_{nk})(x_n-\mu_k)(x_n-\mu_k)^\top$$
Similar to 1.1, but the M step is modified for MAP:
M step. Re-estimate the parameters using the current responsibilities, with the three MAP updates above in place of the ML updates.
Iterating the E and M steps to convergence then yields $\theta_\text{MAP}$.
Solution 1.3
The prediction is just the GMM density evaluated at the MAP parameters:
$$p(x_{N+1}\vert\theta_\text{MAP})=\sum_{k=1}^K \pi_k^\text{MAP}\,\mathcal N\big(x_{N+1}\vert\mu_k^\text{MAP},\Sigma_k^\text{MAP}\big)$$
Question 2
For a HMM, $D = \{x_1,\dots,x_N\}$, $\theta = \{\pi,A,\mu_k,\Sigma_k\}_{k=1}^K$
- What is the ML solution of $\theta$ ?
- If $\pi\sim\text{Dir}(N_{10},\dots,N_{K0})$, $A^{(k)}\sim \text{Dir}(M_{10}^{(k)},\dots,M_{K0}^{(k)})$ (a Dirichlet prior on the $k$-th row of $A$), and $\mu_k\sim\mathcal N(\mu_k\vert m_{k0},\Sigma_{k0})$, what is the MAP solution of $\theta$ ?
- What is $p(x_{N+1}\vert\theta_\text{MAP})$ ?
Solution 2.1
The likelihood function:
$$p(D\vert\theta)=\sum_{Z}p(D,Z\vert\theta)=\sum_{Z}\pi_{z_1}\Bigg[\prod_{n=2}^N p(z_n\vert z_{n-1},A)\Bigg]\prod_{n=1}^N \mathcal N(x_n\vert\mu_{z_n},\Sigma_{z_n})$$
Maximizing this directly is intractable because of the sum over all latent sequences $Z$; the EM algorithm provides an efficient framework (sketched in code after the steps below):
1. Initialize $\theta$, and evaluate the initial value of the log likelihood.
2. E step. Use $\gamma(z_n)$ to denote the marginal posterior distribution of a latent variable $z_n$, and $\xi(z_{n-1}, z_n)$ to denote the joint posterior distribution of two successive latent variables; evaluate these quantities with the forward-backward recursions:
$$\gamma(z_n)=\frac{\alpha(z_n)\,\beta(z_n)}{p(D\vert\theta)},\qquad \xi(z_{n-1},z_n)=\frac{\alpha(z_{n-1})\,p(x_n\vert z_n)\,p(z_n\vert z_{n-1})\,\beta(z_n)}{p(D\vert\theta)}$$
3. M step. Re-estimate the parameters using the current responsibilities:
$$\pi_k=\frac{\gamma(z_{1k})}{\sum_{j=1}^K\gamma(z_{1j})},\qquad A_{jk}=\frac{\sum_{n=2}^N\xi(z_{n-1,j},z_{nk})}{\sum_{l=1}^K\sum_{n=2}^N\xi(z_{n-1,j},z_{nl})}$$
$$\mu_k=\frac{\sum_{n=1}^N\gamma(z_{nk})\,x_n}{\sum_{n=1}^N\gamma(z_{nk})},\qquad \Sigma_k=\frac{\sum_{n=1}^N\gamma(z_{nk})(x_n-\mu_k)(x_n-\mu_k)^\top}{\sum_{n=1}^N\gamma(z_{nk})}$$
4. Evaluate the log likelihood function and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2.
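As a concrete reference for step 2, here is a minimal NumPy sketch of the scaled forward-backward recursions. The function name, the row-stochastic convention for $A$, and the precomputed emission-density matrix are assumptions of this sketch, not from the original.

```python
import numpy as np

def forward_backward(pi, A, emis):
    """Scaled forward-backward recursions for the HMM E step.

    pi:   (K,) initial state distribution
    A:    (K, K), A[j, k] = p(z_n = k | z_{n-1} = j)  (row-stochastic)
    emis: (N, K), emis[n, k] = p(x_n | z_n = k)
    Returns gamma (N, K), xi (N-1, K, K), and the log likelihood.
    """
    N, K = emis.shape
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    c = np.zeros(N)                       # per-step scaling factors
    alpha[0] = pi * emis[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, N):                 # forward pass
        alpha[n] = emis[n] * (alpha[n - 1] @ A)
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):        # backward pass
        beta[n] = (A @ (emis[n + 1] * beta[n + 1])) / c[n + 1]
    gamma = alpha * beta                  # marginal posteriors gamma(z_n)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (emis[1:] * beta[1:])[:, None, :]) / c[1:, None, None]
    return gamma, xi, np.log(c).sum()     # sum of log scalings = log p(D)
```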
Solution 2.2
Modify the M step of 2.1 to obtain the MAP solution: with the Dirichlet priors, the $\pi$ and $A$ updates become
$$\pi_k=\frac{\gamma(z_{1k})+N_{k0}-1}{\sum_{j=1}^K\big[\gamma(z_{1j})+N_{j0}-1\big]},\qquad A_{kj}=\frac{\sum_{n=2}^N\xi(z_{n-1,k},z_{nj})+M_{j0}^{(k)}-1}{\sum_{l=1}^K\Big[\sum_{n=2}^N\xi(z_{n-1,k},z_{nl})+M_{l0}^{(k)}-1\Big]}$$
$\mu_k$ takes the same MAP form as in 1.2, and $\Sigma_k$ keeps its ML update since no prior is placed on it. Iterating the E and M steps to convergence then yields $\theta_\text{MAP}$.
Solution 2.3
The prediction averages the next emission over the one-step-ahead state distribution, which comes from the forward variables at the last observation (a small sketch follows):
$$p(x_{N+1}\vert D,\theta_\text{MAP})=\sum_{z_{N+1}}p(x_{N+1}\vert z_{N+1})\sum_{z_N}p(z_{N+1}\vert z_N)\,p(z_N\vert D)=\frac{1}{p(D)}\sum_{z_{N+1}}p(x_{N+1}\vert z_{N+1})\sum_{z_N}p(z_{N+1}\vert z_N)\,\alpha(z_N)$$
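For instance, the last row of the `gamma` returned by the `forward_backward` sketch above equals $p(z_N\vert D)$, and the predictive density reduces to two matrix products. `predict_next` is a hypothetical helper, not from the original.

```python
import numpy as np

def predict_next(gamma_N, A, emis_next):
    """One-step-ahead predictive density p(x_{N+1} | x_1, ..., x_N).

    gamma_N:   (K,) filtered posterior p(z_N | D), e.g. gamma[-1] above
    A:         (K, K), A[j, k] = p(z_{N+1} = k | z_N = j)
    emis_next: (K,) candidate densities p(x_{N+1} | z_{N+1} = k)
    """
    return float(emis_next @ (gamma_N @ A))   # sum over z_N, then over z_{N+1}
```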
Question 3
For a stock market model, $\pi = [0.5,0.5]$, $A=\begin{bmatrix}0.6 & 0.3 \\ 0.4 &0.7\end{bmatrix}$, $B = \begin{bmatrix}0.8&0.1\\0.2&0.9\end{bmatrix}$, $z=\{\text{bull},\text{bear}\}$, $D=\{\text{rise},\text{fall}\}$. Here the columns of $A$ and $B$ index the conditioning state (both matrices are column-stochastic): $A_{ij}=p(z_n=i\vert z_{n-1}=j)$ and $B_{ij}=p(x_n=i\vert z_n=j)$.
If we have an observation of $D = \{\text{fall},\text{fall},\text{rise}\}$:
- What is $p(D\vert \pi,A,B)$?
- What are $p(z_2\vert D,\pi, A,B)$ and $p(z_2,z_3\vert D,\pi,A,B)$, respectively?
- What is the optimal $\{z_1,z_2,z_3\}$?
- If there is a new observation of $x_4 = \text{rise}$, what is $p(x_4\vert D,\pi,A,B)$?
Solution 3.1
Run the forward algorithm, $\alpha_1(k)=\pi_k\,p(x_1\vert z_1=k)$ and $\alpha_n(k)=p(x_n\vert z_n=k)\sum_j \alpha_{n-1}(j)\,p(z_n=k\vert z_{n-1}=j)$. With $x_1=\text{fall}$:
$$\alpha_1(\text{bull})=0.5\times0.2=0.1,\qquad \alpha_1(\text{bear})=0.5\times0.9=0.45$$
Therefore
$$\alpha_2(\text{bull})=0.2\times(0.1\times0.6+0.45\times0.3)=0.039,\qquad \alpha_2(\text{bear})=0.9\times(0.1\times0.4+0.45\times0.7)=0.3195$$
$$\alpha_3(\text{bull})=0.8\times(0.039\times0.6+0.3195\times0.3)=0.0954,\qquad \alpha_3(\text{bear})=0.1\times(0.039\times0.4+0.3195\times0.7)=0.023925$$
So we have
$$p(D\vert\pi,A,B)=\alpha_3(\text{bull})+\alpha_3(\text{bear})=0.0954+0.023925=0.119325$$
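These numbers can be checked with a few lines of NumPy (a minimal sketch; the column-stochastic reading of $A$ and $B$ stated in Question 3 is assumed, hence the `A @ alpha` recursion):

```python
import numpy as np

pi = np.array([0.5, 0.5])                  # (bull, bear)
A = np.array([[0.6, 0.3], [0.4, 0.7]])     # A[i, j] = p(z_n = i | z_{n-1} = j)
B = np.array([[0.8, 0.1], [0.2, 0.9]])     # B[i, j] = p(x = i | z = j); rows: (rise, fall)
obs = [1, 1, 0]                            # fall, fall, rise

alpha = pi * B[obs[0]]                     # alpha_1
for x in obs[1:]:
    alpha = B[x] * (A @ alpha)             # forward recursion
print(alpha, alpha.sum())                  # [0.0954 0.023925] 0.119325
```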
Solution 3.2
Run the backward recursion with $\beta_3(k)=1$:
$$\beta_2(\text{bull})=0.6\times0.8+0.4\times0.1=0.52,\qquad \beta_2(\text{bear})=0.3\times0.8+0.7\times0.1=0.31$$
Then
$$p(z_2\vert D,\pi,A,B)=\frac{\alpha_2(z_2)\,\beta_2(z_2)}{p(D)}:\qquad p(z_2=\text{bull}\vert D)=\frac{0.039\times0.52}{0.119325}\approx0.170,\qquad p(z_2=\text{bear}\vert D)=\frac{0.3195\times0.31}{0.119325}\approx0.830$$
and, since $\beta_3=1$,
$$p(z_2,z_3\vert D,\pi,A,B)=\frac{\alpha_2(z_2)\,p(x_3\vert z_3)\,p(z_3\vert z_2)}{p(D)}\approx
\begin{array}{c|cc}
 & z_3=\text{bull} & z_3=\text{bear}\\\hline
z_2=\text{bull} & 0.157 & 0.013\\
z_2=\text{bear} & 0.643 & 0.187
\end{array}$$
Solution 3.3
Run the Viterbi algorithm, $\omega_n(k)=p(x_n\vert z_n=k)\max_j\big[\omega_{n-1}(j)\,p(z_n=k\vert z_{n-1}=j)\big]$, starting from $\omega_1=\alpha_1=(0.1,\,0.45)$:
$$\omega_2(\text{bull})=0.2\times\max(0.1\times0.6,\,0.45\times0.3)=0.027,\qquad \omega_2(\text{bear})=0.9\times\max(0.1\times0.4,\,0.45\times0.7)=0.2835$$
(both maxima come from $z_1=\text{bear}$)
$$\omega_3(\text{bull})=0.8\times\max(0.027\times0.6,\,0.2835\times0.3)=0.06804,\qquad \omega_3(\text{bear})=0.1\times\max(0.027\times0.4,\,0.2835\times0.7)=0.019845$$
(both maxima come from $z_2=\text{bear}$)
Backtracking from $\arg\max_k\omega_3(k)=\text{bull}$, the optimal $\{z_1,z_2,z_3\}$ is $\{\text{bear},\text{bear},\text{bull}\}$, which maximizes $p(z_1,z_2,z_3\vert D,\pi,A,B)$.
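The same backtracking can be reproduced with a short Viterbi sketch in NumPy (again assuming the column-stochastic convention; variable names are illustrative):

```python
import numpy as np

pi = np.array([0.5, 0.5])                   # (bull, bear)
A = np.array([[0.6, 0.3], [0.4, 0.7]])      # A[i, j] = p(z_n = i | z_{n-1} = j)
B = np.array([[0.8, 0.1], [0.2, 0.9]])      # B[i, j] = p(x = i | z = j)
obs = [1, 1, 0]                             # fall, fall, rise

omega = pi * B[obs[0]]                      # best score per state at n = 1
back = []                                   # backpointers
for x in obs[1:]:
    scores = A * omega                      # scores[i, j]: come from j, land in i
    back.append(scores.argmax(axis=1))
    omega = B[x] * scores.max(axis=1)
path = [int(omega.argmax())]                # best final state
for bp in reversed(back):                   # follow backpointers
    path.append(int(bp[path[-1]]))
path.reverse()
print([["bull", "bear"][k] for k in path])  # ['bear', 'bear', 'bull']
```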