| Learning Parameters with Only $X$
Let's find all the parameters when only $X$ is given!
Previously we assumed $\pi, a, b$ were known: in other words, we assumed both $X$ and $Z$ were observed
... which is rarely the case in reality
EM Algorithm
Iteratively optimizing $\hat{\pi}, \hat{a}, \hat{b}$ and $Z$
1. Finding the optimized $\hat{\pi}, \hat{a}, \hat{b}$ with $X$
2. Finding the most probable $Z$ with $X, \hat{\pi}, \hat{a}, \hat{b}$
| EM Algorithm
Initialize $\theta^{0}$ to an arbitrary point
Loop until the likelihood converges
- Expectation step
* $q^{t+1}(Z)=argmin_{q}\ KL(\ q(Z)\parallel P(Z \mid X,\ \theta^{t})\ )$
→ $q^{t+1}(Z)=P(Z \mid X,\ \theta^{t})$
... Assign $Z$ by $P(Z \mid X,\ \theta^{t})$
* Assign $Z$ !
- Maximization step
* $\theta^{t+1}=argmax_{\theta}\ Q(\theta, q^{t+1}) = argmax_{\theta}\ E_{q^{t+1}(Z)}\ ln P(X, Z \mid \theta) $
* With $Z$ fixed, there are no unobserved variables
* Same optimization as ordinary MLE
* Since we have both $Z$ and $X$, we can compute $\theta$!
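As a structural sketch of this loop in Python (not from the lecture; `e_step`, `m_step`, and `log_likelihood` are hypothetical callables standing in for the model-specific computations):

```python
def em(theta0, e_step, m_step, log_likelihood, tol=1e-6, max_iter=100):
    """Generic EM loop: alternate E- and M-steps until the likelihood converges.

    e_step(theta)         -> q(Z) = P(Z | X, theta)
    m_step(q)             -> argmax_theta E_q[ln P(X, Z | theta)]
    log_likelihood(theta) -> ln P(X | theta), used only to check convergence
    """
    theta, prev_ll = theta0, float("-inf")
    for _ in range(max_iter):
        q = e_step(theta)              # Expectation step
        theta = m_step(q)              # Maximization step
        ll = log_likelihood(theta)
        if abs(ll - prev_ll) < tol:    # likelihood converged
            break
        prev_ll = ll
    return theta
```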
| EM Algorithm for HMM
Initialize $\pi^{0}, a^{0}, b^{0}$ to an arbitrary point
Loop until the likelihood converges
- Expectation step
* $q^{t+1}(Z)=P(Z \mid X,\pi^{t}, a^{t}, b^{t})$
* Assign $Z$ by $P(Z \mid X,\pi^{t}, a^{t}, b^{t})$
- Maximization step
* $ P(X, Z \mid \pi, a, b) = \pi_{z_1} \prod_{t=2}^{T} a_{z_{t-1}, z_t} \prod_{t=1}^{T}b_{z_t, x_t} $
* $ ln P(X, Z \mid \pi, a, b) = ln \pi_{z_1} + \sum_{t=2}^{T} ln a_{z_{t-1}, z_t} + \sum_{t=1}^{T} ln b_{z_t, x_t} $
* $ E_{q^{t+1}(z)}ln P(X, Z \mid \pi, a, b) = \sum_{Z}q^{t+1}(z)ln P(X, Z \mid \pi, a, b) = \sum_{Z}P(Z \mid X, \pi^t, a^t, b^t) \cdot ln P(X, Z \mid \pi, a, b) $
| Derivation of EM
$$
\begin{align*}
E_{q^{t+1}(z)}ln P(X, Z \mid \pi, a, b) &= \sum_{Z}q^{t+1}(z)ln P(X, Z \mid \pi, a, b) \\
&= \sum_{Z}P(Z \mid X, \pi^t, a^t, b^t) \cdot ln P(X, Z \mid \pi, a, b) \\
&= \sum_{Z}P(Z \mid X, \pi^{t}, a^{t}, b^{t})\left\{ ln \pi_{z_1} + \sum_{t=2}^{T} ln a_{z_{t-1}, z_t} + \sum_{t=1}^{T} ln b_{z_t, x_t}\right\} \\
\end{align*}
$$
Optimize with the constraint $\sum_{i} \pi_{i} = 1$ (and likewise $\sum_{j} a_{i,j} = 1$ and $\sum_{j} b_{i,j} = 1$ for each row of $a$ and $b$)
... Lagrange method!
... working it out ...
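For the $\pi$ update, for example, the skipped Lagrangian step looks like this (a sketch added here, keeping only the $\pi$-dependent term of the objective):
$$
\begin{align*}
\mathcal{L}(\pi, \lambda) &= \sum_{i=1}^{K} P(z_{1}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})\ ln\ \pi_{i} + \lambda \left( \sum_{i=1}^{K} \pi_{i} - 1 \right) \\
\frac{\partial \mathcal{L}}{\partial \pi_{i}} &= \frac{P(z_{1}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})}{\pi_{i}} + \lambda = 0
\quad \Rightarrow \quad \pi_{i} \propto P(z_{1}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t}) \\
\end{align*}
$$
Normalizing so that $\sum_{i}\pi_{i}=1$, and repeating the analogous computation for the rows of $a$ and $b$, gives: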
$$
\begin{align*}
\pi_{i}^{t+1} &= \frac{P(z_{1}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})}{\sum_{j=1}^{K}P(z_{1}^{j}=1 \mid X, \pi^{t}, a^{t}, b^{t})} \\
a_{i,j}^{t+1} &= \frac{\sum_{t=2}^{T}P(z_{t-1}^{i}=1, z_{t}^{j}=1 \mid X, \pi^{t}, a^{t}, b^{t})}{\sum_{t=2}^{T}P(z_{t-1}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})} \\
b_{i,j}^{t+1} &= \frac{\sum_{t=1}^{T}P(z_{t}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t}) \cdot \delta (idx(x_t)=j)}{\sum_{t=1}^{T}P(z_{t}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})} \\
\end{align*}
$$
| Baum Welch Algorithm
The EM algorithm applied to the HMM is known as the Baum-Welch algorithm.
Initialize $\pi^{0}, a^{0}, b^{0}$ to an arbitrary point
Loop until the likelihood converges
- Expectation step
* $q^{t+1}(Z)=P(Z \mid X,\pi^{t}, a^{t}, b^{t})$
* Assign $Z$ by $P(Z \mid X,\pi^{t}, a^{t}, b^{t})$
$$
\begin{align*}
P(z_{t}^{k}=1, X) &= \alpha_{t}^{k} \cdot \beta_{t}^{k} \\
\alpha_{t}^{k} &= b_{k, x_t} \sum_{i} \alpha_{t-1}^{i} \cdot a_{i, k} \\
\beta_{t}^{k} &= \sum_{i} a_{k, i} \cdot b_{i, x_{t+1}} \cdot \beta_{t+1}^{i} \\
P(X) &= \sum_{i} \alpha_{T}^{i} \\
\end{align*}
$$
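As a small illustration, here is one way these recursions might be written with NumPy (a sketch, not the lecture's code; it assumes a single observation sequence of symbol indices and uses the standard initializations $\alpha_{1}^{k} = \pi_{k} b_{k, x_1}$ and $\beta_{T}^{k} = 1$):

```python
import numpy as np

def forward_backward(x, pi, a, b):
    """Forward-backward pass for one observation sequence of symbol indices.

    pi[k]   : initial state probability
    a[i, j] : P(z_t = j | z_{t-1} = i)
    b[k, v] : P(x_t = v | z_t = k)
    """
    x = np.asarray(x)
    T, K = len(x), len(pi)
    alpha, beta = np.zeros((T, K)), np.zeros((T, K))

    # forward:  alpha_t^k = b_{k, x_t} * sum_i alpha_{t-1}^i * a_{i, k}
    alpha[0] = pi * b[:, x[0]]                 # base case: alpha_1^k = pi_k * b_{k, x_1}
    for t in range(1, T):
        alpha[t] = b[:, x[t]] * (alpha[t - 1] @ a)

    # backward: beta_t^k = sum_i a_{k, i} * b_{i, x_{t+1}} * beta_{t+1}^i
    beta[T - 1] = 1.0                          # base case: beta_T^k = 1
    for t in range(T - 2, -1, -1):
        beta[t] = a @ (b[:, x[t + 1]] * beta[t + 1])

    likelihood = alpha[T - 1].sum()            # P(X) = sum_i alpha_T^i
    gamma = alpha * beta / likelihood          # gamma_t^k = P(z_t^k = 1 | X)
    return alpha, beta, gamma, likelihood
```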
- Maximization step
$$
\begin{align*}
\pi_{i}^{t+1} &= \frac{P(z_{1}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})}{\sum_{j=1}^{K}P(z_{1}^{j}=1 \mid X, \pi^{t}, a^{t}, b^{t})} \\
a_{i,j}^{t+1} &= \frac{\sum_{t=2}^{T}P(z_{t-1}^{i}=1, z_{t}^{j}=1 \mid X, \pi^{t}, a^{t}, b^{t})}{\sum_{t=2}^{T}P(z_{t-1}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})} \\
b_{i,j}^{t+1} &= \frac{\sum_{t=1}^{T}P(z_{t}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t}) \cdot \delta (idx(x_t)=j)}{\sum_{t=1}^{T}P(z_{t}^{i}=1 \mid X, \pi^{t}, a^{t}, b^{t})} \\
\end{align*}
$$
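And a matching sketch of the M-step, again not the lecture's code: it takes the $\alpha, \beta, \gamma$ arrays from the E-step above and computes $\xi_{t}(i,j) = P(z_{t-1}^{i}=1, z_{t}^{j}=1 \mid X) = \alpha_{t-1}^{i} \, a_{i,j} \, b_{j, x_t} \, \beta_{t}^{j} / P(X)$ before applying the update equations:

```python
import numpy as np

def baum_welch_m_step(x, a, b, alpha, beta, gamma, likelihood, n_symbols):
    """One M-step given the E-step quantities of a single observation sequence."""
    x = np.asarray(x)
    T, K = gamma.shape

    # xi[t-1, i, j] = P(z_{t-1}^i = 1, z_t^j = 1 | X)
    #               = alpha_{t-1}^i * a_{i,j} * b_{j, x_t} * beta_t^j / P(X)
    xi = np.zeros((T - 1, K, K))
    for t in range(1, T):
        xi[t - 1] = alpha[t - 1][:, None] * a * (b[:, x[t]] * beta[t])[None, :]
    xi /= likelihood

    pi_new = gamma[0] / gamma[0].sum()                        # initial-state update
    a_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]  # transition update

    # emission update: b_{i,j} <- sum_t gamma_t^i [idx(x_t) = j] / sum_t gamma_t^i
    b_new = np.zeros((K, n_symbols))
    for v in range(n_symbols):
        b_new[:, v] = gamma[x == v].sum(axis=0)
    b_new /= gamma.sum(axis=0)[:, None]

    return pi_new, a_new, b_new
```

Alternating `forward_backward` and `baum_welch_m_step` until $P(X)$ stops increasing is the whole Baum-Welch loop; in practice the forward-backward pass is usually done with scaling or in log-space to avoid underflow on long sequences.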
Reference
Lecture by Prof. 문일철
https://www.youtube.com/watch?v=mnUcZbT5E28&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=61