W10.L4. Sampling Based Inference

Study/Lecture - Basic

W10.L4. Sampling Based Inference - Markov Chain

공부해라이 2023. 6. 20. 15:18

| INTRO

ML 계열에서 가장 많이 쓰이는 Sampling Method: Gibbs Sampling

Gibbs Sampling 은 Metropolis-Hastings Alogirhtm 의 Special Case!

| EM Algorithm

Unsupervised Learning 에서

Unobserved variable (Latent variable) $Z$ 에 대하여 Marginalize 하고,

$P(X \mid \theta) $ 를 Maximize 하려고 하니 로그 안에 들어가서 계산이 쉽지 않았었다 ...

$$
\begin{align*}
P(X \mid \theta)&=\sum_{Z}P(X, Z \mid \theta) \\
\rightarrow\ \ \ ln P(X \mid \theta)&=ln \left\{ \sum_{Z}P(X, Z \mid \theta)\right\}
\end{align*}
$$

그리하여 ... Inequality 쓰고 ... EM Algorithm 을 도출했었다!

EM Algorithm

Initialize $\theta^{0}$ to an arbitrary point

Loop until the likelihood converges

- Expectation step

* $q^{t+1}(Z)=argmin_{q}\ KL(\ q(Z)\parallel P(Z \mid X,\ \theta)\ )$

→ $q^{t}(Z)=P(Z \mid X,\ \theta)$

... Assign $Z$ by $P(Z \mid X,\ \theta)$

- Maximization step

* $\theta^{t+1}=argmax_{\theta}\ Q(\theta, q^{t}) = argmax_{\theta}\ E_{q^{t}(Z)}ln P(X, Z \mid \theta^{t}) $

* Fixed $Z$ means that there is no unobserved variables

* Same optimization of ordinary MLE

Expectation Step 에서 Lagrange Optimization 등 계산가능하지만

이번에는 Sampling 으로 한 번 풀어나가보자!

사실은 Sampling-based Inference 가 Gaussian Mixture 이런 모델에 잘 적용되는 것은 아니고 ...

여러 단계의 Latent Variables 순차적으로 연결되어 있는 Latent Direction Allocation 같이

Text 분야의 Soft-clustering 모델 같은 곳에 Gibss Sampler 가 Collapse 되어 잘 쓰일 수 있다 ...

무튼 오늘은 Gaussian Mixture 모델에 어찌 적용되는지 살펴보자!

| Markov Chain

결국 EM 알고리즘에서 $\theta$ 를 Optimization 하는 것이 목적

($\theta$ 를 Opzimiation 하다보면 ... $Z$ 도 나오고 ... 하는 것)

여기서 Optimization 이 아니라!

$Z$ 를 지속적으로 Sampling 해보는 것 입니다!

$Z$ 를 지속적으로 Sampling 하다보니 ... 어느샌가 Optimized $\theta$ 를 구하게 된다라는 것!

예전에 Forward, Backward Sampling ... 등등에서 배운건

'과거에 만든' sampling instance 가 '현재' sampling instance 와 전혀 상관이 없다는 IID 가정이었지만,

이번에 배워볼 Gibbs Sampler 는 Marcov Chain 을 활용해서 Sampling 하게 된다!

(버려졌던 '과거에 만든' sample 을 다시 활용하는 측면)

Marcov Chain 에 대한 이해가 필요하니 한 번 살펴보자.

Each node has a probability distribution of states

Concrete observation of a system: [1 0 0] → the system is at the fisrt state

Stochastic observation of a system: [0.7 0.2 0.1] → the system is likely at the first state

$Z_{t}$ : "어느 정도의 확률로 각 State 에 Assign 될 것인지를 확률분포로 표현"

Ex. 첫 번째 State 0.5 의 확률 / 두 번째 State 0.2 의 확률 / 세 번째 State 0.3 의 확률

... 아주 많은 State ... 계속해서 Stochastically 상태천이하는 그래프

| Properties of Markov Chain

Recurrent: 다시 일어날 수 있는 State (다음번에 다시 등장할 확률이 반드시 존재)

Transient: 다시 일어날 수 없는 State (다음번에 다시 등장할 확률이 X)

| Stationary Distribution

Metropolis-Hastings Alogirhtm 을 동작하게 하는 핵심 내용!

Return Time

특정 Time 0 에서 $i$ 번째 State 에 방문했다가 다시 방문한 경우 중 Minimum (바로 다음번 방문까지의 $n$)

$ RT_i=min\left\{ n>0: X_n=i \mid X_0=i\right\} $

Limit Theorem

If a Markov chain is Irreducible and Ergodic

(모든 state 가 Communicate 하고, Recurrent, Aperiodic 한 경우, 아래와 같이 정의 가능)

$ \pi_i=\lim_{n\rightarrow \infty} T_{i,j}^{\ (n)}=\frac{1}{E[RT_i]} $

모든 State 마다 정의되는 값 (특정 State 에 이 시스템이 있을 확률의 분포)

Stationary Distribution 에서 다음 번 Transition 하더라도 똑같은 Stationary Distribution 이 되는 것

Transition 아주 많이 하다보면 Stationary Distribution 으로 넘어간다!

$\pi_i$ is uniquely determined by the set of equations

- $ \pi_i \geq 0 $
- $ \sum_{i \in S} \pi_{i}=1 $
- $ \pi_j=\sum_{i \in S} \pi_i T_{i,j} $

하나의 식으로 표현하면 ...

$$ \pi (I_{\left| S\right|, \left| S\right|}-T+1_{\left| S\right|, \left| S\right|})=1_{{1, \left| S\right|}} $$

$$ \pi - \pi T + \pi = 1 $$

Balance Equation (Detailed Balance)

Reversible Markov Chain

IF ...

$$\pi_{i} T_{i,j} = \pi_{j} T_{j,i}$$

Reference
문일철 교수님 강의
https://www.youtube.com/watch?v=mnUcZbT5E28&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=65