W10.L8-10. Sampling Based Inference

W10.L8-10. Sampling Based Inference - LDA

공부해라이 2023. 6. 21. 00:15

| INTRO

Gibbs sampler: a parameter inference example

Collapsed Gibbs sampling → Latent Dirichlet Allocation (LDA, Topic modeling)

10 topics in total (lines)

1. Finding topics ultimately means finding clusters (soft clustering)

2. How much weight each topic carries

3. Extracting the group of words that best describes each topic

| Latent Dirichlet Allocation

Bayesian network plate notation

- $\alpha, \beta$ : Prior knowledge (Dirichlet Distribution Prior)

- $w$ : Observed word

- $z$ : topic cluster assignment for each word (TA, plays a very important role)

- $\theta$ : topic assignment for a document (TA, topic distribution in a document)

- $N$ : number of words in one document (e.g., 100 words in a document; iterations)

- $M$ : number of documents (e.g., 10 documents; iterations)

- $\varphi$ : which words each topic uses, i.e., the probability of each word appearing per topic (word distribution in a topic)

- $K$ : number of topics

$\theta, z$ : Multinomial distribution

$z, \varphi$ → determine $w$

| Finding Topic Assignment / Word

Recall the earlier Bayesian network example ...

There was a story of generating Mary's call from the event

Generative Process

$ \theta_{i} \sim \textup{Dir}(\alpha),\ i \in \left\{ 1, ..., M\right\} $

- The document-level TA $\theta$ is generated from the prior ($\alpha$) on the corpus-wide topic distribution

- In other words, a specific document is assigned its own topic distribution from the corpus-wide one

$ \varphi_{k} \sim \textup{Dir}(\beta),\ k \in \left\{ 1, ..., K\right\} $

- Prior knowledge about how much each individual word is used in each topic

$ z_{i,l} \sim \textup{Mult}(\theta_{i}),\ i \in \left\{ 1, ..., M\right\},\ l \in \left\{ 1, ..., N\right\} $

- The document-level TA influences the word-level TA

- Because the draw is Multinomial, $z$ comes out as a choice of one specific soft cluster

- It encodes the probability with which a word A appears under Topic 1

$ w_{i,l} \sim \textup{Mult}(\varphi_{z_{i,l}}),\ i \in \left\{ 1, ..., M\right\},\ l \in \left\{ 1, ..., N\right\} $

- Take the TA of a specific word, look up the corresponding probabilities in $\varphi$, and select one word

A word $w$ is generated from the word-topic distribution $\varphi_{z}$

A topic $z$ is generated from the document-topic distribution $\theta$

The document-topic distribution $\theta$ is generated from the distribution of $\alpha$ (corpus level)

The word-topic distribution $\varphi$ is generated from the distribution of $\beta$
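
The four sampling steps above can be sketched in NumPy as follows. This is only a toy illustration under some assumptions that are not in the lecture notes: a vocabulary size `V`, symmetric scalar priors `alpha` and `beta`, zero-based integer ids for topics and words, and the hypothetical function name `generate_corpus`.

```python
import numpy as np

def generate_corpus(M, N, K, V, alpha, beta, seed=0):
    """Sample a toy corpus by following the four generative steps."""
    rng = np.random.default_rng(seed)
    # theta_i ~ Dir(alpha): one topic distribution per document, shape (M, K)
    theta = rng.dirichlet(np.full(K, alpha), size=M)
    # varphi_k ~ Dir(beta): one word distribution per topic, shape (K, V)
    varphi = rng.dirichlet(np.full(V, beta), size=K)
    z = np.empty((M, N), dtype=int)
    w = np.empty((M, N), dtype=int)
    for i in range(M):
        for l in range(N):
            # z_{i,l} ~ Mult(theta_i): pick a topic for this word slot
            z[i, l] = rng.choice(K, p=theta[i])
            # w_{i,l} ~ Mult(varphi_{z_{i,l}}): pick a word from that topic
            w[i, l] = rng.choice(V, p=varphi[z[i, l]])
    return theta, varphi, z, w

# 10 documents with 100 words each, as in the plate-notation example;
# K, V, alpha, beta are arbitrary illustrative values.
theta, varphi, z, w = generate_corpus(M=10, N=100, K=3, V=20, alpha=0.5, beta=0.5)
```

In real inference only `w` is observed; `theta`, `varphi`, and `z` are exactly the quantities that have to be recovered.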

If we can assign $z, \theta, \varphi$ well, the problem is solved

The random variable closest to the evidence is $Z$

If we pretend that $Z$ is known as though it had been observed, $\theta, \varphi$ can be estimated

The Z assignment matters most!

If we have Z distribution, we can find the most likely $\theta$ and $\varphi$

Finding the most likely allocation of Z is the key of inference on $\theta$ and $\varphi$
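
To make "once Z is known, $\theta$ and $\varphi$ can be estimated" concrete, here is a small sketch that counts assignments and smooths them with the priors. It assumes the usual Dirichlet posterior-mean estimate (counts plus prior, then normalized), which is a standard conjugacy result and not something derived in these notes; the function name `estimate_theta_varphi`, the symmetric scalar priors, and the tiny hand-made data are all hypothetical.

```python
import numpy as np

def estimate_theta_varphi(w, z, K, V, alpha, beta):
    """Estimate theta and varphi from word ids w and topic assignments z."""
    M, N = w.shape
    doc_topic = np.zeros((M, K))   # how often topic k is assigned in document j
    topic_word = np.zeros((K, V))  # how often word v is assigned to topic k
    for j in range(M):
        for l in range(N):
            doc_topic[j, z[j, l]] += 1
            topic_word[z[j, l], w[j, l]] += 1
    # Posterior mean of a Dirichlet: (count + prior) normalized per row
    theta = (doc_topic + alpha) / (doc_topic + alpha).sum(axis=1, keepdims=True)
    varphi = (topic_word + beta) / (topic_word + beta).sum(axis=1, keepdims=True)
    return theta, varphi

# 2 documents, 4 words each, 2 topics, 3 word types (made-up data)
w = np.array([[0, 0, 1, 2], [2, 2, 1, 0]])
z = np.array([[0, 0, 0, 1], [1, 1, 1, 0]])
theta_hat, varphi_hat = estimate_theta_varphi(w, z, K=2, V=3, alpha=1.0, beta=1.0)
```

Document 0 has three words assigned to topic 0 and one to topic 1, so with `alpha=1.0` its estimated topic distribution is (4/6, 2/6).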

| Gibbs Sampling on Z (1)

Finding the most likely allocation on Z → Gibbs sampling

https://s-h-s-f.tistory.com/27 (Factorization)

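Ordinarily $P(\varphi_{i}, \beta) = P( \varphi_{i} \mid \beta) \cdot P(\beta)$; writing $P(\varphi_{i} ; \beta)$ with a semicolon instead marks $\beta$ as a fixed prior that does not change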

Factorization

$$ P(W, Z, \theta, \varphi ; \alpha, \beta)=\prod_{i=1}^{K}P(\varphi_i;\beta) \prod_{j=1}^{M}P(\theta_j;\alpha)\prod_{l=1}^{N}P(Z_{j,l} \mid \theta_{j}) P(W_{j,l} \mid \varphi_{Z_{j,l}}) $$
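
As a rough check of how the factorized joint is evaluated, the sketch below sums the log of every factor for given values of all variables. The helper names `log_dirichlet_pdf` and `log_joint`, the zero-based ids, and passing $\alpha$, $\beta$ as full vectors are assumptions made for this illustration only.

```python
import math
import numpy as np

def log_dirichlet_pdf(x, conc):
    """Log density of Dir(conc) evaluated at the probability vector x."""
    conc = np.asarray(conc, dtype=float)
    log_norm = math.lgamma(conc.sum()) - sum(math.lgamma(c) for c in conc)
    return log_norm + float(np.sum((conc - 1.0) * np.log(x)))

def log_joint(w, z, theta, varphi, alpha, beta):
    """log P(W, Z, theta, varphi; alpha, beta) following the factorization."""
    # prod_i P(varphi_i; beta)
    total = sum(log_dirichlet_pdf(varphi[i], beta) for i in range(varphi.shape[0]))
    for j in range(theta.shape[0]):
        # P(theta_j; alpha)
        total += log_dirichlet_pdf(theta[j], alpha)
        for l in range(w.shape[1]):
            total += math.log(theta[j, z[j, l]])         # P(Z_{j,l} | theta_j)
            total += math.log(varphi[z[j, l], w[j, l]])  # P(W_{j,l} | varphi_{Z_{j,l}})
    return total

# One document with two words, two topics, two word types (made-up values)
theta = np.array([[0.5, 0.5]])
varphi = np.array([[0.9, 0.1], [0.2, 0.8]])
z = np.array([[0, 1]])
w = np.array([[0, 1]])
value = log_joint(w, z, theta, varphi, alpha=[1.0, 1.0], beta=[1.0, 1.0])
```

With flat priors the Dirichlet densities equal 1, so the joint here reduces to $0.5 \cdot 0.9 \cdot 0.5 \cdot 0.8 = 0.18$.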

Collapsed Gibbs sampling: to keep things simple, let us drop $\theta$ and $\varphi$

(later, $\theta$ and $\varphi$ can be estimated through Z)

Forcing them out of the joint above means marginalization!

$$ P(W, Z ; \alpha, \beta)=\int_{\theta} \int_{\varphi}\ \prod_{i=1}^{K}P(\varphi_i;\beta) \prod_{j=1}^{M}P(\theta_j;\alpha)\prod_{l=1}^{N}P(Z_{j,l} \mid \theta_{j}) P(W_{j,l} \mid \varphi_{Z_{j,l}}) d\varphi d\theta $$

Fortunately, $\theta$ and $\varphi$ appear in separate, independent factors, so the expression can be written as a product of independent integrals

$$ P(W, Z ; \alpha, \beta)=\int_{\varphi}\ \prod_{i=1}^{K}P(\varphi_i;\beta) \prod_{j=1}^{M}\prod_{l=1}^{N} P(W_{j,l} \mid \varphi_{Z_{j,l}})\ d\varphi \cdot \int_{\theta} \ \prod_{j=1}^{M}P(\theta_j;\alpha) \prod_{l=1}^{N}P(Z_{j,l} \mid \theta_{j})\ d\theta $$

To go further and eliminate the integrals ...

$$
\begin{align*}
\int_{\varphi}\ \prod_{i=1}^{K}P(\varphi_i;\beta) \prod_{j=1}^{M}\prod_{l=1}^{N} P(W_{j,l} \mid \varphi_{Z_{j,l}})\ d\varphi &= \prod_{i=1}^{K} \int_{\varphi_{i}} P(\varphi_i;\beta) \prod_{j=1}^{M}\prod_{l=1}^{N} P(W_{j,l} \mid \varphi_{Z_{j,l}})\ d\varphi_{i} \\
&= \prod_{i=1}^{K} \int_{\varphi_{i}} \frac{\Gamma(\sum_{v=1}^{V} \beta_{v})}{\prod_{v=1}^{V}\Gamma(\beta_{v})} \prod_{v=1}^{V}\varphi_{i,v}^{\ \beta_{v}-1} \prod_{j=1}^{M}\prod_{l=1}^{N} P(W_{j,l} \mid \varphi_{Z_{j,l}})\ d\varphi_{i} \\
\end{align*}
$$
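
The integrand in the last line is a Dirichlet density multiplied by multinomial likelihood terms, which is why such an integral can be removed at all. The sketch below is a numerical sanity check of the standard Dirichlet-multinomial identity $\int \textup{Dir}(x;\beta)\prod_{v} x_{v}^{\,n_{v}}\,dx = B(\beta+n)/B(\beta)$, where $B$ is the multivariate Beta function. The identity is a textbook fact brought in for illustration, not a step taken from these notes, and the per-word counts `counts` ($n_v$) and all function names are hypothetical.

```python
import math
import numpy as np

def log_beta_fn(conc):
    """Log of the multivariate Beta function B(conc)."""
    return sum(math.lgamma(c) for c in conc) - math.lgamma(sum(conc))

def closed_form(beta, counts):
    """Integral of Dir(x; beta) * prod_v x_v**counts_v over the simplex."""
    post = [b + n for b, n in zip(beta, counts)]
    return math.exp(log_beta_fn(post) - log_beta_fn(beta))

def monte_carlo(beta, counts, num_samples=200_000, seed=0):
    """Estimate the same integral by averaging over Dirichlet samples."""
    rng = np.random.default_rng(seed)
    x = rng.dirichlet(beta, size=num_samples)
    return float(np.mean(np.prod(x ** np.asarray(counts), axis=1)))

beta = [1.0, 1.0, 1.0]   # flat prior over 3 word types (illustrative)
counts = [2, 1, 0]       # made-up word counts assigned to one topic
exact = closed_form(beta, counts)
approx = monte_carlo(beta, counts)
```

For these values the closed form is exactly 1/30, and the sampled average should land close to it.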

...

Reference
Lecture by Prof. Il-Chul Moon (문일철)
https://www.youtube.com/watch?v=mnUcZbT5E28&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=69
https://www.youtube.com/watch?v=mnUcZbT5E28&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=70
https://www.youtube.com/watch?v=mnUcZbT5E28&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=71