W8.L3-4. Gaussian Mixture Model - Multinomial Distribution
| Multinomial Distribution
Binary variable
- Selecting 0 or 1 → binomial distribution
- Selecting 0, 1, 2, ... → multinomial distribution
How about K options?
Multinomial distribution (A generalization of binomial distribution)
Rolling a die once
One observation: $X_1=(0, 0, 1, 0, 0, 0)$
$P(X \mid \mu) = \prod_{k=1}^{K} \mu_{k}^{x_k}$
such that $\mu_{k} \geqslant 0,\ \sum_{k}\mu_{k}=1$
$\sum_{k} x_{k} = 1 $
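Probability of selecting a particular option: $\mu_{k}$
- Probability of selecting the first option: $\mu_{1}$
- Probability of selecting the second option: $\mu_{2}$
- ...
- Probability of selecting the $k$-th option: $\mu_{k}$
Probability of observing the data $X$ when $\mu$ is given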
$$
\begin{align*}
P(X \mid \mu) &= \prod_{k=1}^{K} \mu_{k}^{x_k} \\
&= \mu_{1}^{x_1} \cdot \mu_{2}^{x_2} \cdot \mu_{3}^{x_3} \cdot \mu_{4}^{x_4} \cdot \mu_{5}^{x_5} \cdot \mu_{6}^{x_6} \\
&= \mu_{3} \\
\end{align*}
$$
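The single-roll case above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `categorical_likelihood` and the loaded-die probabilities in `mu` are illustrative assumptions, not part of the lecture.

```python
import math

def categorical_likelihood(x, mu):
    """Return P(x | mu) = prod_k mu_k ** x_k for a one-hot vector x."""
    if sum(x) != 1 or any(v not in (0, 1) for v in x):
        raise ValueError("x must be a one-hot vector")
    if any(p < 0 for p in mu) or not math.isclose(sum(mu), 1.0):
        raise ValueError("mu must be non-negative and sum to 1")
    return math.prod(p ** v for p, v in zip(mu, x))

# Hypothetical loaded die (assumed values, for illustration only).
mu = [0.1, 0.1, 0.4, 0.1, 0.2, 0.1]
x1 = [0, 0, 1, 0, 0, 0]  # the die shows the third face

likelihood = categorical_likelihood(x1, mu)
print(likelihood)
```

Every factor with $x_k = 0$ contributes $\mu_k^0 = 1$, so only the probability of the selected face survives, which matches the reduction to $\mu_3$.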
Rolling a die 25 times
$N=25$ observations: $X_1, X_2, \dots, X_N$
$m_k$: the number of times the $k^{th}$ option is selected out of $N$ selections
$$P(X \mid \mu) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_{k}^{x_{nk}} = \prod_{k=1}^{K} \mu_{k}^{\sum_{n=1}^{N} x_{nk}} = \prod_{k=1}^{K} \mu_{k}^{m_k}$$
where $m_k = \sum_{n=1}^{N} x_{nk}$
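As a sanity check on the step that collapses the double product into counts, here is a small sketch that simulates $N=25$ rolls and evaluates the likelihood both ways. The probabilities in `mu`, the seed, and all variable names are assumptions made for the example.

```python
import math
import random

K = 6
N = 25
mu = [0.1, 0.1, 0.4, 0.1, 0.2, 0.1]  # hypothetical die, assumed values

rng = random.Random(0)  # fixed seed so the run is repeatable
faces = rng.choices(range(K), weights=mu, k=N)

# One-hot encode each roll: X[n][k] is x_nk.
X = [[1 if k == face else 0 for k in range(K)] for face in faces]

# m_k = sum_n x_nk, the number of times option k was selected.
m = [sum(row[k] for row in X) for k in range(K)]

# Likelihood as a product over observations and options ...
per_observation = math.prod(
    math.prod(mu[k] ** row[k] for k in range(K)) for row in X
)
# ... and as a product over options using only the counts.
by_counts = math.prod(mu[k] ** m[k] for k in range(K))

print(m, per_observation, by_counts)
```

The two values agree up to floating-point rounding, which shows that the counts $m_k$ carry everything the likelihood needs from the data.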
| How to determine the maximum likelihood solution of $\mu$ ?
MLE
Maximize $P(X \mid \mu) = \prod_{k=1}^{K} \mu_{k}^{m_k} $
Subject to $\mu_{k} \geqslant 0,\ \sum_{k}\mu_{k}=1$
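This is a constrained optimization problem. Solving it with the Lagrange multiplier method (maximize $\sum_{k} m_k \ln \mu_k + \lambda \left( \sum_{k} \mu_k - 1 \right)$, which gives $\mu_k = -\frac{m_k}{\lambda}$ and, from the constraint, $\lambda = -N$) yields
$\mu_{k} = \frac{m_k}{N} $
That is, (number of times a particular option was selected) / (total number of selections)
(just like $\frac{a_H}{a_H + a_T}$ in the binomial case)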
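The estimate $\mu_k = m_k / N$ is just a normalized count, which the sketch below makes concrete. The helper name `multinomial_mle` and the ten zero-based rolls are hypothetical, chosen only for illustration.

```python
from collections import Counter

def multinomial_mle(outcomes, K):
    """Return [m_k / N for each k] from outcome indices in 0..K-1."""
    N = len(outcomes)
    if N == 0:
        raise ValueError("need at least one observation")
    counts = Counter(outcomes)  # counts[k] is m_k; missing keys count as 0
    return [counts[k] / N for k in range(K)]

# Hypothetical data: ten die rolls, faces indexed from 0 (assumed values).
rolls = [2, 0, 2, 4, 2, 5, 1, 2, 4, 3]
mu_hat = multinomial_mle(rolls, K=6)
print(mu_hat)
```

Because the counts sum to $N$, the estimate automatically satisfies both constraints: every entry is non-negative and the entries sum to 1.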
| Multivariate Gaussian Distribution
Probability density function of the Gaussian distribution
$$ N(x \mid \mu,\ \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2} $$
$$ N(x \mid \mu,\ \Sigma) = \frac{1}{(2 \pi)^{\frac{D}{2}}}\frac{1}{\left| \Sigma \right| ^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)} $$
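To see the $D$-dimensional density in action, the sketch below evaluates it for the special case $D=2$, where the determinant and inverse of the covariance matrix have simple closed forms. Restricting to two dimensions is an assumption made to stay within the standard library; the function name `gaussian_pdf_2d` is likewise illustrative.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian at x; cov is a 2x2 nested list."""
    (a, b), (b2, c) = cov
    det = a * c - b * b2
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the closed-form
    # inverse of a 2x2 matrix: (1 / det) * [[c, -b], [-b2, a]].
    quad = (c * dx * dx - (b + b2) * dx * dy + a * dy * dy) / det
    D = 2
    norm = (2 * math.pi) ** (D / 2) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm

# Standard 2-D Gaussian evaluated at its mean.
peak = gaussian_pdf_2d((0.0, 0.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]])
print(peak)
```

With a diagonal covariance the quadratic form splits into one term per coordinate, so the density factorizes into a product of one-dimensional Gaussian densities.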
Estimating $\mu$ and $\Sigma$ by MLE gives
$$ \hat{\mu}=\frac{\sum_{n=1}^{N}x_n}{N} $$
$$\hat{\Sigma}=\frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})(x_n - \hat{\mu})^T $$
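The two estimators are the sample mean and the average outer product of the centered data. Below is a minimal standard-library sketch; the function name `gaussian_mle` and the four 2-D points are hypothetical.

```python
def gaussian_mle(data):
    """Return (mu_hat, sigma_hat) for a list of equal-length vectors."""
    N = len(data)
    D = len(data[0])
    # mu_hat: coordinate-wise average of the observations.
    mu_hat = [sum(x[d] for x in data) / N for d in range(D)]
    # sigma_hat[i][j]: average of (x_i - mu_i) * (x_j - mu_j) over the data.
    sigma_hat = [
        [sum((x[i] - mu_hat[i]) * (x[j] - mu_hat[j]) for x in data) / N
         for j in range(D)]
        for i in range(D)
    ]
    return mu_hat, sigma_hat

# Hypothetical observations (assumed values, for illustration only).
points = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0], [7.0, 8.0]]
mu_hat, sigma_hat = gaussian_mle(points)
print(mu_hat, sigma_hat)
```

Note that the covariance estimate divides by $N$, as in the formula, rather than by $N-1$ as the unbiased sample covariance does.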
| Mixture Model
A mixture model connects the two ingredients covered above (multinomial distribution, multivariate Gaussian distribution)
$$ P(x) = \sum_{k=1}^{K} \pi_{k}\cdot N (x \mid \mu_{k},\ \Sigma_{k} ) $$
Mixing coefficients $ \pi_{k} $ :
One of the $K$ normal distributions is chosen with probability $\pi_{k}$
$\sum_{k=1}^{K} \pi_{k} = 1 $, $ 0\leq \pi_k \leq 1 $
So $\pi_{k}$ is a probability as well as a weight
Mixture component $ N (x \mid \mu_k,\ \Sigma_k ) $ :
A distribution for the subpopulation
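The two roles of the mixing coefficients (weight in the density, probability of choosing a component) can be illustrated with a small sketch. It assumes a one-dimensional mixture with two components for simplicity; the parameter values and the names `mixture_pdf` and `sample_mixture` are hypothetical.

```python
import random
from statistics import NormalDist

# Hypothetical 1-D mixture: mixing coefficients and Gaussian components.
pis = [0.3, 0.7]
components = [NormalDist(mu=-2.0, sigma=0.5), NormalDist(mu=3.0, sigma=1.0)]

def mixture_pdf(x):
    """P(x) = sum_k pi_k * N(x | mu_k, sigma_k): pi_k acts as a weight."""
    return sum(pi * comp.pdf(x) for pi, comp in zip(pis, components))

def sample_mixture(rng):
    """Pick a component with probability pi_k, then draw x from it."""
    k = rng.choices(range(len(pis)), weights=pis, k=1)[0]
    comp = components[k]
    return k, rng.gauss(comp.mean, comp.stdev)

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_mixture(rng) for _ in range(10_000)]
share_of_second = sum(k == 1 for k, _ in draws) / len(draws)
sample_mean = sum(x for _, x in draws) / len(draws)
print(mixture_pdf(3.0), share_of_second, sample_mean)
```

The share of draws from the second component should land near $\pi_2 = 0.7$, and the sample mean near $\sum_k \pi_k \mu_k = 1.5$. Choosing the component is exactly the multinomial step, and drawing $x$ from it is the Gaussian step.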
Reference
Lectures by Professor Il-Chul Moon (문일철)
https://www.youtube.com/watch?v=mnUcZbT5E28&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=50
https://www.youtube.com/watch?v=mnUcZbT5E28&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=51