| Overfitting and Underfitting
There is a trade-off between the complexity of a model and its ability to generalize beyond the given dataset
| Bias and Variance
Sources of Error in Machine Learning
* Approximation: error that remains even after fitting the model as hard as we can
* Generalization: error on future, unseen data
$E_{out} \leqslant E_{in} + \Omega$
* $E_{out}$ is the error on future (out-of-sample) data, i.e., the estimation error
* $E_{in}$ is the in-sample error, coming from the approximation made by the learning algorithm (approximation)
* $\Omega$ is the error caused by the variance of the observations (generalization)
Symbols
* $f$ : the target function to learn
* $g$ : the learning function (hypothesis) of the ML algorithm
* $g^{(D)}$ : the function learned using a dataset $D$, i.e., one instance of the hypothesis
* $D$ : an available dataset drawn from the real world
* $\bar{g}$ : the average hypothesis over an infinite number of datasets $D$
- Formally, $\bar{g}(x) = E_D\left[ g^{(D)}(x) \right]$
$E_{out}\left[ g^{(D)}(x) \right] = E_X\left[ \left( g^{(D)}(x)-f(x) \right)^2 \right]$
Expanding the expression above and taking the expectation over datasets $D$, we get
$E_D\left[ E_{out}\left[ g^{(D)}(x) \right] \right]=E_X\left[ E_D\left[ \left( g^{(D)}(x)-\bar{g}(x) \right)^2 \right] +\left( \bar{g}(x)-f(x) \right)^2 \right]$
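To see where this decomposition comes from, here is a brief sketch of the key step (standard algebra, using only the symbols defined above): add and subtract $\bar{g}(x)$ inside the square, and the cross term vanishes because $E_D\left[ g^{(D)}(x) \right] = \bar{g}(x)$.
$$ E_D\left[ \left( g^{(D)}(x)-f(x) \right)^2 \right] = E_D\left[ \left( g^{(D)}(x)-\bar{g}(x) \right)^2 \right] + \left( \bar{g}(x)-f(x) \right)^2 + 2\left( \bar{g}(x)-f(x) \right) E_D\left[ g^{(D)}(x)-\bar{g}(x) \right] $$
The last expectation is zero, so only the variance and squared-bias terms survive; taking $E_X$ of both sides gives the equation above.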
Let's define
$\textup{Variance}(x)= E_D\left [ \left ( g^{(D)}(x)-\bar{g}(x) \right )^2 \right ] $
$\textup{Bias}^2(x)=\left ( \bar{g}(x)-f(x) \right )^2 $
Semantically, what do they mean?
* Variance is the inability to train the model toward the average hypothesis, because only a limited dataset is available (Generalization)
* Bias is the inability of the average hypothesis to match the real world
(Even if we could see all the data in the world, there is still a limit to how well we can approximate the true function.)
How to reduce the bias and the variance?
* Reducing the variance → Collecting more data
* Reducing the bias → More complex model
However, if we reduce the bias, we increase the variance, and vice versa
* Bias and Variance Dilemma
* We will see why this happens through an empirical evaluation in the next section ...
| Occam's Razor
Zero-degree (constant) line vs. one-degree (linear) line
A complex model has a higher variance and a lower bias
A simple model has a lower variance and a higher bias
→ We need to balance the complexity of an ML algorithm
(In reality we cannot know the true function, so we cannot actually compute the bias and the variance.)
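Although we cannot compute them in practice, we can in a simulation where the true function is chosen by us. The sketch below is a minimal illustration (the true function $f(x)=\sin(\pi x)$, the dataset size, and the number of datasets are my own assumptions, not from the lecture): it fits a zero-degree and a one-degree polynomial to many small datasets and estimates the variance and squared bias of each.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed "true" target function for the simulation (my choice, not from the lecture).
    return np.sin(np.pi * x)

n_datasets = 2000    # number of datasets D to draw
n_points = 2         # tiny datasets, so the variance is easy to see
x_grid = np.linspace(-1, 1, 200)

for degree, name in [(0, "zero-degree (constant)"), (1, "one-degree (linear)")]:
    predictions = np.empty((n_datasets, x_grid.size))
    for d in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)       # one dataset D
        y = f(x)                               # noiseless targets, for simplicity
        coeffs = np.polyfit(x, y, deg=degree)  # g^(D): the hypothesis fitted on D
        predictions[d] = np.polyval(coeffs, x_grid)

    g_bar = predictions.mean(axis=0)                  # average hypothesis g_bar(x)
    variance = ((predictions - g_bar) ** 2).mean()    # E_D[(g^(D)(x) - g_bar(x))^2], averaged over x
    bias_sq = ((g_bar - f(x_grid)) ** 2).mean()       # (g_bar(x) - f(x))^2, averaged over x
    print(f"{name}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

With these assumed settings, the constant model typically comes out with the higher bias and the lower variance, and the linear model the opposite, which is exactly the trade-off described above.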
Occam's Razor
If there are several candidate models, it is better to choose the one with fewer assumptions, i.e., the simpler model.
(This lowers the risk on new, unseen data.)
| Cross Validation
An infinite number of samples is impossible in reality, but we can simulate it.
Example: N-fold cross validation → lets us approximate the average hypothesis (see the sketch below).
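A minimal sketch of N-fold cross validation in plain NumPy, assuming a made-up linear toy dataset and ordinary least squares as the model (both are illustration choices, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (made up for illustration): y = 2x + noise.
X = rng.uniform(-1, 1, size=(100, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.3, size=100)

def kfold_mse(X, y, n_folds=5):
    """Estimate the test MSE of plain least-squares regression with N-fold cross validation."""
    indices = rng.permutation(len(X))
    folds = np.array_split(indices, n_folds)
    fold_errors = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Fit on the N-1 training folds (bias column + least squares).
        A_train = np.c_[np.ones(len(train_idx)), X[train_idx]]
        w, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Evaluate on the held-out fold.
        A_test = np.c_[np.ones(len(test_idx)), X[test_idx]]
        fold_errors.append(np.mean((A_test @ w - y[test_idx]) ** 2))
    # Averaging over folds plays the role of averaging over different datasets D.
    return np.mean(fold_errors)

print("5-fold CV estimate of test MSE:", kfold_mse(X, y))
```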
| Performance Measure of ML
Accuracy, Precision and Recall, F-measure, ROC curve, ...
An email from your advisor must never be classified as spam.
What matters is that whatever gets filtered as spam really is spam ... the key metric here is Precision.
CRM (classifying VIP customers) ... here Recall is the more important metric.
Considering Precision and Recall together ... F-measure.
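For reference, a minimal sketch of how Precision, Recall, and the F-measure are computed from true/false positives and negatives (the example labels are made up):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 (F-measure) for binary labels, 1 = positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of everything flagged as spam, how much really is spam
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of all real positives (spam / VIPs), how many were caught
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Made-up example: 1 = spam, 0 = not spam.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # precision and recall are both 0.75 here
```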
| Regularization
To reduce the variance error, we give up a perfect fit
- We sacrifice the perfect fit (reducing the training accuracy)
- We increase the potential fit on the test data
We add a penalty term to the MSE objective
Ridge regularization (L2 regularization)
$$ E(w) = \frac{1}{2}\sum_{n=0}^{N}(train_n - g(x_n, w))^2 + \frac{\lambda}{2}\left \| w \right \|_2^2 $$
Lasso regularization (L1 regularization)
$$ E(w) = \frac{1}{2}\sum_{n=0}^{N}(train_n - g(x_n, w))^2 + \lambda \left \| w \right \|_1 $$
For ridge regression, the optimal weights can also be obtained in closed form by differentiating with respect to the weights and setting the derivative to zero:
$ \frac{d}{dw}E(w)=0 $
$ w = (X^TX+\lambda I)^{-1} X^T \textup{train} $
where $X$ is the design matrix and $\textup{train}$ is the vector of training targets.
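A minimal sketch of this closed-form ridge solution, assuming the design matrix $X$ already includes a bias column and using a made-up toy dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy data: targets from a line plus noise.
x = rng.uniform(-1, 1, 50)
train = 1.5 * x - 0.5 + rng.normal(scale=0.2, size=50)  # the target vector ("train" above)
X = np.c_[np.ones_like(x), x]                           # design matrix with a bias column

lam = 0.1  # regularization strength lambda (chosen arbitrarily)
# Closed-form ridge solution: w = (X^T X + lambda * I)^(-1) X^T train
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ train)
print("ridge weights:", w)
```

Note that the Lasso (L1) penalty does not admit a closed-form solution like this; it is usually minimized with iterative methods such as coordinate descent.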
Reference
Prof. 문일철's lectures
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=32
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=33
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=34
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=35
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=36
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=37
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=38