| Overfitting and Underfitting
There is a trade-off between the complexity of a model and its ability to generalize beyond the given dataset
| Bias and Variance
Sources of Error in Machine Learning
* Approximation: error that remains even after fitting the model as hard as we can
* Generalization: error on future, unseen data
$E_{out} \leqslant E_{in} + \Omega$
* $E_{out}$ is the error on future (out-of-sample) data, i.e., the estimation error
* $E_{in}$ is the in-sample error, coming from the approximation made by the learning algorithm (approximation)
* $\Omega$ is the error caused by the variance of the observations (generalization)
Symbols
* $f$ : the target function to learn
* $g$ : the learning function (hypothesis) of the ML algorithm
* $g^{(D)}$ : the function learned using a dataset $D$, i.e., one instance of the hypothesis
* $D$ : an available dataset drawn from the real world
* $\bar{g}$ : the average hypothesis over an infinite number of datasets $D$
- Formally, $\bar{g}(x) = E_D\left[ g^{(D)}(x) \right]$
$E_{out}\left[ g^{(D)}(x) \right] = E_X\left[ \left( g^{(D)}(x)-f(x) \right)^2 \right]$
Expanding the expression above and taking the expectation over datasets $D$, we get
$E_D\left[ E_{out}\left[ g^{(D)}(x) \right] \right]=E_X\left[ E_D\left[ \left( g^{(D)}(x)-\bar{g}(x) \right)^2 \right] +\left( \bar{g}(x)-f(x) \right)^2 \right]$
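To see where this decomposition comes from, here is a brief sketch of the key step (standard algebra, using only the symbols defined above): add and subtract $\bar{g}(x)$ inside the square, and the cross term vanishes because $E_D\left[ g^{(D)}(x) \right] = \bar{g}(x)$.
$$ E_D\left[ \left( g^{(D)}(x)-f(x) \right)^2 \right] = E_D\left[ \left( g^{(D)}(x)-\bar{g}(x) \right)^2 \right] + \left( \bar{g}(x)-f(x) \right)^2 + 2\left( \bar{g}(x)-f(x) \right) E_D\left[ g^{(D)}(x)-\bar{g}(x) \right] $$
The last expectation is zero, so only the variance and squared-bias terms survive; taking $E_X$ of both sides gives the equation above.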
Let's define
$\textup{Variance}(x)= E_D\left [ \left ( g^{(D)}(x)-\bar{g}(x) \right )^2 \right ] $
$\textup{Bias}^2(x)=\left ( \bar{g}(x)-f(x) \right )^2 $
Semantically, what do they mean?
* Variance is the inability to train the model toward the average hypothesis, because only a limited dataset is available (Generalization)
* Bias is the inability of the average hypothesis to match the real world
(Even if we could see all the data in the world, there is still a limit to how well we can approximate the true function.)
How to reduce the bias and the variance?
* Reducing the variance → Collecting more data
* Reducing the bias → More complex model
However, if we reduce the bias, we increase the variance, and vice versa
* Bias and Variance Dilemma
* We will see why this happens through an empirical evaluation in the next section ...
| Occam's Razor
Zero-degree (constant) line vs. one-degree (linear) line
A complex model has a higher variance and a lower bias
A simple model has a lower variance and a higher bias
→ We need to balance the complexity of an ML algorithm
(In reality we cannot know the true function, so we cannot actually compute the bias and the variance.)
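Although we cannot compute them in practice, we can in a simulation where the true function is chosen by us. The sketch below is a minimal illustration (the true function $f(x)=\sin(\pi x)$, the dataset size, and the number of datasets are my own assumptions, not from the lecture): it fits a zero-degree and a one-degree polynomial to many small datasets and estimates the variance and squared bias of each.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed "true" target function for the simulation (my choice, not from the lecture).
    return np.sin(np.pi * x)

n_datasets = 2000    # number of datasets D to draw
n_points = 2         # tiny datasets, so the variance is easy to see
x_grid = np.linspace(-1, 1, 200)

for degree, name in [(0, "zero-degree (constant)"), (1, "one-degree (linear)")]:
    predictions = np.empty((n_datasets, x_grid.size))
    for d in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)       # one dataset D
        y = f(x)                               # noiseless targets, for simplicity
        coeffs = np.polyfit(x, y, deg=degree)  # g^(D): the hypothesis fitted on D
        predictions[d] = np.polyval(coeffs, x_grid)

    g_bar = predictions.mean(axis=0)                  # average hypothesis g_bar(x)
    variance = ((predictions - g_bar) ** 2).mean()    # E_D[(g^(D)(x) - g_bar(x))^2], averaged over x
    bias_sq = ((g_bar - f(x_grid)) ** 2).mean()       # (g_bar(x) - f(x))^2, averaged over x
    print(f"{name}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

With these assumed settings, the constant model typically comes out with the higher bias and the lower variance, and the linear model the opposite, which is exactly the trade-off described above.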
Occam's Razor
If there are several candidate models, it is better to choose the one with fewer assumptions, i.e., the simpler model.
(This lowers the risk on new, unseen data.)
| Cross Validation
An infinite number of samples is impossible in reality, but we can simulate it.
Example: N-fold cross validation → lets us approximate the average hypothesis (see the sketch below).
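A minimal sketch of N-fold cross validation in plain NumPy, assuming a made-up linear toy dataset and ordinary least squares as the model (both are illustration choices, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (made up for illustration): y = 2x + noise.
X = rng.uniform(-1, 1, size=(100, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.3, size=100)

def kfold_mse(X, y, n_folds=5):
    """Estimate the test MSE of plain least-squares regression with N-fold cross validation."""
    indices = rng.permutation(len(X))
    folds = np.array_split(indices, n_folds)
    fold_errors = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Fit on the N-1 training folds (bias column + least squares).
        A_train = np.c_[np.ones(len(train_idx)), X[train_idx]]
        w, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Evaluate on the held-out fold.
        A_test = np.c_[np.ones(len(test_idx)), X[test_idx]]
        fold_errors.append(np.mean((A_test @ w - y[test_idx]) ** 2))
    # Averaging over folds plays the role of averaging over different datasets D.
    return np.mean(fold_errors)

print("5-fold CV estimate of test MSE:", kfold_mse(X, y))
```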
| Performance Measure of ML
Accuracy, Precision and Recall, F-measure, ROC curve, ...
An email from your advisor must never be classified as spam.
What matters is that whatever gets filtered as spam really is spam ... the key metric here is Precision.
CRM (classifying VIP customers) ... here Recall is the more important metric.
Considering Precision and Recall together ... F-measure.
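For reference, a minimal sketch of how Precision, Recall, and the F-measure are computed from true/false positives and negatives (the example labels are made up):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 (F-measure) for binary labels, 1 = positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of everything flagged as spam, how much really is spam
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of all real positives (spam / VIPs), how many were caught
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Made-up example: 1 = spam, 0 = not spam.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # precision and recall are both 0.75 here
```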
| Regularization
To reduce the variance error, we give up a perfect fit
- We sacrifice the perfect fit (reducing the training accuracy)
- We increase the potential fit on the test data
We add a penalty term to the MSE objective
Ridge regularization (L2 regularization)
$$ E(w) = \frac{1}{2}\sum_{n=0}^{N}(train_n - g(x_n, w))^2 + \frac{\lambda}{2}\left \| w \right \|_2^2 $$
Lasso regularization (L1 regularization)
$$ E(w) = \frac{1}{2}\sum_{n=0}^{N}(train_n - g(x_n, w))^2 + \lambda \left \| w \right \|_1 $$
For ridge regression, the optimal weights can also be obtained in closed form by differentiating with respect to the weights and setting the derivative to zero:
$ \frac{d}{dw}E(w)=0 $
$ w = (X^TX+\lambda I)^{-1} X^T \textup{train} $
where $X$ is the design matrix and $\textup{train}$ is the vector of training targets.
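A minimal sketch of this closed-form ridge solution, assuming the design matrix $X$ already includes a bias column and using a made-up toy dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy data: targets from a line plus noise.
x = rng.uniform(-1, 1, 50)
train = 1.5 * x - 0.5 + rng.normal(scale=0.2, size=50)  # the target vector ("train" above)
X = np.c_[np.ones_like(x), x]                           # design matrix with a bias column

lam = 0.1  # regularization strength lambda (chosen arbitrarily)
# Closed-form ridge solution: w = (X^T X + lambda * I)^(-1) X^T train
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ train)
print("ridge weights:", w)
```

Note that the Lasso (L1) penalty does not admit a closed-form solution like this; it is usually minimized with iterative methods such as coordinate descent.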
Reference
Prof. 문일철's lectures
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=32
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=33
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=34
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=35
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=36
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=37
https://www.youtube.com/watch?v=WFzkKr4Q9HU&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz&index=38