
Simple Linear Regression (concepts)

metamong 2022. 4. 14.

** Last time, we covered the concepts of Supervised Learning - Regression - Linear Regression

(↓↓↓↓↓↓ see the post below ↓↓↓↓↓↓)

ML Supervised Learning → Regression → Linear Regression


sh-avid-learner.tistory.com

1. HOW? - Simple Linear Regression (step-by-step)

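→ A straight line that shows the relationship between the independent variable and the dependent variable!

▩ Following the steps in order makes this easier to understand ▩

(Figure below) From left to right, in order: 1-2-3-5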

① Prepare one independent variable and one dependent variable that it influences

→ dependent variable = label = target (outcome)

→ independent variable = predictor = explanatory variable = feature (what we control, manipulate)

② Plot each observation according to its independent variable & dependent variable values

③ Draw the linear regression line! (using the least squares method) ★Linear regression model complete★

(The process of finding the model that gives this regression line is called 'training'.)

◈ Training = 'the process of finding the model that minimizes the cost function (RSS), i.e. the least squares regression procedure; OLS (Ordinary Least Squares)' ◈

④ Interpolate & extrapolate! ★Use a newly input feature value to find a new predicted value★

⑤ Now evaluate the fitted linear regression model by comparing it against a baseline model using evaluation metrics! ★Evaluate the model's performance★

(Various evaluation metrics can be used here. See a later post!)
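The five steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the data values are made up, the baseline is assumed to be "always predict the mean of y", and MAE is picked as the evaluation metric purely for the example (the post does not fix either choice).

```python
from statistics import mean

# Steps 1-2: one feature (x) and one target (y). The numbers are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

def fit_simple_linear_regression(xs, ys):
    """Step 3: least squares fit. Returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(slope, intercept, x_new):
    """Step 4: predicted value for a new feature value."""
    return intercept + slope * x_new

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

slope, intercept = fit_simple_linear_regression(x, y)

# Step 5: compare against an assumed baseline that always predicts mean(y).
baseline_pred = [mean(y)] * len(y)
model_pred = [predict(slope, intercept, xi) for xi in x]
baseline_mae = mean_absolute_error(y, baseline_pred)
model_mae = mean_absolute_error(y, model_pred)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"baseline MAE={baseline_mae:.3f}, model MAE={model_mae:.3f}")
```

If the model MAE is not clearly lower than the baseline MAE, the regression line adds little over simply guessing the average.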

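③ In depth) Q. How does the least squares method produce the linear regression line?

▒ Regression line = the straight line that minimizes RSS (Residual Sum of Squares) (= SSE; Sum of Squared Errors), the sum of the squared residuals ▒

(residual = the difference between the predicted value and the observed value)

(Here, an error is the difference between the ideal population value and the observation (an abstract concept) / a residual is the difference between the actual model's prediction and the observation, so it comes out as a concrete number)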

Q1 - What is RSS here?

→ RSS = SSE = the cost function of the regression model
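To make the cost function concrete, here is a small sketch that computes residuals and RSS for two candidate lines. The function names and data are hypothetical, and the residual is taken as observed minus predicted (the sign does not matter once it is squared).

```python
def residuals(slope, intercept, xs, ys):
    """Observed value minus the value predicted by the line, per observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(xs, ys)]

def rss(slope, intercept, xs, ys):
    """Residual Sum of Squares: the cost that the regression line minimizes."""
    return sum(r ** 2 for r in residuals(slope, intercept, xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 2x + 1

print(rss(2.0, 1.0, xs, ys))  # line through every point -> 0.0
print(rss(1.5, 2.0, xs, ys))  # a worse candidate line   -> 1.5
```

The lower the RSS, the closer the line sits to the observations, which is why training searches for the slope and intercept with the smallest RSS.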

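Q2 - So how do we find the regression line that minimizes RSS?

→ Use the mean of the observed x values (x bar) & the mean of the observed y values (y bar)

Compute the slope from Sxx and Sxy, then use the means x bar & y bar to compute the y-intercept

→ This is how the slope & y-intercept of the regression model are obtained!

→ Once you have the slope and the y-intercept? The simple linear regression model is complete!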
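The formulas themselves do not appear in the text, so here are the standard closed-form least squares expressions for reference. The names "slope" and "intercept" and the index notation are my own choice, not necessarily what the original figure used.

```latex
\[
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i \\
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, &
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
\text{slope} &= \frac{S_{xy}}{S_{xx}}, &
\text{intercept} &= \bar{y} - \text{slope}\cdot\bar{x}
\end{aligned}
\]
```

Because the intercept is defined through the means, the fitted line always passes through the point (x bar, y bar).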

♧OLS♧

"In statistics, ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation."

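→ A technique that minimizes the sum of the squared vertical distances between the observations and the values predicted by the linear regression model!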

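④ In depth) Q. interpolate & extrapolate

→ interpolate: using the model to find the expected label for a value inside the range of the given dataset, such as a missing value or a value of interest

→ extrapolate: picking a new value outside the range of the given dataset and using the model to find its expected label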
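The difference comes down to whether the new x falls inside the range of x values seen during training. A minimal sketch, assuming hypothetical fitted coefficients and a made-up helper name:

```python
def predict_with_range_check(slope, intercept, x_train, x_new):
    """Return (prediction, kind), where kind is 'interpolate' or 'extrapolate'."""
    inside = min(x_train) <= x_new <= max(x_train)
    kind = "interpolate" if inside else "extrapolate"
    return intercept + slope * x_new, kind

x_train = [1.0, 2.0, 4.0, 5.0]  # x = 3 was never observed
slope, intercept = 2.0, 1.0     # hypothetical fitted coefficients

print(predict_with_range_check(slope, intercept, x_train, 3.0))   # (7.0, 'interpolate')
print(predict_with_range_check(slope, intercept, x_train, 10.0))  # (21.0, 'extrapolate')
```

The arithmetic is identical in both cases. What changes is how much the prediction can be trusted, since nothing in the data confirms that the linear relationship still holds outside the observed range.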

2. Simple Linear Regression Model Assumptions

① The relationship between X and Y must be a linear relationship! <Linearity>

'The relationship between X and the mean of Y is linear.'

→ The regression model can only be built on the assumption that Y can be expressed as a linear function of X, naturally...?

② The residuals must have the same distribution at every X (equal variance) <Homoscedasticity>

'The variance of residual is the same for any value of X.'

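→ In other words, if the spread of the residuals changes with X, the residuals are related to X rather than being pure noise, and the standard errors that OLS reports can no longer be trusted. The residuals should not affect the model in this way, so homoscedasticity has to be assumed!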
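One crude way to eyeball this numerically is to sort the residuals by X, split them in half, and compare the variance of each half. This is a heuristic sketch with made-up residuals, not a formal statistical test, and the function name is hypothetical.

```python
from statistics import pvariance

def variance_ratio_by_x(xs, residuals):
    """Residual variance in the upper half of x divided by that in the lower half."""
    ordered = [r for _, r in sorted(zip(xs, residuals))]  # residuals sorted by x
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]
    return pvariance(high) / pvariance(low)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
even_spread = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
growing_spread = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

print(variance_ratio_by_x(xs, even_spread))     # close to 1: similar spread
print(variance_ratio_by_x(xs, growing_spread))  # far above 1: spread grows with x
```

A ratio near 1 is consistent with equal variance. A ratio far from 1 suggests the spread depends on X and is worth checking with a residual plot.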

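(+) * no auto-correlation (the error terms are not correlated with each other)

→ If the error terms show some kind of pattern among themselves in the residual plot, autocorrelation is judged to be present
→ This kind of autocorrelation must not exist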
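Besides looking at the residual plot, a common numeric check is the Durbin-Watson statistic, which the post itself does not mention. The sketch below assumes the residuals are listed in observation order (for example, by time) and uses made-up values.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences / sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den

alternating = [1, -1, 1, -1, 1, -1]  # sign flips every step
trending = [-3, -2, -1, 1, 2, 3]     # neighbours look alike

print(durbin_watson(alternating))  # well above 2: negative autocorrelation
print(durbin_watson(trending))     # well below 2: positive autocorrelation
```

The statistic ranges from 0 to 4, and values near 2 indicate little first-order autocorrelation.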

③ The independent variables must not influence each other! (no multicollinearity) <Independence> = No Multicollinearity

'Observations are independent of each other.'

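→ Applies only to a Multiple Linear Regression Model, which has two or more independent variables
(simple linear regression has only one independent variable, so the no-multicollinearity condition does not apply. The quoted sentence above, however, describes independence between observations, which is a separate assumption that still applies to simple linear regression)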

④ The residuals must be normally distributed! <Normality> = Multivariate Normality

'For any fixed value of X, Y is normally distributed.'
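A rough numeric check of normality is to look at the skewness and excess kurtosis of the residuals, both of which are 0 for a normal distribution. The sketch below also combines them into the Jarque-Bera statistic. This is one possible check that I picked for illustration, and the sample values are made up and far too few for a real test.

```python
from statistics import mean

def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal distribution)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (skewness^2 + excess_kurtosis^2 / 4)."""
    s, k = skewness_and_excess_kurtosis(values)
    return len(values) / 6.0 * (s ** 2 + k ** 2 / 4.0)

symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [-1.0, -1.0, -1.0, -1.0, 4.0]

print(skewness_and_excess_kurtosis(symmetric), jarque_bera(symmetric))
print(skewness_and_excess_kurtosis(skewed), jarque_bera(skewed))
```

Larger Jarque-Bera values mean the residuals look less like a normal distribution. In practice a histogram or Q-Q plot of the residuals is usually examined alongside a number like this.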

* Source 1) https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R5_Correlation-Regression/R5_Correlation-Regression4.html

* Source 2) https://www.immagic.com/eLibrary/ARCHIVES/GENERAL/WIKIPEDI/W120529O.pdf

* Source 3) https://www.youtube.com/watch?v=JvS2triCgOY&list=PLF596A4043DBEAE9C&index=2
