
EXP001 - ≪Ridge Effect (1)≫ - Checking Coefficient Changes & Performance Improvement

metamong 2022. 4. 22.

🤙 In the previous post, we studied the Multiple Linear Regression (MLR) model.

 

Multiple Linear Regression Model (concepts+w/code)

sh-avid-learner.tistory.com

 

🤙 And in the Ridge introduction post, we applied Ridge regularization to an SLR model and demonstrated its effect experimentally, briefly touching on the MLR model near the end.

 

(L2 Regularization) → Ridge Regression (w/scikit-learn)

sh-avid-learner.tistory.com

 

 

> In an MLR model, many variables act as features, and ridge regularization shrinks the influence of the less important ones. Unlike with the SLR model, the effect of ridge is bound to be clearly visible here, simply because there are many features!

> So, after applying ridge, we will ① examine how each feature's coefficients change, and then ② check numerically how much the model's performance improves.
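(For reference, a minimal sketch of the quantity ridge minimizes, assuming scikit-learn's standard formulation: the ordinary squared error plus alpha times the sum of squared coefficients. The larger alpha is, the more heavily large coefficients are penalized, which is exactly what shrinks them.)

# minimal sketch of the ridge objective: ||y - Xw - b||^2 + alpha * ||w||^2
import numpy as np

def ridge_loss(X, y, coef, intercept, alpha):
    residuals = y - (X @ coef + intercept)                     # prediction errors
    return np.sum(residuals ** 2) + alpha * np.sum(coef ** 2)  # squared error + L2 penalty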

 

Let's begin!


โ‘ ๊ฐ feature์˜ coefficients ๊ณ„์ˆ˜์˜ ๋ณ€ํ™”

1> Preparing the data / splitting into train & test / separating features & target

 

# <MLR model - find the optimal alpha and watch each feature's coefficients change>
import pandas as pd
import numpy as np

path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/Data%20files/module_5_auto.csv'
df = pd.read_csv(path)
df.to_csv('module_5_auto.csv')           # keep a local copy
df = df.select_dtypes(include='number')  # keep numeric columns only

# 75/25 train-test split
train = df.sample(frac=0.75, random_state=1)
test = df.drop(train.index)

train.dropna(inplace=True)
test.dropna(inplace=True)

target = 'price'

# split into X_train, y_train, X_test, y_test
X_train = train.drop(columns=target)
y_train = train[target]
X_test = test.drop(columns=target)
y_test = test[target]
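(Side note: the same 75/25 split could also be done with scikit-learn's train_test_split. A rough equivalent of the pandas sampling above, though not byte-identical since the NaN handling happens in a different order:)

# hypothetical alternative split using scikit-learn instead of df.sample
from sklearn.model_selection import train_test_split

df_clean = df.dropna()
X_train, X_test, y_train, y_test = train_test_split(
    df_clean.drop(columns='price'), df_clean['price'],
    test_size=0.25, random_state=1)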

 

2> The MLR model

 

# MLR model (20 features)
from sklearn.linear_model import LinearRegression

model_lr = LinearRegression()
model_lr.fit(X_train, y_train)

 

3> Finding the optimal ridge model with RidgeCV and comparing coefficients before and after

 

# RidgeCV: 10-fold cross-validation over a grid of candidate alphas
from sklearn.linear_model import RidgeCV

alphas = np.arange(1, 200, 1)

ridge_mlr = RidgeCV(alphas=alphas, cv=10)

# fitting
ridge_mlr.fit(X_train, y_train)

print(ridge_mlr.coef_, ridge_mlr.intercept_, ridge_mlr.alpha_)
print(ridge_mlr.best_score_)

#[-7.56491516e+00 -7.56491516e+00  2.78681974e+02 -1.06845870e+01
#  8.61802076e+01 -7.29448106e+01  1.27669085e+02  2.37005747e+02
#  2.26131771e+00  8.68613041e+01  8.27692719e+02 -2.19957856e+03
#  3.19741534e+02  5.01485386e+01  1.73810064e+00 -4.85339063e+01
#  2.47655001e+02  7.95687055e+02  7.05511891e+01 -7.05511891e+01] -49415.03407219711 5

 

# plot MLR coefficients
import matplotlib.pyplot as plt

coefficients = pd.Series(model_lr.coef_, X_train.columns)
plt.figure(figsize=(10, 5))
coefficients.sort_values().plot.barh()
plt.show()

print(model_lr.coef_.mean())
#1251.874366308613

print(model_lr.coef_.var())
#63032739.06265261

# plot Ridge coefficients
coefficients = pd.Series(ridge_mlr.coef_, X_train.columns)
plt.figure(figsize=(10,5))
coefficients.sort_values().plot.barh()
plt.show()

print(ridge_mlr.coef_.mean())
#35.722544617003535

print(ridge_mlr.coef_.var())
#323993.716524918

 

4> Comparing and analyzing the coefficients

 

- (top) coefficients before Ridge / (bottom) coefficients after Ridge -

 

โ˜๐Ÿป ridge ์ ์šฉ ์ „ ๊ณ„์ˆ˜ ๊ฐ’์ด ์ƒ๋Œ€์ ์œผ๋กœ ์ ์šฉ ํ›„๋ณด๋‹ค ํผ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ฆ‰ width์™€ length์˜ target ๊ฒฐ์ •๋ ฅ ์˜ํ–ฅ์ด ํƒ€ feature์— ๋น„ํ•ด ์••๋„์ ์œผ๋กœ ํฐ๋ฐ, ์ด๋กœ ์ธํ•ด ์ƒ๋Œ€์ ์œผ๋กœ ๋‹ค๋ฅธ feature๊ฐ€ target ๊ฒฐ์ •์— ์˜ํ–ฅ์„ ๊ฑฐ์˜ ๋ชป ๋ฏธ์น˜๊ณ  ์žˆ๋‹ค. ridge ์ ์šฉ ํ›„ ์••๋„์ ์œผ๋กœ ํฐ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” width์™€ length ๊ณ„์ˆ˜๊ฐ€ ํ™• ๊ฐ์†Œํ–ˆ์œผ๋ฉฐ, stroke ๊ฒฐ์ •๋ ฅ์ด ์••๋„์ ์œผ๋กœ ์ปค์ง„๊ฒŒ ์•„๋‹Œ๊ฐ€ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ๋ชจ๋“  ๊ณ„์ˆ˜๋“ค์˜ ํ‰๊ท ์„ ๋”ฐ์ง€๋ฉด 1251์—์„œ 35๋กœ ํฐ ๊ฐ์†Œ๋ฅผ ๋ณด์ž„์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ์ฆ‰ l2 ๊ทœ์ œ๋ฅผ ํ†ตํ•ด ๋ชจ๋“  feature์˜ target ๊ฒฐ์ •๋ ฅ์„ ์–ด๋Š ์ •๋„ ํšจ๊ณผ๋ฅผ ๊ฐ์†Œ์‹œํ‚ด์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

 

โœŒ๐Ÿป ridge ์ ์šฉ ์ „ ๋ถ„์‚ฐ ๊ฐ’์€ 63032739์ด๊ณ , ridge ์ ์šฉ ํ›„ ๋ถ„์‚ฐ ๊ฐ’์€ 323993์œผ๋กœ ์—„์ฒญ๋‚œ ๊ณ„์ˆ˜๊ฐ’ ๋ถ„์‚ฐ์˜ ๊ฐ์†Œ๋ฅผ ๋ณด์ธ๋‹ค. ์ด๋Š” ๊ณ„์ˆ˜๊ฐ’ ๊ฐ„์˜ ํฐ ํŽธ์ฐจ๋ฅผ ๊ทœ์ œ๋กœ ์ธํ•ด ํ™• ์ค„์˜€๋‹ค๋Š” ๋œป์ด ๋˜๋ฉฐ, ๊ณง ์ƒ๋Œ€์ ์œผ๋กœ ๋” ๋งŽ์€ feature๋“ค์ด target ๊ฒฐ์ •๋ ฅ์— ์–ด๋Š ์ •๋„ ์˜ํ–ฅ๋ ฅ์„ ๋ณด์ด๊ธฐ ์‹œ์ž‘ํ–ˆ์Œ์„ ๋œปํ•˜๋ฉฐ, ์ด๋Š” ๊ณง ์ผ๋ถ€ feature๋กœ๋งŒ target๊ฐ’์ด ๊ฒฐ์ •๋œ๋‹ค๊ณ  ๋งํ•  ์ˆ˜ ์žˆ๋Š” ํ˜„์ƒ์„ ์–ด๋Š ์ •๋„ ๋ง‰์•˜๋‹ค๊ณ  ๋ถ„์„ํ•  ์ˆ˜ ์žˆ๊ฒ ๋‹ค!

 

๐ŸคŸ๐Ÿป feature selection์˜ ํšจ๊ณผ - ์šฐ๋ฆฌ๋Š” ridge๋ฅผ ์ ์šฉํ•œ ๊ฒฐ๊ณผ stroke๋ผ๋Š” feature๊ฐ€ ๊ฒฐ์ •์ ์ธ feature์ž„์„ ํ™•์ธํ–ˆ๊ณ , ์ด๋Š” ๊ณง ridge model์„ ํ†ตํ•ด ์–ด๋Š feature๊ฐ€ ๊ฒฐ์ •์ ์ธ ์ง€ ๋งํ•ด์ฃผ๊ณ  ์žˆ๋‹ค. (๋ฌผ๋ก  ridge ์ ์šฉ ์ „์—๋„ ์–ด๋–ค ์ข…๋ฅ˜๋ฅผ feature๋กœ ์ •ํ•  ์ง€ ์•Œ ์ˆ˜ ์žˆ์ง€๋งŒ ๊ธฐ์กด training set์— ๋งž์ถฐ์ง„ ๊ณผ์ ํ•ฉ์˜ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ์‹ ๋ขฐ์„ฑ์ด ์ƒ๋Œ€์ ์œผ๋กœ ๋–จ์–ด์ง„๋‹ค. ์‹คํ—˜๊ฒฐ๊ณผ๋ฅผ ๋ณด์•˜์„ ๋•Œ๋„ ๋‘๋“œ๋Ÿฌ์ง„ feature ์ข…๋ฅ˜๊ฐ€ ๋‹ฌ๋ผ์กŒ์Œ์„ ํ™•์ธ ๊ฐ€๋Šฅ!) ๋˜ํ•œ ์šฐ๋ฆฌ๊ฐ€ ridge๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ ๋Š” ๊ธฐ์กด training set๋ณด๋‹ค ์•ž์œผ๋กœ ์ƒˆ๋กญ๊ฒŒ ๋“ค์–ด์˜ฌ test data์— ๋Œ€ํ•œ ์ข‹์€ ์˜ˆ์ธก ์„ฑ๋Šฅ์„ ๋ณด์ด๊ธฐ ์œ„ํ•จ์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ๋”ฐ๋ผ์„œ l2 ๊ทœ์ œ๋Š” ๊ณ„์ˆ˜๊ฐ’์„ ๊ฐ์†Œ์‹œ์ผœ ๋„ˆ๋ฌด ๋ณต์žกํ•œ ๋ชจ๋ธ์ด ๋˜์ง€ ์•Š๊ฒŒ๋” ํ•˜๊ณ , ์–ด๋Š ์ •๋„ ์˜ˆ์ธก์€ ์ž˜๋˜๊ฒŒ ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด์ค€๋‹ค. (l1 ๊ทœ์ œ๋Š” ์˜ํ–ฅ๋ ฅ์ด ์ƒ๋Œ€์ ์œผ๋กœ ์ ์€ feature ์•„์˜ˆ 0์œผ๋กœ ๋งŒ๋“ค์–ด๋ฒ„๋ฆฐ๋‹ค. ๋”ฐ๋ผ์„œ feature selection์ด ๋ชฉ์ ์ด๋ผ๋ฉด LASSO๊ฐ€ ๋” ๋‚ซ๋‹ค๊ณ  ํŒ๋‹จ๋œ๋‹ค. ํ›„์— ํฌ์ŠคํŒ… ์˜ˆ์ •)

② Checking numerically how much the model's performance improved after ridge

 

5> Predicting on the test data and scoring the models (MLR & ridge)

 

# predicting with the plain MLR model
from sklearn.metrics import r2_score, mean_squared_error

y_test_pred_MLR = model_lr.predict(X_test)

print(r2_score(y_test, y_test_pred_MLR), mean_squared_error(y_test, y_test_pred_MLR))
#0.8977890607619405 5466979.56131857

# predicting with the ridge model
y_test_pred_mlr = ridge_mlr.predict(X_test)

print(r2_score(y_test, y_test_pred_mlr), mean_squared_error(y_test, y_test_pred_mlr))
#0.8991869412385929 5392210.812963509

 

๐Ÿ‘Œ๐Ÿป alpha ๊ทœ์ œ๊ฐ’์ด 5์ผ ๋•Œ ์ตœ๊ณ ์˜ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ๊ณ , MLR ๊ธฐ์กด ๋ชจ๋ธ๋ณด๋‹ค ์†Œํญ์ด์ง€๋งŒ ๊ฒฐ์ •๊ณ„์ˆ˜๊ฐ’, MSE ๋ชจ๋‘ ์„ฑ๋Šฅ์ด ์ข‹์•„์กŒ์œผ๋ฉฐ, ์ด ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์šฐ๋ฆฌ๋Š” l2 regulariztion์˜ ํšจ๊ณผ๋ฅผ ๋ณด์•˜๋‹ค๊ณ  ๋งํ•  ์ˆ˜ ์žˆ๊ฒ ๋‹ค!

 

- Finally, copy-pasting one 'extremely well-summarized' remark a user left about ridge 👏🏻 -

'If our underlying data follows a relatively simple model, and the model we use is too complex for the task, what we are essentially doing is we are putting too much weight on any possible change or variance in the data. Our model is overreacting and overcompensating for even the slightest change in our data (overfitting). People in the field of statistics and machine learning call this phenomenon overfitting. When you have features in your dataset that are highly linearly correlated with other features, turns out linear models will be likely to overfit (in our example, width was dominant enough to suggest a real risk of overfitting: when one feature exerts an outsized influence, the target reacts sensitively to even small changes in that feature, so its influence needs to be softened to some degree). Ridge Regression avoids overfitting by adding a penalty to models that have too large coefficients (after the penalty, width's influence drops markedly).'
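(The multicollinearity point in the quote is easy to check on our data; a sketch inspecting how strongly width and length correlate with the other features:)

# check the linear correlation of width/length with the remaining features
corr = X_train.corr()
print(corr['width'].sort_values(ascending=False).head())
print(corr['length'].sort_values(ascending=False).head())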

 

* For posts on overfitting/underfitting, see below ↓↓↓↓

 

 

Overfitting/Underfitting & Bias/Variance Tradeoff

sh-avid-learner.tistory.com


* Reference: https://stats.stackexchange.com/questions/251708/when-to-use-ridge-regression-and-lasso-regression-what-can-be-achieved-while-us
