Machine Learning/Models (with codes)

(L1 Regularization) → LASSO Regression (concepts)

metamong 2022. 6. 23.

๐Ÿฅ‘ ๊ณผ์ ํ•ฉ ๋ฌธ์ œ๋ฅผ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด ๊ธฐ์กดํšŒ๊ท€๊ณ„์ˆ˜๋“ค์˜ ์ œ๊ณฑํ•ฉ์— λ penalty๋ฅผ ๋ถ€๊ณผํ•ด ์ƒˆ๋กœ์šด data์— ์•Œ๋งž์€ ์˜ˆ์ธก์„ ํ•˜๊ฒŒ๋” ํ•ด์ฃผ๋Š” Ridge ๊ทœ์ œ์— ๋Œ€ํ•ด ๋ฐฐ์› ๋‹ค.

 

 

(L2 Regularization) → Ridge Regression (concepts)

** ์šฐ๋ฆฌ๋Š” ์ €๋ฒˆ ํฌ์ŠคํŒ…์—์„œ Supervised Learning ์ค‘ Regression์˜ ์ผ์ข…์ธ 'linear regression'์— ๋Œ€ํ•ด ํ•™์Šตํ–ˆ๋‹ค. โ˜๏ธ ์œ„ ๊ทธ๋ฆผ์—์„œ ๋ณด๋‹ค์‹œํ”ผ linear ์„ ํ˜• regression์œผ๋กœ๋Š” ๋งŽ์€ ์ข…๋ฅ˜์˜ model์ด ์žˆ์Œ์„ ํ™•์ธํ• ..

sh-avid-learner.tistory.com

 

 

(L2 Regularization) → Ridge Regression (w/scikit-learn)

๐Ÿ˜ผ ์ €๋ฒˆ ํฌ์ŠคํŒ…์—์„œ Ridge ํšŒ๊ท€๊ฐ€ ๋ฌด์—‡์ธ์ง€ ๊ฐœ๋…์— ๋Œ€ํ•ด ์ •ํ™•ํžˆ ์•Œ์•„๋ณด์•˜๋‹ค ๐Ÿ˜ผ (L2 Regularization) → Ridge Regression (concepts) ** ์šฐ๋ฆฌ๋Š” ์ €๋ฒˆ ํฌ์ŠคํŒ…์—์„œ Supervised Learning ์ค‘ Regression์˜ ์ผ์ข…์ธ '..

sh-avid-learner.tistory.com

 

๐Ÿฅ‘ ๋™์ผํ•œ Regularization์ธ๋ฐ, ์•ฝ๊ฐ„ ๋‹ค๋ฅธ L1 Regularization - LASSO์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด์ž

* Ridge & Lasso formula ์ •๋ฆฌ>

โ‘  Ridge

 

Q. Ridge = '์œ„ ๋น„์šฉํ•จ์ˆ˜ $J(\beta)$๋ฅผ ์ตœ์†Œ๋กœ ํ•˜๋Š” ํšŒ๊ท€๊ณ„์ˆ˜ $\hat{\beta}^R$๋ฅผ ์ฐพ๋Š” ๋ฌธ์ œ

 

 

๐Ÿ“ฃ $J(\beta)$์—๋Š” ๋‘ term์ด ์žˆ๋Š”๋ฐ ์ด term๋“ค์˜ ํ•ฉ์„ ์ตœ์†Œ๋กœ ํ•˜๋Š” ํšŒ๊ท€๊ณ„์ˆ˜๋“ค์„ ๊ตฌํ•˜๋Š” ๋ฌธ์ œ๋ผ๊ณ  ๋ฐ”๊ฟ” ํ•ด์„ํ•  ์ˆ˜ ์žˆ๊ณ , ์ด ์ค‘ ๊ทœ์ œํ•ญ์ด ๋“ค์–ด๊ฐ„ term๋งŒ ๋นผ๋‚ธ๋‹ค. ๊ฐ ํšŒ๊ท€๊ณ„์ˆ˜๋“ค์˜ ์ œ๊ณฑํ•ฉ์ด ์ตœ์†Œ๊ฐ€ ๋˜์–ด์•ผ ํ•˜๋ฏ€๋กœ, ์ด๋Š” $\sum_{j=1}^{k} {\beta_j}^2$์€ ๊ทœ์ œํ•ญ์— ์˜ํ•ด ์ž„์˜์˜ ์ƒ์ˆ˜ t์ดํ•˜์—ฌ์•ผ ํ•˜๋Š” condition์œผ๋กœ ๋ฐ”๊ฟ” ์“ธ ์ˆ˜ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด condition์„ $J(\beta)$์—์„œ ๋นผ๋‚ด๊ณ  ๋‹ค์‹œ ์ •๋ฆฌํ•˜๋ฉด.. 

 

 

๐Ÿ“ฃ ์‹œ๊ฐํ™”๋ฅผ ์‰ฝ๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด $\beta_1, \beta_2$๋ฅผ x, y์ถ•์œผ๋กœ ํ•œ๋‹ค๋ฉด, ๊ทœ์ œํ•ญ์€ ํ•œ ์›์œผ๋กœ ํ‘œํ˜„์ด ๋  ๊ฒƒ์ด๋‹ค. ๊ทธ๋ฆฌ๊ณ  $\hat{\beta}^R$์€ ์ด์ œ ํ•œ ๊ฐœ์˜ term์œผ๋กœ ๋งŒ๋“ค์–ด์กŒ์œผ๋ฉฐ, ํ•ด๋‹น term์€ $\beta$๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํƒ€์›ํ˜•(elipse contour plot) ์‹์ด ๋งŒ๋“ค์–ด์ง„๋‹ค. ํƒ€์›์˜ ์ค‘์‹ฌ์ด ํ•ด๋‹น term์ด ์ตœ์†Œ๊ฐ€ ๋˜๋Š” ๊ณณ์ด๋ฉฐ, ์ ์  ํƒ€์›์ด ์ปค์งˆ์ˆ˜๋ก ํ•ด๋‹น term์˜ ๊ฐ’์ด ์ปค์ง„๋‹ค.

 

๐Ÿ“ฃ ๊ทœ์ œํ•ญ์œผ๋กœ ํ‘œํ˜„๋˜๋Š” ์›๊ณผ ํƒ€์›์ด ์ตœ์†Œํ•œ์œผ๋กœ ํ‘œํ˜„๋˜๋Š” ๊ณณ์ด ์„œ๋กœ ๋งŒ๋‚˜๋Š” ์ง€์  (๋งŒ๋‚˜๊ธด ํ•ด์•ผํ•œ๋‹ค. ๊ทœ์ œํ•ญ์„ ์ ์šฉํ•ด์•ผ ํ•˜๋ฏ€๋กœ)์ด ์กด์žฌํ•œ๋‹ค. ๋งŒ๋‚˜๊ธฐ๊นŒ์ง€์˜ ๊ฑฐ๋ฆฌ, ์ฆ‰ ํƒ€์›๊ณผ ์› ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ์ตœ์†Œ๊ฐ€ ๋˜๋Š” ์ง€์ ์ด ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ridge coefficients' vector ํ•ด๊ฐ€ ๋œ๋‹ค.

โ‘ก Lasso

 

Q. Lasso = '์œ„ ๋น„์šฉํ•จ์ˆ˜ $J(\beta)$๋ฅผ ์ตœ์†Œ๋กœ ํ•˜๋Š” ํšŒ๊ท€๊ณ„์ˆ˜ $\hat{\beta}^L$๋ฅผ ์ฐพ๋Š” ๋ฌธ์ œ

 

 

๐Ÿ“ฃ Ridge์™€ ๋ชจ๋“  ๋ถ€๋ถ„์€ ๋™์ผํ•˜๊ณ , ๋‹ค๋งŒ ๊ทœ์ œํ•ญ์ด ์ ˆ๋Œ“๊ฐ’์ด ๋“ค์–ด๊ฐ€์„œ ํ•ด๋‹น term์„ ๋”ฐ๋กœ ๋นผ๋‚ด๋ฉด ์ž„์˜์˜ ์ƒ์ˆ˜ t๊ฐ€ ๋“ค์–ด๊ฐ„ condition์€ ์•„๋ž˜์™€ ๊ฐ™์ด ์›์ด ์•„๋‹Œ, ๋งˆ๋ฆ„๋ชจ๋กœ ํ‘œํ˜„๋œ๋‹ค.

 

 

๐Ÿ“ฃ ์—ญ์‹œ ๋งˆ๋ฆ„๋ชจ์™€ ํƒ€์›์ด ๋งŒ๋‚˜๋Š” ์ง€์ ์ด ์กด์žฌํ•ด์•ผ ํ•  ๊ฒƒ์ด๊ณ , ๋‘ ์ง€์ ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ์ตœ์†Œ๊ฐ€ ๋˜๋Š” ๊ณณ์ด ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” lasso coefficients' vector ํ•ด๊ฐ€ ๋œ๋‹ค.

 

 

๐Ÿ“ฃ ๊ทธ ๊ฒฐ๊ณผ, ์›์œผ๋กœ ๊ทœ์ œํ•ญ์ด ํ‘œํ˜„๋œ ridge์˜ ๊ฒฝ์šฐ ์ตœ์†Œ ๊ฑฐ๋ฆฌ๋กœ ๋งŒ๋‚˜๋Š” ์ง€์ ์€ x ๋˜๋Š” y์ถ•์˜ ๊ฐ’์ด ๋‘˜ ๋‹ค 0์ด ์•„๋‹Œ ๊ฒฝ์šฐ๊ฐ€ ๋˜๊ณ , lasso์˜ ๊ฒฝ์šฐ ๋งˆ๋ฆ„๋ชจ ๊ทœ์ œํ•ญ์ด๋ฏ€๋กœ ์ตœ์†Œ ๊ฑฐ๋ฆฌ๋กœ ๋งŒ๋‚˜๋Š” ์ง€์ ์ด ์ฃผ๋กœ x ๋˜๋Š” y ์ค‘ ์ ์–ด๋„ ํ•œ ๊ฐœ ์ด์ƒ์ด 0์ด ๋˜๋Š”, ์ฆ‰ coefficient๊ฐ€ 0์ด ๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค. 

* Ridge & Lasso concept ๋น„๊ต>

โ˜€๏ธ ๋‘˜ ๋‹ค penalty ๊ฐ’์€ 0์—์„œ +$\infty$์ด๋ฉฐ CV - cross validation ๊ธฐ๋ฒ•์œผ๋กœ ์ตœ์ ์˜ penalty ๊ฐ’ $\lambda$๋ฅผ ๊ตฌํ•œ๋‹ค.

 

โ˜€๏ธ ๋ชจ๋ธ๋ง์— ์ฐธ์—ฌํ•˜๋Š” variable์ด ๋Œ€๋ถ€๋ถ„ ์œ ์šฉํ•˜๊ณ  ์‚ฌ์šฉ๋˜์–ด์•ผ ํ•œ๋‹ค๊ณ  ํŒ๋‹จ๋˜๋ฉด Ridge

โ˜€๏ธ ์ผ๋ถ€ variable์€ ์“ธ๋ชจ ์—†๋‹ค๊ณ  ํŒ๋‹จ๋˜๋ฉด Lasso๋ฅผ ์ ์šฉ


<์ถ”ํ›„ ์‹คํ—˜ ์˜ˆ์ •>

 

๐Ÿ’ซ 1) ์ƒํ™ฉ๋ณ„ Ridge, Lasso ๋ชจ๋ธ์„ ์ ์šฉํ•ด๋ณด๊ณ  ๊ณผ์—ฐ ์œ„์—์„œ ์ ์€ ๋Œ€๋กœ, Ridge์™€ Lasso๊ฐ€ ์ ์ ˆํ•œ ๋•Œ ์“ฐ์ด๋ฉด generalization ๋Šฅ๋ ฅ์ด ์ข‹์•„์ง€๋Š” ์ง€ ํ™•์ธ

๐Ÿ’ซ 2) Lasso coefficients ๋ณ€ํ™” / Ridge coefficients ๋ณ€ํ™”์™€ ๋น„๊ตํ•ด๋ณด๊ธฐ

๐Ÿ’ซ 3) ๊ธฐ์กด coefficients์˜ ํฌ๊ธฐ๊ฐ€ ํฌ๊ณ  ์ž‘์Œ์— ๋”ฐ๋ฅธ lasso(+ridge) ๋ชจ๋ธ๋ง ํšจ๊ณผ ๋น„๊ต๋ถ„์„ํ•ด๋ณด๊ธฐ


* ์ถœ์ฒ˜1) STATQUEST ๐Ÿฅ– https://www.youtube.com/watch?v=NGf0voTMlcs 

* ์ถœ์ฒ˜2) ProDS (์ดˆ๊ธ‰+์ค‘๊ธ‰1) ์ด๋ก  ๊ฐ•์˜

* ์ถœ์ฒ˜3) ridge, lasso ์ฐจ์ด visualization https://www.youtube.com/watch?v=Xm2C_gTAl8c 

 

 

'Machine Learning > Models (with codes)' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

K-Means Clustering (concepts + w/code)  (0) 2022.06.08
Decision Trees (concepts)  (0) 2022.04.27
Logistic Regression Model (w/code)  (0) 2022.04.25
Polynomial Regression Model  (0) 2022.04.24
Logistic Regression Model (concepts)  (0) 2022.04.24

๋Œ“๊ธ€