Machine Learning/Models (with codes)

ML Supervised Learning → Regression → Linear Regression

metamong 2022. 4. 13.

1. ML ๊ธฐ๋ฒ• ๊ตฌ๋ถ„

๐Ÿ’†๐Ÿฝ‍โ™‚๏ธ ๋‹ต์ด ์ฃผ์–ด์ ธ ์žˆ๋Š” Supervised Learning

๐Ÿ™…๐Ÿฝ ๋‹ต์ด ์ฃผ์–ด์ ธ ์žˆ์ง€ ์•Š์€ UnSupervised Learning

 

→ Simple Linear Regression(๋‹จ์ˆœ์„ ํ˜•ํšŒ๊ท€)์€ ๋‹ต์ด ์ฃผ์–ด์ ธ ์žˆ๋Š” Dependent variable & Independent variable ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋‚˜ํƒ€๋‚ด์ค€๋‹ค.

→ ๋”ฐ๋ผ์„œ Simple Linear Regression์€ ์ง€๋„ํ•™์Šต(Supervised Learning)์˜ ์ผ์ข…!

 

2. Supervised Learning

→ ์ง€๋„ํ•™์Šต์€ ํ•˜๋‹จ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์€ model ์ ˆ์ฐจ๋ฅผ ๋”ฐ๋ผ๊ฐ„๋‹ค.

 

 

(์œ„ ๊ทธ๋ฆผ ์„ค๋ช…!) - 1-2-3-4-5 ์ˆœ์„œ ์ž˜ ๋”ฐ๋ผ๊ฐ€๊ธฐ

 

โ‘  test, docu, image์™€ ๊ฐ™์€ ๋‹ค์–‘ํ•œ format์˜ data์—์„œ ํŠน์ง• vector(X)๋ฅผ ๋ฝ‘์•„์˜จ๋‹ค

โ‘ก labelingํ•  labels data(y)๋ฅผ ๊ฐ€์ ธ์˜จ๋‹ค

 

'A supervised learning algorithm makes the distinction between the raw observed data X with shape (n_samples, n_features) and some label given to the model while training by some teacher. In scikit-learn this array is often noted y and has generally the shape (n_samples,).'

 

→ ์—ฌ๊ธฐ์„œ y์˜ ์ข…๋ฅ˜์— ๋”ฐ๋ผ supervised learning์€ ๋‹ค๋ฅธ ์ด๋ฆ„์„ ๊ฐ€์ง„๋‹ค..!

 

'If y has values in a fixed set of categorical outcomes (represented by integers) the task to predict y is called classification. If y has floating point values (e.g. to represent a price, a temperature, a size...), the task to predict y is called regression.'

- ์ฆ‰ ๋ฒ”์ฃผํ™”๋กœ ํ‘œํ˜„๋˜๋Š” (๋‹จ, ์ •์ˆ˜๋กœ discretazied๋œ! ์˜ˆ๋ฅผ ๋“ค๋ฉด 0๊ณผ 1๋กœ) ๋ถ„๋ฅ˜ & float๋กœ ํ‘œํ˜„๋˜๋Š” ํšŒ๊ท€๋กœ ๋‚˜๋‰จ!

 

โ—ˆ classification vs. regression โ—ˆ

 

property classification regression
output type discrete (class labels) continuous (number)
what to find? decision boundary "best fit line(it might not be line)" 
evaluation metrics accuracy sum of squared error / r squared

 classification์œผ๋กœ discrete๋œ data๋ฅผ ๋ฌถ๊ณ , regression์œผ๋กœ continuousํ•œ result์— ๋Œ€์‘๋˜๋Š” line์„ ๋งŒ๋“ ๋‹ค

 

โ‘ข ์ฃผ์–ด์ง„ ํŠน์ง• vector & label data๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ML ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ƒ์„ฑ! fit(X,y)

(์ฆ‰, ํŠน์ง• vector๋ฅผ ๊ฐ€์ง€๊ณ  labels ๋Œ€์ƒ์„ ๊ฒฐ๊ณผ๋กœ ๋‚˜์˜ค๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ fitting ์‹œํ‚จ๋‹ค!)

 

model.fit(X,y)

 

→ ์ฆ‰, ์šฐ๋ฆฌ๋Š” ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ†ตํ•ด ์ฃผ์–ด์ง„ data์™€ ๋‹ต์„ ๊ฐ€์ง€๊ณ  ๊ด€๋ จ rule์„ ์ฐพ์•„๋‚ด๋Š” ๊ฒƒ์ด๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ๋‹ค ๐Ÿ’๐Ÿฝ

 

'A machine-learning system is trained rather than explicitly programmed. It’s presented with many examples relevant to a task, and it finds statistical structure in these examples that eventually allows the system to come up with rules for automating the task. For instance, if you wished to automate the task of tagging your vacation pictures, you could present a machine-learning system with many examples of pictures already tagged by humans, and the system would learn statistical rules for associating specific pictures to specific tags.'

 

โ‘ฃ ์ž ๊ทธ๋ ‡๊ฒŒ ํ•ด์„œ ๋งŒ๋“ค์–ด์ง„ ML algorithm์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ด์ œ ์ƒˆ๋กœ์šด feature vector, ์ฆ‰ X๊ฐ€ ๋“ค์–ด์™”์„ ๋•Œ ML algorithm์— ์˜ํ•ด ๋งŒ๋“ค์–ด์ง„ predicted model์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ธฐ๋Œ€๋˜๋Š” expected label์„ ๋งŒ๋“ค๋ฉด ๋!

 

โ‘ค... ์šฐ๋ฆฌ๋Š” ์•ž์œผ๋กœ ๋ณต์žกํ•œ ์—ฌ๋Ÿฌ Supervised Learning ๊ธฐ๋ฒ•์„ ๋ฐฐ์šธ ๊ฒƒ์ด๋‹ค! - ๊ฐ ๊ธฐ๋ฒ•๋ณ„ ์ถ”ํ›„ ํฌ์ŠคํŒ… ์ฐธ์กฐ ๐Ÿ’ƒ๐Ÿฝ

 

โ˜… classification / regression model ์ข…๋ฅ˜ (ํ•˜๋‹จ ์ฐธ์กฐ)

 

3. Regression / Linear Regression

* Supervised Learning์— ๋Œ€ํ•ด ๋ฐฐ์› ๋‹ค. ์ด์   ๊ทธ ์ค‘ ํ•œ ๊ธฐ๋ฒ•์ธ 'Regression ๊ธฐ๋ฒ•'์— ๋Œ€ํ•ด์„œ ๋ฐฐ์›Œ๋ณด์ž

 

Q. Regression์ด๋ž€?

'Regression is the task of predicting the value of a continuously varying variable (e.g. a price, a temperature, a conversion rate...) given some input variables (a.k.a. the features, “predictors” or “regressors”).'

 

 continuousํ•˜๊ฒŒ ๋ณ€ํ•˜๋Š” ์ง€์†์ ์ธ ๋ณ€์ˆ˜๋ฅผ ์˜ˆ์ธกํ•  ๋•Œ ์“ฐ์ด๋Š” ๊ธฐ๋ฒ•์ด Regression!

→ ์ฆ‰, ๊ทธ ๋œป์€ categorical ํ•œ ๋ณ€์ˆ˜๋ฅผ ์˜ˆ์ธกํ•  ๋•Œ๋Š” regression ๊ธฐ๋ฒ•์ด ์“ฐ์ด์ง€ ์•Š๋Š”๋‹ค๋Š” ์ 

 

Q. Linear Regression์ด๋ž€?

๐Ÿ‘‰ Regression์ธ๋ฐ ์˜ˆ์ธกํ•˜๋Š” model์ด line์˜ ํ˜•ํƒœ๋กœ ๋‚˜ํƒ€๋‚˜๋Š” regression

 

- line์˜ ํ˜•ํƒœ๋กœ regression๋˜์–ด ์žˆ๋Š” linear regression ์˜ˆ -

 

 

1> independent variable → dependent variable์— ์˜ํ–ฅ์„ ์ค€๋‹ค

2>

- positive relationship) independent variable ๊ฐ’์ด ์ฆ๊ฐ€ํ•˜๋ฉด์„œ dependent variable ๊ฐ’๋„ ์ฆ๊ฐ€!

- negative relationship) independent variable ๊ฐ’์ด ์ฆ๊ฐ€ํ•˜๋ฉด์„œ dependent variable ๊ฐ’์ด ๊ฐ์†Œ!

3>

- Simple Linear Regression) feature๊ฐ€ 1๊ฐœ์ด๊ณ  target์ด 1๊ฐœ

- Multiple Linear Regression) feature๊ฐ€ 2๊ฐœ ์ด์ƒ์ด๊ณ  target์ด 1๊ฐœ

 

๐Ÿ™Œ ์ด์ œ ์ „์ฒด์ ์ธ ๊ฐœ์š”์— ๋Œ€ํ•ด์„œ ์•Œ์•˜์œผ๋‹ˆ ์ข…๋ฅ˜๋ณ„ Regression์— ๋Œ€ํ•ด ์ถ”ํ›„ ํฌ์ŠคํŒ…์œผ๋กœ ์ž์„ธํžˆ ์•Œ์•„๋ณด์ž ๐Ÿ™Œ

์•ˆ๋…• ๐Ÿ–


** ์ถœ์ฒ˜1) https://www.youtube.com/watch?v=G_0W912qmGc 

** ์ถœ์ฒ˜2) https://www.youtube.com/watch?v=zPG4NjIkCjc 

** ์ถœ์ฒ˜3) https://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/tutorial/text_analytics/general_concepts.html#machine-learning-101-general-concepts

** ์ถœ์ฒ˜4) https://livebook.manning.com/book/deep-learning-with-python/chapter-1/19

๋Œ“๊ธ€