
MLE for the normal distribution

metamong 2022. 6. 27.

🔊 Last time we learned about MLE. As an example, we applied MLE to logistic regression and worked through the math of how to obtain the optimal sigmoid function.

 

๐Ÿ”Š ์ด๋ฒˆ ์‹œ๊ฐ„์—๋Š” logistic์ด ์•„๋‹Œ normal distribution์— MLE ๊ธฐ๋ฒ•์„ ์ ์šฉํ•ด ์ฃผ์–ด์ง„ data๋ฅผ ๊ฐ€์žฅ ์ž˜ ์„ค๋ช…ํ•˜๋Š” normal distribution์˜ ๋‘ ๋ชจ์ˆ˜์ธ $\mu$์™€ $\sigma$๋ฅผ ์ฐพ์•„ ์ตœ์ ์˜ normal distribution์„ ์•Œ์•„๋ณด๋Š” ์‹œ๊ฐ„์„ ๊ฐ€์ ธ๋ณด๋ ค ํ•œ๋‹ค.

 

 

Maximum Likelihood Estimation(MLE)

🌟 In the logistic regression post, I said that the model is chosen using MLE. Going deeper into the math behind the logistic regression equation, MLE is at the core of the process of computing which model to pick ...

sh-avid-learner.tistory.com

* normal distribution overview>

$pr(x|\mu, \sigma) = \cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x - \mu}{\sigma}\right)^2}$

 

 

→ As the figure above shows, $\mu$ determines where the distribution sits. It is the center of the distribution, i.e. its mean. A larger $\mu$ shifts the curve to the right, and a smaller $\mu$ shifts it to the left.

→ $\sigma$ determines the width of the distribution, i.e. how far it spreads to both sides. The larger $\sigma$ is, the wider the curve spreads; the smaller it is, the taller and more sharply peaked the curve becomes.
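To make the roles of $\mu$ and $\sigma$ concrete, here is a small sketch of the density formula above using only the Python standard library. The function name `normal_pdf` and the parameter values are my own choices for illustration, not something from the source video.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution at x, following the formula above."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# The peak sits at x = mu, so changing mu only slides the curve left or right.
peak_at_0 = normal_pdf(0.0, mu=0.0, sigma=1.0)
peak_at_2 = normal_pdf(2.0, mu=2.0, sigma=1.0)

# Changing sigma changes the shape: larger sigma -> lower, wider curve.
wide_peak = normal_pdf(0.0, mu=0.0, sigma=2.0)
narrow_peak = normal_pdf(0.0, mu=0.0, sigma=0.5)

print(peak_at_0, peak_at_2, wide_peak, narrow_peak)
```

The two peaks with the same $\sigma$ have the same height, while the peak height goes down as $\sigma$ goes up, which matches the two arrows above.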

 

🌿 We treat this distribution function as the likelihood and look for the distribution at which the likelihood is largest (that is, we look for the two parameters).

 

🌿 $L(\mu, \sigma|x) = \cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x - \mu}{\sigma}\right)^2}$

 

🌿 speculation) Given some x data, we can guess that the normal distribution that best explains the data has the mean of the x data as its parameter $\mu$ and the standard deviation of the x data as its parameter $\sigma$.

 

🌿 Let's prove it with MLE.

* using MLE>

※ Note: when there are two or more parameters, take the partial derivative with respect to each parameter while treating the other parameters as constants.

 

① The overall likelihood is the product of the likelihoods of the individual x points (treating the points as independent).

→ $L(\mu, \sigma | x_1, x_2, ... , x_n) = L(\mu, \sigma | x_1) \times \cdots \times L(\mu, \sigma | x_n) = \cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x_1 - \mu}{\sigma}\right)^2} \times \cdots \times \cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x_n - \mu}{\sigma}\right)^2}$
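The product in step ① can be sketched directly in Python. The sample `data` and the candidate parameter values are made up for illustration, and `likelihood` is a hypothetical helper name.

```python
import math

def normal_pdf(x, mu, sigma):
    """Per-point likelihood L(mu, sigma | x)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def likelihood(data, mu, sigma):
    """Overall likelihood: product of the per-point likelihoods (step 1)."""
    return math.prod(normal_pdf(x, mu, sigma) for x in data)

data = [4.2, 5.1, 4.8, 5.6, 5.3]  # hypothetical sample, not from the post

# A candidate centered near the data explains it far better than one far away.
near = likelihood(data, mu=5.0, sigma=0.5)
far = likelihood(data, mu=8.0, sigma=0.5)

print(near, far)
```

Even with five points the far-away candidate gives an extremely small number, which hints at why working with the logarithm is more comfortable than working with the raw product.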

 

② To make the differentiation easier, take the natural log (ln) of both sides:

→ $\ln[L(\mu, \sigma | x_1, x_2, ... , x_n)] = \ln\left(\cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x_1 - \mu}{\sigma}\right)^2} \times \cdots \times \cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x_n - \mu}{\sigma}\right)^2}\right)$

 

③ Expanding the ln on the right-hand side step by step:

→ = $\ln\left(\cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x_1 - \mu}{\sigma}\right)^2}\right) + \cdots + \ln\left(\cfrac{1}{\sigma \sqrt{2\pi}} e^{-\cfrac{1}{2}\left(\cfrac{x_n - \mu}{\sigma}\right)^2}\right)$

= $\left[\ln[(2\pi\sigma^2)^{-1/2}] - \cfrac{(x_1 - \mu)^2}{2\sigma^2}\ln(e)\right] + \cdots + \left[\ln[(2\pi\sigma^2)^{-1/2}] - \cfrac{(x_n - \mu)^2}{2\sigma^2}\ln(e)\right]$

= $\left[-\cfrac{1}{2}\ln(2\pi\sigma^2) - \cfrac{(x_1 - \mu)^2}{2\sigma^2}\right] + \cdots + \left[-\cfrac{1}{2}\ln(2\pi\sigma^2) - \cfrac{(x_n - \mu)^2}{2\sigma^2}\right]$

= $\left[-\cfrac{1}{2}\ln(2\pi) - \cfrac{1}{2}\ln(\sigma^2) - \cfrac{(x_1 - \mu)^2}{2\sigma^2}\right] + \cdots + \left[-\cfrac{1}{2}\ln(2\pi) - \cfrac{1}{2}\ln(\sigma^2) - \cfrac{(x_n - \mu)^2}{2\sigma^2}\right]$

= $\left[-\cfrac{1}{2}\ln(2\pi) - \ln(\sigma) - \cfrac{(x_1 - \mu)^2}{2\sigma^2}\right] + \cdots + \left[-\cfrac{1}{2}\ln(2\pi) - \ln(\sigma) - \cfrac{(x_n - \mu)^2}{2\sigma^2}\right]$

 

④ Grouping the common terms gives a compact form of the log-likelihood:

→ = $-\cfrac{n}{2}\ln(2\pi) - n\ln(\sigma) - \cfrac{(x_1 - \mu)^2}{2\sigma^2} - \cdots - \cfrac{(x_n - \mu)^2}{2\sigma^2}$
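As a sanity check on steps ② to ④, the grouped formula should give the same number as taking ln of the raw product from step ①. A minimal sketch, assuming a made-up sample and hypothetical function names:

```python
import math

def log_likelihood(data, mu, sigma):
    """Grouped form from step 4: -(n/2) ln(2 pi) - n ln(sigma) - sum((x - mu)^2) / (2 sigma^2)."""
    n = len(data)
    squared = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi) - n * math.log(sigma) - squared / (2.0 * sigma ** 2)

def log_of_product(data, mu, sigma):
    """ln of the raw product from step 1, computed the long way for comparison."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    product = math.prod(coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in data)
    return math.log(product)

data = [4.2, 5.1, 4.8, 5.6, 5.3]  # hypothetical sample, not from the post

closed = log_likelihood(data, mu=5.0, sigma=0.5)
direct = log_of_product(data, mu=5.0, sigma=0.5)

print(closed, direct)  # the two values should agree up to rounding
```

For large n the raw product can underflow to 0 in floating point, while the grouped sum stays well behaved, so in practice the grouped form is the one worth coding.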

 

⑤ (1) Now take the partial derivative with respect to $\mu$:

→ $\cfrac{\partial}{\partial\mu} \ln[L(\mu, \sigma | x_1, x_2, ... , x_n)] = 0 - 0 + \cfrac{x_1 - \mu}{\sigma^2} + \cdots + \cfrac{x_n - \mu}{\sigma^2}$

= $\cfrac{1}{\sigma^2}[(x_1 + \cdots + x_n) - n\mu]$
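The $\mu$ derivative can be checked numerically with a central finite difference on the log-likelihood. This is only a sketch: the sample, the evaluation point, and the step size `h` are arbitrary choices of mine.

```python
import math

def log_likelihood(data, mu, sigma):
    """Grouped log-likelihood from step 4."""
    n = len(data)
    squared = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi) - n * math.log(sigma) - squared / (2.0 * sigma ** 2)

def dlogL_dmu(data, mu, sigma):
    """Analytic partial derivative from step 5: (sum(x) - n * mu) / sigma^2."""
    return (sum(data) - len(data) * mu) / sigma ** 2

data = [4.2, 5.1, 4.8, 5.6, 5.3]  # hypothetical sample, not from the post
mu, sigma, h = 4.5, 0.8, 1e-6     # arbitrary evaluation point and step size

# Central difference: vary mu only, keep sigma fixed (it is treated as a constant).
numeric = (log_likelihood(data, mu + h, sigma) - log_likelihood(data, mu - h, sigma)) / (2.0 * h)
analytic = dlogL_dmu(data, mu, sigma)

print(numeric, analytic)
```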

 

⑥ (2) Next take the partial derivative with respect to $\sigma$:

→ $\cfrac{\partial}{\partial\sigma} \ln[L(\mu, \sigma | x_1, x_2, ... , x_n)] = 0 - \cfrac{n}{\sigma} + \cfrac{(x_1 - \mu)^2}{\sigma^3} + \cdots + \cfrac{(x_n - \mu)^2}{\sigma^3}$

= $-\cfrac{n}{\sigma} + \cfrac{1}{\sigma^3}[(x_1 - \mu)^2 + \cdots + (x_n - \mu)^2]$
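The same kind of finite-difference check works for the $\sigma$ derivative, this time holding $\mu$ fixed. Again the sample, the evaluation point, and `h` are assumptions made for illustration.

```python
import math

def log_likelihood(data, mu, sigma):
    """Grouped log-likelihood from step 4."""
    n = len(data)
    squared = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi) - n * math.log(sigma) - squared / (2.0 * sigma ** 2)

def dlogL_dsigma(data, mu, sigma):
    """Analytic partial derivative from step 6: -n / sigma + sum((x - mu)^2) / sigma^3."""
    squared = sum((x - mu) ** 2 for x in data)
    return -len(data) / sigma + squared / sigma ** 3

data = [4.2, 5.1, 4.8, 5.6, 5.3]  # hypothetical sample, not from the post
mu, sigma, h = 4.5, 0.8, 1e-6     # arbitrary evaluation point and step size

# Central difference: vary sigma only, keep mu fixed.
numeric = (log_likelihood(data, mu, sigma + h) - log_likelihood(data, mu, sigma - h)) / (2.0 * h)
analytic = dlogL_dsigma(data, mu, sigma)

print(numeric, analytic)
```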

 

⑦ Find where the partial derivatives with respect to both parameters are 0 (because that is where the maximum is):

→ 1> $0 = \cfrac{1}{\sigma^2}[(x_1 + \cdots + x_n) - n\mu]$

↔ $0 = (x_1 + \cdots + x_n) - n\mu$

$\mu = \cfrac{x_1 + \cdots + x_n}{n}$

 

→ 2> 0 = -$\cfrac{n}{\sigma}$ + $\cfrac{1}{\sigma^3}$$[(x_1 - \mu)^2 + ... + (x_n - \mu)^2]$

↔ 0 = $-n$ + $\cfrac{1}{\sigma^2}$$[(x_1 - \mu)^2 + ... + (x_n - \mu)^2]$

$\sigma$ = $\sqrt{\cfrac{(x_1 - \mu)^2 + ... + (x_n - \mu)^2}{n}}$
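One thing the steps above take for granted is that the point where both derivatives vanish is really a maximum. A quick second-derivative sketch, using the same symbols as above, with $S = (x_1 - \mu)^2 + \cdots + (x_n - \mu)^2$ as my own shorthand (not from the source) and assuming the data points are not all identical so that $S > 0$:

$\cfrac{\partial^2}{\partial\mu^2} \ln L = -\cfrac{n}{\sigma^2} < 0$

$\cfrac{\partial^2}{\partial\sigma^2} \ln L = \cfrac{n}{\sigma^2} - \cfrac{3S}{\sigma^4}$, which at $\sigma^2 = \cfrac{S}{n}$ becomes $-\cfrac{2n}{\sigma^2} < 0$

$\cfrac{\partial^2}{\partial\mu\,\partial\sigma} \ln L = -\cfrac{2}{\sigma^3}[(x_1 + \cdots + x_n) - n\mu] = 0$ at $\mu = \cfrac{x_1 + \cdots + x_n}{n}$

Both pure second derivatives are negative and the mixed one is zero at the solution, so the stationary point is a local maximum of the log-likelihood.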

 

⑧ Result: using MLE we have proven that the optimal $\mu$ is the mean of the given data and the optimal $\sigma$ is the standard deviation of the given data (the form that divides by $n$, as in the formula above)!

(This confirms that the speculation above was correct.)


* Source) the great STATQUEST https://www.youtube.com/watch?v=Dn6b9fCIUpM

* Image and thumbnail source) https://www.boost.org/doc/libs/1_49_0/libs/math/doc/sf_and_dist/graphs/normal_pdf.png
