
Bayesian Theorem

metamong 2022. 5. 7.

😺 I already covered some problems related to Bayesian theory in an earlier post!

 

Bayesian Theorem '(example - 2 exercises)

Q1) At a certain stage of a criminal investigation, ① the inspector in charge is 60% convinced of the guilty of a certain suspect. Suppose now that a new piece of evidence that shows that ② th..

sh-avid-learner.tistory.com

 

😺 This time, let's take a closer look at the concepts behind Bayesian theory!

concepts>

🖐🏻 In a nutshell, Bayes' theorem is about 'finding a conditional probability given the data as the condition'

(that is, the probability of each cause (B1, B2, B3, ...) given that the result (A) has occurred)

 

🖐🏻 Two assumptions behind Bayes Theorem

 

① Partition of the sample space

👳🏻 Partitioned causes - a result can have several causes (B1, B2, B3, ...), and these causes are mutually exclusive (no intersection) while their union is the entire sample space
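To make the partition assumption concrete, here is a minimal sketch in Python. The helper name `is_partition` and the die-roll sample space are made up purely for illustration and are not part of the original post.

```python
from itertools import combinations

def is_partition(sample_space, events):
    """Return True if the events are pairwise disjoint and their union is the sample space."""
    # Mutually exclusive: no two events share an outcome
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(events, 2))
    # Exhaustive: together the events cover every outcome
    covers_everything = set().union(*events) == set(sample_space)
    return pairwise_disjoint and covers_everything

# Hypothetical sample space: the outcomes of a single die roll
S = {1, 2, 3, 4, 5, 6}
B1, B2, B3 = {1, 2}, {3, 4}, {5, 6}

print(is_partition(S, [B1, B2, B3]))                 # True
print(is_partition(S, [{1, 2, 3}, {3, 4}, {5, 6}]))  # False, the first two overlap at 3
```

If either check fails, the causes do not partition the sample space, and the formulas in this post no longer apply as written.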

 

② Law of total probability

👳🏻 If we know the result (A) and the causes (Bx), the probability of the result can be written as follows

$$P(A) = P(A\cap B1) + ... + P(A\cap Bk) = P(B1)P(A|B1) + ... + P(Bk)P(A|Bk)$$

 

- (from the top) Assumption ① - Assumption ② - the law of total probability and the sample space S for k=3 -

 

 

🖐🏻 If we know two kinds of probabilities, we can apply Bayes' theorem!

 

① The probability of each partitioned cause event (B1 ~ Bk) $$P(B1), P(B2), ... P(Bk)$$

② The conditional probability that the result event (A) occurs given each cause event (B1 ~ Bk) $$P(A|B1), P(A|B2), ... P(A|Bk)$$

③ Knowing ① and ②, we can find the probability of each cause given that the result event has occurred

$$P(Bi|A) = \cfrac{P(A \cap Bi)}{P(A)} = \cfrac{P(Bi)P(A|Bi)}{P(B1)P(A|B1) + ... + P(Bk)P(A|Bk)}$$
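The formula above translates almost line by line into code. The following is only a sketch: the function name `posterior` and the two-cause numbers are illustrative assumptions, not something from the original post.

```python
def posterior(priors, likelihoods):
    """Return [P(Bi|A)] from the priors P(Bi) and the likelihoods P(A|Bi)."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (the Bi must partition the sample space)")
    # Numerators: P(A ∩ Bi) = P(Bi) * P(A|Bi)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(A), by the law of total probability
    total = sum(joint)
    if total == 0:
        raise ValueError("P(A) is zero, so P(Bi|A) is undefined")
    return [j / total for j in joint]

# Illustrative numbers (hypothetical): two equally likely causes
result = posterior([0.5, 0.5], [0.2, 0.6])
print([round(r, 3) for r in result])  # [0.25, 0.75]
```

The posteriors always sum to 1, because every numerator is one term of the shared denominator.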

 

🖐🏻 Example

Q. A factory has exactly three machines that produce its products. A single product cannot be produced by two or more of the machines (mutually exclusive), and every product of the company is produced by one of these three machines. The production shares of the three machines B1, B2, B3 are 0.2, 0.3, 0.5, and their defect rates are 0.13, 0.11, 0.1 respectively. (1) Given that a product is defective, find the probability that each of the three machines was the cause. (2) Then find the machine that is most likely to be responsible for the defect.

 

A. 

* Applying Bayes' theorem to the problem and sketching it as a simple diagram..>

 

P(B1), P(B2), P(B3) → P(A) (the result event A is 'defective')

 

* ① Each cause event

→ P(B1) = 0.2

→ P(B2) = 0.3

→ P(B3) = 0.5

 

* ② Defect rate of each machine

→ P(A|B1) = 0.13

→ P(A|B2) = 0.11

→ P(A|B3) = 0.1

 

* ③ Probability that each machine is the cause, given a defect

→ Machine B1) P(B1|A) = (0.2*0.13) / {(0.2*0.13) + (0.3*0.11) + (0.5*0.1)} = 26/109 ≈ 0.239

→ Machine B2) P(B2|A) = 33/109 ≈ 0.303

→ Machine B3) P(B3|A) = 50/109 ≈ 0.459

 

* ④ If a defect occurs, the machine most likely to be the cause is B3!
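The arithmetic in steps ① to ④ can be double-checked with exact fractions. This is just a sketch using Python's standard `fractions` module; the variable names are my own choice.

```python
from fractions import Fraction

# ① Production share of each machine, P(Bi)
priors = {"B1": Fraction(2, 10), "B2": Fraction(3, 10), "B3": Fraction(5, 10)}
# ② Defect rate of each machine, P(A|Bi)
defect_rates = {"B1": Fraction(13, 100), "B2": Fraction(11, 100), "B3": Fraction(10, 100)}

# P(A ∩ Bi) for each machine, then P(A) by the law of total probability
joint = {m: priors[m] * defect_rates[m] for m in priors}
p_defect = sum(joint.values())

# ③ P(Bi|A) for each machine
posteriors = {m: joint[m] / p_defect for m in priors}
for machine, p in posteriors.items():
    print(machine, p, round(float(p), 3))

# ④ The machine with the highest posterior
most_likely = max(posteriors, key=posteriors.get)
print("P(A) =", p_defect, "| most likely cause:", most_likely)
```

Working with `Fraction` keeps 26/109, 33/109 and 50/109 exact, so rounding only happens when the values are printed.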

applications>

👩‍🦱 Bayesian theory is the process of computing a posterior probability from a prior probability we already know - in other words, it is widely used to work out posterior probabilities as the data keeps being updated.

 

👩‍🦱 The example and explanation above were framed as cause (B) → result (A). In real life, P(B) is usually a prior probability whose value we already know, and A is new information that arrives as an update. We use P(A|B), the probability of A in the known situation B, as the observed data, and finally obtain P(B|A): the probability that the already-known event B holds given the new information A.
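The idea of updating can be sketched as a loop in which each posterior becomes the prior for the next observation. The scenario below is a hypothetical extension of the factory example (two defective products in a row from the same unknown machine), and it assumes the defects are independent once the machine is fixed.

```python
def update(prior, likelihood):
    """One Bayesian update: prior P(Bi) and likelihood P(A|Bi) -> posterior P(Bi|A)."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)  # P(A), law of total probability
    return [j / total for j in joint]

belief = [0.2, 0.3, 0.5]            # P(B1), P(B2), P(B3) from the factory example
defect_rates = [0.13, 0.11, 0.10]   # P(A|Bi) from the factory example

# Assumption: each defective product is an independent observation given the machine
for n in range(1, 3):
    belief = update(belief, defect_rates)  # yesterday's posterior is today's prior
    print(f"after {n} defective product(s):", [round(b, 3) for b in belief])
```

After the first update the beliefs match the factory answer above, and the second defect shifts some weight toward B1, the machine with the highest defect rate.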

 

Then, once again! Let's take an example 👨‍🌾

Q. Until now, it has been known that only 0.5% of the world's population has a certain disease X (prior probability; P(B) = 0.005). Then a new event A appeared: a method for detecting this disease (update - use Bayesian probability). Method A is said to detect the disease with 99% probability (P(A|B) = 0.99). At the same time, however, it diagnoses the disease in people who do not have it with 1% probability (P(A|Bc) = 0.01). When a particular person is diagnosed with the disease by method A (A), what is the probability (P(B|A)) that they actually have the disease (B)?

(Assume there are only two cases: a person either has disease X or does not)

 

A. Let's use P(B) = 0.005, P(A|B) = 0.99, P(A|Bc) = 0.01 to find P(B|A)! (Here P(Bc) = 1 - P(B) = 0.995.)

→ P(B|A) = (P(B)*P(A|B)) / {(P(B)*P(A|B)) + (P(Bc)*P(A|Bc))} = (0.005*0.99) / {(0.005*0.99) + (0.995*0.01)} ≈ 0.332

∴ So! A person diagnosed with the disease by method A actually has it with a probability of only about 33.2%. The disease is so rare that false positives among the healthy majority outnumber the true positives.
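One way to see where the 33.2% comes from is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of one million; the population size is arbitrary and cancels out of the final ratio.

```python
population = 1_000_000          # hypothetical population size (any value works)
p_disease = 0.005               # P(B)
detection_rate = 0.99           # P(A|B)
false_positive_rate = 0.01      # P(A|Bc)

sick = population * p_disease                       # about 5,000 people
healthy = population - sick                         # about 995,000 people

true_positives = sick * detection_rate              # sick and diagnosed
false_positives = healthy * false_positive_rate     # healthy but diagnosed

# P(B|A): the share of all diagnosed people who are actually sick
p_has_disease = true_positives / (true_positives + false_positives)
print(round(p_has_disease, 3))  # 0.332
```

Roughly 4,950 true positives stand against roughly 9,950 false positives, which is why a positive diagnosis is still more likely to be wrong than right.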

 

👩‍🦱 It is used especially in machine learning (ML), where hypotheses are continually updated on a given dataset while working toward the best model. Because new hypotheses, that is, new information, keep being updated, Bayesian Theorem is used to ultimately improve the model's performance!

 

Models and concepts that build on Bayesian theory, such as the Naive Bayesian Classifier (including the Gaussian variant), will be covered in later posts!

 

Also, as mentioned above, the process of building an ML model and the process of Bayesian calculation follow the same logic, so there is a lot more to study here once we get into ML! Let's understand it more deeply through later posts!


* Source 1) ProDS (beginner + intermediate) 1

* Source 2) https://www.youtube.com/watch?v=9wCnvr7Xw4E
