Math & Linear Algebra/Concepts

Odds Ratio & log(Odds Ratio)

metamong 2022. 7. 11.

๐ŸŒŸ ์ผ๋ช… '์˜ค์ฆˆ(Odds)'๋ผ๊ณ  ํ•ด์„œ ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ๋ชจ๋ธ์„ ๊ตฌํ•˜๋Š” ์‹์— ์‚ฌ์šฉ๋œ๋‹ค๊ณ  ์˜ˆ์ „ ๋กœ์ง€์Šคํ‹ฑ ํฌ์ŠคํŒ…์—์„œ ๋ฐฐ์šด ์ ์ด ์žˆ๋‹ค.

 

๐ŸŒŸ '์˜ค์ฆˆ๋น„'์™€ ๋กœ๊ทธ๋ฅผ ์”Œ์šด '๋กœ๊ทธ์˜ค์ฆˆ๋น„' ๋‘ ๊ฐ€์ง€ ๊ฐœ๋…์— ๋Œ€ํ•ด์„œ ์ž์„ธํžˆ ์•Œ์•„๋ณด์ž ๐Ÿฆ”

1> odds

ex) 'the odds in favor of my team winning the game are 1 to 4 ... ' → $\cfrac{1}{4}$

= 0.25 - ์ฆ‰ ๋‚ด ํŒ€์ด ๊ฒŒ์ž„์— ์ด๊ธธ odds๋Š” 0.25์ด๋‹ค

 

๐ŸŒŸ ์—ฌ๊ธฐ์„œ ์ค‘์š”ํ•œ ๊ฑด odds of p๋Š” probability of p ํ™•๋ฅ ์ด ์•„๋‹ˆ๋ผ๋Š” ์  ๐Ÿ’ซ

 

โ‰ซ odds๋Š” $\cfrac{p}{1-p}$๋ผ๋ฉด, probability๋Š” $\cfrac{p}{1}$์ด๋‹ค. 

 

โ‰ซ (1-p)์ธ ํ™•๋ฅ ์— ๋Œ€ํ•œ p์ธ ํ™•๋ฅ ์„ odds๋กœ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋‹ค. $\cfrac{P(p)}{P(1-p)}$ = $\cfrac{p}{1-p}$

2> log(odds)

๐ŸŒŸ odds์˜ ๋ฒ”์œ„๋ฅผ ์‚ดํŽด๋ณด๋ฉด, ์šฐ๋ฆฌ๊ฐ€ ๊ด€์‹ฌ์žˆ์–ด ํ•˜๋Š” ์„ฑ๊ณตํ™•๋ฅ  p์— ๋น„ํ•ด ๋ฐ˜๋Œ€ ์‹คํŒจํ™•๋ฅ  (1-p)๊ฐ€ ๋งค์šฐ ํด์ˆ˜๋ก odds๋Š” 0์— ๊ฐ€๊นŒ์›Œ์ง์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๊ณ , ๊ทธ ๋ฐ˜๋Œ€๋กœ ์„ฑ๊ณตํ™•๋ฅ  p๊ฐ€ ์‹คํŒจํ™•๋ฅ  (1-p)์— ๋น„ํ•ด ๋งค์šฐ ํด์ˆ˜๋ก 1์—์„œ +$\infty$๊นŒ์ง€ ๋ฒ”์œ„๊ฐ€ ๋ฌดํ•œํ•จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๐ŸŒŸ 0์„ ๊ธฐ์ค€์œผ๋กœ p>(1-p)์ด๋ฉด log(odds)๋Š” ์–‘์ˆ˜๊ฐ’, (1-p)>p์ด๋ฉด log(odds)๋Š” ์Œ์ˆ˜๊ฐ’์ด๋‹ค.

 

๐ŸŒŸ์—ฌ๊ธฐ์„œ $log(\cfrac{p}{1-p})$๋ฅผ logit function์ด๋ผ๊ณ  ํ•œ๋‹ค. (๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€์˜ basis)

 

๐ŸŒŸ log๊ฐ’์„ ์ทจํ•œ ์ˆ˜์น˜๋Š” 0์„ ๊ธฐ์ค€์œผ๋กœ ๋Œ€์นญ์ธ, ๋ถ„ํฌ๋ฅผ ์ด๋ฃฌ๋‹ค.

3> odds ratio

→ ์•„๋ž˜ ์˜ˆ๋ฅผ ์ด์šฉํ•ด์„œ odds ratio๋ฅผ ์“ฐ๋Š” ์‹ค์ƒํ™œ ์˜ˆ์‹œ๋ฅผ ์‚ดํŽด๋ณด์ž

 

  Has Cancer Doesn't have Cancer
Has the mutated gene 23 117
Doesn't have the mutated gene 6 210

โ‘  mutated gene์„ ๊ฐ–๊ณ  ์žˆ๋‹ค๋ฉด, cancer๋ฅผ ๊ฐ€์งˆ odds๋Š” $\cfrac{23}{117}$

โ‘ก mutated gene์„ ๊ฐ–๊ณ  ์žˆ์ง€ ์•Š๋‹ค๋ฉด, cancer๋ฅผ ๊ฐ€์งˆ odds๋Š” $\cfrac{6}{210}$

 

→ odds ratio๋ฅผ ๋”ฐ์ง€๋ฉด $\cfrac{23/117}{6/210}$ = 6.88

→ ์ฆ‰ mutated gene์„ ๊ฐ–๊ณ  ์žˆ๋Š” ์‚ฌ๋žŒ์ด cancer๋ฅผ ๊ฐ€์งˆ odds๋Š” 6.88๋ฐฐ ๋” ๋†’๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

→ log(odds)๊ฐ’์€ 1.93

 

๐ŸŒŸ ์œ„ ์˜ˆ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด odds ratio๋Š” cancer ์œ ๋ฌด์— mutated gene ์—ฌ๋ถ€ ์š”์†Œ๊ฐ€ ์˜ํ–ฅ์„ ๋ผ์น˜๋Š” ์ง€, ์ข‹์€ predictor์ธ์ง€ ํŒ๋ณ„ํ•  ์ˆ˜ ์žˆ๋Š” ์ˆ˜์น˜์ด๋‹ค. (R-squared ์ฒ˜๋Ÿผ). odds ratio๊ฐ’์ด ๋†’์„์ˆ˜๋ก(๋˜๋Š” log(odds ratio)๊ฐ’) cancer ์œ ๋ฌด์— ๊ด€ํ•œ good predictor๋ผ ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๐ŸŒŸ ์—ฌ๊ธฐ์„œ ์šฐ๋ฆฌ๋Š” ๊ณ„์‚ฐ์œผ๋กœ ์•Œ๊ฒŒ ๋œ odds ratio๊ฐ€ ๋‹น์—ฐํžˆ statistically significantํ•œ์ง€ test๋ฅผ ๊ฑฐ์ณ์•ผ ํ•œ๋‹ค

(fisher's exact test, chi-square test(p-value), the wald test(confidence interval) ์„ธ test๋ฅผ ์ฃผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค. ๋ฌด์—‡์ด best ์ธ์ง€๋Š” ์—†์Œ)

โ‘  chi-square test

→ ๋จผ์ € cancer ์—ฌ๋ถ€์— mutated gene ์š”์†Œ๊ฐ€ ์˜ํ–ฅ์„ ๋ผ์น˜์ง€ ์•Š๋Š”๋‹ค๊ณ  ๊ฐ€์ •ํ•œ๋‹ค.

๊ฐ ์นธ์˜ expected value ๊ตฌํ•˜๊ธฐ

- mutated gene ์—ฌ๋ถ€์™€ ์ƒ๊ด€์—†์ด ์ „์ฒด cancer๋ฅผ ๊ฐ–๊ณ  ์žˆ๋Š” ํ™•๋ฅ ์€ $\cfrac{29}{356}$ = 0.08

- ์ฆ‰, mutated gene ์—ฌ๋ถ€์™€ ์ƒ๊ด€์—†์ด ๊ตฌํ•œ cancer ์—ฌ๋ถ€ ํ™•๋ฅ ๋กœ ๊ธฐ๋Œ“๊ฐ’์„ ๊ตฌํ•œ ๊ฒƒ์ด๋‹ค.

 

expected values Has Cancer Doesn't have Cancer
Has the mutated gene 11.2 (=140*0.08) 128.8
Doesn't have the mutated gene 17.3(=216*0.08) 198.7

expected values์™€ ์‹ค์ œ observed values ๋น„๊ตํ•˜๊ธฐ - p-value ๊ตฌํ•˜๊ธฐ

โ‘ก the wald test

โ€ป log(odds ratios)๊ฐ’์ด ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋ณด์ธ๋‹ค๋Š” ๊ฐ€์ •๊ณผ ํ•จ๊ป˜ ์ถœ๋ฐœํ•œ๋‹ค.

→ 1000๊ฐœ์˜ sample์ด ๋„˜๋Š” log(odds ratio)๊ฐ’๋“ค์ด ๋ชจ์ธ ์ •๊ทœ๋ถ„ํฌ

→ ์ƒ˜ํ”Œ size๋Š” 300์—์„œ 400๊ฐœ ์‚ฌ์ด

→ ๊ฐ ์ƒ˜ํ”Œ๋งˆ๋‹ค 0์—์„œ 1์‚ฌ์ด์˜ ๋žœ๋ค๊ฐ’์„ ์ •ํ•˜๊ณ , 0.08 ์ดํ•˜์˜ ๊ฐ’์€ cancer๊ฐ€ ์žˆ๋‹ค๊ณ  ๊ฐ„์ฃผ

→ ๋‹ค์‹œ 0์—์„œ 1์‚ฌ์ด์˜ ๋žœ๋ค๊ฐ’์„ ์ •ํ•˜๊ณ , 0.39 ์ดํ•˜์˜ ๊ฐ’์€ mutated gene์ด ์žˆ๋‹ค๊ณ  ๊ฐ„์ฃผ

→ ๋žœ๋คํ•˜๊ฒŒ ๋ฝ‘๊ณ  ๊ฐ„์ฃผํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋ถ„๋ฅ˜ํ•ด ๊ฐœ์ˆ˜๋ฅผ ์„ธ๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

 

using random values has cancer - yes has cancer - no
has the mutated gene - yes 14 141
has the mutated gene - no 15 222

→ ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์•ฝ 10,000๋ฒˆ ์‹œํ–‰ํ•ด histogram์„ ๊ทธ๋ฆฌ๋ฉด ์ •๊ทœ๋ถ„ํฌ ํ˜•ํƒœ๊ฐ€ ๋‚˜์˜จ๋‹ค. standard deviation ๊ฐ’์€ ์•ฝ 0.43์ด ๋‚˜์˜ด

→ observed values์—์„œ standard deviation ๊ฐ’์„ ๊ตฌํ•˜๋ฉด 0.47

์„œ๋กœ ๊ฐ’์ด ๋งค์šฐ ๋น„์Šทํ•˜๋‹ค

 

→ log(odds ratio)์—์„œ standard deviation๊ฐ’์„ ๋‚˜๋ˆ„๋ฉด 4.11 (์ฆ‰ ๊ตฌํ•œ log(odds ratio)๊ฐ’์€ ๋ถ„ํฌ์˜ ํ‰๊ท ์œผ๋กœ๋ถ€ํ„ฐ ์•ฝ 4.11 ํ‘œ์ค€ํŽธ์ฐจ ๋ฐฐ์ˆ˜๋งŒํผ ๋–จ์–ด์กŒ๋‹ค๋Š” ๋œป์ด๋‹ค.

(์ผ๋ฐ˜์ ์œผ๋กœ 2๋ฐฐ์ˆ˜๋ณด๋‹ค ํฐ ๋ฐฐ์ˆ˜๋งŒํผ ๋–จ์–ด์กŒ๋‹ค๋Š” ๊ฑด p-value๊ฐ’์ด 0.05๋ฏธ๋งŒ์œผ๋กœ log(odds ratio)๊ฐ’์ด statistically significantํ•˜๋‹ค๊ณ  ๊ฒฐ๋ก ์„ ๋‚ด๋ฆด ์ˆ˜ ์žˆ๋‹ค.)


* ์ถœ์ฒ˜1) STATQUEST - odds & log(odds) https://www.youtube.com/watch?v=ARfXDSkQf1Y 

* ์ถœ์ฒ˜2) STATQUEST - odds ratio & log(odds ratio) https://youtu.be/8nm0G-1uJzA

'Math & Linear Algebra > Concepts' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

Linear Equation & Linear System / Rank & det(A)  (0) 2023.02.01
Matrix (fundamentals)  (0) 2022.07.31
eigenvalue & eigenvector  (0) 2022.05.14
linear & non-linear โ†’ span, basis, rank, projection  (0) 2022.05.13
Pearson & Spearman correlation coefficients  (0) 2022.05.13

๋Œ“๊ธ€