
ANOVA & (One-Way ANOVA + w/code)

metamong 2022. 4. 25.

🧐 So far we have tested whether the mean of a single sample group equals the population mean (one-sample t-test)

 

T-test 👉 《One-sample T-test (w/ python code)》


sh-avid-learner.tistory.com

🧐 We also tested whether the means of two sample groups differ or not (two-samples 'independent' t-test)

 

T-test 👉 《Two-samples 'independent' T-test (w/python code)》


sh-avid-learner.tistory.com

 

★ So what about three or more groups? ★

A. Use ANOVA!

ANOVA(ANalysis Of VAriance)

It is a test for comparing means.

ANOVA only tells us whether the population means of two or more populations differ. To test specifically how much they differ, (1) first show with ANOVA that a difference exists, then (2) run an additional post-hoc test!

How does ANOVA compare means?

A. By using 'variability'

(It relies on the fact that the more the group means differ, the larger the variability among the group means.)

☆ Inference about the means of two or more groups, using the variability information in the samples ☆
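The two-step workflow described above (ANOVA first, a post-hoc test second) might look like the sketch below. The data are synthetic, and Tukey's HSD via `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later) is only one possible post-hoc test; the post itself does not name one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic groups (assumption): a and b share a mean, c is shifted upward.
a = rng.normal(10.0, 1.0, size=30)
b = rng.normal(10.0, 1.0, size=30)
c = rng.normal(13.0, 1.0, size=30)

# Step 1: one-way ANOVA answers "is there any difference at all?"
f_stat, p_anova = stats.f_oneway(a, b, c)
print("ANOVA p-value:", p_anova)

# Step 2: a post-hoc test answers "which pairs differ?"
posthoc = stats.tukey_hsd(a, b, c)
pairwise_p = posthoc.pvalue  # 3x3 matrix of pairwise p-values
print(pairwise_p)
```

The ANOVA p-value alone does not say which group is responsible; the pairwise matrix from the post-hoc step is what localizes the difference.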

 

<Terms to know first>

👀 factor = the criterion that separates the populations (groups)

👀 treatment = each group's population formed by the factor

🧙‍♀️ Depending on whether the populations are separated by one factor or by two factors, we get One-Way ANOVA or Two-Way ANOVA!

(+) Why not multiple comparisons?

🧙‍♀️ When comparing three or more groups, you might think: instead of ANOVA, shouldn't we just run a 'two-samples ind t-test' on every pair of groups?

🧙‍♀️ But no! The multiple comparison approach should not be used here

Let's see why with a little math

 

→ Suppose there are 3 groups, which means 3 pairwise tests. If each test has probability α of a statistical error (a false rejection) and the tests are treated as independent, then

① probability of no error in one test: (1-α)

② probability of no error in all 3 tests: (1-α)^3

③ probability of an error in at least one test: 1-(1-α)^3

→ In general, with m tests in total, it is known that 1-(1-α)^m ≤ mα (k groups give m = k(k-1)/2 pairwise tests)

 

So the error grows as the number of groups grows! That is, the more groups we compare, the higher the probability of a statistical error, so

we should compare all the groups at once with ANOVA!

One-way ANOVA

One-way ANOVA assumptions and procedure>

Assumption ①> Each population follows a normal distribution

Assumption ②> Homoscedasticity - the variances of all treatments are equal (= the variability of the response variable within each treatment is constant)

Assumption ③> The data are classified into k treatments by 1 factor

 

"The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.

👆 The samples are independent.

👆 Each sample is from a normally distributed population.

👆 The population standard deviations of the groups are all equal. This property is known as homoscedasticity."
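One way to screen the normality and equal-variance assumptions before running ANOVA is sketched below. The choice of Shapiro-Wilk (`scipy.stats.shapiro`) and Levene (`scipy.stats.levene`) is an assumption of this sketch, not something the post prescribes, and the data are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic treatments (assumption): same spread, different means.
groups = [rng.normal(loc=mu, scale=2.0, size=40) for mu in (5.0, 6.0, 7.0)]

# Normality: one Shapiro-Wilk test per treatment (H0: the sample is normal).
shapiro_p = [stats.shapiro(g)[1] for g in groups]

# Homoscedasticity: Levene's test across treatments (H0: equal variances).
levene_stat, levene_p = stats.levene(*groups)

print("Shapiro p-values:", shapiro_p)
print("Levene p-value:", levene_p)
```

Small p-values here would be a warning sign about the assumptions; large p-values do not prove them, they only fail to contradict them.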

 

★ Test whether the k population means are all equal or not ★

(The number of replicates (n), i.e. the number of observations per treatment, does not have to be the same for every treatment)

 

ex)

→ There are 4 different populations (4 treatments) separated by 1 factor

Draw a sample from each population!

Start with every population mean placed on the same line, at 0

and proceed on the premise that each sample mean lies near that common population mean

If the sample mean of one population is unusually different, that population increases the variability of the sample means overall!

That is, measure the variability of the samples across the groups, and if it is large, conclude that the population means are not all equal

Sums of squares> measuring variability

(In the formulas here there are k treatments, with n observations inside each treatment)

🧵 A decomposition we already met with R-squared!

 

① Total sum of squares (SST; Sum of Squares Total): treat all the data as one group and add up the squared deviations of each data point from the overall mean (no treatment distinction) - measures the variability of the whole data set regardless of group

② Error sum of squares (SSE; Sum of Squares Error) <within-group variation>: measures the variation inside each group (treatment); the sum of squared deviations of the data in each group from that group's mean

③ Treatment sum of squares (SSTR; Sum of Squares TReatment) <between-group variation>: measures the variation of the group means; the sum of squared deviations of each group mean from the overall mean of the data (the size of SSTR indicates how much the population means differ between groups)
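The formula image that accompanied these definitions is missing from this copy. With k treatments and n observations each, the standard definitions consistent with the descriptions above would read as follows; the symbols x_{ij} (j-th observation in treatment i), \bar{x}_i (mean of treatment i) and \bar{x} (overall mean) are introduced here and are not taken from the original.

```latex
\begin{aligned}
SST  &= \sum_{i=1}^{k}\sum_{j=1}^{n} \left(x_{ij} - \bar{x}\right)^2 \\
SSE  &= \sum_{i=1}^{k}\sum_{j=1}^{n} \left(x_{ij} - \bar{x}_i\right)^2 \\
SSTR &= \sum_{i=1}^{k} n\left(\bar{x}_i - \bar{x}\right)^2 \\
SST  &= SSTR + SSE
\end{aligned}
```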

 

🧖‍♂️ In short! A large SSTR suggests the population means are not equal / a small SSTR suggests they are equal

(We look at the size of SSTR under the assumption that all population means are equal - because we observe samples, it never comes out as exactly 0)

😺 We test using the ratio between SSE and SSTR - that is, whether the between-group variation is large or small 'relative to' the within-group variation (= whether the between-group variation is large enough compared with the variation the data have anyway, the within-group variation) - a comparison of relative size (starting from the assumption that all group population means are equal)
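The three sums of squares can be computed directly with NumPy. The sketch below uses a small made-up data set (k = 3 treatments, n = 4 observations each) so that the numbers are easy to verify by hand.

```python
import numpy as np

# Toy data (assumption): one row per treatment, n = 4 observations each.
data = np.array([
    [4.0, 5.0, 6.0, 5.0],   # treatment 1, mean 5
    [7.0, 8.0, 9.0, 8.0],   # treatment 2, mean 8
    [1.0, 2.0, 3.0, 2.0],   # treatment 3, mean 2
])

n = data.shape[1]
grand_mean = data.mean()            # overall mean, ignoring treatments
group_means = data.mean(axis=1)     # one mean per treatment

sst = ((data - grand_mean) ** 2).sum()               # total variation
sse = ((data - group_means[:, None]) ** 2).sum()     # within-group variation
sstr = (n * (group_means - grand_mean) ** 2).sum()   # between-group variation

print(sst, sse, sstr)  # 78.0 6.0 72.0
```

Here almost all of the total variation (72 out of 78) sits between the groups, which is the pattern that leads ANOVA to reject equal means.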

Test procedure>

1> Hypotheses

Null hypothesis H0) 'There is no difference in means between the groups'

Alternative hypothesis Ha) 'There is a difference in means between the groups' = i.e., the factor has a treatment effect

🙋‍♀️ We mostly run ANOVA to find out whether a factor has an effect, i.e. whether one population can be split into two or more groups by that factor

 

2> Computing the test statistic

① Compute the mean squares of SSE and SSTR (divide each by its degrees of freedom)

MSE = SSE/(nk-k)

MSTR = SSTR/(k-1) (it is between-group variation, so only the k treatments count toward the degrees of freedom)

 

② If the group population means are all equal? MSE ≒ MSTR

③ If the group population means are not all equal? MSTR ≫ MSE

MSE is always an unbiased estimate of the common variance, whether or not the group population means are equal (the variability within each treatment is assumed to be constant),

but MSTR is unbiased only when there is no difference in means between the groups (i.e. when the null hypothesis is true). The larger the differences between group means, the larger the between-group variation, so MSTR becomes much larger than MSE

④ We write the MSTR/MSE ratio as F[k-1, nk-k] (the brackets hold the two degrees of freedom): in case ② F is close to 1, and in case ③ F keeps growing.

⑤ Under the null hypothesis this F value (ratio) follows an F distribution (** the F distribution will get its own post later!)
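The ratio can be computed by hand and compared against `scipy.stats.f_oneway`. The sketch below uses made-up data with unequal group sizes, so the within-group degrees of freedom are written as N - k (total observations minus number of treatments), which reduces to nk - k when every treatment has n observations.

```python
import numpy as np
from scipy import stats

# Toy data (assumption): three treatments with different sample sizes.
groups = [
    np.array([4.0, 5.0, 6.0, 5.0]),
    np.array([7.0, 8.0, 9.0]),
    np.array([1.0, 2.0, 3.0, 2.0, 2.0]),
]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

sstr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

mstr = sstr / (k - 1)   # between-group mean square
mse = sse / (N - k)     # within-group mean square
f_manual = mstr / mse

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy, p_scipy)
```

The hand-computed ratio and the library value should agree up to floating-point error.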

 

3> Computing the p-value from the F distribution

👋 p-value = 'the probability that, when the null hypothesis H0 is true, the test statistic X (MSTR/MSE ~ F[k-1, nk-k]) takes a value 'larger' than x0 (the value of the test statistic computed from the sample data)'

※ <Caution> Note that for x0 the direction of the alternative hypothesis Ha is 'always to the right' (right tail) ※

(MSE stays about the same size, while MSTR can only get larger as differences between the groups appear - it cannot get smaller)
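A small simulation can illustrate why only the right tail matters. The sketch below is an illustration under assumed settings (k = 3, n = 5, unit variance, 2000 repetitions, a shift of 2 in one group); none of these numbers come from the post.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k, n, alpha, n_sims = 3, 5, 0.05, 2000


def simulate_f(means):
    """Draw n_sims data sets with the given treatment means; return their F values."""
    f_values = np.empty(n_sims)
    for s in range(n_sims):
        samples = [rng.normal(mu, 1.0, size=n) for mu in means]
        f_values[s] = stats.f_oneway(*samples)[0]
    return f_values


f_null = simulate_f([0.0, 0.0, 0.0])   # H0 true: all means equal
f_alt = simulate_f([0.0, 0.0, 2.0])    # H0 false: one mean shifted

# Right-tail critical value of F[k-1, nk-k]
crit = stats.f.ppf(1 - alpha, k - 1, n * k - k)
reject_null = (f_null > crit).mean()
reject_alt = (f_alt > crit).mean()
print(reject_null, reject_alt)
```

Under H0 the rejection rate should land near α, while a shifted mean pushes the F values to the right and the rejection rate up.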

 

4> Test

👋 If the p-value computed in 3> is smaller than the given significance level α, reject the null hypothesis (a p-value smaller than α means the observed F statistic is larger than the critical value of the F distribution at level α)
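The equivalence between "p-value below α" and "F above the critical value" can be sketched with `scipy.stats.f`. The function name `f_test_decision` and the input F = 4.2 are hypothetical values chosen for illustration.

```python
from scipy import stats


def f_test_decision(f_obs, df_between, df_within, alpha):
    """Return (p-value, critical value, reject?) for a right-tailed F test."""
    p_value = stats.f.sf(f_obs, df_between, df_within)        # P[X > f_obs]
    f_crit = stats.f.ppf(1 - alpha, df_between, df_within)    # right-tail cutoff
    return p_value, f_crit, bool(p_value < alpha)


p, crit, reject = f_test_decision(4.2, 2, 12, alpha=0.05)
print(p, crit, reject)
```

Both routes give the same decision: the observed F exceeds the cutoff exactly when the right-tail probability falls below α.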

ANOVA Table>

👋 type of sum of squares / degrees of freedom / type of mean square / final F value

 

Example>

Q. Three types of car headlight design are under consideration. To compare the effect of the lights, the distance (m) at which an obstacle is recognized at a speed of 60 km/h was measured 5 times for a vehicle with each design. Based on these data, we want to test at the 1% significance level whether the recognition distance differs by light type.

♬ factor = car headlight design

♬ treatments = A, B, C, three types in total

♬ response variable (what we observe) = the distance at which the obstacle is recognized (recognition distance)

 

 

A.

1> Hypotheses

Null hypothesis H0: the mean recognition distance does not differ by headlight type

→ Alternative hypothesis Ha: the mean recognition distance differs by headlight type

 

2> Test statistic & its distribution

→ When the null hypothesis is true, X = MSTR/MSE ~ F[2,12]

→ Value of the test statistic x0 = 28.23

 

3> p-value, comparison with the significance level, and result

→ The p-value is P[X>28.23] = 2.9e-05, which is smaller than the significance level 0.01

→ The null hypothesis is rejected

4> Conclusion

→ Since the null hypothesis is rejected, the mean recognition distance differs by headlight type!

→ In other words, we can conclude that headlight type can be a factor that explains the recognition distance.
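The numbers in this example can be reproduced from the stated design alone (k = 3 designs, n = 5 measurements each, x0 = 28.23). The closed-form line is an extra check that holds only when the numerator degrees of freedom equal 2; it is not part of the original post.

```python
from scipy import stats

k, n = 3, 5
df_between = k - 1        # 2
df_within = n * k - k     # 12
x0 = 28.23

# Right-tail probability P[X > x0] for X ~ F[2, 12]
p_value = stats.f.sf(x0, df_between, df_within)

# For numerator df = 2 the tail has a closed form: (1 + 2*x/d2) ** (-d2/2)
p_closed = (1 + 2 * x0 / df_within) ** (-df_within / 2)

print(f"{p_value:.1e}")   # 2.9e-05
print(p_value < 0.01)     # True
```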

 

(+) Example ANOVA table

 

 


w/code

♠ scipy.stats.f_oneway ♠

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html

 

scipy.stats.f_oneway(*args, axis=0)

 

"Perform one-way ANOVA. The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes."

 

Example>

Q. Let's collect the zelkova (느티나무), ginkgo (은행나무), and American sycamore (양버즘나무) data from the Seoul street-tree statistics. Using a One-Way ANOVA with tree species as the single factor and the number of trees per district as the response, let's test at the 1% significance level whether the mean number of trees per district is the same for the three species

(the same data used in the two independent samples t-test)

1> Prepare the data

 

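import pandas as pd

# df: the Seoul street-tree DataFrame loaded in the earlier t-test post (not shown here)
df_anova = df.loc[:, ['은행나무', '양버즘나무', '느티나무']]

# remove every ',' (thousands separator) and convert each column to a numeric dtype;
# a column-wise apply avoids assigning row by row into a slice inside iterrows()
df_anova = df_anova.apply(lambda s: pd.to_numeric(s.str.replace(',', '')))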

 

2> EDA

→ Visualize the 3 treatments (the three tree species) with a violin plot>

 

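import matplotlib
import seaborn as sns
from matplotlib import font_manager, rc

# keep minus signs rendering correctly after switching fonts
matplotlib.rcParams['axes.unicode_minus'] = False

# use a font that can render the Korean column names (Windows path to Malgun Gothic)
font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
rc('font', family=font_name)

#violinplot
sns.violinplot(data=df_anova, palette="muted");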

 

 

3> Print the F test statistic and the p-value

 

from scipy import stats

fvalue, pvalue = stats.f_oneway(df_anova['은행나무'], df_anova['양버즘나무'], df_anova['느티나무'])
print(fvalue, pvalue)
#17.006289557888046 8.935183167883698e-07

 

4> Test result

→ The p-value is smaller than 0.01, so the null hypothesis is rejected

→ Therefore the ginkgo, American sycamore, and zelkova data can be told apart by the factor 'tree species', and we can show, at a statistically significant level, that the mean tree counts of the 3 treatments defined by this factor are not all equal!
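Because the snippets above depend on a DataFrame loaded in an earlier post, the same steps can be tried end to end with the self-contained sketch below. The table is a synthetic stand-in: the column names `ginkgo`, `sycamore`, `zelkova`, the 25 rows, and the value ranges are all assumptions, not the real Seoul data.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(7)


def fake_counts(low, high, size=25):
    """Per-district counts as strings with thousands separators, like the raw table."""
    return [f"{int(v):,}" for v in rng.integers(low, high, size=size)]


# Synthetic stand-in for the street-tree table (assumption).
raw = pd.DataFrame({
    "ginkgo": fake_counts(2000, 8000),
    "sycamore": fake_counts(1000, 4000),
    "zelkova": fake_counts(500, 3000),
})

# 1> strip the separators and convert every column to numbers
clean = raw.apply(lambda s: pd.to_numeric(s.str.replace(",", "", regex=False)))

# 3> one-way ANOVA across the three species
f_value, p_value = stats.f_oneway(*(clean[c] for c in clean.columns))

# 4> decision at the 1% significance level
alpha = 0.01
reject = p_value < alpha
print(f_value, p_value, reject)
```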

 

- The end! -

 


* Thumbnail source) https://ourcodingclub.github.io/tutorials

* Source for the overall content) ProDS (beginner + intermediate) 1

* ANOVA reference material) https://www.mathstat.dal.ca/~stat2080/Fall14/Lecturenotes/anova1.pdf
