Statistics/Concepts(+codes)

Law of Large Numbers (ํฐ ์ˆ˜์˜ ๋ฒ•์น™; LLN)

metamong 2022. 5. 5.

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ sample data ์ˆ˜๊ฐ€ ์ปค์งˆ์ˆ˜๋ก, sample์˜ ํ†ต๊ณ„์น˜๋Š” ์ ์  ๋ชจ์ง‘๋‹จ์˜ ๋ชจ์ˆ˜์™€ ๊ฐ™์•„์ง„๋‹ค๋Š” ๋œป!

โ˜… ๊ตฌ์ฒด์ ์ด๊ฒŒ ๋งํ•˜๋ฉด 'the mean of your sample is going to converge to the true mean of the population or to the expected value of the random variable'

 

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ ์ผ๋ฐ˜์ ์œผ๋กœ sample์˜ ์ˆ˜๊ฐ€ 30๊ฐœ ์ด์ƒ์ด๋ฉด ํฐ ์ˆ˜์˜ ๋ฒ•์น™์ด ์ ์šฉ๋œ๋‹ค๊ณ  ํ•œ๋‹ค

 

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ ๋„ˆ๋ฌด๋‚˜ ๋‹น์—ฐํ•œ ๋‚ด์šฉ์ด๋ฏ€๋กœ..! ๋น ๋ฅด๊ฒŒ ํ›‘๊ณ  ๋„˜์–ด๊ฐ€์ž ๐Ÿงš๐Ÿพ

concepts

- wikipedia -

'In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.'

 

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ ์—„๋ฐ€ํžˆ ๋งํ•˜๋ฉด sample์˜ ํ‰๊ท ์ด sample ๊ฐฏ์ˆ˜๊ฐ€ ์ปค์งˆ์ˆ˜๋ก, ๋ฌดํ•œํžˆ ๊ฐˆ์ˆ˜๋ก ์ „์ฒด ๊ธฐ๋Œ“๊ฐ’(๋ชจ์ˆ˜)์— ๊ฐ€๊นŒ์›Œ์ง„๋‹ค๋Š” ๋œป์ด๋‹ค (ํ‰๊ท  ํ•œ์ •)

 

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ stable long-term results๋ฅผ ๋ณด์žฅํ•ด์ฃผ๋Š” law์ด๊ธฐ์— ์ค‘์š”ํ•˜๊ฒŒ ์“ฐ์ž„ (ํŠนํžˆ randomํ•˜๊ฒŒ ๋ฐœ์ƒํ•˜๋Š” event์™€ ๊ด€๋ จํ•  ๋•Œ)

 

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ ๋‹น์—ฐํžˆ ๋งŽ์€ ๊ด€์ธก์น˜๋“ค(์‹œ๋„๋“ค)์ด ๋ณด์žฅ๋˜์–ด์•ผ ํ•จ

 

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ ์ด ๋•Œ, gambler's fallacy(Monte Carlo fallacy) ใ€ŠํŠน์ • event์˜ ๊ณผ๊ฑฐ๋นˆ๋„๊ฐ€ ๋†’์•˜๋‹ค๋ฉด ๋ฏธ๋ž˜์—๋„ ๋†’๊ฒŒ ๋ฐœ์ƒํ•˜๊ฑฐ๋‚˜, ๊ฑฐ๊พธ๋กœ ํ›จ์”ฌ ์ ๊ฒŒ ๋ฐœ์ƒํ•œ๋‹ค๊ณ  ์˜ˆ์ธกํ•˜๋Š” fallacyใ€‹์— ์˜ํ•ด ๊ณผ๊ฑฐ ์‚ฌ๊ฑด์— ์˜ํ–ฅ์„ ๋ฐ›์•„ ์˜ˆ์ธก๋œ๋‹ค๊ณ  ์ƒ๊ฐํ•ด์„œ๋Š” ์•ˆ๋œ๋‹ค. LLN์€ ๋งค event๋ผ๋ฆฌ ์„œ๋กœ ์˜ํ–ฅ์„ ์•ˆ๋ฐ›๊ณ  ๋…๋ฆฝ์ ์œผ๋กœ, ๋งค ๊ฒฐ๊ณผ๋Š” ์˜ˆ์ธก๋ถˆ๊ฐ€๋กœ randomํ•˜๊ฒŒ ๋‚˜์˜จ๋‹ค๊ณ  ๊ฐ€์ •

 

๐Ÿ‘จ๐Ÿพ‍๐Ÿ”ฌ Weak LLN vs. Strong LLN?

→ WLLN์€ ๋ฌดํ•œ๋Œ€์˜ the number of sample์ด ์กด์žฌํ•  ๊ฒฝ์šฐ ๋ชจํ‰๊ท ๊ณผ sample mean์˜ ์ฐจ์ด๊ฐ€ ๊ทธ ์–ด๋–ค ์–‘์ˆ˜ ε๋ณด๋‹ค๋„ ์ž‘์€ ๊ฒฝ์šฐ๊ฐ€ ๋ฐ˜๋“œ์‹œ(์•„๋ž˜ Pr์ด 1์ด ๋œ๋‹ค๊ณ  ์ œ์‹œ๋จ) ์กด์žฌํ•œ๋‹ค๋Š” ๋œป

→ SLLN์€ ๋ฌดํ•œ๋Œ€์˜ the number of sample์ด ์กด์žฌํ•œ๋‹ค๋ฉด sample mean์€ ๋ฌด์กฐ๊ฑด ๋ชจํ‰๊ท ์ด ๋œ๋‹ค๋Š” ๋œป

→ ๊ทธ๋ž˜์„œ WLLN์ด ์ข€ ๋” ์•ฝํ•˜๊ฒŒ ๋ชจํ‰๊ท ์„ ๋Œ€ํ‘œํ•œ๋‹ค๊ณ  ์ฃผ์žฅํ•œ๋‹ค๊ณ  ๋งํ•  ์ˆ˜ ์žˆ๋‹ค!

 

- (์™ผ์ชฝ๋ถ€ํ„ฐ) LLN Form - WLLN - SLLN -

 

 

w/code

โ‘  ํ‰๊ท ์ด 50์ด๊ณ  ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 10์ธ 2๋งŒ๊ฐœ์˜ sample์ด ๋”ฐ๋ฅด๋Š” ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋ชจ์ง‘๋‹จ์œผ๋กœ ๊ฐ€์ •

 

import numpy as np

population = np.random.normal(50, 10, 20000)  # mean 50, std 10, 20000 samples from a normal distribution

population.mean()
#50.03223541665855

 

โ‘ก sample size๋ฅผ 5 ๊ฐ„๊ฒฉ์œผ๋กœ 5์—์„œ 19995๊นŒ์ง€ ๋Š˜๋ฆฌ๋ฉด์„œ sample์˜ mean์„ ์ธก์ • & ์‹œ๊ฐํ™”

 

dat = []

for i in np.arange(start = 5, stop = 20000, step = 5):  # stop is exclusive, so sizes run 5, 10, ..., 19995
    s = np.random.choice(population, i)  # draw a sample of size i from the population
    dat.append(s.mean())  # record the sample mean
dat

#using method chaining - when a method returns an object, another method can be called directly on that returned object

import pandas as pd

(pd
 .DataFrame(dat)
 .plot(figsize=(7,7))  # DataFrame.plot returns a matplotlib Axes
 .axhline(y = 50, color = '#F80909')  # horizontal line at the population mean
 );

 

 

โ‘ข ์‹œ๊ฐํ™” ๊ฒฐ๊ณผ> ์šฐ๋ฆฌ๋Š” sample mean์ด ์ „์ฒด ๋ชจํ‰๊ท ์ธ 50์„ ํ–ฅํ•ด ์ ์  convergeํ•จ์„ ๊ทธ๋ฆผ์„ ํ†ตํ•ด ์•Œ ์ˆ˜ ์žˆ๋‹ค! ๐Ÿคฉ


* ์ถœ์ฒ˜1) https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/expected-value-lib/v/law-of-large-numbers

* ์ถœ์ฒ˜2) https://en.wikipedia.org/wiki/Law_of_large_numbers

๋Œ“๊ธ€