pandas Tricks_09&10 👉🏻 'EXPANDING → a string & a series of lists - into a DF' (Kevin by DataSchool)

metamong 2022. 4. 14.

😌 What if you want to freely split the data inside a dataframe and attach the pieces, or, when a cell holds a list, pull its elements out and attach those..?

→ In other words, use these tricks when you want to expand the dataframe itself by attaching extra information! ←

1. EXPANDING(1) - splitting a string into MULTIPLE COLUMNS

Q) What if you want to split the strings in a column on some delimiter and attach the pieces to the existing dataframe?

A) Use the .column_name.str accessor and apply split → expand=True

 

import pandas as pd

df = pd.DataFrame({'name': ['Ryan Murphy Kim', 'Jane Doe Rhondall'],
                   'location': ['Los Angeles, CA', 'Washington, DC']})

 

 Use double brackets df[[]] to create three new columns, then attach the pieces split on ' '

 

df[['first', 'middle', 'last']] = df.name.str.split(' ', expand=True)

 

(You can also index the result with [0], [1], etc. to pull out and attach only some of the pieces)
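
A minimal sketch of that indexing idea, assuming the same `name`/`location` data as above. The column name `city` is a hypothetical label chosen for illustration and does not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ryan Murphy Kim', 'Jane Doe Rhondall'],
                   'location': ['Los Angeles, CA', 'Washington, DC']})

# Split on ', ' and expand into a DataFrame whose columns are labeled 0 and 1
parts = df['location'].str.split(', ', expand=True)

# Keep only column 0 (the text before the comma); 'city' is a hypothetical name
df['city'] = parts[0]

print(df[['location', 'city']])
```

Indexing with `[1]` instead would keep the piece after the comma.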

 

- Three columns attached! -
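
One thing worth checking before assigning to a fixed set of new columns is whether every row splits into the same number of pieces. The sketch below uses hypothetical uneven data (not from the original post) to show how `expand=True` pads shorter rows with missing values, and how the `n` argument of `str.split` caps the number of splits.

```python
import pandas as pd

# Hypothetical data: the second name has only two words
names = pd.Series(['Ryan Murphy Kim', 'Jane Doe'])

# Rows with fewer pieces are padded with missing values
parts = names.str.split(' ', expand=True)
print(parts)

# n=1 splits only at the first space, so each row yields at most two pieces
first_rest = names.str.split(' ', n=1, expand=True)
first_rest.columns = ['first', 'rest']  # hypothetical column names
print(first_rest)
```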

 

2. EXPANDING(2) - expanding a Series of lists into a DATAFRAME

Q) How do you pull out the elements of lists nested inside a column and attach them?

A) The easy way: use pd.Series (via apply)

 

df = pd.DataFrame({'col_one': ['a', 'b', 'c'],
                   'col_two': [[10,40], [20,50], [30,60]]})

 

→ Apply pd.Series to the target column col_two (each list's elements are converted into a Series, which turns them into individual columns!)

 

df_new = df.col_two.apply(pd.Series)
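
To see what that call produces, here is a small self-contained sketch using the same example data. It only inspects the intermediate result; nothing in it goes beyond the calls already shown above.

```python
import pandas as pd

df = pd.DataFrame({'col_one': ['a', 'b', 'c'],
                   'col_two': [[10, 40], [20, 50], [30, 60]]})

# Each list becomes a Series, so list positions 0 and 1 become column labels
df_new = df['col_two'].apply(pd.Series)

print(df_new.columns.tolist())
print(df_new)
```

The new frame keeps the original row index, which is what lets it line up with `df` when the two are joined side by side.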

 

 Attach it with concat and you're done!

(**** See the concat post below ↓↓↓↓↓↓ ****)

 

concat & append & merge & join

sh-avid-learner.tistory.com

 

pd.concat([df, df_new], axis='columns')

 

- Columns 0 and 1 are now in place! -
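
The integer labels 0 and 1 can be awkward to work with later, so one option is to rename the expanded columns before concatenating. This is a sketch of the whole flow under that assumption; the names `two_a` and `two_b` are hypothetical and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'col_one': ['a', 'b', 'c'],
                   'col_two': [[10, 40], [20, 50], [30, 60]]})

# Expand the lists into their own columns (labeled 0 and 1 by default)
df_new = df['col_two'].apply(pd.Series)

# Hypothetical, more descriptive labels for the expanded columns
df_new.columns = ['two_a', 'two_b']

# Attach the new columns side by side; rows are aligned on the shared index
result = pd.concat([df, df_new], axis='columns')

print(result)
```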

 

 

- End of attaching to a dataframe -

 

* Source) https://youtu.be/RlIiVeig3hc