Python/Pandas&Numpy

concat & append & merge & join

metamong 2022. 4. 9.

👋 Data is usually split across many separate tables (for security & efficiency), so combining them is a step you will inevitably go through during data preprocessing.

In this post, let's figure out which function is the best fit for which case

by going through all four: concat, append, merge, and join

 

👉 docs list

- concat) https://pandas.pydata.org/docs/reference/api/pandas.concat.html

- append - dataframe) https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html

- append - series) https://pandas.pydata.org/docs/reference/api/pandas.Series.append.html#pandas.Series.append

- merge) https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

- join) https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html


1. concat

→ Unlike the other three methods, concat can be used on any pandas object.

→ It literally means 'concatenate', i.e. sticking objects together

 Works on both Series and DataFrame

 

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

 

"Concatenate pandas objects along a particular axis with optional set logic along the other axes. Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number."

 

axis (default 0 (index))

"The axis to concatenate along."

 With the default, objects are stacked along the index, i.e. each Series (DataFrame) is attached below the previous one (plain physical stacking)

 With 1 (columns), each Series (DataFrame) is attached to the right of the previous one (multiple columns are created, so a DataFrame is returned)

 

import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])
pd.concat([s1, s2], axis=0)

'''
0    a
1    b
0    c
1    d
dtype: object
'''
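For contrast with the axis=0 result above, here is a minimal sketch of axis=1 on the same two Series. The variable name side_by_side is only for illustration.

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# axis=1 places the Series next to each other, aligned on their row index
side_by_side = pd.concat([s1, s2], axis=1)

# The result is a 2x2 DataFrame; the columns are labeled 0 and 1
# because the two Series have no name
print(side_by_side)
print(type(side_by_side).__name__)
```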

 

ignore_index (default False)

"If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note the index values on the other axes are still respected in the join."

 If set to True, the index is relabeled 0 through n-1 in order. As the code above shows, with the default the original labels are kept (0, 1, 0, 1), so the index does not run from 0 to 3.

→ So if the index looks messy and you want it clean, make sure to set ignore_index=True (as long as the existing index carries no meaningful information)
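A quick sketch of the difference, reusing the two Series from the axis example (the names kept and relabeled are made up for this illustration):

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# Default: each piece keeps its own labels, so 0 and 1 appear twice
kept = pd.concat([s1, s2])
print(kept.index.tolist())

# ignore_index=True throws the old labels away and numbers the rows from 0
relabeled = pd.concat([s1, s2], ignore_index=True)
print(relabeled.index.tolist())
```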

 

keys (default None)

"If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level."

 The argument used to build a hierarchical index with two or more levels

 Pass the labels you want for the outermost level as a list (one label per object being concatenated).

→ In the example below, 's1' and 's2' become the outer level of the resulting MultiIndex.

 

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])
pd.concat([s1, s2], keys=['s1', 's2'])

'''
s1  0    a
    1    b
s2  0    c
    1    d
dtype: object
'''

pd.concat([s1, s2], keys=['s1', 's2']).index

'''
MultiIndex([('s1', 0),
            ('s1', 1),
            ('s2', 0),
            ('s2', 1)],
           )
'''

 

names (default None)

"Names for the levels in the resulting hierarchical index."

→ When a MultiIndex was built via keys, pass the level names as a list to names to label each of the index levels

→ (ex below) the two levels are named 'Series name' and 'Row ID'

 

pd.concat([s1, s2], keys=['s1', 's2'], names=['Series name', 'Row ID'])

'''
Series name  Row ID
s1           0         a
             1         b
s2           0         c
             1         d
dtype: object
'''
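One thing the keys/names combo buys you is that each original piece can be pulled back out later. A small sketch, assuming the same s1 and s2 as above (from_s2 and flat are illustrative names):

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

combined = pd.concat([s1, s2], keys=['s1', 's2'],
                     names=['Series name', 'Row ID'])

# Select everything that came from the second Series via the outer level
from_s2 = combined.loc['s2']
print(from_s2)

# Drop the outer level again to get back the plain (duplicated) row labels
flat = combined.droplevel('Series name')
print(flat.index.tolist())
```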

 

verify_integrity (default False)

"Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation."

 Checks whether the concatenated axis ends up with duplicated index values

→ For example, if both objects carry the same index label 'a' as below, this tells pandas to raise an error instead of concatenating blindly!

(raises ValueError - Indexes have overlapping values)

 

df5 = pd.DataFrame([1], index=['a'])
df6 = pd.DataFrame([2], index=['a'])

pd.concat([df5, df6], verify_integrity=True)

'''
ValueError: Indexes have overlapping values: Index(['a'], dtype='object')
'''
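In practice you would probably catch that error and decide what to do. One possible way to handle it (falling back to ignore_index=True is just an assumption for this sketch, not the only option):

```python
import pandas as pd

df5 = pd.DataFrame([1], index=['a'])
df6 = pd.DataFrame([2], index=['a'])

try:
    combined = pd.concat([df5, df6], verify_integrity=True)
except ValueError as err:
    # Duplicate labels were found, so rebuild a clean 0..n-1 index instead
    print(f"duplicate labels detected: {err}")
    combined = pd.concat([df5, df6], ignore_index=True)

print(combined)
```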

 

join (default 'outer')

"How to handle indexes on other axis (or axes)."

 The default is outer: whether you concatenate Series or DataFrames, every label on the other axis is kept and the cells with no data are filled with NaN

→ In particular, when two DataFrames do not share the same columns, all columns are kept and the missing values become NaN

→ For example, of the two DataFrames below, the one being attached has a new column, animal. In that case the animal values for the rows of the original DataFrame are all NaN

 

df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])
df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                   columns=['letter', 'number', 'animal'])
pd.concat([df1, df3])
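To make the NaN filling visible without the screenshot, a short sketch on the same two frames (outer is an illustrative variable name):

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])
df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                   columns=['letter', 'number', 'animal'])

# join='outer' is the default: the union of the columns is kept
outer = pd.concat([df1, df3])
print(outer)

# The rows that came from df1 have no 'animal' value, so they hold NaN
print(outer['animal'].isna().tolist())
```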

 

 

→ If join is set to 'inner' here, the animal column is dropped (only the shared columns letter and number survive)

→ In other words, inner keeps only the intersection of the columns, so the join itself never introduces NaN

 

pd.concat([df1, df3], join = 'inner')

 

- only the two shared columns remain -
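The same check for the inner case, as a rough sketch on the frames from above (inner is an illustrative variable name):

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])
df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                   columns=['letter', 'number', 'animal'])

# join='inner' keeps only the columns present in every frame
inner = pd.concat([df1, df3], join='inner')
print(inner)

# All four rows survive, 'animal' is gone, and no NaN was introduced
print(list(inner.columns), inner.isna().any().any())
```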

 

 

sort (default False)

"Sort non-concatenation axis if it is not already aligned when join is ‘outer’. This has no effect when join='inner', which already preserves the order of the non-concatenation axis."

→ If set to True, the column names come out sorted.

 With an inner join, the order of the non-concatenation axis is already preserved by the join itself, so sort has no effect!
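A small sketch of what sort changes, using the letter/number/animal frames from the join example (unsorted and sorted_cols are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])
df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                   columns=['letter', 'number', 'animal'])

# sort=False keeps the columns in order of first appearance
unsorted = pd.concat([df1, df3], sort=False)
print(list(unsorted.columns))

# sort=True orders the column names alphabetically
sorted_cols = pd.concat([df1, df3], sort=True)
print(list(sorted_cols.columns))
```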


2. append (DataFrame.append & Series.append)

→ A special case of concat: append is simply concat with join='outer' & axis=0

 

DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)

 

"Append rows of other to the end of caller, returning a new object. Columns in other that are not in the caller are added as new columns."

 

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])

df.append(df2)

 

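!-- But this raises a warning --!

😅 It kindly lets us know that append will be removed soon, so use concat instead 😅

(Same goes for Series..!)

(append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions df.append raises an AttributeError)

Just use concat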
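For reference, a minimal sketch of how the append call above could be rewritten with concat (stacked and renumbered are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])

# Stands in for the old df.append(df2): stack rows, keep the original labels
stacked = pd.concat([df, df2])

# Stands in for the old df.append(df2, ignore_index=True)
renumbered = pd.concat([df, df2], ignore_index=True)

print(stacked)
print(renumbered)
```

If you are adding frames inside a loop, collecting them in a list and calling concat once at the end is usually the cheaper pattern, since every concat builds a new object.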


3. merge

 Depending on the how argument, a merge is one of inner, outer, left, or right

 concat simply glues objects together, whereas merge matches rows on some key and returns 'the common part + alpha (which depends on how)'

 

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

 

"Merge DataFrame or named Series objects with a database-style join. A named Series object is treated as a DataFrame with a single named column. The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed."

 

how(default 'inner')

 

 

"left: use only keys from left frame, similar to a SQL left outer join; preserve key order"

"right: use only keys from right frame, similar to a SQL right outer join; preserve key order"

"outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically"

"inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys"

 

ex) Build two DataFrames and use column 'a' as the key to match on!

(check every kind of merge + compare with concat!)

 

df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})

df1.merge(df2, how='inner', on='a')
df1.merge(df2, how='left', on='a')
df1.merge(df2, how='right', on='a')
df1.merge(df2, how='outer', on='a')

# compare with concat
pd.concat([df1, df2], axis=1, join='inner')

 

(Figure: bottom row, from left: inner - left - right - outer; top right: concat (inner))

→ The figure shows the difference. Unlike a merge inner join, a concat inner join literally glues the frames side by side, so column 'a' shows up twice in the physically attached result.

→ With merge, the key shared by the two DataFrames appears only once, and the kinds of merge differ in whether the non-overlapping rows show up as well...!
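Since the comparison figure is easy to lose, here is a rough sketch that reproduces it in code on the same df1 and df2. The results dict, tagged, and glued are illustrative names; indicator=True is the merge parameter from the signature above and adds a '_merge' column.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})

# Run every kind of merge on key column 'a' and list which keys survive
results = {how: df1.merge(df2, how=how, on='a')
           for how in ['inner', 'left', 'right', 'outer']}
for how, merged in results.items():
    print(how, sorted(merged['a']))

# indicator=True tags each row with the frame(s) it came from
tagged = df1.merge(df2, how='outer', on='a', indicator=True)
print(tagged)

# concat with axis=1 aligns on the row index instead, so 'a' appears twice
glued = pd.concat([df1, df2], axis=1, join='inner')
print(list(glued.columns))
```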

 

on, left_on, right_on(default None)

"on)label or list - column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

left_on) label or list, or array-like - Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.

right_on) label or list, or array-like - Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns."

 

→ As the example above shows, these are the keys that the merge matches on.

→ Using on means the column to compare has the same name in both DataFrames

→ If the column names differ, name the column to compare in each df with left_on and right_on!

(when merging with left_on and right_on, both key columns are returned, and the overlapping value columns come back as value_x and value_y for the left and right frames respectively)

 

→ ex)

 

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

df1.merge(df2, left_on='lkey', right_on='rkey')

 

- both key columns show up in the result -
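One detail worth spelling out: 'foo' appears twice in each frame, so the merge pairs every left 'foo' with every right 'foo'. A small sketch on the same frames (merged and per_key are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

merged = df1.merge(df2, left_on='lkey', right_on='rkey')
print(merged)

# Rows per key: 'foo' matches 2 x 2 = 4 times, 'bar' and 'baz' once each
per_key = merged.groupby('lkey').size()
print(per_key)
```

So the result has 6 rows even though each input has only 4.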

 

 

suffixes (default "_x", "_y")

"A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None."

 

→ When non-key columns coming from the two DataFrames share the same name, they would be confusing, so a suffix is appended to the column names to tell them apart (by default, without on, every same-named column becomes a merge key, so suffixes are not needed; once on or left_on / right_on is set, a same-named column may not be a merge key, and that is when suffixes distinguishes the columns)

→ In the example above both frames have a column called value, so we append _1 and _2 to tell them apart.

(by default _x and _y are appended, which is why the earlier result had value_x and value_y)

 

df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=('_1', '_2'))

 

- split into value_1 and value_2 -
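Both halves of the suffixes explanation can be checked with a short sketch on the same frames (renamed and on_value are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

# 'value' is not a merge key here, so the two copies need suffixes
renamed = df1.merge(df2, left_on='lkey', right_on='rkey',
                    suffixes=('_1', '_2'))
print(list(renamed.columns))

# With no on / left_on / right_on, the shared column 'value' becomes
# the merge key itself, so it appears once and gets no suffix
on_value = df1.merge(df2)
print(on_value)
```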

 


4. join

→ Works on basically the same principle as merge: it combines two DataFrames based on a key

→ So how does it differ from merge? merge combines on whichever columns you choose, while join combines on the index by default!

(that said, join is not limited to index-on-index: the on argument names a column of the calling DataFrame to match against the index of other)

(and from merge's side, setting left_index and right_index to True merges on the index)

 join covers only a subset of what merge can do, so merge is recommended over join...!

 The DataFrame that join is called on is the left one, and the DataFrame passed as the other parameter is the right one
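To see how closely the two line up, here is a sketch with hypothetical frames (left, right, and the 'key' column are made up for this illustration) that writes the same operation with join and with merge:

```python
import pandas as pd

# Hypothetical frames: 'key' is an ordinary column on the left,
# while the right frame carries the same labels in its index
left = pd.DataFrame({'key': ['k0', 'k1', 'k2'], 'a': [1, 2, 3]})
right = pd.DataFrame({'b': [10, 20]}, index=['k0', 'k2'])

# join(on=...) matches a column of the caller against the index of other
via_join = left.join(right, on='key')
via_merge = left.merge(right, left_on='key', right_index=True, how='left')
print(via_join)
print(via_join.equals(via_merge))

# Index-on-index: join's default behavior expressed with merge
left_idx = left.set_index('key')
same = left_idx.join(right).equals(
    left_idx.merge(right, left_index=True, right_index=True, how='left'))
print(same)
```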

 

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

 

"Join columns of another DataFrame. Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list."

 

ex) a join where the index automatically becomes the join key!

 

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'valuel': [1, 2, 3, 5]}, index=['1','4','5','6'])
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'valuer': [5, 6, 7, 8]}, index=['2','3','5','7'])
                    
df1.join(df2)

 

- left join output, matching on the shared index '5' -
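A quick sketch of the same join with the other how values, on the frames from the example (left_joined, inner_joined, and outer_joined are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'valuel': [1, 2, 3, 5]}, index=['1', '4', '5', '6'])
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'valuer': [5, 6, 7, 8]}, index=['2', '3', '5', '7'])

# how='left' is the default: every row of df1 is kept,
# and only index '5' finds a partner in df2
left_joined = df1.join(df2)
print(left_joined)

# how='inner' keeps only the shared index label
inner_joined = df1.join(df2, how='inner')
print(inner_joined)

# how='outer' keeps the union of both indexes
outer_joined = df1.join(df2, how='outer')
print(len(outer_joined))
```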

 


* Source 1) https://www.datasciencemadesimple.com/join-merge-data-frames-pandas-python/

* Source 2) https://stackoverflow.com/questions/15819050/pandas-dataframe-concat-vs-append
