
map & applymap & apply(for dataframe & Series)

metamong 2024. 6. 2.

1. apply

🔺 pandas.DataFrame.apply 🔻(apply for dataframe)

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

"Apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument."

→ The axis parameter defaults to 0, which applies the function to each column; with axis=1 the function is applied to each row. apply can also produce elementwise results, but it is better suited to more complex operations and aggregation. The behaviour and return value depend on the function.
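A minimal sketch of what the function receives under each axis setting. The DataFrame `demo` and the helper `index_labels` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

def index_labels(s):
    # s is a Series; join the labels of its index into one string
    return ",".join(map(str, s.index))

# axis=0 (default): one call per column, each Series is indexed by the row labels
per_column = demo.apply(index_labels)

# axis=1: one call per row, each Series is indexed by the column labels
per_row = demo.apply(index_labels, axis=1)

print(per_column["A"])  # x,y,z
print(per_row["x"])     # A,B
```

With axis=0 the Series passed in carries the DataFrame's index, and with axis=1 it carries the DataFrame's columns, which matches the documentation quoted above.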

① Applying apply to an entire DataFrame

ex1) Multiply every value in every column of the DataFrame by 2, using apply

import pandas as pd

df = pd.DataFrame({'col_one':[1.1,2.2,3.3], 'col_two':[4.4,5.5,6.6], 'col_three':[7.7,8.8,9.9]})

def multiply_by_two(num):
    # num is a whole column (a Series), so 2*num doubles every value in it
    return 2*num

df.apply(multiply_by_two)

'''
	col_one	col_two	col_three
0	2.2	8.8	15.4
1	4.4	11.0	17.6
2	6.6	13.2	19.8
'''

: Because apply was called on the whole DataFrame and multiply_by_two returns a Series of the same length for each column, the return type is a DataFrame.
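The return type follows from what the function returns, as the documentation quoted earlier says. A small sketch reusing the same values as ex1 (the lambdas and the names `doubled` and `col_max` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col_one': [1.1, 2.2, 3.3],
                   'col_two': [4.4, 5.5, 6.6],
                   'col_three': [7.7, 8.8, 9.9]})

# The function returns a Series as long as its input -> the result is a DataFrame
doubled = df.apply(lambda col: 2 * col)

# The function returns one scalar per column -> the result is a Series
# indexed by the column names
col_max = df.apply(lambda col: col.max())

print(type(doubled).__name__)  # DataFrame
print(type(col_max).__name__)  # Series
```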

② Applying apply to a specific column of a DataFrame

ex2) To convert every value in the 'sepal_length' column of the iris DataFrame from a number to a string (that is, to object dtype), use apply

def toString(integer):
    return(str(integer)) #defining a func
    
iris['sepal_length'] = iris['sepal_length'].apply(toString)

iris.dtypes

'''
sepal_length     object
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object
'''

: Because apply was called on a single column (a Series), the return type is a Series.

③ Applying apply to an entire Series

🔺 pandas.Series.apply🔻(apply for Series)


Series.apply(func, convert_dtype=True, args=(), **kwargs)

"Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values."

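The two cases named in the Series.apply documentation can be sketched as follows. The Series `s` and the helper `label` are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
s = pd.Series([1.0, 4.0, 9.0], name="values")

# A NumPy ufunc is applied to the Series as a whole
roots = s.apply(np.sqrt)

# A plain Python function is called once per value
def label(x):
    return "big" if x > 3 else "small"

labels = s.apply(label)

print(roots.tolist())   # [1.0, 2.0, 3.0]
print(labels.tolist())  # ['small', 'big', 'big']
```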
2. applymap(for dataframe)

🔺 pandas.DataFrame.applymap🔻


DataFrame.applymap(func, na_action=None, **kwargs)

'Apply a function to a Dataframe elementwise. This method applies a function that accepts and returns a scalar to every element of a DataFrame.'

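→ In other words, unlike apply, applymap cannot hand the function a whole column or a whole row of the DataFrame; the function only ever sees one element at a time. Setting na_action to 'ignore' skips NaN values and applies the function to the remaining data. (From pandas 2.1 onward, applymap is deprecated in favour of the equivalent DataFrame.map.)

ex) Apply a custom lambda function to every element (elementwise) and return the result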

df.applymap(lambda x: x**2)
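The na_action option mentioned above can be sketched like this. The DataFrame `df_na` and the helper `fmt` are hypothetical. The sketch also assumes that on pandas 2.1 or later the same method is available under the name DataFrame.map, so it picks whichever name the installed version provides.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df_na = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# Assumption: pandas >= 2.1 exposes applymap as DataFrame.map; older versions
# only have applymap. Use whichever exists.
elementwise = df_na.map if hasattr(df_na, "map") else df_na.applymap

def fmt(x):
    # Format a single number with one decimal place
    return f"{x:.1f}"

with_nan = elementwise(fmt)                      # NaN is passed to fmt and becomes 'nan'
skip_nan = elementwise(fmt, na_action="ignore")  # NaN is skipped and stays missing

print(with_nan.loc[1, "a"])           # nan (a string)
print(pd.isna(skip_nan.loc[1, "a"]))  # True
```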

→ Also, unlike apply, applymap cannot perform aggregation. Of the two, only apply can compute an aggregate such as the sum or mean of each row or column of a DataFrame. See the apply example below.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Applying sum aggregation column-wise
column_sum = df.apply(lambda x: x.sum())

# Applying mean aggregation row-wise
row_mean = df.apply(lambda x: x.mean(), axis=1)

+ By contrast, applymap() applies the function to each element individually, so it cannot aggregate a whole row or column in a single operation.

3. map(for Series)

🔺 pandas.Series.map 🔻


Series.map(arg, na_action=None)

"Map values of Series according to an input mapping or function. Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series."

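→ As is easy to see, unlike apply and applymap, Series.map applies only to a Series (that is, to a single column of a DataFrame). The na_action argument means the same as in applymap. The other difference is that map accepts a dict or a Series: passing a dict or a Series to map lets you remap the values of the whole Series arbitrarily. This is the decisive difference from applymap and apply. applymap and apply take a function to call as input, whereas map can also take a dict or another Series and use it to replace the values of the original Series (values with no matching key become NaN). map returns a new Series, so its result can also be assigned as a new column of the DataFrame.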
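A short sketch of the dict and Series forms of map described above. The DataFrame `pets` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only.
pets = pd.DataFrame({"animal": ["cat", "dog", "bird"]})

# A dict used as a lookup table; "bird" has no key, so it maps to NaN
legs = pets["animal"].map({"cat": 4, "dog": 4})

# A Series also works: its index plays the role of the dict keys
sounds = pd.Series({"cat": "meow", "dog": "woof", "bird": "tweet"})

# map returns a new Series; assigning it creates a new column
pets["sound"] = pets["animal"].map(sounds)

print(pd.isna(legs.iloc[2]))    # True
print(pets["sound"].tolist())   # ['meow', 'woof', 'tweet']
```

The original column is left unchanged unless the result is assigned back to it.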

ex1) Given the DataFrame below, select only the col_one column (a single Series) and cube every value

df_data = pd.DataFrame({'col_one':[1.21,4.84,10.89],
             'col_two':[19.36, 30.25, 43.56],
             'col_three':[59.29, 77.44, 98.01]},
             index=['0','1','2'])

df_data['col_one'].map(lambda x: x**3)

'''
0       1.771561
1     113.379904
2    1291.467969
Name: col_one, dtype: float64
'''

ex2) Pass a dict as the argument, with the values to replace as the keys and the desired replacements as the dict's values

→ Take the col_one Series from the df_data DataFrame and replace the value 1.21 with 2, the value 4.84 with 3, and finally the value 10.89 with 4.

df_data['col_one'].map({1.21:2, 4.84:3, 10.89:4})

'''
0    2
1    3
2    4
Name: col_one, dtype: int64
'''

4. conclusion

① apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize)).

② applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))

③ map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
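The three use cases above, side by side in one hedged sketch. The data is hypothetical, str.split stands in for a tokenizer such as nltk.sent_tokenize, and the elementwise step assumes that pandas 2.1 or later offers applymap under the name DataFrame.map.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["  x ", " y", "z  "],
    "sentences": ["a b", "c d e", "f"],
})

# ① apply: an arbitrary Python function (str.split as a stand-in tokenizer)
tokens = df["sentences"].apply(str.split)

# ② elementwise transformation across several columns
text_cols = df[["B", "sentences"]]
# Assumption: use DataFrame.map where it exists (pandas >= 2.1), else applymap
elementwise = text_cols.map if hasattr(text_cols, "map") else text_cols.applymap
stripped = elementwise(str.strip)

# ③ map: value-to-value lookup with a dict
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

print(tokens.tolist())         # [['a', 'b'], ['c', 'd', 'e'], ['f']]
print(stripped["B"].tolist())  # ['x', 'y', 'z']
print(letters.tolist())        # ['a', 'b', 'c']
```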

※ Summarized as a table ※

Naver AI Study

stackoverflow / docu
