
pandas Tricks_07👉🏻 'Filtering - isin & tilde(~)&nlargest' (Kevin by DataSchool)

metamong 2022. 3. 31.

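Q. You can split a dataframe's data by applying a condition built with comparison operators directly on the dataframe. With the isin method and the tilde (~), the same filtering becomes noticeably cleaner!

A. Use the isin method to build the condition, and put a tilde (~) at the very front of the condition to negate it.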

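∬ isin docs ∬

Series.isin(values)
(DataFrame.isin(values) also exists and checks every cell; the examples below call isin on a single column, i.e. on a Series)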
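To make the signature concrete, here is a minimal sketch of what isin returns. The small dataframe and its column names (town, deck) are made up for illustration and are not part of the Titanic dataset.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic dataset)
df = pd.DataFrame({
    "town": ["Southampton", "Cherbourg", "Queenstown"],
    "deck": ["C", "E", "F"],
})

# Series.isin: one boolean per row of the column
series_mask = df["town"].isin(["Southampton", "Queenstown"])

# DataFrame.isin: one boolean per cell, each checked against the same list
frame_mask = df.isin(["Southampton", "C"])

print(series_mask.tolist())
print(frame_mask)
```

The Series version is the one used for row filtering, because a one-dimensional boolean mask can be passed straight into the dataframe's square brackets.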

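Q) Among the Titanic passengers, filter the rows for people who embarked at Southampton or Queenstown / people who embarked anywhere other than these two towns

> Prepare the titanic dataset & check the data in the 'embark_town' column
(unique() returns the distinct values of the column as a numpy array)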

import seaborn as sns

tt = sns.load_dataset('titanic')

tt['embark_town'].unique()
#array(['Southampton', 'Cherbourg', 'Queenstown', nan], dtype=object)

1> Use isin to keep only the rows whose embark_town is Southampton or Queenstown

tt[tt.embark_town.isin(['Southampton','Queenstown'])]['embark_town'].unique()
#array(['Southampton', 'Queenstown'], dtype=object)

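2> Put ~ in front of the condition to keep only the rows whose embark_town is NOT Southampton or Queenstown

(In Python, ~ is the bitwise NOT operator. pandas overloads it so that, on a boolean Series, it flips every True/False element-wise. The keyword not cannot be used here: it raises a ValueError because the truth value of a whole Series is ambiguous. Note that rows with a missing embark_town are never matched by isin, so they end up in the negated group.)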

tt[~tt.embark_town.isin(['Southampton','Queenstown'])]['embark_town'].unique()
#array(['Cherbourg', nan], dtype=object)
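The two filters above can also be tried without downloading the dataset. This is a minimal sketch on a hand-made Series (the values are hypothetical, chosen to mirror the embark_town column); it also shows one way to leave missing values out of the negated group, which is an addition and not something the original code does.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tt['embark_town'], including one missing value
towns = pd.Series(
    ["Southampton", "Cherbourg", "Queenstown", np.nan, "Southampton"],
    name="embark_town",
)

# Boolean mask: True where the town is one of the listed values
mask = towns.isin(["Southampton", "Queenstown"])

kept = towns[mask]    # rows matching the list
rest = towns[~mask]   # everything else, including the NaN row

# ~ binds tighter than &, so this reads as (~mask) & towns.notna()
rest_known = towns[~mask & towns.notna()]

print(kept.tolist())
print(rest.index.tolist())
print(rest_known.tolist())
```

Storing the mask in a variable first keeps the filter and its negation visibly in sync, which is handy when the list of values is long.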

* Use nlargest to get back the dataframe rows holding the x largest values of a given column (numerical columns only)

¶ nlargest docs ¶

(nsmallest also exists and does the opposite)

DataFrame.nlargest(n, columns, keep='first')

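Q) Find the sex, survival status, and age of the five Titanic passengers who paid the highest fares

> nlargest returns a dataframe

(To rank by two or more columns, that is, order by the first column and break ties with the next one, pass a list of column names to the columns argument)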

tt.loc[tt.nlargest(5,'fare').index,['fare','survived','age','sex']]

- A whopping 80% survival rate.. only the 19-year-old man did not survive.. T_T -
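The keep and columns arguments matter most when values tie, which is exactly what happens with the top Titanic fares. The sketch below uses a small made-up dataframe (names, fares, and ages are hypothetical) to show how ties are resolved under those assumptions.

```python
import pandas as pd

# Hypothetical data with a three-way tie for the highest fare
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "fare": [50.0, 80.0, 80.0, 20.0, 80.0],
    "age": [30, 25, 40, 22, 35],
})

# keep='first' (the default): among tied rows, the earliest ones win
top2_first = df.nlargest(2, "fare")

# keep='all': return every row tied with the last selected value,
# so the result may have more than n rows
top2_all = df.nlargest(2, "fare", keep="all")

# A list of columns: rank by fare, then break ties by age
top2_by_age = df.nlargest(2, ["fare", "age"])

# nsmallest works the same way in the other direction
cheapest = df.nsmallest(1, "fare")

print(top2_first["name"].tolist())
print(top2_all["name"].tolist())
print(top2_by_age["name"].tolist())
print(cheapest["name"].tolist())
```

With the default keep='first', which rows make the cut among equal fares depends only on their position in the dataframe, so keep='all' or a tie-breaking column is worth considering when that ordering is arbitrary.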
