seaborn plots - displot, pairplot, regplot

metamong 2022. 6. 10.

🤹🏻‍♀️ Let's take some time to look at three graphs from seaborn, a powerful visualization library: displot, pairplot, and regplot.

 

🏡 seaborn.displot() docs

https://seaborn.pydata.org/generated/seaborn.displot.html

 

🏡 seaborn.pairplot() docs

https://seaborn.pydata.org/generated/seaborn.pairplot.html

 

🏡 seaborn.regplot() docs

https://seaborn.pydata.org/generated/seaborn.regplot.html

1> seaborn.displot

seaborn.displot(data=None, *, x=None, y=None, hue=None, row=None, col=None, weights=None, kind='hist', rug=False, rug_kws=None, log_scale=None, legend=True, palette=None, hue_order=None, hue_norm=None, color=None, col_wrap=None, row_order=None, col_order=None, height=5, aspect=1, facet_kws=None, **kwargs)

 

'This function provides access to several approaches for visualizing the univariate or bivariate distribution of data, including subsets of data defined by semantic mapping and faceting across multiple subplots.'

 

displot() visualizes the distribution of a single variable or the joint distribution of two variables.

 

 

Data Analysis with Python (2/2) (from Coursera)

sh-avid-learner.tistory.com

 

 

→ The Coursera posting above used seaborn's distplot to show the distributions of two variables, but distplot was deprecated in seaborn 0.11 and replaced by displot() (and the axes-level histplot()). The upgrade also made it possible to visualize many more kinds of distributions!

 

- univariate -

→ The kind argument accepts three values

① 'hist' (default) - draws a histogram

② 'kde' (kernel density estimate) - easiest to think of as a smooth curve tracing the outline of the histogram.

③ 'ecdf' (empirical cumulative distribution function) - plots the cumulative proportion of observations at or below each value, like a cumulative histogram. (univariate only)

 

→ Examples of graphs ①-③ are shown below (from left: histogram, kde, ecdf)

 

 

You can also overlay a kde line on the histogram! (just pass kde=True)

 

 

- bivariate -

 

① By default, a heatmap-like plot is drawn (the more observations fall into a given combined bin of the two variables, the darker that cell is shaded!)

② With kind='kde', the density is drawn as contour lines, which are packed more densely where the data are more frequent

(Additionally, rug=True draws a small tick for each observation along each axis, showing where the data on the contour plot are concentrated)

 

 

★ With hue, you can compare the distributions of several groups of data at once in a single graph! (※ a very important argument) ★

Passing multiple='stack' stacks the groups' histograms on top of each other, as in the lower right (not something I use much!)

 

 

★ It can also draw several graphs at once, i.e. split the plot into multiple facets

※ The variable that splits the data into separate graphs is set with the col argument (row does the same for rows)

 

2> seaborn.pairplot

seaborn.pairplot(data, *, hue=None, hue_order=None, palette=None, vars=None, x_vars=None, y_vars=None, kind='scatter', diag_kind='auto', markers=None, height=2.5, aspect=1, corner=False, dropna=False, plot_kws=None, diag_kws=None, grid_kws=None, size=None)

 

'Plot pairwise relationships in a dataset. By default, this function will create a grid of Axes such that each numeric variable in data will by shared across the y-axes across a single row and the x-axes across a single column. The diagonal plots are treated differently: a univariate distribution plot is drawn to show the marginal distribution of the data in each column. It is also possible to show a subset of variables or plot different variables on the rows and columns.'

 

→ displot, covered above, only shows the distribution itself, whereas pairplot also shows the relationship between every pair of variables. In addition, the plots along the main diagonal (from the top left) show each variable's own distribution, so pairplot offers far more than displot!

 

★ By default, the graphs showing the relationship between two variables are scatterplots,

and the diagonal plots are histograms by default, or kde plots when the data are grouped with hue

(in that case, you can still get histograms by passing diag_kind='hist')

 

 

★ Passing 'kde' or 'hist' to kind visualizes the relationship between two variables in a form other than a scatterplot

 

 

★ Beyond that, you can pass only some of the variables to x_vars and y_vars to draw just part of the pairplot instead of the whole grid,

and set markers to change the shape of the scatterplot points!

+ height sets the height of each subplot

+ corner=True draws only the plots on and below the main diagonal (the diagonal plots included)

 

You can also customize further by passing arguments as dicts (plot_kws, diag_kws, grid_kws)!

I plan to introduce more variations later while doing various visualizations in upcoming projects 🏄‍♂️

3> seaborn.regplot

seaborn.regplot(*, x=None, y=None, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, seed=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=True, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None, line_kws=None, ax=None)

 

'Plot data and a linear regression model fit.'

 

→ We learned about the linear regression model before.

 

 

Simple Linear Regression (concepts)

sh-avid-learner.tistory.com

 

→ Given some data, regplot automatically draws a single line reflecting the trend of the data (as covered in the Coursera posting, shown below)
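A sketch on hypothetical data with a linear trend. With the default `order=1`, the line regplot draws should coincide with an ordinary least-squares fit, so it is compared here against `np.polyfit`. A fresh Axes is passed via `ax=` so the plot does not land on an earlier one.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(scale=2.0, size=100)})

fig, ax = plt.subplots()  # fresh Axes for this plot
sns.regplot(data=df, x="x", y="y", ax=ax)  # scatter + fitted line + 95% CI band

# Least-squares line for comparison with the drawn regression line
line = ax.lines[0]
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
```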

 

 

→ Normally the graph looks like the one on the lower left; lowering the confidence interval value (ci, 95 by default) narrows the band drawn around the line (right)
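A sketch comparing `ci=95` and `ci=50` side by side on hypothetical data. The same `seed` is passed to both calls so they use the same bootstrap resamples. The helper `band_height` (a hypothetical name) measures the vertical extent of the shaded band, which should be smaller for the 50% interval.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.collections import PolyCollection

# Hypothetical data: y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(scale=2.0, size=100)})

fig, (ax95, ax50) = plt.subplots(1, 2, sharey=True)
sns.regplot(data=df, x="x", y="y", ci=95, seed=0, ax=ax95)  # default-width band
sns.regplot(data=df, x="x", y="y", ci=50, seed=0, ax=ax50)  # narrower band

def band_height(ax):
    """Vertical extent of the shaded confidence band on an Axes."""
    band = next(c for c in ax.collections if isinstance(c, PolyCollection))
    ys = np.concatenate([p.vertices[:, 1] for p in band.get_paths()])
    return ys.max() - ys.min()
```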

 

 

logistic ์ธ์ž๋ฅผ ์„ค์ •ํ•จ์œผ๋กœ์จ ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ๋ชจ๋ธ๋กœ ์‹œ๊ฐํ™”๊ฐ€ ๊ฐ€๋Šฅํ•˜๊ธฐ๋„ ํ•จ!


🤾 We have looked at these three seaborn plots, and they should come in handy whenever I use various visualizations in EDA or analysis later!

 

- End of concept summary! 🏆 -
