Python data analysis - plotting 2 - Seaborn advanced plotting 4 - categorical plots

Keywords: Python Data Analysis data visualization

1. Categorical scatter plots

1.stripplot

Function: seaborn.stripplot

Common parameters:

x, y, hue: Names of variables in data. x and y select the variables to plot; hue takes a categorical variable used to color the points by group.
data: DataFrame, array, list or Series; the dataset used for plotting.
order, hue_order: Lists of strings specifying the order in which the categorical levels are drawn.
jitter: float, or True/1 for a sensible default amount. Adds uniform random noise along the categorical axis so that overlapping points are easier to see. The default is True.
dodge: bool. When hue nesting is used, whether to separate the strips for the hue levels along the categorical axis. The default is False.
orient: "v" or "h"; the orientation of the plot.
import matplotlib.pyplot as plt
import seaborn as sns

tips=sns.load_dataset('tips')
fig,ax=plt.subplots(1,2,figsize=(8,4))
#With random jitter (the default)
sns.stripplot(x='day',y='total_bill',data=tips,ax=ax[0])
#Without random jitter
sns.stripplot(x='day',y='total_bill',data=tips,jitter=False,ax=ax[1])
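
To make the effect of jitter concrete, the following minimal sketch reads back the x coordinates that stripplot actually draws. The toy DataFrame and the helper strip_x_positions are hypothetical stand-ins introduced here for illustration (the tips dataset is not downloaded), and the Agg backend is only chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the tips dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat"], 20),
    "total_bill": rng.normal(20, 5, size=60),
})

def strip_x_positions(jitter):
    """Draw a strip plot and return the x coordinate of every plotted point."""
    fig, ax = plt.subplots()
    sns.stripplot(x="day", y="total_bill", data=toy, jitter=jitter, ax=ax)
    xs = np.concatenate(
        [np.asarray(c.get_offsets())[:, 0] for c in ax.collections]
    )
    plt.close(fig)
    return xs

x_plain = strip_x_positions(jitter=False)
x_jittered = strip_x_positions(jitter=True)

# Distance of each point from the center of its category (0, 1, 2, ...).
dev_plain = np.abs(x_plain - np.round(x_plain))
dev_jittered = np.abs(x_jittered - np.round(x_jittered))
print("max offset without jitter:", dev_plain.max())
print("max offset with jitter:", dev_jittered.max())
```

Without jitter every point sits exactly on its category position; with jitter the points are spread a small distance to either side, which is what makes dense strips readable.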

Use hue to split each category by a second categorical variable:

sns.stripplot(x='day',y='total_bill',hue='sex',data=tips)

 

Set dodge=True so that the hue levels are separated along the categorical axis instead of overlapping:

sns.stripplot(x='day',y='total_bill',hue='sex',dodge=True,data=tips)

 

2.violinplot

A violin plot combines a box plot with a kernel density estimate. In addition to the summary statistics that a box plot shows, it also shows the shape of the data distribution.

Function: seaborn.violinplot

Common parameters:

bw: "scott", "silverman" or float; the rule or scale factor used to compute the kernel bandwidth. The default is "scott".
cut: float; how far, in units of bandwidth, the density extends past the extreme data points. Set it to 0 to limit the violin to the range of the observed data. The default is 2.
scale: "area", "count" or "width"; the method used to scale the width of each violin. The default is "area".
scale_hue: bool; when hue nesting is used, whether the scaling is computed within each level of the main grouping variable or across all violins on the plot. The default is True.
gridsize: int; the number of points in the discrete grid used for kernel density estimation. The default is 100.
inner: "box", "quartile", "point", "stick" or None; how the data points are represented inside the violin. The default is "box".
split: bool; when hue has two levels, whether to draw half a violin for each level instead of two full violins. The default is False.
sns.set_style('whitegrid')
sns.violinplot(x="day",y="total_bill",data=tips)

sns.violinplot(x="day",y="total_bill",hue="sex",data=tips)
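
The bw, cut and gridsize parameters are easier to follow with the density computed by hand. The sketch below uses scipy.stats.gaussian_kde on a hypothetical sample; the helper violin_density and the way the grid is extended by cut bandwidths are assumptions made for illustration, not a copy of seaborn's internals.

```python
import numpy as np
from scipy import stats

# Hypothetical sample standing in for one category of total_bill.
rng = np.random.default_rng(0)
x = rng.normal(20, 5, size=100)

def violin_density(x, bw="scott", cut=2, gridsize=100):
    """Return (support, density, bandwidth) for a one-dimensional sample.

    bw is passed to gaussian_kde as bw_method; the grid is extended by
    cut bandwidths past the smallest and largest observation.
    """
    kde = stats.gaussian_kde(x, bw_method=bw)
    bandwidth = kde.factor * x.std(ddof=1)  # bandwidth in data units
    support = np.linspace(
        x.min() - cut * bandwidth, x.max() + cut * bandwidth, gridsize
    )
    return support, kde(support), bandwidth

support, density, bw_scott = violin_density(x, bw="scott", cut=2)
support0, density0, _ = violin_density(x, bw="scott", cut=0)
_, _, bw_silverman = violin_density(x, bw="silverman")

print("scott bandwidth:", bw_scott)
print("silverman bandwidth:", bw_silverman)
print("range with cut=2:", support[0], support[-1])
print("range with cut=0:", support0[0], support0[-1])
```

With cut=0 the grid stops at the observed minimum and maximum, which is why the violin no longer extends past the data.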

 

Pass split=True to draw half a violin for each hue level:

sns.violinplot(x="day",y="total_bill",hue="sex",data=tips,split=True)

 

Scale the violin width by the number of observations and draw the quartiles as lines inside the violin:

sns.violinplot(x="day",y="total_bill",hue="sex",data=tips,split=True,inner='quartile',scale='count',palette='Set2')

 

Combine the categorical scatter plot with the violin plot:

sns.violinplot(x="day",y="total_bill",data=tips,inner=None)
sns.stripplot(x="day",y="total_bill",data=tips,color="w",alpha=0.5)

3.boxenplot

The enhanced box plot gives more information about the shape of the distribution by drawing more quantiles. It addresses two weaknesses of the box plot on large datasets: little information is shown beyond the quartiles, and a large number of extreme values are drawn as outliers.

Function: seaborn.boxenplot

Special parameters:

k_depth: "proportion", "tukey" or "trustworthy"; the method used to decide how many boxes to draw, and therefore how far into the tails the boxes extend.
scale: "linear", "exponential" or "area"; the method used to set the width of the boxes.
fig,ax=plt.subplots(1,2,figsize=(8,4))
sns.boxplot(x=tips["total_bill"],ax=ax[0])
sns.boxenplot(x=tips["total_bill"],ax=ax[1])

The enhanced box plot covers a wider range of quantiles and uses the box width to reflect the share of data each box represents, so more of the tails is drawn as boxes rather than outliers and less information is lost.

fig,axes=plt.subplots(1,3,figsize=(12,4))
sns.boxenplot(x="day",y="total_bill",data=tips,k_depth="proportion",ax=axes[0])
sns.boxenplot(x="day",y="total_bill",data=tips,k_depth="tukey",ax=axes[1])
sns.boxenplot(x="day",y="total_bill",data=tips,k_depth="trustworthy",ax=axes[2])
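
The idea of "drawing more quantiles" can be sketched with NumPy alone. The helper letter_values below is hypothetical and deliberately simplified: the number of boxes k is passed in by hand, whereas boxenplot chooses it according to k_depth.

```python
import numpy as np

def letter_values(x, k):
    """Return a list of (tail_probability, lower, upper) for k nested boxes.

    Box 1 spans the quartiles (tail 1/4), box 2 the eighths (tail 1/8),
    box 3 the sixteenths (tail 1/16), and so on.
    """
    x = np.asarray(x, dtype=float)
    boxes = []
    for i in range(1, k + 1):
        tail = 0.5 ** (i + 1)
        lower, upper = np.quantile(x, [tail, 1 - tail])
        boxes.append((tail, float(lower), float(upper)))
    return boxes

# Evenly spaced values 0..100, so each quantile is easy to check by eye.
data = np.arange(101)
boxes = letter_values(data, k=3)
for tail, lower, upper in boxes:
    print(f"tail={tail:.4f}  box=[{lower}, {upper}]")
```

An ordinary box plot stops at the first of these boxes; each additional box reaches further into the tails, which is the extra information the enhanced box plot displays.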

 

4.pointplot

A point plot shows a point estimate and a confidence interval for each category. It is used to compare the levels of one or more categorical variables. The slope of the connecting lines makes it easy to see how the differences between levels of one categorical variable change across the levels of a second one.

Function: seaborn.pointplot

sns.set_style('darkgrid')
fig,axes=plt.subplots(1,2,figsize=(8,4))
sns.pointplot(x="time",y="total_bill",data=tips,ax=axes[0])
#errwidth and capsize receive floats: the thickness of the error bar lines and the width of the error bar caps.
sns.pointplot(x="time",y="total_bill",data=tips,errwidth=4,capsize=0.2,ax=axes[1])
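
What a single point and its error bar represent can be sketched with NumPy. The helper point_estimate below is hypothetical and assumes a percentile bootstrap with a 95% interval; seaborn's own resampling code may differ in detail.

```python
import numpy as np

def point_estimate(values, estimator=np.mean, n_boot=1000, ci=95, seed=0):
    """Return (estimate, low, high) using a percentile bootstrap.

    estimator must accept an axis argument, as np.mean and np.median do.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one resample of the data, drawn with replacement.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot = estimator(values[idx], axis=1)
    half = (100 - ci) / 2
    low, high = np.percentile(boot, [half, 100 - half])
    return float(estimator(values)), float(low), float(high)

# Hypothetical bills for one category.
bills = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0, 35.0, 50.0])
mean_est, mean_low, mean_high = point_estimate(bills)
median_est, median_low, median_high = point_estimate(bills, estimator=np.median)
print("mean:", mean_est, (mean_low, mean_high))
print("median:", median_est, (median_low, median_high))
```

The estimate is the marker drawn for the category and the interval is its error bar; swapping the estimator changes both.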

Draw a point plot nested by a second categorical variable:

sns.pointplot(x="time",y="total_bill",hue="sex",data=tips,dodge=True,palette="Set1")

 

Set join=False to remove the line segments connecting the points:

sns.set_style('darkgrid')
fig,axes=plt.subplots(1,2,figsize=(8,4))
sns.pointplot(x="day",y="total_bill",data=tips,ax=axes[0])
#join=False draws the points and error bars without the connecting line.
sns.pointplot(x="day",y="total_bill",data=tips,join=False,ax=axes[1])

 

Change the central tendency estimator from the mean to the median:

import numpy as np
sns.pointplot(x="day",y="tip",data=tips,estimator=np.median)

 

5.countplot

A count plot shows the number of observations in each category. It can be thought of as a histogram over a categorical variable, used to compare counts between categories.

Function: seaborn.countplot

fig,axes=plt.subplots(1,2,figsize=(8,4))
sns.countplot(x="sex",data=tips,ax=axes[0])
sns.countplot(y="sex",data=tips,ax=axes[1])

Nested count plot with a second categorical variable:

sns.countplot(x="sex",hue="smoker",data=tips,palette="Set2")
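
The bar heights in a count plot are plain category counts, so they can be checked with pandas. The toy DataFrame below is a hypothetical stand-in for tips.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the tips dataset.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "smoker": ["Yes", "No", "No", "Yes", "Yes", "No"],
})

# Heights of the bars for a simple count plot of "sex".
counts = toy["sex"].value_counts()

# Heights of the bars when "smoker" is used as hue: one count per combination.
nested = pd.crosstab(toy["sex"], toy["smoker"])

print(counts)
print(nested)
```

Comparing these tables with the plot is a quick way to confirm that the categories and the hue variable are the intended ones.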

 

6.catplot

Like relplot for relational plots, catplot is a figure-level function that gives uniform access to all of the categorical plotting functions.

Function: seaborn.catplot

Common parameters:

x, y: Same meaning as in stripplot.
data: Same meaning as in stripplot.
col_wrap: int; wrap the column variable at this width, so that the grid has this many columns. The default is None.
legend_out: bool; whether to draw the legend outside the plot, at the center right. The default is True.
sharex, sharey: bool; whether to share the x or y axis across rows or columns. The default is True.
margin_titles: bool; whether to draw the titles of the row variable to the right of the last column. The default is False.
kind: "strip", "swarm", "box", "violin", "boxen", "point", "bar" or "count"; selects the corresponding plotting function. The default is "strip".

 

sns.catplot(x="day",y="total_bill",col="time",data=tips,jitter=True)

Add another variable:

sns.catplot(x="day",y="total_bill",hue="sex",col="time",data=tips,jitter=True)

 

Draw a violin diagram without sharing the y axis:

sns.set_style("whitegrid")
sns.catplot(x="day",y="total_bill",hue="sex",col="time",data=tips,kind="violin",split=True,sharey=False)

 

Draw enhanced box plots and wrap the column facets with col_wrap so that each row of the grid shows only two plots:

sns.catplot(x="time",y="total_bill",hue="sex",col="day",data=tips,kind="boxen",col_wrap=2,margin_titles=True)

Draw count plots and set the size of each facet with height and aspect:

sns.catplot(x="day",hue="sex",col="time",data=tips,kind="count",height=4,aspect=1)
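
Because catplot is figure-level, it returns a grid object rather than a single Axes. The sketch below inspects that grid for a plain col layout and for a col_wrap layout; the toy DataFrame is a hypothetical stand-in for tips and the Agg backend is only chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: 4 days x 2 times, 5 bills for each combination.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], 10),
    "time": np.repeat(["Lunch", "Dinner"], 20),
    "total_bill": rng.normal(20, 5, size=40),
})

# One facet per level of "time", laid out in a single row.
g_cols = sns.catplot(x="day", y="total_bill", col="time", data=toy, kind="strip")

# One facet per level of "day", wrapped so that each row holds two facets.
g_wrap = sns.catplot(x="time", y="total_bill", col="day", data=toy,
                     kind="box", col_wrap=2)

shape_cols = g_cols.axes.shape
shape_wrap = g_wrap.axes.shape
print("axes without col_wrap:", shape_cols)
print("axes with col_wrap:", shape_wrap)

# The grid object can be used to adjust every facet at once.
g_wrap.set_axis_labels("time of day", "total bill")
plt.close("all")
```

Working through the returned grid, rather than through plt, keeps labels and titles consistent across all facets.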

 

 

Posted by Gamerz on Sat, 04 Dec 2021 22:34:19 -0800