Python data analysis - drawing 2 - Seaborn advanced drawing 4 - categorical plots

Keywords: Python Data Analysis data visualization

1, Categorical scatter plot

1.stripplot

Function: seaborn.stripplot

Common parameters:

x, y, hue	Names of variables in data that select what is plotted. hue takes a categorical variable that determines the point colors.
data	DataFrame, array, or list of arrays: the dataset used for plotting.
order, hue_order	Lists of strings that set the order in which the categorical levels are drawn.
jitter	float, or True/1 for a default amount. Adds uniform random noise along the categorical axis so that overlapping points are easier to see. The default is True.
dodge	bool. When hue nesting is used, whether to separate the points of each hue level along the categorical axis. The default is False.
orient	"v" or "h": the orientation of the plot (vertical or horizontal).

import matplotlib.pyplot as plt
import seaborn as sns

tips=sns.load_dataset('tips')
fig,ax=plt.subplots(1,2,figsize=(8,4))
#Default (jitter=True): random noise is added
sns.stripplot(x='day',y='total_bill',data=tips,ax=ax[0])
#jitter=False: no random noise is added
sns.stripplot(x='day',y='total_bill',data=tips,jitter=False,ax=ax[1])
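
To see what jitter changes, the x positions of the drawn points can be read back from the axes. This is a minimal sketch rather than part of the original example. It assumes a small synthetic DataFrame (the column names group and value are made up) instead of tips so that nothing has to be downloaded, and the helper strip_x_positions is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips: three categories with 20 values each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 20),
    "value": rng.normal(size=60),
})

def strip_x_positions(jitter):
    """Draw a strip plot and return the x coordinate of every point."""
    fig, ax = plt.subplots()
    sns.stripplot(x="group", y="value", data=df, jitter=jitter, ax=ax)
    xs = np.concatenate([
        np.asarray(c.get_offsets()).reshape(-1, 2)[:, 0]
        for c in ax.collections
    ])
    plt.close(fig)
    return xs

no_jitter = strip_x_positions(False)
with_jitter = strip_x_positions(True)

# Largest distance of a point from its category position (0, 1, 2)
print(np.abs(no_jitter - np.round(no_jitter)).max())
print(np.abs(with_jitter - np.round(with_jitter)).max())
```

Without jitter every point sits exactly on its category position, so the first printed distance is zero. With jitter the points are spread a small distance to either side.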

Use hue to nest a second categorical variable:

sns.stripplot(x='day',y='total_bill',hue='sex',data=tips)

Set dodge=True so that the hue levels are separated along the categorical axis instead of overlapping:

sns.stripplot(x='day',y='total_bill',hue='sex',dodge=True,data=tips)
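
The effect of dodge can be checked by counting the distinct x positions at which points are drawn. This is a hedged sketch on synthetic data (the columns group, kind and value and the helper distinct_x_positions are made up for illustration). Jitter is switched off so that dodge is the only thing that moves the points.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: 2 categories x 2 hue levels, 10 values in each cell
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 20),
    "kind": np.tile(np.repeat(["u", "v"], 10), 2),
    "value": rng.normal(size=40),
})

def distinct_x_positions(dodge):
    """Return the sorted distinct x positions used by the strip plot."""
    fig, ax = plt.subplots()
    sns.stripplot(x="group", y="value", hue="kind", data=df,
                  jitter=False, dodge=dodge, ax=ax)
    xs = np.concatenate([
        np.asarray(c.get_offsets()).reshape(-1, 2)[:, 0]
        for c in ax.collections
    ])
    plt.close(fig)
    return sorted(set(np.round(xs, 6).tolist()))

overlapping = distinct_x_positions(dodge=False)
separated = distinct_x_positions(dodge=True)
print(overlapping)  # one position per category
print(separated)    # one position per category and hue level
```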

2.violinplot

A violin plot combines a box plot with a kernel density estimate. Like the box plot it shows summary statistics such as the median and quartiles, and in addition it shows the shape of the data distribution.

Function: seaborn.violinplot

Common parameters:

bw	"scott", "silverman" or float: the rule of thumb, or a scale factor, used to compute the kernel bandwidth. The default is "scott".
cut	float. Distance, in units of bandwidth, by which the density curve extends past the extreme data points. Set it to 0 to limit the violin to the range of the observed data. The default is 2.
scale	"area", "count" or "width": the method used to scale the width of each violin. The default is "area".
scale_hue	bool. When hue nesting is used, determines whether the scaling is computed within each level of the main grouping variable (True) or across all violins on the plot (False). The default is True.
gridsize	int. Number of points in the discrete grid on which the kernel density estimate is evaluated. The default is 100.
inner	"box", "quartile", "point", "stick" or None: how the data points are represented inside the violin. The default is "box".
split	bool. When hue has two levels, whether to draw half of a violin for each level so that the two distributions can be compared side by side. The default is False.

sns.set_style('whitegrid')
sns.violinplot(x="day",y="total_bill",data=tips)

sns.violinplot(x="day",y="total_bill",hue="sex",data=tips)
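
The bw, cut and gridsize parameters all describe the kernel density estimate behind each violin. The numpy sketch below shows how they interact. It is an illustration under assumptions (a Gaussian kernel, and Scott's rule taken as n**(-1/5) times the sample standard deviation), not seaborn's own code, and the function name violin_density is made up.

```python
import numpy as np

def violin_density(values, cut=2.0, gridsize=100):
    """Evaluate a Gaussian KDE on a grid that extends cut * bw past the data."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Assumed Scott's rule for 1-D data: n**(-1/5) * sample standard deviation
    bw = values.std(ddof=1) * n ** (-1 / 5)
    # cut controls how far the curve runs beyond the smallest/largest value
    grid = np.linspace(values.min() - cut * bw, values.max() + cut * bw, gridsize)
    # Average of one Gaussian bump per observation
    z = (grid[:, None] - values[None, :]) / bw
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    return grid, density, bw

values = np.array([1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0])
grid0, dens0, bw = violin_density(values, cut=0)
grid2, dens2, _ = violin_density(values, cut=2)
print(grid0[0], grid0[-1])  # limited to the observed range
print(grid2[0], grid2[-1])  # extended by 2 * bw on each side
```

With cut=0 the curve stops at the smallest and largest observations. With the default cut=2 it runs two bandwidths past them, which is why a violin can extend beyond the data range.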

Pass split=True to draw the two hue levels as the two halves of each violin:

sns.violinplot(x="day",y="total_bill",hue="sex",data=tips,split=True)

Scale the violin widths by the number of observations and draw the quartiles as lines inside the violins:

sns.violinplot(x="day",y="total_bill",hue="sex",data=tips,split=True,inner='quartile',scale='count',palette='Set2')

Combine the categorical scatter plot with the violin plot:

sns.violinplot(x="day",y="total_bill",data=tips,inner=None)
sns.stripplot(x="day",y="total_bill",data=tips,color="w",alpha=0.5)

3.boxenplot

The enhanced box plot shows more about the shape of the distribution by drawing more quantiles. It avoids two drawbacks of the ordinary box plot on large datasets: little information is shown beyond the quartiles, and a large number of points are drawn as extreme values.

Function: seaborn.boxenplot

Special parameters:

k_depth	"proportion", "tukey" or "trustworthy": the method used to choose how many boxes (and therefore how many quantiles) are drawn.
scale	"linear", "exponential" or "area": the method used to set the width of the boxes.

fig,ax=plt.subplots(1,2,figsize=(8,4))
sns.boxplot(x=tips["total_bill"],ax=ax[0])
sns.boxenplot(x=tips["total_bill"],ax=ax[1])

The enhanced box plot shows a wider range of quantiles and uses the box widths to indicate the corresponding distribution, so fewer observations are treated as outliers and less information is lost.

fig,axes=plt.subplots(1,3,figsize=(12,4))
sns.boxenplot(x="day",y="total_bill",data=tips,k_depth="proportion",ax=axes[0])
sns.boxenplot(x="day",y="total_bill",data=tips,k_depth="tukey",ax=axes[1])
sns.boxenplot(x="day",y="total_bill",data=tips,k_depth="trustworthy",ax=axes[2])
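
The boxes of an enhanced box plot are nested quantile intervals: the innermost box spans the quartiles, the next one the 1/8 and 7/8 quantiles, and so on. The sketch below computes these box ends with numpy. It is a rough illustration, not seaborn's implementation. The function letter_value_boxes is made up, and the default depth rule (int(log2(n)) - 3, in the spirit of the "tukey" option) is an assumption.

```python
import math
import numpy as np

def letter_value_boxes(values, k=None):
    """Return (lower, upper) box ends, from the innermost box outwards."""
    values = np.asarray(values, dtype=float)
    n = values.size
    if k is None:
        # Assumed "tukey"-style depth: stop while a few points remain in each tail
        k = max(int(math.log2(n)) - 3, 1)
    boxes = []
    for i in range(1, k + 1):
        tail = 0.5 ** (i + 1)  # 1/4, 1/8, 1/16, ...
        lower, upper = np.quantile(values, [tail, 1 - tail])
        boxes.append((float(lower), float(upper)))
    return boxes

values = np.arange(1025, dtype=float)  # 0, 1, ..., 1024
boxes = letter_value_boxes(values)
print(len(boxes))  # number of nested boxes
print(boxes[0])    # innermost box: the quartiles
print(boxes[-1])   # outermost box
```

Each extra box halves the tail probability, so a larger k_depth pushes the outermost box further into the tails and leaves fewer points to be drawn as outliers.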

4.pointplot

The point plot shows point estimates and confidence intervals. It is used to compare the levels of one or more categorical variables. The slope of the connecting lines shows how the relationship between the levels of one categorical variable changes across the levels of another.

Function: seaborn.pointplot

sns.set_style('darkgrid')
fig,axes=plt.subplots(1,2,figsize=(8,4))
sns.pointplot(x="time",y="total_bill",data=tips,ax=axes[0])
#errwidth and capsize take floats that set the thickness of the error bars and the width of their caps.
sns.pointplot(x="time",y="total_bill",data=tips,errwidth=4,capsize=0.2,ax=axes[1])
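
Each point and error bar above is a statistic computed per category together with an interval around it. The sketch below reproduces the idea with a percentile bootstrap in numpy. It is an illustration under assumptions (95% interval, 1000 resamples, a fixed seed), not seaborn's code, and the data and the function point_estimate are made up.

```python
import numpy as np

def point_estimate(values, estimator=np.mean, n_boot=1000, ci=95, seed=0):
    """Return (estimate, lower, upper) using a percentile bootstrap."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement and re-apply the estimator to each resample
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot = np.array([estimator(values[row]) for row in idx])
    half = (100 - ci) / 2
    low, high = np.percentile(boot, [half, 100 - half])
    return float(estimator(values)), float(low), float(high)

# Hypothetical bills grouped by the categorical variable
groups = {
    "Lunch": [10.0, 12.0, 11.0, 15.0, 12.0],
    "Dinner": [20.0, 25.0, 18.0, 30.0, 22.0],
}
results = {level: point_estimate(vals) for level, vals in groups.items()}
for level, (est, low, high) in results.items():
    print(level, est, low, high)
```

Passing a different estimator, for example np.median, changes the statistic while the interval is computed the same way.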

Draw nested group point diagram:

sns.pointplot(x="time",y="total_bill",hue="sex",data=tips,dodge=True,palette="Set1")

Set join=False to remove the line segments connecting the points:

sns.set_style('darkgrid')
fig,axes=plt.subplots(1,2,figsize=(8,4))
sns.pointplot(x="day",y="total_bill",data=tips,ax=axes[0])
#join=False draws the points without the connecting lines.
sns.pointplot(x="day",y="total_bill",data=tips,join=False,ax=axes[1])

Change the estimator of central tendency from the mean to the median:

import numpy as np
sns.pointplot(x="day",y="tip",data=tips,estimator=np.median)

5.countplot

The count plot shows the number of observations in each category. It can be thought of as a histogram over a categorical variable instead of a quantitative one, which makes count differences between categories easy to compare.

Function: seaborn.countplot

fig,axes=plt.subplots(1,2,figsize=(8,4))
sns.countplot(x="sex",data=tips,ax=axes[0])
sns.countplot(y="sex",data=tips,ax=axes[1])

Count plot nested by a second categorical variable:

sns.countplot(x="sex",hue="smoker",data=tips,palette="Set2")
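
The bar heights in a count plot are plain frequency counts, so the same numbers can be obtained with pandas. This is a small sketch on made-up rows (not the tips data): value_counts corresponds to the single-variable plot, and crosstab to the plot nested by hue.

```python
import pandas as pd

# Hypothetical observations with the same column names as in tips
df = pd.DataFrame({
    "sex":    ["Male", "Male", "Female", "Male", "Female", "Female", "Male"],
    "smoker": ["No",   "Yes",  "No",     "No",   "Yes",    "No",     "No"],
})

# One bar per category: number of rows for each value of sex
per_category = df["sex"].value_counts()

# One bar per (category, hue level) pair: rows are sex, columns are smoker
nested = pd.crosstab(df["sex"], df["smoker"])

print(per_category)
print(nested)
```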

6.catplot

Similar to relplot for relational plots, catplot gives uniform access to all of the categorical plotting functions and draws them onto a grid of subplots.

Function: seaborn.catplot

Common parameters:

x, y	Names of variables in data, as in the functions above.
data	DataFrame containing the data to plot.
col_wrap	int. Number of columns in the grid before the column facets wrap onto a new row. The default is None.
legend_out	bool. Whether to draw the legend outside the plot, at the center right. The default is True.
share{x,y}	bool. Whether the x or y axis is shared across rows or columns. The default is True.
margin_titles	bool. Whether to draw the titles of the row variable to the right of the last column. The default is False.
kind	"strip", "swarm", "box", "violin", "boxen", "point", "bar" or "count": selects the corresponding plotting function. The default is "strip".

sns.catplot(x="day",y="total_bill",col="time",data=tips,jitter=True)

Add another variable:

sns.catplot(x="day",y="total_bill",hue="sex",col="time",data=tips,jitter=True)

Draw a violin plot without sharing the y axis:

sns.set_style("whitegrid")
sns.catplot(x="day",y="total_bill",hue="sex",col="time",data=tips,kind="violin",split=True,sharey=False)

Draw the enhanced box plot and set col_wrap=2 so that only two plots are displayed in each row of the grid:

sns.catplot(x="time",y="total_bill",hue="sex",col="day",data=tips,kind="boxen",col_wrap=2,margin_titles=True)
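
catplot returns a grid object rather than a single axes, and col_wrap decides how its facets are laid out. This is a minimal sketch on synthetic data (the values are random, and only the column names follow tips). It inspects the returned grid after wrapping four column facets at a width of two.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips: 4 days, 10 rows each
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 10),
    "time": np.tile(["Lunch", "Dinner"], 20),
    "total_bill": rng.uniform(5, 50, size=40),
})

# One facet per day, wrapped so that each row holds at most two facets
g = sns.catplot(x="time", y="total_bill", col="day", data=df, col_wrap=2)

n_facets = g.axes.size
titles = [ax.get_title() for ax in g.axes.flat]
print(n_facets)
print(titles)
plt.close("all")
```

With four levels of day and col_wrap=2 the facets are arranged in two rows of two.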

Draw the count chart and adjust the image size.

sns.catplot(x="day",hue="sex",col="time",data=tips,kind="count",height=4,aspect=1)

Posted by Gamerz on Sat, 04 Dec 2021 22:34:19 -0800
