matplotlib [5] - scatter

Keywords: Big Data

A line chart can be turned into a scatter plot by drawing the individual points instead of connecting them. A scatter plot shows whether two variables are correlated and, if they are, whether the relationship is positive linear, negative linear, or nonlinear.
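To put a number on what a scatter plot suggests, one common choice is the Pearson correlation coefficient. The sketch below is only an illustration: the arrays, the random seed and the helper name `pearson_r` are assumptions made up here, not data or code from this post. It uses `np.corrcoef` on a positive trend, a negative trend and a parabola.

```python
import numpy as np

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y_pos = 2 * x + rng.normal(0, 1, x.size)    # positive linear trend plus noise
y_neg = -2 * x + rng.normal(0, 1, x.size)   # negative linear trend plus noise
y_quad = (x - 5) ** 2                       # nonlinear: parabola symmetric around x = 5

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays (hypothetical helper)."""
    return np.corrcoef(a, b)[0, 1]

r_pos = pearson_r(x, y_pos)
r_neg = pearson_r(x, y_neg)
r_quad = pearson_r(x, y_quad)
print(r_pos, r_neg, r_quad)
```

The coefficient for the parabola comes out close to zero even though y depends entirely on x, which is why a nonlinear relationship is easier to spot in the scatter plot than in a single number.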

plt.scatter()

plt.scatter(x, y, s=20, 
            c=None, marker='o', 
            cmap=None, norm=None, 
            vmin=None, vmax=None, 
            alpha=None, linewidths=None, 
            edgecolors=None)

x: the x-axis data of the scatter plot;

y: the y-axis data of the scatter plot;

s: the point size (marker area in points squared). The signature above shows 20, the default in older matplotlib versions; recent versions default to rcParams['lines.markersize'] ** 2. Passing an array of sizes instead of a single number gives a bubble chart;

c: the color of the points. Older matplotlib versions default to blue; recent versions take the next color from the current color cycle;

marker: the shape of the points, a circle ('o') by default;

cmap: the colormap. It takes effect only when c is an array of numbers;

norm: a normalization object (for example a matplotlib.colors.Normalize instance) that scales the values in c to 0 ~ 1 before they are mapped to colors. Like cmap, it requires c to be an array of numbers;

vmin, vmax: the data values mapped to the two ends of the colormap when the default normalization is used. Do not pass them together with norm: older versions ignore them in that case and recent versions reject the combination;

alpha: the transparency of the points, from 0 (fully transparent) to 1 (opaque);

linewidths: the width of the marker edge;

edgecolors: the color of the marker edge;
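As a rough sketch of how s, c, cmap, vmin and vmax work together, the example below draws random points whose area and color both come from arrays. The data, the seed and the variable names are assumptions invented for illustration; only the plt/Axes calls are matplotlib's own.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 40)
y = rng.uniform(0, 1, 40)
values = x + y                              # numeric array mapped to colors through cmap
sizes = 20 + 200 * rng.uniform(0, 1, 40)    # marker areas in points**2 (bubble sizes)

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=values, cmap='viridis',
                    vmin=0, vmax=2,         # data range covered by the colormap
                    alpha=0.7, edgecolors='black', linewidths=0.5)
fig.colorbar(points, ax=ax, label='x + y')  # legend for the color mapping
```

Calling plt.show() afterwards displays the figure. Because vmin and vmax are fixed at 0 and 2, the colors stay comparable between plots even if the data in one plot does not span the full range.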

#Import module
import pandas as pd
import matplotlib.pyplot as plt

#Set drawing style
plt.style.use('ggplot')
#Display Chinese characters and the minus sign correctly
plt.rcParams['font.sans-serif']=['Microsoft YaHei']
plt.rcParams['axes.unicode_minus']=False
#Suppress warnings
import warnings
warnings.filterwarnings('ignore')

#####1) Simple scatter plot############
#Read in the data (read_csv accepts the path directly, so no file handle is left open)
cars=pd.read_csv(r'E:\Zhihu document preservation\python_scatter\cars.csv')
#Draw the plot
plt.scatter(
    cars.speed,#x-axis is the vehicle speed
    cars.dist,#y-axis is the braking distance
    s=30,#Set point size
    c='steelblue',#Dot color
    marker='s',#Shape of points
    alpha=0.9,#Transparency of points
    linewidths=0.3,#Set the thickness of the scatter boundary
    edgecolors='red'#Set the color of the scatter boundary
)
#Add axis labels and titles
plt.title('Relationship between vehicle speed and braking distance')
plt.xlabel('speed')
plt.ylabel('distance')

#Remove the tick marks on the top and right borders
plt.tick_params(top=False,right=False)

#display graphics
plt.show()

#####2) Grouped scatter plot
#Case: iris dataset
#Read data
iris=pd.read_csv(r'E:\Zhihu document preservation\python_scatter\iris.csv')

#Draw the plot
#The dataset contains several species. Petal length and width are drawn as points, with a different color for each species
#Because the data is grouped, a for loop can handle it

#Custom colors
colors=['steelblue','#9999ff','#ff9999']
#The three different species
Species=iris.Species.unique()
#for loop to draw the grouped scatter plot
for i in range(len(Species)):
    plt.scatter(iris.loc[iris.Species==Species[i],'Petal.Length'],
                iris.loc[iris.Species==Species[i],'Petal.Width'],
                s=35,
                c=colors[i],
                label=Species[i])

#Add title and axis labels
plt.title('The relationship between the width and length of different petals')
plt.xlabel('Petal length')
plt.ylabel('Petal width')

#Remove the tick marks on the top and right borders
plt.tick_params(top=False,right=False)
plt.legend(loc='upper left')
plt.show()

#####3) bubble chart
import numpy as np

#Read data
sales=pd.read_excel(r'E:\Zhihu document preservation\python_scatter\sales.xlsx')

#Draw bubble chart
colors=['steelblue','#9999ff','#ff9999','#DAA520','#FFFFF0','#FFA07A','#808000']
region=sales.region.unique()
texts=['Southwest','Northwest','Central China','South China','East China','North China','Northeast']
#Row i is assumed to hold the data for one region, labelled texts[i]
for i in range(len(region)):
    plt.scatter(sales.finish_ratio[i],
                sales.profit_ratio[i],
                c=colors[i],
                s=sales.tot_target[i]/30,
                edgecolors='black')
    plt.text(sales.finish_ratio[i],
             sales.profit_ratio[i]+0.0001,
             texts[i],
             size=7,
             ha='center')

#Show the tick labels as percentages (format instead of str(i*100), which yields labels like 30.000000000000004%)
ticks=np.arange(0,1,0.1)
plt.xticks(ticks,['{:.0%}'.format(t) for t in ticks])
plt.yticks(ticks,['{:.0%}'.format(t) for t in ticks])
#Set the value range of the axis
plt.xlim(0.20,0.70)
plt.ylim(0.25,0.85)

#Add title and axis labels
plt.title('Relationship between completion rate and profit')
plt.xlabel('Completion rate')
plt.ylabel('Profit margin')

#Remove the tick marks on the top and right borders
plt.tick_params(top=False,right=False)

plt.show()
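The bubble chart builds its percentage tick labels by hand. As an alternative, matplotlib ships a formatter for this; the sketch below assumes the axis values are ratios between 0 and 1 (hence xmax=1) and uses three made-up points rather than the sales data.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

fig, ax = plt.subplots()
# Made-up ratios, assumed purely for illustration
ax.scatter([0.3, 0.5, 0.6], [0.4, 0.7, 0.5])

# xmax=1 means a value of 1.0 is shown as 100%; decimals=0 drops the fraction
ax.xaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
ax.set_xlim(0.20, 0.70)
ax.set_ylim(0.25, 0.85)
```

With a formatter the labels follow whatever tick positions matplotlib chooses, so the axis limits can change without rebuilding the label list.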

Posted by kel on Sun, 15 Dec 2019 08:55:40 -0800