A detailed guide to drawing bar charts with Python matplotlib

Keywords: Python Back-end Data Analysis

Review

Python provides many excellent packages for data visualization. Among them, the matplotlib module makes it easy to draw high-quality charts such as line charts, bar charts, and scatter plots.

In earlier posts we covered the basic framework and common methods of the matplotlib module:

Python matplotlib drawing pie chart

Python matplotlib drawing line chart

Among the charts provided by the matplotlib module, besides the line chart, the bar chart is one we use routinely in data analysis.

In this post we look at the attributes and methods for drawing bar charts. Let's go~

1. Bar chart overview

  • What is a bar chart

    • A bar chart, also called a column chart, is a statistical chart that represents each value by the length of a rectangle
    • It is used to compare two or more categories
    • It has only one variable, encoded as the length of the rectangle
    • Bars can be drawn horizontally, or grouped and stacked to show several data series
  • Bar chart use cases

    • Suitable for analyzing smaller data sets
    • Suitable for two-dimensional data sets in which only one dimension is compared
    • Shows the differences between individual items at a glance
    • Can represent discrete time series
  • Steps for drawing a bar chart

    1. Import the matplotlib.pyplot module
    2. Prepare the data; numpy or pandas can be used to organize it
    3. Call pyplot.bar() to draw the bar chart
  • Example

    In this example, we analyze a product's annual sales over the past five years

    • The data used in the example are as follows:

      import random
      # Five year labels and one random sales figure per year
      x_data = ["20{}year".format(i) for i in range(16,21)]
      y_data = [random.randint(100,300) for i in range(5)]
    • Draw the bar chart

      import matplotlib.pyplot as plt
      # Draw one bar per year
      for i in range(len(x_data)):
          plt.bar(x_data[i], y_data[i])
      plt.title("Sales analysis")
      plt.xlabel("Year")
      plt.ylabel("Sales volume")
      plt.show()
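The example above calls bar() inside a loop, once per year. As a minimal sketch, bar() also accepts whole sequences, so the same chart can be drawn with a single call. The function name draw_sales_bars and the sample figures below are hypothetical and chosen only for illustration.

```python
import matplotlib.pyplot as plt


def draw_sales_bars(years, sales):
    """Draw one bar per year with a single bar() call; return (ax, bars)."""
    fig, ax = plt.subplots()
    bars = ax.bar(years, sales)  # x labels and heights passed as sequences
    ax.set_title("Sales analysis")
    ax.set_xlabel("Year")
    ax.set_ylabel("Sales volume")
    return ax, bars


# Hypothetical sample data, not taken from the article
years = ["20{}year".format(i) for i in range(16, 21)]
sales = [120, 180, 150, 260, 210]
```

Calling draw_sales_bars(years, sales) followed by plt.show() displays the figure. The returned container holds one rectangle per bar, which is handy when individual bars need restyling later.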


2. Bar chart attributes

  • Bar fill color

    • facecolor (fc) keyword

    • color keyword

    • Color abbreviations:

    Value           Meaning    Value            Meaning
    "g" / "green"   green      "y" / "yellow"   yellow
    • rgb:

      • Format: (r,g,b)
      • Value range: 0 ~ 1
  • Bar border (edge) settings

    • Bar border color

      • edgecolor or ec
    • Bar border style

      • linestyle or ls

      • Line styles:

      Value              Meaning
      "-", "solid"       Solid line (default)
      "--", "dashed"     Dashed line
      "-.", "dashdot"    Dash-dot line
      ":", "dotted"      Dotted line
      "None", " ", ""    No line is drawn
    • Bar border width

      • linewidth or lw
  • Bar fill pattern

    • hatch: sets the fill pattern
    • Attribute values: {'/', '\', '|', '-', '+', 'x', 'o', 'O', '.', '*'}
  • Bar tick labels

    • tick_label: by default, numeric labels are used
  • To the bar chart from section 1 we add a "--" (dashed) border style, an RGB color that varies per bar, and a circle fill pattern

    for i in range(len(x_data)):
        plt.bar(x_data[i], y_data[i], color=(0.2*i, 0.2*i, 0.2*i), linestyle="--", hatch="o")
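The attributes listed in this section can be combined in one bar() call. The following is a minimal sketch under stated assumptions: the function name draw_styled_bars, the color values, and the sample data are hypothetical, while the keyword names are the ones described above.

```python
import matplotlib.pyplot as plt


def draw_styled_bars(values, labels):
    """Draw bars using the fill, border, hatch, and tick-label keywords."""
    fig, ax = plt.subplots()
    bars = ax.bar(
        range(len(values)),          # numeric x positions
        values,
        facecolor=(0.2, 0.4, 0.6),   # fill color as (r, g, b), each in 0..1
        edgecolor="k",               # border color (black)
        linestyle="--",              # dashed border
        linewidth=1.5,               # border width in points
        hatch="o",                   # fill pattern: small circles
        tick_label=labels,           # replaces the default numeric tick labels
    )
    return ax, bars


# Hypothetical sample data, not taken from the article
labels = ["20{}year".format(i) for i in range(16, 21)]
values = [120, 180, 150, 260, 210]
```

The hatch pattern is drawn in the edge color, so setting edgecolor explicitly keeps the pattern visible against the fill.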


3. Stacked bar chart

When we want to compare two groups of data within the same category at once, we can draw a stacked bar chart

  • bottom: the y coordinate of the base of each bar. The default value is 0

  • Building on the example in section 1, add a second set of y-axis data. The full data are as follows:

    x_data = ["20{}year".format(i) for i in range(16,21)]
    y_data = list(random.randint(100,300) for i in range(5))
    y2_data = list(random.randint(100,300) for i in range(5))

  • Call bar() a second time and pass the bottom attribute, so that the second series starts where the first one ends

    bar1 = plt.bar(x_data, y_data, lw=0.5, fc="r", label="Phone")
    bar2 = plt.bar(x_data, y2_data, lw=0.5, fc="b", label="Android", bottom=y_data)
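With more than two series, the base of each layer is the running total of the layers beneath it. The sketch below illustrates that idea with a cumulative bottom array; the function name draw_stacked_bars and the sample figures are hypothetical, and numpy is used only for the element-wise sum.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_stacked_bars(categories, series):
    """Stack each series (dict of label -> values) on top of the previous ones."""
    fig, ax = plt.subplots()
    bottom = np.zeros(len(categories))  # running top of the stack per category
    containers = []
    for label, values in series.items():
        values = np.asarray(values, dtype=float)
        containers.append(ax.bar(categories, values, bottom=bottom, label=label))
        bottom = bottom + values  # the next series starts where this one ends
    ax.legend()
    return ax, containers


# Hypothetical sample data, not taken from the article
categories = ["20{}year".format(i) for i in range(16, 21)]
series = {
    "Phone": [120, 180, 150, 260, 210],
    "Android": [200, 140, 230, 110, 170],
}
```

Keeping the running total in one array avoids passing a hand-written bottom for every layer.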


4. Side-by-side bar chart

When drawing a side-by-side bar chart, use the width attribute together with shifted x positions to control the size and position of each bar

  • width: sets the width of the bars in each group

  • X-axis: the x positions of each group must also be offset by the bar width

  • For example, continue modifying the example above: add the width attribute to bar1 and bar2, and shift the second series along the x-axis by 0.3 so that the bars sit side by side

    x_width = range(0,len(x_data))
    x2_width = [i+0.3 for i in x_width]
    bar1 = plt.bar(x_width, y_data, lw=0.5, fc="r", width=0.3, label="Phone")
    bar2 = plt.bar(x2_width, y2_data, lw=0.5, fc="b", width=0.3, label="Android")
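A self-contained sketch of the side-by-side layout follows. It assumes numeric x positions with the category names attached as tick labels centered between each pair of bars; the function name draw_grouped_bars and the sample figures are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_grouped_bars(categories, first, second, width=0.3):
    """Draw two series side by side; the second is shifted right by `width`."""
    fig, ax = plt.subplots()
    x = np.arange(len(categories))            # one slot per category
    bars1 = ax.bar(x, first, width=width, label="Phone")
    bars2 = ax.bar(x + width, second, width=width, label="Android")
    ax.set_xticks(x + width / 2)              # center the tick under each pair
    ax.set_xticklabels(categories)
    ax.legend()
    return ax, bars1, bars2


# Hypothetical sample data, not taken from the article
categories = ["20{}year".format(i) for i in range(16, 21)]
first = [120, 180, 150, 260, 210]
second = [200, 140, 230, 110, 170]
```

Because bar() centers each bar on its x position by default, shifting the second series by exactly one bar width makes the two bars touch without overlapping.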


5. Horizontal bar chart

Sometimes we need to lay the bars out horizontally to compare the differences. For that we use the barh method

  • pyplot.barh(y, width): draws a horizontal bar chart

  • Applying the barh method to the example above:

    x_data = ["20{}year".format(i) for i in range(16,21)]
    y_data = list(random.randint(100,300) for i in range(5))
    y2_data = list(random.randint(100,300) for i in range(5))

    x_width = range(0,len(x_data))
    x2_width = [i+0.3 for i in x_width]

    # For barh, the positions go on the y-axis and the bar thickness is set with height
    plt.barh(x_width, y_data, lw=0.5, fc="r", height=0.3, label="Phone")
    plt.barh(x2_width, y2_data, lw=0.5, fc="b", height=0.3, label="Android")

    plt.title("Sales analysis")
    plt.ylabel("Year")
    plt.xlabel("Sales volume")
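In barh() the roles of the two dimensions swap: the data value becomes the bar's width and the bar's thickness is set with height. The sketch below shows this with numeric y positions and category names as tick labels; the function name draw_horizontal_bars and the sample figures are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_horizontal_bars(categories, first, second, height=0.3):
    """Draw two series as horizontal bars placed next to each other."""
    fig, ax = plt.subplots()
    y = np.arange(len(categories))            # one slot per category on the y-axis
    bars1 = ax.barh(y, first, height=height, label="Phone")
    bars2 = ax.barh(y + height, second, height=height, label="Android")
    ax.set_yticks(y + height / 2)             # center the tick beside each pair
    ax.set_yticklabels(categories)
    ax.set_xlabel("Sales volume")             # values now run along the x-axis
    ax.set_ylabel("Year")
    ax.legend()
    return ax, bars1, bars2


# Hypothetical sample data, not taken from the article
categories = ["20{}year".format(i) for i in range(16, 21)]
first = [120, 180, 150, 260, 210]
second = [200, 140, 230, 110, 170]
```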


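6. Bar chart with an auxiliary line

When reading a bar chart, an auxiliary line sometimes makes the trend easier to see

  • Use the pyplot.plot() method to draw the line

  • Use pyplot.text() to display the coordinate values

  • For a stacked chart, the position of the line must be computed relative to the stack

    # Line for the second series, shifted up by 200 so it sits above the stacked bars
    plt.plot(x_data, [j + 200 for j in y2_data], color="skyblue", linestyle="-.")
    # Bar chart
    plt.bar(x_data, y_data, lw=0.5, fc="r", width=0.3, label="Phone", alpha=0.5)
    plt.bar(x_data, y2_data, lw=0.5, fc="b", width=0.3, label="Android", alpha=0.5, bottom=y_data)
    # Value labels; the placement used here is one reasonable choice
    for i,j in zip(x_data,y_data):
        plt.text(i, j, "%d" % j, ha="center", va="bottom")
    for i2,j2 in zip(x_data,y2_data):
        plt.text(i2, j2 + 200, "%d" % j2, ha="center", va="bottom")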
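One way to compute the line's relative position on a stacked chart is to place it at the top of each stack, that is, at the sum of the two series. This is a sketch of that variant and not the fixed offset used in the snippet above; the function name draw_bars_with_line and the sample figures are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_bars_with_line(categories, first, second):
    """Stack two series and overlay a line that follows the stack totals."""
    fig, ax = plt.subplots()
    x = np.arange(len(categories))
    first = np.asarray(first)
    second = np.asarray(second)
    totals = first + second                    # height of each full stack
    ax.bar(x, first, width=0.3, alpha=0.5, label="Phone")
    ax.bar(x, second, width=0.3, alpha=0.5, bottom=first, label="Android")
    (line,) = ax.plot(x, totals, color="skyblue", linestyle="-.", marker="o")
    for xi, total in zip(x, totals):
        # Label each point of the line with the stack total
        ax.text(xi, total, str(total), ha="center", va="bottom")
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.legend()
    return ax, line


# Hypothetical sample data, not taken from the article
categories = ["20{}year".format(i) for i in range(16, 21)]
first = [120, 180, 150, 260, 210]
second = [200, 140, 230, 110, 170]
```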


7. Positive and negative bar chart

We need to use the Axes object to set the position of the coordinate axis

  • First, get the Axes object using the pyplot.gca() method

  • Then, through the Axes object's spines (see the matplotlib.spines module), call set_position to set the axis position

  • set_position sets the point at which the axis is placed

  • spines[] options include "left" | "bottom" | "right" | "top"

  • The set_position value has the form (position type, amount). Position type: "outward" | "axes" | "data". Shorthand: "center" -> ("axes", 0.5), "zero" -> ("data", 0.0)

    import numpy as np
    y_data = np.random.randint(100, 300,5)
    y2_data = np.random.randint(100, 300,5)
    ax = plt.gca()
    # Move the x-axis to y = 0 so that bars extend above and below it
    ax.spines["bottom"].set_position(('data', 0))
    plt.bar(x_data, +y_data, lw=0.5, fc="r", width=0.3, label="Phone")
    plt.bar(x_data, -y2_data, lw=0.5, fc="b", width=0.3, label="Android")
    # Value labels; the placement used here is one reasonable choice
    for i,j in zip(x_data,y_data):
        plt.text(i, j, "%d" % j, ha="center", va="bottom")
    for i2,j2 in zip(x_data,y2_data):
        plt.text(i2, -j2, "%d" % j2, ha="center", va="top")
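A self-contained sketch of the positive and negative layout follows. It assumes numeric x positions and fixed sample figures in place of random data, so the result is reproducible; the function name draw_positive_negative_bars is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_positive_negative_bars(categories, up, down):
    """Draw `up` above the x-axis and `down` mirrored below it."""
    fig, ax = plt.subplots()
    x = np.arange(len(categories))
    up = np.asarray(up)
    down = np.asarray(down)
    ax.spines["bottom"].set_position(("data", 0))  # place the x-axis at y = 0
    bars_up = ax.bar(x, up, width=0.3, label="Phone")
    bars_down = ax.bar(x, -down, width=0.3, label="Android")  # negated heights
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    for xi, v in zip(x, up):
        ax.text(xi, v, str(v), ha="center", va="bottom")   # label above the bar
    for xi, v in zip(x, down):
        ax.text(xi, -v, str(v), ha="center", va="top")     # label below the bar
    ax.legend()
    return ax, bars_up, bars_down


# Hypothetical sample data, not taken from the article
categories = ["20{}year".format(i) for i in range(16, 21)]
up = [120, 180, 150, 260, 210]
down = [200, 140, 230, 110, 170]
```

The labels for the lower series show the original positive values, since the negation is only a drawing device.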


In this post we went through the attributes and methods for drawing the various kinds of bar charts in the matplotlib module. When we need to show the differences between discrete data points visually, we can use bar() or barh() to draw clear, attractive charts.

That is all for this post. Likes and comments are welcome. See you next time~

Posted by Glen on Wed, 17 Nov 2021 23:51:38 -0800