Handle missing and abnormal values in the data

preface

In order to cope with my homework, I wanted to find a code on the Internet and handed it in. I found that I didn't have what I wanted, so I wrote it in tears for two days. In order to study hard, I rolled it up with my classmates 🤣
If you need to hand in your homework, just look at the source code directly. In the CleaningData () method, you must change it to the data you need. I've written everything else for you

target

Deal with missing and abnormal values in data (find out and re assign values)

process

Blank value (missing value)

For the processing of blank value, I use the average value of the two lines before and after the blank value to assign a value;
If the first and second lines are blank values, the position will always be found;
If a column has only one data, I will use this data to assign a value to the whole column ψ (｀∇´) ψ
(at least I hope the last situation doesn't happen)

Outliers

Exception handling, I use Three point Lagrange interpolation
Why use this? Because the teacher asked ƪ ( ˘ ⌣ ˘)ʃ
Use three points to find an unknown equation, and use this equation to predict the value of abnormal value position.
This equation can be worked out. You can try to output it. (shown below)

harvest

Harvest 1:

# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Align column names
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Cancel scientific counting output
np.set_printoptions(suppress=True)
# In order to display the numbers intuitively, the scientific counting method is not used
pd.set_option('display.float_format', lambda x: '%.2f' % x)
# Used to display Chinese labels normally
plt.rcParams['font.sans-serif'] = ['SimHei'] 
# Used to display negative signs normally
plt.rcParams['axes.unicode_minus'] = False

Harvest 2:
np.where (data = = '1'): find the position of 1

print(np.where(data == ''))  # Find the coordinates of the blank value

Harvest 3:
A valid multidimensional iterator object that can traverse an array.
Here I convert string data into floating-point numbers. Two can be implemented

np.nditer(row)
or
[float(num) for num in arr[:, 0]]

Harvest 4:
Three point Lagrange interpolation

from scipy.interpolate import lagrange
iloc_point_of_time = [float(num) for num in arr[:, 0]][4:7:1]  
iloc_temperature = [float(num) for num in arr[:, 1]][4:7:1]
a1 = lagrange(iloc_point_of_time, iloc_temperature)
print(a1)

Harvest 5:
If you want to make the points on the line chart highlight, you only need to use plot () and scatter () at the same time
It is the combination of line chart and scatter chart

Core source code

Process blank values

Get the coordinate array of the blank array, and pass the elements of the array to row and col in fill_value (arrow, row, Col)

def Create_Fill_Value_Arr(arr):  # If you need coordinates, the number of abscissa and ordinate should be the same. Create a 2-dimensional zero array with len (row) length
    # row and col are sets of abscissa and ordinate, respectively
    # print(np.where(data == ''))  # Find the coordinates of the blank value
    row = np.where(arr == '')[0]
    col = np.where(arr == '')[1]
    CoordinateArray = np.zeros((len(row), 2))  # Blank value coordinate array
    row_count = 0
    for i in np.nditer(row):
        CoordinateArray[row_count][0] = i
        row_count += 1
    col_count = 0
    for i in np.nditer(col):
        CoordinateArray[col_count][1] = i
        col_count += 1
    return CoordinateArray

arr: array to be processed
row: abscissa of the element to be processed
col: ordinate of the element to be processed

def Fill_Value(arr, row, col):
    # The two-dimensional array is filled with blank values
    # There is a precondition here. This assignment is based on the upstream and downstream values of the same column, so the column must not 1 change, only the number of rows
    current_row = row  # Record the number of lines entered
    current_col = col  # Record the number of columns entered
    next_row = 0  # Record the number of lines in the next line
    next_col = col  # Record the number of columns in the next column
    pre_row = 0  # Record the number of lines in the previous line
    pre_col = col  # Record the number of columns in the previous column

    # Assignment of the first and last lines
    if row == 0 or row == len(arr) - 1:
        if row == 0:  # If the position of the input element is in the first row, the non empty number of the next row in the same column is assigned to it
            flag = True  # Tag, if the assignment ends, the loop ends
            while flag:
                current_row += 1
                if arr[current_row][current_col] != '':
                    arr[row][col] = arr[current_row][current_col]  # assignment
                    flag = False
                elif arr[current_row][current_col] == '':  # Judge if the current number is null
                    current_row += 1  # Then find the value of the next row in the same column
                else:
                    print("Fill_Value()method==first line==Something went wrong!!")
        elif row == len(arr) - 1:  # If the position of the input element is in the last row, the non empty number of the row on the same column is assigned to it
            flag = True  # Tag, if the assignment ends, the loop ends
            current_row -= 1
            while flag:
                if arr[current_row][current_col] != '':
                    arr[row][col] = arr[current_row][current_col]
                    flag = False
                elif arr[current_row][current_col] == '':
                    current_row -= 1
                else:
                    print("Fill_Value()Methodical==Last line==Something went wrong!!")
        else:
            print("If this occurs, the condition judgment of the first and last lines is wrong")

    # Assignment of non first and last lines
    elif row != 0 and row != len(arr) - 1:  # Assign the average value of the upper and lower rows of the same column of the value to it
        # There is an arithmetic pit here, that is, when you get to this line, the previous numbers must be non empty values, because the above algorithm fills the previous values,
        # Therefore, for this number, the first n lines have values, but the latter ones do not necessarily have values. It is possible that many lines have null values,
        # Therefore, we need to make some judgment when selecting the next value
        # The practice here is to find a value to average no matter how many lines are separated
        # If all rows in this row are blank values, the algorithm here assigns the previous value of this row to all subsequent rows
        flag = True  # Tag, if the assignment ends, the loop ends
        next_row = current_row + 1
        pre_row = current_row - 1
        while flag:
            if arr[next_row][next_col] != '':  # Here we judge whether the next line is null
                arr[row][col] = str((float(arr[next_row][next_col]) +  # Since the numerical calculation of two numbers is involved here, it needs to be converted to numerical type, while the above only involves the assignment behavior of a single line
                                     float(arr[pre_row][pre_col])) / 2)  # Floating point numbers are required here, not integers
                flag = False
            elif arr[next_row][next_col] == '':
                next_row += 1
            else:
                print("Fill_Value()Methodical==Assignment of non first and last lines==Something went wrong!!")
    else:
        print('How could this be wrong?')
    return arr

Handling outliers

You need to modify some data here, because when you write it, you find that if you keep improving, it won't be enough for me to write in a week, so I can only be lazy secretly. Forgive me (´▽ ʃ ♡ ƪ) (≧∇≦)ﾉ(❁´◡❁)

arr: enter the array after processing the blank value

# Abnormal data can be obtained through the box diagram
def CleaningData(arr):
    # The data of arr is all strings
    # If a data is very different from all the data in the table, it is necessary to replace the normal data with the abnormal data
    point_of_time = [float(num) for num in arr[:, 0]]  # This sampling time point should not be used
    temperature = [float(num) for num in arr[:, 1]]
    relative_humidity = [float(num) for num in arr[:, 2]]
    Gas_concentration = [float(num) for num in arr[:, 3]]
    Carbon_monoxide_concentration = [float(num) for num in arr[:, 4]]
    # The data of data is all float
    data = {
        'temperature': temperature,
        'relative_humidity': relative_humidity,
        'Gas_concentration': Gas_concentration,
        'Carbon_monoxide_concentration': Carbon_monoxide_concentration,
    }
    df = pd.DataFrame(data)
    df.boxplot()  # Here, pandas has its own process
    plt.ylabel("ylabel")
    plt.xlabel("xlabel")  # We set the title of the abscissa and ordinate.
    plt.savefig('Box diagram.jpg')
    plt.show()
    # The des data here comes from df.describe (), because
    des = {
        'temperature': {
            'count': 44.00,
            'mean': 53.63,
            'std': 146.07,
            'min': 20.50,
            '25%': 29.06,
            '50%': 31.73,
            '75%': 35.93,
            'max': 999.99,
        },
        'relative_humidity': {
            'count': 44.00,
            'mean': 118.39,
            'std': 194.72,
            'min': 60.00,
            '25%': 67.00,
            '50%': 76.00,
            '75%': 88.25,
            'max': 999.99,
        },
        'Gas_concentration': {
            'count': 44.00,
            'mean': 25.23,
            'std': 150.37,
            'min': 1.08,
            '25%': 1.90,
            '50%': 2.65,
            '75%': 3.39,
            'max': 999.99,
        },
        'Carbon_monoxide_concentration': {
            'count': 44.00,
            'mean': 49.15,
            'std': 209.90,
            'min': 1.08,
            '25%': 2.47,
            '50%': 3.94,
            '75%': 5.23,
            'max': 999.99,
        }
    }
    columns_list = ['temperature', 'relative_humidity', 'Gas_concentration', 'Carbon_monoxide_concentration']
    # If any value exceeds the average value, the value is regarded as an abnormal value
    arrays_list = []  # This list is used to record the coordinates of outliers
    col = 0
    row = 0
    for col_list in columns_list:
        for i in range(len(arr)):
            if data[col_list][row] >= des[col_list]['std']:
                arrays_list.append([row, col])
                row += 1
            elif data[col_list][row] < des[col_list]['std']:
                row += 1
            else:
                print("This can't happen once")
        col += 1
        row = 0
    # Three values select (4, 5, 6) rows
    iloc_point_of_time = [float(num) for num in arr[:, 0]][4:7:1]  # This sample collection time
    iloc_temperature = [float(num) for num in arr[:, 1]][4:7:1]
    iloc_relative_humidity = [float(num) for num in arr[:, 2]][4:7:1]
    iloc_Gas_concentration = [float(num) for num in arr[:, 3]][4:7:1]
    iloc_Carbon_monoxide_concentration = [float(num) for num in arr[:, 4]][4:7:1]
    # Here, the lagrange three value interpolation method is used to call the lagrange () function
    a1 = lagrange(iloc_point_of_time, iloc_temperature)
    a2 = lagrange(iloc_point_of_time, iloc_relative_humidity)
    a3 = lagrange(iloc_point_of_time, iloc_Gas_concentration)
    a4 = lagrange(iloc_point_of_time, iloc_Carbon_monoxide_concentration)
    # Assign outliers
    for i in range(len(arrays_list)):
        if arrays_list[i][1] == 0:
            arr[arrays_list[i][0]][1] = a1(arrays_list[i][1])
        elif arrays_list[i][1] == 1:
            arr[arrays_list[i][0]][2] = a2(arrays_list[i][1])
        elif arrays_list[i][1] == 2:
            arr[arrays_list[i][0]][3] = a3(arrays_list[i][1])
        elif arrays_list[i][1] == 3:
            arr[arrays_list[i][0]][4] = a4(arrays_list[i][1])
    print(arr)
    return arr

Write at the end

Because the quality of the code you write is not high, it will be a headache for you to read the whole blog.
If you really need data processing code, I'll try to extract what I think is the key content.
I hope this blog can help you and solve your problems.
Wait a little further to modify the source code.

source file

data

I think you should be able to input these data into the csv file. It's good to read them anyway
(～￣▽￣)～

1,30.22,69,2.9,3.6
2,37.68,,2.86,3.64
3,29.32,66,,1.66
4,37.44,68,1.18,6.49
5,29.46,75,3.81,4.78
6,30.12,77,1.93,4.2
7,26.3,83,2.07,3.67
8,,66,1.46,
9,34.38,62,3.36,4.39
10,30.79,75,2.4,5.79
11,25.17,69,3.34,6.28
12,20.5,80,2.95,6.22
13,37.21,,999.99,4.75
14,31.88,94,2.41,1.49
15,39.94,77,1.98,1.08
16,33.65,67,2.03,
17,27.21,79,1.41,2.5
18,27.57,999,3.39,1.7
19,31.59,94,3.27,5.02
20,,93,,4.69
21,34.88,92,1.58,3.01
22,29.65,92,2.69,5.16
23,26.05,80,2.61,2.8
24,33.5,79,1.26,6.81
25,34.71,72,1.82,2.49
26,999.99,,3.77,999.99
27,37.06,99,1.94,3.85
28,28.57,64,3.93,
29,28,66,1.08,2.4
30,25.67,60,2.33,3.84
31,20.59,67,3.88,5.19
32,39.84,999,3.4,3.77
33,29.22,74,1.09,999.99
34,33.19,61,3.22,1.74
35,38.7,62,3.61,6.52
36,,62,,6.83
37,21.56,93,3.63,4.43
38,38.06,67,3.21,2.41
39,37.95,68,1.12,4.56
40,35.8,96,1.52,5.54
41,29.75,80,1.3,2.23
42,32.69,69,3.26,1.43
43,33.94,96,3.42,1.4
44,36.31,78,3.69,5.34

Source code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import lagrange

# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Align column names
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Cancel scientific counting output
np.set_printoptions(suppress=True)
# In order to display the numbers intuitively, the scientific counting method is not used
pd.set_option('display.float_format', lambda x: '%.2f' % x)

'''
    param :
        arr: target array
        row: that 's ok
        col: column
'''
def Fill_Value(arr, row, col):
    # The two-dimensional array is filled with blank values
    # There is a precondition here. This assignment is based on the upstream and downstream values of the same column, so the column must not 1 change, only the number of rows
    current_row = row  # Record the number of lines entered
    current_col = col  # Record the number of columns entered
    next_row = 0  # Record the number of lines in the next line
    next_col = col  # Record the number of columns in the next column
    pre_row = 0  # Record the number of lines in the previous line
    pre_col = col  # Record the number of columns in the previous column

    # Assignment of the first and last lines
    if row == 0 or row == len(arr) - 1:
        if row == 0:  # If the position of the input element is in the first row, the non empty number of the next row in the same column is assigned to it
            flag = True  # Tag, if the assignment ends, the loop ends
            while flag:
                current_row += 1
                if arr[current_row][current_col] != '':
                    arr[row][col] = arr[current_row][current_col]  # assignment
                    flag = False
                elif arr[current_row][current_col] == '':  # Judge if the current number is null
                    current_row += 1  # Then find the value of the next row in the same column
                else:
                    print("Fill_Value()method==first line==Something went wrong!!")
        elif row == len(arr) - 1:  # If the position of the input element is in the last row, the non empty number of the row on the same column is assigned to it
            flag = True  # Tag, if the assignment ends, the loop ends
            current_row -= 1
            while flag:
                if arr[current_row][current_col] != '':
                    arr[row][col] = arr[current_row][current_col]
                    flag = False
                elif arr[current_row][current_col] == '':
                    current_row -= 1
                else:
                    print("Fill_Value()Methodical==Last line==Something went wrong!!")
        else:
            print("If this occurs, the condition judgment of the first and last lines is wrong")

    # Assignment of non first and last lines
    elif row != 0 and row != len(arr) - 1:  # Assign the average value of the upper and lower rows of the same column of the value to it
        # There is an arithmetic pit here, that is, when you get to this line, the previous numbers must be non empty values, because the above algorithm fills the previous values,
        # Therefore, for this number, the first n lines have values, but the latter ones do not necessarily have values. It is possible that many lines have null values,
        # Therefore, we need to make some judgment when selecting the next value
        # The practice here is to find a value to average no matter how many lines are separated
        # If all rows in this row are blank values, the algorithm here assigns the previous value of this row to all subsequent rows
        flag = True  # Tag, if the assignment ends, the loop ends
        next_row = current_row + 1
        pre_row = current_row - 1
        while flag:
            if arr[next_row][next_col] != '':  # Here we judge whether the next line is null
                arr[row][col] = str((float(arr[next_row][next_col]) +  # Since the numerical calculation of two numbers is involved here, it needs to be converted to numerical type, while the above only involves the assignment behavior of a single line
                                     float(arr[pre_row][pre_col])) / 2)  # Floating point numbers are required here, not integers
                flag = False
            elif arr[next_row][next_col] == '':
                next_row += 1
            else:
                print("Fill_Value()Methodical==Assignment of non first and last lines==Something went wrong!!")
    else:
        print('How could this be wrong?')
    return arr


def Create_Fill_Value_Arr(arr):  # If you need coordinates, the number of abscissa and ordinate should be the same. Create a 2-dimensional zero array with len (row) length
    # row and col are sets of abscissa and ordinate, respectively
    # print(np.where(data == ''))  # Find the coordinates of the blank value
    row = np.where(arr == '')[0]
    col = np.where(arr == '')[1]
    CoordinateArray = np.zeros((len(row), 2))  # Blank value coordinate array
    row_count = 0
    for i in np.nditer(row):
        CoordinateArray[row_count][0] = i
        row_count += 1
    col_count = 0
    for i in np.nditer(col):
        CoordinateArray[col_count][1] = i
        col_count += 1
    return CoordinateArray


# Abnormal data can be obtained through the box diagram
def CleaningData(arr):
    # The data of arr is all strings
    # If a data is very different from all the data in the table, it is necessary to replace the normal data with the abnormal data
    point_of_time = [float(num) for num in arr[:, 0]]  # This sampling time point should not be used
    temperature = [float(num) for num in arr[:, 1]]
    relative_humidity = [float(num) for num in arr[:, 2]]
    Gas_concentration = [float(num) for num in arr[:, 3]]
    Carbon_monoxide_concentration = [float(num) for num in arr[:, 4]]
    # The data of data is all float
    data = {
        'temperature': temperature,
        'relative_humidity': relative_humidity,
        'Gas_concentration': Gas_concentration,
        'Carbon_monoxide_concentration': Carbon_monoxide_concentration,
    }
    df = pd.DataFrame(data)
    df.boxplot()  # Here, pandas has its own process
    plt.ylabel("ylabel")
    plt.xlabel("xlabel")  # We set the title of the abscissa and ordinate.
    plt.savefig('Box diagram.jpg')
    plt.show()
    # The des data here comes from df.describe (), because
    des = {
        'temperature': {
            'count': 44.00,
            'mean': 53.63,
            'std': 146.07,
            'min': 20.50,
            '25%': 29.06,
            '50%': 31.73,
            '75%': 35.93,
            'max': 999.99,
        },
        'relative_humidity': {
            'count': 44.00,
            'mean': 118.39,
            'std': 194.72,
            'min': 60.00,
            '25%': 67.00,
            '50%': 76.00,
            '75%': 88.25,
            'max': 999.99,
        },
        'Gas_concentration': {
            'count': 44.00,
            'mean': 25.23,
            'std': 150.37,
            'min': 1.08,
            '25%': 1.90,
            '50%': 2.65,
            '75%': 3.39,
            'max': 999.99,
        },
        'Carbon_monoxide_concentration': {
            'count': 44.00,
            'mean': 49.15,
            'std': 209.90,
            'min': 1.08,
            '25%': 2.47,
            '50%': 3.94,
            '75%': 5.23,
            'max': 999.99,
        }
    }
    columns_list = ['temperature', 'relative_humidity', 'Gas_concentration', 'Carbon_monoxide_concentration']
    # If any value exceeds the average value, the value is regarded as an abnormal value
    arrays_list = []  # This list is used to record the coordinates of outliers
    col = 0
    row = 0
    for col_list in columns_list:
        for i in range(len(arr)):
            if data[col_list][row] >= des[col_list]['std']:
                arrays_list.append([row, col])
                row += 1
            elif data[col_list][row] < des[col_list]['std']:
                row += 1
            else:
                print("This can't happen once")
        col += 1
        row = 0
    # Three values select (4, 5, 6) rows
    iloc_point_of_time = [float(num) for num in arr[:, 0]][4:7:1]  # This sample collection time
    iloc_temperature = [float(num) for num in arr[:, 1]][4:7:1]
    iloc_relative_humidity = [float(num) for num in arr[:, 2]][4:7:1]
    iloc_Gas_concentration = [float(num) for num in arr[:, 3]][4:7:1]
    iloc_Carbon_monoxide_concentration = [float(num) for num in arr[:, 4]][4:7:1]
    # Here, the lagrange three value interpolation method is used to call the lagrange () function
    a1 = lagrange(iloc_point_of_time, iloc_temperature)
    a2 = lagrange(iloc_point_of_time, iloc_relative_humidity)
    a3 = lagrange(iloc_point_of_time, iloc_Gas_concentration)
    a4 = lagrange(iloc_point_of_time, iloc_Carbon_monoxide_concentration)
    # Assign outliers
    for i in range(len(arrays_list)):
        if arrays_list[i][1] == 0:
            arr[arrays_list[i][0]][1] = a1(arrays_list[i][1])
        elif arrays_list[i][1] == 1:
            arr[arrays_list[i][0]][2] = a2(arrays_list[i][1])
        elif arrays_list[i][1] == 2:
            arr[arrays_list[i][0]][3] = a3(arrays_list[i][1])
        elif arrays_list[i][1] == 3:
            arr[arrays_list[i][0]][4] = a4(arrays_list[i][1])
    # print(arr)
    return arr


# Represent the data in the form of images
def CreatePicture(arr):
    # Here, corresponding variables are created for files, which may not be applicable to other files
    # Because the initial value is a string, you need to convert the string in the list to floating point, otherwise an error will be reported
    point_of_time = [float(num) for num in arr[:, 0]]
    temperature = [float(num) for num in arr[:, 1]]
    relative_humidity = [float(num) for num in arr[:, 2]]
    Gas_concentration = [float(num) for num in arr[:, 3]]
    Carbon_monoxide_concentration = [float(num) for num in arr[:, 4]]
    # Connect each point into a line segment
    plt.plot(point_of_time, temperature, label="Temperature(℃)", linestyle=":")
    plt.plot(point_of_time, relative_humidity, label="relative humidity", linestyle="--")
    plt.plot(point_of_time, Gas_concentration, label="Gas concentration", linestyle="-.")
    plt.plot(point_of_time, Carbon_monoxide_concentration,
             label="Carbon monoxide concentration", linestyle="-")
    # Show each point
    plt.scatter(point_of_time, temperature, s=10)
    plt.scatter(point_of_time, relative_humidity, s=10)
    plt.scatter(point_of_time, Gas_concentration, s=10)
    plt.scatter(point_of_time, Carbon_monoxide_concentration, s=10)
    plt.rcParams['font.sans-serif'] = ['SimHei']  # Used to display Chinese labels normally
    plt.rcParams['axes.unicode_minus'] = False  # Used to display negative signs normally
    plt.title("Coal mine sampling data sheet", fontsize=18)
    plt.xlabel("Sampling time point", fontsize=12)
    plt.ylabel("numerical value", fontsize=12)
    plt.legend(fontsize=10)
    plt.savefig('Line chart.jpg')
    plt.show()


# Encapsulate all required functions in the main () method
def main(file):
    # The type here is converted to string type, because float or int is not suitable for null and numeric values
    data = np.loadtxt(file, skiprows=1, dtype=str, delimiter=',')
    CoordinateArray = Create_Fill_Value_Arr(data)  # Returns an array of coordinates containing all null values
    for i in range(len(CoordinateArray)):
        arr_final = Fill_Value(
            data, int(CoordinateArray[i][0]), int(CoordinateArray[i][1]))  # Although the floating-point type is converted to integer type, the coordinates here should be integers, so there is no problem with this forced conversion
    arr_finally = CleaningData(arr_final)
    CreatePicture(arr_finally)
    return arr_finally


# It's best to use absolute addresses here, because some software seems to be inaccurate in locating relative addresses
file = r'task\Week 8\ug_detect.csv'
print(main(file))

Operation results

This is the modified result. The previous outliers and blank values have been re assigned
There is a small problem: the temperature has an abnormal value after calculation, which may remind me not to take it lightly. The code that feels super good will also be abnormal

[['1' '30.22' '69' '2.9' '3.6']        
 ['2' '37.68' '67.5' '2.86' '3.64']    
 ['3' '29.32' '66' '2.02' '1.66']      
 ['4' '37.44' '68' '1.18' '6.49']      
 ['5' '29.46' '75' '3.81' '4.78']      
 ['6' '30.12' '77' '1.93' '4.2']       
 ['7' '26.3' '83' '2.07' '3.67']       
 ['8' '30.340' '66' '1.46' '4.0299']   
 ['9' '34.38' '62' '3.36' '4.39']      
 ['10' '30.79' '75' '2.4' '5.79']      
 ['11' '25.17' '69' '3.34' '6.28']     
 ['12' '20.5' '80' '2.95' '6.22']      
 ['13' '37.21' '87.0' '21.569' '4.75'] 
 ['14' '31.88' '94' '2.41' '1.49']     
 ['15' '39.94' '77' '1.98' '1.08']     
 ['16' '33.65' '67' '2.03' '1.79']     
 ['17' '27.21' '79' '1.41' '2.5']      
 ['18' '27.57' '107.0' '3.39' '1.7']   
 ['19' '31.59' '94' '3.27' '5.02']     
 ['20' '33.235' '93' '2.425' '4.69']   
 ['21' '34.88' '92' '1.58' '3.01']     
 ['22' '29.65' '92' '2.69' '5.16']     
 ['23' '26.05' '80' '2.61' '2.8']      
 ['24' '33.5' '79' '1.26' '6.81']      
 ['25' '34.71' '72' '1.82' '2.49']     
 ['26' '-41.04' '85.5' '3.77' '6.0900']
 ['27' '37.06' '99' '1.94' '3.85']     
 ['28' '28.57' '64' '3.93' '3.125']    
 ['29' '28' '66' '1.08' '2.4']
 ['30' '25.67' '60' '2.33' '3.84']     
 ['31' '20.59' '67' '3.88' '5.19']
 ['32' '39.84' '107.0' '3.4' '3.77']
 ['33' '29.22' '74' '1.09' '6.0900']
 ['34' '33.19' '61' '3.22' '1.74']
 ['35' '38.7' '62' '3.61' '6.52']
 ['36' '30.130' '62' '3.62' '6.83']
 ['37' '21.56' '93' '3.63' '4.43']
 ['38' '38.06' '67' '3.21' '2.41']
 ['39' '37.95' '68' '1.12' '4.56']
 ['40' '35.8' '96' '1.52' '5.54']
 ['41' '29.75' '80' '1.3' '2.23']
 ['42' '32.69' '69' '3.26' '1.43']
 ['43' '33.94' '96' '3.42' '1.4']
 ['44' '36.31' '78' '3.69' '5.34']]
[['1' '30.22' '69' '2.9' '3.6']
 ['2' '37.68' '67.5' '2.86' '3.64']
 ['3' '29.32' '66' '2.02' '1.66']
 ['4' '37.44' '68' '1.18' '6.49']
 ['5' '29.46' '75' '3.81' '4.78']
 ['6' '30.12' '77' '1.93' '4.2']
 ['7' '26.3' '83' '2.07' '3.67']
 ['8' '30.340' '66' '1.46' '4.0299']
 ['9' '34.38' '62' '3.36' '4.39']
 ['10' '30.79' '75' '2.4' '5.79']
 ['11' '25.17' '69' '3.34' '6.28']
 ['12' '20.5' '80' '2.95' '6.22']
 ['13' '37.21' '87.0' '21.569' '4.75']
 ['14' '31.88' '94' '2.41' '1.49']
 ['15' '39.94' '77' '1.98' '1.08']
 ['16' '33.65' '67' '2.03' '1.79']
 ['17' '27.21' '79' '1.41' '2.5']
 ['18' '27.57' '107.0' '3.39' '1.7']
 ['19' '31.59' '94' '3.27' '5.02']
 ['20' '33.235' '93' '2.425' '4.69']
 ['21' '34.88' '92' '1.58' '3.01']
 ['22' '29.65' '92' '2.69' '5.16']
 ['23' '26.05' '80' '2.61' '2.8']
 ['24' '33.5' '79' '1.26' '6.81']
 ['25' '34.71' '72' '1.82' '2.49']
 ['26' '-41.04' '85.5' '3.77' '6.0900']
 ['27' '37.06' '99' '1.94' '3.85']
 ['28' '28.57' '64' '3.93' '3.125']
 ['29' '28' '66' '1.08' '2.4']
 ['30' '25.67' '60' '2.33' '3.84']
 ['31' '20.59' '67' '3.88' '5.19']
 ['32' '39.84' '107.0' '3.4' '3.77']
 ['33' '29.22' '74' '1.09' '6.0900']
 ['34' '33.19' '61' '3.22' '1.74']
 ['35' '38.7' '62' '3.61' '6.52']
 ['36' '30.130' '62' '3.62' '6.83']
 ['37' '21.56' '93' '3.63' '4.43']
 ['38' '38.06' '67' '3.21' '2.41']
 ['39' '37.95' '68' '1.12' '4.56']
 ['40' '35.8' '96' '1.52' '5.54']
 ['41' '29.75' '80' '1.3' '2.23']
 ['42' '32.69' '69' '3.26' '1.43']
 ['43' '33.94' '96' '3.42' '1.4']
 ['44' '36.31' '78' '3.69' '5.34']]

The box chart is to see whether there are abnormal values and how many abnormal values there are
But look at the output pictures, you can see that you have only learned a little fur
The following is a combination of line chart and scatter chart

Posted by katlis on Sat, 23 Oct 2021 07:36:05 -0700

Programmer Group