Continuing with machine learning: regression models in supervised learning
The book uses the fit function in scikit-learn to implement a linear regression model.
Objective: train a regression model on the training data, use it to predict the test data, and analyze the accuracy of the predictions.
Step 1: Load the dataset
Thanks to the all-powerful sklearn, the Boston house price data is available in its datasets module.
import numpy as np
from sklearn import datasets
from sklearn import metrics
from sklearn import model_selection as modsel
from sklearn import linear_model
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this call needs an older scikit-learn release.
boston = datasets.load_boston()
print(dir(boston))
print(boston.data.shape)
print(boston.target.shape)
Step 2: Train the model
First, the dataset is split into training data and test data. Typically, the test data makes up 10% to 30% of the whole dataset.
# Split the dataset into training data and test data.
# Holding out 10% - 30% of the data for testing is generally suitable.
x_train, x_test, y_train, y_test = modsel.train_test_split(
    boston.data, boston.target, test_size=0.1, random_state=42
)
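To make the split less of a black box, here is a minimal sketch of the idea behind it: shuffle the row indices, then hold out a fraction of them for testing. The helper simple_train_test_split and the toy arrays are made up for illustration, and the real train_test_split will not pick the same rows for a given seed.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.1, seed=42):
    """Hypothetical helper: shuffle row indices, then hold out a fraction."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples = X.shape[0]
    # Round the test count up so a small dataset still gets a test row
    n_test = int(np.ceil(n_samples * test_size))
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 20 samples with 2 features each
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

X_tr, X_te, y_tr, y_te = simple_train_test_split(X_demo, y_demo, test_size=0.1)
print(X_tr.shape, X_te.shape)
```

Every sample ends up in exactly one of the two sets, so the model is never scored on rows it was trained on.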
Next, we fit the model with sklearn's fit, then compute the mean squared error (MSE) and the coefficient of determination (R squared) of its predictions on the training data.
linreg = linear_model.LinearRegression()
linreg.fit(x_train, y_train)

# The mean squared error (MSE) is the average squared difference between
# the real house prices and the predicted ones.
# linreg.predict(x_train) returns the predicted values.
print('MSE of predictions on training data:\t'
      + str(metrics.mean_squared_error(y_train, linreg.predict(x_train))))
print('Coefficient of determination (R squared) on training data:\t'
      + str(linreg.score(x_train, y_train)))
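The two numbers printed above are the mean squared error and the coefficient of determination. As a rough sketch of what they measure, both can be computed by hand with numpy, assuming the standard definitions: MSE is the mean of the squared residuals, and R squared is one minus the ratio of the residual sum of squares to the total sum of squares. The helper names mse and r_squared and the demo values are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between truth and prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values, only to exercise the two helpers
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 7.0, 9.0]

print('MSE:', mse(y_true_demo, y_pred_demo))
print('R squared:', r_squared(y_true_demo, y_pred_demo))
```

A perfect prediction gives an MSE of 0 and an R squared of 1. Predicting the mean of the targets for every sample gives an R squared of 0.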
Step 3: Test the model
Use the test data to evaluate the model linreg, and use matplotlib to show how well the predictions fit the true values.
# Step 3: test the model
y_pred = linreg.predict(x_test)
print('MSE of predictions on test data:\t'
      + str(metrics.mean_squared_error(y_test, y_pred)))

# Plot the true values of the test data alongside the model's predictions
plt.figure(figsize=(10, 6))
plt.plot(y_test, linewidth=3, label='truth')
plt.plot(y_pred, linewidth=3, label='predict')
plt.legend(loc='best')
plt.xlabel('data_points')
plt.ylabel('target_value')
plt.show()
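For intuition about what fit and predict do for a linear model, the sketch below solves an ordinary least squares problem directly with numpy and then predicts with a dot product plus an intercept. It assumes plain ordinary least squares on synthetic, noise-free data. The helpers fit_ols and predict_ols are hypothetical names, not part of sklearn, and this is not a claim about how LinearRegression is implemented internally.

```python
import numpy as np

# Synthetic, noise-free data generated from known coefficients
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y_demo = X_demo @ true_coef + true_intercept

def fit_ols(X, y):
    """Hypothetical helper: least squares fit with an intercept term."""
    # Prepend a column of ones so the first parameter acts as the intercept
    X_aug = np.column_stack([np.ones(X.shape[0]), X])
    params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return params[0], params[1:]

def predict_ols(X, intercept, coef):
    """A linear model predicts with a dot product plus the intercept."""
    return X @ coef + intercept

intercept, coef = fit_ols(X_demo, y_demo)
y_hat = predict_ols(X_demo, intercept, coef)
print('intercept:', intercept)
print('coefficients:', coef)
```

Because the synthetic targets contain no noise, the recovered intercept and coefficients match the ones used to generate the data up to floating point error. On real data such as the house prices, the fit only approximates the targets.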
Step 4: Further display the fit of the trained model
Use matplotlib to plot the predictions in another form that shows the fit more clearly.
# Step 4: plot predicted values against true values to further display the model's fit
plt.plot(y_test, y_pred, 'o')
plt.plot([-10, 60], [-10, 60], 'k--')  # diagonal: perfect predictions lie on this line
plt.axis([-10, 60, -10, 60])
plt.xlabel('truth')
plt.ylabel('predict')

# Add text showing the R squared score and the MSE
scorestr = r'R$^2$ = %.3f' % linreg.score(x_test, y_test)
errstr = 'MSE = %.3f' % metrics.mean_squared_error(y_test, y_pred)
plt.text(-5, 50, scorestr, fontsize=12)
plt.text(-5, 45, errstr, fontsize=12)
plt.show()