A preliminary study of supervised learning 2: using linear regression to predict Boston house prices (regression model)

Continuing to learn machine learning: regression models in supervised learning

The book implements the linear regression model with the LinearRegression class in scikit-learn and its fit function

Objective: train a regression model on the training data, use it to predict the test data, and analyze the accuracy of the predictions
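Before touching the real data, it may help to see what fit does on a case where the answer is known. The following is a minimal sketch, not the book's code: the synthetic data, the coefficients (2 and 3) and the intercept (1) are made up for illustration, and the name demo_reg is hypothetical. Only LinearRegression, fit and score come from scikit-learn.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical noise-free data generated from y = 2*x0 + 3*x1 + 1
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

# fit estimates one coefficient per feature plus an intercept
demo_reg = linear_model.LinearRegression()
demo_reg.fit(X, y)

print('coefficients:', demo_reg.coef_)      # should be close to [2, 3]
print('intercept:', demo_reg.intercept_)    # should be close to 1
print('R squared:', demo_reg.score(X, y))   # should be close to 1 on noise-free data
```

On real data such as house prices the relationship is noisy, so the fitted coefficients are only estimates and the R squared value will be below 1.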

Step 1: load the dataset

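Conveniently, sklearn ships the Boston house price data in its datasets module. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older version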

import numpy as np
from sklearn import datasets
from sklearn import metrics
from sklearn import model_selection as modsel
from sklearn import linear_model
import matplotlib.pyplot as plt
plt.style.use('ggplot')

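# Load the dataset and list the attributes of the returned object
boston = datasets.load_boston()
print(dir(boston))
# Feature matrix: 506 samples with 13 features each
print(boston.data.shape)
# Target vector: one house price per sample
print(boston.target.shape)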

Step 2: train the model

First, the dataset is split into training data and test data. Typically the test data makes up 10% to 30% of the whole dataset

# Split the dataset into training data and test data; here 10% is held out for testing
x_train, x_test, y_train, y_test = modsel.train_test_split(
    boston.data, boston.target, test_size=0.1,
    random_state=42
)
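To see what test_size and random_state do in the split above, here is a small sketch on stand-in arrays. It assumes only that the data has 506 rows and 13 columns like the Boston set; the arrays themselves are placeholders and the variable names are made up for illustration.

```python
import numpy as np
from sklearn import model_selection as modsel

# Placeholder arrays with the same shape as the Boston data (506 samples, 13 features)
X = np.zeros((506, 13))
y = np.arange(506)  # sample indices, so we can see which rows land in the test set

X_tr, X_te, y_tr, y_te = modsel.train_test_split(
    X, y, test_size=0.1, random_state=42
)
# test_size=0.1 holds out 10% of the rows, rounded up to a whole sample
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state reproduces the same split
_, _, _, y_te_again = modsel.train_test_split(
    X, y, test_size=0.1, random_state=42
)
print(np.array_equal(y_te, y_te_again))
```

Fixing random_state makes the experiment repeatable; without it, each run shuffles the rows differently and the reported scores change slightly from run to run.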

Next, we use fit in sklearn to train the model, then compute the mean squared error and the coefficient of determination of its predictions on the training data

linreg = linear_model.LinearRegression()
linreg.fit(x_train, y_train)
# The mean squared error (MSE) is the average squared difference between the real house prices and the predictions
# linreg.predict(x_train) returns the predicted values
print('Mean squared error on training data:\t' +
    str(metrics.mean_squared_error(y_train, linreg.predict(x_train))))
print('Coefficient of determination (R squared) on training data:\t' + str(linreg.score(x_train, y_train)))
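The two numbers printed above can also be computed by hand, which makes clear what they measure. The sketch below uses made-up values for y_true and y_hat (both hypothetical) and plain NumPy; it follows the standard definitions, MSE as the mean of the squared errors and R squared as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

# Hypothetical true values and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# Mean squared error: average of the squared differences
mse = np.mean((y_true - y_hat) ** 2)

# Coefficient of determination: 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_hat) ** 2)          # error left after the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the true values around their mean
r2 = 1 - ss_res / ss_tot

print('MSE =', mse)
print('R squared =', r2)
```

An R squared of 1 means the predictions match the true values exactly, while 0 means the model does no better than always predicting the mean of the true values.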

Step 3: test the model

Use the test data to test the model linreg, and use matplotlib to show how well the predictions fit the real values

# Step 3: predict on the test data and report the mean squared error
y_pred = linreg.predict(x_test)
print('Mean squared error on test data:\t' + str(metrics.mean_squared_error(y_test, y_pred)))
# Plot the real values of the test data and the model's predictions as two lines
plt.figure(figsize=(10, 6))
plt.plot(y_test, linewidth=3, label='truth')
plt.plot(y_pred, linewidth=3, label='predict')
plt.legend(loc='best')
plt.xlabel('data_points')
plt.ylabel('target_value')
plt.show()

Step 4: further display the fit of the trained model

Use matplotlib to plot the result in another form that shows the fit more clearly

# Step 4: scatter plot of predicted values against real values, to further display the fit of the model
plt.plot(y_test, y_pred, 'o')
# Dashed diagonal: a perfect prediction would put every point on this line
plt.plot([-10, 60], [-10, 60], 'k--')
plt.axis([-10, 60, -10, 60])
plt.xlabel('truth')
plt.ylabel('predict')

# Add text boxes showing R squared and MSE on the test data
scorestr = r'R$^2$ = %.3f' % linreg.score(x_test, y_test)
errstr = 'MSE = %.3f' % metrics.mean_squared_error(y_test, y_pred)
plt.text(-5, 50, scorestr, fontsize=12)
plt.text(-5, 45, errstr, fontsize=12)
plt.show()
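In the scatter plot above, the vertical distance of each point from the dashed diagonal is the prediction error for that sample. A rough sketch of how those errors could be inspected numerically follows; the arrays truth and predict are made-up values for illustration, not results from the Boston data.

```python
import numpy as np

# Hypothetical real values and predictions, for illustration only
truth = np.array([10.0, 20.0, 30.0, 40.0])
predict = np.array([12.0, 18.0, 33.0, 40.0])

# Residual: signed distance of each point from the diagonal predict == truth
residuals = predict - truth

# Mean residual hints at systematic over- or under-prediction
mean_residual = residuals.mean()

# Index of the sample that lies farthest from the diagonal
worst = int(np.argmax(np.abs(residuals)))

print('residuals:', residuals)
print('mean residual:', mean_residual)
print('worst sample index:', worst, 'error:', residuals[worst])
```

Points above the diagonal have positive residuals (the model predicts too high), and points below it have negative residuals.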

Posted by ibanez270dx on Mon, 02 Dec 2019 19:43:40 -0800