# A first look at supervised learning, part 2: using linear regression to predict house prices in Boston (regression model)

Continuing with machine learning: this post covers regression models in supervised learning.

The book uses the fit method in scikit-learn to build a linear regression model.

Objective: train a regression model on the training data, use it to predict the test data, and evaluate how accurate the predictions are.

Step 1: load the data

Thanks to the ever-capable sklearn, the Boston house price data is included in its datasets module.

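```python
import numpy as np
from sklearn import datasets
from sklearn import metrics
from sklearn import model_selection as modsel
from sklearn import linear_model
import matplotlib.pyplot as plt
plt.style.use('ggplot')

# Load the Boston house price data.
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older version.
boston = datasets.load_boston()

# Inspect the available attributes and the shapes of the features and targets
print(dir(boston))
print(boston.data.shape)
print(boston.target.shape)
```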
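The Boston loader is not available in every scikit-learn release (it was removed in version 1.2). If it is missing in your installation, one possible way to still follow the remaining steps is a synthetic stand-in with the same shape as the Boston data (506 samples, 13 features). The sketch below is only an assumption-laden substitute: the name `boston_like` is hypothetical, and the generated values are not real house prices, so the scores will differ from the ones in the book.

```python
from sklearn.datasets import make_regression
from sklearn.utils import Bunch

# Hypothetical stand-in: synthetic regression data with the same shape as the
# Boston data (506 samples, 13 features). The values are NOT real house prices.
X, y = make_regression(n_samples=506, n_features=13, noise=5.0, random_state=42)

# Wrap the arrays so they can be accessed as .data and .target, like a sklearn dataset
boston_like = Bunch(data=X, target=y)

print(boston_like.data.shape)    # (506, 13)
print(boston_like.target.shape)  # (506,)
```

With this in place, `boston_like.data` and `boston_like.target` can be used wherever the later steps use `boston.data` and `boston.target`.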

Step 2: Training Model

First, split the dataset into training data and test data. Typically, 10% to 30% of the data is held out for testing.

```python
# Split the dataset into training data and test data; here 10% is held out for testing
x_train, x_test, y_train, y_test = modsel.train_test_split(
    boston.data, boston.target, test_size=0.1,
    random_state=42
)
```
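To make the effect of test_size=0.1 concrete, here is a small sketch that runs the same split on placeholder arrays. It assumes the usual shape of the Boston data (506 samples, 13 features); the zero-filled arrays are only there to show the resulting sizes.

```python
import numpy as np
from sklearn import model_selection as modsel

# Placeholder arrays with the same shape as the Boston data (506 samples, 13 features)
data = np.zeros((506, 13))
target = np.zeros(506)

x_train, x_test, y_train, y_test = modsel.train_test_split(
    data, target, test_size=0.1, random_state=42
)

# The test share is rounded up: ceil(0.1 * 506) = 51, leaving 455 for training
print(x_train.shape, x_test.shape)  # (455, 13) (51, 13)
```

Fixing random_state makes the shuffle reproducible, so the same rows end up in the test set on every run.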

Next, we call fit in sklearn to train the model, then compute the mean squared error and the coefficient of determination (R squared) of its predictions on the training data.
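To show what these two numbers measure, the following sketch computes them by hand with NumPy and compares the results with sklearn. It uses small synthetic data (an assumption made so the example is self-contained), not the Boston data.

```python
import numpy as np
from sklearn import linear_model, metrics

# Synthetic data for illustration only: a linear signal plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=100)

model = linear_model.LinearRegression().fit(X, y)
y_hat = model.predict(X)

# Mean squared error: average of the squared differences between truth and prediction
mse_manual = np.mean((y - y_hat) ** 2)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# Both values should agree with sklearn's own implementations
print(np.isclose(mse_manual, metrics.mean_squared_error(y, y_hat)))
print(np.isclose(r2_manual, model.score(X, y)))
```

A lower mean squared error is better, while an R squared close to 1 means the model explains most of the variation in the target.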

```python
linreg = linear_model.LinearRegression()
linreg.fit(x_train, y_train)
# The mean squared error is the average squared difference between the true house prices and the predictions
# linreg.predict(x_train) returns the predicted values
print('Mean squared error of predictions:\t' +
      str(metrics.mean_squared_error(y_train, linreg.predict(x_train))))
print('Coefficient of determination (R squared) of predictions:\t' + str(linreg.score(x_train, y_train)))
```

Step 3: test model

Evaluate the model linreg on the test data, and use matplotlib to show how well the predictions fit the true values.
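The reason for testing on held-out data is to see whether the error on unseen samples is similar to the error on the training samples. The sketch below illustrates that comparison on synthetic data (an assumption for self-containment; the names `model`, `train_mse` and `test_mse` are illustrative and not from the book).

```python
import numpy as np
from sklearn import linear_model, metrics
from sklearn import model_selection as modsel

# Synthetic data for illustration only: a linear signal plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_coef = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

x_train, x_test, y_train, y_test = modsel.train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = linear_model.LinearRegression().fit(x_train, y_train)

# Error on data the model has seen versus data it has not seen
train_mse = metrics.mean_squared_error(y_train, model.predict(x_train))
test_mse = metrics.mean_squared_error(y_test, model.predict(x_test))
print('train MSE: %.3f, test MSE: %.3f' % (train_mse, test_mse))
```

If the test error were much larger than the training error, that would suggest the model does not generalize well to new data.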

```python
# Step 3: test model
y_pred = linreg.predict(x_test)
print('Mean squared error of predictions on test data:\t' + str(metrics.mean_squared_error(y_test, y_pred)))
# Plot the true values of the test data together with the model's predictions
plt.figure(figsize=(10, 6))
plt.plot(y_test, linewidth=3, label='truth')
plt.plot(y_pred, linewidth=3, label='predict')
plt.legend(loc='best')
plt.xlabel('data_points')
plt.ylabel('target_value')
plt.show()
```

Step 4: further display the fit of the trained model

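Use matplotlib to plot the same results in a different form, predicted values against true values, which shows the fit more clearly. Points close to the dashed diagonal are predictions close to the truth.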

```python
# Step 4: plot predicted values against true values to further show the fit of the model
plt.plot(y_test, y_pred, 'o')
plt.plot([-10, 60], [-10, 60], 'k--')
plt.axis([-10, 60, -10, 60])
plt.xlabel('truth')
plt.ylabel('predict')

# Add text annotations showing the R squared value and the mean squared error
scorestr = r'R$^2$ = %.3f' % linreg.score(x_test, y_test)
errstr = 'MSE = %.3f' % metrics.mean_squared_error(y_test, y_pred)
plt.text(-5, 50, scorestr, fontsize=12)
plt.text(-5, 45, errstr, fontsize=12)
