Learning the scikit-learn Library: Kernel Support Vector Machines

Keywords: Big Data

Kernel Support Vector Machine

The important parameters of a kernel SVM are the regularization parameter C, the choice of kernel, and the kernel-specific parameters.

  1. It performs well on both low-dimensional and high-dimensional data.
  2. However, it does not scale well with the number of samples.
  3. Preprocessing the data and tuning the parameters require great care.
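
For orientation, this is roughly how the parameters mentioned above appear when constructing a kernel SVM in scikit-learn (a minimal sketch; the values are placeholders, not recommendations):

from sklearn.svm import SVC
#regularization parameter C, kernel choice, and the kernel-specific parameter gamma
svm = SVC(kernel = 'rbf', C = 1.0, gamma = 0.1)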

Linear models can be very limited in low-dimensional space, because lines and planes have only limited flexibility. Adding more features can make a linear model more flexible.

import mglearn
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np

X,y = make_blobs(centers = 4, random_state = 8)
y = y%2 #Four clusters are divided into two categories
mglearn.discrete_scatter(X[:,0],X[:,1],y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")

#Linear models for classification can only divide data with one straight line
from sklearn.svm import LinearSVC
linear_svm = LinearSVC().fit(X,y)

mglearn.plots.plot_2d_separator(linear_svm,X) #Boundary Visualization

#Add the square of the second feature as a new feature, so that each data point is represented as a three-dimensional point
X_new = np.hstack([X,X[:,1:]**2])
#print(X_new)
from mpl_toolkits.mplot3d import Axes3D,axes3d
figure = plt.figure()
#3D visualization
ax = figure.add_subplot(111, projection = '3d') #Axes3D(figure, ...) is deprecated in newer matplotlib
ax.view_init(elev = -152, azim = -26)
#First draw all y = 0 points, then draw all y = 1 points.
mask = y == 0

#Plot the points with mask == True (class 0) as blue circles, the rest (class 1) as red triangles
ax.scatter(X_new[mask,0],X_new[mask,1],X_new[mask,2], c = 'b', cmap = mglearn.cm2, s = 60)
ax.scatter(X_new[~mask,0],X_new[~mask,1],X_new[~mask,2], c = 'r', marker = '^', cmap = mglearn.cm2, s = 60)

ax.set_xlabel("feature0")
ax.set_ylabel("feature1")
ax.set_zlabel("feature1 ** 2")

#Now we can use a linear model to separate the two categories.
linear_svm_3d = LinearSVC().fit(X_new, y)
coef, intercept = linear_svm_3d.coef_.ravel(), linear_svm_3d.intercept_

#Displaying Linear Decision Boundary
'''
print(X_new[:,0].min() - 2)
print(X_new[:,0].max() + 2)
'''

xx = np.linspace(X_new[:,0].min() - 2, X_new[:,0].max() + 2, 50)
yy = np.linspace(X_new[:,1].min() - 2, X_new[:,1].max() + 2, 50)

XX,YY = np.meshgrid(xx,yy)

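#The decision plane satisfies coef[0]*x + coef[1]*y + coef[2]*z + intercept = 0, so solve for z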
ZZ = (coef[0] * XX + coef[1] * YY + intercept) / -coef[2]
ax.plot_surface(XX,YY,ZZ, rstride = 8, cstride = 8, alpha = 0.3)

#Viewed as a function of the original two features, the linear SVM model is no longer linear: its decision boundary is an ellipse rather than a straight line.
fig = plt.figure()
ax = fig.add_subplot(111)
ZZ = YY ** 2
dec = linear_svm_3d.decision_function(np.c_[XX.ravel(),YY.ravel(),ZZ.ravel()])
plt.contourf(XX, YY, dec.reshape(XX.shape),levels = [dec.min(),0,dec.max()],cmap = mglearn.cm2, alpha = 0.5)
mglearn.discrete_scatter(X[:,0],X[:,1],y)

The Kernel Trick

The kernel trick makes it possible to learn a classifier in a higher-dimensional space without ever computing the new, possibly very large, data representation. The principle is to compute the distance (more precisely, the scalar product) between data points directly in the expanded feature representation, without ever actually computing the expansion.
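
To make this concrete, here is a small sketch (not part of the original text) showing that a degree-2 polynomial kernel returns exactly the inner product of an explicit 6-dimensional feature expansion, without the expansion ever being built:

import numpy as np

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

#Explicit degree-2 feature map: phi(v) = (v1^2, v2^2, sqrt(2)*v1*v2, sqrt(2)*v1, sqrt(2)*v2, 1)
def phi(v):
    return np.array([v[0]**2, v[1]**2, np.sqrt(2)*v[0]*v[1], np.sqrt(2)*v[0], np.sqrt(2)*v[1], 1.0])

explicit = np.dot(phi(x), phi(z))   #inner product in the expanded 6-dimensional space
kernel = (np.dot(x, z) + 1) ** 2    #polynomial kernel evaluated on the original 2-D points
print(explicit, kernel)             #both give 25.0 (up to floating-point rounding)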

Two Common Ways Support Vector Machines Map Data into a Higher-Dimensional Space

  1. The polynomial kernel computes all possible polynomials of the original features up to a certain degree.
  2. The radial basis function (RBF) kernel, also known as the Gaussian kernel, considers all possible polynomials of all degrees, but the importance of the features decreases for higher degrees.

Support vectors: the training points that lie on the boundary between the classes. During training, the SVM learns how important each training data point is for representing the decision boundary between the two classes.

To make a prediction for a new point, the distance between it and each of the support vectors is measured. The classification decision is based on these distances and on the importance of the support vectors learned during training. The distance between data points is measured by the kernel, for example the Gaussian (RBF) kernel.
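
The Gaussian (RBF) kernel is k_rbf(x1, x2) = exp(-gamma * ||x1 - x2||**2), where gamma controls the width of the kernel. A minimal sketch comparing a hand-written version with scikit-learn's rbf_kernel (the sample points and the gamma value here are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[0.0, 1.0]])
x2 = np.array([[1.0, 3.0]])
gamma = 0.1

manual = np.exp(-gamma * np.sum((x1 - x2) ** 2))   #exp(-gamma * squared Euclidean distance)
library = rbf_kernel(x1, x2, gamma = gamma)[0, 0]  #the same similarity value from scikit-learn
print(manual, library)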

#Training an RBF kernel SVM on a small two-class dataset (mglearn's handcrafted dataset)
from sklearn.svm import SVC
import mglearn
import matplotlib.pyplot as plt

X,y = mglearn.tools.make_handcrafted_dataset()
#The kernel is the Gaussian (RBF) kernel. C is the regularization parameter, which limits the importance of each point. gamma controls the width of the Gaussian kernel and therefore how far the influence of a single training point reaches.
svm = SVC(kernel = 'rbf', C = 10, gamma = 0.1).fit(X,y) 

mglearn.plots.plot_2d_separator(svm, X, eps=.5) #Visualization of decision boundary
mglearn.discrete_scatter(X[:,0],X[:,1],y) #Draw points

sv = svm.support_vectors_ #The support vectors are stored in the support_vectors_ attribute
#print(sv)
sv_labels = svm.dual_coef_.ravel() > 0 #The class labels of the support vectors are given by the sign of dual_coef_
#print(sv_labels)
mglearn.discrete_scatter(sv[:,0],sv[:,1],sv_labels, s = 15, markeredgewidth = 3) #s sets the marker size, markeredgewidth the edge width

plt.xlabel("Feature 0")
plt.ylabel("Feature 1")


#A small gamma means a large radius for the Gaussian kernel, so many points are considered close together. A small gamma therefore gives a decision boundary that varies slowly and a model of lower complexity.
#A small C means a very restricted model, in which each data point has only a limited influence.
fig, axes = plt.subplots(3,3,figsize = (15,10))
for ax, C in zip(axes, [-1,0,3]):
    for a, gamma in zip(ax, range(-1,2)):
        mglearn.plots.plot_svm(log_C = C, log_gamma = gamma, ax = a)
        
axes[0,0].legend(['class 0','class 1','sv class 0','sv class 1'],ncol = 4, loc = (.9,1.2))

#RBF kernel SVM applied to the breast cancer dataset. By default C = 1 and gamma = 1/n_features (newer scikit-learn versions default to gamma = 'scale').
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target,random_state = 0)

svc = SVC()
svc.fit(X_train, y_train)

#SVM is very sensitive to the parameter settings and to the scaling of the data. It requires all features to vary on a similar scale.
print("Accuracy on training set:{:.3f}".format(svc.score(X_train,y_train)))
print("Accuracy on test set:{:.3f}".format(svc.score(X_test,y_test)))

#View the minimum and maximum value of each feature, plotted on a logarithmic scale
plt.plot(X_train.min(axis = 0),'o',label = 'min')
plt.plot(X_train.max(axis = 0),'^',label = 'max') #axis = 0 computes the statistic per feature (column)
plt.legend(loc = 4)
plt.xlabel('Feature index')
plt.ylabel('Feature magnitude')
plt.yscale('log') #logarithmic scale for the y axis
#This shows that the features of the breast cancer dataset have completely different orders of magnitude

#Scale the data so that all features lie roughly on the same scale, for example between 0 and 1
min_on_training = X_train.min(axis = 0) #compute the minimum value of each feature on the training set
#Compute the range (max - min) of each feature on the training set
range_on_training = (X_train - min_on_training).max(axis = 0)
#Subtract the minimum and divide by the range, so that each feature lies between 0 and 1
X_train_scaled = (X_train - min_on_training) / range_on_training
print("Minimum for each feature\n{}".format(X_train_scaled.min(axis = 0)))
print("Maximum for each feature\n{}".format(X_train_scaled.max(axis = 0)))


#After scaling, the performance on the training set and the test set is very close, but both are well below 100%, so the model is probably underfitting; try increasing C or gamma.
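
For example, increasing C gives the model more freedom to fit the scaled training data (a sketch; the value C = 1000 is only an illustration):

svc = SVC(C = 1000)
svc.fit(X_train_scaled, y_train)
print("Accuracy on training set:{:.3f}".format(svc.score(X_train_scaled, y_train)))
print("Accuracy on test set:{:.3f}".format(svc.score(X_test_scaled, y_test)))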

Questions about the Code

np.vstack() and np.hstack()
https://blog.csdn.net/m0_37393514/article/details/79538748

meshgrid function in numpy
https://blog.csdn.net/sinat_29957455/article/details/78825945

SVM (support vector machine)
https://blog.csdn.net/weixin_37947156/article/details/76578261

The basics of plotting with matplotlib
https://blog.csdn.net/pipisorry/article/details/37742423
