Kernel Support Vector Machine
The important parameters of a kernel SVM are the regularization parameter C, the choice of kernel, and the kernel-specific parameters (for the RBF kernel used below, gamma); a minimal sketch of how these are passed to scikit-learn follows the list below.
- It works well on both low-dimensional and high-dimensional data.
- However, it does not scale well with the number of samples.
- Preprocessing the data and tuning the parameters require great care.
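As a minimal sketch of where these knobs live in scikit-learn (the parameter values below are arbitrary and chosen only for illustration, not recommendations):

```python
from sklearn.svm import SVC

# The three "knobs" named above map directly onto SVC's constructor arguments
svm = SVC(kernel='rbf',   # choice of kernel ('rbf', 'poly', 'linear', ...)
          C=10,           # regularization parameter
          gamma=0.1)      # kernel-specific parameter of the RBF kernel
```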
In low-dimensional spaces, linear models can be very restrictive, because lines and planes have limited flexibility. Adding more features makes a linear model more flexible.
```python
import mglearn
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np

X, y = make_blobs(centers=4, random_state=8)
y = y % 2  # merge the four clusters into two classes

mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")

# A linear model for classification can only separate this data with a straight line
from sklearn.svm import LinearSVC
linear_svm = LinearSVC().fit(X, y)
mglearn.plots.plot_2d_separator(linear_svm, X)  # visualize the decision boundary

# Add the square of the second feature as a new feature,
# so that every data point becomes a three-dimensional point
X_new = np.hstack([X, X[:, 1:] ** 2])
# print(X_new)

from mpl_toolkits.mplot3d import Axes3D, axes3d
figure = plt.figure()
# 3D visualization
ax = Axes3D(figure, elev=-152, azim=-26)
# first plot all points with y == 0, then all points with y == 1
mask = y == 0  # mask is True for the points belonging to class 0
ax.scatter(X_new[mask, 0], X_new[mask, 1], X_new[mask, 2],
           c='b', cmap=mglearn.cm2, s=60)
ax.scatter(X_new[~mask, 0], X_new[~mask, 1], X_new[~mask, 2],
           c='r', marker='^', cmap=mglearn.cm2, s=60)
ax.set_xlabel("feature0")
ax.set_ylabel("feature1")
ax.set_zlabel("feature1 ** 2")

# Now a linear model can separate the two classes
linear_svm_3d = LinearSVC().fit(X_new, y)
coef, intercept = linear_svm_3d.coef_.ravel(), linear_svm_3d.intercept_

# Show the linear decision boundary as a plane
# print(X_new[:, 0].min() - 2)
# print(X_new[:, 0].max() + 2)
xx = np.linspace(X_new[:, 0].min() - 2, X_new[:, 0].max() + 2, 50)
yy = np.linspace(X_new[:, 1].min() - 2, X_new[:, 1].max() + 2, 50)
XX, YY = np.meshgrid(xx, yy)
ZZ = (coef[0] * XX + coef[1] * YY + intercept) / -coef[2]
ax.plot_surface(XX, YY, ZZ, rstride=8, cstride=8, alpha=0.3)

# Viewed as a function of the original features, the linear SVM is no longer linear
fig = plt.figure()
ax = fig.add_subplot(111)
ZZ = YY ** 2
dec = linear_svm_3d.decision_function(np.c_[XX.ravel(), YY.ravel(), ZZ.ravel()])
plt.contourf(XX, YY, dec.reshape(XX.shape),
             levels=[dec.min(), 0, dec.max()], cmap=mglearn.cm2, alpha=0.5)
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
```
The Kernel Trick
The kernel trick lets us learn a classifier in a higher-dimensional space without ever computing the new, possibly very large, data representation. The idea is to compute the distances (more precisely, the scalar products) between data points directly in the expanded feature representation, without ever actually computing the expansion.
Two common ways in which support vector machines map data into a higher-dimensional space:
- The polynomial kernel computes all possible polynomials of the original features up to a certain degree (a tiny numeric sketch follows this list).
- The radial basis function (RBF) kernel, also known as the Gaussian kernel, corresponds to all possible polynomials of all degrees, but the importance of a feature decreases with increasing degree.
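As a tiny numeric sketch of the kernel trick (my own toy example, not taken from the datasets used below): for a degree-2 polynomial kernel in two dimensions, evaluating the kernel in the original space gives exactly the scalar product in the explicitly expanded feature space, without ever building that expansion.

```python
import numpy as np

# For k(x, z) = (x . z + 1)^2 on 2D inputs, the matching explicit feature map is
#   phi(x) = [1, sqrt(2)*x1, sqrt(2)*x2, x1^2, sqrt(2)*x1*x2, x2^2]
def phi(x):
    x1, x2 = x
    return np.array([1,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     x1 ** 2,
                     np.sqrt(2) * x1 * x2,
                     x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = (x @ z + 1) ** 2   # computed directly in the original 2D space
explicit_value = phi(x) @ phi(z)  # scalar product in the expanded 6D space

print(kernel_value, explicit_value)  # both equal 4 (up to floating-point rounding)
```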
Support vectors: the training points that lie on the boundary between the classes. During training, the SVM learns how important each training data point is for representing the decision boundary between the two classes; typically only this subset of points matters.
To make a prediction for a new point, its distance to each support vector is measured. The classification decision is based on these distances and on the importance of the support vectors learned during training. The distance between data points is measured by the kernel, for example the Gaussian (RBF) kernel.
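As a short sketch of the Gaussian (RBF) kernel used as this distance measure, the formula k_rbf(x1, x2) = exp(-gamma * ||x1 - x2||^2) is written out by hand and compared against scikit-learn's rbf_kernel (the two example points are arbitrary):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# k_rbf(x1, x2) = exp(-gamma * ||x1 - x2||^2)
def gaussian_kernel(x1, x2, gamma=0.1):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

x1 = np.array([1.0, 2.0])  # arbitrary example points
x2 = np.array([2.0, 0.0])
print(gaussian_kernel(x1, x2))
print(rbf_kernel(x1.reshape(1, -1), x2.reshape(1, -1), gamma=0.1)[0, 0])  # same value
```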
```python
# Train an SVM on the forge-style handcrafted dataset
from sklearn.svm import SVC
import mglearn
import matplotlib.pyplot as plt

X, y = mglearn.tools.make_handcrafted_dataset()
# The kernel is the Gaussian (RBF) kernel. C is the regularization parameter and
# limits the importance of each point; gamma controls the width of the Gaussian
# kernel and therefore what counts as "close" between points.
svm = SVC(kernel='rbf', C=10, gamma=0.1).fit(X, y)
mglearn.plots.plot_2d_separator(svm, X, eps=.5)  # visualize the decision boundary
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)    # plot the data points

sv = svm.support_vectors_  # the support vectors
# print(sv)
# the class labels of the support vectors are given by the sign of dual_coef_
sv_labels = svm.dual_coef_.ravel() > 0
# print(sv_labels)
mglearn.discrete_scatter(sv[:, 0], sv[:, 1], sv_labels,
                         s=15, markeredgewidth=3)  # s: marker size, markeredgewidth: edge width
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")

# A small gamma means a large Gaussian-kernel radius, so many points count as close:
# the decision boundary varies slowly and the resulting model has low complexity.
# A small C means a strongly restricted model in which each point has only limited influence.
fig, axes = plt.subplots(3, 3, figsize=(15, 10))
for ax, C in zip(axes, [-1, 0, 3]):
    for a, gamma in zip(ax, range(-1, 2)):
        mglearn.plots.plot_svm(log_C=C, log_gamma=gamma, ax=a)
axes[0, 0].legend(['class 0', 'class 1', 'sv class 0', 'sv class 1'],
                  ncol=4, loc=(.9, 1.2))
```
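To make the role of the support vectors and their learned importance concrete, here is a small sketch (my own illustration, assuming a binary SVC like the one above) that rebuilds decision_function from support_vectors_, dual_coef_, and intercept_ using the same Gaussian kernel:

```python
import numpy as np
import mglearn
from sklearn.svm import SVC

# Re-fit the same model as above so this snippet runs on its own
X, y = mglearn.tools.make_handcrafted_dataset()
svm = SVC(kernel='rbf', C=10, gamma=0.1).fit(X, y)

def manual_decision_function(model, x, gamma=0.1):
    # Gaussian kernel between x and every support vector
    kernels = np.exp(-gamma * np.sum((model.support_vectors_ - x) ** 2, axis=1))
    # weight each kernel value by the learned importance (dual_coef_) and add the bias
    return np.dot(model.dual_coef_.ravel(), kernels) + model.intercept_[0]

x_new = X[0]
print(manual_decision_function(svm, x_new))
print(svm.decision_function(x_new.reshape(1, -1))[0])  # should agree
```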
```python
# Apply an RBF-kernel SVM to the breast cancer dataset
# (by default C=1 and gamma=1/n_features)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import matplotlib.pyplot as plt

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
svc = SVC()
svc.fit(X_train, y_train)
# SVMs are very sensitive to parameter settings and to the scaling of the data:
# they require all features to vary on a similar scale.
print("Accuracy on training set: {:.3f}".format(svc.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(svc.score(X_test, y_test)))

# Look at the minimum and maximum of each feature, plotted on a log scale
plt.plot(X_train.min(axis=0), 'o', label='min')
plt.plot(X_train.max(axis=0), '^', label='max')  # axis=0: per column (feature)
plt.legend(loc=4)
plt.xlabel('Feature index')
plt.ylabel('Feature magnitude')
plt.yscale('log')  # logarithmic y-axis
# The features of the breast cancer dataset have completely different orders of
# magnitude, so rescale the data so that all features lie roughly between 0 and 1.
min_on_training = X_train.min(axis=0)  # minimum of each feature on the training set
# range of each feature on the training set
range_on_training = (X_train - min_on_training).max(axis=0)
# subtract the minimum and divide by the range
X_train_scaled = (X_train - min_on_training) / range_on_training
print("Minimum for each feature\n{}".format(X_train_scaled.min(axis=0)))
print("Maximum for each feature\n{}".format(X_train_scaled.max(axis=0)))

# Scale the test set in the same way (using the training-set statistics) and refit
X_test_scaled = (X_test - min_on_training) / range_on_training
svc = SVC()
svc.fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.3f}".format(svc.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(svc.score(X_test_scaled, y_test)))
# Training and test accuracy are now very close but well below 100%, so the model
# is probably underfitting; try increasing C or gamma.
```
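The same rescaling to the [0, 1] range can also be done with scikit-learn's MinMaxScaler instead of the manual computation above; this is a small sketch under the assumption that the train/test split from the previous block is still in scope:

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Fit the scaler on the training set only, then apply it to both sets
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

svc = SVC().fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.3f}".format(svc.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(svc.score(X_test_scaled, y_test)))
```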
Questions Raised by the Code
np.vstack() and np.hstack()
https://blog.csdn.net/m0_37393514/article/details/79538748
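A two-line illustration of the difference (my own toy arrays): np.hstack stacks arrays column-wise, which is how X_new adds the squared feature above, while np.vstack stacks them row-wise.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.hstack([a, b]))  # shape (2, 4): columns are concatenated
print(np.vstack([a, b]))  # shape (4, 2): rows are concatenated
```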
meshgrid function in numpy
https://blog.csdn.net/sinat_29957455/article/details/78825945
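A quick illustration of what np.meshgrid produces (toy values): it turns two 1-D coordinate vectors into the 2-D coordinate matrices XX and YY used above to evaluate the decision surface on a grid.

```python
import numpy as np

xx = np.array([0, 1, 2])
yy = np.array([10, 20])
XX, YY = np.meshgrid(xx, yy)
print(XX)  # [[0 1 2], [0 1 2]]            x-coordinate of every grid point
print(YY)  # [[10 10 10], [20 20 20]]      y-coordinate of every grid point
```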
SVM (support vector machine)
https://blog.csdn.net/weixin_37947156/article/details/76578261
Basics of plotting with matplotlib
https://blog.csdn.net/pipisorry/article/details/37742423