Review of Previous Posts
100 Days of Machine Learning | Day 1: Data Preprocessing
100 Days of Machine Learning | Day 2: Simple Linear Regression
100 Days of Machine Learning | Day 3: Multivariate Linear Regression
100 Days of Machine Learning | Day 4-6: Logistic Regression
100 Days of Machine Learning | Day 7: K-NN
100 Days of Machine Learning | Day 8: The Mathematics of Logistic Regression
100 Days of Machine Learning | Day 9-12: Support Vector Machines
100 Days of Machine Learning | Day 11: Implementing K-NN
100 Days of Machine Learning | Day 13-14: SVM Implementation
100 Days of Machine Learning | Day 15: Naive Bayes
100 Days of Machine Learning | Day 16: Implementing SVM with Kernel Tricks
First, let's talk about how familiar we need to be with the SVM algorithm. Here we quote a Weibo post by July, the founder of July Online.
Once you understand SVM deeply enough, you can derive the relevant formulas from beginning to end: the initial classification function, maximizing the classification margin (max 1/||w||, i.e. min (1/2)||w||^2), convex quadratic programming, the Lagrangian, the transformation into the dual problem, and the SMO algorithm for finding the optimal solution, that is, the optimal separating hyperplane. Working through it step by step, asking why each step holds, you can trace where everything comes from and finally master it.
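For reference, here is a compact sketch of that chain in standard notation (my own summary, not part of the quoted post): the soft-margin primal problem and the dual problem that SMO actually optimizes.

\min_{w,b,\xi}\; \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1-\xi_i,\;\; \xi_i \ge 0

\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\, y_i y_j\, K(x_i,x_j) \quad\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\;\; 0 \le \alpha_i \le C

SMO solves the dual by repeatedly picking a pair (\alpha_i, \alpha_j) and optimizing it analytically while holding the other multipliers fixed, which is exactly what the implementation at the end of this article does.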
sklearn.svm
scikit-learn includes many common algorithms, and its interface is highly consistent: every machine learning method is called in the same way. Each algorithm is a class that provides fit(), predict() and many other methods; we only need to pass in the training samples and labels, plus any model parameters, and the classification results come out directly.
To sum up, the workflow boils down to four steps: import, build the model, train, predict.
Let's start with a small example and then go into detail.
import numpy as np
from sklearn.svm import NuSVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

clf = NuSVC()
clf.fit(X, y)
print(clf.fit(X, y))
# NuSVC(cache_size=200, class_weight=None, coef0=0.0,
#       decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
#       max_iter=-1, nu=0.5, probability=False, random_state=None,
#       shrinking=True, tol=0.001, verbose=False)

print(clf.predict([[-0.8, -1]]))
# [1]
For more examples, you can head over to the scikit-learn website:
https://scikit-learn.org/stab...
The SVM algorithms in scikit-learn fall into two categories.
One is the classification algorithms: SVC, NuSVC and LinearSVC.
The other is the regression algorithms: SVR, NuSVR and LinearSVR.
The related classes are wrapped in the sklearn.svm module.
Among the three classification classes, SVC and NuSVC are similar; the difference lies only in how the loss is measured. LinearSVC, as its name suggests, is a purely linear classifier: it does not support the kernels that map from low to high dimensions, only the linear kernel, so it cannot be used on linearly inseparable data.
Similarly, among the three regression classes, SVR and NuSVR are alike, differing only in how the loss is measured, while LinearSVR is linear regression and can only use the linear kernel.
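As a quick illustration of the regression side (a minimal sketch of my own, not from the original article; the toy data are made up), SVR with the default RBF kernel and LinearSVR are called in exactly the same import-model-train-predict pattern as the classifiers:

import numpy as np
from sklearn.svm import SVR, LinearSVR

# Toy 1-D regression data: roughly y = 2x
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.1, 2.0, 4.1, 5.9, 8.0])

rbf_reg = SVR(kernel='rbf', C=10.0).fit(X, y)           # supports non-linear kernels
lin_reg = LinearSVR(C=10.0, max_iter=10000).fit(X, y)   # linear model only

print(rbf_reg.predict([[2.5]]))
print(lin_reg.predict([[2.5]]))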
Here we only walk through the detailed usage of SVC. For NuSVC and LinearSVC, I suggest you take a look at the comparison table compiled by Liu Jianping (Pinard@cnblogs):
https://www.cnblogs.com/pinar...
The SVC class has 14 parameters (a usage sketch follows the list):
Explanation of SVC parameters
(1) C: the penalty coefficient of the objective function, which balances the width of the classification margin against misclassified samples; default C = 1.0.
(2) kernel: one of 'linear', 'poly', 'rbf', 'sigmoid'; the default is 'rbf'.
(3) degree: only effective when 'poly' is chosen in (2); it is the degree (highest power) of the polynomial.
(4) gamma: the coefficient of the kernel function (for 'poly', 'rbf' and 'sigmoid'); the default is gamma = 1/n_features.
(5) coef0: the independent term of the kernel function; only used by 'poly' and 'sigmoid'.
(6) probability: whether to enable probability estimates (True or False).
(7) shrinking: whether to use the shrinking heuristic.
(8) tol (default = 1e-3): tolerance for the stopping criterion.
(9) cache_size: the size of the kernel cache used during training (in MB).
(10) class_weight: the weight of each class. Different classes can be given different penalty parameters C; by default every class has weight 1.
(11) verbose: whether to enable verbose output (may not work properly with multithreading).
(12) max_iter: maximum number of iterations; default = -1, meaning no limit.
(13) decision_function_shape: 'ovo' (one-vs-one) or 'ovr' (one-vs-rest); recent versions default to 'ovr'.
(14) random_state: the seed of the pseudo-random number generator used for shuffling the data in probability estimation.
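To connect the list above to code, here is a minimal sketch (my own; the parameter values are purely illustrative), with the numbers in the comments referring to the list:

import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])   # same toy data as the NuSVC example
y = np.array([1, 1, 2, 2])

clf = SVC(C=1.0,                 # (1) penalty coefficient
          kernel='rbf',          # (2) kernel type
          degree=3,              # (3) only used by 'poly'
          gamma=0.5,             # (4) kernel coefficient
          coef0=0.0,             # (5) only used by 'poly' and 'sigmoid'
          probability=False,     # (6) probability estimates off
          shrinking=True,        # (7) shrinking heuristic
          tol=1e-3,              # (8) stopping tolerance
          cache_size=200,        # (9) kernel cache size in MB
          class_weight=None,     # (10) per-class penalty weights
          verbose=False,         # (11) verbose output
          max_iter=-1,           # (12) -1 means no iteration limit
          decision_function_shape='ovr',   # (13) multi-class strategy
          random_state=None)     # (14) seed used for probability estimation
clf.fit(X, y)
print(clf.predict([[-0.8, -1]]))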
How to Select Kernel Functions
1) The linear kernel is K(x, z) = x·z, i.e. the ordinary inner product. LinearSVC and LinearSVR can use only this kernel.
2) The polynomial kernel is one of the kernels commonly used when the data are not linearly separable. Its expression is K(x, z) = (γ x·z + r)^d, where γ, r and d all have to be set by hand, which is somewhat troublesome.
3) The Gaussian kernel, called the Radial Basis Function (RBF) in the SVM context, is the default kernel of libsvm and, of course, of scikit-learn as well. Its expression is K(x, z) = exp(−γ ||x − z||^2), where γ > 0 has to be set by hand.
4) The sigmoid kernel is also one of the kernels commonly used when the data are not linearly separable. Its expression is K(x, z) = tanh(γ x·z + r), where γ and r have to be set by hand.
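As a quick sanity check on these expressions (my own sketch, not from the article), the helpers in sklearn.metrics.pairwise compute exactly the same quantities as a hand-written version:

import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

x = np.array([[1.0, 2.0]])
z = np.array([[0.5, -1.0]])
gamma, r, d = 0.5, 1.0, 3

# Linear: K(x, z) = x . z
print(linear_kernel(x, z), x @ z.T)
# Polynomial: K(x, z) = (gamma * x . z + r)^d
print(polynomial_kernel(x, z, degree=d, gamma=gamma, coef0=r),
      (gamma * (x @ z.T) + r) ** d)
# RBF: K(x, z) = exp(-gamma * ||x - z||^2)
print(rbf_kernel(x, z, gamma=gamma),
      np.exp(-gamma * np.sum((x - z) ** 2)))
# Sigmoid: K(x, z) = tanh(gamma * x . z + r)
print(sigmoid_kernel(x, z, gamma=gamma, coef0=r),
      np.tanh(gamma * (x @ z.T) + r))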
The most commonly used kernels are the linear kernel and RBF; note that the data should be normalized first.
1. Linear: mainly used when the data are linearly separable. It has few parameters and is fast, and for ordinary data the classification results are already quite good.
2. RBF: mainly used when the data are not linearly separable. It has more parameters, and the classification results depend heavily on how they are set.
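For the normalization mentioned above, a common pattern (a minimal sketch of my own, not from the article; the data are made up) is to chain StandardScaler and SVC in a Pipeline so that scaling is fitted on the training data only:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up data with features on very different scales
X = np.array([[1, 1000], [2, 1100], [1, 900], [8, 5000], [9, 5200], [10, 4800]])
y = np.array([1, 1, 1, -1, -1, -1])

# The RBF kernel depends on squared distances, so without scaling the
# second feature would dominate the kernel entirely
model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
model.fit(X, y)
print(model.predict([[3, 1200]]))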
Andrew Ng has also given guidelines for choosing the kernel function.
1. If the number of features is large, comparable to the number of samples, choose logistic regression or an SVM with the linear kernel.
2. If the number of features is fairly small and the number of samples is moderate, neither too large nor too small, choose an SVM with the Gaussian kernel.
3. If the number of features is small and the number of samples is large, manually add some features to reduce the problem to the first case.
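These rules of thumb can be written down as a tiny helper (purely illustrative; the threshold for a "moderate" sample size is my own choice):

def choose_model(n_features, n_samples):
    """Rough heuristic following the three cases above; thresholds are illustrative."""
    if n_features >= n_samples:
        return "logistic regression or SVM with a linear kernel"
    if n_samples <= 10_000:        # 'moderate' number of samples
        return "SVM with a Gaussian (RBF) kernel"
    return "add features manually, then use LR or a linear-kernel SVM"

print(choose_model(n_features=5000, n_samples=3000))
print(choose_model(n_features=50, n_samples=2000))
print(choose_model(n_features=50, n_samples=1_000_000))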
Implementing SVM with the Kernel Trick
The main ideas and the flow of the algorithm come from Li Hang's Statistical Learning Methods and the previously recommended "Understanding SVM at Three Levels" (PDF at the end of the article).
#coding=utf-8
import time
import random
import numpy as np
import math
import copy

a=np.matrix([[1.2,3.1,3.1]])
#print a.astype(int)
#print a.A

class SVM:
    def __init__(self,data,kernel,maxIter,C,epsilon):
        self.trainData=data
        self.C=C  #penalty factor
        self.kernel=kernel
        self.maxIter=maxIter
        self.epsilon=epsilon
        self.a=[0 for i in range(len(self.trainData))]
        self.w=[0 for i in range(len(self.trainData[0][0]))]
        self.eCache=[[0,0] for i in range(len(self.trainData))]
        self.b=0
        self.xL=[self.trainData[i][0] for i in range(len(self.trainData))]
        self.yL=[self.trainData[i][1] for i in range(len(self.trainData))]

    def train(self):
        #support_Vector=self.__SMO()
        self.__SMO()
        self.__update()

    def __kernel(self,A,B):
        #The kernel function maps the input vectors from a low-dimensional to a high-dimensional space
        res=0
        if self.kernel=='Line':
            res=self.__Tdot(A,B)
        elif self.kernel[0]=='Gauss':
            K=0
            for m in range(len(A)):
                K+=(A[m]-B[m])**2
            res=math.exp(-0.5*K/(self.kernel[1]**2))
        return res

    def __Tdot(self,A,B):
        #Plain inner product of two vectors
        res=0
        for k in range(len(A)):
            res+=A[k]*B[k]
        return res

    def __SMO(self):
        #SMO, the core algorithm of SVM training: iterative optimization based on the KKT conditions
        support_Vector=[]
        self.a=[0 for i in range(len(self.trainData))]
        pre_a=copy.deepcopy(self.a)
        for it in range(self.maxIter):
            flag=1
            for i in range(len(self.xL)):
                #print self.a
                #Update the pair (a[i], a[j]); recompute w and b first
                diff=0
                self.__update()
                #Choose j with the largest |Ei - Ej| from the error cache; picking j at random
                #is clearly inefficient (the heuristic search used in Machine Learning in Action)
                Ei=self.__calE(self.xL[i],self.yL[i])
                j,Ej=self.__chooseJ(i,Ei)
                #Compute the box bounds L, H
                (L,H)=self.__calLH(pre_a,j,i)
                #Express the objective as a function of a[j] alone and set its first derivative to zero
                kij=self.__kernel(self.xL[i],self.xL[i])+self.__kernel(self.xL[j],self.xL[j])-2*self.__kernel(self.xL[i],self.xL[j])
                #print kij,"aa"
                if(kij==0):
                    continue
                self.a[j] = pre_a[j] + float(1.0*self.yL[j]*(Ei-Ej))/kij
                #Clip a[j] to the box [L, H]: below L it becomes L, above H it becomes H
                self.a[j] = min(self.a[j], H)
                self.a[j] = max(self.a[j], L)
                #print L,H
                self.eCache[j]=[1,self.__calE(self.xL[j],self.yL[j])]
                self.a[i] = pre_a[i]+self.yL[i]*self.yL[j]*(pre_a[j]-self.a[j])
                self.eCache[i]=[1,self.__calE(self.xL[i],self.yL[i])]
                diff=sum([abs(pre_a[m]-self.a[m]) for m in range(len(self.a))])
                #print diff,pre_a,self.a
                if diff < self.epsilon:
                    flag=0
                pre_a=copy.deepcopy(self.a)
            if flag==0:
                print (it,"break")
                break
        #return support_Vector

    def __chooseJ(self,i,Ei):
        self.eCache[i]=[1,Ei]
        chooseList=[]
        #print self.eCache
        #Build the candidate list of j from the error cache (solves the initial-selection problem)
        for p in range(len(self.eCache)):
            if self.eCache[p][0]!=0 and p!=i:
                chooseList.append(p)
        if len(chooseList)>1:
            delta_E=0
            maxE=0
            j=0
            Ej=0
            for k in chooseList:
                Ek=self.__calE(self.xL[k],self.yL[k])
                delta_E=abs(Ek-Ei)
                if delta_E>maxE:
                    maxE=delta_E
                    j=k
                    Ej=Ek
            return j,Ej
        else:
            #Initial state: pick j at random
            j=self.__randJ(i)
            Ej=self.__calE(self.xL[j],self.yL[j])
            return j,Ej

    def __randJ(self,i):
        j=i
        while(j==i):
            j=random.randint(0,len(self.xL)-1)
        return j

    def __calLH(self,pre_a,j,i):
        if(self.yL[j]!= self.yL[i]):
            return (max(0,pre_a[j]-pre_a[i]),min(self.C,self.C-pre_a[i]+pre_a[j]))
        else:
            return (max(0,-self.C+pre_a[i]+pre_a[j]),min(self.C,pre_a[i]+pre_a[j]))

    def __calE(self,x,y):
        #print x,y
        y_,q=self.predict(x)
        return y_-y

    def __calW(self):
        self.w=[0 for i in range(len(self.trainData[0][0]))]
        for i in range(len(self.trainData)):
            for j in range(len(self.w)):
                self.w[j]+=self.a[i]*self.yL[i]*self.xL[i][j]

    def __update(self):
        #Update self.w and self.b
        self.__calW()
        #With self.w fixed, solve for b
        #print self.a
        maxf1=-99999
        min1=99999
        for k in range(len(self.trainData)):
            y_v=self.__Tdot(self.w,self.xL[k])
            #print y_v
            if self.yL[k]==-1:
                if y_v>maxf1:
                    maxf1=y_v
            else:
                if y_v<min1:
                    min1=y_v
        self.b=-0.5*(maxf1+min1)

    def predict(self,testData):
        pre_value=0
        #Could iterate over the support vectors only instead of all of trainData
        for i in range(len(self.trainData)):
            pre_value+=self.a[i]*self.yL[i]*self.__kernel(self.xL[i],testData)
        pre_value+=self.b
        #print pre_value,"pre_value"
        if pre_value<0:
            y=-1
        else:
            y=1
        return y,abs(pre_value)

    def save(self):
        pass

def LoadSVM():
    pass
Save as SVM.py
from SVM import *

data=[
    [[1,1],1],
    [[2,1],1],
    [[1,0],1],
    [[3,7],-1],
    [[4,8],-1],
    [[4,10],-1],
]

#For a Gaussian kernel pass ['Gauss', standard deviation] instead of 'Line'
svm=SVM(data,'Line',1000,0.02,0.001)
svm.train()   #the results printed below assume the model has been trained

print (svm.predict([4,0]))
# (1, 0.6300000000000001)
print (svm.a)
# [0.02, 0.0, 0.0, 0.0, 0.0, 0.02]
print (svm.w)
# [-0.06, -0.18000000000000002]
print (svm.b)
# 0.8700000000000001
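The comment in the snippet above mentions the Gaussian option; here is a minimal sketch (my own, not from the article) of how the class would be called with it, assuming the code above is saved as SVM.py:

from SVM import SVM

data = [
    [[1, 1], 1], [[2, 1], 1], [[1, 0], 1],
    [[3, 7], -1], [[4, 8], -1], [[4, 10], -1],
]

# kernel = ['Gauss', sigma]: __kernel() then computes exp(-0.5 * ||A - B||^2 / sigma**2)
svm = SVM(data, ['Gauss', 1.0], maxIter=1000, C=0.02, epsilon=0.001)
svm.train()
print(svm.predict([4, 0]))   # returns (label, |decision value|)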
References:
https://www.cnblogs.com/pinar...
http://www.cnblogs.com/tornad...
https://blog.csdn.net/IT_zxl0...
https://cuijiahua.com/blog/20...
https://blog.csdn.net/sinat_3...
Machine Learning in Action, Chapter 6