The Hyperopt library provides algorithms and a parallelization scheme for model selection and hyperparameter optimization in Python. Common machine learning models include KNN, SVM, PCA, decision trees, GBDT, and a whole series of other algorithms, but in practice we must choose an appropriate model and then tune its parameters to find a good combination. The tuning stage in particular consumes a great deal of time and effort, yet doing it by hand is inefficient. Looked at from another angle, the choice of model and the parameters to be tuned can be treated as a set of variables, and the model's quality criterion (accuracy, AUC, and so on) can be treated as the objective function. Tuning then becomes a hyperparameter optimization problem, which we can solve with a search algorithm.
Hyperopt exposes an optimization interface that accepts an evaluation function and a parameter space and computes the value of the loss function at points in that space. The user also specifies the distribution of each parameter within the space.
Hyperopt has four important components: the function to be minimized, the search space, a trials database (optional), and the search algorithm (optional).
First, we define an objective function that accepts a variable and returns the computed loss value, for example to minimize q(x, y) = x**2 + y**2:
def q(args):
    x, y = args
    return x ** 2 + y ** 2
Then we define the parameter space, for example x uniform in the range 0-1 and y a normally distributed real number:
from hyperopt import hp
space = [hp.uniform('x', 0, 1), hp.normal('y', 0, 1)]
Third, specify the search algorithm via the algo argument of hyperopt's fmin function. The currently supported algorithms are random search (hyperopt.rand.suggest), simulated annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest). For example:
from hyperopt import hp, fmin, rand, tpe, space_eval
best = fmin(q, space, algo=rand.suggest, max_evals=100)
print space_eval(space, best)
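The trials database mentioned above is optional; a minimal sketch of how it can be passed to fmin so that every evaluation is recorded (the max_evals value here is only illustrative):
from hyperopt import fmin, tpe, Trials
trials = Trials()                              # records parameters, loss and status of every trial
best = fmin(q, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(trials.best_trial['result']['loss'])     # loss of the best evaluation
print(len(trials.trials))                      # number of evaluations performed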
The search algorithm itself has built-in parameters that determine how it optimizes the objective function. We can set these parameters too, for example n_startup_jobs for TPE:
from functools import partial
from hyperopt import hp, fmin, tpe, space_eval
algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(q, space, algo=algo, max_evals=100)
print space_eval(space, best)
To set up a parameter space for, say, the objective function q, we can call fmin(q, space=hp.uniform('a', 0, 1)). The first argument of hp.uniform is the label; every hyperparameter must have a unique label in the parameter space. hp.uniform specifies the distribution of the parameter. Other available parameter distributions include:
hp.choice(label, options) returns one of the options, which should be given as a list or tuple. The options can themselves be nested expressions, which makes it possible to compose conditional parameters (see the sketch after this list).
hp.pchoice(label, p_options) returns one of p_options according to the given probabilities, so the options need not be equally likely during the search.
hp.uniform(label, low, high) draws the parameter uniformly between low and high.
hp.quniform(label, low, high, q) draws round(uniform(low, high) / q) * q, which is suitable for discrete values.
hp.loguniform(label, low, high) draws exp(uniform(low, high)), so the variable ranges over [exp(low), exp(high)].
hp.randint(label, upper) returns a random integer in the half-open interval [0, upper).
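As an illustration of nesting (not taken from the original post; the labels and ranges below are made up), hp.choice can describe conditional parameters, e.g. a choice of classifier where each branch carries its own hyperparameters:
from hyperopt import hp
clf_space = hp.choice('classifier', [
    {'type': 'svm', 'C': hp.loguniform('svm_C', -3, 2)},
    {'type': 'dtree', 'max_depth': 1 + hp.randint('dtree_max_depth', 10)},
])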
A search space can also contain lists, tuples, and dictionaries:
from hyperopt import hp
list_space = [
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1)]
tuple_space = (
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1))
dict_space = {
    'a': hp.uniform('a', 0, 1),
    'b': hp.loguniform('b', 0, 1)}
The sample function can be used to draw samples from a parameter space:
from hyperopt.pyll.stochastic import sample
print sample(list_space)
# => [0.13, 0.235]
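The nested_space used below is not defined in this excerpt; a definition consistent with the sampled output (a reconstruction, not the original code) would be:
nested_space = [
    [{'case': 1, 'a': hp.uniform('a', 0, 1)},
     {'case': 2, 'b': hp.loguniform('b', 0, 1)}],
    'extra_literal_string',
    3]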
print sample(nested_space)
# => [[{'case': 1, 'a': 0.12}, {'case': 2, 'b': 2.3}],
#     'extra_literal_string',
#     3]
Functions can also be used inside a parameter space:
from hyperopt.pyll import scope
def foo(x):
    return str(x) * 3
expr_space = {
    'a': 1 + hp.uniform('a', 0, 1),
    'b': scope.minimum(hp.loguniform('b', 0, 1), 10),
    'c': scope.call(foo, args=(hp.randint('c', 5),)),
}
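A related facility (not shown in the original post) is the scope.define decorator, which registers an ordinary Python function so it can be called directly when building expressions; a minimal sketch:
from hyperopt.pyll import scope
@scope.define
def cube(x):
    return x ** 3
space_with_cube = {'d': scope.cube(hp.uniform('d', 0, 2))}   # 'd' and cube are illustrative names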
The following code, found on a blog, uses a perceptron to classify the iris data. With a learning rate of 0.1, the accuracy on the test set after 40 iterations was 82%. After tuning the parameters with hyperopt, the accuracy rises to 91%.
from sklearn import datasets
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
from sklearn.linear_model import Perceptron
ppn = Perceptron(n_iter=40, eta0=0.1, random_state=0)
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print accuracy_score(y_test, y_pred)
def percept(args):
    global X_train_std, y_train, y_test
    ppn = Perceptron(n_iter=int(args["n_iter"]), eta0=args["eta"]*0.01, random_state=0)
    ppn.fit(X_train_std, y_train)
    y_pred = ppn.predict(X_test_std)
    return -accuracy_score(y_test, y_pred)
from functools import partial
from hyperopt import fmin, tpe, hp
space = {"n_iter":hp.choice("n_iter",range(30,50)),
"eta":hp.uniform("eta",0.05,0.5)}
algo = partial(tpe.suggest,n_startup_jobs=10)
best = fmin(percept,space,algo = algo,max_evals=100)
print best
print percept(best)
#0.822222222222
#{'n_iter': 14, 'eta': 0.12877033763511717}
#-0.911111111111
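Note that for hp.choice, fmin returns the index of the chosen option rather than its value (the n_iter of 14 above is an index into range(30, 50)); space_eval recovers the actual parameter values. A small sketch, not part of the original post:
from hyperopt import space_eval
print(space_eval(space, best))
# e.g. {'n_iter': 44, 'eta': 0.1287...}, where 44 = range(30, 50)[14]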
XGBoost has many parameters. Below, the xgboost model is wrapped in a function that is passed to fmin to optimize the parameters, with the cross-validated AUC as the optimization objective. The larger the AUC the better, and since fmin minimizes, we minimize -AUC. The data set used has 202 columns: the first column is the sample id, the last column is the label, and the middle 200 columns are features.
#coding:utf-8
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import xgboost as xgb
from random import shuffle
from xgboost.sklearn import XGBClassifier
from sklearn.cross_validation import cross_val_score
import pickle
import time
from functools import partial
from hyperopt import fmin, tpe, hp, space_eval, rand, Trials, STATUS_OK
def loadFile(fileName="E://zalei//browsetop200Pca.csv"):
    data = pd.read_csv(fileName, header=None)
    data = data.values
    return data
data = loadFile()
label = data[:,-1]
attrs = data[:,:-1]
labels = label.reshape((1,-1))
label = labels.tolist()[0]
minmaxscaler = MinMaxScaler()
attrs = minmaxscaler.fit_transform(attrs)
index = range(0,len(label))
shuffle(index)
trainIndex = index[:int(len(label)*0.7)]
print len(trainIndex)
testIndex = index[int(len(label)*0.7):]
print len(testIndex)
attr_train = attrs[trainIndex,:]
print attr_train.shape
attr_test = attrs[testIndex,:]
print attr_test.shape
label_train = labels[:,trainIndex].tolist()[0]
print len(label_train)
label_test = labels[:,testIndex].tolist()[0]
print len(label_test)
print np.mat(label_train).reshape((-1,1)).shape
def GBM(argsDict):
    max_depth = argsDict["max_depth"] + 5
    n_estimators = argsDict['n_estimators'] * 5 + 50
    learning_rate = argsDict["learning_rate"] * 0.02 + 0.05
    subsample = argsDict["subsample"] * 0.1 + 0.7
    min_child_weight = argsDict["min_child_weight"] + 1
    print "max_depth:" + str(max_depth)
    print "n_estimators:" + str(n_estimators)
    print "learning_rate:" + str(learning_rate)
    print "subsample:" + str(subsample)
    print "min_child_weight:" + str(min_child_weight)
    global attr_train, label_train
    gbm = xgb.XGBClassifier(nthread=4,                          # number of parallel threads
                            max_depth=max_depth,                # maximum tree depth
                            n_estimators=n_estimators,          # number of trees
                            learning_rate=learning_rate,        # learning rate
                            subsample=subsample,                # subsample ratio of the training instances
                            min_child_weight=min_child_weight,  # minimum sum of instance weight needed in a child
                            max_delta_step=10,                  # maximum delta step for each tree's weight estimation
                            objective="binary:logistic")
    metric = cross_val_score(gbm, attr_train, label_train, cv=5, scoring="roc_auc").mean()
    print metric
    return -metric
space = {"max_depth":hp.randint("max_depth",15),
"n_estimators":hp.randint("n_estimators",10), #[0,1,2,3,4,5] -> [50,]
"learning_rate":hp.randint("learning_rate",6), #[0,1,2,3,4,5] -> 0.05,0.06
"subsample":hp.randint("subsample",4),#[0,1,2,3] -> [0.7,0.8,0.9,1.0]
"min_child_weight":hp.randint("min_child_weight",5), #
}
algo = partial(tpe.suggest,n_startup_jobs=1)
best = fmin(GBM,space,algo=algo,max_evals=4)
print best
print GBM(best)
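With the best parameters found, one might retrain a final model on the training split and check it on the held-out 30%. A minimal sketch (the variable names follow the code above, and the transformations mirror those inside GBM; not part of the original post):
best_params = dict(max_depth=best["max_depth"] + 5,
                   n_estimators=best["n_estimators"] * 5 + 50,
                   learning_rate=best["learning_rate"] * 0.02 + 0.05,
                   subsample=best["subsample"] * 0.1 + 0.7,
                   min_child_weight=best["min_child_weight"] + 1)
final_gbm = xgb.XGBClassifier(nthread=4, objective="binary:logistic", **best_params)
final_gbm.fit(attr_train, np.array(label_train))
print(final_gbm.score(attr_test, np.array(label_test)))   # accuracy on the held-out 30%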
Hyperopt reference material:
Link: http://pan.baidu.com/s/1i5aAXKx  Password: 2gtr