Deep Learning Practice, Chapter 2: A Quick Start to Machine Learning

Keywords: Python, Machine Learning, Deep Learning

Reference book: Deep Learning Practice by Yang Yun and Du Fei

Softmax implementation exercise

In the exercises in this chapter, we will gradually complete the following tasks:

  • 1. Become familiar with the CIFAR-10 dataset
  • 2. Implement the softmax_loss_naive function, which uses explicit loops to compute the loss function and its gradient
  • 3. Implement the softmax_loss_vectorized function, which uses vectorized expressions to compute the loss function and its gradient
  • 4. Implement mini-batch gradient descent to train the softmax classifier
  • 5. Use validation data to select hyperparameters
#-*- coding: utf-8 -*-
import random
import numpy as np
from utils.data_utils import load_CIFAR10
from classifiers.chapter2 import *
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) 
%load_ext autoreload
%autoreload 2
# Import CIFAR-10 data
cifar10_dir = 'datasets\\cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# View data
print('Training data (number of samples, dimensions): ', X_train.shape)
print('Training labels (number of labels,): ', y_train.shape)
print('Test data (number of samples, dimensions): ', X_test.shape)
print('Test labels (number of labels,): ', y_test.shape)
Training data (number of samples, dimensions):  (50000, 32, 32, 3)
Training labels (number of labels,):  (50000,)
Test data (number of samples, dimensions):  (10000, 32, 32, 3)
Test labels (number of labels,):  (10000,)
# Data visualization
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
# Total categories
num_classes = len(classes)
# Number of samples per category
samples_per_class = 7
for y, cls in enumerate(classes):  # y is the class index (0 to num_classes-1), cls is the class name, e.g. 'plane'
    # numpy.flatnonzero() returns the indices of the non-zero elements of the flattened array
    idxs = np.flatnonzero(y_train == y)  # Indices of all training samples belonging to class y
    idxs = np.random.choice(idxs, samples_per_class, replace=False)  # Randomly pick samples_per_class of them
    for i, idx in enumerate(idxs):  # i is the position among the picked samples, idx is the image's index in the training set
        plt_idx = i * num_classes + y + 1  # Position of this image in the subplot grid
        plt.subplot(samples_per_class, num_classes, plt_idx)  # Select the subplot to draw into
        plt.imshow(X_train[idx].astype('uint8'))  # Draw the image
        plt.axis('off')
        if i == 0:
            plt.title(cls)  # Use the class name as the column title
plt.show()  # Display the figure



1. Data preprocessing

In general, we need to normalize the input data, that is, transform it so that it has zero mean and unit variance. Since image features already lie in the range [0, 255], their variance is already constrained, so we only need to zero-center the data; there is no need to compress it into the range [-1, 1] (although you can also do that).

def get_CIFAR10_data(num_training=49000, num_validation=100, num_test=10000, num_sample=250):
    '''
    Prepare the CIFAR-10 dataset.
    CIFAR-10 contains 60000 color images of size 32*32 spread over ten classes.
    num_training: number of training samples
    num_validation: number of validation samples
    num_test: number of test samples
    num_sample: number of sampled training examples (a small subset for quick experiments)
    '''
    # Import CIFAR-10 data
    cifar10_dir = 'datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # Sampling data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask] 
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    mask = np.random.choice(num_training, num_sample, replace=False)
    X_sample = X_train[mask]
    y_sample = y_train[mask]
    
    # Reshape the data:
    # flatten (width, height, color channels) into a single dimension,
    # giving each array the shape (number of samples, dimensions)
    X_train = np.reshape(X_train,(X_train.shape[0], -1))
    X_val = np.reshape(X_val,(X_val.shape[0], -1))
    X_test = np.reshape(X_test,(X_test.shape[0], -1))
    X_sample = np.reshape(X_sample,(X_sample.shape[0], -1))
    
    # Normalize the data:
    # subtract the mean image computed over the training set
    mean_image = np.mean(X_train, axis = 0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_sample -= mean_image
    
    # Append a bias dimension of ones to each sample
    # np.hstack stacks the arrays in the given tuple horizontally
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_sample = np.hstack([X_sample, np.ones((X_sample.shape[0], 1))])
    
    return X_train, y_train, X_val, y_val, X_test, y_test, X_sample, y_sample

X_train, y_train, X_val, y_val, X_test, y_test, X_sample, y_sample = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('sample data shape: ', X_sample.shape)
print('sample labels shape: ', y_sample.shape)
Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (100, 3073)
Validation labels shape:  (100,)
Test data shape:  (10000, 3073)
Test labels shape:  (10000,)
sample data shape:  (250, 3073)
sample labels shape:  (250,)

2. Calculate the loss function and gradient using explicit loops

Please use as few loops as possible: the fewer loops you use, the more computing time you save. It is worth getting used to vectorized expressions step by step, because vectorized code is not only more concise, more readable, and less prone to errors, but also runs much more efficiently.
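The softmax_loss_naive function is left for you to implement in classifiers/chapter2/softmax_loss.py, so the book's solution is not reproduced here. Purely as a reference, here is a minimal sketch of what an explicit-loop version could look like, assuming the signature softmax_loss_naive(W, X, y, reg) used in the cells below and a loss defined as the average cross-entropy plus the weight decay term 0.5 * reg * np.sum(W * W) (the convention that also shows up in a warning message later in this chapter):

import numpy as np

def softmax_loss_naive(W, X, y, reg):
    # Explicit-loop softmax loss and gradient (reference sketch).
    # W: (D, C) weights, X: (N, D) data, y: (N,) labels, reg: weight decay strength.
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_classes = W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)                    # class scores for sample i
        scores -= np.max(scores)                # shift scores for numerical stability
        exp_scores = np.exp(scores)
        prob = exp_scores / np.sum(exp_scores)  # softmax probabilities
        loss += -np.log(prob[y[i]])             # cross-entropy for sample i
        for j in range(num_classes):
            # Gradient of the cross-entropy with respect to column j of W
            dW[:, j] += (prob[j] - (j == y[i])) * X[i]
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W
    return loss, dW

Correctness comes first; the numerical gradient check below will catch most mistakes in the gradient.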

# First, we will implement the softmax loss function (cost function) with explicit loops
# Open the classifiers/chapter2/softmax_loss.py file and implement the softmax_loss_naive function
# When finished, run this cell
from classifiers.chapter2.softmax_loss import softmax_loss_naive
import time

# Initialize weight
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_sample, y_sample, 0.0)

# Your initial loss value should be close to -log(0.1)
print('Your softmax loss: %f' % loss)
print('Correct loss value: %f' % ( -np.log(0.1) ))
Your softmax loss: 2.322683
Correct loss value: 2.302585
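Why -log(0.1)? With the weights drawn from a tiny random initialization, all class scores are close to zero, so the softmax assigns each of the 10 classes a probability of roughly 1/10. The cross-entropy loss per sample is therefore about -log(1/10) = log 10 ≈ 2.3026, which is why the value you compute should land near 2.30 before any training has happened.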
# Use numerical gradients to test your implementation of softmax_loss_naive
# The analytic gradient you compute should be close to the numerical gradient
from utils.gradient_check import grad_check_sparse
loss, grad = softmax_loss_naive(W, X_sample, y_sample, 0.0)

print('Checking the gradient of softmax_loss_naive without weight decay:')
f = lambda w: softmax_loss_naive(w, X_sample, y_sample, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

print('Checking the gradient of softmax_loss_naive with the weight decay term added:')
loss, grad = softmax_loss_naive(W, X_sample, y_sample, 1e2)
f = lambda w: softmax_loss_naive(w, X_sample, y_sample, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
Checking the gradient of softmax_loss_naive without weight decay:
numerical: -1.643731 analytic: -1.643731, relative error: 1.767333e-08
numerical: -0.903899 analytic: -0.903899, relative error: 4.933728e-08
numerical: -1.871885 analytic: -1.871886, relative error: 1.466657e-08
numerical: 1.315985 analytic: 1.315985, relative error: 2.012378e-08
numerical: 1.398138 analytic: 1.398138, relative error: 2.379984e-08
numerical: 0.787859 analytic: 0.787859, relative error: 3.121790e-08
numerical: -2.592382 analytic: -2.592382, relative error: 2.092500e-08
numerical: -2.854713 analytic: -2.854713, relative error: 7.280248e-09
numerical: 1.232664 analytic: 1.232664, relative error: 2.542693e-08
numerical: 1.706167 analytic: 1.706167, relative error: 3.109060e-08
Checking the gradient of softmax_loss_naive with the weight decay term added:
numerical: -0.176439 analytic: -0.176439, relative error: 3.089932e-07
numerical: -3.940286 analytic: -3.940286, relative error: 7.815151e-09
numerical: -1.458173 analytic: -1.458173, relative error: 3.271034e-09
numerical: -0.478771 analytic: -0.478771, relative error: 9.152370e-09
numerical: -0.668009 analytic: -0.668009, relative error: 1.134638e-07
numerical: -1.660221 analytic: -1.660221, relative error: 1.547873e-08
numerical: 1.744188 analytic: 1.744188, relative error: 2.014491e-08
numerical: -0.485254 analytic: -0.485254, relative error: 5.129877e-08
numerical: -2.515499 analytic: -2.515499, relative error: 2.575850e-09
numerical: 2.403642 analytic: 2.403642, relative error: 1.413098e-08
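The grad_check_sparse helper is provided in utils/gradient_check.py, so its actual code is not shown in this post. Conceptually, it compares the analytic gradient against a central-difference numerical estimate at a handful of randomly chosen entries of W. The following is only an illustrative sketch of that idea; the function name, step size h, and error formula are assumptions, not the library's real implementation:

import numpy as np

def grad_check_sparse_sketch(f, x, analytic_grad, num_checks=10, h=1e-5):
    # Compare the analytic gradient with central-difference estimates
    # at num_checks randomly chosen coordinates of x.
    for _ in range(num_checks):
        ix = tuple(np.random.randint(dim) for dim in x.shape)
        oldval = x[ix]
        x[ix] = oldval + h
        fxph = f(x)               # f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)               # f(x - h)
        x[ix] = oldval            # restore the original value
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic) + 1e-12)
        print('numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error))

Relative errors around 1e-7 or smaller, like those printed above, indicate that the analytic gradient is almost certainly correct.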

3. Calculate the loss function and gradient using vectorized expressions
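Like the naive version, softmax_loss_vectorized is left for you to write. Again purely as a reference, here is a minimal sketch of a fully vectorized version under the same signature and loss convention assumed for the naive sketch above:

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    # Vectorized softmax loss and gradient (reference sketch).
    # W: (D, C) weights, X: (N, D) data, y: (N,) labels, reg: weight decay strength.
    num_train = X.shape[0]
    scores = X.dot(W)                                  # (N, C) class scores
    scores -= np.max(scores, axis=1, keepdims=True)    # shift for numerical stability
    exp_scores = np.exp(scores)
    prob = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # (N, C) probabilities
    correct_logprobs = -np.log(prob[np.arange(num_train), y])
    loss = np.sum(correct_logprobs) / num_train + 0.5 * reg * np.sum(W * W)

    dscores = prob.copy()
    dscores[np.arange(num_train), y] -= 1              # gradient of the loss w.r.t. the scores
    dW = X.T.dot(dscores) / num_train + reg * W
    return loss, dW

All loops over samples and classes are replaced by matrix operations, which is where the large speed-up measured below comes from.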

# Now we will implement the vectorized softmax loss and its gradient calculation
# Open the softmax_loss_vectorized function, complete the corresponding task, and run this cell
# The vectorized version should give the same results as the explicit-loop version, but run much faster
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_sample, y_sample, 0.00001)
toc = time.time()
print('Explicit loop version loss: %e   Time spent: %fs' % (loss_naive, toc - tic))

from classifiers.chapter2.softmax_loss import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_sample, y_sample, 0.00001)
toc = time.time()
print('Vectorized version loss: %e    Time spent: %fs' % (loss_vectorized, toc - tic))

# Comparison results
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('Loss error: %f' % np.abs(loss_naive - loss_vectorized))
print('Gradient error: %f' % grad_difference)
Explicit loop version loss: 2.322683e+00   Time spent: 0.101504s
Vectorized version loss: 2.322683e+00    Time spent: 0.001956s
Loss error: 0.000000
Gradient error: 0.000000

4. Train the softmax classifier with mini-batch gradient descent
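The Softmax class in classifiers/chapter2/softmax.py is another exercise, so the code below is only a rough sketch of the training loop it asks for: mini-batch stochastic gradient descent on top of a softmax loss. Only the call signatures softmax.train(X, y, learning_rate, reg, num_iters, batch_size, verbose) and softmax.predict(X) are taken from the cells below; the default arguments, sampling strategy, and logging format are assumptions.

import numpy as np
from classifiers.chapter2.softmax_loss import softmax_loss_vectorized

class Softmax(object):
    # Sketch of a softmax classifier trained with mini-batch SGD.

    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        num_train, dim = X.shape
        num_classes = np.max(y) + 1
        if self.W is None:
            # Small random initialization of the weights
            self.W = 0.001 * np.random.randn(dim, num_classes)

        loss_history = []
        for it in range(num_iters):
            # Sample a mini-batch of data and labels
            batch_idx = np.random.choice(num_train, batch_size, replace=True)
            X_batch, y_batch = X[batch_idx], y[batch_idx]

            # Loss and gradient on the mini-batch, then one gradient step
            loss, grad = softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
            loss_history.append(loss)
            self.W -= learning_rate * grad

            if verbose and it % 500 == 0:
                print('Iteration %d / %d: loss %f' % (it, num_iters, loss))
        return loss_history

    def predict(self, X):
        # Assign each sample the class with the highest score
        return np.argmax(X.dot(self.W), axis=1)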

# Open softmax.train() and complete the mini-batch stochastic gradient descent task, then execute this cell
from classifiers.chapter2.softmax import *
softmax = Softmax()
tic = time.time()
loss_hist = softmax.train(X_sample, y_sample, learning_rate=1e-7, reg=5e4,
                      num_iters=3500, verbose=True)
toc = time.time()
print('Time spent: %fs' % (toc - tic))
Iteration 0 / 3500: loss 777.592212
Iteration 500 / 3500: loss 7.042712
Iteration 1000 / 3500: loss 1.952050
Iteration 1500 / 3500: loss 1.920996
Iteration 2000 / 3500: loss 1.897792
Iteration 2500 / 3500: loss 1.923868
Iteration 3000 / 3500: loss 1.904769
Time spent: 11.818226s
# View the change of loss value
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



# Evaluate accuracy on the training sample and on the validation set
y_train_pred = softmax.predict(X_sample)
print(y_train_pred.shape)
print('Amount of training data: %f    Training accuracy: %f' % (X_sample.shape[0], np.mean(y_sample == y_train_pred)))
y_val_pred = softmax.predict(X_val)
print('Amount of validation data: %f    Validation accuracy: %f' % (X_val.shape[0], np.mean(y_val == y_val_pred)))
(250,)
Amount of training data: 250.000000    Training accuracy: 0.576000
Amount of validation data: 100.000000    Validation accuracy: 0.240000

5. Use validation data to select hyperparameters

# Use the validation set to tune the hyperparameters (weight decay factor and learning rate)
from classifiers.chapter2.softmax import *
results = {}
best_val = -1
best_l = 0  # Best learning rate
best_r = 0  # Best weight decay factor
best_softmax = None
# Learning rates
learning_rates=np.logspace(-9, 0, num=10)
# Weight decay factors
regularization_strengths=np.logspace(0, 5, num=10)
batch_size = [50]
num_iters  = [300]

for b in batch_size:
    for n in num_iters:
        for l in learning_rates:
            for r in regularization_strengths:
                # Create a Softmax class for comparison
                softmax = Softmax()
                loss_hist = softmax.train(X_sample, y_sample, 
                                         learning_rate=l,reg=r,
                                         num_iters=n,
                                         batch_size=b,
                                         verbose=False)
                y_train_pred = softmax.predict(X_sample)
                train_accuracy= np.mean(y_sample == y_train_pred)
                y_val_pred = softmax.predict(X_val)
                val_accuracy= np.mean(y_val == y_val_pred)
                results[(l,r)]=(train_accuracy,val_accuracy)
                # Find the parameter configuration with the highest accuracy
                if(best_val < val_accuracy):
                    best_val = val_accuracy
                    best_softmax = softmax
                    best_l =l
                    best_r =r
    
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e Training accuracy: %f Verification accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('The best learning rate is: %e, the best weight decay factor is: %e, and the corresponding validation accuracy is: %f' % (best_l,best_r,best_val))
lr 1.000000e-09 reg 1.000000e+00 Training accuracy: 0.152000 Verification accuracy: 0.120000
lr 1.000000e-09 reg 3.593814e+00 Training accuracy: 0.064000 Verification accuracy: 0.090000
lr 1.000000e-09 reg 1.291550e+01 Training accuracy: 0.116000 Verification accuracy: 0.170000
lr 1.000000e-09 reg 4.641589e+01 Training accuracy: 0.048000 Verification accuracy: 0.080000
lr 1.000000e-09 reg 1.668101e+02 Training accuracy: 0.104000 Verification accuracy: 0.150000
lr 1.000000e-09 reg 5.994843e+02 Training accuracy: 0.148000 Verification accuracy: 0.100000
lr 1.000000e-09 reg 2.154435e+03 Training accuracy: 0.096000 Verification accuracy: 0.090000
lr 1.000000e-09 reg 7.742637e+03 Training accuracy: 0.112000 Verification accuracy: 0.140000
lr 1.000000e-09 reg 2.782559e+04 Training accuracy: 0.064000 Verification accuracy: 0.100000
lr 1.000000e-09 reg 1.000000e+05 Training accuracy: 0.108000 Verification accuracy: 0.070000
lr 1.000000e-08 reg 1.000000e+00 Training accuracy: 0.108000 Verification accuracy: 0.060000
lr 1.000000e-08 reg 3.593814e+00 Training accuracy: 0.104000 Verification accuracy: 0.160000
lr 1.000000e-08 reg 1.291550e+01 Training accuracy: 0.092000 Verification accuracy: 0.110000
lr 1.000000e-08 reg 4.641589e+01 Training accuracy: 0.080000 Verification accuracy: 0.090000
lr 1.000000e-08 reg 1.668101e+02 Training accuracy: 0.136000 Verification accuracy: 0.090000
lr 1.000000e-08 reg 5.994843e+02 Training accuracy: 0.132000 Verification accuracy: 0.080000
lr 1.000000e-08 reg 2.154435e+03 Training accuracy: 0.080000 Verification accuracy: 0.080000
lr 1.000000e-08 reg 7.742637e+03 Training accuracy: 0.148000 Verification accuracy: 0.080000
lr 1.000000e-08 reg 2.782559e+04 Training accuracy: 0.104000 Verification accuracy: 0.080000
lr 1.000000e-08 reg 1.000000e+05 Training accuracy: 0.092000 Verification accuracy: 0.070000
lr 1.000000e-07 reg 1.000000e+00 Training accuracy: 0.216000 Verification accuracy: 0.170000
lr 1.000000e-07 reg 3.593814e+00 Training accuracy: 0.300000 Verification accuracy: 0.120000
lr 1.000000e-07 reg 1.291550e+01 Training accuracy: 0.276000 Verification accuracy: 0.150000
lr 1.000000e-07 reg 4.641589e+01 Training accuracy: 0.304000 Verification accuracy: 0.170000
lr 1.000000e-07 reg 1.668101e+02 Training accuracy: 0.252000 Verification accuracy: 0.180000
lr 1.000000e-07 reg 5.994843e+02 Training accuracy: 0.232000 Verification accuracy: 0.170000
lr 1.000000e-07 reg 2.154435e+03 Training accuracy: 0.224000 Verification accuracy: 0.140000
lr 1.000000e-07 reg 7.742637e+03 Training accuracy: 0.256000 Verification accuracy: 0.130000
lr 1.000000e-07 reg 2.782559e+04 Training accuracy: 0.376000 Verification accuracy: 0.240000
lr 1.000000e-07 reg 1.000000e+05 Training accuracy: 0.500000 Verification accuracy: 0.210000
lr 1.000000e-06 reg 1.000000e+00 Training accuracy: 0.856000 Verification accuracy: 0.190000
lr 1.000000e-06 reg 3.593814e+00 Training accuracy: 0.872000 Verification accuracy: 0.200000
lr 1.000000e-06 reg 1.291550e+01 Training accuracy: 0.864000 Verification accuracy: 0.170000
lr 1.000000e-06 reg 4.641589e+01 Training accuracy: 0.852000 Verification accuracy: 0.200000
lr 1.000000e-06 reg 1.668101e+02 Training accuracy: 0.828000 Verification accuracy: 0.190000
lr 1.000000e-06 reg 5.994843e+02 Training accuracy: 0.868000 Verification accuracy: 0.240000
lr 1.000000e-06 reg 2.154435e+03 Training accuracy: 0.876000 Verification accuracy: 0.220000
lr 1.000000e-06 reg 7.742637e+03 Training accuracy: 0.808000 Verification accuracy: 0.240000
lr 1.000000e-06 reg 2.782559e+04 Training accuracy: 0.600000 Verification accuracy: 0.200000
lr 1.000000e-06 reg 1.000000e+05 Training accuracy: 0.404000 Verification accuracy: 0.190000
lr 1.000000e-05 reg 1.000000e+00 Training accuracy: 1.000000 Verification accuracy: 0.230000
lr 1.000000e-05 reg 3.593814e+00 Training accuracy: 1.000000 Verification accuracy: 0.170000
lr 1.000000e-05 reg 1.291550e+01 Training accuracy: 1.000000 Verification accuracy: 0.230000
lr 1.000000e-05 reg 4.641589e+01 Training accuracy: 1.000000 Verification accuracy: 0.210000
lr 1.000000e-05 reg 1.668101e+02 Training accuracy: 1.000000 Verification accuracy: 0.200000
lr 1.000000e-05 reg 5.994843e+02 Training accuracy: 0.996000 Verification accuracy: 0.210000
lr 1.000000e-05 reg 2.154435e+03 Training accuracy: 0.644000 Verification accuracy: 0.200000
lr 1.000000e-05 reg 7.742637e+03 Training accuracy: 0.460000 Verification accuracy: 0.210000
lr 1.000000e-05 reg 2.782559e+04 Training accuracy: 0.224000 Verification accuracy: 0.260000
lr 1.000000e-05 reg 1.000000e+05 Training accuracy: 0.140000 Verification accuracy: 0.080000
lr 1.000000e-04 reg 1.000000e+00 Training accuracy: 1.000000 Verification accuracy: 0.160000
lr 1.000000e-04 reg 3.593814e+00 Training accuracy: 1.000000 Verification accuracy: 0.160000
lr 1.000000e-04 reg 1.291550e+01 Training accuracy: 1.000000 Verification accuracy: 0.220000
lr 1.000000e-04 reg 4.641589e+01 Training accuracy: 1.000000 Verification accuracy: 0.210000
lr 1.000000e-04 reg 1.668101e+02 Training accuracy: 0.564000 Verification accuracy: 0.180000
lr 1.000000e-04 reg 5.994843e+02 Training accuracy: 0.492000 Verification accuracy: 0.190000
lr 1.000000e-04 reg 2.154435e+03 Training accuracy: 0.160000 Verification accuracy: 0.150000
lr 1.000000e-04 reg 7.742637e+03 Training accuracy: 0.160000 Verification accuracy: 0.120000
lr 1.000000e-04 reg 2.782559e+04 Training accuracy: 0.052000 Verification accuracy: 0.050000
lr 1.000000e-04 reg 1.000000e+05 Training accuracy: 0.092000 Verification accuracy: 0.110000
lr 1.000000e-03 reg 1.000000e+00 Training accuracy: 1.000000 Verification accuracy: 0.190000
lr 1.000000e-03 reg 3.593814e+00 Training accuracy: 1.000000 Verification accuracy: 0.180000
lr 1.000000e-03 reg 1.291550e+01 Training accuracy: 0.700000 Verification accuracy: 0.200000
lr 1.000000e-03 reg 4.641589e+01 Training accuracy: 0.304000 Verification accuracy: 0.160000
lr 1.000000e-03 reg 1.668101e+02 Training accuracy: 0.216000 Verification accuracy: 0.180000
lr 1.000000e-03 reg 5.994843e+02 Training accuracy: 0.116000 Verification accuracy: 0.130000
lr 1.000000e-03 reg 2.154435e+03 Training accuracy: 0.084000 Verification accuracy: 0.210000
lr 1.000000e-03 reg 7.742637e+03 Training accuracy: 0.104000 Verification accuracy: 0.100000
lr 1.000000e-03 reg 2.782559e+04 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-03 reg 1.000000e+05 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-02 reg 1.000000e+00 Training accuracy: 0.692000 Verification accuracy: 0.190000
lr 1.000000e-02 reg 3.593814e+00 Training accuracy: 0.380000 Verification accuracy: 0.180000
lr 1.000000e-02 reg 1.291550e+01 Training accuracy: 0.340000 Verification accuracy: 0.160000
lr 1.000000e-02 reg 4.641589e+01 Training accuracy: 0.148000 Verification accuracy: 0.090000
lr 1.000000e-02 reg 1.668101e+02 Training accuracy: 0.112000 Verification accuracy: 0.060000
lr 1.000000e-02 reg 5.994843e+02 Training accuracy: 0.076000 Verification accuracy: 0.060000
lr 1.000000e-02 reg 2.154435e+03 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-02 reg 7.742637e+03 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-02 reg 2.782559e+04 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-02 reg 1.000000e+05 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-01 reg 1.000000e+00 Training accuracy: 0.292000 Verification accuracy: 0.200000
lr 1.000000e-01 reg 3.593814e+00 Training accuracy: 0.196000 Verification accuracy: 0.180000
lr 1.000000e-01 reg 1.291550e+01 Training accuracy: 0.124000 Verification accuracy: 0.110000
lr 1.000000e-01 reg 4.641589e+01 Training accuracy: 0.116000 Verification accuracy: 0.170000
lr 1.000000e-01 reg 1.668101e+02 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-01 reg 5.994843e+02 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-01 reg 2.154435e+03 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-01 reg 7.742637e+03 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-01 reg 2.782559e+04 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e-01 reg 1.000000e+05 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 1.000000e+00 Training accuracy: 0.108000 Verification accuracy: 0.200000
lr 1.000000e+00 reg 3.593814e+00 Training accuracy: 0.108000 Verification accuracy: 0.050000
lr 1.000000e+00 reg 1.291550e+01 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 4.641589e+01 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 1.668101e+02 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 5.994843e+02 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 2.154435e+03 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 7.742637e+03 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 2.782559e+04 Training accuracy: 0.112000 Verification accuracy: 0.070000
lr 1.000000e+00 reg 1.000000e+05 Training accuracy: 0.112000 Verification accuracy: 0.070000
The best learning rate is: 1.000000e-05, the best weight decay factor is: 2.782559e+04, and the corresponding validation accuracy is: 0.260000
import math
# Generate a scatter plot of the results
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# Plot training accuracy
marker_size = 100
# Colors encode the training accuracy
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
# Draw the points
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
# Add a color bar
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# Plot validation accuracy
colors = [results[x][1] for x in results] 
plt.subplot(2, 1, 2)
# Draw the points
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
# Add a color bar
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
# Automatic layout
plt.tight_layout()
plt.show()



From the visualization we can clearly see that learning rates in roughly [1e-6, 1e-4] work noticeably better, and that penalty factors between [1e0, 1e4] do well, with smaller penalty factors possibly giving even better results. Our next step is therefore to continue narrowing the search range.

# Use the validation set to tune the hyperparameters (weight decay factor and learning rate)
from classifiers.chapter2.softmax import *
results = {}
best_val = -1
best_l = 0
best_r = 0
best_softmax = None
learning_rates=np.logspace(-6, -4, num=10)
regularization_strengths=np.logspace(-1, 4, num=5)
batch_size = [50]
num_iters  = [300]

for b in batch_size:
    for n in num_iters:
        for l in learning_rates:
            for r in regularization_strengths:
                softmax = Softmax()
                loss_hist = softmax.train(X_sample, y_sample,
                                         learning_rate=l, reg=r,
                                         num_iters=n,batch_size=b, verbose=False)
                y_train_pred = softmax.predict(X_sample)
                train_accuracy= np.mean(y_sample == y_train_pred)
                y_val_pred = softmax.predict(X_val)
                val_accuracy= np.mean(y_val == y_val_pred)
                results[(l,r)]=(train_accuracy,val_accuracy)
                if (best_val < val_accuracy):
                    best_val = val_accuracy
                    best_softmax = softmax
                    best_l =l
                    best_r =r

    
print('The best learning rate is: %e, the best weight decay factor is: %e, and the corresponding validation accuracy is: %f' % (best_l,best_r,best_val))
D:\workspace\Python\jupyter\Practical examples of in-depth learning\DLAction\classifiers\chapter2\softmax_loss.py:111: RuntimeWarning: divide by zero encountered in log
  loss += -np.sum(y_trueClass*np.log(prob))/num_train+0.5*reg*np.sum(W*W)
D:\workspace\Python\jupyter\Practical examples of in-depth learning\DLAction\classifiers\chapter2\softmax_loss.py:111: RuntimeWarning: invalid value encountered in multiply
  loss += -np.sum(y_trueClass*np.log(prob))/num_train+0.5*reg*np.sum(W*W)


The best learning rate is: 1.291550e-05, the best weight decay factor is: 1.000000e+04, and the corresponding validation accuracy is: 0.240000
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]


# Plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# Plot validation accuracy
colors = [results[x][1] for x in results] 
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
# Automatic layout
plt.tight_layout()
plt.show()



Continue narrowing the parameter range.

# Use the validation set to tune the hyperparameters (weight decay factor and learning rate)
from classifiers.chapter2.softmax import *
results = {}
best_val = -1
best_l = 0
best_r = 0
best_softmax = None
learning_rates=np.logspace(-5, -4, num=10)
regularization_strengths=np.logspace(-2, 3, num=5)
batch_size = [50]
num_iters  = [300]

for b in batch_size:
    for n in num_iters:
        for l in learning_rates:
            for r in regularization_strengths:
                softmax = Softmax()
                loss_hist = softmax.train(X_sample, y_sample, learning_rate=l,
                                         reg=r,num_iters=n,batch_size=b, verbose=False)
                y_train_pred = softmax.predict(X_sample)
                train_accuracy= np.mean(y_sample == y_train_pred)
                y_val_pred = softmax.predict(X_val)
                val_accuracy= np.mean(y_val == y_val_pred)
                results[(l,r)]=(train_accuracy,val_accuracy)
                if (best_val < val_accuracy):
                    best_val = val_accuracy
                    best_softmax = softmax
                    best_l =l
                    best_r =r
    
print('The best learning rate is: %e, the best weight decay factor is: %e, and the corresponding validation accuracy is: %f' % (best_l,best_r,best_val))
The best learning rate is: 1.000000e-05, the best weight decay factor is: 3.162278e+00, and the corresponding validation accuracy is: 0.230000
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')


colors = [results[x][1] for x in results] 
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
# Automatic layout
plt.tight_layout()
plt.show()



# Evaluate the best softmax classifier on the test dataset
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('Test set accuracy: %f' % test_accuracy)
Test set accuracy: 0.234500

We observe that the results are not ideal because the amount of data we trained on was too small.

Increasing the amount of data helps avoid overfitting, but it also makes hyperparameter selection time-consuming and laborious. We can therefore use a smaller subset of the data to roughly locate the value range of the hyperparameters, which saves training time. However, the hyperparameters that perform best on a small dataset do not necessarily perform best when the amount of data is large, so we need to pay attention to how much data we use. This is a very empirical matter: experience accumulates through a large number of experiments and constant trial and error. Remember, don't be afraid of mistakes.

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=10000):
    cifar10_dir = 'datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir) 
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]    
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))    
    mean_image = np.mean(X_train, axis = 0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))]) 
    return X_train, y_train, X_val, y_val, X_test, y_test


X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print ('Train data shape: ', X_train.shape)
print ('Train labels shape: ', y_train.shape)
print ('Validation data shape: ', X_val.shape)
print ('Validation labels shape: ', y_val.shape)
print ('Test data shape: ', X_test.shape)
print ('Test labels shape: ', y_test.shape)
Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (10000, 3073)
Test labels shape:  (10000,)
from classifiers.chapter2.softmax  import Softmax
results = {}
best_val = -1
best_softmax = None
################################################################################
#                            task:                                             #
#               Use all the training data to train the best softmax classifier               #
################################################################################
learning_rates = [1.4e-7, 1.45e-7, 1.5e-7, 1.55e-7, 1.6e-7]  # Learning rates
regularization_strengths = [2.3e4, 2.6e4, 2.7e4, 2.8e4, 2.9e4]  # Weight decay factors
for l in learning_rates:
    for r in regularization_strengths:
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=l, 
                                  reg=r,num_iters=2000, verbose=True)
        y_train_pred = softmax.predict(X_train)
        train_accuracy= np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        val_accuracy= np.mean(y_val == y_val_pred)
        results[(l,r)]=(train_accuracy,val_accuracy)
        if (best_val < val_accuracy):
            best_val = val_accuracy
            best_softmax = softmax
################################################################################
#                            End coding                                          #
################################################################################

for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('The best validation accuracy is: %f' % best_val)
Iterations 0 / 2000: loss 362.275538
 Number of iterations 500 / 2000: loss 15.973378
 Number of iterations 1000 / 2000: loss 2.563832
 Number of iterations 1500 / 2000: loss 2.039525
 Iterations 0 / 2000: loss 402.579894
 Number of iterations 500 / 2000: loss 12.217043
 Number of iterations 1000 / 2000: loss 2.312635
 Number of iterations 1500 / 2000: loss 2.033261
 Iterations 0 / 2000: loss 420.798641
 Number of iterations 500 / 2000: loss 11.346001
 Number of iterations 1000 / 2000: loss 2.314607
 Number of iterations 1500 / 2000: loss 2.044269
 Iterations 0 / 2000: loss 440.503492
 Number of iterations 500 / 2000: loss 10.521333
 Number of iterations 1000 / 2000: loss 2.191913
 Number of iterations 1500 / 2000: loss 2.124927
 Iterations 0 / 2000: loss 453.308193
 Number of iterations 500 / 2000: loss 9.689195
 Number of iterations 1000 / 2000: loss 2.130514
 Number of iterations 1500 / 2000: loss 2.027209
 Iterations 0 / 2000: loss 362.176210
 Number of iterations 500 / 2000: loss 14.451922
 Number of iterations 1000 / 2000: loss 2.447913
 Number of iterations 1500 / 2000: loss 2.041807
 Iterations 0 / 2000: loss 402.706053
 Number of iterations 500 / 2000: loss 11.013674
 Number of iterations 1000 / 2000: loss 2.216208
 Number of iterations 1500 / 2000: loss 2.032448
 Iterations 0 / 2000: loss 421.450555
 Number of iterations 500 / 2000: loss 10.149416
 Number of iterations 1000 / 2000: loss 2.186461
 Number of iterations 1500 / 2000: loss 2.062753
 Iterations 0 / 2000: loss 431.157711
 Number of iterations 500 / 2000: loss 9.254785
 Number of iterations 1000 / 2000: loss 2.161939
 Number of iterations 1500 / 2000: loss 1.991775
 Iterations 0 / 2000: loss 444.492450
 Number of iterations 500 / 2000: loss 8.409564
 Number of iterations 1000 / 2000: loss 2.136935
 Number of iterations 1500 / 2000: loss 1.971083
 Iterations 0 / 2000: loss 362.221476
 Number of iterations 500 / 2000: loss 13.076090
 Number of iterations 1000 / 2000: loss 2.398948
 Number of iterations 1500 / 2000: loss 1.981336
 Iterations 0 / 2000: loss 404.341723
 Number of iterations 500 / 2000: loss 9.917005
 Number of iterations 1000 / 2000: loss 2.178486
 Number of iterations 1500 / 2000: loss 1.977197
 Iterations 0 / 2000: loss 422.058121
 Number of iterations 500 / 2000: loss 9.169317
 Number of iterations 1000 / 2000: loss 2.147070
 Number of iterations 1500 / 2000: loss 2.085903
 Iterations 0 / 2000: loss 430.965383
 Number of iterations 500 / 2000: loss 8.230988
 Number of iterations 1000 / 2000: loss 2.127792
 Number of iterations 1500 / 2000: loss 1.982288
 Iterations 0 / 2000: loss 449.796595
 Number of iterations 500 / 2000: loss 7.626812
 Number of iterations 1000 / 2000: loss 2.066997
 Number of iterations 1500 / 2000: loss 2.038826
 Iterations 0 / 2000: loss 355.354397
 Number of iterations 500 / 2000: loss 11.726022
 Number of iterations 1000 / 2000: loss 2.327054
 Number of iterations 1500 / 2000: loss 1.960372
 Iterations 0 / 2000: loss 408.173326
 Number of iterations 500 / 2000: loss 8.995592
 Number of iterations 1000 / 2000: loss 2.188157
 Number of iterations 1500 / 2000: loss 2.035182
 Iterations 0 / 2000: loss 416.190544
 Number of iterations 500 / 2000: loss 8.098396
 Number of iterations 1000 / 2000: loss 2.138810
 Number of iterations 1500 / 2000: loss 2.016468
 Iterations 0 / 2000: loss 435.709668
 Number of iterations 500 / 2000: loss 7.583673
 Number of iterations 1000 / 2000: loss 2.123043
 Number of iterations 1500 / 2000: loss 2.050762
 Iterations 0 / 2000: loss 455.764297
 Number of iterations 500 / 2000: loss 6.993459
 Number of iterations 1000 / 2000: loss 2.058059
 Number of iterations 1500 / 2000: loss 1.998903
 Iterations 0 / 2000: loss 360.201729
 Number of iterations 500 / 2000: loss 10.908834
 Number of iterations 1000 / 2000: loss 2.237939
 Number of iterations 1500 / 2000: loss 2.049242
 Iterations 0 / 2000: loss 406.568389
 Number of iterations 500 / 2000: loss 8.150559
 Number of iterations 1000 / 2000: loss 2.124073
 Number of iterations 1500 / 2000: loss 2.041791
 Iterations 0 / 2000: loss 418.556158
 Number of iterations 500 / 2000: loss 7.397969
 Number of iterations 1000 / 2000: loss 2.090971
 Number of iterations 1500 / 2000: loss 2.099561
 Iterations 0 / 2000: loss 434.558226
 Number of iterations 500 / 2000: loss 6.857017
 Number of iterations 1000 / 2000: loss 2.133622
 Number of iterations 1500 / 2000: loss 2.058132
 Iterations 0 / 2000: loss 444.248523
 Number of iterations 500 / 2000: loss 6.207619
 Number of iterations 1000 / 2000: loss 2.031800
 Number of iterations 1500 / 2000: loss 1.994766
lr 1.400000e-07 reg 2.300000e+04 train accuracy: 0.351163 val accuracy: 0.367000
lr 1.400000e-07 reg 2.600000e+04 train accuracy: 0.349898 val accuracy: 0.365000
lr 1.400000e-07 reg 2.700000e+04 train accuracy: 0.347102 val accuracy: 0.360000
lr 1.400000e-07 reg 2.800000e+04 train accuracy: 0.344673 val accuracy: 0.359000
lr 1.400000e-07 reg 2.900000e+04 train accuracy: 0.345122 val accuracy: 0.355000
lr 1.450000e-07 reg 2.300000e+04 train accuracy: 0.352020 val accuracy: 0.364000
lr 1.450000e-07 reg 2.600000e+04 train accuracy: 0.345939 val accuracy: 0.362000
lr 1.450000e-07 reg 2.700000e+04 train accuracy: 0.340673 val accuracy: 0.355000
lr 1.450000e-07 reg 2.800000e+04 train accuracy: 0.350980 val accuracy: 0.359000
lr 1.450000e-07 reg 2.900000e+04 train accuracy: 0.347735 val accuracy: 0.361000
lr 1.500000e-07 reg 2.300000e+04 train accuracy: 0.350551 val accuracy: 0.366000
lr 1.500000e-07 reg 2.600000e+04 train accuracy: 0.348245 val accuracy: 0.362000
lr 1.500000e-07 reg 2.700000e+04 train accuracy: 0.347898 val accuracy: 0.363000
lr 1.500000e-07 reg 2.800000e+04 train accuracy: 0.344327 val accuracy: 0.352000
lr 1.500000e-07 reg 2.900000e+04 train accuracy: 0.345857 val accuracy: 0.360000
lr 1.550000e-07 reg 2.300000e+04 train accuracy: 0.352531 val accuracy: 0.368000
lr 1.550000e-07 reg 2.600000e+04 train accuracy: 0.347714 val accuracy: 0.359000
lr 1.550000e-07 reg 2.700000e+04 train accuracy: 0.348286 val accuracy: 0.354000
lr 1.550000e-07 reg 2.800000e+04 train accuracy: 0.345224 val accuracy: 0.370000
lr 1.550000e-07 reg 2.900000e+04 train accuracy: 0.350837 val accuracy: 0.376000
lr 1.600000e-07 reg 2.300000e+04 train accuracy: 0.349327 val accuracy: 0.366000
lr 1.600000e-07 reg 2.600000e+04 train accuracy: 0.350327 val accuracy: 0.366000
lr 1.600000e-07 reg 2.700000e+04 train accuracy: 0.345673 val accuracy: 0.356000
lr 1.600000e-07 reg 2.800000e+04 train accuracy: 0.352796 val accuracy: 0.358000
lr 1.600000e-07 reg 2.900000e+04 train accuracy: 0.345510 val accuracy: 0.360000
The best validation accuracy is: 0.376000
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('Test set accuracy: %f' % test_accuracy)
Test set accuracy: 0.354200

Another way to visually assess the quality of the model is to visualize its parameters. For example, when we recognize pictures of a "horse", the learned parameters for that class are effectively a "horse template", and we can also infer properties of the data from the visualized parameters. If the training set contains many pictures with the horse's head facing left and many with it facing right, the learned template is likely to look like a "two-headed horse". Similarly, if most cars in the pictures are red, the learned template will probably be a "red car template".

# Visualize the learned parameters
w = best_softmax.W[:-1,:]  # Strip the bias row
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights into the 0-255 range
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
