TensorFlow code learning 4-1: cost functions (quadratic, cross entropy, log likelihood)

Keywords: Machine Learning AI neural networks TensorFlow Deep Learning

Cost function: quadratic, cross entropy, log likelihood (course: refining numbers into gold)

Quadratic cost function

C = \frac{1}{2n} \sum_x \|y(x) - a^L(x)\|^2

  • Where C is the cost function, x is a sample, y is the actual value (label), a is the output value (prediction), and n is the total number of samples.
  • Example: when there is a single sample x, i.e. n = 1:
    C = \frac{(y-a)^2}{2}
    where a = \sigma(z), z = \sum_j W_j x_j + b, and \sigma(\cdot) is the activation function.
  • Gradient descent is used to adjust the weights and biases. The gradients of the cost with respect to the weight w and the bias b are:
    \frac{\partial C}{\partial w} = (a-y)\sigma'(z)x, \quad \frac{\partial C}{\partial b} = (a-y)\sigma'(z)
    where z is the input to the neuron and \sigma is the activation function. The gradients of w and b are proportional to the derivative of the activation function: the larger \sigma'(z) is, the faster w and b are adjusted (see the sketch below).
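To make the saturation problem concrete, here is a minimal NumPy sketch (my own illustration, not part of the original lesson) that evaluates the quadratic-cost gradient (a - y)\sigma'(z)x for a single sigmoid neuron. Near z = 0 the gradient is sizeable, but once the neuron saturates, \sigma'(z) is close to 0 and the gradient almost vanishes even though the error is large, so learning slows down:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_grad_w(x, y, w, b):
    # dC/dw = (a - y) * sigma'(z) * x for one sigmoid neuron, with sigma'(z) = a * (1 - a)
    z = w * x + b
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a) * x

x, y = 1.0, 0.0                                  # input 1, target 0
print(quadratic_grad_w(x, y, w=0.5, b=0.0))      # z = 0.5, unsaturated -> gradient ~ 0.15
print(quadratic_grad_w(x, y, w=8.0, b=0.0))      # z = 8.0, saturated   -> gradient ~ 0.0003 despite a large error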

Cross entropy cost function

Instead of changing the activation function, change the cost function and use the cross entropy cost function:
C = -\frac{1}{n} \sum_x \left[ y \ln a + (1-y)\ln(1-a) \right]

  • Where C is the cost function, x is a sample, y is the actual value, a is the output value, and n is the total number of samples,
    with a = \sigma(z), z = \sum_j W_j x_j + b, and \sigma'(z) = \sigma(z)(1 - \sigma(z)). The gradients are:
    \frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j (\sigma(z) - y), \quad \frac{\partial C}{\partial b} = \frac{1}{n} \sum_x (\sigma(z) - y)
  • The adjustment of the weights and the bias does not depend on \sigma'(z) (the derivative of the activation function). In the gradient formulas, \sigma(z) - y is the error between the output value and the actual value, so the larger the error, the larger the gradient, and the faster the parameters w and b are adjusted, which speeds up training (see the sketch after this list).
  • If the output neurons are linear, the quadratic cost function is an appropriate choice. If the output neurons use a sigmoid (S-shaped) activation, the cross entropy cost function is more suitable.
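For clarity, the reason \sigma'(z) drops out of the cross entropy gradient is the cancellation \frac{\partial C}{\partial w_j} = -\frac{1}{n}\sum_x \left[\frac{y}{\sigma(z)} - \frac{1-y}{1-\sigma(z)}\right]\sigma'(z)x_j = \frac{1}{n}\sum_x x_j(\sigma(z)-y), using \sigma'(z) = \sigma(z)(1-\sigma(z)). The NumPy sketch below (my own illustration, continuing the earlier one) repeats the saturation experiment with this gradient: it depends only on the error \sigma(z) - y, so a saturated but badly wrong neuron still receives a large gradient:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_grad_w(x, y, w, b):
    # dC/dw = x * (sigma(z) - y): the sigma'(z) factor has cancelled out
    return x * (sigmoid(w * x + b) - y)

x, y = 1.0, 0.0                                      # input 1, target 0
print(cross_entropy_grad_w(x, y, w=0.5, b=0.0))      # z = 0.5 -> gradient ~ 0.62
print(cross_entropy_grad_w(x, y, w=8.0, b=0.0))      # z = 8.0, saturated but wrong -> gradient ~ 1.0, training stays fast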

Log likelihood cost function

  • The log likelihood function is often used as the cost function for softmax regression. If the output-layer neurons are sigmoid, the cross entropy cost function can be used. A more common approach in deep learning is to use softmax as the last layer, with the log likelihood function as its cost function.
  • The combination of the log likelihood cost function with softmax is very similar to the combination of cross entropy with sigmoid; the log likelihood cost function can be reduced to the form of the cross entropy cost function (a quick numerical check follows this list).
  • In TensorFlow:
    • tf.nn.sigmoid_cross_entropy_with_logits() implements cross entropy paired with sigmoid.
    • tf.nn.softmax_cross_entropy_with_logits() implements cross entropy paired with softmax.
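As a quick numerical check of the claims above, the short TensorFlow 1.x sketch below (my own illustration; it assumes the same TF 1.x environment as the full example that follows) computes the negative log likelihood -\sum y \ln(\mathrm{softmax}(z)) by hand and compares it with tf.nn.softmax_cross_entropy_with_logits(), which applies softmax internally and therefore expects the raw, unnormalized logits:

import tensorflow as tf    # TensorFlow 1.x assumed, as in the MNIST example below

logits = tf.constant([[2.0, 1.0, 0.1]])     # raw scores for one sample
labels = tf.constant([[1.0, 0.0, 0.0]])     # one-hot label

# Negative log likelihood computed by hand: -sum(y * ln(softmax(z)))
manual = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits)), axis=1)
# Built-in op: softmax is applied inside, so the raw logits are passed
builtin = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([manual, builtin]))      # both ~ [0.417]

The full MNIST example below trains a single softmax layer with the same loss: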
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data   # MNIST handwritten digit dataset helpers
# Load dataset
mnist = input_data.read_data_sets("MNIST_data",one_hot=True)    # one_hot=True turns each label into a one-hot vector

# Size of each training batch
batch_size = 100    # feed 100 images at a time
n_batch = mnist.train.num_examples // batch_size    # total number of batches = training set size // batch size (integer division)

# Define two placeholders
x = tf.placeholder(tf.float32,[None,784])    # [any number of rows, 784 columns]: each 28x28 image flattened
y = tf.placeholder(tf.float32,[None,10])    # 10 classes, one per digit 0-9

# Create a simple neural network (a single softmax layer)
W = tf.Variable(tf.zeros([784,10]))   # weights
b = tf.Variable(tf.zeros([10]))       # biases
logits = tf.matmul(x,W) + b           # raw (unnormalized) scores
prediction = tf.nn.softmax(logits)    # predicted class probabilities

# Quadratic cost function (alternative, commented out)
# loss = tf.reduce_mean(tf.square(y-prediction))
# Cross entropy cost function: the op applies softmax internally, so the raw logits are passed
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=logits))
# Use the gradient descent optimizer
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

# Initialize the variables
init = tf.global_variables_initializer()

# Compare predicted and true classes; the results are stored in a boolean list
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))   # argmax(_,1) returns the index of the largest value along axis 1; equal() is True where the two indices match

# Accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))   # cast() converts booleans to 32-bit floats (e.g. 9 Trues and 1 False become nine 1s and one 0, i.e. 90% accuracy)

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(21):
        for batch in range(n_batch):
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step,feed_dict={x:batch_xs,y:batch_ys})
            
        acc = sess.run(accuracy,feed_dict={x:mnist.test.images,y:mnist.test.labels})
        print("Iter" + str(epoch) + ",Testing Accuracy" + str(acc))

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
Iter0,Testing Accuracy0.8502
Iter1,Testing Accuracy0.8954
Iter2,Testing Accuracy0.9014
Iter3,Testing Accuracy0.9052
Iter4,Testing Accuracy0.9079
Iter5,Testing Accuracy0.91
Iter6,Testing Accuracy0.9115
Iter7,Testing Accuracy0.9132
Iter8,Testing Accuracy0.9152
Iter9,Testing Accuracy0.9159
Iter10,Testing Accuracy0.9167
Iter11,Testing Accuracy0.9181
Iter12,Testing Accuracy0.9189
Iter13,Testing Accuracy0.9192
Iter14,Testing Accuracy0.9205
Iter15,Testing Accuracy0.9202
Iter16,Testing Accuracy0.921
Iter17,Testing Accuracy0.9209
Iter18,Testing Accuracy0.9213
Iter19,Testing Accuracy0.9216
Iter20,Testing Accuracy0.922


Posted by w1ww on Mon, 27 Sep 2021 00:13:42 -0700