Cost functions: quadratic, cross entropy, log likelihood (course: "refining numbers into gold")
Quadratic cost function
C = \frac{1}{2n} \sum_x \| y(x) - a^L(x) \|^2
- Where C represents the cost function, x represents a sample, y represents the actual value (label), a represents the output value (predicted value), and n represents the total number of samples.
- Example: for a single sample, i.e. n = 1:
C = \frac{(y-a)^2}{2}
where a = \sigma(z), z = \sum_j w_j x_j + b, and \sigma(\cdot) is the activation function.
- Gradient descent is used to adjust the weight parameters. The gradients with respect to the weight w and the bias b are derived as follows:
\frac{\partial C}{\partial w} = (a-y)\sigma'(z)x, \quad \frac{\partial C}{\partial b} = (a-y)\sigma'(z)
Here z is the input to the neuron and \sigma is the activation function. The gradients of w and b are proportional to the derivative of the activation function: the larger \sigma'(z) is, the faster w and b are adjusted. Conversely, when the sigmoid saturates (its output is close to 0 or 1), \sigma'(z) is nearly 0 and learning slows down even if the error is large, as the sketch below illustrates.
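To make this concrete, here is a minimal NumPy sketch (not from the original course material) of these two gradients for a single sigmoid neuron; the function name `quadratic_cost_grads` and the sample values are chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_cost_grads(w, b, x, y):
    """Gradients of C = (y - a)^2 / 2 for one sample, with a = sigmoid(w*x + b)."""
    z = w * x + b
    a = sigmoid(z)
    sigma_prime = a * (1.0 - a)           # derivative of the sigmoid at z
    dC_dw = (a - y) * sigma_prime * x     # dC/dw = (a - y) * sigma'(z) * x
    dC_db = (a - y) * sigma_prime         # dC/db = (a - y) * sigma'(z)
    return dC_dw, dC_db

# A saturated neuron (large z) whose prediction is badly wrong:
# both gradients are tiny because sigma'(z) is close to 0,
# which is exactly the slowdown discussed above.
print(quadratic_cost_grads(w=5.0, b=0.0, x=1.0, y=0.0))
```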
Cross entropy cost function
Instead of changing the activation function, change the cost function and use the cross entropy cost function:
C = - \frac{1}{n} \sum_x \left[ y \ln a + (1-y)\ln(1-a) \right]
- Where C represents the cost function, x represents a sample, y represents the actual value, a represents the output value, and n represents the total number of samples.
where a = \sigma(z), z = \sum_j w_j x_j + b, and the sigmoid derivative is \sigma'(z) = \sigma(z)(1 - \sigma(z)).
\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j (\sigma(z) - y), \quad \frac{\partial C}{\partial b} = \frac{1}{n} \sum_x (\sigma(z) - y)
- The adjustment of the weights and the bias no longer depends on \sigma'(z) (the derivative of the activation function). In these gradient formulas, \sigma(z) - y is the error between the output value and the actual value, so the larger the error, the larger the gradient, the faster the parameters w and b are adjusted, and the faster training proceeds; the sketch after this list compares the two cost functions on a saturated neuron.
- If the output neurons are linear, the quadratic cost function is an appropriate choice. If the output neurons use a sigmoid (S-shaped) activation, the cross entropy cost function is more suitable.
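A minimal NumPy sketch (with illustrative values, not from the original material) comparing the per-sample gradients of the two cost functions on a saturated sigmoid neuron; the helper `grads` and its arguments are assumptions made for the demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(w, b, x, y, cost):
    """Per-sample gradients dC/dw, dC/db for a single sigmoid neuron."""
    z = w * x + b
    a = sigmoid(z)
    if cost == "quadratic":
        common = (a - y) * a * (1.0 - a)   # (a - y) * sigma'(z)
    else:                                  # cross entropy
        common = a - y                     # sigma'(z) has cancelled out
    return common * x, common

# A saturated neuron (large z) with a completely wrong prediction:
for cost in ("quadratic", "cross entropy"):
    print(cost, grads(w=8.0, b=0.0, x=1.0, y=0.0, cost=cost))
# The quadratic gradients are nearly zero, while the cross entropy
# gradients stay proportional to the error (a - y).
```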
Log likelihood cost function
- The log likelihood function is often used as the cost function for softmax regression. If the output layer neurons use the sigmoid function, the cross entropy cost function can be used. A more common approach in deep learning is to use softmax as the last layer, and the commonly used cost function is then the log likelihood function.
- The combination of the log likelihood cost function with softmax is very similar to the combination of the cross entropy cost function with sigmoid; the log likelihood cost function can be simplified into the form of the cross entropy cost function, as the sketch below shows.
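A small sketch, assuming a three-class output with a one-hot label, showing that the log likelihood cost of a softmax output is numerically the same as the cross entropy form; the arrays `z` and `y` are illustrative values, not from the original post.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # logits of the output layer
y = np.array([1.0, 0.0, 0.0])   # one-hot label
a = softmax(z)

# Log likelihood cost: -log a_k for the true class k ...
log_likelihood = -np.log(a[np.argmax(y)])
# ... which equals the cross entropy form -sum_k y_k * log a_k
cross_entropy = -np.sum(y * np.log(a))

print(log_likelihood, cross_entropy)   # the two values are identical
```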
- In TensorFlow:
- tf.nn.sigmoid_cross_entropy_with_logits() represents cross entropy used together with sigmoid.
- tf.nn.softmax_cross_entropy_with_logits() represents cross entropy used together with softmax.
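A minimal sketch of how the two functions are called, assuming the same TensorFlow 1.x API as the example that follows; the `logits` and `labels` values are illustrative. Both functions expect raw logits (pre-activation scores) and apply sigmoid or softmax internally.

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])   # raw scores, no activation applied
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot label

# softmax is applied internally, then the cross entropy is computed
softmax_loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# sigmoid is applied internally to each logit independently (multi-label case)
sigmoid_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([softmax_loss, sigmoid_loss]))
```

Note that in the full example below, `prediction` has already been passed through tf.nn.softmax before being handed to softmax_cross_entropy_with_logits, so softmax is effectively applied twice; the network still trains, but the more common usage is to pass the raw `tf.matmul(x, W) + b` as the logits.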
```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data  # MNIST helper package

# Load the dataset (convert labels to one-hot form)
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Size of each batch: feed 100 images at a time
batch_size = 100
# Total number of batches: number of training examples // batch size (integer division)
n_batch = mnist.train.num_examples // batch_size

# Define two placeholders
x = tf.placeholder(tf.float32, [None, 784])  # [any number of rows, 784 columns]
y = tf.placeholder(tf.float32, [None, 10])   # digits 0-9, so 10 classes

# Create a simple neural network
W = tf.Variable(tf.zeros([784, 10]))   # weights
b = tf.Variable(tf.zeros([10]))        # bias
prediction = tf.nn.softmax(tf.matmul(x, W) + b)  # prediction

# Quadratic cost function
# loss = tf.reduce_mean(tf.square(y - prediction))
# Cross entropy cost function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))

# Gradient descent
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

# Initialize variables
init = tf.global_variables_initializer()

# Correct predictions, stored in a list of booleans:
# argmax() returns the position of the largest value in the tensor;
# equal() returns True where the two positions match, False otherwise
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
# Accuracy: cast() converts booleans to 32-bit floats
# (e.g. 9 Trues and 1 False become nine 1s and one 0, i.e. 90% accuracy)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(21):
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Iter" + str(epoch) + ",Testing Accuracy" + str(acc))
```
```
Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
Iter0,Testing Accuracy0.8502
Iter1,Testing Accuracy0.8954
Iter2,Testing Accuracy0.9014
Iter3,Testing Accuracy0.9052
Iter4,Testing Accuracy0.9079
Iter5,Testing Accuracy0.91
Iter6,Testing Accuracy0.9115
Iter7,Testing Accuracy0.9132
Iter8,Testing Accuracy0.9152
Iter9,Testing Accuracy0.9159
Iter10,Testing Accuracy0.9167
Iter11,Testing Accuracy0.9181
Iter12,Testing Accuracy0.9189
Iter13,Testing Accuracy0.9192
Iter14,Testing Accuracy0.9205
Iter15,Testing Accuracy0.9202
Iter16,Testing Accuracy0.921
Iter17,Testing Accuracy0.9209
Iter18,Testing Accuracy0.9213
Iter19,Testing Accuracy0.9216
Iter20,Testing Accuracy0.922
```