Forward propagation (tensor) - hands-on

Contents

Handwritten digit recognition process
Forward propagation (tensor) - hands-on

Handwritten digit recognition process

MNIST handwritten digit set: about 7,000 images for each of the 10 digits, 70,000 images in total
60k images for training, 10k for testing
Each image is 28 * 28 (grayscale); a color image of the same size would be 28 * 28 * 3
Pixel values from 0 to 255 are gray levels: 0 is pure white (background), 255 is pure black (ink)
Flattening the 28 * 28 matrix gives a vector of 28 * 28 = 784 values
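A minimal plain-Python sketch of flattening and scaling. The 3 * 3 `image` is a made-up stand-in for a 28 * 28 digit, and row-major order is assumed, which matches how `tf.reshape` lays out the values.

```python
def flatten(image):
    """Row-major flatten of a 2-D list of pixel rows into a 1-D list."""
    return [pixel for row in image for pixel in row]

def normalize(pixels):
    """Scale 0-255 gray values into [0, 1], like dividing by 255."""
    return [p / 255.0 for p in pixels]

# Hypothetical 3 x 3 image standing in for a 28 x 28 MNIST digit
image = [
    [0, 0, 255],
    [0, 128, 255],
    [0, 0, 255],
]
vec = normalize(flatten(image))
print(len(vec))  # 9, i.e. 3 * 3; a 28 * 28 image gives 784
print(vec[2])    # 1.0
```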
Stacking b images gives a [b,784] matrix; each of the b images also gets an encoded label
Labels are usually one-hot encoded. A one-hot vector can be read as a probability distribution whose entries sum to 1, which is the same form as the output of softmax regression
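One-hot encoding and softmax can be sketched in plain Python as follows. `one_hot` and `softmax` here are illustrative helpers, not TensorFlow calls (TensorFlow's own equivalents are `tf.one_hot` and `tf.nn.softmax`).

```python
import math

def one_hot(label, depth=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    vec = [0.0] * depth
    vec[label] = 1.0
    return vec

def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))  # 1.0
```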
Applying a linear model:
X[b,784] @ W[784,10] + b[10] gives [b,10]
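To make the shape rule concrete, here is a plain-Python sketch of X @ W + b with toy sizes (b = 2 samples, 4 inputs, 3 outputs instead of 784 and 10). The helpers `matmul` and `linear` are hypothetical; adding the bias to every row is the broadcasting that the notation relies on.

```python
def matmul(a, b):
    """[n, k] @ [k, m] -> [n, m] for lists of lists."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def linear(x, w, bias):
    """x @ w + bias, with the [m] bias added to every row (broadcasting)."""
    out = matmul(x, w)
    return [[v + bias[j] for j, v in enumerate(row)] for row in out]

# Toy sizes (assumed): [2, 4] @ [4, 3] + [3] -> [2, 3]
x = [[1, 0, 0, 0],
     [0, 1, 0, 0]]
w = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [1, 1, 1]]
bias = [0.5, 0.5, 0.5]
y = linear(x, w, bias)
print(y)  # [[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]]
```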
Recognizing high-dimensional images is too complex for a linear model alone, so nonlinearity is added:
f(X@W+b), where the activation function f makes the model nonlinear; here f is ReLU
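ReLU is simply max(0, z) applied elementwise. The sketch below uses a hypothetical `relu` helper on plain lists and also shows why it breaks linearity: relu(a + b) is not relu(a) + relu(b) in general.

```python
def relu(z):
    """Elementwise max(0, z): negatives become 0, non-negatives pass through."""
    return [max(0.0, v) for v in z]

print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]

# A linear f would satisfy f(a + b) == f(a) + f(b); ReLU does not
a, b = -1.0, 1.0
print(relu([a + b])[0])             # 0.0
print(relu([a])[0] + relu([b])[0])  # 1.0
```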
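Even with an activation function, a single layer is still too simple
So several layers are stacked:
- h1 = relu(X@W1+b1)
- h2 = relu(h1@W2+b2)
- out = relu(h2@W3+b3) (the code below omits this last relu and leaves the output layer linear)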
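The stacked layers can be sketched with plain Python lists. Toy widths 8 -> 6 -> 4 -> 3 are assumed instead of MNIST's, the random weights are illustrative only, and the output layer is left linear, as the TensorFlow training code later in the post does.

```python
import random

def matmul(a, b):
    """[n, k] @ [k, m] -> [n, m] for lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add_bias(m, bias):
    """Add the [m] bias to every row (broadcasting)."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in m]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def init(dim_in, dim_out, rng):
    """Small random weights [dim_in, dim_out] and a zero bias [dim_out]."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(dim_out)] for _ in range(dim_in)]
    return w, [0.0] * dim_out

rng = random.Random(0)
W1, b1 = init(8, 6, rng)
W2, b2 = init(6, 4, rng)
W3, b3 = init(4, 3, rng)

X = [[rng.random() for _ in range(8)] for _ in range(2)]  # b = 2 samples
h1 = relu(add_bias(matmul(X, W1), b1))   # [2, 6]
h2 = relu(add_bias(matmul(h1, W2), b2))  # [2, 4]
out = add_bias(matmul(h2, W3), b3)       # [2, 3], no relu on the output
print(len(out), len(out[0]))  # 2 3
```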
The shapes go from [1,784] to [1,512] to [1,256] to [1,10] (the code below uses 256 and 128 hidden units instead)
The [1,10] output is compared with the one-hot encoded label
The error is measured with Euclidean distance or MSE (mean squared error)
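Comparing an output with a one-hot label, sketched in plain Python. The output vector is invented for illustration. The squared Euclidean distance equals the number of elements times the MSE, so minimizing either one leads to the same solution.

```python
import math

def one_hot(label, depth=10):
    vec = [0.0] * depth
    vec[label] = 1.0
    return vec

def mse(pred, target):
    """Mean of squared differences over all elements."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def euclidean(pred, target):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))

# Hypothetical network output for one image whose true label is 2
out = [0.1, 0.0, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = one_hot(2)
print(round(mse(out, y), 4))        # 0.014
print(round(euclidean(out, y), 4))  # 0.3742
```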
In short, the three-layer network maps a [1,784] input to a [1,10] output
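A short worked example of the shape chain in the text, which also counts the weights and biases of each layer (dim_in * dim_out + dim_out). The widths 784, 512, 256, 10 are taken from the text.

```python
# Layer widths from the text: 784 -> 512 -> 256 -> 10
dims = [784, 512, 256, 10]

shape = (1, dims[0])
total_params = 0
for dim_in, dim_out in zip(dims[:-1], dims[1:]):
    # [1, dim_in] @ [dim_in, dim_out] + [dim_out] -> [1, dim_out]
    assert shape[1] == dim_in
    shape = (shape[0], dim_out)
    layer_params = dim_in * dim_out + dim_out  # weights + biases
    total_params += layer_params
    print(f"[1, {dim_in}] -> [1, {dim_out}]: {layer_params} parameters")

print(total_params)  # 535818
```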

Forward propagation (tensor) - hands-on

import os

# do not print irrelevant TensorFlow log messages; this only takes effect if set before importing TensorFlow
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets

# x: [60k,28,28]
# y: [60k]
(x, y), _ = datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step

# convert the NumPy arrays to tensors
# x: [0~255] ==> [0~1.]
x = tf.convert_to_tensor(x, dtype=tf.float32) / 255.
y = tf.convert_to_tensor(y, dtype=tf.int32)

f'x.shape: {x.shape}, y.shape: {y.shape}, x.dtype: {x.dtype}, y.dtype: {y.dtype}'

"x.shape: (60000, 28, 28), y.shape: (60000,), x.dtype: <dtype: 'float32'>, y.dtype: <dtype: 'int32'>"

f'min_x: {tf.reduce_min(x)}, max_x: {tf.reduce_max(x)}'

'min_x: 0.0, max_x: 1.0'

f'min_y: {tf.reduce_min(y)}, max_y: {tf.reduce_max(y)}'

'min_y: 0, max_y: 9'

# batch of 128
train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)
train_iter = iter(train_db)
sample = next(train_iter)
f'batch: {sample[0].shape,sample[1].shape}'

'batch: (TensorShape([128, 28, 28]), TensorShape([128]))'
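Because `Dataset.batch` keeps the final partial batch by default (`drop_remainder=False`), 60k samples in batches of 128 give 469 batches, the last holding only 96 samples. That is why the step counter in the training log runs from 0 to 468 in every epoch, even though only multiples of 100 are printed. A plain-Python sketch of the same grouping (the `batches` helper is hypothetical):

```python
def batches(n_samples, batch_size):
    """Yield (start, end) index pairs, keeping a smaller final batch
    (the behaviour of Dataset.batch with drop_remainder=False)."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

spans = list(batches(60000, 128))
print(len(spans))                   # 469
print(spans[-1])                    # (59904, 60000)
print(spans[-1][1] - spans[-1][0])  # 96
```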

# [b,784] ==> [b,256] ==> [b,128] ==> [b,10]
# [dim_in,dim_out],[dim_out]
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
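`tf.random.truncated_normal` draws from a normal distribution but discards and re-draws any value more than two standard deviations from the mean, so with stddev=0.1 every initial weight lies in [-0.2, 0.2]. A plain-Python sketch of that rule (the `truncated_normal` helper here is hypothetical, not the TensorFlow implementation):

```python
import random

def truncated_normal(shape, mean=0.0, stddev=0.1, rng=None):
    """Sample a [rows, cols] matrix from N(mean, stddev^2), re-drawing any
    value more than 2 * stddev away from the mean."""
    rng = rng or random.Random()
    rows, cols = shape
    out = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            v = rng.gauss(mean, stddev)
            while abs(v - mean) > 2 * stddev:
                v = rng.gauss(mean, stddev)
            row.append(v)
        out.append(row)
    return out

w1 = truncated_normal([784, 256], stddev=0.1, rng=random.Random(42))
flat = [v for row in w1 for v in row]
print(len(w1), len(w1[0]))               # 784 256
print(max(abs(v) for v in flat) <= 0.2)  # True
```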

# learning rate
lr = 1e-3

for epoch in range(10):  # iterate over the whole dataset 10 times
    # train on every batch of train_db
    for step, (x, y) in enumerate(train_db):
        # x: [128,28,28]
        # y: [128]

        # [b,28,28] ==> [b,28*28]
        x = tf.reshape(x, [-1, 28*28])

        with tf.GradientTape() as tape:  # trainable tf.Variable objects are watched automatically
            # x: [b,28*28]
            # h1 = x@w1 + b1
            # [b,784]@[784,256]+[256] ==> [b,256] + [256] ==> [b,256] + [b,256]
            h1 = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)
            # [b,256] ==> [b,128]
            # b2: [128] is broadcast to [b,128] automatically, so tf.broadcast_to is not needed here
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # [b,128] ==> [b,10]
            out = h2 @ w3 + b3

            # compute loss
            # out: [b,10]
            # y:[b] ==> [b,10]
            y_onehot = tf.one_hot(y, depth=10)

            # mse = mean((y_onehot - out)^2), averaged over all b*10 elements
            # [b,10]
            loss = tf.square(y_onehot - out)
            # mean:scalar
            loss = tf.reduce_mean(loss)

        # compute gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
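        # update rule: w1 = w1 - lr * w1_grad
        # writing w1 = w1 - lr * grads[0] would rebind w1 to a plain tf.Tensor that the
        # tape no longer watches on the next step, so update each Variable in place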
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])
        w3.assign_sub(lr * grads[4])
        b3.assign_sub(lr * grads[5])

        if step % 100 == 0:
            print(f'epoch:{epoch}, step: {step}, loss:{float(loss)}')

epoch:0, step: 0, loss:0.5366693735122681
epoch:0, step: 100, loss:0.23276552557945251
epoch:0, step: 200, loss:0.19647717475891113
epoch:0, step: 300, loss:0.17389704287052155
epoch:0, step: 400, loss:0.1731622964143753
epoch:1, step: 0, loss:0.16157487034797668
epoch:1, step: 100, loss:0.16654588282108307
epoch:1, step: 200, loss:0.15311869978904724
epoch:1, step: 300, loss:0.14135733246803284
epoch:1, step: 400, loss:0.14423415064811707
epoch:2, step: 0, loss:0.13703864812850952
epoch:2, step: 100, loss:0.14255204796791077
epoch:2, step: 200, loss:0.1302051544189453
epoch:2, step: 300, loss:0.12224273383617401
epoch:2, step: 400, loss:0.12742099165916443
epoch:3, step: 0, loss:0.1219201311469078
epoch:3, step: 100, loss:0.12757658958435059
epoch:3, step: 200, loss:0.11587800830602646
epoch:3, step: 300, loss:0.10984969139099121
epoch:3, step: 400, loss:0.11641304194927216
epoch:4, step: 0, loss:0.11171815544366837
epoch:4, step: 100, loss:0.11717887222766876
epoch:4, step: 200, loss:0.10604140907526016
epoch:4, step: 300, loss:0.10111508518457413
epoch:4, step: 400, loss:0.10865814983844757
epoch:5, step: 0, loss:0.10434548556804657
epoch:5, step: 100, loss:0.10952303558588028
epoch:5, step: 200, loss:0.09875871241092682
epoch:5, step: 300, loss:0.09467941522598267
epoch:5, step: 400, loss:0.10282392799854279
epoch:6, step: 0, loss:0.09874211996793747
epoch:6, step: 100, loss:0.10355912148952484
epoch:6, step: 200, loss:0.09315416216850281
epoch:6, step: 300, loss:0.08971598744392395
epoch:6, step: 400, loss:0.0982089415192604
epoch:7, step: 0, loss:0.09428335726261139
epoch:7, step: 100, loss:0.09877124428749084
epoch:7, step: 200, loss:0.08866965025663376
epoch:7, step: 300, loss:0.08573523908853531
epoch:7, step: 400, loss:0.09440126270055771
epoch:8, step: 0, loss:0.09056715667247772
epoch:8, step: 100, loss:0.09483197331428528
epoch:8, step: 200, loss:0.0849832147359848
epoch:8, step: 300, loss:0.08246967941522598
epoch:8, step: 400, loss:0.09117519855499268
epoch:9, step: 0, loss:0.08741479367017746
epoch:9, step: 100, loss:0.09150294959545135
epoch:9, step: 200, loss:0.08185736835002899
epoch:9, step: 300, loss:0.07972464710474014
epoch:9, step: 400, loss:0.08842341601848602
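Each `assign_sub` call in the loop applies the gradient-descent rule w = w - lr * grad. The sketch below applies the same rule by hand to a toy one-feature linear model with an MSE loss; the data, learning rate and step count are assumptions chosen so that it converges, and the gradients are derived by hand instead of with `tf.GradientTape`.

```python
# Toy data (assumed) generated by y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
lr = 0.05

def loss_fn(w, b):
    """Mean squared error of the prediction w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

losses = []
for step in range(500):
    n = len(xs)
    # d(loss)/dw and d(loss)/db for the mean squared error
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # same rule as w1.assign_sub(lr * grads[0]) in the TensorFlow loop
    w -= lr * grad_w
    b -= lr * grad_b
    losses.append(loss_fn(w, b))

print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

As in the training log, the loss falls quickly at first and then more slowly as the parameters approach their optimum.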

Posted by aromakat on Tue, 12 Nov 2019 12:19:01 -0800

Programmer Group
