After learning TensorFlow and training a simple MLP, an autoencoder, and a CNN on the MNIST dataset, I wondered whether I could build my own dataset and train a convolutional neural network on it. Searching online, I found that the standard TFRecords format could be used. There was a problem, though: the TFRecords dataset was built without trouble, but errors were reported when training ran on it, and I could not find a relevant fix online. I eventually worked out a solution myself. If you know a better way, feel free to share it.
1. Preparing data
I prepared two categories of images, cat and dog. They are stored on the D drive under the train_data folder, one subfolder per class: D:/train_data/dog/ and D:/train_data/cat/.
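Before building the TFRecords file, it can help to confirm that each class folder exists and actually contains files. The sketch below is one way to do that; `count_images` is a hypothetical helper name, and the layout it assumes (one subfolder per class under a root folder) is the one used by the script in the next section.

```python
import os

def count_images(root, classes):
    """Return {class_name: number of files} for each class subfolder of root."""
    counts = {}
    for name in classes:
        class_path = os.path.join(root, name)
        # A missing folder is reported as 0 instead of raising an error
        if not os.path.isdir(class_path):
            counts[name] = 0
            continue
        counts[name] = sum(
            1 for f in os.listdir(class_path)
            if os.path.isfile(os.path.join(class_path, f))
        )
    return counts
```

Calling `count_images("D:/train_data", ["dog", "cat"])` should report a non-zero count for both classes before you move on.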
2. Making tfrecords files
The script is named make_own_data.py.
The script labels every image with the index of its class folder, so all images in the same folder receive the same label.
The code is as follows:
# -*- coding: utf-8 -*-
"""
@author: caokai
"""
import os
import tensorflow as tf
from PIL import Image

cwd = 'D:/train_data/'
# A list (not a set) keeps the order fixed, so dog -> 0 and cat -> 1 on every run
classes = ['dog', 'cat']
writer = tf.python_io.TFRecordWriter("dog_and_cat_train.tfrecords")  # file to be generated

for index, name in enumerate(classes):
    class_path = cwd + name + '/'
    for img_name in os.listdir(class_path):
        img_path = class_path + img_name  # path of each image
        # Force 3 channels so the reader can always reshape to 128x128x3
        img = Image.open(img_path).convert('RGB')
        img = img.resize((128, 128))
        img_raw = img.tobytes()  # raw pixel bytes
        # The Example object packs the label and the image bytes together
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
            "img_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
        }))
        writer.write(example.SerializeToString())  # serialize to a string and write it

writer.close()
In this way, every dog and cat image is stored together with its class label, 0 or 1, and the result is saved as dog_and_cat_train.tfrecords. You will find this file in the folder where your Python script is located.
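Because the label is simply the position of the class name in `classes`, the order of that collection decides which class gets 0 and which gets 1. A Python set has no guaranteed order, so a list is the safer choice. A minimal sketch of the mapping follows; `label_map` and `raw_bytes_per_image` are names used only for illustration, and the byte count assumes 3-channel RGB images.

```python
classes = ['dog', 'cat']  # a list keeps this order on every run

# enumerate yields (index, name) pairs; the index is used as the label
label_map = {name: index for index, name in enumerate(classes)}
print(label_map)

# Each stored image is 128x128 pixels with 3 channels of 1 byte each (assuming RGB)
raw_bytes_per_image = 128 * 128 * 3
print(raw_bytes_per_image)
```

If a stored record's `img_raw` field has a different length, the later reshape to 128x128x3 will fail, which usually points to a grayscale or RGBA source image.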
3. Read tfrecords files
The reader returns each image together with its label. The image is reshaped to 128x128x3.
The reading code lives in a separate file named ReadMyOwnData.py.
The code is as follows:
# -*- coding: utf-8 -*-
"""
tensorflow : read my own dataset
@author: caokai
"""
import tensorflow as tf

def read_and_decode(filename):
    # Read dog_and_cat_train.tfrecords
    filename_queue = tf.train.string_input_producer([filename])  # build a filename queue
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)  # returns (key, serialized record)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img_raw': tf.FixedLenFeature([], tf.string),
                                       })  # extract the label and the image bytes
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [128, 128, 3])  # reshape to a 128x128 image with 3 channels
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5  # scale pixels to [-0.5, 0.5]
    label = tf.cast(features['label'], tf.int32)
    return img, label
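To see what the decode, reshape, and cast steps do to the data, here is a NumPy sketch of the same arithmetic applied to one image's raw bytes. It is not part of the reader itself: `decode_image` is a hypothetical name, and the input is synthetic data standing in for one stored `img_raw` value.

```python
import numpy as np

def decode_image(raw_bytes):
    """NumPy counterpart of the decode_raw / reshape / cast steps."""
    img = np.frombuffer(raw_bytes, dtype=np.uint8)      # flat array of 128*128*3 bytes
    img = img.reshape(128, 128, 3)                      # height, width, channels
    return img.astype(np.float32) * (1.0 / 255) - 0.5   # scale pixels to [-0.5, 0.5]

# Synthetic all-white image standing in for one stored record
fake_raw = np.full((128, 128, 3), 255, dtype=np.uint8).tobytes()
decoded = decode_image(fake_raw)
print(decoded.shape, decoded.min(), decoded.max())
```

A pixel value of 0 maps to -0.5 and 255 maps to roughly 0.5, so the network sees inputs centered on zero.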
4. Training with a convolutional neural network
This part of the Python code is named dog_and_cat_train.py
4.1 Defining the structure of the convolutional neural network
We import ReadMyOwnData, the reader module written above. Weights are initialized with tf.truncated_normal. The network has two convolution layers, each followed by a ReLU activation and 4x4 max pooling, then a fully connected layer with dropout, and finally a softmax layer, y_conv, that outputs probabilities for the two classes. The loss is cross-entropy, minimized with the Adam optimizer.
The convolution code is as follows:
# -*- coding: utf-8 -*-
"""
@author: caokai
"""
import tensorflow as tf
import numpy as np
import ReadMyOwnData

batch_size = 50

# initial weights
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

# initial bias
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# convolution layer
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# max_pool layer
def max_pool_4x4(x):
    return tf.nn.max_pool(x, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')

x = tf.placeholder(tf.float32, [batch_size, 128, 128, 3])
y_ = tf.placeholder(tf.float32, [batch_size, 1])  # class index (0 or 1) per image
# Convert the class indices to one-hot vectors so they line up with the 2 softmax outputs
y_onehot = tf.one_hot(tf.cast(tf.reshape(y_, [-1]), tf.int32), 2)

# first convolution and max_pool layer
W_conv1 = weight_variable([5, 5, 3, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
h_pool1 = max_pool_4x4(h_conv1)

# second convolution and max_pool layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_4x4(h_conv2)

# Flatten the feature maps and feed them to a fully connected layer
reshape = tf.reshape(h_pool2, [batch_size, -1])
dim = reshape.get_shape()[1].value
W_fc1 = weight_variable([dim, 1024])
b_fc1 = bias_variable([1024])
h_fc1 = tf.nn.relu(tf.matmul(reshape, W_fc1) + b_fc1)

# dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 2])
b_fc2 = bias_variable([2])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Loss function and optimization algorithm
# Clip the probabilities so tf.log never receives 0
cross_entropy = tf.reduce_mean(-tf.reduce_sum(
    y_onehot * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_onehot, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
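Two details of this network are easy to check by hand: the size of the flattened feature vector that feeds the fully connected layer, and the value of the cross-entropy loss when the softmax outputs are compared with one-hot labels. The sketch below works both out in plain Python and NumPy. The helper names `same_pool_out` and `cross_entropy` are hypothetical, and the probabilities are made-up numbers chosen only to illustrate the formula.

```python
import math
import numpy as np

def same_pool_out(size, stride):
    # With padding='SAME', the output size is ceil(input / stride)
    return math.ceil(size / stride)

side = 128
for _ in range(2):  # two conv + pool stages; a stride-1 SAME convolution keeps the size
    side = same_pool_out(side, 4)
flat_dim = side * side * 64  # 64 channels after the second convolution
print(side, flat_dim)

def cross_entropy(probs, labels, num_classes=2):
    """Mean cross-entropy between softmax outputs and integer class labels."""
    one_hot = np.eye(num_classes)[labels]   # shape (batch, num_classes)
    probs = np.clip(probs, 1e-10, 1.0)      # avoid log(0)
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

# Made-up softmax outputs for two images, with true classes 0 and 1
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)
print(loss)
```

The labels must have one column per class for the element-wise product to select the probability of the true class. A label array with a single column would broadcast across both outputs and give a misleading loss.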
4.2 Training
Read the data from your own dataset and initialize the session and queue runners:
image, label = ReadMyOwnData.read_and_decode("dog_and_cat_train.tfrecords")
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
When feeding the training data, I tried the approaches from other articles on the Internet and ran into the following error:
Cannot feed value of shape (128, 128, 3) for Tensor u'Placeholder_12:0', which has shape '(50, 128, 128, 3)'
Or the following error:
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.
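The first error appears when a single image is fed to a placeholder that expects a whole batch. The second appears when the tensor returned by read_and_decode is fed directly instead of its evaluated value. After some experimenting, I got past both by evaluating the image and label with sess.run and copying the results into NumPy arrays of batch size: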
example = np.zeros((batch_size, 128, 128, 3))
l = np.zeros((batch_size, 1))

try:
    for i in range(20):
        # Fill one batch by pulling images and labels out of the queue one at a time
        for j in range(batch_size):
            example[j], l[j] = sess.run([image, label])
        train_step.run(feed_dict={x: example, y_: l, keep_prob: 0.5})
        # eval only runs the forward pass and does not update the weights;
        # dropout is disabled (keep_prob=1.0) when measuring accuracy
        print(accuracy.eval(feed_dict={x: example, y_: l, keep_prob: 1.0}))
except tf.errors.OutOfRangeError:
    print('done!')
finally:
    coord.request_stop()
    coord.join(threads)
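The shape mismatch behind the first error is easy to reproduce without TensorFlow: the placeholder expects a batch of shape (50, 128, 128, 3), while a single decoded image has shape (128, 128, 3). The NumPy sketch below shows the same fill-one-row-at-a-time idea, with random arrays standing in for the values returned by sess.run. `next_sample` is a hypothetical stand-in and not part of the original code.

```python
import numpy as np

batch_size = 50
rng = np.random.default_rng(0)

def next_sample():
    """Hypothetical stand-in for sess.run([image, label]): one image, one label."""
    image = rng.random((128, 128, 3), dtype=np.float32) - 0.5
    label = int(rng.integers(0, 2))
    return image, label

# Preallocate the batch, then fill it row by row
images = np.zeros((batch_size, 128, 128, 3), dtype=np.float32)
labels = np.zeros((batch_size, 1), dtype=np.float32)
for j in range(batch_size):
    images[j], labels[j] = next_sample()

print(images.shape, labels.shape)
```

The resulting arrays are ordinary NumPy values with the batch dimension in front, which is what feed_dict accepts.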
If you have a better or more efficient way to read a tfrecords dataset and train a CNN, I would be glad to hear about it.
References:
TensorFlow (2) Making its own TFRecord data set read, display and code details