http://blog.csdn.net/weixin_35653315/article/details/71015845 described how to convert the Pascal VOC dataset into tfrecord files. This post uses slim to read the generated tfrecords; the low-level read and decode operations are performed by tf.TFRecordReader.
import tensorflow as tf
slim = tf.contrib.slim
file_pattern = './pascal_train_*.tfrecord'  # file name pattern for the tfrecord shards

# Adapter 1: deserialize each Example proto back into the format it was stored in.
# Handled by plain TensorFlow.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
    'image/height': tf.FixedLenFeature([1], tf.int64),
    'image/width': tf.FixedLenFeature([1], tf.int64),
    'image/channels': tf.FixedLenFeature([1], tf.int64),
    'image/shape': tf.FixedLenFeature([3], tf.int64),
    'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/label': tf.VarLenFeature(dtype=tf.int64),
    'image/object/bbox/difficult': tf.VarLenFeature(dtype=tf.int64),
    'image/object/bbox/truncated': tf.VarLenFeature(dtype=tf.int64),
}
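The split between FixedLenFeature and VarLenFeature mirrors the data: per-image properties (height, width, shape) have a known length, while the per-object bbox fields vary with the number of boxes in each image. A pure-Python sketch of what two deserialized examples might carry (the keys come from the dict above; the values are invented for illustration):

```python
# Hypothetical contents of two deserialized Examples (values invented).
# Fixed-length features have the same length in every example; variable-length
# features (the per-object bbox fields) grow with the number of objects.
example_a = {
    'image/shape': [375, 500, 3],            # always 3 values (FixedLenFeature([3]))
    'image/object/bbox/xmin': [0.1, 0.4],    # two objects in this image
    'image/object/bbox/label': [12, 15],
}
example_b = {
    'image/shape': [333, 500, 3],
    'image/object/bbox/xmin': [0.25],        # a single object here
    'image/object/bbox/label': [7],
}

for ex in (example_a, example_b):
    # fixed-length: every example stores exactly 3 shape values
    assert len(ex['image/shape']) == 3
    # variable-length: one coordinate and one label per object
    assert len(ex['image/object/bbox/xmin']) == len(ex['image/object/bbox/label'])
```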
# Adapter 2: assemble the deserialized data into higher-level items. Handled by slim.
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
    'shape': slim.tfexample_decoder.Tensor('image/shape'),
    'object/bbox': slim.tfexample_decoder.BoundingBox(
        ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'),
    'object/label': slim.tfexample_decoder.Tensor('image/object/bbox/label'),
    'object/difficult': slim.tfexample_decoder.Tensor('image/object/bbox/difficult'),
    'object/truncated': slim.tfexample_decoder.Tensor('image/object/bbox/truncated'),
}
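The BoundingBox handler gathers the four coordinate lists under the 'image/object/bbox/' prefix and stacks them side by side into one N×4 tensor, one row per object, in the [ymin, xmin, ymax, xmax] order listed above. A plain-Python sketch of that stacking (illustrative only, not slim's actual implementation):

```python
def stack_bboxes(ymin, xmin, ymax, xmax):
    """Mimic the effect of slim.tfexample_decoder.BoundingBox: combine four
    per-object coordinate lists into an N x 4 matrix, one row per object."""
    return [[y0, x0, y1, x1] for y0, x0, y1, x1 in zip(ymin, xmin, ymax, xmax)]

# Two objects; coordinates are normalized to [0, 1].
boxes = stack_bboxes([0.1, 0.3], [0.2, 0.4], [0.5, 0.8], [0.6, 0.9])
print(boxes)  # [[0.1, 0.2, 0.5, 0.6], [0.3, 0.4, 0.8, 0.9]]
```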
# Decoder
decoder = slim.tfexample_decoder.TFExampleDecoder(keys_to_features, items_to_handlers)
# The Dataset object holds the meta-information of the dataset: file locations,
# decoding method, number of samples, etc.
dataset = slim.dataset.Dataset(
    data_sources=file_pattern,
    reader=tf.TFRecordReader,
    num_samples=3,  # three files were generated by hand, each holding a single example
    decoder=decoder,
    items_to_descriptions={},
    num_classes=21)
# The provider reads data according to the dataset's meta-information.
provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=3,
    shuffle=False)
[image, shape, glabels, gbboxes] = provider.get(['image', 'shape',
                                                 'object/label',
                                                 'object/bbox'])
print(type(image))   # <class 'tensorflow.python.framework.ops.Tensor'>
print(image.shape)   # (?, ?, 3)
So far `image` is still a single three-dimensional tensor: the provider yields one example at a time, so a batch has to be assembled. Before batching, the images need pre-processing: resizing every image to a fixed size, plus data augmentation. Note also that these are symbolic tensors; actually evaluating them requires a tf.Session with the input queues started via tf.train.start_queue_runners. The code above comes from https://github.com/balancap/SSD-Tensorflow/blob/master/datasets/pascalvoc_common.py#L49 ; the following code comes from https://github.com/balancap/SSD-Tensorflow/blob/master/train_ssd_network.py#L203
# Pre-process image, labels and bboxes.
image, glabels, gbboxes = \
    image_preprocessing_fn(image, glabels, gbboxes,
                           out_shape=ssd_shape,
                           data_format=DATA_FORMAT)
# Encode groundtruth labels and bboxes against the anchors.
gclasses, glocalisations, gscores = \
    ssd_net.bboxes_encode(glabels, gbboxes, ssd_anchors)
batch_shape = [1] + [len(ssd_anchors)] * 3

# Training batches and queue.
r = tf.train.batch(
    tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
    batch_size=FLAGS.batch_size,
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=5 * FLAGS.batch_size)
b_image, b_gclasses, b_glocalisations, b_gscores = \
    tf_utils.reshape_list(r, batch_shape)
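tf.train.batch only accepts a flat list of tensors, while gclasses, glocalisations and gscores are each a list with one entry per anchor layer; reshape_list flattens the nested structure before batching and rebuilds it afterwards from batch_shape. A sketch of such a helper, under the assumption that it flattens one level of nesting and regroups a flat list by the given sizes (the real tf_utils.reshape_list in SSD-Tensorflow plays this role):

```python
def reshape_list(items, shape=None):
    """Flatten a list of (tensor | list of tensors) when shape is None;
    otherwise regroup a flat list into sublists of the given sizes
    (a size of 1 yields a bare element, not a singleton list)."""
    if shape is None:
        flat = []
        for item in items:
            if isinstance(item, (list, tuple)):
                flat.extend(item)
            else:
                flat.append(item)
        return flat
    grouped, i = [], 0
    for size in shape:
        grouped.append(items[i] if size == 1 else list(items[i:i + size]))
        i += size
    return grouped

# With, say, 2 anchor layers: batch_shape = [1] + [2] * 3
batch_shape = [1, 2, 2, 2]
flat = reshape_list(['img', ['c1', 'c2'], ['l1', 'l2'], ['s1', 's2']])
print(flat)  # ['img', 'c1', 'c2', 'l1', 'l2', 's1', 's2']
image, gclasses, glocalisations, gscores = reshape_list(flat, batch_shape)
print(gclasses)  # ['c1', 'c2']
```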
# Intermediate queueing: unique batch computation pipeline for all
# GPUs running the training.
batch_queue = slim.prefetch_queue.prefetch_queue(
    tf_utils.reshape_list([b_image, b_gclasses, b_glocalisations, b_gscores]),
    capacity=2 * deploy_config.num_clones)