Data loading in deep learning (tensorflow2.0 as an example)

Keywords: Pycharm Python network Google

background knowledge

In deep learning, data preprocessing takes up a large part of our training process. How to use different databases to load our data is particularly important. So this blog will combine Data loading in tensorflow 2.0 The official documentation describes how to load our own tagged data and tags.

testing environment

  • Operating system: Windows10
  • Deep learning framework: Tensorflow==2.0 or plus
  • IDE: Pycharm

Introduction to custom data format

                          . Because I am interested in the field of image, so the data will take image classification as an example to demonstrate and explain.

The link for data download is   Demo data download
   
Simply look at the data storage mode. The directory structure of the data storage mode is as follows:

You can see that the parent node directory of each image is the result of their classification, and the content in each folder is the classified image. After we have prepared the data, we can start the operation of the data loading part.

Custom data loading

The data loading functions to be used in tensorflow==2.0 are:

# The main function of data recording module is to dynamically obtain corresponding slices by using iterator
ds = tf.data.Dataset.from_tensor_slices()
# Using a set of function functions output by ds is a mapping (this is similar to the transform in Python)
ds = ds.map(function)
# From the function name, it can be seen that the function is to scramble the whole data and fill it up (waiting for a little time) where the buffer size is buffer_size, and restart it by repeat
obj = tf.data.experimental.shuffle_and_repeat(buffer_size = buffer_size)
# Set batch size, object intrinsic property
ds = ds.batch(BATCH_SIZE)
# Get data from cache, object intrinsic properties
ds = ds.prefetch(buffer_size=AUTOTUNE)

With the above function introduction, now start the operation steps:

  • Get the path and label of the image as the parameter of the function DS = tf.data.dataset.from menu slices
  • Using ds.map to add image reading, preprocessing (normalization, resize, etc.) operations
  • Scramble the resulting data and set the batch [size]
  • get data

Preset parameters

import tensorflow as tf
AUTOTUNE = tf.data.experimental.AUTOTUNE

Path and label of image, get object ds

# Data_root here is the address where you download the image and save it. Here, data_root = "C:\Users\wu\.keras\datasets\flower_photos"pathlib generates a path object
import pathlib
data_root = pathlib.Path(data_root)
# Object to get the path of all pictures
all_image_paths = list(data_root.glob("*/*"))
# String to get the path of all pictures
all_image_paths = [str(path) for path in all_image_paths]
# Disrupt all paths
import random
random.shuffle(all_image_paths)
# Picture quantity
image_count = len(all_image_paths)
# Image name
label_names = sorted(item.name for item in data_root.glob("*/") if item.is_dir())
# Set mapping for image name, mapping result is integer
label_to_index = dict((name, index) for index, name in enumerate(label_names))
# Get the label corresponding to all images
all_image_labels = [label_to_index[pathlib.Path(path).parent.name] for path in all_image_paths]

# Get object ds
ds = tf.data.Dataset.from_tensor_slices((all_image_paths,all_image_labels))

Image reading, preprocessing

# Preprocessing picture
def preprocess_image(image):
    image = tf.image.decode_jpeg(image,channels=3)
    image = tf.image.resize(image,[192,192])
    image /=255.0
    return image
# Read picture + preprocess
def load_and_preprocess_image(path):
    image = tf.io.read_file(path)
    return preprocess_image(image)
# Read picture + preprocess + get the corresponding label of picture
def load_and_preprocess_from_path_label(path,label):
    return load_and_preprocess_image(path),label
# Attach the preprocessed part of the picture to the sliced object
image_label_ds = ds.map(load_and_preprocess_from_path_label)

Scramble the resulting data and set the batch [size]

ds = image_label_ds.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds = ds.batch(Batch_Size)
ds = ds.prefetch(buffer_size=AUTOTUNE)

get data

def change_range(image,label):
    return 2*image-1,label
keras_ds = ds.map(change_range)
image_batch,label_batch = next(iter(keras_ds))

At this point, the image data can be read and input to the neural network.

1. https://tensorflow.google.cn/tutorials/

Published 5 original articles, won praise 13, visited 10000+
Private letter follow

Posted by austenr on Sun, 15 Mar 2020 20:29:32 -0700