Introduction to MNIST Data Set

Keywords: REST

Data Set Introduction

The MNIST data set comes from the National Institute of Standards and Technology (NIST).

The digits were written by 250 different people, 50% of whom were high school students and 50% Census Bureau staff.

The MNIST data set is available at http://yann.lecun.com/exdb/mnist/, which consists of four parts:

Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB after decompression, containing 60,000 samples)
Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB after decompression, containing 60,000 labels)
Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB after decompression, containing 10,000 samples)
Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB after decompression, containing 10,000 labels)
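
The four files use the IDX binary format: a 4-byte header (two zero bytes, a data-type code, and the number of dimensions), one big-endian 32-bit size per dimension, and then the raw bytes. As a rough sketch of how such a file could be read without TensorFlow, the following helper parses that layout with NumPy. The function name read_idx is a hypothetical name chosen for illustration, and the helper only handles the unsigned-byte type used by MNIST.

```python
import gzip
import struct

import numpy as np


def read_idx(path):
    """Read a gzip-compressed IDX file into a NumPy array of unsigned bytes.

    read_idx is an illustrative helper, not part of any library.
    """
    with gzip.open(path, "rb") as f:
        # Header: two zero bytes, a data-type code, and the number of dimensions.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file: %s" % path)
        # One big-endian 32-bit size per dimension.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        # Everything after the header is raw pixel or label data.
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)


# Usage, assuming the files were downloaded to the current directory:
# images = read_idx("train-images-idx3-ubyte.gz")  # shape (60000, 28, 28)
# labels = read_idx("train-labels-idx1-ubyte.gz")  # shape (60000,)
```

This also explains the decompressed sizes listed above: 60,000 images of 28 x 28 bytes plus a 16-byte header come to about 47 MB.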

Data Set Download

MNIST is the introductory data set for TensorFlow, and a helper API for it is bundled with TensorFlow. The tensorflow.examples.tutorials module used here ships with TensorFlow 1.x and was removed in TensorFlow 2.x. The following code downloads the data set and prints some of its contents.

# coding:utf-8
# Import input_data from tensorflow.examples.tutorials.mnist, the helper module TensorFlow provides for its MNIST tutorials.
from tensorflow.examples.tutorials.mnist import input_data

# Read the MNIST data from the MNIST_data/ directory. The files are downloaded automatically if they do not exist.
mnist = input_data.read_data_sets("/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/", one_hot=True)

# View the size of training data
print(mnist.train.images.shape)  # (55000, 784)
print(mnist.train.labels.shape)  # (55000, 10)

# View the size of validation data
print(mnist.validation.images.shape)  # (5000, 784)
print(mnist.validation.labels.shape)  # (5000, 10)

# View the size of the test data
print(mnist.test.images.shape)  # (10000, 784)
print(mnist.test.labels.shape)  # (10000, 10)

# Print the vector representation of the first training image (index 0)
print(mnist.train.images[0, :])

# Print the one-hot label of the first training image (index 0)
print(mnist.train.labels[0, :])
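
The shapes printed above (55,000 training and 5,000 validation samples, 784 values per image, 10 values per label) can be reproduced from raw 28x28 images and integer labels. The sketch below shows one way to do that with NumPy. It is not the TensorFlow implementation: the function name prepare_split is hypothetical, and taking the first validation_size samples as the validation set, as well as rescaling pixels to the range 0.0 to 1.0, are assumptions made to match the output above. The demo uses a small synthetic array rather than real digits.

```python
import numpy as np


def prepare_split(images, labels, validation_size=5000, num_classes=10):
    """Flatten, rescale, one-hot encode, and split raw MNIST-style arrays.

    Assumed input: images is uint8 with shape (N, 28, 28) and labels is an
    integer array with shape (N,). The split rule is an assumption.
    """
    n = images.shape[0]
    # Flatten each 28x28 image to 784 values and rescale 0..255 to 0.0..1.0.
    flat = images.reshape(n, -1).astype(np.float32) / 255.0
    # One-hot encode: row i has a single 1 at column labels[i].
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    # The first validation_size samples form the validation set.
    validation = (flat[:validation_size], one_hot[:validation_size])
    train = (flat[validation_size:], one_hot[validation_size:])
    return train, validation


# Synthetic stand-in for the training files (random pixels, not real digits).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(600, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=600)

(train_x, train_y), (val_x, val_y) = prepare_split(
    fake_images, fake_labels, validation_size=50)
print(train_x.shape, train_y.shape)  # (550, 784) (550, 10)
print(val_x.shape, val_y.shape)      # (50, 784) (50, 10)
```

With the real 60,000 training samples and validation_size=5000, the same function yields the (55000, 784) and (5000, 784) shapes shown above.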

Basic operation of data set

Print data set labels

label.py

# coding: utf-8
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
# Read the MNIST data set. If it does not exist, it is downloaded first.
mnist = input_data.read_data_sets("/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/", one_hot=True)

# Look at the label of the first 20 training pictures
for i in range(20):
    # Get the one-hot representation, such as (0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
    one_hot_label = mnist.train.labels[i, :]
    # np.argmax returns the index of the largest entry, which is the original label,
    # because exactly one entry is 1 and the rest are 0.
    label = np.argmax(one_hot_label)
    print('mnist_train_%d.jpg label: %d' % (i, label))

The results are as follows:

mnist_train_0.jpg label: 7
mnist_train_1.jpg label: 3
mnist_train_2.jpg label: 4
mnist_train_3.jpg label: 6
mnist_train_4.jpg label: 1
mnist_train_5.jpg label: 8
mnist_train_6.jpg label: 1
mnist_train_7.jpg label: 0
mnist_train_8.jpg label: 9
mnist_train_9.jpg label: 8
mnist_train_10.jpg label: 0
mnist_train_11.jpg label: 3
mnist_train_12.jpg label: 1
mnist_train_13.jpg label: 2
mnist_train_14.jpg label: 7
mnist_train_15.jpg label: 0
mnist_train_16.jpg label: 2
mnist_train_17.jpg label: 9
mnist_train_18.jpg label: 6
mnist_train_19.jpg label: 0
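
The loop above decodes one label at a time. As a small self-contained illustration of the same idea, the sketch below one-hot encodes the first five labels from the output above and decodes the whole batch in a single np.argmax call along axis 1. The variable names are chosen for illustration only.

```python
import numpy as np

# Labels of the first five training images, taken from the output above.
labels = np.array([7, 3, 4, 6, 1])

# One-hot encode: pick row k of the 10x10 identity matrix for label k.
one_hot = np.eye(10, dtype=np.float32)[labels]
print(one_hot[0])  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

# Decode the whole batch at once: argmax along axis 1 finds the position
# of the single 1 in every row.
decoded = np.argmax(one_hot, axis=1)
print(decoded)  # [7 3 4 6 1]
```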

Save as jpg picture

Each sample in the data set is an image stored as a vector, so we can write the samples back out as image files.

save_pic.py

#coding: utf-8
from tensorflow.examples.tutorials.mnist import input_data
import scipy.misc
import os

# Read the MNIST data set. If it does not exist, it will be downloaded.
mnist = input_data.read_data_sets("/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/", one_hot=True)

# Save the original images in the MNIST_data/raw/ folder.
# If the folder does not exist, create it.
save_dir = '/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/raw/'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

# Save the first 20 pictures
for i in range(20):
    # Note that mnist.train.images[i, :] is the i-th picture (the index starts at 0).
    image_array = mnist.train.images[i, :]
    # An MNIST image in TensorFlow is a 784-dimensional vector. Restore it to a 28x28 image.
    image_array = image_array.reshape(28, 28)
    # The saved files are named mnist_train_0.jpg, mnist_train_1.jpg, ..., mnist_train_19.jpg
    filename = save_dir + 'mnist_train_%d.jpg' % i
    # Save image_array as a picture.
    # scipy.misc.toimage converts the array to an image, which is then written by calling save.
    # toimage was removed in SciPy 1.2, so this line needs an older SciPy release.
    scipy.misc.toimage(image_array, cmin=0.0, cmax=1.0).save(filename)
print('Please check: %s ' % save_dir)

The results are as follows:

Please check: /Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/raw/ 
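
On SciPy releases where scipy.misc.toimage is not available, the same step can be done with Pillow. The following is a minimal sketch rather than a drop-in replacement: the helper name save_vector_as_jpg is hypothetical, it assumes pixel values in the range 0.0 to 1.0 (as with cmin=0.0, cmax=1.0 above), and the demo writes a synthetic gradient to a temporary directory instead of a real digit.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def save_vector_as_jpg(vector, filename):
    """Save a 784-element vector with values in [0, 1] as a 28x28 grayscale JPEG."""
    pixels = np.asarray(vector, dtype=np.float32).reshape(28, 28)
    # Map 0.0..1.0 to 0..255; clipping guards against values slightly out of range.
    pixels = np.clip(pixels * 255.0, 0, 255).round().astype(np.uint8)
    # Pillow interprets a 2-D uint8 array as an 8-bit grayscale image.
    Image.fromarray(pixels).save(filename)


# Demo with a synthetic gradient (not a real MNIST digit).
demo_vector = np.linspace(0.0, 1.0, 784, dtype=np.float32)
demo_path = os.path.join(tempfile.mkdtemp(), "mnist_demo.jpg")
save_vector_as_jpg(demo_vector, demo_path)
print("Saved:", demo_path)
```

Inside the loop above, the call would be save_vector_as_jpg(mnist.train.images[i, :], filename).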

Posted by lessthanthree on Sun, 06 Oct 2019 14:34:55 -0700