Introduction to MNIST Data Set
Data Set Introduction
The MNIST data set comes from the National Institute of Standards and Technology (NIST).
The training set consists of digits handwritten by 250 different people, 50% of them high school students and 50% Census Bureau staff.
The MNIST data set is available at http://yann.lecun.com/exdb/mnist/ and consists of four files:
Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB after decompression, containing 60,000 samples)
Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB after decompression, containing 60,000 labels)
Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB after decompression, containing 10,000 samples)
Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB after decompression, containing 10,000 labels)
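The "idx3" and "idx1" parts of the file names refer to the IDX binary format: a short big-endian header (two zero bytes, a type code, the number of dimensions, then one 32-bit size per dimension) followed by the raw bytes. The following is a minimal sketch of a reader for that format, not the loader TensorFlow uses. The function names parse_idx and read_idx_gz are hypothetical, and the sample bytes are synthetic so the sketch runs without downloading anything.

```python
import gzip
import struct


def parse_idx(raw):
    """Parse an unsigned-byte IDX byte string into (shape, flat list of values)."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte) and the
    # number of dimensions, then one big-endian uint32 per dimension.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, raw[4:header_end])
    data = raw[header_end:]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError("payload size does not match the header")
    return shape, list(data)


def read_idx_gz(path):
    """Read a gzip-compressed IDX file such as train-labels-idx1-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Synthetic label file with three labels (assumption: stands in for real data).
fake_labels = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([5, 0, 4])
shape, labels = parse_idx(fake_labels)
print(shape, labels)  # (3,) [5, 0, 4]
```

For the real files, read_idx_gz("train-images-idx3-ubyte.gz") would be expected to return the shape (60000, 28, 28), since the image header stores the number of images, rows, and columns.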
After downloading, the directory contains the four .gz files listed above.
Data Set Download
MNIST is the introductory data set for TensorFlow, and TensorFlow integrates an API for it. The following code downloads the data set and inspects its contents.
    # coding: utf-8
    # Requires TensorFlow 1.x: tensorflow.examples.tutorials was removed in TensorFlow 2.x.
    # Import the MNIST tutorial helper module that ships with TensorFlow.
    from tensorflow.examples.tutorials.mnist import input_data

    # Read MNIST data from MNIST_data/. The data is downloaded automatically if it does not exist.
    mnist = input_data.read_data_sets("/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/", one_hot=True)

    # View the size of the training data
    print(mnist.train.images.shape)  # (55000, 784)
    print(mnist.train.labels.shape)  # (55000, 10)

    # View the size of the validation data
    print(mnist.validation.images.shape)  # (5000, 784)
    print(mnist.validation.labels.shape)  # (5000, 10)

    # View the size of the test data
    print(mnist.test.images.shape)  # (10000, 784)
    print(mnist.test.labels.shape)  # (10000, 10)

    # Print the vector representation of picture 0
    print(mnist.train.images[0, :])

    # Print the label of picture 0
    print(mnist.train.labels[0, :])
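The printed shapes follow from two facts: 5,000 of the 60,000 training samples are held out as a validation set (55,000 + 5,000 = 60,000), and each 28x28 image is flattened into a vector of 28 * 28 = 784 values. The sketch below illustrates the flattening step with NumPy on a synthetic image. It assumes that pixel values are rescaled from 0..255 to 0.0..1.0, and the names raw_image, vector, and restored are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for one raw MNIST image: 28x28 unsigned bytes.
raw_image = np.zeros((28, 28), dtype=np.uint8)
raw_image[10:18, 12:16] = 255  # a white block on a black background

# Flatten row by row into a 784-vector and rescale 0..255 to 0.0..1.0.
vector = raw_image.reshape(-1).astype(np.float32) / 255.0
print(vector.shape)  # (784,)

# Reshaping back to 28x28 recovers the original 2-D layout.
restored = (vector.reshape(28, 28) * 255).astype(np.uint8)
print(np.array_equal(restored, raw_image))  # True
```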
Basic operation of data set
Print data set labels
label.py
    # coding: utf-8
    from tensorflow.examples.tutorials.mnist import input_data
    import numpy as np

    # Read the MNIST data set. If it does not exist, it is downloaded first.
    mnist = input_data.read_data_sets("/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/", one_hot=True)

    # Look at the labels of the first 20 training pictures
    for i in range(20):
        # Get the one-hot representation, such as (0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
        one_hot_label = mnist.train.labels[i, :]
        # np.argmax recovers the original label directly,
        # because exactly one entry is 1 and the rest are 0.
        label = np.argmax(one_hot_label)
        print('mnist_train_%d.jpg label: %d' % (i, label))
The results are as follows:
    mnist_train_0.jpg label: 7
    mnist_train_1.jpg label: 3
    mnist_train_2.jpg label: 4
    mnist_train_3.jpg label: 6
    mnist_train_4.jpg label: 1
    mnist_train_5.jpg label: 8
    mnist_train_6.jpg label: 1
    mnist_train_7.jpg label: 0
    mnist_train_8.jpg label: 9
    mnist_train_9.jpg label: 8
    mnist_train_10.jpg label: 0
    mnist_train_11.jpg label: 3
    mnist_train_12.jpg label: 1
    mnist_train_13.jpg label: 2
    mnist_train_14.jpg label: 7
    mnist_train_15.jpg label: 0
    mnist_train_16.jpg label: 2
    mnist_train_17.jpg label: 9
    mnist_train_18.jpg label: 6
    mnist_train_19.jpg label: 0
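The relationship between a digit and its one-hot label can be checked without loading the data set. The following is a small sketch using only NumPy. The helper names to_one_hot and from_one_hot are hypothetical and are not part of the TensorFlow API.

```python
import numpy as np


def to_one_hot(labels, num_classes=10):
    """Turn integer labels into one-hot rows by indexing an identity matrix."""
    return np.eye(num_classes)[np.asarray(labels)]


def from_one_hot(one_hot):
    """Recover integer labels: the position of the single 1 in each row."""
    return np.argmax(one_hot, axis=-1)


digits = [7, 3, 4]
encoded = to_one_hot(digits)
print(encoded[0])             # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
print(from_one_hot(encoded))  # [7 3 4]
```

Indexing an identity matrix keeps the encoding to one line, and np.argmax inverts it because each row contains exactly one 1.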
Save as jpg picture
Each sample in the data set is an image, so we can save samples as image files.
save_pic.py
    # coding: utf-8
    from tensorflow.examples.tutorials.mnist import input_data
    import scipy.misc
    import os

    # Read the MNIST data set. If it does not exist, it is downloaded first.
    mnist = input_data.read_data_sets("/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/", one_hot=True)

    # Save the original images in the MNIST_data/raw/ folder.
    # If the folder does not exist, it is created automatically.
    save_dir = '/Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/raw/'
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    # Save the first 20 pictures
    for i in range(20):
        # Note that mnist.train.images[i, :] denotes picture i (the numbering starts at 0).
        image_array = mnist.train.images[i, :]
        # The MNIST image in TensorFlow is a 784-dimensional vector. Restore it to a 28x28 image.
        image_array = image_array.reshape(28, 28)
        # The saved files are named mnist_train_0.jpg, mnist_train_1.jpg, ..., mnist_train_19.jpg
        filename = save_dir + 'mnist_train_%d.jpg' % i
        # scipy.misc.toimage first converts the array to an image, which is then saved by calling save.
        # Note: scipy.misc.toimage was removed in SciPy 1.2, so this line needs an older SciPy.
        scipy.misc.toimage(image_array, cmin=0.0, cmax=1.0).save(filename)

    print('Please check: %s ' % save_dir)
The results are as follows:
Please check: /Users/zhusheng/WorkSpace/Dataset/7-Mnist/MNIST_data/raw/
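Because scipy.misc.toimage was removed in SciPy 1.2, the saving step may fail on a current installation. One possible substitute, sketched here under the assumption that Pillow and NumPy are installed, converts the vector to 8-bit pixels and saves it with Pillow. The function name save_grayscale is hypothetical, and a synthetic gradient stands in for mnist.train.images[i, :].

```python
import os
import tempfile

import numpy as np
from PIL import Image


def save_grayscale(image_vector, filename):
    """Save a 784-vector of floats in [0, 1] as a 28x28 grayscale image."""
    pixels = np.clip(np.asarray(image_vector, dtype=np.float32), 0.0, 1.0)
    # Map 0.0..1.0 to 0..255, the same range that cmin=0.0, cmax=1.0 describes.
    pixels = (pixels.reshape(28, 28) * 255).round().astype(np.uint8)
    # A 2-D uint8 array becomes an 8-bit grayscale image.
    Image.fromarray(pixels).save(filename)


# Synthetic gradient standing in for one MNIST training vector (assumption).
demo_vector = np.linspace(0.0, 1.0, 784, dtype=np.float32)
out_path = os.path.join(tempfile.mkdtemp(), "mnist_train_demo.jpg")
save_grayscale(demo_vector, out_path)
print("Saved:", out_path)
```

In the loop above, the call would become save_grayscale(mnist.train.images[i, :], filename) in place of the scipy.misc.toimage line.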
The save directory now contains the 20 files mnist_train_0.jpg through mnist_train_19.jpg.