SRCNN image super resolution reconstruction (tf2)

Keywords: neural networks Pytorch Deep Learning

Preface

Super-resolution reconstruction takes an image whose resolution has been degraded, for example by shrinking and re-enlarging it, and converts it into a high-resolution image. The key problem is how to fill in the new pixels when the picture is reconstructed. SRCNN is the pioneering work that applied deep learning to this task, and its network structure is very simple, so I decided to reproduce it.
Code link: https://github.com/jiantenggei/SRCNN-Keras (including all resources)

1, SRCNN


The network structure of SRCNN is particularly simple. A low-resolution image is used as the input, and after three convolution layers it is restored to a high-quality image. Throughout the convolutions, the feature maps keep the same size as the reconstructed image. There are no fully connected layers in the network, only three convolution layers.

2, SRCNN implementation

The training and testing process of the SRCNN network is, briefly:
1. Reduce each picture and then enlarge it back to its original size, producing a blurry picture to use as training data.
2. Use the unprocessed original picture as the label during training.
3. Feed the pictures and labels into the network for training.
4. When testing the model, feed a blurry picture into the trained network and calculate the peak signal-to-noise ratio (PSNR) of the generated picture.

1. Model construction

The code is as follows:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D


def built_model(input_shape=(33, 33, 1)):
    model = Sequential()
    # Layer 1: 9x9 convolution, 64 filters (patch extraction)
    model.add(Conv2D(filters=64, kernel_size=9,
                    padding='same', input_shape=input_shape))
    model.add(Activation('relu'))
    # Layer 2: 1x1 convolution, 32 filters (non-linear mapping)
    model.add(Conv2D(32, 1, padding='same'))
    model.add(Activation('relu'))
    # Layer 3: 5x5 convolution back to the input channel count (reconstruction)
    model.add(Conv2D(input_shape[2], 5, padding='same'))
    return model

if __name__ == '__main__':
    model = built_model()
    model.summary()

The last layer maps the multi-channel feature map back to the same number of channels as the input image.
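To get a feel for how small this network is, the parameter count of each convolution layer can be worked out by hand. The sketch below is only an illustration and is not part of the repository code. It assumes the layer sizes used in built_model above (9x9 with 64 filters, 1x1 with 32 filters, 5x5 with 1 filter, on a single-channel input), and conv_params is a hypothetical helper name.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Number of trainable parameters in a 2D convolution with bias."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    biases = out_channels
    return weights + biases


# (kernel_size, in_channels, out_channels) for the three layers of built_model
layers = [(9, 1, 64), (1, 64, 32), (5, 32, 1)]
counts = [conv_params(k, c_in, c_out) for k, c_in, c_out in layers]

for (k, c_in, c_out), n in zip(layers, counts):
    print(f"{k}x{k} conv, {c_in} -> {c_out} channels: {n} parameters")
print("total:", sum(counts))
```

The per-layer numbers should match what model.summary() reports, about eight thousand parameters in total.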

2. Generate training data

1. Reduce each picture and then enlarge it back, producing a blurry picture to use as training data.
2. Use the unprocessed original picture as the label during training.
The code is as follows:

import os

import cv2
import numpy as np


def load_train(image_size=33, stride=33, scale=3, dirname='dataset/train'):
    dir_list = os.listdir(dirname)
    images = [cv2.cvtColor(cv2.imread(os.path.join(dirname,img)),cv2.COLOR_BGR2GRAY) for img in dir_list]
    #==========================
    # Crop each image so that its height and width are divisible by scale
    #=========================
    images = [img[0:img.shape[0]-np.remainder(img.shape[0],scale),0:img.shape[1]-np.remainder(img.shape[1],scale)] for img in images]

    trains = images.copy()
    labels = images.copy()
    #========================================
    #Reduce and enlarge the train image to produce an unclear image
    #========================================
    trains = [cv2.resize(img, None, fx=1/scale, fy=1/scale, interpolation=cv2.INTER_CUBIC) for img in trains]
    trains = [cv2.resize(img, None, fx=scale/1, fy=scale/1, interpolation=cv2.INTER_CUBIC) for img in trains]

    sub_trains = []
    sub_labels = []
    
    #========================================
    # Label and training data are formed by sampling,
    # Through sampling, a picture can be divided into many image blocks as training data to enrich the samples
    #========================================
    for train, label in zip(trains, labels):
        v, h = train.shape
        print(train.shape)
        for x in range(0,v-image_size+1,stride):
            for y in range(0,h-image_size+1,stride):
                sub_train = train[x:x+image_size,y:y+image_size]
                sub_label = label[x:x+image_size,y:y+image_size]
                sub_train = sub_train.reshape(image_size,image_size,1)
                sub_label = sub_label.reshape(image_size,image_size,1)
                sub_trains.append(sub_train)
                sub_labels.append(sub_label)
    #========================================
    #Encoded as numpy array
    #========================================
    sub_trains = np.array(sub_trains)
    sub_labels = np.array(sub_labels)
    return sub_trains, sub_labels

def load_test(scale=3, dirname='dataset/test'):
    #========================================
    # Test data is generated in the same way as training data
    # pre_tests holds the reduced images, tests the re-enlarged (blurry) ones
    #========================================
    dir_list = os.listdir(dirname)
    images = [cv2.cvtColor(cv2.imread(os.path.join(dirname,img)),cv2.COLOR_BGR2GRAY) for img in dir_list]
    images = [img[0:img.shape[0]-np.remainder(img.shape[0],scale),0:img.shape[1]-np.remainder(img.shape[1],scale)] for img in images]

    tests = images.copy()
    labels = images.copy()
    
    pre_tests = [cv2.resize(img, None, fx=1/scale, fy=1/scale, interpolation=cv2.INTER_CUBIC) for img in tests]
    tests = [cv2.resize(img, None, fx=scale/1, fy=scale/1, interpolation=cv2.INTER_CUBIC) for img in pre_tests]
    
    pre_tests = [img.reshape(img.shape[0],img.shape[1],1) for img in pre_tests]
    tests = [img.reshape(img.shape[0],img.shape[1],1) for img in tests]
    labels = [img.reshape(img.shape[0],img.shape[1],1) for img in labels]

    return pre_tests, tests, labels

Note: the sampling process in the code (the three nested for loops in load_train) cuts each picture into small patches, so that one picture yields many samples and makes up for the shortage of training images.
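How many patches one picture yields depends on the patch size and the stride. The following is a minimal sketch, not part of the repository code, that mirrors the loop bounds in load_train. count_patches is a hypothetical helper introduced only for illustration, and the 255 x 255 image size is an assumed example.

```python
def count_patches(height, width, image_size=33, stride=33):
    """Count the patches produced by the nested x/y loops in load_train."""
    rows = len(range(0, height - image_size + 1, stride))
    cols = len(range(0, width - image_size + 1, stride))
    return rows * cols


# An assumed 255 x 255 image, first with non-overlapping patches,
# then with the stride of 14 used in the training script.
print(count_patches(255, 255, stride=33))
print(count_patches(255, 255, stride=14))
```

A smaller stride makes the patches overlap, so the same picture contributes more training samples.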

3. Training process:

The code is as follows:

from tensorflow.keras.optimizers import Adam

from model import built_model
from utils import load_train

def train():
    # ==========================
    # input_shape: size of the input patches
    # stride: sampling interval on the original picture
    # batch_size, epochs, learning_rate: training hyperparameters
    #============================
    input_shape = (33, 33, 1)
    stride = 14
    batch_size = 64
    epochs=100
    learning_rate=0.001

    # Define model
    srcnn_model = built_model(input_shape=input_shape)
    # Continue training from existing weights; remove this line to train from scratch
    srcnn_model.load_weights('model/srcnn_weight.hdf5')
    srcnn_model.summary()

    # Load data
    X_train, Y_train = load_train(image_size=input_shape[0], stride=stride)
    print(X_train.shape, Y_train.shape)
    optimizer = Adam(learning_rate=learning_rate)
    # Pixel-wise regression: MSE is the loss; classification accuracy is not meaningful here
    srcnn_model.compile(optimizer=optimizer, loss='mean_squared_error')
    srcnn_model.fit(X_train,Y_train,epochs=epochs,batch_size=batch_size)
    # Note: the test script loads model/srcnn_weight.hdf5, not this file
    srcnn_model.save('model/srcnn.h5')

if __name__ == '__main__':
    train()

Mean squared error is used as the loss here, because the network output and the label are pictures of the same size that differ only in sharpness, so they can be compared pixel by pixel.
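For reference, the mean squared error between two same-sized images can be written in a few lines of NumPy. This is a sketch for illustration, not the loss implementation Keras uses internally; the mse helper and the tiny 2 x 2 arrays are made up for the example. Converting to float first matters, because subtracting uint8 arrays directly wraps around instead of going negative.

```python
import numpy as np


def mse(image_a, image_b):
    """Mean squared error between two images of the same shape."""
    a = np.asarray(image_a, dtype=np.float64)
    b = np.asarray(image_b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


label = np.array([[10, 20], [30, 40]], dtype=np.uint8)
output = np.array([[12, 18], [30, 44]], dtype=np.uint8)

print(mse(label, output))  # (4 + 4 + 0 + 16) / 4 = 6.0
```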

4. Test process

The code is as follows:

from model import built_model
import os
from utils import load_test,psnr
import cv2
def test():
    # The network is fully convolutional, so height and width can be left unspecified
    input_shape = (None, None, 1)
    scale = 3
    srcnn_model = built_model(input_shape=input_shape)
    srcnn_model.load_weights('model/srcnn_weight.hdf5')

    X_pre_test, X_test, Y_test = load_test(scale=scale)

    predicted_list = []

    for img in X_test:
        img = img.reshape(1,img.shape[0],img.shape[1],1)
        predicted=srcnn_model.predict(img)
        predicted_list.append(predicted.reshape(predicted.shape[1],predicted.shape[2],1))
    n_img = len(predicted_list)
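    dirname = './result'
    os.makedirs(dirname, exist_ok=True)  # cv2.imwrite does not create missing folders
    for i in range(n_img):
        imgname = 'image{:02}'.format(i)
        # _original: reduced image, _input: blurry network input,
        # _answer: ground truth, _predicted: network output
        cv2.imwrite(os.path.join(dirname,imgname+'_original.bmp'), X_pre_test[i])
        cv2.imwrite(os.path.join(dirname,imgname+'_input.bmp'), X_test[i])
        cv2.imwrite(os.path.join(dirname,imgname+'_answer.bmp'), Y_test[i])
        cv2.imwrite(os.path.join(dirname,imgname+'_predicted.bmp'), predicted_list[i])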
        # Calculate the peak signal-to-noise ratio against the ground truth
        answer = psnr(Y_test[i],predicted_list[i])
        print(imgname+"_psnr:",answer)

if __name__ == '__main__':
    test()

X_test stores the blurry pictures that are fed to the network. Y_test stores the original pictures, which serve as the reference when calculating the peak signal-to-noise ratio of the prediction results.
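The psnr function imported from utils is not listed in this post. A common definition is PSNR = 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit images. The sketch below follows that definition; it is an assumption about what such a helper could look like, not the repository's actual code, and the max_value parameter and the 4 x 4 test arrays are made up for the example.

```python
import numpy as np


def psnr(reference, image, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    ref = np.asarray(reference, dtype=np.float64)
    img = np.asarray(image, dtype=np.float64)
    mse = np.mean((ref - img) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


reference = np.full((4, 4, 1), 100, dtype=np.uint8)
noisy = reference.copy()
noisy[0, 0, 0] = 116  # one pixel off by 16, so MSE = 256 / 16 = 16

print(psnr(reference, noisy))
```

Casting to float64 before subtracting avoids uint8 wraparound, and the zero-MSE branch avoids a division by zero when the two images are identical.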

Summary

Different networks serve different purposes, and they differ in how data is fed to them and how the loss is calculated.

Posted by nerotic on Sun, 03 Oct 2021 19:21:01 -0700