Classifying Cats and Dogs with Deep Learning, Part 1

Keywords: JSON github pip

This series describes a deep learning classification algorithm, using cat-versus-dog image classification as the example.

Part 1 covers the data set:

  1. Download the data set: use the Kaggle API;
  2. Preprocess the data set: split it into a training part and a testing part;
  3. Display the data set: use Pillow to draw several pictures combined into one image.

Download

Data set: https://www.kaggle.com/c/dogs-vs-cats/data

Download using the Kaggle API (GitHub). The command is as follows:

kaggle competitions download -c dogs-vs-cats

The downloaded data set contains:

Training set: 25,000 pictures (12,500 cats and 12,500 dogs);
Test set: 12,500 pictures, without category labels.
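
The download arrives as a zip archive that has to be unpacked before preprocessing, a step this post does not show. A minimal sketch using the standard zipfile module follows; the function name extract_archive and the paths in the usage note are assumptions for illustration, not part of the original code.

```python
import os
import zipfile


def extract_archive(zip_path, out_dir):
    """Unpack zip_path into out_dir and return the sorted entry names."""
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if needed
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return sorted(zf.namelist())
```

For example, extract_archive('dogs-vs-cats.zip', 'data') would unpack the archive into data/, assuming that is the file name the download produces. Any archives nested inside it would need a second call.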

Configuring the Kaggle API:

  1. Log in to Kaggle. In the API section of the My Account page, click Create New API Token to download kaggle.json, which contains your username and key.
  2. Place kaggle.json in the ~/.kaggle folder. If the folder does not exist, create it.
  3. Restrict kaggle.json to owner read/write permission: chmod 600 ~/.kaggle/kaggle.json.
  4. Install the Kaggle API: pip install kaggle.
  5. Run the download command, for example kaggle competitions download -c dogs-vs-cats.
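
The token placement and permission steps above can also be scripted. The following is a minimal sketch, not part of the Kaggle tooling: install_kaggle_token and its home_dir parameter are hypothetical names introduced here, and the sketch assumes a POSIX system where file modes apply.

```python
import os
import shutil
import stat


def install_kaggle_token(token_path, home_dir=None):
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner."""
    home_dir = home_dir or os.path.expanduser('~')
    kaggle_dir = os.path.join(home_dir, '.kaggle')
    os.makedirs(kaggle_dir, exist_ok=True)  # create the folder if it is missing

    target = os.path.join(kaggle_dir, 'kaggle.json')
    shutil.copyfile(token_path, target)
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return target
```

Calling install_kaggle_token('kaggle.json') from the folder where the token was downloaded would copy it into place. The source file must not already be the target, since shutil.copyfile refuses to copy a file onto itself.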

Preprocessing

1,000 cats and 1,000 dogs are used as the training set, and 400 cats and 400 dogs as the test set.

Step 1: List the data set and separate the file paths into cats and dogs.

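def list_dataset(dataset_dir):
    """
    List the training data set and split the file paths into cats and dogs.
    File names are expected to look like 'cat.0.jpg' or 'dog.12.jpg'.
    :param dataset_dir: folder that holds the pictures
    :return: two dicts (cats, dogs) mapping picture number to file path
    """
    # traverse_dir_files is a helper that is not shown in this post; judging
    # from its use here, it returns parallel lists of file paths and names
    paths_list, names_list = traverse_dir_files(dataset_dir)
    cats_dict, dogs_dict = dict(), dict()

    for path, name in zip(paths_list, names_list):
        parts = name.split('.')
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip files that do not match <class>.<number>.<ext>
        clz, num = parts[0], int(parts[1])
        if clz == 'cat':
            cats_dict[num] = path
        elif clz == 'dog':
            dogs_dict[num] = path

    return cats_dict, dogs_dict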
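
The helper traverse_dir_files is called above but never defined in this post. The following is a hypothetical stand-in, written only from how list_dataset uses it (two parallel lists, paths and file names); the optional ext parameter and the skipping of hidden files are assumptions added for illustration.

```python
import os


def traverse_dir_files(root_dir, ext=None):
    """Walk root_dir and return (paths_list, names_list) as parallel lists."""
    pairs = []
    for parent, _, file_names in os.walk(root_dir):
        for name in file_names:
            if name.startswith('.'):
                continue  # skip hidden files such as .DS_Store
            if ext and not name.endswith(ext):
                continue  # keep only the requested extension
            pairs.append((os.path.join(parent, name), name))
    pairs.sort()  # sort by path so the order is reproducible
    paths_list = [path for path, _ in pairs]
    names_list = [name for _, name in pairs]
    return paths_list, names_list
```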

Step 2: Copy part of the data set, that is, copy a numbered range of cat or dog pictures to a new folder.

import os
import shutil


def copy_files(target_folder, clz_name, n_start, n_end):
    """
    Copy the pictures numbered n_start .. n_end - 1 of one class
    ('cat' or 'dog') into the new 'train' or 'test' folder.
    """
    # O_DATASET_DIR (original training data) and DATASET_DIR (new data set)
    # are path constants defined elsewhere in the project
    cats_dict, dogs_dict = list_dataset(O_DATASET_DIR)

    new_train = os.path.join(DATASET_DIR, 'train')
    new_test = os.path.join(DATASET_DIR, 'test')

    mkdir_if_not_exist(DATASET_DIR)
    mkdir_if_not_exist(new_train)
    mkdir_if_not_exist(new_test)

    # choose the source dict and the destination folder once, before the loop
    data_dict = cats_dict if clz_name == 'cat' else dogs_dict
    folder = new_train if target_folder == 'train' else new_test

    for i in range(n_start, n_end):
        shutil.copy(data_dict[i], folder)
    print("[complete] Target folder: {}, category: {}, range: {} ~ {}".format(
        target_folder, clz_name, n_start, n_end))
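
copy_files relies on mkdir_if_not_exist, which is not shown in this post. A minimal sketch of what such a helper might look like follows; the return value is an assumption added here so the behavior is easy to check.

```python
import os


def mkdir_if_not_exist(dir_name):
    """Create dir_name (and missing parents); return True if it was created."""
    if os.path.isdir(dir_name):
        return False  # nothing to do, the folder is already there
    os.makedirs(dir_name, exist_ok=True)
    return True
```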

Step 3: Build a training set of 1000 cats + 1000 dogs and a testing set of 400 cats + 400 dogs.

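def main():
    # training set: 1000 cats + 1000 dogs (pictures 0 .. 999)
    copy_files('train', 'cat', 0, 1000)
    copy_files('train', 'dog', 0, 1000)
    # testing set: 400 cats + 400 dogs, taken from pictures 1000 .. 1399
    # so that no test picture also appears in the training set
    copy_files('test', 'cat', 1000, 1400)
    copy_files('test', 'dog', 1000, 1400)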
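
After the copy it is worth checking that each folder holds the expected number of pictures per class. A small sketch follows, assuming the 'cat.N.jpg' / 'dog.N.jpg' naming used above; count_by_class is a hypothetical helper, not part of the original code.

```python
import os
from collections import Counter


def count_by_class(folder):
    """Count the files in folder by class prefix ('cat' or 'dog')."""
    counts = Counter()
    for name in os.listdir(folder):
        prefix = name.split('.')[0]
        if prefix in ('cat', 'dog'):
            counts[prefix] += 1
    return counts
```

With the split described in Step 3, the train folder should report 1000 for each class and the test folder 400 for each.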

Display

Pillow's Image module is used to combine multiple images into one picture. Each image is scaled proportionally so that its longest edge is 416 pixels.

from PIL import Image


def draw_multi_imgs(path_list, file_name):
    """
    Draw a group of pictures as one 4 x 3 grid image.
    :param path_list: list of picture paths (at most 12 are drawn)
    :param file_name: output file name
    :return: None
    """
    img_w, img_h = 4, 3  # grid columns and rows
    img_size = 416  # side of one grid cell in pixels

    try:
        images = []
        for p in path_list[:img_w * img_h]:
            img = Image.open(p)
            # scale proportionally so that the longest edge equals img_size;
            # scaling by width alone lets tall pictures overlap the next row
            ratio = img_size / float(max(img.size))
            new_size = (max(1, int(img.size[0] * ratio)),
                        max(1, int(img.size[1] * ratio)))
            # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
            images.append(img.resize(new_size, Image.LANCZOS))
    except Exception as e:
        print('Exception: {}'.format(e))
        return

    new_im = Image.new('RGB', (img_size * img_w, img_size * img_h), color=(255, 255, 255))

    # paste row by row; enumerate avoids an IndexError with fewer than 12 pictures
    for idx, im in enumerate(images):
        x_offset = (idx % img_w) * img_size
        y_offset = (idx // img_w) * img_size
        new_im.paste(im, (x_offset, y_offset))

    new_im.save(file_name)  # save the combined picture
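
The proportional scaling to a 416-pixel longest edge and the row-by-row placement in the 4 x 3 grid come down to a little arithmetic. The sketch below isolates it in two hypothetical helpers (fit_longest_edge and grid_offset are names introduced here) so that it can be checked without opening any image.

```python
def fit_longest_edge(width, height, target=416):
    """Return (new_width, new_height) scaled so the longest edge equals target."""
    ratio = target / float(max(width, height))
    new_width = max(1, int(round(width * ratio)))
    new_height = max(1, int(round(height * ratio)))
    return new_width, new_height


def grid_offset(index, columns=4, cell=416):
    """Return the (x, y) paste position of the index-th picture, row by row."""
    return (index % columns) * cell, (index // columns) * cell


print(fit_longest_edge(500, 375))  # landscape picture -> (416, 312)
print(fit_longest_edge(300, 600))  # portrait picture -> (208, 416)
print(grid_offset(5))              # second row, second column -> (416, 416)
```

Because neither edge can exceed 416 after scaling, every picture fits inside its own grid cell.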

Show 12 random cat pictures and 12 random dog pictures from the training set.

import os
import random


def main():
    new_train = os.path.join(DATASET_DIR, 'train')

    cats_dict, dogs_dict = list_dataset(new_train)
    cats_list = list(cats_dict.values())
    dogs_list = list(dogs_dict.values())
    random.shuffle(cats_list)
    random.shuffle(dogs_list)

    # DATA_DIR is the output folder, a constant defined elsewhere in the project
    draw_multi_imgs(cats_list[:12], os.path.join(DATA_DIR, 'train_cat.jpg'))
    draw_multi_imgs(dogs_list[:12], os.path.join(DATA_DIR, 'train_dog.jpg'))

Data set:

[Figure: cats]
[Figure: dogs]

This completes the construction of the data set.

Posted by robinstott on Sat, 11 May 2019 10:02:18 -0700