This article uses the task of classifying dogs and cats to describe a deep learning classification algorithm.
Part 1, the data set, covers:
- Downloading the data set: use the Kaggle API;
- Preprocessing the data set: split it into a training set and a test set;
- Displaying the data set: use Pillow to combine multiple pictures into one image.
Download
Data set: https://www.kaggle.com/c/dogs-vs-cats/data
Download it using the Kaggle API (see its GitHub repository). The command is as follows:
kaggle competitions download -c dogs-vs-cats
The downloaded data set:
Training set: 25,000 pictures, 12,500 cats and 12,500 dogs;
Test set: 12,500 pictures, with no category labels.
Configuring the Kaggle API:
- Log in. In the API section of the My Account page, click Create New API Token to download kaggle.json, which contains your username and key.
- Place kaggle.json in the ~/.kaggle folder. If the folder does not exist, create it.
- Restrict kaggle.json so that only its owner can read and write it: chmod 600 ~/.kaggle/kaggle.json.
- Install the Kaggle API: pip install kaggle.
- Run the download command, for example kaggle competitions download -c dogs-vs-cats.
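The token setup steps can also be scripted. The following is a minimal sketch using only the standard library, assuming the token has already been downloaded to a local path. The function name install_kaggle_token and its parameters are hypothetical and are not part of the Kaggle API.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def install_kaggle_token(token_path, home_dir=None):
    """Copy kaggle.json into <home>/.kaggle and restrict it to mode 600.

    token_path and home_dir are illustrative parameters (assumptions);
    home_dir defaults to the current user's home directory.
    """
    home = Path(home_dir) if home_dir is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    target = kaggle_dir / "kaggle.json"
    shutil.copy(token_path, target)
    os.chmod(target, 0o600)  # owner read/write only, same as chmod 600
    return target


# Usage example with throwaway folders so the real home folder is not touched.
demo_home = tempfile.mkdtemp()
demo_token = Path(demo_home) / "downloaded_kaggle.json"
demo_token.write_text('{"username": "example", "key": "example"}')
installed = install_kaggle_token(demo_token, home_dir=demo_home)
installed_mode = stat.S_IMODE(installed.stat().st_mode)
```

Passing home_dir explicitly keeps the example safe to run; omitting it would target the real home folder.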
Preprocessing
1,000 cat pictures and 1,000 dog pictures are used as the training set, and 400 cat pictures and 400 dog pictures as the test set.
Step 1: Read the file paths into memory and separate them into cats and dogs.
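def list_dataset(dataset_dir):
    """
    Read the training data set into memory and split it into cats and dogs.

    :param dataset_dir: folder containing files named like cat.0.jpg or dog.0.jpg
    :return: two dicts (cats, dogs) mapping picture number to file path
    """
    # traverse_dir_files is a project helper (not part of this listing) that
    # returns two parallel lists: file paths and file names.
    paths_list, names_list = traverse_dir_files(dataset_dir)

    cats_dict, dogs_dict = dict(), dict()
    for path, name in zip(paths_list, names_list):
        # File names follow the pattern <class>.<number>.<extension>
        [clz, num, _] = name.split('.')
        num = int(num)
        if clz == 'cat':
            cats_dict[num] = path
        elif clz == 'dog':
            dogs_dict[num] = path
        else:
            continue

    # print('cat: {}, dog: {}'.format(len(cats_dict.keys()), len(dogs_dict.keys())))
    return cats_dict, dogs_dict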
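The listing relies on traverse_dir_files, whose body is not given in this article. The following is a minimal sketch of what such a helper might look like, assuming it walks the folder recursively and returns parallel lists of paths and names. Skipping hidden files is an added assumption, made so that the three-part name split does not fail on files such as .DS_Store.

```python
import os
import tempfile


def traverse_dir_files(root_dir):
    """Hypothetical stand-in for the helper used by list_dataset.

    Walks root_dir recursively and returns two parallel lists:
    full file paths and bare file names. Hidden files are skipped.
    """
    paths_list, names_list = [], []
    for parent, _dirs, file_names in os.walk(root_dir):
        for name in sorted(file_names):
            if name.startswith('.'):
                continue  # e.g. .DS_Store would break the 3-part name split
            paths_list.append(os.path.join(parent, name))
            names_list.append(name)
    return paths_list, names_list


# Usage example on a throwaway folder with two empty files.
demo_dir = tempfile.mkdtemp()
for demo_name in ('cat.0.jpg', 'dog.7.jpg'):
    open(os.path.join(demo_dir, demo_name), 'w').close()

demo_paths, demo_names = traverse_dir_files(demo_dir)
# Each name splits into class, number and extension
demo_parts = [name.split('.') for name in demo_names]
```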
Step 2: Copy a range of cat or dog pictures from the original data set into a new folder.
import os
import shutil


def copy_files(target_folder, clz_name, n_start, n_end):
    """Copy pictures n_start .. n_end - 1 of one class into the train or test folder."""
    # O_DATASET_DIR (original data), DATASET_DIR (new data set) and
    # mkdir_if_not_exist are defined elsewhere in the project.
    cats_dict, dogs_dict = list_dataset(O_DATASET_DIR)

    new_train = os.path.join(DATASET_DIR, 'train')
    new_test = os.path.join(DATASET_DIR, 'test')

    mkdir_if_not_exist(DATASET_DIR)
    mkdir_if_not_exist(new_train)
    mkdir_if_not_exist(new_test)

    # Example arguments:
    # target_folder = 'train'
    # clz_name = 'cat'
    # n_start = 0
    # n_end = 10

    # Pick the source dict and the destination folder once, outside the loop
    data_dict = cats_dict if clz_name == 'cat' else dogs_dict
    folder = new_train if target_folder == 'train' else new_test
    for i in range(n_start, n_end):
        shutil.copy(data_dict[i], folder)

    print("[complete] Target folder: {}, category: {}, range: {} ~ {}".format(
        target_folder, clz_name, n_start, n_end))
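The helper mkdir_if_not_exist is also not listed in this article. A minimal sketch of one possible implementation, assuming it only needs to create the folder when it is missing, could look like this. The return value is an addition for illustration.

```python
import os
import tempfile


def mkdir_if_not_exist(dir_path):
    """Hypothetical stand-in: create dir_path (and parents) unless it exists.

    Returns True if the folder was created, False if it was already there.
    """
    if os.path.isdir(dir_path):
        return False
    os.makedirs(dir_path, exist_ok=True)
    return True


# Usage example: the second call finds the folder and does nothing.
demo_root = tempfile.mkdtemp()
demo_train = os.path.join(demo_root, 'dataset', 'train')
first_call = mkdir_if_not_exist(demo_train)
second_call = mkdir_if_not_exist(demo_train)
```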
Step 3: Build the training set of 1,000 cats + 1,000 dogs and the test set of 400 cats + 400 dogs.
def main():
    # 1000 cats + 1000 dogs training set; 400 cats + 400 dogs testing set
    copy_files('train', 'cat', 0, 1000)
    copy_files('train', 'dog', 0, 1000)
    # Test pictures use numbers 1000-1399 so they do not overlap the training pictures
    copy_files('test', 'cat', 1000, 1400)
    copy_files('test', 'dog', 1000, 1400)
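After the folders are built, a quick sanity check is to count the pictures per class and confirm that no file name appears in both the training and the test folder. The sketch below is one way to do this, assuming file names of the form class.number.extension. The function name summarize_split is hypothetical.

```python
import os
import tempfile
from collections import Counter


def summarize_split(train_dir, test_dir):
    """Count pictures per class in each folder and list shared file names."""
    def class_counts(folder):
        # The class is the part of the file name before the first dot
        return Counter(name.split('.')[0] for name in os.listdir(folder))

    shared = sorted(set(os.listdir(train_dir)) & set(os.listdir(test_dir)))
    return class_counts(train_dir), class_counts(test_dir), shared


# Usage example on throwaway folders with empty placeholder files.
demo_root = tempfile.mkdtemp()
demo_train = os.path.join(demo_root, 'train')
demo_test = os.path.join(demo_root, 'test')
os.makedirs(demo_train)
os.makedirs(demo_test)
for name in ('cat.0.jpg', 'cat.1.jpg', 'dog.0.jpg'):
    open(os.path.join(demo_train, name), 'w').close()
for name in ('cat.1.jpg', 'dog.5.jpg'):
    open(os.path.join(demo_test, name), 'w').close()

train_counts, test_counts, shared_names = summarize_split(demo_train, demo_test)
```

In the example, cat.1.jpg is deliberately placed in both folders, so it shows up in shared_names. On a correctly built data set that list should be empty.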
Display
Pillow's Image module is used to combine multiple images into one picture. Each image is scaled proportionally to a width of 416 pixels and pasted into a grid of 4 columns by 3 rows.
from PIL import Image


def draw_multi_imgs(path_list, file_name):
    """
    Draw a group of pictures as one grid image.

    :param path_list: list of picture paths (at least img_w * img_h entries)
    :param file_name: output file name
    :return: None
    """
    img_w, img_h = 4, 3  # grid of 4 columns x 3 rows
    img_size = 416  # width of each grid cell in pixels

    try:
        o_images = [Image.open(p) for p in path_list]
        images = []
        for img in o_images:
            # Scale to a width of img_size, keeping the aspect ratio
            wp = img_size / float(img.size[0])
            hsize = int(float(img.size[1]) * float(wp))
            # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
            img = img.resize((img_size, hsize), Image.LANCZOS)
            images.append(img)
    except Exception as e:
        print('Exception: {}'.format(e))
        return

    new_im = Image.new('RGB', (img_size * img_w, img_size * img_h), color=(255, 255, 255))

    x_offset, y_offset = 0, 0
    for i in range(img_h):
        for j in range(img_w):
            im = images[i * img_w + j]
            new_im.paste(im, (x_offset, y_offset))
            x_offset += img_size
        y_offset += img_size
        x_offset = 0

    new_im.save(file_name)  # Save the combined picture
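The two calculations inside draw_multi_imgs, the proportional scaling and the paste positions, can be looked at in isolation. The sketch below reproduces only that arithmetic without opening any image. The function names scale_to_width and grid_offsets are hypothetical.

```python
def scale_to_width(size, target_width=416):
    """Return (width, height) scaled so that the width equals target_width."""
    width, height = size
    ratio = target_width / float(width)
    return target_width, int(height * ratio)


def grid_offsets(columns=4, rows=3, cell=416):
    """Return the top-left paste position of each cell, row by row."""
    return [(col * cell, row * cell) for row in range(rows) for col in range(columns)]


# An 832 x 600 picture is halved to 416 x 300.
scaled = scale_to_width((832, 600))

# Twelve positions: (0, 0), (416, 0), ... up to the last cell at (1248, 832).
offsets = grid_offsets()
```

Because only the width is fixed, a picture that is taller than it is wide ends up taller than 416 pixels after scaling and can extend past its grid cell. Scaling by the longer edge instead would be one way to avoid that.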
Show 12 random cat pictures and 12 random dog pictures from the training set.
import os
import random


def main():
    new_train = os.path.join(DATASET_DIR, 'train')

    cats_dict, dogs_dict = list_dataset(new_train)

    cats_list = list(cats_dict.values())
    dogs_list = list(dogs_dict.values())

    # Shuffle so that the first 12 entries are a random selection
    random.shuffle(cats_list)
    random.shuffle(dogs_list)

    draw_multi_imgs(cats_list[:12], os.path.join(DATA_DIR, 'train_cat.jpg'))
    draw_multi_imgs(dogs_list[:12], os.path.join(DATA_DIR, 'train_dog.jpg'))
Data set:
[Figures: the combined cat pictures and the combined dog pictures]
At this point, the construction of the data set is complete.