YOLO Learning Series (V): K-means Dimensional Clustering
1. Dimensional Clustering
1.1 Clustering Purpose
Run voc_label.py under Ubuntu to generate the training-set and test-set list files; under Windows, encoding errors will occur.
- When training YOLO on your own dataset, the first step is to replace the anchors with sizes suited to your data; the defaults were computed for the 20-class VOC and 80-class COCO datasets.

Dimensional clustering iterates over the training-set labels to find representative bounding-box widths and heights for the dataset. Following the YOLOv2 paper, the distance between a box and a cluster centre is 1 - IoU(box, centroid) rather than Euclidean distance, so large boxes do not dominate the result. The clustering results are shown first:
```
***************** n_anchors = 5 *****************
k-means result: (9.503125000000356, 8.046875000000572)
k-means result: (4.367264851484575, 6.158106435643181)
k-means result: (7.935546874999882, 4.897786458333437)
k-means result: (6.179457720588269, 7.807674632352704)
k-means result: (3.7825989208630717, 3.5274280575536627)
***************** n_anchors = 6 *****************
k-means result: (6.195027372262894, 7.749087591240633)
k-means result: (9.131578947368867, 9.546875000000345)
k-means result: (4.324420103092257, 6.20972938144295)
k-means result: (9.331640624999876, 5.898828124999875)
k-means result: (3.7665307971012534, 3.5264945652170954)
k-means result: (7.337187499999761, 4.57375000000048)
***************** n_anchors = 9 *****************
k-means result: (2.6570723684221056, 3.3848684210523685)
k-means result: (3.6341911764701336, 5.45174632352828)
k-means result: (5.790958737863881, 8.205703883494753)
k-means result: (7.290719696969482, 4.354640151515531)
k-means result: (5.568824404761428, 5.959821428571835)
k-means result: (9.106445312500655, 9.826171875000409)
k-means result: (9.356770833333087, 5.37803819444432)
k-means result: (8.429036458334373, 7.22656250000046)
k-means result: (4.455696202530808, 3.311313291139051)
```
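With results for several values of n_anchors, the usual way to pick one (as in the YOLOv2 paper) is to compare the average IoU between the boxes and their nearest centroid: more clusters always raises it, so you look for the point of diminishing returns. Below is a minimal, self-contained sketch of that measurement, assuming `boxes` and `centroids` are lists of relative (w, h) pairs; the names are illustrative and not part of the script in the next section.

```python
def wh_iou(a, b):
    # IoU of two boxes that share the same centre, described only by (w, h)
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def avg_iou(boxes, centroids):
    # Mean IoU between every box and its best-matching centroid;
    # compare this value across n_anchors = 5, 6, 9 to choose k.
    return sum(max(wh_iou(b, c) for c in centroids) for b in boxes) / len(boxes)
```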
1.2 Source Code
```python
# coding=utf-8
# k-means++ for YOLOv2 anchors
# Obtain the anchor sizes required by YOLOv2 via the k-means++ algorithm
import numpy as np


# The Box class describes the coordinates of a bounding box
class Box():
    def __init__(self, x, y, w, h):
        self.x = x
        self.y = y
        self.w = w
        self.h = h


# Compute the overlap of two boxes along one axis
# x1 is the center coordinate of box 1 on this axis
# len1 is the length of box 1 on this axis
# x2 is the center coordinate of box 2 on this axis
# len2 is the length of box 2 on this axis
# The return value is the length of the overlap on this axis
def overlap(x1, len1, x2, len2):
    len1_half = len1 / 2
    len2_half = len2 / 2
    left = max(x1 - len1_half, x2 - len2_half)
    right = min(x1 + len1_half, x2 + len2_half)
    return right - left


# Compute the intersection area of box a and box b
# a and b are both instances of Box
# The return value area is the intersection area of a and b
def box_intersection(a, b):
    w = overlap(a.x, a.w, b.x, b.w)
    h = overlap(a.y, a.h, b.y, b.h)
    if w < 0 or h < 0:
        return 0
    area = w * h
    return area


# Compute the union area of box a and box b
# a and b are both instances of Box
# The return value u is the union area of a and b
def box_union(a, b):
    i = box_intersection(a, b)
    u = a.w * a.h + b.w * b.h - i
    return u


# Compute the IoU of box a and box b
# a and b are both instances of Box
# The return value is the IoU of a and b
def box_iou(a, b):
    return box_intersection(a, b) / box_union(a, b)


# Initialize the centroids with k-means++ to reduce the influence of random
# initialization on the final result
# boxes is a list of Box objects for all bounding boxes
# n_anchors is the k of k-means
# The return value centroids is the list of n_anchors initialized centroids
def init_centroids(boxes, n_anchors):
    centroids = []
    boxes_num = len(boxes)

    centroid_index = np.random.choice(boxes_num)
    centroids.append(boxes[centroid_index])
    print(centroids[0].w, centroids[0].h)

    for centroid_index in range(0, n_anchors - 1):
        sum_distance = 0
        distance_thresh = 0
        distance_list = []
        cur_sum = 0

        for box in boxes:
            min_distance = 1
            for centroid_i, centroid in enumerate(centroids):
                distance = (1 - box_iou(box, centroid))
                if distance < min_distance:
                    min_distance = distance
            sum_distance += min_distance
            distance_list.append(min_distance)

        distance_thresh = sum_distance * np.random.random()

        for i in range(0, boxes_num):
            cur_sum += distance_list[i]
            if cur_sum > distance_thresh:
                centroids.append(boxes[i])
                print(boxes[i].w, boxes[i].h)
                break

    return centroids


# Run one k-means iteration and recompute the centroids
# boxes is a list of Box objects for all bounding boxes
# n_anchors is the k of k-means
# centroids are the current cluster centers
# The return value new_centroids is the list of recomputed cluster centers
# The return value groups is the list of boxes assigned to each of the n_anchors clusters
# The return value loss is the sum of the distances from all boxes to their nearest centroid
def do_kmeans(n_anchors, boxes, centroids):
    loss = 0
    groups = []
    new_centroids = []
    for i in range(n_anchors):
        groups.append([])
        new_centroids.append(Box(0, 0, 0, 0))

    for box in boxes:
        min_distance = 1
        group_index = 0
        for centroid_index, centroid in enumerate(centroids):
            distance = (1 - box_iou(box, centroid))
            if distance < min_distance:
                min_distance = distance
                group_index = centroid_index
        groups[group_index].append(box)
        loss += min_distance
        new_centroids[group_index].w += box.w
        new_centroids[group_index].h += box.h

    for i in range(n_anchors):
        new_centroids[i].w /= len(groups[i])
        new_centroids[i].h /= len(groups[i])

    return new_centroids, groups, loss


# Compute n_anchors centroids for the given bounding boxes
# label_path is the path of the training-set list file
# n_anchors is the number of anchors
# loss_convergence is the minimum allowed change of the loss
# grid_size * grid_size is the number of grid cells
# iterations_num is the maximum number of iterations
# plus = 1 enables k-means++ initialization of the centroids
def compute_centroids(label_path, n_anchors, loss_convergence, grid_size, iterations_num, plus):
    boxes = []
    label_files = []
    f = open(label_path)
    for line in f:
        label_path = line.rstrip().replace('images', 'labels')
        label_path = label_path.replace('JPEGImages', 'labels')
        label_path = label_path.replace('.jpg', '.txt')
        label_path = label_path.replace('.JPEG', '.txt')
        label_files.append(label_path)
    f.close()

    for label_file in label_files:
        f = open(label_file)
        for line in f:
            temp = line.strip().split(" ")
            if len(temp) > 1:
                boxes.append(Box(0, 0, float(temp[3]), float(temp[4])))

    if plus:
        centroids = init_centroids(boxes, n_anchors)
    else:
        centroid_indices = np.random.choice(len(boxes), n_anchors)
        centroids = []
        for centroid_index in centroid_indices:
            centroids.append(boxes[centroid_index])

    # iterate k-means
    centroids, groups, old_loss = do_kmeans(n_anchors, boxes, centroids)
    iterations = 1
    while True:
        centroids, groups, loss = do_kmeans(n_anchors, boxes, centroids)
        iterations = iterations + 1
        print("loss = %f" % loss)
        if abs(old_loss - loss) < loss_convergence or iterations > iterations_num:
            break
        old_loss = loss

        for centroid in centroids:
            print(centroid.w * grid_size, centroid.h * grid_size)

    # print the final result
    for centroid in centroids:
        print("k-means result: \n")
        print(centroid.w * grid_size, centroid.h * grid_size)


label_path = "/home/chris/workspace/2007_train.txt"
n_anchors = 5
loss_convergence = 1e-6
grid_size = 13
iterations_num = 100
plus = 0
compute_centroids(label_path, n_anchors, loss_convergence, grid_size, iterations_num, plus)
```
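The list file passed as `label_path` (here `2007_train.txt`) contains image paths; the script maps each path to its darknet label file, where every line has the form `<class> <x_center> <y_center> <width> <height>` with all values normalized to [0, 1]. That is why `temp[3]` and `temp[4]` are the relative box width and height. Multiplying the clustered centroids by `grid_size = 13` expresses them in units of YOLOv2's 13x13 feature-map cells. A small illustrative check (the sample values below are made up):

```python
sample_label_line = "11 0.481 0.634 0.478 0.663"   # class x_c y_c w h (made-up values)
_, _, _, w, h = sample_label_line.split(" ")
print(float(w) * 13, float(h) * 13)                # width/height in 13x13 grid-cell units
```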
1.3 Running Results
```
k-means result: (8.979910714285644, 5.140624999999976)
k-means result: (4.5747282608690005, 7.813858695652043)
k-means result: (2.2546296296290005, 7.7939814814810005)
k-means result: (11.235351562499998, 9.699218750000407)
k-means result: (2.442095588236353, 3.5698529411762943)
```
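These five (width, height) pairs are already in 13x13-grid-cell units, which is the unit the YOLOv2 `[region]` layer expects in its `anchors=` entry. A small sketch that turns the result above into that line (values rounded to two decimals; paste the output into your cfg and keep `num=` equal to the number of pairs):

```python
# Centroids from the n_anchors = 5 run above, rounded for readability
centroids = [(8.98, 5.14), (4.57, 7.81), (2.25, 7.79), (11.24, 9.70), (2.44, 3.57)]
print("anchors = " + ", ".join("%.2f,%.2f" % wh for wh in centroids))
```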
1.4 Reference Resources
YOLOv2 obtains anchor boxes through k-means
2. Caffe-SSD Aspect Ratio Clustering
This section uses two programs: get_w_h.py, which extracts the width and height of every target in the dataset (this part of the reference program did not work well, hence the separate script), and the clustering and plotting program from the reference link.

get_w_h.py outputs the target widths and heights; arrange them into two columns (width and height) with Excel and save the result as data1.txt. Put data1.txt and K-means.py in the same directory, then run the Python file.
```python
# Line 67 of get_w_h.py: print the target width or height
# print(ob_w)
print(ob_h)
```
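K-means.py from the reference link is not reproduced here; the idea it implements can be sketched as follows, assuming data1.txt holds two whitespace-separated columns (target width and height, as produced above) and that scikit-learn is available. The file name and column layout follow the description above; everything else is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# data1.txt: one object per line, two columns -> width, height
data = np.loadtxt("data1.txt")
ratios = (data[:, 0] / data[:, 1]).reshape(-1, 1)   # aspect ratio w/h of each box

# Cluster the aspect ratios; the cluster centres are candidate values for the
# aspect_ratio settings of the SSD prior boxes
kmeans = KMeans(n_clusters=3, random_state=0).fit(ratios)
print(sorted(kmeans.cluster_centers_.flatten()))
```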