[model training] ubuntu compiles Darknet and YOLO training

Keywords: Algorithm AI Deep Learning yolo

Welcome to my official account, reply to 001 Google programming specification.

  O_o   >_<   o_O   O_o   ~_~   o_O

  Hello, I'm Jizhi horizon. This paper introduces the method of compiling darknet and yolo training on ubuntu.

1. Compiling darknet

1.1 compiling opencv

     I won't say much about the installation of cuda and cudnn. For the compilation of opencv, please refer to what I wrote before< [experience sharing] opencv method of compiling / cross compiling in x86, aarch64 and arm32 environments >, which records the method of compiling opencv on x86, aarch64 and arm32 platforms, which is concise and effective.

1.2 compiling darknet

    clone Source Code:

git clone https://github.com/AlexeyAB/darknet.git

cd darknet

    Modify Makefile and open gpu, opencv and openmp:


    Then start compiling, which is simple:

make -j32

    After that, verify whether the installation is successful:

./darknet detect cfg/yolov3.cfg cfg/yolov3.weights data/dog.jpg

    Of course, yolov3.weights needs to be downloaded and transmitted by yourself: https://pjreddie.com/media/fi...

    After running successfully, a classic detection diagram predictions.jpg will be generated in the < Darknet Path > Directory:

2. Yolo training

2.1 making VOC dataset

  you can make your own data set in VOC format, or you can directly use VOC data for training.

  for how to make VOC format data, please refer to my article:< [experience sharing] target detection VOC format data set production >, the introduction was quite detailed.

2.2 Yolo training

    After having the data set, and then get the model structure file and pre training weight, you can start a happy alchemy journey. In fact, many model structure files have been provided in the CFG folder, such as yolov3.cfg, yolov3-tiny.cfg, yolov4.cfg, yolov4-tiny.cfg, etc. you only need to find the corresponding pre training weight, such as:

    Next, let's take yolov4 as an example and start our pleasant training trip.

    This is a non desktop environment, so I added - dont_show pass parameters.

./darknet detector train cfg/voc.data cfg/yolov4.cfg yolov4.conv.137 -dont_show

  from the above command,. / darknet detector train is fixed, others:

  • cfg/voc.data: transfer training data;
  • cfg/yolov4.cfg: structure of training model;
  • yolov4.conv.137: pre training weight

  the above command to execute training is very clear. Take a look at voc.data:

classes= 20                                        # Number of target detection categories
train  = /home/pjreddie/data/voc/train.txt         # Training data set
valid  = /home/pjreddie/data/voc/test.txt          # Test data set
names = data/voc.names                             # Category name
backup = /home/pjreddie/backup/                    # Intermediate weight backup directory during training

  in. cfg, we can also make some changes according to our own training, mainly some parameters in [net]:

batch=64                  # batch settings
subdivisions=32           # Each time the data transferred into the batch/subdivision is insufficient, increase the parameter if the gpu memory is insufficient
# Training
width=608                 # Picture width
height=608                # Picture high
channels=3                # Number of channels
momentum=0.949            # Momentum, the influence gradient decreases to the optimal value velocity
decay=0.0005              # Weight attenuation regular term to prevent overfitting
angle=0                   # Increase training samples by rotation angle
saturation = 1.5          # Increase training samples by adjusting image saturation
exposure = 1.5            # Increase the training samples by adjusting the exposure
hue=.1                    # Increase the training samples by adjusting the tone

learning_rate=0.0013      # Learning rate, which is an important parameter, determines the speed of training convergence and whether it can achieve good results
burn_in=1000              # The learning rate setting is related. When it is less than the parameter, there is a way to update it. When it is greater than the parameter, the policy update method is adopted
max_batches = 500500      # Stop training when the training batch reaches this participation
policy=steps              # Learning rate adjustment strategy
steps=400000,450000       # step and scales are used together, which means that the learning rate decreases by 10 times when it reaches 400000 and 450000 respectively, because it converges slowly later

#cutmix=1                 # cutmix transform is a way of data enhancement
mosaic=1                  # mosaic transform is a way of data enhancement

  in addition to these, if you are training your own dataset, the number of detected categories may not be the official 20. Therefore, some modifications need to be made to the yolo layer. Take one of the yolo layers:

filters=75          # filters = 3*(classes+5), which needs to be modified according to the number of your classes

mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=20        # Number of detection categories
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.05

    The anchors in yolo layer also need to be modified for different detection tasks. For example, for the detection person, the anchor frame needs to be thin and tall, such as the detection vehicle, which may prefer the narrow and wide anchor frame. Then, some parameters such as confidence threshold and nms threshold also need to be adjusted during training.

     Then explain why you need to modify the filters of one layer of convolution on yolo. I'm in this article< [experience sharing] analyze the pointer offset logic of darknet entry_index >Some analysis has been done. From the data layout of yolo layer:

     (1) According to four dimensions [N, C, H, W], N is batch, C is 3 * (5 + classes), and H / W is feature_map height and width. It needs to be explained that C, C = 3 * (1 + 4 + classes), where 1 represents the confidence, 4 is the location information of the detection box, and classes is the number of categories, that is, each category gives a detection score, and multiplying 3 means that each grid has 3 anchor boxes. In this way, the four-dimensional data arrangement accepted by yolo layer is formed, that is, the output data arrangement of the previous layer of yolo;

     (2) As for the output of yolo layer, one-dimensional dynamic array will be used in darknet to store the data of yolo layer. Here is how to convert four-dimensional data into one-dimensional data. This is done in darknet. Assuming that the four-dimensional data is [n, c, h, w] and the corresponding index of each dimension is [n, c, h, w], then the expansion is n*C*H*W + c*H*W + h*W + w, which is stored in * output according to this logic.

    In this way, looking back, it should be better to understand why the filters of the convolution layer on yolo are 3 * (classes + 5).

  well, let's start training, or perform:

./darknet detector train cfg/voc.data cfg/yolov4.cfg yolov4.conv.137 -dont_show

  if you need to save the training log, you can do this:

./darknet detector train cfg/voc.data cfg/yolov4l.cfg yolov4.conv.137 2>1 | tee visualization/train_yolov4.log

    The console will output the training log:

  after the training, the final and intermediate weight files obtained from the training will be saved in backup = /home/pjreddie/backup /. If the effect is satisfactory, it can be deployed. For target detection, the indicator to measure the effect is generally map.

    Well, the above share the methods of compiling darknet on ubuntu and training yolo. I hope my sharing can be a little helpful to your learning.

[official account transmission]
<[model training] ubuntu compiles Darknet and YOLO training>

Posted by surajkandukuri on Tue, 30 Nov 2021 05:17:38 -0800