Improvement of YOLOv5 - loss function for target detection

Keywords: PyTorch, Deep Learning, Object Detection, YOLO, YOLOv5

flyfish

Full code download address
The modified source code is fully compatible with the original YOLOv5 v5 release. All of the original backbones remain supported, and MobileNetV3 and ShuffleNetV2 can also be used as the backbone.

Categories can be related by inclusion: for example, one target can be both a person and a man. Categories can also be mutually exclusive, such as person, cat, and dog. This article tries to improve the loss function for datasets whose categories are mutually exclusive.

Categories with inclusion relationships

BCEWithLogitsLoss can be used for multi-label classification, where a target can belong to one or more categories. For example, a target can be a person, a man, and a child at the same time, because these categories include one another.
BCEWithLogitsLoss = Sigmoid + BCELoss, that is, the Sigmoid is built into the loss function. Sigmoid is applied to each class independently, so the resulting probabilities do not need to sum to 1.
For example, take one row of the Sigmoid output in the sample code: [0.5100, 0.6713, 0.5025]. The values do not sum to 1. If a score greater than or equal to a threshold of 0.50 counts as a match, the target belongs to all three classes at the same time. If the target is required to belong to only one class, the class with the largest score can be taken.
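
The relationship between the two losses, and the two ways of reading the Sigmoid scores, can be checked with a minimal sketch. The target vector and the names `loss_fused`, `loss_two_step`, `multi_label`, and `single_label` are illustrative assumptions and are not part of the YOLOv5 code.

```python
import torch
import torch.nn as nn

# One row of class logits (same values as the first row of the sample code)
logits = torch.tensor([[0.0402, 0.7142, 0.01]])
# Illustrative multi-label target: the object belongs to classes 0 and 1
targets = torch.tensor([[1.0, 1.0, 0.0]])

# BCEWithLogitsLoss applies the sigmoid internally ...
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)
# ... so it should match an explicit sigmoid followed by BCELoss
loss_two_step = nn.BCELoss()(torch.sigmoid(logits), targets)

probs = torch.sigmoid(logits)
threshold = 0.5
multi_label = probs >= threshold      # every class at or above the threshold
single_label = probs.argmax(dim=-1)   # keep only the highest-scoring class

print(loss_fused.item(), loss_two_step.item())
print(multi_label, single_label)
```

The fused version is usually preferred in training because it is numerically more stable than applying the sigmoid separately.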

Categories are mutually exclusive

If the detected categories are mutually exclusive, such as person, cat, and dog, how should the loss be changed?
CrossEntropyLoss = LogSoftmax + NLLLoss
Softmax probabilities sum to 1 (up to floating-point rounding). Softmax makes the classes compete: raising the probability of one class lowers the probabilities of the others. With Sigmoid, a large input gives a large probability for that class, but this has no effect on the probabilities of the other classes.
Look at the output [0.2543, 0.4990, 0.2467] in the sample code: the three numbers sum to 1.
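
The identity CrossEntropyLoss = LogSoftmax + NLLLoss can be verified in the same way. This is a minimal sketch; the class indices in `class_ids` are made up for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0.0402, 0.7142, 0.01],
                       [0.2214, 0.4781, 0.01]])
# Illustrative ground truth: one class index per row (mutually exclusive classes)
class_ids = torch.tensor([1, 0])

# CrossEntropyLoss takes raw logits and integer class indices
loss_fused = nn.CrossEntropyLoss()(logits, class_ids)

# Equivalent two-step computation
log_probs = nn.LogSoftmax(dim=-1)(logits)
loss_two_step = nn.NLLLoss()(log_probs, class_ids)

# Each row of softmax probabilities sums to 1
row_sums = log_probs.exp().sum(dim=-1)

print(loss_fused.item(), loss_two_step.item(), row_sums)
```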

Sigmoid and Softmax sample code

import torch
import torch.nn as nn

# Two rows of raw scores (logits) for three classes
x = torch.tensor([[0.0402, 0.7142, 0.01],
                  [0.2214, 0.4781, 0.01]])

# Sigmoid: each class is scored independently, rows do not sum to 1
net1 = nn.Sigmoid()
output1 = net1(x)
print(output1)
# tensor([[0.5100, 0.6713, 0.5025],
#         [0.5551, 0.6173, 0.5025]])

# Softmax over the class dimension: each row sums to 1
net2 = nn.Softmax(dim=-1)
output2 = net2(x)
print(output2)
# tensor([[0.2543, 0.4990, 0.2467],
#         [0.3224, 0.4167, 0.2609]])

Softmax treats the classes as mutually exclusive, so the class loss is changed to cross-entropy loss.

Change the code as follows, or download the complete code from YOLOv5-ShuffleNetV2-CrossEntropyLoss.

Training phase

utils/loss.py

class ComputeLoss:
    # Compute losses
    def __init__(self, model, autobalance=False):
        super(ComputeLoss, self).__init__()
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters

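        # Define criteria
        # changed by Sisyphus: the class loss is now CrossEntropyLoss (mutually exclusive classes).
        # The name BCEcls is kept so that the rest of the file needs no changes.
        BCEcls = nn.CrossEntropyLoss()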
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        # Note: the CrossEntropyLoss class loss takes class indices, so cp/cn no longer affect it
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets

        # Focal loss (the FocalLoss wrapper is written for BCEWithLogitsLoss, so fl_gamma should stay 0 here)
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

        det = model.module.model[-1] if is_parallel(model) else model.model[-1]  # Detect() module
        self.balance = {3: [4.0, 1.0, 0.4]}.get(det.nl, [4.0, 1.0, 0.25, 0.06, .02])  # P3-P7
        self.ssi = list(det.stride).index(16) if autobalance else 0  # stride 16 index
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, model.gr, h, autobalance
        for k in 'na', 'nc', 'nl', 'anchors':
            setattr(self, k, getattr(det, k))

    def __call__(self, p, targets):  # predictions, targets, model
        device = targets.device
        lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)
        tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets

        # Losses
        for i, pi in enumerate(p):  # layer index, layer predictions
            b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
            tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

            n = b.shape[0]  # number of targets
            if n:
                ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets

                # Regression
                pxy = ps[:, :2].sigmoid() * 2. - 0.5
                pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
                pbox = torch.cat((pxy, pwh), 1)  # predicted box
                iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)  # iou(prediction, target)
                lbox += (1.0 - iou).mean()  # iou loss

                # Objectness
                tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio

                # Classification
                if self.nc > 1:  # cls loss (only if multiple classes)
                    # Original BCE version built a dense target t from self.cn / self.cp:
                    # t = torch.full_like(ps[:, 5:], self.cn, device=device)  # targets
                    # t[range(n), tcls[i]] = self.cp
                    # lcls += self.BCEcls(ps[:, 5:], t)  # BCE
                    # changed by Sisyphus 20210914: CrossEntropyLoss takes the class indices directly
                    lcls += self.BCEcls(ps[:, 5:], tcls[i].clone().detach())

                # Append targets to text file
                # with open('targets.txt', 'a') as file:
                #     [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]

            obji = self.BCEobj(pi[..., 4], tobj)
            lobj += obji * self.balance[i]  # obj loss
            if self.autobalance:
                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()

        if self.autobalance:
            self.balance = [x / self.balance[self.ssi] for x in self.balance]
        lbox *= self.hyp['box']
        lobj *= self.hyp['obj']
        lcls *= self.hyp['cls']
        bs = tobj.shape[0]  # batch size

        loss = lbox + lobj + lcls
        return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach()
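
The key difference in the classification branch is the target format. BCEWithLogitsLoss expects a float target with the same shape as the logits, while CrossEntropyLoss expects one integer class index per row. The following is a minimal sketch with made-up sizes; `class_logits` stands in for `ps[:, 5:]` and `class_ids` for `tcls[i]`, and the values of `cp` and `cn` assume no label smoothing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, nc = 4, 3                              # hypothetical: 4 matched targets, 3 classes
class_logits = torch.randn(n, nc)         # stands in for ps[:, 5:]
class_ids = torch.tensor([0, 2, 1, 2])    # stands in for tcls[i] (dtype int64)

# Original style: dense float target of shape (n, nc)
cp, cn = 1.0, 0.0                         # positive / negative targets, assuming no label smoothing
t = torch.full_like(class_logits, cn)
t[range(n), class_ids] = cp
bce_loss = nn.BCEWithLogitsLoss()(class_logits, t)

# Modified style: class indices of shape (n,), no dense target needed
ce_loss = nn.CrossEntropyLoss()(class_logits, class_ids)

print(t)
print(bce_loss.item(), ce_loss.item())
```

Because the two losses are on different scales, the `cls` gain in the hyperparameters may need retuning after the switch.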

Inference stage

models/yolo.py

class Detect(nn.Module):
    stride = None  # strides computed during build
    export = False  # onnx export

    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
        super(Detect, self).__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        a = torch.tensor(anchors).float().view(self.nl, -1, 2)
        self.register_buffer('anchors', a)  # shape(nl,na,2)
        self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv

    def forward(self, x):
        # x = x.copy()  # for profiling
        z = []  # inference output
        self.training |= self.export
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

                y = x[i].sigmoid()
                # added by Sisyphus: class scores come from softmax over the raw logits, not sigmoid
                cls_prob = x[i][..., 5:].softmax(dim=-1)
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                y[..., 5:] = cls_prob
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1), x)

    @staticmethod
    def _make_grid(nx=20, ny=20):
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
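
The effect of the inference change can be seen on a dummy tensor: the box and objectness channels still go through Sigmoid, while the class channels are replaced by a Softmax over the raw class logits. This is a minimal sketch with hypothetical sizes, and it leaves out the grid and anchor decoding.

```python
import torch

torch.manual_seed(0)
bs, na, ny, nx, nc = 1, 3, 2, 2, 3            # hypothetical sizes
raw = torch.randn(bs, na, ny, nx, nc + 5)     # stands in for x[i] after view/permute

y = raw.sigmoid()                             # box and objectness channels use sigmoid
y[..., 5:] = raw[..., 5:].softmax(dim=-1)     # class channels use softmax on the raw logits

class_sums = y[..., 5:].sum(dim=-1)           # one value per anchor and grid cell
print(class_sums)
```

The Softmax must be taken from the raw logits, not from the Sigmoid output, which is why the modified `forward` reads `x[i][..., 5:]` rather than `y[..., 5:]`.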

You can try this change with your own dataset.


Posted by born4flirt on Tue, 28 Sep 2021 02:52:14 -0700