Improvement of YOLOv5 - loss function for target detection
flyfish
Full code download address
The improved source code is fully compatible with the original YOLOv5:v5 version. At the same time, the backbone supports mobilenetv3 and shufflenetv2, and the original backbone supports all of them
Categories include relationships. For example, a target can be a person, a man, or a category with mutually exclusive relationships, such as a person, a cat, and a dog. Try to improve the loss function when the category of data set is mutually exclusive
A category is one that contains relationships
BCEWithLogitsLoss can be used for multi label classification. A target can belong to one or more categories. For example, a target can be people, men and children. There is an inclusive relationship in the category.
Because BCEWithLogitsLoss = Sigmoid + BCELoss, BCEWithLogitsLoss adds Sigmoid to the loss function. The sum of Sigmoid probabilities does not need to be 1.
For example, the calculation result of sigmoid takes out a line and looks at the output [0.5100, 0.6713, 0.5025] in the example code. The cumulative number is not 1. If the defined threshold is greater than or equal to 0.50. Then the target belongs to three classes at the same time. As a result, if it is required to belong to only one class, the largest one can be taken.
Categories are mutually exclusive
If the detected category is a mutually exclusive relationship, such as human, cat and dog, how to transform it?
CrossEntropyLoss = LogSoftmax + NLLLoss
The sum of softmax probabilities is 1 or close to 1. Softmax has a greater probability than other values. If the Sigmoid value is large, the probability is large, but the probability will not be greater than that of another value.
Look at the output [0.2543, 0.4990, 0.2467] in the sample code. The sum of these three numbers is 1.
Sigmoid and Softmax sample code
import torch import torch.nn as nn input = torch.Tensor([[0.0402, 0.7142,0.01], [0.2214, 0.4781,0.01]]) net1 = nn.Sigmoid() output1 = net1(input) print(output1) # tensor([[0.5100, 0.6713, 0.5025], # [0.5551, 0.6173, 0.5025]]) net2 = nn.Softmax(dim=-1) output2 = net2(input) print(output2) # tensor([[0.2543, 0.4990, 0.2467], # [0.3224, 0.4167, 0.2609]])
Softmax is mutually exclusive, so try to use the cross entropy loss transformation.
Change the code as follows or go directly here YOLOv5-ShuffleNetV2-CrossEntropyLoss Download all codes
Training phase
utils/loss.py
class ComputeLoss: # Compute losses def __init__(self, model, autobalance=False): super(ComputeLoss, self).__init__() device = next(model.parameters()).device # get model device h = model.hyp # hyperparameters # Define criteria #changed by Sisyphus BCEcls = nn.CrossEntropyLoss() BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device)) # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3 self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0)) # positive, negative BCE targets print("self.cp, self.cn: ",self.cp,":", self.cn) # Focal loss g = h['fl_gamma'] # focal loss gamma if g > 0: BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g) det = model.module.model[-1] if is_parallel(model) else model.model[-1] # Detect() module self.balance = {3: [4.0, 1.0, 0.4]}.get(det.nl, [4.0, 1.0, 0.25, 0.06, .02]) # P3-P7 self.ssi = list(det.stride).index(16) if autobalance else 0 # stride 16 index self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, model.gr, h, autobalance for k in 'na', 'nc', 'nl', 'anchors': setattr(self, k, getattr(det, k)) def __call__(self, p, targets): # predictions, targets, model device = targets.device lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device) tcls, tbox, indices, anchors = self.build_targets(p, targets) # targets # Losses for i, pi in enumerate(p): # layer index, layer predictions b, a, gj, gi = indices[i] # image, anchor, gridy, gridx print("indices[i] :",indices[i].shape ) tobj = torch.zeros_like(pi[..., 0], device=device) # target obj n = b.shape[0] # number of targets if n: ps = pi[b, a, gj, gi] # prediction subset corresponding to targets # Regression pxy = ps[:, :2].sigmoid() * 2. - 0.5 pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i] pbox = torch.cat((pxy, pwh), 1) # predicted box iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True) # iou(prediction, target) lbox += (1.0 - iou).mean() # iou loss # Objectness tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype) # iou ratio # Classification if self.nc > 1: # cls loss (only if multiple classes) t = torch.full_like(ps[:, 5:], self.cn, device=device) # targets t[range(n), tcls[i]] = self.cp #lcls += self.BCEcls(ps[:, 5:], t) # BCE #changed by Sisyphus 20210914 lcls += self.BCEcls(ps[:, 5:], tcls[i].clone().detach()) # Append targets to text file # with open('targets.txt', 'a') as file: # [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)] obji = self.BCEobj(pi[..., 4], tobj) lobj += obji * self.balance[i] # obj loss if self.autobalance: self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item() if self.autobalance: self.balance = [x / self.balance[self.ssi] for x in self.balance] lbox *= self.hyp['box'] lobj *= self.hyp['obj'] lcls *= self.hyp['cls'] bs = tobj.shape[0] # batch size loss = lbox + lobj + lcls return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach()
Reasoning stage
models/yolo.py
class Detect(nn.Module): stride = None # strides computed during build export = False # onnx export def __init__(self, nc=80, anchors=(), ch=()): # detection layer super(Detect, self).__init__() self.nc = nc # number of classes self.no = nc + 5 # number of outputs per anchor self.nl = len(anchors) # number of detection layers self.na = len(anchors[0]) // 2 # number of anchors self.grid = [torch.zeros(1)] * self.nl # init grid a = torch.tensor(anchors).float().view(self.nl, -1, 2) self.register_buffer('anchors', a) # shape(nl,na,2) self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2)) # shape(nl,1,na,1,1,2) self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch) # output conv def forward(self, x): # x = x.copy() # for profiling z = [] # inference output self.training |= self.export for i in range(self.nl): x[i] = self.m[i](x[i]) # conv bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85) x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous() if not self.training: # inference if self.grid[i].shape[2:4] != x[i].shape[2:4]: self.grid[i] = self._make_grid(nx, ny).to(x[i].device) y = x[i].sigmoid() tmp = x[i][...,5:]# add by Sisyphus tmp = tmp.softmax(dim=-1) y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh y[...,5:] = tmp z.append(y.view(bs, -1, self.no)) return x if self.training else (torch.cat(z, 1), x) @staticmethod def _make_grid(nx=20, ny=20): yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)]) return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
You can try it with your own data set
People who read this article also read the following
Target detection YOLOv5 - Early Stopping mechanism
Object detection YOLOv5 - Fusion of convolution layer and BN layer
Target detection YOLOv5 - Sample Assignment
Target detection YOLOv5 - data enhancement
Target detection YOLOv5 - learning rate
Target detection YOLOv5 - multi machine multi card training
Object detection YOLOv5 - floating point modulo
Target detection YOLOv5 - apply NMS in multiple categories (non maximum suppression)
Object detection yolov5 - loss for objectivity and classification
Object detection yolov5 - loss for bounding box region
Target detection YOLOv5 - index calculation
Target detection YOLOv5 - anchor settings
Object detection YOLOv5 - SPP module
Target detection YOLOv5 - bounding box prediction
Target detection YOLOv5 - custom network structure (YOLOv5 shufflenetv2)
Object detection YOLOv5 - Common bounding box coordinate representation method
Object detection YOLOv5 - relationship between image size and loss weight
Target detection YOLOv5 - change the depth and width of the network according to the configuration
Target detection YOLOv5 - transfer to ncnn mobile deployment
Target detection yolov5 - Focus in backbone
Target detection YOLOv5 - model training, reasoning, export command
Object detection YOLOv5 - face dataset widerface to YOLOv5 format
Target detection YOLOv5 - dataset format used
Target detection YOLOv5 - use COCO dataset in
Target detection YOLOv5 - convert crowdhuman dataset format to YOLOv5 format