Detectron 2 (target detection framework) - 07: source details - model building - retinaet as an example

Keywords: network P4 Lambda

The following links are personal opinions on detectron 2 (target detection framework). If there is any error, please point out it. I will correct it as soon as possible. Interested friends can add wechat: a944284742 to discuss technology with each other. If it helps you anything, remember to like it! Because it's the biggest encouragement for me.
Detectron 2 (target detection framework) no dead angle play-00: Directory


Through the previous introduction, we can say that we have a very good understanding of detectron 2 in general. Next, let's look at the construction of network model. Based on the previous explanation, all models have been and are known to exist in a container. To obtain the desired model from the container, the core code is located in detectron 2 / modeling / meta arch /

from detectron2.utils.registry import Registry
META_ARCH_REGISTRY = Registry("META_ARCH")  # noqa F401 
META_ARCH_REGISTRY.__doc__ = """

def build_model(cfg):
    meta_arch = cfg.MODEL.META_ARCHITECTURE
    return META_ARCH_REGISTRY.get(meta_arch)(cfg)

It first loads the model container through meta arch Registry ("meta arch"), then constructs it through meta arch = retinaet, and obtains the retinaet model.


The implementation of RetinaNet is located in / detectron 2 / modeling / meta arch / as follows:

class RetinaNet(nn.Module):

The @ meta? Arch? Registry. Register() on it is to register the retinaet model into the meta? Arch? Registry container. First, let's look at the functions implemented by RetinaNet, as follows:

# Network front line communication
def forward(self, batched_inputs):

# Calculate loss
def losses(self, gt_classes, gt_anchors_deltas, pred_class_logits, pred_anchor_deltas):

# Combining with anchors to obtain ground truth
def get_ground_truth(self, anchors, targets):

# Network reasoning, if the size of the picture is the same, a single batch_
def inference(self, box_cls, box_delta, anchors, image_sizes):

# Network reasoning: if the size of pictures is the same, they cannot be unified into one batch size, then reasoning is carried out one by one
def inference_single_image(self, box_cls, box_delta, anchors, image_size):

# Image preprocessing
def preprocess_image(self, batched_inputs):

After a general understanding of the functions implemented, let's take a look at the initialization function of class RetinaNet:

    def __init__(self, cfg):

        # GPU related designation
        self.device = torch.device(cfg.MODEL.DEVICE)

        # fmt: off,
        # 11 when training my dataset
        self.num_classes              = cfg.MODEL.RETINANET.NUM_CLASSES
        # Default is ['p3 ',' P4 ',' P5 ',' P6 ',' P7 ']
        self.in_features              = cfg.MODEL.RETINANET.IN_FEATURES

        # Loss parameters: the correlation coefficient of focal Hou loss and the smoothing coefficient of L1
        self.focal_loss_alpha         = cfg.MODEL.RETINANET.FOCAL_LOSS_ALPHA
        self.focal_loss_gamma         = cfg.MODEL.RETINANET.FOCAL_LOSS_GAMMA
        self.smooth_l1_loss_beta      = cfg.MODEL.RETINANET.SMOOTH_L1_LOSS_BETA

        # Inference parameters:
        self.score_threshold          = cfg.MODEL.RETINANET.SCORE_THRESH_TEST
        self.topk_candidates          = cfg.MODEL.RETINANET.TOPK_CANDIDATES_TEST
        self.nms_threshold            = cfg.MODEL.RETINANET.NMS_THRESH_TEST

        # Maximum number of box es per picture
        self.max_detections_per_image = cfg.TEST.DETECTIONS_PER_IMAGE
        # fmt: on

        # Backbone network
        self.backbone = build_backbone(cfg)

        # The output shape of the feature map obtained from the backbone network mainly specifies the multiple of scaling, as follows:
        # p3 = ShapeSpec(channels=256, height=None, width=None, stride=8)
        # p4 = ShapeSpec(channels=256, height=None, width=None, stride=16)
        # p5 = ShapeSpec(channels=256, height=None, width=None, stride=32)
        # p6 = ShapeSpec(channels=256, height=None, width=None, stride=64)
        # ......
        backbone_shape = self.backbone.output_shape()
        feature_shapes = [backbone_shape[f] for f in self.in_features]
        # Head model building, mainly building multiple branches, respectively box, confidence regression, and classification
        self.head = RetinaNetHead(cfg, feature_shapes)
        # According to the parameters of cfg configuration, generate anchor
        self.anchor_generator = build_anchor_generator(cfg, feature_shapes)

        # Matching and loss
        # Convert box (dx, dy, dw, dh) to center band and offset value
        self.box2box_transform = Box2BoxTransform(weights=cfg.MODEL.RPN.BBOX_REG_WEIGHTS)
        # The predicted box matches the ground truth
        self.matcher = Matcher(

        # Average pixel level
        pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(self.device).view(3, 1, 1)
        # Standard deviation at pixel level
        pixel_std = torch.Tensor(cfg.MODEL.PIXEL_STD).to(self.device).view(3, 1, 1)
        # Regularization
        self.normalizer = lambda x: (x - pixel_mean) / pixel_std

In fact, through the creation of the above classes and functions, the process is outlined, but they only create classes and functions, which will be called in the back propagation. So let's take a look at retinaet's anti back propagation.


def forward(self, batched_inputs):   
            batched_inputs: a list, batch
                Each item in the list con
                For now, each item in the
                * image: Tensor, image in
                * instances: Instances   
                Other information that's 
                * "height", "width" (int)
                    See :meth:`postproces
            dict[str: Tensor]:           
                mapping from a named loss
        # Batched input is a batch size
        # If it includes: 'file Ou name' = '/ data / ZWH / 0
        # 'height' = 492     'width' = 65
        # 'instances': some comments, such as the coordinate of semantic segmentation, the seat of box
        # The preprocess? Image function is used to
        images = self.preprocess_image(ba
        if "instances" in batched_inputs[
            gt_instances = [x["instances"
        elif "targets" in batched_inputs[
                logging.WARN, "'targets' 
            gt_instances = [x["targets"].
            gt_instances = None          
        # The characteristic diagram P3 [24256,72,92], P4 [24,2] are obtained
        # The characteristic figure P5 [24256,9,12], P7 [24,25] are obtained
        features = self.backbone(images.t
        features = [features[f] for f in 
        # The feature map is sent to the head together for fusion, and the box, confidence level are predicted,
        box_cls, box_delta = self.head(fe
        # Generate anchor                       
        anchors = self.anchor_generator(f
        # If it's training, calculate the loss in combination with the ground truth  
            gt_classes, gt_anchors_reg_de
            return self.losses(gt_classes
        # If it's training, return the predicted result                 
            results = self.inference(box_
            processed_results = []       
            for results_per_image, input_
                results, batched_inputs, 
                height = input_per_image.
                width = input_per_image.g
                r = detector_postprocess(
            return processed_results     

English notes have not been cancelled here. Interested friends can read the following. Through the above notes, we should have a better understanding of the positive propagation of the network.

Input data source

Although it has been explained almost in 78-78, there is another problem, that is, in the process of forward propagation, that is, the source of the input of the forward (self, batched input) function, batched input is actually very easy to find. It is located at:

class SimpleTrainer(TrainerBase):
    def __init__(self, model, data_loader, optimizer):\
    def run_step(self):
        # Get the data of a batch ENU size. If necessary, you can decorate the dataloader
        If your want to do something with the data, you can wrap the dataloader.
        data = next(self._data_loader_iter)
        data_time = time.perf_counter() - start	

The data is corresponding to the previously mentioned batched Ou inputs, which is displayed in the configuration and operation of the user as follows:

In fact, as mentioned before, it is:

        Batched input is a batch size
        If it includes: 'file Ou name' = '/ data / ZWH / 0
        'height' = 492     'width' = 65
        'instances': some comments, such as the coordinate of semantic segmentation, the coordinate of box


So far, the analysis is more detailed. Later, we study data preprocessing, anchor, and loss calculation. Later on.
Come on, Wuhan, China. When peaches and plums are in full bloom, it must be the beginning for you and me to cook and work together!

233 original articles published, praised 1708, visited 150000+
Private letter follow

Posted by HA7E on Tue, 28 Jan 2020 05:20:32 -0800