voxelnet tensorflow code analysis learning notes

Unofficial Code: github codehttps://github.com/jeasinema/VoxelNet-tensorflow

The network structure can be divided into the above three parts: feature learning network, revolutionary middle layers, and the RPN part of the final production forecast results and boxes (refer to fast r-cnn for this part)

1, feature learning network

This part mainly divides the point cloud into small voxels, fully connects each voxel containing points, generates VFE layer for each voxel, and finally obtains the four-dimensional tensor, that is, all the steps shown in the figure below are completed.

VFE layer

Non empty voxel V (voxel contains T < = t points) is processed, and point Pi includes its XYZ coordinates and reflectivity information [Xi,Yi,Zi,ri].
The following are the steps for processing a single V. all non empty V processing is the same
1. Calculate the average value of points in V to obtain (Vx,Vy,Vz)
2. The mean value is extended to the tensor of each point, and the dimension is 7, [Xi, Yi, Zi, RI, Xi VX, Yi vy, Zi VZ]
3. Each point in V is transformed into a feature space through the fully connected network FCN
FCN layer includes linear layer, BN layer and ReLU layer
4. Maximum pooling processes each point feature of the full connection layer
5. Finally, the maximum value pool processing results of each point are combined with the FCN layer processing results

Code analysis training strategy

Setting of learning rate

        self.cls = cls
        self.single_batch_size = single_batch_size
        self.learning_rate = tf.Variable(
            float(learning_rate), trainable=False, dtype=tf.float32)
        self.global_step = tf.Variable(1, trainable=False)
        self.epoch = tf.Variable(0, trainable=False)
        self.epoch_add_op = self.epoch.assign(self.epoch + 1)
        self.alpha = alpha
        self.beta = beta
        self.avail_gpus = avail_gpus

        boundaries = [80, 120]
        values = [ self.learning_rate, self.learning_rate * 0.1, self.learning_rate * 0.01 ]
        lr = tf.train.piecewise_constant(self.epoch, boundaries, values)

Use tf.train.piece wise_ Constant() can change the learning rate during training in two ways. The first is the exponential decay of the learning rate, and the second is to specify a learning rate within a certain range of iterations.
tf.train.piecewise_constant() is the second. boundaries = [80, 120] is the range interval. Values = [self.learning_rate, self.learning_rate * 0.1, self.learning_rate * 0.01] defines the learning rate of each interval, that is, the learning rate is self.learning in steps 0 ~ 80_ Rate, 80 ~ 120, the learning rate is self. Learning_ After rate * 0.1120, the learning rate is self.learning_ rate * 0.01. self.epoch represents the number of iterations in the function global_step.

Code analysis II - VFE layer implementation

From model.group is referenced in the code model.py_ Pointcloud imports featurenet to implement the work of this layer
The following parameters are mainly passed in model.py

feature = FeatureNet(training=self.is_train,batch_size=self.single_batch_size)

group_pointcloud.py implements all the contents of the first layer feature learning network for two classes, FeatureNet class, and VFElayer defines how to implement each layer

class VFELayer(object):

    def __init__(self, out_channels, name):
        super(VFELayer, self).__init__()
        self.units = int(out_channels / 2)
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE) as scope:
            self.dense = tf.layers.Dense(
                self.units, tf.nn.relu, name='dense', _reuse=tf.AUTO_REUSE, _scope=scope)#Set the full connection layer. See the following for the specific explanation of the function. Output = m*k, k=out_channels / 2
            self.batch_norm = tf.layers.BatchNormalization(
                name='batch_norm', fused=True, _reuse=tf.AUTO_REUSE, _scope=scope)#Set BN layer

    def apply(self, inputs, mask, training):
        # [K, T, 7] tensordot [7, units] = [K, T, units]
        pointwise = self.batch_norm.apply(self.dense.apply(inputs), training)

        #n [K, 1, units]
        aggregated = tf.reduce_max(pointwise, axis=1, keep_dims=True)#axis=1, calculate the maximum value according to the row, that is, pool the characteristics of each point obtained by FCN layer
        # [K, T, units]
        repeated = tf.tile(aggregated, [1, cfg.VOXEL_POINT_COUNT, 1])#Copy the result of the maximum value of the column on each column so that subsequent point features can be combined

        # [K, T, 2 * units]
        concatenated = tf.concat([pointwise, repeated], axis=2)#Combine the pool processing result of the maximum value of each point with the FCN layer processing result

        mask = tf.tile(mask, [1, 1, 2 * self.units])

        concatenated = tf.multiply(concatenated, tf.cast(mask, tf.float32))#The pool processing result of the maximum value of each point in the non empty voxel is combined with the FCN layer processing result

        return concatenated


class FeatureNet(object):

    def __init__(self, training, batch_size, name=''):
        super(FeatureNet, self).__init__()
        self.training = training

        # scalar
        self.batch_size = batch_size
        # [ΣK, 35/45, 7]
        self.feature = tf.placeholder(
            tf.float32, [None, cfg.VOXEL_POINT_COUNT, 7], name='feature')
        # [ΣK]
        self.number = tf.placeholder(tf.int64, [None], name='number')
        # [ΣK, 4], each row stores (batch, d, h, w)
        self.coordinate = tf.placeholder(
            tf.int64, [None, 4], name='coordinate')

        with tf.variable_scope(name, reuse=tf.AUTO_REUSE) as scope:
            self.vfe1 = VFELayer(32, 'VFE-1')
            self.vfe2 = VFELayer(128, 'VFE-2')

        # boolean mask [K, T, 2 * units]
        mask = tf.not_equal(tf.reduce_max(
            self.feature, axis=2, keep_dims=True), 0)#Determine the non empty set tf.not_equal(x,y) returns X= True value of Y
        x = self.vfe1.apply(self.feature, mask, self.training)
        x = self.vfe2.apply(x, mask, self.training)

        # [ΣK, 128]
        voxelwise = tf.reduce_max(x, axis=1)

        # car: [N * 10 * 400 * 352 * 128]
        # pedestrian/cyclist: [N * 10 * 200 * 240 * 128]
        self.outputs = tf.scatter_nd(
            self.coordinate, voxelwise, [self.batch_size, 10, cfg.INPUT_HEIGHT, cfg.INPUT_WIDTH, 128])#Spread the updates to the new (initially zero) tensor according to indices. The final sparse tensor is obtained

1,tf.layers.dense( input, units=k )

Author: Mr Fenrir
Link: https://www.jianshu.com/p/3855908b4c29
Source: Jianshu
The function automatically generates a weight matrix kernel and an offset term bias internally. The specific dimensions of each variable are as follows: for the two-dimensional tensor input with size [m, n], TF. Layers. Deny() will generate a weight matrix kernel with size [n, k] and an offset term bias with size [m, k]. The internal calculation process is y = input * kernel + bias, and the dimension of the output value y is [M, k].
Routine:

import tensorflow as tf

# 1. Call tf.layers.deny to calculate
input = tf.reshape(tf.constant([[1., 2.], [2., 3.]]), shape=[4, 1])    
b1 = tf.layers.dense(input,
                     units=2,
                     kernel_initializer=tf.constant_initializer(value=2),   # shape: [1,2]
                     bias_initializer=tf.constant_initializer(value=1))     # shape: [4,2]

# 2. Calculate by matrix multiplication
kernel = tf.reshape(tf.constant([2., 2.]), shape=[1, 2])
bias = tf.reshape(tf.constant([1., 1., 1., 1., 1., 1., 1., 1.]), shape=[4, 2])
b2 = tf.add(tf.matmul(input, kernel), bias)

with tf.Session()as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b1))
    print(sess.run(b2))

2,tf.reduce_max()

https://blog.csdn.net/lllxxq141592654/article/details/85345864

import tensorflow as tf
import numpy as np

a=np.array([[1, 2],
            [5, 3],
            [2, 6]])

b = tf.Variable(a)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b))
    print('************')
    # For a two-dimensional matrix, the axis=0 axis can be understood as the row growth direction (down), and the axis=1 axis can be understood as the column growth direction (right)
    print(sess.run(tf.reduce_max(b, axis=1, keepdims=False)))  # keepdims=False,axis=1 is subtracted
    print('************')
    print(sess.run(tf.reduce_max(b, axis=1, keepdims=True)))
    print('************')
    print(sess.run(tf.reduce_max(b, axis=0, keepdims=True)))

[[1 2]
 [5 3]
 [2 6]]
************
[2 5 6]
************
[[2]
 [5]
 [6]]
************
[[5 6]]

3,tf.title()

https://www.cnblogs.com/yibeimingyue/p/11869882.html
tf.tile() usage introduction
tile() means tile, which is used for copying on the same dimension

tile(
    input,     #input
    multiples,  #Number of copies on the same dimension
    name=None
)

Examples are as follows:

with tf.Graph().as_default():
    a = tf.constant([1,2],name='a')
    b = tf.tile(a,[3])
    sess = tf.Session()
    print(sess.run(b))

Copy the same dimension of [1,2] three times. The dimensions of the multiples parameter should be consistent with the input dimension. The results are as follows:

[1 2 1 2 1 2]

with tf.Graph().as_default():
    a = tf.constant([[1,2],[3,4]],name='a')   
    b = tf.tile(a,[2,3])
    sess = tf.Session()
    print(sess.run(b))

Output:

[[1 2 1 2 1 2]
 [3 4 3 4 3 4]
 [1 2 1 2 1 2]
 [3 4 3 4 3 4]]

4,tf.scatter_nd()

Spread the updates to the new (initially zero) tensor according to indices.

    indices = tf.constant([[4], [3], [1], [7]])
    updates = tf.constant([9, 10, 11, 12])
    shape = tf.constant([8])
    scatter = tf.scatter_nd(indices, updates, shape)
    with tf.Session() as sess:
      print(sess.run(scatter))

2, Revolutionary middle layers

3, Region Proposal Network

The contents of the last two networks are put together in the program
model.py is defined as follows:

rpn = MiddleAndRPN(input=feature.outputs, alpha=self.alpha, beta=self.beta, training=self.is_train)

This function is in rpn.py

class MiddleAndRPN:
    def __init__(self, input, alpha=1.5, beta=1, sigma=3, training=True, name=''):
        # scale = [batchsize, 10, 400/200, 352/240, 128] should be the output of feature learning network
        self.input = input
        self.training = training
        # groundtruth(target) - each anchor box, represent as △x, △y, △z, △l, △w, △h, rotation
        self.targets = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 14])
        # postive anchors equal to one and others equal to zero(2 anchors in 1 position)
        self.pos_equal_one = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 2])
        self.pos_equal_one_sum = tf.placeholder(tf.float32, [None, 1, 1, 1])
        self.pos_equal_one_for_reg = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 14])
        # negative anchors equal to one and others equal to zero
        self.neg_equal_one = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 2])
        self.neg_equal_one_sum = tf.placeholder(tf.float32, [None, 1, 1, 1])

        with tf.variable_scope('MiddleAndRPN_' + name):
            # convolutinal middle layers
            temp_conv = ConvMD(3, 128, 64, 3, (2, 1, 1),
                               (1, 1, 1), self.input, name='conv1')
            temp_conv = ConvMD(3, 64, 64, 3, (1, 1, 1),
                               (0, 1, 1), temp_conv, name='conv2')
            temp_conv = ConvMD(3, 64, 64, 3, (2, 1, 1),
                               (1, 1, 1), temp_conv, name='conv3')
            temp_conv = tf.transpose(temp_conv, perm=[0, 2, 3, 4, 1])
            temp_conv = tf.reshape(
                temp_conv, [-1, cfg.INPUT_HEIGHT, cfg.INPUT_WIDTH, 128])

            # rpn
            # block1:
            temp_conv = ConvMD(2, 128, 128, 3, (2, 2), (1, 1),
                               temp_conv, training=self.training, name='conv4')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv5')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv6')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv7')
            deconv1 = Deconv2D(128, 256, 3, (1, 1), (0, 0),
                               temp_conv, training=self.training, name='deconv1')

            # block2:
            temp_conv = ConvMD(2, 128, 128, 3, (2, 2), (1, 1),
                               temp_conv, training=self.training, name='conv8')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv9')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv10')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv11')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv12')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv13')
            deconv2 = Deconv2D(128, 256, 2, (2, 2), (0, 0),
                               temp_conv, training=self.training, name='deconv2')

            # block3:
            temp_conv = ConvMD(2, 128, 256, 3, (2, 2), (1, 1),
                               temp_conv, training=self.training, name='conv14')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv15')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv16')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv17')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv18')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv19')
            deconv3 = Deconv2D(256, 256, 4, (4, 4), (0, 0),
                               temp_conv, training=self.training, name='deconv3')

            # final:
            temp_conv = tf.concat([deconv3, deconv2, deconv1], -1)
            # Probability score map, scale = [None, 200/100, 176/120, 2]
            p_map = ConvMD(2, 768, 2, 1, (1, 1), (0, 0), temp_conv,
                           training=self.training, activation=False, bn=False, name='conv20')
            # Regression(residual) map, scale = [None, 200/100, 176/120, 14]
            r_map = ConvMD(2, 768, 14, 1, (1, 1), (0, 0),
                           temp_conv, training=self.training, activation=False, bn=False, name='conv21')
            # softmax output for positive anchor and negative anchor, scale = [None, 200/100, 176/120, 1]
            self.p_pos = tf.sigmoid(p_map)
            #self.p_pos = tf.nn.softmax(p_map, dim=3)
            self.output_shape = [cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH]

            self.cls_pos_loss = (-self.pos_equal_one * tf.log(self.p_pos + small_addon_for_BCE)) / self.pos_equal_one_sum
            self.cls_neg_loss = (-self.neg_equal_one * tf.log(1 - self.p_pos + small_addon_for_BCE)) / self.neg_equal_one_sum
            
            self.cls_loss = tf.reduce_sum( alpha * self.cls_pos_loss + beta * self.cls_neg_loss )
            self.cls_pos_loss_rec = tf.reduce_sum( self.cls_pos_loss )
            self.cls_neg_loss_rec = tf.reduce_sum( self.cls_neg_loss )


            self.reg_loss = smooth_l1(r_map * self.pos_equal_one_for_reg, self.targets *
                                      self.pos_equal_one_for_reg, sigma) / self.pos_equal_one_sum
            self.reg_loss = tf.reduce_sum(self.reg_loss)

            self.loss = tf.reduce_sum(self.cls_loss + self.reg_loss)

            self.delta_output = r_map
            self.prob_output = self.p_pos


def smooth_l1(deltas, targets, sigma=3.0):
    sigma2 = sigma * sigma
    diffs = tf.subtract(deltas, targets)
    smooth_l1_signs = tf.cast(tf.less(tf.abs(diffs), 1.0 / sigma2), tf.float32)

    smooth_l1_option1 = tf.multiply(diffs, diffs) * 0.5 * sigma2
    smooth_l1_option2 = tf.abs(diffs) - 0.5 / sigma2
    smooth_l1_add = tf.multiply(smooth_l1_option1, smooth_l1_signs) + \
        tf.multiply(smooth_l1_option2, 1 - smooth_l1_signs)
    smooth_l1 = smooth_l1_add

    return smooth_l1


def ConvMD(M, Cin, Cout, k, s, p, input, training=True, activation=True, bn=True, name='conv'):
    temp_p = np.array(p)
    temp_p = np.lib.pad(temp_p, (1, 1), 'constant', constant_values=(0, 0))
    with tf.variable_scope(name) as scope:
        if(M == 2):
            paddings = (np.array(temp_p)).repeat(2).reshape(4, 2)
            pad = tf.pad(input, paddings, "CONSTANT")
            temp_conv = tf.layers.conv2d(
                pad, Cout, k, strides=s, padding="valid", reuse=tf.AUTO_REUSE, name=scope)
        if(M == 3):
            paddings = (np.array(temp_p)).repeat(2).reshape(5, 2)
            pad = tf.pad(input, paddings, "CONSTANT")
            temp_conv = tf.layers.conv3d(
                pad, Cout, k, strides=s, padding="valid", reuse=tf.AUTO_REUSE, name=scope)
        if bn:
            temp_conv = tf.layers.batch_normalization(
                temp_conv, axis=-1, fused=True, training=training, reuse=tf.AUTO_REUSE, name=scope)
        if activation:
            return tf.nn.relu(temp_conv)
        else:
            return temp_conv

def Deconv2D(Cin, Cout, k, s, p, input, training=True, bn=True, name='deconv'):
    temp_p = np.array(p)
    temp_p = np.lib.pad(temp_p, (1, 1), 'constant', constant_values=(0, 0))
    paddings = (np.array(temp_p)).repeat(2).reshape(4, 2)
    pad = tf.pad(input, paddings, "CONSTANT")
    with tf.variable_scope(name) as scope:
        temp_conv = tf.layers.conv2d_transpose(
            pad, Cout, k, strides=s, padding="SAME", reuse=tf.AUTO_REUSE, name=scope)
        if bn:
            temp_conv = tf.layers.batch_normalization(
                temp_conv, axis=-1, fused=True, training=training, reuse=tf.AUTO_REUSE, name=scope)
        return tf.nn.relu(temp_conv)


if(__name__ == "__main__"):
    m = MiddleAndRPN(tf.placeholder(
        tf.float32, [None, 10, cfg.INPUT_HEIGHT, cfg.INPUT_WIDTH, 128]))

Posted by gauss on Tue, 21 Sep 2021 23:28:30 -0700

Programmer Group