GFPGAN source code analysis - Part 6

Parameters of GFPGANv1Clean.__init__():

self,
out_size,
num_style_feat=512,
channel_multiplier=1,
decoder_load_path=None,
fix_decoder=True,
# for stylegan decoder
num_mlp=8,
input_is_latent=False,
different_w=False,
narrow=1,
sft_half=False

It is called in GFPGANer.__init__() as follows:

self.gfpgan = GFPGANv1Clean(
    out_size=512,
    num_style_feat=512,
    channel_multiplier=channel_multiplier,
    decoder_load_path=None,
    fix_decoder=False,
    num_mlp=8,
    input_is_latent=True,
    different_w=True,
    narrow=1,
    sft_half=True)

(1) Settings for channels

In the actual call narrow = 1, so unet_narrow = 0.5.

channels maps each feature-map resolution (used as a string key) to the number of output channels of the convolution layers at that resolution.

unet_narrow = narrow * 0.5

channels = {
    '4': int(512 * unet_narrow),
    '8': int(512 * unet_narrow),
    '16': int(512 * unet_narrow),
    '32': int(512 * unet_narrow),
    '64': int(256 * channel_multiplier * unet_narrow),
    '128': int(128 * channel_multiplier * unet_narrow),
    '256': int(64 * channel_multiplier * unet_narrow),
    '512': int(32 * channel_multiplier * unet_narrow),
    '1024': int(16 * channel_multiplier * unet_narrow)
}
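As a quick check, the table can be evaluated in plain Python. This is a minimal sketch that assumes narrow = 1 (from the call above) and channel_multiplier = 2 (an assumption, taken from the value implied by the comment channels['512'] = 32*2*0.5 in the next step). The base dictionary is only a compact way of writing the same table and is not part of GFPGAN.

```python
# Assumed values for illustration: narrow comes from the call above,
# channel_multiplier = 2 is an assumption.
narrow = 1
channel_multiplier = 2
unet_narrow = narrow * 0.5

# Base channel count per resolution (hypothetical helper table).
# Only resolutions >= 64 are scaled by channel_multiplier.
base = {4: 512, 8: 512, 16: 512, 32: 512,
        64: 256, 128: 128, 256: 64, 512: 32, 1024: 16}

channels = {
    str(res): int(c * (channel_multiplier if res >= 64 else 1) * unet_narrow)
    for res, c in base.items()
}

for res, c in channels.items():
    print(f"resolution {res:>4}: {c} channels")
```

Under these assumptions every resolution from 4 to 64 ends up with 256 channels, and the count then halves at each larger resolution.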

(2) Call torch.nn.Conv2d() to create the first convolution layer

# out_size = 512, so log_size = 9
self.log_size = int(math.log(out_size, 2))
# first_out_size = 2 ** 9 = 512
first_out_size = 2 ** (int(math.log(out_size, 2)))
# channels['512'] = int(32 * 2 * 0.5) = 32 (with channel_multiplier = 2)
self.conv_body_first = nn.Conv2d(3, channels[f'{first_out_size}'], 1)

The parameters of nn.Conv2d() are:

in_channels: int,  # number of input channels [required]
out_channels: int,  # number of output channels [required]
kernel_size: _size_2_t,  # kernel size: an int (side length of a square kernel) or a tuple (height, width) [required]
stride: _size_2_t = 1,  # step of the sliding window
padding: Union[str, _size_2_t] = 0,  # padding added around the input; together with kernel_size and stride it determines the output size
dilation: _size_2_t = 1,  # spacing between the elements of the kernel
groups: int = 1,  # number of groups the input and output channels are split into
bias: bool = True,  # whether a learnable bias is added to the output
padding_mode: str = 'zeros',  # how the padded border is filled
device=None,
dtype=None

From this it follows that

self.conv_body_first = nn.Conv2d(3, channels[f'{first_out_size}'], 1)

# takes an input with 3 channels (RGB), applies a convolution kernel with a side length of 1, and produces an output with 32 channels.
# Because the kernel side length is 1 (with stride 1 and no padding), the output has the same height and width as the input; only the number of channels grows from 3 to 32.
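The shape claim can be checked directly. The following is a small sketch, assuming PyTorch is installed; it uses a 64x64 dummy image instead of 512x512 only to keep the check fast, and hard-codes 32 for channels['512'].

```python
import torch
from torch import nn

# 1x1 convolution: 3 input channels (RGB) -> 32 output channels
conv_body_first = nn.Conv2d(3, 32, 1)

# Dummy batch holding one 64x64 RGB image (size chosen for illustration)
x = torch.randn(1, 3, 64, 64)
y = conv_body_first(x)

print(tuple(x.shape))  # (1, 3, 64, 64)
print(tuple(y.shape))  # (1, 32, 64, 64)
```

Only the channel dimension changes, which is what a 1x1 kernel with stride 1 and no padding should produce.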

(3) Downsample

Downsampling is done by a stack of ResBlock modules:

# Number of channels coming out of conv_body_first (32 here)
in_channels = channels[f'{first_out_size}']
# Create a ModuleList container
self.conv_body_down = nn.ModuleList()
# i runs from self.log_size (9) down to 3: 7 iterations
for i in range(self.log_size, 2, -1):
    out_channels = channels[f'{2 ** (i - 1)}']
    # Create a ResBlock in 'down' mode and append it to the ModuleList
    self.conv_body_down.append(ResBlock(in_channels, out_channels, mode='down'))
    # The output channels of this block are the input channels of the next one
    in_channels = out_channels
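To see what the loop builds, it can be traced without PyTorch. This sketch hard-codes the channel table for narrow = 1 and channel_multiplier = 2 (assumed values) and records one entry per block. down_plan is a hypothetical name used only here, and the resolution column assumes that each 'down' ResBlock halves the height and width.

```python
# Channel table for narrow=1, channel_multiplier=2 (assumed values)
channels = {'4': 256, '8': 256, '16': 256, '32': 256,
            '64': 256, '128': 128, '256': 64, '512': 32}
log_size = 9  # log2(512)

in_channels = channels['512']
down_plan = []  # (input resolution, in_channels, out_channels) per block
for i in range(log_size, 2, -1):
    out_channels = channels[f'{2 ** (i - 1)}']
    down_plan.append((2 ** i, in_channels, out_channels))
    in_channels = out_channels

for res, c_in, c_out in down_plan:
    print(f"{res:>3} -> {res // 2:>3}: {c_in:>3} -> {c_out:>3} channels")
```

The trace has seven blocks, going from 512x512 with 32 channels down to 4x4 with 256 channels.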

About nn.ModuleList()

nn.ModuleList is a container that holds submodules. Any subclass of nn.Module (such as nn.Conv2d or nn.Linear) can be added to it, and it is used like an ordinary Python list (append, extend, indexing, iteration). Unlike an ordinary list, modules added to nn.ModuleList are registered with the parent module, so their parameters are included in parameters() and state_dict() of the whole network.
# Note that nn.ModuleList does not define forward(); the enclosing module has to iterate over it and call each submodule itself.
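The difference from a plain list is easy to demonstrate. In this sketch, which assumes PyTorch is installed, WithPlainList and WithModuleList are hypothetical classes written for this comparison and are not part of GFPGAN.

```python
from torch import nn

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: the layers are NOT registered as submodules
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 4)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the layers are registered with the parent module
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])

    def forward(self, x):
        # ModuleList has no forward(), so the layers are called one by one
        for layer in self.layers:
            x = layer(x)
        return x

n_plain = len(list(WithPlainList().parameters()))
n_registered = len(list(WithModuleList().parameters()))
print(n_plain, n_registered)  # 0 4
```

Each nn.Linear contributes a weight and a bias, so the ModuleList version exposes four parameter tensors while the plain-list version exposes none, and an optimizer built from its parameters() would have nothing to train.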

Construction of the last convolution layer of the downsampling path:

# The output has channels['4'] = 256 channels. The kernel side length is 3; stride 1 and padding 1 keep the height and width unchanged
self.final_conv = nn.Conv2d(in_channels, channels['4'], 3, 1, 1)

(4) Upsample

# The number of input channels is channels['4'] = 256, i.e. the number of output channels of the downsampling path
in_channels = channels['4']
# Create a ModuleList container
self.conv_body_up = nn.ModuleList()
# i runs from 3 up to self.log_size (9): 7 iterations
for i in range(3, self.log_size + 1):
    # Number of output channels at this resolution
    out_channels = channels[f'{2 ** i}']
    # Create a ResBlock in 'up' mode and append it to the ModuleList
    self.conv_body_up.append(ResBlock(in_channels, out_channels, mode='up'))
    # The output channels of this block are the input channels of the next one
    in_channels = out_channels
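The upsampling loop can be traced the same way. This sketch again hard-codes the channel table for narrow = 1 and channel_multiplier = 2 (assumed values); up_plan is a hypothetical name, and its first field is the resolution each block is expected to produce, assuming an 'up' ResBlock doubles the height and width.

```python
# Channel table for narrow=1, channel_multiplier=2 (assumed values)
channels = {'4': 256, '8': 256, '16': 256, '32': 256,
            '64': 256, '128': 128, '256': 64, '512': 32}
log_size = 9  # log2(512)

in_channels = channels['4']
up_plan = []  # (output resolution, in_channels, out_channels) per block
for i in range(3, log_size + 1):
    out_channels = channels[f'{2 ** i}']
    up_plan.append((2 ** i, in_channels, out_channels))
    in_channels = out_channels

for res, c_in, c_out in up_plan:
    print(f"{res // 2:>3} -> {res:>3}: {c_in:>3} -> {c_out:>3} channels")
```

The seven blocks take the 4x4 feature with 256 channels back up to 512x512 with 32 channels, so the final channel count matches the output of conv_body_first.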

(5) Fully connected layer

The output size of the fully connected layer is chosen according to the parameter different_w.

if different_w:
    # (9 * 2 - 2) * 512 = 16 * 512 = 8192: one 512-dimensional style vector for each of the 16 style inputs of the decoder
    linear_out_channel = (int(math.log(out_size, 2)) * 2 - 2) * num_style_feat
    print(linear_out_channel)  # debug output: 8192
else:
    # A single 512-dimensional style vector
    linear_out_channel = num_style_feat
# Fully connected layer: 256 * 4 * 4 = 4096 input features, 8192 output features
self.final_linear = nn.Linear(channels['4'] * 4 * 4, linear_out_channel)
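The two sizes can be reproduced with a few lines of arithmetic. This is a sketch with the values from this article hard-coded; linear_sizes is a hypothetical helper, and math.log2 is used here instead of math.log(out_size, 2) because it is exact for powers of two.

```python
import math

out_size = 512
num_style_feat = 512
channels_4 = 256  # channels['4'] for narrow = 1

def linear_sizes(different_w):
    """Return (in_features, out_features) of final_linear."""
    # The flattened 4x4 feature map with channels['4'] channels
    in_features = channels_4 * 4 * 4
    if different_w:
        num_latents = int(math.log2(out_size)) * 2 - 2  # 16 for out_size = 512
        out_features = num_latents * num_style_feat
    else:
        out_features = num_style_feat
    return in_features, out_features

print(linear_sizes(True))   # (4096, 8192)
print(linear_sizes(False))  # (4096, 512)
```

With different_w=True, as in the GFPGANer call, the layer therefore maps 4096 features to 8192.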

(6) Create self.stylegan_decoder

self.stylegan_decoder = StyleGAN2GeneratorCSFT(
    out_size=out_size,
    num_style_feat=num_style_feat,
    num_mlp=num_mlp,
    channel_multiplier=channel_multiplier,
    narrow=narrow,
    sft_half=sft_half)

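(7) Load the pretrained decoder weights if decoder_load_path is set, and freeze the decoder if fix_decoder is True

# Load the 'params_ema' weights from the checkpoint, keeping the tensors on the CPU
if decoder_load_path:
    self.stylegan_decoder.load_state_dict(
        torch.load(decoder_load_path, map_location=lambda storage, loc: storage)['params_ema'])
# Exclude the decoder parameters from gradient computation so that they are not updated during training
if fix_decoder:
    for name, param in self.stylegan_decoder.named_parameters():
        param.requires_grad = False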
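The effect of the freezing loop can be seen on a small stand-in network. In this sketch, which assumes PyTorch is installed, decoder is a hypothetical three-layer module rather than the real StyleGAN2GeneratorCSFT, and count_trainable is a helper written for illustration.

```python
from torch import nn

# Small stand-in for the decoder (hypothetical)
decoder = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))

def count_trainable(module):
    """Number of scalar parameters that still receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

before = count_trainable(decoder)

# Same freezing pattern as in the constructor
for _, param in decoder.named_parameters():
    param.requires_grad = False

after = count_trainable(decoder)
print(before, after)  # 108 0
```

The stand-in has (8 * 8 + 8) + (8 * 4 + 4) = 108 trainable scalars before the loop and none afterwards.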

(8) SFT layers (condition_scale and condition_shift)

# One scale branch and one shift branch per resolution, each stored in a ModuleList
self.condition_scale = nn.ModuleList()
self.condition_shift = nn.ModuleList()
# i runs from 3 up to self.log_size (9): 7 iterations
for i in range(3, self.log_size + 1):
    # Number of channels of the upsampled feature at this resolution
    out_channels = channels[f'{2 ** i}']
    # With sft_half the SFT output has out_channels channels; otherwise it has twice as many
    if sft_half:
        sft_out_channels = out_channels
    else:
        sft_out_channels = out_channels * 2
    # Build each branch with nn.Sequential and append it to the ModuleList
    self.condition_scale.append(
        nn.Sequential(
            # Kernel side length 3, stride 1, padding 1: the output keeps the height and width of the input
            nn.Conv2d(out_channels, out_channels, 3, 1, 1),
            nn.LeakyReLU(0.2, True),
            nn.Conv2d(out_channels, sft_out_channels, 3, 1, 1)))
    self.condition_shift.append(
        nn.Sequential(
            nn.Conv2d(out_channels, out_channels, 3, 1, 1),
            nn.LeakyReLU(0.2, True),
            nn.Conv2d(out_channels, sft_out_channels, 3, 1, 1)))

nn.Sequential is an ordered container: the modules passed to its constructor are stored in order, and calling the container passes the input through them one after another, each module receiving the output of the previous one.
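As an illustration, one scale branch can be built on its own and compared with calling its layers by hand. This sketch assumes PyTorch is installed, uses out_channels = 32 (channels['512'] under the assumed configuration) with sft_half=True, and a 16x16 dummy feature map chosen only to keep the check fast.

```python
import torch
from torch import nn

out_channels = 32       # channels['512'] for the assumed configuration
sft_out_channels = 32   # sft_half=True: same number of channels

scale_branch = nn.Sequential(
    nn.Conv2d(out_channels, out_channels, 3, 1, 1),
    nn.LeakyReLU(0.2, True),
    nn.Conv2d(out_channels, sft_out_channels, 3, 1, 1),
)

x = torch.randn(1, out_channels, 16, 16)
with torch.no_grad():
    # Calling the container runs the three layers in order
    y = scale_branch(x)

    # Doing the same by hand, one layer at a time
    manual = x
    for layer in scale_branch:
        manual = layer(manual)

print(tuple(y.shape))
print(torch.allclose(y, manual))
```

The output keeps the 16x16 spatial size because every convolution uses kernel 3, stride 1 and padding 1, and the manual loop gives the same values as calling the container.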

Posted by jabapyth on Mon, 06 Dec 2021 11:21:47 -0800
