Verilog HDL Implementation of Multiplier

1. Serial multiplier
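The product of two N-bit binary numbers x and y is computed with the simple shift-and-add method: for each bit of y, starting from the least significant bit, a correspondingly shifted copy of x is added to the running sum if that bit is 1.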

module multi_CX(clk, x, y, result);
    
    input clk;
    input [7:0] x, y;
    output [15:0] result;

    reg [15:0] result;

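    parameter s0 = 0, s1 = 1, s2 = 2;  // s0: load, s1: shift-and-add, s2: output
    reg [2:0] count = 0;               // number of bits of y processed so far
    reg [1:0] state = 0;
    reg [15:0] P, T;                   // P: running sum, T: shifted copy of x
    reg [7:0] y_reg;                   // bits of y still to be processed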

    always @(posedge clk) begin
        case (state)
            s0: begin
                count <= 0;
                P <= 0;
                y_reg <= y;
                T <= {{8{1'b0}}, x};
                state <= s1;
            end
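            s1: begin
                // Add the shifted multiplicand when the current bit of y is 1.
                if(y_reg[0] == 1'b1)
                    P <= P + T;
                y_reg <= y_reg >> 1;
                T <= T << 1;
                count <= count + 1;
                // count == 7 marks the eighth and last bit of y, so that bit
                // is still processed in this cycle before moving to s2.
                if(count == 3'b111)
                    state <= s2;
                else
                    state <= s1;
            end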
            s2: begin
                result <= P;
                state <= s0;
            end
            default: state <= s0;  // recover from the unused state encoding
        endcase
    end

endmodule

 

The design spends eight clock cycles in the shift-and-add state for every multiplication, plus one cycle to load the operands and one to register the result. The serial multiplier is therefore slow and has a long latency, but it occupies the fewest resources of all multiplier types and is widely used in low-speed signal processing.
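One way to check the serial multiplier in simulation is a small self-checking testbench. The following is only a sketch: the testbench name tb_multi_CX, the test vectors and the 32-cycle wait are assumptions chosen for illustration and are not part of the original design. Because multi_CX runs continuously and samples x and y only in state s0, each operand pair is held long enough to cover at least one complete pass. The vectors include operands with the top bit set, so all eight bit positions of y are exercised.

```verilog
`timescale 1ns/1ps

// Hypothetical testbench for multi_CX (names and vectors are illustrative).
module tb_multi_CX;

    reg         clk = 0;
    reg  [7:0]  x, y;
    wire [15:0] result;
    integer     errors = 0;

    multi_CX dut(.clk(clk), .x(x), .y(y), .result(result));

    always #5 clk = ~clk;   // 10 ns clock period

    // Hold one operand pair for 32 cycles, which is longer than two full
    // passes through the state machine, then compare result with a * b.
    task check;
        input [7:0] a, b;
        reg [15:0] expected;
        begin
            x = a;
            y = b;
            expected = a * b;           // evaluated at 16 bits, no truncation
            repeat (32) @(posedge clk);
            #1;                         // let the registered output settle
            if (result !== expected) begin
                errors = errors + 1;
                $display("MISMATCH: %0d * %0d gave %0d, expected %0d",
                         a, b, result, expected);
            end
        end
    endtask

    initial begin
        check(8'd0,   8'd0);
        check(8'd3,   8'd5);
        check(8'd17,  8'd100);
        check(8'd200, 8'd127);
        check(8'd1,   8'd128);   // only the top bit of y is set
        check(8'd255, 8'd255);   // largest possible product
        if (errors == 0)
            $display("All checks passed");
        else
            $display("%0d check(s) failed", errors);
        $finish;
    end

endmodule
```

Holding the inputs for a fixed number of cycles keeps the testbench independent of the exact cycle count of the state machine. A design with a start/done handshake would allow a tighter test, but multi_CX has no such ports.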

2. Pipeline multiplier
A fast multiplier usually adopts a bit-parallel iterative array structure, in which all N bits of each operand are presented to the multiplier in parallel. On an FPGA, however, carry propagation is faster than addition, so this array structure is not optimal. Instead, we can use a multi-stage pipeline that adds adjacent pairs of partial products, stage by stage, until the final product is formed. The adders form a binary tree, so an N-bit multiplier needs log2(N) adder stages.

module multi_4bits_pipelining(mul_a, mul_b, clk, rst_n, mul_out);
    
    input [3:0] mul_a, mul_b;
    input       clk;
    input       rst_n;
    output [7:0] mul_out;

    reg [7:0] mul_out;

    reg [7:0] stored0;
    reg [7:0] stored1;
    reg [7:0] stored2;
    reg [7:0] stored3;

    reg [7:0] add01;
    reg [7:0] add23;

    always @(posedge clk or negedge rst_n) begin
        if(!rst_n) begin
            mul_out <= 0;
            stored0 <= 0;
            stored1 <= 0;
            stored2 <= 0;
            stored3 <= 0;
            add01 <= 0;
            add23 <= 0;
        end
        else begin
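            // Stage 1: partial products, mul_a shifted to the position of each bit of mul_b
            stored0 <= mul_b[0]? {4'b0, mul_a} : 8'b0;
            stored1 <= mul_b[1]? {3'b0, mul_a, 1'b0} : 8'b0;
            stored2 <= mul_b[2]? {2'b0, mul_a, 2'b0} : 8'b0;
            stored3 <= mul_b[3]? {1'b0, mul_a, 3'b0} : 8'b0;

            // Stage 2: add adjacent pairs of partial products
            add01 <= stored1 + stored0;
            add23 <= stored3 + stored2;

            // Stage 3: add the two intermediate sums to form the product
            mul_out <= add01 + add23;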
        end
    end

endmodule
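To illustrate how the adder tree grows with the operand width, the following sketch extends the same structure to 8 bits, which needs log2(8) = 3 adder levels after the partial-product registers. The module name multi_8bits_pipelining and the array names pp, sum2 and sum4 are hypothetical and chosen only for this example. It is one possible way to write the extension, not a tuned implementation.

```verilog
// Hypothetical 8-bit extension of multi_4bits_pipelining (illustrative only).
module multi_8bits_pipelining(mul_a, mul_b, clk, rst_n, mul_out);

    input  [7:0]  mul_a, mul_b;
    input         clk;
    input         rst_n;
    output [15:0] mul_out;

    reg [15:0] mul_out;

    reg [15:0] pp   [0:7];   // eight partial products
    reg [15:0] sum2 [0:3];   // adder level 1: sums of adjacent pairs
    reg [15:0] sum4 [0:1];   // adder level 2: sums of four partial products

    integer i;

    always @(posedge clk or negedge rst_n) begin
        if(!rst_n) begin
            mul_out <= 0;
            for (i = 0; i < 8; i = i + 1) pp[i]   <= 0;
            for (i = 0; i < 4; i = i + 1) sum2[i] <= 0;
            for (i = 0; i < 2; i = i + 1) sum4[i] <= 0;
        end
        else begin
            // Partial product i is mul_a shifted left by i, gated by mul_b[i].
            for (i = 0; i < 8; i = i + 1)
                pp[i] <= mul_b[i] ? ({8'b0, mul_a} << i) : 16'b0;

            // Adder level 1
            for (i = 0; i < 4; i = i + 1)
                sum2[i] <= pp[2*i] + pp[2*i + 1];

            // Adder level 2
            for (i = 0; i < 2; i = i + 1)
                sum4[i] <= sum2[2*i] + sum2[2*i + 1];

            // Adder level 3: final product
            mul_out <= sum4[0] + sum4[1];
        end
    end

endmodule
```

With one register stage for the partial products and three for the adder tree, this version has a latency of four clock cycles, compared with three for the 4-bit module, while still accepting a new pair of operands on every clock.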

As can be seen from the figure, the pipelined multiplier is much faster than the serial multiplier: after an initial latency of three clock cycles, it delivers a new product on every clock. It is widely used in signal processing that is not high-speed. For multiplication of high-speed signals, it is generally necessary to use the hard DSP blocks embedded in the FPGA chip.
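The latency and throughput of multi_4bits_pipelining can be checked with a streaming testbench along the following lines. This is a sketch under stated assumptions: the testbench name and the delay-line registers exp1, exp2 and exp3 are hypothetical. It feeds a new operand pair on every clock, covering all 256 input combinations, and compares mul_out against the expected product delayed by three register stages.

```verilog
`timescale 1ns/1ps

// Hypothetical streaming testbench for multi_4bits_pipelining.
module tb_multi_4bits_pipelining;

    reg        clk = 0;
    reg        rst_n = 0;
    reg  [3:0] mul_a = 0, mul_b = 0;
    wire [7:0] mul_out;

    // Expected products, delayed by three clocks to match the three
    // register stages of the design under test.
    reg  [7:0] exp1 = 0, exp2 = 0, exp3 = 0;

    integer i;
    integer errors = 0;

    multi_4bits_pipelining dut(.mul_a(mul_a), .mul_b(mul_b), .clk(clk),
                               .rst_n(rst_n), .mul_out(mul_out));

    always #5 clk = ~clk;   // 10 ns clock period

    always @(posedge clk) begin
        exp1 <= mul_a * mul_b;   // evaluated at 8 bits, no truncation
        exp2 <= exp1;
        exp3 <= exp2;
    end

    initial begin
        // Keep reset asserted for two clocks, with both inputs at zero.
        repeat (2) @(posedge clk);
        @(negedge clk) rst_n = 1;

        // 256 operand pairs plus 3 extra cycles to flush the pipeline.
        for (i = 0; i < 256 + 3; i = i + 1) begin
            @(negedge clk);
            {mul_a, mul_b} = i;      // low 8 bits of i; wraps after 255
            @(posedge clk);
            #1;                      // let the registered outputs settle
            if (mul_out !== exp3) begin
                errors = errors + 1;
                $display("MISMATCH at step %0d: got %0d, expected %0d",
                         i, mul_out, exp3);
            end
        end

        if (errors == 0)
            $display("All checks passed");
        else
            $display("%0d check(s) failed", errors);
        $finish;
    end

endmodule
```

Driving the inputs on the falling clock edge and sampling shortly after the rising edge avoids races between the testbench and the registers of the design.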

Posted by gunslinger008 on Mon, 25 Mar 2019 06:51:28 -0700