Multiplier in Verilog
    module array_multiplier #(parameter WIDTH = 4)(
        input  [WIDTH-1:0]   a, b,
        output [2*WIDTH-1:0] product
    );
        // Row i: (a AND b[i]), zero-extended to the product width, shifted left by i
        wire [2*WIDTH-1:0] pp  [0:WIDTH-1];
        // sum[i] accumulates rows 0..i-1, so sum[WIDTH] is the full product
        wire [2*WIDTH-1:0] sum [0:WIDTH];
        assign sum[0] = {2*WIDTH{1'b0}};

        genvar i;
        generate
            for (i = 0; i < WIDTH; i = i + 1) begin : gen_row
                assign pp[i]    = {{WIDTH{1'b0}}, a & {WIDTH{b[i]}}} << i;
                assign sum[i+1] = sum[i] + pp[i];
            end
        endgenerate

        // Summation as a linear chain of adders (simplified)
        assign product = sum[WIDTH];
    endmodule

The problem is speed. The partial products are all generated in parallel, so the AND gates contribute only one gate delay, but the summation behaves like a ripple-carry structure. For an N-bit multiplier, the critical path passes through one AND gate and then an adder chain with O(N) gate delays. For 32-bit numbers, this can become too slow to meet timing at the target clock rate.

When area is constrained (e.g., in an ASIC or a small FPGA), the sequential multiplier is the classic solution. Instead of building all the logic at once, it reuses a single adder over multiple clock cycles.
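The sequential approach can be sketched as a shift-and-add multiplier. This is a minimal illustration, not a reference design: the module name seq_multiplier, the clk, rst, start, and done ports, and the synchronous reset are assumptions introduced here. It performs one conditional addition per clock cycle and raises done after WIDTH iterations.

```verilog
// Hypothetical shift-and-add multiplier: one shared adder, WIDTH clock cycles.
module seq_multiplier #(parameter WIDTH = 4)(
    input                    clk,
    input                    rst,     // synchronous reset (assumed convention)
    input                    start,   // pulse high for one cycle to begin
    input      [WIDTH-1:0]   a, b,
    output reg [2*WIDTH-1:0] product,
    output reg               done     // high once product is valid
);
    reg [2*WIDTH-1:0] mcand;   // multiplicand, shifted left each cycle
    reg [WIDTH-1:0]   mplier;  // multiplier, shifted right each cycle
    reg [WIDTH:0]     count;   // remaining iterations (wide enough to hold WIDTH)
    reg               busy;

    always @(posedge clk) begin
        if (rst) begin
            busy    <= 1'b0;
            done    <= 1'b0;
            product <= {2*WIDTH{1'b0}};
        end else if (start && !busy) begin
            // Load operands and clear the accumulator
            mcand   <= {{WIDTH{1'b0}}, a};
            mplier  <= b;
            product <= {2*WIDTH{1'b0}};
            count   <= WIDTH;
            busy    <= 1'b1;
            done    <= 1'b0;
        end else if (busy) begin
            // The single adder: add the shifted multiplicand when the
            // current multiplier bit is 1
            if (mplier[0])
                product <= product + mcand;
            mcand  <= mcand << 1;
            mplier <= mplier >> 1;
            count  <= count - 1'b1;
            if (count == 1) begin
                busy <= 1'b0;
                done <= 1'b1;
            end
        end
    end
endmodule
```

The trade-off is latency for area: the result appears WIDTH + 1 clock edges after start is sampled, but only one 2*WIDTH-bit adder is instantiated regardless of operand width.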
But relying solely on * is not always optimal. For very large bit-widths (e.g., 64x64) or when targeting low-cost FPGAs with few DSP slices, the inferred multiplier may be too slow or consume too much area. This is where the designer must step in, replacing the simple operator with a structured algorithm.

The most intuitive hardware multiplier mimics grade-school multiplication. A 4-bit multiplier takes a 4-bit multiplicand A (A3 A2 A1 A0) and a 4-bit multiplier B (B3 B2 B1 B0). It generates four partial products (A & B0, A & B1 shifted left by one, and so on) and then sums them.
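To make the partial products concrete, the following simulation-only sketch works through one 4-bit case by hand. The module name partial_product_demo and the operand values (A = 1011, B = 0110) are chosen here purely for illustration.

```verilog
// Simulation-only worked example: 11 x 6 as four partial products.
module partial_product_demo;
    reg [3:0] a, b;
    reg [7:0] pp0, pp1, pp2, pp3, total;

    initial begin
        a = 4'b1011;  // multiplicand A = 11
        b = 4'b0110;  // multiplier   B = 6

        // Each row is A (zero-extended to 8 bits) if the matching bit of B
        // is 1, otherwise 0, shifted left by that bit's position.
        pp0 = b[0] ? ({4'b0000, a} << 0) : 8'd0;  // B0 = 0 -> 0
        pp1 = b[1] ? ({4'b0000, a} << 1) : 8'd0;  // B1 = 1 -> 22
        pp2 = b[2] ? ({4'b0000, a} << 2) : 8'd0;  // B2 = 1 -> 44
        pp3 = b[3] ? ({4'b0000, a} << 3) : 8'd0;  // B3 = 0 -> 0

        total = pp0 + pp1 + pp2 + pp3;            // 0 + 22 + 44 + 0 = 66
        $display("%0d x %0d = %0d", a, b, total); // prints "11 x 6 = 66"
    end
endmodule
```

Only the rows whose multiplier bit is 1 contribute, which is why each partial product reduces to an AND of the multiplicand with a single replicated bit.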
In Verilog, this structure can be implemented with a generate loop, as the array_multiplier module does.
This essay explores the multiplier in Verilog, examining its direct implementation, the hidden complexity of synthesis, and the design strategies engineers use to optimize it. At its simplest, Verilog allows multiplication via the binary operator *. An engineer can write a one-line continuous assignment such as assign product = a * b; and leave the choice of implementation to the synthesis tool.
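As a minimal sketch of the operator-based approach, the module below wraps * in a parameterized combinational block. The module name mult_operator and the default WIDTH of 8 are assumptions made for this example.

```verilog
// Hypothetical wrapper around the built-in * operator.
module mult_operator #(parameter WIDTH = 8)(
    input  [WIDTH-1:0]   a, b,
    output [2*WIDTH-1:0] product
);
    // The left-hand side is 2*WIDTH bits wide, so both operands are
    // zero-extended to that width before multiplying and no product
    // bits are lost.
    assign product = a * b;
endmodule
```

How this maps to hardware (a dedicated DSP block, LUT-based logic, or something else) is decided by the synthesis tool and the target device, not by the source code.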