Implementing Multipliers in FLEX 10K Devices

March 1996, ver. 1
Application Note 53
Altera Corporation, A-AN-053-01

Introduction

The Altera FLEX 10K embedded programmable logic device (PLD) family provides the first PLDs in the industry with an embedded array. The embedded array consists of a series of embedded array blocks (EABs) that can implement complex logic functions, such as multipliers. Each EAB can be configured as an 8-input, 8-output look-up table (LUT). Therefore, a single EAB can implement a multiplier with up to 8 input bits (the two operand widths combined), such as a 4 × 4, 5 × 3, or 6 × 2 multiplier. Figure 1 shows the multiplier sizes that can be implemented in an EAB.

Figure 1. Multiplier Configuration for a Single EAB
[Figure: three single-EAB multiplier configurations: 4 × 4, 5 × 3, and 6 × 2]

This application note describes how to implement large multipliers using several EABs and compares parallel multiplier and time-domain-multiplexed multiplier implementations.

The design files described in this application note are available from the Altera BBS via modem at (408) 954-0104 and the Altera FTP site at ftp.altera.com. The self-extracting files are: an_53.exe and an_53.tar.

Single-EAB Multipliers

You can implement a multiplier with up to 8 inputs in a single EAB using a function from the library of parameterized modules (LPM). The LPM is a set of architecture-independent modules with scalable widths that completely describes the logical operation of a circuit. Using the LPM function lpm_mult, you can define the width of the multiplicand for the multiplier. Then, you can use MAX+PLUS II to place the multiplier in an EAB by following these steps:

1. Select the lpm_mult function in any MAX+PLUS II application.

2. Choose the Logic Options command (Assign menu). In the Logic Options dialog box, the name of the function is displayed in the Node Name box.

3. Choose the Individual Logic Options button and turn on the Implement in EAB option. Choose OK.

4. Choose OK to implement the multiplier in the EAB.

Multiple-EAB Multipliers

A multiplier with more than 8 inputs must be implemented in two or more EABs. Each EAB computes a single partial product, generated from a 4 × 4 multiplier. To illustrate how to split the multiplier across multiple EABs, consider how a 2-digit by 2-digit multiplication is calculated in base 10. See Figure 2.

Figure 2. Base 10 Multiplication
[Figure: long multiplication of 12 × 37 with the partial products 7 × 2, 7 × 1, 3 × 2, and 3 × 1]

$$12 \times 37 = 3 \times 10^2 + (7 + 6) \times 10^1 + 14 \times 10^0$$

Rather than using base 10 (as shown in Figure 2), the EAB performs the same operation in hexadecimal radix. Each partial product is calculated within a single EAB. See Figure 3.

Figure 3. Hexadecimal Multiplication
[Figure: long multiplication of X[7..4] X[3..0] by Y[7..4] Y[3..0]. Each partial product is generated by one EAB. Partial products are summed to produce the final product.]

The four partial products are X[3..0] × Y[3..0], X[3..0] × Y[7..4], X[7..4] × Y[3..0], and X[7..4] × Y[7..4]:

$$X[7..0] \times Y[7..0] = (X[7..4] \times Y[7..4]) \times 16^2 + \big((X[7..4] \times Y[3..0]) + (X[3..0] \times Y[7..4])\big) \times 16^1 + (X[3..0] \times Y[3..0]) \times 16^0$$

To account for the relative significance in hexadecimal radix, each partial product is multiplied by $16^n$ (where n = 0, 1, 2, ...), and the results are added together to determine the final product. You can choose one of two design methods to generate the final product: a parallel multiplier or a time-domain-multiplexed multiplier.

Parallel Multiplier

The parallel multiplier design method uses multiple EABs to generate all of the partial products in parallel. For example, an 8 × 8 parallel multiplier uses four EABs (one for each partial product) to simultaneously generate four 4 × 4 partial products. Before the partial products are added together, each one is shifted to account for the $16^n$ term (i.e., each partial product is shifted over n hexadecimal digits, or 4 × n bits).
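The decomposition into shifted 4 × 4 partial products can be checked with a small software model. The Python sketch below is only an illustration, not part of the design files: the names build_lut, lut_mult, and multiply_8x8 are hypothetical, and the table addressing (first operand in the upper four address bits) is an assumption made for the example rather than a description of how MAX+PLUS II fills an EAB.

```python
# Software model of an 8 x 8 multiplier built from four 4 x 4 look-ups.
# All names and the address layout are illustrative assumptions.

def build_lut():
    """Return a 256-entry table: address = (a << 4) | b, data = a * b."""
    return [a * b for a in range(16) for b in range(16)]

LUT_4X4 = build_lut()

def lut_mult(a, b):
    """Look up the 8-bit product of two 4-bit operands."""
    return LUT_4X4[(a << 4) | b]

def multiply_8x8(x, y):
    """Multiply two 8-bit values using four 4 x 4 partial products."""
    x_hi, x_lo = x >> 4, x & 0xF
    y_hi, y_lo = y >> 4, y & 0xF
    lo_lo = lut_mult(x_lo, y_lo)  # weight 16^0, no shift
    lo_hi = lut_mult(x_lo, y_hi)  # weight 16^1, shift 4 bits
    hi_lo = lut_mult(x_hi, y_lo)  # weight 16^1, shift 4 bits
    hi_hi = lut_mult(x_hi, y_hi)  # weight 16^2, shift 8 bits
    return (hi_hi << 8) + ((lo_hi + hi_lo) << 4) + lo_lo

# Exhaustive check against Python's built-in multiplication.
assert all(multiply_8x8(x, y) == x * y
           for x in range(256) for y in range(256))
```

In hardware the four look-ups happen at the same time in separate EABs; the model evaluates them one after another only because it is sequential software.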
The adder assembles the final product by shifting the data into different bits. Addition is normally performed by a two-stage adder with 8 bits for the first stage and 12 bits for the second stage (see Figure 4).

Figure 4. 2-Stage Adder
[Figure: the partial products R[7..0], S[7..0], T[7..0], and U[7..0] aligned by bit position and summed to form Q[15..0]; the legend distinguishes addition performed in the first stage from addition performed in the second stage]

where
R = X[3..0] × Y[3..0]
S = X[3..0] × Y[7..4]
T = X[7..4] × Y[3..0]
U = X[7..4] × Y[7..4]

You can pipeline the parallel multiplier to enhance design speeds by using registers to process logic over multiple Clock cycles. The registers within the EAB can be used for pipelining (see Figure 5).

Figure 5. Parallel Multiplier with Pipelining
[Figure: four EAB multipliers with inputs X[3..0] and Y[3..0], X[3..0] and Y[7..4], X[7..4] and Y[3..0], and X[7..4] and Y[7..4], followed by optional pipelining registers and adders that produce Z[3..0], Z[7..4], Z[11..8], and Z[15..12]]

An 8 × 8 parallel multiplier is implemented in 3 stages: a multiplier stage using 4 EABs, and 2 adder stages with 8 bits for the first stage and 12 bits for the second stage. To pipeline the multiplier, each bit must be registered after each stage, which requires 21 registers for the first stage and 16 registers for the second stage. For the multiplier stage, each EAB has registers available at the inputs and outputs. Therefore, additional logic elements (LEs) are not required for the multiplier stage. The LEs containing the adder logic provide 21 registers; therefore, only 20 additional LEs are required for the entire circuit.

Time-Domain-Multiplexed Multiplier

The time-domain-multiplexed multiplier design method uses a single EAB to generate all partial products on different Clock cycles (see Figure 6). Therefore, the appropriate bits need to be loaded into the EAB before each multiplication.
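A cycle-by-cycle software model can make this scheme concrete. The Python sketch below is an illustration under stated assumptions, not the circuit from the design files: it reuses one 4 × 4 multiply function (standing in for the single EAB) on every Clock cycle and adds each shifted partial product into a 16-bit running sum. The names SCHEDULE, nibble, and tdm_multiply_8x8 are hypothetical, and the order in which the partial products are computed (low × low first) is an assumption of the example.

```python
# Cycle-by-cycle model of a time-domain-multiplexed 8 x 8 multiplier.
# Names and the partial-product order are illustrative assumptions.

def lut_mult(a, b):
    """Stand-in for the single 4 x 4 look-up-table multiplier (one EAB)."""
    return (a & 0xF) * (b & 0xF)

# One entry per Clock cycle: (X nibble index, Y nibble index, shift in bits).
SCHEDULE = [(0, 0, 0), (0, 1, 4), (1, 0, 4), (1, 1, 8)]

def nibble(value, index):
    """Return the 4-bit field of value selected by index (0 = low nibble)."""
    return (value >> (4 * index)) & 0xF

def tdm_multiply_8x8(x, y):
    """Return (product, trace); trace is the accumulator after each cycle."""
    acc = 0
    trace = []
    for x_idx, y_idx, shift in SCHEDULE:
        partial = lut_mult(nibble(x, x_idx), nibble(y, y_idx))
        acc = (acc + (partial << shift)) & 0xFFFF  # 16-bit accumulator
        trace.append(acc)
    return acc, trace

product, trace = tdm_multiply_8x8(0x12, 0x34)
assert product == 0x12 * 0x34
```

The trace list records the running sum after each of the four cycles, so only its last entry is the complete product.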
After multiplication, the accumulator shifts the data to account for the $16^n$ term and then sums the partial products to produce the final product.

Figure 6. Simulation Waveform for Time-Domain-Multiplexed Multiplier
[Figure: waveforms for Clock, EAB Output (R, S, T, and U on successive Clock cycles), and Accumulator Output (values (1) through (4))]

where
R = X[3..0] × Y[3..0]
S = X[3..0] × Y[7..4]
T = X[7..4] × Y[3..0]
U = X[7..4] × Y[7..4]

Notes:
(1) $(X[3..0] \times Y[3..0]) \times 16^0$
(2) $(X[3..0] \times Y[3..0]) \times 16^0 + (X[3..0] \times Y[7..4]) \times 16^1$
(3) $(X[3..0] \times Y[3..0]) \times 16^0 + \big((X[3..0] \times Y[7..4]) + (X[7..4] \times Y[3..0])\big) \times 16^1$
(4) $(X[3..0] \times Y[3..0]) \times 16^0 + \big((X[3..0] \times Y[7..4]) + (X[7..4] \times Y[3..0])\big) \times 16^1 + (X[7..4] \times Y[7..4]) \times 16^2$

To pipeline the time-domain-multiplexed multiplier, insert registers between the EAB performing the multiplication and the accumulator performing the addition and shifting. Figure 7 shows a time-domain-multiplexed multiplier.

Figure 7. Time-Domain-Multiplexed Multiplier
[Figure: X[7..4], X[3..0], Y[7..4], and Y[3..0] pass through optional input registers with enables (ENA) into the EAB multiplier; its output feeds a shifter (Shift 8, Shift 4, or Shift 0) and a 16-bit loadable accumulator that produces Z[15..0]; a Control block drives the enables and the shift selection]

You can also increase throughput in the time-domain-multiplexed multiplier design method by implementing the multiplier in two or more EABs. The multiplier then computes multiple partial products simultaneously, which reduces the number of Clock cycles.

The time-domain-multiplexed multiplier implementation is well-suited for very large multiplications, such as 16 × 16 or 32 × 32, because it conserves EABs and logic cells. In contrast, large multiplications would consume a prohibitive number of logic cells or EABs if computed in parallel.

Design Speed

The parallel multiplier generates all of the partial products and sums them within a single Clock cycle.
In addition, data is loaded on every Clock cycle, giving the parallel multiplier high throughput and fast calculation times.

Designers can pipeline the parallel multiplier for faster Clock speeds. Pipelining requires multiple Clock cycles, and therefore more latency, to generate a single product. However, it shortens the Clock period while still allowing new data to be loaded on every Clock cycle. The faster Clock speeds achieved by pipelining give the highest throughput for consecutive operations because a pipelined multiplier can generate a new product on every Clock cycle. See Figure 8.

Figure 8. Simulation Waveforms for Non-Pipelined & Pipelined Parallel Multipliers
[Figure: Clock, Data, and Product waveforms for a non-pipelined parallel multiplier and for a pipelined parallel multiplier, each marked with the time taken for 1 computation and for 4 computations]

The typical time-domain-multiplexed multiplier uses a single EAB to compute all partial products on different Clock cycles. Therefore, a multiplication requires as many Clock cycles as there are partial products. In the 8 × 8 multiplication example shown in Figure 7, the multiplication requires 4 Clock cycles. When consecutive multiplications are required, the first multiplication must be completed before the second multiplication can begin.

Designers can pipeline the time-domain-multiplexed multiplier for faster Clock speeds. Pipelining shortens the Clock period, which raises the Clock speed and the throughput.

Table 1 summarizes the performance of parallel and time-domain-multiplexed multipliers.

Table 1. Circuit Performance

Design                                                     Clock Cycles for an 8 × 8 Multiplier
                                                           One Multiplication    Two Multiplications
Parallel Multiplier                                        1                     2
Parallel Multiplier with 3-Stage Pipeline                  3                     4
Time-Domain-Multiplexed Multiplier                         4                     9
Time-Domain-Multiplexed Multiplier with 2-Stage Pipeline   5                     10

Device Utilization

The 8 × 8 parallel multiplier design uses 4 EABs plus 21 additional LEs required for the 12-bit and 8-bit adders. A 3-stage pipeline requires 20 additional registers to store data. A parallel multiplier with 3-stage pipelining will not require any additional LEs when the registers are implemented in the EAB.

In contrast, the time-domain-multiplexed multiplier uses only one EAB. The multiplier uses logic, rather than EABs, to select which bits are used for multiplication. A time-domain-multiplexed multiplier with 2-stage pipelining does not require any additional LEs.

Table 2 summarizes the number of EABs and LEs required for each type of multiplier.

Table 2. Device Utilization for an 8 × 8 Multiplier

Design                                                     EABs Required    LEs Required
Parallel Multiplier                                        4                24
Parallel Multiplier with 3-Stage Pipeline                  4                45
Time-Domain-Multiplexed Multiplier                         1                65
Time-Domain-Multiplexed Multiplier with 2-Stage Pipeline   1                65

Conclusion

Large multipliers can be implemented in FLEX 10K devices with either a parallel multiplier or time-domain-multiplexed multiplier design method. The parallel multiplier offers the fastest Clock speeds but requires more space and device resources. The time-domain-multiplexed multiplier conserves space and device resources but offers slower Clock speeds. Both design methods can be pipelined for faster Clock speeds.

Copyright © 1995, 1996 Altera Corporation, 2610 Orchard Parkway, San Jose, California 95134, USA, all rights reserved.