Appendix D: Original Text for Translation (Chongqing University undergraduate graduation project thesis)

part of the filter referred to in Figure 9 are implemented as a cascade of two dedicated carry logic adders. This adder has a total combinatorial delay of 23 ns, which is close to that of a single dedicated carry logic adder. Thus the dominant delay is not necessarily in the logic path with two adders, as it may seem, but could very well be in other paths with one multiplier and one adder, unless placement is carefully considered.

The floorplan is illustrated in Figure 10. The placement was done carefully to reduce the routing delays. Three 8 by 8 multiplier units are arranged at the top of the array of 24 by 24 CLBs, with two multipliers below them. The placement of the multipliers corresponding to a2 and b2 is not critical because there are no adders in their logic paths. The adders are arranged vertically to make use of the dedicated carry logic.

The scaling block is implemented with a shifter whose shift is controlled by a 3 bit input (S). This shifter is implemented in two stages, the first stage providing any one of 0, 1, 2, or 4 shifts and the second stage providing a shift of 0 or 3. The first stage requires 8 CLBs and the second stage requires 4 CLBs.

The worst case delay in any logic path was found to be less than 145 ns. Thus, using this approach, the larger parts in this series can easily support a general purpose second order IIR filter with the indicated word sizes at sampling rates approaching 7 MHz.

Dedicated IIR Filter Implementation on FPGAs

Dedicated IIR sections have hardwired coefficients that are programmed when the array is configured. In binary multiplication, each partial product is a shifted version of the multiplicand if the corresponding multiplier bit is a one, and a zero if the corresponding bit is a zero. This zero term need not be computed, and the corresponding row of adders in the multiplier array can be eliminated, so that higher densities can be achieved.
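The saving from skipping zero partial products can be illustrated in software. The following is a minimal sketch, not the authors' implementation: the function names `partial_product_shifts` and `hardwired_multiply` are hypothetical, the coefficient is assumed to be a non-negative integer, and the adder-row count is simply the number of nonzero partial products minus one.

```python
def partial_product_shifts(coefficient: int) -> list[int]:
    """Return the bit positions of the one bits in a non-negative coefficient.

    Each position corresponds to one nonzero partial product, i.e. one
    shifted copy of the multiplicand that actually has to be added.
    """
    if coefficient < 0:
        raise ValueError("this sketch assumes a non-negative coefficient")
    return [i for i in range(coefficient.bit_length()) if (coefficient >> i) & 1]


def hardwired_multiply(x: int, coefficient: int) -> tuple[int, int]:
    """Multiply x by a fixed coefficient using shifts and adds only.

    Returns the product and the number of adder rows used. Zero bits of
    the coefficient contribute no partial product and no adder row.
    """
    shifts = partial_product_shifts(coefficient)
    product = sum(x << s for s in shifts)   # add only the nonzero partial products
    adder_rows = max(len(shifts) - 1, 0)    # N terms need N-1 additions
    return product, adder_rows


if __name__ == "__main__":
    # The coefficient 0b10010 (18) has two one bits, so one adder row suffices.
    print(hardwired_multiply(13, 0b10010))
```

In this sketch a coefficient with only two one bits needs a single adder row, whereas a general array multiplier must provide a row for every coefficient bit because the coefficient is not known when the hardware is laid out.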
We will show that a single XC4013 can support two dedicated second order IIR filters using this approach. In order to evaluate the practicality of implementing several dedicated second order sections on a single FPGA, a typical low pass IIR filter was designed as a cascade of two second order sections and implemented on a Xilinx XC4013. As the coefficients have only a small number of nontrivial bits, the multiplications can be realized using the shift and add technique. For this filter, the first second order section needs 12 columns and the second one needs 11 columns. Both were implemented in a single XC4013 chip.

The placement of the key modules is given in the accompanying figure. The sum of N inputs is performed by N-1 stages of dedicated carry logic (DC) adder/subtractors. The dotted lines represent horizontal longlines and the shaded triangles represent the shifts. Blocks which have registers are shaded. The shift for the scaling block and the shifts needed for the multiplication tend to cancel each other, so the shifting in the scaling block is absorbed into the shifts of the multiplier units. Negative coefficients were handled by using the dedicated carry subtractors; these are represented in the layout by the small circles at the inputs to the adders.

The numerous shifts encountered in the shift and add approach can very easily exhaust the routing resources. We have therefore provided an empty column between every stage, which frees up additional routing resources. As shown in Figure 10, one second order section is placed on top of the other, for the reasons discussed above, even though the 24 columns available in the XC4013 might seem to satisfy the requirements of the two sections, which total only 23 columns. The adders in this implementation use 22 bits, and the most significant 14 bits of the output of the dedicated carry adders are fed to the shift and add blocks. The sampling rate achieved with this configuration was more than 10 MHz.
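The way a sum of N signed, shifted inputs maps onto N-1 adder/subtractor stages can be sketched as follows. This is an illustrative model under stated assumptions, not the layout used in the paper: the function name `adder_subtractor_chain` and the `(sign, value, shift)` term format are hypothetical, shifts are modeled as left shifts on integers so the arithmetic stays exact, and a positive term is assumed to be available to seed the chain.

```python
def adder_subtractor_chain(terms):
    """Sum signed, shifted inputs with a chain of adder/subtractor stages.

    terms: list of (sign, value, shift) tuples, where sign is +1 or -1 and
    each term contributes sign * (value << shift) to the result.
    Returns (result, stages), where stages is the number of
    adder/subtractor stages used (N-1 for N terms).
    """
    if not terms:
        raise ValueError("need at least one term")
    for sign, _, _ in terms:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")

    # Put a positive term first so it can seed the chain without negation.
    ordered = sorted(terms, key=lambda t: t[0] < 0)
    sign, value, shift = ordered[0]
    if sign < 0:
        raise ValueError("this sketch needs at least one positive term")

    acc = value << shift
    stages = 0
    for sign, value, shift in ordered[1:]:
        shifted = value << shift
        # A positive term uses an adder, a negative term uses a subtractor.
        acc = acc + shifted if sign > 0 else acc - shifted
        stages += 1
    return acc, stages


if __name__ == "__main__":
    # 6*x - 5*y1 with x = 10, y1 = 3, where 6 = 4 + 2 and 5 = 4 + 1.
    x, y1 = 10, 3
    terms = [(+1, x, 2), (+1, x, 1), (-1, y1, 2), (-1, y1, 0)]
    print(adder_subtractor_chain(terms))
```

Here the negative coefficient costs no extra hardware stage: its terms simply select subtraction instead of addition, which is the role the subtractors play in the description above.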
A number of other example fixed-point filters were designed to evaluate the utility of this approach; in all the cases studied, it was practical to implement two IIR sections on a single chip.

Figure 11. Placement of Two Dedicated Second Order IIR Filters on XC4013

6. PIPELINED MAC UNITS

It has been mentioned that the delay in the multiplier poses a major limitation on the maximum sampling rate that can be attained. Array multipliers can be configured to allow a pipelined mode of operation, where the execution of separate multiplications overlaps. If this mode of operation is applied, the long delay associated with the carry propagating addition performed in the last row of the array multiplier can be minimized; this matters because that delay determines the throughput of the pipeline. This approach has been shown to yield extremely high speed custom implementations [5]. With this more aggressive pipelining, a MAC unit which operates at rates approaching 100 MHz can be implemented on the XC4000 series FPGAs, thus providing a building block for high sampling rate filters. The pipelined MAC units can be applied to high performance FIR and IIR filter structures, as well as other signal processing algorithms which can tolerate the pipeline delay.

Structure of the Pipelined MAC Unit

The structure of the pipelined MAC unit is shown in the accompanying figure. The basic cells shown here are identical to those in the unpipelined MAC unit except that these cells include pipeline registers. Registers are needed to propagate the multiplier and multiplicand bits to their destination and also to propagate the product bits that have been completed, which is done in parallel w