r of taps can be implemented on FPGAs by approximating the filter coefficients to a sum or difference of two power-of-two terms. Implementation of digital filters may be simplified by using only a limited number of power-of-two terms, so that only a small number of shift and add operations is required. A variety of techniques have been proposed [15, 16] to minimize the deterioration of the frequency response due to these constraints. Such coefficient optimization techniques yield performance sufficient for most practical applications.

Moderate Performance Filters on FPGAs

When the size of the chip is a constraint, the arithmetic resources need to be shared at the expense of speed. The structure shown in Figure 7 is suitable for sharing arithmetic resources. It is a multiply/accumulate (MAC) unit with four multipliers and an adder tree. The inputs and the corresponding filter coefficients are fed to the MAC unit as shown in Figure 7. Inserting pipeline registers increases the clock speed. The delay in the multiplier is greater than that in the adder, so the clock frequency is determined by the delay in the multiplier.

As there are four multipliers in this MAC unit, a sum of four terms is computed every clock cycle. Hence a four tap filter can operate at a sampling rate equal to the clock rate, and an eight tap filter at a sampling rate half that of the clock rate. In general, if there are M multipliers in a chip and the delay in the multiplier is T seconds, then an N tap filter can operate at a maximum sampling frequency f_s given by

$$ f_s = \frac{M}{N\,T} $$

An implementation based on the multiple-input MAC unit, as shown in Figure 7, was used to evaluate this moderate performance approach to the realization of a filter with an arbitrary number of taps. The placement of the MAC unit on a Xilinx XC4010 is shown in Figure 8.
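The relation between the number of multipliers, the number of taps, and the achievable sampling rate can be checked numerically. The following is a minimal sketch, not part of the original implementation. It assumes one clock cycle lasts one multiplier delay T and that an N tap filter needs ceil(N / M) cycles per output sample, which reduces to f_s = M / (N T) when N is a multiple of M. The function name and the ceiling for tap counts that are not a multiple of M are illustrative assumptions. The example values (four multipliers, 100 ns multiplier delay) are those reported in the text for the XC4010 MAC unit.

```python
import math


def max_sampling_rate_hz(num_taps, num_multipliers, multiplier_delay_s):
    """Estimate the maximum sampling rate of an N tap filter on a shared MAC unit.

    Assumption (illustrative, not from the paper): each clock cycle lasts one
    multiplier delay, and ceil(N / M) cycles are needed per output sample.
    For N a multiple of M this equals M / (N * T).
    """
    if num_taps <= 0 or num_multipliers <= 0 or multiplier_delay_s <= 0:
        raise ValueError("taps, multipliers, and delay must all be positive")
    # Number of passes through the MAC unit needed to accumulate all N products.
    cycles_per_sample = math.ceil(num_taps / num_multipliers)
    return 1.0 / (cycles_per_sample * multiplier_delay_s)


if __name__ == "__main__":
    M = 4        # multipliers in the MAC unit
    T = 100e-9   # multiplier delay in seconds (100 ns)
    for n in (4, 8, 32):
        rate_mhz = max_sampling_rate_hz(n, M, T) / 1e6
        print(f"{n:2d} taps: {rate_mhz:.2f} MHz")
```

With M = 4 and T = 100 ns the function reproduces the 40/N MHz figure for tap counts that are multiples of four.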
The four multipliers are arranged in the four corners of the 20 by 20 array of CLBs to reduce the delay from the input pins to the multipliers. Inputs to the multipliers are fed in at right angles, as explained previously, and the arrays are oriented in such a way that the routing delays from the pads are minimized. For ease of understanding, the most significant bit (M), intermediate bit (I), and least significant bit (L) of the output of each multiplier are marked in the figure.

The four adders were arranged vertically to exploit the dedicated carry logic supported by the XC4000 series. The size of the chip limited the number of multipliers to four. Four columns of CLBs were left for the adders. The three intermediate adders were provided with the required number of bits, that is, 16 bits, 16 bits, and 17 bits, respectively. These adders were arranged in two columns as shown in Figure 8. This leaves two full columns, capable of supporting more than 70 bits, for the final adder and provides sufficient intermediate word width protection for most applications.

The routing between the arithmetic elements is not critical, because the delay in the multiplier is 100 ns and that in the adder is [...] ns (for a 16 bit adder), which allows routing delays to be as large as 75 ns without affecting the clock speed. A 10 MHz clock is distributed to the pipeline registers through the global clock buffer. Additional support chips are required to synchronize the inputs to the multipliers. A few CLBs remain in the array after the implementation of the multipliers and adders, and these can be used to implement the logic for interfacing to these other devices.

As the MAC unit has four multipliers and the delay in the multiplier is 100 ns, an N tap filter with these word sizes can be operated at sampling rates of 40/N MHz, where N is a multiple of 4. For example, a 32 tap filter can support a sampling rate of 1.25 MHz.

5. IIR FILTERS

Our implementations of multiply-accumulate units indicate that the larger FPGAs can easily support a general purpose second order IIR filter with reasonable word sizes at moderate to high sampling rates. Designs which exploit the FPGA's reconfigurability can be used to attain even higher densities and speeds.

IIR Filter Structure

The transfer function of an Nth order IIR filter is given by

$$ H(z) = \frac{\sum_{k=0}^{N} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}} $$

where a_k and b_k are the filter coefficients. Some of the possible realizations are direct form I and direct form II, as discussed in [18]. To reduce the delay in the paths between registers, however, the realization shown in Figure 9 is used in this paper. This realization, like direct form II, is a cascade of an autoregressive (AR) filter and a moving average (MA) filter, but with a pipeline register in between. The delay elements are also rearranged so that only one path contains a multiplier and two adders. The other paths contain one multiplier and at most one adder. This realization allows easier placement of the multipliers and adders in the array of CLBs to achieve minimal routing delays.

General IIR Filter Implementation on FPGAs

Second order IIR filters with general purpose multipliers, which can take coefficients as inputs from outside the chip, can be used as building blocks for cascade or parallel realizations of higher order IIR filters. We will show that an XC4013 can support a general purpose second order IIR filter at moderately high sampling rates.

The first term in the denominator of the transfer function may be scaled according to the number of bits in the coefficients for fixed point implementation. This implies that a scaling module is needed before the pipeline register between the AR and MA sections shown in Figure 9. This divider can be implemented with a shifter without considerably increasing