IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 44, NO. 11, NOVEMBER 1996
operation consists, in the general case, of adding the $N/2$ low-frequency components $k = 0, \dots, N/2 - 1$ to the $N/2$ high-frequency components $k = N/2, \dots, N - 1$. Then, according to Proposals 1 and 2,

[Equation not recoverable from the source scan.]

for $n = 0, \dots, N/2 - 1$.

IV. CONCLUSION
We have shown that bandfolding in the DCT domain does not result in a downsampling of the signal. The result is two convolution operations that involve both the odd and even samples of the signal. The impulse responses of the resulting filters depend on the parity of the samples and on the subband (low or high).

ACKNOWLEDGMENT
The authors would like to thank the reviewers for their helpful comments, suggestions, and related references. Their criticisms helped the authors improve the quality of this paper. The authors also wish to express their deep gratitude for the support they have received from Prof. R. Goutte of INSA.

Using Multirate Architectures in Realizing Quadratic Volterra Kernels
Vikram M. Gadre and R. K. Patney
Abstract: Multirate architectures have been used for realizing linear
FIR digital filters with reduced computational complexity. The Volterra
kernel can be represented as a generalized convolution. It would thus
be expected that multirate architectures could be used to advantage in
realizing Volterra kernels as well. The quadratic Volterra kernel may be
realized in the form of an “LDL structure.” The LDL structure includes
a set of FIR filters of increasing length, which may be realized in a
computationally efficient manner using multirate architectures.
I. INTRODUCTION
For linear and circular convolution, it is possible to make use
of short convolution algorithms given by Winograd [1, ch. 2] to
achieve a reduction in computational complexity (CC). These short
convolution algorithms have been used together with block processing
for reducing the CC of running FIR filtering [4] and [6]. Multirate
architectures offer a convenient framework for doing this, as has
been illustrated in these references.
The quadratic filter involves a polynomial of second degree in the
input process at a number of past samples, which may be represented
as a “generalized convolution.” In view of this, it would be expected
that the principles that enable a reduction in CC for linear convolution
have their counterparts for quadratic filters. In this correspondence,
it is shown that this is indeed the case.
Quadratic kernels may be realized using an “LDL structure” [3]
having an FIR filter in each of its parallel branches. The order of these
FIR filters increases from one branch to the next. The basic idea in
this correspondence is the following: Some of the longer FIR filters on
the parallel paths in the LDL structure can be realized using multirate
architectures that reduce the CC of the realization. By developing a
mean-length lemma, it is shown that the realization of a set of FIR
filters with increasing length can offer some additional advantages
in multiplicative complexity (MC) over realizing an isolated filter,
while leaving the additive complexity (AC) unaffected.
II. LDL STRUCTURES FOR QUADRATIC KERNELS
Consider a quadratic Volterra filter acting on the current sample and $M - 1$ past samples of an input process $u$ to produce the output process $y$. The form of the equation describing this relationship is [3]

$$ y[n] = \mathbf{B}^T\mathbf{u}[n] + \mathbf{u}^T[n]\,\mathbf{H}\,\mathbf{u}[n] \tag{1} $$

where
$\mathbf{u}[n]$: vector $[u[n], u[n-1], \dots, u[n-M+1]]^T$;
$\mathbf{B}$: vector of linear coefficients;
$\mathbf{H}$: symmetric coefficient matrix associated with the quadratic form.
Manuscript received February 24, 1994; revised December 13, 1995. The associate editor coordinating the review of this paper and approving it for publication was Prof. Roberto Bamberger.
V. M. Gadre is with the Department of Electrical Engineering, Indian Institute of Technology, Bombay, Powai, Mumbai 400 076, India.
R. K. Patney is with the Department of Electrical Engineering, Indian Institute of Technology, Delhi, India.
Publisher Item Identifier S 1053-587X(96)08229-3.
1053-587X/96$05.00 © 1996 IEEE
[Fig. 1. Realization of quadratic kernel based on LDL^T decomposition. Labels: linear FIR filter; quadratic form realized as a weighted sum of parallel FIR filters followed by squarers; each inner product is realized using architectures of the form of Fig. 2 and/or Fig. 3.]
The $\mathbf{L}\mathbf{D}\mathbf{L}^T$ decomposition of symmetric matrices may now be used to decompose $\mathbf{H}$ into the product of a lower triangular, a diagonal, and an upper triangular matrix. Due to the symmetry of $\mathbf{H}$, the upper and lower triangular matrices are related through transposition:

$$ \mathbf{H} = \mathbf{L}\mathbf{D}\mathbf{L}^T \tag{2} $$

where $\mathbf{L}$ is lower triangular with unit diagonal, and $\mathbf{D}$ is diagonal. By doing so, it is possible to realize the quadratic kernel as a parallel “LDL structure” [3], shown in Fig. 1. Substituting (2) in (1) gives
$$ y[n] = \mathbf{B}^T\mathbf{u}[n] + \mathbf{v}^T[n]\,\mathbf{D}\,\mathbf{v}[n] \tag{3} $$

where $\mathbf{v}[n] = \mathbf{L}^T\mathbf{u}[n]$.
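The identity behind (2) and (3) can be checked numerically with the sketch below. It factors a small symmetric matrix as $\mathbf{L}\mathbf{D}\mathbf{L}^T$, with unit lower-triangular $\mathbf{L}$, and forms the branch outputs $\mathbf{v}[n] = \mathbf{L}^T\mathbf{u}[n]$. It then confirms that the quadratic form equals the weighted sum of squared branch outputs.

Some choices here are illustrative assumptions and do not come from the paper:
- the function name `ldl_unit`;
- the random positive definite test matrix;
- the choice of an unpivoted factorization, which requires every pivot to be nonzero.

```python
import numpy as np

def ldl_unit(H):
    """Unpivoted LDL^T factorization of a symmetric matrix H.

    Returns (L, d): L is unit lower-triangular and d holds the diagonal of D,
    so that H = L @ diag(d) @ L.T. Assumes no zero pivot is encountered.
    """
    H = np.asarray(H, dtype=float)
    m = H.shape[0]
    L = np.eye(m)
    d = np.zeros(m)
    for k in range(m):
        # Pivot: H_kk minus the contributions of the earlier columns
        d[k] = H[k, k] - np.sum(L[k, :k] ** 2 * d[:k])
        for r in range(k + 1, m):
            L[r, k] = (H[r, k] - np.sum(L[r, :k] * L[k, :k] * d[:k])) / d[k]
    return L, d

rng = np.random.default_rng(0)
M = 5                                  # number of input samples in the kernel
A = rng.standard_normal((M, M))
H = A @ A.T + M * np.eye(M)            # symmetric positive definite test matrix
L, d = ldl_unit(H)

u = rng.standard_normal(M)             # u[n] = [u[n], u[n-1], ..., u[n-M+1]]
v = L.T @ u                            # branch outputs v_i[n] of the LDL structure
quad_direct = u @ H @ u                # u^T H u, as in (1)
quad_ldl = np.sum(d * v ** 2)          # sum_i d_i v_i[n]^2, as in (3)

print(np.allclose(L @ np.diag(d) @ L.T, H), np.isclose(quad_direct, quad_ldl))
```

The last entry of `v` equals the last entry of `u`, because the last row of $\mathbf{L}^T$ is a single unit coefficient. This is the length-1 branch of the structure.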
Denoting the $i$th element of the vector $\mathbf{v}[n]$ by $v_i[n]$,

$$ v_i[n] = u[n-i+1] + \sum_{j=i+1}^{M} l_{ji}\,u[n-j+1], \qquad i = 1, \dots, M \tag{4} $$

where $l_{ji}$ is the $(j, i)$th element of $\mathbf{L}$. Since $\mathbf{D}$ is diagonal, (3) may be rewritten as

$$ y[n] = \mathbf{B}^T\mathbf{u}[n] + \sum_{i=1}^{M} d_i\, v_i^2[n] \tag{5} $$

where $d_i$ is the $i$th diagonal element of $\mathbf{D}$. If $\mathbf{H}$ and $\mathbf{D}$ are both of full rank, the number of multiplications successively increases from 1 to $M$ as one goes down the branches. If $\mathbf{H}$ and $\mathbf{D}$ are not of full rank, the number of parallel paths is reduced according to the rank.

III. MULTIRATE ARCHITECTURES FOR REDUCED CC
The filters in each of the parallel paths of Fig. 1 are linear. Some of them may be realized using reduced-CC multirate architectures.

Fig. 2 provides an example of a multirate architecture that realizes a given FIR transfer function $H(z)$ with reduced CC. This will henceforth be referred to as MR2, “2” being a mnemonic for the downsampling and upsampling factor involved. The ideas leading to this architecture are briefly reviewed from [4]. The input signal $X(z)$, output signal $Y(z)$, and filter $H(z)$ are each decomposed into their polyphase components of order 2 as follows:

$$ X(z) = X_0(z^2) + z^{-1}X_1(z^2) \tag{6} $$
$$ Y(z) = Y_0(z^2) + z^{-1}Y_1(z^2) \tag{7} $$
$$ H(z) = H_0(z^2) + z^{-1}H_1(z^2) \tag{8} $$
From (6)-(8), it is clear that each of the terms on the right-hand sides of the equations may be regarded as a polynomial of degree 1, with “coefficients” equal to the polyphase components of the respective signals. Taking the product of the $z$-transforms $X(z)$ and $H(z)$ may then be regarded as taking a product of two polynomials, for which one may use an efficient Winograd polynomial multiplication algorithm. The “multiplications” of “coefficients” in this algorithm now translate into linear convolutions, which are implemented on the channels of MR2. The analysis segment combines the input polyphase components linearly to provide appropriate input sequences to the channels. The synthesis segment combines the results of these partial convolutions suitably to produce the output polyphase components.
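The sketch below illustrates the MR2 idea: three half-length channel convolutions replace the four products in $(X_0 + z^{-1}X_1)(H_0 + z^{-1}H_1)$. It works offline on a whole finite block, so the one-sample latency of the streaming realization does not appear. It is a minimal illustration that assumes type-1 polyphase components, with the even-indexed samples forming the first component. The function name `mr2_convolve` is a hypothetical choice, not the authors' code.

Each channel runs at half rate with $L/2$ taps. Three channels therefore cost $3L/4$ multiplications per output sample, which matches the MR2 slope in Table I.

```python
import numpy as np

def mr2_convolve(x, h):
    """Linear convolution of x and h through a 2-parallel (MR2-style) structure.

    Three half-rate channel convolutions replace four:
      Y0 = X0*H0 + z^-1 (X1*H1)        (z^-1 taken at the decimated rate)
      Y1 = (X0 + X1)*(H0 + H1) - X0*H0 - X1*H1
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n_out = len(x) + len(h) - 1
    # Zero-pad to even lengths so both polyphase components have equal length
    if len(x) % 2:
        x = np.append(x, 0.0)
    if len(h) % 2:
        h = np.append(h, 0.0)
    x0, x1 = x[0::2], x[1::2]            # analysis: input polyphase components
    h0, h1 = h[0::2], h[1::2]            # channel filters from H's polyphase parts

    a = np.convolve(x0, h0)              # channel 1
    b = np.convolve(x1, h1)              # channel 2
    c = np.convolve(x0 + x1, h0 + h1)    # channel 3, fed by a pre-addition

    # Synthesis: combine the channel outputs into output polyphase components
    y0 = np.zeros(len(a) + 1)
    y0[:-1] += a
    y0[1:] += b                          # one-sample delay at the half rate
    y1 = c - a - b

    y = np.empty(len(y0) + len(y1))
    y[0::2] = y0                         # even-indexed outputs
    y[1::2] = y1                         # odd-indexed outputs
    return y[:n_out]

rng = np.random.default_rng(1)
x = rng.standard_normal(37)              # odd length exercises the zero-padding path
h = rng.standard_normal(10)
print(np.allclose(mr2_convolve(x, h), np.convolve(x, h)))
```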
[Fig. 2. Multirate system MR2 based on 2-by-2 point convolution algorithm. FIR filter H(z) realized with a delay of 1 sample. Labels: input x[n] (with z-transform X(z)), analysis segment, channel filters, synthesis segment, output y[n] (with z-transform Y(z)).]
[Fig. 3. Multirate architecture MR3 realizing a three-by-three point convolution algorithm. FIR filter H(z) realized with a delay of two samples.]
Other examples of multirate architectures that reduce the CC of FIR
filtering are given in [4]. An example of an architecture, henceforth
referred to as MR3, based on a three-point by three-point algorithm
[1, p. 85] is shown in Fig. 3. The channel filters in this figure are derived from the polyphase components of order 3 of the FIR filter $H(z)$ to be realized.
It can be verified, by writing $Y(z)$ in terms of $X(z)$ in MR2, that a delay of one sample is introduced. The overall system therefore has a transfer function of $z^{-1}H(z)$ instead of $H(z)$. Similarly, MR3 introduces a delay of two samples. In general, a causal multirate system with a multirate factor of $M$ introduces a minimum delay of $M - 1$ [5]. Thus, MR2 and MR3 incur only the minimum delays of 1 and 2, respectively. However, this delay can be “absorbed” conveniently, as is shown in Section V.
The MC and AC (per output sample) of direct realization, as well as of realization with an arbitrary reduced-CC multirate architecture, may be expressed in the following general “slope-intercept” form:

$$ \mathrm{MC} = pL, \qquad \mathrm{AC} = pL + s \tag{9} $$

where $L$ is the length of the FIR filter segment being realized. The constants $p$ and $s$ have been tabulated in Table I for direct realization, realization with MR2, and realization with MR3.
TABLE I

Realization          p      s
Using MR2            3/4    1/2
Using MR3            2/3    2
Direct realization   1      -1
If MC is the lone criterion, it is always advantageous to use an architecture $A$ for which $p < 1$, as compared with direct realization. From the point of view of AC, however, $A$ will be preferred to direct realization only above a certain threshold length. If $A$ is either MR2 or MR3, this threshold $L_T$ is derivable from Table I. Equating the AC of $A$, $pL + s$, with the AC of direct realization, $L - 1$, gives

$$ L_T = \frac{s + 1}{1 - p}. \tag{10} $$

Substituting $p$ and $s$ for MR2 from Table I into (10) shows that MR2 would be preferred over direct realization for $L > 6$. It may also be inferred that MR3 is always advantageous as compared with MR2 if MC is the lone criterion. If AC is also considered, then MR3 becomes advantageous over MR2 only for $L > 18$, since $(2 - 1/2)/(3/4 - 2/3) = 18$.
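The crossover lengths quoted above follow directly from (9) and Table I. The sketch below recomputes them with exact fractions. Architecture $a$ has lower AC than $b$ exactly when $L > (s_a - s_b)/(p_b - p_a)$. With $b$ taken as direct realization, this reduces to (10).

The dictionary `TABLE_I` and the helper names `crossover` and `ac` are illustrative choices. The MR3-versus-direct value printed second is derived from Table I and is not stated in the text.

```python
from fractions import Fraction as F

# (p, s) pairs from Table I: MC = p*L and AC = p*L + s per output sample, as in (9)
TABLE_I = {
    "direct": (F(1), F(-1)),
    "MR2": (F(3, 4), F(1, 2)),
    "MR3": (F(2, 3), F(2)),
}

def crossover(a, b):
    """Segment length above which architecture a needs fewer additions than b.

    Requires p_a < p_b; solves p_a*L + s_a < p_b*L + s_b for L.
    """
    (pa, sa), (pb, sb) = TABLE_I[a], TABLE_I[b]
    assert pa < pb, "architecture a must have the smaller slope"
    return (sa - sb) / (pb - pa)

def ac(arch, L):
    """Additions per output sample for a segment of length L, from (9)."""
    p, s = TABLE_I[arch]
    return p * L + s

print(crossover("MR2", "direct"))   # threshold (10) for MR2
print(crossover("MR3", "direct"))   # threshold (10) for MR3
print(crossover("MR3", "MR2"))      # MR3 versus MR2
```

At the threshold itself the two realizations need the same number of additions. This is why the comparisons in the text use a strict inequality.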
IV. THE MEAN-LENGTH LEMMA FOR REALIZING A SET OF FILTERS
This section develops a lemma that addresses the following situation. Two realizations, $A_i$ and $A_j$, are considered for realizing each filter in a set of $N$ filters, keeping AC in mind. The set of filter lengths is $F_L = \{L_1, L_2, \dots, L_N\}$. From (9), AC varies with $L$ according to

$$ \mathrm{AC}_k(L) = p_k L + s_k, \qquad k \in \{i, j\}. \tag{11} $$

It is assumed that the slope parameter of $A_i$, viz. $p_i$, is less than that of $A_j$, i.e., $p_i < p_j$.

1) The Mean-Length Lemma: It is advantageous to use the realization $A_i$ as compared with $A_j$ in this situation if the mean of $F_L$ exceeds a threshold, i.e., provided

$$ \frac{1}{N}\sum_{L \in F_L} L > \frac{s_i - s_j}{p_j - p_i}. \tag{12} $$

Proof: In order that $A_i$ should be preferred over $A_j$, it should be true that

$$ \sum_{L \in F_L} (p_i L + s_i) < \sum_{L \in F_L} (p_j L + s_j) \;\Longrightarrow\; \frac{1}{N}\sum_{L \in F_L} L > \frac{s_i - s_j}{p_j - p_i} $$

which proves the lemma. If the filter lengths are consecutive, then this lemma may be used to gain additionally in MC without losing in AC. This is shown in the next section.

V. MULTIRATE SYSTEMS AND QUADRATIC KERNELS
The FIR filters in the parallel paths of the LDL structure that are “long enough” to merit the use of a multirate architecture for reducing the CC may be realized using such architectures. In this section, both MC and AC will be kept in mind while making calculations. As mentioned before, MR3 becomes advantageous over MR2 only for $L > 18$. If the quadratic kernel involves fewer than 18 samples of the input, only MR2 need be used. That situation is considered first.

From (4), each FIR filter in the quadratic kernel of Fig. 1 has a leading coefficient equal to 1, which does not need a multiplication. The impulse response coefficients other than the leading 1 form an FIR filter in their own right. MR2 can be used to realize them with no additional delay incurred, since they already have a delay of 1 incorporated. $L$ in (11) will then be taken to mean one less than the filter length, since the leading unity coefficient has been omitted.

The special feature of the current situation is that MR2 can be used from filter segment lengths below six onwards, even if the overall AC is to be left unaffected. This would be useful where multiplications are much more cumbersome than additions, since one may then gain more in overall MC without losing in AC. If MR2 is used for the consecutive segment lengths $L_l, L_l + 1, \dots, L_u$, the mean of the filter lengths in this set is easily calculated to be $(L_l + L_u)/2$. From the mean-length lemma and Table I, the architecture $A_i$ with given $p_i$ and $s_i$ is preferable to direct realization (for which $p_j = 1$, $s_j = -1$), provided

$$ \frac{L_l + L_u}{2} > \frac{s_i + 1}{1 - p_i}. $$

In particular, $A_i$ could be the architecture MR2. For MR2, $s_i = 1/2$ and $p_i = 3/4$, and hence $L_l + L_u > 12$ is required. It is not meaningful to use MR2 for $L_l < 2$ anyway, since each of the two polyphase components must include at least one filter coefficient. With $L_l = 2$, one would have $L_u > 10$. Of course, $L_u$ is constrained by the number of samples used in the Volterra kernel. If this number is only 9, for example, then, as (17) indicates, one can use MR2 for all filters with $L$ from 4 to 9 (since $L_l + 9 > 12$ requires $L_l \ge 4$).

The gain in MC is enhanced by using MR2 for $L \ge L_l$ rather than $L \ge 6$, if $L_l < 6$. This is because one has availed of the MC advantage for $L_l \le L < 6$ as well, while suffering no loss in AC. The gain in MC as a function of $L$, as compared with direct realization, is denoted $\mathrm{MC}_{\mathrm{gain}}(L)$ and is

$$ \mathrm{MC}_{\mathrm{gain}}(L) = L - \tfrac{3}{4}L = \tfrac{L}{4}. $$

The total gain in MC, which is denoted $\mathrm{MC}_{\mathrm{tot}}$, is therefore

$$ \mathrm{MC}_{\mathrm{tot}} = \sum_{L=L_l}^{L_u} \frac{L}{4} = \frac{(L_u - L_l + 1)(L_l + L_u)}{8}. \tag{19} $$

For the specific example of $L_l = 4$ and $L_u = 9$, $\mathrm{MC}_{\mathrm{tot}} = 9.75$. The additional gain in MC from realizing the filters of length $L_l \le L < 6$ with MR2 is obtained by putting $L_l = 4$, $L_u = 5$ in (19), and is 2.25. This computation may be repeated for any values of $L_l$ and $L_u$.
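The MR2 example above can be reproduced with the short sketch below. It checks the mean-length condition (12) against direct realization and compares the total additions for the whole set. It also evaluates the MC gain (19). All names (`lemma_holds`, `total_ac`, `mc_tot`) are illustrative, and exact fractions are used so that ties at the threshold are detected exactly.

```python
from fractions import Fraction as F

MR2 = (F(3, 4), F(1, 2))    # (p, s) from Table I
DIRECT = (F(1), F(-1))

def lemma_holds(lengths, p_i, s_i, p_j, s_j):
    """Mean-length lemma (12): realization i beats j in total AC over the set
    iff mean(lengths) > (s_i - s_j) / (p_j - p_i), assuming p_i < p_j."""
    mean = F(sum(lengths), len(lengths))
    return mean > (s_i - s_j) / (p_j - p_i)

def total_ac(lengths, p, s):
    """Total additions over the set, using AC = p*L + s for each segment."""
    return sum(p * L + s for L in lengths)

def mc_tot(L_l, L_u):
    """Total MC gain (19) of MR2 over direct realization: L/4 per segment."""
    return sum(F(L, 4) for L in range(L_l, L_u + 1))

segments = list(range(4, 10))                 # L_l = 4, L_u = 9
print(lemma_holds(segments, *MR2, *DIRECT))   # mean 6.5 exceeds the threshold 6
print(total_ac(segments, *MR2), total_ac(segments, *DIRECT))
print(float(mc_tot(4, 9)), float(mc_tot(4, 5)))
```

Starting one length lower, at $L_l = 3$, puts the mean exactly at 6, and the two total AC figures then tie. This reflects the strict inequality in the lemma.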
The situation is now considered when the number of samples involved in the quadratic Volterra kernel is greater than 18 and, hence, large enough to warrant the use of MR3. Suppose 22 samples, viz. $x[n], \dots, x[n-21]$, are involved in the quadratic kernel. A look at (4) reveals that none of the FIR filters of the LDL structure, except the longest one, involves the sample $x[n]$. In other words, all of them except the longest have inbuilt delays, which increase with decreasing length. Therefore, in the current example, the shortest filter has an inbuilt delay of 21, the filter with length 2 (and hence $L = 1$) has a delay of 20, and so on.

For using MR3, a delay of two samples would have to be taken care of. This is implicitly provided for in all the filters of length less than or equal to 20. The length-21 filter has an inbuilt delay of 1. Omitting its leading coefficient of 1, as explained at the beginning of this section, automatically provides the additional delay of 1 required by MR3. For the length-22 filter, however, one additional coefficient other than the leading coefficient of 1 must be realized “loose,” as explained earlier. Thus, for the length-21 and length-22 filters, 20 filter coefficients can be included in the realization employing MR3 while taking care of the delay incurred.
Were a single filter segment being considered in isolation, it would be appropriate to use MR3 for $L > 18$. However, a consequence of the mean-length lemma is that one may begin from a smaller length. All that is required is that the mean length of the set of filter segments realized using MR3 be greater than 18. Thus, one may use MR3 to realize the filters with length 17 onwards in this example. From the preceding discussion, the set $F_L$ for the filter segments being realized with MR3 in this case is $\{16, 17, 18, 19, 20, 20\}$, after considering coefficient omissions to take care of delays and leading unity coefficients. The mean is 18.33, which is greater than 18. The MC of MR2 is $(3/4)L$, and that of MR3 is $(2/3)L$. Thus, realizing the filter segments of length 16 and 17 using MR3 gains an additional $(3/4 - 2/3)(16 + 17) = 11/4$ in MC.
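The bookkeeping in the 22-sample MR3 example can be sketched as follows. The function `mr3_segments` is a hypothetical helper that encodes one reading of the rules stated above:
- every filter drops its leading unit coefficient;
- a filter whose remaining delay would still fall short of the two samples MR3 needs gives up one more coefficient, which is realized separately.

The sketch then checks the mean-length condition against MR2 and the extra MC saving on the two shortest segments.

```python
from fractions import Fraction as F

MR2 = (F(3, 4), F(1, 2))    # (p, s) from Table I
MR3 = (F(2, 3), F(2))

def mr3_segments(n_samples, first_len):
    """Segment lengths handed to MR3 for filters of length first_len..n_samples.

    Assumed rules (one reading of the text): the filter of length k has an
    inbuilt delay of n_samples - k; dropping the leading 1 adds one sample of
    delay; if that still gives fewer than the 2 samples MR3 needs, one more
    coefficient is realized separately ("loose").
    """
    segs = []
    for length in range(first_len, n_samples + 1):
        delay = n_samples - length + 1     # inbuilt delay plus the omitted leading 1
        L = length - 1                     # leading unit coefficient omitted
        if delay < 2:
            L -= 1                         # one extra coefficient realized loose
        segs.append(L)
    return segs

def total_ac(lengths, p, s):
    """Total additions over the set, using AC = p*L + s per segment."""
    return sum(p * L + s for L in lengths)

segs = mr3_segments(22, 17)                        # filters of length 17..22
mean = F(sum(segs), len(segs))
threshold = (MR3[1] - MR2[1]) / (MR2[0] - MR3[0])  # mean-length threshold versus MR2
extra_mc = (MR2[0] - MR3[0]) * (16 + 17)           # MC saved on the L = 16, 17 segments

print(segs, float(mean), mean > threshold)
print(total_ac(segs, *MR3) < total_ac(segs, *MR2), extra_mc)
```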
VI. CONCLUSION
In this correspondence, the use of multirate architectures for
the realization of quadratic Volterra kernels with reduced CC is
investigated. A mean-length lemma is developed to explain the
variations that arise when a set of FIR filters is being realized by
using multirate architectures as opposed to an isolated filter.
REFERENCES
[1] R. E. Blahut, Fast Algorithms for Digital Signal Processing. Reading, MA: Addison-Wesley, 1985.
[2] N. K. Bose, Digital Filters: Theory and Applications. New York: North-Holland, 1985.
[3] Y. Lou, C. L. Nikias, and A. N. Venetsanopoulos, “Efficient VLSI array processing structures for adaptive quadratic digital filters,” Circuits, Syst., Signal Processing, vol. 7, no. 2, pp. 253-273, 1988.
[4] Z. J. Mou and P. Duhamel, “Short-length FIR filters and their use in fast nonrecursive filtering,” IEEE Trans. Signal Processing, vol. 39, no. 6, pp. 1322-1332, June 1991.
[5] M. Vetterli, “A theory of multirate filter banks,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 356-372, Mar. 1987.
[6] M. Vetterli, “Running FIR and IIR filtering using multirate filter banks,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, no. 5, pp. 730-738, May 1988.
A Nonlinear Analytical Model for the Quantized
LMS Algorithm-The Power-of-Two Step Size Case
Neil J. Bershad and José Carlos M. Bermudez
Abstract: This correspondence presents a study of the quantization effects in the finite-precision LMS algorithm with power-of-two step sizes. Deterministic nonlinear recursions are presented for the mean and second-moment matrix of the weight vector about the Wiener weight, for white Gaussian data models and small algorithm step size μ. The numerical solutions of these recursions are shown to agree very closely with Monte Carlo simulations during all phases of the adaptation process. Design examples demonstrate the selection of the number of quantizer bits and the adaptation step size μ to yield a desired transient behavior and cancellation performance. The results obtained indicate that previous models are too conservative in predicting the converged MSE for a given number of bits.
I. INTRODUCTION
The least mean squares (LMS) algorithm is very popular in implementations of real-time high-speed digital adaptive filters. Fixed-point
arithmetic is prevalent in such applications. The effects of a finite
word-length on the behavior of the LMS algorithm have been
studied in [1]-[8]. In particular, [8] extended the conditional moment
techniques developed in [4]-[7] to the study of the nonlinear behavior
of the quantized LMS adaptation using arbitrary step sizes μ. The
reader is referred to [8] for further details of the problem.
For arbitrary μ, the LMS updating equation requires two multiplications [9]. First, μ is multiplied by the error signal. The result is then quantized and multiplied by the input signal. Finally, a second quantization determines the updating term [8].

A different implementation is possible when μ is an exact power of two. In this case, multiplications by μ are usually realized as right shifts. The error and input signals are first multiplied in double precision. The result is then shifted (multiplication by μ) and quantized to single precision. Compared with the arbitrary step size case [8], this implementation substantially modifies the algorithm behavior, because convergence is now controlled by the quantized value of the entire weight update term. This was the problem studied in [1]-[3] using a linear model and in [4] using a continuous nonlinear function.
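The two operation orders described above can be contrasted on a single weight with the sketch below. Its main assumption is a generic uniform rounding quantizer with step `delta` standing in for the single-precision word. This is not the quantizer model analyzed in this correspondence. The names `quantize`, `update_arbitrary`, and `update_pow2` are hypothetical.

The example data make the arbitrary-μ path round up twice while the shift-based path rounds its smaller single product to zero. The same input, error, and step size therefore yield different weight updates.

```python
import numpy as np

def quantize(v, delta):
    """Uniform rounding quantizer with step delta (single-precision word)."""
    return delta * np.round(np.asarray(v) / delta)

def update_arbitrary(x, e, mu, delta):
    # Arbitrary step size: mu*e is quantized, then multiplied by x and quantized again
    return quantize(x * quantize(mu * e, delta), delta)

def update_pow2(x, e, k, delta):
    # mu = 2**-k: x*e formed in double precision, right-shifted by k, quantized once
    return quantize(np.ldexp(x * e, -k), delta)

delta = 2.0 ** -11                  # assumed quantizer step
x, e, k = 0.6, 0.00234375, 3        # mu = 2**-3; here mu*e = 0.6*delta
print(update_arbitrary(x, e, 2.0 ** -k, delta), update_pow2(x, e, k, delta))
```

A zero update means the weight does not move at all. This illustrates why the quantized value of the whole update term governs convergence in the power-of-two case.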
This note studies the nonlinear behavior of the quantized LMS algorithm when products by a power-of-two step size μ are implemented as right shifts. The results for the arbitrary step size case, which have been derived in [8], cannot be used, because the operational order is different and the quantizer input is a product of two unquantized signals. The mathematical approach used in [8] cannot be applied either. Instead, the quantizer operation is expressed as the sum of linear and periodic functions. Characteristic functions are then used to evaluate conditional expectations in the adaptive weight recursion. A small-μ approximation yields recursive equations for the mean and second-moment matrix of the weight vector about the Wiener weight. The recursions are solved numerically and shown to
Manuscript received September 1, 1994; revised April 2, 1996. This work
was supported, in part, by the Brazilian National Council for Development of
Science and Technology (CNPq) under grant No. 201532/93-0. The associate
editor coordinating the review of this paper and approving it for publication
was Prof. José M. F. Moura.
N. J. Bershad is with the Department of Electrical and Computer Engineering, University of California, Irvine, Irvine, CA 92692 USA.
J. C. M. Bermudez is with the Department of Electrical Engineering, Federal
University of Santa Catarina, Florianopolis, SC 88040-900, Brazil.
Publisher Item Identifier S 1053-587X(96)08230-X.
1053-587X/96$05.00 © 1996 IEEE