DISTRIBUTED ARITHMETIC ARCHITECTURE FOR IMAGE CODING

S. N. Merchant & B. V. Rao
A.C.R.E., I.I.T., Powai, BOMBAY-400 076

ABSTRACT

The aim of this paper is to describe the development of a hardware circuit based on distributed arithmetic architecture to obtain the fast DCT of a given image. It provides not only the DCT coefficients but also the coefficients of other transforms. In addition, it provides the inverse transforms with little or no change in components/interconnections. It has also been demonstrated that the distributed arithmetic circuit can be used to obtain a fourth order FIR/IIR filter.

1. INTRODUCTION

Orthogonal transforms have long been used in the field of image coding to reduce the amount of data needed to store or transmit an image. Many transforms have been used for this purpose, but of all of them the discrete cosine transform (DCT) has been found to be the most suitable for image coding, since it gives the best compression ratio for a given amount of mean-square error.

The aim of this paper is to describe the development of a hardware circuit based on distributed arithmetic architecture to obtain the fast DCT of a given image. The image is first broken into small blocks of 8 x 8 matrices. The hardware circuit developed and described in this paper provides not only the DCT coefficients but also the coefficients of other transforms. In addition, it provides the inverse transforms with little or no change in components/interconnections. The change in the circuit for different transforms is just a change in the look-up table stored in the ROM.

It has also been demonstrated that the distributed arithmetic circuit can be used to obtain a fourth order FIR filter. The modification needed is that the input data bits are given serially to the serial input of one of the input registers.
We have also shown that a fourth order IIR filter can be implemented using the 4-point distributed arithmetic circuit if four additional parallel-to-serial registers are used.

2. DISTRIBUTED ARITHMETIC PROCESSING

In image processing one often has to obtain the following function:

$$y = \sum_{n=1}^{N} a_n x_n \qquad (1)$$

where $a_n$ are a set of predetermined coefficients and $x_n$ are the data values. If the data are scaled such that $|x_n| < 1$ and represented in signed 2's complement code of $B$ bits accuracy, so that

$$x_n = -x_n^0 + \sum_{k=1}^{B-1} x_n^k \, 2^{-k} \qquad (2)$$

where $x_n^k$ is the $k$-th bit of $x_n$ and $x_n^0$ is its sign bit, then we can write the above equation as

$$y = \sum_{n=1}^{N} a_n \left[ -x_n^0 + \sum_{k=1}^{B-1} x_n^k \, 2^{-k} \right] \qquad (3)$$

Interchanging the order of summation over the indices $n$ and $k$ yields

$$y = \sum_{k=1}^{B-1} 2^{-k} \sum_{n=1}^{N} a_n x_n^k - \sum_{n=1}^{N} a_n x_n^0$$

We now define a function $F$ with $N$ binary valued arguments as follows:

$$F(x_1^k, x_2^k, \ldots, x_N^k) = \sum_{n=1}^{N} a_n x_n^k = F_k \qquad (4)$$

Then we can write $y$ as

$$y = \sum_{k=1}^{B-1} 2^{-k} F(x_1^k, x_2^k, \ldots, x_N^k) - F(x_1^0, x_2^0, \ldots, x_N^0) \qquad (5)$$

Thus, given the values of the function $F(x_1^k, x_2^k, \ldots, x_N^k)$, it is possible to compute $y$ by using additions (subtraction for $k = 0$) and shift operations only. Since the arguments of $F$ can take the values 0 or 1 only, $F$ has a finite number (equal to $2^N$) of possible outcomes. This set of possible outcomes is stored in a memory as a look-up table. The look-up ROM is accessed using the arguments of $F$ as the address. The output of the ROM, i.e. $F_k$, is given to ALU units where the appropriate addition and shift operations are performed. This method of obtaining $y = \sum_{n=1}^{N} a_n x_n$ is known as a distributed arithmetic system.

3. IMPLEMENTATION OF A DISTRIBUTED ARITHMETIC SYSTEM

We have represented the equation $y = \sum_{n=1}^{N} a_n x_n$ as

$$y = \sum_{k=1}^{B-1} 2^{-k} F(x_1^k, x_2^k, \ldots, x_N^k) - F(x_1^0, x_2^0, \ldots, x_N^0)$$

For simplicity we assume $B = 8$ and $N = 8$, so that $k = 0, 1, \ldots, 7$. Let $F(x_1^k, x_2^k, \ldots, x_N^k)$ be represented as $F_k$; then

$$y = \sum_{k=1}^{7} 2^{-k} F_k - F_0 \qquad (6)$$

CH2766-4/89/0000-0074 © 1989 IEEE
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on January 7, 2009 at 06:53 from IEEE Xplore. Restrictions apply.
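The look-up-table evaluation of equation (5) can be sketched in Python as follows. This is only an illustrative software model, not the hardware described in the paper: the function names (`to_twos_complement_bits`, `build_lut`, `da_inner_product`) are hypothetical, the table holds floating-point values rather than fixed-point ROM words, and N = 4 is used in the example only to keep the table small.

```python
def to_twos_complement_bits(x, B=8):
    """Bits [x^0, x^1, ..., x^(B-1)] of x in [-1, 1), so that
    x = -x^0 + sum_{k>=1} x^k * 2**-k (x^0 is the sign bit)."""
    q = int(round(x * 2 ** (B - 1)))                  # quantise to B bits
    q = max(-(2 ** (B - 1)), min(2 ** (B - 1) - 1, q))  # clamp to the B-bit range
    u = q & (2 ** B - 1)                              # two's complement bit pattern
    return [(u >> (B - 1 - k)) & 1 for k in range(B)]


def build_lut(a):
    """All 2**N outcomes of F: the entry at address `addr` is the sum of
    a[n] over every n whose address bit n is set."""
    N = len(a)
    return [sum(a[n] for n in range(N) if (addr >> n) & 1)
            for addr in range(2 ** N)]


def da_inner_product(a, x, B=8):
    """Evaluate y = sum_n a[n] * x[n] using table look-ups, scaling by
    powers of two, additions and one subtraction (for k = 0)."""
    lut = build_lut(a)
    bits = [to_twos_complement_bits(v, B) for v in x]

    def F(k):
        # Bit k of every data word forms the ROM address.
        addr = sum(bits[n][k] << n for n in range(len(a)))
        return lut[addr]

    return sum(2.0 ** -k * F(k) for k in range(1, B)) - F(0)


if __name__ == "__main__":
    a = [0.5, -0.25, 0.125, 0.75]      # assumed example coefficients
    x = [0.5, -0.25, 0.75, -0.5]       # assumed example data, exact in 8 bits
    print(da_inner_product(a, x))
    print(sum(an * xn for an, xn in zip(a, x)))
```

With data that are exactly representable in B bits, the table-based result matches the direct sum of products; otherwise the two differ by the quantisation error of the inputs.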
With $B = 8$, the sum over $k$ can be written in nested form as

$$y = ((((((F_7 \cdot 2^{-1} + F_6) \cdot 2^{-1} + F_5) \cdot 2^{-1} + F_4) \cdot 2^{-1} + F_3) \cdot 2^{-1} + F_2) \cdot 2^{-1} + F_1) \cdot 2^{-1} - F_0 \qquad (8)$$

The above equation can now be implemented using the following hardware components:

(1) Eight 8-bit registers (parallel to serial).
(2) A ROM of capacity $2^8 = 256$ locations which stores the look-up table.
(3) A register which acts as a memory data register or pipeline register, so that while the adder acts on $F_k + F_{k-1}$, the ROM is being accessed by the $x^{k-2}$ bits to obtain $F_{k-2}$.
(4) An adder/subtractor.
(5) A temporary register to store intermediate results.

4. DISTRIBUTED ARITHMETIC & DISCRETE COSINE TRANSFORM

The discrete cosine transform of a data sequence $X_m$ is given as

$$Y_k = \sum_{m=0}^{N-1} a_{km} X_m, \qquad k = 0, 1, \ldots, N-1 \qquad (9)$$

where

$$a_{km} = c_k \cos\left[\frac{(2m+1)k\pi}{2N}\right], \qquad c_k = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k \neq 0 \end{cases} \qquad (10)$$

Thus each $Y_k$ can be treated as an equation of the type of (1). If $N = 8$, then for each $Y_k$ (represented in 8 bits) we need a ROM of 256 locations. Fig. 1 gives the implementation of the 8-point DCT using a distributed arithmetic circuit.

For a 16-point DCT, even though 16 registers is not a small number (each register having a parallel-to-serial function), it is the ROM size of $2^{16}$ locations that is too large and expensive to implement. So if one has to develop a distributed arithmetic system to implement a 16-point DCT, the memory size has to be brought down to some practical level. With this aim, it can be shown that if $X_m$, $m = 0, 1, 2, \ldots, N-1$, is an N-point input sequence, then the DCT is given by

$$Y_k = \sum_{m=0}^{(N/2)-1} b_{km} Z_m, \qquad k \text{ even} \qquad (11)$$

$$Y_k = \sum_{m=0}^{(N/2)-1} c_{km} S_m, \qquad k \text{ odd} \qquad (12)$$

where

$$Z_m = X_m + X_{N-1-m}, \qquad S_m = X_m - X_{N-1-m}, \qquad b_{km} = c_{km} = c_k \cos\left[\frac{(2m+1)k\pi}{2N}\right]$$

Thus if one knows $Z_m$ and $S_m$, one can obtain the even and odd numbered DCT output coefficients separately. We need a memory capacity of $2^{N/2}$ to obtain one even or odd term of the DCT, so the total memory capacity needed will be

$$\frac{N}{2} \cdot 2^{N/2} + \frac{N}{2} \cdot 2^{N/2} = N \cdot 2^{N/2}$$

A block schematic to obtain the DCT using pre-addition and pre-subtraction is shown in Fig. 2 for N = 8.
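The even/odd split of equations (11) and (12) can be checked numerically with a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the scale factor is taken as c_0 = 1/sqrt(2) and c_k = 1 otherwise, and `rom_words` simply counts table locations, assuming one table of 2^N locations per output coefficient when no split is used.

```python
import math


def dct_coeff(k, m, N):
    """Kernel c_k * cos((2m+1) k pi / 2N), with c_0 = 1/sqrt(2) assumed."""
    c_k = 1 / math.sqrt(2) if k == 0 else 1.0
    return c_k * math.cos((2 * m + 1) * k * math.pi / (2 * N))


def dct_direct(x):
    """N-point DCT as a plain sum over all N inputs, as in equation (9)."""
    N = len(x)
    return [sum(dct_coeff(k, m, N) * x[m] for m in range(N)) for k in range(N)]


def dct_even_odd(x):
    """Same DCT using pre-addition (Z_m) for even k and pre-subtraction
    (S_m) for odd k, so every sum runs over only N/2 terms."""
    N = len(x)
    half = N // 2
    z = [x[m] + x[N - 1 - m] for m in range(half)]   # pre-addition
    s = [x[m] - x[N - 1 - m] for m in range(half)]   # pre-subtraction
    y = []
    for k in range(N):
        src = z if k % 2 == 0 else s
        y.append(sum(dct_coeff(k, m, N) * src[m] for m in range(half)))
    return y


def rom_words(N, split):
    """Look-up-table locations for all N outputs: N * 2**(N/2) with the
    split, N * 2**N without it (assumed: one table per coefficient)."""
    return N * 2 ** (N // 2) if split else N * 2 ** N


if __name__ == "__main__":
    x = [0.1, -0.3, 0.25, 0.7, -0.6, 0.05, 0.4, -0.2]   # assumed test data
    worst = max(abs(p - q) for p, q in zip(dct_direct(x), dct_even_odd(x)))
    print("largest difference:", worst)
    print("ROM locations, N = 16:", rom_words(16, False), "->", rom_words(16, True))
```

The split works because the cosine kernel is symmetric about the middle of the block for even k and antisymmetric for odd k, which is why the two half-length sums reproduce the full-length result.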
Since each summation is now done over only N/2 terms, each look-up table needs only N/2 address bits.

5. IMPLEMENTATION OF 8-POINT DCT USING DISTRIBUTED ARITHMETIC

It was shown above that an 8-point DCT can be obtained using a pair of 4-point distributed arithmetic circuits. But since the two 4-point distributed arithmetic circuits differ only in the look-up tables stored in their ROMs, if both look-up tables (one for the even numbered and one for the odd numbered coefficients) are stored in a single ROM, then only one 4-point distributed arithmetic circuit is enough. Of course, the speed is now reduced by an equivalent factor. Hence, to reduce the hardware, a 4-point distributed arithmetic circuit was implemented to obtain the DCT of an N x N image matrix in blocks of 8 x 8.

A microprocessor-based 4-point distributed arithmetic system to obtain the 8-point DCT as described above is shown in Fig. 3. The memory is shown as a 16 x 16 array, since 16 is the next higher multiple of 8. Therefore any program written to obtain the DCT of a 16 x 16 image in blocks of 8 x 8 can be modified, by changing a few microprocessor instructions, to obtain the DCT of a larger image matrix. A two-dimensional DCT of an image is obtained by row transformations followed by column transformations.

6. APPLICATIONS OF 4 POINT DISTRIBUTED ARITHMETIC CIRCUIT

(1) In addition to obtaining the DCT and IDCT, the same circuit with some modifications and different look-up tables (ROM) can be used to obtain the Slant transform, the Walsh-Hadamard transform and the discrete Fourier transform, and their inverses.

(2) The 4-point distributed arithmetic circuit can be used to obtain a fourth order FIR filter. The modification needed is that the input data bits are given serially to the serial input of one of the input registers. A scheme of this type to obtain a fourth order FIR filter is shown in Fig. 4.
(3) A fourth order IIR filter can also be implemented using the 4-point distributed arithmetic circuit, if four additional parallel-to-serial registers are used. Fig. 5 shows a block diagram to implement the fourth order IIR filter.

REFERENCES

[1] A. Peled & B. Liu, Digital Signal Processing - Theory, Design and Implementation, New York, John Wiley & Sons, 1976.

[Figure: distributed arithmetic circuit to obtain 8-point DCT. Recoverable block labels: parallel-to-serial input registers, ROM, memory data registers, adder, output registers.]

FIG. 4. 4th ORDER FIR FILTER USING 4 pt. DISTRIBUTED ARITHMETIC

FIG. 5. 4th ORDER IIR FILTER USING 4 pt. DISTRIBUTED ARITHMETIC