single flux quantum one-decimal-digit rns adder

Applied Superconductivity Vol. 6, Nos 10–12, pp. 609–614, 1998
© 1999 Published by Elsevier Science Ltd. All rights reserved
Printed in Great Britain
PII: S0964-1807(99)00018-6
0964-1807/99 $ - see front matter
SINGLE FLUX QUANTUM ONE-DECIMAL-DIGIT RNS ADDER
NADA VUKOVIC and MARC J. FELDMAN
University of Rochester, Rochester, NY 14627, USA
Abstract: Residue number system (RNS) arithmetic has a promising role for fault-tolerant, high-throughput
superconducting single flux quantum (SFQ) circuits for digital signal processing (DSP) applications. We have designed one of the basic computational blocks used in DSP circuits, a one-decimal-digit RNS adder. A new design for its main component, the single-modulus adder, has been developed.
It combines simple and robust RSFQ elementary cells, both combinational and sequential. The central
units are a circular shift register, a code converter, and the clock control circuitry. Our mod5 adder
employs 195 Josephson junctions, consumes 50 mW of power, and occupies an area of less than 2 mm².
Chips were fabricated at HYPRES, Inc. using 1 kA/cm² low-Tc niobium technology. The mod5 adder
was successfully tested at low speed, and gave experimental bias margins of ±26%. © 1999 Published
by Elsevier Science Ltd. All rights reserved
INTRODUCTION
The use of the residue number system (RNS) offers the possibility of high-speed processing
because of the separability of operation on each of the residue digits [1]. In addition, RNS arithmetic is intrinsically fault tolerant [2]. Common signal processing tasks such as digital filtering,
correlation, interpolation, prediction, and spectral analysis characteristically require large numbers of additions, subtractions, multiplications, and negations, operations that are very suitable for
RNS arithmetic [3]. RNS design techniques have been most successful to date for finite impulse
response (FIR) filters, which can take full advantage of fast RNS operations while avoiding the
problems associated with scaling [4].
There is continually increasing interest in the realization of ultra-high speed, very low power
superconducting LSI using single flux quantum (SFQ) logic. The most prevalent is rapid single
flux quantum (RSFQ) logic [5]. This and most other superconducting logic schemes implement a
binary approach to perform basic arithmetic operations, thus inheriting some of the weaknesses
of semiconductor binary logic, such as the carry propagation problem in addition and multiplication that may ultimately limit the performance of the system. In this paper we present
results on SFQ circuits which perform RNS arithmetic.
Both RNS arithmetic and superconducting digital logic have been recognized as especially
well suited for high performance digital signal processing (DSP) circuits, where the computation
is dominated by a repetitive sequence of multiply and add operations with infrequent calls to
memory, and high speed is the primary criterion. The first RNS implementation in superconducting electronics based on processing SFQ pulses was proposed in [6].
The basic architecture of an SFQ one-decimal-digit adder is presented in Fig. 1. Inputs X and
Y are two one-decimal-digit integers, i.e. X ∈ [0, 9] and Y ∈ [0, 9]. The output, which is the
sum of X and Y, can be any integer between 0 and 18. Two single-modulus adders, mod5 and
mod4, are sufficient to cover this dynamic range, since the moduli 5 and 4 are coprime and together represent 5 × 4 = 20 distinct values. The cyclic shift register (SR) is the primary circuit element used to code RNS numbers and perform arithmetic.
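The residue arithmetic behind this choice of moduli can be illustrated with a minimal Python sketch. The function names are illustrative and do not come from the paper, and the conversion back to an ordinary integer (done here by exhaustive search over the 20-value range) is an assumption made only to check the result; it is not part of the adder circuit described in this paper.

```python
MODULI = (5, 4)  # coprime moduli; dynamic range 5 * 4 = 20 covers sums 0..18

def to_rns(value):
    """Return the residues of `value` with respect to each modulus."""
    return tuple(value % m for m in MODULI)

def rns_add(a, b):
    """Add two RNS numbers digit by digit; no carry passes between digits."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(residues):
    """Recover the integer in 0..19 by exhaustive search (illustration only)."""
    for candidate in range(MODULI[0] * MODULI[1]):
        if to_rns(candidate) == tuple(residues):
            return candidate
    raise ValueError("residues do not correspond to a value in range")

# Every pair of decimal digits is added correctly by the two independent
# single-modulus additions.
all_correct = all(
    from_rns(rns_add(to_rns(x), to_rns(y))) == x + y
    for x in range(10)
    for y in range(10)
)
print(all_correct)
```

Each residue digit is processed independently, which is the separability property that makes one single-modulus adder per modulus sufficient.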
The particular design for the single-modulus adder detailed in [6] lacks a robust RSFQ circuit
implementation, for two reasons. The first problem is that counterflow and concurrent clocking
are mixed together, using the same clock line. In counterflow clocking the data flows faster than
the clock, and in concurrent clocking the opposite is true: the clock propagates faster than the
data [7]. This results in narrower margins to process variations in fabrication, because the only
adjustable parameter, the bias current, has a tendency to speed up one part of the SR and slow
down the other, or vice versa. Second, the choice of a nondestructive readout (NDRO) cell and
DMUX in the feedback part of the SR results in an overall lower maximum operating frequency as
well as layout constraints. The NDRO cell has never been established as one of the more robust
RSFQ cells.
Fig. 1. Block diagram of an SFQ one-decimal-digit RNS adder.
This paper presents a new design for the single-modulus adder used in the one-decimal-digit
adder. It combines simple and robust RSFQ elementary cells, both combinational and sequential. The design and successful functionality test of the mod5 adder will be discussed.
DESIGN
Figure 2 shows a block diagram of our mod5 adder. It performs mod5 addition for two residues, |X|_5 and |Y|_5. The circuit consists of a five-stage shift register (SR5), a feedback path with
an AND gate and confluence buffer (CB), a code converter (CC) composed of destructive readout (DRO) cells, splitters (S) and CBs, and an output AND gate. The overall clocking scheme is
counterflow, i.e. clock and data flow in opposite directions. The numbers are coded in "1-out-of-n" code, where n is a given modulus; in this case n is 5. Because of the chosen coding scheme,
the design of the SR with the feedback loop is better suited to satisfy the timing requirements
that exist in any synchronous clocking scheme when a stream of successive ones is applied [8].
Fig. 2. Block diagram of an SFQ mod5 adder. Notation: SR: shift register; DRO: destructive read-out
cell; CB: confluence buffer; S: splitter; OR: or gate; AND: and gate; JTL2: two-junction Josephson
transmission line.
Fig. 3. Circuit diagram and parameter values of the single stage of SR.
The code converter converts the "1-out-of-n" code into "number-of-pulses" code. Two signals,
clk5 and clk5', are 180° out of phase and each represents a sequence of five ones (SFQ pulses) and five
zeros. These signals could easily be generated using a simple ten-stage circular SR as part of the
control circuitry for the entire mod5 portion of an LSI RNS circuit.
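The two codes can be compared in a short behavioural Python sketch. It models only the input and output codes, not the DRO, splitter and CB circuitry. The bit order (residue k has its single 1 at position k counted from the left) follows the words quoted in the Layout and Testing section, while the function names and the placement of the pulses inside the five-slot window are assumptions made for illustration.

```python
def one_hot(value, n=5):
    """1-out-of-n word for a residue: a single 1 at index `value`."""
    if not 0 <= value < n:
        raise ValueError("residue out of range")
    return [1 if i == value else 0 for i in range(n)]

def to_pulse_count(word):
    """Number-of-pulses code: residue k is represented by k SFQ pulses."""
    if sum(word) != 1:
        raise ValueError("not a valid 1-out-of-n word")
    return word.index(1)

def pulse_train(word):
    """Assumed timing: the k pulses occupy the first k slots of the window."""
    k = to_pulse_count(word)
    return [1] * k + [0] * (len(word) - k)

for residue in range(5):
    word = one_hot(residue)
    print(residue, word, pulse_train(word))
```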
Input |Y|_5, which is delayed by four clock cycles, is applied to the serial input of the code converter. The output of the code converter is applied to the clock input of the SR. The second
number, |X|_5, is applied to the input of the SR. In the first five clock cycles the number |X|_5 is
loaded into the SR. In the second five clock cycles it is advanced by |Y|_5 stages around the
SR, and so the number in the SR now represents |X|_5 + |Y|_5. This sum is then read out through
the second AND gate at the same time as the subsequent |X|_5 number is loaded. Note that each
AND gate is used to perform the function of a switch in this design. These switches replace the
DMUX and NDRO which were used in [6].
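The load-then-advance sequence can be followed in a minimal behavioural model. This is a sketch under stated assumptions, not the circuit itself: the AND-gate switches, the four-cycle delay and the pipelined readout are omitted, the rotation direction is chosen so that one clock pulse increases the stored residue by one, and all names are illustrative.

```python
N = 5  # modulus, equal to the number of shift-register stages

def one_hot(value, n=N):
    """1-out-of-n word: a single 1 at index `value`, counted from the left."""
    return [1 if i == value else 0 for i in range(n)]

def rotate(register):
    """One clock pulse: the stored 1 moves one stage on, wrapping around."""
    return [register[-1]] + register[:-1]

def mod5_add(x, y):
    """Return the final register word and the number of SR clock pulses."""
    register = one_hot(x)      # state after the five loading clock cycles
    clock_pulses = N
    for _ in range(y):         # the code converter supplies y extra pulses
        register = rotate(register)
        clock_pulses += 1
    return register, clock_pulses

register, pulses = mod5_add(4, 2)
print(register, register.index(1), pulses)
```

Because the register is circular, advancing the single stored pulse by |Y|_5 stages performs the modulo reduction without any separate correction step.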
A schematic of the SR cell and its optimized parameters are shown in Fig. 3. The same cell
has been implemented in a 4-bit data acquisition shift register and tested with experimental bias
margins of ±40% [9].
The mod5 adder and its subcells were fully optimized at 10 GHz using MALT [10] and
JSPICE [11]. Table 1 shows the resulting parameter margins from optimization and bias current
margins from experiment.
Table 1. Margins of mod5 adder and its subcells (%)
Cell name     GL† (%)   GIcb† (%)   Simulated bias (%)   Experimental bias (%)
AND           40        55          -55, +48             ±40
OR            42        59          -58, +63             ±38
Code Conv     41        56          -34, +32             ±29
5-stage SR    35        40          -38, +42             ±35
mod5 adder    33        39          -30, +35             ±26
†Percentages are lower bounds on the margins corresponding to the "axis lengths" figure of merit returned by MALT.
GL and GIcb denote the global inductance and global critical current with the bias current adjusted proportionally,
respectively.
LAYOUT AND TESTING
The mod5 adder was laid out using the Cadence Design Framework II graphical environment
[12] calibrated for the HYPRES, Inc. standard Nb process [13]. A micrograph of the circuit
is shown in Fig. 5. It was fabricated at HYPRES, Inc. with a target junction critical current density of 1 kA/cm². The mod5 adder employs 195 Josephson junctions, consumes 50 mW of power,
and occupies an area of less than 2 mm².
Low-speed testing was performed using our automated thirty-nine-channel data acquisition
setup, controlled by a PC running LabVIEW. Figure 4 shows the low-speed test results on the
mod5 adder. All critical combinations of the inputs were successfully tested. Figure 4 shows only
three combinations: |X|_5 = 4 (00001), and |Y|_5 = 2 (00100), 3 (00010) and 4 (00001). The resulting sums modulo 5 are |X|_5 + |Y|_5 = 1 (01000), 2 (00100) and 3 (00010), respectively. The upper traces
represent external inputs and clock signals which are coded using the return-to-zero (RZ) convention. Each edge triggers an SFQ pulse from a DC/SFQ converter. Outputs are captured by
an SFQ/DC converter, where each transition corresponds to one SFQ pulse. The code converter
output (Code Conv_Out) shows the correct sequence of pulses when data |X|_5 is applied, i.e.
seven (5 + 2), eight (5 + 3) and nine (5 + 4) transitions. The experimental bias margins of the
circuit and its subcells are shown in Table 1.
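The three combinations quoted above can be cross-checked with a few lines of Python. This is only an arithmetic check of the expected words and transition counts, assuming the transition count is five loading pulses plus |Y|_5 advance pulses as stated in the text; the helper names are illustrative and nothing here reproduces the measured traces.

```python
N = 5  # modulus

def one_hot_string(value, n=N):
    """1-out-of-n word written as in the text, e.g. 2 -> '00100'."""
    return "".join("1" if i == value else "0" for i in range(n))

def expected_row(x, y):
    """Expected output word and transition count for one test combination."""
    total = (x + y) % N
    return {
        "y_word": one_hot_string(y),
        "sum": total,
        "sum_word": one_hot_string(total),
        "transitions": N + y,  # five loading pulses plus y advance pulses
    }

rows = [expected_row(4, y) for y in (2, 3, 4)]
for row in rows:
    print(row)
```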
CONCLUSIONS
A new single-modulus adder, the mod5 adder, which is the main component of a one-decimal-digit RNS
adder, was designed, laid out and evaluated at low speed. The mod5 adder consists of simple, elementary RSFQ cells and has very good parameter margins. The circuit presents the first successful implementation of residue number system arithmetic in single flux quantum superconducting
technology.
Fig. 4. The experimental results of the mod5 adder.
Fig. 5. Micrograph of the mod5 adder.
Acknowledgements: The authors would like to thank Qing Ke for bringing the RNS concept to their attention. This
work was supported in part by the University Research Initiative at the University of Rochester, sponsored by the Army
Research Office under Grant No. DAAL03-92-G-0112.
REFERENCES
1. N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications in Computer Technology. McGraw-Hill, New
York (1967).
2. P. E. Beckmann and B. R. Musicus, IEEE Trans. Signal Proc. 41, 2300 (1993).
3. M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor (eds.), Residue Number System Arithmetic:
Modern Applications in Digital Signal Processing. IEEE Press, New York (1986).
4. M. A. Soderstrand and R. A. Escott, IEEE Trans. Circuits Syst. CAS-33, 5 (1986).
5. K. K. Likharev and V. K. Semenov, IEEE Trans. Appl. Superconduct. 1, 3 (1991).
6. Q. Ke and M. J. Feldman, IEEE Trans. Appl. Superconduct. 5, 2988 (1995).
7. K. Gaj, E. G. Friedman and M. J. Feldman, IEEE Trans. Appl. Superconduct. 5, 3320 (1995).
8. C. A. Mancini, N. Vukovic, A. M. Herr, K. Gaj, M. F. Bocko and M. J. Feldman, IEEE Trans. Appl. Superconduct.
7, 2832 (1997).
9. Q. P. Herr, K. Gaj, A. M. Herr, N. Vukovic, C. A. Mancini, M. F. Bocko and M. J. Feldman, IEEE Trans. Appl.
Superconduct. 7, 2975 (1997).
10. Q. P. Herr and M. J. Feldman, IEEE Trans. Appl. Superconduct. 5, 3327 (1995).
11. S. R. Whiteley, IEEE Trans. Magn. 27, 2902 (1991).
12. Cadence Corporation, Cadence Openbook, San Jose, CA (1993).
13. HYPRES Niobium process flow and design rules are available from HYPRES, Inc., 175 Clearbrook Road,
Elmsford, NY 10523, http://www.hypres.