# Ultra-Robust Null Convention Logic Circuit with Emerging Domain Wall Devices

Yu Bai University of Central Florida 4000 Central Florida Blvd. Orlando, FL, 32817 ybai@knights.ucf.edu Bo Hu University of Central Florida 4000 Central Florida Blvd. Orlando, FL, 32817 bo.hu@knights.ucf.edu

Mingjie Lin University of Central Florida 4000 Central Florida Blvd. Orlando, FL, 32817 mingjie@eecs.ucf.edu

# ABSTRACT

Despite many attractive advantages, Null Convention Logic (NCL) remains a niche technology, largely due to its high implementation cost. Using emerging spintronic devices, this paper proposes a Domain-Wall-Motion-based NCL circuit design methodology that achieves approximately 30x and 8x improvements in energy efficiency and chip layout area, respectively, over its equivalent CMOS design, while maintaining similar delay performance for a 32-bit full adder. These advantages are made possible mostly by exploiting the physics of domain wall motion to natively realize the hysteresis critically needed in NCL. More interestingly, this design choice achieves ultra-high robustness by allowing spintronic device parameters to vary within a predetermined range while still operating correctly.

# **CCS Concepts**

•Hardware  $\rightarrow$  Asynchronous circuits; Reconfigurable logic applications; Spintronics and magnetic technologies;

# **Keywords**

Spintronics and magnetic technologies; NULL Convention Logic; Asynchronous circuit

# 1 Introduction

Delay-insensitive asynchronous circuits possess many attractive properties, such as low susceptibility to process, voltage, and temperature (PVT) variations, high energy efficiency, high robustness, great module reusability due to their clockless nature, and the much-coveted correct-by-construction property, i.e., timing analysis is not required for correct operation [1]. Among the many architectural variations of asynchronous circuits, NULL Convention Logic (NCL) is one of the most promising candidates. In fact, many prior studies, including real chip fabrications, have shown that NCL can be effectively designed and implemented with a standard-cell-based methodology [9, 7, 10].

GLSVLSI '16, May 18-20, 2016, Boston, MA, USA

© 2016 ACM. ISBN 978-1-4503-4274-2/16/05...\$15.00

DOI: http://dx.doi.org/10.1145/2902961.2903019

Weidong Kuang The University of Texas Rio Grande Valley 1201 W University Drive Edinburg, TX 78539 kuangw@utrgv.edu

Unfortunately, NCL circuits have some notable shortcomings despite their many significant advantages. First, the correct operation of an NCL circuit critically depends on *hysteresis*, which requires the support of a complicated control mechanism. Second, its dual-rail logic signalling, based on a 1-hot delay-insensitive code, needs two wires per bit, thus approximately doubling the transistor usage relative to traditional CMOS circuits. Finally, NCL circuit design is largely incompatible with existing commercial EDA tools. As a result, fewer people are trained in this style than in synchronous design.

Clearly, given its high hardware overhead, NCL is justifiably hard to adopt without innovations in circuit design. Fortunately, emerging spintronic device technology may offer at least two precious opportunities to revive the NCL circuit design.

- One essential requirement for NCL's correct operation is to maintain delay insensitivity through hysteresis, which is expensive to implement with conventional CMOS circuits. Interestingly, some emerging spintronic devices naturally exhibit physical properties that resemble hysteresis. Therefore, it is quite plausible to devise innovative circuit designs that natively exploit this physical behavior without a complicated control mechanism.
- Spintronic devices, such as magnetic tunnel junctions (MTJs), spin-valves, and domain-wall magnets (DWM), use spin transfer torque, instead of charge, as the medium of information processing. They therefore offer not only ultra-low critical current (e.g., $\leq 100\ \mu\text{A}$ at 65 nm), a simple switching scheme, and ultra-fast speed, but also many fascinating probabilistic physical properties. All these can potentially enable new NCL design methodologies that circumvent the reliability issues caused by the large device variations widely found in spintronic devices.

In this paper we propose a new asynchronous NCL circuit topology based on magnetic domain wall logic. Our major contributions include:

- 1. In conventional CMOS-based circuit design, complicated and costly control modules have to be added in order to support the hysteresis critically needed for the correct NCL operations. In this study, we instead exploit the inherent hysteresis switching property possessed by the domain-wall-motion devices. This significantly reduces the circuit design complexity of our spintronic-based NCL circuits.
- 2. Leveraging emerging device technology for high performance, even for NCL circuit design, is not a new idea. However, most existing studies focus on using spintronic devices as high-performance switching devices, therefore following almost identical circuit design methodologies as with CMOS. We instead deviate from this common approach. As a result, the correct operation of our spintronic-based NCL circuits only requires the device parameters to be in a predetermined range, thus being ultra-tolerant to the high spintronic device variations.

The rest of this paper is organized as follows. First, Section 2 introduces the basic concept and circuit design of NCL. Next, Section 3 briefly describes the basic and operational physics of a DWM device. In particular, we attempt to make a connection between the physics of spintronic devices and the NCL mechanism. In Sections 4 and 5, we present the implementation details of a typical $\text{TH}_{m,n}$ gate and a spintronic-based dual-rail NCL circuit, respectively. Subsequently, Section 6 presents the performance comparison between our spintronic-based dual-rail NCL circuit and its CMOS counterpart for a full adder with variable bit width. Finally, we conclude our study in Section 7.

# 2 NCL Concept and Circuit Design



Figure 1: NCL overall scheme: input wavefronts are controlled by local handshaking and completion detection signals. (a) Traditional NCL pipeline. (b) Symbol and structure of threshold gate TH23. (c) Implementation of logic function  $Z = X \oplus Y$ . (d) Two-bit register and completion detector [2].

An NCL circuit typically consists of multiple stages, each of which contains at least two registers, one at the input and one at the output, and can be finely pipelined by inserting additional registers. As shown in Figure 1(a), two adjacent register stages interact through their request and acknowledge signals, $K_i$ and $K_o$, respectively. To prevent the current DATA wavefront from overwriting the previous DATA wavefront, the two DATA wavefronts are always separated by a NULL wavefront. The acknowledge signals are combined in the Completion Detection circuitry to produce the request signal(s) to the previous register stage, using either the full-word or the bit-wise completion strategy. Specifically, the NCL circuit methodology exploits two core ideas, dual-rail signaling and NULL signal propagation, to achieve delay insensitivity. In NCL, each dual-rail signal D, transported by two wires $(D_0, D_1)$, can assume one of three possible values, logic 0, logic 1, or NULL, encoded as (1,0), (0,1), and (0,0), respectively. The NULL state means that the value of D is not yet available. Note that $D_0$ and $D_1$ are mutually exclusive, so both rails can never be asserted simultaneously; (1,1) is therefore defined as an illegal state.
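The dual-rail encoding described above can be illustrated with a minimal Python sketch. The helper names `encode` and `decode` are hypothetical and are not part of the paper; the sketch only mirrors the three legal states and the illegal (1,1) state.

```python
# Dual-rail encoding of one NCL signal D as the wire pair (D0, D1).
# Helper names are illustrative assumptions, not taken from the paper.

def encode(bit):
    """Map None (NULL), 0, or 1 to the wire pair (D0, D1)."""
    if bit is None:
        return (0, 0)                      # NULL: value not yet available
    return (0, 1) if bit else (1, 0)       # logic 1 -> (0,1), logic 0 -> (1,0)

def decode(d0, d1):
    """Map a wire pair back to None, 0, or 1; reject the illegal state."""
    if (d0, d1) == (1, 1):
        raise ValueError("illegal dual-rail state (1,1)")
    if (d0, d1) == (0, 0):
        return None
    return 1 if d1 else 0
```

For example, `decode(*encode(1))` returns `1`, while `decode(1, 1)` raises an error because both rails may never be asserted at once.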

NCL commonly uses threshold gates with hysteresis as its basic circuit elements. The primitive threshold gate is the $\operatorname{TH}_{m,n}$ gate with n inputs $(1 \leq m \leq n)$, in which at least m of the n inputs must be asserted before the output becomes asserted. With hysteresis, an asserted output remains asserted until all inputs are deasserted. The typical gate symbol for a TH23 is shown in Figure 1(b). Threshold gates can be composed to construct NCL combinational logic blocks, NCL registers, and completion detectors. Figure 1(c) illustrates the implementation of an NCL combinational logic block $Z = X \oplus Y$ using threshold gates. Figure 1(d) depicts the implementation of a 2-bit NCL register and a 2-bit completion detector using threshold gates. Generally, an n-bit NCL register needs 2n TH22 gates, and an n-bit completion detector requires n 2-input OR (i.e., TH12) gates and an n-input C-element (i.e., THnn). One important result in designing NCL circuits is that a set of only 27 fundamental NCL gates can implement any logic function with four or fewer variables, i.e., the set is logically complete.
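As a behavioral illustration of a $\operatorname{TH}_{m,n}$ gate, the following Python sketch models the standard NCL hysteresis rule: the output is set when at least m inputs are asserted, reset when all inputs are deasserted, and held otherwise. The class name `ThresholdGate` is a hypothetical choice for this sketch, which ignores timing and weighted inputs.

```python
class ThresholdGate:
    """Behavioral model of an unweighted TH_{m,n} gate with hysteresis."""

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("require 1 <= m <= n")
        self.m, self.n = m, n
        self.out = 0                      # assume the gate starts in NULL

    def step(self, inputs):
        """Apply one input vector and return the (possibly held) output."""
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        asserted = sum(1 for x in inputs if x)
        if asserted >= self.m:
            self.out = 1                  # set: threshold reached
        elif asserted == 0:
            self.out = 0                  # reset: all inputs deasserted
        # otherwise: hold the previous output (hysteresis)
        return self.out
```

A TH23 gate built as `ThresholdGate(2, 3)` stays at 0 for a single asserted input, switches to 1 once two inputs are asserted, and keeps that 1 when one of them drops again.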

# 3 Spin-Hall-Effect Device Physics



Figure 2: Schematic illustration of domain wall motion device. (a) Simplistic conceptual view. (b) and (c) More realistic four-terminal DWM cell structure. PL: Pinning layer. RL: Reference layer with fixed magnetization. FL: Free layer.

The basic concept of the DW-motion device is that the stored information is associated with the DW position in a magnetic wire. As shown in Figure 2(a), by controlling the position of the domain wall (DW), a current-induced DW-motion device with a three- or four-terminal structure can potentially enable interesting memory and logic functions. Both ends of the magnetic wire have their magnetization fixed antiparallel to each other. A bidirectional current applied to the wire drags the DW back and forth, thus switching the stored information. Many recent studies have therefore explored implementing novel integrated circuits with DWM devices, although they have mainly focused on Boolean logic circuits and used DWM devices as high-performance logic switches. For example, the DW-motion device depicted in Figure 2(a) has been proposed as a replacement for high-speed working memories in integrated circuits, such as static random access memories (SRAMs), which are now facing their scaling limit. Furthermore, DW-motion devices, like other spintronic devices, require no power supply to retain information and can be integrated in the back-end-of-line process. Implementing them in integrated circuits with logic-in-memory architectures and power-gating techniques therefore allows a drastic reduction of the data transfer delay and of the power consumption that originates from charging and discharging the interconnect and from leakage current in standby mode, both of which are urgent issues in recent electronics development.



Figure 3: (a) Simulation of domain wall motion under current injection at terminal T1; the domain wall is moved to the right by the spin-polarized electrons. (b) The compact model shows good agreement with micromagnetic simulation for the DW motion speed V as a function of current density $j_p$ [12]. (c) A non-zero critical current for DW motion results in hysteresis in the DW switching characteristics [3, 8].

More practically, as shown in Figure 2(b), an MTJ with a fixed-polarity magnetic layer is placed on top of the DW wire and is used to read out the resistance. The motion of the domain wall is determined by the magnitude, direction, and duration of the injected current. To illustrate and validate this behavior, we have conducted a domain wall motion simulation with the standard mumax<sup>3</sup> software. Our results, presented in Figure 3(a), clearly show the domain wall moving at different velocities when currents of different magnitudes $(1.5 \times 10^{13}\,\text{A/m}^2)$ are injected into terminal T1. This simulation uses the following device parameters: damping coefficient $\alpha = 0.02$, uniaxial anisotropy constant $K_u = 5.9 \times 10^5\,\text{J/m}^3$, saturation magnetization $M_s = 6 \times 10^5\,\text{A/m}$, exchange stiffness $A_{ex} = 1 \times 10^{11}$, and polarization P = 1 [6]. Terminal T3 is used to read the position of the domain wall from the MTJ resistance. The MTJ resistance model depends on the supplied voltage, the tunnelling oxide thickness $(t_{ox})$, and the angle between the magnetizations of the free layer and the pinned layer. The resistance model of the proposed domain wall device is described in [4, 5] as $R = \frac{A}{B \cdot x + C}$, where $A = RA_{AP} \cdot RA_P \cdot RA_{DW}$, $B = (RA_{AP} - RA_P) \cdot RA_{DW} \cdot W$, and $C = RA_P \cdot RA_{DW} \cdot W \cdot L + (RA_{AP} \cdot RA_{P} - 0.5 \cdot RA_{P} \cdot RA_{DW} - 0.5 \cdot RA_{AP} \cdot RA_{DW}) \cdot W \cdot L_{DW}$. According to these modelling equations, given the length of the free layer L (100 nm), the width of the free layer W, the DW position x (taken at the middle point of the DW), and the resistance-area products $RA_{AP}$, $RA_{DW}$, and $RA_P$, the MTJ resistance can be readily computed. Therefore, the output voltage can be computed as a rational function of the DW position (0 < x < 100 nm). Finally, Figure 3(c) exhibits the hysteresis found in the DW switching characteristics.

Figure 3(b) shows the simulated DW motion speed V as a function of current density j. The domain wall velocity is zero when the input current density is less than or equal to the critical current density. This critical current can be adjusted either by changing the memristances of the memristors or by changing the device width. In this paper, we employ different combinations of input and NULL module memristors to achieve the hysteresis of an NCL gate.
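The resistance model $R = A / (B \cdot x + C)$ quoted above can be evaluated directly. The Python sketch below is only an illustration of that formula; the function name `mtj_resistance` and all argument names are assumptions made here, and any numeric resistance-area products passed to it would be hypothetical, since the paper does not list them in this section.

```python
def mtj_resistance(x, ra_ap, ra_p, ra_dw, width, length, l_dw):
    """Evaluate R = A / (B*x + C) for a DW at position x (SI units).

    ra_ap, ra_p, ra_dw: resistance-area products (ohm * m^2)
    width, length:      free-layer width W and length L (m)
    l_dw:               domain wall length L_DW (m)
    """
    a = ra_ap * ra_p * ra_dw
    b = (ra_ap - ra_p) * ra_dw * width
    c = (ra_p * ra_dw * width * length
         + (ra_ap * ra_p - 0.5 * ra_p * ra_dw - 0.5 * ra_ap * ra_dw)
         * width * l_dw)
    return a / (b * x + c)
```

As a sanity check on the formula, setting $L_{DW} = 0$ reduces it to $RA_{AP}/(W \cdot L)$ at $x = 0$ and to $RA_P/(W \cdot L)$ at $x = L$, so the resistance falls monotonically as the DW moves across the free layer whenever $RA_{AP} > RA_P$.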

# 4 Spin-Based TH Gate Implementation



Figure 4: (a) TH23 static NCL gates. (b) TH23 DWL NCL gate.

A conventional NCL gate typically consists of *four* transistor sub-networks: SET, RESET, HOLD0, and HOLD1, each of which incurs significant hardware usage. Furthermore, conventional NCL gates rely on parasitic capacitors to keep state information and on logic inverters to implement feedback mechanisms. Unfortunately, these practices render NCL gates quite vulnerable to leakage, noise, and charge-sharing problems. Finally, the feedback inverters can slow down the gates due to the intrinsic switching contention involved. Specifically, as shown in Figure 4(a), the extra transistors needed for HOLD0 and HOLD1 make NCL gates quite expensive to build. Clearly, in a conventional static NCL gate, a large number of transistors is used to maintain delay insensitivity rather than "doing the real work".

In contrast, our spintronic-based NCL TH gate requires much less hardware. For comparison, Figure 4 shows a CMOS-based and a spintronic-based TH23 gate side by side, with their corresponding components marked by red dashed lines. In our method, a domain wall NCL gate is built from memristors, whose conductance can be precisely modulated by the charge or flux through them, and domain wall motion devices. Specifically, weighted currents are generated by programming the memristors to different memristances $m_{i,j}$: with a constant supply $V_{dd}$, an asserted input contributes the current $\frac{V_{dd}}{m_{i,j}}$. Following Kirchhoff's current law, the analog currents are summed by connecting the inputs in parallel to an I-V resistor, which is implemented by a domain wall device. Figure 4(b) shows the architecture of the proposed DWL NCL gate. The binary inputs are represented by $V_1, \cdots, V_n$, with $V_{dd}$ as logic 1 and GND as logic 0. The summed input current depends on the number of inputs that are 1: the more inputs are 1, the larger the current injected into the domain wall logic device. The hysteresis of NCL logic is likewise implemented by the domain wall device, through its critical current and the NULL module memristor.

However, transforming a standard Boolean-based NCL circuit into a spintronic-based NCL circuit is not straightforward. The input memristances and the NULL module memristance have to be carefully computed to ensure correct NCL operation. Before diving into the algorithm details, we first introduce the default definitions and values given in steps 1-8 of Algorithm 1. The given Boolean NCL netlist G is the input to the algorithm. The indices i and j identify the NCL gate and the input of that gate, respectively. A given value of $V_{dd}$ is used to generate the weighted currents through the memristors. $T_i$ and $w_{i,j}$ are assigned by the functions Thres(G) and Weigh(G), which read the logic threshold and the input weights of each NCL gate from the given Boolean NCL netlist. The computed input memristances $m_{i,j}$ and NULL module memristance $M_i$ are the outputs of Algorithm 1; they are constrained to the range between $m_{min}$ and $m_{max}$ obtained from the memristor device parameters. In our case, this range is from 100 $\Omega$ to 38000 $\Omega$. Finally, two domain wall device critical current densities, $J^1_{c,i}$ and $J^2_{c,i}$, based on the measurements of a DW device [6], are used to achieve the NCL hysteresis. A current density of $J_{c,i}^2 = 6.2 \times 10^{12}\,\text{A/m}^2$ moves the domain wall at a velocity of 20 m/s, whereas a current density of $J_{c,i}^1 = 5.2 \times 10^{12}\,\text{A/m}^2$ leaves the domain wall at a velocity of 0 m/s. The critical currents $I_{c,i}^1$ and $I_{c,i}^2$ are calculated from the injection area and the critical current densities.

To explain the algorithm clearly, we consider two Boolean NCL gates, TH23W2 and TH44, with the functions f = A + BC and f = ABCD, respectively. For the Boolean NCL function f = A + BC, the three input weights are (2,1,1) and the threshold is 2. Since the input weights are not all equal, steps 19-27 of the algorithm are used. Under these conditions, the three input memristances and the NULL module memristance for f = A + BC are calculated as follows, where the memristances of A, B, and C are $m_{1,1}, m_{1,2}, m_{1,3}$. In general, four cases need to be considered.

- Case 1, Hysteresis-set 1: the sum of the input currents needs to be smaller than the threshold. Therefore, both $\frac{V_{dd}}{m_{1,2}} - \frac{V_{dd}}{M_1} < I_{c,1}^1$ and $\frac{V_{dd}}{m_{1,3}} - \frac{V_{dd}}{M_1} < I_{c,1}^1$ have to be true.
- Case 2, Set 1: the sum of all input currents has to be larger than the threshold value in order to move the domain wall. Therefore, both $2 \cdot \frac{V_{dd}}{m_{1,2}} - \frac{V_{dd}}{M_1} > I_{c,1}^2$ and $\frac{V_{dd}}{m_{1,1}} - \frac{V_{dd}}{M_1} > I_{c,1}^2$ have to be true.
- Case 3, Hysteresis-set NULL: the sum of the input currents needs to be larger than the negative threshold so that the domain wall does not move back. Therefore, both $\frac{V_{dd}}{m_{1,2}} - \frac{V_{dd}}{M_1} > -I_{c,1}^1$ and $\frac{V_{dd}}{m_{1,1}} - \frac{V_{dd}}{M_1} > -I_{c,1}^1$ have to hold.
- Case 4, NULL: the sum of the input currents is zero, so the domain wall has to move back to its initial position. Thus $-\frac{V_{dd}}{M_1} < -I_{c,1}^2$ has to be true.
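The four cases for the TH23W2 gate can be checked numerically. The Python sketch below evaluates each inequality for a candidate set of memristances, using $V_{dd} = 0.3$ V, the 40 nm² injection area, and the two critical current densities from the text. The function name `th23w2_cases` is hypothetical, and the memristances in the usage note are illustrative values chosen here, not the values reported in the paper.

```python
V_DD = 0.3                    # supply voltage (V), from Algorithm 1
AREA = 40e-18                 # injection area, 40 nm^2 expressed in m^2
I_C1 = AREA * 5.2e12          # critical current for DW velocity 0 (A)
I_C2 = AREA * 6.2e12          # critical current for DW velocity 20 m/s (A)

def th23w2_cases(m_a, m_b, m_c, m_null, v_dd=V_DD):
    """Return which of the four TH23W2 cases (f = A + BC) are satisfied."""
    i_a, i_b, i_c, i_null = (v_dd / r for r in (m_a, m_b, m_c, m_null))
    return {
        # Case 1: a single weight-1 input must not move the wall
        "hysteresis_set1": i_b - i_null < I_C1 and i_c - i_null < I_C1,
        # Case 2: B and C together, or A alone, must move the wall
        # (i_b + i_c equals 2*V_dd/m_B when m_B == m_C, as in the text)
        "set1": i_b + i_c - i_null > I_C2 and i_a - i_null > I_C2,
        # Case 3: one remaining input must keep the wall from moving back
        "hysteresis_null": i_b - i_null > -I_C1 and i_a - i_null > -I_C1,
        # Case 4: with no inputs, the NULL current must reset the wall
        "null": -i_null < -I_C2,
    }
```

With the illustrative values `th23w2_cases(600.0, 1200.0, 1200.0, 1200.0)`, all four entries evaluate to true; raising the NULL memristance to 2000 Ω makes the NULL current too small and the `"null"` entry false.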

The possible memristances of the three inputs and the NULL module are given by the inequalities above. With $V_{dd} = 0.3$ V, the memristance of input A is $m_{i,A} = 608\,\Omega$, the memristance of input B is $m_{i,B} = 1209\,\Omega$, the memristance of input C is $m_{i,C} = 1209\,\Omega$, and the memristance of the NULL module is $M_i = 1209\,\Omega$. For the TH44 gate f = ABCD, the method is similar: the memristance of input A is $m_{i,A} = 2418\,\Omega$, the memristance of input B is $m_{i,B} = 2418\,\Omega$, the memristance of input D is $m_{i,D} = 2418\,\Omega$, and the memristance of the NULL module is $M_i = 1209\,\Omega$.

To verify the results, the algorithm is applied to the truth tables of the 27 typical TH gates. The algorithm shows that the TH gates can be classified into 5 groups according to their threshold. The parameters of the domain wall device are based on [6]. For this DW device configuration, a current density of $6.2 \times 10^{12}\,\text{A/m}^2$ moves the domain wall at a velocity of 20 m/s; in contrast, a current density of $5.2 \times 10^{12}\,\text{A/m}^2$ leaves the domain wall at a velocity of 0 m/s. The mapping results of Algorithm 1 are shown in Table 1. There are 3 special NCL functions that are not threshold gates. In order to use our method, we decompose these gates into sub NCL gates.

**Algorithm 1:** Calculating Stochastic weight and threshold algorithm

| Step | Statement |
|------|-----------|
|      | Input: G, Boolean NCL netlist |
|      | Output: N, DWL NCL netlist |
| 1    | $V_{dd} \leftarrow 0.3\,\text{V}$ |
| 2    | $S \leftarrow 40\,\text{nm}^2$ // injection area of domain wall |
| 3    | $T_i \leftarrow \text{Thres}(G)$ // read threshold of each node |
| 4    | $w_{i,j} \leftarrow \text{Weigh}(G)$ // read weight of each node |
| 5    | $m_{min} \leftarrow 100\,\Omega$ // set minimal memristance |
| 6    | $m_{max} \leftarrow 38000\,\Omega$ // set maximal memristance |
| 7    | $Ic_i^1 \leftarrow S \cdot 5.2 \times 10^{12}\,\text{A/m}^2$ // set critical current for DW velocity = 0 |
| 8    | $Ic_i^2 \leftarrow S \cdot 6.2 \times 10^{12}\,\text{A/m}^2$ // set critical current for DW velocity = 20 m/s |
| 9    | **for** $i = 1:N$ **do** |
| 10   | **if** $w_{i,1} = \cdots = w_{i,n_i}$ **then** |
| 11   | minimize($m_{i,j}$) // find the minimal memristance of input |
| 12   | subject to: |
| 13   | $T_i \cdot \frac{V_{dd}}{m_{i,j}} - \frac{V_{dd}}{M_i} > Ic_i^2$ // set 1 |
| 14   | $(T_i - w_{i,j}) \cdot \frac{V_{dd}}{m_{i,j}} - \frac{V_{dd}}{M_i} < Ic_i^1$ // hysteresis |
| 15   | $-\frac{V_{dd}}{M_i} < -Ic_i^2$ // null |
| 16   | $\frac{V_{dd}}{m_{i,j}} - \frac{V_{dd}}{M_i} > -Ic_i^1$ // hysteresis |
| 17   | $m_{min} < m_{i,j}, M_i < m_{max}$ // device constraint |
| 18   | **else** |
| 19   | $w_{min} \leftarrow \text{findmin}_{j=1:n}(w_{i,j})$ // find the minimal boolean weight of input |
| 20   | $m_{i,j=1:n} \leftarrow m_{w_{min}} \cdot \frac{w_{min}}{w_{i,j}}$ // calculate memristance of each input |
| 21   | minimize($m_{w_{min}}$) // find the minimal memristance of input |
| 22   | subject to: |
| 23   | $T_i \cdot \frac{V_{dd}}{m_{w_{min}}} - \frac{V_{dd}}{M_i} > Ic_i^2$ // set 1 |
| 24   | $(T_i - w_{min}) \cdot \frac{V_{dd}}{m_{w_{min}}} - \frac{V_{dd}}{M_i} < Ic_i^1$ // hysteresis |
| 25   | $-\frac{V_{dd}}{M_i} < -Ic_i^2$ // null |
| 26   | $\frac{V_{dd}}{m_{w_{min}}} - \frac{V_{dd}}{M_i} > -Ic_i^1$ // hysteresis |
| 27   | $m_{min} < m_{i,j}, M_i < m_{max}$ // device constraint |
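The constraints in steps 23-27 of Algorithm 1 can be solved in closed form for the memristance of the minimum-weight input. The Python sketch below rearranges each inequality into a bound on $m_{w_{min}}$ for a given NULL module memristance and returns the resulting open interval. It is only a sketch under the stated constants; the function name `unit_memristance_range` and its argument names are assumptions made here, and it does not perform the minimization or the per-input scaling step.

```python
def unit_memristance_range(threshold, w_min, m_null, v_dd=0.3,
                           area=40e-18, j_c1=5.2e12, j_c2=6.2e12,
                           m_min=100.0, m_max=38000.0):
    """Open interval (lo, hi) for m_{w_min}, or None if infeasible."""
    i_c1, i_c2 = area * j_c1, area * j_c2
    i_null = v_dd / m_null
    # null (step 25) and device constraint on M_i (step 27)
    if not (m_min < m_null < m_max and i_null > i_c2):
        return None
    # set 1 (step 23): T*V/m - i_null > i_c2  =>  m < T*V / (i_c2 + i_null)
    hi = min(m_max, threshold * v_dd / (i_c2 + i_null))
    # hysteresis (step 26): V/m - i_null > -i_c1  =>  m < V / (i_null - i_c1)
    # (the denominator is positive because i_null > i_c2 > i_c1)
    hi = min(hi, v_dd / (i_null - i_c1))
    # hysteresis (step 24): (T - w_min)*V/m - i_null < i_c1
    #   =>  m > (T - w_min)*V / (i_c1 + i_null)
    lo = max(m_min, (threshold - w_min) * v_dd / (i_c1 + i_null))
    return (lo, hi) if lo < hi else None
```

For instance, `unit_memristance_range(2, 1, 1200.0)` describes a TH22-style gate with an illustrative NULL memristance of 1200 Ω, while a NULL memristance of 2000 Ω returns `None` because the NULL current can no longer reset the domain wall.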

| NCL gate | Boolean function  | Weight: Threshold | Memristance Range( $\Omega$ )                                                                                   |
|----------|-------------------|-------------------|-----------------------------------------------------------------------------------------------------------------|
| TH12     | A+B               | (1,1:1)           | $m_{i,A}, m_{i,B} \in [100, M_i/2]; M_i \in [100, 1209]$                                                        |
| TH13     | A+B+C             | (1,1,1:1)         | $m_{i,A}, m_{i,B}, m_{i,C} \in [100, M_i/2]; M_i \in [100, 1209]$                                               |
| TH14     | A+B+C+D           | (1,1,1,1:1)       | $m_{i,A}, m_{i,B}, m_{i,C}, m_{i,D} \in [100, M_i/2]; M_i \in [100, 1209]$                                      |
| TH22     | AB                | (1,1:2)           | $m_{i,A}, m_{i,B}, M_i \in [100, 1209]$                                                                         |
| TH23     | AB+AC+BC          | (1,1,1:2)         | $m_{i,A}, m_{i,B}, m_{i,C}, M_i \in [100, 1209]$                                                                |
| TH23W2   | A+BC              | (2,1,1:2)         | $m_{i,A} \in [100, M_i/2]; m_{i,B}, m_{i,C}, M_i \in [100, 1209]$                                               |
| TH24     | AB+AC+AD+BC+BD+CD | (1,1,1,1:2)       | $m_{i,A}, m_{i,B}, m_{i,C}, m_{i,D}, M_i \in [100, 1209]$                                                       |
| TH24W2   | A+BC+BD+CD        | (2,1,1,1:2)       | $m_{i,A} \in [100, M_i/2]; m_{i,B}, m_{i,C}, m_{i,D}, M_i \in [100, 1209]$                                      |
| TH24W22  | A+B+CD            | (2,2,1,1:2)       | $m_{i,A}, m_{i,B} \in [100, M_i/2]; m_{i,C}, m_{i,D}, M_i \in [100, 1209]$                                      |
| TH33     | ABC               | (1,1,1:3)         | $m_{i,A}, m_{i,B}, m_{i,C} \in [100, (2/3) \cdot M_i]; M_i \in [100, 1209]$                                     |
| TH33W2   | AB+AC             | (2,1,1:3)         | $m_{i,A} \in [100, (3/4) \cdot M_i]; m_{i,B}, m_{i,C} \in [100, (3/2) \cdot M_i]; M_i \in [100, 1209]$          |
| TH34     | ABC+ABD+ACD+BCD   | (1,1,1,1:3)       | $m_{i,A}, m_{i,B}, m_{i,C}, m_{i,D} \in [100, (3/2) \cdot M_i]; M_i \in [100, 1209]$                            |
| TH34W2   | AB+AC+AD+BCD      | (2,1,1,1:3)       | $m_{i,A} \in [100, (3/4) \cdot M_i]; m_{i,B}, m_{i,C}, m_{i,D} \in [100, (3/2) \cdot M_i]; M_i \in [100, 1209]$ |
| TH34W3   | A+BCD             | (3,1,1,1:3)       | $m_{i,A} \in [100, M_i/2]; m_{i,B}, m_{i,C}, m_{i,D} \in [100, (3/2) \cdot M_i]; M_i \in [100, 1209]$           |
| TH34W22  | AB+AC+AD+BC+BD    | (2,2,1,1:3)       | $m_{i,A}, m_{i,B} \in [100, (2/3) \cdot M_i]; m_{i,C}, m_{i,D} \in [100, (3/2) \cdot M_i]; M_i \in [100, 1209]$ |
| TH34W32  | A+BC+BD           | (3,2,1,1:3)       | $m_{i,A} \in [100, M_i/2]; m_{i,B} \in [100, (2/3) \cdot M_i]; m_{i,C}, m_{i,D} \in [100, (3/2) \cdot M_i]; M_i \in [100, 1209]$ |
| TH44     | ABCD              | (1,1,1,1:4)       | $m_{i,A}, m_{i,B}, m_{i,C}, m_{i,D} \in [100, 2 \cdot M_i]; M_i \in [100, 1209]$                                |
| TH44W2   | ABC+ABD+ACD       | (2,1,1,1:4)       | $m_{i,A}, M_i \in [100, 1209]; m_{i,B}, m_{i,C}, m_{i,D} \in [100, 2 \cdot M_i]$                                |
| TH44W3   | AB+AC+AD          | (3,1,1,1:4)       | $m_{i,A} \in [100, (2/3) \cdot M_i]; m_{i,B}, m_{i,C}, m_{i,D} \in [100, 2 \cdot M_i]; M_i \in [100, 1209]$     |
| TH44W22  | AB+ACD+BCD        | (2,2,1,1:4)       | $m_{i,A}, m_{i,B}, M_i \in [100, 1209]; m_{i,C}, m_{i,D} \in [100, 2 \cdot M_i]$                                |
| TH44W322 | AB+AC+AD+BC       | (3,2,2,1:4)       | $m_{i,A} \in [100, (2/3) \cdot M_i]; m_{i,B}, m_{i,C}, M_i \in [100, 1209]; m_{i,D} \in [100, 2 \cdot M_i]$     |
| TH54W22  | ABC+ABD           | (2,2,1,1:5)       | $m_{i,A}, m_{i,B} \in [100, (5/4) \cdot M_i]; m_{i,C}, m_{i,D} \in [100, (5/2) \cdot M_i]; M_i \in [100, 1209]$ |
| TH54W32  | AB+ACD            | (3,2,1,1:5)       | $m_{i,A} \in [100, (5/4) \cdot M_i]; m_{i,B} \in [100, (5/4) \cdot M_i]; m_{i,C}, m_{i,D} \in [100, (5/2) \cdot M_i]; M_i \in [100, 1209]$ |
| TH54W322 | AB+AC+BCD         | (3,2,2,1:5)       | $m_{i,A} \in [100, (5/6) \cdot M_i]; m_{i,B}, m_{i,C} \in [100, (5/4) \cdot M_i]; m_{i,D} \in [100, (5/2) \cdot M_i]; M_i \in [100, 1209]$ |

Table 1: Mapping results of proposed Algorithm1 for 27 foundational NCL functions.

| Symbol   | Description                              | Value                             |
|----------|------------------------------------------|-----------------------------------|
| $\alpha$ | damping coefficient                      | 0.02                              |
| $K_u$    | uniaxial anisotropy constant             | $0.59 \times 10^6 \mathrm{J/m^3}$ |
| $\xi$    | non-adiabaticity of spin-transfer torque | 0.2                               |
| $M_s$    | saturation magnetization                 | $6 \times 10^5 \text{A/m}$        |
| P        | polarization                             | 0.6                               |
| $A_{ex}$ | exchange stiffness                       | $1.1 \times 10^{11} \text{J/m}$   |

Table 2: Device parameters used in the simulation of the TH44 gate.

To validate our NCL scheme, we have simulated our proposed DWL architecture for a single TH44 gate with the software mumax<sup>3</sup> with its parameters determined by Algorithm 1 and shown in Figure. 2. When the sum of input current which is less or equal to critical current may not cause any movement of domain. At the time of 4 inputs are high, the sum of current is larger than critical current and move domain move to right terminal. Therefore, the different combinations of inputs can make domain wall moving or stepping. The simulation of TH44 gate is shown



Figure 5: Simulation of the proposed TH44 gate implemented with a domain wall logic device.

in Figure 5. The number of asserted inputs is increased sequentially to test the hysteresis. Before all four inputs are ones, the input combinations shown in Figure 5 are (A, B, C, D) = (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), and (0, 1, 1, 1). In these cases the domain wall stays stopped, since the sum of the input currents and the NULL module current does not exceed the critical current. Once all four inputs are active, that sum exceeds the critical current and the domain wall moves. After the domain wall has moved to its target position while all inputs are ones, the number of asserted inputs is decreased. In these cases the domain wall does not move back to its initial position: the reverse current does not exceed the resetting critical current, so the wall stays at its current position. When all inputs are zero, the sum of the input currents and the NULL module current exceeds the resetting critical current and pushes the domain wall back to its original position. The simulation thus shows that the hysteresis required by NCL is implemented through domain wall hysteresis by using different memristances.
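The set/hold/reset behavior described above can be summarized with a small behavioral model. The following Python sketch is only an abstraction of the gate-level hysteresis, not a micromagnetic or circuit simulation: the class `ThresholdGate`, its unit weights, and the function `run_th44_sweep` are illustrative names introduced here, and the domain wall position is reduced to a single bit.

```python
class ThresholdGate:
    """Behavioral THmn gate with hysteresis (illustrative abstraction only)."""

    def __init__(self, weights, threshold):
        self.weights = list(weights)
        self.threshold = threshold
        self.output = 0  # 0: wall at its initial position, 1: wall has moved

    def step(self, inputs):
        weighted_sum = sum(w * x for w, x in zip(self.weights, inputs))
        if weighted_sum >= self.threshold:
            # Summed drive exceeds the critical value: the wall moves (set).
            self.output = 1
        elif not any(inputs):
            # All inputs are zero: the NULL drive resets the wall.
            self.output = 0
        # Otherwise neither critical value is exceeded and the state is held.
        return self.output


def run_th44_sweep():
    """Apply the rising and falling input sequence described in the text."""
    gate = ThresholdGate(weights=(1, 1, 1, 1), threshold=4)
    sweep = [
        (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1),
        (1, 1, 1, 1),
        (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1), (0, 0, 0, 0),
    ]
    return [gate.step(vector) for vector in sweep]


if __name__ == "__main__":
    print(run_th44_sweep())
```

In this model the output is asserted only once all four inputs are ones, stays asserted while inputs are withdrawn, and clears only when every input is zero, which is the hysteresis that the simulated sweep is meant to exercise.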

# 5 Spin-Based Dual-Rail NCL Circuits

Figure 6(1.a) illustrates our architecture of a DWL-based dual-rail NCL circuit. Two adjacent domain wall devices with the same resistance are connected through a shared terminal, into which the NULL module current is injected. The two other terminals are driven by the current sums of different input combinations: usually, one side is driven by the current sum of $D^0$ and the other side by the current sum of $D^1$. The resistances of the vertical write-current paths of the left and right domain wall devices, $R_l$ and $R_r$, are equal and given by device measurements. To clearly explain the operation, we also depict the equivalent analog circuit models in Figure 6(1.b), (1.c), and (1.d). In Figure 6(1.b), the NULL case occurs when the inputs of both rails are all zero, i.e., $V_{sum}^0 = 0$ and $V_{sum}^1 = 0$. Since $V_{null}$ is larger, two currents with opposite directions are created. If we take the current direction from the NULL terminal to the input terminal as the positive direction, the combination of the summed input currents and the NULL



Figure 6: (1.a) DWL-based dual-rail NCL implementation: the two dual-rail bits are implemented by two domain wall devices separated by a shared terminal. (1.b) The equivalent analog circuit of the proposed DWL dual-rail architecture in the NULL case. (1.c) The equivalent analog circuit of the proposed DWL dual-rail architecture in the DATA1 case. (1.d) The equivalent analog circuit of the proposed DWL dual-rail architecture in the DATA0 case. (2) CMOS dual-rail NCL architecture of a one-bit full adder. (3) DWL dual-rail NCL architecture of a one-bit full adder; an NPN amplifier is used to increase the output current of the DW device.

module current will be smaller than the negative critical current; therefore, both domain walls move back to their initial positions. In Figure 6(1.c), the DATA1 case, the input vector $V_0^0 \cdots V_n^0$ has a smaller voltage than the NULL module supply voltage $V_{null}$, so the domain wall device for input $V_{sum}^0$ does not move. On the other side, the input vector $V_0^1 \cdots V_n^1$ has a higher voltage than $V_{null}$, so the domain wall device for input $V_{sum}^1$ moves. Finally, in Figure 6(1.d), the DATA0 case, the input vector $V_0^0 \cdots V_n^0$ has a higher voltage than $V_{null}$, so the domain wall device for input $V_{sum}^0$ moves. On the other side, the input vector $V_0^1 \cdots V_n^1$ has a smaller voltage than $V_{null}$, so the domain wall device for input $V_{sum}^1$ does not move.
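The three cases of Figure 6(1.b) to (1.d) can be restated as a compact behavioral sketch. The Python model below is an illustration under stated assumptions, not the authors' circuit model: `DualRailCell` is a hypothetical name, voltages are unitless placeholder numbers, each domain wall is reduced to one bit, and the "hold" branch assumes the same hysteresis as the single-gate case.

```python
class DualRailCell:
    """Two domain wall devices sharing a NULL terminal (behavioral sketch)."""

    def __init__(self, v_null):
        self.v_null = v_null  # NULL module supply voltage (placeholder units)
        self.rail0 = 0        # 1 once the wall driven by V_sum^0 has moved
        self.rail1 = 0        # 1 once the wall driven by V_sum^1 has moved

    def step(self, v_sum0, v_sum1):
        if v_sum0 == 0 and v_sum1 == 0:
            # NULL case: V_null dominates on both sides, both walls reset.
            self.rail0 = 0
            self.rail1 = 0
        else:
            # A wall moves only if its summed input voltage exceeds V_null.
            if v_sum0 > self.v_null:
                self.rail0 = 1
            if v_sum1 > self.v_null:
                self.rail1 = 1
            # Otherwise the wall holds its position (assumed hysteresis).
        return self.rail0, self.rail1

    def state(self):
        names = {(0, 0): "NULL", (1, 0): "DATA0", (0, 1): "DATA1"}
        return names.get((self.rail0, self.rail1), "INVALID")


if __name__ == "__main__":
    cell = DualRailCell(v_null=0.5)
    for v0, v1 in [(0.0, 0.0), (0.2, 1.0), (0.2, 0.3), (0.0, 0.0), (1.0, 0.2)]:
        cell.step(v0, v1)
        print((v0, v1), cell.state())
```

The `INVALID` label covers the combination in which both rails are asserted, which a correctly driven dual-rail signal should never produce.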

To further examine our proposed DWL-based dual-rail logic, we implemented a one-bit NCL full adder. As shown in Figure 6(2), where X and Y are the input addends and C is the carry input, this one-bit full adder employs two TH23 gates and two TH34W2 gates to generate the DATA0 and DATA1 rails. The optimized circuit is obtained through TCR method 2 [11], and the carry-out and sum are given by  $C_o^0 = X^0 Y^0 + C^0 X^0 + C^0 Y^0$ ,  $C_o^1 = X^1 Y^1 + C^1 X^1 + C^1 Y^1$ ,  $S^0 = X^0 C_o^1 + C_o^1 Y^0 + C_o^1 C^0 + X^0 Y^0 C^0$ , and  $S^1 = X^1 C_o^0 + C_o^0 Y^1 + C_o^0 C^1 + X^1 Y^1 C^1$ . Therefore, the one-bit full adder can be implemented with four NCL threshold gates: two TH34W2 gates and two TH23 gates. Although [11] tries to reduce transistor size through TCR optimization, the area and power consumption remain significant. In contrast, Figure 6(3) shows our proposed DWL-based NCL implementation. The two TH23 gates are implemented by two domain wall devices connected through a shared terminal, and the two TH34W2 gates are implemented in the same way. The dual-rail DWL architecture operates in the same way as the single static DWL NCL gate proposed earlier.
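As a sanity check on the four equations above, the following Python sketch evaluates them as threshold functions and compares the result with ordinary binary addition for all eight input combinations. It models only the combinational set condition of each gate (hysteresis is omitted because every input vector here is a complete DATA wavefront). The function names are illustrative, and the weight of 2 is assigned to the carry rail, which is the assignment that reproduces the stated sum equations.

```python
from itertools import product


def th23(a, b, c):
    """TH23 set condition: at least 2 of 3 inputs asserted."""
    return int(a + b + c >= 2)


def th34w2(a, b, c, d):
    """TH34W2 set condition: input a has weight 2, threshold 3 (ab+ac+ad+bcd)."""
    return int(2 * a + b + c + d >= 3)


def encode(bit):
    """Dual-rail encoding of a Boolean value as (rail0, rail1)."""
    return (1 - bit, bit)


def dual_rail_full_adder(x, y, c):
    x0, x1 = encode(x)
    y0, y1 = encode(y)
    c0, c1 = encode(c)
    co0 = th23(x0, y0, c0)         # C_o^0 = X0 Y0 + C0 X0 + C0 Y0
    co1 = th23(x1, y1, c1)         # C_o^1 = X1 Y1 + C1 X1 + C1 Y1
    s0 = th34w2(co1, x0, y0, c0)   # S^0 = Co1 X0 + Co1 Y0 + Co1 C0 + X0 Y0 C0
    s1 = th34w2(co0, x1, y1, c1)   # S^1 = Co0 X1 + Co0 Y1 + Co0 C1 + X1 Y1 C1
    return (s0, s1), (co0, co1)


def check_all_inputs():
    """Return True if the dual-rail outputs match x + y + c for every input."""
    for x, y, c in product((0, 1), repeat=3):
        total = x + y + c
        s, co = dual_rail_full_adder(x, y, c)
        if s != encode(total % 2) or co != encode(total // 2):
            return False
    return True


if __name__ == "__main__":
    print(check_all_inputs())
```

Note that each sum rail uses the opposite carry rail as its weight-2 input, which is why two TH23 gates and two TH34W2 gates suffice for the complete dual-rail adder.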

# 6 Performance Comparison and Analysis

In this section, we compare the performance of the conventional CMOS NCL circuit and our proposed DWL NCL circuit for 1-bit, 4-bit, 8-bit, 16-bit, and 32-bit full adders. The conventional NCL full adder follows the architecture presented earlier. It is implemented and simulated using the IBM SOI1250 45nm CMOS process standard cell library, with a nominal supply voltage of 0.92V, a temperature of 27°C, and a capacitive load of 10fF. The proposed DWL circuit uses the parameters in Figure 2. In Figure 7(a), we compare their delay performance. Our proposed DWL-based



Figure 7: (a) Delay of the NCL full adder with increasing bit width. (b) Energy, in log scale, of the NCL full adder with increasing bit width. (c) Area, in log scale, of the NCL full adder with increasing bit width.

adder has a slightly higher delay than its CMOS counterpart, mainly because its domain wall moves at only around 20m/s. This limitation can be mitigated by adjusting the device thresholds to allow a larger writing current. In our case, we use only  $Jc_i^2 = 6.2 \times 10^{12} \text{A/m}^2$  in order to achieve higher energy efficiency. Since the full adder is fully pipelined, its delay does not increase with bit width. In Figure 7(b), we compare the energy consumption of the two implementations. Our proposed NCL circuit runs at a very low operating current, consuming only a few  $\mu W$  for the memristors,  $0.15\mu W$  for the sensing unit, and a few  $\mu W$  for the DW device. Figure 7(b) shows the resulting power saving in log scale. Our proposed DWL full adder achieves about a 20 times energy saving for a 32-bit full adder. Finally, in Figure 7(c), we compare the chip layout area of the CMOS-based and DWL-based NCL full adders. By using a 3D structure for our proposed dual-rail DWL-based NCL full adder, its area is only about 1/8 of that of the CMOS version.

# 7 Conclusion

When implemented with CMOS device technology, many innovative logic circuit design methodologies, such as threshold logic and NCL, have proven difficult to adopt widely because of their high costs. Fortunately, emerging spintronic devices present ample opportunities to innovate in logic circuit design. This work is a first step in this direction. One valuable lesson we learned from this study is that the key to success in using emerging devices for logic circuits is to natively exploit their inherent physical properties, instead of merely treating them as "super" switches that replace CMOS transistors.

# 8 References

- [1] P. A. Beerel, R. O. Ozdag, and M. Ferretti. *A Designer's Guide to Asynchronous VLSI*. Cambridge University Press, New York, NY, USA, 1st edition, 2010.
- [2] M.-C. Chang, M.-H. Hsieh, and P.-H. Yang. Low-power asynchronous NCL pipelines with fine-grain power gating and early sleep. *Circuits and Systems II: Express Briefs, IEEE Transactions on*, 61(12):957–961, Dec 2014.
- [3] D. Fan, M. Sharad, A. Sengupta, and K. Roy. Hierarchical temporal memory based on spin-neurons and resistive memory for energy-efficient brain-inspired computing. *arXiv preprint arXiv:1402.2902*, 2014.
- [4] D. Fan, Y. Shim, A. Raghunathan, and K. Roy. STT-SNN: A spin-transfer-torque based soft-limiting non-linear neuron for low-power artificial neural networks. *CoRR*, abs/1412.8648, 2014.
- [5] X. Fong, S. Gupta, N. Mojumder, S. Choday, C. Augustine, and K. Roy. Knack: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells. In *Simulation of Semiconductor Processes and Devices (SISPAD), 2011 International Conference on*, pages 51–54, Sept 2011.
- [6] S. Fukami, M. Yamanouchi, K.-J. Kim, T. Suzuki, N. Sakimura, D. Chiba, S. Ikeda, T. Sugibayashi, N. Kasai, T. Ono, and H. Ohno. 20-nm magnetic domain wall motion memory with ultralow-power operation. In *Electron Devices Meeting (IEDM), 2013 IEEE International*, pages 3.5.1–3.5.4, Dec 2013.
- [7] C. Jeong and S. Nowick. Optimal technology mapping and cell merger for asynchronous threshold networks. In *Asynchronous Circuits and Systems, 2006. 12th IEEE International Symposium on*, pages 10 pp.-137, March 2006.
- [8] T. Koyama, K. Ueda, K.-J. Kim, Y. Yoshimura, D. Chiba, K. Yamada, J.-P. Jamet, A. Mougin, A. Thiaville, S. Mizukami, et al. Current-induced magnetic domain wall motion below intrinsic threshold triggered by Walker breakdown. *Nature Nanotechnology*, 7(10):635–639, 2012.
- [9] M. Ligthart, K. Fant, R. Smith, A. Taubin, and A. Kondratyev. Asynchronous design using commercial HDL synthesis tools. In *Proceedings of the 6th International Symposium on Advanced Research in Asynchronous Circuits and Systems*, ASYNC '00, pages 114–, Washington, DC, USA, 2000. IEEE Computer Society.
- [10] F. Parsan, W. Al-Assadi, and S. Smith. Gate mapping automation for asynchronous NULL convention logic circuits. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 22(1):99–112, Jan 2014.
- [11] S. Smith, R. DeMara, J. Yuan, D. Ferguson, and D. Lamb. Optimization of NULL convention self-timed circuits. *Integration, the VLSI Journal*, 37(3):135–165, 2004.
- [12] Y. Zhang, W. S. Zhao, D. Ravelosona, J.-O. Klein, J. V. Kim, and C. Chappert. Perpendicular-magnetic-anisotropy CoFeB racetrack memory. *Journal of Applied Physics*, 111(9):-, 2012.