# Ultra Low-Power Wireless Sensor Node Design for ECG Sensing Applications

by

Jaeyoung Kim

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in The University of Michigan 2017

Doctoral Committee:

Professor Pinaki Mazumder, Chair

Assistant Professor Cynthia A. Chestek

Professor Trevor N. Mudge

Professor Dennis M. Sylvester

© Jaeyoung Kim 2017

All Rights Reserved

To my God who has always led me with his goodness and love and Yehee, my lovely wife, my best counselor, and supporter, and my beloved Sarang and Hanbit

with love and gratitude

### ACKNOWLEDGEMENTS

First, I am extremely grateful to my advisor, Professor Pinaki Mazumder. When I began my Ph.D. at the University of Michigan, I was new to VLSI circuit design. He has patiently supported my study and provided good opportunities. He is a visionary scholar, and from him I learned his perspective on future research as well as his analytical skill in defining a problem and pursuing the next work. Without his professional and personal support, this dissertation would never have been completed.

I would like to express my sincere appreciation to my dissertation committee. I thank Professor Dennis M. Sylvester for gladly agreeing to be on my dissertation committee. I was fortunate to be able to serve as his Graduate Student Instructor (GSI) for four semesters in different VLSI courses (EECS312, 427, and 627). His generous selection of me as his GSI eased my financial burden. In addition, I learned from his intuition and knowledge of modern VLSI design through his lectures and discussions, which was extremely helpful in conducting my research. I also thank Professor Trevor N. Mudge for graciously agreeing to be on my dissertation committee. His invaluable feedback and advice on my dissertation have helped me to define and solve a research problem more scientifically. I also thank Professor Cynthia A. Chestek, who has offered sound advice from a variety of perspectives on my dissertation. Her advice has made my dissertation better reasoned.

I am very blessed to have worked with the past and current fellow students of the research group. Dr. Kyungjun Song always encouraged me to work patiently, and guided me on how to walk through this journey. Dr. Idongeist E. Ebong was always cheerful to me. He has become my good friend and taught me how to proceed on a way that I had never gone through. I thank him for his contribution to organizing our group's unofficial party. I also thank Zhao Xu, Yalcin Yilmaz, Mahmood Barangi, Nan Zheng, and Mahdi Aghadjani. I could not have finished my dissertation without their valuable advice, discussions, and answers to my questions in their areas of expertise. In particular, the WSN project could be successfully finished thanks to their endeavors and passion.

I would like to express my appreciation to my Korean EECS fellows, Dr. Yoonmyung Lee, Dr. Yejoong Kim, and Myungjoon Choi, for their meaningful advice on the WSN project.

Many thanks to the Rackham Graduate School and EECS department staff. Stephen Reger has always been available for any administrative help. Joel VanLaven helped with all CAD-related issues. Beth Stalnaker and Steven Pejuan have provided administrative help. Karen Liska and Anne Rhoade helped me to find GSI positions, and were willing to help me even with my personal affairs.

During my internship at Qualcomm, I learned practical VLSI design techniques and knowledge from the memory design team, especially from my internship mentor, Dr. Rui Li. His guidance helped me to improve one chapter of my dissertation.

I have received financial support from the Samsung Scholarship and the EECS department for my Ph.D. study. I would like to express my appreciation for their generosity.

I would like to thank Rev. Sun Myung Lyu, Rev. Jae Joong Hwang, and my friends at Korean Presbyterian Church of Ann Arbor for their prayers and support.

Most importantly, I would like to express my deepest appreciation to my family. My wife, Yehee Hong, has always shown her devotion, unconditional love, encouragement, and prayer through my long journey at Michigan. I also thank my parents, parents-in-law, and grandparents for their ceaseless support and prayers. I am deeply indebted to them for their love. I also thank my beloved daughter, Sarang Kim, and my beloved son, Hanbit Kim, for their pleasant laughs and for expressing their love for me.

Lastly, I would like to give glory to God who is always with me in joy and sorrow, who shows His endless love even when I forget Him, and who never gives up on me.

# TABLE OF CONTENTS

- DEDICATION
- ACKNOWLEDGEMENTS
- LIST OF FIGURES
- LIST OF TABLES
- LIST OF ABBREVIATIONS
- ABSTRACT
- CHAPTER
  - I. Introduction
    - 1.1 Motivation
    - 1.2 Design Challenges and Low-power Techniques
    - 1.3
    - 1.5 Thesis Organization
  - II. Ultra Low-Power Body Sensor Node Chip Design for Electrocardiogram Sensing Applications
    - 2.1 Introduction
    - 2.2 System Overview
      - 2.2.1 System Design under Limited Power Budget
      - 2.2.2 Sub-block Specifications
      - 2.2.3 Operation Modes
    - 2.3 Packet Frame Format
    - 2.4 Block Description
      - 2.4.3 DSP
      - 2.4.4 SRAM
      - 2.4.5 Packetizer/Depacketizer/Main Controller
      - 2.4.6 RF Transceiver
      - 2.4.7 PMU
    - 2.5 Simulation Results
      - 2.5.1 AFE
      - 2.5.2 SAR ADC
      - 2.5.3 RF Transceiver
      - 2.5.4 PMU
      - 2.5.5 Other Blocks
    - 2.6 Conclusions
    - 2.7 Acknowledgements
  - III. A Robust 12T SRAM Cell Design
    - 3.1 Introduction
    - 3.2 12T SRAM Cell Design
      - 3.2.1 SRAM Cell Structure
      - 3.2.2 Operation Principle
      - 3.2.3 Sizing Constraint
    - 3.3 Analytical Model
      - 3.3.1 Read Static Noise Margin
      - 3.3.2 Definitions of Write Margin
      - 3.3.3 Write Static Noise Margin Modeling
    - 3.4 Simulation Results
      - 3.4.1 Analytical Model
      - 3.4.2 Simulation Setup
      - 3.4.3 Read Static Noise Margin
      - 3.4.4 Static Write Margin
      - 3.4.5 Dynamic Write Margin
      - 3.4.6 Leakage Current
      - 3.4.7 Performance
      - 3.4.8 Cell Area
    - 3.5 Conclusions
  - IV. Energy-Efficient Hardware Architecture of Self-Organizing Map (SOM) for ECG Clustering
    - 4.1 Introduction
    - 4.2 Theoretical Background
    - 4.3 Hardware Architecture
      - 4.3.1 Clustering Process
      - 4.3.2 SOM Network Updating Process
    - 4.4 Simulation Results
    - 4.5 Conclusions
  - V. Conclusions
    - 5.1 Related Publications and Patents
- BIBLIOGRAPHY

# LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| 1.1 | A typical system of wireless sensor network |
| 2.1 | A body sensor network (BSN) demonstrating a variety of biosensors |
| 2.2 | The block diagram of the proposed BSN SoC |
| 2.3 | The proposed instruction frame format |
| 2.4 | A typical implementation of CRC-16 Generator Polynomial |
| 2.5 | The proposed data frame format |
| 2.6 | Analog front-end and 10-bit SAR ADC block diagram and schematic |
| 2.7 | The operation flowchart of (a) Packetizer and (b) Depacketizer |
| 2.8 | RF Transceiver schematic |
| 2.9 | The waveform of the transmitter and the control signals |
| 2.10 | Lifetime vs. power consumption for different types of batteries |
| 2.11 | The proposed BSN SoC layout |
| 3.1 | Ideal noise margin curve for (a) RSNM and (b) WSNM |
| 3.2 | The proposed 12T SRAM Cell Structure |
| 3.3 | The proposed 12T SRAM Cell Read Operation |
| 3.4 | A series of write operation process of the proposed 12T SRAM cell |
| 3.5 | The proposed 12T SRAM Read Noise Margin Model |
| 3.6 | The proposed 12T SRAM Write Noise Margin Model |
| 3.7 | The Proposed 12T SRAM Analytical Model Simulation |
| 3.8 | 50,000 RSNM Monte-Carlo simulation results for 6T, 8T, 10T, and the proposed 12T SRAM cells |
| 3.9 | 50,000 WSNM Monte-Carlo simulation results for 6T, 8T, 10T, and 12T SRAM cells at VDD=550 mV, SF corner, and $-30^{\circ}$C |
| 3.10 | WSNM, CWWM, and BLWM 50,000 Monte-Carlo simulation statistical distributions for 6T, 8T, 10T, and 12T SRAM cells at VDD=550 mV, SF corner, $-30^{\circ}$C |
| 3.11 | Dynamic write noise margin simulation setting |
| 3.12 | 50,000 Monte-Carlo DNM simulation results for 6T, 8T, 10T, and the proposed 12T SRAM cells in 40nm CMOS technology |
| 3.13 | The stick diagram of the proposed 12T bit cell |
| 4.1 | The proposed SOM hardware architecture |
| 4.2 | An example signal waveform of the proposed SOM hardware architecture |
| 4.3 | Example results of SOM clustering |
| 4.4 | The proposed ECG clustering SOM chip layout |

# LIST OF TABLES

| Table | Title |
|-------|-------|
| 2.1 | Operation Mode of the proposed BSN SoC |
| 2.2 | Preamble Field in PHY SHR |
| 2.3 | SFD Field in PHY SHR |
| 2.4 | Transmission Level & Filter Type Field |
| 2.5 | Frame Control Field |
| 2.6 | Operation Mode Field |
| 2.7 | IIR Filters Implemented in DSP Block |
| 2.8 | Voltages Supplied by PMU |
| 2.9 | Analog Front-End Summary |
| 2.10 | SAR ADC Summary |
| 2.11 | RF Transceiver Summary |
| 2.12 | Power Management Unit Summary |
| 2.13 | Digital Blocks Summary |
| 2.14 | Power Breakdown of an Individual Block in the BSN System |
| 2.15 | Power Consumption of Each Operational Mode |
| 2.16 | Comparison Table of state-of-the-art Bio-medical Application SoCs |
| 3.1 | SRAM Cell Write Margin Simulation Results (50,000 MC, SF Corner, $T=-30^{\circ}$C) |
| 3.2 | SRAM CELL Delay Comparison |
| 3.3 | SRAM CELL Area Comparison |
| 4.1 | The SOM Clustering Results for Different Types of Beats |
| 4.2 | SOM Clustering Results by Recordings |
| 4.3 | Comparisons with Other Algorithms |
| 4.4 | ECG Clustering SOM SoC Summary |
| 4.5 | Comparisons of SOM Hardware Implementation |

## LIST OF ABBREVIATIONS

**ADC** analog-to-digital converter

**AFE** analog front end

**ANNs** artificial neural networks

**AR** acknowledge request

**ART** adaptive resonance theory

**BLWM** bit line write margin

**BSN** body sensor network

**CR** cell ratio

**CRC** cyclic redundancy check

**CWWM** combined wordline write margin

**DCVSL** differential cascode voltage switch logic

**DNM** dynamic noise margin

**DP** data payload

**DVFS** dynamic voltage and frequency scaling

**DVS** dynamic voltage scaling

**DSN** data sequence number

**DSP** digital signal processing

**e beat** atrial escape beat

**ECG** electrocardiogram

**ECoG** electrocorticography

**EEG** electroencephalogram

**EMG** electromyography

**EOG** electrooculography

**FC** frame control

**FCS** frame check sequence

**FIR** finite impulse response

**FP** frame pending

**FT** frame type

**IIR** infinite impulse response

**ISM Band** industrial, scientific, and medical radio band

**KCL** Kirchhoff's current law

**LDO** low-dropout

**LNA** low noise amplifier

**LPF** low field potential

**LR-WPANs** low-rate wireless personal area networks

**LSB** least significant bit

**LVQ** learning vector quantization

**MAC** medium access control

**MC** Monte-Carlo

**MFR** MAC footer

**MHR** MAC header

**MOE** mixture of experts

**MRA** multiscale recurrence analysis

**MSB** most significant bit

**OM** operation mode

**OOK** on-off keying

**PA** power amplifier

**PDs** power domains

**PGA** programmable gain amplifier

**PHY** physical

**PLL** phase locked loop

**PMU** power management unit

**PR** pull-up ratio

**PSDU** PHY service data unit

**Q beat** unclassified beat

**RBL** read bit line

**RF** register file

**RSNM** read static noise margin

**S beat** supraventricular premature beat

**SAR** successive approximation register

**SFD** start-of-frame delimiter

**SHR** synchronization header

**SoC** system on chip

**SOCMAC** self-organizing cerebellar model articulation controller

**SOM** self-organizing map

**TFS** transmission level & filter selection

**T/H** track and hold

**VTC** voltage transfer characteristic

**WSN** wireless sensor network

**WSNM** write static noise margin

**WWM** wordline write margin

### ABSTRACT

Ultra Low-Power Wireless Sensor Node Design for ECG Sensing Applications

by

Jaeyoung Kim

Chair: Pinaki Mazumder

Ubiquitous computing, such as smart homes, smart cars, and the smart grid, connects our world closely so that we can easily access it through such virtual infrastructural systems. The ultimate vision is the Internet of Things (IoT), through which intelligent monitoring and management become feasible via networked sensors and actuators. In such a system, devices transmit sensed information and execute instructions distributed via sensor networks. A wireless sensor network (WSN) is a network in which many sensor nodes are interconnected such that, when a physical phenomenon is detected, a sensor node can relay the information through its adjacent sensor nodes until it reaches the destination. The concept of the WSN is also applicable to biomedical applications, especially ECG sensing applications, in the form of a so-called body sensor network (BSN), where affixed or implanted sensors gather bio-signals and transmit them to medical providers. The main challenge of the BSN is its energy constraint: implanted sensor nodes cannot be replaced easily, so they must operate for a long time on a limited amount of battery energy or on harvested energy. Thus, several power saving techniques are discussed in this thesis.

The main low-power techniques are low-voltage operation and duty-cycling. The design challenges for the former are reliability and signal timing, since performance deteriorates as the supply voltage is scaled down. The challenge of the latter is that it is effective only for blocks which are power-hungry and seldom used. Thus, a combination of these two techniques is presented to maximize the power saving of the proposed system on chip (SoC), where low-voltage operation was applied to always-on blocks, while a power-hungry RF transceiver was duty-cycled. In addition, a power gating technique was applied to the other blocks. As a result, the prototype chip consumes only $6.1\,\mu$W in its full operation mode.

From a system perspective, the main issue of applying low-voltage operation is the reduced noise margin, which is critical to sequential logic, such as memory. Thus, a robust 12T SRAM bit cell is proposed to reach the theoretical write static noise margin (WSNM) limit. This is achieved by eliminating the feedback of the back-to-back inverters by means of a data-dependent supply cutoff during the write operation, which allows the proposed bit cell to enlarge its write margin dramatically. Many previous works also attempted to cut off the supply, but many of them were not data dependent. The proposed 12T bit cell is compared against the conventional 6T and 8T SRAM bit cells as well as a 10T bit cell. Monte-Carlo (MC) simulation results show that the proposed 12T SRAM bit cell is more robust in static and dynamic noise margin than the other compared cells. The area overhead of the proposed bit cell is 1.96 times and 1.74 times greater than the 6T and 8T bit cells, respectively.

In order to duty-cycle the transceiver even more aggressively, on-chip signal processing is preferred to transmitting sensed raw data. In this thesis, an energy-efficient hardware architecture of a self-organizing map (SOM) for ECG clustering is proposed. The hardware consists of a pre-processing block and a SOM block. It detects an R-peak, reconstructs the QRS complex around it, and clusters the complex by calculating the Euclidean distance between the complex and the weight vectors of each cell in the SOM network (i.e. $5\times5$ cells). In the operation mode, the cluster ID related to the minimum Euclidean distance is provided, while the tagged weight vectors are updated in the learning mode. The proposed SoC is $1,735\times1,020\ \mu m^2$ in 65 nm LP CMOS, and it consumes 5.853 mW at VDD=1.2V.

## CHAPTER I

# Introduction

## 1.1 Motivation

Heart disease has been the world's leading cause of death over the last decade, and its share of all deaths has been increasing [1]. Heart disease is also the top cause of death in the U.S. [2]. Thus, the prevention of death due to heart disease is critical to lowering overall mortality. The electrocardiogram (ECG) [3] is a bio-signal from which heart disease can be detected early. Thus, there have been many attempts to diagnose and classify different heart diseases based on the ECG. However, this process requires well-trained ECG technicians, and recording a patient's ECG usually takes a few days. Electrodes must be attached to the patient's chest for the whole period, which means the patient needs to stay in the hospital during the recording. In other cases, long-term hospitalization is required to monitor a patient's heart condition, which diminishes the quality of his or her life. In order to minimize the duration of hospitalization, many portable ECG telemetry devices have been developed. A patient can monitor his or her ECG at home by using these devices, and can send the recorded data to the hospital by means of the Internet. The transferred data is then analyzed with abnormality detection algorithms. If an irregularity is detected, the system alerts medical providers so that they can intervene.

The abovementioned ECG telemetry system improves the patient's quality of life by involving the patient in recording his or her own ECG signal. Thus, the patient is partly responsible for the recording. In other words, he or she must attach patch-type ECG electrodes to the chest appropriately. If the electrodes are not closely affixed, the ECG might not be recorded. In addition, the patient must transmit the collected records to medical providers at least every day. These processes require the patient's active participation in recording ECG signals. Therefore, this system has several shortcomings:

1. A patient should send the recorded data regularly. If not, important signals for detecting abnormalities might be missing.
2. Since a patient must update his or her records periodically, an abnormal symptom might be detected only after the worst-case update period (i.e. 24 hours). If such a symptom requires immediate intervention from medical providers, the patient's life might be in danger.
3. If patch-type electrodes are not affixed properly on a patient's chest, ECG signals may not be recorded.
4. Patch-type electrodes must be removed and re-applied closely on a patient's chest after taking showers. Otherwise, ECG signals might not be recorded.
5. Patch-type electrodes could irritate a patient's skin.

The abovementioned aspects are undesirable to patients as well as medical providers. In order to resolve these issues, the recording device should be affixed well, and it should transmit the recorded signal to the hospital in real time. In addition, there should be a way to circumvent the usage of the patch-type electrodes. One candidate solution which meets all these requirements is a miniaturized implantable ECG sensing device designed from the perspective of a wireless sensor network (WSN). Issues 1 and 2 can be solved with a system which transmits data in real time by means of a handheld device, such as a smartphone or a smartwatch. However, issues 3, 4, and 5 cannot be solved just by introducing a real-time transmission system. They can be solved either by a contactless sensor, such as a bio-photonic sensor, or by implanting a sensor underneath the skin. However, a contactless sensor might still suffer from issue 3 when it is not aligned well. Moreover, it is difficult to place such a sensor between clothes and skin. Thus, the implantation method is more desirable. However, an implanted node has a tight power budget, since it must last on a given battery without requiring a replacement during the required lifespan. Alternatively, the need for batteries can be eliminated when energy harvesting supplies the power required by the node. In either case, it is necessary to achieve an ultra-low power sensing node.

Figure 1.1: A typical system of wireless sensor network.

## 1.2 Design Challenges and Low-power Techniques

Figure 1.1 depicts a typical WSN, comprising six main modules: a sensor, a front-end (e.g. an analog-to-digital converter), a microprocessor, a digital signal processor, a wireless transceiver, and a power management unit including a power source [4]. As this WSN is typically designed for a long operational lifespan, power is carefully budgeted, and each module is energized only when required, so that the overall average power is typically $10\,\mu$W to $100\,\mu$W. In a typical case, the power source in the power management unit allocates 20% of power to each module; the real power breakdown will vary depending on the specific application. A microprocessor module with ultra-low power dissipation is highly desirable, as it often remains active for continuous monitoring and, in part, enables various power-efficient techniques (e.g. a very low duty cycle for wireless data transmission) [5].
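To give a feel for what an average power in this range means for operational lifespan, the following is a minimal sketch that estimates ideal battery lifetime from average power. The battery (a 225 mAh, 3 V coin cell) and the function name `lifetime_years` are assumptions chosen for illustration and are not taken from this thesis; self-discharge and regulator losses are ignored, so real lifetimes would be shorter.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_years(capacity_mah: float, voltage_v: float, avg_power_w: float) -> float:
    """Ideal lifetime in years for a battery drained at a constant average power.

    Ignores self-discharge, converter efficiency, and capacity derating.
    """
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    energy_wh = capacity_mah / 1000.0 * voltage_v  # stored energy in watt-hours
    return energy_wh / avg_power_w / HOURS_PER_YEAR


# Hypothetical coin cell (assumed values, not from the thesis).
CAPACITY_MAH = 225.0
VOLTAGE_V = 3.0

for power_uw in (10.0, 100.0):
    years = lifetime_years(CAPACITY_MAH, VOLTAGE_V, power_uw * 1e-6)
    print(f"{power_uw:6.1f} uW -> {years:.2f} years (ideal)")
```

Under these assumptions the lifetime scales inversely with average power, so a tenfold reduction in power gives a tenfold longer ideal lifetime.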

To realize this ultra-low power WSN sensing node, many design approaches can be considered, including the adoption of limited instruction sets at the cost of reduced programmability and versatility [6], the use of devices with smaller feature sizes and their correspondingly lower supply voltages, clever design techniques with the similar objective of reducing switched capacitance and switching activity [7], and adaptive circuits and systems for lowering power budgets, such as dynamic voltage and frequency scaling (DVFS). Many approaches to these adaptive circuits have been suggested: so-called "always correct" approaches such as look-up tables [8] and canary circuits [9, 10, 11, 12], and "fail and correct" approaches such as Razor II [13]. These approaches, however, have some issues. For example, a look-up table holds fixed data in a static memory block; these data must be determined in advance by intensive simulation under process-voltage-temperature (PVT) variations, so the resulting operating points are not fully optimized. Canary circuits must be designed for the worst case in performance in order to serve as monitors, so a timing margin remains between the canary's delay and the critical-path delay of a real fabricated chip. Razor assumes that the secondary latch is always correct, meaning that this latch must always be slower than its primary latch under PVT variations. Furthermore, Razor requires additional cycles to correct erroneous data, which degrades performance.

The issues of the abovementioned approaches to adaptive circuits and systems mainly relate to timing uncertainty, because an insufficient budget for the clock period (cycle time) in digital circuits and systems can lead to incorrect functioning. This timing uncertainty has many sources, such as phase-locked loop (PLL) jitter, clock skew and jitter, power supply noise, and PVT variations. As a result, either the supply voltage is increased or the cycle time is lengthened as a guard band to ensure correct operation, resulting in more energy consumption [14]. Among these sources, PVT variations have become a significant issue as technology has advanced, switching the design paradigm from deterministic to statistical [15]. As timing uncertainty has increased, the guard band required to ensure correct operation has increased as well. PVT variations are further compounded in the sub-threshold voltage region, increasing this guard band even more. Several earlier studies reported that the timing uncertainty could be more than 200 times in the sub-threshold region due to PVT variations [16]. To circumvent the timing uncertainty due to PVT variations while reducing the supply voltage further, an asynchronous design style was proposed [17].

In addition to the abovementioned techniques, many ultra-low power techniques have been proposed in the past, but they can mainly be categorized into three groups: 1) reducing the load capacitance, $C$, 2) scaling the supply voltage, $V_{DD}$ (e.g. dynamic voltage scaling (DVS) and multiple voltage domains), and 3) reducing the activity factor, $\alpha$ (i.e. duty-cycling). This is because the power consumption can be represented by

$$P = C V_{DD}^2 f \alpha \tag{1.1}$$

where

- C : load capacitance
- $V_{DD}$  : the supply voltage
- f : operating frequency
- $\alpha$  : activity factor.

Thus, any method of reducing these factors can be regarded as a low-power circuit technique. First, the load capacitance can be scaled by transistor sizing, changing the P/N ratio, shortening routes, and using different circuit structures. However, this is mostly technology dependent, and its effect is smaller than that of the other methods. Second, scaling the supply voltage, which mainly applies to digital circuits, comes from the idea that power is the product of voltage (V) and current (I). When the supply voltage is scaled, the power consumption is reduced quadratically, since the current is reduced linearly as well. Thus, this method is very effective. However, the reduced current increases the propagation delay, so many previously proposed techniques deal with the change in signal timing that results from the scaling. Lastly, reducing the activity factor (i.e. duty-cycling) is a system-level low-power technique. The main idea of duty-cycling is temporal power-gating (or clock-gating): when an on-chip block is not needed, it is power-gated (or clock-gated). This technique is effective in a system where one block accounts for the majority of the system's power consumption and only needs to operate for a short time. In most WSN systems, the RF transmitter is such a block. Since its current is closely related to its transmission range and signal integrity, which are critical to a wireless system, scaling $V_{DD}$ is undesirable. Therefore, duty-cycling is a good way to reduce the power that the RF transmitter contributes to a WSN sensing node. Other techniques, including data compression and feature extraction, have been proposed to reduce the power consumption further. These techniques also fall into the category of reducing the activity factor.
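The relative effect of the knobs in Equation 1.1 can be illustrated with a short numerical sketch. The function name and the operating point below (10 pF, 1 MHz, 10% activity, 1.2 V) are hypothetical values chosen only for illustration; they are not measurements from this work.

```python
def dynamic_power(c_load, v_dd, freq, activity):
    """Dynamic switching power following Equation 1.1: P = C * V_DD^2 * f * alpha."""
    return c_load * v_dd ** 2 * freq * activity


# Hypothetical operating point (illustrative values, not taken from the thesis).
C_LOAD = 10e-12   # 10 pF of switched capacitance
FREQ = 1e6        # 1 MHz operating frequency
ALPHA = 0.1       # 10% activity factor

p_nominal = dynamic_power(C_LOAD, 1.2, FREQ, ALPHA)
p_half_vdd = dynamic_power(C_LOAD, 0.6, FREQ, ALPHA)        # supply voltage halved
p_half_alpha = dynamic_power(C_LOAD, 1.2, FREQ, ALPHA / 2)  # activity factor halved

print(f"nominal power:      {p_nominal * 1e6:.3f} uW")
print(f"V_DD halved:        {p_half_vdd / p_nominal:.2f} x nominal")
print(f"activity halved:    {p_half_alpha / p_nominal:.2f} x nominal")
```

Under these assumptions, halving $V_{DD}$ cuts the dynamic power to a quarter, whereas halving $\alpha$ only halves it, which is the quadratic versus linear behavior described above.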

#### 1.3 Contributions

In this work, we adopted the low-power circuit techniques mentioned above to implement an ultra-low power body sensor network (BSN) node SoC for ECG sensing applications, as discussed in Chapter II. The proposed chip senses an ECG signal, digitally filters it, packetizes it, and sends it through an RF transmitter using On-Off Keying (OOK) modulation. The SoC is composed of an AFE, a DSP unit, a PMU block, a wake-up receiver, and an RF transmitter. Five different operation modes are wirelessly controlled by the proposed instruction packet (i.e. 10 octets). In standby mode, the SoC consumes only 912 nW. This is mainly due to the wake-up receiver (760 nW), whose sensitivity is less than -50 dBm. In full operation mode, the SoC transmits raw bio-signal data while consuming only 6.1 $\mu$W with -10 dBm output power. This is achieved through aggressive duty-cycling (0.2526%). Previously reported SoCs reduced their duty cycle by means of feature extraction, but medical providers may want to diagnose patients from raw bio-signals. Thus, our proposed SoC achieved aggressive duty-cycling by increasing the transmission rate (10 Mbps). To avoid extra processing after fabrication, a configurable RC network is embedded on chip, so this chip is a more complete system than the previous SoCs, except that it lacks an energy harvesting unit. We kept the supply voltage of the RF transmitter at its nominal value and instead aggressively duty-cycled its usage (i.e. 0.2526%), so it is power-gated most of the time. In addition, the transmitter has seven configurable transmission power levels, which saves further power by avoiding an over-powered transmitter in a given environment. A novel power amplifier (PA) structure enables these options. In addition, we scaled the supply voltage of the analog domain (i.e. 1 V) to save power while maintaining linearity. We further scaled the supply voltage of the digital domain (i.e. 550 mV), since most digital blocks remain active and scaling reduces their power consumption quadratically. Scaling the supply voltage down raises another design challenge. Since ECG sensing does not require a very high clock frequency, the timing constraints of the digital circuits are not too stringent even at low $V_{DD}$. However, scaling adversely affects the noise margin of sequential logic circuits. In particular, memory is an aggregate of sequential circuits, so ensuring its noise margin at low $V_{DD}$ is another design challenge.

As discussed in Chapter III, we proposed a novel 12T SRAM bit cell [18], which is more robust at near-threshold voltage than the state-of-the-art. The ideal theoretical read and write static noise margins (RSNM/WSNM) are discussed, and the proposed 12T SRAM bit cell reaches the theoretical WSNM limit. This is achieved by eliminating the feedback of the back-to-back inverters by means of a data-dependent supply cutoff during the write operation, which dramatically enlarges the write margin of the proposed bit cell. Many earlier works also attempted to cut off the supply, but many of them were not data dependent. Monte-Carlo (MC) simulation results show that the proposed 12T SRAM bit cell is more robust in static and dynamic noise margin than the conventional 6T and 8T SRAM bit cells as well as a 10T bit cell. The area of the proposed bit cell is 1.96 times and 1.74 times greater than that of the 6T and 8T bit cells, respectively. Analytical models of the WSNM of the 12T bit cell in the super-threshold and sub-threshold regions are also proposed.

To save the power consumption of the sensing node further, more aggressive duty-cycling can be considered. Instead of transmitting the sensed raw data, the node can digitally process the sensed signals on-chip for diagnosis and transmit only an alarm signal when an abnormality is detected. This is advantageous in terms of power saving, but it requires a very accurate diagnosis algorithm. Most diagnosis algorithms are deterministic, so they are vulnerable to noise, and an algorithm that is resilient to such noise is required. One possible solution is machine learning. One advantage of a machine learning algorithm is that, once the learning network has learned noise patterns, a sensed signal distorted by noise does not affect the diagnosis result. Among many learning algorithms, we adopted the self-organizing map (SOM) [19] since its mapping procedure and its topological property are very similar to real diagnosis. Although many previous works attempted ECG-based diagnosis with the SOM, we further simplified the algorithm to make it more hardware friendly, as discussed in Chapter IV. Whereas prior works attempted to optimize the SOM algorithm for ECG-based diagnosis, we proposed a hardware architecture and implemented it. The hardware consists of a pre-processing block and a SOM block. It detects an R-peak, reconstructs the QRS complex around it, and clusters the complex by calculating the Euclidean distance between the complex and the weight vector of each cell in the SOM network (i.e. $5 \times 5$ cells). In the operation mode, the cluster ID corresponding to the minimum Euclidean distance is provided, while in the learning mode the tagged weight vectors are updated. The proposed SoC occupies $1,735 \times 1,020 \ \mu m^2$ in a 65 nm LP CMOS process and consumes 5.853 mW at $V_{DD}$ = 1.2 V.
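The clustering step described above (minimum Euclidean distance over the $5 \times 5$ cells) can be sketched in software as follows. This is only a behavioral illustration: the vector length, the learning rate, the random initial weights, and the winner-only update rule are assumptions made for the sketch, and it is not the simplified hardware algorithm of Chapter IV.

```python
import math
import random

GRID_CELLS = 5 * 5   # 5 x 5 SOM cells, as stated in the text
VECTOR_LEN = 16      # assumed number of samples per QRS complex (hypothetical)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(qrs, weights):
    """Operation mode: return (cluster_id, distance) of the closest cell."""
    distances = [euclidean_distance(qrs, w) for w in weights]
    cluster_id = min(range(len(weights)), key=lambda i: distances[i])
    return cluster_id, distances[cluster_id]


def learn(qrs, weights, rate=0.1):
    """Learning mode (assumed rule): pull the winning cell toward the input.

    A full SOM also updates the winner's neighbours; that is omitted here.
    """
    cluster_id, _ = classify(qrs, weights)
    weights[cluster_id] = [w + rate * (x - w)
                           for w, x in zip(weights[cluster_id], qrs)]
    return cluster_id


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = [[rng.uniform(-1.0, 1.0) for _ in range(VECTOR_LEN)]
           for _ in range(GRID_CELLS)]
qrs = [rng.uniform(-1.0, 1.0) for _ in range(VECTOR_LEN)]

before_id, before_dist = classify(qrs, weights)
learn(qrs, weights)
after_id, after_dist = classify(qrs, weights)
print(before_id, after_id, after_dist < before_dist)
```

After one learning step the same cell still wins and its distance to the input shrinks, which is the behavior that lets repeated training pull cells toward recurring QRS shapes.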

#### 1.4 Thesis Organization

The remainder of this thesis is organized as follows. Chapter II discusses how to design an ultra-low power body sensor network (BSN) node SoC from a low-power perspective. We analytically show that duty-cycling is a key factor in ultra-low power design. In addition, the system architecture of the proposed BSN SoC is discussed, and the sub-blocks of the system are described. Chapter III discusses an SRAM bit cell design for ultra-low power SoCs. The ideal limit of SRAM bit cell design is briefly discussed, and then the proposed bit cell is presented. Simulation results are presented in comparison with other bit cells (i.e. 6T, 8T, and 10T). Chapter IV introduces an SOM algorithm for clustering QRS complexes and its hardware architecture. Lastly, Chapter V concludes this work.

## CHAPTER II

# Ultra Low-Power Body Sensor Node Chip Design for Electrocardiogram Sensing Applications

#### 2.1 Introduction

Wireless sensor networks (WSNs) have been extensively discussed and studied in recent years since they have many potential applications, such as habitat monitoring [20], location tracking [21], structural health monitoring [22], area monitoring [23], industrial monitoring [24], and greenhouse monitoring [25]. Among these applications, the body sensor network (BSN) has attracted scholarly attention [26][27][28][29][30]. This is because chronic diseases, such as heart diseases, hypertension, neuro-degenerative diseases, and senile dementia, are responsible for most deaths in modern society [31]. These chronic diseases require ceaseless monitoring for prevention since they tend to be asymptomatic and sporadic. Such diseases are often preventable when detected in their early stages [32][33][34]. In addition, people tend to live longer due to medical advances, which causes growth in the elderly population [35]. The aforementioned chronic diseases are more likely to develop with advancing age. Thus, more access to health monitoring will be required as the elderly become a larger part of the population. Moreover, most acute diseases require postoperative monitoring as well as medical intervention after surgery [36]. However, today's conventional monitoring systems demand that patients stay in the hospital for an extended period, which tends to lower their quality of life. Medical providers need a way to monitor elderly patients in order to provide early intervention for chronic disease. At the same time, patients want a good quality of life. To satisfy both groups, a continuous and unobtrusive health monitoring system is required. To illustrate such a monitoring system, Figure 2.1 shows how many ultra-low power bio-sensor nodes are wirelessly linked to a watch-shaped BSN coordinator. Each node can prolong its lifetime by harvesting energy or by applying ultra-low power techniques. Each node monitors bio-signals and wirelessly transmits them to the BSN coordinator, which communicates with a gateway so that these data can be delivered to physicians or to an emergency system through the Internet. Medical providers can continuously monitor their patients without disrupting their lives and can even retrieve the history of vital signs.

To achieve such a system, each sensing node should consume ultra-low power so that it lasts a significant amount of time on the limited energy supplied by a battery. Also, each node should harvest energy for its operation. Many energy harvesting techniques have been proposed, such as photovoltaics [37], temperature gradients [38][39], human power [40], wind/air flow [41], vibrations [42][43], and even nuclear microbatteries [44]. Among those techniques, temperature gradients have shown promise for future commercialization [38][39]. Although these energy harvesting techniques are promising for BSNs, the development of a BSN node cannot be complete without ultra-low power techniques, since the node should consume less energy than it harvests.

The two most common ways to achieve ultra-low power are (1) Dynamic Voltage and Frequency Scaling (DVFS) and (2) duty-cycling, by which the system effectively switches off unused power-hungry units. In [45], an ECG sensing SoC with a 3-channel analog front end (AFE) was proposed, which uses clock-gating as well as duty-cycling. However, an RF transceiver and a power management unit (PMU) were not included. In addition, no voltage reduction technique was adopted. In [46], an EEG seizure detection SoC was proposed. The SoC is integrated with a commercial RF transmitter, but this is not a power-optimized method since the specifications of the RF transmitter are not determined for the specific application. In addition, it does not include a PMU. Another BSN system is proposed in [47], where an integrated chip is wire-bonded with solar cells and a thin-film battery to measure intraocular pressure. The system shows how energy harvesting can be deployed effectively by managing the harvested energy to prolong the lifetime of the system. However, the RF transceiver was not integrated into the system. A complete system including energy harvesting and even an RF transceiver was proposed in [48]. Duty-cycling was achieved by transmitting extracted features rather than raw bio-signals, so that the packet size was dramatically reduced. The reduction of data lessened the power budget of the system so that it could operate successfully with minimal harvested energy. However, the RC network for RF communication is off-chip, requiring an extra process (e.g. wire bonding) to connect the SoC to another chip. This changes the matching parameters, so the RF communication may not perform as designed. Another BSN SoC consisting of an AFE and a digital back-end was proposed in [49]. The system consumes only a few tens of nW, but the PMU and the RF transceiver were not integrated. Since the RF transceiver is the most power-hungry unit, a means of saving power in the RF transceiver needs to be applied. The SoC demonstrates how the system power can be reduced by means of feature extraction or data reduction. However, transmitting a raw bio-signal is indispensable for physicians to diagnose a patient as well as to track their medical history. As discussed in Section 2.2.1, the duty cycle is determined by the size of the data and the transmission rate. Hence, aggressive duty-cycling can be achieved either by reducing the size of the data or by increasing the transmission rate. In this chapter, an ultra-low power BSN SoC is presented that uses a high transmission rate (i.e. 10 Mbps) rather than data reduction, so that raw bio-signal data are preserved.

Figure 2.1: A body sensor network (BSN) demonstrating a variety of biosensors

The remainder of this chapter is organized as follows: Section 2.2 describes a principle to design a system constrained within a specific energy budget and its proposed system. Section 2.3 presents the proposed frame format for RF transmission. Section 2.4 describes the sub-block design of the proposed system. Section 2.5 presents simulation results of the proposed BSN system. Section 2.6 draws conclusions.

#### 2.2 System Overview

The block diagram of the proposed BSN system on chip (SoC) is shown in Figure 2.2. The BSN system wirelessly transmits data and receives instructions that switch the system to a different operation mode. When sensing, the BSN system amplifies a sensed ultra-low-voltage signal via a low noise amplifier (LNA) and a programmable gain amplifier (PGA). The amplified signal is sampled and converted into a digital signal by a 10-bit Successive Approximation Register (SAR) analog-to-digital converter (ADC). This digital signal is processed in a digital signal processing (DSP) unit according to the application (e.g. ECG, EMG, or EEG). The processed data are stored in a memory block. The accumulated data are sent to a packetizer that parcels them for transmission and then sends the data packet to an RF transmitter, which modulates and transmits it. In the proposed system, On-Off Keying (OOK) modulation was adopted for energy efficiency, since with OOK the RF transmitter does not have to transmit anything for logic '0' data. When a user does not require sensing from the BSN system, the system power-gates unused blocks. On the other hand, another user might need sensed data but not digitally processed data. In any case, the system should operate according to the operation protocol given by the user. This direction is sent to the system wirelessly via an instruction packet that triggers the main controller of the system to power-gate unused blocks.

Figure 2.2: The block diagram of the proposed BSN SoC
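The energy argument for OOK mentioned in this section can be illustrated with an idealized model in which the carrier is on for a logic '1' and off for a logic '0'. The function names, the number of samples per bit, and the example bit pattern are hypothetical, and the model ignores start-up time and leakage of a real transmitter.

```python
def ook_envelope(bits, samples_per_bit=4):
    """Map bits to an on/off carrier envelope: 1 -> carrier on, 0 -> carrier off."""
    envelope = []
    for bit in bits:
        envelope.extend([1 if bit else 0] * samples_per_bit)
    return envelope


def carrier_on_fraction(bits):
    """Fraction of the packet time during which the carrier is radiated."""
    return sum(bits) / len(bits)


payload = [1, 0, 1, 1, 0, 0, 1, 0]   # arbitrary example bits
print(ook_envelope(payload, samples_per_bit=2))
print(f"carrier on for {carrier_on_fraction(payload):.0%} of the packet")
```

In this idealized view the radiated energy of a packet scales with the number of ones it contains, which is why zeros are "free" under OOK.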

#### 2.2.1 System Design under Limited Power Budget

It is worthwhile to estimate the overall system power consumption by analyzing the power breakdown before determining the specifications of the sub-blocks that make up the entire system. Assume there are three power domains: analog, digital, and RF analog. The analog domain consists of an AFE and an ADC. The digital domain contains a DSP block, a memory, a main controller, a wake-up receiver, and a depacketizer. The RF analog domain has two blocks: an RF transmitter and a packetizer. Depending on the design, the wake-up receiver, the packetizer, and the depacketizer can be categorized in the RF analog domain. In addition, a PMU can be regarded as a power supplier to each domain. In this analysis, it is assumed that the PMU supplies only the analog and digital domains, since the supply voltage of the RF transmitter affects its communication range. The power consumption of an analog block can generally be represented as follows:

$$P_{analog} = I_{analog} \cdot V_{analog}. \tag{2.1}$$

In most analog circuits, a current source is used for biasing, so the current drawn by the analog circuits can be regarded as constant. On the other hand, the power consumption of a digital block can be represented as follows:

$$P_{digital} = C_L \cdot V_{digital}^2. \tag{2.2}$$

In CMOS logic gates, a rail-to-rail swing is generally used, so power is consumed only when charging or discharging a load capacitor if the short-circuit current is neglected. When all the blocks are taken into consideration, the total power consumption can be represented as follows:

$$P_{total} = P_{analog} + P_{digital} + P_{RF} + P_{PMU}. \tag{2.3}$$

The power consumption of the entire system can be represented by:

$$\begin{aligned}
P_{total} ={}& (2 - \eta_{analog}) \left\{ I_{AFE} V_{analog} + (2^N - 1) C_{unit} V_{analog}^2 f_{sample} \right\} \\
&+ (2 - \eta_{digital}) \left\{ \left( (C_{DSP} + C_{write} + C_{read}) f_{sample} + C_{DEPAC} f_{RX} + C_{ctrl} \frac{f_{TX}}{N} \right) V_{digital}^2 + I_{RX} V_{digital} \right\} \\
&+ (C_{PAC} V_{TX}^2 + I_{TX,active} V_{TX}) N f_{sample} + P_{TX,static}/T.
\end{aligned} \tag{2.4}$$

where

- $\eta$ : the efficiency of PMU
- N: the number of data bits
- $f_{sample}$  : sampling rate
- $f_{RX}$  : clock frequency of RX
- $f_{TX}$  : clock frequency of TX
- T : RF transmission period

According to the formula, the total power consumption decreases when the number of bits, the supply voltage, or the sampling rate is reduced. Note that duty-cycling only helps to reduce the static power of the RF transmitter; the active power is determined by the sampling rate, not by the duty cycle, $\tau$. Intuitively, this agrees with the duty-cycling idea that the RF transmitter is switched on only when there are sampled data to send: the RF blocks are switched off after each transmission is complete. With further duty-cycling (i.e. increasing $T$), the transmission time, $t_{trans}$, also increases because more data have to be transmitted, so the active power stays the same. However, the share of the static power is reduced, at the cost of a larger memory. The duty cycle, $\tau$, can be represented as:

$$\tau = \frac{t_{trans}}{T} = \frac{N \cdot f_{sample} \cdot T/f_{TX}}{T} = N \cdot \frac{f_{sample}}{f_{TX}}. \tag{2.5}$$

Interestingly, the duty cycle depends only on $N$, $f_{sample}$, and $f_{TX}$. Since the number of data bits and the sampling rate are application specific, $f_{TX}$ is the only remaining factor that determines the duty cycle in a specific application. Thus, $f_{TX}$ should be chosen as high as possible in order to save RF power. Alternatively, aggressive duty-cycling is achievable by trading off the digitizing resolution, $N$, while keeping a relatively low carrier frequency. In this work, we adopted a high carrier frequency (i.e. 2.4 GHz) in order to keep a high resolution (i.e. 10 bits). Note that this approach is for designing an SoC that transmits raw data. Previous works did not transmit raw data. Rather, signals were processed on-chip (i.e. on-chip diagnosis), and only the necessary information (e.g. an abnormality alarm) was transmitted. This is a good approach for saving system power, but it does not help medical providers analyze the causes of abnormalities. Thus, we believe transmitting raw data is necessary unless the diagnosis algorithm perfectly detects every abnormality in the ECG.
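Equation 2.5 can be evaluated directly for the numbers used in this chapter. The sketch below assumes that $f_{TX}$ corresponds to the 10 Mbps transmission rate and that only the 10-bit samples are sent; the 250 kbps link is a hypothetical slower alternative included only for comparison.

```python
def duty_cycle(n_bits, f_sample_hz, f_tx_bps):
    """Duty cycle from Equation 2.5: tau = N * f_sample / f_TX."""
    return n_bits * f_sample_hz / f_tx_bps


N_BITS = 10          # ADC resolution used in this work
F_SAMPLE_HZ = 2.4e3  # sampling rate used in this work

tau_fast = duty_cycle(N_BITS, F_SAMPLE_HZ, 10e6)   # 10 Mbps, as in this work
tau_slow = duty_cycle(N_BITS, F_SAMPLE_HZ, 250e3)  # hypothetical 250 kbps link

print(f"10 Mbps:  tau = {tau_fast:.4%}")
print(f"250 kbps: tau = {tau_slow:.4%}")
```

This payload-only estimate ignores packet framing, so it is not expected to reproduce exactly the 0.2526% reported for the implemented chip; it only shows how strongly the duty cycle depends on the transmission rate.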

#### 2.2.2 Sub-block Specifications

The proposed SoC is designed for bio-signal sensing applications, which include ECG, EMG, EEG, neural spike, neural local field potential (LFP), EOG, and ECoG, so the specification of each sub-block is determined by considering those applications.

#### 2.2.2.1 Analog front-end

The purpose of the AFE block is to amplify the sensed signal as well as to suppress noise. Since the amplitude of bio-signals ranges from $\pm 0.5$ mV to $\pm 2$ mV and the supply voltage of the AFE block is 1 V, the gain of the AFE block should not be greater than 54 dB in order to prevent saturation. In addition, the gain should be adaptive to the input voltage range. The bandwidth of the signal is from 1 Hz to 1 kHz, which covers the spectrum of most bio-signals.
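The 54 dB bound can be checked with a simplified calculation. The sketch assumes that the limit is the ratio of the supply voltage to the largest input amplitude, which is a simplification introduced here and ignores output headroom and common-mode level; the function name is hypothetical.

```python
import math


def max_gain_db(v_supply, v_in_amplitude):
    """Largest voltage gain (in dB) before the amplified amplitude reaches the supply."""
    return 20.0 * math.log10(v_supply / v_in_amplitude)


V_SUPPLY = 1.0   # AFE supply voltage in volts, from the text

gain_large_input = max_gain_db(V_SUPPLY, 2e-3)    # 2 mV input amplitude
gain_small_input = max_gain_db(V_SUPPLY, 0.5e-3)  # 0.5 mV input amplitude

print(f"2 mV input:   {gain_large_input:.2f} dB")
print(f"0.5 mV input: {gain_small_input:.2f} dB")
```

Under this assumption the largest input sets the limit at about 54 dB, while smaller inputs leave room for more gain, which motivates an adaptive gain.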

#### 2.2.2.2 Analog-to-Digital converter

The ADC determines the resolution of the digital signal, and it should resolve the smallest meaningful value of the analog input signal. In the proposed SoC, the signal is processed in 10 bits. In addition, the sampling rate should be greater than the Nyquist rate of the input signal, and it is also related to the conversion energy of the ADC. Thus, the sampling rate should not be much greater than the Nyquist rate in order to achieve ultra-low power consumption, so 2.4 kHz is set as the sampling frequency.

#### 2.2.2.3 DSP block

In the proposed SoC, an infinite impulse response (IIR) digital filter is adopted since it is more power efficient and more flexible than an analog filter. In addition, an IIR filter requires fewer taps than a finite impulse response (FIR) filter, which brings about less area overhead as well as low power consumption.

#### 2.2.2.4 Memory block

The main specification of the memory block is to retain data reliably with ultra-low power consumption. To achieve this objective, an 8T SRAM bitcell is adopted in the memory block, since it is known to be reliable in the sub-threshold region [50]. In addition, the memory block should have enough capacity to keep the sensed data until they are transmitted. As discussed in Section 2.3, a data packet contains 250 octets of data, which is 200 words (i.e. 2 kb). Thus, when the data stored in each bank are transmitted as one packet, a bank should contain 200 words by 10 bits. Furthermore, the ADC sampling rate is 2.4 kHz, so 24 kb of data are stored per second. Since each bank can contain 2 kb of data, 12 banks are required to store all the data sensed over one second. Moreover, it takes time to packetize the stored data, so a spare bank is required to prevent the previous data from being overwritten by new data before they are sent. One spare bank is enough to buffer the memory block against overwriting, since transmission is complete before all the words in the spare bank are written. In order to send 12 packets, 24,960 bits of data should be transmitted, as in Equation 2.6.

$$12 \, packets \times 260 \, \frac{octets}{packet} \times 8 \, \frac{bits}{octet} = 24,960 \, bits. \tag{2.6}$$

Since the transmission rate is 10 Mbps, it takes 2.496 ms to transmit 12 packets. During 2.496 ms, 5.9904 sampling cycles pass (i.e. $2.496\,\text{ms} \times 2.4\,\text{kHz} = 5.9904$). Thus, the required buffer space is 6 extra words, which is less than the bank size (i.e. 200 words). In conclusion, the memory contains 13 banks, each of which consists of 200 words by 10 bits. The performance requirement of the memory block is 2.4 kHz for writing, given the sampling rate of 2.4 kHz, and 1 MHz for reading, given the transmission rate of 10 Mbps.
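The memory sizing above can be reproduced step by step. The sketch uses only the figures quoted in this section (10-bit words, 2.4 kHz sampling, 250 octets of data per packet, 260 octets per transmitted packet as in Equation 2.6, and 10 Mbps); the variable names are illustrative.

```python
import math

WORD_BITS = 10            # ADC word length
SAMPLE_RATE_HZ = 2400     # ADC sampling rate
DATA_OCTETS = 250         # data octets carried per packet
PACKET_OCTETS = 260       # octets per transmitted packet (Equation 2.6)
TX_RATE_BPS = 10_000_000  # transmission rate

words_per_bank = DATA_OCTETS * 8 // WORD_BITS            # words in one bank
bits_per_second = SAMPLE_RATE_HZ * WORD_BITS             # sensed data per second
data_banks = math.ceil(SAMPLE_RATE_HZ / words_per_bank)  # banks for one second of data

tx_bits = data_banks * PACKET_OCTETS * 8                 # bits sent for all packets
tx_time_s = tx_bits / TX_RATE_BPS                        # time to send them
samples_during_tx = tx_time_s * SAMPLE_RATE_HZ           # new samples arriving meanwhile
spare_words = math.ceil(samples_during_tx)               # words the spare bank must hold
total_banks = data_banks + 1                             # one spare bank is enough

print(words_per_bank, bits_per_second, data_banks)
print(tx_bits, f"{tx_time_s * 1e3:.3f} ms", f"{samples_during_tx:.4f}")
print(spare_words, total_banks)
```

Because the spare bank only has to absorb a handful of words during transmission, a single extra 200-word bank leaves a wide margin.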

#### 2.2.2.5 Wake-up receiver

Since the proposed SoC is idle during the standby mode for power saving, a wake-up receiver is required to wake up the BSN node when a new instruction arrives. For most applications, the BSN node spends most of its time sleeping, so the wake-up receiver itself operates at a high duty cycle. Therefore, ultra-low power consumption of the wake-up receiver is critical to prolonging the battery life.

#### 2.2.2.6 RF transmitter

In order to send the collected data to the BSN coordinator, an RF transmitter is employed. The RF transmitter accounts for a large portion of the power breakdown of the entire system, so a high data rate is indispensable to keep the RF transmitter aggressively duty-cycled, so that its static power consumption can be dramatically reduced, as shown in Equation 2.5.

#### 2.2.2.7 Power Management Unit

To reduce the power of the digital blocks, a low supply voltage is required, since power consumption has a quadratic relationship with the supply voltage. Therefore, a DC-DC converter is used to convert the battery voltage to the desired low supply voltage for the digital circuits. In addition, a voltage regulator is needed to keep the supply voltage level stable, because the supply voltage from the battery may vary or be noisy. An unstable supply voltage has a great impact on sensitive analog units, such as the AFE.

#### 2.2.3 Operation Modes

The proposed BSN SoC operates in five different modes according to the demand of a user. Mode switching is wirelessly controlled by parsing the incoming instruction packet. The five operation modes are shown in Table 2.1. In every operation mode, the wake-up receiver and de-packetizer are always turned on, as they must monitor for an incoming instruction packet.

|                          | Sensing blocks<br>(Amps, ADC, Memory) | DSP (Filtering) | RF Transmitter |
|--------------------------|---------------------------------------|-----------------|----------------|
| Standby                  | ×                                     | ×               | ×              |
| Sensing                  | $\bigcirc$                            | ×               | ×              |
| RAW data<br>transmission | $\bigcirc$                            | ×               | $\bigcirc$     |
| DSP                      | $\bigcirc$                            | $\bigcirc$      | ×              |
| Full Operation           | $\bigcirc$                            | $\bigcirc$      | $\bigcirc$     |

Table 2.1: Operation Mode of the proposed BSN SoC
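Table 2.1 can be captured as a small lookup structure, for example in a behavioral model of the main controller. The enum, the block names, and the helper functions below are hypothetical names introduced for this sketch; only the on/off pattern is taken from Table 2.1 and the accompanying text.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "Standby"
    SENSING = "Sensing"
    RAW_TX = "RAW data transmission"
    DSP = "DSP"
    FULL = "Full Operation"


# Block groups that can be power-gated (columns of Table 2.1).
GATEABLE_BLOCKS = frozenset({"sensing", "dsp", "rf_tx"})
# Blocks that stay on in every mode to listen for instruction packets.
ALWAYS_ON = frozenset({"wakeup_rx", "depacketizer"})

ENABLED_BLOCKS = {
    Mode.STANDBY: frozenset(),
    Mode.SENSING: frozenset({"sensing"}),
    Mode.RAW_TX: frozenset({"sensing", "rf_tx"}),
    Mode.DSP: frozenset({"sensing", "dsp"}),
    Mode.FULL: frozenset({"sensing", "dsp", "rf_tx"}),
}


def powered_blocks(mode):
    """Blocks that receive power in the given mode."""
    return ALWAYS_ON | ENABLED_BLOCKS[mode]


def power_gated_blocks(mode):
    """Blocks that the main controller power-gates in the given mode."""
    return GATEABLE_BLOCKS - ENABLED_BLOCKS[mode]


for mode in Mode:
    print(f"{mode.value}: gated = {sorted(power_gated_blocks(mode))}")
```

Keeping the always-on blocks separate from the gateable ones mirrors the requirement that the wake-up receiver and de-packetizer remain active in every mode.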

#### 2.2.3.1 Standby mode

Standby mode is used when a user does not want to sense anything. For saving power, all sensing blocks, the digital signal processing (DSP) block, and the RF transmitter are power gated.

#### 2.2.3.2 Sensing mode

Sensing mode is used when a user wants to store signals but does not want to transmit the stored data, in order to save the power consumption of the SoC. This mode would be more useful when the storage capacity is large.

#### 2.2.3.3 RAW data transmission mode

RAW data transmission mode is the same as the sensing mode except that it allows RF transmission. Thus, the DSP block is the only one that is turned off. This mode would be useful when a user wants to check the sensed raw data.

#### 2.2.3.4 DSP mode

In DSP mode, all sensing and DSP blocks are turned on, while the RF transmitter is turned off. This mode would be useful when a user wants to monitor a patient and save the system power by suppressing RF transmission. Since the DSP block can diagnose the sensed signal, the SoC can transmit the digitally processed signal only when it detects an abnormal pattern no matter what the application.

#### 2.2.3.5 Full operation mode

In full operation mode, all blocks are in operation. The sensing blocks amplify the sensed signal and convert the analog signal to digital, and the DSP block processes the digitized signal and stores the processed signal in the memory block. The packetizer packetizes the stored data and delivers them to the RF transmitter, which transmits them wirelessly.

#### 2.3 Packet Frame Format

Processed data and instructions are wirelessly transferred by means of packets. IEEE Standard 802.15.4 is a widely used protocol standard for Low-Rate Wireless Personal Area Networks (LR-WPANs) [51]. Data or instructions are packetized by following the frame format specified in the standard. This standard, however, is devised to support flexible networks, which leads to an excessive frame length. This makes the protocol unattractive for ultra-low power applications. Thus, we modified the standard in order to optimize the protocol for the power consumption of the proposed system. Since the proposed SoC is wirelessly controlled through packets and also wirelessly transmits the processed data through packets, two optimized frame formats are proposed, one for reception and one for transmission.

#### 2.3.1 Instruction Frame Format

The operation of the proposed SoC is determined by parsing the reception frame, which contains an instruction that triggers the SoC to operate in a specific operation mode. In contrast with the conventional IEEE Standard 802.15.4, the proposed reception frame format does not include a frame length field in the physical (PHY) layer, since the frame length is known to the SoC. Instead, it consists of a synchronization header (SHR) and a PHY service data unit (PSDU), as shown in Figure 2.3. The total length of the instruction frame format is 10 octets.

Figure 2.3: The proposed instruction frame format

#### 2.3.1.1 SHR (Preamble & SFD fields)

A preamble field belongs to the SHR. In IEEE 802.15.4, it consists of 32 consecutive zeroes. However, OOK modulation is adopted in the system, so the SoC cannot recognize whether a zero is a void or a valid zero. Thus, a new combination of zeroes and ones is required, as shown in Table 2.2. For synchronization with the RF wake-up receiver, 26 bits of alternating ones and zeroes are provided, followed by 111000.

| Bits: | 0 | 1 | 2 | 3 | 4 | 5 | ... | 26 | 27 | 28 | 29 | 30 | 31 |
|-------|---|---|---|---|---|---|-----|----|----|----|----|----|----|
|       | 1 | 0 | 1 | 0 | 1 | 0 | ... | 1  | 1  | 1  | 0  | 0  | 0  |

Table 2.2: Preamble Field in PHY SHR

There is another field in the SHR called the start-of-frame delimiter (SFD) field. After synchronization with the preamble, the parser block recognizes this field as the start of the packet. The proposed frame format adopts the same SFD pattern as IEEE 802.15.4, as shown in Table 2.3.

| Bits:0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|--------|---|---|---|---|---|---|---|
| 1      | 1 | 1 | 0 | 0 | 1 | 0 | 1 |

Table 2.3: SFD Field in PHY SHR
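The SHR described by Tables 2.2 and 2.3 can be written out bit by bit. The following is a minimal sketch, assuming that the table columns list the bits in transmission order (bit 0 first); the names `PREAMBLE`, `SFD`, `SHR`, and `bits_to_str` are illustrative and do not come from the design itself.

```python
# Preamble (Table 2.2): 26 alternating bits starting with 1, followed by 111000.
PREAMBLE = [1, 0] * 13 + [1, 1, 1, 0, 0, 0]

# SFD (Table 2.3), listed in the assumed transmission order, bit 0 first.
SFD = [1, 1, 1, 0, 0, 1, 0, 1]

# The synchronization header is the preamble followed by the SFD.
SHR = PREAMBLE + SFD


def bits_to_str(bits):
    """Render a list of bits as a compact string for inspection."""
    return "".join(str(b) for b in bits)


if __name__ == "__main__":
    print("preamble:", bits_to_str(PREAMBLE))
    print("SFD:     ", bits_to_str(SFD))
    print("SHR length:", len(SHR) // 8, "octets")
```

Because the preamble never contains a long run of zeroes, an OOK receiver always sees carrier activity while it synchronizes, which is the motivation given above for replacing the all-zero preamble of the standard.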

#### 2.3.1.2 PSDU (MPDU field)

The PSDU carries the medium access control (MAC) sublayer frame, which consists of two parts: the MAC header (MHR) and the MAC footer (MFR). The MHR has three sub-fields: Transmission level & Filter Selection (TFS), Frame Control (FC), and Data Sequence Number (DSN), while the MFR has one sub-field: the frame check sequence (FCS). The TFS field specifies the power level at which the SoC transmits the sensed data and which type of IIR filter is applied for signal processing, as shown in Table 2.4.

| Bits: 0-2                                          | 3-4              | 5-7      |
|----------------------------------------------------|------------------|----------|
| Transmission Power Level (000: off, 111: maximum)  | Filter Selection | Reserved |

Table 2.4: Transmission Level & Filter Type Field

| Bits: 0-2  | 3             | 4            | 5-7            |
|------------|---------------|--------------|----------------|
| Frame type | Frame pending | Ack. request | Operation Mode |

Table 2.5: Frame Control Field

The FC field is shown in Table 2.5. The frame type (FT) field is identical to that of the IEEE 802.15.4 standard and describes the type of the packet being transmitted. The frame pending (FP) field notifies the recipient that the sender has more data to transmit. In the proposed instruction frame format, both the FT and FP fields are fixed since an instruction is completed within one instruction packet. The acknowledge request (AR) field is set to one when the sender requires the recipient to transmit an acknowledge packet confirming the reception of an instruction. The operation mode (OM) field specifies the operation mode in which the SoC operates; its values are listed in Table 2.6. The DSN field is assigned sequentially from 0 to 255 for each frame. When the sender requests an acknowledge packet, this field is used to match the acknowledge packet to the instruction packet. The FCS field functions as a cyclic redundancy check (CRC) and is calculated as specified in IEEE 802.15.4.

| Operation mode value b2b1b0 | Description  |
|-----------------------------|--------------|
| 000                         | Standby      |
| 001                         | Sensing      |
| 010                         | RAW data     |
| 011                         | Operation    |
| 100                         | Transmission |
| 101-111                     | Reserved     |

Table 2.6: Operation Mode Field

The FCS generator polynomial of degree 16 is shown in Equation 2.7:

$$G_{16}(x) = x^{16} + x^{12} + x^5 + 1. \tag{2.7}$$

An implementation of the equation is depicted in Figure 2.4.

Figure 2.4: A typical implementation of CRC-16 Generator Polynomial.
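To make the MHR layout and the FCS calculation concrete, the sketch below packs the TFS and FC octets from Tables 2.4 to 2.6 and computes a 16-bit CRC with the generator polynomial of Equation 2.7. Several points are assumptions rather than statements from the design: bit 0 of each field is treated as the least significant bit, the CRC register starts at zero and processes the least significant bit of each octet first (the usual IEEE 802.15.4 convention), and the field values in the example (frame type 3, filter selection 2) are hypothetical. The function names are illustrative only.

```python
def pack_tfs(tx_level, filter_sel):
    """TFS octet: power level in bits 0-2, filter selection in bits 3-4,
    bits 5-7 reserved (left at zero)."""
    assert 0 <= tx_level <= 7 and 0 <= filter_sel <= 3
    return tx_level | (filter_sel << 3)


def pack_fc(frame_type, frame_pending, ack_request, op_mode):
    """FC octet: FT in bits 0-2, FP in bit 3, AR in bit 4, OM in bits 5-7."""
    assert 0 <= frame_type <= 7 and 0 <= op_mode <= 7
    assert frame_pending in (0, 1) and ack_request in (0, 1)
    return frame_type | (frame_pending << 3) | (ack_request << 4) | (op_mode << 5)


def crc16(data):
    """Bit-serial CRC for G(x) = x^16 + x^12 + x^5 + 1.

    Assumes a zero initial register and LSB-first bit order, so the
    polynomial appears in its bit-reversed form 0x8408.
    """
    reg = 0x0000
    for octet in data:
        for i in range(8):
            bit = (octet >> i) & 1
            feedback = (reg ^ bit) & 1
            reg >>= 1
            if feedback:
                reg ^= 0x8408
    return reg


def build_instruction_psdu(tx_level, filter_sel, frame_type, ack_request,
                           op_mode, dsn):
    """Assemble MHR (TFS, FC, DSN) followed by the 2-octet FCS."""
    mhr = bytes([
        pack_tfs(tx_level, filter_sel),
        pack_fc(frame_type, 0, ack_request, op_mode),  # FP fixed at 0 here
        dsn & 0xFF,
    ])
    fcs = crc16(mhr)
    return mhr + bytes([fcs & 0xFF, fcs >> 8])  # low octet first (assumed)


if __name__ == "__main__":
    # Hypothetical instruction: maximum TX power, filter 2, sensing mode, ACK on.
    psdu = build_instruction_psdu(tx_level=7, filter_sel=2, frame_type=3,
                                  ack_request=1, op_mode=0b001, dsn=0)
    print(psdu.hex())
    # Running the CRC over MHR plus FCS leaves a zero register, which is the
    # property a receiver can use to validate the frame.
    print(crc16(psdu) == 0)
```

With these conventions, a receiver does not need to store the expected FCS separately: feeding the received FCS through the same shift register and checking for an all-zero state is equivalent to a bit-by-bit comparison.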

### 2.3.2 Data Frame Format

The data frame format is used to packetize the sensed data for RF transmission. Like the instruction frame format, the data frame format comprises an SHR and a PSDU; in addition, it includes a PHY header (PHR). The component fields differ from those of the instruction frame, as shown in Figure 2.5.

## 2.3.2.1 SHR (Preamble & SFD fields)

SHR field of the proposed data frame format is identical to the instruction frame format.



Figure 2.5: The proposed data frame format.

## 2.3.2.2 PHR (Frame length field)

The PHR contains a frame length field, which specifies how many octets follow this field. In the proposed SoC, it is set to 254.

# 2.3.2.3 PSDU (Frame control, DSN, Data payload, FCS fields)

The DSN and FCS fields are the same as in the instruction frame format. The FC field has the same layout as in the instruction frame format (Table 2.5), but its FT and FP sub-fields are used differently. The FT field indicates whether the packet is for data transmission or for acknowledgement. The FP field is set to one when there is a pending packet to be transmitted. The AR and OM fields are defined as in the instruction frame format. The data payload (DP) field contains either sensed RAW data or digitally processed data, depending on the operation mode of the SoC. To maximize duty-cycling, the length of the DP field is set to 250 octets so that the total length of the PSDU is 254 octets, which is within the range of the frame length field.
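The frame lengths quoted in this section can be checked by adding up the field widths. The sketch below does so under two assumptions that are not stated explicitly in the text: each of the TFS, FC, and DSN fields occupies one octet (consistent with Tables 2.4 and 2.5 and the 0 to 255 range of the DSN), and the PHR is one octet wide. The dictionary names are illustrative.

```python
# Field widths in octets. One-octet TFS/FC/DSN/PHR are assumptions (see above).
INSTRUCTION_FRAME = {
    "preamble": 4,  # 32 bits (Table 2.2)
    "SFD": 1,       # 8 bits (Table 2.3)
    "TFS": 1,
    "FC": 1,
    "DSN": 1,
    "FCS": 2,
}

DATA_PSDU = {
    "FC": 1,
    "DSN": 1,
    "DP": 250,      # data payload
    "FCS": 2,
}

DATA_FRAME = {"preamble": 4, "SFD": 1, "PHR": 1, **DATA_PSDU}

if __name__ == "__main__":
    print("instruction frame:", sum(INSTRUCTION_FRAME.values()), "octets")
    print("data PSDU:        ", sum(DATA_PSDU.values()), "octets")
    print("data frame:       ", sum(DATA_FRAME.values()), "octets")
```

The first two sums reproduce the 10-octet instruction frame and the 254-octet PSDU given in the text; the 260-octet total for a data frame follows only if the one-octet PHR assumption holds.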

# 2.4 Block Description

The proposed BSN SoC shown in Figure 2.2 consists of three power domains (PDs): analog power domain, digital power domain, and RF analog power domain.

The supply of the RF analog power domain is 1.2 V since the power amplifier in the RF transmitter requires a high output power for longer-distance transmission. The supply of the analog power domain is 1 V, which was determined by the linearity of the analog blocks: when the supply is aggressively lowered for power saving, linearity degrades. The digital domain is supplied with 550 mV since the performance requirements for the digital blocks are not tight (i.e. up to 10 MHz). As the supply is lowered, dynamic power is saved quadratically, but leakage becomes more significant. In addition, most digital components in the proposed SoC have power switches, so leakage current is dominant when they are power-gated. Therefore, 550 mV is a viable option for saving power while limiting leakage.

#### 2.4.1 Analog Front-end

The analog front-end (AFE) block consists of a low noise amplifier (LNA), a programmable gain amplifier (PGA), and an output driver, as shown in Figure 2.6. The LNA is the interface between the off-chip sensors and the BSN SoC. It provides excellent noise suppression and a moderate gain. The PGA provides further amplification with a programmable gain to increase the system's dynamic range. The output driver is the interface circuit between the AFE and the following SAR ADC. It has sufficient drive strength to drive a large track-and-hold (T/H) capacitor.

# 2.4.2 ADC

The SAR topology is widely used to save conversion energy since it requires only a single comparator. The SAR ADC consists of a bootstrap T/H switch [52], a 10-bit charge redistribution DAC [53], a dynamic comparator [54], a bootstrap driver, and a 10-bit SAR block. For better linearity, the capacitive array DAC is laid out as symmetrically as possible, as shown in Figure 2.6.





#### 2.4.3 DSP

| Filter   | Target bio-signal | Frequency response              |
|----------|-------------------|---------------------------------|
| Filter 1 | Neural spike      | $0.1~{\rm Hz} \sim 1 {\rm kHz}$ |
| Filter 2 | Neural LFP, ECoG  | $0.1~{\rm Hz}\sim 280~{\rm Hz}$ |
| Filter 3 | ECG, EEG          | $0.1~{\rm Hz}\sim 150~{\rm Hz}$ |
| Filter 4 | EMG, EOG          | $10~{\rm Hz}\sim 1~{\rm kHz}$   |

Table 2.7: IIR Filters Implemented in DSP Block

The DSP block processes the digitized signal according to the desired application. As shown in Table 2.7, four different digital IIR filters are embedded in this block for processing bio-signals such as ECG, EEG, EMG, EOG, ECoG, and neural spikes. In order to save energy and area, each multiplier is replaced with a MUX-and-shifter based structure as in [55]. The IIR filter has a standard structure of three cascaded second-order sections, yielding a sixth-order filter. As in [55], the proposed IIR filter structure can be programmed by the main controller by means of the MUX control signals. In the proposed SoC, the coefficients for the four filters are stored in ROM, so that when switching to a different filter, the corresponding coefficients are connected to the MUX control input ports.
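The idea of replacing a multiplier with shifters can be illustrated in software. The sketch below is not the structure of [55]; it only shows the general shift-and-add technique, in which a coefficient is approximated as a signed sum of powers of two so that each product reduces to a few shifted copies of the sample. The function names and the two example coefficients are hypothetical.

```python
def shift_add_multiply(x, terms):
    """Approximate x * c, where c = sum(sign * 2**-shift) over `terms`.

    `terms` is a list of (sign, shift) pairs with sign in {+1, -1}. Each term
    costs one right shift and one add or subtract, so no multiplier is needed.
    """
    acc = 0
    for sign, shift in terms:
        acc += sign * (x >> shift)
    return acc


def coefficient_value(terms):
    """Real-valued coefficient represented by a list of (sign, shift) terms."""
    return sum(sign * 2.0 ** -shift for sign, shift in terms)


# Hypothetical coefficients, chosen only for illustration.
COEFF_A = [(+1, 1), (+1, 2), (+1, 4)]   # 2^-1 + 2^-2 + 2^-4 = 0.8125
COEFF_B = [(+1, 0), (-1, 3)]            # 1 - 2^-3 = 0.875

if __name__ == "__main__":
    sample = 1024
    print(coefficient_value(COEFF_A), shift_add_multiply(sample, COEFF_A))
    print(coefficient_value(COEFF_B), shift_add_multiply(sample, COEFF_B))
```

In a hardware realization of this idea, the shift amounts would be selected by MUX control inputs, which matches the description above of coefficients stored in ROM and routed to the MUX control ports.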

#### 2.4.4 SRAM

The SRAM block stores either the processed signal or the raw digital signal depending on the operation mode, as described in Section 2.2. In order to save energy, near-threshold operation is adopted: the operating supply voltage is 550 mV, which reduces power consumption while limiting leakage. For near-threshold operation, the conventional 8T SRAM bitcell is adopted [56]. Since the SRAM must retain data until the transmission phase in any operation mode (i.e. for 1 second), a more aggressive reduction of the supply voltage would cause a great amount of leakage.

#### 2.4.5 Packetizer/Depacketizer/Main Controller

The packetizer block prepares data for transmission according to the transmission protocol that was modified and optimized from the IEEE 802.15.4 standard for energy efficiency, as proposed in Section 2.3. The depacketizer decodes the data received from the RF receiver and sends them to the main controller. The main controller receives the parsed instruction from the depacketizer and switches to the operation mode specified by the instruction. Based on the operation mode, the main controller power-gates some blocks while it turns on the others. Figure 2.7 shows how the packetizer and the depacketizer process the instruction packet and operate the proposed SoC.

As shown in Figure 2.7 (b), the depacketizer checks whether the incoming data stream matches the preamble and SFD. When it matches, the depacketizer stores the following 3 octets, which form the MHR. While these 3 octets are streamed in, the CRC generator calculates the FCS from them. Once the calculation is finished, the depacketizer compares the calculated CRC with the incoming FCS (2 octets) bit by bit. If the incoming data do not match the calculated CRC, the depacketizer dismisses the packet. If they match, the depacketizer executes the stored instruction: it instructs the main controller to operate in the specified operation mode and passes on which type of digital filter is to be used in that mode. Simultaneously, it signals the packetizer to prepare an ACK packet when an ACK is requested.

As defined by the operation mode, the main controller manages power gating for each functional block. It also controls the interfaces between blocks. When the AFE is on, it enables the ADC to convert the sensed signal to digital. Once this conversion is complete, the main controller issues control signals to the digital filter block and the memory block based on the operation mode. Either raw data or filtered data are stored in the memory. Until the memory is full, the controller keeps triggering data conversion, digital filtering, and writing of data into the memory block. When the memory is full, the main controller sends a transmission signal to the PMU, the packetizer, and the RF transmitter. It then continues to control data conversion, filtering, and writing of data into the spare bank.

Once the transmission control signal arrives at the packetizer, it begins to read data from the memory block bank by bank. Simultaneously, it checks whether the RF transmitter is available for data transmission, since it might not be available when an ACK packet has been requested through the instruction packet; in that case, the depacketizer is using the RF transmitter. When the RF transmitter is ready for use, the packetizer drives the power amplifier (PA) in the RF transmitter with the prepared packet. Packetizing and transmission are conducted simultaneously. First, the packetizer sends all header information, such as the SHR and PHR, to the PA. Then, it reads words from the memory at 1 MHz; each word contains 10 bits, and each bit is transferred at 10 MHz, so a transmission rate of 10 Mbps is achieved. After reading and sending all data in the 12 banks, the packetizer sends the FCS calculated by the CRC-16 generator, which completes the data transmission. Once all transmission is complete, the packetizer sends a disable signal to both the PMU and the RF transmitter.

Figure 2.7: The operation flowchart of (a) Packetizer and (b) Depacketizer.
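The depacketizer flow of Figure 2.7 (b) can be sketched as a search over a received bit stream. This is a behavioral illustration only, not the Verilog implementation: it assumes that bits arrive in the order listed in Tables 2.2 and 2.3, that the CRC register starts at zero, and that the FCS is sent with register bit 0 first. The function names are hypothetical.

```python
PREAMBLE = [1, 0] * 13 + [1, 1, 1, 0, 0, 0]
SFD = [1, 1, 1, 0, 0, 1, 0, 1]
SHR = PREAMBLE + SFD

MHR_BITS = 24   # TFS, FC, and DSN (3 octets)
FCS_BITS = 16   # 2 octets


def crc16_bits(bits):
    """Bit-serial CRC for x^16 + x^12 + x^5 + 1 over a list of bits.

    Returns the 16 register bits, bit 0 first (assumed transmission order).
    """
    reg = 0
    for b in bits:
        feedback = (reg ^ b) & 1
        reg >>= 1
        if feedback:
            reg ^= 0x8408
    return [(reg >> i) & 1 for i in range(16)]


def depacketize(stream):
    """Return the 24 MHR bits of the first frame found, or None.

    The stream is scanned for the SHR; the MHR that follows is accepted only
    if its computed CRC equals the received FCS, otherwise it is dismissed.
    """
    frame_bits = len(SHR) + MHR_BITS + FCS_BITS
    for start in range(len(stream) - frame_bits + 1):
        if stream[start:start + len(SHR)] != SHR:
            continue
        body = start + len(SHR)
        mhr = stream[body:body + MHR_BITS]
        fcs = stream[body + MHR_BITS:body + MHR_BITS + FCS_BITS]
        return mhr if crc16_bits(mhr) == fcs else None
    return None


if __name__ == "__main__":
    mhr = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 1, 0, 0] + [0] * 8
    frame = [0] * 7 + SHR + mhr + crc16_bits(mhr)   # 7 idle bits, then a frame
    print(depacketize(frame) == mhr)
    frame[7 + len(SHR) + 3] ^= 1                     # flip one MHR bit
    print(depacketize(frame))
```

The hardware performs the same comparison while the bits stream in, so it does not need to buffer the whole frame; the list slicing here is only a convenience of the software model.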

#### 2.4.6 RF Transceiver

The RF transceiver block transmits the prepared packet wirelessly and receives data from the user. When the transceiver is unused, it is powered off so that the power consumption of the BSN system is minimized. It consists of two sub-blocks: an RF transmitter and a wake-up receiver. The RF transmitter, as depicted in Figure 2.8, consists of a charge-pump based phase-locked loop (PLL) and a PA [57]. The PLL generates a clean carrier signal, which is directly modulated by the PA with OOK. When the PA is switched on and off, the load of the PLL changes, which shifts the carrier frequency. In order to prevent this fluctuation, a buffer is inserted between them. When receiving, an envelope detector and the comparator that follows it in the wake-up receiver demodulate the wake-up signal and deliver the demodulated data to the depacketizer. In order to save area, the RF transmitter and the wake-up receiver share a matching network. Thus, while one block uses the network, the other must wait until it is released, which is signaled by a flag. To compensate for mismatch, the matching network is reconfigurable.

Figure 2.8: RF Transceiver schematic.

## 2.4.7 PMU

For power-efficient operation, a PMU is employed to convert the 1.2 V battery supply into two voltages: 0.55 V for the digital blocks and 1 V for the analog blocks. The PMU consists of two parts: a 2:1 switched-capacitor DC-DC converter and a low-dropout (LDO) regulator. In order to minimize the energy consumption of the digital blocks while maintaining acceptable performance, the supply voltage for the digital power domain is chosen to be a near-threshold voltage, 0.55 V. For sensitive analog blocks, such as the AFE and the wake-up receiver, an LDO is used to isolate them from possible noise sources and variations of the battery supply. The voltages provided to each sub-block are summarized in Table 2.8.

| Voltage          | Supplied Block                                        |
|------------------|-------------------------------------------------------|
| 1.2V             | RF Transmitter, Packetizer, Analog biasing circuitry  |
| 1.0V             | AFE, ADC                                              |
| $0.55\mathrm{V}$ | Controller, DSP, SRAM, Depacketizer, Wake-up receiver |

Table 2.8: Voltages Supplied by PMU

# 2.5 Simulation Results

# 2.5.1 AFE

The simulation results of the AFE are shown in Table 2.9. As described in Section 2.2, the bandwidth of the AFE extends up to 1.2 kHz, so the sampling frequency can be 2.4 kHz, which is greater than the Nyquist rate of the target bandwidth (i.e. 1 kHz). Since the proposed BSN SoC targets most bio-signals, the gain of the AFE is variable according to which bio-signal is sensed. The power consumption of the AFE is 1.3 $\mu$W.

| Parameter            | Value                                   |
|----------------------|-----------------------------------------|
| Input referred noise | $4.5 \ \mu V_{rms}$                     |
| Bandwidth            | $0.005 \text{ Hz} \sim 1.2 \text{ kHz}$ |
| Power Consumption    | $1.3 \ \mu W$                           |
| Gain                 | $44 \sim 65 \text{ dB}$                 |
| Linearity            | $1\%$ THD @ input = 5 $mV_{pp}$         |
| PSRR                 | >52 dB                                  |
| CMRR                 | >52 dB                                  |

Table 2.9: Analog Front-End Summary

## 2.5.2 SAR ADC

| Parameter                | Value          |
|--------------------------|----------------|
| Topology / Resolution    | 10-bit SAR     |
| Sampling Clock Frequency | 2.4 kHz        |
| THD                      | -60 dB         |
| ENOB                     | 8.68           |
| SNDR                     | 54 dB          |
| SFDR                     | 60 dB          |
| Max. nonlinearity        | <1 mV (1 LSB)  |
| Power Consumption        | 221.8 nW       |

Table 2.10: SAR ADC Summary

The simulation results of the SAR ADC are shown in Table 2.10. The ENOB of the ADC is 8.68 bits, and it consumes 221.8 nW at the sampling frequency $f_s=2.4$ kHz. Thus, the ADC accounts for only a small portion of the total power consumption.
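The ENOB and SNDR entries in Table 2.10 are related by the standard expression ENOB = (SNDR - 1.76 dB) / 6.02 dB, and the sketch below checks that the two reported values agree. It also computes the LSB size of a 10-bit converter under the assumption of a 1 V full-scale range, which is not stated in the text and is taken here from the 1 V analog supply. The function names are illustrative.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (ideal-quantizer relation)."""
    return (sndr_db - 1.76) / 6.02


def lsb_volts(full_scale_v, bits):
    """Size of one LSB for a converter with the given full-scale range."""
    return full_scale_v / (1 << bits)


if __name__ == "__main__":
    print(round(enob_from_sndr(54.0), 2))       # 8.68, matching Table 2.10
    print(lsb_volts(1.0, 10) * 1e3, "mV")       # assumed 1 V full scale
```

Under the 1 V full-scale assumption, one LSB is slightly below 1 mV, which is consistent with the "<1 mV (1 LSB)" entry in the table.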

### 2.5.3 RF Transceiver

The simulation results of the RF transmitter are shown in Figure 2.9. When the 'Enable' signal pulse arrives from the controller at the packetizer, the packetizer enables the RF transmitter by toggling the '$TX_{PWR\_EN}$' signal. The toggling causes the VCO to lock at a certain tuning voltage (i.e. $V_{TUNE}$). Once it is locked, the transmitter asserts the 'RF\_Ready' signal to the packetizer. The packetizer sends the packetized data to the RF transmitter (i.e. PAC\_Data), and then the RF transmitter modulates the packetized signal (i.e. RF\_Out).

Figure 2.9: The waveform of the transmitter and the control signals.

| Transmitter                                      |                                                                           |
|--------------------------------------------------|---------------------------------------------------------------------------|
| Carrier Frequency                                | 2.46 GHz                                                                  |
| Output Power                                     | -10 dBm to 0 dBm                                                          |
| Maximum Power Consumption (Maximum Power Level)  | 3.12 mW when transmitting '1'; 1.93 mW when transmitting '0'              |
| Minimum Power Consumption (Minimum Power Level)  | 1.01 mW when transmitting '1'; 773.9 $\mu$W when transmitting '0'         |
| Minimum Power Consumption with Duty-Cycling      | 2.55 $\mu$W when transmitting '1'; 1.96 $\mu$W when transmitting '0'      |
| Start-up Time                                    | $<10 \ \mu s$                                                             |
| Data Rate                                        | 10 Mbps                                                                   |
| Modulation Scheme                                | On-Off Keying                                                             |
| Standby Power                                    | 100 nW                                                                    |
| Wake-up Receiver                                 |                                                                           |
| Sensitivity                                      | <-50 dBm                                                                  |
| Power Consumption                                | 760 nW                                                                    |

Table 2.11: RF Transceiver Summary

The summary of the RF transceiver is shown in Table 2.11. Since OOK modulation is adopted in the proposed SoC, the power consumption of the transmitter depends on the data transmitted. When the maximum of the eight output power levels is selected, the transmitter consumes 3.12 mW when transmitting '1'. At the minimum output power, it consumes 1.01 mW when transmitting '1'. With duty-cycling, it consumes 2.55 $\mu$W when transmitting '1' and 1.96 $\mu$W when transmitting '0'. The wake-up receiver consumes only 760 nW, and its sensitivity is below -50 dBm.
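The duty-cycled figures follow from multiplying the on-state power by the duty cycle. The sketch below reproduces them using the 0.2526% duty cycle listed in Table 2.14. The averaging over data bits assumes an equal share of ones and zeros, which is an assumption made here to relate the per-bit figures to the 0.89 to 2.53 mW range in Table 2.14, not a statement from the text. The function names are illustrative.

```python
DUTY_CYCLE = 0.2526 / 100   # from Table 2.14


def duty_cycled(p_on_w, duty=DUTY_CYCLE):
    """Average power of a block that draws p_on_w only while it is on."""
    return p_on_w * duty


def ook_average(p_one_w, p_zero_w, ones_fraction=0.5):
    """Average on-state power of an OOK transmitter for a given share of '1' bits."""
    return ones_fraction * p_one_w + (1.0 - ones_fraction) * p_zero_w


if __name__ == "__main__":
    # Minimum power level: 1.01 mW for '1' and 773.9 uW for '0'.
    print(duty_cycled(1.01e-3) * 1e6, "uW when transmitting '1'")
    print(duty_cycled(773.9e-6) * 1e6, "uW when transmitting '0'")
    # Balanced data at the minimum and maximum power levels.
    print(ook_average(1.01e-3, 773.9e-6) * 1e3, "mW")
    print(ook_average(3.12e-3, 1.93e-3) * 1e3, "mW")
```

The results agree with the tabulated values to within rounding of the last digit.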

#### 2.5.4 PMU

The simulation results of the PMU are shown in Table 2.12. The peak efficiency of the DC-DC converter is 88%. When the load blocks (i.e. the digital blocks) consume 10 nW to 3 $\mu$W, the overall efficiency is always greater than 70%. The maximum allowable load current of the LDO is 5 $\mu$A, which is sufficient for both the AFE and the ADC since together they consume only around 1.5 $\mu$W.

| Switched-Capacitor DC-DC Converter         |                                      |
|--------------------------------------------|--------------------------------------|
| Input Voltage                              | 1.2 V                                |
| Output Voltage                             | 0.55 V                               |
| $P_{out}$ with efficiency greater than 70% | $10~\mathrm{nW}\sim 3~\mu\mathrm{W}$ |
| Peak Efficiency                            | 88%                                  |
| LDO                                        |                                      |
| Input Voltage                              | 1.2 V                                |
| Output Voltage                             | 1 V                                  |
| $I_{load,max}$                             | $5 \ \mu A$                          |
| $I_Q$                                      | 100 nA                               |
| PSRR                                       | 45 dB (250 kHz)                      |

Table 2.12: Power Management Unit Summary

# 2.5.5 Other Blocks

| Packetizer              |                                                                   |
|-------------------------|-------------------------------------------------------------------|
| Start-up and reset time | $<30 \ \mu s$                                                     |
| Power Consumption       | 17.2 $\mu$W (without duty-cycling); 43.4 nW (with duty-cycling)   |
| Depacketizer            |                                                                   |
| Power Consumption       | 73.4 nW                                                           |
| DSP                     |                                                                   |
| Power Consumption       | 279 nW                                                            |

 Table 2.13: Digital Blocks Summary

The summary of the other blocks is shown in Table 2.13. The packetizer consumes 17.2 $\mu$W, but it is duty-cycled together with the RF transmitter. Since its start-up and reset take less than 30 $\mu$s, it is on for less than 2.526 ms in each cycle when duty-cycled. Therefore, it consumes 43.4 nW on average. The depacketizer consumes 73.4 nW, while the DSP consumes 279 nW.

The system power consumption varies with the operation mode since power-gating is applied to different blocks in each mode. When all the blocks operate, the total power consumption is 6.1 $\mu$W for the minimum RF power level. The power breakdown of each block is shown in Table 2.14. Lastly, the system power consumption in each operation mode is shown in Table 2.15. Note that the power consumption of the system in standby mode is only 912 nW.
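The 2.526 ms on-time and the 0.2526% duty cycle can be reproduced with a short calculation. The sketch below is one set of assumptions that happens to match the reported numbers; it is not taken from the text. It assumes that each of the 12 memory banks is sent as one 260-octet data frame (5-octet SHR, 1-octet PHR, 254-octet PSDU), that a single 30 $\mu$s start-up is paid per transmission burst, and that bursts repeat once per second. All constant and function names are illustrative.

```python
BIT_RATE_BPS = 10e6        # 10 Mbps data rate
FRAME_OCTETS = 260         # assumed: SHR (5) + PHR (1) + PSDU (254)
PACKETS_PER_BURST = 12     # assumed: one packet per memory bank
STARTUP_S = 30e-6          # upper bound on start-up and reset time
PERIOD_S = 1.0             # assumed: one burst per second


def on_time_s():
    """Time the packetizer and transmitter stay on during one burst."""
    airtime = PACKETS_PER_BURST * FRAME_OCTETS * 8 / BIT_RATE_BPS
    return airtime + STARTUP_S


def duty_cycle():
    """Fraction of each period spent transmitting, including start-up."""
    return on_time_s() / PERIOD_S


def average_power_w(p_on_w):
    """Average power of a block that is on only during the burst."""
    return p_on_w * duty_cycle()


if __name__ == "__main__":
    print(on_time_s() * 1e3, "ms on-time")
    print(duty_cycle() * 100, "% duty cycle")
    print(average_power_w(17.2e-6) * 1e9, "nW average packetizer power")
```

Under the same assumptions, 12 payloads of 250 octets carry 24,000 bits, which equals one second of 10-bit samples at 2.4 kHz.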

| Block                              | Power                       |
|------------------------------------|-----------------------------|
| Analog front-end                   | $1.3 \ \mu W$               |
| SAR ADC                            | $221.8~\mathrm{nW}$         |
| DSP                                | 279  nW                     |
| Packetizer (with duty-cycling)     | 43.4  nW                    |
| Depacketizer                       | 73.4  nW                    |
| SRAM                               | 539  nW                     |
| RF Transmitter                     | $0.89\sim 2.53~\mathrm{mW}$ |
| RF Transmitter (with duty-cycling) | $2.25\sim 6.39~\mu {\rm W}$ |
| RF Wake-up Receiver                | 760  nW                     |
| PMU Efficiency                     | 80%                         |
| Duty Cycle (including power-on)    | 0.2526%                     |
| Total Power                        | $6.1 \sim 10.2 \ \mu W$     |

Table 2.14: Power Breakdown of an Individual Block in the BSN System

| Mode  | Standby | Sensing      | RAW data TX                       | Digital Operation | Full Operation          |
|-------|---------|--------------|-----------------------------------|-------------------|-------------------------|
| Power | 912  nW | $3.38~\mu W$ | $5.68 \sim 9.82 \ \mu \mathrm{W}$ | $3.81 \ \mu W$    | $6.1 \sim 10.2 \ \mu W$ |

Table 2.15: Power Consumption of Each Operational Mode

# 2.6 Conclusions

The proposed BSN chip consumes only 6.1 $\mu$W even in the full operation mode owing to the aggressive duty cycle of 0.2526%. When the system power-gates all unused blocks in standby mode, it consumes only 912 nW. A comparison with the state of the art is shown in Table 2.16. Figure 2.10 shows the relation between the lifetime and the power consumption of a system for different types of batteries. The lifetime of the proposed SoC would be greater than a year when using an AAA alkaline battery at the maximum power level. In standby mode, the lifetime of the proposed SoC exceeds ten years with an AAA alkaline battery. Energy can also be harvested from the human body since it can be regarded as a heat source (i.e. with a thermoelectric generator). The power harvested in state-of-the-art work is greater than 20 $\mu$W [38][39]. Therefore, the proposed SoC can be operated perpetually by energy harvesting. As shown in Figure 2.11, the area of the proposed SoC is 1920×1920 $\mu m^2$, and it is designed in a 65 nm LP CMOS process (1P9M6X1Z1U).

Figure 2.10: Lifetime vs. power consumption for different types of batteries.

Figure 2.11: The proposed BSN SoC layout.
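The relation between lifetime and power consumption plotted in Figure 2.10 is, to first order, the stored energy divided by the average load power. The sketch below illustrates this with a battery energy of 1.5 Wh, which is a hypothetical round number for an AAA alkaline cell and is not taken from the text or from Figure 2.10. The estimate ignores self-discharge, converter losses, and capacity derating, all of which shorten the real lifetime at microwatt loads. The names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical stored energy for an AAA alkaline cell (assumption, not from the text).
ASSUMED_AAA_ENERGY_WH = 1.5


def lifetime_years(battery_energy_wh, load_power_w):
    """Ideal lifetime in years: stored energy divided by average load power."""
    return battery_energy_wh / load_power_w / HOURS_PER_YEAR


if __name__ == "__main__":
    for label, power_w in [("full operation, maximum power level", 10.2e-6),
                           ("full operation, minimum power level", 6.1e-6),
                           ("standby", 912e-9)]:
        years = lifetime_years(ASSUMED_AAA_ENERGY_WH, power_w)
        print(f"{label}: {years:.1f} years (ideal)")
```

Even with large derating for the effects ignored here, the ideal estimates leave wide margin over the "greater than a year" and "greater than ten years" figures stated above.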

# 2.7 Acknowledgements

This work could not be done without my colleagues, Nan Zheng, Yalcin Yilmaz, and Di Hu. Mr. Zheng designed AFE and RF blocks. Mr. Yilmaz helped coding

|                                   | This Work                                                     | [45]                                               | [46]                                                                 | [48]                                                                                      | [49]                                                             |
|-----------------------------------|---------------------------------------------------------------|----------------------------------------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| Sensors                           | ECG, EMG,<br>ECoG                                             | ECG                                                | EEG                                                                  | ECG, EMG, EEG                                                                             | ECG                                                              |
| Supply Voltage                    | $0.55 \mathrm{V}/1 \mathrm{V}/1.2 \mathrm{V}$                 | 1.2V                                               | 1V                                                                   | 30mV,-10dBm                                                                               | $0.4 \mathrm{V}/0.6 \mathrm{V}$                                  |
| Energy Harvesting                 | ×                                                             | ×                                                  | ×                                                                    | Thermal, RF                                                                               | ×                                                                |
| Supply Regulator                  | 0                                                             | ×                                                  | ×                                                                    | 0                                                                                         | ×                                                                |
| AFE                               | 1-channel                                                     | 3-channel                                          | 18-channel                                                           | 4-channel                                                                                 | 1-channel                                                        |
| On-chip Power<br>Management       | Power gat-<br>ing/clock gating                                | Clock gating                                       | ×                                                                    | Power gating/Clock<br>gating                                                              | ×                                                                |
| DSP                               | IIR, Packetizer,<br>Depacketizer                              | ASIC DSP (4×<br>SIMD), FIR, En-<br>cryption, DMA   | ASIC DSP                                                             | Programmable<br>FIR, AFib, MCU,<br>ENV DET, DMA,<br>Packetizer                            | Arrhythmia detec-<br>tion, R-R interval,<br>FFT, FIR             |
| Memory                            | <b>3.25kB</b> (0.55V)                                         | 42kB (1.2V)                                        | ×                                                                    | 5.5kB (0.3V-0.7V)                                                                         | 3.7 kB (0.4 V)                                                   |
| Digital Power                     | $935 \mathrm{nW}$                                             | $\tilde{1}2\mu W$                                  | $2.1\mu W$                                                           | $2.1\mu W$                                                                                | 45nW                                                             |
| TX (datarate)                     | 10  Mb/s                                                      | ×                                                  | ×                                                                    | 200  kb/s                                                                                 | ×                                                                |
| TX $P_{DC}$ (100% on)             | $774\mu W(0') TX)/1mW(1') TX)$                                | ×                                                  | ×                                                                    | $160\mu W$                                                                                | ×                                                                |
| TX $P_{OUT}$                      | -10 dBm 0 dBm                                                 | ×                                                  | ×                                                                    | -18.5 dBm                                                                                 | ×                                                                |
| Carrier Freq.                     | $2.4~\mathrm{GHz}$                                            | ×                                                  | ×                                                                    | $402/433 \mathrm{~MHz}$                                                                   | ×                                                                |
| On-chip LC Net-<br>work           | 0                                                             | ×                                                  | ×                                                                    | ×                                                                                         | ×                                                                |
| Raw Data Trans-<br>mission        | 0                                                             | ×                                                  | ×                                                                    | ×                                                                                         | ×                                                                |
| Total Power                       | $6.1 \mu { m W}$                                              | $31.1\mu$ W                                        | $77.1\mu$ W                                                          | $19\mu W$                                                                                 | 64 nW                                                            |
| Note on Total<br>Power (includes) | AFE, 10-bit<br>ADC, DSP(IIR),<br>TX-duty cycled<br>at 0.2526% | AFE, 12-bit ADC,<br>DSP(heart beat de-<br>tection) | 18-channel AFE,<br>12-bit ADC, and<br>DSP(EEG feature<br>extraction) | 1-channel AFE, 8-<br>bit ADC, DSP(R-<br>R extraction), and<br>TX duty-cycled at<br>0.013% | 1-channel AFE,<br>8-bit ADC, and<br>DSP(arrhythmia<br>detection) |
| Technology                        | 65nm                                                          | 180nm                                              | $180\mathrm{nm}$                                                     | $130\mathrm{nm}$                                                                          | 65nm                                                             |

Table 2.16: Comparison Table of state-of-the-art Bio-medical Application SoCs.

packetizer and depacketizer in Verilog. Di Hu helped to build the memory block. I would like to thank Dr. Idong Ebong and Mr. Mahmood Barangi for their technical help and valuable discussions.

# CHAPTER III

# A Robust 12T SRAM Cell Design

# 3.1 Introduction

With the advancement of CMOS VLSI technology into the nanometer regime, process, supply voltage, and on-chip temperature (PVT) variations have become significant issues. These variations make a digital CMOS system vulnerable because the drivability of each device deviates from the intended design, causing read or write upsets in an SRAM, synchronization problems in a latch, and adverse changes to delays in logic gates. Among these three canonical CMOS circuit types (an SRAM cell, a latch, and an inverter), the SRAM bit cell is the key component in designing a reliable system because it has the highest failure rate [58]. In addition, as the demand for ultra-low power applications has risen [59][60][61][4], many techniques have been proposed, including parallel computation [62], clock gating [63], low swing signaling [64], dynamic voltage and frequency scaling (DVFS) [10], low swing flops and latches [65], and sub-threshold operation [61]. Among these techniques, sub-threshold operation has attracted particular attention since dynamic power can be reduced dramatically in the sub-threshold region.

In this region, sequential logic is more vulnerable to noise than combinational logic, so many sub-threshold SRAM cell structures have been proposed since the introduction of the first sub-threshold FFT processor [61]. A single-ended read port was proposed by introducing two additional read transistors [56]. These additional devices decouple the read bit line from the storage node, so disturbance of the SRAM cell during read operation is eliminated and read stability is improved. This bit cell is widely used [66][67][68][69]. Another attempt to reduce read disturbance was introduced in [70]: an additional device is added to the conventional 6T cell so that the pull-down network can be cut from the storage node. However, this approach has a drawback for write operation. In another example, the number of read access transistors was increased to four [71]. The additional devices could increase the number of rows sharing a bit line due to stacking effects. In [68], a floating VDD scheme was proposed. In that work, write operation in the sub-threshold region was feasible because floating VDD during write operation weakened the feedback in the SRAM cell. In addition, a virtual ground concept driven by a read buffer foot driver was introduced, which helped reduce leakage from the bit lines through the read access devices. For realization of write operation in the sub-threshold region, a virtual supply scheme was introduced. In [72], a decoupled read port was also introduced in order to improve the read static noise margin (RSNM), and halo doping was introduced in the access transistors to exploit the reverse short channel effect, which increases the threshold voltage. This technique was intended to increase the write margin in the sub-threshold region. Another approach for improving RSNM was proposed in [73]: dynamic differential cascode voltage switch logic (DCVSL) was introduced for read access, and the wordline voltage was boosted to increase the write margin.

Although these bit cells improved RSNM as well as the number of rows sharing a bit line, the write margin was not dramatically improved, since each bit cell has a feedback loop in its structure that contends with the write access devices. Beyond the bit cells mentioned above, many proposed SRAM cells drastically improved read stability in the sub-threshold region, but write stability was not improved much [74][75][76][77]. Another bit cell was proposed to resolve this issue [78]. In that cell, the feedback loop is opened by cutting the pull-down network of a half cell. However, every bit cell dynamically shares the switch control signal during both write and read operations. As a result, the storage nodes might experience voltage droop due to the control signals shared with the other bit cells in a column. Thus, the cell is potentially susceptible to dynamic noise sources, although it suggests a way to improve the write margin of an SRAM bit cell. In short, none of these bit cells can be regarded as robust enough during write operation in the sub-threshold region. Finally, another bit cell is proposed in [79]. It cuts the power supply according to the data written within the bit cell. However, the supply cutoff can be achieved only after the access transistor successfully writes data into a storage node; thus, the supply cutoff is indirectly controlled through the access transistor. In [80], a single write port bit cell was proposed. During write operation, the power supply to one of the half cells was cut so that writability was improved. This bit cell structure resembles a standard-cell latch, but since the power cutoff is recovered after the write clock cycle, the data is latched after the current clock cycle, which creates a potential hazard of noise interference during the clock transition. Another attempt to improve the writability of SRAM was proposed in [81]. In that bit cell, the pull-up networks are cut to eliminate charge contention during write operation. However, this bit cell also sacrifices hold stability due to its structure.

Theoretically, the maximum achievable static noise margin can be considered as shown in Figure 3.1. Two conventional static noise margins, for read (i.e., RSNM) and write (i.e., WSNM), are presented. These ideal margins can be acquired by combining two ideal voltage transfer characteristics (VTCs) of back-to-back inverters. These VTCs depend on the operation. When reading, ideal inverters should switch at VM=VDD/2 with gain = $-\infty$, so when these inverters are connected back-to-back, the DC responses can be represented as in Figure 3.1 (a). Hence, the maximum RSNM is VDD/2 by definition. When writing, the VTC of one inverter is identical to the normal VTC, while the VTC of the other is distorted so that the mono-stability condition is met during writing. In order to achieve ideal mono-stability, one of the VTCs should be the ideal VTC of an inverter, while the other should be a straight line along the y-axis so that the two cannot intersect with each other (i.e., hold a state). As shown in Figure 3.1 (b), the maximum WSNM is VDD/2. This view provides a blueprint for the static noise margin of an ideal SRAM.

Figure 3.1: Ideal noise margin curve for (a) RSNM and (b) WSNM.

In this chapter, a 12T SRAM cell is proposed that eliminates charge contention during write operation, so that its VTC curves closely resemble the ideal VTC curves for WSNM. The proposed bit cell is therefore about as robust as a bit cell design can be, even in the sub-threshold region where device performance variation is extremely difficult to manage. As discussed in the following sections, the proposed cell works at some frequency no matter how the devices are sized. The only significant considerations that affect device sizing are performance metrics (i.e., speed and power).

The proposed bit cell can be used in ultra-low power applications (i.e., sub-threshold operation), where reliability is a concern. In many cases, these applications require only a small memory capacity, so the size overhead of a bit cell might not be as critical as in memory-hungry applications. If the bit cell cannot find a way into production due to the size overhead, it might at least serve as a pseudo-golden reference against which all sub-threshold bit cell designs can be compared, since the proposed bit cell is as safe as a bit cell could be in terms of read and write static noise margin. Although the standard-cell latch proposed in [82] can be regarded as a golden reference because it has no charge contention, the voltage transfer characteristic of the proposed 12T bit cell is similar to that of the standard-cell latch. The difference between them is that the proposed bit cell has initial charge contention, while the standard-cell latch does not have any. However, the proposed bit cell forms a feedback loop during the write operation clock phase, while the standard-cell latch forms it after the write operation clock phase. Since the characteristics of the proposed bit cell are very similar to those of the standard-cell latch, and the other sub-threshold bit cells trade safety and robustness for area reduction, the degree of that trade-off can be assessed by comparison with the proposed bit cell as a reference.

The proposed bit cell structure is based on a 16T SRAM proposed in [83]. While the 16T bit cell has dual-rail outputs and two footers for balancing the signal timing of dual-rail in asynchronous systems, the proposed 12T SRAM bit cell has a single-ended output and no footer to reduce area and power overhead.

The remainder of this chapter is organized as follows: Section 3.2 describes the proposed 12T SRAM bit cell design, its operation principle, and sizing constraint. Section 3.3 introduces sub-threshold and super-threshold analytical models for the write margin of the proposed 12T SRAM. Section 3.4 presents simulation results. Section 3.5 draws conclusions.

# 3.2 12T SRAM Cell Design

The 12T SRAM is designed to increase write margin. Previously proposed SRAM cells mostly aim either to improve read static noise margin or to increase the number of rows of SRAM cells sharing a bit line by reducing leakage current. Consequently, few attempts have been made to increase write margin. Conventionally, write operation is conducted by applying state '0' or '1' to the bit lines so that the applied values override the previous state stored in the cross-coupled inverters. In this scenario, the bit line input drivers should be stronger than the SRAM cell transistors; otherwise, the write operation may fail. Due to this characteristic of an SRAM, sizing has been one of the most dominant factors in designing an SRAM cell. This is attributed to the feedback loop formed by the SRAM cell's back-to-back inverter structure. In read operation, read access switches can be used to decouple the read bit lines from the storage nodes, as in an 8T SRAM cell, so these read access transistors can be free from sizing constraints. In the case of write operation, however, decoupling the storage nodes from the bit lines is infeasible because some path through which charge can be stored or discharged must be directly connected to those nodes. Accordingly, an alternative, such as a static logic style, needs to be proposed.

#### 3.2.1 SRAM Cell Structure

The proposed SRAM cell structure is shown in Figure 3.2. Storage nodes Q and QB are formed by transistors M1 through M4, which are arranged as a pair of cross-coupled inverters. Transistors M7 through M10 comprise supply switches defined as two pairs of PMOS devices, such that each pair of PMOS devices has source terminals coupled to the supply voltage and drain terminals coupled to one of the two inverters. Additionally, a gate terminal of a single supply transistor is coupled to a write word line. Write access switches are comprised of transistors M5 and M6, as in the conventional 6T and 8T SRAM cells. These six devices, M5 through M10, relate to the write operation. Two NMOS devices, M11 and M12, form a read port as in the conventional 8T SRAM cell [56].

Figure 3.2: The proposed 12T SRAM cell structure.

#### 3.2.2 Operation Principle

12T SRAM is fully operated in static mode during read and write operation.

#### 3.2.2.1 Read Operation

Read operation is conducted through devices M11 and M12, as shown in Figure 3.3. As in a conventional 8T SRAM cell, the storage node QB is decoupled from the read bit line RBL by device M11. In this case, M11 is turned on. When RWL is asserted, a path from RBL to VGND becomes transparent, and VGND is driven to GND by a driver, as shown in Figure 3.3 (a). Once this path is transparent, charges on the floating bit line, RBL, begin to discharge through the path, as shown in Figure 3.3 (b). This completes the read operation. After completion, RWL is deasserted and RBL is precharged to VDD, while VGND is driven to VDD so that there is no voltage difference between RBL and VGND; this reduces leakage when the SRAM cells connected to this word line are not in use. As a result, more rows of cells can share a bit line, since leakage has been an obstacle to increasing the number of rows.

Figure 3.3: The proposed 12T SRAM cell read operation.

#### 3.2.2.2 Write Operation

The write operation is a key feature of the 12T SRAM cell design. Figure 3.4 shows the sequence of steps in the write operation. Devices M5 through M10, six devices in total, are involved in the write operation. The basic principle is to make an SRAM cell operate in static mode without charge contention.

The write operation illustrated in Figure 3.4 writes '1' to node Q, assuming that '0' is initially stored at node Q and '1' is initially stored at node QB. To begin, '0' is kept at node WBLB while '1' is asserted at node WBL, so that M7 is turned on and M8 is turned off, as shown in Figure 3.4 (a). Next, WWL is asserted, which causes M5 and M6 to turn on and M9 and M10 to turn off, as shown in Figure 3.4 (b). Notice that the path from the supply to node QB is cut, so that no current can flow into the storage nodes. Instead, a path from node QB to GND is formed. On the other side, a path from the supply to node Q is formed through M5. Accordingly, node QB is discharged through M6, while node Q is charged through M5, as shown in Figure 3.4 (c). Note that there is charge contention between M1 and M5 (i.e., when writing '1' at node Q). However, writing '0' at node QB completes before writing '1' at node Q, owing to the stronger $V_{GS}$ of M6 and the absence of charge contention in the discharging path. Thus, the initial charge contention between M1 and M5 is eliminated once node QB is discharged. In other words, this process turns M2 on while turning M1 off, so that the path from VDD through M7 and M2 to node Q becomes transparent, while the path to GND is closed. This, in turn, helps charge node Q, causing M3 to turn on while switching M4 off, as shown in Figure 3.4 (d). At this moment, writing '1' to node Q and '0' to node QB is complete. Subsequently, the asserted signals on WWL, WBL, and WBLB should be reset to '0', as shown in Figure 3.4 (e). With this reset, M7 through M10 can transfer power to the cross-coupled inverters, while M5 and M6 are turned off. Figure 3.4 (f) shows the state of the SRAM cell after the completion of the write operation.

Figure 3.4: A series of write operation process of the proposed 12T SRAM cell.

#### 3.2.3 Sizing Constraint

The proposed 12T SRAM cell has initial charge contention between the access transistor and the pull-down transistor of one of the half cells during write operation. However, this contention is eliminated when the write operation of the other half cell is complete, which means the write operation is conducted sequentially from one half cell to the other. Thus, sizing mostly affects the performance of the SRAM and its static and dynamic noise margins rather than its functionality. This is one of the advantages of the proposed 12T SRAM cell since the engineering effort to design an SRAM cell can be reduced dramatically. Unless performance is important, every device can be minimum size. This helps reduce energy consumption during read and write operations. For a balanced VTC, M2 and M4 can be sized twice as wide as M1 and M3. This balances the pull-up and pull-down strengths, which shapes each inverter's VTC as well as the static noise margin. In addition, the proposed 12T bit cell does not have any feedback during read and write operations, so sizing up M11 and M12 can improve read performance, as in the conventional 8T bit cell. Moreover, sizing up M5 through M10 can improve write performance since the sizes of M5 and M6 determine the discharging time, while the sizes of M7 and M8 affect the charging time. Thus, the proposed 12T bit cell can be designed to meet a given performance requirement without concern for either read upset or write upset.

Figure 3.5: The proposed 12T SRAM Read Noise Margin Model.

# 3.3 Analytical Model

In this section, an analytical model for write margin is proposed.

#### 3.3.1 Read Static Noise Margin

Figure 3.5 shows static noise sources inserted at the feedback nodes as in [84]. Since M7, M8, M9, and M10 are turned on, both nodes $V_1$ and $V_2$ are charged to VDD. In addition, M5 and M6 are in the off state. Only the inverters (M1 through M4), M11, and M12 are relevant during the read operation. Accordingly, the proposed 12T SRAM cell behaves very similarly to the conventional 8T SRAM cell during read operation, which will be shown in Section 3.4.

#### 3.3.2 Definitions of Write Margin

Many definitions of write margin have been proposed in the literature [85][86][87][88]. The conventional write static noise margin (WSNM) is based on the VTCs of the back-to-back inverters [85]. In this definition, two static noise sources are injected into the feedback loop of the back-to-back inverters so that they prevent the bit cell from being written. Accordingly, WSNM can be defined as the minimum noise source voltage that forces the bit cell to hold the previous data during write operation. Another definition of write margin is the bit line write margin (BLWM) [86]. In this definition, a static noise source is injected into the bit line that is supposed to be '0'. In other words, it is assumed that the bit line driver cannot fully discharge the bit line. Since write operation begins with discharging, this injected noise source can affect the write operation, so BLWM is the noise voltage at which discharging can no longer flip the state of the bit cell. Other definitions of write margin are related to the wordline [87][88]. In [87], the wordline voltage of a half cell is swept until one of the inverters flips at a certain voltage; the difference between that voltage and VDD is the wordline write margin (WWM). In [88], a combined wordline write margin (CWWM) is proposed after analyzing the drawback of WWM. Instead of sweeping the wordline voltage of a half cell, the whole wordline voltage is swept in order to acquire CWWM. CWWM is the difference between VDD and the wordline voltage at which the storage nodes flip to the opposite state. These definitions are examined in [89], which concluded that CWWM follows PVT variations better than the others. Nevertheless, WSNM is used in the analytical modeling here since it is the write-operation counterpart of the conventional read noise margin, and thus gives a better understanding of the relations between the devices.



Figure 3.6: The proposed 12T SRAM Write Noise Margin Model

#### 3.3.3 Write Static Noise Margin Modeling

Static noise sources for write margin are inserted in the feedback paths, as shown in Figure 3.6. In contrast with read SNM, the signs of the noise sources are reversed since these sources should act to disturb the write operation. In other words, these sources increase the stability of the SRAM cell during hold and read. Assume that state '1' is stored at node Q and that '0' is being written, so WBLB is set to '1' and WBL to '0'. In addition, WWL is asserted, and RWL is deactivated (VGND is in the '1' state). In this scenario, charges stored at node Q as well as at node V1 begin to discharge through M5 since M2 is turned on. Accordingly, the voltages at node Q and at node V1 are regarded as '0' from the DC analysis point of view. Moreover, the voltage at node V2 can be considered VDD because M8 is always in the 'on' state. Since the node voltage at Q is '0', writing '1' at QB completes the write operation. Therefore, the $V_n$ at which the drain current of M3 equals that of M4 can be taken as the static write margin, since at that point charges can barely accumulate at QB, meaning QB remains in almost the '0' state. With these assumptions, the analytical model for write margin is acquired.

#### 3.3.3.1 Super-threshold Model

Assume M3 operates in the linear region, while M4 operates in the saturation region since '0' is stored at QB, so  $V_{DS4}$  is almost VDD. Equating drain currents of both M3 and M4 results in:

$$\frac{k_4}{2}(V_{SG4} - V_{tp})^2 = k_3 V_{DS3} \left( V_{GS3} - V_{tn} - \frac{V_{DS3}}{2} \right) \tag{3.1}$$

where

- $k_3 = \mu_n C_{ox} \left(\frac{W}{L}\right)_3$
- $k_4 = \mu_p C_{ox} \left(\frac{W}{L}\right)_4$

- $V_{tn}$  is the threshold voltage of NMOS
- $V_{tp}$  is the threshold voltage of PMOS

For simplicity,  $\mu_p$  and  $V_{tp}$  are treated as positive values.

From Kirchhoff's voltage law (KVL), the following equations are acquired:

$$V_{GS3} = V_Q + V_n \tag{3.2}$$

$$V_{SG4} = V_{DD} - V_Q - V_n \tag{3.3}$$

$$V_Q = 0 \tag{3.4}$$

Notice that we only have the VTC of inverter 2; the VTC of inverter 1 is constant  $(V_Q=0)$ . Substituting these into Equation 3.1 yields:

$$V_{DS3}^{2} - 2(V_n - V_{tn})V_{DS3} + \frac{\mu_p}{\mu_n}\beta(V_{DD} - V_{tp} - V_n)^2 = 0 \tag{3.5}$$

where $\beta = \left(\frac{W}{L}\right)_4 / \left(\frac{W}{L}\right)_3$.
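The intermediate algebra is not spelled out in the text; a brief sketch, under the assumption that both devices share the same $C_{ox}$, is as follows. With $V_Q = 0$, Equations 3.2 and 3.3 reduce to $V_{GS3} = V_n$ and $V_{SG4} = V_{DD} - V_n$, so Equation 3.1 becomes

$$\frac{k_4}{2}(V_{DD} - V_n - V_{tp})^2 = k_3 V_{DS3}\left(V_n - V_{tn} - \frac{V_{DS3}}{2}\right).$$

Multiplying both sides by $2/k_3$, collecting every term on one side, and writing the ratio $k_4/k_3$ as $\frac{\mu_p}{\mu_n}\beta$ recovers the quadratic in $V_{DS3}$ given in Equation 3.5.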

When Equation 3.5 has two distinct real roots, the SRAM cell is regarded as retaining bistability (i.e., holding the current state). If Equation 3.5 has two complex roots, the SRAM cell is monostable and cannot hold data, so the write operation can be performed. Therefore, the $V_n$ at which Equation 3.5 has a double root, where both VTCs coincide at a point, can be taken as the write margin. This condition is equivalent to setting the discriminant of the quadratic Equation 3.5 to zero, as shown below:

$$aV_{DS3}^2 + bV_{DS3} + c = 0 \tag{3.6}$$

$$b^2 = 4ac \tag{3.7a}$$

or

$$b = -2\sqrt{ac}(\because b < 0) \tag{3.7b}$$

When Equation 3.6 and Equation 3.7b are applied to Equation 3.5, the following equation is obtained:

$$2(V_n - V_{tn}) = 2\sqrt{\frac{\mu_p}{\mu_n}\beta}(V_{DD} - V_{tp} - V_n). \tag{3.8}$$

After solving Equation 3.8 for  $V_n$ , the static write margin for the super-threshold operating condition can be acquired:

$$\therefore WM_{static,super-V_{th}} = \frac{V_{tn} + \sqrt{\frac{\mu_p}{\mu_n}\beta}(V_{DD} - V_{tp})}{1 + \sqrt{\frac{\mu_p}{\mu_n}\beta}}. \tag{3.9}$$
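As a minimal numerical sketch of Equation 3.9, the following Python snippet evaluates the super-threshold write margin and checks that the discriminant of Equation 3.5 vanishes at that value. The function names and all device parameters are hypothetical and chosen only for illustration; they are not taken from the simulated 12T cell.

```python
import math


def write_margin_super(vdd, vtn, vtp, mobility_ratio, beta):
    """Static write margin from Equation 3.9.

    mobility_ratio is mu_p / mu_n and beta is the (W/L) ratio of M4 to M3.
    vtp is a positive magnitude, following the convention in the text.
    """
    s = math.sqrt(mobility_ratio * beta)
    return (vtn + s * (vdd - vtp)) / (1.0 + s)


def discriminant(vn, vdd, vtn, vtp, mobility_ratio, beta):
    """Discriminant b^2 - 4ac of the quadratic in V_DS3 (Equation 3.5)."""
    a = 1.0
    b = -2.0 * (vn - vtn)
    c = mobility_ratio * beta * (vdd - vtp - vn) ** 2
    return b * b - 4.0 * a * c


# Hypothetical parameters (volts, dimensionless ratios), for illustration only.
params = dict(vdd=1.0, vtn=0.4, vtp=0.4, mobility_ratio=0.5, beta=2.0)
wm = write_margin_super(**params)
print(f"WM = {wm:.3f} V, discriminant at WM = {discriminant(wm, **params):.2e}")
```

Under these assumed values, a noise voltage slightly below the computed margin gives a negative discriminant (complex roots, so the cell is monostable and the write succeeds), while a slightly larger noise voltage gives a positive discriminant (two real roots, so the cell holds its state).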

#### 3.3.3.2 Sub-threshold Model

Sub-threshold modeling is similar to the super-threshold modeling except for the drain current expression. In this model, every parameter is treated as a positive value. At node QB, the drain currents of M3 and M4 can be equated by Kirchhoff's current law (KCL).

$$I_{SD4} = I_{DS3}. \tag{3.10}$$

The drain current of each device is given by:

$$I_D = I_S e^{\frac{V_{GS} - V_t}{n\Phi_T}} \left( 1 - e^{-\frac{V_{DS}}{\Phi_T}} \right) \tag{3.11}$$

where

- $I_S = \mu\left(\frac{W}{L}\right) \sqrt{\frac{q\epsilon_{si}N_{DEP}}{2\Phi_T}} (\Phi_T)^2$
- $n = 1 + \frac{c_d}{c_{ox}}$
- $\Phi_T = \frac{kT}{q}$

Since $1 \gg e^{-\frac{V_{DS}}{\Phi_T}}$, the $e^{-\frac{V_{DS}}{\Phi_T}}$ term can be neglected, so substituting Equation 3.11 into Equation 3.10 yields:

$$I_{S,4}e^{\frac{V_{SG4}-V_{tp}}{n\Phi_T}} = I_{S,3}e^{\frac{V_{GS3}-V_{tn}}{n\Phi_T}} \tag{3.12}$$

As in the case of super-threshold modeling, the same conditions, Equation 3.2 to Equation 3.4, are applicable to Equation 3.12. After substitution, solving Equation 3.12 for $V_n$ yields the static write margin for the sub-threshold condition:

$$\therefore WM_{static,sub-V_{th}} = \frac{1}{2} n \Phi_T \ln \left( \frac{\mu_p}{\mu_n} \beta e^{\frac{V_{DD} - V_{tp} + V_{tn}}{n\Phi_T}} \right). \tag{3.13}$$
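The sub-threshold result can be checked numerically in the same way. The sketch below evaluates Equation 3.13 in its algebraically expanded form and verifies that the two sides of Equation 3.12 balance at the resulting $V_n$. It assumes, as the step from Equation 3.12 to Equation 3.13 implicitly does, that $I_{S,4}/I_{S,3}$ reduces to $\frac{\mu_p}{\mu_n}\beta$; the function names and parameter values are hypothetical.

```python
import math


def write_margin_sub(vdd, vtn, vtp, mobility_ratio, beta, n, phi_t):
    """Static write margin from Equation 3.13.

    Uses ln(x * e^y) = ln(x) + y, which is algebraically identical to
    Equation 3.13 and avoids evaluating a large exponential.
    """
    return 0.5 * (n * phi_t * math.log(mobility_ratio * beta) + vdd - vtp + vtn)


def current_ratio(vn, vdd, vtn, vtp, mobility_ratio, beta, n, phi_t):
    """I_SD4 / I_DS3 from Equation 3.12 with V_SG4 = VDD - vn and V_GS3 = vn.

    Assumes I_S4 / I_S3 = mobility_ratio * beta (all other factors equal).
    """
    pmos = mobility_ratio * beta * math.exp((vdd - vn - vtp) / (n * phi_t))
    nmos = math.exp((vn - vtn) / (n * phi_t))
    return pmos / nmos


# Hypothetical parameters (volts, dimensionless), for illustration only.
params = dict(vdd=0.25, vtn=0.40, vtp=0.45, mobility_ratio=0.5, beta=1.0,
              n=1.5, phi_t=0.02585)
wm = write_margin_sub(**params)
print(f"WM = {wm * 1e3:.2f} mV, current ratio at WM = {current_ratio(wm, **params):.6f}")
```

A current ratio of one at the computed margin confirms that Equation 3.13 is the solution of Equation 3.12 under the stated assumptions.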

# 3.4 Simulation Results

#### 3.4.1 Analytical Model

The WSNM analytical models developed in Section 3.3 are compared with simulation results in Figure 3.7. Figure 3.7 (a) compares the super-threshold and sub-threshold models with simulation results versus VDD. The error of the super-threshold model ranges from 3.1% to 8.7%, while that of the sub-threshold model ranges from 8.1% to 14.2%. The reason for the greater error of the sub-threshold model is that leakage current increases exponentially as the device goes into the deep sub-threshold region, and M1, M2, M7, and M9 were assumed to be completely off in the model, whereas they are not completely off under a sub-Vth VDD. The WSNM simulation results for varying $\beta$ are compared with the analytical model in Figure 3.7 (b). The super-threshold model is verified at VDD=1.0V, and the sub-threshold model at VDD=0.25V. The error of the super-threshold model ranges from 0.59% to 6.17%, while that of the sub-threshold model ranges from 10.43% to 15.42%. Although the sub-threshold curves appear closer together than the super-threshold curves in the figure, the accuracy of the latter is greater. Since WSNM values in the sub-threshold region are relatively small, a tiny change in value makes a large difference in relative error. When the two values in the sub-threshold region are compared in absolute terms, the difference is within 17.80mV.

Figure 3.7: The Proposed 12T SRAM Analytical Model Simulation

#### 3.4.2 Simulation Setup

The proposed 12T cell was analyzed against the conventional 6T, 8T [56], and 10T [80] cells. The sizing of each bit cell was determined as follows. The pull-up ratio (PR), defined as the ratio of the size of the pull-up transistor to the size of the access transistor, of the 6T bit cell is set to 1, and the cell ratio (CR), the ratio of the size of the pull-down transistor to the size of the access transistor, is set to 2. Both PR and CR of the 8T bit cell are set to 1, and the read access transistors are minimum size. All devices of the 10T and the proposed 12T bit cells are minimum size. All experiments were conducted with these setups. The operating supply voltage is set to a near-threshold voltage (i.e., 550 mV) since it provides a reasonable level of performance while saving considerable energy.

#### 3.4.3 Read Static Noise Margin

The results of 50,000 Monte-Carlo pre-layout schematic simulations of RSNM at VDD=550mV, FS corner, and 125°C are shown in Figure 3.8. The RSNMs of the 6T, 8T, 10T, and 12T bit cells are 80.08mV, 199.32mV, 198.67mV, and 198.28mV, respectively. According to the distributions, all bit cells can be considered robust under $\pm 6\sigma$ local process and mismatch variations. In addition, the RSNM of the proposed 12T bit cell is comparable to that of the conventional 8T bit cell and the 10T bit cell, while the conventional 6T bit cell is more vulnerable than the others.

Figure 3.8: 50,000 RSNM Monte-Carlo simulation results for 6T, 8T, 10T, and the proposed 12T SRAM cells.

Figure 3.9: 50,000 WSNM Monte-Carlo simulation results for 6T, 8T, 10T, and 12T SRAM cells at VDD=550 mV, SF corner, and -30°C.

#### 3.4.4 Static Write Margin

Since noise can occur at any node, including a storage node, a wordline, and a bit line, investigation of each write margin definition is essential. The results of 50,000 WSNM Monte-Carlo pre-layout bit-cell-level simulations at VDD=550mV, SF corner, and -30°C under process and mismatch variations are shown in Figure 3.9. The curves of the 10T and the proposed 12T bit cells resemble the ideal shape shown in Figure 3.1. Notice that the VTC of one half cell is a straight line along the y-axis even under process and mismatch variations. This is because the feedback loop is cut in the 10T and the 12T bit cells during write operation. Thus, the proposed bit cell provides mono-stability even though the VTC of the other half cell fluctuates under process and mismatch variations. The statistical distributions of the write margin simulation results are shown in Figure 3.10. The 6T and 8T bit cells fail in some iterations of CWWM and BLWM, while the 10T and the 12T bit cells do not fail under any write margin definition. The mean WSNMs of the 6T, 8T, 10T, and 12T bit cells are 173.1 mV, 186.4 mV, 305.6 mV, and 307.8 mV, respectively. According to the distributions, 6T and 8T are robust under $\pm 4\sigma$ variations, while 10T and 12T are robust under more than $\pm 12\sigma$ variations, as concluded by extrapolating the distributions. The mean CWWMs of 6T, 8T, 10T, and 12T are 44.9 mV, 54.8 mV, 317.1 mV, and 251.5 mV, respectively. Note that the conventional 6T and 8T bit cells fail 5816 and 3856 times, respectively. For BLWM, the means of 6T, 8T, 10T, and 12T are 57.8 mV, 73.1 mV, 280.6 mV, and 405.3 mV, respectively, and the 6T and 8T bit cells fail 5786 and 3816 times, respectively. The statistics of the write margin simulations are summarized in Table 3.1. As shown in the table, the proposed bit cell has a larger BLWM than the 10T bit cell, but a smaller CWWM. The reason why the 10T cell has a larger CWWM is that it cuts the feedback path by weakening both a PMOS and an NMOS, while the proposed bit cell cuts the feedback only through a PMOS. Thus, the 10T cell can weaken the feedback more for the same applied wordline voltage. The reason why the proposed bit cell has a larger BLWM is that data is written through both BL and BLB, while the 10T cell is driven by only a single bit line. From the figure and the table, we can conclude by extrapolation that the proposed 12T SRAM bit cell is robust under $\pm 6\sigma$ variations at VDD=550mV, SF corner, and -30°C.





|      |          | 6T                    | 8T[56]              | 10T [80]            | 12T (proposed)      |
|------|----------|-----------------------|---------------------|---------------------|---------------------|
|      | $\mu$    | $173.1 \mathrm{mV}$   | $186.4 \mathrm{mV}$ | $305.6 \mathrm{mV}$ | $307.8 \mathrm{mV}$ |
| WSNM | $\sigma$ | $42.7 \mathrm{mV}$    | $44.6 \mathrm{mV}$  | $24.1 \mathrm{mV}$  | $23.2 \mathrm{mV}$  |
|      | fail     | No fail               | No fail             | No fail             | No fail             |
|      | $\mu$    | $44.9 \mathrm{mV}$    | $54.8 \mathrm{mV}$  | 317.1mV             | $251.5 \mathrm{mV}$ |
| CWWM | $\sigma$ | $38.3 \mathrm{mV}$    | $39.1 \mathrm{mV}$  | $27.0 \mathrm{mV}$  | $27.0 \mathrm{mV}$  |
|      | fail     | 5816 fails            | 3856 fails          | No fail             | No fail             |
|      | $\mu$    | $57.8 \mathrm{mV}$    | $73.1 \mathrm{mV}$  | $280.6 \mathrm{mV}$ | $405.3 \mathrm{mV}$ |
| BLWM | $\sigma$ | $49.4 \mathrm{mV}$    | $52.8 \mathrm{mV}$  | $27.0 \mathrm{mV}$  | $81.0 \mathrm{mV}$  |
|      | fail     | 5786 fails            | 3816 fails          | No fail             | No fail             |

Table 3.1: SRAM Cell Write Margin Simulation Results (50,000 MC, SF Corner, T = $-30^{\circ}$C)

\*These statistics exclude failed iterations.
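One simple way to read the $\sigma$ statements about WSNM is to divide each mean by its standard deviation, which gives the number of standard deviations separating the mean margin from zero. The sketch below does this for the WSNM rows of Table 3.1. It assumes a Gaussian distribution, which is a simplification; the text itself extrapolates the simulated distributions, and the helper name `sigma_margin` is hypothetical.

```python
# WSNM mean and standard deviation in mV, copied from Table 3.1.
wsnm = {
    "6T": (173.1, 42.7),
    "8T": (186.4, 44.6),
    "10T": (305.6, 24.1),
    "12T": (307.8, 23.2),
}


def sigma_margin(mean_mv, sigma_mv):
    """Number of standard deviations between the mean margin and zero."""
    return mean_mv / sigma_mv


for cell, (mean_mv, sigma_mv) in wsnm.items():
    print(f"{cell}: mean/sigma = {sigma_margin(mean_mv, sigma_mv):.2f}")
```

Under this Gaussian reading, the 6T and 8T ratios fall slightly above 4 and the 10T and 12T ratios fall above 12, in line with the $\pm 4\sigma$ and $\pm 12\sigma$ statements for WSNM.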

#### 3.4.5 Dynamic Write Margin

Dynamic noise margin for a write operation (DNM) is analyzed for the 6T, 8T, 10T, and 12T cells. Among the previously proposed DNM definitions, the minimum width of the WL assertion pulse required for a bit cell to reach the switching threshold [90] is used for this analysis. The simulation setup is shown in Figure 3.11. A bit-wise column consists of 128 bit cells, and a wire RC model is inserted on the bitline. A 50,000-run Monte-Carlo simulation was conducted at VDD=550mV, SF corner, and -30°C. The DNM of each iteration is found by sweeping the wordline pulse width. The simulation results are shown in Figure 3.12. The proposed 12T bit cell did not incur any failure, while the other cells did. The mean DNMs of 6T, 8T, 10T, and 12T are 3.26ns, 3.47ns, 1.85ns, and 1.32ns, respectively. The standard deviations of DNM for 6T, 8T, 10T, and 12T are 3.97ns, 4.92ns, 1.99ns, and 1.75ns, respectively. Note that the mean values exclude failed iterations, so these numbers show only a DNM tendency. In conclusion, the proposed 12T cell is dynamically more stable than the other compared cells.
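The pulse-width search can be illustrated with a toy model. The sketch below is not the circuit simulation used in this work: it replaces the bit cell with a hypothetical single-time-constant RC discharge and uses bisection rather than a linear sweep to find the shortest wordline pulse that brings the storage node down to an assumed switching threshold of VDD/2. All names and values (`TAU_NS`, `min_pulse_width`, the 2 ns time constant) are assumptions for illustration.

```python
import math


def node_voltage_after_pulse(pulse_width_ns, vdd, tau_ns):
    """Toy model: storage node discharging from VDD with one RC time constant."""
    return vdd * math.exp(-pulse_width_ns / tau_ns)


def min_pulse_width(write_succeeds, lo_ns, hi_ns, tol_ns=1e-3):
    """Shortest pulse width in [lo_ns, hi_ns] for which write_succeeds is True.

    Assumes success is monotonic in pulse width. Returns None when even the
    longest pulse fails, which corresponds to a failed iteration.
    """
    if not write_succeeds(hi_ns):
        return None
    if write_succeeds(lo_ns):
        return lo_ns
    while hi_ns - lo_ns > tol_ns:
        mid_ns = 0.5 * (lo_ns + hi_ns)
        if write_succeeds(mid_ns):
            hi_ns = mid_ns
        else:
            lo_ns = mid_ns
    return hi_ns


VDD = 0.55      # supply voltage in volts, as in the simulation setup
TAU_NS = 2.0    # hypothetical discharge time constant in nanoseconds


def write_succeeds(width_ns):
    # The write counts as successful once the node reaches VDD/2 (assumed threshold).
    return node_voltage_after_pulse(width_ns, VDD, TAU_NS) <= VDD / 2


dnm = min_pulse_width(write_succeeds, 0.0, 20.0)
print(f"toy DNM = {dnm:.3f} ns (analytical value tau*ln2 = {TAU_NS * math.log(2):.3f} ns)")
```

In a Monte-Carlo setting, `write_succeeds` would be replaced by a transient simulation of the perturbed cell, and iterations returning `None` would be counted as failures and excluded from the mean, as described above.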

#### 3.4.6 Leakage Current

One important metric of an SRAM bit cell is the total bit cell leakage current, since it limits the number of cells sharing bit lines. The total bit cell leakage currents of the 6T, 8T, 10T, and 12T cells at VDD=550mV, TT corner, and 25°C are 6.42 nA, 5.04 nA, 3.94 nA, and 4.12 nA, respectively. This is reasonable since the 10T cell has a single bitline, and both the 10T and 12T cells have more stacked devices than the 6T and 8T cells.

Figure 3.11: Dynamic write noise margin simulation setting.

Figure 3.12: 50,000 Monte-Carlo DNM simulation results for 6T, 8T, 10T, and the proposed 12T SRAM cells in 40nm CMOS technology.

#### 3.4.7 Performance

The read access time of a column of 128 bit cells is simulated as the delay from 50% of the read wordline voltage to a 100mV voltage difference between BL and BLB (or a reference dummy bitline). Similarly, the write access time is simulated as the delay from 50% of the write wordline voltage to 50% of the written storage node voltage in the 128-bitcell column array. The simulation results are shown in Table 3.2. In read operation, the proposed bit cell shows performance comparable to the 10T bit cell. The 8T cell has a better delay than the 10T and the proposed 12T cells because both the 10T and 12T cells have an additional stack in the pull-up network, so the bitcells cannot quickly recover from the voltage droop due to leakage. This weakens the drivability of the read port transistor. The conventional 6T bitcell shows the best read performance since it has a differential read port and, in addition, a greater CR than the 8T, 10T, and 12T cells (i.e. CR=2). If the read access transistors of the 8T, 10T, and 12T cells are sized up, the read performance can be improved, but at the cost of power and area. For write operation, the proposed 12T bit cell shows the best result. This is because the proposed 12T cell has no feedback to overdrive in the discharging path. The 10T cell also shows good write performance since it also cuts the feedback during write. However, it has a single write port, so its performance is worse than that of the proposed 12T cell. The conventional 6T SRAM cell gives the worst write delay due to its higher CR. In conclusion, the proposed 12T bit cell provides performance comparable to the conventional 6T and 8T [56] cells and the 10T [80] SRAM cell. When there is a specific performance requirement, the proposed 12T bit cell can meet it since there is no sizing constraint in either read or write operation. In this case, area and power can be traded off against performance.

| SRAM bit cell | Read (FS, $125^{\circ}$C) | Write (SF, $-30^{\circ}$C) |
|---------------|---------------------------|----------------------------|
| 6T            | 22.07 ps                  | 2.45 ns                    |
| 8T            | 26.65 ps                  | 2.19 ns                    |
| 10T           | 29.28 ps                  | 1.73 ns                    |
| 12T           | 29.89 ps                  | 1.28 ns                    |

Table 3.2: SRAM Cell Delay Comparison

Write delay is simulated from 50% of wordline voltage to 50% of the storage node voltage, while read delay is simulated from 50% of wordline voltage until the voltage difference between bitline and bitline bar reaches 100mV.

#### 3.4.8 Cell Area

The layout of the proposed 12T bit cell is shown in Figure 3.13. The layout has a 2-poly-pitch height as in the conventional 6T and 8T cell layouts, but only three sides (top, left, and right) can be shared because the source terminals of M2 and M4 are not connected to VDD (i.e. the conventional 6T and 8T cell layouts share this terminal with another cell so that the bit cell area can be drastically reduced). Thus, the height of the proposed bit cell is 1.18 times greater than that of the other two cells, which can share contacts with other cells both at the top and at the bottom. In addition, the width of the proposed cell is 1.65 times and 1.46 times greater than that of the 6T and 8T cells, respectively. Since the source terminals of M2 and M4 should be shared with four additional devices (M7 through M10), the drain terminals of the inverters are connected by a twisted metal 1 layer. Please note that the devices in the area delineated with red dotted lines are additional ones compared to the conventional 8T bit cell. Overall, the area of the proposed bit cell is 1.96 times and 1.74 times that of the 6T and 8T cells, respectively. The cell area comparison with respect to the conventional 8T cell is shown in Table 3.3. Although the proposed 12T cell has 2 or 3 more transistors than the previously proposed bit cells, the cell area overhead is not too great thanks to the layout optimization.



| SRAM bit cell   | Number of bitlines | Area (with respect to 8T) |
|-----------------|--------------------|---------------------------|
| 6T              | 2 BL               | $0.77 \times$             |
| 8T [56]         | 2 WBL / 1 RBL      | $1 \times$                |
| 10T [71]        | 2 WBL / 1 RBL      | $1.6 \times$              |
| 8T [68]         | 2 WBL / 1 RBL      | $1.2 \times$              |
| 10T [72]        | 2 WBL / 1 RBL      | $1.6 \times$              |
| 10T [73]        | 2 BL               | $1.6 \times$              |
| 9T [79]         | 2 WBL / 2 RBL      | $1.4 \times$              |
| 12T (This work) | 2 WBL / 1 RBL      | $1.7 \times$              |

Table 3.3: SRAM Cell Area Comparison

### 3.5 Conclusions

The proposed 12T bit cell dramatically improves the write margin by eliminating the charge contention caused by the feedback structure of an SRAM cell. Its structure allows reliable operation during writing by blocking the power supply route. Since there is no charge contention, no sizing constraint exists. In order to improve RSNM, the pull-up devices can be sized two times larger than the pull-down devices to balance the VTCs of the back-to-back inverters. In addition, any device can be sized according to a given performance requirement since there is no sizing constraint in the proposed structure. The VTC of the proposed cell in WSNM is very similar to the ideal curves suggested in Section I due to the feedback-free structure during write. Under three different definitions of write margin (WSNM, CWMM, and BLWM), the 12T cell is more robust than the conventional 6T and 8T cells, and it is comparable to the 10T cell [80]. In addition, the proposed 12T cell is more dynamically stable than the 6T, 8T, and 10T cells. Therefore, the proposed cell achieves a higher WSNM, BLWM, and DNM without sacrificing RSNM. Accordingly, the proposed 12T cell can be used for ultra-low power applications that require low-voltage operation while demanding relatively low capacity, since in such applications the area of the memory block is comparable to the area of the peripheral circuitry. In addition, a WSNM analytical model of the 12T cell is proposed. The super-threshold model fits within 8.7% error, while the sub-threshold model fits within 14.2% error. When the ratio changes from 1 to 5, the super-threshold model fits within 6.17% error, while the sub-threshold model fits within 15.42% error.

# CHAPTER IV

# Energy-Efficient Hardware Architecture of Self-Organizing Map (SOM) for ECG Clustering

## 4.1 Introduction

Electrocardiogram (ECG) is one of the most crucial bio-signals for diagnosing heart diseases such as myocardial infarction and arrhythmia, because such diseases are highly correlated with ECG features (i.e. waveform shape, QRS complex, R-R interval, etc.) [91]. Since these heart diseases are sporadic and require immediate intervention to minimize fatality, continuous monitoring of the ECG is indispensable [92]. Thus, patients with serious conditions need to stay in the hospital for continuous monitoring, which lowers their quality of life. In order to resolve this issue, ultra-low power monitoring systems have been proposed for mobile platforms [45][48][57][49]. Such a system can facilitate immediate intervention from medical providers in one of two ways: 1) it can diagnose the patient and alert medical experts when an abnormality is detected, or 2) it can transmit sensed raw ECG signals to a server so that the server can diagnose heart diseases using the relevant signal processing algorithms. The latter is not regarded as a good candidate since sensed raw data have to be transmitted continuously through a power-hungry RF transmitter, which limits the lifetime of a battery-operated system. Rather, processing ECG on-chip for diagnosis and wirelessly transmitting only an alert is more appropriate for power-constrained mobile devices and also for implantable devices. Since raw ECG data cannot be provided from this system to medical providers, on-chip signal processing should be highly accurate and immune to noise. In this sense, the use of artificial neural networks (ANNs) could be a promising technique since the performance of conventional rule-based diagnosis algorithms deteriorates when noise is present, while ANNs still perform well [93].

There are two categories of machine learning algorithms: supervised learning and unsupervised learning. The former categorizes each sample correctly after a sufficient number of learning iterations, but it requires a great number of data pairs, each consisting of a sample and its category. The latter does not require such mapped data pairs (i.e. self-clustering), but it requires a well-devised algorithm for each application. From these aspects, unsupervised learning can be more appropriate for biomedical applications. Supervised learning requires enough training samples in order to map upcoming data onto a specific category. If there is not enough training data, ANNs cannot recognize some of the unknown input data. This is very challenging since enough training samples are not always available, because bio-signals are acquired from different people under different conditions. In other words, the subjects differ in race, age, and health condition, and the signals are recorded in environments with different noise conditions. In contrast, unsupervised learning is more flexible with unknown input data sets, even in a new environment, because it can resiliently adjust ANNs through a learning process with new input data.

Self-organizing map (SOM) [19] is one of the two main unsupervised learning algorithms, along with adaptive resonance theory (ART) [94]. Although ART is more plastic to unknown input patterns than SOM, spare hardware must be available for such plasticity. In other words, there is a trade-off between plasticity and hardware cost. In contrast, SOM does not require such extra hardware. In addition, SOM provides a topological map for each sample. If a sample is mapped to a certain cell in an ANN, the sample has similar facets to other samples mapped to nearby cells. This aspect of the mapping resembles real diagnosis. Most medical providers diagnose statistically; they tell how probable it is that a patient has a certain disease based on his or her symptoms. Thus, SOM is a good candidate algorithm for an on-chip self-diagnostic hardware system.

In this chapter, we propose to use an ECG complex vector along with an R-R interval as the input vector to the SOM network. This is simple to implement in hardware and efficient enough for clustering ECG complexes.

The remainder of this chapter is organized as follows: Section II discusses the previous works of ECG clustering by using SOM algorithms, and presents the SOM algorithm used in this work. Section III introduces the proposed hardware architecture. Section IV presents simulation results. Section V draws conclusions.

## 4.2 Theoretical Background

There have been many attempts at utilizing SOM to cluster ECG QRS complexes. In [95], ECG clustering by SOM was proposed after R-beat detection through a preprocessing algorithm. 50 input samples were used for the training, and two different QRS complexes (i.e. a normal beat and a ventricular beat) were successfully clustered, but no analysis of the accuracy rate or the types of beats was conducted. In [96], a combination of SOM and learning vector quantization (LVQ) along with a mixture of experts (MOE) was proposed for ECG beat clustering. However, LVQ is a supervised learning algorithm, and MOE is very difficult to implement in hardware due to the exponential function used. These algorithms clustered four different types of complexes, and their accuracy was 94%. In [97], a Hermite basis was proposed for the learning and weight vectors of SOM instead of ECG complex vectors. This idea helped cluster 16 types of QRS complexes, and its accuracy was 98.5%. However, the Hermite function is not hardware-friendly since it requires calculations of square roots and exponential functions. In [98], a self-organizing cerebellar model articulation controller (SOCMAC) was proposed. This idea combines SOM with CMAC, which reduces the memory requirement through two-coordinate mapping. CMAC is similar to SOM in that it updates nearby weight vectors when updating the mapped weight vector. However, it cannot update only the mapped weight vector, which risks making the ANN diverge. The authors mentioned that initialization of the memory contents was critical for the learning process. However, this dilutes the advantage of unsupervised learning. In [99], multiscale recurrence analysis (MRA) was integrated with SOM, which achieved 95.25% accuracy, but it requires 3-lead data and a hardware-expensive wavelet transform. In this work, we used the original SOM algorithm for hardware simplicity.

The SOM algorithm matches an input vector with one of the weight vectors, each of which corresponds to a cell in the SOM, and updates that cell or its adjacent cells based on the learning error.

Let  $x = [x_1, x_2, ..., x_n]^T \in \mathbb{R}^n$  be an input vector. Assume SOM consists of N cells, and the weight vector of cell *i* can be represented as  $w_i = [w_{i1}, w_{i2}, ..., w_{in}]^T \in \mathbb{R}^n$ . The weight vector of the winner cell,  $w_c$ , is found by

$$||x - w_c|| = \min_{i}[||x - w_i||]$$
(4.1)

where  $||x - w_i|| = \sqrt{(x_1 - w_{i1})^2 + \dots + (x_n - w_{in})^2}$  is the Euclidean distance between x and  $w_i$ .

The winner vector,  $w_c$ , and the vectors of its adjacent cells are updated as follows:

$$w_{j}(t+1) = \begin{cases} w_{j}(t) + \alpha(t)(x(t) - w_{j}(t)) & \text{if } j \in N_{c}(t) \\ w_{j}(t) & \text{if } j \notin N_{c}(t) \end{cases}$$
(4.2)

where  $N_c$  is a neighborhood set around cell c and  $\alpha(t)$  is an adaptation gain satisfying the condition,  $0 < \alpha(t) < 1$ .
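The winner search of Eq. (4.1) and the update rule of Eq. (4.2) can be illustrated with a minimal floating-point sketch. The function names, the toy weight values, and the choice of a neighborhood containing only the winner are assumptions made for illustration; they are not taken from the implemented hardware.

```python
import math

def find_winner(x, weights):
    """Return the index c of the weight vector closest to x (Eq. 4.1)."""
    def dist(w):
        # Euclidean distance between x and one weight vector
        return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))
    return min(range(len(weights)), key=lambda i: dist(weights[i]))

def update(weights, x, neighborhood, alpha):
    """Apply Eq. 4.2: move cells in the neighborhood toward x, keep the rest."""
    assert 0.0 < alpha < 1.0  # adaptation gain condition
    return [
        [wj + alpha * (xj - wj) for xj, wj in zip(x, w)] if j in neighborhood else list(w)
        for j, w in enumerate(weights)
    ]

# Toy example (hypothetical values): three cells, two-dimensional input
weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
x = [0.9, 1.2]
c = find_winner(x, weights)                      # cell 1 is the closest
new_weights = update(weights, x, neighborhood={c}, alpha=0.5)
print(c, new_weights)
```

In this toy case only the winner moves halfway toward the input, while the other two cells keep their weights.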

In this work, we set x as a combination of a QRS complex vector, c, and its corresponding R-R interval as follows:

$$x = [c_0, c_1, \cdots, c_{126}, \text{R-R interval}]^T.$$
(4.3)

Including the R-R interval in the input vector makes clustering more effective. Since the R-R interval is highly related to the type of heart disease, similar ECG waveforms with different R-R intervals can be assigned to different clusters. Otherwise, they might be clustered into one group.

## 4.3 Hardware Architecture

The proposed hardware architecture is shown in Figure 4.1. The main advantage of the proposed architecture is that memory access is sequential rather than parallel. This is made possible by the narrow bandwidth of ECG and its low sampling frequency. The architecture mainly consists of two sub-blocks: a preprocessing block (i.e. an FIR filter, a memory, and an R-peak detector) and an SOM block (a learning controller and an SOM network). The preprocessing block filters noise out of the input signal through a digital FIR filter and stores the filtered signal in the memory block. The filtered input signal is also fed into an R-peak detector, which detects an R-peak through squaring, moving integration, and threshold detection. These processes further amplify the peak, while suppressing the other components of small amplitude. When an R-peak is detected, the SOM block begins to operate. Here, we adopted a $5 \times 5$ mesh topology for the SOM since a mesh topology is easy to implement in hardware and incurs less overhead. In addition, the SOM weight vectors are represented in 10-bit fixed-point 2's complement. Many recent ECG body sensor network works tend to use a small data representation (e.g. 8 bits) to save power [48][49]. However, the bit length affects the clustering accuracy, so we adopted a 10-bit data representation.
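The squaring, moving integration, and threshold detection steps of the R-peak detector can be sketched in software as follows. The window length, the threshold, the rising-edge rule, and the synthetic test signal are all assumed values chosen for illustration; the parameters of the implemented detector are not stated here.

```python
def detect_r_peaks(samples, window=8, threshold=1.0):
    """Return the sample indices at which the integrated energy first crosses the threshold."""
    detections = []
    squared = [0.0] * window      # circular buffer holding the last `window` squared samples
    running_sum = 0.0             # moving integral of the squared signal
    above = False
    for n, s in enumerate(samples):
        sq = s * s                                  # squaring emphasizes large amplitudes
        running_sum += sq - squared[n % window]     # add newest sample, drop oldest
        squared[n % window] = sq
        now_above = running_sum > threshold
        if now_above and not above:                 # report only the rising edge
            detections.append(n)
        above = now_above
    return detections

# Synthetic signal (hypothetical): small baseline with two large peaks
signal = [0.1] * 20 + [0.2, 1.5, 0.3] + [0.1] * 20 + [0.2, 1.4, 0.2] + [0.1] * 20
peaks = detect_r_peaks(signal)
rr_interval = peaks[1] - peaks[0]   # R-R interval in samples
print(peaks, rr_interval)
```

The difference between consecutive detection indices gives an R-R interval in samples, which is the quantity appended to the input vector in Eq. (4.3).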

The memory block is a $640 \times 10$ bit register file (RF). Since the input data is sampled at 360 Hz, the memory can cover heart rates down to $60 \times 360 \div 640 = 33.75$ bpm. The SOM network consists of $5 \times 5$ cells, each of which has a $128 \times 10$ bit RF. Thus, it can cover heart rates up to $60 \times 360 \div 127 = 170$ bpm.
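The heart-rate arithmetic above can be checked with a short calculation. The constant and function names below are illustrative only.

```python
SAMPLE_RATE_HZ = 360     # ECG sampling frequency
MEMORY_DEPTH = 640       # entries in the preprocessing memory
COMPLEX_LENGTH = 127     # QRS complex samples stored per SOM cell

def bpm_for_period(samples_per_beat, fs=SAMPLE_RATE_HZ):
    """Heart rate in bpm whose beat period spans `samples_per_beat` samples."""
    return 60.0 * fs / samples_per_beat

lowest_bpm = bpm_for_period(MEMORY_DEPTH)      # one beat fills the whole memory
highest_bpm = bpm_for_period(COMPLEX_LENGTH)   # one beat per 127-sample complex
print(lowest_bpm, round(highest_bpm, 2))
```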

There are two modes: learning mode and operation mode. The purpose of the learning mode is to make the clustering immune to unexpected noise. Since each patient has a different environment, the learning mode helps the system cluster a QRS complex more accurately under a given noise environment. In the learning mode, the SOM network is updated, whereas it is not updated in the operation mode. In both modes, the learning controller retrieves the stored vectors centered at the peak as well as the weight vectors stored in the SOM network. Then, it calculates the Euclidean distances between the stored vectors and the weight vector of each cell in the SOM network, and compares them with each other. This process continues until the last vector of the stored QRS complex is read. After this process, the learning controller tags the input QRS complex with a cluster ID and outputs it. This is the end of the operation mode process. In the learning mode, the updating process follows the clustering. The learning controller updates the winner cell's weight vector first, and those of the neighborhood cells thereafter, depending on the learning phase. Thus, the operating principle of the architecture is serialized memory access instead of parallel access. This means the learning controller reads and writes data sequentially, one at a time. This is feasible at a low clock frequency because of the low ECG bandwidth (i.e. less than 100 Hz).

Example signal waveforms when an R-peak is detected and when the update process begins in the learning mode are shown in Figure 4.2 (a) and (b), respectively.





As shown in Figure 4.2 (a), an R-peak is detected at Memory\_write\_addr=276. The stored filtered data is read from the 63rd sample before this address to the 63rd sample after it, so that a QRS complex consisting of 127 samples can be read. The Euclidean distances between the sampled QRS complex and the weight vectors of each cell are calculated during this read phase. After the 127 samples are read, the weight vectors of the target cells are updated as shown in Figure 4.2 (b). The detailed procedures for clustering a QRS complex and updating weight vectors are described in the following sub-sections.

#### 4.3.1 Clustering Process

The clustering process is triggered by CLK and by Learning\_CLK, which is 25 times faster than CLK, so that the learning controller can calculate the Euclidean distances of all the cells (i.e. 25 cells) at each sampling. The clustering process shown in Figure 4.2 (a) is as follows:

1. The value after moving integration exceeds the threshold. (An R-peak is detected, and the R-R interval is also calculated.)
2. The address sequencer and the learning controller prepare the read addresses for the memory and the SOM network. (Note that these signals are triggered by CLK.)
3. The first memory read address is the current memory write address minus 63, so that the learning controller can read 127 samples around the R-peak sample.
4. 25 sample weight vectors are sequentially read at each CLK cycle, triggered by Learning\_CLK, so that the Euclidean distance of each cell against the current input vector can be calculated.
5. The Euclidean distance calculations are finished after 127 CLK cycles. Then, the controller outputs the number of the cell that has the minimum Euclidean distance (i.e. SOM\_memory\_read\_cell shown in Figure 4.2 (b)).
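The serialized distance calculation in the steps above can be modeled in software as one accumulator per cell, visited one cell at a time for every input element. This is a behavioral sketch only. Comparing squared distances instead of taking a square root is an assumption made here (the ordering of the cells is the same either way), and the toy weight contents are hypothetical.

```python
def cluster_sequential(input_vector, som_weights):
    """Return the winner cell index using one squared-distance accumulator per cell."""
    num_cells = len(som_weights)
    acc = [0] * num_cells                        # one accumulator per SOM cell
    for k, x_k in enumerate(input_vector):       # outer loop: one input element per CLK cycle
        for cell in range(num_cells):            # inner loop: one cell per Learning_CLK cycle
            diff = x_k - som_weights[cell][k]
            acc[cell] += diff * diff             # accumulate the squared difference
    # The cell with the smallest accumulated value is the winner
    return min(range(num_cells), key=acc.__getitem__)

# Hypothetical toy data: 25 cells, 128-element integer vectors
som = [[cell] * 128 for cell in range(25)]
x = [7] * 128
winner = cluster_sequential(x, som)
print(winner)
```

Only one weight word is read per inner-loop step, which mirrors the sequential memory access described for the architecture.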

#### 4.3.2 SOM Network Updating Process

The updating process follows the clustering process when the learning mode is active (see Figure 4.2 (b)). It is triggered by Writing\_CLK, which is 8 times faster than Learning\_CLK, because the updating phase should finish before another R-peak is detected and a factor of 8 is easily achieved with a few logic gates. In this case, Writing\_CLK is 200 times faster than CLK, which is acceptable even when all 25 cells are updated; updating then takes 16 CLK cycles. If another R-peak were detected during this 16th CLK cycle, the period of R-peaks would be 79 CLK cycles. For this case, the heart rate would have to be 274 bpm, which is unrealistic. Since the memory read access rate during the updating process differs from that during the clustering process, the memory is triggered by Learning\_CLK during the clustering process, while it is triggered by Writing\_CLK during the updating process. The whole updating process is as follows:

1. Once the winner cell is determined, the controller triggers *Read\_complete\_flag* high.
2. The address sequencer reacts to the flag signal at the next Writing\_CLK cycle (i.e. SOM\_write\_enable goes high).
3. The first input vector of the current QRS complex (e.g. at 214) and the first weight vector of the winner cell are read.
4. The new weight vector (i.e. *Data\_for\_write*) is calculated by reflecting the learning error and the current adaptation gain.
5. The new weight vector is written back in the next Writing\_CLK cycle.
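The clock ratios and the update time quoted above can be checked with a short calculation. It assumes that one weight word is written per Writing\_CLK cycle once the read and write steps overlap, which is an interpretation of the steps above rather than a stated property of the design.

```python
CLK_HZ = 360                               # sampling clock
LEARNING_CLK_HZ = 25 * CLK_HZ              # one cell per cycle, 25 cells per sample
WRITING_CLK_HZ = 8 * LEARNING_CLK_HZ       # 8 times faster than Learning_CLK

CELLS = 25
WORDS_PER_CELL = 128                       # 127 complex samples plus the R-R interval

writes_per_clk = WRITING_CLK_HZ // CLK_HZ  # Writing_CLK cycles per CLK cycle
total_writes = CELLS * WORDS_PER_CELL      # worst case: every cell is updated
# Assumption: one word is written per Writing_CLK cycle
update_clk_cycles = total_writes / writes_per_clk

# Heart rate corresponding to an R-peak period of 79 CLK cycles
bpm_at_79_cycles = 60 * CLK_HZ / 79

print(LEARNING_CLK_HZ, WRITING_CLK_HZ, writes_per_clk, update_clk_cycles)
print(round(bpm_at_79_cycles, 1))
```

Under this assumption the worst-case update of all 25 cells takes 16 CLK cycles, and a 79-cycle R-peak period corresponds to a heart rate of roughly 273 to 274 bpm.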

| F (%)          | 68.1     | 7.22 | 6.60 | 2.34     | 0.13 | 0.08 | 0.00 | 6.72 | 0.73 | 0.73 | 0.02 | 0.73 | 0.10 | 6.29 | 0.10 | 0.03 |                |       |
|----------------|----------|------|------|----------|------|------|------|------|------|------|------|------|------|------|------|------|----------------|-------|
| $\epsilon$ (%) | 0.62     | 1.40 | 2.06 | 19.2     | 19.8 | 47.1 | 100  | 5.19 | 30.3 | 30.3 | 100  | 30.3 | 26.2 | 0.48 | 26.2 | 81.6 |                |       |
| S              | -        | 0    | 0    | 0        | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 7    | 12.5           | 0.01  |
| f              | 53       | 0    | 0    | 0        | 0    | 0    | 0    | 17   | 0    | 0    | 0    | 0    | 0    | 25   | 934  | 15   | 10.5           | 0.93  |
| Ч              | 2        | 0    | 0    | 0        | 0    | 0    | 0    | 2    | 0    | 0    | 0    | 0    | 0    | 7043 | 52   | 1    | 0.80           | 6.31  |
| E              | 0        | 0    | 0    | 0        | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 62   | 0    | 0    | 0    | 0              | 0.07  |
|                | 120      | 0    | 0    | 0        | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 151  | 0    | 0    | 0    | 0    | 44.3           | 0.24  |
| е              | 0        | 0    | 0    | 0        | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 100            | 0.00  |
| q              | 0        | 14   | 4    | 0        | 0    | 0    | 0    | 20   | 20   | 469  | 0    | 0    | Η    | 0    | 0    | 0    | 7.68           | 0.45  |
| ſщ             | 22       | 0    | 0    | Η        | 0    | 0    | 0    | 25   | 575  | 0    | 0    | 1    | 0    | 0    | 0    | 0    | 7.85           | 0.55  |
| Ν              | 142      | 14   | Ч    | $\infty$ | က    | 0    | 0    | 7174 | 104  | 6    | 1    | Ч    | 0    | 0    | IJ   | 4    | 3.91           | 6.63  |
| $\mathbf{s}$   | 0        | 0    | 0    | 0        | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 100            | 0.00  |
| ſ              | -        | 0    | Н    | 0        | 0    | 46   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 4.17           | 0.04  |
| a              | $\infty$ | 0    | 0    | 0        | 121  | 0    | 0    | က    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 9.70           | 0.12  |
| Α              | 112      | 85   | 27   | 2133     | 1    | 0    | 0    | 43   | 0    | 9    | 1    | 0    | 0    | 0    | 0    | 0    | 11.4           | 2.13  |
| Я              | 17       | Η    | 7280 | 68       | 0    | 29   | 0    | 9    | 4    | 0    | 0    | ю    | 25   | 0    | 0    | 0    | 2.08           | 6.60  |
| Γ              | 0        | 8019 | Н    | 2        | 0    | 0    | 0    | 17   | 2    | 0    | 0    | 0    | H    | 0    | 0    | 1    | 0.30           | 7.14  |
| N              | 76245    | 0    | 119  | 426      | 26   | 12   | 2    | 260  | 140  | 0    | 16   | 98   |      | 6    | 66   | 10   | 1.53           | 68.8  |
|                | N        | L    | Я    | Α        | в    | Ŀ    | S    | >    |      | q    | e    | •.   | Ē    | Ч    | Ļ    | 0°   | $\epsilon$ (%) | F (%) |

Table 4.1: The SOM Clustering Results for Different Types of Beats.

## 4.4 Simulation Results

We simulated the proposed architecture using the MIT-BIH Arrhythmia database [100]. Half of the data is used as learning samples, while the other half is used for clustering. The SOM clustering results for different types of beats are tabulated in Table 4.1. For a fair comparison, we followed the table format introduced in [97]. As shown in Table 4.1, the proposed algorithm can cluster 16 different types of beats. As in [97], we could not cluster the supraventricular premature beat (S beat) and the atrial escape beat (e beat) due to the lack of learning samples in the database.

| Recording | Accuracy (%) | Recording | Accuracy (%) |
|-----------|--------------|-----------|--------------|
| 100       | 99.91        | 201       | 96.76        |
| 101       | 99.79        | 202       | 98.23        |
| 102       | 99.13        | 203       | 96.43        |
| 103       | 99.90        | 205       | 99.44        |
| 104       | 92.81        | 207       | 91.10        |
| 105       | 99.37        | 208       | 96.08        |
| 106       | 99.19        | 209       | 96.59        |
| 107       | 100.00       | 210       | 98.06        |
| 108       | 98.57        | 212       | 99.57        |
| 109       | 99.84        | 213       | 94.65        |
| 111       | 100.00       | 214       | 99.30        |
| 112       | 99.92        | 215       | 99.79        |
| 113       | 100.00       | 217       | 96.45        |
| 114       | 99.26        | 219       | 99.01        |
| 115       | 100.00       | 220       | 99.52        |
| 116       | 99.92        | 221       | 99.47        |
| 117       | 99.87        | 222       | 83.86        |
| 118       | 96.87        | 223       | 93.68        |
| 119       | 99.81        | 228       | 98.64        |
| 121       | 99.95        | 230       | 99.96        |
| 122       | 100.00       | 231       | 93.83        |
| 123       | 100.00       | 232       | 98.90        |
| 124       | 97.18        | 233       | 99.11        |
| 200       | 97.85        | 234       | 99.71        |
| Average accuracy (%) | 97.94 |  |  |

Table 4.2: SOM Clustering Results by Recordings

Table 4.2 shows the clustering results by recordings. The average clustering accuracy is 97.94%.

| Reference | Method                 | Number of clusters | Accuracy (%) |
|-----------|------------------------|--------------------|--------------|
| Proposed  | SOM with R-R interval  | 16                 | 97.94        |
| [95]      | SOM                    | 4                  | 94           |
| [97]      | SOM with Hermite basis | 16                 | 98.5         |
| [98]      | SOCMAC                 | 16                 | 98.21        |
| [99]      | SOM with MRA           | -                  | 95.25        |

Table 4.3: Comparisons with Other Algorithms

The two least accurate recordings are 222 and 207. Recording 222 has mainly N, A, and j waves. Since the numbers of samples of A and j waves in 222 are relatively small compared with the N wave (i.e. 10% of the N wave), it failed to cluster some samples of N and j waves. Recording 207 consists of L, R, A, V, b, and E waves. Similarly, the numbers of samples of R, a, V, and E waves are less than 10% of the L wave, which leads to the low clustering accuracy for 207. The comparison with other works is presented in Table 4.3. This work achieved clustering accuracy comparable to the previous works. Although [97] and [98] achieved higher accuracy, [97] did not include recording 222, while [98] failed to cluster the unclassified beat (Q beat). In addition, [97] is very difficult to implement in hardware, and [98] has a convergence issue if the SOM network is not carefully initialized. The clustering latency from when an R-peak is detected is 128 Learning\_CLK cycles (i.e. $128 \div 9 \text{ kHz} = 14.2 \text{ ms}$). The clustering results of recording 230 are shown in Figure 4.3. Similar QRS complexes are topologically mapped nearby.

## 4.5 Conclusions

In this chapter, we present a hardware architecture for ECG clustering by means of SOM as well as its implementation in 65nm CMOS LP. Filtered ECG signal samples along with their corresponding R-R interval are used as an input vector. The clustering accuracy is 97.94% using only the original SOM algorithm. The chip layout is shown in Figure 4.4, and the summary is tabulated in Table 4.4. The area of the proposed SoC is $1,735 \times 1,020 \ \mu m^2$, and the core size is $1,475 \times 760 \ \mu m^2$. The total average power consumption is 5.853 mW at VDD=1.2V. The comparison with the state of the art is tabulated in Table 4.5.

Figure 4.3: Example results of SOM clustering.

| Technology               | 65nm CMOS technology                  |
|--------------------------|---------------------------------------|
| Die Size                 | $1,735 \times 1,020 \ \mu m^2$        |
|                          | $1,475 \times 760 \ \mu m^2$ (core)   |
| Operating Supply Voltage | 1.2 V                                 |
| Component Area           | $3,812 \ \mu m^2$ (address sequencer) |
|                          | $6,930 \ \mu m^2$ (FIR filter)        |
|                          | $19,944 \ \mu m^2$ (learning block)   |
|                          | $91,663 \ \mu m^2$ (memory)           |
|                          | $2,689 \ \mu m^2$ (R-peak detector)   |
|                          | $828,800 \ \mu m^2$ (SOM network)     |
| Operating Frequency      | 360 Hz (CLK)                          |
|                          | 9 kHz (Learning\_CLK)                 |
|                          | 72 kHz (Writing\_CLK)                 |
| Total Average Power      | 5.853 mW                              |

Table 4.4: ECG Clustering SOM SoC Summary



Figure 4.4: The proposed ECG clustering SOM chip layout.

| Technology       | 65nm CMOS                 | 800nm CMOS               | 650nm CMOS               | FPGA (masked 350nm standard-cell) | 180nm CMOS                       | 180nm CMOS                           | FPGA           |
|------------------|---------------------------|--------------------------|--------------------------|-----------------------------------|----------------------------------|--------------------------------------|----------------|
| Clock Freq.      | 360 Hz, 9 kHz, 72 kHz     | 45 MHz                   | 45 MHz                   | 100 MHz                           | 1 kHz                            | 50 MHz                               | 33 MHz         |
| Area             | **1.12** $mm^2$ (core)    | $28.58 \ mm^2$           | $21.16 \ mm^2$           | -                                 | $5 \ mm^2$                       | $1.6 \ mm^2$ (SOM)                   | -              |
| Power            | 5.853 mW                  | 425 mW                   | 50 mW                    | -                                 | 10 mW (10 MHz), 10 $\mu$W (1 kHz) | \*12.2 mW (SOM), 630 mW (full chip) | -              |
| Architecture     | Custom ASIC               | SIMD NBX neuro-processor | SIMD NBX neuro-processor | Custom                            | Custom ASIC                      | Custom ASIC                          | Custom         |
| Vector Dimension | 256                       | 128                      | 16                       | 16                                | -                                | 16                                   | 3              |
| Map Size         | $5 \times 5$              | $16 \times 8$            | 256                      | 256                               | 130 (w/o memory)                 | $16 \times 16$                       | $16 \times 16$ |
| Reference        | This Work                 | [101]                    | [102]                    | [103]                             | [104]                            | [105]                                | [106]          |

### CHAPTER V

## Conclusions

This work focused on ultra-low-power SoC design techniques for BSN applications, especially ECG sensing and clustering. Duty-cycling is critical to achieving ultra-low power in an SoC. There are two main methods to increase the degree of duty-cycling (i.e., to shorten the time the radio is active): Method 1, reducing the size of the transferred data by compression or feature extraction, and Method 2, increasing the transmission rate (i.e., the RF carrier frequency).
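The effect of the two methods on radio on-time can be illustrated with a minimal sketch. It assumes a simple model in which the radio is active only while a payload is being sent once per reporting period; the function names and all numbers below are hypothetical and are not taken from the prototype.

```python
def active_time_s(payload_bits: int, data_rate_bps: float) -> float:
    """Time the radio must stay on to send one payload (simplified model)."""
    if data_rate_bps <= 0:
        raise ValueError("data rate must be positive")
    return payload_bits / data_rate_bps


def duty_cycle(payload_bits: int, data_rate_bps: float, period_s: float) -> float:
    """Fraction of each reporting period during which the radio is active."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return active_time_s(payload_bits, data_rate_bps) / period_s


# Hypothetical operating points (illustrative only).
baseline = duty_cycle(payload_bits=10_000, data_rate_bps=100_000, period_s=1.0)

# Method 1: send fewer bits (compression or feature extraction).
method_1 = duty_cycle(payload_bits=2_500, data_rate_bps=100_000, period_s=1.0)

# Method 2: send the same bits at a higher transmission rate.
method_2 = duty_cycle(payload_bits=10_000, data_rate_bps=400_000, period_s=1.0)

print(f"baseline: {baseline:.4f}, method 1: {method_1:.4f}, method 2: {method_2:.4f}")
```

In this model, a fourfold reduction in payload and a fourfold increase in data rate shorten the active time by the same factor, which is why the two methods can be combined.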

Design techniques for an ultra-low-power bio-signal sensing SoC are described in Chapter II. To maximize duty-cycling, the 2.4 GHz ISM radio band was adopted (Method 2). Although there are other options, such as the 5.8 GHz band, the 2.4 GHz band is regarded as the better option in terms of communication range. In other words, under the same range constraint, the 2.4 GHz band requires less power than the 5.8 GHz band, since the wavelength at 5.8 GHz is roughly half of the wavelength at 2.4 GHz and free-space path loss increases as the wavelength decreases. In addition, the 2.4 GHz band is well established in terms of its infrastructure and products. Moreover, the prototype SoC provides five user-selectable operation modes so that further power savings can be achieved. Multiple power domains for the analog (1.0 V), digital (0.55 V), and RF analog blocks (1.2 V) are adopted to maximize power savings. A low supply voltage is applied to the always-powered-on blocks (i.e., the digital blocks), while the nominal supply voltage is applied to the RF blocks, since their current is closely tied to the communication range and that current is proportional to the supply voltage. Thanks to these low-power techniques and aggressive duty-cycling (0.2526%), the prototype SoC consumes only 6.1 $\mu$W in full operation mode and 912 nW in standby mode.
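The range argument for 2.4 GHz over 5.8 GHz can be made concrete with the standard free-space path loss formula. The sketch below assumes ideal isotropic antennas and free-space propagation, which is a simplification for a body-worn node; the 10 m distance is a hypothetical value chosen only for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def wavelength_m(freq_hz: float) -> float:
    """Free-space wavelength for a given carrier frequency."""
    return SPEED_OF_LIGHT_M_S / freq_hz


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB between isotropic antennas."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength_m(freq_hz))


distance_m = 10.0  # hypothetical link distance
loss_2g4 = fspl_db(distance_m, 2.4e9)
loss_5g8 = fspl_db(distance_m, 5.8e9)
extra_loss_db = loss_5g8 - loss_2g4

print(f"wavelength at 2.4 GHz: {wavelength_m(2.4e9) * 100:.1f} cm")
print(f"wavelength at 5.8 GHz: {wavelength_m(5.8e9) * 100:.1f} cm")
print(f"extra path loss at 5.8 GHz: {extra_loss_db:.2f} dB")
```

Under these assumptions the extra loss depends only on the frequency ratio, not on the distance, so the 5.8 GHz link would need correspondingly more transmit power (or antenna gain) to cover the same range.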

We discussed a robust SRAM bit cell design for near-threshold operation in Chapter III. This work is essential to achieving an ultra-low-power BSN SoC, since the SRAM is regarded as the most vulnerable block when the supply voltage is scaled down. A novel 12T SRAM bit cell was proposed and compared with the conventional 6T and 8T bit cells as well as a 10T bit cell. At VDD = 550 mV, the proposed 12T SRAM bit cell is more robust than the other cells in terms of static noise margin (i.e., WSNM, CWWM, and BLWM) and dynamic noise margin. The proposed bit cell is 1.96 times and 1.74 times larger in area than the 6T and 8T bit cells, respectively. Compared to the 10T cell, the 12T bit cell has less than 7% area overhead.

Finally, an energy-efficient hardware architecture of a self-organizing map (SOM) for ECG clustering is discussed in Chapter IV. This work further increases the degree of duty-cycling through self-diagnosis, so that the SoC does not need to transmit any signal unless it detects abnormalities in a patient (Method 1). The hardware consists of a pre-processing block and a SOM block. It detects an R-peak, reconstructs the QRS complex around it, and clusters the complex by calculating the Euclidean distance between the complex and the weight vector of each cell in the SOM network (i.e., $5\times5$ cells). In the operation mode, the cluster ID associated with the minimum Euclidean distance is provided, while the tagged weight vectors are updated in the learning mode. The proposed SoC is $1,735\times1,020 \ \mu m^2$ in 65nm LP CMOS, and it consumes 5.853 mW at VDD = 1.2 V.
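The clustering step summarized above can be sketched in software as follows. This is a behavioral illustration only, not the hardware implementation: the map size ($5\times5$) and vector dimension (256) follow the text and Table 4.5, while the random weights, the learning rate, and the winner-only update in `learn_step` are simplifying assumptions (the update ignores any neighborhood or tagging scheme).

```python
import math
import random

MAP_ROWS, MAP_COLS = 5, 5   # 5x5 SOM cells, as stated in the text
VECTOR_DIM = 256            # vector dimension listed for this work in Table 4.5


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_cell(weights, sample):
    """Operation mode: return (cluster_id, Euclidean distance) of the closest cell."""
    best_id = min(range(len(weights)),
                  key=lambda i: squared_distance(weights[i], sample))
    return best_id, math.sqrt(squared_distance(weights[best_id], sample))


def learn_step(weights, sample, rate=0.1):
    """Learning mode (simplified assumption): move only the winning cell toward the sample."""
    winner, _ = best_matching_cell(weights, sample)
    weights[winner] = [w + rate * (x - w) for w, x in zip(weights[winner], sample)]
    return winner


# Hypothetical data: random weights and one random stand-in for a QRS complex.
rng = random.Random(0)
weights = [[rng.random() for _ in range(VECTOR_DIM)]
           for _ in range(MAP_ROWS * MAP_COLS)]
sample = [rng.random() for _ in range(VECTOR_DIM)]

cluster_id, dist_before = best_matching_cell(weights, sample)
learn_step(weights, sample)
_, dist_after = best_matching_cell(weights, sample)
print(f"cluster ID: {cluster_id}, distance before/after learning: "
      f"{dist_before:.3f} / {dist_after:.3f}")
```

The minimum search does not need the square root, so the sketch compares squared distances and takes the root only for the reported value.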

#### 5.1 Related Publications and Patents

- Jaeyoung Kim and Pinaki Mazumder, "Energy-Efficient Hardware Architecture of Self-Organizing Map (SOM) for ECG Clustering in 65nm CMOS", *IEEE Transactions on Circuits and Systems II: Express Briefs* (under review)
- Jaeyoung Kim, Nan Zheng, Yalcin Yilmaz, and Pinaki Mazumder, "A 2.4 GHz ISM-Band 6.1 $\mu$W 10-bit Wireless Sensor Node Chip Design for Ultralow-voltage Bio-signal Sensing Applications with On/Off Keying Modulation in 65nm CMOS", *Integration, the VLSI Journal* (under review)
- Jaeyoung Kim and Pinaki Mazumder, "A Robust 12T SRAM Cell with Improved Write Margin for Ultra-Low Power Applications in 40nm CMOS", Integration, the VLSI Journal, vol. 57, pp. 1-10, 2017. [18]
- Nan Zheng, Jaeyoung Kim, and Pinaki Mazumder, "A Low-Power Reconfigurable CMOS Power Amplifier for Wireless Sensor Network Applications", Proc. IEEE Int. Symp. Circuits and Systems, May 2014. [57]
- Jaeyoung Kim, Kwen-Siong Chong, Joseph S. Chang, and Pinaki Mazumder, "A 250mV Sub-threshold Asynchronous 8051 Microcontroller with a Novel 16T SRAM Cell for Improved Reliability in 40nm CMOS", Proceedings of the 23rd ACM International Conference on Great Lakes Symposium on VLSI, Paris, pp. 83-88, 2013. [17]
- Kwen-Siong Chong, Mahmood Barangi, Jaeyoung Kim, Joseph S. Chang, and Pinaki Mazumder, "Ultra Low-Power Filter Bank for Hearing Aid Speech Processor", 2012 IEEE Subthreshold Microelectronics Conference, Waltham, MA, 2012. [55]
- Pinaki Mazumder, Jaeyoung Kim, Nan Zheng, "Static random access memory cell having improved write margin for use in ultra-low power application", (WO2015102569/US20160254045), published on September 1, 2016.

# BIBLIOGRAPHY

- [1] World Health Organization, "The top 10 causes of death," 2014.
- [2] National Center for Health Statistics, Centers for Disease Control and Prevention, *Health, United States, 2013, with special feature on prescription drugs*. Government Printing Office, 2015.
- [3] A. D. Waller, "A demonstration on man of electromotive changes accompanying the heart's beat," *The Journal of physiology*, vol. 8, no. 5, p. 229, 1887.
- [4] G. J. Pottie and W. J. Kaiser, "Wireless integrated network sensors," Commun. ACM, vol. 43, pp. 51–58, May 2000.
- [5] G. Asada, M. Dong, T. Lin, F. Newberg, G. Pottie, W. Kaiser, and H. Marcy, "Wireless integrated network sensors: Low power systems on a chip," in *Solid-State Circuits Conference*, 1998. ESSCIRC '98. Proceedings of the 24th European, pp. 9 – 16, sept. 1998.
- [6] M. Seok, S. Hanson, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "The phoenix processor: A 30pw platform for sensor applications," in VLSI Circuits, 2008 IEEE Symposium on, pp. 188–189, june 2008.
- [7] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, "Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems," in VLSI, 1997. Proceedings. Seventh Great Lakes Symposium on, pp. 77–82, mar 1997.
- [8] B. Stackhouse, S. Bhimji, C. Bostak, D. Bradley, B. Cherkauer, J. Desai, E. Francom, M. Gowan, P. Gronowski, D. Krueger, C. Morganti, and S. Troyer, "A 65 nm 2-billion transistor quad-core itanium processor," *Solid-State Circuits, IEEE Journal of*, vol. 44, pp. 18–31, jan. 2009.
- [9] T. Kuroda, K. Suzuki, S. Mita, T. Fujita, F. Yamane, F. Sano, A. Chiba, Y. Watanabe, K. Matsuda, T. Maeda, T. Sakurai, and T. Furuyama, "Variable supply-voltage scheme for low-power high-speed cmos digital design," *Solid-State Circuits, IEEE Journal of*, vol. 33, pp. 454–462, mar 1998.
- [10] M. Nakai, S. Akui, K. Seno, T. Meguro, T. Seki, T. Kondo, A. Hashiguchi, H. Kawahara, K. Kumano, and M. Shimura, "Dynamic voltage and frequency management for a low-power embedded microprocessor," *Solid-State Circuits*, *IEEE Journal of*, vol. 40, pp. 28–35, Jan 2005.

- [11] A. Drake, R. Senger, H. Deogun, G. Carpenter, S. Ghiasi, T. Nguyen, N. James, M. Floyd, and V. Pokala, "A distributed critical-path timing monitor for a 65nm high-performance microprocessor," in *Solid-State Circuits Conference*, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, pp. 398 -399, feb. 2007.
- [12] J. Tschanz, K. Bowman, S.-L. Lu, P. Aseron, M. Khellah, A. Raychowdhury, B. Geuskens, C. Tokunaga, C. Wilkerson, T. Karnik, and V. De, "A 45nm resilient and adaptive microprocessor core for dynamic variation tolerance," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International*, pp. 282–283, feb. 2010.
- [13] S. Das, C. Tokunaga, S. Pant, W.-H. Ma, S. Kalaiselvan, K. Lai, D. Bull, and D. Blaauw, "Razorii: In situ error detection and correction for pvt and ser tolerance," *Solid-State Circuits, IEEE Journal of*, vol. 44, pp. 32–48, jan. 2009.
- [14] R. Franch, P. Restle, N. James, W. Huott, J. Friedrich, R. Dixon, S. Weitzel, K. Van Goor, and G. Salem, "On-chip timing uncertainty measurements on ibm microprocessors," in *Test Conference*, 2007. ITC 2007. IEEE International, pp. 1–7, oct. 2007.
- [15] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in *Proceedings of the 40th annual Design Automation Conference*, DAC '03, (New York, NY, USA), pp. 338–342, ACM, 2003.
- [16] R. Jorgenson, L. Sorensen, D. Leet, M. Hagedorn, D. Lamb, T. Friddell, and W. Snapp, "Ultralow-power operation in subthreshold regimes applying clockless logic," *Proceedings of the IEEE*, vol. 98, pp. 299–314, feb. 2010.
- [17] J. Kim, K.-S. Chong, J. S. Chang, and P. Mazumder, "A 250mv sub-threshold asynchronous 8051 microcontroller with a novel 16t sram cell for improved reliability in 40nm cmos," in *Proceedings of the 23rd ACM International Conference on Great Lakes Symposium on VLSI*, GLSVLSI '13, (New York, NY, USA), pp. 83–88, ACM, 2013.
- [18] J. Kim and P. Mazumder, "A robust 12t sram cell with improved write margin for ultra-low power applications in 40 nm cmos," *Integration, the VLSI Journal*, vol. 57, pp. 1 – 10, 2017.
- [19] T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, pp. 1464–1480, Sep 1990.
- [20] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless sensor networks for habitat monitoring," in *Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications*, WSNA '02, (New York, NY, USA), pp. 88–97, ACM, 2002.

- [21] H. Kung and D. Vlah, "Efficient location tracking using sensor networks," in Wireless Communications and Networking, 2003. WCNC 2003. 2003 IEEE, vol. 3, pp. 1954–1961 vol.3, March 2003.
- [22] S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser, and M. Turon, "Health monitoring of civil infrastructures using wireless sensor networks," in *Information Processing in Sensor Networks, 2007. IPSN 2007. 6th International* Symposium on, pp. 254–263, April 2007.
- [23] U. Lee, B. Zhou, M. Gerla, E. Magistretti, P. Bellavista, and A. Corradi, "Mobeyes: smart mobs for urban monitoring with a vehicular sensor network," *Wireless Communications, IEEE*, vol. 13, pp. 52–57, October 2006.
- [24] V. Gungor and G. Hancke, "Industrial wireless sensor networks: Challenges, design principles, and technical approaches," *Industrial Electronics, IEEE Transactions on*, vol. 56, pp. 4258–4265, Oct 2009.
- [25] Z. Yiming, Y. Xianglong, G. Xishan, Z. Mingang, and W. Liren, "A design of greenhouse monitoring control system based on zigbee wireless sensor network," in Wireless Communications, Networking and Mobile Computing, 2007. WiCom 2007. International Conference on, pp. 2563–2567, Sept 2007.
- [26] G.-Z. Yang and M. Yacoub, *Body sensor networks*. Springer, 2006.
- [27] C. Otto, A. Milenković, C. Sanders, and E. Jovanov, "System architecture of a wireless body area sensor network for ubiquitous health monitoring," J. Mob. Multimed., vol. 1, pp. 307–326, Jan. 2005.
- [28] J. Penders, B. Gyselinckx, R. Vullers, M. De Nil, V. Nimmala, J. van de Molengraft, F. Yazicioglu, T. Torfs, V. Leonov, P. Merken, and C. Van Hoof, "Human++: From technology to emerging health monitoring concepts," in *Medical Devices and Biosensors*, 2008. ISSS-MDBS 2008. 5th International Summer School and Symposium on, pp. 94–98, June 2008.
- [29] S.-L. Chen, H.-Y. Lee, C.-A. Chen, H.-Y. Huang, and C.-H. Luo, "Wireless body sensor network with adaptive low-power design for biometrics and healthcare applications," *Systems Journal*, *IEEE*, vol. 3, pp. 398–409, Dec 2009.
- [30] J. Yoo, L. Yan, S. Lee, Y. Kim, and H.-J. Yoo, "A 5.2 mw self-configured wearable body sensor network controller and a 12 μw wirelessly powered sensor for a continuous health monitoring system," *Solid-State Circuits, IEEE Journal* of, vol. 45, pp. 178–188, Jan 2010.
- [31] H. Eyre, R. Kahn, R. M. Robertson, N. G. Clark, C. Doyle, T. Gansler, T. Glynn, Y. Hong, R. A. Smith, K. Taubert, *et al.*, "Preventing cancer, cardiovascular disease, and diabetes: a common agenda for the american cancer society, the american diabetes association, and the american heart association," *CA: a cancer journal for clinicians*, vol. 54, no. 4, pp. 190–207, 2004.

- [32] K. Dickstein, A. Cohen-Solal, G. Filippatos, J. J. McMurray, P. Ponikowski, P. A. Poole-Wilson, A. Strömberg, D. J. Veldhuisen, D. Atar, A. W. Hoes, et al., "Esc guidelines for the diagnosis and treatment of acute and chronic heart failure 2008," *European journal of heart failure*, vol. 10, no. 10, pp. 933– 989, 2008.
- [33] J. B. Soriano, J. Zielinski, and D. Price, "Screening for and early detection of chronic obstructive pulmonary disease," *The Lancet*, vol. 374, no. 9691, pp. 721 - 732, 2009.
- [34] F. Locatelli, L. Del Vecchio, and P. Pozzoni, "The importance of early detection of chronic kidney disease," *Nephrology Dialysis Transplantation*, vol. 17, no. suppl 11, pp. 2–7, 2002.
- [35] D. Callahan, Setting limits: medical goals in an aging society with a response to my critics. Georgetown University Press, 1995.
- [36] H. Kehlet and J. B. Dahl, "Anaesthesia, surgery, and challenges in postoperative recovery," *The Lancet*, vol. 362, no. 9399, pp. 1921 – 1928, 2003.
- [37] G. Yu, J. Gao, J. C. Hummelen, F. Wudl, and A. J. Heeger, "Polymer photovoltaic cells: Enhanced efficiencies via a network of internal donor-acceptor heterojunctions," *Science*, vol. 270, no. 5243, pp. 1789–1791, 1995.
- [38] B. Gyselinckx, C. Van Hoof, J. Ryckaert, R. F. Yazicioglu, P. Fiorini, and V. Leonov, "Human++: autonomous wireless sensors for body area networks," in *Custom Integrated Circuits Conference*, 2005. Proceedings of the IEEE 2005, pp. 13–19, Sept 2005.
- [39] V. Leonov, T. Torfs, P. Fiorini, and C. Van Hoof, "Thermoelectric converters of human warmth for self-powered wireless sensor nodes," *Sensors Journal, IEEE*, vol. 7, no. 5, pp. 650–657, 2007.
- [40] N. S. Shenck and J. A. Paradiso, "Energy scavenging with shoe-mounted piezoelectrics," *IEEE micro*, vol. 21, no. 3, pp. 30–42, 2001.
- [41] A. S. Holmes, G. Hong, K. R. Pullen, and K. R. Buffard, "Axial-flow microturbine with electromagnetic generator: design, cfd simulation, and prototype demonstration," in *Micro Electro Mechanical Systems*, 2004. 17th IEEE International Conference on. (MEMS), pp. 568–571, IEEE, 2004.
- [42] S. Meninger, J. O. Mur-Miranda, R. Amirtharajah, A. P. Chandrakasan, and J. H. Lang, "Vibration-to-electric energy conversion," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 9, no. 1, pp. 64–76, 2001.
- [43] P. D. Mitcheson, T. C. Green, E. M. Yeatman, and A. S. Holmes, "Architectures for vibration-driven micropower generators," *Microelectromechanical Systems*, *Journal of*, vol. 13, no. 3, pp. 429–440, 2004.

- [44] A. Lal and J. Blanchard, "Daintiest dynamos [nuclear microbatteries]," Spectrum, IEEE, vol. 41, no. 9, pp. 36–41, 2004.
- [45] H. Kim, S. Kim, N. Van Helleputte, A. Artes, M. Konijnenburg, J. Huisken, C. Van Hoof, and R. F. Yazicioglu, "A configurable and low-power mixed signal soc for portable ecg monitoring applications," *Biomedical Circuits and Systems*, *IEEE Transactions on*, vol. 8, no. 2, pp. 257–267, 2014.
- [46] N. Verma, A. Shoeb, J. Bohorquez, J. Dawson, J. Guttag, and A. P. Chandrakasan, "A micro-power eeg acquisition soc with integrated feature extraction processor for a chronic seizure detection system," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 4, pp. 804–816, 2010.
- [47] G. K. Chen, M. Fojtik, D. Kim, D. Fick, J. Park, M. Seok, M.-T. Chen, Z. Foo, D. Sylvester, and D. Blaauw, "Millimeter-scale nearly perpetual sensor system with stacked battery and solar cells," in *ISSCC*, vol. 10, pp. 288–289, 2010.
- [48] F. Zhang, Y. Zhang, J. Silver, Y. Shakhsheer, M. Nagaraju, A. Klinefelter, J. Pandey, J. Boley, E. Carlson, A. Shrivastava, et al., "A batteryless 19μw mics/ism-band energy harvesting body area sensor node soc," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International, pp. 298–300, IEEE, 2012.
- [49] D. Jeon, Y.-P. Chen, Y. Lee, Y. Kim, Z. Foo, G. Kruger, H. Oral, O. Berenfeld, Z. Zhang, D. Blaauw, et al., "24.3 an implantable 64nw ecg-monitoring mixed-signal soc for arrhythmia diagnosis," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International*, pp. 416–417, IEEE, 2014.
- [50] L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, et al., "Stable sram cell design for the 32 nm node and beyond," in VLSI Technology, 2005. Digest of Technical Papers. 2005 Symposium on, pp. 128–129, IEEE, 2005.
- [51] R. Barrett, E. Callaway, and J. Gutierrez, "IEEE 802.15.4 low-rate wireless personal area networks: Enabling wireless sensor networks," *Inst of Elect & Electronic*, 2003.
- [52] A. M. Abo and P. R. Gray, "A 1.5-v, 10-bit, 14.3-ms/s cmos pipeline analog-to-digital converter," *Solid-State Circuits, IEEE Journal of*, vol. 34, no. 5, pp. 599–606, 1999.
- [53] J. L. McCreary and P. R. Gray, "All-mos charge redistribution analog-to-digital conversion techniques. i," *Solid-State Circuits, IEEE Journal of*, vol. 10, no. 6, pp. 371–379, 1975.
- [54] M. Miyahara, Y. Asada, D. Paik, and A. Matsuzawa, "A low-noise self-calibrating dynamic comparator for high-speed adcs," in *Solid-State Circuits Conference, 2008. A-SSCC'08. IEEE Asian*, pp. 269–272, IEEE, 2008.

- [55] K.-S. Chong, M. Barangi, J. Kim, J. Chang, and P. Mazumder, "Ultra low-power filter bank for hearing aid speech processor," in *Subthreshold Microelectronics Conference (SubVT)*, 2012 IEEE, pp. 1–3, Oct 2012.
- [56] L. Chang, D. Fried, J. Hergenrother, J. Sleight, R. Dennard, R. Montoye, L. Sekaric, S. McNab, A. Topol, C. Adams, K. Guarini, and W. Haensch, "Stable sram cell design for the 32 nm node and beyond," in VLSI Technology, 2005. Digest of Technical Papers. 2005 Symposium on, pp. 128–129, June 2005.
- [57] N. Zheng, J. Kim, and P. Mazumder, "A low-power reconfigurable cmos power amplifier for wireless sensor network applications," in *Circuits and Systems* (ISCAS), 2014 IEEE International Symposium on, pp. 1086–1089, IEEE, 2014.
- [58] "International technology roadmap for semiconductors," tech. rep., 2011.
- [59] K. Nowka, G. Carpenter, E. MacDonald, H. Ngo, B. Brock, K. Ishii, T. Nguyen, and J. Burns, "A 32-bit powerpc system-on-a-chip with support for dynamic voltage scaling and dynamic frequency scaling," *Solid-State Circuits, IEEE Journal of*, vol. 37, pp. 1441–1447, Nov 2002.
- [60] J. Rabaey, J. Ammer, T. Karalar, S. Li, B. Otis, M. Sheets, and T. Tuan, "Picoradios for wireless sensor networks: the next challenge in ultra-low power design," in *Solid-State Circuits Conference*, 2002. Digest of Technical Papers. ISSCC. 2002 IEEE International, vol. 1, pp. 200–201 vol.1, Feb 2002.
- [61] A. Wang and A. Chandrakasan, "A 180mv fft processor using subthreshold circuit techniques," in Solid-State Circuits Conference, 2004. Digest of Technical Papers. ISSCC. 2004 IEEE International, pp. 292–529 Vol.1, Feb 2004.
- [62] A. Chandrakasan, S. Sheng, and R. Brodersen, "Low-power cmos digital design," *Solid-State Circuits, IEEE Journal of*, vol. 27, pp. 473–484, Apr 1992.
- [63] Q. Wu, M. Pedram, and X. Wu, "Clock-gating and its application to low power design of sequential circuits," *Circuits and Systems I: Fundamental Theory and Applications, IEEE Transactions on*, vol. 47, pp. 415–420, Mar 2000.
- [64] H. Zhang, V. George, and J. Rabaey, "Low-swing on-chip signaling techniques: effectiveness and robustness," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 8, pp. 264–272, June 2000.
- [65] R. Krishnamurthy, S. Hsu, M. Anders, B. Bloechel, B. Chatterjee, M. Sachdev, and S. Borkar, "Dual supply voltage clocking for 5 ghz 130 nm integer execution core," in VLSI Circuits Digest of Technical Papers, 2002. Symposium on, pp. 128–129, June 2002.
- [66] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A stable 2-port sram cell design against simultaneously read/write-disturbed accesses," *Solid-State Circuits, IEEE Journal of*, vol. 43, pp. 2109–2119, Sept 2008.

- [67] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "An area-conscious low-voltage-oriented 8t-sram design under dvs environment," in VLSI Circuits, 2007 IEEE Symposium on, pp. 256–257, June 2007.
- [68] N. Verma and A. Chandrakasan, "A 65nm 8t sub-vt sram employing sense-amplifier redundancy," in *Solid-State Circuits Conference*, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, pp. 328–606, Feb 2007.
- [69] R. Joshi, R. Houle, K. Batson, D. Rodko, P. Patel, W. Huott, R. Franch, Y. Chan, D. Plass, S. Wilson, and P. Wang, "6.6+ ghz low vmin, read and half select disturb-free 1.2 mb sram," in VLSI Circuits, 2007 IEEE Symposium on, pp. 250–251, June 2007.
- [70] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A read-static-noise-margin-free sram cell for low-vdd and high-speed applications," *Solid-State Circuits, IEEE Journal of*, vol. 41, pp. 113–121, Jan 2006.
- [71] B. Calhoun and A. Chandrakasan, "A 256kb sub-threshold sram in 65nm cmos," in Solid-State Circuits Conference, 2006. ISSCC 2006. Digest of Technical Papers. IEEE International, pp. 2592–2601, Feb 2006.
- [72] T.-H. Kim, J. Liu, J. Keane, and C. Kim, "A high-density subthreshold sram with data-independent bitline leakage and virtual ground replica scheme," in *Solid-State Circuits Conference*, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, pp. 330–606, Feb 2007.
- [73] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy, "A 32kb 10t subthreshold sram array with bit-interleaving and differential read scheme in 90nm cmos," in *Solid-State Circuits Conference*, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, pp. 388–622, Feb 2008.
- [74] Z. Liu and V. Kursun, "Characterization of a novel nine-transistor sram cell," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 16, pp. 488–492, April 2008.
- [75] S. Verkila, S. Bondada, and B. Amrutur, "A 100mhz to 1ghz, 0.35v to 1.5v supply 256 x 64 sram block using symmetrized 9t sram cell with controlled read," in VLSI Design, 2008. VLSID 2008. 21st International Conference on, pp. 560–565, Jan 2008.
- [76] H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, and M. Yoshimoto, "A 10t non-precharge two-port sram for 74% power reduction in video processing," in VLSI, 2007. ISVLSI '07. IEEE Computer Society Annual Symposium on, pp. 107–112, March 2007.

- [77] J. Chen, L. Clark, and C. Tai-Hua, "An ultra-low-power memory with a subthreshold power supply voltage," *Solid-State Circuits, IEEE Journal of*, vol. 41, pp. 2344–2353, Oct 2006.
- [78] M.-F. Chang, S.-W. Chang, P.-W. Chou, and W.-C. Wu, "A 130 mv sram with expanded write and read margins for subthreshold applications," *Solid-State Circuits, IEEE Journal of*, vol. 46, pp. 520–529, Feb 2011.
- [79] A. Teman, L. Pergament, O. Cohen, and A. Fish, "A 250 mv 8 kb 40 nm ultralow power 9t supply feedback sram (sf-sram)," *IEEE Journal of Solid-State Circuits*, vol. 46, pp. 2713–2726, Nov 2011.
- [80] S. Jain, S. Khare, S. Yada, V. Ambili, P. Salihundam, S. Ramani, S. Muthukumar, M. Srinivasan, A. Kumar, S. K. Gb, R. Ramanarayanan, V. Erraguntla, J. Howard, S. Vangal, S. Dighe, G. Ruhl, P. Aseron, H. Wilson, N. Borkar, V. De, and S. Borkar, "A 280mv-to-1.2v wide-operating-range ia-32 processor in 32nm cmos," in 2012 IEEE International Solid-State Circuits Conference, pp. 66–68, Feb 2012.
- [81] Y.-W. Chiu, Y.-H. Hu, M.-H. Tu, J.-K. Zhao, Y.-H. Chu, S.-J. Jou, and C.-T. Chuang, "40 nm bit-interleaving 12t subthreshold sram with data-aware write-assist," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 61, pp. 2578–2585, Sept 2014.
- [82] P. Meinerzhagen, S. M. Y. Sherazi, A. Burg, and J. N. Rodrigues, "Benchmarking of standard-cell based memories in the sub-v<sub>T</sub> domain in 65-nm cmos technology," *IEEE Journal on Emerging and Selected Topics in Circuits and* Systems, vol. 1, pp. 173–182, June 2011.
- [83] J. Kim, K.-S. Chong, J. S. Chang, and P. Mazumder, "A 250mv sub-threshold asynchronous 8051 microcontroller with a novel 16t sram cell for improved reliability in 40nm cmos," in *Proceedings of the 23rd ACM International Conference on Great Lakes Symposium on VLSI*, GLSVLSI '13, (New York, NY, USA), pp. 83–88, ACM, 2013.
- [84] E. Seevinck, F. List, and J. Lohstroh, "Static-noise margin analysis of mos sram cells," *Solid-State Circuits, IEEE Journal of*, vol. 22, pp. 748–754, Oct 1987.
- [85] A. Bhavnagarwala, S. Kosonocky, C. Radens, K. Stawiasz, R. Mann, Q. Ye, and K. Chin, "Fluctuation limits & scaling opportunities for cmos sram cells," in *Electron Devices Meeting*, 2005. IEDM Technical Digest. IEEE International, pp. 659–662, Dec 2005.
- [86] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, and M. Bohr, "A 3-ghz 70-mb sram in 65-nm cmos technology with integrated column-based dynamic power supply," *Solid-State Circuits*, *IEEE Journal of*, vol. 41, pp. 146–151, Jan 2006.

- [87] K. Takeda, H. Ikeda, Y. Hagihara, M. Nomura, and H. Kobatake, "Redefinition of write margin for next-generation sram and write-margin monitoring circuit," in *Solid-State Circuits Conference*, 2006. ISSCC 2006. Digest of Technical Papers. IEEE International, pp. 2602–2611, Feb 2006.
- [88] N. Gierczynski, B. Borot, N. Planes, and H. Brut, "A new combined methodology for write-margin extraction of advanced sram," in *Microelectronic Test Structures, 2007. ICMTS '07. IEEE International Conference on*, pp. 97–100, March 2007.
- [89] H. Makino, S. Nakata, H. Suzuki, S. Mutoh, M. Miyama, T. Yoshimura, S. Iwade, and Y. Matsuda, "Reexamination of sram cell write margin definitions in view of predicting the distribution," *Circuits and Systems II: Express Briefs, IEEE Transactions on*, vol. 58, pp. 230–234, April 2011.
- [90] J. Wang, S. Nalam, and B. H. Calhoun, "Analyzing static and dynamic write margin for nanometer srams," in *Low Power Electronics and Design (ISLPED)*, 2008 ACM/IEEE International Symposium on, pp. 129–134, Aug 2008.
- [91] P. Kligfield, L. S. Gettes, J. J. Bailey, R. Childers, B. J. Deal, E. W. Hancock, G. van Herpen, J. A. Kors, P. Macfarlane, D. M. Mirvis, O. Pahlm, P. Rautaharju, and G. S. Wagner, "Recommendations for the standardization and interpretation of the electrocardiogram: part i: The electrocardiogram and its technology: a scientific statement from the american heart association electrocardiography and arrhythmias committee, council on clinical cardiology; the american college of cardiology foundation; and the heart rhythm society: endorsed by the international society for computerized electrocardiology," *Journal of the American College of Cardiology*, vol. 49, no. 10, pp. 1109–1127, 2007.
- [92] R. V. Luepker, J. M. Raczynski, S. Osganian, et al., "Effect of a community intervention on patient delay and emergency medical service use in acute coronary heart disease: The rapid early action for coronary treatment (react) trial," *JAMA*, vol. 284, no. 1, pp. 60–67, 2000.
- [93] B. Hedén, H. Öhlin, R. Rittner, and L. Edenbrandt, "Acute myocardial infarction detected in the 12-lead ecg by artificial neural networks," *Circulation*, vol. 96, no. 6, pp. 1798–1802, 1997.
- [94] G. A. Carpenter and S. Grossberg, Adaptive resonance theory. Springer, 2011.
- [95] M. R. Risk, J. F. Sobh, and J. P. Saul, "Beat detection and classification of ecg using self organizing maps," in *Engineering in Medicine and Biology Society*, 1997. Proceedings of the 19th Annual International Conference of the IEEE, vol. 1, pp. 89–91 vol.1, Oct 1997.
- [96] Y. H. Hu, S. Palreddy, and W. J. Tompkins, "A patient-adaptable ecg beat classifier using a mixture of experts approach," *IEEE Transactions on Biomedical Engineering*, vol. 44, pp. 891–900, Sept 1997.

- [97] M. Lagerholm, C. Peterson, G. Braccini, L. Edenbrandt, and L. Sornmo, "Clustering ecg complexes using hermite functions and self-organizing maps," *IEEE Transactions on Biomedical Engineering*, vol. 47, pp. 838–848, Jul 2000.
- [98] C. Wen, T.-C. Lin, K.-C. Chang, and C.-H. Huang, "Classification of ECG complexes using self-organizing CMAC," *Measurement*, vol. 42, no. 3, pp. 399–407, 2009.
- [99] Y. Chen and H. Yang, "Self-organized neural network for the quality control of 12-lead ecg signals," *Physiological Measurement*, vol. 33, no. 9, p. 1399, 2012.
- [100] R. Mark and G. Moody, "Mit-bih arrhythmia database directory," Cambridge: Massachusetts Institute of Technology, 1988.
- [101] M. Porrmann, U. Witkowski, and U. Ruckert, "A massively parallel architecture for self-organizing feature maps," *IEEE Transactions on Neural Networks*, vol. 14, pp. 1110–1121, Sept 2003.
- [102] D. C. Hendry, A. A. Duncan, and N. Lightowler, "Ip core implementation of a self-organizing neural network," *IEEE Transactions on Neural Networks*, vol. 14, pp. 1085–1096, Sept 2003.
- [103] A. Ramirez-Agundis, R. Gadea-Girones, and R. Colom-Palero, "A hardware design of a massive-parallel, modular nn-based vector quantizer for real-time video coding," *Microprocessors and Microsystems*, vol. 32, no. 1, pp. 33 – 44, 2008.
- [104] R. Dlugosz, M. Kolasa, W. Pedrycz, and M. Szulc, "Parallel programmable asynchronous neighborhood mechanism for kohonen som implemented in cmos technology," *IEEE Transactions on Neural Networks*, vol. 22, pp. 2091–2104, Dec 2011.
- [105] C. Shi, J. Yang, Y. Han, Z. Cao, Q. Qin, L. Liu, N. J. Wu, and Z. Wang, "A 1000 fps vision chip based on a dynamically reconfigurable hybrid architecture comprising a pe array processor and self-organizing map neural network," *IEEE Journal of Solid-State Circuits*, vol. 49, pp. 2067–2082, Sept 2014.
- [106] H. Hikawa and Y. Maeda, "Improved learning performance of hardware self-organizing map using a novel neighborhood function," *IEEE Transactions on Neural Networks and Learning Systems*, vol. 26, pp. 2861–2873, Nov 2015.