# LOW POWER CIRCUITS FOR MINIATURE SENSOR SYSTEMS

by

Yu-Shiang Lin

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in The University of Michigan 2008

Doctoral Committee:

Associate Professor Dennis M. Sylvester, Chair

Professor David T. Blaauw

Associate Professor Michael P. Flynn

Professor Marios C. Papaefthymiou



To My Family

## ACKNOWLEDGEMENTS

When I look back on the days spent pursuing my Ph.D., I see challenges and many precious memories. It was my first time living and studying in a foreign country, and many people have helped me over the past five years.

I always knew that the VLSI program at the University of Michigan was my top choice, especially Professor Sylvester's group. However, not until I joined the group did I realize how fortunate I was to have chosen the right program. Under Professor Sylvester's guidance, I was always given all the resources I needed to implement my ideas.

I want to thank Professor Blaauw for his advice on my research. Having two advisors has always been a positive experience for me. I also want to thank Professor Flynn and Professor Papaefthymiou for serving on my committee and sharing their expertise.

I would like to thank the staff of the EECS department. You made my life so much easier. It has been a pleasure to work with a group of brilliant people in our lab. I find myself always learning new things from you.

Last but not least, I want to dedicate my Ph.D. to my family for their support. My wife Yen-Ting takes care of most daily routines for me and always encourages me. My son Andruw always gives me his big smile when I come back home. It is my parents who made me who I am.

My best wishes to all of you.

# TABLE OF CONTENTS

| Section | Title |
|---------|-------|
|         | DEDICATION |
|         | ACKNOWLEDGEMENTS |
|         | LIST OF FIGURES |
|         | LIST OF TABLES |
| I.      | INTRODUCTION |
| 1.1     | Overview |
| 1.1.1   | Sensors |
| 1.1.2   | Microcontroller |
| 1.1.3   | Storage elements |
| 1.1.4   | Communication module |
| 1.1.5   | Timer |
| 1.1.6   | Power source |
| 1.2     | Low power sensor system |
| II.     | ULTRA LOW POWER TIMER DESIGN FOR SENSOR APPLICATIONS |
| 2.1     |  |
| 2.2     | A Sub-pW gate leakage timer |
| 2.3     |  |
| 2.3.1   | Oscillator with self temperature compensated current |
| 2.4     |  |
| III.    | AN ULTRA LOW POWER 1V, 220NW TEMPERATURE |
| 3.1     |  |
| 3.2     |  |
| 3.3     |  |
| 3.4     |  |
| 3.5     | Improving the voltage sensitivity |
| IV.     | SINGLE STAGE STATIC LEVEL SHIFTER DESIGN FOR SUBTHRESHOLD TO I/O VOLTAGE CONVERSION |
| 4.1     | Introduction |
| 4.2     | Conventional approach |
| 4.3     | Proposed approach |
| 4.4     | Simulation results |
| 4.5     | Conclusion |
| V.      | SENSOR DATA RETRIEVAL USING ALIGNMENT INDEPENDENT CAPACITIVE SIGNALING |
| 5.1     | Introduction |
| 5.2     | Geometry optimization |
| 5.2.1   | Sizing of the sensor pad |
| 5.2.2   | Single-ended vs. differential signaling |
| 5.3     | System architecture |
| 5.3.1   | Data retrieval circuits design |
| 5.3.2   | Sensor chip circuit design |
| 5.4     | Chip measurement |
| 5.4.1   | Test chip |
| 5.4.2   | Alignment detection |
| 5.4.3   | Measurement results |
| 5.5     | Conclusions |
| VI.     | NEAR FIELD INDUCTIVE COUPLING USING PLL PHASE-LOCKING AND PULSE SIGNALING |
| 6.1     | Introduction |
| 6.2     | System architecture |
| 6.2.1   | Integrated inductor |
| 6.2.2   | Transponder circuits |
| 6.2.3   | Reader circuits |
| 6.3     | Measurement results |
| 6.4     | Conclusion |
| VII.    | CONTRIBUTIONS AND FUTURE WORKS |
| 7.1     | Contributions |
| 7.2     | Future works |
|         | BIBLIOGRAPHY |

# LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| 1.1  | Illustration of the building blocks of a monitoring system |
| 1.2  | The relationship between supply voltage and energy consumption per instruction. |
| 1.3  |  |
| 1.4  | Monitoring system considering power gating scheme |
| 2.1  | Illustration of the lifetime of a sensor system |
| 2.2  | The concept of a one-shot oscillator. (a) The circuit diagram. (b) The operation waveform |
| 2.3  | Proposed timer structure for subthreshold operation |
| 2.4  | Power consumption vs. supply voltage at different temperature points |
| 2.5  | Timer period vs. temperature at various supply voltages |
| 2.6  | Output period scatter plot highlighting die-to-die and within-die variations. |
| 2.7  | Timer output period variation with respect to time |
| 2.8  | The bias stage showing the voltage divider and a resistor based self-biasing loop. |
| 2.9  | One-shot oscillator for timer output |
| 2.10 | One-shot oscillator for timer output |
| 2.11 | Circuits for charge holding. (a) type I, (b) type II |
| 2.12 | Timing diagram of the program-and-hold method. |
| 2.13 | Die photo of the timer test chip |
| 2.14 | Normalized frequency vs. temperature and supply voltage |
| 2.15 | Average of normalized frequency drift over time |
| 2.16 | Refresh the timer every four minutes with 1.1 second programming time |
| 2.17 | Normalized frequency vs. programming time |
| 2.18 | Power consumption at the programming mode and the active mode with respect to different temperatures |
| 2.19 | Power consumption and frequency deviation with different refreshing time |
| 3.1  | Temperature sensor block diagram. |
| 3.2  | Schematic for $I_{PTAT}$ generation |
| 3.3  | Schematic for $I_{ref}$ generation. |
| 3.4  | Block diagram and timing diagram of the sensor controller |
| 3.5  | Die photo of the temperature sensor |
| 3.6  | Power consumption of the temperature sensor. |
| 3.7  | Temperature inaccuracy of the temperature sensor with two-point |
| 3.8  | Temperature inaccuracy over samples (top: 10 samples/s; bottom: |
| 3.9  |  |
| 3.10 |  |
| 4.1  | Conventional DCVS-type level shifter with cross-coupled pull-up transistors. |
| 4.2  | Simulation results showing the operating frequency with respect to pull-down transistor width Wn. |
| 4.3  | Proposed approach that uses input voltage independent diode-connected transistor stacks for pull-up devices. |
| 4.4  |  |
| 4.5  |  |
| 4.6  |  |
| 4.7  | Sizing of diode-connected stacked PMOS (Wp) versus gate delay and power dissipation. |
| 4.8  |  |
| 4.9  | Monte Carlo simulation results showing gate delay variation across process spread |
| 5.1  | The relative position of the receiver array and sensor signal pad when $W_{RX} \le \sqrt{2}W_{TX}$. |
| 5.2  | Relationship between pad size of sensor chip to the coupling capacitance |
| 5.3  | Differential signaling scheme. Pad A (square with slant lines) together with all the other pads in light gray are used to recover the signal from |
| 5.4  |  |
| 5.5  |  |
| 5.6  |  |
| 5.7  | Data retrieval with capacitive coupled input and periodic precharge |
| 5.8  | Timing diagram showing the operation of data retrieval circuits when |
| 5.9  |  |
| 5.10 | Voltage limiter. (a) circuits diagram, (b) open loop voltage transfer |
| 5.11 |  |
| 5.12 |  |
| 5.13 |  |
|      | Procedures for alignment detection and pad reconfiguration |
|      | Decoded data waveform showing pseudo random bit sequences up to 15 unrepeated cycles |
|      | Operating frequency versus transmitting amplitude and carrier frequency with estimated working distance showing on the second x-axis. |
|      | Energy consumption versus transmitting amplitude and carrier frequency |
|      | (a) $T_w$ versus BER, (b) Clock modulation circuit that defines $T_w$. |
|      | Data rate versus BER with 10 random position testing |
|      | Comparison of transponder data encoding with back scattering and pulse signaling |
|      | System architecture for the proposed pulse signaling method |
|      | Power harvesting module with the schematic of the voltage limiter |
|      | Schematic for the voltage regulator. |
|      | Schematic for the voltage regulator. |
|      | TSPC phase frequency detector. (a) TSPC D flip-flop, (b) circuit diagram of the phase frequency detector. |
|      | Schematics of (a) the charge pump, (b) the VCO. |
|      | Timing waveform of the pulse signaling mode. |
|      | Timing waveform of the PLL locking mode |
|      | The driver circuits for the transponder |
|      | Pulse generator and output driver of the reader |
|      | Data receiving scheme for the reader |
|      | Die photo for the reader and transponder of the system |
|      | (a) Test setup with the micromanipulator and the PC board, (b) close-up photo |
|      | Measured waveform from the oscilloscope showing the output data and the clock signal |
|      | Measured communication distance with respect to the data rate and |
|      | Measured achievable communication distance with misalignment in the x-axis or the y-axis. |

# LIST OF TABLES

| Table | Title |
|-------|-------|
| 1.1 | Summary of power generation sources |
| 1.2 | Comparison between small batteries |
| 1.3 | Summary of the contributions of the works |
| 2.1 | Comparison for the timers |
| 3.1 | Comparison of temperature sensors |
| 4.1 | Comparison of level shifters. |
| 5.1 | Summary of pad dimensions. |
| 6.1 | Summary of the integrated inductor        | 95 |

### CHAPTER I

## INTRODUCTION

## 1.1 Overview

Sensor systems are ubiquitous in modern life. Applications ranging from chemical sensing and biomedical monitoring to industrial and automotive systems have all made large strides. These systems are becoming increasingly cost effective and highly integrated, thanks to mature silicon technology [1, 2, 3, 4]. A sensor system uses transducers to translate the "nonelectric" world into something more familiar to electrical engineers, namely digital or analog signals. By interfacing nonelectrical properties with signals that can be processed by electronic devices, the sensor system can perform functions beyond just "sensing". For example, the mechanical properties of sensor structures can be used to build devices such as beams or diaphragms [5]. Such devices, when controlled by externally applied voltages, can be used as actuators.

Another example is the continuous monitoring of an object over an extended period of time. The sensed signal is translated into digital data that can be managed by a microcontroller, which can perform tasks such as compression and store the recorded data in the storage elements. One such application is a watchdog system that monitors the condition of perishables [6]. The deterioration of food can be represented by chemical reactions whose progress is a function of the activation energy and the temperature integrated over time. The reaction process can be simulated with a simple CMOS circuit with digital outputs, so the condition of the perishables can be monitored throughout their lifetime. A further example is commonly implemented in modern VLSI microprocessors, where a thermal sensor is used for hot-spot detection and thermal management [7].
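The dependence of deterioration on activation energy and integrated temperature can be illustrated with a short numerical sketch. It assumes an Arrhenius rate law, which the text does not name explicitly, and the function names, the prefactor and the activation energy are hypothetical values chosen only for illustration. It is not the circuit of [6].

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_rate(temp_c, ea_ev, prefactor=1.0):
    """Reaction rate at temp_c (Celsius) for activation energy ea_ev (eV).

    Assumes an Arrhenius law; the prefactor is an arbitrary scale factor.
    """
    temp_k = temp_c + 273.15
    return prefactor * math.exp(-ea_ev / (BOLTZMANN_EV * temp_k))


def accumulated_deterioration(temps_c, dt_s, ea_ev, prefactor=1.0):
    """Sum rate * dt over a list of temperature samples taken every dt_s seconds."""
    return sum(arrhenius_rate(t, ea_ev, prefactor) * dt_s for t in temps_c)


if __name__ == "__main__":
    # Ten one-minute samples in a refrigerator versus at room temperature.
    cold = accumulated_deterioration([4.0] * 10, 60.0, ea_ev=0.8)
    warm = accumulated_deterioration([25.0] * 10, 60.0, ea_ev=0.8)
    print(f"warm/cold deterioration ratio: {warm / cold:.1f}")
```

Under this assumed model, the same storage time at a higher temperature accumulates more deterioration, which is why the integrated temperature history, and not only the elapsed time, has to be tracked.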

With RF modules, wireless sensor networks (WSNs) enable even broader applications such as environmental monitoring. Compact sensor nodes can be widely distributed to collect environmental data such as the temperature and humidity inside an ecosystem [8]. The lifetime of a wireless sensor network is limited by the energy consumption of the individual sensor nodes. Because the power sources have limited capacities, most components in a sensor node need to be turned off when not in use. A wakeup receiver that consumes less than $100\,\mu\mathrm{W}$ was proposed to provide low standby power and to activate the main receiver upon request [9]. When the RF input power is reduced or the wakeup sequence is shortened, the mean time between false alarms also decreases, which results in more frequent wakeup cycles than the sensor system needs. It was shown that by increasing the mean time between false alarms to more than $10^{18}$ seconds, a sensitivity of -50 dBm can be achieved with a 7-bit code. Compared to the -100 dBm sensitivity of the main receiver, the power saving comes at the expense of a shorter range. Further discussion of wireless sensor networks is beyond the scope of this work; we will mostly focus on a single sensor system throughout the chapter.

The advancement of CMOS technology is the driving force behind high performance computing systems. A sensor system, although its throughput requirement is usually low, can also benefit from the higher density and smaller parasitic capacitances that come with device scaling. Fig. 1.1 illustrates a sensor monitoring system that integrates various functions into the same package. The components include the sensor/transducer, power source, controller, storage elements, timer and communication module. In the following sections, we discuss the role of each component in the monitoring system and the challenges that need to be addressed.

Figure 1.1: Illustration of the building blocks of a monitoring system.

#### 1.1.1 Sensors

Sensors, actuators and microelectronics form the backbone of a MicroElectroMechanical System (MEMS). The underlying technology is the so-called "micromachining" process, which selectively etches the silicon wafer or adds layers to form mechanical or electrical devices. The pressure sensor formed by creating a thin diaphragm is an early and successful example of a MEMS sensor. Nowadays, sensors are no longer limited to mechanical devices: thermal, optical, magnetic and chemical sensors have all seen promising developments [10, 11, 12, 13]. Typically, interface circuits such as analog amplifiers or analog-to-digital converters (ADCs) convert the signal into a format that can be understood by the microelectronic devices. As an example, a microfabricated capacitive servo pressure sensor was demonstrated with integrated circuits [14]. The cavity between the diaphragm and the electrode is evacuated so that the pressure sensor detects absolute pressure. The idea is to use a capacitance-to-voltage converter to form a closed loop with the servo sensor and balance the position of the silicon electrode. The sensor then automatically generates an amplified voltage corresponding to the absolute pressure.

Temperature sensing is another example of broad interest across many applications. Unlike other environmental parameters, temperature has a direct and measurable impact on the characteristics of integrated circuit components such as resistors and transistors. Therefore, one does not need a special MEMS process to design a temperature sensor that translates the "nonelectrical" property of temperature. For most applications, the ambient temperature varies slowly, so a conversion time of less than a hundred milliseconds is sufficient. On the other hand, the thermal sensor for a VLSI microprocessor may require a conversion time of less than 1 ms for effective thermal throttling.

#### 1.1.2 Microcontroller

After the "nonelectrical" property is quantized into digital signals that can be understood by logic circuits, a microcontroller takes over control of the system. The microcontroller can be as simple as a finite state machine that performs fixed routines, such as serializing data to the communication module, or as complex as a full-fledged general purpose processor that handles data manipulation and analysis. For sensor applications, the system throughput is generally lower than 1 bps. As a result, the microcontroller spends most of its time idling, yet it still consumes leakage power because MOS switches are not ideal. Technology scaling results in smaller parasitics and thus less switching power. On the other hand, to sustain the performance improvement across technology generations, the threshold voltage must also be reduced, which leads to more subthreshold leakage [15].
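The effect of long idle periods on the power budget can be made concrete with a simple duty-cycle calculation. The sketch below is only illustrative: the function names and the power figures (1 µW active, 1 nW idle, 0.01% duty cycle) are hypothetical and are not measurements from this work.

```python
def average_power(p_active_w, p_idle_w, duty_cycle):
    """Time-averaged power for a system that is active a fraction duty_cycle of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * p_active_w + (1.0 - duty_cycle) * p_idle_w


def idle_energy_fraction(p_active_w, p_idle_w, duty_cycle):
    """Fraction of the total energy that is spent while idling (leakage)."""
    idle_part = (1.0 - duty_cycle) * p_idle_w
    return idle_part / average_power(p_active_w, p_idle_w, duty_cycle)


if __name__ == "__main__":
    # Hypothetical figures: 1 uW when active, 1 nW when idle, active 0.01% of the time.
    p_avg = average_power(1e-6, 1e-9, 1e-4)
    share = idle_energy_fraction(1e-6, 1e-9, 1e-4)
    print(f"average power: {p_avg:.3e} W, idle share of energy: {share:.1%}")
```

With these assumed numbers, idle leakage accounts for most of the energy even though the idle power is a thousand times smaller than the active power. This illustrates why leakage reduction matters for a microcontroller that is rarely active.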

Fig. 1.2 shows the energy consumption per instruction of a processor versus its supply voltage. First, the graph shows that there is a specific supply voltage ($V_{min}$) that produces the minimum energy per instruction for a fixed activity rate $\alpha$ [16]. This is because when the voltage is high, the circuit can operate at a higher frequency and the total power is dominated by the dynamic power. On the other hand, running at a lower voltage means that the circuits start to spend more time leaking while the dynamic power remains the same. Usually $V_{min}$ is below the threshold voltage, so the operating frequency decreases rapidly as the supply voltage is reduced below $V_{min}$. When $\alpha$ increases, the curve shifts upward and the minimum energy voltage moves from $V_{min1}$ to a lower value, $V_{min2}$. The analysis of $V_{min}$ has an underlying assumption that the processor consumes zero power after the computation is completed, which is not realistic for a sensor system. In practice, the strength of the power gating transistors should be considered. A footer (or header) transistor is used to provide a virtual rail when the circuit is in the active mode. When idling, it behaves like a high impedance path between the supply rails. There is a tradeoff in deciding the size of the footer transistor. Selecting a weak footer transistor helps reduce leakage but hurts the performance and robustness of the system [17]. Process variation has to be considered when choosing the optimal transistor size. To sum up, both operation in the subthreshold region and heavy power gating are required to reduce the energy consumption of the microcontroller.
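The existence of $V_{min}$ and its shift with $\alpha$ can be illustrated with a deliberately simplified model. The sketch below is not the analysis of [16]. It assumes that the dynamic energy is $\alpha C V^2$, that the cycle time grows exponentially as the supply is lowered in the subthreshold region, and that the leakage energy is the leakage power integrated over one cycle. The values of `N_SLOPE`, `LEAK_WEIGHT` and `V_FLOOR` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical parameters for illustration only (not taken from the text).
V_THERMAL = 0.026    # thermal voltage kT/q near room temperature, in volts
N_SLOPE = 1.5        # assumed subthreshold slope factor
LEAK_WEIGHT = 100.0  # assumed weight of leakage relative to switching
                     # (lumps logic depth and the number of idle gates)
V_FLOOR = 0.15       # assumed lowest supply at which the logic still works


def energy_per_instruction(vdd, alpha, c_eff=1.0):
    """Normalized energy per instruction in a simplified subthreshold model.

    Dynamic part:  alpha * C * V^2
    Leakage part:  C * V^2 * LEAK_WEIGHT * exp(-V / (n * vT)),
                   i.e. leakage power times a cycle time that grows
                   exponentially as the supply voltage is lowered.
    """
    s = N_SLOPE * V_THERMAL
    dynamic = alpha * c_eff * vdd ** 2
    leakage = c_eff * vdd ** 2 * LEAK_WEIGHT * math.exp(-vdd / s)
    return dynamic + leakage


def find_vmin(alpha, v_hi=0.6, step=0.001):
    """Grid search for the minimum-energy supply between V_FLOOR and v_hi."""
    n_points = int(round((v_hi - V_FLOOR) / step)) + 1
    candidates = [V_FLOOR + i * step for i in range(n_points)]
    return min(candidates, key=lambda v: energy_per_instruction(v, alpha))


if __name__ == "__main__":
    for alpha in (0.1, 0.5):
        print(f"alpha={alpha}: Vmin ~ {find_vmin(alpha):.3f} V")
```

In this model a higher activity rate raises the energy at every voltage and lowers the minimum-energy voltage, which matches the shift from $V_{min1}$ to $V_{min2}$ described above. The model ignores the idle energy between computations, which is exactly the limitation the text points out.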

#### 1.1.3 Storage elements

Storage elements serve two purposes: recording the measurement data and providing the instruction routines. SRAM is commonly used as the storage element since it provides a good compromise between high density and low latency. While the same minimum energy analysis can be applied to SRAM devices, SRAM cells that are



Figure 1.2: The relationship between supply voltage and energy consumption per instruction.

designed for nominal voltage operation do not work reliably below 700mV without modifications [18]. The fundamental problem is the loss of $I_{on}/I_{off}$ ratio when operating in the subthreshold region. The other problem is that process variation reduces the static noise margin (SNM) of the cells. An early approach replaced the SRAM with a multiplexer-based memory that works successfully at 180mV [19]. To optimize the read and write margins at the same time, a virtual rail can be used to selectively weaken the latch transistors during the write operation [20]. Another solution is to increase the number of transistors in the SRAM cell so that the cell sizing can be optimized individually for the read and the write operations [21, 22]. Using these techniques, the supply voltage can be lowered to less than 200mV. An SRAM cell with an ultra-low standby power of 10.9fW was reported using stack-forcing and gate length biasing techniques at the expense of bitcell area [17].

An alternative for the storage elements is nonvolatile memory such as ROM, EPROM and FLASH. Such devices do not rely on a power source to retain the data. Therefore, they are very energy efficient for infrequent operations. ROM is suitable for storing instruction routines because of its high density. However, writable memories are required for recording data. FLASH is widely used for mass storage in consumer electronics such as digital cameras and mobile phones. Generally, it requires a special floating gate process during fabrication and relies on a high programming voltage to provide the necessary electric field for accessing the floating node [23]. A CMOS-compatible FLASH was proposed with a programming voltage of 5V and a read voltage of 1.2V [24].

#### 1.1.4 Communication module

There are two types of communication schemes that can be applied to the system: one in which each node individually sends data to and receives data from other nodes, and one that relies on a base station. The former is the concept of a wireless sensor network, which requires a reliable power source. The latter, on the other hand, can be remotely powered. Passive wireless nodes have been demonstrated with low data rates [25, 26, 27]. In these systems, the passive nodes harvest energy from the radio frequency (RF) input. In general, power and data are sent simultaneously. In radio frequency identification (RFID) terminology, the base station is called an interrogator or reader, and the device that responds to the request is called a transponder. Typical ranges for passive RFID devices are from less than 1cm to a few meters. Long range batteryless wireless telemetry has been reported at distances of up to 18 meters [28, 29]. To harvest enough energy for signal transmission, large capacitors are used to store the charge.

Active communication usually adopts a so-called *schedule rendezvous* scheme that wakes up the transceiver only periodically to perform communication [30]. The power saving is a strong function of the required responsiveness of the sensor node, which is highly application dependent. As an alternative, the aforementioned wakeup receiver, which consumes little power when it is inactive, can be used. With -50dBm of RF input sensitivity, a wakeup detector is able to operate with 100nA of standby leakage and only a few mV of activation voltage [31]. A hybrid scheme can be implemented so that the sensors only turn on their low power wakeup detector every t seconds (where t is a design parameter). This scheme further reduces power consumption at the expense of response delay.

#### 1.1.5 Timer

Time keeping is essential to some sensor systems since the recorded data can be highly time dependent. For example, a doctor may apply the proper treatment by knowing the temperature variation of the patient over the past 48 hours. In another example, a scientist may be interested in the humidity of a forest at a particular time of day to study its impact on the ecosystem. The timer has to adapt to different weather conditions such as dramatic changes in temperature. Another function of the timer is to track the sleep time of the system when power gating is applied. The precision requirement of the timer is highly application dependent. For medical applications, the temperature variation is small, whereas it can be dramatic in automotive systems. Generally, power consumption is the biggest challenge for the timer since it is the only active device while the other parts of the system are strongly power gated. The power consumption of the timer should not dominate the sleep power of the system.

#### 1.1.6 Power source

Energy scavenging and batteries are two potential power sources for sensor systems. Energy scavenging is the process by which energy is captured and stored. There are a variety of scavenging sources such as solar power, thermal energy, vibration energy or even human power [32, 33, 34, 35, 36]. Table 1.1 summarizes the power density of various power generation sources [37]. For a lifespan of 10 years, the power density of the energy scavenging sources outperforms that of Lithium batteries. Among them, vibration is potentially the most favorable mechanism because of its abundance. One way of exploiting it is to use piezoelectric materials, which produce an electric field when deformed by external forces. Other methods such as magnetic and

| Source                | Power density $(\mu W/cm^3)$            |
|-----------------------|-----------------------------------------|
| Solar (outdoors)      | 15,000 - direct sun<br>150 - cloudy day |
| Solar (indoors)       | 6 - office desk                         |
| Vibrations            | 200                                     |
| Acoustic noise        | 0.003 @ 75 dB                           |
| Acoustic noise        | 0.96 @ 100 dB                           |
| Daily temp. variation | 10                                      |
| Temperature gradient  | $15 @ 10^{\circ}{\rm C}$ gradient       |

Table 1.1: Summary of power generation sources.

electrical transducers can also be used to harvest vibration energy. Vibration induced by traffic on a bridge or by wind, for example, is a reasonable power source for structural health sensor nodes [38]. However, given the inconsistent nature of the mechanism, such a power source is not reliable enough to guarantee the operation of the system.

Batteries are able to supply a constant current until the end of their lifetime. The lifetime of a battery depends mostly on the form factor and the chemistry. Table 1.2 compares several commercial miniature batteries that are potential candidates for the sensor system. 4A and CR-1025 are commonly used batteries for small electrical devices such as watches and toys. Although their charge density is high, they are not compatible with the microfabrication process. Power paper [39] (an ink based technique) and Cymbet [40] (a thin film battery) are advantageous in terms of size because the thickness of the battery can be less than 1mm. Taking Cymbet as an example, a 1mm by 1mm by $25\mu$m battery is able to provide a capacity of roughly $10\mu$Ah. The lifetime of the sensor system can be calculated from its power consumption and the capacity of the power source. A lifetime of one year means that the whole system can consume only about 1nA of current when directly supplied by the aforementioned Cymbet battery.

| Product     | Nominal Voltage | Capacity           | Size                   | Charge density                    |
|-------------|-----------------|--------------------|------------------------|-----------------------------------|
| 4A battery  | $1.5\mathrm{V}$ | $625 \mathrm{mAh}$ | $2298.0 \mathrm{mm}^3$ | $0.27 \mathrm{mAh}/\mathrm{mm}^3$ |
| CR-1025     | $3.0\mathrm{V}$ | $30 \mathrm{mAh}$  | $196.25 \mathrm{mm}^3$ | $0.15 \mathrm{mAh}/\mathrm{mm}^3$ |
| Power paper | $1.5\mathrm{V}$ | $30 \mathrm{mAh}$  | $1064.7 \mathrm{mm}^3$ | $0.03 \mathrm{mAh}/\mathrm{mm}^3$ |
| Cymbet      | $3.6\mathrm{V}$ | N/A                | N/A                    | $0.40 \mathrm{mAh}/\mathrm{mm}^3$ |

Table 1.2: Comparison between small batteries
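The lifetime estimate for the thin film battery can be reproduced with a few lines of arithmetic. The sketch below uses the Cymbet charge density from Table 1.2 and the 1mm by 1mm by $25\mu$m volume quoted in the text. It assumes an ideal battery (the full capacity is usable and there is no self-discharge), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years


def battery_capacity_uah(charge_density_mah_per_mm3, volume_mm3):
    """Capacity in uAh from charge density (mAh/mm^3) and volume (mm^3)."""
    return charge_density_mah_per_mm3 * volume_mm3 * 1000.0


def average_current_budget_na(capacity_uah, lifetime_years):
    """Average current (nA) that drains the capacity in the given lifetime.

    Assumes an ideal battery: no self-discharge, full capacity usable.
    """
    return capacity_uah * 1000.0 / (lifetime_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    volume = 1.0 * 1.0 * 0.025                 # 1mm x 1mm x 25um, in mm^3
    capacity = battery_capacity_uah(0.40, volume)
    budget = average_current_budget_na(capacity, lifetime_years=1.0)
    print(f"capacity ~ {capacity:.1f} uAh, budget ~ {budget:.2f} nA")
```

The capacity comes out at about $10\mu$Ah and the one-year budget at slightly above 1nA, consistent with the figures quoted in the text.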

### 1.2 Low power sensor system

In this dissertation, we target a sensor system that is mostly limited by the form factor. Applications such as implantable or non-intrusive systems may find a 1mm<sup>3</sup> form factor attractive. One good example is the intraocular pressure monitoring system shown in Fig. 1.3. Eye pressure is closely related to eye diseases such as glaucoma. When the intraocular pressure (IOP) increases, it can cause malfunction of the eye's drainage structure. It will eventually damage the optic nerve and result in permanent vision loss if left untreated. The rise of pressure inside the eye is due to an imbalance between the drainage and the production of fluid [41]. Fluid continuously enters the eye but cannot be drained because the drainage channels do not function properly. An IOP higher than 22mmHg is considered suspicious and possibly abnormal [42, 43]. The traditional way to examine eye pressure is with a tonometer. An applanation tonometer measures eye pressure by the force required to flatten a constant area of the cornea. It is considered the most accurate way of measuring eye pressure. Other methods such as the air puff test or transpalpebral tonometry do not require direct contact with the cornea and are less accurate than applanation tonometry. The disadvantage of these measurements is that the patient has to visit the clinic frequently in order to take them.

In addition, according to [44], measurements taken at the clinic do not reflect the peak pressure and the pressure variation. Large fluctuation in diurnal IOP is a risk factor for open-angle glaucoma. High IOP on awakening is reported in many



Figure 1.3: Intraocular pressure monitoring system.

publications [45, 46, 47]. Therefore, a measurement that is not taken in the morning is very likely to miss the peak of the eye pressure. The average IOP is similar at different times of the day, but the peak IOP is about 5mmHg higher on awakening than at other times. Goldmann tonometry is performed in the sitting position, and it is reported that IOP is higher in the supine position [48]. This suggests that portable tonometry or self-tonometry is advantageous over the traditional methods. In [49], a non-invasive self-tonometry device is reported and demonstrated to measure the IOP in a pig's eye.

In order to continuously monitor the intraocular pressure, an implantable pressure sensor is the preferable solution. A passive resonant sensor uses an inductance-capacitance oscillating circuit: the resonant frequency of the sensor is detected and used to determine the absolute pressure from the capacitance value [50]. Another way to implement a passive sensor is to design a device in which the pressure is transformed into a change in electromagnetic properties [2]. The signal is then coupled to an external device through a coil and can be digitized and stored in memory. The pressure sensor can also be implemented with surface micromachining [51]. In [52], the authors summarize the development of nontelemetry intraocular sensors to date, with implant sizes ranging from 1.1mm to 11.5mm in diameter.

In [53], radio-frequency (RF) transmission is used to send the signal from the transponder to an external telemetric component. This system, however, still requires an external processing unit such as an A/D converter or a network analyzer to monitor the processor in real time. A full system demonstration of an intraocular sensor, with an on-chip micromechanical pressure sensor, a microcontroller, the readout circuits and an RF transponder, was reported in [54]. Another readout method for intraocular applications is to use a coil in parallel with the capacitive sensor [55]. This LC resonant circuit converts the pressure into a shift of the resonant frequency. A VCO is then used to excite the sensor over a frequency range and to detect the resonant frequency of the internal sensor.

For a monitoring system with a volume on the order of $\text{mm}^3$, the power source, whether an energy scavenger or a microfabricated battery, is the limiting factor. As discussed in the previous section, less than 1nA of average current consumption is required for a one-year lifetime based on the capacity of the battery. To operate within such a tight power budget, the system has to adopt aggressive power gating while it is not actively monitoring. Fig. 1.4 shows such a monitoring system, designed with the goal of minimizing the total power. The system relies on a battery to supply power to all the components except the wireless module. The wireless module should harvest AC power directly from the RF input. The voltage regulator downconverts the voltage level from a typical battery output to the energy minimum



Figure 1.4: Monitoring system considering power gating scheme.

voltage level. In this way, the dynamic power is reduced quadratically with the supply voltage. With a switched-capacitor DC-DC converter, the current efficiency can be higher than 100% [56].
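One way to see why a current efficiency above 100% is possible is power conservation: a step-down converter delivers its output at a lower voltage, so the output current can exceed the input current even when power is lost. The sketch below is a generic illustration, not the converter of [56]. The 3.6V input is the Cymbet nominal voltage from Table 1.2, while the 0.3V output and the 50% power efficiency are assumed values.

```python
def current_efficiency(v_in, v_out, power_efficiency):
    """Ratio I_out / I_in of a step-down converter.

    Follows from P_out = power_efficiency * P_in:
        I_out * v_out = power_efficiency * I_in * v_in
    """
    return power_efficiency * v_in / v_out


def dynamic_power_ratio(v_low, v_high):
    """Dynamic power at v_low relative to v_high, assuming the same
    frequency and switched capacitance (P_dyn proportional to V^2)."""
    return (v_low / v_high) ** 2


if __name__ == "__main__":
    # Assumed operating point: 3.6V battery, 0.3V logic supply, 50% efficient.
    ratio = current_efficiency(v_in=3.6, v_out=0.3, power_efficiency=0.5)
    print(f"current efficiency ~ {ratio * 100:.0f}%")
    print(f"dynamic power ratio ~ {dynamic_power_ratio(0.3, 3.6):.4f}")
```

Since battery capacity is rated in charge (Ah), the current efficiency is the figure that translates directly into lifetime.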

In the active mode, the CPU reads the routines from the ROM, compresses the digitized data from the sensor and writes it to the data memory. The storage units are partially retentive so that the data is not lost in the sleep mode. After the computation is completed, the CPU sends a request to the power controller before entering the sleep mode. The power controller then takes over and switches off the footer/header transistors of the blocks that do not retain data. At the same time, a START signal is sent to the timer to start counting the time spent in the sleep mode. During the sleep mode, a retention memory keeps the recorded data. After a given number of cycles, the timer sends an expire signal to the power controller, which returns control to the CPU.
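The sleep and wake handshake described above can be sketched as a small behavioral model. This is only an illustration of the sequence of events, not the implemented controller: the class `PowerControllerModel`, its method names and the use of a Python list for the retention memory are all hypothetical.

```python
from enum import Enum, auto


class Mode(Enum):
    ACTIVE = auto()
    SLEEP = auto()


class PowerControllerModel:
    """Hypothetical behavioral model of the sleep/wake handshake."""

    def __init__(self, sleep_cycles):
        self.sleep_cycles = sleep_cycles   # timer cycles to stay asleep
        self.mode = Mode.ACTIVE
        self.gated_blocks_powered = True   # footer/header transistors on
        self.timer_count = 0
        self.retention_memory = []         # keeps its content during sleep

    def cpu_request_sleep(self, compressed_sample):
        """CPU finished its routine: store the data, then hand over control."""
        self.retention_memory.append(compressed_sample)
        self.gated_blocks_powered = False  # cut off non-retentive blocks
        self.timer_count = 0               # START signal to the timer
        self.mode = Mode.SLEEP

    def timer_tick(self):
        """One timer cycle; has no effect while the system is active."""
        if self.mode is not Mode.SLEEP:
            return
        self.timer_count += 1
        if self.timer_count >= self.sleep_cycles:
            self.gated_blocks_powered = True   # expire signal: restore power
            self.mode = Mode.ACTIVE            # control returns to the CPU


if __name__ == "__main__":
    node = PowerControllerModel(sleep_cycles=3)
    node.cpu_request_sleep(compressed_sample=42)
    for _ in range(3):
        node.timer_tick()
    print(node.mode, node.retention_memory)
```

The model makes one point explicit: while the system sleeps, the timer is the only block that changes state, which is why its power consumption sets the floor of the sleep power.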

Recently, many research efforts have focused on ultra low power design for digital logic and memories. Energy consumption in the single-digit pJ range has been reported for processors as well as SRAM [57, 20]. On the other hand, the power consumption of peripheral circuits such as the timer has received little attention. In this dissertation, we discuss the design issues of the peripheral circuits under stringent power constraints and propose our solutions.

Crystal oscillators are widely used in digital systems because of their excellent insensitivity to process, supply and temperature variations. The package dimension, however, is at least a few mm<sup>3</sup> for discrete crystal oscillators. With a Colpitts crystal oscillator and a cascode-connected common-base buffer amplifier, an integrated crystal oscillator can be produced with low frequency sensitivity [58]. However, the total area is still on the order of $1 \text{mm}^2$ and the current consumption is close to 1 mA. As an alternative, a current-controlled one-shot timer was proposed to provide a steady output frequency with a circuit that combines a Schmitt trigger and a charge pump [59, 60]. For a given period of time, the charge pump provides a fixed amount of charge to the load capacitor, and the output eventually flips when the voltage level exceeds the transition point of the Schmitt trigger. In this design, the current source has the most impact on the output frequency variation. Implementing a steady low current source is challenging given that 1nA is the total system current budget. A ring oscillator based circuit can also be used to generate a clock signal with low hardware overhead [61]. For MOS transistors, the drain current decreases at higher temperatures mainly due to the degradation of electron mobility. On the other hand, the drain current increases with temperature in the subthreshold region, where the threshold voltage becomes the dominating factor. Therefore, by operating the ring oscillator at a properly chosen voltage, it can be made temperature insensitive as well. However, the temperature coefficient is highly sensitive to voltage and process variations.

In Chap. II, two timer designs that are suitable for sub-nA operation will be presented. The first one uses the gate leakage of a MOS transistor as the current source for the timer. Gate leakage is relatively insensitive to temperature compared to other current sources in CMOS technologies. In addition, it provides a large time constant, which is ideal for reducing the switching activity, at negligible area cost. The second timer design generates a temperature insensitive current source by forcing an identical voltage across a resistor. The same current flows into a reference transistor such that it also becomes temperature insensitive. To further reduce the power consumption, a program-and-hold scheme is implemented to store the bias voltage on a capacitor while the biasing circuit is turned off. The current is reduced by mirroring it to a transistor that is 200X smaller than the reference one.

An ultra low power temperature sensor will be shown in Chap. III. The temperature sensor accounts for a large portion of the total leakage when the system is remotely powered. The transmitting distance of such a device is highly related to the power consumption. In this work, a PTAT (proportional to absolute temperature) current source and a temperature insensitive current source are implemented. After a current to frequency conversion, a digital counter is used to generate the temperature reading. Since both current sources are defined by reference resistors, the power consumption can be traded off against the size of the resistors.

Chap. V presents a design for chip-to-chip proximity communication to read out the data from a sensor node. In certain situations, the reader can be brought into close proximity to the sensor node, so a strong field is not needed for communication. Capacitive coupling is suitable for such applications and has the advantage of low hardware overhead. Capacitive coupling was first proposed to alleviate I/O communication between chips and was implemented through a substrate trace [62]. A face-to-face communication scheme was proposed to allow more channels and thus higher bandwidth [63, 64]. One obvious advantage is the power reduction and performance improvement due to the absence of the electrostatic discharge (ESD) protection devices that are commonly used in wired communication. It is also suitable for passive communication since the transmitting frequency is not limited by the resonant frequency of the passive device. However, the major concern with the capacitive coupling scheme is the misalignment of the chips. Since the coupling capacitance is inversely proportional to the distance between the pads, the chip alignment has a strong impact on the signal strength at the receiver. An alignment independent method is proposed in this work. The transmitting pads are divided into smaller microplates, and each microplate can be reconfigured to either transmit power or receive data depending on its location. The solution is demonstrated with less than 15% of achievable data rate by randomly dropping the sensor chip on the so-called data retrieval chip.

For sensor-type applications, the passive RFID technique is a potential solution when the transmitting distance is less than 10m [65]. However, such a system usually requires a bulky external coil so that enough energy can be harvested by the transponder. On the other hand, there are applications like intraocular pressure sensing that require only a few mm of distance between the reader and the transponder. Inductive coupling is well suited for this type of application. In Chap. VI, a pulse signaling based scheme is proposed to provide more robust transmission than the traditional backscattering scheme. Pulse signaling is widely used in ultra wide band (UWB) communications [66]. It is also used in high performance proximity communication with inductive coupling [67]. In this work, a PLL is used to lock onto the incoming RF frequency, which is excited at the resonant frequency of the reader. A short gap between continuous waves is created so that the transponder can use it to send pulses back to the reader at the frequency previously acquired by the PLL. As a result, the signal-to-noise ratio (SNR) can be greatly improved, and filtering of the strong interference from the reader's local resonant clock is not required. The

| Chap. | Title                                            | Contributions                                                                                                                                              | Ref. |
|-------|--------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| II    | Gate leakage based timer                         | Sub-pW power consumption at 300mV.                                                                                                                         | [68] |
| II    | Program-and-hold timer                           | Lower temperature dependency and supply sensitivity at comparable power consumption compared to [68].                                                      |      |
| III   | Low power temperature sensor                     | Lowest published power consumption                                                                                                                         | [69] |
| IV    | Static subthreshold to I/O voltage level shifter | Single stage design with consistent 5FO4 delay for 300mV to 2.5V level conversion.                                                                         | [70] |
| V     | Alignment independent capacitive signaling       | First alignment independent capacitive coupling test chip with simultaneous power and data transmission.                                                   | [71] |
| VI    | Inductive coupling using pulse signaling         | Propose pulse signaling and PLL phase-locking scheme that automatically acquires resonant frequency for near field inductive coupling data transmission.   |      |

Table 1.3: Summary of the contributions of the works.

measured chip demonstrates a communication distance of 1.1mm with 1mm × 1mm integrated inductors for both the reader and the transponder.

Chap. VII concludes the dissertation and summarizes its contributions, which are also listed in Table 1.3. Future directions based on this work will also be discussed.

### CHAPTER II

# ULTRA LOW POWER TIMER DESIGN FOR SENSOR APPLICATIONS

## 2.1 Introduction

In this chapter, the designs of ultra-low power timers will be presented. To reduce the power consumption and extend the lifetime of a sensor system, both the active power and the idle power are crucial. For example, Fig. 2.1 illustrates the energy consumption of a sensor system during its lifetime. The system is actively performing tasks during only a short period of time. Previous work shows that an energy $E_{min}$ of a few pJ per instruction can be achieved by operating at a subthreshold voltage $V_{min}$ [57]. Although the voltage $V_{min}$ optimizes the energy consumption during the active mode, the system spends most of the time idling. The idle energy can be greatly reduced by power gating. As shown in [17], however, strong power gating requires weak footer or header transistors that also reduce the performance and robustness of the circuits. Moreover, not every component can be turned off during the idle time of the sensor system. A timer that keeps track of the time in the sleep mode is one example. The power consumption of the timer should not dominate the power consumption of the other circuits; otherwise the power reduction from power gating is largely lost.

The crystal oscillator is widely used as a frequency reference due to its accuracy and its insensitivity to temperature and supply variations. Although it typically requires a bulky external component, it can also be implemented with a Colpitts oscillating



Figure 2.1: Illustration of the lifetime of a sensor system.

circuit on-chip [58]. However, the frequency of the crystal oscillator is orders of magnitude higher than what we need, and the resulting power consumption is unacceptable. As an alternative, a current controlled one-shot timer has been proposed to provide a steady output frequency with a circuit that combines a comparator and a charge pump [59, 60]. For a given time, the charge pump provides a fixed amount of charge to the load capacitance, and the output eventually flips when the voltage level exceeds the transition point of the Schmitt trigger. The design of the current source has a direct impact on the frequency sensitivity of the circuit. Since the supply voltage of the sensor system is preferably in the subthreshold region to minimize the active energy, designing a current source that is insensitive to temperature and voltage and also small in magnitude is not trivial. In a standard CMOS process, several leakage sources are available to provide a current small enough to give the timer a reasonably large time constant. The subthreshold leakage is well studied but unfortunately has an exponential dependency on temperature. The gate leakage is relatively insensitive to temperature but can vary by several orders of magnitude among different processes.

The remainder of the chapter is organized as follows. We first show the design of a timer using gate leakage as the current source in Sec. 2.2. In Sec. 2.3, a timer with a self temperature compensated current source is presented. It relies on a charge holding technique to save power during the active mode. We compare the proposed timers and summarize our work in Sec. 2.4.

## 2.2 A Sub-pW gate leakage timer

#### 2.2.1 Circuit Design for the timer

Fig. 2.2(a) shows the typical implementation of the previously mentioned one-shot oscillator design [59, 60]. The output of the oscillator is determined by the voltage $V_{in}$. The waveform shown in Fig. 2.2(b) illustrates the operation of the circuit. When $V_{out}$ is 0, $V_{in}$ is charged toward the supply voltage by current source $I_1$. When $V_{in}$ surpasses $V_{b1}$, the comparator flips the output of the oscillator, and $V_{in}$ starts to be discharged by $I_2$ instead until it falls below $V_{b2}$. Assuming that both $I_1$ and $I_2$ are equal to $I_{on}$, each half period lasts $C_t(V_{b1} - V_{b2})/I_{on}$, where $C_t$ is the load capacitor, and the frequency of the timer can be written as

$$f = \frac{I_{\text{on}}}{2C_t(V_{b1} - V_{b2})} \tag{2.1}$$

which is independent of the supply voltage.
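The charge and discharge description above can be turned into a short numerical sketch. It assumes ideal current sources and a comparator with no delay, so the period is simply the sum of the two ramp times. The component values in the example (1pA, 1pF, a 0.1V window) are hypothetical and are not taken from the design.

```python
def one_shot_period(i_charge, i_discharge, c_load, v_high, v_low):
    """Period (s) of the triangular V_in waveform of the one-shot oscillator.

    Assumes ideal current sources and an ideal comparator without delay.
    """
    swing = v_high - v_low
    t_charge = c_load * swing / i_charge        # ramp up from v_low to v_high
    t_discharge = c_load * swing / i_discharge  # ramp down from v_high to v_low
    return t_charge + t_discharge


def one_shot_frequency(i_on, c_load, v_high, v_low):
    """Frequency (Hz) when both current sources equal i_on."""
    return 1.0 / one_shot_period(i_on, i_on, c_load, v_high, v_low)


if __name__ == "__main__":
    # Hypothetical values: 1pA sources, 1pF load, 0.1V comparator window.
    f = one_shot_frequency(i_on=1e-12, c_load=1e-12, v_high=0.2, v_low=0.1)
    print(f"frequency ~ {f:.2f} Hz")
```

The supply voltage does not appear among the arguments, and lowering the current or enlarging the capacitor by a given factor lowers the frequency by the same factor.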

In practice, the current sources are susceptible to bias conditions and temperature if not carefully designed. There are many publications on CMOS temperature-compensated current sources; however, none are targeted at ultra low power applications [72, 73]. To reduce the overall power consumption of the timer, the circuit needs to be biased in the subthreshold region, which further reduces the headroom for the current source. Implementing the voltage references is another challenge since the voltages $V_{b1}$ and $V_{b2}$ should also be independent of operating conditions such as temperature and voltage. Although a bandgap reference provides a reference voltage with great accuracy, it does not comply with the stringent power budget that will be enforced on the timer. The goal of the timer is to awaken the processor just in time. Therefore, operating faster than the very low frequency that is needed, for example in the sub-Hz to 10Hz range, should be avoided as it only wastes energy. This means that either the load capacitor $C_t$ has to be very large or the current source should generate very little current in order to achieve a large RC time constant.

Figure 2.2: The concept of a one-shot oscillator. (a) The circuit diagram. (b) The operation waveform.

To address the aforementioned challenges, a circuit that takes advantage of the gate leakage of MOS transistors is proposed to replace the current sources $I_1$ and $I_2$ shown in Fig. 2.2(a). As CMOS technology scales, the gate oxide thickness continues to shrink to maintain good channel control and drive current at reduced channel lengths and supply voltages. Therefore, tunneling-based gate leakage has become a non-negligible leakage source relative to subthreshold leakage, especially at 65nm and below. Gate leakage is the sum of several tunneling currents such as electron tunneling from the conduction band (ECB), electron tunneling from the valence band (EVB), and hole tunneling from the valence band (HVB). In general, the gate current density has the following form [74]

$$J_g = A \cdot T_{oxratio} \cdot \frac{V_g \cdot V_{aux}}{t_{ox}^2} \cdot \exp\left[-B(\alpha - \beta |V_{ox}|)(1 + \gamma |V_{ox}|)t_{ox}\right] \tag{2.2}$$

where  $A = q^2/8\pi h\phi_b$ ,  $B = 8\pi\sqrt{2qm_{ox}}\phi_b^{3/2}/3h$ ,  $m_{ox}$  is the effective carrier mass in the oxide,  $\phi_b$  the tunneling barrier height,  $t_{ox}$  the oxide thickness, and  $V_{aux}$  is a fitting function of the tunneling carrier density and available states.  $V_{aux}$  is a weak function of temperature and has the following form

$$V_{aux} = NIGC \cdot v_t \cdot \log\left(1 + \exp\left(\frac{V_{gs\_eff} - V_{th0}}{NIGC \cdot v_t}\right)\right) \tag{2.3}$$
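The behavior of the fitting function in (2.3) is easy to explore numerically. The sketch below evaluates it directly, taking $v_t$ as the thermal voltage $kT/q$. The function names and the example value `nigc=1.0` are assumptions made for illustration and are not parameters of the process used in this work.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def v_aux(vgs_eff, vth0, nigc, temp_k):
    """Evaluate V_aux = NIGC * vt * log(1 + exp((Vgs_eff - Vth0)/(NIGC * vt)))."""
    vt = thermal_voltage(temp_k)
    x = (vgs_eff - vth0) / (nigc * vt)
    # log(1 + exp(x)) written in an overflow-safe form
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return nigc * vt * softplus


if __name__ == "__main__":
    for temp_k in (300.0, 350.0):
        print(temp_k, v_aux(vgs_eff=0.8, vth0=0.3, nigc=1.0, temp_k=temp_k))
```

Well above threshold the expression approaches $V_{gs\_eff} - V_{th0}$ and the thermal voltage drops out, while at $V_{gs\_eff} = V_{th0}$ it equals $NIGC \cdot v_t \cdot \ln 2$.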

The typical temperature sensitivity of gate leakage is about 10% per 10°C, which is much lower than that of subthreshold leakage or junction leakage. For the timer application, gate leakage is also advantageous because of its small magnitude compared to a transistor's saturation current (e.g., in $0.13\mu$m CMOS a typical gate leakage is on the order of tens of pA/$\mu$m [75]). The benefit of the small magnitude is two-fold. First, the static current that is used to charge and discharge the load capacitance is small. Second, the large time constant helps reduce the switching of the clock network in the subsequent digital circuits (e.g., a digital counter).

For simplicity, the comparator function is implemented with a CMOS Schmitt trigger. The hysteresis of a Schmitt trigger is often used to suppress signal noise [76]. The low-to-high transition voltage $V_{M+}$ and high-to-low transition voltage $V_{M-}$ are defined as the two crossover points in the voltage transfer characteristic, i.e., the points where the input voltage $V_{in}$ equals the output voltage $V_{out}$. In this work, $V_{M+}$ and $V_{M-}$ are the equivalents of $V_{b1}$ and $V_{b2}$ in Fig. 2.2, respectively. Since $V_{M+}$ and $V_{M-}$ are circuit parameters, there is no need to generate extra bias voltages.

Figure 2.3: Proposed timer structure for subthreshold operation

Based on the previous discussion, our proposed timer is shown in Fig. 2.3. The Schmitt trigger inverter contains transistors MS1 through MS6. When operating at superthreshold voltages, $V_{M+}$ can be determined when MS1, MS2 and MS5 are all in saturation. Considering $I_{MS1} = I_{MS2} + I_{MS5}$, and assuming that the channel length modulation effect is negligible, $V_{M+}$ can be found as a function of $V_{th}$ and $V_{x,tran}$

$$V_{M+} = V_{x,tran} \cdot \left(\frac{\sqrt{k_2 + k_5}}{\sqrt{k_2 + k_5} - \sqrt{k_1}}\right) + V_{th}$$
(2.4)

where $V_{x,tran}$ is the voltage $V_x$ in Fig. 2.3 at which $V_{M+}$ occurs. Simulation results show that $V_{x,tran}$ is nearly constant as temperature varies. Therefore, temperature affects $V_{M+}$ in the same way it affects $V_{th}$: the transition voltage of the Schmitt trigger decreases as temperature rises. $V_{M-}$ can be computed in the same way. Our simulation results show that at 300mV, $\Delta V_M = V_{M+} - V_{M-}$ decreases by $0.1\%/^{\circ}$C due to the lower on-off current ratio in the subthreshold region. These results show that the temperature dependence remains acceptable when the Schmitt trigger replaces the comparator and voltage references.
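As a sketch of how the transition voltage follows from the current balance $I_{MS1} = I_{MS2} + I_{MS5}$, the algebra can be worked through under simple square-law assumptions. Here $k_1$, $k_2$ and $k_5$ are taken to be the transconductance factors of MS1, MS2 and MS5, all three devices are assumed to share the same $V_{th}$ (body effect neglected), and MS5 is assumed to have its gate at $V_{out}$, so that it sees the same overdrive as MS2 at the crossover point $V_{in} = V_{out} = V_{M+}$. These assumptions are not spelled out in the text.

$$k_1 \left(V_{M+} - V_{th}\right)^2 = (k_2 + k_5)\left(V_{M+} - V_{x,tran} - V_{th}\right)^2$$

Taking the positive square root of both sides and collecting the terms in $V_{M+} - V_{th}$ gives

$$\left(\sqrt{k_2 + k_5} - \sqrt{k_1}\right)\left(V_{M+} - V_{th}\right) = \sqrt{k_2 + k_5} \cdot V_{x,tran}$$

so that $V_{M+} - V_{th}$ is $V_{x,tran}$ scaled by a factor that depends only on device sizing. With $V_{x,tran}$ nearly constant, the temperature dependence of $V_{M+}$ is then carried by $V_{th}$ alone.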

INV1 and INV2 are inverters that provide a sharper transition for the timer. To reduce leakage power, they are stack-forced and sized with long channel lengths. The clock output is further buffered from the load by a tri-state inverter TINV to isolate noise coming from other parts of the system. MC1 and MC2 are thin oxide MOS transistors that serve as the charging and discharging devices. Both PMOS and NMOS transistors are used to provide comparable charging/discharging strength. The load capacitance ML1 is implemented with a thick gate oxide transistor, a device type commonly available in modern CMOS processes, to avoid unwanted gate leakage to ground. The corresponding waveforms of $V_{in}$ and $V_{out}$ are similar to those illustrated in Fig. 2.2(b). When $V_{out}$ is pulled up, $V_{inv}$ is pulled down to discharge $V_{in}$ through both MC1 and MC2 until $V_{in}$ is lower than $V_{M-}$ of the Schmitt trigger, and vice versa. $V_{clk}$ goes to a digital counter that is configurable by the system. Depending on the application, the number of timer ticks can be used to set the time between active modes.

Figure 2.4: Power consumption vs. supply voltage at different temperature points

#### 2.2.2 Measurement results

The test chip was implemented in a commercial $0.13\mu$m digital CMOS process. The total circuit area is approximately 480 $\mu$m<sup>2</sup>, of which half is allocated to the load capacitor ML1. Fig. 2.4 plots the power consumption of the timer as a function of both supply voltage and temperature. At 300mV, the power consumption of the timer is less than 1pW at 20°C, and it consumes roughly 2nW at 600mV, as measured by a Keithley 6217 electrometer.

Figure 2.5: Timer period vs. temperature at various supply voltages.

Fig. 2.5 shows the timer output period measured at different supply voltages and temperatures. The timer is less sensitive to temperature at higher supply voltages, largely because the impact of $\Delta V_M$ is minimized in the superthreshold region. Similarly, the variation due to supply voltage is also reduced at higher $V_{dd}$ for the same reason. The measured temperature sensitivity is 0.16%/°C at 600mV and 0.6%/°C at 300mV. Supply sensitivity is 0.15%/mV from 300mV to 500mV and 0.04%/mV at 600mV; the lower figure at 600mV is due to operation in the superthreshold region. For in-tissue biomedical sensor applications, temperature normally does not deviate by more than a couple of degrees, so this temperature sensitivity is adequate. Also, for a system containing a temperature sensor, updated temperature information can be obtained when the processor wakes up, and the number of sleep clock cycles can be adjusted the next time the system goes to sleep.
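The cycle-count adjustment described above can be sketched as follows. This is only an illustration under a first-order linear model of period versus temperature: the function name, the reference period and the example temperatures are hypothetical, and the sign convention (a positive coefficient meaning that the period grows with temperature) is an assumption that would have to be taken from the measured data in Fig. 2.5.

```python
def sleep_cycles(target_s, period_ref_s, temp_c, temp_ref_c=20.0, tc_per_c=0.0016):
    """Number of timer ticks that approximates a target sleep interval.

    tc_per_c is the fractional period change per degree C; 0.0016 mirrors the
    0.16%/C figure quoted at 600mV, with the sign being an assumption.
    """
    scale = 1.0 + tc_per_c * (temp_c - temp_ref_c)  # first-order linear model
    period_s = period_ref_s * scale                 # estimated period at temp_c
    return max(1, round(target_s / period_s))


# Hypothetical example: 10 minute sleep interval with a 10 s reference period.
print(sleep_cycles(600.0, 10.0, temp_c=20.0))
print(sleep_cycles(600.0, 10.0, temp_c=37.0))
```

The same routine could absorb process variation by replacing `period_ref_s` with a per-chip calibrated value.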



Figure 2.6: Output period scatter plot highlighting die-to-die and within-die variations.

Within-die and die-to-die process variation is a significant concern in advanced VLSI technologies. We measured ten timers (2×5) on each of 25 dies and plot the output period in Fig. 2.6. To characterize die-to-die variation, we first compute the mean for each die and then obtain $\sigma/\mu$ across all 25 dies. Die-to-die variation is 28% and 27% at supply voltages of 300mV and 600mV, respectively. Within-die variation is obtained by averaging $\sigma/\mu$ within each individual die and is measured at 12.4% and 9.2% for 300mV and 600mV, respectively. Key sources of variation include oxide thickness variation and the voltage shift of the Schmitt trigger trip points $V_{M+}$ and $V_{M-}$ due to transistor mismatch. In general, the variation can be calibrated out by adjusting the aforementioned counter. The processor can easily configure the number of counts between readings by preloading digital values.
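The two variation metrics described above can be written down compactly. The sketch below follows the stated procedure (per-die means, then $\sigma/\mu$ across dies; per-die $\sigma/\mu$, then the average), but the use of the sample standard deviation is an assumption, and the input data are made-up placeholders rather than measurements.

```python
from statistics import mean, stdev


def variation_metrics(periods_by_die):
    """Return (die_to_die, within_die) as sigma/mu ratios.

    periods_by_die: list of lists, one inner list of timer periods per die.
    """
    die_means = [mean(die) for die in periods_by_die]
    # Die-to-die: spread of the per-die means, normalized by their mean.
    die_to_die = stdev(die_means) / mean(die_means)
    # Within-die: sigma/mu inside each die, averaged over all dies.
    within_die = mean([stdev(die) / mean(die) for die in periods_by_die])
    return die_to_die, within_die


# Placeholder data: two dies with two timers each (periods in seconds).
d2d, wid = variation_metrics([[9.0, 11.0], [18.0, 22.0]])
print(f"die-to-die = {d2d:.3f}, within-die = {wid:.3f}")
```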

We also tested the proposed timer by running it continuously for 20 hours to measure the timing stability over an extended period. Measurement results over time



Figure 2.7: Timer output period variation with respect to time.

along with the resulting histogram are shown in Fig. 2.7. It takes approximately ten minutes for the timer to reach steady state, after which the output frequency is always within 1% throughout the remaining 20 hours of testing. The rms *jitter* for this timer is 30ms, equivalent to 0.14% of the output period.

## 2.3 Self temperature compensation for low power timer

As mentioned at the beginning of the chapter, the fundamental challenge in designing a low power timer is to find a reliable current source that defines the output period accurately at low cost. The previous section discussed a design in which the gate leakage of a thin oxide device provides this current. In general, gate leakage modeling is a complicated problem, and accurately simulating the behavior is not feasible. Also, porting the design to another technology is not trivial since the change in gate leakage is typically large. Therefore, we would like to explore other options that are available in a standard CMOS process.

#### 2.3.1 Oscillator with self temperature compensated current source

Subthreshold leakage is still the dominant contributor to total leakage power dissipation in advanced CMOS technologies [77]. It is therefore also the most studied leakage source, and a large body of measurement data is available to refine circuit simulation models [78, 79]. Recent literature on subthreshold circuit design provides further evidence of this [57, 80, 19]. An advantage of using subthreshold leakage is that its value can be easily duplicated by a current mirror, which is critical for lowering the power consumption, as we will present in Sec. 2.3.2. On the other hand, subthreshold current does suffer from temperature and process variation, as shown by the equation

$$I_{sub} = \mu_{eff} C_{ox} \frac{W}{L} (m-1) \left(\frac{kT}{q}\right)^2 e^{q(V_g - V_t)/mkT} (1 - e^{-qV_{ds}/kT})$$
(2.5)

As temperature affects both the thermal voltage and the threshold voltage in a nonlinear way, finding an inverse function that compensates for the temperature effect is not practical. In addition, process variation in doping and geometry can cause the current to differ from its nominal value by a factor of several [81, 82].

In typical CMOS technologies, unsilicided polysilicon resistors provide decent sheet resistance and a well-controlled temperature coefficient [83]. By forcing a constant voltage across the two terminals of a resistor, it becomes a better current source than the subthreshold current of a MOS transistor that is not temperature compensated. Fig. 2.8 shows the bias stage with resistor R1. Diode-connected transistors $M_{d1}$-$M_{d6}$, which are equally sized, evenly divide the supply voltage, and the intermediate voltages are insensitive to temperature. Transistor $M_0$ is biased with a gate voltage lower than the threshold voltage and forms a negative feedback loop with



Figure 2.8: The bias stage showing the voltage divider and a resistor based self-biasing loop.

resistor R1 and amplifier A0. The voltage on node nd3 is replicated to n1 through the feedback loop. As a result, the current that flows through R1 and the drain current of  $M_0$  are identical and can be given by

$$I_{R1} = I_{M_0} = \frac{V_{dd} - V_{nd3}}{R1}$$
(2.6)

The gate overdrive voltage **bn** is thus self-biased at different temperatures. When the temperature is high, voltage **bn** decreases to compensate for the higher leakage, and vice versa. A transistor biased in the subthreshold region provides very high output resistance since the drain current barely changes with $V_{ds}$ when $V_{ds}$ is larger than $3kT/q$. The magnitude of the current-mirrored output can be easily adjusted for process variation by dividing $M_0$ into smaller parallel transistors with series switches that can be selectively turned on to change the ratio. A **reset** signal is used to turn off the biasing circuit in order to save power during the active mode. This is discussed further in a later section.

The oscillator that generates the timer output is shown in Fig. 2.9. It is a one-shot oscillator whose oscillation period is determined by the load capacitor $C_L$ and the current sources biased by **bn** and **bp**. Both **bn** and **bp** are originally generated by the bias stage and replicated by the hold stage, which will be presented later. When **out** is logic low, the charge stored on **load** is sunk to ground. On the other hand, **load** is pulled up toward $V_{dd}$ when **out** is logic high. The switching transistors for pull-up and pull-down are long channel devices and are stack-forced with N=4, because biasing transistors $M_n$ and $M_p$ are biased in the deep subthreshold region. To guarantee correct behavior at low temperatures, the switching transistors should be significantly weaker than the biasing transistors when they are turned off, so that unwanted leakage does not contend with the charging transistors during oscillation. The **load** voltage is compared to reference voltages **refh** and **refl**, and the output flips from its previous state once **load** rises above **refh** or falls below **refl**. In this work, **refh** and **refl** are also generated by the bias stage (voltages **nd2** and **nd4**). The gain of the comparator is an important design parameter since it determines the delay of signals **ss** and **rs** once **load** triggers the comparator. To maintain the comparator gain at different temperatures, the comparators are also biased with the same gate overdrive voltage **bp**.
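A first-order estimate of the oscillation period follows from the description above: the load node ramps between the two reference voltages at a rate set by the bias currents, and each comparator adds some delay. The sketch below assumes ideal constant-current charging and a fixed comparator delay per threshold crossing; the function name and all numerical values are hypothetical and are not taken from the test chip.

```python
import math


def oscillator_period(c_load, v_refh, v_refl, i_charge, i_discharge, t_delay=0.0):
    """Estimate the period of a relaxation oscillator, in seconds.

    c_load in farads, voltages in volts, currents in amperes.
    t_delay is an assumed comparator delay added once per threshold crossing.
    """
    swing = v_refh - v_refl
    t_rise = c_load * swing / i_charge      # load ramps up from refl to refh
    t_fall = c_load * swing / i_discharge   # load ramps down from refh to refl
    return t_rise + t_fall + 2.0 * t_delay


# Hypothetical values: 20 pF load, 200 mV swing, 100 pA in both directions.
period = oscillator_period(20e-12, 0.4, 0.2, 100e-12, 100e-12)
print(f"estimated period = {period:.3f} s")
```

In this simplified picture the period scales inversely with the bias current, which is why the temperature behavior of the current source dominates the temperature behavior of the timer.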

#### 2.3.2 Power reduction by charge holding technique

From Eq. 2.6, the current consumption of the circuit is mainly determined by the resistance value. When  $V_{dd} = 600 \text{mV}$ ,  $V_{nd3} = 300 \text{mV}$  and R1 equals  $15 \text{M}\Omega$ , the power consumption for the bias stage is 10nW assuming that the power consumption of the amplifier at room temperature can be neglected. Further increasing R1 will reduce power consumption at the expense of increased silicon area. Reducing the voltage difference between  $V_{dd}$  and  $V_{nd3}$  is another option. However, it magnifies the voltage offset between  $V_{nd3}$  and  $V_{n1}$  and increases the temperature sensitivity. In



Figure 2.9: One-shot oscillator for timer output.

an effort to keep adequate footprint for the timer, a program-and-hold technique is proposed to bring down the power by two orders of magnitude with little hardware overhead.

The circuit diagram for the proposed method is shown in Fig. 2.10. The bias stage and the oscillator stage were presented in the previous section; a hold stage is added for power saving. The idea is to hold the bias voltage on a capacitor after the bias stage is turned off through power gating. The time before the bias stage is turned off is called the programming mode. The timer enters the active mode when the bias voltage is sustained only by the hold stage. By applying the bias voltage to a transistor much smaller than the bias transistor, the bias current for the oscillator, and thus the total active power, can be proportionally reduced. For example, if the ratio between the transistor widths of $M_0$ and $M_1$ is 200:1, the power consumption can be reduced from 10nW during the programming mode to merely 50pW in the active mode.
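As a rough check of these figures, Eq. 2.6 and the mirror ratio can be evaluated directly. The sketch below assumes that the programming power is simply $V_{dd}$ times the resistor current (amplifier and divider currents neglected) and that the active power scales with the mirror ratio; the function name is hypothetical.

```python
import math


def bias_stage_power(vdd, v_nd3, r1, mirror_ratio):
    """Return (bias current, programming power, active power) in SI units."""
    i_bias = (vdd - v_nd3) / r1        # Eq. 2.6
    p_prog = vdd * i_bias              # assumption: other branches neglected
    p_active = p_prog / mirror_ratio   # assumption: power scales with the ratio
    return i_bias, p_prog, p_active


# Values quoted in the text: 600mV supply, 300mV at nd3, 15 MOhm, 200:1 ratio.
i_bias, p_prog, p_active = bias_stage_power(0.6, 0.3, 15e6, 200)
print(f"I = {i_bias * 1e9:.1f} nA, "
      f"P_prog = {p_prog * 1e9:.1f} nW, "
      f"P_active = {p_active * 1e12:.1f} pW")
```

With these inputs the estimate comes out at about 20 nA, 12 nW and 60 pW, the same order as the roughly 10nW and 50pW quoted above.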

Two types of charge holding circuit are shown in Fig. 2.11. For type I circuit,



Figure 2.10: One-shot oscillator for timer output.

the bias voltage **bn** is written into **bn1** and **bn2** when **c[1]** is high. After **c[1]** goes low, **bn** is discharged to ground as amplifier A0 is turned on simultaneously. Assuming A0 has no input offset voltage, **bn1** should be identical to **bn2**, which eliminates the subthreshold leakage through $M_{s2}$. Ideally, the charge stored on $C_L$ can be maintained for a long period of time before it needs to be replenished. However, the junction leakage of $M_{s2}$ becomes the dominant mechanism discharging $C_L$ as the temperature increases. To address this issue, the type II circuit is considered. Instead of charging $C_L$ through pass transistors, the gate leakage of a thin oxide transistor is used. When the bias stage is ON, amplifier A1 controls the voltage of **bn1** until **bn2** reaches the same value as **bn**. After the bias stage turns off, node **bn2** acts like a floating node. In this scheme, node **bn2** does not suffer from any leakage source other than the path through transistor $M_s$. Since gate leakage is relatively insensitive to temperature, the type II circuit is able to operate at much higher temperatures than its type I counterpart.

To further understand the operation of the hold circuit, the node voltages are plotted in Fig. 2.12. Assuming at the beginning of power on, every node has initial



Figure 2.11: Circuits for charge holding. (a) type I, (b) type II.

condition of voltage 0. During phase P1, node **bn** will reach steady state first as discussed in Sec. 2.3.1. **bn2** will slowly converge to **bn** with time constant depending on the ratio of gate leakage and the load capacitance. It is noted there is finite voltage offset between **bn1** and **bn** due to input offset voltage and finite gain for A1. In order to eliminate the gap between **bn1** and **bn**, transistor  $M_s$  is turned on during P2. At this point, ideally **bn**, **bn1** and **bn2** will all have the same voltage. In phase P3 and P4, amplifier A2 is turned on to minimize the voltage difference between **bn1** and **bn2** while bias stage can be turned off by **c[0]** to save power. While P2, P3 can take less than milliseconds, P1 will be the dominating period of the total programming time. Since **bn1** and **bn2** follow each other closely during P4, **bn1** is chosen to drive the oscillator stage. In this way, the unnecessary coupling noise from the switching of oscillator can be prevented from entering pseudo-floating node **bn2**. The programming time of the timer is defined as the total period combining P1, P2 and P3.

#### 2.3.3 Test chip and measurement results

The proposed timer design was fabricated in $0.13\mu$m technology. The die photo of the timer circuit is shown in Fig. 2.13. The total area of the timer is 0.019mm<sup>2</sup>, of which the resistor occupies about half. Ambient temperature is controlled by



Figure 2.12: Timing diagram of the program-and-hold method.

a TestEquity TE-105A temperature chamber. To guarantee a precisely controlled temperature, all tests are performed 10 minutes after the temperature is ramped up. The output frequency of the timer is measured by a Tektronix TDS5104A oscilloscope. In this setup, all control signals are supplied externally through the parallel interface of a PC.

The normalized frequency variation due to temperature and supply voltage is plotted in Fig. 2.14. Since the frequency of the timer drifts over time due to the gate leakage through the programming transistor, the output frequency in this figure refers to the frequency right after the programming mode is completed. The programming time is set to 30 seconds for this measurement. At lower temperatures, the output frequency decreases from the room temperature value. This is mainly because of the reduced gain of the comparators in the oscillator stage. The variation of the output frequency across temperature when the timer is operating at 600mV is 6% over the range from 0°C to 90°C. At different supply voltages, the curves closely track each other with respect to temperature, suggesting that the characteristics



Figure 2.13: Die photo of the timer test chip.

remain unchanged with the bias voltage. At room temperature, varying the supply voltage by  $\pm 50$ mV results in +4/-2% of frequency variation. In this work, the timer is the only active component in the sleep mode. Thus, the switching supply noise can be neglected. It is reasonable to assume that the cycle time error due to supply variation should be lower than 1%.

The expected timer behavior is, for example, to wake up the system every 10 minutes. Ideally, the timer only needs to be reprogrammed when the processor is woken up, which simplifies the programming process. Because the timer output frequency is not constant after entering the active mode, the expiration time that is defined by a certain number of cycles counted by



Figure 2.14: Normalized frequency vs. temperature and supply voltage.

the timer varies depending on how often the timer is refreshed. Fig. 2.15 shows the average timer frequency versus the time between refreshes. The results are again measured at 600mV with a programming time of 30 seconds. The curves in the figure represent temperature points of 0, 20, 50 and 90°C, respectively. Higher temperature means higher leakage, and thus a larger slope in the figure. At 0°C, the frequency drift is about 0.8% per minute, while it is 1.7% per minute at 90°C. The measurement results back up the statement that choosing the type II circuit for the hold stage is advantageous because of its low temperature coefficient. When the timer is refreshed every 4 minutes, a 7% frequency deviation is observed across the temperatures. Reducing the refresh interval to 2 minutes reduces the frequency deviation to 5%.

So far, the programming time has been set to 30 seconds to guarantee that the timer is properly biased. However, the power saving of the program-and-hold method is maximized by keeping the time spent in the power hungry programming mode as short as possible.



Figure 2.15: Average of normalized frequency drift over time.

In Fig. 2.16, the timer is refreshed every four minutes with a programming time of 1.1 seconds. It shows that 3 to 4 programming cycles are required to bias the timer at the target frequency. After that, the timer will operate at a steady frequency. In order to achieve the steady state frequency in a single programming cycle, the programming time needs to be increased to at least 1.5 seconds.

Although reducing the programming time is of interest, its impact on temperature sensitivity needs to be considered. Fig. 2.17 shows the output frequency, normalized to that at a programming time of 10 seconds, for three different temperature settings. When the programming time is decreased below 1 second, the timer can no longer be properly programmed and therefore no oscillation is observed. This is consistently true across the temperatures of interest. At room temperature, the output frequency drops 8% when the programming time is reduced to less than 2 seconds. Since the programming time will be fixed across temperature, what matters is the frequency deviation at the same programming time. According to the figure,



Figure 2.16: Refresh the timer every four minutes with 1.1 second programming time.

reducing the programming time will not introduce more than 2% error on top of the frequency deviation shown in Fig. 2.14. Therefore, it is reasonable to always use the minimum programming time that is available for this work.

Power consumption is measured in the programming mode and the active mode, respectively. Fig. 2.18 shows the power consumption at a supply voltage of 550mV. The programming power has a clear floor at around 11nW, mainly due to the fixed voltage drop across the polysilicon resistor in the bias stage. At higher temperatures, the programming power grows faster than linearly due to the exponentially increasing power of the amplifiers, which are biased in the subthreshold region. The active power at room temperature is 55pW, which follows from the 200:1 current mirror ratio. In active mode, the power consumption of the comparators starts to dominate the total power above 60°C.

Combining Fig. 2.15 and Fig. 2.18, the tradeoff between power and frequency deviation across the temperature range of interest from 0°C to 90°C is shown in



Figure 2.17: Normalized frequency vs. programming time.



Figure 2.18: Power consumption at the programming mode and the active mode with respect to different temperatures.



Figure 2.19: Power consumption and frequency deviation with different refreshing time.

Fig. 2.19. The programming time is set to 1 second, which is also the shortest period that can still bias the timer properly. As the refreshing time gets larger, the average power consumption, which includes both the programming power and the active power, is reduced. The frequency deviation is 5% without considering the shift of frequency over time. When the refreshing time increases beyond two minutes, the frequency deviation begins to rise as a result of the leakage current difference between low and high temperatures. To sum up, power consumption of 150pW and 100pW can be achieved if the tolerable errors are 5% and 7%, respectively.
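The trade-off between refresh interval and average power can be reproduced with a simple duty-cycle estimate. The sketch below assumes the room-temperature figures quoted for Fig. 2.18 (about 11nW while programming and 55pW in the active mode), a 1 second programming time, and an energy-weighted average over one refresh cycle; temperature dependence is ignored, so the result is only indicative.

```python
def average_power(p_prog, t_prog, p_active, t_refresh):
    """Energy-weighted average power over one program-and-hold cycle (watts)."""
    energy = p_prog * t_prog + p_active * t_refresh  # joules per cycle
    return energy / (t_prog + t_refresh)


# Assumed room-temperature inputs: 11 nW programming, 55 pW active, 1 s programming.
for minutes in (2, 4):
    p_avg = average_power(11e-9, 1.0, 55e-12, 60.0 * minutes)
    print(f"refresh every {minutes} min: {p_avg * 1e12:.1f} pW")
```

Under these assumptions the estimate gives roughly 145 pW and 100 pW for 2 and 4 minute refresh intervals, in line with the 150pW and 100pW figures above.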

## 2.4 Summary

In this chapter, two types of timers are proposed for ultra low power sensor platforms. Table 2.1 summarizes the characteristics of the timers. The gate leakage timer has the advantage of a smaller footprint and also consumes less power than the program-and-hold timer when operating at 300mV. In terms of applications, the program-and-hold timer works only under the assumption that temperature varies slowly compared to the refreshing period. The gate leakage timer, however, suffers from larger variation when operating at different temperatures, especially as the supply voltage becomes lower. To obtain better temperature insensitivity from the gate leakage based timer, the supply voltage has to be increased, which results in higher power consumption. In terms of low voltage operation, the gate leakage timer is preferred since its operation does not rely on analog components as the program-and-hold timer does. Control of the gate leakage timer is trivial, whereas the program-and-hold timer requires a finite-state machine to program it properly.

|                                        | Gate leakage timer           | Program-and-hold timer               |
|----------------------------------------|------------------------------|--------------------------------------|
| Technology                             | $0.13\mu$m                   | $0.13\mu$m                           |
| Total area                             | 480 $\mu$m<sup>2</sup>       | 0.019mm<sup>2</sup>                  |
| Nominal period                         | 11s                          | 0.09s                                |
| Cycle time error due to temperature    | 16.3% @ 450mV\*              | 5% (refresh every 2 minutes)\*\*     |
|                                        |                              | 7% (refresh every 4 minutes)\*\*     |
| Power consumption                      | 120pW @ 450mV                | 50mV 150pW (refresh every 2 minutes) |
|                                        |                              | 100pW (refresh every 4 minutes)      |
| Supply sensitivity ($\pm 50$mV offset) | $\pm 7.5\%$ @ 450mV          | +4/-2% @ 600mV                       |

\* Over a temperature range from $0^{\circ}$C to $80^{\circ}$C.

\*\* Over a temperature range from $0^{\circ}$C to $90^{\circ}$C.

#### CHAPTER III

# AN ULTRA LOW POWER 1V, 220NW TEMPERATURE SENSOR FOR PASSIVE WIRELESS APPLICATIONS

## 3.1 Introduction

Over the last decade, demand for smart temperature sensors has grown in VLSI, automotive, and wireless sensing applications due to their low cost. Monitoring VLSI chip temperature plays a key role in long-term system level reliability and performance. Rapidly increasing transistor counts require embedded sensors with small area and low power that can be distributed across the chip for temperature management [84]. Sensors with low power consumption not only help with power grid integrity but also alleviate self-heating issues. Recently, growing interest in building monitoring systems with wireless telemetry or RFID cards has imposed even more stringent power constraints [85, 86]. The energy range, defined as the distance between the transponder and the reader at which there is just enough energy to operate the transponder, can be extended by cutting down the power dissipation [87]. In the work reported in [65], the temperature sensor consumes $10\mu$W compared to $2\mu$W by the reader for their passive RFID transponder. This means that the power consumption of the temperature sensor strongly affects the working distance of such wireless systems.

Smart temperature sensor ICs were first developed using bandgap references and analog-to-digital converters (ADCs) [88, 89]. Such sensors are typically able to achieve better than $\pm 1^{\circ}C$ accuracy with calibration. By combining offset cancellation, dynamic element matching and room-temperature calibration, an accuracy of $\pm 0.1^{\circ}C$ with 247.5$\mu$W power consumption was reported [90]. A time-to-digital converter (TDC) was also proposed to measure temperature by tracking a pulsed signal along a delay line [91]. In this work, our goal is to implement a temperature sensor with sub-$\mu$W power dissipation and acceptable accuracy for ultra low power passive wireless sensor applications. In Sec. 3.2, the architecture of our proposed circuits is discussed and the power consumption is analyzed. Measurement results are shown in Sec. 3.3, followed by the conclusion in Sec. 3.4. A discussion on improving the voltage sensitivity is given in Sec. 3.5.

## 3.2 Low power temperature sensor design

Fig. 3.1 shows the block diagram of the temperature sensor. A temperature insensitive current source $I_{ref}$ and a proportional to absolute temperature (PTAT) current source $I_{PTAT}$ are generated separately. Each current source is mirrored and fed into a current-starved ring oscillator to translate the temperature information into frequency. Afterwards, the clock signals are fed into an UP-counter that is triggered by a *start* signal in order to produce a digitized output. The sensor controller decides when the conversion should start and responds with a *data\_valid* signal when the data is available. The key task in this work is to generate current sources $I_{ref}$ and $I_{PTAT}$ with low power dissipation while still maintaining reasonable temperature characteristics.

Generating  $I_{PTAT}$  is a commonly used technique in bandgap reference design for compensating the complementary to absolute temperature (CTAT) current sources. Fig. 3.2 shows the schematic for such purpose that was originally implemented with bipolar circuits [92]. CMOS transistors can be used in place of bipolar transistors when operating in the subthreshold region. In this way, we can reduce the power consumption of this block significantly, which accounts for roughly 30% of the total



Figure 3.1: Temperature sensor block diagram.

power dissipation. When $V_{gs}$ is less than $V_{th}$ and $V_{ds}$ is larger than $3V_T$, the drain current of transistors M4 and M5 can be approximated by:

$$I_{sub} = \mu C_{OX} \cdot \frac{W}{L} \cdot {V_T}^2 \cdot \exp\left[\frac{V_{gs} - V_{th}}{nV_T}\right]$$
(3.1)

where $V_T$ is the thermal voltage $kT/q$ and $V_{th}$ is the threshold voltage of the transistor. Through current mirror transistors M2 and M3, the current through resistor $R_{PTAT}$ can be expressed as

$$I_{R_{PTAT}} = \frac{nV_T}{R_{PTAT}} \ln \left[ \frac{W_5 W_3 L_4 L_2}{W_4 W_2 L_5 L_3} \right]$$
(3.2)

assuming that $V_{th}$ mismatch is ignored. By properly biasing the circuit, the output current is proportional to $V_T$. The sensitivity to geometric variations can be minimized by designing for a large value in the log function. Large transistor sizes also help to reduce the impact of random doping fluctuations on the threshold voltage.
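The PTAT behavior of Eq. 3.2 can be illustrated numerically. In the sketch below, the resistor value is the 3.2M$\Omega$ used for $R_{PTAT}$ in this work, while the subthreshold slope factor `n` and the combined sizing ratio inside the logarithm are hypothetical placeholders; the resistor's own temperature coefficient is ignored.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, in V/K


def i_ptat(temp_k, r_ptat=3.2e6, size_ratio=8.0, n=1.5):
    """Evaluate Eq. 3.2; size_ratio stands for the W/L product ratio in the log."""
    v_t = K_OVER_Q * temp_k  # thermal voltage kT/q
    return n * v_t / r_ptat * math.log(size_ratio)


# The current scales linearly with absolute temperature under these assumptions.
for temp_c in (0, 50, 100):
    current = i_ptat(temp_c + 273.15)
    print(f"{temp_c:3d} C: {current * 1e9:.2f} nA")
```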

The temperature insensitive current source is generated by a self-biasing technique. The circuit diagram is shown in Fig. 3.3. M1 through M5 are diode-connected



Figure 3.2: Schematic for  $I_{PTAT}$  generation.

transistors used to provide bias voltages that are proportional to the supply voltage. The voltage of nb is replicated to node na through negative feedback loop consisting of transistor M6, resistor R1 and the amplifier. Therefore, the drain current of M6 can be defined by  $(V_{dd} - V_{na} - V_{os})/R_{ref}$ , where  $V_{os}$  is the input offset voltage of the amplifier. The fractional temperature coefficient  $(TC_F)$  of  $I_{d6}$  is

$$TC_F(I_{d6}) = \frac{1}{I_{d6}} \cdot \frac{dI_{d6}}{dT}$$
(3.3)

$$= -\frac{1}{V_{dd}-V_{na}-V_{os}}\cdot\frac{dV_{na}}{dT}-\frac{1}{R_{ref}}\cdot\frac{dR_{ref}}{dT}$$
(3.4)

To reduce the non-ideal temperature effect on the sensor we do the following: 1) the resistor is chosen so that the second-order temperature coefficient (TC2) is minimized; and 2) transistors M1-M5 should be identically sized to eliminate the first term of Eq. 3.4.

It is noted that in this work, the voltage reference circuitry in Fig. 3.3 was implemented as a voltage divider. Thus, $I_{ref}$ is proportional to the supply voltage, which leads to an output value that changes with power supply noise. To fix this issue in the future, the voltage reference should be redesigned to have a constant output regardless of the supply voltage.

Figure 3.3: Schematic for $I_{ref}$ generation.

Both $I_{ref}$ and $I_{PTAT}$ blocks generate analog voltages bn and bp to provide the starving voltage for the ring oscillator. The temperature information in $I_{ref}$ and $I_{PTAT}$ is translated into the frequencies of the signals $clk_i$ and $clk_l$. Fig. 3.4 shows the sensor controller along with its timing diagram. $clk_i$ and $clk_l$ are used to clock the q-counter and the d-counter, respectively. When *start* is 0, both counter outputs are cleared. Triggered by the input signal *start*, the controller asserts the output *data\_valid* when the q-counter overflows $2^{10}$ cycles later. *data\_valid* immediately stops both counters from changing their contents until *start* goes to 0 again to reset the states. The temperature sensor and the $I_{ref}$ and $I_{PTAT}$ blocks are implemented so



Figure 3.4: Block diagram and timing diagram of the sensor controller.

that they can be deactivated during sleep state by asserting *reset* signal. When high conversion rate in not required, the temperature sensor can be periodically deactivated to save power.
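The counting scheme of the sensor controller can be illustrated with a short behavioral sketch in Python. It is only an idealized model, not the controller's implementation: the function name `convert`, the jitter-free clocks and the example frequencies are assumptions made for illustration.

```python
def convert(f_q_hz: int, f_d_hz: int, q_bits: int = 10) -> int:
    """Behavioral model of one conversion (idealized, hypothetical helper).

    The q-counter counts 2**q_bits cycles of its clock. When it overflows,
    data_valid freezes the d-counter, so the frozen d-counter value is
    proportional to the ratio of the two clock frequencies.
    """
    # Number of d-clock cycles that fit into 2**q_bits q-clock cycles.
    return (2 ** q_bits) * f_d_hz // f_q_hz


# Hypothetical clock frequencies, chosen only to exercise the model.
print(convert(100_000, 100_000))  # equal clocks: d-counter matches q-counter
print(convert(100_000, 150_000))  # faster d-clock: proportionally larger code
```

Because the output depends only on the frequency ratio, any factor common to both oscillators cancels in this idealized model.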

The total power of our proposed temperature sensor can be written as follows:

$$P_{tot} = V_{dd} \cdot [(n+1)I_{PTAT} + (m+1)I_{ref}] + P_{ctrl}$$
(3.5)

where n and m are the multiplication constants of the current mirrors and  $P_{ctrl}$  is the power consumption of the sensor controller. For simplicity, static power consumption is neglected in this first-order analysis. Therefore,  $P_{ctrl}$  can be expressed as  $\alpha C_c V_{dd}^2 f_{clk}$  given the total capacitance  $C_c$ , effective activity factor  $\alpha$  and clock frequency  $f_{clk}$ . Considering  $f_{clk}$  as a function of  $I_{ref}$ ,  $I_{PTAT}$  and  $V_{dd}$ , Eq. 3.5 can be rewritten as

$$P_{tot} = k1 \cdot \frac{V_{dd}}{R_{PTAT}} \cdot V_T + k2 \cdot \frac{V_{dd}^2}{R_{ref}}$$
(3.6)

where k1 and k2 are geometry and process related constants. It is shown that 1) the power consumption of the sensor is a linear function of temperature; and 2) power consumption can be proportionally reduced by using large resistors. The resistor sizes are determined by the target current consumption of 200nA and by matching  $I_{ref}$  and  $I_{PTAT}$  at room temperature. In this work, 6.2M$\Omega$ and 3.2M$\Omega$ P+ poly resistors are chosen for  $R_{ref}$  and  $R_{PTAT}$ , respectively.
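As a rough numerical illustration of Eq. 3.5, the following Python sketch evaluates the first-order power model. The function name and every numeric value in the example (branch currents, mirror ratios, activity factor, capacitance and clock frequency) are hypothetical placeholders, not design values from this work.

```python
def total_power(vdd, i_ptat, i_ref, n, m, alpha, c_c, f_clk):
    """First-order power model of Eq. 3.5 (static power neglected).

    P_ctrl is modeled as alpha * C_c * Vdd^2 * f_clk, as in the text.
    """
    p_analog = vdd * ((n + 1) * i_ptat + (m + 1) * i_ref)
    p_ctrl = alpha * c_c * vdd ** 2 * f_clk
    return p_analog + p_ctrl


# Hypothetical operating point, chosen only to exercise the formula.
p = total_power(vdd=1.0, i_ptat=50e-9, i_ref=50e-9, n=1, m=1,
                alpha=0.1, c_c=100e-15, f_clk=100e3)
print(f"{p * 1e9:.1f} nW")
```

In this model the analog term scales directly with the branch currents, so halving both currents (for example by doubling both resistors) halves that term.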

## 3.3 Measurement results

The chip was implemented in a  $0.18\mu \text{m}$  1P6M digital CMOS process. The total area of the temperature sensor module is  $0.05\text{mm}^2$ . The die photo is shown in Fig. 3.5. In this test chip, 85% of the area is occupied by the resistors used for biasing the current sources.



Figure 3.5: Die photo of the temperature sensor.



Figure 3.6: Power consumption of the temperature sensor.

The measurement is set up inside a TestEquity environment chamber TE-105A. The power consumption is measured by a Keithley electrometer 6517A and the results are shown in Fig. 3.6. The supply voltage is set to 1V while the nominal supply voltage for this technology is 1.8V. The power consumption increases from 200nW to 310nW from 0°C to 100°C, which matches the expected trend from Eq. 3.6. The slope at higher temperature is larger mainly because leakage components become non-negligible in this region. It is noted that there is a trade-off between power consumption and area. Because most of the area is occupied by the resistors, reducing the resistance by half also reduces the total area by 43%. At the same time, the conversion rate is doubled because of the boost in the ring oscillator's starving current. In this test chip, *clk\_i* runs at 100kHz for an equivalent of 100 samples/s. This is sufficiently fast for most applications, and in fact we can lower the conversion rate to lower the reading noise, as will be shown in Fig. 3.8.
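The two figures quoted above can be checked with simple arithmetic. The sketch below assumes that resistor area scales linearly with resistance (same sheet resistance and width) and that one conversion takes the $2^{10}$ clock cycles of the q-counter described in Sec. 3.2; both function names are hypothetical.

```python
def area_reduction(resistor_area_fraction: float, resistance_scale: float) -> float:
    """Fraction of total area saved, assuming resistor area scales with resistance."""
    return resistor_area_fraction * (1.0 - resistance_scale)


def conversion_rate(f_clk_hz: float, q_bits: int = 10) -> float:
    """Conversions per second if one conversion takes 2**q_bits clock cycles."""
    return f_clk_hz / (2 ** q_bits)


print(area_reduction(0.85, 0.5))  # 0.425, i.e. about 43% of the total area
print(conversion_rate(100e3))     # 97.65625, i.e. about 100 samples/s
```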

| Sensor    | Inaccuracy ($^{\circ}$C) | Power consumption | Technology  | Area (mm$^2$) | Temperature range ($^{\circ}$C) | Conversion rate (samples/s) |
|-----------|--------------------------|-------------------|-------------|---------------|---------------------------------|-----------------------------|
| [88]      | $\pm 1$                  | $7\mu$W           | $2\mu$m     | 1.5           | $-40 \sim 120$                  | 50                          |
| [89]      | $\pm 1$                  | 1mW               | $0.6\mu$m   | 3.32          | $-55 \sim 125$                  | 40k                         |
| [90]      | $\pm 0.1$                | $247.5\mu$W       | $0.7\mu$m   | 4.5           | $-55 \sim 125$                  | $1 \sim 10$                 |
| [91]      | -0.7/+0.9                | $10\mu$W          | $0.35\mu$m  | 0.175         | $0 \sim 100$                    | 10k                         |
| [65]      | -1.8/+2.2                | $10\mu$W          | N/A         | N/A           | $0 \sim 100$                    | $\sim 2$                    |
| [86]      | $\pm 1$                  | $0.9\mu$W         | $0.18\mu$m  | 0.2           | $27 \sim 47$                    | N/A                         |
| This work | -1.6/+3                  | $0.22\mu$W        | $0.18\mu$m  | 0.05          | $0 \sim 100$                    | 100                         |

Table 3.1: Comparison of temperature sensors.

Figure 3.7: Temperature inaccuracy of the temperature sensor with two-point calibration at  $20^{\circ}$ C and  $80^{\circ}$ C.

The temperature inaccuracy of 5 test samples after two-point calibration is shown in Fig. 3.7. The temperature error ranges from  $-1.6^{\circ}$ C to  $+3^{\circ}$ C over the sweeping range from  $0^{\circ}$ C to  $100^{\circ}$ C. With 11 bits output from the sensor controller, the temperature resolution is  $0.3^{\circ}$ C.
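Two-point calibration amounts to fitting a straight line through two known (code, temperature) pairs. The Python sketch below shows the idea under the assumption of a linear code-to-temperature mapping; the function name and the raw codes are hypothetical, chosen so that one code step corresponds to 0.3°C.

```python
def two_point_calibration(code_lo, temp_lo, code_hi, temp_hi):
    """Return a function mapping a raw sensor code to temperature (deg C)
    using a straight line through the two calibration points."""
    slope = (temp_hi - temp_lo) / (code_hi - code_lo)

    def to_temperature(code):
        return temp_lo + slope * (code - code_lo)

    return to_temperature


# Hypothetical raw codes recorded at the 20 C and 80 C calibration points.
to_temp = two_point_calibration(code_lo=600, temp_lo=20.0,
                                code_hi=800, temp_hi=80.0)
print(to_temp(700))  # a code midway between the two points maps to about 50 C
```

Any curvature of the real sensor response between and beyond the calibration points shows up as residual error, which is what Fig. 3.7 reports.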

Table 3.1 lists previous work on smart temperature sensors and compares the key circuit parameters with those of this work. It can be seen that our proposed temperature sensor adopts an approach that favors low power operation at the expense of temperature accuracy. The total area of our test chip is comparable to or even smaller than that of other works after accounting for the differences in technology.

Figure 3.8: Temperature inaccuracy over samples (top: 10 samples/s; bottom: 100 samples/s; solid line: actual temperature).

Fig. 3.8 shows the long term characteristics of the sensor, measured with the chip set up in the temperature chamber (top: 10 samples/s; bottom: 100 samples/s). After taking 1000 samples successively, the  $3\sigma$  inaccuracy value over the samples is 2.5°C. By lowering the conversion rate to 10 samples/s, the  $3\sigma$  inaccuracy is reduced to 0.28°C by averaging the samples. The actual temperature is also shown as a solid line in Fig. 3.8.

## 3.4 Conclusion

In this work, we implemented an ultra low power temperature sensor for passive wireless applications. At room temperature, it consumes merely 220nW while continuously running. By utilizing a temperature independent current source  $I_{ref}$  and a PTAT current source  $I_{PTAT}$ , the temperature information can be synthesized and translated into a digital output at a conversion rate of 100 samples/s. Measured data show that the temperature inaccuracy of the temperature sensor is  $-1.6^{\circ}C/+3^{\circ}C$  from 0°C to 100°C.



Figure 3.9: Modified temperature insensitive current source.

## 3.5 Improving the voltage sensitivity

In order to minimize the supply voltage sensitivity, the circuit shown in Fig. 3.3 needs to be revisited. As the supply voltage deviates from its nominal value by  $\Delta V$ , the current varies by  $\Delta I = \Delta V/R_{ref}$ .  $\Delta I$  will result in a shift of the output value for the same measured temperature. More importantly, it will also distort the current-to-frequency conversion by the current starved ring oscillator, since the relationship between the input current and the output frequency is non-linear. One solution is to investigate another way of translating the current to frequency that is less impacted by the supply voltage. For example, a transmission gate based current starved oscillator can be used.
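To give a feel for the magnitude of this effect, the relation $\Delta I = \Delta V/R_{ref}$ can be evaluated directly. In the sketch below, $R_{ref} = 6.2$M$\Omega$ is the resistor value quoted in Sec. 3.2, while the 50mV supply deviation and the function name are hypothetical.

```python
def current_shift(delta_v: float, r_ref: float) -> float:
    """Shift in reference current (A) for a supply deviation delta_v (V)."""
    return delta_v / r_ref


# R_ref = 6.2 MOhm is the value quoted for this design; the 50 mV
# supply deviation is a hypothetical example.
delta_i = current_shift(50e-3, 6.2e6)
print(f"{delta_i * 1e9:.2f} nA")
```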

Figure 3.10: Voltage reference generator.

Another solution is to consider the circuit shown in Fig. 3.9. Instead of using a voltage divider as the voltage reference,  $V_{ref}$ , which provides an absolute voltage level, needs to be generated. The other change is to place the resistor between ground and *na*. Therefore, a supply voltage insensitive current source can be generated.  $V_{ref}$  can be provided by the circuit shown in Fig. 3.10. The details of the voltage reference will be discussed in Sec. 6.2.2. According to Monte Carlo simulations, more than 40dB of power supply rejection ratio (PSRR) can be achieved using the circuit. The temperature coefficient of  $V_{ref}$  can be compensated by sizing transistor M1. At the temperature of interest, the temperature coefficients of the thermal voltage and the threshold voltage of M1 cancel each other. Again from 100 Monte Carlo simulations, the worst-case temperature coefficient is 167ppm/°C.

#### CHAPTER IV

# SINGLE STAGE STATIC LEVEL SHIFTER DESIGN FOR SUBTHRESHOLD TO I/O VOLTAGE CONVERSION

## 4.1 Introduction

Operating in the subthreshold region helps to greatly reduce power dissipation for applications that do not require high performance [80, 19]. Level conversion has always been an issue for systems that need to deal with two or more power domains. This problem is more severe in subthreshold circuits. Since the drive strength of the input devices is mostly limited to subthreshold operation and has a corresponding exponential dependency on voltage, several intermediate voltages are typically required to up-convert to I/O voltage levels. Generating intermediate voltages and the extra wiring requirements are undesirable side effects of multi-stage level conversion. In this chapter we discuss the design issues associated with bridging the subthreshold core logic and I/O voltage in a single stage design, and propose a robust circuit that addresses these issues.

Differential cascode voltage switch (DCVS) is a commonly used circuit technique for voltage level conversion [93]. It has the advantage of low static power consumption and small propagation delay due to the cross-coupled latch structure. The drawback is that converting from ultra-low input voltages requires large transistors, which will be discussed in Sec. 4.2. Modified DCVS was proposed to alleviate the contention problem [94]; however, it still suffers from the same sizing issue as the DCVS level shifter when the input is at subthreshold voltage. A dynamic level converter is a reliable way to achieve level conversion at low voltages [95, 96]. The disadvantage is that it is more power hungry compared to its static counterpart and requires extra clock routing and synchronization circuitry. Since power consumption is the critical performance metric in subthreshold systems, dynamic level conversion becomes undesirable. Single supply diode-voltage-limited buffer and half-latch level converters are other options used for dual supply systems [97]. While not specifically designed for subthreshold level conversion, these designs generally require the cascading of multiple stages.

In this chapter, we will first examine the sizing of conventional DCVS level shifters at very low voltages and demonstrate their susceptibility to process/voltage/temperature (PVT) variation. We will then present a modified diode-voltage-limited level shifter that takes advantage of the input level independent pull-up devices. We compare the proposed circuit to the conventional approach in terms of power consumption, area, and delay.

## 4.2 Conventional approach

Fig. 4.1 shows the circuit diagram of a DCVS-type level shifter. The circuit operates on the basis of contention between pull-up and pull-down devices. In order for the output to switch, the NMOS drive strength has to be sufficiently greater than the PMOS drive strength. When *VDDL* is at a subthreshold level and *VDDH* is the I/O voltage, the difference in drive strength for the pull-up and pull-down transistors can easily be greater than three orders of magnitude.

Figure 4.1: Conventional DCVS-type level shifter with cross-coupled pull-up transistors.

Fig. 4.2 shows the simulated gate delay of the DCVS level shifter converting a periodic input to the I/O voltage in 0.13$\mu$m CMOS. Transistors Mp1 and Mp2 are both sized at a W/L of 0.36$\mu$m/5$\mu$m<sup>1</sup> to provide decent fall delay with respect to the rise delay while restricting the size of the pull-down transistors Mn1 and Mn2. When VDDL is 0.35V, the operating frequency is primarily limited by the fall delay if the width of Mn1 and Mn2 (represented by Wn) is greater than 15$\mu$m. This also indicates that the results can be further optimized by up-sizing Mp1 and Mp2. However, at lower VDDL, the rise delay is not able to catch up with the fall delay until Mn1 and Mn2 are disproportionately large. Other than the area and corresponding leakage power arising from such large pull-down transistor sizes, operating at low temperatures leads to another problem for this type of circuit. The drive strength of Mp1/Mp2 is (at most) quadratically sensitive to Vt shift while the drain current of Mn1/Mn2 has an exponential dependency on Vt. As a result it is much more difficult to balance drive strengths at lower temperature by sizing up Wn. In typical cases, the conventional level shifter is able to achieve less than 5 fanout-of-four inverter (FO4) delays at VDDL, which is sufficiently fast for most subthreshold applications. However, considering  $3\sigma$  process variation, Wn needs to be up-sized to at least two times its nominal value to maintain functionality.

<sup>1</sup> 0.36$\mu$m is the minimum width of an I/O device in this technology.



Figure 4.2: Simulation results showing the operating frequency with respect to pull-down transistor width Wn.

## 4.3 Proposed approach

To overcome the dramatic difference in overdrive voltage for pull-up and pull-down transistors, diode-connected PMOS transistors are used to replace the pull-up transistors. The proposed circuit is shown in Fig. 4.3. The pull-down of internal nodes *intn* and *intp* is directly through normal Vt transistor Mn1 and zero Vt thick oxide transistor Mn2. The use of Mn2 was previously proposed to reduce the potential difference across the drain and source of Mn1 [98]. Therefore, the drive strength of Mn1 can be increased by avoiding the use of thick oxide devices. We apply this stacked transistor technique to the conventional level shifter for a fair comparison in Sec. 4.4. The purpose of Mp1 is to help eliminate part of the cross-bar current at the beginning of the transition, although it also introduces a delay penalty of roughly 10% due to the extra loading.

Figure 4.3: Proposed approach that uses input voltage independent diode-connected transistor stacks for pull-up devices.

For pulling up the internal nodes, Md1 through MdN transistor stacks provide a variable resistance path to the supply. At the beginning of input switching, most of the *VDDH* voltage drop is across Md1-MdN. Assuming that each transistor in the stack is still biased above threshold voltage and neglecting second order effects, the effective resistance of the stack can be represented by

$$R_{eff} = \frac{\mathsf{VDDH} \cdot L_p}{\mu \cdot C_{ox} \cdot W_p (\frac{\mathsf{VDDH}}{\mathsf{N}} - V_t)^2}$$
(4.1)

where N is the number of devices in the stack. A small N helps to achieve faster falling delay by initially providing a smaller  $R_{eff}$  and reducing the time it stays in subthreshold region before the state is switched. However, a smaller N also leads to larger leakage when the input is at state 1. Nodes *intn* and *intp* are used to drive output transistors Mno and Mpo, respectively. In this way, the 'good 0' property of *intn* and 'good 1' property of *intp* can both be used to reduce static power with little impact on the ON current of the output stage.
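The dependence of $R_{eff}$ on the stack height can be explored numerically with a small Python sketch of Eq. 4.1. The function name and all device parameters below ($V_t$, $\mu C_{ox}$ and $W_p/L_p$) are hypothetical placeholders rather than values from the technology used in this work.

```python
def r_eff(vddh, n, vt, mu_cox, w_over_l):
    """Effective resistance of an N-high diode-connected stack (Eq. 4.1).

    Valid only while each device sees more than Vt, i.e. vddh / n > vt.
    """
    v_per_device = vddh / n
    if v_per_device <= vt:
        raise ValueError("Eq. 4.1 assumes each device is biased above threshold")
    return vddh / (mu_cox * w_over_l * (v_per_device - vt) ** 2)


# Hypothetical device parameters, for illustration only.
for n in (3, 4, 5):
    r = r_eff(vddh=2.5, n=n, vt=0.35, mu_cox=50e-6, w_over_l=0.1)
    print(f"N = {n}: R_eff = {r / 1e6:.2f} MOhm")
```

With these placeholder values the resistance grows quickly with N, which matches the qualitative statement that a small N gives a smaller initial $R_{eff}$.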

Figure 4.4: Proposed level shifter with feedback path for leakage reduction.

The circuit in Fig. 4.4 is proposed to solve the leakage problem introduced in the previous paragraph. The idea is to add a PMOS header Mp2 on top of the pull-up transistor stack. When both the input and the output are high, n0 is low and n1 is designed to be 500mV below VDDH due to the reduced swing inverter. As a consequence, n2 will be pulled high, strongly turning off Mp2 to save leakage. When the input switches low, Mn3 needs to be strong enough with a gate voltage of VDDL to pull down n2. Otherwise, the pull-up transistor stack will not be able to charge *intn* and *intp*, causing functional failure. Assuming that node n2 can be pulled down very quickly after *in* goes low, the rise delay is dictated by the size of transistors Md1 through MdN. In this way, we can choose circuit parameters (including the number of transistors in the stack N, and transistor widths) to match the fall delay with the rise delay without sacrificing leakage power or vice versa.

Figure 4.5: Reduced swing inverter.

The circuit diagram for the reduced swing inverter is shown in Fig. 4.5. Note that all devices are thick oxide I/O devices in this circuit. When *in* is low, Mp2 and Mp3 can easily pull *out* to *VDDH*. When *in* goes high, it behaves like a reduced swing driver, which is used to save switching energy for interconnect [99]. Instead of pulling all the way to 0, *out* will remain slightly higher than (VDDH-Vtn) in this situation. The use of the reduced swing inverter helps to match the gate overdrive voltage to the subthreshold voltage input that is being converted. The inverted output is designed at 2V for 0.3V to 2.5V voltage conversion. It provides a fast response time for leakage reduction and still makes Mp3 weak enough compared to Mn3 when there is logic contention at n2.

Figure 4.6: Waveforms demonstrating the operation of our proposed circuit.

The simulated waveforms of our proposed circuit are shown in Fig. 4.6 where N is 5. As *in* transitions from high to low, n1 is pulled up to *VDDH* to ensure the diode-connected transistor stack is able to sink current from *VDDH*. Therefore, both *intp* and *intn* rise to within 10% of their corresponding steady state voltage in a few hundred ns (FO4 delay at 0.3V in this technology is 18ns). Node *intp* has to be able to quickly turn on the output pull-up transistor Mpo. In this design, the internal node between Md2 and Md3 is chosen as *intp*. The tradeoff in the selection of this node is the leakage power since *intp* will never reach *VDDH*. When *in* goes high again, *n1* drops from *VDDH* to turn off Mp2 in order to save leakage power in this state. Nodes *intp* and *intn* decrease as well, allowing *out* to go high.

The guidelines for transistor sizing in the proposed level shifter can be summarized as follows:

- Use minimal transistor dimensions for Mp1 to minimize the intrinsic loading on node *intn*.
- Determine the size of Mp2 based on leakage current constraints when the input is at state 1.
- Size Mn1 to meet the target leakage current at state 0 and the rise delay requirements.
- Choose N (the number of stacked PMOS devices) by calculating the fall delay based on supply voltage and Vt of each transistor.
- Verify that the pull-down strength of Mn3 is always stronger than the pull-up strength of Mp3 at process corners.



Figure 4.7: Sizing of diode-connected stacked PMOS (Wp) versus gate delay and power dissipation.

Among these steps, the sizing of transistors Md1 through MdN is the most difficult to determine analytically since the operating range is mostly between the subthreshold and superthreshold regions. For a better understanding of the tradeoffs, transistor size Wp versus gate delay and power dissipation is simulated and shown in Fig. 4.7. By taking the maximum of the rise and fall delays, gate delay represents the maximum operating frequency of the level shifter. N is chosen to be 5 and the width of transistor Mn1 is  $5\mu$m. When Wp is small, gate delay is dominated by the fall delay. Increasing Wp both decreases the fall delay by reducing the effective pull-up resistance and increases the rise delay as the parasitic capacitance also grows. To illustrate a typical case for subthreshold operation at 300mV, the circuit is running at 100 FO4 delay with an activity factor of 0.1. The cross-bar current dominates other leakage sources in this scenario. Therefore, as Wp increases the power consumption decreases due to the faster rising transition of internal nodes *intp* and *intn*, despite the fact that parasitic capacitances also rise. As expected, power dissipation saturates to a certain value as Wp becomes large and finally will start rising as parasitic capacitances dominate.



Figure 4.8: Comparison of level shifters. (a) Gate delay, (b) Power consumption.

## 4.4 Simulation results

In this section, we compare the performance of the proposed level shifter to the conventional design in terms of delay, power, robustness, and area. The target output voltage *VDDH* is 2.5V, converted from a subthreshold input voltage of 300mV. The simulations are conducted using a commercial  $0.13\mu$m CMOS logic technology. Power values do not include the switching power that drives a physical package pin. As was explained in Sec. 4.2, the W/L of the pull-up transistors is  $0.36\mu$m/5$\mu$m for the conventional level shifter.

We first examine the gate delay and power dissipation of both circuits at different temperatures. The worst-case corner (fast PMOS and slow NMOS) is applied for all transistors. In Fig. 4.8a, gate delay is plotted with respect to temperature. Both circuits can operate sufficiently fast (roughly 5 FO4 delays) above room temperature. However, the conventional circuit runs much slower at low temperatures and even fails to function completely below -10°C. This is due to the pull-down NMOS devices becoming exponentially weaker at low temperature while the PMOS devices become stronger due to their mobility increase (as they are not operating in subthreshold). On the other hand, the gate delay of our proposed circuit remains almost constant in terms of FO4 delays across temperature. The power dissipation of the level converters is calculated with a period of 5000 FO4 inverter delays. Fig. 4.8b clearly shows that the proposed circuit has lower power than the conventional design. The first reason is that the conventional circuit is slower, making it more susceptible to crossbar current. In addition, large NMOS sizes are required in the DCVS case due to the difficulty in low temperature conversion, which results in large parasitic capacitances. This increases both switching and leakage power components.

Process variation is an important consideration for subthreshold circuits. We perform 5000 Monte Carlo SPICE simulations on both level shifters and show the variation in gate delay when converting from 0.3V to 2.5V (Fig. 4.9). Process related parameters such as Vt and geometry are the sweeping factors in this setup. Both level shifters are configured the same way as in the experiment in the previous paragraph. The  $\mu$  and  $\sigma$  of the conventional level shifter are 3.1 and 2.77 FO4 inverter delays, respectively. On the other hand, the  $\mu$  and  $\sigma$  of our proposed level shifter are 2.63 and 1.13 FO4 delays, respectively. Although the proposed level shifter is smaller in total area and has a more complicated circuit structure, it is still less affected by process variations. The reason is that our proposed circuit with the diode-connected pull-up stack has less contention at the beginning of state switching compared to the conventional one. In other words, the speed penalty caused by contention has less impact on our circuit when process parameters vary.

Figure 4.9: Monte Carlo simulation results showing gate delay variation across process spread.

Table 4.1: Comparison of level shifters.

|                                   | Conventional design | Proposed design |
|-----------------------------------|---------------------|-----------------|
| Active energy (fJ)                | 828                 | 102             |
| Leakage power (pW)                | 1080                | 121             |
| Rise delay (# of FO4)             | 4.74                | 4.16            |
| Fall delay (# of FO4)             | 4.64                | 3.79            |
| Total transistor area ($\mu m^2$) | 30.58               | 11.11           |

Table 4.1 summarizes the circuit parameters of the conventional and proposed level shifters at the worst-case process corner and room temperature. These values do not include the input buffer to the level shifters to simplify the comparison. Therefore, the power consumption of the conventional one is underestimated due to its larger input capacitances. The conventional level shifter suffers from area (calculated as the sum of transistor areas) and power penalties in order to increase the driving strength of the pull-down transistors, especially at low temperatures. The routing cost and irregular structure of the proposed circuit will reduce the area difference in physical design; however, it still has a clear edge in this category.
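The relative advantage in each row of Table 4.1 can be tabulated with a few lines of Python. The numbers are copied from the table; only the dictionary layout and variable names are introduced here for illustration.

```python
# (conventional, proposed) values copied from Table 4.1.
table_4_1 = {
    "Active energy (fJ)": (828, 102),
    "Leakage power (pW)": (1080, 121),
    "Rise delay (FO4)": (4.74, 4.16),
    "Fall delay (FO4)": (4.64, 3.79),
    "Total transistor area (um^2)": (30.58, 11.11),
}

# Ratio of conventional to proposed; values above 1 favor the proposed design.
ratios = {name: conv / prop for name, (conv, prop) in table_4_1.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios show roughly an 8x to 9x reduction in active energy and leakage power and about a 2.75x reduction in transistor area, while the delay improvement is comparatively modest.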

## 4.5 Conclusion

In this work, we proposed a subthreshold to I/O voltage level shifter that relies on a pull-up transistor stack independent of the input voltage. Through a feedback mechanism that reduces leakage when the input is high, it also improves the transition speed of the circuit. The proposed level shifter was compared to the conventional DCVS-type level shifter, and shows advantages in power dissipation, gate delay and total area. The proposed level shifter is also capable of converting a 0.3V incoming signal to 2.5V output robustly across process variation according to Monte Carlo SPICE simulations.

#### CHAPTER V

# SENSOR DATA RETRIEVAL USING ALIGNMENT INDEPENDENT CAPACITIVE SIGNALING

## 5.1 Introduction

Miniature self-sustaining sensor nodes have become a viable option with silicon technology scaling. Such a system can be easily attached to, or implanted into, various objects for applications such as periodic sensing and recording of temperature or biochemical data. With energy minimization techniques [80, 100, 101] and aggressive power gating, these systems can potentially operate using a micro-fabricated battery with comparable form factor over an extended period of time [56]. To maintain the form factor for such systems, data read-out can be challenging from two perspectives. First, hardware overhead has to be kept low such that the size of the system is not dominated by the communication components. Secondly, power consumption and instantaneous power spikes during read-out will determine the size of battery and passive components such as decoupling capacitors. Passive radio-frequency identification (RFID) transponder techniques can be used to eliminate read-out energy dissipation for the sensor, but this generally requires an external coil on a centimeter scale, significantly limiting the application space [102]. Near-field pulse signaling through inductive coupling has been reported to achieve high bandwidth using integrated inductors while also being energy efficient [103, 104]. However, the power required for sending data back from the sensor chip still needs to be supplied externally.

Capacitive coupling is another favored candidate for near field communication due to its high bandwidth and low energy consumption capable of achieving less than 0.1pJ/b [105, 106]. Simultaneous data and power transmission has also been successfully tested for silicon on a stack [107]. It is an advantageous solution for small form factor systems where chip stacking is applicable since all hardware can be integrated into a silicon chip. On the other hand, the signal strength of capacitive coupling is inversely proportional to the distance between the pads, which makes the robustness of such a scheme very susceptible to misalignment. Pad alignment of about  $3\mu$m, achieved by markings on the edge of a scriber line, was reported [63]. Vernier bar patterns were also proposed to electrically detect the alignment between chips so that alignment error down to  $1.4\mu m$  can be detected [64]. The accuracy of alignment can be further improved by dividing each transmit plate into smaller microplates. By driving the appropriate microplates with embedded switching circuits, the mechanical misalignment can be compensated up to  $\pm 25\mu$m [108]. More quantized alignment information can be provided by a capacitive sensor and alignment circuits. As reported in [109], the analog output of the alignment circuits is able to differentiate alignment error down to  $0.1 \mu m$ .

In this work, we propose a capacitive coupling based method where the communication module is fully integrated with the sensor node on a sub-mm scale. The goal is to provide a convenient read-out mechanism without the aid of an optical microscope and positioning by micromanipulator. We use the terminology sensor chip (SC) and data retrieval (DR) chip to indicate corresponding concepts referred to as the transponder and interrogator in RFID systems. By dividing the data retrieval pads into microplates, individual microplates can be grouped together to establish power and signal channels after alignment is known. A digital alignment circuit is used to translate misalignment into digital output so that the configuration can be computed externally. In Sec. 5.2, the geometric design issues associated with capacitive coupling will be discussed. The proposed system architecture will be shown in Sec. 5.3 along with circuit blocks. Several design aspects will be highlighted in Sec. 5.4 with silicon measurement results. Sec. 5.5 concludes the work.

## 5.2 Geometry optimization

Since our goal is to achieve chip to chip communication without fine tuning the alignment, the pad pattern is designed considering the electrical field in the *worst case* due to misalignment. Thus, we first seek the *worst case* scenario when stacking two chips face-to-face. There are two assumptions in the following analysis:

- 1. The data retrieval chip is composed of a large array of square pads so that the sensor chip is completely covered by the pad array.
- 2. The distance between the coupling pads is not a function of location, i.e. the thickness of passivation layer is fixed.

The first assumption requires that the data retrieval array is large enough such that the sensor chip can be easily dropped on top of it while the entire sensor chip is still within the boundary of the receiver array. By designing the receiver array two times larger than the sensor chip, this assumption can be satisfied without fine positioning by micromanipulator. The second assumption relies on the uniformity of the final passivation, which can be affected by many issues such as dust on the surface of the chip. In general, this is not a deterministic process from the circuit designer's point of view. Therefore, it is reasonable to make this assumption at design time.

Figure 5.1: The relative position of the receiver array and sensor signal pad when  $W_{RX} \ll \sqrt{2}W_{TX}$ .

Both the power and signal channels are required to be established during communication. For the sensor chip, the power pads will be allocated with as much area as possible to maximize the charge that can be harvested. On the other hand, signal pad sizing presents a tradeoff between capacitive loading and coupling factor. A larger signal pad means greater energy consumption at each transition, while reducing the size decreases the sensible voltage seen by the data retrieval pad given fixed parasitic capacitances. For the data retrieval array, the pads are placed as close as possible so that the uncovered area can be minimized. The spacing between pads is typically constrained by two DRC (Design Rule Check) rules in advanced VLSI technologies: the metal density rule that is allowed in the process and the minimum spacing between top metal layers. In the following analysis, the separation between data retrieval pads is fixed at  $5\mu$m according to the CMOS process we use. Ideally, dividing the pads into a smaller dimension is helpful for a finer configuration. In reality, however, the minimum size of the pads will be decided by the area of the functional blocks associated with each pad.

#### 5.2.1 Sizing of the sensor pad

Fig. 5.1 illustrates the *worst case* condition given that all the signal pads are square in this work.  $\theta$  is defined as the offset angle between the two chips. Here  $W_{RX}$ ,  $W_{TX}$ ,  $W_{sep}$  are the width of data retrieval pads, sensor signal pad and the separation between the data retrieval pads, respectively. Since the pads are all squares, from symmetry  $\theta$  is only considered from  $[0, \frac{\pi}{4}]$ . In this case  $W_{RX} \ll \sqrt{2}W_{TX}$ , and polygon ABCD represents the area of interest, which is used to calculate the coupling capacitance of the pads. Line segments r1 through r4 are used to represent the length of the sides. Coupling capacitance is the sum of the parallel plate capacitance and the fringing capacitance

$$C_{c} = C_{pp} \cdot (\text{Area of polygon ABCD}) + C_{fr,TX} \cdot (\overline{BCD}) + C_{fr,RX} \cdot (\overline{DAB})$$
(5.1)

where $C_{pp}$ is the parallel plate capacitance per unit area, $C_{fr,RX}$ is the fringing capacitance per unit length of the data retrieval pads and $C_{fr,TX}$ is the fringing capacitance per unit length of the sensor pads. Using trigonometric functions, r1 through r4 can be written as:

$$r1 = \frac{W_{TX}}{2} \cdot \sec \theta - \frac{W_{sep}}{2} \cdot (1 - \tan \theta)$$
(5.2)

$$r2 = \frac{W_{TX}}{2} \cdot (1 - \tan \theta) - \frac{W_{sep}}{2} \cdot \sec \theta$$
(5.3)

$$r3 = \frac{W_{TX}}{2} \cdot \sec\theta - \frac{W_{sep}}{2} \cdot (1 + \tan\theta)$$
(5.4)

$$r4 = \frac{W_{TX}}{2} \cdot (1 + \tan \theta) - \frac{W_{sep}}{2} \cdot \sec \theta$$
(5.5)

It is noted that the above expressions are physically meaningful only when vertex C is still inside the data retrieval pad. In other words, they are valid when  $\theta \leq \theta_t$  where  $\theta_t$  is the angle when vertex B and vertex C overlap. Combining Eq. 5.2 through Eq. 5.5, the area of polygon ABCD can be obtained and simplified as

$$\begin{aligned}
\text{Area of polygon ABCD} &= \frac{1}{2} (r1 \cdot r3 + r2 \cdot r4) \\
&= \frac{1}{2} \left[ \frac{W_{TX}}{2} \cdot \sec\theta - \frac{W_{sep}}{2} \cdot (1 - \tan\theta) \right] \cdot \left[ \frac{W_{TX}}{2} \cdot \sec\theta - \frac{W_{sep}}{2} \cdot (1 + \tan\theta) \right] \\
&\quad + \frac{1}{2} \left[ \frac{W_{TX}}{2} \cdot (1 - \tan\theta) - \frac{W_{sep}}{2} \cdot \sec\theta \right] \cdot \left[ \frac{W_{TX}}{2} \cdot (1 + \tan\theta) - \frac{W_{sep}}{2} \cdot \sec\theta \right] \\
&= \frac{1}{4} W_{TX}^2 + \frac{1}{4} W_{sep}^2 - \frac{1}{2} W_{TX} \cdot W_{sep} \cdot \sec\theta
\end{aligned}$$
(5.6)

while the lengths of the segments $\overline{BCD}$ and $\overline{DAB}$ are

$$\overline{BCD} = r2 + r4 = W_{TX} - W_{sep} \cdot \sec\theta$$
(5.7)

$$\overline{DAB} = r1 + r3 = W_{TX} \cdot \sec\theta - W_{sep}$$
(5.8)

As mentioned before,  $W_{sep}$  is designed to be small enough to maximize the coupled area. Thus, it is reasonable to assume that  $C_{fr,RX}$  is negligible compared to  $C_{fr,TX}$  because the electric field lines from sidewall  $\overline{DAB}$  are mostly terminated at the neighboring receiver pads instead of the sensor pad.  $C_c$  can then be rewritten as

$$C_{c} = C_{pp} \cdot \left[\frac{1}{4}W_{TX}^{2} + \frac{1}{4}W_{sep}^{2} - \frac{1}{2}W_{TX} \cdot W_{sep} \cdot \sec\theta\right] + C_{fr,TX} \cdot (W_{TX} - W_{sep} \cdot \sec\theta)$$
(5.9)

Similar derivations can also be applied to other cases such as $\theta > \theta_t$ or when $W_{RX} > \sqrt{2}W_{TX}$. Based on the analysis, the worst case occurs when $\theta = \frac{\pi}{4}$.
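As a numerical cross-check of Eq. 5.2 through Eq. 5.9, the following is a minimal Python sketch; it is not the field-solver flow used in this work. The function names are hypothetical, the per-unit capacitances `c_pp` and `c_fr_tx` must be supplied by the user, and $C_{fr,RX}$ is neglected as in Eq. 5.9. The expressions are only meaningful for $\theta \leq \theta_t$.

```python
import math

def segment_lengths(w_tx, w_sep, theta):
    """Return (r1, r2, r3, r4) from Eq. 5.2-5.5; theta is in radians."""
    sec = 1.0 / math.cos(theta)
    tan = math.tan(theta)
    r1 = w_tx / 2 * sec - w_sep / 2 * (1 - tan)
    r2 = w_tx / 2 * (1 - tan) - w_sep / 2 * sec
    r3 = w_tx / 2 * sec - w_sep / 2 * (1 + tan)
    r4 = w_tx / 2 * (1 + tan) - w_sep / 2 * sec
    return r1, r2, r3, r4

def polygon_area(w_tx, w_sep, theta):
    """Area of polygon ABCD from the segment lengths (first line of Eq. 5.6)."""
    r1, r2, r3, r4 = segment_lengths(w_tx, w_sep, theta)
    return 0.5 * (r1 * r3 + r2 * r4)

def polygon_area_closed_form(w_tx, w_sep, theta):
    """Simplified area (last line of Eq. 5.6)."""
    return w_tx**2 / 4 + w_sep**2 / 4 - 0.5 * w_tx * w_sep / math.cos(theta)

def coupling_capacitance(w_tx, w_sep, theta, c_pp, c_fr_tx):
    """Eq. 5.9: parallel-plate term plus sensor-pad fringing term."""
    area = polygon_area_closed_form(w_tx, w_sep, theta)
    edge_bcd = w_tx - w_sep / math.cos(theta)  # Eq. 5.7
    return c_pp * area + c_fr_tx * edge_bcd

if __name__ == "__main__":
    # Geometry from the text: W_TX = 150 um, W_sep = 5 um.
    for theta in (0.0, 0.1, 0.3):
        print(theta, polygon_area(150.0, 5.0, theta),
              polygon_area_closed_form(150.0, 5.0, theta))
```

Both area functions should agree to floating-point precision, which confirms the simplification in Eq. 5.6.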

With the aid of 3D field solver tools [110], the relationship between  $W_{TX}$  and coupling capacitance can be found as in Fig. 5.2. In the technology used in this work,



Figure 5.2: Relationship between the pad size of the sensor chip and the coupling capacitance with  $W_{RX}=50\,\mu\text{m}$ .

the minimum size of the data retrieval pad is 50$\mu$m due to the active circuit area. With $W_{RX}$ and $W_{sep}$ being 50$\mu$m and 5$\mu$m, the simulation results for different outer dimensions $W_{TX}$ are plotted. The coupling capacitance gradually increases until about 150$\mu$m. At this point the sensor pad is large enough to cover at least one data retrieval pad no matter where it is located. Further simulation results show that the difference between the coupling capacitances at different orientations is within 1%, suggesting that a consistently good coupling ratio can be achieved at $W_{TX} = 150\mu$m. To sum up, sensor pads are chosen to be about three times larger than the receiver pad to maximize coupling in the worst-case condition.

#### 5.2.2 Single-ended vs. differential signaling

In the previous section, only a single pad was considered for transmitting a signal from the sensor. Alternatively, the signal strength can be doubled by implementing differential signaling. Consider the diagram shown in Fig. 5.3, assuming that the dimensions of the pads are the same as given in Sec. 5.2.1. In this scheme, both Pads



Figure 5.3: Differential signaling scheme. Pad A (square with slant lines) together with all the other pads in light gray are used to recover the signal from the sensor.

A and B are required to amplify the differential signal from the sensor pads. Since the sensor chip can land in any orientation, 15 DR pads along with Pad B have to be routed into Pad A to make sure that signals from both sensor pads can be picked up by the DR pad.

In a simplified analysis, the coupled voltage from the sensor pad to the receiver pad is proportional to the ratio of coupling capacitance  $(C_{couple})$  and ground capacitance  $(C_{gnd})$  where  $C_{gnd}$  already includes the input capacitance of the amplifier. For a single-ended signaling scheme, the coupling coefficient can be written as

$$C_{c,single} = \frac{V_{couple}}{V_{tran}} = \frac{C_{couple}}{C_{gnd} + C_{couple}}$$
(5.10)

 $V_{couple}$  and  $V_{tran}$  are the coupled voltage and transmitted amplitude, respectively. For a differential signaling scheme, the coupling coefficient is given by

$$C_{c,diff} = \sum_{i=1}^{2} \frac{C_{couple,i}}{C_{gnd,i} + C_{couple,i} + N \cdot C_{sw} + C_{wire}}$$
(5.11)

where $C_{sw}$ is the device loading of the switches that control the destination of the coupled signal, N is the number of other pads the pad has to connect to, and $C_{wire}$ denotes the extra wire loading due to the differential signaling scheme. Assuming $C_{couple,1} \approx C_{couple,2} = C_{couple}$ and $C_{gnd,1} \approx C_{gnd,2} = C_{gnd}$, the difference between $C_{c,single}$ and $C_{c,diff}$ is

$$\begin{aligned}
C_{c,single} - C_{c,diff} &= \frac{C_{couple}}{C_{gnd} + C_{couple}} - \frac{2C_{couple}}{C_{gnd} + C_{couple} + N \cdot C_{sw} + C_{wire}} \\
&= \left(\frac{C_{couple}}{C_{gnd} + C_{couple}}\right) \cdot \left(\frac{N \cdot C_{sw} + C_{wire} - C_{gnd} - C_{couple}}{C_{gnd} + C_{couple} + N \cdot C_{sw} + C_{wire}}\right)
\end{aligned}$$
(5.12)

In other words, the differential scheme is better than the single-ended scheme only when the sum of $N \cdot C_{sw}$ and $C_{wire}$ is smaller than the sum of $C_{gnd}$ and $C_{couple}$. $C_{gnd}$ and $C_{couple}$ can be estimated from the process and geometry or, more precisely, through RC extraction tools. For a DR pad that is $50\mu m$ on a side, $C_{gnd}$ is $40 \sim 50$ fF if the signal and power routing underneath it is restricted to metal 3 or below. $C_{sw}$ can generally be ignored if, for example, a transmission gate four times as large as the minimum-sized transistor is used. $C_{wire}$ can be estimated from the wire length. Considering that the 15 extra connections each require $150\mu m$ of minimum-width metal wiring, the total wire loading is 150fF assuming isolated wires. With $C_{gnd} \approx 50$ fF, the condition above requires $C_{couple} > C_{wire} - C_{gnd} \approx 100$ fF. That is, unless $C_{couple}$ is more than two times larger than $C_{gnd}$, the differential signaling scheme will not offer any advantage over its single-ended counterpart. In addition, the complex wiring of the differential signaling scheme forces wires to be routed at higher levels of metal and increases $C_{gnd}$ as a result. Therefore, single-ended signaling is implemented in this work. The dimensions of the pads used in the data retrieval chip and sensor chip are summarized in Table 5.1. Due to fabrication constraints, the actual footprints of the pads differ slightly from the designed values. For example, the DR pad size is reduced from $50\mu m$ to $48\mu m$ on a side to comply with metal density rules.
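The comparison in Eq. 5.10 through Eq. 5.12 can be illustrated with a short Python sketch. The function names are hypothetical, capacitances are in fF, and the swept values of `c_couple` are illustrative assumptions rather than extracted data; $C_{gnd} = 50$ fF, $C_{wire} = 150$ fF, $N = 15$ and $C_{sw} \approx 0$ are taken from the estimates above.

```python
import math

def coupling_single(c_couple, c_gnd):
    """Eq. 5.10: single-ended coupling coefficient."""
    return c_couple / (c_gnd + c_couple)

def coupling_diff(c_couple, c_gnd, n, c_sw, c_wire):
    """Eq. 5.11 with two identical pads (C_couple,1 = C_couple,2)."""
    return 2 * c_couple / (c_gnd + c_couple + n * c_sw + c_wire)

def diff_is_better(c_couple, c_gnd, n, c_sw, c_wire):
    """Sign condition from Eq. 5.12: extra loading must stay below C_gnd + C_couple."""
    return n * c_sw + c_wire < c_gnd + c_couple

if __name__ == "__main__":
    c_gnd, n, c_sw, c_wire = 50.0, 15, 0.0, 150.0  # fF, estimates from the text
    for c_couple in (25.0, 50.0, 100.0, 200.0):    # fF, hypothetical sweep
        print(c_couple,
              round(coupling_single(c_couple, c_gnd), 3),
              round(coupling_diff(c_couple, c_gnd, n, c_sw, c_wire), 3),
              diff_is_better(c_couple, c_gnd, n, c_sw, c_wire))
```

Under these assumptions the two schemes break even at $C_{couple} = 100$ fF, which is twice $C_{gnd}$.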

Table 5.1: Summary of pad dimensions.

|                     | Pad size ($\mu$m)                     | Pad spacing ($\mu$m) | Number of pads      |
|---------------------|---------------------------------------|----------------------|---------------------|
| Sensor chip         | Power: 225 by 225; Signal: 150 by 150 | $\sim 20$            | Power: 2; Signal: 1 |
| Data retrieval chip | 48 by 48                              | 5                    | 400                 |

### 5.3 System architecture

Fig. 5.4 shows the proposed system diagram for sensor data retrieval. The data retrieval chip is responsible for sending power and recovering data from the sensor chip at the same time. Since there is no common reference for both chips, two power channels are required to send AC power differentially. An AC to DC converter at the sensor chip side is used to harvest the supply voltage for the sensor. The clock signal is modulated with the power signals and can be demodulated by the sensor chip, so no additional channel is needed for synchronization. This also helps to precisely control the sensing window of the receiver circuit for better noise rejection. A single signal channel is used to transmit data back to the data retrieval chip as suggested in the previous section.



Figure 5.4: System architecture for the proposed data retrieval mechanism.



Figure 5.5: Data retrieval array showing 20 by 20 cells and controller.

#### 5.3.1 Data retrieval circuits design

While the sensor chip has three pads dedicated to individual channels, the data retrieval chip contains an array of 20 by 20 cells, each of which can be assigned as the signal channel or clustered into a power channel as needed (Fig. 5.5). Each cell is tied to a corresponding DR pad, which serves as a communication channel that is reconfigurable based on alignment information. At any given time, a DR cell performs one of the following three functions:

1. Alignment detection. Alignment information is transformed to digital output and can be scanned out for post-processing.
2. Power transmission. The pad is driven by level converters with elevated amplitude to strengthen the signal that is able to reach the sensor pads.
3. Signal recovery. The capacitively coupled signal is first amplified and then decoded by the DR controller.



Figure 5.6: Alignment detector. (a) block diagram, (b) operation waveform.

After the sensor chip is dropped on top of the data retrieval chip, the alignment detector measures the capacitive loading of each DR pad. It is essentially a ring-oscillator-based capacitance-to-digital converter. The ring oscillator converts the capacitance into frequency information represented by $RING\_CLK$. $RING\_CLK$ then increments the synchronous counter for a given period of time while ENABLE is high (defined by $SYS\_CLK$). The operation waveform is shown in Fig. 5.6(b). To accommodate the different ring oscillator speeds across the DR array, a one-time zero-calibration method needs to be implemented (Sec. 5.4.2). Although the output has to be limited to 9 bits to physically fit underneath each cell, the circuits can be operated in cyclic mode. This means that the alignment information is discarded. We will revisit the alignment detection issue in Sec. 5.4.2 to explain how useful information can be extracted efficiently for the whole data retrieval array.
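The counting step can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the implemented circuit: the ring oscillator frequencies, the 10 µs ENABLE window, and the function names are hypothetical, and the modular subtraction is only an illustrative way of comparing wrapped 9-bit codes, valid when the true count difference is smaller than $2^9$.

```python
COUNTER_BITS = 9  # output width stated in the text

def counter_output(f_ring_hz, t_enable_ns, bits=COUNTER_BITS):
    """Count RING_CLK edges while ENABLE is high, wrapping at 2**bits (cyclic mode).

    Integer arithmetic is used so the edge count is exact.
    """
    edges = f_ring_hz * t_enable_ns // 10**9
    return edges % (1 << bits)

def cyclic_difference(zero_code, loaded_code, bits=COUNTER_BITS):
    """Difference of two wrapped codes, taken modulo 2**bits.

    Assumption: extra pad capacitance slows the oscillator, so the loaded
    count is lower than the zero-calibration count.
    """
    return (zero_code - loaded_code) % (1 << bits)

if __name__ == "__main__":
    # Hypothetical numbers: 103 MHz unloaded, 101 MHz loaded, 10 us window.
    zero = counter_output(103_000_000, 10_000)    # 1030 edges wraps to 6
    loaded = counter_output(101_000_000, 10_000)  # 1010 edges wraps to 498
    print(zero, loaded, cyclic_difference(zero, loaded))
```

In this example the two raw counts differ by 20 edges, and the same value is recovered from the wrapped codes.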

For the power transmission drivers, traditional DCVS (differential cascode voltage switch) type level converters are used. Such level converters can easily operate at an output amplitude three times higher than the nominal supply voltage at the carrier frequencies of interest, which are in the tens of MHz. The clock signal is globally distributed to every cell and is locally inverted if an out-of-phase signal is required. In an effort to reduce parasitic capacitance for the DR pads, we restrict the routing layers to metal 3 and below. Uniform clock wire routing is achieved throughout the DR array by driving the clock from one side of each row. This provides a feasible routing scheme compared to an H-tree type clock network, at the expense of larger clock skew. The problem of clock skew will be discussed in Sec. 5.4 as it limits the carrier frequency for power harvesting.

Fig. 5.7 shows the data retrieval mechanism. Two differential amplifiers are used to detect both the rising and falling transitions. The input node $(V_{in})$ is precharged high before the clock goes low to sensitize the amplifiers. Immediately after the clock fires, either $V_{lh}$ or $V_{hl}$ will be pulled down depending on the direction of the coupled signal. The high-to-low transition triggers the 400-to-1 AND tree that simultaneously monitors all DR pads and results in an UP/DN signal for the one-bit saturation counter that determines the data output. The difference between $V_{dc}$ and $V_{dc1}/V_{dc2}$ is designed to be 50mV to mitigate input offset voltage and the impact of noise. The timing diagram in Fig. 5.8 shows that the operation is synchronized to **ext\_clk**. The signal transition only happens after the negative edge of **ext\_clk** and is latched at the positive edge. In this scheme, signal **preset** is used both to precharge $V_{in}$ and to enable the decoder to detect switching events. In other words, the impact of noise on the floating node $V_{in}$ can be minimized by properly controlling the pulse width of **preset**. The pulse width of **preset** and its delay from **ext\_clk** are both programmable through delay lines.
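The role of the one-bit saturation counter can be illustrated with a small behavioral sketch in Python. The mapping assumed here (an UP event for a detected rising transition, a DN event for a falling one, and no event when the sensor output does not switch) is an interpretation for illustration; the event labels and function name are hypothetical.

```python
def decode_transitions(events, initial=0):
    """Recover the data stream from per-cycle UP/DN events.

    A one-bit saturating counter: UP increments but saturates at 1,
    DN decrements but saturates at 0, and None holds the previous bit.
    """
    state = initial
    decoded = []
    for event in events:
        if event == "UP":
            state = min(state + 1, 1)
        elif event == "DN":
            state = max(state - 1, 0)
        decoded.append(state)
    return decoded

if __name__ == "__main__":
    # Only edges are capacitively coupled, so repeated bits produce no event.
    events = ["UP", None, "DN", None, None, "UP"]
    print(decode_transitions(events))
```

Because the counter saturates, a spurious repeated UP or DN event leaves the output unchanged instead of corrupting later bits.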

#### 5.3.2 Sensor chip circuit design

The main building block of the sensor chip is the AC to DC conversion circuit shown in Fig. 5.9. The AC coupled inputs  $V_{in}$  and  $V_{inn}$  are rectified into DC supply voltages by cascading voltage doublers. Each voltage doubler contains a full-bridge hybrid cross-coupled PMOS rectifier. Transistors md1 and md2 set the lowest voltage of  $V_{n1}$  and  $V_{n2}$  to  $V_{dc1}$ . After each input transition at  $V_{in}$  and  $V_{inn}$ ,  $V_{dc2}$  is charged with



Figure 5.7: Data retrieval with capacitive coupled input and periodic precharge to sensitize the amplifier.



Figure 5.8: Timing diagram showing the operation of data retrieval circuits when switching happens.



Figure 5.9: AC to DC conversion circuits for sensor chip power harvesting.

a potential equal to $V_{dc1}+\Delta V_{in}$ by the cross-coupled PMOS md3 and md4, where $\Delta V_{in}$ is the coupled amplitude for the sensor chip. Although replacing md1 and md2 with cross-coupled NMOS transistors is advantageous in reducing the turn-on voltage of the first few stages, it is not feasible for stages with higher input voltages. The reason is that without a triple-well or deep-NWELL process, the body effect eventually results in a large NMOS threshold voltage. At the output of the 10th stage, a voltage limiter prevents the supply voltage from rising above the operating range.

The design of the voltage limiter is shown in Fig. 5.10(a). The general concept is similar to the mode selector in [85]. In this work, a shunt transistor m10 is used to discharge current from  $V_{in}$  (VDD10 in Fig. 5.9) to ground when  $V_{in}$  is above a certain voltage level. To help explain how the voltage is set in hardware, the open-loop voltage transfer curve in Fig. 5.10(b) is used. Node n2 will remain close to VSS before  $V_{in}$  exceeds  $2\Delta V$  (where  $\Delta V$  is the turn-on voltage of the diode-connected



Figure 5.10: Voltage limiter. (a) circuits diagram, (b) open loop voltage transfer curve.

transistors m5 and m6). When $V_{in}$ increases beyond $2\Delta V$, the excess voltage drop occurs mainly across R1, and thus the voltage on n2 begins to track the supply voltage. On the other hand, the voltage on n1 is limited to $2\Delta V$ once the supply voltage is higher than this value. By comparing n1 and n2, the amplifier output n3 begins to turn on m10 strongly when the supply voltage is greater than 1.6V. Since each voltage doubler stage is identical, intermediate voltage levels VDD1 through VDD10 are inherently generated. In this work, we use VDD4 (0.65V) to supply a 4-bit LFSR circuit that generates a data stream with low power consumption; the stream is then up-converted to VDD10 to increase signal strength before transmission. A power-on-reset circuit is usually required to avoid the deadlock situation in which all the register outputs are zero. For the LFSR that stands in for the sensor logic in this work, the deadlock is easy to avoid: a NAND4 gate forces the LFSR to advance if it starts in the deadlock state.
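The deadlock-escape idea can be illustrated with a behavioral Python sketch of a 4-bit LFSR. The feedback polynomial $x^4 + x^3 + 1$, the choice of the MSB as the serial output, and the function names are assumptions for illustration; the text specifies only a 4-bit LFSR with a NAND4-based escape from the all-zero state.

```python
def lfsr4_step(state):
    """Advance a 4-bit Fibonacci LFSR by one clock (assumed taps: bits 3 and 2)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    if state == 0:
        # Deadlock escape: without this the all-zero state repeats forever.
        feedback = 1
    return ((state << 1) | feedback) & 0xF

def lfsr4_period(seed):
    """Number of clocks before a non-zero seed state recurs."""
    if seed == 0:
        raise ValueError("the all-zero state never recurs once escaped")
    state, steps = lfsr4_step(seed), 1
    while state != seed:
        state = lfsr4_step(state)
        steps += 1
    return steps

def lfsr4_bits(seed, length):
    """Serial output stream, taken here from the MSB of the state."""
    bits, state = [], seed
    for _ in range(length):
        bits.append((state >> 3) & 1)
        state = lfsr4_step(state)
    return bits

if __name__ == "__main__":
    print(lfsr4_period(0b0001))   # maximal length for 4 bits
    print(lfsr4_bits(0b0000, 20)) # starts from the deadlock state and recovers
```

With the assumed polynomial the sequence has the maximal period of $2^4 - 1 = 15$ states, and a start from the all-zero state joins that cycle after one clock.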

For clock synchronization, the system clock is amplitude modulated with carrier frequency  $f_c$  using the same power channels. An envelope detector is used to demodulate the clock signal as shown in Fig. 5.11. The differential AC input signal is first rectified and then filtered by an RC low-pass filter. Since the input amplitude varies



Figure 5.11: Envelope detector for sensor chip.

due to several factors such as the transmitting amplitude and the distance between pads, a level converter is required so that the demodulated clock is able to drive the logic blocks at 0.65V. For robust level conversion for subthreshold input voltage, a single stage comparator is implemented. In this circuit, *VDD1* and *VDD2* from the voltage doubler stages are used as the reference voltage and bias voltage for the comparator, respectively. In this way, as long as the rectified voltage is higher than *VDD1* the demodulator is able to work properly.

### 5.4 Chip measurement

#### 5.4.1 Test chip

A test chip was fabricated in  $0.13\mu$ m CMOS technology. The die photo is shown in Fig. 5.12. The active die area consumed by the sensor chip is 0.014mm<sup>2</sup>. The size of the data retrieval array is 1.1mm × 1.1mm while the total size of the DR controller and clock generator is 0.08mm<sup>2</sup>. During measurement, the data retrieval chip is packaged and mounted on a PCB. The sensor chip is diced to 0.5mm by 0.5mm, and is manually dropped on top of the data retrieval array without precise positioning. Once the two chips are stacked, we first perform alignment detection and scan out



Figure 5.12: Chip die photo.

the information to be externally processed by a PC. The PC will match the data to a known pattern and determine the channel that a particular pad should be assigned to for the DR array. Alternatively, the computations can also be processed on chip if an ALU (Arithmetic Logic Unit) is available. Data clock  $f_{data}$  is generated externally by a function generator and sent along with the decoded data to a PC-based logic analyzer to compute BER (Bit Error Rate).

#### 5.4.2 Alignment detection

We have seen that alignment information can be obtained using the ring oscillators to extract different coupling capacitances seen by each DR pad. To reduce the conversion time, we would like to run as many alignment detectors in parallel as possible. However, activating all alignment detectors at the same time will yield results that do not contain any alignment information. This can be explained by Fig. 5.13 showing the parasitic components of the system when two chips are put in a stack. For DR pads **P1** through **P5**, the parasitic capacitors include coupling capacitors  $C_{c1}$  through



Figure 5.13: Parasitic components for the system of two chips in a stack.

 $C_{c5}$ , ground capacitors  $C_{g1}$  through  $C_{g5}$  and capacitors  $C_{p1}$  through  $C_{p4}$  that exist between pads. By simultaneously oscillating all the pads at the same time, the coupling capacitances will be blocked from the AC ground and therefore the location of the sensor chip will not have any impact on the alignment detectors. In addition, since the impedance of  $C_{p1}$  through  $C_{p4}$  is low at high frequency the whole system will oscillate at the same frequency. To solve this problem, at least one neighboring pad should be grounded for any given oscillating pad. For example, **P2** and **P4** are grounded when **P1**, **P3** and **P5** are running to provide a close return AC path to ground.

From this analysis, we can develop the alignment detection algorithm in a systematic way (Fig. 5.14). The DR array is first divided into four quadrants and only one quadrant is activated at a time. By repeating the capacitance-to-digital conversion four times the results can be merged into a two dimensional table. The table represents a set of zero calibration values for the specific data retrieval chip. The same procedure needs to be repeated again every time the sensor is dropped on top



Figure 5.14: Procedures for alignment detection and pad reconfiguration.

of the DR array to generate another 2D table that represents the actual alignment. A 2D contour plot shown on the bottom right of Fig. 5.14 can be obtained by simply subtracting values from the 2D tables. Each pixel of the plot indicates the value of excessive coupling capacitance due to the existence of the sensor chip. From the plot, both the outline of the sensor chip and the position of the power pads and signal pad can be clearly seen. With the digitized alignment information, the clusters for power pads and signal pad can be computed by comparing the results with a known pattern coming from the chip geometry. As a result, the channels for power transmission and signal reception can be identified and reconfigured properly every time regardless of the position and orientation of the sensor chip.
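The table-merging and subtraction steps described above can be sketched in Python. This is a simplified illustration on a toy 2 by 2 array, not the PC-side software used in this work: the function names, the threshold, and the sign convention (a loaded pad counts fewer oscillator cycles than in the zero-calibration run) are assumptions.

```python
def merge_partial_tables(partials):
    """Merge per-conversion tables; None marks cells inactive in that pass."""
    rows, cols = len(partials[0]), len(partials[0][0])
    merged = [[None] * cols for _ in range(rows)]
    for table in partials:
        for r in range(rows):
            for c in range(cols):
                if table[r][c] is not None:
                    merged[r][c] = table[r][c]
    return merged

def excess_map(zero_table, loaded_table):
    """Per-cell count drop caused by the extra coupling capacitance."""
    return [[z - l for z, l in zip(zero_row, loaded_row)]
            for zero_row, loaded_row in zip(zero_table, loaded_table)]

def covered_cells(excess, threshold):
    """Cells whose excess exceeds a (hypothetical) detection threshold."""
    return [(r, c) for r, row in enumerate(excess)
            for c, value in enumerate(row) if value >= threshold]

if __name__ == "__main__":
    zero = merge_partial_tables([[[100, None], [None, 100]],
                                 [[None, 101], [99, None]]])
    loaded = [[100, 90], [99, 88]]  # toy data: right column is covered
    excess = excess_map(zero, loaded)
    print(zero, excess, covered_cells(excess, threshold=5))
```

Matching the resulting set of covered cells against the known pad geometry would then identify the power and signal clusters.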

#### 5.4.3 Measurement results

Measured waveforms of the test chip are shown in Fig. 5.15. At a clock frequency of 1.1MHz, the decoded output shows a data sequence that repeats every 15 cycles, as expected for a 4-bit LFSR. We define the achievable operating frequency (or data rate, since there is only one serial data bit) of this system as the highest frequency at which no errors occur in  $10^9$  cycles. Achievable data rate



Figure 5.15: Decoded data waveform showing pseudo random bit sequences up to 15 unrepeated cycles.

is measured with different transmitting amplitudes ($A_{in}$) and carrier frequencies ($f_c$). The results are shown in Fig. 5.16. I/O devices are used for power transmission so $A_{in}$ can be as high as 3.3V in this 0.13$\mu$m technology. The system starts to receive the data sequence with BER less than 10<sup>-9</sup> when $A_{in}$ exceeds 1.8V. The estimated working distance is also shown on the second x-axis of the $f_{data}$ plot. Based on measurement data, I/O devices would not be needed if the passivation thickness were reduced by 1/3 from its original 5.6$\mu$m value (e.g., by further polishing). Increasing $A_{in}$ monotonically increases the data rate as expected. At 3V, a data rate as high as 2.5MHz can be achieved with $f_c$ of 216MHz. However, it is observed that raising $f_c$ above 150MHz in fact reduces $f_{data}$. The reason is that at higher frequencies the clock skew between different cells causes a phase offset between signals in the same power cluster, eventually reducing the electric field. Since targeted data working sets for sensor nodes are on the order of kb [111, 112], the achievable data rate is sufficient for complete data retrieval on the ms timescale.

Energy numbers for the test chip are shown in Fig. 5.17. It is clear that increasing  $f_c$  penalizes overall energy consumption since data rate does not scale well with carrier



Figure 5.16: Operating frequency versus transmitting amplitude and carrier frequency, with estimated working distance shown on the second x-axis.

frequency  $f_c$ . In this measurement, the transmitting amplitude for minimal energy is around 2.8V. If operated above the minimal energy point, the junction will be slightly forward-biased after each transition for the rectifier circuit shown in Fig. 5.9. Therefore, the charge that can be harvested begins to saturate and results in lower rectifier efficiency. 2nJ/bit is the lowest energy achieved by the proposed system.

Fig. 5.18(a) shows BER with respect to the window size $(T_w)$, which is related to the modulated clock for power transmission. $T_w$ is defined as the period during which the output *clk<sub>mod</sub>* (Fig. 5.18(b)) remains at 0. This window is required for clock synchronization, as the sensor chip needs to demodulate the clock signal and send back the data within $T_w$. The demodulator's response time therefore sets the lower bound for $T_w$. From Fig. 5.18(a), the bathtub shape of BER suggests that there is also an upper bound for $T_w$. The reason is that the charge that can be harvested by the sensor chip in a given period of time decreases as $T_w$ increases. In general, we need to fine tune $T_w$ within a range of tens of ns for higher data rate. On



Figure 5.17: Energy consumption versus transmitting amplitude and carrier frequency.



Figure 5.18: (a) T<sub>w</sub> versus BER, (b) Clock modulation circuit that defines T<sub>w</sub>.

the other hand, since data rates close to MHz may be excessive for the application, the design requirement for  $T_w$  can be relaxed by simply reducing the transmitting data rate.

Fig. 5.19 shows the data rate vs. BER for 10 random locations at which the sensor was dropped. The alignment 2D contour plots (8 out of 10 locations) are



Figure 5.19: Data rate versus BER with 10 random position testing.

also shown for the corresponding BER curves. Some regions yield a lower data rate mainly because the electric field between the pads is weaker than in others. The results fall into two distinct regions of the plot; however, there is no clear correlation between the position and the achievable data rate. A non-uniform passivation surface may be one cause of the discrepancy. These results verify that the proposed system adapts to different locations and orientations without the need for precise positioning.

### 5.5 Conclusions

In this work, we presented a near field data retrieval system using capacitive coupling. To alleviate the problem of chip misalignment, an alignment detection and pad reconfiguration method was proposed. The data retrieval pad is divided into an array of micropads, and each micropad can be assigned to send power or receive data depending on the alignment information. The chip measurement results show that data rates higher than 900kbps can be achieved across 10 random positioning tests. For small form factor sensor systems, this work provides the advantage of little hardware overhead and a flexible operating frequency that is not limited by the dimensions of passive components.

#### CHAPTER VI

# NEAR FIELD INDUCTIVE COUPLING USING PLL PHASE-LOCKING AND PULSE SIGNALING

### 6.1 Introduction

Radio frequency identification (RFID) is widely used in areas including personal identification, public transportation, and many more. For a near field RFID transponder, the range of operation can vary from a few meters to less than 10cm depending on the operating frequency [113, 114]. Because of their low cost, near field applications usually adopt passive RFID tags that do not rely on any internal power supply. While harvesting power from the reader (interrogator), the transponder transmits data back through backscattering [102, 115, 116]. The concept of backscattering is shown on the left of Fig. 6.1. The data is modulated by changing the load impedance $Z_m$ that is seen by the incoming AC signal of the transponder. Changing $Z_m$ leads to phase modulation (PM) or amplitude modulation (AM) depending on whether $Z_m$ is capacitive or resistive. The modulated data is usually located at a subcarrier frequency that is tens of kHz away from the carrier signal. One of the subcarrier frequencies can be downconverted at the reader to decode the data. In this scheme, there are two main limitations for the reader. First, the inductors should be designed with a high quality factor (Q) to maximize the energy range. However, a higher Q damps the subcarrier frequency and weakens the sensible amplitude. Secondly, the transmitter is continuously switching



Figure 6.1: Comparison of transponder data encoding with back scattering and pulse signaling.

in order to power the transponder, which causes a significant amount of oscillator noise. Therefore, the subcarrier signal should be no more than 100dB lower than the transmitter's carrier signal [87].

A unique near field application is the implanted medical device, for example, an intraocular pressure sensor that aids glaucoma detection and diagnosis. The required range of operation for such an application can be as short as a few mm. At the same time, the form factor should be small enough to limit the intrusiveness to the body. In this work, we propose a time-multiplexed inductive coupling scheme to alleviate the design limitations of the traditional backscattering method. The concept is shown on the right of Fig. 6.1. Instead of sending power continuously to the transponder, a small gap is created so that the transponder can send the uplink signals through pulse signaling using the same inductor. During the same period of time, the transponder is also synchronized to the reader by an envelope detector. The oscillator of the reader can be turned off during this gap so that the noise floor of the receiver is greatly reduced. Pulse signaling is widely used in ultra wide band (UWB) communications [66] and has recently been used in proximity inductive coupling to achieve high data rate and low energy operation [67, 103, 117]. By sending the pulses at the resonant frequency, the amplitude that reaches the receiver input is maximized under a given power constraint. A key to the scheme is to design the transponder and the reader with identical resonant frequencies and the maximum available Q. In Sec. 6.2, the system architecture and the circuit design are described. The test chip and silicon measurement results are presented in Sec. 6.3. Conclusions are drawn in Sec. 6.4.

### 6.2 System architecture

The proposed system is shown in Fig. 6.2. The reader sends a continuous power signal that is modulated by the clock. At the transponder side, a power harvesting module generates the DC supply voltage by rectifying the incoming AC signals. In order to send pulses at the resonant frequency, a phase-locked loop (PLL) is used to replicate the frequency $f_{VCO}$ from the input frequency $f_{in}$. The clock for synchronization is demodulated from the notch of the continuous wave. A timing controller uses a small state machine to control the behavior of the transponder and ensures that the timing is precisely followed.



Figure 6.2: System architecture for the proposed pulse signaling method.

| Parameter         | Value               |
|-------------------|---------------------|
| Target distance   | 2mm                 |
| Metal width       | $12\mu$m            |
| Number of turns   | 13                  |
| Metal spacing     | $5\mu$m             |
| Shielding         | M1 patterned ground |
| Hollowness        | 0.79                |
| DC inductance     | 163nH               |
| Natural frequency | 294MHz              |
| Q @ 200 MHz       | 8.62                |

Table 6.1: Summary of the integrated inductor.

#### 6.2.1 Integrated inductor

Compared with external coils, an integrated inductor is advantageous in saving device size. On the other hand, it suffers from a poor quality factor (Q) due to the lossy substrate. In this work, we restrict the size of the inductor to 1mm by 1mm for both the reader and the transponder. It is reasonable to assume that in our pulse signaling scheme, the limiting factor for the range of operation is the power that can be harvested by the transponder. To maximize the transponder supply power, the geometry of the inductor should be optimized for the target distance of 2mm. The power available to the transponder is a function of the self inductance, the coupling coefficient, the resistive loading and the operating frequency at a desirable Q. However, the resistive loading is not a linear function of the input amplitude because the transistors are nonlinear. Instead of solving the problem analytically, an inductor simulation tool called ASITIC is used to calculate the S-parameters and the coupling coefficient k of the inductor. The geometries of the inductor are constrained by the process, and metal fills are neglected during the simulation. The resulting S-parameters are transformed into discrete R, L and C values that can be used in circuit simulators such as HSPICE. The optimized parameters for the integrated inductor are shown in Table 6.1. The operating frequency is set to 200MHz as a compromise between the quality factor and the achievable operating frequency of the transponder.
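As a rough cross-check of the values in Table 6.1, the sketch below applies the textbook lumped-element relations $f = 1/(2\pi\sqrt{LC})$ and $Q = \omega L / R_s$. It treats the inductor as an ideal series R-L with a single lumped capacitance, which is a simplifying assumption and not the extracted ASITIC model; the function and variable names are illustrative only.

```python
import math

L_IND = 163e-9   # DC inductance from Table 6.1 (H)
F_SELF = 294e6   # natural frequency from Table 6.1 (Hz)
F_OP = 200e6     # operating frequency (Hz)
Q_OP = 8.62      # quality factor at 200 MHz from Table 6.1


def resonant_capacitance(inductance, freq):
    """Capacitance that resonates with `inductance` at `freq` (ideal LC tank)."""
    omega = 2 * math.pi * freq
    return 1.0 / (omega ** 2 * inductance)


def series_resistance(inductance, freq, q):
    """Equivalent series resistance implied by Q = omega * L / R_s."""
    return 2 * math.pi * freq * inductance / q


# Parasitic capacitance implied by the natural frequency (lumped assumption).
c_par = resonant_capacitance(L_IND, F_SELF)
# Total capacitance needed to resonate at the 200 MHz operating point.
c_total = resonant_capacitance(L_IND, F_OP)
# Extra tuning capacitance on top of the implied parasitic.
c_added = c_total - c_par
# Series loss resistance implied by the quoted Q.
r_series = series_resistance(L_IND, F_OP, Q_OP)

print(f"implied parasitic C : {c_par * 1e12:.2f} pF")
print(f"total C at 200 MHz  : {c_total * 1e12:.2f} pF")
print(f"added tuning C      : {c_added * 1e12:.2f} pF")
print(f"series resistance   : {r_series:.1f} ohm")
```

Under these assumptions the tank needs roughly 3.9pF in total at 200MHz, of which about 1.8pF is already implied by the 294MHz natural frequency, and the quoted Q corresponds to a series resistance of about 24 ohms.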

#### 6.2.2 Transponder circuits

In this section, the building blocks of the transponder will be presented. The goal of the power harvesting module is to perform the following three tasks:

- AC to DC conversion.
- Signaling the controller when the rectified voltage falls below a certain level.
- Voltage regulation.

The block diagram of the power harvesting module is shown in Fig. 6.3. AC to DC conversion is accomplished with the same method that was used in Sec. 5.3. Instead of 10 stages of voltage doublers, 5 stages are used in this work: the minimum required input amplitude is higher here, so the power harvesting module would not benefit from more stages. The voltage limiter clamps the supply voltage at 2V, which is used to supply the voltage regulators and the output drivers. As explained in Sec. 5.3, voltage $V_{n3}$ suppresses voltage $V_{n6}$ when the voltage limiter approaches the designed voltage of 2V. When the supply voltage VDD5 drops from 2V, $V_{n3}$ drops while $V_{n6}$ should remain unchanged until VDD5 is lower than one diode drop. With a Schmitt trigger that uses $V_{n6}$ as its supply voltage and $V_{n3}$ as its input voltage, the state of *pump\_enable* flips when $V_{n3}$ becomes lower than 1.2V. The active high *pump\_enable* signal provides important information, since it is asserted when the power harvesting module is unable to sustain the power of the PLL. The controller can use this information to determine when the PLL should be disabled so that it does not drain more power. Two voltage regulators are used in this work. One supplies the voltage controlled oscillator (VCO) of the PLL and the other supplies the rest of the chip. With a dedicated power supply for the VCO, the supply noise from the digital controller can be largely reduced.
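The Schmitt-trigger behavior of *pump\_enable* can be pictured with a small behavioral model. This is only a sketch: the 1.2V trip point comes from the text, while the release threshold `v_high` and the function name are hypothetical values chosen for illustration, since the text does not give the upper hysteresis level.

```python
def pump_enable_trace(vn3_samples, v_low=1.2, v_high=1.5):
    """Return the pump_enable state for each V_n3 sample.

    v_low  : trip point quoted in the text (V).
    v_high : hypothetical release threshold (V), assumed for illustration.
    """
    enabled = False
    trace = []
    for v in vn3_samples:
        if not enabled and v < v_low:
            enabled = True       # rectified voltage has drooped too far
        elif enabled and v > v_high:
            enabled = False      # supply has recovered above the upper level
        trace.append(enabled)
    return trace


# V_n3 droops below 1.2 V, then recovers past the assumed 1.5 V level.
print(pump_enable_trace([2.0, 1.4, 1.1, 1.3, 1.6, 1.4]))
```

The hysteresis keeps *pump\_enable* asserted while the supply is still recovering, so the controller does not toggle the PLL on every small ripple.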

Figure 6.3: Power harvesting module with the schematic of the voltage limiter.

The regulator output voltage is a critical parameter for the transponder. Ideally, we would like to lower the supply voltage as far as possible to take advantage of the quadratic saving in dynamic power. On the other hand, the supply still needs to meet the requirements of the timing-critical elements, in this case the input buffer and the phase frequency detector of the PLL. Based on our simulation results, the minimum operating voltage for the PLL with a 200MHz input is 670mV. To leave margin for process variations, the output voltage is designed to be 770mV. The voltage regulator circuit is shown in Fig. 6.4. It includes a voltage reference stage, a start-up stage and an output stage. The reference voltage $V_{ref}$ can be expressed as

$$V_{ref} = nV_T \cdot \ln\left[\frac{W_3 W_4 L_2 L_5}{W_2 W_5 L_3 L_4}\right] + V_{n0} \tag{6.1}$$

where $V_T$ is the thermal voltage, equal to $kT/q$. If it is biased at least three thermal voltages above ground potential, the $V_{ds}$ dependency of transistor M3 can be neglected, and $V_{n0}$ becomes

$$V_{n0} = V_{th} + V_T \cdot \ln\left[\left(\frac{W_4 L_5}{L_4 W_5}\right) \left(\frac{I_{R1}}{\mu_{eff} C_{ox} \frac{W_3}{L_3} V_T^2}\right)\right] \tag{6.2}$$

where $I_{R1}$ is determined by the first term on the right-hand side of Eq. 6.1 and the resistance of R1. $V_{ref}$ is therefore independent of the supply voltage.

Figure 6.4: Schematic for the voltage regulator.
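To make the structure of Eq. 6.1 concrete, the sketch below evaluates its first term and the resulting $I_{R1}$ for made-up device values. The slope factor `n`, the sizing ratio and the value of R1 are all hypothetical placeholders, not values from this design; only the form of the expressions follows the text.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k):
    """V_T = kT/q."""
    return K_B * temp_k / Q_E


def ptat_term(temp_k, size_ratio, n=1.5):
    """First term of Eq. 6.1: n * V_T * ln(size_ratio).

    size_ratio stands for (W3 W4 L2 L5) / (W2 W5 L3 L4); n is an assumed
    slope factor.
    """
    return n * thermal_voltage(temp_k) * math.log(size_ratio)


def i_r1(temp_k, size_ratio, r1_ohm, n=1.5):
    """Current through R1, set by the first term of Eq. 6.1 and R1."""
    return ptat_term(temp_k, size_ratio, n) / r1_ohm


def v_ref(temp_k, size_ratio, v_n0, n=1.5):
    """Eq. 6.1. Note that no supply-voltage term appears."""
    return ptat_term(temp_k, size_ratio, n) + v_n0


# Hypothetical example values, chosen only to exercise the formulas.
SIZE_RATIO = 8.0
R1 = 1e6       # ohm
V_N0 = 0.30    # V
print(f"V_T at 300 K : {thermal_voltage(300.0) * 1e3:.2f} mV")
print(f"PTAT term    : {ptat_term(300.0, SIZE_RATIO) * 1e3:.1f} mV")
print(f"I_R1         : {i_r1(300.0, SIZE_RATIO, R1) * 1e9:.1f} nA")
print(f"V_ref        : {v_ref(300.0, SIZE_RATIO, V_N0):.3f} V")
```

The first term scales linearly with absolute temperature, which is why the text pairs it with $V_{n0}$ and with the unbalanced output stage when discussing temperature behavior.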

The output stage is simply a unity gain buffer. The transistor pairs (M12, M13) and (M15, M16) are sized unevenly so that the output voltage level is shifted higher than $V_{ref}$. In addition, the unbalanced sizing provides temperature compensation [92]. The bias current of the output stage is also controlled by $V_{ref}$. Transistors M18 and M19 provide a modest current load in order to stabilize the output. The start-up circuit assists the transient response of the voltage reference stage before it reaches the steady state. The current of transistors M4 and M5 needs to be large enough for the self-biasing mechanism to take over. The voltage reference circuit has two operating points: the desired state with $I_{ref}$ flowing and the undesired state where the current is near 0. As the supply voltage ramps up, voltage $V_{n1}$ will be pulled up to near Vdd while $V_{n0}$ will remain close to 0. $V_{n0}$ will eventually reach the desired state because of leakage. However, this can be a slow process that largely depends on the supply voltage. The startup stage speeds up the process. When the current of transistor M7 is low, transistor M6 is turned on to shunt voltages $V_{n1}$ and $V_{ref}$. The charge is redistributed between $V_{n1}$ and $V_{ref}$ so that $V_{ref}$ rises faster toward the desired voltage. Simulation results show that $V_{ref}$ reaches the steady state in $1\mu$s, which is at least 20 times faster than the circuit without the startup stage.

The block diagram of the PLL is shown in Fig. 6.5. A type II PLL is used so that it can lock to a wider range of incoming frequencies [118]. To save power, each block of the PLL can be turned off individually. The PLL operates in either a locking mode or a signal pulsing mode, depending on whether the voltage controlled oscillator (VCO) is the only active circuit. During the locking mode, the PLL operates in a negative feedback loop and every block is activated. The pulsing mode is used when the blocks associated with **VDDR1** are turned off, so that the VCO runs directly off the voltage stored in the low-pass filter (LPF). A second order loop filter is implemented to reduce noise injection at every clock cycle while still maintaining a stable loop [119]. As a rule of thumb, the loop bandwidth has to be at least 5 times smaller than the reference frequency to avoid instability [120]. With a reference frequency of 200MHz in this work, a loop bandwidth of 20MHz is chosen to provide a reasonable tradeoff between stability and transient response. With the polysilicon resistor and the MOS capacitors, the total area of the integrated loop filter is  $180\mu m^2$ in this  $0.13\mu$m technology.

Figure 6.5: Block diagram of the PLL.

In order to operate at 200MHz with a 770mV supply voltage, operating speed is the major concern for the phase frequency detector. We use true single phase clock (TSPC) D flip-flops to replace the static flip-flops in the phase frequency detector [121]. The circuit diagram of the TSPC D flip-flop is shown in Fig. 6.6(a). When **reset** is high, the output is precharged high as well. After **reset** goes low, the output node becomes floating and is only sensitive to the positive edge of the input signal. From an energy point of view, the TSPC D flip-flop is more efficient than static logic because of its lower parasitic capacitance. The phase frequency detector is shown in Fig. 6.6(b). Two TSPC D flip-flops are used to detect the rising edges of $f_{ref}$ and $f_{VCO}$, respectively. The **reset** signal is asserted whenever both flip-flop outputs are high. The output signals **upn** and **dn** are balanced in terms of propagation delay to reduce the noise injected into the charge pump. The phase frequency detector can be disabled by setting **pll\_en\_bar** high.
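The reset-on-both-high behavior of the phase frequency detector can be sketched as an event-driven model. This is a logical abstraction only: it uses active-high `up`/`dn` flags (the circuit's **upn** output would be the complement of `up`), assumes an instantaneous reset, and ignores the precharge details of the TSPC flip-flops. The function name and event format are illustrative.

```python
def pfd(edges):
    """Behavioral phase frequency detector.

    edges : iterable of (time, source) rising edges sorted by time,
            where source is "ref" or "vco".
    Returns a list of (time, up, dn) states after each edge.
    """
    up = dn = False
    states = []
    for t, source in edges:
        if source == "ref":
            up = True            # flip-flop clocked by f_ref
        elif source == "vco":
            dn = True            # flip-flop clocked by f_VCO
        else:
            raise ValueError(f"unknown edge source: {source!r}")
        if up and dn:
            up = dn = False      # reset asserted when both outputs are high
        states.append((t, up, dn))
    return states


# Reference leads the VCO: `up` is high between each ref edge and the
# following vco edge, so the charge pump would source current.
print(pfd([(0, "ref"), (1, "vco"), (5, "ref"), (6, "vco")]))
```

The width of the `up` (or `dn`) pulse is the time difference between the two edges, which is what the charge pump converts into a correction of the control voltage.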

Figure 6.6: TSPC phase frequency detector. (a) TSPC D flip-flop, (b) circuit diagram of the phase frequency detector.

Typically, the charge pump and the VCO are the components of a PLL that are most sensitive to phase noise. In this work, however, the PLL does not work in frequency locking mode during transmission. Thus noise sources such as charge injection and charge sharing are irrelevant to the pulse signaling events. In addition, the timing jitter of the VCO has little impact on the resonant signal since the quality factor of the integrated inductors is not high. We can therefore trade noise performance for lower circuit complexity and power consumption in these sensitive components. The circuit diagram of the charge pump is shown in Fig. 6.7(a). The bias current is generated by transistors M1 and M3. The charge pump sources current from the supply to the output when the *upn* signal is low, and vice versa. In pulse signaling mode, the charge pump can be turned off by turning off transistor M2. The output then becomes a floating node but can still drift over time due to the subthreshold leakage of transistors M4 to M7. To minimize the impact of leakage, careful sizing is needed to balance the pull-up and pull-down networks when they are off. Fig. 6.7(b) shows the current-starved VCO. The VCO is composed of 5 inverter stages that are current-starved, with the control voltage coming from the loop filter output. Since it is current-starved only through NMOS transistors, the duty cycle of the oscillator is not 50%. To compensate for the uneven rising and falling transitions, another current-starved inverter is used to generate an output that is close to 50% duty cycle. A separate control signal *vco\_en\_bar* is used to stop the VCO from running when it is not needed.
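For a sense of scale, the standard ring-oscillator relation $f = 1/(2 N t_d)$ gives the average per-stage delay that a 5-stage ring needs in order to run at 200MHz. The sketch assumes a single average delay per stage, whereas the text notes that the rising and falling transitions are uneven; the function names are illustrative.

```python
def stage_delay(f_osc_hz, n_stages):
    """Average per-stage delay of an N-stage ring oscillator.

    Uses f = 1 / (2 * N * t_d), where t_d is the mean of the rising and
    falling propagation delays of one stage.
    """
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a single-ended ring needs an odd number of stages (>= 3)")
    return 1.0 / (2 * n_stages * f_osc_hz)


def ring_frequency(t_delay_s, n_stages):
    """Oscillation frequency for a given average stage delay."""
    return 1.0 / (2 * n_stages * t_delay_s)


t_d = stage_delay(200e6, 5)
print(f"average stage delay: {t_d * 1e9:.2f} ns")
```

With 5 stages at 200MHz the average stage delay is 0.5ns, which the current-starving transistors must reach at the 770mV regulated supply.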

Figure 6.7: Schematics of (a) the charge pump, (b) the VCO.

For the timing control, the reference clock is obtained from the gap between the incoming AC signals. The demodulated output *demod\_out* from the envelope detector starts to fall once the incoming signal stops switching, as shown in the waveform of Fig. 6.8. The figure shows the related control signals when the transponder is in the pulse signaling mode. By detecting the transitions of *demod\_out*, the active low signal *clk\_bar* is generated to indicate the period when the transponder takes over control of the communication channel. *vco\_en\_bar* becomes 0 right after *clk\_bar* goes low to enable the VCO in a free running mode. The VCO is in fact activated a few hundred nanoseconds before the pulse signaling happens, because the voltage regulator requires some response time when switching from a very light load to the load of a free running VCO. The delay between the time when *clk\_bar* goes low and the actual pulse signaling events is controlled by counting cycles of the VCO output. A series of pulses is then fired by the drivers at the output frequency of the VCO. At the same time, the signal *demod\_out* may rise since the same communication channel is being excited again. To prevent the controller from misinterpreting this as the end of the pulse signaling mode, the controller needs to mask out *demod\_out* and prevent the *clk\_bar* signal from rising. The VCO can be put to sleep right after the pulse signaling mode is completed to save power.

Figure 6.8: Timing waveform of the pulse signaling mode.

The waveform when the system is in the PLL locking mode is shown in Fig. 6.9. At the beginning of this mode, *clk\_bar* goes high and activates both the VCO and the remaining parts of the PLL. After a number of cycles, the VCO output reaches the same frequency as the incoming signal. Note that the power consumption is at its peak in this mode, where the PLL is the dominant consumer. The energy range of the reader depends on the power consumption of the transponder. When the two chips are far enough apart, the harvested power will not be enough to supply the PLL. To extend the energy range, the harvested power should be allowed to be lower than what the PLL consumes. As mentioned before, the *pump\_enable* signal can be used to prevent the rectified voltage from dropping below a certain voltage. The PLL operation is stopped upon the request of *pump\_enable*. At close distances, however, the PLL may keep running until the transponder enters the pulse signaling mode again. When that happens, the reference clock of the PLL suddenly disappears while the PLL is still trying to track a frequency that does not exist. To avoid this false locking attempt, a counter in the timing controller records the number of cycles the PLL has spent in the locking mode and stops it after 128 cycles if *pump\_enable* has not been asserted. In general, the PLL locking mode only accounts for a fraction of the time when the transponder is remotely powered. The rest of the cycles are used to replenish the other components that need to be charged for pulse signaling. Part of the harvested energy goes to the supply capacitors that are discussed in the next paragraph.

Figure 6.9: Timing waveform of the PLL locking mode.
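The two stop conditions for the locking mode described above (a *pump\_enable* request or the 128-cycle limit) can be summarized in a few lines. This is a behavioral sketch, not the controller's actual state machine; the function name, the return format and the per-cycle sampling of *pump\_enable* are assumptions.

```python
MAX_LOCK_CYCLES = 128   # cycle limit quoted in the text


def run_locking_mode(pump_enable_by_cycle, max_cycles=MAX_LOCK_CYCLES):
    """Simulate one PLL locking attempt.

    pump_enable_by_cycle : iterable of booleans, one sample per reference
                           cycle (assumed sampling scheme).
    Returns (cycles_spent, reason), where reason is "pump_enable",
    "timeout" or "input_ended".
    """
    cycles = 0
    for pump_enable in pump_enable_by_cycle:
        if pump_enable:
            return cycles, "pump_enable"   # supply drooped: stop the PLL
        cycles += 1
        if cycles >= max_cycles:
            return cycles, "timeout"       # avoid a false locking attempt
    return cycles, "input_ended"


# Close range: the supply never droops, so the counter ends the attempt.
print(run_locking_mode([False] * 200))
# Long range: pump_enable is asserted after 10 cycles.
print(run_locking_mode([False] * 10 + [True]))
```

At the 200MHz reference frequency, 128 cycles correspond to 640ns of locking time per attempt.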

During the signal pulsing mode, the open loop VCO output *pll\_clk* is used to produce pulses at the resonant frequency. The scheme is shown in Fig. 6.10. To provide enough current for exciting the reader inductor, charge is stored on the supply capacitors during power harvesting. Supply voltages *vddp1* to *vddp4* are replenished by the rectifier output *VDD5* when *pulse\_en* is low. When a "1" is sent, a series of pulses is fired. The output drivers successively inject the charge previously stored on *vddp1* to *vddp4* into the inductor and sink the current out of it at the other end. *lcn* is the input of the pull-down transistor Mn, while *lp1* to *lp4* are the inputs of the pull-up transistors. Each capacitor for supply voltages *vddp1* to *vddp4* is 10pF, which corresponds to  $1890\mu$m<sup>2</sup> of silicon area.
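A back-of-the-envelope estimate of the energy available for one burst follows from $E = \frac{1}{2}CV^2$. The sketch assumes that each capacitor is charged all the way to the 2V limiter voltage before the burst, which is an upper bound rather than a measured value; the names are illustrative.

```python
C_SUPPLY = 10e-12    # each supply capacitor (F), from the text
AREA_UM2 = 1890.0    # silicon area per capacitor (um^2), from the text
V_CHARGE = 2.0       # assumed full charge to the 2 V limiter level (V)
N_CAPS = 4           # vddp1 to vddp4


def stored_energy(capacitance, voltage):
    """Energy stored on a capacitor: E = 0.5 * C * V^2."""
    return 0.5 * capacitance * voltage ** 2


energy_per_cap = stored_energy(C_SUPPLY, V_CHARGE)
energy_total = N_CAPS * energy_per_cap
cap_density = C_SUPPLY / AREA_UM2    # F per um^2

print(f"energy per capacitor : {energy_per_cap * 1e12:.1f} pJ")
print(f"energy for 4 caps    : {energy_total * 1e12:.1f} pJ")
print(f"capacitance density  : {cap_density * 1e15:.2f} fF/um^2")
```

Under this assumption each capacitor holds about 20pJ, so a burst can draw on at most about 80pJ, and the quoted area implies a density of roughly 5.3fF/$\mu$m<sup>2</sup>.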



Figure 6.10: The driver circuits for the transponder.

#### 6.2.3 Reader circuits

Compared to the transponder, the design of the reader is much simpler since no other signals interfere with the readout data. The output driver is similar to the transponder driver except that it can be driven strongly by the power supply. The carrier clock is modulated with *ext\_clk*, which defines the data rate. The circuit that generates the control signals for the output driver is shown in Fig. 6.11. Two pulse generators composed of delay chains define two critical periods for the system.

- T1 is the time during which the reader stops sending power and instead waits for the readout signals.
- T1-T2 is the time during which the reader is allowed to amplify the incoming signals.

Both T1 and T2 are referenced to the negative edge of *ext\_clk*. In this work, T1 is nominally designed to be 500ns, with adjustable delays to accommodate process variations. To ensure that the receiver is not unintentionally triggered by the decaying resonant clock of the reader itself, T2 is set to 200ns. Signal *clk* is the carrier of the system at 200MHz. The signals produced by the pulse generators are *pulse\_mod* and *pulse\_pre*, respectively. The drivers are controlled by *pulse\_mod* and *clk*, and charge is replenished into the inductor from opposite directions every half cycle. In order to conveniently adjust the magnetic field, level converters are implemented to drive the driver with a larger swing. Assuming that the series resistance of the inductor dominates the current, the AC current is proportional to the raised switching amplitude. Therefore, the driver transistors should be implemented with thick oxide devices in order to sustain the higher than nominal voltages.
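The timing budget implied by T1 and T2 can be worked out directly. The sketch assumes one power gap of length T1 per *ext\_clk* period, so that the fraction of time spent transmitting power is $1 - T1 \cdot f_{data}$; that reading of the scheme, and the function names, are assumptions.

```python
T1 = 500e-9        # gap in power transmission (s), nominal value from the text
T2 = 200e-9        # receiver blanking time (s), from the text
F_CARRIER = 200e6  # carrier frequency (Hz)


def listen_window(t1=T1, t2=T2):
    """Time during which the receiver may amplify incoming pulses (T1 - T2)."""
    return t1 - t2


def power_fraction(f_data_hz, t1=T1):
    """Fraction of each data period spent transmitting power.

    Assumes exactly one gap of length t1 per data period.
    """
    fraction = 1.0 - t1 * f_data_hz
    if fraction < 0:
        raise ValueError("data period is shorter than the gap T1")
    return fraction


window = listen_window()
print(f"listening window : {window * 1e9:.0f} ns "
      f"({round(window * F_CARRIER)} carrier cycles)")
for f_data in (50e3, 400e3):
    print(f"f_data = {f_data / 1e3:.0f} kHz -> power sent "
          f"{power_fraction(f_data) * 100:.1f}% of the time")
```

Under this assumption the receiver listens for 300ns (60 carrier cycles), and the power-on fraction falls from 97.5% at 50kHz to 80% at 400kHz, which is consistent with less energy being harvested at higher data rates.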

The carrier clock signal is generated on die as well. The clock generator supports two modes of operation. The first mode generates the clock output at the resonant frequency, which can be sent directly to the driver. The second mode produces a clock at twice the resonant frequency, followed by a divide-by-two circuit before the output drivers. The reason for the second mode is to produce a near 50% duty cycle signal from the clock generator. Although the second mode consumes more power, the sinusoidal signal always gets replenished at the right time and the efficiency of the driver can be improved compared to the first mode.

Figure 6.11: Pulse generator and output driver of the reader.

Since data receiving is time-multiplexed with the power transmitting mode, the receiver does not have to deal with strong interference. Fig. 6.12 shows the circuit diagram of the data decoder and the timing diagram when a single "1" bit is sent. A single stage amplifier with moderate gain is sufficient to amplify the resonant signals to full rail. The input signal *in* is first AC coupled and properly DC biased before being amplified to *rec\_in*. The data can then be decoded by digital logic gates. Signal *pulse\_pre* is used here to reset the state of *rec\_data* and ensure that the transition of *rec\_data* is unidirectional. Assuming that the amplitude of *rec\_in* is large enough to trigger the D flip-flop and produce a rising signal *rec\_data*, the output *data\_out* is latched at the positive edge of *ext\_clk*. When a "0" is sent, *data\_out* remains the same while *rec\_data* is precharged to 0.

Figure 6.12: Data receiving scheme for the reader.
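The decoding sequence can be sketched behaviorally as reset, one-way set, then latch. This is a simplified model under stated assumptions: each listening window is represented as a list of *rec\_in* samples normalized to the supply, a rising crossing of a hypothetical logic threshold stands in for the D flip-flop being clocked, and the function name is illustrative.

```python
def decode_bits(windows, threshold=0.5):
    """Decode one bit per ext_clk period from sampled rec_in waveforms.

    windows   : list of sample lists, one per listening window, with values
                normalized to the supply (assumed representation).
    threshold : hypothetical logic threshold of the D flip-flop clock input.
    """
    data_out = []
    for samples in windows:
        rec_data = 0                       # pulse_pre resets rec_data
        for prev, cur in zip(samples, samples[1:]):
            if prev < threshold <= cur:    # rising edge on rec_in
                rec_data = 1               # one-way transition until next reset
        data_out.append(rec_data)          # latched on the ext_clk positive edge
    return data_out


# A full-rail burst, silence, a burst too weak to cross the threshold,
# and a single late rising edge.
print(decode_bits([[0, 1, 0, 1, 0], [0, 0, 0, 0], [0.1, 0.2, 0.1], [0, 0.9]]))
```

Because *rec\_data* can only move from 0 to 1 within a window, extra ringing after the first detected edge does not change the decoded bit.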

### 6.3 Measurement results

The test chip for inductive coupling was fabricated in a  $0.13\mu$m CMOS technology. Fig. 6.13 shows the die photo with both the reader and the transponder on the same die. The integrated inductors for both circuits are designed with the same dimensions, as shown in Table 6.1. The active area of the transponder measures 0.084mm<sup>2</sup>, while the active area of the reader is 0.04mm<sup>2</sup>.

Figure 6.13: Die photo for the reader and transponder of the system.

Fig. 6.14 shows the setup for the two-chip test. The reader is packaged and mounted on a PC board with interfaces to the oscilloscope and to a laptop for control signals. The transponder chip is attached to a micromanipulator that allows precise 3D positioning down to  $10\mu$m in the vertical direction. For the x-axis and the y-axis, the resolution is 0.1mm. In order to maximize the energy range, the resonant frequency should be measured first. As the switching frequency gets closer to the resonant frequency, the loss of charge due to the resistive components decreases. Therefore, the resonant frequency can be found at the local minimum of power consumption by sweeping the switching frequency.

Figure 6.14: (a) Test setup with the micromanipulator and the PC board, (b) close-up photo.

The received waveform is shown in Fig. 6.15. The data stream is generated by a 4 bit LFSR that produces pseudo-random numbers and repeats every 15 cycles. The worst case happens when a "1" is sent immediately after another "1" in the previous cycle, since this gives the transponder the shortest time to replenish the supply capacitors. The measurement shows that the data is decoded correctly.
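A 4 bit maximal-length LFSR of the kind described here can be sketched as follows. The text does not state the feedback polynomial or the seed, so the taps used below (a common maximal-length choice for 4 bits) and the seed are assumptions; any maximal-length 4 bit LFSR has the same period of 15.

```python
def lfsr4_states(seed=0b0001):
    """Yield successive states of a 4-bit Fibonacci LFSR.

    Feedback is the XOR of bits 0 and 1, shifted in at bit 3. These taps
    are an assumed maximal-length choice, not taken from the text.
    """
    if seed & 0xF == 0:
        raise ValueError("the all-zero state locks up the LFSR")
    state = seed & 0xF
    while True:
        yield state
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 3)


def lfsr4_bits(n_bits, seed=0b0001):
    """Return the first n_bits output bits (LSB of each state)."""
    gen = lfsr4_states(seed)
    return [next(gen) & 1 for _ in range(n_bits)]


bits = lfsr4_bits(30)
print(bits[:15])
print("repeats after 15 bits:", bits[:15] == bits[15:30])
```

One period of the sequence contains back-to-back "1"s, so the worst-case pattern mentioned in the text is exercised in every repetition.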

Figure 6.15: Measured waveform from the oscilloscope showing the output data and the clock signal.

The communication distance  $(d_{max})$  is shown on the left of Fig. 6.16. The switching amplitude  $(V_{sw})$  of the reader can be higher than the nominal Vdd to increase the AC current. At a given data rate  $f_{data}$,  $d_{max}$  increases monotonically with  $V_{sw}$. A distance of 1.1mm is achievable with  $V_{sw} = 3V$  and  $f_{data} = 50$kHz, with a power consumption of 16mW. At a reduced distance of 0.9mm,  $f_{data} = 400$kHz can be achieved. At higher data rates,  $d_{max}$  starts to decrease because the harvested energy is also reduced. On the other hand, reducing the data rate is not always advantageous, because the frequency of pulse signaling relies on the ability of the loop filter in the PLL to hold the bias voltage. The filter suffers from ~20pA of leakage even after the charge pump is turned off. The frequency deviates further from the resonant frequency as the time between refreshes increases, which results in less detectable signals. As a result, the absolute minimum data rate is 6kHz regardless of the switching amplitude.
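The link between leakage and the minimum data rate can be illustrated with $\Delta V = I_{leak} \cdot T / C$. The 20pA leakage is from the text; the effective hold capacitance of the loop filter is not given, so the 10pF used below is purely hypothetical, and the sketch assumes the control voltage is refreshed once per data period.

```python
I_LEAK = 20e-12   # loop filter leakage quoted in the text (A)
C_HOLD = 10e-12   # hypothetical effective hold capacitance (F)


def charge_lost(f_data_hz, i_leak=I_LEAK):
    """Charge leaked between refreshes, assuming one refresh per data period."""
    return i_leak / f_data_hz


def control_droop(f_data_hz, c_hold=C_HOLD, i_leak=I_LEAK):
    """Drift of the held VCO control voltage between refreshes (V)."""
    return charge_lost(f_data_hz, i_leak) / c_hold


for f_data in (400e3, 50e3, 6e3):
    print(f"f_data = {f_data / 1e3:>5.0f} kHz -> "
          f"charge lost {charge_lost(f_data) * 1e15:.2f} fC, "
          f"droop {control_droop(f_data) * 1e3:.3f} mV (with assumed C_HOLD)")
```

The droop grows in inverse proportion to the data rate; how much frequency error that produces depends on the VCO gain, which is not stated in the text.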

Figure 6.16: Measured communication distance with respect to the data rate and the switching amplitude.

Fig. 6.17 shows the combinations of horizontal misalignment and vertical distance at which data can be communicated successfully. At  $V_{sw} = 3V$, every 0.1mm of misalignment translates to a loss of roughly 0.1mm of communication distance, while at  $V_{sw} = 2.3V$  the impact of misalignment is only half of that. The power consumption when  $V_{sw} = 2.3V$  is about 8mW.

### 6.4 Conclusion

Figure 6.17: Measured achievable communication distance with misalignment in the x-axis or the y-axis.

In this work, we present a pulse signaling based method for data readout from inductively coupled coils at short range. The use of time-multiplexed pulse signaling allows the quality factor to be optimized for both the reader and the transponder, as long as the resonant frequency is the same. It also relaxes the constraint on the receiver's sensitivity by eliminating the dominant noise source during data receiving. A PLL is implemented on the transponder to replicate the resonant frequency while harvesting the power. The replicated frequency is stored on the loop filter in the form of a voltage, which can later be used to drive the VCO and generate pulses that effectively excite the reader's inductor. The test chip was fabricated in  $0.13\mu$m technology with 1mm×1mm integrated inductors on both the reader and the transponder. The measurement results demonstrate successful reading at a distance of 1.1mm with 16mW of power consumption.

#### CHAPTER VII

# CONTRIBUTIONS AND FUTURE WORKS

### 7.1 Contributions

In this dissertation, several building blocks for a miniature sensor system were discussed. To achieve a small form factor, low energy operation is the key to such a system. Unlike microcontrollers or storage units such as SRAM, peripheral circuits have not received much emphasis in terms of low power operation. The timer, for example, is the only active component while the system is in the sleep mode and often dominates the total power if it is not properly designed. Passive communication is another important feature for the miniature system. At the  $\sim mm^3$ scale, a system that is able to sustain the instantaneous power required by an RF module has yet to be reported. Passive RFID technology provides a solution to access data remotely without actively powering the transponder. In order to work at close distances and within a limited form factor, new techniques are proposed in this work.

The contributions of the dissertation are summarized as follows:

- Two ultra low power timers that oscillate at sub-Hz to 10Hz frequencies are proposed. To effectively reduce the power consumption, the transistors are aggressively biased in the subthreshold region. The drain current of a MOS transistor has an exponential dependency on temperature when operating in this region. The first design uses the gate leakage of a MOS transistor as the charging/discharging source to reduce the temperature sensitivity and also to provide a low output frequency. The chip is measured with less than 0.1Hz of nominal frequency and sub-pW power consumption at a 300mV supply voltage. By raising the voltage from 300mV to 600mV, the variation due to temperature reduces from  $0.6\%/^{\circ}$C to  $0.16\%/^{\circ}$C while the power consumption increases to 2nW.

- Another approach for the low power timer is achieved by a program-and-hold technique. A current source is generated by referring to a resistor that has a low temperature sensitivity. To further reduce the power consumption in the active mode, the bias stage needs to be turned off. A hold stage is implemented to store the bias voltage so that the oscillation period remains temperature insensitive after the bias stage is turned off. Although the footprint of the program-and-hold timer is 40X larger than the gate leakage based design, it still only accounts for less than 2% of a 1mm<sup>2</sup> chip. The average power consumption is 150pW, with 5% cycle time error from 0°C to 90°C when the timer is refreshed every 2 minutes.
- A low power temperature sensor is proposed for remotely powered systems. The energy range of such system is highly correlated to the power consumption, which is typically dominated by the temperature sensor. The proposed temperature sensor generates a temperature insensitive current source and a PTAT current source which both operate at the subthreshold region. The current sources are translated into oscillating frequencies and can be used to generate a digitized output. In this work, the size of the temperature sensor is inversely proportional to the power consumption. With a footprint of 0.05mm<sup>2</sup>, the total power consumption is 220nW. The temperature inaccuracy is -1.6°C to +3°C over the temperature range from 0°C to 100°C.
- For communication between different power domains and for testing subthreshold circuits, level shifters are widely used. A single stage static subthreshold to I/O voltage level shifter is implemented with the advantage of robustness across temperatures. The idea is to use cascode diode-connected transistors as the pull-up network so that the pull-up and pull-down networks have comparable driving strength. The experimental results show better performance and power consumption compared to a widely used DCVS level shifter when converting from 0.3V to 2.5V. The FO4 delay of the proposed design remains unchanged at temperatures lower than  $0^{\circ}$C, while the DCVS counterpart degrades exponentially below room temperature.

- A passive capacitive coupling based proximity sensor data retrieval technique is presented. An alignment independent technique is proposed to alleviate the requirement for precise positioning in capacitive coupling systems. By dividing the data retrieval chip into smaller microplates, each microplate can detect the alignment information and reconfigure its function during communication. The test chip demonstrates that the achievable data rate varies less than 15% in 10 experiments when the sensor chip is randomly dropped on the data retrieval chip. In this work, data, clock and power can all be capacitively transmitted at the same time through different channels.
- For communication in the mm range, an inductive coupling technique is proposed to enhance the robustness of the readout data coming from the transponder compared to a traditional backscattering scheme. The readout data can be signaled in a series of pulses at the resonant frequency to maximize the received amplitude given a fixed energy. Another advantage is that by time-multiplexing the power and data signals, the noise floor of the receiver is greatly reduced. The test chips for both the transponder and the reader are implemented with integrated inductors of 1mm×1mm. The achievable communication distance is 1.1mm, which can be improved with a larger reader inductor in the future.

### 7.2 Future works

In the future, we can work toward two directions for the sensor system.

- Wireless sensor network. The low power circuits shown in this dissertation were motivated by a single sensor system. It is also possible to apply the circuit techniques to a wireless sensor network with the addition of RF modules. The RF module, however, has unacceptable active power consumption for a system that relies on a battery in  $\sim$mm<sup>3</sup>. As a result, strong power gating must be applied to the RF circuits as well while data is not being transmitted. In order to exchange data during a short wakeup period, a synchronization protocol should be established between the transmitter and the receiver. One solution is to use the wakeup receiver that was discussed in Chap. I. The idle power of the wakeup receiver should be further reduced so that it does not dominate the system power during the sleep mode. Another solution is to develop a low power timer that can be used on both ends of the link, such that the chips remain synchronized after an extended period of time. This requires precise control of the output period of the timer as well as of the jitter caused by noise. Supply sensitivity should also be minimized since the operating conditions can differ between the two chips.
- High level integration.
  - Supply voltage management is a big challenge for a system with small current loads. The output level of a microfabricated battery is consistent but typically too high for subthreshold operation, according to the discussion in Sec. 1.1.6. A voltage regulator with higher than 90% efficiency can be implemented with switched capacitors or buck converters at a sufficiently large load. However, the efficiency drops significantly when the load is reduced in the sleep mode. In order to maintain decent efficiency, the devices that are responsible for static power consumption should be minimized. Lowering the clock frequency intelligently based on the load is the key to cutting down the power consumption. For example, the voltage regulator only needs to supply 100pW for the timer in the sleep mode, but the power demand increases to  $1\mu$W for the whole system during the active mode.

  - Another aspect of voltage management is the use of hybrid power sources, such as a combination of a battery and energy scavenging. Coming up with a scheme that utilizes the advantages of both power sources is not trivial. Lifetime is not limited with energy scavenging; however, switching to the battery when the harvested energy is insufficient is challenging. A supply power monitor is required to implement such a hybrid power system.
  - 3D stacking is an attractive option for the sensor system. The first reason is that higher densities can be achieved by stacking the chips. Another reason is that heterogeneous technologies can be used to fabricate components such as the FLASH memory. This allows us to explore new architectures for the system, as individual components can be optimized. For example, the timers proposed in this work rely on the magnitude of gate leakage being in a certain range, so technology scaling can be detrimental. On the other hand, the microcontroller typically favors smaller dimensions so that the parasitic capacitances can be reduced.
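The efficiency drop described in the supply voltage management item above can be illustrated with a deliberately simple loss model in which the regulator burns a fixed static power regardless of load, so that $\eta = P_{load} / (P_{load} + P_{static})$. Real regulators also have load-dependent losses; the model, the 90% target applied to both modes and the function names are assumptions, while the 100pW and $1\mu$W load figures are from the text.

```python
P_ACTIVE = 1e-6     # active-mode load from the text (W)
P_SLEEP = 100e-12   # sleep-mode load from the text (W)


def efficiency(p_load, p_static):
    """Efficiency with a fixed static overhead (simplified loss model)."""
    return p_load / (p_load + p_static)


def static_budget(p_load, target_eff):
    """Largest static overhead that still meets target_eff at p_load."""
    return p_load * (1.0 / target_eff - 1.0)


# Overhead sized for 90% efficiency in the active mode ...
p_static_active = static_budget(P_ACTIVE, 0.90)
# ... gives this efficiency if left unchanged in the sleep mode.
eff_sleep = efficiency(P_SLEEP, p_static_active)
# Overhead allowed if 90% is also wanted in the sleep mode.
p_static_sleep = static_budget(P_SLEEP, 0.90)

print(f"static budget, active : {p_static_active * 1e9:.1f} nW")
print(f"sleep efficiency      : {eff_sleep * 100:.2f} %")
print(f"static budget, sleep  : {p_static_sleep * 1e12:.1f} pW")
```

In this model an overhead of about 111nW that is acceptable in the active mode leaves well under 1% efficiency at the 100pW sleep load, which is why the static consumers have to scale down with the load.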

## BIBLIOGRAPHY

- [1] G. Schimetta, F. Dollinger, G. Scholl, and R. Weigel, "Optimized design and fabrication of a wireless pressure and temperature sensor unit based on SAW transponder technology," in *Proc. MTT-S*, vol. 1, 20-25 May 2001, pp. 355–358.
- [2] S. Lizon Martinez, R. Giannetti, and J. L. Rodriguez Marrero, "Design of a system for continuous intraocular pressure monitoring," in *Proc. Instrumentation Measurement Technology Conf.*, vol. 3, 18-20 May 2004, pp. 1693–1696.
- [3] F. Udrea and J. Gardner, "SOI CMOS gas sensors," in *Proc. Sensors*, vol. 2, 2002, pp. 1379–1384.
- [4] Y. Leng, G. Zhao, Q. Li, C. Sun, and S. Liu, "A High Accuracy Signal Conditioning Method and Sensor Calibration System for Wireless Sensor in Automotive Tire Pressure Monitoring System," in *Proc. WiCOM*, Sept. 2007, pp. 1833–1837.
- [5] K. Wise, "Microelectromechanical systems: interfacing electronics to a nonelectronic world," in *Proc. IEDM*, Dec. 1996, pp. 11–18.
- [6] K. Ueno, T. Hirose, T. Asai, and Y. Amemiya, "A CMOS Watchdog Sensor for Certifying the Quality of Various Perishables with a Wider Activation Energy," *Trans. IEICE*, vol. E89-A, pp. 902–907, Apr. 2006.
- [7] E. Saneyoshi, K. Nose, M. Kajita, and M. Mizuno, "A 1.1V 35μm × 35μm thermal sensor with supply voltage sensitivity of 2°C/10%-supply for thermal management on the SX-9 supercomputer," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 2008, pp. 152–153.
- [8] D. Culler, D. Estrin, and M. Srivastava, "Overview of Sensor Networks," *Computer*, vol. 37, no. 8, pp. 41–49, Aug. 2004.
- [9] N. Pletcher, S. Gambini, and J. Rabaey, "A 65μW, 1.9 GHz RF to Digital Baseband Wakeup Receiver for Wireless Sensor Nodes," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sept. 2007, pp. 539–542.
- [10] S. Selvarasah, C.-L. Chen, S.-H. Chao, P. Makaram, A. Busnaina, and M. Dokmeci, "A Three Dimensional Thermal Sensor Based on Single-Walled Carbon Nanotubes," in *Transducers*'07, June 2007, pp. 1023–1026.

- [11] A. Malik, M. Aceves, and S. Alcantara, "Novel FTO/SRO/silicon optical sensors: characterization and applications," in *Proc. Sensors*, vol. 1, 2002, pp. 116–120.
- [12] C. Dolabdjian, A. Qasimi, and C. Cordier, "Applied magnetic sensing: a long way," in *Proc. Sensors*, vol. 1, Oct. 2003, pp. 477–482.
- [13] D. Wilson, S. Hoyt, J. Janata, K. Booksh, and L. Obando, "Chemical sensors for portable, handheld field instruments," *IEEE Sensors J.*, vol. 1, no. 4, pp. 256–274, Dec. 2001.
- [14] M. Esashi, S. Sugiyama, K. Ikeda, Y. Wang, and H. Miyashita, "Vacuum-sealed silicon micromachined pressure sensors," *Proceedings of the IEEE*, vol. 86, no. 8, pp. 1627–1639, Aug. 1998.
- [15] Y. Taur and T. H. Ning, *Fundamentals of Modern VLSI Devices*. Cambridge Univ. Press, 1998.
- [16] S. Hanson, B. Zhai, D. Blaauw, and D. Sylvester, "Energy-Optimal Circuit Design," in *Symp. System-On-Chip*, Nov. 2007, pp. 1–4.
- [17] M. Seok, S. Hanson, D. Sylvester, and D. Blaauw, "Analysis and Optimization of Sleep Modes in Subthreshold Circuit Design," in *Proc. Design Automation Conf.*, June 2007, pp. 694–699.
- [18] H. Onoda, K. Miyashita, T. Nakayama, T. Kinoshita, H. Nishimura, A. Azuma, S. Yamada, and F. Matsuoka, "0.7 V SRAM Technology with Stress-Enhanced Dopant Segregated Schottky (DSS) Source/Drain Transistors for 32 nm Node," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 2007, pp. 76–77.
- [19] A. Wang and A. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 310–319, Jan. 2005.
- [20] B. Zhai, D. Blaauw, D. Sylvester, and S. Hanson, "A Sub-200mV 6T SRAM in 0.13μm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 332–606.
- [21] I. J. Chang, J.-J. Kim, S. Park, and K. Roy, "A 32kb 10T Subthreshold SRAM Array with Bit-Interleaving and Differential Read Scheme in 90nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2008, pp. 388–622.
- [22] N. Verma and A. Chandrakasan, "A 256 kb 65 nm 8T Subthreshold SRAM Employing Sense-Amplifier Redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [23] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti, "Introduction to flash memory," *IEEE Proc.*, vol. 91, no. 4, pp. 489–502, Apr. 2003.

- [24] S. Shukuri, K. Yanagisawa, and K. Ishibashi, "CMOS process compatible ie-Flash (inverse gate electrode Flash) technology for system-on-a-chip," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, May 2001, pp. 179–182.
- [25] Q. Huang and M. Oberle, "A 0.5-mW passive telemetry IC for biomedical applications," *IEEE J. Solid-State Circuits*, vol. 33, no. 7, pp. 937–946, July 1998.
- [26] A. DeHennis and K. Wise, "A double-sided single-chip wireless pressure sensor," in *Proc. IEEE Conf. Microelectromechanical Syst.*, Jan. 2002, pp. 252–255.
- [27] S. Kaiser, "Passive Telemetric Readout System," *IEEE Sensors J.*, vol. 6, no. 5, pp. 1340–1345, Oct. 2006.
- [28] D. Dudenbostel, K.-L. Krieger, C. Candler, and R. Laur, "A new passive CMOS telemetry chip to receive power and transmit data for a wide range of sensor applications," in *Proc. Solid State Sensors and Actuators*, vol. 2, 16-19 June 1997, pp. 995–998.
- [29] F. Kocer and M. P. Flynn, "A new transponder architecture with on-chip ADC for long-range telemetry applications," *IEEE J. Solid-State Circuits*, vol. 41, no. 5, pp. 1142–1148, May 2006.
- [30] W. Nosovic and T. Todd, "Scheduled rendezvous and RFID wakeup in embedded wireless networks," in *Proc. ICC*, vol. 5, 2002, pp. 3325–3329.
- [31] S. von der Mark and G. Boeck, "Ultra low power wakeup detector for sensor networks," in *Proc. IMOC*, Oct. 2007, pp. 865–868.
- [32] H. Kulah and K. Najafi, "Energy Scavenging From Low-Frequency Vibrations by Using Frequency Up-Conversion for Wireless Sensor Applications," *IEEE Sensors J.*, vol. 8, no. 3, pp. 261–268, Mar. 2008.
- [33] E. Yeatman, "Rotating and Gyroscopic MEMS Energy Scavenging," in *International Workshop on Wearable and Implantable Body Sensor Networks (BSN 06)*, 2006, pp. 42–45.
- [34] M. Renaud, T. Sterken, P. Fiorini, R. Puers, K. Baert, and C. van Hoof, "Scavenging energy from human body: design of a piezoelectric transducer," in *Proc. Transducers*, vol. 1, 5-9 June 2005, pp. 784–787.
- [35] E. Reilly, E. Carleton, and P. Wright, "Thin Film Piezoelectric Energy Scavenging Systems for Long Term Medical Monitoring," in *International Workshop on Wearable and Implantable Body Sensor Networks (BSN 06)*, Apr. 2006, pp. 38–41.
- [36] B. C. Yen and J. H. Lang, "A variable-capacitance vibration-to-electric energy harvester," *IEEE Trans. Circuits Syst. I*, vol. 53, no. 2, pp. 288–295, Feb. 2006.

- [37] S. J. Roundy, "Energy Scavenging for Wireless Sensor Nodes with a Focus on Vibration to Electricity Conversion," Ph.D. dissertation, The University of California, Berkeley, 2003.
- [38] H. Li and P. Pillay, "A Linear Generator Powered from Bridge Vibrations for Wireless Sensors," in *Proc. IAS*, 2007, pp. 523–529.
- [39] *Power paper*. [Online]. Available: http://www.powerpaper.com
- [40] J. Klassen, *A description of Cymbet battery technology and its comparison with other battery technologies*. [Online]. Available: http://www.cymbet.com
- [41] J. Flammer, S. Orgül, V. Costa, N. Orzalesi, G. Krieglstein, L. Serra, J.-P. Renard, and E. Stefánsson, "The impact of ocular blood flow in glaucoma," *Progress Retinal Eye Research*, vol. 21, pp. 359–393, 2002.
- [42] C. H. Hong, R. A. A. Arosemena, and D. Zurakowski, "Glaucoma drainage devices: a systematic literature review and current controversies," *Ophthalmology Survey*, vol. 50, pp. 48–60, 2005.
- [43] S. Chandrasekaran, R. Cumming, E. Rochtchina, and P. Mitchell, "Associations between elevated intraocular pressure and glaucoma, use of glaucoma medications, and 5-year incident cataract: the Blue Mountains Eye Study," *Ophthalmology*, pp. 417–424, 2006.
- [44] S. Asrani, R. Zeimer, J. Wilensky, D. Gieser, S. Vitale, and K. Lindenmuth, "Large Diurnal Fluctuations in Intraocular Pressure Are an Independent Risk Factor in Patients With Glaucoma," *Journal of Glaucoma*, vol. 9, pp. 134–142, 2000.
- [45] R. C. Zeimer, J. T. Wilensky, and D. K. Gieser, "Presence and Rapid Decline of Early Morning Intraocular Pressure in Glaucoma Patients," *Ophthalmology*, vol. 97, pp. 547–550, 1990.
- [46] D. Da Rin and B. Brown, "Diurnal variation of intraocular pressure and the overriding effects of sleep," *Am J Optom Physiol Opt*, vol. 64, pp. 54–61, 1987.
- [47] P. P. Syam, I. Mavrikakis, and C. Liu, "Importance of early morning intraocular pressure recording for measurement of diurnal variation of intraocular pressure," *British Journal of Ophthalmology*, vol. 89, pp. 926–927, 2005.
- [48] J. H. K. Liu, X. Zhang, D. F. Kripke, and R. N. Weinreb, "Twenty-Four-Hour Pattern of Intraocular Pressure in the Aging Population," *Investigative Ophthalmology and Visual Science*, vol. 40, pp. 2912–2917, 1999.

- [49] A. Banobre, T. Alvarez, R. Fechtner, R. Greene, G. Thomas, O. Levi, and N. Ciampa, "Measurement of intraocular pressure in pig's eyes using a new tonometer prototype," in *Proc. NEBC*, 2-3 April 2005, pp. 260–261.
- [50] C. C. Collins, "Miniature passive pressure transensor for implanting in the eye," *IEEE Trans. Biomed. Eng.*, vol. 14, pp. 74–83, 1967.
- [51] M. Kandler and W. Mokwa, "Capacitive silicon pressure sensor for invasive measurement of blood pressure," in *Proc. Micromech. Euro. Tech. Dig*, Nov 1990, pp. 203–208.
- [52] K. C. Katuri, S. Asrani, and R. M.K., "Intraocular Pressure Monitoring Sensors," *IEEE Sensors J.*, vol. 8, no. 1, pp. 12–19, Jan. 2008.
- [53] W. Mokwa and U. Schnakenberg, "Micro-transponder systems for medical applications," *IEEE Trans. Instrumentation and Measurement*, vol. 50, no. 6, pp. 1551–1555, Dec. 2001.
- [54] K. Stangel, S. Kolnsberg, D. Hammerschmidt, B. Hosticka, H. Trieu, and W. Mokwa, "A programmable intraocular CMOS pressure sensor system implant," *IEEE J. Solid-State Circuits*, vol. 36, no. 7, pp. 1094–1100, July 2001.
- [55] J. Coosemans, M. Catrysse, and R. Puers, "A readout circuit for an intra-ocular pressure sensor," *Sens. Actuators*, vol. 110, no. 1-3, pp. 432–438, 2004.
- [56] Y.-S. Lin, S. Hanson, F. Albano, C. Tokunaga, R.-U. Haque, K. Wise, A. Sastry, D. Blaauw, and D. Sylvester, "Low-voltage circuit design for widespread sensing applications," in *IEEE Int. Symp. on Circuits and Systems*, May 2008, pp. 2558–2561.
- [57] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, D. Blaauw, and T. Austin, "A 2.60pJ/Inst Subthreshold Sensor Processor for Optimal Energy Efficiency," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 2006, pp. 154–155.
- [58] K. Hosaka, S. Harase, S. Izumiya, and T. Adachi, "A cascode crystal oscillator suitable for integrated circuits," in *Proc. Frequency Control*, 29-31 May 2002, pp. 610–614.
- [59] R. Woudsma and J. M. Noteboom, "The Modular Design of Clock-Generator Circuits in a CMOS Building-Block System," *IEEE J. Solid-State Circuits*, vol. 20, no. 3, pp. 770–774, Jun 1985.
- [60] H. Okuno, T. Tominaka, S. Fujishima, T. Mitsumoto, T. Kubo, T. Kawaguchi, J.-W. Kim, K. Ikegami, N. Sakamoto, S. Yokouchi, T. Morikawa, T. Tanaka, A. Goto, and Y. Yano, "A programmable clock oscillator for integrated sensor applications," in *Proc. Electron Devices Meeting*, 29 Aug. 1998, pp. 1075–1077.

- [61] K. Sundaresan, K. Brouse, K. U-Yen, F. Ayazi, and P. Allen, "A 7-MHz process, temperature and supply compensated clock oscillator in 0.25 μm CMOS," in *IEEE Int. Symp. on Circuits and Systems*, vol. 1, 25-28 May 2003, pp. 693–696.
- [62] S. Mick, J. Wilson, and P. Franzon, "4 Gbps high-density AC coupled interconnection," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, May 2002, pp. 133–140.
- [63] K. Kanda, D. Antono, K. Ishida, H. Kawaguchi, T. Kuroda, and T. Sakurai, "1.27Gb/s/pin 3mW/pin wireless superconnect (WSC) interface scheme," in *IEEE ISSCC Dig. Tech. Papers*, vol. 1, 2003, pp. 186–487.
- [64] R. Drost, R. Hopkins, R. Ho, and I. Sutherland, "Proximity communication," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1529–1535, Sept. 2004.
- [65] K. Opasjumruskit, T. Thanthipwan, O. Sathusen, P. Sirinamarattana, P. Gadmanee, E. Pootarapan, N. Wongkomet, A. Thanachayanont, and M. Thamsirianunt, "Self-powered wireless temperature sensors exploit RFID technology," *IEEE Pervasive Comput.*, vol. 5, no. 1, pp. 54–61, Jan.-March 2006.
- [66] R. Xu, Y. Jin, and C. Nguyen, "Power-efficient switching-based CMOS UWB transmitters for UWB communications and Radar systems," *IEEE Trans. Microw. Theory Tech.*, vol. 54, no. 8, pp. 3271–3277, Aug. 2006.
- [67] N. Miura, D. Mizoguchi, T. Sakurai, and T. Kuroda, "Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 829–837, April 2005.
- [68] Y.-S. Lin, D. Sylvester, and D. Blaauw, "A sub-pW timer using gate leakage for ultra low-power sub-Hz monitoring systems," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sept. 2007, pp. 397–400.
- [69] Y.-S. Lin, D. Sylvester, and D. Blaauw, "An ultra low power 1V, 220nW temperature sensor for passive wireless applications," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sept. 2008, pp. 507–510.
- [70] Y.-S. Lin and D. Sylvester, "Single stage static level shifter design for subthreshold to I/O voltage conversion," in *Proc. Int. Symp. Low Power Electronics and Design*, Aug. 2008, pp. 197–200.
- [71] Y.-S. Lin, D. Sylvester, and D. Blaauw, "Sensor data retrieval using alignment independent capacitive signaling," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 2008, pp. 66–67.
- [72] C. H. Lee and H. J. Park, "All-CMOS temperature independent current reference," in *Electronics Letters*, vol. 32, no. 14, 4 July 1996, pp. 1280–1281.

- [73] J. Georgiou and C. Toumazou, "A resistorless low current reference circuit for implantable devices," in *IEEE Int. Symp. on Circuits and Systems*, vol. 3, May 2002, pp. 193–196.
- [74] K. M. Cao, W.-C. Lee, W. Liu, X. Jin, P. Su, S. Fung, J. An, B. Yu, and C. Hu, "BSIM4 gate leakage model including source-drain partition," in *Proc. IEDM*, 10-13 Dec. 2000, pp. 815–818.
- [75] C.-H. Choi, K.-Y. Nam, Z. Yu, and R. Dutton, "Impact of gate direct tunneling current on circuit performance: a simulation study," *IEEE Trans. Electron Devices*, vol. 48, no. 12, pp. 2823–2829, Dec. 2001.
- [76] M. J. S. Smith and J. D. Meindl, "Exact analysis of the Schmitt trigger oscillator," *IEEE J. Solid-State Circuits*, vol. 19, no. 6, pp. 1043–1046, Dec 1984.
- [77] S. Borkar, "Design challenges of technology scaling," *IEEE Micro*, vol. 19, no. 4, pp. 23–29, Jul-Aug 1999.
- [78] Y. Liu, R. Dick, L. Shang, and H. Yang, "Accurate Temperature-Dependent Integrated Circuit Leakage Power Estimation is Easy," in *Proc. Design Automation Test Eur.*, Apr. 2007, pp. 1–6.
- [79] H. Su, F. Liu, A. Devgan, E. Acar, and S. Nassif, "Full chip leakage-estimation considering power supply and temperature variations," in *Proc. Int. Symp. Low Power Electronics and Design*, Aug. 2003, pp. 78–83.
- [80] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw, "Performance and Variability Optimization Strategies in a Sub-200mV, 3.5pJ/inst, 11nW Subthreshold Processor," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 2007, pp. 152–153.
- [81] S. Narendra, V. De, S. Borkar, D. Antoniadis, and A. Chandrakasan, "Full-Chip Subthreshold Leakage Power Prediction and Reduction Techniques for Sub-0.18-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 3, pp. 501–510, Mar. 2004.
- [82] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, "Statistical Analysis of Subthreshold Leakage Current for VLSI Circuits," *IEEE Trans. VLSI Syst.*, vol. 12, no. 2, pp. 131–139, Feb. 2004.
- [83] H.-M. Chuang, K.-B. Thei, S.-F. Tsai, and W.-C. Liu, "Temperature-dependent characteristics of polysilicon and diffused resistors," *IEEE Trans. Electron Devices*, vol. 50, no. 5, pp. 1413–1415, May 2003.
- [84] D. Duarte, G. Geannopoulos, U. Mughal, K. Wong, and G. Taylor, "Temperature Sensor Design in a High Volume Manufacturing 65nm CMOS Digital Process," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sept. 2007, pp. 221–224.

- [85] F. Kocer and M. Flynn, "An RF-powered, wireless CMOS temperature sensor," *IEEE Sensors J.*, vol. 6, no. 3, pp. 557–564, 2006.
- [86] S. Zhou and N. Wu, "A novel ultra low power temperature sensor for UHF RFID tag chip," in *Proc. ASSCC*, Nov 2007, pp. 464–467.
- [87] K. Finkenzeller, *RFID Handbook: Fundamentals and Applications in Contactless Smart Cards and Identification*. John Wiley & Sons, 2003.
- [88] A. Bakker and J. Huijsing, "Micropower CMOS temperature sensor with digital output," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 933–937, July 1996.
- [89] M. Tuthill, "A switched-current, switched-capacitor temperature sensor in 0.6μm CMOS," *IEEE J. Solid-State Circuits*, pp. 1117–1122, 1998.
- [90] M. A. P. Pertijs, K. A. A. Makinwa, and J. H. Huijsing, "A CMOS smart temperature sensor with a 3σ inaccuracy of ±0.1°C from -55°C to 125°C," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2805–2815, Dec. 2005.
- [91] P. Chen, C.-C. Chen, C.-C. Tsai, and W.-F. Lu, "A time-to-digital-converter-based CMOS smart temperature sensor," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1642–1648, Aug 2005.
- [92] K. Kimura, "Low voltage techniques for bias circuits," *IEEE Trans. Circuits Syst. I*, vol. 44, no. 5, pp. 459–465, May 1997.
- [93] K. Usami, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," *IEEE J. Solid-State Circuits*, vol. 33, no. 3, pp. 463–472, 1998.
- [94] C.-C. Yu, W.-P. Wang, and B.-D. Liu, "A new level converter for low-power applications," in *IEEE Int. Symp. on Circuits and Systems*, vol. 1, May 2001, pp. 113–116.
- [95] F. Ishihara, F. Sheikh, and B. Nikolic, "Level Conversion for Dual-Supply Systems," *IEEE Trans. VLSI Syst.*, vol. 12, no. 2, pp. 185–195, 2004.
- [96] I. J. Chang, J.-J. K., and K. Roy, "Robust Level Converter Design for Subthreshold Logic," in *Proc. Int. Symp. Low Power Electronics and Design*, 2006, pp. 14–19.
- [97] R. Puri, L. Stok, J. Cohn, D. Kung, D. Pan, D. Sylvester, A. Srivastava, and S. Kulkarni, "Pushing ASIC performance in a power envelope," in *Proc. Design Automation Conf.*, June 2003, pp. 788–793.
- [98] W.-T. Wang, M.-D. Ker, M.-C. Chiang, and C.-H. Chen, "Level shifters for high-speed 1 V to 3.3 V interfaces in a 0.13 μm Cu-interconnection/low-k CMOS technology," in *Proc. VLSI TSA*, 2001, pp. 307–310.

- [99] H. Zhang, V. George, and J. Rabaey, "Low-swing on-chip signaling techniques: effectiveness and robustness," *IEEE Trans. VLSI Syst.*, vol. 8, no. 3, pp. 264–272, June 2000.
- [100] Y. Ramadass and A. Chandrakasan, "Minimum Energy Tracking Loop with Embedded DC-DC Converter Delivering Voltages down to 250mV in 65nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 64–587.
- [101] M.-E. Hwang, A. Raychowdhury, K. Kim, and K. Roy, "A 85mV 40nW Process-Tolerant Subthreshold 8x8 FIR Filter in 130nm Technology," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 2007, pp. 154–155.
- [102] U. Karthaus and M. Fischer, "Fully integrated passive UHF RFID transponder IC with 16.7-μW minimum RF input power," *IEEE J. Solid-State Circuits*, vol. 38, no. 10, pp. 1602–1608, Oct. 2003.
- [103] N. Miura, D. Mizoguchi, M. Inoue, T. Sakurai, and T. Kuroda, "A 195-Gb/s 1.2-W inductive inter-chip wireless superconnect with transmit power control scheme for 3-D-stacked system in a package," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 23–34, Jan. 2006.
- [104] T. Kuroda, "Wireless Proximity Communications for 3D System Integration," in *IEEE Workshop on RFIT*, Dec. 2007, pp. 21–25.
- [105] R. Drost, R. Hopkins, and I. Sutherland, "Proximity communication," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sept. 2003, pp. 469–472.
- [106] A. Fazzi, R. Canegallo, L. Ciccarelli, L. Magagni, F. Natali, E. Jung, P. Rolandi, and R. Guerrieri, "3D Capacitive Interconnections with Mono- and Bi-Directional Capabilities," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 356–608.
- [107] E. Culurciello and A. G. Andreou, "Capacitive Inter-Chip Data and Power Transfer for 3-D VLSI," *IEEE Trans. Circuits Syst. II*, vol. 53, no. 12, pp. 1348–1352, 2006.
- [108] R. Drost, R. Ho, D. Hopkins, and I. Sutherland, "Electronic alignment for proximity communication," in *IEEE ISSCC Dig. Tech. Papers*, 15-19 Feb. 2004, pp. 144–518.
- [109] R. Canegallo, M. Mirandola, A. Fazzi, L. Magagni, R. Guerrieri, and K. Kaschlun, "Electrical measurement of alignment for 3D stacked chips," in *Proc. ESSCIRC*, 12-16 Sept. 2005, pp. 347–350.
- [110] Raphael, Synopsys Inc., Mountain View, California, 2005.
- [111] L. Nazhandali, B. Zhai, A. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, T. Austin, and D. Blaauw, "Energy optimization of subthreshold-voltage sensor network processors," in *Proc. of the International Symposium on Computer Architecture (ISCA)*, June 2005, pp. 197–207.

- [112] M. Seok, S. Hanson, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "The Phoenix Processor: A 30pW platform for sensor applications," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 2008, pp. 188–189.
- [113] V. Chawla and D. S. Ha, "An overview of passive RFID," *IEEE Commun. Mag.*, vol. 45, no. 9, pp. 11–17, Sept. 2007.
- [114] K. W. Min, S. B. Chai, and S. Kim, "An Analog Front-End Circuit for ISO/IEC 14443-compatible RFID Interrogators," *Jour. ETRI*, vol. 26, no. 6, pp. 560–564, 2004.
- [115] J.-P. Curty, N. Joehl, C. Dehollain, and M. Declercq, "Remotely powered addressable UHF RFID integrated system," *IEEE J. Solid-State Circuits*, vol. 40, no. 11, pp. 2193–2202, Nov. 2005.
- [116] G. Balachandran and R. Barnett, "A 110 nA Voltage Regulator System With Dynamic Bandwidth Boosting for RFID Systems," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2019–2028, Sept. 2006.
- [117] N. Miura, H. Ishikuro, T. Sakurai, and T. Kuroda, "A 0.14pJ/b Inductive-Coupling Inter-Chip Data Transceiver with Digitally-Controlled Precise Pulse Shaping," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 358–608.
- [118] R. E. Best, *Phase-Locked Loops*. McGraw-Hill Professional, 2003.
- [119] B. Razavi, *Design of Analog CMOS Integrated Circuits*. McGraw-Hill, 2000.
- [120] S. Levantino, M. Milani, C. Samori, and A. Lacaita, "Fast-Switching Analog PLL With Finite-Impulse Response," *IEEE Trans. Circuits Syst. I*, vol. 51, no. 9, pp. 1697–1701, Sept. 2004.
- [121] W.-H. Lee, J.-D. Cho, and S.-D. Lee, "A high speed and low power phase-frequency detector and charge-pump," in *Proc. Asia and South Pacific Design Automation Conf.*, vol. 1, 1999, pp. 269–272.