# Low-Power Energy Efficient Circuit Techniques for Small IoT Systems

by

Wanyeong Jung

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in the University of Michigan

2017

Doctoral Committee:
Professor David T. Blaauw, Chair
Assistant Professor Hun-Seok Kim
Associate Professor Kenn R. Oldham
Professor Dennis M. Sylvester

Wanyeong Jung
wanyeong@umich.edu
ORCID iD: 0000-0002-5671-1341
© Wanyeong Jung 2017

To my family

## TABLE OF CONTENTS

DEDICATION
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER 1 Introduction
1.1 Powering Small IoT Systems
1.2 Energy-Efficient Sensor and Data Converter
1.3 Improving Sensor Accuracy and Variation
1.4 Outline of the Dissertation
CHAPTER 2 Ultra-Low Power Energy Harvester
2.1 Introduction
2.2 Self-Oscillating Voltage Doubler
2.2.1 Motivation and Basic Structure
2.2.2 Modulation Scheme for Optimum Conversion Efficiency
2.2.3 Circuit Implementation
2.3 Energy Harvester
2.3.1 Overall Structure
2.3.2 Conversion ratio modulation
2.4 Measured Results
2.5 Conclusion
CHAPTER 3 Low-Power Wide-Range Power Management Unit
3.1 Introduction
3.2 Overall Architecture and Operation
3.3 Key Building Blocks and Techniques
3.3.1 Switched-Capacitor DC-DC Converters
3.3.2 Converter Frequency Control Loop
3.3.3 Load-Proportional Biasing
3.3.4 Drop Detector for Rapid and Robust Frequency Adjustment
3.4 Measured Results
3.5 Conclusion
CHAPTER 4 Rational Conversion Ratio SC DC-DC Converter using Negative Output Feedback
4.1 Introduction
4.1.1 Switched-Capacitor DC-DC Converters
4.1.2 Binary-ratio-reconfigurable SC converters
4.1.3 Proposed technique to generate arbitrary rational ratio
4.2 Rational DC-DC Converter
4.2.1 Structure of the rational DC-DC converter
4.2.2 Operation of the rational DC-DC converter
4.2.3 Performance analysis of the rational DC-DC converter
4.3 Chip Fabrication and Measurement
4.3.1 Test Chip Fabrication
4.3.2 Measurement
4.4 Conclusion
CHAPTER 5 Fully Digital Capacitance-to-Digital Converter using Iterative Delay-Chain Discharge
5.1 Introduction
5.2 Structure of Proposed CDC
5.2.1 Basic Operation Scheme
5.2.2 Detailed Implementation
5.2.3 Parasitic Capacitance Cancelation
5.2.4 Output Code Calibration
5.3 Chip Fabrication and Measured Results
5.4 Conclusion
CHAPTER 6 Edge-Pursuit Comparator: An Energy-Scalable Oscillator Collapse-Based Comparator
6.1 Introduction
6.2 Structure and Operation of the Edge-Pursuit Comparator
6.3 Analysis of Edge-Pursuit Comparator Performance
6.3.1 Operational Analysis in Phase Domain
6.3.2 Comparison Time and Energy
6.3.3 Input Referred Noise
6.4 Discussion on Characteristics of Edge-Pursuit Comparator
6.4.1 Input Noise Tunability
6.4.2 Automatic Energy Scaling
6.4.3 Energy vs. Noise Efficiency
6.4.4 Offset
6.5 SAR ADC with Edge-Pursuit Comparator
6.6 Measured Results
6.7 Conclusion
CHAPTER 7 Fully-Integrated Voltage / Temperature Lock with On-Chip Oven Control
7.1 Introduction
7.2 Simultaneous Voltage-Temperature Lock Concept
7.3 Implementation Detail
7.4 Measurement Results
CHAPTER 8 Conclusion
8.1 Summary of Contributions
8.2 Future Research Directions
BIBLIOGRAPHY

## LIST OF FIGURES

Figure 1.1 Theoretical / actual scaling trend of delay and energy. [2]
Figure 1.2 "Cost/Area $\times$ Area/Transistor $=$ Cost/Transistor." [2], recited from [3]
Figure 1.3 "Core counts on processors published at ISSCC." [1]
Figure 1.4 "Clock frequency scaling trends." [1]
Figure 1.5 "More than Moore" devices. [5]
Figure 1.6 Dual trend of "miniaturization of digital functions (More Moore)" and "functional diversification (More than Moore)". [5]
Figure 1.7 "Sensors will populate the world of IoE" [7]
Figure 2.1 Structure of a conventional capacitive voltage doubler.
Figure 2.2 Basic structure of the proposed self-oscillating voltage doubler.
Figure 2.3 Rough dependency of voltage doubler loss elements on $\Delta$.
Figure 2.4 Leakage loss model of the voltage doubler.
Figure 2.5 Implementation of the voltage doubler with frequency modulation.
Figure 2.6 Detailed implementation (left) and timing diagram (right) of the delay block.
Figure 2.7 Detailed implementation of the voltage divider (left) and the charge pump (right) from Figure 2.4.
Figure 2.8 Overall energy harvester architecture.
Figure 2.9 5-stage bootstrapped ring oscillator for voltage doublers with lower $\mathrm{V}_{\mathrm{TH}}$ switches and its timing diagram (top right)
Figure 2.10 Dual switching scheme for the harvester to reconfigure its conversion ratio while maintaining its capability of self-startup.
Figure 2.11 Die micrograph of $0.18 \mu \mathrm{~m}$ CMOS test chip. Total flying cap sizes of the standalone voltage doubler and the harvester are 54 pF and 600 pF, respectively.
Figure 2.12 Measured results of the voltage doubler.
Figure 2.13 Measured results of the harvester with different conversion ratios.
Figure 2.15 Measured results of the harvester with a $0.84 \mathrm{~mm}^{2}$ silicon solar cell at the input.
Figure 2.16 Measurement setup for the second harvester chip's self-starting behavior.
Figure 2.17 Cold start behavior of the harvester powered by a $1.33 \mathrm{~mm}^{2}$ solar cell. Output is connected to a capacitor. Light is turned on at some time between $0 \sim 20$ s.
Figure 2.18 Measured results of the harvester in different temperatures, with solar cell $\mathrm{I}_{\mathrm{SC}}=$
Figure 2.19 Micrograph of a small $\mathrm{M}^{3}$ wireless sensor node system [9] with harvester (top right), and a graph of measured battery voltage (bottom) showing that its battery is continuously charged by the harvester during system operation.
Figure 3.1 Overall architecture of the complete power-management system and its operation.
Figure 3.2 Structure of SC Converters.
Figure 3.3 Frequency control loop for each SC converter with load-proportional biasing scheme.
Figure 3.4 Die micrograph of a test chip.
Figure 3.5 Measured performance vs. input voltage.
Figure 3.6 Measured drop detector operation.
Figure 3.7 performance vs. output power.
Figure 4.1 SAR SC DC-DC converter. [7]
Figure 4.2 Recursive SC DC-DC converter. [29]
Figure 4.3 Recursive DC-DC converter redrawn
Figure 4.4 Structure of the proposed rational converter.
Figure 4.5 Example configuration for $4 / 13$ conversion ratio $(\mathrm{A} \leq 1)$
Figure 4.6 Example configuration for $9 / 11$ conversion ratio ( $\mathrm{A}>1$ )
Figure 4.7 Conduction loss vs. conversion ratio of the 4-stage rational converter with comparison to the recursive converter.
Figure 4.8 Switching loss vs. conversion ratio of the 4-stage rational converter with comparison to the recursive converter.
Figure 4.9 Switching loss vs. conversion ratio of the 4-stage rational converter with comparison to the recursive converter.
Figure 4.10 Structure of a general reconfigurable DC-DC converter.
Figure 4.11 Structure of voltage negators.
Figure 4.12 Die micrograph of the test chip.
Figure 4.13 Measured efficiency vs. Vout of the rational and recursive converters. Ratios for optimum efficiency between $2 / 3$ and $15 / 16$ for the rational converter are noted as examples.
Figure 4.14 Output conductance comparison among rational, SAR, and recursive converters at ratios around $2 / 3$.
Figure 4.15 Output conductance comparison at $1 / 3$ conversion ratio.
Figure 4.16 Output conductance comparison at $2 / 5$ conversion ratio.
Figure 5.1 Basic structure of the proposed CDC.
Figure 5.2 Basic operation scheme of the proposed CDC
Figure 5.3 Detailed implementation of the CDC.
Figure 5.4 Detailed timing diagram of the CDC.
Figure 5.5 Technique for parasitic capacitance cancelation.
Figure 5.6 Die micrograph of the 40 nm CMOS test chip.
Figure 5.7 Measured CDC resolution and linearity error.
Figure 5.8 Measured CDC temperature sensitivity before and after calibration.
Figure 5.9 Measured results with capacitive pressure sensor with parasitic cancelation.
Figure 6.1 Required energy for comparison vs. input difference. (a) Conventional comparators wasting most energy for large input difference. (b) Energy scaling saved wasted energy for comparison.
Figure 6.2 Structure of the edge-pursuit comparator.
Figure 6.3 Operation of the edge-pursuit comparator
Figure 6.4 Output of the EPC vs. time during comparison. Output waveform changes according to (a) the polarity of the $\left|\mathrm{V}_{\mathrm{INP}}-\mathrm{V}_{\mathrm{INM}}\right|$ (b) amount of the input signal difference.
Figure 6.5 Simplified delay cell model for noise estimation.
Figure 6.6 Operation of EPC in phase domain
Figure 6.7 Simulated input referred noise vs. (a) delay cell size and (b) number of delay cells.
Figure 6.8 Graph of the EPC's scaling factor $S k$.
Figure 6.9 Comparisons during SAR ADC conversion in the energy-worst case.
Figure 6.10 Conventional dynamic comparators. (a) Single-stage [38], [50]. (b) Two-stage [39].
Figure 6.11 Comparison of simulated comparator performances among EPC and conventional 1-stage [38], [50] and 2-stage [39] comparators. (a) Probability for output "high", inferring input-referred noise. (b) Comparison energy vs. input signal difference.
Figure 6.12 Simulated input-referred offset voltage vs. number of delay cells.
Figure 6.13 15-bit SAR ADC architecture with EPC and dual CDAC for high resolution.
Figure 6.14 Probability distribution function of the comparison time at VIN=0.
Figure 6.15 Operation of 5-bit fine CDAC during fine-bit decision. (a) Initial state. (b) After a comparison with "COMP $=0$ " (c) After another comparison with "COMP=0".
Figure 6.16 Operation principle of the fine-bit CDAC generating small voltage change.
Figure 6.17 Comparison between techniques for high-resolution CDAC. (a) Bridge-capacitor scheme [53], [54] (b) Presented common-mode switching CDAC.
Figure 6.18 Die photograph of 15-bit SAR ADC with EPC.
Figure 6.19 Measured average comparison energy of the EPC vs. SAR ADC bit position.
Figure 6.20 Measured DNL and INL.
Figure 6.21 Measured SNDR and SFDR vs. input signal frequency.
Figure 6.22 Measured frequency spectrum.
Figure 6.23 Measured power consumption of the SAR ADC at Nyquist frequency.
Figure 7.1 Main concept of the voltage / temperature simultaneous lock.
Figure 7.2 Overall architecture of the implemented test circuit.
Figure 7.3 Detailed oscillator implementation.
Figure 7.4 Structure of the on-chip heater.
Figure 7.5 Die micrograph.
Figure 7.6 Measurement Results.

## LIST OF TABLES

Table 2.1 Switch mapping for harvester's overall conversion ratio control from $9 \times$ to $23 \times$.
Table 2.2 Performance summary and comparison of the standalone voltage doubler.
Table 2.3 Performance summary and comparison of the harvester.
Table 3.1 Performance summary and comparison.
Table 4.1 Comparison of the number of configurable ratios in rational and binary converters.
Table 4.2 Performance summary and comparison.
Table 5.1 Performance summary and comparison.
Table 6.1 ADC Performance summary and comparison.
Table 7.1 Performance summary.

## ABSTRACT

Although the improvement in circuit speed has been limited in recent years, there has been increased focus on the internet of things (IoT) as technology scaling has decreased circuit size, power usage and cost. This trend has led to the development of many small sensor systems with affordable costs and diverse functions, offering people convenient connection with and control over their surroundings. This dissertation discusses the major challenges and their solutions in realizing small IoT systems, focusing on non-digital blocks, such as power converters and analog sensing blocks, which have difficulty in following the traditional scaling trends of digital circuits.

To accommodate the limited energy storage and harvesting capacity of small IoT systems, this dissertation presents an energy harvester and voltage regulators with low quiescent power and good efficiency in ultra-low power ranges. Switched-capacitor-based converters with wide-range, energy-efficient voltage-controlled oscillators, assisted by power-efficient self-oscillating voltage doublers and new cascaded converter topologies that offer more conversion-ratio configurability, achieve efficient power conversion down to several nanowatts.

To further improve the power efficiency of these systems, analog circuits essential to most wireless IoT systems are also discussed and improved. A capacitance-to-digital sensor interface and a clocked comparator design are improved by their digital-like implementation and operation in the phase and frequency domains. Thanks to the removal of large passive elements and complex analog blocks, both designs achieve excellent area reduction while maintaining state-of-the-art energy efficiencies.

Finally, a technique for removing dynamic voltage and temperature variations is presented, as smaller circuits in advanced technologies are more vulnerable to these variations. A 2-D simultaneous feedback control using an on-chip oven control locks the supply voltage and temperature of a small on-chip domain and protects circuits in this locked domain from external voltage and temperature changes, demonstrating $0.0066 \mathrm{~V} / \mathrm{V}$ and $0.013{ }^{\circ} \mathrm{C} /{ }^{\circ} \mathrm{C}$ sensitivities to external changes. The simple digital implementation of the sensors and of most parts of the control loops allows robust operation within wide voltage and temperature ranges.

## CHAPTER 1

## Introduction

In recent years, the benefits of process scaling have become limited by side effects of scaling such as short-channel effects, leakage, and interconnect parasitics [1], [2]. This has slowed gate-delay scaling (Figure 1.1 [2]), which limits the speed improvement obtainable solely by porting existing circuits to a new technology, and has necessitated architectural changes to maintain the expected trend of overall performance improvement. Fortunately, the scaling trends of switching energy (Figure 1.1) and of area and cost per transistor (Figure 1.2) have continued as expected; hence, overall performance has been able to grow by integrating more functional blocks without increasing the overall area, cost and energy budgets.


Figure 1.1 Theoretical / actual scaling trend of delay and energy. [2]


Figure 1.2 "Cost/Area $\times$ Area/Transistor $=$ Cost/Transistor." [2], recited from [3]

Reflecting these trends, parallel computing has become widespread as an alternative way to continue improving performance while clock frequency improvement has slowed over the past decade, as shown in Figure 1.3 and Figure 1.4. Despite the limits on parallelism imposed by hardware design challenges and by the limited degree of parallelism in certain algorithms, parallel computing has seen significant success in certain useful algorithms and has opened a new application area of circuit design. For example, deep learning has recently received strong attention as a way to improve performance in machine learning and signal processing, with the help of new algorithmic and architectural approaches involving massive degrees of parallelism [4].


Figure 1.3 "Core counts on processors published at ISSCC." [1]


Figure 1.4 "Clock frequency scaling trends." [1]

On the other hand, the sustained cost reduction for the same function has enabled another direction for expanding the power of circuits: broadening their applications toward areas that were unrealistic in older process technologies because of cost or other resource requirements. Circuits and systems that do not necessarily need performance improvement can be made with lower area, cost and energy requirements, and can integrate more diversified functions that interact with the outside environment, a trend designated "More-than-Moore" in [5] (Figure 1.5).


Figure 1.5 "More than Moore" devices. [5]

Combined with the traditional processing performance improvement trends, this functional diversification trend can make a breakthrough in these new application areas and help people improve their quality of life, as depicted in Figure 1.6. One good example of these combined trends is the recent success of smart phones. The performance of a smart phone is not better than that of a personal computer or a laptop, but this new type of personal device has gained many new applications and changed our lives thanks to its much smaller size and better mobility. Always connected to both an end-user and the Internet, a smart phone builds an intimate connection between the user and the Internet. On top of that, embedded sensors in smart phones have extended the possible applications by processing, in the network, environmental data collected from sensors such as the camera, microphone and GPS.


Figure 1.6 Dual trend of "miniaturization of digital functions (More Moore)" and "functional diversification (More than Moore)". [5]

The concept of the Internet of Things (IoT) expands this approach [6], [7] by trying to connect a person not only to the Internet, but also to the objects surrounding him/her. The "objects" do not need to be restricted to certain types of electronic devices; rather, they can incorporate everything in our lives, so that people feel always connected and have control over everything around them, including home appliances, cars, or the environment itself, such as the temperature, background music or light intensity in a room.

To broaden the possible applications of the IoT, it is important to collect and process data from many different sources. A sensor system specializes in collecting environmental data and plays a key role in connecting the IoT network to the real world. Advanced process technology enables circuits with the same functionality to be implemented with smaller size and cost, and will allow many small sensor node systems to collect various environmental data at an affordable total cost, as expected in [7] (Figure 1.7). However, the reduced form factor of each sensor node raises additional challenges, as many of the key components in sensor systems do not follow the traditional scaling trend of digital circuits. This dissertation discusses these new challenges in non-digital blocks, such as power management and sensor interface blocks, and explores possible solutions.


Figure 1.7 "Sensors will populate the world of IoE" [7]

### 1.1 Powering Small IoT Systems

Recent advances in low-power circuits have enabled mm-scale wireless sensor node systems [8], [9], but supplying power to such small IoT systems is not simple. Wiring through a power adaptor and cables from a wall outlet significantly increases total system size and cost. Embedding a battery is not a good solution on its own because of its limited energy capacity at this small form factor. With the number of IoT devices per person expected to increase, short battery life would burden end-users with repeatedly charging too many devices. Energy harvesting is an attractive way to replenish the energy drawn from the battery and extend battery life up to semi-permanent operation without any manual recharge; combined with its expected low production cost, it can enable an end-user to "install and forget" such sensors. However, the same size limitation restricts the amount of harvested power, which can be as low as tens of nW for mm-scale photovoltaic cells in indoor conditions. Efficient DC-DC up-conversion at such low power levels (for battery charging) is challenging. To address this, this dissertation presents an energy harvester using a new switched-capacitor voltage-doubler circuit that maintains high power conversion efficiency over a very wide range of power levels.

After energy harvesting, the harvested energy stored in a battery has to be converted to the proper voltage level and supplied to each load circuit. In a small IoT system where the amount of stored energy is limited, a good power-management unit (PMU) for such power conversion and distribution is important for improving overall energy utilization efficiency. The system's small form factor and chip size render inductive power conversion unfavorable, but adopting switched-capacitor (SC) DC-DC converters raises a second issue in controlling the conversion ratio, because a multiple-ratio reconfigurable converter is not as efficient as a simple fixed-ratio converter. In addition, unlike inductive conversion, where single-inductor multiple-output converters have been presented, it is difficult to generate multiple output voltage levels using a single capacitive converter; this capability is important for optimizing energy consumption in many IoT systems where different blocks have different optimum operating voltages. In this dissertation, a fully integrated power management system that converts an input voltage within a $0.9 \mathrm{~V}-4 \mathrm{~V}$ range to 3 fixed output voltages, $0.6 \mathrm{~V}, 1.2 \mathrm{~V}$, and 3.3 V , is presented. It maintains converter efficiency by limiting the outputs to 3 fixed voltage levels, while still offering a choice of supply voltage so that a load circuit does not lose too much efficiency by operating far from its optimum level.

To improve on the number of configurable ratios and the conversion efficiency of binary-reconfigurable converters, a new reconfigurable SC DC-DC converter topology is presented in this dissertation that can be reconfigured to any rational conversion ratio $\mathrm{p} / \mathrm{q}$ with $0<\mathrm{p}<\mathrm{q} \leq 2^{\mathrm{N}}+1$. The key idea of the design, which we refer to as a rational DC-DC converter, is to incorporate negative voltage feedback into the cascaded converter stages using negative-generating converter stages ("voltage negators"); this enables reconfiguration of both the numerator p and the denominator q of the conversion ratio. With help from the current supply of the voltage negators, output conductance becomes comparable to that of conventional few-ratio SC DC-DC designs. Hence, the proposed design achieves a resolution higher than previous binary SC converters while maintaining the conversion efficiency of dedicated few-ratio SC converters.
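To make the size of this ratio set concrete, the sketch below enumerates the distinct values of p/q under the bound stated above and compares them with a binary grid k/2^N. It is only a counting illustration: the function names are hypothetical, the binary grid with 2^N - 1 ratios is an assumed reference model, and which ratios a particular chip actually exposes depends on its implementation.

```python
from fractions import Fraction


def rational_ratios(n):
    """Distinct ratios p/q with 0 < p < q <= 2**n + 1 (the bound quoted in the text)."""
    q_max = 2**n + 1
    # Fraction reduces to lowest terms, so 2/4 and 1/2 collapse to a single entry.
    return sorted({Fraction(p, q) for q in range(2, q_max + 1) for p in range(1, q)})


def binary_ratios(n):
    """Ratios k / 2**n with 0 < k < 2**n (assumed model of a binary converter)."""
    return [Fraction(k, 2**n) for k in range(1, 2**n)]


def smallest_ratio_at_least(target, ratios):
    """Smallest available ratio that is >= target, or None if there is none."""
    candidates = [r for r in ratios if r >= target]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"N={n}: binary={len(binary_ratios(n))}, rational={len(rational_ratios(n))}")

    # Hypothetical target Vout/Vin = 0.55, chosen only for illustration.
    target = Fraction(11, 20)
    print("binary pick:  ", smallest_ratio_at_least(target, binary_ratios(3)))
    print("rational pick:", smallest_ratio_at_least(target, rational_ratios(3)))
```

Because a finer ratio grid lets the selected ratio sit closer to the required Vout/Vin, the gap between the ideal and the required output voltage shrinks, which is the motivation for increasing the number of configurable ratios.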

### 1.2 Energy-Efficient Sensor and Data Converter

As the real world is analog, a sensor system that reads environmental variables usually requires an analog sensor interface and an analog-to-digital converter. Hence, in addition to power harvesting and management, a second important challenge in reducing the size of a sensor system comes from these analog sensing parts. While most digital circuits can be scaled relatively easily with new process technology, analog circuits in many cases cannot [1], [4]. Because of the thermal noise limit of analog circuits, their current consumption is difficult to reduce. Supply voltage reduction is also limited by the need to maintain an optimum overdrive voltage at each transistor and enough signal swing on each signal path. Short-channel effects reduce the gain of each amplifier stage, making it difficult to obtain the desired signal gain while maintaining loop stability. In addition, the size of analog circuits is more difficult to scale than that of digital circuits, especially when they require passive elements such as resistors or capacitors.

This dissertation presents two related works in advanced processes, where the difficulty of analog circuit scaling is overcome by switching the structure and operation principle from traditional analog to a more digital approach. In these works, analog signals that have been traditionally represented in terms of voltage or current are represented as frequency, count or phase instead, which helps extend the signal swing and also facilitates digital processing downstream. Furthermore, this extended signal swing in phase domain can remove area-consuming passive elements such as load capacitors and biasing resistors, enabling significant area reduction.

First, a fully-digital capacitance-to-digital converter (CDC) with a new iterative delay-chain discharge scheme is presented, where a ring oscillator is used to discharge current from the sensed capacitor. By using a simple ring oscillator to discharge or transfer charge from the sensed capacitor instead of complex current sources or switched-capacitor circuits, this circuit enables a simple, fully digital conversion scheme that is inherently linear over a wide range.

Second, a new energy-efficient ring oscillator collapse-based comparator, which we refer to as an edge-pursuit comparator (EPC), is presented. With the help of limitless phase integration, this comparator automatically adjusts its performance by changing the comparison energy according to its input difference without any control, eliminating unnecessary energy spent on coarse comparisons. Furthermore, a detailed analysis of the EPC in the phase domain shows improved energy efficiency over conventional comparators even without the automatic comparison energy scaling, and wider resolution tuning capability with small load capacitance and area. The EPC is used in a SAR ADC design, which supplements a 10-bit differential coarse CDAC with a 5-bit common-mode CDAC. This offers an additional 5 bits of resolution with common-mode to differential gain tuning that improves linearity by reducing the effect of switch parasitic capacitance.

### 1.3 Improving Sensor Accuracy and Variation

As discussed above, challenges in implementing small sensor systems can be addressed by the proposed approaches so that a sensor system can efficiently distribute its harvested energy to operate an energy-efficient sensor that reads environmental variables. However, one challenge remains unresolved: sensor offset due to process variation and environmental factors. Static offset can be easily removed by 1-point calibration, but the amount of offset is affected by different types of process, voltage and temperature (PVT) variations, so it changes constantly and degrades the accuracy of the sensor readout. As the sensor becomes smaller, the effect of these variations can become more serious, because local process variation and voltage / temperature fluctuations are no longer averaged out over the sensor area when the sensor is too small.

To help overcome this challenge, an on-chip voltage / temperature locking circuit is proposed in this dissertation to remove offset variation caused by voltage and temperature changes. Rather than compensating for temperature and voltage variation, variation-sensitive circuits such as reference generators or high-fidelity analog circuits can operate in this locked domain to reduce their variation across voltage and temperature. Using the references in this locked domain as standard references, other on-chip circuits can be repeatedly calibrated to maintain accuracy when their environment changes. Although on-chip heating consumes relatively large power, the fast locking speed of local heating limits the energy consumed per calibration, and calibrations need to be performed only periodically.

### 1.4 Outline of the Dissertation

This dissertation proposes circuit techniques to solve several major problems in the implementation of small sensor systems. The limited energy available in a small form factor is addressed by energy harvesting and by improving the energy utilization efficiency of power management units and of the sensors themselves. The difficulties of analog circuit scaling are overcome by more digital-like implementations. The remaining offset issue is addressed by voltage / temperature locking for variation removal.

Chapter 2 presents an ultra-low power energy harvesting circuit that can harvest from a power source as low as 3 nW, which corresponds to the output power of a $1 \mathrm{~mm}^{2}$ solar cell under dim room light. Chapter 3 presents a power management system that maintains $>60 \%$ conversion efficiency within a very wide output power range of 20 nW to 0.5 mW, covering almost the entire operating range of a sensor system. Chapter 4 presents a new reconfigurable converter topology for better conversion efficiency and output conductance. As a result, a test chip implemented in $0.18 \mu \mathrm{~m}$ CMOS offers 79 conversion ratios using only 3 cascaded converter stages and 2 voltage negator stages, and achieves $>90 \%$ efficiency when downconverting from 2 V to a 1.1-to-1.86 V output voltage range.

Chapter 5 presents a capacitance-to-digital converter in which most analog operations are implemented in a digital manner. As a result, a test chip fabricated in 40 nm CMOS performs conversion across a very wide capacitance range of 0.7 pF to over 10 nF with $<0.06 \%$ linearity error, while showing 35.1 pJ conversion energy and $141 \mathrm{fJ} / \mathrm{c}$-s FoM with 11.3 pF input capacitance, which marks the lowest conversion energy and FoM reported. Chapter 6 presents a SAR ADC using a new, energy-efficient clocked comparator based on oscillator collapse. A test chip fabricated in 40 nm CMOS shows 74.12 dB SNDR and 173.4 dB FOMs. With the full ADC consuming $1.17 \mu \mathrm{W}$, the comparator consumes 104 nW, which is only $8.9 \%$ of the full ADC power, demonstrating the comparator's energy efficiency.
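The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the comparator's share of the ADC power and, as a purely illustrative step, derives the effective number of bits implied by the CDC numbers. The second calculation assumes the common figure-of-merit convention FoM = (conversion energy) / 2^ENOB, which is not restated in this chapter; the function names are hypothetical.

```python
import math


def comparator_share(comparator_power_w, adc_power_w):
    """Fraction of the total ADC power consumed by the comparator."""
    return comparator_power_w / adc_power_w


def implied_enob(conversion_energy_j, fom_j_per_step):
    """ENOB implied by FoM = E_conv / 2**ENOB (assumed convention, see lead-in)."""
    return math.log2(conversion_energy_j / fom_j_per_step)


if __name__ == "__main__":
    # Values quoted in the text: 104 nW comparator, 1.17 uW full ADC.
    share = comparator_share(104e-9, 1.17e-6)
    print(f"comparator share of ADC power: {share * 100:.1f} %")

    # Values quoted in the text: 35.1 pJ per conversion, 141 fJ/c-s FoM.
    enob = implied_enob(35.1e-12, 141e-15)
    print(f"implied ENOB under the assumed FoM definition: {enob:.2f} bits")
```

The first result reproduces the quoted 8.9 % share; the second is only as meaningful as the assumed FoM definition.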

Chapter 7 presents an on-chip variation removal circuit. By using on-chip local oven control, both the voltage and the temperature of a local domain are locked at constant levels, and variation from voltage and temperature changes is removed in the domain. A test chip fabricated in a 14 nm FinFET process shows $0.0066 \mathrm{~V} / \mathrm{V}$ voltage sensitivity and $0.013{ }^{\circ} \mathrm{C} /{ }^{\circ} \mathrm{C}$ ambient temperature sensitivity, with the accompanying heater consuming $\sim 2 \mathrm{~mW} /{ }^{\circ} \mathrm{C}$.

Finally, Chapter 8 concludes the dissertation by summarizing the contributions and discussing several possible future research directions. Related publications from the author are listed below.

## CHAPTER 2

## Ultra-Low Power Energy Harvester

### 2.1 Introduction

Energy harvesting is an attractive way to power such systems due to the limited energy capacity of batteries at these form factors. However, the same size limitation restricts the amount of harvested power, which can be as low as tens of nW for mm-scale photovoltaic cells in indoor conditions. Efficient DC-DC up-conversion at such low power levels (for battery charging) is extremely challenging and has not yet been demonstrated.

Boost DC-DC converters are widely used to harvest energy from DC sources and yield high conversion efficiency [10]-[13]. However, they require a large off-chip inductor at low harvested power levels, increasing system size. Alternatively, switched-capacitor (SC) DC-DC converters can be fully integrated on-chip and are favored for form-factor constrained applications [14]-[21]. At low power levels, SC converter efficiency is constrained by the overheads of clock generation and level-conversion to drive the switches. As a result, efficient SC converter operation has been limited to the $\mu \mathrm{W}$ range.

This dissertation presents a fully integrated switched-capacitor energy harvester that consists of cascaded self-oscillating voltage doublers. In each voltage doubler, an oscillator is completely internalized and clocking power overhead is reduced. The reduced power overhead of both clock generation and level shifting enables the harvester to operate with very weak power sources, as low as a few nW. Because clock generation is completely integrated in the SC converter, the overhead scales with the load current, resulting in a very wide load range of $\sim 1000 \times$. By adjusting the number of cascaded voltage doublers, together with a new method of modulating the low voltage applied to each doubler stage, the overall conversion ratio can be configured between $9 \times$ and $23 \times$.

### 2.2 Self-Oscillating Voltage Doubler

### 2.2.1 Motivation and Basic Structure

As shown in Figure 2.1, conventional SC DC-DC voltage doublers generally consist of three parts: clock generator, level shifter and switched capacitor network (SCN). The clock generator produces a clock, which is fed into the level shifters. The level shifters take the clock and create switch control signals for the SCN. As the clock oscillates, the SCN periodically changes its connections to generate the output voltage. Each of these blocks introduces power overhead, reducing efficiency. Looking at each transistor in the complete converter circuit, the dynamic power consumption of SCN switches directly contributes to generating output power, whereas the clock generator and level shifter power consumption does not contribute to output power. As a result, the basic motivation of the proposed self-oscillating voltage doubler is to remove the unnecessary power consumption of those secondary blocks and transistors.

Figure 2.2 shows the basic structure of the self-oscillating voltage doubler. It consists of two stacked ring oscillators with output nodes of corresponding stages connected through flying caps ( $\mathrm{C}_{1} \sim \mathrm{C}_{\mathrm{N}}$ ). In each stage, inverters from the top and bottom ring either charge or discharge the flying cap, thereby transferring power to the upper ring. Simultaneously, the inverters drive the next stage in their ring, creating a multi-phase DC-DC converter with overlapping charge/discharge phases and self-sustaining operation. Every transistor in this structure is essentially a flying cap switch and hence dynamic power loss is minimized since there are no superfluous transistors. The natural multi-phase operation reduces output voltage ripple with little cost.


Figure 2.1 Structure of a conventional capacitive voltage doubler.


Figure 2.2 Basic structure of the proposed self-oscillating voltage doubler.

Another advantage of this structure lies in reduced level shifting overhead. Conventional level shifters generally use output keepers, which generate contention loss in addition to dynamic power loss. This contention loss comes from the timing mismatch among the signals of a level shifter; depending on the amount of mismatch, contention loss can dominate dynamic power consumption and greatly reduce overall efficiency. Several previous SC voltage converters have used nonoverlapping clocks to reduce level shifting contention loss [17]-[19]. However, this introduces another overhead, i.e., generation of the nonoverlapping clocks. Additionally, such a converter does not actively convert power during the nonoverlapping periods, reducing its maximum output power.

The self-oscillating voltage doubler has no dedicated level shifter because both ring oscillators actively generate their own clock signals. However, contention loss can still arise from phase mismatch between the two oscillations. This is mitigated by the fact that the two oscillators are synchronized at every stage and hence the amount of mismatch is very small, avoiding the need for nonoverlapping clocks. According to simulation results, phase mismatch is less than $1 \%$ of a fanout-of-4 (FO4) inverter delay, and contention loss from this mismatch is also under $1 \%$ of total dynamic power loss.

The self-oscillating voltage doubler is capable of self-startup regardless of its initial state. When $\mathrm{V}_{\text {IN }}$ is initially supplied to $\mathrm{V}_{\text {MED }}$, the bottom oscillator starts oscillating. In each SCN stage of the doubler, both the nodes before and after the flying cap driver are coupled between the top and bottom oscillators. Therefore, even when $\mathrm{V}_{\text {HIGH }}$ is very low and the top oscillator is not oscillating by itself, the coupled nodes in the top oscillator will be rising and falling, and hence electrical charge is transferred to $\mathrm{V}_{\text {HIGH }}$ solely due to the driving capability of the bottom oscillator. Due to this fluctuation of the top nodes, $\mathrm{V}_{\text {HIGH }}$ can rise above the average voltage level of the top nodes. As $\mathrm{V}_{\text {HIGH }}$ becomes higher, the average level of the top nodes also increases, forming a positive feedback loop that raises $\mathrm{V}_{\text {HIGH }}$ above $\mathrm{V}_{\text {MED }}$. As $\mathrm{V}_{\text {HIGH }}$ rises higher than $\mathrm{V}_{\mathrm{MED}}$, the top oscillator starts normal oscillation on its own. Because the top oscillator is initially much weaker than the bottom, the top oscillation is naturally synchronized to the bottom oscillator. After synchronization, the voltage doubler starts normal operation, continually generating output power.

### 2.2.2 Modulation Scheme for Optimum Conversion Efficiency

The self-oscillating voltage doubler is modulated to maintain optimum conversion efficiency over a wide range of output power levels. The specific goal of the modulation is to balance conduction and switching losses by examining the ratio of output to input voltages ( $\mathrm{R}_{\text {DIV }}=\mathrm{V}_{\text {OUT }} / \mathrm{V}_{\text {IN }}$ ). A low $\mathrm{R}_{\text {DIV }}$ indicates a large voltage across the switches and dominant conduction loss. Conversely, a high $\mathrm{R}_{\text {DIV }}$ indicates low conduction loss (zero as $\mathrm{R}_{\text {DIV }} \rightarrow 2$ ) and more dominant switching losses due to the higher frequency needed to transfer the same amount of load current.

To find optimum $R_{\text {DIV }}$, we first define $\mathrm{C}_{\text {FLY }}$ as the total amount of flying cap, f as the oscillation frequency, and $\Delta$ as the amount of voltage drop:

$$
\begin{equation*}
\Delta=2 V_{I N}-V_{O U T} . \tag{2.1}
\end{equation*}
$$

The voltage doubler operates in a multi-phase manner with low ripple, and hence $\mathrm{V}_{\text {OUT }}$ is assumed to be constant in this analysis. In this case the input power to the voltage doubler $\mathrm{P}_{\text {IN }}$ can be approximately written as

$$
\begin{equation*}
P_{I N}=2 C_{F L Y} V_{I N} \Delta f, \tag{2.2}
\end{equation*}
$$

by additionally assuming that $\Delta \ll \mathrm{V}_{\text {IN }}$ and that the top and the bottom oscillators have similar total parasitic capacitances. With these additional assumptions, the active current flowing from $\mathrm{V}_{\text {HIGH }}$ to $\mathrm{V}_{\text {MED }}$ through the top oscillator is nearly reused as the active current flowing from $\mathrm{V}_{\text {MED }}$ into $\mathrm{V}_{\text {LOW }}$ through the bottom oscillator. Therefore, only a small portion of the total parasitic effect, or switching loss, is actually incorporated into the true input power, and hence the approximate equation is relatively accurate. Simulation results also support the existence of this current reuse and the $\mathrm{P}_{\text {IN }}$ approximation. For example, in a simulation with $\Delta=0.2 \mathrm{~V}_{\text {IN }}$, the true input power differs from $\mathrm{P}_{\text {IN }}$ in Equation (2.2) by less than $15 \%$ of the total switching loss.

Conduction loss $\mathrm{L}_{\mathrm{C}}$ comes from the effective internal resistances of the voltage converter. Assuming DC at the power rails, this loss is the same as the loss from charge sharing, and can be written as

$$
\begin{equation*}
L_{C}=C_{F L Y} \Delta^{2} f . \tag{2.3}
\end{equation*}
$$

Switching loss $L_{S}$ is the total dynamic power loss in the voltage doubler:

$$
\begin{equation*}
L_{S}=\left(\sum_{\text {non-flying }} \alpha_{i} C_{i} V_{S W I N G_{i}}^{2}\right) f=C_{E F F} V_{I N}^{2} f \tag{2.4}
\end{equation*}
$$

where $\mathrm{C}_{\mathrm{i}}$ is every non-flying capacitor including parasitic capacitance, and $\mathrm{V}_{\text {SWING }}$ and $\alpha$ are the voltage swing and activity factor of each non-flying capacitor, respectively. $\mathrm{C}_{\text {EFF }}$ is defined as

$$
\begin{equation*}
C_{E F F}=\sum_{n o n-f l y i n g} \alpha_{i} C_{i} \frac{V_{S W I N G_{i}}^{2}}{V_{I N}^{2}} \cong \sum_{n o n-f l y i n g} C_{i} \tag{2.5}
\end{equation*}
$$

and is independent of the oscillation frequency. This value depends on $\Delta$ because the $\mathrm{V}_{\text {SWING }}$ of the top oscillator nodes depends on $\Delta$; however, it is fairly constant for $\Delta \ll \mathrm{V}_{\text {IN }}$.

The ratio of these losses to input power can then be written as:

$$
\begin{equation*}
\frac{L_{C}}{P_{I N}}=\frac{C_{F L Y} \Delta^{2} f}{2 C_{F L Y} V_{I N} \Delta f}=\frac{\Delta}{2 V_{I N}} \tag{2.6}
\end{equation*}
$$

and

$$
\begin{equation*}
\frac{L_{S}}{P_{I N}}=\frac{C_{E F F} V_{I N}^{2} f}{2 C_{F L Y} V_{I N} \Delta f}=\frac{C_{E F F} V_{I N}}{2 C_{F L Y} \Delta} . \tag{2.7}
\end{equation*}
$$

These two ratios are clear functions of $\Delta$. Assuming $\Delta \ll \mathrm{V}_{\text {IN }}$ and neglecting the weaker dependency of $\mathrm{C}_{\mathrm{EFF}}$ on $\Delta$, the inequality of arithmetic and geometric means:

$$
\begin{equation*}
\frac{x+y}{2} \geq \sqrt{x y} \tag{2.8}
\end{equation*}
$$

can be applied as illustrated in Figure 2.3, to obtain the lower bound of total loss ratio:

$$
\begin{equation*}
\frac{L_{T O T A L}}{P_{I N}}=\frac{L_{C}+L_{S}}{P_{I N}}=\frac{\Delta}{2 V_{I N}}+\frac{C_{E F F} V_{I N}}{2 C_{F L Y} \Delta} \geq \sqrt{\frac{\Delta}{V_{I N}} \times \frac{C_{E F F} V_{I N}}{C_{F L Y} \Delta}}=\sqrt{\frac{C_{E F F}}{C_{F L Y}}} . \tag{2.9}
\end{equation*}
$$



Figure 2.3 Rough dependency of voltage doubler loss elements on $\Delta$.

Therefore, maximum efficiency $\eta_{M A X}$ is

$$
\begin{equation*}
\eta_{M A X}=1-\left(\frac{L_{T O T A L}}{P_{I N}}\right)_{M I N}=1-\sqrt{\frac{C_{E F F}}{C_{F L Y}}} \tag{2.10}
\end{equation*}
$$

when the following equality condition is satisfied:

$$
\begin{equation*}
\frac{\Delta}{2 V_{I N}}=\frac{C_{E F F} V_{I N}}{2 C_{F L Y} \Delta}, \tag{2.11}
\end{equation*}
$$

put differently:

$$
\begin{equation*}
\frac{\Delta}{V_{I N}}=\sqrt{\frac{C_{E F F}}{C_{F L Y}}}=\left(\frac{L_{T O T A L}}{P_{I N}}\right)_{M I N} \tag{2.12}
\end{equation*}
$$

or

$$
\begin{equation*}
R_{D I V}=\frac{V_{O U T}}{V_{I N}}=2-\frac{\Delta}{V_{I N}}=2-\sqrt{\frac{C_{E F F}}{C_{F L Y}}}=1+\eta_{M A X} \tag{2.13}
\end{equation*}
$$

Therefore, as long as the circuit operates properly and these two losses are dominant, its optimum efficiency is nearly a constant value that is determined by the ratio of total parasitic capacitances to the total flying capacitances $\mathrm{C}_{\text {FLY }}$, and $\mathrm{R}_{\text {DIV }}$ at optimum efficiency is also a constant.
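The optimum derived above can be checked numerically. The following is a minimal sketch, not part of the original design flow: it evaluates the loss ratios of Equations (2.6) and (2.7) and compares a brute-force sweep over $\Delta / V_{IN}$ with the closed-form results of Equations (2.10), (2.12), and (2.13). The function names and the example ratio $C_{EFF} / C_{FLY}=0.04$ are hypothetical values chosen only for illustration.

```python
import math


def loss_ratios(delta_over_vin, ceff_over_cfly):
    """Return (conduction, switching) loss ratios from Eqs. (2.6) and (2.7)."""
    conduction = delta_over_vin / 2.0
    switching = ceff_over_cfly / (2.0 * delta_over_vin)
    return conduction, switching


def optimum(ceff_over_cfly):
    """Closed-form optimum point from Eqs. (2.10), (2.12) and (2.13)."""
    d_opt = math.sqrt(ceff_over_cfly)
    return {"delta_over_vin": d_opt, "eta_max": 1.0 - d_opt, "r_div": 2.0 - d_opt}


def sweep_minimum(ceff_over_cfly, points=2000):
    """Brute-force search for the Delta/V_IN that minimizes the total loss ratio."""
    best_d, best_loss = None, float("inf")
    for k in range(1, points + 1):
        d = k / points  # Delta/V_IN swept over (0, 1]
        total = sum(loss_ratios(d, ceff_over_cfly))
        if total < best_loss:
            best_d, best_loss = d, total
    return best_d, best_loss


ratio = 0.04  # hypothetical C_EFF / C_FLY, for illustration only
print(optimum(ratio))
print(sweep_minimum(ratio))
```

With this example ratio, the sweep and the closed form should agree on $\Delta / V_{IN} \approx 0.2$, which is at the edge of the $\Delta \ll V_{IN}$ assumption, so the result is indicative rather than exact.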


Figure 2.4 Leakage loss model of the voltage doubler.

As output power becomes smaller, leakage power loss becomes dominant over the conduction and switching losses. Leakage loss can be modeled as a constant current sink attached to the output node, as shown in Figure 2.4. In simulation, the equivalent leakage current, $\mathrm{I}_{\text {LEAK }}$, varies by less than $8 \%$ across a wide output voltage range $\left(\mathrm{V}_{\text {IN }}<\mathrm{V}_{\text {OUT }}<2 \times \mathrm{V}_{\text {IN }}\right)$. In this model, overall conversion efficiency is

$$
\begin{equation*}
\eta_{\text {overall }}=\eta_{\text {without-leakage }} \times \frac{I_{L O A D}}{I_{L O A D}+I_{L E A K}} \tag{2.14}
\end{equation*}
$$

and is optimized with the same arguments as a voltage doubler with no leakage, if the load can be approximately considered as a constant current sink. Therefore, even when output power is very small, the optimum efficiency point is still at a similar condition to (2.13), namely:

$$
\begin{equation*}
R_{D I V} \cong 2-\sqrt{\frac{C_{E F F}}{C_{F L Y}}} \tag{2.15}
\end{equation*}
$$
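As a small illustration of the leakage model in Equation (2.14), the sketch below evaluates the overall efficiency over several load currents. The leakage current of 100 pA and the leakage-free efficiency of 0.73 are assumed placeholder values, not measured data.

```python
def overall_efficiency(i_load, i_leak, eta_no_leak):
    """Eq. (2.14): leakage modeled as a constant current sink at the output."""
    if i_load < 0 or i_leak < 0:
        raise ValueError("currents must be non-negative")
    if i_load == 0:
        return 0.0  # no useful output current is delivered
    return eta_no_leak * i_load / (i_load + i_leak)


# Hypothetical numbers: 100 pA equivalent leakage, 73% leakage-free efficiency.
for i_load in (1e-10, 1e-9, 1e-8, 1e-7):
    eta = overall_efficiency(i_load, i_leak=1e-10, eta_no_leak=0.73)
    print(f"I_LOAD = {i_load:.0e} A -> efficiency = {eta:.3f}")
```

The leakage factor only scales the leakage-free efficiency, which is why the optimum $\mathrm{R}_{\text {DIV }}$ is unchanged under the constant-current-load assumption.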

In this work, voltage doubler oscillation frequency is modulated to achieve optimum $\mathrm{R}_{\text {DIV }}$. Delay blocks are inserted in the oscillation paths and their delay is controlled by an analog delay tuning voltage, $\mathrm{V}_{\mathrm{CTR}}$ (Figure 2.5). Negative feedback control of $\mathrm{V}_{\mathrm{CTR}}$ adjusts the output voltage level to the desired optimum level.

Instead of frequency modulation, a block enabling scheme is another candidate approach to use the proposed design in a high performance setting with higher power demands. In this scheme, several independent voltage doubler blocks that share the same input and output ports are prepared, with each block capable of being turned on/off independently. According to the desired output power level, the number of turned-on blocks is adjusted to keep the optimum output to input voltage ratio. This scheme does not require any delay elements in the oscillation paths, eliminating efficiency loss from delay elements. To match time constants for charging/discharging flying caps to the oscillation period, the ring structure can be lengthened (i.e., more stages) to match its open-loop clock signal path effort to each stage effort for charging/discharging a flying capacitor. In this scheme, the coarser granularity of control relative to frequency modulation reduces efficiency when output power is lower than the optimal output power of a unit voltage doubler block. The block enabling scheme also requires more transistors and flying capacitors, increasing area. To focus on the ultra-low power design space, this work adopts the frequency modulation scheme.

### 2.2.3 Circuit Implementation



Figure 2.5 Implementation of the voltage doubler with frequency modulation.

Figure 2.5 shows the detailed implementation of the voltage doubler with frequency modulation. To modulate oscillation frequency, delay blocks are inserted in the oscillation paths. As shown in Figure 2.6, a delay block consists of two coupled leakage-based delay elements [8] and a pass transistor $\mathrm{T}_{\mathrm{P}}$ controlled by $\mathrm{V}_{\mathrm{CTR}}$. When the inputs $\mathrm{H}_{\mathrm{I}}$ and $\mathrm{L}_{\mathrm{I}}$ of a stage switch from high to low, output nodes $\mathrm{H}_{\mathrm{OD}}$ and $\mathrm{L}_{\mathrm{OD}}$ (driven low) become isolated. $\mathrm{T}_{\mathrm{P}}$ then provides a leakage path from $\mathrm{L}_{\mathrm{O}}$ to $\mathrm{L}_{\mathrm{OD}}$ that slowly raises $\mathrm{L}_{\mathrm{OD}}$ and, through $\mathrm{C}_{\mathrm{C}}$, also $\mathrm{H}_{\mathrm{OD}}$. Back-to-back inverters in the delay element provide positive feedback and amplify the transition once it reaches $\mathrm{V}_{\mathrm{TH}}$, creating a sharp edge. This transition is then passed to the next stage. The opposite transition functions similarly.


Figure 2.6 Detailed implementation (left) and timing diagram (right) of the delay block.

A higher $\mathrm{V}_{\mathrm{CTR}}$ allows $\mathrm{T}_{\mathrm{P}}$ to provide more leakage, reducing the delay and speeding the oscillation. The leakage through $\mathrm{T}_{\mathrm{P}}$ can be adjusted to any amount between its on and off currents, offering a very wide range of delay controllability. Additionally, due to the output isolation, the structure can produce very long, synchronized delays while the coupled positive feedback creates a sharp edge that limits short-circuit current and contention loss, enabling ultra-low power operation with very slow oscillation speed.

This structure also has an advantage for low-power self-startup and idle power minimization. It can oscillate even when the control voltage is 0, though very slowly, and is therefore capable of self-startup. When the input voltage becomes available from a cold state, $\mathrm{V}_{\text {CTR }}$ rises from zero, speeding up the oscillation until it reaches the optimum. Start-up energy is reduced because the oscillation starts from its slowest speed, minimizing dynamic energy loss during start-up. When no input power is available from the power source, $\mathrm{V}_{\text {IN }}$ always becomes lower than $\mathrm{V}_{\text {DIV }}$, pulling the control voltage $\mathrm{V}_{\mathrm{CTR}}$ down to its lowest possible value. This automatically minimizes the idle power consumption.


Figure 2.7 Detailed implementation of the voltage divider (left) and the charge pump (right) from Figure 2.5.

$\mathrm{V}_{\mathrm{CTR}}$ is adjusted through negative feedback. A clocked comparator, operating at a fraction of the internal oscillator frequency, compares a divided version of the output voltage ( $\mathrm{V}_{\text {DIV }}=\mathrm{V}_{\text {OUT }} / \mathrm{R}_{\mathrm{DIV\_DESIRED}}$ ) with the input voltage $\mathrm{V}_{\text {IN }}$. A charge pump then takes in the corresponding pull-up/pull-down signals and adjusts the delay tuning voltage $\mathrm{V}_{\mathrm{CTR}}$ as needed to either speed or slow the oscillation. As shown in Figure 2.7, the voltage divider is implemented with a combination of a diode stack and a capacitive divider, to provide both fast response and good low-frequency behavior. In the charge pump (Figure 2.7, right), two input inverter chains with small capacitive loads, $\mathrm{C}_{\text {STEP }}$, set the amount of charge transferred per cycle to approximately $\mathrm{V}_{\mathrm{DD}} \times \mathrm{C}_{\text {STEP }}$. Each chain also generates a short pulse at an output isolation transistor, turning it on briefly and only while the mirrored current flows through. The isolation transistors are turned off otherwise and help sustain the output voltage more than 1000 times longer in simulation than without isolation, even when clock frequency is as low as a few Hz.
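The behavior of this feedback loop can be illustrated with a simple behavioral model. The sketch below is an assumption-laden illustration, not the implemented circuit: it treats the comparator and charge pump as a bang-bang controller with a fixed step in $\mathrm{V}_{\mathrm{CTR}}$, uses a hypothetical exponential mapping from $\mathrm{V}_{\mathrm{CTR}}$ to oscillation frequency, and uses the droop relation $I_{LOAD}=C_{FLY} \Delta f$, which follows from Equations (2.1) to (2.3) when switching and leakage losses are neglected. The default input voltage, flying capacitance, divider ratio, and frequency range are taken from the standalone doubler values reported in this chapter. The step size, the control voltage range, and the load current are arbitrary.

```python
def simulate_loop(v_in=1.2, r_div_desired=1.73, i_load=1e-6,
                  c_fly=54e-12, f_min=70.0, f_max=19e6,
                  v_ctr_max=1.2, v_step=0.005, cycles=2000):
    """Bang-bang model of the V_CTR loop; returns (frequency, V_OUT) per comparison."""
    v_ctr = 0.0  # cold start: slowest oscillation
    history = []
    for _ in range(cycles):
        # Hypothetical exponential map from V_CTR to oscillation frequency.
        freq = f_min * (f_max / f_min) ** (v_ctr / v_ctr_max)
        # Assumed droop model I_LOAD = C_FLY * delta * f, crudely clamped
        # so that V_OUT never falls below V_IN.
        delta = min(i_load / (c_fly * freq), v_in)
        v_out = 2.0 * v_in - delta
        v_div = v_out / r_div_desired
        if v_div < v_in:
            v_ctr = min(v_ctr + v_step, v_ctr_max)  # output too low: speed up
        else:
            v_ctr = max(v_ctr - v_step, 0.0)        # output high enough: slow down
        history.append((freq, v_out))
    return history


freq, v_out = simulate_loop()[-1]
print(f"f = {freq:.0f} Hz, V_OUT/V_IN = {v_out / 1.2:.3f}")
```

In this model the loop settles into a small limit cycle around the target ratio, with a ripple set by the assumed step size. The model does not capture comparator timing or charge pump dynamics.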

### 2.3 Energy Harvester

### 2.3.1 Overall Structure



Figure 2.8 Overall energy harvester architecture.

Figure 2.8 shows the block diagram of the complete harvesting system, consisting of 4 stages of cascaded voltage doublers, a negative voltage generator, and circuits for conversion ratio control. A negative voltage is used to boost the overall conversion ratio over $16 \times$ and to power control circuits. The negative voltage generator is implemented by connecting $\mathrm{V}_{\text {HIGH }}$ and $\mathrm{V}_{\text {MED }}$ of the doubler to $\mathrm{V}_{\text {IN }}$ and ground, respectively, resulting in $\mathrm{V}_{\text {NEG }} \approx-\mathrm{V}_{\text {IN }}$ at the $\mathrm{V}_{\text {LOW }}$ port of the doubler. The target $\mathrm{R}_{\text {DIV }}$ of each voltage doubler is adjusted for its optimal operation.


Figure 2.9 5-stage bootstrapped ring oscillator for voltage doublers with lower $\mathrm{V}_{\mathrm{TH}}$ switches and its timing diagram (top right).

To facilitate energy harvesting from a low voltage source (e.g., a photovoltaic cell under low light), the first stage and negative voltage generator use low $\mathrm{V}_{\mathrm{TH}}(\sim 300 \mathrm{mV})$ devices for their flying cap drivers. Bootstrapping is also used with these low $\mathrm{V}_{\mathrm{TH}}$ switches, as shown in Figure 2.9, to improve the $\mathrm{I}_{\mathrm{ON}} / \mathrm{I}_{\mathrm{OFF}}$ ratio at low input voltages. To ensure the bootstrapped signal does not decay in a clock cycle, every transistor in the bootstrap circuit uses a regular threshold voltage. For robust bootstrapping with a fast oscillation frequency, a reset switch for each bootstrap capacitor is driven by the output $\Phi_{\mathrm{I}}$, which has an increased voltage swing. To eliminate the short-circuit path through the reset switches, an isolation transistor is inserted in each reset path, which is driven by $\Phi_{\mathrm{I}-2}$, the output signal of one of the previous bootstrap stages. Thick oxide I/O devices are used in the final doubler stage to protect the circuit from high voltages used to charge energy storage devices such as batteries or supercapacitors.

### 2.3.2 Conversion ratio modulation

The conversion ratio is adjusted by changing the number of cascaded stages. We propose an additional adjustment scheme where the $\mathrm{V}_{\text {LOW }}$ of a doubler is switched among $\mathrm{V}_{\mathrm{IN}}$, GND, and $\mathrm{V}_{\mathrm{NEG}}$, as shown in Figure 2.8. If $\mathrm{V}_{\text {LOW }}$ is set to $-\mathrm{V}_{\mathrm{IN}}$, the voltage across the flying cap increases, resulting in $\mathrm{V}_{\text {OUT }}=\left(\mathrm{V}_{\text {MED }}+\mathrm{V}_{\text {IN }}\right) \times 2-\mathrm{V}_{\text {IN }}=2 \times \mathrm{V}_{\text {MED }}+\mathrm{V}_{\text {IN }}$. If $\mathrm{V}_{\text {LOW }}$ is set to ground for all 4 cascaded stages, the overall conversion ratio is $16 \times$. However, if the final stage $\mathrm{V}_{\text {LOW }}$ is set to $\mathrm{V}_{\mathrm{NEG}}$, the overall conversion ratio increases by $1 \times$ to become $17 \times$. Similarly, setting the third stage $\mathrm{V}_{\text {LOW }}$ to $\mathrm{V}_{\text {NEG }}$ raises voltage $\mathrm{V}_{\mathrm{C}}$ by $\sim \mathrm{V}_{\mathrm{IN}}$, resulting in an increase of the overall conversion ratio by $2 \times$. On the other hand, setting $\mathrm{V}_{\text {LOW }}$ to $\mathrm{V}_{\text {IN }}$ decreases the conversion ratio. In this way the conversion ratio is controlled in a binary manner as shown in Table 2.1, generating any integer ratio from $9 \times$ to $23 \times$. By changing the conversion ratio, the harvester input voltage $\mathrm{V}_{\text {IN }}$ can be adjusted to closely approximate the maximum-power point of the power source, thereby optimizing the power harvested from the source. By selecting the bottom voltage from among three choices rather than just two, the overall conversion ratio range is greater and the voltage across each doubler can also be chosen properly for best operation. For example, the switch mapping shown in Table 2.1 first seeks to develop a larger voltage across the second doubler, since its use of standard $\mathrm{V}_{\text {TH }}$ transistors, coupled with its lower amplitude (relative to later stages), makes its operation more challenging.

Table 2.1 Switch mapping for harvester's overall conversion ratio control from $9 \times$ to $23 \times$.

| Ratio | $9 \times$ | $10 \times$ | $11 \times$ | $12 \times$ | $13 \times$ | $14 \times$ | $15 \times$ | $16 \times$ | $17 \times$ | $18 \times$ | $19 \times$ | $20 \times$ | $21 \times$ | $22 \times$ | $23 \times$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Bypass | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| $\mathrm{V}_{\mathrm{L} 2}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | GND | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ |
| $\mathrm{V}_{\mathrm{L} 3}$ | $\mathrm{V}_{\text {IN }}$ | $\mathrm{V}_{\text {IN }}$ | GND | GND | GND | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ | GND | $\mathrm{V}_{\text {IN }}$ | $\mathrm{V}_{\text {IN }}$ | GND | GND | GND | $\mathrm{V}_{\text {NEG }}$ | $\mathrm{V}_{\text {NEG }}$ |
| $\mathrm{V}_{\mathrm{L} 4}$ | $\mathrm{V}_{\text {IN }}$ | GND | $\mathrm{V}_{\text {IN }}$ | GND | $\mathrm{V}_{\text {NEG }}$ | GND | $\mathrm{V}_{\text {NEG }}$ | GND | $\mathrm{V}_{\text {IN }}$ | GND | $\mathrm{V}_{\text {IN }}$ | GND | $\mathrm{V}_{\text {NEG }}$ | GND | $\mathrm{V}_{\text {NEG }}$ |
| $\mathrm{V}_{\mathrm{A}}$ | $1 \times$ | $1 \times$ | $1 \times$ | $1 \times$ | $1 \times$ | $1 \times$ | $1 \times$ | $2 \times$ | $2 \times$ | $2 \times$ | $2 \times$ | $2 \times$ | $2 \times$ | $2 \times$ | $2 \times$ |
| $\mathrm{V}_{\mathrm{B}}$ | $3 \times$ | $3 \times$ | $3 \times$ | $3 \times$ | $3 \times$ | $3 \times$ | $3 \times$ | $4 \times$ | $5 \times$ | $5 \times$ | $5 \times$ | $5 \times$ | $5 \times$ | $5 \times$ | $5 \times$ |
| $\mathrm{V}_{\mathrm{C}}$ | $5 \times$ | $5 \times$ | $6 \times$ | $6 \times$ | $6 \times$ | $7 \times$ | $7 \times$ | $8 \times$ | $9 \times$ | $9 \times$ | $10 \times$ | $10 \times$ | $10 \times$ | $11 \times$ | $11 \times$ |
| $\mathrm{V}_{\text {OUT }}$ | $9 \times$ | $10 \times$ | $11 \times$ | $12 \times$ | $13 \times$ | $14 \times$ | $15 \times$ | $16 \times$ | $17 \times$ | $18 \times$ | $19 \times$ | $20 \times$ | $21 \times$ | $22 \times$ | $23 \times$ |
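The intermediate and output ratios in Table 2.1 follow from applying the ideal relation $\mathrm{V}_{\text {HIGH }}=2 \times \mathrm{V}_{\text {MED }}-\mathrm{V}_{\text {LOW }}$ stage by stage. The sketch below reproduces the table under idealized, lossless assumptions. The names `doubler`, `conversion_ratio`, and `TABLE_2_1` are hypothetical, and the first stage is assumed to be referenced to ground, as the $\mathrm{V}_{\mathrm{A}}$ row of the table suggests.

```python
# V_LOW choices expressed in units of V_IN (V_NEG is approximately -V_IN).
LOW_LEVEL = {"VIN": 1, "GND": 0, "VNEG": -1}


def doubler(v_med, v_low):
    """Ideal doubler referenced to v_low: V_HIGH = 2 * V_MED - V_LOW."""
    return 2 * v_med - v_low


def conversion_ratio(bypass, vl2, vl3, vl4):
    """Return (V_A, V_B, V_C, V_OUT) as multiples of V_IN for one switch setting."""
    # Assumption: the first stage is either bypassed or doubles from ground.
    v_a = 1 if bypass else doubler(1, LOW_LEVEL["GND"])
    v_b = doubler(v_a, LOW_LEVEL[vl2])
    v_c = doubler(v_b, LOW_LEVEL[vl3])
    v_out = doubler(v_c, LOW_LEVEL[vl4])
    return v_a, v_b, v_c, v_out


# Switch mapping transcribed from Table 2.1: ratio -> (Bypass, V_L2, V_L3, V_L4).
TABLE_2_1 = {
    9: (1, "VNEG", "VIN", "VIN"),    10: (1, "VNEG", "VIN", "GND"),
    11: (1, "VNEG", "GND", "VIN"),   12: (1, "VNEG", "GND", "GND"),
    13: (1, "VNEG", "GND", "VNEG"),  14: (1, "VNEG", "VNEG", "GND"),
    15: (1, "VNEG", "VNEG", "VNEG"), 16: (0, "GND", "GND", "GND"),
    17: (0, "VNEG", "VIN", "VIN"),   18: (0, "VNEG", "VIN", "GND"),
    19: (0, "VNEG", "GND", "VIN"),   20: (0, "VNEG", "GND", "GND"),
    21: (0, "VNEG", "GND", "VNEG"),  22: (0, "VNEG", "VNEG", "GND"),
    23: (0, "VNEG", "VNEG", "VNEG"),
}

for ratio, setting in TABLE_2_1.items():
    print(ratio, conversion_ratio(*setting))
```

Real stages settle at $\mathrm{R}_{\text {DIV }}<2$, so measured output voltages fall below these ideal multiples.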



Figure 2.10 Dual switching scheme for the harvester to reconfigure its conversion ratio while maintaining its capability of self-startup.

To enable cold start of the complete system, the control logic (including the conversion ratio register) operates between $\mathrm{V}_{\text {NEG }}$ and $\mathrm{V}_{\text {IN }}$ rails. Upon initial system startup, $\mathrm{V}_{\text {NEG }}$ and $\mathrm{V}_{2 \mathrm{X}}$ become available first, thus allowing the control logic to turn on and configure the switches. As shown in Figure 2.10, every switch is realized with a dual structure, one controlled with lower voltages for harvester self-startup, and the other controlled by a level-converted higher voltage to strongly turn on the switch for high output power levels. As each stage is powered up, its internal frequency modulation begins to control the frequency for optimum efficiency.

### 2.4 Measured Results



Figure 2.11 Die micrograph of $0.18 \mu \mathrm{~m}$ CMOS test chip. Total flying cap sizes of the standalone voltage doubler and the harvester are 54 pF and 600 pF , respectively.

The proposed voltage doubler (standalone) and energy harvester are fabricated in $0.18 \mu \mathrm{~m}$ CMOS (Figure 2.11). The standalone voltage doubler uses bootstrapping to minimize its leakage. The division ratio of the output voltage divider in the frequency feedback control circuit (see Figure 2.5), which is equivalent to the desired output to input voltage ratio ( $\mathrm{R}_{\mathrm{DIV\_DESIRED}}$ ), is set to 1.73 for the standalone voltage doubler in all measurements.

Figure 2.12 shows that a single doubler has $>70 \%$ measured efficiency across 1 nA to 0.35 mA output current ( $>10^{5}$ range) with a low idle power consumption of 170 pW. The internal clock frequency is modulated to maintain constant $\mathrm{R}_{\text {DIV }}$ and is proportional to the load current until the clock period becomes too short relative to the time constant for charging/discharging a flying cap. As described by Equation (2.13) in Section 2.2.2, the conversion efficiency of the doubler is nearly flat within its operational range, with an efficiency of roughly $\mathrm{R}_{\mathrm{DIV\_DESIRED}}-1=73 \%$.
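As a rough consistency check, and assuming that the chosen set point $R_{DIV\_DESIRED}=1.73$ coincides with the optimum of Equation (2.13), the model of Section 2.2.2 implies the following capacitance ratio. This is an inferred value under that model's assumptions, not a measured quantity.

$$
\begin{aligned}
\sqrt{\frac{C_{E F F}}{C_{F L Y}}} &= 2-R_{D I V}=2-1.73=0.27, \\
\frac{C_{E F F}}{C_{F L Y}} &= 0.27^{2} \approx 0.073, \\
\eta_{M A X} &= R_{D I V}-1=0.73 .
\end{aligned}
$$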


Figure 2.12 Measured results of the voltage doubler.

Figure 2.13 shows measured results of the harvester with different conversion ratios. Results show that a 0.35 V input can be converted to a $2.2 \mathrm{~V}-5.2 \mathrm{~V}$ voltage range with similar conversion efficiencies across settings. As the conversion ratio goes up, the output voltage level monotonically increases except for a transition from $16 \times$ to $17 \times$. At this transition, the number of cascaded stages increases from 3 to 4, thereby introducing another power loss at the first stage and lowering the output voltage level. Figure 2.14 shows measured results of the harvester at different $\mathrm{V}_{\text {IN }}$. The conversion ratio is adjusted to maintain a similar $\mathrm{V}_{\text {OUT }}$ level. With $\mathrm{V}_{\text {IN }}=0.45 \mathrm{~V}$, corresponding to an outdoor condition, the harvester delivers $5 \mathrm{nW}-5 \mu \mathrm{~W}$ output power with $>40 \%$ efficiency and an idle power consumption $<3 \mathrm{nW}$. For $\mathrm{V}_{\text {IN }}=0.25 \mathrm{~V}$, corresponding to a solar cell under very low light, the harvester can take in between 10 nW and 120 nW to charge a $\sim 4 \mathrm{~V}$ battery with $>35 \%$ efficiency. For both $\mathrm{V}_{\mathrm{IN}}$ values, the harvester's output power range covers the expected solar cell power range well.


Figure 2.13 Measured results of the harvester with different conversion ratios.



Figure 2.14 Measured results of the harvester at different $\mathrm{V}_{\text {IN }}$.

Figure 2.15 shows the measured results with a small silicon solar cell $\left(0.84 \mathrm{~mm}^{2}\right)$ at the input. In one test, the harvester is connected to the solar cell under various light conditions. These results are shown in the graph as the X-marked points. In the second test, the solar cell operation is emulated using an external current source in parallel with the solar cell, to perform a finer-grained sweep of harvester performance. The two sets of results agree closely, showing that the harvester can convert input power from the solar cell with up to $50 \%$ efficiency under a wide range of light conditions, from dim room lighting to beyond outdoor daylight. Because of its low idle power consumption, the harvester shows $>35 \%$ end-to-end efficiency even under a dim light of 260 lux, where the solar cell generates only 7 nW output power. By adjusting the conversion ratio, the harvester can take in nearly $100 \%$ of the solar cell output power at its maximum power point for incident light up to 200 klux, covering almost all practical light conditions (Figure 2.15, "Solar cell efficiency" curve).


Figure 2.15 Measured results of the harvester with a $0.84 \mathrm{~mm}^{2}$ silicon solar cell at the input.

A second chip is fabricated in $0.18 \mu \mathrm{~m}$ CMOS that includes the harvester with the same design specifications previously described but has interfaces compatible with the $\mathrm{M}^{3}$ (Michigan Micro-Mote) sensor system [9]. This chip is tested with a solar cell of $1.33 \mathrm{~mm}^{2}$ area to measure its self-startup characteristic (Figure 2.16). As shown in Figure 2.17, the harvester cold starts with 55 lux of light and a 5.2 nW power source and charges an output capacitance to 4 V, which is sufficient to charge a battery. Figure 2.18 shows measured results at different temperatures, with the solar cell short-circuit current overridden to 180 nA to emulate room lighting. The results show the harvester's robust operation across a temperature range of $-10^{\circ} \mathrm{C}$ to $50^{\circ} \mathrm{C}$.


Figure 2.16 Measurement setup for the second harvester chip's self-starting behavior.


Figure 2.17 Cold start behavior of the harvester powered by a $1.33 \mathrm{~mm}^{2}$ solar cell. Output is connected to a capacitor. Light is turned on at some time between $0 \sim 20$ s.


Figure 2.18 Measured results of the harvester in different temperatures, with solar cell $\mathrm{I}_{\mathrm{SC}}=180 \mathrm{nA}$.


Figure 2.19 Micrograph of a small $\mathrm{M}^{3}$ wireless sensor node system [9] with harvester (top right), and a graph of measured battery voltage (bottom) showing that its battery is continuously charged by the harvester during system operation.

This chip is integrated in a very small $\mathrm{M}^{3}$ wireless sensor node system (Figure 2.19, top right) with a volume of approximately $1 \mathrm{~mm}^{3}$ [9]. The graph at the bottom shows the system battery voltage during operation. As shown in the graph, the system periodically wakes up and sends a radio signal every $\sim 3$ minutes. The positive slope in the battery voltage plot during sleep cycles shows that the battery is being charged effectively by the proposed harvester. Table 2.2 and Table 2.3 summarize the voltage doubler and harvester performance and compare them to prior related work.

Table 2.2 Performance summary and comparison of the standalone voltage doubler.

|  | [13] | [14] | [15] | This work (Doubler) |
| :---: | :---: | :---: | :---: | :---: |
| Technology | 32nm CMOS | 45nm SOI CMOS w/ trench cap | $0.13 \mu \mathrm{~m}$ CMOS | $0.18 \mu \mathrm{~m}$ CMOS |
| Architecture | Multi-phase voltage doubler | 1:2 step-up/down converter | Multi-phase voltage doubler | Self-oscillating voltage doubler |
| Conversion ratio | 1:2 | 2:1,1:2 | 1:2 | 1:2 |
| Tested input voltage | 1V-1.2V | 1V | 1V-1.2V | 1.2 V |
| Frequency | $250 \mathrm{MHz}-2 \mathrm{GHz}{ }^{1}$ | 100 MHz | N/R | $70 \mathrm{~Hz}-19 \mathrm{MHz}$ |
| Peak efficiency | 64\% | 90\% | 82\% | 75\% |
| Load range | $0.4 \mathrm{~mA}-9 \mathrm{~mA}$ <br> w/ >40\% efficiency | $0.5 \mathrm{~mA}-5 \mathrm{~mA}$ <br> w/ >80\% efficiency ${ }^{1}$ | $0.15 \mathrm{~mA}-2.2 \mathrm{~mA}$ <br> w/ >70\% efficiency | $1 \mathrm{nA}-0.35 \mathrm{~mA}$ <br> w/ >70\% efficiency |
| Load range in ratio | 1:23 ${ }^{1}$ | 1:10 ${ }^{1}$ | 1:15 ${ }^{1}$ | 1:350,000 |
| Area | $0.0067 \mathrm{~mm}^{2}$ | $0.0012 \mathrm{~mm}^{2}$ | $2.25 \mathrm{~mm}^{2}$ | $0.069 \mathrm{~mm}^{2}$ |

N/R: Not reported
${ }^{1}$ Estimated number from the paper

Table 2.3 Performance summary and comparison of the harvester.

|  | [5] | [10] | [16] | This work (Harvester) |
| :---: | :---: | :---: | :---: | :---: |
| Technology | $0.13 \mu \mathrm{~m}$ CMOS | 65 nm CMOS | $0.35 \mu \mathrm{~m}$ CMOS | $0.18 \mu \mathrm{~m}$ CMOS |
| Architecture | Transformer self-startup | Integrated charge pump | Integrated charge pump | Cascade of integrated voltage doublers |
| Fully integrated | No (off-chip transformer) | Yes | Yes | Yes |
| Self-startup | Yes (min. 40 mV) | Yes (min. 120 mV) | N/R | Yes (min. 140 mV) |
| Input voltage | 40 mV-300 mV | 0.12 V-0.16 V | 0.6 V-4 V | 0.14 V-0.5 V |
| Output voltage | 2 V | 1 V, 1.8 V, 3 V | N/R | 2.2 V-5.2 V <br> (0.35 V $\mathrm{V}_{\mathrm{IN}}$, 10 nA $\mathrm{I}_{\mathrm{LOAD}}$) |
| Peak efficiency | 61\% @ 0.3 V $\mathrm{V}_{\mathrm{IN}}$ | 38.8\% @ 0.12 V $\mathrm{V}_{\mathrm{IN}}$ | 70\% @ 2 V $\mathrm{V}_{\mathrm{IN}}$ | 50\% @ 0.45 V $\mathrm{V}_{\mathrm{IN}}$ |
| Output power range | N/R |  | $1 \mu \mathrm{~W}-1 \mathrm{~mW}$ <br> (Only peak efficiency reported) | $5 \mathrm{nW}-5 \mu \mathrm{~W}$ w/ >40\% efficiency |
| Idle power consumption | N/R | N/R | $2 \mu \mathrm{~W}$ @ $100 \mu \mathrm{~W}$ input <br> $7 \mu \mathrm{~W}$ @ 1 mW input | <3 nW |
| Minimum input power | N/R | N/R | N/R | 6 nW for self-startup <br> 1.7 nW to sustain harvesting |
| Area | $0.093 \mathrm{~mm}^{2}$ | $0.78 \mathrm{~mm}^{2}$ | $59 \mathrm{~mm}{ }^{2}$ | $0.86 \mathrm{~mm}^{2}$ |

### 2.5 Conclusion

An ultra-low power fully integrated energy harvester based on a novel SC voltage doubler structure is presented. Internalized clock generation and clock frequency modulation allow the doubler to operate across a wide load range ($>10^{5} \times$) with a low idle power consumption of 170 pW. Four voltage doublers are cascaded to form an energy harvester, which can operate with a very limited power source of a few nW. The overall harvester conversion ratio is configurable from $9 \times$ to $23 \times$ using bottom voltage switching, a negative voltage generator, and cascaded stage count, generating a 2.2 V-5.2 V $\mathrm{V}_{\mathrm{OUT}}$ from a 0.35 V $\mathrm{V}_{\mathrm{IN}}$. Measured results with a small silicon solar cell ($1.33 \mathrm{~mm}^{2}$) show that the harvester cold starts with 55 lux of light and a 5.2 nW power source. The harvester chip is integrated in an actual wireless sensor node system and demonstrates charging of the system battery during typical operation.

## CHAPTER 3

## Low-Power Wide-Range Power Management Unit

### 3.1 Introduction

As Internet-of-Things (IoT) systems proliferate, there is a greater demand for small and efficient power management units. Fully integrated switched-capacitor (SC) DC-DC converters are promising candidates due to their small form factor and low quiescent power, aided by dynamic activity scaling [22]-[24]. However, they offer a limited number of conversion ratios, which makes them challenging to use in actual systems, since such systems often require multiple output voltages (to reduce power consumption) and use various input power sources (to maximize flexibility). In addition, maintaining both high efficiency and fast load response is difficult at low output current levels, which is critical for IoT devices as they often target low standby power to preserve battery charge.

This dissertation presents a fully integrated power management system that converts an input voltage within a $0.9 \mathrm{~V}-4 \mathrm{~V}$ range to 3 fixed output voltages: $0.6 \mathrm{~V}, 1.2 \mathrm{~V}$, and 3.3 V. A 7-stage binary-reconfigurable SC DC-DC converter [22], [23] first generates 1.2 V from the battery voltage input. The 0.6 V and 3.3 V outputs are then generated from this 1.2 V output by a 2:1 downconverter and a 1:3 Dickson upconverter, respectively. Only one reconfigurable converter is used, to optimize overall conversion efficiency within a small chip area. Limiting the outputs to 3 fixed voltage levels maintains converter efficiency, while still offering a coarse choice of supply voltage so that a load circuit does not lose too much efficiency by operating far from its optimum voltage.

### 3.2 Overall Architecture and Operation



Figure 3.1 Overall architecture of the complete power-management system and its operation.

Figure 3.1 shows the overall structure of the full system (top) and its operation (bottom). It contains three SC converters (a binary-reconfigurable SC up/downconverter, a 1:3 Dickson upconverter, and a 2:1 SC downconverter), each responsible for generating one of the three output voltages: $1.2 \mathrm{~V}, 3.3 \mathrm{~V}$, and 0.6 V. The binary-reconfigurable up/downconverter converts a wide range of input voltages into a 1.2 V output voltage. The Dickson upconverter and 2:1 downconverter then receive this 1.2 V output and convert it into 3.3 V and 0.6 V, respectively.

Proper conversion ratio configuration of the binary converter is important for robust and power-efficient 1.2 V generation. If the ratio is set too low, the binary converter output cannot reach 1.2 V while if the ratio is set too high, conversion efficiency worsens due to large conduction loss. The system regulates the conversion ratio by using both feedback and feedforward control [24]. When the system input voltage $\left(\mathrm{V}_{\mathrm{BAT}}\right)$ becomes available, the main controller starts up and turns on the binary converter with a small default ratio. Conversion ratio is continually increased by feedback control until the converter output voltage V1P2 reaches $\sim 1.2 \mathrm{~V}$, which triggers the 'output on detector'.

At this point, the 1:3 Dickson upconverter turns on and generates the higher voltages 2.4 V and 3.3 V. The 2.4 V is then used to power an internal 1.2 V voltage reference and LDO to generate a clean reference voltage $\mathrm{V}_{\text {REF }}$ for more accurate regulation of the 1.2 V supply voltage. After $\mathrm{V}_{\text {REF }}$ becomes available, feedforward control sets the binary conversion ratio by directly computing the desired ratio using an 8-bit ADC. After the ADC has measured the battery voltage, the conversion ratio is calculated in digital logic as the measured ratio $\mathrm{V}_{\text {REF }} / \mathrm{V}_{\mathrm{BAT}}$ plus an offset value; this allows for an optimal voltage drop that balances conduction and switching losses, maximizing efficiency. After the system is fully turned on, the binary converter ratio is periodically adjusted while supplying output voltages, allowing for self-adaptation in the face of slow input voltage drift arising from battery discharge or temperature changes, both of which frequently occur in wireless IoT systems.
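The feedforward calculation described above can be illustrated with a minimal sketch. The ADC full-scale voltage, the offset of 2 LSBs, and the function name are assumptions chosen for illustration, not values taken from the design, and the sketch covers only step-down operation ($\mathrm{V}_{\mathrm{BAT}}$ above $\mathrm{V}_{\text {REF }}$).

```python
def feedforward_ratio_code(adc_code, v_ref=1.2, adc_full_scale=4.0,
                           adc_bits=8, ratio_bits=7, offset_lsb=2):
    """Return integer p such that the step-down ratio is p / 2**ratio_bits.

    adc_full_scale and offset_lsb are hypothetical illustration values.
    """
    if not 0 < adc_code < 2 ** adc_bits:
        raise ValueError("ADC code out of range")
    # Battery voltage reconstructed from the ADC reading.
    v_bat = adc_code * adc_full_scale / 2 ** adc_bits
    if v_bat <= v_ref:
        raise ValueError("this sketch covers step-down operation only")
    steps = 2 ** ratio_bits
    # Measured ratio V_REF / V_BAT plus an offset that leaves headroom
    # for the voltage drop across the converter.
    p = round(v_ref / v_bat * steps) + offset_lsb
    return min(p, steps - 1)


# A 3.0 V battery reading (code 192 of 256 at 4 V full scale).
print(feedforward_ratio_code(192))
```

With these assumed values, a 3.0 V reading gives a ratio code slightly above the ideal $1.2 / 3.0$, so the unloaded output sits a little above 1.2 V and the frequency loop can regulate it down under load.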

### 3.3 Key Building Blocks and Techniques

### 3.3.1 Switched-Capacitor DC-DC Converters



Figure 3.2 Structure of SC converters. Panels: 7-bit recursive binary-reconfigurable SC DC-DC converter (top); non-overlapping clock / periodic reset generator (bottom left); 2:1 SC downconverter; 1:3 Dickson SC upconverter.

Figure 3.2 shows the structure of the three SC converters: a 7-bit binary-reconfigurable up/downconverter, a 2:1 SC downconverter, and a 1:3 Dickson upconverter. The 7-bit binary converter (Figure 3.2, top) consists of seven 2:1 SC converters with configuration switches following a recursive topology [23]. Because the supply voltage level into each stage varies dynamically as the conversion ratio is continuously reconfigured, flying capacitance drivers are implemented by level shifters using cross-coupled PMOS and NMOS switches to maintain the same clock swing and current drivability regardless of their voltage levels. Whenever the conversion ratio changes, the intermediate voltages among stages have to be refreshed, while each internal node in the cross-coupled switches must be stabilized with respect to its corresponding intermediate voltage. This yields a chicken-and-egg problem: the intermediate voltages can settle to their new values only when the cross-coupled switches are working properly, but these switches work properly only when the intermediate voltages have settled. By alternating normal operation and reset of the SC converters using a periodic reset generator (Figure 3.2, bottom left), both sets of floating nodes can be stabilized at the same time when necessary.

### 3.3.2 Converter Frequency Control Loop

In addition to conversion ratio adjustment, DC-DC converters in IoT systems should be able to self-adapt to widely varying output load conditions to maintain good efficiency. Figure 3.3 shows the frequency control loop of the binary converter, consisting of a main feedback path and a fast voltage drop detection path. For initial startup, the main path compares the first stage output V1 with the divided input voltage, maintaining a proper amount of voltage drop $\Delta$ through the first stage for optimum conversion efficiency. After the system is fully turned on, the binary converter output V1P2 is compared to $\mathrm{V}_{\text {REF }}$ and regulated to the same level as $\mathrm{V}_{\text {REF }}$. Given this ability to maintain a constant output voltage level, the binary converter offers near-optimum efficiency across load conditions, because the conversion ratio is already configured for a voltage drop that gives optimum efficiency (via the feedforward ratio control path). The 1:3 Dickson upconverter and 2:1 downconverter have similar frequency control loops for their own oscillators, allowing each of the 3 converters to independently adapt to different load currents at their corresponding output voltages.


Figure 3.3 Frequency control loop for each SC converter with load-proportional biasing scheme. Panel shown: frequency control loop for the binary converter.

### 3.3.3 Load-Proportional Biasing

The entire control loop operates with a divided converter clock so that its dynamic power consumption stays proportional to the SC converter switching loss. This ensures that the efficiency loss due to the control loop is always a small, predictable value regardless of load current level. Other digital blocks are also clocked by this divided converter clock (Figure 3.3, bottom right). This reduces their power consumption relative to the system's output power level and also maintains control loop stability: since the operating speeds of the various blocks are all scaled by the same factor, the blocks, including the voltage output, interact with each other at similar relative response speeds.
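The proportionality argument can be made explicit with a first-order model. The symbols below are assumptions introduced for illustration only: $C_{ctrl}$ is the effective switched capacitance of the control logic, $C_{sw}$ the effective capacitance responsible for converter switching loss, $V$ their common supply voltage, and $M$ the clock division factor.

$$
\begin{equation*}
\frac{P_{ctrl}}{P_{sw}}=\frac{C_{ctrl} V^{2}\left(f_{sw} / M\right)}{C_{sw} V^{2} f_{sw}}=\frac{C_{ctrl}}{M C_{sw}}
\end{equation*}
$$

Under these assumptions the ratio does not depend on $f_{sw}$, so the relative overhead of the control loop stays fixed as the converter clock scales with load.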

### 3.3.4 Drop Detector for Rapid and Robust Frequency Adjustment

While the load-proportional speed adjustment scheme offers these benefits, it also has a drawback at small output power: the system responds more slowly to external condition changes such as a sudden load current increase. To address this problem, the frequency control loop in each converter has a dedicated fast voltage drop detector that monitors the converter output voltage and triggers a drop detection signal when it goes below a certain threshold. The drop detector requires a periodic reset to detect output voltage changes and maintain its threshold level. Hence each converter contains two drop detectors for uninterrupted, overlapping operation: one detector always remains on while the other is being reset. Because the detector focuses only on triggering upon a single fast drop event, without considering stability or continuous operation, its response can be made hundreds of times faster (in simulation) than the main feedback path, rendering the control loop fast enough for sudden load current changes. Once the detection signal is triggered, the clock frequency is set to its maximum, quickly restoring the output voltage to a safe level. Afterwards, feedback through the main path slowly lowers the clock frequency to the level needed to support any sustained increase in load current. The drop detector bias current is also adjusted to be load proportional.
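The interaction between the fast drop-detection path and the slow main path can be sketched as a single control update. This is a behavioral illustration under stated assumptions, not the circuit implementation: the function name and the multiplicative step of 1.05 are hypothetical, and the default frequency limits are taken from the simulated clock range listed in Table 3.1.

```python
def next_clock_frequency(f_now, v_out, v_target, drop_threshold,
                         f_min=50.0, f_max=10e6, step=1.05):
    """One control update; returns the new converter clock frequency in Hz.

    step is a hypothetical adjustment factor for the slow main path.
    """
    if v_out < drop_threshold:
        # Fast path: the drop detector fires and the clock jumps to maximum.
        return f_max
    if v_out > v_target:
        # Main path: output is high, so slowly lower the frequency.
        return max(f_min, f_now / step)
    # Main path: output is low but above the drop threshold, slowly raise it.
    return min(f_max, f_now * step)


# A sudden load step pulls the output below the threshold.
print(next_clock_frequency(1e3, v_out=1.0, v_target=1.2, drop_threshold=1.1))
```

After the jump to the maximum frequency, repeated calls with the output above the target walk the frequency back down, which mirrors the slow recovery through the main path described above.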


Figure 3.4 Die micrograph of a test chip.


Figure 3.5 Measured performance vs. input voltage.

### 3.4 Measured Results



Figure 3.6 Measured drop detector operation.


Figure 3.7 Performance vs. output power.

The power management system chip was fabricated in $0.18 \mu \mathrm{~m}$ CMOS (Figure 3.4). As shown in Figure 3.5, the system stably delivers $1.2 \mathrm{~V}, 3.3 \mathrm{~V}$, and 0.6 V output voltages from an input voltage ranging from 0.9 V to 4 V. Figure 3.6 shows the drop detector responding to a $100 \times$ sudden load current increase. The graphs in Figure 3.7 show that the converter supplies $20 \mathrm{nW}-500 \mu \mathrm{~W}$ with $>60 \%$ efficiency. Table 3.1 summarizes the results and compares the design with previous relevant work.

Table 3.1 Performance summary and comparison.

| Metric | [17] | [18] | [19] | This Work |
| :---: | :---: | :---: | :---: | :---: |
| Topology | 7-b SAR SC | 4-b Recursive SC | Series-Parallel SC (2:1, 3:2) | 7-b Binary + 2:1 SC + 1:3 Dickson |
| Technology | 180 nm | 250 nm | 32 nm | 180 nm |
| Capacitor Type | MIM + MOS | MIM | Deep Trench | MIM + MOS |
| Capacitance (nF) | 2.24 | 3 | 64 | 3 |
| Input Range | $3.4 \sim 4.3 \mathrm{~V}$ | 2.5 V | 1.8 V | $0.9 \sim 4 \mathrm{~V}$ |
| Output Range | $>0.45 \mathrm{~V}$ | $0.1 \sim 2.18 \mathrm{~V}$ | $0.7 \sim 1.1 \mathrm{~V}$ | $0.6, 1.2, 3.3 \mathrm{~V}$ (regulated) |
| Number of Conversion Ratios | 117 | 15 | 2 | $127 \times 2$ (up/down) |
| Load Range | $300 \mu \mathrm{~A}$ | 2 mA | 10 W | $20 \mathrm{nW} \sim 500 \mu \mathrm{~W}$ @ $>60 \%$ |
| Clock Frequency | $80 \mathrm{kHz} \sim 2.7 \mathrm{MHz}$ | $200 \mathrm{kHz} \sim 10 \mathrm{MHz}$ | $<62.5 \mathrm{MHz}$ | $50 \mathrm{~Hz} \sim 10 \mathrm{MHz}$ (sim) |
| Peak Efficiency | $72 \%$ | $85 \%$ | $85.1 \%$ | $81 \%$ |

### 3.5 Conclusion

A fully integrated power management system that converts an input voltage within a 0.9 V-4 V range to 3 fixed output voltages, $0.6 \mathrm{~V}, 1.2 \mathrm{~V}$, and 3.3 V, is presented. A 7-stage binary-reconfigurable DC-DC converter enables the wide input voltage range. Three-way dynamic frequency control maintains converter operation at near-optimum conversion efficiency under widely varying load conditions from 5 nW to $500 \mu \mathrm{~W}$. A load-proportional bias scheme helps maintain high efficiency at low output power and fast response time at high output power, and retains stability across the entire operating range. Analog drop detectors improve load response time even at low output power, allowing the converter to avoid the need for external sleep/wakeup control signals. Within a range of $1 \mathrm{~V}-4 \mathrm{~V}$ input voltage and $20 \mathrm{nW}-500 \mu \mathrm{~W}$ output power, the converter shows $>60 \%$ conversion efficiency while maintaining responsiveness to a $100 \times$ sudden current increase.

## CHAPTER 4

## Rational Conversion Ratio SC DC-DC Converter using Negative Output Feedback

### 4.1 Introduction

### 4.1.1 Switched-Capacitor DC-DC Converters

Switched converters have been widely used in DC-DC voltage conversion because of their high conversion efficiency and simple structure, which can easily be miniaturized. Among them, switched-capacitor (SC) DC-DC converters have several advantages over inductive switching DC-DC converters. First, they use capacitors for temporary energy storage, which can easily be integrated on chip using general CMOS processes. While it is difficult to make on-chip inductors with a high quality factor, on-chip capacitors with little parasitic capacitance can be fabricated easily, making it possible to build fully integrated DC-DC converters with high efficiency [19], [25], and the loss from parasitic capacitance can be further reduced by charge recycling [25]. Techniques that increase the density of flying capacitors, such as deep trench capacitors [19], have already achieved output power density per unit area comparable to that of inductive converters.

In addition to its ease of integration, the SC conversion scheme has another advantage in its control. Switching converters, either capacitive or inductive, generate a DC output voltage by filtering a switching signal through the high reactance of the energy-storage devices. Therefore, inductive converters require fast switching signals, or well-controlled short switching pulses even in discontinuous operation, to maintain the reactance of the inductors, because the reactance of an inductor is proportional to the signal frequency. This switching control requires inductive converters to consume some quiescent power. On the other hand, the reactance of a capacitor is inversely proportional to the signal frequency, so SC converters can use arbitrarily low switching frequencies when the effect of leakage currents is neglected. This characteristic allows SC converters to achieve very low quiescent power and, therefore, a very wide input/output power range [26].

These advantages render SC DC-DC converters promising for integrated voltage regulators, especially for small, low-power systems such as Internet-of-Things (IoT) systems or wireless sensor nodes (WSNs). However, many SC DC-DC converters offer only a few conversion ratios, limiting their use in systems in which either the input or the output voltage varies. This is particularly important in wireless systems, where various input power sources can be used for energy harvesting, the battery voltage degrades slowly even when the system relies on a battery, and multiple, varying output voltages are usually required to optimize system power consumption.

### 4.1.2 Binary-ratio-reconfigurable SC converters

Though many ratio-reconfigurable SC converters have been reported to solve this problem [24], [27], [28], they usually offer only a few conversion ratios, which is not enough to maintain high conversion efficiency across wide input and output voltage ranges with a granularity fine enough for effective voltage scaling. S. Bang et al. proposed a structure called the SAR SC DC-DC converter [22], which configures simple 2:1 SC downconverters in a cascaded topology, as shown in Figure 4.1, to achieve arbitrary binary ratios: $\mathrm{p} / 2^{\mathrm{N}}, 0<\mathrm{p}<2^{\mathrm{N}}$, where N is the maximum number of cascaded $2: 1$ converter stages.


Figure 4.1 SAR SC DC-DC converter. [7]


Figure 4.2 Recursive SC DC-DC converter. [29]

In contrast with previous reconfigurable SC converters, which offer only a few fixed ratios with limited granularity, this structure introduced a new approach to reconfigurable SC converters, offering any ratio that can be represented as a finite binary fraction. This means that the converter can approximate any conversion ratio between 0 and 1 to within any desired error tolerance simply by increasing the number of cascaded stages. However, as the number of cascaded stages increases, the conduction path from the power source to the output passes through more converter stages, giving rise to more conduction loss. Therefore, the overall efficiency of the converter degrades as stages are added.

This structure was improved in [29] by modifying the power connection and reversing the cascading order to increase output conductance, as shown in Figure 4.2. By choosing the input of each stage between the output of the previous stage and the original supply rails, this structure, named the recursive SC DC-DC converter, has an improved output conductance compared to [22]. If each stage is sized exponentially with a base of 2, this converter has an output conductance of

$$
\begin{equation*}
C_{t o t} f_{s w}\left(1-\frac{1}{2^{N}}\right)^{-2}=C_{t o t} f_{s w}\left(\frac{2^{N}}{2^{N}-1}\right)^{2} \tag{4.1}
\end{equation*}
$$

when the output is assumed to be a DC voltage, where $C_{t o t}$ is the total flying capacitance, $f_{s w}$ the switching frequency, and N the number of stages [29]. According to this formula, the output conductance decreases as the number of stages increases, but it has a lower bound of $C_{t o t} f_{s w}$, meaning that even a converter with arbitrarily fine granularity can have practically acceptable conductance and efficiency.
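To see how quickly (4.1) approaches its lower bound, the short sketch below evaluates the right-hand side of (4.1) normalized to $C_{t o t} f_{s w}$. The function name is an assumption made for illustration.

```python
from fractions import Fraction


def normalized_conductance(n_stages):
    """Right-hand side of Eq. (4.1) divided by C_tot * f_sw."""
    if n_stages < 1:
        raise ValueError("at least one stage is required")
    full = 2 ** n_stages
    return Fraction(full, full - 1) ** 2


for n in (1, 2, 3, 7):
    g = normalized_conductance(n)
    print(f"N={n}: {g} = {float(g):.4f}")
```

The normalized value falls from 4 at N = 1 to 16/9 at N = 2 and is already within about 2% of the bound of 1 at N = 7.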

Despite the conductance improvement of [29] over [22], this design still provides less output conductance than previous works offering only a small fixed number of ratios [24], [27], [28], such as $1 / 3$ and $2 / 5$. Because these ratios cannot be represented as a simple finite binary fraction, a large number of stages is required to approximate them, and the output conductance degrades according to (4.1). The work in [30] suggested a technique called "charge feedback" to add a few more ratios with the ratio of denominator 24, but it failed to achieve output conductance as high as previous work due to its use of an inefficient ladder structure. In addition, unlike the binary techniques, this simple tweak does not provide a way to exponentially expand the number of reconfigurable ratios of a converter.

### 4.1.3 Proposed technique to generate arbitrary rational ratio

This dissertation presents an SC DC-DC converter that can be reconfigured to have any arbitrary rational conversion ratio: $\mathrm{p} / \mathrm{q}, 0<\mathrm{p}<\mathrm{q} \leq 2^{\mathrm{N}+1}$. The key idea of the design, which we refer to as a rational DC-DC converter, is to incorporate negative voltage feedback into the cascaded converter stages using negative-generating converter stages ("voltage negators"); this enables reconfiguration of both the numerator p and the denominator q of the conversion ratio. In contrast to the current loss and conductance degradation caused by the charge feedback technique in [30], the current supplied by the voltage negators makes the output conductance comparable to that of conventional few-ratio SC DC-DC designs. Hence, the proposed design achieves a resolution higher than previous binary SC converters while maintaining the conversion efficiency of dedicated few-ratio SC converters.

### 4.2 Rational DC-DC Converter

### 4.2.1 Structure of the rational DC-DC converter

Figure 4.3 shows the structure of the recursive converter redrawn in a different form, for comparison with the proposed rational converter. In the recursive converter, each stage has a $2: 1$ SC converter that receives one input from the previous stage's output and the other from a power supply rail, either VDD or VSS. Since each $2: 1$ converter has a voltage gain of $1 / 2$ from input to output, changing the supply voltage at a stage far away from the output has an exponentially smaller impact than changing it at a stage near the output, resulting in binary ratio tuning.


Figure 4.3 Recursive DC-DC converter redrawn

Solving in a mathematical form, the output voltage of the converter $V_{\text {OUT }}$ is

$$
\begin{align*}
V_{\text {OUT }}= & V_{N}=\frac{1}{2}\left(V D D \times a_{N}+V_{N-1}\right)=\frac{1}{2}\left(V D D \times a_{N}+\frac{1}{2}\left(V D D \times a_{N-1}+V_{N-2}\right)\right) \\
& =\cdots  \tag{4.2}\\
= & V D D \times\left(\frac{1}{2} a_{N}+\frac{1}{4} a_{N-1}+\cdots+\frac{1}{2^{N}} a_{1}\right)=V D D \times\left(0 . a_{N} a_{N-1} \cdots a_{1}\right)_{(2)}
\end{align*}
$$

where $\left(0 . a_{N} a_{N-1} \cdots a_{1}\right)_{(2)}$ represents a binary fraction.
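The recursion in (4.2) can be checked numerically with a minimal sketch. It assumes, as the last term of (4.2) implies, that the input to the first stage is tied to VSS; the function name and the bit ordering are choices made for illustration.

```python
from fractions import Fraction


def recursive_ratio(bits):
    """Return V_OUT / VDD for supply-selection bits [a_1, ..., a_N].

    Each bit is 1 when the stage takes VDD and 0 when it takes VSS.
    Iterates V_i = (a_i * VDD + V_{i-1}) / 2 with VDD normalized to 1.
    """
    v = Fraction(0)  # V_0, assumed tied to VSS
    for a in bits:
        if a not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        v = (a + v) / 2
    return v


# a_1 = 1, a_2 = 0, a_3 = 1 corresponds to the binary fraction 0.101.
print(recursive_ratio([1, 0, 1]))  # 5/8
```

The stage closest to the output, $a_{N}$, carries weight $1 / 2$, while $a_{1}$ carries weight $1 / 2^{N}$, which is the exponentially decreasing influence described above.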


Figure 4.4 Structure of the proposed rational converter.

As shown in Figure 4.4, one input of each 2:1 SC downconverter is connected to the output of the previous stage in a rational converter, as in the binary converter. However, the other input is chosen among the supply rails as well as a set of negative feedback voltages, $-\mathrm{V}_{\mathrm{OUT}}$, $\mathrm{V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$, and $2 \mathrm{~V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$, so that $\mathrm{V}_{\mathrm{OUT}}$ is determined by the equation $\mathrm{V}_{\mathrm{OUT}}=\mathrm{A} \times \mathrm{V}_{\mathrm{DD}}-\mathrm{B} \times \mathrm{V}_{\mathrm{OUT}}$, where A and B are referred to as the converter's forward path gain and feedback factor, respectively.

In this structure, negative voltage feedback enables three extra choices for each stage, increasing the number of combinations and thus the reconfigurability. This allows the converter to be reconfigured algorithmically to any rational conversion ratio $\mathrm{p} / \mathrm{q}, 0<\mathrm{p}<\mathrm{q} \leq 2^{\mathrm{N}+1}$, where N is the maximum number of 2:1 stages. In addition, the negating converters provide extra current into the output terminal, improving overall converter output conductance. For any rational conversion ratio, the normalized conductance of the rational converter is provably no smaller than that of previous SC converters, including fixed-ratio converters. Moreover, switching loss in the rational converter matches the best previously reported SC converters at many ratios and hence leads to similar or better overall efficiency.

### 4.2.2 Operation of the rational DC-DC converter

Figure 4.5 describes the operation of the rational converter in more detail using an example where the conversion ratio is set to $p / q=4 / 13$. First, to generate this ratio, the number of stages $N$ is set by p and q to three, as $4 / 13$ can be represented as a ratio of two binary fractions with three digits after the binary point, $0.100_{(2)} / 1.101_{(2)}$. The numerator of this ratio becomes the forward path gain A, and the denominator minus one, $0.101_{(2)}$, becomes the feedback factor B. The input supply voltage of each stage is selected by the corresponding digits in the binary representations of A and B, i.e., $a_{i}$ and $b_{i}$. Specifically, the $i$th converter stage uses the $i$th bit from the right in A and B and selects an input voltage equal to $a_{i} \times \mathrm{V}_{\mathrm{DD}}-b_{i} \times \mathrm{V}_{\mathrm{OUT}}$, which effectively gives the four options $\mathrm{V}_{\mathrm{DD}}$, $\mathrm{V}_{\mathrm{SS}}$, $\mathrm{V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$, and $-\mathrm{V}_{\mathrm{OUT}}$. Then, the overall output voltage $V_{\text {OUT }}$ becomes

$$
\begin{gather*}
V_{\text {OUT }}=V_{N}=\frac{1}{2}\left(V D D \times a_{N}-V_{\text {OUT }} \times b_{N}+V_{N-1}\right) \\
=\frac{1}{2}\left(V D D \times a_{N}-V_{\text {OUT }} \times b_{N}+\frac{1}{2}\left(V D D \times a_{N-1}-V_{\text {OUT }} \times b_{N-1}+V_{N-2}\right)\right)=\cdots \\
=V D D \times\left(\frac{1}{2} a_{N}+\frac{1}{4} a_{N-1}+\cdots+\frac{1}{2^{N}}\left(a_{1}+a_{L}\right)\right)  \tag{4.3}\\
\quad-V_{\text {OUT }} \times\left(\frac{1}{2} b_{N}+\frac{1}{4} b_{N-1}+\cdots+\frac{1}{2^{N}}\left(b_{1}+b_{L}\right)\right) \\
=V D D \times\left(0 . a_{N} a_{N-1} \cdots\left(a_{1}+a_{L}\right)\right)_{(2)}-V_{\text {OUT }} \times\left(0 . b_{N} b_{N-1} \cdots\left(b_{1}+b_{L}\right)\right)_{(2)}
\end{gather*}
$$

If we define $A$ and $B$ as

$$
\begin{equation*}
A \equiv\left(0 . a_{N} a_{N-1} \cdots\left(a_{1}+a_{L}\right)\right)_{(2)} \tag{4.4}
\end{equation*}
$$

and

$$
\begin{equation*}
B \equiv\left(0 . b_{N} b_{N-1} \cdots\left(b_{1}+b_{L}\right)\right)_{(2)} \tag{4.5}
\end{equation*}
$$



Figure 4.5 Example configuration for $4 / 13$ conversion ratio $(\mathrm{A} \leq 1)$

$V_{\text {OUT }}$ can be represented as

$$
\begin{equation*}
V_{O U T}=A \times V D D-B \times V_{O U T}=\frac{A}{1+B} V D D . \tag{4.6}
\end{equation*}
$$

Therefore, the overall conversion ratio becomes

$$
\begin{equation*}
\text { Ratio }=\frac{V_{\text {OUT }}}{V D D}=\frac{A}{1+B}=\frac{p / 2^{N}}{1+\left(q / 2^{N}-1\right)}=\frac{p}{q}, \tag{4.7}
\end{equation*}
$$

which is the same as the ratio that is first set $(\mathrm{p} / \mathrm{q})$.

In this manner, the converter can be configured for any desired ratio $\mathrm{p} / \mathrm{q}$, provided that A is less than or equal to 1. When A is greater than 1, it cannot be realized by changing only the $a_{i}$ parameters, so a different configuration is used to generate the desired ratio.
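The configuration procedure for $\mathrm{A} \leq 1$ can be sketched as follows. The helper names are hypothetical, and two points are assumptions rather than statements from the text: the first-stage input is modeled as $a_{L} \times \mathrm{V}_{\mathrm{DD}}-b_{L} \times \mathrm{V}_{\mathrm{OUT}}$, which reproduces the $1 / 2^{N}$ weight of $a_{L}$ and $b_{L}$ in (4.3), and the extra digits $a_{L}$, $b_{L}$ are used only when a value equals exactly 1.

```python
from fractions import Fraction


def to_digits(k, n):
    """Split k (0 <= k <= 2**n) into (x_L, [x_1, ..., x_n]).

    k = x_L + sum(x_i * 2**(i-1)); x_L is set only when k == 2**n.
    """
    if not 0 <= k <= 2 ** n:
        raise ValueError("value not representable, e.g. A > 1")
    x_l = 1 if k == 2 ** n else 0
    k -= x_l
    return x_l, [(k >> i) & 1 for i in range(n)]


def rational_ratio(p, q, n):
    """V_OUT / VDD for the A <= 1 configuration, following (4.3)-(4.7)."""
    a_l, a = to_digits(p, n)            # A = p / 2**n
    b_l, b = to_digits(q - 2 ** n, n)   # B = q / 2**n - 1
    # Track V_i = alpha * VDD - beta * V_OUT through the stages.
    alpha, beta = Fraction(a_l), Fraction(b_l)
    for a_i, b_i in zip(a, b):
        alpha = (a_i + alpha) / 2
        beta = (b_i + beta) / 2
    # Solve V_OUT = alpha * VDD - beta * V_OUT for V_OUT / VDD.
    return alpha / (1 + beta)


print(rational_ratio(4, 13, 3))  # 4/13
```

For the 4/13 example, the loop ends with `alpha` equal to 1/2 and `beta` equal to 5/8, matching $A=0.100_{(2)}$ and $B=0.101_{(2)}$.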

For A greater than 1, the voltage negators are reconfigured to generate $\mathrm{V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$ and $2 \mathrm{~V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$. For example, when the conversion ratio $\mathrm{p} / \mathrm{q}$ is $9 / 11$ as shown in Figure 4.6, N is set to three as $9 / 11=1.001_{(2)} / 1.011_{(2)}$, so A is $1.001_{(2)}$ and B is $0.011_{(2)}$. With the change in voltage negator configuration, the voltage selection signal for the forward path gain is also changed to a new value

$$
\begin{equation*}
A^{\prime} \equiv A-B, \tag{4.8}
\end{equation*}
$$

which is always less than 1 if $p<q$ because

$$
\begin{equation*}
A^{\prime}=A-B=\frac{p}{2^{N}}-\left(\frac{q}{2^{N}}-1\right)=1-\frac{q-p}{2^{N}}<1 \tag{4.9}
\end{equation*}
$$

and always greater than 0 if $\mathrm{A}>1$ because

$$
\begin{equation*}
A^{\prime}=A-B>A-1>0 . \tag{4.10}
\end{equation*}
$$

To compensate for the reduction in forward path gain by B, an extra $\mathrm{V}_{\mathrm{DD}}$ is added whenever $b_{i}$ is 1 by selecting $a_{i}^{\prime} \times \mathrm{V}_{\mathrm{DD}}-b_{i} \times\left(\mathrm{V}_{\mathrm{OUT}}-\mathrm{V}_{\mathrm{DD}}\right)$ among $\mathrm{V}_{\mathrm{DD}}$, $\mathrm{V}_{\mathrm{SS}}$, $\mathrm{V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$, and $2 \mathrm{~V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$. Then, the overall output voltage $V_{\text {OUT }}$ becomes

$$
\begin{align*}
V_{\text {OUT }}=V_{N} & =\frac{1}{2}\left(V D D \times a_{N}^{\prime}-\left(V_{\text {OUT }}-V D D\right) \times b_{N}+V_{N-1}\right)=\cdots \\
& =V D D \times\left(\frac{1}{2} a_{N}^{\prime}+\frac{1}{4} a_{N-1}^{\prime}+\cdots+\frac{1}{2^{N}}\left(a_{1}^{\prime}+a_{L}^{\prime}\right)\right) \\
& \quad-\left(V_{\text {OUT }}-V D D\right) \times\left(\frac{1}{2} b_{N}+\frac{1}{4} b_{N-1}+\cdots+\frac{1}{2^{N}}\left(b_{1}+b_{L}\right)\right)  \tag{4.11}\\
& =V D D \times\left(0 . a_{N}^{\prime} a_{N-1}^{\prime} \cdots\left(a_{1}^{\prime}+a_{L}^{\prime}\right)\right)_{(2)}-\left(V_{\text {OUT }}-V D D\right) \times\left(0 . b_{N} b_{N-1} \cdots\left(b_{1}+b_{L}\right)\right)_{(2)}
\end{align*}
$$

If we define $A^{\prime}$ as

$$
\begin{equation*}
A^{\prime} \equiv\left(0 . a_{N}^{\prime} a_{N-1}^{\prime} \cdots\left(a_{1}^{\prime}+a_{L}^{\prime}\right)\right)_{(2)} \tag{4.12}
\end{equation*}
$$

$V_{\text {OUT }}$ can be represented as

$$
\begin{align*}
V_{\text {OUT }} & =A^{\prime} \times V D D-B \times\left(V_{\text {OUT }}-V D D\right)=\left(A^{\prime}+B\right) \times V D D-B \times V_{\text {OUT }} \\
& =\frac{A^{\prime}+B}{1+B} V D D . \tag{4.13}
\end{align*}
$$

Therefore, the overall conversion ratio becomes

$$
\begin{equation*}
\text { Ratio }=\frac{V_{\text {out }}}{V D D}=\frac{A^{\prime}+B}{1+B}=\frac{(A-B)+B}{1+B}=\frac{A}{1+B}=\frac{p}{q}, \tag{4.14}
\end{equation*}
$$

which is the desired ratio.


Figure 4.6 Example configuration for $9 / 11$ conversion ratio ( $\mathrm{A}>1$ )

In the example case of $\mathrm{p} / \mathrm{q}=9 / 11$ in Figure 4.6, $A^{\prime}$ becomes $A-B=0.110_{(2)}$. In the converter this is realized by setting $a_{L}^{\prime}=1$, $a_{1}^{\prime}=1$, $a_{2}^{\prime}=0$, and $a_{3}^{\prime}=1$, because this configuration offers lower bottom-plate parasitic switching loss than setting $a_{L}^{\prime}=0$ and $a_{1}^{\prime}$, $a_{2}^{\prime}$, and $a_{3}^{\prime}$ to 0, 1, and 1, respectively. By combining these two general configuration schemes, the converter can be configured for any rational conversion ratio.
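The two configuration cases can be checked numerically with a short sketch. It uses $A=p / 2^{N}$ and $B=q / 2^{N}-1$ as in (4.9); the function name and the rule for choosing N (the largest N with $2^{N} \leq q$, so that B falls in $[0,1)$) are assumptions made for illustration, not taken from the converter's control logic.

```python
from fractions import Fraction


def rational_config(p, q):
    """Return (N, A, B, A_prime, ratio) for a target ratio p/q with 0 < p < q.

    Assumption: N is the largest integer with 2**N <= q.
    A_prime is None when A <= 1 and the first configuration is used.
    """
    if not 0 < p < q:
        raise ValueError("expected 0 < p < q")
    n = q.bit_length() - 1               # floor(log2(q))
    a = Fraction(p, 2 ** n)              # forward-path gain A = p / 2^N
    b = Fraction(q, 2 ** n) - 1          # feedback gain B = q / 2^N - 1
    if a <= 1:
        a_prime = None
        ratio = a / (1 + b)              # A / (1 + B), as in (4.7)
    else:
        a_prime = a - b                  # A' = A - B, as in (4.8)
        ratio = (a_prime + b) / (1 + b)  # (A' + B) / (1 + B), as in (4.14)
    return n, a, b, a_prime, ratio


if __name__ == "__main__":
    print(rational_config(9, 11))
    print(rational_config(2, 5))
```

For $9 / 11$ the sketch gives $N=3$, $A=9 / 8$, $B=3 / 8$, and $A^{\prime}=3 / 4$, which agree with the binary values $1.001_{(2)}$, $0.011_{(2)}$, and $0.110_{(2)}$ quoted above.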

### 4.2.3 Performance analysis of the rational DC-DC converter

As shown in Table 4.1, the rational converter offers many more conversion ratios because both the numerator and the denominator are selectable, and this number grows faster than for binary converters as more stages are cascaded. Many of these non-binary ratio configurations have higher conductance than binary configurations for similar voltages, and thus lower conduction loss.

Table 4.1 Comparison of the number of configurable ratios in rational and binary converters.

| Max. <br> Stages $^{1}$ | This work (Rational): <br> Ratios | This work (Rational): <br> Stage size granularity $^{2}$ | Conventional Binary (Recursive): <br> Ratios | Conventional Binary (Recursive): <br> Stage size granularity $^{2}$ |
| :---: | :---: | :---: | :---: | :---: |
| $\mathbf{1}$ | 5 | $1 / 3$ | 1 | $1 / 1$ |
| $\mathbf{2}$ | 21 | $1 / 7$ | 3 | $1 / 3$ |
| $\mathbf{3}$ | 79 | $1 / 15$ | 7 | $1 / 7$ |
| $\mathbf{4}$ | 323 | $1 / 31$ | 15 | $1 / 15$ |
| $\mathbf{5}$ | 1259 | $1 / 63$ | 31 | $1 / 31$ |
| $\mathbf{6}$ | 5021 | $1 / 127$ | 63 | $1 / 63$ |
| $\mathbf{7}$ | 19947 | $1 / 255$ | 127 | $1 / 127$ |

1. Number of main stages w/o negators for rational
2. Required stage size granularity for optimum conductance
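
The ratio counts in Table 4.1 can be cross-checked with a short sketch. It assumes, as an inference from the table and from the $0<p<q$ ranges quoted for the test chip rather than from an explicit statement, that a rational converter with N main stages reaches every distinct fraction $p / q$ with $0<p<q \leq 2^{N+1}$, and that an N-stage binary converter reaches $p / 2^{N}$ with $0<p<2^{N}$. The function names are hypothetical.

```python
from math import gcd


def rational_ratio_count(stages):
    """Count distinct fractions p/q with 0 < p < q <= 2**(stages + 1).

    Only reduced fractions (gcd(p, q) == 1) are counted so that equal
    ratios such as 1/2 and 2/4 are not counted twice.
    """
    q_max = 2 ** (stages + 1)
    return sum(
        1
        for q in range(2, q_max + 1)
        for p in range(1, q)
        if gcd(p, q) == 1
    )


def binary_ratio_count(stages):
    """Count ratios p / 2**stages with 0 < p < 2**stages."""
    return 2 ** stages - 1


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, rational_ratio_count(n), binary_ratio_count(n))
```

Under this assumption the sketch returns 5, 21, 79, and 323 rational ratios for one to four stages, against 1, 3, 7, and 15 binary ratios, in agreement with the first four rows of the table.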

For every configuration, the rational converter has an output conductance of

$$
\begin{equation*}
C_{t o t} f_{s w}\left(\frac{q}{q-1}\right)^{2} \tag{4.15}
\end{equation*}
$$

when assuming the output is a DC voltage, which marks the best conductance among SC converters that do not include inductors. The recursive converter has the same output conductance for its available ratios because

$$
\begin{equation*}
C_{t o t} f_{s w}\left(\frac{2^{N}}{2^{N}-1}\right)^{2}=C_{t o t} f_{s w}\left(\frac{q}{q-1}\right)^{2} \tag{4.16}
\end{equation*}
$$

for $q=2^{N}$. However, it is impossible for the recursive converter to generate ratios with small nonbinary denominators, such as $1 / 3$ and $2 / 5$, which have good output conductance according to (4.15).
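
To make the comparison concrete, the following sketch evaluates the factor $(q /(q-1))^{2}$ from (4.15), that is, the output conductance normalized to $C_{t o t} f_{s w}$, for a few denominators. The function name is an assumption for illustration.

```python
from fractions import Fraction


def normalized_conductance(q):
    """Return (q / (q - 1))**2, the conductance of (4.15) divided by C_tot * f_sw."""
    if q < 2:
        raise ValueError("denominator q must be at least 2")
    return Fraction(q, q - 1) ** 2


if __name__ == "__main__":
    for q in (3, 5, 16):
        g = normalized_conductance(q)
        print(q, g, float(g))
```

The factor is $9 / 4$ for $q=3$ and $25 / 16$ for $q=5$, compared with $256 / 225$ (about 1.14) for $q=16$, which is why small non-binary denominators are attractive.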

Figure 4.7 depicts and compares conduction loss for available ratios of the two converters. Furthermore, the flexibility in selecting $a_{L}$ and $b_{L}$ in the first stage can be exploited to reduce bottom plate swing in many conversion ratios, further lowering bottom plate switching loss as shown in Figure 4.8.

Therefore, a rational converter guarantees higher or equal efficiency relative to a binary converter over the entire output voltage range. Figure 4.9 shows efficiency curves of the rational and recursive converters versus conversion ratio, assuming each $2: 1$ SC converter stage has $90 \%$ conversion efficiency due to switching loss. As the figure shows, this statement holds even for a binary converter with infinitely many stages, since most of the ratios added in that case offer poor efficiency because of increased conduction loss.


Figure 4.7 Conduction loss vs. conversion ratio of the 4-stage rational converter with comparison to the recursive converter.


Figure 4.8 Switching loss vs. conversion ratio of the 4-stage rational converter with comparison to the recursive converter.


Figure 4.9 Efficiency vs. conversion ratio of the rational converter with comparison to the recursive converter.

### 4.3 Chip Fabrication and Measurement

### 4.3.1 Test Chip Fabrication

To test the performance of this converter and compare it fairly with previous converters, a general reconfigurable SC converter is designed as shown in Figure 4.10. It consists of 15 identical unit converters that can be arranged as a binary converter of up to 4 stages with 15 ratio configurations ( $p / 2^{4}$, $0<p<2^{4}$ ), a few-ratio converter with $1 / 3$ and $2 / 5$ ratios, or a rational converter of up to 3 stages with 79 ratio configurations ( $p / q$, $0<p<q \leq 2^{4}$ ), with relative sizing among stages for optimal normalized conductance.


Figure 4.10 Structure of a general reconfigurable DC-DC converter.

The unit converter is a 2-phase SC converter with four terminals. Each terminal can be connected to arbitrary voltage rails including $\mathrm{V}_{\mathrm{DD}}, \mathrm{V}_{\mathrm{SS}}$, $\mathrm{V}_{\text {OUT }}$, negative feedback voltages, and three intermediate voltages for inter-stage connections. Despite the large number of reconfiguration switches, they do not impact efficiency as they all form connections among DC voltages and hence do not contribute additional switching loss.

The unit converter can be configured as a $2: 1$ converter by shorting $V_{\mathrm{HL}}$ and $V_{\mathrm{LH}}$, or as a voltage negator generating $\mathrm{V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$ by connecting its four terminals to $\mathrm{V}_{\mathrm{DD}}$, $\mathrm{V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$, $\mathrm{V}_{\mathrm{OUT}}$, and $\mathrm{V}_{\mathrm{SS}}$. Voltage negators generating $-\mathrm{V}_{\mathrm{OUT}}$ and $2 \mathrm{~V}_{\mathrm{DD}}-\mathrm{V}_{\mathrm{OUT}}$ use the same configuration as the $2: 1$ downconverter, but the negated outputs are generated at different terminals (Figure 4.11).


Figure 4.11 Structure of voltage negators.

A test chip including the general reconfigurable DC-DC converter described above was fabricated in 180 nm CMOS (die photo in Figure 4.12). The fabricated converter includes a total flying capacitance of 1.8 nF ( 15 unit converters $\times 0.12 \mathrm{nF}$ per unit converter).


Figure 4.12 Die micrograph of the test chip.

### 4.3.2 Measurement

To test the fabricated chip, an input voltage of 2 V is applied. The clock buffer $\mathrm{V}_{\mathrm{DD}}$ is set to 1 V, and the converter's switching frequency is set to 100 kHz. As shown in Figure 4.13, the rational DC-DC converter has more ratios and higher conversion efficiency than binary converters, which is consistent with the theoretical calculations in Figure 4.9. The converter shows $95 \%$ peak conversion efficiency at a $\mathrm{V}_{\mathrm{OUT}}$ of 1.83 V, $>90 \%$ efficiency over a $\mathrm{V}_{\mathrm{OUT}}$ range from 1.1 V to 1.86 V, and $>80 \%$ efficiency over a wide $0.47-1.87 \mathrm{~V}$ $\mathrm{V}_{\mathrm{OUT}}$ range.


Figure 4.13 Measured efficiency vs. Vout of the rational and recursive converters. Ratios for optimum efficiency between $2 / 3$ and $15 / 16$ for the rational converter are noted as examples.

Figure 4.14 compares the output conductance of the rational converter in its $2 / 3$ configuration with that of the closest binary configuration, $11 / 16$, showing that the $2 / 3$ configuration has higher output conductance and thus better efficiency. Compared with the configurations of some previous few-ratio converters [27], [28], the rational converter shows similar or better conductance and efficiency for $1 / 3$ (Figure 4.15) and $2 / 5$ (Figure 4.16).

Table 4.2 summarizes rational converter performance and compares it to previous related work.


Figure 4.14 Output conductance comparison among rational, SAR, and recursive converters at ratios around $2 / 3$.


Figure 4.15 Output conductance comparison at $1 / 3$ conversion ratio.


Figure 4.16 Output conductance comparison at $2 / 5$ conversion ratio.

Table 4.2 Performance summary and comparison.

|  | This Work | SAR [17] | Recursive [24] | Salem VLSI'15 [25] | Le ISSCC'13 [22] | Jiang ISSCC'15 [23] |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Technology | $0.18 \mu \mathrm{m}$ CMOS | $0.18 \mu \mathrm{m}$ CMOS | $0.25 \mu \mathrm{m}$ CMOS | $0.25 \mu \mathrm{m}$ CMOS | 65 nm CMOS | 65 nm CMOS |
| Fully Integrated | Yes | Yes | Yes | No | Yes | Yes |
| Reconfigurability Type | Rational | Binary | Binary | Gear Train + Charge Feedback | Fixed (1/3, 2/5) | Fixed (1/3, 1/4) |
| All-Ratio Reconfigurability | Yes | No | No | No | No | No |
| Number of Stages | 3 + voltage negators | 5 + 4:1 SC | 4 | 4 + 1 | N/A | N/A |
| Number of Configurations | 79 | 117 | 15 | 24 | 2 | 2 |
| Input Voltage | 2 V | 3.4 V-4.3 V | 2.5 V | 2.5 V-5 V | 3 V-4 V | 1.5 V-2.5 V |
| Output Voltage | 1.1 V-1.86 V @ $>90 \%$ $\eta$ <br> 0.47 V-1.87 V @ $>80 \%$ $\eta$ | 0.9 V-1.5 V | 0.1 V-2.18 V | 0.2 V-2 V | 1 V | 0.4 V-0.7 V |
| Peak Efficiency | 95\% | 72\% | 85\% | 95.5\% | 74.3\% | 79.5\% |
| Power Density @ $\eta_{\text {peak }}$ | $3.77 \mu \mathrm{A} / \mathrm{mm}^{2}$ $^{1}$ | $6 \mu \mathrm{A} / \mathrm{mm}^{2}$ $^{2}$ | $0.43 \mathrm{~mA} / \mathrm{mm}^{2}$ | $2.88 \mathrm{~mA} / \mathrm{mm}^{2}$ $^{2,3}$ | $0.19 \mathrm{~mA} / \mathrm{mm}^{2}$ | $56 \mathrm{~mA} / \mathrm{mm}^{2}$ |

1. Measured results show low power density because the chips are tested at a much lower frequency than their maximum frequency. The main purpose of the test is to fairly compare the effectiveness of various configurations under the same well-controlled conditions.
2. Estimated number from the paper.
3. With off-chip capacitors.

### 4.4 Conclusion

A method for configuring an SC DC-DC converter into any rational conversion ratio is presented. Negative output feedback helps keep the conversion efficiency comparable to that of conventional few-ratio SC DC-DC designs. Theoretical calculations and measured results are well matched, showing that its efficiency is superior to that of conventional binary-reconfigurable structures.

## CHAPTER 5

## Fully Digital Capacitance-to-Digital Converter using Iterative Delay-Chain Discharge

### 5.1 Introduction

Capacitance sensors are widely used to measure various physical quantities, including position, pressure, and concentration of certain chemicals [31]-[36]. Integrating capacitive sensors into a small wireless sensor system is challenging because their power consumption is large relative to the total system power/energy budget, which can be as low as a few nW [34]. Typical Capacitance-to-Digital Converters (CDCs) use charge sharing or charge transfer between capacitors to convert the sampled capacitance to a voltage, which is then measured with an ADC [31]-[36]. This approach requires complex analog circuits, such as amplifiers and separate ADCs, increasing design complexity and often increasing power consumption. Moreover, the initial capacitance-to-voltage conversion limits the input capacitance range because of output voltage saturation. This chapter presents a fully digital CDC based on the observation that when a ring oscillator (RO) is powered from a charged capacitance, the number of RO cycles needed to discharge the capacitance to a fixed voltage is naturally linear with the capacitance value. This observation enables a simple, fully digital conversion scheme that is inherently linear over a wide range.

### 5.2 Structure of Proposed CDC

### 5.2.1 Basic Operation Scheme

Figure 5.1 explains the proposed conversion method. The top node of the sensed capacitor, CT, is directly connected to the supply node of a ring oscillator. This node is initially charged to $\mathrm{V}_{\mathrm{HIGH}}$ and is then discharged gradually as the RO oscillates. At each signal transition, the RO draws a small charge from the sensed capacitor $\mathrm{C}_{\mathrm{SENSE}}$, gradually lowering $\mathrm{V}_{\mathrm{CT}}$. As a result, the RO propagation delay increases, and it is compared to a constant delay reference. The number of RO transitions until the RO delay becomes longer than the reference delay is recorded by a counter and becomes the output code $\mathrm{D}_{\mathrm{OUT}}$.


Figure 5.1 Basic structure of the proposed CDC.

Since the RO delay depends only on $\mathrm{V}_{\mathrm{CT}}$ (neglecting noise initially), $\mathrm{D}_{\mathrm{OUT}}$ is equal to the number of RO transitions while $\mathrm{V}_{\mathrm{CT}}$ is discharged from $\mathrm{V}_{\mathrm{HIGH}}$ to a constant voltage, $\mathrm{V}_{\mathrm{LOW}}$. As shown in Figure 5.2, during conversion, at any particular $\mathrm{V}_{\mathrm{CT}}$ value, the amount of charge withdrawn per RO transition depends only on $\mathrm{V}_{\mathrm{CT}}$ at that time. Therefore, the number of transitions required to reduce $\mathrm{V}_{\mathrm{CT}}$ by a certain small voltage is proportional to the input capacitance $\mathrm{C}_{\mathrm{SENSE}}$. As this is true at any $\mathrm{V}_{\mathrm{CT}}$ level, the output code $\mathrm{D}_{\mathrm{OUT}}$, which is the sum of transition counts across all consecutive small intervals from $\mathrm{V}_{\mathrm{HIGH}}$ to $\mathrm{V}_{\mathrm{LOW}}$, is also proportional to $\mathrm{C}_{\mathrm{SENSE}}$. As the RO draws charge directly from $\mathrm{C}_{\mathrm{SENSE}}$ without an initial capacitance-to-voltage conversion, the CDC input capacitance range is essentially unlimited, constrained only by the counter size. This is desirable when the $\mathrm{C}_{\mathrm{SENSE}}$ range is uncertain at design time. Furthermore, the energy used to charge $\mathrm{C}_{\mathrm{SENSE}}$ is reused to oscillate the RO, reducing overall power consumption.
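
The proportionality argument can be illustrated with a toy discharge model. The sketch below assumes, purely for illustration, that each transition removes a charge of `c_eff * v` from the sensed capacitor, where `c_eff` is a hypothetical effective switched capacitance of 1 fF; the real charge per transition is some other function of $\mathrm{V}_{\mathrm{CT}}$, but the argument only requires that it depend on $\mathrm{V}_{\mathrm{CT}}$ alone. The default voltages are the 1.0 V and 0.45 V used for the test chip.

```python
def transition_count(c_sense, v_high=1.0, v_low=0.45, c_eff=1e-15):
    """Count transitions needed to discharge c_sense from v_high to v_low.

    Hypothetical charge model: each transition removes c_eff * v from the
    capacitor, where v is the present capacitor voltage.
    """
    v = v_high
    count = 0
    while v > v_low:
        v -= c_eff * v / c_sense  # dV = -dQ / C, with dQ depending only on v
        count += 1
    return count


if __name__ == "__main__":
    for c in (1e-12, 2e-12, 4e-12):
        print(c, transition_count(c))
```

In this model, doubling `c_sense` doubles the count to within the one-count quantization of the loop, regardless of the particular charge function chosen.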


Figure 5.2 Basic operation scheme of the proposed CDC.

### 5.2.2 Detailed Implementation

Figure 5.3 shows the detailed implementation of the CDC circuit and its operation. Here an inverter chain is used in place of an RO to discharge $\mathrm{C}_{\mathrm{SENSE}}$; it is a 16-stage chain that is identical to the reference delay generator. Because the two structures are identical, conversion stops when $\mathrm{V}_{\mathrm{CT}}$ drops below $\mathrm{V}_{\mathrm{LOW}}$. The number of stages in the inverter chain is chosen for optimal SNR per conversion energy, where the energy to charge $\mathrm{C}_{\mathrm{SENSE}}$ is balanced with the energy consumed by other blocks.


Figure 5.3 Detailed implementation of the CDC.

The two propagation delays are compared by three delay comparators, which have a structure similar to an RS latch. The bottom comparator compares the propagation delay of falling edges, and the middle one compares the rising edges. Whenever the reference delay is shorter than the delay of the $\mathrm{C}_{\mathrm{SENSE}}$ discharge delay chain, the comparators output a pulse, incrementing the counts stored in the sub1 and sub2 counters. A third counter tracks the main oscillation triggering signal. After each comparison, the next-edge generator block triggers the next discharge and delay comparison, maintaining oscillation. All blocks except the $\mathrm{C}_{\mathrm{SENSE}}$ delay chain operate at $\mathrm{V}_{\mathrm{LOW}}$, and a level converter drives the two delay chain inputs with $\mathrm{V}_{\mathrm{HIGH}}$.


Figure 5.4 Detailed timing diagram of the CDC.

As shown in the timing diagram of Figure 5.4, conversion starts by precharging $\mathrm{C}_{\mathrm{SENSE}}$ to $\mathrm{V}_{\mathrm{HIGH}}$. This is followed by Sense rising, which triggers the first edge to propagate through the two delay chains. The top comparator takes in a slightly delayed version of the reference delay and determines when to finish the overall conversion, which occurs when $\mathrm{V}_{\mathrm{CT}}$ becomes lower than $\mathrm{V}_{\mathrm{LOW}}$ by some margin. As $\mathrm{V}_{\mathrm{CT}}$ approaches $\mathrm{V}_{\mathrm{LOW}}$, the bottom two delay comparators pulse $\mathrm{CK}_{1}$ and $\mathrm{CK}_{2}$. They initially pulse sporadically, due to noise, and then more frequently as $\mathrm{V}_{\mathrm{CT}}$ crosses $\mathrm{V}_{\mathrm{LOW}}$. Just before conversion finishes, these two comparators pulse every cycle. When the top comparator pulses Finish, Sense is turned off and oscillation stops. The final $\mathrm{D}_{\mathrm{OUT}}$ is the total count of comparator outputs for which $\mathrm{V}_{\mathrm{CT}}>\mathrm{V}_{\mathrm{LOW}}$, and is calculated as $2 \times \mathrm{D}_{\mathrm{MAIN}}-\left(\mathrm{D}_{\mathrm{SUB1}}+\mathrm{D}_{\mathrm{SUB2}}\right)$.

The use of three comparators is designed to increase SNR by averaging noise over many comparisons when $\mathrm{V}_{\mathrm{CT}}$ is near $\mathrm{V}_{\mathrm{LOW}}$. Comparing both rising and falling edges doubles the number of comparisons. By extending the conversion to where $\mathrm{V}_{\mathrm{CT}}$ falls some margin below $\mathrm{V}_{\mathrm{LOW}}$, comparisons are performed through the whole noisy region around $\mathrm{V}_{\mathrm{LOW}}$, whereby false " $\mathrm{V}_{\mathrm{CT}}<\mathrm{V}_{\mathrm{LOW}}$ " decisions above $\mathrm{V}_{\mathrm{LOW}}$ are stochastically compensated by false " $\mathrm{V}_{\mathrm{CT}}>\mathrm{V}_{\mathrm{LOW}}$ " decisions below $\mathrm{V}_{\mathrm{LOW}}$. Simulation shows that energy increases by $3 \%$ compared to the standard approach of stopping conversion immediately after the first comparison triggers, while overall conversion noise is square-rooted. In addition, the distribution of $\mathrm{D}_{\mathrm{OUT}}$ using this scheme is centered at the exact number of counts from $\mathrm{V}_{\mathrm{HIGH}}$ to $\mathrm{V}_{\mathrm{LOW}}$, thereby improving output code linearity.
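
The stochastic compensation can be illustrated with a toy Monte Carlo model. This is not the simulated circuit: it assumes a linear discharge of one step per comparison, Gaussian comparator noise with a hypothetical standard deviation of 5 steps, and a hypothetical stopping margin of 40 steps below the threshold. All names and numbers are chosen only for illustration.

```python
import random


def extended_count(exact, sigma, margin, rng):
    """Count 'above threshold' decisions through the whole noisy region."""
    count = 0
    k = 0
    while True:
        level = exact - k - 0.5        # true distance above V_LOW, in steps
        if level < -margin:            # stop only well below the threshold
            return count
        if level + rng.gauss(0.0, sigma) > 0:
            count += 1                 # noisy decision: "V_CT > V_LOW"
        k += 1


def first_trigger_count(exact, sigma, rng):
    """Stop at the first noisy 'below threshold' decision."""
    k = 0
    while exact - k - 0.5 + rng.gauss(0.0, sigma) > 0:
        k += 1
    return k


def mean_counts(exact=200, sigma=5.0, margin=40, trials=2000, seed=1):
    """Average both schemes over many trials with a fixed seed."""
    rng = random.Random(seed)
    ext = sum(extended_count(exact, sigma, margin, rng) for _ in range(trials))
    first = sum(first_trigger_count(exact, sigma, rng) for _ in range(trials))
    return ext / trials, first / trials


if __name__ == "__main__":
    print(mean_counts())
```

In this model the extended scheme averages to the exact count of 200 steps, while stopping at the first trigger reads low on average because a single early false decision ends the conversion.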

### 5.2.3 Parasitic Capacitance Cancelation

The CDC measures the capacitance between one input node and ground, but several applications require the capacitance between two input nodes, excluding the parasitic capacitance to ground. We accomplish this through three conversions, as shown in Figure 5.5. First, node B is connected to ground and the capacitance between node A and ground is measured, which includes parasitic capacitance $\mathrm{C}_{\mathrm{PA}}$. Second, nodes A and B are flipped and $\mathrm{C}_{\mathrm{SENSE}}+\mathrm{C}_{\mathrm{PB}}$ is measured. Finally, both A and B nodes are connected to $\mathrm{V}_{\mathrm{CT}}$ to measure $\mathrm{C}_{\mathrm{PA}}+\mathrm{C}_{\mathrm{PB}}$. By adding the first two codes and subtracting the third, the parasitic capacitance is canceled out. While this requires three conversions, parasitic capacitance typically remains unchanged or changes slowly, so the parasitic cancelation can be performed infrequently, amortizing its overhead.
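
The arithmetic of the three-conversion scheme is sketched below. The function name, the capacitance values, and the gain of 100 codes per pF are hypothetical; the sketch only shows that the combined code is proportional to twice $\mathrm{C}_{\mathrm{SENSE}}$ once the parasitic terms cancel.

```python
def cancel_parasitics(code_a, code_b, code_ab):
    """Combine three conversion codes into one proportional to 2 * C_SENSE.

    code_a  is proportional to C_SENSE + C_PA (node B grounded)
    code_b  is proportional to C_SENSE + C_PB (nodes flipped)
    code_ab is proportional to C_PA + C_PB    (both nodes tied to V_CT)
    """
    return code_a + code_b - code_ab


if __name__ == "__main__":
    codes_per_pf = 100                   # hypothetical conversion gain
    c_sense, c_pa, c_pb = 10, 2, 3       # hypothetical capacitances in pF
    code_a = codes_per_pf * (c_sense + c_pa)
    code_b = codes_per_pf * (c_sense + c_pb)
    code_ab = codes_per_pf * (c_pa + c_pb)
    combined = cancel_parasitics(code_a, code_b, code_ab)
    print(combined / (2 * codes_per_pf))  # recovered C_SENSE in pF
```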


Figure 5.5 Technique for parasitic capacitance cancelation.

### 5.2.4 Output Code Calibration

The output code varies as temperature or supply voltage changes. This code deviation is removed by one-point calibration. In a calibration phase, $\mathrm{V}_{\mathrm{CT}}$ is connected to an internal reference capacitor with known capacitance $\mathrm{C}_{\text {REF }}$ and the ratio of $\mathrm{C}_{\text {REF }}$ to corresponding Dout is stored. In subsequent normal conversion, digital output codes are converted to actual capacitance value by multiplying the code and the stored ratio. If the supply voltage changes sufficiently slowly, this calibration can be re-done occasionally.
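
A minimal sketch of the one-point calibration follows. The function names and the example numbers (a 10 pF reference that reads 800 counts) are hypothetical and serve only to show the ratio being stored and then applied.

```python
def calibration_ratio(c_ref, d_ref):
    """Return capacitance per code, measured on the known reference capacitor."""
    if d_ref <= 0:
        raise ValueError("reference code must be positive")
    return c_ref / d_ref


def code_to_capacitance(d_out, ratio):
    """Convert an output code to capacitance using the stored ratio."""
    return d_out * ratio


if __name__ == "__main__":
    ratio = calibration_ratio(c_ref=10.0, d_ref=800)  # pF per code, hypothetical
    print(code_to_capacitance(1600, ratio))           # capacitance in pF
```

Because the same ratio scales every subsequent code, a drift that changes all codes by a common factor is removed the next time the reference is measured.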

### 5.3 Chip Fabrication and Measured Results

The CDC is fabricated in 40 nm CMOS and tested with $\mathrm{V}_{\mathrm{HIGH}}=1.0 \mathrm{~V}$ and $\mathrm{V}_{\mathrm{LOW}}=0.45 \mathrm{~V}$. As shown in the die micrograph in Figure 5.6, the core circuit area, excluding testing circuits and internal capacitors, is $0.0017 \mathrm{~mm}^{2}$. This small area comes from the simplicity of the CDC core circuit, which consists of only a few hundred logic gates.


Figure 5.6 Die micrograph of the 40 nm CMOS test chip.

Figure 5.7 shows that the test chip has a very wide input capacitance range, from 0.7 pF to 10 nF, with a small linearity error of $<0.06 \%$. The measured output noise percentage decreases as $\mathrm{C}_{\mathrm{SENSE}}$ increases due to noise averaging. At 11.3 pF, the CDC has $0.109 \%$ resolution, 35.1 pJ total conversion energy (including both $\mathrm{V}_{\mathrm{HIGH}}$ and $\mathrm{V}_{\mathrm{LOW}}$ ), and $141 \mathrm{fJ} / \mathrm{c}-\mathrm{s}$ FoM. FoM increases monotonically with the sensed capacitance. Figure 5.8 shows that output code sensitivity to temperature improves by $145 \times$ (from $2247 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ to $15.5 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ ) due to calibration.


Figure 5.7 Measured CDC resolution and linearity error.


Figure 5.8 Measured CDC temperature sensitivity before and after calibration.

Results with an actual pressure sensor (Figure 5.9) demonstrate 1.39 mmHg resolution with parasitic cancelation. As the results show, the parasitic capacitance of this pressure sensor is much less sensitive to pressure than its sensed capacitance, justifying the infrequent parasitic capacitance cancelation suggested in Section 5.2.3. Table 5.1 summarizes CDC performance and compares it with prior work.


Figure 5.9 Measured results with capacitive pressure sensor with parasitic cancelation.

Table 5.1 Performance summary and comparison.

|  | [26] JSSC 09 | [27] ISSCC 14 | [28] VLSI 14 | [30] JSSC 13 | [31] JSSC 12 | This work |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Technology | $1.5 \mu \mathrm{m}$ CMOS | $0.18 \mu \mathrm{m}$ CMOS | $0.18 \mu \mathrm{m}$ CMOS | $0.16 \mu \mathrm{m}$ CMOS | $0.35 \mu \mathrm{m}$ CMOS | 40 nm CMOS |
| Method | CDS + Cyclic ADC | CDS + SAR | SAR $+\Delta \Sigma$ | $\Delta \Sigma$ | Period Modulation | Iterative Delay-Chain Discharge |
| Input range | N/R | 2.5-75.3 pF | 0-24 pF | 0.54-1.06 pF | N/R | 0.7 pF - 10 nF |
| Resolution | 75 aF | 6.0 fF | 0.16 fF | 70 aF | N/R | $0.109 \%$ $^{1}$ (12.3 fF) |
| Meas. Time | 0.5 ms | 4 ms | $230 \mu \mathrm{s}$ | 0.8 ms | 7.6 ms | $19.06 \mu \mathrm{s}$ $^{1}$ |
| Power | $36 \mu \mathrm{W}$ $^{5}$ | 160 nW | $33.7 \mu \mathrm{W}$ | $10.3 \mu \mathrm{W}$ $^{5}$ | $211 \mu \mathrm{W}$ $^{5}$ | $1.84 \mu \mathrm{W}$ $^{1}$ |
| Conversion Energy $^{2}$ | 18 nJ | 640 pJ | 7.75 nJ | 8.26 nJ | $1.61 \mu \mathrm{J}$ | 35.1 pJ $^{1}$ |
| FoM $^{3}$ (fJ/c-s) | 22000 | 181 | 175 | 3900 | 139000 | 141 $^{1,4}$ |

1 Measured when sensing 11.3pF capacitance w/o parasitic cancelation or calibration.
2 Conversion Energy $=$ Power $\times$ (Meas. Time)
3 FoM $=(\text{Conversion Energy}) / 2^{\left(20 \log _{10}\left(\frac{\text{Input range}}{2 \sqrt{2} \times \text{Resolution}}\right)-1.76\right) / 6.02}$
4 Input range is assumed to be $0.7 \mathrm{pF}-11.3 \mathrm{pF}$ for this calculation
5 Estimated number from the paper
N/R: Not reported
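
As a consistency check on Table 5.1, the FoM definition in footnote 3 can be evaluated with the values reported for this work. The function name is hypothetical; the inputs are the 35.1 pJ conversion energy, the 0.7 pF to 11.3 pF input range assumed in footnote 4, and the 12.3 fF resolution.

```python
import math


def cdc_fom(energy_j, input_range_f, resolution_f):
    """Return the FoM in J/conversion-step, following footnote 3 of Table 5.1."""
    snr_db = 20 * math.log10(input_range_f / (2 * math.sqrt(2)) / resolution_f)
    enob = (snr_db - 1.76) / 6.02  # effective number of bits
    return energy_j / 2 ** enob


if __name__ == "__main__":
    fom = cdc_fom(
        energy_j=35.1e-12,
        input_range_f=11.3e-12 - 0.7e-12,
        resolution_f=12.3e-15,
    )
    print(fom * 1e15, "fJ/c-s")
```

With these inputs the sketch yields approximately 141 fJ/c-s, consistent with the value listed in the table.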

### 5.4 Conclusion

A wide-range, fully digital CDC with low conversion energy and FoM is presented. These benefits come from its new conversion method using iterative delay-chain discharge. This approach does not require complex analog circuits that consume extra power, while retaining high linearity over a wide input range that is theoretically unlimited.

## CHAPTER 6

## Edge-Pursuit Comparator: An Energy-Scalable Oscillator Collapse-Based Comparator

### 6.1 Introduction

Comparators are widely used in many applications such as voltage regulation, brown-out detection, and analog-to-digital conversion. In many of these applications, the performance of the entire circuit relies directly on the comparator's performance, as the comparator plays a key role. A high-resolution SAR ADC is a good example: it needs an especially low-noise comparison to distinguish closely spaced voltages for fine bit decisions, and this comparison requires a large amount of energy that takes up a significant portion of the total conversion energy.


Figure 6.1 Required energy for comparison vs. input difference.
(a) Conventional comparators waste most of their energy for large input differences.
(b) Energy scaling saves the energy otherwise wasted in comparison.

However, as depicted in Figure 6.1(a), while the actual energy requirement decreases sharply as the input signal difference becomes larger, conventional clocked comparators [37]-[39] usually consume nearly constant energy for each comparison since they are designed according to the most accurate and power-hungry comparison. Therefore, in these kinds of applications, adjusting the energy for comparison according to the input difference level can greatly help in reducing the total comparison energy (Figure 6.1(b)) as well as the overall energy consumption. For this reason, many prior works on SAR ADCs have presented techniques for comparator energy scaling [40]-[44], including dual ADC architectures which use two comparators for coarse and fine comparisons [45], [46], multiple repetitive comparisons for noise-critical bits [40], [41], [46], and time-domain comparators whose noise level can be modulated by changing the length of the delay lines [42]. However, these structures reduce the simplicity of the SAR structure by introducing overheads for extra control, increasing design and control complexity. They also have a limited number of energy scaling steps and a limited noise tuning range, making it difficult to benefit much from comparator energy scaling. In addition, some of these prior techniques require preprogrammed scaling by prediction, introducing additional inefficiencies from prediction misses.

This dissertation presents a ring oscillator collapse-based comparator, referred to as an edge-pursuit comparator (EPC). The EPC automatically scales comparison energy according to its input difference without external control, tailoring comparison energy to each conversion. Wide-range energy scaling allows for saving a significant amount of energy for coarse comparisons. Phase-domain operation running for many cycles over the ring oscillator enables high resolution operation with a small load capacitance and area.

### 6.2 Structure and Operation of the Edge-Pursuit Comparator

Figure 6.2 shows the structure of the edge-pursuit comparator [43], which is composed of two NAND gates and inverter delay cells. The design is inspired by a physically unclonable function circuit that uses oscillator collapse to uniquely identify integrated circuits [47]. Here, the design is modified to serve as a comparator with differential inputs; the topology is shown below to be particularly well suited for use as a comparator.


Figure 6.2 Structure of the edge-pursuit comparator.

Figure 6.3 Operation of the edge-pursuit comparator.

Figure 6.4 Output of the EPC vs. time during comparison. The output waveform changes according to (a) the polarity of $\mathrm{V}_{\mathrm{INP}}-\mathrm{V}_{\mathrm{INM}}$ and (b) the magnitude of the input signal difference.

Initially, the comparator is in the reset state, as the signal START is low to disconnect the oscillation path, as shown in Figure 6.3(a). The comparator initiates a comparison when the signal START goes high simultaneously at both NAND gates (Figure 6.3(b)). This injects two propagating edges into the oscillator, which travel around the comparator (Figure 6.3(c)) until one overtakes the other, collapsing the oscillation (Figure 6.3(d)). Differential input signals ( $\mathrm{V}_{\mathrm{INP}}$, $\mathrm{V}_{\mathrm{INM}}$ ) are alternately applied to both the top and bottom current-limiting transistors of the delay cells, modulating the pull-up and pull-down edge-propagation delays. The propagation delay of these two edges is controlled by mutually exclusive current-limiting transistors such that increasing $\mathrm{V}_{\mathrm{INP}}$ causes one edge to propagate faster and the other to become slower (and vice versa for $\mathrm{V}_{\mathrm{INM}}$ ). After one propagating edge overtakes the other, the oscillation collapses and the stage outputs settle to either VDD or GND, dictated by which edge was slower and hence overtaken (Figure 6.4(a)). The comparator output $\mathit{COMP}$ is sampled from an internal stage that goes high when $\mathrm{V}_{\mathrm{INP}}>\mathrm{V}_{\mathrm{INM}}$ and low otherwise. When the voltage difference between $\mathrm{V}_{\mathrm{INM}}$ and $\mathrm{V}_{\mathrm{INP}}$ is small, the two injected edges have similar propagation delays and the number of cycles required to make a decision automatically increases (Figure 6.4(b)). This filters out high-frequency noise, as the design performs noise averaging over a longer period of time. On the other hand, for large voltage differences the oscillation inherently collapses quickly, limiting dynamic energy consumption for coarse comparisons. In this manner, the comparator naturally adjusts its energy dissipation without external control, and realizes both high accuracy and low power operation.

### 6.3 Analysis of Edge-Pursuit Comparator Performance

During a comparison, the edge-pursuit comparator operates similarly to a ring-oscillator. After it is triggered, two injected edges propagate with different speeds driven by different transistors, whose phase difference drifts until it shifts by $-\pi$ or $\pi$ compared to when the propagation started. Therefore, comparator noise can be estimated by analyzing the phase difference in the time- or phase-domain instead of voltage or current. To simplify the noise analysis, we reduce the circuit to be analyzed to the one shown in Figure 6.5. The NAND gates with the comparator clock are skipped as their propagation delay and jitter noise are much smaller than the other stages. Each current-limiting transistor is modeled as a noisy current source. Assuming the parasitic device capacitances are much smaller than the stage load capacitance $C_{L}$, the noise from transistors in the middle of the stack can be neglected, allowing these transistors to be modeled as simple noiseless switches that flip at $\sim V_{D D} / 2$.


Figure 6.5 Simplified delay cell model for noise estimation.


Figure 6.6 Operation of EPC in phase domain.

### 6.3.1 Operational Analysis in Phase Domain

We analyze the EPC behavior in the phase domain, with the basic concept illustrated in Figure 6.6. According to Abidi's analysis of ring oscillator noise in [48], the comparator period jitter variance is

$$
\begin{equation*}
\sigma_{\tau}^{2}=\frac{k T}{I f_{0}}\left(\frac{2}{V_{o v}}\left(\gamma_{N}+\gamma_{P}\right)+\frac{2}{V_{D D}}\right) \tag{6.1}
\end{equation*}
$$

Here $\tau$ is the oscillation period, $\sigma_{\tau}^{2}$ is the variance of the period jitter, $f_{0}$ is the oscillation frequency, and $V_{o v}$ is the overdrive voltage of the current-limiting transistors. Assuming that the period jitter is uncorrelated between the two propagating edges, the variance of the phase difference shift per period is

$$
\begin{equation*}
\sigma_{\Delta \phi}^{2} \cong 2 \times(2 \pi)^{2} \frac{\sigma_{\tau}^{2}}{\tau^{2}}=8 \pi^{2} f_{0} \frac{k T}{I}\left(\frac{2}{V_{o v}}\left(\gamma_{N}+\gamma_{P}\right)+\frac{2}{V_{D D}}\right) \tag{6.2}
\end{equation*}
$$

In addition to noise, the phase difference shifts because the input voltage difference gives rise to a current difference between the two propagating edges. The average period difference between the two edges, $\Delta \tau$, is

$$
\begin{equation*}
\Delta \tau \cong \frac{\Delta I}{I} \tau=v_{i n} \frac{g_{m}}{I f_{0}}=\frac{v_{i n}}{V_{o v}} \frac{2}{f_{0}} \tag{6.3}
\end{equation*}
$$

where $v_{i n}$ is the input differential voltage and $g_{m}$ is the transconductance of the current-limiting transistors. Therefore, the average phase difference shift over one period $\mu_{\Delta \phi}$ is

$$
\begin{equation*}
\mu_{\Delta \phi} \cong 2 \pi \times \frac{\Delta \tau}{\tau}=4 \pi \frac{v_{i n}}{V_{o v}} \tag{6.4}
\end{equation*}
$$

Note that in this convention, a positive $v_{\text {in }}$ causes the phase difference to drift towards the boundary at $\pi$. Therefore, an oscillation finishing with the phase difference at $\pi$ means the comparison result is "high", and otherwise (finishing at $-\pi$ ) means "low".

For easier formulation of the phase shift during a comparison, we assume that the phase difference shift $\Phi(t)$ is a continuous-time random process with independent increments in non-overlapping time intervals, similar to 1-dimensional Brownian motion with drift. Then, the probability density function of $\Phi(t)$,

$$
\begin{equation*}
f(t, \phi) \equiv f_{\Phi(t)}(\phi), t \geq 0,-\pi \leq \phi \leq \pi \tag{6.5}
\end{equation*}
$$

satisfies the Fokker-Planck equation [49]

$$
\begin{equation*}
\partial_{t} f(t, \phi)=-M \partial_{\phi} f(t, \phi)+\frac{\Sigma}{2} \partial_{\phi}^{2} f(t, \phi) \tag{6.6}
\end{equation*}
$$

where $M$ and $\Sigma$ are "drift" and "diffusion" coefficients of this random process, defined as

$$
\begin{gather*}
M \equiv \frac{\mu_{\Delta \phi}}{\tau}=4 \pi f_{0} \frac{v_{i n}}{V_{o v}}  \tag{6.7}\\
\Sigma \equiv \frac{\sigma_{\Delta \phi}^{2}}{\tau}=8 \pi^{2} f_{0}^{2} \frac{k T}{I}\left(\frac{2}{V_{o v}}\left(\gamma_{N}+\gamma_{P}\right)+\frac{2}{V_{D D}}\right) . \tag{6.8}
\end{gather*}
$$

From the initial state of the comparator, in which the phase difference shift is zero, this system has the initial condition

$$
\begin{equation*}
f(0, \phi)=\delta(\phi) \tag{6.9}
\end{equation*}
$$

In addition, the boundary condition at the phase shift boundaries is given by

$$
\begin{equation*}
f(t, \pi)=f(t,-\pi)=0 \tag{6.10}
\end{equation*}
$$

because a trial of this process is excluded from the probability density once its value reaches a boundary.

The solution of this system is

$$
\begin{equation*}
f(t, \phi)=\frac{1}{\pi} e^{\frac{M}{\Sigma} \phi} \sum_{n=1}^{\infty} e^{-\lambda(n) t} \cos (k(n) \phi) \tag{6.11}
\end{equation*}
$$

where the coefficients $k(n)$ and $\lambda(n)$ are defined as

$$
\begin{equation*}
k(n)=n-\frac{1}{2}, \quad n \in \mathbb{N} \tag{6.12}
\end{equation*}
$$

$$
\begin{equation*}
\lambda(n)=\frac{M^{2}}{2 \Sigma}+\frac{\Sigma}{2} \times k(n)^{2} . \tag{6.13}
\end{equation*}
$$
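As a numerical sanity check of the series solution, the following minimal Python sketch evaluates a truncated form of (6.11) and compares its integral over $\phi$ with the closed-form survival series that appears later in (6.14). The function names, the truncation at 60 terms, and the normalized values of $M$ and $\Sigma$ are assumptions made for illustration only.

```python
import math

def k_n(n):
    """Coefficient k(n) = n - 1/2 from (6.12)."""
    return n - 0.5

def lambda_n(n, drift, diffusion):
    """Decay rate lambda(n) from (6.13); drift = M, diffusion = Sigma."""
    k = k_n(n)
    return drift ** 2 / (2.0 * diffusion) + 0.5 * diffusion * k ** 2

def density(t, phi, drift, diffusion, terms=60):
    """Truncated series (6.11) for f(t, phi)."""
    total = sum(
        math.exp(-lambda_n(n, drift, diffusion) * t) * math.cos(k_n(n) * phi)
        for n in range(1, terms + 1)
    )
    return math.exp(drift / diffusion * phi) * total / math.pi

def survival(t, drift, diffusion, terms=60):
    """Truncated series (6.14) for P[T > t]."""
    total = sum(
        (-1) ** (n + 1) * k_n(n) / lambda_n(n, drift, diffusion)
        * math.exp(-lambda_n(n, drift, diffusion) * t)
        for n in range(1, terms + 1)
    )
    return diffusion / math.pi * math.cosh(drift / diffusion * math.pi) * total

def integrate_density(t, drift, diffusion, steps=400):
    """Trapezoidal integral of f(t, phi) over -pi <= phi <= pi."""
    h = 2.0 * math.pi / steps
    values = [density(t, -math.pi + i * h, drift, diffusion) for i in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

if __name__ == "__main__":
    M, SIGMA = 0.3, 1.0  # hypothetical, normalized coefficients
    print(density(1.0, math.pi, M, SIGMA))   # vanishes at the absorbing boundary
    print(integrate_density(1.0, M, SIGMA))  # numerical integral of (6.11)
    print(survival(1.0, M, SIGMA))           # closed-form series (6.14)
```

The truncated series converges quickly once $\Sigma t$ is not too small; very close to $t=0$ many more terms are needed because the series must reproduce the delta function in (6.9).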

### 6.3.2 Comparison Time and Energy

Let $T$ be a random variable for the comparison time and $f(t)$ be a function representing the probability of the comparator still oscillating at time $t$. $f(t)$ is derived by integrating $f(t, \phi)$ in equation (6.11) along the $\phi$ axis:

$$
\begin{align*}
f(t) \equiv P[T>t] & =P[-\pi<\Phi(t)<\pi]=\int_{-\pi}^{\pi} f(t, \phi) d \phi \\
= & \frac{1}{\pi} \Sigma \cosh \left(\frac{M}{\Sigma} \pi\right) \sum_{n=1}^{\infty}(-1)^{n+1} \frac{k(n)}{\lambda(n)} e^{-\lambda(n) t} \tag{6.14}
\end{align*}
$$

By differentiating this with respect to $t$, we obtain the probability density function of $T$, $f_{T}(t)$, as

$$
\begin{equation*}
f_{T}(t)=\frac{d}{d t} P[T \leq t]=-f^{\prime}(t)=\frac{1}{\pi} \Sigma \cosh \left(\frac{M}{\Sigma} \pi\right) \sum_{n=1}^{\infty}(-1)^{n+1} k(n) e^{-\lambda(n) t} \tag{6.15}
\end{equation*}
$$

The average comparison time $E[T]$ is

$$
\begin{equation*}
E[T]=\int_{0}^{\infty} t f_{T}(t) d t=\frac{\pi \tanh \left(\frac{M}{\Sigma} \pi\right)}{M}=\frac{\pi^{2}}{\Sigma} S\left(\frac{M}{\Sigma}\right) \tag{6.16}
\end{equation*}
$$

where the scaling factor $S$, which is dependent on the ratio $M / \Sigma$, is defined as

$$
\begin{equation*}
S(k) \equiv \frac{\tanh (k \pi)}{k \pi} \tag{6.17}
\end{equation*}
$$

The function $S(k)$ is an even function with its maximum at $(0,1)$. Its value decreases as $|k|$ becomes larger, characterizing the automatic energy scaling behavior of this comparator. When $M=0$, i.e. $v_{i n}=0$, the average comparison time peaks at $\pi^{2} / \Sigma$.
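The scaling behavior of (6.16) and (6.17) can be illustrated with a short Python sketch. The function names are hypothetical, and the drift and diffusion coefficients are passed in normalized units; the sketch only restates the closed-form expressions above.

```python
import math

def scaling_factor(k):
    """Scaling factor S(k) = tanh(k*pi) / (k*pi) from (6.17), with S(0) = 1."""
    if k == 0:
        return 1.0
    x = k * math.pi
    return math.tanh(x) / x

def mean_comparison_time(drift, diffusion):
    """Average comparison time E[T] from (6.16); drift = M, diffusion = Sigma."""
    return math.pi ** 2 / diffusion * scaling_factor(drift / diffusion)

if __name__ == "__main__":
    for ratio in (0.0, 0.25, 0.5, 1.0, 2.0, 4.0):
        print(ratio, scaling_factor(ratio))
```

For large $|k|$ the factor approaches $1 /(|k| \pi)$, which is the hyperbolic decay that shortens the comparison for large inputs.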

The energy for a comparison is easily calculated from the comparison time. Because each edge draws current $I$ from the supply voltage $V_{D D}$ on average as it propagates, the comparator consumes an average power of

$$
\begin{equation*}
P=2 I V_{D D} \tag{6.18}
\end{equation*}
$$

Therefore, the average energy consumption per comparison is

$$
\begin{equation*}
E=P \times E[T]=2 I V_{D D} \frac{\pi^{2}}{\Sigma} S\left(\frac{M}{\Sigma}\right) \tag{6.19}
\end{equation*}
$$

### 6.3.3 Input Referred Noise

Let $g(t, \phi)$ be a function on the region $t \geq 0,-\pi \leq \phi \leq \pi$ whose value represents the probability for the comparator to finish its oscillation with a final phase difference shift of $\pi$, meaning output "high", when the current phase difference shift is $\phi$ at time $t$. By this definition, the value of this function must be independent of time $t$, because its dynamic behavior is only determined by the stationary random variable $\Delta \Phi(\Delta t)$ and current phase difference $\phi$. Then, $g(t, \phi)=g(\phi)$ satisfies the differential equation

$$
\begin{equation*}
M g^{\prime}(\phi)+\frac{\Sigma}{2} g^{\prime \prime}(\phi)=0 . \tag{6.20}
\end{equation*}
$$

Solving equation (6.20) with two boundary conditions

$$
\begin{equation*}
g(\pi)=1, g(-\pi)=0 \tag{6.21}
\end{equation*}
$$

that are clearly given by the earlier definition of $g$, we obtain

$$
\begin{equation*}
g(\phi)=\frac{e^{\frac{2 M}{\Sigma} \pi}-e^{\frac{-2 M}{\Sigma} \phi}}{e^{\frac{2 M}{\Sigma} \pi}-e^{\frac{-2 M}{\Sigma} \pi}} . \tag{6.22}
\end{equation*}
$$

Let $h\left(v_{\text {in }}\right) \equiv g(0)$ be a function representing the probability of the comparison result being "high" when the input voltage is $v_{i n}$. This function changes its value according to $v_{i n}$ as $M$ depends on $v_{i n}$.

$$
\begin{equation*}
\left.h\left(v_{i n}\right) \equiv g(0)\right|_{v_{i n}}=\frac{e^{\frac{2 M\left(v_{i n}\right)}{\Sigma} \pi}-1}{e^{\frac{2 M\left(v_{i n}\right)}{\Sigma} \pi}-e^{-\frac{2 M\left(v_{i n}\right)}{\Sigma} \pi}} . \tag{6.23}
\end{equation*}
$$
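The decision probability in (6.23) can be cross-checked with a simple Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: it discretizes the drift-diffusion process with a fixed time step (an Euler approximation, which slightly overshoots the boundaries), uses normalized values of $M$ and $\Sigma$, and all function names and default parameters are hypothetical.

```python
import math
import random

def p_high_closed_form(drift, diffusion):
    """Equation (6.23), rewritten in the equivalent form 1 / (1 + exp(-2*M*pi/Sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * drift * math.pi / diffusion))

def simulate_p_high(drift, diffusion, trials=2000, dt=0.02, seed=1):
    """Fraction of simulated comparisons that end at the +pi boundary."""
    rng = random.Random(seed)
    step_sigma = math.sqrt(diffusion * dt)  # std. dev. of the phase shift per step
    step_mean = drift * dt                  # mean phase shift per step
    highs = 0
    for _ in range(trials):
        phi = 0.0                           # initial condition (6.9)
        while -math.pi < phi < math.pi:     # absorbing boundaries (6.10)
            phi += step_mean + rng.gauss(0.0, step_sigma)
        if phi >= math.pi:
            highs += 1
    return highs / trials

if __name__ == "__main__":
    M, SIGMA = 0.2, 1.0  # hypothetical, normalized coefficients
    print(simulate_p_high(M, SIGMA), p_high_closed_form(M, SIGMA))
```

The simulated fraction should agree with the closed form to within the statistical error of the trial count plus a small bias from the finite time step.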

The comparator's input-referred noise voltage $v_{n}$ is a random variable with a probability density function $f_{v_{n}}(v)$ that satisfies

$$
\begin{equation*}
h\left(v_{\text {in }}\right)=\int_{-\infty}^{\infty} H\left(v_{i n}+v_{n}\right) f_{v_{n}}\left(v_{n}\right) d v_{n}=\int_{-v_{i n}}^{\infty} f_{v_{n}}\left(v_{n}\right) d v_{n} \tag{6.24}
\end{equation*}
$$

where $H\left(v_{i n}\right)$ is the Heaviside step function that models probability of the ideal noiseless comparator output, and therefore,

$$
\begin{equation*}
f_{v_{n}}\left(v_{n}\right)=h^{\prime}\left(-v_{n}\right)=h^{\prime}\left(v_{n}\right) \tag{6.25}
\end{equation*}
$$

because $h^{\prime}$ is an even function.

Then, the comparator's input-referred noise power $\sigma_{v_{n}}^{2}$ is obtained as

$$
\begin{align*}
& \sigma_{v_{n}}^{2}=\int_{-\infty}^{\infty} v_{n}^{2} f_{v_{n}}\left(v_{n}\right) d v_{n}=\int_{-\infty}^{\infty} v_{n}^{2} h^{\prime}\left(v_{n}\right) d v_{n} \\
&=\frac{\pi^{2}}{3} f_{0}^{2}\left(\frac{k T}{I}\right)^{2}\left(\frac{2}{V_{o v}}\left(\gamma_{N}+\gamma_{P}\right)+\frac{2}{V_{D D}}\right)^{2} V_{o v}^{2} \tag{6.26}
\end{align*}
$$
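The integral in (6.26) can be evaluated by recognizing $h$ as a logistic function. The following worked steps are a sketch of one way to reach the result; the shorthand symbols $X$ and $c$ are introduced here only for compactness and do not appear elsewhere in this chapter. With $X \equiv \frac{2}{V_{o v}}\left(\gamma_{N}+\gamma_{P}\right)+\frac{2}{V_{D D}}$, equations (6.7) and (6.8) give

$$
\begin{equation*}
\frac{M}{\Sigma}=\frac{v_{i n}}{2 \pi f_{0} \frac{k T}{I} X V_{o v}}, \qquad h\left(v_{i n}\right)=\frac{1}{1+e^{-c v_{i n}}}, \qquad c \equiv \frac{2 \pi}{v_{i n}} \frac{M}{\Sigma}=\frac{1}{f_{0} \frac{k T}{I} X V_{o v}} .
\end{equation*}
$$

Its derivative $h^{\prime}(v)=c e^{-c v} /\left(1+e^{-c v}\right)^{2}$ is the density of a logistic distribution with scale $1 / c$, whose variance is known to be $\pi^{2} /\left(3 c^{2}\right)$, so that

$$
\begin{equation*}
\sigma_{v_{n}}^{2}=\frac{\pi^{2}}{3 c^{2}}=\frac{\pi^{2}}{3} f_{0}^{2}\left(\frac{k T}{I}\right)^{2} X^{2} V_{o v}^{2},
\end{equation*}
$$

which agrees with (6.26).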

### 6.4 Discussion on Characteristics of Edge-Pursuit Comparator

The analysis in Section 6.3 reveals some useful characteristics of the edge-pursuit comparator compared to conventional comparators, which are discussed in this section. In addition, to evaluate the energy efficiency of the EPC and compare it to conventional clocked comparator topologies [37]-[39], energy efficiency norm values are estimated for EPC and conventional comparators and compared to each other.

### 6.4.1 Input Noise Tunability

From (6.26), the noise rms level $\sigma_{v_{n}}$ is

$$
\begin{equation*}
\sigma_{v_{n}}=\sqrt{\sigma_{v_{n}}^{2}}=\frac{\pi}{\sqrt{3}} f_{0} \frac{k T}{I}\left(\frac{2}{V_{o v}}\left(\gamma_{N}+\gamma_{P}\right)+\frac{2}{V_{D D}}\right) V_{o v} \tag{6.27}
\end{equation*}
$$

Note that the rms level of the comparator's input-referred noise is proportional to $f_{0} / I$, which is inversely proportional to the total capacitor size throughout the oscillator. Using this characteristic, one can easily tune the required input-referred noise level across a wide range for this comparator topology during both design time and runtime. On the other hand, other comparators usually require the tuning of design factors inversely proportional to the required noise power, rather than the rms level, which renders wide-range noise tuning more difficult. Figure 6.7(a) shows an example of changing the total capacitance by changing the size of each inverter cell, and as expected, the noise rms level roughly follows the inverse of the total capacitance.
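The $f_{0} / I$ dependence in (6.27) can be made concrete with a small Python sketch. The function name, the default temperature of 300 K, and the example operating point are assumptions chosen for illustration and are not taken from the measured design.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def input_noise_rms(f0, i_bias, v_ov, v_dd, gamma_n, gamma_p, temp_k=300.0):
    """Input-referred rms noise voltage from (6.27), in volts."""
    bracket = 2.0 / v_ov * (gamma_n + gamma_p) + 2.0 / v_dd
    return (math.pi / math.sqrt(3.0)) * f0 * (K_BOLTZMANN * temp_k / i_bias) * bracket * v_ov

if __name__ == "__main__":
    # Hypothetical operating point: 10 MHz oscillation, 1 uA per edge, 0.1 V overdrive, 1 V supply.
    base = input_noise_rms(10e6, 1e-6, 0.1, 1.0, 2.0 / 3.0, 2.0 / 3.0)
    # Halving f0 at the same current (for example, by doubling the capacitance) halves the rms noise.
    slower = input_noise_rms(5e6, 1e-6, 0.1, 1.0, 2.0 / 3.0, 2.0 / 3.0)
    print(base, slower, slower / base)
```

Because the rms level itself, not the noise power, scales with $f_{0} / I$, a given change in capacitance maps directly onto the same relative change in rms noise.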


Figure 6.7 Simulated input-referred noise vs. (a) delay cell size and (b) number of delay cells.

Another example in Figure 6.7(b) changes the total capacitance by changing the number of delay cells in the comparator. However, simulated results show that the noise changes more sensitively than expected, which is due to positive feedback on the phase difference shift. The phase difference shift changes the time that each stage output stays at 0 and $V_{D D}$, during which the internal nodes of each delay cell are reset. If this reset time becomes too short, the nodes in the delay cell cannot be completely reset, accelerating the phase difference shift in its present direction. As shown in the graph, this positive feedback affects comparators with a small number of stages more strongly because their reset time is shorter. This mechanism is similar to the regeneration of the output signal in conventional regenerative comparators, but this phase regeneration does not consume much energy, whereas voltage regeneration in conventional comparators consumes a fixed amount of dynamic energy. For this reason, the energy efficiency of the edge-pursuit comparator is maintained even for designs with a small number of stages. This positive feedback mechanism further increases the tunable noise range, showing a $12.5 \times$ noise level change only by changing the number of stages from 8 to 14, while conventional comparators require more than $100 \times$ design parameter tuning for a similar noise level change.

### 6.4.2 Automatic Energy Scaling

According to (6.19), the EPC's energy consumption depends on the energy scaling factor $S(M / \Sigma)$. To estimate how much energy is actually saved in typical applications, we obtain the relationship between $M / \Sigma$ and $v_{i n}$ from equations (6.7), (6.8), and (6.27),

$$
\begin{equation*}
\frac{M}{\Sigma}=\frac{1}{\sqrt{12}} \frac{v_{i n}}{\sigma_{v_{n}}} . \tag{6.28}
\end{equation*}
$$

Taking this equation together with the graph of the scaling factor $S$ in Figure 6.8 into account, the energy scaling factor remains around 1 when $v_{\text {in }}$ is within the noisy region, but once $v_{\text {in }}$ goes outside the noisy region, it decreases quickly towards 0 in a hyperbolic manner. Therefore, the comparator saves almost all of its energy over most of the input voltage range, except for a small noisy region that is usually within the $\mu \mathrm{V}$ to mV range.
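The ratio in (6.28) follows from a short substitution, sketched here with the shorthand $X \equiv \frac{2}{V_{o v}}\left(\gamma_{N}+\gamma_{P}\right)+\frac{2}{V_{D D}}$, which is introduced only for this step. Dividing (6.7) by (6.8) and using (6.27) in the form $f_{0} \frac{k T}{I} X V_{o v}=\frac{\sqrt{3}}{\pi} \sigma_{v_{n}}$ gives

$$
\begin{equation*}
\frac{M}{\Sigma}=\frac{4 \pi f_{0} \frac{v_{i n}}{V_{o v}}}{8 \pi^{2} f_{0}^{2} \frac{k T}{I} X}=\frac{v_{i n}}{2 \pi f_{0} \frac{k T}{I} X V_{o v}}=\frac{v_{i n}}{2 \sqrt{3} \sigma_{v_{n}}}=\frac{1}{\sqrt{12}} \frac{v_{i n}}{\sigma_{v_{n}}} .
\end{equation*}
$$

For instance, an input of $v_{i n}=\sqrt{12} \sigma_{v_{n}}$ corresponds to $M / \Sigma=1$ and therefore to $S(1)=\tanh (\pi) / \pi \approx 0.317$.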


Figure 6.8 Graph of the EPC's scaling factor $S(k)$.


Figure 6.9 Comparisons during SAR ADC conversion in the energy-worst case.

For example, assuming an application of the EPC in a SAR ADC where the comparator is designed to have the same noise level as the quantization noise, a comparison with $v_{i n}=LSB=\sqrt{12} \sigma_{v_{n}}$ consumes only $\sim 0.317$ times the energy of a comparison with $v_{i n}=0$. Even assuming the worst case of consuming the maximum comparison energy, where the comparison inputs fall alternately below and above 0 and finally finish with a comparison exactly at $v_{i n}=0$ as shown in Figure 6.9, the calculated total energy for all comparisons ($v_{i n}=(1 / \sqrt{12})\, LSB, 1\, LSB, 2\, LSB, \cdots, 2^{14}\, LSB$) during a single ADC conversion is only $\sim 1.86$ times the energy of a single comparison with $v_{i n}=0$.

### 6.4.3 Energy vs. Noise Efficiency

To evaluate the edge-pursuit comparator's energy efficiency and compare it to other comparators, we shall define a norm for comparison

$$
\begin{equation*}
N \equiv E \times \frac{\sigma_{v_{n}}^{2}}{V_{D D}^{2}} \tag{6.29}
\end{equation*}
$$

which means the energy consumption per SNR assuming maximum signal power is $V_{D D}^{2}$. From (6.8), (6.19) and (6.26), we get the norm value for the EPC:

$$
\begin{equation*}
N_{E P C}=\frac{\pi^{2}}{6} S\left(\frac{M}{\Sigma}\right) k T \frac{\left(V_{D D}\left(\gamma_{N}+\gamma_{P}\right)+V_{o v}\right) V_{o v}}{V_{D D}^{2}} . \tag{6.30}
\end{equation*}
$$

Assuming $\gamma_{N}=\gamma_{P}=\gamma$ and $V_{D D}\left(\gamma_{N}+\gamma_{P}\right) \gg V_{o v}$, (6.30) simplifies to

$$
\begin{equation*}
N_{E P C} \cong \frac{\pi^{2}}{3} S\left(\frac{M}{\Sigma}\right) k T \gamma \frac{V_{o v}}{V_{D D}}, \tag{6.31}
\end{equation*}
$$

which has the dimension of energy in the form of $k T$ multiplied by some design factors.

From Nuzzo's analysis [50] on a single-stage regenerated comparator [38] illustrated in Figure 6.10(a), its input-referred voltage noise power $\sigma_{v_{n}}^{2}$ is derived as

$$
\begin{equation*}
\sigma_{v_{n}}^{2}=\sigma_{M_{1}}^{2}+\sigma_{S_{1}}^{2}+\sigma_{M_{3-5}}^{2}+\sigma_{S_{3}}^{2} \tag{6.32}
\end{equation*}
$$



Figure 6.10 Conventional dynamic comparators. (a) Single-stage [38], [50]. (b) Two-stage [39].

Assuming the comparator's noise is optimized enough so that $\sigma_{M_{1}}^{2}$ becomes dominant, (6.32) is simplified to

$$
\begin{equation*}
\sigma_{v_{n S S}}^{2} \cong \sigma_{M_{1}}^{2}=\frac{2 k T \gamma}{C_{X}} \frac{V_{o v}}{V_{t h}}, \tag{6.33}
\end{equation*}
$$

where $V_{t h}$ and $V_{o v}$ correspond to $V_{T n 3}$ and $V_{o v 1,1}$ in the original equation in [50], i.e., the threshold voltage of $M_{3}$ and the overdrive voltage of $M_{1}$ during comparison phase 1 as defined in [50], respectively.

During a comparison, this comparator discharges $X_{1}$ and $X_{2}$ from $V_{D D}$ to 0. One of the two output nodes is also fully discharged, while the other output node is discharged down to around half of $V_{D D}$, where the currents through $M_{3-4}$ and $M_{5-6}$ are balanced. Assuming most of the comparison energy is used in recharging these nodes, this comparator consumes energy

$$
\begin{equation*}
E_{S S} \cong\left(2 C_{X}+1.5 C_{O}\right) V_{D D}^{2} \tag{6.34}
\end{equation*}
$$

per comparison. Using (6.29), this comparator's performance norm is

$$
\begin{equation*}
N_{S S}=E_{S S} \times \frac{\sigma_{v_{n S S}}^{2}}{V_{D D}^{2}} \cong 4 k T \gamma\left(1+\frac{3}{4} \frac{C_{O}}{C_{X}}\right) \frac{V_{o v}}{V_{t h}} \tag{6.35}
\end{equation*}
$$

and following the assumption $C_{O} \cong C_{X}$ in [50], it is further simplified to

$$
\begin{equation*}
N_{S S} \cong 7 k T \gamma \frac{V_{o v}}{V_{t h}} . \tag{6.36}
\end{equation*}
$$

From Elzakker's analysis [39] on a two-stage comparator illustrated in Figure 6.10(b), its input-referred voltage noise power $\sigma_{v_{n} T S}^{2}$ is derived as

$$
\begin{equation*}
\sigma_{v_{n} T S}^{2}=4 k T \frac{1}{V_{t h} C_{F}} \frac{I}{g_{m}}=\frac{2 k T \gamma}{C_{F}} \frac{V_{o v}}{V_{t h}} \tag{6.37}
\end{equation*}
$$

by substituting $g_{m}$ with $2 I / V_{o v}$ and restoring the factor $\gamma$, which is omitted in [39] under the assumption $\gamma=1$. During a comparison, the drain nodes of the two input transistors, each of which is connected to a large capacitor $C_{F}$, are discharged from $V_{D D}$ to 0. Therefore, the energy to replenish these capacitors dominates the total energy consumption, which is

$$
\begin{equation*}
E_{T S} \cong 2 C_{F} V_{D D}^{2} \tag{6.38}
\end{equation*}
$$

per comparison. Therefore, the comparator's performance norm is

$$
\begin{equation*}
N_{T S}=E_{T S} \times \frac{\sigma_{v_{n} T S}^{2}}{V_{D D}^{2}}=4 k T \gamma \frac{V_{o v}}{V_{t h}} . \tag{6.39}
\end{equation*}
$$

All comparators' performance norms estimated in (6.31), (6.36) and (6.39) share the same dimension and similar form factored by $k T \gamma$. Ratios among those norms are

$$
\begin{equation*}
N_{E P C}: N_{S S}: N_{T S}=S\left(\frac{M}{\Sigma}\right) \frac{\pi^{2} / 3}{V_{D D}}: \frac{7}{V_{t h}}: \frac{4}{V_{t h}}, \tag{6.40}
\end{equation*}
$$

showing that the EPC has a smaller norm than the other two, even when $v_{\text {in }}=0$ and $S=1$, where the EPC does not benefit from the scaling factor $S$ at all. This efficiency gain comes from

(a) Saving energy used for output regeneration (versus a single-stage comparator only), by $4 / 7$

(b) Increased voltage integration swing from $V_{t h}$ to $V_{D D} / 2$, by $V_{t h} /\left(V_{D D} / 2\right)$

(c) EPC's bidirectional operation similar to [51], where both pull-up and pull-down currents are used for phase integration, by $1 / 2$

(d) Fixed phase difference shift threshold for an output decision that prevents a decision with insufficient signal integration, by $\pi^{2} / 12$.

In addition to the above, the EPC can further reduce the average comparison energy due to automatic energy scaling. For example, the EPC consumes only around $1.86 \times$ the energy of a single $v_{i n}=0$ comparison per SAR ADC conversion, as discussed in Section 6.4.2, which is comparable to the energy of only a single comparison in other comparators.
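To get a feel for the ratios in (6.40), the following Python sketch evaluates the three norms (6.31), (6.36), and (6.39) in units of $k T \gamma$. The supply, threshold, and overdrive voltages used in the example are hypothetical values chosen only for illustration, and the function names are not part of the original analysis.

```python
import math

def norm_epc(v_ov, v_dd, s=1.0):
    """EPC norm (6.31) in units of kT*gamma; s is the scaling factor S(M/Sigma)."""
    return (math.pi ** 2 / 3.0) * s * v_ov / v_dd

def norm_single_stage(v_ov, v_th):
    """Single-stage comparator norm (6.36) in units of kT*gamma."""
    return 7.0 * v_ov / v_th

def norm_two_stage(v_ov, v_th):
    """Two-stage comparator norm (6.39) in units of kT*gamma."""
    return 4.0 * v_ov / v_th

if __name__ == "__main__":
    V_DD, V_TH, V_OV = 1.0, 0.4, 0.1  # hypothetical voltages, not from the measured design
    epc = norm_epc(V_OV, V_DD)        # worst case S = 1, i.e. v_in = 0
    print("EPC / single-stage:", epc / norm_single_stage(V_OV, V_TH))
    print("EPC / two-stage:   ", epc / norm_two_stage(V_OV, V_TH))
```

With equal overdrive voltages, the EPC to two-stage ratio reduces to $\left(\pi^{2} / 12\right)\left(V_{t h} / V_{D D}\right)$, which is the product of the factors listed in (b) through (d).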

Figure 6.11 shows the simulation results of the three comparators. With similar levels of input-referred noise (Figure 6.11(a)), the edge-pursuit comparator shows large energy savings from automatic energy scaling, while the other two comparators show nearly constant energy consumption. As shown in Figure 6.11(b), the ratio among the energy consumptions at $v_{i n}=0$ does not match equation (6.40), because some important design parameters such as $V_{o v}$ differ among the simulated designs. If $V_{o v}$ is matched by increasing the size of the MOSFET $\mathrm{M}_{\mathrm{CLK}}$, the ratio of the energy consumptions becomes similar to equation (6.40).


Figure 6.11 Comparison of simulated comparator performances among EPC and conventional 1-stage [38], [50] and 2-stage [39] comparators. (a) Probability for output "high", inferring input-referred noise. (b) Comparison energy vs. input signal difference.

### 6.4.4 Offset

MOSFET mismatch causes an input-referred offset voltage $V_{O S}$ in each delay cell, which appears as a small delay difference $\Delta t_{d}$ in that cell. Since the mismatch factors of the delay cells are uncorrelated, the standard deviation of the accumulated delay difference when an edge runs a lap of the $N$-stage delay cells is $\sqrt{N} \cdot \Delta t_{d}$. According to [42], the voltage-to-time gain of the $N$-stage delay cells is $N \cdot \Delta t_{d} / V_{O S}$; thus the input-referred offset voltage of the $N$-stage delay cells, $V_{O S, N}$, is

$$
\begin{equation*}
V_{O S, N}=\frac{\sqrt{N} \cdot \Delta t_{d}}{N \cdot \Delta t_{d} / V_{O S}}=\frac{1}{\sqrt{N}} \cdot V_{O S} \tag{6.41}
\end{equation*}
$$

Therefore, the input-referred offset voltage depends on the total area of the delay cells. Figure 6.12 shows Monte Carlo simulation results, in which the offset voltage decreases as the number of delay cells increases.
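The $1 / \sqrt{N}$ scaling of (6.41) can also be reproduced with a toy statistical model. The sketch below is not the circuit-level Monte Carlo simulation of Figure 6.12; it simply assumes that each delay cell contributes an independent Gaussian delay error with standard deviation $\Delta t_{d}$, and all names and default values are hypothetical.

```python
import math
import random
import statistics

def simulated_offset_sigma(n_stages, v_os=1.0, dt_d=1.0, trials=20000, seed=0):
    """Std. dev. of the input-referred offset of an N-stage loop, in the units of v_os."""
    rng = random.Random(seed)
    gain = n_stages * dt_d / v_os  # voltage-to-time gain of the N-stage delay cells
    offsets = []
    for _ in range(trials):
        # Accumulated delay difference over one lap: sum of uncorrelated per-cell errors.
        lap_delay_error = sum(rng.gauss(0.0, dt_d) for _ in range(n_stages))
        offsets.append(lap_delay_error / gain)
    return statistics.pstdev(offsets)

if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, simulated_offset_sigma(n), 1.0 / math.sqrt(n))
```

In this normalized model the simulated standard deviation tracks $V_{O S} / \sqrt{N}$ to within the sampling error of the trial count.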


Figure 6.12 Simulated input-referred offset voltage vs. number of delay cells.


Figure 6.13 15-bit SAR ADC architecture with EPC and dual CDAC for high resolution.

### 6.5 SAR ADC with Edge-Pursuit Comparator

The EPC is applied to a 15-bit high-resolution synchronous SAR ADC that is composed of a CDAC and digital logic, as shown in Figure 6.13. The EPC uses 16 delay cells without load capacitors. The EPC has a metastability issue because its comparison time changes automatically according to the input voltage difference. Both the performance and the comparison time of the EPC are maximized in the metastable condition. For this reason, the sampling rate of the ADC should be chosen by considering the comparison time when the input voltage difference is very small. Figure 6.14 shows the transient noise simulation result when the input voltage difference is 0. Most comparisons finish within $3.5\ \mu \mathrm{s}$, thus the sampling rate of the ADC is set to $20\ \mathrm{kS} / \mathrm{s}$.


Figure 6.14 Probability distribution function of the comparison time at $v_{i n}=0$.

The CDAC of the SAR ADC consists of a 10-bit coarse CDAC, a 5-bit fine CDAC, and a 9-bit common-mode-to-differential gain tuning CDAC. The unit capacitance of the coarse and fine CDACs is 16 fF, and the unit capacitance of the tuning CDAC is 4 fF. The 10-bit differential CDAC is implemented using a split capacitor array [52] to reduce the switching power. The 5-bit fine CDAC shares top plates ($\mathrm{V}_{\mathrm{INP}}$, $\mathrm{V}_{\mathrm{INM}}$) with the coarse CDAC and has the same unit capacitor size as the coarse CDAC. An intentional difference between the tuning capacitors $\mathrm{C}_{\mathrm{TUNEP}}$ and $\mathrm{C}_{\mathrm{TUNEM}}$ induces a small differential voltage change as the shared bottom plates of the fine CDAC switch, allowing high resolution without significantly increasing the overall CDAC capacitance.


Figure 6.15 Operation of the 5-bit fine CDAC during fine-bit decision. (a) Initial state. (b) After a comparison with "COMP = 0". (c) After another comparison with "COMP = 0".

Differing from a conventional bridge-capacitor technique [53], [54], the 5-bit fine CDAC has shared bottom plates for each pair of capacitors. Figure 6.15 shows the detailed operation of the bottom node switching. First, all bottom nodes of the fine CDAC are reset to GND during the sampling phase. After finishing the 10-bit MSB decision using the differential CDAC, the shared bottom node of the fine CDAC's MSB is set to VDD (Figure 6.15(a)). This changes the differential input voltage into the comparator by half an LSB of the coarse CDAC (Figure 6.16, upwards arrow) to form a middle point for the 11th bit decision. After the comparison, the voltage of the MSB bottom node is set according to the comparison result (0 in the example in Figure 6.15 and Figure 6.16), and the bottom node of the second MSB is switched to VDD for the 12th bit decision (Figure 6.15(b)). In this manner, the fine CDAC switches its shared bottom nodes in the same way as a usual single-ended SAR ADC (Figure 6.15(c)). Because the bottom node switching injects the same charge into both CDAC top output nodes, switching a capacitor only shifts the common-mode voltage of the two output nodes and does not impact the SAR operation as long as the two CDACs are completely matched. However, by creating a small imbalance between the total capacitance to ground of the two CDAC output nodes, this common-mode shift also translates into a small differential voltage difference (Figure 6.16). This common-mode charge injection to differential voltage gain is fine-tuned using the two tuning capacitors $\mathrm{C}_{\mathrm{TUNEP}}$ and $\mathrm{C}_{\mathrm{TUNEM}}$ (Figure 6.13).
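The fine-bit decision sequence described above can be summarized by a purely behavioral Python sketch. It abstracts away the capacitor network entirely: the residue left by the coarse decision is normalized to one coarse LSB, each fine bit is assumed to shift the differential input by an ideal binary-weighted fraction of that LSB, and the comparator polarity is an assumption. The function name and arguments are hypothetical.

```python
def fine_sar_search(residue, n_bits=5):
    """Resolve n_bits fine bits of a residue normalized to one coarse LSB (0 <= residue < 1).

    Behavioral model only: setting the shared bottom plate of bit b to VDD is
    assumed to shift the differential input by 2**b / 2**n_bits coarse LSBs.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)          # switch this bit's shared bottom plate to VDD
        shift = trial / (1 << n_bits)      # resulting differential shift, in coarse LSBs
        comp = 1 if residue - shift >= 0 else 0  # assumed comparator polarity
        if comp:
            code = trial                   # bottom plate stays at VDD
        # otherwise the bottom plate returns to GND and the code is unchanged
    return code

if __name__ == "__main__":
    # The first trial shifts the input by half a coarse LSB (the 11th bit decision).
    print(fine_sar_search(0.30), fine_sar_search(0.75))
```

In the actual design, the size of each differential step is set by the capacitance imbalance tuned through the tuning capacitors rather than by an ideal binary weight, so this sketch only captures the order of decisions.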


Figure 6.16 Operation principle of the fine-bit CDAC generating small voltage change.

As depicted in Figure 6.17, both the bridge-capacitor technique and the common-mode CDAC technique use tuning capacitor arrays to control the fine-to-coarse CDAC gain. Tuning switches in these capacitor arrays have parasitic capacitances whose values vary as the corresponding top voltage changes, possibly injecting harmonic voltage distortion into the sampled signal. Compared to the bridge-capacitor technique, the common-mode CDAC technique is less affected by this distortion because the top plate is shared between the coarse and fine CDACs, so the voltage across the tuning capacitors always stays near the input common-mode voltage whenever a fine comparison is performed (Figure 6.17(b)). On the other hand, the top plate voltage of the fine CDAC does not converge to the same level in the bridge-capacitor technique, so the value of the parasitic capacitance can vary more. Hence, the proposed top-plate-shared fine CDAC structure shows improved linearity over the bridge-capacitor technique by reducing the non-linearity introduced by the non-linear parasitic capacitance of the switches controlling $\mathrm{C}_{\text {TUNEP }}$ and $\mathrm{C}_{\text {TUNEM }}$.


Figure 6.17 Comparison between techniques for high-resolution CDAC. (a) Bridge-capacitor scheme [53], [54] (b) Presented common-mode switching CDAC.

The 5-bit CDAC with common-mode shifting shares its top node, thus a mismatch of the fine CDAC can cause more error than in the bridge-capacitor technique. However, the total size of the fine CDAC is 16 times smaller than the total CDAC size, and the mismatch factor of the MOM capacitors used to implement the CDAC is $0.2 \%$. If we assume that the mismatch error follows a Gaussian distribution, the mismatch error caused by the fine CDAC does not have a significant effect on the 15-bit CDAC. Also, the mismatch error can be calibrated using several techniques, such as a redundancy capacitor. On the other hand, it is very hard to reduce the non-linearity from the voltage-dependent capacitance of the switches.

The common-mode rejection ratio is also an important design factor for the fine comparison. The maximum common-mode voltage change during the fine comparison is about 30 mV, which can change the noise and delay of the current-limiting MOSFETs. However, the EPC has a symmetric structure: the two edges in the EPC propagate through the same delay cells, and each delay cell has both PMOS and NMOS current-limiting transistors. For this reason, the two propagating edges have the same delay when the input voltage difference is 0, even when the common-mode voltage is shifted. Therefore, the input-referred offset of the EPC caused by the common-mode shift is negligible. Also, the 9-bit tuning range of the common-mode-to-differential gain is made sufficiently wide to cover this common-mode change.

### 6.6 Measured Results

The ADC with the EPC was fabricated in a 40 nm CMOS process with a total area of $0.315\ \mathrm{mm}^{2}$ (Figure 6.18). Shown as a white dot in the middle, the EPC has a very small area of $54\ \mu \mathrm{m}^{2}$, considering its low noise level. The EPC is not located at the center between the CDACs because many digital signal lines run through the center; the EPC is therefore moved down to avoid noise coupling from these lines. The different distances between the plus/minus CDACs and the comparator lead to different parasitic capacitances at the CDAC top nodes, but these capacitances are much smaller than the unit capacitance, and the sampling rate of the ADC is slow.


Figure 6.18 Die photograph of the 15-bit SAR ADC with EPC.


Figure 6.19 Measured average comparison energy of the EPC vs. SAR ADC bit position.

Figure 6.19 shows the measured average comparison energy for each bit position. The comparison energies for the MSB and LSB bit positions differ by more than 67 times, demonstrating wide-range automatic energy scaling. Measured ADC results show a maximum DNL/INL of 1.9/5.2 LSB and a minimum DNL/INL of -1/-3.2 LSB (Figure 6.20). The values of the tuning capacitors $\mathrm{C}_{\mathrm{TUNEP}}$ and $\mathrm{C}_{\mathrm{TUNEM}}$ are first set approximately by checking the output code of the ADC while a very slow, small-amplitude input is applied, and are then optimized for the best DNL and INL. However, the tuning capacitors cannot remove the missing codes perfectly, which limits the ENOB to 12 bits.


Figure 6.20 Measured DNL and INL


Figure 6.21 Measured SNDR and SFDR vs. input signal frequency.

SFDR and SNDR are 95.1 dB and 74.12 dB at the Nyquist frequency (Figure 6.21), which corresponds to an ENOB of 12.02 bits. Figure 6.22 shows the measured frequency spectrum when the input frequency is 0.999 kHz (Figure 6.22(a)) and 9.999 kHz (Figure 6.22(b)). The measured frequency spectrum shows that spurs increase when the input signal frequency is reduced, because the bandpass filter provides lower harmonic suppression and larger signal attenuation at low input frequencies. Figure 6.23 shows that the EPC consumes 104 nW at the Nyquist frequency, representing only $8.9 \%$ of the total ADC power of $1.17\ \mu \mathrm{W}$.

Figure 6.22 Measured frequency spectrum. (a) $f_{i n}=0.999$ kHz. (b) $f_{i n}=9.999$ kHz.


Figure 6.23 Measured power consumption of the SAR ADC at Nyquist frequency.

Table 6.1 ADC Performance summary and comparison.

|  | This work | Tai, ISSCC 2014 | Harpe, ISSCC 2014 (12 b) | Harpe, ISSCC 2014 (14 b) | Lim, ISSCC 2015 | Liu, ISSCC 2016 | Lee, JSSC 2011 | Bannon, VLSIC 2014 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Technology | 40 nm | 40 nm | 65 nm | 65 nm | 65 nm | 28 nm | 180 nm | 180 nm |
| Architecture | SAR | SAR | SAR | SAR | Pipelined-SAR | Pipelined-SAR | SAR | Pipelined-SAR |
| Resolution (bits) | 15 | 10 | 12 | 14 | 13 | 12 | 10 | 18 |
| Sampling rate, $F_{S}$ (kS/s) | 20 | 200 | 32 | 32 | 50000 | 100000 | 100 | 5000 |
| Input voltage range (differential) | $1.8\ \mathrm{V}_{\mathrm{pk-pk}}$ | Not reported | Not reported | Not reported | $2.4\ \mathrm{V}_{\mathrm{pk-pk}}$ | $1.8\ \mathrm{V}_{\mathrm{pk-pk}}$ | $1.2\ \mathrm{V}_{\mathrm{pk-pk}}$ | $10\ \mathrm{V}_{\mathrm{pk-pk}}$ |
| Area ($\mathrm{mm}^{2}$) | 0.315 | $0.0065^{*}$ | 0.18 | 0.18 | $0.0544^{*}$ | $0.0047^{*}$ | 0.125 | 5.74 |
| SFDR (dB) | 95.1 | 76.25 | 78.4 | 78.5 | 84.6 | 75.4 | 67 | Not reported |
| SNDR (dB) | 74.12 | 55.63 | 67.8 | 69.7 | 70.9 | 64.43 | 57.7 | 98.6 |
| ENOB, (SNDR - 1.76)/6.02 (bits) | 12.02 | 8.95 | 10.97 | 11.29 | 11.5 | 10.39 | 9.3 | 16.09 |
| $\mathrm{INL}_{\max }$ (LSB) | 5.5 | 0.45 | 0.82 | 3.50 | 0.96 | 0.82 | 0.8 | 0.52 |
| $\mathrm{DNL}_{\max }$ (LSB) | 1.9 | 0.44 | 0.58 | 1.75 | 0.58 | 0.53 | 0.4 | 0.10 |
| Total power ($\mu \mathrm{W}$) | 1.17 | 0.084 | 0.310 | 0.352 | 1000 | 350 | 1.3 | 60520 |
| Critical power$^{**}$ ($\mu \mathrm{W}$) | 0.104 | $0.025^{***}$ | $0.124^{***}$ | $0.141^{***}$ | $336^{***}$ | $119^{***}$ | $0.130^{***}$ | Not reported |
| $\mathrm{FoM}_{\mathrm{S}}$ (dB) | 173.4 | 176.8 | 174.9 | 176.3 | 174.9 | 176 | 163.6 | 177.7 |
| $\mathrm{FoM}_{\mathrm{C}}$ (dB)$^{****}$ | 184 | 181.6 | 178.9 | 180.2 | 179.7 | 180.6 | 173.6 | - |

$^{*}$ Active area. $^{**}$ Power for the noise-critical block (comparator in SAR, amplifier in pipelined SAR). $^{***}$ Value calculated from the paper/presentation material. $^{****}$ $\mathrm{SNDR}+10 \log \left(F_{S} /(2 \times \text{Critical power})\right)$.

Table 6.1 summarizes the performance of the implemented ADC with the EPC and compares it with other similar works on SAR or pipelined-SAR ADCs. To compare the EPC's efficiency more clearly with other ADCs adopting different architectures, a new figure of merit, named $\mathrm{FoM}_{\mathrm{C}}$, is derived from the Schreier FoM. In $\mathrm{FoM}_{\mathrm{C}}$, the total power consumption term of the original Schreier FoM is replaced by the power consumption of the noise-critical blocks only, such as the comparators in SAR ADCs, amplifiers in pipelined ADCs, and integrators in delta-sigma ADCs, placing more emphasis on the energy efficiency of the noise-critical blocks and excluding the logic and CDAC power.

$$
\begin{equation*}
F o M_{C} \equiv S N D R+10 \log \left(\frac{F_{S}}{2 \times(\text { Critical Power })}\right) \tag{6.42}
\end{equation*}
$$

While the $\mathrm{FoM}_{\mathrm{S}}$ of the ADC is 173.4 dB , the $\mathrm{FoM}_{\mathrm{C}}$ of the EPC in this SAR ADC is calculated to be 184 dB , which compares favorably to other similar designs. This underscores the applicability of the EPC to other low-power SAR ADC topologies.
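The figures of merit quoted for this work can be reproduced from the measured values in Table 6.1. The short Python sketch below is only a convenience check; the function names are hypothetical, and the formulas are the ENOB expression from the table and the Schreier-type expression of (6.42).

```python
import math

def enob_bits(sndr_db):
    """ENOB from SNDR, using (SNDR - 1.76) / 6.02 as in Table 6.1."""
    return (sndr_db - 1.76) / 6.02

def schreier_fom_db(sndr_db, fs_hz, power_w):
    """SNDR + 10*log10(Fs / (2 * power)).

    With the total ADC power this gives FoM_S; with only the power of the
    noise-critical block it gives FoM_C as defined in (6.42).
    """
    return sndr_db + 10.0 * math.log10(fs_hz / (2.0 * power_w))

if __name__ == "__main__":
    SNDR_DB, FS_HZ = 74.12, 20e3            # measured values for this work
    TOTAL_W, CRITICAL_W = 1.17e-6, 104e-9   # total ADC power and EPC power
    print("ENOB  :", enob_bits(SNDR_DB))
    print("FoM_S :", schreier_fom_db(SNDR_DB, FS_HZ, TOTAL_W))
    print("FoM_C :", schreier_fom_db(SNDR_DB, FS_HZ, CRITICAL_W))
    print("EPC share of total power:", CRITICAL_W / TOTAL_W)
```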

### 6.7 Conclusion

An energy-efficient comparator, named the edge-pursuit comparator (EPC), with an automatic energy scaling capability according to the input difference is presented. Capacitors in the oscillation path are recycled many times during phase-based operation, which allows for accurate comparisons with a small area and total capacitance. Bidirectional signal integration naturally occurs as edges propagate, offering extra efficiency gain. A 15-bit SAR ADC using a small EPC of $54 \mu \mathrm{m}^{2}$ shows a 74.12 dB SNDR and a 173.4 dB $\mathrm{FoM}_{\mathrm{S}}$ at the Nyquist frequency of 10 kHz. The EPC shows $67 \times$ automatic energy scaling between the MSB and LSB bit decisions without any external control, saving a significant portion of the energy for the MSB decision. It also has a 184 dB $\mathrm{FoM}_{\mathrm{C}}$, which is the best number among the designs compared.

## CHAPTER 7

## Fully-Integrated Voltage / Temperature Lock with On-Chip Oven Control

### 7.1 Introduction

PVT variation forces overdesign of many circuits to ensure robust operation, incurring area and power overhead. Techniques that lock circuit operation relative to a reference input, such as a reference voltage, current, or frequency, are widely used to relieve this variation, but these approaches require an accurate reference, creating another design overhead. Even with an accurate reference, temperature variation usually remains, as every transistor is affected by temperature change.

To reduce variation due to temperature, several prior works use temperature compensation by tuning the circuit according to the measured temperature [55]. This requires temperature-specific tuning of the circuit and adds significant cost. In other applications such as accurate frequency generators and MEMS sensors, system temperature is directly controlled at a constant level using a heater [56]. Some optical links use in-silicon local heating of the optical ring resonator to fine-tune its resonance frequency [57]. Though this oven-control temperature locking has rarely been implemented in fully integrated monolithic circuits, internal heaters have been used in sensors and calibration circuits [55], [58].

This chapter presents an on-chip local 2-D voltage / temperature lock that removes both voltage and temperature variation by locking these variables concurrently. A fully integrated structure, including controller, in-domain heater, and sensing components, enables significant power and area (system volume) reductions for both the heating and heat control. Temperature and voltage sensing and locking paths are integrated together to further reduce design and area overhead. After locking, only static process variation remains and can be easily compensated by an inexpensive 1-point calibration. A wide range of circuits can be placed in the local locked region for extra accuracy and robustness; for example, on-chip reference generators such as bandgap references and relaxation oscillators can achieve much higher accuracy than previously reported when used in conjunction with the proposed technique.


Figure 7.1 Main concept of the voltage / temperature simultaneous lock.

### 7.2 Simultaneous Voltage-Temperature Lock Concept

Figure 7.1 shows the main idea of the voltage/temperature locking circuit. To lock the voltage and temperature at specific points, three simple voltage/temperature sensing circuits are implemented in the region to be temperature-locked. These three sensors are tuned to have different output sensitivities to voltage and temperature changes. The relative outputs of the three sensing circuits are regulated at a fixed ratio by 2-D simultaneous voltage and temperature feedback. An example is illustrated in Figure 7.1, where the outputs of the three sensors, $\mathrm{O}_{\mathrm{A}}$, $\mathrm{O}_{\mathrm{B}}$, and $\mathrm{O}_{\mathrm{C}}$, are locked at a:b:c. This locking point lies at the intersection of the two contour lines corresponding to $\mathrm{O}_{\mathrm{A}}: \mathrm{O}_{\mathrm{B}}=\mathrm{a}: \mathrm{b}$ and $\mathrm{O}_{\mathrm{A}}: \mathrm{O}_{\mathrm{C}}=\mathrm{a}: \mathrm{c}$. If these two lines cross at a unique point within a reasonably wide voltage-temperature operating region, the locking circuits will drive the ratio $\mathrm{O}_{\mathrm{A}}: \mathrm{O}_{\mathrm{B}}: \mathrm{O}_{\mathrm{C}}$ to a:b:c by moving the voltage and temperature onto this unique crossing point, resulting in a fixed voltage and temperature level.

This method does not require complex sensors or references for measuring an absolute voltage or temperature level. Instead, the ratio of three sensors (a:b:c) can be sampled at a certain operating point during testing and used to lock the domain at the same point during runtime. Once locked, the locked voltage and temperature themselves or outputs from simple circuits can be used as simple references. In addition, one can improve the temperature coefficient of any pre-existing circuit by placing it in the locked domain.
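The contour-intersection idea can be illustrated with a toy model. The sketch below assumes log-linear sensor responses, $\ln O = c + pV + qT$, with made-up coefficients; none of these numbers come from the test chip, and real sensors will have curved contours. With this simplification each ratio contour is a straight line in the (V, T) plane, so the lock point follows from a 2-by-2 linear solve.

```python
import math

# Hypothetical log-linear sensor models: ln(O) = c + p*V + q*T.
# Coefficients are illustrative assumptions, not measured values.
SENSORS = {
    "A": (0.0, 3.0, 0.002),
    "B": (0.0, 4.0, 0.003),
    "C": (0.0, 5.0, 0.010),
}

def output(name, v, t):
    c, p, q = SENSORS[name]
    return math.exp(c + p * v + q * t)

def sample_ratios(v, t):
    """Record O_A:O_B and O_A:O_C at a test-time operating point."""
    o_a = output("A", v, t)
    return o_a / output("B", v, t), o_a / output("C", v, t)

def lock_point(r_ab, r_ac):
    """Intersect the contours O_A/O_B = r_ab and O_A/O_C = r_ac."""
    ca, pa, qa = SENSORS["A"]
    cb, pb, qb = SENSORS["B"]
    cc, pc, qc = SENSORS["C"]
    # Contour 1: (pa - pb) V + (qa - qb) T = ln(r_ab) - (ca - cb)
    a11, a12, b1 = pa - pb, qa - qb, math.log(r_ab) - (ca - cb)
    # Contour 2: (pa - pc) V + (qa - qc) T = ln(r_ac) - (ca - cc)
    a21, a22, b2 = pa - pc, qa - qc, math.log(r_ac) - (ca - cc)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("contours are parallel: no unique lock point")
    # Cramer's rule for the 2x2 system
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

ratios = sample_ratios(0.8, 60.0)      # sampled once during testing
v_lock, t_lock = lock_point(*ratios)   # point the loops would settle to
```

The determinant check mirrors the uniqueness condition in the text: if the three sensors do not have sufficiently different voltage and temperature sensitivities, the two contours become parallel and no unique lock point exists.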

### 7.3 Implementation Detail

Figure 7.2 shows the actual implementation of the test circuit. Three differently tuned ring oscillators (ROs) are used as sensors because they are sensitive to both voltage and temperature, easy to implement, and their frequency outputs are easy to process. To lock the frequencies to a certain ratio, two fractional phase frequency detectors (PFDs) are individually used to control the voltage regulator and heater. To ensure proper feedback that moves toward locking, the frequency weighting factors, $\mathrm{K}_{\mathrm{A}}, \mathrm{K}_{\mathrm{B}}$, and $\mathrm{K}_{\mathrm{C}}$, are differently tuned for the two PFDs. The output of each PFD is fed into a charge pump to control the VDD or heater current. The PFDs and charge pumps are placed outside the locked domain and are implemented digitally using standard cells, ensuring robust operation over a wide voltage and temperature range and offering ease of design and portability.
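A behavioral sketch of the two loops may help in visualizing how the lock settles. The model below is not the implemented circuit: the oscillators are hypothetical log-linear models with made-up coefficients, each PFD and charge pump pair is approximated by a proportional update on the logarithm of the weighted frequency ratio, and the loop gains and starting point are arbitrary assumptions. Here $\mathrm{K}_{\mathrm{A}}$ is taken as 1 in both PFDs.

```python
import math

# Hypothetical oscillator models, ln(f) = p*V + q*T (arbitrary units).
# Coefficients are illustrative assumptions, not values from the test chip.
RO = {"A": (3.0, 0.002), "B": (4.0, 0.003), "C": (5.0, 0.010)}

def freq(name, vdd, temp_c):
    p, q = RO[name]
    return math.exp(p * vdd + q * temp_c)

def run_lock(v_lock, t_lock, t_ambient, steps=600, gain_v=0.2, gain_t=20.0):
    """Iterate both loops and return the settled (VDD, domain temperature)."""
    # Weights picked at test time so each PFD sees equal weighted
    # frequencies at the desired lock point (K_A = 1).
    k_b = freq("A", v_lock, t_lock) / freq("B", v_lock, t_lock)
    k_c = freq("A", v_lock, t_lock) / freq("C", v_lock, t_lock)

    vdd, heat_rise = 0.6, 0.0   # arbitrary start: low supply, heater off
    for _ in range(steps):
        temp = t_ambient + heat_rise
        f_a, f_b, f_c = (freq(n, vdd, temp) for n in "ABC")
        # Loop 1: f_A ahead of K_B*f_B means VDD is too low, so raise VDD.
        vdd += gain_v * math.log(f_a / (k_b * f_b))
        # Loop 2: f_A ahead of K_C*f_C means the domain is too cold, so heat more.
        heat_rise += gain_t * math.log(f_a / (k_c * f_c))
        heat_rise = max(heat_rise, 0.0)   # the heater can only add heat
    return vdd, t_ambient + heat_rise

for ambient in (25.0, 40.0):
    v, t = run_lock(0.8, 60.0, ambient)
    print(f"ambient {ambient:.0f} C -> VDD {v:.3f} V, domain {t:.2f} C")
```

In this toy model the settled voltage and temperature do not depend on the ambient temperature, as long as the lock temperature is above ambient. The loop polarity depends on the sign of each sensor's sensitivity, which is one reason the weighting factors must be chosen separately for the two PFDs.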


Figure 7.2 Overall architecture of the implemented test circuit.


Figure 7.3 Detailed oscillator implementation.


Figure 7.4 Structure of the on-chip heater.

Figure 7.3 shows the detailed structure of the three ROs. To reduce power dissipation and allow for wide tunability of the operating point, two or three ROs are stacked across VDD and VSS, as shown. Internal nodes in each stacked RO are capacitively coupled so that oscillations are in phase in all stacks and stable [26]. Oscillators A and B use a stack of two ROs, with low-VT devices in oscillator A and standard-VT devices in oscillator B. When the shared VDD of these two oscillators is high, their overdrive voltage difference is relatively small and gives rise to a frequency ratio $f_{A}: f_{B}$ near 1. When the shared VDD level decreases, $f_{A}: f_{B}$ increases as the relative overdrive voltage difference becomes large. Therefore, by locking this ratio at a certain intermediate value, one can ensure that the two oscillators operate in the near-threshold region, which is helpful in controlling the RO frequency and power dissipation levels. Oscillator C uses a stack of three ROs; hence, the voltage across a single stack is lower and the operating point moves toward the sub-threshold region, where frequency is more sensitive to temperature. In this way, sufficient sensitivity to both voltage and temperature is achieved even in the presence of process variation.
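The dependence of $f_{A}: f_{B}$ on the shared VDD can be reproduced qualitatively with a simple device model. The sketch below uses an EKV-style interpolation for the drive current, which is smooth from weak to strong inversion. The threshold voltages, slope factor, and the assumption that both oscillators have the same load capacitance are illustrative choices, not values from the 14 nm test chip.

```python
import math

U_T = 0.0257  # thermal voltage kT/q near room temperature, in volts

def drive(v_gate, v_th, n=1.3):
    """EKV-style normalized drive current, smooth from weak to strong inversion."""
    return math.log1p(math.exp((v_gate - v_th) / (2 * n * U_T))) ** 2

def ratio_fa_fb(vdd, vth_low=0.25, vth_std=0.35, stacks=2):
    """f_A : f_B for two ROs that share VDD and differ only in threshold voltage.

    Frequency is taken as proportional to (drive current) / (per-stack voltage),
    so the voltage term cancels in the ratio. Threshold values are assumed.
    """
    v = vdd / stacks   # supply across one RO in the stack
    return drive(v, vth_low) / drive(v, vth_std)

for vdd in (0.6, 0.8, 1.2, 1.6):
    print(f"VDD = {vdd:.1f} V -> f_A/f_B = {ratio_fa_fb(vdd):.2f}")
```

In this model the ratio falls monotonically toward 1 as VDD rises and grows as VDD drops toward the threshold voltages, so locking the ratio at an intermediate value pins the supply in the near-threshold region, consistent with the argument above.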

For effective and uniform heating throughout the domain, heater elements are inserted between the inverters in the ROs, as depicted in Figure 7.4. The heater current flows through the gate electrodes so that the heat is generated uniformly in the gate area.

### 7.4 Measurement Results

To experimentally demonstrate the concept of 2-D voltage / temperature lock and evaluate its performance, a test chip is fabricated in a 14 nm FinFET process (Figure 7.5).


Figure 7.5 Die micrograph.

We first consider a case with only the voltage lock turned on, controlled by $\mathrm{f}_{\mathrm{A}}: \mathrm{f}_{\mathrm{B}}$. The results at temperatures from $25^{\circ} \mathrm{C}$ to $100^{\circ} \mathrm{C}$ (Figure 7.6a) show successful frequency locking at $\sim 0.75-0.8 \mathrm{~V}$ within a $2.6 \%$ range, which corresponds to a voltage sensitivity of less than $0.0066 \mathrm{~V} / \mathrm{V}$. On the other hand, temperature locking using $\mathrm{f}_{\mathrm{A}}: \mathrm{f}_{\mathrm{C}}$ with the supply voltage fixed at 0.9 V shows $0.95 \%$ $\mathrm{f}_{\mathrm{C}}$ variation (Figure 7.6b), which corresponds to a temperature sensitivity of $0.013{ }^{\circ} \mathrm{C} /{ }^{\circ} \mathrm{C}$.


Figure 7.6 Measurement Results.

Heater power dissipation is almost linear with the provided temperature increase, with a slope of $\sim 2 \mathrm{~mW} /{ }^{\circ} \mathrm{C}$. Measured heater performance (Figure 7.6c) shows that the heater accurately replicates the effect of an ambient temperature change, indicating uniform heating over the locked domain. The test chips and packages do not use any heat insulation: an open ceramic package is used, with conductive epoxy that lowers thermal resistance and renders the heaters less effective. Plastic packaging without special heat-removal components would further improve the heater's efficiency.
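As a rough budgeting aid, the measured slope can be turned into a first-order estimate of heater power for a given lock point. The helper below is a sketch that assumes the linear relation holds over the whole range and passes through zero power at zero temperature rise; the function name is hypothetical.

```python
HEATER_SLOPE_MW_PER_C = 2.0   # approximate measured slope from the text

def heater_power_mw(t_lock_c, t_ambient_c, slope=HEATER_SLOPE_MW_PER_C):
    """Estimated heater power (mW) needed to hold the domain at t_lock_c."""
    rise = t_lock_c - t_ambient_c
    if rise < 0:
        # The heater can only add heat, so the lock point must sit above ambient.
        raise ValueError("lock temperature is below ambient")
    return slope * rise

estimate = heater_power_mw(60.0, 25.0)   # lock at 60 C from a 25 C ambient
```

Under these assumptions, holding a 60 °C lock point from a 25 °C ambient costs about 70 mW of heater power in the uninsulated test package, which is why better thermal insulation is an attractive improvement.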

The measured results with both locks active (Figure 7.6d) show that the total frequency spread is $\sim 7-8 \%$, which is $\sim 100 \times$ less than the frequency spread without the locking mechanism. This indicates that concurrent V/T locking performs as well as each locking mechanism in isolation. Table 7.1 summarizes the performance of this work, showing the effectiveness of monolithic oven control as a replacement for off-chip approaches.

Table 7.1 Performance summary.

| Technique | 2-D Voltage / Temperature lock |
| :---: | :---: |
| Process | 14 nm FinFET |
| Supply Voltage Sensitivity | $<0.0066 \mathrm{~V} / \mathrm{V}$ @ $25^{\circ} \mathrm{C}-100^{\circ} \mathrm{C}$ |
| Ambient Temperature Sensitivity | $0.013^{\circ} \mathrm{C} /{ }^{\circ} \mathrm{C}$ (locked at $\sim 25^{\circ} \mathrm{C}$) <br> $0.007{ }^{\circ} \mathrm{C} /{ }^{\circ} \mathrm{C}$ (locked at $\sim 60^{\circ} \mathrm{C}$) |
| Power Consumption | Heater: $\sim 2 \mathrm{~mW} /{ }^{\circ} \mathrm{C}$ <br> Others: $<2.4 \mathrm{~mW} @ \mathrm{O}$ |
| Overall Area | $0.3 \mathrm{~mm} \times 0.4 \mathrm{~mm}$ |
| Area of Sensors Only | $7.5 \mu \mathrm{~m} \times 12.5 \mu \mathrm{~m}$ |

## CHAPTER 8

## Conclusion

### 8.1 Summary of Contributions

This dissertation has discussed the new challenges in small sensor systems arising in important non-digital blocks. Since these circuits do not follow the traditional scaling trend, new observations and techniques are required to solve these challenges.

Chapter 2 and Chapter 3 discussed circuits for power management in remote systems, whose small form factor limits both energy storage capacity and energy harvesting availability. High-efficiency power conversion is therefore necessary over a wide power range down to several nanowatts, which introduces new challenges. This dissertation presented low-power energy harvester and power regulator circuits that maintain high efficiency over wide power ranges starting from the nanowatt level. They are realized by several newly proposed techniques: a power-efficient self-oscillating structure, cascaded 2:1 converter topologies for widely reconfigurable conversion ratio control, and stable frequency control over a wide power range using leakage-based oscillators and a load-proportional feedback control scheme. Chapter 4 also proposed a new cascaded converter topology with more ratio reconfigurability and improved conversion efficiency using negative voltage feedback.

Chapter 5 and Chapter 6 addressed challenges in analog circuits essential to sensor interface implementation. Area scaling in traditional analog circuits is difficult because of their noise requirements and the limited density of passive devices; these difficulties are overcome by shifting their operation into a more digital-like manner. Analog operation in the phase and frequency domains, using simple combinations of digital logic blocks, has realized similar or better performance than traditional analog operation in the voltage and current domains while saving considerable area by eliminating passive analog elements. Using this approach, Chapter 5 presented a capacitance-to-digital sensor interface with the best figure of merit, and Chapter 6 presented a new clocked comparator with automatic energy scaling capability and the best energy efficiency among the prior works compared.

Chapter 7 discussed PVT variation and its mitigation, which is becoming more important in advanced technologies as the effect of variation becomes more serious with smaller device sizes and lower supply voltages. A 2-D simultaneous voltage / temperature locking technique using an on-chip heater was presented in this chapter, by which a system in the locked domain can remove the variation caused by external voltage and temperature changes. After locking, only static process variation remains and can be easily compensated by an inexpensive 1-point calibration.

### 8.2 Future Research Directions

Chapter 2 through Chapter 6 have presented several techniques to overcome challenges from the small size of IoT systems, focusing on power and energy efficiency. In addition, Chapter 7 has presented a technique to operate these efficient circuits with high accuracy without the impact of variation. These techniques open many future research directions that can further improve the quality and efficiency or relieve challenges or restrictions in applying these techniques in other circuits, some of which are discussed in this section.

Techniques used in energy harvesting and power management in this dissertation are all based on the switched-capacitor conversion technique, whose performance largely depends on the size and quality of the on-chip capacitors. Works in this dissertation mostly used low-density metal-insulator-metal (MIM) capacitors, but using higher-density capacitors such as trench capacitors and capacitors with ferroelectric devices can greatly improve the overall performance.

The iterative discharge technique used in the CDC can be improved in many ways. Implementing a similar circuit in a more advanced technology can improve its resolution because of the reduced quantization noise and more noise averaging over periods. Energy loss in level converters and other auxiliary circuits can also be reduced by further design optimization. In addition, this digital-like circuit modification can be applied to other similar sensor interfaces such as resistance sensors and temperature sensors.

The large energy efficiency improvement of the edge-pursuit comparator can motivate much follow-up research. As shown in the measured results of the EPC in a SAR ADC, this comparator now consumes less energy than a CDAC in a SAR ADC or an analog amplifier for low-noise signal amplification, which implies that it has the potential to reduce the complexity of a multi-stage sensor frontend or $\mathrm{A} / \mathrm{D}$ interface to fewer stages.

The variation removal loop can be readily improved in its performance and efficiency. As briefly discussed in Section 7.4, implementing heat insulation around the test chip or locked domain can improve the heater efficiency. Measured data on the behavior of the temperature loop, which is difficult to simulate before fabrication, can be used to design a more stable feedback loop in the future.

## BIBLIOGRAPHY

[1] "ISSCC 2016 Trends," 2016, http://isscc.org/doc/2016/ISSCC2016_TechTrends.pdf.
[2] W. M. Holt, "1.1 Moore's law: A path going forward," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 8-13.
[3] W. Holt, "Intel Investor Meeting," Nov. 20, 2014, http://intelstudios.edgesuite.net/im/2014/index.html.
[4] D. C. Daly, L. C. Fujino, and K. C. Smith, "Through the Looking Glass - The 2017 Edition: Trends in Solid-State Circuits from ISSCC," IEEE Solid-State Circuits Mag., vol. 9, no. 1, pp. 12-22, winter 2017.
[5] W. Arden, M. Brillouët, P. Cogez, M. Graef, B. Huizing, and R. Mahnkopf, "'More-than-Moore' White Paper," 2010, http://www.itrs2.net/uploads/4/9/7/7/49775221/irc-itrs-mtmv2_3.pdf.
[6] P. Medagliani et al., Internet of Things Applications-From Research and Innovation to Market Deployment. The River Publishers, 2014.
[7] "International Technology Roadmap for Semiconductors 2.0 - 2015 Edition," http://www.itrs2.net/itrs-reports.html.
[8] M. Fojtik et al., "A Millimeter-Scale Energy-Autonomous Sensor System With Stacked Battery and Solar Cells," IEEE J. Solid-State Circuits, vol. 48, no. 3, pp. 801-813, Mar. 2013.
[9] Y. Lee et al., "A Modular 1 mm$^{3}$ Die-Stacked Sensing Platform With Low Power I$^{2}$C Inter-Die Communication and Multi-Modal Energy Harvesting," IEEE J. Solid-State Circuits, vol. 48, no. 1, pp. 229-243, Jan. 2013.
[10] J. P. Im, S. W. Wang, S. T. Ryu, and G. H. Cho, "A 40 mV Transformer-Reuse Self-Startup Boost Converter With MPPT Control for Thermoelectric Energy Harvesting," IEEE J. Solid-State Circuits, vol. 47, no. 12, pp. 3055-3067, Dec. 2012.
[11] K. W. R. Chew, Z. Sun, H. Tang, and L. Siek, "A 400nW single-inductor dual-input-tri-output DC-DC buck-boost converter with maximum power point tracking for indoor photovoltaic energy harvesting," in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2013, pp. 68-69.
[12] K. Kadirvel et al., "A 330nA energy-harvesting charger with battery management for solar and thermoelectric energy harvesting," in 2012 IEEE International Solid-State Circuits Conference, 2012, pp. 106-108.
[13] S. Bandyopadhyay, P. P. Mercier, A. C. Lysaght, K. M. Stankovic, and A. P. Chandrakasan, "23.2 A 1.1 nW energy harvesting system with 544 pW quiescent power for next-generation implants," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 396-397.
[14] H. Shao, C.-Y. Tsui, and W.-H. Ki, "An inductor-less MPPT design for light energy harvesting systems," in 2009 Asia and South Pacific Design Automation Conference, 2009, pp. 101-102.
[15] P. H. Chen et al., "A 120-mV input, fully integrated dual-mode charge pump in 65-nm CMOS for thermoelectric energy harvester," in 17th Asia and South Pacific Design Automation Conference, 2012, pp. 469-470.
[16] I. Lee et al., "A ripple voltage sensing MPPT circuit for ultra-low power microsystems," in 2013 Symposium on VLSI Circuits, 2013, pp. C228-C229.
[17] S. Bang et al., "A fully integrated switched-capacitor based PMU with adaptive energy harvesting technique for ultra-low power sensing applications," in 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013, pp. 709-712.
[18] D. Somasekhar et al., "Multi-Phase 1 GHz Voltage Doubler Charge Pump in 32 nm Logic Process," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 751-758, Apr. 2010.
[19] L. Chang, R. K. Montoye, B. L. Ji, A. J. Weger, K. G. Stawiasz, and R. H. Dennard, "A fully-integrated switched-capacitor 2:1 voltage converter with regulation capability and 90\% efficiency at 2.3 A/mm$^{2}$," in 2010 Symposium on VLSI Circuits, 2010, pp. 55-56.
[20] T. V. Breussegem and M. Steyaert, "A 82\% efficiency 0.5\% ripple 16-phase fully integrated capacitive voltage doubler," in 2009 Symposium on VLSI Circuits, 2009, pp. 198-199.
[21] I. Doms, P. Merken, R. Mertens, and C. V. Hoof, "Integrated capacitive power-management circuit for thermal harvesters with output power 10 to 1000 µW," in 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, 2009, pp. 300-301, 301a.
[22] S. Bang, A. Wang, B. Giridhar, D. Blaauw, and D. Sylvester, "A fully integrated successive-approximation switched-capacitor DC-DC converter with 31 mV output voltage resolution," in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2013, pp. 370-371.
[23] L. G. Salem and P. P. Mercier, "4.6 An 85\%-efficiency fully integrated 15-ratio recursive switched-capacitor DC-DC converter with 0.1-to-2.2V output voltage range," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 88-89.
[24] T. M. Andersen et al., "20.3 A feedforward controlled on-chip switched-capacitor voltage regulator delivering 10W in 32nm SOI CMOS," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1-3.
[25] T. M. Andersen et al., "A 4.6 W/mm$^{2}$ power density 86\% efficiency on-chip switched capacitor DC-DC converter in 32 nm SOI CMOS," in 2013 Twenty-Eighth Annual IEEE Applied Power Electronics Conference and Exposition (APEC), 2013, pp. 692-699.
[26] W. Jung et al., "An Ultra-Low Power Fully Integrated Energy Harvester Based on Self-Oscillating Switched-Capacitor Voltage Doubler," IEEE J. Solid-State Circuits, vol. 49, no. 12, pp. 2800-2811, Dec. 2014.
[27] H. P. Le, J. Crossley, S. R. Sanders, and E. Alon, "A sub-ns response fully integrated battery-connected switched-capacitor voltage regulator delivering 0.19 W/mm$^{2}$ at 73\% efficiency," in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2013, pp. 372-373.
[28] J. Jiang, Y. Lu, C. Huang, W. H. Ki, and P. K. T. Mok, "20.5 A 2-/3-phase fully integrated switched-capacitor DC-DC converter in bulk CMOS for energy-efficient digital circuits with 14\% efficiency improvement," in 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2015, pp. 1-3.
[29] L. G. Salem and P. P. Mercier, "A Recursive Switched-Capacitor DC-DC Converter Achieving $2^{N}-1$ Ratios With High Efficiency Over a Wide Output Voltage Range," IEEE J. Solid-State Circuits, vol. 49, no. 12, pp. 2773-2787, Dec. 2014.
[30] L. G. Salem and P. P. Mercier, "A battery-connected 24-ratio switched capacitor PMIC achieving 95.5\%-efficiency," in 2015 Symposium on VLSI Circuits (VLSI Circuits), 2015, pp. C340-C341.
[31] P. Cong, N. Chaimanonart, W. H. Ko, and D. J. Young, "A Wireless and Batteryless 10-Bit Implantable Blood Pressure Sensing Microsystem With Adaptive RF Powering for Real-Time Laboratory Mice Monitoring," IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3631-3644, Dec. 2009.
[32] H. Ha, D. Sylvester, D. Blaauw, and J. Y. Sim, "12.6 A 160nW 63.9fJ/conversion-step capacitance-to-digital converter for ultra-low-power wireless sensor nodes," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 220-221.
[33] S. Oh, W. Jung, K. Yang, D. Blaauw, and D. Sylvester, " 15.4 b incremental sigma-delta capacitance-to-digital converter with zoom-in 9b asynchronous SAR," in 2014 Symposium on VLSI Circuits Digest of Technical Papers, 2014, pp. 1-2.
[34] M. H. Ghaed et al., "Circuits for a Cubic-Millimeter Energy-Autonomous Wireless Intraocular Pressure Monitor," IEEE Trans. Circuits Syst. Regul. Pap., vol. 60, no. 12, pp. 3152-3162, Dec. 2013.
[35] Z. Tan, R. Daamen, A. Humbert, Y. V. Ponomarev, Y. Chae, and M. A. P. Pertijs, "A 1.2V 8.3-nJ CMOS Humidity Sensor for RFID Applications," IEEE J. Solid-State Circuits, vol. 48, no. 10, pp. 2469-2477, Oct. 2013.
[36] Z. Tan, S. H. Shalmany, G. C. M. Meijer, and M. A. P. Pertijs, "An Energy-Efficient 15-Bit Capacitive-Sensor Interface Based on Period Modulation," IEEE J. Solid-State Circuits, vol. 47, no. 7, pp. 1703-1711, Jul. 2012.
[37] T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto, and O. Watanabe, "A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture," in 1992 Symposium on VLSI Circuits Digest of Technical Papers, 1992, pp. 28-29.
[38] Y.-T. Wang and B. Razavi, "An 8-bit 150-MHz CMOS A/D converter," IEEE J. Solid-State Circuits, vol. 35, no. 3, pp. 308-317, Mar. 2000.
[39] M. van Elzakker, E. van Tuijl, P. Geraedts, D. Schinkel, E. A. M. Klumperink, and B. Nauta, "A 10-bit Charge-Redistribution ADC Consuming 1.9 µW at 1 MS/s," IEEE J. Solid-State Circuits, vol. 45, no. 5, pp. 1007-1015, May 2010.
[40] P. Harpe, E. Cantatore, and A. van Roermund, "A 2.2/2.7fJ/conversion-step 10/12b 40kS/s SAR ADC with Data-Driven Noise Reduction," in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2013, pp. 270-271.
[41] P. Harpe, E. Cantatore, and A. van Roermund, "11.1 An oversampled 12/14b SAR ADC with noise reduction and linearity enhancements achieving up to 79.1dB SNDR," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 194-195.
[42] S. K. Lee, S. J. Park, H. J. Park, and J. Y. Sim, "A 21 fJ/Conversion-Step 100 kS/s 10-bit ADC With a Low-Noise Time-Domain Comparator for Low-Power Sensor Interface," IEEE J. Solid-State Circuits, vol. 46, no. 3, pp. 651-659, Mar. 2011.
[43] M. Shim et al., "An oscillator collapse-based comparator with application in a 74.1 dB SNDR, 20KS/s 15b SAR ADC," in 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), 2016, pp. 1-2.
[44] M. Ahmadi and W. Namgoong, "Comparator Power Minimization Analysis for SAR ADC Using Multiple Comparators," IEEE Trans. Circuits Syst. Regul. Pap., vol. 62, no. 10, pp. 2369-2379, Oct. 2015.
[45] H. Y. Tai, Y. S. Hu, H. W. Chen, and H. S. Chen, "11.2 A 0.85fJ/conversion-step 10b 200kS/s subranging SAR ADC in 40 nm CMOS," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 196-197.
[46] C. C. Liu, "27.4 A 0.35mW 12b 100MS/s SAR-assisted digital slope ADC in 28 nm CMOS," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 462-463.
[47] K. Yang, Q. Dong, D. Blaauw, and D. Sylvester, "14.2 A physically unclonable function with BER <10$^{-8}$ for robust chip authentication using oscillator collapse in 40 nm CMOS," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1-3.
[48] A. A. Abidi, "Phase Noise and Jitter in CMOS Ring Oscillators," IEEE J. Solid-State Circuits, vol. 41, no. 8, pp. 1803-1816, Aug. 2006.
[49] H. Risken and J. H. Eberly, "The Fokker-Planck equation, methods of solution and applications," J. Opt. Soc. Am. B Opt. Phys., vol. 2, p. 508, Mar. 1985.
[50] P. Nuzzo, F. D. Bernardinis, P. Terreni, and G. V. der Plas, "Noise Analysis of Regenerative Comparators for Reconfigurable ADC Architectures," IEEE Trans. Circuits Syst. Regul. Pap., vol. 55, no. 6, pp. 1441-1454, Jul. 2008.
[51] M. Liu, P. Harpe, R. van Dommele, and A. van Roermund, "15.4 A 0.8V 10b 80kS/s SAR ADC with duty-cycled reference generation," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1-3.
[52] B. P. Ginsburg and A. P. Chandrakasan, "500-MS/s 5-bit ADC in 65-nm CMOS With Split Capacitor Array DAC," IEEE J. Solid-State Circuits, vol. 42, no. 4, pp. 739-747, Apr. 2007.
[53] Y. Chen et al., "Split capacitor DAC mismatch calibration in successive approximation ADC," in 2009 IEEE Custom Integrated Circuits Conference, 2009, pp. 279-282.
[54] J. Y. Um, Y. J. Kim, E. W. Song, J. Y. Sim, and H. J. Park, "A Digital-Domain Calibration of Split-Capacitor DAC for a Differential SAR ADC Without Additional Analog Circuits," IEEE Trans. Circuits Syst. Regul. Pap., vol. 60, no. 11, pp. 2845-2856, Nov. 2013.
[55] Y. Satoh, H. Kobayashi, T. Miyaba, and S. Kousai, "A 2.9mW, +/−85ppm accuracy reference clock generator based on RC oscillator with on-chip temperature calibration," in 2014 Symposium on VLSI Circuits Digest of Technical Papers, 2014, pp. 1-2.
[56] J. Lim, K. Choi, H. Kim, T. Jackson, and D. Kenny, "Miniature Oven Controlled Crystal Oscillator (OCXO) on a CMOS Chip," in 2006 IEEE International Frequency Control Symposium and Exposition, 2006, pp. 401-404.
[57] H. Li et al., "22.6 A 25Gb/s 4.4V-swing AC-coupled Si-photonic microring transmitter with 2-tap asymmetric FFE and dynamic thermal tuning in 65nm CMOS," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1-3.
[58] U. Sönmez, F. Sebastiano, and K. A. A. Makinwa, "11.4 1650 µm$^{2}$ thermal-diffusivity sensors with inaccuracies down to ±0.75 °C in 40 nm CMOS," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 206-207.


[^0]:    N/R: Not reported
    ${ }^{1}$ Estimated number from the paper

