## CMOS mm-Wave Digital Beamformer Receiver with Parallelized Continuous-Time Band-Pass Delta-Sigma ADCs

by

Rundao Lu

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in the University of Michigan 2021

**Doctoral Committee:** 

Professor Michael P. Flynn, Chair

Professor Zhong He

Professor David D. Wentzloff

Associate Professor Zhengya Zhang

Rundao Lu

lurundao@umich.edu

ORCID iD: 0000-0002-4019-2145

© Rundao Lu 2021

## Acknowledgments

I would like to thank many individuals from my past five years in Michigan.

First, I would like to express my deepest thanks to my advisor, Prof. Flynn. I could not have achieved this success without his support. Working with Mike is enjoyable, just like magic; his passion and mentorship keep me energized and hardworking. With his open-minded guidance, we worked on interesting, exciting projects while making significant progress every week. Looking back, I am surprised at what we have achieved.

I would also like to thank all the group members and friends in MICL for their friendship and advice. Special thanks to Sunmin Jang, who taught me the fundamentals of digital beamforming; John Bell, who helped me understand the underlying math in delta-sigma loops; Daniel Weyer, for his help with the PLL design and for his jokes; Fred Buhler and Christine Weston, for their help with the over-the-air measurements; Justin Correll, both a colleague and a friend, for all the lunches we had together and the starry night at Peach Mountain Observatory; Seunjong Lee, Taewook Kang, and Seungheun Song, for the daily jokes and delicious barbecue; and Xing Chen, Zhehong Wang, Qirui Zhang, and Yuchen Gu, for all the hot pots and road trips we had.

I want to give my sincere thanks to Daniel Lambalot and Ravi Chekuri from Bayside Design, Inc. for their help with the LTCC substrate design, and to Akihiro Tamura-san and his team from Kyocera International, Inc. for their generous technical support and for fabricating the LTCC substrate.

Finally, I want to thank my beloved dad and mom for unconditionally supporting me.


## Table of Contents

| Contents |
|----------|
| Acknowledgments |
| List of Tables |
| List of Figures |
| List of Abbreviations |
| Abstract |
| Chapter 1. Introduction |
| 1.1. Exploiting mm-wave bands |
| 1.2. Digital beamforming and challenges |
| 1.3. Continuous-Time Band-Pass Delta-Sigma Modulator element-ADC |
| 1.4. Thesis contributions |
| Chapter 2. Parallelized Continuous-Time Band-Pass Delta-Sigma Modulator |
| 2.1. Parallelized sub-ADCs |
| 2.2. Multi-phase-sampling |
| 2.2.1. Harmonic suppression |
| 2.2.2. Clock jitter noise suppression |
| 2.3. Implementation |
| 2.3.1. Modulator architecture |
| 2.3.2. CTBPDSM loop design |
| 2.3.3. Circuit blocks |
| 2.4. Measurements |
| Chapter 3. Prototype-I: A 16-Element 1GHz IF Digital Beamformer |
| 3.1. System architecture |
| 3.2. Digital Bit-Stream Processing (BSP) |
| 3.2.1. Implementation |
| 3.3. Measurements |
| 3.3.1. Power spectra |
| 3.3.2. QAM constellation measurements |
| 3.3.3. Measured Beampatterns |
| 3.4. Conclusion |
| Chapter 4. Prototype-II: A 4×4 Element Fully-Integrated 28GHz Digital Beamformer |
| 4.1. System architecture |
| 4.2. Implementation |
| 4.2.1. Mm-Wave frontend |
| 4.2.2. Phase-locked loop (PLL) and LO distribution |
| 4.2.3. 4× RX slice bank |
| 4.3. Measurements |
| Chapter 5. Frequency-Interleaving Continuous-Time Band-Pass Delta-Sigma Modulator |
| 5.1. System architecture |
| 5.2. Implementation |
| 5.2.1. Modulator architecture |
| 5.2.2. Resonator design |
| 5.3. Measurements |
| Chapter 6. Future Work: Tiled System |
| Chapter 7. Conclusion |
| Appendix A. Calculation method for parametrized DAC coefficients |
| A.1. Parametrized rectangular DAC pulse-shape |
| A.2. Arbitrary DAC pulse-shape |
| Appendix B. Center-frequency offset of the resonator with finite op-amp GBW |
| Bibliography |

## List of Tables

| Tables |
|--------|
| Table 1: 5G mm-wave bands in different regions |
| Table 2: State-of-the-art integrated digital beamformers |
| Table 3: State-of-the-art integrated digital beamformers |
| Table 4: DAC coefficients |
| Table 5: RC values and tuning steps |
| Table 6: Measured QAM constellations |
| Table 7: Simulated performance of the mm-wave frontend (with VGA) |
| Table 8: Performance summary and comparison |
| Table 9: DAC coefficients |
| Table 10: Nominal RC values of the single-op-amp resonators |
| Table 11: State-of-the-art GHz DSMs with BW>100MHz |

## List of Figures

| Figures |
|---------|
| Figure 1-1. Atmospheric absorption plot |
| Figure 1-2. A possible beamforming future use case in an urban area |
| Figure 1-3. (a) Analog beamforming, (b) digital beamforming, and (c) hybrid beamforming |
| Figure 1-4. Array SNDR versus array size for an array with an element SFDR of 60dB (left) and an array with an element SFDR of 70dB (right) |
| Figure 2-1. Four parallelized sub-ADCs per-element sample the 1GHz IF input |
| Figure 2-2. Frequency response of the equivalent 4-tap FIR filter resulting from multi-phase sampling of the input |
| Figure 2-3. Mathematical explanation of how multi-phase-sampling suppresses harmonics |
| Figure 2-4. Behavioral simulation comparison of single-ADC, single-sampling 4× sub-ADC and multi-phase-sampling 4× sub-ADC |
| Figure 2-5. Clock jitter sensitivity comparison between single-phase sampling and multi-phase-sampling for a 4× sub-ADC array |
| Figure 2-6. Multi-phase-sampling CTBPDSM sub-ADC array |
| Figure 2-7. PZ map of the NTF and STF |
| Figure 2-8. NTF/STF PZ map with the variation of the duty-cycle of 1st HZ DAC |
| Figure 2-9. Quantizer output code distribution at different duty-cycles |
| Figure 2-10. Behavioral simulations with different duty-cycles for the 1st HZ DAC |
| Figure 2-11. Schematic of the single-op-amp resonator |
| Figure 2-12. Effect of op-amp bandwidth on NTF for a DC gain of 50dB and nominal 1GHz center frequency |
| Figure 2-13. Schematic of a single comparator of the 5-level quantizer |
| Figure 2-14. Schematic of the DLL |
| Figure 2-15. Duty-cycle controller for first-stage HZ DAC |
| Figure 2-16. Die photo and layout of the CTBPDSM sub-ADC array |
| Figure 2-17. Measured power spectra of four sub-ADCs and combined 4× sub-ADC array |
| Figure 2-18. Measured power spectra of the CTBPDSM for high and low 1st HZ DAC duty-cycles |
| Figure 3-1. System architecture of the IF digital beamformer with BSP and multi-phase-sampling sub-ADC array |
| Figure 3-2. Bit-Stream Processing AND gated pre-adder and interleaver |
| Figure 3-3. Bit-Stream Processing (BSP) Digital Down-Conversion (DDC) |
| Figure 3-4. Bit-Stream Processing (BSP) Complex Weight Multiplier (CWM) |
| Figure 3-5. 4th-order Cascaded Integrator-Comb (CIC) decimation filter with stage truncation |
| Figure 3-6. Die photo and layout of the element 4× sub-ADC array |
| Figure 3-7. Testing Setup |
| Figure 3-8. Measured power spectra |
| Figure 3-9. Measured beampatterns overlaid on simulated beampatterns |
| Figure 4-1. System architecture of the fully-integrated 28GHz digital beamformer with BSP and multi-phase-sampling sub-ADC array |
| Figure 4-2. Aperture-coupled microstrip patch antenna array and mm-wave feed lines in the LTCC substrate |
| Figure 4-3. RX signal chain with the bit-stream processor (BSP) |
| Figure 4-4. Block diagram of the mm-wave frontend and ADC array |
| Figure 4-5. Single-ended to pseudo-differential low noise amplifier (LNA) |
| Figure 4-6. Double-balanced passive mixer |
| Figure 4-7. Schematic of the variable gain amplifier (VGA) |
| Figure 4-8. Schematic of the constant transconductance bias |
| Figure 4-9. Block diagram of the PLL |
| Figure 4-10. Schematic and simulated phase noise of the LC-tank VCO |
| Figure 4-11. Schematic and layout of the 1-to-2 LO buffer |
| Figure 4-12. Metal routing of the 1-to-4 buffer tree |
| Figure 4-13. Block diagram of the 4× RX slice bank |
| Figure 4-14. Bump assignment |
| Figure 4-15. The layout of a 4× RX slice bank |
| Figure 4-16. Die photo and layout of the 4× RX slice |
| Figure 4-17. 28GHz anechoic chamber test setup |
| Figure 4-18. Measured VSWR of the patch antennas for the 4× RX slice on the west side |
| Figure 4-19. Measured 3D beam-patterns |
| Figure 4-20. Measured QAM4 constellation diagram |
| Figure 4-21. Test setup to measure the noise figure (NF) |
| Figure 4-22. Power breakdown of a single RX slice |
| Figure 5-1. System architecture of the frequency-interleaving CT delta-sigma modulator |
| Figure 5-2. Block diagram of the sub-modulators |
| Figure 5-3. Schematic of the single-op-amp resonator |
| Figure 5-4. Simulated STF with and w/o zero-insertion $R_z$ |
| Figure 5-5. Resonator gain with and w/o the Q-enhancement capacitor |
| Figure 5-6. Op-amp gain with and w/o the neutralization capacitor |
| Figure 5-7. Die photo and layout of the high/low band sub-modulator |
| Figure 5-8. Test boards |
| Figure 5-9. Full-chip block diagram |
| Figure 5-10. Measured 8192-point power spectra for each sub-modulator with FIR filter (top) and overall ADC power spectrum (bottom) |
| Figure 5-11. Measured 8192-point -9dBFS two-tone (1495MHz and 1505MHz) power spectrum |
| Figure 5-12. Measured STF with FIR filtering |
| Figure 5-13. Measured STF with FIR filtering |
| Figure 5-14. Power/Fs and power/BW trends for GHz DSMs |
| Figure 6-1. A 4× 16-element tiled system for 64-element digital beamforming |
| Figure A-1. 2nd-order and 4th-order continuous-time loops |
| Figure A-2. Arbitrary DAC pulse-shape |

## List of Abbreviations

| Abbreviation | Definition |
|--------------|------------|
| ADC | Analog-to-digital converter |
| BSP | Bit-stream processing |
| BW | Bandwidth |
| CA | Channel aggregation |
| CIC | Cascaded-integrator-comb |
| CMOS | Complementary metal-oxide-semiconductor |
| CT | Continuous-time |
| CTBPDSM | Continuous-time band-pass delta-sigma modulator |
| CWM | Complex weight multiplication |
| DAC | Digital-to-analog converter |
| DDC | Digital down-conversion |
| DDS | Direct digital synthesis |
| DFF | D flip-flop |
| DLL | Delay-locked loop |
| DR | Dynamic range |
| DT | Discrete-time |
| ELD | Excess loop delay |
| ENOB | Effective number of bits |
| ESD | Electrostatic discharge |
| EVM | Error vector magnitude |
| FI | Frequency-interleaving |
| FIR | Finite impulse response |
| FoM | Figure of merit |
| FPGA | Field-programmable gate array |
| FSM | Finite state machine |
| FSPL | Free-space path loss |
| HZ | Half-delay return-to-zero |
| I/O | Input/output |
| IC | Integrated circuit |
| IF | Intermediate frequency |
| LNA | Low-noise amplifier |
| LO | Local oscillator |
| LTCC | Low-temperature co-fired ceramic |
| MXR | Mixer |
| NF | Noise figure |
| NTF | Noise transfer function |
| PCB | Printed circuit board |
| PEX | Parasitic extraction |
| PLL | Phase-locked loop |
| QAM | Quadrature amplitude modulation |
| QFN | Quad-flat no-leads |
| RDL | Redistribution layer |
| RF | Radio frequency |
| RZ | Return-to-zero |
| SFDR | Spurious-free dynamic range |
| SNDR | Signal-to-noise-and-distortion ratio |
| SPI | Serial peripheral interface |
| STF | Signal transfer function |
| TI | Time-interleaving |
| VGA | Variable gain amplifier |

## Abstract

Large-scale beamforming is an essential technology for emerging wireless communication systems. Beamforming mitigates the significant path loss at mm-wave frequencies, enables spatial filtering and multiplexing, and substantially relaxes the TX power and RX sensitivity requirements. Although there has been significant progress on analog mm-wave beamforming, there are relatively few works on integrated digital beamforming systems. Digital beamforming offers superior beam-pattern accuracy, inherent flexibility, fast steering, and the ability to generate multiple simultaneous beams without duplicating frontend circuitry. However, there are several significant challenges to implementing a practical mm-wave digital beamforming system: 1) element-ADC performance, especially linearity, is a bottleneck; 2) sensitive mm-wave and analog signal lines are susceptible to local crosstalk from high-speed, high-swing digital buses; 3) enormous raw data rates demand high-speed, high-throughput digital processing; and 4) power and area are strict design constraints, so low-power, compact receiver slices are essential. In this thesis, we address these challenges.

First, we introduce the concept of a parallelized ADC that uses the multi-phase-sampling technique. The per-element parallel multi-phase-sampling sub-ADC array not only improves SNDR but also provides inherent FIR filtering. The measured SNDR of the parallel ADC improves by 7dB thanks to harmonic suppression, thermal noise averaging, and reduced jitter sensitivity.

Second, we present a prototype 16-element 1GHz IF digital beamformer with parallel element sub-ADC arrays. The accurate measured beam-patterns confirm the advantages of digital beamforming, and the measured 77dB SFDR confirms the harmonic suppression provided by the multi-phase-sampling technique.

Third, we report a fully integrated 16-element single-chip 28GHz mm-wave-to-digital beamformer, combined with a custom 8-layer LTCC substrate that incorporates a 4×4 patch antenna array. The inductor-less mm-wave frontend and 4× parallel continuous-time band-pass delta-sigma ADC arrays enable compact mm-wave-to-digital conversion. Direct ADC sampling of a high 1GHz IF facilitates single-phase mm-wave LO distribution and moves the I/Q mixing into the digital domain. Optimized bump and RX slice placement shortens both LO and mm-wave signal routing and reduces signal loss. The prototype generates four independent, simultaneous beams. Over-the-air measurements confirm accurate 3D beam-patterns and show a measured overall noise figure of 7dB and a QAM-4 EVM of -18dB.

Fourth, we introduce a frequency-interleaving technique to expand the bandwidth of the element continuous-time band-pass delta-sigma modulator ADC. The prototype 28nm CMOS chip achieves a measured SNDR/SFDR of 37dB/44dB over a 300MHz BW and supports a high input frequency of 1.5GHz while consuming only 38mW. This work demonstrates that frequency-interleaving breaks the power-bandwidth barrier of CT DSMs.

Finally, we discuss the advantages and challenges of a tiled beamforming system to support even more elements in future beamforming systems.

## **Chapter 1. Introduction**

#### 1.1. Exploiting mm-wave bands

Wireless communication systems are evolving fast to accommodate the increasing demand for high-speed, high-capacity, and low-latency connections. Recent 4G LTE and sub-6GHz 5G (FR1) networks incorporate several new technologies, including massive MIMO, Channel Aggregation (CA), and high-order QAM, to extend and better utilize the usable bandwidth below 6GHz [1]–[3]. However, the sub-6GHz spectrum is crowded with existing wireless protocols such as GSM, 3G (CDMA), Wi-Fi, and Bluetooth, which leaves little room for wider signal bandwidths. As a result, 5G mm-wave (FR2) [4] exploits the mm-wave bands to extend the signal bandwidth further. Several mm-wave bands have been assigned to 5G communication systems in different regions, as shown in Table 1.

| Region | 5G mm-wave bands |
|--------|------------------|
| USA | 27.5-28.35GHz, 37-40GHz |
| Korea | 26.5-29.5GHz |
| Japan | 27.5-28.28GHz |
| China | 24.25-27.5GHz, 37-43.5GHz |
| EU | 24.25-27.5GHz |

Table 1: 5G mm-wave bands in different regions

Although mm-wave bands provide abundant bandwidth, the physical properties of mm-wave propagation pose several significant challenges. First, the free-space path loss (FSPL) at mm-wave frequencies is much higher than at sub-6GHz frequencies. For example, regardless of the distance, the FSPL of a 28GHz signal is ~23dB higher than that of a 1900MHz LTE signal. Second, atmospheric absorption makes the path loss even worse. As shown in Figure 1-1 [5], at sea level the attenuation due to atmospheric absorption is more than 0.1dB/km at 28GHz, whereas it is less than 0.005dB/km below 6GHz. For a base station that typically covers a 10km radius, atmospheric absorption contributes another 1-2dB of loss on top of the FSPL at 28GHz. Weather conditions such as rain and snow can increase the path loss significantly further [6].
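The FSPL comparison follows from the Friis free-space formula, FSPL = 20log10(4πdf/c), in which the distance term cancels when two frequencies are compared at the same distance. The short Python sketch below evaluates it. The 1km reference distance and the 0.1dB/km absorption figure (the lower bound quoted for 28GHz) are illustrative inputs chosen here, not measured data.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def fspl_db(freq_hz: float, dist_m: float) -> float:
    """Friis free-space path loss between isotropic antennas, in dB."""
    return 20 * math.log10(4 * math.pi * dist_m * freq_hz / C)

def extra_fspl_db(f_high_hz: float, f_low_hz: float, dist_m: float = 1000.0) -> float:
    """Additional FSPL at f_high relative to f_low; independent of dist_m."""
    return fspl_db(f_high_hz, dist_m) - fspl_db(f_low_hz, dist_m)

extra = extra_fspl_db(28e9, 1.9e9)
absorption_db = 0.1 * 10  # illustrative: 0.1 dB/km over a 10 km radius

print(f"Extra FSPL at 28GHz vs 1.9GHz: {extra:.1f} dB")  # 23.4 dB
print(f"Added atmospheric absorption over 10 km: {absorption_db:.1f} dB")
```

The difference reduces to 20log10(28/1.9), so changing `dist_m` leaves the result unchanged.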



Figure 1-1. Atmospheric absorption plot

Third, whereas sub-6GHz signals rely on diffraction to extend coverage, mm-wave signals experience negligible diffraction, which results in essentially line-of-sight-only propagation. Transmitting and receiving mm-wave signals in urban areas is therefore difficult because buildings and trees block many directions.

To counteract the higher path loss and line-of-sight propagation, large-scale beamforming is an essential technique in mm-wave 5G. Beamforming concentrates the signal power in desired directions to mitigate the high path loss and to suppress uncorrelated noise. It also enables spatial multiplexing, which allows more devices to share the same spectrum [7], [8]. Large arrays are favorable because they provide higher SNR and a narrower main lobe, which support more spatial-division multiple access (SDMA) points, higher data rates, and more users. Figure 1-2 shows a possible future use case in a crowded urban area where beamforming enables a relay drone to cover blind spots and support SDMA.
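The SNR and main-lobe benefits of a larger array can be sketched with the array factor of a uniform linear array. The Python below is an idealized illustration, not the geometry of the prototypes in this thesis: it assumes isotropic elements, half-wavelength spacing, a broadside beam, and noise that is uncorrelated across elements.

```python
import cmath
import math

def array_factor_db(n_elem: int, theta_deg: float, steer_deg: float = 0.0,
                    spacing: float = 0.5) -> float:
    """Normalized array factor (dB) of an n-element uniform linear array.

    spacing is in wavelengths; the element phases steer the beam to steer_deg.
    """
    k_d = 2 * math.pi * spacing
    psi = k_d * (math.sin(math.radians(theta_deg)) - math.sin(math.radians(steer_deg)))
    af = sum(cmath.exp(1j * m * psi) for m in range(n_elem)) / n_elem
    return 20 * math.log10(max(abs(af), 1e-12))

def half_power_beamwidth_deg(n_elem: int, step_deg: float = 0.01) -> float:
    """Scan away from broadside until the response falls below -3 dB."""
    theta = 0.0
    while array_factor_db(n_elem, theta) > 10 * math.log10(0.5):
        theta += step_deg
    return 2 * theta

for n in (4, 16, 64):
    snr_gain_db = 10 * math.log10(n)  # coherent signal vs. uncorrelated noise
    print(f"N={n:2d}: SNR gain {snr_gain_db:5.2f} dB, "
          f"half-power beamwidth {half_power_beamwidth_deg(n):5.2f} deg")
```

Under these assumptions, each 4× increase in element count adds about 6dB of SNR and shrinks the half-power beamwidth roughly fourfold.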



Figure 1-2. A possible beamforming future use case in an urban area

#### 1.2. Digital beamforming and challenges



Figure 1-3. (a) Analog beamforming, (b) digital beamforming, and (c) hybrid beamforming

There are three beamforming architectures: analog beamforming, digital beamforming, and hybrid beamforming [9]–[15]. As shown in Figure 1-3, analog beamforming uses analog phase-shifters to steer the element signals, followed by an analog combiner before the analog-to-digital conversion; digital beamforming directly digitizes the element signals and applies phase-shifting in the digital domain; and hybrid beamforming combines analog and digital beamforming.
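In the digital case, steering reduces to one complex weight per element followed by a sum. Because the digitized element samples are all available, several weight sets can be applied to the same samples to form simultaneous beams. The narrowband Python sketch below assumes a 16-element uniform linear array with half-wavelength spacing and an ideal plane wave arriving from 20°. The four beam directions are arbitrary examples.

```python
import cmath
import math

N_ELEM = 16     # assumption: 16-element uniform linear array
SPACING = 0.5   # assumption: half-wavelength element spacing

def element_samples(angle_deg: float) -> list[complex]:
    """Narrowband baseband samples of a unit plane wave arriving from angle_deg."""
    psi = 2 * math.pi * SPACING * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * m * psi) for m in range(N_ELEM)]

def steering_weights(steer_deg: float) -> list[complex]:
    """Conjugate-phase weights, normalized so a matched beam has unit gain."""
    return [s.conjugate() / N_ELEM for s in element_samples(steer_deg)]

def beamform(samples: list[complex], weights: list[complex]) -> complex:
    """Digital beamforming: per-element complex weight multiply, then sum."""
    return sum(w * x for w, x in zip(weights, samples))

# One set of digitized element samples feeds four independent beams.
snapshot = element_samples(20.0)
beams = {steer: steering_weights(steer) for steer in (-40.0, -10.0, 20.0, 45.0)}
for steer, weights in beams.items():
    print(f"beam at {steer:+5.1f} deg: |y| = {abs(beamform(snapshot, weights)):.3f}")
```

Only the beam pointed at 20° collects the full signal. Adding a beam costs one more weight set and summation, not another frontend.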

|                                | J. Jeong [9]<br>JSSC'2016 | S. Jang [10]<br>JSSC'2018 | S. Jang [11]<br>JSSC'2019 |
|--------------------------------|---------------------------|---------------------------|---------------------------|
| Technology                     | 65nm CMOS                 | 40nm CMOS                 | 40nm CMOS                 |
| Array Type                     | Phased Array              | Phased Array              | True-time Array           |
| Elements per IC                | 8                         | 16                        | 16                        |
| Beams per IC                   | 2                         | 4                         | 4                         |
| Frequency [MHz]                | 260                       | 1000                      | 1000                      |
| Bandwidth [MHz]                | 20                        | 100                       | 100                       |
| Active Area [mm <sup>2</sup> ] | 0.28                      | 0.22                      | 0.29                      |
| Total Power [mW]               | 124                       | 312                       | 453                       |

Table 2: State-of-the-art integrated digital beamformers

Table 2 summarizes state-of-the-art integrated digital beamformers. [9] is the first IC implementation of the IF-sampling bit-stream digital beamformer; it supports a 260MHz input frequency and a 20MHz bandwidth. [10] expands the array size, bandwidth, and number of beams per IC of the digital bit-stream beamformer. It samples a 1GHz IF with a 100MHz bandwidth, and its 16 elements generate four simultaneous beams. [11] reports the first true-time-delay digital beamforming IC, which also supports 16 elements, a 1GHz IF, and a 100MHz bandwidth. The true-time-delay array significantly mitigates beam squint [16].
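Beam squint can be illustrated numerically. Phase weights computed at a design frequency f0 steer the beam correctly only at f0, whereas true time delays produce phases that scale with frequency. The Python sketch below compares the two. It assumes a 16-element linear array with half-wavelength spacing at f0, a 45° steering angle, and ±5% frequency offsets; it is not the implementation in [11].

```python
import cmath
import math

N_ELEM = 16  # assumption: 16-element linear array, lambda/2 spacing at f0

def peak_angle_deg(weights: list[complex], f_ratio: float, step_deg: float = 0.02) -> float:
    """Direction of maximum response at f = f_ratio * f0, scanning -90..90 deg."""
    best_angle, best_mag = -90.0, -1.0
    for i in range(int(round(180 / step_deg)) + 1):
        theta = -90.0 + i * step_deg
        psi = math.pi * f_ratio * math.sin(math.radians(theta))  # lambda/2 at f0
        mag = abs(sum(w * cmath.exp(1j * m * psi) for m, w in enumerate(weights)))
        if mag > best_mag:
            best_angle, best_mag = theta, mag
    return best_angle

def phase_shifter_weights(steer_deg: float) -> list[complex]:
    """Fixed phases computed at f0 and applied unchanged at every frequency."""
    s0 = math.sin(math.radians(steer_deg))
    return [cmath.exp(-1j * m * math.pi * s0) for m in range(N_ELEM)]

def true_time_delay_weights(steer_deg: float, f_ratio: float) -> list[complex]:
    """A delay m*d*sin(theta0)/c yields a phase that scales with frequency."""
    s0 = math.sin(math.radians(steer_deg))
    return [cmath.exp(-1j * m * math.pi * f_ratio * s0) for m in range(N_ELEM)]

for f_ratio in (0.95, 1.00, 1.05):
    ps = peak_angle_deg(phase_shifter_weights(45.0), f_ratio)
    ttd = peak_angle_deg(true_time_delay_weights(45.0, f_ratio), f_ratio)
    print(f"f/f0={f_ratio:.2f}: phase-shift beam {ps:6.2f} deg, true-time-delay beam {ttd:6.2f} deg")
```

The phase-shifter beam moves to asin(sin45°/(f/f0)), about 42.3° at 1.05·f0. The true-time-delay beam stays at 45°.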

Compared to analog beamforming, digital beamforming has several appealing advantages for large arrays, including accurate beam-patterns, low-cost multiple simultaneous beams, flexible and re-configurable beam-patterns, and fast steering [17]. However, there are significant challenges that limit large-scale digital beamforming [18].

First, because the element-ADCs in a digital beamformer capture all the interferers and frontend harmonics, ADC performance, especially linearity, becomes the overall performance bottleneck. As an example, Figure 1-4 shows how the SFDR and SNR of the element-ADCs affect the overall array SFDR, SNR, and SNDR for different array sizes. Uncorrelated noise averages down as elements are added, but distortion products are correlated with the signal across elements and do not average out in the same way, so the array SFDR does not improve with array size. On the left side of the figure, we see that if the element-ADC has an SFDR of 70dB and an SNR of 60dB, the array SNDR is limited to the element-ADC SFDR of 70dB, and there is little array-SNDR improvement as the array size increases beyond 64 elements. However, as the right side of Figure 1-4 shows, if the SFDR of the element-ADCs increases by 10dB to 80dB, there is a substantial SNDR improvement for array sizes beyond 64 elements. Therefore, it is vital to improve the element-ADC linearity to benefit from the beamforming array gain.
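The saturation in Figure 1-4 can be reproduced qualitatively with a simple power-budget model. The Python sketch below is not the model behind the figure. It assumes that uncorrelated noise improves the array SNR by 10log10(N), that a single dominant spur adds coherently like the signal (so the array SFDR equals the element SFDR), and it uses the element SNR of 60dB and SFDR of 70dB or 80dB quoted above.

```python
import math

def array_snr_db(elem_snr_db: float, n_elem: int) -> float:
    """Uncorrelated element noise averages down: array SNR grows by 10*log10(N)."""
    return elem_snr_db + 10 * math.log10(n_elem)

def array_sndr_db(elem_snr_db: float, elem_sfdr_db: float, n_elem: int) -> float:
    """Array SNDR with one dominant spur that is correlated across elements.

    Assumption: the spur combines coherently like the signal, so the array SFDR
    equals the element SFDR; the spur is the only distortion term counted.
    """
    noise = 10 ** (-array_snr_db(elem_snr_db, n_elem) / 10)
    spur = 10 ** (-elem_sfdr_db / 10)
    return -10 * math.log10(noise + spur)

for sfdr in (70, 80):
    row = ", ".join(f"N={n}: {array_sndr_db(60, sfdr, n):.1f}" for n in (1, 16, 64, 256, 1024))
    print(f"element SNR 60dB, SFDR {sfdr}dB -> array SNDR [dB] {row}")
```

With a 70dB SFDR, going from 64 to 256 elements gains less than 0.5dB of SNDR. With an 80dB SFDR, the same step gains more than 2.5dB.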



Figure 1-4. Array SNDR versus array size for an array with an element SFDR of 60dB (left) and an array with an element SFDR of 70dB (right)

Second, a fully integrated system requires extensive mixed-signal routing. Making a large-scale beamformer with more than 16 elements, whether analog or digital, is very difficult, and mixed-signal routing in a digital beamformer further complicates the design. The element-ADCs occupy much more area and can lengthen the routing. The digital buses from the element-ADCs can be long and bulky, and their fast clock rates and high voltage swing make it difficult to isolate the analog signals from the digital signals.

Third, the per-element ADCs produce a tremendous amount of digital data, which leads to significant power consumption for digital signal processing. Since most state-of-the-art beamformers have fewer than 8 elements per IC [13], [19], [20], on-board digital I/O routing inevitably becomes a formidable challenge in a tiled digital beamformer with more than 64 elements. These challenges motivate our bit-stream processing (BSP) approach, which mitigates the digital processing power consumption, and our 16-element beamformer, which reduces on-board digital I/O in large-scale tiled systems.
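A back-of-the-envelope estimate shows the scale of the raw data. The Python sketch below multiplies the number of sub-ADCs by the sample rate and by the bits needed to binary-encode each sample. The inputs are assumptions for illustration: four sub-ADCs per element, each clocked at 4GS/s with a 5-level quantizer (3 bits per sample).

```python
import math

def raw_adc_rate_bps(n_subadcs: int, fs_hz: float, n_levels: int) -> float:
    """Aggregate raw rate when each sample is binary-coded in ceil(log2(n_levels)) bits."""
    bits_per_sample = math.ceil(math.log2(n_levels))
    return n_subadcs * fs_hz * bits_per_sample

# Assumptions: four sub-ADCs per element, each clocked at 4GS/s with a 5-level quantizer.
for n_elem in (16, 64):
    rate = raw_adc_rate_bps(4 * n_elem, 4e9, 5)
    print(f"{n_elem} elements: {rate / 1e9:.0f} Gb/s of raw ADC output")
```

Under these assumptions, 16 elements already produce 768Gb/s before any decimation. This is why processing the bit-streams on-chip and sending out only the beamformed results matters.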

#### 1.3. Continuous-Time Band-Pass Delta-Sigma Modulator element-ADC

The Continuous-Time Band-Pass Delta-Sigma Modulator (CTBPDSM) ADC is an attractive choice for a digital beamformer because it provides inherent anti-aliasing and a compact size, and it delivers high speed with low power consumption [10], [11], [21], [22]. Because a CTBPDSM is oversampled, its output bit-width is narrow. This simplifies digital bus routing and also enables the power- and area-efficient BSP approach. All these benefits are essential for a beamformer with 16 elements per IC.
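A minimal discrete-time sketch shows why band-pass noise shaping suits this receiver. With the 1GHz IF and 4GS/s sampling rate used in this work, the IF sits at fs/4, where the noise transfer function NTF(z) = 1 + z^-2 has its zeros. The Python below is an illustration under stated assumptions, not the CT loop designed in Chapter 2. It uses an error-feedback loop with that NTF (the band-pass counterpart of a first-order low-pass modulator), an ideal 5-level quantizer, and a tone about 10MHz above the IF. It then compares the quantization noise in a 100MHz band around fs/4 before and after shaping.

```python
import cmath
import math

FS = 4e9        # sampling rate from the text (4GS/s)
N_PTS = 2048    # simulation length (assumption)
STEP = 0.5      # 5-level mid-tread quantizer: -1, -0.5, 0, 0.5, 1

def quantize5(w: float) -> float:
    """Ideal 5-level mid-tread quantizer."""
    return max(-1.0, min(1.0, STEP * math.floor(w / STEP + 0.5)))

def bandpass_dsm(x: list[float]) -> tuple[list[float], list[float]]:
    """Error-feedback DT modulator with NTF(z) = 1 + z^-2 (zeros at +/- fs/4).

    Returns (y, e) with y[n] = x[n] + e[n] + e[n-2].
    """
    y: list[float] = []
    e: list[float] = []
    for n, xn in enumerate(x):
        w = xn + (e[n - 2] if n >= 2 else 0.0)  # feed back the error from two samples ago
        yn = quantize5(w)
        y.append(yn)
        e.append(yn - w)                         # quantization error, |e| <= STEP/2
    return y, e

def band_power(seq: list[float], k_lo: int, k_hi: int) -> float:
    """Hann-windowed DFT power summed over bins k_lo..k_hi (direct DFT)."""
    n_pts = len(seq)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_pts) for n in range(n_pts)]
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        acc = sum(win[n] * seq[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                  for n in range(n_pts))
        total += abs(acc) ** 2
    return total

f_in = 517 * FS / N_PTS                     # coherent tone ~9.8MHz above fs/4 = 1GHz
x = [0.5 * math.sin(2 * math.pi * f_in * n / FS) for n in range(N_PTS)]
y, e = bandpass_dsm(x)
shaped = [yn - xn for yn, xn in zip(y, x)]  # equals e[n] + e[n-2]

k0 = N_PTS // 4                             # bin of fs/4
half_bw = round(50e6 / (FS / N_PTS))        # +/-50MHz, i.e. a 100MHz band
ratio = band_power(shaped, k0 - half_bw, k0 + half_bw) / band_power(e, k0 - half_bw, k0 + half_bw)
print(f"in-band quantization noise after shaping: {10 * math.log10(ratio):.1f} dB")
```

The output takes only five levels, yet the noise near the IF is pushed down because |NTF| = 2|cos(2πf/fs)| vanishes at f = fs/4. A CT loop achieves a comparable NTF with resonators and DACs rather than a delay line.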

Moreover, from a system-level point of view, as we discuss later, the CTBPDSM enables a high-IF-sampling receiver architecture with several attractive advantages: 1) IF sampling moves I/Q mixing to the digital domain, where it is easy to implement with a standard digital design flow; 2) digital I/Q mixing removes the need for analog I/Q matching and, because only a single-phase LO is needed, halves the power consumption of analog LO generation and distribution; and 3) IF sampling minimizes potential problems related to oscillator pulling.
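Because the 1GHz IF sits at fs/4 of the 4GS/s sampling rate, the digital LO sequence exp(-jπn/2) takes only the values 1, -j, -1, and j. I/Q mixing therefore reduces to sign changes and swapping the real and imaginary parts rather than true multiplications. The Python sketch below illustrates this. The 10MHz tone offset is hypothetical, and the 2-tap average stands in for the decimation filtering that a real design would use.

```python
import cmath
import math

FS = 4e9                     # sampling rate from the text
F_IF = FS / 4                # 1GHz IF, i.e. fs/4
LO = (1, -1j, -1, 1j)        # exp(-j*pi*n/2): the fs/4 digital LO, no true multipliers

def ddc_fs4(samples: list[float]) -> list[complex]:
    """Mix real IF samples to complex baseband, then apply a 2-tap average.

    The mixing image lands near fs/2, where the 2-tap average has its null.
    """
    mixed = [s * LO[n % 4] for n, s in enumerate(samples)]
    return [(mixed[n] + mixed[n - 1]) / 2 for n in range(1, len(mixed))]

DELTA = 10e6                 # hypothetical offset of the test tone from the IF
x = [math.cos(2 * math.pi * (F_IF + DELTA) * n / FS) for n in range(4000)]
bb = ddc_fs4(x)

# Average phase advance per output sample gives the recovered baseband frequency.
steps = [cmath.phase(bb[n] * bb[n - 1].conjugate()) for n in range(1, len(bb))]
f_est = sum(steps) / len(steps) * FS / (2 * math.pi)
print(f"recovered offset {f_est / 1e6:.3f} MHz, |bb| = {abs(bb[100]):.3f}")
```

The tone reappears at +10MHz with amplitude close to 0.5 (half of the real input amplitude). Only one real-valued LO phase was needed in the analog domain to reach the IF.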

### **1.4.** Thesis contributions

First, in Chapter 2, we introduce a $4\times$ parallelized multi-phase-sampling CTBPDSM architecture for the element-ADC to circumvent the fundamental linearity bottleneck in large-scale digital beamformers. Multi-phase-sampling improves the ADC HD3 by 9dB. The $4\times$ parallelization improves the ADC SNR by 7dB and relaxes the clock phase-noise requirement by 5.6dB. Circuit techniques such as duty-cycle control of the first HZ DAC further improve the CTBPDSM's loop stability and linearity.

Second, in Chapter 3, we demonstrate a 16-element four-beam digital beamformer prototype that operates with a 1GHz IF and a 4GS/s sampling frequency and provides a 100MHz BW. A total of 64 sub-ADCs are integrated in this prototype. Thanks to the parallelized multi-phase-sampling sub-ADC array, the measured overall SFDR and SNDR are 77dB and 56dB, respectively. A high-order 2048-QAM constellation achieves a measured EVM of -39.9dB, further verifying the excellent overall performance.

Third, in Chapter 4, we report the first published 16-element fully integrated single-chip 28GHz mm-wave digital beamformer, flip-chip mounted on an LTCC substrate with an in-package 4×4 patch antenna array. The high-IF sampling architecture, with a 1GHz IF, facilitates single-phase 27GHz LO distribution: four PLLs, driven by a shared 100MHz external clock, generate four copies of the 27GHz LO. Each PLL is dedicated to one side of the RX slice bank, removing the need for bulky on-chip 1-to-16 transmission lines. A unique placement method and bump assignment simplify both LO and mm-wave routing. A total of 64 CTBPDSM sub-ADCs, four per element, minimize the distortion and harmonics from the mm-wave frontend and the ADC. The 16-element beamformer with BSP generates four independent, simultaneous beams. Over-the-air measurements confirm accurate 3D beam-patterns, a measured mm-wave-to-digital noise figure of 7dB, and a QAM-4 EVM of -18dB.

Fourth, in Chapter 5, we introduce a frequency-interleaving technique to expand the bandwidth of the per-element continuous-time band-pass delta-sigma modulator ADCs to 300MHz. The prototype ADC chip is fabricated in 28nm CMOS and occupies 0.0255mm<sup>2</sup>. The measured SNDR and SFDR are 37dB and 44dB, respectively. The prototype achieves 300MHz BW at a high input frequency of 1.5GHz while consuming only 38mW, demonstrating that frequency-interleaving breaks the power-bandwidth barrier of CT DSMs.

## Chapter 2. Parallelized Continuous-Time Band-Pass Delta-Sigma Modulator

Digital beamforming demands high-speed, low-power, and compact element-ADCs. This chapter discusses the element-ADC design and introduces a parallelized ADC architecture using the multi-phase-sampling technique. Continuous-time ADCs have several advantages for digital beamforming, including small size, inherent anti-aliasing, and good energy efficiency. We introduce the multi-phase-sampling parallel approach to improve SNR/SNDR and to provide input filtering.

Compared with a discrete-time ADC, which requires an anti-aliasing filter to prevent higher frequencies from aliasing back into the band of interest, a Continuous-Time Delta-Sigma Modulator (CTDSM) ADC samples the signal inside the feedback loop and hence provides inherent anti-aliasing filtering [23]. We exploit this property to eliminate the bulky anti-aliasing filter, saving significant chip area. Compared with a discrete-time design, the op-amps in a CTDSM can work at a lower GBW since they do not have to settle the abrupt steps of switched-capacitor operation; this saves considerable op-amp power and allows a higher sampling frequency [24]. In addition, a CTDSM has a resistive input, which simplifies the driver design. A CTDSM can be translated into a band-pass modulator (CTBPDSM) [23] to directly digitize a high-frequency IF; this is a crucial part of our power- and area-efficient digital bit-stream beam processing approach [10].

Despite the advantages of the CTBPDSM, practical non-idealities such as feedback-DAC non-linearity, finite amplifier GBW, and clock jitter limit the SNDR of GHz input-frequency CTBPDSMs to about 40dB [10]. Higher power consumption and a higher loop order could improve the performance [25]; however, both are prohibitive for high-bandwidth beamformer arrays, which require many copies of the ADC. Thus, it is challenging to achieve high SNR and low distortion within a limited power and area budget.

Table 3 summarizes state-of-the-art GHz DSMs. Most state-of-the-art CT DSMs are low-pass, and there are few band-pass DSMs that achieve more than 100MHz signal bandwidth.

|                              | W. Wang [26]<br>ISSCC'2019 | S. Jang [10]<br>JSSC'2018 | S. Dey [27]<br>JSSC'2018 | S. Huang [28]<br>ISSCC'2017 | S. Wu [29]<br>ISSCC'2016 |
|------------------------------|----------------------------|---------------------------|--------------------------|-----------------------------|--------------------------|
| Architecture                 | CT-ΔΣ                      | CT-BP-ΔΣ                  | CT-ΔΣ                    | CT-ΔΣ                       | CT-ΔΣ                    |
| Technology                   | 28nm CMOS                  | 40nm CMOS                 | 65nm CMOS                | 16nm CMOS                   | 16nm CMOS                |
| $F_s$ [MS/s]                 | 2000                       | 4000                      | 1500                     | 2150                        | 2880                     |
| Bandwidth [MHz]              | 100                        | 100                       | 50                       | 125                         | 160                      |
| $F_{in,hf}$ [MHz]            | 18                         | 1000                      | 1                        | 40                          | 30                       |
| SNDR [dB]                    | 72.6                       | 48                        | 73.5                     | 71.9                        | 65.3                     |
| SFDR [dB]                    | 83.6                       | 66                        | 88                       | 85\*                        | 70\*                     |
| Active Area [mm<sup>2</sup>] | 0.019                      | 0.011                     | 0.35                     | 0.22                        | 0.155                    |
| Power [mW]                   | 16.3                       | 20                        | 51.8                     | 54                          | 40                       |

Table 3: State-of-the-art GHz delta-sigma modulators

\*Estimated from figures

## 2.1. Parallelized sub-ADCs



Figure 2-1. Four parallelized sub-ADCs per element sample the 1GHz IF input

The parallel concept tackles the limitations of the CTBPDSM ADC by replacing the per-element ADC of a conventional beamformer with multiple lower-performance sub-ADCs. As shown in Figure 2-1, four 4GS/s CTBPDSMs sample the 1GHz IF signal from the RF frontend. Parallelizing the CTBPDSM ADCs opens up three avenues of improvement: 1) multiple parallel per-element ADCs improve SNR and reduce clock-jitter sensitivity; 2) multi-phase sampling of the sub-ADCs suppresses harmonics both from the ADCs themselves and from the input; and 3) the smaller size and lower power of each sub-ADC get around the power-delay performance wall. In the conventional approach, higher power means larger area and more parasitics, which in turn cause more delay, necessitating even higher power, and so on.

#### 2.2. Multi-phase-sampling

Instead of simply using four identical sub-ADCs in parallel, we clock them with different phases of the 4GHz clock while all sub-ADCs sample the same input. As depicted in Figure 2-1, the sampling instants of the four sub-ADCs are spaced in 90-degree increments of the sampling clock. An on-chip per-element Delay-Locked Loop (DLL) generates the multi-phase clock. The bit-stream outputs of the CTBPDSMs are coarsely aligned by delay cells, synchronized to the 4GHz master clock by DFFs, and added together by a pre-adder. Note that multi-phase-sampling is fundamentally different from interleaving in a time-interleaved (TI) ADC: multi-phase-sampling adds the outputs of the sub-ADCs, so, unlike in a TI ADC, mismatch between the sub-ADCs does not create signal artifacts [30].

Parallel multi-phase-sampling ADCs have several advantages: 1) multi-phase-sampling suppresses harmonics and interferers; 2) the 6dB SNR improvement expected from 4× sub-ADCs still holds, because the noise of the sub-ADCs is uncorrelated; 3) the clock jitter requirement is further relaxed, because the clock-jitter-related noise is decorrelated by the different sampling instants; and 4) the specification of each sub-ADC is relaxed, so each sub-ADC can stay small, reducing area, power, and layout-routing overheads.

## 2.2.1. Harmonic suppression

We explain the harmonic suppression of the multi-phase-sampling technique from two different perspectives: 1) equivalent digital FIR filtering and 2) mathematical analysis.

## 1) Equivalent digital FIR filtering

Intuitively, multi-phase-sampling in the analog domain, combined with data synchronization and summing in the digital domain, is equivalent to a 4-tap digital FIR filter. Because the four 4GS/s sub-ADCs sample the input at 90-degree increments of the clock (every 62.5ps), the equivalent filter operates at 16GS/s. With the four sub-ADC outputs added with equal weights, the transfer function of this filter is:

$$H(z) = \sum_{k=0}^{3} z^{-k}$$
(2-1)



Figure 2-2. Frequency response of the equivalent 4-tap FIR filter resulting from multi-phase sampling of the input

Figure 2-2 shows the frequency response of the equivalent 4-tap digital FIR filter. This inherent filter has a notch at 4GHz and suppresses the 3GHz third harmonic by 9dB relative to the 1GHz fundamental. The low-pass response of the multi-phase-sampling sub-ADC array also provides additional anti-aliasing, facilitating a more aggressive noise-shaping ADC architecture with less inherent anti-aliasing but higher linearity (e.g., a feedforward continuous-time modulator). This filtering is advantageous compared with a 16GS/s 4-tap digital FIR filter following a single-channel 16GS/s ADC. Although both approaches produce the same filtering effect, the conventional method requires a 16GS/s ADC, and designing such a fast single-channel ADC is very challenging. With multi-phase-sampling, in contrast, four 4GS/s sub-ADCs realize a similar FIR response.
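As a quick numerical check of the equivalent-FIR view, the sketch below evaluates the magnitude of (2-1) at the 16GS/s equivalent rate. It assumes ideal equal-weight taps and perfect quarter-period clock spacing; the function names are illustrative.

```python
import numpy as np

FS_EQ = 16e9       # equivalent rate: four 4GS/s sub-ADCs spaced by a quarter clock period
TAPS = np.ones(4)  # equal weights: H(z) = 1 + z^-1 + z^-2 + z^-3, as in (2-1)

def fir_gain(f_hz):
    """Magnitude of the equivalent 4-tap FIR response at frequency f_hz."""
    w = 2 * np.pi * f_hz / FS_EQ
    k = np.arange(len(TAPS))
    return abs(np.sum(TAPS * np.exp(-1j * w * k)))

def rel_db(f_hz, f_ref_hz=1e9):
    """Gain at f_hz relative to the 1GHz fundamental, in dB."""
    return 20 * np.log10(fir_gain(f_hz) / fir_gain(f_ref_hz))

for f in (3e9, 5e9, 7e9):
    print(f"{f / 1e9:.0f}GHz: {rel_db(f):.1f}dB relative to 1GHz")
print(f"|H| at 4GHz: {fir_gain(4e9):.1e}")  # numerically zero: the notch
```

The 3GHz, 5GHz, and 7GHz results (about -9.1dB, -12.6dB, and -14.0dB) line up with the HD3, HD5, and HD7 suppression derived analytically below.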

2) Mathematical analysis



Figure 2-3. Mathematical explanation of how multi-phase-sampling suppresses harmonics

As illustrated in Figure 2-3, the frontend driving the sub-ADC array produces distortion. For simplicity, we only consider the 3<sup>rd</sup>-order harmonic for now. We write the distorted signal from the frontend as:

$$\sin(\omega t) - \alpha \sin(3\omega t) \tag{2-2}$$

where $\omega$ is the input angular frequency. With four sub-ADCs sampling this input at 90° increments of the sampling clock, the sampled signals are:

$$\sin(\omega(t+3\tau/2)) - \alpha \sin(3\omega(t+3\tau/2))$$
(2-3)

$$\sin(\omega(t+\tau/2)) - \alpha \sin(3\omega(t+\tau/2))$$
(2-4)

$$\sin(\omega(t-\tau/2)) - \alpha \sin(3\omega(t-\tau/2))$$
(2-5)

$$\sin(\omega(t-3\tau/2)) - \alpha \sin(3\omega(t-3\tau/2))$$
(2-6)

where $\tau = \pi/(8\omega)$ for the $f_s/4$ center-frequency CTBPDSM sub-ADC; for a 1GHz input, $\tau = 62.5$ps, one quarter of the 250ps sampling period. Data alignment in the digital domain does not affect the phase relationship between the signals, so the combined 5-bit 4GS/s digital bit-stream after the addition is simply the sum of (2-3) to (2-6):

$$2\left[\cos\frac{3\pi}{16} + \cos\frac{\pi}{16}\right]\sin(\omega t) - 2\alpha\left[\cos\frac{9\pi}{16} + \cos\frac{3\pi}{16}\right]\sin(3\omega t)$$
(2-7)

The coefficient ratio of the 3<sup>rd</sup>-order harmonic to the fundamental is:

$$\frac{\cos\frac{9\pi}{16} + \cos\frac{3\pi}{16}}{\cos\frac{3\pi}{16} + \cos\frac{\pi}{16}} = 0.35 \to -9 \text{dB}$$
(2-8)

indicating a 9dB HD3 suppression. Applying the same procedure to HD5 and HD7 gives 12.6dB and 14dB of suppression, respectively. A behavioral simulation, shown in Figure 2-4, confirms the improvement in both SNDR and HD3.
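The cosine sums in (2-7) and (2-8) generalize directly to any harmonic. The sketch below assumes the symmetric sampling offsets of (2-3) to (2-6) and an ideal 1GHz input ($\omega\tau = \pi/8$); the offset tuples and function names are illustrative.

```python
import math

MULTI_PHASE = (-1.5, -0.5, 0.5, 1.5)  # sampling offsets in units of tau, eqs. (2-3) to (2-6)
SINGLE_PHASE = (0.0, 0.0, 0.0, 0.0)   # all four sub-ADCs sample at the same instant
OMEGA_TAU = math.pi / 8               # omega * tau for a 1GHz input, tau = 62.5ps

def harmonic_gain(h, offsets):
    """Amplitude gain of the h-th harmonic after summing the four sub-ADC samples.

    For offsets symmetric about zero, each pair sin(h*w*(t+d)) + sin(h*w*(t-d))
    equals 2*cos(h*w*d)*sin(h*w*t), so only the cosine sum of (2-7) remains.
    """
    return sum(math.cos(h * OMEGA_TAU * d) for d in offsets)

def hd_change_db(h, offsets):
    """Change in HDh (harmonic-to-fundamental ratio) relative to a single sub-ADC."""
    ratio = abs(harmonic_gain(h, offsets)) / abs(harmonic_gain(1, offsets))
    return 20 * math.log10(ratio)

for h in (3, 5, 7):
    print(f"HD{h}: {hd_change_db(h, MULTI_PHASE):+.1f}dB")
# fundamental amplitude relative to in-phase combining of the four outputs
print(f"fundamental: {20 * math.log10(harmonic_gain(1, MULTI_PHASE) / 4):.2f}dB")
```

The HD changes come out at about -9.1dB, -12.6dB, and -14.0dB, while the fundamental loses only about 0.86dB compared with in-phase combining; with `SINGLE_PHASE` offsets every harmonic ratio is unchanged (0dB).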



Figure 2-4. Behavioral simulation comparison of single-ADC, single-sampling 4× sub-ADC and multi-phase-sampling 4× sub-ADC

## 2.2.2. Clock jitter noise suppression

It is well known that in a CT sigma-delta modulator, quantizer clock jitter contributes little to the noise floor thanks to the feedback loop. Instead, the outer-loop Return-to-Zero (RZ) and Half-delay Return-to-Zero (HZ) feedback DACs dominate the jitter-induced noise floor [31], [32]: clock jitter on the DAC pulses down-mixes the high-power out-of-band quantization noise into the signal band, raising the in-band noise.

Multi-phase-sampling exploits the low correlation between parallel CTBPDSMs to reduce the in-band noise caused by clock jitter. The low correlation of the quantizer outputs results from the non-linear and chaotic nature of sigma-delta modulation, and multi-phase-sampling further decorrelates the quantizer outputs of the different sub-ADCs. Simulations show that with single-phase sampling, the quantization-noise correlation between parallel sub-ADCs is 0.6; with multi-phase-sampling, the equivalent correlation falls to 0.35. In this way, multi-phase-sampling decorrelates the jitter-induced in-band noise.
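Why lower correlation helps can be seen from a simple noise-combining model. This sketch assumes, as a simplification not taken from the thesis, that the signal adds coherently, every sub-ADC has the same noise variance, and every pair shares one correlation coefficient $\rho$. Plugging in the quoted correlations illustrates the trend only; it is not expected to reproduce the SNDR numbers of Figure 2-5.

```python
import math

def combining_snr_gain_db(n, rho):
    """SNR gain from summing n channels that carry the same signal.

    Simplified model: coherent signal addition, equal noise variance per channel,
    and the same noise correlation coefficient rho for every pair of channels.
    """
    signal_power = n ** 2                 # coherent amplitude sum, squared
    noise_power = n + n * (n - 1) * rho   # variances plus pairwise covariances
    return 10 * math.log10(signal_power / noise_power)

for rho in (0.0, 0.35, 0.6):
    print(f"rho = {rho:.2f}: {combining_snr_gain_db(4, rho):.1f}dB")
```

With fully uncorrelated noise the familiar 6dB gain for four sub-ADCs appears; any residual correlation eats into it, which is why reducing the correlation from 0.6 to 0.35 matters.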



Figure 2-5. Clock jitter sensitivity comparison between single-phase sampling and multi-phase-sampling for a 4× sub-ADC array

As shown in Figure 2-5, for the same $4\times$ sub-ADC array and the same amount of clock jitter, the simulated SNDR with multi-phase-sampling is 3-4dB better than with single-phase sampling. Figure 2-5 also shows that HD3 with multi-phase-sampling is 9dB better than with single-phase sampling, confirming the analysis in Section 2.2.1.

## 2.3. Implementation

### 2.3.1. Modulator architecture



Figure 2-6. Multi-phase-sampling CTBPDSM sub-ADC array

Figure 2-6 shows a single slice of the multi-phase-sampling parallel CTBPDSM sub-ADC array. All the sub-ADCs are identical except that they use different phases of the 4GHz clock, which an on-chip per-element Delay-Locked Loop (DLL) generates. Each CTBPDSM is a cascade-of-resonator feedback (CRFB) band-pass delta-sigma modulator with a feedforward path. Each modulator has two energy-efficient single-op-amp resonators, a resistive current summer, a 5-level quantizer, and Return-to-Zero (RZ) and Half-delay Return-to-Zero (HZ) pulse-shaped current-steering DACs [10], [21]. We add duty-cycle control to the 1<sup>st</sup> HZ DAC because, as we show later, proper tuning of its duty-cycle suppresses harmonics that originate from DAC mismatch. Duty-cycle optimization also improves loop stability, which is critical because we eliminate one RZ DAC to save power at the cost of reduced stability.

Each sub-ADC generates a 4GS/s 3-bit output stream (the five quantizer levels require 3 bits). The four outputs are digitally aligned and added together, forming a combined 4GS/s 5-bit output with up to 17 levels. To facilitate testing, we AND-gate each sub-ADC output, making it possible to observe arbitrary combinations of the sub-ADCs.
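The pre-adder and its test gating can be modeled in a few lines. This sketch assumes each sub-ADC code is an unsigned integer from 0 to 4 and that the four streams are already aligned; the `enables` flags stand in for the AND-gate controls and are hypothetical names, not the chip's register map.

```python
def pre_adder(codes, enables=(True, True, True, True)):
    """Add one 3-bit sample from each sub-ADC; disabled sub-ADCs are AND-gated to zero.

    codes:   four quantizer codes, assumed unsigned 0..4 (five levels on 3 bits)
    enables: hypothetical per-sub-ADC test-enable bits driving the AND gates
    """
    if len(codes) != len(enables):
        raise ValueError("need one enable bit per sub-ADC")
    total = 0
    for code, enabled in zip(codes, enables):
        if not 0 <= code <= 4:
            raise ValueError(f"code {code} is outside the 5-level range")
        mask = 0b111 if enabled else 0b000  # AND gate on each of the 3 bits
        total += code & mask
    return total  # 0..16: 17 levels, so the combined word needs 5 bits

def combine_streams(streams, enables=(True, True, True, True)):
    """Apply the pre-adder sample by sample to four already-aligned bit-streams."""
    return [pre_adder(samples, enables) for samples in zip(*streams)]
```

Passing `enables=(True, False, False, False)` to `combine_streams` passes a single sub-ADC through, which is how individual sub-ADCs can be observed during measurement.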

#### 2.3.2. CTBPDSM loop design

### 1) NTF/STF synthesis

The band-pass NTF is synthesized with the help of the MATLAB delta-sigma toolbox [23], [33]. The prototype discrete-time (DT) domain NTF is:



Figure 2-7. PZ map of the NTF and STF

Figure 2-7 shows the pole and zero locations of the NTF and STF. This band-pass NTF is synthesized for a moderate OSR of 20 (i.e., $f_s/(2\,\mathrm{BW})$ with a 100MHz BW and a 4GHz sampling frequency) and an out-of-band (OOB) gain of 1.7. The continuous-time loop is designed to match this discrete-time NTF. The OOB gain of a high-speed modulator should be chosen carefully. A higher OOB gain lowers the in-band quantization noise but tends to make the modulator less stable. Moreover, a higher OOB gain raises the clock-jitter-induced in-band noise floor, since most of this jitter-related noise arises from OOB quantization noise mixing with the clock jitter in the outer-loop HZ DAC [32].
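To see why an $f_s/4$ center frequency is natural, the sketch below builds a pole-free 4th-order band-pass NTF with the standard $z^{-1} \to -z^{-2}$ low-pass-to-band-pass mapping and checks the OSR arithmetic. This is an illustrative stand-in, not the synthesized NTF of Figure 2-7: the toolbox synthesis additionally places poles that bring the peak (OOB) gain down from 4 to the chosen 1.7.

```python
import numpy as np

FS = 4e9              # sampling frequency
BW = 100e6            # signal bandwidth centered at FS/4 = 1GHz
OSR = FS / (2 * BW)   # band-pass oversampling ratio: 20

def ntf_pole_free(f_hz):
    """Illustrative 4th-order band-pass NTF (1 + z^-2)^2.

    Obtained from the 2nd-order low-pass NTF (1 - z^-1)^2 via z^-1 -> -z^-2,
    which moves the double noise-shaping zero from DC to FS/4. It is NOT the
    synthesized NTF of Figure 2-7, whose poles limit the OOB gain to 1.7.
    """
    z_inv = np.exp(-2j * np.pi * f_hz / FS)
    return (1 + z_inv ** 2) ** 2

f = np.linspace(0, FS / 2, 20001)
oob_gain = np.max(np.abs(ntf_pole_free(f)))
edge_db = 20 * np.log10(abs(ntf_pole_free(FS / 4 + BW / 2)))
print(f"OSR = {OSR:.0f}, peak |NTF| = {oob_gain:.2f}, |NTF| at band edge = {edge_db:.1f}dB")
```

The quantization noise is nulled at 1GHz and suppressed by about 32dB at the 1.05GHz band edge, at the cost of a peak gain of 4. That excessive OOB gain is exactly what the pole placement in the real NTF trades away for stability and jitter tolerance.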

## 2) DAC coefficients

The current-steering DACs in the feedback loop are pulse-shaped so that the continuous-time loop response can match the discrete-time prototype. Details of the DAC coefficient calculation are provided in Appendix A, and Table 4 summarizes the calculated coefficients. Note that the coefficient of the 1<sup>st</sup> RZ DAC is very small; therefore, this DAC is eliminated to save power at the expense of degraded loop stability.

Table 4: DAC coefficients

| DAC coefficient | $k_{rz4}$ | $k_{hz4}$ | $k_{rz2}$ | $k_{hz2}$ |
|-----------------|-----------|-----------|-----------|-----------|
| Value           | 6u        | -100u     | 64u       | -160u     |

3) Variable duty-cycle 1<sup>st</sup> HZ DAC

A duty-cycle controller adjusts the duty-cycle of the first HZ DAC to optimize the pole/zero placement and the effective 4<sup>th</sup>-order loop gain. Because of DAC coefficient mismatch and clock duty-cycle variation, improper NTF/STF pole/zero placement can degrade loop stability and hence SNR.



Figure 2-8. NTF/STF PZ map with the variation of the duty-cycle of 1<sup>st</sup> HZ DAC

The pole/zero plot in Figure 2-8, derived using the analysis in Appendix A, shows how the clock duty-cycle affects the NTF and STF. Duty-cycle control gives flexibility in pole/zero placement, keeping the NTF poles away from the unit circle and hence improving loop stability. The blue arrows in Figure 2-8 show the pole/zero movement with increasing duty-cycle, while the red arrows show the movement with decreasing duty-cycle.

In a CTDSM, mismatch in the current-steering DACs generates harmonics. As mentioned, this is especially true for the outer-loop DACs, since their non-linearity refers directly to the input. Although larger devices reduce mismatch, the larger area increases parasitics. The additional parasitics make the clock driver design harder and increase the tail-parasitic capacitance of the current sources, which degrades DAC linearity at high speed. We therefore adjust the duty cycle of the first HZ DAC to mitigate the mismatch-induced distortion from the DAC.

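The duty-cycle affects the distribution of the quantizer output codes. As shown in Figure 2-9, as the duty cycle increases, the code distribution favors the maximum and minimum codes. This shift improves the effective linearity of the DAC: at the extreme codes, all unit elements are switched the same way, so element mismatch there causes only a gain error, and the code-dependent (non-linear) mismatch error is exercised less often.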
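The link between code distribution and mismatch error can be illustrated with a static model of a 4-element (5-level) thermometer DAC. This is a rough proxy under stated assumptions: fixed element selection, a made-up ±2% mismatch, and two hypothetical code histograms (not the data of Figure 2-9). The histogram-weighted mean-square INL stands in for distortion; it is not the harmonic power itself.

```python
import numpy as np

def unary_dac_levels(unit_weights):
    """Output levels of a thermometer-coded DAC: code k turns on the first k unit cells.

    Fixed element selection (no dynamic element matching) is assumed for simplicity.
    """
    return np.concatenate(([0.0], np.cumsum(unit_weights)))

def endpoint_inl(levels):
    """Deviation of each level from the straight line through the two extreme codes."""
    codes = np.arange(len(levels))
    line = levels[0] + (levels[-1] - levels[0]) * codes / codes[-1]
    return levels - line

def weighted_inl_power(inl, code_prob):
    """Mean-square INL, weighted by how often the quantizer visits each code."""
    return float(np.sum(np.asarray(code_prob) * inl ** 2))

# Hypothetical +/-2% unit-cell mismatch and two made-up code histograms
weights = [1.02, 0.99, 1.01, 0.98]
inl = endpoint_inl(unary_dac_levels(weights))
center_heavy = [0.05, 0.20, 0.50, 0.20, 0.05]
edge_heavy = [0.30, 0.15, 0.10, 0.15, 0.30]
print("INL per code:", np.round(inl, 3))
print("center-heavy:", weighted_inl_power(inl, center_heavy))
print("edge-heavy:  ", weighted_inl_power(inl, edge_heavy))
```

The two extreme codes have zero INL by construction, so a histogram that spends more time there sees a smaller average mismatch error, consistent with the trend described above.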



Figure 2-9. Quantizer output code distribution at different duty-cycles

Behavioral simulations in Figure 2-10 indicate that increasing the duty-cycle to 55% reduces DAC-mismatch-related harmonics by ~10dB. The increased duty-cycle degrades noise-shaping and causes a 1.8dB SNDR loss. This trade-off is worthwhile for digital beamforming, since a high SFDR is essential in large-scale arrays.



Figure 2-10. Behavioral simulations with different duty-cycles for the 1st HZ DAC

## 4) ELD compensation

For a GHz delta-sigma loop, the delay introduced by the quantizer and the current-steering DAC pushes part of the DAC pulse into the next clock cycle, which increases the effective loop order and degrades loop stability [34]–[37]. An advantage of the $f_s/4$ center frequency is that this excess loop delay (ELD) can be compensated by adding a one-period delay ($z^{-1}$) to the quantizer output, with no additional loop needed [38]. Referring to Appendix A, since both sides of (A-9) have the same order even with the additional delay, we only need to adjust the DAC coefficients to compensate for the ELD. However, if this delay becomes $z^{-2}$, or if the resonator center frequency is not $f_s/4$, the order of the left side of (A-9) becomes higher than that of the right side, and (A-9) has no solution.

## 2.3.3. Circuit blocks

1) Single op-amp resonator



Figure 2-11. Schematic of the single-op-amp resonator

The resonator, shown in Figure 2-11, is based on the single-op-amp scheme first proposed in [39]. The single-op-amp approach is smaller and consumes less power than conventional two-op-amp biquadratic resonators and bulky LC-tank resonators [21]. The input resistor ($R_{in}$) converts the input voltage into a current-mode signal. The resonator contains three RC networks: 1) a negative-feedback parallel $R_1C_1$; 2) a positive-feedback series $R_2C_2$; and 3) an output-feedthrough series $R_oC_o$. The feedback RC networks are tunable to compensate for process mismatch and finite op-amp GBW. In the next sub-section, we provide a convenient way of estimating the center frequency in the presence of finite op-amp GBW.

## a) Ideal single-op-amp resonator

We first analyze the ideal op-amp case to show the importance of matching the RC networks: mismatch both shifts the center frequency and degrades the quality factor of the resonator. The op-amp input is an ideal virtual ground, and we also assume that the resonator output feeds the virtual-ground input of the following stage. Applying KCL to the input and output currents, we get:

$$i_{ip} = i_{in} = i_{1p} + i_{2p} = \frac{v_o}{Z_1} - \frac{v_o}{Z_2}$$
(2-10)

$$i_{op} = i_{on} = i_{2p} - i_{1p} = \frac{v_o}{Z_o}$$
(2-11)

The current-mode transfer function is then:

$$H_{r}(s) = \frac{i_{op}}{i_{ip}} = \frac{v_{o}}{Z_{o}} \frac{1}{i_{1p} + i_{2p}} = \frac{1}{Z_{o}} \frac{1}{\frac{1}{Z_{1}} - \frac{1}{Z_{2}}}$$
(2-12)

Given:

$$Z_{n} = \begin{cases} \frac{1}{sC_{n}} \parallel R_{n} = \frac{R_{n}}{1 + sR_{n}C_{n}} = \frac{R_{n}}{1 + s\tau_{n}}, & n = 1 \\ \frac{1}{sC_{n}} + R_{n}, & n = 2, o \end{cases}$$
(2-13)

where $\tau_n = R_nC_n$ ($n = 1, 2, o$) are the time constants of the RC networks. Substituting (2-13) into (2-12) gives:

$$H_{r}(s) = \frac{R_{1}}{R_{o}} \frac{1 + \tau_{2}s}{1 + \tau_{o}s} \frac{\tau_{o}s}{1 + (\tau_{1} + \tau_{2} - R_{1}C_{2})s + \tau_{1}\tau_{2}s^{2}}$$
(2-14)

If the pole-zero pair associated with $R_2C_2$ and $R_oC_o$ in (2-14) cancels (i.e., $\tau_2 = \tau_o$), the quality factor Q of the single-op-amp resonator is:

$$Q = \frac{\tau_1 \tau_2}{\tau_0 \left(\tau_1 + \tau_2 - R_1 C_2\right)}$$
(2-15)

We see that the relative values of R and C determine the Q factor. By choosing the RC values so that the damping term $\tau_1 + \tau_2 - R_1C_2$ vanishes, the Q factor can be made infinite; (2-16) gives one such solution:

$$\begin{array}{lll} R_{o} = 2R & R_{1} = R & R_{2} = R/2 \\ C_{o} = C/2 & C_{1} = C & C_{2} = 2C \end{array} \tag{2-16}$$

With these RC values, (2-14) can be simplified to:

$$H_{r}(s) = \frac{0.5RCs}{1 + (RC)^{2} s^{2}}$$
(2-17)

We see that only the RC network and its matching determine the center frequency of the resonator for an ideal op-amp.

#### b) Finite GBW single-op-amp resonator

In practice, finite op-amp GBW degrades the resonator Q and lowers the center frequency, complicating resonator tuning; thus, it is important to evaluate the impact of finite op-amp GBW. The objective of this analysis is a concise equation for estimating the center frequency. For simplicity, we consider a finite-GBW op-amp with a single pole. The transfer function of the op-amp in terms of GBW and DC gain $A_0$ can be written as:

$$A(s) = \frac{GBW \cdot A_0}{A_0 s + GBW}$$
(2-18)

While the input current and output current still follow KCL, the non-ideal op-amp input is no longer a virtual ground. Considering the input and output voltages of the op-amp shown in Figure 2-11, the KCL equation becomes:

$$i_{ip} = i_{in} = i_{1p} + i_{2p} = \frac{v_i + v_o}{Z_1} - \frac{v_o - v_i}{Z_2}$$
(2-19)

$$i_{op} = i_{on} = i_2 - i_1 = \frac{v_o}{Z_o}$$
 (2-20)

Hence, the current-mode transfer function is:

$$H_{r}(s) = \frac{1}{Z_{o}} \frac{1}{\left(1 + \frac{1}{A(s)}\right)\frac{1}{Z_{1}} - \left(1 - \frac{1}{A(s)}\right)\frac{1}{Z_{2}}}$$
(2-21)

Substituting (2-13) and (2-18) into (2-21) and collecting terms, we get the following resonator transfer function:

$$H_{r}(s) = \frac{R_{1}}{R_{o}} \frac{(\tau_{2}s+1)}{(\tau_{o}s+1)} \frac{\tau_{o}s}{[\tau_{1}\tau_{2}s^{2} + (\tau_{1}+\tau_{2}-R_{1}C_{2})s+1] + \frac{1}{A(s)} [\tau_{1}\tau_{2}s^{2} + (\tau_{1}+\tau_{2}+R_{1}C_{2})s+1]}$$
(2-22)

Note the non-ideal term in the denominator related to the op-amp gain. Again assuming ideal RC matching and using the same RC values as in (2-16), the simplified transfer function is:

$$H_r(s) = \frac{0.5\omega_0 s}{\left(s^2 + \omega_0^2\right) + \frac{1}{A(s)}\left(s^2 + 4\omega_0 s + \omega_0^2\right)}$$
(2-23)

where $\omega_0 = 1/\tau_o = 1/RC$. The center frequency of the resonator can be estimated from (2-23) and is found to be:

$$\omega_{center} = \frac{\sqrt{\left(\frac{4\omega_0}{A_0} + \frac{\omega_0^2}{GBW}\right)^2 + 4\omega_0^2 \left(1 + \frac{4\omega_0}{GBW}\right)}}{2\left(1 + \frac{4\omega_0}{GBW}\right)} \approx \frac{\omega_0}{\sqrt{1 + \frac{4\omega_0}{GBW}}}$$
(2-24)

For an on-chip op-amp optimized for high-frequency operation with a DC gain of 50dB, Figure 2-12 shows the effect of finite op-amp GBW on the NTF. For a practical op-amp GBW of approximately $10\omega_0$, the center frequency of the resonator is only $0.845\omega_0$; thus, an RC tuning range of more than 15% is needed to account for the finite op-amp GBW alone. In practice, an even larger tuning range is needed to also compensate for process mismatch.
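The approximation in (2-24) can be checked by locating the peak of $|H_r(j\omega)|$ from (2-23) numerically. The sketch assumes the single-pole op-amp model of (2-18), a 50dB DC gain, and $\mathrm{GBW} = 10\omega_0$, with frequencies normalized to $\omega_0$; the function names and the search grid are illustrative choices.

```python
import numpy as np

def resonator_gain(w, w0, gbw, a0):
    """|H_r(jw)| from (2-23), with the single-pole op-amp model of (2-18)."""
    s = 1j * w
    inv_a = s / gbw + 1.0 / a0  # 1/A(s) = s/GBW + 1/A0
    num = 0.5 * w0 * s
    den = (s ** 2 + w0 ** 2) + inv_a * (s ** 2 + 4 * w0 * s + w0 ** 2)
    return np.abs(num / den)

def center_frequency(w0, gbw, a0, span=0.5, points=200001):
    """Numerically locate the resonance peak within +/-span*w0 of w0."""
    w = np.linspace((1 - span) * w0, (1 + span) * w0, points)
    return w[np.argmax(resonator_gain(w, w0, gbw, a0))]

w0 = 1.0                 # normalized: frequencies in units of w0 = 1/RC
a0 = 10 ** (50 / 20)     # 50dB DC gain
gbw = 10 * w0            # GBW of about 10*w0, as in the text
approx = w0 / np.sqrt(1 + 4 * w0 / gbw)  # right-hand side of (2-24)
peak = center_frequency(w0, gbw, a0)
print(f"approximation: {approx:.3f} w0, numerical peak: {peak:.3f} w0")
print(f"downward shift to be tuned out: {100 * (1 - approx):.1f}%")
```

The numerical peak lands within a few tenths of a percent of the $0.845\omega_0$ estimate, and the roughly 15.5% downward shift matches the tuning-range requirement stated above. With a very large GBW and DC gain, `center_frequency` returns essentially $\omega_0$.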



Figure 2-12. Effect of op-amp bandwidth on NTF for a DC gain of 50dB and nominal 1GHz center frequency

The RC values and tuning-step sizes are listed in Table 5.

Table 5: RC values and tuning steps

|              | $R_1$ | $R_2$ | $C_1$       | $C_2$      |
|--------------|-------|-------|-------------|------------|
| Nominal      | 960Ω  | 440Ω  | 130fF       | 267fF      |
| Tuning Step  | N/A   | N/A   | 2.18fF/step | 5.6fF/step |
| Total Tuning | N/A   | N/A   | 17.5fF      | 45fF       |

The finite DC gain of the op-amp also affects noise shaping. From simulations, for a two-pole op-amp with a phase margin of 60°, a DC gain of more than 50dB is sufficient to support effective noise shaping.

## 2) Quantizer

The quantizer is a 5-level flash ADC with four comparators. Figure 2-13 shows a schematic of a single comparator. The dynamic comparator is fast and has a low probability of metastability [40]. A trim-current DAC cancels the input offset. The calibration bits are obtained in a one-time calibration: the inputs are first shorted to the common-mode voltage, and then the outputs are collected and the offset bits adjusted. These steps run iteratively in an FSM triggered through the SPI interface.
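The one-time offset calibration can be sketched behaviorally. This is not the chip's FSM. It is a minimal model under hypothetical assumptions: the comparator is reduced to a fixed offset plus Gaussian noise, and the 1mV trim step, noise level, code range, and the linear sweep that picks the code closest to a 50/50 output split are all illustrative choices.

```python
import random

def ones_fraction(offset_mv, code, step_mv, noise_mv, n=2000, seed=0):
    """Fraction of '1' decisions with both inputs shorted to the common-mode voltage.

    With shorted inputs the comparator only sees its residual offset (the offset
    minus the trim-DAC correction) plus noise. A fixed seed keeps the sketch repeatable.
    """
    rng = random.Random(seed)
    residual = offset_mv - code * step_mv
    ones = sum(residual + rng.gauss(0.0, noise_mv) > 0 for _ in range(n))
    return ones / n

def calibrate_offset(offset_mv, step_mv=1.0, noise_mv=0.3, codes=range(-15, 16)):
    """Sweep the (hypothetical) trim codes and keep the one closest to 50% ones."""
    return min(codes, key=lambda c: abs(ones_fraction(offset_mv, c, step_mv, noise_mv) - 0.5))

print(calibrate_offset(7.3))  # hypothetical 7.3mV offset -> trim code 7
```

A successive-approximation search over the trim code would need fewer iterations; the linear sweep is used here only because it is easy to read.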



Figure 2-13. Schematic of a single comparator of the 5-level quantizer

3) Delay locked loop



Figure 2-14. Schematic of the DLL

Figure 2-14 shows a schematic of the DLL [41]. It is an analog DLL with a 1<sup>st</sup>-order loop and a delay chain of several delay cells, each controlled by V\_CTRL, as shown in the lower left of Figure 2-14. The DLL locks to the 4GHz master clock, and CLK1, CLK2, CLK3, and CLK4 are tapped at the appropriate points of the delay chain to provide 90-degree spacing.

## 4) Duty-cycle controller



Figure 2-15. Duty-cycle controller for first-stage HZ DAC.

Figure 2-15 shows a schematic of the duty-cycle controller [42] for the 1<sup>st</sup> HZ DAC. It is essential to modify only the falling edge of the clock while keeping the rising edge intact. Otherwise, the HZ DAC pulse might fall into the next clock cycle, effectively increasing the loop order, which is detrimental to loop stability. The master clock, CLK, is processed with different delays. Four delayed CLK versions (a, b, c, d) are connected to a stacked inverter and latch stage. When both a and b are low, the OUT clock has a rising edge, while when both c and d are high, the OUT clock has a falling edge. Since we only change the delays of c and d, we only vary the falling edge of the OUT clock.

## 2.4. Measurements



Figure 2-16. Die photo and layout of the CTBPDSM sub-ADC array

The sub-ADC array is fabricated in 40nm CMOS as an integral part of the prototype digital beamformers. It occupies $390\mu m \times 140\mu m$, and each sub-ADC core occupies $97\mu m \times 140\mu m$ (Figure 2-16). To facilitate testing, a Digital Down Converter (DDC) down-converts the 100MHz signal band around the 1GHz IF in the 4GS/s output bit-stream to 50MHz I and Q baseband signals. A 4<sup>th</sup>-order Cascaded Integrator-Comb (CIC) decimation filter then decimates the 2GS/s down-converted I and Q streams by 8, producing a 250MS/s digital output that is captured by a logic analyzer.

With the help of the AND-gated pre-adder, each sub-ADC can be observed independently. Using this capability, Figure 2-17 shows the measured power spectra of each individual sub-ADC and of the four multi-phase-sampled sub-ADCs combined. The measurements confirm two advantages of the multi-phase-sampling sub-ADC array. First, multi-phase-sampling improves HD3 and HD5 by 9.3dB and 4.1dB, respectively: the measured HD3 and HD5 are -59dB and -67.3dB when the sub-ADCs are combined in phase, and improve to -68.3dB and -71.4dB with multi-phase-sampling. Second, multi-phase-sampling improves the SNDR by 7dB. This is possible because thermal noise, jitter noise, and quantization noise are decorrelated among the four CTBPDSM sub-ADCs; together with the significant harmonic suppression, this yields an SNDR improvement of more than the 6dB expected from noise averaging alone.



Figure 2-17. Measured power spectra of four sub-ADCs and combined 4× sub-ADC array

Figure 2-18 shows another set of measured power spectra of the CTBPDSM ADC for low and high 1<sup>st</sup> HZ DAC duty-cycles. Because the duty-cycle controller has an enable/disable switch and a buffered, low-pass-filtered copy of the clock is routed to an analog I/O pad, we estimate that the low duty-cycle setting is slightly below 50%, while the high duty-cycle setting is slightly below 60%<sup>1</sup>. With a low duty-cycle, the measured HD3 and HD5 are -60.7dB and -68dB, respectively; with a high duty-cycle, they improve to -72.1dB and -74.7dB. The measured SNDR improves by 3dB because the duty-cycle controller optimizes the pole/zero placement of the NTF and reduces harmonics.

<sup>1</sup> PEX simulation suggests a 9% duty-cycle tuning step.



Figure 2-18. Measured power spectra of the CTBPDSM for high and low 1st HZ DAC duty-cycles

## Chapter 3. Prototype-I: A 16-Element 1GHz IF Digital Beamformer

As discussed in Chapter 1, digital beamforming offers essential advantages for large arrays, such as accurate beam-patterns, multiple simultaneous beams, flexible and reconfigurable beam-patterns, and fast steering [17]. However, the element-ADCs in a digital beamformer see all the interferers, so element-ADC performance, especially linearity, is critical for large arrays. In this chapter, we incorporate the parallelized multi-phase-sampling ADC techniques described in Chapter 2 into a prototype 16-element 1GHz IF digital beamformer.



### **3.1. System architecture**

Figure 3-1. System architecture of the IF digital beamformer with BSP and multi-phase-sampling sub-ADC array

Figure 3-1 shows the system architecture of the 16-element 1GHz IF digital beamformer. Sixteen multi-phase-sampling Continuous-Time Band-Pass Delta-Sigma Modulator (CTBPDSM) sub-ADC arrays digitize the sixteen 1GHz IF inputs. As described in Chapter 2, these ADCs are cascade-of-resonator feedback (CRFB) bandpass delta-sigma modulators with a feedforward path. The CTBPDSMs sample at 4GS/s and have an effective signal bandwidth of 100MHz. In addition to the benefits of directly digitizing a 1GHz IF, a CTBPDSM centered at one quarter of the sampling frequency is attractive because it dramatically simplifies digital I/Q down-mixing. In each sub-ADC array, four CTBPDSMs are clocked with different phases (i.e., CLK1, CLK2, ...) of the 4GHz clock. As mentioned in Chapter 2, the DLL in each sub-ADC array generates the four sampling clock phases, spaced in 90-degree increments. The raw bit-streams from the sub-ADC arrays are then fed into the bit-stream processor.
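As a quick check of the clock relations stated above (a worked example, not taken from the design files): with a 4GS/s sampling rate, the 1GHz IF sits exactly at one quarter of the sampling frequency, and 90-degree phase increments of the 4GHz clock correspond to a quarter of the 250ps sampling period.

$$f_c = \frac{f_s}{4} = \frac{4\,\mathrm{GHz}}{4} = 1\,\mathrm{GHz}, \qquad \Delta t = \frac{90^\circ}{360^\circ}\,T_s = \frac{1}{4 f_s} = 62.5\,\mathrm{ps}$$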

### **3.2. Digital Bit-Stream Processing (BSP)**

The 64 sub-ADCs generate an aggregate sampling rate of 0.256TS/s, and processing such a large amount of data is a significant challenge for digital beamforming. We exploit the short digital word length of the quantizer outputs of the CTBPDSMs. Instead of immediately decimating the quantizer outputs, we directly process the 4GS/s 3-bit bit-streams from each sub-ADC, without filtering or decimation [10], [11]. Simple digital MUXes perform digital down-conversion and complex weight multiplication, saving power and area compared to conventional DSP approaches. With BSP, the digital beamform processing for all four beams occupies only 0.14mm<sup>2</sup> and consumes 200mW.

The bit-stream outputs of the CTBPDSMs are digitally synchronized and then summed by an AND-gated pre-adder. The combined 4GS/s 5-bit bit-stream from each sub-ADC array is interleaved into separate I and Q streams, halving the sampling rate to 2GS/s. This data-rate reduction is possible because the I and Q LO mixing sequences for I/Q Digital Down-Conversion (DDC) are zero on alternate samples. The DDC mixers down-convert the half-rate interleaved bit-streams to baseband I/Q signals. A 10-bit Complex Weight Multiplication (CWM) stage phase-rotates the baseband I/Q bit-stream vectors. An adder combines the 16 phase-rotated signals to form 2GS/s, 50MHz-bandwidth I and Q baseband digital beam signals. A 4<sup>th</sup>-order truncated CIC decimator reduces the sample rate to 250MS/s and delivers 12-bit baseband I and Q outputs. Four sets of CWMs, adders, and decimators produce four independent simultaneous beams.
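The claim that the LO sequences are zero on alternate samples can be made explicit. Assuming the convention I[n] = x[n]cos(πn/2) and Q[n] = x[n]sin(πn/2) (a mixer written with e^{-jπn/2} simply flips the sign of Q), an f<sub>s</sub>/4 LO gives:

$$\cos\!\left(\tfrac{\pi n}{2}\right) = 1, 0, -1, 0, \ldots \qquad \sin\!\left(\tfrac{\pi n}{2}\right) = 0, 1, 0, -1, \ldots$$

$$I[2m] = (-1)^m\,x[2m], \qquad Q[2m+1] = (-1)^m\,x[2m+1]$$

so the I path needs only the even samples and the Q path only the odd samples, each at 2GS/s and each multiplied by the alternating sequence 1, -1, 1, -1, ...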

#### **3.2.1. Implementation**

1) AND gated pre-adder and interleaver



Figure 3-2. Bit-Stream Processing AND gated pre-adder and interleaver

As shown in Figure 3-2, the multi-phase-sampling sub-ADC array generates four 3-bit 4GS/s output streams. Since the sub-ADC outputs are digitally aligned and synchronized with the master clock inside the sub-ADC array, the pre-adder can be implemented with standard digital place-and-route tools. Each bit-stream is AND-gated inside the pre-adder under the control of the ADC\_SEL control word, which is set through the SPI interface. An interleaver after the pre-adder takes advantage of the  $f_s/4$  center frequency of the CTBPDSM, halving the sample rate to 2GS/s [10], which dramatically relaxes the timing constraints for all subsequent stages.
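A behavioral sketch of the pre-adder and interleaver, assuming integer-valued sub-ADC outputs and a hypothetical ADC\_SEL encoding in which bit k enables sub-ADC k (the actual register layout is not given in the text). It models function only, not timing or bit widths.

```python
def pre_add(streams, adc_sel):
    """Sum time-aligned sub-ADC output streams, AND-gating each with one ADC_SEL bit.

    streams: list of equal-length integer lists, one per sub-ADC.
    adc_sel: integer mask; bit k = 1 enables sub-ADC k (hypothetical encoding).
    """
    gates = [(adc_sel >> k) & 1 for k in range(len(streams))]
    # zip(*streams) yields one tuple of simultaneous samples per time step
    return [sum(g * v for g, v in zip(gates, samples)) for samples in zip(*streams)]


def interleave(x):
    """Split a 4GS/s stream into even-indexed (I path) and odd-indexed (Q path) samples.

    With an fs/4 center frequency the I LO is zero on odd samples and the Q LO is
    zero on even samples, so each path keeps every other sample (2GS/s).
    """
    return x[0::2], x[1::2]


# Usage: four toy sub-ADC streams with all sub-ADCs enabled
subs = [[1, 2, -1, 0], [0, 1, -1, 1], [1, 1, 0, -1], [0, 2, -2, 0]]
summed = pre_add(subs, 0b1111)          # [2, 6, -4, 0]
i_path, q_path = interleave(summed)     # [2, -4] and [6, 0]
```

Clearing a bit of `adc_sel` removes that sub-ADC from the sum, which is what the AND gating provides for per-sub-ADC measurements.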

2) Digital Down-Conversion (DDC)



Figure 3-3. Bit-Stream Processing (BSP) Digital Down-Conversion (DDC)

The digital down-conversion stage processes the interleaved bit-streams, as shown in Figure 3-3. Since the mixing LO sequence is 1, -1, 1, -1, ..., an inverter at one of the MUX inputs realizes the negative sign. The 2GHz LO toggles the MUX select to realize the alternating 1/-1 mixing operation.
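A minimal behavioral model of this MUX-based mixer, using the I = x·cos, Q = x·sin sign convention (an assumption; the hardware convention is not stated). Feeding it tones at f<sub>s</sub>/4 shows them landing at DC.

```python
def ddc(half_rate):
    """Multiply a 2GS/s interleaved stream by the LO sequence 1, -1, 1, -1, ...

    Behavioral model of the MUX-plus-inverter mixer: the LO toggles the MUX select
    between the sample (even index) and its negation (odd index).
    """
    return [v if n % 2 == 0 else -v for n, v in enumerate(half_rate)]


# Usage: tones at fs/4 (1GHz at 4GS/s) land at DC after interleaving and DDC
cos_tone = [7, 0, -7, 0] * 4      # 7*cos(pi*n/2)
sin_tone = [0, 7, 0, -7] * 4      # 7*sin(pi*n/2)
i_bb = ddc(cos_tone[0::2])        # even samples feed the I path
q_bb = ddc(sin_tone[1::2])        # odd samples feed the Q path
```

No multiplier is needed: each output is either the input sample or its negation.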

3) Complex Weight Multiplier (CWM)



Figure 3-4. Bit-Stream Processing (BSP) Complex Weight Multiplier (CWM)

The down-converted 5-bit 2GS/s I/Q signals are passed to the complex weight multipliers (CWMs) for phase rotation. The rotation matrix in (3-1) is applied to the I/Q signal to rotate the vector. A 15-level MUX implements multiplication by the 15-level 5-bit bit-stream. The 5-bit rotation coefficients are set through an SPI bus, as illustrated in Figure 3-4. The output width is determined by the 5-bit rotation coefficients and the maximum signed bit-stream value, so the rotated signal is 8 bits wide.

$$R(\theta_k) = \begin{bmatrix} \cos \theta_k & \sin \theta_k \\ -\sin \theta_k & \cos \theta_k \end{bmatrix}$$
(3-1)
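The sketch below models the CWM behaviorally. It assumes each of the cosine and sine coefficients is a 5-bit signed integer in [-15, 15] (the "10-bit" CWM read as two 5-bit coefficients) and that the down-converted stream takes the 15 levels -7..+7; both are assumptions for illustration. Multiplication is written as a table lookup to mirror the MUX: because the input has only a few levels, every product can be precomputed and selected.

```python
import math

# Hypothetical: the down-converted stream is assumed to take 15 signed levels, -7..+7.
LEVELS = range(-7, 8)


def quantize_coeffs(theta, bits=5):
    """Round cos/sin of theta to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    full_scale = 2 ** (bits - 1) - 1          # 15 for 5-bit coefficients
    return round(full_scale * math.cos(theta)), round(full_scale * math.sin(theta))


def mux_table(coef):
    """Precompute coef * level for every input level; hardware selects one with a MUX."""
    return {level: coef * level for level in LEVELS}


def cwm(i_seq, q_seq, theta, bits=5):
    """Rotate I/Q samples with R(theta) from (3-1): [I'; Q'] = [[c, s], [-s, c]] [I; Q]."""
    c, s = quantize_coeffs(theta, bits)
    times_c, times_s = mux_table(c), mux_table(s)
    i_out = [times_c[i] + times_s[q] for i, q in zip(i_seq, q_seq)]
    q_out = [-times_s[i] + times_c[q] for i, q in zip(i_seq, q_seq)]
    return i_out, q_out


# Usage: rotate a short I/Q sequence by 30 degrees
i_rot, q_rot = cwm([3, -2, 7], [1, 0, -7], math.radians(30))
```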

4) Decimator with Truncation



Figure 3-5. 4th-order Cascaded Integrator-Comb (CIC) decimation filter with stage truncation

The summed beam is decimated by 8 in a 4<sup>th</sup>-order CIC decimator with internal stage truncation (Figure 3-5) to produce the 12-bit 250MS/s output beam. Although a 3<sup>rd</sup>-order CIC decimator is sufficient for the 2<sup>nd</sup>-order noise shaping of a 4<sup>th</sup>-order CTBPDSM, we use a 4<sup>th</sup>-order decimator with more aggressive internal stage truncation to obtain a smaller output bit-width. The internal stage word widths are [23 22 19 14 14 13 13 12] for the 4<sup>th</sup>-order decimator, compared with [20 18 16 15 15 14] for a 3<sup>rd</sup>-order decimator.
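A minimal sketch of a CIC decimator with internal stage truncation (Hogenauer-style pruning), written for clarity rather than hardware accuracy. The mapping from the quoted word widths to discarded bits assumes the first stage is full precision, so stage j discards widths[0] - widths[j] bits; that interpretation, and the helper names, are illustrative assumptions. Python integers do not wrap, so modular overflow is not modeled.

```python
def shifts_from_widths(widths):
    """Turn per-stage register widths into incremental LSB discards per stage.

    Assumes the first stage is full precision, so stage j discards
    widths[0] - widths[j] bits in total relative to the input scale.
    """
    discarded = [widths[0] - w for w in widths]
    return [d - p for d, p in zip(discarded, [0] + discarded[:-1])]


def cic_decimate(x, order=4, rate=8, shifts=None):
    """Order-N CIC decimator (differential delay 1) with optional stage truncation.

    shifts holds 2*order incremental right-shifts applied at the input of each
    stage: the `order` integrators first, then the `order` combs.
    """
    shifts = shifts or [0] * (2 * order)
    if len(shifts) != 2 * order:
        raise ValueError("need one shift per integrator and comb stage")
    acc = [0] * order                          # integrator states, input rate
    integrated = []
    for sample in x:
        v = sample
        for k in range(order):
            v >>= shifts[k]                    # drop LSBs entering integrator k
            acc[k] += v
            v = acc[k]
        integrated.append(v)
    decimated = integrated[rate - 1::rate]     # keep every rate-th sample
    prev = [0] * order                         # comb delay elements, output rate
    out = []
    for v in decimated:
        for k in range(order):
            v >>= shifts[order + k]            # drop LSBs entering comb k
            v, prev[k] = v - prev[k], v        # y[m] = u[m] - u[m-1]
        out.append(v)
    return out


# Usage: 4th-order, decimate-by-8 CIC with the widths quoted in the text
trunc = shifts_from_widths([23, 22, 19, 14, 14, 13, 13, 12])
y = cic_decimate([1] * 200, shifts=trunc)
```

Without truncation the DC gain is rate<sup>order</sup> (8<sup>4</sup> = 4096); each discarded bit trades a small amount of added noise for a narrower register.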

### **3.3. Measurements**




Figure 3-6. Die photo and layout of the element 4× sub-ADC array

This prototype 16-element 4-beam digital beamformer is fabricated in 40nm CMOS and occupies a total area of  $4.6 \text{mm}^2$, as shown in Figure 3-6. Each sub-ADC occupies  $97\mu\text{m} \times 140\mu\text{m}$ . Including the per-element DLL and pre-adder, a sub-ADC array occupies  $390\mu\text{m} \times 140\mu\text{m}$ . The bit-stream processing (BSP) circuitry is located at the center. Thanks to the efficient BSP approach, it measures only  $380\mu\text{m} \times 380\mu\text{m}$  and consumes 200mW while processing an aggregate sample rate of 0.256TS/s.



Figure 3-7. Testing Setup

Figure 3-7 shows the test setup. Sixteen DDS boards generate the sixteen 1GHz IF test inputs to the beamformer. RF baluns on the PCB convert the single-ended DDS outputs to differential signals. The chip is packaged in an 88-pin QFN package and mounted in an on-board socket. A logic analyzer captures the 12-bit digital beam outputs.



#### **3.3.1. Power spectra**



Figure 3-8 reports the measured power spectra of the entire 16-element digital beamformer with a total of 64 sub-ADCs. The measurement was taken with the beam steered to 45°; the measured overall SNDR and SFDR are 56dB (9.1-bit ENOB) and 77dB, respectively.
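The SNDR figures above are read from spectra of the captured digital beam output. A minimal sketch of that kind of estimate, assuming a coherently sampled test tone and no windowing (the actual FFT length, window, and bin selection used for the measurements are not described here): the signal-bin power is compared against the power in all other bins except DC. A direct O(N²) DFT keeps the example dependency-free.

```python
import cmath
import math


def dft_power(x):
    """Bin powers |X[k]|^2 of a complex record via a direct O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]


def sndr_db(x, signal_bin):
    """Signal-bin power over the power of all other bins except DC (coherent tone assumed)."""
    p = dft_power(x)
    noise_and_distortion = sum(v for k, v in enumerate(p) if k not in (0, signal_bin))
    return 10 * math.log10(p[signal_bin] / noise_and_distortion)


# Usage: a coherently sampled complex tone quantized to 12-bit I and Q
N, BIN, AMP = 256, 13, 2047
record = [complex(round(AMP * math.cos(2 * math.pi * BIN * t / N)),
                  round(AMP * math.sin(2 * math.pi * BIN * t / N))) for t in range(N)]
print(f"estimated SNDR: {sndr_db(record, BIN):.1f} dB")
```

For ideal 12-bit rounding the estimate should land near the textbook 6.02·12 + 1.76 ≈ 74dB; a real capture would additionally need a window or careful coherent-frequency planning.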

Compared to the single-element measurement, shown by the red trace, the entire 16-element array increases the SNR and SNDR by 8.7dB and 7.7dB, respectively.
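For reference, if every element contributed equal-power noise that was fully uncorrelated across the 16 elements, coherent combining would raise the SNR by the ideal array gain below; the measured improvements fall short of this bound.

$$G_{\mathrm{array}} = 10\log_{10}(N) = 10\log_{10}(16) \approx 12.0\,\mathrm{dB}$$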

#### **3.3.2. QAM constellation measurements**

Processing modulated signals such as QAM is the end objective of the digital beamformer, so we also evaluate performance with test QAM constellations. The high array SNDR of 56dB allows the prototype beamformer to receive very high-order modulation (i.e., 2048QAM) without symbol errors in 16000 test symbols at a 5M/s symbol rate. The symbol rate and the measurement record length are limited by the test instrument. The measured EVMs are -40.4dB, -40.3dB, and -39.9dB for 512QAM, 1024QAM, and 2048QAM, respectively. The measured constellation diagrams are summarized in Table 6.

| Modulation            | 512QAM              | 1024QAM             | 2048QAM             |
|-----------------------|---------------------|---------------------|---------------------|
| Constellation Diagram |                     |                     |                     |
| EVM                   | -40.4dB             | -40.3dB             | -39.9dB             |
| Symbol Error Rate     | <10 <sup>-4</sup> * | <10 <sup>-4</sup> * | <10 <sup>-4</sup> * |
| Symbol Rate           | 5M/s                | 5M/s                | 5M/s                |
| Data Rate             | 45Mbps              | 50Mbps              | 55Mbps              |

**Table 6: Measured QAM constellations** 

\* No symbol errors found in 16000 symbols
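The data rates in Table 6 follow from log2(M) bits per symbol at 5M symbols/s. The sketch below reproduces them and converts the EVMs from dB to percent, assuming the usual definition EVM(dB) = 20·log10(EVM<sub>rms</sub>) (EVM normalization conventions vary between instruments).

```python
import math


def data_rate_bps(order, symbol_rate):
    """Raw bit rate of M-QAM: log2(M) bits per symbol times the symbol rate."""
    return math.log2(order) * symbol_rate


def evm_percent(evm_db):
    """Convert an RMS EVM in dB to percent: 100 * 10**(EVM_dB / 20)."""
    return 100 * 10 ** (evm_db / 20)


for m, evm in [(512, -40.4), (1024, -40.3), (2048, -39.9)]:
    print(f"{m}QAM: {data_rate_bps(m, 5e6) / 1e6:.0f} Mbps, EVM {evm_percent(evm):.2f}%")
```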

#### **3.3.3. Measured Beampatterns**



Figure 3-9. Measured beampatterns overlaid on simulated beampatterns

Figure 3-9 shows the measured beampatterns. Digital beamforming offers great flexibility and accuracy in the steering coefficients at a very low cost, enabling advanced beamforming techniques such as adaptive nulling and beam tapering. Adaptive beam steering uses an LMS method to optimize the steering coefficients to provide directive rejection at a specified angle. Tapered beam steering suppresses the side-lobes at the expense of a wider main lobe. The accurate 10-bit CWM coefficients result in measured beampatterns that are nearly identical to the ideal beampatterns. For a beam steered at 25°, an adaptive null at 60° reaches -41.8dB. Two tapered beams, steered at -30° and 45°, are also measured; their maximum side-lobe levels are -22.1dB and -24.1dB, respectively.
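A sketch of how ideal (simulated) beampatterns like those in Figure 3-9 can be computed. It assumes a 16-element uniform linear array with half-wavelength spacing, which is an illustrative assumption since the text does not specify the emulated geometry. Coefficient quantization and LMS adaptive nulling are not modeled, and the sin² taper is an illustrative choice, not the taper used in the measurements.

```python
import cmath
import math


def steering_weights(n, steer_deg, spacing=0.5, taper=None):
    """Weights a_k * exp(-j*2*pi*d*k*sin(theta0)) for an n-element linear array.

    spacing d is in wavelengths (lambda/2 assumed); taper gives real amplitudes
    a_k (uniform if None).
    """
    amps = taper if taper is not None else [1.0] * n
    psi = 2 * math.pi * spacing * math.sin(math.radians(steer_deg))
    return [a * cmath.exp(-1j * psi * k) for k, a in enumerate(amps)]


def pattern_db(weights, theta_deg, spacing=0.5):
    """Array response at theta_deg, normalized so the steered peak is 0dB."""
    psi = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    total = sum(w * cmath.exp(1j * psi * k) for k, w in enumerate(weights))
    return 20 * math.log10(abs(total) / sum(abs(w) for w in weights))


# Usage: a uniform beam steered to 45 degrees and a tapered beam steered to -30 degrees
uniform = steering_weights(16, 45.0)
sin2_taper = [math.sin(math.pi * (k + 0.5) / 16) ** 2 for k in range(16)]
tapered = steering_weights(16, -30.0, taper=sin2_taper)
for angle in (-60, -30, 0, 45):
    print(angle, round(pattern_db(uniform, angle), 1), round(pattern_db(tapered, angle), 1))
```

Sweeping `theta_deg` over -90° to 90° traces the full pattern; the taper widens the main lobe while lowering the side-lobes, as described above.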

### **3.4. Conclusion**

In this work, we introduce a parallel multi-phase-sampling CTBPDSM sub-ADC array per element to overcome the element-ADC linearity bottleneck in large-scale digital beamforming. The prototype 16-element 1GHz IF digital beamformer IC integrates 64 4GS/s sub-ADCs, with four parallel multi-phase-sampling sub-ADCs per element. The bit-stream processor efficiently generates four simultaneous beams from the aggregate 0.256TS/s data stream generated by the 64 sub-ADCs. The measured HD3 suppression of 9dB agrees with the mathematical analysis. The measured array SNDR and SFDR are 56dB and 77dB, respectively, and this performance is verified by a 2048QAM modulation test showing a measured EVM of -39.9dB. Both measurements confirm the performance improvement from the parallel multi-phase-sampling technique.

## Chapter 4. Prototype-II: A 4×4 Element Fully-Integrated 28GHz Digital Beamformer

In this chapter, we present a 4×4-element 28GHz digital beamformer IC combined with a custom 8-layer LTCC substrate carrying a 4×4 patch antenna array, demonstrating a practical fully-integrated mm-wave digital beamforming system. Building on the digital beamformer of Chapter 3, this 16-element system further integrates essential mm-wave components, namely the patch antenna array and the mm-wave frontend, to form a single-IC mm-wave-to-digital beamforming system. It provides all the advantages of a digital beamformer, and its all-digital I/Os are immune to crosstalk between multiple beams, making it well suited to large-scale, highly-reconfigurable multi-beam beamforming systems.

Despite these advantages, several design challenges remain: 1) the element-ADC performance is a critical bottleneck; 2) mixed-signal routing is complicated in a fully-integrated system, as sensitive analog signal lines are susceptible to local crosstalk from high-speed, high-swing digital buses; 3) the enormous raw data volume requires high-throughput beamform processing; and 4) the power consumption and area of each element are restricted to avoid excessive power-supply IR drops and board-level heat-management challenges.

In this chapter, we introduce several techniques to mitigate these challenges: 1) a compact inductor-less mm-wave frontend enables a small RX slice; 2) 1GHz high-IF sampling enables a single-phase 27GHz LO, simplifying LO generation and distribution; 3) the parallel multi-phase-sampling CTBPDSM sub-ADC arrays mitigate harmonics and provide additional anti-aliasing filtering; 4) four PLLs and 1-to-4 buffer trees eliminate bulky transmission lines; 5) the compact floorplan of the  $4 \times RX$  slice bank reduces the mm-wave input and LO routing lengths; and 6) bit-stream processing enables power- and area-efficient digital beamforming.



### **4.1. System architecture**<sup>2</sup>

Figure 4-1. System architecture of the fully-integrated 28GHz digital beamformer with BSP and multi-phase-sampling sub-ADC array

Figure 4-1 shows the system architecture of the prototype 4×4-element fully-integrated 28GHz digital beamformer. The CMOS die is flip-chip mounted on the backside of a custom LTCC substrate, which carries a 4×4 28GHz patch antenna array on its top side (Figure 4-2). The prototype IC incorporates mm-wave frontends, ADCs, clock generation/distribution, and digital beamform processing. Sixteen elements per IC represent an excellent trade-off between complexity and performance: this choice balances digital-routing complexity against RF routing complexity, RF routing loss, and RF bump egress in a single chip, with competitive power consumption and a relatively small die area. The IC generates four fully-formed beams (or four partial beams in a tiled system), significantly reducing the digital I/O bandwidth and simplifying board-level digital routing.

<sup>2</sup> The author would like to acknowledge Christine Weston, Daniel Weyer and Fred Buhler for their contributions to this work.



Figure 4-2. Aperture-coupled microstrip patch antenna array and mm-wave feed lines in the LTCC substrate

The substrate is an 8-layer Low-Temperature Co-fired Ceramic (LTCC) substrate using GL773 material<sup>3</sup>. Each antenna is an aperture-coupled microstrip patch with a single-stub matching network (Figure 4-2). The dielectric thickness of the patch elements is  $300\mu m (0.067\lambda)$ . The patch elements are  $3.6mm (0.8\lambda)$  wide and  $1.8mm (0.4\lambda)$  long, with a 4.5mm pitch. The feed lines are 14.7mm long and have an estimated insertion loss of about 1dB at 28GHz. The patch elements are placed on the package's top side, while the flip-chip die and BGA balls are on the package's bottom side. The measured VSWR of the antennas (measured at the bump pads) is <2.0 at 28GHz, and the measured bandwidth is 2.48GHz (9.1% fractional bandwidth).

<sup>3</sup> The substrate design was in collaboration with Dan Lambalot and Ravi Chekuri from Bayside Design Inc.

Each patch-antenna element feeds an RX slice. An inductor-less noise-canceling low-noise amplifier (LNA) performs the single-ended-to-differential conversion. Passive mixers, driven by a single-phase 27GHz LO, generate 1GHz IF signals from the LNA outputs. We use four PLLs to simplify LO distribution: each of the four 27GHz LOs feeds a 1-to-4 buffer tree driving four RX slices. All PLLs share an external 100MHz reference clock.

Sixteen variable gain amplifiers (VGAs) drive sixteen 4GS/s continuous-time band-pass delta-sigma ADC arrays and correct for gain variation between the mm-wave frontends. The parallel sub-ADC array for each element directly digitizes the 1GHz high-IF signal. Each sub-ADC array consists of four identical sub-ADCs driven by different phases of the 4GHz clock. As mentioned in Chapter 2, the parallelized sub-ADC array improves SNR and provides inherent FIR filtering. The 4GS/s oversampling ADC array with a bandwidth of 100MHz has an OSR of 20. The low quantizer resolution allows us to shorten the raw digital data bus to 5 bits per element and facilitates bit-stream processing before final decimation.<sup>4</sup> A local DLL based on [41] generates the multiphase clock for the multi-phase-sampling technique, which provides additional filtering. Digital circuitry synchronizes and aligns the bit-stream outputs of the sub-ADCs and feeds the sum of the sub-ADC array outputs to the digital bit-stream processor at the center of the chip.
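The OSR quoted above follows from the usual band-pass definition, comparing the 100MHz signal bandwidth with the 2GHz Nyquist bandwidth of a 4GS/s converter:

$$\mathrm{OSR} = \frac{f_s}{2\,\mathrm{BW}} = \frac{4\,\mathrm{GS/s}}{2 \times 100\,\mathrm{MHz}} = 20$$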

A Bit-Stream Processing (BSP) core at the center of the chip directly processes the raw, undecimated outputs of the sixteen delta-sigma ADC arrays. Bit-stream processing takes advantage of the narrow bit-width (5 bits) of the sub-ADC array outputs for simple MUX-based digital down-conversion (DDC) and complex weight multiplication (CWM). The bit-stream processor forms four independent, simultaneous beams. Only the final beam outputs are decimated, further saving power and die area [10]. The decimated digital beam outputs are distributed among the four corners of the chip, as this uses the corner areas left free by the channel slices and simplifies the routing on the substrate. The block diagram of the RX signal chain and BSP is shown in Figure 4-3.

<sup>4</sup> Since oversampling ADCs scale well with CMOS technology, we expect a 2-3× higher sampling frequency and bandwidth in more advanced nodes.



Figure 4-3. RX signal chain with the bit-stream processor (BSP)

### **4.2. Implementation**

#### **4.2.1. Mm-Wave frontend**



Figure 4-4. Block diagram of the mm-wave frontend and ADC array

Each RX slice (Figure 4-4) contains an mm-wave frontend, an LO buffer, an ADC array with a DLL, and a digital bit-stream adder. Four RX slices form a 4× RX slice bank, and there are 16 RX slices in total on the chip. The patch antenna is integrated into the LTCC substrate, and the die is flip-chip attached. We adopt high-IF (1GHz) sampling because it allows I/Q mixing to be performed in the digital domain. Since the ADC directly digitizes the 1GHz high IF, the RF down-conversion requires only a single-phase 27GHz LO. The high IF therefore brings significant advantages in power consumption and die area for single-phase LO generation and distribution. Another advantage is that a high IF allows AC coupling, so the common-mode voltages of different blocks can be optimized independently.

A drawback with single-phase mixing is the lack of image rejection. An in-substrate filter such as a strip-line hairpin filter can provide a bandpass frequency response to suppress the image. For example, a single-section hairpin filter measures about  $1100\mu m \times 500\mu m$ , which is small compared to the substrate size of  $25mm \times 25mm$  and the patch antenna size of  $3.6mm \times 1.8mm$ . Furthermore, a more advanced technology node should permit a higher IF, significantly relaxing the image filtering requirements.

1) Low noise amplifier (LNA)



Figure 4-5. Single-ended to pseudo-differential low noise amplifier (LNA)

As illustrated in Figure 4-5, the LNA has a single-ended 28GHz input and a pseudo-differential output, which the mixer down-converts to the 1GHz IF [43]. The single-ended input simplifies the in-package antenna design, while the pseudo-differential output helps suppress common-mode on-chip noise, especially from the fast, high-swing digital buses. The CG-CS input stage performs noise cancellation and input matching without inductive degeneration. The common-gate transistor  $M_2$  sets the input impedance of the LNA to  $1/[(1+A_{FB})g_{m2}]$ . Ideally, this input impedance is frequency independent, which implies broadband matching; this is attractive because, as the data rates of wireless communication systems keep increasing, broadband solutions are preferable. AC coupling is area-efficient at mm-wave frequencies and simplifies biasing. A 1pF capacitor, C<sub>1</sub>, couples the 28GHz input to the CG transistor, M<sub>2</sub>. A 300fF capacitor, C<sub>2</sub>, couples the feedback signal to M<sub>2</sub>. A local constant-transconductance reference circuit generates the bias voltages, V<sub>b1</sub> and V<sub>b2</sub>, improving robustness over temperature. The body of the output buffer transistor, M<sub>4</sub>, is source-connected. This "hot well" connection eliminates the body effect of M<sub>4</sub> and improves linearity.
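To see what the feedback buys, impose a 50Ω match on the input-impedance expression above (a worked example; A<sub>FB</sub> denotes the feedback gain in that expression, and the value A<sub>FB</sub> = 1 is purely hypothetical):

$$Z_{in} = \frac{1}{(1+A_{FB})\,g_{m2}} = 50\,\Omega \;\Rightarrow\; g_{m2} = \frac{1}{50\,\Omega\,(1+A_{FB})}$$

$$A_{FB} = 1:\quad g_{m2} = 10\,\mathrm{mS} \quad (\text{vs. } 20\,\mathrm{mS} \text{ for a plain common-gate input})$$

so the match is reached with a transconductance (1 + A<sub>FB</sub>) times smaller than a plain common-gate stage would need.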

2) Passive mixer (MXR)



Figure 4-6. Double-balanced passive mixer

We adopt a double-balanced passive mixer (Figure 4-6) for down-conversion. The double-balanced passive mixer is much simpler than an active mixer and facilitates a very compact layout, which is crucial for large-array implementation. Passive mixing also provides high linearity, large headroom, and relaxed transistor matching. The double-balanced design provides better suppression of LO and RF feedthrough. Some LO feedthrough (27GHz) remains due to the pseudo-differential RF port, but it is benign because it is far from the 1GHz IF. The mixer input is AC coupled and terminated to ground through a bias resistor at the mixer output. The AC coupling suppresses DC offset caused by RF/LO leakage. The balanced LO is also AC coupled and biased at ~390mV, near the NMOS threshold voltage. A current-source-regulated source follower buffers the output of the mixer to drive the input resistance of the VGA.

3) Variable gain amplifier (VGA)



Figure 4-7. Schematic of the variable gain amplifier (VGA)

Figure 4-7 shows the schematic of the IF variable gain amplifier (VGA). Low-pass filtering at the VGA input supplements the innate anti-aliasing of the continuous-time band-pass delta-sigma sub-ADC array. In particular, the 300fF capacitor,  $C_{in}$ , attenuates mixing artifacts. The 200 $\Omega$  input resistor,  $R_{in}$ , is sized as a compromise between noise and the loading of both the source-follower driver and the VGA op-amp. One-hot-coded PMOS switches set the feedback resistance,  $R_{f}$ , tuning the VGA gain from 0dB to 21dB in 3dB steps. Given the 1.2V power supply, the relatively small signal swing around the 700mV common-mode voltage (a maximum output swing of 650mV to 750mV) favors PMOS switches. A multi-stage op-amp achieves a simulated 10GHz GBW with >60° phase margin for a power consumption of 6.7mW.
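A sketch of the ideal feedback-resistor ladder implied by these numbers, assuming an inverting resistive-feedback stage with |gain| = R<sub>f</sub>/R<sub>in</sub> (an assumption about the topology in Figure 4-7) and ignoring PMOS switch on-resistance and finite op-amp gain. The 0dB to 21dB range in 3dB steps gives eight settings, consistent with one-hot coding of eight switches.

```python
R_IN_OHM = 200.0   # input resistor value stated in the text


def feedback_resistances(r_in=R_IN_OHM, max_gain_db=21, step_db=3):
    """Ideal feedback resistance per gain code, assuming |gain| = Rf / Rin.

    Ignores PMOS switch on-resistance and finite op-amp gain.
    """
    return [r_in * 10 ** (g / 20) for g in range(0, max_gain_db + 1, step_db)]


for code, rf in enumerate(feedback_resistances()):
    print(f"code {code}: gain {3 * code:2d} dB -> Rf ~ {rf:6.0f} ohm")
```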

4) Constant transconductance bias



Figure 4-8. Schematic of the constant transconductance bias

Figure 4-8 shows the constant-transconductance bias schematic, which consists of a start-up circuit, a constant-transconductance core, and a cascaded bias stage. The transistor dimensions are shown in the figure. Neglecting the body effect and equating the currents in the left and right branches of the core gives:

$$\frac{1}{2}\mu_{n}C_{ox}\frac{4W}{L}\left(V_{G}-I_{REF}R-V_{TH}\right)^{2} = \frac{1}{2}\mu_{n}C_{ox}\frac{W}{L}\left(V_{G}-V_{TH}\right)^{2}$$
(4-1)

Given the transconductance of the  $1 \times$  device is:

$$g_m = \frac{2I_{REF}}{V_G - V_{TH}}$$
(4-2)

Solving (4-1) for  $V_G - V_{TH}$  and substituting the result into (4-2) gives:

$$g_m = \frac{2}{R} \left( 1 - \frac{1}{\sqrt{4}} \right) = \frac{1}{R}$$
(4-3)
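The intermediate step between (4-1) and (4-3) can be written out for a general device ratio K (K = 4 here). Taking the square root of both sides of (4-1), which is valid with both devices in saturation so that  $V_G - I_{REF}R - V_{TH} > 0$ :

$$\sqrt{K}\,\left(V_G - I_{REF}R - V_{TH}\right) = V_G - V_{TH} \;\Rightarrow\; V_G - V_{TH} = \frac{\sqrt{K}}{\sqrt{K}-1}\,I_{REF}R$$

$$g_m = \frac{2I_{REF}}{V_G - V_{TH}} = \frac{2}{R}\left(1 - \frac{1}{\sqrt{K}}\right) \;\xrightarrow{K=4}\; \frac{1}{R}$$

For K = 4 the overdrive of the 1× device is simply  $V_G - V_{TH} = 2I_{REF}R$ .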

Equation (4-3) shows that ideally, the transconductance of the  $1\times$  device is solely a function of the resistor value, which is tunable. The W/L ratios of the transistors are carefully chosen from PVT simulations so that each transistor has at least a 50mV margin against entering the triode region or the sub-threshold region. The simulation shows that all the bias transistors are in saturation over most of the PVT variations.

The circuit has two stable operating points: the desired one, with all the bias transistors operating in saturation, and a latched-up state in which all the bias transistors are cut off. A start-up circuit is therefore needed to "kick start" the circuit out of the latched-up state. Even after start-up completes, the start-up circuit continues to draw a quiescent current that depends on the bias voltage, so the W/L ratios of the start-up transistors are minimized to reduce this current, at the cost of a longer start-up time.

5) Performance summary

| Parameter                      | Simulation Result (1GHz IF) |
|--------------------------------|-----------------------------|
| Input Frequency                | 28GHz                       |
| LO Frequency                   | 27GHz                       |
| Gain                           | 21dB                        |
| Return Loss                    | 8.5dB                       |
| Input Referred IP3             | -12.6dBm                    |
| Input Referred 1dB Comp. Point | +3dBm                       |
| Noise Figure                   | 11.8dB                      |
| Power Consumption              | 29mW                        |

Table 7: Simulated performance of the mm-wave frontend (with VGA)

Table 7 summarizes the simulated performance of the mm-wave frontend. The simulation test-bench includes commonly used parasitic models for both the pads and the in-package routing.

#### **4.2.2. Phase-locked loop (PLL) and LO distribution**<sup>5</sup>

Four identical PLLs (one on each side of the chip) generate the 27GHz LOs for the corresponding  $4 \times RX$  slice banks. We estimate that using four PLLs reduces the 27GHz routing length by at least 4mm. With a 50-100µm width dedicated to the transmission lines, this corresponds to a direct area saving of 0.2-0.4mm<sup>2</sup>, which is comparable to the area of the three extra PLLs. Furthermore, the power dividers and buffers of a single-PLL scheme would also require significant area, and 27GHz routing from a single PLL would complicate the routing of the digital buses from the ADCs to the digital core.

The use of multiple PLLs in a beamformer requires consideration of phase noise; this topic is thoroughly considered in recent publications [44]–[46]. With four PLLs driving a 16-element array and single-carrier modulation, this configuration most closely fits the analysis of the tiled system described in [46]. Phase noise is common within the PLL bandwidth and typically can be tracked and canceled in the receiver. Outside of the PLL bandwidth, the phase noise is uncorrelated and has differing implications for self-interference and multi-user interference. Self-interference reduces as the number of uncorrelated clock domains increases, since the phase noise averages out. However, uncorrelated phase noise worsens multi-user interference, which appears as crosstalk between users. Interestingly, the effect depends on the load factor, which is the ratio of the number of users to the array size, so that self-interference is manageable through the choice of load factor [46].

<sup>5</sup> The PLL and LO distribution were designed by Daniel Weyer.



Figure 4-9. Block diagram of the PLL



Figure 4-10. Schematic and simulated phase noise of the LC-tank VCO

Figure 4-9 shows a block diagram of a single PLL. The PLL is a 3<sup>rd</sup>-order analog-charge-pump PLL with an LC-tank VCO (Figure 4-10). Compared to the all-digital PLLs widely used in IoT applications [47]–[49], an analog-charge-pump PLL offers better performance and higher energy efficiency at mm-wave bands. All PLLs share a common 100MHz external reference clock, which simplifies the board-level design. A buffered 200MHz output is available for checking that the PLLs are properly locked.



Figure 4-11. Schematic and layout of the 1-to-2 LO buffer

The schematic and layout of a single-stage 1-to-2 LO buffer are shown in Figure 4-11. Three stages of this buffer perform the 1-to-4 LO distribution for each  $4 \times RX$  slice bank. Each 1-to-2 LO buffer is a single-stage push-pull amplifier with a local common-mode feedback resistor. AC coupling at the input of each buffer decouples the common-mode voltages of different stages, so that each stage's input common-mode can be set to maximize its gain. The 1-to-2 split is made on metal 9, and each branch then uses two consecutive lower metal layers to reduce the routing resistance. Figure 4-12 shows the complete layout of the 1-to-4 LO buffer tree. The longest routing section is ~570µm, which is much shorter than the wavelength of the 27GHz LO ( $\lambda \approx 5400\mu m$ ). The differential traces meander to match the phase delivered to each RX slice.



Figure 4-12. Metal routing of the 1-to-4 buffer tree

#### **4.2.3. 4× RX slice bank**



Figure 4-13. Block diagram of the 4× RX slice bank

As illustrated in Figure 4-13, the 4× RX slice bank spans a 4×9 bump array, represented by the dots. There are several practical considerations for the bump assignment: 1) the mm-wave input bumps should be regularly distributed, as an even distribution makes macroblock placement at various hierarchy levels feasible; 2) the routing length of the mm-wave input and the LO distribution should be minimized; and 3) the 5-bit digital output from the sub-ADC array must be connected to the digital bit-stream processor at the center of the chip without interfering with the mm-wave input and LO. Placing the mm-wave inputs at the chip edge might seem a good choice, but it is detrimental to the LO distribution, as the 27GHz LO would then have to travel from the chip center to the edge. These considerations lead to the bump assignment shown in Figure 4-13. We place the 28GHz input bumps (green dots) near the chip center, whereas the other bumps are for power and ground (grey dots). The mm-wave input bumps are interspersed with the frontend ground bumps, simplifying both on-chip placement/routing and substrate routing.



Figure 4-14. Bump assignment

The antenna feed lines are on the substrate's upper metal layer (Figure 4-2), while the mm-wave bumps attach to the bottom of the substrate. The substrate through-vias for the mm-wave inputs form a 'via wall' from bottom to top, which prevents any in-substrate trace from passing between the mm-wave vias. The bumps at the center of the chip are ground bumps and connect to the ground planes through substrate vias. Figure 4-14 shows the bump assignment of the chip. All the bias and digital I/Os are routed from the chip corners to avoid in-substrate routing through the mm-wave 'via wall'.

Having established the bump assignment, we can now consider how to place the circuit blocks into an RX slice that spans a 4×2 bump array. The mm-wave frontend is placed near the 28GHz input bump, which is located along the chip-center side. The green trace in Figure 4-13 represents the mm-wave routing, and the blue trace represents the 27GHz LO routing; this arrangement minimizes the length of both the mm-wave and the LO distribution routing. The down-converted 1GHz IF signal then passes through the VGA, which drives the parallel 4× sub-ADC array. The compact design of the CTBPDSMs allows four sub-ADCs to fit into the assigned RX slice region. After analog-to-digital conversion, the digital signal bus is much less sensitive to routing length; the digital buses are illustrated in red. Decoupling capacitors provide a shield to minimize coupling between the digital buses and the analog circuitry.



Figure 4-15. The layout of a 4× RX slice bank

After one RX slice is partitioned and placed, we replicate it to form the 4× RX slice bank; the pitch between slices is set by the 162µm bump pitch. Figure 4-15 shows the layout of the 4× RX slice bank. Different metal layers are used for the mm-wave, LO, and digital bus routing to minimize crosstalk. The re-distribution layer (RDL) routing length (including ESD protection circuitry) from the mm-wave bump to the LNA-plus-mixer is 180µm. The metal-6 digital output routing from the RX slice to the bit-stream processor (BSP) is 1~1.5mm long. We place the ADCs in the IC's outer area to minimize the routing of the single-phase 27GHz LO, which shortens the LO routing length to 800µm. Thanks to the compact ADC size and the area-efficient bit-stream beamform processing, the IC area is pad-limited with conventional C4 bump technology.

Channel isolation is an essential consideration in a compact beamforming design. Crosstalk deteriorates beampatterns and couples noise, reducing the benefits of array gain. An advantage of digital beamforming is that the analog and RF routing are confined to the frontend stripe, reducing the possibility of crosstalk, and the stripe floorplan further minimizes the potential for crosstalk. Analog and RF routing is short and shielded from the routing in other frontends. Each frontend has dedicated power and ground bumps. A wall of power/ground bumps isolates the RF bumps, and an associated 'via wall' extends through the LTCC substrate, further improving isolation. All-metal-layer decoupling capacitors provide shielding and further isolate the supplies.



### **4.3. Measurements**

Figure 4-16. Die photo and layout of the 4× RX slice

The prototype 28GHz mm-wave digital beamformer is fabricated in 40nm CMOS and measures 2.8mm × 2.8mm (Figure 4-16). The prototype is flip-chip assembled on the backside of a custom-designed Kyocera 8-layer LTCC substrate with an antenna array on the topside (Figure 4-2). As discussed before, the 4× RX slice banks are placed on each edge of the chip. Four PLLs are placed around the center side to simplify LO distribution. The bit-stream processor (BSP) sits at the center and occupies  $380\mu m \times 380\mu m$ . The 100MHz reference clock and the digital I/Os are placed in the corners to save area and simplify the routing in the substrate.



Figure 4-17. 28GHz anechoic chamber test setup

All mm-wave tests of the digital beamformer are performed over the air in the 28GHz anechoic chamber shown in Figure 4-17. The chamber setup is shown on the left, and the 28GHz horn antenna that serves as the mm-wave signal source is shown on the right. The distance from the horn antenna to the beamformer is about 45cm, which satisfies the far-field condition. The test setup consists of a motherboard with voltage and current reference boards, an SPI interface, power regulators, an ATX power supply, and fuses<sup>6</sup>. The beamformer assembly is mounted on a daughterboard with a heat sink and DC fan on the backside for thermal management. The test assembly sits inside the anechoic chamber, with RF absorption material covering the electronics. A partially 3D-printed platform is mounted on a two-axis gimballed system (covered by shielding materials) whose servo motors are driven by a Raspberry Pi controller. A plastic LEGO tower supports the transmit horn antenna. An automated MATLAB script running on a PC controls all the test instruments and motors.
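The far-field claim can be sanity-checked with the Fraunhofer criterion $R_{ff} = 2D^2/\lambda$. The sketch below rests on assumptions that are not stated in the text: a half-wavelength element pitch, the diagonal of the 4×4 array taken as the aperture $D$, and the horn's own aperture ignored (including it would increase the required distance).

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum [m/s]


def fraunhofer_distance(aperture_m: float, freq_hz: float) -> float:
    """Classic far-field (Fraunhofer) distance R_ff = 2 * D**2 / wavelength."""
    wavelength = C0 / freq_hz
    return 2.0 * aperture_m ** 2 / wavelength


# Assumptions (not from the text): 4x4 array with half-wavelength pitch,
# aperture D taken as the diagonal of the array footprint, horn aperture ignored.
F_RF = 28e9
WAVELENGTH = C0 / F_RF
SIDE = 4 * WAVELENGTH / 2
APERTURE = math.sqrt(2) * SIDE
R_FF = fraunhofer_distance(APERTURE, F_RF)
TEST_DISTANCE = 0.45  # horn-to-beamformer spacing quoted in the text [m]

print(f"wavelength = {WAVELENGTH * 1e3:.2f} mm, D = {APERTURE * 1e3:.1f} mm")
print(f"R_ff = {R_FF * 100:.1f} cm; 45 cm is in the far field: {TEST_DISTANCE > R_FF}")
```

Under these assumptions $R_{ff}$ comes out at roughly 17cm, comfortably below the 45cm spacing.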

<sup>6</sup> The test PCB board and test setup were built in collaboration with Fred Buhler and Christine Weston.

Characterizing a digital beamformer is very different from characterizing an analog beamformer. In evaluating an analog beamformer, the output goes to a spectrum analyzer, which natively measures real power in the frequency domain. We collect the 12-bit 250MS/s decimated output of this digital beamformer with an Agilent 16802A logic analyzer. A challenge with characterizing a fully-integrated digital beamformer is that the analog signal power levels are not directly available.

[Figure: VSWR, Band #2, West 2×2 elements (RF IN[3:0]); VSWR vs. F (GHz), 25 to 29GHz]

#### 1) VSWR of the patch antenna<sup>7</sup>

Figure 4-18. Measured VSWR of the patch antennas for the 4× RX slice on the west side

Figure 4-18 shows the measured VSWR of the patch antennas. The four traces represent the four patch antennas on the west side of the chip. These measurements are made on a bare substrate without an attached die, which allows the exposed bump pads to be probed with a GTL 5050 probe station and a Keysight N5227A network analyzer.
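To relate a VSWR reading to return loss and to the power lost by mismatch, the standard conversions go through the reflection-coefficient magnitude $|\Gamma| = (\mathrm{VSWR}-1)/(\mathrm{VSWR}+1)$. A minimal sketch follows; the VSWR values in the loop are illustrative only and are not read from Figure 4-18.

```python
import math


def vswr_to_gamma(vswr: float) -> float:
    """Reflection-coefficient magnitude |Gamma| = (VSWR - 1) / (VSWR + 1)."""
    if vswr < 1.0:
        raise ValueError("VSWR must be >= 1")
    return (vswr - 1.0) / (vswr + 1.0)


def return_loss_db(vswr: float) -> float:
    """Return loss RL = -20*log10|Gamma| (infinite for a perfect match)."""
    gamma = vswr_to_gamma(vswr)
    return math.inf if gamma == 0.0 else -20.0 * math.log10(gamma)


def mismatch_loss_db(vswr: float) -> float:
    """Fraction of incident power reflected away, in dB: -10*log10(1 - |Gamma|^2)."""
    gamma = vswr_to_gamma(vswr)
    return -10.0 * math.log10(1.0 - gamma ** 2)


# Illustrative VSWR values only; they are not taken from the measurement.
for v in (1.5, 2.0, 3.0):
    print(f"VSWR {v}: RL = {return_loss_db(v):.2f} dB, "
          f"mismatch loss = {mismatch_loss_db(v):.2f} dB")
```

For example, a VSWR of 2 corresponds to about 9.5dB return loss and only about 0.5dB of mismatch loss.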

<sup>7</sup> The measurement was performed by Dan Lambalot from Bayside Design Inc.

#### 2) 3D beam-patterns



Figure 4-19. Measured 3D beam-patterns

Figure 4-19 shows the measured 3D beam-patterns steered at boresight and $15^{\circ}/-15^{\circ}$, along with a comparison between the simulated and measured patterns in the elevation and azimuth cuts. The beam-patterns are measured over the air for the entire system. An automated script running on the test PC controls the two-axis gimballed system, traverses all directions, and records the corresponding output powers. Thanks to the accurate 10-bit CWM coefficients of digital beamforming, the measured patterns are very close to the simulated ones. The measured main lobe is slightly narrower than the simulated one because the simulation assumes isotropic antenna elements, whereas the radiation pattern of the patch antenna only covers about a $\pm 45^{\circ}$ scan angle.
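As a rough illustration of why fine phase resolution keeps the measured and simulated patterns close, the sketch below computes the array factor of one 4-element row steered to 15°, once with ideal phases and once with phases rounded to 10 bits (the phase-shift resolution listed in Table 8). The following are assumptions, not details from the text: a uniform linear array with half-wavelength spacing, isotropic elements (as in the simulation described above), and a phase-only model of the CWM coefficients.

```python
import cmath
import math


def steering_phases(n_elem, steer_deg, d_over_lambda=0.5):
    """Ideal progressive phase (rad) that steers a uniform linear array."""
    k = 2 * math.pi * d_over_lambda * math.sin(math.radians(steer_deg))
    return [-k * n for n in range(n_elem)]


def quantize(phi, bits):
    """Round a phase to the nearest of 2**bits uniformly spaced levels."""
    step = 2 * math.pi / (1 << bits)
    return round(phi / step) * step


def array_factor(phases, theta_deg, d_over_lambda=0.5):
    """Normalized |AF| of isotropic elements with the given phase weights."""
    psi = 2 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    s = sum(cmath.exp(1j * (p + psi * n)) for n, p in enumerate(phases))
    return abs(s) / len(phases)


ideal = steering_phases(4, 15.0)
q10 = [quantize(p, 10) for p in ideal]          # hypothetical 10-bit phase words
angles = [a / 10 for a in range(-900, 901)]     # -90 to 90 deg in 0.1 deg steps
pattern = [array_factor(q10, a) for a in angles]
peak_angle = angles[max(range(len(angles)), key=pattern.__getitem__)]
max_dev = max(abs(array_factor(q10, a) - array_factor(ideal, a)) for a in angles)
print(f"peak at {peak_angle:.1f} deg, max |AF| deviation vs. ideal = {max_dev:.4f}")
```

With 10 bits the worst-case phase error is $\pi/1024$, so the normalized pattern deviates from the ideal one by at most about 0.003 at any angle.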

#### 3) QAM constellation



Figure 4-20. Measured QAM4 constellation diagram

Another over-the-air measurement is conducted with a QAM4-modulated signal. We use a Rohde & Schwarz SMW200A signal source to generate a custom arbitrary 28GHz RF waveform, and we steer the beamformer to boresight. The signal generator clock, reference clock, and ADC clocks are synchronized. Figure 4-20 shows the measured constellation diagram; the measured EVM is about -18dB at a symbol rate of about 5MS/s. The measured EVM is partly limited by the test setup. Because of the limited number of I/O pads, we can observe only I or Q data at one time, so the I and Q data must be recorded separately. Since the custom 28GHz RF waveform includes a preamble, we can align the I and Q data from the two sets of measurements. However, the phase noise associated with the I and Q measurements is uncorrelated at all offset frequencies. Furthermore, we determine the phase from a preamble in the packet header without implementing a de-rotation filter [46]. For these reasons, the test setup does not benefit from the correlation in phase noise between samples, and the measured EVM is therefore higher than the innate EVM of the beamformer.
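For reference, the RMS EVM quoted above is the error-vector power normalized to the reference-constellation power. The sketch below is a data-aided EVM calculation, assuming the reference symbols are known (for example, after preamble alignment). It is exercised on synthetic QAM4 symbols with Gaussian error sized for a hypothetical -18dB target, not on measured data.

```python
import math
import random

# Unit-average-power QPSK (QAM4) reference constellation.
QPSK = [complex(i, q) / math.sqrt(2) for i in (-1, 1) for q in (-1, 1)]


def evm_db(received, reference):
    """Data-aided RMS EVM in dB: total error power over total reference power."""
    err = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    ref = sum(abs(s) ** 2 for s in reference)
    return 10.0 * math.log10(err / ref)


# Synthetic test: complex Gaussian error sized for a -18 dB target (hypothetical).
rng = random.Random(0)
TARGET_DB = -18.0
sigma = math.sqrt(10 ** (TARGET_DB / 10) / 2)  # std dev per I/Q component
tx = [rng.choice(QPSK) for _ in range(20000)]
rx = [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in tx]

measured = evm_db(rx, tx)
print(f"EVM = {measured:.2f} dB ({100 * 10 ** (measured / 20):.1f} % RMS)")
```

An EVM of -18dB corresponds to roughly 12.6% RMS error relative to the average symbol amplitude.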

#### 4) Noise figure

Noise figure (NF) measurements are not as straightforward for a digital beamformer as for an analog beamformer. Since the analog IF signal is not accessible, we cannot directly measure the NF of the mm-wave frontend. The analog-to-digital conversion also complicates the analog power calculation, which is essential in conventional NF measurements. Instead of directly measuring the input/output signal power and noise floor, we use the hot/cold source technique [50] to measure the mm-wave-to-digital NF. The test setup is shown in Figure 4-21. We use liquid nitrogen and a hot air gun to cool and heat a piece of absorption material to 77K and 393K, respectively. The air gap between the material and the beamformer minimizes any temperature change of the electronics under test.



Figure 4-21. Test setup to measure the noise figure (NF)

The NF of a receiver is the ratio between the input SNR and the output SNR, and it is larger than one because the receiver adds noise. The amount of added noise is fixed as long as the receiver temperature remains the same.

$$NF = \frac{SNR_{input}}{SNR_{output}} = \frac{S_i / N_i}{GS_i / (N_{add} + GN_i)} = \frac{1/N_i}{G / (N_{add} + GN_i)} = 1 + \frac{N_{add}}{GN_i}, \quad N_i = kTB$$
(4-4)

In (4-4), $N_{add}$ is the noise power added by the beamformer. This noise power is fixed for a given beamformer temperature, so the spacing between the beamformer and the absorption material is critical to keep the beamformer temperature, and hence $N_{add}$, unchanged. $S_i$ is the input signal power, which cancels in the ratio; G is the mm-wave-to-digital gain of the receiver; and $N_i = kTB$ is the input noise power, i.e., the black-body radiation from the absorption material, determined by the Boltzmann constant *k*, the temperature T, and the bandwidth B. For the digital beamformer, G and $N_{add}$ are unknown, so we measure the output noise floor $N_{add} + GN_i$ at two source temperatures and solve for both G and $N_{add}$. A larger temperature difference makes the test more accurate. The measured NF is about 7dB.
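Solving for G and $N_{add}$ from the two output noise floors is a two-equation linear system. A minimal sketch follows. The assumptions are labeled: output powers may be in any linear unit (for example, integrated digital output power), $N_i$ is evaluated at the standard $T_0 = 290$K when converting to NF (the text does not state the reference temperature behind the 7dB figure), and the numbers below are synthetic, chosen so that the true NF is 7dB.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]
T0 = 290.0          # standard reference temperature for noise figure [K]


def solve_hot_cold(p_hot, p_cold, t_hot, t_cold, bandwidth_hz):
    """Solve P_out = N_add + G*k*T*B at two source temperatures for (G, N_add).

    p_hot/p_cold may be in any linear unit (e.g. digital output power);
    G then carries that unit per watt, and the NF ratio is unaffected.
    """
    gain = (p_hot - p_cold) / (K_B * bandwidth_hz * (t_hot - t_cold))
    n_add = p_hot - gain * K_B * t_hot * bandwidth_hz
    return gain, n_add


def noise_figure_db(gain, n_add, bandwidth_hz, t_ref=T0):
    """NF = 10*log10(1 + N_add / (G*N_i)) with N_i = k*T_ref*B, as in (4-4)."""
    n_in = K_B * t_ref * bandwidth_hz
    return 10.0 * math.log10(1.0 + n_add / (gain * n_in))


# Synthetic check (hypothetical numbers): a 7 dB receiver, 100 MHz bandwidth,
# and the 393 K / 77 K hot/cold temperatures used in the measurement.
B = 100e6
G_TRUE = 3.7e15                      # arbitrary digital-units-per-watt gain
T_E = T0 * (10 ** 0.7 - 1)           # equivalent noise temperature of a 7 dB NF
N_ADD_TRUE = G_TRUE * K_B * T_E * B
p_hot = N_ADD_TRUE + G_TRUE * K_B * 393.0 * B
p_cold = N_ADD_TRUE + G_TRUE * K_B * 77.0 * B

g_est, n_add_est = solve_hot_cold(p_hot, p_cold, 393.0, 77.0, B)
print(f"recovered NF = {noise_figure_db(g_est, n_add_est, B):.3f} dB")
```

Because only the difference and ratio of the two noise floors matter, the absolute scale of the digital output power never needs to be calibrated.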



#### 5) Power breakdown

Figure 4-22. Power breakdown of a single RX slice

The beamformer has a measured power consumption of 2.8W. Figure 4-22 shows the power-consumption breakdown of a single RX slice.

#### 6) Performance summary and comparison

A performance summary and a comparison with the state of the art are provided in Table 8. This work features a high number of elements per IC, along with full integration from mm-wave input to digital output. Digital beamforming with the BSP, the novel 4× RX slice bank floorplan, the high-IF sampling architecture with a single-phase LO, the compact CTBPDSM array, and the inductor-less mm-wave frontend together enable the integration of 16 elements and four beams per IC. Benefiting from digital beamforming, the prototype IC forms four simultaneous beams from 16 elements without compromising the number of elements used per beam (i.e., all four beams use the raw information from all sixteen elements). Digital beamforming is both accurate and compact: the die area per element is only 0.48mm<sup>2</sup>. The on-chip PLL, with a reference of around 100MHz, simplifies board-level design.

|                                            | This Work                              | HC. Park<br>ISSCC'20   | R. Garg<br>ISSCC'20                                  | J. Pang<br>JSSC'20    | S. Pellerano<br>ISSCC'19 | J. D. Dunworth<br>ISSCC'18 |
|--------------------------------------------|----------------------------------------|------------------------|------------------------------------------------------|-----------------------|--------------------------|----------------------------|
| Technology                                 | 40nm CMOS                              | 28nm CMOS              | 65nm CMOS                                            | 65nm CMOS             | 22nm FinFET              | 28nm LP-RF CMOS            |
| Frequency [GHz]                            | 28                                     | 39                     | 28                                                   | 28                    | 71-76                    | 28                         |
| Elements per IC                            | 16                                     | 16                     | 4                                                    | 8                     | 4                        | 24 (6×4-channel sub)       |
| Architecture                               | Digital                                | Analog                 | Analog                                               | Analog                | Analog                   | Analog                     |
| Beams per IC                               | 4<br>(16 elements each)                | 1                      | 4<br>(freq.-multiplexed)                             | 2<br>(dual-polarized) | 1                        | 2<br>(dual-polarized)      |
| Integration                                | In-Package<br>Antenna                  | No                     | No                                                   | Yes                   | Yes                      | Yes                        |
|                                            | mm-Wave<br>Frontend                    | mm-Wave<br>Frontend    | mm-Wave<br>Frontend                                  | mm-Wave<br>Frontend   | mm-Wave<br>Frontend      | mm-Wave<br>Frontend        |
|                                            | 4× PLLs                                | No PLL                 | No PLL                                               | No PLL                | 1× PLL                   | 1× PLL                     |
|                                            | 64× ADCs                               | No ADC                 | No ADC                                               | No ADC                | No ADC                   | No ADC                     |
|                                            | Digital<br>Beamforming                 | Analog<br>Beamforming  | Analog<br>Beamforming                                | Analog<br>Beamforming | Analog<br>Beamforming    | Analog<br>Beamforming      |
| Phase Shift Res.<br>[bits]                 | 10                                     | 4                      | 16-Phase LO<sub>2</sub><br>+ Signed LO<sub>1</sub>   | 0.4 deg phase error   | 8                        | 3                          |
| RX NF [dB]                                 | 7 (16×channel)<br>(antenna to digital) | 4.2-4.6<br>(1×channel) | 6-7.8<br>(1×channel)                                 | 4.2<br>(on-wafer)     | 6<br>(1×channel)         | 4.4-4.7<br>(4×channel)     |
| RX BW [MHz]                                | 100                                    | 800/100                | 400                                                  | 100-400               | 2000                     | 5500                       |
| Power [mW]                                 | 2800 (RX)<br>MW+PLL+ADC+BSP            | 624 (RX)               | 450                                                  | 2020 (RX)             | 168 (RX)                 | 167 (4× CH)                |
| RX Power<br>per Element [mW]               | 177<br>MW+PLL+ADC+BSP                  | 39                     | 112.5                                                | 252.5                 | 42                       | 42                         |
| Die Area [mm<sup>2</sup>]                  | 7.73<br>MW+PLL+ADC+BSP                 | 30                     | 10.6                                                 | 12                    | 5.04                     | 27.8                       |
| Die Area<br>per Element [mm<sup>2</sup>]   | 0.48<br>MW+PLL+ADC+BSP                 | 1.875                  | 2.65                                                 | 1.5                   | 1.26                     | 1.16                       |
| PLL Ref. Freq.<br>[MHz]                    | 102-105                                | N/A                    | N/A                                                  | N/A                   | 2370-2530                |                            |

Table 8: Performance summary and comparison

# Chapter 5. Frequency-Interleaving Continuous-Time Band-Pass Delta-Sigma Modulator

As discussed before, high-IF sampling offers several advantages for receiver architectures: 1) minimized LO feedthrough; 2) immunity to DC offset and 1/f noise; and 3) relaxed image-rejection requirements. A high-speed wideband analog-to-digital converter is a crucial component of a high-IF sampling system, and the CTBPDSM ADC is an attractive choice because it directly digitizes an IF signal with band-pass noise shaping and inherent anti-aliasing, resulting in a very power- and area-efficient implementation.

Conventionally, to achieve even higher bandwidth, we can increase both the sampling frequency<sup>8</sup> and the loop order of a delta-sigma modulator. However, a higher sampling frequency and a higher loop order increase clock power and parameter sensitivity, complicating resonator design and loop compensation. Although time-interleaving is a well-known technique for increasing the equivalent sampling frequency of an ADC, it is seriously limited by mismatch: timing skew and gain error across sub-ADCs in the time domain produce spurious artifacts in the frequency domain. Time-interleaving is therefore a double-edged sword for increasing bandwidth, and it is less valuable in large-scale beamforming systems where linearity is a significant concern.
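The frequency-domain artifact of time-interleaving mismatch can be reproduced in a few lines. The sketch below models a hypothetical two-channel time-interleaved ADC with a 1% gain mismatch (an assumed value, not a measurement) and shows the classic image spur at $f_s/2 - f_{in}$, whose level is about $20\log_{10}(\epsilon/2)$ relative to the carrier. Coherent sampling is assumed so that each tone falls exactly on one DFT bin.

```python
import cmath
import math

N = 1024      # record length (coherent sampling)
K_IN = 101    # input tone bin
EPS = 0.01    # hypothetical 1% gain mismatch between the two sub-ADCs


def dft_bin(x, k):
    """Single DFT bin X[k] computed directly in O(N)."""
    n_pts = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_pts) for n, v in enumerate(x))


# Even samples come from sub-ADC A, odd samples from sub-ADC B.
x = [(1 + EPS / 2 if n % 2 == 0 else 1 - EPS / 2) * math.sin(2 * math.pi * K_IN * n / N)
     for n in range(N)]

carrier = abs(dft_bin(x, K_IN))
image = abs(dft_bin(x, N // 2 - K_IN))   # spur at fs/2 - fin
spur_dbc = 20 * math.log10(image / carrier)
print(f"image spur at bin {N // 2 - K_IN}: {spur_dbc:.2f} dBc "
      f"(expected {20 * math.log10(EPS / 2):.2f} dBc)")
```

Even this small mismatch leaves a spur only about 46dB below the carrier, which illustrates why time-interleaving is unattractive when linearity matters.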

It is therefore desirable to decouple the bandwidth of a delta-sigma modulator from its sampling frequency and loop order, an idea that leads to the frequency-interleaving architecture. Instead of using a single modulator, we frequency-interleave multiple sub-modulators, each with a different center frequency, to increase bandwidth without increasing the sampling frequency or loop order. The reduced sampling frequency and lower loop order reduce clocking-related power, relax the op-amp GBW requirement, ease excess-loop-delay (ELD) compensation, and reduce the rate of comparator metastability.

<sup>8</sup> Assume the OSR of the delta-sigma modulator remains the same.

Frequency-interleaving is a natural choice for noise-shaping ADCs, which manipulate the NTF and STF in the frequency domain. (It is not practical for Nyquist ADCs because it would require channel-filtering of the analog input to avoid aliasing between channels.) Frequency-interleaving has fundamental advantages over time-interleaving. Frequency-interleaving in an oversampling ADC is free of time-interleaving artifacts because gain and timing mismatches do not introduce abrupt changes in the time domain. However, the gain offset can degrade the dynamic range



## 5.1. System architecture

Figure 5-1. System architecture of the frequency-interleaving CT delta-sigma modulator

Figure 5-1 shows the system architecture of the frequency-interleaving CT band-pass DSM. It frequency-interleaves two parallel sub-modulators centered at $\pm$ 75MHz from 1.5GHz, each sampling the same input signal, for a total bandwidth of 300MHz. The 6GS/s sub-modulators are derived from a conventional $f_s/4$ center-frequency 4<sup>th</sup>-order cascade-of-resonator feedback (CRFB) continuous-time band-pass modulator. Two mirrored FIR filters process the bit-streams from the sub-modulators, and their outputs are added to form the final output. A pair of 3-tap FIR filters with coefficients [4 1 4] and [-4 1 -4] filter out the overlapping higher-band and lower-band quantization noise from the low-band and high-band sub-modulators, respectively. The mirrored FIR filters ensure a flat overall in-band response and symmetric noise-nulling zeros. Simple MUXs realize the ±4 and ±1 coefficients, enabling a standard digital design flow at 6GS/s. The total estimated power consumption of the FIR filters is 1.2mW, only about 3% of the total ADC power (38mW).
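The notch positions of the two 3-tap filters follow directly from their coefficients. For a symmetric filter $[a\ b\ a]$, $H(e^{j\omega}) = e^{-j\omega}(b + 2a\cos\omega)$, so the notch sits where $\cos\omega = -b/(2a)$. The sketch below (helper names are our own) evaluates this at $f_s = 6$GS/s: [4 1 4] nulls near 1.62GHz and [-4 1 -4] near 1.38GHz, consistent with the measured STF notches near 1.6GHz and 1.4GHz reported in Section 5.3. It also checks that the two responses are mirror images about $f_s/4$.

```python
import cmath
import math

FS = 6e9  # sub-modulator sampling rate [S/s]
LOW_FIR = (4, 1, 4)      # applied to the low-band sub-modulator
HIGH_FIR = (-4, 1, -4)   # applied to the high-band sub-modulator


def fir_response(coeffs, f_hz, fs=FS):
    """H(e^{jw}) = sum_k c_k e^{-jwk} evaluated at frequency f_hz."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs)
    return sum(c * z_inv ** k for k, c in enumerate(coeffs))


def notch_frequency(coeffs, fs=FS):
    """For a symmetric [a, b, a] filter, H = e^{-jw}(b + 2a cos w), so the
    notch sits where cos w = -b / (2a)."""
    a, b, a_end = coeffs
    if a != a_end:
        raise ValueError("expected a symmetric 3-tap filter")
    w = math.acos(-b / (2 * a))
    return w / (2 * math.pi) * fs


for name, taps in (("[4 1 4]", LOW_FIR), ("[-4 1 -4]", HIGH_FIR)):
    f_n = notch_frequency(taps)
    print(f"{name}: notch at {f_n / 1e9:.3f} GHz, |H| there = {abs(fir_response(taps, f_n)):.1e}")

# Mirror symmetry about fs/4: |H_low(f)| equals |H_high(fs/2 - f)|.
print(abs(fir_response(LOW_FIR, 1.425e9)), abs(fir_response(HIGH_FIR, 1.575e9)))
```

Because the taps are only ±4 and ±1 applied to a few-level bit-stream, each product is a sign change and a 2-bit shift, which is why a MUX-based implementation suffices.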

## 5.2. Implementation

### 5.2.1. Modulator architecture



Figure 5-2. Block diagram of the sub-modulators

Each sub-modulator (Figure 5-2) consists of two single-op-amp resonators, RZ and HZ feedback-DACs, a passive current summing node, and a 9-level flash quantizer. A feedforward path around the first resonator improves linearity. The loop transfer functions are:

$$L_{low} = \frac{0.08763z^3 - 0.9924z^2 + 0.1768z - 0.6588}{\left(z^2 - 2z\cos\frac{19\pi}{40} + 1\right)^2}$$
(5-1)

$$L_{high} = \frac{-0.08763z^3 - 0.9924z^2 - 0.1768z - 0.6588}{\left(z^2 - 2z\cos\frac{21\pi}{40} + 1\right)^2}$$
(5-2)

Because the center frequency differs from that of a conventional $f_s/4$ center-frequency modulator, there is an additional 1<sup>st</sup>-order term in the denominator and an extra zero in the numerator of each resonator transfer function in (5-1) and (5-2). Adjusting the resonator frequency accounts for the additional 1<sup>st</sup>-order term in the denominator. The RZ and HZ DACs accommodate two of the three zeros. We introduce the third zero with a zero-insertion resistor, $R_z$, in the 2<sup>nd</sup>-order loop resonator [51]. A further advantage of $R_z$ is that it allows us to adjust the DAC coefficients and eliminate one DAC in each resonator. Table 9 summarizes the feedback-DAC coefficients; the two DACs with small coefficients are eliminated to save power and area at the price of a minor deformation of the NTF.

Table 9: DAC coefficients

| DAC Coeff. (µA) | $k_{rz4}$        | $k_{hz4}$ | $k_{rz2}$        | $k_{hz2}$        |
|-----------------|------------------|-----------|------------------|------------------|
| Low Band        | 0.8              | -102      | 36               | -168             |
| High Band       | 44               | -130      | 3                | -166             |
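The loop transfer functions (5-1) and (5-2) can be sanity-checked numerically. The sketch below copies their coefficients and evaluates the NTF as $D/(D+N)$, which equals $1/(1+L)$ for $L = N/D$ but stays finite at the resonator poles. It confirms the NTF nulls at 1.425GHz and 1.575GHz for $f_s = 6$GS/s, and checks that $L_{high}(z) = L_{low}(-z)$, i.e., the high-band loop is the low-band loop mirrored about $f_s/4$. The $1/(1+L)$ sign convention is an assumption; with the opposite convention the NTF poles move, but the nulls (the roots of $D$) do not. The helper names are our own.

```python
import cmath
import math

FS = 6e9  # sub-modulator sampling rate [S/s]

# Numerator coefficients (highest power of z first) from (5-1) and (5-2).
NUM_LOW = [0.08763, -0.9924, 0.1768, -0.6588]
NUM_HIGH = [-0.08763, -0.9924, -0.1768, -0.6588]


def poly(coeffs, z):
    """Horner evaluation of a polynomial with highest-power-first coefficients."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def parts_low(z):
    den = (z * z - 2 * z * math.cos(19 * math.pi / 40) + 1) ** 2
    return poly(NUM_LOW, z), den


def parts_high(z):
    den = (z * z - 2 * z * math.cos(21 * math.pi / 40) + 1) ** 2
    return poly(NUM_HIGH, z), den


def ntf(parts, z):
    """NTF = 1 / (1 + N/D) written as D / (D + N) so it stays finite at the poles of L."""
    num, den = parts(z)
    return den / (den + num)


def on_unit_circle(f_hz):
    return cmath.exp(2j * math.pi * f_hz / FS)


for name, parts, f0 in (("low", parts_low, 1.425e9), ("high", parts_high, 1.575e9)):
    print(f"{name} band: |NTF| at {f0 / 1e9:.3f} GHz = {abs(ntf(parts, on_unit_circle(f0))):.2e}")

# Mirror check: L_high(z) should equal L_low(-z) at an arbitrary test point.
z_test = 0.9 * cmath.exp(0.7j)
n_h, d_h = parts_high(z_test)
n_l, d_l = parts_low(-z_test)
print("L_high(z) == L_low(-z):", abs(n_h / d_h - n_l / d_l) < 1e-9)
```

The mirror relation follows from $\cos(21\pi/40) = -\cos(19\pi/40)$ together with the sign pattern of the numerator coefficients.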

### 5.2.2. Resonator design



Figure 5-3. Schematic of the single-op-amp resonator

Figure 5-3 shows the schematic of the single-op-amp  $2^{nd}$ -order resonator. Three techniques improve the performance. First, in parallel with the output RC network, a resistor,  $R_z$ , adds a zero to the transfer function of the resonator, satisfying the extra zero in the numerator of the loop transfer function. The zero-inserted resonator transfer function with  $R_z$  is:

$$H_r(s) = \frac{0.5\omega_0 s + K\omega_0^2}{\omega_0^2 + s^2}$$
(5-3)

where  $\omega_0 = 1/R_n C_n = 1/R_p C_p = 1/R_o C_o$  and  $K = R_n/R_z$ .
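The effect of $R_z$ in (5-3) is to move the numerator zero of the resonator from DC to a real left-half-plane location: $0.5\omega_0 s + K\omega_0^2 = 0$ at $s = -2K\omega_0$. The sketch below evaluates (5-3) directly. The numbers are hypothetical: $f_0 = 1.425$GHz, and $K = R_n/R_z$ computed with $R_n = 1.2$kΩ (the Resonator 1 value in Table 10) for both $R_z$ entries, since the Resonator 2 row of that table is incomplete.

```python
import math


def resonator_tf(s, w0, k):
    """Zero-inserted resonator of (5-3): H_r(s) = (0.5*w0*s + K*w0**2) / (w0**2 + s**2)."""
    return (0.5 * w0 * s + k * w0 ** 2) / (w0 ** 2 + s ** 2)


def inserted_zero(w0, k):
    """The numerator vanishes at s = -2*K*w0, a real left-half-plane zero set by R_z."""
    return -2.0 * k * w0


# Hypothetical numbers: f0 = 1.425 GHz, K = R_n / R_z with R_n = 1.2 kOhm assumed.
W0 = 2 * math.pi * 1.425e9
for label, r_z in (("low band", 5.2e3), ("high band", 12e3)):
    k = 1.2e3 / r_z
    s_z = inserted_zero(W0, k)
    print(f"{label}: K = {k:.3f}, zero at {abs(s_z) / (2 * math.pi) / 1e9:.3f} GHz (LHP)")
    # |H_r(j*w)| just below resonance, showing the peak that remains at w0.
    print(f"  |H_r(j*0.99*w0)| = {abs(resonator_tf(1j * 0.99 * W0, W0, k)):.1f}")
```

Setting $K = 0$ (no $R_z$) recovers the plain band-pass resonator whose only finite zero is at $s = 0$.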



Figure 5-4. Simulated STF with and w/o zero-insertion  $R_{\rm z}$ 

Figure 5-4 shows the simulated STF with and without the zero-insertion resistor $R_z$. With the additional zero provided by $R_z$, the STF shows much less gain variation.

Second, a Q-enhancement capacitor, $C_c$, in parallel with the series-feedback RC compensates for the frequency offset and peak degradation caused by insufficient op-amp GBW, improving the simulated Q from 23 to 79. Figure 5-5 shows the resonator gain with and without the Q-enhancement capacitor.



Figure 5-5. Resonator gain with and w/o the Q-enhancement capacitor

Third, neutralization in the feedforward path of the multi-stage op-amp boosts the high-frequency gain of the op-amp. Figure 5-6 shows the op-amp gain with and without the neutralization capacitor, and we observe a ~4dB gain boost at 1.6GHz.



Figure 5-6. Op-amp gain with and w/o the neutralization capacitor

Table 10 summarizes the nominal values of the components.

| Nominal Values |             | $R_p$ [Ω]      | $C_p$ [F] | $R_n$ [Ω] | $C_n$ [F] | $R_o$ [Ω] | $C_o$ [F] | $C_{neut}$ [F] | $C_c$ [F] | $R_z$ [Ω] |
|----------------|-------------|----------------|--------|----------------|-------|------|-------|--------|----|------|
| Low Band –     | Resonator 1 | 600            | 165.3f | 1.2K           | 82.6f | 2.4K | 41.3f | 10f    | 3f | -    |
|                | Resonator 2 | - 000          |        |                |       |      |       |        |    | 5.2K |
| High Band –    | Resonator 1 | 600            | 148.2f | 1.2K           | 74.1f | 2.4K | 37f   | 10f    | 3f | -    |
|                | Resonator 2 | - 000          |        |                |       |      |       |        |    | 12K  |

## 5.3. Measurements





The prototype (Figure 5-7) is fabricated in 28nm CMOS and occupies an active area of 0.255mm<sup>2</sup>. The frequency-interleaving ADC consumes a total of 38mW. Compared with the state of the art [25], [26], [28], [29], [52], the prototype has a much smaller active area and lower power consumption while supporting the highest input IF frequency, 1.5GHz, making it a very competitive choice for arrayed systems. To facilitate testing at the high 6GS/s sampling frequency, the die is chip-on-board attached to a daughterboard to minimize bond-wire parasitics (Figure 5-8). An on-board Marki BAL-0003SMG balun converts the single-ended signal source to a balanced output. The motherboard generates regulated power supplies and references.



Figure 5-8. Test boards



Figure 5-9. Full-chip block diagram

The output bit-streams are dumped into an on-chip SRAM and read out later through an SPI interface. Figure 5-9 illustrates the full-chip block diagram. An SRAM decoder parallelizes the 6GS/s 8-bit data stream into 64-bit words at 750MS/s for moderate-speed SRAM writing. The SRAM has a word length of 64 bits and a depth of 1024, which supports an 8192-point FFT.
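The numbers in the capture path are tied together by simple bookkeeping: packing eight 8-bit samples into each 64-bit word divides the write rate by eight without changing the bit rate, and 1024 words hold exactly 8192 samples. A short check of that arithmetic, using only values from the text:

```python
FS = 6_000_000_000      # sample rate [S/s]
BITS_PER_SAMPLE = 8     # output word width quoted in the text
WORD_BITS = 64          # SRAM word length
DEPTH = 1024            # SRAM depth (words)

samples_per_word = WORD_BITS // BITS_PER_SAMPLE   # samples packed per SRAM word
word_rate = FS // samples_per_word                # SRAM write rate [words/s]
capture_len = DEPTH * samples_per_word            # samples per capture
bin_spacing_hz = FS / capture_len                 # FFT frequency resolution
capture_time_s = capture_len / FS                 # length of one capture

# The deserialization must preserve the raw bit rate (48 Gb/s).
assert word_rate * WORD_BITS == FS * BITS_PER_SAMPLE
print(f"{word_rate / 1e6:.0f} MS/s words, {capture_len}-point capture, "
      f"{bin_spacing_hz / 1e3:.1f} kHz bins, {capture_time_s * 1e6:.3f} us")
```

One capture therefore spans about 1.37µs, with an FFT bin spacing of roughly 732kHz.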



Figure 5-10. Measured 8192-point power spectra for each sub-modulator with FIR filter (top) and overall ADC power spectrum (bottom)

Figure 5-10 shows the measured 8192-point power spectra for each sub-modulator with the FIR filter enabled on the top and the combined power spectrum on the bottom. The measured SNDR and SFDR are 37dB and 44dB, respectively, for an input frequency of 1514.6MHz.
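SNDR figures like these come from integrating an FFT over the signal band. A minimal sketch of that computation follows, under stated assumptions: coherent sampling with a rectangular window, band edges at 1.5GHz ± 150MHz, and a synthetic 1024-point record instead of measured data, which keeps the pure-Python radix-2 FFT fast (the measurement itself uses 8192 points).

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)]
            + [even[k] - twiddled[k] for k in range(n // 2)])


def inband_sndr_db(x, fs, f_lo, f_hi, sig_bin):
    """SNDR = signal-bin power / power of all other bins in [f_lo, f_hi].

    Assumes coherent sampling, so the signal occupies a single bin and no
    window is needed.
    """
    n = len(x)
    power = [abs(v) ** 2 for v in fft(x)[: n // 2]]
    k_lo, k_hi = math.ceil(f_lo * n / fs), math.floor(f_hi * n / fs)
    noise = sum(power[k] for k in range(k_lo, k_hi + 1) if k != sig_bin)
    return 10 * math.log10(power[sig_bin] / noise)


# Synthetic record (hypothetical): full-scale tone near 1.52 GHz, an in-band
# spur 40 dB down, and a large out-of-band tone that must be ignored.
N, FS = 1024, 6e9
SIG_BIN, SPUR_BIN, OOB_BIN = 259, 270, 100
x = [math.sin(2 * math.pi * SIG_BIN * n / N)
     + 0.01 * math.sin(2 * math.pi * SPUR_BIN * n / N)
     + 0.5 * math.sin(2 * math.pi * OOB_BIN * n / N)
     for n in range(N)]
sndr = inband_sndr_db(x, FS, 1.35e9, 1.65e9, SIG_BIN)
print(f"in-band SNDR = {sndr:.2f} dB")
```

The out-of-band tone at bin 100 (about 0.59GHz) does not affect the result, which mirrors how out-of-band shaped quantization noise is excluded from the reported SNDR.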



Figure 5-11. Measured 8192-point -9dBFS two-tone (1495MHz and 1505MHz) power spectrum



Figure 5-12. Measured STF with FIR filtering

Figure 5-12 shows the measured STF with the FIR filters enabled. The FIR filters introduce STF notches at around 1.4GHz and 1.6GHz for the high-band and low-band sub-modulators, respectively, and the notches disappear after the two sub-modulators are combined. The measured STF is flat from 1.2GHz to 1.8GHz.



Figure 5-13. Measured dynamic range and power breakdown

The measured dynamic range (DR) and power breakdown are reported in Figure 5-13. The measured DR is 40dB, and the total power consumption is 38mW.

Figure 5-14 shows the power/Fs and power/BW trends of recently published GHz DSMs. In CMOS processes, the sampling frequency and power consumption appear to be bounded at about 9GS/s and 1W, respectively. This work demonstrates that the frequency-interleaving architecture breaks the power-bandwidth barrier of GHz CT DSMs.



Figure 5-14. Power/Fs and power/BW trends for GHz DSMs

Table 11 compares this work with state-of-the-art GHz DSMs. This work features the highest input frequency, a compact size, high BW, low power consumption, and moderate SNDR, making it ideal for MIMO and beamforming applications [53].

|                              | This Work   |       |       | ISSCC19<br>Wang | ISSCC17<br>Huang | VLSI17<br>Dayanik | ISSCC16<br>Dong | ISSCC16<br>Wu |
|------------------------------|-------------|-------|-------|-----------------|------------------|-------------------|-----------------|---------------|
| Architecture                 | FI-CT-BP-ΔΣ |       |       | CT-ΔΣ           | CT-ΔΣ            | CT-ΔΣ             | CT-ΔΣ           | CT-ΔΣ         |
| Technology [nm]              | 28          |       |       | 28              | 16               | 40                | 28              | 16            |
| Active Area [mm<sup>2</sup>] | 0.0255      |       |       | 0.019           | 0.217            | 0.45              | 1.4             | 0.155         |
| Fs [MS/s]                    | 6000        |       |       | 2000            | 2150             | 5000              | 8000            | 2880          |
| BW [MHz]                     | 300         |       |       | 100             | 125              | 156               | 465             | 160           |
| Order                        | 4           |       |       | 4               | 4                | 3                 | 3               | 4             |
| OSR                          | 20          |       |       | 10              | 8.6              | 16                | 8.6             | 9             |
| Fin_hf [MHz]                 | 1383        | 1514  | 1617  | 18              | 40               | 100               | 400             | 30            |
| Fin_hf/BW                    | 4.61        | 5.05  | 5.39  | 0.18            | 0.32             | 0.64              | 0.86            | 0.1875        |
| SNDR [dB]                    | 40.16       | 37.04 | 37.33 | 72.6            | 71.9             | 64.1              | 64.7            | 65.33         |
| SFDR [dB]                    | 52.11       | 44.11 | 53.49 | 83.6            | 85*              | -                 | -               | 70*           |
| Power [mW]                   | 38          |       |       | 16.3            | 54               | 233               | 930             | 40            |

Table 11: State-of-the-art GHz DSMs with BW>100MHz

\*Estimated from figure



# Chapter 6. Future Work: Tiled System

Figure 6-1. A 4× 16-element tiled system for 64-element digital beamforming

The prototype 16-element beamformer can be tiled to achieve an even larger array size. Figure 6-1 shows a tiled system for a 64-element digital beamformer. On the motherboard, there are references, a digital beam output bus, and an SPI interface. A 64-element substrate supports a patch antenna array. Four identical DBF chips are attached to the substrate. An on-board FPGA processes the final output beam.

The tiled system enables many beamforming scenarios: 1) the four sub-beamformers can be steered to the same angles, with each sub-beamformer supporting four simultaneous beams, so that four narrow beams are generated from a total of 64 elements; 2) each sub-beamformer can steer its four simultaneous beams to angles different from those of the other sub-beamformers, which yields sixteen beams steered at different angles, each obtained from 16 elements; and 3) depending on the spatial distribution of the users, the tiled system can be split into two $4 \times 8$ or $8 \times 4$ sub-tiled systems, each of which forms narrower beams in the horizontal or vertical direction.

The digital beamformer has the appealing advantage that most of the on-board signal routing is in the digital domain and operates at a moderate bit-width and sample rate. For example, the prototype beamformer introduced in Chapter 4 has a decimated 12-bit digital output with a sample rate of 250MS/s. In the 4× tiled system, all of the digital outputs can be serialized onto 2-bit 1.5GS/s digital buses or 3GS/s serial links. Commercial transceivers, including those in FPGAs, readily support these data rates at reasonable cost $[54]^9$.
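The serialization rates above follow from the output word width and sample rate: 12 bits × 250MS/s = 3Gb/s, which is either a single 3GS/s link or two lanes at 1.5GS/s. The sketch below repeats this arithmetic and totals it for the tiled system. It assumes one 12-bit real-valued stream per beam (carrying I and Q separately would double the totals) and ignores line-coding and framing overhead.

```python
BITS_PER_SAMPLE = 12          # decimated output word width
SAMPLE_RATE = 250_000_000     # decimated output rate [S/s]
BUS_LANES = 2                 # 2-bit parallel bus option
CHIPS = 4                     # 4x tiled system
BEAMS_PER_CHIP = 4            # simultaneous beams per DBF chip

stream_bps = BITS_PER_SAMPLE * SAMPLE_RATE        # one serialized output stream
bus_lane_rate = stream_bps // BUS_LANES           # per-lane rate of the 2-bit bus
aggregate_bps = CHIPS * BEAMS_PER_CHIP * stream_bps

print(f"per stream: {stream_bps / 1e9:.1f} Gb/s "
      f"(= {BUS_LANES} lanes x {bus_lane_rate / 1e9:.1f} GS/s)")
print(f"all {CHIPS * BEAMS_PER_CHIP} beam streams: {aggregate_bps / 1e9:.0f} Gb/s raw")
```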

Although digital beamforming supports more elements per IC and significantly simplifies the on-board routing, such a tiled system still faces challenges. First, a higher reference clock frequency improves the on-chip PLL's performance, but it makes the on-board routing more complicated; this concern is exacerbated when high PLL performance and more tiled beamformers are required. Second, although many commercial FPGAs can support a 4× tiled system, it is not easy to support more beamformers with a single FPGA, so multiple FPGAs might be necessary, further complicating the board design. Third, since multiple digital beamformers consume significant power, thermal management can be bulky and complicated.

<sup>9</sup> The Xilinx Spartan-6 series FPGA XC6SLX75T supports up to 8 high-speed transceivers at a maximum line rate of 3.125Gb/s.

# Chapter 7. Conclusion

We introduce a parallelized Continuous-Time Band-Pass Delta-Sigma ADC with multi-phase sampling for large-scale digital beamforming systems. The multi-phase-sampling sub-ADC array overcomes the ADC linearity bottleneck, improves SNDR, and provides inherent FIR filtering.

The prototype IF digital beamformer uses parallel element sub-ADC arrays. It demonstrates accurate measured beam-patterns and confirms the advantages of digital beamforming. The measured 77dB SFDR demonstrates the harmonic suppression provided by the multi-phase-sampling technique.

The second beamforming prototype is a 16-element, fully integrated 28GHz digital beamformer. Combined with a custom 8-layer LTCC substrate that incorporates a 4×4 patch antenna array, it forms a fully integrated 16-element single-chip 28GHz mm-wave-to-digital beamforming system. Sixteen elements per IC represent an excellent trade-off between die size, signal loss, and I/O routing complexity. Various system-level techniques enable the fully integrated system. An inductor-less mm-wave frontend saves chip area. The parallel Continuous-Time Band-Pass Delta-Sigma ADC array provides built-in FIR filtering and facilitates high (1GHz) IF sampling. We minimize both LO distribution and mm-wave signal routing through the optimum placement of the bumps and RX slices. In both prototypes, the bit-stream processing efficiently handles the enormous raw data rate from 16 elements and generates four simultaneous independent beams.

Finally, we introduce frequency-interleaving to expand the bandwidth of the element continuous-time band-pass delta-sigma modulator ADCs. The prototype 28nm CMOS chip achieves a measured SNDR and SFDR of 37dB and 44dB, respectively, at 300MHz BW. It supports a high input frequency of 1.5GHz while consuming only 38mW, demonstrating that frequency-interleaving breaks the power-bandwidth barrier of CT DSMs. The combination of small size, high IF, high BW, and moderate SNDR makes the frequency-interleaved ADC ideal for MIMO and beamforming applications.

# Appendix A. Calculation method for parametrized DAC coefficients

Since a delta-sigma modulator natively operates in the discrete-time domain, whether it is a continuous-time (CT) or a discrete-time (DT) modulator, the prototype NTF is synthesized in DT. For a CT delta-sigma modulator, the CT loop transfer function must be converted to DT and then matched to the prototype NTF to obtain the coefficients of the pulse-shaped DACs. This section uses an impulse-invariance approach, which matches the sampled continuous-time loop impulse response with the discrete-time loop impulse response to obtain the CT-to-DT conversion [38]. Since most published CT-to-DT conversions are only valid for low-pass delta-sigma modulators with rectangular DAC pulses, we explore an analytic method that generalizes the CT-to-DT conversion to parametrized rectangular DAC pulse-shapes in a CTBPDSM, which helps us analyze how the duty cycle affects the NTF. Finally, the CT-to-DT conversion is derived for arbitrary, non-rectangular DAC pulse-shapes, generalizing the CT-to-DT conversion in a CTBPDSM to the full extent.

## A.1. Parametrized rectangular DAC pulse-shape



Figure A-1. 2<sup>nd</sup>-order and 4<sup>th</sup>-order continuous-time loops

There are three loops in the CTBPDSM shown in Figure 2-6: 1) a 4<sup>th</sup>-order loop with the first HZ DAC; 2) a 2<sup>nd</sup>-order loop with the second RZ and HZ DACs; and 3) a 2<sup>nd</sup>-order loop formed by the feedforward path and the first HZ DAC. Because the responses of the three loops superpose, we can calculate the 2<sup>nd</sup>-order and 4<sup>th</sup>-order loop responses separately, as shown in Figure A-1. In the figure, the s-domain transfer functions are the responses of the resonators. We parameterize the rectangular pulse-shape of the feedback DACs by its start time $\alpha$ and end time $\beta$, both of which lie within one clock period (i.e., $0 < \alpha < \beta < T$). The s-domain loop transfer functions are $L_{2cAB}$ and $L_{4cAB}$ for the 2<sup>nd</sup>-order and 4<sup>th</sup>-order loops, respectively.

There are three steps to determine the CT-to-DT loop transfer functions: 1) determine the CT impulse response from the s-domain transfer function; 2) sample the CT impulse response to obtain the DT impulse response; and 3) apply the z-transform to obtain the equivalent DT loop transfer function in the z-domain.

#### 1) 2<sup>nd</sup>-order loop

Assuming the nonlinear quantizer is linearized with a gain of 1, the s-domain loop transfer function of the 2<sup>nd</sup>-order loop in Figure A-1 is:

$$L_{2cAB} = \frac{G_2 \omega_0 s}{s^2 + \omega_0^2} \frac{e^{-\alpha s} - e^{-\beta s}}{s}$$
(A-1)

Applying the inverse Laplace transform to (A-1) gives the CT impulse response:

$$h_{2cAB} = \mathcal{L}^{-1}\left\{L_{2cAB}\right\} = G_2\left\{\sin\left[\omega_0\left(t-\alpha\right)\right]u\left(t-\alpha\right) - \sin\left[\omega_0\left(t-\beta\right)\right]u\left(t-\beta\right)\right\}$$
(A-2)

We sample the CT impulse response once per clock cycle at $t = nT$, where $T = \pi/(2\omega_0)$, so the resonator center frequency is one quarter of the sampling frequency. Sampling starts at $(n+1)T$ because the quantizer's digital code is available only after the first clock cycle. Substituting $t = (n+1)T$ into (A-2) gives:

$$h_{2cAB}\Big|_{(n+1)T} = G_2 \left\{ \sin\left[\frac{\pi}{2}(n+1) - \frac{\alpha\pi}{2T}\right] - \sin\left[\frac{\pi}{2}(n+1) - \frac{\beta\pi}{2T}\right] \right\} u[n]$$
(A-3)

Taking the z-transform of (A-3) and delaying it by $z^{-1}$ to undo the one-cycle advance of sampling at $(n+1)T$ gives the equivalent z-domain 2<sup>nd</sup>-order loop transfer function $L_{2dAB}$:

$$L_{2dAB}(z) = z^{-1} \mathcal{Z}\left\{h_{2cAB}\big|_{(n+1)T}\right\} = G_2\left\{\left[\cos\left(\frac{\alpha\pi}{2T}\right) - \cos\left(\frac{\beta\pi}{2T}\right)\right]\frac{1}{1+z^{-2}} + \left[\sin\left(\frac{\alpha\pi}{2T}\right) - \sin\left(\frac{\beta\pi}{2T}\right)\right]\frac{z^{-1}}{1+z^{-2}}\right\}z^{-1}$$
(A-4)
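The three steps can be checked numerically. The sketch below is an illustrative NumPy check and does not reproduce the author's design flow. It assumes a normalized clock period $T = 1$ (so $\omega_0 = \pi/2$) and $G_2 = 1$, and all function names are hypothetical. It first convolves the resonator impulse response $G_2\omega_0\cos(\omega_0 t)$ (the inverse Laplace transform of $G_2\omega_0 s/(s^2+\omega_0^2)$) with a unit rectangular pulse on $[\alpha, \beta]$ to confirm (A-2). It then samples (A-2) at $(n+1)T$ as in (A-3). Finally, it expands the bracket of (A-4) as a power series in $z^{-1}$, whose coefficients should equal the sampled impulse response.

```python
import numpy as np

T = 1.0                   # clock period, normalized (assumption)
W0 = np.pi / (2 * T)      # resonator centre frequency: T = pi / (2 w0), i.e. fs/4

def h2_closed_form(t, alpha, beta, G2=1.0):
    """Step 1 result, (A-2): CT impulse response of the 2nd-order loop."""
    return G2 * (np.sin(W0 * (t - alpha)) * (t >= alpha)
                 - np.sin(W0 * (t - beta)) * (t >= beta))

def h2_by_convolution(t, alpha, beta, G2=1.0, n_pts=20000):
    """Independent check of (A-2): resonator impulse response G2*w0*cos(w0 t)
    convolved with a unit rectangular DAC pulse on [alpha, beta] (midpoint rule)."""
    dtau = (beta - alpha) / n_pts
    tau = alpha + (np.arange(n_pts) + 0.5) * dtau
    out = []
    for ti in np.atleast_1d(t):
        g = G2 * W0 * np.cos(W0 * (ti - tau)) * (ti >= tau)   # causal kernel
        out.append(np.sum(g) * dtau)
    return np.array(out)

def series_from_A4(n_terms, alpha, beta, G2=1.0):
    """Step 3 in reverse: power-series coefficients (in z^-1) of the bracket
    of (A-4), (a + b z^-1) / (1 + z^-2); these should equal h2((n+1)T)."""
    a = np.cos(alpha * np.pi / (2 * T)) - np.cos(beta * np.pi / (2 * T))
    b = np.sin(alpha * np.pi / (2 * T)) - np.sin(beta * np.pi / (2 * T))
    c = np.zeros(n_terms)
    for k in range(n_terms):
        num = a if k == 0 else (b if k == 1 else 0.0)
        c[k] = num - (c[k - 2] if k >= 2 else 0.0)     # long division by 1 + z^-2
    return G2 * c

n = np.arange(16)
for alpha, beta in [(0.0, 0.5), (0.5, 1.0), (0.2, 0.7)]:
    sampled = h2_closed_form((n + 1) * T, alpha, beta)      # step 2, (A-3)
    err_series = np.max(np.abs(sampled - series_from_A4(len(n), alpha, beta)))
    err_conv = np.max(np.abs(sampled - h2_by_convolution((n + 1) * T, alpha, beta)))
    print(f"alpha={alpha}, beta={beta}: series err {err_series:.1e}, conv err {err_conv:.1e}")
```

The series division uses the recursion $c[k] = \mathrm{num}[k] - c[k-2]$, which is exact for the denominator $1 + z^{-2}$. Both mismatches should therefore sit at rounding or quadrature-error level.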

#### 2) 4<sup>th</sup>-order loop

Applying the same analysis to the 4<sup>th</sup>-order loop, again with the quantizer linearized to a gain of 1, the s-domain loop transfer function is:

$$L_{4cAB} = G_2 G_4 \left(\frac{\omega_0 s}{s^2 + \omega_0^2}\right)^2 \frac{e^{-\alpha s} - e^{-\beta s}}{s}$$
(A-5)

The corresponding CT impulse response is:

$$h_{4cAB} = \mathcal{L}^{-1}\left\{L_{4cAB}\right\} = G_2 G_4 \frac{\omega_0}{2}\left\{\left(t-\alpha\right)\sin\left[\omega_0\left(t-\alpha\right)\right]u\left(t-\alpha\right) - \left(t-\beta\right)\sin\left[\omega_0\left(t-\beta\right)\right]u\left(t-\beta\right)\right\}$$
(A-6)

Sampling the CT impulse response at $t = (n+1)T$ gives:

$$h_{4cAB}\Big|_{(n+1)T} = G_2 G_4 \frac{\omega_0}{2} \left\{ (n+1)T \left[ \sin\left[\frac{\pi}{2}(n+1) - \frac{\alpha\pi}{2T}\right] - \sin\left[\frac{\pi}{2}(n+1) - \frac{\beta\pi}{2T}\right] \right] - \alpha \sin\left[\frac{\pi}{2}(n+1) - \frac{\alpha\pi}{2T}\right] + \beta \sin\left[\frac{\pi}{2}(n+1) - \frac{\beta\pi}{2T}\right] \right\} u[n]$$
(A-7)

The equivalent z-domain 4<sup>th</sup>-order loop transfer function $L_{4dAB}$ is then:

$$L_{4dAB}(z) = z^{-1} \mathcal{Z}\left\{ h_{4cAB}\Big|_{(n+1)T} \right\} = \frac{G_2 G_4 \pi}{4} \left\{ \left[ \cos\left(\frac{\alpha\pi}{2T}\right) - \cos\left(\frac{\beta\pi}{2T}\right) \right] \frac{1 - z^{-2}}{\left(1 + z^{-2}\right)^2} + \left[ \sin\left(\frac{\alpha\pi}{2T}\right) - \sin\left(\frac{\beta\pi}{2T}\right) \right] \frac{2z^{-1}}{\left(1 + z^{-2}\right)^2} \right\} z^{-1} - \frac{G_2 G_4 \pi}{4T} \left\{ \left[ \alpha\cos\left(\frac{\alpha\pi}{2T}\right) - \beta\cos\left(\frac{\beta\pi}{2T}\right) \right] \frac{1 + z^{-2}}{\left(1 + z^{-2}\right)^2} + \left[ \alpha\sin\left(\frac{\alpha\pi}{2T}\right) - \beta\sin\left(\frac{\beta\pi}{2T}\right) \right] \frac{z^{-1} + z^{-3}}{\left(1 + z^{-2}\right)^2} \right\} z^{-1}$$
(A-8)
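As with the 2<sup>nd</sup>-order loop, (A-8) can be checked by expanding it as a power series in $z^{-1}$ and comparing the coefficients with direct samples of (A-6). The NumPy sketch below is a hypothetical check that assumes $T = 1$ and $G_2 G_4 = 1$. It places both terms of (A-8) over the common denominator $(1 + z^{-2})^2$ and then performs the series division by recursion.

```python
import numpy as np

T = 1.0                   # normalized clock period (assumption)
W0 = np.pi / (2 * T)      # fs/4 resonator

def h4_sampled(n, alpha, beta):
    """(A-6) sampled at t = (n+1)T, i.e. (A-7), with G2*G4 = 1."""
    t = (n + 1) * T
    return W0 / 2 * ((t - alpha) * np.sin(W0 * (t - alpha)) * (t >= alpha)
                     - (t - beta) * np.sin(W0 * (t - beta)) * (t >= beta))

def series_from_A8(n_terms, alpha, beta):
    """Power-series coefficients (in z^-1) of z * L_4dAB(z) from (A-8), G2*G4 = 1."""
    a, b = alpha * np.pi / (2 * T), beta * np.pi / (2 * T)
    ca, sa = np.cos(a) - np.cos(b), np.sin(a) - np.sin(b)
    cw = alpha * np.cos(a) - beta * np.cos(b)
    sw = alpha * np.sin(a) - beta * np.sin(b)
    A, B = np.pi / 4, np.pi / (4 * T)
    # Both terms of (A-8) over the common denominator (1 + z^-2)^2;
    # num[j] is the coefficient of z^-j.
    num = np.array([A * ca - B * cw, 2 * A * sa - B * sw, -A * ca - B * cw, -B * sw])
    den = np.array([1.0, 0.0, 2.0, 0.0, 1.0])          # (1 + z^-2)^2
    c = np.zeros(n_terms)
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * c[k - j]
        c[k] = acc / den[0]
    return c

n = np.arange(16)
for alpha, beta in [(0.0, 0.5), (0.5, 1.0), (0.2, 0.7)]:
    err = np.max(np.abs(h4_sampled(n, alpha, beta) - series_from_A8(len(n), alpha, beta)))
    print(f"alpha={alpha}, beta={beta}: max mismatch {err:.1e}")
```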

The overall loop transfer function is obtained by superimposing all the loop responses. Combining the loops through the relationship between the NTF and the overall loop transfer function gives the coefficient-matching equation:

$$(1/NTF-1)z = k_{r_24}(L_{4dAB} + L_{2dAB}) + k_{h_24}(L_{4dAB} + L_{2dAB}) + k_{r_22}L_{2dAB} + k_{h_22}L_{2dAB}$$
(A-9)

The additional $z$ on the LHS of (A-9) accounts for the ELD compensation delay. Each $L_{2dAB}$ and $L_{4dAB}$ term is evaluated with the $\alpha$ and $\beta$ of its corresponding RZ or HZ DAC. The DAC coefficients presented in Table 4 of Chapter 2.3.2 are calculated for the default RZ and HZ DAC pulse shapes ($\alpha = 0, \beta = 0.5$ and $\alpha = 0.5, \beta = 1$, respectively). Once determined, the DAC coefficients remain fixed, but $\alpha$ and $\beta$ can still be changed. We substitute the fixed DAC coefficients back into (A-9), keep $\alpha$ and $\beta$ as parameters, and solve for the NTF in the z-domain. This yields an NTF parameterized by the rectangular DAC pulse shapes. Evaluating this NTF for different duty cycles of the first HZ DAC produces Figure 2-8.
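The coefficient matching and duty-cycle re-evaluation described above can be sketched as follows. This is a hypothetical NumPy illustration and not the flow used to produce Table 4. It assumes $T = 1$ and unity resonator gains, evaluates (A-4) and (A-8) at points on the unit circle, and fits the four coefficients of (A-9) by real least squares. Matching transfer-function values at frequency points is a simpler substitute for matching polynomial coefficients, and the two agree when the target is exactly realizable. The prototype NTF and the Table 4 values are not reproduced here, so the target $(1/NTF - 1)z$ is synthesized from made-up coefficients `k_true` so that the fit can be checked. In a real design the target would come from the prototype NTF. Assigning the swept pulse shape to the $k_{h\_24}$ term is also an assumption. Rearranging (A-9) gives $NTF(z) = 1/\left(1 + z^{-1}\sum_i k_i L_i(z)\right)$, which the sketch evaluates after changing that DAC's pulse.

```python
import numpy as np

T = 1.0                           # normalized clock period (assumption)
RZ, HZ = (0.0, 0.5), (0.5, 1.0)   # default (alpha, beta) pulse shapes from the text

def L2d(z, alpha, beta):
    """(A-4) with G2 = 1."""
    a, b = alpha * np.pi / (2 * T), beta * np.pi / (2 * T)
    w = 1.0 / z
    return ((np.cos(a) - np.cos(b)) + (np.sin(a) - np.sin(b)) * w) / (1 + w**2) * w

def L4d(z, alpha, beta):
    """(A-8) with G2 * G4 = 1."""
    a, b = alpha * np.pi / (2 * T), beta * np.pi / (2 * T)
    w = 1.0 / z
    first = np.pi / 4 * ((np.cos(a) - np.cos(b)) * (1 - w**2)
                         + (np.sin(a) - np.sin(b)) * 2 * w)
    second = np.pi / (4 * T) * ((alpha * np.cos(a) - beta * np.cos(b)) * (1 + w**2)
                                + (alpha * np.sin(a) - beta * np.sin(b)) * (w + w**3))
    return (first - second) / (1 + w**2) ** 2 * w

def basis(z, hz1=HZ):
    """The four terms of (A-9), ordered k_r24, k_h24, k_r22, k_h22.
    hz1 is the (alpha, beta) used in the k_h24 term (hypothetical: the swept DAC)."""
    return np.stack([L4d(z, *RZ) + L2d(z, *RZ),
                     L4d(z, *hz1) + L2d(z, *hz1),
                     L2d(z, *RZ),
                     L2d(z, *HZ)], axis=-1)

def match_coefficients(z, target):
    """Real least-squares fit of k so that basis(z) @ k = target(z) = (1/NTF - 1) z."""
    B = basis(z)
    A = np.concatenate([B.real, B.imag])
    y = np.concatenate([target.real, target.imag])
    k, *_ = np.linalg.lstsq(A, y, rcond=None)
    return k

def ntf(z, k, hz1=HZ):
    """NTF(z) = 1 / (1 + z^-1 * sum_i k_i L_i(z)), rearranged from (A-9)."""
    return 1.0 / (1.0 + (basis(z, hz1) @ k) / z)

theta = np.linspace(0.05, 1.3, 64)           # stays away from the fs/4 poles at pi/2
z = np.exp(1j * theta)
k_true = np.array([0.3, -1.2, 0.8, -0.4])    # hypothetical values, NOT Table 4
k_fit = match_coefficients(z, basis(z) @ k_true)
ntf_swept = ntf(z, k_fit, hz1=(0.5, 0.9))    # same coefficients, shorter pulse on that DAC
print(np.round(k_fit, 6))
```

Evaluating $20\log_{10}|NTF|$ on a dense frequency grid for several values of $\beta$ would give a duty-cycle study in the spirit of Figure 2-8, subject to the assumptions above.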

### A.2. Arbitrary DAC pulse-shape

Generalizing to an arbitrary DAC pulse shape shows how different pulse shapes affect the loop. We derive an analytic expression by summing infinitely many narrow rectangular pulses, which leads to an integral expression in the end. First, we split an arbitrary DAC pulse shape into $N$ slices of width $T/N$, as shown in Figure A-2, and later let $N \to \infty$.



Figure A-2. Arbitrary DAC pulse-shape

The arbitrary pulse-shape function can be approximated by a sum of $N$ rectangular pulses, each with the amplitude of $f_{DAC}$ at the midpoint of its slice:

$$f_{DAC}(t) \approx \sum_{i=0}^{N-1} f_{DAC}\left(i\frac{T}{N} + \frac{T}{2N}\right) \gamma \left[i\frac{T}{N}, (i+1)\frac{T}{N}\right]$$
(A-10)

where $\gamma[t_1, t_2]$ denotes a unit-amplitude rectangular pulse from $t_1$ to $t_2$.

Applying (A-4) to the $i$-th slice (i.e., $\alpha = iT/N$ and $\beta = (i+1)T/N$, with $G_2 = 1$) and using the sum-to-product identities gives:

$$H(z,i) = \frac{z^{-1}}{1+z^{-2}} \left[ 2\sin\left(\frac{(2i+1)\pi}{4N}\right) \sin\left(\frac{\pi}{4N}\right) + 2z^{-1}\cos\left(\frac{(2i+1)\pi}{4N}\right) \sin\left(\frac{-\pi}{4N}\right) \right] f_{DAC}\left(i\frac{T}{N} + \frac{T}{2N}\right)$$
(A-11)

Multiplying and dividing each term by $T/N$ and summing over all slices gives:

$$H(z) = \sum_{i=0}^{N-1} H(z,i) = \frac{z^{-1}}{T(1+z^{-2})} \sum_{i=0}^{N-1} \left[ 2\sin\left(\frac{(2i+1)\pi}{4T}\frac{T}{N}\right) \frac{\sin\left(\frac{\pi}{4N}\right)}{1/N} + 2z^{-1}\cos\left(\frac{(2i+1)\pi}{4T}\frac{T}{N}\right) \frac{\sin\left(\frac{-\pi}{4N}\right)}{1/N} \right] f_{DAC}\left(i\frac{T}{N} + \frac{T}{2N}\right) \frac{T}{N}$$
(A-12)

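As $N \to \infty$, $\sin\left(\frac{\pi}{4N}\right)/(1/N) \to \pi/4$, the slice midpoint $i\frac{T}{N} + \frac{T}{2N}$ becomes the continuous time variable $t$, and the sum over slices of width $T/N$ becomes a Riemann integral. This yields the following analytic z-domain CT-to-DT 2<sup>nd</sup>-order loop transfer function for an arbitrary DAC pulse shape (with $G_2 = 1$):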

$$H(z) = \lim_{N \to \infty} \sum_{i=0}^{N-1} H(z,i) = \frac{z^{-1}}{T(1+z^{-2})} \int_0^T \left[ \frac{\pi}{2} \sin\left(\frac{\pi}{2T}t\right) - \frac{\pi}{2} z^{-1} \cos\left(\frac{\pi}{2T}t\right) \right] f_{DAC}(t) dt$$
(A-13)
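The slicing argument can be illustrated numerically. The NumPy sketch below uses hypothetical helper names and assumes $T = 1$ and $G_2 = 1$. It sums the exact per-slice results (A-11) and, separately, evaluates the integral (A-13) with a midpoint rule. Both return the coefficients $c_0, c_1$ in $H(z) = z^{-1}(c_0 + c_1 z^{-1})/(1 + z^{-2})$. For the default HZ pulse ($\alpha = 0.5T$, $\beta = T$), both reproduce the rectangular result (A-4). For a smooth pulse (a half-sine, chosen only for illustration), the slice sum approaches the integral as $N$ grows.

```python
import numpy as np

T = 1.0   # clock period (normalized, assumption)

def slice_sum_coeffs(f_dac, N):
    """Sum of the per-slice results (A-11): returns (c0, c1) such that
    H(z) = z^-1 (c0 + c1 z^-1) / (1 + z^-2), with G2 = 1."""
    i = np.arange(N)
    amp = f_dac(i * T / N + T / (2 * N))          # slice amplitude at its midpoint, (A-10)
    mid = (2 * i + 1) * np.pi / (4 * N)
    c0 = np.sum(2 * np.sin(mid) * np.sin(np.pi / (4 * N)) * amp)
    c1 = np.sum(2 * np.cos(mid) * np.sin(-np.pi / (4 * N)) * amp)
    return c0, c1

def integral_coeffs(f_dac, n_pts=100000):
    """(c0, c1) from the integral form (A-13), evaluated with a midpoint rule."""
    dt = T / n_pts
    t = (np.arange(n_pts) + 0.5) * dt
    x = np.pi / (2 * T) * t
    c0 = np.sum(np.pi / 2 * np.sin(x) * f_dac(t)) * dt / T
    c1 = np.sum(-np.pi / 2 * np.cos(x) * f_dac(t)) * dt / T
    return c0, c1

def hz_pulse(t):
    """Default HZ pulse from the text: alpha = 0.5T, beta = T."""
    return ((t >= 0.5 * T) & (t < T)).astype(float)

def half_sine(t):
    """A hypothetical non-rectangular pulse, for illustration only."""
    return np.sin(np.pi * t / T)

# Closed form from (A-4) for the HZ pulse (G2 = 1)
a, b = 0.5 * T * np.pi / (2 * T), T * np.pi / (2 * T)
expected = (np.cos(a) - np.cos(b), np.sin(a) - np.sin(b))
print(slice_sum_coeffs(hz_pulse, 64), integral_coeffs(hz_pulse), expected)
print([slice_sum_coeffs(half_sine, N) for N in (4, 16, 256)], integral_coeffs(half_sine))
```

For the rectangular pulse, the slice sum is exact once the slice edges align with $\alpha$ and $\beta$, because the per-slice terms telescope to (A-4).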

Applying the same procedure to the 4<sup>th</sup>-order loop in (A-8), with $G_2 G_4 = 1$, gives the corresponding analytic transfer function:

$$H(z) = \lim_{N \to \infty} \sum_{i=0}^{N-1} H(z,i) = \frac{\pi z^{-1}}{4T(1+z^{-2})^2} \int_0^T \left\{ (1-z^{-2}) H_1 + 2z^{-1} H_2 - (1+z^{-2}) H_3 - (z^{-1}+z^{-3}) H_4 \right\} f_{DAC}(t)\, dt$$

where

$$H_1 = \frac{\pi}{2} \sin\left(\frac{\pi}{2T}t\right)$$

$$H_2 = \frac{-\pi}{2} \cos\left(\frac{\pi}{2T}t\right)$$

$$H_3 = \frac{\pi}{2T} t \sin\left(\frac{\pi}{2T}t\right) - \cos\left(\frac{\pi}{2T}t\right)$$

$$H_4 = \frac{-\pi}{2T} t \cos\left(\frac{\pi}{2T}t\right) - \sin\left(\frac{\pi}{2T}t\right)$$
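The arbitrary-pulse expression can be cross-checked against the rectangular result. For $f_{DAC}$ equal to a unit pulse on $[\alpha, \beta]$, the integrals reduce to the following:

- $\int H_1 f\,dt = T[\cos(\frac{\alpha\pi}{2T}) - \cos(\frac{\beta\pi}{2T})]$
- $\int H_2 f\,dt = T[\sin(\frac{\alpha\pi}{2T}) - \sin(\frac{\beta\pi}{2T})]$
- $\int H_3 f\,dt = \alpha\cos(\frac{\alpha\pi}{2T}) - \beta\cos(\frac{\beta\pi}{2T})$
- $\int H_4 f\,dt = \alpha\sin(\frac{\alpha\pi}{2T}) - \beta\sin(\frac{\beta\pi}{2T})$

The 4<sup>th</sup>-order expression must therefore reproduce (A-8). The NumPy sketch below performs this check numerically with a midpoint rule. It uses hypothetical helper names and assumes $T = 1$ and $G_2 G_4 = 1$, together with the prefactor $\pi/(4T)$ and the numerator $(1-z^{-2})H_1 + 2z^{-1}H_2 - (1+z^{-2})H_3 - (z^{-1}+z^{-3})H_4$ obtained by carrying the slicing through (A-8) term by term.

```python
import numpy as np

T = 1.0   # normalized clock period (assumption)

def numerator_from_A8(alpha, beta):
    """Coefficients of z^0..z^-3 in the numerator of z * L_4dAB(z) over (1 + z^-2)^2,
    taken from the rectangular-pulse result (A-8) with G2*G4 = 1."""
    a, b = alpha * np.pi / (2 * T), beta * np.pi / (2 * T)
    ca, sa = np.cos(a) - np.cos(b), np.sin(a) - np.sin(b)
    cw = alpha * np.cos(a) - beta * np.cos(b)
    sw = alpha * np.sin(a) - beta * np.sin(b)
    A, B = np.pi / 4, np.pi / (4 * T)
    return np.array([A * ca - B * cw, 2 * A * sa - B * sw, -A * ca - B * cw, -B * sw])

def numerator_from_integral(f_dac, n_pts=100000):
    """Same coefficients from the arbitrary-pulse form with H1..H4 (midpoint rule)."""
    dt = T / n_pts
    t = (np.arange(n_pts) + 0.5) * dt
    x = np.pi / (2 * T) * t                  # (pi / 2T) t
    H1 = np.pi / 2 * np.sin(x)
    H2 = -np.pi / 2 * np.cos(x)
    H3 = x * np.sin(x) - np.cos(x)
    H4 = -x * np.cos(x) - np.sin(x)
    f = f_dac(t)
    I1, I2, I3, I4 = (np.sum(H * f) * dt for H in (H1, H2, H3, H4))
    # (1 - z^-2) I1 + 2 z^-1 I2 - (1 + z^-2) I3 - (z^-1 + z^-3) I4, scaled by pi/(4T)
    return np.pi / (4 * T) * np.array([I1 - I3, 2 * I2 - I4, -I1 - I3, -I4])

def rect(alpha, beta):
    """Unit rectangular DAC pulse on [alpha, beta)."""
    return lambda t: ((t >= alpha) & (t < beta)).astype(float)

for alpha, beta in [(0.0, 0.5), (0.5, 1.0)]:
    diff = numerator_from_A8(alpha, beta) - numerator_from_integral(rect(alpha, beta))
    print(f"alpha={alpha}, beta={beta}: max mismatch {np.max(np.abs(diff)):.1e}")

# A hypothetical raised-cosine pulse, for illustration only
print(numerator_from_integral(lambda t: 0.5 - 0.5 * np.cos(2 * np.pi * t / T)))
```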

## Appendix B. Center-frequency offset of the resonator with finite op-amp GBW

As derived in Chapter 2.3.3, the resonator transfer function with a finite GBW op-amp is:

$$H_{r}(s) = \frac{0.5\omega_{0}s}{\left(s^{2} + \omega_{0}^{2}\right) + \frac{1}{A(s)}\left(s^{2} + 4\omega_{0}s + \omega_{0}^{2}\right)}$$
(2-23)

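The center frequency of the resonator is the frequency at which the magnitude of the denominator of (2-23) is minimized. Using the single-pole op-amp model $1/A(s) = 1/A_0 + s/GBW$ and substituting $s = j\omega$ into the denominator of (2-23) gives: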

$$H_{r\_den} = -\left(1 + \frac{1}{A_0} + \frac{4\omega_0}{GBW}\right)\omega^2 + \left(1 + \frac{1}{A_0}\right)\omega_0^2 + j\left[-\frac{1}{GBW}\omega^3 + \left(\frac{4\omega_0}{A_0} + \frac{\omega_0^2}{GBW}\right)\omega\right]$$
(B-1)

Since the GBW and DC gain A<sub>0</sub> of the op-amp are typically very high, we neglect $1/A_0$ relative to 1 and drop the $\omega^3/GBW$ term, which simplifies (B-1) to:

$$H_{r\_den} = -\left(1 + \frac{4\omega_0}{GBW}\right)\omega^2 + \omega_0^2 + j\left(\frac{4\omega_0}{A_0} + \frac{\omega_0^2}{GBW}\right)\omega$$
(B-2)

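The resonance peak lies approximately midway between the two frequencies at which the real and imaginary parts of (B-2) have equal magnitude, which approximately mark the −3 dB edges of the resonance. Setting the real part equal to the imaginary part gives a quadratic equation: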

$$\left(1 + \frac{4\omega_0}{GBW}\right)\omega^2 + \left(\frac{4\omega_0}{A_0} + \frac{\omega_0^2}{GBW}\right)\omega - \omega_0^2 = 0$$
(B-3)

There are two solutions for (B-3):

$$\omega_{1,2} = \frac{-\left(\frac{4\omega_0}{A_0} + \frac{\omega_0^2}{GBW}\right) \pm \sqrt{\left(\frac{4\omega_0}{A_0} + \frac{\omega_0^2}{GBW}\right)^2 + 4\omega_0^2 \left(1 + \frac{4\omega_0}{GBW}\right)}}{2\left(1 + \frac{4\omega_0}{GBW}\right)}$$
(B-4)

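Because the imaginary part of (B-2) is odd in $\omega$, the magnitude of the negative root $\omega_2$ is the positive frequency at which the real part equals minus the imaginary part. Averaging the magnitudes of the two solutions gives the center-frequency estimate used in Chapter 2.3.3:

$$\omega_{center} = \frac{|\omega_1| + |\omega_2|}{2} = \frac{\sqrt{\left(\frac{4\omega_0}{A_0} + \frac{\omega_0^2}{GBW}\right)^2 + 4\omega_0^2 \left(1 + \frac{4\omega_0}{GBW}\right)}}{2\left(1 + \frac{4\omega_0}{GBW}\right)} \approx \frac{\omega_0}{\sqrt{1 + \frac{4\omega_0}{GBW}}}$$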
(2-24)
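The center-frequency estimate can be checked against the peak of $|H_r(j\omega)|$ computed directly from (2-23). The NumPy sketch below assumes the single-pole op-amp model behind (B-1), $1/A(s) = 1/A_0 + s/GBW$. It uses illustrative normalized values ($\omega_0 = 1$, $A_0 = 1000$, $GBW = 20\omega_0$) that are not taken from the design. The hypothetical helper `center_estimate` averages the magnitudes of the two roots in (B-4) and also returns the approximation in (2-24).

```python
import numpy as np

def resonator_response(w, w0, A0, GBW):
    """|H_r(jw)| from (2-23), with 1/A(s) = 1/A0 + s/GBW (single-pole op-amp)."""
    s = 1j * w
    inv_A = 1.0 / A0 + s / GBW
    den = (s**2 + w0**2) + inv_A * (s**2 + 4 * w0 * s + w0**2)
    return np.abs(0.5 * w0 * s / den)

def center_estimate(w0, A0, GBW):
    """Average of |w1| and |w2| from (B-4), plus the approximation in (2-24)."""
    a = 1 + 4 * w0 / GBW
    b = 4 * w0 / A0 + w0**2 / GBW
    exact = np.sqrt(b**2 + 4 * w0**2 * a) / (2 * a)
    approx = w0 / np.sqrt(a)
    return exact, approx

w0, A0, GBW = 1.0, 1000.0, 20.0     # normalized, illustrative values only
w = np.linspace(0.5, 1.2, 200001)
w_peak = w[np.argmax(resonator_response(w, w0, A0, GBW))]
exact, approx = center_estimate(w0, A0, GBW)
print(w_peak, exact, approx)
```

With a GBW only 20 times the center frequency, the resonance already shifts to roughly $0.91\,\omega_0$. This shows why the $4\omega_0/GBW$ term dominates the offset when $A_0$ is large.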

## Bibliography

- [1] E. Larsson and L. Van der Perre, "Massive MIMO for 5G," *IEEE Futur. Networks Tech Focus*, vol. 1, no. 1, 2017.
- [2] A. Mukherjee *et al.*, "Licensed-Assisted Access LTE: Coexistence with IEEE 802.11 and the evolution toward 5G," *IEEE Commun. Mag.*, vol. 54, no. 6, pp. 50–57, Jun. 2016.
- [3] "TS 38.101-1: NR; User Equipment (UE) radio transmission and reception; Part 1: Range 1 Standalone." (16.3.0 ed.). 3GPP. 2020-04-08.
- [4] "TS 38.101-2: NR; User Equipment (UE) radio transmission and reception; Part 2: Range 2 Standalone." (16.3.1 ed.). 3GPP. 2020-04-09.
- [5] Avionics Department, *Electronic Warfare and Radar Systems Engineering Handbook*. Naval Air Warfare Center Weapons Division (NAWCWD), 2013.
- [6] H. B. Hamid Dutty and M. M. Mowla, "Weather impact analysis of mmWave channel modeling for aviation backhaul networks in 5G communications," in *2019 22nd International Conference on Computer and Information Technology, ICCIT 2019*, 2019.
- [7] F. W. Vook, A. Ghosh, and T. A. Thomas, "MIMO and beamforming solutions for 5G technology," in *IEEE MTT-S International Microwave Symposium Digest*, 2014.
- [8] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, "Spatially sparse precoding in millimeter wave MIMO systems," *IEEE Trans. Wirel. Commun.*, vol. 13, no. 3, pp. 1499–1513, 2014.
- [9] J. Jeong, N. Collins, and M. P. Flynn, "A 260 MHz IF Sampling Bit-Stream Processing Digital Beamformer With an Integrated Array of Continuous-Time Band-Pass Modulators," *IEEE J. Solid-State Circuits*, vol. 51, no. 5, pp. 1168–1176, May 2016.

- [10] S. Jang, J. Jeong, R. Lu, and M. P. Flynn, "A 16-Element 4-Beam 1 GHz IF 100 MHz Bandwidth Interleaved Bit Stream Digital Beamformer in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 5, pp. 1302–1312, May 2018.
- [11] S. Jang, R. Lu, J. Jeong, and M. P. Flynn, "A 1-GHz 16-Element Four-Beam True-Time-Delay Digital Beamformer," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1304–1314, May 2019.
- [12] I. Ahmed *et al.*, "A Survey on Hybrid Beamforming Techniques in 5G: Architecture and System Model Perspectives," *IEEE Commun. Surv. Tutorials*, vol. 20, no. 4, pp. 3060–3097, 2018.
- [13] J. Pang *et al.*, "A 28GHz CMOS Phased-Array Beamformer Utilizing Neutralized Bi-Directional Technique Supporting Dual-Polarized MIMO for 5G NR," in *IEEE International Solid-State Circuits Conference*, 2019, pp. 344–346.
- [14] F. Sohrabi and W. Yu, "Hybrid Digital and Analog Beamforming Design for Large-Scale Antenna Arrays," *IEEE J. Sel. Top. Signal Process.*, vol. 10, no. 3, pp. 501–513, Apr. 2016.
- [15] B. Yang, Z. Yu, J. Lan, R. Zhang, J. Zhou, and W. Hong, "Digital Beamforming-Based Massive MIMO Transceiver for 5G Millimeter-Wave Communications," *IEEE Trans. Microw. Theory Tech.*, vol. 66, no. 7, pp. 3403–3418, Jul. 2018.
- [16] R. Rotman, M. Tur, and L. Yaron, "True Time Delay in Phased Arrays," *Proc. IEEE*, vol. 104, no. 3, pp. 504–518, Mar. 2016.
- [17] S. H. Talisa, K. W. O'Haver, T. M. Comberiate, M. D. Sharp, and O. F. Somerlock, "Benefits of Digital Phased Array Radars," *Proc. IEEE*, vol. 104, no. 3, pp. 530–543, Mar. 2016.
- [18] P. K. Bailleul, "A New Era in Elemental Digital Beamforming for Spaceborne Communications Phased Arrays," *Proc. IEEE*, vol. 104, no. 3, pp. 623–632, Mar. 2016.
- [19] S. Pellerano *et al.*, "A Scalable 71-to-76GHz 64-Element Phased-Array Transceiver Module with 2×2 Direct-Conversion IC in 22nm FinFET CMOS Technology," in *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, 2019, vol. 2019-February, pp. 174–176.
- [20] M. Y. Huang, T. Chi, F. Wang, T. W. Li, and H. Wang, "A 23-to-30GHz hybrid beamforming MIMO receiver array with closed-loop multistage front-end beamformers for full-FoV dynamic and autonomous unknown signal tracking and blocker rejection," in *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, 2018, vol. 61, pp. 68–70.
- [21] H. Chae, J. Jeong, G. Manganaro, and M. P. Flynn, "A 12 mW low power continuous-time Bandpass ΔΣ modulator with 58 dB SNDR and 24 MHz Bandwidth at 200 MHz IF," *IEEE J. Solid-State Circuits*, vol. 49, no. 2, pp. 405–415, Feb. 2014.
- [22] H. Chae and M. P. Flynn, "A 69 dB SNDR, 25 MHz BW, 800 MS/s Continuous-Time Bandpass ΔΣ Modulator Using a Duty-Cycle-Controlled DAC for Low Power and Reconfigurability," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 649–659, Mar. 2016.
- [23] R. Schreier and G. C. Temes, *Understanding Delta-Sigma Data Converters*. IEEE Press, 2005.
- [24] M. Ortmanns, F. Gerfers, and Y. Manoli, "Influence of finite integrator gain bandwidth on continuous-time sigma delta modulators," in *IEEE International Symposium on Circuits and Systems*, 2003, vol. 1, pp. I-925–I-928.
- [25] H. Shibata *et al.*, "A DC-to-1 GHz Tunable RF ΔΣ ADC Achieving DR=74 dB and BW=150 MHz at f<sub>0</sub>=450 MHz Using 550 mW," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 2888–2897, Dec. 2012.

- [26] W. Wang, C. H. Chan, Y. Zhu, and R. P. Martins, "A 72.6dB-SNDR 100MHz-BW 16.36mW CTDSM with Preliminary Sampling and Quantization Scheme in Backend Subranging QTZ," in *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, 2019, vol. 2019-February, pp. 340–342.
- [27] S. Dey, K. Reddy, K. Mayaram, and T. S. Fiez, "A 50 MHz BW 76.1 dB DR Two-Stage Continuous-Time Delta-Sigma Modulator with VCO Quantizer Non-linearity Cancellation," *IEEE J. Solid-State Circuits*, vol. 53, no. 3, pp. 799–813, Mar. 2018.
- [28] S. J. Huang, N. Egan, D. Kesharwani, F. Opteynde, and M. Ashburn, "A 125MHz-BW 71.9dB-SNDR VCO-based CT ΔΣ ADC with segmented phase-domain ELD compensation in 16nm CMOS," in *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, 2017, vol. 60, pp. 470–471.
- [29] S. H. Wu, T. K. Kao, Z. M. Lee, P. Chen, and J. Y. Tsai, "A 160MHz-BW 72dB-DR 40mW continuous-time ΔΣ modulator in 16nm CMOS with analog ISI-reduction technique," in *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, 2016, vol. 59, pp. 280–281.
- [30] N. Kurosawa, H. Kobayashi, K. Maruyama, H. Sugawara, and K. Kobayashi, "Explicit analysis of channel mismatch effects in time-interleaved ADC systems," *IEEE Trans. Circuits Syst. I Fundam. Theory Appl.*, vol. 48, no. 3, pp. 261–271, Mar. 2001.
- [31] O. Oliaei, "Clock jitter noise spectra in continuous-time delta-sigma modulators," in *IEEE Symposium on VLSI Circuits*, 1999, vol. 2, pp. 192–195.
- [32] K. Reddy and S. Pavan, "Fundamental Limitations of Continuous-Time Delta-Sigma Modulators Due to Clock Jitter," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 54, no. 10, pp. 2184–2194, Oct. 2007.

- [33] R. Schreier and G. C. Temes, "Delta Sigma Toolbox," 2016. [Online]. Available: https://www.mathworks.com/matlabcentral/fileexchange/19-delta-sigma-toolbox. [Accessed: 30-Jul-2019].
- [34] P. Benabes, M. Keramat, and R. Kielbasa, "A methodology for designing continuous-time sigma-delta modulators," in *European Design and Test Conference*, 1997, pp. 46–50.
- [35] A. Yahia, P. Benabes, and R. Kielbasa, "Bandpass Delta-Sigma modulators synthesis with high loop delay," in *IEEE International Symposium on Circuits and Systems*, 2001, vol. 1, pp. 344–347.
- [36] W. Gao, O. Shoaei, and W. M. Snelgrove, "Excess loop delay effects in continuous-time delta-sigma modulators and the compensation solution," in *IEEE International Symposium on Circuits and Systems*, 1997, vol. 1, pp. 65–68.
- [37] J. A. Cherry and W. M. Snelgrove, "Excess loop delay in continuous-time delta-sigma modulators," *IEEE Trans. Circuits Syst. II Analog Digit. Signal Process.*, vol. 46, no. 4, pp. 376–389, Apr. 1999.
- [38] O. Shoaei and W. M. Snelgrove, "A multi-feedback design for LC bandpass delta-sigma modulators," in *IEEE International Symposium on Circuits and Systems*, 1995, vol. 1, pp. 171–174.
- [39] H. Chae, J. Jeong, G. Manganaro, and M. Flynn, "A 12mW low-power continuous-time bandpass ΔΣ modulator with 58dB SNDR and 24MHz bandwidth at 200MHz IF," in *IEEE International Solid-State Circuits Conference*, 2012, vol. 55, pp. 148–149.
- [40] M. Miyahara, Y. Asada, D. Paik, and A. Matsuzawa, "A low-noise self-calibrating dynamic comparator for high-speed ADCs," in *IEEE Asian Solid-State Circuits Conference*, 2008, pp. 269–272.

- [41] D. J. Foley and M. P. Flynn, "CMOS DLL-based 2-V 3.2-ps jitter 1-GHz clock synthesizer and temperature-compensated tunable oscillator," *IEEE J. Solid-State Circuits*, vol. 36, no. 3, pp. 417–423, Mar. 2001.
- [42] K. Agarwal and R. Montoye, "A Duty-Cycle Correction Circuit for High-Frequency Clocks," in Symposium on VLSI Circuits, 2006, pp. 106–107.
- [43] H. Wang, L. Zhang, and Z. Yu, "A wideband inductorless LNA with local feedback and noise cancelling for low-power low-voltage applications," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 57, no. 8, pp. 1993–2005, 2010.
- [44] A. Puglielli, G. Lacaille, A. M. Niknejad, G. Wright, B. Nikolic, and E. Alon, "Phase noise scaling and tracking in OFDM multi-user beamforming arrays," in *2016 IEEE International Conference on Communications, ICC 2016*, 2016.
- [45] T. Höhne and V. Ranki, "Phase Noise in Beamforming," *IEEE Trans. Wirel. Commun.*, vol. 9, no. 12, pp. 3682–3689, Dec. 2010.
- [46] M. E. Rasekh, M. Abdelghany, U. Madhow, and M. Rodwell, "Phase noise analysis for mmwave massive MIMO: A design framework for scaling via tiled architectures," in *2019 53rd Annual Conference on Information Sciences and Systems, CISS 2019*, 2019.
- [47] R. B. Staszewski *et al.*, "All-digital PLL and transmitter for mobile phones," in *IEEE Journal of Solid-State Circuits*, 2005, vol. 40, no. 12, pp. 2469–2480.
- [48] X. Chen *et al.*, "Analysis and Design of an Ultra-Low-Power Bluetooth Low-Energy Transmitter With Ring Oscillator-Based ADPLL and 4× Frequency Edge Combiner," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1339–1350, May 2019.
- [49] X. Chen, A. Alghaihab, Y. Shi, D. S. Truesdell, B. H. Calhoun, and D. D. Wentzloff, "A Crystal-Less BLE Transmitter With Clock Recovery From GFSK-Modulated BLE Packets," *IEEE J. Solid-State Circuits*, 2021.
- [50] Keysight, "Fundamentals of RF and Microwave Power Measurements," *Application Note*, vol. 57–1. pp. 25–29, 2002.
- [51] J. Bell and M. P. Flynn, "A Simultaneous Multiband Continuous-Time ADC with 90-MHz Aggregate Bandwidth in 40-nm CMOS," *IEEE Solid-State Circuits Lett.*, vol. 2, no. 9, pp. 91–94, Sep. 2019.
- [52] Y. Dong *et al.*, "A 930mW 69dB-DR 465MHz-BW CT 1-2 MASH ADC in 28nm CMOS," in *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, 2016, vol. 59, pp. 278–279.
- [53] K. Roth and J. A. Nossek, "Achievable Rate and Energy Efficiency of Hybrid and Digital Beamforming Receivers With Low Resolution ADC," *IEEE J. Sel. Areas Commun.*, vol. 35, no. 9, pp. 2056–2068, Sep. 2017.
- [54] Xilinx, "Cost-Optimized Portfolio Product Tables and Product Selection Guide," 2015.