## Energy Efficient Integrated Circuits and Systems for Communications and Sensing: RF to Optical

by

Farzad Khoeini

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical and Computer Engineering) in The University of Michigan 2022

Doctoral Committee:

Professor Ehsan Afshari, Chair

Professor Hui Deng

Professor Kamal Sarabandi

Professor David Wentzloff

Farzad Khoeini

fkhoeini@umich.edu

ORCID iD: 0000-0003-0249-6516

© Farzad Khoeini 2022

To my other half Bahareh and my true love Hossein

## ACKNOWLEDGEMENTS

Although this dissertation bears my name as the sole author, it could not have been completed without the encouragement, support, and cooperation of many individuals, not all of whom I can list or thank enough here. This is, however, a unique opportunity for me to sincerely thank Professor Ehsan Afshari for being not only a great advisor but also a true friend and a brother with unstinting support. The extent of his impact on my life is astounding and, of course, the depth of my gratitude cannot be expressed in a few words. My friendship with Ehsan began before my graduate study at the University of Michigan, when I contacted him for the first time. I still cannot believe that a prominent professor at a top school, who knew little about me at the time, would spend so much time speaking with me and addressing my numerous concerns and questions one by one. Obviously, this was not because he wanted to recruit me, as his group was full of smart students. Rather, he was kindly encouraging someone in a relatively different situation. Later, Ehsan helped me achieve the greatest accomplishment of my life: he helped me marry my true love, Bahareh. A couple of years afterward, I was admitted to the prestigious University of Michigan with Ehsan's support. Over the past years in Michigan, he has taught me much beyond research, from amazing lectures to ethics and sport. I also owe him a large debt of gratitude for giving me the liberty to pursue the path of research that fascinated me the most. I am so honored to have been advised by such a learned and brilliant but humble professor.

Many thanks also go out to the rest of my thesis committee, namely Prof. Kamal Sarabandi, Prof. David Wentzloff, and Prof. Hui Deng, for their invaluable comments and suggestions. I was also delighted to take the Analog Integrated Circuits course with Prof. Wentzloff and the Microwave Measurements Lab with Prof. Sarabandi at the University of Michigan. I learned a great deal about RF and microwave design in these two courses. I would also like to extend my gratitude to Prof. Amir Mortazawi and Prof. Michael Flynn of the University of Michigan for giving me great insights into microwave circuits and data converters during the course of my Ph.D.

I am glad and proud I was surrounded by elite members and alumni of my academic family, UNIC lab, namely Seyed Hossein Naghavi, Zainulabideen Khalifa, Lili Chen, Morteza Tavakoli Taba, James Gruber, Aditya Varma Muppala, Hamad Alotaibi, Dr. Vahnood Pourahmad, Dr. Ali Mostajeran, Prof. Hamidreza Aghasi, Prof. Najme Ebrahimi, Dr. Morteza Sheikhsofla, Dr. Chen Jiang, and Samir Nooshabadi. I really appreciate them for making a friendly and constructive atmosphere over the past years. My thanks also go out to the RadLab current members and alumni, especially Dr. Adib Nashashibi and Abdelhamid Nasr for their great and immediate assistance with the test equipment.

Over the past three years, I have had a close and extensive collaboration with the fantastic team of Avicena Tech Corp., namely Dr. Bardia Pezeshki, Emad Afifi, Dr. Robert Kalman, Dr. Alex Tselikov, Dr. Christian Kromer, Dr. Rahul Shringarpure, Chri Saint, Dr. Cameron Danesh, Dr. Max (Sunghwan) Min, Drew Hallman-Osinski, Dr. Vahid Mirkhani, and Sama Pourmojib. I would like to thank all of them. I would also like to take this opportunity to express my sincere gratitude in particular to my great friend Emad, who patiently spent tons of hours with me in many fruitful discussions.

I would like to extend my gratitude to my former esteemed advisors and professors, in particular Prof. Mohammad Sharifkhani, Prof. Ali Fotowat-Ahmady, and Prof. Siroos Toofan. This endeavor would not have been possible without their unflagging encouragement and support.

No words can express how much my parents mean to me. Unfortunately, my beloved father, Safarali, who bore the burden of raising eight children, suffered from cancer for two years until I lost him just before my graduate study in the US. I take comfort in knowing that I at least nursed him in his last years and days of life. I took him for chemotherapy and radiotherapy so many times and, unbelievably, these moments were a strange mixture of sadness and happiness for me. He was an extraordinarily creative, hands-on man without an academic education, yet he always wanted his children, including me, to pursue higher education and make their dreams come true. Rest in peace "Agha joon", I will always miss and remember you! I cannot thank you enough for your countless sacrifices!

Fortunately, my beloved mother, Mama Roghayeh, who has sacrificed her entire life for the success and prosperity of her children, is still alive, and I am so happy she will hear of my graduation. This is undoubtedly one of the happiest moments for every parent. My deepest gratitude here goes out to her and my beloved family for their endless support. I would also like to thank my father-in-law Hamid and my mother-in-law Masoumeh for loving and supporting me like their son.

Undoubtedly, the most precious gift I have ever received in my life is my other half. No words are enough to express my gratitude and love to my wife Bahareh, who has made many sacrifices and provided unwavering support. Bahareh has been a genuine companion since I met and married her. When she accepted me as her husband almost eight years ago, we started a journey with lots of ups and downs. Whenever I was faced with a work failure, Bahareh was not only the sympathetic psychologist who took care of me, but also a bright engineer with several workarounds. I wish I had met her many years earlier. I could not have reached this point without her!

I would also like to express how much I am blessed and excited to have a new source of emotion and passion in my family! I love you so much, our 1-month-old Hossein. I am so proud to be your father!

Finally, yet most importantly, I am eternally grateful to the person to whom I owe all of my accomplishments. His highness is an incomparable person whom I have loved since I found myself in this world. May God bless our kindest father and expedite his appearance.

## TABLE OF CONTENTS

- DEDICATION
- ACKNOWLEDGEMENTS
- LIST OF FIGURES
- LIST OF TABLES
- ABSTRACT

CHAPTER

- I. Introduction
  - 1.1 Background
  - 1.2 Dissert...
- II. MicroLED-based Optical Links for Ultra Dense and Energy Efficient Interchip Communication
  - 2.1 Introduction
  - 2.2 A 4 Gbps Single-ended Optical Receiver with an Integrated Blue Light Photodetector for Parallel Optical Communication
    - 2.2.1 Top-Level Schematic of the Design
    - 2.2.2 Integrated Blue Light Photodetector
    - 2.2.3 Transimpedance-to-Noise Optimization of the Shunt Feedback TIA
    - 2.2.4 Post and Limiting Amplifiers
    - 2.2.5 ...
    - 2.2.6 Experimental Results
  - 2.3 A 32 Elements 64 Gbps Data throughput Optical Receiver Array
    - 2.3.1 Transimpedance-to-Noise Optimization for Differential Shunt Feedback Matched TIA
    - 2.3.2 Differential Post Amplifier and Limiting Amplifiers
    - 2.3.3 Differential Digital Offset Cancellation
    - 2.3.4 Experimental Results
  - 2.4 A 128 Elements 256 Gbps Data throughput Optical Receiver Array
  - 2.5 Parallel vs Serial Optical Communication
  - 2.6 Conclusion
- III. A Transimpedance-to-Noise Optimized Analog Front-end with High PSRR for Pulsed ToF Lidar Receivers
  - 3.1 Introduction
  - 3.2 Transimpedance-to-Noise Optimization
  - 3.3 Implementation
    - 3.3.1 TIA
    - 3.3.2 Post Amplifier (PA) and Push-Pull driver
  - 3.4 Experimental Results
  - 3.5 Conclusion
- IV. A New TIA Topology: Push-Pull Regulated Cascode TIA
  - 4.1 Introduction
  - 4.2 Push-Pull Regulated Cascode TIA
  - 4.3 Experimental Results
- V. A CMOS Sensor for Measuring Parasitic Capacitance of On-Chip Photodetectors
  - 5.1 Introduction
  - 5.2 Photodetector Capacitance Meter
- VI. Reflection-based Short Pulse Generation in CMOS
  - 6.1 Introduction
  - 6.2 Reflection-based Short Pulse Generation (RSPG) Theory and Implementation
  - 6.3 Measurement Results
  - 6.4 Conclusion
- VII. An Energy Efficient Fully Integrated 20 Gbps OOK Wireless Transmitter at 220 GHz
  - 7.1 Introduction
  - 7.2 Implementation
  - 7.3 Experimental Results
  - 7.4 Conclusion

- APPENDIX
- BIBLIOGRAPHY

## LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| 1.1 | Information and Communication Technology (ICT) may take more than 20% of global energy production by 2030 [1]. |
| 1.2 | The wire capacitance dominates the energy consumption in chip-to-chip communication. |
| 1.3 | Cross section of a GPU that communicates with HBM [2]. |
| 1.4 | The energy efficiency gets worse when the communication distance increases. |
| 1.5 | A figure of merit that accounts for energy efficiency and bandwidth density as a function of communication distance [3]. |
| 1.6 | Parallel optical interface to improve energy efficiency and density using microTXs and microRXs. |
| 2.1 | (a) A 16x16 array of 4 $\mu$m diameter emitters, with 30 $\mu$m spacing on a sapphire substrate seen through a 1 mm plastic imaging fiber. (b) A SEM photo of the microLEDs transferred to a CMOS die. The contact pad on the side is the cathode drive electrode, which is connected in a subsequent lithography step [4,5]. |
| 2.2 | The conceptual diagram for a full duplex parallel optical chip-to-chip communication link. In this architecture, the electrical interfaces (EI) send the data to the optical transmitters (TX) to modulate the microLEDs. The optical light is carried over an imaging fiber and is detected by the photodiodes on the receiver (RX) side. The RX amplifies the detector signal and sends it to the EI. This communication is also performed in the reverse direction at the same time. |
| 2.3 | A realization of the full duplex parallel optical link for the chip-to-chip communication [5]. |
| 2.4 | The top level schematic of the proposed 4 Gbps fully integrated optical receiver with a digital offset cancellation (DOC). |
| 2.5 | The cross section of the blue light photodetector in the CMOS SOI process. |
| 2.6 | (a) Shunt feedback TIA and the first post amplifier (PA1) (b) the ... |
| 2.7 | The shunt feedback TIA small signal model in presence of the noise. |
| 2.8 | Normalized noise bandwidths to the $-3$ dB bandwidth as a function of damping ratio ($\zeta$). $BW_n$ and $BW_{n2}$ are minimum for $\zeta = 0.71$ and $\zeta = 0.44$, respectively. |
| 2.9 | Qualitative illustration of the (a) input-referred rms noise current and (b) transimpedance-to-input-referred rms noise current (normalized to $10^3$) as a function of $C_I/C_{PD}$ for various values of $GBW$ for $BW_{-3dB} = 2.8$ GHz, $C_{PD} = 10$ fF, $f_T = 40$ GHz, and $\Gamma = 3$. |
| 2.10 | Post amplifier 2 (PA2) circuit implementation. |
| 2.11 | (a) Circuit implementation of the single-ended LA1, and (b) the DC transfer characteristics of the cascaded LA1 at the corresponding outputs at $V_{DD} = 0.9$ V. |
| 2.12 | Circuit implementation of the LA2 and two cascaded stages of LA3. |
| 2.13 | Simplified model of an analog offset cancellation loop. |
| 2.14 | Bode plot of the analog offset cancellation loop. |
| 2.15 | The circuit implementation of the digital offset cancellation (DOC). |
| 2.16 | The comparator schematic based on the StrongARM configuration. |
| 2.17 | (a) The DGC output bits when the comparator senses a voltage above $V_{CM}$ and (b) the DGC output bits when the comparator senses a voltage below $V_{CM}$. |
| 2.18 | The circuit schematic of the R-R 5-bit digital to analog converter (DAC). |
| 2.19 | (a) Circuit implementation of the REFH and REFL voltage generator for the DAC (b) the op amps used in the reference generator. |
| 2.20 | The digital offset cancellation loop model. |
| 2.21 | (a) The gain of an N-bit ADC and (b) extrapolating the gain of the N-bit ADC to 1-bit comparator. |
| 2.22 | The offset cancellation loop gain. |
| 2.23 | The low frequency signal transfer function. |
| 2.24 | Die photograph of the proposed fully integrated optical receiver. The DOC circuitry that occupies 0.0075 mm$^2$ is not shown here because of top metal fillers that cover the underneath blocks. |
| 2.25 | The V2IC network used in the electrical receiver to electrically characterize the TIA. |
| 2.26 | (a) Test buffer used to linearly characterize the electrical receiver and (b) output buffer used to drive external equipment for the optical receiver eye and bit error rate measurements. |
| 2.27 | The frequency response out of the test buffer for the electrical receiver. |
| 2.28 | Output rms noise voltage measurement using a wideband oscilloscope. |
| 2.29 | The setup used to test the fully integrated optical receiver. |
| 2.30 | The eye diagram measurement at 4 Gbps with photodetector current of about 5 $\mu$A. |
| 2.31 | The bathtub curve of the optical receiver running at 4 Gbps. |
| 2.32 | The bit error rate measurement of the fully integrated optical receiver at 4 Gbps as a function of the input optical power. |
| 2.33 | The top level schematic of the optical array. |
| 2.34 | (a) Circuit implementation of the proposed differential shunt feedback matched transimpedance amplifier (DSFM-TIA) and (b) the simplified model of the TIA. |
| 2.35 | The DSFM-TIA in the presence of noise. Here we calculate the output noise voltage due to only half of the circuit. Due to the symmetry, the other half of the circuit contributes the same amount of noise. Since the noise sources are uncorrelated, the output noise voltage will be $\sqrt{2}$ times the half-circuit noise | 46 |
| 2.36 | Quantitative illustration of the (a) input-referred rms noise current and (b) transimpedance-to-input noise as a function of $C_I$ for different values of $GBW$ for $BW_{-3\text{dB}} = 1.4\ \text{GHz}$, $C_{PD} = 10\ \text{fF}$, $f_T = 30\ \text{GHz}$, and $\Gamma = 3$ | 48 |
| 2.37 | (a) The schematic of PA1 and $G_m$ and (b) PA2 and subsequent five cascaded limiting amplifiers | 49 |
| 2.38 | The differential digital offset cancellation circuitry | 50 |
| 2.39 | The differential digital-to-analog converter (DDAC) | 50 |
| 2.40 | Die photo of an $8 \times 8$ optical transceiver that consists of $4 \times 8$ microLED drivers and $4 \times 8$ receivers | 51 |
| 2.41 | Measured output eye diagram at 2 Gbps with a photodetector current of $2\,\mu\text{A}$
                                                        | 52       |
| 2.42 | Bit error rate versus power at 2 Gbps                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                        | 52       |
| 2.43 | Bit error rate versus unit interval at 2 Gbps                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                        | 53       |
| 2.44 | The die photo of 128 RX and 128 TX elements with AIB buffers                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                        | 54       |
| 2.45 | The AIB buffer schematic.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                        | 54       |
| 2.46 | Simulation of the eye diagram of the victim (red) running at 2 Gbps.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                        | 55       |
| 2.47 | Serial communication.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                        | 56       |
| 2.48 | Parallel communication.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                        | 57       |
| 2.49 | The number of limiting amplifier stages needed in each parallel's re-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                        | 61       |
| 2.50 | The ratio of DC power consumed by the optical receiver in the serial
                                                        |          |
|      | efficiency improvement is achieved in the parallel link for the same
                                                        | 61       |
| 3.1  | Building blocks of a typical pulsed Lidar system.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                        | 67       |
| 3.2  | The possible shapes of the received pulses. If only a threshold detec-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                        |          |
|      | tion is used for timing discrimination, errors depending on the rise |          |
|      | time of the leading edge of the pulse in the distance measurement |          |
|      | will appear. |          |
| 3.3  | A top-level schematic of a constant-fraction discriminator (CFD) as a time discriminator (TD). An analog front-end (AFE) in pulsed Lidar receivers usually drives a CFD or other types of TDs. |          |
| 3.4  | Schematic of the proposed analog front-end (AFE) for pulsed Lidar receivers built on transimpedance-to-noise optimization approach. | 70       |

| 3.5  | Half-circuit of the fully differential resistive shunt-feedback TIA upon<br>which the optimization analysis is studied. In a fully differential TIA,<br>the photodiode capacitance has to be considered twice for the half-                                                                                        |    |
|------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
|      | circuit analysis because of the virtual ground in the middle of the                                                                                                                                                                                                                                                | 71 |
| 26   | photodiode capacitance. $\dots$ The pairs contribution of $M$                                                                                                                                                                                                                                                      | (1 |
| 5.0  | can be assumed zero by ignoring channel length modulation of $M_2$<br>Moreover, since we desire a high DC gain, $R_D$ noise can be neglected.<br>Also, the blue noise current source at the input is a fictitious current<br>source which results in the same poise voltage at the output coming                   |    |
|      | from the physical noise contributors                                                                                                                                                                                                                                                                               | 73 |
| 37   | The transition frequency $(f_{\rm r})$ as a function of the FFT's width for a                                                                                                                                                                                                                                      | 10 |
| 5.7  | constant $V_{ac}$                                                                                                                                                                                                                                                                                                  | 75 |
| 3.8  | Numerical illustration of four attributes as a function of $C_I/C_{PD}$ with<br>different values of $GBW$ for $BW_{-3dB} = 350$ MHz, $C_{PD} = 2.4$ pF,<br>$f_T = 30$ GHz, and $\Gamma = 2$ . (a) Input-referred rms noise current cal-<br>culated from (3.5), (b) required DC gain acquired from (3.9), (c) $R_F$ | 10 |
|      | obtained from (2.7), and (d) transimpedance-to-noise.                                                                                                                                                                                                                                                              | 77 |
| 3.9  | Variation of a few important bias-dependent parameters for a FET operating in saturation region in the $0.11\mu\text{m}$ CMOS process with $W/L = 100 \mu\text{m}/110 \text{nm}$. By defining $FoM = f_T (g_m r_o) (g_m/I_D)$ for the front-end FET, we observe that the $FoM$ becomes maximum for $V_{GS} \approx 0.4 \mathrm{V}$. | 80 |
| 3.10 | Implementation of the resistive shunt-feedback transimpedance amplifier based on the transimpedance-to-noise optimization approach. The red part of the schematic shows the common-mode feedback amplifier used to provide a proper DC voltage at the output. | 82 |
| 3.11 | The effect of the portion of the tail current of the TIA controlled by the common-mode feedback amplifier on the common-mode stability. M is the multiplier factor of the corresponding FET in the tail current source in Fig. 3.10. | 83 |
| 3.12 | The wide swing current mirrors along with a Beta-multiplier current | 84 |
| 3.13 | (a) Circuit implementation of the post amplifier (PA) and push-pull buffer to maximize the output swing | 87 |
| 3.14 | Simulation of the frequency response over the process corners with VDD and temperature variation | 88 |
| 3.15 | Die photograph of the analog front-end (AFE) fabricated in a $0.11 \mu m$ CMOS                                                                                                                                                                                                                                     | 89 |
| 3.16 | (a) Single-ended frequency response test setup, (b) differential frequency response and transient response test setup. | 90 |
| 3.17 | Insertion gain of the AFE along with the input and output fixtures. The gain is almost flat within the $-3$ dB bandwidth of about 340 MHz for $V_{DD} = 1.8 \mathrm{V}$. | 91 |

| 3.18  | The frequency response of the AFE. Transimpedance is flat within the $-3$ dB bandwidth of 340 MHz and has a gain of about 99 dB (90 k$\Omega$). | 92    |
|-------|----------------------------------------------------------------------------------------------------------------------------------------------|-------|
| 3.19  | (a) Differential outputs of the AFE responding to a trapezoid pulse with 1 ns rise time and approximately $1.7 \mu\text{A}$ amplitude. (b) Differential output voltage response to large signal pulses with $t_r = t_f = 1$ ns, pulse-width = 2 ns, and amplitude of $7.2 \mu$A, $13 \mu$A, $19 \mu$A, $25 \mu$A, and $30 \mu\text{A}$. | 93    |
| 3.20  | Output noise measurement setup                                                                                                               | 94    |
| 3.21  | Noise performance of the AFE. Differential output noise is equal to $6.4 \mathrm{mV_{rms}}$ which translates to 72 nA input-referred rms noise current. The bandwidth is set to the maximum limit to account for the high frequency components of the noise. | 94    |
| 3.22  | The test setup that was used to measure the power supply gain. The Bias Tee provides at least 50 dB isolation between the power supply and VDD terminal of the chip | 95    |
| 3.23  | Power supply rejection ratio (PSRR) versus frequency. More than 87 dB PSRR is achieved within the $-3$ dB bandwidth when no decoupling capacitor is used. | 95    |
| 4.1   | The common gate (CG) configuration as a potentially low noise wide bandwidth TIA. | 99    |
| 4.2   | The schematic of the push-pull common gate (PPCG) | 102   |
| 4.3   | The bias current has to be 4 times higher in the conventional CG to provide the same input resistance | 102   |
| 4.4   | The schematic of the regulated cascode (RGC) TIA | 103   |
| 4.5   | The schematic of the proposed push-pull regulated cascode (PPRGC) TIA | 104   |
| 4.6   | An implementation for the PPRGC TIA | 105   |
| 4.7   | The PPRGC prototype fabricated in a 130 nm CMOS process. The TIA core occupies about $20 \mu\text{m} \times 20 \mu\text{m}$ in area | 106   |
| 4.8   | (a) An RC network to convert the voltage to current for the test of the TIA, (b) the test buffer connected to the output of the core TIA to drive $50 \Omega$ test equipment. | 107   |
| 4.9   | (a) Output response to a PRBS data with the data rate of 2 Gbps, (b) 4 Gbps | 108   |
| 4.10  | The output noise measured by the histogram of the oscilloscope | 109   |
| 5.1   | (a) A shunt feedback TIA with a common anode (CA) photodetector, (b) a shunt feedback TIA with a common cathode (CC) photodetector | 112   |
| 5.2   | The test setup used to measure the parasitic capacitance of a (a) CA |       |
| 5.3   | Non-overlapping voltages generated by a clock source with the frequency of $f_{\text{exc}}$ |       |
| 5 /   | (a) Shunt feedback TIA and the first post amplifier (PA1) (b) the                                                                            | 110   |
| 0.4   | simplified model of the TIA                                                                                                                  | 115   |
| 5.5   | The schematic of the differential amplifier used in the proposed ca-                                                                         | 110   |
| 0.0   | pacitance meter circuit.                                                                                                                     | 116   |
|       |                                                                                                                                              |       |

| 5.6        | The frequency response of the amplifier over the process and temperature variations. | 116   |
|------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
| 5.7        | The input-referred offset voltage distribution of the amplifier | 117   |
| 5.8        | The transient output voltage of the proposed photodetector capacitance meter. | 117   |
| 5.9        | The photodetector capacitance distribution over the process and mismatch variations. A $3\sigma \approx \pm 15\%$ is achieved. | 117   |
| 6.1        | An XOR gate can generate a pulse whose width is determined by the delay | 118   |
| 6.2        | Pulse generation due to an ac-ground termination with: (a) voltage<br>source input in series with a resistor; (b) current source input in<br>parallel with a resistor       | 121   |
| 63         | (a) Circuit realization of Fig. 6.2b. (b) the current injected by $M_{\text{DSDG}}$                                                                                         | 141   |
| 0.0        | $I_{inj}$ , and (c) simple AC modeling of the pulse amplitude and width<br>produced by the BSPG technique where realistically rise-time of the                              |       |
|            | $I_{\rm ini}$ is non-zero.                                                                                                                                                  | 121   |
| 6.4        | (a) The amplifier used for the sharpening of the input and (b) the                                                                                                          |       |
| 0          | schematic of the Reflection-based Short Pulse Generator (RSPG).                                                                                                             | 124   |
| 6.5        | the die photographs of the fabricated chips.                                                                                                                                | 125   |
| 6.6        | (a) Frequency response of the 6 amplifiers used to sharpen the input                                                                                                        |       |
| 0.0        | and (b) transient voltage at the gate of $M_{\text{PSPC}}$ responding to square                                                                                             |       |
|            | waves with $t_{t} = t_{t} = 40 \text{ ps}$ and $t_{t} = t_{t} = 200 \text{ ps}$ respectively. It can                                                                        |       |
|            | be seen that thanks to the wideband amplifier chain there is almost                                                                                                         |       |
|            | no difference at the output of the amplifiers when square waves with                                                                                                        |       |
|            | fast or slow rise-time/fall-time is applied.                                                                                                                                | 126   |
| 6.7        | Measurement of the pulse generator using a probe station. The long<br>cable at the output introduce a great amount of dispersion and at-                                    |       |
|            | tenuation.                                                                                                                                                                  | 127   |
| 6.8        | A single pulse captured for the chip with $\ell = 150 \mu\text{m}$ . The amplitude<br>and FWHM are $\approx 0.21$ V and $\approx 13 \text{ps}$ , respectively. The pulse is |       |
|            | highly attenuated and dispersed mainly due to the 3-feet V-band                                                                                                             | 1.0.0 |
|            | cable used for the pulse measurement.                                                                                                                                       | 128   |
| 6.9        | Output response with a repetition: (a) 4 GHz and (c) 8 GHz. The                                                                                                             |       |
|            | amplitude and pulse width are kept almost constant over frequency                                                                                                           |       |
|            | change                                                                                                                                                                      | 129   |
| 6.10       | (a) Frequency response of the probe, 3-feet V-band cable, DC blocker,                                                                                                       |       |
|            | and the DCA. This is achieved by multiplication of the FFT of the                                                                                                           |       |
|            | measured impulse response of the DCA by the measured and fitted                                                                                                             |       |
|            | transfer function of the rest of the fixture.(b) Estimated pulses after                                                                                                     |       |
|            | de-embedding                                                                                                                                                                | 130   |
| 7.1        | Block diagram of the energy efficient fully integrated on-off keying                                                                                                        |       |
|            | transmitter.                                                                                                                                                                | 133   |
| 7.2        | Schematic of the harmonic oscillator                                                                                                                                        | 134   |
| 7.3        | Schematic of the shunt SPST modulation switch.                                                                                                                              | 135   |
| 7.4        | Insertion loss of the switch when transferring and blocking the signal.                                                                                                     | 135   |

| 7.5  | The on-chip slot antenna                                                            | 136 |
|------|-------------------------------------------------------------------------------------|-----|
| 7.6  | The wireless transmitter test setup environment                                     | 137 |
| 7.7  | Spectrum measurement setup                                                          | 137 |
| 7.8  | The received power as a function of distance                                        | 138 |
| 7.9  | The radiated antenna pattern                                                        | 138 |
| 7.10 | The simulated 3D radiation pattern of the antenna                                   | 138 |
| 7.11 | The time domain measurement setup                                                   | 139 |
| 7.12 | Spectrum of the received signal at (a) 10 Gbps modulated signal (b) 20 Gbps         | 139 |
| 7.13 | Eye diagram of the received signal at (a) 10 Gbps modulated signal (b) 20 Gbps      | 140 |

## LIST OF TABLES

## <u>Table</u>

| 2.1 | Table of Comparison                                                                                   | 42  |
|-----|-------------------------------------------------------------------------------------------------------|-----|
| 2.2 | Summary of Photodiode and TIA Attributes for the Serial vs Parallel's Elements Normalized to the Serial | 58  |
| 3.1 | Parameters Values for the Designed TIA                                                                | 84  |
| 3.2 | Performance Summary and Comparison with Prior Arts                                                    | 97  |
| 6.1 | Table of Comparison                                                                                   | 130 |
| 7.1 | Performance Summary and Comparison with Prior Arts                                                    | 141 |

## ABSTRACT

The steadily increasing demand for high-performance computation in artificial intelligence, cloud computing, and virtual/augmented reality requires ever higher chip-to-chip communication throughput. However, high-throughput chip-to-chip links usually require very high speed Serializers/De-Serializers (SerDes) to mux/de-mux the data onto relatively few electrical I/O lines, because of the limited number of chip package pins and off-chip routing challenges. This incurs not only a latency overhead but also a significant energy and silicon area overhead. It is therefore highly desirable to develop energy efficient and compact parallel interconnects for chip-to-chip communication. An optical interface has great potential to ameliorate the interchip communication bottleneck while enhancing energy efficiency and compactness. To this end, a major part of this thesis is devoted to the analysis and development of high performance, energy efficient integrated circuits and systems for ultra dense parallel optical chip-to-chip communication. The experimental results of several multi-Gbps optical transceiver prototypes fabricated in nanometer CMOS technologies show that the proposed optical links are a promising technology for interchip communication. In addition, we theoretically compare the energy efficiency of a parallel optical link with that of a serial optical link. For the same data throughput, the parallel link demonstrates significantly superior energy efficiency. Moreover, a CMOS sensor is proposed that can measure the small capacitance of on-chip photodetectors. This allows bandwidth and noise optimization for the front-end of optical receivers.

In the remaining parts of the dissertation, the development of several energy efficient, high performance integrated circuits for pulsed light detection and ranging (Lidar), pulsed radio frequency detection and ranging (Radar), and wireless high frequency communication systems is presented. More specifically, the design and test of an optimized analog front-end for pulsed Lidar receivers, fabricated in a CMOS technology, is presented that remarkably reduces energy consumption. Also, a new transimpedance amplifier topology is introduced that enhances the energy efficiency of the conventional structures, not only for optical communication but also for sensing. In addition, a novel energy efficient pulse generator is introduced that can systematically produce a pulse with high amplitude and short width in any CMOS technology and deliver it to low impedance loads. Moreover, a wireless transmitter for short range communication in the mmWave frequency band, fabricated in a SiGe BiCMOS technology, is demonstrated that enables energy efficient high data rate communication.

## CHAPTER I

## Introduction

### 1.1 Background

Information computation was estimated to consume about 9% of the world's electricity production in 2012 [6]. More recent projections suggest that information and communication technology (ICT) may soon consume more than 20% of the world's electricity production, as shown in Fig. 1.1. If the massive processing required by cryptocurrency keeps growing, a surge in electricity demand may occur [1], and the growth of available resources may not keep up with the growth in energy demand [7].

A major portion of the energy is consumed over wired networks, mainly because of the unwanted capacitance of the wires. To elaborate, let us take an example. A recent approach to enhancing computing performance is to combine 3D-stacked synchronous dynamic random-access memory (SDRAM) with a high-performance graphics processing unit (GPU), so that a large amount of data can be quickly stored and read. However, a large amount of energy is dissipated over the electrical interconnect between the SDRAM and the GPU. The reason energy is dissipated when transferring data over an electrical wire is capacitance. When the transmitter (TX) needs to send a '1' bit after a '0' bit, it consumes energy to charge the capacitance associated with the electrical wire. This energy dissipation is proportional to the capacitance of the wire and the data-rate.



Figure 1.1: Information and Communication Technology (ICT) may take more than 20% of global energy production by 2030 [1].

Therefore, the longer the wire is, the more energy is dissipated. Fig. 1.2 depicts a simplified model of an electrical wire used for wired electrical communication. The power supply delivers  $C_{IC}V_{DD}^2$  joules to charge the line to the logic level of '1'. Half of this energy is dissipated in the driver while charging, and the other half is stored on the line and dissipated when the line discharges to the logic level of '0'. This results in an average power consumption of  $fC_{IC}V_{DD}^2$  on the line, where  $f$  is the frequency of logic operation.
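The relation above can be illustrated with a short numerical sketch. The capacitance per unit length, supply voltage, and toggle frequency below are illustrative assumptions, not values taken from this dissertation; the function names are hypothetical.

```python
def wire_capacitance_f(length_mm, cap_per_mm_f=0.2e-12):
    """Total wire capacitance, assuming it scales linearly with length.

    cap_per_mm_f (0.2 pF/mm) is an assumed, illustrative value.
    """
    return length_mm * cap_per_mm_f


def wire_average_power_w(c_ic_f, vdd_v, f_hz):
    """Average power f * C_IC * VDD^2 drawn to charge the line at frequency f."""
    return f_hz * c_ic_f * vdd_v ** 2


if __name__ == "__main__":
    for length_mm in (1.0, 5.0, 10.0):
        c_ic = wire_capacitance_f(length_mm)
        p = wire_average_power_w(c_ic, vdd_v=1.0, f_hz=1e9)
        print(f"{length_mm:5.1f} mm -> {c_ic * 1e12:.2f} pF, {p * 1e3:.2f} mW per wire")
```

Under these assumptions the power per wire grows linearly with wire length, which is the point made in the text.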

While more advanced solid state technologies improve the energy efficiency of the logic cells, the energy per bit dissipated by electrical interconnects inside and between chips has not changed noticeably. In advanced complementary metal-oxide-semiconductor (CMOS) technologies, the transistors' parasitic capacitors, including oxide and overlap capacitors, shrink, reducing the energy dissipated per bit for the logic operation. By contrast, the electrical interconnects do not scale down in more advanced technologies and therefore dominate the energy consumption.

The energy consumption becomes a serious issue when a large number of electrical channels is required. For instance, the GPU that plays a critical role in the performance



Figure 1.2: The wire capacitance dominates the energy consumption in chip-to-chip communication.

of a computer for image/video processing and any other complex computation such as weather forecasting requires a memory with a fast data read/store capability. The HBM<sup>\*</sup> interface, which was invented in 2013, employs SDRAM and achieves a higher bandwidth while consuming less energy in a significantly smaller form factor than  $DDR4^{\dagger}$  or  $GDDR5^{\ddagger}$. Nevertheless, a considerable amount of energy is dissipated over the electrical interposer between the GPU and SDRAM. The energy dissipation on the HBM interface is significant because of the demand for high speed computation and the large number of channels that the SDRAM imposes on the electrical interposer. Fig. 1.3 shows the cross section of an electrical link used between a GPU and HBM. The energy consumption of available HBM interfaces is on the order of a few pJ/bit (picojoules per bit). On the other hand, the data-rate on a typical GPU is beyond 1 Tbps (terabits per second). This implies that normally tens of watts are dissipated and converted to heat just for transferring the high speed data across the electrical wires. The produced heat also requires cooling technologies to reduce the temperature, which further increases the energy dissipation. When the overall energy dissipation across a GPU and SDRAM is multiplied by the number of units in a data center, the result is a huge amount of energy dissipation.
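The link power follows directly from the energy per bit and the aggregate data-rate. The sketch below makes that arithmetic explicit; the specific numbers (3 pJ/bit, 4 Tbps, 10,000 units) are illustrative assumptions chosen within the ranges quoted above, not measured values.

```python
def link_power_w(energy_pj_per_bit, data_rate_tbps):
    """Power dissipated by a link: (energy per bit) x (bits per second)."""
    return (energy_pj_per_bit * 1e-12) * (data_rate_tbps * 1e12)


def fleet_power_kw(power_per_unit_w, units):
    """Total interconnect power for many identical units, in kilowatts."""
    return power_per_unit_w * units / 1e3


if __name__ == "__main__":
    p = link_power_w(energy_pj_per_bit=3.0, data_rate_tbps=4.0)  # assumed values
    print(f"per GPU-memory link: {p:.1f} W")
    print(f"10,000 units: {fleet_power_kw(p, 10_000):.0f} kW")
```

Cooling overhead is not included here and would add to these figures.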

Fig. 1.4 shows energy efficiency between chips at different communication distances where for just a few mm between a GPU and HBM, more than 0.5 pJ is

<sup>\*</sup>High Bandwidth Memory

<sup>†</sup>Double Data Rate 4 Synchronous Dynamic Random-Access Memory

<sup>‡</sup>Graphics Double Data Rate 5 Synchronous Dynamic Random-Access Memory



Figure 1.3: Cross section of a GPU that communicates with HBM [2].



Figure 1.4: The energy efficiency gets worse when the communication distance increases.

dissipated [5]. Therefore, any alternative technology that can eliminate the wire capacitance can potentially improve the energy efficiency. In addition, chips such as CPUs, GPUs, and HBM internally operate at relatively low speed over wide buses. In these applications, Serializers/Deserializers (SerDes) are used to multiplex and de-multiplex the data onto a few pins because of package limitations. These units inevitably impose additional power consumption.

An optical interface has a great potential to overcome energy efficiency issues at short and moderate distances. In fact, over the past decades, electrical wires have



Figure 1.5: A figure of merit that accounts for energy efficiency and bandwidth density as a function of communication distance [3].

been replaced with optical fibers over progressively shorter distances. Long haul communication pioneered the transition from electrical wires to optical fibers four decades ago. Nowadays, the distance has shrunk to a few meters across data centers. However, between chips on the same package, on circuit boards, and within racks, data still travel over electrical wires. Fig. 1.5 represents a figure of merit (FOM) that accounts for energy efficiency and bandwidth density. The higher the FOM, the better. As can be seen, for a communication range shorter than 1 cm, high FOM numbers are achieved. However, the FOM drops as the communication distance increases, and for sub-10 m distances, there is no available solution yet that can yield a high FOM.

Fig. 1.6 illustrates an optical interconnect architecture that can potentially offer very low energy per bit with a high number of channels per unit length. What distinguishes this optical interface from other optical technologies is its ability to communicate in parallel. This is an important aspect, as many electrical interfaces are naturally parallel. Therefore, a parallel optical interconnect can efficiently couple to the parallel electrical interface, obviating the need for power hungry SerDes electronics. As a result, further energy efficiency may be achieved. To this end, micro optical transmitters and receivers have to be developed that can maintain a high density with low power consumption. Visible light communication at blue wavelengths has two special features that can enable such a technology. First, microLEDs made of GaN that emit blue light can operate at several Gbps [4,8,9]. Unlike laser sources, these microLEDs can be completely turned off, paving the way to energy efficient modulation. Second, light absorption in silicon is very shallow at blue wavelengths, enabling CMOS compatible photodetector structures that can have very low capacitance per unit area. This in turn provides high gain at the first amplifier stage of the optical receiver, resulting in low power consumption for the receiver. The majority of this dissertation studies different aspects of microLED-based parallel optical communication and discusses the details of several fabricated integrated circuit and system prototypes.

Figure 1.6: Parallel optical interface to improve energy efficiency and density using microTXs and microRXs.

In addition, Lidar, which stands for light detection and ranging, has attracted a lot of attention. Lidar can be used for ranging and imaging at scales from micrometers to a few thousand meters. Lidar systems are becoming widely used in autonomous vehicles (AV) because of their advantages over cameras. To obtain a fast and wide imaging view with this technique, a number of Lidar sensors need to be arrayed, which significantly increases the energy consumption. It is therefore highly desirable to develop Lidar systems that are energy efficient. In this dissertation, the design and implementation of a pulsed Lidar receiver that consumes less power owing to a proposed optimization is presented.

Moreover, short pulse generators play a critical role in a broad range of fast and ultra wide-band applications such as high speed sampling, high sensitivity time domain reflectometry, pulse-based Radars, spectroscopy, and high data rate communication systems. For these applications, it is desirable to develop a pulse generator with a shorter pulse and higher amplitude. In Radars, for instance, the range resolution is inversely proportional to the bandwidth. In other words, the higher the bandwidth, the more finely two nearby objects can be resolved in range. On the other hand, the frequency bandwidth of a pulse is inversely proportional to the pulse width. As a result, a pulse radiator with a shorter pulse width can offer a better range resolution. In conventional techniques, the pulse width is limited by the switching speed of the transistors. Several novel techniques have been proposed that produce pulses beyond the transistor speed limitation. Nonetheless, most of these techniques are quite power hungry and bulky. A novel, simple, and energy efficient technique that breaks the transistor speed limitation and generates shorter pulses with a high amplitude is presented in this dissertation.
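The two inverse proportionalities above can be combined into a rough estimate of range resolution from pulse width. The sketch uses the standard radar relation $\Delta R = c/(2B)$ and the approximation $B \approx 1/\tau$; the latter is an assumption, since the exact constant depends on the pulse shape, and the pulse widths below are illustrative.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def pulse_bandwidth_hz(pulse_width_s):
    """Rough bandwidth estimate B ~ 1/tau (shape-dependent constant ignored)."""
    return 1.0 / pulse_width_s


def range_resolution_m(bandwidth_hz):
    """Standard radar range resolution, delta_R = c / (2 * B)."""
    return C0 / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    for tau_ps in (100.0, 10.0):  # assumed pulse widths
        b = pulse_bandwidth_hz(tau_ps * 1e-12)
        dr = range_resolution_m(b)
        print(f"tau = {tau_ps:5.1f} ps -> B ~ {b / 1e9:.0f} GHz, dR ~ {dr * 1e3:.2f} mm")
```

Shortening the pulse by a factor of ten improves the estimated resolution by the same factor.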

Last but not least, high data-rate wireless transceivers are in high demand for high speed communication networks and, as shown in Fig. 1.1, they account for a significant portion of the ICT energy demand. The energy consumption in this area is crucial from several other points of view. First, for many portable applications, the energy consumption determines how long a rechargeable battery lasts. A wireless transceiver with better energy efficiency therefore operates for a longer time on a given battery. Second, lithium, which is the essential material for the production of rechargeable batteries, is a limited resource on earth. Therefore, a portable wireless system with better energy efficiency extends the lifetime of the rechargeable battery, which consequently reduces the demand for lithium extraction. In the remainder of the dissertation, the design of a high frequency, high data-rate wireless transmitter is presented that employs several techniques to enhance the energy efficiency.

### 1.2 Dissertation Outline

The rest of the dissertation is organized as follows. In Chapter II, several prototypes enabling a microLED-based optical interposer as a viable solution for reducing the energy dissipation of interconnects are presented. More specifically, the development of a 4 Gbps single-ended optical receiver with an integrated blue light photodetector is elaborated. The short absorption depth of blue light in silicon gives the photodetector a very low parasitic capacitance per unit area. Owing to the extremely low capacitance, along with a transimpedance-to-noise optimization of shunt feedback transimpedance amplifiers (TIA), a high transimpedance gain is obtained immediately in the TIA, resulting in very low power consumption in the receiver. Also, a digital offset cancellation (DOC) circuitry that can achieve a zero low-end cutoff frequency is proposed. The receiver, fabricated in a 130 nm low-cost CMOS SOI process, dissipates about 0.45 pJ of energy per bit at a 0.9 V supply and occupies  $0.01 \,\mathrm{mm^2}$  of silicon area. The chip achieves a sensitivity level as low as  $-15.3 \,\mathrm{dBm}$ at a bit error rate of  $10^{-12}$ . In addition, the design of a 32-element fully differential optical receiver prototype that incorporates test buffers and can achieve a 64 Gbps data-rate is discussed in detail. Moreover, a 128-element fully differential optical receiver prototype that employs AIB buffers is presented. Lastly, we compare the parallel optical communication link to a similar serial optical communication link and show why the parallel link outperforms the serial link in terms of energy efficiency and signal-to-noise ratio.
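As a quick sanity check on the receiver figures quoted above, the sketch below converts the sensitivity from dBm to watts and derives the DC power implied by the energy per bit at 4 Gbps. The helper names are hypothetical, and the derived power is an inference from the quoted numbers rather than a value stated in the text.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts: P = 1 mW * 10^(dBm / 10)."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


def power_from_energy_per_bit_w(energy_pj_per_bit, data_rate_gbps):
    """DC power implied by an energy-per-bit figure at a given data-rate."""
    return (energy_pj_per_bit * 1e-12) * (data_rate_gbps * 1e9)


if __name__ == "__main__":
    sens_w = dbm_to_watts(-15.3)
    p_dc = power_from_energy_per_bit_w(0.45, 4.0)
    print(f"sensitivity: {sens_w * 1e6:.1f} uW of optical power")
    print(f"implied receiver power: {p_dc * 1e3:.2f} mW")
    print(f"per-element rate of the 32-element array: {64 / 32:.0f} Gbps")
```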

In Chapter III, we take advantage of the transimpedance-to-noise optimization approach to develop a resistive shunt-feedback TIA for a pulsed Lidar receiver. We analytically and quantitatively demonstrate that, although the well-known capacitive matching yields the minimum noise, the transimpedance-to-noise is optimized when the front-end FET's input capacitance equals a smaller fraction of the photodiode capacitance. This optimization offers an enhancement in the transimpedance and a noise performance very close to the theoretical minimum noise of the TIA. In addition, the transimpedance-to-noise optimization approach results in a small front-end FET size, which enables a further reduction in power consumption and area. Moreover, this approach allows fewer stages to be used in the receiver chain, which makes a high PSRR feasible and obviates the need for an offset cancellation circuitry. Building on this approach, a fully differential analog front-end including a resistive shunt-feedback TIA and a post amplifier (PA) for time-of-flight (ToF) Lidar receivers is designed and implemented, achieving  $94 \, dB\Omega$  transimpedance gain, 71 nA input-referred rms noise current, a -3 dB bandwidth of 340 MHz, and a power supply rejection ratio (PSRR) of more than 87 dB in a  $0.11 \,\mu m$  CMOS process. The associated DC power consumption is  $19.4 \,\mathrm{mW}$  with a  $V_{DD}$  of  $1.8 \,\mathrm{V}$ . Moreover, a push-pull buffer with 1 V output swing is integrated for driving 50 $\Omega$  loads, such as off-chip time discriminators. The buffer additionally amplifies the signal with a gain of  $5 \, dB$  while consuming an extra 20.9 mW of DC power. The whole chip (excluding pads) occupies  $210 \,\mu \text{m} \times 110 \,\mu \text{m}$  in area.
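To put the quoted figures in linear units, the sketch below converts the transimpedance gain from dBΩ to ohms and derives two secondary quantities: the supply current implied by the DC power, and the rms output noise obtained by multiplying the input-referred noise by the gain. The last step assumes the mid-band gain applies to the whole noise bandwidth, which is a simplification; the function names are hypothetical.

```python
def dbohm_to_ohms(gain_dbohm):
    """Convert transimpedance gain from dB-ohm to ohms: Z = 10^(dB / 20)."""
    return 10.0 ** (gain_dbohm / 20.0)


def supply_current_a(dc_power_w, vdd_v):
    """Supply current implied by a DC power figure at a given supply voltage."""
    return dc_power_w / vdd_v


if __name__ == "__main__":
    z_t = dbohm_to_ohms(94.0)
    i_dd = supply_current_a(19.4e-3, 1.8)
    v_noise = 71e-9 * z_t  # simplification: flat mid-band gain assumed
    print(f"transimpedance: {z_t / 1e3:.1f} kOhm")
    print(f"supply current: {i_dd * 1e3:.2f} mA")
    print(f"approx. output noise: {v_noise * 1e3:.2f} mV rms")
```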

In Chapter IV, a new TIA topology, called push-pull, is introduced. The push-pull structure halves the input resistance of a common gate TIA with a slight increase in the power consumption, potentially doubling the bandwidth without sacrificing the noise performance. A regulated cascode TIA incorporating a push-pull structure rather than a common gate is then developed. At a supply voltage of 1.2 V, the chip draws 1.13 mA, dissipating 1.36 mW of DC power including the output test buffer.
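The link between input resistance and bandwidth can be sketched with a single-pole model. This is a minimal illustration under stated assumptions, not the analysis of Chapter IV: the input pole is assumed to dominate, the common gate input resistance is approximated as $1/g_m$, and the values of $g_m$ and the input capacitance are hypothetical.

```python
import math


def input_pole_hz(r_in_ohm, c_in_f):
    """-3 dB frequency of a single RC pole: f = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_in_ohm * c_in_f)


if __name__ == "__main__":
    gm_s = 10e-3     # assumed transconductance, 10 mS
    c_in = 100e-15   # assumed total input capacitance, 100 fF

    r_cg = 1.0 / gm_s   # common gate input resistance ~ 1/gm (approximation)
    r_pp = r_cg / 2.0   # push-pull halves the input resistance (per the text)

    f_cg = input_pole_hz(r_cg, c_in)
    f_pp = input_pole_hz(r_pp, c_in)
    print(f"common gate: {f_cg / 1e9:.1f} GHz, push-pull: {f_pp / 1e9:.1f} GHz")

    # Power check using the figures quoted in the text: 1.2 V x 1.13 mA
    print(f"DC power: {1.2 * 1.13:.2f} mW")
```

In practice other poles in the amplifier limit how much of this doubling is realized, which is why the text says "potentially".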

In Chapter V, the design of a CMOS sensor that can precisely measure the parasitic capacitance of photodetectors is presented. The photodetector parasitic capacitance is an important parameter that largely determines the bandwidth and noise of the optical receivers used in communication and sensing. Measuring it precisely therefore allows photodetector structures and TIAs to be optimized accurately. This technique also enables on-chip calibration of the TIA bandwidth. The sensor can measure the capacitance of both common anode and common cathode photodetector configurations.

In Chapter VI, a low cost, compact, and systematic method for generating a short pulse and efficiently delivering it to low impedance loads is proposed. The load can be a resistive on-chip or off-chip load, an antenna, or another stage such as a power combiner. The technique is inspired by time domain reflectometry from a short-circuited termination and is fully compatible with CMOS technology. To show the feasibility of the idea, two chips are fabricated and tested in a low cost  $0.11 \,\mu\text{m}$  CMOS process with  $f_T \approx 80\,\text{GHz}$. The full widths at half maximum (FWHM) of the pulses are 6.8 ps and 8.8 ps, and the amplitudes over a 50  $\Omega$  load are 0.62 V and 1 V. Each chip occupies 1160 × 540  $\mu$ m<sup>2</sup> in area.
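The idea of forming a pulse by reflection from a short-circuited termination can be illustrated with an idealized model: a step launched into a lossless line returns inverted (reflection coefficient $-1$) after one round trip and cancels the incident step, leaving a pulse whose width equals the round-trip delay. The sketch below is a textbook idealization, not the circuit of Chapter VI. The effective permittivity is an assumption, and treating the $\ell = 150\,\mu\text{m}$ quoted in the list of figures as the line length is also an assumption.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_delay_s(line_length_m, eps_eff=4.0):
    """Round-trip delay 2*l/v with v = c / sqrt(eps_eff); eps_eff is assumed."""
    velocity = C0 / math.sqrt(eps_eff)
    return 2.0 * line_length_m / velocity


def ideal_pulse_v(t_s, line_length_m, eps_eff=4.0, step_v=1.0):
    """Voltage at the line input for an ideal step launched at t = 0.

    The incident step is cancelled by the inverted reflection (gamma = -1)
    that arrives after one round trip, producing a rectangular pulse.
    """
    incident = step_v if t_s >= 0.0 else 0.0
    reflected = -step_v if t_s >= round_trip_delay_s(line_length_m, eps_eff) else 0.0
    return incident + reflected


if __name__ == "__main__":
    length = 150e-6  # assumed line length
    print(f"ideal pulse width: {round_trip_delay_s(length) * 1e12:.2f} ps")
    for t_ps in (-1.0, 1.0, 3.0):
        print(f"t = {t_ps:+.0f} ps -> {ideal_pulse_v(t_ps * 1e-12, length):.1f} V")
```

A real pulse is wider and lower than this ideal rectangle because the injected edge has a finite rise time and the line and load are lossy.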

In Chapter VII, we present a fully integrated, energy efficient wireless transmitter (TX) at 220 GHz. The transmitter employs an on-off keying (OOK) scheme for the modulation of the baseband data. The measurements demonstrate a 20 Gbps data-rate with 63 mW of DC power consumption including the CMOS buffer, resulting in an energy efficiency of 3.15 pJ/bit. The chip is fabricated in a 55 nm SiGe BiCMOS process and occupies  $0.38 \text{ mm}^2$  in area.
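On-off keying and the energy-per-bit figure can both be illustrated in a few lines. The sketch below gates a sampled carrier with the data bits and recomputes the quoted efficiency from the quoted power and data-rate. The sampling parameters are arbitrary illustrative choices and the function names are hypothetical; this is a baseband-level toy model, not the transmitter described in Chapter VII.

```python
import math


def ook_modulate(bits, samples_per_bit=16, cycles_per_bit=4):
    """Return carrier samples gated by the data: carrier on for 1, off for 0."""
    samples = []
    for bit in bits:
        for n in range(samples_per_bit):
            phase = 2.0 * math.pi * cycles_per_bit * n / samples_per_bit
            samples.append(bit * math.cos(phase))
    return samples


def energy_per_bit_pj(dc_power_w, data_rate_bps):
    """Energy efficiency in pJ/bit: DC power divided by data-rate."""
    return dc_power_w / data_rate_bps * 1e12


if __name__ == "__main__":
    waveform = ook_modulate([1, 0, 1, 1])
    print(f"{len(waveform)} samples, peak = {max(waveform):.1f}")
    print(f"energy efficiency: {energy_per_bit_pj(63e-3, 20e9):.2f} pJ/bit")
```

Because the carrier is off during '0' bits, no carrier power needs to be radiated for them, which is one reason OOK suits energy efficient links.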

## CHAPTER II

## MicroLED-based Optical Links for Ultra Dense and Energy Efficient Interchip Communication

### 2.1 Introduction

Increasing chip-to-chip communication throughput is necessary to meet the steadily rising demand for high-performance processing in artificial intelligence, cloud computing, and virtual/augmented reality. Due to the restricted number of chip package pins and the difficulties associated with off-chip routing, high-throughput chip-to-chip communications typically require the integration of very high speed Serializers/De-Serializers (SerDes) to mux/de-mux the data onto a small number of electrical I/O lines. This results in additional latency as well as increased energy and silicon area consumption. Besides, as shown in Fig. 1.5, the communication reach with a reasonable energy efficiency is quite limited using the available electrical interconnect technologies. Therefore, developing energy efficient and compact parallel interconnects for chip-to-chip communication is highly desirable.

Optical interconnects have great potential to ameliorate the chip-to-chip communication bottleneck while enhancing energy efficiency and compactness, as discussed in Section 1.1. Fig. 1.6 depicted a viable solution, potentially providing energy efficient long reach chip-to-chip links. GaN microLEDs, which have been developed for lighting and display applications, can be used for chip-to-chip data communication applications [10, 11]. Recently, Cavity-Reinforced Optical Micro-Emitter (CROMOE) microLEDs have demonstrated several Gbps of data communication per lane with simple equalization [4,5,8]. These GaN microLEDs emit light whose wavelength falls within 400-450 nm. This wavelength enables on-chip photodetectors that have low capacitance per unit area [12], which is highly desirable in an optical receiver. Fig. 2.1 shows an array of CROMOEs fabricated and transferred to a CMOS die. Fig. 2.2 illustrates a conceptual diagram of a bidirectional parallel optical link for chip-to-chip communication using blue light. In this architecture, the light coming out of the microLED array can be carried using an imaging fiber at very high density [13]. A receiver array with on-chip photodetectors, laid out with the same dimensions as the microLED array, converts the optical signal to an electrical signal. Fig. 2.3 depicts a detailed implementation of the full duplex parallel link for chip-to-chip communication using microLEDs.

Figure 2.1: (a) A 16x16 array of  $4 \,\mu$m diameter emitters, with  $30 \,\mu$m spacing on a sapphire substrate seen through a 1 mm plastic imaging fiber, (b) a SEM photo of the microLEDs transferred to a CMOS die. The contact pad on the side is the cathode drive electrode, which is connected in a subsequent lithography step [4,5].

While many optical receivers have been reported in the past, their main focus has



Figure 2.2: The conceptual diagram for a full duplex parallel optical chip-to-chip communication link. In this architecture, the electrical interfaces (EI) send the data to the optical transmitters (TX) to modulate the microLEDs. The optical light is carried over an imaging fiber and is detected by the photodiodes on the receiver (RX) side. The RX amplifies the detector signal and sends it to the EI. This communication is also performed in the reverse direction at the same time.



Figure 2.3: A realization of the full duplex parallel optical link for the chip-to-chip communication [5].

been on developing receivers in different technologies for serial communication, either at the 850 nm wavelength emitted by vertical-cavity surface-emitting lasers (VCSEL) [14–16] or at  $1.3/1.55 \,\mu$m with non-VCSEL sources [17–21]. At 850 nm wavelength and beyond, the absorption depth of the light in silicon is long (>10  $\mu$m), which is poorly matched to the sub-micron depletion widths available in CMOS compatible photodetectors. This results in a steep efficiency-speed trade-off for on-chip photodetectors [12, 22].
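The mismatch between absorption depth and depletion width can be quantified with a simple Beer-Lambert estimate of the fraction of light absorbed inside the depletion region. This is a minimal sketch under stated assumptions: the depletion region is taken to start at the surface, reflection and diffusion-collected carriers are ignored, the 0.5 µm depletion width is hypothetical, and the 0.4 µm absorption depth for blue light is an assumed approximate value rather than a number from this dissertation.

```python
import math


def absorbed_fraction(depletion_width_m, absorption_depth_m):
    """Fraction of incident light absorbed within the depletion width.

    Beer-Lambert law: 1 - exp(-W / L_abs), with the depletion region
    assumed to begin at the silicon surface.
    """
    return 1.0 - math.exp(-depletion_width_m / absorption_depth_m)


if __name__ == "__main__":
    width = 0.5e-6  # assumed sub-micron depletion width
    cases = {
        "850 nm (L_abs = 10 um, lower bound from the text)": 10e-6,
        "blue (L_abs = 0.4 um, assumed)": 0.4e-6,
    }
    for label, depth in cases.items():
        print(f"{label}: {absorbed_fraction(width, depth) * 100:.1f}% absorbed")
```

Under these assumptions only a few percent of 850 nm light is absorbed where carriers are collected quickly, while most blue light is, which is the efficiency-speed trade-off described above.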

By contrast, in this dissertation, we first present the design and test of a 4 Gbps single-ended optical receiver chip suited for parallel chip-to-chip communications using short wavelength visible light in Section 2.2. While this compact chip demonstrates very low power consumption with a high sensitivity, it is not suitable for large arrays where through-silicon via (TSV) technology is not available. In large arrays, the on-chip supply routing inductance becomes relatively large. Therefore, in high speed single-ended signaling, the voltage bounce at the supply and ground pins of the array elements due to the routing inductance can significantly degrade the signal-to-noise ratio (SNR), demanding TSV technology to provide low inductance routing. To prevent the SNR degradation without using TSV technology, the receiver channel can be made fully differential throughout the chain. As a result, the voltage bounce appears as common mode noise and is highly attenuated. In this regard, we present the design of a fully differential 32-element parallel optical receiver prototype, achieving a 64 Gbps data throughput, in Section 2.3. This chip incorporates  $50\,\Omega$  drivers for test and measurement. In addition, a 128-element parallel optical receiver with AIB drivers is presented in Section 2.4. Finally, in Section 2.5, the energy efficiency and signal-to-noise ratio of a parallel optical link are compared with those of a serial one. We analytically and quantitatively demonstrate the extent to which a parallel link can improve the energy efficiency and signal-to-noise ratio.
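The supply bounce argument can be made concrete with the inductor relation $v = L\,\mathrm{d}i/\mathrm{d}t$ and a common mode rejection figure. The sketch below is illustrative only: the routing inductance, current step, edge time, and 40 dB rejection are all assumed values, and the function names are hypothetical.

```python
def supply_bounce_v(inductance_h, delta_i_a, delta_t_s):
    """Voltage bounce across a routing inductance: v = L * (di / dt)."""
    return inductance_h * delta_i_a / delta_t_s


def residual_after_rejection_v(bounce_v, rejection_db):
    """Bounce remaining on the signal after common mode rejection (in dB)."""
    return bounce_v / (10.0 ** (rejection_db / 20.0))


if __name__ == "__main__":
    # Assumed values: 1 nH routing inductance, 1 mA current step in 50 ps
    bounce = supply_bounce_v(1e-9, 1e-3, 50e-12)
    residual = residual_after_rejection_v(bounce, rejection_db=40.0)
    print(f"single-ended bounce: {bounce * 1e3:.1f} mV")
    print(f"after 40 dB rejection: {residual * 1e3:.2f} mV")
```

In a single-ended chain the full bounce adds to the signal, whereas a fully differential chain sees only the residual, which is the motivation given above for the differential design.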

# 2.2 A 4 Gbps Single-ended Optical Receiver with an Integrated Blue Light Photodetector for Parallel Optical Communication

The receiver is the most demanding part of the parallel optical link to design, and it must address several challenges. First, we have to develop an on-chip blue photodetector that is compatible with CMOS processes, along with very compact receiver circuitry, to maintain the high density of the transmitter. Second, the receiver noise has to be minimized to detect the very low optical power coming from a microLED at a typical bit error rate of 10<sup>-12</sup>. Third, the receiver DC power consumption should be low so that the optical interface outperforms its electrical counterpart. Fourth, many parallel interfaces require a low-end cutoff frequency that extends down to DC, which obviates the need for coding and further improves energy efficiency.

We deal with the aforementioned challenges in the rest of this section and provide new solutions. The top level schematic of the chip is discussed in Subsection 2.2.1. In Subsection 2.2.2, we review the design of the on-chip blue photodetector in a CMOS SOI process. A transimpedance-to-noise optimization approach is presented in Subsection 2.2.3 that increases the transimpedance and decreases the DC power consumption of the transimpedance amplifier (TIA) along with the area of the frontend FET, without compromising the noise performance. In addition, the design of post and limiting amplifiers are discussed in Subsection 2.2.4. In Subsection 2.2.5, the proposed digital offset cancellation is elaborated. In Subsection 2.2.6, we present the measurement results of a prototype fabricated in a 130 nm CMOS SOI process.

#### 2.2.1 Top-Level Schematic of the Design

Fig. 2.4 shows the top level schematic of the optical receiver. An on-chip photodetector converts the incoming optical signal to current. A transimpedance amplifier (TIA) converts the detected signal to voltage. To increase the signal level and reach a rail-to-rail voltage swing, the TIA is followed by two post amplifiers (PA1 and PA2) and four identical limiting amplifiers (LA1). Since we need an output buffer to drive test equipment, the amplifier chain needs to be tapered using additional limiting amplifiers (LA2 and LA3). Unlike the majority of optical receivers, which must deal with a large dynamic range, we receive a relatively constant and known optical



Figure 2.4: The top level schematic of the proposed 4 Gbps fully integrated optical receiver with a digital offset cancellation (DOC).

power in the chip-to-chip link. Also, since the microLED transmitter runs at a higher supply voltage than the receiver, it is desirable, for the energy efficiency of the link, to reduce the optical power of the transmitter and instead enhance the sensitivity of the receiver. Therefore, to increase the signal-to-noise ratio, the DC content of the incoming signal and the offset of the amplifiers are compensated by the digital offset cancellation (DOC) at the output of PA2, rather than at the input of the TIA.

#### 2.2.2 Integrated Blue Light Photodetector

It is highly desirable to integrate photodetectors in the CMOS process to reduce the packaging cost of the receivers. This also increases density for arrays and removes the parasitic capacitance imposed by an off-chip detector. On-chip photodetectors that are compatible with a CMOS process usually stack multiple n+, undoped, and p+ layers vertically [23,24]. Although this approach provides a wide depletion region that improves the responsivity ($\mathfrak{R}$) of the detector, it introduces a high junction capacitance. This not only degrades the receiver sensitivity but increases the frontend power consumption as well. Alternatively, we can configure the photodetector implants in such a way that the electric field generated by reverse biasing the diode is mainly in the lateral (x-axis) direction. As a result, a low junction capacitance is introduced by the diode and most of the blue light energy is used


Figure 2.5: The cross section of the blue light photodetector in the CMOS SOI process.

because of its shallow penetration depth [25]. Fig. 2.5 depicts a cross section of the photodetector in the CMOS SOI process. Several photodetectors with different spacings are fabricated, and the one that presents the lowest capacitance and highest responsivity is picked. Measurement results show a junction capacitance of about 10 fF for our custom diode and $\Re \approx 0.17 \text{ A/W}$.

#### 2.2.3 Transimpedance-to-Noise Optimization of the Shunt Feedback TIA

To amplify the photodetector current, resistive shunt feedback transimpedance amplifiers (TIAs) are widely used due to less dependency between bandwidth, input-referred noise current, and voltage headroom consumption [26]. Most of these works incorporate an inverter as the core amplifier of the shunt feedback TIA [20,21,27–31]. However, an inverter-based TIA usually not only fails to reject power supply noise, but may also amplify it. This may not be acceptable from stability and signal integrity points of view for a receiver where high sensitivity is required.

To take advantage of shunt feedback TIAs while improving on the poor power supply noise rejection of the inverter-based TIAs, we can configure the feedback around a common-source amplifier with a resistive load, as shown in Fig. 2.6a. In this schematic, which also depicts the parasitic capacitors, the photodetector is modeled with a current source ($I_{in}$) and a shunt capacitor ($C_{PD}$) that includes the diode junction capacitance and



Figure 2.6: (a) Shunt feedback TIA and the first post amplifier (PA1) (b) the simplified model of the TIA.

the routing metal capacitance of the diode to the ground.

To optimize the noise and transimpedance of the TIA we can simplify the circuit to that shown in Fig. 2.6b in which  $A(s) = A_0/(1 + s/\omega_A)$  where  $A_0 = g_m R_D$ and  $\omega_A = 1/R_D C_D$  are the DC gain and dominant pole of the open loop amplifier, respectively. In our analysis, the input capacitance of the PA1 is included in  $C_D$ . Also,  $g_m$  denotes the transconductance of  $M_1$ .

By writing KCLs at the input node, we can find the transimpedance transfer function of  $V_{out}/I_{in}$  as

$$Z_T(s) = -R_T \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{2.1}$$

where

$$R_T = \frac{A_0}{1 + A_0} R_F \tag{2.2}$$

$$\omega_n^2 = \frac{(A_0 + 1)\,\omega_A}{R_F C_T}\tag{2.3}$$

$$\zeta = \frac{1}{2} \frac{R_F (C_T + A_0 C_{gd})\omega_A + 1}{\sqrt{(A_0 + 1)\,\omega_A R_F C_T}} \tag{2.4}$$

 $C_T = C_{PD} + C_I$, and $C_I = C_{gs} + C_{gd}$. For $A_0 \gg 1$, it is evident that $R_T \approx R_F$. To obtain $\zeta = \sqrt{2}/2 \approx 0.71$, which gives the fastest time-domain response with negligible overshoot, and assuming $A_0 C_{gd} \ll C_T$, $\omega_A$ should equal

$$\omega_A \approx \frac{2A_0}{R_F C_T}.\tag{2.5}$$

When $\zeta = \sqrt{2}/2$, $\omega_{-3dB} = \omega_n$, so using (2.3) and (2.5) we can find the -3dB bandwidth as

$$BW_{-3\mathrm{dB}} \approx \frac{1}{2\pi} \frac{\sqrt{2}A_0}{R_F C_T}. \tag{2.6}$$

Moreover, using (2.3) the feedback resistor can be calculated as

$$R_F = \frac{GBW}{2\pi C_T BW_{-3dB}^2} \tag{2.7}$$

where  $GBW = A_0 \omega_A / 2\pi = g_m / 2\pi C_D$  is the gain-bandwidth product of the core amplifier without feedback.
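As a rough numerical illustration of (2.5) to (2.7), the following sketch sizes the feedback resistor and the open-loop amplifier for a target bandwidth. The function name and the example values of $C_T$ and GBW are hypothetical and chosen only for illustration; they are not the fabricated design values. The relation $A_0 = GBW/(\sqrt{2}\,BW_{-3\mathrm{dB}})$ used in the code is obtained here by combining (2.6) and (2.7).

```python
import math

def size_shunt_feedback_tia(bw_3db_hz, c_t_farad, gbw_hz):
    """Return (R_F, A_0, omega_A, zeta) for a shunt-feedback TIA with zeta ~ 0.71.

    R_F follows (2.7); A_0 follows from combining (2.6) and (2.7);
    omega_A follows (2.5); zeta is re-evaluated from (2.4) with C_gd neglected.
    """
    r_f = gbw_hz / (2.0 * math.pi * c_t_farad * bw_3db_hz ** 2)   # (2.7)
    a0 = gbw_hz / (math.sqrt(2.0) * bw_3db_hz)                    # (2.6) with (2.7)
    omega_a = 2.0 * a0 / (r_f * c_t_farad)                        # (2.5)
    zeta = 0.5 * (r_f * c_t_farad * omega_a + 1.0) / math.sqrt(
        (a0 + 1.0) * omega_a * r_f * c_t_farad)                   # (2.4), C_gd = 0
    return r_f, a0, omega_a, zeta

# Hypothetical design point: 2.8 GHz bandwidth, C_T = 13 fF, GBW = 30 GHz.
r_f, a0, omega_a, zeta = size_shunt_feedback_tia(2.8e9, 13e-15, 30e9)
print(f"R_F = {r_f / 1e3:.1f} kOhm, A_0 = {a0:.2f}, "
      f"f_A = {omega_a / (2.0 * math.pi) / 1e9:.2f} GHz, zeta = {zeta:.3f}")
```

The recomputed $\zeta$ is slightly above $\sqrt{2}/2$ because (2.5) is itself an approximation for $A_0 \gg 1$.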

Fig. 2.7 illustrates the small signal model of the TIA along with the noise contributors. To find the output-referred noise voltage, we can perform KCLs at the input and output nodes. In this figure,  $i_{n,in}$  is the input-referred noise current that is calculated by dividing the output-referred noise voltage by the transimpedance transfer



Figure 2.7: The shunt feedback TIA small signal model in presence of the noise.

function. This results in the input-referred noise current to be

$$i_{n,in} = \frac{(i_{n,R_D} - i_{n,d} + i_{n,R_F})(R_F C_T s + 1)}{R_F C_{gd} s - R_F g_m + 1} - i_{n,R_F} \tag{2.8}$$

where  $i_{n,R_D}^2 = 4kT/R_D$ ,  $i_{n,d}^2 = 4kT\Gamma g_m$ , and  $i_{n,R_F}^2 = 4kT/R_F$  are noise current spectral density of the drain resistor, FET channel, and feedback resistor per 1-Hz bandwidth, respectively. Also, k, T, and  $\Gamma$ , are Boltzmann's constant, absolute temperature, and Ogawa's excess noise factor [32], respectively.  $\Gamma$  can be considered constant for optical receivers [33]. Assuming  $f \ll g_m/(2\pi C_{gd}), g_m R_F \gg 1$ , and  $1/(g_m R_D) \ll \Gamma \ll g_m R_F$ , we can find the input referred noise current spectral density as

$$i_{n,in}^2(f) \approx \frac{4kT}{R_F} + 4kT\Gamma \frac{(2\pi C_T)^2}{g_m} f^2. \tag{2.9}$$

By writing the above equation as $i_{n,in}^2(f) = C_n + C_{n2}f^2$ and defining noise bandwidths for the white noise and violet noise terms [34–37], we can write

$$I_{n,in}^{2} = \int_{0}^{BW_{n}} C_{n}\, df + \int_{0}^{BW_{n2}} C_{n2} f^{2}\, df \tag{2.10}$$



Figure 2.8: Normalized noise bandwidths to the -3dB bandwidth as a function of damping ratio ( $\zeta$ ).  $BW_n$  and  $BW_{n2}$  are minimum for  $\zeta = 0.71$  and  $\zeta = 0.44$ , respectively.

$$I_{n,in}^2 = C_n BW_n + C_{n2} \frac{BW_{n2}^3}{3} \tag{2.11}$$

$$BW_n(\zeta) = \frac{1}{Z_T(0)^2} \int_0^\infty |Z_T(j2\pi f)|^2\, df = \frac{\pi}{4\zeta\delta} \cdot BW_{-3\mathrm{dB}} \tag{2.12}$$

$$BW_{n2}^{3}(\zeta) = \frac{3}{Z_{T}(0)^{2}} \int_{0}^{\infty} |Z_{T}(j2\pi f)|^{2} f^{2}\, df = \frac{3\pi}{4\zeta\delta^{3}} \cdot BW_{-3\mathrm{dB}}^{3} \tag{2.13}$$

where

$$\delta = \sqrt{\sqrt{(1 - 2\zeta^2)^2 + 1} + (1 - 2\zeta^2)}. \tag{2.14}$$
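The closed forms in (2.12) to (2.14) are easy to evaluate numerically. The sketch below, with function names of our own choosing, computes the two noise bandwidths normalized to $BW_{-3\mathrm{dB}}$ and scans a grid of damping ratios for their minima. The grid range and step are arbitrary assumptions.

```python
import math

def delta(zeta):
    """delta as defined in (2.14)."""
    u = 1.0 - 2.0 * zeta ** 2
    return math.sqrt(math.sqrt(u * u + 1.0) + u)

def bw_n_norm(zeta):
    """BW_n / BW_-3dB from (2.12)."""
    return math.pi / (4.0 * zeta * delta(zeta))

def bw_n2_norm(zeta):
    """BW_n2 / BW_-3dB, i.e. the cube root of the normalized (2.13)."""
    return (3.0 * math.pi / (4.0 * zeta * delta(zeta) ** 3)) ** (1.0 / 3.0)

# Scan zeta from 0.2 to 1.2 in steps of 0.001 and locate the two minima.
zetas = [0.2 + 0.001 * i for i in range(1001)]
zeta_min_n = min(zetas, key=bw_n_norm)
zeta_min_n2 = min(zetas, key=bw_n2_norm)

butterworth = math.sqrt(0.5)
print(f"zeta = 0.707: BW_n/BW = {bw_n_norm(butterworth):.3f}, "
      f"BW_n2^3/BW^3 = {bw_n2_norm(butterworth) ** 3:.3f}")
print(f"BW_n minimum near zeta = {zeta_min_n:.3f}, "
      f"BW_n2 minimum near zeta = {zeta_min_n2:.3f}")
```

On this grid the minima fall close to $\zeta = 0.71$ and $\zeta = 0.44$, and the values at $\zeta = \sqrt{2}/2$ are approximately 1.1 and 3.3.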

Fig. 2.8 demonstrates  $BW_n(\zeta)$  and  $BW_{n2}(\zeta)$  normalized to the  $BW_{-3dB}$  as a function of  $\zeta$ . It can be seen that the noise-bandwidths are minimized for  $0.44 < \zeta < 0.71$ . Therefore, we design the TIA in a way that  $\zeta$  becomes 0.71 as it not only reduces total



Figure 2.9: Quantitative illustration of the (a) input-referred rms noise current and (b) transimpedance-to-input-referred rms noise current (normalized to  $10^3$ ) as a function of  $C_I/C_{PD}$  for various values of GBW for  $BW_{-3dB} = 2.8 \text{ GHz}$ ,  $C_{PD} = 10 \text{ fF}$ ,  $f_T = 40 \text{ GHz}$ , and  $\Gamma = 3$ .

integrated noise but also offers 41% bandwidth enhancement compared to a first-order response. With this $\zeta$, we obtain $BW_n \approx 1.1BW_{-3dB}$ and $BW_{n2}^3 \approx 3.3BW_{-3dB}^3$. To optimize the noise, we substitute $R_F$ from (2.7) into (2.9). Also, we can write $g_m \approx 2\pi C_I f_T$, where $f_T$ denotes the transition frequency of $M_1$, which primarily depends on $V_{GS}$ and the technology. With all this taken into account, we can find the square of the input-referred rms noise current as

$$I_{n,in}^{2} = 8.8\pi kTBW_{-3dB}^{3} \left(\frac{C_{T}}{GBW} + \Gamma \frac{C_{T}^{2}}{f_{T}C_{I}}\right). \tag{2.15}$$

Fig. 2.9a illustrates $I_{n,in}$ as a function of $C_I/C_{PD}$ for different values of GBW. It is evident from this figure that the reduction in $I_{n,in}$ practically stops for GBW > 30 GHz. On the other hand, $g_m$, which mainly determines the GBW, does not increase proportionally with the current density due to velocity saturation. Therefore, to reduce the DC power consumption without compromising the noise performance, it is desirable to run at a lower GBW. However, as evident from (2.7), a higher GBW allows using a larger $R_F$, which can alleviate the noise contribution of the subsequent amplifiers. Moreover, we can observe that $I_{n,in}$ is minimized for $C_I \approx 0.8C_{PD}$, close to the theoretical capacitive matching rule ($C_I = C_{PD}$) [36, 38]. Nevertheless, the initially steep reduction of $I_{n,in}$ flattens considerably as $C_I$ increases from low values toward $C_I \approx 0.8 C_{PD}$. A higher $C_I$ also mandates a lower $R_F$ to meet the bandwidth requirement. This in turn reduces the transimpedance of the TIA and increases the noise contribution of the subsequent stages. Therefore, by defining $R_F/I_{n,in}$ as a figure of merit, we observe that it is maximized for $C_I \approx 0.3 C_{PD}$. This optimum design point not only keeps $I_{n,in}$ very close to the theoretical minimum noise of the TIA, but also boosts the transimpedance and reduces the subsequent-stage noise contribution, which was not accounted for in our analysis. In addition, because of a smaller FET, less area is occupied, and overall, less DC power is consumed in the receiver chain



Figure 2.10: Post amplifier 2 (PA2) circuit implementation.

as the amplification gain level requirement is relaxed in the subsequent stages.
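The trade-off discussed above can be reproduced with a few lines of code. The sketch below evaluates (2.15) and (2.7) on a grid of $C_I/C_{PD}$ using the parameter values quoted in the caption of Fig. 2.9. The temperature of 300 K, the choice of GBW = 30 GHz, the grid, and all function names are assumptions made only for this illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def rms_noise_current(ratio, gbw_hz, bw_hz=2.8e9, c_pd=10e-15,
                      f_t_hz=40e9, gamma=3.0, temp_k=300.0):
    """Input-referred rms noise current from (2.15) for C_I = ratio * C_PD."""
    c_i = ratio * c_pd
    c_t = c_pd + c_i
    i_sq = 8.8 * math.pi * K_BOLTZMANN * temp_k * bw_hz ** 3 * (
        c_t / gbw_hz + gamma * c_t ** 2 / (f_t_hz * c_i))
    return math.sqrt(i_sq)

def feedback_resistor(ratio, gbw_hz, bw_hz=2.8e9, c_pd=10e-15):
    """R_F from (2.7) with C_T = C_PD * (1 + ratio)."""
    c_t = c_pd * (1.0 + ratio)
    return gbw_hz / (2.0 * math.pi * c_t * bw_hz ** 2)

gbw = 30e9                                          # assumed GBW for this example
ratios = [0.05 + 0.005 * i for i in range(391)]     # C_I/C_PD from 0.05 to 2.0

# Ratio that minimizes the noise alone, and ratio that maximizes R_F / I_n,in.
best_noise = min(ratios, key=lambda r: rms_noise_current(r, gbw))
best_fom = max(ratios,
               key=lambda r: feedback_resistor(r, gbw) / rms_noise_current(r, gbw))

print(f"minimum noise at C_I/C_PD = {best_noise:.2f}")
print(f"maximum R_F/I_n,in at C_I/C_PD = {best_fom:.2f}")
print(f"I_n,in at C_I = 0.3 C_PD: {rms_noise_current(0.3, gbw) * 1e9:.0f} nA")
```

Under these assumptions the noise minimum lies near $C_I \approx 0.8C_{PD}$ and the figure of merit peaks near $C_I \approx 0.3C_{PD}$, with an rms noise current on the order of 110 nA at the latter point.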

#### 2.2.4 Post and Limiting Amplifiers

Fig. 2.10 depicts the circuit implementation of the post amplifier 2 (PA2) in which the output of the DOC circuitry controls part of the tail current of the differential amplifier. In this way, DOC changes the common-mode without disturbing the signal. The DC level of the other input terminal of the differential amplifier is set to  $V_{DD}/2$ using  $R_1$  and  $R_2$ . With the help of  $C_1$  and  $C_2$ , this node is also set to ac ground. This capacitor configuration allows the gate of  $M_2$  to track the supply bounce.

To further amplify the signal and reach a rail-to-rail swing, four identical limiting amplifiers (LA1) are used. We note that, unlike differential amplifiers, where the only design consideration is bandwidth, in a single-ended limiting amplifier we should also set the amplifier trip point to $V_{DD}/2$ to prevent signal clipping. Usually in inverter-based amplifiers, this is achieved by a proper ratio of the PMOS to the NMOS. However, an inverter-based amplifier suffers from poor power supply rejection. Fig. 2.11a shows the proposed single-ended amplifier, which provides independent control of common mode and bandwidth. In this circuit, $M_2$ and $M_4$ are sized so that the trip point passes through the mid-supply voltage. Therefore, as shown in Fig. 2.11b, once the LA1 is cascaded, after four stages of amplification, the trip



Figure 2.11: (a) Circuit implementation of the single-ended LA1, and (b) the DC transfer characteristics of the cascaded LA1 at the corresponding outputs at  $V_{DD} = 0.9$  V.



Figure 2.12: Circuit implementation of the LA2 and two cascaded stages of LA3.

point still passes through $V_{DD}/2$. Also, by tuning $R_1$, we can adjust the bandwidth without noticeably impacting the DC transfer characteristics. Fig. 2.12 illustrates the additional limiting amplifiers (LA2 and LA3) that are tapered to be able to drive a $50\,\Omega$ output buffer.

#### 2.2.5 Digital Offset Cancellation

To prevent channel saturation due to the DC content of the input signal and other non-idealities in the receiver chain, an offset cancellation circuit is typically used to sense the output DC and correct it at the input. These configurations use low-pass RC filter networks to remove the high-frequency content of the signal. However, the RC filters become prohibitively large when very low low-end cutoff frequencies are required. Fig. 2.13 depicts a simplified model of such analog offset cancellation loops, where $A_0$ represents the DC gain of several cascaded amplifiers. The RC filter is a low-pass filter that only senses the low-frequency content of the signal. The result is then compared with the desired DC level, $V_{CM}$. Fig. 2.14 shows the Bode plot of the loop, in which $\omega_p$ is equal to $1/RC$. Therefore, we can easily find the frequency of the pole, which corresponds to the low-end cutoff frequency of the amplifier, to be

$$f_L = \frac{(1+A_0)}{2\pi RC}. \tag{2.16}$$

For a receiver with  $A_0$  of about 2000, with  $R = 1 \text{ M}\Omega$ , a capacitor as large as  $1.3 \,\mu\text{F}$  is required to achieve a low-end cut off frequency as low as 240 Hz. This of course is not feasible in any silicon technology due to the extremely large area of the capacitor.
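The capacitor value quoted above follows directly from rearranging (2.16). A minimal sketch, with a function name of our own choosing:

```python
import math

def analog_loop_capacitor(a0, r_ohm, f_low_hz):
    """Capacitance required by (2.16) for a target low-end cutoff frequency."""
    return (1.0 + a0) / (2.0 * math.pi * r_ohm * f_low_hz)

# Values from the text: A_0 of about 2000, R = 1 MOhm, f_L = 240 Hz.
c_required = analog_loop_capacitor(a0=2000.0, r_ohm=1e6, f_low_hz=240.0)
print(f"C = {c_required * 1e6:.2f} uF")
```

The result is about 1.3 µF, which illustrates why the loop gain $A_0$ makes the analog approach impractical on chip: the required capacitance grows in proportion to $1 + A_0$.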

To provide low-end cutoff frequency tunability down to DC in the offset cancellation circuitry with a small area, a digital integrator can be incorporated within the loop. Fig. 2.15 shows the digital offset cancellation (DOC) top level schematic. The



Figure 2.13: Simplified model of an analog offset cancellation loop.



Figure 2.14: Bode plot of the analog offset cancellation loop.



Figure 2.15: The circuit implementation of the digital offset cancellation (DOC).



Figure 2.16: The comparator schematic based on the StrongARM configuration.

DOC circuit, excluding $R_1C_1$ and $R_3C_3$, sits outside the receiver circuit on the same silicon chip to increase the compactness of the optical receiver element. This does not adversely affect the functionality of the DOC, as it runs at a low frequency. The $R_1C_1$ network is used in the receiver to immediately attenuate the high-frequency content of the signal and prevent any noise coupling. The sensed signal, after the two cascaded low-pass filters, is compared with the desired common-mode level, $V_{CM}$, using a comparator based on a StrongARM configuration [39]. The circuit implementation of the comparator, which operates at a clock frequency of a few MHz, is shown in Fig. 2.16.

The digital gain control (DGC) block sign-extends the 1-bit output of the comparator for two's complement digital integration. Also, the 1-bit output of the comparator is weighted by a 3-bit "gain" control to adjust the offset cancellation loop gain and, subsequently, the low-end cutoff frequency. Fig. 2.17a shows the DGC output when $V_i < V_{CM}$ for the lowest and highest gain settings. Similarly, Fig. 2.17b represents the DGC output when $V_i > V_{CM}$ for the lowest and highest gain settings. The digital integrator drives the DC error between the comparator input terminals to zero due to its high DC gain. In the beginning, when the 1-bit "freeze" control is zero, the 5 most significant bits of the integrator are fed to a 5-bit digital-to-analog converter



Figure 2.17: (a) The DGC output bits when the comparator senses a voltage above  $V_{CM}$  and (b) the DGC output bits when the comparator senses a voltage below  $V_{CM}$ .

(DAC) to provide an analog voltage at its output. Once the loop locks, we can enable the freeze bit to disconnect the integrator and instead use the lock condition bits to feed the DAC.

It is important to provide a high level of monotonicity in the DAC design for the correct operation of the DOC. Fig. 2.18 illustrates the schematic of the proposed DAC. In this design, the two MSBs of "DACIN" make the coarse selector pass a high and a low voltage, $V_{H,i}$ and $V_{L,i}$, respectively, in which *i* indicates the decimal value of the two MSBs of DACIN. To isolate the left resistor ladder (coarse) from the right one (fine), we can use current mirrors on the top and bottom of the fine resistor ladder [40]. Alternatively, to reduce mismatch and power consumption, we can insert overlap resistors, $R_2$, that compensate for the loading effect of the right resistor ladder. In other words, we must ensure that

$$(R_1 + 2R_2) \parallel 8R_3 = R_2 + R_1. \tag{2.17}$$



Figure 2.18: The circuit schematic of the R-R 5-bit digital to analog converter (DAC).

With  $R_1 = 21.1 \text{ k}\Omega$ ,  $R_2 = 1.9 \text{ k}\Omega$ , and  $R_3 = 39.1 \text{ k}\Omega$ , the above condition is met while the DAC draws only sub- $\mu$ A current through REFL and REFH ports. Fig. 2.19a shows the schematic used to generate the REFL and REFH voltages for the DAC. Also, Fig. 2.19b illustrates the op amps used in the reference generator circuitry as the buffer. During the changes of the input codes of the DAC, transient glitches might occur which can be filtered out using the low pass filter of  $R_3C_3$ , as shown in Fig. 2.15.
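As a quick check of how closely the quoted resistor values satisfy (2.17), the sketch below compares the loaded coarse segment with its target value. The helper names are our own, and the factor of 8 is the number of fine-ladder resistors appearing in (2.17).

```python
def parallel(a_ohm, b_ohm):
    """Equivalent resistance of two resistors in parallel."""
    return a_ohm * b_ohm / (a_ohm + b_ohm)

def overlap_mismatch(r1, r2, r3, n_fine=8):
    """Relative error of (2.17): (R1 + 2*R2) || (n_fine*R3) against R1 + R2."""
    loaded = parallel(r1 + 2.0 * r2, n_fine * r3)
    target = r1 + r2
    return (loaded - target) / target

# Resistor values quoted in the text.
err = overlap_mismatch(r1=21.1e3, r2=1.9e3, r3=39.1e3)
print(f"relative mismatch of (2.17): {err * 100:.2f} %")
```

With these values the two sides of (2.17) agree to within about 0.3%.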

In order to find the low-end cutoff frequency with the proposed loop, we should find the gain of each block in the loop. Fig. 2.20 shows a simplified model for the offset cancellation loop, where $A_0$ accounts for the DC gain from PA2 all the way to LA2. In this model, the transfer function of the two cascaded low-pass filters is

$$H_{LP1}(s) = \frac{1}{R_1 R_2 C_1 C_2 s^2 + (R_1 C_1 + (R_1 + R_2) C_2) s + 1}. \tag{2.18}$$



Figure 2.19: (a) Circuit implementation of the REFH and REFL voltage generator for the DAC (b) the op amps used in the reference generator.

We can find the gain of the comparator by obtaining the gain of an N-bit ADC for N=1. Fig. 2.21a shows the voltage transfer characteristic of the N-bit ADC for an input signal with a swing of $V_{SW}$. It is evident that the gain of the N-bit ADC is equal to $(2^N - 1)/V_{SW}$. Therefore, for $N = 1$, the comparator gain turns out to be

$$G_{COMP} = \frac{1}{V_{SW}}. \tag{2.19}$$

The three DGC bits provide 8 different gain settings. Depending on the DGC bits, the 1-bit comparator output is scaled by

$$G_{DGC} = 2^0, \ldots, 2^{14}. \tag{2.20}$$

Assuming a gain of 1 for the DGC, the integrator has to count  $2^{15}$  clock cycles in



Figure 2.20: The digital offset cancellation loop model.



Figure 2.21: (a) The gain of an N-bit ADC and (b) extrapolating the gain of the N-bit ADC to 1-bit comparator.

order to make a 1-LSB change in the following DAC. The integrator gain is therefore

$$G_{INT} = 2^{-15} \frac{Z^{-1}}{1 - Z^{-1}}. \tag{2.21}$$

It is obvious that for the gain of a 5-bit DAC we can write

$$G_{DAC} = 1\,\text{LSB} = \frac{V_{REFH} - V_{REFL}}{2^5}. \tag{2.22}$$

Also, the DAC's output filter transfer function is equal to

$$H_{LP2}(s) = \frac{1}{R_3 C_3 s + 1}. \tag{2.23}$$

With all these taken into account, we can find the offset cancellation loop gain as

$$G_{\text{loop}}(s) = A_0 \times H_{LP1}(s) \times G_{COMP} \times G_{DGC} \times G_{INT} \times G_{DAC} \times H_{LP2}(s) \tag{2.24}$$

and the low frequency signal transfer function as

$$\frac{V_{out}}{V_{in}} = \frac{A_0}{1 + G_{loop}}. \tag{2.25}$$

Fig. 2.22 shows the loop gain as a function of frequency for DGC=000...111 with  $A_0 \approx 66 \text{ dB}, R_1 = 0.9 \text{ M}\Omega, C_1 = 0.3 \text{ pF}, R_2 = 0.1 \text{ M}\Omega, C_2 = 0.18 \text{ pF}, V_{SW} = 0.8 \text{ V},$ and 1 LSB = 1 mV. For this simulation we have used  $Z = e^{T_{CK}s}$  where  $T_{CK} = 1/f_{CK}$ is the period of a 20 MHz clock. It is evident for a wide range of DGC levels, the loop is stable. Fig. 2.23 demonstrates the low frequency signal transfer function. It can be seen that for the lowest gain of the DGC, that is DGC=000, the low-end cut off frequency is equal to  $f_L \approx 240 \text{ Hz}$  which could not be achieved using analog filters.
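The low-end cutoff frequency can be estimated numerically from the loop model. The sketch below evaluates the loop gain of (2.24) with the parameter values listed above and searches for the frequency at which its magnitude crosses unity, which we take as an estimate of $f_L$ (this holds when the loop behaves as a pure integrator around that frequency). Several choices here are assumptions: the comparator gain is taken as $(2^N - 1)/V_{SW}$ with $N = 1$, the lowest DGC setting is taken as a gain of 1, and since $R_3C_3$ is not given in the text, the hypothetical parameter `tau3` defaults to zero so that the DAC output filter is left out.

```python
import cmath
import math

def loop_gain(f_hz, g_dgc=1.0, a0_db=66.0, v_sw=0.8, lsb=1e-3, f_ck=20e6,
              r1=0.9e6, c1=0.3e-12, r2=0.1e6, c2=0.18e-12, tau3=0.0):
    """Complex loop gain of (2.24) at frequency f_hz."""
    s = 2j * math.pi * f_hz
    a0 = 10.0 ** (a0_db / 20.0)
    h_lp1 = 1.0 / (r1 * r2 * c1 * c2 * s ** 2
                   + (r1 * c1 + (r1 + r2) * c2) * s + 1.0)   # (2.18)
    g_comp = 1.0 / v_sw                        # (2^N - 1)/V_SW with N = 1
    z_inv = cmath.exp(-s / f_ck)               # Z^-1 with Z = exp(T_CK * s)
    g_int = 2.0 ** -15 * z_inv / (1.0 - z_inv)  # (2.21)
    h_lp2 = 1.0 / (tau3 * s + 1.0)             # DAC output filter, tau3 = R3*C3
    return a0 * h_lp1 * g_comp * g_dgc * g_int * lsb * h_lp2

def low_end_cutoff(g_dgc=1.0, f_lo=1.0, f_hi=1e7):
    """Frequency where |G_loop| = 1, by bisection on a logarithmic scale.

    Assumes |G_loop| > 1 at f_lo and < 1 at f_hi (half the clock frequency).
    """
    for _ in range(60):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(loop_gain(f_mid, g_dgc)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)

f_l = low_end_cutoff(g_dgc=1.0)
print(f"estimated f_L for the lowest DGC gain: {f_l:.0f} Hz")
```

Under these assumptions the estimate lands at roughly 240 Hz. Higher `g_dgc` values shift the crossover upward in proportion, which is the tuning mechanism described for the DGC block.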



Figure 2.22: The offset cancellation loop gain.



Figure 2.23: The low frequency signal transfer function.

#### 2.2.6 Experimental Results

The fully integrated optical receiver was fabricated in a low-cost $0.13\,\mu$m CMOS SOI process. Fig. 2.24 shows the die photograph of the designed optical receiver. It consumes about 1.8 mW of DC power at a 0.9 V supply and occupies 0.01 mm<sup>2</sup> of silicon area. A test chip was also fabricated with the same receiver but with a voltage-to-current converter network (V2IC), as shown in Fig. 2.25, in place of the PD. This "electrical receiver" is used to characterize the TIA and compare the optical and electrical noise results. To characterize the electrical receiver, an open-drain test buffer, as shown in Fig. 2.26a, is connected to the output of PA1. An external bias tee provides the proper bias with a wide bandwidth. We also connect a differential output buffer, as shown in Fig. 2.26b, to the optical receiver to capture the eye diagram and test the bit error rate simultaneously.

Fig. 2.27 demonstrates the frequency response of the measured transimpedance converted from scattering parameters measurement using HP8753E vector network analyzer. The DC transimpedance gain and -3dB bandwidth (BW) are measured to be 76.4 dB $\Omega$  and 1.9 GHz, respectively. The test buffer degrades the TIA gain and BW in both simulation and measurement. Also, the measured gain and BW



Figure 2.24: Die photograph of the proposed fully integrated optical receiver. The DOC circuitry that occupies  $0.0075 \text{ mm}^2$  is not shown here because of top metal fillers that cover the underneath blocks.



Figure 2.25: The V2IC network used in the electrical receiver to electrically characterize the TIA.

are slightly lower than in simulation, which is mainly due to probe loss and process variations. Fig. 2.28 shows the rms noise voltage measured at the output of the test buffer using a wideband oscilloscope. The total rms noise voltage is 0.57 mV. The measured noise of the oscilloscope itself is 0.25 mV. Since these are two uncorrelated noise quantities, the TIA output rms noise voltage equals $\sqrt{0.57^2 - 0.25^2} \approx 0.51$ mV. As a result, the input-referred rms noise current of the TIA ($I_{n,in}$) turns out to be 0.51 mV/6.6 k$\Omega = 77$ nA. However, the noise is measured under a lowered BW of 1.9 GHz, whereas the TIA is designed to have a BW of 2.8 GHz. Scaling to the design bandwidth therefore gives $I_{n,in} \approx 77$ nA × 2.8 GHz/1.9 GHz = 113 nA. This value is highly consistent with the noise analysis presented in Subsection 2.2.3 and Fig. 2.9a.
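The arithmetic behind these noise figures can be retraced as follows. The helper names are our own; the numbers are the measured values quoted in this subsection, and the final step applies the same linear bandwidth scaling used in the text.

```python
import math

def dbohm_to_ohm(z_dbohm):
    """Convert a transimpedance in dB-ohm to ohm."""
    return 10.0 ** (z_dbohm / 20.0)

def remove_uncorrelated(total_rms, instrument_rms):
    """Remove an uncorrelated noise source by subtracting in the power domain."""
    return math.sqrt(total_rms ** 2 - instrument_rms ** 2)

r_t = dbohm_to_ohm(76.4)                          # about 6.6 kOhm
v_n_tia = remove_uncorrelated(0.57e-3, 0.25e-3)   # about 0.51 mV
i_n_meas = v_n_tia / r_t                          # about 77 nA at 1.9 GHz
i_n_scaled = i_n_meas * 2.8e9 / 1.9e9             # scaled to the 2.8 GHz design BW

print(f"R_T = {r_t / 1e3:.2f} kOhm, V_n = {v_n_tia * 1e3:.2f} mV")
print(f"I_n,in = {i_n_meas * 1e9:.0f} nA measured, "
      f"{i_n_scaled * 1e9:.0f} nA scaled")
```

Carrying unrounded intermediate values gives a scaled result about 1 nA above the 113 nA quoted in the text, which uses the rounded 77 nA.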

Fig. 2.29 shows the test setup used to optically characterize the receiver. In this setup, a PRBS-7 with a bit rate of 4 Gbps is generated using Anritsu Pulse Pattern Generator (MP1763B) to feed the microLED driver. The light coming out of the blue



Figure 2.26: (a) Test buffer used to linearly characterize the electrical receiver and (b) output buffer used to drive external equipment for the optical receiver eye and bit error rate measurements.



Figure 2.27: The frequency response out of the test buffer for the electrical receiver.



Figure 2.28: Output rms noise voltage measurement using a wideband oscilloscope.

CROME is focused on the PD using the optical lenses and the mirror. The electrical output of the chip is simultaneously measured by Keysight sampling oscilloscope for the eye diagram and Anritsu (MP1764A) Error Detector for the bit error rate test. In a different optical setup, the PD responsivity is measured to be 0.17 A/W.

Fig. 2.30 illustrates the captured eye diagram when the photodetector is illuminated by the CROME blue light and generates a peak-to-peak current of about 5 $\mu$A. Fig. 2.31 and Fig. 2.32 demonstrate the bit error rate of the receiver at 4 Gbps as a function of the unit interval (UI) and optical power, respectively. As can be seen, the sensitivity is about -15 dBm at a bit error rate of $10^{-12}$. This corresponds to about $5\,\mu$A of PD current. However, with the 113 nA input-referred rms noise current estimated above, the receiver should achieve a sensitivity of $14 \times 113 \text{ nA} \approx 1.6\,\mu$A. The degradation in the sensitivity is therefore attributed to the extra shot noise of the PD.
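The two currents compared above can be recomputed with a short sketch. The function names are our own. Two interpretations are assumed here and are not stated in the text: the quoted optical power is multiplied directly by the responsivity, and the factor of 14 is read as $2Q$ with $Q \approx 7$, the usual Gaussian-noise value for a bit error rate of $10^{-12}$.

```python
def dbm_to_watt(p_dbm):
    """Convert optical power from dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)

def photocurrent(p_dbm, responsivity_a_per_w):
    """Photodetector current for a given optical power and responsivity."""
    return dbm_to_watt(p_dbm) * responsivity_a_per_w

def noise_limited_swing(i_n_rms, q_factor=7.0):
    """Peak-to-peak current needed for a given Q factor: 2 * Q * I_n."""
    return 2.0 * q_factor * i_n_rms

i_pd = photocurrent(-15.0, 0.17)       # measured sensitivity and responsivity
i_min = noise_limited_swing(113e-9)    # expected from amplifier noise alone

print(f"PD current at -15 dBm: {i_pd * 1e6:.2f} uA")
print(f"noise-limited swing:   {i_min * 1e6:.2f} uA")
```

The gap between the two values, roughly a factor of three, is the sensitivity degradation discussed above.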

Table 2.1 summarizes and compares our prototype measurement results with prior art that employs on-chip photodetectors and communicates at similar bit rates. To the best of our knowledge, the proposed receiver is the only 420 nm-wavelength fully integrated receiver that runs at several Gbps. The energy efficiency of the chip is more than an order of magnitude better than that of similar works thanks to the extremely low capacitance of the PD along with the transimpedance-to-noise optimization that was



Figure 2.29: The setup used to test the fully integrated optical receiver.



Figure 2.30: The eye diagram measurement at 4 Gbps with photodetector current of about 5  $\mu {\rm A}.$ 



Figure 2.31: The bathtub curve of the optical receiver running at 4 Gbps.



Figure 2.32: The bit error rate measurement of the fully integrated optical receiver at 4 Gbps as a function of the input optical power.

adopted in this work. Also, our chip achieves better sensitivity than all the listed works except [24, 41–43]. However, [41] and [42] incorporate single-photon avalanche diodes (SPADs) that have significant gain. Nevertheless, these highly energy- and area-inefficient receivers operate in a non-linear mode and can hardly run beyond 1 Gbps with an acceptable bit error rate. Moreover, better sensitivity is achieved in [24] owing to a high-gain APD. However, that receiver runs at a lower speed and consumes significant power. Likewise, the PIN-diode-based work in [43] presents better sensitivity, but at a lower bit rate and with considerable power consumption.

| Comparison |
|------------|
| of         |
| Table      |
| 2.1:       |
| Table      |

| Core<br>Area<br>mm <sup>2</sup> )                                                | 0.27 65 nm CMOS | 1.31 $\begin{array}{c} 0.6\mu\mathrm{m}\\ \mathrm{BiCMOS}\end{array}$ | 0.24 65 nm CMOS  | - 130 nm<br>CMOS | $-$ 0.35 $\mu m$ HV CMOS | 0.0275 28 nm CMOS  | - 130 nm<br>CMOS | ).0252 40 nm CMOS    | $\begin{array}{c} 0.01 \\ \text{CMOS SOI} \end{array}$ |
|----------------------------------------------------------------------------------|-----------------|-----------------------------------------------------------------------|------------------|------------------|--------------------------|--------------------|------------------|----------------------|--------------------------------------------------------|
| Energy<br>Efficiency<br>(pJ/b) ((                                                | 16              | 80                                                                    | 11.5             | 800              | 72.6                     | 14.9               | 230              | 1 41.5               | 0.45                                                   |
| BER                                                                              | $10^{-12}$      | $10^{-9}$                                                             | $10^{-12}$       | $10^{-9}$        | $10^{-9}$                | $3 \times 10^{-8}$ | $2^{-3}$         | $3.3 \times 10^{-1}$ | $10^{-12}$                                             |
| Sensitivity<br>(dBm)                                                             | -3.8            | -19.6                                                                 | -3.2             | -31.7            | -31.8                    | 0.1                | -46.1            | 1.4                  | -15.3                                                  |
| Data<br>Rate<br>(Gbps)                                                           | 3.125           | 1.25                                                                  | 4                | 0.1              | 1                        | co                 | 0.5              | 1                    | 4                                                      |
| $\begin{array}{c} \operatorname{PD}\mathfrak{R} \\ (\mathrm{mA/W}) \end{array}$  | I               | 520                                                                   | I                | NA               | 30000                    | 2.2                | NA               | 1.4                  | 170                                                    |
| PD<br>Cap<br>(fF)                                                                | 14000           | I                                                                     | 14000            | NA               | 570                      | 600                | NA               | 450                  | 10                                                     |
| $\begin{array}{c} \mathrm{PD} \\ \mathrm{Area} \\ (\mu\mathrm{m}^2) \end{array}$ | 62500           | 132548                                                                | 62500            | 51471            | 31416                    | 540                | 1795600          | 270                  | 225                                                    |
| PD<br>Type                                                                       | Si PN           | NIA                                                                   | Si PN            | Si SPAD          | Si APD                   | Si<br>Schottky     | Si SPAD          | Si<br>Schottky       | Si PD                                                  |
| λ<br>(nm)                                                                        | 670             | 660                                                                   | 670              | 450              | 675                      | 1310               | 450              | 1310                 | 420                                                    |
| References                                                                       | JSSC [44]       | JLT [43]                                                              | TCAS-<br>II [45] | JSSC [42]        | JSSC [24]                | ESSCIRC<br>[46]    | ISSCC [41]       | JSSC [29]            | This work                                              |



Figure 2.33: The top level schematic of the optical array.

# 2.3 A 32 Elements 64 Gbps Data throughput Optical Receiver Array

Although the single-ended optical receiver presented in Section 2.2 achieves a high sensitivity with very low power consumption and a small area, it is not suitable for large arrays where TSV technology is not available to alleviate the routing inductance. In this section, we discuss the development of a fully differential optical receiver for parallel chip-to-chip communication. This prototype includes 32 elements and employs $50\,\Omega$ drivers for testing and characterization. In this chip, every block is designed differentially so that the ground bounce and supply bounce due to routing inductance do not degrade the SNR. In addition, to further suppress the supply bounce in the first few stages, where the signal swing is still small, an on-chip LDO is designed for each array element. Fig. 2.33 shows the top level schematic of the prototype excluding the drivers.

# 2.3.1 Transimpedance-to-Noise Optimization for Differential Shunt Feedback Matched TIA

Fig. 2.34a depicts the differential shunt feedback matched TIA (DSFM-TIA). Here, $C_B$ is used to provide a partial ac ground at node IN, unlike [47], where $C_B$ is intended to provide a solid ac ground. This arrangement provides the utmost symmetry and hence the maximum power supply rejection, which is critical for large arrays. Ignoring channel length modulation and other second order effects, we can show that $V_X = V_Y$ when the X and Y nodes are not connected to each other. As a result, no current flows between these two nodes, and so we can remove the wire connection between them. This simplifies the circuit to that shown in Fig. 2.34b, in which $A(s) = A_0/(1 + s/\omega_A)$, where $A_0 = g_m R_D$ and $\omega_A = 1/(R_D C_D)$ are the DC gain and dominant pole of the open loop differential amplifier, respectively. Also, $g_m$ denotes the transconductance of $M_{1,2}$. By performing KCL at nodes IN and IP of the circuit shown in Fig. 2.34b and assuming $C_{PD} = C_B$, we can find the transimpedance transfer function $V_{out}/I_{in}$ as

$$Z_T(s) = R_T \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{2.26}$$

where

$$R_T = \frac{A_0}{1 + A_0} R_F \tag{2.27}$$

$$\omega_n^2 = \frac{(A_0 + 1)\,\omega_A}{R_F C_T} \tag{2.28}$$

$$\zeta = \frac{1}{2} \frac{R_F (C_T + A_0 C_{gd}) \omega_A + 1}{\sqrt{(A_0 + 1) \omega_A R_F C_T}} \tag{2.29}$$

$C_T = C_{PD} + C_I$, and $C_I = C_{gs} + C_{gd}$. For $A_0 \gg 1$, it is evident that $R_T \approx R_F$. Interestingly, this simplified model yields the same parameters obtained in Subsection 2.2.3. Therefore, for $\zeta = \sqrt{2}/2$ we can use (2.6) and (2.7) to calculate the -3dB bandwidth and transimpedance limit, respectively.

Figure 2.34: (a) Circuit implementation of the proposed differential shunt feedback matched transimpedance amplifier (DSFM-TIA) and (b) the simplified model of the TIA.

Figure 2.35: The DSFM-TIA in the presence of noise. Here we calculate the output noise voltage due to only half of the circuit. Due to the symmetry, the other half of the circuit will contribute the same amount of noise. Since the noise sources are uncorrelated, the output noise voltage will be $\sqrt{2}$ times the half circuit noise.

In addition, if we plug (2.7) into (2.6), we can find the required DC gain as

$$A_0 = \frac{\sqrt{2}}{2} \frac{GBW}{BW_{-3dB}}. \tag{2.30}$$

The above equation reveals that the required $A_0$ is set by the ratio $GBW/BW_{-3dB}$: for applications where a higher $BW_{-3dB}$ and a lower GBW are desired, a lower $A_0$ is demanded. This is a useful insight into which configuration is best for the core amplifier.
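As a quick numerical companion to (2.27)-(2.30), the following minimal sketch evaluates the second-order model parameters and the required DC gain. The function names and the component values in the example call are illustrative assumptions and are not taken from the fabricated design.

```python
import math

def dsfm_tia_params(a0, omega_a, r_f, c_pd, c_gs, c_gd):
    """Evaluate (2.27)-(2.29) for the simplified DSFM-TIA model."""
    c_i = c_gs + c_gd                      # input capacitance of the core amplifier
    c_t = c_pd + c_i                       # total input capacitance
    r_t = a0 / (1.0 + a0) * r_f            # (2.27) DC transimpedance
    omega_n = math.sqrt((a0 + 1.0) * omega_a / (r_f * c_t))   # (2.28)
    zeta = 0.5 * (r_f * (c_t + a0 * c_gd) * omega_a + 1.0) / math.sqrt(
        (a0 + 1.0) * omega_a * r_f * c_t)  # (2.29) damping factor
    return r_t, omega_n, zeta

def required_dc_gain(gbw_hz, bw_3db_hz):
    """Required open-loop DC gain from (2.30), valid for zeta = sqrt(2)/2."""
    return math.sqrt(2.0) / 2.0 * gbw_hz / bw_3db_hz

# Hypothetical example values (assumptions for illustration only).
a0 = required_dc_gain(gbw_hz=10e9, bw_3db_hz=1.4e9)
r_t, omega_n, zeta = dsfm_tia_params(
    a0=5.0, omega_a=2 * math.pi * 2e9, r_f=10e3,
    c_pd=10e-15, c_gs=2e-15, c_gd=0.5e-15)
print(f"A0 = {a0:.2f}, R_T = {r_t:.0f} ohm, zeta = {zeta:.2f}")
```

With a gain-bandwidth product of 10 GHz and a 1.4 GHz bandwidth, the required gain comes out close to 5, which suggests that a single low-gain stage is sufficient for the core amplifier.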

Fig. 2.35 illustrates the small signal model of the DSFM-TIA along with the noise contributors. Here we calculate the output noise voltage due to only half of the circuit. Owing to the symmetry, the other half of the circuit will contribute the same amount of noise power. To find the output-referred noise voltage, we can perform KCL at nodes ON, OP, IP, IN, and X. In this figure, $i_{n,in}$ is the input-referred noise current that is calculated by dividing the output-referred noise voltage by the transimpedance transfer function. This results in an input-referred noise current of

$$i_{n,in} = \sqrt{2} \left( \frac{(i_{n,R_D} - i_{n,d} + i_{n,R_F})(R_F C_T s + 1)}{R_F C_{gd} s - R_F g_m + 1} - i_{n,R_F} \right). \tag{2.31}$$

where again $i_{n,R_D}^2 = 4kT/R_D$, $i_{n,d}^2 = 4kT\Gamma g_m$, and $i_{n,R_F}^2 = 4kT/R_F$ are the noise current spectral densities of the drain resistor, FET channel, and feedback resistor per 1-Hz bandwidth, respectively. Assuming $f \ll g_m/(2\pi C_{gd})$, $g_m R_F \gg 1$, and $1/(g_m R_D) \ll \Gamma \ll g_m R_F$, we can find the input-referred noise current spectral density as

$$i_{n,in}^2(f) \approx 2\left(\frac{4kT}{R_F} + 4kT\Gamma\frac{(2\pi C_T)^2}{g_m}f^2\right) \tag{2.32}$$

which includes the same terms derived in (2.9). Therefore, defining the same noise bandwidths presented in Subsection 2.2.3 and knowing that $\zeta \approx 0.7$ yields the lowest noise bandwidths, we can write the input-referred rms noise current as

$$I_{n,in}^{2} = 17.6\pi kTBW_{-3dB}^{3} \left(\frac{C_{T}}{GBW} + \Gamma \frac{C_{T}^{2}}{f_{T}C_{I}}\right). \tag{2.33}$$

Fig. 2.36a illustrates $I_{n,in}$ as a function of $C_I$ for different values of GBW. It is evident that the improvement in $I_{n,in}$ for GBW > 10 GHz is small. Moreover, similar to the discussion in Subsection 2.2.3, we can observe that $0.22C_{PD} < C_I < 0.28C_{PD}$ maximizes the figure of merit $R_F/I_{n,in}$, which improves the transimpedance without sacrificing noise performance. As a result, we design the DSFM-TIA with GBW < 10 GHz and $C_I \approx 0.25C_{PD}$ to minimize input-referred rms noise current, DC power consumption, and silicon area. In addition, (2.30) reveals that for a bit rate of 2 Gbps, which requires a bandwidth of 1.4 GHz, $A_0$ has to be approximately 5. This implies that multi-stage or telescopic configurations that boost the DC gain, which are widely used in the literature, should be avoided for the optimal design.
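The trade-off described above can be reproduced with a short sweep over $C_I$. The sketch below evaluates (2.33) with the parameter values listed in the caption of Fig. 2.36. Two things are assumptions on my part: the temperature of 300 K, and the form of the transimpedance limit $R_F = GBW/(2\pi C_T BW_{-3dB}^2)$, which is the usual shunt-feedback limit and is taken here as a stand-in for (2.7), since that equation is not reproduced in this section.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def rms_noise_current(c_i, c_pd, gbw, bw, f_t, gamma, temp=300.0):
    """Input-referred rms noise current from (2.33), in amperes (temp is assumed)."""
    c_t = c_pd + c_i
    i_sq = 17.6 * math.pi * K_B * temp * bw**3 * (
        c_t / gbw + gamma * c_t**2 / (f_t * c_i))
    return math.sqrt(i_sq)

def transimpedance_limit(c_i, c_pd, gbw, bw):
    """ASSUMED transimpedance limit: R_F = GBW / (2*pi*C_T*BW^2)."""
    return gbw / (2.0 * math.pi * (c_pd + c_i) * bw**2)

def best_ci_ratio(c_pd, gbw, bw, f_t, gamma, steps=2000):
    """Grid-search the C_I/C_PD ratio that maximizes R_F / I_n,in."""
    best_ratio, best_fom = None, -1.0
    for k in range(1, steps + 1):
        ratio = k / steps                  # sweep 0 < C_I/C_PD <= 1
        c_i = ratio * c_pd
        fom = transimpedance_limit(c_i, c_pd, gbw, bw) / rms_noise_current(
            c_i, c_pd, gbw, bw, f_t, gamma)
        if fom > best_fom:
            best_ratio, best_fom = ratio, fom
    return best_ratio

# Parameter values from the caption of Fig. 2.36, with GBW = 10 GHz.
ratio = best_ci_ratio(c_pd=10e-15, gbw=10e9, bw=1.4e9, f_t=30e9, gamma=3.0)
print(f"C_I / C_PD maximizing R_F / I_n,in: {ratio:.3f}")
```

Under these assumptions the optimum ratio lands close to 0.27, inside the $0.22C_{PD} < C_I < 0.28C_{PD}$ window quoted in the text, and it moves toward the lower end of the window as GBW is reduced.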



Figure 2.36: Quantitative illustration of the (a) input-referred rms noise current and (b) transimpedance-to-input noise as a function of  $C_I$  for different values of GBW for  $BW_{-3dB} = 1.4 \text{ GHz}$ ,  $C_{PD} = 10 \text{ fF}$ ,  $f_T = 30 \text{ GHz}$ , and  $\Gamma = 3$ .



Figure 2.37: (a) The schematic of PA1 and  $G_m$  and (b) PA2 and subsequent five cascaded limiting amplifiers.

### 2.3.2 Differential Post Amplifier and Limiting Amplifiers

Fig. 2.37a shows the schematic of the first and second post amplifiers (PA1 and PA2) and the $G_m$ transistor pair that is steered by the digital offset cancellation output. Fig. 2.37b depicts the five stages of limiting amplifiers (LA) that follow the PA2 output. While the maximum swing can be achieved with fewer LAs, five stages are used, based on simulation at the worst corner and the highest temperature, to ensure the same level of swing.

# 2.3.3 Differential Digital Offset Cancellation

Because of the considerations discussed in Subsection 2.2.5, a digital offset cancellation (DOC) is used. However, unlike the single-ended DOC, in which the DC level of the receiver output is compared with a desired common mode level, here we compare the common mode level of each end of the differential receiver output. As a result, no reference voltage is required. Fig. 2.38 shows the digital offset cancellation top level schematic. In addition, a differential digital to analog converter (DDAC) is used to convert the digital integrator code to a proper analog level. Fig. 2.39 shows the DDAC configuration, in which two single-ended DACs are used but the reference voltage pins are swapped for one of them. We note that the absolute level of REFL does not matter as long as it provides a sufficient common mode voltage to drive the $G_m$ pair shown in Fig. 2.37a.

Figure 2.38: The differential digital offset cancellation circuitry.

Figure 2.39: The differential digital to analog converter (DDAC).

## 2.3.4 Experimental Results

The chip was fabricated in a $0.13\,\mu\text{m}$ CMOS SOI process. Fig. 2.40 shows the die photo of the fabricated chip. The 32 RX elements were placed in a rectangular arrangement to evenly distribute the routes and provide a proper cross section for plastic imaging fibers. Each array element, including the blue light photodetector, occupies $50\,\mu\text{m} \times 50\,\mu\text{m}$ in area. The offset cancellation circuitry is not placed in this area to increase the density of the array. Each array element is connected to a differential $50\,\Omega$ buffer to drive the test equipment. The buffer schematic is the same as that used in the single-ended receiver, which is shown in Fig. 2.26b.

Figure 2.40: Die photo of an $8 \times 8$ optical transceiver that consists of $4 \times 8$ microLED drivers and $4 \times 8$ receivers.

Fig. 2.41 shows the eye diagram of one end of the differential test buffer of one RX element when the photodetector is modulated with an optical power that produces $2\,\mu$A of current at 2 Gbps. With a responsivity of 0.17 A/W, the $2\,\mu$A current corresponds to an optical power of $11.76\,\mu$W or, equivalently, -19.3 dBm. The other end of the differential output buffer is connected to the bit error tester. Fig. 2.42 presents the bit error rate versus input optical power for two PRBS sequences. For this figure, an optical attenuator is used to lower the power and find the corresponding bit error rate. The bathtub curve is also displayed in Fig. 2.43. The receiver array element for these tests consumes about $0.5\,\text{pJ/bit}$, excluding the LDO power. The LDO adds about 0.3 pJ/bit to the energy consumption. However, with better packaging technologies, the LDO can be removed without degrading the performance, as all building blocks are differential and can highly attenuate the ground and supply bounce.

Figure 2.41: Measured output eye diagram at 2 Gbps with a photodetector current of $2\,\mu$A.

Figure 2.42: Bit error rate versus power at 2 Gbps.

Figure 2.43: Bit error rate versus unit interval at 2 Gbps.
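The conversions quoted for this measurement are easy to verify. The short sketch below recomputes the optical power from the photocurrent and responsivity given in the text, and turns the quoted energy per bit into a DC power per element. The helper name and variable names are my own; all numbers come from the paragraph above.

```python
import math

def optical_power(photocurrent_a, responsivity_a_per_w):
    """Return the optical power in watts and in dBm for a given photocurrent."""
    p_w = photocurrent_a / responsivity_a_per_w
    p_dbm = 10.0 * math.log10(p_w / 1e-3)
    return p_w, p_dbm

p_w, p_dbm = optical_power(photocurrent_a=2e-6, responsivity_a_per_w=0.17)

bit_rate = 2e9                                   # bits per second per element
energy_per_bit = 0.5e-12 + 0.3e-12               # receiver plus LDO, joules per bit
element_power_w = energy_per_bit * bit_rate      # DC power per array element
array_throughput_bps = 32 * bit_rate             # 32 elements in the prototype

print(f"{p_w * 1e6:.2f} uW = {p_dbm:.1f} dBm")
print(f"{element_power_w * 1e3:.1f} mW per element, "
      f"{array_throughput_bps / 1e9:.0f} Gbps aggregate")
```

The aggregate figure matches the 64 Gbps throughput named in the section title.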

# 2.4 A 128 Elements 256 Gbps Data throughput Optical Receiver Array

Fig. 2.44 illustrates another prototype fabricated in the $0.13\,\mu\text{m}$ CMOS SOI process. This chip consists of 128 RX and 128 TX elements. Each element's schematic is the same as shown in Fig. 2.33. In this prototype, each array element connects to a buffer specialized for the Advanced Interface Bus (AIB) [48]. The far-side specs for the single-ended AIB interface over a 10 mm chip-to-chip length, which introduces a 0.5 pF capacitance, include 0 V for logic level 0 and 0.7 V-0.9 V for logic level 1. Thus, the swing voltage should be greater than 0.7 V and the overshoot has to be less than 0.25 times the swing voltage. The eye width needs to be greater than 0.56 UI at a 2 Gbps data rate. While we can hook up one end of the differential receiver to a CMOS buffer, this would cause significant high frequency current spikes, subsequently resulting in considerable ground bounce. To alleviate this, the AIB buffer is designed as shown in Fig. 2.45. In this configuration, $R_S$ provides some degree of supply current regulation without noticeably lowering the signal swing. Also, with the PFET input devices, the buffer can swing all the way to the zero voltage level, meeting the AIB specs. Fig. 2.46 shows the eye diagram of the AIB buffer outputs running at 2 Gbps with a supply voltage of 1.1 V. The worst case scenario happens when 127 of the 128 channels operate coherently (aggressors) whereas the remaining channel (victim) runs incoherently with respect to the other 127 channels. The red eye diagram is the victim in the presence of more than 2 nH of routing inductance for ground and supply per channel.

Figure 2.44: The die photo of 128 RX and 128 TX elements with AIB buffers.

Figure 2.45: The AIB buffer schematic.



Figure 2.46: Simulation of the eye diagram of the victim (red) running at 2 Gbps.

### 2.5 Parallel vs Serial Optical Communication

An optical interface for sub meter chip-to-chip communication has the potential to improve the energy efficiency by removing the energy dissipation associated with the capacitance of the electrical interconnects. While it is obvious that a parallel link can remove the extra power dissipation introduced by SerDes electronics, no other clear immediate observations can be made on the energy efficiency of a parallel optical link versus a serial optical link. Therefore, it is desired to have an analytical comparison between the energy efficiency of these two links for the same data throughput with the same emitted optical power.

Fig. 2.47 shows a typical serial optical link in which the received optical signal is converted to a current $I_{in,S}$ by a photodiode whose parasitic capacitance is $C_{PD,S}$. To amplify the signal received by the photodiode and reach a sufficient swing level, $V_{SW}$, a transimpedance amplifier (TIA) followed by multiple limiting amplifiers (LA) is required. In contrast, Fig. 2.48 depicts a parallel optical transceiver link in which a total optical power of $N \times P_P^{op}$, identical to the serial optical power $P_S^{op}$, is generated by N smaller light emitters in the transmitter within the same area of $w \times w$. Likewise, at the receiver side, N smaller photodiodes are used to detect the parallel signals. With this approach, each small element of the parallel receiver will have a smaller photodiode whose surface area is equal to 1/N of the serial one. This corresponds to a parasitic capacitance that is N times smaller. On the other hand, each photodiode receives an optical power of $P_P^{op} = P_S^{op}/N$. Since the responsivity of photodiodes is independent of the area, the current generated by each photodiode will be $I_{in,P} = I_{in,S}/N$.

Figure 2.47: Serial communication.

Figure 2.48: Parallel communication.

Assuming the serial link operates at a bit rate of $R_b$, each smaller transceiver in the parallel link can operate at a lower bit rate, that is $R_b/N$, to produce a throughput equal to that of the serial link. Subsequently, since the TIA is usually designed to have a bandwidth equal to $0.7R_b$ to minimize noise and Inter Symbol Interference (ISI) [26], the bandwidth requirement for the parallel link's element will be 1/N of the serial one. Therefore, using (2.7) and (2.33) to calculate the feedback resistance ($R_F$) and input-referred rms noise current ($I_{n,in}$), respectively, and an optimum design point of $C_I \approx 0.25 C_{PD}$, which were all discussed in Subsection 2.3.1, the TIA in the parallel link can accommodate a feedback resistor $N^3$ times as large as the serial one with an $N^2$ times lower input-referred rms noise current, assuming the same power is consumed in both TIAs. Table 2.2 summarizes the photodiode and associated TIA specifications in both the serial and parallel links, normalized to the serial one.
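The scaling argument can be checked by propagating the 1/N factors through the two design equations. The sketch below is a minimal illustration in normalized units: it assumes that the transimpedance limit scales as $R_F \propto GBW/(C_T BW_{-3dB}^2)$ with equal GBW in both TIAs (my reading of (2.7), which is not reproduced here), and that $C_I$ stays at a fixed fraction of $C_{PD}$ so that every capacitance in (2.33) scales with $C_{PD}$. The function and key names are hypothetical.

```python
import math

def parallel_element_scaling(n):
    """Per-element attributes of an N-way parallel link, normalized to the serial link."""
    c_pd = 1.0 / n   # photodiode area, and hence capacitance, shrinks by N
    i_in = 1.0 / n   # same responsivity, 1/N of the optical power
    bw = 1.0 / n     # each element runs at R_b / N
    # ASSUMED transimpedance limit: R_F ~ GBW / (C_T * BW^2), with equal GBW
    r_f = 1.0 / (c_pd * bw**2)
    # (2.33) with C_I at a fixed fraction of C_PD: I_n^2 ~ BW^3 * C_PD
    i_n = math.sqrt(bw**3 * c_pd)
    return {
        "C_PD": c_pd, "I_in": i_in, "BW": bw, "R_F": r_f,
        "I_n": i_n, "SNR": i_in / i_n, "swing": i_in * r_f,
    }

row = parallel_element_scaling(4)
for name, value in row.items():
    print(f"{name:>5}: {value:g}")
```

For N = 4 this gives a feedback resistance of 64, a noise current of 1/16, a signal-to-noise ratio of 4, and a swing of 16, which are the $N^3$, $1/N^2$, $N$, and $N^2$ entries of Table 2.2.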

| Parameter                                    | Serial | Parallel |
|----------------------------------------------|--------|----------|
| TX Optical Power $(P^{op})$                  | 1      | 1/N      |
| Photodiode Capacitance $(C_{PD})$            | 1      | 1/N      |
| Photodiode Responsivity $(\mathcal{R}_{PD})$ | 1      | 1        |
| Photodiode Current $(I_{in})$                | 1      | 1/N      |
| -3dB Bandwidth $(BW)$                        | 1      | 1/N      |
| Feedback Resistance $(R_F)$                  | 1      | $N^3$    |
| Input-referred rms Noise $(I_{n,in})$        | 1      | $1/N^2$  |
| $I_{in}/I_{n,in}$                            | 1      | N        |
| Signal Swing out of TIA                      | 1      | $N^2$    |

Table 2.2: Summary of Photodiode and TIA Attributes for the Serial Link vs the Parallel Link's Elements, Normalized to the Serial Link.

In the serial link, besides the TIA, we should employ $m_S$ identical stages of LA to further amplify the signal and reach the proper signal swing of $V_{SW}$. In this case, the equivalent -3dB bandwidth of the $m_S$ limiting amplifiers will be $BW_{L,S}\sqrt{2^{1/m_S}-1}$ [26,49], where $BW_{L,S}$ is the -3dB bandwidth of each limiting amplifier in the serial link. We can also write $BW_{L,S} = GBW_{L,S}/A_L$, where $GBW_{L,S}$ and $A_L$ are the gain-bandwidth product and DC gain of each limiting amplifier, respectively. From the point of view of maximizing the gain-bandwidth product, $A_L$ has to be equal to $\sqrt{e} \approx 1.65$ [50]; however, to save DC power consumption, it is better to design for a higher gain, if the speed allows [51]. Since each parallel link's element runs at a bit rate of 1/N times the serial one, the equivalent -3dB bandwidth of the $m_P$ limiting amplifiers of the parallel link's element also has to be 1/N of the serial one. Thus, assuming the same DC gain for the limiting amplifier in either link, we can find the relationship between the gain-bandwidth products of these two links as

$$\frac{1}{N}GBW_{L,S}\sqrt{2^{1/m_S}-1} = GBW_{L,P}\sqrt{2^{1/m_P}-1}. \tag{2.34}$$

On the other hand, we must ensure that the signal swing at the end of the receiver is equal for the serial link and each parallel link element. According to Table 2.2, this mandates that

$$A_L^{m_S} = N^2 A_L^{m_P}. \tag{2.35}$$

This is because the signal current of each parallel link element's photodiode is $N$ times smaller, but it is amplified by the corresponding TIA, whose feedback resistance is $N^3$ times larger than that of its serial counterpart.

To reduce common mode noise such as substrate and supply noise, and also ground bounce caused by the non-zero inductance of bond wires or bumps, it is desired to use differential configurations for all of the amplifiers in the entire chain. In a differential configuration, the DC power consumption is equal to $V_{DD}I_{SS}$, where $V_{DD}$ and $I_{SS}$ are the DC power supply voltage level and tail current source level, respectively. On the other hand, using the long channel transistor model for the FETs in a CMOS process, we can write $I_{SS} = g_m^2/(\mu_n C_{ox} W/L)$, where $\mu_n$, $C_{ox}$, $W$, and $L$ are the mobility of the electrons, oxide capacitance, width, and length of the FET, respectively. Moreover, we can replace the transconductance with $g_m = 2\pi C_{gs} GBW$, where $C_{gs}$ is the input capacitance of the following stage [52]. Therefore, we can write the DC power consumed by the core amplifier for the TIA and limiting amplifier as

$$P = \xi G B W^2 \tag{2.36}$$

where  $\xi = V_{DD}(2\pi C_{gs})^2/(\mu_n C_{ox}W/L)$ . In this equation,  $\xi$  will be constant, if the dimensions of the FETs are kept the same for the TIA and limiting amplifiers in both serial and parallel link.
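For readers who want the intermediate step, the substitution that leads to (2.36) can be written out as follows. This is only a restatement of the expressions given above, with no additional modeling assumptions.

$$
\begin{aligned}
P &= V_{DD} I_{SS} = V_{DD}\,\frac{g_m^2}{\mu_n C_{ox} W/L} \\
  &= V_{DD}\,\frac{(2\pi C_{gs}\,GBW)^2}{\mu_n C_{ox} W/L} = \xi\,GBW^2 .
\end{aligned}
$$

Because the power grows with the square of GBW, an amplifier that needs only half the gain-bandwidth product consumes one quarter of the DC power for the same device dimensions.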

To compare the DC power consumption of serial and parallel optical receivers for the same data throughput, we can write

$$\begin{aligned}
P_{S} &= P_{TIA,S} + m_{S}P_{LA,S} \\
P_{P} &= N(P_{TIA,S} + m_{P}P_{LA,P})
\end{aligned} \tag{2.37}$$

Assuming the same dimension and DC power for the TIA and each stage, we can use (2.37) and (2.36) to find the ratio of the DC power consumption of the serial receiver to the parallel one, that is  $P_S/P_P$ , as a function of GBW. Eventually, we can also use (2.34) and (2.35) to simplify the links' power ratio to

$$\begin{aligned}
\frac{P_S}{P_P} &= N \frac{(1+m_S)(2^{1/m_P}-1)}{(1+m_P)(2^{1/m_S}-1)} \\
&= N \frac{(1+m_S)(2^{1/(m_S-2\ln N/\ln A_L)}-1)}{(1+(m_S-2\ln N/\ln A_L))(2^{1/m_S}-1)}.
\end{aligned} \tag{2.38}$$

Fig. 2.49 shows the number of limiting amplifier stages required in each parallel link receiver element as a function of N and $m_S$, assuming $A_L = 3$. Fewer stages are required in the parallel receiver element because of the higher signal swing right after the corresponding TIA. In these calculations, we have assumed that each parallel receiver element needs at least one LA stage. Fig. 2.50 demonstrates the significant DC power reduction in the parallel receiver for the same data throughput, which translates to the same level of enhancement in the energy efficiency. In addition, in the parallel receiver $I_{in}/I_{n,in}$ is improved by a factor of N, as evident from Table 2.2. This implies a significant improvement in the bit error rate of the receiver. Therefore, to enhance the energy efficiency and bit error rate of parallel optical receivers, it is highly desired to use the transceiver with the smallest light emitters and detectors.
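The stage count and the power ratio discussed above follow directly from (2.35) and (2.38). The sketch below is a minimal illustration with hypothetical function names. It treats $m_P$ as a continuous quantity, which is an assumption of mine (a real design would round up to an integer), and it applies the floor of one LA stage mentioned in the text; where that floor is active, (2.35) is no longer met exactly and the ratio is only indicative.

```python
import math

def parallel_la_stages(m_s, n, a_l=3.0):
    """LA stages per parallel element from (2.35), floored at one stage."""
    return max(1.0, m_s - 2.0 * math.log(n) / math.log(a_l))

def power_ratio(m_s, n, a_l=3.0):
    """Serial-to-parallel DC power ratio P_S / P_P from (2.38)."""
    m_p = parallel_la_stages(m_s, n, a_l)
    num = (1.0 + m_s) * (2.0 ** (1.0 / m_p) - 1.0)
    den = (1.0 + m_p) * (2.0 ** (1.0 / m_s) - 1.0)
    return n * num / den

# Example: five LA stages in the serial link, A_L = 3 as in Fig. 2.49.
for n in (1, 3, 9):
    print(f"N = {n}: m_P = {parallel_la_stages(5, n):.2f}, "
          f"P_S/P_P = {power_ratio(5, n):.2f}")
```

For N = 1 the two links are identical and the ratio is 1. For N = 3 with $A_L = 3$, two stages are saved per element and the serial receiver consumes close to eight times the power of the whole parallel receiver.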



Figure 2.49: The number of limiting amplifier stages needed in each parallel's receiver element.



Figure 2.50: The ratio of DC power consumed by the optical receiver in the serial link to the total power in the parallel link. A significant energy efficiency improvement is achieved in the parallel link for the same data throughput.

## 2.6 Conclusion

A highly compact optical receiver prototype with an on-chip blue light photodetector was fabricated in a $0.13\,\mu\text{m}$ CMOS SOI process. Thanks to the extremely low capacitance of the photodetector, a high transimpedance is attained immediately in the TIA. This, along with the transimpedance-to-noise optimization approach presented in this chapter, makes the power consumption very low, even in a relatively old CMOS process. The high energy efficiency and compactness of the receiver make the prototype a viable optical interface for parallel chip-to-chip communications. In addition, two fully differential large array prototypes were presented, targeting a high data throughput test prototype and a real chip-to-chip link prototype, respectively. Moreover, we analytically and quantitatively described the energy efficiency and signal-to-noise advantages of a parallel optical link over its serial counterpart.

#### CHAPTER III

# A Transimpedance-to-Noise Optimized Analog Front-end with High PSRR for Pulsed ToF Lidar Receivers

#### 3.1 Introduction

Electromagnetic wave radiation and reflection have enabled the detection of objects for a myriad of applications. The detection includes range, angle, and velocity. While continuous or pulsed source generators [51,53–57] in the RF, mm-Wave, and THz bands of the electromagnetic (EM) spectrum can be used for this purpose, they are less capable of providing a high lateral resolution due to their relatively long wavelength. In contrast, the infrared (IR) band of the EM spectrum can provide a high lateral resolution thanks to the shorter wavelength of the laser sources. Although frequency modulated continuous wave (FMCW) range finders in the IR band provide outstanding range accuracy, they can hardly be used for far distance measurement because of eye safety regulations. Pulsed range finders, on the other hand, can identify far distances by emitting high peak power while maintaining average power below the safety limit [58]. Unlike FMCW range finders, which inevitably employ electro-optical components for generating stable modulated light [59], the pulsed counterpart often does not require photonics integration.

The pulsed range finder measures the distance simply by emitting a pulse of light and receiving it by a photodiode (PD) after it has been reflected from an object. The distance is then equal to the wave's velocity $(v_p)$ times half the round-trip time. Furthermore, for a given distance, the amount of received energy can characterize the illuminated object to some extent. With the advancement of optical phased arrays (OPA), non-mechanical beam steering is feasible, which paves the way for fast 3-D scanning of a large area [60–62].
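As a minimal illustration of the time-of-flight relation described above, the sketch below converts a measured round-trip time into a distance. The function name is hypothetical, and taking $v_p$ as the speed of light in vacuum is an assumption; in air the difference is negligible for this purpose.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s (assumed value of v_p)

def tof_distance(round_trip_s, v_p=C_VACUUM):
    """Distance to the target: wave velocity times half the round-trip time."""
    return v_p * round_trip_s / 2.0

# A 1 us round trip corresponds to a target roughly 150 m away.
print(f"{tof_distance(1e-6):.1f} m")
```

The same relation shows why timing accuracy matters: every nanosecond of round-trip timing error shifts the reported distance by about 15 cm.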

Fig. 3.1 shows the building blocks of a typical incoherent pulsed ToF laser range finder, which is also known as Lidar. The control unit begins the distance measurement by sending a START command to the time-to-digital converter (TDC) block. Simultaneously, a short laser pulse, with a duration in the range of nanoseconds and an amplitude of a few tens of milliwatts, is transmitted by the laser diode. The pulse travels and hits the object. Then, the pulse is reflected back and is sensed by a photodiode. The received signal is converted to a current proportional to the received energy and the responsivity of the photodiode. The analog front-end (AFE) amplifies the signal to a level that is recognizable to the time discriminator (TD) unit. The TD sends a STOP command to the TDC, ideally independent of the received pulse amplitude and rise time. In addition, the peak of the signal can be held by the peak detection and hold (PDH) block for further processing such as grayscale measurement.

The rise time of the received pulse can change for several reasons, as shown in Fig. 3.2. If the received pulse is compared with only a constant threshold, there will be errors in the distance measurement called walk-error [63]. To perform a detection that does not depend on the rise time of the received pulse, a commonly used circuit is the constant-fraction discriminator (CFD) [64, 65]. In this technique, instead of comparing the leading edge of the pulse with a single threshold voltage, a delayed version of the pulse is compared with a non-delayed but attenuated version of the pulse. This results in a comparison that is independent of the slope of the leading edge of the pulse. Mathematically speaking, if we assume that the leading edge of the pulse has a slope of $m$, and the delay and attenuation are $\tau$ and $\alpha$, respectively, the equation for the comparison would be $m(t - \tau) = \alpha m t$. The solution of this equation is $t = \tau/(1 - \alpha)$, which clearly is independent of the slope. A transmission line with matched terminations and with a delay of $\tau < 2t_r$ is suitable for such a purpose. Fig. 3.3 depicts a simplified circuit realization of a CFD. Therefore, if an off-chip CFD is used as a TD unit, the AFE has to drive a low impedance load, typically 50 $\Omega$. Also, there are several other techniques that can compensate for the walk-error [66–69]. For instance, [66] mitigates the detection dependency on the rise time with a double-threshold comparison, obviating the necessity for a gain control mechanism in the AFE.

Figure 3.1: Building blocks of a typical pulsed Lidar system.

Figure 3.2: The possible shapes of the received pulses. If only a threshold detection is used for timing discrimination, errors depending on the rise time of the leading edge of the pulse in the distance measurement will appear.

Figure 3.3: A top-level schematic of a constant-fraction discriminator (CFD) as a time discriminator (TD). An analog front-end (AFE) in pulsed Lidar receivers usually drives a CFD or other types of TDs.
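The slope independence of the constant-fraction comparison, $m(t - \tau) = \alpha m t$, can be demonstrated numerically. The sketch below models the leading edge as an ideal linear ramp, as the text assumes, and finds the crossing instant by bisection rather than by using the closed-form solution, so that the comparison with $t = \tau/(1 - \alpha)$ is a genuine check. The function name and the example values of slope, delay, and attenuation are illustrative assumptions.

```python
def cfd_crossing_time(slope, delay, attenuation, iterations=200):
    """Find when the delayed ramp overtakes the attenuated ramp (bisection).

    Assumes an ideal linear leading edge, slope > 0, delay > 0, 0 < attenuation < 1.
    """
    def diff(t):
        delayed = slope * (t - delay)           # delayed version of the pulse
        attenuated = attenuation * slope * t    # attenuated, non-delayed version
        return delayed - attenuated

    lo, hi = 0.0, delay
    while diff(hi) < 0.0:       # expand the bracket until it contains the crossing
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

delay, attenuation = 1e-9, 0.3
for slope in (1e6, 1e7, 1e8):   # three different edge slopes, in V/s
    t_cross = cfd_crossing_time(slope, delay, attenuation)
    print(f"slope = {slope:.0e} V/s -> crossing at {t_cross * 1e9:.4f} ns")
```

All three slopes produce the same crossing instant, which agrees with $\tau/(1 - \alpha)$ and illustrates why the CFD output timing does not walk with the pulse amplitude or rise time under the linear-edge assumption.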

The AFE plays a critical role in the receiver chain from different perspectives. For a given signal-to-noise ratio (SNR, the ratio of the peak signal voltage to the rms noise voltage at the output of the AFE or, equivalently, of the peak signal current to the input-referred rms noise current), the level of the input-referred noise current of the AFE determines the minimum detectable signal (MDS). This in turn specifies the maximum range that can be measured for a given laser pulse power and reflecting object. As a result, the higher the sensitivity, the longer the range that can be measured. In addition, the AFE should amplify the MDS to an extent that is distinguishable by the TD unit for the given SNR; this consequently determines the transimpedance gain of the AFE. Besides, the minimum required bandwidth to preserve the pulse shape is given by $BW_{-3dB} \approx 0.35/t_r$, where $t_r$ is the rise time of the received laser pulse [70,71]. Also, the precision of a single range measurement, $\sigma_R$, can be approximated as $\sigma_R \approx 0.5v_p t_r/SNR$, where again $v_p$ is the laser pulse velocity [72]. A practical value for the SNR to keep false detections negligible is 10 [70]; however, it can be as low as 3.3 in particular applications [67].

Moreover, the power supply rejection is of critical importance because not only does it determine the signal integrity, but also, if it is low, the AFE can be prone to instability due to the disturbance caused by supply line inductance [73]. The importance of power supply rejection is more pronounced when an array of receivers is used. In this case, if all the receivers amplify an incoming signal coherently, a large amount of current might be drawn from the supply rails, which might cause severe stability and cross-talk problems [26]. If the AFE is fully differential, the above issue is less likely to occur, as the supply line noise mostly appears as common-mode noise. However, although the fully differential AFE offers twice as much transimpedance gain as the single-ended counterpart does, it consumes at least twice the power. Besides, as will be explained later, for the same bandwidth, a single-ended AFE can be driven by twice as much photodiode capacitance as a fully differential one. Therefore, neglecting the noise related issues, the overall electrical efficiency of a single-ended AFE outweighs that of the fully differential one. Nonetheless, a fully differential AFE is more immune to all types of common-mode noise such as power supply and substrate noise.

In this chapter, we explore an optimum design of a fully differential CMOS AFE in which the desired specification at the normal condition with $V_{DD} = 1.8$ V consists of an input-referred noise current of less than 100 nA, a -3dB bandwidth broader than 300 MHz, a transimpedance gain higher than $94 \,\mathrm{dB}\Omega (\approx 50 \,\mathrm{k}\Omega)$, a PSRR greater than 80 dB in the entire bandwidth, and a total power consumption of less than 50 mW, including a $50\,\Omega$ driver for buffering low impedance loads such as a CFD. The AFE is driven by a photodiode (PD) with a parasitic capacitance of 1 pF at the input. With these specifications, the AFE is suited for Lidar receivers requiring a sensitivity (MDS) of 1 $\mu$A at SNR = 10 with laser pulses that have $t_r > 1.1 \,\mathrm{ns}$. A Lidar receiver with these parameters can achieve a single measurement range precision of $\sigma_R \approx 16.5 \,\mathrm{mm}$. To prevent SNR degradation due to power supply noise, it is essential to keep the power supply noise at the output of the AFE far below the AFE's output noise (100 nA $\times 50 \,\mathrm{k}\Omega = 5 \,\mathrm{mV}$). The minimum PSRR of 80 dB allows a power supply gain of at most +14 dB. As a result, the power supply rms noise should be less than 0.1 mV to ensure that the noise at the output of the AFE due to the power supply (0.5 mV) is 10 times smaller than the AFE's noise due to internal components (5 mV).
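The numbers in this specification are tied together by the relations introduced earlier, and the budget can be recomputed in a few lines. In the sketch below, the function and key names are hypothetical, the pulse velocity is assumed to be $3 \times 10^8$ m/s, and the supply gain is taken as the transimpedance gain in dB$\Omega$ minus the PSRR in dB, which is my reading of how the +14 dB figure in the text is obtained.

```python
import math

V_P = 3.0e8  # ASSUMPTION: laser pulse velocity, m/s

def afe_budget(t_r, snr, mds, r_t, psrr_db, v_supply_noise):
    """Recompute the AFE specification budget from the target numbers."""
    gain_dbohm = 20.0 * math.log10(r_t)             # transimpedance gain in dBOhm
    supply_gain_db = gain_dbohm - psrr_db           # assumed PSRR definition
    return {
        "bw_min_hz": 0.35 / t_r,                    # bandwidth to preserve the pulse
        "sigma_r_m": 0.5 * V_P * t_r / snr,         # single-shot range precision
        "i_noise_a": mds / snr,                     # allowed input-referred noise
        "gain_dbohm": gain_dbohm,
        "v_noise_out": (mds / snr) * r_t,           # output noise from the AFE itself
        "supply_gain_db": supply_gain_db,
        "v_supply_out": v_supply_noise * 10.0 ** (supply_gain_db / 20.0),
    }

budget = afe_budget(t_r=1.1e-9, snr=10.0, mds=1e-6, r_t=50e3,
                    psrr_db=80.0, v_supply_noise=0.1e-3)
for name, value in budget.items():
    print(f"{name:>15}: {value:.4g}")
```

The 0.35/$t_r$ rule gives a minimum bandwidth of about 318 MHz for a 1.1 ns rise time, which is consistent with the 300 MHz target being described as suitable for $t_r > 1.1$ ns.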

A number of different transimpedance amplifier (TIA) topologies might satisfy the above design requirements for the Lidar AFE. However, a resistive shunt-feedback TIA usually proves more versatile, mainly because of fewer trade-offs between noise, bandwidth, gain, and voltage headroom consumption. That is why this topology has been widely used in analog front-ends, for instance in [66,68,69,74]; however, these works do not report how the sizing and biasing of the AFE were achieved. In theory, the sizing of the front-end FET has been analytically optimized to achieve the best noise performance [36,38]. However, an AFE implementation based on such an approach [71] encounters practical limitations, as will be explained in this chapter, and results in significant DC power consumption and area for the front-end FET. In [75], the noise performance has been traded for DC power and silicon area; however, the decision on the trade-offs has not been explained. Also, none of these works has reported the PSRR performance, which is important in preventing SNR degradation due to common-mode noise such as power supply and substrate noise.

We propose a transimpedance-to-noise optimization approach for the design of the AFE with the aforementioned target specifications using a resistive shunt-feedback TIA front-end. The transimpedance-to-noise optimization approach results in a front-end FET whose input capacitance is equal to a small fraction of the photodiode capacitance. This approach not only reduces area and power compared with a noise-only optimization, but also makes a high PSRR feasible thanks to the use of fewer amplifier stages. At the same time, the noise performance remains very close to the minimum noise that can theoretically be achieved. In practice, the noise of an AFE is higher than the theoretical minimum because of second-order effects that will be explained in the next section. That is why an AFE designed with our approach might even achieve better noise performance, depending on the PD size, bandwidth, and total required transimpedance gain. We elaborate on the transimpedance-to-noise optimization approach in Section 3.2. The analysis presented in that section also helps designers quickly find the performance boundaries and conditions without circuit implementation and simulations. Building on this, a bias optimization along with the implementation of the AFE depicted in Fig. 3.4 is described in Section 3.3. In Section 3.4, experimental results of the prototype fabricated in a $0.11 \mu m$ CMOS process are presented and compared with prior art.

#### 3.2 Transimpedance-to-Noise Optimization

A number of different topologies can be used as a transimpedance amplifier in the AFE of a Lidar receiver. Since the signal generated by the PD has a high impedance, it is modeled by an ideal current source in parallel with a capacitor. Among all possible permutations for a FET, a common-gate configuration can intrinsically present low impedance, absorbing most of the PD current and converting it to a voltage. However,



Figure 3.4: Schematic of the proposed analog front-end (AFE) for pulsed Lidar receivers built on transimpedance-to-noise optimization approach.

a common-gate TIA is usually noisy because the channel noise of the FET directly adds to the signal. For the same bandwidth as the common-gate configuration, a regulated cascode TIA incorporating a front-end FET with a lower transconductance can improve the noise performance owing to a feedback loop [76,77]. In this configuration, however, there is still a trade-off between voltage headroom consumption, bandwidth, and noise.

To relax the aforementioned trade-offs, similar to the TIAs discussed in Chapter II, the commonly used configuration is the resistive shunt-feedback transimpedance amplifier. Fig. 3.5 illustrates the half-circuit of a fully differential resistive shunt-feedback TIA as a potentially low-noise topology. Unlike the TIAs presented in Chapter II, we take advantage of a cascode configuration to reduce the Miller capacitance. This is particularly important because the front-end FET size is proportional to the size of the photodetector, and in pulsed Lidar receivers the parasitic capacitance of the photodetector is quite large compared to those often used in optical communication receivers. The photodetectors used in Lidar receivers are usually avalanche photodiodes (APD), which increase the responsivity at the cost of a large parasitic capacitance. The feedback resistor $R_F$ senses the output voltage and



Figure 3.5: Half-circuit of the fully differential resistive shunt-feedback TIA upon which the optimization analysis is studied. In a fully differential TIA, the photodiode capacitance has to be considered twice for the half-circuit analysis because of the virtual ground in the middle of the photodiode capacitance.

returns a feedback current that is compared with the current generated by the PD, $I_{in}$ . While the input resistance of the open-loop amplifier is high, the input resistance of the closed-loop circuit becomes $R_F/(1 + A_0)$ , where $A_0$ is the DC open-loop gain of the core amplifier. As a result, most of the PD current is absorbed by the TIA. Unlike the regulated cascode topology, there is no voltage headroom constraint on $R_F$ , allowing it to be arbitrarily large, as long as the bandwidth requirement is met.

The cascoding technique used in the core amplifier not only decreases the effective input capacitance but also increases $A_0$ by virtue of a higher output resistance, resulting in a larger bandwidth. Treating the source follower as an ideal buffer, we can write the transfer function of the transimpedance amplifier as

$$Z_T(s) = -R_T \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{3.1}$$

where

$$R_T = \frac{A_0}{1 + A_0} R_F \tag{3.2}$$

$$\zeta = \frac{1}{2} \frac{R_F C_T \omega_A + 1}{\sqrt{(A_0 + 1)\,\omega_A R_F C_T}} \tag{3.3}$$

$$\omega_n^2 = \frac{(A_0 + 1)\,\omega_A}{R_F C_T}\tag{3.4}$$

$C_T = C_{PD} + C_I$ , and $\omega_A$ is the dominant pole of the core amplifier. As shown in Fig. 3.5, $\omega_A$ is equal to $1/(R_D C_D)$ , where $C_D$ is the total parasitic capacitance at the drain of $M_2$ and is roughly equal to $C_{gd2} + C_{db2} + C_{gd3}$ plus a fraction of $C_{gs3}$ . These results are similar to those obtained in Section 2.3; however, the DC transimpedance gain of our fully differential AFE is twice that given by (3.2). This is because the coupling capacitors at the input of the AFE convert the PD signal current into two fully differential currents. It should be noted that $C_{gs3}$ might completely disappear thanks to "bootstrapping" by the source follower if the input capacitance of the subsequent stage is negligible [52]. This is an advantage of using a buffer after the core amplifier, as it allows reaching a larger gain-bandwidth product. Here, $C_{PD}$ includes the parasitic capacitance of the photodiode, PAD, and ESD. Also, $C_I \approx C_{gs1} + C_{gd1}$ . For $A_0 \gg 1$ , we again have $R_T \approx R_F$ . The above results for this TIA indicate that we can use (2.6) to find the -3dB bandwidth with $\zeta = \sqrt{2}/2$ .
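As a numerical illustration of (3.1) to (3.4), the sketch below evaluates $R_T$, $\zeta$, and the natural frequency for one set of half-circuit values and then searches for the -3dB frequency of $|Z_T|$. The component values are assumptions chosen to land near a maximally flat response; they are not taken from the text, and the function names are ours.

```python
import math

def tia_second_order(a0, f_a, r_f, c_t):
    """Return (R_T, zeta, f_n) from (3.2)-(3.4) for the half-circuit."""
    w_a = 2 * math.pi * f_a
    r_t = a0 / (1 + a0) * r_f
    w_n = math.sqrt((a0 + 1) * w_a / (r_f * c_t))
    zeta = 0.5 * (r_f * c_t * w_a + 1) / math.sqrt((a0 + 1) * w_a * r_f * c_t)
    return r_t, zeta, w_n / (2 * math.pi)

def gain_mag(f, r_t, zeta, f_n):
    """|Z_T| at frequency f, evaluated from (3.1) with s = j*2*pi*f."""
    x = f / f_n
    return r_t / math.sqrt((1 - x * x) ** 2 + (2 * zeta * x) ** 2)

# Illustrative (assumed) half-circuit values
a0, f_a = 67.0, 495e6          # DC gain and dominant pole of the core amplifier
r_f, c_t = 13.7e3, 3.12e-12    # feedback resistor and total input capacitance

r_t, zeta, f_n = tia_second_order(a0, f_a, r_f, c_t)

# Bisection for the -3 dB frequency; |Z_T| crosses R_T/sqrt(2) only once
lo, hi = 0.0, 10 * f_n
target = r_t / math.sqrt(2)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if gain_mag(mid, r_t, zeta, f_n) > target:
        lo = mid
    else:
        hi = mid
f_3db = 0.5 * (lo + hi)

print(f"R_T = {r_t / 1e3:.2f} kOhm, zeta = {zeta:.3f}")
print(f"f_n = {f_n / 1e6:.0f} MHz, f_-3dB = {f_3db / 1e6:.0f} MHz")
```

For $\zeta$ close to $\sqrt{2}/2$ the -3dB frequency nearly coincides with $\omega_n/2\pi$, which is why the bandwidth can be read directly from (3.4) in that case.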

To find the total input-referred rms noise current, we first calculate the input-referred noise current power spectral density from each noise contributor in the TIA. To simplify the analysis, based on the observations in Section 2.3 and [52], we can show that the output noise voltage of the differential TIA is $\sqrt{2}$ times that of the single-ended one. Therefore, the total input-referred noise current of our TIA will be $\sqrt{2}/2$ times that of the single-ended TIA, as the transimpedance gain is twofold. Fig. 3.6 depicts the small-signal model of the TIA along with the major noise sources. To further simplify the analysis, we assume that there is no channel length modulation, so the noise contribution of $M_2$ is zero. Also, the input-referred noise current due to $R_D$ can be considered small because $R_D$ is usually high, or we can simply add it to the channel noise of $M_1$ as they are in parallel. In addition, we presume that $M_3$ is



Figure 3.6: The major sources of noise in the TIA. The noise contribution of  $M_2$  can be assumed zero by ignoring channel length modulation of  $M_1$ . Moreover, since we desire a high DC gain,  $R_D$  noise can be neglected. Also, the blue noise current source at the input is a fictitious current source which results in the same noise voltage at the output coming from the physical noise contributors.

forming an ideal buffer using the source follower topology and its input-referred noise contribution is small as it is divided by the gain of the preceding stage.

Therefore, it is straightforward to show that for the single-ended TIA with buffer the input-referred noise current is

$$I_{n,in}^2 = 6.2\pi kTBW_{-3dB}^3 \left(\frac{C_T}{GBW} + \Gamma \frac{C_T^2}{f_T C_I}\right) \tag{3.5}$$

by substituting $R_F$ with (2.7) and assuming $\zeta \approx 0.7$ . In this equation, for a given $BW_{-3dB}$ , GBW, and $V_{GS}$ , the only variable is the input capacitance of $M_1$ , $C_I$ , which is proportional to the size of $M_1$ . The equation shows that there is an optimum $C_I$ for which the input-referred rms noise is minimized. Taking the derivative with respect to $C_I$ and setting the result to zero yields

$$C_I = C_{PD} \left(1 + \frac{f_T}{\Gamma GBW}\right)^{-1/2} \tag{3.6}$$

that minimizes the total input-referred noise current.
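For completeness, the intermediate step behind (3.6) can be sketched as follows, using only (3.5) and $C_T = C_{PD} + C_I$ (the shorthand $g$ is ours). Only the bracketed term of (3.5) depends on $C_I$:

$$g(C_I) = \frac{C_{PD} + C_I}{GBW} + \Gamma \frac{(C_{PD} + C_I)^2}{f_T C_I}, \qquad \frac{dg}{dC_I} = \frac{1}{GBW} + \frac{\Gamma}{f_T}\,\frac{C_I^2 - C_{PD}^2}{C_I^2}.$$

Setting the derivative to zero gives

$$\frac{C_{PD}^2}{C_I^2} = 1 + \frac{f_T}{\Gamma GBW},$$

which rearranges to (3.6).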

We can also find an optimum $C_I$ that maximizes the transimpedance-to-noise, that is

$$\frac{R_T}{I_{n,in}} = 0.12\,GBW \left( \pi^2 kTBW_{-3dB}^7 \left( \frac{C_T^3}{GBW} + \Gamma \frac{C_T^4}{f_T C_I} \right) \right)^{-1/2}. \tag{3.7}$$

By setting the derivative with respect to  $C_I$  to zero we can acquire

$$C_I \approx \frac{1}{3} C_{PD} \left(1 - \frac{1}{4} \frac{f_T}{\Gamma GBW}\right) \tag{3.8}$$

that maximizes $R_T/I_{n,in}$ . For the above approximation we have used the first two terms of the Maclaurin series, that is $f(x) \approx f(0) + f'(0)x$ with $x = f_T/(\Gamma GBW)$ , and assumed that $\Gamma GBW \gg f_T$ . We note that the assumption of $\Gamma GBW \gg f_T$ is not always valid; however, $\Gamma GBW$ is often greater than $f_T$ , which keeps the approximation useful with a small error. As an example, assuming $\Gamma = 2$ and $GBW = f_T = 30$ GHz, (3.6) and (3.8) give an optimum $C_I$ of approximately $0.8C_{PD}$ and $0.3C_{PD}$ , respectively. Consequently, for this example, moving from the noise optimum to the transimpedance-to-noise optimum corresponds to a 12.7% increase in the input-referred noise but a 38.4% enhancement in the transimpedance.
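The numbers in this example can be reproduced with a short script. The sketch below is an illustration under stated assumptions: the helper names are ours, the exact root is our own rearrangement of the stationarity condition of (3.7), and the transimpedance is assumed to scale as $1/C_T$ for a fixed GBW and bandwidth, as (3.4) suggests for $\zeta = \sqrt{2}/2$.

```python
import math

def noise_bracket(x, a):
    """Bracketed term of (3.5), normalized by C_PD/GBW.

    x = C_I/C_PD and a = f_T/(Gamma*GBW).
    """
    return (1 + x) + (1 + x) ** 2 / (a * x)

def x_noise_opt(a):
    """Noise optimum C_I/C_PD from (3.6)."""
    return 1 / math.sqrt(1 + a)

def x_tz_opt_approx(a):
    """Transimpedance-to-noise optimum, first-order form (3.8)."""
    return (1 - a / 4) / 3

def x_tz_opt_exact(a):
    """Positive root of 3(1+a)x^2 + 2x - 1 = 0 (our own rearrangement)."""
    return (-1 + math.sqrt(4 + 3 * a)) / (3 * (1 + a))

gamma, gbw, f_t = 2.0, 30e9, 30e9
a = f_t / (gamma * gbw)

print(f"noise optimum        : {x_noise_opt(a):.3f} C_PD")
print(f"TZ/noise opt (3.8)   : {x_tz_opt_approx(a):.3f} C_PD")
print(f"TZ/noise opt (exact) : {x_tz_opt_exact(a):.3f} C_PD")

# Rounded optima used in the text
x_n, x_tz = 0.8, 0.3
noise_increase = math.sqrt(noise_bracket(x_tz, a) / noise_bracket(x_n, a)) - 1
gain_increase = (1 + x_n) / (1 + x_tz) - 1   # assumes R_T proportional to 1/C_T

print(f"rms noise increase   : {100 * noise_increase:.1f} %")
print(f"transimpedance gain  : {100 * gain_increase:.1f} % higher")
```

The first-order form (3.8) and the exact root differ by only a few percent here even though $\Gamma GBW$ is just twice $f_T$, which supports using the approximation in this range.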

The above analysis reveals that the noise optimum slightly improves the noise performance compared to the transimpedance-to-noise optimum and therefore, yields a better SNR. However, this analysis relies on simplified long channel models and does not include second order effects. For instance, if we move from  $C_I \approx 0.3C_{PD}$  to  $0.8C_{PD}$ , for a constant  $V_{GS}$ ,  $f_T$  rolls off with the increase in the FET's width. Fig. 3.7 illustrates this phenomenon for two different  $V_{GS}$  voltages in a 0.11  $\mu$ m CMOS process. Therefore, the assumption of a constant  $f_T$  in (2.15) is not completely correct. As a result, the noise reduction partially fails due to the  $f_T$  droop with the increase in the width of the FET.

Moreover, the noise contribution of the source follower is ignored in the noise analysis. Also, there is usually a post amplifier that follows the TIA and contributes



Figure 3.7: The transition frequency  $(f_T)$  as a function of the FET's width for a constant  $V_{GS}$ .

to the ultimate output noise. For a given bandwidth, a larger $C_I$ requires a smaller $R_F$ , leading to a lower transimpedance gain. From the input-referred noise point of view, the lower the TIA gain is, the more the source follower and post amplifier contribute to the noise. This is more pronounced in an application where $R_F$ has to be smaller because of a larger bandwidth requirement or a larger photodiode. Besides, if we increase the size of the front-end FET in an attempt to reach the theoretical noise optimum, the rise in the front-end FET size increases the physical gate resistance and, consequently, the thermal noise contribution of the gate.

For the aforementioned reasons, if we move from $C_I \approx 0.3C_{PD}$ to $0.8C_{PD}$ , the input-referred noise would in practice be higher than the theoretical minimum noise. On the other hand, $0.8C_{PD}/0.3C_{PD}$ corresponds to an increase of about 167% in the width of the front-end FET, and consequently in its power consumption. Therefore, the transimpedance-to-noise optimization approach proves more beneficial when other specifications such as power consumption and area are as important as noise minimization. With the transimpedance-to-noise optimization, the noise performance remains very close to the theoretical minimum noise and might even be better in practice.

It is useful to quantitatively illustrate the variation of four important parameters, namely $I_{n,in}$ , required $A_0$ , required $R_F$ , and corresponding $R_F/I_{n,in}$ as a function of $C_I/C_{PD}$ for the design constraints discussed in Section 3.1. While the desired minimum $BW_{-3dB}$ is 300 MHz, we set it to 350 MHz and reserve a 50 MHz margin for non-idealities and variations in the circuit implementation. The parasitic capacitance associated with the photodiode is 1 pF. However, since the TIA is fully differential and we are building our analysis on a half-circuit model, the photodiode capacitance has to be doubled due to the virtual ground in the middle. In addition, we obtain a parasitic capacitance of 0.4 pF for the PAD and ESD based on simulation. As a result, the total capacitance turns out to be 2.4 pF in the half-circuit model analysis. In addition, $\Gamma \approx 2$ can be assumed for most modern CMOS technologies [33]. Moreover, $f_T$ depends on the $V_{GS}$ of the front-end FET, and we assume it to be around 30 GHz, which is obtained from the bias optimization discussed in the next section.

Fig. 3.8 shows the variation of the aforementioned parameters as a function of $C_I/C_{PD}$ . We make a few observations: (a) For the different values of GBW, the input-referred noise current is almost the same. This is mainly because GBW is so large that the corresponding term in (2.15) changes only slightly. Also, the minimum noise occurs for $C_I$ smaller than $C_{PD}$ , as discussed earlier with an example and observed before in, e.g., [38] and [36]; here $C_I \approx 0.8C_{PD}$ . (b) If we plug (2.7) into (2.6), we can obtain the required DC open-loop gain as

$$A_0 = \frac{\sqrt{2}}{2} \frac{GBW}{BW_{-3dB}}. \tag{3.9}$$

The above equation reveals that a resistive shunt-feedback TIA with a large GBW and a required -3dB bandwidth that is much smaller than the GBW, demands a high



Figure 3.8: Numerical illustration of four attributes as a function of  $C_I/C_{PD}$  with different values of GBW for  $BW_{-3dB} = 350$  MHz,  $C_{PD} = 2.4$  pF,  $f_T = 30$  GHz, and  $\Gamma = 2$ . (a) Input-referred rms noise current calculated from (3.5), (b) required DC gain acquired from (3.9), (c)  $R_F$  obtained from (2.7), and (d) transimpedance-to-noise.

$A_0$ . This is particularly important as for many pulsed Lidar receivers the required -3dB bandwidth is usually much smaller than the GBW. As a result, in our design, we should employ either multiple cascaded stages or a telescopic configuration so that $A_0$ is boosted. While the former can operate with lower supply voltages, its stability might suffer, not to mention that it typically presents a larger parasitic input capacitance and lower power supply rejection. (c) While having a larger GBW for a given $BW_{-3dB}$ and $C_{PD}$ does not reduce noise noticeably, it is beneficial toward achieving a larger $R_F$ and consequently $R_T$ , as can also be seen from (2.7). (d) Last but not least, the transimpedance-to-noise peaks at $C_I \approx 0.3 C_{PD}$ , verifying the approximation in (3.8). This is because with a smaller $C_I$ , the achievable $R_F$ is large. On the other hand, with increasing $C_I$ , at some point before the noise optimum, the rate of noise improvement begins to saturate, culminating in a peak in the transimpedance-to-noise. As discussed earlier, this not only boosts the transimpedance gain without noticeably losing noise performance, but also decreases the front-end FET's size. This reduction in area is remarkable because most Lidar receivers incorporate photodiodes with a large parasitic capacitance, such as avalanche photodiodes, to achieve a high responsivity. Also, for a given $V_{GS}$ , significantly less power is dissipated thanks to the smaller size of the front-end FET.
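Observations (a) to (c) can be reproduced numerically at the chosen operating point $C_I = 0.3C_{PD}$. The sketch below is an illustration only: it evaluates (3.5) and (3.9) as written, assumes $T = 300$ K, and assumes that (2.7) has the form $R_F = GBW/(2\pi C_T BW_{-3dB}^2)$, which is what (3.4) gives for $\zeta = \sqrt{2}/2$ and $A_0 \gg 1$. The function names are ours.

```python
import math

K_B, TEMP = 1.380649e-23, 300.0   # Boltzmann constant; 300 K is an assumption

def required_a0(gbw, bw):
    """DC open-loop gain from (3.9)."""
    return math.sqrt(2) / 2 * gbw / bw

def feedback_resistor(gbw, bw, c_t):
    """R_F for zeta = 0.707 and A0 >> 1; assumed form of (2.7)."""
    return gbw / (2 * math.pi * c_t * bw ** 2)

def noise_rms(gbw, bw, c_pd, c_i, f_t, gamma):
    """Single-ended input-referred rms noise current from (3.5)."""
    c_t = c_pd + c_i
    bracket = c_t / gbw + gamma * c_t ** 2 / (f_t * c_i)
    return math.sqrt(6.2 * math.pi * K_B * TEMP * bw ** 3 * bracket)

bw, c_pd, f_t, gamma = 350e6, 2.4e-12, 30e9, 2.0
c_i = 0.3 * c_pd

rows = []
for gbw in (30e9, 40e9, 50e9):
    a0 = required_a0(gbw, bw)
    r_f = feedback_resistor(gbw, bw, c_pd + c_i)
    i_n = noise_rms(gbw, bw, c_pd, c_i, f_t, gamma)
    rows.append((gbw, a0, r_f, i_n))
    print(f"GBW = {gbw / 1e9:.0f} GHz  A0 = {a0:.0f}  "
          f"R_F = {r_f / 1e3:.1f} kOhm  I_n = {i_n * 1e9:.1f} nA")
```

Across this GBW range the computed noise changes by only a few percent, while the required $A_0$ and the achievable $R_F$ both grow in proportion to GBW.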

#### 3.3 Implementation

In this section, we go over the design and circuit implementation of the AFE incorporating a fully differential resistive shunt-feedback TIA built upon the transimpedance-to-noise optimization approach discussed in Section 3.2.

#### 3.3.1 TIA

As evident from (3.8), the optimum size of the front-end FET depends on $f_T$ and GBW, and consequently on $V_{GS}$ . Therefore, the design of the TIA begins with finding an appropriate $V_{GS}$ . Increasing $V_{GS}$ can raise both GBW and $f_T$ . The increase in $f_T$ can result in a slightly smaller front-end FET size according to (3.8). Also, while a higher GBW does not improve noise performance noticeably, it allows using a higher $R_F$ , yielding a higher transimpedance. Nevertheless, arbitrarily increasing $V_{GS}$ might not improve the TIA performance for several reasons. First, non-idealities, especially velocity saturation, prevent the full growth of the FET transconductance and consequently of $f_T$ and GBW in modern CMOS technologies. The $f_T$ behavior of a low-threshold FET operating in the saturation region with W/L = $100 \,\mu\text{m}/110 \,\text{nm}$ as a function of $V_{GS}$ can be seen in Fig. 3.9a. We observe that $f_T$ stops growing at $V_{GS} \approx 0.6 \,\mathrm{V}$ . Second, a large $V_{GS}$ imposes a high DC power consumption, which might not be affordable. Third, besides the second-order effects, (3.9) reveals that a higher GBW requires a higher $A_0$ ; however, a high $A_0$ might not be feasible with a large $V_{GS}$ . As shown in Fig. 3.9b, a larger $V_{GS}$ reduces the intrinsic gain $(g_m r_o)$ , which is the maximum achievable $A_0$ of a single-stage amplifier. On the other hand, a small $V_{GS}$ cannot provide enough transconductance for the correct functionality of the TIA. For these reasons, we can define a figure of merit (FoM) for the FET to find an optimum $V_{GS}$ for the front-end FET of the TIA.

As we desire the FET to have a high speed $(f_T)$ , high intrinsic gain $(g_m r_o)$ , and minimum overdrive voltage $(2I_D/g_m)$ for headroom considerations, $FoM = f_T \cdot (g_m r_o) \cdot (g_m/I_D)$ can be defined. The $V_{GS}$ that yields the maximum of the FoM can be an optimum biasing voltage if it provides sufficient overdrive voltage to alleviate threshold voltage variation, and a DC current that does not exceed the power consumption constraint. The FoM as a function of $V_{GS}$ is plotted in Fig. 3.9a for the same FET. We observe that the maximum of the FoM occurs at $V_{GS} \approx 0.4$ V. This



Figure 3.9: Variation of a few important bias-dependent parameters for a FET operating in saturation region in the 0.11  $\mu$ m CMOS process with  $W/L = 100 \,\mu$ m/110 nm. By defining  $FoM = f_T (g_m r_o) (g_m/I_D)$  for the front-end FET, we observe that the FoM becomes maximum for  $V_{GS} \approx 0.4$  V.

provides enough overdrive voltage for the low-threshold FET devices of the CMOS process that we have used. Ignoring second-order effects, the attributes shown on the y-axes of Fig. 3.9 are almost independent of W, the channel width of the FET. For instance, for the transition frequency we can write  $f_T \approx g_m/(2\pi C_{gs}) =$  $\mu_n C_{ox}(W/L)(V_{GS} - V_{TH})/(2\pi (2/3)WLC_{ox})$  which is independent of W.

With $V_{GS} \approx 0.4$ V, the intrinsic gain is roughly 15 from Fig. 3.9b. However, a minimum DC open-loop gain of 60 is required for the range of *GBW* from 30 GHz to 50 GHz, as shown in Fig. 3.8b. To provide such a relatively high gain, we can employ either a multi-stage or a telescopic configuration. However, the former suffers from poor stability and power supply noise rejection. To estimate the achievable DC gain of a telescopic configuration, we can write $A_0 \approx g_{m,I}(g_{m,N}r_{o,N}r_{o,N}||g_{m,P}r_{o,P}r_{o,P})$ , where the subscripts N and P refer to the N and P FETs. Assuming identical transconductance and output resistance for the FETs, the estimated DC gain simplifies to $A_0 \approx (g_m r_o)^2/2$ . It can be seen from Fig. 3.9b that, unlike the intrinsic gain, $(g_m r_o)^2/2$ can potentially provide the required DC gain for the range of *GBW* from 30 GHz to 50 GHz at $V_{GS} \approx 0.4$ V.
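The gain argument above reduces to comparing three numbers. The following sketch, with helper names of our own choosing, evaluates (3.9) at the 350 MHz design bandwidth and compares it with the intrinsic gain of about 15 and the telescopic estimate $(g_m r_o)^2/2$.

```python
import math

def required_a0(gbw, bw):
    """DC open-loop gain demanded by (3.9)."""
    return math.sqrt(2) / 2 * gbw / bw

def telescopic_gain(gm_ro):
    """A0 ~ (gm*ro)^2 / 2, assuming identical N and P devices."""
    return gm_ro ** 2 / 2

bw = 350e6      # design bandwidth including margin, Hz
gm_ro = 15.0    # intrinsic gain read from Fig. 3.9b at V_GS ~ 0.4 V

a0_tel = telescopic_gain(gm_ro)
for gbw in (30e9, 50e9):
    need = required_a0(gbw, bw)
    print(f"GBW = {gbw / 1e9:.0f} GHz: need A0 = {need:.0f}, "
          f"single stage = {gm_ro:.0f}, telescopic = {a0_tel:.1f}")
```

Under these assumptions a single common-source stage falls well short of the requirement, while the telescopic estimate covers the whole 30 GHz to 50 GHz range with some margin.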

The next step in the design of the TIA is to find the optimum width of the FET. With $V_{GS} \approx 0.4 \text{ V}$ , we obtain $f_T \approx 30 \text{ GHz}$ from Fig. 3.9a. Also, (3.8) requires a known value for GBW to yield the optimum input capacitance and corresponding width. However, as shown in Fig. 3.8d, the optimum $C_I$ increases by only 3.3% when GBW is swept from 30 GHz to 50 GHz. As a result, we choose $C_I \approx 0.3C_{PD}$ to find the optimum FET size. In this regard, assuming $C_{gs}$ is the only capacitance at the input, we can calculate the width of the front-end FET from $W \approx C_{gs}/((2/3)LC_{ox})$ . In practice, this width results in a higher capacitance at the input because of the Miller effect, although the increase is not significant thanks to the telescopic configuration. With known $V_{GS}$ and W, we can readily determine the DC current of the front-end FET.

Fig. 3.10 depicts the circuit implementation of the TIA with the telescopic configuration as the core amplifier. A common-mode feedback amplifier is required to compensate for the mismatch between the N and P current mirrors. With 0.2 V of headroom consumed across the tail current mirror and the optimum $V_{GS} = 0.4 \text{ V}$ taken into account, the reference voltage for the common-mode feedback amplifier becomes 0.6 V for $V_{DD} = 1.8 \text{ V}$ . To ensure the stability of the common-mode loop, only a small fraction of the tail current of the telescopic core is controlled by the common-mode feedback amplifier. Fig. 3.11 shows the multiplier (M) effect on the stability of the



Figure 3.10: Implementation of the resistive shunt-feedback transimpedance amplifier based on the transimpedance-to-noise optimization approach. The red part of the schematic shows the common-mode feedback amplifier used to provide a proper DC voltage at the output.

common-mode loop. With a smaller M, the common-mode loop phase margin (PM) is higher and so the loop is more stable. However, since the common-mode loop gain is reduced, the DC voltage error between the terminals of the common-mode amplifier increases. Nevertheless, this has little impact on the functionality of the AFE as the DC output voltage of the TIA ( $V_{\text{TOP,N}}$ ) is sufficient to correctly drive the post amplifier, and the DC precision has negligible importance. Therefore, with M = 20, a sufficient common-mode loop gain is obtained, and the $PM \approx 60^{\circ}$ provides enough stability for the common-mode loop. Also, the simulations show that with M = 20, the PM remains above 54° over PVT variations and so common-mode stability is assured.

With all these taken into account, the TIA is implemented in the  $0.11 \,\mu m$  CMOS



Figure 3.11: The effect of the portion of the tail current of the TIA controlled by the common-mode feedback amplifier on the common-mode stability. M is the multiplier factor of the corresponding FET in the tail current source in Fig. 3.10.

process for $BW_{-3dB} = 350 \text{ MHz}$ and $C_{PD} = 2.4 \text{ pF}$ with the parameter values shown in Table 3.1. With the optimum $V_{GS} \approx 0.4$ V and optimum $C_I \approx 0.3 C_{PD}$ , we obtain $C_I \approx 0.7 \,\mathrm{pF}$ . This in turn yields a front-end FET size of $(W/L)_I = 320 \,\mu m/0.11 \,\mu m$ . With these values, a GBW of around 33 GHz is achieved, and $(W/L)_C$ is tuned to adjust the required DC open-loop gain. Also, while the corresponding $R_F$ is around 13 k $\Omega$ , we slightly reduce it to maintain an extra margin for the bandwidth requirement over PVT variations. The total DC current of the implemented TIA is 8.7 mA, of which 7.8 mA, 0.85 mA, and 0.05 mA are consumed by the telescopic core, the source followers, and the common-mode feedback amplifier, respectively. The current mirrors are used in the cascode configuration and are biased with wide-swing current mirrors to sustain the $0.2 \,\mathrm{V}$ allocated voltage headroom. In addition, an internal Beta-multiplier current reference [78] is designed to appropriately bias the current mirrors. Fig. 3.12 depicts the wide-swing current mirrors along with the Beta-multiplier current reference. Also, the layout of the front-end FETs is interdigitated to minimize mismatch-related issues.

| Parameter | Value                                |
|-----------|--------------------------------------|
| $(W/L)_I$ | $320\mu\mathrm{m}/0.11\mu\mathrm{m}$ |
| $(W/L)_C$ | $40\mu\mathrm{m}/0.11\mu\mathrm{m}$  |
| $(W/L)_S$ | $10\mu\mathrm{m}/0.11\mu\mathrm{m}$  |
| $(W/L)_n$ | $2.4\mu\mathrm{m}/0.2\mu\mathrm{m}$  |
| $(W/L)_p$ | $3.6\mu\mathrm{m}/0.2\mu\mathrm{m}$  |
| $R_F$     | $12\mathrm{k}\Omega$                 |
| $R_{CM}$  | $20\mathrm{k}\Omega$                 |

Table 3.1: Parameter Values for the Designed TIA
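The design point above can be sanity-checked with the quantities quoted in the text. The sketch below is an illustration under one labeled assumption: (2.7) is taken to have the form $R_F = GBW/(2\pi C_T BW_{-3dB}^2)$, consistent with (3.4) for $\zeta = \sqrt{2}/2$ and $A_0 \gg 1$. The variable names are ours.

```python
import math

c_pd = 2.4e-12        # half-circuit photodiode + PAD + ESD capacitance, F
c_i = 0.3 * c_pd      # transimpedance-to-noise optimum input capacitance, F
bw, gbw = 350e6, 33e9 # design bandwidth and achieved GBW, Hz

# Assumed form of (2.7): R_F = GBW / (2*pi*C_T*BW^2)
r_f = gbw / (2 * math.pi * (c_pd + c_i) * bw ** 2)

# DC current breakdown of the TIA quoted in the text, in mA
currents_ma = {
    "telescopic core": 7.8,
    "source followers": 0.85,
    "CMFB amplifier": 0.05,
}
total_ma = sum(currents_ma.values())
power_mw = total_ma * 1.8   # V_DD = 1.8 V

print(f"C_I   = {c_i * 1e12:.2f} pF")
print(f"R_F   = {r_f / 1e3:.1f} kOhm (before margin)")
print(f"I_TIA = {total_ma:.2f} mA, P_TIA = {power_mw:.2f} mW")
```

The estimated $R_F$ lands in the same range as the value of around 13 k$\Omega$ quoted above, and the 12 k$\Omega$ in Table 3.1 then reflects the deliberate reduction for bandwidth margin.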



Figure 3.12: The wide swing current mirrors along with a Beta-multiplier current reference.

#### 3.3.2 Post Amplifier (PA) and Push-Pull driver

With $R_F = 12 \,\mathrm{k\Omega}$ , the differential TIA transimpedance gain becomes almost $2R_F = 24 \,\mathrm{k\Omega}$ , which is sufficient for detection with an acceptable SNR. However, to further relax the noise and offset requirements of the next stage, and to reduce the chance of false detection, a post amplifier (PA) is used to further increase the gain. This also increases the symmetrical swing. In addition, for using the AFE with off-chip time discriminators, an output buffer is required to drive a low-impedance load such as $50\,\Omega$ . We note that the output of the buffer need not be matched to the $50\,\Omega$ reference. In fact, as long as the time discriminator (TD) is well matched to the off-chip transmission line that carries the signal, no reflection occurs at the TD side, obviating the need for the buffer output impedance to be matched to the $50\,\Omega$ reference [26]. This allows the buffer to run with a higher output resistance, culminating in a larger swing with lower power dissipation.

Fig. 3.13a illustrates the implemented PA integrated with the output buffer. Unlike conventional class-A buffers, we have employed a push-pull output stage or, loosely speaking, a class-AB output driver that can decrease power consumption, increase swing, and enhance slew rate. Also, to further reduce power dissipation, high-threshold FET devices (nhvt and phvt) are used to decrease the quiescent DC current drawn from the power supply. In addition, a SLEEP port is used to lower the AFE power consumption to tens of $\mu$W when no operation is performed. To maximize the symmetrical swing, the outputs of the buffers are averaged and compared with a $V_{DD}/2$ reference by an error amplifier. The error amplifier, as depicted in Fig. 3.13b, adjusts the PA biasing current so that the outputs of the buffer swing around $V_{DD}/2$ . With $V_{DD} = 1.8$ V, the PA and buffer draw 2.1 mA and 11.6 mA from the power supply, and provide 6.4 dB and 5 dB gain, respectively. Fig. 3.14 shows the frequency response of the AFE over different process corners when $V_{DD}$ and temperature are varied from 1.7 V and $-40^{\circ}$C to 1.9 V and $+85^{\circ}$C, respectively, verifying appropriate functionality of all blocks.
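Adding up the stage gains and supply currents quoted so far gives a quick check against the targets of Section 3.1. The sketch below is an illustration only: the variable names are ours, and the current sum covers just the three blocks whose currents are stated in the text.

```python
import math

r_f = 12e3                            # implemented feedback resistor, ohm
tia_db = 20 * math.log10(2 * r_f)     # differential TIA gain ~ 2*R_F, dBOhm
pa_db, buffer_db = 6.4, 5.0           # PA and buffer gains quoted in the text
total_db = tia_db + pa_db + buffer_db
total_ohm = 10 ** (total_db / 20)

vdd = 1.8                             # supply voltage, V
currents_ma = {"TIA": 8.7, "PA": 2.1, "buffer": 11.6}
power_mw = vdd * sum(currents_ma.values())

print(f"TIA gain   : {tia_db:.1f} dBOhm")
print(f"total gain : {total_db:.1f} dBOhm ({total_ohm / 1e3:.0f} kOhm)")
print(f"DC power   : {power_mw:.1f} mW from {vdd} V")
```

Both results sit on the right side of the targets: the overall gain exceeds 94 dB$\Omega$ and the summed power stays below 50 mW.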

As mentioned in Section 3.1, the PSRR is important from signal integrity and stability points of view. In theory, the power supply noise should not appear in a fully differential amplifier. However, in reality the power supply noise does appear differentially and the source of it can be mainly decomposed into two parts. First, the power supply noise appears as a common mode at the output of the first stage (TIA) and then, due to mismatch within the second stage (PA), it is differentially amplified. In our proposed topology, this is not an issue as the power supply noise is highly suppressed in the first stage, although it can be in other amplifier topologies. In fact, in our design the power supply noise is attenuated by the cascode configuration and then it is further suppressed by the common-mode amplifier. Second, the power supply noise differentially appears at the output of the first stage due to the internal mismatch and is amplified by the differential gain of the second stage.

It is shown in [79] that the power supply noise gain at the output of a single-stage amplifier due to mismatch is equal to $20 \log \delta$ , where $\delta$ is the mismatch coefficient. For example, a 10% mismatch, which can be achieved with a careful layout design, results in a -20 dB power supply noise gain (the negative sign means rejection). Therefore, with the 87.6 dB $\Omega$ (24 k $\Omega$ ) transimpedance gain of the first stage and assuming -20 dB for the power supply noise gain, we should expect a PSRR of around 107.6 dB in our design, which satisfies the requirement. The PSRR performance becomes worse with increasing frequency as the signal gain drops and the mismatch due to parasitics increases. In the layout phase of the implementation, we take advantage of the interdigitating technique for all active and passive components to minimize the mismatch and enhance the PSRR performance.

It is also important to highlight the PSRR performance of a TIA built over a telescopic configuration in contrast to a TIA built over a multi-stage cascaded amplifier. Assuming the required open loop gain,  $A_0$ , is 64, we can achieve such a relatively



Figure 3.13: (a) Circuit implementation of the post amplifier (PA) and push-pull output buffer (b) the error amplifier used in the PA and push-pull buffer to maximize the output swing.



Figure 3.14: Simulation of the frequency response over the process corners with VDD and temperature variation.

high gain by either a telescopic configuration like our design or, for instance, a cascaded amplifier with three identical stages, each with a gain of 4 (12 dB), to provide $4^3 = 64$ (the number of stages has to be odd to form a negative feedback in a resistive shunt-feedback TIA). Assuming the same $\delta = 10\%$ , the power supply noise gain at the output of this three-stage amplifier will be +4 dB. This is because the power supply noise gain at the output of the first stage is -20 dB, but it is amplified by the gain of the following stages (24 dB). Therefore, in our example, the telescopic PSRR performance is 24 dB better than that of the cascaded counterpart, which clearly demonstrates the advantage of a telescopic configuration and fewer stages in this regard. The difference is also more pronounced when a higher $A_0$ is required.
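The PSRR comparison can be written out as a short calculation. The sketch below is an illustration with names of our own; it assumes, as the text implies, that both cores are compared at the same 24 k$\Omega$ first-stage transimpedance and that the mismatch-induced supply term appears at the output of the first stage of the cascade.

```python
import math

def supply_gain_db(delta, following_gain_db=0.0):
    """Supply-noise gain of a mismatched stage, 20*log10(delta), plus any
    gain that follows it inside the core amplifier."""
    return 20 * math.log10(delta) + following_gain_db

delta = 0.10                         # 10 % mismatch, as in the text
signal_db = 20 * math.log10(24e3)    # first-stage transimpedance, dBOhm

# Telescopic core: single stage, nothing amplifies the mismatch term
tel_supply_db = supply_gain_db(delta)
tel_psrr_db = signal_db - tel_supply_db

# Three cascaded stages of gain 4: stages 2 and 3 add 2 x 12 dB
stage_db = 20 * math.log10(4)
cas_supply_db = supply_gain_db(delta, 2 * stage_db)
cas_psrr_db = signal_db - cas_supply_db

print(f"telescopic : supply gain {tel_supply_db:+.1f} dB, PSRR {tel_psrr_db:.1f} dB")
print(f"cascaded   : supply gain {cas_supply_db:+.1f} dB, PSRR {cas_psrr_db:.1f} dB")
print(f"advantage  : {tel_psrr_db - cas_psrr_db:.1f} dB")
```

The advantage equals the gain of the stages that follow the first one, so it widens as the required $A_0$ increases.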

#### 3.4 Experimental Results

The AFE was fabricated in a  $0.11 \,\mu\text{m}$  CMOS. Fig. 3.15 illustrates the die photograph of the fabricated chip. The chip consumes  $41 \,\text{mW}$  DC power from a  $1.8 \,\text{V}$ power supply. Different test setups were used to demonstrate frequency response, transient response, noise, and power supply rejection performance. It is a typical


Figure 3.15: Die photograph of the analog front-end (AFE) fabricated in a  $0.11\,\mu{\rm m}$  CMOS.

approach to characterize an optical receiver with a photodiode simulator [80–82]. In this approach, a resistor $R_S$ in series with an input voltage source emulates a current source. By Norton's theorem, the equivalent current is $I_S = V_S/R_S$ . Also, coupling capacitors ( $C_c$ ) are used to isolate the DC operating point of the AFE at the input and output. In addition, a capacitor at the input is used to represent the parasitic capacitance of the photodiode. We note that since the chip performance depends on the photodiode capacitance, it is important to include an accurate capacitance during the measurements. That is why we measured the scattering parameters of the capacitor on the fabricated PCB with a vector network analyzer (VNA). Then we extracted the capacitance from the admittance parameters, which were converted from the scattering parameters. This yielded $C_{PD} = 1 \text{ pF}$ . Also, we measured the chip performance with $R_S = 1 \text{ k}\Omega$ and $R_S = 10 \text{ k}\Omega$ , and the results were highly consistent with each other. So the input fixture includes $R_S = 1 \text{ k}\Omega$ , $C_c = 100 \text{ nF}$ ,



Figure 3.16: (a) Single-ended frequency response test setup, (b) differential frequency response and transient response test setup.

and  $C_{PD} = 1 \,\mathrm{pF}$  for the following test setups and experimental results.
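As an illustration of the capacitance extraction and the Norton conversion described above, the following Python sketch assumes a one-port reflection measurement of the capacitor referenced to $50\,\Omega$ (the text does not state the port configuration). The function names, the 100 MHz test frequency, and the 1.7 mV source amplitude are hypothetical; the self-check uses the ideal $S_{11}$ of a 1 pF capacitor to ground rather than measured data.

```python
import math

def capacitance_from_s11(s11, freq_hz, z0=50.0):
    """Convert S11 to the input admittance Y = (1/Z0)(1 - S11)/(1 + S11)
    and return C = Im(Y) / (2*pi*f)."""
    y = (1.0 / z0) * (1.0 - s11) / (1.0 + s11)
    return y.imag / (2.0 * math.pi * freq_hz)

def s11_of_shunt_capacitor(c_farad, freq_hz, z0=50.0):
    """Ideal S11 of a capacitor to ground; used only as synthetic test data."""
    y = 1j * 2.0 * math.pi * freq_hz * c_farad
    return (1.0 - z0 * y) / (1.0 + z0 * y)

freq = 100e6                                   # hypothetical frequency point
s11 = s11_of_shunt_capacitor(1e-12, freq)      # synthetic "measurement"
c_pd = capacitance_from_s11(s11, freq)
print(f"extracted C_PD = {c_pd * 1e12:.3f} pF")

# Norton equivalent of the photodiode simulator: I_S = V_S / R_S
v_s = 1.7e-3     # hypothetical source amplitude (V)
r_s = 1e3        # R_S = 1 kOhm, as used in the test fixture
i_s = v_s / r_s
print(f"equivalent input current = {i_s * 1e6:.2f} uA")
```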

To accurately measure the bandwidth and transimpedance gain overshoot in the frequency domain, a Keysight N9918A VNA was used in the test configuration shown in Fig. 3.16a. In this setup, the unused ports at the input and output were terminated with 50  $\Omega$ loads to maintain the circuit symmetry. The results are shown in Fig. 3.17 for different  $V_{DD}$  voltages. The VNA power was set to the minimum (-45 dBm) to prevent gain saturation. The gain rolls off at  $\approx 12 \text{ dB/octave}$ , verifying the second-order behavior of the TIA. Also, the measured overshoot for  $V_{DD} = 1.8 \text{ V}$  is only 0.2 dB, which demonstrates that the chip is running with  $\zeta$ close to 0.7, that is, close to a maximally flat frequency response. To measure the fully differential frequency response, we used a Keysight M8195A arbitrary waveform generator (AWG) to produce and sweep a differential sinusoidal waveform at the input. A pair of wideband 30 dB attenuators was used to attenuate the signal at the input, as the AWG amplitude cannot



Figure 3.17: Insertion gain of the AFE along with the input and output fixtures. The gain is almost flat within the -3dB bandwidth of about 340 MHz for  $V_{DD} = 1.8$  V.

be arbitrarily small. The measurement result is shown in Fig. 3.18, indicating a transimpedance gain of about  $99 \,\mathrm{dB}\Omega$  and a  $-3\mathrm{dB}$  bandwidth of  $340 \,\mathrm{MHz}$ .
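The link between overshoot, damping factor, and roll-off can be sketched in Python under the assumption of a standard two-pole low-pass response $H(s) = \omega_0^2/(s^2 + 2\zeta\omega_0 s + \omega_0^2)$. This generic model and the function names are assumptions for illustration; the sketch does not reproduce the actual TIA transfer function.

```python
import math

def peaking_db(zeta):
    """Magnitude peaking (dB) of a two-pole low-pass response.
    There is no peaking for zeta >= 1/sqrt(2)."""
    if zeta >= 1.0 / math.sqrt(2.0):
        return 0.0
    peak = 1.0 / (2.0 * zeta * math.sqrt(1.0 - zeta ** 2))
    return 20.0 * math.log10(peak)

def zeta_from_peaking_db(peak_db):
    """Invert peaking_db for 0 < zeta < 1/sqrt(2)."""
    m = 10.0 ** (-peak_db / 20.0)                # m = 2*zeta*sqrt(1 - zeta^2)
    x = (1.0 - math.sqrt(1.0 - m * m)) / 2.0     # x = zeta^2 (smaller root)
    return math.sqrt(x)

# Asymptotic roll-off of two poles: 40 dB/decade, i.e. about 12 dB/octave
rolloff_db_per_octave = 2 * 20.0 * math.log10(2.0)

print(f"peaking at zeta = 0.7: {peaking_db(0.7):.4f} dB")
print(f"zeta for 0.2 dB peaking: {zeta_from_peaking_db(0.2):.3f}")
print(f"roll-off: {rolloff_db_per_octave:.2f} dB/octave")
```

In this idealized model, 0.2 dB of peaking corresponds to a damping factor slightly below 0.7, which is consistent with operation near the maximally flat condition.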

For the transient experiment, the setup shown in Fig. 3.16b was used. Fig. 3.19 shows the transient response of the AFE captured by a Keysight MSOS804 oscilloscope. The differential output responses of the AFE to a trapezoidal pulse with 1 ns rise time and  $1.7 \,\mu\text{A}$  amplitude are shown in Fig. 3.19a. It can be seen that the pulse shape is preserved, and the differential outputs are highly matched. In addition, as shown in Fig. 3.19b, multiple large signals with different amplitudes were applied to the input and the corresponding differential output voltages were captured.

To measure the output noise of the AFE, the test setup illustrated in Fig. 3.20 was used. The rms noise voltage at the differential output of the AFE was measured by the histogram function of the oscilloscope as shown in Fig. 3.21. To capture as much of the noise energy as possible, the oscilloscope bandwidth was set to its maximum, that is 8.4 GHz, and the experiment was prolonged to reach more than  $10^6$  hits. With this setup, the measured rms differential output noise voltage is approximately 6.4 mV. This corresponds to an input-referred noise current of 71 nA, which is calculated by dividing the output noise by the mid-band transimpedance gain of 90 k $\Omega$ .
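The input-referred noise arithmetic can be reproduced with a few lines of Python. The numbers are those quoted above; the helper names are hypothetical.

```python
import math

def dbohm_to_ohm(gain_dbohm):
    """Convert a transimpedance gain from dBOhm to ohms."""
    return 10.0 ** (gain_dbohm / 20.0)

def input_referred_noise(v_out_rms, r_t_ohm):
    """Refer an rms output noise voltage to the input through R_T."""
    return v_out_rms / r_t_ohm

r_t = 90e3                  # mid-band transimpedance gain (ohm)
v_out_rms = 6.4e-3          # measured differential output noise (V rms)
i_n_rms = input_referred_noise(v_out_rms, r_t)

print(f"R_T = {20.0 * math.log10(r_t):.1f} dBOhm")
print(f"input-referred noise = {i_n_rms * 1e9:.1f} nA rms")
```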



Figure 3.18: The frequency response of the AFE. The transimpedance is flat within the -3dB bandwidth of 340 MHz and has a gain of about 99 dB$\Omega$ (90 k $\Omega$ ).

Fig. 3.22 depicts the test setup that was used to measure the power supply gain. The bias tee provides the required DC voltage of 1.8 V for the AFE while enabling an ac path for the VNA to send RF power to the supply terminal of the AFE. It also provides at least 50 dB isolation between the chip and the power supply. From the scattering parameters, we can derive (Appendix A) the gain from the power supply terminal to one of the AFE outputs as

$$\frac{V_{\rm ON}}{V_{\rm sup}} = \frac{S_{21}}{1 + S_{11}}. \tag{3.10}$$
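Equation (3.10) maps directly onto complex arithmetic. The Python sketch below evaluates it at a single frequency point; the S-parameter values are hypothetical placeholders, not measured data, and the function names are ours.

```python
import math

def supply_gain(s11, s21):
    """Voltage gain from the supply terminal (port 1) to one AFE output
    (port 2), following (3.10): V_ON / V_sup = S21 / (1 + S11)."""
    return s21 / (1.0 + s11)

def to_db(x):
    """Magnitude of a (possibly complex) voltage ratio in dB."""
    return 20.0 * math.log10(abs(x))

# Hypothetical VNA readings at one frequency point
s11 = complex(-0.2, 0.1)
s21 = complex(0.004, -0.003)

gain = supply_gain(s11, s21)
print(f"supply gain = {to_db(gain):.1f} dB")
```

In the matched special case $S_{11} = 0$, the expression reduces to $S_{21}$ itself.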

We can measure the power supply gain to the other output terminal of the AFE by swapping the second port of the VNA with the  $50\,\Omega$  termination. Dividing the differential transimpedance gain by the differential power supply gain then yields the PSRR. Fig. 3.23 shows the measured PSRR of the AFE with and without a decoupling capacitor. We observe that over the entire -3dB bandwidth, more than 87 dB PSRR is achieved when no decoupling capacitor is used. The measurement results are summarized and compared with the prior art in Table 3.2. Unlike single-stage voltage amplifiers, for which GBW is a good figure of merit and is mainly







Figure 3.19: (a) Differential outputs of the AFE responding to a trapezoid pulse with 1 ns rise time and approximately  $1.7 \,\mu\text{A}$  amplitude. (b) Differential output voltage response to large signal pulses with  $t_r = t_f = 1$  ns, pulse-width = 2 ns, and amplitude of  $7.2 \,\mu\text{A}$ ,  $13 \,\mu\text{A}$ ,  $19 \,\mu\text{A}$ ,  $25 \,\mu\text{A}$ , and  $30 \,\mu\text{A}$ .



Figure 3.20: Output noise measurement setup.



Figure 3.21: Noise performance of the AFE. The differential output noise is  $6.4 \,\mathrm{mV_{rms}}$ , which translates to an input-referred rms noise current of approximately 71 nA. The oscilloscope bandwidth is set to its maximum to account for the high frequency components of the noise.



Figure 3.22: The test setup that was used to measure the power supply gain. The bias tee provides at least 50 dB isolation between the power supply and the VDD terminal of the chip.



Figure 3.23: Power supply rejection ratio (PSRR) versus frequency. More than 87 dB PSRR is achieved within the -3dB bandwidth when no decoupling capacitor is used.

technology dependent, the transimpedance-BW product of a TIA is not a good figure of merit as it depends on the operating BW [82]. Nevertheless, we can define a  $FoM = R_T \cdot BW_{-3dB}^2 \cdot C_{PD} / (i_{n,in} / \sqrt{\Delta f})$  that is less dependent on the BW [83] and includes the input-referred noise current spectral density. With this FoM, our proposed AFE outperforms the prior art. It is important to mention that for the same bandwidth, a single-ended AFE can be loaded by twice as much photodiode capacitance as a fully differential one. Also, a fully differential AFE consumes more than twice as much power as its single-ended counterpart. However, a fully differential AFE, especially with our proposed topology, achieves a far superior PSRR, which is critical in terms of signal integrity and stability. Taking this into account, our prototype consumes less DC power and occupies less area compared to other differential configurations except [84, 85]. However, [84] excludes the power consumption of the output buffer, which is the most power-hungry block, and achieves a lower transimpedance gain and a higher noise. Also, [85] is driven with a smaller photodiode capacitance and presents a higher noise, lower transimpedance gain, and lower PSRR.

| References | [99] | [71] | [84] | [68] | [81] | [74] | [86] | [85] | [75] | This<br>work |
|---|---|---|---|---|---|---|---|---|---|---|
| TIA topology | R-Fb.<sup>*</sup> | R-Fb.<sup>*</sup> | RGC<sup>†</sup> | R-Fb.<sup>*</sup> | C-M.<sup>‡</sup> | R-Fb.<sup>*</sup> | C-M.<sup>‡</sup> | BFD.<sup>§</sup> | R-Fb.<sup>*</sup> | R-Fb.<sup>*</sup> |
| Fully differential | Yes | No | Yes | Yes | No | No | No | Yes | No | Yes |
| Estimated C<sub>PD</sub> (pF) | 1.5 | 2 | 5 | 7 | 2 | 0.5 | 2 | 0.5 | 1.5 | 1 |
| Bandwidth (MHz) | 300 | 640 | 255 | 230 | 50 | 720 | 110 | 350<sup>¶</sup> | 281 | 340 |
| Transimpedance gain $(\mathrm{dB}\Omega)$ | 87.2 | 78 | 90.4 | 100 | 106 | 76.3 | 100 | 86 | 86 | 99 |
| PSRR (dB) | NA | NA | 40<sup>∥</sup> | NA | NA | NA | NA | 23<sup>**</sup> | NA | >87<sup>††</sup> |
| Input-referred noise $(\mathrm{pA}/\sqrt{\mathrm{Hz}})$ | 5.23<sup>‡‡</sup> | 4.7 | 6.8 | 6.28<sup>‡‡</sup> | 1.52 | 6.3 | 2.21 | 7.5 | 4.68 | 3.67<sup>‡‡</sup> |
| Max output swing (V) | NA | 0.65 |  | NA | $\sim$ | NA | >1.1 | NA | NA | 1 |
| FoM<sup>§§</sup> $(\mathrm{Hz^{1.5}} \times 10^{20})$ | 5.9 | 13.8 | 6.3 | 16.8 | 6.5 | 2.7 | 11 | 1.6 | 5 | 28 |
| Supply (V) | 1.2 | 1.2/3 | 3 | 3.3 | 3.3 | 1.8 | 3.3 | 1.2 | 3.3 | 1.8 |
| Power consumption<br>(mW) | 45 | 114 | 30<sup>¶¶</sup> | 180 | × | 29.8 | 21 | 24 | 200 | 41 |
| Silicon area (mm²) | 1.69<sup>***</sup> | 0.58<sup>***</sup> | 0.046 | 4<sup>***</sup> | 0.28 | 5.5<sup>†††</sup> | 0.35 | 0.022 | 2.2<sup>***</sup> | 0.023 |
| CMOS Process $(\mu m)$ | 0.13 | 0.13 | 0.35 | 0.35 | 0.18 | 0.18 | 0.18 | 0.13 | 0.18 | 0.11 |

Table 3.2: Performance Summary and Comparison with Prior Art

<sup>*</sup>Resistive-Feedback, <sup>†</sup>Regulated Cascode, <sup>‡</sup>Current-Mirror, <sup>§</sup>Boot Strap Fully Differential. <sup>¶</sup>Estimated from 0.7×bit-rate, <sup>∥</sup>@20 MHz, <sup>**</sup>@100 kHz, <sup>††</sup>@ entire BW. <sup>‡‡</sup> $i_{n,in}^{avg} = i_{n,in,rms}/\sqrt{BW_n}$ , <sup>§§</sup>FoM =  $R_T \cdot BW_{-3dB}^2 \cdot C_{PD}/(i_{n,in}/\sqrt{\Delta f})$ . <sup>¶¶</sup>Excluding output buffer, <sup>***</sup>Including pads, <sup>†††</sup>For 16 channels including pads.
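The figure of merit can be recomputed from its definition as a quick consistency check. The Python sketch below uses the values reported for this work in the text (99 dB$\Omega$, 340 MHz, 1 pF, 3.67 pA/$\sqrt{\text{Hz}}$) and the entries of the first comparison column; the function name is hypothetical.

```python
def figure_of_merit(r_t_dbohm, bw_hz, c_pd_farad, noise_density):
    """FoM = R_T * BW^2 * C_PD / (input-referred noise current density).
    R_T is given in dBOhm, noise_density in A/sqrt(Hz)."""
    r_t_ohm = 10.0 ** (r_t_dbohm / 20.0)
    return r_t_ohm * bw_hz ** 2 * c_pd_farad / noise_density

this_work = figure_of_merit(99.0, 340e6, 1e-12, 3.67e-12)
first_column = figure_of_merit(87.2, 300e6, 1.5e-12, 5.23e-12)

print(f"this work:    {this_work / 1e20:.1f} x 1e20")
print(f"first column: {first_column / 1e20:.1f} x 1e20")
```

Both results round to the FoM values listed in Table 3.2 (28 and 5.9, in units of $10^{20}$).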

### 3.5 Conclusion

An analog front-end (AFE) for pulsed ToF Lidar receivers was fabricated and tested in a  $0.11 \,\mu\text{m}$  CMOS process. The chip performance satisfies all the desired specifications discussed at the beginning of the chapter. The implemented prototype is based on a transimpedance-to-noise optimization approach for resistive shunt-feedback TIAs. This optimization yields a front-end FET size for which a significant reduction in area and power consumption is achieved compared to noise-only optimization. At the same time, the noise performance of the TIA with the transimpedance-to-noise optimization approach remains very close to the theoretical minimum noise of the TIA. The approach and the topology used in this work allow the use of fewer stages, which enables an outstanding PSRR performance and obviates the need for offset cancellation circuitry. Also, the analytical derivations in this chapter help designers quickly find the performance boundaries and conditions without circuit implementation and simulation.

### CHAPTER IV

# A New TIA Topology: Push-Pull Regulated Cascode TIA

### 4.1 Introduction

In the previous chapter, different flavors of resistive shunt feedback TIA were used for communication and sensing applications. Another topology that can potentially be used as a TIA is a common gate (CG) amplifier, as its input resistance is typically low and it can therefore provide a wide bandwidth. Fig. 4.1 depicts a CG TIA in which the signal current is applied to the source of the FET. In this configuration, most of the signal current is absorbed by the source of the FET and appears at the same level at the drain of the FET. Therefore, the low frequency transimpedance



Figure 4.1: The common gate (CG) configuration as a potentially low noise wide bandwidth TIA.

gain turns out to be

$$R_T \approx R_D. \tag{4.1}$$

In addition, the low frequency input resistance of the CG TIA is equal to

$$R_i \approx \frac{1}{g_m}.\tag{4.2}$$

Assuming the load has little capacitance, the input resistance of the TIA along with photodetector capacitance determines the bandwidth. Moreover, ignoring the channel length modulation, only the bottom and top resistors contribute to the noise. As a result, the low frequency input-referred noise current spectral density equals

$$i_{n,in}^2 = \frac{4kT}{R_S} + \frac{4kT}{R_D}. \tag{4.3}$$

Also, for the DC power consumption we can write

$$P_{DC} = V_{DD}I. \tag{4.4}$$

In the CG configuration, we prefer to increase  $R_S$  as much as possible for two reasons: first, this prevents the drop in transimpedance gain caused by  $R_S$ , and second, as evident from (4.3), it reduces the input-referred noise current. However, the increase in  $R_S$  is constrained by the voltage headroom consumed by  $R_S$ , that is  $R_S I$ . Nevertheless, the use of  $R_S$  for biasing is inevitable. To utilize  $R_S$  for signal amplification as well, we propose the topology shown in Fig. 4.2, which we call the push-pull common gate (PPCG) by analogy with the push-pull power amplifier. This configuration halves the input resistance, doubling the bandwidth. In fact, assuming the same transconductance for the N and P devices, i.e.,  $g_{mn} = g_{mp} = g_m$ , and  $R_{DP} = R_{DN} = R_D$ , we can list the same parameters obtained for the CG configuration as

$$R_T = \frac{R_D}{2} \tag{4.5}$$

$$R_i = \frac{1}{2g_m} \tag{4.6}$$

$$i_{n,in}^2 = \frac{4kT}{R_{DP}} + \frac{4kT}{R_{DN}} \tag{4.7}$$

when biased with the same current of I. Also, to have the voltage headroom consumed by  $R_{DP}$  and  $R_{DN}$  identical to those of  $R_S$  and  $R_D$ , the supply voltage has to be increased by an overdrive voltage. Allocating  $0.1V_{DD}$  overdrive voltage for  $M_P$ increases the DC power consumption to

$$P_{DC} = (V_{DD} + V_{OD,P})I = 1.1V_{DD}I. \tag{4.8}$$

Therefore, if we take the output at the drain of either  $M_N$  or  $M_P$ , the proposed TIA achieves the same noise with twice the bandwidth and half the transimpedance gain, at the cost of 10% extra power consumption. Of course, if we add the signals at the drains of  $M_N$  and  $M_P$ , the transimpedance gain remains unchanged.

One may argue that doubling the bandwidth can be achieved with the conventional CG without sacrificing the gain. However, to double the bandwidth, the transconductance has to be doubled, which requires a four times higher bias current. In other words, using the long-channel transistor model for the transconductance,  $g_m = \sqrt{2\mu_n C_{ox} \frac{W}{L}I}$ , the current has to be quadrupled to double  $g_m$  and hence the bandwidth. For the same voltage headroom, the resistors must then be four times smaller. Therefore, the specifications obtained earlier for the CG TIA become

$$R_T = R_D \times \frac{1}{4} \tag{4.9}$$

$$R_i = \frac{1}{2g_m} \tag{4.10}$$



Figure 4.2: The schematic of the push-pull common gate (PPCG).



Figure 4.3: The bias current has to be 4 times higher in the conventional CG to provide the same input resistance.

$$i_{n,in}^2 = \frac{4kT}{R_S} \times 4 + \frac{4kT}{R_D} \times 4 \tag{4.11}$$

$$P_{DC} = V_{DD}I \times 4. \tag{4.12}$$

It is therefore obvious that the proposed TIA certainly outperforms the CG TIA.
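The comparison in (4.1)–(4.12) can be tabulated with a short Python sketch. It assumes $R_S = R_D = R$ in the CG stage and $R_{DP} = R_{DN} = R$ in the PPCG stage, so that the noise expressions match as stated in the text. The numerical values of $g_m$, $R$, $V_{DD}$, and $I$ are hypothetical and serve only to illustrate the ratios.

```python
from dataclasses import dataclass

K_BOLTZMANN = 1.380649e-23  # J/K

@dataclass
class TiaSpec:
    name: str
    r_t: float        # transimpedance gain (ohm)
    r_in: float       # low-frequency input resistance (ohm)
    noise_psd: float  # input-referred noise current PSD (A^2/Hz)
    p_dc: float       # DC power consumption (W)

def compare(g_m, r, v_dd, i_bias, temp=300.0):
    """Return CG, PPCG (single drain output), and CG biased at 4x current."""
    four_kt = 4.0 * K_BOLTZMANN * temp
    cg = TiaSpec("CG", r, 1.0 / g_m, 2.0 * four_kt / r, v_dd * i_bias)
    ppcg = TiaSpec("PPCG", r / 2.0, 1.0 / (2.0 * g_m),
                   2.0 * four_kt / r, 1.1 * v_dd * i_bias)
    cg_4x = TiaSpec("CG, 4x current", r / 4.0, 1.0 / (2.0 * g_m),
                    4.0 * 2.0 * four_kt / r, 4.0 * v_dd * i_bias)
    return cg, ppcg, cg_4x

# Hypothetical operating point
cg, ppcg, cg_4x = compare(g_m=10e-3, r=1e3, v_dd=1.2, i_bias=1e-3)
for spec in (cg, ppcg, cg_4x):
    print(f"{spec.name:15s} R_T = {spec.r_t:7.1f} ohm, R_i = {spec.r_in:5.1f} ohm, "
          f"noise PSD = {spec.noise_psd:.2e} A^2/Hz, P_DC = {spec.p_dc * 1e3:.2f} mW")
```

For the same input resistance, the 4x-current CG stage in this sketch has half the transimpedance gain, four times the noise power density, and roughly 3.6 times the DC power of the PPCG stage.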

#### 4.2 Push-Pull Regulated Cascode TIA

While the CG TIA is simple and stable, it is rarely used because of poor noise performance. In other words, increasing  $R_D$  in (4.3) to attempt to reduce the input-



Figure 4.4: The schematic of the regulated cascode (RGC) TIA.

referred noise current is constrained by voltage headroom consumption, unlike the resistive shunt feedback TIA where no DC current flows through the feedback resistor. On the other hand, to increase the bandwidth, it is desirable to increase the DC current to decrease the input resistance, which subsequently imposes further limitations on the voltage headroom consumed by  $R_D$ . To alleviate this, a regulated cascode (RGC) TIA is commonly used [76]. Fig. 4.4 shows the schematic of the RGC TIA. In this configuration, instead of having a constant voltage at the gate of  $M_1$ , an inverted voltage that is proportional to the source voltage of  $M_1$ is provided, reducing the input resistance. In other words, assuming  $M_B$  and  $R_B$ provide an inverting gain of magnitude  $A = g_m R_B$ , it is straightforward to show that

$$R_i = \frac{1}{(1+A)g_m}. \tag{4.13}$$

In fact, the bandwidth of the CG is widened by a factor of (1 + A). Therefore, we can reduce the bias current of the TIA to achieve the same bandwidth as the CG counterpart, allowing more voltage headroom for  $R_D$ .

Likewise, we can apply this technique to the proposed push-pull TIA. Fig. 4.5 depicts an implementation of this idea where the two diodes are used



Figure 4.5: The schematic of the proposed push-pull regulated cascode (PPRGC) TIA.

to provide proper bias voltages for  $M_1$  and  $M_2$ . Similarly, we can readily prove that

$$R_i = \frac{1}{2(1+A)g_m} \tag{4.14}$$

assuming the same transconductance for N and P devices. In this circuit,  $V_{ON}$  and  $V_{OP}$  are added to prevent the transimpedance loss of (4.5). Fig. 4.6 illustrates the implementation of the proposed PPRGC TIA in a CMOS process where the biasing diodes are realized by diode connected FETs.
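The input resistances in (4.2), (4.6), (4.13), and (4.14) differ only by the factors of 2 and $(1 + A)$, which the following Python sketch makes explicit. The values of $g_m$, $A$, and $C_{PD}$ are hypothetical, the function names are ours, and the bandwidth estimate assumes that the input pole dominates, i.e., that the load capacitance is negligible.

```python
import math

def input_resistance(g_m, loop_gain=0.0, push_pull=False):
    """Low-frequency input resistance of CG-type TIAs.
    loop_gain is A, the magnitude of the regulating amplifier gain
    (A = 0 for a plain common gate); push_pull doubles the effective g_m."""
    branches = 2 if push_pull else 1
    return 1.0 / (branches * (1.0 + loop_gain) * g_m)

def input_pole_hz(r_in, c_pd):
    """Bandwidth set by the input pole 1 / (2*pi*R_i*C_PD)."""
    return 1.0 / (2.0 * math.pi * r_in * c_pd)

g_m, a, c_pd = 5e-3, 4.0, 0.5e-12     # hypothetical values
topologies = [
    ("CG", {}),
    ("PPCG", {"push_pull": True}),
    ("RGC", {"loop_gain": a}),
    ("PPRGC", {"loop_gain": a, "push_pull": True}),
]
for name, kwargs in topologies:
    r_in = input_resistance(g_m, **kwargs)
    print(f"{name:6s} R_i = {r_in:6.1f} ohm, input pole = "
          f"{input_pole_hz(r_in, c_pd) / 1e9:.2f} GHz")
```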

### 4.3 Experimental Results

As a proof of concept, a prototype was fabricated in a 130 nm CMOS process. Fig. 4.7 shows the chip photo where the TIA core occupies roughly  $20 \,\mu\text{m} \times 20 \,\mu\text{m}$ in area. We can characterize a TIA with a photodiode simulator [80–82]. Fig. 4.8a shows the network used to convert the voltage to a current. In addition,



Figure 4.6: An implementation for the PPRGC TIA.



Figure 4.7: The PPRGC prototype fabricated in a 130 nm CMOS process. The TIA core occupies about  $20 \,\mu\text{m} \times 20 \,\mu\text{m}$  in area.

an open drain test buffer is used to provide the drive capability for the TIA core as shown in Fig. 4.8b. At a supply voltage of 1.2 V, the chip draws 1.13 mA, consuming 1.36 mW of DC power including the test buffer. Fig. 4.9 shows the output response to a PRBS-31 pattern at data rates of 2 Gbps and 4 Gbps. There is a slight closure in the 4 Gbps eye, which is due to the unequalized loss and dispersion of the long coaxial cables. The output noise was also measured with the same oscilloscope, as shown in Fig. 4.10. The measured standard deviation of the output noise is 646  $\mu$ V. The oscilloscope noise is measured to be 233  $\mu$ V. Therefore, the TIA output noise becomes  $\sqrt{646^2 - 233^2} \mu V = 602 \mu V$ . The low frequency transimpedance gain of the TIA was also measured to be about  $0.5 \, \mathrm{k}\Omega$ . The output noise and the transimpedance



Figure 4.8: (a) An RC network to convert the voltage to current for the test of the TIA (b) the test buffer connected to the output of the core TIA to drive  $50 \Omega$  test equipment.

gain result in an input-referred rms noise current of  $1.2 \,\mu$ A. However, the simulation shows an input-referred noise current of about  $0.7 \,\mu$ A. The discrepancy is attributed to the test buffer, which not only adds extra noise at the output but also decreases the transimpedance gain, increasing the input-referred noise current.
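The noise de-embedding steps can be reproduced in Python. The sketch assumes that the oscilloscope noise and the TIA noise are uncorrelated, so that their variances add; the function name is hypothetical and the numbers are those quoted in the text.

```python
import math

def subtract_instrument_noise(total_rms, instrument_rms):
    """Remove uncorrelated instrument noise in the variance domain."""
    return math.sqrt(total_rms ** 2 - instrument_rms ** 2)

v_total = 646e-6      # measured output noise including the oscilloscope (V rms)
v_scope = 233e-6      # oscilloscope noise floor (V rms)
r_t = 0.5e3           # measured low-frequency transimpedance gain (ohm)

v_tia = subtract_instrument_noise(v_total, v_scope)
i_in = v_tia / r_t

print(f"TIA output noise = {v_tia * 1e6:.0f} uV rms")
print(f"input-referred noise = {i_in * 1e6:.2f} uA rms")
```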





Figure 4.9: (a) Output response to a PRBS data with the data rate of 2 Gbps (b)  $4\,{\rm Gbps}$  .



Figure 4.10: The output noise measured by the histogram of the oscilloscope.

### CHAPTER V

# A CMOS Sensor for Measuring Parasitic Capacitance of On-Chip Photodetectors

### 5.1 Introduction

The parasitic capacitance of photodetectors is a critical parameter that determines the bandwidth and other specifications of transimpedance amplifiers (TIAs). Knowing the precise value of the capacitance allows accurate design and optimization of the TIAs. However, the capacitance measurement of on-chip photodetectors is challenging. Conventionally, the capacitance of the photodetector is characterized by measuring scattering parameters using vector network analyzers. However, the test setup introduces errors, even with calibration, in the measurement of photodetectors that have low capacitance. To reduce this error, large photodetectors are often fabricated and characterized. The capacitance of a smaller photodetector is then estimated based on that of the larger one. However, this process costs a large silicon area and still leaves systematic errors in the measurement. Therefore, it is highly desirable to develop low-cost techniques by which we can measure the actual photodetector capacitance.

Different techniques in the past have been used for capacitance measurement in a wide range of applications such as interconnects [87], mismatch between small capacitors [88], accelerometers [89,90], gyroscopes [91], biological sensors [92,93], humidity sensors [94], and displacement sensors [95]; however, no work has been reported for the on-chip measurement of the capacitance of photodetectors. In this dissertation, a versatile CMOS sensor is proposed that outputs a DC voltage that is proportional to the capacitance of the photodetector. The sensor can measure the capacitance of the photodetectors with either cathode or anode connected to the TIA.

Fig. 5.1a shows a shunt feedback TIA with a common anode (CA) photodetector. This configuration is used when the DC voltage provided by the shunt feedback resistor is sufficiently high to fully reverse bias the photodetector. The parasitic capacitance of the photodetector consists of the junction capacitance of the diode  $(C_J)$ and the capacitance of the cathode terminal to the substrate  $(C_M)$ , which includes the n-type fingers and the corresponding metal capacitance to the substrate. The photodetector capacitance that directly impacts the TIA bandwidth and noise is  $C_J + C_M$ . If the DC voltage provided by the TIA is not enough to completely reverse bias the photodetector, a common cathode (CC) configuration with an external voltage of  $V_{PD}$ , as shown in Fig. 5.1b, should be used. Likewise,  $C_J + C_M$  in this configuration impacts the TIA bandwidth and noise, as  $V_{PD}$  is an ac ground. Therefore, in either case, the ultimate goal is to measure  $C_{PD} = C_J + C_M$  precisely.

#### 5.2 Photodetector Capacitance Meter

Fig. 5.2a and Fig. 5.2b depict the configurations to measure the capacitance of a CA and a CC photodetector, respectively, in which  $\phi_1$  and  $\phi_2$ , shown in Fig. 5.3, are two non-overlapping periodic voltages generated by a clock source with a period of  $T_{CK}$ . When  $\phi_1 = \phi_2 = 0$ , the photodetector capacitors are charged by  $V_{ref}$ . Conversely, when  $\phi_1 = \phi_2 = V_{ref}$ , the photodetector capacitors are discharged. The voltage levels of  $C_J$  and  $C_M$  of Fig. 5.2b can be seen in Fig. 5.3. For this configuration



Figure 5.1: (a) A shunt feedback TIA with a common anode (CA) photodetector (b) a shunt feedback TIA with a common cathode (CC) photodetector.

we can write the average power dissipated on the  $V_{ref}$  as

$$\begin{aligned} P_{avg} = V_{ref} I_{avg} &= \frac{1}{T_{CK}} \int_{0}^{T_{CK}} V_{ref} I_{ref}(t)\, dt \\ &= \frac{1}{T_{CK}} \int_{t_1}^{t_2} V_{ref} \left(C_J \frac{dV_{CJ}}{dt} + C_M \frac{dV_{CM}}{dt}\right) dt = f_{CK} C_{PD} V_{ref}^2. \end{aligned} \tag{5.1}$$

Therefore, we can find the photodetector capacitance as

$$C_{PD} = \frac{I_{avg}}{f_{CK}V_{ref}}. \tag{5.2}$$

For the CA configuration, we can simply set  $V_{PD} = 0$  which results in the same capacitance obtained in (5.2).
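The relation $I_{avg} = f_{CK} C_{PD} V_{ref}$ behind (5.2) can be checked numerically. The Python sketch below is a simplified model and not the fabricated circuit: it assumes the total capacitance $C_{PD}$ starts each cycle fully discharged and is charged from $V_{ref}$ through a hypothetical switch resistance during half of the clock period. The function names and the 1 k$\Omega$ switch resistance are assumptions.

```python
import math

def average_reference_current(c_pd, v_ref, f_ck, r_switch=1e3, steps=20_000):
    """Integrate the exponential charging current over the charging phase
    (midpoint rule) and average it over one clock period."""
    t_ck = 1.0 / f_ck
    t_charge = t_ck / 2.0          # assumed duration of the charging phase
    dt = t_charge / steps
    tau = r_switch * c_pd
    charge = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        charge += (v_ref / r_switch) * math.exp(-t / tau) * dt
    return charge / t_ck

def capacitance_from_current(i_avg, f_ck, v_ref):
    """Equation (5.2): C_PD = I_avg / (f_CK * V_ref)."""
    return i_avg / (f_ck * v_ref)

i_avg = average_reference_current(c_pd=20e-15, v_ref=1.0, f_ck=100e6)
c_est = capacitance_from_current(i_avg, f_ck=100e6, v_ref=1.0)
print(f"I_avg = {i_avg * 1e6:.3f} uA, recovered C_PD = {c_est * 1e15:.2f} fF")
```

As long as the capacitor settles within the charging phase, the recovered capacitance in this model does not depend on the switch resistance.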



Figure 5.2: The test setup used to measure the parasitic capacitance of a (a) CA photodetector (b) CC photodetector.



Figure 5.3: Non-overlapping voltages generated by a clock source with the frequency of  $f_{CK}$ .

While in this technique we can obtain the unknown capacitance of the photodetector by measuring the average current drawn from  $V_{ref}$  and knowing the clock frequency and the reference voltage, it is hard to measure the average current on the chip. It is therefore desirable to convert this average current to a voltage. As shown in Fig. 5.4, the proposed circuit to convert the average current to a voltage for both configurations is a TIA with a very low bandwidth. Assuming a very high DC gain for the core amplifier  $(A_0 \gg 1)$ , it is straightforward to show that

$$I_{avg} \approx \frac{V_{out} - V_{ref}}{R_F}. \tag{5.3}$$

Substituting (5.3) into (5.2) yields

$$C_{PD} \approx \frac{V_{out} - V_{ref}}{f_{CK} V_{ref} R_F}. \tag{5.4}$$

The converted voltage is then followed by a unity gain buffer. The buffer provides a constant and known load with little capacitance for the TIA. As a result, the stability of the TIA is ensured.

It is of utmost importance for the differential core amplifier of the TIA to have a very high DC gain to make sure that the negative terminal of the amplifier has almost the same voltage as  $V_{ref}$ . Otherwise, a systematic error is introduced in the capacitance measurement. Fig. 5.5 shows the schematic of the core amplifier. Seven identical FETs with maximum channel length are connected in series to boost the output resistance and subsequently the DC gain. Also, the gate of the NFET tail current source,  $M_B$ , is connected to the gate of the diode-connected PFET devices. The negative feedback formed in this way sets a proper bias current, obviating the need for an external current source. Fig. 5.6 shows the frequency response of the amplifier over the process and temperature corners, indicating a gain of about 60 dB.

The offset of the amplifier can also introduce a random error. To alleviate this issue, the FETs are interdigitated in the layout and are sized large to reduce the mismatch. Also, the series connection of the FETs, used to increase the output resistance, further reduces the random mismatch. As shown in Fig. 5.7, the input-referred offset voltage distribution of the amplifier is simulated using Monte Carlo analysis. It is evident that the  $3\sigma$  offset is within  $\pm 3$  mV.

Fig. 5.8 demonstrates the transient output voltage of the implemented circuit



Figure 5.4: (a) Shunt feedback TIA and the first post amplifier (PA1) (b) the simplified model of the TIA.

with  $R_F = 100 \text{ k}\Omega$ . For this simulation, the clock frequency is set to 100 MHz while  $V_{ref} = 1 \text{ V}$ . In this simulation, a known capacitance of 20 fF is used. Plugging the output voltage at the steady state, that is 1.2 V, into (5.4) results in the same capacitance of 20 fF. Fig. 5.9 shows the distribution of the  $C_{PD}$  value over process and mismatch. The mean value of the capacitance is 20.5 fF while the standard deviation is about 1 fF. This gives a  $\pm 3\sigma$  spread of  $\pm 3$  fF, which is roughly  $\pm 15\%$  of the photodetector capacitance. The amplifier has little impact on this uncertainty owing to the high DC gain and low input-referred offset voltage, as discussed earlier. However, the feedback resistor in the fabrication process has a tolerance of  $\pm 15\%$ , which is indeed the primary source of uncertainty in the proposed capacitance meter.
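The readout relation (5.4) and the effect of the feedback resistor tolerance can be illustrated with a short Python sketch. The simulation values ($R_F = 100$ k$\Omega$, $f_{CK} = 100$ MHz, $V_{ref} = 1$ V, $V_{out} = 1.2$ V) are those quoted above; the function names are hypothetical, and the tolerance model simply assumes that the actual $R_F$ deviates from the nominal value used in the calculation.

```python
def c_pd_from_vout(v_out, v_ref, f_ck, r_f):
    """Equation (5.4): C_PD = (V_out - V_ref) / (f_CK * V_ref * R_F)."""
    return (v_out - v_ref) / (f_ck * v_ref * r_f)

def estimated_c_pd(c_true, r_f_actual, r_f_assumed, v_ref=1.0, f_ck=100e6):
    """Capacitance reported when the actual R_F differs from the nominal
    value assumed in (5.4)."""
    v_out = v_ref + f_ck * c_true * v_ref * r_f_actual
    return c_pd_from_vout(v_out, v_ref, f_ck, r_f_assumed)

c_nominal = c_pd_from_vout(v_out=1.2, v_ref=1.0, f_ck=100e6, r_f=100e3)
c_high = estimated_c_pd(20e-15, r_f_actual=115e3, r_f_assumed=100e3)
c_low = estimated_c_pd(20e-15, r_f_actual=85e3, r_f_assumed=100e3)

print(f"nominal readout: {c_nominal * 1e15:.1f} fF")
print(f"R_F +15%: {c_high * 1e15:.1f} fF, R_F -15%: {c_low * 1e15:.1f} fF")
```

In this model, a $\pm 15\%$ error in $R_F$ translates directly into a $\pm 15\%$ error in the reported capacitance.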



Figure 5.5: The schematic of the differential amplifier used in the proposed capacitance meter circuit.



Figure 5.6: The frequency response of the amplifier over the process and temperature variations.



Figure 5.7: The input-referred offset voltage distribution of the amplifier.



Figure 5.8: The transient output voltage of the proposed photodetector capacitance meter.



Figure 5.9: The photodetector capacitance distribution over the process and mismatch variations. A  $3\sigma \approx \pm 15\%$  is achieved.

### CHAPTER VI

### **Reflection-based Short Pulse Generation in CMOS**

### 6.1 Introduction

In conventional pulse generation techniques, the pulse width is limited by the switching time of the transistors. This is because in these approaches the transistors are required to turn ON and OFF to generate the output pulses. For instance, an XOR gate with square wave inputs that are delayed with respect to each other can generate a rectangular pulse whose width is equal to the delay. However, the shortest pulse width is inversely proportional to the -3dB bandwidth of the XOR gate, which is far less than  $f_T$  of the process. This becomes more severe when the pulse needs to be delivered into a low-impedance load such as 50  $\Omega$ .
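The XOR-based scheme can be illustrated with an idealized logic-level Python sketch. It ignores the finite bandwidth of the gate entirely and works on sampled bits; the period of 20 samples and the delay of 3 samples are hypothetical.

```python
def square_wave(n_samples, period):
    """Logic-level square wave with 50% duty cycle (period in samples)."""
    return [1 if (k % period) < period // 2 else 0 for k in range(n_samples)]

def delayed(bits, delay):
    """Delay a bit sequence by `delay` samples, padding the start with 0."""
    return [0] * delay + bits[:len(bits) - delay]

def run_lengths_of_ones(bits):
    """Lengths of every consecutive run of 1s in the sequence."""
    runs, count = [], 0
    for b in bits:
        if b:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs

a = square_wave(100, 20)
b = delayed(a, 3)
pulses = [x ^ y for x, y in zip(a, b)]      # ideal XOR gate
widths = run_lengths_of_ones(pulses)
print(f"{len(widths)} pulses, widths (samples): {sorted(set(widths))}")
```

Every edge of the input produces one pulse, and each pulse is as wide as the delay between the two inputs.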

In order to generate pulses beyond the process switching time, a non-linear transmission line can be used to form a soliton [96]. Likewise, non-linear transmission lines can be extended to a two-dimensional lattice to compress the pulse further for sharper pulse generation [97]. In [98], a short pulse is generated by interfering several frequency



Figure 6.1: An XOR gate can generate a pulse whose width is determined by the delay.

harmonics beyond  $f_{max}$  of the process. While the aforementioned techniques generate decent pulses, they occupy a large chip area. Moreover, the use of a high frequency VCO/PLL at the input of these circuits is inevitable, which results in high power consumption. In addition, there are other techniques for short pulse generation, such as the spark gap inspired radiator [99]. However, these methods are not suitable for non-radiating applications.

In this chapter, we introduce a technique inspired by time domain reflectometry (TDR) from a short-circuited termination to generate short pulses that can efficiently be delivered to low impedance loads. In this method, a transistor is used to pump current into a transmission line to form the forward-traveling edge of the pulse. This edge travels along the transmission line and reflects back from the ac-ground termination at the end of the line with a  $180^{\circ}$  phase shift. The interference of the forward- and backward-traveling edges results in a pulse, and the pulse width is ideally set by the length of the transmission line. The reflection-based short pulse generation (RSPG) technique can be used to deliver the short pulse into a resistive on-chip or off-chip load, an antenna, or another stage such as a power combiner for amplitude amplification. As a proof of concept, we fabricated two chips with two different lengths of transmission lines in a 0.11 µm low-cost CMOS process.

The rest of the chapter is organized as follows. Section 6.2 reviews the theory and implementation of the reflection-based short pulse generation (RSPG) technique. Section 6.3 presents the measurement results. Section 6.4 concludes the chapter.

### 6.2 Reflection-based Short Pulse Generation (RSPG) Theory and Implementation

In the well-known technique of TDR, a particular waveform is formed depending on the transmission line characteristics and the termination impedance or the discontinuities in the line. One of the ideal situations in which a rectangular pulse can be formed is illustrated in Fig. 6.2a. In this schematic, once the switch is closed, a voltage step develops at the input of the line, set by the voltage divider formed by the series impedance of the DC voltage source and the characteristic impedance of the transmission line. This step travels along the line and reflects back in the opposite direction with an identical amplitude but a negative sign due to the ac-ground termination. If the source resistance is equal to the transmission line characteristic impedance, no additional reflection is formed. Therefore, ideally, a rectangular pulse with  $V_{\rm DC}/2$  amplitude is generated. Moreover, the pulse width, which is the round-trip time, is given by

$$\tau = 2\frac{\ell}{v_p} \tag{6.1}$$

where  $\ell$  is the length of the transmission line and  $v_p$  is the wave propagation velocity.
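Equation (6.1) can be evaluated with a small Python sketch. The sketch assumes a TEM-like line with $v_p = c_0/\sqrt{\varepsilon_{\text{eff}}}$ and uses a hypothetical effective permittivity of 4 and a hypothetical line length of 1 mm; neither value is taken from the fabricated chips.

```python
C0 = 299_792_458.0  # speed of light in vacuum (m/s)

def round_trip_pulse_width(length_m, eps_eff):
    """Ideal pulse width tau = 2*l/v_p, with v_p = c0/sqrt(eps_eff)."""
    v_p = C0 / eps_eff ** 0.5
    return 2.0 * length_m / v_p

def line_length_for_width(tau_s, eps_eff):
    """Line length needed for a target ideal pulse width."""
    v_p = C0 / eps_eff ** 0.5
    return tau_s * v_p / 2.0

tau = round_trip_pulse_width(length_m=1e-3, eps_eff=4.0)
length = line_length_for_width(tau_s=10e-12, eps_eff=4.0)
print(f"1 mm line: tau = {tau * 1e12:.2f} ps")
print(f"10 ps pulse: length = {length * 1e3:.3f} mm")
```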

Although this scheme can generate a voltage pulse at the output node $V_{\text{out}}$, it is unable to deliver it efficiently to low-impedance loads such as 50 $\Omega$. With a minor change at the input source, the generated pulse can be delivered to a low-impedance load. Fig. 6.2b shows a scheme in which a current source with a matched shunt resistor is used instead of a voltage source in series with a matched resistor. Here, $R_s$ can serve as the low-impedance load, and the pulse level across it would ideally be $Z_0 I_{\text{DC}}/2$.

Figure 6.2: Pulse generation due to an ac-ground termination with: (a) voltage source input in series with a resistor; (b) current source input in parallel with a resistor.

Fig. 6.3a depicts a circuit realization of Fig. 6.2b. Whenever $M_{\rm RSPG}$ is OFF (ON) and a rising edge (falling edge) step voltage with amplitude greater than the transistor threshold voltage is applied to the gate, the transistor serves as a switched current source. The exact implementation of Fig. 6.2b requires a PMOS transistor as the current switch. However, an n-type transistor is faster and hence preferred. This means we need to bias the transistor drain using a $V_{\rm DD-RSPG}$ supply at the end of the transmission line, which acts as an AC ground at the same time. A DC-blocking capacitor can be used to remove the DC offset of the pulse. Also, a piece of transmission line is needed to carry the generated pulse from the drain of $M_{\rm RSPG}$ to $R_s$, which is a low-impedance load. It should be noted that all the lines should have a characteristic impedance equal to $R_s$ to avoid undesired reflections.



Figure 6.3: (a) Circuit realization of Fig. 6.2b, (b) the current injected by  $M_{\rm RSPG}$ ,  $I_{\rm inj}$ , and (c) simple AC modeling of the pulse amplitude and width produced by the RSPG technique where realistically rise-time of the  $I_{\rm inj}$  is non-zero.

The width of the pulse produced by the RSPG technique is ideally determined by the length of the transmission line, $\ell$. In reality, however, the pulse is wider than (6.1) predicts, because the current injected by $M_{RSPG}$, $I_{inj}$, has a non-zero rise-time/fall-time. Fig. 6.3b shows a simple AC model of $I_{inj}(t)$ when a falling step voltage is applied to the input of $M_{RSPG}$. Also, Fig. 6.3c depicts the interference of the incident voltage, $V_{inc}(t)$, and the reflected voltage, $V_{ref}(t)$, at the drain of $M_{RSPG}$. Under the reflection-free condition $R_s = Z_0$, we can write

$$V_{\rm inc}(t) = Z_0 \frac{I_{\rm inj}(t)}{2} \tag{6.2}$$

and  $I_{\rm inj}(t) = (I_{\rm M}/t_r)t$  for  $t \leq t_r$  where  $t_r$  and  $I_{\rm M}$  are the rise-time and the maximum of  $I_{\rm inj}$ , respectively. For  $t > t_r$ , the injected current becomes constant and equal to  $I_{\rm M}$ . As a result,  $V_{\rm inc}(t) + V_{\rm ref}(t)$  yields the total voltage at the drain of  $M_{\rm RSPG}$ . We can then find the amplitude of the pulse for  $\tau \leq t_r$ ,

$$V_{\text{pulse}} = Z_0 I_{\text{M}} \frac{\tau}{2t_r} \tag{6.3}$$

and the FWHM of the pulse

$$\tau' = t_r. \tag{6.4}$$

The model above predicts that increasing $\ell$ increases the pulse amplitude while keeping the FWHM of the pulse constant. However, the model does not take into account second-order effects such as the frequency-dependent dielectric/conductor loss and the phase behavior of the transmission line. In reality, a pulse generated with a longer line undergoes higher attenuation and dispersion, resulting in a slightly wider pulse.

Also, we can observe that for  $\tau > t_r$ ,

$$V_{\rm pulse} = Z_0 \frac{I_{\rm M}}{2} \tag{6.5}$$

and

$$\tau' = \tau. \tag{6.6}$$

The above equation suggests that the FWHM of the pulse can be tuned arbitrarily by adjusting $\ell$, as mentioned earlier in the chapter. Nevertheless, (6.6) is always greater than (6.4) (note the $\tau > t_r$ condition), and since in this work we intend to use the technique to produce the shortest pulse with sufficient amplitude in a given process, we have chosen $\ell$ such that the associated $\tau$ is less than $t_r$.
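The two regimes of the first-order model can be checked numerically. The sketch below builds the ramp-shaped $I_{\rm inj}(t)$, forms $V_{\rm inc}(t)$ from (6.2), adds a reflected edge that is delayed by $\tau$ and inverted, and then measures the amplitude and FWHM of the sum. The values $Z_0 = 50\,\Omega$, $I_{\rm M} = 10$ mA, and the chosen $\tau$ and $t_r$ are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def rspg_pulse(tau, t_r, z0=50.0, i_m=10e-3, n=200_001):
    """Drain voltage of the first-order RSPG model (lossless, matched line)."""
    t = np.linspace(0.0, 2.0 * (tau + t_r), n)
    # Injected current: linear ramp up to I_M over t_r, then constant.
    i_inj = i_m * np.clip(t / t_r, 0.0, 1.0)
    v_inc = z0 * i_inj / 2.0  # eq. (6.2)
    # Reflected edge: same shape, delayed by tau, inverted by the AC ground.
    v_ref = -z0 * i_m * np.clip((t - tau) / t_r, 0.0, 1.0) / 2.0
    return t, v_inc + v_ref


def amplitude_and_fwhm(t, v):
    """Peak value and full width at half maximum of a single pulse."""
    peak = v.max()
    above = t[v >= peak / 2.0]
    return peak, above[-1] - above[0]


# Regime tau <= t_r: expect eq. (6.3) and eq. (6.4).
peak_a, fwhm_a = amplitude_and_fwhm(*rspg_pulse(tau=4e-12, t_r=10e-12))
# Regime tau > t_r: expect eq. (6.5) and eq. (6.6).
peak_b, fwhm_b = amplitude_and_fwhm(*rspg_pulse(tau=20e-12, t_r=10e-12))
print(peak_a, fwhm_a, peak_b, fwhm_b)
```

For the assumed values, the first case should reproduce $Z_0 I_{\rm M}\tau/(2t_r)$ with an FWHM of $t_r$, and the second $Z_0 I_{\rm M}/2$ with an FWHM of $\tau$.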

Equations (6.3) and (6.4) reveal that a steep step voltage at the gate of $M_{\rm RSPG}$ is desired to shorten $t_r$, so as to increase $V_{\rm pulse}$ and decrease $\tau'$. In addition, we would like $t_r$ to have the least sensitivity to the input step voltage or square wave. An amplifier preceding $M_{\rm RSPG}$ can reduce the rise-time of the input if the input is slow. Fig. 6.4a depicts the designed amplifier, in which the bridged-shunt-series peaking gain-bandwidth extension technique [100] is used to sharpen the edge. A custom finger capacitor and transmission-line-based inductors are used in the amplifier to achieve a high quality factor at higher frequencies.

Likewise, cascading multiple amplifier stages can improve the total gain-bandwidth product. It can be shown that for a total gain of $A_{tot}$, the number of stages that yields the maximum bandwidth for the whole chain is $N_{opt} = 2 \ln A_{tot}$ [101]. The bandwidth increases monotonically from $N = 1$ to $N = N_{opt}$, but with diminishing returns: the improvement from $(N_{opt} - 1)$ to $N_{opt}$ is smaller than that from $(N_{opt} - 2)$ to $(N_{opt} - 1)$. Simulations demonstrate that cascading 6 stages of the amplifier shown in Fig. 6.4a yields the best transient response. The response can still be improved by going beyond 6 stages; however, the difference is slight, whereas the additional power consumption and area are significant. Fig. 6.6a shows the small-signal frequency response of the 6-stage amplifier under $V_{DD-AMP} = 1.2$ V and $V_{inp,DC} \approx 0.62$ V conditions, achieving a unity-gain frequency of 75 GHz ($f_T \approx 80$ GHz). Fig. 6.6b shows the transient voltage at the gate of $M_{RSPG}$, which is almost the same for square-wave inputs to the amplifier chain with fast and slow rise-time/fall-time.
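The $N_{opt} = 2 \ln A_{tot}$ rule and the diminishing returns near the optimum can be illustrated with a simple model. The sketch below assumes $N$ identical single-pole stages with a fixed per-stage gain-bandwidth product, which is a simplification: the actual stages use inductive peaking. The total gain of 20 is a hypothetical value chosen for illustration, not a figure from this work.

```python
import math


def chain_bandwidth(a_tot, n, gbw=1.0):
    """-3 dB bandwidth of n identical single-pole stages with total gain a_tot.

    Each stage has gain a_tot**(1/n) and bandwidth gbw / gain; cascading n
    such stages shrinks the bandwidth by sqrt(2**(1/n) - 1).
    """
    stage_gain = a_tot ** (1.0 / n)
    stage_bw = gbw / stage_gain
    return stage_bw * math.sqrt(2.0 ** (1.0 / n) - 1.0)


def best_stage_count(a_tot, n_max=20):
    """Integer stage count that maximizes the chain bandwidth."""
    return max(range(1, n_max + 1), key=lambda n: chain_bandwidth(a_tot, n))


a_tot = 20.0                      # hypothetical total gain (assumption)
n_rule = 2.0 * math.log(a_tot)    # N_opt = 2 ln(A_tot)
n_best = best_stage_count(a_tot)
print(f"2 ln(A_tot) = {n_rule:.2f}, brute-force optimum = {n_best}")
for n in range(3, 9):
    print(n, round(chain_bandwidth(a_tot, n), 4))
```

In this simplified model the bandwidth curve is very flat around the optimum, which is consistent with the observation that adding stages beyond the optimum brings little benefit for the extra power and area.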



Figure 6.4: (a) The amplifier used for the sharpening of the input and (b) the schematic of the Reflection-based Short Pulse Generator (RSPG).


Figure 6.5: The die photographs of the fabricated chips.

The final implementation of the RSPG is shown in Fig. 6.4b. As discussed earlier, $\ell$ mainly determines the pulse characteristics. Thus, to provide a strong reflection from the $V_{\rm DD-RSPG}$ node, it is essential to make the connection pad a solid AC ground. For this purpose, a bank of MOS and MIM capacitors is used to bypass the inductance introduced by the bonding wire. The locations of the capacitors are highlighted in Fig. 6.5.

## 6.3 Measurement Results

Two chips are fabricated in a 0.11 µm CMOS process with $f_T \approx 80 \text{ GHz}$. The die photographs are shown in Fig. 6.5 for $\ell = 150 \text{ µm}$ and $\ell = 300 \text{ µm}$. The chip area is $1160 \times 540 \text{ µm}^2$ including the decoupling capacitor bank, pads, and seal-rings. Fig. 6.7b shows the test setup used for the measurement. Since $M_{\text{RSPG}}$ in Fig. 6.4b acts as a current source, $V_{\text{DD-RSPG}}$ does not significantly change the current level. Thus, in the measurement the supply is set to 1 V to reduce DC power consumption. On the other hand, $V_{\rm DD-AMP}$ affects the performance by changing the effective slope of the rising or falling edges at the gate of $M_{\rm RSPG}$, so we set $V_{\rm DD-AMP}$ to 1.2 V. Given these values, the chips consume 86 mW for a pulse repetition rate of 4 GHz.

Figure 6.6: (a) Frequency response of the 6 amplifiers used to sharpen the input, and (b) transient voltage at the gate of $M_{\rm RSPG}$, responding to square waves with $t_r = t_f = 40$ ps and $t_r = t_f = 200$ ps, respectively. Thanks to the wideband amplifier chain, there is almost no difference at the output of the amplifiers when square waves with fast or slow rise-time/fall-time are applied.

The input waveform is produced using a Keysight M8195A Arbitrary Waveform Generator (AWG). The rise/fall time of the input was set to 40 ps, although the performance of the chip is not affected by this value, as the 6 internal amplifiers sharpen the signal that is applied to the gate of $M_{\rm RSPG}$. The AWG is connected by a K-band cable to the input probe (Cascade GSG 67 GHz). On the other side, the output probe (Cascade GSG 67 GHz) is connected via a 3-feet V-band cable (67 GHz) to an Anritsu bias tee (50 kHz-65 GHz), which removes the DC component. A Keysight Digital Communication Analyzer (DCA-X 86100D) wide-bandwidth oscilloscope is used to display and measure the pulse characteristics.

Figure 6.7: Measurement of the pulse generator using a probe station. The long cable at the output introduces a great amount of dispersion and attenuation.

Figure 6.8: A single pulse captured for the chip with $\ell = 150 \,\mu\text{m}$. The amplitude and FWHM are $\approx 0.21$ V and $\approx 13 \,\text{ps}$, respectively. The pulse is highly attenuated and dispersed mainly due to the 3-feet V-band cable used for the pulse measurement.

Fig. 6.8 displays one of the pulses measured on the chip with $\ell = 150 \,\mu\text{m}$. With the mentioned setup, the full width at half maximum (FWHM) and the amplitude of the pulse are measured to be 13 ps and 210 mV, respectively. In addition, the frequency of the input waveform is varied and the output pulse train is captured. Fig. 6.9a and Fig. 6.9b show the pulse train for repetition rates of 4 GHz and 8 GHz, respectively. The amplitude and width of the pulse remain almost constant.
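As a rough cross-check of these numbers, the sketch below converts the measured 210 mV amplitude into peak power and relates it to the 86 mW DC consumption quoted above. It assumes the pulse is developed across a 50 $\Omega$ load (an assumption; the load seen by the pulse in the measurement chain is not stated explicitly here), and the function names are illustrative.

```python
def peak_power_mw(amplitude_v, load_ohm=50.0):
    """Peak instantaneous power in mW of a voltage pulse across a resistive load."""
    return 1e3 * amplitude_v ** 2 / load_ohm


def dc_to_peak_efficiency_pct(peak_mw, dc_mw):
    """Ratio of peak pulse power to DC power consumption, in percent."""
    return 100.0 * peak_mw / dc_mw


p_peak = peak_power_mw(0.21)                     # 210 mV, 50-ohm load assumed
eff = dc_to_peak_efficiency_pct(p_peak, 86.0)    # 86 mW at 4 GHz repetition rate
print(f"peak power = {p_peak:.2f} mW, DC-to-peak efficiency = {eff:.2f} %")
```

Under the 50 $\Omega$ assumption this gives about 0.88 mW and roughly 1 %, in line with the pre-de-embedding entries for $\ell = 150\,\mu\text{m}$ in Table 6.1.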

Unfortunately, multiple blocks with limited cutoff frequencies on the output route of the test setup reduce the amplitude and increase the pulse width. The major degradation is caused by the 3-feet V-band cable, which presents frequency-dependent loss [102]. To extract the real signal, we performed the following steps. The insertion loss of the 3-feet V-band cable and the DC blocker was measured with a wideband signal source/spectrum analyzer and a VNA. In addition, the time-domain impulse response of the DCA was measured and transformed to the frequency domain. Moreover, the probe was characterized by the scattering parameters provided in its datasheet. Fig. 6.10a illustrates the transfer function of the output fixture that limits the measurement. Applying the inverse Fourier transform to the product of the measured data and



Figure 6.9: Output response for a repetition rate of: (a) 4 GHz and (b) 8 GHz. The amplitude and pulse width remain almost constant as the repetition frequency changes.



Figure 6.10: (a) Frequency response of the probe, 3-feet V-band cable, DC blocker, and the DCA. This is obtained by multiplying the FFT of the measured impulse response of the DCA by the measured and fitted transfer function of the rest of the fixture. (b) Estimated pulses after de-embedding.

|                                  | This work $\ell = 300\,\mu\text{m}$ | This work $\ell = 150\,\mu\text{m}$ | [96]             | [97]                   | [98]                       |
|----------------------------------|-------------------------------------|-------------------------------------|------------------|------------------------|----------------------------|
| FWHM (ps)                        | $16^{*}~(8.8^{\dagger})$            | $13^{*}~(6.8^{\dagger})$            | $23^{*}$         | $9.6~(6.3^{\dagger})$  | 2.6                        |
| Pulse Peak Power (mW)            | $3.7^{*}~(19.2^{\dagger})$          | $0.88^{*}~(7.7^{\dagger})$          | 18               | 145.8                  | 0.46                       |
| Measurement Domain               | Time                                | Time                                | Time             | Time                   | Frequency                  |
| Pulse Generation Technique       | Reflection-based                    | Reflection-based                    | Nonlinear T-Line | Lattice                | Interfering traveling wave |
| Power Consumption (mW)           | $146^{\ddagger}$                    | $86^{\ddagger}$                     | N/A              | N/A                    | 435                        |
| DC-to-Peak RF Efficiency (%)     | $2.5^{*}~(13.1^{\dagger})$          | $1^{*}~(8.95^{\dagger})$            | N/A              | N/A                    | 0.1                        |
| Area (mm$^2$)                    | 0.63                                | 0.63                                | N/A              | 9                      | 2.53                       |
| Technology                       | 0.11 µm CMOS                        | 0.11 µm CMOS                        | 0.18 µm SiGe     | 0.13 µm CMOS           | 65 nm-LP CMOS              |

Table 6.1: Table of Comparison

\*Limited by the test set-up. <sup>†</sup>After de-embedding. <sup>‡</sup>Measured @ 4 GHz. P<sub>DC</sub> is reduced with decreasing the input repetition frequency.

the inverse of the transfer function of the fixture in the frequency domain yields the de-embedded signal in the time domain. Fig. 6.10b shows the estimated pulses after de-embedding for $\ell = 150 \,\mu\text{m}$ and $\ell = 300 \,\mu\text{m}$. It is worth mentioning that the passive structures were modeled in a 3D-EM CAD tool from DC to 800 GHz, and the simulation results are consistent with the de-embedding. Table 6.1 summarizes the performance achieved by the RSPG technique.
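The de-embedding procedure can be sketched as a frequency-domain division. The example below is a minimal illustration on synthetic data, not the processing actually used for the measurements: the fixture is a hypothetical single-pole low-pass response, the "true" pulse is a Gaussian, and the small regularization term `eps` is an addition of this sketch that keeps noise from being amplified where the fixture response is small.

```python
import numpy as np


def deembed(measured, fixture_impulse, eps=1e-6):
    """Divide out the fixture response in the frequency domain.

    eps is a Tikhonov-style regularization added in this sketch (assumption);
    eps -> 0 recovers the plain inverse of the fixture transfer function.
    """
    n = len(measured)
    spectrum = np.fft.rfft(measured)
    h = np.fft.rfft(fixture_impulse, n)
    h_inv = np.conj(h) / (np.abs(h) ** 2 + eps * np.max(np.abs(h)) ** 2)
    return np.fft.irfft(spectrum * h_inv, n)


# Synthetic demo: a narrow Gaussian pulse smeared by a low-pass fixture.
n = 1024
k = np.arange(n)
true_pulse = np.exp(-0.5 * ((k - 200) / 4.0) ** 2)
fixture = 0.1 * 0.9 ** k  # hypothetical single-pole impulse response
measured = np.fft.irfft(np.fft.rfft(true_pulse) * np.fft.rfft(fixture), n)
recovered = deembed(measured, fixture, eps=1e-12)

print("peak before de-embedding:", measured.max())
print("peak after de-embedding: ", recovered.max())
```

In this toy case the "measured" pulse is lower and wider than the true one, and dividing by the fixture response restores it; with real data, measurement noise limits how small `eps` can be made.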

## 6.4 Conclusion

Two proof-of-concept prototypes were fabricated in a low-cost $0.11 \,\mu\text{m}$ CMOS process to demonstrate a low-cost, compact, and systematic method for producing a pulse with short width and high amplitude. The pulse can be efficiently delivered to low-impedance loads such as a resistive on-chip or off-chip load, an antenna, or another stage to boost the amplitude. Therefore, with this technique, multiple power combiners can be systematically used to increase the amplitude without compromising the pulse width. The first-order mathematical model presented in this chapter for estimating the pulse width and amplitude is highly consistent with simulations and experimental results.

## CHAPTER VII

# An Energy Efficient Fully Integrated 20 Gbps OOK Wireless Transmitter at 220 GHz

## 7.1 Introduction

Based on Shannon's channel capacity theorem, bandwidth and signal power determine the channel capacity limit. To achieve a high data-rate, a high bandwidth is necessary but not sufficient. This is because at higher bandwidths, the channel capacity limit is proportional to the signal power level. As a result, generating higher signal power at higher frequencies is of the utmost importance. However, power generation in mm-wave/terahertz bands in silicon is very challenging. In addition, modulation loss degrades the data-rate even more. In this regard, any reduction in loss and enhancement in the signal power helps to raise the data rate.
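The statement that capacity becomes proportional to signal power at large bandwidths follows from the wideband limit of $C = B \log_2\!\left(1 + P/(N_0 B)\right)$, which tends to $P/(N_0 \ln 2)$. The sketch below illustrates this numerically; the received power and noise density are arbitrary illustrative values, not figures from this work.

```python
import math


def capacity_bps(bandwidth_hz, signal_w, n0_w_per_hz):
    """Shannon capacity of an AWGN channel, C = B log2(1 + P / (N0 B))."""
    return bandwidth_hz * math.log2(1.0 + signal_w / (n0_w_per_hz * bandwidth_hz))


def wideband_limit_bps(signal_w, n0_w_per_hz):
    """Limit of the capacity as B -> infinity: P / (N0 ln 2)."""
    return signal_w / (n0_w_per_hz * math.log(2.0))


p_rx = 1e-9   # received signal power in W (illustrative assumption)
n0 = 4e-21    # noise power spectral density in W/Hz (illustrative assumption)

for b in (1e9, 1e10, 1e11, 1e12, 1e14):
    print(f"B = {b:.0e} Hz -> C = {capacity_bps(b, p_rx, n0) / 1e9:.1f} Gbps")
print(f"wideband limit = {wideband_limit_bps(p_rx, n0) / 1e9:.1f} Gbps")
```

Once the bandwidth is large enough that the SNR per hertz is low, adding bandwidth barely raises the capacity, while doubling the signal power nearly doubles it.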

While high data rate transmitters have been reported in [103–106], a high level of integration and high energy efficiency are desired for portable applications. In this work, the high-frequency carrier is extracted from a harmonic oscillator. This removes the need for external oscillators and frequency multipliers, resulting in a higher efficiency. In addition, an on-chip antenna is used to transmit the modulated signal. This integration likewise reduces the loss in the modulated signal path. Moreover, we have designed the baseband modulator so that it consumes zero DC power.



Figure 7.1: Block diagram of the energy efficient fully integrated on-off keying transmitter.

## 7.2 Implementation

Fig. 7.1 shows the transmitter block diagram. First, the baseband data is amplified by a 7-stage buffer. Then, the output of the buffer is fed to a modulation switch which conducts the signal from the harmonic oscillator to the antenna, or blocks it from flowing to the antenna. That is, once the oscillator is turned on and settled, the second harmonic signal of the oscillator will be directed to the ground, or the antenna, for data transmission. In this approach, only an initial latency is required for the oscillator to settle. This yields a faster OOK modulation compared to the case where the oscillator itself is turned on and off with the baseband data.

The oscillator structure is shown in Fig. 7.2. A push-push configuration, which is compact and can achieve high efficiency, is adopted for harmonic signal generation [107]. The oscillator is designed to oscillate at a fundamental frequency ($f_0$) of 110 GHz, and its second harmonic is extracted at 220 GHz. To boost the signal power, the optimum phase is set between the gate and the drain and, in addition, source degeneration is exploited as explained in [107]. The source degeneration pushes the transistors to operate more in the triode region. This gives rise to a highly non-linear operation of the transistors, producing high harmonic signal power. The transmission lines at the drain determine the fundamental oscillation frequency. A quarter-wavelength transmission line at $2f_0$ provides the DC path to the oscillator.

Figure 7.2: Schematic of the harmonic oscillator.

Figure 7.3: Schematic of the shunt SPST modulation switch.

Figure 7.4: Insertion loss of the switch when transferring and blocking the signal.

A constant-load shunt single-pole single-throw (SPST) switch modulates the 2nd harmonic signal. The shunt SPST switch has a low insertion loss when transferring power. Also, the switch is designed so that it achieves the required functionality without a supply connection, resulting in zero DC power consumption. As shown in Fig. 7.3, in the "ON" state, when Q1 and Q2 are off ($V_{B1,2} = 0$ V), the capacitive impedance seen looking into the collectors is tuned out with shunt inductors (TL1 and TL2). As a result, most of the 2nd harmonic current flows from Port1 to Port2. In the "OFF" state, when Q1 and Q2 are on ($V_{B1,2} = 1$ V), the low collector-emitter resistance conducts most of the 2nd harmonic current to ground. In other words, the switch blocks the 2nd harmonic current from flowing into the antenna. With a quarter-wavelength impedance converter, constant-load switching is obtained. TL3, along with the parasitic capacitors of Q1 and Q2, helps to obtain a constant load at Port1 whether Q1 and Q2 are ON or OFF. This in turn suppresses undesired disturbance of the oscillation that could occur because of a varying switch impedance, resulting in less inter-symbol interference (ISI). TL4, a quarter wavelength long at $2f_0$, isolates the buffer from 2nd harmonic current leakage. The insertion loss of the designed switch at 220 GHz is 2 dB in the ON state and 14 dB in the OFF state.
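To put the two insertion-loss figures in perspective, the short sketch below converts them into the fraction of power passed in each state and the resulting on/off ratio. Only the 2 dB and 14 dB values come from the text; the helper name is illustrative.

```python
def db_to_power_fraction(loss_db):
    """Fraction of input power that remains after a loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)


il_on_db, il_off_db = 2.0, 14.0  # insertion loss at 220 GHz quoted in the text

on_fraction = db_to_power_fraction(il_on_db)
off_fraction = db_to_power_fraction(il_off_db)
on_off_ratio_db = il_off_db - il_on_db

print(f"ON state passes  {100 * on_fraction:.1f} % of the power")
print(f"OFF state passes {100 * off_fraction:.1f} % of the power")
print(f"on/off ratio = {on_off_ratio_db:.0f} dB")
```

With these values, the switch passes roughly 63 % of the second-harmonic power when ON and about 4 % when OFF, a 12 dB on/off ratio.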

To radiate the modulated second harmonic signal, a slot antenna implemented in the top metal of the process is employed. The antenna has a length of $264 \,\mu\text{m}$, which is approximately a half wavelength at 220 GHz. Fig. 7.5 depicts the antenna dimensions.
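The half-wavelength claim can be sanity-checked with a simple estimate. The sketch below assumes the common quasi-static approximation for a slot at an air/dielectric interface, $\varepsilon_{\rm eff} \approx (1 + \varepsilon_r)/2$, with $\varepsilon_r = 11.7$ for silicon. Both the approximation and the permittivity are assumptions of this sketch; the actual back-end dielectric stack is not modeled.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def half_guided_wavelength(freq_hz, eps_r_substrate=11.7):
    """Half of the guided wavelength for a slot on a thick dielectric.

    Uses eps_eff = (1 + eps_r) / 2, a quasi-static approximation (assumption).
    """
    eps_eff = (1.0 + eps_r_substrate) / 2.0
    return C0 / (2.0 * freq_hz * math.sqrt(eps_eff))


half_wl = half_guided_wavelength(220e9)
print(f"estimated half wavelength at 220 GHz: {half_wl * 1e6:.0f} um")
```

Under these assumptions the estimate lands within a few percent of the 264 µm slot length.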



Figure 7.5: The on-chip slot antenna.

To prevent internal reflections of the radiated signal from the slot antenna, a high-resistivity hemispherical silicon (Si) lens along with a $280 \,\mu\text{m}$ high-resistivity undoped silicon slab is attached to the fabricated chip.

## 7.3 Experimental Results

Various measurement setups were used to characterize and test the transmitter. Fig. 7.6 is a photo of the wireless link measurement setup. A VDI Erickson PM4 power meter was used to capture the received power. By matching the Friis equation with the measured received power, the far-field region was determined (Fig. 7.8). An EIRP of 2.8 dBm was measured for the chip, taking into account the free space path loss (FSPL), the gain of the receiver antenna, and the loss of the cables. To increase the signal power and subsequently the range, more oscillators can be coherently coupled [108–110]. Also, the measured pattern of the slot antenna along with the Si slab and lens is shown in Fig. 7.9.
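The link-budget relation behind the EIRP extraction can be sketched as follows. The free space path loss follows the standard Friis form; the 25 dBi receive-antenna gain and zero cable loss are hypothetical placeholders (the actual values are not given here), and the 25 cm distance is the link distance listed for this work in Table 7.1.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def fspl_db(distance_m, freq_hz):
    """Free space path loss in dB: 20 log10(4 pi d f / c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C0)


def received_power_dbm(eirp_dbm, rx_gain_dbi, distance_m, freq_hz, cable_loss_db=0.0):
    """Far-field received power from the Friis equation, in dBm."""
    return eirp_dbm + rx_gain_dbi - fspl_db(distance_m, freq_hz) - cable_loss_db


loss = fspl_db(0.25, 220e9)
# rx_gain_dbi = 25.0 is a hypothetical horn gain used only for illustration.
p_rx = received_power_dbm(2.8, 25.0, 0.25, 220e9)
print(f"FSPL at 25 cm, 220 GHz = {loss:.1f} dB, received power = {p_rx:.1f} dBm")
```

In practice the relation is used in reverse: the measured received power, the known receive gain, and the cable loss yield the EIRP, and agreement with the $1/d^2$ trend indicates the far-field region.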

Figure 7.6: The wireless transmitter test setup environment.

Figure 7.7: Spectrum measurement setup.

Frequency-domain and time-domain test setups are shown in Fig. 7.7 and Fig. 7.11, respectively. A Keysight M8195A Arbitrary Waveform Generator (AWG) with 2-feet K-band cables was used to feed PRBS-9 baseband data to the TX. To minimize the baseband data attenuation, the chip was mounted on a 0.6 mm thick PCB with a Rogers 3003 substrate. The modulated signal was then captured using a Rohde & Schwarz FS-Z325 harmonic mixer and a VDI WR-3.4 diagonal horn antenna, as demonstrated in Fig. 7.12 and Fig. 7.13. Frequency spacings of 19.6 MHz and 39.14 MHz can be seen, corresponding to 10 Gbps and 20 Gbps input baseband data.
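These spacings are what one would expect from the periodicity of the test pattern: a PRBS-9 sequence repeats every $2^9 - 1 = 511$ bits, so its spectrum consists of discrete lines separated by the bit rate divided by 511. A minimal sketch of the arithmetic (the function name is illustrative):

```python
def prbs_line_spacing_hz(bit_rate_bps, order=9):
    """Spacing of the spectral lines of a repeating PRBS of the given order."""
    period_bits = 2 ** order - 1  # 511 bits for PRBS-9
    return bit_rate_bps / period_bits


for rate in (10e9, 20e9):
    spacing = prbs_line_spacing_hz(rate)
    print(f"{rate / 1e9:.0f} Gbps -> line spacing = {spacing / 1e6:.2f} MHz")
```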

A VDI WR-3.4 zero-bias detector (ZBD), the WR-3.4 diagonal horn antenna, and a Teflon lens, together with a Keysight DCA-X 86100D wide-bandwidth oscilloscope, were employed for the time-domain measurement of the transmitter. With $5 \text{ pW}/\sqrt{\text{Hz}}$, 2000 V/W, and $6 \text{ k}\Omega$ as the noise equivalent power (NEP), responsivity, and output impedance of the WR-3.4 ZBD, respectively, and $1.9 \text{ nV}/\sqrt{\text{Hz}}$ as the input noise of the oscilloscope, the SNR drops by 26 dB due to the large impedance mismatch between the ZBD output and the oscilloscope input. This substantial SNR drop causes the signal to fall below the minimum detectable signal level of the oscilloscope. Therefore, to increase the signal level, a cascade of a low-noise amplifier and a broadband amplifier, with 10 GHz and 14 GHz bandwidths, was used to amplify the signal. In addition to the significant SNR drop and the dispersion of the K-band cables and PCB transmission lines, the eye diagrams are adversely affected by the limited bandwidth of the cascaded amplifiers and their added noise. The eye diagrams of the received signal corresponding to 10 Gbps and 20 Gbps data rates are shown in Fig. 7.13.

Figure 7.8: The received power as a function of distance.

Figure 7.9: The radiated antenna pattern.

Figure 7.10: The simulated 3D radiation pattern of the antenna.

Figure 7.11: The time domain measurement setup.

Figure 7.12: Spectrum of the received signal for (a) 10 Gbps and (b) 20 Gbps modulation.

Figure 7.13: Eye diagram of the received signal for (a) 10 Gbps and (b) 20 Gbps modulation.

## 7.4 Conclusion

A 20 Gbps OOK wireless transmitter prototype at 220 GHz was fabricated in a 55 nm SiGe BiCMOS technology. High energy efficiency is achieved thanks to the harmonic oscillator, the zero-power modulation switch, and the antenna integration. With the measured 20 Gbps data rate, the chip achieves an energy efficiency of 3.15 pJ/bit, which makes it attractive for short-range and portable high-speed communication links. Table 7.1 compares the performance of this work with state-of-the-art designs.

| References            | [103]                     | [104]             | [105]      | [106]      | This work [111]  |
|-----------------------|---------------------------|-------------------|------------|------------|------------------|
| Freq. (GHz)           | 217                       | 100-120           | 240        | 390        | 226              |
| Modulation            | OOK dual (X-Y) polarized  | 16QAM             | QPSK       | BPOOK      | OOK              |
| Data-Rate (Gbps)      | 24.4                      | 20                | 16         | 28         | 20               |
| Pdc (mW)              | 900                       | 520               | 220        | 114        | 63               |
| On-chip oscillator    | Yes                       | No                | No         | No         | Yes              |
| On-chip antenna       | Yes                       | No                | Yes        | No         | Yes              |
| EIRP$_{\rm TX}$ (dBm) | 21.1                      | -                 | 1          | -          | 2.8              |
| Distance (cm)         | 10                        | 20                | 1          | -          | 25               |
| Efficiency (pJ/bit)   | 36.9                      | 26                | 13.75      | 4.07       | 3.15             |
| FOM (EIRP/Pdc,TX)     | 0.143                     | -                 | 0.006      | -          | 0.034            |
| Area (mm$^2$)         | 2.8                       | 3.17              | 2          | 0.39       | 0.38             |
| Technology            | 130 nm SiGe BiCMOS        | 180 nm SiGe BiCMOS| 65 nm CMOS | 28 nm CMOS | 55 nm SiGe BiCMOS|

Table 7.1: Performance Summary and Comparison with Prior Art
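The energy-efficiency row of Table 7.1 is simply the DC power divided by the data rate (mW divided by Gbps gives pJ/bit). The sketch below recomputes it from the Pdc and Data-Rate rows of the table; the dictionary layout and function name are illustrative.

```python
def energy_per_bit_pj(p_dc_mw, data_rate_gbps):
    """Energy per transmitted bit in pJ/bit (mW / Gbps = pJ/bit)."""
    return p_dc_mw / data_rate_gbps


# (Pdc in mW, data rate in Gbps), taken from Table 7.1
designs = {
    "[103]": (900.0, 24.4),
    "[104]": (520.0, 20.0),
    "[105]": (220.0, 16.0),
    "[106]": (114.0, 28.0),
    "This work": (63.0, 20.0),
}

efficiency = {name: energy_per_bit_pj(*vals) for name, vals in designs.items()}
for name, value in efficiency.items():
    print(f"{name:>9}: {value:.2f} pJ/bit")
```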

APPENDIX

# APPENDIX A

# Derivation of Power Supply Gain using S-Parameters

We can use the measured scattering parameters to derive an equation for the gain from the power supply terminal to the AFE output. Using the definition of the scattering parameters, we can write [112]

$$\begin{aligned} V_1^- &= S_{11}V_1^+ + S_{12}V_2^+ \\ V_2^- &= S_{21}V_1^+ + S_{22}V_2^+ \end{aligned} \tag{A.1}$$

where the subscripts and superscripts denote the port number and incident/reflected voltages at the corresponding port, respectively. We connect the first port and second port of the VNA to the supply terminal and output of the AFE, respectively. Considering the AFE is unilateral which is a valid assumption verified by measurement, we can find the voltage at the supply terminal as

$$V_{\rm sup} = (1 + S_{11})V_1^+. \tag{A.2}$$

Likewise, the voltage at the output of the AFE is

$$V_{\rm ON} = S_{21} V_1^+ \tag{A.3}$$

as the incident voltage at the second port is zero due to the $50\,\Omega$ termination. Therefore, using (A.3) and (A.2), we can find the power supply gain equation used in (3.10).
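Dividing (A.3) by (A.2) gives $V_{\rm ON}/V_{\rm sup} = S_{21}/(1 + S_{11})$, which is presumably the form referred to in (3.10); that equation is not reproduced here, so this should be read as a sketch. The example below evaluates the ratio for a single frequency point; the S-parameter values are hypothetical and only illustrate the calculation.

```python
import math


def supply_gain(s11, s21):
    """Voltage gain from the supply terminal (port 1) to the output (port 2).

    Follows from V_sup = (1 + S11) V1+ and V_ON = S21 V1+ with port 2
    terminated in the reference impedance.
    """
    return s21 / (1.0 + s11)


# Hypothetical measured S-parameters at one frequency (assumption).
s11 = complex(0.2, -0.1)
s21 = complex(0.03, 0.01)

gain = supply_gain(s11, s21)
gain_db = 20.0 * math.log10(abs(gain))
print(f"|V_ON / V_sup| = {abs(gain):.4f} ({gain_db:.1f} dB)")
```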

# BIBLIOGRAPHY


- [1] N. Jones, "How to stop data centres from gobbling up the world's electricity," *Nature*, vol. 561, no. 7722, pp. 163–167, 2018.
- [2] [Online]. Available: https://en.wikipedia.org/wiki/High\_Bandwidth\_Memory
- [3] M. Wade, E. Anderson, S. Ardalan, P. Bhargava, S. Buchbinder, M. L. Davenport, J. Fini, H. Lu, C. Li, R. Meade *et al.*, "Teraphy: a chiplet technology for low-power, high-bandwidth in-package optical i/o," *IEEE Micro*, vol. 40, no. 2, pp. 63–71, 2020.
- [4] B. Pezeshki, F. Khoeini, A. Tselikov, R. Kalman, C. Danesh, and E. Afifi, "Microled array-based optical links using imaging fiber for chip-to-chip communications," in *Proc. Opt. Fiber Commun. Conf.*, 2021. Optica Publishing Group, 2022, pp. W1E–1.
- [5] B. Pezeshki, F. Khoeini, A. Tselikov, R. F. Kalman, C. Danesh, and E. Afifi, "Led-array based optical interconnects for chip-to-chip communications with integrated cmos drivers, detectors, and circuitry," in *Optical Interconnects XXII*, vol. 12007. SPIE, 2022, pp. 31–34.
- [6] W. Van Heddeghem, S. Lambert, B. Lannoo, D. Colle, M. Pickavet, and P. Demeester, "Trends in worldwide ict electricity consumption from 2007 to 2012," *Computer Communications*, vol. 50, pp. 64–76, 2014.
- [7] D. A. Miller, "Attojoule optoelectronics for low-energy information processing and communications," J. Lightw. Technol., vol. 35, no. 3, pp. 346–396, 2017.
- [8] B. Pezeshki, A. Tselikov, R. Kalman, and C. Danesh, "Wide and parallel LED-based optical links using multi-core fiber for chip-to-chip communications," in *Proc. Opt. Fiber Commun. Conf.*, 2021. Optical Society of America, 2021, pp. F3A–1.
- [9] F. Xu, Z. Jin, T. Tao, P. Tian, G. Wang, X. Liu, T. Zhi, Q.-a. Yan, D. Pan, Z. Xie *et al.*, "C-plane blue micro-led with 1.53 ghz bandwidth for high-speed visible light communication," *IEEE Electron Device Lett.*, 2022.

- [10] R. X. Ferreira, E. Xie, J. J. McKendry, S. Rajbhandari, H. Chun, G. Faulkner, S. Watson, A. E. Kelly, E. Gu, R. V. Penty *et al.*, "High bandwidth GaN-based micro-LEDs for multi-Gb/s visible light communications," *IEEE Photon. Technol. Lett.*, vol. 28, no. 19, pp. 2023–2026, 2016.
- [11] S. Rajbhandari, J. J. McKendry, J. Herrnsdorf, H. Chun, G. Faulkner, H. Haas, I. M. Watson, D. O'Brien, and M. D. Dawson, "A review of gallium nitride LEDs for multi-gigabit-per-second visible light data communications," *Semicond. Sci. Technol.*, vol. 32, no. 2, p. 023001, 2017.
- [12] A. Bhatnagar, S. Latif, and D. A. Miller, "Transit-time limited response from low capacitance cmos photodetectors," in *Conference on Lasers and Electro-Optics*. Optical Society of America, 2004, p. CThR2.
- [13] H. Kosaka, M. Kajita, Y. Li, and Y. Sugimoto, "A two-dimensional optical parallel transmission using a vertical-cavity surface-emitting laser array module and an image fiber," *IEEE Photon. Technol. Lett.*, vol. 9, no. 2, pp. 253–255, 1997.
- [14] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s Transceiver for Optical Interconnects," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1235–1246, 2008.
- [15] A. Sharif-Bakhtiar and A. Chan Carusone, "A 20 Gb/s CMOS Optical Receiver With Limited-Bandwidth Front End and Local Feedback IIR-DFE," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2679–2689, 2016.
- [16] I. Ozkaya, A. Cevrero, P. A. Francese, C. Menolfi, T. Morf, M. Brändli, D. M. Kuchta, L. Kull, C. W. Baks, J. E. Proesel, M. Kossel, D. Luu, B. G. Lee, F. E. Doany, M. Meghelli, Y. Leblebici, and T. Toifl, "A 64-Gb/s 1.4-pJ/b NRZ Optical Receiver Data-Path in 14-nm CMOS FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3458–3473, 2017.
- [17] S. Saeedi and A. Emami, "A 25Gb/s 170μW/Gb/s optical receiver in 28nm CMOS for chip-to-chip optical communication," in *Proc. IEEE Radio Freq. Integ. Circuits Symp. (RFIC)*. IEEE, 2014, pp. 283–286.
- [18] L. Szilagyi, J. Pliva, R. Henker, D. Schoeniger, J. P. Turkiewicz, and F. Ellinger, "A 53-Gbit/s optical receiver frontend with 0.65 pJ/bit in 28-nm bulk-CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 845–855, 2018.
- [19] K. C. Chen and A. Emami, "A 25-Gb/s avalanche photodetector-based burst-mode optical receiver with 2.24-ns reconfiguration time in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 6, pp. 1682–1693, 2019.
- [20] M. G. Ahmed, D. Kim, R. K. Nandwana, A. Elkholy, K. R. Lakshmikumar, and P. K. Hanumolu, "A 16-Gb/s-11.6-dBm OMA Sensitivity 0.7-pJ/bit Optical Receiver in 65-nm CMOS Enabled by Duobinary Sampling," *IEEE J. Solid-State Circuits*, 2021.

- [21] H. Li, J. Sharma, C.-M. Hsu, G. Balamurugan, and J. Jaussi, "11.6 A 100Gb/s-8.3 dBm-Sensitivity PAM-4 Optical Receiver with Integrated TIA, FFE and Direct-Feedback DFE in 28nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64. IEEE, 2021, pp. 190–192.
- [22] T. Woodward and A. Krishnamoorthy, "1-Gb/s integrated optical detectors and receivers in commercial CMOS technologies," *IEEE J. Sel. Topics Quantum Electron.*, vol. 5, no. 2, pp. 146–156, 1999.
- [23] M. Atef, A. Polzer, and H. Zimmermann, "Avalanche double photodiode in 40-nm standard CMOS technology," *IEEE J. Quantum Electron.*, vol. 49, no. 3, pp. 350–356, 2013.
- [24] P. Brandl, T. Jukić, R. Enne, K. Schneider-Hornstein, and H. Zimmermann, "Optical wireless APD receiver with high background-light immunity for increased communication distances," *IEEE J. Solid-State Circuits*, vol. 51, no. 7, pp. 1663–1673, 2016.
- [25] A. Bhatnagar, S. Latif, C. Debaes, and D. A. Miller, "Pump-probe measurements of CMOS detector rise time in the blue," *J. Lightw. Technol.*, vol. 22, no. 9, p. 2213, 2004.
- [26] B. Razavi, *Design of Integrated Circuits for Optical Communications*. John Wiley & Sons, 2012.
- [27] M. G. Ahmed, T. N. Huynh, C. Williams, Y. Wang, P. K. Hanumolu, and A. Rylyakov, "34-GBd linear transimpedance amplifier for 200-Gb/s DP-16-QAM optical coherent receivers," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 834–844, 2018.
- [28] K. R. Lakshmikumar, A. Kurylak, M. Nagaraju, R. Booth, R. K. Nandwana, J. Pampanin, and V. Boccuzzi, "A process and temperature insensitive CMOS linear TIA for 100 Gb/s/λ PAM-4 optical links," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3180–3190, 2019.
- [29] W. Diels, M. Steyaert, and F. Tavernier, "1310/1550 nm optical receivers with Schottky photodiode in bulk CMOS," *IEEE J. Solid-State Circuits*, vol. 55, no. 7, pp. 1776–1784, 2020.
- [30] M. Raj, Y. Frans, P.-C. Chiang, S. L. C. Ambatipudi, D. Mahashin, P. De Heyn, S. Balakrishnan, J. Van Campenhout, J. Grayson, M. Epitaux *et al.*, "Design of a 50-Gb/s hybrid integrated Si-photonic optical link in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 55, no. 4, pp. 1086–1095, 2020.
- [31] S. Daneshgar, H. Li, T. Kim, and G. Balamurugan, "A 128 Gb/s PAM4 Linear TIA with 12.6 pA/√Hz Noise Density in 22nm FinFET CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, 2021, pp. 135–138.

- [32] K. Ogawa, "Noise caused by GaAs MESFETs in optical receivers," *Bell System Technical Journal*, vol. 60, no. 6, pp. 923–928, 1981.
- [33] E. Säckinger, "On the Excess Noise Factor Γ of a FET Driven by a Capacitive Source," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 9, pp. 2118–2126, 2011.
- [34] P. R. Gray, P. Hurst, R. G. Meyer, and S. Lewis, *Analysis and design of analog integrated circuits*. Wiley, 2001.
- [35] D. A. Johns and K. Martin, *Analog integrated circuit design*. John Wiley & Sons, 2008.
- [36] E. Säckinger, "On the noise optimum of FET broadband transimpedance amplifiers," *IEEE Trans. Circuits Syst. I*, vol. 59, no. 12, pp. 2881–2889, 2012.
- [37] F. Khoeini, B. Hadidian, K. Zhang, and E. Afshari, "A Transimpedance-to-Noise Optimized Analog Front-End With High PSRR for Pulsed ToF Lidar Receivers," *IEEE Trans. Circuits Syst. I*, vol. 68, no. 9, pp. 3642–3655, 2021.
- [38] A. A. Abidi, "On the noise optimum of gigahertz FET transimpedance amplifiers," *IEEE J. Solid-State Circuits*, vol. 22, no. 6, pp. 1207–1209, 1987.
- [39] B. Razavi, "The StrongARM latch [a circuit for all seasons]," *IEEE Solid-State Circuits Mag.*, vol. 7, no. 2, pp. 12–17, 2015.
- [40] C.-W. Lu, P.-Y. Yin, C.-M. Hsiao, M.-C. F. Chang, and Y.-S. Lin, "A 10-bit resistor-floating-resistor-string DAC (RFR-DAC) for high color-depth LCD driver ICs," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2454–2466, 2012.
- [41] J. Kosman, O. Almer, T. Al Abbas, N. Dutton, R. Walker, S. Videv, K. Moore, H. Haas, and R. Henderson, "29.7 A 500Mb/s -46.1 dBm CMOS SPAD receiver for laser diode visible-light communications," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2019, pp. 468–470.
- [42] E. Fisher, I. Underwood, and R. Henderson, "A reconfigurable single-photon-counting integrating receiver for optical communications," *IEEE J. Solid-State Circuits*, vol. 48, no. 7, pp. 1638–1650, 2013.
- [43] M. Atef, R. Swoboda, and H. Zimmermann, "1.25 Gbit/s over 50 m step-index plastic optical fiber using a fully integrated optical receiver with an integrated equalizer," *J. Lightw. Technol.*, vol. 30, no. 1, pp. 118–122, 2012.
- [44] Y. Dong and K. W. Martin, "A high-speed fully-integrated POF receiver with large-area photo detectors in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 9, pp. 2080–2092, 2012.
- [45] Y. Dong and K. W. Martin, "A 4-Gbps POF receiver using linear equalizer with multi-shunt-shunt feedbacks in 65-nm CMOS," *IEEE Trans. Circuits Syst. II*, vol. 60, no. 10, pp. 617–621, 2013.

- [46] W. Diels, M. Steyaert, and F. Tavernier, "Optical receiver with Schottky photodiode and TIA with high gain amplifier in 28nm bulk CMOS," in *Proc. IEEE 45th Eur. Solid State Circuits Conf. (ESSCIRC)*, 2019, pp. 149–152.
- [47] F. Bozorgi, M. Bruccoleri, E. Rahimi, M. Repossi, F. Svelto, and A. Mazzanti, "Analog Front End of 50-Gb/s SiGe BiCMOS Opto-Electrical Receiver in 3-D-Integrated Silicon Photonics Technology," *IEEE J. Solid-State Circuits*, 2021.
- [48] [Online]. Available: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/accelerating-innovation-through-aib-whitepaper.pdf
- [49] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, *Analysis and design of analog integrated circuits*. John Wiley & Sons, 2009.
- [50] E. Säckinger, *Broadband circuits for optical fiber communication*. John Wiley & Sons, 2005.
- [51] F. Khoeini, B. Hadidian, K. Zhang, and E. Afshari, "Reflection-Based Short Pulse Generation in CMOS," *IEEE Solid-State Circuits Letters*, vol. 3, pp. 318–321, 2020.
- [52] B. Razavi, *Design of Analog CMOS Integrated Circuits*. Tata McGraw-Hill Education, 2002.
- [53] B. Yektakhah and K. Sarabandi, "All-Directions Through-the-Wall Radar Imaging Using a Small Number of Moving Transceivers," *IEEE Trans. Geosci. Remote Sens.*, vol. 54, no. 11, pp. 6415–6428, 2016.
- [54] S. Razavian and A. Babakhani, "A THz Pulse Radiator Based on PIN Diode Reverse Recovery," in *2019 IEEE BiCMOS and Compound Semiconductor Integrated Circuits and Technology Symposium (BCICTS)*, 2019, pp. 1–4.
- [55] S. Razavian and A. Babakhani, "Silicon integrated THz comb radiator and receiver for broadband sensing and imaging applications," *IEEE Trans. Microw. Theory Techn.*, vol. 69, no. 11, pp. 4937–4950, 2021.
- [56] S. Razavian and A. Babakhani, "A Highly Power Efficient 2×3 PIN-Diode-Based Intercoupled THz Radiating Array at 425GHz with 18.1 dBm EIRP in 90nm SiGe BiCMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, vol. 65. IEEE, 2022, pp. 1–3.
- [57] S. M. H. Naghavi, S. Seyedabbaszadehesfahlani, F. Khoeini, A. Cathelin, and E. Afshari, "A 250GHz Autodyne FMCW Radar in 55nm BiCMOS with Micrometer Range Resolution," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2021, pp. 320–321.
- [58] B. Behroozpour, P. A. Sandborn, M. C. Wu, and B. E. Boser, "Lidar system architectures and circuits," *IEEE Commun. Mag.*, vol. 55, no. 10, pp. 135–142, 2017.

- [59] A. Binaie, S. Ahasan, and H. Krishnaswamy, "A 65nm CMOS Continuous-Time Electro-Optic PLL (CT-EOPLL) with Image and Harmonic Spur Suppression for LIDAR," in *2019 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*. IEEE, 2019, pp. 103–106.
- [60] T. Kim, P. Bhargava, C. V. Poulton, J. Notaros, A. Yaacobi, E. Timurdogan, C. Baiocco, N. Fahrenkopf, S. Kruger, T. Ngai *et al.*, "A single-chip optical phased array in a wafer-scale silicon photonics/CMOS 3D-integration platform," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3061–3074, 2019.
- [61] R. Fatemi, A. Khachaturian, and A. Hajimiri, "A nonuniform sparse 2-D large-FOV optical phased array with a low-power PWM drive," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1200–1215, 2019.
- [62] X. Han, Q. Wang, Z. Wang, Y. Fang, Y. He, W. Geng, Z. Pan, and Y. Yue, "Solid-State Photonics-Based Lidar With Large Beam-Steering Angle by Seamlessly Merging Two Orthogonally Polarized Beams," *IEEE J. Sel. Topics Quantum Electron.*, vol. 27, no. 1, pp. 1–8, 2020.
- [63] M.-C. Amann, T. M. Bosch, M. Lescure, R. A. Myllylae, and M. Rioux, "Laser ranging: a critical review of unusual techniques for distance measurement," *Opt. Eng.*, vol. 40, 2001.
- [64] D. M. Binkley, "Performance of non-delay-line constant-fraction discriminator timing circuits," *IEEE Trans. Nucl. Sci.*, vol. 41, no. 4, pp. 1169–1175, 1994.
- [65] K. Yoshioka, H. Kubota, T. Fukushima, S. Kondo, T. T. Ta, H. Okuni, K. Watanabe, M. Hirono, Y. Ojima, K. Kimura *et al.*, "A 20-ch TDC/ADC Hybrid Architecture LiDAR SoC for 240×96 Pixel 200-m Range Imaging With Smart Accumulation Technique and Residue Quantizing SAR ADC," *IEEE J. Solid-State Circuits*, vol. 53, no. 11, pp. 3026–3038, 2018.
- [66] J. Nissinen, I. Nissinen, and J. Kostamovaara, "Integrated receiver including both receiver channel and TDC for a pulsed time-of-flight laser rangefinder with cm-level accuracy," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1486–1497, 2009.
- [67] H.-S. Cho, C.-H. Kim, and S.-G. Lee, "A high-sensitivity and low-walk error LADAR receiver for military application," *IEEE Trans. Circuits Syst. I*, vol. 61, no. 10, pp. 3007–3015, 2014.
- [68] S. Kurtti, J. Nissinen, and J. Kostamovaara, "A wide dynamic range CMOS laser radar receiver with a time-domain walk error compensation scheme," *IEEE Trans. Circuits Syst. I*, vol. 64, no. 3, pp. 550–561, 2016.
- [69] A. Baharmast, S. Kurtti, and J. Kostamovaara, "A Wide Dynamic Range Laser Radar Receiver Based on Input Pulse-Shaping Techniques," *IEEE Trans. Circuits Syst. I*, pp. 1–12, 2020.

- [70] T. Ruotsalainen, P. Palojarvi, and J. Kostamovaara, "A wide dynamic range receiver channel for a pulsed time-of-flight laser radar," *IEEE J. Solid-State Circuits*, vol. 36, no. 8, pp. 1228–1238, 2001.
- [71] T.-H. Ngo, C.-H. Kim, Y. J. Kwon, J. S. Ko, D.-B. Kim, and H.-H. Park, "Wideband receiver for a three-dimensional ranging LADAR system," *IEEE Trans. Circuits Syst. I*, vol. 60, no. 2, pp. 448–456, 2012.
- [72] J. T. Kostamovaara, K. E. Maatta, M. Koskinen, and R. A. Myllylae, "Pulsed laser radars with high-modulation frequency in industrial applications," in *Laser Radar VII: Advanced Technology for Applications*, vol. 1633. International Society for Optics and Photonics, 1992, pp. 114–127.
- [73] M. Ohara, Y. Akazawa, N. Ishihara, and S. Konaka, "High gain equalizing amplifier integrated circuits for a gigabit optical repeater," *IEEE J. Solid-State Circuits*, vol. 20, no. 3, pp. 703–707, 1985.
- [74] C. Hong, S.-H. Kim, J.-H. Kim, and S. M. Park, "A linear-mode LiDAR sensor using a multi-channel CMOS transimpedance amplifier array," *IEEE Sensors J.*, vol. 18, no. 17, pp. 7032–7040, 2018.
- [75] X. Wang, R. Ma, D. Li, H. Zheng, M. Liu, and Z. Zhu, "A Low Walk Error Analog Front-End Circuit With Intensity Compensation for Direct ToF LiDAR," *IEEE Trans. Circuits Syst. I*, vol. 67, no. 12, pp. 4309–4321, 2020.
- [76] S. M. Park and H.-J. Yoo, "1.25-Gb/s regulated cascode CMOS transimpedance amplifier for gigabit ethernet applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 112–121, 2004.
- [77] S. Kurtti and J. Kostamovaara, "Laser radar receiver channel with timing detector based on front end unipolar-to-bipolar pulse shaping," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 835–847, 2009.
- [78] R. J. Baker, *CMOS: Circuit Design, Layout, and Simulation*. John Wiley & Sons, 2019.
- [79] A. de la Plaza and P. Morlon, "Power-supply rejection in differential switched-capacitor filters," *IEEE J. Solid-State Circuits*, vol. 19, no. 6, pp. 912–918, 1984.
- [80] J. S. Yun, M. Seo, B. Choi, J. Han, Y. Eo, and S. M. Park, "A 4Gb/s current-mode optical transceiver in 0.18 μm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers.* IEEE, 2009, pp. 102–103.
- [81] R. Ma, M. Liu, H. Zheng, and Z. Zhu, "A 77-dB dynamic range low-power variable-gain transimpedance amplifier for linear LADAR," *IEEE Trans. Circuits Syst. II*, vol. 65, no. 2, pp. 171–175, 2017.

- [82] E. Säckinger, *Analysis and design of transimpedance amplifiers for optical receivers*. John Wiley & Sons, 2017.
- [83] —, "The transimpedance limit," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 8, pp. 1848–1856, 2010.
- [84] B. Zand, K. Phang, and D. A. Johns, "A transimpedance amplifier with dc-coupled differential photodiode current sensing for wireless optical communications," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2001, pp. 455–458.
- [85] Y. Park, J.-H. Kim, and S. M. Park, "Bootstrapped fully differential CMOS transimpedance amplifier," *J. of Semiconductor Tech. and Sci.*, vol. 20, no. 1, pp. 1–7, 2020.
- [86] R. Ma, M. Liu, H. Zheng, and Z. Zhu, "A 66-dB Linear Dynamic Range, 100dBΩ Transimpedance Gain TIA With High-Speed PDSH for LiDAR," *IEEE Trans. Instrum. Meas.*, vol. 69, no. 4, pp. 1020–1028, 2019.
- [87] J. C. Chen, B. W. McGaughy, D. Sylvester, and C. Hu, "An on-chip, attofarad interconnect charge-based capacitance measurement (CBCM) technique," in *International Electron Devices Meeting. Technical Digest.* IEEE, 1996, pp. 69–72.
- [88] A. Verma and B. Razavi, "Frequency-based measurement of mismatches between small capacitors," in *IEEE Custom Integrated Circuits Conference 2006*. IEEE, 2006, pp. 481–484.
- [89] H. Xu, X. Liu, and L. Yin, "A Closed-Loop ΣΔ Interface for a High-Q Micromechanical Capacitive Accelerometer With 200 ng/√Hz Input Noise Density," *IEEE J. Solid-State Circuits*, vol. 50, no. 9, pp. 2101–2112, 2015.
- [90] Y. Wang, Q. Fu, Y. Zhang, W. Zhang, D. Chen, L. Yin, and X. Liu, "A digital closed-loop sense MEMS disk resonator gyroscope circuit design based on integrated analog front-end," *Sensors*, vol. 20, no. 3, p. 687, 2020.
- [91] L. Aaltonen, A. Kalanti, M. Pulkkinen, M. Paavola, M. Kamarainen, and K. A. Halonen, "A 2.2 mA 4.3 mm<sup>2</sup> ASIC for a 1000 °/s 2-Axis Capacitive Micro-Gyroscope," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1682–1692, 2011.
- [92] B. P. Senevirathna, S. Lu, M. P. Dandin, J. Basile, E. Smela, and P. A. Abshire, "Real-time measurements of cell proliferation using a lab-on-CMOS capacitance sensor array," *IEEE Trans. Biomed. Circuits Syst.*, vol. 12, no. 3, pp. 510–520, 2018.
- [93] S. Forouhi, R. Dehghani, and E. Ghafar-Zadeh, "CMOS based capacitive sensors for life science applications: A review," *Sens. Actuators, A*, vol. 297, p. 111531, 2019.

- [94] Z. Tan, R. Daamen, A. Humbert, Y. V. Ponomarev, Y. Chae, and M. A. Pertijs, "A 1.2-V 8.3-nJ CMOS humidity sensor for RFID applications," *IEEE J. Solid-State Circuits*, vol. 48, no. 10, pp. 2469–2477, 2013.
- [95] S. Xia, K. Makinwa, and S. Nihtianov, "A capacitance-to-digital converter for displacement sensing with 17b resolution and 20µs conversion time," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers.* IEEE, 2012, pp. 198–200.
- [96] E. Afshari and A. Hajimiri, "Nonlinear transmission lines for pulse shaping in silicon," *IEEE J. Solid-State Circuits*, vol. 40, no. 3, pp. 744–752, 2005.
- [97] W. Lee, M. Adnan, O. Momeni, and E. Afshari, "A nonlinear lattice for high-amplitude picosecond pulse generation in CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 2, pp. 370–380, 2012.
- [98] X. Wu and K. Sengupta, "Programmable picosecond pulse generator in CMOS," in *2015 IEEE MTT-S International Microwave Symposium*, 2015, pp. 1–4.
- [99] M. M. Assefzadeh and A. Babakhani, "Broadband oscillator-free THz pulse generation and radiation based on direct digital-to-impulse architecture," *IEEE J. Solid-State Circuits*, vol. 52, no. 11, pp. 2905–2919, 2017.
- [100] S. Shekhar, J. S. Walling, and D. J. Allstot, "Bandwidth extension techniques for CMOS amplifiers," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2424–2439, 2006.
- [101] B. Razavi, *Design of Integrated Circuits for Optical Communications*. John Wiley & Sons, 2012.
- [102] E. Smith, "Dispersion in commonly used cables," Jefferson Lab, Experimental Hall B, CLAS-NOTE-91-007. CEBAF, TN-91-022, 1991.
- [103] C. Jiang, A. Cathelin, and E. Afshari, "A high-speed efficient 220-GHz spatial-orthogonal ASK transmitter in 130-nm SiGe BiCMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 9, pp. 2321–2334, 2017.
- [104] H. Wang, H. Mohammadnezhad, D. Dimlioglu, and P. Heydari, "A 100-120GHz 20Gbps bits-to-RF 16QAM transmitter using 1-bit digital-to-analog interface," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2019, pp. 1–4.
- [105] S. Kang, S. V. Thyagarajan, and A. M. Niknejad, "A 240 GHz Fully Integrated Wideband QPSK Transmitter in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 10, pp. 2256–2267, 2015.
- [106] C. D'heer and P. Reynaert, "A High-Speed 390GHz BPOOK Transmitter in 28nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, 2020, pp. 223–226.

- [107] R. Kananizadeh and O. Momeni, "High-power and high-efficiency millimeter-wave harmonic oscillator design, exploiting harmonic positive feedback in CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 65, no. 10, pp. 3922–3936, 2017.
- [108] V. Pourahmad, F. Khoeini, and E. Afshari, "A System of Two Coupled Oscillators With a Continuously Controllable Phase Shift," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 66, no. 4, pp. 1531–1543, 2019.
- [109] L. Chen, S. Nooshabadi, F. Khoeini, Z. Khalifa, B. Hadidian, and E. Afshari, "An ultra-fast frequency shift mechanism for high data-rate sub-THz wireless communications in CMOS," *Appl. Phys. Lett.*, vol. 118, no. 24, p. 242103, 2021.
- [110] B. Hadidian, F. Khoeini, S. H. Naghavi, A. Cathelin, and E. Afshari, "A 220-GHz energy-efficient high-data-rate wireless ASK transmitter array," *IEEE J. Solid-State Circuits*, vol. 57, no. 6, pp. 1623–1634, 2021.
- [111] B. Hadidian, F. Khoeini, S. H. Naghavi, A. Cathelin, and E. Afshari, "An Energy Efficient Fully Integrated 20Gbps OOK Wireless Transmitter at 220GHz," in *Proc. Custom Integr. Circuits Conf. (CICC)*. IEEE, 2021, pp. 1–2.
- [112] D. M. Pozar, *Microwave engineering*. John Wiley & Sons, 2011.