031155-2-T Investigation of finite element - ABC methods for electromagnetic field simulation A. Chatterjee, J.L. Volakis and J. Nguyen National Aeronautics & Space Administration Ames Research Center Moffett Field, CA 94035 September 1994 31155-2-T = RL-2435

Report #031155-2-T NASA Ames Grant NAG 2-866 Grant Title: Report Title *: Report Authors: Primary University Collaborator: Primary NASA-Ames Collaborator: University Address: Date: Development and parallelization of the Finite Element Method with Mixed Termination Schemes Investigation of finite element -ABC methods for electromagnetic field simulation A. Chatterjee, J.L.Volakis and J. Nguyen John L. Volakis volakis@um.cc.umich.edu Telephone: (313) 764-0500 Alex Woo woo@ra-next.arc.nasa.gov Radiation Laboratory Department of Electrical Engineering and Computer Science The University of Michigan Ann Arbor, MI 48109-2122 September 1994 (replaces report published in May 1994) Funds for the support of this study have been allocated in part of the NASA-Ames Research Center, Moffett Field, California, under Grant No. NAG 2-866 * This report was also submitted by the first author (AC) as a dissertation in partial fulfillment of the requirements for the Ph.D. degree at the University of Michigan. 1

TABLE OF CONTENTS LIST OF FIGURES................................ iv LIST OF TABLES..................................... viii LIST OF APPENDICES........................... x CHAPTER I. Introduction............................... 1 1.1 Background............................ 2 1.2 Outline of thesis.............................. 4 II. Basic concepts of electromagnetics and finite elements.... 7 2.1 Electromagnetics: basic concepts................ 7 2.1.1 Maxwell's equations.................. 7 2.1.2 W ave equations.................... 10 2.1.3 Boundary conditions....................10 2.1.4 Radiation conditions.................... 11 2.1.5 Radar cross-section........................12 2.2 Finite elements: basic ideas........................ 12 2.2.1 Boundary value problem.....................13 2.2.2 The Ritz method........................13 2.2.3 Galerkin's method................... 15 2.2.4 Implementation scheme................ 16 III. Shape functions for scalar and vector finite elements..... 19 3.1 Node-based elements....................... 20 3.1.1 Two dimensional elements.............. 20 3.1.2 Three dimensional elements........................ 24 3.2 Edge-based elements................................ 27 3.2.1 Two dimensional elements...................27 3.2.2 Three dimensional elements...................32 11

IV. Vector finite elements for 3D electromagnetic problems 43 4.1 Formulation............................... 45 4.1.1 Finite element equations............... 45 4.1.2 Origin of spurious solutions................. 47 4.1.3 Basis functions........................ 49 4.2 Results.............................. 51 4.3 Conclusions..................................54 4.4 Formulation...................................... 57 4.4.1 Derivation of finite element equations............ 57 4.5 Finite element discretization.................... 64 4.6 Results............................... 65 4.7 Conclusions............................................ 71 V. Optimization and parallelization.................. 74 5.1 Numerical considerations........................... 75 5.2 Matrix storage and generation........................... 76 5.3 Linear equation solver.............................. 79 5.4 Preconditioning................... 81 5.4.1 Diagonal preconditioner..................... 81 5.4.2 Modified ILU preconditioner.................. 83 5.5 Parallelization.......................................... 88 5.5.1 Analysis of Communication.................. 93 VI. Conformal absorbing boundary conditions for the vector wave equation.........................................100 6.1 Formulation......................... 101 6.1.1 Unsymmetric ABCs...................... 103 6.1.2 Symmetric correction......................... 106 6.1.3 Finite element implementation.................108 6.2 Applications......................... 113 6.3 Conclusion............................... 129 VII. Conclusions........................... 132 APPENDICES..................................136 BIBLIOGRAPHY.................................143 111

LIST OF FIGURES Figure 3.1 Rectangular element.................................... 21 3.2 Transformation of a quadrilateral element in the xy plane to a unit square in the by plane............................................ 22 3.3 Triangular element............................................... 23 3.4 Tetrahedral element......................................... 26 3.5 Second order triangular edge element........................... 31 3.6 Rectangular brick element........................................ 33 3.7 Mapping of a hexahedral element to a unit cube.................... 35 3.8 Tetrahedral element......................................... 40 4.1 Performance comparison of rectangular bricks and tetrahedrals... 52 4.2 Geometry of ridged cavity........................................ 53 4.3 Illustration of scattering structure Vd enclosed by an artificial mesh termination surface, SO, on which the absorbing boundary condition is imposed............................................. 58 4.4 Bistatic echo-area of a perfectly conducting cube having edge length of 0.755A. Plane wave incident from 0 = 1800; q 90~......... 66 4.5 Backscatter RCS of a perfectly conducting cube at normal incidence as a function of edge length............................... 67 iv

4.6 Backscatter pattern of a perfectly conducting cylinder having a radius of 0.3A and a height of 0.6A. The solid and the dashed lines indicate data obtained from a body of revolution code and the black and white dots indicate FE-ABC data................. 68 4.7 Bistatic echo-area of a homogeneous dielectric sphere (6e = 4; koa = 1). 69 4.8 Normalized bistatic pattern of a lossy prolate spheroid (e, = 4 - jl; koa = 7r/2; a/b = 2), where a and b are the major and minor axes of the spheroid, respectively.......................... 70 4.9 Geometry of cube (a = b = 0.5A) consisting of a metallic section and a dielectric section (6e = 2 - j2), where the latter is bounded by a resistive surface having R = Zo....................... 71 4.10 RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The lower half of the cube is metallic while the upper half is air-filled with a resistive card draped over it.......... 72 4.11 RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The composition of the cube is the same as in Figure 4.10, except that the air-filled portion is filled with dielectric. The solid curve is the FEMATS pattern and the black dots are MoM data for the Ezn = 0 polarization........................ 73 5.1 Structure of block preconditioning matrix.............. 83 5.2 Symmetric biconjugate gradient method with preconditioning.... 91 5.3 Speedup curve for the linear equation solver on the KSR1...... 93 5.4 Counts of p subpages required by each processor for sparse matrixvector multiply (total copies=5968).................. 95 5.5 Degree of sharing histogram of p subpages during sparse matrixvector multiply (28 procesors)..................... 96 6.1 Scatterer enclosed in conformal mesh termination boundary..... 101 6.2 Geometry of cube (a = b = 0.5A) consisting of a metallic section and a dielectric section (6. = 2 - j2), where the latter is bounded by a resistive surface having R = Zo......................114 v

6.3 RCS pattern in the x - z plane for the composite cube shown earlier. The solid curve is the FEMATS pattern and the black dots are MoM data for the EnC = 0 polarization. Mesh termination is piecewise planar................................... 115 6.4 Backscatter pattern of a metallic rectangular inlet (1A x IA x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is spherical................................... 116 6.5 Backscatter pattern of a metallic rectangular inlet (IA x IA x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is spherical.................................... 117 6.6 Backscatter pattern of a metallic rectangular inlet (IA x 1A x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is piecewise planar............................. 118 6.7 Backscatter pattern of a metallic rectangular inlet (IA x IA x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is piecewise planar............................. 119 6.8 Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for HH polarization. The solid line indicates measured data [69] and the black dots indicate computed data. Mesh termination surface is a rectangular box......... 120 6.9 Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for HH polarization. Black dots indicate computed values, the solid line represents measured data [69] and the dotted line is body of revolution data [2]. Mesh termination surface is a circular cylinder....................... 121 6.10 Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for VV polarization. Black dots indicate computed values, the solid line is measured data [69] and the dotted line represents reference data from a body of revolution code [2]. Mesh termination surface is a circular cylinder....... 122 vi

6.11 Backscatter pattern of a lossy foam cylinder with three perfectly conducting wires embedded along the axis. The incident electric field is oriented parallel to the axis of the cylinder. Mesh termination surface is a circular cylinder............................... 123 6.12 Backscatter pattern of a lossy foam cylinder with the middle wire offset 0.25A from the cylinder axis. The incident electric field is oriented parallel to the axis of the cylinder. Mesh termination surface is a circular cylinder........................... 124 6.13 Backscatter pattern (aoo) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination......... 125 6.14 Backscatter pattern (akoo) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination............ 126 6.15 Backscatter pattern (aok) of a 3.5A x 2A perfectly conducting plate in the yz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination......... 127 6.16 Backscatter pattern (a+o) of a 3.5A x 2A perfectly conducting plate in the xz plane. The numbers in the legend indicate mesh termination distance from the plate edges.......................... 128 6.17 An eighth of a glass plate enclosed inside a smooth mesh termination boundary composed of flat planes and cylindrical and spherical sections.................................... 129 6.18 Backscatter pattern (aoo) of a glass plate (1.75A x lAx.125A) having a relative permittivity of (3 -j.09) for the 0 = 80~ cut....... 130 vii

LIST OF TABLES Table 3.1 Edge numbering for rectangular element............... 28 3.2 Edge numbering for triangular element................ 30 3.3 Edge definition for rectangular brick.................. 34 3.4 Edge definition for tetrahedron................ 37 3.5 Hierarchical basis functions for tetrahedron.............. 42 4.1 TETRAHEDRON EDGE DEFINITION................. 49 4.2 EIGENVALUES (ko, CM-1) FOR AN EMPTY 1CM X 0.5CM X 0.75CM RECTANGULAR CAVITY........................ 51 4.3 Eigenvalues (ko,cm-1) for a Half-Filled 1cm x O.lcm x 1cm Rectangular Cavity Having a Dielectric Filling of or = 2 Extending from z = 0.5cm to z = 1.0cm.......................... 53 4.4 Eigenvalues (ko, cm-1) for an empty spherical cavity of radius 1cm 54 4.5 Eigenvalues for an empty cylindrical cavity of base radius 0.5cm and height 0.5 cm (380 unknowns)..................... 55 4.6 Ten lowest non-trivial eigenvalues (k0, cm-1) for the geometry drawn in Figure 2: (a) 267 Unknowns; (b) 671 Unknowns......... 56 5.1 No. of iterations vs no. of blocks for a block ILU preconditioned biconjugate gradient solution method................. 88 5.2 No. of iterations required for convergence of a 224,476 unknown system using the point diagonal and block ILU preconditioning strategies................................... 88 viii

5.3 Floating point operations per iteration........................90 5.4 Execution time and speedup for the iterative solver.............. 94 5.5 Execution time and speedup for the matrix generation and assembly (20,033 unknowns)........................... 94 ix

LIST OF APPENDICES Appendix A. Derivation of matrix elements.............................. 137 B. Derivation of some vector identities...................140 x

CHAPTER I Introduction The mechanics of wave propagation in the presence of obstacles is of great interest in many branches of engineering and applied mathematics like electromagnetics, fluid dynamics, geophysics, seismology, etc. Such problems can be broadly classified into two categories: the bounded domain or the closed problem and the unbounded domain or the open problem. Analytical techniques have been derived for the simpler problems; however, the need to model complicated geometrical features, complex material coatings and fillings and to adapt the model to changing design parameters have inevitably tilted the balance in favor of numerical techniques. The modeling of closed problems presents difficulties primarily in proper meshing of the interior region. However, problems in unbounded domains pose a unique challenge to computation, since the exterior region is inappropriate for direct implementation of numerical techniques. A large number of solutions have been proposed but only a few have stood the test of time and experiment. The goal of this thesis is to develop an efficient and reliable partial differential equation technique to model large three dimensional scattering problems in electromagnetics. 1

2 1.1 Background Ever since the method of moments (MoM) was introduced by Harrington [3] in the late 60s, numerical techniques for predicting electromagnetic field behavior have gained in popularity. With increases in computing speeds and memory and the need to simulate real-life problems, researchers have been actively trying to refine the older numerical methods as well as devise newer and more efficient solution techniques. The MoM is based on applying integral equations on the surface of the desired structure and computing the fields everywhere in space [4]. For anisotropic materials, the entire volume needs to be discretized and a volume integral equation must then be solved. However, the matrix obtained from discretizing an integral equation is full and thus requires O(N2) storage, where N is the number of unknowns. In 3D problems, this is a serious limitation since the method can scale up to large problems only at considerable computational cost. Thus we need to seek solution techniques which scale favorably with increasing problem size. Partial differential equation (PDE) methods, like finite element and finite difference methods, offer the most attractive alternative to integral equation techniques since they lead to matrix systems which are sparse. Therefore, only the non-zero entries of the final matrix system need to be stored resulting in O(N) storage requirement. Thus the increase in storage demand with increasing problem size is seen to be minimal. PDE techniques like finite elements were, however, originally constructed for solving bounded domain problems. In recent times, finite elements are increasingly being used for modeling unbounded problems, where the desired parameter decays off to zero infinitely away from the region of interest. In electromagnetics, the desired

3 parameter is a field quantity like the electric or magnetic field. It is obviously impractical to extend the finite element mesh to infinity; thus the mesh must be truncated at a suitable distance from the region of interest. Boundary conditions should then be applied at this artificial mesh truncation surface such that the boundary appears transparent to the propagating field. There are two types of mesh truncation conditions: exact and approximate. Exact boundary conditions can be placed very close to the region of interest; however, they suffer from potential uniqueness problems [5] and give rise to partly full systems. The loss of uniqueness associated with systems where the exact boundary condition is employed on the mesh truncation boundary is well-known and was first pointed out by Mautz and Harrington[6]. Remedies like complexification of the wave number [7] and using the combined field integral equation [8] exist, and must be used for a robust implementation. Although the problem of interior resonances can be now avoided, the finite element-boundary integral system still possesses a partly full system which affects its scalability to large problems. Approximate boundary conditions, on the other hand, are local in nature but preserve the sparsity of the finite element system. This advantage is partly offset by the fact that the finite element mesh must be extended some distance away from the region of interest and the approximate boundary condition imposed on the mesh truncation surface. These boundary conditions work on the principle that the higher order terms of the expansion for the propagating field decay rapidly away from the target. Therefore, if the truncation boundary is placed far enough from the region of interest, the boundary condition on the mesh termination surface needs to absorb only the lowest order terms of the field expansion to accurately model the physics of the problem. In this thesis, our aim is to examine the performance of these

4 approximate conditions, also known as absorbing boundary conditions (ABCs), in practical three dimensional problems and to derive improved boundary conditions which will enable more efficient utilization of the available resources. 1.2 Outline of thesis This dissertation describes the development of a finite element method for the solution of general three-dimensional scattering problems. The entire research has been geared towards a robust, state-of-the-art solution of unbounded domain problems in electromagnetics. Improvements in solution convergence, mesh termination conditions, algorithmic complexity and computational speed have all been carried out with an eye to making the methodology more efficient in terms of computer storage and time. Chapter 1 presents a brief introduction to the problem, its possible applications in science and industry and our motivation in preferring this solution methodology over more traditional ones. Chapter 2 gives a short review of the fundamental laws that govern electromagnetic phenomena and the modeling technique of finite elements. The wave equation for electromagnetic fields is derived and various boundary conditions satisfied on material interfaces are presented. A brief outline of the method of finite elements and its application in electromagnetics is given. Chapter 3 provides a detailed review on the construction and implementation of scalar and vector shape functions for two- and three-dimensional finite elements. Traditional node-based shaped functions are presented for a wide variety of element shapes and their pros and cons are outlined. The problem with nodal basis for a full-scale vector formulation is explained and edge-based vector shape functions are

5 introduced. The development of two- and three-dimensional edge bases is presented for a wide variety of element shapes. Higher order edge basis functions are presented for triangles and tetrahedra along with other recently developed novel shape functions. Chapter 4 describes the formulation and implementation for closed and open domain problems. In the first part of the chapter, the closed problem is solved by determining the eigenvalues of an empty or filled metallic cavity. The origin and avoidance of spurious modes is discussed. The open problem is then formulated in the second part of the chapter and schemes for terminating the finite element mesh are discussed. The code is then validated for a wide class of perfectly conducting and composite geometries having arbitrary shapes. Since the finite element-absorbing boundary condition methodology becomes extremely attractive for large problems, it is essential that the computer code be as computationally efficient as possible. Chapter 5 details the optimization and the subsequent parallelization of the finite element code and the various numerical considerations associated with it. The strategies for sparse matrix storage as well as solution of sparse systems using preconditioned iterative methods is outlined. The inherent parallelism in various types of point and block preconditioners is examined along with performance figures on the KSR1 and the Intel iPSC/80 massively parallel architectures. In Chapter 6, new mesh termination conditions which can be applied on termination surfaces conformal to the target are derived and applied on some benchmark geometries. Since these ABCs are enforceable on doubly curved surfaces, dramatic reductions in computer storage and solution time are obtained. The improved boundary conditions are applied on mesh truncation surfaces composed of combinations of

6 cylinders, spheres and flat planes and their performance examined with respect to mesh termination distance and system symmetry. Extensions to more complex mesh termination boundaries are possible. Chapter 7 concludes the thesis by summarizing the important results obtained during the course of this work and its possible future extensions.

CHAPTER II Basic concepts of electromagnetics and finite elements As mentioned in the last chapter, this thesis deals with the application of finite elements to three-dimensional problems in electromagnetics. Therefore, it is important to have a grasp of both the finite element method as well as electromagnetic theory to solve these problems. This chapter is thus divided into two parts. The first part gives a brief review of the basic concepts of electromagnetics and the resulting differential and integral equations pertaining to our problem. The second part introduces the reader to the formulation of boundary-value problems with finite elements. 2.1 Electromagnetics: basic concepts 2.1.1 Maxwell's equations Electromagnetic waves in all space are governed by a set of fundamental equations called Maxwell's equations. In differential form, they are written as OB VxE = — t: (Faraday's law) (2.1) VxH = J + -: (Maxwell-Ampere's law) (2.2) V-D = p:(Gauss' law) (2.3) 7

8 V-B = 0: (Gauss' magnetic law) (2.4) where E electric field intensity in volts/m D electric flux density in coulombs/sq. m H magnetic field intensity in amperes/m B magnetic flux density in webers/sq. m J electric current density in amperes/sq. m p electric charge density in coulombs/cu. m Another fundamental equation, frequently referred to as the equation of continuity, is given by V*J = -P (2.5) and expresses the conservation of charge. Of the five equations stated in this chapter, only three are independent. Either the first three equations, (2.1-2.3), or the first two equations, (2.1 and 2.2) and the fifth equation (2.5), can be chosen as independent equations. The remaining two equations,(2.4 and 2.5) or (2.3 and 2.4), can be derived from the independent equations and are thus called auxiliary or dependent equations. The three independent equations cannot be solved since the number of unknowns exceeds the number of equations. Maxwell's equations become definite only when the three constitutive relations between the field quantities are specified. These relations describe the macroscopic properties of the medium being considered. For a simple medium, they are given by D = 6E (2.6)

9 B = H (2.7) J aE (2.8) where the parameters c,p and a denote the permittivity (farads/m), permeability (henrys/m) and conductivity (mhos/m) of the material, respectively. For anisotropic media, the constitutive relations are given as D = e E (2.9) B =. H (2.10) J = aE (2.11) We will be considering only isotropic media in the following sections since the generalization to anisotropy is trivial and introduces needless algebraic complexity. It is usually sufficient to consider the steady-state solution for electromagnetic fields as produced by currents having sinusoidal time dependence. The set of Maxwell's equations, using complex phasor notation and the constitutive relations, can then be written as VxE = -jgwH (2.12) VxH = J+3weE (2.13) V (eE) =p (2.14) V.(/ H) = 0 (2.15) V.J = -jwp (2.16) where w is the angular frequency of oscillation and the time convention ejmwt is used and suppressed.

10 2.1.2 Wave equations The two curl equations, (2.1 and 2.2), can be combined together with the assumed constitutive relations, (2.7 and 2.8), to obtain a separate second-order differential equation for each field. By taking the curl of (2.1) or (2.2) and eliminating H or E respectively, we obtain 12 Vx-VxE- kcrE = -JwJ (2.17) lUr Vx-VxH - kuH = Vx ) (2.18) where ko = wo /eolo is the free-space wave number, er and Our are the respective relative permittivity and permeability of the medium under consideration and J is an impressed or source current. The differential equations shown above are called inhomogeneous vector wave equations in three dimensions. 2.1.3 Boundary conditions Mathematically, the solution of a partial differential equation (PDE) like the wave equation, outlined in (2.17) and (2.18), is not unique in a region unless boundary conditions are specified, i.e, the behavior of the field on the boundary of the region of interest. Boundary conditions play the same role in the solution of PDEs that initial conditions play in the solution of differential equations for electric circuits. An electromagnetic problem is thus completely defined only when it contains information about the governing differential equation and the corresponding boundary conditions at material discontinuities or inhomogeneities. At the interface between two media, say medium 1 and medium 2, the boundary conditions can be mathematically expressed as n x (El-E2) = 0 (2.19)

11 n (Di-D) = 0 (2.20) for electric fields and n x(Hi - H2) = 0 (2.21) n (Bi - B2)= 0 (2.22) for magnetic fields, where n denotes the unit normal to the interface. It is assumed in (2.20) and (2.21) that neither surface currents nor surface charge exist on the boundary. Equations (2.19) and (2.21) state that tangential electric and magnetic fields are continuous across dielectric boundaries. It is possible to simplify the above boundary conditions at the interface of a perfect electric conductor and free-space. Since a perfect conductor cannot sustain a field inside it and likewise since the flux lines of B are continuous, (2.19) can be rewritten as n x E = 0 (2.23) and (2.22) reduces to n B = 0 (2.24) However, the conductor surface can support a surface current (Js = n x H) and a surface charge (ps = n. D). 2.1.4 Radiation conditions It can be shown that the electric field within a finite volume can be derived in terms of the sources within the volume and the field values on the surfaces bounding that volume. If wae make this volume infinitely large, we arrive at the Sommerfeld

12 radiation condition [9] given by lim r + j k =0 (2.25) roo Or r + where b is regular at infinity and describes a component of the electric or magnetic field and r = x2 + y2. In three dimensions, the radiation becomes lim R [VxE + jkoR x E] = 0 (2.26) R —+oo where R = - y2 + z2. A similar result exists for the magnetic field. The radiation condition requires that E and H diminish as R-1 when R — + oo. 2.1.5 Radar cross-section The radar cross-section (RCS) is a quantity characterizing the scattering from an obstacle. It is defined as the area intercepting that amount of power which, when scattered isotropically, produces at the receiver a power density equal to that scattered by the target under consideration. In the three-dimensional case, the RCS is defined as a(0,q) = lim 4,R2 IF 2 (2.27) R-oo 2Fine2 wher Fs denotes the scattered field (either Es or Hs) at the observation point (R, 0, 0) and FinC represents the incident field, usually a plane wave, coming from (R, Oinc, nc) 2.2 Finite elements: basic ideas The finite element method (FEM) is a numerical method for obtaining approximate solutions to boundary-value problems in physics and engineering. Very few analytical solutions are possible and thus a numerical method like the FEM provides

13 us with an alternative solution technique for these problems. For a particular class of such problems, there exist extremum principles by which the solution being sought makes an appropriate functional stationary, or, in certain cases, extremal. In other problems for which no genuine extremal principles can be derived, the error resulting from the substitution of the numerical approximation into the differential equation is minimized. In the following pages, we will give a brief description of the two accepted formulation schemes - the Ritz variational method [10] and the Galerkin method of weighted residuals [11]. We will then discuss how the finite element method is used to solve PDE problems formulated by the two abovementioned schemes. 2.2.1 Boundary value problem We basically seek an unknown function u which satisfies a differential equation A(u) = Lu-f = 0 (2.28) in a domain Q and certain boundary conditions B(u) = 0 (2.29) on the domain boundary F. In electromagnetics, the form of the governing differential equation ranges from the simple Poisson equation in statics to the complicated vector wave equation such as (2.17) and (2.18). The boundary conditions can also vary from Dirichlet and Neumann conditions to more complicated higher-order transition and radiation conditions. 2.2.2 The Ritz method The Ritz method, or the Rayleigh-Ritz method, is a variational formulation where the solution to the boundary value problem is obtained by searching for the stationary

14 point of the functional. If the operator L in (2.28) is self-adjoint and positive-definite, then the solution to (2.28) can be found by determining the stationary point of the functional [1] F(u) = < Cu, > - < U, f > (2.30) with respect to u, where u denotes the trial function. The inner product, denoted by angular brackets, is given by < a,b>= a b dV (2.31) Once the functional is found, we approximate the trial function by the expression N u = Cw = CTw (2.32) i=l where wi are basis functions and Ci are constant coefficients to be determined. Substituting (2.32) in the expression for the functional, we get F() = CT (J WWT dQ) C-C C wf dQ (2.33) where C is the column vector of unknown coefficients and the superscript denotes the transpose of a vector. On differentiating F(u) with respect to C and setting the resultant expression to zero - equivalent to finding the stationary point of F(u) - we obtain a system of equations [A] {C} = {b} (2.34) where the elements of the matrix A and the vector b are given by A = J Wi dQ (2.35) b = J wf dQ (2.36) JQ

15 On solving (2.34) for the unknown coefficients {C} of the finite element bases, we obtain a solution for the desired quantity everywhere within the computational domain. It should be pointed out that the inner product as defined in (2.31) extends the applicability of the variational formulation to complex numbers. This is of utmost importance in electromagnetics since material fillings are often lossy. However, unlike in other branches of science, the functional does not have any physical significance in electromagnetics and is thus used sparingly. A further limitation is that the matrix A must be symmetric for a variational principle to exist. Problems which give rise to unsymmetric matrices need to be handled differently. The weighted residual or the Galerkin method provides an alternative, and simpler, approach of formulating the finite element equations. 2.2.3 Galerkin's method Galerkin's method or the weighted residual method tries to minimize a residual in the mean square sense. If we assume that u is an approximate solution to (2.28), on replacing u with u in (2.28) we obtain the residual 7t = LCu- f 0 (2.37) Naturally, the best approximation for u would be one that reduces the value of the residual to a minimum at all points in Q. The integral of the residual, weighted with some known weighting functions wi, is then required to vanish in Q. / Rwi dQ = 0 (2.38) In the Galerkin method, the weighting functions wi are chosen to be identical to the basis functions used for expanding u. With this choice of weighting functions,

16 the residual R is orthogonal to the subspace of functions spanned by the basis of u, ensuring that the resulting approximation is in this sense the best possible from the space of approximating functions. The expression (2.38) then becomes (wiWWTc - Wif) dQ = O, i = 1,..., N (2.39) where u has been expanded as in (2.32). This again leads to a matrix system identical to the one given in (2.34), obtained by the Ritz method. One of the advantages of this formulation is that the matrix system need not be symmetric for its validity. Also, the method can be applied to problems for which no genuine extremum principles exist. 2.2.4 Implementation scheme The implementation scheme for FEM follows three broad outlines. Step 1: At first, the problem is discretized by dividing the entire computational domain into simple subdomains, the elements. For two-dimensional problems, commonly used elements are triangles, parallelograms and quadrilaterals with straight or curved edges. In three dimensional implementations, the elements of choice are usually terahedra, curvilinear bricks or prisms with straight or curved surfaces [12, 13]. Step 2: Next, a suitable approximation function is chosen for the problem to be solved. The form of approximation depends on the type of element and must satisfy certain continuity conditions across inter-element boundaries. Further, the form of the polynomial function must remain unchanged under a linear transformation from one Cartesian coordinate system to another. This requirement is satisfied if the polynomials are complete to a specific order like U(X, y) = ci + C2X + c3y + c4x2 + c5xy + c6y2 (2.40)

17 or when the extra terms are symmetric with respect to one another, as in the following incomplete third-order polynomial u(x, y) = c1 + cx + c3y + c4x2 + C5xy + c6y2 + c7x2y + csxy2 (2.41) Such approximation functions have the characteristic that, for fixed x or y, they are always complete ploynomials in the other variable. The two examples shown above apply in two-dimensions; the extension to three-dimensional elements is trivial. Once the order of the polynomial is selected, we can derive an expression for the unknown solution in an element, say the eth element, having the following form: n U = E u eN (2.42) k=l where n is the number of bases in the eth element, u6 is the value of the unknown function at node, edge or facet j and Nk is the basis (or shape) function for the element. Step 3: In the third step, we enforce the extremum principle by substituting (2.42) into the functional expression (2.30) or into the residual value (2.37). On imposing the stationarity of the functional as explained in Section 2.2.2, we obtain the system of linear equations (2.34). An alternative procedure of minimizing the weighted residuals (2.38) yields an identical system of linear equation (2.34). As mentioned earlier, both the variational (Ritz) and the weighted residual (Galerkin) formulations are equivalent when the matrix system is symmetric. In addition, the Galerkin method can handle unsymmetric systems. Finally, the solution of (2.34) specifies the values of the desired function everywhere within the computational domain. In the next chapter, we will be discussing step 2 in more detail. The two subsequent chapters are devoted to the derivation of the finite element equations for our

18 application and the optimizations undertaken to enable rapid solution of the system.

CHAPTER III Shape functions for scalar and vector finite elements The finite elenient method is used for modeling a wide class of problems by breaking up the computational domain into elements of simple shapes. Suitable interpolation polynomials (or shape functions) are used to approximate the unknown function within each element. It is then possible to program the computer to solve complicated geometries by specifying the shape functions only. The element choice, however, needs human intervention and intelligence to ensure a reliable solution of the of the problem at hand. In this chapter, we will discuss the derivation of node-based and edge-based shape functions for two dimensional and three dimensional finite elements. Node-based shape functions have been used extensively in civil and mechanical engineering applications as well as in scalar electromagnetic problems. However, a full three dimensional vector formulation brings out numerous deficiencies in these traditional element shape functions [14, 15]. Edge-based shape functions have thus been derived to overcome the problems associated with nodal bases and are now being applied widely for solving vector problems in electromagnetics. We will also describe a general procedure for deriving higher-order shape functions for node and edge basis. 19

20 3.1 Node-based elements In node-based finite elements, the form of the sought function in the element is controlled by the function values at its nodes. The approximating function can then be expressed as a linear combination of basis functions weighted by the nodal coefficients. If the function values uie at the nodes are taken as nodal variables, then the approximating function for a two-dimensional element e with p nodes has the form p = Iue ) iN(x, y) (3.1) i=l Since the expression (3.1) must be valid for any nodal variable ue, the basis function Ne(x, y) must be unity at node i and zero for all remaining nodes within the element. Shape functions can be derived either by inspection (Serendipity family) or through simple products of appropriate polynomials (Lagrange family). It is easier and more systematic to construct higher-order bases in the Lagrange family while progression to higher orders is difficult in the Serendipity family. However, Lagrange shape functions have undesirable interior nodes and more unknowns than Serendipity shape functions of the same order. All shape functions derived in the following sections impose function continuity or Co continuity (not slope continuity) between elements. 3.1.1 Two dimensional elements Rectangular elements The simple shape of the rectangular element permits its shape functions to be written down merely by inspection. On examining the element shape given in Figure (3.1), the shape functions can be cast in the form e he ( h- e e -2 c 2 y

21 I he ye +kyy- -y N3 - Ae -x + x + c + y where xc and y1 denote the coordinates of the mid-points of the edges, hh and h represent the edge length and Ae denotes the area of the element. Higher order 4 3 — op- X A e ee _ 2 I x 1 Figure 3.1: Rectangular element rectangular elements are presented in Zienkiewicz [12]. However, these elements can model only regular geometries and are thus not very useful in practice. Irregular geometries can be modeled by using quadrilateral elements which can also be viewed as distorted rectangles. To construct basis functions for a quadrilateral element, we need to use a transformation that maps a quadrilateral element in the xy-plane to a square element in the or/ plane (Figure 3.2). Such a transformation can be found by satisfying the following relation at the four nodes of the quadrilateral element: x = a + b + crJ+ d y = a' + b'g' + c'r + d'(/ (3.2)

22 4 4 =1 3 3 K> — 1 - ^ VI 1 2 1 r=-1 2 Figure 3.2: Transformation of a quadrilateral element in the xy plane to a unit square in the r71 plane On solving for the unknown coefficients a,..., d, the basis functions can be cast in the following form Ni = +(l +o)(l +/o), i =1,...,4 (3.3) where o = (i and Tco = 7nti. The variables (ii,7i) denote the coordinates of the ith node in the (, /1) coordinate system. Triangular elements Triangular elements are popular since they can model arbitrary geometries. We will determine the shape functions of triangular elements by using Lagrange interpolation polynomials. Let us consider a point P within a triangular element (Figure 3.3). The area of the smaller triangle formed by points p, 2 and 3 is given by 1 x y 2 1 2 Y2 1 Y3

23 3 p _ — -—... -,2 Figure 3.3: Triangular element The area coordinate L7 is then given by L = (3.5) where A is the area of the whole triangle and can be found from (3.4) by replacing x and y in the first row with x1 and yi. Similarly, the two remaining area coordinates L2 and L3 are given by A2 Area P31 ~ A Area 123 L A3 Area P12 3 A Area 123 The values for x and y inside the triangular element reduce to 3 3 x = Lex y =E Lye (3.8) i=l i=1 The area coordinates are equal to the basis functions - Ne, i = 1,2,3 - when the required interpolation order is linear. Higher order basis functions for triangular and quadrilateral elements are derived in [12, 13].

24 3.1.2 Three dimensional elements Shape functions for three dimensional elements can be described in a precisely analogous way to their two dimensional counterpart. However, the simple rules for inter-element continuity given previously must be modified. The nodal field values should now interpolate to give continuous fields across the face of each element. Rectangular bricks The simplest polynomial approximation to a rectangular brick element is the trilinear function u (x, y, z) = ae + bex + cey + d(z + exy + feyz + gezx + hexyz (3.9) whose eight parameters are uniquely defined by the values of the function u at the eight corners of the brick. From the eight resulting equations, we can determine the coefficients ae, be,..., he and write the final expression in the form 8 U,(X ) y, Z =i Y, y )Ui (3.10) i=1 However, this is a cumbersome process and can be easily avoided by writing down the required basis functions by mere inspection. Since the basis function Ni must be unity at node i and zero at the remaining nodes, the eight interpolation functions can be written down as N1 he he (Xc he Z Y - y Z N - + 2 Z)(c + - ) ( Z -) Ne - -e (C + 2 YC 2 + Y Zc + 2 Z) I he hey- -^ he Ne = ye (Zc + 2- X) (- + 2+ ) (YZc+ 2 -z) = ve c 2 YC 2 c 2 1e he h e, he^ - Ne- _ e + X Y + + Ze ZH

25 N (V)( +X + - Y ) (-Z) + N7 -e (-XC + 2 + x -y -Y + Y - ze + ) -e + Z 1 he he he Ne = X+ x -x y + + Z + + Z where xe, y, and z' denote the coordinates of the center of the element, he, he, and he represent the edge lengths of the element and Ve denotes the element volume. Bricks with quadratic interpolation functions need 20 degrees of freedom and thus have e node points at the corners and the mid-points of each edge. Shape functions for hexahedral elements or distorted bricks can be derived by mapping the element in the xyz coordinate system onto a standard cube in a new ir/( coordinate system. The required transformation yields 8 8 8 x = N( ); y = E N(,, (); z = E N(, (3.11) i=1 =1 i=1 where N= (1 + i)(1 + rit)(1 + i) (3.12) with (,,,,() denoting the coordinates of the ith node. Tetrahedral elements The three dimensional analogue of a two-dimensional triangle is a tetrahedron (fourfaced element). Once again, we can introduce special coordinates, called volume coordinates or simplex coordinates, to simplify the derivation of shape functions. If P is a point within the tetrahedron shown in Figure 3.4, the four volume coordinates are given by _ Volume P234 Volume 1234 Volume P341 Volume 1234

26 4 1....::::::::........ Figure 3.4: Tetrahedr al element _ Volume P412 - Volume 1234 - Volume P123 Volume 1234 and any position within the element is specified by 4 4 4 x = Z Lixi; y = E Liyi; z = Z LiZi i=l1 i=l i=1 Quadratic shape functions for a tetrahedron necessitates the use of ten node points - the 4 corner nodes and the remaining 6 on the mid-points of the edges. Other elements Other three-dimensional elements having simple shapes include the triangular prism and isoparametric elements. It is much easier to discretize a complicated structure by using parallelepipeds in combination with prism elements. To ensure that a small number of elements can model a relatively complex region, curved or isoparametric elements can be used. There is a good review of such elements in [12, 16].................................................................................................................................. riw............. ~::: ~~::: ~~~::: ~~::: ~~::: ~~::: ~~~::: ~~ ~:::: ~~.................................::.::~~:~~:.~:. I::..:~~5::t ~:iI;....................................................................................................:.-. -.................................. and ay postionwithi the lemet.......... d b.............................................. Quadrtic sape unc.......................nnode oint................................. h e e d g e s.............. Otherthre-dim nsioal e................:..............ianularpris and isparameric elments.It is uch...............................tuctur...........................on w it prsm lem nt......t a s al.........................yco plx egon crvd r soarmeri

27 3.2 Edge-based elements In electromagnetics, we encounter several serious problems when node-based elements are employed to represent vector electric or magnetic fields. First, spurious modes are observed when modeling cavity problems using node-based elements [17]. Secondly, special care needs to be taken to impose boundary conditions at material interfaces and conducting surfaces [18]. The first limitation can also jeopardize the near-field results of a scattering problem, the far-field escapes contamination since spurious modes do not radiate. Edge-based finite elements, whose degrees of freedom are associated with the edges of the finite element mesh, have been shown to be free of the above shortcomings. They were described by Whitney [19] over 35 years ago and have been revived by Nedelec [20] and Bossavit and Verite [14] and Hano [21]. Mur and de Hoop [22], van Welij [23], Barton and Cendes [24] and Lee,et al [25] have extended their applicability to various two- and three-dimensional shapes and even constructed higher order elements for a more accurate approximation of the field values. 3.2.1 Two dimensional elements Rectangular elements We first consider the rectangular element first since its shape function is usually the easiest to formulate. For the element shown in Figure 3.1, we can find its edgebased finite element basis function merely by by inspection. If the edges are numbered according to Table 3.1 and considering that the basis function should be unity along one edge and zero over all others, the vector basis functions can be written as W1 = e -y + y 2 X

28 Edge no. Zi Z2 1 1 2 2 4 3 3 1 4 4 2 3 Table 3.1: Edge numbering for rectangular element W2 = 1 -Ye+ )X W3 = hX 2) = W4 = he (X Xc + 2 = where x, y and z are the unit vectors in the Cartesian coordinate system. The electric field within the finite element is then given by E = EEeWe (3.14) i=l where Ee denotes the tangential field along the ith edge. The basis functions Ne guarantee tangential continuity across inter-element boundaries since they have a tangential component only along the ith edge and none along the other edges. They are also divergenceless within the element and possess a constant non-zero curl. It should be noted that by taking the cross-product of z with We, we obtain basis functions which possess normal continuity across element boundaries, have zero curl and non-zero divergence. The latter are ideal for representing surface current densities and are known as roof-top basis functions in electromagnetics. They have found extensive use in the solution of integral equations [26]. Edge basis for quadrilateral elements can be derived by carrying out the transformation detailed in the derivation of nodal basis for quadrilaterals in the previous

29 section and then taking the gradient of the resulting expression for each edge. The edge-based quadrilateral element has two shortcomings. First, the integrals associated with these elements do not lend themselves to easy evaluation and secondly, the basis functions are not divergence free. However, their ability to model complicated shapes with a lesser number of unknowns than tetrahedra and property of tangential continuity across elements make them attractive for use in two-dimensional vector formulations. Triangular elements Since the edges of an arbitrary triangular element are not parallel to the x or y axis, it is not easy to guess the form of the vector basis function by inspection. Therefore, the edge basis for a triangular element is expressed in terms of its area coordinates, L1, L2 and L3. These are the so-called Whitney elements. If the local edge numbers are defined according to Table 3.2, then edge bases for a triangular element are defined as Wk = Ni = LLVL - LjVL, i,j = 1,...,3 (3.15) where Wk denotes the basis function for the kth edge of the element. The vector field inside the triangular element can, therefore, be expanded as 3 E = EEWe (3.16) k=1 where E, denotes the tangential field along the kth edge. It can be easily shown that the edge-based functions defined in (3.15) has the following properties within the element: V.Ni = 0 VxNij = 2VL, x VL

30 Edge no. Z| Z2 1 1 2 2 2 3 3 3 1 Table 3.2: Edge numbering for triangular element If 'e is the unit vector pointing from node I to node 2 in Figure 3.3, then 'e VLi = -1 and 'e VL2 = 1. Since L1 is a linear function that varies from unity at node 1 and zero at node 2 and L2 is unity at node 2 and zero at node 1, we have el N12 = Li + L2 = 1 (3.17) along the entire length of edge 1. This implies that N12 has a constant tangential component along edge 1. Moreover, since L1 vanishes along edge 2 and L2 vanishes along edge 3, N12 has no tangential component along these edges. Thus, tangential continuity is preserved across inter-element boundaries but normal continuity is not. A different method of constructing edge basis functions for triangular elements is given in [27]. Higher order vector basis functions include the contribution of facet elements to the approximating function. Unknowns in the triangular element are assigned as shown in Figure 3.5 [28]. The tangential projection of the vector field along edge {i, j} is determined by two unknowns El and EJ and two facet unknowns - F1 and F2 - are provided to allow a quadratic approximation of the normal component along two of the three edges. Only two facet unknowns are required to make the basis functions of second order complete. Therefore, there are 8 degrees of freedom for

31 1 2 3 E1 F1 F E3 2 3 3 3 E3 Figure 3.5: Second order triangular edge element each triangular element. The higher order vector field within the element is given by 3 3 E - = E-ELiVLj + FeLL2VL3 + F2LiL3VL2 (3.18) i=l j=l where we have arbitrarily chosen the facet variables to lie on edges 1 and 2. These variables are local unknowns associated with each separate triangular element and are included to provide a linear approximation for Vt x Et, where the subscript t denotes the tangential component. Since the edge variables provide common unknowns across element boundaries, tangential continuity of the field over the boundary is assured. However, an obvious disadvantage of these elements is that the 2 facet variables cannot be symmetrically assigned. This disadvantage can be avoided by using third order elements [29].

32 3.2.2 Three dimensional elements Edge-based elements have facilitated to a great degree the finite element analysis of three dimensional structures in electromagnetics. Linear nodal structures in electromagneticbasis with their problem of spurious modes and difficulty in maintaining only tangential continuity across material interfaces are not as convenient for electromagnetic field simulations in three dimensions. On the other hand, the introduction of edge based shape functions provide a robust way of treating general three dimensional problems having material inhomogeneities and structural irregularities like sharp edges and corners. In the following section, we will consider the simple rectangular bricks first and will proceed to derive edge-based shape functions for more complicated structures like tetrahedrals and curvilinear hexahedrals. The chapter is concluded with a brief discussion on hierarchical edge elements. Rectangular bricks and hexahedrals As in the two dimensional case, we derive the edge-based shape function for a rectangular brick (see Figure 3.6) by simple inspection. Since a constant tangential field component must be assigned to each edge of the element, we can express the shape function along each edge of the element as [30] hyhe 2 ( 2) h h (J e 2)(~e he2) 5 h h;> + c + 2 )( +c 2 )Y

33 8 7 5 6 __ _ 3 z Y 1 --- 2 Figure 3.6: Rectangular brick element W6 - z (-~% + -x + hx) = heh (z C 2 C 2 Y h -z + z(- + W8 -= z+ ( x-x x + -y + ( w12 hhe (c 2 c 2 ) W z+ _ y -y~+ W-xx + - c y-yY+ z where hhphe denote the edge lengths in the xy, and z directions, respectively, w~ _ 2 - hexe + 2 y-2 e eA and the center coordinates of the brick are given by (x, y, z). If the edge numbers are defined as in Table 3.3, the expression for the vector field within the element can

34 be expressed as 12 E e= EWe (3.19) where Ee represents the value of the electric field along the ith edge. The vector basis Edge no. Node i1 Node i2 1 1 2 2 4 3 3 5 6 4 8 7 5 1 4 6 5 8 7 2 3 8 6 7 9 1 5 10 2 6 11 4 8 12 3 7 Table 3.3: Edge definition for rectangular brick Ne defined for the rectangular brick element have zero divergence and a nonzero curl. Furthermore, the expansion (3.19) guarantees tangential continuity of the electric field across the surfaces of the elements. A rectangular brick element has limitations in the sense that it is unable to model irregular geometries. Due to this reason, the analog of the two dimensional quadrilateral ( the hexahedral element) finds much wider use in modeling practical three dimensional problems. As in the case of the quadrilateral element in two

35 dimensions, a hexahedral element in Cartesian coordinates can be seen as the image of a unit cube un(ler a trilinear mapping to the (y( coordinate system (see Figure 3.7). 8 7 K => - - 3 2 1 2 Figure 3.7: Mapping of,a hexahedral element to a unit cube Let us consider those faces for which ( =constant. Therefore, Vf must then possess only a normal component on that face. Since g varies linearly along the edges that are parallel to the (-axis, the vector function Vt has nonzero tangential components only along those edges that are parallel to the (-axis. Using the nodebased expression for the shape function in a hexahedral element given in (3.12), we may write the corresponding edge bases as W =8 (1 + 7in) (1 + (X() V, edges to (-axis (3.20) 8 = (1+ i) (1 + (i() V71, edges | to n-axis (3.21) 8 W = (1 + di) (1 + 71ij) VC, edges to C-axis (3.22) where ((, nV, i) denote the coordinates at edge i. The vector bases derived above possess all the desired continuity properties of edge elements and generally result in about half the number of unknowns than that

36 obtained by tetrahedral gridding. However, these basis functions are not divergenceless and it is difficult to generate a finite element mesh of an arbitrary structure using hexahedral elements. Tetrahedral elements Tetrahedral elements are, by far, the most popular element shapes to be employed for three dimensional applications. This is because the tetrahedral element is the simplest tessellation shape capable of modeling arbitrary three dimensional geometries and is also well-suited for automatic mesh generation. The derivation of shape functions for these elements follow the same pattern as that for triangular vector basis functions. If we consider the tetrahedron shown in Figure 3.4 and define the edge numbers according to Table 3.4, we have W = Nie = LVLje- LjVL,,j = 1,...,4 (3.23) and the vector field within the element can be expanded as 6 E = EWe (3.24) k=l A nice explanation of the physical character of the edge-based interpolation function is given by Bossavit [31]. Let us consider edge number 1 connecting nodes 1 and 2. Since VL2 is orthogonal to facet {134} and VL1 is orthogonal to facet {234}, the field turns around the axis 3-4 and is normal to planes containing 3 and 4. The field thus has only tangential continuity across element faces. Edge elements can also be described as Whitney elements of degree 1. Whitney elements of the second degree are called facet elements because they are constant over the face of the tetrahedron. The vector function for the facet element can be written as Nijk = 2 (LiVLj x VLk + LjVLk x VL, + LkVL, x VL), i, j, k = 1,..., 43.25)

37 Edge no. Node il Node i2 1 1 2 2 1 3 3 1 4 4 2 3 5 4 2 6 3 4 Table 3.4: Edge definition for tetrahedron As explained in [31], we now have a central field (emanating as if from node 4 in Figure 3.4) on each of the two tetrahedra that share the face {1,2,3}. The field can be imagined as coming from the 'source' 4, growing, crossing the facet and vanishing into the 'well' 4', the fourth vertex of the other tetrahedron. This field thus has normal continuity and the flux across the facet forms the degree of freedom for the element. Alternative expressions for linear basis inside a tetrahedron have been derived in [24]. They are given by f?7-i + g7-i x r, r in the tetrahedron 17 — = (3.26) 0, otherwise with h7-i f7-i = 6V r1i x ri2 (3.27) g7-i - 6V (3.28) in which i = 1, 2,..., 6, Ve is the volume of the tetrahedral element, &e = (ri2 -ri1)/h is the unit vector of the ith edge and hi = r i2- ril is the length of the ith edge

38 with ri1 and ri2 denoting the position vector of the i1 and i2 nodes. It can be shown that (3.23) is identically equal to (3.26) when simplified. Therefore, g7-i = h7-i (VLi x VLi2), i = 1,..., 6 where i1 and i2 are given in Table 3.4. The basis functions given in (3.26) have zero divergence and constant curl (V x W = 2g?). The order of the polynomial approximation for the first order edge element given in (3.23) or (3.26) can be taken as 0.5. This is because the value of the basis function is constant (0(1)) along the edge it supports and is linear (0(r)) everywhere else within the element. Mur and de Hoop [22] presented edge elements which are consistently linear, yielding a linear approximation of the field both inside each tetrahedron and along its edges and faces. Since this requires two unknowns per edge, there are 12 degrees of freedom per element. The basis functions in [32] are derived by first defining the outwardly directed vectorial areas of the faces as Ai = rj x rk + rk x rl + rl x rj (3.29) where ri, i = 1,..., 4 denote the position vectors of the vertices of the tetrahedron and i, j, k, 1 are cyclic. Then the edge-based vectorial expansion function is defined by N j(r)=-, 3- =1,...,4,A- (3.30) 3V where V is the volume of the tetrahedron and +(r) is a linear scalar function of position given by 1 (r - rb) A 4 3V in which rb is the position vector of the centroid of the tetrahedron. We observe that Oi(r) equals unity when r = ri and zero for the remaining vertices of the tetrahedral

39 element. In that sense, they are very similar to the simplex or volume coordinates mentioned earlier. They also satisfy the following equalities: 4 r = E i(r)ri i=l 4 EZ () = 1 i=1 The edge basis function Nij is a linear vector function of position inside the tetrahedral element and its tangential component vanishes on all edges of the element except the one joining vertices i and j. Nij varies linearly along the edge formed by nodes i and j such that Nij rj = 0 while Nj. (ri- r) = 1 These basis functions have non-zero values of divergence and curl. An inspection of the expressions for the vectorial areas reveals that the form is identical to that obtained by taking the gradient of one of the simplex or volume coordinates mentioned earlier. In other words, the three components of the vector A1 have the same functional dependence as that obtained by VL1 Y2 1 Z2 x2 1 z2 x2 1 Y2 VL1 = 6V x det y 1 3 - y det x3 1 3 + det X3 1 Y4 1 Z4 X4 1 Z4 X4 1 y4 where L1 is the volume coordinate for a tetrahedron defined in (3.13), det indicates the value of the determinant of the matrix and x, yi, zi denote the coordinates of the ith vertex. The basis functions with consistently linear interpolation in the tetrahedron can now be rewritten in more convenient notation as Wk = Ni = LVL, i,j 1,...,4, i j (3.31)

40 Still higher order basis functions are sometimes necessary for rapidly changing fields or for modeling extremely thin structures where linear interpolation for the highly stretched elements is not enough. The second order edge basis (O(r1-5)) for a tetrahedral element was first presented by Lee, Sun and Cendes [32]. We need 20 degrees of freedom to achieve a quadratic approximation of the vector field inside a tetrahedron (see Figure 3.8). Accordingly, the field within a tetrahedron can be written as 4 4 4 E = S Ei iVLj + E (FiLiLjVLk + F2LiLkVLL) i=1 j=l i=1 (3.32) ii iij where i, j, k form cyclic indices. The facet variables Fl' and F2 are common unknowns 2 4 w4 E3 3 Figure 3.8: Tetrahedral element for two tetrahedra that share the same face. Even higher order edge-based elements up to polynomial order 2 can be constructed. Each tetrahedral element now has 30 unknowns - 3 along each edge and 3 on each face.

41 Other elements Recently, Wang and Ida proposed a systematic method for the construction of curvilinear elements in [33]. The vector shape function is expressed in the following form: Wi(r) = (,, ()vi(r), i = 1,..., M (3.33) where < I, (C) are completely defined in the local coordinate system, vi contains the edge and facet information and M denotes the number of degrees of freedom in the element. These basis functions usually lead to a symmetric system of equations. However, it is difficult to find commercial mesh generation packages which construct curvilinear elements for a wide range of geometrical configurations. Hierarchical vector elements Finite elements are said to be hierarchical when the basis functions for an element are a subset of the basis functions for any elment of higher order [12]. The basis functions described in [34] are hierarchical and tangentially continuous. Vector elements complete upto polynomials of order 2 are available and basis functions of a given order are fully compatible to be used with basis functions of lower or higher orders. Thus elements of different orders could be used in the same mesh - lower order elements could be used in regions where field variation is uniform and higher order elements employed in regions where the field varies rapidly. The implementation of hierarchical vector elements can be a bit tricky, especially at the transition boundaries where elements of one order merge into the elements of another order. If several vector elements share an edge, the field tangent to the edge can be made identical in each of the tetrahedra. This is done by setting the coefficient of the corresponding basis function for the edge in all tetrahedra to be identical. For tangential continuity across a face, the same equality must be enforced

42 between the coefficients of all the edge and facet functions associated with the face. Table 3.5 gives the basis functions for hierarchical vector finite elements. Higher order basis functions are constructed by systematically adding the extra terms upto the desired order. It should be noted that the bases for the tetrahedron with 6 and Element type Polynomial Unknowns Basis function order per element Edge 0.5 6 L LVLj - LjVLj Edge 1 12 V (LiLj) Face 1.5 20 LkLjVLi Face LkLiVLj Edge 2 30 V [LiLj (Li - L)] Face V [LiLjLk] Table 3.5: Hierarchical basis functions for tetrahedron 20 unknowns shown in Table 3.5 is identical to the linear and second order edge basis given in (3.23) and (3.33), respectively.

CHAPTER IV Vector finite elements for 3D electromagnetic problems Finite elements have been used extensively to model open and closed domain electromagnetic problems in scalar form in two and three dimensions[13, 35, 36]. But a reliable full vector formulation proved to be extremely difficult to implement. The cause of the problem was found to be the traditional nodal basis functions that were being used to discretize the unknown field variable. The reasons for the failure of node-based elements in modeling the vector wave equation will be discussed in a later section. Fortunately, a novel remedy was found by assigning the degrees of freedom to the edges rather than to the nodes of elements. These types of elements had been described by Whitney[19] in terms of geometrical forms about 35 years back and were revived by Nedelec [20] in 1980. In recent years, Bossavit [14] and others [21, 22, 23, 24] applied these edge-based finite elements successfully to model three dimensional problems. In all these works, edge elements were seen to be devoid of the shortcomings commonly experienced with node-based elements. The goal of this thesis was to develop a general purpose code for computing the scattering pattern of three dimensional composite structures having complex shapes. Edge based functions with their robustness in modeling general three dimensional 43

44 problems were evidently our choice. Since the only simple shape to model an arbitrary three dimensional space is a tetrahedron, we settled on edge-based tetrahedra as our mesh discretization units. The mesh truncation was chosen to be done with absorbing boundary conditions (ABCs) which are local in nature, preserve the sparsity of the finite element system and permit scalability to large problems with minimal storage -O(N)- and computational time - 0(k * N), k <K N - penalties. In the first part of this chapter, we present the weighted residual or weak formulation for the closed domain problem and solve for the eigenvalues of a metallic cavity having arbitrary shape. In passing, we briefly describe the problem of spurious solutions encountered with node-based elements. We also validate our methodology by comparing the computed eigenvalues with analytically derived ones. In the latter part of the chapter (Part II), we formulate the open domain problem in terms of the variational functional, describe the enforcement of boundary conditions for perfectly conducting and composite targets and present the proof of the mesh termination condition in detail. We then validate our solution by comparison with measured or analytically derived data. PART I: CLOSED DOMAIN PROBLEM Solving Maxwell's equations for the resonances of a closed cavity is important in understanding and controlling the operation of many devices, including particle accelerators, microwave filters, microwave ovens and optical fibers. However, the exact eigenvalues can be obtained only for simple geometries. For arbitrarily shaped cavities, numerical techniques like the finite element method must be used, but the occurrence of spurious modes [17] in the node-based finite element approach has plagued the computation of their eigenvalues. This difficulty can be circumvented

45 with the introduction of a penalty term [37] to render the finite element vector field solutions non-divergent. However, it is difficult to satisfy continuity requirements across material interfaces and treat geometries with sharp edges [38] using classical finite-elements, obtained by interpolating the nodal values of the vector field components. As mentioned in the Introduction to this chapter, edge elements, a type of vector finite elements with their degrees of freedom associated with the edges of the mesh, have been shown to be free of these shortcomings. Generally these lead to more unknowns but the higher variable count is balanced by the greater sparsity of the finite element matrix so that the computation time required to solve such a system iteratively with a given accuracy is less than the traditional approach [24]. Here we solve for the eigenvalues of an arbitrarily shaped metallic cavity using node-based and edge-based vector finite elements. The computed data are then compared with analytical results for empty and partially filled cavities. A comparison between the storage intensity and computational accuracy for edge-based rectangular bricks and tetrahedra is also presented. Finally, we compute the eigenvalues of a metallic cavity with a ridge along one of its faces. 4.1 Formulation 4.1.1 Finite element equations Consider a three dimensional inhomogeneous body occupying the volume V. To discretize the electric field E within this volume, we subdivide the volume into small tetrahedra or rectangular bricks, each occupying the volume Ve (e = 1,2,...,M), where M is the total number of elements. For a numerical solution, we expand E within the eth volume element as

46 m E=EJW (4.1) j=1 where WJ are the edge-based vector basis functions, Eje denote the expansion coefficients of the basis, m represents the number of edges comprising the element and the superscript stands for the element number. On substituting this into the usual vector wave equation and upon applying Galerkin's technique, some vector identities and the divergence theorem, we obtain the weak form of Maxwell's equation = E (V x W. (V X We)- rWe WJ dv -JkoZoJ W (n x H) ds (4.2) where Ri represents the weighted residual integral for element e, Se denotes the surface enclosing Ve, n is the outward unit vector normal to Se,) ZO is the freespace intrinsic impedance and 6r, or is the material permittivity and permeability, respectively. Equation (4.2) can be conveniently written in matrix form as {Re} = [Ae] {Ee} - k2 [Be6] {Ee} - {Ce} (4.3) where A. = -(V x We) (V x We) dv (4.4) B = erW Wje dv (4.5) Ci = JkoZ0j W (n x H)ds (4.6) and on assembling the equations from all the elements making up the geometry, we obtain the system M M M M E{Re} = [Ae] {Ee} - k [Be] {Ee} - E {Ce} e=l e=l e=l e=l = {o} (4.7)

47 where all matrices and vectors following the summation sign have been augmented using global numbers. Due to the continuity of tangential H at the interface between two dielectrics, an element face lying inside the body does not contribute to the last term of (4.7) in the final assembly of the element equations. As a result, the last term of (4.7) reduces to a column vector containing the surface integral of the tangential magnetic field only over the outer surface of the body. In this application, the surface enclosing the volume of the body V is perfectly conducting and, thus, the coefficients associated with the edges bordering the perfectly conducting surface can be set to zero a priori. This reduces the original unknown count and eliminates the need to generate equations for those edges/unknowns which would have otherwise involved the column vector {Ce}. Also since {Ce} is only associated with boundary edges, the surface integral associated with it vanishes and (4.7) can be written as [A]{E} = [B] {E} (4.8) where [A] and [B] are N x N symmetric, sparse matrices with N being the total number of edges resulting from the subdivision of the body excluding the edges on the boundary, {E} is a N x 1 column vector denoting the edge fields and A = k2 gives the eigenvalues of the system. A solution of (4.8) will yield the resonant field distribution {E} and the corresponding wavenumber ko. 4.1.2 Origin of spurious solutions Conventional finite element basis functions give rise to spurious solutions when (4.8) is solved. As Wong and Cendes points out in [39], the origin of these spurious solutions lies in the infinitely degenerate eigenvalue k = 0 in the spectrum of (4.8). Given the eigenvalue system in 4.8 along with the PEC boundary condition n x E = 0

48 on the boundary, there exists an infinite number of scalar functions 4 such that,0 = 0 on the boundary. Then E = -Vo is a permitted eigenfunction corresponding to the eigenvalue k = 0. If the discretization scheme fails to model this infinite dimensional nullspace of the curl operator exactly, spurious solutions to the eigenvalue problem will appear. One way to get rid of spurious modes is to formulate the eigenvalue problem such that k = 0 is no longer a permissible eigenvalue. This is achieved by enforcing V E = 0 (4.9) exactly everywhere in the solution region. Then the only solution corresponding to the k = 0 eigenvalue is the trivial one E = 0. This is also the reason why spurious solutions do not occur when the Helmholtz equation is discretized. In finite elements, solving a problem (4.8) along with a constraint (4.9) is well known [12]. Researchers have mostly tried the penalty function approach of constrained minimization [37, 38] since it is simple to implement. However, the penalty approach is a mere fix and not a cure for the problem. Since the spurious eigenmodes are now shifted far into the visible spectrum, they are not completely eliminated and are dependent on an user-defined parameter which specifies how strongly the divergenceless condition is to be imposed. Other than the penalty method, derivative continuous finite elements (C1 elements) have also been proposed [39] to alleviate this problem. In this method, an auxiliary vector field ( is introduced such that E=Vx( (4.10) Since substitution of (4.10) for E into (4.8) results in second derivatives, we need to construct first derivative continuous elements or C1 elements. As shown in [39],

49 discretization of E using node-based C1 elements eliminates the problem of spurious solutions since the nullspace of the curl operator is modeled exactly. However, C1 elements are not commonly found in finite elements and need to be explicitly derived for the problem at hand. Another method of eliminating spurious modes, without getting rid of the eigenvalue k = 0, is by using edge elements [40]. Bossavit in [40] provides a mathematical proof as to why spurious modes do not appear with edge based elements and why they are likely to be present with node-based vectorial elements. However, special node-based elements, like the C1 element in [39], do not present this problem. Thus the root cause of spurious modes appears to be the improper modeling of the nullspace of the curl operator. Any basis function which approximates it correctly will be stable and free of spurious modes. As it turns out, conventional Lagrangian finite elements are unsuitable; either C1 node-based elements or edge-based elements of any order can be used to obtain the true solutions. 4.1.3 Basis functions Edge no. i Z2 1 1 2 2 1 3 3 1 4 4 2 3 5 4 2 6 3 4 Table 4.1: TETRAHEDRON EDGE DEFINITION

50 The vector edge-based expansion functions for rectangular bricks were presented in [30]. Vector fields within tetrahedral domains can be conveniently represented by expansion functions that are linear in the spatial variables and have either zero divergence or zero curl. The basis functions defined in [24] are associated with the six edges of the tetrahedron and have zero divergence and constant curl. To define them, let us assume that i1 and i2 are the terminal nodes of the ith edge and the six edges of a tetrahedron are numbered according to Table 4.1. The vector basis function associated with the (7 - i)th edge of the tetrahedron is then given by f7-i + g7-i x r, r in the tetrahedron W_, - (4.11) O, otherwise with b7_i f7-i = xr2 (4.12) g7-i = 6- (4.13) 6Ve in which i = 1, 2,..., 6, Ve is the volume of the tetrahedral element, ei = (r2 - r^ )/bi is the unit vector of the ith edge and bi = Iri - ri I is the length of the ith edge with ri1 and r%2 denoting the location of the il and i2 nodes. In general, the implementation of the above discretization will involve two numbering systems, and thus some unique global edge direction must be defined to ensure the continuity of n x E across all edges [41]. Here we choose this direction to be coincident with the edge vector pointing from the smaller to the larger global node number. Finally, since V Wt = 0, the electric field obtained from a solution of (4.3) satisfies the divergence equation within each element and, thus, the solution will be free from contamination due to spurious solutions.

51 Mode Analytical Computed Computed Error (%) Error (%) (bricks) (tetra.) (bricks) (tetra.) 270 260 unknowns unknowns TElol 5.236 5.307 5.213 -1.36.44 TM1lo 7.025 7.182 6.977 -2.23.70 TEoll 7.531 7.725 7.474 -2.58 1.00 TE201 7.767 7.573 -3.13 -.56 TM111 8.179 8.350 7.991 -2.09 2.29 TE111 8.350 8.122 -2.09.70 TM210 8.886 9.151 8.572 -2.98 3.53 TE102 8.947 9.428 8.795 -5.38 1.70 Table4.2: EIGENVALUES (ko,CM-1) FOR AN EMPTY 1CM X 0.5CM X 0.75CM RECTANGULAR CAVITY 4.2 Results In Table 4.2, we present a comparison of the percentage error in the computation of eigenvalues for a 1cm x.5cm x.75cm rectangular cavity using edge-based rectangular bricks and tetrahedra. The edge-based approach using tetrahedral elements predicts the first six distinct non-trivial eigenvalues with less than 4 percent error and is seen to provide better accuracy than rectangular brick elements. The maximum edge length for the rectangular brick elements was.15cm whereas that for the tetrahedral elements was.2cm. To investigate this matter further, we considered a cubical metallic cavity having a side length of.5cm. A plot of the percentage error in calculating the first three degenerate resonant frequencies versus the number of unknowns is given in Figure 4.1 for both rectangular bricks and tetrahedral elements. It is clear in this example that the tetrahedral elements predict the eigenvalues with greater accuracy than the rectangular bricks. In Tables 4.3, 4.5 and 4.4, we compare the exact eigenvalues with those computed using edge-based tetrahedral finite elements. The finite element mesh was generated

52 1.5 -1.25 -u-,,Tetrahedron ^ 0.75- \ ^^ 0.5 0.25 -0 500 1000 1500 2000 Number of unknowns Figure 4.1: Performance comparison of rectangular bricks and tetrahedrals. using SDRC I-DEAS, a commercial pre-processing package and it is seen that the numerical results are in good agreement with the exact values for both homogeneous and inhomogeneous cavities. The exact eigenvalues of the half-filled cavity as described in Table 4.3 are computed by solving the transcendental equation obtained upon matching the tangential electric and magnetic fields at the air-dielectric interface. As seen, these results agree with those predicted by the finite element solution to within 1 percent (no symmetry was assumed in this solution). Similar comparisons are given in Table 4.4 for a sphere having 1cm radius. Finally, Table 4.6 presents the eigenvalues of the geometry illustrated in Figure 4.2. This is a closed metallic cavity with a ridge along one of its faces. It is noted that as the degeneracy of the eigenvalues increases, the matrix becomes increasingly ill-conditioned and the numerical solution is correspondingly less accurate [42]. This is clearly observed from the data in Table 4.4 for the case of a

53 Mode Analytical Computed Error (%) 192 unknowns TEzlo 3.538 3.534.11 TEz201 5.445 5.440.10 TEz102 5.935 5.916.32 TEz301 7.503 7.501.04 TEz202 7.633 7.560.97 TEzlo03 8.096 8.056.50 Table 4.3: Eigenvalues (ko,cm-1) for a Half-Filled 1cm x O.lcm x 1cm Rectangular Cavity Having a Dielectric Filling of cr = 2 Extending from z = 0.5cm to z = 1,Ocm. * - 1.0 cm 0.5 cm i; ', y/ — 0.4 cm — — 0.4 cm — Figure 4.2: Geometry of ridged cavity perfectly conducting hollow spherical cavity. Since the second lowest TM mode has five-fold degeneracy, the computational error is seen to be the greatest. However, for the partially filled rectangular cavity, the absence of degenerate modes gives results which are accurate to within 1 percent of the exact eigensolutions. We finally remark of the inherent presence of zero eigenvalues in our computations whose number is equal to the internal nodes. These zero eigenvalues are easily identifiable and since they do not correspond to physical modes, they were always discarded.

54 Mode Analytical Computed Error (%) 300 unknowns TMojo 2.744 2.799 -2.04 TM,,,even 2.802 -2.11 TM111,odd 2.811 -2.44 TMo21 3.870 3.948 -2.02 TMi2i,even 3.986 -2.99 TM121,odd 3.994 -3.20 TM221,even 4.038 -4.34 TM221,odd 4.048 -4.59 TEoll 4.493 4.433 1.33 TEiii,even 4.472.47 TE111,odd 4.549 -1.25 Table 4.4: Eigenvalues (ko, cm-1) for an empty spherical cavity of radius 1cm 4.3 Conclusions It was shown that the resonant frequencies of an arbitrarily shaped inhomogeneously filled metallic resonator can be computed very accurately via the finite element method using edge-based tetrahedral elements. The same method in conjunction with node-based elements is much less reliable and not readily applicable to regions containing discontinuous boundaries in shape and material. Edge-based rectangular bricks do not provide as good an accuracy as edge-based tetrahedral elements and their use is further limited to a special class of geometries. PART II: OPEN DOMAIN PROBLEM Of generic interest in electromagnetic scattering is the modeling of composite configurations comprised of metallic and non-metallic sections. In the case of man-made structures, abrupt material discontinuities and metallic corners are also encountered along with resistive sheets and thin ferrite coatings intended for controlling the scat

55 Mode no. Analytical Computed Error(%) TM 010 4.810 4.809.02 TE 111 7.283 7.202 1.1 7.288 -.07 TM 110 7.650 7.633.22 7.724 -.97 TM 011 7.840 7.940 -1.28 TE 211 8.658 8.697 -.45 8.865 -2.39 Table 4.5: Eigenvalues for an empty cylindrical cavity of base radius 0.5cm and height 0.5 cm (380 unknowns) terer's radar cross-section (RCS). Differential equation methods, especially the finite element method (FEM), with its capability of handling arbitrary geometries and its versatility in modeling inhomogeneities and material discontinuities has been a viable solution approach for bounded domain problems. However, for unbounded problems as is the case with electromagnetic scattering, the solution is more involved since the finite element mesh needs to be truncated artificially at some distance from the object with a suitable boundary condition. These boundary conditions can be either global or local. Global boundary conditions are exact but lead to fully populated submatrices thus spoiling; the sparse, banded structure of the finite element system. Further, problems due to internal resonances may arise in many cases [43]. In contrast, local conditions such as the absorbing boundary conditions (ABCs), are approximate but have the important advantage of retaining the sparsity of the matrix system. Also, they are free from the interior resonance problem that plagues boundary integral termination schemes [43]. ABCs are essentially differential equations enforced at the mesh truncation boundary and are chosen to suppress non-physical reflections from that boundary, thus ensuring the outgoing nature of the waves.

56 No. (a) (b) 1 4.941 4.999 2 7.284 7.354 3 7.691 7.832 4 7.855 7.942 5 8.016 7.959 6 8.593 8.650 7 8.906 8.916 8 9.163 9.103 9 9.679 9.757 10 9.837 9.927 Table 4.6: Ten lowest non-trivial eigenvalues (ko, cm-1) for the geometry drawn in Figure 2: (a) 267 Unknowns; (b) 671 Unknowns A variety of ABCs have been derived and widely employed in FEM solutions of open region two-dimensional scattering problems. However, the method's implementation and performance for scattering by three dimensional geometries using edgebased finite elements has not received similar attention. The only three-dimensional implementations of the FEM for scattering has been a hybrid solution combined with the boundary element method (BEM) [41, 30, 44] and those formulations combined with ABCs [45, 46]. The boundary element method, though exact, is equivalent to employing a global boundary condition for terminating the mesh and consequently leads to a full submatrix, restricting the method's utility to small geometries. For large-scale three-dimensional applications, it is necessary to employ an ABC for terminating the mesh to retain the O(N) storage requirement, characteristic of the finite element method. However, the use of traditional node-based elements suffers from various difficulties as mentioned in the Part I. To avoid these difficulties, we consider an implementation of the FEM using vector basis functions whose degrees of freedom are associated with the fields along the six edges of a tetrahedron. Our implementation is further coupled with a mesh

57 termination scheme based on the vector ABCs derived in [47, 48]. In contrast to the implementation proposed in [49], the one presented here preserves the symmetry of the finite element system, thus being computationally more efficient and making it ideally suited for solution via a conjugate gradient type of algorithm. Further, the implementation discussed in [49] requires that the absorbing boundary be placed nearly a wavelength away from the scatterer, whereas in our implementation remarkably accurate results are obtained with the ABCs enforced a small fraction of a wavelength from the scattering body. This is probably due to the accuracy of the second order ABCs derived in [48]. 4.4 Formulation In the following section, the open domain problem is formulated in terms of the finite element functional. The final system is obtained by setting the first variation in the functional to zero and then a Rayleigh-Ritz minimization is performed to arrive at the final answer,. 4.4.1 Derivation of finite element equations Let us consider the problem of scattering by an inhomogeneous target associated with possible material discontinuities. To solve for the scattered fields via the FEM, it is necessary to enclose the scatterer- embedded inside the volume V- by an artificial surface SO on which the ABC is enforced (see Figure 4.3). The ABCs to be considered in this chapter are the Sommerfeld radiation condition given by n x V x Es = -jkoni x n x E (4.14) and the second-order ABC [48] which can be written as n xVxEs = aE + /Vx[nf(VxEs)n] + /Vt(V. Es) (4.15)

58 l* *.j;::.:::.:::::::::::. *. * *......K............. i.:.:......,, '.'..,, '. iii. iiiii iiii... iiii.i iiiig.....:...........:.. *.:-:- -:-:- iiiiiiiiiiiiiiiii i..............................iiiii i iiiii iiiI.:-:-.:. th,::u::n:it no m.::th aciiiiiiaic Siii ti iiiiiiiiiip t iiiiaii iii:..:.: -............iii.i.i..i...iii.i.iiiii.iii.ii.i.......i i. I.....:....*:::::......................:* '.:V ^ ' ' i'i^iI.iiii..i.i.i.iii iiiiii iiii yii^............. ii II ii................: I and no liiiiicpiiineiitiit S rieisc i iviieiiy. iWiiiiih n iiiiriieieim iiiiiioiini i:.::. w..=: i~~iirr I.:..........ii.................... i........... v ''".'1ii i j iiEasiitiiij iiiiiiiii~iJ^ ii Figure 4.3: Illustration of scattering structure Vd enclosed by an artificial mesh termination surface, SOi on which the absorbing boundary condition is imposed. where a = jk, s = 1/(2jk + 2/r), Es represents the scattered electric field, n is the unit normal to the surface So and the subscripts t and n denote the transverse and normal component to So, respectively. When these ABCs are employed on the artificial boundary So, they annihilate all field terms of O(r-(2m+1)) and smaller, where m denotes the order of the ABC. The ABCs outlined above were derived for spherical surfaces [48] but in this work we have extended their application to SO which include flat sections. In this case, the local curvature is used to replace 1/r in (4.15). For flat sections, the 1/r term, therefore, reduces to zero. This permits the construction of termination boundaries conformal to the scatterer, thus reducing the size of the the computational domain. A detailed derivation of (4.15) as well as other more general ABCs are outlined in Chapter 6. The vector ABCs (4.14) and (4.15) can be combined and more conveniently writ

59 ten as n x V x E = P(E') (4.16) for the scattered field formulation in which Es is the working variable and n x V x E = P(E) + Un"' (4.17) for the total field formulation where the unknown is the total electric field. In (4.17), U'C = n xVxEc - P(Ein) (4.18) where E = Es + Einc is the total field and Einc is the incident electric field. Considering (4.17) to be the boundary condition employed at So, we can express the functional for the total electric field as F(E) = J [ (V x E) (V x E) - kcrE E] dV + / [E. P(E) + 2E U dS (4.19) where Er and Pir are the relative permittivity and permeability, respectively. The above functional can be generalized to account for the presence of impedance and resistive sheets or other discontinuous boundaries. In the case of a resistive card, the transition condition [50] n x (n x E)=-Rn x (H+ -H-) (4.20) must be enforced, where Hi denotes the total magnetic field above and below the sthe resistivity in Ohms per square and n is the unit normal to the unit normal to the sheet pointing in the upward direction (+ side). For an impenetrable impedance surface, the appropriate boundary condition on that surface is n x (n x E) = -rnfi x H (4.21)

60 where ni is the unit normal to the surface and V is the surface impedance. Taking into consideration these boundary/transition conditions, the functional for the total electric field can be more explicitly written as F(E) = IV [ (V x E)(V E - krE E]dV +jkoZoj (n x E) (n x E)dS + J [E P(E) + 2E u ] dS (4.22) where K is the surface resistivity (R) when integrating over a resistive card and equals the surface impedance (r/) for an impedance sheet. In order to deal with anisotropic scatterers, the functional outlined in (4.22) undergoes a slight modification since the material properties of the scatterer (permeability and permittivity) are now second rank tensors rather than scalars. Equation (4.22) can therefore be written as F(E) = [(VxE).{[-1. (V x E)}- E { E}] dV +jkoZ J (ni x E) (n x E) dS + [E. P(E) + 2E U dS (4.23) where Uxx ~xy ~xz =- (/y tyy tyz (4.24) Izx Pzy Izz and Cxx Cxy Cxz = ~ (4.25) yx Szyy Syz (4 Lzx C (zy Czz

61 The symmetry of the final system of equations now depends on the symmetry of the permeability and the permittivity tensors. The formulation presented above is in terms of the total field but we can easily revert to a scattered field formulation by setting Es = E - Einc and noting that the scattered field satisfies the wave equation inside the domain of interest. Let us consider the case where the computational volume V is occupied by a dielectric structure and is bounded internally by the surface of a perfect conductor and externally by the mesh termination boundary. On examining the terms inside the volume integral in (4.19), we can define j [(VxE).(VxE)-k2erEE]E dV = G(E,E) (4.26) Expressing the above relation in terms of the incident and the scattered fields, we have G(E, E) = G(ES, ES) + 2G(Es, Einc) + G(Einc, Einc) (4.27) The first and the third terms on the RHS of (4.27) cannot be simplified any further. The second term will, however, lend itself to more simplification. Making use of a simple vector identity and the divergence theorem, we can rewrite G(ES, Einc) as G(Es, Einc) = J Es [Vx-VVxEif- k2rEin dV - j E* (n x VxEinc) dS (4.28) since -(VxE ). (VxE )] dV = j Es. [Vx-VxEinc dV - -E (n x VxEc) dS (4.29) L ~Ur PUr

62 and the surface integral cancels out everywhere inside the computational domain except on the mesh termination boundary So. If we define Vd to be the volume occupied by dielectric materials, then the remaining volume (VO = V - Vd) is the volume occupied by free space. On incorporating this into (4.28), we have G(Es EInc) E=. [VxVxEinc - koEnc dV + E Vx- x E -koerE dV Since the incident electric field satisfies the wave equation in free space, the first term of (4.30) is identically zero. The third term cancels exactly with the cross term ds E. UiTc dS in the total field functional (4.19). The second term can be simplified by employing a standard vector identity and the divergence theorem to yield Vd [inc in +jkZ j o -Es. (ni x Hinc) dS (4.31) where the normal to Sd is directed away from Vd. The surface integral over the dielectric interface Sd occurs since the tangential component of the scattered electric field is discontinuous over the interface between two dielectrics having dissimilar permeabilities. It should be noted that (4.31) holds good even when there are multiple dielectric regions present. If the dielectric regions have the same permeability (/Prl = br2 =... rn = 1, for example) and different permittivities, the surface integral contribution over the dielectric interfaces -Sdl,..., Sdn- is zero. If different permeability values are also present, then the permeability values must be substituted into the element equations and the direction of the normal for the two elements on

63 the interface should take care of the respective signs. Therefore, G(ES, Einc) reduces to G(E, Einc) = (VxE) s(VxEinc) - koc2rE. Eic dV -r +jkoZo 1- Es (n x Hinc) dS Pd ]r -I L E e (n x VxEinc) dS (4.32) The impedance and resistive sheet boundary conditions can be incorporated in a similar way into the scattered field functional. After simplification, the functional F(Es) for the scattered field is then given by F(E5) == J (V x Es) (V x Es) - k2rE E] dV +JkoZo (ni x ES). (n x Es) dS + E P(ES)dS +2jk0Zo J I-E". (n x H"ic) dS +2 [(V x Es) (V x Einc) - k erEs Einc] dV +2jkoZo Is (n x E5) * (n x Einc)dS Es5 P(E5)dS = [ (Es)2 +,(V x Es) - (V - Es)2] dS when the second order ABC is employed. The function f (Einc) is solely in terms of the incident electric field and vanishes when we take the first variation of F(E5). We remark that the scattered field formulation was implemented in our code.

64 4.5 Finite element discretization To discretize the functional given in (4.33), the volume V is subdivided into a number of small tetrahedra, each occupying the volume Ve (e = 1, 2,..., M), where M denotes the total number of tetrahedral elements. Within each element, the scattered electric field is expressed as m E = E EW = {W }T{Ee} = {E}T{W } (4.34) j=l where Wq are the edge-based vector basis functions [24], Ej denote the expansion coefficients of the basis and represent the field components tangential to the jth edge of the eth element, m is the number of edges making up the element and the superscript stands for the element number. The basis functions used in our implementation have zero divergence and constant curl. The system of equations to be solved for Eje is obtained by a Rayleigh-Ritz procedure which amounts to differentiating F with respect to each edge field and then setting it to zero. On substituting (4.34) into (4.33), taking the first variation in F and assembling all M elements, we obtain the following augmented system of equations ( OF M M MP E = [A]{Ee + E~[BS]{ES} + E{CP} = 0 (4.35) e=1 s=1 p=l In this, MS denotes the number of triangular surface elements on Sk and So and Mp is equal to the sum of the surface elements on Sk, Sd and the volume elements in Vd. The elements of the matrices [Ae], [B8] and {CP} are given by = J [i(V x W). (V x Wj) - W Wj] dV Bsj = jkoZo [/ (n x W) (n fix W)dS]

65 + aJ [Wt * W~ t + 3(V x W) (V x W) - i(V. W,)(V. W't)] dS Ct = 2jkiZ [ -W. ( ni x H c)dS + J (ni x WP) i (n x E )dS] L Yr ^k K +2 X E- (V x W) (V x Einc) - k6rW? P Ei] dV where Vd is the volume of the pth tetrahedron inside the dielectric, S3 and SP represent the surface area of the sth and pth triangular surfae element and the subscripts t and n denote the tangential and normal components of a vector, respectively. The boundary condition nf x Es = -n' x EinC must be imposed a priori on metallic boundaries; however, no special treatment is required at material discontinuities. Only the identification of the edges on material discontinuities or inhomogeneities is required to kick in the contribution from the surface integrals in (4.33). The biconjugate gradient algorithm with diagonal preconditioning was used to solve the sparse, symmetric system of equations. The residual norm was usually set to less than 0.1% of the solution norm as a criterion for convergence since lower tolerances did not appear to offer significant improvement on the far-field values. The data structure was constructed such that only the non-zero elements of the upper triangular part of the symmetric, sparse matrix were stored in a Na x k complex array. In our case, Na was typically 1.1 x N, where N, denotes the number of unknowns and k was equal to 12. The corresponding addresses were stored in a separate Na x k integer array. The storage required in this scheme was about 25N" and the number of distinct non-zero elements was typically 9N,. 4.6 Results A computer program was written for implementing the proposed FE-ABC formulation. This implementation was validated by computing the scattering for several

66 15 -10.! 5... 0- 00 6 '0 1: Z V-O0-a —/0 --- 'v -5 -b-10 X. Y -15 _. Measured FE-ABC(_=90~ ) -20 -............... - illll.i o 0 Measured -25-, * i.,,i,-, i -, i,, -- 0 30 60 90 120 150 180 Observation Angle 6o, deg. Figure 4.4: Bistatic echo-area of a perfectly conducting cube having edge length of 0.755A. Plane wave incident from 0 = 180~; 4 = 90~. configurations including metallic and dielectric bodies as well as structures satisfying resistive and impedance boundary conditions. Figure 4.4 compares the measured [51] bistatic cross-section (nc = 180~, q$7C = 90~) of a metallic cube having an edge length of 0.755A with the corresponding pattern computed by the three-dimensional FE-ABC code. The second-order vector ABC was employed on a spherical mesh truncation boundary which was placed only 0.1A from the edge of the cube. About 33,000 unknowns were used for the discretization of the computational domain and the [A] matrix contained a total of 264,000 distinct non-zero entries. The storage requirement of this matrix was consequently much smaller than that of the 1400 unknown moment method system (assuming the same sampling rate as the FEM of 14 points/A) which had 2 million non-zero entries. In Figure 4.5, we plot the normal incidence backscatter RCS of a perfectly con

67 30 - 20 -10 -N TY FE-ABC -10 - -20 -... -30 - I I I I I I I I,,, 0 1 2 3 4 5 6 7 Edge length in wavelengths Figure 4.5: Backscatter RCS of a perfectly conducting cube at normal incidence as a function of edge length ducting cube as a function of its edge length. The meshes constructed for this experiment were terminated on conformal boundaries, i.e, on another cube placed a small distance (more than 0.15A) from the scatterer. As seen, the agreement with measured data [51] is remarkably good over a 50dB dynamic range. Figure 4.6 presents backscatter data for a cylinder of radius 0.3A and height 0.6A. The data from the three-dimensional finite element code again compare well with that obtained from a moment method-body of revolution code. The mesh was terminated on a spherical boundary at a distance of 0.3A fromn the edge of the scatterer and the system consisted of nearly 33,000 unknowns. Convergence was achieved within about 350 iterations when the Sommerfeld radiation condition was imposed on the spherical mesh termination boundary. Each iteration took approximately 0.1 seconds on a Cray YMP after vectorization and on the average it was found that for N > 25,000,

68 10 5.. O ~ e '" 'iV-ili"^ ~ a"..... ii o 0 E=O (BOR) =0..................(BOR).................. =0:::::::::::::::: (FE-ABC).................. (FE-ABC ) -15 I 0 30 60 90 Observation Angle 8o, deg. Figure 4.6: Backscatter pattern of a perfectly conducting cylinder having a radius of 0.3A and a height of 0.6A. The solid and the dashed lines indicate data obtained from a body of revolution code and the black and white dots indicate FE-ABC data. the number of required iterations were approximately N/100. The agreement was quite good even on enclosing the metallic cylinder with a rectangular outer boundary placed 0.3A from the edge of the scatterer. The results presented till now have been for perfectly conducting geometries. However, the real advantage of the FEM over integral equation techniques is the ease with which the former can handle material inhomogeneities and transition conditions. With this in mind, the remaining figures show backscatter and bistatic patterns for scatterers comprised of resistive cards, dielectric material and combinations of these. The first geometry that we tested was that of a homogeneous dielectric sphere having a relative permittivity of 4 and a radius of 1/2w7. The bistatic pattern of the geometry is compared to that obtained using a CG-FFT formulation (Figure 4.7) and is seen

69 O0 --10 o -20- z Y - -40.................FE-ABC:.::..::..::..:0 * CG-FFT -50 —, |, -50 -' '- "' I ' 'I --— '- I —',' ---'-' ---' — 0 30 60 90 120 150 180 Observation Angle 0~, deg. Figure 4.7: Bistatic echo-area of a homogeneous dielectric sphere (cr = 4; koa = 1). to agree remarkably well. The finite element mesh was terminated only 0.3A from the dielectric body. Another of the test cases was a prolate spheroid shown in Figure 4.8 filled with lossy dielectric having (r = 4-ji, k~a = xr/2 and a/b = 2, where a and b are the major and minor axes of the spheroid, respectively. The bistatic pattern (oinc = 180~; qifC = 90~) obtained from the FE-ABC solution agree reasonably well with those obtained via the hybrid finite element-boundary integral method presented in [41]. However, the corresponding convergence rate for non-metallic bodies and resistive/impedance sheets was found to be slower than that observed for metallic scatterers. A diagonal preconditioner was, therefore, used to accelerate the convergence of the biconjugate gradient algorithm with encouraging results. For our last example, we compute the scattering from an inhomogeneous geometry with embedded resistive cards. Particularly, the scatterer shown in Figure 4.9

70 2.5 21.5 - \ m.... FE' -ABC 0.5 ~FE-BI 0 -0 30 60 90 120 150 180 Observation Angle e0, deg. Figure 4.8: Normalized bistatic pattern of a lossy prolate spheroid (c, = 4-jI; ka = 7r/2;a/b - 2), where a and b are the major and minor axes of the spheroid, respectively consists of an air-filled resistive card block (0.5A x 0.5A x 0.25A) joined to a metallic block (0.5A x 0.5A x 0.25A). In Figure 4.10, we compare a principal plane backscatter pattern obtained from our 3D FE-ABC implementation with data computed using a traditional moment method code [52] for both polarizations. The computed data is again seen to follow the reference data closely. For the FE-ABC solution, the scatterer was enclosed within a cubical outer boundary placed only 0.3A away from the scatterer. This resulted in a 30,000 unknown system which converged to the solution in about 400 iterations when using the Sommerfeld radiation condition and in 1600 iterations when the second order ABC was used. For this geometry, the second order ABC did not provide a significant improvement in accuracy (only about 0.1dB) over the first order condition. The same case was run with a higher discretization resulting in a system of 50,000 unknowns; however, there was no significant difference in \~ ~ ~ ~ ~ ~~~~~~~~~~~ ^:;i:i:::;::::::::::::::;:::;::::;;:; 1 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~~~~~~~~~ ~~- ~~:;::::::::::::::::::::::::~::::;:;[ ' ~ ~ ~ ~ ~ ~ ~ ~ ~~~~~~~~~~ * \ ^ <^:::::::::::::::::::::ii::::;::::;:"^ ~ - V '<:^::::;::::::::::::::::::::;::::::;:;:":;; (rt~~~~~~~~~~~ \ ^^*i;;^..:;::::;::;:;:..:^'^' ~~~~~~~~~~~~.~ ~~................ ~~ ~~~~~~~~~~,: \ ___ ~~~~~~~~~~:~~~~~~~~~~~~ \ FE-ABC a j^~~~~~~~~;~~~~~~~~~~~~ ~~~~ * FE ~~~~BI ~~~~~~ 0~~~~~~~~~~~~~ 30 60 90 120 150 180~.~~ Observation~~~~~~~~ Angl 6^ deg.~~~~~~~~~~~ Figure 4.8: Normaized bistatic patern of a lossy rolate spheroid 67. = 4-~l; kod 7T/2; a/b = 2), where a and b are the major and minor axes of the spheroid,~~~~~~~~~~~~ respectively~~~~~~~ consiss of a air-flled rsistiv card lock 0.5A x0.5A x0.25A)joinedto a mtalli block (.5A x.5A x.25A).In Figre 4.1, we cmpare princpal plne bacscatte pattern otained frm our 3DFE-ABC imlementatin with dta computd using traditional momnt method code 52] for both poarizations. Thecomputed data i again seen tofollow the reerence data cosely. For th FE-ABC soluton~ the scat terer was enclosd within a cubial outer bounday placed only 03A away from th scatterer. Ths resulted ina 30,000 unknwn system whih converged t the solutio inaot40ieain hn sn h omredrdato odto n n10 iteratios when he secod orderABC wasused. Fo this gometrythe secnd orde ABC did ot provie a signficant iprovemen in accuacy (onl about OldB) ove the firs order ondition The sae case ws run wth a higer discetizatio result ing in asystem f 50,00 unknown; howevr, ther was no ignificnt diffrence i

71 z X/ Y Resistive a/2 -- a ---- I dielectric section (-T = 2 -j2), where the latter is bounded by a resistive surface having R = Zo. agreement with corresponding moment method data [52]. 4.7 Conclusions surface. This is probably due to the fast (1/r) decay of the scattered fields. As a result, in addition to ther = the matrix, the total number of unknowns surface... havin....Z the far-field valueswith the earlier cas...................kcaterpater shown n Figue 411 sthee same.as.in...........9.it.teai-flldsetin.o occupid by abossy ielecric ha ing...=. 2-........................ p tter for~ ~ ~~~~~~~~~......I...........tina omue y u EAB oeisaansent e ngo agreement with...epndn momen metho...... [2....

72 5 --5 ' -10 0 N -15 \ /' -20 _ FE-ABC (E::): *MoM -25- FE-ABC (E ): oMoM -30- ' I '"I1"'1'1 — I " ' 0 30 60 90 120 150 180 Observation Angle eO, deg. Figure 4.10: RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The lower half of the cube is metallic while the upper half is airfilled with a resistive card draped over it. is kept under control. Further, due to the use of edge elements, the program can easily handle sharp conducting edges and tips, inhomogeneous dielectric and/or magnetic materials, resistive sheets and impedance surfaces. These, in conjunction with the well-known advantages of the finite element method, result in low 0(N) storage requirement, making the computation of large body scattering possible. These capabilities along with the ease in modeling arbitrary geometries, makes this formulation, to the best of our knowledge, one of the first suitable for solving practical three-dimensional scattering problems.

73 10 -0 b -10- *0 -20,,,, I I I I...... 0 30 60 90 120 150 180 Observation Angle spo, deg. Figure 4.11: RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The composition of the cube is the same as in Figure 4.10, except that the air-filled portion is filled with dielectric. The solid curve is the FEMATS pattern and the black dots are MoM data for the Ez"C = 0 polarization.

CHAPTER V Optimization and parallelization In the previous chapter, we laid the foundation for our methodology by outlining the formulation of the finite elment system together with the absorbing boundary condition method of mesh termination and presenting some examples to validate our solution technique. We found that the FE-ABC technique yielded accurate far-field values for small geometries, i.e., structures whose dimensions are less than a wavelength. However, our principal motivation was to compute scattering from large, three dimensional structures having arbitrary material inhomogeneities and regions satisfying impedance and/or transition conditions. As mentioned earlier, the number of unknowns escalates rapidly in three dimensions as the target size increases. Therefore, the limiting factor in dealing with three dimensional problems is the unknown count and the associated demands on storage and solution time. Solution techniques which have O(N) storage and feasible solution times are thus the only way that the curse of dimensionality can be avoided. This is one of the principal reasons for the popularity of partial differential equation techniques over integral equation (IE) approaches which lead to dense matrices and O(N2) storage. As the problem size increases, the IE and hybrid methods, both of which need O(NA), 1 < I < 2, storage, quickly become unmanageable in terms of storage and solution time. 74

75 Another concern while solving problems having more than 100,000 unknowns - a scenario that can be envisioned for most practical problems - is to avoid software bottlenecks. The algorithmic complexity of any part of the program should increase at most linearly with the number of unknowns. In this chapter, the implementation details of our finite element code are presented along with the associated numerical considerations. The various trade-offs associated with the data structures used to represent sparse matrices and their impact on vectorization and parallelization are discussed. The iterative solver, a preconditioned biconjugate gradient (BCG) algorithm, is studied along with point and block preconditioning strategies and the trade-offs between the two types of preconditioners are outlined. A modified incomplete LU (ILU) preconditioner is presented, which seems to work better than the original ILU preconditioner for our matrix systems. Iterative solvers for unsymmetric matrix systems are also mentioned to handle anisotropic geometries and situations where the mesh termination condition makes the system unsymmetric. In order to facilitate the solution of large problems, the computationally intensive portions of the finite element code have been parallelized on a variety of massively parallel architectures like the KSR1 (Kendall Square Research) and the Intel iPSC/860. A full analysis of the communication patterns is also presented for the KSR1 machine. 5.1 Numerical considerations The finite element code implemented by the authors can be divided into four main modules: * Input/output * Right-hand side vector (b) generation

76 * Finite element matrix (A) generation * Linear equation solver The input to the program consists of the mesh information obtained by pre-processing the mesh file generated from SDRC I-DEAS, a commercial CAD software package. The right-hand side vector,b, is usually a sparse vector and only a small fraction of the total CPU time is required to generate it. The finite element matrix generation consists of too many subroutine calls and highly complex loops to permit any significant speedup through vectorization. It is, however, highly amenable to parallelization as will be discussed later. The most time-consuming portion of the code is the linear equation solver, taking up approximately 90% of the CPU time. On a vector computer like the Cray YMP, it is possible to vectorize only the equation solver. However, short vector lengths and indirect addressing inhibit large vector speedups. 5.2 Matrix storage and generation The matrix systems arising from I-DEAS were very sparse: on the average, the minimum number of non-zero elements per row was 9 and the maximum number of non-zeros per row was 30. The total number of non-zeros varied between 15N and 16N, where N is the number of unknowns. There are various storage schemes for sparse matrices. In this chapter, we will discuss the ITPACK format [53], the jagged diagonal format and the Compressed Sparse Row (CSR) format. Knowledge of the storage formats is important since the speed of computation on vector or parallel processors is directly linked to the data structure used for matrix storage.

77 In the ITPACK storage scheme, a sparse matrix A of order N is stored using two arrays D and PC. For example, if we have the 5 x 5 unsymmetric matrix A 3 0 0 4 5 7 0 4 0 2 4= 4 0 7 0 0 0 0 8 0 0 9 7 0 0 0 Then, according to the ITPACK scheme, the rows of the array D will contain the non-zero elements of the corresponding rows of the original matrix. The number of columns of D will be equal to the maximum number of non-zeros in a row; rows containing fewer non-zero elements will be zero padded. The array D thus looks like 3 4 7 4 D= 4 7 8 0 9 7 The column indices of the elements in D are 5 2 0 0 0 stored in an integer array PC defined as 1 4 5 1 3 5 PC= 1 3 * 3 ~ ~ 1 2 * The asterisk denotes that the corresponding elements of D are zeros. The ITPACK storage scheme is attractive for generating finite element matrices since the number

78 of comparisons required while augmenting the matrix depends only on the locality of the corresponding edge and not on the number of unknowns. Moreover, the sparse matrix-vector multiplication process can be highly vectorized because of large vector lengths when the number of non-zeros in all rows is nearly equal. However for our application, almost half the space is lost in storing zeros. As a result, a lot of storage as well as computational effort is wasted in storing and operating on zeros, respectively. The modified ITPACK scheme [54], or the method of jagged diagonals, does alleviate this problem to a certain degree by sorting the rows of the matrix by decreasing number of non-zero elements. However, 30% of the allotted space is still lost in zero padding. The best trade-off between storage and speed for our application is obtained by storing the non-zero matrix elements in a long complex vector, the column indices in a long integer vector and the number of non-zeros per row in another integer vector. This data structure is referred to as the Compressed Sparse Row (CSR) format. A similar data structure which stores the row indices instead of the column indices is called the Compressed Sparse Column (CSC) format. The CSC format is sometimes used when the matrix is to be accessed along the rows and not the columns, e.g., in the multiplication of the transpose of a sparse matrix with a vector. In our implementation, a map of the number of non-zeros for each row is obtained through a simple pre-processor. The main program stores the matrix in CSR format, thus minimizing storage and sacrificing a bit of speed. The required storage is 15N to 16N complex words plus integers for t and PC, respectively, and N integers for the array containing the pointers to the rows' data.

79 5.3 Linear equation solver In three dimensional applications, the order N of the system of linear equations may be very large. Direct solution methods usually suffer from fill-in to an extent that these large problems cannot be solved at a reasonable cost even on state-ofthe-art parallel machines. It is, therefore, essential to employ solvers whose memory requirements are a small fraction of the storage demand of the coefficient matrix. This necessitates the use of iterative algorithms instead of direct solvers to preserve the sparsity pattern of the finite element matrix. Especially attractive are iterative methods that involve the coefficient matrices only in terms of matrix-vector products with A or AT. The most powerful iterative algorithm of this type is the conjugate gradient algorithm for solving positive definite linear systems [55]. In our implementation, the system of linear equations is solved by a variation of the CG algorithm, the biconjugate gradient (BCG) method. This scheme is usually used for solving unsymmetric systems; however, it performs equally well when applied to symmetric systems of linear equations. For symmetric matrices, BCG differs from CG in the way the inner product of the vectors are taken. The conjugate gradient squared (CGS) algorithm [56] performs best when applied to unsymmetric systems of linear equations. It is usually faster than BCG but is more unstable since the residual polynomials are merely the squared BCG polynomials and hence exhibit even more erratic behavior than the BCG residuals. Moreover, there are cases where CGS diverges, while BCG still converges. Recently, Freund [57] has proposed the quasi-minimal residual (QMR) algorithm with look-ahead for complex symmetric matrices. Based on the above, the biconjugate gradient (BCG) algorithm was found to be

80 most suitable for our implementation. The BCG requires 1 matrix-vector multiplication, 3 vector updates and 3 dot products per iteration. The solution scheme requires only three additional vectors of length N. The vector updates and the dot products can be carried out extremely fast on a vector Cray machine like the Cray YMP, reaching speeds of about 190 MFLOPS. However, the matrix-vector product, which involves indirect addressing and short vector lengths, runs at about 45.5 MFLOPS on 1 processor of the 8-processor Cray YMP. As a rule of thumb, the biconjugate gradient algorithm with no preconditioning consumes 4.06 microseconds per iteration per unknown on the Cray YMP. As mentioned earlier, there are two problems which limit the vectorizability of a sparse matrix code - short vector lengths and indirect addressing. There is not much one can do about the second problem since sparse matrices must have indirect addressing to exploit its O(N) storage feature. However, the first problem can be removed by storing the matrix in a different format such that the vector lengths are approximately equal to the order of the system being solved. The storage format is called the jagged diagonal format [61]. The rows are ordered by decreasing degree and the leftmost elements of each row are stored as a dense vector with an additional vector indicating the column numbers of each element. The matrix is thus stored as a collection of vectors of decreasing length. The inner loop of the matrix-vector multiplication routine traverses the entire length of a jagged diagonal, which can be of the order of the system being solved. The storage requirement of the above format can be made to be the same as the previously mentioned CSR format through careful programming. The altered code then runs at around 275M flops on a Cray C-90. It must be mentioned that the CRAY C-90 is a substantially faster machine than the Cray YMP but the CSR formatted matrix-vector multiplication routine runs

81 about 4 times slower on the C-90. Therefore, we can reliably state that the method of jaggeddiagonals is the best sparse matrix storage scheme in terms of computer storage and vectorizability. 5.4 Preconditioning The condition number of the system of equations usually increases with the number of unknowns. It is then desirable to precondition the coefficient matrix such that the modified system is well-conditioned and converges in significantly fewer iterations than the original system. The equivalent preconditioned system is of the form [C-] [A] {x} = [C-1] {b} (5.1) The non-singular preconditioning matrix C must satisfy the following conditions: 1. should be a good approximation to A. 2. should be easy to compute. 3. should be invertible in 0(N) operations. The preconditioners that we discuss below are the diagonal and the ILU point and block preconditioners. Block preconditioners are usually preferable due to reduced data movement between memory level hierarchies as well as decreased number of iterations required for convergence. Block algorithms are also suited for highperformance computers with multiple processors since all scalar, vector and matrix operations can be performed with a high degree of parallelism. 5.4.1 Diagonal preconditioner The simplest preconditioner that was used in our implementation was the point diagonal preconditioner. The preconditioning matrix C is a diagonal matrix which

82 is easy to invert and has a storage requirement of N complex words, where N is the number of unknowns. The entries of C are given by Cj = ijAij, i= 1,...,N; j=1,...,IN (5.2) where &ij is the Kronecker delta. The matrix C-1 contains the reciprocal of the diagonal elements of A. The algorithm with the diagonal preconditioner converged in about 35% of the number of iterations required for the unpreconditioned case. This suggested that our finite elment matrix was diagonally dominant since the reduction in the number of iterations was rather impressive. The diagonal preconditioner is also easily vectorizable and consumes 4.1 microseconds per iteration per unknown on the Cray YMP, a marginal slowdown over the unpreconditioned system. A more general diagonal preconditioner is the block diagonal preconditioner. The point diagonal preconditioner is a block diagonal preconditioner with block size 1. The block diagonal preconditioning matrix consists of m x m symmetric blocks as shown in Figure 5.1. The inverse of the whole matrix is simply the inverse of each individual block put together. If the preconditioning matrix C is broken up into n blocks of size m, the storage requirement for the preconditioner is at most m x N. However, this method suffers a bit from fill-in since the inverted m x m blocks are dense even though the original blocks may have been sparse. Due to this reason, large blocks cannot be created since the inverted blocks would lead to full matrices and take a significant fraction of the total CPU time for inversion. However, since the structure of the preconditioning matrix is known a priori, this preconditioner vectorizes well and runs at 194 MFLOPS on the Cray-YMP for a block size of 8. For a test case of 20, 033 unknowns, a block size of 2 caused the maximum reduction in the number of iterations(14%) and ran at 197 MFLOPS.

83 — C __ C2 CCNl CN Figure 5.1: Structure of block preconditioning matrix 5.4.2 Modified ILU preconditioner The next step was to use a better preconditioner to improve the condition number of the system resulting in faster convergence. The traditional ILU preconditioner [58] was employed with zero fill-in; however, the algorithm took a greater number of iterations than the diagonal preconditioner to converge to a specified tolerance. This was probably because the ILU preconditioned system may not have been positive definite [59]. The preconditioned conjugate gradient method usually converges faster if the preconditioner is positive definite, although this is not a necessary condition. Higher values of fill-in were not attempted since the preconditioner already occupied storage space equal to that of the coefficient matrix. Algorithm 1: Modified ILU preconditioner with zero fill-in It is assumed that the data is stored in CSR format and that the column numbers

84 for each row are sorted in increasing order. The sparse matrix is stored in the vector D and the column numbers in PC. SI27(i) contains the total number of non-zeros till the ith row. The locations of the diagonal entries for each row are stored in the vector DIAG. The preconditioner is stored in another complex vector, CU. for i=1 step 1 until n-1 do begin lbeg=diag(i) lend=sig(i) for j=lbeg+l step 1 until lend do begin jj=pc(j) ij=srch(jj,i) if (ij.ne. O) then begin lu(ij)=lu(ij)/lu(lbeg) end end end A modified version of the ILU preconditioner was next employed by eliminating the inner loop of the traditional version. The algorithm basically scales the off-diagonal elements in the lower triangular portion of the matrix by the column diagonal. Since the matrix is symmetric, it retains the LDLT form and is also positive definite if the coefficient matrix is positive definite. This preconditioner is less expensive to generate and converges in about 1/3 the number of iterations taken by the

85 point diagonal preconditioner. It has been tested with reliable results for N < 50000. However, the time taken by the two preconditioning strategies is approximately the same since each iteration of the ILU preconditioned system is about three times more expensive. The forward and backward substitutions carried out at each iteration runs at 26.5 MFLOPS on the Cray YMP and proves to be the bottleneck since they are inherently sequential processes and the vector lengths are approximately half that of the sparse matrix-vector multiplication process. The triangular solver is also extremely difficult to parallelize. Techniques like level scheduling and self scheduling try to exploit the fine grain parallelism in the sparse system [60]. We implemented the level scheduling algorithm to examine the potential parallelism in the forward/backward substitution step. For solving any lower triangular system Lx = b, the ith unknown in the forward solution is given by xi = l bi- -,xlix (5.3) If L is dense, all the components x1,..., xi_ need to be computed before xi can be obtained. However, when L is sparse, most of the lijs are zero; hence, we may not need to compute all of the unknowns x1,...,xi- before solving for xi. Level scheduling is based on this simple observation. The dependencies between the unknowns can be modeled using a graph in which node i corresponds to the unknown xi and an edge from node j to node i indicates that lij - 0 implying that the value of xj is needed for solving xi. The operation shown in (5.3) can now be rewritten as xi = b- E lijxj) (5.4) l 3j<i:li, O Thus xi can be solved at the kth step if all the components xj in (5.4) have been computed in the earlier steps.

86 In order to implement the level scheduling algorithm, we need to define the depth of a node and the level of the graph. The depth of a node is defined as the maximum distance from the root [61]. Therefore, we will place an imaginary root node with links to the nodes having no predecessors so that the depth of each node will be defined from the same point. The depth of each node can now be computed with one pass through the structure of the coefficient matrix L by depth(i) = if L = O0for all< 1 < ( depth (Z)= l3 (5.5) 1 + maxj<i{depth(j): li. 7 0}, otherwise J The level of the graph can then be defined as the set of nodes with the same depth. The level scheduling algorithm can now be implemented without physically ordering the matrix, but solving the system in increasing order of node depth and distributing the nodes at each depth among the available processors. Algorithm 2: Forward elimination step with level scheduling The number of levels of the graph, nlev, can be easily determined from the depth information. Two other integer vectors are also required. ORDER(i) stores the ordering of the rows of L in terms of increasing node depth. LEVEL(i) stores the index to the start of each level in ORDER(i). do k=1,...,nlev do j=ilevel(k),...,ilevel(k+1)-1 (parallel loop) i=iorder(j) execute Equation 5.4 enddo enddo However, our efforts at parallelizing the ILU preconditioned system with level scheduling did not lead to significant speedup mainly due to the enormous amount of mem

87 ory traffic that was generated. This observation was also noticed in [60], where the authors estimated that the parallel algorithm generated as much as 10 times more traffic than the sequential code. In order to look for an effective parallelizable ILU preconditioner, we next turned our attention to a block ILU preconditioner. In our scheme of implementing the block ILU preconditioner, we distribute one block to each processor in a multi-processor architecture, thus achieving load balancing as well as minimizing fill-in. The modified ILU decomposition outlined earlier is then carried out on each of these individual blocks. Further, since the blocks are much larger than the block diagonal version, the preconditioner is a closer approximation to the coefficient matrix. Moreover, the triangular solver is fully parallelized since each processor solves an independent system of equations through forward and backward substitution. In our test case of 20, 033 unknowns, the number of iterations was reduced by approximately half the number required by the diagonal preconditioner. Since the work done is less than twice that for the diagonal preconditioner, we achieved a marginal savings of CPU time. However, the number of iterations required for convergence is highly sensitive to block size as shown in Table 5.1 for N = 20,033. Table 5.1 clearly shows that a larger block size (smaller number of blocks) does not guarantee faster convergence. Nevertheless, there is an approximately 50% decrease in the number of iterations over the point diagonal preconditioner, regardless of block size. The optimum block size is dependent on the sparsity pattern of the matrix and can only be determined empirically. The savings in the number of iterations over the point diagonal preconditioner for 28 blocks is given in Table 5.2 for a system having 224,476 unknowns. From the table, it is clear that the block ILU preconditioner is very effective in reducing the iteration count; however, the CPU time required is about 10% less than that required by the point diagonal preconditioner for the best

88 No. of blocks I No. of iterations 1 127 2 176 4 185 8 172 12 162 16 174 24 223 28 177 Table 5.1: No. of iterations vs no. of blocks for a block ILU preconditioned biconjugate gradient solution method. Angle of no. of iterations Ratio incidence point diagonal (I) block ILU (II) (II/I) 0 2943 2758.937 10 5985 3834.641 20 5464 3984.729 30 6048 3651.604 40 5770 3256.564 50 5107 3720.728 60 6517 4162.639 70 5076 4108.809 80 5305 3551.669 90 2898 2832.977 Table 5.2: No. of iterations required for convergence of a 224,476 unknown system using the point diagonal and block ILU preconditioning strategies. case. 5.5 Parallelization The different versions of the FE-ABC code were parallelized on two different types of massively parallel architectures - the KSR1 and the Intel iPSC/860. The KSR1 is a parallel machine which implements a shared virtual memory, although the memory is physically distributed for the sake of scalability. The Intel iPSC/860, on the other hand, is a distributed memory, Multiple Instruction, Multiple Data (MIMD)

89 system in which the nodes process information independently of one another and communicate by passing messages to each other. The conversion of sequential or vectorized code to parallel code involves two primary tasks: * parallelization of DO loops Parallelism is introduced by allowing each processor to execute a portion of the DO loop. * distribution of arrays among the processor set Since each processor only has a limited amount of memory, each array is divided into smaller units that reside on each node. This also allows array accesses from each processor to be serviced by different nodes, thus reducing contention for resources on any single node. On a cache-only memory machine such as the KSR1, only the first step is necessary since the hardware cache system automatically takes care of data distribution among the processors. This makes porting codes to the KSR1 quite easy. However, the increased control of data distribution and communication on the iPSC/860 can translate into improved performance for some applications. We will first describe our port and performance figures for the KSR1 and then detail our port on the Intel iPSC/860. 1. KSR1 port The basic strategy for the parallelization of the code is described on the biconjugate gradient solver with diagonal preconditioning. The other versions use the same parallelization scheme with slight modifications. We also comment on the parallelization of the matrix assembly phase.

90 Complex Real Operation * + * + Matrix Multiply nze nze - N 4nze 4nze - 2N Vector Updates 4N 3N 16N 12N Dot Products 3N 3N 12N 12N Table 5.3: Floating point operations per iteration. The symmetric biconjugate gradient method iteratively refines an approximate solution of the given linear system until convergence. Figure 5.2 shows the method in terms of vector and matrix operations. For a system of equations containing N unknowns, all these vectors are of size N and the sparse matrix is of order N. The number of nonzero elements in the sparse matrix is denoted as nze. Table 5.3 shows the operation counts per iteration for each type of vector operation. In the FE-ABC code, each vector operation is implemented as a loop. The program is parallelized by tiling these loops. For P processors, the vectors are divided into P sections of N/P consecutive elements. Each processor is assigned the same section of each vector. This partitioning attempts to reduce communication while balancing load. To guarantee correctness, synchronization points are added after lines 2, 7, and 9. Lines 2 and 7 require synchronization to guarantee that the dot products are computed correctly. Note that the dot products in lines 6 and 7 require only one synchronization. The line 9 synchronization guarantees that p is completely updated before the matrix multiply for the next iteration begins. In the sparse matrix vector multiplication, each processor computes a block of the result vector by multiplying the corresponding block of rows of the sparse matrix with the operand vector. Since the operand vector is distributed among the processors, data communication is required. The communication pattern is determined by the sparsity structure of the matrix, which in our case is derived from an unstructured

91 Initialization: x given r = b-Ax; p =r; Repeat until (resd < tol) q =Ap a = tm.p/(q p) x = -F ap r = r -- aq q =C-* r resd = /|r - r* /3 = (r - q)/tmp tmp = j3 x tmp p=q - 3p tmp = r * r (1) (2) (3) (4) (5) (6) (7) (8) (9) Step 1 Step 2 I Step 3 EndRepeat A is a sparse complex symmetric matrix. C is the preconditioning matrix. q, p, x, r are complex vectors. a,/3, tmp are complex scalars; resd, tol are real scalars. Figure 5.2: Symmetric biconjugate gradient method with preconditioning.

92 mesh. Therefore the communication pattern is unstructured and irregular. However since the sparse matrix is not modified during the iterative process, the communication pattern is the same at each iteration. Vector updates and dot products are easily parallelized using the same block distribution as in the sparse matrix vector multiply. Although sparse computations are known to be hard to implement efficiently on distributed memory machine, mainly because of the unstructured and irregular communication pattern, the previous scheme was easily and efficiently implemented on the KSR1 MPP thanks to the global address space [62]. Table 5.4 shows the execution time of one iteration (in seconds) and the speedup for different numbers of processors and for two problem sizes. For both problems, the performance scales surprisingly well up to a large number of processors. For the 20,033-unknown problem, the speedup for the parallelized sparse solver varies from 1 to 38 as the number of processors is increased from 1 to 56 (Figure 5.3). The overall performance of the solver on 28 processors is more than three times that of a single processor on the Cray-YMP. The large problem (224,476 unknowns) exhibits superlinear speedup which can be attributed to a memory effect. As a matter of fact, the large data set does not entirely fit in the local cache of a single node in the KSR which results in a large number of page faults. However, as the number of processors increases, the large data set is distributed over the different processors' memories. The global matrix assembly is the second largest computation in terms of execution time. The elemental matrices are computed for each element in the 3D mesh and assembled in a global sparse matrix. A natural way of parallelizing the global matrix assembly is to distribute the elements over the processors, have each pro

93 4 3- 3 16 8 Linear 0 8 16 24 32 40 48 56 Figure 5.3: Speedup curve for the linear equation solver on the KSR1 cessor compute the elemental matrix of the elements it owns and update the global sparse matrix. Since the global sparse matrix is shared by all processors, the update needs to be done atomically. On the KSR1 this is done by using the hardware lock mechanism. The performance for the matrix assembly is given in Table 5.5 and also in Figure 5.3. 5.5.1 Analysis of Communication In the main loop (Figure 5.2), significant communication between processors takes place only during the sparse matrix vector multiply (line 1) and the vector update of p (line 9). The rest of the vector operations incur little or no communication at all. The distribution of the nonzero entries in the matrix affects the amount and nature aFor 1, 8 and 16 processors, only the first 100 iterations were run. bCode run on a 64 node KSR at Cornell University

94 Procs N=20,033 N=224,476 Execution time Speedup Execution time Speedup (secs per iter) (secs per iter) Ia.515 1 10.8 1 8.071 7.3 1.4 7.7 16.040 12.9.671 16.1 29.027 19.1.304 35.6 60b.149 76.2 Table 5.4: Execution time and speedup for the iterative solver of communication. In this section, we present an analysis of the communication pattern incurred by the sparse matrix vector multiplication as derived from analysis of the sparsity structure of the matrix. Line 1. In the matrix-vector multiply, each processor computes an N/P-sized subsection of the product q. The processor needs the elements of p that correspond to the nonzero elements found in the N/P rows of A that are aligned with its subsection. Procs Execution time Speedup in seconds 1 24.355 1 2 13.376 1.8 4 6.811 3.6 8 3.744 6.5 16 1.89 12.9 25 1.625 15.0 28 1.276 19.1 Table 5.5: Execution time and speedup for the matrix generation and assembly (20,033 unknowns)

95 Because the matrix A remains constant throughout the program, the set of elements of p that a given processor needs is the same for all iterations in the loop. However, since p is updated at the end of each iteration, all copies of its element set are invalidated in each processor's local cache except for the ones that the processor itself updates. As a result, in each iteration, processors must obtain updated copies of the required elements of p that they do not own. These elements can be updated by a read miss to the corresponding subpage, by an automatic update, or by an explicit prefetch or poststore instruction. Figure 5.4 Number of Subpages 3 5 0 - _ _. _........ 25 - ___ 200 H -. - __....................................................................................................................................................................................... 3 0 0 -........................................................................................................................................... 100 250 -| 200|1 t 150 - 100 -...l,-I 50 - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19202122232425 2627 Thread ID Figure 5.4: Counts of p subpages required by each processor for sparse matrix-vector multiply (total copies=5968) lists the number of subpages that each of the 28 processors needs to acquire from other processors. Automatic update of an invalid copy of a subpage becomes more likely as the number of processors sharing this subpage grows. The number of processors that need a given subpage (excluding the processor that updates the subpage) is

96 referred to as the degree of sharing of that subpage. Figure 5.5 shows the degree of Number of Subpages 0 1 2 3 4 5 6 7 8 9 10 Degree of Sharing Figure 5.5: Degree of sharing histogram of p subpages multiply (28 procesors) during sparse matrix-vector sharing histogram for the example problem. Since the only subpage misses occurring in Step 1 of the sparse solver are coherence misses due to the vector p, the use of the poststore instruction to broadcast the updated sections of the vector p from Step 3 should eliminate the subpage misses in Step 1. However, the overhead of executing the poststore instruction in Step 3 offsets the reduction in execution time of Step 1. On a poststore, the processor typically stalls for 32 cycles while the local cache is busy for 48 cycles. As a result, the net reduction in execution time is only 3%. Line 9. Before proceeding with the updates of the N/P elements of p for which it is responsible, each processor must acquire exclusive ownership for those elements. Because a cache line holds 8 consecutive elements, each processor will generate N/8P requests for ownership (assuming all subpages are shared). In order to hide access

97 latencies, the request for ownership can be issued in the form of a prefetch instruction after step 1. This could lead to an eightfold decrease in the number of subpage misses. However, as with the poststore instruction, the benefit of prefetching is offset by the overhead of processing the prefetch instructions in step 2. This is because the processor stalls for at least two cycles on a prefetch and the local cache cannot satisfy any processor request until the prefetch is put on the ring. The overall execution time is reduced by only 4% in this case. Lines 2, 6, 7. The rest of the communication is due to the three dot products. Each processor computes the dot product for the vector subsection that it owns. These are then gathered and summed up on a single processor. 2. Intel iPSC/860 port The parallelization of the DO loops is one of the main tasks since the majority of the computer time is spent on solving the linear system of equations. The basic strategy for parallelizing the DO loops on the iPSC/860 is similar to the KSR1 - each portion executes a portion of the DO loop. This scheme works fine as long as there are no dependencies in the body of the loop, as is the case for the vector updates and the sparse matrix-vector multiply of the linear solver. However, the main loop in the matrix generation/assembly phase contains a dependency between loop iterations. As on the KSR1, this problem is solved by using a mechanism by which each processor locks a row of the matrix while performing an update. Since the locking of each row is maintained by the processor whose memory holds the particular row, processors lock and unlock rows by sending messages to the appropriate row owner. Even though the parallelization of loops enables programs to run faster on multiprocessors, the distribution of arrays must be done for the code to run at all. Arrays

98 are distributed in the code by partitioning one dimension among the processors. Thus for a 1000 element array, processor holds the first 100 elements, processor 2 the next 100 and so on. The straightforward method for accessing this distributed array involves the translation of array references into subroutine calls. Thus an expression x = a(i) is translated into the call call fetcha(i, x). The subroutine fetcha then sends a message to the processor that holds element a(i), which in turn sends a reply message with the value of a(i). Although this scheme requires the implementation of a new subroutine for each distributed array and the replacement of each array access with a subroutine call, the process is easy and mechanical. However, such a scheme does not result in good performance. The primary reason for this is that the overhead for sending a message is much higher than that of sending a single byte. The cost for sending 10 or even 100 bytes is usually not much higher than that of sending 1 byte. Thus, messages need to be 'bundled' for fast and efficient operation. However, the simple strategy mentioned above is in direct contrast to message bundling. Therefore, we have implemented the simple scheme for parts of the code that do not take up a significant portion of computation time like the matrix generation/assembly phase and a better scheme for accessing the distributed arrays in the equation solver phase. The primary operation in the solver that generates communication is the sparse matrix-vector product. Since the matrix-vector product involves performing a dot product of each row with the distributed vector, each processor must obtain the values for the entire vector from the other processors. The dot product operation must be carried out in several phases as each processor may not be able to hold the entire vector in memory. Thus each processor P begins the matrix-vector multiply by sending its portion of the vector to other processors, then performs the following

99 tasks for every other processor P' * Reads the portion of the vector owned by P'. * Updates the partial dot product for each row by adding the product of the appropriate matrix element with the elements of the partial vector. After performing the above operations for all the processors, the dot product is complete. Unfortunately, each phase requires a pass over all the sparse matrix rows owned by the processor. In the future, it may be possible to sort each row of the matrix to allow the phases to pass over the rows in order. The speedup results on a problem with 31000 unknowns show that the problem scales reasonably well for a small number of processors. However, as the number of processors increases, much of the time is spent on communication and book-keeping than on true computation. Efforts are under way to run a larger problem.

CHAPTER VI Conformal absorbing boundary conditions for the vector wave equation As mentioned in the previous chapters, the focus of this thesis is the computation of scattering from large three-dimensional structures. Since we are dealing with large targets having arbitrary shape, a spherical mesh termination boundary is not as attractive in terms of storage and computational cost. This is especially true for long and thin geometries where a sphere is the least economical shape of mesh termination, in terms of the number of unknowns. The ideal situation would be to enclose the scatterer inside a mesh termination boundary which is of the same shape as the scattering body (see Figure 6.1). If boundary conditions could be derived for such conformal mesh truncation surfaces, the volume to be meshed and the corresponding computing cost would then be minimized. However, available three dimensional ABCs for the vector wave equation as derived by Peterson [47] and Webb and Kanellopoulos [48] are only suited for application on spherical mesh terminations. Our goal, therefore, is to derive new vector ABCs for three dimensional analysis which can be applied on a surface conformal to the structure of interest. We begin with a modified Wilcox expansion whose leading order term recovers the geometrical 100

101 Figure 6.1: Scatterer enclosed in conformal mesh termination boundary optics fields and thus, given the appropriate principal curvatures, the resulting ABC completely absorbs all geometrical optics fields from the surface. We then proceed to derive the first and second order absorbing boundary conditions in terms of the principal curvatures of the surface on which they are employed. We also introduce an approximation to make the absorbing boundary condition contribution symmetric. In the next step, we incorporate these boundary conditions into the finite element equations and express them in a readily implementable form. We also comment on the symmetry of the system for doubly curved surfaces. In the last section, we examine the performance of these ABCs - in terms of computational cost - when applied on mesh termination surfaces conformal to the scattering object. 6.1 Formulation It is known that the electric field in a homogeneous region of space is governed by the vector wave equation VxVxE- k2E = 0 (6.1)

102 where ko is the free-space wave number. We assume that the field has a well-defined phase front in the region under consideration. Since we are concerned only with local behavior, we can assume that the phase fronts can be treated as parallel regions. Consequently, the surface describing the phase fronts can be specified by a net of coordinate curves denoted by t1 and t2 and a third variable n denotes the coordinate along the normal to the phase front. The point of observation in the Dupin coordinate system [63] can now be defined as x = nn + Xo (t, t2) (6.2) where nf is the unit normal and xo (t1, t2) denotes the surface of the reference phase front. The curl of a vector in the above coordinate system is given by VxE =VTxE+nfx O (6.3) On where VT x E is called the surface curl involving only the tangential derivatives and is defined as [64] VT x E = -n x VEn + t2KEt, - t2E2 + nV fi (E x f) (6.4) In (6.4), cil and (2 denote the principal curvatures of the surface under consideration, Et1, Et2 are the tangential components and En is the normal component of the electric field on the surface. The principal curvature of a surface is defined as [63] 1 1 h1 (6.5) /l = R1 = -ha{(6.5) K1, h1 an 1 1 h2 (6.6) 2=- = - (66) R2 h2 An where h1, h2 are the metric coefficients and R1, R2 are the principal radii of curvature. Using the aforementioned coordinates, the Wilcox expansion for a vector radiating

103 function can now be generalized to read e-jko6 c~ Ep (/1,/2) E(n,tl,t2) = 4ek/1 Ep(tlt-)p (6.7) where Ri = pi + n, Z = 1,2 and pi is the principal radius of curvature associated with the outgoing wavefront at the target. The lowest-order term in (6.7) represents the geometrical optics spread factor for a doubly curved wavefront and reduces to the standard Wilcox expansion [65] for a spherical wave. Moreover, (6.7) can be differentiated term by term any number of times and the resulting series converges absolutely and uniformly [65]. 6.1.1 Unsyrrmetric ABCs In the 3D finite element implementation using vector basis functions and the electric field as the working variable, we need to relate the tangential component of the magnetic field in terms of the electric field at any surface discontinuity. Therefore, our next task is to derive a relation between n x VxE (i.e., n x H where H is the magnetic field) and the tangential components of the electric field on the surface. Taking the curl of the electric field expansion given by (6.7) and crossing it with the normal vector, we have 47r: I + p+ P (6.8) where u = /RR2 and VtEn = -(n x n x V)En hK1 + K2 Km = 2 = Kltltl + K2t2t2

104 Considering that Eon is zero due to the divergenceless condition [65] and simplifying, we obtain the first order absorbing boundary condition fx VxE- (jko + Km -.) Et - VEpn -pmEpt 9 4w Z UP+l p=l or, n x VxE-(jko + Km -') Et = + 0 (n-3) (6.10) for a conformal outer boundary. Not surprisingly, this is the impedance boundary condition for curved surfaces as derived by Rytov [66]. It should be noted that in the above equation, VtEn and Km are each proportional to n-1. Therefore, the leading order behavior of (6.10) is 0 (n-3), i.e., only the first two terms of (6.7) are exactly satisfied by (6.10). If the scattered field contains higher order terms, application of (6.10) will give rise to non-physical reflections back into the computational domain. In order to reduce these spurious reflections, we need to either shift the mesh truncation boundary farther away from the scatterer or employ higher order boundary conditions which satisfy higher order terms of (6.7). To reduce the order of the residual error further, we consider the tangential components of the curl of (6.9). This yields nii [ x x [ VxE - (jko + im - r' )Et] ^-jk,9n 00 tEpn + p7mEpt 4n ~+ (P + 3)Km - g} PKmEpt (2 Kg VtEpi pKmT Ept (6 - r2Km - i) j 1 (6.11) where Kg = K1K2 is the Gaussian curvature. Using the result derived in (6.9) and simplifying, (6.11) reduces to n x Vx [n x VxE - (jko + Km - ) Et] = 4w IK Km Up+l [ + (p + - m - 6.12) - 2Km - -- VtEn (6.12)

105 If we take a closer look at the term in the square brackets on the RHS of (6.12), we find that it can be written as e-jkon oo VtEpn + pKmEpt 47r Pm UP+l p=1 + (jk, +3 3K -' {n x V xE-(jko +i m- 7m.)Et} Kgm where we have substituted VtE + pKmEpt 1(6.13) - E tEpn + pmEpt = n x VxE - (j cko + Km - 7.) Et (6.13) 47r p:I u+l using the relation derived in (6.9). Now the dominant terms on the RHS of (6.12) can be eliminated by considering the higher-order operator [n x Vx - (ko + 3m- 9 - ) {n x VxE- (jko + Km - )Et} + Km n-j on oo /' Kp e-l- V +pn'mErpt (2m ) tEn = Zd p4Km 7rV P+ (6.14) p-1 The residual of (6.14) can be reduced further to yield the absorbing boundary condition of second order which satisfies (6.7) to O (n-5). This second order ABC is found to be [nxVx - jko+4Km- ).) {nfx VxE -(jko+rm -)Et}+ (2 - K9 VtEn 0= (6.15) aKm and the residual is equal to e3kon 00tEpn q+ pnKmEpt 4 ( M UP+1 (6.16) p=2 The operator on the LHS of (6.15) can be applied repeatedly to obtain ABCs of increasing order; however, higher order basis functions are needed for their implementation.

106 After some algebraic manipulation, the terms on the LHS of (6.15) reduced to simpler ones. In addition to the wave equation, the following vector identities were derived to carry out the simplifications and are provided below for the reader's convenience: n x VxEt = n x VxE-VtEn n xVxVtE, = Vt (V Et)+2KmVtEn n x Vx (n x VxE) = Vx {n (VxE)n} - k2Et + AK n x VxE where AK = K - K2. The derivation of these identities is given in Appendix B. Upon simplification, the second order ABC can be compactly written as - (D - A - 2.) n x VxE + {4K - I + D (jk0 - 77) + (j)2 *} Et +Vx {n (VxE)n} + (jko + 3Km- b -2f7) VtEn = 0 (6.17) in which Kg D = 2jko + 5Km -- Km and (f)2 = Et = K1Et1ti + K2Et2t2 The derivatives of the curvatures in (6.17) have been ignored due to the reasons outlined in [64]. These derivatives essentially give rise to second-order curvature terms which add to the coefficient of Et only. The second order ABC derived in [1] is recovered on setting K1 = K2 = 1/r. 6.1.2 Symmetric correction It has been shown by Peterson in [47] that the LHS of (6.17) when incorporated into the finite element equations gives rise to an unsymmetric matrix system

107 in spherical coordinates. To alleviate this problem, Kanellopoulos and Webb [48] suggested an alternative derivation involving an arbitrary parameter which would lead to a symmetric matrix while sacrificing some accuracy. Below, we discuss a different approach which leads to a symmetric ABC without the introduction of an arbitrary parameter. On considering the series expansion of the term n x VxVtEn, we have ej'kon 0o n xVxV E = 4 E {jk0 + (p + l)lm} 4,x 13UP+l 00 7 7 = jkoVtEn + +2KmVtEn + (p - )m U+ p=2 U = jkoVtEn + 2KmVtEn + 0 (n-5) and on making use of the vector identity Vt (V Et) = n x VxVtEn - 2/mVtEn given earlier, we arrive at the following result Vt (V.Et) = jkoVtEn +0 (n-5) (6.18) Since our ABC was derived to have a residual error of 0 (n-5), we can replace jkoVtEn with Vt (V-Et) without affecting the order of the approximation. Doing so, the second order ABC with a symmetric operator can be rewritten as (D - Ar - 2f7.) n x VxE = {4 D - Kg + D (jko -) j) + ()2 *} Et+ Vx f{n (VxE)n + - (jko + 3 g- g - 27 ) Vt (V.Et) (6.19) + ko \ "m M It can be easily shown that the above boundary condition leads to a symmetric system of equations when incorporated into the finite element functional for surfaces having K1 = K2. Equations (6.10) and (6.19) reduce to the boundary conditions derived in [48] on setting K1 = K2 = 1/r which have been found to work well for spherical and flat boundaries [67].

108 6.1.3 Finite element implementation The boundary condition outlined in equation (6.19) cannot be incorporated into the finite element equations without modification. As explained in Chapter 3, the absorbing boundary condition is implemented in the finite element system through the surface integral over the mesh termination surface So. E.fixVxEdS = E P(E) dS where P(E) denotes the boundary condition relating the tangential magnetic field to the tangential electric field on the surface. Let P1(E) denote the first order absorbing boundary condition given by (6.10), where the subscript represents the order of the ABC. Therefore, the surface integral contribution for the first order ABC reduces to / E.P(E) dS= (jko + Km) E Et dS-J E (r'Et) dS (6.20) Using some basic vector identities and considering that Et = -n x n x E, we deduce that / E. P(E) dS = (jk m) (21 + E22) dS- (ilEt2 + i2E2 dS (6.21) which is a readily implementable form of the first order ABC. However, the second order ABC does not simplify as easily. If P2(E) denotes the second order ABC given by (6.19), we can rewrite it in a more compact vector notation as P2(E) = ai Et + d [Vx {ni (VxE)n}] + 7 {V (V.Et)} (6.22) where the tensors c,,3 and 7 are given by a = 1 {42 -g D Dt - DAn-l -Kq-J- Or (jo I~ )-t K12tlt

109 + D - -2 {42m -g K + D (jko - K2) + K} t2t2 (6.23) D - AK - 2K D - AK - 2K2 jko(D -A- 21) (jko + 3Km Kg - 2K1) t +jo() (jko+K + 3m - 2 22)2 t2t2 (6.25) Substituting; the second-order absorbing boundary condition in the surface integral given in (6.22), we have E-P2(E)dS = j E(a -Et) dS +s E. (/-I{n(VxE)}) dS +J E. { V(V.Et)} dS = 1 + 2 + I3 Let us examine the integral 1,. Since Et = -n x n x E, we have I1 = j (1E + a2E22 dS (6.26) after employing some simple vector identities. The other two integrals (12 and I3) do not reduce as easily to simple, implementable forms. They are first simplified using basic vector and tensor identities and then the divergence theorem is employed to eliminate one of the terms. Considering the integrand of the second integral 12, we note that E. [ (Vxn~)] = (ThE) *(Vxn^i ) where we have set + = (VxE)n. Using some additional vector identities and letting 3 E = F, we get F Vxnfi = V-(nfi x F)+ dn VxF = V. ( x F) + (V xF)n

110 Using the results from [63], the first term in the above identity can be further simplified to read a V(onx F) = V.(s nx F)+-{ n.(nx F)}-J{n.(n x F)} = Vs (on x F) (6.27) where Vs denotes the surface gradient operator and J = Ki + K2. The integral 12 can now be written as 12 Vs s (On- xF) dS+ j (VxF)ndS We can now apply the surface divergence theorem to the first term on the RHS of the this expression to yield Is Vs (on x F) dS = JO qfim.(n xF)dl = 0 (6.28) since the surface SO is closed. We note that m = 1 x n and 1 is the unit vector along the edge of the surface element and C denotes the contour of integration. On the basis of (6.28), 12 reduces to 2 = (VxE),{Vx(3 E)}n dS (6.29) We now turn our attention to simplifying 13 for implementation in the finite element equations. Considering the integrand of 13, we have E.{ 7QVt(V.Et)} = (a7/E). {Vt(V.Et)} = *y.E).{VQ- ~an} where b = V Et. Next, setting G = 7 E, we obtain G { V - n n } = V (G) - V G - Gn an (6.30) Gjan) On

111 The first term in the above identity can be written as a V. (G) - V.s (G) + (bG )-J(bGn) and since aGn/On = V G - V Gt + JGn, the LHS of (6.30) reduces to G.{- V - n V. (bG) - OV. Gt (6.31) We now replace the integrand of 13 with the expression in (6.31) and use the divergence theorem to eliminate the first term of (6.31). Specifically, I3 = Vs, (G) dS - J V Gt dS = J m. (OG) dS - IJ V Gt dS= - j V. Gt dS where iM has been defined earlier and the contour integral vanishes since the surface is closed. The integral 13 can finally be rewritten as 13 = -j (V. Et)(V Gt) dS (6.32) Using (6.26), (6.29) and (6.32), the complete surface integral term incorporating the conformal second order ABC reduces to I E P2(E)dS = (aE + 2E2) dS+ (VxE) {Vx(E)} x (3 ) dS - (V. Et) {V. (7 E)} dS (6.33) It remains to be seen whether the integrals in (6.33) lead to a symmetric system when incorporated into the finite element equations. With this in mind, we will examine three simple shapes and check whether they preserve symmetry of the finite element system. It will then be possible to generalize our findings to a more general mesh truncation boundary.

112 Let us consider the case of a sphere of radius r. Since the two principal curvatures of the sphere are identical (K1 = K2 = 1/r), the first order boundary condition reduces to the simple Sommerfeld radiation condition / E. P(E)dS = jko (E +E) dS (6.34) On a spherical boundary, the second order ABC also reduces to the comparatively simple form: fE2 1 2+(Vx - 1 (V.E22 E. (E) ds - 2jko + 2/r (VxE)n 2jko + 2/r (6.35) The ABC given in (6.35) is identical to the boundary condition derived in [48] for a spherical mesh termination surface and leads to a symmetric system of equations. Next, we consider the case of a planar termination boundary in which case K1 = K2 = 0. The first order ABC then reduces to the Sommerfeld radiation condition and the second order ABC for a planar boundary simplifies to E. P2(E) dS = E2 + 2jk (VxE)-2k (V Et)2] dS (6.36) o 2 2 ko Since the planar boundary is a special case of a spherical boundary, (6.36) again reduces to asymmetric system of equations. Now we examine the situation when the mesh termination boundary is cylindrical in shape and of radius p. The principal curvatures of the cylindrical surface are then cK = l/p and K2 = 0. Since the principal curvatures are no longer identical, the tensors c, ~ and 7 do not reduce to simple scalars. The first order ABC for a cylindrical outer boundary is given by J E. P(E) dS o= (Jko+- J ko iE dS E - dS (6.37) P P P

113 and the second order ABC gives by / E- P2(E)d = f (aci E+ac2E ) dS+ (VxE)p{Vx(Ic E)} dS - (V Et){V (7V. E)t} dS (6.38) where acl, ac2,'3c and yc are obtained by substituting nl = 1/p and r2 = 0 in the original expressions for c, /3 and A. It is seen that the first order ABC given by (6.37) leads to a symmetric matrix for a cylindrical boundary. On the other hand, the second order ABC does not yield a symmetric matrix for an arbitrary choice of basis functions. However, the boundary condition outlined in (6.38) preserves symmetry on using linear edge-based elements for discretization. The above discussion enables us to conclude that the first order boundary condition leads to a symmetric system for surfaces having arbitrary principal curvatures. However, symmetry is guaranteed for the second order ABC only when the two principal curvatures of the mesh termination boundary are identical, i.e., only when the outer boundary is limited to a planar or a spherical surface. Thus if we want to enclose a scatterer having arbitrary shape within a conformal outer boundary, an unsymmetric system of equations will have to be solved. It should, however, be noted that the resulting unsymmetric system will, in general, have a lesser number of unknowns than its symmetric counterpart. 6.2 Applications In the previous section, we have discussed the derivation of absorbing boundary conditions which can be employed on surfaces conformal to the scattering or radiating structure. As a result, the mesh termination boundary can be made to enclose the scattering object more snugly. Consequently for arbitrary targets, we achieve a

114 substantial saving in the amount of volume to be meshed between the ABC surface and that of the scatterer. This is particularly critical when the target is cylindrical in shape or a combination of cylindrical, doubly curved and planar surfaces as is the case with any real-life structure. In this section, we examine the performance of these boundary conditions when applied on conformal mesh termination surfaces. A. Composite cube For our first example, we compute the backscatter pattern of the half metal-half z X Y R es istive:::::::::::::.::: sheet |...-.-.l Metal - a -a Figure 6.2: Geometry of cube (a = b = 0.5A) consisting of a metallic section and a dielectric section (r = 2 -j2), where the latter is bounded by a resistive surface having R = Zo. dielectric cube geometry shown in Chapter 3. However, instead of using a spherical surface to terminate the mesh, we employ the absorbing boundary condition on a piecewise planar surface, i.e., a cubical box placed only 0.3A from the face of the scatterer. The geometry is shown in Figure 6.2 and needed only 30,000 unknowns

115 for discretization. This is in stark contrast to the 40,000 unknown system which resulted when the geometry was enclosed in a spherical termination boundary. The decrease in the unknown count is even more dramatic as we go to larger scatterers. In Figure 6.3, we plot the backscatter pattern in the x - z plane (EnC = 0 polar10 -0 b @0 -10 -0 30 60 90 120 150 180 Observation Angle jo, deg. Figure 6.3: RCS pattern in the x - z plane for the composite cube shown earlier. The solid curve is the FEMATS pattern and the black dots are MoM data for the E7 = 0 polarization. Mesh termination is piecewise planar. ization) for the metal-dielectric cube geometry given in Figure 6.2 and compare the computed values with data obtained from a traditional method of moments (MoM) code [52]. The dielectric-filled section has unit permeability and a relative permittivity of 2 - j2. The agreement with reference data is seen to be excellent; it can therefore be concluded that accuracy of the far-field values has not been affected by a different mesh termination scheme. In fact, we have obtained results of comparable accuracy with only 75% of the computing resources than were necessary before. This observation will be made by the reader again and again in the following pages as the

116 full capability of these conformal ABCs is demonstrated. B. Inlets In our next example, we compute the scattering from perfectly conducting inlets. The aperture of an inlet usually has a large radar cross-section around normal incidence: therefore, a good understanding of its scattering characteristics is critical if measures need to be taken for reducing its echo-area. An accurate computer simulation of such a geometry provides a cost-effective and ready way of allowing the designer to experiment with complex material fillings to achieve satisfactory results. All our validations are carried out for empty inlets due to lack of reference data for more complicated structures. 20 - 15 --\ >] j X/ -5 1< -0 -. Observation Angle 00, deg. Figure 6.4: Backscatter pattern of a metallic rectangular inlet (IX x 1A x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is spherical. The geometry of interest is a perfectly conducting rectangular inlet, with dimensions 1A x 1A x 1.5A. For the plots shown in Figures 6.4 and 6.5, we have enclosed

117 20 15 10 -5,< 0 -- 10 - l il -5 0 30 60 90 Observation Angle Vo, deg. Figure 6.5: Back:scatter pattern of a metallic rectangular inlet (1A x 1A x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is spherical. the target within a sphere of radius 1.35A, which is only about.35A from the farthest edge of the scatterer. This resulted in a system of 224,476 unknowns and converged in an average of 785 seconds per incidence angle on the 56 processor KSR1. The computed values from our code agrees very well with measured data for both HH and VV polarizations. As can be seen from the above discussion, we have obtained our solution using significant computing resources and time. Our next step is to use the conformal mesh termination scheme formulated in the previous section and utilized in Example A. Therefore, instead of using a spherical mesh truncation surface, we terminate the mesh with a rectangular box placed only 0.35A away from the scatterer (see inset of Figure 6.6). The problem size reduces dramatically to 145,000 unknowns, a 35% reduction over the spherical mesh termination scheme. The convergence time for each excitation vector is about 220 seconds, less

118 than 4 minutes, when run on all 56 processors of the KSR1. The computed values are again compared with measured data for both polarizations in Figures 6.6 and 6.7; the agreement is excellent, albeit a bit worse than the spherical case. However, this fact is overshadowed by the fact that we have reduced the problem size by more than a third and computing time by about a fourth. 20 - 15 - 10 -b -10A — 0 30 60 90 Observation Angle op deg. Figure 6.6: Backscatter pattern of a metallic rectangular inlet (IA x IA x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is piecewise planar. We then considered the problem of scattering from a perfectly conducting cylindrical inlet. Even though integral equation codes are more efficient for such bodies of revolution, our primary concern in this test was to examine the performance of the conformal absorbing boundary conditions that we derived earlier. The target is a perfectly conducting cylindrical inlet having a diameter of 1.25A and a height of 1.875A. We first used a rectangular outer boundary, placed.45A from the farthest

119 20-. 15 --5 10 - 1 -1........... 0 30 60 90 Observation Angle.o, deg. Figure 6.7: Backscatter pattern of a metallic rectangular inlet (1A x 1A x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data [1]. Mesh termination surface is piecewise planar. edge of the target, to enclose the scatterer. The radar cross-section was then computed for a +-polarized incident wave in the yz-plane and compared with measured data (Figure 6.8). The agreement was found to be quite good for all lobes except the third. We expect the results to improve on moving the outer boundary farther away. The backscatter echo-area computed for the same geometry by Shankar [68] using the finite difference-time domain method with the absorbing boundary placed a few wavelengths from the scattering structure, agrees with the computed results remarkably well for all incidence angles. Next, we used a truly conformal termination scheme by using a cylindrical surface for mesh truncation. It should be noted that this is the first instance of a non-spherical surface (i.e., a surface having different principal curvatures) being ap

120 -10 - S~ -15 5 -20 (0 -25 - 0.) -30 - -3 -35 - -45................ -40..::::::::.. Measured 0 FE-ABC (HH pol) -4 0 30 60 90 Observation Angle 6o, deg. Figure 6.8: Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for HH polarization. The solid line indicates measured data [69] and the black dots indicate computed data. Mesh termination surface is a rectangular box plied to terminate a finite element mesh for solving open problems. The cylindrical outer boundary was placed about 0.45A from the target and RCS computations were carried out for a q polarized incident wave and compared with measured data [69] and with a body of revolution code [2] (Figure 6.9). As can be observed from Figures 6.8 and 6.9, the far-field results for a cylindrical termination and a rectangular termination do not differ significantly. However, the savings in computational cost is quite impressive. The cylindrical mesh termination has only 144,392 unknowns compared to the 191,788 unknowns for a rectangular truncation scheme. A spherical mesh termination would have swelled to about 265,000 unknowns, sampling density and outer boundary distance remaining the same. Thus we have reduced the problem size by about 45% and computation time by a similar, if not greater, amount by us

121 ing a conformal mesh termination scheme. The savings in computational resources is quite significant even when we compare the rectangular and cylindrical termination schemes - a 25% reduction in problem size and a similar decrease in computation time. In Figure 6.10, we plot the backscatter pattern for the same cylindrical inlet -10-...... ~ -15-._ -20 -I ' Z I * -25 - 3 '* m I * Jl p | 0 -30.. ter 1. -.o.....c -45 -.i.. 0 30 60 90 Observation Angle Oo, deg. ter= 1.25A, height=1.875A) for HH polarization. Black dots indicate computed values, the solid line represents measured data [69] and the dotted line is body of revolution data [2]. Mesh termination surface is a circular cylinder. in which the incident wave is 0 polarized. The agreement is seen to be decent for the entire range of incident angles. C. Lossy foam cylinder with embedded wires The next geometry to be considered was a lossy foam (er = 1.05 - j'0.2) cylinder having a radius of 1A and a height of 3.5A with 0.5A long perfectly conducting wires embedded in it. The wires are spaced 0.5A apart from each other and have a diameter of 0.01A. This is an interesting problem for two reasons: first, the orientation of the

122 -10-......... [ -15. o -25 -. 1.4.- 2 - -20 - '....................... 0 30 60 90 Observation Angle 6~, deg. Figure 6.10: Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for VV polarization. Black dots indicate computed values, the solid line is measured data [69] and the dotted line represents reference data from a body of revolution code [2]. Mesh termination surface is a circular cylinder. embedded wires makes it a difficult geometry for other methodologies to handle; second, the problem is very large since the cylinder has a volume of 11A3 and a surface area of 28.27A2. As a first case, we consider the wires to be aligned along the axis of the cylinder (the Z axis in this case) and compute the resulting backscatter pattern in the (Figure 6.11). The cylindrical mesh termination boundary was placed 0.45A from the flat and curved surfaces of the foam cylinder. The resulting system of equations had 437,064 unknowns but needed only an average of 3050 iterations to converge to the desired answer. Thus each angle of incidence took a little more than 12.5 minutes to compute on a 56-processor KSR1 massively parallel architecture. The impressive convergence rate is due to the low contrast and the loss in the dielectric medium as

123 15 10 5 0 0 b - 10 - - 1..:.............0 30 60 90 Observation Angle 8~, deg. Figure 6.11: Backscatter pattern of a lossy foam cylinder with three perfectly conducting wires embedded along the axis. The incident electric field is oriented parallel to the axis of the cylinder. Mesh termination surface is a circular cylinder. well as due to the presence of the metallic wires. In the second case, the middle wire is offset 0.25A in the negative X-direction from the cylinder axis. The number of unknowns and the time taken for convergence is comparable to the earlier case. The effect of the offset wire on the backscatter pattern can be studied in Figure 6.12. There is a slight asymmetry in the side lobes of the resulting backscatter pattern but, as can be expected, the effect is very small. D. Perfectly conducting plate The motivation for testing the FEMATS code on the perfectly conducting plate was two-fold. It is usually very difficult to model the scattering from the edges of the plate even using integral equation methods. Therefore, we wanted to carry out some tests to see how the code would behave at edge-on incidence. Secondly, we wanted

124 15 5 - 0 - b -10 -15 -20 20 - ----------- ----------------------- -25....... -90 -60 -30 0 30 60 90 Observation Angle 8o, deg. Figure 6.12: Backscatter pattern of a lossy foam cylinder with the middle wire offset 0.25A from the cylinder axis. The incident electric field is oriented parallel to the axis of the cylinder. Mesh termination surface is a circular cylinder. to examine the performance of termination boundaries of esoteric shapes. The first choice was to enclose the plate in a rectangular box. The second choice was to use a box with half cylinders attached to the faces normal to the plane of the plate - the reasoning being that since the edge of the plate behaves like a line source and scatters cylindrical waves, a cylindrical mesh termination was most suitable for wave absorption. It should be noted that both mesh termination schemes require approximately the same number of unknowns; the superiority of one over the other was thus decided only on the basis of computed backscatter values. Our test case is a 3.5A x 2A perfectly conducting rectangular plate. In Figure 6.13, we plot the backscatter pattern for the 00 polarization in the xz plane, i.e., over the long side of the plate. The computed values compare very well with reference

125 30 A 20 - 10 0 I b -10- z 8\ -20 --30 -0 30 60 90 Observation Angle 9O deg. Figure 6.13: Backscatter pattern (ass) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination. data; however, the code does not pick up the sharp null at 00 = 45~ and the two mesh termination schemes perform as well, although a slight improvement is noticeable for the box-cylinder termination. Next, we plot the backscatter pattern of the same geometry for the qq> polarization over the long side of the plate in Figure 6.14. Again, the agreement with reference data is quite good. However, the backscatter echo-area at edge-on incidence is not calculated accurately. In the next figure, we compute the RCS of the conducting plate in the yz plane, i.e., over its short side, for the (fq polarization. The backscatter echo-area for edge-on incidence is picked up very well for a rectangular-cylindrical termination whereas a rectangular truncation scheme gives completely incorrect results. These two schemes have approximately the same storage requirement; in fact, the box-cylinder combi

126 30 20 -10 0 - -10 z SIM -20 - -30 0 30 60 90 Observation Angle 00, deg. Figure 6.14: Backscatter pattern (aco) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination. nation yields a slightly smaller system of equation. This example truly illustrates the power of a conformal truncation scheme composed of simple shapes: not only are the results far more accurate but even the storage requirement is slightly less. In all the above simulations, the boundary was terminated at 0.35A from the flat face of the plate and 0.5A from the edges of the plate. In order to test the accuracy of the ABC method as a function of mesh termination distance, we computed the backscatter patterns from the edges of the plate with increasing mesh termination distance. Figure 6.16 shows that the backscatter values from the plate edges slowly take the shape of the reference data as the mesh truncation distance is increased. E. Glass plate In the final example, we present the results for one-of the most challenging problems solved by the FEMATS method. The target is a simple rectangular glass slab 1.75A

127 30 -20 10 -- 0 -"- U$a S < 0i0 -10 z 0o -20 -30-,, 0 30 60 90 Observation Angle po, deg. Figure 6.15: Backscatter pattern (akr) of a 3.5A x 2A perfectly conducting plate in the yz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination. long, 1A wide and.125A thick. The relative permeability of glass is taken to be 3 - j.09. The backscatter pattern (aoe) is sought for the 0 = 80~ cut. This is an extremely difficult problem for any numerical method to handle since the incident field is almost edge-on to the dielectric slab, causing the scattered field to have strong higher order components which decay appreciably only at large distances from the scatterer. In our first attempt, we enclosed the glass plate in a flat box placed.45A from it. Though the computed results were accurate for some angles, they departed significantly from the reference data [68] for other incidences. The results did not show a significant improvement on shifting the outer boundary 0.5A away from the scatterer while maintaining its shape. The next step was to modify the shape of the outer boundary such that the flat box had half cylinders attached to the faces normal to

128 30 - 20 CGFFT 2 -* FE-ABC (.45) o FE-ABC (.65) -O 10 - N 0 --10 - -20-..,,..... |..... 0 30 60 90 Observation Angle o0, deg. Figure 6.16: Backscatter pattern (crk,) of a 3.5A x 2A perfectly conducting plate in the xz plane. The numbers in the legend indicate mesh termination distance from the plate edges. the plane of the slab - the reasoning was the same as that for the perfectly conducting rectangular plates mentioned in an earlier section. Further, this termination scheme results in an outer boundary with no sharp edges. The free space between two cylinders with their axes perpendicular to each other is filled with a quarter sphere having the same radius as the cylinders (see Figure 6.17). As can be observed in Figure 6.18, the computed results agree quite closely with the reference data for most of the incident angles. The problem was solved with only 155,000 unknowns and needed an average of 2700 iterations to converge to the specified tolerance. The slow convergence is typical of geometries that are composed of dielectrics since the linear system of equations becomes indefinite due to non-unit values of the relative permittivity.

129 SDRC I-DEAS VI.i: FE-Modeling-a-nalysis z Figure 6.17: An eighth of a glass plate enclosed inside a smooth mesh termination boundary composed of flat planes and cylindrical and spherical sections. 6.3 Conclusion In the previous section, we have used our FEMATS methodology to compute scattering patterns from large, three dimensional geometries having arbitrary shapes and complicated configurations and with material inhomogeneities. It has been shown that the solution technique does indeed live upto its promise of delivering accurate results with the expenditure of minimal computer resources. Very large problems can be solved in real time with a fraction of the resources that existing numerical electromagnetics codes require. There are basically two parameters that govern the accuracy of the final result.

130 -20-i-l E -25 c^ -35 ^ / 0Y 0 n <.^. * X* c -40 0) | -45 |-50 - -50 0 FEMATS -55- - - - - -| - - - - - - li i I 0 30 60 90 Observation Angle <p., deg. Figure 6.18: Backscatter pattern (aur) of a glass plate (1.75A x 1A x.125A) having a relative permittivity of (3 - j.09) for the 0 = 80~ cut. The first one is the mesh termination technique. Though lower order absorbing boundary conditions are simpler to implement, a significant penalty in computational resources has to be paid for getting the same accuracy as the higher order ABCs. The reliability and the order of the ABC is thus crucial to the computation process. The second parameter is the shape of the mesh termination boundary and its distance from the scatterer. As shown in the previous sections, significantly better results can be obtained by using smooth termination boundaries in place of flat boxes with sharp corners. The distance of the termination boundary from the scattering body is also important in obtaining accurate results. We have determined that a distance of.45A usually gives reliable results for large three dimensional geometries. For small problems, distances of.3A -.35A are sufficient to give accurate far-field values. In order to obtain reliable near-field values such as the input impedance of an antenna,

131 the mesh must be terminated farther away from the target.

CHAPTER VII Conclusions This thesis is one of the first works known to the author to tackle the problem of three dimensional scattering using finite elements and ABCs. It started out as an investigation into the methodology mentioned above, given its extremely attractive features for solving large problems. The 0(N) storage requirement and the iterative equation solver are essentially the keys to this technique, since they allow the solution of extremely large, realistic problems with minimal computational overhead. As the technology progresses to higher frequencies and longer electrical dimensions, the superiority of this solution methodology over integral equation or hybrid techniques will become even more apparent. The opening chapter outlined the motivation in choosing this solution technique over other traditional methods. In Chapter 2, we described the basic concepts of electromagnetics and finite elements. They were by no means complete or exhaustive and the interested reader is referred to several excellent texts for reference [9, 12, 13]. Chapter 3 provided the rationale for employing edge basis functions for discretizing electromagnetic field variables in three dimensional computation. A full review of nodal and edge bases of various orders for-two and three dimensions was also carried out. In Chapter 4, the basic formulation of the solution technique was presented along 132

133 with code validation for a large number of small and composite geometries. Since the methodology was found to perform extremely well for small complex geometries with inhomogeneities, we considered extending it to compute scattering from very large problems. In order to achieve this goal, the first step was to make the finite element code coinputationally efficient. Chapter 5 discussed the various optimization techniques that were carried out to make the code as computationally efficient as possible on a wide class of vector and parallel machines. Since the mesh termination condition used till now were valid only for spherical or flat terminations, the next step was to derive more efficient boundary conditions that would yield accurate results even with reduced computational resources. In Chapter 6, new ABCs were derived which are enforceable on mesh termination boundaries conformal to the surface of the target and result in drastic reductions in the number of unknowns and hence solution time. Thus, it can be stated that this thesis has achieved its objective of showing the efficacy and the accuracy of computing three dimensional scattering from large, composite and complex-shaped structures using finite elements in conjunction with the ABC method of mesh termination. Any research, however, by its very nature generates more questions than it answers. The work outlined in the previous chapters is no exception. The author has sought to provide answers to the more immediate questions but many more need to be examined for making this methodology robust and viable in commercial applications. Tests on a large variety of complicated problems have produced encouraging results and exhibited the versatility of this method. Future research into this method must inevitably focus on the following aspects: * iterative refinement of the solution through a posteriori error estimation and the use of hierarchical basis functions. Hierarchical basis functions use p refinement

334 instead of h refinement to adaptively correct the solution, maintaining the coarse mesh throughout. * further reduction in the number of unknowns through the use of 1. mixed node-based and edge-based elements. Since node-based elements usually have half as many unknowns as edge-based elements, nodal basis can be employed in free-space regions to reduce unknown count. 2. derivation of higher order boundary conditions. This would allow the mesh termination boundary to be placed even closer to the target, thus reducing the number of unknowns dramatically - the possible asymmetry of the resulting equation system is an acceptable tradeoff. * more robust, geometry-dependent, iteratively refined numerical mesh termination conditions that would guarantee the accuracy of the final solution. In most cases, it is the mesh termination condition and its distance from the scatterer which determine the accuracy of the computation. Numerical, iteratively refined ABCs could theoretically be applied very close to the scattering or radiating target with excellent accuracy. * extension of the formulation to antenna radiation and electromagnetic interference(EMI) problems. The formulation for the antenna problem must include efficient source modeling and reliable simulation of the near-field for impedance calculations. EMI phenomena critical to electronic packaging can be predicted efficiently by the technique through proper modeling of the circuit or the device to be shielded. * more efficient mesh generation packages dedicated to electromagnetics.

135 Most of the extensions mentioned above are not trivial and each could form the core of another doctoral dissertation. However, the efficient realization of even a few of these topics would result in a very powerful technique for computing electromagnetic radiation or scattering from arbitrary three dimensional structures reliably and efficiently.

APPENDICES 136

137 APPENDIX A Derivation of matrix elements The derivation of the matrix elements in ( 4.7) amounts to evaluating the integrals in ( 4.4) and ( 4.5). Therefore, from ( 4.4), we have I 4 -(VxWe).(VxW) = -g.g= V Ve tr 'Pr (A.1) since V x W( == 2gi. The evaluation of the integral in ( 4.5) is more cumbersome. Substituting into ( 4.5) the basis functions defined in ( 4.11), we obtain er JV V..Wdv = er {(fi'fJ) + (r.D) + (gi x r).(gj x r)} dv = r (I1 + I2 + I3) (A.2) where D = (f x g) + (fj x gi) and 1 = V f.fjdv '2 = / r.Ddv 3 = (gi x r).(gj x r)dv (A.3) (A.4) (A.5)

138 Since f is a constant vector, I1 reduces to II = fi.fjVe (A.6) Since 4 x = E LiXi i=l 4 y = E Liyi i=1 4 z = Lizi i=1 where Li are the shape functions for the tetrahedral element and xi, yi, Zi(i = 1,, 4) denote the x, y and z co-ordinates of the vertices of the tetrahedral element. Using the standard formula for volume integration within a tetrahedral element and simplifying, we have 12 e D xi + Dy Eyi + DzEzi (A.7) i=l i=l i=l where Dm is the mth component of D. The evaluation of I3 can be simplified by the use of basic vector identities. Therefore, 13 = gi.gj f r dv- f (gi.r) (gj.r) dv = (iygjy + gizjz) x2dv + (gixgj + gizgjz) J ydv + (gixgjx + giygjy) v Zdv (gixgjy + gjxgiy) j xydv - (gigjz + gjxgiz) J zxdv - (giygjz + gjygiz) j yzdv (A.8) where gim represents the mth component of the vector gi. Each of the volume integrals in the above equation can be easily evaluated analytically and the result expressed in the following general form:

139 /4Ve[\ 4 4 ] I alamdv = 20 aliami+ ali E ami (A.9) j 20 L=1 i=1 i=1 where 1, m = 1, 3 and al represents the variable x, a2 stands for the variable y and a3 denotes the variable z.

140 APPENDIX B Derivation of some vector identities The curl of a vector in the Dupin coordinate system is given by VxE=VTxE+nfix (B.1) where VT x E is called the surface curl involving only the tangential derivatives and is defined as VT x E = -n x VEn + t2/lEtl - tl12Et2 + nV (E x i) (B.2) In the above equation, Ki and K2 denote the principal curvatures of the surface under consideration, Etl, Et2 are the tangential components and En is the normal component of the vector E on the surface. We are interested in the evaluation of the three vector identities given in Chapter 5. Let us consider simplifying the tangential components of the curl of a vector, E in this case. Using the definition of the curl given above, we have n x VxE = -(n x n x V)En-lEtltl-2Et2t2 + n x (n x = VtEn - [( + KEt + (at + 2Et2 t2 = VtEn + x VxEt (B.3) where -(n x n x V) = Vt. The first vector identity is, therefore, easily proved.

141 Next, we will prove the second of the three identities. We start with the term n x V xVtE and simplify it using the definition of the curl of a vector given above. n x VxVtE, = n x Vx VEn -n ]n aEn = -nxV xn on aEn = (x n x V)an = V aEn (B.4) kan/ Since V E = V Et + (V n)En + aEn, we can simplify the above relation even further by substituting the appropriate expression for the normal derivative of the normal component of the electric field and using the fact that the electric field is divergence-free in a source-free region. n x VxVtEn = Vt(V Et) + (V n)VtEn = Vt(V Et) + 2KmVtEn (B.5) where Km = (Ki + K2)/2 is the mean curvature. The proof of the third identity is more complicated since it involves two curl operations on the electric field. We first need to switch the positions of the outermost nix and the Vx operators to arrive at a simplified form of the rather complex expression. Therefore, n x Vx(n x VxE) = Vx {n x n x VxE} - nVt (n x VxE) + AK(n x VxE) = -Vx{VxE-n (VxE)}-nVt (n x VxE) +Ar(n x VxE) (B.6) Now we use the fact that the electric field satisfies the wave equation to reduce the expression even further. n x Vx(n x VxE) = -k E + Vx {n(VxE)n} + k nEn + AK(n x VxE)

142 = Vx {(VxE)} - ko2Et + A/(n x VxE) (B7) where An = 1 - K2 -Thus, we have shown that all three identities hold as long as the vector, E in this case, is divergenceless and satisfies the vector wave equation.

BIBLIOGRAPHY 143

144 BIBLIOGRAPHY [1] A. Woo, M. Schuh, M. Simon, H.T.G. Wang, and M.L. Sanders. Radar crosssection measurement data of a simple rectangular cavity. NWC TM 7132, 1991. [2] A. Glisson and D.R. Wilton. Simple and efficient numerical techniques for treating bodies of revolution. Univ. Mississippi Tech. Rep. 105, 1982. [3] R.F. Harrington. Field computation by moment method. Macmillan: New York, 1968. [4] E.K. Miller, L. Medgyesi-Mitschang, and E.H. Newman: eds. Computational Electromagnetics: Frequency-domain method of moments. IEEE Press, 1992. [5] A.F. Peterson. The interior resonance problem associated with surface integral equations of electromagnetics: numerical consequences and a survey of remedies. Electromagnetics, vol. 10, pp. 293-312, 1990. [6] J.R. Mautz and R.F. Harrington. H-field, E-field and combined-field solutions for conducting bodies of revolution. AEU, vol. 32, pp. 157-163, 1978. [7] J.D. Collins, J.M. Jin, and J.L. Volakis. Eliminating interior resonances in FE-BI methods for scattering. IEEE Trans. Antennas Propagat., vol. 40, pp. 1583-1585, Dec 1992. [8] B. Stupfel, R. Le Martret, P. Bonnemason, and B. Scheurer. Solution of the scattering problem by axisymmetrical penetrable objects with a mixed boundaryelement and finite-element method. Proc. JINA, pp. 116-120, 1990. [9] J.A. Stratton. Electromagnetic theory. McGraw Hill: New York, 1941. [10] W. Ritz. Uber eine neue Method zur L6sung gewissen Variations - Probleme der mathematischen Physik. J. Reine Angew. Math., vol. 135, pp. 1-61, 1909. [11] S.G. Mikhlin. Variational methods in mathematical physics. Macmillan, New York, 1964. [12] O.C. Zienkiewicz. The finite element method. McGraw Hill, New York, 3rd edition, 1979.

145 [13] P.P. Silvester and R.L. Ferrari. Finite elements for electrical engineers. Cambridge University Press, 2nd edition, 1990. [14] A. Bossavit and J.C. Verite. A mixed FEM-BIEM method to solve 3D eddy current problems. IEEE Trans. Magnetics, vol. 18, pp. 431-5, Mar 1982. [15] J.P. Webb. Edge elements and what they can do for you. IEEE Trans. Magnetics, vol. 29, pp. 1460-1465, 1993. [16] H.R. Schwarz. Finite element methods. Academic Press, 1988. [17] Z.J. Cendes and P. Silvester. Numerical solution of dielectric loaded waveguides: I - Finite element analysis. IEEE Trans. Microwave Theory Tech., vol. 118, pp. 1124-1131, 1970. [18] X. Yuan, D.R. Lynch, and K. Paulsen. Importance of normal field continuity in inhomogeneous scattering calculations. IEEE Trans. Microwave Theory Tech., vol. 39, pp. 638-642, Apr 1991. [19] H. Whitney. Geometric Integration Theory. Princeton University Press, 1957. [20] J.C. Nedelec. Mixed finite elements in R3. Numer. Math., vol. 35, pp. 315-41, 1980. [21] M. Hano. Finite element analysis of dielectric-loaded waveguides. IEEE Trans. Microwave Theory Tech., vol. 32, pp. 1275-9, Oct. 1984. [22] G. Mur and A.T. deHoop. A finite element method for computing threedimensional electromagnetic fields in inhomogeneous media. IEEE Trans. Magnetics, vol. 21, pp. 2188-91, Nov 1985. [23] J.S. van Welij. Calculation of eddy currents in terms of H on hexahedra. IEEE Trans. Magnetics, vol. 21, pp. 2239-41, Nov 1985. [24] M.L. Barton and Z.J. Cendes. New vector finite elements for three-dimensional magnetic field computation. J. Appl. Phys., vol. 61, no. 8, pp. 3919-21, Apr 1987. [25] J.F. Lee, D.K. Sun, and Z.J. Cendes. Full-wave analysis of dielectric waveguides using tangential vector finite elements. IEEE Trans. Microwave Theory Tech., vol. 39, no. 8, pp. 1262-71, Aug 1991. [26] D.H. Schaubert, D.R. Wilton, and A.W. Glisson. A tetrahedral modeling method for electromagnetic scattering by arbitrarily shaped inhomogeneous dielectric bodies. IEEE Trans. Antennas Propagat., pp. 77-85, Jan 1984. [27] D.R. Tanner and A.F. Peterson. Vector expansion functions for the numerical solution of maxwell's equations. Microwave Opt. Tech. Lett., vol. 2, no. 2, pp. 331-334, 1989.

146 [28] J.F. Lee, D.K. Sun, and Z.J. Cendes. Full-wave analysis of dielectric waveguides using tangential vector finite elements. IEEE Trans. Microwave Theory Tech., vol. 39, no. 8, pp. 1262-1271, Aug 1991. [29] Z.J. Cendes. Vector finite elements for electromagnetic field computation. IEEE Trans. Magnetics, vol. 27, no. 5, Sep 1991. [30] J.M. Jin and J.L. Volakis. Electromagnetic scattering by and transmission through a three-dimensional slot in a thick conducting plane. IEEE Trans. Antennas Propagat., vol. 39, pp. 543-50, Apr 1991. [31] A. Bossavit. Whitney forms: a class of finite elements for three-dimensional computations in electromagnetism. IEE Proceedings, vol. 135, pt. A, no.8, Nov 1988. [32] J.F. Lee, D.K. Sun, and Z.J. Cendes. Tangential vector finite elements for electromagnetic field computation. IEEE Trans. Magnetics, vol. 27, no. 5, pp. 4032-5, Sep 1991. [33] J.S. Wang and N. Ida. Curvilinear and higher order 'edge' finite elements in electromagnetic field computation. IEEE Trans. Magnetics, vol. 29, no. 2, pp. 1491-4, Mar 1993. [34] J.P. Webb and B. Forghani. Hierarchal scalar and vector tetrahedra. IEEE Trans. Magnetics, vol. 29, no. 2, pp. 1495-8, Mar 1993. [35] A.F. Peterson and S.P. Castillo. A frequency-domain differential equation formulation for electromagnetic scattering from inhomogeneous cylinders. IEEE Trans. Antennas Propagat., vol. 37, no. 5, pp. 601-607, May 1989. [36] R. Mittra and 0. Ramahi. Absorbing boundary conditions for the direct solution of partial differential equations arising in electromagnetic scattering problems. Elsevier:New York, 1990. Chapter 4 in Finite Element and Finite Difference Method in Electromagnetic Scattering ed. M.A. Morgan. [37] B.M.A. Rahman and J.B. Davies. Penalty function improvement of waveguide solution by finite elements. IEEE Trans. Microwave Theory Tech., vol. 32, pp. 922-28, Aug 1984. [38] J.P. Webb. Finite element analysis of dispersion in waveguides with sharp metal edges. IEEE Trans. Microwave Theory Tech., vol. 36, no. 12, pp. 1819-1824, Dec 1988. [39] Steven H. Wong and Z.J. Cendes. Combined finite element-modal solution of three-dimensional eddy current problems. IEEE Trans. Magnetics, vol. 24, no. 6, Nov 1988. [40] A. Bossavit. Solving maxwell's equations in a closed cavity, and the question of spurious modes. IEEE Trans. Magnetics, vol. 26, no. 2, pp. 702-705, Mar 1990.

147 [41] X. Yuan. Three dimensional electromagnetic scattering from inhomogeneous objects by the hybrid moment and finite element method. IEEE Trans. Antennas Propagat., vol. 38, pp. 1053-1058, 1990. [42] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1985. [43] L.W. Pearson, A.F. Peterson, L.J. Behrmasel, and R.A. Whitaker. Inwardlooking and outward-looking formulations for scattering from penetrable objects. IEEE Trans. Antennas Propagat., vol. 40, pp. 714-720, 1992. [44] J. Angelini, C. Soize, and P. Soudais. Hybrid numerical method for harmonic 3D Maxwell equations: Scattering by a mixed conducting and inhomogeneous anisotropic dielectric medium. IEEE Trans. Antennas Propagat., vol. 41, no. 1, pp. 66-76, Jan 1993. [45] J. D'Angelo and I.D. Mayergoyz. Three-dimensional RF scattering by the finite element method. IEEE Trans. Magnetics, vol. 27, no. 5, pp. 3827-3832, Sep 1991. [46] I.D. Mayergoyz and J. D'Angelo. New finite element formulation for 3D scattering problem. IEEE Trans. Magnetics, vol. 27, no. 5, pp. 3967-3970, Sep 1991. [47] A.F. Peterson. Absorbing boundary conditions for the vector wave equation. Microwave and Opt. Techn. Letters., vol. 1, pp. 62-64, Apr 1988. [48] J.P. Webb and V.N. Kanellopoulos. Absorbing boundary conditions for finite element solution of the vector wave equation. Microwave and Opt. Techn. Letters, Vol. 2, No. 10, pp. 370-372, Oct 1989. [49] J. D'Angelo and I.D. Mayergoyz. Finite element methods for the solution of RF radiation and scattering problems. Electromagnetics, Vol. 10,pp. 177-199, 1990. [50] T.B.A. Senior. Combined resistive and conductive sheets. IEEE Trans. Antennas Propagat., Vol. AP-33, pp. 577-579, 1985. [51] A.D. Yaghjian and R.V. McGahan. Broadside radar cross-section of a perfectly conducting cube. IEEE Trans. Antennas Propagat., vol. 33, no. 3, pp. 321-329, Mar 1985. [52] Courtesy of Northrop Corp., B2 Division, Pico Rivera, CA. [53] D.R. Kincaid and T.C. Oppe. ITPACK on supercomputers. Numerical Methods, Lecture Notes in Mathematics, vol. 1005, pp. 151-161, 1982. [54] G.V. Paolini and G. Radicati di Brozolo. Data structures to vectorize CG algorithms for general sparsity patterns. BIT, vol. 29, pp. 703-718, 1989.

148 [55] M.R. Hestenes and E.Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand., vol. 49, pp. 409-436, 1952. [56] P. Sonneveld. CGS, a fast solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., vol. 10, pp. 35-52, 1989. [57] R. Freund. Conjugate-gradient type methods for linear systems with complex symmetric coefficient matrices. SIAM J. Sci. Stat. Comput., vol. 13, pp. 425 -448, 1992. [58] H.P. Langtangen. Conjugate gradient methods and ILU preconditioning of nonsymmetric matrix systems with arbitrary sparsity patterns. Int. J. Numer. Meth. Fluids, vol. 9, pp. 213-233, 1989. [59] J.R. Lovell. Hierarchical basis functions for 3D finite element methods. ACES Digest, pp. 657-63, 1993. [60] E. Rothberg and A. Gupta. Parallel ICCG on a hierarchical memory multiprocessor - Addressing the triangular solve bottleneck. Parallel Computing, vol. 18, pp. 719-741, 1992. [61] E. Anderson and Y. Saad. Solving sparse triangular linear systems on parallel computers. International Journal of High Speed Computing, vol. 1, no. 1, pp. 73-95, 1989. [62] D. Windheiser, E. Boyd, E.Hao, S.G. Abraham, and E.S. Davidson. KSR1 multiprocessor: Analysis of latency hiding techniques in a sparse solver. In Proc. of the 7th International Parallel Processing Symposium, Apr 1993. Newport Beach. [63] C.T. Tai. Generalized vector and dyadic analysis. IEEE Press, New York, 1992. [64] D.S. Jones. An improved surface radiation condition. IMA J. Appl. Math., vol. 48, pp. 163-193, 1992. [65] C.H. Wilcox. An expansion theorem for electromagnetic fields. Comm. Pure Appl. Math., vol. 9, pp. 115-134, May 1956. [66] S.M. Rytov. Computation of the skin effect by the perturbation method. J. Exp. Theor. Phys., vol. 10, pp. 180, 1940. Translation by V. Kerdemelidis and K.M. Mitzner. [67] A. Chatterjee, J.M. Jin, and J.L. Volakis. Edge-based finite elements and vector ABCs applied to 3D scattering. IEEE Trans. Antennas Propagat., vol. 41, no.2, pp. 221-226, Feb 1993. [68] V. Shankar, William F. Hall, Alireza Mohammedian, and Chris Rowell. Development of a finite-volume, time-domain solver for maxwell's equations. Rockwell Technical Report prepared for NASA/NDC under contract N62269-90-C-0257, May 1993.

149 [69] M. Schuh, A. Woo, M. Sanders, and H.T.G. Wang. Radar cross-section measurement data of four small cavities. NASA Tech. Memo. 108782, Nov 1993.