INVESTIGATION OF FINITE ELEMENT-ABC METHODS FOR ELECTROMAGNETIC FIELD SIMULATION by Arindam Chatterjee A dissertation submit{ed in partial fulfillment of the Feq m:iets for the degree of Do!tor of Philosophy (Electrical Engineering) in The University of Michigan 1994 Doctoral Committee: Associate Professor John L. Volakis, Chairperson Assistant Professor Gregory M. Hulbert Associate Professor Linda P.B. Katehi Professor Thomas B.A. Senior

TABLE OF CONTENTS LIST OF FIGURES............................... iv LIST OF TABLES................................ viii LIST OF APPENDICES............................ x CHAPTER I. Introduction.............................. 1 1.1 Background............................ 2 1.2 Outline of thesis................. 4 II. Basic concepts of electromagnetics and finite elements.... 7 2.1 Electromagnetics: basic concepts................ 7 2.1.1 Maxwell's equations.................. 7 2.1.2 Wave equations.................... 10 2.1.3 Boundary conditions................. 10 2.1.4 Radiation conditions................. 11 2.1.5 Radar cross-section.................. 12 2.2 Finite elements: basic ideas................... 12 2.2.1 Boundary value problem............... 13 2.2.2 The Ritz method................... 13 2.2.3 Galerkin's method................... 15 2.2.4 Implementation scheme................ 16 III. Shape functions for scalar and vector finite elements..... 19 3.1 Node-based elements.......................20 3.1.1 Two dimensional elements..............20 3.1.2 Three dimensional elements..............24 3.2 Edge-based elements....................... 27 3.2.1 Two dimensional elements.............. 27 3.2.2 Three dimensional elements.............. 31 R_9 ~~~~~~~~~~~~~~~~~~~~~~ 1F@P-a.~1A m ni ----------------- 9

IV. Vector finite elements for 3D electromagnetic problems... 43 4.1 Formulation.......................... 45 4.1.1 Finite element equations.................. 45 4.1.2 Origin of spurious solutions................. 47 4.1.3 Basis functions..................... 49 4.2 R esults............................ 51 4.3 Conclusions............................ 54 4.4 Formulation........................... 57 4.4.1 Derivation of finite element equations........ 57 4.5 Finite element discretization.................. 64 4.6 R esults.............................. 65 4.7 Conclusions............................ 71 V. Optimization and parallelization.................. 74 5.1 Numerical considerations.................... 75 5.2 Matrix storage and generation................. 76 5.3 Linear equation solver...................... 79 5.4 Preconditioning......................... 80 5.4.1 Diagonal preconditioner................ 81 5.4.2 Modified ILU preconditioner............. 82 5.5 Parallelization.......................... 88 5.5.1 Analysis of Communication.............. 93 VI. Conformal absorbing boundary conditions for the vector wave equation.................................... 99 6.1 Formulation........................... 100 6.1.1 Unsymmetric ABCs.................. 102 6.1.2 Symmetric correction................. 105 6.1.3 Finite element implementation........... 107 6.2 Applications........................... 112 6.3 Conclusion........................... 128 VII. Conclusions............................... 130 APPENDICES................................ 134 BIBLIOGRAPHY.............................. 141 iii

LIST OF FIGURES Figure 3.1 Rectangular element.......................... 21 3.2 Transformation of a quadrilateral element in the xy plane to a unit square in the by plane......................... 22 3.3 Triangular element........................... 23 3.4 Tetrahedral element........................... 26 3.5 Second order triangular edge element................. 31 3.6 Rectangular brick element....................... 32 3.7 Mapping of a hexahedral element to a unit cube........... 35 3.8 Tetrahedral element........................... 40 4.1 Performance comparison of rectangular bricks and tetrahedrals... 52 4.2 Geometry of ridged cavity....................... 53 4.3 Illustration of scattering structure Vd enclosed by an artificial mesh termination surface, S5, on which the absorbing boundary condition is im posed............................... 58 4.4 Bistatic echo-area of a perfectly conducting cube having edge length of 0.755A. Plane wave incident from 0 = 180~; 90~....... 66 4.5 Backscatter RCS of a perfectly conducting cube at normal incidence as a function of edge length...................... 67 iv

4.6 Backscatter pattern of a perfectly conducting cylinder having a radius of 0.3A and a height of 0.6A. The solid and the dashed lines indicate data obtained from a body of revolution code and the black and white dots indicate FE-ABC data................. 68 4.7 Bistatic echo-area of a homogeneous dielectric sphere (er = 4; koa = 1). 69 4.8 Normalized bistatic pattern of a lossy prolate spheroid (Er = 4 - jl; koa = 7r/2; a/b = 2), where a and b are the major and minor axes of the spheroid, respectively...................... 70 4.9 Geometry of cube (a = b = 0.5A) consisting of a metallic section and a dielectric section (er = 2 - j2), where the latter is bounded by a resistive surface having R = Z..................... 71 4.10 RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The lower half of the cube is metallic while the upper half is air-filled with a resistive card draped over it......... 72 4.11 RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The composition of the cube is the same as in Figure 4.10, except that the air-filled portion is filled with dielectric. The solid curve is the FEMATS pattern and the black dots are MoM data for the E'znC = 0 polarization................... 73 5.1 Structure of block preconditioning matrix.............. 82 5.2 Symmetric biconjugate gradient method with preconditioning..... 91 5.3 Speedup curve for the linear equation solver on the KSR1...... 92 5.4 Counts of p subpages required by each processor for sparse matrixvector multiply (total copies=5968).................. 95 5.5 Degree of sharing histogram of p subpages during sparse matrixvector multiply (28 procesors)..................... 96 6.1 Scatterer enclosed in conformal mesh termination boundary..... 100 6.2 Geometry of cube (a = b = 0.5A) consisting of a metallic section and a dielectric section (c, = 2 - j2), where the latter is bounded by a resistive surface having R = Zo.................... 113

6.3 RCS pattern in the x - z plane for the composite cube shown earlier. The solid curve is the FEMATS pattern and the black dots are MoM data for the E/nc = 0 polarization. Mesh termination is piecewise planar.................................. 114 6.4 Backscatter pattern of a metallic rectangular inlet (1A x 1A x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is spherical.................................. 115 6.5 Backscatter pattern of a metallic rectangular inlet (1A x 1A x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is spherical.................................. 116 6.6 Backscatter pattern of a metallic rectangular inlet (IA x IA x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is piecewise planar............................ 117 6.7 Backscatter pattern of a metallic rectangular inlet (IA x 1A x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is piecewise planar............................ 118 6.8 Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for HH polarization. Mesh termination surface is a rectangular box...................... 119 6.9 Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for HH polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is a circular cylinder............. 120 6.10 Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for VV polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is a circular cylinder............. 121 6.11 Backscatter pattern of a lossy foam cylinder with three perfectly conducting wires embedded along the axis. The incident electric field is oriented parallel to the axis of the cylinder. Mesh termination surface is a circular cylinder...................... 122 vi

6.12 Backscatter pattern of a lossy foam cylinder with the middle wire offset 0.25A from the cylinder axis. The incident electric field is oriented parallel to the axis of the cylinder. Mesh termination surface is a circular cylinder.................................. 123 6.13 Backscatter pattern (aoo) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination......... 124 6.14 Backscatter pattern (acr) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination......... 125 6.15 Backscatter pattern (aor) of a 3.5A x 2A perfectly conducting plate in the yz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination.......... 126 6.16 Backscatter pattern (uak) of a 3.5A x 2A perfectly conducting plate for a conical 0 - 800 cut. The white dots indicate box termination; the black dots represent a combined box-cylinder termination.... 127 6.17 Backscatter pattern (ucr) of a glass plate (1.75A x 1A x.125A) having a relative permittivity of (3 - j.09) for the 0 = 80~ cut........ 128 vii

LIST OF TABLES Table 3.1 Edge numbering for rectangular element............... 28 3.2 Edge numbering for triangular element................ 30 3.3 Edge definition for rectangular brick.................. 34 3.4 Edge definition for tetrahedron..................... 36 3.5 Hierarchical basis functions for tetrahedron.............. 42 4.1 TETRAHEDRON EDGE DEFINITION................. 49 4.2 EIGENVALUES (k0, CM-1) FOR AN EMPTY 1CM X 0.5CM X 0.75CM RECTANGULAR CAVITY................. 51 4.3 Eigenvalues (k0o, cm-') for a Half-Filled 1cm x 0.1cm x 1cm Rectangular Cavity Having a Dielectric Filling of cr = 2 Extending from z = 0.5cm to z = 1.0cm........................ 53 4.4 Eigenvalues (k0, cm-1) for an empty spherical cavity of radius 1cm 54 4.5 Eigenvalues for an empty cylindrical cavity of base radius 0.5cm and height 0.5 cm (380 unknowns)..................... 55 4.6 Ten lowest non-trivial eigenvalues (ko, cm-') for the geometry drawn in Figure 2: (a) 267 Unknowns; (b) 671 Unknowns......... 56 5.1 No. of iterations vs no. of blocks for a block ILU preconditioned biconjugate gradient solution method................ 87 5.2 No. of iterations required for convergence of a 224,476 unknown system using the point diagonal and block ILU preconditioning strategies...................................87 viii

5.3 Floating point operations per iteration................. 89 5.4 Execution time and speedup for the iterative solver......... 93 5.5 Execution time and speedup for the matrix generation and assembly (20,033 unknowns)........................... 93 ix

LIST OF APPENDICES Appendix A. Derivation of matrix elements....................... 135 B. Derivation of some vector identities................... 138 x

CHAPTER I Introduction The mechanics of wave propagation in the presence of obstacles is of great interest in many branches of engineering and applied mathematics like electromagnetics, fluid dynamics, geophysics, seismology, etc. Such problems can be broadly classified into two categories: the bounded domain or the closed problem and the unbounded domain or the open problem. Analytical techniques have been derived for the simpler problems; however, the need to model complicated geometrical features, complex material coatings and fillings and to adapt the model to changing design parameters have inevitably tilted the balance in favor of numerical techniques. The modeling of closed problems presents difficulties primarily in proper meshing of the interior region. However, problems in unbounded domains pose a unique challenge to computation, since the exterior region is inappropriate for direct implementation of numerical techniques. A large number of solutions have been proposed but only a few have stood the test of time and experiment. The goal of this thesis is to develop an efficient and reliable partial differential equation technique to model large three dimensional scattering problems in electromagnetics. 1

1.1 Background Ever since the method of moments (MoM) was introduced by Harrington [1] in the late 60s, numerical techniques for predicting electromagnetic field behavior have gained in popularity. With increases in computing speeds and memory and the need to simulate real-life problems, researchers have been actively trying to refine the older numerical methods as well as devise newer and more efficient solution techniques. The MoM is based on applying integral equations on the surface of the desired structure and computing the fields everywhere in space [2]. For anisotropic materials, the entire volume needs to be discretized and a volume integral equation must then be solved. However, the matrix obtained from discretizing an integral equation is full and thus requires 0(N2) storage, where N is the number of unknowns. In 3D problems, this is a serious limitation since the method can scale up to large problems only at considerable computational cost. Thus we need to seek solution techniques which scale favorably with increasing problem size. Partial differential equation (PDE) methods, like finite element and finite difference methods, offer the most attractive alternative to integral equation techniques since they lead to matrix systems which are sparse. Therefore, only the non-zero entries of the final matrix system need to be stored resulting in O(N) storage requirement. Thus the increase in storage demand with increasing problem size is seen to be minimal. PDE techniques like finite elements were, however, originally constructed for solving bounded domain problems. In recent times, finite elements are increasingly being used for modeling unbounded problems, where the desired parameter decays off to zero infinitely away from the region of interest. In electromagnetics, the desired

3 parameter is a field quantity like the electric or magnetic field. It is obviously impractical to extend the finite element mesh to infinity; thus the mesh must be truncated at a suitable distance from the region of interest. Boundary conditions should then be applied at this artificial mesh truncation surface such that the boundary appears transparent to the propagating field. There are two types of mesh truncation conditions: exact and approximate. Exact boundary conditions can be placed very close to the region of interest; however, they suffer from potential uniqueness problems [3] and give rise to partly full systems. The loss of uniqueness associated with systems where the exact boundary condition is employed on the mesh truncation boundary is well-known and was first pointed out by Mautz and Harrington[4]. Remedies like complexification of the wave number [5], using the combined field integral equation [6] exist and must be used for a robust implementation. Although the problem of interior resonances can be now avoided, the finite element-boundary integral system still possesses a partly full system which affects its scalability to large problems. Approximate boundary conditions, on the other hand, are local in nature but preserve the sparsity of the finite element system. This advantage is partly offset by the fact that the finite element mesh must be extended some distance away from the region of interest and the approximate boundary condition imposed on the mesh truncation surface. These boundary conditions work on the principle that the higher order terms of the expansion for the propagating field decay rapidly away from the target. Therefore, if the truncation boundary is placed far enough from the region of interest, the boundary condition on the mesh termination surface needs to absorb only the lowest order terms of the field expansion to accurately model the physics of the problem. In this thesis, our aim is to examine the performance of these

4 approximate conditions, also known as absorbing boundary conditions (ABCs). in practical three dimensional problems and to derive improved boundary conditions which will enable more efficient utilization of the available resources. 1.2 Outline of thesis This dissertation describes the development of a finite element method for the solution of general three-dimensional scattering problems. The entire research has been geared towards a robust, state-of-the-art solution of unbounded domain problems in electromagnetics. Improvements in solution convergence, mesh termination conditions, algorithmic complexity and computational speed have all been carried out with an eye to making the methodology more efficient in terms of computer storage and time. Chapter 1 presents a brief introduction to the problem, its possible applications in science and industry and our motivation in preferring this solution methodology over more traditional ones. Chapter 2 gives a short review of the fundamental laws that govern electromagnetic phenomena and the modeling technique of finite elements. The wave equation for electromagnetic fields is derived and various boundary conditions satisfied on material interfaces are presented. A brief outline of the method of finite elements and its application in electromagnetics is given. Chapter 3 provides a detailed review on the construction and implementation of scalar and vector shape functions for two- and three-dimensional finite elements. Traditional node-based shaped functions are presented for a wide variety of element shapes and their pros and cons are outlined. The problem with nodal basis for a full-scale vector formulation is explained and edge-based vector shape functions are

5 introduced. The development of two- and three-dimensional edge bases is presented for a wide variety of element shapes. Higher order edge basis functions are presented for triangles and tetrahedra along with other recently developed novel shape functions. Chapter 4 describes the formulation and implementation for closed and open domain problems. In the first part of the chapter, the closed problem is solved by determining the eigenvalues of an empty or filled metallic cavity. The origin and avoidance of spurious modes is discussed. The open problem is then formulated in the second part of the chapter and schemes for terminating the finite element mesh are discussed. The code is then validated for a wide class of perfectly conducting and composite geometries having arbitrary shapes. Since the finite element-absorbing boundary condition methodology becomes extremely attractive for large problems, it is essential that the computer code be as computationally efficient as possible. Chapter 5 details the optimization and the subsequent parallelization of the finite element code and the various numerical considerations associated with it. The strategies for sparse matrix storage as well as solution of sparse systems using preconditioned iterative methods is outlined. The inherent parallelism in various types of point and block preconditioners is examined along with performance figures on the KSR1 and the Intel iPSC/80 massively parallel architectures. In Chapter 6, new mesh termination conditions which can be applied on termination surfaces conformal to the target are derived and applied on some benchmark geometries. Since these ABCs are enforceable on doubly curved surfaces, dramatic reductions in computer storage and solution time are obtained. The improved boundary conditions are applied on mesh truncation surfaces composed of combinations of

6 cylinders, spheres and flat planes and their performance examined with respect to mesh termination distance and system symmetry. Extensions to more complex mesh termination boundaries are possible. Chapter 7 concludes the thesis by summarizing the important results obtained during the course of this work and its possible future extensions.

CHAPTER II Basic concepts of electromagnetics and finite elements As mentioned in the last chapter, this thesis deals with the application of finite elements to three-dimensional problems in electromagnetics. Therefore, it is important to have a grasp of both the finite element method as well as electromagnetic theory to solve these problems. This chapter is thus divided into two parts. The first part gives a brief review of the basic concepts of electromagnetics and the resulting differential and integral equations pertaining to our problem. The second part introduces the reader to the formulation of boundary-value problems with finite elements. 2.1 Electromagnetics: basic concepts 2.1.1 Maxwell's equations Electromagnetic waves in all space are governed by a set of fundamental equations called Maxwell's equations. In differential form, they are written as VxE - -: (Faraday's law) (2.1) VxH = J + Ot: (Maxwell-Ampere's law) (2.2) V.D = p: (Gauss' law) (2.3) 7

8 V-B = 0: (Gauss' magnetic law) (2.4) where E electric field intensity in volts/m D electric flux density in coulombs/sq. m H magnetic field intensity in amperes/m B magnetic flux density in webers/sq. m J _ electric current density in amperes/sq. m p _ electric charge density in coulombs/cu. m Another fundamental equation, frequently referred to as the equation of continuity, is given by V.J = -, (2.5) and expresses the conservation of charge. Of the five equations stated in this chapter, only three are independent. Either the first three equations, (2.1-2.3), or the first two equations, (2.1 and 2.2) and the fifth equation (2.5), can be chosen as independent equations. The remaining two equations,(2.4 and 2.5) or (2.3 and 2.4), can be derived from the independent equations and are thus called auxiliary or dependent equations. The three independent equations cannot be solved since the number of unknowns exceeds the number of equations. Maxwell's equations become definite only when the three constitutive relations between the field quantities are specified. These relations describe the macroscopic properties of the medium being considered. For a simple medium, they are given by D = eE (2.6)

9 B = pH (2.7) J = uE (2.8) where the parameters, j and a denote the permittivity (farads/m), permeability (henrys/m) and conductivity (mhos/m) of the material, respectively. For anisotropic media, the constitutive relations are given as D = e E (2.9) B =.H (2.10) J = oE (2.11) We will be considering only isotropic media in the following sections since the generalization to anisotropy is trivial and introduces needless algebraic complexity. It is usually sufficient to consider the steady-state solution for electromagnetic fields as produced by currents having sinusoidal time dependence. The set of Maxwell's equations, using complex phasor notation and the constitutive relations, can then be written as VxE = -jwyH (2.12) VxH = J+jweE (2.13) V (eE) = p (2.14) V. (H) - 0 (2.15) V.J = -jLwp (2.16) where w is the angular frequency of oscillation and the time convention eimwt is used and suppressed.

10 2.1.2 Wave equations The two curl equations, (2.1 and 2.2), can be combined together with the assumed constitutive relations, (2.7 and 2.8), to obtain a separate second-order differential equation for each field. By taking the curl of (2.1) or (2.2) and eliminating H or E respectively, we obtain Vx-VxE - kerE = -wJ (2.17) I r Vx-VxH-k 2rH = x (J) (2.18) where ko = w~iodo is the free-space wave number, Er and it are the respective relative permiitivity and permeability of the medium under consideration and J is an impressed or source current. The differential equations shown above are called inhomogeneous vector wave equations in three dimensions. 2.1.3 Boundary conditions Mathematically, the solution of a partial differential equation (PDE) like the wave equation, outlined in (2.17) and (2.18), is not unique in a region unless boundary conditions are specified, i.e, the behavior of the field on the boundary of the region of interest. Boundary conditions play the same role in the solution of PDEs that initial conditions play in the solution of differential equations for electric circuits. An electromagnetic problem is thus completely defined only when it contains information about the governing differential equation and the corresponding boundary conditions at material discontinuities or inhomogeneities. At the interface between two media, say medium 1 and medium 2, the boundary conditions can be mathematically expressed as n x (E1-E2) - 0 (2.19)

11 n. (Di- D2) = 0 (2.20) for electric fields and nfix(Hi-H2) = 0 (2.21) n (B1-B2) = 0 (2.22) for magnetic fields, where n denotes the unit normal to the interface. It is assumed in (2.20) and (2.21) that neither surface currents nor surface charge exist on the boundary. Equations (2.19) and (2.21) state that tangential electric and magnetic fields are continuous across dielectric boundaries. It is possible to simplify the above boundary conditions at the interface of a perfect electric conductor and free-space. Since a perfect conductor cannot sustain a field inside it and likewise since the flux lines of B are continuous, (2.19) can be rewritten as n x E = 0 (2.23) and (2.22) reduces to n - B = 0 (2.24) However, the conductor surface can support a surface current (J. = nfix H) and a surface charge (psn - D). 2.1.4 Radiation conditions It can be shown that the electric field within a finite volume can be derived in terms of the sources within the volume and the field values on the surfaces bounding that volume. If we make this volume infinitely large, we arrive at the Sommerfeld

12 radiation condition [7] given by lim r 9 +jk/ =0 (2.25) where 4 is regular at infinity and describes a component of the electric or magnetic field and r = V/x2 + y2. In three dimensions, the radiation becomes lim R [VxE + jkoR x E = 0 (2.26) R —-*oo where R = /2 + y2 + z2. A similar result exists for the magnetic field. The radiation condition requires that E and H diminish as R-1 when R -* oo. 2.1.5 Radar cross-section The radar cross-section (RCS) is a quantity characterizing the scattering from an obstacle. It is defined as the area intercepting that amount of power which, when scattered isotropically, produces at the receiver a power density equal to that scattered by the target under consideration. In the three-dimensional case, the RCS is defined as a(0e, q) = lim 47rR2 Fs 12 (2.27) R-+oo IFnc2 wher Fs denotes the scattered field (either Es or H8) at the observation point (R, 0, 0q) and FtnC represents the incident field, usually a plane wave, coming from (R.,incv inc). 2.2 Finite elements: basic ideas The finite element method (FEM) is a numerical method for obtaining approximate solutions to boundary-value problems in physics and engineering. Very few analytical solutions are possible and thus a numerical method like the FEM provides

13 us with an alternative solution technique for these problems. For a particular class of such problems, there exist extremum principles by which the solution being sought makes an appropriate functional stationary, or, in certain cases, extremal. In other problems for which no genuine extremal principles can be derived, the error resulting from the substitution of the numerical approximation into the differential equation is minimized. In the following pages, we will give a brief description of the two accepted formulation schemes - the Ritz variational method [8] and the Galerkin method of weighted residuals [9]. We will then discuss how the finite element method is used to solve PDE problems formulated by the two abovementioned schemes. 2.2.1 Boundary value problem We basically seek an unknown function u which satisfies a differential equation A(u) = Cu - f = 0 (2.28) in a domain Q and certain boundary conditions B(u) = 0 (2.29) on the domain boundary F. In electromagnetics, the form of the governing differential equation ranges from the simple Poisson equation in statics to the complicated vector wave equation such as (2.17) and (2.18). The boundary conditions can also vary from Dirichlet and Neumann conditions to more complicated higher-order transition and radiation conditions. 2.2.2 The Ritz method The Ritz method, or the Rayleigh-Ritz method, is a variational formulation where the solution to the boundary value problem is obtained by searching for the stationary

14 point of the functional. If the operator C in (2.28) is self-adjoint and positive-definite, then the solution to (2.28) can be found by determining the stationary point of the functional [1] F(u) = < u,U > - < uf > (2.30) with respect to u, where u denotes the trial function. The inner product, denoted by angular brackets, is given by < a,b >= a bdV (2.31) Once the functional is found, we approximate the trial function by the expression N u = Cw -Cw (2.32) i=1 where wi are basis functions and Ci are constant coefficients to be determined. Substituting (2.32) in the expression for the functional, we get F(u) = CT (j wCT dQ) C-CT J wf dQ (2.33) where C is the column vector of unknown coefficients and the superscript denotes the transpose of a vector. On differentiating F(u) with respect to C and setting the resultant expression to zero - equivalent to finding the stationary point of F(u) - we obtain a system of equations [A] {C} = {b} (2.34) where the elements of the matrix A and the vector b are given by A = wiwj dQ (2.35) b = / wf dfQ (2.36)

15 On solving (2.34) for the unknown coefficients {C} of the finite element bases, we obtain a solution for the desired quantity everywhere within the computational domain. It should be pointed out that the inner product as defined in (2.31) extends the applicability of the variational formulation to complex numbers. This is of utmost importance in electromagnetics since material fillings are often lossy. However, unlike in other branches of science, the functional does not have any physical significance in electromagnetics and is thus used sparingly. A further limitation is that the matrix A must be symmetric for a variational principle to exist. Problems which give rise to unsymmetric matrices need to be handled differently. The weighted residual or the Galerkin method provides an alternative, and simpler, approach of formulating the finite element equations. 2.2.3 Galerkin's method Galerkin's method or the weighted residual method tries to minimize a residual in the mean square sense. If we assume that u is an approximate solution to (2.28), on replacing u with u in (2.28) we obtain the residual Z =Cu - f 0 (2.37) Naturally, the best approximation for u would be one that reduces the value of the residual to a minimum at all points in Q. The integral of the residual, weighted with some known weighting functions wi, is then required to vanish in Q. / Rwi dQ = 0 (2.38) In the Galerkin method, the weighting functions wi are chosen to be identical to the basis functions used for expanding u. With this choice of weighting functions,

16 the residual 1? is orthogonal to the subspace of functions spanned by the basis of 1iK ensuring that the resulting approximation is in this sense the best possible from the space of approximating functions. The expression (2.38) then becomes j (wiwTc -wif) dQ= 0, i= 1,...,N (2.39) where u has been expanded as in (2.32). This again leads to a matrix system identical to the one given in (2.34), obtained by the Ritz method. One of the advantages of this formulation is that the matrix system need not be symmetric for its validity. Also, the method can be applied to problems for which no genuine extremum principles exist. 2.2.4 Implementation scheme The implementation scheme for FEM follows three broad outlines. Step 1: At first, the problem is discretized by dividing the entire computational domain into simple subdomains, the elements. For two-dimensional problems, commonly used elements are triangles, parallelograms and quadrilaterals with straight or curved edges. In three dimensional implementations, the elements of choice are usually terahedra, curvilinear bricks or prisms with straight or curved surfaces [10, 11]. Step 2: Next, a suitable approximation function is chosen for the problem to be solved. The form of approximation depends on the type of element and must satisfy certain continuity conditions across inter-element boundaries. Further, the form of the polynomial function must remain unchanged under a linear transformation from one Cartesian coordinate system to another. This requirement is satisfied if the polynomials are complete to a specific order like u(x, y) = c1 + c2x + c3y + c4x2 + csxy + coy2 (2.40)

17 or when the extra terms are symmetric with respect to one another, as in the followsing incomplete third-order polynomial U(X, y) = C1 + C2X + C3y + c4x2 + c5xy + c6y2 - C7X2y + C8xy2 (2.41) Such approximation functions have the characteristic that, for fixed x or y, they are always complete ploynomials in the other variable. The two examples shown above apply in two-dimensions; the extension to three-dimensional elements is trivial. Once the order of the polynomial is selected, we can derive an expression for the unknown solution in an element, say the eth element, having the following form: n u = E ueNe (2.42) k=l where n is the number of bases in the eth element, uk is the value of the unknown function at node, edge or facet j and Nk is the basis (or shape) function for the element. Step 3: In the third step, we enforce the extremum principle by substituting (2.42) into the functional expression (2.30) or into the residual value (2.37). On imposing the stationarity of the functional as explained in Section 2.2.2, we obtain the system of linear equations (2.34). An alternative procedure of minimizing the weighted residuals (2.38) yields an identical system of linear equation (2.34). As mentioned earlier, both the variational (Ritz) and the weighted residual (Galerkin) formulations are equivalent when the matrix system is symmetric. In addition, the Galerkin method can handle unsymmetric systems. Finally, the solution of (2.34) specifies the values of the desired function everywhere within the computational domain. In the next chapter, we will be discussing step 2 in more detail. The two subsequent chapters are devoted to the derivation of the finite element equations for our

18 application and the optimizations undertaken to enable rapid solution of the system.

CHAPTER III Shape functions for scalar and vector finite elements The finite element method is used for modeling a wide class of problems by breaking up the computational domain into elements of simple shapes. Suitable interpolation polynomials (or shape functions) are used to approximate the unknown function within each element. It is then possible to program the computer to solve complicated geometries by specifying the shape functions only. The element choice, however, needs human intervention and intelligence to ensure a reliable solution of the of the problem at hand. In this chapter, we will discuss the derivation of node-based and edge-based shape functions for two dimensional and three dimensional finite elements. Node-based shape functions have been used extensively in civil and mechanical engineering applications as well as in scalar electromagnetic problems. However, a full three dimensional vector formulation brings out numerous deficiencies in these traditional element shape functions [12, 13]. Edge-based shape functions have thus been derived to overcome the problems associated with nodal bases and are now being applied widely for solving vector problems in electromagnetics. We will also describe a general procedure for deriving higher-order shape functions for node and edge basis. 19

20 3.1 Node-based elements In node-based finite elements, the form of the sought function in the element is controlled by the function values at its nodes. The approximating function can then be expressed as a linear combination of basis functions weighted by the nodal coefficients. If the function values ui at the nodes are taken as nodal variables, then the approximating function for a two-dimensional element e with p nodes has the form p e(x, y) = uN(x, y) (3.1) i=l Since the expression (3.1) must be valid for any nodal variable Ui, the basis function Ne(x, y) must be unity at node i and zero for all remaining nodes within the element. Shape functions can be derived either by inspection (Serendipity family) or through simple products of appropriate polynomials (Lagrange family). It is easier and more systematic to construct higher-order bases in the Lagrange family while progression to higher orders is difficult in the Serendipity family. However, Lagrange shape functions have undesirable interior nodes and more unknowns than Serendipity shape functions of the same order. All shape functions derived in the following sections impose function continuity or Co continuity (not slope continuity) between elements. 3.1.1 Two dimensional elements Rectangular elements The simple shape of the rectangular element permits its shape functions to be written down merely by inspection. On examining the element shape given in Figure (3.1), the shape functions can be cast in the form N =A ) - I h _

21 where xe and y1 denote the coordinates of t he and he represent the edge length and Ay denotes the area of the element. Higher order 4 3 *(Xe YC) he Y x 1I. 2 0L -,1____. he Figure 3.1: Rectangular element rectangular elements are presented in Zienkiewicz [10]. However, these elements can model only regular geometries and are thus not very useful in practice. Irregular geometries can be modeled by using quadrilateral elements which can also be viewed as distorted rectangles. To construct basis functions for a quadrilateral element, we need to use a transformation that maps a quadrilateral element in the xy-plane to a square element in the At plane (Figure 3.2). Such a transformation can be found by satisfying the following relation at the four nodes of the quadrilateral element: x = a + b( + cr/+ dr y - a' + b' + c'r+ d'Fr/ (3.2)

22 4 4: I 3 3 1 2 1 ==-1 2 Figure 3.2: Transformation of a quadrilateral element in the xy plane to a unit square in the ~t7 plane On solving for the unknown coefficients a,..., d, the basis functions can be cast in the following form Ni=4 (1+o)(1 + o), i=1,...,4 (3.3) where o = -i and 7ro = rpt. The variables ((j, ri) denote the coordinates of the ith node in the (7, r7) coordinate system. Triangular elements Triangular elements are popular since they can model arbitrary geometries. We will determine the shape functions of triangular elements by using Lagrange interpolation polynomials. Let us consider a point P within a triangular element (Figure 3.3). The area of the smaller triangle formed by points p, 2 and 3 is given by 1 x y \A=- 1 z 2 (3.4) i x3 Y3

23 3 2 Figure 3.3: Triangular element The area coordinate L7 is then given by L = A1 (3.5) where A is the area of the whole triangle and can be found from (3.4) by replacing x and y in the first row with xl and Yl. Similarly, the two remaining area coordinates L2 and L3 are given by - A2 Area P31 A Area 123 A3 _ Area P12 ~ A Area 123 () The values for x and y inside the triangular element reduce to 3 3 x -= EL y = ELy (3.8) i=l i=1 The area coordinates are equal to the basis functions - Nie,i = 1,2,3 - when the required interpolation order is linear. Higher order basis functions for triangular and quadrilateral elements are derived in [10, 11].

24 3.1.2 Three dimensional elements Shape functions for three dimensional elements can be described in a precisely analogous way to their two dimensional counterpart. However, the simple rules for inter-element continuity given previously must be modified. The nodal field values should now interpolate to give continuous fields across the face of each element. Rectangular bricks The simplest polynomial approximation to a rectangular brick element is the trilinear function U e( x, z) = ae + bex + cey + de + exy + feyz + gezx + hexyz (3.9) whose eight parameters are uniquely defined by the values of the function u at the eight corners of the brick. From the eight resulting equations, we can determine the coefficients ae, be,..., he and write the final expression in the form 8 e(x, y,z) - N~(x, yz)u? (3.10) i=l However, this is a cumbersome process and can be easily avoided by writing down the required basis functions by mere inspection. Since the basis function Ne must be unity at node i and zero at the remaining nodes, the eight interpolation functions can be written down as Ve xC + -2 - V) c + 2 -Y- - z~ + 2 -2z N e 2 = v e ( -c + X 2 + _ _ % + — Z - Z N3 = Ve (-x + 2- + x -Yc + - + yZ i= h_ (e+ h ) (~ he: = v

25 N = ( — + Y )(- +-) -8 = - xe (sac + +x -+ - -~~+ 2 +z where xz, ye) and zc denote the coordinates of the center of the element, h;, hy, and he represent the edge lengths of the element and Ve denotes the element volume. Bricks with quadratic interpolation functions need 20 degrees of freedom and thus have node points at the corners and the mid-points of each edge. Shape functions for hexahedral elements or distorted bricks can be derived by mapping the element in the xyz coordinate system onto a standard cube in a new,<v/ coordinate system. The required transformation yields 8 8 8 z - NN(,, ()x'; y - N(,, ()y; z- N(, r, ()z (3.11) i=l i=l i=l where I= (1 + (i)(1 + rWi/)(1 + (C,) (3.12) 8 with (ti, /, (i) denoting the coordinates of the ith node. Tetrahedral elements The three dimensional analogue of a two-dimensional triangle is a tetrahedron (fourfaced element). Once again, we can introduce special coordinates, called volume coordinates or simplex coordinates, to simplify the derivation of shape functions. If P is a point within the tetrahedron shown in Figure 3.4, the four volume coordinates are given by Volume P234 Volume 1234 Volume P341 L2 -" Volume 1234

4 -...........Volume 1234 Volume P123(3 ~~x ~~=Z= ^ ~Li~ ~.Xi; y.'. =:-:.-..'.' ZL. Other elements and isoparametric elements. It is much easier to discretize a complicated structure elements can be used. There is a good review of such elements in [10, 14].. - -.-.- fi__... Zixi;. -.- Y~Ziyi z - y..... - the.4 c o r n e r.nodes a......... the reann 8. o t m o t. 4................. x.....................................i z i...................l................ Quadratic~ ~~~............. fucin.o. erhernncsiae h ueo e oepit -~~~~~~~~~................. cone.oe n h eann ntemdpit fteegs Other~~~~~~~~..............lements..... Other threedimensionalelements haing simpleshapes inclde the.t..agular.pris and isoparametric elements...................... ismuh.asertodicrtzea. omliatd.trctr by usin paralelopipes in cmbinaton withprism.lement.......... that..smal numbr ofelemnts an odela reativly omplx reioncured o isoaramtri elements~~~~~~~........................hr is. goo reie of suc.lmnsi1,1]

27 3.2 Edge-based elements In electromagnetics, we encounter several serious problems when node-based elements are employed to represent vector electric or magnetic fields. First, spurious modes are observed when modeling cavity problems using node-based elements [15]. Secondly, special care needs to be taken to impose boundary conditions at material interfaces and conducting surfaces [16]. The first limitation can also jeopardize the near-field results of a scattering problem, the far-field escapes contamination since spurious modes do not radiate. Edge-based finite elements, whose degrees of freedom are associated with the edges of the finite element mesh, have been shown to be free of the above shortcomings. They were described by Whitney [17] over 35 years ago and have been revived by Nedelec [18] and Bossavit and Verite [12] and Hano [19]. Mur and de Hoop [20], van Welij [21], Barton and Cendes [22] and Lee,et al [23] have extended their applicability to various two- and three-dimensional shapes and even constructed higher order elements for a more accurate approximation of the field values. 3.2.1 Two dimensional elements Rectangular elements We first consider the rectangular element first since its shape function is usually the easiest to formulate. For the element shown in Figure 3.1, we can find its edgebased finite element basis function merely by inspection. If the edges are numbered according to Table 3.1 and considering that the basis function should be unity along one edge and zero over all others, the vector basis functions can be written as - = —'^ - -

28 Edge no. i1 i2 1 1 2 2 43 3 1 4 4 23 Table 3.1: Edge numbering for rectangular element e = (3.14) N- - + Ix -r- I E~ - y -_ (.+ ) i=l where Et denotes the tangential field along the ith edge. The basis functions Nte guarantee tangential continuity across inter-element boundaries since they have a tangential component only along the ith edge and none along the other edges. They are also divergenceless within the element and possess a constant non-zero curl. It should be noted that by taking the cross-product of z with N', we obtain basis functions which possess normal continuity across element boundaries, have zero curl and non-zero divergence. The latter are ideal for representing surface current densities and are known as roof-top basis functions in electromagnetics. They have found extensive use in the solution of integral equations [24]. Edge basis for quadrilateral elements can be derived by carrying out the transformation detailed in the derivation of nodal basis for quadrilaterals in the previous section and then taking the gradient of the resulting expression for each edge. The

29 edge-based quadrilateral element has two shortcomings. First, the integrals associated with these elements do not lend themselves to easy evaluation and secondly, the basis functions are not divergence free. However, their ability to model complicated shapes with a lesser number of unknowns than tetrahedra and property of tangential continuity across elements make them attractive for use in two-dimensional vector formulations. Triangular elements Since the edges of an arbitrary triangular element are not parallel to the x or y axis, it is not easy to guess the form of the vector basis function by inspection. Therefore, the edge basis for a triangular element is expressed in terms of its area coordinates, L1, L2 and L3. These are the so-called Whitney elements. If the local edge numbers are defined according to Table 3.2, then edge bases for a triangular element are defined as Nk = W = LiVLj - LjVL, i,j = 1,..., 3 (3.15) where Nk denotes the basis function for the kth edge of the element. The vector field inside the triangular element can, therefore, be expanded as 3 E E E (3.16) k=1 where Ee denotes the tangential field along the kth edge. It can be easily shown that the edge-based functions defined in (3.15) has the following properties within the element: V Wij = 0 VxWij = 2VLi xVLj If ei is the unit vector pointing from node 1 to node 2 in Figure 3.3, then ei VL1 = -1 and e1 * VL2 = 1. Since L1 is a linear function that varies from unity at node 1 and

30 Edge no. iI 2 1 1 2 2 23 3 3 1 Table 3.2: Edge numbering for triangular element zero at node 2 and L2 is unity at node 2 and zero at node 1, we have el' W12 - Li + L2 - 1 (3.17) along the entire length of edge 1. This implies that W12 has a constant tangential component along edge 1. Moreover, since L1 vanishes along edge 2 and L2 vanishes along edge 3, Wi2 has no tangential component along these edges. Thus, tangential continuity is preserved across inter-element boundaries but normal continuity is not. A different method of constructing edge basis functions for triangular elements is given in [25]. Higher order vector basis functions include the contribution of facet elements to the approximating function. Unknowns in the triangular element are assigned as shown in Figure 3.5 [26]. The tangential projection of the vector field along edge {i, j} is determined by two unknowns EJ and Ej and two facet unknowns - F1 and F2 - are provided to allow a quadratic approximation of the normal component along two of the three edges. Only two facet unknowns are required to make the basis functions of second order complete. Therefore, there are 8 degrees of freedom for each triangular element. The higher order vector field within the element is given by 3 3 Ee > E E(i iVLj + F~LL2VL3 + F~L1L3VL2 (3.18) i=l j=l i:j

31 1 2 3 ~1 F F 1 E 1 i/ 1'1 2 2 3 2/ 33 2 E2 ~~E3 Figure 3.5: Second order triangular edge element where we have arbitrarily chosen the facet variables to lie on edges 1 and 2. These variables are local unknowns associated with each separate triangular element and are included to provide a linear approximation for Vt x Et, where the subscript t denotes the tangential component. Since the edge variables provide common unknowns across element boundaries, tangential continuity of the field over the boundary is assured. However, an obvious disadvantage of these elements is that the 2 facet variables cannot be symmetrically assigned. This disadvantage can be avoided by using third order elements [27]. 3.2.2 Three dimensional elements Edge-based elements have facilitated to a great degree the finite element analysis of three dimensional structures in electromagnetics. Linear nodal basis with their problem of spurious modes and difficulty in maintaining only tangential continuity across material interfaces are not as convenient for electromagnetic field simulations in three dimensions. On the other hand, the introduction of edge based shape func

32 tions provide a robust way of treating general three dimensional problems having material inhomogeneities and structural irregularities like sharp edges and corners. In the following section, we will consider the simple rectangular bricks first and will proceed to derive edge-based shape functions for more complicated structures like tetrahedrals and curvilinear hexahedrals. The chapter is concluded with a brief discussion on hierarchical edge elements. Rectangular bricks and hexahedrals As in the two dimensional case, we derive the edge-based shape function for a rectangular brick (see Figure 3.6) by simple inspection. Since a constant tangential field 8 7 Figure 3.6: Rectangular brick element component must be assigned to each edge of the element, we can express the shape function along each edge of the element as [28] Ne- 1 ( e h_ ___ ( e he) N =., (- y+Yc-z+ +) (zz+ ) =' h h.(Y^ ~2(z-z 2)

33 N = hI ~(2 — )(z ) hh (+ + ) (h +C 2 N = hh(zzc2)(+c+2)Y N = hh- y (- yz + - + + = hyhz (-2+ ) (- C- 2 ) Y N = hh z+c 2 ( c+ 2) Nl= -he (xXc+ 2)2) N -= - z -x + h + (-Y) N2= -he + -) (- + ) where hyhhe denote the edge lengths in the hy, and z directions, respectively, and the center coordinates of the brick are given by (xbe ye) z). If the edge numbers be expressed as 12 N= -x +E (3.19) i=l where EI represents the value of the electric field along the ith edge. The vector basis Nt defined for the rectangular brick element have zero divergence and a nonzero curl. Furthermore, the expansion (3.19) guarantees tangential continuity of the electric field across the surfaces of the elements. A rectangular brick element has limitations in the sense that it is unable to model irregular geometries. Due to this reason, the analog of the two dimensional

34 Edge no. Node il Node i2 1 1 2 2 4 3 3 5 6 4 8 7 5 1 4 6 5 8 7 2 3 8 6 7 9 1 5 10 2 6 11 4 8 12 3 7 Table 3.3: Edge definition for rectangular brick quadrilateral ( the hexahedral element) finds much wider use in modeling practical three dimensional problems. As in the case of the quadrilateral element in two dimensions, a hexahedral element in Cartesian coordinates can be seen as the image of a unit cube under a trilinear mapping to the (wq( coordinate system (see Figure 3.7). Let us consider those faces for which ~ =constant. Therefore, V~ must then possess only a normal component on that face. Since ~ varies linearly along the edges that are parallel to t he axis the vector function Vi has nonzero tangential components only along those edges that are parallel to the (-axis. Using the nodebased expression for the shape function in a hexahedral element given in (3.12), we

35 8 7 4 3 Figure 3.7: Mapping of a hexahedral element to a unit cube may write the corresponding edge bases as N - = - (1 + Tr/i) (1 + C(C) V~, edges 11 to (-axis (3.20) N. = - (1 + ii) (1 + hC() Vr, edges |1 to r-axis (3.21) N7 = (1 + ) (1 + i) VC, edges |I to (-axis (3.22) 8 where (hi, ri, (X) denote the coordinates at edge i. The vector bases derived above possess all the desired continuity properties of edge elements and generally result in about half the number of unknowns than that obtained by tetrahedral gridding. However, these basis functions are not divergenceless and it is difficult to generate a finite element mesh of an arbitrary structure using hexahedral elements. Tetrahedral elements Tetrahedral elements are, by far, the most popular element shapes to be employed for three dimensional applications. This is because the tetrahedral element is the simplest tessellation shape capable of modeling arbitrary three dimensional geometries and is also well-suited for automatic mesh generation. The derivation of shape functions for these elements follow the same pattern as that for triangular vector

36 basis functions. If we consider the tetrahedron shown in Figure 3.4 and define the edge numbers according to Table 3.4, we have = = L VL - L7VL, i j... 1,4 (3.23) and the vector field within the element can be expanded as 6 E = ZE EeNe (3.24) k=l A nice explanation of the physical character of the edge-based interpolation function Edge no. Node il Node i2 1 1 2 2 1 3 3 1 4 4 2 3 5 4 2 6 3 4 Table 3.4: Edge definition for tetrahedron is given by Bossavit [29]. Let us consider edge number 1 connecting nodes 1 and 2. Since VL2 is orthogonal to facet {134} and VL, is orthogonal to facet {234}, the field turns around the axis 3-4 and is normal to planes containing 3 and 4. The field thus has only tangential continuity across element faces. Edge elements can also be described as Whitney elements of degree 1. Whitney elements of the second degree are called facet elements because they are constant over the face of the tetrahedron. The vector function for the facet element can be written as Wijk = 2 (LiVLj x VLk + LjVLk x VLi + LkVLi x VLj), i,j, k = 1,..., 43.25)

37 As explained in [29], we now have a central field (emanating as if from node 4 in Figure 3.4) on each of the two tetrahedra that share the face {1,2,3}. The field can be imagined as coming from the'source' 4, growing, crossing the facet and vanishing into the'well' 4', the fourth vertex of the other tetrahedron. This field thus has normal continuity and the flux across the facet forms the degree of freedom for the element. Alternative expressions for linear basis inside a tetrahedron have been derived in [22]. They are given by f7-i + g7-i x r, r in the tetrahedron N7_i= (3.26) 0, otherwise with f7-i = 6V r x ri (3.27) g7-i = 6V (3.28) 6Ve in which i = 1,2,...,6, Ve is the volume of the tetrahedral element, e = (r i-r=i)/hi is the unit vector of the ith edge and hi = Iri2 - ri | is the length of the ith edge with ril and ri2 denoting the position vector of the il and i2 nodes. It can be shown that (3.23) has the same form a + b x r as (3.26) when simplified, where a and b are constant vectors. The basis functions given in (3.26) have zero divergence and constant curl (VxNt = 2gg). The order of the polynomial approximation for the first order edge element given in (3.23) or (3.26) can be taken as 0.5. This is because the value of the basis function is constant (0(1)) along the edge it supports and is linear (0(r)) everywhere else within the element. Mur and de Hoop [20] presented edge elements which are consistently linear, yielding a linear approximation of the field both inside each tetrahedron and

38 along its edges and faces. Since this requires two unknowns per edge. there are 12 degrees of freedom per element. The basis functions in [30] are derived by first defining the outwardly directed vectorial areas of the faces as Ai = r x rk + rk x rl + rl x r (3.29) where ri = 1,...,4 denote the position vectors of the vertices of the tetrahedron and i, j, k, I are cyclic. Then the edge-based vectorial expansion function is defined by (r)- i(r)j, i,j = 1,..., 4, (3.30) where V is the volume of the tetrahedron and ~>(r) is a linear scalar function of position given by 1 (r -rb) * Ai w 4 - 3V in which rb is the position vector of the centroid of the tetrahedron. We observe that Xi(r) equals unity when r = ri and zero for the remaining vertices of the tetrahedral element. In that sense, they are very similar to the simplex or volume coordinates mentioned earlier. They also satisfy the following equalities: 4 r = i(r)r i=l 4 Z O(r)= 1 i=1 The edge basis function Wij is a linear vector function of position inside the tetrahedral element and its tangential component vanishes on all edges of the element except the one joining vertices i and j. Wij varies linearly along the edge formed by nodes i and j such that Wij(rj = 0 while W,. * (r - rj) = 1

39 These basis functions have non-zero values of divergence and curl. An inspection of the expressions for the vectorial areas reveals that the form is identical to that obtained by taking the gradient of one of the simplex or volume coordinates mentioned earlier. In other words, the three components of the vector A1 have the same functional dependence as that obtained by VL1 Y2 1 z2 X2 1 z2 x2 1 Y2 VL1 = -x 6 detV Y 3 1 Z3 -3 det x3 1 Y3 L4 1 Z4 X4 1 Z4 X4 1 Y4 where L1 is the volume coordinate for a tetrahedron defined in (3.13), det indicates the value of the determinant of the matrix and xiy, i, zi denote the coordinates of the ith vertex. The basis functions with consistently linear interpolation in the tetrahedron can now be rewritten in more convenient notation as Nk Wij LVLj, i, j =- 1,..., 4,i = j (3.31) Still higher order basis functions are sometimes necessary for rapidly changing fields or for modeling extremely thin structures where linear interpolation for the highly stretched elements is not enough. The second order edge basis (O(r15)) for a tetrahedral element was first presented by Lee, Sun and Cendes [30]. We need 20 degrees of freedom to achieve a quadratic approximation of the vector field inside a tetrahedron (see Figure 3.8). Accordingly, the field within a tetrahedron can be written as 4 4 4 E -= E'EiLVLj + E (FiLiLjVLk + F2LiLkVLj) (3.32) i=l j=l i=l isja where i, j, k form cyclic indices. The facet variables F[ and F\ are common unknowns

40 2 1 I 44 E2 <F 1 3 112 4 E2 4 Figure 3.8: Tetrahedral element for two tetrahedra that share the same face. Even higher order edge-based elements up to polynomial order 2 can be constructed. Each tetrahedral element now has 30 unknowns - 3 along each edge and 3 on each face. Other elements Recently, Wang and Ida proposed a systematic method for the construction of curvilinear elements in [31]. The vector shape function is expressed in the following form: Ni(r) -= i(~,, t()vi(r), i 1,..., M (3.33) where q~(g, r/, () are completely defined in the local coordinate system, vi contains the edge and facet information and M denotes the number of degrees of freedom in the element. These basis functions usually lead to a symmetric system of equations. However, it is difficult to find commercial mesh generation packages which construct curvilinear elements for a wide range of geometrical configurations. Hierarchical vector elements Finite elements are said to be hierarchical when the basis functions for an element

41 are a subset of the basis functions for anNy elment of higher order [10]. The basis functions described in [32] are hierarchical and tangentially continuous. Vector elements complete upto polynomials of order 2 are available and basis functions of a given order are fully compatible to be used with basis functions of lower or higher orders. Thus elements of different orders could be used in the same mesh - lower order elements could be used in regions where field variation is uniform and higher order elements employed in regions where the field varies rapidly. The implementation of hierarchical vector elements can be a bit tricky, especially at the transition boundaries where elements of one order merge into the elements of another order. If several vector elements share an edge, the field tangent to the edge can be made identical in each of the tetrahedra. This is done by setting the coefficient of the corresponding basis function for the edge in all tetrahedra to be identical. For tangential continuity across a face, the same equality must be enforced between the coefficients of all the edge and facet functions associated with the face. Table 3.5 gives the basis functions for hierarchical vector finite elements. Higher order basis functions are constructed by systematically adding the extra terms upto the desired order. It should be notedthat the bases for the tetrahedron with 6 and 20 unknowns shown in Table 3.5 is identical to the linear and second order edge basis given in (3.23) and (3.33), respectively.

42 Element type Polynomial Unknowns Basis function order per element Edge 0.5 6 LiVLj-LjVLi Edge 1 12 V (L-Lj) Face 1.5 20 LkLjVLi Face LkLiVLj Edge 2 30 V [LiLj (Li - L)] Face V [LiLjLk] Table 3.5: Hierarchical basis functions for tetrahedron

CHAPTER IV Vector finite elements for 3D electromagnetic problems Finite elements have been used extensively to model open and closed domain electromagnetic problems in scalar form in two and three dimensions[1, 33, 34]. But a reliable full vector formulation proved to be extremely difficult to implement. The cause of the problem was found to be the traditional nodal basis functions that were being used to discretize the unknown field variable. The reasons for the failure of node-based elements in modeling the vector wave equation will be discussed in a later section. Fortunately, a novel remedy was found by assigning the degrees of freedom to the edges rather than to the nodes of elements. These types of elements had been described by Whitney[17] in terms of geometrical forms about 35 years back and were revived by Nedelec [18] in 1980. In recent years, Bossavit [12] and others [19, 20, 21, 22] applied these edge-based finite elements successfully to model three dimensional problems. In all these works, edge elements were seen to be devoid of the shortcomings commonly experienced with node-based elements. The goal of this thesis was to develop a general purpose code for computing the scattering pattern of three dimensional composite structures having complex shapes. Edge based functions with their robustness in modeling general three dimensional 43

44 problems were evidently our choice. Since the only simple shape to model an arbitrary three dimensional space is a tetrahedron, we settled on edge-based tetrahedra as our mesh discretization units. The mesh truncation was chosen to be done with absorbing boundary conditions (ABCs) which are local in nature, preserve the sparsity of the finite element system and permit scalability to large problems with minimal storage -O(N)- and computational time - O(k * N), k << N - penalties. In the first part of this chapter, we present the weighted residual or weak formulation for the closed domain problem and solve for the eigenvalues of a metallic cavity having arbitrary shape. In passing, we briefly describe the problem of spurious solutions encountered with node-based elements. We also validate our methodology by comparing the computed eigenvalues with analytically derived ones. In the latter part of the chapter (Part II), we formulate the open domain problem in terms of the variational functional, describe the enforcement of boundary conditions for perfectly conducting and composite targets and present the proof of the mesh termination condition in detail. We then validate our solution by comparison with measured or analytically derived data. PART I: CLOSED DOMAIN PROBLEM Solving Maxwell's equations for the resonances of a closed cavity is important in understanding and controlling the operation of many devices, including particle accelerators, microwave filters, microwave ovens and optical fibers. However, the exact eigenvalues can be obtained only for simple geometries. For arbitrarily shaped cavities, numerical techniques like the finite element method must be used, but the occurrence of spurious modes [15] in the node-based finite element approach has plagued the computation of their eigenvalues. This difficulty can be circumvented

45 with the introduction of a penalty term [35] to render the finite element vector field solutions non-divergent. However, it is difficult to satisfy continuity requirements across material interfaces and treat geometries with sharp edges [36] using classical finite-elements, obtained by interpolating the nodal values of the vector field components. As mentioned in the Introduction to this chapter, edge elements, a type of vector finite elements with their degrees of freedom associated with the edges of the mesh, have been shown to be free of these shortcomings. Generally these lead to more unknowns but the higher variable count is balanced by the greater sparsity of the finite element matrix so that the computation time required to solve such a system iteratively with a given accuracy is less than the traditional approach [22]. Here we solve for the eigenvalues of an arbitrarily shaped metallic cavity using node-based and edge-based vector finite elements. The computed data are then compared with analytical results for empty and partially filled cavities. A comparison between the storage intensity and computational accuracy for edge-based rectangular bricks and tetrahedra is also presented. Finally, we compute the eigenvalues of a metallic cavity with a ridge along one of its faces. 4.1 Formulation 4.1.1 Finite element equations Consider a three dimensional inhomogeneous body occupying the volume V. To discretize the electric field E within this volume, we subdivide the volume into small tetrahedra or rectangular bricks, each occupying the volume Ve (e = 1, 2,..., M), where M is the total number of elements. For a numerical solution, we expand E within the eth volume element as

46 m E= E Ww (4.1) j=l where Wj are the edge-based vector basis functions, E~' denote the expansion coefficients of the basis, m represents the number of edges comprising the element and the superscript stands for the element number. On substituting this into the usual vector wave equation and upon applying Galerkin's technique, some vector identities and the divergence theorem, we obtain the weak form of Maxwell's equation R = E/ [ (V x We) (V x We) - k we. we dv -jkoZo I W. (n x H)ds (4.2) where Rt represents the weighted residual integral for element e, Se denotes the surface enclosing Ve, n is the outward unit vector normal to 5e, ZO is the freespace intrinsic impedance and C., Pur is the material permittivity and permeability, respectively. Equation (4.2) can be conveniently written in matrix form as {Re} = [Ae]{Ee} - k2[Be] {Ee} -{Ce} (4.3) where A = 1 (V x W) * (V x W) dv (4.4) B = e et. We Wdv (4.5) C~ = jkoZo s W. (in x H)ds (4.6) le and on assembling the equations from all the elements making up the geometry, we obtain the system M M M M E {R?} = 5[As] {Ee}-k o E[Be] {Ee}- E{Ce} e=l e=l e=l e=l = {0} (4.7)

47 where all matrices and vectors following the summation sign have been augmented using global numbers. Due to the continuity of tangential H at the interface between two dielectrics, an element face lying inside the body does not contribute to the last term of (4.7) in the final assembly of the element equations. As a result, the last term of (4.7) reduces to a column vector containing the surface integral of the tangential magnetic field only over the outer surface of the body. In this application, the surface enclosing the volume of the body V is perfectly conducting and, thus, the coefficients associated with the edges bordering the perfectly conducting surface can be set to zero a priori. This reduces the original unknown count and eliminates the need to generate equations for those edges/unknowns which would have otherwise involved the column vector {Ce}. Also since {Ce} is only associated with boundary edges, the surface integral associated with it vanishes and (4.7) can be written as [A] {E} = A [B] {E} (4.8) where [A] and [B] are N x N symmetric, sparse matrices with N being the total number of edges resulting from the subdivision of the body excluding the edges on the boundary, {E} is a N x 1 column vector denoting the edge fields and A = ko gives the eigenvalues of the system. A solution of (4.8) will yield the resonant field distribution {E} and the corresponding wavenumber ko. 4.1.2 Origin of spurious solutions Conventional finite element basis functions give rise to spurious solutions when (4.8) is solved. As Wong and Cendes points out in [37], the origin of these spurious solutions lies in the infinitely degenerate eigenvalue k = 0 in the spectrum of (4.8). Given the eigenvalue system in 4.8 along with the PEC boundary condition n x E = 0

48 on the boundary, there exists an infinite number of scalar functions such that &' = 0 on the boundary. Then E = -Vt, is a permitted eigenfunction corresponding to the eigenvalue k = 0. If the discretization scheme fails to model this infinite dimensional nullspace of the curl operator exactly, spurious solutions to the eigenvalue problem will appear. One way to get rid of spurious modes is to formulate the eigenvalue problem such that k = 0 is no longer a permissible eigenvalue. This is achieved by enforcing V E = 0 (4.9) exactly everywhere in the solution region. Then the only solution corresponding to the k = 0 eigenvalue is the trivial one E = 0. This is also the reason why spurious solutions do not occur when the Helmholtz equation is discretized. In finite elements, solving a problem (4.8) along with a constraint (4.9) is well known [10]. Researchers have mostly tried the penalty function approach of constrained minimization [35, 36] since it is simple to implement. However, the penalty approach is a mere fix and not a cure for the problem. Since the spurious eigenmodes are now shifted far into the visible spectrum, they are not completely eliminated and are dependent on an user-defined parameter which specifies how strongly the divergenceless condition is to be imposed. Other than the penalty method, derivative continuous finite elements (C1 elements) have also been proposed [37] to alleviate this problem. In this method, an auxiliary vector field C is introduced such that E=Vx( (4.10) Since substitution of (4.10) for E into (4.8) results in second derivatives, we need to construct first derivative continuous elements or C1 elements. As shown in [37],

49 discretization of E using node-based C1 elements eliminates the problem of spurious solutions since the nullspace of the curl operator is modeled exactly. However, C1 elements are not commonly found in finite elements and need to be explicitly derived for the problem at hand. Another method of eliminating spurious modes, without getting rid of the eigenvalue k = 0, is by using edge elements [38]. Bossavit in [38] provides a mathematical proof as to why spurious modes do not appear with edge based elements and why they are likely to be present with node-based vectorial elements. However, special node-based elements, like the C1 element in [37], do not present this problem. Thus the root cause of spurious modes appears to be the improper modeling of the nullspace of the curl operator. Any basis function which approximates it correctly will be stable and free of spurious modes. As it turns out, conventional Lagrangian finite elements are unsuitable; either C1 node-based elements or edge-based elements of any order can be used to obtain the true solutions. 4.1.3 Basis functions Edge no. i! i2 1 1 2 2 1 3 3 1 4 4 2 3 5 4 2 6 3 4 Table 4.1: TETRAHEDRON EDGE DEFINITION

50 The vector edge-based expansion functions for rectangular bricks were presented in [28]. Vector fields within tetrahedral domains can be conveniently represented by expansion functions that are linear in the spatial variables and have either zero divergence or zero curl. The basis functions defined in [22] are associated with the six edges of the tetrahedron and have zero divergence and constant curl. To define them, let us assume that i1 and i2 are the terminal nodes of the ith edge and the six edges of a tetrahedron are numbered according to Table 4.1. The vector basis function associated with the (7 - i)th edge of the tetrahedron is then given by f7-i + g7-i x r, r in the tetrahedron W' 7-/ (4.11) 0, otherwise with b7-i f7-i = 6Ve ril x ri2 (4.12) g7-i = b7e (4.13) in which i = 1,2,..., 6, Ve is the volume of the tetrahedral element, e- = (r2 - ri1 )/bi is the unit vector of the ith edge and bi = Iri2 - ril | is the length of the ith edge with ri1 and ri2 denoting the location of the il and i2 nodes. In general, the implementation of the above discretization will involve two numbering systems, and thus some unique global edge direction must be defined to ensure the continuity of n x E across all edges [39]. Here we choose this direction to be coincident with the edge vector pointing from the smaller to the larger global node number. Finally, since V - Wt = 0, the electric field obtained from a solution of (4.3) satisfies the divergence equation within each element and, thus, the solution will be free from contamination due to spurious solutions.

51 Mode Analytical Computed Computed Error (%) Error (%) (bricks) (tetra.) (bricks) (tetra.) 270 260 unknowns unknowns TE1lo 5.236 5.307 5.213 -1.36.44 TM11o 7.025 7.182 6.977 -2.23.70 TEoll 7.531 7.725 7.474 -2.58 1.00 TE201 7.767 7.573 -3.13 -.56 TM111 8.179 8.350 7.991 -2.09 2.29 TE111 8.350 8.122 -2.09.70 TM210 8.886 9.151 8.572 -2.98 3.53 TE102 8.947 9.428 8.795 -5.38 1.70 Table 4.2: EIGENVALUES (ko, CM-'1) FOR AN EMPTY 1CM x 0.5CM x 0.75CM RECTANGULAR CAVITY 4.2 Results In Table 4.2, we present a comparison of the percentage error in the computation of eigenvalues for a 1cm x.5cm x.75cm rectangular cavity using edge-based rectangular bricks and tetrahedra. The edge-based approach using tetrahedral elements predicts the first six distinct non-trivial eigenvalues with less than 4 percent error and is seen to provide better accuracy than rectangular brick elements. The maximum edge length for the rectangular brick elements was.15cm whereas that for the tetrahedral elements was.2cm. To investigate this matter further, we considered a cubical metallic cavity having a side length of.5cm. A plot of the percentage error in calculating the first three degenerate resonant frequencies versus the number of unknowns is given in Figure 4.1 for both rectangular bricks and tetrahedral elements. It is clear in this example that the tetrahedral elements predict the eigenvalues with greater accuracy than the rectangular bricks. In Tables 4.3, 4.5 and 4.4, we compare the exact eigenvalues with those computed using edge-based tetrahedral finite elements. The finite element mesh was generated

52 1.51.25-', " Tetrahedron 1: \ | Brick o0.75- \ 0.50.25 - 0 500 1000 1500 2000 Number of unknowns Figure 4.1: Performance comparison of rectangular bricks and tetrahedrals. using SDRC I-DEAS, a commercial pre-processing package and it is seen that the numerical results are in good agreement with the exact values for both homogeneous and inhomogeneous cavities. The exact eigenvalues of the half-filled cavity as described in Table 4.3 are computed by solving the transcendental equation obtained upon matching the tangential electric and magnetic fields at the air-dielectric interface. As seen, these results agree with those predicted by the finite element solution to within 1 percent (no symmetry was assumed in this solution). Similar comparisons are given in Table 4.4 for a sphere having 1cm radius. Finally, Table 4.6 presents the eigenvalues of the geometry illustrated in Figure 4.2. This is a closed metallic cavity with a ridge along one of its faces. It is noted that as the degeneracy of the eigenvalues increases, the matrix becomes increasingly ill-conditioned and the numerical solution is correspondingly less accurate [40]. This is clearly observed from the data in Table 4.4 for the case of a

53 Mode Analytical Computed Error (%) 192 unknowns TEzlol 3.538 3.534.11 TEz201 5.445 5.440.10 TEzl02 5.935 5.916.32 TEz301 7.503 7.501.04 TEz202 7.633 7.560.97 TEz103 8.096 8.056.50 Table 4.3: Eigenvalues (ko,cm-1) for a Half-Filled 1cm x O.lcm x 1cm Rectangular Cavity Having a Dielectric Filling of er = 2 Extending from z = 0.5cm to z = 1.0cm. ~ ~ 1.0 cm -0.4 cm — < —0.4 cm — Figure 4.2: Geometry of ridged cavity perfectly conducting hollow spherical cavity. Since the second lowest TM mode has five-fold degeneracy, the computational error is seen to be the greatest. However, for the partially filled rectangular cavity, the absence of degenerate modes gives results which are accurate to within 1 percent of the exact eigensolutions. We finally remark of the inherent presence of zero eigenvalues in our computations whose number is equal to the internal nodes. These zero eigenvalues are easily identifiable and since they do not correspond to physical modes, they were always discarded.

54 Mode Analytical Computed Error (%) 300 unknowns TMolo 2.744 2.799 -2.04 TM11i,even 2.802 -2.11 TMlll,odd 2.811 -2.44 TMo21 3.870 3.948 -2.02 TMi21,even 3.986 -2.99 TM121,odd 3.994 -3.20 TM221,even 4.038 -4.34 TM221,odd 4.048 -4.59 TEo11 4.493 4.433 1.33 TE11i,even 4.472.47 TE11l,odd 4.549 -1.25 Table 4.4: Eigenvalues (ko, cm-1) for an empty spherical cavity of radius 1cm 4.3 Conclusions It was shown that the resonant frequencies of an arbitrarily shaped inhomogeneously filled metallic resonator can be computed very accurately via the finite element method using edge-based tetrahedral elements. The same method in conjunction with node-based elements is much less reliable and not readily applicable to regions containing discontinuous boundaries in shape and material. Edge-based rectangular bricks do not provide as good an accuracy as edge-based tetrahedral elements and their use is further limited to a special class of geometries. PART II: OPEN DOMAIN PROBLEM Of generic interest in electromagnetic scattering is the modeling of composite configurations comprised of metallic and non-metallic sections. In the case of man-made structures, abrupt material discontinuities and metallic corners are also encountered along with resistive sheets and thin ferrite coatings intended for controlling the scat

55 Mode no. Analytical Computed Error(%) TM 010 4.810 4.809.02 TE 111 7.283 7.202 1.1 7.288 -.07 TM 110 7.650 7.633.22 7.724 -.97 TM 011 7.840 7.940 -1.28 TE 211 8.658 8.697 -.45 8.865 -2.39 Table 4.5: Eigenvalues for an empty cylindrical cavity of base radius 0.5cm and height 0.5 cm (380 unknowns) terer's radar cross-section (RCS). Differential equation methods, especially the finite element method (FEM), with its capability of handling arbitrary geometries and its versatility in modeling inhomogeneities and material discontinuities has been a viable solution approach for bounded domain problems. However, for unbounded problems as is the case with electromagnetic scattering, the solution is more involved since the finite element mesh needs to be truncated artificially at some distance from the object with a suitable boundary condition. These boundary conditions can be either global or local. Global boundary conditions are exact but lead to fully populated submatrices thus spoiling the sparse, banded structure of the finite element system. Further, problems due to internal resonances may arise in many cases [41]. In contrast, local conditions such as the absorbing boundary conditions (ABCs), are approximate but have the important advantage of retaining the sparsity of the matrix system. Also, they are free from the interior resonance problem that plagues boundary integral termination schemes [41]. ABCs are essentially differential equations enforced at the mesh truncation boundary and are chosen to suppress non-physical reflections from that boundary, thus ensuring the outgoing nature of the waves.

56 No. (a) (b) 1 4.941 4.999 2 7.284 7.354 3 7.691 7.832 4 7.855 7.942 5 8.016 7.959 6 8.593 8.650 7 8.906 8.916 8 9.163 9.103 9 9.679 9.757 10 9.837 9.927 Table 4.6: Ten lowest non-trivial eigenvalues (ko, cm-1) for the geometry drawn in Figure 2: (a) 267 Unknowns; (b) 671 Unknowns A variety of ABCs have been derived and widely employed in FEM solutions of open region two-dimensional scattering problems. However, the method's implementation and performance for scattering by three dimensional geometries using edgebased finite elements has not received similar attention. The only three-dimensional implementations of the FEM for scattering has been a hybrid solution combined with the boundary element method (BEM) [39, 28, 42] and those formulations combined with ABCs [43, 44]. The boundary element method, though exact, is equivalent to employing a global boundary condition for terminating the mesh and consequently leads to a full submatrix, restricting the method's utility to small geometries. For large-scale three-dimensional applications, it is necessary to employ an ABC for terminating the mesh to retain the 0(N) storage requirement, characteristic of the finite element method. However, the use of traditional node-based elements suffers from various difficulties as mentioned in the Part I. To avoid these difficulties, we consider an implementation of the FEM using vector basis functions whose degrees of freedom are associated with the fields along the six edges of a tetrahedron. Our implementation is further coupled with a mesh

57 termination scheme based on the vector ABCs derived in [45, 46]. In contrast to the implementation proposed in [47], the one presented here preserves the symmetry of the finite element system, thus being computationally more efficient and making it ideally suited for solution via a conjugate gradient type of algorithm. Further, the implementation discussed in [47] requires that the absorbing boundary be placed nearly a wavelength away from the scatterer, whereas in our implementation remarkably accurate results are obtained with the ABCs enforced a small fraction of a wavelength from the scattering body. This is probably due to the accuracy of the second order ABCs derived in [46]. 4.4 Formulation In the following section, the open domain problem is formulated in terms of the finite element functional. The final system is obtained by setting the first variation in the functional to zero and then a Rayleigh-Ritz minimization is performed to arrive at the final answer. 4.4.1 Derivation of finite element equations Let us consider the problem of scattering by an inhomogeneous target associated with possible material discontinuities. To solve for the scattered fields via the FEM, it is necessary to enclose the scatterer- embedded inside the volume V- by an artificial surface SO on which the ABC is enforced (see Figure 4.3). The ABCs to be considered in this chapter are the Sommerfeld radiation condition given by n x V x Es = -jkon x n x Es (4.14) and the second-order ABC [46] which can be written as fix V x E c= E + V x [nfi(V x E8),] +/3Vt(V. E~) (4.15)

58 S.. 0 V t J *^''A''''A''' *........................:..:::.:.::.::.,..............................":...,... ~'' "''~''.-.-.-....-............................................... ^........'* ~''*. *.-/*'*........'........................'..* s- **. Figure 4.3: Illustration of scattering structure Vd enclosed by an artificial mesh termination surface, So, on which the absorbing boundary condition is imposed. where a = jk,/3 = 1/(2jk + 2/r), Es represents the scattered electric field, fi is the unit normal to the surface So and the subscripts t and n denote the transverse and normal component to So, respectively. When these ABCs are employed on the artificial boundary S., they annihilate all field terms of O(rd2m+l) ) and smaller, where m denotes the order of the ABC. The ABCs outlined above were derived for spherical surfaces [46] but in this work we have extended their application to So in (4.15). For flat sections, the 1/r term, therefore, reduces to zero. This permits the construction of termination boundaries conformal to the scatterer, thus reducing the size of the the computational domain. A detailed derivation of (4.15) as well as other more general ABCs are outlined in Chapter 6. The vector ABCs (4.14) and (4.15) can be combined and more conveniently writ

59 ten as n x V x E = P(Es) (4.16) for the scattered field formulation in which Es is the working variable and nixVxE = P(E) +U (4.17) for the total field formulation where the unknown is the total electric field. In (4.17), U i = n x V x E'in - P (Einc) (4.18) where E = E' + Einc is the total field and Einc is the incident electric field. Considering (4.17) to be the boundary condition employed at SO, we can express the functional for the total electric field as F(E) = (v x E) (V x E) -k2erE E] dV + j [E P(E) + 2E Uinc] dS (4.19) where er and [r are the relative permittivity and permeability, respectively. The above functional can be generalized to account for the presence of impedance and resistive sheets or other discontinuous boundaries. In the case of a resistive card, the transition condition [48] n x ( x E)=-Rn x (H+-H-) (4.20) must be enforced, where HA denotes the total magnetic field above and below the sheet, R is the resistivity in Ohms per square and n is the unit normal to the sheet pointing in the upward direction (+ side). For an impenetrable impedance surface, the appropriate boundary condition on that surface is fin x (fin x E) -7-n x H (4.21)

60 where n is the unit normal to the surface and i7 is the surface impedance. Taking into consideration these boundary/transition conditions, the functional for thile total electric field can be more explicitly written as F(E) = 1 [~(V x E) (V x E) - k2crE E] dV +jkoZo Ij (n' x E) (n x E) dS + /[E * P(E) + 2E * Uinc] dS (4.22) where K is the surface resistivity (R) when integrating over a resistive card and equals the surface impedance (r/) for an impedance sheet. In order to deal with anisotropic scatterers, the functional outlined in (4.22) undergoes a slight modification since the material properties of the scatterer (permeability and permittivity) are now second rank tensors rather than scalars. Equation (4.22) can therefore be written as F(E) = Vx E) {-l(V x E)}-k2E. {e.E} dV +jkoZo I E) (n x E ) (n x E) dS + [E P(E) + 2E * Uinc] dS (4.23) where [xx Yxy [xz = #uP P lyy I yz (4.24) LIzx Yzy [zz and exx exy Cxz E= syx ~yy y(4.25) 6zx Czy Czz

61 The symmetry of the final system of equations now depends on the symmetry of the permeability and the permittivity tensors. The formulation presented above is in terms of the total field but we can easily revert to a scattered field formulation by setting Es = E - Einc and noting that the scattered field satisfies the wave equation inside the domain of interest. Let us consider the case where the computational volume V is occupied by a dielectric structure and is bounded internally by the surface of a perfect conductor and externally by the mesh termination boundary. On examining the terms inside the volume integral in (4.19), we can define J [ (VxE)- (VxE)-kc6rE E] dV = G(E,E) (4.26) Expressing the above relation in terms of the incident and the scattered fields, we have G(E, E) = G(E8, ES) + 2G(ES, Einc) + G(Ein, Einc) (4.27) The first and the third terms on the RHS of (4.27) cannot be simplified any further. The second term will, however, lend itself to more simplification. Making use of a simple vector identity and the divergence theorem, we can rewrite G(ES, Einc) as G(EsEinc) = Es * [Vx VxEinc - k2rEinc dV -/L Es. (n x VxEic) dS (4.28) since [-(VxE) (VxEinc dV = I Es. [Vx-VxEnc] dV - J Es (n x VxEinc) dS (4.29) JV ^~~~/r JS r /

62 and the surface integral cancels out everywhere inside the computational domain except on the mesh termination boundary So. If we define Vd to be the volume occupied by dielectric materials, then the remaining volume (Vo = V - Vd) is the volume occupied by free space. On incorporating this into (4.28), we have G(EsEinc) = J Es. [VxVxEinc - k2Einc] dV + Es Vx-VxEn c- krEin] dV Es (n x VxEinc) dS (4.30) Since the incident electric field satisfies the wave equation in free space, the first term of (4.30) is identically zero. The third term cancels exactly with the cross term Ifs E Uinc dS in the total field functional (4.19). The second term can be simplified by employing a standard vector identity and the divergence theorem to yield Es [VX-VxEinc - k2CxrEinC] dV = r~d [ Pr 1 ] -(VxE8) (VxEinc) - k6rEs. Ec dV +jkoZo L -Es (n x Hinc) dS (4.31) where the normal to Sd is directed away from Vd. The surface integral over the dielectric interface Sd occurs since the tangential component of the scattered electric field is discontinuous over the interface between two dielectrics having dissimilar permeabilities. It should be noted that (4.31) holds good even when there are multiple dielectric regions present. If the dielectric regions have the same permeability (Prl = iLr2 =... [rn = 1, for example) and different permittivities, the surface integral contribution over the dielectric interfaces -Sd1,..., Sdn- is zero. If different permeability values are also present, then the permeability values must be substituted into the element equations and the direction of the normal for the two elements on

63 the interface should take care of the respective signs. Therefore, G(ES, Einc) reduces to G(Es,Einc) = (VxE) ( VxE V in) - k2Es E inc dV +jkoZo i-Es - (n x Hinc) dS d ~tr -/ Es' (n x VxEinc) dS (4.32) The impedance and resistive sheet boundary conditions can be incorporated in a similar way into the scattered field functional. After simplification, the functional F(Es) for the scattered field is then given by F(ES) = I l (v x Es). (V x E) - koerEs E] dV +jkoZo s x (n x E) (n x E) dS + E5 * P(Es)dS +2jkoZo j -E (n x Hinc) dS +2 [-(V x Es). (V x Eic)- koE5 s Enc] dV +2jkoZo K (n x E) (n x Ec)dS +f (Einc) (4.33) where Vd is the volume occupied by the dielectric (portion of V where EC,. or Pra are not unity), Sd encompasses all dielectric interface surfaces and E. P(E5)dS = j [(Es)2 + #(V x Es )2 - 3(V. Et)2] dS when the second order ABC is employed. The function f (Ein) is solely in terms of the incident electric field and vanishes when we take the first variation of F(E5). We remark that the scattered field formulation was implemented in our code.

64 4.5 Finite element discretization To discretize the functional given in (4.33), the volume V is subdivided into a number of small tetrahedra, each occupying the volume ye (e = 1,2, - -, AM), where M denotes the total number of tetrahedral elements. Within each element, the scattered electric field is expressed as m E= EW {W }T{E } = {Ee}T{We} (434) j=l where Wq are the edge-based vector basis functions [22], Eje denote the expansion coefficients of the basis and represent the field components tangential to the jth edge of the eth element, m is the number of edges making up the element and the superscript stands for the element number. The basis functions used in our implementation have zero divergence and constant curl. The system of equations to be solved for Ej is obtained by a Rayleigh-Ritz procedure which amounts to differentiating F with respect to each edge field and then setting it to zero. On substituting (4.34) into (4.33), taking the first variation in F and assembling all M elements, we obtain the following augmented system of equations f aF M Ms Mp - Ee- EB[A]{E}Es + [B]{ E + C} = 0 (4-35) J e=l s=1 p=l In this, Ms denotes the number of triangular surface elements on Sk and So and Mp is equal to the sum of the surface elements on Sk, Sd and the volume elements in Vd. The elements of the matrices [Ae], [B3] and {CP} are given by =Ak = | [ (V x Wj)-koeW. Wj] dV

65 +/ [aw: * W;t +:x(V x w)n (v x wX > - /3(V * W,')(V Wt)] dS Cf = 2jkoZo [F -WP (n x Hinc)dS + j -(n x WP) (n x Enc)dS +2 -((v x WP). (V x Einc) - kCrWP Efc1] dV where Vd is the volume of the pth tetrahedron inside the dielectric, SS and SP represent the surface area of the sth and pth triangular surface element and the subscripts t and n denote the tangential and normal components of a vector, respectively. The boundary condition n x Es = -f x E Ei must be imposed a priori on metallic boundaries; however, no special treatment is required at material discontinuities. Only the identification of the edges on material discontinuities or inhomogeneities is required to kick in the contribution from the surface integrals in (4.33). The biconjugate gradient algorithm with diagonal preconditioning was used to solve the sparse, symmetric system of equations. The residual norm was usually set to less than 0.1% of the solution norm as a criterion for convergence since lower tolerances did not appear to offer significant improvement on the far-field values. The data structure was constructed such that only the non-zero elements of the upper triangular part of the symmetric, sparse matrix were stored in a N, x k complex array. In our case, Na was typically 1.1 x N., where N, denotes the number of unknowns and k was equal to 12. The corresponding addresses were stored in a separate Na x k integer array. The storage required in this scheme was about 25Nu and the number of distinct non-zero elements was typically 9N,. 4.6 Results A computer program was written for implementing the proposed FE-ABC formulation. This implementation was validated by computing the scattering for several

66 1510 5 @00. c 0 3 6 91 ". -5 \.~~: \ / ^ ~FE-ABC(o=090) -ing resistive and impedance boundary conditions. Figure 4.4 compares the measured -25 - -,>, i-'~' f''l —'' i-, -, -a-, -, 0 30 60 90 120 150 180 Observation Angle 9~, deg. Figure 4.4: Bistatic echo-area of a perfectly conducting cube having edge length of 0.755A. Plane wave incident from 0 = 180~; ( = 90~. configurations including metallic and dielectric bodies as well as structures satisfying resistive and impedance boundary conditions. Figure 4.4 compares the measured [49] bistatic cross-section (fnc = 180~, oinc = 90~) of a metallic cube having an edge length of 0.755A with the corresponding pattern computed by the three-dimensional FE-ABC code. The second-order vector ABC was employed on a spherical mesh truncation boundary which was placed only 0.1A from the edge of the cube. About 33,000 unknowns were used for the discretization of the computational domain and the [A] matrix contained a total of 264,000 distinct non-zero entries. The storage requirement of this matrix was consequently much smaller than that of the 1400 unknown moment method system (assuming the same sampling rate as the FEM of 14 points/A) which had 2 million non-zero entries. In Figure 4.5, we plot the normal incidence backscatter RCS of a perfectly con

67 30 -- 20- 10-'Z 0 - b X FE-ABC' d [ i Measured.............::: i, -30 --- Edge length in wavelengths Figure 4.5: Backscatter RCS of a perfectly conducting cube at normal incidence as a function of edge length ducting cube as a function of its edge length. The meshes constructed for this mesh termination boundary. Each iteration took approximately 0.1 seconds on a smCray YMP after vect(more izatiohan 0.15A) from the scaverage it was seefound the agreement with25,000,

68 10 5 o.ob -10 -...............H4=0.(BOR):::::::::::::::::: O HH=O (FE-ABC) 0 30 60 90 Observation Angle o, deg. Figure 4.6: Backscatter pattern of a perfectly conducting cylinder having a radius of 0.3A and a height of 0.6A. The solid and the dashed lines indicate data obtained from a body of revolution code and the black and white dots indicate FE-ABC data. the number of required iterations were approximately N/100. The agreement was quite good even on enclosing the metallic cylinder with a rectangular outer boundary placed 0.3A from the edge of the scatterer. The results presented till now have been for perfectly conducting geometries. However, the real advantage of the FEM over integral equation techniques is the ease with which the former can handle material inhomogeneities and transition conditions. With this in mind, the remaining figures show backscatter and bistatic patterns for scatterers comprised of resistive cards, dielectric material and combinations of these. The first geometry that we tested was that of a homogeneous dielectric sphere having a relative permittivity of 4 and a radius of 1/27r. The bistatic pattern of the geometry is compared to that obtained using a CG-FFT formulation (Figure 4.7) and is seen The results presented ~~~~~~~~~tilnwhaebefopreclcndtigemtiese. However, the real advantage of the FEM overinerlqutotchies iste eas

69 -10 -20- z -< - ~ Y -50 - -40 - |.................. FE-ABC I |l......... l.* CG-FFT -50- -'' i'' l'' l'' l''l —' -' 0 30 60 90 120 150 180 Observation Angle 6e0, deg. Figure 4.7: Bistatic echo-area of a homogeneous dielectric sphere (c. = 4; koa -=1). to agree remarkably well. The finite element mesh was terminated only 0.3A from the dielectric body. Another of the test cases was a prolate spheroid shown in Figure 4.8 filled with lossy dielectric having or = 4-jl, koa = 2r/2 and a/b = 2, where a and b are the major and minor axes of the spheroid, respectively. The bistatic pattern (inc = 180~; qZfc = 90~) obtained from the FE-ABC solution agree reasonably well with those obtained via the hybrid finite element-boundary integral method presented in [39]. However, the corresponding convergence rate for non-metallic bodies and resistive/impedance sheets was found to be slower than that observed for metallic scatterers. A diagonal preconditioner was, therefore, used to accelerate the convergence of the biconjugate gradient algorithm with encouraging results. For our last example, we compute the scattering from an inhomogeneous geometry with embedded resistive cards. Particularly, the scatterer shown in Figure 4.9

70 — \ _....-: ~ n.::.....:;-...... \.............................iiii.ii....... 1.5 x'.ijii!iii:iiiilllllil!ll!illllilili" x I~g~' - -.: -".............. b ^- \ I -- FE-ABC D ~ OFE-BI 0.5- \^ 0 30 60 90 120 150 180 Observation Angle 00, deg. Figure 4.8: Normalized bistatic pattern of a lossy prolate spheroid (c, = 4-jl; koa = 7r/2;a/b = 2), where a and b are the major and minor axes of the spheroid, respectively consists of an air-filled resistive card block (0.5A x 0.5A x 0.25A) joined to a metallic block (0.5A x 0.5A x 0.25A). In Figure 4.10, we compare a principal plane backscatter pattern obtained from our 3D FE-ABC implementation with data computed using a traditional moment method code [50] for both polarizations. The computed data is again seen to follow the reference data closely. For the FE-ABC solution, the scatterer was enclosed within a cubical outer boundary placed only 0.3A away from the scatterer. This resulted in a 30,000 unknown system which converged to the solution in about 400 iterations when using the Sommerfeld radiation condition and in 1600 iterations when the second order ABC was used. For this geometry, the second order ABC did not provide a significant improvement in accuracy (only about 0.1dB) over the first order condition. The same case was run with a higher discretization resulting in a system of 50,000 unknowns; however, there was no significant difference in

71 z Y x^^ i —- a -; — sessurfac e having R = the far-field values with the earlier case. The geometry for the backscatter pattern, *'".". a ii' 1 " i'.......... i... Figure 4.9: Geometry of cube (a b -0.5A) consisting of a metallic section and a dielectric section (c, = 2-j2), where the latter is bounded by a resistive surface having R- Zo. shown in Figure 4.11 is the same as in Figure 4.9 with the air-filled section now occupied by a lossy dielectric having 6r = 2 - j2. The backscatter echo-area pattern for the qx polarization as computed by our FE-ABC code is again seen to be in good agreement with corresponding moment method data [50]. 4.7 Conclusions In this chapter, we have shown that the finite element technique with vector basis functions, when coupled with ABCs for mesh termination and the biconjugate gradient algorithm for the solution of the resulting system, is a viable procedure for computing the scattering by three-dimensional targets. We have found that these ABCs can be enforced only a small fraction of a wavelength from the scatterer's surface. This is probably due to the fast (1/r) decay of the scattered fields. As a result, in addition to the sparsity of the matrix, the total number of unknowns

72 5-5 o 0l -30: %'bS^%.0 -. l -10 6 0 00~ /0 < -15, -20- %'I FE-ABC (E.,) ~~~~~: ~~~* MoM -25 - FE-ABC (E ). - OMoM -300 30 60 90 120 150 180 Observation Angle 6~, deg. Figure 4.10: RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The lower half of the cube is metallic while the upper half is airfilled with a resistive card draped over it. is kept under control. Further, due to the use of edge elements, the program can easily handle sharp conducting edges and tips, inhomogeneous dielectric and/or magnetic materials, resistive sheets and impedance surfaces. These, in conjunction with the well-known advantages of the finite element method, result in low O(N) storage requirement, making the computation of large body scattering possible. These capabilities along with the ease in modeling arbitrary geometries, makes this formulation, to the best of our knowledge, one of the first suitable for solving practical three-dimensional scattering problems.

73 10' O -10 -20 0 30 60 90 120 150 180 Observation Angle ~po, deg. Figure 4.11: RCS pattern in the x - z plane for the composite cube shown in Figure 4.9. The composition of the cube is the same as in Figure 4.10, except that the air-filled portion is filled with dielectric. The solid curve is the FEMATS pattern and the black dots are MoM data for the EnC = 0 polarization.

CHAPTER V Optimization and parallelization In the previous chapter, we laid the foundation for our methodology by outlining the formulation of the finite elment system together with the absorbing boundary condition method of mesh termination and presenting some examples to validate our solution technique. We found that the FE-ABC technique yielded accurate far-field values for small geometries, i.e., structures whose dimensions are less than a wavelength. However, our principal motivation was to compute scattering from large, three dimensional structures having arbitrary material inhomogeneities and regions satisfying impedance and/or transition conditions. As mentioned earlier, the number of unknowns escalates rapidly in three dimensions as the target size increases. Therefore, the limiting factor in dealing with three dimensional problems is the unknown count and the associated demands on storage and solution time. Solution techniques which have 0(N) storage and feasible solution times are thus the only way that the curse of dimensionality can be avoided. This is one of the principal reasons for the popularity of partial differential equation techniques over integral equation (IE) approaches which lead to dense matrices and 0(N2) storage. As the problem size increases, the IE and hybrid methods, both of which need O(N'), 1 < 1 K 2, storage, quickly become unmanageable in terms of storage and solution time. 74

75 Another concern while solving problems having more than 100,000 unknowns - a scenario that can be envisioned for most practical problems - is to avoid software bottlenecks. The algorithmic complexity of any part of the program should increase at most linearly with the number of unknowns. In this chapter, the implementation details of our finite element code are presented along with the associated numerical considerations. The various trade-offs associated with the data structures used to represent sparse matrices and their impact on vectorization and parallelization are discussed. The iterative solver, a preconditioned biconjugate gradient (BCG) algorithm, is studied along with point and block preconditioning strategies and the trade-offs between the two types of preconditioners are outlined. A modified incomplete LU (ILU) preconditioner is presented, which seems to work better than the original ILU preconditioner for our matrix systems. Iterative solvers for unsymmetric matrix systems are also mentioned to handle anisotropic geometries and situations where the mesh termination condition makes the system unsymmetric. In order to facilitate the solution of large problems, the computationally intensive portions of the finite element code have been parallelized on a variety of massively parallel architectures like the KSR1 (Kendall Square Research) and the Intel iPSC/860. A full analysis of the communication patterns is also presented for the KSR1 machine. 5.1 Numerical considerations The finite element code implemented by the authors can be divided into four main modules: * Input/output * Right-hand side vector (b) generation

76 * Finite element matrix (A) generation * Linear equation solver The input to the program consists of the mesh information obtained by pre-processing the mesh file generated from SDRC I-DEAS, a commercial CAD software package. The right-hand side vector,b, is usually a sparse vector and only a small fraction of the total CPU time is required to generate it. The finite element matrix generation consists of too many subroutine calls and highly complex loops to permit any significant speedup through vectorization. It is, however, highly amenable to parallelization as will be discussed later. The most time-consuming portion of the code is the linear equation solver, taking up approximately 90% of the CPU time. On a vector computer like the Cray YMP, it is possible to vectorize only the equation solver. However, short vector lengths and indirect addressing inhibit large vector speedups. 5.2 Matrix storage and generation The matrix systems arising from I-DEAS were very sparse: on the average, the minimum number of non-zero elements per row was 9 and the maximum number of non-zeros per row was 30. The total number of non-zeros varied between 15N and 16N, where N is the number of unknowns. There are various storage schemes for sparse matrices. In this chapter, we will discuss the ITPACK format [51], the jagged diagonal format and the Compressed Sparse Row (CSR) format. Knowledge of the storage formats is important since the speed of computation on vector or parallel processors is directly linked to the data structure used for matrix storage.

77 In the ITPACK storage scheme, a sparse matrix A of order N is stored using two arrays D and PC. For example, if we have the 5 x 5 unsymmetric matrix A 30045 70402 A= 4 0 7 0 0 00800 97000 Then, according to the ITPACK scheme, the rows of the array D will contain the non-zero elements of the corresponding rows of the original matrix. The number of columns of D will be equal to the maximum number of non-zeros in a row; rows containing fewer non-zero elements will be zero padded. The array V thus looks like 345 742 =- 47 0 800 970 The column indices of the elements in V are stored in an integer array PC defined as 1 4 5 1 3 5 PC 1 3 * 3 * * 1 2 * The asterisk denotes that the corresponding elements of V are zeros. The ITPACK storage scheme is attractive for generating finite element matrices since the number

78 of comparisons required while augmenting the matrix depends only on the locality of the corresponding edge and not on the number of unknowns. Moreover, the sparse matrix-vector multiplication process can be highly vectorized because of large vector lengths when the number of non-zeros in all rows is nearly equal. However for our application, almost half the space is lost in storing zeros. As a result, a lot of storage as well as computational effort is wasted in storing and operating on zeros, respectively. The modified ITPACK scheme [52], or the method of jagged diagonals, does alleviate this problem to a certain degree by sorting the rows of the matrix by decreasing number of non-zero elements. However, 30% of the allotted space is still lost in zero padding. The best trade-off between storage and speed for our application is obtained by storing the non-zero matrix elements in a long complex vector, the column indices in a long integer vector and the number of non-zeros per row in another integer vector. This data structure is referred to as the Compressed Sparse Row (CSR) format. A similar data structure which stores the row indices instead of the column indices is called the Compressed Sparse Column (CSC) format. The CSC format is sometimes used when the matrix is to be accessed along the rows and not the columns, e.g., in the multiplication of the transpose of a sparse matrix with a vector. In our implementation, a map of the number of non-zeros for each row is obtained through a simple pre-processor. The main program stores the matrix in CSR format, thus minimizing storage and sacrificing a bit of speed. The required storage is 15N to 16N complex words plus integers for D and PC, respectively, and N integers for the array containing the pointers to the rows' data.

79 5.3 Linear equation solver In three dimensional applications, the order N of the system of linear equations may be very large. Direct solution methods usually suffer from fill-in to an extent that these large problems cannot be solved at a reasonable cost even on state-ofthe-art parallel machines. It is, therefore, essential to employ solvers whose memory requirements are a small fraction of the storage demand of the coefficient matrix. This necessitates the use of iterative algorithms instead of direct solvers to preserve the sparsity pattern of the finite element matrix. Especially attractive are iterative methods that involve the coefficient matrices only in terms of matrix-vector products with A or A'T. The most powerful iterative algorithm of this type is the conjugate gradient algorithm for solving positive definite linear systems [53]. In our implementation, the system of linear equations is solved by a variation of the CG algorithm, the biconjugate gradient (BCG) method. This scheme is usually used for solving unsymmetric systems; however, it performs equally well when applied to symmetric systems of linear equations. For symmetric matrices, BCG differs from CG in the way the inner product of the vectors are taken. The conjugate gradient squared (CGS) algorithm [54] performs best when applied to unsymmetric systems of linear equations. It is usually faster than BCG but is more unstable since the residual polynomials are merely the squared BCG polynomials and hence exhibit even more erratic behavior than the BCG residuals. Moreover, there are cases where CGS diverges, while BCG still converges. Recently, Freund [55] has proposed the quasi-minimal residual (QMR) algorithm with look-ahead for complex symmetric matrices. Based on the above, the biconjugate gradient (BCG) algorithm was found to be

80 most suitable for our implementation. The BCG requires 1 matrix-vector multiplication, 3 vector updates and 3 dot products per iteration. The solution scheme requires only three additional vectors of length N. The vector updates and the dot products can be carried out extremely fast on a vector Cray machine like the Cray YMP, reaching speeds of about 190 MFLOPS. However, the matrix-vector product, which involves indirect addressing and short vector lengths, runs at about 45.5 MFLOPS on 1 processor of the 8-processor Cray YMP. As a rule of thumb, the biconjugate gradient algorithm with no preconditioning consumes 4.06 microseconds per iteration per unknown on the Cray YMP. 5.4 Preconditioning The condition number of the system of equations usually increases with the number of unknowns. It is then desirable to precondition the coefficient matrix such that the modified system is well-conditioned and converges in significantly fewer iterations than the original system. The equivalent preconditioned system is of the form [C-] [A] {x} = [-] {b} (5.1) The non-singular preconditioning matrix C must satisfy the following conditions: 1. should be a good approximation to A. 2. should be easy to compute. 3. should be invertible in O(N) operations. The preconditioners that we discuss below are the diagonal and the ILU point and block preconditioners. Block preconditioners are usually preferable due to reduced data movement between memory level hierarchies as well as decreased number

81 of iterations required for convergence. Block algorithms are also suited for highperformance computers with multiple processors since all scalar, vector and matrix operations can be performed with a high degree of parallelism. 5.4.1 Diagonal preconditioner The simplest preconditioner that was used in our implementation was the point diagonal preconditioner. The preconditioning matrix C is a diagonal matrix which is easy to invert and has a storage requirement of N complex words, where N is the number of unknowns. The entries of C are given by Cij = b Ai, i-1,...,N; j=1,...,N (5.2) where bij is the Kronecker delta. The matrix C-1 contains the reciprocal of the diagonal elements of A. The algorithm with the diagonal preconditioner converged in about 35% of the number of iterations required for the unpreconditioned case. This suggested that our finite elment matrix was diagonally dominant since the reduction in the number of iterations was rather impressive. The diagonal preconditioner is also easily vectorizable and consumes 4.1 microseconds per iteration per unknown on the Cray YMP, a marginal slowdown over the unpreconditioned system. A more general diagonal preconditioner is the block diagonal preconditioner. The point diagonal preconditioner is a block diagonal preconditioner with block size 1. The block diagonal preconditioning matrix consists of m x m symmetric blocks as shown in Figure 5.1. The inverse of the whole matrix is simply the inverse of each individual block put together. If the preconditioning matrix C is broken up into n blocks of size m, the storage requirement for the preconditioner is at most m x N. However, this method suffers a bit from fill-in since the inverted m x m blocks are dense even though the original blocks may have been sparse. Due to this reason,

82 C1 C2 C3 CN-ll1 CN Figure 5.1: Structure of block preconditioning matrix large blocks cannot be created since the inverted blocks would lead to full matrices and take a significant fraction of the total CPU time for inversion. However, since the structure of the preconditioning matrix is known a priori, this preconditioner vectorizes well and runs at 194 MFLOPS on the Cray-YMP for a block size of 8. For a test case of 20,033 unknowns, a block size of 2 caused the maximum reduction in the number of iterations(14%) and ran at 197 MFLOPS. 5.4.2 Modified ILU preconditioner The next step was to use a better preconditioner to improve the condition number of the system resulting in faster convergence. The traditional ILU preconditioner [56] was employed with zero fill-in; however, the algorithm took a greater number of iterations than the diagonal preconditioner to converge to a specified tolerance. This was probably because the ILU preconditioned system may not have been positive

83 definite [57]. The preconditioned conjugate gradient method usually converges faster if the preconditioner is positive definite, although this is not a necessary condition. Higher values of fill-in were not attempted since the preconditioner already occupied storage space equal to that of the coefficient matrix. Algorithm 1: Modified ILU preconditioner with zero fill-in It is assumed that the data is stored in CSR format and that the column numbers for each row are sorted in increasing order. The sparse matrix is stored in the vector D and the column numbers in PC. Sl!Q(i) contains the total number of non-zeros till the ith row. The locations of the diagonal entries for each row are stored in the vector DTIA9. The preconditioner is stored in another complex vector, LU. for i=l1 step 1 until n-1 do begin lbeg=diag (i) lend=sig(i) for j=lbeg+l step 1 until lend do begin jj=pc(j) ij=srch(jj,i) if (ij.ne.O) then begin lu (ij )=lu(ij)/lu (lbeg) end end end

84 A modified version of the ILU preconditioner was next employed by eliminating the inner loop of the traditional version. The algorithm basically scales the off-diagonal elements in the lower triangular portion of the matrix by the column diagonal. Since the matrix is symmetric, it retains the LDLT form and is also positive definite if the coefficient matrix is positive definite. This preconditioner is less expensive to generate and converges in about 1/3 the number of iterations taken by the point diagonal preconditioner. It has been tested with reliable results for N < 50000. However, the time taken by the two preconditioning strategies is approximately the same since each iteration of the ILU preconditioned system is about three times more expensive. The forward and backward substitutions carried out at each iteration runs at 26.5 MFLOPS on the Cray YMP and proves to be the bottleneck since they are inherently sequential processes and the vector lengths are approximately half that of the sparse matrix-vector multiplication process. The triangular solver is also extremely difficult to parallelize. Techniques like level scheduling and self scheduling try to exploit the fine grain parallelism in the sparse system [58]. We implemented the level scheduling algorithm to examine the potential parallelism in the forward/backward substitution step. For solving any lower triangular system Lx = b, the ith unknown in the forward solution is given by Xi= (b - E lj) (5.3) If L is dense, all the components xi,..., xi- need to be computed before xi can be obtained. However, when L is sparse, most of the lijs are zero; hence, we may not need to compute all of the unknowns xj,..., xi-, before solving for xi. Level scheduling is based on this simple observation. The dependencies between the unknowns can be modeled using a graph in which node i corresponds to the unknown

85 xi and an edge from node j to node i indicates that lI $ 0 implying that the value of xj is needed for solving xi. The operation shown in (5.3) can now be rewritten as x -= l'(ib,- E y i x) (5.4) tt \ j< i:l o:O Thus xi can be solved at the kth step if all the components Xj in (5.4) have been computed in the earlier steps. In order to implement the level scheduling algorithm, we need to define the depth of a node and the level of the graph. The depth of a node is defined as the maximum distance from the root [59]. Therefore, we will place an imaginary root node with links to the nodes having no predecessors so that the depth of each node will be defined from the same point. The depth of each node can now be computed with one pass through the structure of the coefficient matrix L by 1, if li - =0 for allj < i1 depth(i) = (5.5) 1 + maxj<i{depth(j): lij $ 0}, otherwise The level of the graph can then be defined as the set of nodes with the same depth. The level scheduling algorithm can now be implemented without physically ordering the matrix, but solving the system in increasing order of node depth and distributing the nodes at each depth among the available processors. Algorithm 2: Forward elimination step with level scheduling The number of levels of the graph, nlev, can be easily determined from the depth information. Two other integer vectors are also required. ORDER(i) stores the ordering of the rows of L in terms of increasing node depth. LEVEL(i) stores the index to the start of each level in ORDER(i). do k=l,...,nlev

86 do j=ilevel(k),..., ilevel(k+ 1)-1 (parallel loop) i=iorder(j) execute Equation 5.4 enddo enddo However, our efforts at parallelizing the ILU preconditioned system with level scheduling did not lead to significant speedup mainly due to the enormous amount of memory traffic that was generated. This observation was also noticed in [58], where the authors estimated that the parallel algorithm generated as much as 10 times more traffic than the sequential code. In order to look for an effective parallelizable ILU preconditioner, we next turned our attention to a block ILU preconditioner. In our scheme of implementing the block ILU preconditioner, we distribute one block to each processor in a multi-processor architecture, thus achieving load balancing as well as minimizing fill-in. The modified ILU decomposition outlined earlier is then carried out on each of these individual blocks. Further, since the blocks are much larger than the block diagonal version, the preconditioner is a closer approximation to the coefficient matrix. Moreover, the triangular solver is fully parallelized since each processor solves an independent system of equations through forward and backward substitution. In our test case of 20,033 unknowns, the number of iterations was reduced by approximately half the number required by the diagonal preconditioner. Since the work done is less than twice that for the diagonal preconditioner, we achieved a marginal savings of CPU time. However, the number of iterations required for convergence is highly sensitive to block size as shown in Table 5.1 for N = 20,033. Table 5.1 clearly shows that a larger block size (smaller number of blocks) does not guarantee faster convergence. Nevertheless, there is an approximately 50% decrease

87 in the number of iterations over the point diagonal preconditioner, regardless of block size. The optimum block size is dependent on the sparsity pattern of the matrix and can only be determined empirically. The savings in the number of iterations over the No. of blocks No. of iterations 1 127 2 176 4 185 8 172 12 162 16 174 24 223 28 177 Table 5.1: No. of iterations vs no. of blocks for a block ILU preconditioned biconjugate gradient solution method. point diagonal preconditioner for 28 blocks is given in Table 5.2 for a system having 224,476 unknowns. From the table, it is clear that the block ILU preconditioner Angle of no. of iterations Ratio incidence point diagonal (I) block ILU (II) (II/I) 0 2943 2758.937 10 5985 3834.641 20 5464 3984.729 30 6048 3651.604 40 5770 3256.564 50 5107 3720.728 60 6517 4162.639 70 5076 4108.809 80 5305 3551.669 90 1 2898 2832.977 Table 5.2: No. of iterations required for convergence of a 224,476 unknown system using the point diagonal and block ILU preconditioning strategies. is very effective in reducing the iteration count; however, the CPU time required is about 10% less than that required by the point diagonal preconditioner for the best case.

88 5.5 Parallelization The different versions of the FE-ABC code were parallelized on two different types of massively parallel architectures - the KSR1 and the Intel iPSC/860. The KSR1 is a parallel machine which implements a shared virtual memory, although the memory is physically distributed for the sake of scalability. The Intel iPSC/860, on the other hand, is a distributed memory, Multiple Instruction, Multiple Data (MIMD) system in which the nodes process information independently of one another and communicate by passing messages to each other. The conversion of sequential or vectorized code to parallel code involves two primary tasks: * parallelization of DO loops Parallelism is introduced by allowing each processor to execute a portion of the DO loop. * distribution of arrays among the processor set Since each processor only has a limited amount of memory, each array is divided into smaller units that reside on each node. This also allows array accesses from each processor to be serviced by different nodes, thus reducing contention for resources on any single node. On a cache-only memory machine such as the KSR1, only the first step is necessary since the hardware cache system automatically takes care of data distribution among the processors. This makes porting codes to the KSR1 quite easy. However, the increased control of data distribution and communication on the iPSC/860 can translate into improved performance for some applications. We will first describe our port and performance figures for the KSR1 and then detail our port on the Intel iPSC/860.

89 1. KSR1 port The basic strategy for the parallelization of the code is described on the biconjugate gradient solver with diagonal preconditioning. The other versions use the same parallelization scheme with slight modifications. We also comment on the parallelization of the matrix assembly phase. Complex Real Operation * + * + Matrix Multiply nze nze - N 4nze 4nze - 2N Vector Updates 4N 3N 16N 12N Dot Products 3N 3N 12N 12N Table 5.3: Floating point operations per iteration. The symmetric biconjugate gradient method iteratively refines an approximate solution of the given linear system until convergence. Figure 5.2 shows the method in terms of vector and matrix operations. For a system of equations containing N unknowns, all these vectors are of size N and the sparse matrix is of order N. The number of nonzero elements in the sparse matrix is denoted as nze. Table 5.3 shows the operation counts per iteration for each type of vector operation. In the FE-ABC code, each vector operation is implemented as a loop. The program is parallelized by tiling these loops. For P processors, the vectors are divided into P sections of N/P consecutive elements. Each processor is assigned the same section of each vector. This partitioning attempts to reduce communication while balancing load. To guarantee correctness, synchronization points are added after lines 2, 7, and 9. Lines 2 and 7 requre synchronization to guarantee that the dot products are computed correctly. Note that the dot products in lines 6 and 7 require only one synchronization. The line 9 synchronization guarantees that p is completely updated

90 before the matrix multiply for the next iteration begins. In the sparse matrix vector multiplication, each processor computes a block of the result vector by multiplying the corresponding block of rows of the sparse matrix with the operand vector. Since the operand vector is distributed among the processors, data communication is required. The communication pattern is determined by the sparsity structure of the matrix, which in our case is derived from an unstructured mesh. Therefore the communication pattern is unstructured and irregular. However since the sparse matrix is not modified during the iterative process, the communication pattern is the same at each iteration. Vector updates and dot products are easily parallelized using the same block distribution as in the sparse matrix vector multiply. Although sparse computations are known to be hard to implement efficiently on distributed memory machine, mainly because of the unstructured and irregular communication pattern, the previous scheme was easily and efficiently implemented on the KSR1 MPP thanks to the global address space [60]. Table 5.4 shows the execution time of one iteration (in seconds) and the speedup for different numbers of processors and for two problem sizes. For both problems, the performance scales surprisingly well up to a large number of processors. For the 20,033-unknown problem, the speedup for the parallelized sparse solver varies from 1 to 38 as the number of processors is increased from 1 to 56 (Figure 5.3). The overall performance of the solver on 28 processors is more than three times that of a single processor on the Cray-YMP. The large problem (224,476 unknowns) exhibits superlinear speedup which can be attributed to a memory effect. As a matter of fact, the large data set does not entirely fit in the local cache of a single node in the KSR which results in a large number of page faults. However, as

91 Initialization: x given r = b-Ax; p = r; tmp = r * r Repeat until (resd < tol) q =Ap (1) Step 1 a = tmp/(q. p) (2) x = x + ap (3) r = r - aq (4) q = C-1 * r (5) Step 2 resd -= -Ir r*1 (6) 3 =- (r. q)/tmp (7) tmp=3 x tmp (8) Ste Step 3 p=q+/3p (9) J EndRepeat A4 is a sparse complex symmetric matrix. C is the preconditioning matrix. q, p, x, r are complex vectors. ac, 3, tmp are complex scalars; resd, tol are real scalars. Figure 5.2: Symmetric biconjugate gradient method with preconditioning.

92 48, Linear 48...../ 40sured, 32 -; ~24/ 16- -- 0 -'' I''' I''' I''' I''' I''' I'I' 0 8 16 24 32 40 48 56 Number of processors Figure 5.3: Speedup curve for the linear equation solver on the KSR1 the number of processors increases, the large data set is distributed over the different processors' memories. The global matrix assembly is the second largest computation in terms of execution time. The elemental matrices are computed for each element in the 3D mesh and assembled in a global sparse matrix. A natural way of parallelizing the global matrix assembly is to distribute the elements over the processors, have each processor compute the elemental matrix of the elements it owns and update the global sparse matrix. Since the global sparse matrix is shared by all processors, the update needs to be done atomically. On the KSR1 this is done by using the hardware lock mechanism. The performance for the matrix assembly is given in Table 5.5 and also in Figure 5.3.

93 Procs N=20,033 N=224,476 Execution time Speedup Execution time Speedup (secs per iter) (secs per iter) 1a.515 1 10.8 1 8.071 7.3 1.4 7.7 16.040 12.9.671 16.1 29.027 19.1.304 35.6 60b.149 76.2 Table 5.4: Execution time and speedup for the iterative solver 5.5.1 Analysis of Communication In the main loop (Figure 5.2), significant communication between processors takes place only during the sparse matrix vector multiply (line 1) and the vector update of p (line 9). The rest of the vector operations incur little or no communication at all. The distribution of the nonzero entries in the matrix affects the amount and nature aFor 1, 8 and 16 processors, only the first 100 iterations were run. bCode run on a 64 node KSR at Cornell University Procs Execution time Speedup in seconds 1 24.355 1 2 13.376 1.8 4 6.811 3.6 8 3.744 6.5 16 1.89 12.9 25 1.625 15.0 28 1.276 19.1 Table5.5: Execution time and speedup for the matrix generation and assembly (20,033 unknowns)

94 of communication. In this section, we present an analysis of the communication pattern incurred by the sparse matrix vector multiplication as derived from analysis of the sparsity structure of the matrix. Line 1. In the matrix-vector multiply, each processor computes an N/P-sized subsection of the product q. The processor needs the elements of p that correspond to the nonzero elements found in the N/P rows of A that are aligned with its subsection. Because the matrix A remains constant throughout the program, the set of elements of p that a given processor needs is the same for all iterations in the loop. However, since p is updated at the end of each iteration, all copies of its element set are invalidated in each processor's local cache except for the ones that the processor itself updates. As a result, in each iteration, processors must obtain updated copies of the required elements of p that they do not own. These elements can be updated by a read miss to the corresponding subpage, by an automatic update, or by an explicit prefetch or poststore instruction. Figure 5.4 lists the number of subpages that each of the 28 processors needs to acquire from other processors. Automatic update of an invalid copy of a subpage becomes more likely as the number of processors sharing this subpage grows. The number of processors that need a given subpage (excluding the processor that updates the subpage) is referred to as the degree of sharing of that subpage. Figure 5.5 shows the degree of sharing histogram for the example problem. Since the only subpage misses occurring in Step 1 of the sparse solver are coherence misses due to the vector p, the use of the poststore instruction to broadcast the updated sections of the vector p from Step 3 should eliminate the subpage misses in Step 1. However, the overhead of executing the poststore instruction in Step 3 offsets the reduction in execution time of Step 1. On a poststore, the processor typically stalls for 32 cycles while the local cache is

95 Number of Subpages 350- --- ----------- ----............................................................................................................350300 - 250- - 200 —.- ------ - _ _.... 150100500 1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627 Thread ID Figure 5.4: Counts of p subpages required by each processor for sparse matrix-vector multiply (total copies=5968) busy for 48 cycles. As a result, the net reduction in execution time is only 3%. Line 9. Before proceeding with the updates of the N/P elements of p for which it is responsible, each processor must acquire exclusive ownership for those elements. Because a cache line holds 8 consecutive elements, each processor will generate N/8P requests for ownership (assuming all subpages are shared). In order to hide access latencies, the request for ownership can be issued in the form of a prefetch instruction after step 1. This could lead to an eightfold decrease in the number of subpage misses. However, as with the poststore instruction, the benefit of prefetching is offset by the overhead of processing the prefetch instructions in step 2. This is because the processor stalls for at least two cycles on a prefetch and the local cache cannot satisfy any processor request until the prefetch is put on the ring. The overall execution time is reduced by only 4% in this case.

96 Number of Subpages 900805 800700600 561 7 8 9 10 500 300 2 200 -174 116 15 2 4 1 0 1 2 3 4 5 6 7 8 9 10 Degree of Sharing multiply (28 procesors) These are then gathered and summed up on a single processor. 2. Intel iPSC/860 port The parallelization of the DO loops is one of the main tasks since the majority of the computer time is spent on solving the linear system of equations. The basic strategy for parallelizing the DO loops on the iPSC/860 is similar to the KSR1 - each portion executes a portion of the DO loop. This scheme works fine as long as there are no dependencies in the body of the loop, as is the case for the casevector updates andfor the sparse matrix-vector multiply of the linear solver. However, the main loop in the matrix generation/assembly phase contains a dependency between loop iterations.

97 As on the KSR1, this problem is solved by using a mechanism by which each processor locks a row of the matrix while performing an update. Since the locking of each row is maintained by the processor whose memory holds the particular row, processors lock and unlock rows by sending messages to the appropriate row owner. Even though the parallelization of loops enables programs to run faster on multiprocessors, the distribution of arrays must be done for the code to run at all. Arrays are distributed in the code by partitioning one dimension among the processors. Thus for a 1000 element array, processor holds the first 100 elements, processor 2 the next 100 and so on. The straightforward method for accessing this distributed array involves the translation of array references into subroutine calls. Thus an expression x = a(i) is translated into the call call fetcha(i,x). The subroutine fetcha then sends a message to the processor that holds element a(i), which in turn sends a reply message with the value of a(i). Although this scheme requires the implementation of a new subroutine for each distributed array and the replacement of each array access with a subroutine call, the process is easy and mechanical. However, such a scheme does not result in good performance. The primary reason for this is that the overhead for sending a message is much higher than that of sending a single byte. The cost for sending 10 or even 100 bytes is usually not much higher than that of sending 1 byte. Thus, messages need to be'bundled' for fast and efficient operation. However, the simple strategy mentioned above is in direct contrast to message bundling. Therefore, we have implemented the simple scheme for parts of the code the ode that do not take up a significant portion of computation time like the matrix generation/assembly phase and a better scheme for accessing the distributed arrays in the equation solver phase. The primary operation in the solver that generates communication is the sparse

98 matrix-vector product. Since the matrix-vector product involves performing a dot product of each row with the distributed vector, each processor must obtain the values for the entire vector from the other processors. The dot product operation must be carried out in several phases as each processor may not be able to hold the entire vector in memory. Thus each processor P begins the matrix-vector multiply by sending its portion of the vector to other processors, then performs the following tasks for every other processor P' * Reads the portion of the vector owned by P'. * Updates the partial dot product for each row by adding the product of the appropriate matrix element with the elements of the partial vector. After performing the above operations for all the processors, the dot product is complete. Unfortunately, each phase requires a pass over all the sparse matrix rows owned by the processor. In the future, it may be possible to sort each row of the matrix to allow the phases to pass over the rows in order. The speedup results on a problem with 31000 unknowns show that the problem scales reasonably well for a small number of processors. However, as the number of processors increases, much of the time is spent on communication and book-keeping than on true computation. Efforts are under way to run a larger problem.

CHAPTER VI Conformal absorbing boundary conditions for the vector wave equation As mentioned in the previous chapters, the focus of this thesis is the computation of scattering from large three-dimensional structures. Since we are dealing with large targets having arbitrary shape, a spherical mesh termination boundary is not as attractive in terms of storage and computational cost. This is especially true for long and thin geometries where a sphere is the least economical shape of mesh termination, in terms of the number of unknowns. The ideal situation would be to enclose the scatterer inside a mesh termination boundary which is of the same shape as the scattering body (see Figure 6.1). If boundary conditions could be derived for such conformal mesh truncation surfaces, the volume to be meshed and the corresponding computing cost would then be minimized. However, available three dimensional ABCs for the vector wave equation as derived by Peterson [45] and Webb and Kanellopoulos [46] are only suited for application on spherical mesh terminations. Our goal, therefore, is to derive new vector ABCs for three dimensional analysis which can be applied on a surface conformal to the structure of interest. We begin with a modified Wilcox expansion whose leading order term recovers the geometrical 99

100 Figure 6.1: Scatterer enclosed in conformal mesh termination boundary optics fields and thus, given the appropriate principal curvatures, the resulting ABC completely absorbs all geometrical optics fields from the surface. We then proceed to derive the first and second order absorbing boundary conditions in terms of the principal curvatures of the surface on which they are employed. We also introduce an approximation to make the absorbing boundary condition contribution symmetric. In the next step, we incorporate these boundary conditions into the finite element equations and express them in a readily implementable form. We also comment on the symmetry of the system for doubly curved surfaces. In the last section, we examine the performance of these ABCs - in terms of computational cost - when applied on mesh termination surfaces conformal to the scattering object. 6.1 Formulation It is known that the electric field in a homogeneous region of space is governed by the vector wave equation VxVxE-k2E-=0 (6.1)

101 where ko is the free-space wave number. We assume that the field has a well-defined phase front in the region under consideration. Since we are concerned only with local behavior, we can assume that the phase fronts can be treated as parallel regions. Consequently, the surface describing the phase fronts can be specified by a net of coordinate curves denoted by t1 and t2 and a third variable n denotes the coordinate along the normal to the phase front. The point of observation in the Dupin coordinate system [61] can now be defined as x = nfi + XO (tl, t2) (6.2) where n is the unit normal and xo (t1, t2) denotes the surface of the reference phase front. The curl of a vector in the above coordinate system is given by 9E VxE =VTxE+nix (6.3) On where VT x E is called the surface curl involving only the tangential derivatives and is defined as [62] VT x E = -n x VEn + t21iE1, - t12E2 + nV (E x n) (6.4) In (6.4), KcI and 2 denote the principal curvatures of the surface under consideration, Et,, Et2 are the tangential components and En is the normal component of the electric field on the surface. The principal curvature of a surface is defined as [61] 1 1 Oh1. 1=l = - -- a (6.5) 1 1 h2 (6.6) N2 = = -R- (6-6) Rz h2 On where h1, h2 are the metric coefficients and R1, R2 are the principal radii of curvature. Using the aforementioned coordinates, the Wilcox expansion for a vector radiating

102 function can now be generalized to read ~-jk\ ^on c_ Ep (tlIt2) E(n, t1, t2) = ik1/ - pE ( / 2P (6. 7 47 R1R2 (7 VRR2) where Ri = pi + n, i = 1,2 and pi is the principal radius of curvature associated with the outgoing wavefront at the target. The lowest-order term in (6.7) represents the geometrical optics spread factor for a doubly curved wavefront and reduces to the standard Wilcox expansion [63] for a spherical wave. Moreover, (6.7) can be differentiated term by term any number of times and the resulting series converges absolutely and uniformly [63]. 6.1.1 Unsymmetric ABCs In the 3D finite element implementation using vector basis functions and the electric field as the working variable, we need to relate the tangential component of the magnetic field in terms of the electric field at any surface discontinuity. Therefore, our next task is to derive a relation between n x VxE (i.e., n x H where H is the magnetic field) and the tangential components of the electric field on the surface. Taking the curl of the electric field expansion given by (6.7) and crossing it with the normal vector, we have fixVxE = SRe-Jk + m -)Ept V tEpn+ p t (6Ept 8 xE47 [ UP+1 + UP+1 (6.8) where u = /R1 R2 and VtEn = -(n x n x V) En K1 + K2 Km ~ 2 T= -nitlt1 + n2t2t2

103 Considering that Eon is zero due to the divergenceless condition [63] and simplify ing. we obtain the first order absorbing boundary condition fix-xE-("k/ n-.)E ejkon CVtEpn+PKmEpt n x V xE - (jko + -7 ) Et = UtEpn + p(6.9) 47r __ or, n x VxE-(jko+Km-77)Et = 0 +0 (n-) (6.10) for a conformal outer boundary. Not surprisingly, this is the impedance boundary condition for curved surfaces as derived by Rytov [64]. It should be noted that in the above equation, VtEn and Km are each proportional to n-1. Therefore, the leading order behavior of (6.10) is 0 (n-3), i.e., only the first two terms of (6.7) are exactly satisfied by (6.10). If the scattered field contains higher order terms, application of (6.10) will give rise to non-physical reflections back into the computational domain. In order to reduce these spurious reflections, we need to either shift the mesh truncation boundary farther away from the scatterer or employ higher order boundary conditions which satisfy higher order terms of (6.7). To reduce the order of the residual error further, we consider the tangential components of the curl of (6.9). This yields n x Vx [n x VxE- (jko + Km -.)E] =t] kn ko + (p 3)K K VtEpn + PmEpt 4 [ m K uP+ uP_ +l (6.1) where Kg = K1K2 is the Gaussian curvature. Using the result derived in (6.9) and simplifying, (6.11) reduces to n x Vx [n x VxE-(jko +Km- )Et] jko +(P 3)Km g } VtEP + PKm (2Km - -.)VtEn (6.12)

104 If we take a closer look at the term in the square brackets on the RHS of (6.12), we find that it can be written as e-n Vt Epn+ pPKmEpt 47r UP+l + jko +3m- 9.- {n x VxE-(jko + KIm- -) Et} Km where we have substituted ~eA 2 E[ VtEPn +JmEP-t =] n xVxE-(jko+ -m -')Et (6.13) using the relation derived in (6.9). Now the dominant terms on the RHS of (6.12) can be eliminated by considering the higher-order operator [n x Vx(-jko + 3im- -r7 ) {n x VxE- (jko + IKm - ) Et} + \Km (/ g _-)g e-jkon oo VtEpn + pKmEpt (2nm -m.')VtEn- ZPhm (614) The residual of (6.14) can be reduced further to yield the absorbing boundary condition of second order which satisfies (6.7) to 0 (n-5). This second order ABC is found to be nx Vx - (jko + 4;m- -} {n x VxE - (jko + Km - r') Et} + (2I- -17. )VtE, =0 (6.15) K~m and the residual is equal to e-jkon oo VtEpn+phmEpt 4ar (p1)/gm up+1 Th p-opeato on th LHSP ofp (6.16) The operator on the LHS of (6.15) can be applied repeatedly to obtain ABCs of increasing order; however, higher order basis functions are needed for their implementation.

105 After some algebraic manipulation, the terms on the LHS of (6.15) reduced to simpler ones. In addition to the wave equation, the following vector identities were derived to carry out the simplifications and are provided below for the reader's convenience: nfixVxEt = fixVxE-VtEn nfix VxVtE = Vt (V.Et) + 2KmVtEn n x Vx (n x VxE) = Vx {n (VxE)n} - 2Et + A, n x VxE where AK = K - K2. The derivation of these identities is given in Appendix B. Upon simplification, the second order ABC can be compactly written as - (D - A - 27) n x VxE + {4K2 - Kg + D (jko - f.) + (.)2 *} Et +Vx {fin (VxE)} + (jko + 3m- Kg -2.) VtEn = 0 (6.17) \ K~r Km in which D 2jko +5KmKm and (7)2 Et= KEt,ti + K2Et2t2 The derivatives of the curvatures in (6.17) have been ignored due to the reasons outlined in [62]. These derivatives essentially give rise to second-order curvature terms which add to the coefficient of Et only. The second order ABC derived in [1] is recovered on setting K1 = K2 = 1/r. 6.1.2 Symmetric correction It has been shown by Peterson in [45] that the LHS of (6.17) when incorporated into the finite element equations gives rise to an unsymmetric matrix system

106 in spherical coordinates. To alleviate this problem. Kanellopoulos and Webb [46] suggested an alternative derivation involving an arbitrary parameter which would lead to a symmetric matrix while sacrificing some accuracy. Below, we discuss a different approach which leads to a symmetric ABC without the introduction of an arbitrary parameter. On considering the series expansion of the term n x V x VtE, we have -jAkon oo nx VxVEn = - 4 E {jko + (p + 1l)m} VtEp = jkoVtEn+ 2imVtEn +Z E(p-l)m 1m p=2 UP+l = jkoVtEn + 2KmVtEn + 0 (n-5) and on making use of the vector identity Vt (V * Et) = f x V xVtE - 2imVtEn given earlier, we arrive at the following result Vt (V.Et) = jkoVtEn + 0 (n-5) (6.18) Since our ABC was derived to have a residual error of 0 (n5), we can replace jkoVtEn with Vt (V.Et) without affecting the order of the approximation. Doing so, the second order ABC with a symmetric operator can be rewritten as (D - A - 2f.) ni x VxE = {4cm - _g + D (jko - ff.) + (rf)2.} Et+ Vx {n (VxE)} + - (jko + 3/ - 9- -27r) Vt (V.Et) (6.19) j ko k rt. M It can be easily shown that the above boundary condition leads to a symmetric system of equations when incorporated into the finite element functional for surfaces having /l =- /2. Equations (6.10) and (6.19) reduce to the boundary conditions derived in [46] on setting n1 = - = 1/r which have been found to work well for spherical and flat boundaries [65].

107 6.1.3 Finite element implementation The boundary condition outlined in equation (6.19) cannot be incorporated into the finite element equations without modification. As explained in Chapter 3, the absorbing boundary condition is implemented in the finite element system through the surface integral over the mesh termination surface So. En x VxEdS =/ E P(E) dS where P(E) denotes the boundary condition relating the tangential magnetic field to the tangential electric field on the surface. Let P1(E) denote the first order absorbing boundary condition given by (6.10), where the subscript represents the order of the ABC. Therefore, the surface integral contribution for the first order ABC reduces to s E. Pi(E) dS = (jko +m) E. Et dS - E. (77 Et) dS (6.20) Using some basic vector identities and considering that Et = -n x n x E, we deduce that S E. P(E)dS = (jko + im) (E2 + E) dS-S (iEt2, + 2Et22 dS (6.21) which is a readily implementable form of the first order ABC. However, the second order ABC does not simplify as easily. If P2(E) denotes the second order ABC given by (6.19), we can rewrite it in a more compact vector notation as P2(E) =- Et +/.3 [Vx {f (VxE)n}] + 7. {Vt (V.Et)} (6.22) where the tensors a, /3 and 7 are given by a = D -/Ac - 2K1 {4tm -g + D (jko -/ l) + /l2} ilt

108 +D- d- 2 {4K m Kg + D (jko - 2) + 2} t2t2 (6.23) = 1 ^t (6.24) D - AK 2K1 D - AK - 2nt2 ko(D - AK - 2K) Km ) tKm +.,D A — -k (o + 3JKo m -o 2FK2) t2t2 (6.25) jko(D AK- 22) A Km Substituting the second-order absorbing boundary condition in the surface integral given in (6.22), we have E- P2(E)dS = E (a -Et) dS +I E (/3- {n(VxE)n}) dS + E. {. Vt(V.Et)} dS = + 2+I3 Let us examine the integral I1. Since Et = -n x n x E, we have I1= j /$aE2 +a a2E22 dS (6.26) after employing some simple vector identities. The other two integrals (I2 and I3) do not reduce as easily to simple, implementable forms. They are first simplified using basic vector and tensor identities and then the divergence theorem is employed to eliminate one of the terms. Considering the integrand of the second integral 12, we note that E [3 - (Vxno)] = (*E) -(Vxn^q) where we have set = (VxE)n. Using some additional vector identities and letting ~3 E = F, we get F Vxnx = V.(Qn x F)+ qnf. VxF V. (n x F)+ (VxF)~

109 Using the results from [61], the first term in the above identity can be further simplified to read 0 V-(in x F) = Vs-(qn x F) + { {fi(fi x F)}-J {fn (n x F)} Vs (nfix F) (6.27) where Vs denotes the surface gradient operator and J = ci1 + K2. The integral I2 can now be written as I2 = s Vs. (ni x F) dS + js q(VxF)n dS We can now apply the surface divergence theorem to the first term on the RHS of the this expression to yield J vs. (on x F) dS = m (n x F) dl = 0 (6.28) since the surface So is closed. We note that m = 1 x n and 1 is the unit vector along the edge of the surface element and C denotes the contour of integration. On the basis of (6.28), I2 reduces to 2 = jo(VxE) {Vx7 x E)} dS (6.29) We now turn our attention to simplifying I3 for implementation in the finite element equations. Considering the integrand of I3, we have E. {. Vt(V.Et)} (y. E).{Vt(V.Et)} = (V.E). {Vf-A} where = V * Et. Next, setting G = y' E, we obtain G. v -n - } =V.(. G)- V.G - G (6.30)

110 The first term in the above identity can be written as V (PG) = V. (OG) + (VGn) - J ('Gn) and since aGn/9n = V * G - V Gt + JGn, the LHS of (6.30) reduces to G. V f-n-fi = Vs. (7G)- V *Gt (6.31) We now replace the integrand of I3 with the expression in (6.31) and use the divergence theorem to eliminate the first term of (6.31). Specifically, I3 = j Vs. (G) dS - -V Gt dS =| m J (4G) dS - j V GtdS = - V -GtdS where m has been defined earlier and the contour integral vanishes since the surface is closed. The integral 13 can finally be rewritten as 3 = -Is (V. Et)(V.Gt) dS (6.32) Using (6.26), (6.29) and (6.32), the complete surface integral term incorporating the conformal second order ABC reduces to J E.P2(E)dS = I (alE2 + 2E2) dS + (VxE)n {Vx(13.E)} dS -j (V. E){V.(7.E)t} dS (6.33) It remains to be seen whether the integrals in (6.33) lead to a symmetric system when incorporated into the finite element equations. With this in mind, we will examine three simple shapes and check whether they preserve symmetry of the finite element system. It will then be possible to generalize our findings to a more general mesh truncation boundary.

111 Let us consider the case of a sphere of radius r. Since the two principal curvatures of the sphere are identical (ci = K2 = 1/r), the first order boundary condition reduces to the simple Sommerfeld radiation condition f E.P1(E)ds = jkoj (E + E2) dS (6.34) On a spherical boundary, the second order ABC also reduces to the comparatively simple form: jE. P2(E) dS = [oE+ 1 (VxE) -2(V.E)2] dS 2 jk(+ / - 2 ~ j k+ 0ko + 2/r (6.35) The ABC given in (6.35) is identical to the boundary condition derived in [46] for a spherical mesh termination surface and leads to a symmetric system of equations. Next, we consider the case of a planar termination boundary in which case K1 = K2 = 0. The first order ABC then reduces to the Sommerfeld radiation condition and the second order ABC for a planar boundary simplifies to E. P2(E) dS = [jk oEt + 2k (VxE) -2jk (V Et)2] dS (6.36) Since the planar boundary is a special case of a spherical boundary, (6.36) again reduces to asymmetric system of equations. Now we examine the situation when the mesh termination boundary is cylindrical in shape and of radius p. The principal curvatures of the cylindrical surface are then Ig = I/p and K2 = 0. Since the principal curvatures are no longer identical, the tensors a, /3 and a do not reduce to simple scalars. The first order ABC for a cylindrical outer boundary is given by / E.Pi(E)dS - = (jko'-}- (jko +- Et dS-j -E dS (6.37) J~~~ 1 1 )J~ ],~ 1

112 and the second order ABC gives by fs EP2(E)dS = I/ (acE +ac2E ) dS + (VxE)p{Vx(o3.E)} dS -j (V. Et){V. (Yc E)t} dS (6.38) where QCl, c2,13c and yc are obtained by substituting K1 = 1/p and K2 = 0 in the original expressions for a,,/3 and 7. It is seen that the first order ABC given by (6.37) leads to a symmetric matrix for a cylindrical boundary. On the other hand, the second order ABC does not yield a symmetric matrix for an arbitrary choice of basis functions. However, the boundary condition outlined in (6.38) preserves symmetry on using linear edge-based elements for discretization. The above discussion enables us to conclude that the first order boundary condition leads to a symmetric system for surfaces having arbitrary principal curvatures. However, symmetry is guaranteed for the second order ABC only when the two principal curvatures of the mesh termination boundary are identical, i.e., only when the outer boundary is limited to a planar or a spherical surface. Thus if we want to enclose a scatterer having arbitrary shape within a conformal outer boundary, an unsymmetric system of equations will have to be solved. It should, however, be noted that the resulting unsymmetric system will, in general, have a lesser number of unknowns than its symmetric counterpart. 6.2 Applications In the previous section, we have discussed the derivation of absorbing boundary conditions which can be employed on surfaces conformal to the scattering or radiating structure. As a result, the mesh termination boundary can be made to enclose the scattering object more snugly. Consequently for arbitrary targets, we achieve a

113 substantial saving in the amount of volume to be meshed between the ABC surface and that of the scatterer. This is particularly critical when the target is cylindrical in shape or a combination of cylindrical, doubly curved and planar surfaces as is the case with any real-life structure. In this section, we examine the performance of these boundary conditions when applied on conformal mesh termination surfaces. A. Composite cube For our first example, we compute the backscatter pattern of the half metal-half Z XA XB~Y Resistive a:::::::::::::::::::::::::::::::::::: /2 dielectric section (er = 2 -j2), where the latter is bounded by a resistive surface having R = Zo. dielectric cube geometry shown in Chapter 3. However, instead of using a spherical scatterer. The geometry is shown in Figure 6.2 and needed only 30,000 unknowns ti e................... a 2 Figure6.2: Gometryof cub (a b.5A)........ cn istn......... ecio ad dieletric ectin....2-.... whee.th.lattr.isboundd.bya.resstiv surface~ ~ ~ ~~~~...... I...... R Z dielectrc cub sheo etr........ I.....................se d fusn as heia surface~ ~~~~ 1........ Iemiat th.esweeplyth.bsrin.oudrycndtono. piecewis planar surface, i..e. uialbxpae......... 0...... th ac fh scaterer Thegeomtry s shwn iFigre.2. ndnede.oly3000.uknwn

114 for discretization. This is in stark contrast to the 40,000 unknown system which resulted when the geometry was enclosed in a spherical termination boundary. The decrease in the unknown count is even more dramatic as we go to larger scatterers. In Figure 6.3, we plot the backscatter pattern in the x - z plane (EfC = 0 polar100 -20..... 0 30 60 90 120 150 180 Observation Angle Vo, deg. Figure 6.3: RCS pattern in the x - z plane for the composite cube shown earlier. The solid curve is the FEMATS pattern and the black dots are MoM data for the Ezn = 0 polarization. Mesh termination is piecewise planar. ization) for the metal-dielectric cube geometry given in Figure 6.2 and compare the computed values with data obtained from a traditional method of moments (MoM) code [50]. The dielectric-filled section has unit permeability and a relative permittivity of 2 - j2. The agreement with reference data is seen to be excellent; it can therefore be concluded that accuracy of the far-field values has not been affected by a different mesh termination scheme. In fact, whe have obtained results of comparable accuracy with only 75% of the computing resources than were necessary before. This observation will be made by the reader again and again in the following pages as the

115 full capability of these conformal ABCs is demonstrated. B. Inlets In our next example, we compute the scattering from perfectly conducting inlets. The aperture of an inlet usually has a large radar cross-section around normal incidence: therefore, a good understanding of its scattering characteristics is critical if measures need to be taken for reducing its echo-area. An accurate computer simulation of such a geometry provides a cost-effective and ready way of allowing the designer to experiment with complex material fillings to achieve satisfactory results. All our validations are carried out for empty inlets due to lack of reference data for more complicated structures. 2 0 -------------------- 15i 1 L. ~ i 5 0 -101 1 5 - - I 0 30 60 90 Observation Angle 6~, deg. Figure 6.4: Backscatter pattern of a metallic rectangular inlet (1A x lA x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is spherical. The geometry of interest is a perfectly conducting rectangular inlet, with dimensions 1A x 1A x 1.5A. For the plots shown in Figures 6.4 and 6.5, we have enclosed

116 20 - 1510 - 0 30 60 90 Observation Angle (p<, deg. Figure 6.5: Backscatter pattern of a metallic rectangular inlet (IA x IA x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is spherical. the target within a sphere of radius 1.35A, which is only about.35A from the farthest edge of the scatterer. This resulted in a system of 224,476 unknowns and converged in an average of 785 seconds per incidence angle on the 56 processor KSR1. The computed values from our code agrees very well with measured data for both HH and VV polarizations. As can be seen from the above discussion, we have obtained our solution using significant computing resources and time. Our next step is to use the conformal mesh termination scheme formulated in the previous section and utilized in Example A. Therefore, instead of using a spherical mesh truncation surface, we terminate the mesh with a rectangular box placed only 0.35A away from the scatterer (see inset of Figure 6.6). The problem size reduces dramatically to 145,000 unknowns, a 35% reduction over the spherical mesh termination scheme. The convergence time for each excitation vector is about 220 seconds, less

117 than 4 minutes, when run on all 56 processors of the KSR1. The computed values are again compared with measured data for both polarizations in Figures 6.6 and 6.7; the agreement is excellent, albeit a bit worse than the spherical case. However, this fact is overshadowed by the fact that we have reduced the problem size by more than a third and computing time by about a fourth. 20 - < 0 b \ I -5 -0 0 30 60 90 Observation Angle (o, deg. Figure 6.6: Backscatter pattern of a metallic rectangular inlet (1X x IX x 1.5A) for HH polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is piecewise planar. We then considered the problem of scattering from a perfectly conducting cylindrical inlet. Even though integral equation codes are more efficient for such bodies of revolution, our primary concern in this test was to examine the performance of the conformal absorbing boundary conditions that we derived earlier. The target is a perfectly conducting cylindrical inlet having a diameter of 1.25A and a height of 1.875A. We first used a rectangular outer boundary, placed.45A from the farthest edge of the target, to enclose the scatterer. The radar cross-section was then com

118 20 - 15 1< _ - -150 0 30 60 90 Observation Angle (o, deg. Figure 6.7: Backscatter pattern of a metallic rectangular inlet (1A x 1A x 1.5A) for VV polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is piecewise planar. puted for a +-polarized incident wave in the yz-plane and compared with measured data (Figure 6.8). The agreement was found to be quite good for all lobes except the third. We expect the results to improve on moving the outer boundary farther away. The backscatter echo-area computed for the same geometry by Shankar [66] using the finite difference-time domain method with the absorbing boundary placed a few wavelengths from the scattering structure, agrees with the computed results remarkably well for all incidence angles. Next, we used a truly conformal termination scheme by using a cylindrical surface for mesh truncation. It should be noted that this is the first instance of a non-spherical surface (i.e., a surface having different principal curvatures) being applied to terminate a finite element mesh for solving open problems. The cylindrical outer boundary was placed about 0.45A from the target and RCS computations were

119 - -15- 0'.-20- 2 ~. 30 30..... -25 - 0 C t 1.5" h 1 for )H Measured face is a rectangular box =.................. 0 30 60 90 Obsdrical termination and a rectangular termination Angledo not differ significantly. How-g. Figuever, the 6.8: Bsavings in computatter patterional cost is quite impressive. Theonducting cylindrical nlet (diamesh rectangular truncation sheight=1me. A spherical mesh termination would have swelled tourabout 265,000 unknowns, sampling densityt wave and outer boundary distance remainingure 6.9).the same. Thus we haobserved from Figures 6.8 and 6.9, the far-field results for a cylindrtime by a sical termilar, if not ga rectangular termination do not differ significantly. How-ion sheme. The savings in computational rcost is quite impressive. The significant even when wesh compare the rectangular and cylindrical termination schemes - a 25% reduction in problem size and a similar decrease in computation time. In Figure 6.10, we plot

120 -10 - -15 -20- z o -25..-. -30-'':'..:. -35 lii!i!j| -40. 0 30 60 90 Observation Angle SO, deg. Figure 6.9: Backscatter pattern of a perfectly conducting cylindrical inlet (diameter= 1.25A, height=1.875A) for HH polarization. Black dots indicate computed values and the solid line represents measured data. Mesh termination surface is a circular cylinder. the backscatter pattern for the same cylindrical inlet in which the incident wave is 0 polarized. The agreement is seen to be decent for the entire range of incident angles. C. Lossy foam cylinder with embedded wires The next geometry to be considered was a lossy foam (cr = 1.05 - jO.2) cylinder having a radius of 1A and a height of 3.5A with 0.5A long perfectly conducting wires embedded in it. The wires are spaced 0.5A apart from each other and have a diameter of 0.01A. This is an interesting problem for two reasons: first, the orientation of the embedded wires makes it a difficult geometry for other methodologies to handle; second, the problem is very large since the cylinder has a volume of 11 A3 and a surface area of 28.27A2.

121 -10 0 30 60 90 -20Observation Angle deg. 0 30 60 90 Observation Angle or, deg. toFigure 6.10: Backscatter pattern of a perfectly paralllcnducting cylindrical inlet (diam-mpressive e ter= 1.25A, height=l.875A) for VV polarization. Black dots indicate com uted values and the solid line represents measured data. Mesh termination surface is a circular cylinder. As a first case, we consider the wires to be aligned along the axis of the cylinder 6.11). The cylindrical mesh termination boundary was placed 0.45A from the flat and curved surfaces of the foam cylinder. The resulting system of equations had 437,064 unknowns but needed only an average of 3050 iterations to converge to the well as due to the presence of the metallic wires. from the cylinder axis. The number of unknowns and the time taken for convergence

122 15 10 -5 -.: 0 bj -10 -15 -.......... Observation Angle 6^ deg.............. -20. Perfectly conducting plate............. -25 0 30 60 90 Observation Angle perf, deg.ctly conducting plate was two-fold. It is usualltter pattern odifficult to model the scattam cylindering fromwith three edges of thenducting wires embedded along the axis. The inciden t electric field is oriented parallelsts to sethe axis of the cylinder. M esh termination surface is a circular cylinder. is comparable to the earlier case. The effect of the offset wire on the backscatter pattern can be studied in Figure 6.12. There is a slight asymmetry in the side lobes of the resulting backscatter pattern but, as can be expected, the effect is very small. D. Perfectly conducting plate two-fold. It is usually very difficult to model the scattering from the edges of the choice wexasin theo the ormae te in a rectangular box. The second choice was to use a box with half cylinders attached to the faces normal to the plane of the plate

123 15- 10 - 5.c -5 -10 -15 -20 -90 -60 -30 0 30 60 90 Observation Angle o0, deg. Figure 6.12: Backscatter pattern of a lossy foam cylinder with the middle wire offset 0.25A from the cylinder axis. The incident electric field is oriented parallel to the axis of the cylinder. Mesh termination surface is a circular cylinder. - the reasoning being that since the edge of the plate behaves like a line source and scatters cylindrical waves, a cylindrical mesh termination was most suitable for wave absorption. It should be noted that both mesh termination schemes require approximately the same number of unknowns; the superiority of one over the other was thus decided only on the basis of computed backscatter values. Our test case is a 3.5A x 2A perfectly conducting rectangular plate. In Figure 6.13, we plot the backscatter pattern for the 00 polarization in the xz plane, i.e., over the long side of the plate. The computed values compare very well with reference data; however, the code does not pick up the sharp null at 0o = 450 and the two mesh termination schemes perform as well, although a slight improvement is noticeable for the box-cylinder termination.

124 30 - 20 - 10 0 30 30 60 90 Observation Angle so, deg. Figure 6.13: Backscatter pattern (asee) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination. Next, we plot the backscatter pattern of the same geometry for the 44 polarization over the long side of the plate in Figure 6.14. Again, the agreement with reference data is quite good. However, the backscatter echo-area at edge-on incidence is not calculated accurately. In the next figure, we compute the RCS of the conducting plate in the yz plane, i.e., over its short side, for the 44 polarization. The backscatter echo-area for edge-on incidence is picked up very well for a rectangular-cylindrical termination whereas a rectangular truncation scheme gives completely incorrect results. These two schemes have approximately the same storage requirement; in fact, the box-cylinder combination yields a slightly smaller system of equation. This example truly illustrates the power of a conformal truncation scheme composed of simple shapes: not only are the results far more accurate but even the storage requirement is slightly less.

125 30 — _ 2010-20 -30 0 30 60 90 Observation Angle ~o, deg. Figure 6.14: Backscatter pattern (acro) of a 3.5A x 2A perfectly conducting plate in the xz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination. In the last experiment, we compute the backscatter values of the conducting plate for a conical 0 = 80~ cut (Figure 6.16). This problem is extremely difficult since the ABCs are usually the least accurate at edge-on. Even then, the comparison with reference data is decent for the box-cylinder combination and not so accurate for the rectangular truncation scheme. This is especially noticeable at q = 0~ where the rectangular termination boundary yields completely inaccurate results. In all the above simulations, the boundary was terminated at 0.35A from the flat face of the plate and 0.5A from the edges of the plate. E. Glass plate In the final example, we present the results for one of the most challenging problems solved by the FEMATS method. The target is a simple rectangular glass slab 1.75A long, 1A wide and.125A thick. The relative permeability of glass is taken to be

126 30 - 20 - 10 - -10- z 0 -20 - Z o -30 -......-.-.*.-. i.-. -X-... 0 30 60 90 Observation Angle ~o, deg. Figure 6.15: Backscatter pattern (ci0~) of a 3.5A x 2A perfectly conducting plate in the yz plane. The white dots indicate box termination; the black dots represent a combined box-cylinder termination. 3 - j.09. The backscatter pattern (oeoo) is sought for the 0 = 80~ cut. This is an extremely difficult problem for any numerical method to handle since the incident field is almost edge-on to the dielectric slab, causing the scattered field to have strong higher order components which decay appreciably only at large distances from the scatterer. In our first attempt, we enclosed the glass plate in a flat box placed.45A from it. Though the computed results were accurate for some angles, they departed significantly from the reference data [66] for other incidences. The results did not show a significant improvement on shifting the outer boundary 0.5A away from the scatterer while maintaining its shape. The next step was to modify the shape of the outer boundary such that the flat box had half cylinders attached to the faces normal to the plane of the slab - the reasoning was the same as that for the perfectly conducting

127 20 - 10 - P'oZ m 0'Z 0 - -10- o -20 c.... I''''' I - --- 0 30 60 90 Observation Angle so, deg. Figure 6.16: Backscatter pattern (aow0) of a 3.5A x 2A perfectly conducting plate for a conical 0 = 80o cut. The white dots indicate box termination; the black dots represent a combined box-cylinder termination. rectangular plates mentioned in an earlier section. Further, this termination scheme results in an outer boundary with no sharp edges. The free space between two cylinders with their axes perpendicular to each other is filled with a quarter sphere having the same radius as the cylinders. As can be observed in Figure 6.17, the computed results agree quite closely with the reference data for most of the incident angles. The problem was solved with only 155,000 unknowns and needed an average of 2700 iterations to converge to the specified tolerance. The slow convergence is typical of geometries that are composed of dielectrics since the linear system of equations becomes indefinite due to non-unit values of the relative permittivity.

128 -20 Z E -25 0-y. S -30 -35 - -) - -4 -45| -50 mom Figure 6.17: Backscatter pattern (Tee) of a glass plate (1.75A x 1A x.125A) having a relative permittivity of (3 - j.09) for the 0 = 80~ cut. 6.3 Conclusion In the previous section, we have used our FEMATS methodology to compute scattering patterns from large, three dimensional geometries having arbitrary shapes and complicated configurations and with material inhomogeneities. It has been shown that the solution technique does indeed live upto its promise of delivering accurate results with the expenditure of minimal computer resources. Very large problems can be solved in real time with a fraction of the resources that existing numerical electromagnetics codes require. There are basically two parameters that govern the accuracy of the final result. The first one is the mesh termination technique. Though lower order absorbing boundary conditions are simpler to implement, a significant penalty in computational resources has to be paid for getting the same accuracy as the higher order ABCs. The

129 reliability and the order of the ABC is thus crucial to the computation process. Tlhe second parameter is the shape of the mesh termination boundary and its distance from the scatterer. As shown in the previous sections, significantly better results can be obtained by using smooth termination boundaries in place of flat boxes with sharp corners. The distance of the termination boundary from the scattering body is also important in obtaining accurate results. We have determined that a distance of.45A usually gives reliable results for large three dimensional geometries. For small problems, distances of.3 -.35A are sufficient to give accurate far-field values. In order to obtain reliable near-field values such as the input impedance of an antenna, the mesh must be terminated farther away from the target.

CHAPTER VII Conclusions This thesis is one of the first works known to the author to tackle the problem of three dimensional scattering using finite elements and ABCs. It started out as an investigation into the methodology mentioned above, given its extremely attractive features for solving large problems. The 0(N) storage requirement and the iterative equation solver are essentially the keys to this technique, since they allow the solution of extremely large, realistic problems with minimal computational overhead. As the technology progresses to higher frequencies and longer electrical dimensions, the superiority of this solution methodology over integral equation or hybrid techniques will become even more apparent. The opening chapter outlined the motivation in choosing this solution technique over other traditional methods. In Chapter 2, we described the basic concepts of electromagnetics and finite elements. They were by no means complete or exhaustive and the interested reader is referred to several excellent texts for reference [7, 10, 11]. Chapter 3 provided the rationale for employing edge basis functions for discretizing electromagnetic field variables in three dimensional computation. A full review of nodal and edge bases of various orders for two and three dimensions was also carried out. In Chapter 4, the basic formulation of the solution technique was presented along 130

131 with code validation for a large number of small and composite geometries. Since the methodology was found to perform extremely well for small complex geometries with inhomogeneities, we considered extending it to compute scattering from very large problems. In order to achieve this goal, the first step was to make the finite element code computationally efficient. Chapter 5 discussed the various optimization techniques that were carried out to make the code as computationally efficient as possible on a wide class of vector and parallel machines. Since the mesh termination condition used till now were valid only for spherical or flat terminations, the next step was to derive more efficient boundary conditions that would yield accurate results even with reduced computational resources. In Chapter 6, new ABCs were derived which are enforceable on mesh termination boundaries conformal to the surface of the target and result in drastic reductions in the number of unknowns and hence solution time. Thus, it can be stated that this thesis has achieved its objective of showing the efficacy and the accuracy of computing three dimensional scattering from large, composite and complex-shaped structures using finite elements in conjunction with the ABC method of mesh termination. Any research, however, by its very nature throws up more questions than it answers. The work outlined in the previous chapters is no exception. The author has sought to provide answers to the more immediate questions but many more need to be examined for making this methodology robust and viable in commercial applications. Tests on a large variety of complicated problems have produced encouraging results and exhibited the versatility of this method. Future research into this method must inevitably focus on the following aspects: * iterative refinement of the solution through a posteriori error estimation and the use of hierarchical basis functions. Hierarchical basis functions use p refinement

132 instead of h refinement to adaptively correct the solution, maintaining the coarse mesh throughout. * further reduction in the number of unknowns through the use of 1. mixed node-based and edge-based elements. Since node-based elements usually have twice as few unknowns as edge-based elements, nodal basis can be employed in free-space regions to reduce unknown count. 2. derivation of higher order boundary conditions. This would allow the mesh termination boundary to be placed even closer to the target, thus reducing the number of unknowns dramatically - the possible asymmetry of the resulting equation system is an acceptable tradeoff. * more robust, geometry-dependent, iteratively refined numerical mesh termination conditions that would guarantee the accuracy of the final solution. In most cases, it is the mesh termination condition and its distance from the scatterer which determine the accuracy of the computation. Numerical, iteratively refined ABCs could theoretically be applied very close to the scattering or radiating target with excellent accuracy. * extension of the formulation to antenna radiation and electromagnetic interference(EMI) problems. The formulation for the antenna problem must include efficient source modeling and reliable simulation of the near-field for impedance calculations. EMI phenomena critical to electronic packaging can be predicted efficiently by the technique through proper modeling of the circuit or the device to be shielded. * more efficient mesh generation packages dedicated to electromagnetics.

133 Most of the extensions mentioned above are not trivial and each could form the core of another doctoral dissertation. However, the efficient realization of even a few of these topics would result in a very powerful technique for computing electromagnetic radiation or scattering from arbitrary three dimensional structures reliably and efficiently.

APPENDICES 134

135 APPENDIX A Derivation of matrix elements The derivation of the matrix elements in ( 4.7) amounts to evaluating the integrals in ( 4.4) and ( 4.5). Therefore, from ( 4.4), we have (V xWe). (V x Wj) = - g (A.1) IVe -( r-g.gV since V x W^ = 2gi. The evaluation of the integral in ( 4.5) is more cumbersome. Substituting into ( 4.5) the basis functions defined in ( 4.11), we obtain e'V Wf.W'dv = erv {(fi.fj) + (r.D) + (gi x r).(gj x r)} dv (A.2) = r(I1 +I2+I3) where D - (fi x gj) + (fj x gi) and Il = V fi.fjdv (A.3) 2 =' r.Ddv (A.4) 3 = J (gi x r).(gj x r)dv (A.5)

136 Since f is a constant vector, I1 reduces to Ii = f.f, V (A.6) Since 4 x = E Lixi i=l 4 y = Liyi i=1 4 z = ELizi i=l where Li are the shape functions for the tetrahedral element and xi, Yi, zi(i = 1,..., 4) denote the x, y and z co-ordinates of the vertices of the tetrahedral element. Using the standard formula for volume integration within a tetrahedral element and simplifying, we have I2 = j E x[D if+ DEyi+ D~ EZi (A.7) 4 i=1 i=1 i= J where Dm is the mth component of D. The evaluation of I3 can be simplified by the use of basic vector identities. Therefore, I3 gi.gJf Ir12 dv- (gi.r)(gj.r)dv = (giygjy + gizgjz) I x2dv + (gigj + gizgjz)' y2dv + (gixgjx + giygjy) J z2dv - (gixgjy + gjxgiy) v xydv - (gigjz + ggjiz) V zxdv - (giygjz + gjygiz) IV yzdv (A.8) where gim represents the mth component of the vector gi. Each of the volume integrals in the above equation can be easily evaluated analytically and the result expressed in the following general form:

137 alamdv = - [ aliami + Eal ami] (A.9) - Lt=i t=1 i=1 i where 1,m = 1,..,,3 and al represents the variable x, a2 stands for the variable y and a3 denotes the variable z.

138 APPENDIX B Derivation of some vector identities The curl of a vector in the Dupin coordinate system is given by OE VxE =VT E E + nx (B.1) an where VT x E is called the surface curl involving only the tangential derivatives and is defined as VT x E = -n x VEn + t2 KEt1 - tl2E2 + nV * (E x n) (B.2) In the above equation, KC and n2 denote the principal curvatures of the surface under consideration, Et1,Et2 are the tangential components and En is the normal component of the vector E on the surface. We are interested in the evaluation of the three vector identities given in Chapter 5. Let us consider simplifying the tangential components of the curl of a vector, E in this case. Using the definition of the curl given above, we have nfx VxE = -(n x n x V)En- lEtltl - 2Et2t2+ n x f x n VtEn _- On + lE) E tl + ( On + -K2Et2 t2 = VEn + fix VxE (B.3) where -(n x n x V) = Vt. The first vector identity is, therefore, easily proved.

139 Next, we will prove the second of the three identities. We start with the term n x V xVtE and simplify it using the definition of the curl of a vector given above. nxVxVtEn =- nxVx VEn-nfi n] [ an J 9 En = -n x Vxnfi On - ) (B.4) Since V. E = V Et + (V- n)En + M, we can simplify the above relation even further by substituting the appropriate expression for the normal derivative of the normal component of the electric field and using the fact that the electric field is divergence-free in a source-free region. n x VxVtEn = Vt(V * Et) + (V n)VtEn = Vt(V * Et) + 2KmVtEn (B.5) where Km- = (il + K2)/2 is the mean curvature. The proof of the third identity is more complicated since it involves two curl operations on the electric field. We first need to switch the positions of the outermost nx and the Vx operators to arrive at a simplified form of the rather complex expression. Therefore, n x Vx(n x VxE) = Vx {n x n x VxE}-nVt (n x VxE) + AKc(n x VxE) = -Vx {VxE - n (VxE)} -n lV, (n x VxE) +Ac(nf x VxE) (B.6) Now we use the fact that the electric field satisfies the wave equation to reduce the expression even further. n x Vx(n x VxE) = -koE + Vx {n(VxE)~} + konEn + A(n x VxE)

140 = Vx {n(VxE)n}- k2Et + AL(fi x VxE) (B.7) where A-i = l - 2. Thus, we have shown that all three identities hold as holongd as the vector, E in this case, is divergenceless and satisfies the vector wave equation.

BIBLIOGRAPHY 141

142 BIBLIOGRAPHY [1] R.F. Harrington. Field computation by moment method. Macmillan: New York, 1968. [2] E.K. Miller, L. Medgyesi-Mitschang, and E.H. Newman: eds. Computational Electromagnetics: Frequency-domain method of moments. IEEE Press, 1992. [3] A.F. Peterson. The interior resonance problem associated with surface integral equations of electromagnetics: numerical consequences and a survey of remedies. Electromagnetics, vol. 10, pp. 293-312, 1990. [4] J.R. Mautz and R.F. Harrington. H-field, E-field and combined-field solutions for conducting bodies of revolution. AEU, vol. 32, pp. 157-163, 1978. [5] J.D. Collins, J.M. Jin, and J.L. Volakis. Eliminating interior resonances in FE-BI methods for scattering. IEEE Trans. Antennas Propagat., vol. 40, pp. 1583-1585, Dec 1992. [6] B. Stupfel, R. Le Martret, P. Bonnemason, and B. Scheurer. Solution of the scattering problem by axisymmetrical penetrable objects with a mixed boundaryelement and finite-element method. Proc. JINA, pp. 116-120, 1990. [7] J.A. Stratton. Electromagnetic theory. McGraw Hill: New York, 1941. [8] W. Ritz. Uber eine neue Method zur L6sung gewissen Variations - Probleme der mathematischen Physik. J. Reine Angew. Math., vol. 135, pp. 1-61, 1909. [9] S.G. Mikhlin. Variational methods in mathematical physics. Macmillan, New York, 1964. [10] O.C. Zienkiewicz. The finite element method. McGraw Hill, New York, 3rd edition, 1979. [11] P.P. Silvester and R.L. Ferrari. Finite elements for electrical engineers. Cambridge University Press, 2nd edition, 1990. [12] A. Bossavit and J.C. Verite. A mixed FEM-BIEM method to solve 3D eddy current problems. IEEE Trans. Magnetics, vol. 18, pp. 431-5, Mar 1982.

143 [13] J.P. Webb. Edge elements and what they can do for you. IEEE Trans. Magnetics, vol. 29, pp. 1460-1465, 1993. [14] H.R. Schwarz. Finite element methods. Academic Press, 1988. [15] Z.J. Cendes and P. Silvester. Numerical solution of dielectric loaded waveguides: I - Finite element analysis. IEEE Trans. Microwave Theory Tech., vol. 118, pp. 1124-1131, 1970. [16] X. Yuan, D.R. Lynch, and K. Paulsen. Importance of normal field continuity in inhomogeneous scattering calculations. IEEE Trans. Microwave Theory Tech., vol. 39, pp. 638-642, Apr 1991. [17] H. Whitney. Geometric Integration Theory. Princeton University Press, 1957. [18] J.C. Nedelec. Mixed finite elements in R3. Numer. Math., vol. 35, pp. 315-41, 1980. [19] M. Hano. Finite element analysis of dielectric-loaded waveguides. IEEE Trans. Microwave Theory Tech., vol. 32, pp. 1275-9, Oct. 1984. [20] G. Mur and A.T. deHoop. A finite element method for computing threedimensional electromagnetic fields in inhomogeneous media. IEEE Trans. Magnetics, vol. 21, pp. 2188-91, Nov 1985. [21] J.S. van Welij. Calculation of eddy currents in terms of H on hexahedra. IEEE Trans. Magnetics, vol. 21, pp. 2239-41, Nov 1985. [22] M.L. Barton and Z.J. Cendes. New vector finite elements for three-dimensional magnetic field computation. J. Appl. Phys., vol. 61, no. 8, pp. 3919-21, Apr 1987. [23] J.F. Lee, D.K. Sun, and Z.J. Cendes. Full-wave analysis of dielectric waveguides using tangential vector finite elements. IEEE Trans. Microwave Theory Tech., vol. 39, no. 8, pp. 1262-71, Aug 1991. [24] D.H. Schaubert, D.R. Wilton, and A.W. Glisson. A tetrahedral modeling method for electromagnetic scattering by arbitrarily shaped inhomogeneous dielectric bodies. IEEE Trans. Antennas Propagat., pp. 77-85, Jan 1984. [25] D.R. Tanner and A.F. Peterson. Vector expansion functions for the numerical solution of maxwell's equations. Microwave Opt. Tech. Lett., vol. 2, no. 2, pp. 331-334, 1989. [26] J.F. Lee, D.K. Sun, and Z.J. Cendes. Full-wave analysis of dielectric waveguides using tangential vector finite elements. IEEE Trans. Microwave Theory Tech., vol. 39, no. 8, pp. 1262-1271, Aug 1991. [27] Z.J. Cendes. Vector finite elements for electromagnetic field computation. IEEE Trans. Magnetics, vol. 27, no. 5, Sep 1991.

144 [28] J.M. Jin and J.L. Volakis. Electromagnetic scattering by and transmission through a three-dimensional slot in a thick conducting plane. IEEE Trans. Antennas Propagat., vol. 39, pp. 543-50, Apr 1991. [29] A. Bossavit. Whitney forms: a class of finite elements for three-dimensional computations in electromagnetism. IEE Proceedings, vol. 135, pt. A, no.8, Nov 1988. [30] J.F. Lee, D.K. Sun, and Z.J. Cendes. Tangential vector finite elements for electromagnetic field computation. IEEE Trans. Magnetics, vol. 27, no. 5, pp. 4032-5, Sep 1991. [31] J.S. Wang and N. Ida. Curvilinear and higher order'edge' finite elements in electromagnetic field computation. IEEE Trans. Magnetics, vol. 29, no. 2, pp. 1491-4, Mar 1993. [32] J.P. Webb and B. Forghani. Hierarchal scalar and vector tetrahedra. IEEE Trans. Magnetics, vol. 29, no. 2, pp. 1495-8, Mar 1993. [33] A.F. Peterson and S.P. Castillo. A frequency-domain differential equation formulation for electromagnetic scattering from inhomogeneous cylinders. IEEE Trans. Antennas Propagat., vol. 37, no. 5, pp. 601-607, May 1989. [34] R. Mittra and 0. Ramahi. Absorbing boundary conditions for the direct solution of partial differential equations arising in electromagnetic scattering problems. Elsevier:New York, 1990. Chapter 4 in Finite Element and Finite Difference Method in Electromagnetic Scattering ed. M.A. Morgan. [35] B.M.A. Rahman and J.B. Davies. Penalty function improvement of waveguide solution by finite elements. IEEE Trans. Microwave Theory Tech., vol. 32, pp. 922-28, Aug 1984. [36] J.P. Webb. Finite element analysis of dispersion in waveguides with sharp metal edges. IEEE Trans. Microwave Theory Tech., vol. 36, no. 12, pp. 1819-1824, Dec 1988. [37] Steven H. Wong and Z.J. Cendes. Combined finite element-modal solution of three-dimensional eddy current problems. IEEE Trans. Magnetics, vol. 24, no. 6, Nov 1988. [38] A. Bossavit. Solving maxwell's equations in a closed cavity, and the question of spurious modes. IEEE Trans. Magnetics, vol. 26, no. 2, pp. 702-705, Mar 1990. [39] X. Yuan. Three dimensional electromagnetic scattering from inhomogeneous objects by the hybrid moment and finite element method. IEEE Trans. Antennas Propagat., vol. 38, pp. 1053-1058, 1990. [40] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1985.

145 [41] L.W. Pearson, A.F. Peterson, L.J. Behrmasel, and R.A. Whitaker. Inwardlooking and outward-looking formulations for scattering from penetrable objects. IEEE Trans. Antennas Propagat., vol. 40, pp. 714-720, 1992. [42] J. Angelini, C. Soize, and P. Soudais. Hybrid numerical method for harmonic 3D Maxwell equations: Scattering by a mixed conducting and inhomogeneous anisotropic dielectric medium. IEEE Trans. Antennas Propagat., vol. 41, no. 1, pp. 66-76, Jan 1993. [43] J. D'Angelo and I.D. Mayergoyz. Three-dimensional RF scattering by the finite element method. IEEE Trans. Magnetics, vol. 27, no. 5, pp. 3827-3832, Sep 1991. [44] I.D. Mayergoyz and J. D'Angelo. New finite element formulation for 3D scattering problem. IEEE Trans. Magnetics, vol. 27, no. 5, pp. 3967-3970, Sep 1991. [45] A.F. Peterson. Absorbing boundary conditions for the vector wave equation. Microwave and Opt. Techn. Letters., vol. 1, pp. 62-64, Apr 1988. [46] J.P. Webb and V.N. Kanellopoulos. Absorbing boundary conditions for finite element solution of the vector wave equation. Microwave and Opt. Techn. Letters, Vol. 2, No. 10, pp. 370-372, Oct 1989. [47] J. D'Angelo and I.D. Mayergoyz. Finite element methods for the solution of RF radiation and scattering problems. Electromagnetics, Vol. lOpp. 177-199, 1990. [48] T.B.A. Senior. Combined resistive and conductive sheets. IEEE Trans. Antennas Propagat., Vol. AP-33, pp. 577-579, 1985. [49] A.D. Yaghjian and R.V. McGahan. Broadside radar cross-section of a perfectly conducting cube. IEEE Trans. Antennas Propagat., vol. 33, no. 3, pp. 321-329, Mar 1985. [50] Courtesy of Northrop Corp., B2 Division, Pico Rivera, CA. [51] D.R. Kincaid and T.C. Oppe. ITPACK on supercomputers. Numerical Methods, Lecture Notes in Mathematics, vol. 1005, pp. 151-161, 1982. [52] G.V. Paolini and G. Radicati di Brozolo. Data structures to vectorize CG algorithms for general sparsity patterns. BIT, vol. 29, pp. 703-718, 1989. [53] M.R. Hestenes and E.Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand., vol. 49, pp. 409-436, 1952. [54] P. Sonneveld. CGS, a fast solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., vol. 10, pp. 35-52, 1989.

146 [55] R. Freund. Conjugate-gradient type methods for linear systems with complex symmetric coefficient matrices. SIAM J. Sci. Stat. Comput. vol. 13, pp. 425448, 1992. [56] H.P. Langtangen. Conjugate gradient methods and ILU preconditioning of nonsymmetric matrix systems with arbitrary sparsity patterns. Int. J. Numer. Meth. Fluids, vol. 9, pp. 213-233, 1989. [57] J.R. Lovell. Hierarchical basis functions for 3D finite element methods. ACES Digest, pp. 657-63, 1993. [58] E. Rothberg and A. Gupta. Parallel ICCG on a hierarchical memory multiprocessor - Addressing the triangular solve bottleneck. Parallel Computing, vol. 18, pp. 719-741, 1992. [59] E. Anderson and Y. Saad. Solving sparse triangular linear systems on parallel computers. International Journal of High Speed Computing, vol. 1, no. 1, pp. 73-95, 1989. [60] D. Windheiser, E. Boyd, E.Hao, S.G. Abraham, and E.S. Davidson. KSR1 multiprocessor: Analysis of latency hiding techniques in a sparse solver. In Proc. of the 7th International Parallel Processing Symposium, Apr 1993. Newport Beach. [61] C.T. Tai. Generalized vector and dyadic analysis. IEEE Press, New York, 1992. [62] D.S. Jones. An improved surface radiation condition. IMA J. Appl. Math., vol. 48, pp. 163-193, 1992. [63] C.H. Wilcox. An expansion theorem for electromagnetic fields. Comm. Pure Appl. Math., vol. 9, pp. 115-134, May 1956. [64] S.M. Rytov. Computation of the skin effect by the perturbation method. J. Exp. Theor. Phys., vol. 10, pp. 180, 1940. Translation by V. Kerdemelidis and K.M. Mitzner. [65] A. Chatterjee, J.M. Jin, and J.L. Volakis. Edge-based finite elements and vector ABCs applied to 3D scattering. IEEE Trans. Antennas Propagat., vol. 41, no.2, pp. 221-226, Feb 1993. [66] V. Shankar and et al. CEM problems using finite differences. Rockwell Technical Report, April 1993.