374794-1-T COMPARISON OF THREE FMM TECHNIQUES FOR SOLVING HYBRID FINITE ELEMENTBOUNDARY INTEGRAL METHODS Sunil Bindiganavale John Volakis January 1997 374794-1-T = RL-2536

Comparison of three FMM techniques for solving hybrid FE-BI systems Sunil S. Bindiganavale and John L. Volakis Radiation Laboratory Department of Electrical Engineering and Computer Science The University of Michigan Ann Arbor, MI 48109-2122 Abstract By virtue of its low operation count, the application of the fast multipole method results in substantial speed-up of the boundary integral portion of the hybrid finite element-boundary integral technique, independent of the shape of the BI contour. Recently, various versions of the fast multipole method have been proposed, each introducing a different approximation to the implementation of the boundary integral. The main goal of the paper is to provide a comparison of the various FMM approaches on the basis of implementation, CPU time and accuracy. To gain an appreciation of the differences among the various FMM methodologies, a large portion of the paper is devoted to a discussion of the algorithms at a tutorial level. Flow charts and pseudocodes are also given at sufficient detail to facilitate their implementation. We present quantitative comparisons and draw conclusions using the scattering by a groove as the basis for comparison. Finally, we present a three dimensional application for which the hybrid technique is very well suited. 1

1 Introduction The finite element-boundary integral (FE-BI) method has been quite popular and extensively applied to many scattering and radiation problems. The method [1] combines the geometrical adaptability and material generality of the FEM with the rigorous boundary integral (BI) for truncating the mesh. Nevertheless, although "'exact", the FE-BI leads to a partly full and partly sparse system which is computationally intensive for large boundary integrals. When the boundary is rectangular or circular, the FFT can be used to reduce the memory and CPU requirements down to N log N [1][2]. However, in general, the boundary integral is not convolutional and in that case the CPU requirements will be O(NIVb), where iVb denotes the unknowns on the boundary. The application of the fast multipole method enables the computation of the boundary matrix-vector product using O(A1Vb5) operations per iteration [3]. Multilevel schemes can also be employed to reduce the operation count down to O(Vb 33) [4]. In this paper, we apply three different versions of the Fast Multipole Method (FMM) to reduce the storage and computational requirements of the boundary integral when the latter is used to terminate the finite element mesh. By virtue of its low operation count, the application of the FMM results in substantial speed-up of the boundary integral portion of the code independent of the boundary shape. Each version of the FMM is associated with inherent approximations and a goal of this paper is the evaluation of these approaches in terms of CPU requirements and accuracy. 2 Problem Definition and Formulation As an application of the proposed study we consider the scattering by a cavity-backed aperture shbwn in Figure 1. The FE-BI formulation for this problem was already outlined in [2] and results in the system [AInt] [Ax] 0 } {q} } } ( 1 L [AT] [B] \ | { ] ~ { inc } J () whose typical form is given in Figure 2. For Hz-incidence, the vector {q} represents the magnetic field at the nodes within the groove and on the boundary whereas {4} is proportional to the magnetic current (or tangential electric field) on the boundary cells. By virtue of the finite element method, the matrices [AInt] and [Ax] are sparse and thus the corresponding matrix-vector products are implemented using 0(N) operations. However [B] is a full sub-matrix and thus O(Nb) operations are needed to perform the product [B]{4} with Nb denoting the number of nodes on the cavity aperture. Consequently, in an iterative solution, this matrix-vector product becomes the computational bottleneck. To reduce the operation count we can employ the FMM procedure for implementing the products [B]{I}. Next we examine three versions of the FMM to accelerate the boundary integral matrixvector product computation. Specifically, the exact FIMM [5][6], a windowed FMM [7] and an approximate FMM [8] are examined. 2

Jd. M. i A y (Boundary/aperture) / rI d: + (^pec w FEM domain Figure 1: Geometry of the groove recessed in a ground plane 3 FMM Techniques As stated above, the FMM is a fast method for calculating the mnatrix-vector products resulting from a discretization of a boundary integral. For the system (1) the pertinent matrix vector product is [B]{}i obtained from the discretization of the integral equation (for TE incidence) H-()H pI e r (2) 2H -- - Ikod1' - -, F where Hz denotes the incident field on the aperture r and HFEM is the magnetic field in the FEM domain evaluated just below the aperture. Since Hs=- 1 j()H2)(koLj- /)dl' p r (3) 2 r is the scattered field it follows that 4' = ko}oMizI = koYoE, and En is the x-component of the electric field on the aperture. It is readily identified that (2) enforces continuity of the magnetic field across the aperture (i.e. across the interior FEM domain and the exterior region). The continuity of the corresponding tangential electric field components is enforced by choosing the excitation of the FEM domain to be the negative of the magnetic current across the aperture just below F as described in [2]. In the next few subsections we examine the evaluation of the integral (3) using the various forms of the FMM. By using consistent notation and appropriate explanation of the procedures, this exposition provides the first comparative look at the characteristics of the FMM techniques. Consequently, one is able to make conclusions on their CPU performance and accuracy compromises. These conclusions are later supported y numerical data referring tby numerical data referring to the example of scattering by a large aperture groove. 3

0 II I I I '2%. ' h.I '~.~ ~..~ 20 X '. "h 40 ZL 60 260. X 151 80 K BI System 80 100 nF. 10........................ --- 120o 20 40 60 80 100 120 column Figure 2: Typical form of the FE-BI system matrix arising from the scattering/radiation problem of a groove in a ground plane. 3.1 The Fast Multipole Method In accordance with the FMM (see Figure 4), the Nb boundary unknowns introduced for the discretization of (3) are subdivided into groups with each group assigned Mb unknowns. Thus, a total of Lb?.- groups are constructed. The key step in all FMM procedures is Mb to rewrite the integral (3) as a product of terms each being a function of ~ (observation point) or ~' (integration point) but not both. In this manner, the evaluation of the integral is carried out by considering the group to group interactions separately from the intergroup interactions. Beyond the math, this breakdown of interactions/operations can be viewed in the context of the manager-worker model. Basically, we can view each group as managed by the center element with the workers comprising the elements of the group. Communication/interaction among the groups takes place through the managers which in turn interact with the group elements. However, this type of model is based on certain simplifications/decompositions of the original boundary integral. The decompositions reduce the direct interdependence of each group member with the other elements belonging to different groups and this is at the heart of the CPU speed-up afforded by FMM. However there are inherent approximations as part of the group decomposition process which must be understood in order to assess the accuracy of each FMM algorithm. Below we discuss the decomposition procedure employed in three of the fast multipole methods. 4

Basis elements / / Test element / 0- C \ xI P r f - -. - -—. - Group center Group r \ K ) ^/, (Test group) /.\ Group ( Source element H __- h (Source group) Y - Global origin. Group center Figure 3: Computation of the boundary integral matrix vector product using exact FMM 3.2 Exact FMM To achieve the decomposition of (3) into a product of functions in p and p', we first invoke the addition theorem to rewrite the Hankel function as Q/2 H oI (kolp7- + PI- pi I) ' E J (kolp, — l)H(2)(kopl,)eJl("'t-.'oPP) p,,. > I - p-I (4) n=-Q/2 where Pa,' denotes the distance between the centers of the I and 1' groups as illustrated in Figure 3 and 4. Also, qll and OGp are the angles between the vectors pill and p,- P and the x-axis respectively. The source and observation points p', and N7 have their origin at the center of the 1' and I groups, respectively, while p' and o are measured from the origin. Typically, the semi-empirical formula Q/2 = koD + 51n(koD + r) (5) is used to truncate the sum (4), where D is the diameter of the circle enclosing the groups In general, Q/2 = hMb, assures convergence. Next we introduce the Fourier integral of the Bessel function Jn(ko I p - I) = ek ( - 2)d (6) and in conjunction with (4) we can now rewrite (3) as Hk(p) = - j V' ( )Till'()e-Jke d (7) 47 J7 T 5

1 i I / Group P - N_,, Group Group,'Group y N _J I I K+ - I b d mn K,' K-Il P min Far group | P I< d min = Near group Figure 4: Computation of the boundary integral matrix vector product using exact FMM where k = ko(x cos q + y sin $) is measured from the x-axis. In this, V,() = M(e d' (8) is identified as the far-field pattern of the source group and Q/2 T1l,() = E H,2)( kopl, )e-jn(-L't+7r/2) (9) n=-Q/2 is referred to as the translation operator providing the group-to-group (I to 1') interactions. From (7)-(9), we observe that we have accomplished the decomposition of the integral (3) into terms which separate out the dependence on p and ~'. The final evaluation of Hz proceeds by discretizing the integral over ( to yield the expression Hs 'p) - k Y~) e (10) q=1 which is the radiated fields from some location in the source group 1' to a point within the receiving group 1. Note that Ao = 2xr/Q indicates the angular spacing between the propagation vectors of plane waves emanating from a group. Thus q- = qAo, q = 1... Q, whereas kq = ko(x cos qq + y sin qq). As mentioned earlier, the number of plane wave directions is set equal to twice the number of elements in the group (Q = 2Mb), thus, satisfying the Nyquist sampling theorem with respect to the integration over q. Given the above steps, the exact FMM procedure for carrying out the matrix vector product can be summarized as follows: 1. Compute pattern of the source group (aggregation). Mathematically, this corresponds to evaluating Vi'(Obq) given in (8). The evaluation of VI/(Oq) for a single source group and at a single direction requires NIb operations, corresponding to the number of elements in the group (the integration over the line segment is performed as a summation). Consequently for Lb groups and Q directions for each group, the operation count is Q MbLb. 6

2. The next step is to employ the translation operator to evaluate the pattern of a source group at the center of the test group. Mathematically, this operation amounts to computing the coefficient Ai(0q) = I/l,(O)Tl,(q). The evaluation of Ai(dq) involves an operation count of QL~, where again Lb denotes the number of groups and Q is the number of directions. 3. Finally, at the receiving group, the fields are redistributed (disaggregation). Mathematically, this amounts to computing the expression HS(T) as given in (10). Evaluating the sum at a single point requires Q operations. Thus for Lb groups, each containing MIb unknowns, the operation count is QLbMb. From the above we conclude that the total operation count of the above three steps is ClQMlbLb + C2QL2. Also, the near field (by this we mean that groups in the near vicinity of each other are treated using the standard method of moments procedure) operation count is of O(Nb4Ib). On choosing, Q ~ Mb, sufficient to achieve convergence, the operation count of the three steps reduces to Cl MbNb + C2.- On setting M b N/b, implying Lb = vNb (an optimal choice) the final operation count is NJ'5 and this should be compared to the usual NV operation count of the direct CG or LU solvers. The reduction of the operation count from O(Nb) down to O(N'b5) is indeed dramatic. An appreciation of the CPU reduction can be acquired by setting, for example, Nb = 2000 which is a relatively small number of elements. However, further improvements can still be achieved by nesting groups leading to the multi-level FMM [9]. 3.3 Windowed FMM In the exact FMM, the translation operation between groups assumed isotropic radiation. However, it is suggestive that the groups would interact strongly along the line joining them and less so in other directions. Indeed, it was shown in [7] that the translation operator could be contemplated as composed of a geometrical optics (GO) term (along the line joining the source and test group) and two diffraction terms associated with the shadow boundaries of the GO term. To illustrate the validity of this concept, we plot in Figure 5 the translation operator for different group separation distances along the groove of width 50A. For this example, the number of unknowns on the boundary was 750, resulting in 27 groups. As seen, the "lit" region of the translation operator narrows as the group separation distance is increased, eventually displaying the predictable sinc function behavior for large group separation distances. The tapering off of the translation operator from a value oscillating around 2 down to zero for larger q - opil values is characteristic of the geometrical optics plus diffraction terms in the context of traditional high frequency methods. We may also comment that-this high frequency model enables the identification of a lit region even for groups which are not widely separated (for example, see Figure 5 for the translation operator between groups 1 and 3. The key characteristic of the windowed FMM is the exploitation of the diminished value of Twlr($) for large q - 11,. Basically, in the windowed FMM, the computation of T(ll(0) for these angles is avoided altogether. This can be accomplished by multiplying Til(o) with the filter (windowing) function VVuI(bq) = { 1 (I|q - 11, I < as) 1' ( = -0 - q-0- s )2 (| - P'[ > is) 7

(11) where =sin-(12) 2koPtt, and c is a taper factor to be specified. Note also that I3s was selected to provide a larger bandpass window when p1l\ is smaller as dictated from high frequency analysis. The discretized plane wave expansion can now be written as Hs = - O Vlt ( q ) Tr ( q) VI/ ( ) e- q T7 (1 3) q=l By taking into account only the non-zero sector of Wll(0), the operation count of the translation process is now reduced to C3L' ~ - with the corresponding total operation count b given by Cl.MVbN + C4 ". Grouping the unknowns into NV/3 elements per group, results in a total operation count of O(N4/3). This should be compared with the O(N3/2) operation count of the exact FMM. The computation of the boundary integral matrix vector product by employing the windowed FMM is depicted pictorially in Figure 6 illustrating that the filter function has the effect of eliminating plane wave interactions at directions away from the line joining the interacting groups. 3.00 - ~ - Groups 1 and 3 -4 2.50 ft / - 54;P,=5 2- 0 Q=54;p,=15 1.00-.00( -180 -120 -60 0 60 12 180.- O (degrees) Figure 5: The Translation operator for different groups on the boundary of a 50A wide groove; 750 BI unknowns; 27 groups 8

1 /=0 ' iI.~' Group NT iIVUP I / / ^ VIVU b _' dmin K+ K-i O " ----" x —J...'1 " --- —' Z X!I I x Window ~' ';x I> dmin i/ ', _min_ Far group I P,< d min Near group Figure 6: Computation of the boundary integral matrix vector product using windowed FMM 3.4 Fast Far Field Algorithm This is an approximate version of the FMM since the algorithm is based on introducing the large argument approximation of the Hankel function. That is, the approximation U(2) j e-_jk 'pj, - p Ho-( ko )P- p e~-jkop;L p klm A — j ''jk0p-' (14) V o pkop' 1 is used. As shown in Figure 7, pill is the distance between the center of the test group I and the center of the source group l'; pnI is the distance between the nth source element and its group center and pim is the distance between the mth test element and its group center. Pn, - / Group Group,,Group Group K+l K,' K-l N,.- 7... d min N -< e9 rnun --- - ----- / I I / I I I I \ / I I I -4 \ n ': m Figure 7: Computation of the boundary integral matrix vector product using the FAFFA The introduction of the large argument expansion necessitates that the FMM procedure be used only for groups which are very well separated. However, (14) allows for the immediate decoupling of the test-source element interactions, thus, enabling the computation of the 9

matrix-vector product for far-field groups with a reduced operation count. This is illustrated below by going through the three steps of the FMM 1. The aggregation of source elements in a single source group now involves.11b operations, corresponding to the number of elements in the source group. Specifically, Nib Vil= EMPe-n0 l (15) j=l and since the above aggregation needs to be done for all source groups, the operation count becomes O( —b 1iIb) ~ O(Nb), where Nb represents the total number of groups. Also this operation, being dependent only on the test group rather than the test element, needs to be repeated for all Q = Nb test groups leading to a total operation Nib count of O(i-b-) for aggregation. It should be noted that use of the large argument expansion, rather than the addition theorem for the Hankel function, results in the aggregation sum being a function of the test group also (Vis) unlike the exact FMM where the aggregation sum is a function of source group only (V1'(q)). Thus, the technique by which the Exact FMM and the FAFFA reduce the operation count differ in the fact that while in the exact FMM the aggregation sum is characterized by a source group (') and a direction (o) which is not interwined with the test group direction, the aggregation sum in the FAFFA is characterized by the source group (1') and the test group (1). 2. The main advantage of FAFFA is due to the faster computation of the translation operator. We have Al/m = T11tVIlt (16) where in the FAFFA the translation operator simplifies to e -jkop~p This should be compared to the sum (9) for the exact FMM. Clearly, (17) needs to be N2 done only at the group level and involves 0(A-b) operations for all possible test and source group combinations, making it the least computationally intensive step. 3. The disaggregation or redistribution process is again the operation ) y Nb/Mb H()- o o2 E Ale-Jkop.pm (18) 2 =l Since this operation involves only the source group instead of the source element, it needs to be done for each source group, implying 0(- -) operations to generate a single row of the matrix-vector product. To generate Mb rows, corresponding to a test group, the operation count would be O(Nb). With -N-b test groups, the operation count is MbConsolidating the above three steps for the FAFFA algorithm we have Consolidating the above three steps for the FAFFA algorithm we have Op.count,, Ci NbMb + C2 - (19) Mlb 10

where the first term refers to the operations associated with the near field terms. As before,,lib = /Vb and the total operation count is O~ N,'5. While, the operation count for this algorithm could be further reduced down to AN'33 by performing the process of "interpolation" and "anterpolation" as described in [8] for very large objects, we found that the accuracy deteriorated for the considered applications. Hence only the O, Vb J'5 version was used. 4 Logic Flow The operation counts described in the previous section for the various algorithms are illustrated with the help of flow diagrams and sections of code from the computation of the matrix vector products for the far groups. Figures 8 and 11 depict the flow diagram and code for computing the matrix vector product in the exact FMM. It is seen in Figure 11 that each of the aggregation, translation and disaggregation operations consists of a single multiplication which is described below * The aggregation operation consists of the product of an entry of the trial vector represented as (Dum(J) in Figure 11) with an aggregation factor, represented in Figure 11 for the Jth element and Kth direction as SrcGc(J,K). This is given by SrcGc(J,K) = AJe kOPCOsO+1sinK)PJG where Acj is the length of the Jth discretization element, O4K is the Kjth radiation direction and p'jG is the direction vector of the Jth element, measured from the center of the group(JGr) it belongs to. The result of the aggregation operation yields a term characterized by only the source group and radiation direction (represented in Figure 11 as V(JGr,K)). * The translation operation involves the multiplication of the aggregation sum, V(JGr,K), with a translation factor, represented in Figure 11 for the IGrth test group, JGrth source group and Kth radiation direction by Trans(IGr,JGr,K). This is given by K/2 Trans(IGr,JGr,K) = E e-jn(H-nIGr.JGr+ )H7(2)(koriGrJGr) where O-K is the Kth n=-K/2 radiation direction, rIGr,jGr and OIGr,JGr are the distance and angle between the IGrth and JGrth groups. The result of the translation operation yields a term dependent only on the test group and radiation direction (represented in Figure 11 as GrGr(IGr,K)). * The disaggregation operation involves the multiplication of the translation sum, GrGr (IGr, K), with a disaggregation factor, represented in Figure 11 for the Ith test element and the Kth radiation direction by TestGc(I,K). This is given by TestGc(I,K) = e-jko(cosK+ sinK) -p-1G where PlGr is the direction vector of the Ith element measured from the center of the group(IGr) it belongs to. The result of the disaggregation operation yields a term dependent only on the test element alone and is the contribution to the Ith entry of the product vector. The windowed FMM differs from the exact FMM in the translation phase and this is illustrated in Figures 9 and 12. These figures illustrate that the windowed FMM achieves its reduced operation count by eliminating some of the directions in which plane wave interaction takes place. The innermost loop in the translation phase has an operation count which is a constant (15-25 in our simulations, 20 in [7]) and is a significant reduction from the corresponding operation count in the exact FMM. 11

The technique by which the FAFFA achieves its speed-up is depicted in Figures 10 and 13. It is seen that the FAFFA "recycles" the plane wave spectra of the source group. For a given test group, the aggregation and translation operation are performed only once for each source group, necessitating that only the disaggregation operation needs to be performed for each individual element of the test group. Similar to the exact FMIM, the aggregation, translation and disaggregation process consist of a single multiplication. However the factors used in the three processes and the method by which the reduced operation count is achieved are different. * The aggregation operation again consists of the product of an entry of the trial vector with an aggregation factor, represented in Figure 13 for the Jth element and IGrth test group as SrcGc(IGr,J). This is given by SrcGc(IGr,J) = Je-jikO JGr, IGr'-JJGr where AJ is the length of the Jth discretization element, pJGrGr is the unit vector along the line joining the source and test groups while PJJGr is the vector along the line joining the source element with its group center. Thus, an aggregation sum is formed for each combination of source and test groups. * The translation operation involves the multiplication of the aggregation sum with a translation factor, represented in Figure 13 for the IGrth test group and JGrth source group by Trans(IGr,JGr) and given by Trans(IGr,JGr) = e,JGrIGr Again, the ko PJGr,IGr translation operation needs to be done for each pair of test and source groups. * The disaggregation operation involves the multiplication of the translation sum, GrGr(JGr), with a disaggregation factor, represented in Figure 13 for the Ith test element and the JG'rh source group by TestGc(I,JGr). This is given by TestGc(I,JGr) = e-ojlOJk Gr,IGrPIGr, where PIGrI is the vector along the line joining the test element with its group center. It should be noted that to compute the interactions between a pair of groups, the aggregation and translation need to be done only once and thus the crux of the FAFFA is indicated with highlighted sections of code in Figure 13. 5 Results A computer code based on the above FMM formulations and utilizing the conjugate gradient solver was implemented and executed on a HP 9000/750 workstation with a peak flop rate of 23.7 MFLOPS. The geometry considered was the rectangular groove shown in Figure 1. Table 1 compares the execution time and RMS error [10] of the standard FE-BI to the FEExact FMM, FE-FAFFA and the FE-Windowed FMM (FE-WFMM) for grooves of widths 25A, 35A and 50A. The depth of the groove was 0.35A with a material filling of 6r = 4 and [lr=1 and was illuminated at normal incidence. The data reveal that the FE-FMMExact offers almost a 50% savings in execution time with almost no compromise in accuracy. While the FE-FAFFA is the fastest of the three algorithms, the RMS error was substantially higher (>1 dB). If the maximum tolerable RMS error is set at 1 dB'[10], the FE-Windowed FMM is the most attractive option since it meets the error criterion and is only slightly slower than the FE-FAFFA. Of interest is the comparison of the residual error as a function of the 'To our experience, this error criterion gives a calculated pattern that is visually in excellent agreement with the exact. Typically, a 3 dB error criterion can lead to large deviations between calculated and exact data 12

CPU Time for BI (Minutes,seconds) Groove Total BI Width Unknowns Unknowns FE-BI (CG) FE-FAFFA FE-FMMIEact FE-WFMM 25A 2631 375 (8,48) (3,26) (5,25) (4,13) 35 A 3681 525 (16,34) (5,55) (10,31) (7,22) 50A 5256 750 (45,1) (14,31) (26,18) (16.10) RMS error (dB) Groove Width FE-FAFFA FE-FMMxat FE-WFIMM 25A 1.12 0.0752 0.6218 35A 1.2 0.1058 0.721 50A 1.36 0.1123 0.843 Table 1: CPU Times and RMS error of the hybrid algorithms number of iterations in the CG solver. Such a comparison is shown in Figure 14 and is seen that the curves for the FE-BI and the FE-FMMExact overlap to graphical accuracy whereas the FE-WFMM shows a very small deviation from the exact result. Thus, the hybridization of the FMM does not have any adverse effect on the condition of the.FE-BI system. The performance of the hybrid algorithms at a more stressing angle of incidence is depicted in Figure 15. We observe that the RMS error follows the same trend as for normal incidence illumination. For this example calculation the width of the groove was 10A. However, even for this smaller size aperture the scalability of the speed-up is maintained. The employed near group radius was IX and thus the matrix vector products for groups separated by a distance less than a wavelength was computed using the exact method of moments procedure. Smaller near-group distances can be employed to reduce the CPU time even further and near group distances down to 0.3A have been found to yield sufficiently accurate results. 6 Three dimensional applications of these techniques A three dimensional application for these hybrid algorithms is in the computation of the scattering from an aircraft nose radome. Nose radomes, which are hollow structures made of thin dielectric material, serve as enclosures for antennas and are generally pointed in shape to reduce aerodynamic drag. The performance of a radome is generally described by parameters [11] which include the far-field radiation pattern, power transmittance, boresight error and boresight error slope. Approximate methods for treating the propagation of the plane wave through a radome include high frequency techniques which typically consider the radome to be locally plane and omit guided waves as well as interactions between the radome sections. Also, treatments of the higher order interactions between the radome and the antenna have so far been of little attention. Clearly, a more accurate approach is to employ an exact analysis method such as the moment method technique [12] which is capable of including all first and higher order phenomena. However, due to the traditional 0(N2) storage and 0(N3) CPU requirements the method of moments approach can only be 13

used for small radome structures. However, as described in previous sections, fast algorithms can be emploved to considerably alleviate storage and execution time bottlenecks. The thin dielectric radomes were modeled using the electric field integral equation [13] and the resistive boundary condition [14]. The fast solver described in [8] was used to compute matrix-vector products. To validate the typical nose radome geometry, available scattering data for the Von Karman radome were used for reference. Although, this is a BOR structure, it should be remarked that our formulation was not specialized to this class of geometries. The generating curve for this radome (see Figure 16) is given by [15] r = v {cos ( - - sin [2cosl (- 20) r cs — (20) 2^7r I/ \ L / 2 L \ L/J with r 2 + y2 (21) and D is the diameter of the radome base whereas L is the length of the radome. The thickness of the dielectric shell was h = 0.05m and the dielectric constant was Er = 4.0. The resistive sheet condition (R = ko( e- )h) gives an equivalent resistivity of -j1.061. Discretization of this geometry at twelve points/wavelength, results in 187 nodes, 362 triangular facets and 549 edges (unknowns) as shown in Figure 16(b). The bistatic RCS with nose-on incidence is shown in Figure 16(c) and (d). Also shown in these figures is the comparison with a surface integral formulation [12]. As seen there is very good agreement between the two codes even though the RCS levels are very low at angles away from forward scattering. The incorporation of the fast algorithm alleviates the limitation on the size of the bodies analyzed as can be inferred from the results shown in Figure 17. In this figure a metallic nose radome 10A long with a circular base of diameter of 1A is analyzed. Again, a body of revolution was the geometry of choice because of the availability of validation data from CICERO [16]. Discretization of this radome results in 3120 nodes and 6204 triangular facets. Comparison of the backscatter RCS from the two codes (see Figure 17) reveals very good agreement for the 00 polarization while the 0o polarization shows some deviation particularly at the low RCS levels. The near group radius employed in the FMM solution was 2A and the number of unknowns (interior edges) was 9324. Solution times were 12 seconds/iteration for the FMMV radome code while the conjugate gradient solver employing the full, unreduced system would need an estimated 60 seconds/iteration on a HP 9000/750 with a peak flop rate of 23.7 MFLOPS. Estimates from scaling smaller problems indicate that LU decomposition would require an unrealistic 0.67 GB of memory for storage of the full matrix. An important advantage of integral equation analysis for the nose radome problem is that it avoids modeling the free space between the radome and nose antenna thus reducing the computational cost. Results which include the interactions between the nose-antenna and the radome are given in Figure 18. Unlike the Von Karman radome which was generated by an algebraic equation, the radome in Figure 18 was generated from user specified elliptical or circular cross sections, which were then interpolated with cubic splines to generate the cross section at any intermediate point along the axis. Figure 18 depicts the o09 backscatter cross-section of the dielectric nose radome alone and in the presence of a circular plate at the base of the radomne (nose antenna). The antenna is inclined with respect to the plane perpendicular to the radome axis. The effect of the antenna on the RCS is pronounced at incidence angles close to specular directions of the antenna plate and this was to be expected. 14

This is particularly seen in Figure 18 for the backscatter RCS patterns perpendicular to the plane with respect to which the antenna is inclined. Specifically the large peak observed in Figure 18 at 0 = 150~ when the antenna is inclined at 60~ with respect to the x-z plane is due to the large specular return at that angle of incidence. In all the above cases, the dielectric material was thin and could be modeled with the resistive boundary condition. Thicker dielectrics and frequency selective surfaces can be treated by using the finite element method to discretize the dielectric region and employing the fast solver on the boundary integral used to terminate the finite element mesh. 7 Summary We examined the computational performance of the finite element-boundary integral method using the fast multipole method for evaluating the BI matrix-vector products. Three different versions of the fast multipole method were used and it was shown that a considerable reduction in CPU time could be achieved with even further reductions for larger problem sizes. The FE-WFMM provided the best compromise in terms of speed and accuracy. A version of the fast multipole method was employed to compute the electromagnetic scattering from electrically large nose radome structures. The application of this technique while preserving to a great extent the accuracy of the moment method, significantly alleviates the limitations on the size of the bodies analyzed. We analyzed nose radome shaped structures which were composed of metal or thin dielectric shells and in the case of the latter the resistive boundary condition was employed. The hybridization of a technique such as the finite element method will enable the treatment of a wider class and thickness of materials and could thus handle radomes with frequency selective surfaces. References [1] J.L. Volakis, J. Gong, and A. Alexanian. A finite element boundary integral method for antenna RCS analysis. Electromagnetics, 14(1):83-85, 1994. [2] J.-M. Jin and J.L. Volakis. TM scattering by a inhomogeneously filled aperture in a thick conducting plane. IEE Proceedings, Part H, 137:153-159, June 1990. [3] S. S. Bindiganavale and J. L. Volakis. A hybrid FEM-FMM technique for electromagnetic scattering. IEEE Transactions on Antennas and Propagation, 45(1):180-181, 1997. [4] N. Lu and J. M. Jin. Application of the fast multipole method to finite elementboundary integral solution of scattering problems. IEEE Transactions on Antennas and Propagation, 44(6):781-786, 1996. [5] V. Rokhlin. Rapid solution of integral equations of scattering theory in two dimensions. Journal of Computational Physics, 86(2):414-439, 1990. [6] R. Coifman, V. Rokhlin, and S. Wandzura. The fast multipole method for the wave equation: A pedestrian prescription. IEEE Antennas and Propagation Magazine, 35(3):7-12, 1993. 15

[7] R.J. Burkholder and D.H. Kwon. High-frequency asymptotic acceleration of the fast multipole method. Radio Science, 31(5):1199-1206, 1996. [8] C.C. Lu and W.C. Chew. Fast far field approximation for calculating the RCS of large objects. Microwave and Optical Technology Letters. 8(5):238-241, 1995. [9] J.M. Song and W.C. Chew. Multilevel fast multipole algorithm for soving combined field integral equation of electromagnetic scattering. Alicrowave and Optical Technology Letters, 10:14-19, 1995. [10] S. S. Bindiganavale and J. L. Volakis. Guidelines for using the fast multipole method to calculate the RCS of large objects. Microwave and Optical Technology Letters, 11(4):190-194, 1996. [11] Y.T. Lo and S.W. Lee ed. Antenna handbook. Van Nostrand: New York, 1988. [12] E. Arvas and S. Ponnapalli. Scattering cross section of a small radome of arbitrary shape. IEEE Transactions on Antennas and Propagation, 37(5):655-658, May 1989. [13] S.M. Rao, D.R. Wilton, and A.W. Glisson. Electromagnetic scattering by surfaces of arbitrary shape. IEEE Transactions on Antennas and Propagation, 30(3):409-418, May 1982. [14] T.B.A. Senior and J.L. Volakis. Approximate boundary conditions in electromagnetics. IEE press: London, 1995. [15] J.D. Walton ed. Radome engineering handbook. Marcel-Dekker: New York, 1970. [16] J.M Putnam and L.N. Medgyesi-Mitschang. Combined field integral equation formulation for axially inhomogeneous bodies of revolution. Technical Report QA003, McDonnell Douglas research laboratories, St.Louis, MO 63166, December 1987. 16

Aggregation Translation Disaggregation Aggregation op. count (M) (M) () = NM Translation op. count (M) 2 (M) (M)(M = (MN Disaggregation op. count (M) (M) () = NM Figure 8: Sequence of operations to be performed in the Exact FMM 17

Translation op. count Set direction counter = I (Start U with the first radiation direction) ( ( ) No s windowed translaton operator > 0 or this direction? 15-25 N _N esOs Ops Perform translation operation for a single pair of source and test groups and for a single direction ( Operation count = I ) ncrement direction counter by one Is direction counter > No number of radiation directions (M)? Yes Increment source group counter by one Is source group counter > N total number of groups Yes Increment test group counter by one Is test group counter > NO \ total number of groups s N A? / "yYes End translation. Go to disaggregation. Figure 9: Sequence of operations to be performed in the translation process of the windowed FMM 18

Total op. count /k\N N N X\.fN\ ls lest element counter > NO (M +1) M + tal number of elements Mroup (M) = M N) ( M ( 2 Yes | Increment test group counter 4 4 by one Aggregation Translation Disaggregation Is test group counter > No total number of groups \ (N)? Yes Matrix-vector product done Figure 10: Sequence of operations to be performed in the FAFFA 19

c NGr - Number of groups; IQAngles - Number of radiation directions c DPhi - Angular spacing between radiation directions c NElGr(NGr) - Array containing number of elements in each group c GetGlobal - function which gets the global element # given a group c number and local element number c DMin - Minimum distance beyond which two group are treated as far c groups c Distance - Function which returns the distance between groups c Dum - Vector multiplying the matrix - in the CG algorithm c will be the search and residual vectors c AX - Array which is the matrix-vector product c SrcGc,Trans & TestGc - aggregation, translation & disaggregation factors c respectively described in the text IQAngles = 2*NGr DPhi = 2*Pi/IQAngles c Aggregation - Operation count O(NM) do JGr =,NGr do K = l,IQAngles do JEl =,NElGr(JGr) J = GetGlobal(JGr,JEl) if (S.eq.'*') then N M V(JGr,K) = V(JGr,K) + Dum(J)*conjg(SrcGc(J,K)) * else V(JGr,K) = V(JGr,K) + Dum(J)*SrcGc(J,K) endif enddo enddo enddo c Translation - Operation count O(N^2/M) | do IGr = l,NGr do JGr = l,NGr if (Distance(IGr,JGr).gt.DMin) then do K = l,IQAngles if (S.eq.'*') then N N GrGr(IGr,K) = GrGr(IGr,K) + conjg(Trans(IGr,JGr,K))*V(JGr,K) __ M_ else M M \ GrGr(IGr,K) = GrGr(IGr,K) + Trans(IGr,JGr,K) * V(JGr,K) endif enddo endif enddo enddo Ic Disaggregation - Operation count O(NM) I do IGr= I,NGr do K = 1,IQAngles do El = 1,NEIGr(IGr) I = GetGlobal(IGr,IEl) if(S.eq.'*')then N M AX(I) = AX(I) + GrGr(IGr,K)*conjg(TestGc(I,K)) M 1 else M | AX(I) = AX(I) + GrGr(IGr,K)*TestGc(I,K) endif enddo enddo enddo Figure 11: Code indicating the computation of the matrix-vector product in the Exact FMM 20

c The symbols in this section of code represent the c same quantities as in the code for the exact FMM c Translation - Operation count O(N^2/M^2) do IGr = l,NGr do JGr = 1,NGr if (Distance(IGr,JGr).gt.DMin) then do K = 1,IQAngles if (S.eq.'*') then if (abs(Trans(IGr,JGr,K)).eq.0) then continue else |GrGr(IGr,K) = GrGr(IGr,K) + conjg(Trans(IGr,JGr,K)) * V(JGr,K) N N ____endif__ - - else M M if (abs(Trans(IGr,JGr,K)).eq.0) then continue else GrGr(IGr,K) = GrGr(IGr,K) + Trans(IGr,JGr,K) * V(JGr,K) | endif endif enddo endif enddo [ enddo Figure 12: Code indicating the computation of the matrix-vector product in the translation phase of the windowed FMM 21

c The symbols in this section of code represent the c same quantities as in the code for the exact FMM c ITestGrNew is a counter which is set to 1 if a test group c is "new". This means that since the aggregation and c translation operations involve only the test group c rather than the test element, they need to be done c only once (for the first element in the test group). c For the rest of the elements in the test group only c the disaggregation operation needs to be done (corresponds c to ITestGrNew = 0). do IGr = l,NGr ITestGrNew = 1 do IEI = 1,NEIGr(IGr) I = GetGlobal(IGr,IEl) do JGr = l,NGr if (Distance(IGr,JGr).gt.Dmin) then if (ITestGrNew.eq. ) then I V = (0.,0.) do JEl = 1,NE1Gr(JGr) J = GetGlobal(JGr,JEl) M if(S.eq.'*') then V = Dum(J)*conjg(SrcGc(IGr,J)) + V N else M eV = Dum(J)*SrcGc(IGr,J) + V endif N N enddo M M if (S.eq.'*') then M M GrGr(JGr) = conjg(Trans(IGr,JGr))*V else GrGr(JGr) = Trans(IGr,JGr)*V endif endif I if (S.eq.'*') then AX(I) = AX(I) + GrGr(JGr)*conjg(TestGc(I,JGr)) else AX(I) = AX(I) + GrGr(JGr)*TestGc(I,JGr) endif endif enddo ITestGrNew = 0 enddo enddo Figure 13: Code indicating the computation of the matrix-vector product in the FAFFA 22

10~ 0 0 o) en ~) ac 0 100 200 300 400 500 600 Iteration number Figure 14: Convergence curves for the hybrid algorithms for the groove of width 25A 23

30 20 m 10 -2 o O *m.-20 -20 pec pec x w =10 -30 (a) 0 30 60 90 120 Observation angle (deg.) 150 180 (b) RMS error (dB) Groove Width FE-FAFFA FE-Exact FMM FE-WFMM 10A 0.7627 0.1621 0.3291 (c) Figure 15: Scalability of the hybrid techniques to smaller problems (a) Problem geometry (b) Bistatic patterns (c) Error table 24

Y (a) (b) 3 I-N E2 U d cn.C0 1 3 (I '-'2 U.-a u ct.mI 0 30 60 90 120 Observation angle o (deg.) (c) 150 180 0 30 60 90 120 Observation angle 6 (deg.) (d) 150 18 Figure 16: (a) Von Karman radome geometry (2D section) - L = 1.1 m, D = 0.6 m (b) Meshed Von Karman radome (c) Bistatic RCS (0i = 0~) - 00 polarization (d) Bistatic RCS (0- = 00) - q4 polarization 25

25 v FMM Radome code o 5-. S' I 0 15 30 45 60 75 90 O \ ' II Observation angle (deg.) Figure 17: Backscatter RCS for a 10A long nose radome of diameter 1A 26

- /-\ (a) (b) 15 I | I- IResstive rad e 30 - - <- - - Antenna - 30 deg inclination Antenna - 90 deg inclination w -10 -25 -35 0 30 60 90 120 150 180 Observation angle | (deg.) (c) Figure 18: (a) and (b) Nose antenna - radome geometry - 6 is the angle of antenna inclination w.r.t x-z plane (c) aoo backscatter RCS (x-y plane) -35 w~~ -15.ln c e akcte C xypa 27