GLOBALLY CONVERGENT HOMOTOPY METHODS: A TUTORIAL

Layne T. Watson, University of Michigan

Technical Report 86-22

May 30, 1986

GLOBALLY CONVERGENT HOMOTOPY METHODS: A TUTORIAL

Layne T. Watson
Departments of Electrical Engineering and Computer Science, Industrial and Operations Engineering, and Mathematics
University of Michigan
Ann Arbor, MI 48109 USA

Abstract. The basic theory for probability one globally convergent homotopy algorithms was developed in 1976, and since then the theory, algorithms, and applications have considerably expanded. These are algorithms for solving nonlinear systems of (algebraic) equations, which are convergent for almost all choices of starting point. Thus they are globally convergent with probability one. They are applicable to Brouwer fixed point problems, certain classes of zero-finding problems, unconstrained optimization, linearly constrained optimization, nonlinear complementarity, and the discretizations of nonlinear two-point boundary value problems based on shooting, finite differences, collocation, and finite elements. A mathematical software package, HOMPACK, exists that implements several different strategies and handles both dense and sparse problems. Homotopy algorithms are closely related to ODE algorithms, and make heavy use of ODE techniques. Homotopy algorithms for some classes of nonlinear systems, such as polynomial systems, exhibit a large amount of coarse grain parallelism. These and other topics are discussed in a tutorial fashion.

1. An example. Consider a thin incompressible elastic rod clamped at the origin and acted on by forces Q, P and torque M (see Figure 1). The governing nondimensionalized equations are

    dx/ds = cos θ,   dy/ds = sin θ,   dθ/ds = Qx − Py + M,    (1.1)

    x(0) = y(0) = θ(0) = 0,    (1.2)

    x(1) = a,   y(1) = b,   θ(1) = c.    (1.3)

The cantilever beam problem, which is to find the position (a, b) of the tip of the rod given the forces Q ≠ 0 and P = 0, has a closed-form solution in terms of elliptic integrals. The inverse problem, where a, b, c are specified and Q, P, M are to be determined, has no similar closed-form solution. This inverse problem is ferociously nonlinear and extremely difficult. For large deformations, c = 6π for example, the rod is wound like a coil spring and its shape is very sensitive to small perturbations in Q, P, or M. Let

    v = (Q, P, M)ᵗ    (1.4)

and let x(s; v), y(s; v), θ(s; v) be the solution (dependent on v) of the initial value problem (1.1)-(1.2). Then an equivalent formulation of the nonlinear two-point boundary value problem (1.1)-(1.3) is to find a vector v such that

    F(v) = ( x(1; v) − a,  y(1; v) − b,  θ(1; v) − c )ᵗ = 0.    (1.5)
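Evaluating F(v) only requires integrating an initial value problem, which is routine; the difficulty lies entirely in solving F(v) = 0. The following Python sketch (not from the report; the tip data a, b, c and the trial v are arbitrary placeholders, and scipy's solve_ivp stands in for whatever initial value solver is at hand) shows the shooting residual (1.5):

    # Sketch: the shooting residual F(v) of (1.5) for the elastic rod.
    import numpy as np
    from scipy.integrate import solve_ivp

    a, b, c = 0.3, 0.4, 6 * np.pi           # prescribed tip data (hypothetical)

    def rod_rhs(s, u, Q, P, M):
        """Right-hand side of (1.1); u = (x, y, theta)."""
        x, y, theta = u
        return [np.cos(theta), np.sin(theta), Q * x - P * y + M]

    def F(v):
        """Endpoint of (1.1)-(1.2) minus the target (a, b, c)."""
        Q, P, M = v
        sol = solve_ivp(rod_rhs, (0.0, 1.0), [0.0, 0.0, 0.0],
                        args=(Q, P, M), rtol=1e-10, atol=1e-12)
        return sol.y[:, -1] - np.array([a, b, c])

    print(F(np.array([1.0, 1.0, 1.0])))     # residual at a trial v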

Figure 1. Elastic rod.

This particular formulation is based on shooting. Nonlinear systems different from (1.5) could be derived based on multiple shooting, finite differences, polynomial or spline collocation, spectral methods, or Galerkin methods. The best way to approximate the solution to (1.1)-(1.3) is not the issue here. The issue is how to solve the particular given nonlinear system of equations (1.5). Newton and quasi-Newton methods, even the very best such as HYBRJ from Argonne's MINPACK package [32], fail dismally when applied to (1.5). A simple continuation scheme, such as tracking the zeros of

    ρ_w(λ, v) = λ F(v) + (1 − λ)(v − w)    (1.6)

as λ is increased from 0 to 1, also fails. The zero curve of ρ_w(λ, v) in (1.6) emanating from (0, w) diverges to infinity. In fact this divergence also occurs if F(v) in (1.6) is replaced by D F(v) for any diagonal orthogonal matrix D. Nonlinear least squares algorithms attempting to minimize F(v)ᵗF(v) also quickly fail, since there are too many local minima v that are not global minima (F(v) ≠ 0).

Consider the function (known as a homotopy map) ρ: E³ × [0, 1) × E³ → E³ defined by

    ρ(d, λ, v) = ρ_d(λ, v) = ( x(1; v) − [λa + (1 − λ)d₁],
                               y(1; v) − [λb + (1 − λ)d₂],
                               θ(1; v) − [λc + (1 − λ)d₃] ).    (1.7)

ρ_d is a homotopy (the technical term from topology) because it continuously deforms one function (in this case ρ_d(0, v)) to another function (in this case ρ_d(1, v) = F(v)). Note that ρ(d, λ, v) is C² and that its Jacobian matrix

    Dρ(d, λ, v) = [ −(1 − λ)I,  d − (a, b, c)ᵗ,  DF(v) ]    (1.8)

has full rank (rank = 3) on ρ⁻¹(0). In differential geometry jargon, ρ is said to be transversal to zero. The mathematics then says that for almost all vectors d ∈ E³ (in the sense of Lebesgue measure), the map ρ_d is also transversal to zero. What this means geometrically is that the zero set of ρ_d consists of smooth disjoint curves that do not intersect themselves or bifurcate, and have endpoints only at λ = 0 or λ = 1 (see Figure 2). In general under suitable conditions (described in Section 2) there is a zero curve of ρ_d(λ, v) stretching from a known solution v⁰ at λ = 0 to the desired solution v̄ at λ = 1. For the homotopy map (1.7) such a zero curve γ does indeed exist, and a solution v̄ to (1.5) can be found by tracking γ starting from (0, v⁰). v⁰ and d are related by

    x(1; v⁰) = d₁,   y(1; v⁰) = d₂,   θ(1; v⁰) = d₃.

Since the theory says that everything works for almost all d, by the implicit function theorem everything also works for almost all v⁰. Thus in practice v⁰ was chosen at random and d computed from v⁰, rather than vice versa. More details on this example can be found in [59].

Figure 2. The zero set ρ_d⁻¹(0).

The scheme just described is known as a probability one globally convergent homotopy algorithm. The phrase "probability one" refers to the fact that almost any choice of d works, and the "global convergence" refers to the fact that the starting point v⁰ need not be anywhere near the solution v̄. Section 2 describes the mathematical theory behind globally convergent homotopy algorithms, and states some typical convergence theorems for various types of problems. Section 3 describes several strategies for tracking the zero curve γ. This is a challenging numerical analysis problem, and has been the focus of much recent research, especially on large sparse problems. The mathematical software package HOMPACK is the topic of Section 4, and Section 5 briefly mentions some of the application areas of homotopy methods. Section 6 presents some current (and incomplete) work on parallel computation.
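In code terms, continuing the earlier sketch (and reusing its rod_rhs; names are illustrative): choose v⁰ at random, integrate once to obtain d, and the start of the zero curve is (0, v⁰) by construction.

    # continuing the earlier sketch: choose v0 at random, then compute d
    import numpy as np
    from scipy.integrate import solve_ivp

    rng = np.random.default_rng()
    v0 = rng.standard_normal(3)                 # random starting point
    sol = solve_ivp(rod_rhs, (0.0, 1.0), [0.0, 0.0, 0.0],
                    args=tuple(v0), rtol=1e-10, atol=1e-12)
    d = sol.y[:, -1]                            # x(1;v0), y(1;v0), theta(1;v0)
    # now rho_d(0, v0) = 0 exactly, and gamma is tracked from (0, v0)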

2. Theory. The theoretical foundation of all probability one globally convergent homotopy methods is given in the following differential geometry theorem:

Definition. Let Eⁿ denote n-dimensional real Euclidean space, let U ⊂ Eᵐ and V ⊂ Eⁿ be open sets, and let ρ: U × [0, 1) × V → Eⁿ be a C² map. ρ is said to be transversal to zero if the Jacobian matrix Dρ has full rank on ρ⁻¹(0).

Parametrized Sard's Theorem. If ρ(a, λ, x) is transversal to zero, then for almost all a ∈ U the map ρ_a(λ, x) = ρ(a, λ, x) is also transversal to zero; i.e., with probability one the Jacobian matrix Dρ_a(λ, x) has full rank on ρ_a⁻¹(0).

A typical zero set ρ_a⁻¹(0) is shown in Figure 2. The recipe for constructing a globally convergent homotopy algorithm to solve the nonlinear system of equations

    F(x) = 0,    (2.1)

where F: Eⁿ → Eⁿ is a C² map, is as follows: for an open set U ⊂ Eᵐ construct a C² homotopy map ρ: U × [0, 1) × Eⁿ → Eⁿ such that

1) ρ(a, λ, x) is transversal to zero,
2) ρ_a(0, x) = ρ(a, 0, x) = 0 is trivial to solve and has a unique solution x⁰,
3) ρ_a(1, x) = F(x),
4) ρ_a⁻¹(0) is bounded.

Then for almost all a ∈ U there exists a zero curve γ of ρ_a, along which the Jacobian matrix Dρ_a has rank n, emanating from (0, x⁰) and reaching a zero x̄ of F at λ = 1. This zero curve γ does not intersect itself, is disjoint from any other zeros of ρ_a, and has finite arc length in every compact subset of [0, 1) × Eⁿ. Furthermore, if DF(x̄) is nonsingular, then γ has finite arc length. The general idea of the algorithm is now apparent: just follow the zero curve γ emanating from (0, x⁰) until a zero x̄ of F(x) is reached (at λ = 1). Of course it is nontrivial to develop a viable numerical algorithm based on that idea, but at least conceptually, the algorithm for solving the nonlinear system of equations F(x) = 0 is clear and simple. The homotopy map (usually, but not always) is

    ρ_a(λ, x) = λ F(x) + (1 − λ)(x − a),    (2.2)

which has the same form as a standard continuation or embedding mapping. However, there are two crucial differences. In standard continuation, the embedding parameter λ increases monotonically from 0 to 1 as the trivial problem x − a = 0 is continuously deformed to the problem F(x) = 0. The present homotopy method permits λ to both increase and decrease along γ with no adverse effect; that is, turning points present no special difficulty. The second important difference is that there are never any "singular points", which afflict standard continuation methods. The way in which the zero curve γ of ρ_a is followed and the full rank of Dρ_a along γ guarantee this. It should be mentioned that the form of the homotopy map ρ_a(λ, x) in (2.2) is just a special case used here for clarity of exposition. The more general theory can be found in [69, 73-75, 82], and practical engineering problems requiring a ρ_a nonlinear in λ are in [59] and [66].
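Before the serious curve tracking methods of Section 3, it may help to see the whole recipe in miniature. The toy Python sketch below (illustrative only, not HOMPACK; the test function F is made up, with zero at (2, 3)) uses the homotopy (2.2), a random a, an Euler predictor along the unit tangent (the kernel of the n × (n + 1) Jacobian matrix), and a least squares corrector:

    import numpy as np

    def F(x):
        return np.array([x[0]**3 + x[0] - 10.0, x[1]**3 + x[1] - 30.0])

    def DF(x):
        return np.diag([3 * x[0]**2 + 1.0, 3 * x[1]**2 + 1.0])

    def rho(a, y):                       # y = (lambda, x); the map (2.2)
        lam, x = y[0], y[1:]
        return lam * F(x) + (1 - lam) * (x - a)

    def Drho(a, y):                      # n x (n+1) matrix [drho/dlambda | drho/dx]
        lam, x = y[0], y[1:]
        return np.column_stack([F(x) - (x - a),
                                lam * DF(x) + (1 - lam) * np.eye(2)])

    a = np.random.default_rng(0).standard_normal(2)   # random a: probability one
    y, t_old, h = np.concatenate([[0.0], a]), np.array([1.0, 0.0, 0.0]), 0.01
    while y[0] < 1.0:
        t = np.linalg.svd(Drho(a, y))[2][-1]          # kernel = tangent to gamma
        if t @ t_old < 0:                             # keep an acute angle with
            t = -t                                    # the previous tangent
        y = y + h * t                                 # predictor
        for _ in range(3):                            # corrector, back onto gamma
            y = y + np.linalg.lstsq(Drho(a, y), -rho(a, y), rcond=None)[0]
        t_old = t
    x = y[1:]
    for _ in range(5):                                # polish at lambda = 1
        x = x - np.linalg.solve(DF(x), F(x))
    print(x)                                          # approximately (2, 3)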

The following sections give some typical theorems for various classes of problems.

2.1. Brouwer fixed points. This area represents one of the first successes for both simplicial [4, 13-14, 30, 41-43, 45] and continuous homotopy methods [8, 48]. Brouwer fixed point problems can be very nasty, and often cause locally convergent iterative methods a great deal of difficulty.

Theorem. Let B = {x ∈ Eⁿ : ‖x‖₂ ≤ 1} be the closed unit ball, and f: B → B a C² map. Then for almost all a ∈ int B there exists a zero curve γ of

    ρ_a(λ, x) = λ(x − f(x)) + (1 − λ)(x − a),

along which the Jacobian matrix Dρ_a(λ, x) has full rank, emanating from (0, a) and reaching a fixed point x̄ of f at λ = 1. Furthermore, γ has finite arc length if I − Df(x̄) is nonsingular.

2.2. General zero finding problems. Typically a problem (such as a partial differential equation) reduces to a finite dimensional nonlinear system of equations, and what is desired are conditions on the original problem, not on the final discretized problem. Thus the results in this section are used to derive, working backwards, useful conditions on the original problem, whatever it might be. The following four lemmas, from [74], [74], [73], [75] respectively, which follow from the results of [8], are used for that purpose.

Lemma 1. Let g: Eᵖ → Eᵖ be a C² map, a ∈ Eᵖ, and define ρ_a: [0, 1) × Eᵖ → Eᵖ by

    ρ_a(λ, y) = λ g(y) + (1 − λ)(y − a).

Then for almost all a ∈ Eᵖ there is a zero curve γ of ρ_a emanating from (0, a) along which the Jacobian matrix Dρ_a(λ, y) has full rank.

Lemma 2. If the zero curve γ in Lemma 1 is bounded, it has an accumulation point (1, ȳ), where g(ȳ) = 0. Furthermore, if Dg(ȳ) is nonsingular, then γ has finite arc length.

Lemma 3. Let F: Eᵖ → Eᵖ be a C² map such that for some r > 0, xᵗF(x) ≥ 0 whenever ‖x‖₂ = r. Then F has a zero in {x ∈ Eᵖ : ‖x‖₂ ≤ r}, and for almost all a ∈ Eᵖ, ‖a‖₂ < r, there is a zero curve γ of

    ρ_a(λ, x) = λ F(x) + (1 − λ)(x − a),

along which the Jacobian matrix Dρ_a(λ, x) has full rank, emanating from (0, a) and reaching a zero x̄ of F at λ = 1. Furthermore, γ has finite arc length if DF(x̄) is nonsingular.

Lemma 3 is a special case of the following more general lemma.

Lemma 4. Let F: Eᵖ → Eᵖ be a C² map such that for some r > 0 and r̃ > 0, F(x) and x − a do not point in opposite directions for ‖x‖₂ = r, ‖a‖₂ < r̃. Then F has a zero in {x ∈ Eᵖ : ‖x‖₂ ≤ r}, and for almost all a ∈ Eᵖ, ‖a‖₂ < r̃, there is a zero curve γ of

    ρ_a(λ, x) = λ F(x) + (1 − λ)(x − a),

along which the Jacobian matrix Dρ_a(λ, x) has full rank, emanating from (0, a) and reaching a zero x̄ of F at λ = 1. Furthermore, γ has finite arc length if DF(x̄) is nonsingular.

2.3. Optimization. Consider first the unconstrained optimization problem

    min_x f(x).    (2.3)

Theorem. Let f: Eⁿ → E be a C³ convex map with a minimum at x̄, ‖x̄‖₂ ≤ M. Then for almost all a, ‖a‖₂ < M, there exists a zero curve γ of the homotopy map

    ρ_a(λ, x) = λ ∇f(x) + (1 − λ)(x − a),

along which the Jacobian matrix Dρ_a(λ, x) has full rank, emanating from (0, a) and reaching a point (1, x̄), where x̄ solves (2.3).

A function is called uniformly convex if it is convex and its Hessian's smallest eigenvalue is bounded away from zero. Consider next the constrained optimization problem

    min_{x ≥ 0} f(x).    (2.4)

This is more general than it might appear because the general convex quadratic program reduces to a problem of the form (2.4).

Theorem. Let f: Eⁿ → E be a C³ uniformly convex map. Then there exists δ > 0 such that for almost all a ≥ 0 with ‖a‖₂ < δ there exists a zero curve γ of the homotopy map

    ρ_a(λ, x) = λ K(x) + (1 − λ)(x − a),

where

    K_i(x) = −| ∂f(x)/∂x_i − x_i |³ + ( ∂f(x)/∂x_i )³ + x_i³,

along which the Jacobian matrix Dρ_a(λ, x) has full rank, connecting (0, a) to a point (1, x̄), where x̄ solves the constrained optimization problem (2.4).

2.4. Nonlinear complementarity. Given F: Eⁿ → Eⁿ, the nonlinear complementarity problem is to find a vector x ∈ Eⁿ such that

    x ≥ 0,   F(x) ≥ 0,   xᵗF(x) = 0.    (2.5)

At a solution x̄, x̄ and F(x̄) are "complementary" in the sense that if x̄_i > 0, then F_i(x̄) = 0, and if F_i(x̄) > 0, then x̄_i = 0. This problem is difficult because there are linear constraints x ≥ 0, nonlinear constraints F(x) ≥ 0, and a combinatorial aspect from the complementarity condition xᵗF(x) = 0. It is interesting that homotopy methods can be adapted to deal with nonlinear constraints and combinatorial conditions. Define G: Eⁿ → Eⁿ by

    G_i(z) = −| F_i(z) − z_i |³ + ( F_i(z) )³ + z_i³,   i = 1, …, n,    (2.6)

and let

    ρ_a(λ, z) = λ G(z) + (1 − λ)(z − a).
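The cubes in (2.6) make G as smooth as F while absorbing the combinatorial complementarity condition: zeros of G correspond exactly to solutions of (2.5). A small Python check (the map F below is a made-up example, not from the text):

    import numpy as np

    def G(F, z):
        """The mapping (2.6), componentwise on vectors."""
        Fz = F(z)
        return -np.abs(Fz - z)**3 + Fz**3 + z**3

    def rho(F, a, lam, z):
        """Homotopy map for the nonlinear complementarity problem."""
        return lam * G(F, z) + (1 - lam) * (z - a)

    F = lambda z: np.array([z[0] - 1.0, z[1] + 2.0])   # hypothetical F
    z = np.array([1.0, 0.0])     # z >= 0, F(z) = (0, 2) >= 0, z.F(z) = 0
    print(G(F, z))               # [0, 0]: a zero of G is a solution of (2.5)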

Theorem. Let F: Eⁿ → Eⁿ be a C² map, and let the Jacobian matrix DG(z) be nonsingular at every zero of G(z). Suppose there exists r > 0 such that z ≥ 0 and z_k = ‖z‖∞ ≥ r imply F_k(z) > 0. Then for almost all a > 0 there exists a zero curve γ of ρ_a(λ, z), along which the Jacobian matrix Dρ_a(λ, z) has full rank, having finite arc length and connecting (0, a) to (1, z̄), where z̄ solves (2.5).

Theorem. Let F: Eⁿ → Eⁿ be a C² map, and let the Jacobian matrix DG(z) be nonsingular at every zero of G(z). Suppose there exists r > 0 such that z ≥ 0 and ‖z‖∞ ≥ r imply z_k F_k(z) > 0 for some index k. Then there exists δ > 0 such that for almost all a > 0 with ‖a‖₂ < δ there exists a zero curve γ of ρ_a(λ, z), along which the Jacobian matrix Dρ_a(λ, z) has full rank, having finite arc length and connecting (0, a) to (1, z̄), where z̄ solves (2.5).

2.5. Two-point boundary value problems: shooting. The next few sections consider nonlinear two-point boundary value problems of the form

    y″(t) = g(t, y(t), y′(t)),   0 < t < 1,    (2.7)

    y(0) = 0,   y(1) = 0 (or y′(1) = 0),    (2.8)

where y = (y₁, …, yₙ), and g(t, u, v) satisfies a Lipschitz condition in (u, v) for 0 ≤ t ≤ 1 and has continuous second partials with respect to u and v for 0 ≤ t ≤ 1. These technical assumptions vary slightly from theorem to theorem and method to method; consult the references for complete and precise statements of the theorems. The intent here is to provide the flavor of applying homotopy methods to approximations to nonlinear two-point boundary value problems, and not get bogged down in technical detail. Let u(t) be the unique solution of

    y″(t) = g(t, y(t), y′(t)),   0 < t < 1,   y(0) = 0,   y′(0) = v,

and define f: Eⁿ → Eⁿ by

    f(v) = u(1)   (or u′(1)).    (2.9)

Observe that the original problem (2.7)-(2.8) is equivalent to solving

    f(v) = 0    (2.10)

for the correct initial condition v.

Theorem. Suppose there exists a constant M > 0 such that ‖g(t, x(t), x′(t))‖∞ ≤ M along every trajectory x(t) for which x(0) = 0, ‖x′(0)‖∞ ≤ M. Then for almost all w ∈ Eⁿ with ‖w‖∞ < M there exists a zero curve γ of the homotopy map

    ρ_w(λ, v) = λ f(v) + (1 − λ)(v − w),

along which Dρ_w has full rank, lying in [0, 1] × {v ∈ Eⁿ : ‖v‖∞ ≤ M} and connecting (0, w) to (1, v̄), where v̄ is a zero of f. If Df(v̄) is nonsingular, then γ has finite arc length.

2.6. Two-point boundary value problems: finite differences. The problem under consideration is (2.7)-(2.8). Using standard centered second order accurate finite difference approximations for y′

and y″ in (2.7) with the boundary conditions (2.8) at mesh points 0 = t₀ < t₁ < ⋯ < t_N < t_{N+1} = 1 results in the finite difference approximation

    G(Y) = A Y + h² F_h(Y) = 0,    (2.11)

where h = max_i (t_{i+1} − t_i) is the mesh size, A is a constant positive definite matrix, and F_h(Y) is the nonlinear part of G(Y) due to g.

Theorem. Let F_h(Y) be a C² mapping, and suppose that

    lim sup_{‖Y‖₂→∞} ‖F_h(Y)‖₂ / ‖Y‖₂^ν < ∞   for some 0 ≤ ν < 1.

Then for almost all W ∈ E^N there is a zero curve γ of the homotopy map

    ρ_W(λ, Y) = λ G(Y) + (1 − λ)(Y − W),

along which the Jacobian matrix Dρ_W(λ, Y) has full rank, emanating from (0, W) and reaching a zero Ȳ of G at λ = 1. Furthermore, γ has finite arc length if DG(Ȳ) is nonsingular.

More general boundary conditions than (2.8) have the form

    B y(0) + B′ y′(0) + C y(1) + C′ y′(1) = b,    (2.12)

where rank [B B′ C C′] = 2n, and there are other technical restrictions (see Keller [23]). These boundary conditions lead to a different nonlinear system of equations

    G(Y) = A Y + h² F_h(Y) = 0,    (2.13)

where A and F_h are different from those in (2.11).

Theorem. Let F_h(Y) be a C² mapping, and suppose that G from (2.13) satisfies one of the following:

1) there exist r > 0 and r̃ > 0 such that Y − W and G(Y) do not point in opposite directions for ‖Y‖₂ = r, ‖W‖₂ < r̃;
2) there exists r > 0 such that YᵗG(Y) ≥ 0 for ‖Y‖₂ = r;
3) A is positive semidefinite, and there exists r > 0 such that YᵗF_h(Y) ≥ 0 for ‖Y‖₂ = r.

Then for almost all W with ‖W‖₂ < r (‖W‖₂ < r̃ in case 1) there is a zero curve γ of the homotopy map

    ρ_W(λ, Y) = λ G(Y) + (1 − λ)(Y − W),

along which the Jacobian matrix Dρ_W(λ, Y) has full rank, emanating from (0, W) and reaching a zero Ȳ of G at λ = 1. Furthermore, γ has finite arc length if DG(Ȳ) is nonsingular.

Theorem. The conclusion of the previous theorem holds if A from (2.13) is positive definite and g(t, u, v) is a bounded C² mapping.

Theorem. The conclusion of the previous theorem also holds if g(t, u, v) is a bounded C² mapping and the boundary conditions are of the form

    y(0) = b₁,   y(1) = b₂.
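For a scalar equation (n = 1) on a uniform mesh with boundary conditions y(0) = y(1) = 0, the structure of (2.11) is easy to exhibit. A Python sketch under illustrative assumptions (a made-up g, and A taken as the standard tridiagonal second difference matrix, which is positive definite):

    import numpy as np

    N = 9                                   # number of interior mesh points
    h = 1.0 / (N + 1)
    t = np.linspace(h, 1.0 - h, N)          # interior mesh points
    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # tridiag(-1, 2, -1)

    def g(t, y, yp):                        # made-up right-hand side of (2.7)
        return np.exp(y) + yp

    def G(Y):
        """G(Y) = A Y + h^2 Fh(Y) of (2.11), centered differences throughout."""
        Ypad = np.concatenate([[0.0], Y, [0.0]])      # append boundary values
        Yp = (Ypad[2:] - Ypad[:-2]) / (2 * h)         # centered y'
        return A @ Y + h**2 * g(t, Y, Yp)

    print(G(np.zeros(N)))                   # residual of the trivial guess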

2.7. Two-point boundary value problems: collocation. The idea here is to approximate y(t) from some convenient, finite dimensional vector space S_m with basis φ₁, …, φ_m. These basis functions could be orthogonal polynomials (a popular approach in chemical engineering) or B-splines (the numerical analysts' choice) or something tailored for a specific problem. The approximations

    y_k(t) ≈ Y_k(t) = Σ_{i=1}^{m} α_{ki} φ_i(t),   k = 1, …, n,    (2.14)

are substituted into equations (2.7)-(2.8) evaluated at discrete points, the collocation points, in the interval [0, 1]. This results in the nonlinear system of equations

    F(Y) = M Y + N(Y) = 0,    (2.15)

where Y = (α₁₁, …, α₁ₘ; α₂₁, …, α₂ₘ; …; αₙ₁, …, αₙₘ), M is a constant matrix, and N(Y) is the nonlinear part due to g. The dimension of the problem is p = nm.

Theorem. Let N(Y) in (2.15) be a C² mapping, and suppose there exist constants C and ν such that

    lim sup_{‖Y‖∞→∞} ‖N(Y)‖∞ / ‖Y‖∞^ν = C,   0 ≤ ν < 1.    (2.16)

For W ∈ Eᵖ, define ρ_W: [0, 1) × Eᵖ → Eᵖ by

    ρ_W(λ, Y) = λ F(Y) + (1 − λ)(Y − W).

Then for almost all W ∈ Eᵖ there exists a zero curve γ of ρ_W, along which the Jacobian matrix Dρ_W(λ, Y) has full rank, emanating from (0, W) and reaching a zero Ȳ of F (at λ = 1). Furthermore, if DF(Ȳ) is nonsingular, then γ has finite arc length.

Theorem. Let N(Y) in (2.15) be a C² mapping and the matrix M be such that

    min_{‖Y‖∞=1} max_{1≤j≤p} |(M Y)_j| = Γ > 0.

If

    lim sup_{‖Y‖∞→∞} ‖N(Y)‖∞ / ‖Y‖∞ < Γ,

then the conclusion of the above Theorem holds.

Theorem. If g(t, u, v) in (2.7) is C² and bounded, then the conclusion of the above Theorem holds.

Theorem. Let g(t, u, v) in (2.7) be a C² mapping, and suppose there exist constants μ and ν such that

    lim sup_{‖(u,v)‖∞→∞} max_{0≤t≤1} ‖g(t, u, v)‖∞ / ‖(u, v)‖∞^ν = μ,   0 ≤ ν < 1.    (2.17)

Then the conclusion of the above Theorem holds.

2.8. Two-point boundary value problems: finite elements. The finite element (or Rayleigh-Ritz-Galerkin) approach is similar to collocation in that an approximation is sought from a finite dimensional space S_m. Here the elements of S_m automatically satisfy the boundary conditions (2.8), which collocation doesn't require. Instead of satisfying the differential equation (2.7) at discrete points, the finite element method satisfies (2.7) in an average sense by requiring certain inner product integrals to be equal. In some contexts the finite element formulation can be viewed as minimizing some functional over a finite dimensional space, where the minimum over an infinite dimensional space gives the exact solution (a variational formulation). Using the approximations (2.14), the Galerkin approximation to (2.7)-(2.8) is

    ∫₀¹ Y_j″(x) φ_k(x) dx = ∫₀¹ g_j(x, Y(x), Y′(x)) φ_k(x) dx,   k = 1, …, m,   j = 1, …, n.    (2.18)

This is a system of equations of exactly the same form as (2.15). The convergence theorems are similar to those for collocation.

Theorem. Let N(Y) in (2.15) be a C² mapping, and suppose there exist constants C and ν such that

    lim sup_{‖Y‖₂→∞} ‖N(Y)‖₂ / ‖Y‖₂^ν = C,   0 ≤ ν < 1.

For W ∈ Eᵖ, define ρ_W: [0, 1) × Eᵖ → Eᵖ by

    ρ_W(λ, Y) = λ F(Y) + (1 − λ)(Y − W).

Then for almost all W ∈ Eᵖ there exists a zero curve γ of ρ_W, along which the Jacobian matrix Dρ_W(λ, Y) has full rank, emanating from (0, W) and reaching a zero Ȳ of F (at λ = 1). Furthermore, if DF(Ȳ) is nonsingular, then γ has finite arc length.

Theorem. Let N(Y) in (2.15) be a C² mapping and Γ > 0 the smallest eigenvalue of M. If

    lim sup_{‖Y‖₂→∞} ‖N(Y)‖₂ / ‖Y‖₂ = C < Γ,

then the conclusion of the above Theorem holds.

Theorem. If g(t, u, v) in (2.7) is C² and bounded, then the conclusion of the above Theorem holds.

Theorem. Let g(t, u, v) in (2.7) be a C² mapping, and suppose there exist constants μ̄, μ, and ν such that

    ‖g(t, u, v)‖₂ ≤ μ̄ + μ (‖u‖₂ + ‖v‖₂)^ν,   0 ≤ ν < 1,    (2.19)

for all 0 ≤ t ≤ 1 and u, v ∈ Eⁿ. Then the conclusion of the above Theorem holds.

3. Algorithms. The zero curve γ of the homotopy map ρ_a(λ, x) in (2.2) can be tracked by many different techniques; refer to the excellent survey [4] and recent work by Rheinboldt and Burkardt [39], Mejia [28], and Watson [82]. There are three primary algorithmic approaches to tracking γ: 1) an ODE-based algorithm based on that in [68], with several refinements; 2) a predictor-corrector algorithm whose corrector follows the flow normal to the Davidenko flow [16-18] (a "normal flow"

algorithm); 3) a version of Rheinboldt's linear predictor, quasi-Newton corrector algorithm [39] (an "augmented Jacobian matrix" method). The algorithms for dense and sparse Jacobian matrices are qualitatively different. Only algorithms for dense Jacobian matrices are discussed here; see [77] for sparse homotopy algorithms.

3.1. Ordinary differential equation based algorithm (dense Jacobian matrix). For the sake of brevity, only the zero finding problem (2.1) with homotopy map (2.2) will be presented. Assuming that F(x) is C², a is such that the Jacobian matrix Dρ_a(λ, x) has full rank along γ, and γ is bounded, the zero curve γ is C¹ and can be parametrized by arc length s. Thus λ = λ(s), x = x(s) along γ, and

    ρ_a(λ(s), x(s)) = 0    (3.1)

identically in s. Therefore

    (d/ds) ρ_a(λ(s), x(s)) = Dρ_a(λ(s), x(s)) (dλ/ds, dx/ds)ᵗ = 0,    (3.2)

    ‖(dλ/ds, dx/ds)‖₂ = 1.    (3.3)

With the initial conditions

    λ(0) = 0,   x(0) = a,    (3.4)

the zero curve γ is the trajectory of the initial value problem (3.2)-(3.4). When λ(s̄) = 1, the corresponding x(s̄) is a zero of F(x). Thus all the sophisticated ordinary differential equation techniques currently available can be brought to bear on the problem of tracking γ [47], [69]. Typical ordinary differential equation software requires (dλ/ds, dx/ds) explicitly, and (3.2), (3.3) only implicitly define the derivative (dλ/ds, dx/ds). (It might be possible to use an implicit ordinary differential equation technique for (3.2)-(3.4), but that seems less efficient than the method proposed here.) The derivative (dλ/ds, dx/ds), which is a unit tangent vector to the zero curve γ, can be calculated by finding the one-dimensional kernel of the n × (n + 1) Jacobian matrix Dρ_a(λ(s), x(s)), which has full rank according to the theory [69]. It is here that a substantial amount of computation is incurred, and it is imperative that the number of derivative evaluations be kept small. Once the kernel has been calculated, the derivative (dλ/ds, dx/ds) is uniquely determined by (3.3) and continuity. Complete details for solving the initial value problem (3.2)-(3.4) and obtaining x(s) are in [68] and [69]. A discussion of the kernel computation follows.

The Jacobian matrix Dρ_a is n × (n + 1) with (theoretical) rank n. The crucial observation is that the last n columns of Dρ_a, corresponding to D_x ρ_a, may not have rank n, and even if they do, some other n columns may be better conditioned. The objective is to avoid choosing n "distinguished" columns, rather to treat all columns the same (not possible for sparse matrices). Kubicek [26] and Mejia [28] have kernel finding algorithms based on Gaussian elimination and n distinguished columns. Choosing and switching these n columns is tricky, and based on ad hoc parameters. Also,

computational experience has shown that accurate tangent vectors (dλ/ds, dx/ds) are essential, and the accuracy of Gaussian elimination may not be good enough. A conceptually elegant, as well as accurate, algorithm is to compute the QR factorization with column interchanges [7] of Dρ_a,

    Q Dρ_a P = R,

where R is upper trapezoidal with nonzero diagonal elements, Q is a product of Householder reflections, and P is a permutation matrix, and then obtain a vector z ∈ ker Dρ_a by back substitution. Setting (Pᵗz)_{n+1} = 1 is a convenient choice. This scheme provides high accuracy, numerical stability, and a uniform treatment of all n + 1 columns. Finally,

    (dλ/ds, dx/ds) = ± z / ‖z‖₂,

where the sign is chosen to maintain an acute angle with the previous tangent vector on γ. There is a rigorous mathematical criterion, based on an (n + 1) × (n + 1) determinant, for choosing the sign, but there is no reason to believe that would be more robust than the angle criterion.

Several features which are a combination of common sense and computational experience should be incorporated into the algorithm. Since most ordinary differential equation solvers only control the local error, the longer the arc length of the zero curve γ gets, the farther away the computed points may be from the true curve γ. Therefore when the arc length gets too long, the last computed point (λ̄, x̄) is used to calculate a new parameter vector

    ā = x̄ + (λ̄ / (1 − λ̄)) F(x̄).    (3.5)

Then ρ_ā(λ̄, x̄) = 0 exactly, and the zero curve of ρ_ā(λ, x) is followed starting from (λ̄, x̄). A rigorous justification for this strategy was given in [69].

Remember that tracking γ was merely a means to an end, namely a zero x̄ of F(x). Since γ itself is of no interest (usually), one should not waste computational effort following it too closely. However, since γ is the only sure way to x̄, losing γ can be disastrous [69]. The strategy proposed in [82] estimates the curvature of each component of γ using finite differences, and reduces the tolerance used by the ordinary differential equation solver whenever the estimated curvature exceeds some threshold. The tradeoff between computational efficiency and reliability is very delicate, and a fool-proof strategy appears difficult to achieve.

In summary, the algorithm is:

1. Set s := 0, y := (0, a), ypold := yp := (1, 0, …, 0), restart := false, error := initial error tolerance for the ODE solver.
2. If y₁ < 0 then go to 23.
3. If s > some constant then
4.   s := 0.
5.   Compute a new vector a from (3.5). If ‖new a − old a‖ > 1 + constant ∗ ‖old a‖,

  then go to 23.
6. ode error := error.
7. If ‖yp − ypold‖∞ > (last arc length step) ∗ constant, then ode error := (tolerance < error).
8. ypold := yp.
9. Take a step along the trajectory of (3.2)-(3.4) with the ODE solver. yp = y′(s) is computed for the ODE solver by steps 10-12:
10.   Find a vector z in the kernel of Dρ_a(y) using Householder reflections.
11.   If zᵗ ypold < 0, then z := −z.
12.   yp := z / ‖z‖₂.
13. If the ODE solver returns an error code, then go to 23.
14. If y₁ < 0.99, then go to 2.
15. If restart = true, then go to 20.
16. restart := true.
17. error := final accuracy desired.
18. If y₁ > 1, then set (s, y) back to the previous point (where y₁ < 1).
19. Go to 4.
20. If y₁ < 1 then go to 2.
21. Obtain the zero (at y₁ = 1) by interpolating mesh points used by the ODE solver.
22. Normal return.
23. Error return.
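The tangent computation of steps 10-12 maps directly onto a library QR factorization with column interchanges. A sketch (scipy shown for concreteness; HOMPACK does the equivalent with its own Householder routines):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def tangent(Dp, yp_old):
        """Unit tangent to gamma: kernel of the n x (n+1) Jacobian Dp."""
        n = Dp.shape[0]
        Q, R, piv = qr(Dp, pivoting=True)     # Dp[:, piv] = Q @ R
        w = np.empty(n + 1)
        w[n] = 1.0                            # the choice (P^t z)_{n+1} = 1
        w[:n] = solve_triangular(R[:, :n], -R[:, n])
        z = np.empty(n + 1)
        z[piv] = w                            # undo the column permutation
        z /= np.linalg.norm(z)
        return z if z @ yp_old >= 0 else -z   # acute angle with previous tangent

The same factorization is reused by the corrector of the normal flow algorithm described next.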

and each component of p(s) is a polynomial in s of degree less than or equal to 3. Starting at the predicted point Z(), the corrector iteration is Z(k+1) z z(- [Dpa(Z( ))] a(Z ) k = 0,1,... (3.7) where [Dpa(Z(k))] is the Moore-Penrose pseudoinverse of the n x (n + 1) Jacobian matrix Dpa. Small perturbations of a produce small changes in the trajectory y, and the family of trajectories o- for varying a is known as the "Davidenko flow". Geometrically, the iterates given by (3.7) return to the zero curve along the flow normal to the Davidenko flow, hence the name "normal flow algorithm". A corrector step AZ is the unique minimum norm solution of the equation [Dpa]AZ = -Pa (3.8) Fortunately AZ can be calculated at the same time as the kernel of [Dpa], and with just a little more work. Normally for dense problems the kernel of [Dpa] is found by computing a QR factorization of rDpa], and then using back substitution. By applying this QR factorization to -pa and using back substitution again, a particular solution v to (3.8) can be found. Let u: 0 be any vector in the kernel of [Dpa]. Then the minimum norm solution of (3.8) is vt u AZ = v-.u. (3.9) Since the kernel of [Dpal is needed anyway for the tangent vectors, solving (3.8) only requires another 0(n2) operations beyond those for the kernel. The number of iterations required for convergence of (3.7) should be kept small (say < 4) since QR factorizations of [Dpa] are expensive. The alternative of using [Dpa(Z(0))] for several iterations, which results in linear convergence, is rarely cost effective. When the iteration (3.7) converges, the final iterate Z(k+l) is accepted as the next point on 7, and the tangent vector to the integral curve through Z(k) is used for the tangent-this saves a Jacobian matrix evaluation and factorization at Z(k+l). The step size estimation described next attempts to balance progress along -y with the effort expended on the iteration (3.7). Define a contraction factor L= Z() Z) (3.10) a residual factor R - a(Z(' )) (3.11) a distance factor (Z' = limko Z(k)) D = IZ - ^(3.12) 14

and ideal values L̄, R̄, D̄ for these three. Let h be the current step size (the distance from Z* to the previous point found on γ), and h̄ the "optimal" step size for the next step. The goal is to achieve

    L̄/L ≈ R̄/R ≈ D̄/D ≈ (h̄/h)^q    (3.13)

for some q. This leads to the choice

    h̄ = ( min{ L̄/L, R̄/R, D̄/D } )^{1/q} h,    (3.14)

a worst case choice. To prevent chattering and unreasonable values, constants h_min (minimum allowed step size), h_max (maximum allowed step size), B_min (contraction factor), and B_max (expansion factor) are chosen, and h̄ is taken as

    h̄ := min{ max{ h_min, B_min h, h̄ }, B_max h, h_max }.    (3.15)

There are eight parameters in this process: L̄, R̄, D̄, h_min, h_max, B_min, B_max, q. The choice of h̄ from (3.15) can be refined further. If (3.7) converged in one iteration, then h̄ should certainly not be smaller than h, hence set

    h̄ := max{ h, h̄ }    (3.16)

if (3.7) only required one iteration. To prevent divergence from the iteration (3.7), if (3.7) has not converged after K iterations, h is halved and a new prediction is computed. Every time h is halved the old value h_old is saved. Thus if (3.7) has failed to converge in K iterations sometime during this step, the new h̄ should not be greater than the value h_old known to produce failure. Hence in this case

    h̄ := min{ h_old, h̄ }.    (3.17)

Finally, if (3.7) required the maximum K iterations, the step size should not increase, so in this case set

    h̄ := min{ h, h̄ }.    (3.18)

The logic in (3.16)-(3.18) is rarely invoked, but it does have a stabilizing effect on the algorithm.

In summary, the algorithm is:

1. s := 0, y := (0, a), h := 0.1, firststep := true, arcae, arcre := absolute, relative error tolerances for tracking γ, ansae, ansre := absolute, relative error tolerances for the answer.
2. If firststep = false then
3.   Compute the predicted point Z⁽⁰⁾ using (3.6).
   else
4.   Compute the predicted point Z⁽⁰⁾ using a linear predictor based on y = (0, a) and the tangent there.

5. Iterate with equation (3.7) until either ‖Z⁽ᵏ⁺¹⁾ − Z⁽ᵏ⁾‖ ≤ arcae + arcre ‖Z⁽ᵏ⁾‖ or 4 iterations have been performed.
6. If the Newton iteration (3.7) did not converge in 4 steps, then
7.   h := h/2.
8.   If h is unreasonably small, then return with an error flag.
9.   Go to 2.
10. firststep := false.
11. If y₁ < 1, then compute a new step size h by (3.10)-(3.18) and go to 2.
12. Do steps 13-18 some fixed number of times.
13.   Find s̄ such that p(s̄)₁ = 1, using yold, ypold, y, yp in (3.6).
14.   Do two iterations of (3.7) starting with Z⁽⁰⁾ = p(s̄), ending with Z⁽²⁾.
15.   If |Z₁⁽²⁾ − 1| + ‖ΔZ⁽¹⁾‖ ≤ ansae + ansre ‖Z⁽¹⁾‖, then return (solution has been found).
16.   If Z₁⁽²⁾ > 1, then
17.     y := Z⁽²⁾, yp := tangent at Z⁽²⁾.
      else
18.     yold := Z⁽²⁾, ypold := tangent at Z⁽²⁾.
19. Return with an error flag.

3.3. Augmented (dense) Jacobian matrix algorithm. The augmented Jacobian matrix algorithm has four major phases: prediction, correction, step size estimation, and computation of the solution at λ = 1. Again, only the zero finding case is described here. The algorithm here is based on Rheinboldt [36-39], but with some significant differences: (1) a Hermite cubic rather than a linear predictor is used; (2) a tangent vector rather than a standard basis vector is used to augment the Jacobian matrix of the homotopy map; (3) updated QR factorizations and quasi-Newton updates are used rather than Newton's method; (4) different step size control, necessitated by the use of quasi-Newton iterations, is used; (5) a different scheme for locating the target point at λ = 1 is used which allows the Jacobian matrix of F to be singular at the solution.

The prediction phase is exactly the same as in the normal flow algorithm. Having the points P⁽¹⁾ = (λ(s₁), x(s₁)), P⁽²⁾ = (λ(s₂), x(s₂)) on γ with corresponding tangent vectors

    T⁽¹⁾ = (dλ/ds(s₁), dx/ds(s₁)),   T⁽²⁾ = (dλ/ds(s₂), dx/ds(s₂)),

the prediction Z⁽⁰⁾ of the next point on γ is given by (3.6). In order to use this predictor, a means of calculating the tangent vector T⁽²⁾ at a point P⁽²⁾ is required. This is done by solving the system

    [ Dρ_a(P⁽²⁾) ; T⁽¹⁾ᵗ ] z = (0, …, 0, 1)ᵗ    (3.19)

for z, where Dρ_a is the n × (n + 1) Jacobian of ρ_a and [ ⋅ ; ⋅ ] denotes the (n + 1) × (n + 1) matrix formed by appending the row T⁽¹⁾ᵗ to Dρ_a(P⁽²⁾). Normalizing z gives

    T⁽²⁾ = z / ‖z‖₂.    (3.20)

The last row of (3.19) insures that the tangent T⁽²⁾ makes an acute angle with the previous tangent T⁽¹⁾. It is the augmentation of the Jacobian matrix with this additional row which motivates the name "augmented Jacobian matrix algorithm". The solution to (3.19) is found by computing a QR factorization of the matrix, and then using back substitution [7].

Starting with the predicted point Z⁽⁰⁾, the correction is performed by a quasi-Newton iteration defined by

    Z⁽ᵏ⁺¹⁾ = Z⁽ᵏ⁾ − [ A⁽ᵏ⁾ ; T⁽²⁾ᵗ ]⁻¹ ( ρ_a(Z⁽ᵏ⁾), 0 )ᵗ,   k = 0, 1, …,    (3.21)

where A⁽ᵏ⁾ is an approximation to the Jacobian matrix Dρ_a(Z⁽ᵏ⁾). The last row of the matrix in (3.21) insures that the iterates lie in a hyperplane perpendicular to the tangent vector T⁽²⁾. (3.21) is the quasi-Newton iteration for solving the augmented nonlinear system

    ( ρ_a(Y), T⁽²⁾ᵗ (Y − Z⁽⁰⁾) )ᵗ = 0.    (3.22)

A corrector step ΔZ⁽ᵏ⁾ is the unique solution to the equation

    [ A⁽ᵏ⁾ ; T⁽²⁾ᵗ ] ΔZ⁽ᵏ⁾ = ( −ρ_a(Z⁽ᵏ⁾), 0 )ᵗ.    (3.23)

The matrix on the left side of this equation is produced by successive Broyden rank one updates [7] of the matrix in (3.19). Precisely, letting Z⁽⁻¹⁾ = P⁽²⁾, A⁽⁻¹⁾ = Dρ_a(P⁽²⁾), and

    M⁽ᵏ⁾ = [ A⁽ᵏ⁾ ; T⁽²⁾ᵗ ],

the update formulas are

    M⁽⁻¹⁾ = [ A⁽⁻¹⁾ ; T⁽¹⁾ᵗ ] + e_{n+1} (T⁽²⁾ − T⁽¹⁾)ᵗ,    (3.24)

that is, the matrix of (3.19) with its last row replaced by the new tangent, and

    M⁽ᵏ⁺¹⁾ = M⁽ᵏ⁾ + ( (Δρ − M⁽ᵏ⁾ ΔZ⁽ᵏ⁾) ΔZ⁽ᵏ⁾ᵗ ) / ( ΔZ⁽ᵏ⁾ᵗ ΔZ⁽ᵏ⁾ ),   k = −1, 0, …,    (3.25)

where

    Δρ = ( ρ_a(Z⁽ᵏ⁺¹⁾) − ρ_a(Z⁽ᵏ⁾), 0 )ᵗ.

These updates can be done in QR factored form, requiring a total of O(n²) operations for each iteration in the correction process [7]. When the iteration (3.21) converges within some tolerance, the final iterate Z* is accepted as the next point on the zero curve γ.

The step size estimation algorithm is an adaptation of a procedure developed by Rheinboldt [39]. At each point P⁽ᵏ⁾ with tangent T⁽ᵏ⁾ along γ, the curvature is estimated by the formula

    κ_k = ‖w⁽ᵏ⁾‖₂ = 2 |sin(α_k/2)| / Δs_k,    (3.26)

where

    w⁽ᵏ⁾ = (T⁽ᵏ⁾ − T⁽ᵏ⁻¹⁾) / Δs_k,   α_k = arccos( T⁽ᵏ⁾ᵗ T⁽ᵏ⁻¹⁾ ),   Δs_k = ‖P⁽ᵏ⁾ − P⁽ᵏ⁻¹⁾‖.

Intuitively, α_k represents the angle between the last two tangent vectors, and the curvature is approximated by the Euclidean norm of the difference between these two tangents divided by Δs_k. This curvature data can be extrapolated to produce a prediction for the curvature for the next step:

    κ̃_k = κ_k + (Δs_k / Δs_{k−1}) (κ_k − κ_{k−1}).    (3.27)

Since κ̃_k can be negative, use

    ξ_k = max{ κ_min, κ̃_k }   for some small κ_min > 0    (3.28)

as the predicted curvature for the next step. The goal in estimating the optimal step size is to keep the error in the prediction ‖Z⁽⁰⁾ − Z*‖ relatively constant, so that the number of iterations required by the corrector will be stable. This is achieved by choosing the step size as

    h̄ = sqrt( 2 δ_k / ξ_k ),    (3.29)

where δ_k represents the ideal starting error desired for the prediction step. δ_k is chosen as a function of the tolerance for tracking the curve and is also restricted to be no larger than half of Δs_k.
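In code, the estimates (3.26)-(3.29) amount to a few lines. A sketch (T0, T1, T are the unit tangents at the last three accepted points, ds0 and ds the corresponding arc length steps, and delta the ideal prediction error δ_k; all names are illustrative):

    import numpy as np

    def step_size(T, T1, T0, ds, ds0, delta, kappa_min=0.01):
        kappa1 = np.linalg.norm(T - T1) / ds          # (3.26): 2|sin(a/2)| / ds
        kappa0 = np.linalg.norm(T1 - T0) / ds0
        kappa = kappa1 + (ds / ds0) * (kappa1 - kappa0)   # (3.27): extrapolation
        xi = max(kappa_min, kappa)                    # (3.28)
        return np.sqrt(2 * delta / xi)                # (3.29)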

As with the normal flow algorithm, additional refinements on the optimal step size are made in order to prevent chattering and unreasonable values. In particular, h̄ is chosen to satisfy equations (3.15) and (3.17). This h̄ is then used as the step size for the next step.

The final phase of the algorithm, computation of the solution at λ = 1, is entered when a point P⁽²⁾ is generated such that P₁⁽²⁾ ≥ 1. P⁽²⁾ is the first such point, so the solution must lie on γ somewhere between P⁽²⁾ and the previous point P⁽¹⁾. The algorithm for finding this solution is a two step process which is repeated until the solution is found. First, starting from a point P⁽ᵏ⁾, a prediction Z⁽ᵏ⁻²⁾ for the solution is generated such that Z₁⁽ᵏ⁻²⁾ = 1. Second, a single quasi-Newton iteration is performed to produce a new point P⁽ᵏ⁺¹⁾ close to γ, but not necessarily on the hyperplane λ = 1. Normally, the prediction Z⁽ᵏ⁻²⁾ is computed by a secant method using the last two points P⁽ᵏ⁾ and P⁽ᵏ⁻¹⁾:

    Z⁽ᵏ⁻²⁾ = P⁽ᵏ⁾ + ( (1 − P₁⁽ᵏ⁾) / (P₁⁽ᵏ⁻¹⁾ − P₁⁽ᵏ⁾) ) (P⁽ᵏ⁻¹⁾ − P⁽ᵏ⁾).    (3.30)

However, this formula can potentially produce a disastrous prediction (e.g., if |P₁⁽ᵏ⁻¹⁾ − P₁⁽ᵏ⁾| is much smaller than ‖P⁽ᵏ⁻¹⁾ − P⁽ᵏ⁾‖), so an additional scheme is added to ensure that this does not happen. In order to implement this scheme, a point P⁽ᵒᵖᵖ⁾ must be saved. This point is chosen as the last point computed from a quasi-Newton step which is on the opposite side of the hyperplane λ = 1 from P⁽ᵏ⁾. Thus, the points P⁽ᵒᵖᵖ⁾ and P⁽ᵏ⁾ bracket the solution. The prediction Z⁽ᵏ⁻²⁾ may be bad whenever the inequality

    ‖Z⁽ᵏ⁻²⁾ − P⁽ᵏ⁾‖ > ‖P⁽ᵏ⁾ − P⁽ᵒᵖᵖ⁾‖    (3.31)

holds, in which case Z⁽ᵏ⁻²⁾ is recomputed by the chord method

    Z⁽ᵏ⁻²⁾ = P⁽ᵏ⁾ + ( (1 − P₁⁽ᵏ⁾) / (P₁⁽ᵒᵖᵖ⁾ − P₁⁽ᵏ⁾) ) (P⁽ᵒᵖᵖ⁾ − P⁽ᵏ⁾).    (3.32)

This chord method, while much safer than the secant method (3.30), is used only in the special case (3.31) because it has a much slower rate of convergence than the secant method. An exception to these linear prediction schemes occurs with the first step of the final phase. Since the tangents T⁽¹⁾ and T⁽²⁾ at P⁽¹⁾ and P⁽²⁾ are available, this information is used to generate a Hermite cubic polynomial p(s) for calculating the first prediction point Z⁽⁰⁾. This is done by finding the root s̄ of the equation p₁(s) = 1. Z⁽⁰⁾ is then given by

    Z⁽⁰⁾ = p(s̄).    (3.33)

After the predictor Z⁽ᵏ⁻²⁾ has been determined, a quasi-Newton step is taken to get the point P⁽ᵏ⁺¹⁾. This step is defined by

    P⁽ᵏ⁺¹⁾ = Z⁽ᵏ⁻²⁾ + ΔZ⁽ᵏ⁻²⁾,    (3.34)

where ΔZ⁽ᵏ⁻²⁾ is the solution to (3.23). Again, the matrix in (3.23) is produced by the rank one updates (3.24) and (3.25).
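For reference, the Broyden update (3.25) in unfactored form is a one-liner; HOMPACK instead updates the QR factorization of M directly, keeping each corrector iteration at O(n²) work. A sketch:

    import numpy as np

    def broyden_update(M, dZ, rho_new, rho_old):
        """(3.25): rank one update of the augmented matrix M."""
        d_rho = np.concatenate([rho_new - rho_old, [0.0]])  # last row (T^t) fixed
        return M + np.outer(d_rho - M @ dZ, dZ) / (dZ @ dZ)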

The alternating process of computing a prediction and taking a quasi-Newton step is repeated until the solution is found. In summary, the algorithm is:

1. s := 0, y := (0, a), ypold := (1, 0, …, 0), h := 0.1, failed := false, firststep := true, arcae, arcre := absolute, relative error tolerances for tracking γ, ansae, ansre := absolute, relative error tolerances for the answer.
2. Compute the tangent yp at y, using (3.19) and (3.20), and update the augmented Jacobian matrix using (3.24).
3. If firststep = false then
4.   Compute the predicted point Z⁽⁰⁾ with the cubic predictor (3.6) based on yold, ypold, y, yp.
   else
5.   Compute the predicted point Z⁽⁰⁾ using a linear predictor based on y and yp.
6. If failed = true then
7.   Compute the augmented Jacobian matrix at Z⁽⁰⁾.
8. Compute the next iterate Z⁽¹⁾ using (3.21).
9. limit := 2(⌊|log₁₀(arcae + arcre ‖y‖)|⌋ + 1). Repeat steps 10-11 until either ‖ΔZ⁽ᵏ⁾‖ ≤ arcae + arcre ‖Z⁽ᵏ⁾‖ or limit iterations have been performed.
10.   Update the augmented Jacobian matrix using (3.25).
11.   Compute the next iterate using (3.21).
12. If the quasi-Newton iteration did not converge in limit steps, then
13.   h := h/2; failed := true.
14.   If h is unreasonably small, then return with an error flag.
15.   Go to 3.
16. Compute the tangent at the accepted iterate Z* using (3.19) and (3.20), and update the augmented Jacobian matrix using (3.24).
17. Compute the angle α between the current and previous tangents by (3.26).
18. If α > π/3, then
19.   h := h/2; failed := true.
20.   If h is unreasonably small, then return with an error flag.

21.   Go to 3.
22. yold := y, ypold := yp, y := Z*, yp := tangent computed in step 16, firststep := false, failed := false.
23. If y₁ < 1, then compute a new step size h by equations (3.15), (3.17), (3.26)-(3.29) with κ_min = 0.01 and δ_k = min{ (arcae + arcre ‖y‖)/4, ‖y − yold‖₂/4 }, and go to 3.
24. Find s̄ such that p(s̄)₁ = 1, using yold, ypold, y, and yp in (3.6). yopp := yold, Z⁽⁰⁾ := p(s̄).
25. limit := 2(⌊|log₁₀(ansae + ansre ‖y‖)|⌋ + 1). Do steps 26-33 for k = 2, …, limit + 2.
26.   Update the augmented Jacobian matrix using (3.25).
27.   Take a quasi-Newton step with (3.34).
28.   If |P₁⁽ᵏ⁺¹⁾ − 1| + ‖P⁽ᵏ⁺¹⁾ − Z⁽ᵏ⁻²⁾‖ ≤ ansae + ansre ‖Z⁽ᵏ⁻²⁾‖, then return (solution has been found).
29.   If |P₁⁽ᵏ⁺¹⁾ − 1| ≤ ansae + ansre, then Z⁽ᵏ⁻¹⁾ := P⁽ᵏ⁺¹⁾, else do steps 30-33.
30.     yold := y, y := P⁽ᵏ⁺¹⁾.
31.     If yold₁ and y₁ bracket λ = 1, then yopp := yold.
32.     Compute Z⁽ᵏ⁻¹⁾ with the linear predictor (3.30) using y and yold.
33.     If ‖Z⁽ᵏ⁻¹⁾ − y‖ > ‖y − yopp‖, then compute Z⁽ᵏ⁻¹⁾ with the linear predictor (3.32) using y and yopp.
34. Return with an error flag.

4. Software. This section describes HOMPACK [82], a software package currently under development at Sandia National Laboratories, General Motors Research Laboratories, Virginia Polytechnic Institute and State University, and the University of Michigan. HOMPACK is organized in two different ways: by algorithm/problem type and by subroutine level. There are three levels of subroutines. The top level consists of drivers, one for each problem type and algorithm type. Normally these drivers are called by the user, and the user need know nothing beyond them. They allocate storage for the lower level routines, and all the arrays are variable dimension, so there is no limit on problem size. The second subroutine level implements the major components of the algorithms such as stepping along the homotopy zero curve, computing tangents, and the end game for the solution at λ = 1. A sophisticated user might call these routines directly to have complete control

of the algorithm, or for some other task such as tracking an arbitrary parametrized curve over an arbitrary parameter range. The lowest subroutine level handles the numerical linear algebra, and includes some BLAS routines. All the linear algebra and associated data structure handling are concentrated in these routines, so a user could incorporate his own data structures by writing his own versions of these low level routines. Also, by concentrating the linear algebra in subroutines, HOMPACK can be easily adapted to a vector or parallel computer.

The organization of HOMPACK by algorithm/problem type is shown in Table 1, which lists the driver name for each algorithm and problem type.

Table 1. Taxonomy of homotopy subroutines.

                                      x = f(x)          F(x) = 0        ρ(a, λ, x) = 0
    algorithm                       dense   sparse    dense   sparse    dense   sparse
    ordinary differential equation  FIXPDF  FIXPDS    FIXPDF  FIXPDS    FIXPDF  FIXPDS
    normal flow                     FIXPNF  FIXPNS    FIXPNF  FIXPNS    FIXPNF  FIXPNS
    augmented Jacobian matrix       FIXPQF  FIXPQS    FIXPQF  FIXPQS    FIXPQF  FIXPQS

The naming convention is FIXP followed by one letter from {D, N, Q} and one from {F, S}, where D ~ ordinary differential equation algorithm, N ~ normal flow algorithm, Q ~ augmented Jacobian matrix algorithm, F ~ dense Jacobian matrix, and S ~ sparse Jacobian matrix. Using brackets to indicate the three subroutine levels described above, the natural grouping of the HOMPACK routines is:

    [FIXPDF] [FODE, ROOT, SINTRP, STEPS] [DCPOSE]
    [FIXPDS] [FODEDS, ROOT, SINTRP, STEPDS] [GMFADS, MFACDS, MULTDS, PCGDS, QIMUDS, SOLVDS]
    [FIXPNF] [ROOTNF, STEPNF, TANGNF] [ROOT]
    [FIXPNS] [ROOTNS, STEPNS, TANGNS] [GMFADS, MFACDS, MULTDS, PCGDS, PCGNS, QIMUDS, ROOT, SOLVDS]
    [FIXPQF] [ROOTQF, STEPQF, TANGQF] [QRFAQF, QRSLQF, R1UPQF, UPQRQF]
    [FIXPQS] [ROOTQS, STEPQS, TANGQS] [GMFADS, MULTDS, PCGQS, SOLVDS]

The BLAS subroutines used by HOMPACK are DAXPY, DCOPY, DDOT, DNRM2, DSCAL, D1MACH, IDAMAX. The user written subroutines, of which exactly two must be supplied depending on the driver chosen, are F, FJAC, FJACS, RHO, RHOA, RHOJAC, RHOJS.

The special purpose polynomial system solver POLSYS is essentially a high level driver for HOMPACK. POLSYS requires special versions of RHO and RHOJAC (subroutines normally provided by the user). These special versions are included in HOMPACK, so for a polynomial system the user need only call POLSYS, and define the problem directly to POLSYS by specifying the

polynomial coefficients. POLSYS scales and computes partial derivatives on its own. Thus the user interface to POLSYS and HOMPACK is clean and simple. The only caveat is that FFUNP cannot recognize patterns of repeated expressions in the polynomial system, and so may be less efficient than a hand crafted version. If great efficiency is required, the user can modify the default FFUNP; the sections in the code which must be changed are clearly marked. The grouping is:

    [POLSYS] [POLYNF, POLYP, ROOTNF, STEPNF, TANGNF] [DIVP, FFUNP, GFUNP, HFUNP, HFUN1P, INITP, MULP, OTPUTP, POWP, RHO, RHOJAC, ROOT, SCLGNP, STRPTP]

5. Applications. The globally convergent probability one homotopy algorithms described here have been successfully applied to a wide range of practical scientific and engineering problems. A survey of some engineering applications prior to 1980 is given in [76]. Some problems for which locally convergent iterative methods either totally or partially failed, and for which homotopy methods succeeded, are given in the references [12], [20], [22], [44], [49-67], [70], [76], [78-81], [83]. The example at the beginning of this paper is typical. The fields where homotopy methods have been successfully applied include: nonlinear structural mechanics (limit point and postbuckling analysis), fluid mechanics, statistics (nonlinear regression), chemistry (reaction and equilibrium equations), chemical engineering (reactor modelling and process control), robotics (inverse kinematics), computer vision (shape from shading, structure from motion), economics (fixed points), industrial engineering (nonlinear complementarity), control theory (pole placement), and CAD/CAM (solid modelling).

6. Parallel computation. This section describes some ongoing work to implement HOMPACK on a parallel computer. The serial version of HOMPACK was structured so as to be easily adapted to a vector machine, by segregating most of the linear algebra into subroutines. The appropriate algorithmic decomposition for a parallel computer like the hypercube is not so clear. Here the implementation of HOMPACK on a hypercube machine for one special class of problems, polynomial systems, is addressed.

6.1. The hypercube. The word "hypercube" refers to an n-dimensional cube. Think of a cube in n dimensions as sitting in the positive orthant, with vertices at the points (v₁, …, vₙ), vᵢ ∈ {0, 1}, i = 1, …, n. There are thus 2ⁿ vertices, and two vertices v and w are "adjacent", i.e., connected by an edge, if and only if vᵢ = wᵢ for all i except one. The associated graph, also sometimes referred to as an

Figure 3. 4-cube structure and node labelling.

"n-cube", has 2ⁿ vertices (which can be labelled as above with binary n-tuples) and edges between vertices whose labels differ in exactly one coordinate (see Figure 3). A "hypercube computer architecture" is a computer system with 2ⁿ (node) processors, corresponding to the 2ⁿ vertices (nodes), and a communication link corresponding to each edge of the n-cube. Thus each processor has a direct communication link to exactly n other processors. The distance between any two of the P = 2ⁿ processors is at most n = log₂ P, considered an ideal compromise between total connectivity (distance = 1) and ring connectivity (distance = P/2). Figure 3 shows how a 4-cube is built up from two 3-cubes. Typically the node label (v₁, …, vₙ) is viewed as a binary number v₁v₂⋯vₙ, and in this view two nodes are adjacent if and only if their binary representations differ in exactly one bit. Typically node addresses are computed in programs by a gray code, a bijective function g: {0, …, 2ⁿ − 1} → {0, …, 2ⁿ − 1} such that the binary representations of g(k (mod 2ⁿ)) and g(k + 1 (mod 2ⁿ)) differ in exactly one bit for all k.

Realizations of this abstract architecture have one additional feature: a "host" processor with communication links to all the node processors. This host typically loads programs into the nodes, starts and stops processes executing in the nodes, and interchanges data with the nodes. In current hardware implementations only the host has external I/O and peripheral storage; the nodes consist of memory, a CPU, and possibly communication and floating-point hardware.
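The standard binary reflected gray code has the closed form g(k) = k XOR (k >> 1), and the adjacency test is a one-bit-difference check. A small Python sketch:

    def gray(k, n):
        """g(k mod 2^n): consecutive values differ in exactly one bit."""
        k %= 1 << n
        return k ^ (k >> 1)

    def adjacent(v, w):
        """True iff node labels v and w differ in exactly one bit."""
        return bin(v ^ w).count("1") == 1

    n = 4
    ring = [gray(k, n) for k in range(2 ** n)]   # a Hamiltonian cycle on the n-cube
    assert all(adjacent(ring[i], ring[(i + 1) % 2 ** n]) for i in range(2 ** n))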

The Intel iPSC has 32, 64, or 128 nodes. Each node is an 80286/80287 with 512K bytes of memory. The host is also an 80286/80287, but with 4 MB of memory, a floppy disk drive, a hard disk, an Ethernet connection, and Xenix. The nodes have only a minimal monitor for communication and process management. The NCUBE has up to 1024 nodes in multiples of 64, each with 128K of memory and communication and floating-point hardware. The host is an 80286, running NCUBE's operating system, a primitive version of UNIX. The node chip is NCUBE's own design, with a unique feature being its communication hardware.

6.2. Polynomial systems. Suppose that the components of the nonlinear system of equations (2.1) have the form

    F_i(x) = Σ_{k=1}^{n_i} a_{ik} Π_{j=1}^{n} x_j^{d_{ijk}},   i = 1, …, n.    (6.1)

The ith component F_i(x) has n_i terms, the a_{ik} are the (real) coefficients, and the degrees d_{ijk} are nonnegative integers. The total degree of F_i is

    d_i = max_{1≤k≤n_i} Σ_{j=1}^{n} d_{ijk}.    (6.2)

The total degree of the entire system (6.1) is

    d = d₁ ⋯ dₙ.    (6.3)

The homotopy map ρ_c: [0, 1) × Cⁿ → Cⁿ has the form

    ρ_c(λ, x) = (1 − λ) G(x) + λ F(x),    (6.4)

where F(x) is now regarded as a complex map F: Cⁿ → Cⁿ, and G: Cⁿ → Cⁿ is defined by

    G_j(x) = b_j x_j^{d_j} − a_j,   j = 1, …, n,    (6.5)

where a_j and b_j are nonzero complex numbers, d_j was defined in (6.2), and c = (a₁, …, aₙ, b₁, …, bₙ) is the parameter vector for the homotopy map. Through a series of ingenious tricks (converting F(x) to a complex map and then back to a real map, converting to projective coordinates, and adding a linear equation in projective space) the following attributes can be achieved (with probability one):

1) there are exactly d zero curves of ρ_c emanating from λ = 0;
2) dλ/ds > 0 along each zero curve;
3) all the zero curves are bounded;
4) every isolated solution of (2.1) is reached by some zero curve of ρ_c.
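A short sketch of (6.2)-(6.5) in Python (each F_i represented as a list of (coefficient, exponent tuple) pairs; the example system is made up):

    import numpy as np

    def total_degrees(system):
        """(6.2)-(6.3): d_i = max_k sum_j d_ijk, and d = d_1 ... d_n."""
        d = [max(sum(expts) for _, expts in Fi) for Fi in system]
        return d, int(np.prod(d))

    def start_system(d, rng):
        """(6.5): G_j(x) = b_j x_j^{d_j} - a_j, with random complex a_j, b_j."""
        a = rng.standard_normal(len(d)) + 1j * rng.standard_normal(len(d))
        b = rng.standard_normal(len(d)) + 1j * rng.standard_normal(len(d))
        return lambda x: b * x**np.array(d) - a

    # hypothetical system: F1 = x1^2 x2 + 3 x2 - 1, F2 = x1 x2^3 - x1
    system = [[(1.0, (2, 1)), (3.0, (0, 1)), (-1.0, (0, 0))],
              [(1.0, (1, 3)), (-1.0, (1, 0))]]
    d, total = total_degrees(system)
    print(d, total)        # [3, 4] 12: twelve paths must be tracked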

There is thus a rigorous theoretical algorithm [33-34, 82] for finding all solutions to a polynomial system (2.1). This algorithm is implemented in subroutine POLSYS from HOMPACK.

6.3. Computational results. Polynomial systems arise in such diverse areas as solid modelling, robotics, chemical engineering, mechanical engineering, and computer vision. A small problem has total degree d < 100 and a large problem has d > 1000. Given that d homotopy paths are to be tracked, there are two extreme approaches using a hypercube. The first extreme, with the coarsest granularity possible, is to assign one path to each node processor, with the host controlling the assignment of paths to the nodes, keeping as many nodes busy as possible, and post-processing the answers computed by the nodes. The second extreme, with the finest granularity, is to track all d paths on the host processor, distributing the numerical linear algebra, polynomial system evaluation, Jacobian matrix evaluation, and possibly other things amongst the nodes. Because of the high communication overhead (whether hardware or software) on a hypercube, this approach requires an immense amount of sophistication and analysis to prevent the communication costs from overwhelming the computational costs. Also the algorithm at this granularity would have to be a major modification of the serial algorithm. A possible advantage is that the load could be balanced better, resulting in an overall speedup over a coarser grained algorithm. These issues are not simple, and much more research is needed on these and intermediate approaches. Neither extreme can be declared inferior a priori.

The first approach, having a node track an entire path, has been tried, and some results are shown in Table 2. The problem number refers to an internal numbering scheme used at General Motors Research Laboratories; problem data is available on request. These problems are all real engineering problems that have arisen at GM and elsewhere.

Table 2. Execution time (secs).

    Problem   Total    80286/             VAX       VAX
    number    degree   80287    iPSC-32   11/750    11/780
    102       256      16257    645       2438      1248
    103       625      34692    1616      5260      2656
    402       4        255      54        41        18
    403       4        84       19        14        6
    405       64       3669     335       703       334
    601       60       9450     257       1707      796
    602       60       28783    2795      4332      2054
    603       12       1200     243       325       152
    803       256      -        11527     29779     16221
    1702      16       1655     163       216       112
    1703      16       1657     162       216       112
    1704      16       1628     108       216       112
    1705      81       14336    378       1884      999
    5001      576      -        11786     49736     27815

The parallel computation scheme of assigning a path to each node seems completely transparent and straightforward. Why this is not so is illustrative of the difficulty of the whole field of parallel computation. A detailed discussion of tasks that should be trivial, but were not, is contained in [35].
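Schematically, the coarse grained scheme is an ordinary pool of workers: the d start points are independent, so the host can hand a path to whichever node is free. A Python skeleton of the idea (track_path is a placeholder standing in for the serial path tracker; none of this is HOMPACK code):

    from multiprocessing import Pool

    def track_path(start_point):
        """Placeholder: follow one zero curve of rho_c from lambda = 0 to 1."""
        ...                                # serial tracker would go here
        return start_point                 # would return a solution of F

    def solve_all(start_points, nodes=32):
        with Pool(nodes) as pool:          # "host" farms paths out to "nodes"
            return pool.map(track_path, start_points, chunksize=1)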

7. References.

[1] J. C. Alexander, The topological theory of an imbedding method, in Continuation Methods (H. G. Wacker, ed.), Academic Press, New York, 1978, pp. 37-68.
[2] J. C. Alexander and J. A. Yorke, The homotopy continuation method: numerically implementable topological procedures, Trans. Amer. Math. Soc., 242 (1978), pp. 271-284.
[3] J. C. Alexander, T.-Y. Li, and J. A. Yorke, Piecewise smooth homotopies, in Homotopy Methods and Global Convergence, B. C. Eaves, F. J. Gould, H.-O. Peitgen, and M. J. Todd, eds., Plenum, New York, 1983, pp. 1-14.
[4] E. Allgower and K. Georg, Simplicial and continuation methods for approximating fixed points, SIAM Rev., 22 (1980), pp. 28-85.
[5] S. C. Billups, An augmented Jacobian matrix algorithm for tracking homotopy zero curves, M.S. Thesis, Dept. of Computer Sci., VPI & SU, Blacksburg, VA, Sept., 1985.
[6] P. Boggs, The solution of nonlinear systems of equations by A-stable integration techniques, SIAM J. Numer. Anal., 8 (1971), pp. 767-785.
[7] P. Businger and G. H. Golub, Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269-276.
[8] S. N. Chow, J. Mallet-Paret, and J. A. Yorke, Finding zeros of maps: homotopy methods that are constructive with probability one, Math. Comp., 32 (1978), pp. 887-899.
[9] S. N. Chow, J. Mallet-Paret, and J. A. Yorke, A homotopy method for locating all zeros of a system of polynomials, in Functional Differential Equations and Approximation of Fixed Points, H.-O. Peitgen and H. O. Walther, eds., Springer-Verlag Lecture Notes in Math. 730, New York, 1979, pp. 228-237.
[10] D. F. Davidenko, On the approximate solution of systems of nonlinear equations, Ukrain. Mat. Z., 5 (1953), pp. 196-206.
[11] J. E. Dennis and R. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[12] J. P. Dunyak, J. L. Junkins, and L. T. Watson, Robust nonlinear least squares estimation using the Chow-Yorke homotopy method, J. Guidance, Control, Dynamics, 7 (1984), pp. 752-755.
[13] B. C. Eaves, Homotopies for the computation of fixed points, Math. Programming, 3 (1972), pp. 1-22.
[14] B. C. Eaves and R. Saigal, Homotopies for computation of fixed points on unbounded regions, Math. Programming, 3 (1972), pp. 225-237.
[15] D. K. Fadeev and V. N. Fadeeva, Computational Methods of Linear Algebra, Freeman, London, 1963.
[16] K. Georg, private communication, 1981.
[17] K. Georg, A note on stepsize control for numerical curve following, in Homotopy Methods and Global Convergence, B. C. Eaves, F. J. Gould, H.-O. Peitgen, and M. J. Todd, eds., Plenum, New York, 1983, pp. 145-154.
[18] K. Georg, Numerical integration of the Davidenko equation, in Numerical Solution of Nonlinear Equations, E. Allgower, K. Glashoff, and H.-O. Peitgen, eds., Springer-Verlag Lecture Notes in Math. 878, New York, 1981.
[19] P. E. Gill and W. Murray, Newton type methods for unconstrained and linearly constrained optimization, Math. Programming, 7 (1974), pp. 311-350.

[20] M. W. Heruska, L. T. Watson, and K. K. Sankara, Micropolar flow past a porous stretching sheet, Comput. & Fluids, to appear.
[21] M. R. Hestenes, The conjugate gradient method for solving linear systems, in Proc. Symp. Appl. Math., 6, Numer. Anal., AMS, New York, 1956, pp. 83-102.
[22] M. P. Kamat, L. T. Watson, and V. B. Venkayya, A quasi-Newton versus a homotopy method for nonlinear structural analysis, Comput. & Structures, 17 (1983), pp. 579-585.
[23] H. B. Keller, Numerical Solution of Two-point Boundary Value Problems, Society for Industrial and Applied Mathematics, Philadelphia, 1976.
[24], Numerical solution of bifurcation and nonlinear eigenvalue problems, in Applications of Bifurcation Theory, Academic Press, New York, 1977.
[25] R. W. Klopfenstein, Zeros of nonlinear functions, J. Assoc. Comput. Mach., 8 (1961), pp. 336-373.
[26] M. Kubicek, Dependence of solutions of nonlinear systems on a parameter, ACM Trans. Math. Software, 2 (1976), pp. 98-107.
[27] H. H. Kwok, M. P. Kamat, and L. T. Watson, Location of stable and unstable equilibrium configurations using a model trust region quasi-Newton method and tunnelling, Comput. & Structures, 21 (1985), pp. 909-916.
[28] R. Mejia, CONKUB: A conversational path-follower for systems of nonlinear equations, J. Comput. Phys., to appear.
[29] R. Menzel and H. Schwetlick, Zur Lösung parameterabhängiger nichtlinearer Gleichungen mit singulären Jacobi-Matrizen, Numer. Math., 30 (1978), pp. 65-79.
[30] O. Merrill, Applications and extensions of an algorithm to compute fixed points of upper semicontinuous mappings, Doctoral thesis, I.O.E. Dept., Univ. of Michigan, Ann Arbor, 1972.
[31] G. Meyer, On solving nonlinear equations with a one-parameter operator embedding, SIAM J. Numer. Anal., 5 (1968), pp. 739-752.
[32] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, User Guide for MINPACK-1, ANL-80-74, Argonne National Laboratory, Argonne, IL, 1980.
[33] A. P. Morgan, A transformation to avoid solutions at infinity for polynomial systems, Appl. Math. Comput., 18 (1986), pp. 77-86.
[34], A homotopy for solving polynomial systems, Appl. Math. Comput., 18 (1986), pp. 87-92.
[35] A. P. Morgan and L. T. Watson, Solving nonlinear equations on a hypercube, Proc. ASCE Structures Congress, New Orleans, Sept., 1986.
[36] W. C. Rheinboldt and C. Den Heijer, On steplength algorithms for a class of continuation methods, SIAM J. Numer. Anal., 18 (1981), pp. 925-947.
[37] W. C. Rheinboldt, On the computation of critical boundaries on equilibrium surfaces, SIAM J. Numer. Anal., 19 (1982), pp. 653-669.
[38] W. C. Rheinboldt and R. Melhem, A comparison of methods for determining turning points of nonlinear equations, Computing, 29 (1982), pp. 103-114.
[39] W. C. Rheinboldt and J. V. Burkardt, Algorithm 596: A program for a locally parameterized continuation process, ACM Trans. Math. Software, 9 (1983), pp. 236-241.
[40] E. Riks, An incremental approach to the solution of snapping and buckling problems, Internat. J. Solids Structures, 15 (1979), pp. 529-551.
[41] R. Saigal, On the convergence rate of algorithms for solving equations that are based on methods of complementary pivoting, Math. Operations Res., 2 (1977), pp. 108-124.

[42] R. Saigal and M. J. Todd, Efficient acceleration techniques for fixed point algorithms, SIAM J. Numer. Anal., 15 (1978), pp. 997-1007.
[43] R. Saigal, An efficient procedure for traversing large pieces in fixed point algorithms, in Homotopy Methods and Global Convergence, B. C. Eaves, F. J. Gould, H.-O. Peitgen, and M. J. Todd, eds., Plenum, New York, 1983, pp. 239-248.
[44] K. K. Sankara and L. T. Watson, Micropolar flow past a stretching sheet, Z. Angew. Math. Phys., 36 (1985), pp. 845-853.
[45] H. Scarf, The approximation of fixed points of a continuous mapping, SIAM J. Appl. Math., 15 (1967), pp. 1328-1343.
[46] M. R. Scott, Invariant Imbedding and its Applications to Ordinary Differential Equations, Addison-Wesley, Reading, MA, 1973.
[47] L. F. Shampine and M. K. Gordon, Computer Solution of Ordinary Differential Equations: The Initial Value Problem, W. H. Freeman, San Francisco, 1975.
[48] S. Smale, Convergent process of price adjustment and global Newton methods, J. Math. Econom., 3 (1976), pp. 107-120.
[49] C. Y. Wang and L. T. Watson, Squeezing of a viscous fluid between elliptic plates, Appl. Sci. Res., 35 (1979), pp. 195-207.
[50], Viscous flow between rotating discs with injection on the porous disc, Z. Angew. Math. Phys., 30 (1979), pp. 773-787.
[51], On the large deformations of C-shaped springs, Internat. J. Mech. Sci., 22 (1980), pp. 395-400.
[52], Theory of the constant force spring, J. Appl. Mech., 47 (1980), pp. 956-958.
[53], The fluid-filled cylindrical membrane container, J. Engrg. Math., 15 (1981), pp. 81-88.
[54], Equilibrium of heavy elastic cylindrical shells, J. Appl. Mech., 48 (1981), pp. 582-586.
[55], The elastic catenary, Internat. J. Mech. Sci., 24 (1982), pp. 349-357.
[56], Free rotation of a circular ring about a diameter, Z. Angew. Math. Phys., 34 (1983), pp. 13-24.
[57] C. Y. Wang, L. T. Watson, and M. P. Kamat, Buckling, postbuckling and the flow through a tethered elastic cylinder under external pressure, J. Appl. Mech., 50 (1983), pp. 13-18.
[58] L. T. Watson, T.-Y. Li, and C. Y. Wang, Fluid dynamics of the elliptic porous slider, J. Appl. Mech., 45 (1978), pp. 435-436.
[59] L. T. Watson and C. Y. Wang, A homotopy method applied to elastica problems, Internat. J. Solids Structures, 17 (1981), pp. 29-37.
[60], Deceleration of a rotating disc in a viscous fluid, Phys. Fluids, 22 (1979), pp. 2267-2269.
[61], The circular leaf spring, Acta Mech., 40 (1981), pp. 25-32.
[62], Hanging an elastic ring, Internat. J. Mech. Sci., 23 (1981), pp. 161-168.
[63], Overhang of a heavy elastic sheet, Z. Angew. Math. Phys., 33 (1982), pp. 17-23.
[64], Periodically supported heavy elastic sheet, J. Engrg. Mech., 109 (1983), pp. 811-820.
[65], The equilibrium states of a heavy rotating column, Internat. J. Solids Structures, 19 (1983), pp. 653-658.
[66] L. T. Watson and W. H. Yang, Optimal design by a homotopy method, Applicable Anal., 10 (1980), pp. 275-284.

[67], Methods for optimal engineering design problems based on globally convergent methods, Comput. & Structures, 13 (1981), pp. 115-119.
[68] L. T. Watson and D. Fenner, Chow-Yorke algorithm for fixed points or zeros of C2 maps, ACM Trans. Math. Software, 6 (1980), pp. 252-260.
[69] L. T. Watson, A globally convergent algorithm for computing fixed points of C2 maps, Appl. Math. Comput., 5 (1979), pp. 297-311.
[70], Numerical study of porous channel flow in a rotating system by a homotopy method, J. Comput. Appl. Math., 7 (1981), pp. 21-26.
[71], Computational experience with the Chow-Yorke algorithm, Math. Programming, 19 (1980), pp. 92-101.
[72], Fixed points of C2 maps, J. Comput. Appl. Math., 5 (1979), pp. 131-140.
[73], Solving the nonlinear complementarity problem by a homotopy method, SIAM J. Control Optim., 17 (1979), pp. 36-46.
[74], An algorithm that is globally convergent with probability one for a class of nonlinear two-point boundary value problems, SIAM J. Numer. Anal., 16 (1979), pp. 394-401.
[75], Solving finite difference approximations to nonlinear two-point boundary value problems by a homotopy method, SIAM J. Sci. Stat. Comput., 1 (1980), pp. 467-480.
[76], Engineering applications of the Chow-Yorke algorithm, Appl. Math. Comput., 9 (1981), pp. 111-133.
[77], Numerical linear algebra aspects of globally convergent homotopy methods, Tech. Rep. 86-7, Dept. of Industrial and Operations Eng., Univ. of Michigan, Ann Arbor, 1986.
[78] L. T. Watson, S. M. Holzer, and M. C. Hansen, Tracking nonlinear equilibrium paths by a homotopy method, Nonlinear Anal., 7 (1983), pp. 1271-1282.
[79] L. T. Watson, K. K. Sankara, and L. C. Mounfield, Deceleration of a porous rotating disk in a viscous fluid, Internat. J. Engrg. Sci., 23 (1985), pp. 131-137.
[80] L. T. Watson, M. P. Kamat, and M. H. Reaser, A robust hybrid algorithm for computing multiple equilibrium solutions, Engrg. Comput., 2 (1985), pp. 30-34.
[81] L. T. Watson and M. R. Scott, Solving spline collocation approximations to nonlinear two-point boundary value problems by a homotopy method, Tech. Rep. CS84015-R, Dept. of Computer Sci., VPI & SU, Blacksburg, VA, 1984.
[82] L. T. Watson, S. C. Billups, and A. P. Morgan, HOMPACK: A suite of codes for globally convergent homotopy algorithms, Tech. Rep. 85-34, Dept. of Industrial and Operations Eng., Univ. of Michigan, Ann Arbor, MI, 1985.
[83] L. T. Watson and L. R. Scott, Solving Galerkin approximations to nonlinear two-point boundary value problems by a globally convergent homotopy method, Tech. Rep. 86-16, Dept. of Industrial and Operations Eng., Univ. of Michigan, Ann Arbor, MI, 1986.

Figure 1. Elastic rod.
Figure 2. The zero set p_a^{-1}(0).
Figure 3. 4-cube structure and node labelling.
