EFFICIENT SOLUTION OF TWO STAGE STOCHASTIC LINEAR PROGRAMS USING INTERIOR POINT METHODS

John R. Birge and Derek Holmes
Department of Industrial & Operations Engineering
University of Michigan
Ann Arbor, MI 48109-2117

Technical Report 92-8
January 1992

Efficient Solution of Two Stage Stochastic Linear Programs using Interior Point Methods

J. Birge
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109
(313) 764-9422, email: John.R.Birge@um.cc.umich.edu

D. Holmes
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109
(313) 764-9422, email: Derek.Holmes@um.cc.umich.edu

30 December 1991

Professor Birge's work is supported by National Science Foundation Grant EECS-885101.

Keywords: Interior-point algorithms, stochastic programming.

Abstract

Solving deterministic equivalent formulations of two-stage stochastic linear programs using interior point methods may be computationally difficult, due to the need to factorize quite dense search direction matrices (e.g., \(AA^T\)). Several methods for improving the algorithmic efficiency of interior point algorithms by reducing the density of these matrices have been proposed in the literature. Reformulating the program decreases the effort required to find a search direction, but at the expense of increased problem size. Transpose product formulations (e.g., \(A^T A\)) work well, but are highly problem dependent. Schur complements require solutions with potentially near-singular matrices, and so suffer from numerical instabilities. Explicit factorizations of the search direction matrices eliminate these problems while requiring only the solution of several small, independent linear systems. These systems may be distributed across multiple processors. Computational experience with these methods suggests that substantial performance improvements are possible with each method, but that the least computational effort is required by the use of explicit factorizations.

1 Introduction

Many practical problems with uncertain parameters can be modeled as stochastic programs. Some examples include cash and portfolio management models (Mulvey [34]), electric power generation capacity planning (Louveaux [28]), and forestry management (Gassman [19]). A survey of documented stochastic programming formulations can be found in King [27]. Most basic among stochastic programs are Stochastic Linear Programs (SLPs), which are the stochastic extensions of standard linear programs. Even small linear programs may lead, however, to large SLPs and extensive computational requirements, since the size of these problems typically grows exponentially with the number of stochastic parameters in the formulation. The recent advent of interior point methods for the solution of large linear programs (Marsten et al. [30], Carolan et al. [10], and Monma and Morton [33]), however, holds great promise for the efficient solution of these problems. The basic requirement is the efficient solution of a sequence of large, symmetric, positive definite systems of linear equations. Generally, the solutions to these systems are obtained by factoring the coefficient matrix into some equivalent triangular matrix and back solving with a given right hand side. The ease with which the factorizations are obtained decreases significantly as the density of the coefficient matrix increases. Unfortunately, the structure of SLPs can lead to quite dense systems, limiting the use of interior point methods for their solution.

The purpose of this paper is to review various methods for improving the efficiency of solving the linear systems associated with (two-stage) stochastic linear programs, and to report both serial and parallel computational experience with one particularly promising method described in Birge and Qi [6]. This method has been shown to have a worst case computational complexity at least an order of the number of variables better than that of the standard Karmarkar algorithm. In Section 2, we review the structure of stochastic linear programs with fixed recourse and the computational requirements of interior point programs. In Section 3, we look at methods for improving the running time of the interior point codes, and focus on the Birge and Qi method. Computational results appear in Section 4. A brief summary is contained in Section 5.

2 Preliminaries

2.1 Stochastic Linear Programs

Two-stage stochastic linear programs with fixed recourse that are defined over a discrete probability space have the general form

\[
\begin{array}{ll}
\text{minimize} & c_0^T x_0 + \mathcal{Q}(x_0, \xi) \\
\text{subject to} & A_0 x_0 = b_0, \qquad x_0 \ge 0,
\end{array} \tag{1}
\]

where \(\mathcal{Q}(x_0, \xi)\) is a recourse functional describing the expected costs of undertaking a specific action \(x_0\) before the uncertainty characterized by the random variable \(\xi\) is resolved. The expectation of the recourse cost is obtained by

\[
\mathcal{Q}(x_0, \xi) = \sum_{l=1}^{N} p_l\, Q(x_0, \xi_l),
\]

where \(p_l\) is the probability that the \(l\)th scenario occurs (i.e., \(p_l = P[\xi(\omega) = \xi_l]\), where \(\xi_l\) is a realization of the random variable \(\xi\) defined on \((\Omega, \mathcal{A}, P)\)), and \(Q(x_0, \xi_l)\) is the recourse cost obtained by solving the following recourse problem:

\[
Q(x_0, \xi_l) = \inf \{\, d_l^T y_l \mid W_l y_l = b_l - T_l x_0,\ y_l \ge 0,\ y_l \in \Re^{n_l} \,\}, \qquad \xi_l \equiv (d_l, b_l, T_l, W_l), \tag{2}
\]

for each scenario \(l = 1, \ldots, N\). Here, a decision \(x_0\) is made before \(\xi_l\) is known, and a "corrective" optimal action \(y_l\) is taken after \(\xi_l\) is known. The cost of the second action is \(Q(x_0, \xi_l)\), and the expected cost with respect to the random variable \(\xi\) is \(\mathcal{Q}(x_0, \xi)\). Note here that the solution to \(Q(x_0, \xi_l)\) assumes that \(x_0\) has been fixed. This nonanticipativity restriction requires that all first stage decisions are invariant with respect to future outcomes. Substituting the recourse program into (1) and simultaneously minimizing over \((x_0, y_1, \ldots, y_N)\), we obtain

\[
\begin{array}{lll}
\text{minimize} & c_0^T x_0 + \sum_{l=1}^{N} c_l^T y_l & \\
\text{subject to} & A_0 x_0 = b_0 & \\
 & T_l x_0 + W_l y_l = b_l, & l = 1, \ldots, N \\
 & x_0,\ y_l \ge 0, &
\end{array} \tag{3}
\]

where \(c_0, x_0 \in \Re^{n_0}\), \(T_l \in \Re^{m_l \times n_0}\), \(A_0 \in \Re^{m_0 \times n_0}\), \(W_l \in \Re^{m_l \times n_l}\), and \(c_l = p_l d_l \in \Re^{n_l}\) for \(l = 1, \ldots, N\). Note that this problem has \(n = n_0 + \sum_{l=1}^{N} n_l\) columns and \(m = m_0 + \sum_{l=1}^{N} m_l\) constraints. For purposes of discussion, we assume that \(A_0\) and \(W_l\) have full row rank, \(m_l \le n_l\), \(l = 0, \ldots, N\), and \(n_0 \le \sum_{l=1}^{N} n_l\).

This problem, classified as a dual block angular linear program, was first studied by Beale [4] and Dantzig [14]. Several special purpose algorithms for solving linear programs with this special structure have been developed, including the L-shaped method of Van Slyke and Wets [39] and the decomposition method pioneered by Dantzig and Madansky [15]. Interior point algorithms such as those proposed by Karmarkar [26] and Marsten et al. [30] applied to these problems have been discussed by Birge and Qi [6] and Carpenter et al. [11].

2.2 Interior Point Methods

In the last decade, several breakthroughs in general purpose linear programming algorithms have been made using path following methods [21]. Karmarkar [26] pioneered these breakthroughs with the first algorithm that could be proven to converge to an optimal solution in polynomial, or \(O(n^2 m^2 L)\), time, where \(n\) is the size of the problem and \(L\) is a measure of the problem's data. By contrast, the worst case complexity of the simplex method cannot be bounded by a polynomial. For the purposes of discussion, we focus on what is generally called the dual affine scaling method [1] as applied to the dual block angular program described above.

Consider a linear program in the following standard equality form:

\[
\text{(P)} \qquad \text{minimize } c^T x \quad \text{subject to } Ax = b,\ x \ge 0,
\]

where \(A \in \Re^{m \times n}\) and has full row rank, \(b \in \Re^m\) is a resource vector, and \(c \in \Re^n\) is the objective function vector. The dual affine variant finds an optimal solution to the dual (D) of the problem

(P):

\[
\text{(D)} \qquad \text{maximize } b^T y \quad \text{subject to } A^T y \le c,
\]

where \(y \in \Re^m\) is a dual vector to the equality constraints of (P). In general, we could also re-write (P) in the polyhedral inequality form of (D) and apply the same algorithm. In this case, the method performs the same steps as the procedure called primal affine scaling ([40], [3], [16]). The only difference is in the form of the matrix factorization ([8]). We can, therefore, use simply affine scaling to refer to these methods.

Applying affine scaling to problem (D) requires an initial interior point \(y^0\) that satisfies dual feasibility (\(A^T y^0 < c\)), and successively iterates on a current point to find another interior point with a better objective function value. The algorithm proceeds as follows:

1. Set \(k = 0\). Select a stopping criterion (e.g., stop if \(b^T y^{k+1} - b^T y^k < \epsilon\), where \(\epsilon > 0\) is a small positive number).
2. Stop if the optimality criterion is satisfied.
3. Calculate dual slack variables: \(v^k = c - A^T y^k\).
4. Calculate the search direction, which is a function of a Newton step taken toward the "center" of the feasible region and a steepest descent step in some transformed space:
   (a) Let \(D_k = \text{diag}\{1/v_1^k, \ldots, 1/v_n^k\}\).
   (b) Let \(dy = (A D_k^2 A^T)^{-1} b\).
   (c) Let \(dv = -A^T dy\).
5. Calculate a step size:
   (a) Let \(\alpha = \gamma \times \min\{ -v_i^k / (dv)_i \mid (dv)_i < 0,\ i = 1, \ldots, n \}\).
   Here, \(0 < \gamma < 1\) is a step size parameter to ensure that the next iterate will remain interior to the feasible solution set. For most practical purposes, \(0.95 \le \gamma < 1\) is sufficient.
6. Update dual variables, primal variables, and counters:
   (a) Let \(y^{k+1} = y^k + \alpha\, dy\).
   (b) Let \(x^{k+1} = -(D_k)^2 dv\).
   (c) Let \(k = k + 1\).
   (d) Go to 2.

Dual Affine Algorithm
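To make the linear algebra in steps 3 through 6 concrete, the following sketch implements a single dual affine scaling iteration with dense NumPy arrays. It is an illustrative toy, not the implementation used in this study; the small example data, the starting point, and the fixed iteration count are assumptions introduced here.

```python
import numpy as np

def dual_affine_step(A, b, c, y, gamma=0.95):
    """One dual affine scaling iteration for  max b'y  s.t.  A'y <= c."""
    v = c - A.T @ y                      # dual slacks (must stay positive)
    D2 = np.diag(1.0 / v**2)             # D_k^2 with D_k = diag(1/v_i)
    M = A @ D2 @ A.T                     # search direction matrix A D^2 A'
    dy = np.linalg.solve(M, b)           # step 4(b)
    dv = -A.T @ dy                       # step 4(c)
    neg = dv < 0
    alpha = gamma * np.min(-v[neg] / dv[neg])   # step 5(a): ratio test
    x = -D2 @ dv                         # primal estimate, step 6(b)
    return y + alpha * dy, x

# Tiny example (data chosen only for illustration).
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 5.0])
c = np.array([3.0, 4.0, 10.0, 10.0])
y = np.zeros(2)                          # A'y = 0 < c, so y is dual interior
for _ in range(30):
    y, x = dual_affine_step(A, b, c, y)
print("dual objective b'y =", b @ y, " primal estimate x =", x)
```

The primal estimate satisfies \(Ax = b\) by construction, which is one convenient way to monitor convergence of this sketch.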

The vast majority of the computational effort required in the above procedure is spent calculating a solution to the system \((A D^2 A^T) dy = b\) (the iteration counter will be dropped whenever the context is clear), or calculating some factorization of the matrix to enable quick solution of the system. These computations are common to every interior point algorithm developed thus far (Shanno and Bagchi [37]). The matrix \(M \equiv A D^2 A^T\) is a large (\(M \in \Re^{m \times m}\)) symmetric, positive definite matrix for which several solution methods have been developed (see, e.g., Golub and Van Loan [23]). Generally speaking, a direct inversion of \(M\) is an inefficient solution method, and is seldom used in large scale implementations. There are two main strategies for solving the system \((A D^2 A^T) dy = b\): iterative methods and direct methods.

1. Iterative methods generate a sequence of approximations to \(dy\). Since these methods only involve matrix-vector multiplication, they are computationally attractive and require less storage than alternative solution procedures. Convergence to solutions of these linear systems can be unacceptably slow, however, unless special techniques (e.g., matrix preconditioners) are employed. Examples of iterative methods include the Jacobi, Gauss-Seidel, Chebyshev, Lanczos, and conjugate gradient methods. Meijerink and van der Vorst [32] discuss the use of these methods in interior point algorithms.

2. Direct methods calculate the exact solution to the set of equations \((A D^2 A^T) dy = b\) by factoring the matrix \(A D^2 A^T\) and using backward/forward substitution to find \(dy\). The most common schemes in use are LU factorization and Cholesky (\(LL^T\)) factorization. The effectiveness of these methods depends on the use of special data structures and pivoting rules, and on the characteristics of the coefficient matrix itself. Examples of software implementations include YSMP (Eisenstat [18]) and SPARSPAK (Chu et al. [13]).

Direct and iterative methods can also be combined by using ideas from the direct solution procedures to generate an effective preconditioner that improves the convergence of iterative methods. This paper will focus on implementations of interior point algorithms that use direct methods only.

The ease with which direct methods may be used depends heavily on the amount of fill-in, or density, of the factorized matrix. Matrices which can be rearranged to minimize fill-in can be stored using less memory, and hence require fewer operations to update the factorization or obtain a solution. However, matrices that are ill-structured will generate an extremely dense matrix \(M\), and hence can be quite inefficient to solve. The density of the matrix \(A D^2 A^T\) largely depends on the number of dense columns contained in the original matrix \(A\). Unfortunately, the dual block angular program (3) described in Section 2.1 potentially contains many dense columns, as the sketch below and the expansion that follows it illustrate.
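As referenced above, the following sketch builds a small dual block angular constraint matrix from randomly generated blocks and counts the nonzeros of \(A D^2 A^T\). The block dimensions and random sparsity are assumptions chosen only to make the dense first-stage coupling visible; they do not correspond to any of the test problems used later.

```python
import numpy as np
import scipy.sparse as sp

N, m0, n0, m1, n1 = 8, 5, 6, 10, 12       # assumed toy dimensions

A0 = sp.random(m0, n0, density=0.4, random_state=1)
T  = sp.random(m1, n0, density=0.3, random_state=2)
W  = sp.random(m1, n1, density=0.2, random_state=3) + sp.eye(m1, n1)

# Dual block angular matrix (3): first-stage columns appear in every block row.
blocks = [[A0] + [None] * N]
for l in range(N):
    row = [T] + [None] * N
    row[1 + l] = W
    blocks.append(row)
A = sp.bmat(blocks, format="csr")

d = np.random.default_rng(0).uniform(0.5, 2.0, A.shape[1])   # stand-in for D
M = (A @ sp.diags(d**2) @ A.T).tocsr()                        # A D^2 A^T

print("A    :", A.shape, "nnz =", A.nnz)
print("ADA^T:", M.shape, "nnz =", M.nnz,
      "density = %.2f" % (M.nnz / (M.shape[0] ** 2)))
```

The dense off-diagonal blocks coming from the shared first-stage columns are exactly what the expansion below exhibits symbolically.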

To see this, let \(D_l^2 \in \Re^{n_l \times n_l}\) be defined by \(D_l^2 = \text{diag}\{(v_1^l)^{-2}, \ldots, (v_{n_l}^l)^{-2}\}\), \(l = 0, \ldots, N\). Suppose further that \(T_l = T\) and \(W_l = W\), \(l = 1, \ldots, N\). Then the required system to solve is \((A D^2 A^T) dy = b\), where

\[
A D^2 A^T =
\begin{bmatrix}
A_0 & & & \\
T & W & & \\
\vdots & & \ddots & \\
T & & & W
\end{bmatrix}
\begin{bmatrix}
D_0^2 & & \\
 & \ddots & \\
 & & D_N^2
\end{bmatrix}
\begin{bmatrix}
A_0^T & T^T & \cdots & T^T \\
 & W^T & & \\
 & & \ddots & \\
 & & & W^T
\end{bmatrix}
\]

\[
=
\begin{bmatrix}
A_0 D_0^2 A_0^T & A_0 D_0^2 T^T & A_0 D_0^2 T^T & \cdots & A_0 D_0^2 T^T \\
T D_0^2 A_0^T & T D_0^2 T^T + W D_1^2 W^T & T D_0^2 T^T & \cdots & T D_0^2 T^T \\
T D_0^2 A_0^T & T D_0^2 T^T & T D_0^2 T^T + W D_2^2 W^T & \cdots & T D_0^2 T^T \\
\vdots & \vdots & & \ddots & \vdots \\
T D_0^2 A_0^T & T D_0^2 T^T & T D_0^2 T^T & \cdots & T D_0^2 T^T + W D_N^2 W^T
\end{bmatrix}.
\]

Clearly, the presence of the columns associated with the \(T\) matrices creates an extremely dense matrix \(M\) to factorize. For this reason, Arantes and Birge [2] found that dual block angular programs in the primal form are expensive (if possible) to solve, even with basic preprocessing or row reordering to reduce fill-in. The extent to which the fill-in affects computational performance will be explored empirically in Section 4. As we will see in the next section, there are many ways to approach the problem of dense columns in a coefficient matrix to reduce fill-in and improve solution times.

3 Methods for Reducing Computational Requirements

Several modifications to both the formulation of the block angular program and the implementation of the interior point algorithm required for its solution have been proposed. Their intent is either to reduce the number of dense columns in the coefficient matrix of the linear program or to separate them explicitly from the other (non-dense) columns. Four alternatives will be explored in this section: reformulation of the program to split up dense columns (Carpenter et al. [11]), solution of the dual to the program (or a factorization formed from the dual; Arantes and Birge [2] and Birge, Freund and Vanderbei [8]), the use of the Schur complement to remove dense columns (Lustig et al. [29]), and direct solution by a special factorization of the matrix \(A D^2 A^T\) (Birge and Qi [6]).

3.1 Reformulation of the Program

Carpenter, Lustig, and Mulvey [11] consider a two-stage generalized stochastic network derived from a portfolio management model (developed in Mulvey and Vladimirou [35]). The model can be written in the standard form given in (3), where \(A_0\) constrains the flow of capital between assets in the first stage of the problem, and \(W\) contains the stochastic arc multipliers that describe the yields of each asset modeled. To remove the dense columns associated with the first stage decisions, they split the first \(n_0\) columns into scenario dependent variables. Whereas nonanticipativity in the original formulation (3) is enforced by the fact that the first \(n_0\) columns are invariant with respect to each scenario, they enforce nonanticipativity by including explicit constraints that guarantee the invariance. Specifically, if \(x_0(l)\) is the first stage decision given that scenario \(l\) occurs, nonanticipativity requires that \(x_0(l+1) = x_0(l)\) for all \(l = 1, \ldots, N-1\). The resulting formulation (called the full-splitting formulation) removes nonzeros from the first stage columns for scenario \(l\) from all constraints except those associated with the first stage columns

for scenarios \(l-1\) and \(l+1\), and is

\[
\begin{array}{ll}
\min & c_0^T x_0 + c_1^T y_1 + \cdots + c_N^T y_N \\
\text{s.t.} & A_0 x_0 = b_0 \\
 & I x_0 - I x_1 = 0 \\
 & T x_1 + W y_1 = b_1 \\
 & I x_1 - I x_2 = 0 \\
 & \qquad \vdots \\
 & T x_N + W y_N = b_N \\
 & x_l,\ y_l \ge 0.
\end{array} \tag{4}
\]

By concentrating the nonzero elements of the constraint matrix of (4) around the diagonal, the density of the matrix \(A D^2 A^T\) is reduced. However, the full-splitting formulation also increases the overall size of the problem. In a set of ten portfolio test problems run in [11], the average increase in the number of rows was 47.2% and the average increase in the number of columns was 12.8%. (A sketch of how the split constraint matrix is assembled from the blocks of (3) appears at the end of this subsection.)

Further improvements to the full-splitting formulation can be made by splitting only those first-stage variables which have nonzero elements in the rows of \(T_l\), and leaving the remaining first-stage decision variables in the \(A_0\) matrix only. This partial splitting representation can effectively limit the increase in problem size that occurs in the full-splitting formulation. For the stochastic network test problems in the paper, the average row and column growths for the partial splitting model were 12.8% and 5.2%, respectively.

Although reformulating the dual block angular problem increases its size, the resulting improvement in \(A D^2 A^T\) fill-in can be substantial. On the same test problems mentioned above, the original formulation (3) had (on average) 2.47 times more nonzeros than the partial splitting representation. As a result, the Cholesky factorizations of the \(A D^2 A^T\) matrices are also less dense (on average 2.8 times less), and can be solved more efficiently. As might be expected, the reduction in matrix density also has a substantial impact on solution times. When solved using a commercial implementation of a primal-dual interior point algorithm (OB1 [29]), the full splitting formulation was (on average) 4.65 times faster than the original formulation. The partial splitting formulation was (on average) 10.8 times faster than the original formulation. The average speedup of the partial splitting model over the original formulation run on a well-known simplex based LP solver (MINOS 5.3, developed by Murtagh and Saunders [36]) was 5.58.

Unfortunately, the effectiveness of reformulating the linear program is largely dependent on the relative sizes and forms of the first stage coefficient matrices. In general, two-stage stochastic linear programs may have large first stage coefficient matrices or many first stage decision variables which appear in linking constraints to recourse decisions. Since Carpenter et al. only considered one type of stochastic decision problem, other methods which are not as problem specific might be more robust.
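As referenced above, the following sketch assembles the full-splitting constraint matrix of (4) from the blocks \(A_0\), \(T\), and \(W\) of the original formulation (3) using scipy.sparse. The block sizes and random data are assumptions for illustration; the point is only that every nonzero lies in a narrow band of block rows.

```python
import scipy.sparse as sp

def full_splitting_matrix(A0, T, W, N):
    """Constraint matrix of the full-splitting formulation (4).

    Variable order: x_0, (x_1, y_1), ..., (x_N, y_N).
    Row order:      A_0 x_0 = b_0, then for l = 1..N the pair
                    x_{l-1} - x_l = 0 and T x_l + W y_l = b_l.
    """
    n0 = A0.shape[1]
    I = sp.eye(n0)
    nvar_blocks = 1 + 2 * N                # x_0 plus (x_l, y_l) pairs
    rows = [[A0] + [None] * (nvar_blocks - 1)]
    for l in range(1, N + 1):
        link = [None] * nvar_blocks        # nonanticipativity: x_{l-1} = x_l
        link[0 if l == 1 else 2 * l - 3] = I
        link[2 * l - 1] = -I
        rows.append(link)
        rec = [None] * nvar_blocks         # recourse row: T x_l + W y_l = b_l
        rec[2 * l - 1] = T
        rec[2 * l] = W
        rows.append(rec)
    return sp.bmat(rows, format="csr")

# Example with toy blocks (dimensions assumed, as in the earlier sketch).
A0 = sp.random(5, 6, density=0.4, random_state=1)
T = sp.random(10, 6, density=0.3, random_state=2)
W = sp.random(10, 12, density=0.2, random_state=3) + sp.eye(10, 12)
A_split = full_splitting_matrix(A0, T, W, N=8)
print(A_split.shape, "nnz =", A_split.nnz)
```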

3.2 Using the Transpose Product Factorization

The density of the matrix \(A D^2 A^T\) was shown in Section 2.2 to be quite high when a problem which exhibits a dual block angular structure is solved using dual affine scaling. Arantes and Birge [2] suggested that reformulating the primal problem in the polyhedral inequality form would allow more efficient computation, since the relevant matrix for computation is \(A^T D^2 A\), which is much sparser than \(A D^2 A^T\). Reformulation is in fact not necessary (see [8], where it is shown that computations with the \(A D^2 A^T\) matrix structure can always be replaced by computations with the \(A^T D^2 A\) structure). It is, however, illustrative to think of this approach as solving the dual of the original problem, again using the dual affine scaling method. The dual of (3) is

\[
\begin{array}{ll}
\max & b_0^T y_0 + \sum_{l=1}^{N} b_l^T y_l \\
\text{s.t.} & A_0^T y_0 + \sum_{l=1}^{N} T_l^T y_l \le c_0 \\
 & W_l^T y_l \le c_l, \qquad l = 1, \ldots, N.
\end{array} \tag{5}
\]

Since the resulting coefficient matrix \(B\) of (5) has \(n = n_0 + \sum_{l=1}^{N} n_l\) rows and \(m = m_0 + \sum_{l=1}^{N} m_l\) columns, \(B D^2 B^T\) is of order \(n \times n\), and so may be considerably larger than the search direction matrix of (3). However, the matrix \(B D^2 B^T\) exhibits the \(A^T D^2 A\) matrix structure and enables an efficient Cholesky factorization. Specifically, with \(T_l = T\) and \(W_l = W\),

\[
B D^2 B^T =
\begin{bmatrix}
A_0^T D_0^2 A_0 + \sum_{l=1}^{N} T^T D_l^2 T & T^T D_1^2 W & T^T D_2^2 W & \cdots & T^T D_N^2 W \\
W^T D_1^2 T & W^T D_1^2 W & 0 & \cdots & 0 \\
W^T D_2^2 T & 0 & W^T D_2^2 W & & \vdots \\
\vdots & \vdots & & \ddots & 0 \\
W^T D_N^2 T & 0 & \cdots & 0 & W^T D_N^2 W
\end{bmatrix}. \tag{6}
\]

As an example of the different fill-in characteristics of the primal and dual problems, pictures of the \(A D^2 A^T\) and \(B D^2 B^T\) matrices are shown in Figure 1 for a test problem (SC205 with 16 scenarios; problem characteristics are given in Section 4.1). Since the density of the Cholesky factorization of a matrix depends on the sparsity of that matrix, the factorization of the matrix (6) should be sparse. Solution times reported by Arantes and Birge for a common test set (Ho and Loute [24], Birge [9]) were much faster using the \(B D^2 B^T\) form than the \(A D^2 A^T\) form. The largest problem they tried (32 scenarios, 7023 nonzeros in \(M\)) was a full order of magnitude faster with \(B D^2 B^T\). However, the off-diagonal blocks of the matrix \(B D^2 B^T\) contain matrix products with the recourse matrix \(W\), whereas the matrix \(A D^2 A^T\) does not. Hence, if \(W\) is unusually large or dense, solving with \(B D^2 B^T\) may not be as efficient. More extensive comparisons may be found in Section 4. In general, solving stochastic linear programs with the form based on the \(A^T D^2 A\) structure (as in \(B D^2 B^T\)) appears preferable to solving using the \(A D^2 A^T\) structure; the sketch below compares the two nonzero counts on a small instance.
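The following sketch, a continuation of the earlier toy example, compares the nonzero counts of the two symmetric systems discussed above: \(A D^2 A^T\) built from the primal dual block angular matrix, and the transpose product \(A^T D^2 A\) (the structure exhibited by \(B D^2 B^T\)). The dimensions and random blocks are assumptions; actual fill-in also depends on the ordering used by the Cholesky factorization.

```python
import numpy as np
import scipy.sparse as sp

N, m0, n0, m1, n1 = 8, 5, 6, 10, 12       # assumed toy dimensions

A0 = sp.random(m0, n0, density=0.4, random_state=1)
T = sp.random(m1, n0, density=0.3, random_state=2)
W = sp.random(m1, n1, density=0.2, random_state=3) + sp.eye(m1, n1)

rows = [[A0] + [None] * N]
for l in range(N):
    r = [T] + [None] * N
    r[1 + l] = W
    rows.append(r)
A = sp.bmat(rows, format="csr")

rng = np.random.default_rng(0)
d2 = sp.diags(rng.uniform(0.5, 2.0, A.shape[1]) ** 2)   # column scaling
e2 = sp.diags(rng.uniform(0.5, 2.0, A.shape[0]) ** 2)   # row scaling

ADA = (A @ d2 @ A.T).tocsr()              # primal form, order m x m
ATA = (A.T @ e2 @ A).tocsr()              # transpose product form, order n x n
print("A D^2 A^T:", ADA.shape, "nnz =", ADA.nnz)
print("A^T D^2 A:", ATA.shape, "nnz =", ATA.nnz)
```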

3.3 Schur Complement

Many implementations of interior point algorithms avoid dense columns in coefficient matrices by explicitly removing them and accounting for them separately. The mechanism for solving the system \((A D^2 A^T) dy = b\) is the Schur complement, which involves solving a small, dense system derived from a larger sparse matrix. Consider a coefficient matrix \(A\) which can be partitioned into \([A_d\ A_s]\), where \(A_s \in \Re^{m \times n_s}\) is a submatrix containing only sparse columns, and \(A_d \in \Re^{m \times n_d}\) contains only dense columns. In the case of the two-stage dual block angular program, \(A_d\) contains the columns corresponding to the first stage decisions and \(A_s\) contains the columns corresponding to the second stage decisions. Let \(D_s\) and \(D_d\) be diagonal matrices corresponding to \(A_s\) and \(A_d\). Then \(A D^2 A^T = A_s D_s^2 A_s^T + A_d D_d^2 A_d^T\). Let the Cholesky factorization of \(A_s D_s^2 A_s^T\) be \(LL^T\), let \(V = A_d D_d\), and let \(\delta = -V^T dy\). Then \(V V^T = A_d D_d^2 A_d^T\), and solving the system

\[
\begin{bmatrix} LL^T & -V \\ V^T & I \end{bmatrix}
\begin{bmatrix} dy \\ \delta \end{bmatrix}
=
\begin{bmatrix} b \\ 0 \end{bmatrix} \tag{7}
\]

is equivalent to solving \((A D^2 A^T) dy = b\). From the first set of equations in (7), we get \(LL^T dy = b + V\delta\), or

\[
dy = (LL^T)^{-1}(b + V\delta). \tag{8}
\]

From the second set of equations in (7), we find that

\[
V^T dy + \delta = 0. \tag{9}
\]

Substituting (8) into (9),

\[
[\,I + V^T (LL^T)^{-1} V\,]\,\delta = -V^T (LL^T)^{-1} b. \tag{10}
\]

The matrix \(I + V^T (LL^T)^{-1} V\) is a Schur complement and a dense matrix of order \(n_d \times n_d\). Methods specific to the solution of dense matrices (e.g., a dense Cholesky factorization) can be used to solve (10), while sparse solution methods can be used to calculate (8) and the right hand side of (10). Once \(\delta\) has been calculated, the search direction is the solution to \(LL^T dy = b + V\delta\). The entire method requires \(n_d + 1\) sparse Cholesky backsolves (\(n_d\) solutions to \((LL^T) z = V_{\cdot i}\) and one solution to \((LL^T) z = b\)) as well as a dense backsolve for \(\delta\).

As shown in Lustig et al. [29], use of the Schur complement can significantly reduce the overall effort needed to solve the search direction system, since removal of dense columns results in quite sparse Cholesky factorizations. However, as reported in Choi et al. [12], there are two disadvantages to using the Schur complement. First, as the number of dense columns grows large, the effort needed to solve the dense system \(I + V^T (LL^T)^{-1} V\) grows markedly. As an example, Choi et al. [12] solved the ISRAEL problem in the NETLIB test set (Gay [20]) with different sizes of the dense submatrix. With the number of dense columns set at 40, the time required to solve the problem was over twice that when 6 columns were included in the dense partition. The second disadvantage is possible numerical instability. For dual block angular programs, moving a first stage column into \(A_d\) leaves a column with no nonzeros in \(A_s D_s^2 A_s^T\). For example, consider a one scenario problem with coefficient matrix

\[
A = \begin{bmatrix} A_0 & 0 \\ T & W \end{bmatrix}.
\]

In this case, \(A_d = \begin{bmatrix} A_0 \\ T \end{bmatrix}\) and

\[
A D^2 A^T = A_d D_d^2 A_d^T + A_s D_s^2 A_s^T
= \begin{bmatrix} A_0 \\ T \end{bmatrix} D_d^2 \begin{bmatrix} A_0^T & T^T \end{bmatrix}
+ \begin{bmatrix} 0 & 0 \\ 0 & W D_s^2 W^T \end{bmatrix} \tag{11}
\]

\[
= \begin{bmatrix} A_0 D_d^2 A_0^T & A_0 D_d^2 T^T \\ T D_d^2 A_0^T & T D_d^2 T^T \end{bmatrix}
+ \begin{bmatrix} 0 & 0 \\ 0 & W D_s^2 W^T \end{bmatrix}. \tag{12}
\]

Since \(A_s D_s^2 A_s^T\) is singular, the above procedure fails. Even if \(A_s\) is forced to have full rank, the procedure is likely to suffer from numerical instability. A sketch of the Schur complement solve in the well-conditioned case follows.
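A minimal dense-algebra sketch of the Schur complement solve (7)-(10) follows, assuming the sparse part \(A_s D_s^2 A_s^T\) is nonsingular. In practice \(L\) would come from a sparse Cholesky factorization and the dense-column partition from the problem structure; both, along with the random data and sizes, are assumptions here.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def schur_direction(As, Ad, ds, dd, b):
    """Solve (A D^2 A^T) dy = b with A = [Ad As] via the Schur complement."""
    M_sparse = As @ np.diag(ds**2) @ As.T          # A_s D_s^2 A_s^T (assumed SPD)
    L = cho_factor(M_sparse)                       # stands in for the sparse LL^T
    V = Ad @ np.diag(dd)                           # V = A_d D_d
    LinvV = cho_solve(L, V)                        # n_d sparse backsolves
    Linvb = cho_solve(L, b)                        # one more backsolve
    S = np.eye(V.shape[1]) + V.T @ LinvV           # dense Schur complement of (10)
    delta = np.linalg.solve(S, -V.T @ Linvb)
    return cho_solve(L, b + V @ delta)             # dy from (8)

# Small random instance (sizes and data are assumptions for illustration).
rng = np.random.default_rng(0)
m, ns, nd = 12, 20, 3
As = rng.standard_normal((m, ns))                  # the "sparse" columns
Ad = rng.standard_normal((m, nd))                  # the dense columns
ds, dd = rng.uniform(0.5, 2.0, ns), rng.uniform(0.5, 2.0, nd)
b = rng.standard_normal(m)

dy = schur_direction(As, Ad, ds, dd, b)
M = As @ np.diag(ds**2) @ As.T + Ad @ np.diag(dd**2) @ Ad.T
print("residual:", np.linalg.norm(M @ dy - b))     # should be near zero
```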

To address this problem, various methods have been proposed to improve the accuracy of the algorithm. For example, Lustig et al. [29] use iterative refinement to improve the accuracy of solutions to equations involving \(LL^T\). For some problems in the NETLIB test set [20], however, they are only able to guarantee the solution to one decimal place. This reflects the inherent difficulty of problems with many dense columns. While loss of numerical accuracy may be a problem for dual block angular programs with many dense columns, the use of the Schur complement may improve solution times substantially for problems with a few dense columns. For a class of stochastic network models, Carpenter et al. [11] halved solution times using the Schur complement over a splitting reformulation (Section 3.1). However, the inherent loss of accuracy associated with the Schur complement suggests that it may not be desirable for solving dual block angular programs.

In many general linear programs, an identity (or a substantial part of an identity) matrix exists in the original constraint matrix \(A\), due to slack or surplus variables. In this case, \(AA^T\) may be written as \(A_d A_d^T + I_s\). Now, the Schur complement approach can be used to write \((AA^T)^{-1} = (I_s + A_d A_d^T)^{-1} = I_s - A_d (I_d + A_d^T A_d)^{-1} A_d^T\), where we use \(I_d\) to denote an identity of rank \(n_d\). Other diagonal coefficients corresponding to slack variable values can be used in place of \(I_s\) and \(I_d\) with the same basic result. In this way, solution using the \(AA^T\) structure is replaced by solution with the \(A^T A\) structure, which may be sparser. This is the approach in [8] that allows the dual factorization form to be used in solving the primal problem. Even when an identity does not exist, one can be added to the \(A_s D_s^2 A_s^T\) part of \(A D^2 A^T\) and subtracted from \(A_d D_d^2 A_d^T\). This allows \(I_s + A_s D_s^2 A_s^T\) to remain well conditioned. The next section describes this method to maintain accuracy in a form similar to the Schur complement.

3.4 Explicit Factorization of Dual Block Angular Programs

The solution of the set of equations that determine search directions in affine scaling algorithms may also be accomplished by decompositions specific to the dual block angular structure. Birge and Qi [6] proposed a better conditioned form of the Schur complement using a generalized version of the Sherman-Morrison-Woodbury formula (see, e.g., Golub and Van Loan [23]). While the full calculation of the affine scaling step may generally be performed in \(O(m^2 n)\) operations, the decomposition they propose reduces the computational complexity to \(O(n^2)\) operations (assuming the \(n_l\), \(l = 0, \ldots, N\), are of comparable size). In this subsection, we review the theoretical result obtained and discuss the procedure for its implementation. Serial and parallel computational results for this particular method are reported in Section 4.

The main result obtained by Birge and Qi notes that the matrix \(A D^2 A^T\) may be written as the sum of a block diagonal matrix \(S\) (the \(I_s + A_s D_s^2 A_s^T\) segment) and the product of two similar matrices \(U\) and \(V\) (defined below). Given this representation, the Sherman-Morrison-Woodbury formula, reproduced here for convenience, may be used to find the inverse of the matrix. The notation used in the following follows that of Section 2.2.

Lemma 1: For any matrices \(A\), \(U\), \(V\) such that \(A\) and \((I + V^T A^{-1} U)\) are invertible,

\[
(A + U V^T)^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}. \tag{13}
\]
Proof:

\[
\begin{aligned}
\bigl[A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}\bigr](A + U V^T)
&= I + A^{-1} U V^T - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1} U V^T \\
&= I + A^{-1} U V^T - A^{-1} U (I + V^T A^{-1} U)^{-1}(I + V^T A^{-1} U) V^T \\
&= I + A^{-1} U V^T - A^{-1} U V^T = I. \qquad \blacksquare
\end{aligned}
\]
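A quick numerical check of identity (13) on random matrices can be reassuring when implementing the decomposition; the dimensions and random data in this sketch are arbitrary assumptions, and the check is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # well-conditioned, invertible
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

Ainv = np.linalg.inv(A)
small = np.eye(k) + V.T @ Ainv @ U                    # the k x k matrix inverted in (13)
smw = Ainv - Ainv @ U @ np.linalg.solve(small, V.T @ Ainv)

direct = np.linalg.inv(A + U @ V.T)
print("max |SMW - direct| =", np.abs(smw - direct).max())   # ~1e-14
```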

The lemma is the main tool in deriving the BQ factorization. We slightly change the form of the \(M\) matrix in [6] to obtain the following result. The proof follows the same development as [6].

Theorem 1: Consider the feasible region of the dual block angular program given in (3) and some dual solution \((y_0, \ldots, y_N)\). Let \(M = A D^2 A^T\) and \(S = \text{diag}\{S_0, S_1, \ldots, S_N\}\), where \(S_l = W_l D_l^2 W_l^T\), \(l = 1, \ldots, N\), \(S_0 = I_2 \in \Re^{m_0 \times m_0}\), and \(D_l^2 = \text{diag}\{(v_1^l)^{-2}, \ldots, (v_{n_l}^l)^{-2}\}\). Furthermore, let \(I_1\) and \(I_2\) be identity matrices of dimension \(n_0\) and \(m_0\), respectively. Also, let

\[
G_1 = D_0^{-2} + \sum_{l=0}^{N} T_l^T S_l^{-1} T_l \quad (\text{with } T_0 \equiv A_0), \qquad
G_2 = -A_0 G_1^{-1} A_0^T,
\]

\[
\bar{U} = \begin{bmatrix} A_0 & I_2 \\ T_1 & 0 \\ \vdots & \vdots \\ T_N & 0 \end{bmatrix}, \qquad
\bar{V} = \begin{bmatrix} A_0 & -I_2 \\ T_1 & 0 \\ \vdots & \vdots \\ T_N & 0 \end{bmatrix}.
\]

If \(A_0\) and \(W_l\), \(l = 1, \ldots, N\), have full row rank, then \(G_2\) and \(M\) are invertible and

\[
M^{-1} = S^{-1} - S^{-1} \bar{U}
\begin{bmatrix} G_1 & A_0^T \\ -A_0 & 0 \end{bmatrix}^{-1}
\bar{V}^T S^{-1}. \tag{14}
\]

Proof: In the affine scaling algorithm, \(v^k\) is positive, so \(D\) is invertible. By assumption, \(W_l\) has full row rank, so \(S_l\) is invertible for \(l = 1, \ldots, N\), and \(S\) is invertible. Let \(\bar{D} = \text{diag}\{D_0, I_2\}\) and

\[
U = \begin{bmatrix} A_0 D_0 & I_2 \\ T_1 D_0 & 0 \\ \vdots & \vdots \\ T_N D_0 & 0 \end{bmatrix}, \qquad
V = \begin{bmatrix} A_0 D_0 & -I_2 \\ T_1 D_0 & 0 \\ \vdots & \vdots \\ T_N D_0 & 0 \end{bmatrix}.
\]

Note that \(U = \bar{U}\bar{D}\) and \(V = \bar{V}\bar{D}\). By construction, \(M = S + U V^T\). Applying Lemma 1 (the Sherman-Morrison-Woodbury formula), \(M\) is invertible and

\[
\begin{aligned}
M^{-1} = (S + U V^T)^{-1}
&= S^{-1} - S^{-1} U (I + V^T S^{-1} U)^{-1} V^T S^{-1} \quad (\text{if } S \text{ and } I + V^T S^{-1} U \text{ are invertible}) \\
&= S^{-1} - S^{-1} \bar{U} \bar{D} (I + \bar{D} \bar{V}^T S^{-1} \bar{U} \bar{D})^{-1} \bar{D} \bar{V}^T S^{-1} \\
&= S^{-1} - S^{-1} \bar{U} (\bar{D}^{-2} + \bar{V}^T S^{-1} \bar{U})^{-1} \bar{V}^T S^{-1}.
\end{aligned}
\]

Let

\[
G = \begin{bmatrix} G_1 & A_0^T \\ -A_0 & 0 \end{bmatrix},
\quad \text{where } G_1 = D_0^{-2} + \sum_{l=0}^{N} T_l^T S_l^{-1} T_l
\ \text{and}\ G_2 = -A_0 G_1^{-1} A_0^T.
\]

Then

\[
\bar{V}^T S^{-1} \bar{U} =
\begin{bmatrix} A_0^T A_0 + \sum_{l=1}^{N} T_l^T S_l^{-1} T_l & A_0^T \\ -A_0 & -I_2 \end{bmatrix}
\quad \text{and} \quad \bar{D}^{-2} + \bar{V}^T S^{-1} \bar{U} = G.
\]

So,

\[
M^{-1} = S^{-1} - S^{-1} \bar{U} G^{-1} \bar{V}^T S^{-1} \tag{15}
\]

if and only if \((I + V^T S^{-1} U)\), or equivalently \(G\), is invertible. \(G_1\) can be expanded as

\[
G_1 = D_0^{-2} + A_0^T A_0 + \sum_{l=1}^{N} T_l^T S_l^{-1} T_l.
\]

By construction, \(D_0^{-2}\) is positive definite and symmetric, and \(A_0^T A_0\) is positive semidefinite and symmetric. Since \(S_l\) is positive definite for all \(l = 1, \ldots, N\), each \(T_l^T S_l^{-1} T_l\) is positive semidefinite and symmetric. The sum of a positive definite matrix and positive semidefinite matrices is again positive definite, so \(G_1\) is positive definite and symmetric. So, \(G_1^{-1}\) exists, is symmetric, and can be written as \(G_1^{-1} = G_1^{-1/2} G_1^{-1/2}\), where \(G_1^{-1/2}\) is also symmetric. By assumption, \(A_0\) has full row rank, so \(A_0 G_1^{-1/2}\) has full row rank. Consequently, \(G_2 = -A_0 G_1^{-1} A_0^T\) is invertible, and

\[
G^{-1} =
\begin{bmatrix} G_1 & A_0^T \\ -A_0 & 0 \end{bmatrix}^{-1}
=
\begin{bmatrix}
G_1^{-1} + G_1^{-1} A_0^T G_2^{-1} A_0 G_1^{-1} & G_1^{-1} A_0^T G_2^{-1} \\
-G_2^{-1} A_0 G_1^{-1} & -G_2^{-1}
\end{bmatrix}.
\]

Since \(G\) is invertible, (15) holds. Consequently, \(M\) is invertible and (14) holds. \(\blacksquare\)

Using Theorem 1 to explicitly compute the inverse of the matrix \(M\) is not the most efficient way to determine the search direction \(dy\). However, the system of equations \(M dy = b\) may be efficiently solved using Theorem 1 by solving (in order)

\[
S p = b, \qquad G q = \bar{V}^T p, \qquad S r = \bar{U} q, \tag{16}
\]

and setting \(dy = p - r\). To verify that \(dy = M^{-1} b\), note that

\[
dy = p - r = S^{-1} b - S^{-1} \bar{U} q = [\,S^{-1} - S^{-1} \bar{U} (G^{-1} \bar{V}^T S^{-1})\,] b = M^{-1} b.
\]

Further simplification of the second equation of (16) may be made by symbolically expanding \(G\) into its components \(G_1\) and \(A_0\). Let \(q^T = (q_1^T, q_2^T)\), where \(q_1 \in \Re^{n_0}\) and \(q_2 \in \Re^{m_0}\), and let

\[
\bar{p}_1 = A_0^T p_0 + \sum_{l=1}^{N} T_l^T p_l, \qquad \bar{p}_2 = p_0, \tag{17}
\]

so that \(\bar{V}^T p = (\bar{p}_1^T, -\bar{p}_2^T)^T\). Then solving

\[
\begin{bmatrix} G_1 & A_0^T \\ -A_0 & 0 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
=
\begin{bmatrix} \bar{p}_1 \\ -\bar{p}_2 \end{bmatrix}
\]

implies that

\[
q_2 = G_2^{-1} (\bar{p}_2 - A_0 G_1^{-1} \bar{p}_1), \qquad
q_1 = G_1^{-1} (\bar{p}_1 - A_0^T q_2).
\]

Solving (16) requires Cholesky factorizations of \(S_l\), \(G_1\), and \(G_2\). At each iteration of the affine scaling algorithm, an update of \(S_l\) to reflect the current dual solution must precede the solution of \(M dy = b\). Given the updated matrices \((D_l)^2\) and subsequent updates of the Cholesky factorizations of \(S_l\), \(dy\) may be found using the algorithm described below.

Procedure finddy(\(S\), \(A_0, \ldots, A_N\), \(b\), \(dy\))   (here \(A_l \equiv T_l\) for \(l = 1, \ldots, N\))
begin
1. (Solve \(S p = b\).) Solve \(S_l p_l = b_l\) for \(p_l\), \(l = 0, \ldots, N\).
2. (Solve \(G q = \bar{V}^T p\).)
   (a) Form \(G_1\) by solving \(S_l (u_l)_i = (A_l)_{\cdot i}\) for \((u_l)_i\), \(l = 0, \ldots, N\), \(i = 1, \ldots, n_0\), and setting \((G_1)_{\cdot i} = (D_0^{-2})_{\cdot i} + \sum_{l=0}^{N} A_l^T (u_l)_i\).
   (b) Form \(\bar{p}_1\) and \(\bar{p}_2\) using (17).
   (c) Solve \(G_1 u = \bar{p}_1\) for \(u\) and set \(v = \bar{p}_2 - A_0 u\).
   (d) Form \(G_2\) by solving \(G_1 w_i = (A_0^T)_{\cdot i}\) for \(w_i\), \(i = 1, \ldots, m_0\), and setting \(G_2 = -A_0 [w_1 \cdots w_{m_0}]\).
   (e) Solve \(G_2 q_2 = v\) for \(q_2\), and solve \(G_1 q_1 = \bar{p}_1 - A_0^T q_2\) for \(q_1\).
3. (Solve \(S r = \bar{U} q\).) Set \(r_0 = A_0 q_1 + q_2\) and solve \(S_l r_l = A_l q_1\) for \(r_l\), \(l = 1, \ldots, N\).
4. (Form \(dy\).) Set \((dy)_l = p_l - r_l\) for \(l = 0, \ldots, N\). Return \(dy\).
end.

Forming \(dy\) using Birge and Qi's Decomposition

There are several advantages to using this approach to find the search direction \(dy\); a small sketch of the procedure appears at the end of this subsection. Unlike the Schur complement, the method does not suffer from ill-conditioned working matrices and requires no iterative improvement to maintain solution accuracy. To illustrate this point, consider a one scenario problem with constraint matrix

\[
A = \begin{bmatrix} A_0 & 0 \\ T & W \end{bmatrix}.
\]

In this case,

\[
A D^2 A^T = S + U V^T =
\begin{bmatrix} I_2 & 0 \\ 0 & W D_1^2 W^T \end{bmatrix}
+
\begin{bmatrix} A_0 D_0^2 A_0^T - I_2 & A_0 D_0^2 T^T \\ T D_0^2 A_0^T & T D_0^2 T^T \end{bmatrix}. \tag{18}
\]

Comparing (18) to (12) shows the effect of adding the identity matrix to eliminate the singularity in the sparse system. We note also that this approach can in general be used to obtain better conditioned factorizations for problems with dense columns. Although we have only considered dual block angular problems so far, this is not necessary. In the case of \(A = \begin{bmatrix} A_0 & Q \\ T & W \end{bmatrix}\), we can write

\[
A D^2 A^T = S + U V^T =
\begin{bmatrix} I_2 + Q D_1^2 Q^T & 0 \\ 0 & W D_1^2 W^T \end{bmatrix}
+
\begin{bmatrix} A_0 D_0^2 A_0^T - I_2 & A_0 D_0^2 T^T + Q D_1^2 W^T \\ T D_0^2 A_0^T + W D_1^2 Q^T & T D_0^2 T^T \end{bmatrix}. \tag{19}
\]

An implementation based on (19) can use \(S = \begin{bmatrix} I_2 + Q D_1^2 Q^T & 0 \\ 0 & W D_1^2 W^T \end{bmatrix}\) just as in Theorem 1 (which corresponds to \(Q = 0\)). In this way, instability in \(Q D_1^2 Q^T\) can be countered. Note that this approach does not increase problem size as in the splitting or dual factorizations.
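The following dense-algebra sketch mirrors the solve sequence (16)-(17) and procedure finddy above on a small randomly generated dual block angular instance, and checks the result against a direct solve of \((A D^2 A^T) dy = b\). The block sizes, the random data, and the use of dense factorizations are assumptions for illustration; an actual implementation would use sparse Cholesky factorizations of each \(S_l\), and the loop over scenarios is the part that can be distributed across processors.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
N, m0, n0, m1, n1 = 4, 3, 4, 5, 7          # assumed toy dimensions

A0 = rng.standard_normal((m0, n0))
T = [rng.standard_normal((m1, n0)) for _ in range(N)]
W = [rng.standard_normal((m1, n1)) for _ in range(N)]
D0 = np.diag(rng.uniform(0.5, 2.0, n0))                # first-stage scaling D_0
D = [np.diag(rng.uniform(0.5, 2.0, n1)) for _ in range(N)]
b = [rng.standard_normal(m0)] + [rng.standard_normal(m1) for _ in range(N)]

# S_0 = I, S_l = W_l D_l^2 W_l^T; factor each block once.
S = [np.eye(m0)] + [W[l] @ D[l] @ D[l] @ W[l].T for l in range(N)]
Sfac = [cho_factor(Sl) for Sl in S]
solveS = lambda l, rhs: cho_solve(Sfac[l], rhs)

# Step 1: S p = b.
p = [solveS(l, b[l]) for l in range(N + 1)]

# Step 2(a): G_1 = D_0^{-2} + sum_l T_l' S_l^{-1} T_l   (T_0 = A_0, S_0 = I).
Tl = [A0] + T
G1 = np.linalg.inv(D0 @ D0) + sum(Tl[l].T @ solveS(l, Tl[l]) for l in range(N + 1))

# Steps 2(b)-(e): reduce G q = Vbar' p to solves with G_1 and G_2, using (17).
p1 = A0.T @ p[0] + sum(T[l].T @ p[l + 1] for l in range(N))
p2 = p[0]
G1fac = cho_factor(G1)
G2 = -A0 @ cho_solve(G1fac, A0.T)
q2 = np.linalg.solve(G2, p2 - A0 @ cho_solve(G1fac, p1))
q1 = cho_solve(G1fac, p1 - A0.T @ q2)

# Steps 3-4: S r = Ubar q, then dy = p - r (the loop over l parallelizes).
r = [A0 @ q1 + q2] + [solveS(l + 1, T[l] @ q1) for l in range(N)]
dy = np.concatenate([p[l] - r[l] for l in range(N + 1)])

# Check against a direct solve of (A D^2 A^T) dy = b.
A = np.zeros((m0 + N * m1, n0 + N * n1))
A[:m0, :n0] = A0
for l in range(N):
    A[m0 + l * m1: m0 + (l + 1) * m1, :n0] = T[l]
    A[m0 + l * m1: m0 + (l + 1) * m1, n0 + l * n1: n0 + (l + 1) * n1] = W[l]
D2 = np.diag(np.concatenate([np.diag(D0 @ D0)] + [np.diag(Dl @ Dl) for Dl in D]))
M = A @ D2 @ A.T
print("residual:", np.linalg.norm(M @ dy - np.concatenate(b)))
```

The residual printed at the end should be at the level of machine precision, which is one convenient way to validate an implementation of the decomposition before embedding it in the affine scaling loop.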

Birge and Qi [6] also show that the decomposition given in Theorem 1, when applied to the dual problem (6), allows efficient updating from a smaller stochastic linear program with few scenarios to a larger problem with more scenarios.

A substantial portion of the computational effort required for the procedure lies in the factorization of the \(S_l\) and the solution of systems of equations involving the \(S_l\). Since these calculations may be performed independently, coarse-grained parallelization or distributed processing may be used to provide a substantial gain in computational efficiency. With one processor for each scenario, the simultaneous running time to solve for the search direction may be reduced from \(O(N(n_1^3 + n_1^2 n_0 + n_0 n_1^2 + n_0^2 n_1))\) as given by Birge and Qi to \(O(n_1^3 + n_1^2 n_0 + n_0 n_1^2 + N n_0^2)\), a reduction of roughly \(\min\{N, n_1\}\). Finer grain systems with many processors could achieve further reductions by performing matrix additions and multiplications in parallel, but we concentrate on distributed systems due to their wide availability. In the next section, we provide computational experience on both serial and distributed implementations.

4 Computational Experience

In [6], only the theoretical efficiency of the BQ decomposition was demonstrated. We wish to demonstrate that practical computational advantages also exist by comparing this factorization with the other methods mentioned for stochastic linear programs. We chose to compare the method against the interior point solver in IBM's Optimization Subroutine Library (OSL) [25], since that package is widely available and, as such, represents a benchmark for general purpose interior point implementations. Our goal was to determine whether the specialized BQ factorization improves upon general purpose performance, mainly in terms of the effort required in each iteration of the respective methods. We also wished to see whether a distributed computing implementation of the BQ decomposition could provide further efficiencies. The OSL interior point approach is a primal barrier method described in Gill et al. [21]. In the current study, we compare the BQ factorization to the OSL implementation for three formulations: (1) the primal problem (3), (2) the dual of (3) (so that the transpose factorization is used), and (3) the split variable formulation (4) of the primal problem.

4.1 Problem Characteristics

The test problems used in the current study are stochastic versions of a set of staircase test problems (Ho and Loute [24]) as in Birge [9]. The problems in the test set include:

SC205   A dynamic multisector development planning model.
SCRS8   A technological assessment model for the study of the transition from fossil fuel use to renewable energy sources.
SCAGR7  A multiperiod dairy farm expansion model.
SCSD8   A model to find the minimal design of a multistage truss.
SCFXM1  A production scheduling problem.

Deterministic equivalent problems (3) and split variable formulations (4) were created using a test problem generator developed by Birge et al. [7].

Dual problems were generated from the original deterministic formulations prior to any input into an LP solver. Linear programs were converted by all LP solution implementations into the standard equality form given in Section 2.2 prior to solution. For each problem in the test set, deterministic equivalents with 4, 8, 16, 32, and 64 scenarios (distinct realizations of \(\xi\)) were created. The exceptions are scfxm1 and scsd8, which were only solved to 32 scenarios. The characteristics of each problem are given in Table 1. The number of nonzeros in the \(A D^2 A^T\) matrices for the various formulations is also given in Table 1. Figure 2 graphically shows the number of Cholesky nonzeros for the sc205 and scsd8 problems.

            Scen-           Columns   A Matrix    Density       AD^2A^T Nonzeros
Problem     arios   Rows    (1)       Nonzeros    (Percent)    Dual     Primal    Split
sc205.1       4      101      102        270       2.62%        367       478       623
sc205.2       8      189      190        510       1.42%        695      1200      1441
sc205.3      16      365      366        990       0.74%       1343      3454      3323
sc205.4      32      717      718       1950       0.38%       2639     11236      8316
sc205.5      64     1421     1422       3870       0.19%       5231     39882     18928
scagr7.1      4      167      180        546       1.82%       1164      1191      1129
scagr7.2      8      319      340       1050       0.97%       2248      3443      2546
scagr7.3     16      623      660       2058       0.50%       4416     11123      5417
scagr7.4     32     1231     1300       4074       0.25%       8752     39155     11593
scagr7.5     64     2447     2580       8106       0.13%      17424    145907     25743
scfxm1.1      4      684     1014       3999       0.58%      28564      7669      9585
scfxm1.2      8     1276     1914       7319       0.30%      56111     14997     18449
scfxm1.3     16     2460     3714      13959       0.15%     125689     31819     38952
scfxm1.4     32     4828     7314      27239       0.08%     249402     75641     90220
scrs8.1       4      140      189        457       1.73%       1877      1235      1367
scrs8.2       8      252      341        849       0.99%       5195      4771      3550
scrs8.3      16      476      645       1633       0.53%       4955     23137     10251
scrs8.4      32      924     1253       3201       0.28%       9563     88732     29337
scrs8.5      64     1820     2469       6337       0.14%      18779    347247     89687
scsd8.1       4       90      630       1890       3.33%      31677      1433      2673
scsd8.2       8      170     1190       3650       1.80%      59595      3916      7066
scsd8.3      16      330     2310       7170       0.94%     117113     12112     19892
scsd8.4      32      650     4550      14210       0.48%     232287     41588     65143

Notes: (1) Slack columns not included.

Table 1. Problem Characteristics

As can be seen from Table 1, the scfxm1 and scsd8 problems contained many more nonzeros in the dual Cholesky factorization than in the primal Cholesky factorization. This behavior runs counter to the supposition given in Section 3.2 that solving the dual problem reduces the fill-in in the projection matrix factorization. Resolving this discrepancy requires consideration of the sizes of the submatrices \(A_0\), \(T\), and \(W\) of the deterministic equivalent problem. The major block components off the diagonal of the matrix \(A D^2 A^T\) have the same nonzero structure as \(T A_0^T\) and \(T T^T\). In contrast, the off-diagonal block components of the system \(A^T D^2 A\) (shown in (6)) are matrices with the same nonzero structure as \(T^T W\). If this matrix is comparatively large or dense, then the dual projection matrix may be denser than the primal projection matrix. Table 2 shows the block component densities and sizes for each of the test problems considered. The two exceptional problems, SCFXM1 and SCSD8, have many columns and have relatively dense \(T^T W\) blocks. These results imply that consideration of the block characteristics of the deterministic equivalent problem is important before deciding which form of the problem to solve.

(We should note that Carpenter and Vanderbei [41] recently proposed another alternative for use in a primal-dual algorithm which allows some compromise between the \(A D^2 A^T\) and \(A^T D^2 A\) factorization forms.)

           T Matrix          W Matrix           Primal              Dual     Primal and Dual
Problem   Rows  Columns    Rows  Columns    T A0^T    T T^T       T^T W      W W^T    A0 A0^T
sc205      13      14       22      22         26        22          25        102        54
scfxm1     92     114      149     225       2729     12996         168       5937      3450
scagr7     19      20       39      40         73        80          67        370       124
scrs8      28      37       28      38         98       185         100        248       221
scsd8      10      70       21     140       1732      9900        1732       4880      1420

Table 2. Block Characteristics

Another statistic which may help distinguish which form of the problem to solve is the relative row and column density of the deterministic equivalent linear program. Table 3 shows these densities for the two largest instances of each problem. The row (column) density is defined as the number of nonzeros in a row (column) divided by the number of rows (columns) in the entire constraint matrix. Since the problem scfxm1 has an unusually low maximum column density, solving the dual problem may not necessarily remove many nonzeros from its Cholesky factorization. Likewise, the problem scsd8 has an unusually high maximum row density, which is consistent with the small number of rows in its primal problem.

              Row Densities (1)        Column Densities (2)
Problem      Minimum    Maximum       Minimum    Maximum
sc205.4       0.000      0.696         0.279      4.881
sc205.5       0.000      0.352         0.141      4.715
scagr7.4      0.077      0.769         0.081     18.278
scagr7.5      0.039      0.388         0.041     18.349
scfxm1.3      0.027      3.096         0.041      2.114
scfxm1.4      0.014      1.572         0.021      2.071
scrs8.4       0.080      1.117         0.108     14.286
scrs8.5       0.041      0.567         0.055     14.286
scsd8.3       0.433      3.723         0.303     15.152
scsd8.4       0.220      1.890         0.154     15.077

Notes: (1) Defined as the number of nonzeros in a row divided by the number of rows.
       (2) Defined as the number of nonzeros in a column divided by the number of columns.

Table 3. Row and Column Densities

4.2 Computational test parameters and algorithms

The deterministic equivalent test problems, as described in the previous section, were solved using OSL (2) and an implementation of the BQ decomposition on an IBM RS/6000 320H workstation (3) for the serial comparisons, and on a network of six DEC 5000/320 workstations for the distributed implementation of BQ.

(2) Subsequent improvements to OSL show lower solution times than those quoted here.
(3) System performance measures are given as 41.2 SPECs, 37 MIPS, and 11.7 MFLOPs.

Data structures and recommended parameter values used for the BQ implementation followed those in [1]. The implementations were written in the FORTRAN programming language and compiled using the XLF RS/6000 compiler with all applicable optimizations. The solutions of the sparse systems of equations required to find the search direction \(dy\) in the standard dual affine method were obtained using Cholesky factorizations of each \(S_l\) in the BQ decomposition. These factorizations were computed using SPARSPAK, developed by Chu et al. [13]. The SPARSPAK routines use a minimum degree ordering heuristic to permute the columns and rows to obtain relatively sparse factorizations. Since the nonzero structure of the \(A D^2 A^T\) matrix remains the same on each iteration, only the entries of the matrix are updated at each iteration.

As suggested in Karmarkar [26] and Adler et al. [1], an approximate scaling matrix \(\tilde{D}\) is used to determine a search direction at each iteration. Instead of using \(D_k = \text{diag}\{1/v_1^k, \ldots, 1/v_n^k\}\) in \(A D^2 A^T\), each implementation uses \(\tilde{D}_k\), where \((\tilde{D}_k)_{ii}\) is \((\tilde{D}_{k-1})_{ii}\) except in those elements that differ (relatively) by at least a fixed amount \(\epsilon_u\) from the corresponding element in the exact scaling matrix. So, for \(i = 1, \ldots, n\) and \(\epsilon_u = 0.1 > 0\),

\[
(\tilde{D}_k)_{ii} =
\begin{cases}
(\tilde{D}_{k-1})_{ii} & \text{if } \left| \dfrac{(D_k)_{ii} - (\tilde{D}_{k-1})_{ii}}{(\tilde{D}_{k-1})_{ii}} \right| < \epsilon_u, \\[2ex]
(D_k)_{ii} & \text{otherwise.}
\end{cases}
\]

Feasibility in the BQ implementation was obtained using a Big M scheme (proposed by Adler et al. [1]) and a Phase I/Phase II stopping criterion. Specifically, a feasible solution may be obtained by solving \(\max\ b^T y - M y_a\) subject to \(A^T y - e\, y_a \le c\), where \(y_a\) is an artificial variable, \(e\) is a vector of ones, and \(M\) is a large constant. If the artificial variable satisfies \(y_a^k < 0\), then the algorithm proceeds without the artificial variable, using \(y^k\) as an interior point for the original problem. If the algorithm stops with \(y_a^k < \epsilon_f\) (\(\epsilon_f = 10^{-6}\) in the implementations used here), then the problem is unbounded or an optimal solution was found; otherwise, the problem is infeasible. The initial interior point used for each phase one procedure was \(y^0 = (\|c\| / \|A^T b\|)\, b\).

Obtaining the search direction \(dy\) using the BQ decomposition requires many solutions to linear systems of the form \(S_l x = W_l D_l^2 W_l^T x = y\) for \(l = 1, \ldots, N\). Since these systems are independent of each other, and their symbolic Cholesky factorizations need to be performed only once, they are ideal "processes" to compute in a distributed computing environment. As part of this study, a parallel implementation of the BQ decomposition which assigns groups of scenario blocks \(S_l\) to different processors was developed. The implementation consists of several node programs and a host program. Each node program uses SPARSPAK to compute and solve Cholesky factorizations for each block in a set of blocks for which it is responsible (blocks were spread as evenly as possible across processors). At each iteration, the host program sends a set of right hand side vectors \(y_{i_1}, \ldots, y_{i_N}\) to each node program and collects the solutions \(S_{i_1}^{-1} y_{i_1}, \ldots, S_{i_N}^{-1} y_{i_N}\). The host program also performs all other tasks, such as setting up the problem, finding the search direction \(dy\) from the solutions given by the node programs, and printing the final results. A schematic sketch of this host/node pattern is given below. Communications between the host program and each node program were managed by the Parallel Virtual Machine (PVM) package, developed by Beguelin et al. [5], which allows rapid prototyping of distributed applications across a number of UNIX based environments.
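The original implementation distributes the scenario solves with PVM and FORTRAN. The following Python sketch only illustrates the same coarse-grained pattern, a host farming the independent \(S_l\) backsolves out to worker processes, using the standard multiprocessing module. The block data and pool size are assumptions, and process start-up and data-transfer costs make this worthwhile only for much larger blocks than shown.

```python
import numpy as np
from multiprocessing import Pool
from scipy.linalg import cho_factor, cho_solve

def make_block(seed, m=200, n=260):
    """Build one scenario block S_l = W_l D_l^2 W_l^T (random data, assumed sizes)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, n))
    d2 = rng.uniform(0.5, 2.0, n)
    return W @ np.diag(d2) @ W.T

def node_solve(args):
    """Work done by one 'node': factor its block, then solve for the sent RHS."""
    S_l, rhs = args
    return cho_solve(cho_factor(S_l), rhs)

if __name__ == "__main__":
    N = 8
    blocks = [make_block(l) for l in range(N)]          # scenario blocks S_1..S_N
    rhs = [np.random.default_rng(100 + l).standard_normal(200) for l in range(N)]

    with Pool(processes=4) as pool:                      # the "nodes"; this process is the host
        sols = pool.map(node_solve, zip(blocks, rhs))    # scatter RHS, gather S_l^{-1} y_l

    # The host would now combine the block solutions into p, q, r and dy as in finddy.
    print("collected", len(sols), "block solutions; first residual:",
          np.linalg.norm(blocks[0] @ sols[0] - rhs[0]))
```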

Testing was performed on a network of six DEC 5000/320 workstations connected by a shared ethernet.

The primal barrier method implemented in the OSL package is described in Gill et al. [21]. Given a feasible interior point, the algorithm projects the steepest descent direction of a barrier problem onto the nullspace of a set of equality constraints. This algorithm is fundamentally different from the dual affine scaling method used in the BQ implementation, since primal feasibility is maintained at all times. Like the dual affine scaling algorithm, however, the majority of the computational effort is spent solving linear systems of the form \(A D^2 A^T dy = b\). OSL also uses a minimum degree row-ordering heuristic to find an efficient Cholesky factorization of the projection matrix. The OSL implementation used for this study was programmed to consider all columns as sparse columns in a full-sized Cholesky factorization. All default parameters were used to solve the problems, except that the stopping criterion was set to solve problems to six significant digits. No preprocessing was performed on the problems prior to solution, and the method used to find a feasible interior point was the same as that employed in the BQ decomposition.

4.3 Computational Results

All problems were solved using only interior point iterations. As mentioned above, no preprocessing was used and all system parameters were set to their system defaults. Solutions were obtained to within six significant digits of (known) optimality except for the 32 scenario instances of the scfxm1 and scsd8 problems. These problems were not solved to optimality due to computational difficulties with the BQ implementation. The difficulty in these cases was most likely round-off accumulation in the construction of the \(G_1\) matrix. Although we believe that more precise solutions of each \(S p = b\) system might have eliminated this problem, it does suggest that some numerical difficulties may still exist in the BQ factorization. The tradeoff is, however, increased speed and distributed capabilities, as shown below.

Table 5 shows the interior point solution times (see Figures 3 and 4) and the total solution times for the dual, primal and split formulations, as well as the BQ decomposition. As might be expected, the dual requires less time to solve than the primal problem, except when the number of columns in the primal problem is unusually large or the product \(T^T W\) is unusually dense. The split formulation may considerably reduce computational requirements, but is not as advantageous as the dual problem when the number of scenarios grows large. Finally, finding the solution using the BQ decomposition embedded in the dual affine algorithm is the fastest option.

               Solution Time (seconds) (1)              Total Time (seconds) (2)
Problem       Dual     Primal    Split      BQ         Dual     Primal    Split      BQ
sc205.1       0.59       0.70     0.76     0.243       0.87       1.00     1.05     0.264
sc205.2       1.07       1.32     1.46     0.481       1.56       1.78     1.98     0.52
sc205.3       2.03       3.07     2.82     1.014       2.87       3.94     3.75     1.102
sc205.4       4.08       9.33     6.64     1.987       5.55      10.86     8.45     2.206
sc205.5       9.42      44.74    15.44     4.219      12.56      47.87    18.97     4.828
scagr7.1      2.66       1.31     1.32     0.391       4.22       1.95     1.90     0.444
scagr7.2      4.80       2.92     2.66     0.861       5.79       3.96     3.72     0.971
scagr7.3      9.56       8.03     5.33     1.831      11.45       9.89     7.33     2.089
scagr7.4     14.55      41.84    11.48     4.011      18.02      45.57    15.34     4.673
scagr7.5     27.29     263.17    33.23     9.124      34.19     270.27    40.82    11.064
scfxm1.1     20.67      10.03    10.29     4.76       23.30      12.89    13.27     5.239
scfxm1.2     43.57      21.30    19.37     6.91       48.14      26.82    24.86     8.701
scfxm1.3    104.00      50.78    46.73    10.46      112.74      60.99    57.07    12.56
scfxm1.4        NA     135.02   144.03       NA          NA     154.72   164.27       NA
scrs8.1       1.92       1.07     1.23     0.651       2.40       1.53     1.80     0.693
scrs8.2       4.57       2.53     2.55     1.46        5.44       3.37     3.58     1.544
scrs8.3       5.55      12.39     6.18     1.836       6.99      13.88     8.14     2.022
scrs8.4      11.52      63.10    34.08     4.691      14.38      66.03    37.91     5.155
scrs8.5      25.30     459.20    60.06    11.997      30.86     464.90    67.55    13.292
scsd8.1      11.25       2.66     2.75     0.909      12.75       4.28     4.42     1.065
scsd8.2      21.40       5.44     5.76     1.693      24.29       8.61     8.85     2.01
scsd8.3      44.93      13.32    13.47     3.139      50.24      19.24    19.60     3.832
scsd8.4      93.82      42.25    39.58       NA      104.50      54.30    51.58       NA

Notes: (1) Includes factorization time and interior point solution time.
       (2) Includes solution time, reading and printing time.

Table 5. Solution Times

Since two different methods and implementations were used in this study, direct solution time comparisons between problems solved with the BQ decomposition and with the primal, dual and split variable formulations do not accurately represent the relative merits of each solution method. Since this study is ultimately concerned with the efficiency of solving linear systems of the form \(A D^2 A^T dy = b\), more meaningful comparisons may be made by comparing the CPU time required per interior point iteration. Table 6 (and Figures 5 and 6) shows the average CPU time in seconds required per interior point iteration performed on each problem. With the exception of some smaller problems (e.g., sc205.1, scfxm1.1, and scfxm1.2), using the BQ decomposition is considerably faster than any non-decomposition based solution technique. As the number of scenarios in the deterministic equivalent increases, the BQ decomposition becomes particularly attractive.

                     Iterations                      Time per Iteration (secs)
Problem     Primal    Dual    Split    BQ        Primal     Dual     Split      BQ
sc205.1        26       52      33     19         0.027    0.011     0.032    0.014
sc205.2        25       20      34     22         0.053    0.054     0.058    0.024
sc205.3        30       21      34     23         0.102    0.097     0.110    0.048
sc205.4        32       22      36     23         0.292    0.185     0.235    0.096
sc205.5        21       24      38     24         2.130    0.393     0.499    0.201
scagr7.1       23      121      33     24         0.057    0.022     0.058    0.019
scagr7.2       24       74      35     27         0.122    0.065     0.106    0.036
scagr7.3       31       73      34     28         0.259    0.131     0.216    0.075
scagr7.4       33       51      34     32         1.268    0.285     0.451    0.146
scagr7.5       37       39      53     34         7.113    0.700     0.770    0.325
scfxm1.1       42       35      46     15         0.239    0.591     0.288    0.349
scfxm1.2       48       37      42     15         0.444    1.178     0.592    0.460
scfxm1.3       49       38      50     14         1.036    2.737     1.141    0.747
scfxm1.4       18       NA      61     14         7.501       NA     2.693    0.933
scrs8.1        20       39      24     24         0.054    0.049     0.075    0.029
scrs8.2        26       38      24     17         0.097    0.120     0.149    0.091
scrs8.3        22       33      28     25         0.563    0.168     0.291    0.081
scrs8.4        28       32      82     33         2.254    0.360     0.462    0.156
scrs8.5        28       30      27     32        16.400    0.843     2.502    0.415
scsd8.1        18       17      20     18         0.148    0.662     0.221    0.059
scsd8.2        20       17      20     18         0.272    1.259     0.443    0.112
scsd8.3        20       17      20     24         0.666    2.643     0.980    0.160
scsd8.4        20       17      20     22         2.113    5.519     2.579    0.271

Table 6. Solution Time per Iteration

The merits of each solution method are highlighted by their relative speedup factors. Table 7 and Figures 7 and 8 give these statistics for the stochastic test set. The absolute speedup factors shown in Table 7 are the ratios between the solution times for each pair of solution methods being compared. Likewise, the speedup factors based on time per iteration are the ratios between the average CPU times per iteration for each solution method.

In terms of absolute solution time, Table 7 shows that solving the dual is not necessarily faster than solving the primal for small problems. However, as the number of scenarios increases, solving a problem that does not have an exceptionally dense or large projection matrix is much easier when the dual formulation is used. In the extreme, solving the dual to problem scrs8.5 (64 scenarios) was over 18 times faster than solving the primal. As the scfxm1 and scsd8 problems show, the ratios of solution times for solving the dual compared to the primal are limited to at most approximately 2.5. Solving the split variable formulation offers smaller performance gains (over solving the primal) than solving the dual problem, but works for all problems. The problem scfxm1.4 suggests that these performance gains may be limited for exceptionally large problems and that there may be a breakeven point for each problem where the performance loss resulting from the matrix size increase associated with solving the split formulation offsets the gain obtained through reduced Cholesky fill-in.

Comparisons of speedup factors based on solution time per iteration show similar results. With the exception of the scsd8 problem, the BQ decomposition is on average 1.95 (std. dev. 0.59) times faster than solving with the dual factorization. These results confirm the claims made in [6]. The problems scsd8 and scfxm1 offer higher speedups, and show trends toward increasing speedup with

problem size. The same conclusions appear to hold, though with less certain speedup factors, when the BQ decomposition is compared with the split variable formulation.

                    Absolute Speedups (1)                  Speedups based on time per iteration
            Primal/   Primal/    Dual/    Split/        Primal/   Primal/    Dual/    Split/
Problem       Dual     Split       BQ       BQ            Dual     Split       BQ       BQ
sc205.1       1.19      0.92      2.43     3.13           2.37      0.85      0.82     2.29
sc205.2       1.23      0.90      2.22     3.04           0.99      0.91      2.26     2.46
sc205.3       1.51      1.09      2.00     2.78           1.06      0.93      2.02     2.30
sc205.4       2.29      1.41      2.05     3.34           1.57      1.24      1.93     2.45
sc205.5       4.75      2.90      2.23     3.66           5.43      4.27      1.95     2.48
scagr7.1      0.49      0.99      6.80     3.38           2.59      0.99      1.19     3.11
scagr7.2      0.61      1.10      5.57     3.09           1.88      1.14      1.80     2.96
scagr7.3      0.84      1.51      5.22     2.91           1.98      1.20      1.76     2.89
scagr7.4      2.88      3.64      3.63     2.86           4.44      2.81      1.95     3.09
scagr7.5      9.64      7.92      2.99     3.64          10.16      9.24      2.15     2.37
scfxm1.1      0.49      0.97      4.34     2.16           0.40      0.83      1.69     0.83
scfxm1.2      0.49      1.10      6.31     2.80           0.38      0.75      2.56     1.29
scfxm1.3      0.49      1.09      9.94     4.47           0.38      0.91      3.66     1.53
scfxm1.4        NA      0.94        NA       NA             NA      2.79        NA     2.89
scrs8.1       0.56      0.87      2.95     1.89           1.09      0.71      1.70     2.60
scrs8.2       0.55      0.99      3.13     1.75           0.81      0.65      1.32     1.64
scrs8.3       2.23      2.00      3.02     3.37           3.35      1.94      2.08     3.59
scrs8.4       5.48      1.85      2.46     7.26           6.26      4.87      2.30     2.96
scrs8.5      18.15      7.65      2.11     5.01          19.45      6.56      2.03     6.02
scsd8.1       0.24      0.97     12.38     3.03           0.22      0.67     11.18     3.74
scsd8.2       0.25      0.94     12.64     3.40           0.22      0.61     11.27     3.96
scsd8.3       0.30      0.99     14.31     4.29           0.25      0.68     16.55     6.14
scsd8.4       0.45      1.07        NA       NA           0.38      0.82     20.38     9.52

Notes: (1) Ratio of solution times.

Table 7. Speedup Comparisons

As mentioned previously, the structure of the BQ decomposition lends itself to distributed processing. Table 8 (and Figure 9 in the appendix) gives the best run times for the sc205 problem obtained using a parallel version of the BQ implementation, together with those obtained using the serial version of the decomposition (run times were obtained using a network of six DEC 5000/320 workstations). The solution times given in Table 8 are "wall clock" times necessary to calculate the iterations required by the dual affine scaling algorithm. (Unfortunately, since the network connecting the processors was a shared resource, obtaining accurate solution times for larger numbers of processors was quite difficult. Fully utilizing the parallel implementation requires communications bandwidths unobtainable over a shared ethernet.) The parallel implementation clearly gives a substantial performance improvement over a serial implementation of the BQ decomposition for large problems. For example, solving the 64 scenario sc205 problem is almost three times faster using four processors than with the serial version. Speedups are not linear in the number of processors, since the communication requirements quickly overtake the benefits provided by the distribution of computational work. The 8 and 16 scenario problems suggest that three or four scenarios per processor offer the best performance.

                                Solution Time (seconds) (1)
                          Serial                Number of Processors
Problem     Scenarios    Algorithm      1      2      3      4      5      6
sc205.1          4          3           5      5      5      5      -      -
sc205.2          8          6.7         9      7      8      7      9     11
sc205.3         16         13.2        17     13     12     12     14     15
sc205.4         32         28          38     22     22     19     23     23
sc205.5         64         55.5         -     34     31     20     93      -

Notes: (1) Solution times are the best encountered in at least 10 runs.
       (2) Blank times were unavailable due to system limitations.

Table 8. Parallel Solution Times

5 Conclusions

This paper reviewed the need for and the characteristics of solving (discrete) stochastic linear programs with fixed recourse (or, more generally, dual block angular linear programs) using interior point methods. Direct application of interior point methods to larger problems of this nature is computationally difficult if not infeasible. The reason for this lies in the many dense columns which are characteristic of these problems. To resolve this problem, four methods that directly address the problem of dense columns were reviewed. Reformulation of the program to break up these dense columns can improve the performance of interior point methods, but may require a substantial increase in problem size. Solving the problem's dual (or using the transpose factorization) is also possible and generally works well in practice. However, solving the dual does not work well for problems with dense or large recourse matrices. Partitioning the constraint matrix into two matrices with dense and non-dense columns is another alternative, but suffers from inherent numerical instability.

Decomposing the constraint matrix into the sum of two matrices and applying the Sherman-Morrison-Woodbury formula is the BQ decomposition alternative studied here in detail. Computational experience with this method suggests that it substantially reduces the effort needed to solve for the search direction at each iteration. Solving the dual or primal requires a Cholesky factorization of a sparse, but still large, matrix. By contrast, the BQ decomposition technique requires several smaller, but independent, factorizations. In practice, this effort is small compared to that required by non-decomposition based techniques. Computational experience indicates that speedups may increase with the number of scenarios. Taking advantage of the ease with which the decomposition may be parallelized further reduces the computational requirements necessary to solve dual block angular programs, and in practice offers substantial performance improvements.

References

[1] I. Adler, N. Karmarkar, M.G.C. Resende, and G. Veiga, 1989. "An Implementation of Karmarkar's Algorithm for Linear Programming," Mathematical Programming 44, 297-335. Errata in Mathematical Programming 50, 415, 1991.

[2] J. Arantes and J. Birge, 1989. "Matrix Structure and Interior Point Methods in Stochastic Programming," presentation, Fifth International Stochastic Programming Conference, Ann Arbor, Michigan.

[3] E.R. Barnes, 1986. "A variation on Karmarkar's algorithm for solving linear programming problems," Mathematical Programming 36, 174-182.

[4] E.M.L. Beale, 1955. "On Minimizing a Convex Function subject to Linear Inequalities," Journal of the Royal Statistical Society B 17, 173-184.

[5] A. Beguelin, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam, 1991. "A User's Guide to PVM: Parallel Virtual Machine," Report ORNL/TM-11826, Oak Ridge National Laboratory, Department of Energy, Oak Ridge, Tennessee.

[6] J.R. Birge and L. Qi, 1988. "Computing Block-angular Karmarkar Projections with Applications to Stochastic Programming," Management Science 34:12, 1472-1479.

[7] J.R. Birge, H.I. Gassman, and D. Holmes, 1991. STOPGEN stochastic linear program deterministic equivalent problem generator.

[8] J.R. Birge, R.M. Freund, and R.J. Vanderbei, 1990. "Prior reduced fill-in in the solution of equations in interior point algorithms," Technical Report, Sloan School of Management, MIT, Cambridge, Massachusetts; to appear in Operations Research Letters.

[9] J.R. Birge, 1985. "Decomposition and Partitioning Methods for Multistage Stochastic Linear Programs," Operations Research 33, 989-1007.

[10] W.J. Carolan, J.E. Hill, J.L. Kennington, S. Niemi, and S.J. Wichmann, 1990. "An empirical evaluation of the KORBX algorithms for military airlift applications," Operations Research 9:2, 169-184.

[11] J. Carpenter, I. Lustig, and J. Mulvey, 1991. "Formulating Stochastic Programs for Interior Point Methods," Operations Research 39:5, 757-770.

[12] I.C. Choi, C.L. Monma, and D.F. Shanno, 1988. "Further Development of a Primal-Dual Interior Point Method," Report 60-88, Rutgers Center for Operations Research, Rutgers University, New Brunswick, New Jersey.

[13] E. Chu, A. George, J. Liu, and E. Ng, 1984. "SPARSPAK: Waterloo Sparse Matrix Package User's Guide for SPARSPAK-A," Research Report CS-84-36, Department of Computer Science, University of Waterloo, Waterloo, Ontario.

[14] G. Dantzig, 1955. "Linear Programming under Uncertainty," Management Science 1, 197-206.

[15] G. Dantzig and A. Madansky, 1961. "On the Solution of Two Stage Linear Programs under Uncertainty," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, California, 165-176.

[16] I.I. Dikin, 1967. "Iterative solution of problems of linear and quadratic programming," Soviet Mathematics Doklady 8, 674-675.

[17] J. Edwards, 1988. "A Proposed Standard Input Format for Computer Codes which solve Stochastic Programs with Recourse," Numerical Techniques for Stochastic Optimization, Yu. Ermoliev and R.J.-B. Wets (eds), 215-228.

[18] S.C. Eisenstat, M.C. Gursky, M.H. Schultz, and A.H. Sherman, 1981. "The Yale Sparse Matrix Package, I. The Symmetric Codes," ACM Transactions on Mathematical Software.

[19] H.I. Gassman, 1989. "Optimal harvest of a forest in the presence of uncertainty," Canadian Journal of Forest Research 19, 1267-1274.

[20] D.M. Gay, 1985. "Electronic Mail Distribution of Linear Programming Test Problems," Mathematical Programming Society COAL Newsletter, December.

[21] P.E. Gill, W. Murray, M.A. Saunders, J.A. Tomlin, and M.H. Wright, 1986. "On Projected Newton Methods for Linear Programming and Equivalence to Karmarkar's Projection Method," Mathematical Programming 36, 183-201.

[22] P.E. Gill, W. Murray, and M.A. Saunders, 1988. "A single phase dual barrier method for Linear Programming," Report SOL 88-10, Systems Optimization Laboratory, Stanford University, Stanford, California.

[23] G.H. Golub and C.F. van Loan, 1983. Matrix Computations, The Johns Hopkins University Press, Baltimore, Maryland.

[24] J.K. Ho and E. Loute, 1981. "A Set of Linear Programming Test Problems," Mathematical Programming 20, 245-250.

[25] International Business Machines Corp., 1991. "Optimization Subroutine Library Guide and Reference," document SC23-0519-01, International Business Machines Corp.

[26] N. Karmarkar, 1984. "A New Polynomial Time Algorithm for Linear Programming," Combinatorica 4, 373-395.

[27] A.J. King, 1988. "Stochastic Programming Problems: Examples from the Literature," Numerical Techniques for Stochastic Optimization, Yu. Ermoliev and R.J.-B. Wets (eds), 543-567.

[28] F.V. Louveaux, 1980. "A solution method for multistage stochastic programs with recourse with application to an energy investment problem," Operations Research 28, 889-902.

[29] I. Lustig, R. Marsten, and D. Shanno, 1989. "Computational Experience with a Primal-Dual Interior Point Method for Linear Programming," Technical Report SOR 89-17, Department of Civil Engineering and Operations Research, Princeton University, Princeton, New Jersey.

[30] R. Marsten, R. Subramanian, M. Saltzman, I. Lustig, and D. Shanno, 1990. "Interior Point Methods for Linear Programming: Just Call Newton, Lagrange, and Fiacco and McCormick!," Interfaces 20:4, 105-116.

[31] N. Megiddo, 1986. "Pathways to the Optimal Set in Linear Programming," Technical Report RJ 5295, IBM Almaden Research Center, San Jose, California.

[32] J.A. Meijerink and H.A. van der Vorst, 1977. "An Iterative Solution Method for Linear Equation Systems of which the Coefficient Matrix is a Symmetric M-matrix," Mathematics of Computation 31, 148-162.

[33] C.L. Monma and A.J. Morton, 1987. "Computational Experience with a Dual Affine Variant of Karmarkar's Method for Linear Programming," Operations Research Letters 6, 261-267.

[34] J.M. Mulvey, 1984. "A Network Portfolio Approach for Cash Management," Journal of Cash Management 4, 46-48.

[35] J.M. Mulvey and H. Vladimirou, 1989. "Stochastic Network Optimization Models for Investment Planning," to appear in B. Shelby (ed.), Annals of Operations Research, special issue on Networks and Applications.

[36] B.A. Murtagh and M.A. Saunders, 1983. "MINOS 5.1 User's Guide," Technical Report SOL 83-20R, Systems Optimization Laboratory, Stanford University, Stanford, California.

[37] D.F. Shanno and A. Bagchi, 1990. "A Unified View of Interior Point Methods for Linear Programming," Annals of Operations Research 22, 55-70.

[38] D. Shanno, 1988. "Computing Karmarkar Projections Quickly," Mathematical Programming 41, 61-71.

[39] R. Van Slyke and R. Wets, 1969. "L-Shaped Linear Programs with Applications to Optimal Control and Stochastic Programming," SIAM Journal on Applied Mathematics 17, 638-663.

[40] R.J. Vanderbei, M.S. Meketon, and B.A. Freedman, 1986. "A modification of Karmarkar's linear programming algorithm," Algorithmica 1, 395-407.

[41] R.J. Vanderbei and T.J. Carpenter, 1991. "Symmetric Indefinite Systems for Interior Point Methods," Technical Report SOR 91-7, Department of Civil Engineering and Operations Research, Princeton University, Princeton, New Jersey.

Appendix

Figure 1a: SC205, 16 scenarios, Dual. Figure 1b: SC205, 16 scenarios, Primal.

Figure 2. Cholesky nonzeroes for SC205 and SCSD8 (Primal, Split, and Dual formulations, by number of scenarios).


Figure 4. Solution Times (seconds) for SCRS8 and SCAGR7 (Primal, Split, Dual, and Birge-Qi, by number of scenarios).

Figure 5. Solution Times per iteration (seconds) for SC205, SCFXM1, and SCSD8 (by number of scenarios).

Figure 6. Solution Times per iteration (seconds) for SCRS8 and SCAGR7 (Primal, Split, and Dual, by number of scenarios).

Figure 7. Speedups associated with solving the Primal (based on solution time per iteration): Dual vs. Primal and Split vs. Primal for SC205, SCAGR7, SCFXM1, SCRS8, and SCSD8, by number of scenarios.

Figure 8. Speedups associated with using the BQ decomposition (based on solution time per iteration): Birge-Qi vs. Dual and Birge-Qi vs. Split for SC205, SCAGR7, SCFXM1, SCRS8, and SCSD8, by number of scenarios.

Figure 9. Distributed Solution Times for SC205 (seconds vs. number of processors, for 4, 8, 16, 32, and 64 scenarios, with serial times for comparison).