SOLVING STOCHASTIC LINEAR PROGRAMS VIA A VARIANT OF KARMARKAR'S ALGORITHM

John R. Birge
Department of Industrial & Operations Engineering
The University of Michigan

Liqun Qi
Department of Applied Mathematics
Tsinghua University

Technical Report 85-12

Solving Stochastic Linear Programs Via a Variant of Karmarkar's Algorithm

John R. Birge(1) and Liqun Qi(2)

Abstract: We present a variant of Karmarkar's algorithm for the special structure of stochastic linear programming. We give a worst-case bound on the order of the running time that can be an order of magnitude better than that of Karmarkar's standard algorithm. A related variant is applied to the dual program, and its implications for very large-scale problems are given.

1. Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, MI 48109, USA. This author's work was supported by National Science Foundation Grant No. ECS-8304065.
2. Department of Applied Mathematics, Tsinghua University, Beijing, China, and Department of Mathematics and Statistics, University of Pittsburgh, Pittsburgh, PA 15213, USA. This author's work was supported by the 1984-1985 Andrew Mellon Postdoctoral Fellowship at the University of Pittsburgh.

1. Introduction

The stochastic linear programming problem with fixed recourse (SLPF), whose random elements have discrete distributions, can be formulated as

(1.1)    min  $c^T x + Q(x)$
         s.t. $Ax = b$,  $x \geq 0$,

where

(1.2)    $Q(x) = \sum_{k=1}^{N} p_k Q(x, \xi_k)$

and, for each $k = 1, \ldots, N$, the recourse cost $Q(x, \xi_k)$ is obtained by solving the recourse problem

(1.3)    $Q(x, \xi_k) = \inf \{ q_k^T y \mid W y = h_k - T_k x,\; y \in R_+^{n_1} \}$,
         $\xi_k = (q_k, h_k, T_k)$,   $p_k = \mathrm{prob}[\xi(\omega) = \xi_k]$.

The sizes of the matrices are consistent with $x \in R^{n_0}$, $y \in R^{n_1}$, $b \in R^{m_0}$, and $h_k \in R^{m_1}$, where $m_0 \leq n_0$, $m_1 \leq n_1$, and $A$ and $W$ have full row rank. This formulation can also be regarded as part of an approximation scheme for SLPF problems whose random elements have continuous distributions (see Birge and Wets [3]).

Substituting the expression for $Q$ into (1.1), we obtain

(1.4)    min  $c^T x + \sum_{k=1}^{N} p_k q_k^T y_k$
         s.t. $Ax = b$,
              $T_k x + W y_k = h_k$,  $k = 1, \ldots, N$,
              $x \geq 0$,  $y_k \geq 0$,  $k = 1, \ldots, N$.

This is a highly structured large-scale linear programming problem. Its structure is called dual block angular [15][16]. It has $n = n_0 + N n_1$ variables and $m = m_0 + N m_1$ constraints.
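To make the recourse function concrete, the following sketch (in Python, not part of the paper) evaluates $Q(x)$ in (1.2)-(1.3) for a small hypothetical instance by solving each scenario's recourse linear program with scipy.optimize.linprog. The data are illustrative placeholders; a simple-recourse matrix $W = (I, -I)$ and nonnegative costs $q_k$ are chosen only so that every recourse problem is feasible and bounded.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)

    # Hypothetical dimensions and data (illustrative only).
    n0, m1, N = 4, 3, 5
    n1 = 2 * m1
    W = np.hstack([np.eye(m1), -np.eye(m1)])          # fixed recourse matrix, feasible for any rhs
    scenarios = [(np.abs(rng.standard_normal(n1)),    # q_k >= 0 keeps each recourse LP bounded
                  rng.standard_normal(m1),            # h_k
                  rng.standard_normal((m1, n0)))      # T_k
                 for _ in range(N)]
    p = np.full(N, 1.0 / N)                           # scenario probabilities p_k

    def recourse_value(x):
        """Q(x) = sum_k p_k * min{ q_k^T y : W y = h_k - T_k x, y >= 0 }, as in (1.2)-(1.3)."""
        total = 0.0
        for p_k, (q_k, h_k, T_k) in zip(p, scenarios):
            res = linprog(q_k, A_eq=W, b_eq=h_k - T_k @ x, bounds=(0, None), method="highs")
            if not res.success:                       # an infeasible recourse problem gives Q = +inf
                return np.inf
            total += p_k * res.fun
        return total

    print("Q(0) =", recourse_value(np.zeros(n0)))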

The methods for solving (1.4) include: the L-shaped method, proposed by Van Slyke and Wets [13]; the decomposition method, proposed by Dantzig and Madansky [7]; and the basis factorization method, proposed by Strazicky [14] and modified by Kall [10] and Wets [15]. The first method solves (1.4) directly, while the other two solve the dual of (1.4). Birge [1][2] discussed the relationships among them. See also [15][16] for other references.

If $T_1 = \cdots = T_N$, then (1.4) has a staircase structure. This type of problem is widely encountered in the context of dynamical systems, i.e., discrete versions of continuous linear programs or linear control problems [4][5][8][12][13][16].

In this paper, we apply Karmarkar's new polynomial-time algorithm for linear programming [11] to solve (1.4). Karmarkar's algorithm is briefly described in Section 2. The order of his method is $O(n^{3.5} L^2 \ln L \ln\ln L)$, where $L$ is the length of the input. In Section 3, by means of the Sherman-Morrison-Woodbury formula and a block matrix method, we exploit the special structure of (1.4) and obtain a variant of Karmarkar's algorithm for solving (1.4) with order $O((n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)\, n L^2 \ln L \ln\ln L)$, where $\bar{n} = \max(n_0, n_1)$. Section 4 presents another variant of Karmarkar's algorithm to solve the dual of (1.4), and Section 5 describes its implementation in sequential approximation methods. Section 6 provides some special cases where additional computational savings are possible.

2. Karmarkar's algorithm

We briefly describe Karmarkar's algorithm in this section. This version of the algorithm was presented at the Symposium on Theory of Computing, Washington, D.C., April 1984. It was later revised for publication in Combinatorica. Suppose that there is a linear programming problem:

(2.1)    min  $c^T x$
         s.t. $Ax = b$,  $x \geq 0$,

where $x \in R^n$, $c \in Z^n$, $b \in Z^m$, $A \in Z^{m \times n}$. Suppose that we have a strictly interior feasible point $a$ of (2.1), i.e., $Aa = b$, $a > 0$. Also suppose that we know a lower bound $\ell_0$ on the optimal value of (2.1). Karmarkar's algorithm creates a sequence of points $x^{(0)}, x^{(1)}, \ldots, x^{(k)}$ by these steps:

1. Set $x^{(0)} = a$, $k = 0$; an upper bound of (2.1): $u = c^T a$; a lower bound of (2.1): $\ell = \ell_0$.
2. Set a tentative upper bound $u_1 = \ell + (2/3)(u - \ell)$ and a tentative lower bound $\ell_1 = \ell + (1/3)(u - \ell)$.
3. If $c^T x^{(k)} - \ell_1$ is small enough, i.e., less than a given positive number $\epsilon$, then stop. Otherwise set $x^{(k+1)} = \Phi(x^{(k)})$, where the function $\Phi$ is defined below.
4. If $c^T x^{(k+1)} \leq u_1$, then set $u = u_1$, $k = k+1$, and go to Step 2. If $f(x^{(k+1)}) > f(x^{(k)}) - \delta$, then set $\ell = \ell_1$, $k = k+1$, and go to Step 2. Otherwise set $k = k+1$ and go to Step 3.

Here $f$ is the potential function

    $f(x) = \sum_{j} \ln \left( \frac{c^T x - \ell}{x_j} \right)$,

and $\delta$ is a constant that depends upon the choice of the value of a parameter $\alpha$ in $\Phi$. A particular choice suggested in [11] is $\alpha = 1/4$, $\delta = 1/8$.

$\Phi = \Phi(a)$ is defined by the following operations:

(i) Let $D = \mathrm{diag}\{a_1, \ldots, a_n\}$, $e = (1, \ldots, 1)^T \in R^{n+1}$, and let $B$ be the $(m+1) \times (n+1)$ matrix whose first $m$ rows are $(AD, -b)$ and whose last row is $e^T$.

(ii) Compute the projected cost

    $c_p = \left( I - B^T (BB^T)^{-1} B \right) \begin{pmatrix} Dc \\ 0 \end{pmatrix}$,

i.e.,

(2.2)    $c_p = \begin{pmatrix} Dc \\ 0 \end{pmatrix} - \begin{pmatrix} DA^T \\ -b^T \end{pmatrix} (AD^2A^T + bb^T)^{-1} AD^2 c - \frac{c^T a}{n+1}\, e$.

(iii) Let

    $g' = \frac{1}{n+1}\, e - \alpha\, \frac{c_p}{\|c_p\|}$.

(iv) Write $g' = \begin{pmatrix} g \\ g_{n+1} \end{pmatrix}$, where $g \in R^n$ and $g_{n+1} \in R$. Then

    $\Phi(a) = \frac{1}{g_{n+1}}\, D g$.

The main computational effort at each step $k$ is to compute $c_p$ by (2.2). Karmarkar [11] also showed that, to retain the linear decrease of the objective function value, it is only necessary to update those diagonal components $D'_{ii}$ of the new $D'$ from the corresponding diagonal components $D_{ii}$ of the old $D$ for which

(2.3)    $\left( \frac{D'_{ii}}{D_{ii}} \right)^2 \notin [1/2, 2]$,

after multiplying the old $D$ by a scaling factor. He showed that the average number of such indices $i$ at each step is $O(n^{0.5})$. If $D$ and $D'$ differ only in the $i$th entry, the inverse of $M' = A(D')^2 A^T$ can be computed from the inverse of $AD^2A^T$ in $O(n^2)$ arithmetic operations by means of rank-one updating formulas, such as the Sherman-Morrison formula [9] employed in [11]. Karmarkar showed that his algorithm converges in $O(nL)$ steps.
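A minimal sketch (not from the paper) of the map $\Phi$ in steps (i)-(iv), written directly from the formulas above for a small dense instance; it forms $AD^2A^T + bb^T$ explicitly rather than exploiting any structure. The data and the choice $\alpha = 1/4$ are illustrative, and the simplification in (2.2) assumes the current point satisfies $Aa = b$ exactly.

    import numpy as np

    def karmarkar_step(A, b, c, a, alpha=0.25):
        """One application of Phi(a): steps (i)-(iv) of Section 2 (dense, unstructured)."""
        n = a.size
        D = np.diag(a)                                    # (i)  D = diag(a_1,...,a_n)
        e = np.ones(n + 1)
        M = A @ D**2 @ A.T + np.outer(b, b)               # AD^2A^T + bb^T
        # (ii) projected cost c_p, as in (2.2)
        w = np.linalg.solve(M, A @ D**2 @ c)
        cp = np.concatenate([D @ c, [0.0]]) \
             - np.concatenate([D @ A.T @ w, [-b @ w]]) \
             - (c @ a) / (n + 1) * e
        # (iii) move from the center of the simplex against the projected cost
        g_prime = e / (n + 1) - alpha * cp / np.linalg.norm(cp)
        # (iv) map back to the original space
        g, g_last = g_prime[:n], g_prime[n]
        return (D @ g) / g_last

    # Tiny illustrative data: a is a strictly interior feasible point (Aa = b, a > 0).
    A = np.array([[1.0, 1.0, 1.0]])
    b = np.array([3.0])
    c = np.array([1.0, 2.0, 3.0])
    a = np.array([1.0, 1.0, 1.0])
    x_new = karmarkar_step(A, b, c, a)
    print("new point:", x_new, " objective:", c @ x_new, "<=", c @ a)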

In the revised version of Karmarkar's algorithm [11], a combined form of the primal and the dual problems is used, and the lower bound $\ell_0$ is unnecessary. Since a lower bound may be obtained in practice for stochastic linear programs, and since the combined primal-dual form is complicated for practical use, we retain the original version of Karmarkar's algorithm. The primal-dual form may, however, be solved by combining our individual approaches to the primal and dual problems in Sections 3 and 4. The additional work requires a lower-order number of operations, and the result is a complexity bound that sums the primal and dual bounds.

3. Computation Reduction

To unify the notation, we rewrite (1.4) as

(3.1)    min  $\sum_{k=0}^{N} c_k^T x_k$
         s.t. $A_0 x_0 = b_0$,
              $A_k x_0 + W x_k = b_k$,  $k = 1, \ldots, N$,
              $x_k \geq 0$,  $k = 0, \ldots, N$,

where the sizes of the matrices are consistent with $x_k, c_k \in R^{n_k}$, $b_k \in R^{m_k}$, $m_k \leq n_k$, $n_1 = n_2 = \cdots = n_N$, $m_1 = \cdots = m_N$, and $A_0$ and $W$ have full row rank. If an element of a vector or a matrix needs to be specified, we use parentheses, e.g., $(A_k)_{ij}$. Let $n = n_0 + N n_1$, $m = m_0 + N m_1$,

    $x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_N \end{pmatrix}$,   $c = \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_N \end{pmatrix}$,   $b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_N \end{pmatrix}$,   $A = \begin{pmatrix} A_0 & & & \\ A_1 & W & & \\ \vdots & & \ddots & \\ A_N & & & W \end{pmatrix}$.

Then (3.1) can be expressed in exactly the same form as (2.1), and it is ready for the application of Karmarkar's algorithm as described in Section 2.
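As a small illustration (not from the paper), the sketch below stacks randomly generated placeholder blocks into the matrix $A$, vector $b$, and cost $c$ of (3.1); in an actual implementation the blocks would be kept separate, which is precisely what the rest of this section exploits.

    import numpy as np

    rng = np.random.default_rng(1)
    n0, m0, n1, m1, N = 4, 2, 3, 2, 3            # illustrative sizes

    A0 = rng.standard_normal((m0, n0)); b0 = rng.standard_normal(m0)
    W  = rng.standard_normal((m1, n1))
    Ak = [rng.standard_normal((m1, n0)) for _ in range(N)]   # A_1,...,A_N (the T_k blocks of (1.4))
    bk = [rng.standard_normal(m1) for _ in range(N)]
    c0 = rng.standard_normal(n0)
    ck = [rng.standard_normal(n1) for _ in range(N)]          # c_k = p_k q_k in (1.4)

    n, m = n0 + N * n1, m0 + N * m1
    A = np.zeros((m, n))
    A[:m0, :n0] = A0                                           # first block row:  A_0 x_0 = b_0
    for k in range(N):                                         # block row k:      A_k x_0 + W x_k = b_k
        r, s = m0 + k * m1, n0 + k * n1
        A[r:r + m1, :n0] = Ak[k]
        A[r:r + m1, s:s + n1] = W
    b = np.concatenate([b0] + bk)
    c = np.concatenate([c0] + ck)
    print("A is", A.shape, "with the dual block angular pattern; n =", n, ", m =", m)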

Suppose $a = (a_0^T, \ldots, a_N^T)^T$ is the current iteration point of Karmarkar's algorithm. Let $D_k = \mathrm{diag}\{(a_k)_1, (a_k)_2, \ldots, (a_k)_{n_k}\}$ and $D = \mathrm{diag}\{D_0, \ldots, D_N\}$. As discussed in Section 2, the main computational work at each step of Karmarkar's algorithm is to calculate (2.2), which is dominated by calculating $M^{-1}$, where

(3.2)    $M = AD^2A^T + bb^T$.

Theorem 3.1  Let $S_0 = -I_2 \in R^{m_0 \times m_0}$, $S_k = W D_k^2 W^T$ for $k = 1, \ldots, N$, and $S = \mathrm{diag}\{S_0, \ldots, S_N\}$; then $S^{-1} = \mathrm{diag}\{S_0^{-1}, S_1^{-1}, \ldots, S_N^{-1}\}$. Let $B_k = (A_k, b_k)$ for $k = 0, \ldots, N$, let $\bar{D}_0 = \mathrm{diag}\{D_0, 1\}$, and let $I_1$ and $I_2$ be the unit matrices in $R^{(n_0+1) \times (n_0+1)}$ and $R^{m_0 \times m_0}$, respectively. Let

(3.3)    $G_1 = \bar{D}_0^{-2} + \sum_{k=0}^{N} B_k^T S_k^{-1} B_k$,   $G_2 = -B_0 G_1^{-1} B_0^T$,

    $U = \begin{pmatrix} B_0 & I_2 \\ B_1 & 0 \\ \vdots & \vdots \\ B_N & 0 \end{pmatrix}$.

If $G_1$ is invertible, then $G_2$ and $M$ are invertible and

(3.4)    $M^{-1} = S^{-1} - S^{-1} U \begin{pmatrix} I_1 & G_1^{-1} B_0^T \\ 0 & I_2 \end{pmatrix} \begin{pmatrix} G_1^{-1} & 0 \\ 0 & G_2^{-1} \end{pmatrix} \begin{pmatrix} I_1 & 0 \\ B_0 G_1^{-1} & I_2 \end{pmatrix} U^T S^{-1}$.

Proof  Since $W$ has full row rank, and since $a > 0$ in Karmarkar's algorithm (so that $D$ is always invertible), $S_k$ is invertible for $k = 1, \ldots, N$; therefore $S$ is invertible. Let $\bar{D} = \mathrm{diag}\{\bar{D}_0, I_2\}$, and let

    $G = \begin{pmatrix} G_1 & -B_0^T \\ -B_0 & 0 \end{pmatrix}$,   $\tilde{U} = U \bar{D}$.

It is seen that $M = S + \tilde{U}\tilde{U}^T$. According to the Sherman-Morrison-Woodbury formula [9], $M$ is invertible and

(3.5)    $M^{-1} = S^{-1} - S^{-1} \tilde{U} (I + \tilde{U}^T S^{-1} \tilde{U})^{-1} \tilde{U}^T S^{-1}
              = S^{-1} - S^{-1} U (\bar{D}^{-2} + U^T S^{-1} U)^{-1} U^T S^{-1}
              = S^{-1} - S^{-1} U G^{-1} U^T S^{-1}$

if and only if $(I + \tilde{U}^T S^{-1} \tilde{U})$ is invertible, i.e., $G$ is invertible. Suppose $G_1$ is invertible. Since $G_1$ is a symmetric matrix, so is $G_1^{-1}$, and we can write $G_1^{-1} = G_1^{-1/2} G_1^{-1/2}$, where $G_1^{-1/2}$ is also a symmetric matrix (although possibly complex). Since $A_0$ has full row rank, $B_0 G_1^{-1/2}$ also has full row rank. Since $n_0 + 1 \geq m_0$, $G_2 = -(B_0 G_1^{-1/2})(B_0 G_1^{-1/2})^T$ is invertible. It is easy to see that

    $G = \begin{pmatrix} I_1 & 0 \\ -B_0 G_1^{-1} & I_2 \end{pmatrix} \begin{pmatrix} G_1 & 0 \\ 0 & G_2 \end{pmatrix} \begin{pmatrix} I_1 & -G_1^{-1} B_0^T \\ 0 & I_2 \end{pmatrix}$.

Hence $G$ is invertible. According to (3.5), $M$ is invertible and (3.4) holds. ∎
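The identity (3.4) is easy to check numerically. The sketch below (illustrative only, with random placeholder blocks) forms $S$, $U$, $\bar{D}_0$, $G_1$, and $G_2$ exactly as in Theorem 3.1 and compares the right-hand side of (3.4) with a direct inverse of $M = AD^2A^T + bb^T$.

    import numpy as np
    from scipy.linalg import block_diag

    rng = np.random.default_rng(2)
    n0, m0, n1, m1, N = 4, 2, 3, 2, 3

    A0 = rng.standard_normal((m0, n0)); b0 = rng.standard_normal(m0)
    W  = rng.standard_normal((m1, n1))
    Ak = [rng.standard_normal((m1, n0)) for _ in range(N)]
    bk = [rng.standard_normal(m1) for _ in range(N)]
    a  = [np.abs(rng.standard_normal(d)) + 0.1 for d in [n0] + [n1] * N]   # current interior point
    D  = [np.diag(ai) for ai in a]

    # Reference: assemble A, b and form M = A D^2 A^T + b b^T directly.
    A = np.zeros((m0 + N * m1, n0 + N * n1))
    A[:m0, :n0] = A0
    for k in range(N):
        A[m0 + k * m1: m0 + (k + 1) * m1, :n0] = Ak[k]
        A[m0 + k * m1: m0 + (k + 1) * m1, n0 + k * n1: n0 + (k + 1) * n1] = W
    b = np.concatenate([b0] + bk)
    M = A @ block_diag(*D)**2 @ A.T + np.outer(b, b)

    # Quantities of Theorem 3.1.
    Slist = [-np.eye(m0)] + [W @ Dk**2 @ W.T for Dk in D[1:]]      # S_0 = -I, S_k = W D_k^2 W^T
    S = block_diag(*Slist)
    Sinv = np.linalg.inv(S)
    B = [np.hstack([A0, b0[:, None]])] + [np.hstack([Ak[k], bk[k][:, None]]) for k in range(N)]
    D0bar = block_diag(D[0], np.eye(1))                            # Dbar_0 = diag{D_0, 1}
    G1 = np.linalg.inv(D0bar @ D0bar) + sum(Bk.T @ np.linalg.inv(Sk) @ Bk for Bk, Sk in zip(B, Slist))
    G1inv = np.linalg.inv(G1)
    G2inv = np.linalg.inv(-B[0] @ G1inv @ B[0].T)
    U = np.vstack([np.hstack([B[0], np.eye(m0)])] +
                  [np.hstack([Bk, np.zeros((m1, m0))]) for Bk in B[1:]])
    Lmat = np.block([[np.eye(n0 + 1), np.zeros((n0 + 1, m0))],
                     [B[0] @ G1inv,   np.eye(m0)]])
    Ginv = Lmat.T @ block_diag(G1inv, G2inv) @ Lmat                # the three-factor product in (3.4)
    Minv = Sinv - Sinv @ U @ Ginv @ U.T @ Sinv
    print("max |(3.4) - direct inverse| =", np.max(np.abs(Minv - np.linalg.inv(M))))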

A variant of Karmarkar's algorithm for solving the linear programming problem (3.1) is as follows.

Algorithm 3.1
(1) Apply Karmarkar's algorithm described in Section 2 to (3.1) with $A$, $b$, $D$, $c$ specified in this section.
(2) Replace the expression $M^{-1} = (AD^2A^T + bb^T)^{-1}$ in (2.2) by expression (3.4).
(3) Update the old $D$ by multiplying by the scaling factor, and when

(3.6)    $\left( \frac{(D_k)_{ii}}{\text{old } (D_k)_{ii}} \right)^2 \in [1/2, 2]$,

use the old $(D_k)_{ii}$ instead of $(D_k)_{ii}$ in (2.2) and (3.4).
(4) When (3.6) does not hold, use the repeated rank-one updating technique to calculate $S_k^{-1}$ and $G_1$, i.e., with $S_k' = S_k + \alpha W_i W_i^T$,

(3.7)    $(S_k')^{-1} = S_k^{-1} - \beta (S_k^{-1} W_i)(S_k^{-1} W_i)^T$,
         $G_1' = G_1 - \beta (B_k^T S_k^{-1} W_i)(B_k^T S_k^{-1} W_i)^T$,

where $W_i$ is the $i$th column vector of $W$, $\alpha = (D_k)_{ii}^2 - [\text{old } (D_k)_{ii}]^2$, and $\beta = \alpha / (1 + \alpha\, W_i^T S_k^{-1} W_i)$. Then set $S_k \leftarrow S_k'$, $S_k^{-1} \leftarrow (S_k')^{-1}$, $G_1 \leftarrow G_1'$.

We see that, mathematically, Algorithm 3.1 is equivalent to Karmarkar's algorithm when $G_1$ is invertible at each step. The difference lies only in the order of the numerical calculations; a sketch of the update (3.7) follows.
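A minimal sketch of the Sherman-Morrison update in (3.7) for a single changed diagonal entry of $D_k$, checked against recomputing $S_k^{-1}$ from scratch; $W$, the old and new diagonal values, and the index $i$ are arbitrary placeholders. The same scalar $\beta$ and the vector $B_k^T S_k^{-1} W_i$ give the corresponding update of $G_1$.

    import numpy as np

    rng = np.random.default_rng(3)
    m1, n1, i = 3, 5, 2
    W = rng.standard_normal((m1, n1))
    d_old = np.abs(rng.standard_normal(n1)) + 0.1      # old diagonal of D_k
    d_new = d_old.copy()
    d_new[i] = 2.5 * d_old[i]                          # the i-th entry changed

    Sk_inv = np.linalg.inv(W @ np.diag(d_old**2) @ W.T)

    # (3.7): S_k' = S_k + alpha * W_i W_i^T, with alpha the change in (D_k)_{ii}^2.
    alpha = d_new[i]**2 - d_old[i]**2
    w_i = W[:, i]
    beta = alpha / (1.0 + alpha * w_i @ Sk_inv @ w_i)
    v = Sk_inv @ w_i
    Sk_inv_updated = Sk_inv - beta * np.outer(v, v)

    Sk_new = W @ np.diag(d_new**2) @ W.T
    print("max error vs. direct inverse:",
          np.max(np.abs(Sk_inv_updated - np.linalg.inv(Sk_new))))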

Theorem 3.2 (Complexity)  Suppose $G_1$ is invertible at each step. Let $\bar{n} = \max(n_0, n_1)$. Then the overall running time of Algorithm 3.1 is $O((n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)\, n L^2 \ln L \ln\ln L)$.

Proof  At each step, we have:

    work                                      number of arithmetic operations (on average)
    calculate $S^{-1}$ by (3.7)               $O(n^{0.5}\bar{n}^2)$
    calculate $G_1$ by (3.7)                  $O(n^{0.5}\bar{n}^2 + \bar{n}^3)$
    calculate $G_2$ directly                  $O(\bar{n}^3)$
    calculate $G_1^{-1}$ and $G_2^{-1}$       $O(\bar{n}^3)$
    multiply $A$ with a vector                $O(m\bar{n})$
    multiply $S^{-1}$ with a vector           $O(mm_1)$
    multiply $U$ with a vector                $O(mn_0)$

Other efforts are small compared with the operations above. Summing the above numbers yields the average number of arithmetic operations at each step, $O(n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)$. Since there are $O(nL)$ steps, and since each arithmetic operation needs a precision of $O(L)$, we obtain our conclusion. ∎

We can roughly compare this order with $O(n^{3.5} L^2 \ln L \ln\ln L)$, the order of Karmarkar's algorithm in the general case. Suppose $N \approx n_0 \approx n_1$; then $O((n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)\, n L^2 \ln L \ln\ln L) = O(n^{2.5} L^2 \ln L \ln\ln L)$, as mentioned in Section 1.

4. Dual Formulation

The dual problem to (3.1) can also be solved by Karmarkar's algorithm. The program has block angular structure, which leads to computational reductions similar to those of Section 3. An advantage is also gained for the use of Karmarkar's algorithm in approximation schemes. The dual of (3.1) is

(4.1)    max  $b_0^T \pi_0 + \sum_{k=1}^{N} b_k^T \pi_k$
         s.t. $A_0^T \pi_0 + \sum_{k=1}^{N} A_k^T \pi_k \leq c_0$,
              $W^T \pi_k \leq c_k$,  $k = 1, \ldots, N$.

An alternative with constraints of the form in (2.1) is easily obtained. Rewrite (4.1) as

(4.2)    max  $\sum_{k=0}^{N} \bar{b}_k^T y_k$
         s.t. $\sum_{k=0}^{N} \bar{A}_k y_k = c_0$,
              $\bar{W} y_k = c_k$,  $k = 1, \ldots, N$,
              $y_k \geq 0$,  $k = 0, \ldots, N$,

where $\bar{A}_0 = (A_0^T, -A_0^T, I)$, $\bar{A}_k = (A_k^T, -A_k^T, 0)$ for $k = 1, \ldots, N$, $\bar{b}_k^T = (b_k^T, -b_k^T, 0)$ for $k = 0, \ldots, N$, $\bar{W} = (W^T, -W^T, I)$, and the $y_k$ variables are defined accordingly (each free $\pi_k$ is split into nonnegative parts and slacks are appended).
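The passage from (4.1) to (4.2) is pure bookkeeping: each free multiplier $\pi_k$ is split into a difference of nonnegative vectors and slack columns are appended. A sketch of this conversion, with hypothetical block data, is given below.

    import numpy as np

    def dual_standard_form(A0, Ak, W, b0, bk):
        """Blocks of (4.2) from (4.1): split each pi_k into y^+ - y^- and append slack columns."""
        n0 = A0.shape[1]
        n1 = W.shape[1]
        Abar0 = np.hstack([A0.T, -A0.T, np.eye(n0)])                      # Abar_0 = (A_0^T, -A_0^T, I)
        Abar  = [np.hstack([A.T, -A.T, np.zeros((n0, n1))]) for A in Ak]  # Abar_k = (A_k^T, -A_k^T, 0)
        Wbar  = np.hstack([W.T, -W.T, np.eye(n1)])                        # Wbar   = (W^T,  -W^T,  I)
        bbar0 = np.concatenate([b0, -b0, np.zeros(n0)])
        bbar  = [np.concatenate([b, -b, np.zeros(n1)]) for b in bk]
        return Abar0, Abar, Wbar, bbar0, bbar

    # Illustrative placeholder data.
    rng = np.random.default_rng(4)
    n0, m0, n1, m1, N = 4, 2, 3, 2, 3
    A0 = rng.standard_normal((m0, n0)); b0 = rng.standard_normal(m0)
    W  = rng.standard_normal((m1, n1))
    Ak = [rng.standard_normal((m1, n0)) for _ in range(N)]
    bk = [rng.standard_normal(m1) for _ in range(N)]
    Abar0, Abar, Wbar, bbar0, bbar = dual_standard_form(A0, Ak, W, b0, bk)
    print("Abar_0:", Abar0.shape, " Abar_k:", Abar[0].shape, " Wbar:", Wbar.shape)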

The only necessary modification of Karmarkar's algorithm is to change the sign of the second term in Step (iii) to reflect maximization. The computational effort is again dominated by calculating $M^{-1}$, where now $M = AD^2A^T + cc^T$ with $A$ and $c$ the constraint matrix and right-hand side of (4.2). Note that

    $AD^2A^T = \begin{pmatrix} \sum_{k=0}^{N} \bar{A}_k D_k^2 \bar{A}_k^T & \bar{A}_1 D_1^2 \bar{W}^T & \cdots & \bar{A}_N D_N^2 \bar{W}^T \\ \bar{W} D_1^2 \bar{A}_1^T & \bar{W} D_1^2 \bar{W}^T & & 0 \\ \vdots & & \ddots & \\ \bar{W} D_N^2 \bar{A}_N^T & 0 & & \bar{W} D_N^2 \bar{W}^T \end{pmatrix}$.

Now, observe that $M = P + UV^T$, where

    $P = \begin{pmatrix} \sum_{k=0}^{N} \bar{A}_k D_k^2 \bar{A}_k^T & & & \\ \bar{W} D_1^2 \bar{A}_1^T & \bar{W} D_1^2 \bar{W}^T & & 0 \\ \vdots & & \ddots & \\ \bar{W} D_N^2 \bar{A}_N^T & 0 & & \bar{W} D_N^2 \bar{W}^T \end{pmatrix}$,   $U = \begin{pmatrix} I & c_0 \\ 0 & c_1 \\ \vdots & \vdots \\ 0 & c_N \end{pmatrix}$,

and

    $V = \begin{pmatrix} 0 & c_0 \\ \bar{W} D_1^2 \bar{A}_1^T & c_1 \\ \vdots & \vdots \\ \bar{W} D_N^2 \bar{A}_N^T & c_N \end{pmatrix}$.

The Sherman-Morrison-Woodbury formula yields

(4.3)    $M^{-1} = P^{-1} - P^{-1} U (I + V^T P^{-1} U)^{-1} V^T P^{-1}$.

Let

    $H = \sum_{k=0}^{N} \bar{A}_k D_k^2 \bar{A}_k^T$,   $H_k = \bar{W} D_k^2 \bar{A}_k^T$, $k = 1, \ldots, N$,   $J_k = \bar{W} D_k^2 \bar{W}^T$, $k = 1, \ldots, N$;

then

(4.4)    $P^{-1} = \begin{pmatrix} H^{-1} & & & \\ -J_1^{-1} H_1 H^{-1} & J_1^{-1} & & 0 \\ \vdots & & \ddots & \\ -J_N^{-1} H_N H^{-1} & 0 & & J_N^{-1} \end{pmatrix}$

and

(4.5)    $I + V^T P^{-1} U = \begin{pmatrix} I - \sum_{k=1}^{N} H_k^T J_k^{-1} H_k H^{-1} & \sum_{k=1}^{N} \left( H_k^T J_k^{-1} c_k - H_k^T J_k^{-1} H_k H^{-1} c_0 \right) \\ c_0^T H^{-1} - \sum_{k=1}^{N} c_k^T J_k^{-1} H_k H^{-1} & 1 + c_0^T H^{-1} c_0 + \sum_{k=1}^{N} \left( c_k^T J_k^{-1} c_k - c_k^T J_k^{-1} H_k H^{-1} c_0 \right) \end{pmatrix}$.

Equations (4.3), (4.4), and (4.5) can then be used to develop another variant of Karmarkar's algorithm for (4.2). Since $P$ is block lower triangular, systems with $P$ can be solved block by block, as in the sketch below.
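Formula (4.4) says that a system $Pz = r$ can be solved with one solve against $H$ for the first block and one solve against each $J_k$ for the others, without ever forming $P^{-1}$. A sketch with hypothetical symmetric positive definite placeholder blocks:

    import numpy as np

    def apply_P_inverse(H, Hk, Jk, r0, rk):
        """Solve P z = r for the block lower-triangular P of Section 4, per (4.4):
           z_0 = H^{-1} r_0,   z_k = J_k^{-1}(r_k - H_k z_0),  k = 1,...,N."""
        z0 = np.linalg.solve(H, r0)
        zk = [np.linalg.solve(J, r - Hk_ @ z0) for J, Hk_, r in zip(Jk, Hk, rk)]
        return z0, zk

    rng = np.random.default_rng(5)
    n0, n1, N = 4, 3, 3

    def spd(d):
        X = rng.standard_normal((d, d))
        return X @ X.T + d * np.eye(d)            # symmetric positive definite placeholder

    H  = spd(n0)
    Hk = [rng.standard_normal((n1, n0)) for _ in range(N)]
    Jk = [spd(n1) for _ in range(N)]
    r0, rk = rng.standard_normal(n0), [rng.standard_normal(n1) for _ in range(N)]

    z0, zk = apply_P_inverse(H, Hk, Jk, r0, rk)
    # Check against the assembled P.
    P = np.zeros((n0 + N * n1, n0 + N * n1))
    P[:n0, :n0] = H
    for k in range(N):
        r = n0 + k * n1
        P[r:r + n1, :n0] = Hk[k]
        P[r:r + n1, r:r + n1] = Jk[k]
    print("residual:", np.max(np.abs(P @ np.concatenate([z0] + zk) - np.concatenate([r0] + rk))))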

Algorithm 4.1
(1) Apply Karmarkar's algorithm from Section 2 to the data in (4.2).
(2) Replace the $M^{-1}$ expression by (4.3).
(3) Update the old $D$ by the scaling factor, and if (3.6) holds, use the old $(D_k)_{ii}$ instead of $(D_k)_{ii}$ in (2.2) and (4.3).
(4) If (3.6) does not hold, use a repeated rank-one updating scheme to find $(H')^{-1}$, $H_k'$, and $(J_k')^{-1}$, where

(4.6)    $H' = H + \alpha (\bar{A}_k)_{\cdot i} (\bar{A}_k)_{\cdot i}^T$,   $H_k' = H_k + \alpha\, \bar{W}_{\cdot i} (\bar{A}_k)_{\cdot i}^T$,   $J_k' = J_k + \alpha\, \bar{W}_{\cdot i} \bar{W}_{\cdot i}^T$,

where $T_{\cdot i}$ denotes the $i$th column of a matrix $T$, and $\alpha$ is as defined above. Equations (4.6) are then used repeatedly (with the Sherman-Morrison formula for the inverses of $H'$ and $J_k'$) to update $P^{-1}$ and $(I + V^T P^{-1} U)$; $(I + V^T P^{-1} U)^{-1}$ can then be calculated directly.

Algorithm 4.1 is equivalent to Karmarkar's algorithm, but the structure again allows for a reduction in computational effort.

Theorem 4.2  The overall running time of Algorithm 4.1 is $O((n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)\, n L^2 \ln L \ln\ln L)$.

Proof  The computational effort in each step is:

    work                                              average operations
    calculate $(H')^{-1}$                             $O(n^{0.5}\bar{n}^2)$
    calculate $(J_k')^{-1}$                           $O(n^{0.5}\bar{n}^2)$
    calculate $H_k'(H')^{-1}$                         $O(n^{0.5}\bar{n}^2)$
    calculate $(H_k')^T (J_k')^{-1}$                  $O(n^{0.5}\bar{n}^2)$
    calculate $(I + V^T P^{-1} U)^{-1}$               $O(\bar{n}^3)$
    multiply $A$ with a vector                        $O(mn_0 + Nn_1(m_1 + n_1))$
    multiply $P^{-1}$ with a vector                   $O(n\bar{n})$
    multiply $V^T$ with a vector                      $O(n_0 n)$

All other effort is dominated by these operations. Note that $H_k'(H')^{-1}$ and $(H_k')^T (J_k')^{-1}$ are found sequentially by updating $(H')^{-1}$ and $(J_k')^{-1}$ first and then using the formulas in (4.6) for $H_k'$. The number of operations in each step is $O(n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)$, and the result is obtained as in Theorem 3.2. ∎

A rough comparison for $N \approx n_0 \approx n_1$ yields $O((n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)\, n L^2 \ln L \ln\ln L) = O(n^{2.5} L^2 \ln L \ln\ln L)$. This bound is the same as that provided in Theorem 3.2.

5. Dual Use in Approximations

The dual provides an additional benefit. When (3.1) is used to approximate a stochastic linear program with more than $N$ realizations of the random vector $b_k$, additional realizations $b_{N+1}, \ldots, b_{N+\ell}$ are added to (3.1) and the probabilities $p_k$, $k = 1, \ldots, N+\ell$, are updated. (See Birge and Wets [3] for procedures to do this.) A feasible solution to the new problem (3.1) with $N' = N + \ell$ is not readily found, but given a feasible solution of (4.2), another can be found easily if $T_1 = T_2 = \cdots = T_N = T$ and $q_1 = q_2 = \cdots = q_N = q$. Under these assumptions, let the old solution be $(\pi_0^{old}, \pi_1^{old}, \ldots, \pi_N^{old})$, where $W^T \pi_k^{old} \leq p_k^{old} q$ for $k = 1, \ldots, N$ and $A_0^T \pi_0^{old} + \sum_{k=1}^{N} T^T \pi_k^{old} \leq c_0$. Define a new solution by

(5.1)    $\pi_0^{New} = \pi_0^{old}$,   $\pi_k^{New} = \left( \sum_{i=1}^{N} \pi_i^{old} \right) p_k^{New}$,  $k = 1, \ldots, N+\ell$.

The solution in (5.1) is then feasible in (4.2), with objective value $b_0^T \pi_0^{old} + \sum_{k=1}^{N+\ell} b_k^T \left( \sum_{i=1}^{N} \pi_i^{old} \right) p_k^{New}$. The total effort for solving the new (4.2) should be less than that of re-solving (3.1), because no procedures to find a feasible interior point are necessary (assuming, of course, that $\pi^{old}$ is interior). The objective value at (5.1) is generally close to the old optimal value of (4.2) and should be close to the new optimal value. The effort, in terms of the number of steps required to solve the new (4.2), should therefore be much smaller than $O(nL)$. A small sketch of this warm start follows.
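A sketch of the warm start (5.1): the first-stage multiplier is kept, and each new scenario multiplier is the probability-weighted sum of the old ones. With $T_k \equiv T$ and $q_k \equiv q$, feasibility follows because the old probabilities sum to one; the sketch below checks only the recourse constraints $W^T \pi_k \leq p_k q$ on placeholder data.

    import numpy as np

    def dual_warm_start(pi0_old, pi_old, p_new):
        """Eq. (5.1): pi_0^New = pi_0^old;  pi_k^New = (sum_i pi_i^old) p_k^New, k = 1,...,N+l."""
        pooled = np.sum(pi_old, axis=0)
        return pi0_old, [p_k * pooled for p_k in p_new]

    rng = np.random.default_rng(6)
    m0, m1, n1, N, extra = 2, 2, 3, 4, 2
    W = rng.standard_normal((m1, n1))
    q = np.abs(rng.standard_normal(n1)) + 1.0
    p_old = np.full(N, 1.0 / N)

    # Hypothetical old multipliers, scaled so that W^T pi_k <= p_k^old q holds for each k.
    pi_old = []
    for k in range(N):
        pi = rng.standard_normal(m1)
        t = np.max(W.T @ pi / (p_old[k] * q))
        pi_old.append(pi * (0.9 / t) if t > 0 else pi)

    p_new = np.full(N + extra, 1.0 / (N + extra))
    pi0_new, pi_new = dual_warm_start(np.zeros(m0), pi_old, p_new)
    ok = all(np.all(W.T @ pik <= pk * q + 1e-12) for pik, pk in zip(pi_new, p_new))
    print("W^T pi_k^New <= p_k^New q for all new scenarios:", ok)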

A similar procedure can be used in the primal problem (3.1) if the randomness is limited to the objective function, i.e., $T_1 = T_2 = \cdots = T_N = T$ and $h_1 = h_2 = \cdots = h_N = h$, but $q_i \neq q_j$ for $i \neq j$. Again, a starting feasible solution can easily be found by letting

(5.2)    $x_k^{new} = x_k^{old}$,  $k = 0, \ldots, N$,   and   $x_j^{new} = \mathop{\mathrm{argmin}}_{x_i^{old}} \{ q_j^T x_i^{old},\; i = 1, \ldots, N \}$,  $j = N+1, \ldots, N+\ell$.

Now, let $\bar{p} = \sum_{j=N+1}^{N+\ell} p_j^{New}$, let $z^{old}$ be the objective value of the old (3.1) at $x^{old}$, and let

    $z_\ell^{New} = \bar{p}\, c^T x_0^{new} + \sum_{j=N+1}^{N+\ell} p_j^{New} q_j^T x_j^{new}$.

The new initial objective value in (3.1) is then

    $z^{New} = (1 - \bar{p})\, c^T x_0^{old} + \sum_{j=1}^{N} p_j^{New} q_j^T x_j^{old} + z_\ell^{New}$,

and we wish to find

    $z_*^{New} = \min \left\{ (1 - \bar{p})\, c^T x_0 + \sum_{j=1}^{N} p_j^{New} q_j^T x_j + z_\ell^{New} \right\}$,  where  $z_\ell^{New} = \bar{p}\, c^T x_0 + \sum_{j=N+1}^{N+\ell} p_j^{New} q_j^T x_j$.

Now, assume $p_j^{New} / (1 - \bar{p}) = p_j^{old}$, i.e., that the old probabilities keep the same relative values in the new problem, so that $x^{old}$ minimizes $(1 - \bar{p}) c^T x_0 + \sum_{j=1}^{N} p_j^{New} q_j^T x_j$. We then have

(5.4)    $z_*^{New} \leq (1 - \bar{p}) z^{old} + \max z_\ell^{New}$,   $z_*^{New} \geq (1 - \bar{p}) z^{old} + \min z_\ell^{New}$.

Let $L'$ be the size of the data for the objective and constraints relating only to $z_\ell^{New}$. This yields the following result.

Theorem 5.1  The overall running time of Algorithm 3.1, started from the solution defined by (5.2) for $N + \ell$ realizations of the objective $q_k$, assuming $T_1 = T_2 = \cdots = T_{N+\ell} = T$, $h_1 = h_2 = \cdots = h_{N+\ell} = h$, and $p_j^{New}/(1 - \bar{p}) = p_j^{old}$, is $O((n^{0.5}\bar{n}^2 + n\bar{n} + \bar{n}^3)\, n L L' \ln L \ln\ln L)$.

Proof  The number of operations per iteration follows from Theorem 3.2. The number of iterations, $O(nL')$, follows from (5.4), since $2^{O(L')} \geq z_\ell^{New} \geq -2^{O(L')}$. ∎

Theorem 5.1 makes specific the comments above about the reduction in effort in solving the enlarged problem: the structure of the primal under the assumptions of Theorem 5.1 allows a specific bound to be obtained. The entire approximating process can then be seen as a variable-dimension variant of Karmarkar's algorithm in which the number of iterations is $O(\ell n L')$, where $\ell$ additional problems of size $L'$ are added after the initial solution of (3.1).
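Similarly, a sketch of the primal warm start (5.2): the old blocks are kept, and each added scenario $j$ reuses whichever old recourse vector is cheapest under its cost $q_j$ (feasible because $T_k \equiv T$ and $h_k \equiv h$). Data are placeholders.

    import numpy as np

    def primal_warm_start(x0_old, x_old, q_new_list):
        """Eq. (5.2): keep x_0 and x_1..x_N; for each new scenario cost q_j pick the old
           recourse vector minimizing q_j^T x_i^old (feasible since T_k = T, h_k = h)."""
        x_new_blocks = [min(x_old, key=lambda xi: q_j @ xi) for q_j in q_new_list]
        return x0_old, list(x_old) + x_new_blocks

    rng = np.random.default_rng(7)
    n0, n1, N, extra = 4, 3, 4, 2
    x0_old = np.abs(rng.standard_normal(n0))
    x_old = [np.abs(rng.standard_normal(n1)) for _ in range(N)]
    q_new = [np.abs(rng.standard_normal(n1)) for _ in range(extra)]   # costs of the added scenarios
    x0, xk = primal_warm_start(x0_old, x_old, q_new)
    print("blocks in the warm start:", len(xk), "=", N + extra)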

6. Special Cases

The work of the variants of Karmarkar's algorithm is reduced in the following special cases.

(1) Simple recourse [16]. In this case, $W = (I, -I)$, where $I$ is the unit matrix in $R^{m_1 \times m_1}$, and $n_1 = 2m_1$. Write $x_k = (x_k^+, x_k^-)$, $x_k^+, x_k^- \in R^{m_1}$, for $k = 1, \ldots, N$ in (3.1). Let $D_k = \mathrm{diag}\{D_k^+, D_k^-\}$ in (3.2), where $D_k^+, D_k^- \in R^{m_1 \times m_1}$ for $k = 1, \ldots, N$. Then, in Theorem 3.1, we have $S_k = W D_k^2 W^T = (D_k^+)^2 + (D_k^-)^2$, so

(6.1)    $S_k^{-1} = \mathrm{diag}\left\{ \left( (a_k^+)_1^2 + (a_k^-)_1^2 \right)^{-1}, \ldots, \left( (a_k^+)_{m_1}^2 + (a_k^-)_{m_1}^2 \right)^{-1} \right\}$,

where $D_k^+ = \mathrm{diag}\{(a_k^+)_1, \ldots, (a_k^+)_{m_1}\}$ and $D_k^- = \mathrm{diag}\{(a_k^-)_1, \ldots, (a_k^-)_{m_1}\}$ for $k = 1, \ldots, N$. Therefore, $S^{-1}$ can be easily calculated in (3.4). Substitute $D_k^+$, $D_k^-$ for $D_k$ in (3.6). Then, according to (3.3), (3.7) can be replaced by

(6.2)    $G_1' = G_1 + \left[ \left( (a_k^+)_i^2 + (a_k^-)_i^2 \right)^{-1}_{new} - \left( (a_k^+)_i^2 + (a_k^-)_i^2 \right)^{-1}_{old} \right] (B_k)_{i\cdot}^T (B_k)_{i\cdot}$,

where $(B_k)_{i\cdot}$ is the $i$th row vector of $B_k$. The work is reduced, but the order of the algorithm is the same. Similar modifications can be made for the dual approach. (A small sketch of (6.1) appears at the end of this section.)

(2) Network recourse [17]. In this case, $W = (E, I)$, where $E$ is the network node-arc incidence matrix. The calculation of $S_k$ and $S_k^{-1}$ can again be reduced, but the order of the algorithm is the same.
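For case (1), a minimal sketch of (6.1): with $W = (I, -I)$ the matrix $S_k$ is diagonal, so $S_k^{-1}$ is just a vector of reciprocals. The values below are placeholders.

    import numpy as np

    rng = np.random.default_rng(8)
    m1 = 4
    a_plus  = np.abs(rng.standard_normal(m1)) + 0.1   # diagonal of D_k^+
    a_minus = np.abs(rng.standard_normal(m1)) + 0.1   # diagonal of D_k^-

    W = np.hstack([np.eye(m1), -np.eye(m1)])          # simple recourse: W = (I, -I)
    Dk2 = np.diag(np.concatenate([a_plus, a_minus]))**2
    Sk = W @ Dk2 @ W.T                                # = diag((a^+)_i^2 + (a^-)_i^2), eq. (6.1)
    Sk_inv_diag = 1.0 / (a_plus**2 + a_minus**2)
    print("agrees with diag of S_k:", np.allclose(np.diag(Sk), a_plus**2 + a_minus**2),
          "; diagonal of S_k^{-1}:", Sk_inv_diag)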

References

[1] J. Birge, 1984. A Dantzig-Wolfe decomposition variant equivalent to basis factorization, Mathematical Programming Study, to appear.

[2] J. Birge, 1985. The relationship between the L-shaped method and dual basis factorization for stochastic linear programming, in: Y. Ermoliev and R. Wets, eds., Numerical Methods in Stochastic Programming, to appear.

[3] J. Birge and R. Wets, 1985. Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse, Mathematical Programming Study, to appear.

[4] J. Bisschop and A. Meeraus, 1977. Matrix augmentation and partitioning in the updating of the basis inverse, Mathematical Programming, 13, 241-254.

[5] J. Bisschop and A. Meeraus, 1980. Matrix augmentation and structure preservation in linearly constrained control problems, Mathematical Programming, 18, 7-15.

[6] A. Charnes, T. Song, and M. Wolfe, 1984. An explicit solution sequence and convergence of Karmarkar's algorithm, Research Report CCS 501, Center for Cybernetic Studies, University of Texas at Austin.

[7] G. Dantzig and A. Madansky, 1961. On the solution of two-stage linear programs under uncertainty, in: Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, 165-176.

[8] R. Fourer, 1984. Staircase matrices and systems, SIAM Review, 26, 1-70.

[9] G. H. Golub and C. F. Van Loan, 1983. Matrix Computations, Johns Hopkins University Press.

[10] P. Kall, 1979. Computational methods for solving two-stage stochastic linear programming problems, Z. Angew. Math. Phys., 30, 261-271.

[11] N. Karmarkar, 1984. A new polynomial-time algorithm for linear programming, Combinatorica, 4, 373-395.

[12] A. Perold and G. Dantzig, 1979. A basis factorization method for block triangular linear programs, in: Sparse Matrix Proceedings 1978, I. Duff and G. Stewart, eds., SIAM Publications, Philadelphia, 283-312.

[13] R. Van Slyke and R. Wets, 1969. L-shaped linear programs with applications to optimal control and stochastic programming, SIAM J. Appl. Math., 17, 638-663.

[14] B. Strazicky, 1980. Some results concerning an algorithm for the discrete recourse problem, in: Stochastic Programming, M. Dempster, ed., Academic Press, London, 263-274.

[15] R. Wets, 1983. Stochastic programming: solution techniques and approximation schemes, in: Mathematical Programming: The State of the Art 1982, A. Bachem, M. Groetschel, and B. Korte, eds., Springer-Verlag, 566-603.

[16] R. Wets, 1985. Large scale linear programming techniques in stochastic programming, in: Y. Ermoliev and R. Wets, eds., Numerical Methods in Stochastic Programming, to appear.

[17] S. Wallace, 1985. On network structured stochastic optimization problem, Report No. 842555-8, Chr. Michelsen Institute, Bergen, Norway.