SOLVING STOCHASTIC PROGRAMMING PROBLEMS VIA KALMAN FILTER AND AFFINE SCALING

Sarat Puthenpura and Lakshman Sinha
AT&T Bell Laboratories, Murray Hill, NJ 07974

Shu-Cherng Fang
North Carolina State University, Raleigh, NC 27695-7913

Romesh Saigal
The University of Michigan, Ann Arbor, Michigan 48109-2117

Technical Report 94-24
October 1994

Solving Stochastic Programming Problems via Kalman Filter and Affine Scaling

Sarat Puthenpura & Lakshman Sinha, AT&T Bell Laboratories, Murray Hill, New Jersey 07974
Shu-Cherng Fang, North Carolina State University, Raleigh, NC
Romesh Saigal, University of Michigan, Ann Arbor, MI

Kalman filtering theory [1] has been around for more than two decades and has been the backbone of modern applied stochastic estimation. It is used extensively in many areas, including engineering and econometrics. A Kalman filter estimates the states of a system from observations made about the stochastic inputs and outputs of the system. The Kalman filter performs its operations in an attractive recursive fashion and can easily be implemented on a computer. Karmarkar's algorithm [2] is a relatively recent invention: an interior point algorithm for solving linear programming problems. It has been shown to be very effective in solving large linear programs. There are several variants of this algorithm, and the primal affine scaling algorithm [3] is a popular one. This algorithm has also been extended to solve convex programs with linear constraints [4]. A comprehensive treatment of various interior point algorithms is the main focus of a recent book by Fang and Puthenpura [5]. In this paper, we first establish a parallelism between the Kalman filter and the affine scaling algorithm. This is done by appropriate modeling of the affine scaling algorithm in a state space form and identifying the corresponding elements in a Kalman filter framework. In this case, the stochastic elements of the filter are suppressed to emulate a deterministic system. These ideas have an immediate extension to quadratic programming as well.

Here several interesting concurrences (both conceptual and computational) between the affine scaling algorithm and the Kalman filter are established. This enables us to derive a unified treatment of both the estimation and the optimization problems. When we activate the stochastic elements of Kalman filtering theory, we can effectively deal with stochastic linear programming problems. We obtain some theoretical results, and a new Karmarkar-based algorithm is developed for this problem. This new approach appears to be very promising for applications in stochastic optimization and the control of systems.

1. THE STATE SPACE REPRESENTATION OF LINEAR SYSTEMS

In this section we review some basics of the state space representation of systems, within the framework of Kalman filtering. For the sake of brevity we consider only the discrete time representation of systems (as opposed to the continuous time representation), with a single input and a single output (SISO). However, the results can easily be extended to multivariable systems in either discrete or continuous time.

Consider a SISO system with a given input sequence {u(t)} and output sequence {y(t)}, where t, sometimes referred to as the discrete time index, represents the sampling time. We say that the system is of order n if the sequences satisfy an n-th order difference equation of the form

y(t) + a_1 y(t-1) + a_2 y(t-2) + ... + a_n y(t-n) = b_0 u(t) + b_1 u(t-1) + ... + b_m u(t-m)   (1)

where m < n. The state space form transforms the above n-th order equation into n first-order difference equations by introducing dummy state variables. In equation (1), we can define the variables z_1(t) = y(t), z_2(t) = y(t-1) = z_1(t-1), ..., z_n(t) = y(t-n+1) = z_{n-1}(t-1). Thus we have

z(t) = F z(t-1) + G u(t)   (2a)

and

y(t) = H z(t)   (2b)

where z(t) = [z_1(t), z_2(t), ..., z_n(t)]^T,

    F = [ -a_1  -a_2  ...  -a_{n-1}  -a_n ]
        [   1     0   ...     0        0  ]
        [   0     1   ...     0        0  ]
        [   :     :           :        :  ]
        [   0     0   ...     1        0  ]

u(t) = [u(t), u(t-1), ..., u(t-m)]^T, G = [b_0, b_1, ..., b_m], and H = [1, 0, ..., 0].

Equations (2a) and (2b) are usually called the state transition equation and the output equation, respectively. The matrix F is commonly referred to as the state transition matrix; its eigenvalues represent the natural frequencies of the system. The choice of the state variables {z_i} is not unique. In fact, it is possible to choose the states so that F takes a suitable canonical form [6]. Moreover, the state variables can be chosen so that they represent a physical aspect of the system under consideration. For example, a DC motor can be conveniently modeled as a second-order system with two states representing the angular displacement and velocity of the motor shaft. The vector u(t) can be considered as a forcing function of the difference equation (1), and hence of (2a). If u(t) = 0 for all t, the system is called an autonomous system.
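As an illustration, the sketch below builds the companion-form matrices of (2a)-(2b) from hypothetical difference-equation coefficients and simulates the recursion. Here G is written as an n x (m+1) matrix whose first row is [b_0, ..., b_m], so that only the first state is driven by the input terms; all numerical values are invented for the example.

```python
import numpy as np

# Hypothetical 3rd-order SISO system: coefficients of equation (1)
a = np.array([0.5, -0.2, 0.1])    # a_1, ..., a_n
b = np.array([1.0, 0.4])          # b_0, ..., b_m  (m < n)
n, m = len(a), len(b) - 1

# Companion-form state transition matrix F of equation (2a)
F = np.zeros((n, n))
F[0, :] = -a                      # first row: -a_1, ..., -a_n
F[1:, :-1] = np.eye(n - 1)        # shifts z_1(t-1) into z_2(t), and so on

# Input and output maps
G = np.zeros((n, m + 1))
G[0, :] = b                       # only the first state is forced by the input terms
H = np.zeros((1, n))
H[0, 0] = 1.0                     # output equation (2b): y(t) = z_1(t)

# Simulate (2a)-(2b) for a unit-step input
T = 20
u = np.ones(T)
z = np.zeros(n)
y = []
for t in range(T):
    # u_vec collects u(t), u(t-1), ..., u(t-m), with zeros before t = 0
    u_vec = np.array([u[t - j] if t - j >= 0 else 0.0 for j in range(m + 1)])
    z = F @ z + G @ u_vec         # state transition equation (2a)
    y.append((H @ z).item())      # output equation (2b)
print(np.round(y, 3))
```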

2. THE FILTERING PROBLEM

If we have access to the states of a system, then by virtue of the relationships (2a) and (2b) the output of the system can be expressed as a function of the u(t)'s and z(t)'s. This aspect is discussed extensively in the literature of modern control theory, for example [6]. However, the control of the system becomes more challenging if the states are not readily available. We then have to estimate the states from some measurable aspects of the system. This means that samples of the input and output of the system are needed. A device which estimates the states of a system from input-output samples is commonly referred to as an observer. If the data used by the observer is assumed to be noise free, the observer is deterministic. On the other hand, if the input and/or the output data is noisy, one has to estimate, or filter out, the values of the states from the noisy data, given some knowledge of the statistical properties of the noise. Such an observer is a stochastic observer, and the Kalman filter is a good example of one.

The state space model of the Kalman filter can be obtained by extending the model depicted by equations (2a) and (2b). For a multivariable system, these are given as follows:

z(t+1) = F z(t) + G u(t) + v(t)   (3a)

and

y(t) = H z(t) + e(t)   (3b)

where z(t) ∈ R^n, F ∈ R^{n×n}, G ∈ R^{n×m}, H ∈ R^{l×n}, u(t) ∈ R^m, v(t) ∈ R^n, y(t) ∈ R^l and e(t) ∈ R^l. The vector v(t) is the system noise (or plant noise) vector and e(t) is known as the measurement noise vector. We assume that e(t) and v(t) are drawn from two independent Gaussian distributions with

E[v(t)] = 0   (4a)

E[e(t)] = 0   (4b)

E[v(t) v(t)^T] = Q   (4c)

E[e(t) e(t)^T] = R   (4d)

and

E[e(t) v(t)^T] = 0   (4e)

where Q and R are covariance matrices (often diagonal) of orders n×n and l×l, respectively.

The filtering problem is to estimate the state vector z(t+1) from the sigma algebra Y_t of observations up to time t. Estimating z(t+1) by minimizing its variance is equivalent to solving the conditional expectation problem given by

ẑ(t+1) = E[z(t+1) | Y_t]   (5)

where ẑ(t+1) denotes the conditional expectation of z(t+1). The geometrical interpretation of this operation is the projection of the observations onto the hyperplane defined by the z(t)'s. The Kalman filter solves the above problem in a recursive fashion.

3. THE KALMAN FILTER

The essential idea behind the Kalman filter is to propagate the expected value of the state vector and its covariance matrix in a recursive fashion. Without going into the detailed derivation [7], the essential mathematical equations that govern the Kalman filter of the system (3a), (3b) are given as follows:

ẑ(t|t) = ẑ(t|t-1) + P(t|t-1) H^T (H P(t|t-1) H^T + R)^{-1} (y(t) - H ẑ(t|t-1))
       = ẑ(t|t-1) + K(t|t-1) (y(t) - H ẑ(t|t-1))   (6a)

P(t|t) = P(t|t-1) - P(t|t-1) H^T (H P(t|t-1) H^T + R)^{-1} H P(t|t-1)
       = P(t|t-1) - K(t|t-1) H P(t|t-1)   (6b)

where

K(t|t-1) = P(t|t-1) H^T (H P(t|t-1) H^T + R)^{-1}   (6c)

and finally,

ẑ(t+1|t) = F ẑ(t|t) + G u(t)   (6d)

P(t+1|t) = F P(t|t) F^T + Q   (6e)

The matrix P is called the covariance matrix and the matrix K the Kalman gain matrix. As described before, ẑ(t+1|t) is the estimate of z(t+1) at instant t (based on all the measurements made up to and including time t), as depicted by equation (5). The matrix P(t+1|t) is the covariance matrix of this estimate of the states. One can select an arbitrary vector ẑ(1|0) to start the Kalman filter so long as the covariance matrix P(1|0) is set very large (theoretically infinitely large). In a Bayesian framework, this amounts to a noninformative prior on ẑ(1|0). In practice, one can conveniently choose ẑ(1|0) = 0 and P(1|0) = λI, with λ ∈ R+, λ >> 0, and I the identity matrix. It is also important to note that the Kalman filter can be applied to time-varying systems in which the matrices F, G and H vary in time.
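A minimal numerical sketch of one filter recursion (6a)-(6e) is given below; the system matrices, noise levels and the simulated "true" trajectory are hypothetical, and the code only illustrates the order of the updates rather than a numerically hardened implementation.

```python
import numpy as np

def kalman_step(z_pred, P_pred, y, u, F, G, H, Q, R):
    """One Kalman filter recursion: measurement update (6a)-(6c), then time update (6d)-(6e)."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain, equation (6c)
    z_filt = z_pred + K @ (y - H @ z_pred)    # filtered state, equation (6a)
    P_filt = P_pred - K @ H @ P_pred          # filtered covariance, equation (6b)
    z_next = F @ z_filt + G @ u               # one-step prediction, equation (6d)
    P_next = F @ P_filt @ F.T + Q             # predicted covariance, equation (6e)
    return z_next, P_next

# Hypothetical 2-state, single-output system of the form (3a)-(3b)
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[0.25]])

rng = np.random.default_rng(0)
z_true = np.array([0.0, 1.0])
z_hat, P = np.zeros(2), 1e6 * np.eye(2)       # z(1|0) = 0 and P(1|0) = lambda*I, lambda >> 0
for t in range(100):
    u = np.array([0.0])                       # run the system autonomously for simplicity
    z_true = F @ z_true + G @ u + rng.multivariate_normal(np.zeros(2), Q)   # plant noise v(t)
    y = H @ z_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)              # measurement noise e(t)
    z_hat, P = kalman_step(z_hat, P, y, u, F, G, H, Q, R)
print(z_hat, z_true)
```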

3.1 SIMPLIFIED FILTERING EQUATIONS FOR AUTONOMOUS SYSTEMS

In this section the Kalman filtering equations are simplified for autonomous systems; this form will be directly applicable to our forthcoming discussion. To begin with, we introduce the following notation: ẑ(t+1|t) = ẑ(t+1), ẑ(t|t-1) = ẑ(t), P(t+1|t) = P(t+1) and P(t|t-1) = P(t). Now we state our result as follows.

Proposition 1. For an autonomous system, the Kalman filtering equations are

ẑ(t+1) = F ẑ(t) + K(t) (y(t) - H ẑ(t))   (7a)

K(t) = F P(t) H^T (H P(t) H^T + R)^{-1}   (7b)

and

P(t+1) = F P(t) F^T + Q - K(t) H P(t) F^T   (7c)

Proof: Since u(t) = 0 for all t for an autonomous system, it follows from (6d) that ẑ(t+1) = F ẑ(t|t) for all t. This property, in conjunction with equations (6a) and (6b) and some simple algebraic simplification, yields the desired result. □

Now let us focus on the affine scaling algorithm for linear programming problems.

4. THE LINEAR PROGRAMMING PROBLEM AND THE AFFINE SCALING ALGORITHM

The linear programming problem can be stated in the standard form as follows:

minimize c^T x   (8a)

such that Ax = b and x ≥ 0,   (8b)

where

A is an m×n matrix called the constraint matrix, b is an m-vector called the right-hand side, c is an n-vector called the cost vector, and x is an n-vector of variables.

The affine scaling algorithm [3] is summarized below. Assume x^0 is an interior feasible point of problem (8a)-(8b), and let D_0 be the diagonal matrix whose diagonal entries are the components of x^0. Then the translation vector c_p from x^0 to a new feasible point x^1 is given by

c_p = (α/γ) D_0^2 r   (9a)

where

r = c - A^T w,   (9b)

w = (A D_0^2 A^T)^{-1} A D_0^2 c,   (9c)

γ = max_i (e_i^T P_0 D_0 c)   (9d)

Here e_i is the i-th unit vector and P_0 is the projection matrix given by

P_0 = I - D_0 A^T (A D_0^2 A^T)^{-1} A D_0   (10)

The parameter α is a scalar between 0 and 1. The vector r is called the vector of reduced costs, and the vector w the vector of dual variables. The new interior feasible point x^1 is obtained by

x^1 = x^0 - (α/γ) D_0^2 r   (11)

The affine scaling algorithm essentially transforms the LP problem by performing the affine mapping (a scaling of the affine space defined by equations (8a) and (8b) with respect to the current feasible solution x^0)

x' = D_0^{-1} x   (12)

and obtains the direction of descent by projecting the gradient of the transformed cost function (D_0 c) onto the null space of the transformed constraint matrix (A D_0). The new solution x^1 is then obtained by translating the current solution (in the transformed space) along this direction and mapping the result back into the original space.

Now an interesting observation on the affine scaling algorithm can be made.

Proposition 2. The dual solution w in equation (9c) is argmin_w ||(A D_0)^T w - D_0 c||_2^2.

Proof: Let

V = ||(A D_0)^T w - D_0 c||_2^2 = [(A D_0)^T w - D_0 c]^T [(A D_0)^T w - D_0 c] = w^T A D_0^2 A^T w - 2 w^T A D_0^2 c + c^T D_0^2 c

Partially differentiating V with respect to w and setting the result to zero yields the stated result. □

5. THE AFFINE SCALING METHOD IN A FILTERING FRAMEWORK

Based on the observation by Puthenpura and Sinha [9], we now derive the connection between the Kalman filter and the affine scaling algorithm, both conceptually and computationally.

Theorem 1. The Kalman filter minimizes

E[e'(t)^T e'(t) | Y_t]   (13a)

where e'(t) is called the filtered state error vector, defined by

e'(t) = z(t) - ẑ(t)   (13b)

and ẑ(t) is the state estimate produced by the Kalman filter. Moreover, when the filter converges,

E[η(t) | Y_t] = 0   (14a)

E[η(t) y^T(t)] = 0   (14b)

E[e'(t) η^T(t)] = 0   (14c)

E[e'(t) | Y_t] = 0   (14d)

where η(t) = y(t) - H ẑ(t) is called the innovations or residual vector.

Proof: See [7] and [8] for a complete proof of these results. □

One important note should be made here: conditions (14a) to (14d) imply that the filter converges when the estimates and the residuals become statistically orthogonal to (statistically independent of) the stochastics of the system. In this case, no component of the statistical information in the stochastics of the system contributes to further improvement of the estimates of the states.

Theorem 1 indicates that the state estimate produced by the Kalman filter, ẑ(t), is a minimum variance estimator of z(t). At the same time, Proposition 2 shows that the dual vector w of the affine scaling algorithm is a least squares solution. Thus, the underlying computation in both cases is the minimization of deviations measured in the l_2 norm. This is the key to the parallelism between the affine scaling algorithm and the Kalman filter. We now introduce a state space model to examine the affine scaling algorithm (depicted by equations (9a) through (11)) in a Kalman filtering framework.
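For concreteness, the sketch below runs the deterministic affine scaling iteration of equations (9b)-(9d) and (11) on a small hypothetical LP; the function name and the data are invented for illustration, and the Kalman filter construction developed next emulates precisely the dual-vector computation inside this loop.

```python
import numpy as np

def affine_scaling_step(A, c, x, alpha=0.9):
    """One primal affine scaling iteration, equations (9a)-(11)."""
    D = np.diag(x)                                    # D_0 = diag(x^0), the current interior point
    AD2 = A @ D @ D
    w = np.linalg.solve(AD2 @ A.T, AD2 @ c)           # dual vector, equation (9c)
    r = c - A.T @ w                                   # reduced costs, equation (9b)
    gamma = np.max(D @ r)                             # equation (9d), since P_0 D_0 c = D_0 r
    return x - (alpha / gamma) * (D @ D @ r), w, r    # new interior point, equation (11)

# Hypothetical LP: minimize 2x1 + x2 + 3x3  subject to  x1 + x2 + x3 = 1, x >= 0
A = np.array([[1.0, 1.0, 1.0]])
c = np.array([2.0, 1.0, 3.0])
x = np.array([1/3, 1/3, 1/3])                         # interior feasible starting point

for k in range(30):
    x, w, r = affine_scaling_step(A, c, x)
print(np.round(x, 6), w)                              # x approaches the optimal vertex (0, 1, 0)
```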

Theorem 2. The state space equations for the affine scaling algorithm are

w(t+1) = w(t)   (15a)

and

c_k(t) = A_k(t)^T w(t) + e(t)   (15b)

where c_k(t) = D_k c and A_k(t) = A D_k, with D_k the diagonal matrix whose diagonal entries are the components of x^k.

Proof: At any step k of the affine scaling algorithm, we see from equation (9c) that the dual vector is unique provided the matrix A has full row rank. Therefore, if we use the Kalman filter to estimate the dual vector corresponding to the current interior point, it has to converge to this unique dual vector w^k. That is, for some sufficiently large value of t, we expect w(t+1) = w(t) = w^k. This justifies (15a). Now, from equation (11), we have

x^{k+1} = x^k - β D_k^2 (c - A^T w^k)   (16a)

where β = α/γ. Rearranging terms results in

D_k c = D_k A^T w^k - (1/β) D_k^{-1} [x^{k+1} - x^k]   (16b)

Adding the discrete time index t to the above equation (to emulate the filtering environment),

we get equation (15b). Note that equation (15b) depicts the t-th instant of the filtering iteration at the k-th affine scaling iteration. □

The following result is a direct consequence of Theorem 2.

Proposition 3. The affine scaling iteration is a special case of applying the Kalman filter to an autonomous system with

F = I,  H = (A D_k)^T = D_k A^T,  y(t) = D_k c,  Q = R = εI with ε → 0,

so that the state estimate, here the dual estimate w(t), converges to the dual vector w^k corresponding to the current interior point x^k when the filter converges.

Combining Proposition 1 and Proposition 3, one can immediately set up the Kalman filter equations corresponding to the affine scaling algorithm. To render further clarity, a step-by-step procedure for emulating the affine scaling iteration via the Kalman filter is given below.

1. Initialization of the affine scaling iterations: Find x^0 such that A x^0 = b, x^0 > 0, and select 0 < α < 1. (One may use a Phase-1 LP here, which itself could employ the Kalman filter approach.)

2. Kalman filter iterations to obtain the dual vector:

(a) Set t = 0, w(0) = 0 and P(0) = λI with λ >> 0. Also set H = D_k A^T, y(t) = D_k c for all t, and Q = R = εI with ε → 0, where D_k is the diagonal matrix whose diagonal entries are the components of x^k.

(b) Perform the updates

    w(t+1) = w(t) + K(t)[y(t) - H w(t)]
    P(t+1) = P(t) + Q - K(t) H P(t)

where K(t) = P(t) H^T [H P(t) H^T + R]^{-1}, and set t <- t+1, until ||P(t)|| < ξ for some t = T, where ξ is arbitrarily small.

(c) Set w^k = w(T).

3. Affine scaling update to obtain the primal vector: Set

    r^k = c - A^T w^k;   d^k = -D_k r^k;
    θ = min { α / |d_i^k| : d_i^k < 0 };
    x^{k+1} = x^k + θ D_k d^k

4. Test x^{k+1} for optimality (KKT conditions). If the optimality conditions are not satisfied, set k <- k+1 and go back to Step 2.
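The sketch below puts Steps 2 and 3 together for one affine scaling iteration; the helper names, tolerance values and the small LP data are all hypothetical, and the stopping rule on ||P(t)|| is implemented with the Frobenius norm for simplicity.

```python
import numpy as np

def dual_via_kalman(A, c, xk, eps=1e-6, lam=1e4, tol=1e-4, max_iter=100):
    """Step 2: estimate the dual vector w^k for the interior point x^k by running the
    Kalman filter of Proposition 3 (F = I, H = D_k A^T, y(t) = D_k c, Q = R = eps*I)."""
    m, n = A.shape
    Dk = np.diag(xk)
    H = Dk @ A.T                                  # n x m "measurement" matrix
    y = Dk @ c                                    # constant observation vector
    Q, R = eps * np.eye(m), eps * np.eye(n)
    w, P = np.zeros(m), lam * np.eye(m)           # w(0) = 0, P(0) = lambda*I
    for _ in range(max_iter):
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        w = w + K @ (y - H @ w)
        P = P + Q - K @ H @ P
        if np.linalg.norm(P) < tol:               # stop once the covariance has collapsed
            break
    return w

def afs_iteration(A, c, xk, alpha=0.9):
    """Steps 2 and 3: filter out the dual vector, then update the primal via the ratio test."""
    w = dual_via_kalman(A, c, xk)
    r = c - A.T @ w                               # reduced costs
    Dk = np.diag(xk)
    d = -Dk @ r                                   # descent direction in the scaled space
    theta = min(alpha / abs(di) for di in d if di < 0)
    return xk + theta * (Dk @ d), w, r

# Hypothetical LP: minimize 2x1 + x2 + 3x3  subject to  x1 + x2 + x3 = 1, x >= 0
A = np.array([[1.0, 1.0, 1.0]])
c = np.array([2.0, 1.0, 3.0])
x = np.array([1/3, 1/3, 1/3])
for k in range(25):
    x, w, r = afs_iteration(A, c, x)
print(np.round(x, 6), w)                          # x approaches (0, 1, 0); w approaches 1
```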

Note that the main computational burden at each affine scaling iteration is the evaluation of the matrix inverse (A D_k^2 A^T)^{-1}, which is an O(m^3) operation when A is an m×n matrix. If instead we invoke the KFS to compute the dual vector (see Step 2 of the above step-by-step procedure), then the bulk of the computation is the evaluation of the Kalman gain matrix K(t), specifically the computation of [H P(t) H^T + R]^{-1}, which is an O(n^3) operation. Note that this inversion has to be carried out several times within an AFS iteration. Moreover, the matrix P(t) is dense, so sparse matrix techniques cannot be employed to reduce the computational burden. Therefore, in the deterministic case, invoking the KFS is a rather expensive way to obtain the dual vector. However, as will be seen later, the real usefulness of this approach is in the stochastic case, where the dual vector is filtered out by the KFS from a noisy (stochastic) environment. Before that, let us look at the correspondence between the Kalman filter and the affine scaling algorithm in more detail, which will be useful in the subsequent discussion.

6. CORRESPONDENCE BETWEEN KALMAN FILTER AND AFFINE SCALING ALGORITHM

The conceptual and computational parallels between the Kalman filter and affine scaling are summarized in this section. For brevity we denote the Kalman filter by KFS and the affine scaling algorithm by AFS.

6.1 ENVIRONMENT OF OPERATION

KFS operates in a stochastic environment, while AFS works essentially in a deterministic framework.

6.2 MEASUREMENTS (OBSERVATIONS)

KFS estimates the states from measurements of the noisy output of the system, i.e., y(t). By virtue of Proposition 3, AFS derives its information from the gradient of the cost in the transformed space, i.e., D_k c. This is reasonable: since c is a constant, the only variable is D_k (equivalently x itself), which essentially contains the entire information.

6.3 OPTIMAL ESTIMATES

KFS estimates the states z of a system, in the minimum variance sense, from the observations. AFS computes the dual vector w, in the least squares sense, from the observation of the current interior feasible point.

6.4 SOURCES OF INFORMATION

Both KFS and AFS can be viewed as predictor-corrector mechanisms: start at some point and move toward optimality in steps, predicting the next point from the information currently available. For KFS, the next estimate of the states is based on the "residuals (innovations)," defined as

η(t) = y(t) - H ẑ(t)   (17a)

For AFS, the next direction of movement is based on the "reduced cost vector" r_k(t), defined in the transformed space by

r_k(t) = c_k(t) - A_k(t)^T w(t)   (17b)

where c_k(t) and A_k(t) are defined in equation (15b).

6.5 OPTIMALITY CRITERIA

For KFS, the optimality conditions are given by the zero mean of the residuals (equation (14a)) and the statistical orthogonality (statistical independence) between the residual vector and the observation vector (equation (14b)). Likewise, for AFS, optimality occurs when the reduced costs are non-negative (dual feasibility) and the reduced cost vector and the current interior point vector become geometrically orthogonal (complementary slackness).

By now we see that the affine scaling algorithm is equivalent to a Kalman filter working in a deterministic environment. In other words, one can emulate the affine scaling algorithm with a Kalman filter by shutting off the stochastic elements. The natural question is then: why not make use of the inherent power of the Kalman filter to deal with stochastics, and thereby solve stochastic programming problems? After all, the Kalman filter has been designed to deal with stochastic systems. This idea leads to our development of a Karmarkar-based algorithm for stochastic linear programs.

7. KARMARKAR-BASED ALGORITHM FOR STOCHASTIC LINEAR PROGRAMS

In this section we introduce stochastic parameters into the linear program and focus on two classes of problems. The first class has a stochastic cost vector and deterministic constraints; the second class has both the cost vector and the constraint matrix stochastic.

7.1 STOCHASTIC PROGRAMMING WITH DETERMINISTIC CONSTRAINTS AND STOCHASTIC COST VECTOR

The problem can be stated mathematically as follows:

minimize [c_0 + δc(t)]^T x   (18a)

such that Ax = b and x ≥ 0,   (18b)

where δc(t) is assumed to be zero mean Gaussian noise with covariance matrix Σ_c and c_0 is the unknown "true" value of the cost vector. Note that in this case the projection matrix which projects onto the null space of the constraint set is deterministic, since A and b are deterministic. However, the projected gradient of the cost, and hence the dual estimates at each step of the iteration, are stochastic.

Proposition 4. The state space equations at the k-th affine scaling iteration, corresponding to the case where only the cost vector is stochastic, are

w(t+1) = w(t)   (19a)

and

c_k(t) = A_k(t)^T w(t) + e(t)   (19b)

where c_k(t) = D_k c(t), A_k(t) = A D_k, and D_k is the diagonal matrix whose diagonal entries are the components of x^k.

Proof: When A and b are deterministic, the projection operator onto the hyperplane spanned by the rows of A D_k is unique. This implies (19a), provided A is of full rank. Also note that c_k(t) is a stochastic measurement (as c is stochastic), which implies that e(t) is the noise to be filtered out using the output equation (19b). □

Now assume that we are at the k-th affine scaling iteration. We set

(a) P(0) = λI, where λ >> 0,

(b) w(0) arbitrary (a good choice is the dual vector of the previous iteration),

(c) Q = εI, where ε → 0, and

(d) R = Σ_c

in the Kalman filtering framework described in Proposition 1. We then run the filter until the optimality criteria of Theorem 1 are satisfied. Assume that w(T) and P(T) are the estimate of the dual vector and its covariance matrix, respectively, when the filter converges. We take w(T) as the value of the dual vector for the k-th affine scaling iteration and translate x^k to x^{k+1} via equation (11). While doing so, one can use an estimate of the expected value of c in equation (9b). A good choice is the mean value of the known samples, i.e.,

c̄(M) = c̄(M-1) - (1/M) [c̄(M-1) - c(M)]   (20)

where c̄(M) is the mean up to the M-th sample and M is the number of samples obtained up to time t.

7.2 STOCHASTIC PROGRAMMING WITH STOCHASTIC CONSTRAINT MATRIX AND STOCHASTIC COST VECTOR

This problem can be stated mathematically as follows:

minimize [c_0 + δc(t)]^T x   (21a)

such that [A_0 + δA(t)] x = b and x ≥ 0,   (21b)

where δc(t) is Gaussian noise with zero mean and covariance matrix Σ_c, and δA(t) is Gaussian with zero mean and covariance matrix Σ_A. We also assume that b is deterministic and that the system (21b) is feasible for all outcomes δA(t). We show later how the infeasibility of (21b) can be handled.
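Before turning to the case with a stochastic constraint matrix, here is a minimal sketch of the Section 7.1 procedure, combining the dual filter of Proposition 4 (Q = εI, R = Σ_c) with the recursive sample mean (20); the problem data, noise level and helper names are hypothetical.

```python
import numpy as np

def dual_filter_stochastic_c(A, xk, c_samples, sigma_c, w0, lam=1e4, eps=1e-10):
    """Proposition 4: state w(t+1) = w(t), measurement D_k c(t) = (A D_k)^T w(t) + e(t),
    filtered with Q = eps*I and R = Sigma_c; one measurement update per noisy cost sample."""
    m, n = A.shape
    Dk = np.diag(xk)
    H = Dk @ A.T
    w, P = w0.copy(), lam * np.eye(m)
    Q, R = eps * np.eye(m), sigma_c
    for c_t in c_samples:                         # each c_t is a noisy observation of c_0
        y = Dk @ c_t
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        w = w + K @ (y - H @ w)
        P = P + Q - K @ H @ P
    return w, P

# Hypothetical data: the true cost c_0 is seen only through zero-mean Gaussian noise
rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0, 1.0]])
c0 = np.array([2.0, 1.0, 3.0])
sigma_c = 0.1 * np.eye(3)
xk = np.array([1/3, 1/3, 1/3])
samples = [c0 + rng.multivariate_normal(np.zeros(3), sigma_c) for _ in range(200)]

w, P = dual_filter_stochastic_c(A, xk, samples, sigma_c, w0=np.zeros(1))

# Recursive sample mean (20) of the cost, to be used in the primal update (9b)/(11)
c_bar = np.zeros(3)
for M, c_t in enumerate(samples, start=1):
    c_bar = c_bar - (1.0 / M) * (c_bar - c_t)
r = c_bar - A.T @ w                               # reduced costs with the filtered dual
print(np.round(w, 4), np.round(c_bar, 3), np.round(r, 3))
```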

Proposition 5. The state space equations at the k-th affine scaling iteration, in the case where both the cost vector and the constraint matrix are stochastic, are

w(t+1) = w(t) + v(t)   (22a)

and

c_k(t) = A_k(t)^T w(t) + e(t)   (22b)

where c_k(t) = D_k c(t) and A_k(t) = A(t) D_k.

Proof: When A is stochastic, the projection onto the space spanned by the rows of A(t) D_k is not unique. Hence the dual vector w has some inherent uncertainty, or system noise, and it is impossible to completely filter this noise out of the dual estimates. As a matter of fact, the covariance matrix of the dual estimate does not vanish, but asymptotically reaches the Cramer-Rao lower bound [12]. Correspondingly, we can model the dynamics of the system corresponding to the dual vector as in (22a). Equation (22b) is obtained in the same manner as before. □

In this case we can set up the Kalman filter as follows. At the k-th affine scaling iteration, we set

(a) P(0) = λI, where λ >> 0,

(b) w(0) arbitrary (a good choice is the dual vector at the previous affine scaling iteration, k-1),

(c) Q = Σ_A (since δA(t) contributes to v(t)), and

(d) R = Σ_A + Σ_c (due to the fact that both δA(t) and δc(t) contribute to e(t))

in the Kalman filtering framework described in Proposition 1. We then run the filter and stop when the optimality criteria of Theorem 1 are satisfied. Assume that w(T) and P(T) are the dual vector estimate and its covariance matrix, respectively, when the filter converges. We take w(T) as the value of the dual vector for the k-th affine scaling iteration.

7.3 INFEASIBILITY OF (21b)

In order to avoid the infeasibility of (21b), we consider only those samples of A which make the set {x | [A_0 + δA(t)]x = b, x ≥ 0} feasible. A simple Phase-1 step can quickly detect the case where a sample of A leads to an infeasible problem. By discarding such samples in the Kalman filter, we solve the minimization problem (21a) conditioned on feasible samples of A.
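A minimal sketch of this Phase-1 screening is given below, using scipy's linprog with a zero objective purely as a feasibility oracle; the nominal data and the sampling model for δA(t) are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def is_feasible(A_sample, b):
    """Phase-1 check: is {x | A_sample x = b, x >= 0} nonempty?"""
    n = A_sample.shape[1]
    res = linprog(c=np.zeros(n), A_eq=A_sample, b_eq=b,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

# Hypothetical nominal data and noise model for A(t) = A_0 + delta_A(t)
rng = np.random.default_rng(2)
A0 = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
sigma_A = 0.05

accepted = []
for _ in range(100):
    A_t = A0 + rng.normal(0.0, sigma_A, size=A0.shape)   # one sample of the constraint matrix
    if is_feasible(A_t, b):
        accepted.append(A_t)          # only feasible samples are fed to the Kalman filter
print(len(accepted), "of 100 samples kept")
```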

8. ERROR ANALYSIS

In this section we analyze the error propagation properties of the proposed algorithm, assuming that (21b) is feasible for all values of A in the sample space.

Theorem 3. Assume that we are at an interior feasible point x^k and that the algorithm generates a new interior feasible point x^{k+1}. If A is contaminated by Gaussian noise with zero mean and covariance matrix Σ_A, and c is contaminated by Gaussian noise with zero mean and covariance matrix Σ_c, then we have

E[x^{k+1} | Y_t] = x_0^{k+1}   (23)

and

E[x̃^{k+1} (x̃^{k+1})^T | Y_t] ≤ β^2 D_k^2 [Σ_c + A_0^T P_∞ A_0 + σ Σ_A] D_k^2   (24)

where x_0^{k+1} is the true solution vector, Y_t is the sigma algebra of observations, x̃^{k+1} = x^{k+1} - x_0^{k+1}, A_0 is the true value of A (which in practice can be replaced by E[A(t)]), P_∞ is the steady state covariance matrix of the dual estimate, and 0 < σ < ∞. The scalar β is a positive constant which ensures that x^{k+1} lies strictly within the polyhedral set defined by the constraints.

Proof: Let w(t) be the estimate of the dual obtained by the filtering process, and let A_0 and c_0 be the "true" values of A and c. Given the history Y_t of observations, we define the "true" interior (primal) solution as

x_0^{k+1} = x^k - β D_k^2 [c_0 - A_0^T w_0]   (25)

where w_0 is the "true" dual solution at the interior point x^k. Also, the "stochastic" interior point is given by

x^{k+1} = x^k - β D_k^2 [c(t) - A(t)^T w(t)]   (26)

Subtracting (25) from (26) and writing x̃^{k+1} = x^{k+1} - x_0^{k+1}, δc(t) = c(t) - c_0, δA(t) = A(t) - A_0 and w̃(t) = w(t) - w_0, we obtain

x̃^{k+1} = -β D_k^2 [δc(t) - A_0^T w̃(t) - δA(t)^T w_0 - δA(t)^T w̃(t)]   (27)

It can readily be confirmed that E[x̃^{k+1} | Y_t] = 0. (Note that conditioning on Y_t renders x^k a constant.) Hence the first part of the theorem follows. Also,

E[x̃^{k+1} (x̃^{k+1})^T | Y_t] = β^2 D_k^2 [ E(δc(t) δc(t)^T | Y_t) + E(A_0^T w̃(t) w̃(t)^T A_0 | Y_t) ] D_k^2 + β^2 D_k^2 [ E(δA(t)^T w_0 w_0^T δA(t) | Y_t) ] D_k^2   (28)

Since w(t) is the dual estimate obtained by the Kalman filter, and the filtering process has stabilized, all the stochastics of the system become "statistically orthogonal" to the error in the estimate. Hence the cross terms, such as δc(t) w̃(t)^T A_0, δc(t) w_0^T δA(t), etc., arising in the calculation of E[x̃^{k+1} (x̃^{k+1})^T | Y_t] from (27), have zero expectation. Also, the Kalman filter gives E[w̃(t) w̃(t)^T] as the matrix P_∞. Now,

E[δA(t)^T w_0 w_0^T δA(t) | Y_t] ≤ σ E[δA(t)^T δA(t) | Y_t] = σ Σ_A   (28a)

where σ = sup{ρ[w_0 w_0^T]} and ρ(·) denotes the matrix spectral radius. Clearly σ ≥ 0. Moreover, given that the primal LP is feasible, the duals are bounded, hence σ < ∞. Thus we have (24). □

Corollary 1. For the case where only c is stochastic, we have E[x^{k+1} | Y_t] = x_0^{k+1} and

E[x̃^{k+1} (x̃^{k+1})^T | Y_t] = β^2 D_k^2 Σ_c D_k^2   (29)

Proof: Since only c is stochastic, clearly Σ_A = 0, the noise in the state equation is v(t) = 0 and, by a standard result, P_∞ = 0. Hence (29) follows from (24). □

Notice that Theorem 3 and Corollary 1 indicate that the proposed algorithm generates an

unbiased estimator with controlled variance. The key point is that the step length β can be adjusted to control the variance. This is intuitively reasonable: in a stochastic environment, taking small step lengths (making "cautious translations") is expected to reduce the uncertainty in the estimates. The price we pay, however, is slow convergence (that is, sacrificed efficiency). More study is needed on striking a trade-off between reducing the variance and improving the efficiency of the algorithm.

When the algorithm is converging to the solution (i.e., x^k is converging to the solution), by considering the limiting behavior, the trajectory starting at x^k under the stochastic case and the trajectory starting at x^k but determined by the true values of A and/or c are the closest in the minimum variance sense, as dictated by the Cramer-Rao lower bound. Thus the estimate of the conditional variance given by (24) is a reasonably good estimate of the variance of the solution generated, at least for the case where only the cost vector is stochastic. In the other case, only a bound on the variance can be established. Our preliminary numerical experiments also support this claim. However, we caution that the conditional variance given by (24) must not be used to predict the variance of the solution in the earlier iterations. Further research is needed on estimating the variance of x^k recursively, in a computationally efficient manner.

9. CONCLUSIONS

We have shown how the Karmarkar algorithm, in conjunction with the Kalman filter, can be used to solve stochastic programming problems. As it stands, the proposed approach can solve stochastic linear programming problems with a stochastic cost vector and/or a stochastic constraint matrix. To our knowledge, this is the first interior-point method that solves linear

programs in a stochastic environment. The proposed approach differs from the conventional approaches studied in [10] to [13], since those approaches rely on knowledge of the distribution functions of the stochastic elements while our approach depends only on a finite number of available samples. This distinction gives us the capability to perform "on-line" analysis or "on-line" monitoring of stochastic linear programs. For problems with non-Gaussian noise, the ideas of "robust Kalman filtering" [14] and the "bootstrap algorithm" [15] can be incorporated into our approach. More work needs to be done to extend these results to the cases involving a stochastic right-hand-side vector along with a stochastic constraint matrix and a stochastic cost vector. In addition, means have to be explored to enhance the computational efficiency of the proposed method. These aspects are under investigation by the authors.

Acknowledgement

We thank the anonymous reviewer, whose comments were very useful in improving the original version of this paper.

REFERENCES

[1] Kalman, R. E., "A New Approach to Linear Filtering and Prediction Problems", Trans. ASME J. Basic Engineering, Vol. 82, pp. 35-45, 1960.

[2] Karmarkar, N. K., "A New Polynomial Time Algorithm for Linear Programming", Combinatorica, Vol. 4, No. 4, pp. 373-391, 1984.

[3] Vanderbei, R., Meketon, M. S., and Freedman, B. A., "A Modification of the Karmarkar Linear Programming Algorithm", Algorithmica, pp. 395-407, 1986.

[4] Freedman, B. A., Puthenpura, S. C., and Sinha, L. P., "A New Karmarkar-Based Algorithm for Optimizing Convex, Non-Linear Cost Functions with Linear Constraints", 13th International Mathematical Programming Symposium, Tokyo, Japan, Aug. 29-Sept. 2, 1988.

[5] Fang, S. C., and Puthenpura, S., "Linear Optimization and Extensions: Theory and Algorithms", Prentice Hall, 1993.

[6] Kailath, T., "Linear Systems", Prentice Hall, 1980.

[7] Haykin, S., "Adaptive Filtering Theory", Prentice Hall, 1986.

[8] Anderson, B. D. O., and Moore, J. B., "Linear Optimal Control", Prentice Hall, 1979.

[9] Puthenpura, S., and Sinha, L. P., "A Joint Observer-Controller Design Based on the Variants of the Karmarkar Algorithm and the Kalman Filter", 14th Mathematical Programming Symposium, Amsterdam, Aug. 1991.

[10] Tintner, G., "A Note on Stochastic Programming", Econometrica, Vol. 28, pp. 490-495.

[11] Tintner, G., and Sengupta, J. K., "Stochastic Economics: Stochastic Processes, Control, and Programming", Academic Press, 1972.

[12] Van Trees, H. L., "Detection, Estimation, and Modulation Theory", Part I, Wiley, 1968.

[13] Stancu-Minasian, I. M., "Stochastic Programming with Multiple Objective Functions", D. Reidel Publishing Company, 1984.

[14] Masreliez, C. J., and Martin, R. D., "Robust-resistant Spectrum Estimation", IEEE Transactions on Automatic Control, Vol. 22, pp. 361-371, 1978.

[15] Puthenpura, S., and Sinha, N. K., "Robust Bootstrap Method for Joint Estimation of States and Parameters of a Linear System", Journal of Dynamic Systems, Measurement, and Control, Vol. 108, pp. 255-263, 1986.