Rolling Horizon Procedures in Nonhomogeneous Markov Decision Processes Jeffrey M. Alden General Motors Research Laboratories Warren, MI 48090 Robert L. Smith Department of Industrial and Operations Engineering The University of Michigan Ann Arbor, MI 48109 Technical Report 87-25 October 1987

Abstract

By far the most common planning procedure found in practice is to approximate the solution to an infinite horizon problem by a series of rolling finite horizon solutions. Although a great number of empirical studies have been done, the so-called rolling horizon procedure has heretofore been the subject of few analytical studies. We provide a cost error bound for a general rolling horizon algorithm when applied to infinite horizon nonhomogeneous Markov Decision Processes, in both the discounted and average cost cases. We show that a Doeblin coefficient of ergodicity acts much like a discount factor to reduce this error. In particular, we show that the error goes to zero for any fixed rolling horizon as this Doeblin measure of uncertainty about the future increases. The theory is illustrated through an application to vehicle deployment.

The classical approach for solving homogeneous Markov Decision Processes over an infinite horizon is to employ either policy or value iteration (Denardo [1982]) to solve the corresponding dynamic programming functional equation. Policy iteration (a variant of the simplex method) solves the functional equation in a finite number of steps, while value iteration involves a series of finite horizon approximations. The nonhomogeneous case, while not even finitely expressible, can also be solved by a form of value iteration that recursively recovers a sequence of optimal policies by solving a sequence of successively longer horizon problems (Hopp, Bean, and Smith [1984], Bhaskaran and Sethi [1985]). The finite horizons employed to recover a sequence of optimal policies are, however, variable and difficult to discover (Bean and Smith [1986]).
In practice, planners employ a fixed horizon procedure to iteratively recover a sequence of policies. The procedure is to fix a horizon T, solve the corresponding T period problem, implement the initial policy found, roll forward one period, and repeat from the new current state. This rolling horizon procedure is by far the most common method employed in practice for generating and implementing solutions to nonhomogeneous problems (see, for example, Baker [1977]).

Because the horizon employed by a rolling horizon procedure is fixed, the sequence of policies (or strategy) generated may fail to be optimal. Bean, Birge, and Smith [1987] have determined the error in cost obtained as a function of the horizon employed for a variant of the standard rolling horizon procedure. They require a full roll of T periods, so that the entire subsequence of policies identified in solving the first T period problem must be committed to. In related work, Lee and Denardo [1983] provide, for the dynamic lot size problem, a bound on the error from infinite horizon optimal under the assumption that one continues optimally from T onwards. In this paper, we bound the error from infinite horizon optimal as a function of the horizon employed for the standard rolling horizon procedure. The analysis, provided in Section 1, is complicated in this standard case by the policy interactions induced by overlapping subproblems. In Section 2, we reduce the bound on error found in Section 1 by exploiting the mitigating effects of uncertainty as measured by a coefficient of ergodicity we call the coupling coefficient. We show that, as this measure of uncertainty about the future grows, the error between the infinite horizon optimal value and that attained by a rolling horizon strategy goes to zero regardless of the horizon used. This result provides rigorous support for the commonly held view that uncertainty about the future justifies short term planning. In Section 3, we consider an application to vehicle deployment.

1 The Rolling Horizon Procedure and Its Associated Error

We begin with a formulation of the problem to be solved.

1.1 The Markov Decision Process

Consider a system which, during all time intervals or periods, can be described as being in exactly one of a finite set of states $S = \{1, 2, \ldots, n\}$. Corresponding to each state $i \in S$ and period $k = 1, 2, 3, \ldots$ is a finite set of feasible decisions $D_i^k$.
If the system is currently in state $i$ during period $k$ and decision $x_i^k \in D_i^k$ is chosen, then the system makes a transition into state $j$ during period $k+1$ with probability $P_{ij}^k(x_i^k)$. In general, the cost of selecting the decision and making the transition may depend on the state we transition into as well as the decision. However, by calculating the expected cost of making a transition over all possible such states and using the additivity of expectation, we may, without loss of generality, assume that the expected cost $C_i^k(x_i^k)$ depends only on the current state $i$, period $k$, and decision taken $x_i^k$. We assume that all costs are non-negative and uniformly bounded from above, i.e., for all $i$ and $k$,

$0 \le C_i^k(x_i^k) \le \bar{C} < \infty$ for all $x_i^k \in D_i^k$.

Costs incurred during period $k$ are discounted by a factor $\alpha^{k-1}$ where $0 < \alpha < 1$. The column vector $x^k = (x_1^k, x_2^k, \ldots, x_n^k)'$ that associates feasible decisions to all possible states in period $k$ is called a feasible policy for period $k$. A sequence of feasible policies, $x = (x^1, x^2, x^3, \ldots)$, is called a feasible strategy. A Markov Decision Process (MDP) is the problem of determining a feasible strategy that minimizes discounted expected costs over the infinite horizon. To express this problem mathematically, we adopt the following additional notation, where $1 \le l \le k$ throughout.

$x^{l,k} = (x^l, x^{l+1}, \ldots, x^k) \in D^{l,k} = \prod_{i=l}^{k} D^i$ is a feasible substrategy for periods $l$ through $k$, where $D^i = D_1^i \times D_2^i \times \cdots \times D_n^i$ is the set of feasible policies for period $i$. $D = D^{1,\infty}$ is the space of feasible strategies, which we assume to be closed (Bean and Smith [1984]).

$P^k(x^k) = (P_{ij}^k(x_i^k))$ is the $k$th period one-step transition probability matrix under policy $x^k$.

$P^{l,k}(x^{l,k}) = \prod_{j=l}^{k} P^j(x^j)$ is the $(k-l+1)$-step transition probability matrix from periods $l$ through $k$ under policies $x^l$ through $x^k$, where $P^{l,l-1} = I$.

$C^k(x^k) = (C_1^k(x_1^k), C_2^k(x_2^k), \ldots, C_n^k(x_n^k))'$ is the column vector of expected costs under policy $x^k$ during period $k$. $C^k(x^k)$, as with other expected costs considered

later, is a column vector whose $i$th component represents a conditional expected cost given we enter state $i$ at the beginning of period $k$.

The MDP is then

(P)  $\min_{x \in D} C(x)$

where $C(x) = \sum_{k=1}^{\infty} \alpha^{k-1} P^{1,k-1}(x^{1,k-1}) C^k(x^k)$. Because all costs are uniformly bounded, the discount factor is less than one, and the set of feasible strategies is compact, the minimum above is attained (Bean and Smith [1984]). Let $x^*$ be an optimal strategy and $C^*$ its corresponding optimal cost. Note that $x^*$ incorporates an optimal policy $(x^*)^1$ for the first period that provides an optimal decision for all possible initial states. Further, the fact that subsequent policies can only depend on the state entered during that period and not on previously visited states involves no loss of optimality by the principle of optimality.

1.2 The Rolling Horizon Algorithm

A rolling horizon procedure generates a (suboptimal) strategy for (P) by solving a sequence of subproblems, each with a fixed finite horizon. The initial periods of these subproblems start at one and increment by one for each subproblem in the sequence. The optimal initial policies of these subproblems form a suboptimal strategy for the original infinite horizon problem (P). Let

$x^*(l,k) = \arg\min_{x^{l,k} \in D^{l,k}} C^{l,k}(x^{l,k})$

where $C^{l,k}(x^{l,k}) = \sum_{j=l}^{k} \alpha^{j-1} P^{l,j-1}(x^{l,j-1}) C^j(x^j)$, so that $x^*(l,k)$ is an optimal substrategy for periods $l$ through $k$. Let $C^*(l,k) = C^{l,k}(x^*(l,k))$ be the optimal expected cost for the subproblem spanning periods $l$ through $k$.

Rolling Horizon Algorithm

1. Set $l = 1$ and $k = l + T - 1$.
2. Solve for $x^*(l,k)$ and set $\hat{x}^l = (x^*(l,k))^l$.
3. Let $l = l + 1$, $k = l + T - 1$, and go to 2.

The algorithm above recursively generates the suboptimal infinite horizon strategy $\hat{x} = (\hat{x}^1, \hat{x}^2, \hat{x}^3, \ldots)$. The strategy $\hat{x}$ is called a rolling horizon strategy. Let $\hat{C} = C(\hat{x})$ be the expected cost vector of $\hat{x}$.
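The three steps above can be sketched in code. This is our own illustration, not the paper's: `solve_subproblem` is an ordinary finite-horizon backward induction solver, and the small stationary two-state instance at the bottom exists only to exercise the sketch.

```python
import numpy as np

def solve_subproblem(P, C, l, T, alpha):
    """Backward induction over periods l .. l+T-1.
    P[k][a]: n x n transition matrix for decision a in period k.
    C[k][a]: length-n vector of expected costs of decision a in each state.
    Returns the optimal initial policy (one decision per state) for period l."""
    n = C[l][0].shape[0]
    V = np.zeros(n)                      # zero terminal value beyond the horizon
    for k in range(l + T - 1, l - 1, -1):
        Q = np.stack([C[k][a] + alpha * P[k][a] @ V for a in range(len(C[k]))])
        V = Q.min(axis=0)
        policy = Q.argmin(axis=0)        # only the last (k = l) policy is kept
    return policy, V

def rolling_horizon(P, C, T, alpha, periods):
    """Generate the rolling horizon strategy x-hat for the first `periods` periods."""
    return [solve_subproblem(P, C, l, T, alpha)[0] for l in range(periods)]

# A small stationary instance used purely to exercise the sketch:
# two states, two decisions ("stay" = 0, "move" = 1).
n_periods, T, alpha = 3, 4, 0.9
P_stay = np.array([[0.9, 0.1], [0.2, 0.8]])
P_move = np.array([[0.1, 0.9], [0.7, 0.3]])
C_stay = np.array([1.0, 3.0])
C_move = np.array([2.0, 0.5])
P = [[P_stay, P_move] for _ in range(n_periods + T)]
C = [[C_stay, C_move] for _ in range(n_periods + T)]
strategy = rolling_horizon(P, C, T, alpha, n_periods)
```

Each entry of `strategy` is a policy vector assigning one decision to each state, exactly as in step 2 of the algorithm.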
In an implementation of the rolling horizon algorithm, the random state $I_l$ occurring in period $l$ from implementation of the previous $l-1$ policies would lead to implementation of decision $\hat{x}_{I_l}^l$ during period $l$ after iteration $l$. Because the determination of $\hat{x}^l$ is, for all $l$, based upon a truncated version of the original infinite horizon problem, the resulting strategy $\hat{x}$ is myopic and may not be optimal. Consider, for example, the following deterministic special case of (P), represented in Figure 1 as a decision network. The numbers above the decision arcs correspond to the discounted cost of making that decision, all of which have a duration of one period. The problem is to make an infinite sequence of decisions at minimum total cost. It is clear that the minimum cost strategy $x^*$ is to move downward and then across, resulting in an optimal cost of $1 + \frac{1}{2} + \frac{1}{4} + \cdots = 2$. However, the rolling horizon strategy $\hat{x}$ generated by a horizon T equal to 2 periods

Figure 1: An example of problem (P) where a rolling horizon procedure (using a horizon of 2 periods) generates a poor solution.

when applied to the initial state node 0 is to move up and then across for a total cost of $1 + 1 + 1 + \cdots$, or infinity. The rolling horizon strategy is in fact the only infinite cost strategy! It is interesting that a full roll solution achieves a nearly optimal finite cost even though most decisions are made with less information. As this example illustrates, the departure of a rolling horizon solution from optimal can be dramatic. We turn, in the next section, to an analysis of the magnitude of error involved for less pathological instances of (P).

1.3 An Error Bound for Rolling Horizon Procedures

In this section, we derive a tight upper bound on the difference in expected cost between a rolling horizon strategy $\hat{x}$ and an optimal strategy $x^*$. We begin with some definitions. Let $C_l^* = C^*(l, l+T-1)$ be the optimal expected cost of the $l$th subproblem and $x(l)$ be any strategy such that $x^{l,l+T-1}(l) = x^*(l, l+T-1)$ is its optimal substrategy. Let $\hat{C}(l,k) = C^{l,k}(\hat{x}^{l,k})$ be the expected cost of the rolling horizon strategy spanning periods $l$ through $k$.

We turn now to introducing the important notion of an extension cost. Note that the optimal substrategy in $x(l)$ obtained from solving the $l$th subproblem of a rolling horizon algorithm can always be extended to form a feasible substrategy for the $(l+1)$st subproblem. In particular, any strategy $x(l)$ with a substrategy of the form $x^{l+1,l+T}(l) = (x^{l+1}(l), x^{l+2}(l), \ldots, x^{l+T-1}(l), x^{l+T})$ for $x^{l+T} \in D^{l+T}$ (formed by dropping the initial policy of $x^{l,l+T-1}(l)$ and adding any feasible policy for period $l+T$) is feasible for the $(l+1)$st subproblem. Let

$\epsilon^l = \min_{x^{l+T} \in D^{l+T}} \alpha^{l+T-1} P^{l+1,l+T-1}(x^{l+1,l+T-1}(l)) C^{l+T}(x^{l+T})$

be the expected cost vector whose $i$th component is the minimum conditional expected cost of extending substrategy $x(l)$ given the state entered beginning period $l+1$. This is called the $l$th extension cost.

We now establish in the following lemma an inequality that bounds from above the expected cost of a rolling horizon strategy in terms of its extension costs.

Lemma 1.1 $\hat{C} \le C_1^* + \sum_{l=1}^{\infty} E(\hat{\epsilon}^l)$ where $\hat{\epsilon}^l$ is a random vector under strategy $\hat{x}$ whose $i$th component, $i = 1, 2, \ldots, n$, takes on the value $(\epsilon^l)_{I_{l+1}}$ when the initial state of (P) beginning period 1 is $i$ and $I_{l+1}$ is the state entered beginning period $l+1$.

Proof: See Appendix. ∎

Using Lemma 1.1, we now establish an upper bound on the cost error between $\hat{x}$ and $x^*$.

Theorem 1.2 Let $\epsilon = \sum_{l=1}^{\infty} \alpha^{l-1} \bar{C}^{T+l} e$ where $\bar{C}^l = \max_{i=1,2,\ldots,n} \min_{x_i^l \in D_i^l} C_i^l(x_i^l)$ and $e = (1, 1, \ldots, 1)'$. The error in cost between a rolling horizon strategy $\hat{x}$ and an optimal strategy $x^*$ for problem (P) is uniformly bounded from above by $\alpha^T \epsilon$, i.e.,

$\hat{C} - C^* \le \alpha^T \epsilon.$

In particular,

$\hat{C} - C^* \le \frac{\alpha^T}{1-\alpha} \bar{C} e.$

Proof: Since $y^{l+T}$ is feasible,

$\epsilon^l = \min_{x^{l+T} \in D^{l+T}} \alpha^{l+T-1} P^{l+1,l+T-1}(x^{l+1,l+T-1}(l)) C^{l+T}(x^{l+T}) \le \alpha^{l+T-1} P^{l+1,l+T-1}(x^{l+1,l+T-1}(l)) C^{l+T}(y^{l+T})$

where $y_i^{l+T} = \arg\min_{x_i^{l+T} \in D_i^{l+T}} C_i^{l+T}(x_i^{l+T})$ for $i = 1, 2, \ldots, n$. Since $C^{l+T}(y^{l+T}) \le \bar{C}^{l+T} e$,

$\epsilon^l \le \alpha^{l+T-1} P^{l+1,l+T-1}(x^{l+1,l+T-1}(l)) \bar{C}^{l+T} e = \alpha^{l+T-1} \bar{C}^{l+T} e.$

Hence $E(\hat{\epsilon}^l) \le \alpha^{l+T-1} \bar{C}^{l+T} e$. Also $C_1^* \le C^{1,T}(x^*) \le C^*$ since $C_i^k(x_i^k) \ge 0$ for all $i$ and $k$ by the assumption of Section 1.1. Therefore

$\hat{C} \le C_1^* + \sum_{l=1}^{\infty} \alpha^{l+T-1} \bar{C}^{l+T} e = C_1^* + \alpha^T \epsilon \le C^* + \alpha^T \epsilon.$

Moreover,

$\alpha^T \epsilon \le \alpha^T \sum_{l=1}^{\infty} \alpha^{l-1} \bar{C} e = \frac{\alpha^T}{1-\alpha} \bar{C} e$

since $\bar{C}^l \le \bar{C}$ for all $l$ by the assumption of Section 1.1. ∎

It is interesting to note that the error of the second bound of the theorem is the present worth of an annuity paid every period after period T, while the Bean, Birge, and Smith model incorporates a smaller error corresponding to an annuity paid once every T periods. Although the full roll error bound is smaller than that for decision rolls, the former is seldom used in practice. One can speculate that the reason is lack of accurate forecasting of the parameters of the MDP, encouraging updating every period.
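The second, closed-form bound of Theorem 1.2 is easy to evaluate numerically. A small sketch (the function name is ours) showing how the per-state error guarantee decays geometrically in the horizon T:

```python
def theorem_1_2_bound(alpha, C_bar, T):
    """Uniform per-state bound alpha**T * C_bar / (1 - alpha) on C-hat - C*."""
    return (alpha ** T) * C_bar / (1.0 - alpha)

# With alpha = 0.9 and single-period costs bounded by C_bar = 10,
# lengthening the horizon shrinks the guarantee geometrically.
b5 = theorem_1_2_bound(0.9, 10.0, 5)
b10 = theorem_1_2_bound(0.9, 10.0, 10)
```

Note that as $\alpha \to 1$ this bound blows up for any fixed T, which is precisely the weakness addressed in Section 2.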

2 An Improved Error Bound Exploiting Stochastic Effects

The error bound derived in the last section relies exclusively on discounting to bound the error involved in a rolling horizon procedure. In fact, for fixed horizon T, as the discount factor $\alpha$ goes to 1, the error bound goes to infinity. In this section, we improve this bound by exploiting the effect of uncertainty about the future on the error involved in failing to consider effects beyond the horizon T. The procedure is to formulate an equivalent MDP whose discount factor is scaled by a factor $\beta$ that measures the sensitivity of future state distributions to current decisions. As $\beta$ goes to 0, a myopic strategy becomes closer to optimal. The error involved is quantitatively assessed by applying the error bound of Theorem 1.2 to this equivalent MDP. It is shown that, for fixed horizon T and fixed discount factor $\alpha$, the bound on error goes to zero as $\beta \to 0$.

2.1 An Equivalent Markov Decision Process

We say two Markov Decision Processes (P) and $(\bar{P})$ are equivalent if the sets of optimal substrategies $x^*(l,k)$ and $\bar{x}^*(l,k)$ are equal for all $l, k$. Whenever (P) and $(\bar{P})$ are equivalent, it is easy to see that the rolling horizon strategies $\hat{x}$ and $\hat{\bar{x}}$ will be the same provided that ties for optimal substrategies are broken according to the same rules. Before we define an MDP $(\bar{P})$ equivalent to our problem (P) of Section 1.1, we need to introduce the notion of a coefficient of ergodicity.

Lemma 2.1 Every one-step transition probability matrix in (P) can be expressed as a convex combination of a stochastic matrix and a stable matrix (i.e., a stochastic matrix of equal rows) dependent only on period, where the multiplier of the former, $\beta$, is given by

$\beta = \max_k \left(1 - \min_{x^k \in D^k} \sum_j \min_i P_{ij}^k(x_i^k)\right).$

That is, for all $k$ and all $x^k \in D^k$,

$P^k(x^k) = \beta \tilde{P}^k(x^k) + (1 - \beta) L^k$

where $\tilde{P}^k(x^k)$ is a stochastic matrix, $L^k$ is a stable stochastic matrix independent of $x^k$, and $0 \le \beta \le 1$ is given as above.

Proof: See Appendix.
∎

The multiplier $\beta$ is the largest Doeblin coefficient of all the one-step transition matrices (Seneta [1981]) and is a measure of ergodicity. That is, the closer $\beta$ is to 0, the closer the $P^k(x^k)$ are to a stable matrix. In the proof of the lemma above, $\beta_k$ is a coefficient of ergodicity for the transition matrices of the $k$th period alone, where $\beta = \max_k \beta_k$. In the extreme case where $\beta_k = 0$, the matrices $\{P^k(x^k), x^k \in D^k\}$ for that $k$ are all stable. The MDP is then effectively decoupled at the end of period $k$; that is, the optimal substrategy for a horizon equal to $k$ is an optimal substrategy for the infinite horizon problem. On the other hand, if $\beta_k$ is one, then under some policy every state in period $k+1$ is impossible to attain from some state and decision pair in period $k$. For these reasons, we refer to $\beta_k$, or more generally $\beta$, as coupling coefficients.

We are now in a position to define an equivalent MDP $(\bar{P})$. $(\bar{P})$ is formed from (P) by replacing $P^k(x^k)$ by $\tilde{P}^k(x^k)$ for $x^k \in D^k$ and $\alpha$ by $\alpha\beta$, respectively. Specifically, $(\bar{P})$ is the MDP

$(\bar{P})$  $\min_{x \in D} \bar{C}(x)$

where $\bar{C}(x) = \sum_{k=1}^{\infty} (\alpha\beta)^{k-1} \tilde{P}^{1,k-1}(x^{1,k-1}) C^k(x^k)$, with $\tilde{P}^{l,k}(x^{l,k}) = \prod_{j=l}^{k} \tilde{P}^j(x^j)$ for $l \le k$ and $\tilde{P}^{l,l-1}(x^{l,l-1}) = I$ for $x^{l,k} \in D^{l,k}$.
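The coupling coefficient of Lemma 2.1 is directly computable from the one-step matrices: for each period take, over all feasible policies, the quantity $1 - \sum_j \min_i P_{ij}$, then maximize over periods. A sketch (function names are ours):

```python
import numpy as np

def doeblin(P):
    """Doeblin coefficient 1 - sum_j min_i P_ij of one stochastic matrix."""
    return 1.0 - P.min(axis=0).sum()

def coupling_coefficient(matrices_by_period):
    """beta = max over periods k of the worst Doeblin coefficient over that
    period's set of feasible transition matrices {P^k(x^k)}."""
    return max(max(doeblin(P) for P in mats) for mats in matrices_by_period)

# Sanity checks on the two extremes discussed above:
stable = np.array([[0.3, 0.7],
                   [0.3, 0.7]])   # equal rows: fully decoupled, coefficient 0
ident = np.eye(2)                 # identity: fully coupled, coefficient 1
```

A stable matrix (equal rows) yields coefficient 0, the identity yields 1, matching the decoupling discussion above.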

Lemma 2.2 (P) and $(\bar{P})$ are equivalent Markov Decision Processes.

Proof: See Appendix. ∎

We now use the equivalent MDP $(\bar{P})$ to establish a tighter bound on the error associated with the rolling horizon procedure when applied to (P).

2.2 An Improved Error Bound for the Rolling Horizon Procedure

Although the MDPs (P) and $(\bar{P})$ were demonstrated to be equivalent in optimal strategies, their optimal costs are not equal. In order to relate cost error bounds between them, we establish the relationship between the expected costs generated by a rolling horizon procedure when applied to the two formulations. Let

$\hat{\bar{C}}(l) = \sum_{i=l}^{\infty} (\alpha\beta)^{i-l} \tilde{P}^{l,i-1}(\hat{x}^{l,i-1}) C^i(\hat{x}^i)$

be the expected cost for problem $(\bar{P})$ of the rolling horizon strategy $\hat{x}$ from period $l$ onwards, and

$\bar{C}^*(l) = \sum_{i=l}^{\infty} (\alpha\beta)^{i-l} \tilde{P}^{l,i-1}((x^*)^{l,i-1}) C^i((x^*)^i)$

be the expected cost for problem $(\bar{P})$ of the optimal strategy $x^*$ from period $l$ onwards.

Lemma 2.3 The error in cost between a rolling horizon strategy $\hat{x}$ and an optimal strategy $x^*$ for problem (P) is given in terms of problem $(\bar{P})$ by

$\hat{C} - C^* = \hat{\bar{C}} - \bar{C}^* + (1-\beta) \sum_{l=1}^{\infty} \alpha^l L^l \left(\hat{\bar{C}}(l+1) - \bar{C}^*(l+1)\right).$

Proof: See Appendix. ∎

Note that $\bar{C}^*(l+1)$ in the above lemma also represents the optimal cost vector of beginning period $l+1$ in all possible initial states, since $((x^*)^{l+1}, (x^*)^{l+2}, \ldots)$ must, by the principle of optimality, be optimal for that problem. That is,

$((x^*)^{l+1}, (x^*)^{l+2}, \ldots) = \arg\min\left\{\sum_{i=l+1}^{\infty} (\alpha\beta)^{i-l-1} \tilde{P}^{l+1,i-1}(x^{l+1,i-1}) C^i(x^i) : x^i \in D^i \text{ for } i = l+1, l+2, \ldots\right\}.$

Also, the rolling horizon algorithm, if started at the beginning of period $l+1$, would generate the strategy $(\hat{x}^{l+1}, \hat{x}^{l+2}, \ldots)$. Therefore, we may apply Theorem 1.2 to bound the terms $\hat{\bar{C}}(l+1) - \bar{C}^*(l+1)$ in Lemma 2.3 to obtain a bound on the rolling horizon error $\hat{C} - C^*$.

Theorem 2.4 Let

$\epsilon(\beta) = \sum_{l=1}^{\infty} (\alpha\beta)^{l-1} \bar{C}^{T+l} e$

where

$\bar{C}^l = \max_{i=1,\ldots,n} \min_{x_i^l \in D_i^l} C_i^l(x_i^l)$

and $e = (1, 1, \ldots, 1)'$. The error in cost between a rolling horizon strategy $\hat{x}$ and an optimal strategy $x^*$ for problem (P) is bounded from above by $\frac{(\alpha\beta)^T(1-\alpha\beta)}{1-\alpha}\epsilon(\beta)$, i.e.,

$\hat{C} - C^* \le \frac{(\alpha\beta)^T(1-\alpha\beta)}{1-\alpha}\epsilon(\beta).$

In particular,

$\hat{C} - C^* \le \frac{(\alpha\beta)^T}{1-\alpha}\bar{C}e.$

Proof: From Theorem 1.2,

$\hat{\bar{C}}(l+1) - \bar{C}^*(l+1) \le (\alpha\beta)^T \epsilon(\beta).$

Hence from Lemma 2.3,

$\hat{C} - C^* \le (\alpha\beta)^T\epsilon(\beta) + (1-\beta)\sum_{l=1}^{\infty}\alpha^l L^l(\alpha\beta)^T\epsilon(\beta) = (\alpha\beta)^T\epsilon(\beta) + (1-\beta)\sum_{l=1}^{\infty}\alpha^l(\alpha\beta)^T\epsilon(\beta)$

since $L^l e = e$ and $\epsilon(\beta)$ is a scalar multiple of $e$. Hence

$\hat{C} - C^* \le (\alpha\beta)^T\epsilon(\beta) + (1-\beta)\frac{\alpha}{1-\alpha}(\alpha\beta)^T\epsilon(\beta) = \frac{(\alpha\beta)^T(1-\alpha\beta)}{1-\alpha}\epsilon(\beta).$

In particular, since $\epsilon(\beta) \le \frac{\bar{C}e}{1-\alpha\beta}$, we have

$\hat{C} - C^* \le \frac{(\alpha\beta)^T(1-\alpha\beta)}{1-\alpha}\cdot\frac{\bar{C}e}{1-\alpha\beta} = \frac{(\alpha\beta)^T}{1-\alpha}\bar{C}e. \;∎$

Note that the error bound provided by Theorem 2.4 is always at least as good as that of Theorem 1.2, and that the two are equal when $\beta$ equals one. Unlike the bound of Theorem 1.2, the error bound here goes to zero, for the horizon finite and fixed, as $\beta$ goes to zero. Also, since $\hat{C} - C^* \le \frac{(\alpha\beta)^T}{1-\alpha}\bar{C}e$, $\alpha' = \alpha\beta$ may be viewed as an effective discount factor that acts at least as strongly as conventional discounting. For example, an $\alpha$ of 0.87, corresponding to a real interest rate of 15%, together with an equal $\beta$ of 0.87, leads to an effective discount factor of 0.76, or an effective interest rate of 32%. The conclusion is that a real interest rate of 15% together with stochastic effects corresponding to a coupling coefficient of 0.87 reduces the rolling horizon error over the deterministic case by a factor at least as great as doubling the real interest rate produces.

Theorem 2.4 provides a bound on rolling horizon error as a function of the horizon used. By inverting this bound, we can obtain a formula for the horizon $T_\epsilon$ necessary to guarantee a rolling horizon solution within a predetermined error $\epsilon$ of optimal. In particular,

$T_\epsilon = \left\lceil\frac{\ln((1-\alpha)\epsilon/\bar{C})}{\ln(\alpha\beta)}\right\rceil.$
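The horizon formula can be evaluated numerically; a sketch (names are ours), using the "in particular" bound $(\alpha\beta)^T\bar{C}e/(1-\alpha)$ that the formula inverts:

```python
import math

def horizon_for_error(eps, alpha, beta, C_bar):
    """Smallest integer T with (alpha*beta)**T * C_bar / (1 - alpha) <= eps,
    i.e. T_eps = ceil( ln((1-alpha)*eps/C_bar) / ln(alpha*beta) )."""
    return math.ceil(math.log((1.0 - alpha) * eps / C_bar) / math.log(alpha * beta))

# Deterministic-style case (beta = 1) versus a strongly stochastic case:
T_deterministic = horizon_for_error(1.0, 0.9, 1.0, 10.0)
T_stochastic = horizon_for_error(1.0, 0.9, 0.5, 10.0)
```

Lowering the coupling coefficient sharply shortens the horizon needed for the same guarantee, as the discussion following Theorem 2.4 suggests.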

Thus a horizon guaranteed to achieve an error of at most $\epsilon$ depends only on the error $\epsilon$, the discount factor $\alpha$, the coupling coefficient $\beta$, and a uniform bound on single period costs, $\bar{C}$. Note that $T_\epsilon$ goes to zero as uncertainty about the future increases. We turn now to the average cost case where, in the absence of discounting, the criterion is to minimize the average cost per period, $A(x)$, over all feasible strategies $x$.

Corollary 2.5 Suppose

$\beta = \max_k\left(1 - \min_{x^k \in D^k}\sum_j\min_i P_{ij}^k(x_i^k)\right) < 1.$

Then the error in average cost between a rolling horizon strategy $\hat{x}$ and an average optimal strategy $x^*$ is bounded by $\beta^T\bar{C}e$, i.e.,

$\hat{A} - A^* \le \beta^T\bar{C}e$ for all $T$,

where $\hat{A} = A(\hat{x})$, $A^* = A(x^*)$, and

$A(x) = \liminf_{H\to\infty}\frac{1}{H}\sum_{k=1}^{H}P^{1,k-1}(x^{1,k-1})C^k(x^k).$

Proof: Analogous to the proof of Theorem 2.4, it is possible to obtain the following error bound formula for the problem (P) with finite horizon $H < \infty$ (see Alden and Smith [1986]):

$\hat{C}_H - C_H^* \le (\alpha\beta)^T\left[1 + \frac{(1-\beta)(\alpha - \alpha^{H-T})}{1-\alpha}\right]\bar{C}e.$

Setting $\alpha = 1$, we get

$\hat{C}_H - C_H^* \le \beta^T\left[1 + (1-\beta)(H-T-1)\right]\bar{C}e.$

Then

$\liminf_{H\to\infty}\hat{C}_H/H - \liminf_{H\to\infty}C_H^*/H \le \liminf_{H\to\infty}\beta^T\left[1 + (1-\beta)(H-T-1)\right]\bar{C}e/H = \beta^T(1-\beta)\bar{C}e \le \beta^T\bar{C}e.$

The first two liminfs exist (i.e., they are finite) since period costs are uniformly bounded by $\bar{C} < \infty$. Now

$\liminf_{H\to\infty}\frac{\hat{C}_H}{H} = A(\hat{x})$

since $\hat{x}_H^{1,H-T} = \hat{x}^{1,H-T}$, where $\hat{x}_H$ is the rolling horizon strategy generated for the $H$ horizon problem. Also

$\liminf_{H\to\infty}\frac{C_H^*}{H} = A(x^*)$

where $x^*$ is a periodic forecast horizon optimal strategy, since for the case $\beta < 1$, $x^*$ is average optimal (see Hopp, Bean, and Smith [1984]) and, by definition, $x_{H_n}^* \to x^*$ for some $H_n \to \infty$, where $x_H^*$ is an optimal strategy for horizon $H$. Hence $\hat{A} - A^* \le \beta^T\bar{C}e$ for all $T$. ∎

Corollary 2.5 states that, even in the absence of discounting, when $\beta < 1$ and the rolling horizon is fixed, the error between a rolling horizon strategy and an average cost optimal strategy goes to zero as the coupling coefficient goes to zero.
The form of the bound makes a precise analogy to discounting, with $\beta$ adopting the role of an effective discount factor. Corollary 2.5 is particularly important from a practical point of view since average cost criteria are frequently employed in practice. We apply the bound of Corollary 2.5 to a problem of vehicle deployment in the next section.

Figure 2: A three station dynamic vehicle deployment problem.

3 An Application to a Vehicle Deployment Problem

The following model is motivated by a real life truck deployment problem considered by Powell [1986]. Consider the problem of assigning a fixed number of vehicles $M$ to a given set of stations numbered $1, 2, \ldots, K$ in order to move loads between them to maximize revenues minus costs. Loads randomly arrive at each station at the beginning of each day, forming a queue. Vehicles assigned to a station at the end of the previous day serve this queue, with each vehicle carrying exactly one load, until there are either no more loads or no more vehicles at the station. The number of loads to be moved at a station is random, depending on both the station and the day. The destination of a load is also random, depending on the station at which the load arrived. Since a vehicle can carry exactly one load per day, the loads in excess of the number of vehicles at a station will be lost with no costs or revenues incurred. Alternatively, the vehicles in excess of the number of loads at a station will not carry any loads during the day and remain at the station unless reassigned to some other station later in the day. There is a fixed profit, $R$, for moving a load from a station to its destination. After all vehicles are finished moving their loads, any number of vehicles at a station can be assigned and moved to other stations to move loads the next day. There is a fixed cost, $C$, to assign and move an empty vehicle to another station. The period in this problem is a day. The state of a period is the number of vehicles at each station after all vehicles have moved their loads for the day. A decision for a state is an assignment of vehicles to stations in preparation for the next day. A policy for a period is a specification of an assignment for each possible state of the period.
The optimization problem is to select a strategy, which is a vehicle assignment for each state of each period, that minimizes the average expected net cost per day. The rolling horizon procedure is simple: find a least cost strategy for the problem over a fixed horizon, roll forward one day using the first policy of that strategy, and repeat the process indefinitely. We now turn to the question of determining a rolling horizon T sufficiently large to produce an acceptable error from optimum. For our particular example, we will consider the three station problem of Figure 2. We assume that the number of loads that arrive during the morning at a station has a Poisson distribution with parameter $\lambda_{id}$ for station $i = 1, 2, 3$ and day $d = 1, 2, \ldots, 7$ of the week. We assume for purposes of exposition that no seasonal variation is present. Let $N_{id}$ be the random variable for the number of loads that arrive at station $i$ during the $d$th day of a week. The probability distribution of $N_{id}$, for $n = 0, 1, 2, \ldots$, is Poisson:

$P\{N_{id} = n\} = \frac{\lambda_{id}^n e^{-\lambda_{id}}}{n!}.$

Assume that a load at station $i$ will require a move to the adjacent station in a clockwise direction with probability $p_i$ and will require a move in the counterclockwise direction with probability $1 - p_i$. A state of a

period is represented by the vector $(k_1, k_2, k_3)$ where $k_i$ is the number of vehicles at station $i$. An assignment of vehicles is represented in a similar manner as $(a_1, a_2, a_3)$ where $a_i$ is the number of vehicles assigned to station $i$. The transition probabilities between states of two consecutive periods depend on the assignment of vehicles in the former period and are driven by the probabilities $P\{N_{id} = n\}$ and $p_i$. For example, if all $M$ vehicles are assigned to station 1, corresponding to the assignment $(M, 0, 0)$, then the probability that they all go to station 2, or the state $(0, M, 0)$, as a result of moving loads on day $d$ of the week is

$P\{(M,0,0) \to (0,M,0)\} = p_1^M\left(1 - \sum_{n=0}^{M-1} P\{N_{1d} = n\}\right).$

The expected net cost of an assignment is $C$ times the number of empty vehicles moved in the assignment, less $R$ times the expected number of loads moved the next day. In this initial formulation, the expected cost of an assignment can be negative. However, by subtracting a lower bound $\underline{C} > -\infty$ over all expected net costs, we obtain an equivalent problem where all costs are nonnegative, as required. A simple lower bound on the expected cost of an assignment is provided by assuming that no vehicles are moved in an assignment and that all vehicles happen to move a load the following day. This gives a lower bound of $\underline{C} = -MR$.
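For illustration, the transition probability just described can be computed as follows. This is our own sketch (names are ours); it assumes, as in the text, that each served load routes clockwise independently with probability p:

```python
import math

def poisson_pmf(lam, n):
    """P{N = n} for N ~ Poisson(lam)."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def all_vehicles_clockwise(M, lam, p):
    """Probability that all M vehicles at a station move one load to the
    clockwise neighbor: at least M loads arrive (Poisson, mean lam) and
    every one of the M served loads routes clockwise (prob p each)."""
    p_enough_loads = 1.0 - sum(poisson_pmf(lam, n) for n in range(M))
    return (p ** M) * p_enough_loads
```

The probability rises with the arrival rate and falls rapidly with the number of vehicles, which is why the full transition matrices are built numerically rather than in closed form.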
We turn now to computing an upper bound on costs, $\bar{C}$, for use in Corollary 2.5 to bound the error from optimal. For the Poisson distribution, the expected number of loads moved is minimized by the assignment placing all vehicles at the station having the lowest expected number of arrivals. Such an assignment will minimize the expected profit of moving loads. Assuming the worst case situation, where this assignment requires all vehicles to move empty to other stations, we obtain the following bound on the maximum expected cost of an assignment:

$\bar{C} = M(C + R) - R\min_{i,d}\left[\sum_{n=1}^{M-1} nP\{N_{id} = n\} + M\sum_{n=M}^{\infty} P\{N_{id} = n\}\right].$

Typical values for truck deployment (Powell [1986]) are: $C = \$100$ to move an empty vehicle, $R = \$400$ profit per load moved, routing probabilities of $p_1 = p_2 = p_3 = 0.5$, and expected numbers of arrivals per station per day of the week as given in Table 3. For these data, and for the case of five vehicles ($M = 5$), the above bound is $\bar{C} = \$1350$.

We illustrate the calculation of the coupling coefficient $\beta$ for this problem containing only three vehicles. Later, we will consider a larger problem with five vehicles when actually applying Corollary 2.5. The reason we consider the larger problem is to show that sometimes, even if the coupling coefficient is near one for the problem as stated, by eliminating decisions that are clearly non-optimal in any strategy or substrategy we may find a much smaller coupling coefficient. Recall that the coupling coefficient of a period is calculated from the set of transition matrices of the period, and that a transition matrix for a period is defined by specifying the decision for each state of the period. Specifying a decision for one state effectively defines a transition vector from the state to states of the next period, and so defines a row of the transition matrix for the period. In our example, the decision set of a state, that is, the set of possible vehicle assignments, is state and period independent. Moreover, the transition probabilities given a particular vehicle assignment are also state independent (but period dependent). Hence, the set of candidate transition vectors for any row of a period's transition matrix is row independent. This means the coupling coefficient for a period in this problem is equal to the coupling coefficient of the matrix whose rows represent the set of unique candidate transition vectors for a single state. We call this matrix the candidate matrix for a state.
In this example, the candidate matrix for a state is the same for each state of a period; hence the matrix representing the set of unique transition vectors taken from the union of all candidate matrices for each state of a period, or the candidate matrix of a period, is equal to the candidate matrix for any given state of the period.

Table 4 contains the candidate matrix for the first day of a week in our example with three vehicles. The last line of the table contains the column minimums, whose sum is 0.3025. Hence, the coupling coefficient is 0.6975 for the first day of a week.

Now we consider the case of five vehicles. Calculations of the coupling coefficient for each day of the week were made using the approach illustrated above, and the results are given in the column headed by the number 5 in Table 5. Ignore, for the moment, the remaining columns of Table 5. The maximum coupling coefficient over each day of the week, and hence over all days in the problem, is 0.9606. This number is near one, which means that a small error bound from Corollary 2.5 will require a large rolling horizon. However, the coupling coefficient we need is only for the set of transition matrices that could be part of an optimal substrategy in the rolling horizon procedure or part of an optimal strategy, since the formulation of (P) may clearly include constraints that fail to exclude an optimal solution. We argue in Alden and Smith [1986] that it is suboptimal to assign more than 3 vehicles to any station. The column headed 3 in Table 5 contains the results of using this limit. Thus, we can use $\beta = 0.7106$ for this particular problem, which is significantly lower than our initial value of 0.9606. The other columns of Table 5 are included only for completeness' sake, in showing the sensitivity of the coupling coefficient to the assignment limit.

We turn now to Corollary 2.5. Since we are interested in selecting a horizon T to guarantee that the average error of a rolling horizon solution is small, say no more than $\epsilon$, we set the right hand side of the expression in the corollary to less than or equal to $\epsilon$ and solve for a lower bound on the horizon.
Letting $f = \epsilon/\bar{C}$ represent the maximum expected relative error,

$T_\epsilon = \left\lceil\frac{\ln(f)}{\ln(\beta)}\right\rceil.$

For our $\beta$ of 0.7106, we find that a fixed horizon of 14 days or more will guarantee that the average error of a rolling horizon procedure is no more than one percent of the maximum expected cost of a decision (i.e., $f = 0.01$). Note that this error is no more than \$13.50 per day, which is one percent of the bound $\bar{C} = \$1350$ we calculated above. Table 6 contains the horizon for this and the other values of $\beta$ found in Table 5. The bound on the rolling horizon of 14 days is remarkably small and illustrates the power of the coupling coefficient in reducing error. A horizon bound to achieve a comparable error in the presence of discounting alone is on the order of years. We suspect that the 14 day bound grows as the number of vehicles in the problem increases, since the stochastic effects correspondingly diminish.

Acknowledgement: We would like to thank Stephen Pollock for the many stimulating discussions we had with him on this research. This work was partially supported by the National Science Foundation under Grants No. ECS-8409682 and ECS-8700836.
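The 14 day figure can be reproduced directly from the inverted bound (a sketch; the function name is ours):

```python
import math

def average_cost_horizon(f, beta):
    """Smallest integer T with beta**T <= f, i.e. ceil(ln f / ln beta),
    from the Corollary 2.5 bound beta**T * C_bar <= eps with f = eps/C_bar."""
    return math.ceil(math.log(f) / math.log(beta))

# Values from the text: beta = 0.7106, one percent relative error.
T = average_cost_horizon(0.01, 0.7106)
```

The horizon grows only logarithmically in the required accuracy, which is why even a one percent guarantee needs just two weeks here.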

References

1. Alden, J. and R.L. Smith [1986], "Error Bounds for Rolling Horizon Procedures in Markov Decision Processes," Technical Report 86-10, Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, Michigan 48109.
2. Baker, K. [1977], "An Experimental Study of the Effectiveness of Rolling Schedules in Production Planning," Decision Sciences, Vol. 8, pp. 19-27.
3. Bean, J., Birge, J., and R.L. Smith [1987], "Aggregation in Dynamic Programming," Operations Research, Vol. 35, pp. 215-220.
4. Bean, J. and R.L. Smith [1984], "Conditions for the Existence of Planning Horizons," Mathematics of Operations Research, Vol. 9, pp. 391-401.
5. Bean, J. and R.L. Smith [1986], "Conditions for the Discovery of Solution Horizons," Technical Report 86-23, Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, Michigan 48109.
6. Bhaskaran, S. and S. Sethi [1985], "Conditions for the Existence of Decision Horizons for Discounted Problems in a Stochastic Environment: A Note," Operations Research Letters, Vol. 4, pp. 61-65.
7. Denardo, E. [1982], Dynamic Programming: Models and Applications, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
8. Hopp, W., J. Bean, and R.L. Smith [1984], "Non-Homogeneous Markov Decision Processes," Technical Report 84-24, Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, Michigan 48109, October 1984. Forthcoming in Operations Research.
9. Lee, C. and E. Denardo [1983], "Rolling Planning Horizons: Error Bounds for the Dynamic Lot Size Model," Working Paper, Yale University. Forthcoming in Mathematics of Operations Research.
10. Powell, W. [1986], Private Communication, Department of Civil Engineering, Princeton University.
11. Seneta, E. [1981], Non-negative Matrices and Markov Chains, New York: Springer-Verlag.

Appendix

Lemma 1.1 $\tilde{C} \le C^* + \sum_{l=1}^{\infty} E(\epsilon_l)$, where $\epsilon_l$ is a random variable under strategy $\tilde{x}$ that takes on the value $(e^l)_i$, $i = 1, 2, \ldots, n$, when $I_l = i$, $I_l$ being the state of (P) entered beginning period $l$.

Proof: Consider the extended strategy $x^{l+1,l+T}(l) = (x^{l+1}(l), x^{l+2}(l), \ldots, x^{l+T-1}(l), \bar{x}^{l+T}(l))$, where $\bar{x}^{l+T}(l)$ is the argmin yielding $C^{l+T,*}$. This strategy is feasible for the subproblem beginning period $l + 1$, and its cost is therefore greater than or equal to the optimal cost, i.e., $C^{l+1,l+T}(x(l)) \ge C^{l+1,*}$. Premultiplying both sides by $P^l(\tilde{x}^l) = P^l(x^l(l))$ and using the definition of $e^l$ yields, since $P^{l,l-1}(\tilde{x}^{l,l-1}) = I$,

$$C^l(\tilde{x}^l) \le C^{l,*} - \alpha P^l(\tilde{x}^l) C^{l+1,*} + e^l, \qquad l = 1, 2, \ldots.$$

Premultiplying both sides of this inequality by $\alpha^{l-1} P^{1,l-1}(\tilde{x}^{1,l-1})$ and taking expectations, we get

$$E(\tilde{C}_l(I_l)) \le E(C^*_l) - E(C^*_{l+1}) + E(\epsilon_l), \qquad l = 1, 2, \ldots,$$

where $\tilde{C}_l(I_l)$ is the discounted cost incurred in period $l$ under $\tilde{x}$ and $C^*_l$ denotes the expected discounted cost of following $\tilde{x}$ through period $l - 1$ and an optimal strategy thereafter. Summing the above inequality over $l = 1, 2, \ldots$, the right-hand side telescopes to give

$$\sum_{l=1}^{\infty} E(\tilde{C}_l(I_l)) \le E(C^*_1) + \sum_{l=1}^{\infty} E(\epsilon_l).$$

Since $\sum_{l=1}^{\infty} E(\tilde{C}_l(I_l)) = \tilde{C}$ and $E(C^*_1) = C^*$, we have

$$\tilde{C} \le C^* + \sum_{l=1}^{\infty} E(\epsilon_l). \qquad \Box$$

Lemma 2.1 Every one-step transition probability matrix in (P) can be expressed as a convex combination of a stochastic matrix and a stable matrix (i.e., a stochastic matrix of equal rows) dependent only on the period, where the multiplier $\beta$ of the former is given by

$$\beta = \max_k \left( 1 - \sum_j \min_{x^k \in D^k} \min_i P^k_{ij}(x^k) \right).$$

That is, for all $k$ and all $x^k \in D^k$,

$$P^k(x^k) = \beta \hat{P}^k(x^k) + (1 - \beta) L^k,$$

where $\hat{P}^k(x^k)$ is a stochastic matrix, $L^k$ is a stable stochastic matrix independent of $x^k$, and $0 \le \beta \le 1$ is given as above.

Proof: We say that a matrix $X$ is dominated by a comparable matrix $Y$ if $X_{ij} \le Y_{ij}$ for all $i, j$. Let $\bar{L}^k$ be the stable matrix of greatest row sum dominated by every matrix $P^k(x^k)$, $x^k \in D^k$, and let $\beta_k = 1 - \sum_j \bar{L}^k_{ij}$. By construction, $\beta_k = 1 - \sum_j \min_{x^k \in D^k} \min_i P^k_{ij}(x^k)$ and hence $\beta = \max_k \beta_k$. Let $L^k = \bar{L}^k / (1 - \beta_k)$ if $\beta_k < 1$, and otherwise set $L^k$ to an arbitrary stable stochastic matrix. In either case $L^k$ is a stable stochastic matrix. Also let $\hat{P}^k(x^k) = \frac{1}{\beta}\left( P^k(x^k) - (1 - \beta) L^k \right)$, which is a stochastic matrix since

$$\sum_j \hat{P}^k_{ij}(x^k) = \frac{1}{\beta} \left( \sum_j P^k_{ij}(x^k) - (1 - \beta) \sum_j L^k_{ij} \right) = \frac{1}{\beta}\left( 1 - (1 - \beta) \right) = 1$$

and

$$\hat{P}^k_{ij}(x^k) = \frac{1}{\beta}\left( P^k_{ij}(x^k) - (1 - \beta) L^k_{ij} \right) \ge \frac{1}{\beta}\left( P^k_{ij}(x^k) - (1 - \beta_k) L^k_{ij} \right) = \frac{1}{\beta}\left( P^k_{ij}(x^k) - \bar{L}^k_{ij} \right) \ge 0.$$

Then $P^k(x^k) = \beta \hat{P}^k(x^k) + (1 - \beta) L^k$. $\Box$

Lemma 2.2 (P) and $(\hat{P})$ are equivalent Markov Decision Processes.

Proof: Let $P(l, k)$ and $\hat{P}(l, k)$ be the subproblems of (P) and $(\hat{P})$, respectively, covering periods $l$ through $k$. We fix $k$ and begin with the inductive hypothesis $H(l, k)$, which states that for $l = 1, 2, \ldots, k$ and all states:

1. The optimal substrategies $x^*(l, k)$ and $\hat{x}^*(l, k)$ are identical for $P(l, k)$ and $\hat{P}(l, k)$, respectively.
2. The optimal costs $C^*(l, k)$ and $\hat{C}^*(l, k)$ discounted to the beginning of period $l$ are related by $C^*(l, k) = \hat{C}^*(l, k) + \theta_{lk}$, where $\theta_{lk} = \sum_{j=l}^{k-1} (\alpha\beta)^{j-l} \gamma_j$ and $\gamma_j = \alpha(1 - \beta) L^j C^*(j + 1, k)$ is the vector of equal components given by $\alpha(1 - \beta)$ times the optimal cost for problem (P) beginning period $j + 1$ with the initial distribution $L^j$.

It is clear that $H(k, k)$ is true with $\theta_{kk} = 0$ since $C^*(k, k) = \hat{C}^*(k, k) = \min_{x^k \in D^k} C^k(x^k)$. Now assume that $H(l, k)$ is true for $l = j, j + 1, \ldots, k$. We show that $H(l, k)$ is true for $l = j - 1$. Subproblem $P(j - 1, k)$ phrased in terms of $C$ becomes

$$\min_{x^{j-1,k} \in D^{j-1,k}} C^{j-1,k}(x^{j-1,k})$$

or

$$\min_{x^{j-1,k} \in D^{j-1,k}} \sum_{i=j-1}^{k} \alpha^{i-j+1} P^{j-1,i-1}(x^{j-1,i-1}) C^i(x^i).$$

Rewriting this gives

$$\min_{x^{j-1,k} \in D^{j-1,k}} \left\{ P^{j-1,j-2}(x^{j-1,j-2}) C^{j-1}(x^{j-1}) + \sum_{i=j}^{k} \alpha^{i-j+1} P^{j-1,i-1}(x^{j-1,i-1}) C^i(x^i) \right\}$$

or, since $P^{j-1,j-2}(x^{j-1,j-2}) = I$,

$$\min_{x^{j-1} \in D^{j-1}} \left\{ C^{j-1}(x^{j-1}) + \alpha P^{j-1}(x^{j-1}) \min_{x^{j,k} \in D^{j,k}} \sum_{i=j}^{k} \alpha^{i-j} P^{j,i-1}(x^{j,i-1}) C^i(x^i) \right\}.$$

Making the substitution $P^{j-1}(x^{j-1}) = \beta \hat{P}^{j-1}(x^{j-1}) + (1 - \beta) L^{j-1}$ from Lemma 2.1 and noting that the resulting term involving $L^{j-1}$ is independent of $x^{j-1}$, we get

$$\min_{x^{j-1} \in D^{j-1}} \left\{ C^{j-1}(x^{j-1}) + \alpha\beta \hat{P}^{j-1}(x^{j-1}) \min_{x^{j,k} \in D^{j,k}} \sum_{i=j}^{k} \alpha^{i-j} P^{j,i-1}(x^{j,i-1}) C^i(x^i) \right\} + \gamma_{j-1}$$

or

$$\min_{x^{j-1} \in D^{j-1}} \left\{ C^{j-1}(x^{j-1}) + \alpha\beta \hat{P}^{j-1}(x^{j-1}) C^*(j, k) \right\} + \gamma_{j-1}.$$

Invoking the inductive hypothesis $H(l, k)$ for $l = j$ and substituting $C^*(j, k) = \hat{C}^*(j, k) + \theta_{jk}$ (so that $\hat{P}^{j-1}(x^{j-1}) \theta_{jk} = \theta_{jk}$, $\theta_{jk}$ being a vector of equal components), we get

$$\min_{x^{j-1} \in D^{j-1}} \left\{ C^{j-1}(x^{j-1}) + \alpha\beta \hat{P}^{j-1}(x^{j-1}) \hat{C}^*(j, k) \right\} + \alpha\beta\theta_{jk} + \gamma_{j-1}$$

or, by the principle of optimality,

$$\min_{x^{j-1,k} \in D^{j-1,k}} \hat{C}^{j-1,k}(x^{j-1,k}) + \alpha\beta\theta_{jk} + \gamma_{j-1}.$$

We conclude first of all that the argument $x^{j-1}$ solving subproblem $\hat{P}(j - 1, k)$ also solves $P(j - 1, k)$, and secondly that

$$C^*(j - 1, k) = \hat{C}^*(j - 1, k) + \alpha\beta\theta_{jk} + \gamma_{j-1}.$$

Hence, since $\theta_{jk} = \sum_{i=0}^{k-j-1} (\alpha\beta)^i \gamma_{j+i}$,

$$C^*(j - 1, k) = \hat{C}^*(j - 1, k) + \gamma_{j-1} + \alpha\beta\,\theta_{jk} = \hat{C}^*(j - 1, k) + \sum_{i'=0}^{k-j} (\alpha\beta)^{i'} \gamma_{j-1+i'},$$

where $i' = i + 1$, thus restoring the inductive hypothesis for $l = j - 1$. $\Box$

Lemma 2.3 The error in cost between a rolling horizon strategy $\tilde{x}$ and an optimal strategy $x^*$ for problem (P) is given in terms of problem $(\hat{P})$ by

$$\tilde{C} - C^* = \hat{C}(\tilde{x}) - \hat{C}(x^*) + (1 - \beta) \sum_{i=1}^{\infty} \alpha^i L^i \left( \hat{C}(\tilde{x}; i + 1) - \hat{C}(x^*; i + 1) \right),$$

where $\hat{C}(x)$ denotes the cost of strategy $x$ in $(\hat{P})$ and $\hat{C}(x; i + 1)$ the cost of $x$ in $(\hat{P})$ beginning period $i + 1$.

Proof: We first establish by induction that, for all $x \in D$ and all $k$,

$$P^{1k}(x^{1k}) = \beta^k \hat{P}^{1k}(x^{1k}) + (1 - \beta) \sum_{i=1}^{k} \beta^{k-i} L^i \hat{P}^{i+1,k}(x^{i+1,k}).$$

The relationship holds for $k = 1$ since $P^1(x^1) = \beta \hat{P}^1(x^1) + (1 - \beta) L^1$. Suppose now it holds for $k = 1, 2, \ldots, j - 1$. We show it holds for $k = j$. By definition,

$$P^{1j}(x^{1j}) = P^{1,j-1}(x^{1,j-1}) P^j(x^j) = \left[ \beta^{j-1} \hat{P}^{1,j-1}(x^{1,j-1}) + (1 - \beta) \sum_{i=1}^{j-1} \beta^{j-1-i} L^i \hat{P}^{i+1,j-1}(x^{i+1,j-1}) \right] \left( \beta \hat{P}^j(x^j) + (1 - \beta) L^j \right).$$

Since $PL = L$ for any stochastic matrix $P$ and stable matrix $L$, expanding gives

$$P^{1j}(x^{1j}) = \beta^j \hat{P}^{1j}(x^{1j}) + (1 - \beta) \sum_{i=1}^{j-1} \beta^{j-i} L^i \hat{P}^{i+1,j}(x^{i+1,j}) + (1 - \beta)\beta^{j-1} L^j + (1 - \beta)^2 \sum_{i=1}^{j-1} \beta^{j-1-i} L^j = \beta^j \hat{P}^{1j}(x^{1j}) + (1 - \beta) \sum_{i=1}^{j} \beta^{j-i} L^i \hat{P}^{i+1,j}(x^{i+1,j}),$$

since $(1 - \beta)\beta^{j-1} + (1 - \beta)^2 \sum_{i=1}^{j-1} \beta^{j-1-i} = 1 - \beta$ and $\hat{P}^{j+1,j} = I$. Hence the relationship holds for $k = j$.

Substituting this relationship, we obtain, for all $x \in D$, the identity

$$\sum_{k=1}^{\infty} \alpha^{k-1} P^{1,k-1}(x^{1,k-1}) C^k(x^k) = \sum_{k=1}^{\infty} (\alpha\beta)^{k-1} \hat{P}^{1,k-1}(x^{1,k-1}) C^k(x^k) + (1 - \beta) \sum_{k=2}^{\infty} \alpha^{k-1} \sum_{i=1}^{k-1} \beta^{k-1-i} L^i \hat{P}^{i+1,k-1}(x^{i+1,k-1}) C^k(x^k).$$

Since all terms in the double sum above are nonnegative, we can reverse the order of summation by Fubini's theorem; noting that $\sum_{k=i+1}^{\infty} (\alpha\beta)^{k-1-i} \hat{P}^{i+1,k-1}(x^{i+1,k-1}) C^k(x^k) = \hat{C}(x; i + 1)$, the double sum becomes $(1 - \beta) \sum_{i=1}^{\infty} \alpha^i L^i \hat{C}(x; i + 1)$. Substituting $x = \tilde{x}$ we then get

$$\tilde{C} = \hat{C}(\tilde{x}) + (1 - \beta) \sum_{i=1}^{\infty} \alpha^i L^i \hat{C}(\tilde{x}; i + 1)$$

and, substituting $x = x^*$,

$$C^* = \hat{C}(x^*) + (1 - \beta) \sum_{i=1}^{\infty} \alpha^i L^i \hat{C}(x^*; i + 1).$$

Subtracting the above equalities, we get

$$\tilde{C} - C^* = \hat{C}(\tilde{x}) - \hat{C}(x^*) + (1 - \beta) \sum_{i=1}^{\infty} \alpha^i L^i \left( \hat{C}(\tilde{x}; i + 1) - \hat{C}(x^*; i + 1) \right). \qquad \Box$$

Table 1: Expected Daily Arrivals by Station and Day of Week

Day   Station 1   Station 2   Station 3
 1        1           1           1
 2        2           1           1
 3        1           2           1
 4        1           1           2
 5        2           2           1
 6        2           1           2
 7        1           2           2
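The decomposition in Lemma 2.1 is easy to check numerically. The following sketch uses made-up two-state data (none of these numbers come from the report): it builds the stable matrix $L$ by normalizing the column minimums taken over all candidate matrices, then recovers each candidate as the convex combination $\beta \hat{P} + (1 - \beta) L$.

```python
# A sketch of Lemma 2.1's construction on hypothetical two-state data: build
# the dominated stable matrix from the column minimums, then write each
# candidate P as beta * P_hat + (1 - beta) * L with P_hat again stochastic.

candidates = [
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
]
n = 2

# Column minimums over all rows of all candidate matrices give the stable
# matrix of greatest row sum dominated by every candidate.
col_mins = [min(P[i][j] for P in candidates for i in range(n)) for j in range(n)]
beta = 1.0 - sum(col_mins)          # multiplier of the stochastic part
stable_row = [m / (1.0 - beta) for m in col_mins]
L = [stable_row for _ in range(n)]  # stable: all rows equal

def p_hat(P):
    """Recover the stochastic matrix P_hat = (P - (1-beta)*L) / beta."""
    return [[(P[i][j] - (1.0 - beta) * L[i][j]) / beta for j in range(n)]
            for i in range(n)]

for P in candidates:
    Q = p_hat(P)
    # P_hat has nonnegative entries and unit row sums ...
    assert all(abs(sum(row) - 1.0) < 1e-12 and min(row) >= -1e-12 for row in Q)
    # ... and the convex combination restores P.
    assert all(abs(beta * Q[i][j] + (1.0 - beta) * L[i][j] - P[i][j]) < 1e-12
               for i in range(n) for j in range(n))
```

For these data the column minimums are 0.2 and 0.3, so $\beta = 0.5$; a larger $\beta$ (weaker coupling) leaves less probability mass in the common stable part, mirroring the role $\beta$ plays as an effective discount factor in Lemmas 2.2 and 2.3.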

Table 2: Transition Vectors for each Assignment of Day 1

Assignment   003    012    021    030    102    111    120    201    210    300
003        .3678  .1839  .0459  .0100  .1839  .0919  .0301  .0459  .0301  .0100
012        .1162  .1934  .0885  .0243  .1744  .1675  .0694  .0790  .0660  .0208
021        .0243  .0885  .1934  .1162  .0694  .1675  .1744  .0660  .0790  .0208
030        .0100  .0459  .1839  .3678  .0301  .0919  .1839  .0301  .0459  .0100
102        .1162  .1744  .0790  .0208  .1934  .1675  .0660  .0885  .0694  .0243
111        .0367  .1110  .1110  .0367  .1110  .2231  .1110  .1110  .1110  .0367
120        .0208  .0790  .1744  .1162  .0660  .1675  .1934  .0694  .0885  .0243
201        .0243  .0694  .0660  .0208  .0885  .1675  .0790  .1934  .1744  .1162
210        .0208  .0660  .0694  .0243  .0790  .1675  .0885  .1744  .1934  .1162
300        .0100  .0301  .0301  .0100  .0459  .0919  .0459  .1839  .1839  .3678
Minimums:  .0100  .0301  .0301  .0100  .0301  .0919  .0301  .0301  .0301  .0100

Table 3: Coupling Coefficients for each Day of the Week

              Limit on Number of Vehicles Assigned to a Station
Day of Week       5        4        3        2
1              .9606    .8746    .7106    .3909
2              .9296    .8127    .5893    .2986
3              .9296    .8127    .5893    .2986
4              .9296    .8127    .5893    .2986
5              .8513    .7267    .4668    .1969
6              .8513    .7267    .4668    .1969
7              .8513    .7267    .4668    .1969
Maximums:      .9606    .8746    .7106    .3909

Table 4: Horizons for the Coupling Coefficients of Table 3 with f = .01

Assignment Limit   Coupling Coefficient   Horizon in Days
5                        .9606                114.56
4                        .8746                 34.37
3                        .7106                 13.48
2                        .3909                  5.01
