REPLICATION SPLITTING AND VARIANCE FOR SIMULATING DISCRETE-PARAMETER STOCHASTIC PROCESSES

W. David Kelton
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, Michigan 48109-2117

Revised November 1985
Technical Report 85-21

Previous results for stationary continuous-time processes concerning the allocation of a fixed amount of simulation effort across independent replications are extended to stationary and to certain nonstationary discrete-time processes. In particular, in the presence of positive autocorrelation, variance is reduced by designing for many short replications rather than a few long ones. The magnitude of the variance reduction, however, is not great as long as the computation budget is not tight, suggesting that a good strategy is to design for a moderate number of replications in any case, which also mitigates potential bias problems.

Key Words: Simulation; Variance; Replication; Budget constraint.

1. Introduction

One of the principal drawbacks of using simulation to study the behavior of complex stochastic systems is that we obtain only estimates (as opposed to exact values) of desired system characteristics. Such estimators are properly regarded as random variables (r.v.'s), whose degree of imprecision or uncertainty is typically measured by their variance. Accordingly, considerable effort has been devoted to finding techniques that reduce the variance of such output estimators at little or no additional cost, in which case more precise results are obtained for the same simulation effort, or (equivalently) less effort is required to attain a desired precision. Many of these variance reduction techniques (common random numbers, antithetic variates, and control variates, for example) manipulate the random number generator to induce certain correlations in the simulation output, which then enter the variance formula, with appropriately signed coefficients, to reduce the variance of the final estimator. Thus, some amount of internal modification of the simulation code itself is usually required to use such techniques.

This paper examines a different method of variance reduction that is entirely external to the simulation program, affecting only the duration and number of independent replications of the simulation through the experimental design of the simulation study. Assuming a budget constraint given in terms of the total amount of simulation possible (expressed either as simulation clock time or as the number of discretely-indexed observations), a decision must be made before the simulations are run as to the number of independent, identically initialized and terminated replications to make, and the duration of each.
Gafarian and Ancker [2] considered monitoring a stationary continuous-time process with a positive, exponentially declining autocorrelation function during a simulation, and showed that it is better (in terms of the variance of the time-integral output estimator) to break the simulation effort up into "many short" runs rather than "a few long" runs. This paper establishes similar results for discrete-time processes, which are often more informative and easier to observe in simulation, that are either (a) stationary with any form of positive autocorrelation function, or (b) nonstationary first-order autoregressive (AR(1)) with positive multiplicative factor; a counterexample, however, demonstrates that the result does not hold for arbitrary nonstationary discrete-time processes, even with positive autocorrelation. Section 2 treats stationary processes with an arbitrary positive autocorrelation function, and Section 3 shows that, while similar results do not in general hold for nonstationary processes, they are valid for positively correlated nonstationary AR(1) processes. In Section 4 some numerical quantification of the results is presented, and conclusions and observations appear in Section 5.

2. Stationary processes

Let {X₁, X₂, ...} be a covariance stationary process with E(X_i) = μ and γ_p = Cov(X_i, X_{i+p}). From a simulated realization X₁, X₂, ..., X_m of m consecutive observations from the process, X̄_m = Σ_{i=1}^{m} X_i/m is an unbiased estimator of μ, with

    Var(X̄_m) = (γ₀ + 2 Σ_{p=1}^{m-1} (1 - p/m) γ_p)/m.    (1)

(Empty sums, such as that in (1) in the case m = 1, are taken throughout as 0.) Thus, if we make k independent replications of length m observations each, resulting in k independent realizations of X̄_m, our final unbiased point estimator is X̄_{k,m}, the sample average of the k independent X̄_m's, which has variance equal to the expression in (1), divided by k.

Suppose a budget constraint is imposed in the form of a limit n on the total number of X_i's that can be simulated, regardless of how these n observations are allocated to replications. We must then decide, before simulating, how many replications k to make, each of length m = n/k, under the budget constraint. (It is assumed that n is divisible by k, a mild restriction since n will probably be relatively large. Also, we require the replications to be of equal length m to preserve the identically distributed nature of the within-replication averages, in order to allow application of statistical methods based on an identical-distribution assumption.) Since X̄_{k,m} is unbiased for μ regardless of the choice of k and m, it is reasonable to focus on the effect of the splitting of n into k times m on Var(X̄_{k,m}); the following result shows that in the common case of positive autocovariance, choosing many short replications is preferable to choosing a few long replications.

Proposition 1. For j = 1, 2 let k_j be a positive integer dividing n, let m_j = n/k_j, and assume that k₁ < k₂. For a covariance stationary process {X₁, ..., X_n} with γ_p > 0 for all p, we have Var(X̄_{k₁,m₁}) > Var(X̄_{k₂,m₂}).

Proof. For j = 1, 2 let

    g_j = Σ_{p=1}^{m_j - 1} (1 - p/m_j) γ_p,

so that Var(X̄_{k_j,m_j}) = (γ₀ + 2g_j)/n. Thus, it is enough to show that g₁ > g₂.
Since m₁ > m₂,

    g₁ - g₂ = Σ_{p=1}^{m₂-1} (p/m₂ - p/m₁) γ_p + Σ_{p=m₂}^{m₁-1} (1 - p/m₁) γ_p,    (2)

which is clearly nonnegative since the autocovariance function is nonnegative.

Note that the assumption of nonnegative autocorrelation was used only in the final step of the proof, and that nonnegativity of (2) is necessary and sufficient for the result of the proposition to hold. If the γ_p's could be negative (as can arise, for example, in inventory systems) then (2) could be negative, resulting in the opposite conclusion, i.e., that a few long replications are preferable to many short ones. Thus, some knowledge of the sign of the autocorrelations would appear to be helpful in designing the simulation experiment.

3. Nonstationary processes

Simulation of complex systems typically results in output stochastic processes that are nonstationary, due to the often artificial nature of the initial conditions needed to start the simulation. This section establishes a result similar to Proposition 1 for one useful class of such processes; the proof, however, is entirely different. Before establishing this, we demonstrate that the result of Proposition 1 does not hold in general for an arbitrary discrete-parameter nonstationary process, even if positive autocovariance is assumed.

For a general (possibly nonstationary) process {X₁, X₂, ...}, let σ_{ij} = Cov(X_i, X_j) and define X̄_{k,m} formally as in Section 2 (except now using the X_i's from the nonstationary process). In this case,

    Var(X̄_{k,m}) = (1/(k m²)) Σ_{i=1}^{m} Σ_{j=1}^{m} σ_{ij},    (3)

where n = km. As a counterexample, let {X₁, X₂, ..., X₆} be multivariate normal with covariance matrix

        4  3  1  1  1  1
        3  3  1  1  1  1
    S = 1  1  2  1  1  1
        1  1  1  2  1  1
        1  1  1  1  2  1
        1  1  1  1  1  2

Note that S is positive definite (so such a process exists), and that the correlations are all positive (but not stationary). Choosing n = 6 and, as before, setting m = n/k, (3) is equal to 1.056 if k = 2, but is 1.083 when k = 3; thus, increasing k led to an increase in Var(X̄_{k,m}), contrary to Proposition 1. One general class of nonstationary processes, however, does yield the result of Proposition 1.
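Before turning to that class, the counterexample above is simple to verify numerically. The following sketch (the function name var_split is ours, purely for illustration) evaluates (3) for k = 2 and k = 3:

```python
# Numerical check of the 6 x 6 counterexample above, using equation (3):
# Var(Xbar_{k,m}) = (1/(k m^2)) * sum_{i,j <= m} sigma_ij, with n = k*m = 6.

S = [
    [4, 3, 1, 1, 1, 1],
    [3, 3, 1, 1, 1, 1],
    [1, 1, 2, 1, 1, 1],
    [1, 1, 1, 2, 1, 1],
    [1, 1, 1, 1, 2, 1],
    [1, 1, 1, 1, 1, 2],
]

def var_split(k, n=6, cov=S):
    """Variance of the overall mean under k independent, identically
    initialized replications, each of length m = n/k."""
    m = n // k
    total = sum(cov[i][j] for i in range(m) for j in range(m))
    return total / (k * m * m)

print(round(var_split(2), 3))  # 19/18 -> 1.056
print(round(var_split(3), 3))  # 13/12 -> 1.083
```

Increasing k from 2 to 3 raises the variance from 19/18 to 13/12, exactly as claimed.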

The AR(1) process is defined by the recursion

    X_i = μ + φ(X_{i-1} - μ) + ε_i

for i = 1, 2, ..., where the ε_i's are a sequence of uncorrelated r.v.'s with mean 0 and common variance σ²; we assume throughout that |φ| < 1. This class of processes was introduced as a model for simulation output processes by Fishman [1] and investigated further by Turnquist and Sussman [4], and by Kelton and Law [3]. While making such an assumption certainly entails some amount of approximation, the relatively simple form of the AR(1) process enables a more intensive analysis. Further, this process shares many important features with actual output processes from simulation, such as having an autocorrelation function that (at least asymptotically) declines exponentially; depending on the initial specification of X₀, the process may or may not be stationary.

If we make the additional assumptions that the ε_i's are normally distributed and that X₀ is drawn from a normal distribution with mean μ and variance σ²/(1 - φ²), then the {X_i} process is stationary with mean μ, variance σ²/(1 - φ²), and lag-p autocovariance γ_p = φ^p σ²/(1 - φ²); thus, the result of Proposition 1 would apply. If, however, we do not make these additional assumptions and let X₀ be deterministically specified, then the {X_i} process is neither first- nor second-order stationary; for the rest of this section, we will assume that this is the case, and so Proposition 1 would not apply. The conclusion of Proposition 1, however, still holds, as we show in the remainder of this section.

Defining X̄_m and X̄_{k,m} formally as in Section 2 (except now using the X_i's from the nonstationary AR(1) process), note first that from equation (4) of [3],

    Var(X̄_m) = (σ²/(m(1 - φ)²)) (1 - φ(2(1 - φ^m) + φ(1 - φ^m)²)/(m(1 - φ²))).    (4)

The following Lemma is stated for use in evaluating the key expression appearing below in Proposition 2.

Lemma.
For q a nonnegative integer and any real y ≠ 1,

    (i)   Σ_{p=0}^{q} y^p = (1 - y^{q+1})/(1 - y),

    (ii)  Σ_{p=0}^{q} p y^{p-1} = (1 - (q+1)y^q + q y^{q+1})/(1 - y)²,

    (iii) Σ_{p=0}^{q} p² y^{p-1} = (1 + y - (q+1)² y^q + (2q² + 2q - 1) y^{q+1} - q² y^{q+2})/(1 - y)³.

Proof. (i) follows from induction on q, and (ii) and (iii) are obtained by successively differentiating through (i) with respect to y.

Proposition 2. Let n, k_j, and m_j be as in Proposition 1. Then for any AR(1) process with φ > 0, we have Var(X̄_{k₁,m₁}) > Var(X̄_{k₂,m₂}).

Proof. Dividing (4) by k_j and rewriting yields

    Var(X̄_{k_j,m_j}) = c₁(1 - c₂ g_{m_j}),

with c₁ = σ²/(n(1 - φ)²), c₂ = φ/(1 - φ²), and

    g_m = (2(1 - φ^m) + φ(1 - φ^m)²)/m

for any positive integer m; note that neither c₁ nor c₂ depends on m_j. Thus, it is enough to show that g_{m₁} < g_{m₂}, which would follow from establishing that g_m is nonincreasing in m (since m₁ > m₂). Noting that g_m - g_{m+1} ≥ 0 if and only if

    h_m = 2 + φ - 2(m + 1)φ^m - 2φ^{m+1} + 2mφ^{m+2} + (m + 1)φ^{2m+1} - mφ^{2m+3}

is nonnegative (multiply g_m - g_{m+1} through by m(m + 1)), we use the relations in the Lemma to rewrite

    h_m = ((1 - φ)³/2) (Σ_{p=0}^{m-1} (3p + 4)(p + 1)φ^p + φ^m Σ_{p=0}^{m} (3m(m + 1) - (2m + 1)p - p²)φ^p).    (5)

Since φ > 0 and the coefficients of φ^p in the summations in (5) are positive over the sums' respective ranges, we see that h_m ≥ 0, and the proof is complete.

Again, the assumption of nonnegative autocovariance (φ > 0 here) is critical for the result, and the opposite conclusion could be reached otherwise. In principle, one could investigate whether the conclusion of Proposition 2 holds for higher-order autoregressive processes, as well as for more general ARMA processes, by using methods similar to those above; it seems, however, that the complexity involved would be formidable.

4. Numerical illustration

Propositions 1 and 2 establish inequalities about the variances resulting from alternative splittings of the simulation budget, but say nothing about the magnitude of the variance reduction obtained from choosing a larger value of k. In this section we use the AR(1) model (both stationary and nonstationary) to quantify the nature of the decrease in Var(X̄_{k,m}) as k increases, with n fixed.
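As a preliminary check on the development of Section 3, the sketch below (helper names are ours) verifies equation (4) against a brute-force double sum of the nonstationary AR(1) covariances, using the standard expression Cov(X_i, X_j) = φ^{j-i} σ²(1 - φ^{2i})/(1 - φ²) for i ≤ j (not displayed in the paper), and confirms that g_m is nonincreasing in m:

```python
# Two numerical checks of Section 3 (a sketch; helper names are ours):
# (a) equation (4) against a direct double sum of Cov(X_i, X_j) for the
#     nonstationary AR(1) with deterministic X_0;
# (b) that g_m is nonincreasing in m, the key step in Proposition 2.

def var_mean_eq4(m, phi, sigma2=1.0):
    """Equation (4): Var(Xbar_m) for the nonstationary AR(1)."""
    inner = phi * (2 * (1 - phi**m) + phi * (1 - phi**m) ** 2) / (m * (1 - phi**2))
    return sigma2 / (m * (1 - phi) ** 2) * (1 - inner)

def var_mean_direct(m, phi, sigma2=1.0):
    """Brute force: average all Cov(X_i, X_j), 1 <= i, j <= m."""
    def cov(i, j):
        lo, hi = min(i, j), max(i, j)
        return phi ** (hi - lo) * sigma2 * (1 - phi ** (2 * lo)) / (1 - phi**2)
    return sum(cov(i, j) for i in range(1, m + 1) for j in range(1, m + 1)) / m**2

def g(m, phi):
    """The quantity g_m from the proof of Proposition 2."""
    return (2 * (1 - phi**m) + phi * (1 - phi**m) ** 2) / m

phi = 0.9
for m in (1, 2, 5, 20):
    assert abs(var_mean_eq4(m, phi) - var_mean_direct(m, phi)) < 1e-9
gs = [g(m, phi) for m in range(1, 101)]
assert all(a >= b for a, b in zip(gs, gs[1:]))  # g_m nonincreasing: Var falls as k rises
print("checks passed")
```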

With the AR(1) process (with normal ε_i's) initialized by drawing X₀ from a normal distribution with mean μ and variance σ²/(1 - φ²), the process is stationary with lag-p autocovariance γ_p = φ^p σ²/(1 - φ²). Combining this with (1) results in

    Var(X̄^(S)_{k,m}) = σ²(m(1 - φ²) - 2φ(1 - φ^m))/(k m²(1 + φ)(1 - φ)³),

where the superscript (S) denotes stationarity. On the other hand, if the AR(1) process (with possibly nonnormal ε_i's) is initialized via a deterministic choice for X₀, we get from the expression for Var(X̄_{k,m}) in the proof of Proposition 2 that

    Var(X̄^(NS)_{k,m}) = Var(X̄^(S)_{k,m}) - σ²(φ - φ^{m+1})²/(k m²(1 + φ)(1 - φ)³);

the superscript (NS) denotes nonstationarity. As an aside, note that Var(X̄^(NS)_{k,m}) < Var(X̄^(S)_{k,m}), evidently reflecting the lower variability induced by the deterministic initialization in the nonstationary case.

With σ = 1, n fixed at 1000, 2000, 4000, and 8000, and setting m = n/k, Figure 1 plots the standard deviations √Var(X̄^(S)_{k,m}) (solid lines) and √Var(X̄^(NS)_{k,m}) (dashed lines) as functions of k, for k ≤ 125 and dividing n; φ is taken as 0.90. The decrease in variance with k is evident in all cases, but is less marked for a more generous budget. In fact, for n not too small, the quantitative advantage of splitting into many (high k) replications appears to be only minor. (From plots not shown for other values of φ, the size of n required for the preceding statement to hold is smaller for small φ, and vice versa.) This is significant in the nonstationary case if we are using X̄^(NS)_{k,m} as an estimator of μ; this estimator is biased for μ, and the bias increases with k (see [3]), so that choosing k small is attractive from this standpoint. In the stationary case, however, X̄^(S)_{k,m} is unbiased for μ, and it appears advantageous to make a number of replications, provided that it is neither expensive nor inconvenient to implement the multiple simulation initializations this would entail.
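The two closed-form expressions above are easy to tabulate directly. The following sketch (function names ours) computes the standard deviations plotted in Figure 1 for the n = 1000 case:

```python
# Tabulation (sketch) of the two standard deviations in Figure 1,
# for sigma = 1, phi = 0.90, n = 1000.
from math import sqrt

def var_s(k, m, phi, sigma2=1.0):
    """Stationary case: sigma^2 (m(1-phi^2) - 2 phi(1-phi^m)) / (k m^2 (1+phi)(1-phi)^3)."""
    return sigma2 * (m * (1 - phi**2) - 2 * phi * (1 - phi**m)) / (
        k * m**2 * (1 + phi) * (1 - phi) ** 3)

def var_ns(k, m, phi, sigma2=1.0):
    """Nonstationary case: the stationary variance minus the (phi - phi^(m+1))^2 term."""
    return var_s(k, m, phi, sigma2) - sigma2 * (phi - phi ** (m + 1)) ** 2 / (
        k * m**2 * (1 + phi) * (1 - phi) ** 3)

phi, n = 0.90, 1000
for k in (1, 2, 5, 10, 25, 50, 100):
    m = n // k
    print(k, round(sqrt(var_s(k, m, phi)), 4), round(sqrt(var_ns(k, m, phi)), 4))
```

Both columns decrease with k, the nonstationary one lying below the stationary one throughout, but the decrease is modest, consistent with the discussion above.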
Finally, Figure 1 displays a curious crossing of the dashed lines: for example, Var(X̄^(NS)_{100,10}) (n = 1000) is less than Var(X̄^(NS)_{100,20}) (n = 2000), so that the variance is larger with more total data. To explain this apparent paradox, recall that these simulations are initialized deterministically; allowing them to run for m = 20 points (rather than m = 10) extends further away from the deterministic initialization, providing greater opportunity for variation that in some cases more than offsets the increased information in the additional data.

5. Conclusions

This paper has focused on variability of estimators of means of positively autocorrelated

discrete-parameter processes (or asymptotic means in the nonstationary case), and has shown that splitting the budget into multiple replications is always preferable in the stationary case, may not be in the general nonstationary case, but is still preferable for the nonstationary AR(1) model considered. Looking at the actual magnitudes, however, of the variance reductions obtained for both stationary and nonstationary AR(1) processes, it appears that relatively little is to be gained by splitting into many short replications unless the budget is tight. Thus, especially in the nonstationary case where bias may also be a concern, a moderate number (no more than, say, 25) of replications should result in reasonably stable estimators and adequate degrees of freedom for efficient application of various statistical procedures, such as hypothesis testing about μ or forming confidence intervals for μ. A cost model incorporating both the cost of variance and the cost of eliminating the biasing effects of such nonstationarity on point estimators could quantify the tradeoffs involved in the splitting of the budget into replications. In any case, it appears that in automated procedures for run length determination, preference should be given to increasing run length over increasing the number of replications beyond about 25 to attain a desired precision in the output.

Acknowledgements

The author would like to express his thanks to Averill M. Law for his valuable input, and to Diane P. Bischak for her careful reading of the manuscript. The comments of two referees were also quite helpful, including pointing out an error in the expression for Var(X̄^(S)_{k,m}).

References

[1] G.S. Fishman, "Bias Considerations in Simulation Experiments," Operations Research 20, 785-790 (1972).

[2] A.V. Gafarian and C.J. Ancker, Jr., "Mean Value Estimation from Digital Computer Simulation," Operations Research 14, 25-44 (1966).

[3] W.D. Kelton and A.M.
Law, "An Analytical Evaluation of Alternative Strategies in Steady-State Simulation," Operations Research 32, 169-184 (1984).

[4] M.A. Turnquist and J.M. Sussman, "Toward Guidelines for Designing Experiments in Queueing Simulation," Simulation 28, 137-144 (1977).

[Figure 1 appears here.]

Figure 1. √Var(X̄^(S)_{k,n/k}) (solid lines) and √Var(X̄^(NS)_{k,n/k}) (dashed lines) as functions of k for AR(1) processes with σ = 1 and φ = 0.90.