Division of Research
Graduate School of Business Administration
The University of Michigan

ON THE STRUCTURE OF MOVING AVERAGE PROCESSES

Working Paper No. 128

by

C.F. Ansley, W.A. Spivey, W.J. Wrobleski
The University of Michigan

March 1976

(c) The University of Michigan, 1975

FOR DISCUSSION PURPOSES ONLY
None of this material is to be quoted or reproduced without the express permission of the Division of Research.

Introduction

In a recent paper in this journal O.D. Anderson [1] discusses the theorem that the sum of two independent moving average processes of orders $q_1$ and $q_2$ is itself a moving average process of order $q^* \le \max(q_1, q_2)$. This summation theorem has considerable practical importance. Box and Jenkins [4, p. 121] used it without proof to investigate the effects of correlated noise and white noise on ARMA processes. O.D. Anderson [1, 2] shows how the theorem can be used to explain the development of composite models from simpler processes which are amenable to practical interpretation. The result has also been used by Zellner and Palm [12] and by Zellner [13]. In two important papers they investigated relationships between structural assumptions in simultaneous equation econometric models and their associated final equations and transfer functions. Their approach, which in turn uses the Box-Jenkins ARMA model formulation procedures, makes implicit use of the theorem above (Zellner and Palm [12, p. 19] and Zellner [13, p. 378]).

Although this theorem may appear to be intuitively obvious, the work of a number of authors has shown that its proof is far from trivial. T.W. Anderson [3, pp. 224-25] outlines the proof of a result which can be used to prove this theorem, but the development requires lengthy arguments and is not simple. Granger [7] attempted a proof of the theorem based on frequency domain arguments, but according to O.D. Anderson [1, p. 151], "it has a number of flaws." O.D. Anderson's own attempt to develop a simple proof rests on a convexity property of vectors of autocorrelations of MA processes of fixed order $q$. In his development of this convexity property, however, there is a crucial omission which renders his proof incomplete.

This paper uses Hilbert space methods to develop a rigorous proof of the summation theorem. We begin by discussing the Wold decomposition, a now classical result, using a Hilbert space formulation. Directly from the Hilbert space development of the Wold decomposition we give a simple proof of the result of T.W. Anderson, referred to earlier, that any stationary process whose autocorrelations vanish after lag $q$ has an MA($q$) representation. We then show that the random shocks in the MA representation of the sum of two independent normal MA processes are themselves normal. This result is of particular importance to the work of Zellner and Palm, because it justifies their use of Box-Jenkins estimation methods for final equation and transfer function models.

The Hilbert space approach has several advantages. First, although based on sophisticated mathematical arguments, it is both elegant and conceptually simple. Second, because Hilbert space methods depend only on

covariance properties, the generalization of the summation theorem from independent to merely uncorrelated processes is immediate. Third, the Hilbert space framework specifies clearly the relationship between an MA process and its random shocks. We point out that O.D. Anderson, in attempting to prove the summation theorem, addressed only the problem of the existence of suitable MA coefficients and ignored the question of whether an appropriate stream of random shocks can be found. Our Hilbert space development resolves the problem of the existence of suitable random shocks by giving explicit representations of them as elements of the Hilbert space.

Examination of O.D. Anderson's Argument

Anderson's argument [1] is based on the convexity of the set of vectors of autocorrelation functions of MA($q$) processes. The major omission in his approach results from the way in which he attempts to establish this convexity property. To understand the nature of the omission, consider the class of MA($q$) processes

$$ y_t = \sum_{j=0}^{q} \theta_j a_{t-j}, \qquad \theta_0 = 1, \tag{1} $$

whose autocorrelations are

$$ \rho_k = \sum_{j=0}^{q-k} \theta_j \theta_{j+k} \Big/ \sum_{j=0}^{q} \theta_j^2, \qquad k = 1, \ldots, q. \tag{2} $$
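The formula (2), and the convexity question for autocorrelation vectors, can be probed numerically. The sketch below is an illustration only: the MA(2) coefficients $(0.4, -0.3)$, the MA(1) coefficients $0.6$ and $-0.4$, and the weight $\lambda = 0.3$ are arbitrary choices, not taken from the text. It computes $\rho_k$ from (2), confirms that the autocorrelations vanish for $k > q$, and then, for the case $q = 1$, solves for a real MA(1) coefficient matching a convex combination of two MA(1) lag-one autocorrelations: since $\rho_1 = \theta_1/(1+\theta_1^2)$ never exceeds $1/2$ in absolute value, the quadratic $\rho_1 \theta_1^2 - \theta_1 + \rho_1 = 0$ always has a real root.

```python
import numpy as np

def ma_autocorr(theta, max_lag):
    """Theoretical autocorrelations rho_1..rho_max_lag of an MA(q)
    process y_t = sum_j theta_j a_{t-j}, theta[0] == 1, as in (2)."""
    theta = np.asarray(theta, dtype=float)
    q = len(theta) - 1
    denom = np.sum(theta ** 2)
    rho = []
    for k in range(1, max_lag + 1):
        if k > q:
            rho.append(0.0)                  # autocorrelations vanish beyond lag q
        else:
            rho.append(np.sum(theta[:q - k + 1] * theta[k:]) / denom)
    return np.array(rho)

rho = ma_autocorr([1.0, 0.4, -0.3], max_lag=5)   # arbitrary MA(2) example

def ma1_rho1(theta1):
    """rho_1 of an MA(1) process, from (2)."""
    return theta1 / (1.0 + theta1 ** 2)

def solve_ma1(rho1):
    """Real theta_1 with ma1_rho1(theta_1) == rho1; the discriminant
    1 - 4*rho1**2 is nonnegative because |rho1| <= 1/2 for any MA(1)."""
    if rho1 == 0.0:
        return 0.0
    return (1.0 - np.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)

lam = 0.3
rho_mix = lam * ma1_rho1(0.6) + (1 - lam) * ma1_rho1(-0.4)  # convex combination
theta = solve_ma1(rho_mix)       # real coefficient attaining rho_mix
```

The closed-form root here is special to $q = 1$; for general $q$ the solvability of the analogous system is exactly the point at issue in the text.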

Let

$$ w = 1 \Big/ \sum_{j=0}^{q} \theta_j^2, \tag{3} $$

and denote quantities belonging to each of two MA($q$) processes by one and two primes, respectively. Then for $0 < \lambda < 1$ we have

$$ \lambda \rho'_k + (1 - \lambda) \rho''_k = \lambda w' \sum_{j=0}^{q-k} \theta'_j \theta'_{j+k} + (1 - \lambda) w'' \sum_{j=0}^{q-k} \theta''_j \theta''_{j+k}. \tag{4} $$

Equation (4) can be written in the form (2) if the system of nonlinear equations

$$ \sum_{j=0}^{q-k} \theta_j \theta_{j+k} \Big/ \sum_{j=0}^{q} \theta_j^2 = \lambda \rho'_k + (1 - \lambda) \rho''_k, \qquad k = 1, 2, \ldots, q, \tag{5} $$

has real solutions $\theta_1, \ldots, \theta_q$. Anderson asserts without proof that such real solutions exist. ("We obtain q equations for the q unknowns $\theta_1, \ldots, \theta_q$, and so the resulting equations are soluble." [1, p. 155]) The remainder of his argument rests on this assertion. However, it is not obvious that (5) has real solutions, and a proof of this point is required. We develop such a proof and comment in the final section of this paper on the existence of multiple solution sets corresponding to multiple representations of MA processes.

Besides showing the existence of suitable coefficients, it must also be proved that there exist suitable random shocks $\{a_t\}$ such that

$$ y_t = \sum_{j=0}^{q} \theta_j a_{t-j}, \qquad \theta_0 = 1, \tag{6} $$

1We note that the class of MA($q^*$) processes, where $q^* < q$, is embedded in the class of MA($q$) processes.

with probability 1, where $\{y_t\}$ is a stationary process whose autocorrelations are given by the convex combination (4), and $y'$ and $y''$ are the MA processes whose autocorrelations appear in (4). Establishing this is especially important if one wants to make inferences concerning distributions of statistics associated with time series data. As mentioned earlier, O.D. Anderson's convexity argument is incomplete because he has not considered this problem. The existence of a stream of such random shocks $a_t$ emerges naturally from the Hilbert space approach we take below. In particular, we show that the random shocks associated with the sum of independent normal processes are themselves normally distributed.

Some Preliminaries and the Wold Decomposition

In order to establish the results indicated earlier, some concepts and properties concerning wide sense stationary processes and their representations in a Hilbert space must be introduced. We first define a wide sense stationary process $\{y_t\}$ as a process with the following properties: (i) $E(y_t)$ is a constant for all $t$, i.e., $E(y_t) = \mu$ for all $t$; and (ii) $\mathrm{Cov}(y_s, y_t)$ is a function of $s - t$ alone, i.e., $\mathrm{Cov}(y_s, y_t) = \gamma_{s-t}$ for all $s, t$.

This definition is sometimes referred to as covariance stationarity. Note that the definition implies that the variance of $\{y_t\}$ (assumed to be finite) is constant for all $t$. Without loss of generality we assume that $\mu = 0$ for the remainder of this paper.

Following Rozanov [10, p. 3], a Hilbert space $H$ can be generated as the closure, with respect to mean square convergence, of the linear manifold generated by the random variables $\{y_t;\ -\infty < t < \infty\}$. The elements of the Hilbert space $H$ are random variables, and the inner product of any two random variables $x_1, x_2 \in H$ is

$$ (x_1, x_2) = \mathrm{Cov}(x_1, x_2). \tag{7} $$

We write $H(t)$ to represent the (Hilbert) subspace of $H$ formed by the closure in mean square of the linear manifold generated by $\{y_s;\ s \le t\}$. Further, let $D(t)$ denote the orthogonal complement of $H(t-1)$ in $H(t)$. Then every $x_1 \in D(t)$ is orthogonal to every $x_2 \in H(t-1)$; $D(t)$ is a closed subspace of $H(t)$, and $H(t-1)$ is a closed subspace of $H(t)$. If $S$ is a closed subspace of $H$ and $x \in H$, we write $E^*(x \mid S)$ to represent the projection of $x$ on $S$.2

2Parzen [8] uses this notation because projection in a Hilbert space of random variables has many of the properties of conditional expectation. For the same reason, Doob [5, p. 155] refers to projection as "wide sense conditional expectation." Parzen discusses a necessary and sufficient condition for projection to be conditional expectation.
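In a finite-dimensional analogue, the projection $E^*(x \mid S)$ under the inner product (7) is just least squares with respect to covariances. The sketch below is a simulation for intuition only (the coefficients 0.7 and -0.2 are arbitrary choices): it projects a random variable onto the span of two others by solving the normal equations with the Gram matrix of inner products, and checks that the residual is orthogonal to the subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
z = rng.standard_normal((2, n))                       # spanning variables z1, z2
x = 0.7 * z[0] - 0.2 * z[1] + rng.standard_normal(n)  # x = E*(x|S) + residual

# Normal equations in the inner product (7): G beta = c, where
# G[i, j] = (z_i, z_j) and c[i] = (z_i, x).
G = np.cov(z)
c = np.array([np.cov(z[0], x)[0, 1], np.cov(z[1], x)[0, 1]])
beta = np.linalg.solve(G, c)                          # coefficients of E*(x|S)
resid = x - beta @ z                                  # x minus its projection

# Orthogonality of the residual to the subspace: (resid, z_i) ~ 0.
orth = [np.cov(resid, z[i])[0, 1] for i in range(2)]
```

The orthogonality holds exactly (up to floating point) because `beta` solves the sample normal equations; the recovered coefficients match 0.7 and -0.2 only up to sampling error.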

As Parzen [9, p. 961] points out, the best linear predictor of $y_{t+s}$, in terms of mean square error, given all the values of the time series up to and including $y_t$, is $E^*(y_{t+s} \mid H(t))$, the projection of $y_{t+s}$ on $H(t)$. We are interested in processes with no deterministic component, i.e., processes for which the best linear predictor $E^*(y_{t+s} \mid H(t)) \to 0$, the mean value, as the prediction interval $s \to \infty$, for all $t$. Such processes are called regular processes. Rozanov [10, p. 56] has shown that for regular processes there is a unique sequence of constants $\{c_j;\ j \ge 0\}$ such that

$$ y_t = \sum_{j=0}^{\infty} c_j a_{t-j}, \qquad c_0 = 1, \tag{8} $$

for all $t$, where

$$ a_t = E^*(y_t \mid D(t)). \tag{9} $$

The $\{a_t\}$ form a sequence of uncorrelated random variables with constant variance $\sigma_a^2$. This representation, first investigated by Wold [11], is known as the Wold decomposition. If $\{y_t\}$ is a real process, the coefficients $c_j$ and the random variables $a_t$ are necessarily real.

An important application of this decomposition is in providing an explicit representation for $E^*(y_{t+s} \mid H(t))$. From the Wold decomposition (8) we note

$$ y_{t+s} = \sum_{j=0}^{s-1} c_j a_{t+s-j} + \sum_{j=s}^{\infty} c_j a_{t+s-j}. $$

Now

$$ \sum_{j=0}^{s-1} c_j a_{t+s-j} $$

is orthogonal to $H(t)$, because each $a_{t+s-j}$ in this sum is an element of $D(t+s-j)$, which is orthogonal to $H(t+s-j-1) \supseteq H(t)$. Also,

$$ \sum_{j=s}^{\infty} c_j a_{t+s-j} $$

is an element of $H(t)$, because each $a_{t+s-j}$ in this sum is an element of $H(t+s-j) \subseteq H(t)$. By the standard uniqueness properties of projections in Hilbert space (see, e.g., Parzen [8, p. 306]) we have

$$ E^*(y_{t+s} \mid H(t)) = \sum_{j=s}^{\infty} c_j a_{t+s-j}. \tag{10} $$

A special case arises when the covariance function is identically zero after a given lag $q$, i.e., for $s > q$

$$ \mathrm{Cov}(y_{t+s}, y_t) = \gamma_s = 0. \tag{11} $$

We show that such a process has an MA($q$) representation. It is clear from (11) that $y_{t+s}$ is orthogonal to $H(t)$ for $s > q$, and thus $E^*(y_{t+s} \mid H(t)) = 0$.
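Formula (10) can be illustrated by simulation. For an MA(1) process $y_t = a_t + c_1 a_{t-1}$ (the coefficient 0.8 below is an arbitrary choice for this sketch), the one-step predictor is $c_1 a_t$ with prediction error $a_{t+1}$, while for $s > 1$, beyond the MA order, the predictor is zero and the prediction error variance is simply $\mathrm{Var}(y_t)$.

```python
import numpy as np

rng = np.random.default_rng(1)
c1, n = 0.8, 200_000
a = rng.standard_normal(n)       # shocks with unit variance
y = a.copy()
y[1:] += c1 * a[:-1]             # MA(1): y_t = a_t + c1 * a_{t-1}

# One-step predictor from (10): E*(y_{t+1} | H(t)) = c1 * a_t,
# so the prediction error is exactly the next shock a_{t+1}.
err1 = y[1:] - c1 * a[:-1]
v_err1 = np.var(err1)            # ~ sigma_a^2 = 1
v_y = np.var(y)                  # ~ 1 + c1^2; error variance for any s > 1
```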

This establishes regularity, and for all $s > q$ we have from (10)

$$ \| E^*(y_{t+s} \mid H(t)) \|^2 = \sigma_a^2 \sum_{j=s}^{\infty} c_j^2 = 0, $$

giving $c_j = 0$ for all $j \ge s > q$. Thus for such a process the Wold decomposition can be written as

$$ y_t = \sum_{j=0}^{q} c_j a_{t-j}, \qquad c_0 = 1, \tag{12} $$

and $\{y_t\}$ has an MA($q$) representation, as asserted.3

Consider now an MA($q$) process

$$ y_t = \sum_{j=0}^{q} \theta_j u_{t-j}, \qquad \theta_0 = 1, \tag{13} $$

where $\{u_t\}$ is an uncorrelated sequence of random shocks with mean zero and constant variance $\sigma_u^2$. This process has autocovariance function $\mathrm{Cov}(y_{t+s}, y_t) = \gamma_s = 0$ for $s > q$, so that from (12) an MA($q$) process has the unique Wold decomposition

$$ y_t = \sum_{j=0}^{q} c_j a_{t-j}, \tag{14} $$

where

$$ a_t = E^*(y_t \mid D(t)), \tag{15} $$

as in (9). The representations (14) and (15) are not as obvious as they might appear at first sight, because

3This is a simple proof of the result of T.W. Anderson [3, pp. 224-25] referred to earlier.

an MA($q$) process may have multiple representations. The representation (14) is a canonical representation in which the $c_j$ need not coincide with the $\theta_j$ used in the defining equation (13), nor the $a_t$ with the $u_t$. We discuss some of the implications of multiple representations for the summation theorem in the final section.

Normal Processes

In this section we show that if $\{y_t\}$ is a normal stationary process, then every nonzero element of the Hilbert space $H$ generated by $\{y_t;\ -\infty < t < \infty\}$ has a normal distribution. By a normal process we mean a process $\{y_t\}$ in which any finite set of random variables $\{y_{t_1}, \ldots, y_{t_n}\}$ from the process has a multivariate normal distribution. A stationary normal process is therefore strictly stationary in the sense that all the moments of the joint distribution of $y_s$ and $y_t$ depend on $s - t$ alone. An example of a normal stationary process is an MA process generated by independent normal random shocks. We can extend the primary result by showing that any two (nonzero) elements $x_1$ and $x_2$ of $H$, and indeed any finite set of (nonzero) random variables from $H$, have a multivariate normal distribution.4

4In this case uncorrelated random variables are also independent, and projection becomes conditional expectation.
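Before turning to the proof, the claim can be probed by simulation. For an invertible normal MA(1) process (the coefficient 0.5 below is an arbitrary choice), the Wold shocks lie in $H$ and can be recovered through the convergent expansion $a_t = \sum_j (-\theta)^j y_{t-j}$, a standard inversion for $|\theta| < 1$; their sample skewness and excess kurtosis should then be close to the normal values of zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n, theta = 200_000, 0.5
a = rng.standard_normal(n)
y = a.copy()
y[1:] += theta * a[:-1]            # normal MA(1), invertible since |theta| < 1

# Recover the shocks from the process: a_t = sum_j (-theta)^j y_{t-j},
# truncated at K lags (theta^K is negligible for K = 60).
K = 60
w = (-theta) ** np.arange(K + 1)
shocks = np.convolve(y, w)[K:n]    # estimates of a_t for t >= K

z = (shocks - shocks.mean()) / shocks.std()
skew = np.mean(z ** 3)             # ~ 0 for a normal distribution
ex_kurt = np.mean(z ** 4) - 3.0    # ~ 0 for a normal distribution
```

The recovered shocks agree with the true ones to within the (tiny) truncation error, so the normality check is effectively a check on elements of $H$ expressed as limits of linear combinations of the $y_t$.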

To prove these results, we note that for any nonzero $x \in H$ there exists a sequence $\{x_n\}$ of finite linear combinations of $\{y_t;\ -\infty < t < \infty\}$ such that $x_n \to x$ in mean square. Suppose that $x_n$ and $x$ have variances $\sigma_n^2 > 0$ and $\sigma^2 > 0$, distribution functions $F_n$ and $F$, and characteristic functions $\phi_n$ and $\phi$, respectively. By mean square convergence we have

$$ \sigma_n^2 \to \sigma^2 \tag{16} $$

and

$$ F_n \to F \tag{17} $$

at every continuity point of $F$ (see Fisz [6, p. 238]), and therefore for all $h$ with $-\infty < h < \infty$ we have

$$ \phi_n(h) \to \phi(h) \tag{18} $$

(see Fisz [6, p. 188]). Each $x_n$ is a finite linear combination of random variables with a multivariate normal distribution and is therefore itself normally distributed. Now from (16)

$$ \phi_n(h) = \exp[-\sigma_n^2 h^2 / 2] \to \exp[-\sigma^2 h^2 / 2]. $$

Therefore from (18) we have

$$ \phi(h) = \exp[-\sigma^2 h^2 / 2], $$

and $x$ has a normal distribution, as asserted.

We extend this result to more than one variable

by noting that (17) and (18) extend to multivariate distributions and that if $x_{1n} \to x_1$ and $x_{2m} \to x_2$, then

$$ \mathrm{Cov}(x_{1n}, x_{2m}) = (x_{1n}, x_{2m}) \to (x_1, x_2) = \mathrm{Cov}(x_1, x_2) $$

as $n, m \to \infty$. Noting that $x_{1n}$ and $x_{2m}$ have a bivariate normal distribution, we have

$$ \phi_{nm}(h_1, h_2) = \exp\left[-\tfrac{1}{2}\left(\sigma_{1n}^2 h_1^2 + 2 h_1 h_2 \,\mathrm{Cov}(x_{1n}, x_{2m}) + \sigma_{2m}^2 h_2^2\right)\right] \to \exp\left[-\tfrac{1}{2}\left(\sigma_1^2 h_1^2 + 2 h_1 h_2 \,\mathrm{Cov}(x_1, x_2) + \sigma_2^2 h_2^2\right)\right] = \phi(h_1, h_2), $$

so that $x_1$ and $x_2$ have a bivariate normal distribution, as asserted. By an identical argument this result extends to any finite family $x_1, \ldots, x_n$ of random variables in $H$. In particular, if $\{y_t\}$ is a normal stationary process, then the random shocks $\{a_t\}$ in the Wold decomposition (8) are uncorrelated multivariate normal and therefore independent normal random variables.

The Summation Theorem

The theorem can be formally stated as follows. Suppose that $\{y_{1t}\}$ and $\{y_{2t}\}$ are uncorrelated stochastic processes with MA($q_1$) and MA($q_2$) representations

$$ y_{it} = \sum_{j=0}^{q_i} \theta_{ij} a_{i,t-j}, \qquad i = 1, 2, \tag{19} $$

where for $i = 1, 2$, $\mathrm{Var}(a_{it}) = \sigma_i^2$ and $E(a_{it}) = 0$ for all $t$, and $\mathrm{Cov}(a_{it}, a_{is}) = 0$ for all $s \ne t$. Then $\{y_t\}$, where $y_t = y_{1t} + y_{2t}$, has an MA($q^*$) representation, where $q^* \le q = \max(q_1, q_2)$. Moreover, if $\{y_{1t}\}$ and $\{y_{2t}\}$ are independent normal processes, then $\{y_t\}$ is normal and its associated random shocks have independent normal distributions.

We begin the proof by observing that $\{y_t\}$ is wide sense stationary with covariance function $\{\gamma_k\}$ such that

$$ \gamma_k = 0, \qquad k > q = \max(q_1, q_2). $$

It follows immediately from (12) that $\{y_t\}$ has the (Wold) MA($q$) representation

$$ y_t = \sum_{j=0}^{q} c_j a_{t-j}, $$

where

$$ a_t = E^*(y_t \mid D(t)) $$

and where the $\{c_j\}$ and $\{a_t\}$ are real. Note that we do not exclude the possibility that $c_j = 0$ for some $1 \le j \le q$; thus $\{y_t\}$ has an MA($q^*$) representation for some $q^*$ with $0 \le q^* \le q$ and $c_{q^*} \ne 0$. If $\{y_{1t}\}$ and $\{y_{2t}\}$ are independent normal processes, we see immediately that $\{y_t\}$ is also normal and, by the results developed earlier, that its random shocks are independent and normally distributed. Q.E.D.

We note that the theorem can be extended to correlated processes having MA($q_1$) and MA($q_2$) representations, provided that $\mathrm{Cov}(a_{1t}, a_{2t})$ is constant and $\mathrm{Cov}(a_{1t}, a_{2s}) = 0$ for $s \ne t$. Under these conditions the sum is stationary, and the covariance function $\{\gamma_k\}$ of the sum has the property

$$ \gamma_k = 0, \qquad k > q = \max(q_1, q_2), $$

as in the uncorrelated case.

Comments on the Summation Theorem

We conclude this paper with some observations on how our results relate to the coefficient structure of MA processes. First, we have pointed out in the proof of the summation theorem that it is possible that $q^*$ is less than $\max(q_1, q_2)$. It is not difficult to see that this will be the case whenever $q_1 = q_2 = q$ and $\sigma_1^2 \theta_{1q} + \sigma_2^2 \theta_{2q} = 0$. Moreover, it is possible that the sum $\{y_t\}$ is an uncorrelated sequence of random variables, i.e., an MA(0) process. An example is

$$ y_{1t} = a_{1t} - \theta a_{1,t-1}, \qquad \sigma_1^2 = \sigma^2, $$

and

$$ y_{2t} = a_{2t} + \theta a_{2,t-1}, \qquad \sigma_2^2 = \sigma^2. $$

Then $\{y_t\}$, where $y_t = y_{1t} + y_{2t}$, is a sequence of uncorrelated random variables with variance $2(1 + \theta^2)\sigma^2$. Because an MA process may have multiple representations (see below), there is no simple way to state conditions which are necessary and sufficient for this situation to arise. Although it may seem improbable that either of these situations could arise in practice, it is easy to imagine a situation in which some of the $\gamma_k$ are very small after a given lag and are not regarded as significant after identification or estimation of the MA($q$) model using standard Box-Jenkins methods. This is an important point to bear in mind when using Zellner's and Palm's technique for examining structural assumptions in econometric models.

We return now to our result (12). If a process $\{y_t\}$ has autocovariances $\{\gamma_k\}$ that vanish after a given lag $q$, we have in effect shown that the set of $q + 1$ equations

$$ \sigma_a^2 \sum_{j=0}^{q-k} \theta_j \theta_{j+k} = \gamma_k, \qquad k = 0, 1, \ldots, q, \quad \theta_0 = 1, \tag{20} $$

in the $q + 1$ unknowns $\theta_1, \ldots, \theta_q, \sigma_a^2$ has at least one real solution. T.W. Anderson [3, pp. 224-25] reaches the result (12) by showing directly that a real solution exists. We now discuss how multiple solutions can arise.
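The MA(0) cancellation example above is easy to reproduce by simulation; the sketch below (with the arbitrary choices $\theta = 0.5$ and $\sigma = 1$) checks that the sum has variance close to $2(1 + \theta^2)\sigma^2$ and negligible lag-one autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma, n = 0.5, 1.0, 400_000
a1 = sigma * rng.standard_normal(n + 1)
a2 = sigma * rng.standard_normal(n + 1)
y1 = a1[1:] - theta * a1[:-1]        # y_{1t} = a_{1t} - theta * a_{1,t-1}
y2 = a2[1:] + theta * a2[:-1]        # y_{2t} = a_{2t} + theta * a_{2,t-1}
y = y1 + y2                          # lag-1 autocovariances cancel exactly

var_y = np.var(y)                    # ~ 2 (1 + theta^2) sigma^2 = 2.5
r1 = np.corrcoef(y[1:], y[:-1])[0, 1]   # ~ 0: the sum behaves as MA(0)
```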

Box and Jenkins [4, pp. 196-97] show that for the MA($q$) process

$$ y_t = \sum_{j=0}^{q} \theta_j a_{t-j}, \qquad \theta_0 = 1, $$

there are multiple choices of the $\theta_j$ and $a_t$ which lead to multiple representations of $\{y_t\}$. Essentially, they consider the characteristic function

$$ C(B) = 1 + \theta_1 B + \cdots + \theta_q B^q = \prod_{i=1}^{q} (1 - \alpha_i B) $$

and show that one can construct another function

$$ C^*(B) = \prod_{i=1}^{q} (1 - \alpha_i^* B) = 1 + \theta_1^* B + \cdots + \theta_q^* B^q $$

by setting each $\alpha_i^*$ equal to either $\alpha_i$ or $1/\alpha_i$. Then, by choosing suitable random shocks $\{a_t^*\}$, we have a new representation

$$ y_t = C^*(B) a_t^* = \sum_{j=0}^{q} \theta_j^* a_{t-j}^*, \tag{21} $$

where $B$ is the backward shift operator. There are $2^q$ different ways of choosing the $\alpha_i$ to be replaced by their inverses in $C^*(B)$; this implies that there are up to $2^q$ different solutions to the equations (20). The actual number of solutions may be less than $2^q$ for two reasons. First, some of the $2^q$ possible choices of $C^*(B)$ may coincide, because two or more of the factors $(1 - \alpha_i B)$

may be identical. Second, some of the $\alpha_i$ may be complex. Such $\alpha_i$ appear in complex conjugate pairs because the coefficients of $C(B)$ are real; to ensure that the $\theta_i^*$ are real, the $\alpha_i^*$ must also be chosen in complex conjugate pairs and not independently.

Box and Jenkins do not consider the existence of suitable random shocks for (21). However, it can be shown that suitable random shocks can be found for the multiple representations provided $C(B)$, and thus $C^*(B)$, has no zeros on the unit circle. If all the zeros of $C^*(B)$ lie outside the unit circle, the random shocks $a_t^*$ can be written in the form

$$ a_t^* = \sum_{j=0}^{\infty} \pi_j y_{t-j}. \tag{22} $$

The representation in this case is said to be invertible. If all the zeros of $C^*(B)$ lie inside the unit circle, we can write

$$ a_t^* = \sum_{j=0}^{\infty} \pi_j y_{t+q+j}, $$

and if some zeros lie outside and some inside the unit circle, we can write

$$ a_t^* = \sum_{j=-\infty}^{\infty} \pi_j y_{t-j}. $$

In each case $\sum_j |\pi_j| < \infty$,

and the limits can be shown to exist in mean square. In the case where all the zeros lie outside the unit circle, we see that $a_t^*$ in (22) is an element of $H(t)$, and the corresponding representation is therefore the Wold representation (14).

Our proof of the result (12) establishes the existence of one set of coefficients (those in the Wold decomposition) which satisfies the equations (20). By using the procedures of Box and Jenkins to construct multiple representations of the resulting MA process, we see that there are corresponding multiple solutions to (20).

The existence of multiple representations, and therefore of multiple solutions to (20), can be illustrated by the following example. Consider the MA(2) process

$$ y_t = u_t - u_{t-1} + .25 u_{t-2}, \tag{23} $$

for which

$$ C(B) = 1 - B + .25 B^2 = (1 - .5B)^2. \tag{24} $$

Both zeros lie outside the unit circle, and (23) is therefore the Wold representation (14). We note that

$$ u_t = \sum_{j=0}^{\infty} (1 + j)(.5)^j y_{t-j} $$

and $\mathrm{Var}(u_t) = (16/33)\gamma_0$.

We can now construct

$$ C^*(B) = (1 - 2B)^2 = 1 - 4B + 4B^2 $$

and write

$$ y_t = a_t^* - 4 a_{t-1}^* + 4 a_{t-2}^*, $$

where

$$ a_t^* = .25 \sum_{j=0}^{\infty} (1 + j)(.5)^j y_{t+2+j} $$

and $\mathrm{Var}(a_t^*) = (1/33)\gamma_0$. The remaining possibility is to choose

$$ C^*(B) = (1 - 2B)(1 - .5B) = 1 - 2.5B + B^2 $$

and write

$$ y_t = u_t^* - 2.5 u_{t-1}^* + u_{t-2}^*, $$

where

$$ u_t^* = -\frac{2}{3} \sum_{j=-\infty}^{\infty} (.5)^{|j|} y_{t+1-j} $$

and $\mathrm{Var}(u_t^*) = (4/33)\gamma_0$.
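The three representations and their shock variances can be verified directly from the coefficient vectors. The sketch below checks that each set of MA(2) coefficients reproduces the same autocovariances $\gamma_0, \gamma_1, \gamma_2$, and that the implied shock variances are $16/33$, $1/33$ and $4/33$ of $\gamma_0$, as quoted above.

```python
import numpy as np

def autocov_ratio(c):
    """gamma_k / sigma^2, k = 0, 1, 2, for an MA(2) with coefficients c."""
    c = np.asarray(c, dtype=float)
    return np.array([c @ c, c[0] * c[1] + c[1] * c[2], c[0] * c[2]])

reps = {
    "(1 - .5B)^2":       [1.0, -1.0, 0.25],   # Wold / invertible form (23)
    "(1 - 2B)^2":        [1.0, -4.0, 4.0],
    "(1 - 2B)(1 - .5B)": [1.0, -2.5, 1.0],
}
gamma0 = 1.0
# Shock variance in each representation: gamma_0 / sum_j c_j^2.
sig2 = {k: gamma0 / autocov_ratio(c)[0] for k, c in reps.items()}
# All three representations give identical gamma_0, gamma_1, gamma_2.
gammas = {k: sig2[k] * autocov_ratio(c) for k, c in reps.items()}
```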

Note that the number of possibilities is reduced from $2^2 = 4$ to 3 because of the double factor $(1 - .5B)^2$ in (24).

Finally, we comment on the case where $C(B)$ has a zero on the unit circle. It is not easy in this case to show directly that suitable random shocks exist corresponding to solutions of (20); T.W. Anderson [3, p. 226] specifically excludes this case from his discussion. The problem is that we cannot write the random shocks in the Wold representation (14) in the form (22). This special case is now illustrated by a numerical example. Consider the MA(1) process

$$ y_t = a_t - a_{t-1}; \tag{25} $$

the characteristic function $C(B) = 1 - B$ has a zero $B = 1$ on the unit circle. Solving formally for $a_t$, we obtain

$$ a_t = y_t + y_{t-1} + y_{t-2} + \cdots, \tag{26} $$

which clearly does not converge. To gain some insight into the relationship between the process and its random shocks, we now construct a sequence of finite linear combinations of $\{y_s;\ s \le t\}$ which does converge to $a_t$.

Consider first the sequence $\{a_t^{(n)};\ n = 1, 2, \ldots\}$ defined by

$$ a_t^{(n)} = \sum_{k=0}^{\infty} 2^{-k/n} y_{t-k}. $$

We note that $a_t^{(n)}$ is a well defined element of $H(t)$ because

$$ \| a_t^{(n)} \| \le \gamma_0^{1/2} \sum_{k=0}^{\infty} 2^{-k/n} < \infty, $$

where $\gamma_0 = \mathrm{Var}(y_t)$. The sequence $\{a_t^{(n)}\}$ is a Cauchy sequence because

$$ \| a_t^{(n)} - a_t^{(m)} \|^2 = \gamma_0 \left[ \frac{1}{1 + 2^{-1/n}} - \frac{2 - 2^{-1/n} - 2^{-1/m}}{1 - 2^{-1/n}\, 2^{-1/m}} + \frac{1}{1 + 2^{-1/m}} \right] \to 0 $$

as $n, m \to \infty$. Thus $\{a_t^{(n)}\}$ approaches a unique limit point in $H(t)$ which, by considering the inner products $(a_t^{(n)}, y_s)$ for $s \le t$, we can show to be $a_t$. If we choose $N_n$ such that

$$ \left\| a_t^{(n)} - \sum_{k=0}^{N_n} 2^{-k/n} y_{t-k} \right\| < \frac{1}{n} $$

and define

$$ \tilde{a}_t^{(n)} = \sum_{k=0}^{N_n} 2^{-k/n} y_{t-k}, \tag{27} $$

it is clear that

$$ \lim_{n \to \infty} \tilde{a}_t^{(n)} = \lim_{n \to \infty} \sum_{k=0}^{\infty} 2^{-k/n} y_{t-k} = a_t. $$

We have thus constructed a sequence of finite linear combinations of $\{y_s;\ s \le t\}$ which converges to $a_t$. It has already been shown at (26) that $a_t$ cannot be expressed in the form (22), and it can now be observed that this situation arises because the limiting operations in (27) cannot be interchanged. This is analogous to the situation in which uniform convergence of a power series no longer holds on its radius of convergence; in fact, the property can be related directly to the divergence of the inverse of $C(B)$ on the unit circle. By a similar argument we can show that

$$ a_t = -\lim_{n \to \infty} \sum_{k=0}^{\infty} 2^{-k/n} y_{t+1+k}, \tag{28} $$

corresponding to (27). Thus we can recover the random shocks $a_t$ by a limiting process on either past or future values of the process. O.D. Anderson [2] calls this the semi-invertible case.
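The convergence of $a_t^{(n)}$ to $a_t$ can be watched in a simulation. The sketch below (truncating the infinite sum at $K = 1000$ lags, a numerical convenience) computes the approximation for increasing $n$ and compares its mean square error against $a_t$; substituting $y_t = a_t - a_{t-1}$ into the definition of $a_t^{(n)}$ gives the theoretical value $\sigma^2 (1 - 2^{-1/n})/(1 + 2^{-1/n})$, which shrinks to zero as $n \to \infty$.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 50_000, 1000
a = rng.standard_normal(T)         # shocks with sigma^2 = 1
y = a[1:] - a[:-1]                 # y_t = a_t - a_{t-1}; C(B) = 1 - B

def approx_shock(n):
    """a_t^{(n)} = sum_k 2^{-k/n} y_{t-k}, truncated at K lags."""
    w = 2.0 ** (-np.arange(K + 1) / n)
    return np.convolve(y, w)[K:len(y)]   # one value per t >= K

mse = []
for n in (1, 4, 16):
    est = approx_shock(n)          # aligned with the true shocks a[K+1:]
    mse.append(np.mean((est - a[K + 1:]) ** 2))
    # theory: ||a_t^{(n)} - a_t||^2 = (1 - 2**(-1/n)) / (1 + 2**(-1/n))
```

For $n = 1$ the theoretical mean square error is $1/3$; it falls to roughly $0.09$ at $n = 4$ and $0.02$ at $n = 16$, so the approximations do converge, though only slowly, reflecting the zero of $C(B)$ on the unit circle.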

As we said earlier, the problem of showing the existence of a sequence $\{a_t^{(n)}\}$ converging to $\{a_t\}$ in the general case, where $C(B)$ may have a zero on the unit circle, is not a simple matter if approached directly. Using the Hilbert space approach, however, the problem is almost deceptively easy: $a_t$ is simply the projection of $y_t$ on the subspace $D(t)$.

References

1. Anderson, O.D. "On a Lemma Associated With Box, Jenkins and Granger." Journal of Econometrics 3 (1975): 151-56.
2. Anderson, O.D. "Moving Average Processes." Statistician 24, no. 4 (Dec. 1975): 283-97.
3. Anderson, T.W. The Statistical Analysis of Time Series. New York: John Wiley, 1971.
4. Box, G.E.P., and Jenkins, G.M. Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day, 1970.
5. Doob, J.L. Stochastic Processes. New York: John Wiley, 1953.
6. Fisz, M. Probability Theory and Mathematical Statistics. 3d ed. New York: John Wiley, 1963.
7. Granger, C.W.J. "Time Series Modelling and Interpretation." Paper presented to the European Econometric Congress, Budapest, 1972.
8. Parzen, E. Statistical Inference on Time Series by Hilbert Space Methods. Technical Report No. 23, January 2, 1959, O.N.R. Contract 225(21), Statistics Department, Stanford University. Reprinted in E. Parzen (ed.), Time Series Analysis Papers. San Francisco: Holden-Day, 1967, pp. 251-382.
9. Parzen, E. "An Approach to Time Series Analysis." Annals of Mathematical Statistics 32 (1961): 951-89. Reprinted in E. Parzen (ed.), Time Series Analysis Papers. San Francisco: Holden-Day, 1967, pp. 383-421.
10. Rozanov, Y.A. Stationary Random Processes. San Francisco: Holden-Day, 1967.
11. Wold, H.O. A Study in the Analysis of Stationary Time Series. 1937. Reprinted Stockholm: Almquist and Wiksell, 1954.

12. Zellner, A., and Palm, F. "Time Series Analysis and Simultaneous Equation Econometric Models." Journal of Econometrics 2 (1974): 17-54.
13. Zellner, A. "Time Series Analysis and Econometric Model Construction." In Applied Statistics, edited by R.P. Gupta. Amsterdam: North-Holland, 1975.