/ / f i Division of Research Graduate School of Business Administration The University of Michigan August 1975 AN ANALYSIS OF TRANSFORMATIONS FOR TIME SERIES Working Paper No. 117 by C.F. Ansley, W.A. Spivey, W.J. Wrobleski The University of Michigan ~ The University of Michigan, 1975 FOR DISCUSSION PURPOSES ONLY None of this material is to be quoted or reproduced without the express permission of the Division of Research.

Introduction To apply the Box-Jenkins methodology to the analysis and forecasting of nonstationary time series, a preliminary transformation of the data is oftentimes required. Evidence suggests that the choice of this preliminary transformation is of critical importance. Chatfield and Prothero [4], for example, were unable to obtain a satisfactory forecasting model using Box-Jenkins methods on their transformed data, and several discussants of this paper, [2], [6], and [8], suggested that this result was attributable to an improper choice of initial transformation. It was also indicated by discussants that a more flexible parametric family of transformations, first suggested by Box and Cox [1] for the fixed effects analysis of variance model, could be used to select an initial transformation. This family is given by (1) y,(t) = (y(t)) - 1 A A Yo(t) = lny(t), where for our purposes y(t), assumed to be positive, denotes a nonstationary time series and X is the transformation parameter. Estimates of X for the time series of Chatfield and Prothero were presented in [2], [5], and [8]. Apparently these estimates were developed by extending to time series

- 2 - models the likelihood methods associated with the Box-Cox transformation (1). To the best of our knowledge, however, no formal development of such an extension has been published. This paper presents such a development, together with a numerical algorithm for estimating X, and illustrates the use of the algorithm in the analysis of several time series. 'Development of the Likelihood Function We have called (1) a preliminary transformation because one may wish to employ it before entering the identification phase of the Box-Jenkins methodology. Moreover, the reduction of a nonstationary time series to a stationary ARMA process by successive backward differencing is a central feature of the Box-Jenkins methodology. Thus to develop the likelihood function for the observed (transformed) time series data we must consider the effects of two systems of transformations, the preliminary system and the differencing system. Suppose we have observations from an underlying time series y(t) and that for some value X of the transformation parameter the transformed observations y (t) in (1) follow an ARIMA(p,d,q) process having autoregressive parameters - = ((1' ' '*' * p) and moving average parameters 8 = (1-, *** q). Suppose further that the random shocks a(t) associated with the process are normally distributed, independent random variables with mean zero and variance. The constant variables with mean zero and variance a. The constant

- 3 - term in the model is denoted by 6. Since the y (t) form an ARIMA(p,d,q) process, the d order backward differences, (2) w(t) = vdy(t) form an ARMA(p,q) process. The joint probability density of n such differences w (1), *., w (n) is given by Box and Jenkins [3; p. 273]: (3) f (w_, ( ew o 02) ( )= 21 n/2 1/2 S n(3) w6 CY IM( 2 exp{ S} 2 Tro 2 a where (4) wX = (w (n), --, w%(1)), n n 2 (5) S = S(o,e,6) = Z {E[a(t)lw;,,6] }, t=-00 and where M = M (p,e) is the inverse of the n by n corren n lation matrix of the differenced series w (t). Because it is impossible to reconstruct the joint probability density of the n+d transformed observations using only the joint probability density of the n differences w (t), we will assume that the initial d observations y (O), y (-1), y (-d+l) are fixed. We now seek the joint probability density of the n most recent observations y (n), ', Y (1) of the time series, regarding the initial d values as constants. Accordingly we substitute (2) for w (t) in (3) and multiply the resulting A

- 4 - function by the Jacobian of the transformation for the dth backward differences. This Jacobian is equal to 1 because it is the determinant of a triangular matrix whose principal diagonal elements are each equal to 1. Thus the joint density of the y (t), regarding the initial d values y(0), y(-1), **-, y(-d+l) as fixed, is (6) gn(YXIYX(O), -*, yj(-d+l); 4,e,6, 2) 1_ _ n 1/2 S2 21 ) IMnI exp{- 2} where (7) yk = (y (n)' ' -, Y,(l)), (8) S = S(,o, 6) = z {E[a(t)| Vr y(n), --- y );(); er6] } t=-oV Finally, we can obtain the joint density of the original (untransformed) time series y(l), ~~-, y(n), assuming y(O), y(-l), -**, y(-d+l) to be fixed, by substituting for y (t) in (6) and multiplying the result by the Jacobian of the transformation (1), 2 (9) h (y|y(O), *-.-, y(-d+l);,6,,a2, X) 2c na 22

- 5 - where (10) y= (y(n),, y(l)) n d d 2 (11) S = S(,O,,6,X) = z {E[a(t) y(n), ',V y((l);_. 6,%] } t-co VV from (8), where S is now regarded as a function of X as well as of the other parameters, and where n ay (t) n (12) J = J n (Xy) = y = y(t) } n t=l ay(t) i=l Interpreting (9) as a likelihood and taking the logarithm gives (13) L = const - -ln2 + In|M I -+lnJ. - n 2 2o 2 Maximum likelihood estimates of 6, 8, 6, c and x can now be obtained by solving the corresponding simultaneous system of nonlinear normal equations. Numerical Algorithms for Parameter Estimation Following Box and Jenkins [3; p. 213] we will assume that for moderate and large values of n, the term -ln|Mnl in (13) is dominated by S/2o. Discarding this term we approximate (13) by n 2 S (14) L* = const - -lno - + lnJ, 2 2 2o and note that the solution to the normal equation obtained 2 from (14) for a is (15) 2 S (15) y = - n

- 6 - For fixed values of the remaining parameters we can maxi2 mize L* over a by substituting (15) into (14), obtaining (16) L* m = La* (,6,6X) = const - Zln( ) + lnJ. Maximization of L* remains complicated. It is possible, max however, to reexpress (16) as a monotonic function of a sum of squares, enabling the use of a standard, nonlinear, least-squares algorithm without major revision of existing computer programs. The first step in this process is to examine the sum of squares function S in (14). From (11) we have n 2 (17) S = z [a(t)] t= - 00 We simplify the notation here and below by abbreviating E[a(t)w X;,,8,6,X] to [a(t)]. Numerical values for [a(t)] are obtained from the difference equations (18) ((B) [w (t)] =6 + 0(B) [a(t)] where (19) [w (t)] = E[w (t) w,;_8,6,X], (20) +(B) = 1 B - *- B BP p, q (21) O(B) = 1 - el B - * — - 9qB and where B is the backshift operator defined by Box and Jenkins [3; p. 8].

- 7 - Continuing with the problem of reducing (16) to a monotonic function of a sum of squares, we follow Box and Cox and define an additional transformation w (t) (22) z (t) = t 1, X, n; outside the range 1, -—, n we write [w (t)] (23) z (t) n j1/n Because [w (t)] = w (t) for t = 1, *-*, n, (23) is a simple extension of (22). If we now write (24) a(t) = [(/n we can divide (18) by J1/n and obtain (25) t(B)z (t) = 6 + o(B)aZ(t) where (26) = z j1/n Given z (1), *', z (n), it is clear that values of a (t) can be obtained from (25) in exactly the same way that values of [a(t)] are obtained from (18). Using (24), the sum of squares S in (17) can be written

- 8 - n n (27) S = [a(t)]2 = J2/n (a(t))2 = J2 t=-00 t==-00 where n ^ (28) S = (a (t)). t= — Finally, we can substitute (27) into (16) and obtain n Tn 2/n SZ (29) L* = const - -ln(J2/ Z) + lnJ max 2 n S = const - -ln(z). 2 n The problem of maximizing (14) is now reduced to that of minimizing Sz in (29) with respect to o, 0, 6z, and X. From the estimates 6 and X we then obtain z ^ 1 /^ 6 = Jl/n (). We have found Sz to be well behaved and approximately quadratic in X for a variety of time series, suggesting that standard nonlinear least squares algorithms may be used successfully to minimize S. In fact, we have adapted a least squares version of the Marquardt algorithm [7] to minimize S simultaneously over all the parameters including X, and this has proven to be efficient over a wide range of ARIMA models.

- 9 - Examples of Likelihood Functions and Parameter Estimation We demonstrate the approach described above by examining the likelihood surfaces associated with four, simulated time series, denoted A, B, C, and D. Each of these is an ARIMA time series transformed by the inverse of (1) using a different value of A. The four underlying models were chosen to reflect a variety of correlation structures commonly found in practice. Each series contains 100 observations and in each case the initial value y (1) was arbitrarily chosen to be 1000. The simulation process is described in detail for Series A only. Series A was generated from an ARIMA (1,1,0) model 2 with c = 0.5, 6 = 2.5, and a = 1, transformed by the inverse of (1) for X = 0.7. The steps in the process were as follows: (i) A stream of independent N(0,1) variates a(t) was generated and from it an initial value, w (1) was determined. (ii) Subsequent values of the ARMA (1,0) series w (t) A were then obtained from the equation w (t) = 2.5 + 0.5w (t-1) + a(t). (iii) Using a starting value y(l) = y 7(1) = 1000, values y 7(2), *, y 7(100) were determined using y 7(t) = y 7(t-1) + w (t). (iv) Values of y(l), -—, y(100) were obtained from the inversion of (1) for X = 0.7 and y(t) = (0.7y 7(t) + 1)1/7.

- 10 - Using the approximate maximum likelihood procedure described above, estimates were obtained as follows: $ = 0.345, 6 = 4.19, = 0.724, ^2 a = 1.56. For each pair of (fixed) ( and X the likelihood L* 2 in (14) was maximized over 6 and a. The nature of the resulting likelihood is indicated by the contours in Figure 1. It should be noted that the differences between values of L* on adjacent contours are not equal in Figure 1, nor in the other contour maps shown below. It is clear that L* is well behaved and unimodal in this case. Because we work with Sz in (28) rather than L* in our computing algorithm, it is appropriate to examine the variation of Sz with respect to p and X. In Figure 2 Sz is minimized over the remaining parameters. The similarity of Figure 1 and Figure 2 supports the use of (29) in obtaining maximum likelihood estimates. The approximately quadratic nature of L* with respect to X can be more easily seen by plotting L*m (maximized over 6, a max and () against X as in Figure 3. The underlying model for Series B was ARIMA(2,1,0) with %1 = 0.75, %2 = -0.5, 6 = 3.75, and a = 1. Series B was then obtained by an inversion of (1) for X = 0.6. Maximum likelihood estimates were

- 11 - Po U a Cc Q 12 -J.25.29.33.37.41.4, PHI Fig. 1. Log Likelihood Contours6

- 12 - C) co 2: eJ PHI Fig. 2. Sums of Squares of Contours,

- 13 - I I I I I I I I 724 x / \ S I -j I I I I I /. - I I I.63.67.71.75.79.83 LRMBDR Fig. 3. Log Likelihood Series A.

- 14 - @1 = 0.665, @2 = -0.537, = 3.67, ^2 a = 0.428, = 0.585. The regularity of L* is apparent from the surface contours for (tl, X) and ("2' x) in Figures 4 and 5. In-each case, L* is maximized over the remaining parameters. Using X = 0.5, Series C was obtained by inverting (1) for an ARIMA(1,1,l) process with % = 0.8, 0 = -0.5, 6 = 1.0, 2 and = 1. Maximum likelihood estimates obtained were = 0.752, 8 = -0.383, 6 = 0.195, ^2 a = 0.0235, X = 0.362. Figures 6 and 7 show the nature of L* (maximized over the remaining parameters) with respect to (t,X) and (8,A). Series D was generated from an ARIMA(0,1,1) process by inverting (1) for X = 0.4 with 0 = -0.4, 6 = 5.0, and 2 = 1. Maximum likelihood estimates were 0 = -0.444, 6 = 3.04, ^2 a = 0.388, X = 0.366.

- 15 -.61.65.69.i PHI (1) Fig. 4. Log Likelihood Contours,.77

- 16 - OD * ca: C c: CJ co g( -.64 -.060 -.56 -.52 -*48 PHI (2) -. 44 Fig. 5. Log Likelihood Contours5

- 17 - + -, cr 0' Lv) 0 ( I.65.63.73.77.81 PHI Fig. 6. Log Likelihood Contours. I I

- 18 - cD cc C0 -.40 THETR -.28 Fig. 7. Log Likelihood Contours,,

- 19 - The variation of L* over Co, k) is shown in Figure 8 in the same manner as for the previous cases. The large discrepancies between estimates and true 2 values of 6 and a, respectively, are an interesting feature of the estimates determined from the simulation results. The transformation (1) is in effect a choice of a new metric for the analysis. Because the units of measurement of 6 '^2 2 are units of the new metric and the units of are (units)2 of this metric, these estimates are very sensitive to the choice of X. An investigation of the joint properties of the maximum likelihood estimates of X and the structural parameters will be the subject of a future paper by the authors. Application to the Analysis of U.S. Money Supply, M2 We illustrate the results of the preceding analysis by applying them to the U.S. money supply (M2), seasonally adjusted. In Table 1 the data are shown by month for the period January. 1970 through April, 1975, and a plot of the data is given in Figure 9. An examination of Figure 9 suggests that there may be an exponential trend in the data so that an initial value of X less than 1 may be appropriate. In Figure 10 we have plotted the series after transformation for X = 0.75, 0.50, '0.25, and 0 (the logarithm). A slight overtransformation is suggested for X = 0.5 which becomes more pronounced for X = 0.25 and X = 0. The

2Q - 54 -.50 -.46 -.42 -.38 -.34 THETA Fig. 8. Log Likelihood Contours* r&

- 21 - W~ o OC In \j " 3 + UJ $ CM Cu Co -Jo s-o C> I 14 28 42 56 64 MONTH Fig. 9. U.S. Money Supply (M2).

- 22 - X =.75 ' =.50 '> =.25 o = 0.00 Fig. 10. U.S. Money Supply (M2) S.A. -— I --- —---------------------- -----— cl

- 23 - Table 1 U.S. MONEY SUPPLY (M2), SEASONALLY (billions of dollars) ADJUSTED - - 1970 1971 J 393.3 429.9 F 392.6 436.7 M 395.3 443.1 A 398.8 447.6 M 401.0 452.6 J 403.6 457.1 J 407.4 459.3 A 411.9 461.2 S 416.4 463.7 0 419.2 466.6 N 421.9 469.6 D 425.3 473.1 Source: Various issues 1972 477.5 482.7 487.5 490.9 494.3 498.7 503.2 507.8 512.3 516.5 520.3 525.7 1973 529.8 532.9 535.3 538.8 544.2 549.5 551.9 555.1 557.2 561.6 567.2 572.2 1974 575.5 580.9 585.5 589.4 591.6 597.1 599.6 601.9 603.4 607.6 611.6 613.5 1975 615.5 620.3 626.4 630.4 of the Federal Reserve Bulletin. series transformed by X = 0.75, however, shows a trend that appears to be approximately linear; thus we could expect that the first differences would be approximately stationary. The identification stage of the Box-Jenkins procedure was therefore applied to the series transformed with X = 0.75. First differences of the transformed series appeared to be stationary, and the differenced series was identified as an ARMA(2,0) process. Initial estimates obtained from the identification phase were: (0) = 0.44, (0) -0.19,

- 24 - 6(~) = 0.60, (0) = 0.75. Maximum likelihood estimates obtained by our procedure were:,1 = 0.561, = 0.266, = 0.588, X = 0.759, ^2 o = 0.718. The variation of the likelihood function L* with respect to X for this series is shown in Figure 11. Finally, we show in Figure 12 the effect on the linearity of the trend when X = 0.759 is used in the transformation (1).

- 25 - x:C -J.66.70.74.78.82 LRMBDR.86 Fig. 11. Log Likelihood (M2),

- 26 - C\J co 0 (U CT 0 0 0 CU in O (U 9 cm4 1 14 28 42 56 64 MONTH Fig. 12. Transformed M2 (Lambda =.759),

- 27 - BIBLIOGRAPHY 1. Box, G.E.P., and Cox, D.R., "An Analysis of Transformations." Journal of the Royal Statistical Society, B, 26 (1964): 211-43. 2. Box, G.E.P., and Jenkins, G.M. "Some Comments on a Paper by Chatfield and Prothero and on a Review by Kendall." Journal of the Royal Statistical Society, A, 135 (1973): 337-45. 3.. Time Series Analysis, Forecasting and Control. San Francisco, Calif.: Holden-Day, 1970. 4. Chatfield, C., and Prothero, D.L. "Box-Jenkins Seasonal Forecasting: Problems in a Case-study." Journal of the Royal Statistical Society, A, 136 (1973): 295-315. 5.. "A Reply by Dr. Chatfield and Dr. Prothero." Journal of the Royal Statistical Society, A, 136 (1973): 345-52. 6. Harrison, P.J. "Discussion of a Paper by Dr. Chatfield and Dr. Prothero." Journal of the Royal Statistical Society, A, 136 (1973): 319-24. 7. Marquardt, D.W. Journal of the Society of Industrial and Applied Mathematics, 11 (1963): 431-41. 8. Tunnicliffe Wilson, G. "Discussion of a Paper by Dr. Chatfield and Dr. Prothero." Journal of the Royal Statistical Society, A, 136 (1973): 315-19.

REQUISITION UNIVERSITY OF MICHIGAN — I CHARGE TO PRINTING [3 ADDRESSING [I Fund/Dept. Expend.... Restr. Fund/Dept- --------- -----—.-..p.e —n-.d...R e ---. —..........-....-... —........-.......... Acct. No.....30768 P-_0.... Acct...-Re s,.,.P-ubl. —...&.. Ser-v. --- —-—.. FILL IN EXACT NAME OF FUND OR DEPARTMENT AND BUDGET ACCOUNT SEND ORIGINAL AND DUPLICATE TO ABOVE NAMED DEPARTMENT DATE 8-19-75 REF. NO. QUANTITY [y DESCRIPTION LEAVE BLANK 100 copies One-sidedXfrom bond typescript Wbrking Paper 117 (Spivey and Wrobleski) i ' -\ \^ — 91Ld -______ 25.-z____I aJz — 0 m 0 N 0 — 0 (Our project 911) _______ \, l (Or pr)iec 911 \ k I \ I-k INSTRUCTIONS Services of Printing Office or Addressing Service may be ordered direct on this requisition form. Check the appropriate square at top of form. Do not intermix items for the different services but use separate requisitions instead. Send white and yellow copies of requisition. Yellow copy will be returned to you with reference number added. When calling, refer to reference number. ADDRESSING - Services include preparation of metal addressing plates and addressograph printing from plates on labels, envelopes, self mailing pieces, publications, file cards, etc. Care and upkeep of plates, deletions, new additions, changes, coding done on order of list owner. Use of other than own list must have authorization by owning department. Diamond labels kept in stock for immediate use. High-speed mechanical application of labels is available. PRINTING-Give quantity and description by title or name of item. If similar to a previous order give old reference number, and form number if previously used. Provide as many details as possible. Copy, sample, sketch, photocopy or other material to identify or describe finished product must accompany requisition. Proofs desired Yes D No X Phone [X COmpus Mail D Via 164 LS&A [ Contact person.A.iice --.P ketes Address -.-4i2-A-.Buls-...Ad.-... -— Phone -— 4.13-66 -] Deliver to. --- —------------—................... Will call for at: [j North Campus Office Signed:................... % - EAD OF DEPT. OR AUTHORIZED REPRESENTATIVE