Division of Research
Graduate School of Business Administration
The University of Michigan

ACCURACY MEASURES AND THE EVALUATION OF FORECASTS

Working Paper No. 463

Essam Mahmoud
University of Michigan-Flint

FOR DISCUSSION PURPOSES ONLY
None of this material is to be quoted or reproduced without the expressed permission of the Division of Research.

May 1986


Accuracy Measures and the Evaluation of Forecasts

Many forecasters and decision makers, such as executive managers, planners, production managers, sales managers, and inventory managers, have different needs in terms of the following:
- The timing of an event (e.g., when the next recession will start);
- The magnitude of a variable (e.g., sales volume next month);
- The timing and quantities of some variables (e.g., when and how many raw materials to order); and
- The monitoring of some quantity (e.g., market share).
Managers need the above predictions and are faced with the problem of having to select among the many forecasting techniques that are available. Forecasting techniques range from naive models, moving averages, exponential smoothing (single, double, etc.), adaptive techniques, and econometric models to sophisticated techniques (Box-Jenkins, Parzen's method, etc.). In addition, forecasts can be made judgmentally. The obvious question is which is the best way of predicting the future. This paper deals with the different accuracy measures and forecasting evaluation. First, a review of research studies in the area of forecasting measures and their application in evaluating forecasts is provided. Second, the reliability of the data sources available for forecasting is discussed. The different accuracy measures and their use are then summarized. The paper next considers the selection of the proper parameters and the adjustment of forecasts through continuous monitoring of forecast accuracy. Finally, there is a general discussion before conclusions are drawn and future directions for research are suggested.


A REVIEW OF RESEARCH STUDIES IN THE AREA OF FORECASTING ACCURACY AND EVALUATION

There have been many research studies which summarize the accuracy and the performance of quantitative and qualitative forecasting techniques. For example, numerous studies have indicated that quantitative techniques perform better than qualitative techniques, while others have found the opposite result or that their performance is about the same. Other research has evaluated the performance of a particular model relative to other models. Many studies have indicated that simple forecasting techniques do as well as sophisticated techniques, and in some cases they do better. Other researchers have shown the importance of combining forecasting techniques and the resulting improvement in accuracy, whether a simple combining approach or a weighted approach is used. Detailed information on many of the findings is summarized in an article by Mahmoud (1984). Table 1 summarizes briefly some of the most important findings.

Insert Table 1 about here

On the whole, past research suggests that quantitative methods outperform qualitative methods. This is of obvious significance to practitioners wishing to improve their forecasting accuracy. Forecasters or practitioners must, however, be aware of the particular circumstances under which empirical research has demonstrated the superiority of quantitative methods. Only where the circumstances are similar in practice can more accurate forecasts using quantitative techniques be expected. For instance, when the forecaster is dealing with a limited number of past observations, qualitative methods might be more appropriate. Armstrong (1985) suggested that it is advantageous to experiment with more than one qualitative method, as some are more accurate than others. Another interesting conclusion from the practitioners' point of view that many past studies have revealed is that simple forecasting methods perform as accurately as sophisticated methods. This has been illustrated by a variety of studies; examples are Makridakis and Hibon (1979) and Makridakis et al. (1982). The implication of these findings is to encourage

practitioners to view forecasting methodologies as a set of methods within their ability to understand and use. This may be especially so in the case of managers who wish to predict and cope with future uncertainties but do not have the training or expertise to deal with very complex forecasting techniques. For theorists, the implications are to concentrate their efforts on the development and refining of simpler forecasting models, and on the simplification of more complex techniques.

THE RELIABILITY OF THE DATA SOURCES

A major consideration in the selection of a forecasting method for a particular application is the type of pattern in the data. Normally, there are four different data patterns: horizontal, seasonal, cyclical, and trend. One or more of these patterns may exist in a particular time series. Identifying the type of data enables the forecaster to concentrate on a group of methods which is more suitable to a particular data pattern. However, before a data pattern is identified, it is important that the forecaster recognize the dependence of any forecasting method upon a reliable database. For example, Mahmoud (1982) and Rice and Mahmoud (1985) provided information on a variety of databases available which would be useful for organizations or international businesses. The lists identified the type of data available and its applicability in forecasting. They also discussed the importance of measuring the accuracy of the databases and how one would identify the reliability of a particular source. Proper operation and maintenance of an accurate and timely data system gives the forecaster an instrument with which to control and minimize the shortcomings of various forecasting methods. It is, therefore, essential to evaluate the databases available to verify the reliability of the data before analyzing the data pattern. Finally, from a practical standpoint, if valuable results are to be obtained from applying forecasting models, managers and forecasters must remember that a forecast is only as accurate as the data set upon which it is based.

MEASURES OF FORECASTING ACCURACY

Accuracy plays an important role in evaluating forecasting methods. Accuracy can refer to "goodness of fit," which measures how well the forecasting model is able to reproduce the data that were used to develop it. Most importantly, however, it should refer to the future (post-sample), that is, to data that have not been used to develop the forecasting model. Perceived accuracy varies from one application to another and from one decision maker to another, as described by Wheelwright and Makridakis (1985). For some decision situations, plus or minus 10% may be sufficient; in others, a variation of as little as 5% could spell disaster. Thus, being familiar with the different accuracy measures and their pros and cons would enable those decision makers seeking high levels of accuracy to achieve more accurate forecasts. While accuracy is a significant factor in evaluating forecasts, it is difficult to define. The difficulty is associated with the absence of a single universally accepted measure of accuracy (Gardner and Dannenbring, 1980; Mahmoud, 1984; Makridakis and Wheelwright, 1979; Makridakis and Winkler, 1985). This is due to the fact that specific accuracy measures are appropriate for different types of forecasting applications. For example, accuracy measures are defined by Granger (1969) as loss functions and can be in the form of linear, quadratic, or non-symmetric functions. Suppose a forecasting model could best be fitted using a quadratic loss function, and an accuracy measure such as the Mean Absolute Error (MAE), which is more suitable for measuring linear or non-symmetric functions, was used. In this case the accuracy measure is not appropriate for the type of data used. This problem can be avoided with a clear understanding of the different accuracy measures. Unfortunately, there is no single accuracy measure that can be implemented in every forecasting situation. Also, it has been shown in many studies that the best-fitted model (ex-ante) in terms of accuracy does not necessarily provide the best forecast in the forecasted phase (ex-post) (see Mahmoud, 1982; Makridakis and Wheelwright, 1979). For example, Table 2 shows the performance of thirteen forecasting models

tested with a representative series of weekly sales data covering a 104-week time horizon, in which 12 periods were used for the ex-post phase (Mahmoud, 1984). The thirteen forecasting models ranged from the simplistic naive forecasting method to the complex Box-Jenkins approach. They are shown in Table 2, listed in rank order of the Mean Squared Error (MSE) of forecasting accuracy, from low to high. Note that the rank order of the Mean Squared Error (ex-ante) and the U-Statistic are usually closely related. There is little association between the rank order of the accuracy measures at the ex-ante and ex-post phases. The cost of forecasting error does not appear to be related to any of the other accuracy measures used at the forecasted phase.

Insert Table 2 about here

In this section some of the most widely applied measures will be discussed to show their advantages and disadvantages. It should be noted that one common goal is to minimize the error in the forecast. Thus, the error is defined as:

Error = Actual - Forecast, or et = At - Ft

where et represents the error at period t, At represents the actual value at period t, and Ft represents the forecasted value at period t. For a time series of a variable such as the sales of product A, Figure 1 represents the actual value of the monthly sales of the product from January 1980 to December 1985, that is, 72 periods. By identifying the data pattern and choosing the appropriate model, the forecaster can measure the performance of the model by calculating the total errors from January 1980 to December 1985 (fitted phase). The difference between the two values (actual - forecast) is a measure of the error in forecasting this variable for each period. In this fashion, t1 = January 1980 and tn = December 1985. Remember that December 1985 represents the current

period. In Figure 1, it should also be noted that the forecaster can consider the fitted phase as starting from January 1980, represented by period t1, through tn, and the forecasted phase as running from period tn+1 to period tn+m. The forecaster would like to forecast sales for the next six months of 1986. These periods are defined as t+1, t+2, ..., t+m, where m = 6 and t is the current period. In other words, Ft+1 represents the sales forecast for January 1986, Ft+2 the forecast for February 1986, and Ft+m the forecast for June 1986. A clear distinction is needed between the errors of fitting the model to the data from January 1980 to December 1985 (the fitted phase) and the errors of forecasting from January 1986 to June 1986 (the forecasted phase). Total errors from January 1980 to December 1985 (fitted phase):

Σ(t=1 to n) (At - Ft)   or   Σ(t=1 to n) et

where t = 1, 2, ..., n, from January 1980 to December 1985. The right-hand side is known as "the sum of the error term." The total errors of the forecast for January 1986 to June 1986 can be calculated as follows, after the actual sales values for those months are known:

Sum of the error = Σ(t=n+1 to n+m) et

where en+1 is the error of January 1986 and en+m is the error of June 1986 in this example.

Summary of Measures

A summary of accuracy measures is presented below based on sources such as Makridakis et al. (1982), Armstrong (1985), Steece

(1982), and Mahmoud (1982). Some of these measures are more widely used than others; however, it is important to know what types of measures are available. Following the definitions given earlier, we summarize and discuss some of these measures.

1. Error

Error = Actual - Forecast, or et = At - Ft

This represents an individual error for a given time t.

2. Mean Error (ME)

ME = (1/n) Σ(t=1 to n) (At - Ft)

3. Mean Absolute Error (MAE)

MAE = (1/n) Σ(t=1 to n) |et| = (1/n) Σ(t=1 to n) |At - Ft|

This measure is also known as the Mean Absolute Deviation (MAD). The measure gives an equal weight to the individual error of each period, while not offsetting the positive and negative values of the individual errors. MAE is an appropriate measure whenever the loss function is linear and symmetric. In the case of a linear cost function, an error of 10 units is twice as costly as an error of 5 units.

4. Percentage Error (PEt)

PEt = ((At - Ft) / At) × 100

The error is determined based on a weighted value, which is the actual value of each period.
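To make these first four measures concrete, the short Python sketch below computes them for a small hypothetical series of actuals and forecasts; the variable names and numbers are illustrative only and do not come from the paper.

```python
# Illustrative sketch of measures 1-4 (error, ME, MAE, PE) on hypothetical data.
actual   = [120.0, 135.0, 128.0, 150.0, 142.0]   # At: hypothetical actual values
forecast = [118.0, 140.0, 125.0, 145.0, 150.0]   # Ft: hypothetical forecasts

errors = [a - f for a, f in zip(actual, forecast)]           # et = At - Ft
n = len(errors)

mean_error = sum(errors) / n                                  # ME: signed errors may cancel
mean_abs_error = sum(abs(e) for e in errors) / n              # MAE (also called MAD)
pct_errors = [100.0 * (a - f) / a for a, f in zip(actual, forecast)]  # PEt, one per period

print(f"errors = {errors}")
print(f"ME  = {mean_error:.2f}")
print(f"MAE = {mean_abs_error:.2f}")
print(f"PEt = {[round(p, 2) for p in pct_errors]}")
```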

5. Mean Percentage Error (MPE)

MPE = (1/n) Σ(t=1 to n) PEt = (1/n) Σ(t=1 to n) ((At - Ft) / At) × 100

If the percentage errors are simply added together, positive values will offset negative values and the average percentage error will be small, even though the individual errors may be substantial. MPE assumes a linear cost function. An alternative approach to MPE is the Mean Absolute Percentage Error (MAPE), which combines the individual percentage errors without offsetting the negative and the positive values.

6. Mean Absolute Percentage Error (MAPE)

MAPE = (1/n) Σ(t=1 to n) |PEt|

This measure is similar to the Mean Absolute Error (MAE) or MAD. However, MAPE treats each error equally without taking account of the sign. It is useful in comparing different forecasting models. MAPE assumes that the cost of errors is more closely related to the percentage error than to the unit error.

7. Adjusted Mean Absolute Percentage Error (adjusted MAPE)

Adjusted MAPE = (1/n) Σ(t=1 to n) [ |At - Ft| / ((At + Ft)/2) ] × 100

The adjusted MAPE is similar to the MAPE. It does not weigh the error based on the actual value only but on the average of the actual and forecasted values for the same period.
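Continuing the hypothetical series from the previous sketch, the following lines compute MPE, MAPE, and the adjusted MAPE; again the data are invented for illustration.

```python
# Percentage-based measures (5-7) on the same hypothetical actual/forecast lists.
actual   = [120.0, 135.0, 128.0, 150.0, 142.0]
forecast = [118.0, 140.0, 125.0, 145.0, 150.0]
n = len(actual)

pct_errors = [100.0 * (a - f) / a for a, f in zip(actual, forecast)]

mpe  = sum(pct_errors) / n                       # signed: positives and negatives offset
mape = sum(abs(p) for p in pct_errors) / n       # unsigned: no offsetting

# Adjusted MAPE: each absolute error is scaled by the average of actual and forecast.
adj_mape = 100.0 * sum(abs(a - f) / ((a + f) / 2.0)
                       for a, f in zip(actual, forecast)) / n

print(f"MPE = {mpe:.2f}%, MAPE = {mape:.2f}%, adjusted MAPE = {adj_mape:.2f}%")
```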

8. Mean Squared Error (MSE)

MSE = (1/n) Σ(t=1 to n) (At - Ft)²

MSE is one of the most commonly used measures of accuracy. Forecasters usually choose the model which minimizes MSE. However, there are two shortcomings of using MSE as a measure of accuracy, as discussed by Makridakis et al. (1983). First, a comparison of the MSE developed during the fitted phase may give little indication of the accuracy of the model at the forecasting phase. Second, MSE as a measure of forecasting accuracy is limited by the fact that different methods use different procedures in the fitting phase. For example, smoothing methods are highly dependent upon initial forecasting estimates, whereas regression methods minimize the MSE by giving equal weight to all observations, and Box-Jenkins minimizes the MSE through a non-linear optimization procedure. Thus comparisons are difficult, and because the measure depends on absolute units, comparisons among series are practically impossible.

9. Root Mean Squared Error (RMSE)

RMSE = sqrt[ (1/n) Σ(t=1 to n) (At - Ft)² ]

It is similar to the MSE measure, but the associated cost function is quadratic. The disadvantage of using the RMSE, as with MSE, is that it is an absolute measure of the errors.
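A short continuation of the same illustrative sketch shows MSE and RMSE; the data remain hypothetical.

```python
import math

# Squared-error measures (8-9) on the hypothetical series.
actual   = [120.0, 135.0, 128.0, 150.0, 142.0]
forecast = [118.0, 140.0, 125.0, 145.0, 150.0]
n = len(actual)

mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n   # large errors weigh heavily
rmse = math.sqrt(mse)                                           # back in the units of the data

print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f}")
```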

10. Standard Deviation of Errors (SDE)

SDE = sqrt[ Σ(t=1 to n) (At - Ft)² / (n - 1) ]

It is similar to the RMSE. The only difference is that the total sum of squared errors is divided by n - 1 instead of n.

11. Coefficient of Variation (CV)

It is similar to the coefficient of variation used in statistical inference. It relates either the SDE or the RMSE to the average of the actual data. The smaller the value, the better the performance of the model.

CV = SDE / [ (1/n) Σ(t=1 to n) At ]   or   CV = RMSE / [ (1/n) Σ(t=1 to n) At ]

12. Coefficient of Determination (R²)

R² = 1 - [ Σ(t=1 to n) (At - Ft)² / Σ(t=1 to n) (At - Ā)² ]

where Ā is the average of At. R² is commonly used in regression analysis. It can also be used as a measure of accuracy for time series models. R² ranges from 0 to 1; the closer the value of R² is to 1, the better the forecast of the model. However, one should be familiar with the interpretation and the use of R². Armstrong (1985) and Nelson (1974) discussed the use of R².

13. Theil's U-Statistic (U)

U = sqrt[ Σ(t=1 to n-1) ((Ft+1 - At+1) / At)²  /  Σ(t=1 to n-1) ((At+1 - At) / At)² ]

Theil (1971) explains in detail the use of the U-Statistic as a relative accuracy measure. U allows a relative comparison of formal forecasting methods with the naive approach and also squares the errors involved, so that large errors are given much more weight than small errors. When the accuracy of a naive method and a formal forecasting model are compared, the interpretation of the U-Statistic is as follows:

U = 1: the naive method is as good as the forecasting model being evaluated.
U < 1: the forecasting model being used is better than the naive method.
U > 1: the naive method produces better results than the forecasting model.
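The sketch below computes Theil's U-Statistic as defined above, comparing one-step-ahead forecasts against the naive "no-change" benchmark; the series is again invented for illustration.

```python
import math

def theils_u(actual, forecast):
    """Theil's U: ratio of the model's relative squared errors to those of
    the naive no-change forecast (next value = current value). U < 1 means
    the model beats the naive method."""
    num = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    return math.sqrt(num / den)

# Hypothetical monthly series and a model's forecasts for the same periods.
actual   = [100.0, 104.0, 101.0, 110.0, 108.0, 115.0]
forecast = [ 99.0, 103.0, 103.0, 107.0, 109.0, 113.0]

print(f"U = {theils_u(actual, forecast):.3f}")
```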

14. The Durbin-Watson Statistic (D-W)

D-W = Σ(t=2 to n) (et - et-1)² / Σ(t=1 to n) et²

Makridakis et al. (1983) detailed its computation and use. As a rule of thumb, a well-fitted forecasting model would yield a value of the D-W statistic of around 2.

15. Loss-Cost Function for Measuring Accuracy

Accuracy measures are also known as cost functions. Granger (1969) indicated that the actual cost function of the error could be estimated by standard accounting procedures, but that in many cases it will not be symmetric. He discussed different cases and the consequences of using a generalized cost function. Many cases can be considered, such as quadratic, linear, and non-symmetric cost functions. Granger focused on the error term and the shape of the function of the error: he assumes that for each forecast error et at time t there is an associated loss, and he discussed many different loss functions associated with forecasting error. Mahmoud (1982, 1985) also defined the accuracy of forecast errors in terms of a loss-cost function which is measured in dollars. The previously discussed measures of accuracy suffer from several shortcomings in addition to giving no indication of the dollar value of the forecasting error. It is useful to analyze the trade-off between the cost of using a method

and the accuracy of that method. Mahmoud (1982) and Mahmoud et al. (1985) developed several accuracy measures that managers or forecasters can use to measure the opportunity cost of an inaccurate prediction in terms of dollars for various practical situations. Thus, managers or forecasters will be able to evaluate the total cost of using a forecasting model, including its accuracy, in terms of dollars. Applying a comparative analysis among different methods also enables them to determine the best alternative. The cost consists of the amount of money invested in a special program (or a partial amount if the forecasting method is a part of a comprehensive or integrated package), the storage cost of the program and data, running costs, and the cost of human resources (for more details see Mahmoud, 1982). Furthermore, determining the total cost of different forecasting methods in terms of dollars enables managers or forecasters to choose the method which provides them with the accuracy they would like to achieve within their financial constraints and according to the level of technology and forecasting abilities that they may have. The loss-cost functions enable managers to choose a model based on the trade-off between the amount of money that could be saved and the extra money required to implement a more sophisticated method. For example, it is possible to compare the cost of combining two simple forecasting techniques versus the use of a more sophisticated technique; studies have shown that the former, in combination, is more accurate than the latter (for more details on combining forecasting methods, see Makridakis and Winkler, 1983, and Mahmoud, 1984). A particular example of a loss-cost function can be implemented only under a specific assumption: management requires that the total amount available in a period be equal to the forecast for the period. With Ft defined as the forecast and At as the actual value for the same period, the cost of the forecasting error at period t can be defined as:

Ct = h Zt (Ft - At) + s (1 - Zt)(At - Ft)

where h represents the stock holding cost per item per period, s represents the shortage cost per item, and Zt = 1 when Ft ≥ At, otherwise Zt = 0. Thus, the total loss-cost function over the n forecasting periods is:

TC = Σ(t=1 to n) [ h Zt (Ft - At) + s (1 - Zt)(At - Ft) ]

Some other practical cases for different types of inventory systems can be found in Mahmoud (1982) and Mahmoud et al. (1985).
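As an illustration of this loss-cost function, the sketch below evaluates TC for a hypothetical item with assumed holding and shortage costs per unit; the cost figures and series are not from the paper.

```python
# Loss-cost of forecast errors: holding cost h when Ft >= At (excess stock),
# shortage cost s when Ft < At (unmet demand). All numbers are hypothetical.
h = 2.0   # holding cost per item per period (assumed)
s = 5.0   # shortage cost per item (assumed)

actual   = [120.0, 135.0, 128.0, 150.0, 142.0]
forecast = [118.0, 140.0, 125.0, 145.0, 150.0]

def period_cost(a, f):
    z = 1 if f >= a else 0                     # Zt = 1 when the forecast covers demand
    return h * z * (f - a) + s * (1 - z) * (a - f)

total_cost = sum(period_cost(a, f) for a, f in zip(actual, forecast))
print(f"TC = {total_cost:.2f} dollars over {len(actual)} periods")
```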

Relative Accuracy Measures

There is a need to compare the accuracy of two or more models in order to select the most accurate model or models. While Theil's U-Statistic is considered a relative accuracy measure, it is limited in that it only compares the formal forecasting method being evaluated with the naive method. Gardenfors (1980) also introduced a relative accuracy measure, known as the 'I' value, for comparing the naive model with the formal forecasting model. The 'I' value is determined as follows:

I = log NSS - log ESS = 2 (log RMSN - log RMSS)

where NSS is the sum of squares of the differences between the actual values of the variable and the values obtained from the naive model, and ESS is the sum of squares of the differences between the actual values and the values obtained from the forecasting model. RMSN is the root mean square error of the naive model and RMSS is the root mean square error of the forecasting model (for more details see Gardenfors, 1980). Mahmoud showed that the rank ordering given by the 'I' value is identical to that given by the MSE. Table 3 shows the relative performance of different forecasting methods and their accuracy according to the use

Insert Table 3 about here

of both the 'I' value and the Mean Squared Error. It is clear that the rank order is identical. The greater the value of 'I',

the more accurate the model will be. One should notice the value for the Time-Series Multiple Regression model, which was -0.03. As Gardenfors indicated, one should not consider a given model as long as its 'I' value is negative. One can conclude that as long as the MSE of a particular model is greater than the MSE of the naive model, the forecaster should not consider the model. Furthermore, one is able to determine a relative accuracy measure by using the MSE directly, rather than using the 'I' value, which requires more computational effort from the forecaster. Mahmoud's study concluded the following:

I. The accuracy measures of both the forecasting method and the naive method can be compared by the following ratios:

Accuracy Measure of Forecasting Method / Accuracy Measure of Naive Method,

i.e., MSES/MSEN, MPES/MPEN, MAPES/MAPEN, or SDES/SDEN

where the numerators (MSES, MPES, MAPES, SDES) represent the accuracy measures of the forecasting method (a smoothing technique, for example) and the denominators (MSEN, MPEN, MAPEN, SDEN) represent the accuracy measures of the naive model. The following comments can be made:

A. The forecasting model (i.e., the smoothing model) performs better than the naive model if any of MSES, MPES, MAPES, or SDES is less than the associated measure for the naive model (MSEN, MPEN, MAPEN, or SDEN), that is, if the ratio is less than 1.

B. The naive model performs better than the forecasting model if the MSES, MPES, MAPES, or SDES of the forecasting model is greater than the associated accuracy measure for the naive model, that is, if the ratio is greater than 1.

C. The naive model performs the same as the forecasting model if MSES, MPES, MAPES, or SDES is the same as the associated accuracy measure for the naive model, that is, if the ratio is equal to 1.

It is clear that these rules are similar to the rule indicated by the U-Statistic.
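A minimal sketch of this ratio test is shown below, using MSE as the accuracy measure and the no-change naive forecast as the benchmark; the interpretation follows rules A-C above, while the series itself is hypothetical.

```python
# Relative accuracy: MSE of a candidate model divided by MSE of the naive
# (no-change) model. A ratio below 1 favours the candidate model (rule A above).
actual         = [100.0, 104.0, 101.0, 110.0, 108.0, 115.0]
model_forecast = [ 99.0, 103.0, 103.0, 107.0, 109.0, 113.0]

def mse(a, f):
    return sum((x - y) ** 2 for x, y in zip(a, f)) / len(a)

# The naive forecast for period t is the actual value of period t-1,
# so compare from the second period onward.
naive_forecast = actual[:-1]
ratio = mse(actual[1:], model_forecast[1:]) / mse(actual[1:], naive_forecast)

verdict = "model better" if ratio < 1 else ("naive better" if ratio > 1 else "equivalent")
print(f"MSE ratio = {ratio:.3f} -> {verdict}")
```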

It should be clear that in some situations, where the data are very stable and do not fluctuate, using a naive method would be appropriate.

II. In the case of evaluating any two forecasting methods, the same rules can be followed. For example, if a manager would like to determine the relative performance of two methods, such as Single Exponential Smoothing and the Box-Jenkins method, he or she would define the ratio as follows:

MSES / MSEB-J   or   MSE1 / MSE2

If MSES or MSE1 (the Mean Squared Error of Single Exponential Smoothing) is less than MSEB-J or MSE2 (the Mean Squared Error of the Box-Jenkins method), the ratio is less than 1, indicating that the exponential smoothing model is more accurate than the Box-Jenkins model. One can draw the same conclusions as in A, B, and C above.

The above is not an exhaustive list of accuracy measures. Further accuracy measures are discussed by Armstrong (1985) and Makridakis et al. (1983).

Structural Change and Bias

Forecasters should find systematic methods that improve forecast performance. Thus, it is desirable to have a forecasting system that corrects potential biases prior to the integration of a forecast in the organization. It is useful to rely on an approach that helps forecasters detect bias and measure it. Once bias is measured, the opportunity to improve future forecasts arises through correction of the bias. This can be achieved either by applying a test of structural stability, as explained by Tiao et al. (1975), or by determining the bias decomposition discussed by Theil (1971). It is important to realize that almost all forecasts can be expected to contain some error; however, one would select the model which minimizes the errors. Theil (1971) showed the usefulness of using three different

bias attributes, known as the Mean Difference Error, UM, the Regression Pattern Error, UR, and the Random Error, UD, expressed in terms of the population parameters. The expected squared error can be decomposed into three components as follows:

MSE = E(At - Ft)² = (μA - μF)² + (σF - ρ σA)² + (1 - ρ²) σA²

where μA and μF are the population means of the actual value At and the forecasted value Ft, σA and σF are the population standard deviations, and ρ is the population correlation between At and Ft. Dividing the previous equation by the MSE gives:

(μA - μF)²/MSE + (σF - ρ σA)²/MSE + (1 - ρ²) σA²/MSE = 1

where the first term is the mean difference error, the second the regression pattern error, and the third the random error. Thus, the three components derived by Theil (1971) can be defined, in terms of the sample means (Ā, F̄), sample standard deviations (SA, SF), and sample correlation r, as:

Mean Difference Error:    UM = (Ā - F̄)² / MSE
Regression Pattern Error: UR = (SF - r SA)² / MSE
Random Error:             UD = (1 - r²) SA² / MSE

The bias attributed to differences in the sample average levels of the actual values At and the forecasted values Ft is measured by UM. In a regression of At on Ft, UR measures the deviation of the sample regression slope from 1, and UD measures the sample variance of the regression error term. To achieve perfect forecasts, UM = UR = 0 and UD = 1. Thus, one can conclude in practice that if both UM and UR are close to zero and UD is close to one, the forecaster has achieved an unbiased forecast. For more detailed information related to the use and application of different cases, readers are referred to Theil (1971) and Moriarty (1985).
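The following sketch computes the three sample components UM, UR, and UD from a pair of hypothetical actual and forecast series, mirroring the decomposition above; it uses population-style (divide-by-n) moments, an assumption on my part since the paper does not specify the divisor.

```python
import math

def theil_decomposition(actual, forecast):
    """Decompose MSE into UM (mean difference), UR (regression pattern),
    and UD (random) components; the three shares sum to 1."""
    n = len(actual)
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n

    a_bar = sum(actual) / n
    f_bar = sum(forecast) / n
    s_a = math.sqrt(sum((a - a_bar) ** 2 for a in actual) / n)
    s_f = math.sqrt(sum((f - f_bar) ** 2 for f in forecast) / n)
    r = (sum((a - a_bar) * (f - f_bar)
             for a, f in zip(actual, forecast)) / n) / (s_a * s_f)

    um = (a_bar - f_bar) ** 2 / mse
    ur = (s_f - r * s_a) ** 2 / mse
    ud = (1 - r ** 2) * s_a ** 2 / mse
    return um, ur, ud

actual   = [100.0, 104.0, 101.0, 110.0, 108.0, 115.0]
forecast = [ 99.0, 103.0, 103.0, 107.0, 109.0, 113.0]
um, ur, ud = theil_decomposition(actual, forecast)
print(f"UM = {um:.3f}, UR = {ur:.3f}, UD = {ud:.3f}, sum = {um + ur + ud:.3f}")
```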

Forecasting Alternatives

The issue of considering alternative models, measures, or approaches is an important one. The work of Makridakis et al. (1982), known as the M-Competition, indicated the importance of three factors (the time horizon, the type of data, and the accuracy measure) that affect the forecasting accuracy of various methods. Accuracy depends on the application considered. Managers should consider alternative accuracy measures when these are to be used for a variety of forecasting applications. For example, Makridakis (1979) indicated that in forecasting inventories large errors are undesirable, so the MSE accuracy measure would be appropriate. In budget forecasting the MAPE accuracy measure is commonly used. In situations requiring a single forecast (e.g., bidding for a large contract in the futures market), Average Rankings of a particular accuracy measure should be used. Where only two methods are considered and the size of the error is not important, the Percentage Better method should be employed. Some forecasting techniques require a minimum number of data points. For example, applying the Census II and Decomposition forecasting methods in some computer packages requires a minimum of 78 data points. Thus, if the database does

not include the minimum number of data points that a particular model requires, that alternative will not be available. Different models are also appropriate for different time horizons. For example, it is advisable to consider deseasonalized single exponential smoothing and Holt's method when forecasting one period ahead. For longer forecasting horizons, that is, four to six periods ahead, sophisticated methods such as Parzen's and Lewandowski's methods are recommended by Makridakis et al. (1982). Fildes (1982) indicated the importance of considering alternative variables for inclusion in econometric or regression models. The inclusion of different variables will affect the explanatory power and predictive accuracy of such models. By considering a checklist of variables and their likely impact on the decision being contemplated, it becomes possible to identify those variables that most require attention. Thus, preparing alternatives would be very useful in addressing this issue. In addition, the availability of such alternatives depends on the information available and the number of data points. It is also important to consider the alternative of combining two or more forecasting methods instead of relying on only one method. Several studies have shown the value of combining forecasts in improving accuracy (Makridakis et al., 1982; Makridakis and Winkler, 1983; Mahmoud, 1982; and Zarnowitz, 1984). Users can consider one of three forms of combining forecasting techniques (see Makridakis and Winkler, 1983). The first form takes a simple average of two or more forecasts. The second approach is known as historical weighting, in which each forecast generated by each model is weighted by one minus the ratio of its Mean Squared Error to the total Mean Squared Error of all the forecasts. The third method uses subjective weighting, in which managers apply weights to the forecasts based upon their personal judgments as to which methods more closely reflect the changing reality. Managers should be encouraged to consider even a simple average of two or more forecasting models.
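As a rough illustration of the first two combining forms, the sketch below averages two sets of forecasts and also weights them by one minus each model's share of the total Mean Squared Error, normalized so the weights sum to one; the normalization step and all numbers are my own assumptions for the example.

```python
# Combining two forecasts: (1) simple average, (2) historical weighting based
# on each model's past MSE. Data and the normalization choice are illustrative.
actual     = [100.0, 104.0, 101.0, 110.0, 108.0, 115.0]
forecast_a = [ 99.0, 103.0, 103.0, 107.0, 109.0, 113.0]   # e.g. a smoothing model
forecast_b = [102.0, 101.0, 100.0, 112.0, 105.0, 117.0]   # e.g. a regression model

def mse(a, f):
    return sum((x - y) ** 2 for x, y in zip(a, f)) / len(a)

# Form 1: simple average of the two forecasts, period by period.
simple_avg = [(fa + fb) / 2.0 for fa, fb in zip(forecast_a, forecast_b)]

# Form 2: historical weighting. Weight each model by one minus its share of
# the total MSE, then normalize the weights so they sum to one.
mse_a, mse_b = mse(actual, forecast_a), mse(actual, forecast_b)
raw_a, raw_b = 1 - mse_a / (mse_a + mse_b), 1 - mse_b / (mse_a + mse_b)
w_a, w_b = raw_a / (raw_a + raw_b), raw_b / (raw_a + raw_b)
weighted = [w_a * fa + w_b * fb for fa, fb in zip(forecast_a, forecast_b)]

print(f"MSE: model A = {mse_a:.2f}, model B = {mse_b:.2f}")
print(f"MSE: simple average = {mse(actual, simple_avg):.2f}, "
      f"weighted = {mse(actual, weighted):.2f} (weights {w_a:.2f}/{w_b:.2f})")
```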

In conclusion, being aware of the conditions under which some techniques perform better than others enables managers to prepare different alternatives. By monitoring which alternative works best, managers will be able to achieve their goals effectively.

MONITORING THE PERFORMANCE OF FORECASTING METHODS

The manager should monitor the environment closely and should constantly attempt to adjust the parameters of the forecasting model to incorporate any environmental changes. This would provide the manager with a predicted figure closer to the actual value. Makridakis (1986) and Gardner (1969) indicated that monitoring such changes is an extremely important task to ensure that the system remains in control. Thus, quantitative forecasts can be modified to account for non-random changes. The constant monitoring of forecasting methods can be achieved easily through a good forecasting system. As computers and many good forecasting packages are now widely available, it is the responsibility of managers or forecasters to select the program(s) or system that provides them with many of these capabilities. It is crucial to detect errors as quickly as possible. The forecasting model can then be refitted to the data or changed to a more appropriate model to prevent any serious production or inventory problems. In inventory control, for example, forecast monitoring is essential because of the need to take action when there is a significant change in demand. If the forecast model suggests an increase in demand, new orders should be placed on a priority basis. If demand is expected to fall, any unneeded orders should be cancelled to prevent excess inventory investment. Monitoring devices (tracking signals) are used to keep watch for signs of bias in the forecast errors. Gardner (1983) discussed three warning signs that can be used to show when a forecasting system goes out of control.

1. The first indicator is the simple cumulative sum (cusum)

of the forecast errors, which can be computed and tested in several different ways. This indicator compares the cumulative sum of the errors at the end of each period to the smoothed Mean Absolute Deviation (MAD). The cusum is determined as follows:

et = At - Ft
SUMt = et + SUMt-1
MADt = α |et| + (1 - α) MADt-1
Ct = | SUMt / MADt |

where et represents the error for period t, At is the actual value, and Ft is the forecast value. SUMt represents the cumulative sum of the errors through period t, and MADt is the smoothed Mean Absolute Deviation at period t. The smoothing parameter α should have a value between zero and one. Ct is the tracking signal (cusum) for period t. The cusum should fluctuate around zero when the system is in control. Biased errors occur when the cusum departs from zero, and the system is judged out of control when Ct exceeds a control limit.

2. The second indicator is the smoothed-error tracking signal. The tracking signal Tt is measured by using the following set of equations:

Et = α et + (1 - α) Et-1
MADt = α |et| + (1 - α) MADt-1
Tt = | Et / MADt |

where Et represents the smoothed value of the error et.

3. The third indicator is the first-order autocorrelation in the forecast errors. It is more complex than the previous two indicators. The existence of any significant positive autocorrelation indicates lack of control.
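The sketch below updates the cusum and smoothed-error tracking signals period by period as defined above; the smoothing parameter, starting values, and control limit are assumptions chosen for illustration (the paper only notes that starting values must be set).

```python
# Cusum and smoothed-error tracking signals, updated one period at a time.
# alpha, the starting MAD, and the control limit are illustrative assumptions.
alpha = 0.1
control_limit = 4.0          # assumed threshold for flagging the cusum signal

actual   = [100.0, 104.0, 101.0, 110.0, 108.0, 115.0, 125.0, 131.0]
forecast = [101.0, 103.0, 102.0, 108.0, 109.0, 110.0, 112.0, 114.0]

cusum, smoothed_err, mad = 0.0, 0.0, 3.0   # SUM0 = 0; MAD0 set near its expected value

for t, (a, f) in enumerate(zip(actual, forecast), start=1):
    e = a - f
    cusum += e                                   # SUMt = et + SUMt-1
    smoothed_err = alpha * e + (1 - alpha) * smoothed_err
    mad = alpha * abs(e) + (1 - alpha) * mad     # smoothed MAD
    c_t = abs(cusum / mad)                       # cusum tracking signal
    t_t = abs(smoothed_err / mad)                # smoothed-error tracking signal
    flag = "OUT OF CONTROL" if c_t > control_limit else "in control"
    print(f"t={t}: Ct={c_t:.2f}, Tt={t_t:.2f} ({flag})")
```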

For more details and further signals, see Gardner (1985). Applying any of the three methods is recommended; they are easy to use. The only problem a user faces is determining the starting values. For example, MADt-1 and SUMt-1 represent the values at the starting point for the previous period, t-1. SUMt-1 can start with a value of zero, and MADt-1 can be set equal to its expected value. Monitoring forecasts constantly by using one of the tracking signals discussed previously alerts the manager to any problem in the forecasting system. This may lead to managerial intervention in the system through actions such as refitting the model or searching for another model or a combination of models. Also, monitoring the error and adjusting the forecast constantly enables the manager to take other courses of action related to the decisions that are affected by the forecasts, such as inventory and production adjustments.

DISCUSSION AND FUTURE DIRECTIONS

This paper has outlined some of the important issues related to forecasting evaluation. However, organizations or individuals implement the process of evaluation differently. This is due to differences in the knowledge of forecasting available, training facilities, time, data, the computer system, the degree of integration, and the software used. The level of knowledge that a particular forecaster has will be reflected in his or her forecasting performance in a given forecasting situation. Having more knowledge enables the forecaster to apply more techniques and more tests to check the accuracy and the basic assumptions under which a particular method can be implemented. However, one should realize that it does not take a great deal of time or effort to gain this knowledge. Once a particular person or team starts, they will be able to gain more knowledge, especially with the help of the many forecasting software packages available for a

variety of computers. Time is an important factor, especially when comparing those who handle many forecasts in a given time period with those who forecast a limited number of variables during the same time. It is obvious that the latter would be able to apply more accuracy measures, set up different alternatives, and check thoroughly for model specification, parameter performance, etc. For example, forecasters seeking to achieve better forecasting performance are encouraged to spend more time applying different computer runs for their models. This would enable them to specify the parameters that improve the accuracy of the forecasting model. Figures 2, 3, and 4 illustrate the actual sales values of a personal care product, the fitted forecast

Insert Figures 2, 3, and 4 about here

model (ex-ante) and the forecast (ex-post) using Winters' Exponential Smoothing model. Figures 3 and 4 show the results achieved by using the parameter values (α = 0.06, β = 0.17, and γ = 0.10) and (α = 0.06, β = 0.06, and γ = 0.90), respectively. Note that the former parameter values were determined by the software package using its search guide for the optimal values of α and β, given that γ is set at 0.10. One should realize that spending more effort and time would enable the forecaster to achieve better results by choosing the parameter values judgmentally, as applied in Figure 2. In this way the forecaster can identify any seasonal factors and the degree of variability in the data. Studying the data structure carefully will indicate whether the values of the parameters should be increased or decreased. Among many other studies, Mahmoud (1982) showed that using Single Exponential Smoothing and changing the parameter from 0.10 to 0.11 improved the accuracy of the model. Thus, to summarize, it is important that forecasters or managers spend sufficient time applying different attempts (runs) with different values of the parameters (i.e., α, β, γ, etc.) to choose the estimated parameters which provide the better forecast.
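As a small illustration of such parameter experimentation, the sketch below runs single exponential smoothing over a grid of α values and reports the fitted-phase MSE of each; the series, grid, and initialization are assumptions for the example and not the paper's personal care data.

```python
# Grid search over the smoothing parameter of single exponential smoothing,
# ranking candidate alpha values by fitted-phase MSE. Data are hypothetical.
actual = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0, 136.0, 119.0]

def ses_mse(series, alpha):
    """One-step-ahead MSE of single exponential smoothing with the given alpha.
    The first observation is used as the initial level (an assumed choice)."""
    level = series[0]
    sq_errors = []
    for a in series[1:]:
        sq_errors.append((a - level) ** 2)         # forecast for this period is the level
        level = alpha * a + (1 - alpha) * level    # update the level
    return sum(sq_errors) / len(sq_errors)

grid = [round(0.05 * k, 2) for k in range(1, 20)]   # alpha = 0.05 ... 0.95
results = sorted((ses_mse(actual, a), a) for a in grid)

for mse_value, a in results[:3]:                    # three best parameter values
    print(f"alpha = {a:.2f} -> fitted MSE = {mse_value:.1f}")
```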

Applying different combinations of forecasting methods enables the forecaster to achieve more accurate results than relying on an individual method. Integrating forecasting applications into the planning process of the organization helps forecasters to monitor closely any changes and their impact on forecasting. Wright (1986) showed the importance of considering forecasting as part of a Decision Support System and not as a self-contained activity. Finally, the availability of a good forecasting system has a great impact on the forecaster or the decision maker. A good forecasting package or program is very useful in enabling the forecaster to address many of the issues discussed in this paper. For example, Wheelwright and Makridakis (1985) describe in detail two comprehensive forecasting systems, SIBYL-RUNNER and FUTURCAST. The latter consists of a wide range of forecasting techniques; it allows the forecaster to use at least five different accuracy measures and different tools for examining model structure, such as autocorrelation analysis. It also permits combining more than one forecasting technique, and it allows the forecaster to monitor environmental change and include the change in the model. For more details about forecasting software and systems see Beaumont et al. (1985) and Mahmoud et al. (1986). In order to achieve better accuracy, the forecaster should apply more than one accuracy measure to ensure a thorough evaluation. This can be achieved either by using the additional accuracy measures included in the forecasting package currently available or by considering such facilities when buying a package or program. Using an integrated package such as a spreadsheet would also enable the forecaster to apply more accuracy measures. Furthermore, a package which includes comprehensive forecasting methods would enable the forecaster to test the performance of a variety of models. A package or system that provides the forecaster with different tools for diagnostic checking, model specification, and monitoring of the forecast error as was

mentioned earlier would be more useful.

CONCLUSIONS

Managers and forecasters face a great deal of difficulty in selecting and evaluating their forecasts. A systematic procedure should be integrated with the planning activities and treated as a part of the Decision Support System. A good understanding of the state of the art of forecasting techniques, awareness of the most important findings, the availability of different accuracy measures and their uses, the availability of reliable databases, the contribution of different alternatives, the monitoring of environmental changes, and the availability of good computer forecasting systems are important criteria to be considered when evaluating forecasts. The value and the outcome of the evaluation process depend on the organization's databases and on the forecasters' experience, their knowledge of particular forecasting models, and their ability to understand past and current changes.

ACKNOWLEDGEMENTS

The author would like to thank Professors Hossein Shalchi and Gillian Rice for their comments and Rochelle Moleski for help.

Table 1
Summary of Selected Findings

Area of Application: Quantitative Methods versus Judgmental Methods
Main Results: These studies indicated that quantitative methods provided better forecasts than judgmental methods.
Literature Sources: Armstrong (1985); Fildes and Fitzgerald (1981).

Area of Application: Box-Jenkins vs. Exponential Smoothing
Main Results: In a comprehensive study of 111 time series for determining the accuracy of different time series forecasting techniques, the results indicated that Box-Jenkins models were less accurate than moving average and smoothing methods.
Literature Sources: Makridakis and Hibon (1979).

Area of Application: Combining Forecasts Using Simple or Weighted Averages
Main Results: It was found that combining methods, using either simple or weighted averages, provided a good forecast which resulted in an overall better performance on the average than the individual methods included.
Literature Sources: Makridakis et al. (1982); Winkler and Makridakis (1983).

Area of Application: Combining Corresponding Sets of Individual Forecasts
Main Results: The study investigated the accuracy of combining corresponding predictions from different sources and the corresponding sets of individual forecasts. The results showed that there are gains to the forecast users from combining predictions from different sources.
Literature Sources: Zarnowitz (1984).

Area of Application: Forecasting Systems for Reduced Bias in Forecasts
Main Results: The study illustrated different design features of forecasting systems that can be used to improve the performance of any forecasting method.
Literature Sources: Moriarty (1985).

Area of Application: Assessment of the State of the Art of Forecasting
Main Results: The study provided an overall assessment of the state of the art of forecasting and also suggested guidelines for forecasters.
Literature Sources: Makridakis (1986).

Table 2
Forecasting Models and Associated Accuracy Measures
(models listed in rank order of fitted-phase MSE, from low to high)

                                      Fitted Phase (Ex-Ante)            Forecasted Phase (12 Periods Ahead, Ex-Post)
Forecasting Model                   MSE      MPE    MAPE   U-Stat     MSE      MPE    MAPE   Cost of Forecast Error

 1. Harrison's Harmonic            4,760    -8.64   23.8    .23        849     9.04   18.2      $ 54,112
 2. Classical Decomposition        5,585   -10.01   26.2    .28      1,214    13.25   21.3      $ 75,280
 3. Census II                      5,677    -9.01   26.8    .29      1,299   -19.17   24.2      $120,433
 4. Linear Exp. Smooth.            7,276    -7.62   30.4    .37        949    13.90   19.5      $ 68,618
 5. General Adaptive               7,290   -11.58   30.7    .34        889    12.53   18.0      $ 64,636
 6. Quadratic Exp. Smooth.         7,357    -8.39   30.1    .39      1,194    12.32   19.8      $ 70,003
 7. Single Exp. Smooth.            7,758   -14.01   31.8    .34      1,395    28.81   24.2      $ 68,395
 8. Winters' Trend and Seasonal    8,045    -5.85   30.0    .36      1,298    11.38   23.4      $120,705
 9. Adaptive Response              8,336   -15.68   33.2    .40        945    14.81   20.5      $ 69,171
10. Box-Jenkins                    9,877   -11.87   35.0    .46      1,416    17.07   27.2      $ 76,694
11. Holt's Modified               10,445    -8.03   35.2    .57      1,095    12.66   19.8      $ 69,785
12. Time Series Multiple Regr.    11,930    -9.73   34.0    .66      1,410    19.22   23.2      $155,501
13. Naive Method                  13,324   -12.61   40.8   1.00      1,420    -2.31   26.2      $168,426

Notes: MSE = Mean Squared Error; MPE = Mean Percentage Error; MAPE = Mean Absolute Percentage Error; U-Stat = Theil's U-Statistic.
Source: Mahmoud (1984).

Table 3
Rank Order of Models According to 'I' Value and MSE, Series A

Model                                  'I' Value        'I' Value          Mean Squared   Rank Order (by 'I'
                                       (vs. Naive)      (vs. Harrison's    Error (MSE)    Value and by MSE)
                                                        Harmonic)
 1. Harrison's Harmonic                   0.85              --               4,952.6            1
 2. General Adaptive                      0.53             -0.32             6,816.3            2
 3. Classical Decomposition               0.51             -0.34             6,957.4            3
 4. Quadratic Exp. Smooth.                0.48             -0.37             7,209.9            4
 5. Single Exp. Smooth.                   0.43             -0.42             7,515.3            5
 6. Adaptive Response Rate                0.33             -0.52             8,333.6            6
 7. Box-Jenkins                           0.31             -0.54             8,530.6            7
 8. Linear Exp. Smooth.                   0.15             -0.70            10,004.8            8
 9. Linear and Seasonal Exp. Smooth.      0.06             -0.79            10,904.9            9
10. Holt's Exp. Smooth.                   0.03             -0.82            11,199.3           10
11. Naive Model                            --              -0.85            11,595.8           11
12. Time-Series Multiple Regression      -0.03             -0.88            11,905.0           12

Source: Mahmoud, Essam (1984), Gardenfors' 'I' Value: A Comment on the Measurement of the Relative Accuracy of Sales Forecasting Models, Technological Forecasting and Social Change, Vol. 25, No. 4, p. 357.

Figure 1. Sales of Product A: actual values (At) and forecasted values (Ft) plotted over time, showing the fitted phase (ex-ante) from t1 (January 1980) to tn (December 1985, the present) and the forecasted phase (ex-post) from tn+1 (January 1986) to tn+m (June 1986).

Figure 2

Figure 3

Figure 4. Fitted forecast, Product A.

REFERENCES

1. Armstrong, J. S., Long-Range Forecasting: From Crystal Ball to Computer, Second Edition, New York: John Wiley and Sons, 1985.

2. Beaumont, C., Mahmoud, E., and McGee, V., Microcomputer Forecasting Software: A Survey, Journal of Forecasting, Vol. 4, 1985, pp. 305-311.

3. Fildes, Robert, Forecasting: The Issues, in The Handbook of Forecasting: A Manager's Guide, edited by S. Makridakis and S. C. Wheelwright, New York: John Wiley and Sons, 1982, pp. 83-104.

4. Fildes, R. and Fitzgerald, D., The Use of Information in Balance of Payments Forecasting, paper presented at the First International Symposium on Forecasting, Quebec, Canada, May 1981.

5. Gardenfors, P., On the Information Provided by Forecasting Models, Technological Forecasting and Social Change, Vol. 16, 1980, pp. 351-361.

6. Gardner, E. S., Exponential Smoothing: The State of the Art, Journal of Forecasting, Vol. 4, No. 1, 1985, pp. 1-28.

7. Gardner, E. S., Automatic Monitoring of Forecast Errors, Journal of Forecasting, Vol. 2, No. 1, 1983, pp. 1-21.

8. Gardner, E. S. and Dannenbring, D. G., Forecasting with Exponential Smoothing: Some Guidelines for Model Selection, Decision Sciences, Vol. 11, 1980, pp. 370-383.

9. Granger, C. W. J., Prediction with a Generalized Cost of Error Function, Operational Research Quarterly, Vol. 20, No. 2, 1969, pp. 199-207.

10. Mahmoud, Essam, Accuracy in Forecasting: A Survey, Journal of Forecasting, Vol. 3, No. 3, 1984, pp. 139-159.

11. Mahmoud, Essam, Short-term Forecasting: Matching Techniques to Tasks. An Integrated Framework and Empirical Investigation, Ph.D. Dissertation, State University of New York at Buffalo, June 1982.

12. Mahmoud, Essam and Pegels, C., Alternative Methods for Ranking Time Series Forecasting Models, working paper, 1984.

13. Mahmoud, Essam, Goyal, Suresh K., and Shalchi, H., Loss-cost Functions for Measuring the Accuracy of Sales Forecasting Methods, working paper, 1985.

14. Mahmoud, Essam, Rice, Gillian, McGee, Victor, and Beaumont, Chris, Mainframe Specific Purpose Forecasting: A Survey, Journal of Forecasting, Vol. 5, 1986.

15. Makridakis, Spyros, Empirical Evidence versus Personal Experience, Journal of Forecasting, Vol. 2, No. 3, 1983, pp. 295-311.

16. Makridakis, Spyros, The Art and Science of Forecasting: An Assessment and Future Directions, International Journal of Forecasting, forthcoming, 1986.

17. Makridakis, S. and Hibon, M., Accuracy of Forecasting: An Empirical Investigation, Journal of the Royal Statistical Society, Series A, Vol. 142, Part 2, 1979, pp. 97-145.

18. Makridakis, S., Wheelwright, S., and McGee, V. E., Forecasting: Methods and Applications, Second Edition, New York: John Wiley and Sons, 1983.

19. Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Parzen, E., and Winkler, R., The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition, Journal of Forecasting, Vol. 1, 1982, pp. 111-153.

20. Makridakis, S. and Wheelwright, S. C., Forecasting: Framework and Overview, in Forecasting, TIMS Studies in Management Science, Vol. 12, S. Makridakis and S. C. Wheelwright, eds., North-Holland Publishing Company, 1979.

21. Makridakis, S. and Winkler, R., Averages of Forecasts: Some Empirical Results, Management Science, Vol. 29, No. 9, 1983, pp. 987-996.

22. Moriarty, Mark M., Design Features of Forecasting Systems Involving Management Judgments, Journal of Marketing Research, Vol. 22, November 1985, pp. 353-364.

23. Nelson, C. R., The First Order Moving Average Process: Identification, Estimation and Prediction, Journal of Econometrics, Vol. 2, 1974, pp. 121-141.

24. Rice, Gillian and Mahmoud, Essam, Forecasting and the Database: An Analysis of Databases for International Business, Journal of Forecasting, Vol. 4, 1985, pp. 89-97.

25. Steece, Bert, Evaluation of Forecasts, in The Handbook of Forecasting: A Manager's Guide, edited by Spyros Makridakis and Steven C. Wheelwright, New York: John Wiley and Sons, 1982, pp. 457-468.

26. Theil, H., Applied Economic Forecasting, Amsterdam: North-Holland Publishing Company, 1971.

27. Tiao, G. C., Box, G. E. P., and Hamming, W. J., Analysis of Los Angeles Photochemical Smog Data: A Statistical Overview, Journal of the Air Pollution Control Association, Vol. 25, 1975.

28. Winkler, R. L. and Makridakis, S., The Combination of Forecasts, Journal of the Royal Statistical Society, Series A, 1983.

29. Wright, David J., Evaluation of Forecasting Methods for Decision Support, International Journal of Forecasting, forthcoming, 1986.

30. Wheelwright, S. and Makridakis, S., Forecasting Methods for Management, Fourth Edition, New York: John Wiley and Sons, 1985.

31. Zarnowitz, V., The Accuracy of Individual and Group Forecasts from Business Outlook Surveys, Journal of Forecasting, Vol. 3, 1984, pp. 10-27.