Division of Research February 1987 Graduate School of Business Administration CROSS-SECTIONAL DEPENDENCE AND PROBLEMS IN INFERENCE IN MARKET BASED ACCOUNTING RESEARCH Working Paper #501 Victor L. Bernard University of Michigan FOR DISCUSSION PURPOSES ONLY None of this material is to be quoted or reproduced without the expressed permission of the Division of Research.


Cross-sectional Dependence and Problems in Inference in Market Based Accounting Research

by Victor L. Bernard
School of Business Administration
University of Michigan

Initial Draft: November 1985
Revised Draft: February 1986
Revised Draft: June 1986
Revised Draft: October/November 1986
Revised Draft: February 1987

Forthcoming, Journal of Accounting Research, Spring 1987

I am grateful to Pete Wilson, whose comments and suggestions led to several important insights that substantively affected this paper. Craig Ansley, Jack Hughes, Laurentius Marais, and Jim McKeown also deserve credit for helping me to better understand the issues addressed here. Among the many others whose helpful suggestions have affected this research are Ray Ball, Bill Beaver, Dan Collins, Joel Demski, Ken Froot, Robert Holthausen, S. P. Kothari, Richard Leftwich, Tom Linsmeier, Eric Noreen, Pat O'Brien, Katherine Schipper, Tom Stober, Mark Wolfson, Dave Wright, Jerry Zimmerman, and participants in workshops at the University of British Columbia, University of Chicago, Carnegie-Mellon University, the University of Illinois, the University of Iowa, Massachusetts Institute of Technology, the Ohio State University, the University of Rochester, the 1986 Stanford Summer Camp, the University of Washington, and The Wharton School. Any remaining errors are the responsibility of the author.


1. INTRODUCTION

This paper provides a framework and some empirical evidence to evaluate the seriousness of problems in inference that arise in stock-return-based studies when the data are cross-sectionally dependent. The study is motivated on the grounds that statistical procedures designed to address such problems are often infeasible, and even when they can be implemented they sometimes introduce other more serious difficulties. Thus, researchers have frequently adopted an approach that ignores the cross-sectional dependence (e.g., ordinary least squares, or OLS). The objective of this paper is to help identify the contexts in which ignoring the dependence would lead to serious misstatement of significance levels.

Cross-sectional dependence in stock returns data is likely to exist when at least some of the returns are sampled from common time periods. This would be the case in all studies of the reaction of stock prices to a regulatory event (e.g., Leftwich [1981], and Schipper and Thompson [1983]), and in many studies of the relation between stock prices and accounting data (e.g., Ball and Brown [1968], Beaver and Landsman [1983], Beaver, Clarke and Wright [1979], Biddle and Lindahl [1982]). In such contexts, procedures based on the assumption of independence can yield biased estimates of standard errors, and, therefore, can lead to incorrect inferences.

Previous literature provides mixed predictions about the seriousness of bias that would arise when cross-sectional dependence in the data is ignored in typical accounting and finance contexts. Some have described contexts in which the bias may be small. Christie [1986] briefly reviews three accounting studies that involved cross-sectional OLS regressions of stock return metrics against firm-specific variables, and finds no evidence that cross-sectional dependence in the data causes serious bias in standard errors.
Brown and Warner [1980, 1985] draw similar conclusions with regard to event studies in a simulated environment.¹ On the other hand, Collins and Dent [1984] and Sefcik and Thompson [1986] describe hypothetical situations where, when cross-sectional dependence is ignored, true standard errors of estimates would exceed reported standard errors by several orders of magnitude. Beaver, Clarke, and Wright [1979, pp. 329-332], Schipper and Thompson [1983, Table 9] and Hughes and Ricks [1984, Table 2] describe empirical studies in which indicated significance levels vary substantially, depending on whether residual cross-sectional correlation is taken into account when calculating t-values.

¹ This conclusion holds in Brown and Warner, so long as market-wide effects are removed from stock returns metrics. They noted that the generality of their conclusions may be limited, since their study of event-time clustering did not also consider the effects of industry clustering. Dyckman, Philbrick, and Stephan [1984] also investigated simulated event studies, but their study of event day clustering focused only on power, as opposed to the size of tests. It is the error in the reported size of tests that is of concern here.

This paper seeks to identify characteristics of the research context that determine the amount of bias that can arise when cross-sectional dependence is ignored. In the process, the paper attempts to reconcile conflicting statements in the literature regarding the seriousness of the bias. The analysis indicates that, among common research contexts in accounting, there is one in particular where the bias may frequently be important, and where the usual alternatives to OLS would be either infeasible or inadvisable. Within this context, two empirical studies are conducted, and the degree of bias that would exist in OLS-based standard errors is assessed. The studies include one similar to Beaver and Landsman [1983]. The conclusion is that bias in the standard errors of the Beaver-Landsman study might have been large enough to influence conclusions about the incremental information content of historical cost income, relative to current cost income.

A more detailed summary of this paper is as follows. Section 2 discusses the difficulties of overcoming bias due to cross-sectional dependence in the data, including problems that can arise when generalized least squares is applied to sample sizes common in accounting and finance research. The discussion then turns to the implications of ignoring cross-sectional dependency in the data, and using OLS. In order to gain a preliminary indication of the seriousness of the problem, Section 3 provides information about the degree of cross-correlation in market model residuals, the stock return metric most frequently used as a dependent variable in market-based accounting research.
Estimates of the magnitude of cross-correlations are furnished, both within and across industries, and for daily, weekly, monthly, quarterly, and annual data. The evidence suggests that problems in inference may be much more likely when the return interval is long. The underlying reasons for the relevance of the return interval length are surprising, since they could involve (perhaps mild) deviations from weak-form market efficiency.²

² The possibility of deviations from weak-form market efficiency has also been entertained recently by DeBondt and Thaler [1985] and Fama and French [1986a,b].
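The role of the return interval can be illustrated with a small calculation. The sketch below is not part of the paper's analysis: it assumes two unit-variance, covariance-stationary residual series in which only contemporaneous and lag-one correlations are nonzero, and the function name `aggregated_cross_correlation` and its parameters are hypothetical. The illustrative inputs are the monthly magnitudes reported in Section 3 (a mean within-industry cross-correlation of .18, an average first-order serial correlation of -.09, and a lag-one cross-firm correlation of .03).

```python
def aggregated_cross_correlation(m, rho0, phi, c):
    """Cross-correlation of two residual series after summing over m periods.

    Assumptions (illustrative, not taken from the paper): both series have
    unit variance and the same lag-one autocorrelation `phi`, their
    contemporaneous correlation is `rho0`, their lag-one cross-correlation
    `c` is symmetric, and all correlations at longer lags are zero.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    # Covariance of the two m-period sums: m contemporaneous terms plus
    # 2 * (m - 1) lag-one cross terms.
    covariance = m * rho0 + 2 * (m - 1) * c
    # Variance of each m-period sum: m own-variance terms plus
    # 2 * (m - 1) lag-one autocovariance terms.
    variance = m + 2 * (m - 1) * phi
    return covariance / variance


if __name__ == "__main__":
    # Monthly inputs summed to one-, three-, and twelve-month intervals.
    for months in (1, 3, 12):
        value = aggregated_cross_correlation(months, rho0=0.18, phi=-0.09, c=0.03)
        print(months, round(value, 3))
```

Under these assumptions the measured cross-correlation rises as months are summed, even though each individual departure from zero is small. When `phi` and `c` are both zero, the function returns `rho0` for every interval length, which is the invariance one would expect under weak-form efficiency with constant expected returns.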

Section 4 discusses a model of the bias that exists in OLS-based estimates of coefficient variances, and attempts to identify those contexts in which the bias would be most problematic. Most previous studies have examined a special case (a certain type of "event study") where the bias under OLS would frequently be severe; however, this special case is one where feasible alternatives to OLS are likely to be available. Whether serious bias would exist in remaining contexts depends on factors not considered heretofore. These remaining contexts include cross-sectional OLS regressions of stock return metrics against firm-specific variables. Although several have suggested that bias in such studies should not be serious so long as the sample is well-diversified across industries, the analysis shows that the bias can be serious no matter how well-diversified the sample may be. Furthermore, Christie's [1986] conclusion that cross-sectional dependence does not appear to create serious problems in these studies may not extend beyond studies based on short (daily or weekly) return intervals. For studies involving cross-sectional regressions of quarterly or annual stock return metrics against firm-specific variables (e.g., Beaver and Landsman [1983]), it appears that the use of OLS might frequently lead to serious bias in standard errors, depending on certain properties of the regressors, and the sample size. Unfortunately, it is this class of studies for which approaches to dealing directly with cross-sectional dependencies are most difficult, and for which there is little empirical evidence concerning the magnitude of the bias that might result if the dependencies are ignored. Section 5 examines two cross-sectional returns studies based on quarterly and annual data, and estimates the bias that persists in OLS-based standard errors. Both studies employ more time series data than were available in prior research, which permits analyses of bias that were previously infeasible. 
One application is a study of the incremental information content of current cost income, based on a methodology modeled after Beaver and Landsman [1983]. The data in our application span 20 years for 113 firms. The second study examines the incremental information content of cash flows and accruals; it employs quarterly balance sheet data that have been disclosed since 1976. In both studies, the conclusion is that denominators of t-values from OLS cross-sectional regressions would contain a substantial degree of bias, at least in large samples (e.g., in excess of 200 firms).

Section 6 provides a summary and offers some suggestions about approaches that might prove fruitful in future studies that deal with time-clustered data.

2. THE NATURE OF THE PROBLEM

Consider a regression where the dependent variable is a stock return metric (e.g., market model residuals), and where the regression residuals are potentially correlated in cross-section. Although ordinary least squares can provide unbiased coefficient estimates in such a context, OLS-based estimates of the corresponding standard errors would generally be biased. Thus, hypothesis tests that rest on OLS-based standard errors could lead to incorrect inferences.

There are several approaches that could potentially overcome the bias due to residual cross-correlation, including 1) two-stage generalized least squares techniques, including seemingly unrelated regression and variance components models, 2) cross-sectional aggregation of the data, and 3) use of a multi-index version of the market model to measure the dependent variable. In this section, we review the difficulties that arise in applying these and certain other approaches.

2.1. Generalized least squares (GLS). A commonly used approach to dealing with cross-sectional correlation is to use a feasible Aitken estimator.³ One example of this approach is Zellner's seemingly unrelated regression (SUR); another example, which can be viewed as a special case of SUR where regressors may be identical across equations, is sometimes called a multivariate regression model (MVRM). One disadvantage of these approaches is that error in the estimated covariance matrix introduces inefficiency in estimation of coefficients.
Another disadvantage, one that can be crucial in hypothesis testing, is that finite sample properties of SUR and MVRM estimators are known only in certain highly specialized cases.⁴ As a result, researchers tend to rely on asymptotic properties of estimators. Since asymptotic standard errors of SUR and MVRM coefficients ignore variation in the coefficients attributable to sampling error in the residual covariance matrix, there is reason to suspect that asymptotic standard errors may often understate true standard errors in finite samples. The understatement can be serious in accounting and finance contexts, where researchers have often estimated a covariance matrix for a large number (20 to 40) of firms or portfolios. Since the length of a time series over which researchers might be willing to assume stationarity is relatively short (perhaps five years), the number of observations may be small relative to the number of parameter estimates, and the amount of sampling error in the covariance matrix can be large. This is especially likely when return intervals of a month or longer are used.⁵

³ A feasible Aitken estimator is a GLS estimator in which the correct residual covariance matrix is replaced by an estimate.

Some recent research provides examples of finite-sample GLS estimates that may be severely biased. In an accounting context where monthly data were used, Marais [1986] compares standard errors from a MVRM to those based on bootstrapping techniques. If the bootstrap standard errors are accepted as "correct," then the GLS-based standard errors are only 1/3 to 1/20 as large as they should be. Hughes and Ricks [1984, Table 9] provide another accounting context (based on weekly data) in which a conventional SUR-based F-test (Theil's F) rejects the null 43 percent of the time at the 5 percent level, during periods when the null should be true.⁶

Some alternative GLS techniques may be less susceptible to bias resulting from estimation error in the residual covariance matrix. These techniques impose assumptions on the structure of the covariance matrix, and thereby reduce the number of parameters to be estimated. One such approach is the error components model (see Judge, et al. [1980, section 8.4.2]).
This approach has rarely been used in accounting. (One example is Ryan [1986], who concluded that the error components model led to misspecification in his context.) Whether such techniques improve upon those discussed above depends on the cost of misspecification introduced by imposing assumptions on the covariance matrix, relative to the benefit of reducing estimation error.⁷

⁴ Phillips [1985] derives the exact finite sample distribution of the two-stage GLS estimator. However, the mathematical description of the distribution is not in a form that would facilitate its use for empirical work, except in very special cases that are not of interest here. I am indebted to Craig Ansley and Laurentius Marais for discussions of this point.

⁵ Gibbons [1982], Stambaugh [1982], Schipper and Thompson [1983], Hughes and Ricks [1984], Bernard [1986], and Lipe [1986] represent applications where residual covariance matrices were estimated using only 3 to 6 observations per parameter, and thus may contain a substantial amount of estimation error.

⁶ Hughes and Ricks [1984], Binder [1985] and Schipper and Thompson [1985] all describe a more conservative test (based on Rao's F), the properties of which are sometimes known in finite samples; however, this test can be used only in certain special cases. Specifically, it applies to hypotheses about linear restrictions on coefficients, when regressors are identical across equations. This point is discussed further in Section 4.

A final note that applies to each of these GLS techniques concerns their feasibility. Even the more parsimonious GLS techniques, such as the error components model, require a time series of data, and would not be useful when only a few cross-sections of data are available (as is the case, for example, in studies of the information content of FAS No. 33 disclosures). More demanding estimation procedures, such as SUR and MVRM, are infeasible unless the number of time-series observations exceeds the number of cross-sectional units (firms or portfolios); the effective number of time-series observations necessary to yield satisfactory estimates may be much larger.

2.2. Cross-sectional aggregation of the data. Jaffe [1974], Beaver, Clarke, and Wright [1979], Burgstahler and Noreen [1986], and Sefcik and Thompson [1986] all describe methods of aggregating data so as to form a single time series of observations. Hypothesis tests can then be based on the standard deviation in this series of (presumably) independent observations. Of course, this approach is feasible only if a time series of reasonable length exists. Another disadvantage is that, depending on the structure of the model and the method used to aggregate the data, the approach can involve the loss of information and, thus, a reduction in power. A third disadvantage is that conventional applications of this approach rest ultimately on the assumption that the variance of returns in "information periods" is equal to that in "noninformation" periods. Christie [1983] suggests that, for an "average" information period, this approach would have led to understatement of standard errors for each of eleven randomly selected firms, by factors of 2 to 10.
(Collins and Dent [1984] describe one approach that could potentially mitigate this final difficulty.)

⁷ Mundlak [1978] shows that when the error components model is estimated properly, it yields estimates that are equivalent (in expectation) to those of fixed effects models, in which the regression coefficients depend only on within-group variance. Thus, if the groups are defined as industries, any power of the test would have to be derived solely from within-industry variation.

2.3 Multi-index return-generating models. Although residuals from a single-index market model are typically used as the element of analysis in most market-based accounting research, a model that incorporates extra-market factors could potentially reduce residual cross-correlation. (Matching on industry and other factors can accomplish a similar result.) Collins, Rozeff, and Dhaliwal [1981] inserted an industry factor in the market model; use of a multi-index model based on the arbitrage pricing theory is an alternative approach.⁸ Aside from not necessarily eliminating cross-correlation in the data, a disadvantage of these approaches is that they may capture and eliminate a portion of the treatment effect that the researcher is attempting to detect.⁹ For example, consider adding industry indices to a market model in a test of the incremental information content of current cost data. The residuals of the expanded market model would reflect only within-industry variation; to the extent that inflation adjustments are affected by factors that vary across but not within industry, the probability of detecting information content is reduced.

2.4 Other approaches. There are other approaches that have been used infrequently (if at all) in accounting or finance, but which could, in certain cases, be used to address problems in inference due to cross-sectional dependence. Computer-intensive methods, including bootstrap methods (Efron [1979]) and randomization (Edginton [1980]), are among the possibilities. Bootstrap methods may be useful in cases where there are enough data to estimate a residual covariance matrix, even if the estimates contain a substantial amount of sampling error (see Marais [1986]). Randomization techniques are especially useful when time can be segregated into "event periods" and "non-event periods." Then statistics estimated within event periods can be compared with distributions of statistics generated from randomly selected non-event periods to form the basis for a significance test (see, for example, Noreen and Sepe [1981] and Lys [1984]).¹⁰

Another possibility is to maintain OLS for purposes of estimating regression coefficients, but to estimate the standard error of the OLS coefficient while taking residual dependence into account.
⁸ In practice, modeling of industry effects may be a complex process that cannot be accomplished by simply inserting an industry index, or by matching. For a good example of the difficulties that can arise, see Hughes, Magat, and Ricks [1986].

⁹ For a detailed examination of conditions under which an industry index would eliminate all or a portion of the treatment effect, see Salamon [1985].

¹⁰ Randomization is also potentially useful in any other context where the researcher is interested in testing the null hypothesis of stochastic independence between two variables. Randomization provides valid tests of this hypothesis even when the data are not independent (Noreen [1986]). It is important to note, however, that the patterns of cross-correlation that cause the bias of concern in this paper are included among the forms of stochastic dependence that would, except in special cases, lead to rejection of the null in a randomization test. Thus, if the researcher is interested in a more specific null hypothesis (for example, that a regression coefficient is equal to zero), then randomization does not necessarily "overcome" the cross-correlation problem discussed here.

The generalized-methods-of-moments estimator (Hansen [1982]) developed by Froot [1987] accomplishes this. Froot's approach is a generalization of that recommended by White [1980] for dealing with heteroscedasticity. Although the approach has not yet been applied in any empirical context, it may prove useful even in some cases where a long time series of data is not available. 3. THE MAGNITUDE OF CROSS-SECTIONAL DEPENDENCE IN MARKET MODEL RESIDUALS We now turn to consideration of the effects of ignoring dependence in the data and using OLS. The bias in OLS-based standard errors depends on several factors (to be discussed in Section 4), but one important factor is the degree of cross-sectional dependence in the regression residuals. To obtain some preliminary evidence concerning likely magnitudes of cross-sectional correlation, this section provides estimates of the degree of cross-correlation in market model residuals. Market model residuals or abnormal returns serve as the dependent variable in most stock-return-based accounting studies.ll To the extent that correlations in the dependent variable translate to regression residuals, the estimates will be useful in our subsequent discussions of the degree of bias in different contexts. (Section 5 examines two accounting studies in which the degree of correlation in the dependent variable persists in the regression residuals.) Table 1 summarizes estimates of cross-correlation for daily, weekly, and monthly residuals; in addition, monthly residuals are cumulated to create quarterly and annual data. The table is based on data for all firms for which returns were continuously available on CRSP during the test period noted, and for which no change occurred in the 3-digit SIC code.12 In addition, firms from any 11 Note that the analysis focuses on residuals from the period used to estimate the market model. 
Prediction errors from outside the estimation period are likely to exhibit even more cross-correlation, due to out-of-sample shifts in marketmodel parameters that covary across firms. For that reason, estimates of the degree of cross-correlation presented here may underestimate that which would exist in studies based on market model prediction errors. 12 Firms were retained for tests based on daily data only if they met requirements for inclusion in tests based on weekly data; that is, returns must have been continuously available over the 1981-1984 period. In selecting firms used in tests based on annual data, changes in SIC codes that resulted from July 1962 reclassification of industries were ignored. 8

3-digit SIC category were excluded unless that category included at least three firms (five in the case of daily and weekly data). Table 1 was constructed as follows. First, firm-specific market models were estimated, where the return on the market was defined as the value-weighted NYSE index.13 Residuals from the market models serve as the basic unit of analysis in the case of daily, weekly, and monthly data.14 To generate quarterly and annual data (and in keeping with conventional practice), market models were first estimated using monthly data, and then monthly residuals were summed to arrive at residuals for quarterly or annual periods.15 Note that the test period for monthly, quarterly, and annual periods extends over 20 to 30 years. However, the estimation assumed that the parameters of the market model remained fixed only over nonoverlapping 5-year intervals within each test period. The second step was to calculate contemporaneous cross-sectional correlations in the residuals, using all available time-series observations. Pairwise correlations were calculated among all firms within each 3-digit SIC industry. In addition, one firm from each industry (that with the lowest CUSIP number) was selected to calculate inter-industry correlations. 16 Some data concerning the content of the sample are presented in the first panel of Table 1. Statistics describing intra-industry cross-sectional correlations appear in the second panel of Table 1; those describing inter-industry cross-sectional correlations appear in the third and final panel. One conclusion to be drawn from Table 1 is that the degree of cross-sectional correlation rises dramatically as the observation interval is enlarged from daily periods through annual periods. The mean intra-industry cross-sectional correlation is only.04 when daily data are used; the mean 13 Results based on an equally-weighted NYSE index are similar. 
Within-industry cross-sectional correlations in market model residuals tend to be slightly lower than those reported in Table 1. 14 For purposes of measuring weekly returns, a week is defined as five consecutive trading days, rather than a calendar week. 15 An alternative to summing monthly residuals would be to compound monthly excess returns. Sums were used here because they are more frequently used in practice (e.g., Beaver, Clarke, and Wright [1979], Biddle and Lindahl [1982], Rayburn [1986], Bowen, Burgstahler, and Daly [1986]). 16 Only one firm was selected from each industry, so that each correlation between selected firms would represent a crossindustry correlation, and thus tests of hypotheses that all such correlations are simultaneously equal to zero could be conducted using conventional procedures. 9

rises to.09 in weekly data,.18 in monthly data,.24 in quarterly data, and.30 in annual data.17 (A similar effect is observed when the sample firms and periods are held constant across the last three columns of Table 1.18) If noncontemporaneous within- and cross-firm correlations were zero, as would be assumed under weak-form market efficiency with constant expected returns, then the expected degree of cross-sectional correlation in residuals would be invariant to the length of the observation interval. However, even slight deviations of such correlations from zero can cause large differences in measures of cross-sectional correlation based on observation intervals of varying lengths. In this case, small average negative within-firm serial correlation in residuals combines with slightly positive cross-firm, noncontemporaneous correlations to give rise to the effect noted in Table I.19 Appendix A explores some possible explanations for the effect. It is unlikely that the effect could be explained by "noise" in returns caused by, for example, bid-ask spreads or non-synchronous trading. Furthermore, the effect is not fully consistent with an explanation based on non-stationarity in market model parameters. One possible explanation is (perhaps mild) departures from weak-form market efficiency, possibly in combination with market model parameter non-stationarity. Departures from weak-form market efficiency have been entertained recently by DeBondt and Thaler [1985] and Fama and French [1986a,b].20 A second observation about Table 1 is that the variance of within-industry cross-correlations, like the mean, rises as the observation interval increases. The standard deviation (across industry means) in residual cross-correlations that is not attributed to sampling error rises from.05 in daily 17 Table 1 provides estimated standard errors for each of these grand means. 
The estimates take into account the covariance among cross-correlations estimated for pairs of firms within the same industry, but assume that means for different industries are uncorrelated. The estimation included two steps. First, the variance of the mean cross-correlation for each industry was estimated, by employing the properties of elements of a Wishart matrix (see Press [1972, section 5.1.5]). In that calculation, estimated cross-correlations were substituted for the correct (but unknown) cross-correlations. The second step was to estimate the variance of the grand mean of the industry means, which was assumed equal to the sum of the variances of means. 18 When the same 274 firms used in Table 1 to calculate cross-correlations in annual data for 1955-1984 were also used to calculate cross-correlations in monthly and quarterly data for the same period, the mean within-industry cross-correlations are.19 for monthly data and.26 for quarterly data, compared to.30 for annual data. 19 First-order serial correlations on a within-firm basis, when averaged across firms, were -.01 in daily data, -.09 in weekly data, -.09 in monthly data, and -.12 in quarterly data. Serial correlations at longer lags were, for the most part, smaller in absolute value but still negative. Average across-firm, within-industry, noncontemporaneous correlations for lag one were.01 in daily data,.02 in weekly data,.03 in monthly data, and -.03 in quarterly data. 20 DeBondt and Thaler [1985] discuss an apparently profitable trading strategy based on negative within-firm serial correlation in abnormal returns. Fama and French [1986b] are able to replicate the result, but question its robustness. Fama and French [1986a] document strong negative serial correlation in returns over long intervals at the market-wide level, and find especially strong negative serial correlations in the returns of small firms. 
The evidence is consistent either with market inefficiency, or with an equilibrium model that permits fluctuating expected returns. 10

data to.10 in monthly data, and.15 in annual data.21 Subsequent sections demonstrate that this increase in the variation can translate into more serious bias in studies based on long return intervals. A final observation about Table 1 is that the degree of inter-industry cross-sectional correlations is small relative to intra-industry correlations. (The hypothesis that all inter-industry correlation are simultaneously equal to zero can be rejected at the.05 level, but this result is not surprising, given how narrowly defined the industries are in Table 1.) An interesting empirical question is whether the inter-industry cross-correlations are large enough to cause serious bias in OLS-based standard errors; Section 5 will present two studies in which the answer appears to be negative. 4. THE LIKELIHOOD OF BIAS IN TYPICAL RESEARCH CONTEXTS We now develop a model of bias in OLS-based standard errors which, in conjunction with the data already discussed, can help identify research contexts in which the bias is most likely to be serious. The following questions will be addressed: 1. What factors determine the extent of the bias? 2. Under what conditions is the bias likely to be serious in common classes of studies ("event studies" and "cross-sectional returns studies")? 3. Can diversification across industries mitigate or eliminate the problem? 4. What is the relation between sample size and the bias? 5. What is the effect of the choice of return interval (daily, weekly, etc.) on the bias? 4.1 A model of bias in OLS-based variance estimates. Assume we have data for N firms for each of T periods. We consider a regression of an (NT x 1) vector of stock return metrics, denoted 21 This estimate was obtained by subtracting, from the observed variance in industry means, the variance that is estimated to be due to sampling error. The sampling variance across industry means was estimated as the mean of the sampling variances of the industry means. 
In turn, the sampling variance of each industry mean was estimated by employing the properties of elements of a Wishart matrix (see Press [1972, section 5.1.5]). In that calculation, estimated cross-correlations were substituted for the correct (but unknown) cross-correlations.

by $\mathbf{R}$, against an $(NT \times K)$ matrix of independent variables denoted by $\mathbf{X}$. (Boldface Roman letters are used to denote vectors and matrices.)

$$\mathbf{R} = \mathbf{X}\mathbf{B} + \mathbf{e}. \tag{1}$$

We assume that the residual vector $\mathbf{e}$ has mean zero, and is uncorrelated with each of the columns of $\mathbf{X}$. We also assume that the residuals are serially independent,22 and that the residual covariance matrix is constant over time. (If $\mathbf{X}$ is viewed as stochastic, our assumptions about the behavior of $\mathbf{e}$ hold conditionally on $\mathbf{X}$.) Under these assumptions, we can write the residual covariance matrix $E[\mathbf{e}\mathbf{e}']$ as follows:

$$E[\mathbf{e}\mathbf{e}'] = \mathbf{V} = \sigma^2\mathbf{A} = \sigma^2[\mathbf{I}_T \otimes \mathbf{P}], \tag{2}$$

where $\sigma^2 = \mathrm{Trace}(E[\mathbf{e}\mathbf{e}'])/NT$; $P_{ij} = \sigma_i\sigma_j\rho_{ij}$; and $P_{ij} = \sigma_i^2$ for $i = j$.

The $(N \times N)$ matrix $\sigma^2\mathbf{P}$ contains the variances and contemporaneous covariances between residuals. The matrix is defined so as to permit contemporaneous residual cross-correlations to take on any pattern, and also permits cross-sectional heteroscedasticity in the residuals. The diagonal elements of $\sigma^2\mathbf{P}$ are $\sigma^2\sigma_i^2$, where $\sigma^2$ is the average (cross-firm) residual variance, and $\sigma_i^2$ captures the deviation of firm i's residual variance from the average. For firms with larger-than-average residual variance, $\sigma_i^2$ exceeds 1, and vice versa. The off-diagonal elements of $\sigma^2\mathbf{P}$ are $\sigma^2\sigma_i\sigma_j\rho_{ij}$, where $\rho_{ij}$ represents residual cross-correlations.

22 This assumption obviously cannot be problematic when we discuss studies conducted within a single cross-section. In a pooled time-series cross-sectional analysis, serial correlation in market model residuals could induce bias in standard errors in addition to that considered here. This bias is likely to be of second-order importance, however.

The vector $\mathbf{B}$, the matrix $\mathbf{X}$, and the dimension $K$ can be defined so as to permit the regression coefficients to vary across firms, or to be equal across firms. If all the coefficients are permitted to vary across firms, we have the classic seemingly unrelated regression framework. In that framework, OLS provides unbiased estimates of coefficient variances. In contrast, if at least one coefficient is constrained to be equal across firms, then the OLS estimate of the variance of that coefficient can be biased. This latter case is the one that motivates this paper; it includes nearly all cross-sectional returns studies (e.g., Beaver and Landsman [1983], Leftwich [1981]), as well as event studies that focus on the mean event-period return.

In the analysis which follows, we focus on the variance of any coefficients that are constrained to be equal across all firms in the sample, which implies that each associated regressor will occupy a single column in $\mathbf{X}$ (i.e., values of the regressor for different firms will be "stacked"). The essence of the analysis is to assume that OLS is used to estimate the coefficient vector $\mathbf{B}$, and then to compare the correct variance of an OLS coefficient with the OLS-based estimate of the variance. Denote the correct covariance matrix of the OLS coefficients as $\mathbf{C}$; it is equal to $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{A}\mathbf{X})(\mathbf{X}'\mathbf{X})^{-1}$. Denote the expected value of the OLS estimate of the covariance matrix as $E[\hat{\mathbf{C}}]$; it is equal to $E[\hat{\sigma}^2](\mathbf{X}'\mathbf{X})^{-1}$. Throughout the remainder of this paper, we will refer to a measure of bias, equal to the correct variance of an OLS coefficient, divided by the expected value of its estimate. For example, for the variance of the kth coefficient, the measure of bias is equal to the kth diagonal element of $\mathbf{C}$ (written here as $C_{kk}$), divided by the kth diagonal element of $E[\hat{\mathbf{C}}]$ (written here as $E[\hat{C}_{kk}]$):

$$\text{Bias} = \frac{C_{kk}}{E[\hat{C}_{kk}]} = \left[\frac{\sigma^2}{E[\hat{\sigma}^2]}\right]\left[\frac{\mathbf{l}_k'(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{A}\mathbf{X})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{l}_k}{\mathbf{l}_k'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{l}_k}\right], \tag{3}$$

where $\mathbf{l}_k$ = $(K \times 1)$ vector with kth element equal to one, and other elements equal to zero.
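As an illustration only, the second bracketed factor of equation (3) can be computed directly for a small stacked regression. The sketch below assumes a hypothetical sample of four firms in two industries observed for two periods, homoscedastic residuals, and a within-industry residual correlation of 0.3; none of these numbers come from Table 1, and the helper names are invented for the example. The factor $\sigma^2/E[\hat{\sigma}^2]$ is ignored.

```python
# Illustrative sketch of the bias measure in equation (3); all data are hypothetical.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a 2 x 2 matrix (enough for an intercept plus one regressor)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kron_identity(T, P):
    """Build I_T (x) P: a block-diagonal matrix holding T copies of P."""
    N = len(P)
    A = [[0.0] * (N * T) for _ in range(N * T)]
    for t in range(T):
        for i in range(N):
            for j in range(N):
                A[t * N + i][t * N + j] = P[i][j]
    return A

def bias_ratio(X, A, k):
    """Second bracketed factor of equation (3) for coefficient k."""
    Xt = transpose(X)
    xtx_inv = inv2(matmul(Xt, X))
    correct = matmul(matmul(xtx_inv, matmul(matmul(Xt, A), X)), xtx_inv)
    return correct[k][k] / xtx_inv[k][k]

# Hypothetical sample: firms 0-1 and firms 2-3 form two industries.
P = [[1.0, 0.3, 0.0, 0.0],
     [0.3, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.3],
     [0.0, 0.0, 0.3, 1.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [1.0, 0.8, -1.0, -0.8, 0.5, 0.7, -0.6, -0.6]  # stacked period by period
X = [[1.0, v] for v in x]                         # intercept plus one regressor

ratio_dependent = bias_ratio(X, kron_identity(2, P), k=1)
ratio_independent = bias_ratio(X, kron_identity(2, identity), k=1)
```

In this constructed example the regressor is clustered by industry in the same way as the residuals, so `ratio_dependent` exceeds one, while `ratio_independent` equals one because the residuals are independent.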
In general, the measure of bias in the estimated variance of a coefficient is a cumbersome function of values of all regressors, as well as residual covariances. However, it is possible to derive a simple, yet equivalent representation of the bias if we consider a transformation of the

problem. The simplification is made possible by viewing the $\mathbf{X}$ matrix as one that has already been transformed so that its kth column is orthogonal with respect to each of the other columns.23 This transformation leaves unaffected the coefficient of the kth regressor and the estimated standard error of that coefficient (Christie, Kennelly, King, and Schaefer [1984]). Thus, so long as we restrict attention to the OLS estimates corresponding only to the kth regressor, the problem is equivalent to that existing before the transformation. Now, however, the measure of bias to be used here can always be written in terms of only the kth column of the transformed $\mathbf{X}$ matrix, rather than the entire (untransformed) matrix of regressors. When the kth regressor is orthogonal with respect to all other regressors, then equation (3) simplifies to

$$\text{Bias} = \frac{C_{kk}}{E[\hat{C}_{kk}]} = \left[\frac{\sigma^2}{E[\hat{\sigma}^2]}\right]\left[(\mathbf{X}_k'\mathbf{X}_k)^{-1}(\mathbf{X}_k'\mathbf{A}\mathbf{X}_k)\right], \tag{4}$$

where $\mathbf{X}_k$ = the kth column of the matrix $\mathbf{X}$.

Greenwald [1983] derives a measure of bias in OLS-based variance estimates for a general case that would include the model described by equations (1) and (2). The measure of bias given by equation (3) is a transformation of that used by Greenwald, and thus the measure of bias given by equation (4) is a transformation of Greenwald's measure for the case where the kth regressor is orthogonal to the remaining regressors. Greenwald explains that in most contexts, the term $\sigma^2/E[\hat{\sigma}^2]$ in equation (3) or (4) will have little impact, relative to the remaining component of the bias. For that reason, we will focus on the remaining component. We can rewrite that component of the bias in equation (4) as shown below, by employing equation (2).

23 The transformation is accomplished by regressing the kth regressor against all remaining regressors, and then using the residuals of that regression in place of the kth column of the original $\mathbf{X}$ matrix. This approach is used, for example, in Beaver and Landsman [1983].
Since our interest lies only in coefficients that are constrained to be equal across firms, the kth column of the original X matrix contains values for all firms, and the orthogonalizing transformation involves either a single cross-sectional regression (if T = 1), or a single pooled regression (if T > 1).

$$\text{Bias} = \frac{C_{kk}}{E[\hat{C}_{kk}]} \approx (\mathbf{X}_k'\mathbf{X}_k)^{-1}(\mathbf{X}_k'\mathbf{A}\mathbf{X}_k) = (\mathbf{X}_k'\mathbf{X}_k)^{-1}(\mathbf{X}_k'[\mathbf{I}_T \otimes \mathbf{P}]\mathbf{X}_k)$$

$$= 1 + \frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}w_{ij} + \frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq i}(\sigma_i\sigma_j - 1)\rho_{ij}w_{ij} + \frac{1}{N}\sum_{i=1}^{N}(\sigma_i^2 - 1)U_i/U, \tag{5}$$

where $w_{ij} = \left(\frac{1}{T}\sum_{t=1}^{T}X_{it}X_{jt}\right)/U$; $U_i = \frac{1}{T}\sum_{t=1}^{T}X_{it}^2$; $U = \frac{1}{N}\sum_{i=1}^{N}U_i$; and $X_{it}$ = value of kth regressor for firm i, period t.

Equation (5) decomposes the bias in OLS-based variance estimates into three parts. The first term beyond '1' on the right-hand side of (5), $\frac{1}{N}\sum_{i}\sum_{j \neq i}\rho_{ij}w_{ij}$, captures the bias due solely to residual cross-correlation. The last term, $\frac{1}{N}\sum_{i}(\sigma_i^2 - 1)U_i/U$, captures bias due solely to residual heteroscedasticity (across firms). The term in the middle, $\frac{1}{N}\sum_{i}\sum_{j \neq i}(\sigma_i\sigma_j - 1)\rho_{ij}w_{ij}$, captures bias that arises due to an interaction between cross-correlation and heteroscedasticity. The purpose of this decomposition is to facilitate linking the model with data later in Section 4.

Observations on the bias due to cross-correlation. The bias due solely to cross-correlation, $\frac{1}{N}\sum_{i}\sum_{j \neq i}\rho_{ij}w_{ij}$, depends on the residual cross-correlations, $\rho_{ij}$, and certain functions of the regressors, denoted $w_{ij}$. The terms $w_{ij}$ will be labeled here as regressor cross-correlations, or measures reflecting regressor cross-correlation. Each $w_{ij}$ term can be written as

$$w_{ij} = \frac{\frac{1}{T}\sum_{t=1}^{T}X_{it}X_{jt}}{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T}X_{it}^2}.$$

This ratio is equal to the calculated sample cross-sectional correlation between values of the kth regressor for firms i and j if the calculation assumes that a) values of the kth regressor for each firm have an expected value of zero, and b) the kth regressor is homoscedastic across firms, so that a pooled variance estimator is used. Of course, in general, these assumptions would not hold.24

24 Condition (a) would, however, be guaranteed when the kth regressor is orthogonalized, so long as $E[X_{it}]$ is identical across i. This would be the case when the regressor represents the unexpected component of some accounting variable.
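The decomposition in equation (5) can be checked numerically. The following sketch is illustrative only: the regressor values, relative standard deviations, and residual correlations are hypothetical, and the function names are invented for the example. It computes the three terms beyond the leading 1 and compares their sum with the quadratic-form ratio evaluated directly.

```python
import math

def bias_decomposition(x, sig, rho):
    """Three terms of equation (5) beyond the leading 1.

    x[i][t]  : k-th (orthogonalized) regressor for firm i in period t
    sig[i]   : firm i's relative residual standard deviation
    rho[i][j]: residual cross-correlation between firms i and j
    """
    N, T = len(x), len(x[0])
    U_i = [sum(v * v for v in x[i]) / T for i in range(N)]
    U = sum(U_i) / N
    w = [[sum(x[i][t] * x[j][t] for t in range(T)) / T / U
          for j in range(N)] for i in range(N)]
    pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
    cross = sum(rho[i][j] * w[i][j] for i, j in pairs) / N
    interaction = sum((sig[i] * sig[j] - 1.0) * rho[i][j] * w[i][j]
                      for i, j in pairs) / N
    hetero = sum((sig[i] ** 2 - 1.0) * U_i[i] / U for i in range(N)) / N
    return cross, interaction, hetero

def direct_ratio(x, sig, rho):
    """(X_k'X_k)^-1 (X_k'[I_T (x) P]X_k), with P_ij = sig_i * sig_j * rho_ij."""
    N, T = len(x), len(x[0])
    num = sum(sig[i] * sig[j] * (1.0 if i == j else rho[i][j])
              * x[i][t] * x[j][t]
              for t in range(T) for i in range(N) for j in range(N))
    den = sum(v * v for row in x for v in row)
    return num / den

# Hypothetical inputs: three firms, two periods; the sig[i]**2 average to one.
x = [[1.0, -0.5], [0.8, -0.2], [-1.2, 0.9]]
sig = [math.sqrt(0.5), 1.0, math.sqrt(1.5)]
rho = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.1], [0.1, 0.1, 1.0]]

cross, interaction, hetero = bias_decomposition(x, sig, rho)
total = 1.0 + cross + interaction + hetero
```

Under these assumptions `total` agrees with `direct_ratio(x, sig, rho)` up to rounding, which is the content of equation (5).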

Nevertheless, it is useful to attach a brief label to the term $w_{ij}$, and we will refer to it as regressor cross-correlation in this paper.

The term in (5) that describes the bias due solely to cross-correlation, $\frac{1}{N}\sum_{i}\sum_{j \neq i}\rho_{ij}w_{ij}$, can be rewritten as follows:25

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}w_{ij} = (N-1)\left[\frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}w_{ij}\right] = (N-1)\,\mathrm{Cov}_{i \neq j}[\rho_{ij}, w_{ij}] + (N-1)\bar{\rho}\bar{w}, \tag{6}$$

where $\bar{\rho} = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}$ and $\bar{w} = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j \neq i}w_{ij}$.

When the kth regressor is orthogonal to the remaining regressors, including an intercept, then the kth regressor has a zero mean. In that case, $\bar{w} = -1/(N-1)$,26 and (6) simplifies to:

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}w_{ij} = (N-1)\,\mathrm{Cov}_{i \neq j}[\rho_{ij}, w_{ij}] - \bar{\rho}. \tag{7}$$

An equivalent result was first obtained within a context this general by Greenwald [1983, Proposition 1].

25 The term $\mathrm{Cov}_{i \neq j}[\rho_{ij}, w_{ij}]$ cannot literally be interpreted as a covariance, since $\rho_{ij}$ and $w_{ij}$ are not random variables. The operator Cov is used here as a descriptive measure, defined by equation (6). It is referred to as a covariance for convenience.

26 This can be shown as follows:

$$\bar{w} = \frac{1}{TN(N-1)U}\sum_{t=1}^{T}\sum_{i=1}^{N}\sum_{j \neq i}X_{it}X_{jt} = \frac{1}{TN(N-1)U}\sum_{t=1}^{T}\sum_{i=1}^{N}X_{it}\left(\sum_{j=1}^{N}X_{jt} - X_{it}\right).$$

Since $\sum_{i=1}^{N}X_{it} = 0$, we have:

$$\bar{w} = \frac{-1}{TN(N-1)U}\sum_{t=1}^{T}\sum_{i=1}^{N}X_{it}^2 = \frac{-TNU}{TN(N-1)U} = -1/(N-1).$$
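The result in footnote 26 and the simplification in equation (7) can be illustrated with a short numerical check. The sketch below is hypothetical throughout: the raw regressor values and the residual correlation matrix are made up, and the helper names are not from the paper. It assumes, as footnote 26 does, that the regressor sums to zero across firms in every period.

```python
def regressor_cross_correlations(x):
    """w_ij as defined below equation (5); x[i][t] is firm i's regressor in period t."""
    N, T = len(x), len(x[0])
    U = sum(v * v for row in x for v in row) / (N * T)
    return [[sum(x[i][t] * x[j][t] for t in range(T)) / T / U
             for j in range(N)] for i in range(N)]

def mean_over_pairs(m):
    """Average of m[i][j] over the N(N - 1) ordered pairs with i != j."""
    N = len(m)
    return sum(m[i][j] for i in range(N) for j in range(N) if i != j) / (N * (N - 1))

# Hypothetical raw regressor for N = 4 firms and T = 2 periods.
raw = [[2.0, 1.0], [1.5, 0.0], [-0.5, 3.0], [1.0, 2.0]]
N, T = len(raw), len(raw[0])

# Remove each period's cross-sectional mean, so the regressor sums to zero
# across firms in every period (the condition used in footnote 26).
period_means = [sum(raw[i][t] for i in range(N)) / N for t in range(T)]
x = [[raw[i][t] - period_means[t] for t in range(T)] for i in range(N)]

# Hypothetical residual cross-correlations.
rho = [[1.0, 0.5, 0.1, 0.0],
       [0.5, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.4],
       [0.0, 0.1, 0.4, 1.0]]

w = regressor_cross_correlations(x)
w_bar = mean_over_pairs(w)
rho_bar = mean_over_pairs(rho)
rho_w = [[rho[i][j] * w[i][j] for j in range(N)] for i in range(N)]

cov = mean_over_pairs(rho_w) - rho_bar * w_bar  # descriptive "covariance" of equation (6)
lhs = (N - 1) * mean_over_pairs(rho_w)          # (1/N) * sum over i != j of rho_ij * w_ij
rhs = (N - 1) * cov - rho_bar                   # right-hand side of equation (7)
```

With the regressor demeaned in this way, `w_bar` equals -1/(N - 1) and `lhs` equals `rhs` up to rounding.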

We can immediately make the following observations about the bias due solely to cross-correlation:

* First, in order for bias to exist, there must be cross-sectional dependencies in not only the residuals, but also in the regressor. That is, both $\rho_{ij}$ and $w_{ij}$ must be nonzero for at least some pairs of firms.

* Second, the bias depends not only on the magnitude of the residual cross-correlations $\rho_{ij}$ and the regressor cross-correlations $w_{ij}$, but also their covariability; higher covariability translates into greater bias. When the focus lies on a slope coefficient and the regression includes an intercept, as in (7), then bias can exist only if there is a nonzero covariance between $\rho_{ij}$ and $w_{ij}$. The implication is that the bias will be most serious when unidentified factors (e.g., industry effects) that lead to a large correlation between residuals for some firm-pairs also lead to a large correlation between values of the regressor for the same firm-pairs.

* Third, it is only cross-correlation in the orthogonal component of the regressor that ultimately affects the bias. (To see this, recall that we are always able to write the bias in terms of $w_{ij}$, the cross-correlation in the orthogonalized regressor, even though the process of orthogonalization does not affect the coefficient of that regressor, its estimated variance, or the bias in that estimate.) An implication is that if there is cross-sectional correlation among values of each of two or more regressors only because of common industry effects that do not persist in the orthogonal components of the regressors, then no bias (from this source) would exist in the OLS-based variance estimates.

* Fourth, equations (6) and (7) suggest a relation between sample size and bias. In the special case where $\mathrm{Cov}[\rho_{ij}, w_{ij}]$, $\bar{\rho}$, and $\bar{w}$ are held constant and positive as we increase the sample size, the relation is simple; the bias increases linearly in N. We will consider more general cases later.

The industry effects model.
The final stage of our model development is to derive a measure of bias for what will be called the "industry effects model." In the industry effects model, we group the sample of N firms into P representatives from each of 1 or more industries. The N(N - 1)

firm-pairs that can be created from the set of N firms are segregated into N(P - 1) pairs of firms where each member of the pair belongs to the same industry, and N(N - P) pairs of firms where each member of a given pair belongs to a different industry. The two sets of firm-pairs are referred to as within-industry pairs and cross-industry pairs, and parameters corresponding to the two sets of pairs are subscripted with a W and a C, respectively. We define:

$\bar{\rho}_W$ = mean of N(P - 1) values of $\rho_{ij}$, where firms i and j belong to the same industry;
$\bar{\rho}_C$ = mean of N(N - P) values of $\rho_{ij}$, where firms i and j belong to different industries;
$\bar{w}_W$ = mean of N(P - 1) values of $w_{ij}$, where firms i and j belong to the same industry;
$\bar{w}_C$ = mean of N(N - P) values of $w_{ij}$, where firms i and j belong to different industries;
$\mathrm{Cov}_W[\rho_{ij}, w_{ij}]$ = covariance between $\rho_{ij}$ and $w_{ij}$, across the N(P - 1) firm-pairs where firms i and j belong to the same industry;
$\mathrm{Cov}_C[\rho_{ij}, w_{ij}]$ = covariance between $\rho_{ij}$ and $w_{ij}$, across the N(N - P) firm-pairs where firms i and j belong to different industries.

Using these definitions, we can now rewrite the bias due solely to cross-correlation as follows:

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}w_{ij} = (P-1)\,\mathrm{Cov}_W[\rho_{ij}, w_{ij}] + (P-1)\bar{\rho}_W\bar{w}_W + (N-P)\,\mathrm{Cov}_C[\rho_{ij}, w_{ij}] + (N-P)\bar{\rho}_C\bar{w}_C. \tag{8}$$

When the regressor has a mean of zero, then $\bar{w}_C = -\left(\frac{P-1}{N-P}\right)\bar{w}_W - \frac{1}{N-P}$,27 and (8) can be simplified:

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}w_{ij} = (P-1)\,\mathrm{Cov}_W[\rho_{ij}, w_{ij}] + (N-P)\,\mathrm{Cov}_C[\rho_{ij}, w_{ij}] + (P-1)(\bar{\rho}_W - \bar{\rho}_C)\bar{w}_W - \bar{\rho}_C. \tag{9}$$

Thus far, introduction of the "industry effects model" has simply involved rewriting equations (6) and (7) in a new form. Now we introduce an assumption: that covariation between cross-industry

27 This can easily be shown by noting that $\bar{w} = ((P-1)/(N-1))\bar{w}_W + ((N-P)/(N-1))\bar{w}_C$, while recalling that when the regressor has mean zero, $\bar{w} = -1/(N-1)$.

residual correlations and regressor correlations contributes little to the bias, and thus the term $(N-P)\,\mathrm{Cov}_C[\rho_{ij}, w_{ij}]$ can be ignored. Although this would not hold in general, section 5 will examine two studies for which the assumption holds approximately; furthermore, the assumption simplifies the explication of the remainder of section 4. When this assumption is maintained, equation (9) can be simplified as shown here:

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq i}\rho_{ij}w_{ij} = (P-1)\,\mathrm{Cov}_W[\rho_{ij}, w_{ij}] + (P-1)(\bar{\rho}_W - \bar{\rho}_C)\bar{w}_W - \bar{\rho}_C. \tag{10}$$

4.2 Combining model and data.

We now turn to our assessment of the likelihood of bias in various research contexts. The discussion throughout the remainder of this section will be supported by Table 2 and Figure 1, and so we begin by explaining their development. Table 2 and Figure 1 provide descriptions of the bias in different research contexts. Table 2 provides formulae for the bias in four cases; Figure 1 combines those formulae with some data from Table 1 to provide graphical representations of the bias. The four cases described by Table 2 arise as follows. We first segregate the general case (equation (6)) from the special case where the industry effects model applies (equation (8)). For each of these two cases, we then consider two alternative research designs, referred to here as the "event study" and the "cross-sectional returns study."

The "event study". The term "event study" is used here in a special sense. That label is applied only to studies in which the researcher is interested in assessing the significance of the average stock return metric for the sample of N firms for some set of periods, called "event periods." The event(s) is (are) assumed to occur during the same period(s) for all firms. Although it is typically not done, a study of this kind can always be modeled as a regression with only one regressor, where the regressor takes on a value of one during an event period, and zero otherwise.
The coefficient of that regressor is equal to the average event-period stock return metric. In many event studies of this kind, the significance of the average event-period market reaction is assessed by aggregating data for the N firms to create a single time series of returns for a portfolio.

In terms of our notation, this is equivalent to a situation where N is equal to one. A glance at equation (6) confirms that the bias due to cross-correlation must be zero in that case. The bias can be nonzero only when N exceeds one. This would be the case if one used OLS in a pooled time-series cross-sectional design, or if one used the cross-sectional standard deviation of the stock return metric during the event period to assess the significance of market reaction. Few have used these approaches recently, except to demonstrate the dangers of bias due to residual cross-correlation. Nevertheless, examining these cases will prove valuable for appreciating the implications of Collins and Dent [1984], Schipper and Thompson [1983], Hughes and Ricks [1984], Sefcik and Thompson [1986], and others.

Bias in the standard error of the coefficient in the event study is described in the top row of Table 2. Each formula is equal to one plus the bias due solely to cross-correlation. The entry in the first column represents the general case and that in the second column represents the case of the industry effects model. The two formulae are derived by adding one to the expressions in (6) and (8), respectively, while noting that in the event study context, the terms $w_{ij}$ must always be equal to one.28

"Cross-sectional returns studies". The second set of studies considered includes OLS cross-sectional regressions and OLS pooled time-series cross-sectional regressions of stock return metrics against firm-specific variables. This category could include a wide variety of studies, including most information content studies (e.g., Beaver and Landsman [1983], Rayburn [1986], Bowen, Burgstahler, and Daley [1986]), and some studies that have sometimes been labeled elsewhere as "event studies." Examples of the latter would include studies of the economic consequences of mandated accounting policy changes (e.g., Leftwich [1981], Lys [1984]).
It is assumed that the regression includes an intercept, and that our interest is focused on the sampling variance of the slope coefficient for the kth regressor. The formulae for the cross-sectional returns studies are

28 Recall that $w_{ij} = \left(\frac{1}{T}\sum_{t=1}^{T}X_{it}X_{jt}\right)/U$, where $U = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T}X_{it}^2$. During an event period, the terms $X_{it}$ and $X_{jt}$ are both equal to one; outside an event period they are both equal to zero. If there are E event periods, then the numerator of $w_{ij}$, $\frac{1}{T}\sum_{t=1}^{T}X_{it}X_{jt}$, is equal to $\frac{1}{T}(E)$; the denominator $U = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T}X_{it}^2$ is also equal to $\frac{1}{T}(E)$; and $w_{ij}$ itself is equal to one.

presented in the bottom row of Table 2. The formula in the first column is equal to one plus the expression in (7); in the second column, where the industry effects model applies, the formula is given by one plus the expression in (10).

A graphical representation of the bias. Figure 1 provides a graphical representation of the formulae in Table 2 that correspond to the industry effects model. The top two graphs show the amount of bias that might exist in the event study. These graphs are generated using the formula in the upper right-hand corner of Table 2, while assuming that the mean within-industry residual cross-correlations, $\bar{\rho}_W$, and the mean cross-industry residual cross-correlations, $\bar{\rho}_C$, are equal to the corresponding estimates from Table 1.29 Obviously, any given study might employ a sample that differs substantially from that on which these estimates are based, and thus there is no suggestion that the graphs could be used to assess the likelihood of bias in a specific study; nevertheless, the graphs in Figure 1 will be useful in contrasting the order of magnitude of bias that could reasonably arise in different classes of studies.

The four graphs at the bottom of Figure 1 show the amount of bias that might exist in a cross-sectional returns study. The graphs are based on the formula in the lower right-hand corner of Table 2. Again, estimates of $\bar{\rho}_W$ and $\bar{\rho}_C$ are taken from Table 1. Mean within-industry cross-correlations in the regressor, $\bar{w}_W$, are allowed to take on values of either .10 or .40. To obtain values for the term $\mathrm{Cov}_W[\rho_{ij}, w_{ij}]$, we first decompose that term as follows:

$$\mathrm{Cov}_W[\rho_{ij}, w_{ij}] = \sigma_\rho\,\sigma_w\,\mathrm{Corr}_W[\rho_{ij}, w_{ij}],$$

where $\sigma_\rho$ and $\sigma_w$ represent standard deviations of $\rho_{ij}$ and $w_{ij}$, respectively. For $\sigma_\rho$, we substitute estimates of the standard deviations of industry means of within-industry residual cross-correlations from Table 1.30 (The standard deviations used are those that remain after elimination of estimated variation in industry means due to sampling error.)
The product $\sigma_w\,\mathrm{Corr}_W[\rho_{ij}, w_{ij}]$ is permitted to

29 Specifically, the value substituted for $\bar{\rho}_W$ is the grand mean across industry means of within-industry residual cross-correlations. The value substituted for $\bar{\rho}_C$ is equal to the mean (across firm-pairs) residual cross-correlation, where the members of each firm-pair are selected from different industries.

30 Note that we are using an estimated standard deviation of industry means, which would tend to understate the standard deviation across firm-pairs. To that extent, the estimated degree of bias in Figure 1 is conservative.

take on values ranging from 0.0 to 0.4.31 Since we permit this product to take on a range of values, the bias in the graphs always appears as a region, rather than a line.

Note that in Figure 1, the estimates of $\bar{\rho}_W$, $\bar{\rho}_C$, and $\sigma_\rho$ are all based on cross-correlations observed in market model residuals, which serve as the dependent variables in many cross-sectional returns studies. What we are in fact interested in is the behavior of the regression residuals in those studies. Although Section 5 will present two cross-sectional returns studies in which the degree of cross-correlation in the dependent variable does translate to the residuals, that would not be true in general. However, it may often hold approximately in the many studies where the regressors explain only a small portion of the variance in the dependent variables.

4.3. Discussion of bias in different research contexts.

We now turn to our assessment of the likelihood of bias in various research contexts, using Table 2 and Figure 1 to support the discussion. We discuss three points. First, we contrast the likelihood of bias in event studies with that in cross-sectional returns studies. Second, we discuss conditions under which the bias is most likely to be serious in cross-sectional returns studies. Third, we consider the impact of diversification and sample size on the bias.

4.3.1. "Event studies" versus "cross-sectional returns studies."

Papers providing mathematical descriptions of the amount of bias due to residual cross-correlation include Collins and Dent [1984], Sefcik and Thompson [1985], and Kothari and Wasley [1986]. All the papers conclude that residual cross-correlation could frequently lead to a substantial degree of bias in estimated standard errors.
Recent empirical studies that are consistent with this point include Schipper and Thompson [1983] and Hughes and Ricks [1984].32 This collection of studies, however, has not focused on a variety

31 Although it is difficult to determine "reasonable" values for these terms, it appears that only in unusual circumstances would $\sigma_w\,\mathrm{Corr}_W[\rho_{ij}, w_{ij}]$ be larger than 0.4. To see this, note that the terms $w_{ij}$ can range from -1 to 1; imagine that they are uniformly distributed over that interval. In this case there is no clustering of values of $w_{ij}$ around a central moment, and thus the distribution would have a higher variance than one might expect in most contexts. In this case, $\sigma_w$ would be equal to .57. Now imagine that $\mathrm{Corr}_W[\rho_{ij}, w_{ij}]$ is as high as .80. Even when these two cases occur simultaneously, the product $\sigma_w\,\mathrm{Corr}_W[\rho_{ij}, w_{ij}]$ would only be 0.46. Values much closer to zero may be much more likely.

32 These studies find that the OLS- and GLS-based standard errors differ substantially, which is at least consistent with a large bias in the OLS-based estimates. See Schipper and Thompson's Table 9, or Hughes and Ricks' Table 2.

of research contexts; in each case, the context is equivalent to what is referred to here as the event study.33

Bias due to cross-sectional dependence can indeed be substantial in the event study context. The formula in the upper left-hand corner of Table 2 indicates that the bias is a function of the sample size minus one, (N - 1), multiplied by the average degree of residual cross-correlation, $\bar{\rho}$. If the sample is not well-diversified and thus $\bar{\rho}$ is large, the bias could be serious. Even if the sample includes a variety of firms and $\bar{\rho}$ is relatively small, the bias could be serious for large N.34 The first graph in Figure 1 shows that even for a sample that includes equal representation from twenty industries, true standard errors might exceed estimated standard errors by a factor of five in a sample of 100 firms when monthly data are used, and by a factor of three when weekly data are used.

Fortunately, in event studies, methods of avoiding bias due to residual cross-correlation are most straightforward, so long as serial homoscedasticity in returns is a reasonable assumption.35 It is nearly always feasible to aggregate firms into a portfolio, and base hypothesis tests on the time-series variance in the portfolio return. The generalized-methods-of-moments estimator developed by Froot [1986] would also be feasible here, and would not involve the loss of information that sometimes occurs when aggregating across firms. Randomization presents another useful alternative in this case. Finally, GLS would be more readily justifiable in this context than in others. One reason is that, if daily or weekly data are used, there may be enough observations to avoid much of the small sample bias discussed in Section 2.
In some cases, small sample bias may not be an issue anyway, since hypotheses about some linear restrictions on firm-specific coefficients can be tested within the GLS framework by employing Rao's F (Rao [1973, section 8.c.5]).36

33 In fact, the mathematical formulae used by Collins and Dent [1984] and discussed further by Sefcik and Thompson [1985] and Kothari and Wasley [1986] are all special cases of the formula in the upper left-hand cell of Table 2, where $\rho_{ij}$ is assumed constant across firm-pairs.

34 A relation would likely exist between N and $\bar{\rho}$. If a large N is obtained by constructing a well-diversified sample, then the value of $\bar{\rho}$ would tend to be low.

35 Indeed, that is the point of much of the literature in this area.

36 The advantage of Rao's F is that, unlike most GLS-based test statistics, its finite sample properties are known exactly, so long as the number of linear restrictions being tested simultaneously is fewer than three. The use of Rao's F is not, of course, sufficient to ensure that standard errors are free of bias from any source; two event studies that employed Rao's F (Hughes and Ricks [1984] and Linsmeier [1986]) concluded that it tended to reject the null more frequently than it should.
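To make the event-study arithmetic concrete, the sketch below evaluates the two event-study formulae implied by equations (6) and (8) when every $w_{ij}$ equals one: $1 + (N-1)\bar{\rho}$ in the general case, and $1 + (P-1)\bar{\rho}_W + (N-P)\bar{\rho}_C$ under the industry effects model. The correlation values are hypothetical placeholders, not the Table 1 estimates, and the function names are invented for the example.

```python
import math

def event_study_bias(n_firms, mean_rho):
    """Ratio of correct to estimated variance, general case: 1 + (N - 1) * rho_bar."""
    return 1.0 + (n_firms - 1) * mean_rho

def event_study_bias_industry(n_firms, per_industry, rho_within, rho_cross):
    """Same ratio under the industry effects model with P firms per industry."""
    return (1.0 + (per_industry - 1) * rho_within
            + (n_firms - per_industry) * rho_cross)

# Hypothetical sample: 100 firms, 5 from each of 20 industries.
N, P = 100, 5
rho_within, rho_cross = 0.20, 0.02   # assumed values, for illustration only

variance_ratio = event_study_bias_industry(N, P, rho_within, rho_cross)
std_error_ratio = math.sqrt(variance_ratio)   # true / estimated standard error

# The general formula gives the same answer once rho_bar is the pair-weighted
# average of the within-industry and cross-industry means.
rho_bar = ((P - 1) * rho_within + (N - P) * rho_cross) / (N - 1)
variance_ratio_general = event_study_bias(N, rho_bar)
```

Because the cross-industry term is multiplied by (N - P), even a small assumed cross-industry correlation contributes materially to the ratio in a sample of this size.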

The interesting question that remains is whether the bias might be serious in the cross-sectional returns studies, where approaches to dealing with cross-sectional correlation would often be difficult or infeasible. In general, the bias in a cross-sectional returns study could be either smaller or larger than that in an event study (for a given sample size and a given residual correlation matrix). However, under reasonable assumptions, the bias in the former would have to be smaller. Consider the formulae for the industry effects model in Table 2, and assume that cross-industry residual correlations are on average zero ($\bar{\rho}_C = 0$). Also assume that variations of within-industry residual cross-correlations about their mean are either 1) small relative to the magnitude of that mean, or 2) not correlated with the variation in corresponding within-industry regressor cross-correlations, so that the contribution of the term $\mathrm{Cov}_W[\rho_{ij}, w_{ij}]$ to the bias in the cross-sectional returns study can be ignored. Then the measures of bias for the event study and the cross-sectional returns study are:

event study: $1 + (P-1)\bar{\rho}_W$
cross-sectional returns study: $1 + (P-1)\bar{\rho}_W\bar{w}_W$

The term $\bar{w}_W$ does not appear in the formula for the event study, because in that case it takes on a value of one, which is the maximum value it can assume. In that sense, the event study with simultaneous event dates represents that context where the bias is maximized, for a given level of residual cross-correlation and a given sample size.37 Thus, under the assumptions invoked here, the bias in the cross-sectional returns study must be smaller. Of course, that does not imply that the bias would never be serious. We now turn to a more complete description of the potential for bias in cross-sectional returns studies.

4.3.2 Bias in "cross-sectional returns studies."

Assessing the potential for bias in the cross-sectional returns studies is more difficult than in the event studies.
In cross-sectional returns studies, the bias depends on not only the sample size and the degree of cross-correlation in the

37 With respect to the bias due to heteroscedasticity, the event study represents an extreme in the opposite direction; that bias must be zero. To see this, examine the last term in equation (5). When the regressor takes on values of one during event periods and zero otherwise, then $U_i/U$ must always be equal to one, and the last term of (5) becomes $\frac{1}{N}\sum_{i=1}^{N}(\sigma_i^2 - 1)$. This term must be equal to zero, since by construction $\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2 = 1$. An implication is that, while standardizing residuals in event studies might be justified in terms of increased efficiency of coefficient estimation, there is no reason to adopt such procedures because of concerns about bias in standard errors due to cross-sectional heteroscedasticity.

residuals ($\rho_{ij}$), but also on the degree of cross-correlation among values of the regressor ($w_{ij}$), as well as on the degree of covariation between $\rho_{ij}$ and $w_{ij}$. Figure 1 presents graphs that permit the relevant parameters to take on a variety of values discussed in Section 4.2. Graphs are presented for both daily and annual data; graphs for weekly, monthly, and quarterly data would, of course, fall between these two extremes. Unlike event studies, which usually employ returns measured over short intervals, cross-sectional returns studies use a wide variety of return intervals.

In a cross-sectional returns study, if the degree of cross-correlation in regression residuals is no greater than that observed in market model residuals measured over daily intervals, the degree of bias would frequently not be serious. In the four graphs in Figure 1 that correspond to cross-sectional returns studies, the ratio of correct variance to estimated variance ranges from 1 to 1.9 for daily data, but that ratio would always be less than 1.5 so long as the sample size is less than 150 firms, or the sample includes 20 industries, rather than 10.38

In contrast, when cross-sectional returns studies are based on annual data, the bias could frequently be serious. When the measures reflecting within-industry regressor cross-correlation are as high as .40, the ratio of correct variance to its estimate is always in excess of 3, for a sample size exceeding 150. The ratio sometimes exceeds 5 for samples of 300. The reason for the seriousness of the problem in annual data, relative to that in daily data, is that within-industry residual cross-correlations are higher on average, and more variable across firm-pairs (even after taking sampling fluctuation into account).

The suggestion that the bias could be serious in at least some cross-sectional returns studies may initially appear inconsistent with Christie's [1986] conclusion: "...
residual dependence may have a relatively small influence on significance levels, at least in studies that include a spectrum of industries, even when the event date is common to all firms." In order to reconcile Christie's statement with our discussion, note that Christie's statement is based on a review of studies that

38 Even if residual cross-correlation were potentially problematic in a study based on daily or weekly data, there would often be enough data to support reliable alternatives to OLS. Use of short return intervals most frequently arises when attempting to explain cross-sectional differences in the reaction to some common event (e.g., Leftwich [1981], Lys [1984]). In that context, since returns data outside the event window are usually available, and since the values of the regressors hypothesized to explain the cross-sectional differences are known to be zero outside the event window, there may be enough data to generate reliable results with GLS. The same conditions make randomization a reasonable approach; examples are Lys [1984, section 5.6] and Noreen and Sepe [1981, p. 260].

employed only daily or weekly data. Christie's conclusion may well hold for many studies within that category, but may not apply to studies that employ quarterly or annual data.³⁹ Whether cross-sectional dependence would be problematic in those studies is difficult to assess.

It is troublesome that the degree of bias may be most serious in the studies based on quarterly or annual data. For these studies, alternatives to OLS are frequently infeasible, and no attempts to estimate the resulting bias have yet been undertaken. The reason is that, given the number of quarterly or annual cross-sections that is typically available, it is difficult to estimate the residual correlation matrix. As a result, researchers are forced to be tentative in drawing conclusions. For example, Beaver and Landsman [1983] qualified their analysis by stating that "until a longer time series becomes available, the t-values are to be viewed largely as descriptive statistics, rather than be taken literally." Note also that while studies based on short return intervals have grown more popular recently, it is still common to employ data for longer intervals.⁴⁰

4.3.3. The relationship between diversification, sample size, and bias.

Consider expanding the breadth and size of the sample, so that it approaches the content of the market portfolio. Ball [1975] has shown that, as this occurs, the average cross-sectional correlation among the residuals approaches an amount that is negative and close to zero. This observation has led some to conclude that so long as a sample is well-diversified, cross-correlation should not create serious bias in standard error estimates. We explain here that whether this holds true depends on the context.

Within the event study context, the bias due to cross-correlation approaches zero as diversification of the sample drives the average degree of cross-correlation to zero. This can be seen easily by examining the formula in the upper left-hand corner of Table 2.
The point represents the logic underlying Beaver's statement (within an essentially equivalent context) that, "for samples

³⁹ Evidence provided by Beaver, Clarke, and Wright [1979] is consistent with substantial bias in a cross-sectional returns study, when annual data are used. For a hypothesis test conducted while assuming residual independence, Beaver, Clarke, and Wright obtain a t-value of 23.6. A test procedure based on the same data, but which implicitly reflects cross-sectional dependence, yields a much lower t-value (10.35). (See Beaver, Clarke, and Wright [1979, pp. 329-332].)

⁴⁰ Recent examples of studies based on quarterly or annual return intervals include Beaver and Ryan [1985], Beaver, Lambert, and Ryan [1986], Beaver, Eger, Ryan, and Wolfson [1985], Bublitz, Frecka, and McKeown [1985], Collins, Kothari, and Rayburn [1987], Bernard [1986], Bowen, Burgstahler, and Daley [1986], Kormendi and Lipe [1987], Lipe [1986], Rayburn [1986], Ricks [1986], and Ryan [1986].

whose industry composition 'mirrors' that of the market portfolio, concern over industry effects is unwarranted. Industry effects... can be diversified away" (Beaver [1981], p. 179).⁴¹

In cases where the regressor does not conform to the special form assumed in the "event study," Beaver's statement does not hold. In the cross-sectional return study, the bias depends primarily on the covariance between residual cross-correlations and regressor cross-correlations, which can be large even when diversification drives the average residual cross-correlation to precisely zero. To illustrate the point, consider the industry effects model, and assume that the sample is so well-diversified that the average degree of residual cross-correlation is zero. Then $\rho_c = -\frac{(P-1)}{(N-P)}\rho_w$, and the formula in the lower right-hand corner of Table 2 can be written:

$$1 + (P-1)\,\mathrm{Cov}_w[\rho_{ij}, w_{ij}] + \frac{(P-1)(N-1)}{(N-P)}\,\rho_w w_w + \frac{(P-1)}{(N-P)}\,\rho_w$$

When N (total number of firms) is large relative to P (the number of firms per industry), the formula can be approximated as:

$$1 + (P-1)\,\mathrm{Cov}_w[\rho_{ij}, w_{ij}] + (P-1)\,\rho_w w_w$$

Since $\mathrm{Cov}_w[\rho_{ij}, w_{ij}]$ and $\rho_w w_w$ can both be positive when the average degree of residual cross-correlation is zero, diversification cannot guarantee elimination of the bias in a cross-sectional returns study. This point casts doubt on some previous explanations for why residual dependence might not cause serious bias in OLS-based standard errors in such studies.⁴²

It is also possible to make some general statements about the relation between sample size and bias. The above formula shows that once N becomes large relative to P, the bias depends on P,

⁴¹ This point can also be used to predict and assess the generality of evidence presented by Brown and Warner [1980, Table 6; 1985, Table 8]. Within an event study context, and so long as returns are market-adjusted, Brown and Warner detect little or no bias in standard errors calculated in cross-section during the event period.
However, this result is what would be expected, given the Brown and Warner design. The Brown and Warner samples are randomly selected, and so the average degree of residual cross-correlation should be close to zero. Thus, the formula in the upper left-hand corner of Table 2 would predict that little or no bias would be detected. The above discussion confirms what Brown and Warner suspected: that such a result would not necessarily obtain if the sample had been selected in a nonrandom fashion.

⁴² For example, Beaver and Landsman [1983] reference Beaver [1981] to suggest that cross-sectional dependence might not cause serious problems in inference "for a sample that is representative of the market portfolio" (Beaver and Landsman [1983, section 3.7.2]). However, since in the Beaver/Landsman context, the regressors do not take on the special form that is implicitly assumed in Beaver [1981], bias in OLS-based standard errors could be large in spite of diversification. Another example is Christie [1985], who argues that bias might not be serious in cross-sectional returns studies because, so long as no one industry dominates the sample, the average degree of residual cross-correlation is likely to be small. However, as explained above, that condition is not sufficient to assure that the bias is small in the context of cross-sectional returns studies.

$\mathrm{Cov}_w[\rho_{ij}, w_{ij}]$, and $\rho_w w_w$. There is no reason to believe that the two latter terms would have any relation to sample size. If these terms remain constant and positive while the sample size is increased, then the bias would increase linearly in P, the number of firms per industry. Thus, if the sample is increased by adding firms from industries not previously represented, P is not affected and the bias would remain unchanged (but would not decrease). However, if the sample is increased by including more representatives from the same industries, P increases and the bias would increase in proportion to (P - 1). An increase in the sample size could reduce the bias only if $(\mathrm{Cov}_w[\rho_{ij}, w_{ij}] + \rho_w w_w)$ were negative; it is entirely possible that such a condition could occur, although it does not arise in either of the two empirical studies to be examined in section 5.

4.4. Summary.

Section 4 has suggested that although cross-sectional dependence in the data could create serious problems in inference in "event studies," standard techniques for attempting to deal with the dependence can almost always be used. It is more difficult to predict whether the bias would be serious in cross-sectional returns studies. Although Christie notes no indications of serious bias in a set of three such studies where daily or weekly data are used, there are good reasons to suspect that the same conclusion may not hold when quarterly or annual data are employed, especially if sample sizes are large. Unfortunately, in studies based on quarterly or annual data, there are usually not enough time-series data to permit direct assessment of the bias due to cross-sectional dependence. However, in the following section, two studies in this category are conducted, where enough time-series data have been accumulated to permit some assessment of the seriousness of bias in OLS-based standard errors.

5. THE EFFECTS OF CROSS-SECTIONAL DEPENDENCE IN SPECIFIC APPLICATIONS

This section reviews two empirical studies and assesses the degree of inferential bias that would result from failure to account for cross-sectional dependence. The first study investigates the incremental information content of current cost income. The data employed include current

cost income for 113 firms over 20 years, as estimated by Bernard and Ruland [1986a, 1986b]. Several previous studies (e.g., Beaver and Landsman [1983], and Bublitz, Frecka and McKeown [1985]) have examined this issue using OLS cross-sectional regressions over fewer annual periods. The second study investigates the incremental information content of accounting accruals. That analysis employs quarterly data, including balance sheet data, for 104 firms in 19 industries. Such balance sheet data have been disclosed since 1976. Most previous research has examined this issue with annual data (e.g., Bowen, Burgstahler, and Daley [1986], Rayburn [1986], Beaver and Dukes [1972], Ball and Brown [1968]).⁴³

Note that even though both studies employ more time-series data than have been available in prior studies of the same issues, the number of available observations still imposes some severe restrictions on the analyses. In both studies, assumptions about the structure of the residual covariance matrix are imposed in an attempt to reduce error in the estimation of the matrix. In addition, data constraints necessitate sample sizes that are small relative to those used in most of the prior work. (For example, Beaver and Landsman used from 297 to 392 firms; Bublitz, Frecka and McKeown used up to 338 firms.) The restriction on the sample size is important, since it has already been demonstrated that the bias could depend on sample size. Specifically, if residual cross-correlations ($\rho_{ij}$) vary with regressor cross-correlations ($w_{ij}$), then the bias would be greater than indicated here if a larger sample from the same set of industries were used.

5.1. The incremental information content of current cost income.

Bernard and Ruland [1986b] focus, in part, on the same issue investigated in Beaver and Landsman [1983].
Beaver and Landsman regressed annual stock returns against measures of unexpected historical cost income and unexpected current cost income, to determine whether either of the independent variables offered significant incremental explanatory power. Beaver and Landsman employed current cost data disclosed under FAS No. 33 for 1980 and 1981. When current cost income from continuing operations was used, the current cost data appeared to offer significant explanatory power in 1981 (t=2.3), but not in 1980. The historical cost data appeared to offer significant explanatory power in both years (t=6.9

⁴³ Wilson [1986, 1987] investigates the issue by examining market reactions surrounding earnings announcement dates and the dates of release of 10-K's.

for 1980; t=2.1 for 1981). Beaver and Landsman concluded, on the basis of this and other evidence, that there is incremental information content in historical cost data, but not in current cost data.⁴⁴

The Bernard and Ruland data include estimated current cost income for 113 firms for the years 1961-1980. The 113 firms are the December and January fiscal-year-end firms among those described in Bernard [1986]. That paper explains the assignment of the sample firms to 27 industries, defined in most cases by 2-digit SIC codes.

Bernard and Ruland [1986b] estimate OLS regressions for each of 19 years, 1962-1980. The regressions differ in form from those of Beaver and Landsman in two respects. First, measures of unexpected income are scaled by the market value of the firm's common equity at the beginning of the year, rather than by the previous year's income. Second, whereas Beaver and Landsman use a combination of two simple regressions, a single multiple regression is used here.⁴⁵ The precise form of the multiple regression used here is:

$$R_{it} = b_1 \frac{UHC_{it}}{V_{i,t-1}} + b_2 \frac{UCC_{it}}{V_{i,t-1}} + e_{it} \qquad (11)$$

where
$R_{it}$ = stock return for firm i, year t;
$UHC_{it}$ = first difference in historical cost income for firm i, year t;
$UCC_{it}$ = first difference in current cost income for firm i, year t;
$V_{i,t-1}$ = market value of common equity for firm i at end of year t - 1.

To maintain consistency with Beaver and Landsman, who allowed the intercept of their cross-sectional regressions to vary across years, all data used here are expressed in terms of deviations of raw returns from the annual cross-sectional mean. The transformation is equivalent to suppressing a time-varying intercept while leaving the remaining coefficients unchanged, and eliminates cross-sectional dependence due solely to market-wide events.

The degree of bias in OLS-based standard errors is assessed two ways, labeled Methods 1 and 2. Both methods yield a ratio corresponding to the measure of bias used throughout the paper.
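The equivalence between cross-sectional demeaning and a year-specific intercept can be checked numerically. The following is a minimal sketch on simulated data, not the Bernard and Ruland data: the variable names (`uhc`, `ucc`, `ret`) and the coefficient values are hypothetical, and the sketch assumes that both the returns and the scaled regressors are expressed as deviations from the cross-sectional mean for the year.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms = 113  # sample size quoted in the text; the data below are simulated

# Hypothetical scaled regressors and returns for a single year (assumption).
uhc = rng.normal(size=n_firms)
ucc = 0.8 * uhc + 0.2 * rng.normal(size=n_firms)
ret = 0.05 + 0.6 * uhc + 0.1 * ucc + rng.normal(scale=0.3, size=n_firms)

X = np.column_stack([uhc, ucc])

# (a) OLS with an explicit intercept for the year.
X_const = np.column_stack([np.ones(n_firms), X])
coef_with_intercept = np.linalg.lstsq(X_const, ret, rcond=None)[0]

# (b) OLS on deviations from the cross-sectional mean, intercept suppressed.
X_dev = X - X.mean(axis=0)
ret_dev = ret - ret.mean()
coef_demeaned = np.linalg.lstsq(X_dev, ret_dev, rcond=None)[0]

# The slope coefficients agree; only the intercept is absorbed by demeaning.
slopes_match = np.allclose(coef_with_intercept[1:], coef_demeaned)
print(slopes_match)
```

The check covers only the coefficient estimates. It says nothing about the standard errors, which are the subject of the two methods described next.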
Method 1, which was used for similar purposes by Christie [1985], involves calculating the ratio of

⁴⁴ The conclusions of Bernard and Ruland [1986a] are different. Bernard and Ruland estimate industry-specific regressions in time-series, while Beaver and Landsman were able to estimate only cross-sectional regressions. Bernard and Ruland show that, in those industries where the correlation between unexpected historical cost income and unexpected current cost income is relatively low, there is evidence of incremental information content in current cost data.

⁴⁵ Christie, et al. [1984] explain that the two approaches are mathematically equivalent.

the observed time-series variance (over all cross-sections) in a given coefficient, to the mean (over all cross-sections) reported (OLS-based) variance of the coefficient.⁴⁶ The ratio may exceed one either because reported variances are downward-biased, or because the true coefficient value changes over time.

Method 2 calculates, for each cross-section, a ratio corresponding to that in equation (3); the numerator is equal to an estimate of the correct variance of an OLS coefficient (based on an estimate of the correct residual covariance matrix), and the denominator is the reported (OLS-based) variance of that coefficient. Method 2 is implemented as follows:

1. Equation (11) is estimated in a time-series for each firm, as would be done in the first stage of a seemingly unrelated regression procedure.

2. The regression residuals from the first stage are then used to estimate a residual covariance matrix. Using the notation of Section 4, we call the estimate $\hat{\sigma}^2\hat{A}$. If this were done while imposing no assumptions on the structure of the covariance matrix, the estimates would be consistent, but (since the number of free parameters in the matrix would exceed the number of available observations) the estimated matrix would be singular and the estimates would contain much sampling error. Thus, two alternative approaches, each based on different assumptions about the structure of the matrix, are used. Neither approach imposes any restrictions on residual variances, or on cross-sectional correlations between residuals of firms in the same industry. However, both involve (different) assumptions about cross-industry residual correlations:

a. The first approach assumes that residual correlations between firms from a given pair of industries are homogeneous. For example, it is assumed that the residual correlation between mining firms and retailers is the same for all mining firms and retailers.
Such cross-industry correlations are assumed equal to the average of all correlations between pairs of firms that include one member from each of the two given industries. Using this approach, the number of observations is approximately 4 times as large as the number of parameters to be estimated.

b. The second approach assumes that all cross-industry correlations are the same (and equal to the grand mean of such correlations), regardless of the firms considered. In

⁴⁶ To make this more explicit, define $\hat{b}_t$ as the estimated regression coefficient for period t, $\hat{\sigma}_t^2$ as the estimated residual variance, and $X_t$ as the matrix of independent variables for period t. Then the numerator of the ratio is $\frac{1}{T-1}\sum_{t=1}^{T}(\hat{b}_{kt} - \bar{b}_k)^2$. The denominator of the ratio is the mean over the T OLS-based reported coefficient variances, each of which is equal to the kth diagonal element of a matrix $\hat{\sigma}_t^2[X_t'X_t]^{-1}$.

the applications examined here, the mean cross-industry correlations are very small, and this approach is tantamount to assuming that all cross-industry correlations are zero. When this approach is used, the number of observations is approximately 9 times as large as the number of parameters to be estimated.

Approach (b) is consistent with the industry effects model underlying Figure 1 and Table 2 in that it ignores variation in cross-industry correlations. If that variation is important, then the estimates of bias from approach (a) will tend to differ from those based on approach (b). Thus, comparison of the two approaches will permit us to assess the reasonableness of the assumption that the bias is due primarily to within-industry dependencies.

3. Cross-sectional OLS regressions are estimated separately for each year. For the kth coefficient for each year, a ratio based on equation (3), the measure of bias, is estimated by substituting the estimated sample covariance matrix $\hat{A}$ for the correct residual covariance matrix:⁴⁷

$$\text{Estimated Bias} = \frac{1_k'(X_t'X_t)^{-1}(X_t'\hat{A}X_t)(X_t'X_t)^{-1}1_k}{1_k'(X_t'X_t)^{-1}1_k}$$

where $X_t$ = (N x K) matrix of independent variables for period t.

It is expected that Method 1 may overstate the bias, because it reflects any variation over time in the true regression coefficients. On the other hand, Method 2 may understate the bias, because both approaches to estimating the residual covariance matrix ignore at least some variation in cross-industry residual correlations.

Precise statements about the sampling distributions of the test statistics are (at best) difficult to derive without relying on potentially unreasonable assumptions.⁴⁸ However, certain parameters of the sampling distributions will be estimated here using the technique of bootstrapping. (A brief summary of the application of bootstrapping in this environment is

⁴⁷ The estimate assumes that $\hat{\sigma}^2/E[\hat{\sigma}^2] = 1$. See the development of equation (5) for justification of this assumption.
⁴⁸ The test statistic under Method 1 is distributed as $\text{Bias} \cdot F_{(T-1),\,T(N-K)}$ if a) the correct variance of the coefficient estimate is the same for all periods and b) the bias in the estimated variance is the same for all periods. The statistic thus has expectation $\text{Bias} \cdot \left[\frac{T(N-K)}{T(N-K)-2}\right]$, which indicates that it is essentially unbiased for the values of T, N, and K used here. When conditions (a) and (b) do not hold, the distribution of the statistic is, at best, difficult to derive. The test statistic under Method 2 is a consistent estimate of Bias if the restrictions imposed on $\hat{A}$ are correct. (This follows because $\hat{A}$ would then be a consistent estimate of $A$.) In addition, some statements about expectations in finite samples are possible. Write the bias due to cross-correlation as $(N-1)\mathrm{Cov}[\rho_{ij}, w_{ij}] - \bar{\rho}$, and write the estimated bias as $(N-1)\mathrm{Cov}[(\rho_{ij} + v_{ij}), w_{ij}] - \bar{\rho} - \bar{v}$, where $v_{ij}$ is estimation error. Then so long as $E[v_{ij}] = 0$ and $\mathrm{Cov}[v_{ij}, w_{ij}] = 0$ (and there is no obvious reason to suspect otherwise when the restrictions imposed on $\hat{A}$ are correct), the estimate is unbiased. More general statements about the distribution of the statistic are not derived, because the restrictions imposed on $\hat{A}$ destroy its properties as a Wishart matrix, and thus standard theorems do not apply.

contained in Appendix B.) In addition to providing bootstrap standard errors of our estimated measures of bias, we also bootstrap the sampling distribution of the measures as they would behave when the correct but unknown residual covariance matrix is a scalar identity matrix, as is assumed under OLS.⁴⁹ This bootstrap distribution can suggest whether our measures of bias are themselves unbiased under the null hypothesis of no bias, and can indicate how likely it is that we would obtain our observed measures of bias when in fact no bias exists.

Table 3 summarizes some descriptive statistics concerning the degree of cross-sectional correlation in the dependent variable, the regression residuals, and the regressors in equation (11). The mean intra-industry cross-sectional correlation in the dependent variable is .34, which is of the same magnitude that we might have expected in annual data.⁵⁰ Note that the same high degree of cross-sectional correlation persists in the regression residuals, indicating that the regressors fail to reduce the degree of cross-sectional correlation in the dependent variable.

The mean cross-sectional correlation in the regressors is also high. Within industries, the mean values of $w_{ij}$ for unexpected historical cost income and unexpected current cost income are .30 and .28, respectively. Table 3 also reports the same measures after each regressor is orthogonalized with respect to the other, using the procedure described by Beaver and Landsman. (Recall that it is only the cross-correlation that persists in the orthogonalized component of the regressor that affects the bias.) It is interesting to note that orthogonalizing unexpected current cost income with respect to its historical cost counterpart (or vice versa) does not decrease the measure of regressor cross-correlation.
This occurs because that relatively small portion of the variation in the current cost variable that is not accounted for by the historical cost variable is due to factors with strong industry effects (e.g., changes in specific prices of inputs, variation in age of depreciable assets, inventory turnover).

⁴⁹ When the first method is used to measure bias, in addition to assuming that the scaled residual covariance matrix is the identity, the bootstrap estimates also assume that the true coefficients are constant over time.

⁵⁰ The mean correlation is slightly higher than the mean intra-industry cross-sectional correlation estimated for annualized market model residuals in Table 1. The correlation is expected to be higher because the Beaver-Landsman methodology employs total stock returns (expressed as deviations about the cross-sectional mean for the given year), rather than market model residuals.

Estimates of the amount of cross-sectional correlation in the residuals and the regressors would suggest that the degree of bias in OLS-based inferential statistics could be serious, at least in large samples. To obtain a preliminary estimate of the degree of bias one might encounter, we substitute the mean figures discussed above into equation (9), which was used earlier to create Figure 1. That is, we estimate that the mean intra-industry residual correlation ($\rho_w$) is equal to .34, and that the mean measure of intra-industry cross-correlation in the regressors ($w_w$) is .30. We also assume that cross-industry residual correlations ($\rho_c$) are equal to zero, and we ignore the effect of the term $(P-1)\mathrm{Cov}_w[\rho_{ij}, w_{ij}]$. If the number of firms per industry (P) is 4 in a sample of 113 (as used here), then the ratio of true coefficient variance to the expected value of its estimate would be 1.3; in a sample three times as large, the ratio would be 2.1. The indicated amount of bias could be larger (smaller) if cross-industry correlations and/or heteroscedasticity in the residuals exacerbate (mitigate) the problem.

We now turn to Table 4 for a summary of the OLS regressions, and estimates of the amount of bias in the OLS-based standard errors. Method 1 yields estimates of the ratio of correct variance to estimated variance equal to 3.48 and 2.71 for UHC and UCC, respectively. These ratios are much larger than our preliminary estimate, and are indicative of a substantial degree of downward bias in OLS-based standard errors. However, as indicated above, these ratios tend to overstate the amount of bias if the true coefficients vary over time.

Method 2 also indicates substantial downward bias in OLS-based standard errors. The next-to-last panel in Table 4 presents the ratio of the coefficient variance as estimated with the full residual covariance matrix to that estimated under OLS.
The mean (over 19 years) of the ratio is 1.81 and 1.70 for the coefficients of UHC and UCC, respectively. When the ratios are re-estimated while assuming all cross-industry residual correlations are equal to their mean (which is close to zero), we obtain 1.64 and 1.59, respectively. In this context, then, cross-industry dependencies have little impact on the bias.

It appears unlikely that sampling error could account for measures of bias as large as those obtained here. The measures of bias under Method 1 are about two bootstrap standard deviations

above one, and those based on Method 2 are three to four bootstrap standard deviations above one. Moreover, when the statistics were generated while maintaining the hypothesis that the scaled residual covariance matrix is the identity, the mean statistics were all close to one (indicating unbiasedness under the null), and measures as high as those actually observed here occurred less than 1 percent of the time under the first method, and never occurred under the second method.

That the ratios are higher than our preliminary estimate of 1.3 appears due to that estimate not taking into consideration the possible effects of heteroscedasticity, nor the interaction between heteroscedasticity and cross-correlation in the residuals. When the bias calculated using the second approach is segregated into the three components shown in equation (5), it appears that for the estimated variance of the coefficient of UHC, 44 percent of the bias is attributable solely to cross-correlation; 37 percent of the bias is attributable solely to heteroscedasticity; and 19 percent of the bias is due to the interaction between cross-correlation and heteroscedasticity. For the coefficient of UCC, cross-correlation and heteroscedasticity each account for about 46 percent of the bias, with the interaction between the two accounting for about 8 percent. For both UHC and UCC, the magnitude of the bias due only to cross-correlation is approximately consistent with our preliminary estimate.

That heteroscedasticity exacerbates the bias indicates that firms with high variance in the regressors (unexpected income) also tend to have high variance in the regression residuals. When the same phenomenon exists in other studies, the total amount of bias could be much larger than indicated by Table 2 and Figure 1, which ignore the effects of heteroscedasticity.

Bias of the magnitudes documented here could be important in the Beaver-Landsman context.
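The magnitudes discussed in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on stated assumptions: the preliminary estimate is computed from the large-N approximation $1 + (P-1)\rho_w w_w$ with the covariance term ignored and $\rho_c = 0$; "a sample three times as large" is taken to mean P = 12 firms per industry; and the conversion from a variance ratio to a required t-value uses the normal critical value of 1.96. The function names are hypothetical.

```python
import math


def approx_variance_ratio(p, rho_w, w_w, cov_w=0.0):
    """Large-N approximation: 1 + (P-1)*Cov_w[rho_ij, w_ij] + (P-1)*rho_w*w_w."""
    return 1.0 + (p - 1) * cov_w + (p - 1) * rho_w * w_w


def required_t(variance_ratio, nominal_t=1.96):
    """t-value needed on the OLS-reported scale so that the corrected
    t-value (reported t divided by sqrt(ratio)) reaches nominal_t."""
    return nominal_t * math.sqrt(variance_ratio)


# Preliminary estimates from mean within-industry correlations (.34 and .30).
prelim_4 = approx_variance_ratio(p=4, rho_w=0.34, w_w=0.30)
prelim_12 = approx_variance_ratio(p=12, rho_w=0.34, w_w=0.30)  # assumed P = 12

# Required t-values implied by the Method 1 and Method 2 ratios quoted above.
t_method1 = [required_t(r) for r in (3.48, 2.71)]
t_method2 = [required_t(r) for r in (1.81, 1.70)]

print(round(prelim_4, 2), round(prelim_12, 2))
print([round(t, 2) for t in t_method1], [round(t, 2) for t in t_method2])
```

Because the standard error scales with the square root of the variance, a variance ratio of about 3.5 inflates the required t-value by a factor of less than 2.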
If the bias in standard errors reported by Beaver and Landsman is as large as indicated by Method 1, then t-values in the range of 3 to 4 would be required to achieve a true (two-tailed) significance level of .05. On the other hand, if the bias is no larger than that indicated by Method 2, then t-values of approximately 2.5 or 2.6 would be required to achieve a true significance level of .05. This is before consideration of the effects of sample size. The Beaver-Landsman sample is three times larger than that used here, and the increment in sample size appears to have been achieved, for the most part, by including more representatives from the same industries already included here,

rather than representatives from a different set of industries.⁵¹ Under such conditions, as discussed in Section 4.3, the three-fold increase in sample size could be accompanied by a three-fold increase in that portion of the bias due solely to cross-correlation. Elimination of bias of such magnitudes could potentially reverse the conclusions of Beaver and Landsman. Specifically, it might not be possible to reject the hypothesis of no incremental information content not only for current cost income, but for historical cost income as well; the two income measures might be nearly perfect substitutes.

5.2. The incremental information content of quarterly accruals.

A second information content study is now examined, in order to determine to what extent the conclusions of the previous section might hold in contexts where the research design is similar, but the data differ. In this study, market model residuals for a three-month interval are regressed against unexpected components of three firm-specific accounting variables. The first variable (denoted UCF) is the unexpected component of quarterly "cash flows" from operations, where cash flows are defined to include all working capital accounts except inventory.⁵² The second variable (UINV) is the unexpected change in the inventory balance for the quarter. The third variable (UACC) is the unexpected component of depreciation, interest expense, and nonoperating gains and losses. Following Wilson [1986, 1987], these are referred to as noncurrent accruals. The sum of the three variables is equal to unexpected pretax income. The purpose of the analysis is to determine, assuming there is information content in unexpected cash flows, whether there is incremental information content in the other components (inventory accruals and the noncurrent accruals) included in quarterly pretax income.

The precise form of the regression used as the basis for the study is shown in equation (12) below.
All independent variables are scaled by the market value of the equity at the beginning of the fiscal quarter. The market model residuals used as dependent variables are cumulated over

⁵¹ The sample used by Bernard and Ruland, for the most part, includes representatives of the same industries included in the Beaver-Landsman study. Exceptions are as follows. Construction companies, wholesalers, and trucking and shipping firms were included in the Beaver-Landsman sample, but excluded by Bernard and Ruland. Firms in consumer service industries were included by Bernard and Ruland, but excluded by Beaver and Landsman.

⁵² The operational definition of "cash flows from operations" is operating income before depreciation, interest, and taxes, minus the increase in inventory balances. This differs from working capital from operations minus increases in inventory primarily because the latter includes: 1) the working capital effects of interest income and interest expense, 2) taxes currently paid, or deferred and classified as current, and 3) dividends from unconsolidated subsidiaries.

the three-month period ending two months after the fiscal quarter, so as to allow for delay in the publication of quarterly financial statements. Results based on residuals cumulated over the fiscal quarter, and over the three months ending one month after the fiscal quarter, are similar.

$$R_{it} = b_1 \frac{UCF_{it}}{V_{i,t-1}} + b_2 \frac{UINV_{it}}{V_{i,t-1}} + b_3 \frac{UACC_{it}}{V_{i,t-1}} + e_{it} \qquad (12)$$

where
$R_{it}$ = stock return for firm i, quarter t;
$UCF_{it}$ = unexpected cash flows for firm i, quarter t;
$UINV_{it}$ = unexpected change in inventory balance for firm i, quarter t;
$UACC_{it}$ = unexpected change in noncurrent accruals for firm i, quarter t;
$V_{i,t-1}$ = market value of common equity for firm i at end of quarter t - 1.

The accounting data used in the study were obtained from the Compustat II Primary Quarterly Industrial File.⁵³ To be included in the sample, a firm was required to have a December fiscal year end, a complete monthly returns series on the CRSP file for 1976 through 1984, and complete data on the Primary Quarterly Industrial File for 1976:II through 1984:IV for the required accounting data.⁵⁴ A total of 104 firms met these requirements; the firms were grouped into 19 industries. Most of the 19 industries correspond to 2-digit SIC codes.

The degree of cross-sectional correlation in the dependent variable, the independent variables, and the residuals is summarized in Table 5. The mean intra-industry cross-sectional correlation in the dependent variables is .17.⁵⁵ As was the case in the previous study, most of the cross-correlation in the dependent variable does translate to the residuals. The measures of regressor covariability are relatively low, averaging about .10 for orthogonalized values of the regressors. However, values within some industries do exceed .40.

⁵³ The unexpected components of each of the three accounting variables (UCF, UINV, and UACC) were estimated by fitting ARIMA models to the time-series of each variable for each firm.
The general form of the ARIMA models was assumed to be the same for all firms, but firm-specific parameter estimates were used. For cash flows from operations, the undifferenced series was modeled with one regular autoregressive term, one seasonal autoregressive term and a constant drift term. For inventory accruals and for depreciation and interest accruals, the same model was fit to the first differences in the raw series. Because of differencing, the inclusion of autoregressive terms in the model, and the need for lagged stock returns data, the number of quarterly time-series observations available in the final analysis declined to 29. 54 These data include inventory (item 38), operating income before depreciation, interest and taxes (item 21), and income before tax (item 23). 55 This degree of correlation is much lower than that noted in the first information content study. To some extent, this reflects the use of quarterly data, rather than annual data, and market model residuals, rather than mean-adjusted returns. However, the mean intra-industry cross-sectional correlation in the dependent variables (.17) is still lower than that noted in Table 2 for quarterly market model residuals (.24). The likely explanation lies in the way industries are defined. In Table 2, industries are based on 3-digit SIC codes. Here, broader, primarily 2-digit, industry groupings are employed. 37
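The mean intra-industry cross-correlation summarized in Table 5 can be illustrated with a minimal Python sketch. The function name, the (T x N) layout of the residual matrix, and the industry labels are assumptions made for illustration; this is not the code used in the study.

```python
import numpy as np

def mean_intra_industry_corr(resid, industry):
    """Average pairwise correlation among firms in the same industry.

    resid    : (T, N) array, one column of residuals per firm (assumed layout)
    industry : length-N sequence of industry labels (hypothetical input)
    Returns ({industry: mean pairwise correlation}, grand mean of those means).
    """
    resid = np.asarray(resid, dtype=float)
    industry = np.asarray(industry)
    corr = np.corrcoef(resid, rowvar=False)       # (N, N) firm-by-firm correlations
    means = {}
    for ind in np.unique(industry):
        idx = np.flatnonzero(industry == ind)
        if idx.size < 2:                           # a firm-pair needs two firms
            continue
        block = corr[np.ix_(idx, idx)]
        upper = block[np.triu_indices(idx.size, k=1)]  # count each firm-pair once
        means[ind.item()] = float(upper.mean())
    grand_mean = float(np.mean(list(means.values())))
    return means, grand_mean
```

Averaging first within each industry and then across industries mirrors the "grand mean across industries" row of the table; weighting every firm-pair equally would instead give more weight to large industries.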

Using the same approach adopted in the previous section, we generate a preliminary estimate of bias in OLS standard errors based on mean within-industry values of residual and regressor correlations. The preliminary estimate suggests that the degree of bias might be very low; the projected ratios of true coefficient variances to the OLS-based estimates are less than 1.1. Using the same two methods adopted in the previous study, we now generate our final estimates of the bias. The results are summarized in Table 6. The estimates in Table 6 suggest that the bias in OLS standard errors may be substantial, even though the sample size and the mean measures of residual and regressor cross-correlation are relatively small. Method 1 yields bias estimates of 3.36, 3.57, and 1.80 for the three coefficients. Recall that if the true coefficients change over time, these ratios overstate the amount of bias in the OLS estimates. The three corresponding ratios based on Method 2 are somewhat lower: 2.32, 2.18, and 1.34. When the cross-industry cross-correlations are assumed homogeneous (and close to zero), the estimates under Method 2 are little changed, to 2.07, 2.00, and 1.35. Thus, as in the first study, cross-industry dependencies appear to contribute little to the estimated bias. All of the estimated ratios are higher than our preliminary estimate. The reason appears to be that the preliminary estimate did not account for the effects of heteroscedasticity. When we again segregate the bias into the three components shown in equation (5), it appears that for the two estimated coefficient variances containing the largest bias (those corresponding to unexpected cash flows and unexpected inventory changes) only 27 percent of the bias is attributable solely to cross-correlation; 67 to 71 percent of the bias is attributable solely to heteroscedasticity; very little of the bias is due to the interaction between cross-correlation and heteroscedasticity. 
Thus, cross-correlation induces a relatively small amount of bias, but the effects of heteroscedasticity render the total bias substantial. Note, however, that if the same patterns of cross-correlation were observed in a sample of, say, 200 or 300 firms, with industry representation no broader than that observed here, even the bias due solely to cross-correlation could be problematic.

5.3. Corroborative evidence from a related study. Rayburn [1986] estimates models similar to those just discussed, using annual data. For each of twenty years, stock market residuals for 175 firms are regressed (using OLS) against unexpected cash flows and unexpected accruals. Using data reported in Rayburn's Table 5, one can calculate the ratio of observed coefficient variance over twenty years, to the mean OLS-based estimate of that variance. For the two slope coefficients in the Rayburn model, the ratios are 4.87 and 3.40. Thus, the Rayburn study provides corroborative evidence that OLS-based variance estimates may seriously understate the true variance in a "cross-sectional returns study," at least when returns are measured over long intervals.⁵⁶

6. SUMMARY AND CONCLUSIONS

This paper examines the degree of bias likely to exist in some common research contexts when cross-sectional dependence is not accounted for. Previous accounting studies documenting the existence of serious bias in OLS-based statistics have focused on a special case where the bias would frequently be serious, but where standard procedures to account for the dependence are most likely to be feasible. In other cases, previous research provides little guidance for predicting the magnitude of bias that may result from cross-sectional dependence. Unfortunately, these other cases include "cross-sectional returns studies," in which it is often difficult or infeasible to employ procedures that explicitly consider the effects of cross-sectional dependence.

While the degree of bias is very context-specific, this paper has at least identified the factors that must be considered in assessing its seriousness. On the basis of examining these factors, it appears that standard errors based on OLS cross-sectional regressions of quarterly or annual return metrics against firm-specific variables (or the equivalent) may frequently include substantial bias. The results of those examinations suggest that when OLS is used in such studies, inferential statistics should be interpreted cautiously.
Researchers may find it worthwhile to devote more effort to avoiding problems in inference that can arise when analyzing market model residuals with OLS. Although seemingly unrelated regression may rarely serve as a suitable alternative, other approaches probably deserve more attention than they have received to date. These include computer-intensive techniques, the use of multi-index models to generate residuals, and the use of models based on parsimonious characterizations of residual covariance matrices. Since inter-industry cross-correlations appeared to cause little bias relative to intra-industry cross-correlations, a GLS approach that allows only for the latter might be useful in settings where sufficient time-series data exist. Alternatively, if one is more concerned about bias in standard error estimators than efficiency in coefficient estimators, then the generalized-method-of-moments estimator described by Froot [1987] could be used.

Finally, although this paper has focused on contexts in market-based accounting research, the same issues are potentially important in a wide variety of other settings. In any quasi-experimental setting, there are likely to be sources of dependence among observations that are not captured by the model designed by the researcher. Moreover, outside market-based research, there are rarely enough data to permit explicit recognition of the dependence in hypothesis testing. Examples include studies that attempt to explain differences in management compensation, audit fees, or choices of accounting policy. Readers of these and other quasi-experimental studies should often refrain from interpreting significance levels too literally.

56 Rayburn recognizes this difficulty and does not rely heavily on inferential statistics calculated in cross-section.

APPENDIX A

Why Do Cross-correlations Increase with the Return Interval?

This appendix considers some potential explanations for why the degree of residual cross-correlation increases as the length of the return interval increases. We begin by contrasting the behavior of market model residuals observed here with that which would be expected if the residuals were serially uncorrelated.

The schedule below summarizes the behavior of the numerators (i.e., covariances) and denominators (i.e., products of standard deviations) of the mean within-industry cross-correlations in Table 1. The amounts are grand means of averages across all firm-pairs within each industry. Amounts are presented for daily, weekly, monthly, quarterly, and annual data. (Covariances and products of standard deviations are multiplied by 100.)

Return interval:                        Daily     Weekly    Monthly   Quarterly   Annual
Mean covariance:                        0.0017    0.0224    0.114     0.402       1.428
Mean product of standard deviations:    0.0420    0.2298    0.588     1.589       4.709
Mean correlation:                       0.0406    0.0906    0.183     0.240       0.302

If market model residuals were serially uncorrelated, then both the covariances and the products of standard deviations would grow in proportion to the increase in the length of the return interval, and the ratio of these two amounts would remain invariant to the length of the interval. For example, the covariances and products of standard deviations would be 5 times larger in weekly data than in daily data, and the cross-correlations would be the same. Below, we scale the covariances and products of standard deviations for a given return interval by those corresponding to the next smallest return interval, in order to see how fast the amounts grow as the return interval increases.

                                        Weekly      Monthly     Quarterly   Annual
                                        scaled by   scaled by   scaled by   scaled by
                                        Daily       Weekly      Monthly     Quarterly
Mean covariance:                        13.1        5.1         3.5         3.6
Mean product of standard deviations:    5.5         2.6         2.7         3.0
Expected ratio:                         5.0         4.3         3.0         4.0

When the return interval is increased, the mean covariance grows by more than expected under serial independence (except when moving from quarterly to annual data). This implies the presence

of positive noncontemporaneous, cross-firm, within-industry residual correlation.¹ When the return interval is increased, the mean product of standard deviations grows by less than expected under serial independence (except when moving from daily to weekly data). This implies the presence of negative within-firm serial correlation in residuals.²

Could "noise" in returns account for the negative within-firm serial correlation in residuals? If noise is defined as a deviation of recorded prices from market-clearing prices, then two sources of noise would be bid-ask spreads and nonsynchronous trading. Noise from these sources is almost certain not to explain the negative serial correlation observed here. First, such "noise" should have the most dramatic effect on serial correlation in daily residuals, which in turn would cause the most dramatic reduction in the growth of the products of standard deviations as one moves from daily to weekly data. To the contrary, however, that is the one case where the growth is actually larger than expected. Second, the effect of such noise would not be nearly large enough to account for the patterns noted as one moves to monthly, quarterly, and annual data. For example, assume that the residual variance in weekly data is $\sigma^2 + \sigma_n^2$, where the first term captures the variance in the correctly measured residual, and the second term captures "noise." The first term grows in proportion to the length of the return interval, but the second term does not. Then the residual variance in monthly data would be $4.3\sigma^2 + \sigma_n^2$, assuming that the average month contains 4.3 weeks. If, as indicated in the schedule above, the total variance in monthly data ($4.3\sigma^2 + \sigma_n^2$) is 2.6 times as large as the variance in weekly data ($\sigma^2 + \sigma_n^2$), then the variance due solely to noise ($\sigma_n^2$) would have to be 52 percent as large as the total variance in weekly residuals ($\sigma^2 + \sigma_n^2$).
However, the portion of the variance in weekly residuals that could be accounted for by measurement error due to bid-ask spreads is not even one-tenth this large.³

Could nonstationarity in market model parameters explain the patterns observed here? If departures of correct market model parameters from the estimates are positively correlated over time and across firms within the same industry (which is quite plausible, if not likely), that would lead to positive noncontemporaneous cross-firm, within-industry residual correlations. Thus, this could explain the overall tendency of covariances to increase more than expected as the return interval increases. However, this explanation is inconsistent with the data in two ways. First, such a phenomenon would also lead to positive serial correlation on a within-firm basis, but we observe the opposite. Second, the explanation is inconsistent with the behavior of covariances as we move from quarterly data to annual data.

Another possible explanation would be departures from weak-form market efficiency. If the market tends to "overreact" to firm-specific news, as suggested by DeBondt and Thaler [1985],⁴ then negative serial correlation on a within-firm basis would arise, especially over long return horizons. If such behavior were accompanied by a market inefficiency involving the impounding of industry-specific news at different points in time for different firms, then positive noncontemporaneous cross-firm correlations could arise as well. Alternatively, "overreaction" to firm-specific news, when combined with nonstationarities in market model parameters that are shared by firms within an industry, could potentially explain the patterns observed here.

However, even if departures from weak-form market efficiency do explain the data, they are not necessarily indicative of an exploitable trading strategy. The serial correlations observed are no larger than those observed by Fama [1965] and others in the 1960s, who at that time dismissed their economic importance.

1 Separate calculations verify that these noncontemporaneous cross-firm correlations are, on average, positive in daily, weekly, and monthly data.

2 Separate calculations verify that these serial correlations are, on average, negative in weekly, monthly, and quarterly data.

3 Stoll and Whaley [1983, Table 5] indicate that the median bid-ask spread is about 1.3 percent. Assume a .5 probability that the measured price represents a bid, and a .5 probability that the price represents an ask. Then the variance in the measured price due solely to movement between the bid and ask would be .000042 (that is, $(.013^2)(.5^2)$). If the probability of the measured price representing a bid (ask) is independent across weeks, then the variance in returns due solely to movement between the bid and ask would be twice as large, or about .00008. This amount is less than 4 percent as large as the mean variance in weekly residuals.

4 Fama and French [1986a, 1986b] document negative serial correlation in returns at the market-wide level, which is consistent with the possibility that stock prices "take long swings from fundamental values" (Fama and French [1986a]). However, to explain negative serial correlation in market-adjusted returns, the "swings from fundamental values" would have to involve firm-specific "mispricing," not just market-wide "mispricing." Although Fama and French [1986b] are able to replicate the DeBondt-Thaler evidence, which is consistent with firm-specific "mispricing," they conclude that the negative serial correlation in returns appears to be primarily a market-wide phenomenon.
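The arithmetic in this appendix can be retraced with a short Python sketch. The inputs are copied from the schedule and from footnote 3; the variable names are hypothetical, and using the weekly mean product of standard deviations as a stand-in for the mean weekly residual variance is an assumption of the sketch. Because the schedule reports rounded amounts, the growth factors match the printed ones only up to rounding.

```python
# Amounts from the Appendix A schedule (covariances and SD products are x100),
# ordered daily, weekly, monthly, quarterly, annual.
mean_cov = [0.0017, 0.0224, 0.114, 0.402, 1.428]
mean_sd_prod = [0.0420, 0.2298, 0.588, 1.589, 4.709]

def growth(values):
    """Ratio of each amount to the amount for the next shorter interval."""
    return [later / earlier for earlier, later in zip(values, values[1:])]

cov_growth = growth(mean_cov)        # compare with the printed 13.1, 5.1, 3.5, 3.6
sd_growth = growth(mean_sd_prod)     # compare with the printed 5.5, 2.6, 2.7, 3.0

def noise_share(length_ratio, observed_growth):
    """Share s of the short-interval variance that must be noise, solving
    length_ratio * (1 - s) + s = observed_growth for s."""
    return (length_ratio - observed_growth) / (length_ratio - 1.0)

# Weekly to monthly: 4.3 weeks per month, observed variance growth of 2.6.
share = noise_share(4.3, 2.6)

# Footnote 3: bid-ask bounce with a 1.3 percent spread, equal odds of bid or ask.
spread = 0.013
price_var = spread**2 * 0.5**2       # variance of the price around the midpoint
return_var = 2 * price_var           # independent bounce at both ends of the week
weekly_resid_var = mean_sd_prod[1] / 100   # assumed proxy; undoes the x100 scaling
bounce_fraction = return_var / weekly_resid_var
```

Under these inputs the implied noise share is a little over one half, while the bid-ask fraction stays well below one-tenth of that, which is the comparison the text relies on.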


APPENDIX B

BOOTSTRAP ESTIMATES

This appendix presents some details concerning the bootstrap methods used to approximate the sampling distributions of the measures of bias used in this paper. For a more general summary of the application of bootstrap methods in a related setting, see Marais [1986].

Bootstrap standard errors of bias measures. Each bootstrap standard error of a bias measure in Tables 4 and 6 is simply equal to the standard deviation of the bootstrap distribution of that bias measure, where each observation of the bootstrap distribution is generated as follows:

1. Equation (11) (for Table 4) or equation (12) (for Table 6) is estimated separately for each of N firms, using T time series observations, to generate an (N x T) matrix of residuals.

2. An (N x T) matrix of "pseudo-residuals" is created by sampling from the columns of the matrix of original residuals with replacement.

3. An (N x T) matrix of "pseudo-dependent variables" is created by adding the "pseudo-residuals" to the product of the regressors in equation (11) or (12), and the coefficients estimated in step 1 above.

4. The "pseudo-dependent variables" and the original regressors serve as the inputs to the same procedures described in Section 5 of the text for measuring the bias. The output represents a single observation of the bootstrap distribution of the bias measure.

Steps 2 through 4 are repeated many times, in order to create the distribution of a given measure of bias. In general, 100 replications were conducted. However, because the procedure is extremely computer-intensive when Method 2 is used to measure bias, only 40 replications were used for the Method 2 measures in Table 6.

Bootstrap distributions under the hypothesis that the correct scaled residual covariance matrix is the identity matrix. The bootstrap estimates in the final panels of Tables 4 and 6 were generated as follows:

Method 1. Bootstrap estimates for Method 1 were generated by modifying the procedure described above. Step 1 was replaced by a pooled time-series cross-sectional model that assumed the coefficients are constant across time and across firms. This assumption was invoked because the measure of bias under Method 1 would itself be biased upward if the true coefficients change over time. The idea is to determine how the measure of bias performs when the problem of time-varying coefficients does not exist. In step 2, an (N x T) matrix of "pseudo-residuals" was generated by first constructing independent normal (0,1) random variables using the IMSL subroutine GGNML, and then introducing serial (but not cross-sectional) heteroscedasticity into that set of random variables. Specifically, each of the T columns of the matrix of random variables was multiplied by the (time-varying) estimated residual standard deviation for the corresponding cross-sectional regression embedded in the pooled time-series cross-sectional regression in step 1. The number of replications underlying each bootstrap distribution for Method 1 was 500.

Method 2. For Method 2, the measure of bias is a function only of the regressors and the estimated residual covariance matrix, scaled so that the mean residual variance is equal to one. To assess the behavior of that measure of bias while maintaining the assumption that the correct scaled residual covariance matrix is the identity matrix, "pseudo-residuals" from the IMSL subroutine GGNML were used in conjunction with the regressors for equation (11) or (12) as input to the procedures described in Section 5. The number of replications underlying each bootstrap distribution was 100.
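The four resampling steps listed in this appendix for the bootstrap standard errors can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the original implementation: it uses a single regressor with an intercept, takes the Method 1 ratio (time-series variance of the cross-sectional OLS slopes over the mean OLS-based variance estimate) as the bias measure, and all function and argument names are hypothetical.

```python
import numpy as np

def method1_ratio(y, x):
    """Time-series variance of cross-sectional OLS slopes divided by the
    mean OLS-based estimate of the slope variance. y, x: (N, T) arrays."""
    n = y.shape[0]
    xc = x - x.mean(axis=0)                  # demean within each cross-section
    yc = y - y.mean(axis=0)
    sxx = (xc ** 2).sum(axis=0)
    slopes = (xc * yc).sum(axis=0) / sxx     # one slope per period
    resid = yc - xc * slopes
    s2 = (resid ** 2).sum(axis=0) / (n - 2)  # OLS residual variance per period
    return slopes.var(ddof=1) / (s2 / sxx).mean()

def bootstrap_bias_measure(y, x, n_reps=100, seed=0):
    """Bootstrap distribution of the bias measure; returns (std error, draws)."""
    rng = np.random.default_rng(seed)
    t = y.shape[1]
    # Step 1: firm-by-firm time-series OLS, giving an (N, T) residual matrix.
    xm = x.mean(axis=1, keepdims=True)
    ym = y.mean(axis=1, keepdims=True)
    b = ((x - xm) * (y - ym)).sum(axis=1, keepdims=True) \
        / ((x - xm) ** 2).sum(axis=1, keepdims=True)
    fitted = (ym - b * xm) + b * x
    resid = y - fitted
    draws = np.empty(n_reps)
    for r in range(n_reps):
        # Step 2: resample whole columns (periods) with replacement, which
        # keeps each period's cross-sectional dependence intact.
        cols = rng.integers(0, t, size=t)
        # Step 3: pseudo-dependent variables = fitted values + pseudo-residuals.
        y_star = fitted + resid[:, cols]
        # Step 4: recompute the bias measure on the pseudo-sample.
        draws[r] = method1_ratio(y_star, x)
    return draws.std(ddof=1), draws
```

Resampling entire columns, rather than individual residuals, is the design choice that matters here: shuffling residuals across firms would destroy the very cross-sectional dependence whose effect is being measured.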

FIGURE 1

Graphical Representation of Bias Due to Cross-Correlation. Bias is Defined as the Ratio of the Correct Variance of the OLS Coefficient to the Expected Value of the OLS Estimate of Variance.

[Figure not reproduced. Each panel plots bias against sample size (0 to 300). "Event Studies" panel: 20 industries in sample, with curves for daily, weekly, and monthly data. "Cross-Sectional Returns Studies" panels: mean within-industry regressor cross-correlation = .40 and = .10, with 20 or 10 industries in sample and curves for daily and annual data; the within-industry covariance between residual cross-correlations and regressor cross-correlations varies.]


TABLE 1

Cross-sectional Correlation in Market Model Residuals

                                                 Observation Interval of Data
                                             Daily    Weekly    Monthly   Quarterly  Annual
Description of data:
  Test period                                1984     1981-84   1965-84   1965-84    1955-84
  Number of:
    firms                                    1080     1080      428       428        274
    3-digit SIC industries                   87       87        59        59         42
    time series observations                 253      202       240       80         30
  Minimum firms per industry                 5        5         3         3          3
  Maximum firms per industry                 63       63        49        49         15

Intra-industry cross-sectional correlations:
  Distribution across industries of mean
  intra-industry cross-sectional correlations:
    Grand mean across industries             .04      .09       .18       .24        .30
    Standard error of meanᵃ                  (.002)   (.002)    (.004)    (.010)     (.020)
    Standard deviation of industry means     .05      .08       .11       .14        .20
    Standard deviation not attributed
      to sampling errorᵇ                     .05      .08       .10       .12        .15
    .25 percentile                           .02      .05       .10       .14        .21
    .50 percentile                           .03      .07       .16       .21        .31
    .75 percentile                           .05      .10       .25       .35        .42
  Mean cross-sectional correlation,
    equally-weighted across firms            .04      .13       .23       .29        .33
  Fraction of industries for which one can
    reject at .05 level hypothesis of no
    intra-industry cross-sectional
    correlation                              51%      78%       98%       90%        86%

Inter-industry cross-sectional correlations:
  Reject at .05 level hypothesis of no
    inter-industry correlation?              yesᶜ     yesᶜ      yesᶜ      yesᶜ       N/Aᵈ
  Mean across industry-pairs                 .01      .02       .05       .06        .06

a Standard errors were estimated while taking into account the covariance among cross-correlations estimated for pairs of firms within the same industry, but while assuming means for different industries are uncorrelated.
b This estimate was obtained by subtracting from the observed variance in industry means the variance that is estimated to be due to sampling error.
c The test is described by Press [1972, p. 180].
d Not applicable; insufficient number of time series observations to permit testing.


TABLE 2

Bias Due to Cross-correlation for Four Cases
(Ratio of Correct Variance of OLS Coefficient to the Expected Value of its Estimate, When Cross-correlation is Sole Source of Bias)

Structure of regressor: "Event studies" (regressor is a column of ones during the event period; zero otherwise)
  No restrictions imposed on sources of bias:   $1 + (N - 1)\rho$
  Industry effects modelᵃ:                      $1 + (P - 1)\rho_w + (N - P)\rho_c$

Structure of regressor: "Cross-sectional returns studies" (no restrictions on regressor)
  No restrictions imposed on sources of bias:   $1 + (N - 1)\,\mathrm{Cov}_{ij}[\rho_{ij}, \omega_{ij}] - \rho$
  Industry effects modelᵃ:                      $1 + (P - 1)\,\mathrm{Cov}_w[\rho_{ij}, \omega_{ij}] + (P - 1)(\rho_w - \rho_c)\omega_w - \rho_c$

$N$ = number of firms in sample; $P$ = number of firms per industry; $\rho$ = mean residual cross-correlation; $\rho_w$ = mean residual cross-correlation within industries; $\rho_c$ = mean residual cross-correlation across industries; $\omega_w$ = mean "regressor cross-correlation" within industries; $\mathrm{Cov}_{ij}[\rho_{ij}, \omega_{ij}]$ = covariance between residual cross-correlations and "regressor cross-correlations" across all firm-pairs; $\mathrm{Cov}_w[\rho_{ij}, \omega_{ij}]$ = covariance between residual cross-correlations and "regressor cross-correlations" within industries.

a The industry effects model imposes the assumption that the covariance between cross-industry residual correlations and cross-industry regressor correlations is zero.
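The four bias ratios of Table 2 can be written as small Python functions, which makes it easy to see how quickly the ratio grows with sample size. The sketch rests on two readings of the printed table that should be treated as assumptions: that the unrestricted event-study ratio is 1 + (N - 1)ρ, and that the unrestricted cross-sectional ratio ends in - ρ, in parallel with the - ρ_c term of the industry effects version. Function and argument names are hypothetical, and the inputs in the usage note are illustrative only.

```python
def event_study_bias(n_firms, rho):
    """Event study, no restrictions: 1 + (N - 1) * rho."""
    return 1.0 + (n_firms - 1) * rho

def event_study_bias_industry(n_firms, firms_per_industry, rho_w, rho_c):
    """Event study, industry effects model: 1 + (P - 1) rho_w + (N - P) rho_c."""
    return (1.0 + (firms_per_industry - 1) * rho_w
            + (n_firms - firms_per_industry) * rho_c)

def cross_sectional_bias(n_firms, cov_rho_omega, rho):
    """Cross-sectional returns study, no restrictions (assumed reading):
    1 + (N - 1) * Cov_ij[rho_ij, omega_ij] - rho."""
    return 1.0 + (n_firms - 1) * cov_rho_omega - rho

def cross_sectional_bias_industry(firms_per_industry, cov_w, rho_w, rho_c, omega_w):
    """Cross-sectional returns study, industry effects model:
    1 + (P - 1) Cov_w + (P - 1)(rho_w - rho_c) omega_w - rho_c."""
    p = firms_per_industry
    return 1.0 + (p - 1) * cov_w + (p - 1) * (rho_w - rho_c) * omega_w - rho_c
```

For instance, `event_study_bias(100, 0.04)` evaluates 1 + 99(.04), so even a mean residual cross-correlation as small as .04 implies a correct variance nearly five times the OLS-based estimate in a 100-firm event study.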


TABLE 3

Estimates of Cross-sectional Correlation in Data Used for Regressions of Annual Stock Returns Against Unexpected Historical Cost and Current Cost Income

Sample size: 113 firms in 27 industries
Test period: 1962 through 1980

Distribution of cross-correlations in dependent variable & regression residuals:

                                  Intra-industry mean           Inter-industry mean
                                  cross-correlationsᵃ           cross-correlationsᵇ
                                  Dependent    Regression       Dependent    Regression
                                  variable     residual         variable     residual
.25 percentile                    .21          .22              -.13         -.11
.50 percentile                    .36          .38              -.02          .00
.75 percentile                    .46          .47               .11          .12
Mean                              .34          .34              -.01          .01

Distribution across industries of mean intra-industry cross-correlation ($\omega_w$)ᶜ among values of a given regressor:

                                  Independent Variable
                                  Unexpected Historical    Unexpected Current
                                  Cost Income              Cost Income
.25 percentile                    .16                      .18
.50 percentile                    .27                      .28
.75 percentile                    .41                      .34
Maximum                           .76                      .63
Grand mean across industries      .30                      .28

Distribution across industries of mean intra-industry cross-correlation ($\omega_w$)ᶜ among values of a given orthogonalized regressor:

                                  Orthogonalized Independent Variable
                                  Unexpected Historical    Unexpected Current
                                  Cost Income              Cost Income
.25 percentile                    .19                      .19
.50 percentile                    .26                      .25
.75 percentile                    .33                      .33
Maximum                           .92                      .92
Grand mean across industries      .30                      .29

a Before constructing distribution, cross-correlations are averaged across firm-pairs within the same industry.
b Before constructing distribution, cross-correlations are averaged across firm-pairs selected from the same pair of industries (e.g., all correlations between steel producers and retailers are averaged).
c For each industry-year, the amount $\omega_w$ (as defined in Section 4.1) is calculated. These amounts are averaged across 19 years to arrive at the distribution described here.


TABLE 4

Results of Cross-sectional Regressions of Annual Residuals Against Unexpected Historical Cost and Current Cost Income and Estimates of Bias in Estimated Variances of Coefficients

Sample size: 113 firms in 27 industries
Test period: 1962 through 1980

                                                          Independent Variable
                                                          Unexpected Historical   Unexpected Current
                                                          Cost Income             Cost Income
Summary of cross-sectional regressions for each of 19 years:
  Median regression coefficient                           1.74                    -.09
  Mean regression coefficient                             3.30                    -.64
  Time-series-based standard deviation
    of mean coefficient                                   1.47                    1.28

Estimated Measures of Bias:
  Method 1: Ratio of time-series variance of OLS coefficient, to mean of OLS-based estimate of variance:
    Ratio                                                 3.48                    2.71
    Bootstrap standard error                              (.86)                   (.73)
  Method 2: Ratio of coefficient variance based on estimated full covariance matrix, to that based on OLS:
    a) Intra-industry residual correlations unrestricted; cross-industry residual correlations assumed homogeneous within a given industry pairing
      Mean across 19 cross-sections                       1.81                    1.70
      Bootstrap standard error                            (.19)                   (.20)
    b) Intra-industry residual correlations unrestricted; cross-industry residual correlations assumed homogeneous (and close to zero) across all industry-pairs
      Mean across 19 cross-sections                       1.64                    1.59
      Bootstrap standard error                            (.18)                   (.19)

Behavior of bias measures when correct residual covariance matrix is scalar identity:
  Method 1:
    Bootstrap mean (value of 1.00 indicative
      of unbiasedness)                                    1.01                    1.01
    Bootstrap standard deviation                          (.57)                   (.57)
    Frequency of occurrence of measure as large
      as that actually observed above                     .01                     .01
  Method 2:
    Bootstrap mean (value of 1.00 indicative
      of unbiasedness)                                    1.00                    .99
    Bootstrap standard deviation                          (.08)                   (.08)
    Frequency of occurrence of measure as large
      as that actually observed above                     .00                     .00


TABLE 5

Estimates of Cross-sectional Correlation in Data Used for Regressions of Quarterly Residuals Against Unexpected Cash Flows and Accruals

Sample size: 104 firms in 19 industries
Test period: 1977:III through 1984:III

Distribution of cross-correlations in dependent variable & regression residuals:

                                  Intra-industry mean           Inter-industry mean
                                  cross-correlationsᵃ           cross-correlationsᵇ
                                  Dependent    Regression       Dependent    Regression
                                  variable     residual         variable     residual
.25 percentile                    .04          .06              -.09         -.07
.50 percentile                    .12          .10               .00          .00
.75 percentile                    .27          .26               .06          .05
Grand mean across industries      .17          .16              -.01         -.01

Distribution across industries of mean intra-industry cross-correlation ($\omega_w$)ᶜ among values of a given regressor:

                                  Independent Variable
                                  Unexpected Cash Flows   Unexpected Inventory   Unexpected Noncurrent
                                  from Operations         Accruals               Accruals
.25 percentile                    .03                     .02                    .04
.50 percentile                    .06                     .04                    .14
.75 percentile                    .15                     .23                    .39
Maximum                           .36                     .55                    .53
Grand mean across industries      .09                     .10                    .14

Distribution across industries of mean intra-industry cross-correlation ($\omega_w$)ᶜ among values of a given orthogonalized regressor:

                                  Orthogonalized Independent Variable
                                  Unexpected Cash Flows   Unexpected Inventory   Unexpected Noncurrent
                                  from Operations         Accruals               Accruals
.25 percentile                    .02                     .03                    .01
.50 percentile                    .06                     .06                    .10
.75 percentile                    .09                     .15                    .15
Maximum                           .52                     .57                    .43
Grand mean across industries      .10                     [illegible]            .10

a Before constructing distribution, cross-correlations are averaged across firm-pairs within the same industry.
b Before constructing distribution, cross-correlations are averaged across all firm-pairs selected from the same pair of industries (e.g., all correlations between steel producers and retailers are averaged).
c For each industry-quarter, the amount $\omega_w$ (as defined in Section 4.1) is calculated. These amounts are averaged across 29 quarters to arrive at the distribution described here.


TABLE 6

Results of Cross-sectional Regressions of Quarterly Residuals Against Unexpected Cash Flows and Accruals and Estimates of Bias in Estimated Variances of Coefficients

Sample size: 104 firms in 19 industries
Test period: 1977:III through 1984:III

                                                   Independent Variable
                                                   Unexpected Cash Flows   Unexpected Inventory   Unexpected Noncurrent
                                                   from Operations         Accruals               Accruals
Summary of cross-sectional regressions for each of 29 quarters:
  Median regression coefficient                    .27                     .30                    .27
  Mean regression coefficient                      .37                     .36                    .28
  Time-series-based standard deviation
    of mean coefficient                            .11                     .13                    .12

Estimated Measures of Bias:
  Method 1: Ratio of time-series variance of OLS coefficient, to mean of OLS-based estimate of variance:
    Ratio                                          3.36                    3.57                   1.80
    Bootstrap standard error                       (.63)                   (.60)                  (.71)
  Method 2: Ratio of coefficient variance based on estimated full covariance matrix, to that based on OLS:
    a) Intra-industry residual correlations unrestricted; cross-industry residual correlations assumed homogeneous within a given industry pairing
      Mean across 29 cross-sections                2.32                    2.18                   1.34
      Bootstrap standard error                     (.22)                   (.20)                  (.12)
    b) Intra-industry residual correlations unrestricted; cross-industry residual correlations assumed homogeneous (and close to zero) across all industry-pairs
      Mean across 29 cross-sections                2.07                    2.00                   1.35
      Bootstrap standard error                     (.18)                   (.17)                  (.11)

Behavior of bias measures when correct residual covariance matrix is scalar identity:
  Method 1:
    Bootstrap mean (expect 1.00 if unbiased)       1.00                    .97                    .98
    Bootstrap standard deviation                   (.34)                   (.31)                  (.34)
    Frequency of occurrence of measure as large
      as that actually observed above              .00                     .00                    .025
  Method 2:
    Bootstrap mean (expect 1.00 if unbiased)       1.00                    1.00                   1.00
    Bootstrap standard deviation                   (.06)                   (.06)                  (.06)
    Frequency of occurrence of measure as large
      as that actually observed above              .00                     .00                    .00


REFERENCES

Ball, R. "Market Model Studies: Justification, Interpretation, and Experimental Problems," Working Paper, University of Queensland, 1975.

Ball, R. and P. Brown. "An Empirical Evaluation of Accounting Income Numbers," Journal of Accounting Research (Autumn 1968): 159-178.

Beaver, W. H. "Econometric Properties of Alternative Security Return Methods," Journal of Accounting Research (Spring 1981): 163-184.

Beaver, W. H. and R. Dukes. "Interperiod Tax Allocation, Earnings Expectations, and the Behavior of Security Prices," The Accounting Review (April 1972): 320-332.

Beaver, W. H., R. Clarke and W. Wright. "The Association Between Unsystematic Security Returns and the Magnitude of the Earnings Forecast Error," Journal of Accounting Research (Autumn 1979): 316-340.

Beaver, W. H., C. Eger, S. Ryan, and M. Wolfson. "Financial Reporting and The Structure of Bank Share Prices," Working Paper, Stanford University, July 1985.

Beaver, W. H., R. Lambert, and S. Ryan. "The Information Content of Security Prices: A Second Look," Working Paper, Stanford University, September 1986.

Beaver, W. H. and W. R. Landsman. The Incremental Information Content of FAS 33 Disclosures, FASB Research Report (1983).

Beaver, W. H. and S. Ryan. "How Well Do Statement No. 33 Earnings Explain Stock Returns?" Financial Analysts Journal (September-October 1985): 66-71.

Bernard, V. "Unanticipated Inflation and the Value of the Firm," Journal of Financial Economics (March 1986): 285-321.

Bernard, V. and R. Ruland. "The Information Content of Alternative Income Numbers: An Examination of Evidence and Methodological Issues Based on Time-series Data," Working Paper, University of Michigan, 1986a.

Bernard, V. and R. Ruland. "The Information Content of Current Cost and Historical Cost Income: Cross-sectional Analyses for 1962-1980," Working Paper, University of Michigan, 1986b.

Biddle, G. and F. Lindahl. "Stock Price Reaction to LIFO Adoptions," Journal of Accounting Research (Autumn 1982): 551-588.

Binder, J. "On the Use of the Multivariate Regression Model in Event Studies," Journal of Accounting Research (Spring 1985): 370-383.

Bowen, R., D. Burgstahler, and L. Daley. "The Incremental Information Content of Accruals versus Cash Flows," Working Paper, University of Washington, June 1986.

Brown, S., and J. Warner. "Measuring Security Price Performance," Journal of Financial Economics (September 1980): 205-258.

Brown, S., and J. Warner. "Using Daily Stock Returns: The Case of Event Studies," Journal of Financial Economics (March 1985): 3-32.

Bublitz, B., T. Frecka and J. McKeown. "Market Association Tests and FASB Statement No. 33 Disclosures," Journal of Accounting Research (Supplement, 1985): 1-23.

Burgstahler, D. and E. Noreen. "Detecting Contemporaneous Security Market Reactions to a Sequence of Related Events," Journal of Accounting Research (Spring 1986): 170-186.

Christie, A. "On Information Arrival and Hypothesis Testing in Event Studies," Working Paper, University of Rochester, 1983.

Christie, A. "On Cross-sectional Analysis in Accounting Research," Working Paper, University of Southern California, January 1986.

Christie, A., M. Kennelley, W. King and T. Schaefer. "Testing for Incremental Information Content in the Presence of Collinearity," Journal of Accounting and Economics (December 1984): 205-218.

Collins, D. and W. Dent. "A Comparison of Alternative Testing Methodologies Used in Capital Markets Research," Journal of Accounting Research (Spring 1984): 48-84.

Collins, D., S. P. Kothari, and J. Rayburn. "Firm Size and the Information Content of Prices with Respect to Earnings," Journal of Accounting and Economics (1987, forthcoming).

Collins, D., M. Rozeff and D. Dhaliwal. "The Economic Determinants of Market Reaction to Proposed Mandatory Accounting Changes in the Oil and Gas Industry: A Cross-Section Analysis," Journal of Accounting Research (May 1981): 37-72.

DeBondt, W. and R. Thaler. "Does the Stock Market Overreact?" Journal of Finance (July 1985): 793-805.

Dyckman, T., D. Philbrick, and J. Stephan. "A Comparison of Event Study Methodologies Using Daily Stock Returns: A Simulation Approach," Journal of Accounting Research (Supplement, 1984): 1-30.

Edgington, E. S. Randomization Tests (Marcel Dekker, Inc.) 1980.

Efron, B. "Computers and the Theory of Statistics: Thinking the Unthinkable," SIAM Review (1979), Vol. 21: 460-480.

Fama, E. "The Behavior of Stock Market Prices," Journal of Business (January 1965): 631-674.

Fama, E. and K. French. "Permanent and Temporary Components of Stock Prices," CRSP Working Paper No. 178, University of Chicago, July 1986a.

Fama, E. and K. French. "Common Factors in the Serial Correlation of Stock Returns," Working Paper, University of Chicago (October, 1986b).

Froot, K. "Consistent Covariance Matrix Estimation with Cross-sectional Dependence and Heteroskedasticity in Cross-sectional Financial Data," Working Paper, Massachusetts Institute of Technology, 1987.

Gibbons, M.
"Multivariate Tests of Financial Models: A New Approach," Journal of Financial Economics (March 1982): 3-27. Greenwald, B. C. "A General Analysis of Bias in the Estimated Standard Errors of Least Squares Coefficients," Journal of Econometrics 22 (1983): 323-338. Hansen, L. "Large Sample Properties of Generalized Method of Moment Estimators," Econometrica (1982): 1029-1054. Hughes, J., W. A. Magat, and W. Ricks. "The Economic Consequences of the OSHA Cotton Dust Standards: An Analysis of Stock Price Behavior," The Journal of Law and Economics, Vol. 29 (April 1986): 29-60. Hughes, J. and W. Ricks. "Accounting for Retail Land Sales; Analysis of a Mandated Change," Journal of Accounting Economics (August 1984): 101-132. Jaffe, J. "Special Information and Insider Trading," Journal of Business (July 1974): 410-428. Judge, G., W. Griffiths and T. Lee. The Theory and Practice of Econometrics, New York (1980). Kormendi, R. and R. Lipe. "Earnings Innovations, Earnings Persistence, and Stock Returns," Journal of Business (1987, forthcoming). 49

Kothari, S.P. and C. Wasley. "Measuring Security Price Performance in Size-clustered Samples" Working Paper, University of Rochester, 1986. Leftwich, R. "Evidence on the Impact of Mandatory Changes in Accounting Principles on Corporate Loan Agreements," Journal of Accounting and Economics (March 1981): 3-36. Linsmeier, T. J. "A Debt Covenant Rationale for a Market Reaction to a Mandated Accounting Change: The Case of the Investment Tax Credit," Working Paper, University of Iowa, February 1986. Lipe, R. "The Information Contained in the Components of Earnings," Journal of Accounting Research, (Supplement 1986, forthcoming). Lys, T. "Mandated Accounting Changes and Debt Covenants: The Case of Oil and Gas Accounting," Journal of Accounting and Economics (April 1984): 39-65. Marais, M. L. "An Analysis of a Multivariate Regression Model in the Context of a Regulatory Event Study by Computer Intensive Resampling," Working paper, University of Chicago, July 1986. Mundlak, Y. "On the Pooling of Time Series and Cross Section Data," Econometrica (January 1978): 69-85. Noreen, E., An Introduction to Testing Hypotheses Using Computer-Intensive Methods, Manuscript, University of Washington, 1986. Noreen, E., and J. Sepe, "Market Reactions to Accounting Policy Deliberations: The Inflation Accounting Case," The Accounting Review (April 1981): 253-269. Phillips, P. C. B. "The Exact Distribution of SUR Estimator," Econometrica (July, 1985): 745-756. Press, S. J., Applied Multivariate Analysis. New York: Holt, Rinehart & Winston (1972). Rao, C. R., Linear Statistical Inference and Its Applications, John Wiley & Sons: New York (1973). Rayburn, J. "The Association of Operating Cash Flow and Accruals with Security Returns," Journal of Accounting Research, (Supplement 1986, forthcoming). Ricks, W. "Firm Size Effects and the Association Between Excess Returns and LIFO Tax Savings," Journal of Accounting Research (Spring 1986): 206-216. Ryan, S. 
"Structural Models of the Price to Earnings Relation: Measurement Errors in Accounting Earnings," Working paper, Stanford University, 1986. Salamon, G. L. "The Econometric Properties of Alternative Security Return Methods in the Presence of Industry and Time Period Clustering," Working Paper, University of Florida, June 1985. Schipper, K. and R. Thompson. "The Impact of Merger-Related Regulations on the Shareholders of Acquiring Firms," Journal of Accounting Research (Spring 1983): 184-221. Schipper, K., and R. Thompson. "The Impact of Merger-Related Regulations Using Exact Distributions of Test Statistics," Journal of Accounting Research (Spring 1985): 408-415. Sefcik, S. and R. Thompson. "An Approach to Statistical Inference in Cross-Sectional Regression with Security Abnormal Returns as Dependent Variable," Journal of Accounting Research (Autumn 1986): 316-334. Stambaugh, R. "On the Exclusion of Assets from Tests of the Two-parameter Model: A Sensitivity Analysis," Journal of Financial Economics (November 1982): 237-268. Stoll, H. and R. Whaley."Transactions Costs and the Small Firm Effect," Journal of Financial Economics, Vol. 12 No. 1 (June 1983): 57-80. 50

f 1. Thompson, R. "Conditioning the Return-Generating Process on Firm-Specific Events: A Discussion of Event Study Methodologies," Journal of Financial and Quantitative Analysis (June 1985): 151-172. White, H. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica (May 1980): 817-828. Wilson, P. "The Relative Information Content of Accruals and Cash Flows: Combined Evidence at the Earnings Announcement and Annual Report Release Date," Journal of Accounting Research, (Supplement 1986, forthcoming). Wilson, P. "The Incremental Information Content of the Accrual and Funds Components of Earnings After Controlling for Earnings," The Accounting Review (April 1987, forthcoming). 51