Research Support September 1993 School of Business Administration ON THE CLUSTERING OF AGENTS' DECISIONS: HERD BEHAVIOR VERSUS THE ENDOGENOUS TIMING OF ACTIONS Working Paper #727 Faruk Gul Stanford University and Russell Lundholm University of Michigan


Research Paper #1195 On the Clustering of Agents' Decisions: Herd Behavior versus the Endogenous Timing of Actions Faruk Gul* and Russell Lundholm** latest version: September 1993 * Stanford University ** University of Michigan We thank Ayman Hindy, Steve Huddart, Nahum Melumad, Ennio Stacchetti and Jeffrey Zwiebel for helpful suggestions. Gul gratefully acknowledges support from the Alfred P. Sloan Foundation and the National Science Foundation.

I. INTRODUCTION In many economic situations agents base their decisions largely on the observed decisions of other agents. Upon observing an empty restaurant we typically conclude that the food is bad. There is evidence that money managers tend to choose their portfolios based on the observed choices of other money managers, currency traders tend to gather the same information as other currency traders, industries are unusually slow to deviate from standard practices, female grouse tend to enter male territories already populated by other female grouse, and analysts tend to bias their forecasts toward the previously-made forecasts of other analysts (see Scharfstein and Stein (1990); Froot, Scharfstein and Stein (1990), Zwiebel (1991), Bikhchandani, Hirshleifer and Welch (1991) and Stickel (1990, 1991), respectively). Deferring to conventional wisdom may not be irrational. Indeed a basic message of information economics is that we can often infer the information of other agents by observing their actions. Unfortunate outcomes can arise, however, when agents defer to others' decisions so much that they ignore their own information completely and simply take the same action that predecessor agents have taken. Such a strategy is known as "herding" and its potential causes have been studied extensively. The purpose of this paper is to analyze the observed similarity of agents' decisions. In what follows we will distinguish between clustering, which is the observation that agents' decisions tend to be very similar, and herding, which is the statement that clustering occurs because some agents ignore their own information entirely. We provide a formal definition of clustering, illustrate a simple mechanism for clustering that does not involve herding and argue that clustering is likely to be a common phenomenon while herding is not. In particular, we consider a setting where agents choose both an action and the time at which to take the action. 
We show that allowing agents to choose when to act creates clustering, even when the previously studied motivations for herding are absent. Furthermore, the clustering of agents' choices that results from herding is informationally inefficient -- agents ignore useful information -- whereas the clustering of choices in our

model is due to an informational efficiency; agents use their own information and can infer other agents' information as well. Before describing our approach in more detail, however, a brief review of the herding literature is in order.

The existing literature offers two rational explanations for herding, which we label statistical herding and reputational herding. Examples of statistical herding are given by Bikhchandani, Hirshleifer and Welch (1991) and Banerjee (1990). Consider the situation in Bikhchandani, Hirshleifer and Welch (1991), where agents make binary decisions (accept or reject) in a pre-determined sequence. Each agent has a conditionally independent signal about the value of each choice and can observe the choices of all predecessor agents. Suppose the first two agents receive "high" signals and choose to accept. It is quite possible that the information implicit in the first two agents' actions overwhelms whatever information the third agent might have, and hence she too will choose to accept. But now all subsequent agents are in exactly the same position as the third agent; they will each ignore their own information and choose to accept. This type of result, known as an "information cascade," renders the economy informationally inefficient in the sense that useful information is ignored. Note, however, that the result hinges on the binary nature of the agents' choice set. It can be shown that for any fixed number of players, the larger the choice set, the less likely statistical herding is to occur. In particular, if the choice variable is continuous and agents are rewarded according to the proximity of their choice to the full-information optimal choice, then no information goes unused. Each agent in the sequence uses her own information and any information recoverable from the predecessor agents' decisions. Although the agents' choices are closer together than if they were made simultaneously, the economy is informationally efficient.
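The cascade logic described above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the text: signals are binary and equally precise, the prior is uniform, and an indifferent agent follows her own signal. Under those assumptions only the net count of inferred signals matters, and a cascade starts once that count reaches two in either direction. The function name `run_sequence` is hypothetical.

```python
def run_sequence(signals):
    """Return accept (1) / reject (0) choices for agents acting in a fixed order.

    signals: iterable of private signals, 1 for "high" and 0 for "low".
    Assumptions (illustrative only): equally precise binary signals, a uniform
    prior, and an indifferent agent follows her own signal.
    """
    net = 0        # inferred high signals minus inferred low signals so far
    actions = []
    for signal in signals:
        if net >= 2:
            action = 1          # accept cascade: own signal cannot tip the balance
        elif net <= -2:
            action = 0          # reject cascade
        else:
            # Outside a cascade the action reveals the signal, so it is counted.
            action = signal
            net += 1 if signal == 1 else -1
        actions.append(action)
    return actions


if __name__ == "__main__":
    # Two high signals first: every later agent accepts regardless of her signal.
    print(run_sequence([1, 1, 0, 0, 0]))
    print(run_sequence([1, 0, 0, 0, 1]))
```

Once the count is frozen inside a cascade, later signals never enter the public record, which is the informational inefficiency the text refers to.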
Examples of reputational herding are given in Scharfstein and Stein (1990), Froot, Scharfstein and Stein (1990), Zwiebel (1991), and Trueman (1991). Consider the situation in Scharfstein and Stein (1990) where each agent receives a signal about the value of alternative choices, but the signal may or may not be informative. Informative signals

have correlated errors while uninformative ones are independent, and agents do not know whether their signal is informative. An agent does not attempt to make the most valuable decision; rather, she attempts to maximize the probability that an outsider will place on the possibility that she is an informed agent (i.e. her reputation). Because informed agents receive signals with correlated errors, a subsequent agent maximizes her appearance as an informed agent by taking the same action as the predecessor agent, regardless of her information. Note that this result depends critically on the assumptions that agents' incentives are not aligned with the value of the actual outcome and that informed agents' signals have correlated errors. If an agent wanted only to make the most valuable decision then her own information would influence her decision, and if the signal errors were uncorrelated then common decisions would not indicate the presence of informed agents. Reputational herding could be mitigated by a contract that aligns the interests of the agent with the value of the outcome to the firm.

In short, the herding literature explains clustering by showing that, for either statistical or reputational reasons, agents rationally ignore their own information and mimic other agents' actions. Note, however, that in both statistical and reputational herding the order in which agents act is given exogenously.

We offer a different and arguably more plausible explanation for the clustering of agents' decisions, based on the idea that agents choose the timing of their actions strategically. If agents choose when to act then their timing choice may reveal some of their information. Furthermore, if the choice of when to act is informative then so is the choice of when not to act. Thus, the very first actor knows something about the other agents' information by the simple fact that they have not yet acted.
The endogenous timing of actions creates an information leak that may enable the first actor to make a more informed decision. While it may appear that the second agent is biasing her action toward the first agent's choice (as in the herding models), we show that the first agent is actually altering her decision toward the forthcoming decision of the second agent. This source of clustering is labeled anticipation. In addition, if the cost of

delaying an action is higher for agents with more extreme information, then in equilibrium they choose to act first. Holding aside the improved decision of the first agent, if the most extreme agent acts first and the second agent can recover the first agent's signal by observing the action, then the most extreme differences in the two agents' decisions cannot arise. This source of clustering is labeled ordering. In sum, the strategic timing of actions decreases the expected gap between the actions of the agents when those who have more extreme signals act first (i.e. ordering) and when those who act first infer some of the non-acting agents' information (i.e. anticipation).

Many economic situations present a trade-off between waiting for additional information to present itself and acting quickly on the basis of less information. A money manager may learn something about the optimal allocation between stocks and bonds by waiting to observe another manager's allocation choice, but the longer he waits the longer he holds a portfolio that is suboptimal based on his own information. A firm may wait to observe another firm's success with a new product before deciding how vigorously to enter the market, but the delay will cost the firm some market share if it subsequently chooses to enter. The simple discounting of future payoffs creates a delay cost. The tradeoff between more informed decisions and the urgency to make a decision is the main ingredient of our model.[1]

In our setting, agents prefer to make decisions that are accurate (in the sense of being close to the full-information decision) and, for a given level of accuracy, they prefer to make decisions with as little delay as possible. In the next two sections we focus primarily on a game where the cost of delay increases as the value of the unknown variable increases.
While none of the previously studied causes of "clustered" decisions are present, we demonstrate that the agents' decisions are closer together than when the timing of actions is exogenous.

[1] Other papers that demonstrate the cost of delayed decisions include: Hendricks and Kovenock (1989), who study the tradeoff between waiting to see the results of another firm's oil exploration and the cost of delaying the profit if the results are favorable; Bulow and Klemperer (1991), who study how a buyer trades off waiting to get a lower price against the probability that the seller will run out of stock; and, in the context of a public goods problem, Bliss and Nalebuff (1984), who show that the agent who suffers most by waiting for the public good is the first to supply it privately.

We then show

that clustering occurs for any delay cost that is either a strictly monotone or strictly convex function of the unknown variable. We illustrate this general theorem with two examples from a setting where the first agent's decision remains unchanged as time passes (so there is no anticipation). In one case the cost of delay is higher for agents with more extreme signals, so they forecast first. Clustering occurs because the most extreme news is available to the second agent, and so she doesn't make the most extreme errors. However, in another case where the cost of delay is higher for agents with less extreme signals, the second agent makes the most extreme forecast errors and agents' forecasts become dispersed rather than clustered together.

In section IV we discuss two extensions to our model. The first demonstrates that our results hold in a setting where, in addition to time and accuracy, agents are also concerned about their relative performance. The second shows how our results carry over to a model with many agents. We conclude in section V by arguing that, in general, the conditions sufficient for clustering are quite mild and, in particular, considerably less stringent than the assumptions found in the herding literature.

II. THE MODEL

Consider a model with two agents. Each agent is interested in predicting the future value of a project, denoted by the realization of a random variable $W$ and, holding the accuracy of the prediction constant, would prefer to make her prediction sooner rather than later. Each agent has information about the realization of $W$; in particular, $W = S_1 + S_2$ and agent $i$ observes the realization $s_i$ (uppercase characters denote random variables and lowercase characters denote their realizations). For simplicity we assume that the $S_i$'s are independent and have a uniform distribution on the interval $I = [0,1]$. Denote agent $i$'s prediction by $z_i$ and the time of the prediction by $t_i$.
Each agent makes only one prediction and the second agent observes the first agent's prediction.

To capture the tradeoff between the accuracy of the prediction and the time at which it is made, assume agent $i$'s utility is given by

$$u(w, z_i, t_i) = -(w - z_i)^2 - a\,w\,t_i, \tag{1}$$

where $a > 0$ is a constant. The utility function trades off the cost of an error in the agent's prediction (the first term) against the cost of delaying the prediction (the second term). Note that, absent some interaction with the other player, there is no reason to delay in making a prediction; the reason an agent may choose to wait is to observe the other agent's prediction. If the forecast of the agent who acts first depends in some way on her realized $s_i$, then by observing this forecast the agent who acts second will be more informed about $w$. The delay cost is increasing in the realized $w$, capturing the idea that there is more urgency in forecasting more valuable projects. Later we present a model where the time cost is increasing in the squared deviation of $w$ from its prior expectation, capturing situations where there is greater urgency in predicting extreme realizations in either direction. Finally, $a$ parameterizes the utility function's relative sensitivity to accuracy versus delay.

For simplicity, the extensive form we consider precludes any pre-play communication between the agents. This is noteworthy because the utility of agent $i$ given in (1) does not depend on agent $j$'s actions, so both agents could achieve the highest utility by simply sharing their signals prior to the beginning of the game. Of course, the incentive to exchange information would be eliminated in a game where the players' utilities are decreasing in the accuracy of their opponent's decision. This would be the case in a model of relative performance evaluation, such as in Zwiebel (1991), or in any zero-sum game. Later we modify our model to include a term in agents' utilities that is decreasing in their opponent's accuracy and show that none of our conclusions are altered by the modification.
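As a small illustration of the tradeoff in (1), the following sketch evaluates the utility for an agent who either forecasts at once using the prior mean of the other signal, or waits and forecasts exactly. The function name `utility` and the numerical values are assumptions chosen only for the example.

```python
def utility(w, z, t, a):
    """Utility (1): minus the squared forecast error, minus the delay cost a*w*t."""
    return -(w - z) ** 2 - a * w * t


if __name__ == "__main__":
    s1, s2, a = 0.8, 0.4, 1.0      # illustrative signals and delay parameter
    w = s1 + s2

    # Forecast immediately, replacing the unknown s2 by its prior mean 1/2.
    act_now = utility(w, s1 + 0.5, 0.0, a)

    # Wait until a hypothetical time 0.1, learn s2, and forecast w exactly.
    wait = utility(w, w, 0.1, a)

    print("forecast at once:", act_now)
    print("wait, then forecast exactly:", wait)
```

With these particular numbers the delay cost outweighs the accuracy gain, but the comparison reverses for smaller values of `a` or shorter waits.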

We focus primarily on the model without a relative performance term to highlight the source of clustering in the simplest possible setting.

An agent's prediction $z_i$ can be interpreted in a number of ways. It may literally be a forecast, as might be the case if the agents were financial analysts or macroeconomists. Alternatively, it may be a more tangible action choice. For example, $z_i$ may be the size of an initial investment in a new project and $w$ the optimal level of investment based on all available information. The agent prefers to make an investment as close to the full-information optimum as possible and, the larger the full-information level of investment, the more costly it is to delay the decision. In general, all that we require of $z_i$ is that it be a one-to-one function of the agent's expectation of $W$.

Denote the strategy profile of the two agents by $\sigma = (\sigma_1, \sigma_2)$. The set of strategy profiles is potentially quite large, but a few observations will greatly limit the set of possible best responses. First, agent $i$'s prediction should minimize the mean squared error of the forecast (the first term in the utility expression) conditional on the agent's signal $s_i$, the equilibrium strategy profile $\sigma$ and the elapsed time $t$. The forecast with this property is $s_i + E(S_j \mid \sigma, t)$. Second, note that once one agent has made a forecast there is no benefit to the other agent in delaying her forecast any longer. Thus, once one player makes her prediction the other player predicts immediately afterward. With these observations, a strategy for agent $i$ is fully described by a function $t_i : I \to \mathbb{R}_+$, where $t_i(s_i)$ specifies the latest possible time that agent $i$ will make her forecast (she will forecast earlier if the other agent forecasts before this time). We will refer to the agent who chooses to forecast first as the first agent, although this may be either the agent with signal $s_1$ or the agent with signal $s_2$.

A final observation is in order.
For games in continuous time, guaranteeing that strategies imply well-defined outcomes entails certain technical difficulties (see Stinchcombe (1988)). So, for example, the strategy that says one player will predict "immediately after" the other player is somewhat vague. To make this precise, our game should be considered as the limit of a series of discrete time games as the length of the time

between periods goes to zero. We show in the appendix that there is a unique symmetric equilibrium outcome to the discrete time game, and that the equilibrium outcome converges to the outcome given for the continuous time game.

III. RESULTS

The Symmetric Equilibrium

Most of our attention will focus on the symmetric equilibrium where $t_1(s_i) = t_2(s_i) \equiv t(s_i)$. In particular, we show that there exists a unique symmetric equilibrium. In this equilibrium $t'(s_i) < 0$ and $t(1) = 0$. Consider such a strategy profile. Note that, because $t(s_i)$ is invertible, the second agent can infer the first agent's signal from the time the first agent made her forecast; denote the inverse of $t(s_i)$ by $s(\tau)$. Because $t(s_i)$ is downward sloping, if the game proceeds to time $\tau$ without a forecast, each agent knows that the other agent's signal is not in the region $[s(\tau), 1]$. Thus, if the first agent chooses to forecast at time $\tau$ then her forecast is $s_i + s(\tau)/2$. Finally, because $t(s_i)$ is invertible, finding the $t(s_1)$ that maximizes agent 1's expected utility for the given strategy of agent 2 is equivalent to finding the $s \in I$ that minimizes

$$\int_s^1 a(s_1 + s_2)\,t(s_2)\,ds_2 + \int_0^s (s_2 - s/2)^2\,ds_2 + \int_0^s a(s_1 + s_2)\,t(s)\,ds_2 \tag{2}$$

for each $s_1 \in I$. Of course, agent 2 solves the analogous problem. To understand this expression note that for $s_2 \in [s, 1]$ agent 2 will forecast first. Thus, for this region agent 1 will forecast immediately after agent 2, get the prediction exactly right, and incur time cost $a(s_1 + s_2)\,t(s_2)$. This is the first term in (2). For $s_2 \in [0, s)$ agent 1 will forecast first. In this case she will forecast $s_1 + s/2$, her forecast accuracy cost will be $(s_2 - s/2)^2$, and her delay cost will be $a(s_1 + s_2)\,t(s)$. These are the second and third terms in (2), respectively.
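The minimization of expression (2) can be checked numerically. The sketch below is an illustration rather than the appendix proof: it assumes the opponent uses the linear waiting time $t(s) = (1 - s)/(6a)$ stated in Proposition 1, approximates the three integrals by a midpoint rule, and searches a grid for the best cutoff. The function names and grid sizes are hypothetical choices.

```python
def t_candidate(s, a):
    """Candidate waiting time; the linear form is the one stated in Proposition 1."""
    return (1.0 - s) / (6.0 * a)


def expected_cost(s, s1, a, n=400):
    """Midpoint-rule approximation of expression (2) for a trial cutoff s."""
    total = 0.0
    # s2 in [s, 1]: the opponent forecasts first, so only a delay cost is paid.
    h = (1.0 - s) / n
    for k in range(n):
        s2 = s + (k + 0.5) * h
        total += a * (s1 + s2) * t_candidate(s2, a) * h
    # s2 in [0, s): agent 1 forecasts first, at time t(s), with forecast s1 + s/2.
    h = s / n
    for k in range(n):
        s2 = (k + 0.5) * h
        total += ((s2 - s / 2.0) ** 2 + a * (s1 + s2) * t_candidate(s, a)) * h
    return total


def best_cutoff(s1, a, grid=101):
    """Grid search for the cutoff s in [0, 1] that minimizes expression (2)."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda s: expected_cost(s, s1, a))


if __name__ == "__main__":
    for s1 in (0.2, 0.5, 0.8):
        print("signal", s1, "best cutoff", best_cutoff(s1, a=1.0))
```

If the candidate is a best response to itself, the reported cutoff should sit at (or within one grid step of) the agent's own signal, meaning she forecasts at time $t(s_1)$.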

The following proposition gives the unique symmetric equilibrium of this game.

Proposition 1: There exists a unique symmetric Nash equilibrium outcome for the game described above. In this equilibrium agent $i$ predicts $(3/2)s_i$ at time $t(s_i) = (1 - s_i)/(6a)$ if her opponent has not made a prediction; otherwise she predicts $s_i + (2/3)z_j = s_1 + s_2$ immediately after her opponent's announcement (at time $t[(2/3)z_j]$). If agent $i$ observes a forecast time $\tau_j \neq t[(2/3)z_j]$ then agent $i$ forms an arbitrary conjecture about the distribution of $s_j$ and forecasts $s_i$ plus the mean of $s_j$ given her new conjecture.[2] (The proof is given in the appendix.)

[2] Note that off-equilibrium forecasts (i.e. forecasts at time $\tau_j \neq t[(2/3)z_j]$) play a very limited role in our model. This is because the utility of the first agent does not depend on the forecast of the second agent. Hence, the second agent's off-equilibrium beliefs do not affect the first agent's expected utility calculations and, consequently, there is a multiplicity of off-equilibrium conjectures associated with the unique symmetric equilibrium outcome.

The intuition for why $t(s_i)$ is decreasing and continuous is straightforward. First, $t(s_i)$ cannot be increasing because it is more costly for agents with higher signal realizations to wait than it is for agents with lower signals, and the gain to waiting is not signal-dependent. Second, there can be no region where $t(s_i)$ is constant; if there were, an agent could wait an arbitrarily small amount of time and gain a strictly positive amount of additional information. Finally, $t(s_i)$ must be continuous because no agent would be willing to wait the strictly positive amount of time represented by the discontinuity to gain an infinitesimal amount of additional information.

The Asymmetric Equilibria

An asymmetric sequential equilibrium for our model is the following. Suppose agent 1's strategy is to make her prediction immediately and agent 2 is willing to wait indefinitely before being the first to forecast: $t_1(s_1) = 0$ and $t_2(s_2) = \infty$ for all $s_1$ and $s_2$. In this case agent 1 forecasts $s_1 + 1/2$ and agent 2, after observing agent 1's prediction,

forecasts $s_1 + s_2$. If agent 2 is willing to wait forever before making a forecast then agent 1's best response is to forecast immediately. Similarly, if agent 1 is going to forecast immediately then it is agent 2's best response to wait an instant, observe agent 1's prediction, and then forecast. Another asymmetric equilibrium reverses the roles of agent 1 and agent 2. These equilibria are given in the following proposition.

Proposition 2: Two asymmetric sequential equilibria in the game described above have the properties that agent $i$ predicts $s_i + 1/2$ at time 0 and agent $j$ predicts $s_j + (z_i - 1/2) = s_1 + s_2$ immediately afterward. If agent $i$ fails to predict at time 0, agent $j$ maintains her belief that $s_i$ is distributed uniformly on $I$ and continues to wait.

Note that neither agent's strategy depends on her signal, so observing the time at which an agent acts is uninformative. Consequently, this equilibrium yields exactly the same forecasting and timing behavior that occurs if the order of the agents' actions is given exogenously. As such, it serves as a useful benchmark when evaluating the degree of clustering in the symmetric equilibrium created by the endogenous timing of forecasts.[3]

Herding and Clustering

In our economy neither reputational herding nor statistical herding arises. Agents' utilities do not depend on an outsider's perception of their ability, so the second agent has no reason to mimic the first agent in order to influence someone else's assessment of her ability. In addition, each agent's decision variable is chosen from a continuum, so the second agent always uses her own information to improve her decision. Nonetheless, agents' decisions are clustered together.

[3] As in the standard war of attrition problem, the following other equilibria exist: for all $s_i > s$, agent $i$ forecasts $s_i + 1/2$ with probability $p$ at time zero, and for all $s_i < s$, agents $i$ and $j$ play a suitably rescaled version of the symmetric equilibrium.

In the previous herding studies, agents' decisions are clustered together in an extreme way: subsequent agents take the same action as the predecessor agent. In these studies there was no need to define a more general concept to capture the notion that "agents' decisions are too close together." For our purposes it is necessary to define a more sensitive metric of clustering. We will say that clustering has occurred when the squared difference between the two agents' predictions in the economy with endogenously ordered forecasts is smaller in expectation than the squared difference between the two agents' predictions when the forecasting order is exogenously given.[4] Because the forecasts that arise when the order is exogenous are the same as in the asymmetric equilibria to our game, the appropriate benchmark is naturally defined from within the model.

Let $d_{en} = E\{[Z_1 - Z_2]^2\}$ be the expected squared difference between the two predictions when the forecasts are ordered endogenously and $d_{ex}$ be the analogous measure when the forecasts are exogenously ordered, so that our clustering measure is $d_{ex} - d_{en}$. The second agent can always infer the first agent's signal from the first agent's forecast using the relation $z_i = s_i + E(S_j \mid \sigma, t)$, so the second agent always forecasts $z_j = s_1 + s_2$. Thus, the difference in forecasts is effectively the difference between the second agent's signal and the first agent's forecast of that signal. Denote the first agent's signal by $X = JS_1 + (1-J)S_2$, where $J = 1$ if $t(s_1) < t(s_2)$ and $J = 0$ otherwise, and the second agent's signal by $Y = (1-J)S_1 + JS_2$. With this, $d_{en}$ can be written as $d_{en} = E(E\{[Y - E(Y \mid X)]^2 \mid X\})$; that is, the mean-squared error of the first agent's forecast of the second agent's signal, averaged over all possible realizations of the first agent's signal. Further, the inner expectation is $\mathrm{Var}(Y \mid X)$, so $d_{en} = E\{\mathrm{Var}(Y \mid X)\}$. When the forecasting order is given exogenously, who forecasts first is uninformative.
[4] More generally, for the models in propositions 1 and 4, the results that follow can be established for any measure of clustering that is increasing in the absolute difference between forecasts (e.g., the mean squared difference, the mean absolute difference, etc.). For these models, the absolute difference between endogenously ordered forecasts is first-order stochastically dominated by the absolute difference between exogenously ordered forecasts.

In this case the first agent's forecast of

the second agent's signal is simply the prior mean, so $d_{ex} = E\{[S_i - E(S_i)]^2\} = \mathrm{Var}(S_i)$. The clustering measure can now be written as $d_{ex} - d_{en} = \mathrm{Var}(S_i) - E\{\mathrm{Var}(Y \mid X)\}$; that is, the difference between the exogenous variance of a signal and the expected variance conditional on the first agent's signal.

A different decomposition of $d_{en}$ will illustrate two sources of clustering. Note that $d_{en} = E\{\mathrm{Var}(Y \mid X)\} = \mathrm{Var}(Y) - \mathrm{Var}\{E(Y \mid X)\} = \mathrm{Var}(Y) - E\{[E(Y \mid X) - E(Y)]^2\}$, so that

$$d_{ex} - d_{en} = \underbrace{\mathrm{Var}(S_i) - \mathrm{Var}(Y)}_{\text{ordering}} + \underbrace{E\{[E(Y \mid X) - E(Y)]^2\}}_{\text{anticipation}}. \tag{3}$$

Label the first two terms together as ordering and the last term as anticipation. Anticipation represents the change in the first agent's forecast that results from the realization that she is indeed the first agent (i.e. the conditioning on $X$). This inference potentially informs her that the second agent's signal must not be in certain regions of $I$. For instance, in the previous model the ex ante forecast of the second agent's signal is $E(Y) = 1/3$.[5] However, when the first agent realizes that she is indeed the first agent she can conclude that the second agent does not have a higher signal than she does, so her optimal forecast is $x/2$, where $x$ is her own signal. The first agent's signal is the larger of the two signals and so has density $2x$ on $I$. In expectation, then, anticipation contributes $\int_0^1 [x/2 - 1/3]^2\, 2x\, dx = 1/72$ to clustering in the previous model. Anticipation cannot be negative; the first agent's forecast cannot become less informed than the ex ante forecast $E(Y)$.

[5] Note that the forecast is of the second agent's signal, as opposed to the signal $S_2$, for which it would be 1/2.

Consequently, unless the first agent's

forecast is completely insensitive to the passage of time, the inference that the first agent can draw from the second agent's lack of action contributes to clustering.

Ordering is the difference between the ex ante variance of an agent's signal and the variance of the second agent's signal. Thus, if the equilibrium strategy is for agents with more extreme signals to forecast first then the second agent's signal must be less extreme, so it will have a smaller variance. For instance, in the previous model $\mathrm{Var}(S_i) = 1/12$ but $\mathrm{Var}(Y) = \mathrm{Var}(S_i \mid S_i < S_j) = 1/18$, so ordering contributes $1/36$ to clustering.

The next proposition shows that clustering is a general phenomenon. In particular, while the delay cost in the previous model is $a\,w\,t$, we show that clustering occurs for delay cost functions of the form $a\,g(w)\,t$, where $g(w)$ denotes the function $g: [0, 2] \to \mathbb{R}_+$ and is either strictly monotone or strictly convex; that is, when the expected cost of delay is higher for agents with relatively more extreme signals. Further, we show that ordering is positive when $g(w)$ is either strictly monotone or strictly convex, so clustering occurs even absent any anticipation.

Proposition 3: Suppose agent $i$'s objective function is $u(w, z_i, t_i) = -(w - z_i)^2 - a\,g(w)\,t_i$, where $a > 0$ and $g(w)$ is either strictly monotone or strictly convex. In this case
i) there exists a symmetric equilibrium such that $t(s_i)$ is strictly quasi-concave;[6]
ii) in all symmetric equilibria, $t(s_i)$ is strictly quasi-concave; and
iii) in any symmetric equilibrium, ordering is positive, so clustering occurs.
(The proof is in the appendix.)

[6] As in proposition 1, agent $i$ forecasts $s_i + E(s_j \mid t(s_j) > \tau)$ at time $\tau = t(s_i)$ if her opponent has not made a prediction; otherwise she predicts $s_1 + s_2$ immediately after her opponent's announcement.
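The terms of decomposition (3) can be approximated numerically for the one-sided model of Proposition 1, in which the agent with the higher signal forecasts first. The sketch below is a check under stated assumptions, not part of the paper: it uses a midpoint grid over both uniform signals, takes $X = \max(S_1, S_2)$ and $Y = \min(S_1, S_2)$, and uses the first agent's forecast $x/2$ of the second signal. The function name `decomposition` and the grid size are hypothetical.

```python
def decomposition(n=400):
    """Grid approximation of the terms in (3) when higher signals forecast first."""
    pts = [(k + 0.5) / n for k in range(n)]     # midpoints of a uniform grid on [0, 1]
    weight = 1.0 / (n * n)                      # probability mass of each grid cell

    # Ex ante mean of the second agent's signal Y = min(S1, S2).
    mean_y = sum(min(u, v) for u in pts for v in pts) * weight

    var_y = d_en = anticipation = 0.0
    for u in pts:
        for v in pts:
            x, y = max(u, v), min(u, v)         # first and second agent's signals
            var_y += (y - mean_y) ** 2 * weight
            d_en += (y - x / 2.0) ** 2 * weight             # error of the forecast x/2
            anticipation += (x / 2.0 - mean_y) ** 2 * weight

    d_ex = 1.0 / 12.0                           # Var(S_i) for a uniform signal
    return {
        "d_ex": d_ex,
        "d_en": d_en,
        "ordering": d_ex - var_y,
        "anticipation": anticipation,
    }


if __name__ == "__main__":
    terms = decomposition()
    for name, value in terms.items():
        print(name, round(value, 5))
    # The two sides of (3) should agree up to grid error.
    print("d_ex - d_en:", round(terms["d_ex"] - terms["d_en"], 5))
    print("ordering + anticipation:", round(terms["ordering"] + terms["anticipation"], 5))
```

Because both sides of (3) are computed separately, the final two printed lines give a direct check that the ordering and anticipation terms add up to the clustering measure.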

As seen in the proof, the existence of a strictly quasi-concave $t(s_i)$ depends only on the strict quasi-convexity of $g(w)$. Adding the requirement that $g(w)$ is either strictly monotone or strictly convex allows us to prove that any $t(s_i)$ is strictly quasi-concave and clustering occurs for any symmetric equilibrium. With strictly convex or strictly monotone costs, it is most costly for agents with extreme signals to delay their forecast. Therefore, the best response to any strategy profile in a symmetric equilibrium is for agents with more extreme signals to forecast first.

If the equilibrium results in agents with relatively more extreme signals forecasting second, then ordering can work against clustering. The next two examples illustrate how ordering can either contribute to or work against clustering. To isolate the effect of ordering, each example uses symmetric delay costs, so that the equilibrium strategy allows no anticipation.

Symmetric Equilibrium with Two-Sided Cost of Delay

Suppose that the cost of delay increases as the squared deviation between $W$ and its prior expectation increases. The idea here is that an agent is eager to act when the value of the unknown variable is extreme in either direction. A money manager's allocation between stocks and bonds is a good example, where $W$ is the optimal fraction to have invested in stocks given full information and $E(W)$ is the existing fraction. Another example is an analyst's forecast, where it is as valuable to predict extreme decreases in earnings as it is to predict extreme increases. This idea is captured by the following utility function:

$$u(w, z_i, t_i) = -(w - z_i)^2 - a(w - 1)^2\,t_i. \tag{4}$$

We show that a symmetric equilibrium exists for this model, where $t(s_i)$ is symmetric about 1/2, increasing for $s_i \in [0, 1/2)$, decreasing for $s_i \in (1/2, 1]$ and $t(1) = t(0) = 0$.

Note that as time passes without a prediction each agent learns that the other agent's signal is not in an extreme region of $I$. However, because $t(s_i)$ is symmetric about the prior mean of $s_j$, the optimal forecast does not change over time: $z_i = s_i + 1/2$ for all $t$. Once the first agent makes her prediction, however, the second agent can use the observed forecast and the time that it was made to recover the first agent's signal (as in the model with one-sided cost of delay). The $t(s_1)$ that maximizes agent 1's expected utility for the given strategy of agent 2 is determined by finding, for each $s_1 \in I$, the $s \in I$ that minimizes

$$\int_0^s a(s_1 + s_2 - 1)^2\,t(s_2)\,ds_2 + \int_{1-s}^1 a(s_1 + s_2 - 1)^2\,t(s_2)\,ds_2 + \int_s^{1-s} (s_2 - 1/2)^2\,ds_2 + \int_s^{1-s} a(s_1 + s_2 - 1)^2\,t(s)\,ds_2. \tag{5}$$

As before, agent 2 solves the analogous problem. To understand this expression, note that for $s_2 \in [0, s) \cup (1-s, 1]$ agent 2 will be the first to forecast. In this case agent 1 will get the forecast exactly right and incur only the time cost. This is given by the first two terms in (5). For $s_2 \in [s, 1-s]$ agent 1 will be the first to make a prediction. For this case the third term in (5) measures the cost of her forecast error and the fourth term measures her cost of delay. The following proposition gives the equilibrium.

Proposition 4: There exists a symmetric sequential equilibrium to the game with two-sided delay cost. In this equilibrium agent $i$ predicts $s_i + 1/2$ at time $t(s_i) = -\frac{3}{4a}\log(|2s_i - 1|)$ if her opponent has not made a prediction; otherwise she predicts $s_i + (z_j - 1/2) = s_1 + s_2$ immediately after her opponent's announcement of $z_j$. If her opponent forecasts at some time $\tau_j \neq t(z_j - 1/2)$ then agent $i$ forms an arbitrary conjecture about the distribution of $s_j$ and forecasts $s_i$ plus the mean of $s_j$ given her new conjecture. (The proof is in the appendix.)
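The waiting time in Proposition 4 can be recovered from the first-order condition of expression (5). The following is only a sketch of that step, not the appendix proof: it assumes $t$ is differentiable and symmetric about 1/2, restricts attention to $s < 1/2$, and does not check second-order conditions.

```latex
% Differentiate (5) with respect to s, using t(1-s) = t(s).
% The boundary terms of the three delay-cost integrals cancel, leaving
\[
  -2\left(s - \tfrac{1}{2}\right)^{2}
  + a\,t'(s)\int_{s}^{1-s} (s_1 + s_2 - 1)^{2}\,ds_2 = 0 .
\]
% In a symmetric equilibrium the minimizer is s = s_1, and then
\[
  \int_{s}^{1-s} (s + s_2 - 1)^{2}\,ds_2 = \frac{(1-2s)^{3}}{3},
  \qquad\text{so}\qquad
  t'(s) = \frac{3}{2a\,(1-2s)} .
\]
% Integrating from 0 with the boundary condition t(0) = 0 gives
\[
  t(s) = -\frac{3}{4a}\,\log(1-2s), \qquad s \in [0, \tfrac{1}{2}),
\]
% which coincides with -(3/(4a)) log|2s - 1| on this half of the interval.
```

By symmetry about 1/2 the same expression, written with $|2s - 1|$, covers $s \in (1/2, 1]$.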

While the first agent's forecast does not change with the passage of time, so there is no anticipation, clustering occurs in this model due to ordering. In particular,

$$\mathrm{Var}(Y) = E\{\min[(S_1 - 1/2)^2, (S_2 - 1/2)^2]\} = 1/24,$$

so the measure of ordering equals $1/12 - 1/24 = 1/24$.$^{7}$ Thus, the agents' forecasts are closer together in the economy with endogenously ordered forecasts than when the forecasting order is given exogenously; that is, clustering occurs.

The expression for $\mathrm{Var}(Y)$ given above clearly demonstrates the source of the clustering in this model. The equilibrium ordering reveals the most extreme signal first, leaving only the difference between the less extreme signal and the forecast of 1/2. While the ordering of agents has not altered their point forecasts, it does rule out the extreme regions of the joint distribution of $S_1$ and $S_2$.

Proposition 3 gave some sufficient conditions for clustering to occur. By eliminating anticipation and inverting the two-sided delay cost of the previous example, the next example demonstrates how ordering can cause forecasts to be dispersed rather than clustered. Suppose that agents were more eager to act when the value of $W$ was closer to its prior mean. This idea is captured by the following utility function:

$$u(w, z_i, t_i) = -(w - z_i)^2 - \alpha [1 - (w - 1)^2] t_i. \tag{6}$$

It can be shown that a symmetric equilibrium exists to this model where $t(s_i)$ is symmetric about 1/2, decreasing for $s_i \in [0, 1/2)$, increasing for $s_i \in (1/2, 1]$ and $t(1/2) = 0$. As in the previous example, the optimal forecast remains $z_i = s_i + 1/2$ for all $t$. Unlike the previous example, however, the agent with the less extreme signal is the first to forecast. Thus,

$$\mathrm{Var}(Y) = E\{\max[(S_1 - 1/2)^2, (S_2 - 1/2)^2]\} = 1/8,$$

so the measure of ordering is $1/12 - 1/8 = -1/24$. We offer this example to illustrate that ordering can be either positive or negative, not because we have a particular economic situation in mind that exhibits this type of delay cost.$^{8}$

Efficiency

In the herding literature the economy is informationally inefficient in the sense that subsequent agents ignore their own signals when the information is still useful in making a superior decision. The observed clustering of behavior in many economic situations is seen as the undesirable outcome of herding. In contrast, our economy is informationally very efficient. Not only do both agents use their own information, but the second agent can recover the first agent's information by observing her forecast and, as long as the delay cost is not completely symmetric, the first agent can partially infer the second agent's signal from the passage of time. Even when the delay cost is two-sided, as long as the cost is higher for agents with more extreme signals, the first agent does not make the most extreme forecast errors. In all cases, both agents use their own information and all the information provided by the endogenous variables in the economy.

Another kind of inefficiency is present in the symmetric equilibrium of our model, however. In the symmetric equilibrium each agent trades off her own gain in accuracy with her own cost of delay without considering that the other player would also benefit from an earlier prediction. By delaying her prediction, each agent imposes a negative externality on the other agent.

In contrast to the symmetric equilibrium, there is no delay cost in the asymmetric equilibria, or in a setting where the order of forecasting is given exogenously; the first agent gains nothing from waiting, so she forecasts immediately. The asymmetric equilibria do not strictly dominate the symmetric equilibrium, however, because the first agent's forecast is less accurate in the asymmetric equilibria. Nonetheless, the sum of agents' expected utility is higher in the asymmetric equilibria than in the symmetric equilibrium, so there is a sense in which the asymmetric equilibria are superior. In particular, if utility is transferable between agents then the asymmetric equilibria Pareto-dominate the symmetric equilibrium.$^{9}$

Suppose that agent 1 forecasts first in the asymmetric equilibrium. Her ex ante expected utility (averaged over all realizations of $s_1$) is

$$\int_0^1 \int_0^1 -(1/2 - s_2)^2\,ds_2\,ds_1 = -\frac{1}{12}.$$

Agent 2 forecasts in the next instant and so her expected utility is zero. In the symmetric equilibrium each player's expected utility is given by

$$\int_0^1 \left[ -\int_{s_1}^{1} \frac{(s_1 + s_2)(1 - s_2)}{6}\,ds_2 - \int_0^{s_1} \frac{(s_1 + s_2)(1 - s_1)}{6}\,ds_2 - \int_0^{s_1} \left(\frac{s_1}{2} - s_2\right)^2 ds_2 \right] ds_1 = -\int_0^1 \left[ \frac{(1 - s_1)^2 (5 s_1 + 1)}{36} + \frac{s_1^3}{12} + \frac{s_1^2 (1 - s_1)}{4} \right] ds_1 = -\frac{1}{16}.$$

The sum of expected utilities in the asymmetric equilibria ($-1/12$) is greater than the sum of expected utilities in the symmetric equilibrium ($-1/8$). Note also that the expected utilities do not depend on the players' sensitivity to the cost of delay, as parameterized by $\alpha$. In particular, reducing the cost of delay does not reduce the public goods problem present in the symmetric equilibrium. Although it becomes less costly to wait as the delay cost diminishes, in equilibrium the second agent will wait longer before the first agent reaches the point where her gain from increased accuracy equals her loss from additional delay.

$^{7}$ To compute this value note that the cumulative distribution function of $v = (s_i - 1/2)^2$ is $2\sqrt{v}$ with support on $(0, 1/4)$, and so the probability density function of the minimum of two independent $v_i$'s is $2(1 - 2\sqrt{v})/\sqrt{v}$ with support on $(0, 1/4)$.

$^{8}$ Another way that anticipation may be zero is if the cost of delay does not depend on the agent's signal. For example, if the utility function is $u(w, z_i, t_i) = -(w - z_i)^2 - \alpha t_i$, then the passage of time does not reveal anything about an agent's signal and the optimal forecast remains $z_i = s_i + 1/2$. This is a standard war of attrition; its symmetric equilibrium strategy is for each agent to mix between forecasting and waiting, with the time of forecasting distributed according to the probability density $f(t) = 12\alpha e^{-12\alpha t}$.

$^{9}$ While the sum of utilities as defined here is larger at the asymmetric equilibrium, there are behaviorally equivalent representations of preferences (for example, the cube of the utility given here) such that the symmetric equilibrium yields the higher sum.
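The closed-form values quoted in this section can be checked numerically. The sketch below is an illustration rather than part of the paper: it assumes signals that are independent and uniform on [0, 1], takes the one-sided timing rule $t(s) = (1 - s)/(6\alpha)$ from Proposition 1 as given, and uses a plain midpoint rule. The function names and the grid size are hypothetical choices.

```python
def midpoints(n):
    """Midpoints of n equal cells on [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def ordering_moments(n=400):
    """Approximate E[min] and E[max] of (S1 - 1/2)^2 and (S2 - 1/2)^2
    for independent uniform signals, by a two-dimensional midpoint rule."""
    dev = [(s - 0.5) ** 2 for s in midpoints(n)]
    e_min = sum(min(a, b) for a in dev for b in dev) / n**2
    e_max = sum(max(a, b) for a in dev for b in dev) / n**2
    return e_min, e_max


def symmetric_expected_utility(alpha, n=400):
    """Approximate ex ante expected utility of one agent in the symmetric
    equilibrium with one-sided delay cost (assumed timing rule below)."""
    def t(s):
        # Assumed equilibrium timing rule from Proposition 1.
        return (1.0 - s) / (6.0 * alpha)

    grid = midpoints(n)
    total = 0.0
    for s1 in grid:
        for s2 in grid:
            if s2 > s1:
                # Agent 2 forecasts first; agent 1 is exact and pays only delay.
                total -= alpha * (s1 + s2) * t(s2)
            else:
                # Agent 1 forecasts first with (3/2) s1: squared error plus delay.
                total -= (s1 / 2.0 - s2) ** 2 + alpha * (s1 + s2) * t(s1)
    return total / n**2


if __name__ == "__main__":
    e_min, e_max = ordering_moments()
    print("E[min] approx", e_min, "vs 1/24 =", 1 / 24)
    print("E[max] approx", e_max, "vs 1/8  =", 1 / 8)
    for alpha in (0.5, 3.0):
        print("alpha =", alpha, "expected utility approx",
              symmetric_expected_utility(alpha), "vs -1/16 =", -1 / 16)
```

Running the utility calculation for two different values of `alpha` illustrates the remark in the text that the expected utilities do not depend on the delay-cost parameter, since $\alpha$ cancels against the $1/\alpha$ in the timing rule.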

IV. EXTENSIONS

A Model with Relative Performance Evaluation

For the models presented so far, both agents could increase their expected utility by simply sharing their information prior to the beginning of the game. However, if an agent derived some utility from the forecast error of the other agent -- as would be the case if the agent was subject to some type of relative performance evaluation -- then she would no longer find it in her best interest to truthfully share her information. The following utility function captures this idea:

$$u(w, z_i, z_j, t_i) = -(1 - \beta)(w - z_i)^2 - \alpha w t_i + \beta (w - z_j)^2, \tag{7}$$

where $\beta$ is positive and small enough to keep agent $i$'s focus primarily on her own forecast error (it is shown in the appendix that $\beta < 1/18$ satisfies the second order condition). For this model, agent $i$'s utility is increasing in the forecast error of agent $j$, so an agent's offer to truthfully reveal her information prior to the beginning of the game is no longer credible.

Consider the symmetric equilibrium and, as before, consider the strategy profile $t'(s_i) < 0$ and $t(1) = 0$. For each $s_1 \in I$, agent 1's objective is to choose $s \in I$ to minimize

$$\int_s^1 \alpha (s_1 + s_2) t(s_2)\,ds_2 - \int_s^1 \beta \left(s_1 - \frac{s_2}{2}\right)^2 ds_2 + \int_0^s (1 - \beta)\left(s_2 - \frac{s}{2}\right)^2 ds_2 + \int_0^s \alpha (s_1 + s_2) t(s)\,ds_2 - \int_0^s \beta (s_1 - s)^2\,ds_2. \tag{8}$$

As before, agent 2 solves the analogous problem. The first, third and fourth terms of (8) are the same as in (2). The second term in (8) captures the effect of agent 2's forecast error when agent 2 is the first to forecast. In this case agent 2 forecasts $(3/2)s_2$ and agent 1 forecasts immediately afterward. The last term in (8) captures the effect of agent 2's forecast error when agent 1 is the first to forecast. In this case agent 2 forecasts $s + s_2$ (which at this point need not be equal to $s_1 + s_2$). Note that agent 1 internalizes the effect of agent 2's off-equilibrium belief through the last term in (8); choosing an $s \neq s_1$ misleads agent 2 by exactly the difference between $s$ and $s_1$.

Proposition 5: There exists a symmetric sequential equilibrium outcome for the game with relative performance evaluation described above. In this equilibrium agent $i$ predicts $(3/2)s_i$ at time $t(s_i) = (1 - s_i)/(6\alpha)$ if her opponent has not made a prediction; otherwise she predicts $s_i + (2/3)z_j = s_i + s_j$ immediately after her opponent's announcement (at time $t[(2/3)z_j]$). If agent $i$ observes that $\tau_j \neq t[(2/3)z_j]$ then agent $i$ continues to believe that $E(S_j) = s(\tau_j)$ and forecasts $z_i = s_i + s(\tau_j)$ immediately. (The proof is in the appendix.)

The equilibrium strategy described in proposition 5 is the same as in the original model given in proposition 1. In this model, at the margin an agent trades off the cost of waiting an instant against the benefit of improving her own forecast and the benefit of harming the other agent's forecast by misleading her. By weighting the agent's own forecast error and the other agent's forecast error as a convex combination, the combined marginal benefit is exactly as in the original model. It is not incentive-compatible for agents to share their information prior to the beginning of the game in this model, yet it exhibits the same behavior as our original model. This justifies our original assumption of no preplay communication.

An N-Person Game

The basic results of the two-person game will continue to hold in an n-person version of our model, where the future value of the project is now $W = S_1 + S_2 + \ldots + S_n$. A strategy in this game specifies a set of functions, each specifying the maximum amount of time an agent will wait after the beginning of the game, and then wait after each observed forecast, before making her forecast. Denote a strategy in a symmetric equilibrium by the set $\{t_k(s, y): k \in \{1, 2, \ldots, n\}\}$, so that $t_k(s_i, y)$ gives the maximum amount of additional time agent $i$ will wait before forecasting, given that $k$ agents have not yet forecast and the sum of the forecasts made so far is $y$. As in the two-person game, $t_1(s_i, y) = 0$ for all $s_i$ and $y$; once everyone else has acted there is no reason to delay.

Note that in the n-person game each subgame that ensues initially and after a forecast is essentially a rescaled version of the two-person game. Two features in particular remain the same. First, for one-sided delay costs, the cost of delaying any increment of time is higher for agents with higher signals. While some of the unknown components of $W$ may be realized, it is still the case that, for agents who have not yet announced, the expected value of $W$ is higher for agents with higher signals. Second, the expected forecast accuracy does not depend on an agent's own signal. An agent's own signal is forecast without error, as are the realized signals an agent observes from previous agents' forecasts.

Using arguments similar to those given in the appendix for the two-person game, these two facts can be used to prove that in a symmetric equilibrium, for all $k$ and $y$, $t_k$ is strictly decreasing in $s$. The $t_k$ are non-increasing in $s$ because the expected cost of delay is higher for agents with higher signals. To see why the $t_k$ are strictly decreasing, given that they are not increasing, suppose that for some $k$, $t_k$ is constant in some region of $I$. An agent with a signal in this region can gain a strictly positive increase in forecast accuracy by waiting an arbitrarily small amount of time. This contradicts the supposition that $t_k$ is constant in some region.

Two implications follow from the fact that $t_k$ is decreasing in $s$ for all $k$ and $y$ in the n-person game.
First, there will be intervals of time in which no forecasts are made, just as in the beginning of the two-person game, but there will never be a frenzy of forecasting activity. Basically, agents' timing choices are strategic substitutes; a more aggressive choice of when to forecast is met by other agents with less aggressive choices. They benefit by waiting to observe the other agent's forecast. Second, the two sources of clustering in the two-person model, anticipation and ordering, are also present in the n-person model. Because each $t_k$ is invertible, the passage of time is informative. Each agent's forecast will incorporate some knowledge of the subsequent agents' signals, thus moving all forecasts toward the full-information prediction. Furthermore, agents with more extreme signals forecast sooner, so subsequent agents do not make the most extreme forecast errors.

V. CONCLUSION

We have provided a framework in which clustering and herding can be defined formally and have shown that the tradeoff between delayed decisions and more accurate decisions creates clustering without the informational inefficiencies that accompany herding. In this final section we argue that clustering is likely to be a general phenomenon while herding is not.

Given a finite sequence of observed actions, $z_1, z_2, \ldots, z_n$, the "natural" assumption is that each agent $i$ knows her own information and the information of all the agents $j < i$ at the time she takes her action $z_i$. That is, she knows the signals (i.e. types) of all agents that preceded her and knows nothing about the signals of agents who have not yet acted, other than what she can deduce from her prior and the realizations of the earlier signals. Thus, at the end of the game, all the agents' information is revealed. This "natural" level of information at each stage of the game corresponds to the level of information that players have in the two asymmetric equilibria discussed in section III.

Using the "natural" assumption as a benchmark, note that both statistical and reputational herding generate clustering because, in equilibrium, the typical agent $i$ knows less than under the natural assumption. In the statistical herding models this is because the binary choice sets provide an insufficient vocabulary to sustain a fully separating equilibrium, given the incentives of the players. At some point in the sequence of decisions, the information available from observing predecessor agents' decisions overwhelms agent $i$'s information. Thus, she ignores her information and, consequently, her decision does not transmit her information to subsequent players. Similarly, in reputational herding the sequence of decisions fails to aggregate information at some point because the incentives of agents are such that they maximize their reputation by pooling with their predecessor agent. Hence, in a 3-person model the third agent would not have complete information about the second agent's signal. Herding can be eliminated by enriching the setting in a way that allows prior agents' information to be transmitted; a finer set of action choices eliminates statistical herding and more appropriately aligned incentives eliminate reputational herding. It is in this sense that we feel that herding is not a particularly robust phenomenon.

In contrast to the herding models, clustering occurs in our model because, in equilibrium, a typical agent $i$ knows more than under the natural assumption. In particular, she knows the exact signals of all agents who have announced before her and she knows that she has the $i$'th highest signal (or the $i$'th most extreme signal in the case of two-sided delay cost). Hence, information has leaked. In our model this information leak is a result of the tradeoff between accuracy and delay. Whenever the appropriate marginal calculations for this tradeoff are not identical across agents' different possible signal realizations, the choice of when to act will cause an information leak and may result in clustering.

In sum, the herding literature explains clustering by noting that agents may know less than you thought, while we explain clustering by noting that they may know more than you thought.

APPENDIX

Part I of this appendix derives the symmetric equilibrium to a discrete-time version of our game with one-sided time cost and Part II presents the proofs of propositions 1, 3, 4 and 5.

Part I: The Symmetric Equilibrium for a Discrete-Time Model

The first proposition in the text is somewhat vague regarding certain aspects of the equilibrium strategy as they relate to choosing $t_i$ from a continuum (in particular, the idea that the second agent acts "immediately after" the first agent). Here we make these ideas precise by considering a model with the same features as the continuous time model given in section II, but where an agent can act only in discrete time periods. We show that a symmetric equilibrium can be constructed, that it is unique, and that as the time between periods goes to zero the equilibrium strategy converges to the strategy in the continuous time model.

Let $\Delta$ denote the time between successive periods. Thus, an agent who forecasts in period $k$ does so at time $t_i = (k - 1)\Delta$. Agent $i$'s utility if she announces $z_i$ in period $k$ is now

$$-(w - z_i)^2 - \alpha w (k - 1)\Delta. \tag{a1}$$

As before, the optimal forecast is $z_i = s_i + E(S_j \mid s_i, \alpha, t)$. Furthermore, once one agent has forecast there is no additional benefit to waiting, so the remaining agent will forecast in the next period (this is the analog to forecasting "immediately afterward" in the continuous time model). Thus, a symmetric equilibrium is described by a function $t_\Delta(s)$ that specifies the latest time that an agent with signal $s$ will forecast. If an agent with signal $s$ is willing to wait indefinitely for her opponent to forecast then we will write this as $t_\Delta(s) = \infty$.

Replacing $\Delta t$ with $\Delta$ in the proof that $t$ is non-increasing (see part II of this appendix) establishes that $t_\Delta$ is non-increasing for all $\Delta > 0$. Now suppose that no type forecasts in period $k$. If this is the case then either all types have forecast by period $k$ or there exists a first period $k' > k$ such that $t_\Delta(s) = (k' - 1)\Delta$ for some signal $s$. But if such a $k'$ existed then the agent with signal $s$ would be strictly better off by forecasting in period $k$ -- the time cost is lower and she learns nothing between periods $k$ and $k'$. Thus, in every period $k$ either some types forecast in that period or both players have announced prior to $k$ (in which case the game is over). Together with the fact that $t_\Delta$ is non-increasing, this establishes the existence of a sequence of $s^k$ such that

$$s^0 = 1, \quad s^k < s^{k-1} \quad \text{and} \quad t_\Delta(s) = \Delta(k - 1) \ \text{for } s \in (s^k, s^{k-1}), \quad k = 1, 2, \ldots. \tag{a2}$$

The existence of the $t_\Delta(s)$ given in (a2) together with the continuity of utility in the agent's own type imply that the agent with signal $s^k$ is indifferent between acting in period $k$ and waiting to act in period $k+1$ (provided that the game does not end with probability one by period $k$). If the game has proceeded to period $k$ without a forecast, then agent 1 with signal $s^k$ knows that $S_2$ is below $s^{k-1}$. If she forecasts in period $k$, her expected loss (the negative of her expected utility) is

$$\frac{1}{s^{k-1}} \int_0^{s^{k-1}} \left(\frac{s^{k-1}}{2} - s_2\right)^2 ds_2 + \frac{1}{s^{k-1}} \int_0^{s^{k-1}} \alpha (s^k + s_2)(k - 1)\Delta\,ds_2. \tag{a3}$$

The term $1/s^{k-1}$ is the density of $s_2$, given that no forecast has been made prior to period $k$. Alternatively, if agent 1 with signal $s^k$ waits to forecast in period $k+1$ then her expected loss is

$$\frac{s^{k-1} - s^k}{s^{k-1}} \int_{s^k}^{s^{k-1}} \alpha (s^k + s_2) k\Delta \, \frac{1}{s^{k-1} - s^k}\,ds_2 + \frac{s^k}{s^{k-1}} \cdot \frac{1}{s^k} \left\{ \int_0^{s^k} \left(\frac{s^k}{2} - s_2\right)^2 ds_2 + \int_0^{s^k} \alpha (s^k + s_2) k\Delta\,ds_2 \right\}. \tag{a4}$$

To understand the first term in (a4), note that with probability $(s^{k-1} - s^k)/s^{k-1}$ agent 2 forecasts in period $k$, so agent 1 incurs only the time cost, and in this case the density of $s_2$ is uniform on the interval $[s^k, s^{k-1}]$. With probability $s^k/s^{k-1}$ agent 2 does not forecast in period $k$. In this case agent 1 is in a position very similar to when she forecast in period $k$, except she has waited an additional $\Delta$ of time, and now knows that $s_2$ lies below $s^k$ rather than below $s^{k-1}$. This is given in the second term in (a4).

Because agent 1 with signal $s^k$ is indifferent between acting in period $k$ or period $k+1$, the expression in (a3) must equal the expression in (a4). Equating these two expressions and evaluating the integrals gives

$$\frac{(s^{k-1})^2}{12} + \frac{\alpha (k - 1)\Delta \left(s^k s^{k-1} + \frac{(s^{k-1})^2}{2}\right)}{s^{k-1}} = \frac{\alpha k \Delta \left(s^k s^{k-1} + \frac{(s^{k-1})^2}{2}\right)}{s^{k-1}} + \frac{(s^k)^3}{12 s^{k-1}},$$

which simplifies to

$$(s^k)^3 + 12\alpha\Delta s^{k-1} s^k + 6\alpha\Delta (s^{k-1})^2 - (s^{k-1})^3 = 0. \tag{a5}$$

Thus, $s^k$ is the root to a cubic that is parameterized by $s^{k-1}$ and $\Delta$. There is at most one solution to (a5) because the cubic's first derivative with respect to $s^k$ is strictly positive for $s^{k-1} > 0$. Furthermore, at $s^k = s^{k-1}$ the cubic is strictly positive, so the root is strictly less than $s^{k-1}$. Applying this at each step yields the result that the sequence $\{s^k\}$ is strictly decreasing. Also, for $s^{k-1} \geq 6\alpha\Delta$, the cubic is less than or equal to zero at $s^k = 0$ and greater than zero at $s^k = 1$, so there is exactly one root in the interval $[0, 1]$ for $s^{k-1} \geq 6\alpha\Delta$. Thus, the sequence $\{s^k\}$ is uniquely determined for $s^{k-1} \geq 6\alpha\Delta$. Finally, for $s^{k-1} < 6\alpha\Delta$, $s^k < 0$, so the sequence $\{s^k\}$ decreases to zero (or lower).

To distinguish between the sequence of points $s^{k-1}$ for $k = 1, 2, \ldots$ and the generic parameter $s^{k-1}$ to the cubic in (a5), denote the root of (a5) as a function $f(r, \Delta)$ where $r$ corresponds to $s^{k-1}$. The cubic can now be re-expressed as

$$f(r, \Delta)^3 + 12\alpha\Delta r f(r, \Delta) + 6\alpha\Delta r^2 - r^3 = 0, \tag{a6}$$

where $s^k = f(r, \Delta)$ for $r = s^{k-1}$. Thus, the sequence $\{s^k\}$ defined by $s^0 = 1$ and $s^k = f(s^{k-1}, \Delta)$ for all $k \geq 1$ is uniquely determined. In any symmetric equilibrium this sequence determines the behavior of any $s \notin \{s^k\}$. For any $s = s^k$, arbitrarily specify $t_\Delta(s) = \Delta(k - 1)$. Thus, for any arbitrarily specified off-equilibrium path belief, the $t_\Delta$ as defined above is the unique equilibrium outcome (up to the behavior of type $s^k$ agents) of the discrete time game.

We will now show that along any sequence $\Delta_n > 0$ such that $\lim \Delta_n = 0$ as $n \to \infty$, $\lim t_{\Delta_n}(s) = (1 - s)/(6\alpha)$ as $n \to \infty$, for all $s \in (0, 1]$. First, fix an $s \in (0, 1]$ and choose a $\bar{\Delta}$ such that $6\alpha\bar{\Delta} < s/2$. To see that the function $f: [s/2, 1] \times [0, \bar{\Delta}] \to [0, 1]$ is continuous (jointly in $r$ and $\Delta$), define a sequence $(r_n, \Delta_n)$ that converges to $(r, 0)$ as $n \to \infty$ and observe that by the definition of $f$, the ordered triplet $[f(r_n, \Delta_n), r_n, \Delta_n]$ solves (a6) for each $n = 1, 2, \ldots$. Further, if $\lim f(r_n, \Delta_n)$ exists as $n \to \infty$, then $[\lim f(r_n, \Delta_n), \lim r_n, \lim \Delta_n]$ solves (a6) as well. Thus, to verify the continuity of $f$, it is enough to show that $\lim f(r_n, \Delta_n)$ exists as $n \to \infty$. Because the range is compact, if the limit did not exist then two subsequences would exist such that each would converge to a different limit. But each of these limits would constitute a solution to (a6), which contradicts the fact that there is only one root to (a6) for $r \geq 6\alpha\Delta$.

Next we derive an expression that is analogous to the derivative of $t_\Delta(s)$. In particular, solve (a5) for $\Delta$ to get

$$\Delta = \frac{(s^{k-1} - s^k)\left((s^k)^2 + s^k s^{k-1} + (s^{k-1})^2\right)}{6\alpha\, s^{k-1} (2 s^k + s^{k-1})}, \tag{a7}$$

so that

$$\frac{\Delta}{s^k - s^{k-1}} = -\frac{1}{6\alpha} B(s^{k-1}, \Delta), \quad \text{where } B(s^{k-1}, \Delta) = \frac{(s^k)^2 + s^k s^{k-1} + (s^{k-1})^2}{s^{k-1}(2 s^k + s^{k-1})}. \tag{a8}$$

Note that (a8) loosely resembles $t'(s) = -1/(6\alpha)$ in the continuous time model. To establish the continuity of $B$ we again use $r$ to denote the generic value of the parameter $s^{k-1}$ in (a5) in order to distinguish it from the particular point $s^{k-1}$ in the sequence $\{s^k\}$. Recalling that $f(r, \Delta) = s^k$ for $r = s^{k-1}$, $B$ can be expressed as

$$B(r, \Delta) = \frac{f(r, \Delta)^2 + f(r, \Delta) r + r^2}{r[2 f(r, \Delta) + r]}. \tag{a9}$$

Because $f$ is continuous on $[s/2, 1] \times [0, \bar{\Delta}]$ and the denominator of $B$ is strictly positive for $r \geq s/2$, $B$ is also continuous on $[s/2, 1] \times [0, \bar{\Delta}]$. Since this domain is compact, for a given $\Delta$, $B$ attains its minimum and maximum. Denote the minimum and maximum of $B$ by $m_\Delta$ and $M_\Delta$, respectively.

We now establish bounds for $t_\Delta(s)$ using $m_\Delta$ and $M_\Delta$. First, note that by using (a7) we can express the $\Delta$ in $t_\Delta(s) = (k - 1)\Delta$ as

$$\Delta = \frac{(s^{k-1} - s^k) B(s^{k-1}, \Delta)}{6\alpha}, \tag{a10}$$

which holds for all $k = 1, 2, \ldots, N$. Consider the lower bound

$$t^l_\Delta(s) = \frac{(s^1 - s) m_\Delta}{6\alpha}. \tag{a11}$$

To verify that this is a lower bound to $t_\Delta(s)$ note that, for $s \in [s^k, s^{k-1})$, $t^l_\Delta(s)$ is at its highest point at $s^k$. Thus, consider

$$t^l_\Delta(s^k) = \frac{(s^1 - s^k) m_\Delta}{6\alpha} = \sum_{j=2}^{k} \frac{(s^{j-1} - s^j) m_\Delta}{6\alpha}. \tag{a12}$$
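The construction of the cutoff sequence from the cubic (a5), and the convergence claim that the bounds are meant to establish, can be illustrated numerically. The sketch below is not the authors' procedure: it solves (a5) by bisection (a choice made here because the cubic is strictly increasing in $s^k$), and the function names `next_cutoff` and `discrete_time` are hypothetical.

```python
def next_cutoff(r, alpha, delta, tol=1e-14):
    """Root s of the cubic (a5): s^3 + 12*alpha*delta*r*s + 6*alpha*delta*r^2 - r^3 = 0,
    where r plays the role of the previous cutoff s^{k-1}, with 0 < r <= 1."""
    def cubic(s):
        return s**3 + 12 * alpha * delta * r * s + 6 * alpha * delta * r**2 - r**3

    # The cubic is strictly increasing in s, negative at s = -1 and positive at s = r.
    lo, hi = -1.0, r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def discrete_time(s, alpha, delta):
    """Discrete-time forecasting time delta*(k-1), where k is the first period
    whose cutoff s^k is at or below the signal s (so s lies in [s^k, s^{k-1}))."""
    if not 0.0 < s <= 1.0:
        raise ValueError("signal must lie in (0, 1]")
    r, k = 1.0, 0          # s^0 = 1
    while True:
        k += 1
        nxt = next_cutoff(r, alpha, delta)
        if nxt <= s:
            return delta * (k - 1)
        r = nxt


if __name__ == "__main__":
    alpha, s = 1.0, 0.5
    limit = (1.0 - s) / (6.0 * alpha)   # continuous-time rule from Proposition 1
    for delta in (0.01, 0.001, 0.0001):
        print("delta =", delta, "t_delta(s) =", discrete_time(s, alpha, delta),
              "continuous-time t(s) =", limit)
```

As the period length `delta` shrinks, the printed discrete-time values should approach $(1 - s)/(6\alpha)$, which is the limit the lower and upper bounds are used to pin down.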

Because (a10) holds for all $k$ and $m_\Delta$ is the minimum of $B$, each of the $k - 1$ terms in the summation of (a12) is less than or equal to $\Delta$. Thus, $t^l_\Delta(s^k) \leq t_\Delta(s^k)$. Since this bound holds for each $s^k$, $t^l_\Delta(s) \leq t_\Delta(s)$ for all $s$. As an upper bound consider

$$t^h_\Delta(s) = \frac{(1 - s) M_\Delta}{6\alpha}. \tag{a13}$$

For $s \in [s^k, s^{k-1})$, $t^h_\Delta(s)$ is at its lowest point at $s^{k-1}$. Thus, consider

$$t^h_\Delta(s^{k-1}) = \frac{(1 - s^{k-1}) M_\Delta}{6\alpha} = \sum_{j=1}^{k-1} \frac{(s^{j-1} - s^j) M_\Delta}{6\alpha}, \tag{a14}$$

recalling that $s^0 = 1$. Since each of the $k - 1$ terms in (a14) is greater than or equal to $\Delta$, $t^h_\Delta(s^{k-1}) \geq t_\Delta(s^{k-1})$. Since this bound holds for each $s^{k-1}$, $t^h_\Delta(s) \geq t_\Delta(s)$ for all $s$.

The preceding expressions were derived for a given $\Delta$. We now consider the behavior of $t_\Delta(s)$ as $\Delta$ goes to zero. First, define a sequence $(r_n, \Delta_n)$ that converges to $(r, 0)$ as $n \to \infty$. Note that because $f$ is continuous, $\lim f(r_n, \Delta_n) = f(r, 0) = r$ as $n \to \infty$. This implies that the term $s^1(\Delta)$, which is used in the lower bound $t^l_\Delta(s)$, converges to one as $n \to \infty$.

We now show that $m_\Delta$ and $M_\Delta$ converge to one as the sequence of $\Delta_n$ converges to zero. By definition, $M_{\Delta_n} = \max B(r, \Delta_n)$, for $r$ chosen from $[s/2, 1]$. Because the domain of $B$ is compact, there exists a value $r_n$ where $B$ reaches its maximum. Thus $\lim M_{\Delta_n} = \lim B(r_n, \Delta_n)$ as $n \to \infty$. Suppose that $\lim B(r_n, \Delta_n) \neq 1$. This would imply that there exists a neighborhood $V$ of 1 and a subsequence $(r_{n_j}, \Delta_{n_j})$ such that none of the elements of the subsequence $B(r_{n_j}, \Delta_{n_j})$ belong to $V$. Now consider a subsequence of $(r_{n_j}, \Delta_{n_j})$, denoted by $(r_{n_k}, \Delta_{n_k})$, that converges to $(\bar{r}, 0)$. The compactness of the domain of $B$ assures the existence of a convergent subsequence and, by definition, all subsequences of $\Delta_n$ converge to zero. Since $f(\bar{r}, 0) = \bar{r}$ and by the continuity of $B$, $\lim B(r_{n_k}, \Delta_{n_k}) = B(\bar{r}, 0) = 1$. But this is a contradiction. If the subsequence $B(r_{n_k}, \Delta_{n_k})$ converges to 1 and this sequence is a subsequence of $B(r_{n_j}, \Delta_{n_j})$, then some elements of $B(r_{n_j}, \Delta_{n_j})$ must belong to $V$. An analogous argument shows that $\lim m_{\Delta_n} = 1$ as $n \to \infty$.

With both the minimum and maximum bounds of $B$ converging to 1 and $s^1(\Delta)$ converging to 1, it follows that as $n \to \infty$,

$$\lim t^h_{\Delta_n}(s) = \lim t^l_{\Delta_n}(s) = \frac{1 - s}{6\alpha}, \tag{a15}$$

which is $t(s)$ of the continuous time model given in proposition 1. Since both the upper and lower bounds are converging to the same number, $t_\Delta(s)$ must also converge to the $t(s)$ given by the continuous time model. This establishes the desired result.

Part II: Proof of Propositions 1, 3, 4, and 5

Parts of the proof of proposition 3 can be used to prove proposition 1, so we present it first.

Proposition 3

First we establish the existence of a symmetric equilibrium when $g(w)$ is either strictly monotone or strictly convex. Consider the case where $g(w)$ is strictly convex but not monotone. Denote by $s^*$ the agent type with the lowest ex ante expected delay cost; that is, $s^* = \frac{1}{2} \arg\min g(w)$, where $w \in [0, 2]$. We postulate a symmetric equilibrium $t$ on $[0, 1]$ such that $t$ is differentiable on $[0, s^*) \cup (s^*, 1]$, strictly increasing on $[0, s^*)$, strictly decreasing on $(s^*, 1]$ and satisfying either $t(0) = 0$ or $t(1) = 0$.

Denote by $h(s)$ a function $h: [\underline{s}, s^*) \to (s^*, \bar{s}]$. This function will be used to identify the agents with signals $s$ and $h(s)$ who will both forecast at the same time. As the proof proceeds, two cases will present themselves, depending on the nature of $g(w)$. In one case $\underline{s} = 0$, $h(0) = \bar{s}$ and $t(1) = 0$ is the initial condition. In the other, $\bar{s} = 1$, $h(\underline{s}) = 1$ and $t(0) = 0$ is the initial condition. For clarity, we present the entire proof for the former case; the proof for the latter case is symmetric and is discussed briefly at the conclusion of the proof.

With this, agent 1's optimization problem is to choose $s$ from $[0, s^*)$ to minimize

$$\int_0^{s} \alpha t(s_2) g(s_1 + s_2)\,ds_2 + \int_{h(s)}^{1} \alpha t(s_2) g(s_1 + s_2)\,ds_2 + \int_{s}^{h(s)} \alpha t(s) g(s_1 + s_2)\,ds_2 + \int_{s}^{h(s)} \left[s_2 - \frac{h(s) + s}{2}\right]^2 ds_2$$

for all $s_1 \in [0, s^*)$. This yields the first order condition

$$t'(s)\alpha\{G(s_1 + h(s)) - G(s_1 + s)\} + \frac{[h(s) - s]^2}{4}[h'(s) - 1] = 0, \tag{a16}$$

where $G(x) = \int_0^x g(w)\,dw$. Substituting $s_1$ for $s$ in (a16) gives

$$t'(s_1)\alpha\{G(s_1 + h(s_1)) - G(2 s_1)\} + \frac{[h(s_1) - s_1]^2}{4}[h'(s_1) - 1] = 0. \tag{a17}$$

Equation (a17) is the equilibrium condition for an agent with $s_1 \in [0, s^*)$. For an agent with $s_1 \in (s^*, 1]$, replace $s_1$ with $h(s_1)$ in (a16) and then substitute $s_1$ for $s$ to get

$$t'(s_1)\alpha\{G(2 h(s_1)) - G(h(s_1) + s_1)\} + \frac{[h(s_1) - s_1]^2}{4}[h'(s_1) - 1] = 0. \tag{a18}$$

Both (a17) and (a18) are satisfied if there exists a differentiable $h(s_1)$ such that, for all $s_1 \in [0, s^*)$,

$$G(2 h(s_1)) + G(2 s_1) = 2 G(h(s_1) + s_1). \tag{a19}$$
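As a consistency check, conditions (a17) and (a19) can be specialized to the two-sided delay cost of section III. The following worked example is a sketch and not part of the original proof: it assumes $g(w) = (w - 1)^2$, so that $s^* = 1/2$, and guesses $h(s) = 1 - s$ by symmetry. The first-order condition is read with the error term $[h(s_1) - s_1]^2/4$.

```latex
% Assumptions (illustrative): g(w) = (w-1)^2, s^* = 1/2, and the symmetric guess h(s) = 1 - s.
\begin{align*}
G(x) &= \int_0^x (w - 1)^2 \, dw = \frac{(x - 1)^3 + 1}{3}.
\intertext{Check of (a19) with $h(s_1) = 1 - s_1$, so that $h(s_1) + s_1 = 1$:}
G(2 - 2 s_1) + G(2 s_1)
  &= \frac{(1 - 2 s_1)^3 + 1}{3} + \frac{(2 s_1 - 1)^3 + 1}{3}
   = \frac{2}{3} = 2\, G(1).
\intertext{Check of (a17), using $h'(s_1) = -1$ and $h(s_1) - s_1 = 1 - 2 s_1$:}
0 &= t'(s_1)\, \alpha \left\{ G(1) - G(2 s_1) \right\}
     + \frac{(1 - 2 s_1)^2}{4} \, (-2),
\qquad
G(1) - G(2 s_1) = \frac{(1 - 2 s_1)^3}{3},
\\
t'(s_1) &= \frac{3}{2 \alpha (1 - 2 s_1)},
\qquad
t(s_1) = -\frac{3}{4\alpha} \log(1 - 2 s_1) \quad \text{given } t(0) = 0 .
\end{align*}
```

Under these assumptions the general construction reproduces the timing rule stated in Proposition 4, with $h(0) = 1$ so that both initial conditions $t(0) = 0$ and $t(1) = 0$ hold.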

Consider two mutually exclusive cases; either $G(2) \geq 2G(1)$ or the opposite inequality holds. If the inequality is as given then there exists an $\bar{s} \in (s^*, 1]$ such that $G(2\bar{s}) = 2G(\bar{s})$. This follows because $G(x)$ is continuous and $G(2 s^*) < 2G(s^*)$ since $g(w)$ is strictly decreasing on $[0, 2 s^*)$. Thus, we are in the case where $h(0) = \bar{s}$ and the appropriate initial condition is $t(1) = 0$. If the opposite inequality holds then we are in the case, to be discussed later, where there exists an $\underline{s}$ such that $h(\underline{s}) = 1$ and $t(0) = 0$.

To verify the existence of a function $h: [0, s^*) \to (s^*, \bar{s}]$ that satisfies (a19) for all $s_1 \in [0, s^*)$, implicitly differentiate (a19) to get the first order differential equation

$$h'(s_1) = \frac{g(h + s_1) - g(2 s_1)}{g(2h) - g(h + s_1)}. \tag{a20}$$

Let $\Omega_\varepsilon = [0, s^* - \varepsilon) \times (s^*, \bar{s}]$. The RHS of (a20) is a continuously differentiable and bounded function from $\Omega_\varepsilon$ to $\mathbb{R}$ with a bounded derivative on $\Omega_\varepsilon$ for a fixed $\varepsilon > 0$. Hence the RHS satisfies a Lipschitz condition sufficient to guarantee a unique solution to (a20) on $\Omega_\varepsilon$ (see Bartle 1976, p. 256). Bartle's theorem requires that the domain be open. This creates no problem in our case since both $g(w)$ and the RHS of (a20) can be extended differentiably to $(-\varepsilon, s^* - \varepsilon) \times (s^*, \bar{s})$. To extend $h$ to the domain $[0, s^*)$, let $\varepsilon \to 0$. This establishes that there exists a unique solution to (a19).

For $s_1 \in (\bar{s}, 1]$ agent 1's optimization problem is to choose $s$ to minimize

$$\int_s^1 \alpha t(s_2) g(s_1 + s_2)\,ds_2 + \int_0^s \alpha t(s) g(s_1 + s_2)\,ds_2 + \int_0^s \left[s_2 - \frac{s}{2}\right]^2 ds_2,$$

which yields the first order condition

$$t'(s)\alpha\{G(s_1 + s) - G(s_1)\} + s^2/4 = 0. \tag{a21}$$

Setting $s = s_1$ yields

$$t'(s_1)\alpha\{G(2 s_1) - G(s_1)\} + (s_1)^2/4 = 0. \tag{a22}$$

Since (a22) holds for $s_1 \in (\bar{s}, 1]$ and a solution can be obtained by integrating and using the initial condition $t(1) = 0$, $t(s_1)$ is described by this solution on this region. Further, taking the calculated value of $t(\bar{s})$ from the solution of (a22) gives a boundary condition $t(0) = t(\bar{s})$ for (a17). By integrating (a17) and using this boundary condition, $t(s_1)$ is described on $[0, s^*)$. On $(s^*, 1]$, $t(s_1)$ is described by the relation $t(s_1) = t(h(s_1))$. Finally, define $t(s^*)$ as $\lim t(s_1)$ as $s_1 \to s^*$ for $s_1 \in [0, s^*) \cup (s^*, 1]$. Note that this limit exists, but may be infinite.

We have shown that $t(s_1)$ satisfies the first order condition for all $s_1 \in [0, s^*) \cup (s^*, 1]$. To establish that $t(s_1)$ is a global minimum we will show that for $s$ and $s_1$ belonging to $[0, s^*)$, when $s_1 < s$ the LHS of (a16) is positive and when $s_1 > s$ the LHS is negative. The first order condition states that for $s_1 = s$ the LHS of (a16) equals zero. Note that the only term in (a16) involving $s_1$ is $\{G(s_1 + h(s)) - G(s_1 + s)\}$ and that $t'(s)$ is positive on $[0, s^*)$. If this term is greater or lesser at values of $s_1 \neq s$ than it is at $s_1 = s$ then the LHS of (a16) is correspondingly positive or negative. Its derivative is

$$\frac{\partial}{\partial s_1}\{G(s_1 + h(s)) - G(s_1 + s)\} = g(s_1 + h(s)) - g(s_1 + s).$$

This derivative equals zero at most once because $g(w)$ is strictly convex. Further, $g(2s) > g(s + h(s))$. If this was not the case then the ordering $g(2s) \leq g(s + h(s)) < g(2h(s))$ would imply that $G(s + h(s)) - G(2s) < G(2h(s)) - G(s + h(s))$, which would contradict (a19) at $s_1 = s$. Thus, $\{G(s_1 + h(s)) - G(s_1 + s)\}$ is decreasing at $s_1 = s$. This, combined with the facts that $\{G(s_1 + h(s)) - G(s_1 + s)\}$ is also decreasing at $s_1 = 0$ (by the monotonicity of $G$) and changes direction at most once, establishes that for all $s_1 < s$, $\{G(s_1 + h(s)) - G(s_1 + s)\} > \{G(s + h(s)) - G(s + s)\}$. Thus, the LHS of (a16) is positive for all $s_1 < s$. For $s_1 > s$, it is possible that $\{G(s_1 + h(s)) - G(s_1 + s)\}$ is increasing in $s_1$. However, even at $s_1 = s^*$, $\{G(s^* + h(s)) - G(s^* + s)\} < \{G(2h(s)) - G(s + h(s))\} = \{G(s + h(s)) - G(s + s)\}$ by the definition of $h(s)$ and (a19). Thus, $\{G(s_1 + h(s)) - G(s_1 + s)\} < \{G(s + h(s)) - G(s + s)\}$ for $s_1 > s$, so the LHS of (a16) is negative for all $s_1 > s$. This establishes that choosing $t(s)$ for $s < s_1$ or $s > s_1$ would yield a higher value, hence the derived $t(s_1)$ is indeed a global minimum.

Recall that following (a19) we assumed that $G(2) \geq 2G(1)$. The proof when $G(2) < 2G(1)$ is symmetric. In this case $\bar{s} = 1$, so $h$ is defined as $h: [\underline{s}, s^*) \to (s^*, 1]$, $h(\underline{s}) = 1$ and the initial condition is $t(0) = 0$. The equilibrium conditions given in (a17) and (a18) remain the same and an equation analogous to (a22) is obtained for the segment $[0, \underline{s})$. Finally, the proof for the case where $g(w)$ is strictly monotone follows from the construction of the equilibrium $t(s_1)$ on the segment $[\bar{s}, 1]$ when $g(w)$ is increasing and from the construction of $t(s_1)$ on the segment $[0, \underline{s})$ when $g(w)$ is decreasing. Thus a symmetric equilibrium exists.

Next we show that when $g(w)$ is either strictly monotone or strictly convex, any symmetric equilibrium $t(s)$ is strictly quasi-concave. Recalling that $S_i$ and $S_j$ are independent, denote the expected squared error of agent $i$'s forecast made at time no later than $\tau$ as

$$l(\tau) = \int_{s_j \notin A_j(\tau)} [s_j - E(s_j \mid \tau)]^2\,dF(s_j),$$

where $A_j(\tau)$ is the subset of $I$ for which agent $j$ forecasts before $\tau$ (recall that when agent $j$ forecasts first, agent $i$'s mean squared error is zero). Also, denote the expected cost of delay to agent $i$ who forecasts at time no later than $\tau$ as

$$c(s_i, \tau) = \int_{s_j \in A_j(\tau)} \alpha t(s_j) g(s_i + s_j)\,dF(s_j) + \int_{s_j \notin A_j(\tau)} \alpha \tau g(s_i + s_j)\,dF(s_j).$$

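Note that the first term represents the delay cost when agent $j$ forecasts first at time $t(s_j)$, so that $t(s_j) < \tau$ for $s_j \in A_j(\tau)$. With this, agent $i$'s objective is to choose $\tau$ to minimize $l(\tau) + c(s_i, \tau)$.

To show that $t(s_i)$ is quasi-concave, suppose the contrary (strict quasi-concavity will be shown later). If $t(s_i)$ is not quasi-concave then there exist a $\lambda \in (0,1)$, $\underline{s}$ and $\bar{s}$ such that $t(\lambda \underline{s} + (1-\lambda)\bar{s}) < \min[t(\underline{s}), t(\bar{s})]$. Let $\hat{s} = \lambda \underline{s} + (1-\lambda)\bar{s}$, $\hat{\tau} = t(\hat{s})$, $\underline{\tau} = t(\underline{s})$ and $\bar{\tau} = t(\bar{s})$. By the optimality of $\hat{\tau}$ we have

$$l(\hat{\tau}) + c(\hat{s}, \hat{\tau}) \le \min\{l(\underline{\tau}) + c(\hat{s}, \underline{\tau}),\; l(\bar{\tau}) + c(\hat{s}, \bar{\tau})\},$$

which implies

$$l(\hat{\tau}) - l(\underline{\tau}) \le c(\hat{s}, \underline{\tau}) - c(\hat{s}, \hat{\tau}) \tag{a23}$$

and

$$l(\hat{\tau}) - l(\bar{\tau}) \le c(\hat{s}, \bar{\tau}) - c(\hat{s}, \hat{\tau}). \tag{a24}$$

Similarly, the optimality of $\underline{\tau}$ and $\bar{\tau}$ imply

$$l(\hat{\tau}) - l(\underline{\tau}) \ge c(\underline{s}, \underline{\tau}) - c(\underline{s}, \hat{\tau}) \tag{a25}$$

and

$$l(\hat{\tau}) - l(\bar{\tau}) \ge c(\bar{s}, \bar{\tau}) - c(\bar{s}, \hat{\tau}). \tag{a26}$$

We show that either (a25) contradicts (a23) or (a26) contradicts (a24). Consider $c(s_i, \bar{\tau}) - c(s_i, \hat{\tau})$. Denote $A_1 = A_j(\hat{\tau})$, $A_2 = A_j(\bar{\tau}) \setminus A_j(\hat{\tau})$ and $A_3 = A_j(\bar{\tau})^C$, so that $A_1$, $A_2$ and $A_3$ are disjoint sets and their union equals $I$. With this,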

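$$\begin{aligned}
c(s_i, \bar{\tau}) - c(s_i, \hat{\tau}) &= \int_{s_j \in A_1 \cup A_2} \alpha\, t(s_j)\, g(s_i+s_j)\, dF(s_j) + \int_{s_j \in A_3} \alpha\, \bar{\tau}\, g(s_i+s_j)\, dF(s_j) \\
&\quad - \int_{s_j \in A_1} \alpha\, t(s_j)\, g(s_i+s_j)\, dF(s_j) - \int_{s_j \in A_2 \cup A_3} \alpha\, \hat{\tau}\, g(s_i+s_j)\, dF(s_j) \\
&= \int_{s_j \in A_2} \alpha\,[t(s_j) - \hat{\tau}]\, g(s_i+s_j)\, dF(s_j) + \int_{s_j \in A_3} \alpha\,[\bar{\tau} - \hat{\tau}]\, g(s_i+s_j)\, dF(s_j)
\end{aligned} \tag{a27}$$

where the final simplification follows because $A_1$, $A_2$ and $A_3$ are disjoint sets. Recall that by supposition $\hat{\tau} < \min\{\underline{\tau}, \bar{\tau}\}$ and note that, for $s_j \in A_2$, $t(s_j) > \hat{\tau}$. Thus, (a27) is positive; forecasting later yields higher expected delay costs. Using (a27), $[c(\bar{s}, \bar{\tau}) - c(\bar{s}, \hat{\tau})] - [c(\hat{s}, \bar{\tau}) - c(\hat{s}, \hat{\tau})]$ can be written as

$$\int_{s_j \in A_2} \alpha\,[t(s_j) - \hat{\tau}]\,[g(\bar{s}+s_j) - g(\hat{s}+s_j)]\, dF(s_j) + \int_{s_j \in A_3} \alpha\,[\bar{\tau} - \hat{\tau}]\,[g(\bar{s}+s_j) - g(\hat{s}+s_j)]\, dF(s_j). \tag{a28}$$

For strictly monotone or strictly convex $g$, either $g(\bar{s}+s_j) > g(\hat{s}+s_j)$ or $g(\underline{s}+s_j) > g(\hat{s}+s_j)$. If $g(\bar{s}+s_j) > g(\hat{s}+s_j)$, (a28) is positive, implying that $[c(\bar{s}, \bar{\tau}) - c(\bar{s}, \hat{\tau})] > [c(\hat{s}, \bar{\tau}) - c(\hat{s}, \hat{\tau})]$, and (a26) contradicts (a24). Alternatively, if $g(\underline{s}+s_j) > g(\hat{s}+s_j)$ then substituting $\underline{s}$ for $\bar{s}$ and $\underline{\tau}$ for $\bar{\tau}$ yields another positive (a28), so $[c(\underline{s}, \underline{\tau}) - c(\underline{s}, \hat{\tau})] > [c(\hat{s}, \underline{\tau}) - c(\hat{s}, \hat{\tau})]$, and (a25) contradicts (a23). This establishes that $t(s_i)$ is quasi-concave.

To show that $t(s_i)$ is strictly quasi-concave, suppose to the contrary that there exist $\underline{s}$ and $\bar{s}$ such that for $s_i \in (\underline{s}, \bar{s})$, $t(s_i) = \tau$. The optimality of $\tau$ for $s_i \in (\underline{s}, \bar{s})$ implies that

$$l(\tau) - l(\tau + \Delta\tau) \le c(s_i, \tau + \Delta\tau) - c(s_i, \tau) \tag{a29}$$

for all $\Delta\tau > 0$. However, $l(\tau)$ integrates over $A_j(\tau)^C$ (which includes the region $(\underline{s}, \bar{s})$), while $l(\tau + \Delta\tau)$ integrates only over $A_j(\tau + \Delta\tau)^C$ (which excludes the region $(\underline{s}, \bar{s})$). Hence, since the integrand is always positive, there exists $\varepsilon > 0$ such that $l(\tau) - l(\tau + \Delta\tau) > \varepsilon$ for all $\Delta\tau > 0$.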

Further, $c(s_i, \tau + \Delta\tau)$ is continuous and increasing in $\Delta\tau$, so there exists a $\Delta\tau$ sufficiently small to make $c(s_i, \tau + \Delta\tau) - c(s_i, \tau)$ sufficiently close to zero such that (a29) is contradicted. Thus, $t(s_i)$ is strictly quasi-concave.

Finally, we show that the strict quasi-concavity of $t(s_i)$ implies that ordering is positive. Because anticipation is non-negative by definition (see (3) in the text), establishing that ordering is positive is sufficient to establish that clustering occurs. First, note that $\mathrm{Var}(S_i) = 1/12$, so ordering is positive if $\mathrm{Var}(Y) < 1/12$. By the strict quasi-concavity of $t(s_i)$, when the first agent conditions her belief on X she knows that the second agent's signal, $Y$, is uniformly distributed on an interval $[a(\gamma), a(\gamma) + \gamma]$ and, as the notation implies, the interval is uniquely determined by its length $\gamma$. The cumulative distribution function of $\Gamma$ (with realizations denoted by $\gamma$) is

$$P(\Gamma \le \gamma) = P(s_1 \in [a(\gamma), a(\gamma) + \gamma]) \cdot P(s_2 \in [a(\gamma), a(\gamma) + \gamma]) = \gamma^2,$$

and so the density of $\Gamma$ is $2\gamma$ on $\gamma \in [0, 1]$. With this,

$$\mathrm{Var}(Y) = \int_0^1 \int_{a(\gamma)}^{a(\gamma)+\gamma} (y - \mu)^2\, (1/\gamma)\, dy\; 2\gamma\, d\gamma,$$

where $\mu = E(Y)$. To find $\mu$, note that $E(Y \mid \gamma) = a(\gamma) + \gamma/2$, so

$$\mu = E\{E(Y \mid \Gamma)\} = \int_0^1 [a(\gamma) + \gamma/2]\, 2\gamma\, d\gamma = 2\int_0^1 a(\gamma)\,\gamma\, d\gamma + 1/3. \tag{a30}$$

Now note that $\mathrm{Var}(Y)$ can be written as $E(Y - 1/2)^2 - (1/2 - \mu)^2$, so

$$\mathrm{Var}(Y) = \int_0^1 \int_{a(\gamma)}^{a(\gamma)+\gamma} (y - 1/2)^2\, (1/\gamma)\, dy\; 2\gamma\, d\gamma - (1/2 - \mu)^2. \tag{a31}$$

Integrating (a31) with respect to $y$ and simplifying gives

$$\mathrm{Var}(Y) = 2\int_0^1 a(\gamma)\,\gamma\,[a(\gamma) + \gamma - 1]\, d\gamma + 1/12 - (1/2 - \mu)^2. \tag{a32}$$

Note that the first term (involving an integral) and the last term in (a32) are non-positive. The first term is zero only if $a(\gamma) = 0$ or $a(\gamma) + \gamma = 1$ for all $\gamma$. By the strict quasi-concavity of $t(s_i)$ it cannot be the case that $a(\gamma) = 0$ for some of the domain and $a(\gamma) + \gamma = 1$ for the remaining domain. But, from (a30), when $a(\gamma) = 0$ for all $\gamma$, $\mu = 1/3$, and when $a(\gamma) + \gamma = 1$ for all $\gamma$, $\mu = 2/3$. Thus, when the first term is zero, the last term is strictly negative. Hence, $\mathrm{Var}(Y) < 1/12$.

Proof of Proposition 1

This proposition is a special case of proposition 3. By substituting $w$ for $g(w)$ in (a22), so that $G(2s_1) - G(s_1) = 3s_1^2/2$, we get the equilibrium condition

$$t'(s_1) = -\frac{1}{6\alpha}. \tag{a33}$$

Integrating (a33) and using the initial condition $t(1) = 0$ gives

$$t(s_1) = \frac{1 - s_1}{6\alpha}. \tag{a34}$$

The solution given in (a34) is unique within the class of differentiable strategies. Next we show that any symmetric equilibrium to the game with $g(w) = w$ must be strictly decreasing and differentiable, so (a34) describes the unique symmetric equilibrium.

First, to show that $t(s_i)$ is strictly decreasing it is sufficient to note that $g(s_i+s_j)$ is strictly increasing in $s_i$ so, from the proof of proposition 3, (a28) is positive and, if $t(s_i)$ were nondecreasing, (a26) would contradict (a24).

Next we show that $t(s_i)$ is continuous. With it established that $t(s_i)$ is strictly decreasing, we can express agent 1's optimization problem as minimizing

$$L(s_1, s) = \int_s^1 \alpha(s_1 + s_2)\, t(s_2)\, ds_2 + \int_0^s (s_2 - s/2)^2\, ds_2 + \int_0^s \alpha(s_1 + s_2)\, t(s)\, ds_2 \tag{a35}$$

where the second argument of $L$ is agent 1's choice variable. By relying on the invertibility of $t(s)$, choosing $s$ is equivalent to choosing $t$. Suppose $t(s)$ is discontinuous at $s_1$ and, without loss of generality, suppose that the discontinuity takes the form $t(s_1) > \lim t(s_1 + \varepsilon)$ as $\varepsilon \downarrow 0$. We show that if this is the case, then by forecasting a little sooner agent 1 can reduce her cost of delay by a positive amount without reducing her forecast accuracy. In particular, for agent 1 with signal $s_1$, $L(s_1, s_1)$ equals

$$\int_{s_1}^1 \alpha(s_1 + s_2)\, t(s_2)\, ds_2 + \int_0^{s_1} (s_2 - s_1/2)^2\, ds_2 + \int_0^{s_1} \alpha(s_1 + s_2)\, t(s_1)\, ds_2. \tag{a36}$$

However, if agent 1 chooses to predict at time $t(s_1 + \varepsilon)$ then $L(s_1, s_1 + \varepsilon)$ equals

$$\int_{s_1+\varepsilon}^1 \alpha(s_1 + s_2)\, t(s_2)\, ds_2 + \int_0^{s_1+\varepsilon} (s_2 - (s_1+\varepsilon)/2)^2\, ds_2 + \int_0^{s_1+\varepsilon} \alpha(s_1 + s_2)\, t(s_1 + \varepsilon)\, ds_2. \tag{a37}$$

As $\varepsilon \downarrow 0$ the first two terms in (a37) converge to the first two terms in (a36) but the third term converges to

$$\int_0^{s_1} \alpha(s_1 + s_2)\, \lim_{\varepsilon \downarrow 0}[t(s_1 + \varepsilon)]\, ds_2. \tag{a38}$$

Because $t(s)$ is positive and decreasing, and by the discontinuity of $t(s)$ at $s_1$, (a38) is strictly less than the third term in (a36). This contradicts the optimality of $t(s)$ at $s_1$.

Finally, we show that $t(s_i)$ is differentiable. If agent 1 with signal $s_1$ chooses $s = s_1$, it must be the case that for all $\varepsilon > 0$

$$\frac{L(s_1, s_1+\varepsilon) - L(s_1, s_1)}{\varepsilon} \ge 0. \tag{a39}$$

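Similarly, if agent 1 with signal $s_1 + \varepsilon$ chooses $s = s_1 + \varepsilon$, it must be the case that for all $\varepsilon > 0$

$$\frac{L(s_1+\varepsilon, s_1) - L(s_1+\varepsilon, s_1+\varepsilon)}{\varepsilon} \ge 0. \tag{a40}$$

By substituting in the component terms of $L$ and taking the difference term by term, (a39) is expressed as

$$\begin{aligned}
&\frac{1}{\varepsilon}\left[\int_{s_1+\varepsilon}^1 \alpha(s_1+s_2)\, t(s_2)\, ds_2 - \int_{s_1}^1 \alpha(s_1+s_2)\, t(s_2)\, ds_2\right] \\
&\quad + \frac{1}{\varepsilon}\left[\int_0^{s_1+\varepsilon} \left(s_2 - \frac{s_1+\varepsilon}{2}\right)^2 ds_2 - \int_0^{s_1} \left(s_2 - \frac{s_1}{2}\right)^2 ds_2\right] \\
&\quad + \frac{1}{\varepsilon}\left[\int_0^{s_1+\varepsilon} \alpha(s_1+s_2)\, t(s_1+\varepsilon)\, ds_2 - \int_0^{s_1} \alpha(s_1+s_2)\, t(s_1)\, ds_2\right] \ge 0.
\end{aligned} \tag{a41}$$

As $\varepsilon \downarrow 0$, the limit of the first term in (a41) is $-2 s_1 \alpha\, t(s_1)$, the limit of the second term is $s_1^2/4$ and the limit of the third term is

$$2 s_1 \alpha\, t(s_1) + \frac{3}{2} s_1^2\, \alpha \lim_{\varepsilon \downarrow 0}\left[\frac{t(s_1+\varepsilon) - t(s_1)}{\varepsilon}\right].$$

Thus, the limit of (a39) as $\varepsilon \downarrow 0$ is

$$\frac{s_1^2}{4} + \frac{3}{2} s_1^2\, \alpha \lim_{\varepsilon \downarrow 0}\left[\frac{t(s_1+\varepsilon) - t(s_1)}{\varepsilon}\right] \ge 0. \tag{a42}$$

A similar exercise yields the limit of (a40) to be

$$-\frac{s_1^2}{4} - \frac{3}{2} s_1^2\, \alpha \lim_{\varepsilon \downarrow 0}\left[\frac{t(s_1+\varepsilon) - t(s_1)}{\varepsilon}\right] \ge 0. \tag{a43}$$

Because the LHS of (a43) is of opposite sign to the LHS of (a42), the only way to satisfy both inequalities is for each to equal zero. This yields

$$\lim_{\varepsilon \downarrow 0}\left[\frac{t(s_1+\varepsilon) - t(s_1)}{\varepsilon}\right] = -\frac{1}{6\alpha}. \tag{a44}$$

The conditions in (a39) and (a40) are given for $\varepsilon > 0$. If $\varepsilon < 0$ the inequalities in both conditions are reversed (as is the limit direction) and hence the inequalities in (a42)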

and (a43) are reversed. As before, the only way both conditions could be satisfied is for each to equal 0. Thus, $\lim\,[t(s_1+\varepsilon) - t(s_1)]/\varepsilon = -1/(6\alpha)$ regardless of the direction that the limit is taken; consequently, $t(s)$ is differentiable. Thus, the unique symmetric equilibrium strategy for the model with one-sided delay cost is strictly decreasing and differentiable. Hence, the solution given in proposition 1 is the unique symmetric equilibrium.

Proof of Proposition 4

Proposition 4 is a special case of proposition 3. The solution to (a20) is $h(s) = 1 - s$ (which we guessed based on the symmetry of $g(w)$), and $s^* = 1/2$. Substituting $1 - s_1$ for $h(s_1)$ and $(w - 1)^2$ for $g(w)$ gives the equilibrium condition

$$t'(s_1) = -\frac{3}{2\alpha(2s_1 - 1)}. \tag{a45}$$

Integrating (a45) and using either initial condition $t(1) = 0$ or $t(0) = 0$ gives

$$t(s_1) = -\frac{3}{4\alpha} \log(|2s_1 - 1|). \tag{a46}$$

Proof of Proposition 5

Differentiating equation (8) in the text with respect to s gives the first order condition 0 = p( - +(sl + )st'(s)- + 2sp(sl - s) - (s - s)2. Solving for t'(s) and evaluating at s = sl gives t'(s) = -; 6ao

integrating and using the initial condition gives t(sl) = (i ) 6a To verify that this is a global minimum, substitute the derived $t(s)$ into the objective function and note that the first derivative is ( s2 - s s). This quadratic has a root at $s = s_1$ and at $s = 0$. For $\beta < 1/18$, the first derivative is negative between the two roots and positive elsewhere. This guarantees that $s = s_1$ is the unique minimum when $\beta < 1/18$.
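The closed forms in the proofs of Propositions 1 and 4 can be cross-checked numerically. The Python sketch below substitutes the strategy t(s) = (1 - s)/(6α) from (a34) into the first order condition (a22) with g(w) = w, and compares a finite-difference slope of the strategy in (a46) with the slope -3/(2α(2s - 1)) obtained by differentiating (a46). The value of α, the normalization G(w) = w²/2, the step size and all function names are assumptions made for illustration; none of them come from the paper.

```python
import math


def G(w):
    """Antiderivative of g(w) = w; the constant of integration is an assumption."""
    return 0.5 * w * w


def t_prop1(s, alpha):
    """Candidate strategy (a34): t(s) = (1 - s) / (6 * alpha)."""
    return (1.0 - s) / (6.0 * alpha)


def t_prop4(s, alpha):
    """Candidate strategy (a46): t(s) = -(3 / (4 * alpha)) * log|2s - 1|."""
    return -(3.0 / (4.0 * alpha)) * math.log(abs(2.0 * s - 1.0))


def slope(f, s, alpha, h=1e-6):
    """Central finite-difference approximation of df/ds at s."""
    return (f(s + h, alpha) - f(s - h, alpha)) / (2.0 * h)


def residual_a22(s, alpha):
    """Left-hand side of (a22) at the candidate (a34); should be near zero."""
    return slope(t_prop1, s, alpha) * alpha * (G(2.0 * s) - G(s)) + s * s / 4.0


def residual_prop4(s, alpha):
    """Numerical slope of (a46) minus the analytic slope -3 / (2 alpha (2s - 1))."""
    return slope(t_prop4, s, alpha) + 3.0 / (2.0 * alpha * (2.0 * s - 1.0))


if __name__ == "__main__":
    alpha = 0.5  # hypothetical delay-cost parameter
    for s in (0.1, 0.3, 0.7, 0.9):
        print(s, residual_a22(s, alpha), residual_prop4(s, alpha))
```

Both residuals should be zero up to finite-difference and rounding error at any signal away from s* = 1/2, where the strategy in (a46) diverges. The check on (a46) only confirms that the stated slope and level are mutually consistent; it does not re-derive the condition from (a17).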

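The variance bound used at the end of the proof of Proposition 3 can also be illustrated numerically. The sketch below evaluates E(Y) = 2∫a(γ)γ dγ + 1/3 and Var(Y) = 2∫a(γ)γ[a(γ) + γ - 1]dγ + 1/12 - (1/2 - E(Y))², both integrals taken over [0, 1], with a midpoint rule. The three choices of the lower endpoint a(γ) are hypothetical examples: an interval anchored at 0, an interval anchored at 1, and an interval centred at 1/2 (the shape a strategy symmetric about 1/2 would suggest). The grid size and all names are likewise assumptions.

```python
def integrate(f, n=20000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    step = 1.0 / n
    return step * sum(f((k + 0.5) * step) for k in range(n))


def mean_y(a):
    """E(Y) for a lower-endpoint function a(gamma): 2 * int a*gamma + 1/3."""
    return 2.0 * integrate(lambda gam: a(gam) * gam) + 1.0 / 3.0


def var_y(a):
    """Var(Y) for a lower-endpoint function a(gamma), following the formula above."""
    first = 2.0 * integrate(lambda gam: a(gam) * gam * (a(gam) + gam - 1.0))
    return first + 1.0 / 12.0 - (0.5 - mean_y(a)) ** 2


# Hypothetical interval shapes used only for illustration.
CASES = {
    "anchored at 0": lambda gam: 0.0,
    "anchored at 1": lambda gam: 1.0 - gam,
    "centred at 1/2": lambda gam: 0.5 * (1.0 - gam),
}

if __name__ == "__main__":
    for label, a in CASES.items():
        print(label, mean_y(a), var_y(a))
```

In the two anchored cases the integral term vanishes and the variance works out to 1/18; in the centred case the mean is 1/2 and the variance is 1/24. All three lie strictly below Var(S_i) = 1/12, in line with the argument in the proof.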
REFERENCES

Banerjee, Abhijit, 1992, "A Simple Model of Herd Behavior," The Quarterly Journal of Economics, 107:797-817.

Bartle, R., 1976, The Elements of Real Analysis, Wiley: New York.

Bikhchandani, Sushil, David Hirshleifer and Ivo Welch, 1992, "A Theory of Fads, Fashion, Custom and Cultural Change as Informational Cascades," Journal of Political Economy, 100:992-1026.

Bliss, Christopher and Barry Nalebuff, 1984, "Dragon-Slaying and Ballroom Dancing: The Private Supply of a Public Good," Journal of Public Economics, 25:1-12.

Bulow, Jeremy and Paul Klemperer, 1991, "Rational Frenzies and Crashes," NBER Technical Paper No. 112.

Froot, Kenneth, David Scharfstein and Jeremy Stein, 1990, "Herd on the Street: Informational Inefficiencies in a Market With Short-Term Memory," NBER Working Paper No. 3250.

Hendricks, Kenneth and Dan Kovenock, 1989, "Asymmetric Information, Information Externalities, and Efficiency: The Case of Oil Exploration," RAND Journal of Economics, 20:2, 164-82.

Scharfstein, David and Jeremy Stein, 1990, "Herd Behavior and Investment," American Economic Review, 80:3, 465-79.

Stickel, Scott, 1990, "Predicting Individual Analyst Earnings Forecasts," Journal of Accounting Research, 28, 409-17.

Stickel, Scott, 1992, "Reputation and Performance Among Security Analysts," The Journal of Finance, 48:1811-36.

Stinchcombe, M., 1988, "Maximal Strategy Sets for Continuous-Time Game Theory," University of California-Berkeley Working Paper.

Trueman, Brett, 1991, "Analyst Forecasts and Herding Behavior," University of California-Berkeley Working Paper.

Zwiebel, Jeffrey, 1991, "Corporate Conservatism, Herd Behavior and Relative Compensation," Stanford University Working Paper.