Why people punish defectors
Weak conformist transmission can stabilize costly enforcement of norms in cooperative dilemmas

Joseph Henrich
University of Michigan
701 Tappan Road, D3276
Ann Arbor, MI 48109-1234
(734) 763-0370
henrich@umich.edu

Robert Boyd
University of California, Los Angeles
405 Hilgard Ave
Department of Anthropology
Los Angeles, CA 90024
rboyd@anthro.ucla.edu

In this paper we present a cultural evolutionary model in which norms for cooperation and punishment are acquired via two cognitive mechanisms: 1) payoff-biased transmission, a tendency to copy the most successful individual; and 2) conformist transmission, a tendency to copy the most frequent behavior in the population. We first show that if a finite number of punishment stages is permitted (e.g. two stages of punishment occur if some individuals punish people who fail to punish noncooperators), then an arbitrarily small amount of conformist transmission will stabilize cooperative behavior by stabilizing punishment at some n-th stage. We then explain how, once cooperation is stabilized in one group, it may spread through a multi-group population via cultural group selection. Finally, once cooperation is prevalent, we show how prosocial genes favoring cooperation and punishment may invade in the wake of cultural group selection.

Cooperation and Conformist Transmission

In many societies, humans cooperate in large groups of unrelated individuals. Most evolutionary explanations for cooperation combine kinship (Hamilton, 1964) and reciprocity ("reciprocal altruism"; Trivers, 1971). These mechanisms seem to explain the evolution of cooperation in many species including ants, bees, naked mole rats and vampire bats. However, because social interaction among humans often involves large groups of mostly unrelated individuals, explaining cooperation has proved a tricky problem for both evolutionary and rational choice theorists. Evolutionary models of cooperation using the repeated n-person prisoner's dilemma predict that cooperation is not likely to be favored by natural selection if groups are larger than around 10, unless relatedness is very high (Boyd & Richerson 1988). As group size rises above 10, to 100 or 1000, cooperation is virtually impossible to evolve or maintain with only reciprocity and kinship. [1]

Many students of human behavior believe that large-scale human cooperation is maintained by the threat of punishment. From this view, cooperation persists because the penalties for failing to cooperate are sufficiently large that defection 'doesn't pay.' However, explaining cooperation in this way leads to a new problem: why do people punish noncooperators? Individuals who punish defectors provide a public good, and thus can be exploited by non-punishing cooperators if punishment is costly. Second order free riders cooperate in the main activity, but cheat when it comes time to punish non-cooperators. As a consequence, 2nd order free riders receive higher payoffs than punishers do, and thus punishment is not evolutionarily stable. Adding 3rd order punishers (who punish 2nd order free riders) or higher order punishers only pushes the problem back to higher orders.
Solving this problem is important because there is widespread agreement that the threat of punishment plays an important role in the maintenance of cooperation in many human societies. Social scientists have explained the maintenance of punishment in three ways: 1) many authors assume that a state or some other external institution does the punishing; 2) others assume punishing is costless (McAdams, in press; Hirshleifer & Rasmussen, 1989); and 3) a few scholars incorporate a recursive punishing method in which punishers punish defectors, individuals who fail to punish defectors, individuals who fail to punish non-punishers, and so on in an infinite regress (Boyd & Richerson, 1992; Fudenberg & Maskin, 1986). However, none of these solutions is satisfactory.

The first solution avoids the problem of punishment by relocating the costs of punishment outside the problem. While it is useful to assume institutional enforcement in modern contexts, it leaves the evolution and maintenance of punishment unexplained because at some point in the past there were no states or institutions. Furthermore, the state plays a very small role in many contemporary small-scale societies that nonetheless exhibit a great deal of cooperative behavior.

The second solution, instead of relocating the costs, assumes that punishment is costless. This seems unrealistic because any attempt to inflict costs on another must be accompanied by at least some tiny cost, and any non-zero cost lands both genetic evolutionary and rational choice approaches back on the horns of the original punishment dilemma.

The third solution, pushing the cost of punishment out to infinity, also seems unrealistic. Do people really punish people who fail to punish other non-punishers, and do people punish people who fail to punish people who fail to punish non-punishers of defectors, and so on, ad infinitum? Although the infinite recursion is cogent, it seems like a mathematical trick.

[1] Two other explanations for cooperation go by the handles by-product mutualism (Brown 1983) and group selection (Sober & Wilson 1998). In by-product mutualism, individuals who 'cooperate' get a higher payoff (have a higher expected fitness) than non-cooperators. The cooperative contribution to the fitness of others is simply a by-product of narrow self-interest. That is, in the process of helping myself, I also help you 'by accident.' Hence, although this situation may abound in nature, it's not the situation we are interested in (and not cooperation by many definitions). And, while genetic group selection may explain some cooperation in nature (e.g. honeybees, see Seeley, 1995), we believe that gene flow rates between human populations, relative to selection, are too high to maintain the required variation between groups (Richerson & Boyd, 1998).
Conformist transmission in social learning can stabilize punishment

In this paper, we argue that the evolution of cooperation and punishment is plausibly a side effect of a tendency to adopt common behaviors during enculturation. Humans are unique among primates in that they acquire much of their behavior from other humans via social learning. However, both theory and evidence suggest that humans do not simply copy their parents, nor do they copy other individuals at random (Henrich & Boyd, 1998; Takahasi, 1998; Harris, 1998). Instead, people seem to use social learning rules like 'copy the successful' (termed pay-off biased or prestige-biased transmission, see Gil-White & Henrich, 2000) or 'copy the majority' (termed conformist transmission, Boyd & Richerson, 1985, Henrich & Boyd, 1998), which allow them to short-cut the costs of individual learning and experimentation, and leapfrog directly to adaptive behaviors. These specialized social learning mechanisms provide a generalized means of rapidly sifting through the wash of information available in the social world and inexpensively extracting adaptive behaviors. These social learning short-cuts do not always result in the best behaviors, nor do they prevent the acquisition of maladaptive behaviors. Nevertheless, when averaged over many environments and behavioral domains (e.g. foraging, hunting, social interaction, etc.), these cultural transmission mechanisms provide fast and frugal means to acquire complex, highly-adaptive behavioral repertoires.

Both theoretical and empirical research indicates that conformist transmission plays an important role in human social learning. First, we have already shown that a heavy reliance on conformist transmission outcompetes both unbiased (i.e. vertical) transmission and individual learning under a wide range of conditions (Henrich & Boyd, 1998), and especially when problems are difficult. Second, empirical research by psychologists, economists and sociologists shows that people are likely to adopt common behaviors across a wide range of decision domains. Although much of this work focuses on easy perceptual tasks (Asch, 1951), and confounds normative conformity (going with the popular choice to avoid appearing deviant) with conformist transmission (using the popularity of a choice as an indirect measure of its worth), more recent work shows that social learning and conformist transmission are important in difficult individual problems (Baron et al., 1996; Insko et al., 1985; Campbell & Fairey, 1989), voting situations (Wit, 1999) and cooperative dilemmas (Smith & Bell, 1994).

Conformist transmission can stabilize costly cooperation without punishment, but only if it is very strong. All other things being equal, pay-off biased transmission causes higher payoff variants to increase in frequency, and thus cooperation is not evolutionarily stable under plausible conditions because not-cooperating leads to higher payoffs than cooperating. Thus, pay-off biased transmission, alone, suffers the same problem as natural selection in genetic evolution. However, under conformist transmission individuals preferentially adopt common behaviors, which acts to increase the frequency of the most common behavior in the population. Thus, if cooperation is common, conformist transmission will oppose payoff-biased transmission, and, as long as cooperation is not too costly, maintain cooperative strategies in the population. However, if the costs of cooperation are substantial, it is less likely that conformist transmission will be able to maintain cooperation.

A quite different logic applies to the maintenance of punishment. Suppose that both punishers and cooperators are common, and that being punished is sufficiently costly that cooperators have higher payoffs than defectors. Rare invading 2nd order free riders who cooperate but do not punish will achieve higher payoffs than punishers because they avoid the costs of punishing. However, because defection doesn't pay, the only defections will be due to rare mistakes, and thus the difference between the payoffs of punishers and 2nd order free riders will be relatively small. Hence, conformist transmission is more likely to stabilize the punishment of noncooperators than cooperation itself. As we ascend to higher order punishing, the difference between the payoffs to punishing vs. non-punishing decreases geometrically towards zero because the occasions that require the administration of punishment become increasingly rare. Second order punishing is required only if someone erroneously fails to cooperate, and then someone else erroneously fails to punish that mistake. For third order punishment to be necessary, yet another failure to punish must occur. As the number of punishing stages (i) increases, conformist transmission, no matter how weak, will at some stage overpower payoff-biased imitation and stabilize common i-th order punishment.
Once punishment is stable at the i-th stage, payoffs will favor strategies that punish at the (i - 1)-th stage, because common punishers at the i-th stage will punish non-punishers at stage i - 1. Stable punishment at stage i - 1 means payoffs at stage i - 2 will favor punishing strategies, and so on down the cascade of punishment. Eventually, common 1st order punishers will stabilize cooperation at stage 0.

It is important to see that the stabilization of punishment is, from the gene's point of view, a maladaptive side-effect of conformist transmission. If there were genetic variability in the strength of conformist transmission ($\alpha$) and cooperative dilemmas were the only problem humans faced, then conformist transmission might never evolve. However, human social learning mechanisms were selected for their capability to efficiently acquire adaptive behaviors over a wide range of behavioral domains and environmental circumstances (from figuring out what foods to eat, to deciding what kind of person to marry), precisely because it is costly for individuals to determine the best behavior. Hence, we should expect conformist transmission to be important in cooperation as long as distinguishing cooperative dilemmas from other kinds of problems is difficult, costly or error prone. Looking across human societies we find that cooperative dilemmas come in an immense variety of forms, including harvest rituals among agriculturalists, barbasco fishing among Amazonian peoples, warfare, irrigation projects, taxes, voting, meat sharing and anti-smoking pressure in public places. It's difficult to imagine a cognitive mechanism capable of distinguishing cooperative circumstances from the myriad of other problems and social interactions that people encounter.

In what's to come, we formalize this argument. Our goal is to demonstrate the soundness of our reasoning and show how very weak conformist transmission can stabilize cooperation and punishment. After demonstrating this, we will describe how cooperation, once it's stabilized in one group, can spread across many populations via cultural group selection. We'll also briefly show how genes for prosocial behavior may eventually spread in the wake of cultural evolution.

A cultural evolutionary model of cooperation & punishment

In this model, a large number of groups each consisting of N individuals are drawn at random from a very large population. Individuals within each group interact with one another in an (i + 1)-stage game. The first stage is a one-shot cooperative dilemma, which is followed by i stages in which individuals can punish others.
We number the first, cooperative stage as '0' and the punishment stages as 1, ..., i. The behavior of individuals during each stage is determined by a separate culturally acquired trait with two variants, P (Prosocial variant) and NP (Not Prosocial variant).

During the initial cooperative dilemma, individuals can either "cooperate" (contribute to a public good) or "defect" (not contribute and free-ride on the contributions of others). Each cooperator pays a cost C to contribute a benefit B (B > C) to the group; this B is divided equally among all group members. Defectors don't pay the cost of cooperation (C), but do share equally in the total benefits. The variable $p_0$ represents the frequency of individuals in the population with the cooperative variant in stage 0. People with the cooperative variant "intend" to cooperate, but mistakenly defect with probability e. Individuals who have the defecting variant always defect. This makes sense because, in the real world, people may intend to cooperate, but fail to for some reason. For example, a friend who plans to help you move may forget to show up or have car trouble en route. Defectors, however, are unlikely to mistakenly show up on moving day and start carrying boxes. We will assume errors are rare, so that the value of e is small.

During the first punishment stage, individuals can punish those who defected during the cooperation stage. Doing this reduces the payoff of the individuals who are punished by an amount $\rho$, at a cost of $\phi$ to the punisher ($\phi < \rho < C$). Individuals with the punishing (P) variant at this stage intend to punish, but mistakenly fail to punish with probability e. Non-punishers, those with the NP-variant at stage 1, do nothing. We use $p_1$ to stand for the frequency of 1st stage punishers (i.e. individuals who have the P-variant at stage 1), and $(1 - p_1)$ gives the frequency of 1st stage free riders.

During the second punishment stage, individuals with the P-variant punish those who did not punish the non-cooperators during the previous stage with probability (1 - e), and mistakenly fail to punish with probability e. As before, punishment costs punishers $\phi$ to administer, and costs those being punished an amount $\rho$. Those with the NP-variant at stage 2 do not punish. Let $p_2$ be the frequency of 2nd stage punishers.
At stage 3, individuals with the P-variant will punish individuals from stage 2 who failed to punish non-punishers from stage 1. The costs of punishment remain the same. Those with the NP-variant in stage 3 will not punish anyone from stage 2. The pattern repeats as one descends to stage i in Table 1 ($p_i$ gives the frequency of punishers at stage i). Because the interaction ends after stage i, individuals who fail to punish on stage i cannot be punished. Note that the trait that controls individual behavior at each stage has only two variants, and the values of variants at different stages are independent, so an individual could cooperate at stage 0 (have the P-variant), not punish at stage 1 (NP-variant), and punish at stage 2 (P-variant).

[Table 1 about here]

After all the punishments are complete, cultural transmission takes place. As we explained earlier, two components of human cognition create forces that change the frequency of the different variants: payoff-biased and conformist-biased imitation. Equation 1 gives the change in the frequency of stage 0 cooperators as a consequence of pay-off biased and conformist transmission (see Henrich 2000).

$$\Delta p_0 = p_0 (1 - p_0)\Big[\underbrace{(1-\alpha)\,\beta\,(b_C - b_D)}_{\text{payoff-biased}} + \underbrace{\alpha\,(2p_0 - 1)}_{\text{conformist}}\Big] \qquad (1)$$

The parameter $\alpha$ varies from 0 to 1 and represents the strength of conformist transmission in human psychology relative to pay-off biased transmission. We will generally assume $\alpha$ is positive, but small. Practically speaking, $\alpha$ must be less than 0.50, because otherwise beneficial variants would never spread: once a variant became common, it would remain common no matter how deleterious. The second term in (1), labeled 'conformist,' varies in magnitude from $-\alpha$ to $+\alpha$ and is the component of the overall bias contributed by conformist transmission. In the term labeled 'payoff-biased', the symbols $b_C$ and $b_D$ are the payoffs to cooperators and defectors, respectively. The quantity $(b_C - b_D)$, which we label $\Delta b_0$, gives the difference in payoffs between cooperation (P-variant) and defection (NP-variant) in stage 0. More generally, $\Delta b_i$ is the difference in payoffs between the P- and NP-variants during the i-th stage. The parameter $\beta$ normalizes the quantity $\Delta b_i$ so that it varies between -1 and +1, and therefore $\beta = 1/|\Delta b_i|_{\max}$. Thus, the term labeled 'payoff-biased' varies between $-(1-\alpha)$ and $+(1-\alpha)$ and represents the component of the overall bias contributed by payoff-biased transmission.
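The recursion in equation (1) can be tried out numerically. The following is a minimal sketch, not the authors' code: the function name `delta_p0` and the numbers in the example are illustrative assumptions, and the payoff difference is taken as already normalized (beta = 1, payoff gap of -1).

```python
def delta_p0(p0, alpha, beta, b_c, b_d):
    """Change in cooperator frequency after one round of transmission, equation (1)."""
    payoff_biased = (1 - alpha) * beta * (b_c - b_d)  # lies in [-(1 - alpha), 1 - alpha]
    conformist = alpha * (2 * p0 - 1)                 # lies in [-alpha, alpha]
    return p0 * (1 - p0) * (payoff_biased + conformist)


if __name__ == "__main__":
    # Hypothetical case: cooperation is common (p0 = 0.9) but earns the lower payoff.
    for alpha in (0.1, 0.6):
        change = delta_p0(p0=0.9, alpha=alpha, beta=1.0, b_c=-1.0, b_d=0.0)
        print(f"alpha = {alpha}: delta p0 = {change:+.4f}")
```

With the payoff gap at its maximum, weak conformism (alpha = 0.1) lets cooperation decline, while only an implausibly strong bias (alpha = 0.6) reverses the sign.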
The expected payoffs, b, to the P- and NP-variant at each stage depend on the rate of errors, the costs of cooperation and/or punishment, and the frequency of cooperators and punishers in the population. At stage 0, cooperators receive an average payoff of $b_C$, while defectors receive an average payoff of $b_D$:

$$b_C = (1-e)\big[p_0 B (1-e) - C + e\,(p_0 B - N p_1 \rho)\big]$$
$$b_D = (1-e)\,(p_0 B - N p_1 \rho) \qquad (2)$$
$$\Delta b_0 = b_C - b_D = (1-e)\big(N p_1 (1-e)\rho - C\big)$$

And, as we mentioned, the term $\Delta b_0$ gives the difference in payoffs between the two variants that control stage 0 behavior.

A Heuristic Analysis

Let's first analyze equation (1) by asking under what conditions transmission will favor cooperation ($\Delta p_0 > 0$) in the absence of stage 1 punishers ($p_1 = 0$). In this case, $\Delta b_0 = -C(1-e)$, which is always negative; hence, payoff-biased transmission never favors cooperation in the absence of punishment. So, to give cooperation its best chance, we assume that by some stochastic fluctuation the frequency of cooperators ends up near one. How big does $\alpha$ have to be so that conformist transmission overpowers payoff-biased transmission and increases the frequency of cooperators? The frequency of cooperators increases when:

$$\alpha_0 > \frac{\beta C (1-e)}{1 + \beta C (1-e)} \qquad (3)$$

where $\alpha_j$ is the minimum value of $\alpha$ that favors the spread or maintenance of the P-variant at stage j ($\Delta p_j > 0$). With no punishment, $\beta = 1/|\Delta b_i|_{\max}$ means $\beta = 1/(C(1-e))$. As a consequence, $\alpha_0$ must be greater than 0.50. And, as we mentioned earlier, $\alpha > 0.50$ seems extremely unlikely because such high values would prevent the diffusion of novel practices; cultures would be entirely static (see Henrich 2000). Hence, conformist transmission, operating directly on cooperative strategies, is unlikely to maintain cooperation in the absence of punishment.

Now, let's examine the conditions under which 1st stage punishment will increase in frequency. Again, the change in the frequency of 1st stage punishers, $\Delta p_1$, is affected by both payoff-biased and conformist transmission:

$$\Delta p_1 = p_1 (1 - p_1)\big[(1-\alpha)\,\beta\,(b_{P1} - b_{NP1}) + \alpha\,(2p_1 - 1)\big] \qquad (4)$$
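Before turning to the payoff terms in equation (4), the no-punishment result can be checked numerically. This is a small sketch under stated assumptions: the function names are hypothetical, `rho` stands for the cost of being punished, and the parameter values in the example are arbitrary. It evaluates the stage-0 payoffs of equation (2) and the threshold of equation (3).

```python
def stage0_payoffs(p0, p1, B, C, N, rho, e):
    """Expected stage-0 payoffs (b_C, b_D), following equation (2)."""
    b_d = (1 - e) * (p0 * B - N * p1 * rho)
    # An intended cooperator cooperates with probability 1 - e
    # and otherwise defects by mistake, earning the defector payoff.
    b_c = (1 - e) * (p0 * B * (1 - e) - C) + e * b_d
    return b_c, b_d


def alpha0_threshold(C, e):
    """Minimum alpha holding cooperation near p0 = 1 without punishers, equation (3)."""
    beta = 1.0 / (C * (1 - e))   # normalizes |delta b_0| = C(1 - e) to one
    x = beta * C * (1 - e)
    return x / (1 + x)


if __name__ == "__main__":
    b_c, b_d = stage0_payoffs(p0=1.0, p1=0.0, B=3.0, C=1.0, N=20, rho=0.5, e=0.02)
    print("payoff gap without punishers:", b_c - b_d)   # equals -C(1 - e)
    print("alpha_0 threshold:", alpha0_threshold(C=1.0, e=0.02))
```

Because beta is defined by the largest payoff gap, the threshold comes out at one half whatever values of C and e are supplied.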

The payoffs (b's) to punishment and non-punishment depend on the cost of punishing ($\phi$) and of being punished ($\rho$), as well as the chance of mistakenly not punishing (e). The subscript P1 indicates the P-variant at stage 1, while NP1 indicates the NP-variant at stage 1.

$$b_{P1} = -(1-e) N \phi\,(1 - p_0 + p_0 e) - e N p_2 \rho\,(1-e)$$
$$b_{NP1} = -N p_2 (1-e)\rho \qquad (5)$$
$$\Delta b_1 = b_{P1} - b_{NP1} = -N(1-e)\big(\phi\,(1 - (1-e)p_0) - p_2 (1-e)\rho\big)$$

Assuming that there is only one punishment stage (i = 1), and that cooperators and stage 1 punishers are initially common ($p_0 = 1$ and $p_1 = 1$), then $\Delta b_1 = -N(1-e)e\phi$. If errors are rare enough that terms involving $e^2$ are negligible, then $\Delta b_1 \approx -Ne\phi$. Thus, the difference in payoff between the P-variant and the NP-variant at stage 1 is just the cost of punishing cooperators who make errors. If e < 1/N, which is plausible unless groups are very large, then the magnitude of $\Delta b_1$ is less than $\phi$, and smaller than that of $\Delta b_0$, because $\phi < \rho < C$. Note that, when i > 0, $\beta = 1/(N(1-e)(\rho(1-e) + e\phi))$, so the threshold value of $\alpha$ necessary to stabilize cooperation in a two stage game, $\alpha_1$, is [2]:

$$\alpha_1 = \frac{e\phi}{\rho(1-e) + 2\phi e} \approx \frac{e\phi}{\rho} \qquad (6)$$

Equation (6) tells us that $\alpha_1$ depends only on the error rate and the ratio of the cost of punishing to the cost of being punished. It also says that unless punishing is much more costly than being punished ($2\phi e > \rho$), the threshold strength of conformism necessary to maintain first stage punishment is small and less than the amount of conformism necessary to stabilize 0th stage cooperation ($\alpha_0 > \alpha_1 \approx e$).

[2] Note, under a small range of conditions, when $C > N(\rho(1-e) + e\phi)$, the system can still remain stable. Under these conditions, however, $\beta$ becomes $1/(C(1-e))$. For simplicity, we leave this nuance until later in the paper.

If we do the same analysis for stage 2, we get the following expressions for $\Delta p_2$ and $\Delta b_2$:

$$\Delta p_2 = p_2 (1 - p_2)\big[(1-\alpha)\,\beta\,\Delta b_2 + \alpha\,(2p_2 - 1)\big] \qquad (7)$$

where

$$\Delta b_2 = b_{P2} - b_{NP2} = -(1-e) N \big[\phi\,(1 - p_1(1-e))\,(1 - (p_0(1-e))^N) - p_3 (1-e)\rho\big] \qquad (8)$$

The first term inside the square brackets in (8) is proportional to the number of individuals who didn't punish during stage 1, $(1 - p_1(1-e))$, and to the probability that there was at least one defector during stage 0, $(1 - (p_0(1-e))^N)$. The quantity $p_0(1-e)$ is the expected frequency of cooperators who did not make a mistake, thus $(p_0(1-e))^N$ gives the probability that a group contains only cooperators who did not make a mistake; to get the probability that a group contains at least one defector, we simply subtract this probability from one. The second term inside the brackets is the cost of being punished during stage 3 for failing to punish during stage 2. If no 3rd stage punishers exist ($p_3 = 0$), and 1st stage punishers and cooperators are initially very common, then $\Delta b_2 \approx -(eN)^2 \phi$. Note, the difference in payoffs, $\Delta b_2$, is a factor of eN smaller than $\Delta b_1$, but the strength of conformist transmission remains constant. Calculating the required size of $\alpha_2$ we get:

$$\alpha_2 = \frac{N \phi e^2}{\rho(1-e) + e\phi\,(1 + Ne)} \approx \frac{e\phi}{\rho}\,Ne \qquad (9)$$

Equation (9) demonstrates that $0 < \alpha_2 < \alpha_1 < \alpha_0 = 1/2$. In this case $\alpha_2 \approx Ne\,\alpha_1$. If we repeat this calculation for games with more punishment stages, we find that, although punishment during the last stage of the game is never favored by pay-off biased transmission alone, any positive amount of conformist transmission ($\alpha > 0$) will, for some finite number of stages, overcome payoff-biased transmission and stabilize punishment. For any value i (i > 0), the amount of conformist transmission required to stabilize punishment at the i-th stage is:

$$\alpha_i = \frac{\phi e\,(Ne)^{i-1}}{\rho(1-e) + e\phi\,(1 + (Ne)^{i-1})} \approx \frac{e\phi}{\rho}\,(Ne)^{i-1} \qquad (10)$$

Equation (10) shows that the minimum amount of conformism necessary to stabilize punishment during the last stage, $\alpha_i$, gets smaller and smaller for greater values of i (assuming e < 1/N).
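The geometric decline of the threshold in equation (10) is easy to tabulate. The sketch below is illustrative only: the function names, the `max_stages` cap, and the parameter values are assumptions; `rho` is the cost of being punished and `phi` the cost of punishing. It computes the threshold for each stage and searches for the first stage at which a given alpha is sufficient.

```python
def alpha_threshold(i, N, e, rho, phi):
    """Minimum conformist strength that stabilizes i-th stage punishment, equation (10)."""
    if i < 1:
        raise ValueError("equation (10) applies to punishment stages i >= 1")
    decay = (N * e) ** (i - 1)   # punishing occasions shrink by a factor N*e per stage
    return phi * e * decay / (rho * (1 - e) + e * phi * (1 + decay))


def stages_needed(alpha, N, e, rho, phi, max_stages=1000):
    """Smallest i whose threshold falls below alpha, or None if none is found."""
    if alpha <= 0 or N * e >= 1:
        return None              # the argument requires alpha > 0 and e < 1/N
    for i in range(1, max_stages + 1):
        if alpha_threshold(i, N, e, rho, phi) < alpha:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical parameters: N = 50, e = 0.01 (so N*e = 0.5), rho = 2, phi = 1.
    for i in range(1, 6):
        print(i, alpha_threshold(i, N=50, e=0.01, rho=2.0, phi=1.0))
    print("stages needed for alpha = 0.002:", stages_needed(0.002, 50, 0.01, 2.0, 1.0))
```

For i = 1 the function reduces to equation (6), and each additional stage lowers the threshold by roughly a factor of Ne, so even a very small alpha is eventually enough.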

Once conformist transmission overcomes payoff-biased transmission and stabilizes punishment at stage i, punishment at stage i - 1 will be stabilized because non-punishers at stage i - 1 will be punished by frequent punishers during stage i. Once punishing strategies are common and stable at stage i - 1, frequent punishers at i - 1 will cause pay-off biased transmission to favor the prosocial variant at stage i - 2. In most cases, a combination of punishment and conformist transmission will eventually stabilize cooperation at stage 0. However, if C is sufficiently greater than $N\rho(1-e)$, then stable punishment at stage 1 will not be able to overcome the costs of cooperation at stage 0, and cooperation will not be maintained, despite stable, high-frequency 1st stage punishers.

Formal Stability Analysis

A more rigorous local stability analysis of the complete set of recursions supports the heuristic argument just given. Consider the set of i + 1 difference equations where $\Delta p_j$ (j = 0, 1, ..., i; see Appendix A) provides the dynamics of the behavioral traits at each stage. The cooperative equilibrium point ($p_0 = 1, p_1 = 1, \ldots, p_i = 1$) is locally stable under two distinct conditions.

Stability Condition 1: When i > 0 and $C < \rho(1-e)N + (eN)^i \phi$, the cooperative equilibrium is locally stable when:

$$\lambda_i = -\alpha + (1-e)(1-\alpha)\,\beta\,\phi\,(Ne)^i < 0 \qquad (11)$$

where $\beta = 1/(N(1-e)(\rho(1-e) + e\phi))$. First, note that if $\alpha = 0$, the cooperative equilibrium is never stable because all the parameters involved are always positive. However, as long as $\alpha$ is positive and e < 1/N, the system of equations will be stable for some finite value of i. Substituting in the value of $\beta$, and solving (11) for $\alpha$, we find that the minimum value of $\alpha$ is

$$\alpha_i > \frac{e\phi\,(Ne)^{i-1}}{\rho(1-e) + e\phi\,(1 + (Ne)^{i-1})} \qquad (13)$$

which is the same value (given in equation 10) derived using a less formal argument.
Stability Condition 2: However, if $C > \rho(1-e)N + (eN)^i \phi$ and i > 0, then the cooperative equilibrium is stable when:

$$\lambda_0 = -\alpha + (1-\alpha)(1-e)\,\beta\,(C - (1-e)N\rho) < 0 \qquad (12)$$

If we then solve this for the values of $\alpha$ that create a stable cooperative equilibrium, we find:

$$\alpha > \frac{\beta(1-e)(C - (1-e)N\rho)}{1 + \beta(1-e)(C - (1-e)N\rho)} \qquad (13)$$

Under Stability Condition 2, $\beta = 1/(C(1-e))$, so [3]

$$\alpha_i > \frac{1 - \dfrac{N\rho(1-e)}{C}}{2 - \dfrac{N\rho(1-e)}{C}} \qquad (14)$$

The term $N\rho(1-e)/C$ is always between zero and one, so the required $\alpha$ is always less than 1/2. This means that, even when the expected costs of being punished by everyone do not exceed the cost of cooperation (or the cost saved by defecting), the cooperative equilibrium can still be favored. Intuitively, this is the case in which conformist transmission and punishment combine to overcome the cost of cooperation. As with the previous condition, however, it's conformist transmission that stabilizes i-th stage punishment, which stabilizes 1st stage punishment.

[3] Actually, there's a tiny range of $(N\rho(1-e) + (eN)^i\phi) < C < (N\rho(1-e) + N\phi e)$ under which $\beta$ still equals $1/(N(1-e)(\rho(1-e) + e\phi))$. Nothing particularly interesting happens in this range, so we will not discuss it. Note, if i = 1, the range is non-existent.

At first, stability condition 2 may seem strange, but the world is seemingly full of cases in which the costs of being punished seem insufficient to explain the observed degree of cooperation. Hence, this may illuminate such things as why Americans pay too much in taxes (i.e. more than they should assuming most people pay because they fear punishment; Skinner & Slemrod, 1985), why Americans wait in line, why the Ache share meat (Hill & Kaplan, 1985), and why people bother going to the voting booth (Mueller, 1989), all of which seem overly cooperative, given the expected penalty. As we'll show, this may be important from a cultural group selection perspective because groups that minimize the costs of punishing and being punished ($\rho$ and $\phi$), while still maintaining cooperation, will do better than those that rely heavily on punishment to maintain cooperation.
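The threshold under Stability Condition 2 (equation 14) depends only on the ratio of the total expected punishment to the cost of cooperating. The following sketch evaluates it; the function name and the example values are illustrative assumptions, and `rho` denotes the cost of being punished.

```python
def alpha_condition2(C, N, rho, e):
    """Minimum alpha under Stability Condition 2, equation (14)."""
    y = N * rho * (1 - e) / C    # expected punishment relative to the cost of cooperating
    if not 0 < y < 1:
        raise ValueError("equation (14) assumes 0 < N*rho*(1-e)/C < 1")
    return (1 - y) / (2 - y)


if __name__ == "__main__":
    # Hypothetical values: punishment by everyone covers 80% of the cost C.
    print(alpha_condition2(C=10.0, N=4, rho=2.0, e=0.0))
    # When punishment covers only 10% of C, the threshold approaches one half.
    print(alpha_condition2(C=80.0, N=4, rho=2.0, e=0.0))
```

The closer the total punishment comes to covering the cost of cooperation, the less conformism is needed; as punishment shrinks toward zero, the requirement rises toward the no-punishment value of one half.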

Once cooperation is stabilized, it can spread by cultural group selection

By itself, the present model does not provide an explanation for human cooperation. We have shown that, under plausible conditions, a relatively weak conformist tendency can stabilize punishment, and therefore cooperation. However, non-cooperation and non-punishment is also an equilibrium of the model, and we have given no reason, so far, why most populations should stabilize at the cooperative equilibrium rather than the non-cooperative equilibrium. However, when there are multiple stable cultural equilibria with different average payoffs, cultural group selection can lead to the spread of the higher payoff equilibrium.

As we've demonstrated above, cultural evolutionary processes will cause groups to exist at different behavioral equilibria. This means that different groups have different expected payoffs (due to different degrees of economic production, for example). The expected payoff of individuals from cooperative groups is $\bar{b} = (1-e)(B - C) - eN(\phi + \rho(1 + i))$, while the expected payoff of individuals in noncooperative, nonpunishing groups is zero. Thus, cooperative groups will have a higher average payoff as long as the benefits of cooperation are bigger than the costs of cooperation and punishment. The combination of conformism and payoff-biased transmission must also be strong enough to maintain stable cooperation in the face of migration between groups. Such persistent differences between groups create the raw materials required by cultural group selection.

Cultural group selection can operate in a number of ways to spread prosocial behaviors. Cooperative groups will have higher total production, and consequently, more resources that can support more rapid population growth relative to non-cooperative groups. Or, cooperative groups may be better able to marshal and supply larger armies than non-cooperative groups, and hence be more successful in warfare and conquest. However, although these factors may be important (see Bowles 2000), another, slightly subtler, cultural group selection process may also be significant. Pay-off-biased imitation means people will preferentially copy individuals who get higher payoffs. The higher an individual's payoff, the more likely that individual is to be imitated. If individuals have occasion to imitate people in neighboring groups, people in cooperative populations will be preferentially imitated by individuals in noncooperative populations because the average payoff to individuals from cooperative populations is much higher than the average payoff of individuals in non-cooperative populations. Boyd & Richerson (2000) have shown that, under a wide range of conditions (and fairly quickly), this form of cultural group selection will deterministically spread group-beneficial behaviors from a single group (at a group-beneficial equilibrium) through a meta-population of other groups, which were previously stuck at a more individualistic equilibrium.

Culturally-evolved cooperation may cause genes for prosocial behavior to proliferate

Once the cooperative equilibrium becomes common, it's plausible that natural selection acting on genetic variation will favor genes that cause people to cooperate and punish, because such genes decrease an individual's chance of suffering costly punishment. This could arise in many ways. Individuals might develop a preference for cooperative or punishing behaviors that increases their likelihood of acquiring such behaviors. Or, alternatively, natural selection might increase the reliance on conformist transmission, making people more likely to acquire the most frequent behavior. Here, we analyze the case in which the probability of mistakenly defecting or not punishing, e, varies genetically. We assume that cultural evolution is much faster than genetic evolution, which implies that the population exists at a culturally evolved cooperative equilibrium. Further assume that while most individuals still make errors at the rate e, rare mutant individuals have a slightly different error probability of $e' = e - \varepsilon$, where $\varepsilon$ is small ($|\varepsilon| \ll e$). If we assume that an individual's average payoff, b, is proportional to her average genetic fitness, then we can ask whether prosocial mutants will spread.
The expected fitnesses for the two types, $F$ and $F_m$ ('m' for mutant), and the difference between them, $\Delta F$, are as follows (assuming $i > 0$; see footnote 4):

\[
\begin{aligned}
F &= (1-e)(B-C) - eN\bigl(\phi + \rho(1-e)(i+1)\bigr) \\
F_m &= B(1-e) - C(1-e') - N\bigl(e\phi + e'\rho(1-e)(i+1)\bigr) \\
\Delta F &= F_m - F = \varepsilon\bigl(N\rho(1-e)(i+1) - C\bigr)
\end{aligned}
\tag{20}
\]

where $\varepsilon = e - e'$.

Footnote 4: If conformist transmission alone can stabilize cooperation without any punishment ($i = 0$), then $\Delta F < 0$, and prosocial genes will never spread.
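The fitness comparison in equation (20) can be checked numerically. The following is a minimal Python sketch, not part of the original analysis: the function names, the argument names `phi`, `rho` and `eps` (standing for the two punishment-cost symbols and the small change in error rate), and all parameter values are hypothetical choices made for illustration. It evaluates both fitness expressions directly and compares their difference with the closed form, which is positive exactly when C is below N * rho * (1 - e) * (i + 1).

```python
def fitness_common(B, C, N, e, phi, rho, i):
    """Expected fitness F of the common type, which errs at rate e."""
    return (1 - e) * (B - C) - e * N * (phi + rho * (1 - e) * (i + 1))


def fitness_mutant(B, C, N, e, e_mut, phi, rho, i):
    """Expected fitness F_m of a rare mutant that errs at rate e_mut."""
    return (B * (1 - e) - C * (1 - e_mut)
            - N * (e * phi + e_mut * rho * (1 - e) * (i + 1)))


def delta_f_closed_form(C, N, e, eps, rho, i):
    """Closed form of F_m - F when the mutant error rate is e - eps."""
    return eps * (N * rho * (1 - e) * (i + 1) - C)


# Hypothetical parameter values, chosen only for illustration.
params = dict(B=10.0, C=1.0, N=10, e=0.05, phi=0.2, rho=0.5, i=1)
eps = 0.01

F = fitness_common(**params)
F_m = fitness_mutant(e_mut=params["e"] - eps, **params)
closed = delta_f_closed_form(params["C"], params["N"], params["e"],
                             eps, params["rho"], params["i"])

# The direct difference and the closed form should agree up to rounding.
print("F_m - F     =", F_m - F)
print("closed form =", closed)
```

With these illustrative numbers the mutant with the lower error rate has the higher fitness; raising `C` above N * rho * (1 - e) * (i + 1) reverses the sign.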

When $\Delta F$ is positive, prosocial genes can invade. If $C < (1-e)N\rho + (eN)^i\phi$ (Stability Condition 1), then $C$ is always less than $N\rho(1-e)(i+1)$, and prosocial genes are always favored. Once at fixation, these prosocial genes cannot be invaded by more error-prone, antisocial individuals. Under Stability Condition 2, where $C > (1-e)N\rho + (eN)^i\phi$, prosocial genes are favored (for $i > 0$) when:

\[
1 + \frac{(Ne)^i\phi}{N\rho(1-e)} < \frac{C}{N\rho(1-e)} < i + 1
\tag{21}
\]

which is a wide range, since the smallest possible value of $i$ is 1. However, there exists a range of conditions in which culturally-evolved cooperation is stable but prosocial genes cannot invade; in fact, anti-social genes (genes favoring more mistakes) may invade. This occurs when (for $i > 0$):

\[
(i+1) < \frac{C}{N\rho(1-e)} < \frac{1-\alpha}{1-2\alpha}
\tag{22}
\]

The left-hand inequality is the "no prosocial" part of the condition, and the right-hand inequality is the "stability" part. When condition (22) holds, cultural transmission will stabilize cooperation, but prosocial genes will not be able to invade; instead, anti-social genes will be favored (i.e. $\varepsilon$ is negative). Note, however, that this condition can only be satisfied if $\alpha > 0.333$, the minimum value of $\alpha$, which occurs when $i = 1$. Generally, we believe $\alpha$ is much smaller than this, but we'll await the verdict of future empirical work.

Interestingly, this antisocial invasion is likely to occur in the groups most favored by cultural group selection, i.e. those that maximize group payoff by minimizing punishment costs (and $i$) without destabilizing cooperation. Unfortunately, anti-social invasion will decrease average payoffs, and may eventually destabilize cooperation. Further work on this gene-culture interaction will require coevolutionary models that combine both cultural and genetic evolutionary processes, and particularly the cultural group selection process we have described above.
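The threshold on the strength of conformist transmission can be made explicit. Reading condition (22) as (i + 1) < C / (N * rho * (1 - e)) < (1 - alpha) / (1 - 2 * alpha), the window is non-empty only if (i + 1)(1 - 2 * alpha) < 1 - alpha, which rearranges to alpha > i / (2i + 1). The Python sketch below encodes that rearrangement; the function names are hypothetical, and the restriction to 0 <= alpha < 0.5 is an assumption made here so that the upper bound stays positive and finite.

```python
def min_alpha_for_antisocial_window(i):
    """Smallest alpha above which condition (22) can hold for a given i >= 1.

    Obtained by rearranging (i + 1) < (1 - alpha) / (1 - 2 * alpha).
    """
    if i < 1:
        raise ValueError("condition (22) is stated for i > 0")
    return i / (2 * i + 1)


def antisocial_window(i, alpha):
    """Bounds on C / (N * rho * (1 - e)) under condition (22).

    Returns (lower, upper), or None when no value can satisfy both bounds.
    Assumes 0 <= alpha < 0.5 (an assumption of this sketch).
    """
    if not 0 <= alpha < 0.5:
        raise ValueError("this sketch assumes 0 <= alpha < 0.5")
    lower = i + 1
    upper = (1 - alpha) / (1 - 2 * alpha)
    return (lower, upper) if upper > lower else None


# With i = 1 the threshold is one third, matching the value quoted in the text.
print(min_alpha_for_antisocial_window(1))
print(antisocial_window(1, 0.30))  # below the threshold: empty window
print(antisocial_window(1, 0.40))  # above the threshold: a finite window
```

Because i / (2i + 1) increases with i, the threshold is lowest at i = 1, which is why that case gives the minimum value of alpha.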
As we've begun to model it here, prosocial genes are not strongly selected against in non-cooperative populations, because error making, in terms of mistaken cooperation and punishment, only occurs when individuals adopt prosocial traits; defectors don't mistakenly cooperate. So, if the world is a mix of cooperative and non-cooperative populations, prosocial genes will be favored in a wide range of circumstances in cooperative populations and will be comparatively neutral in non-cooperative populations. It's possible that incorporating defector errors, in the form of mistaken cooperation or punishment, may affect this prediction.

Furthermore, cooperation may not be a dispositional trait of individuals, but rather a specific behavior or value tied only to certain cultural domains. Some cultural groups, for example, may cooperate in fishing and house-building, but not warfare. Other groups may cooperate in warfare and fishing, but not house-building. Such culturally-transmitted traits would have the form 'cooperate in fishing,' 'cooperate in house-building,' and 'don't cooperate in warfare,' rather than the more dispositional form of simply 'cooperate' vs. 'don't cooperate.' If this is the case, then the migration and spread of prosocial genes becomes more difficult. As prosocial genes spread among groups with different stable cooperative domains, individuals with such genes would be more likely to mistakenly cooperate in non-cooperative cultural domains. For example, in cultures where people cooperate in fishing, but not warfare, individuals with prosocial genes may be more likely to mistakenly cooperate in warfare (and pay the cost), as well as less likely to mistakenly defect in cooperative fishing. We intend to pursue these avenues in subsequent papers.

Conclusion

We have done three things in this paper. First, we've shown that, if humans possess a psychological bias towards copying the majority, as well as a bias towards imitating the successful, then cultural evolutionary processes will stabilize cooperation and punishment for some finite number of punishment stages. Second, we discussed how, once cooperation is stable, a particular form of cultural group selection is likely to spread these group-beneficial cultural traits through human populations. And finally, we've demonstrated that prosocial genes, which cannot otherwise spread, can invade in the wake of these cultural evolutionary processes, under a wide range of conditions.

Appendix A

For all $i$:

\[
\Delta p_i = p_i(1-p_i)\bigl[(1-\alpha)\beta\,\Delta b_i + \alpha(2p_i - 1)\bigr]
\]

Difference in payoffs for $i = 0$:

\[
\Delta b_0 = b_C - b_D = (1-e)\bigl(Np_1(1-e)\rho - C\bigr)
\]

Difference in payoffs for $i > 0$:

\[
\Delta b_i = b_{P_i} - b_{NP_i} = -(1-e)N\Bigl(\phi\bigl(1 - p_{i-1}(1-e)\bigr)\prod_{j=0}^{i-2}\bigl(1 - p_j^N(1-e)^N\bigr) - p_{i+1}(1-e)\rho\Bigr)
\]

where

\[
(1-e)^N = 1 + \sum_{j=1}^{N}\frac{(-1)^j N!\,e^j}{j!\,(N-j)!} \approx 1 - Ne
\]

Thus

\[
\Delta b_i = b_{P_i} - b_{NP_i} \approx -(1-e)N\Bigl(\phi\bigl(1 - p_{i-1}(1-e)\bigr)\prod_{j=0}^{i-2}\bigl(1 - p_j^N(1-Ne)\bigr) - p_{i+1}(1-e)\rho\Bigr)
\]

Eigenvalues for the system of $i + 1$ equations with punishment up to the $i$-th stage:

\[
\begin{aligned}
\lambda_0 &= -\alpha + (1-\alpha)(1-e)\beta\bigl(C - (1-e)N\rho\bigr) \\
\lambda_j &= -\alpha + (1-\alpha)(1-e)\beta\bigl((eN)^j\phi - \rho N(1-e)\bigr), \qquad 0 < j < i \\
\lambda_i &= -\alpha + (1-\alpha)(1-e)\beta\,\phi\,(eN)^i
\end{aligned}
\]

When the dominant eigenvalue (that with the largest value) is less than zero, the system is locally stable at the point $(p_0, p_1, \ldots, p_{i+1}) = (1, 1, \ldots, 0)$.
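The local stability test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it reads the eigenvalue expressions of Appendix A as lambda_0 = -alpha + (1 - alpha)(1 - e) beta (C - (1 - e) N rho), lambda_j = -alpha + (1 - alpha)(1 - e) beta ((eN)^j phi - rho N (1 - e)) for 0 < j < i, and lambda_i = -alpha + (1 - alpha)(1 - e) beta phi (eN)^i, where `alpha` is the weight on conformist transmission and `beta` is the constant scaling payoff differences in the recursion. The function names and all numerical values are hypothetical.

```python
def eigenvalues(alpha, beta, e, N, C, phi, rho, i):
    """Eigenvalues lambda_0 .. lambda_i at the cooperative equilibrium.

    Assumes punishment up to stage i >= 1, following the expressions
    quoted in the lead-in paragraph.
    """
    if i < 1:
        raise ValueError("this sketch assumes at least one punishment stage")
    scale = (1 - alpha) * (1 - e) * beta
    lam_0 = -alpha + scale * (C - (1 - e) * N * rho)
    lam_mid = [-alpha + scale * ((e * N) ** j * phi - rho * N * (1 - e))
               for j in range(1, i)]
    lam_i = -alpha + scale * phi * (e * N) ** i
    return [lam_0] + lam_mid + [lam_i]


def is_locally_stable(**kwargs):
    """True when the dominant (largest) eigenvalue is below zero."""
    return max(eigenvalues(**kwargs)) < 0


# Hypothetical values in which cooperation is costly relative to punishment.
base = dict(beta=0.05, e=0.05, N=10, C=6.0, phi=0.2, rho=0.5, i=2)

print(eigenvalues(alpha=0.1, **base))
print(is_locally_stable(alpha=0.0, **base))  # payoff-biased transmission only
print(is_locally_stable(alpha=0.1, **base))  # with some conformist transmission
```

In this example the equilibrium is unstable with no conformist transmission and stable once `alpha` is raised, because every eigenvalue carries the term -alpha while the payoff terms shrink by the factor (1 - alpha).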

Reference List

1. Asch, S. E. (1951). Effects of group pressure upon the modification and distortion of judgments. In: Groups, Leadership and Men (Guetzkow, H., ed.), pp. 39-76. Pittsburgh: Carnegie.
2. Baron, R., Vandello, J., & Brunsman, B. (1996). The forgotten variable in conformity research: impact of task importance on social influence. Journal of Personality & Social Psychology 71, 915-927.
3. Bowles, S. (in press). Individual interactions, group conflicts and the evolution of preferences. In: Social Dynamics (Durlauf, S. & Young, P., eds.). Washington, D.C.: Brookings Institution.
4. Boyd, R. & Richerson, P. J. (1988). The evolution of reciprocity in sizable groups. Journal of Theoretical Biology 132, 337-56.
5. Boyd, R. & Richerson, P. J. (in press). Norms and Bounded Rationality. In: The Adaptive Tool Box (Gigerenzer, G. & Selten, R., eds.). Cambridge, MA: MIT Press.
6. Boyd, R. & Richerson, P. J. (1985). Culture and the Evolutionary Process. Chicago, IL: University of Chicago Press.
7. Boyd, R. & Richerson, P. J. (1992). Punishment allows the evolution of cooperation (or anything else) in sizable groups. Ethology and Sociobiology 13, 171-195.
8. Brown, J. L. (1983). Cooperation: a biologist's dilemma. Advances in the Study of Behavior 13, 1-37.
9. Campbell, J. D. & Fairey, P. J. (1989). Informational and normative routes to conformity: the effect of faction size as a function of norm extremity and attention to the stimulus. Journal of Personality & Social Psychology 57, 457-468.
10. Fudenberg, D. & Maskin, E. (1986). The folk theorem in repeated games with discounting or with incomplete information. Econometrica 54, 533-554.
11. Gil-White, F. & Henrich, J. (2000). The Evolution of Prestige. Working Paper at the University of Michigan: webuser.bus.umich.edu/henrich.
12. Hamilton, W. D. (1964). The genetical evolution of social behavior. Journal of Theoretical Biology 7, 1-52.
13. Harris, J. R. (1998). The Nurture Assumption: Why children turn out the way they do. New York: Touchstone.
14. Henrich, J. & Boyd, R. (1998). The evolution of conformist transmission and the emergence of between-group differences. Evolution and Human Behavior 19, 215-42.
15. Henrich, J. (1999). Cultural Transmission and the Diffusion of Innovations: Adoption dynamics indicate that biased cultural transmission is the predominate force in behavioral change and much of sociocultural evolution. Working Paper at the University of Michigan: webuser.bus.umich.edu/henrich.
16. Hirshleifer, D. & Rasmusen, E. (1989). Cooperation in the repeated prisoner's dilemma with ostracism. Journal of Economic Behavior and Organization 12, 87-106.
17. Insko, C. A., Smith, R. H., Alicke, M. D., Wade, J., & Taylor, S. (1985). Conformity and group size: the concern with being right and the concern with being liked. Personality & Social Psychology Bulletin 11, 41-50.
18. Kaplan, H. & Hill, K. (1985). Current Anthropology 26, 223-245.
19. McAdams, R. H. (in press). The origin, development, and regulation of norms. Michigan Law Review 96, 338.
20. Mueller, D. (1989). Public Choice II. Cambridge: Cambridge University Press.
21. Richerson, P. J. & Boyd, R. (1998). The Evolution of Ultrasociality. In: Indoctrinability, Ideology and Warfare (Eibl-Eibesfeldt, I. & Salter, F. K., eds.), pp. 71-96. New York: Berghahn Books.
22. Seeley, T. D. (1995). The Wisdom of the Hive. Cambridge: Harvard University Press.
23. Skinner, J. & Slemrod, J. (1985). An Economic Perspective on Tax Evasion. National Tax Journal 38, 345-53.
24. Smith, J. M. & Bell, P. A. (1994). Conformity as a determinant of behavior in a resource dilemma. Journal of Social Psychology 134, 191-200.
25. Sober, E. & Wilson, D. S. (1998). Unto Others: The Evolution and Psychology of Unselfish Behavior. Cambridge, MA: Harvard University Press.
26. Takahasi, K. (1999). Theoretical aspects of the mode of transmission in cultural inheritance. Theoretical Population Biology 55, 208-25.
27. Trivers, R. L. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology 46, 35-57.
28. Wit, J. (1999). Social Learning in a Common Interest Voting Game. Games and Economic Behavior 26, 131-156.

Table 1: Dichotomous traits for cooperation and punishment

| Stage | Freq. of P-variant | P-variant | NP-variant |
|-------|--------------------|-----------|------------|
| 0 | p_0 | cooperate | defect |
| 1 | p_1 | punish defectors | don't punish defectors |
| 2 | p_2 | punish non-punishers at stage 1 | don't punish non-punishers at stage 1 |
| 3 | p_3 | punish non-punishers at stage 2 | don't punish non-punishers at stage 2 |
| i | p_i | punish non-punishers at stage i-1 | don't punish non-punishers at stage i-1 |