Approximating Markov Perfect Equilibria for Nonstationary Dynamic Games

Alfredo Garcia and Robert L. Smith
Department of Industrial and Operations Engineering
University of Michigan, Ann Arbor, MI 48109

Technical Report 96-16
November 1996

1 Introduction

A sequential characterization (see Klein et al. [1984] for the Kuratowski definition of set convergence) of infinite horizon subgame perfect Nash equilibria as limits of finite horizon subgame perfect Nash equilibria is not possible due to "end of horizon" effects (see, for instance, Fudenberg and Levine [1983] and [1986]). This can be illustrated with the celebrated "prisoner's dilemma" game, where "always defect" is the unique equilibrium outcome of any finite horizon repetition of the game. However, when the game is repeated ad infinitum, "always cooperate" is the equilibrium outcome of a certain subgame perfect Nash equilibrium for a certain range of discount rates. This situation is due to the fact that, for a fixed finite horizon, "defection" in the last time period dominates "cooperation"; hence "always cooperate" cannot be an equilibrium outcome for such a fixed horizon.

The first work attempting to overcome this phenomenon was carried out by Fudenberg and Levine [1983]. They build on the work by Radner [1980], who pointed out that if players are boundedly rational, or equivalently, if one relaxes the definition of equilibria up to an epsilon, making epsilon smaller as the horizon diverges, then the set of epsilon-equilibria of finite horizons converges to the set of infinite horizon equilibria. Fudenberg and Levine's work focused on the space of strategies, on which they define a suitable metric topology that facilitates the result. Pearce [1990] explains the relationship of Radner's and Fudenberg and Levine's work to the self-generation approach used by Abreu, Pearce and Stacchetti [1986] and [1990] for repeated games. They exploit the structure of repeated games with imperfect monitoring to obtain a

dynamic programming approach to infinitely repeated games, which yields a set-valued "value-iteration like" algorithm to compute the set of normalized equilibrium payoffs. This theory can be easily extended to stationary discounted dynamic games. A second line of work is provided in Harris [1985a] and [1985b]. In those papers he studies the various topologies, other than the one introduced by Fudenberg and Levine, that may be used to obtain the set convergence result. Börgers [1989] took a different approach, focusing on the space of feasible histories (i.e., infinite sequences of outcomes). Assuming compactness of that space, he shows that the limit of a converging sequence of finite horizon equilibrium histories is an equilibrium history of the infinite horizon game.

Our objective in this paper is to provide a new method to overcome "end of horizon" effects in the context of non-stationary infinite horizon dynamic games. Our inspiration comes from the work of Schochetman and Smith [1989] and [1992], which leads to the computational procedures described in [1991] and [1992]. The idea here is to restrict the deviation possibilities for players by forcing an ending target state. Interestingly enough, Ostrom, Gardner and Walker [1995] report the existence of a similar mechanism in the exploitation of common pool resources. In that context, players agree to restrict the catch size or net size so that, implicitly, a reasonable level of stock is maintained. On the other hand, recent work has pointed out the difficulties in extending to the dynamic game setting the sustainability of first best outcomes as equilibrium play (see Dutta [1995]). Only under very restrictive assumptions (see Tolwinski [1982] and Gaitsgory and Nitzan [1994]) have efficient outcomes been shown to be sustained by Markov Perfect Equilibrium. In addition, Engwerda [1996] has pointed out the various difficulties in approximating infinite horizon open-loop Nash equilibria in linear quadratic games.

Motivated by all these features, we obtain a sequential characterization of Markov Perfect infinite horizon equilibrium histories under some hypotheses. A tie-breaking rule is then proposed, as in Schochetman and Smith [1991] and [1992], to ensure convergence of a particular selection from the set of finite horizon relaxed solutions to a Markov Perfect infinite horizon equilibrium history. Although the game structure assumed is more restrictive than that assumed by Fudenberg and Levine [1983] and Börgers [1989], the implementation of the computational procedure is simpler, and it extends to the undiscounted case, which constitutes a very significant advance.

The paper is structured as follows. We first introduce the class of nonstationary dynamic games that we study; then the new solution concept is presented, together with the results. Finally, an example for which the theory holds is presented. Large scale tests and computational implementation

are currently being undertaken.

2 Nonstationary Dynamic Games.

For ease of exposition we present the case of two players, bearing in mind that the extension to the multi-player case is straightforward. A two-person, discrete time, finite horizon dynamic game with state variable consists of:

* An index set $\mathcal{K} = \{0, 1, 2, 3, \ldots, T\}$ denoting the stages of the game, where $T$ is the maximum number of stages to be played, also known as the "planning" horizon.

* An index set $\mathcal{I} = \{1, 2\}$ called the players set.

* A collection of sets $\{U^i_k : k \in \mathcal{K}, i \in \mathcal{I}\}$, with $U^i_k \subset \mathbb{R}^p$ for some $p$. We refer to $U^i_k$ as the "set of available controls at period $k$ for player $i$".

* A collection of sets $\{S_k : k \in \mathcal{K}\}$, $S_k \subset \mathbb{R}^q$ for some $q$, called the "sets of attainable states at period $k$". The cartesian product $\prod_{k=0}^{T} S_k$ contains the set of $T$-long feasible sequences of states.

* A collection of functions $\{f_k : k \in \mathcal{K}\}$ of the form $f_k : S_k \times U^1_k \times U^2_k \to S_{k+1}$, which models the state dynamics $s_{k+1} = f_k(s_k, u^1_k, u^2_k)$ for every $k \in \mathcal{K}$, where $s_0$ is given a priori.

* The set
$$H(T) = \prod_{k=0}^{T} (U^1_k \times U^2_k)$$
In words, it is the set of $T$-long feasible sequences of controls that players exert. For every $h^T \in H(T)$ one can recursively construct an element of $\prod_{k=0}^{T} S_k$ with the state dynamics presented above.

* A collection of functions $\{r^i_k : k \in \mathcal{K}, i \in \mathcal{I}\}$ of the form $r^i_k : S_k \times U^1_k \times U^2_k \to \mathbb{R}$, called the "$k$-stage reward function for player $i$".

* Discounted rewards: the total discounted reward for player $i \in \mathcal{I}$ is
$$P^i_T(h^T(s_0)) = \sum_{k=0}^{T} \lambda_i^k\, r^i_k(s_k, u^1_k, u^2_k)$$
where $\lambda_i \in [0, 1)$ is player $i$'s discount factor.

* A closed loop perfect information structure, i.e., at each time period players observe the "state" variable.
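These primitives lend themselves to a compact computational representation. The following is a minimal sketch, assuming finite control and state sets; the container name `NonstationaryGame` and its field names are ours, not the paper's, and later sketches in this paper reuse it.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = int                       # an encoding of a state s_k in S_k
Action = int                      # an encoding of a control u_k^i in U_k^i
Profile = Tuple[Action, Action]   # a control profile (u_k^1, u_k^2)

@dataclass
class NonstationaryGame:
    """Primitives of the two-player, finite-horizon dynamic game (a sketch)."""
    T: int                                               # planning horizon
    s0: State                                            # initial state, given a priori
    controls: Callable[[int, int], Sequence[Action]]     # (k, i) -> U_k^i
    states: Callable[[int], Sequence[State]]             # k -> S_k, the attainable states
    dynamics: Callable[[int, State, Profile], State]     # (k, s_k, u_k) -> f_k(s_k, u_k^1, u_k^2)
    reward: Callable[[int, int, State, Profile], float]  # (k, i, s_k, u_k) -> r_k^i(s_k, u_k^1, u_k^2)
```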

2.1 Strategies and Markov Perfect Equilibria (MPE).

We now introduce the concept of Nash equilibria in strategies that employ the available information, for dynamic games with fixed "planning" horizon $T$.

* A Closed-Loop Strategy for player $i \in \mathcal{I}$, say $\pi^{T,i}$, is a $T$-tuple of continuous maps $\pi^i_k : S_k \to U^i_k$, so that $\pi^{T,i}$ is of the form $\pi^{T,i} = (\pi^i_0, \pi^i_1, \ldots, \pi^i_T)$. We denote by $\Pi^i(T)$ the set of all such strategies for player $i \in \mathcal{I}$. We refer to the 2-tuple $\pi^T = (\pi^{T,1}, \pi^{T,2}) \in \Pi^1(T) \times \Pi^2(T)$ as a Closed-Loop Strategy combination and denote by $\Pi(T)$ the set of all strategy combinations. Such strategy combinations induce $T$-long feasible sequences of controls as follows:

1. In the first period players play $(\pi^1_0(s_0), \pi^2_0(s_0)) \in U^1_0 \times U^2_0$, so that at the end of that period the state attained is given by $s_1 = f_0(s_0, \pi^1_0(s_0), \pi^2_0(s_0))$.

2. Then they observe $s_1 \in S_1$ and play $(\pi^1_1(s_1), \pi^2_1(s_1)) \in U^1_1 \times U^2_1$, so that the state to be attained is $s_2 = f_1(s_1, \pi^1_1(s_1), \pi^2_1(s_1))$.

The process then unfolds recursively.
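The recursion just described is mechanical; here is a sketch of it, on top of the hypothetical `NonstationaryGame` container above. We represent each player's strategy in curried form, as a map from (period, state) to action.

```python
def rollout(game, strategy, k=0, s=None):
    """Generate the history induced by a closed-loop strategy combination
    from state s at period k, together with the visited states.
    `strategy[i]` maps (period, state) -> action, i.e. the maps pi_t^i."""
    s = game.s0 if s is None else s
    history, states = [], [s]
    for t in range(k, game.T + 1):
        u = (strategy[0](t, s), strategy[1](t, s))  # (pi_t^1(s_t), pi_t^2(s_t))
        history.append(u)
        if t < game.T:
            s = game.dynamics(t, s, u)              # s_{t+1} = f_t(s_t, u_t^1, u_t^2)
            states.append(s)
    return history, states

def discounted_payoff(game, i, history, states, lam, k=0):
    """P_T^i = sum over periods t of lam^t * r_t^i(s_t, u_t^1, u_t^2)."""
    return sum(lam ** t * game.reward(t, i, states[t - k], history[t - k])
               for t in range(k, k + len(history)))
```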

We denote by $h^{\pi^T}(s_0) \in H(T)$ the control sequence recursively obtained as above (sometimes called the "history" induced by strategy combination $\pi^T$). Note that from any state $s_k \in S_k$ with $k \neq 0$ and $k \in \mathcal{K}$, the strategy combination completely specifies the play that follows after state $s_k$, recursively:

1. In period $k$ players play $(\pi^1_k(s_k), \pi^2_k(s_k)) \in U^1_k \times U^2_k$, so that at the end of that period the state attained is given by $s_{k+1} = f_k(s_k, \pi^1_k(s_k), \pi^2_k(s_k))$.

2. Then they observe $s_{k+1} \in S_{k+1}$ and play $(\pi^1_{k+1}(s_{k+1}), \pi^2_{k+1}(s_{k+1})) \in U^1_{k+1} \times U^2_{k+1}$, so that the state to be attained is $s_{k+2} = f_{k+1}(s_{k+1}, \pi^1_{k+1}(s_{k+1}), \pi^2_{k+1}(s_{k+1}))$.

The process then unfolds recursively. We denote by $h^{\pi^T}(s_k) \in \prod_{t=k}^{T} (U^1_t \times U^2_t)$ the control sequence recursively obtained as above (sometimes called the "history" induced by strategy combination $\pi^T$ after state $s_k$).

Remark 1 We can conclude that a strategy combination can be alternatively described by specifying, for every attainable state $s_k \in S_k$, the play to follow.

We are now ready to introduce the solution concepts with which we shall be dealing.

Definition 1 We say that $\pi^T$ is a Nash Equilibrium in Closed-Loop strategies if every player $i \in \mathcal{I}$ who would like to deviate from $\pi^T$ by playing $\gamma^{T,i} \in \Pi^i(T)$ would find no incentive in doing so, i.e.:
$$P^i_T(h^{(\gamma^{T,i}, \pi^{T,-i})}(s_0)) \leq P^i_T(h^{\pi^T}(s_0))$$
where $(\gamma^{T,i}, \pi^{T,-i}) \in \Pi(T)$ stands for the strategy combination in which all players $j \in \mathcal{I}$ with $j \neq i$ follow $\pi^T$ and player $i \in \mathcal{I}$ follows $\gamma^{T,i}$.

The above solution concept is known not to be time consistent, in the sense that the play prescribed after some state other than the initial state may not itself constitute an equilibrium for the game that starts at that state. Hence, play off the equilibrium history is not "credible"; that is, implicit "threats" of such off-equilibrium play will not be taken seriously by opponents. To rule out such "non-credible threats" by deviators, a refinement of the previous solution concept is to require that the play to follow after any other state $s_k \in S_k$ with $0 < k \leq T$ must prescribe a Nash equilibrium, in the sense defined above, for the game that starts at $s_k$, most commonly known as a subgame (see Fudenberg and Tirole [1993]).
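Verifying Definition 1 amounts to checking the displayed inequality against candidate deviations; Definition 2 below strengthens this to every subgame. A sketch of the single-state check, reusing the rollout helpers above (the tolerance is ours, to absorb floating point noise):

```python
def deviation_unprofitable(game, pi, i, gamma_i, lam, k=0, s=None):
    """Check P_T^i(h^{(gamma^i, pi^{-i})}(s_k)) <= P_T^i(h^{pi^T}(s_k)):
    player i switches to gamma_i while the opponent keeps following pi."""
    deviated = (gamma_i, pi[1]) if i == 0 else (pi[0], gamma_i)
    h_dev, s_dev = rollout(game, deviated, k, s)
    h_eq, s_eq = rollout(game, pi, k, s)
    return (discounted_payoff(game, i, h_dev, s_dev, lam, k)
            <= discounted_payoff(game, i, h_eq, s_eq, lam, k) + 1e-9)
```

A Markov Perfect check runs this test from every period $k$ and every attainable state $s_k$, not only from $s_0$.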

Definition 2 We say that $\pi^T$ is a Markov Perfect Equilibrium (MPE) in Closed-Loop strategies if every player $i \in \mathcal{I}$ who would like to deviate from $\pi^T$ by playing $\gamma^{T,i} \in \Pi^i(T)$ from every $s_k \in S_k$ with $0 \leq k \leq T$ would find no incentive in doing so, i.e.:
$$P^i_T(h^{(\gamma^{T,i}, \pi^{T,-i})}(s_k)) \leq P^i_T(h^{\pi^T}(s_k))$$
where $(\gamma^{T,i}, \pi^{T,-i}) \in \Pi(T)$ stands for the strategy combination in which all players $j \in \mathcal{I}$ with $j \neq i$ follow $\pi^T$ and player $i \in \mathcal{I}$ follows $\gamma^{T,i}$. We denote by $\Pi^*(T)$ the set of all Markov Perfect Equilibrium strategies.

Remark 2 In the above definition we have slightly abused notation by writing $P^i_T(h^{(\gamma^{T,i}, \pi^{T,-i})}(s_k))$, since the play that follows after $s_k$, that is $h^{(\gamma^{T,i}, \pi^{T,-i})}(s_k)$, is only defined in $\prod_{t=k}^{T} (U^1_t \times U^2_t)$. However, by the additive and separable reward structure assumed, the sense of the inequalities in the above definition is to be understood by appending the "play" that reaches state $s_k$ to $h^{(\gamma^{T,i}, \pi^{T,-i})}(s_k)$. It is worth pointing out that this is not correct in a more general setting, for instance when no additive and separable reward structure is assumed.

2.2 Infinite Horizon Dynamic Games.

We now extend the above definitions and concepts to the case where $\mathcal{K} = \{0, 1, 2, \ldots\}$, i.e., there is an infinite number of stages to play.

* A Closed-Loop Strategy for player $i \in \mathcal{I}$, say $\pi^i$, is of the form $(\pi^i_0, \pi^i_1, \ldots)$, where $\pi^i_k : S_k \to U^i_k$. We denote by $\Pi^i$ the set of all such strategies for player $i$. We refer to the 2-tuple $\pi = (\pi^1, \pi^2)$ as a Closed-Loop Strategy combination, and denote by $\Pi$ the set of all closed loop strategy combinations. As explained above, such combinations induce an infinite sequence of controls of the form $(\pi_0(s_0), \pi_1(s_1), \ldots) \in \prod_{k=0}^{\infty} (U^1_k \times U^2_k) = H$. We denote by $h^{\pi}(s_0)$ such a sequence. Similarly, from any other state $s_k \in S_k$, the play that follows according to $\pi$ is denoted by $h^{\pi}(s_k)$. Moreover, the total aggregated reward received by player $i$ under strategy combination $\pi$ is defined by introducing the collection of maps $\rho^T : \Pi \to \Pi(T)$, which are the $T$-horizon truncations of infinite horizon strategy combinations:
$$P^i(h^{\pi}(s_0)) = \lim_{T \to \infty} P^i_T(h^{\rho^T \pi}(s_0))$$

where $\rho^T \pi$ stands for the $T$-period truncation of $\pi$. All the above definitions carry over in a straightforward manner.

2.3 Topologies on the Set $\Pi$.

Since our interest is to study convergence of finite horizon equilibrium strategies to infinite horizon equilibrium strategies, it is very important to carefully define relevant topologies on $\Pi$, and consequently the different notions of convergence they induce. In this section we closely follow Harris [1985b]. We adopt the convention that any finite horizon strategy combination is trivially extended through any feasible choice of continuation sequence of strategies, so that its extension is an element of $\Pi$.

We first concentrate on a topology for $H$. Given $h = (u_0, u_1, u_2, \ldots)$ and $h' = (u'_0, u'_1, u'_2, \ldots)$, we define the metric $D : H \times H \to \mathbb{R}_+$ by
$$D(h, h') = \sup_t \left[ \frac{\min\{d_t(u_t, u'_t), 1\}}{t} \right]$$
where $d_t : (U^1_t \times U^2_t) \times (U^1_t \times U^2_t) \to \mathbb{R}$ is the product metric on $U^1_t \times U^2_t$. This metric induces the product topology on $H$ (see Munkres [1975], p. 123).

Definition 3 $\mathcal{W}$ is the topology with basis consisting of the sets $\{\pi \in \Pi : D(h^{\pi}(s_t), h) < \varepsilon\}$, where $h \in H$ is some infinite horizon history; the basis is obtained as $\varepsilon$ varies, $h$ varies over $H$, and $s_t$ varies over the attainable states.

In words, the notion of convergence related to the topology $\mathcal{W}$ is simply that $\pi^T \to \pi$ with respect to $\mathcal{W}$ if and only if, for any given subgame (or state $s_t$), the sequence of histories induced by $\pi^T$, namely $\{h^{\pi^T}(s_t)\}_T$, converges to $h^{\pi}(s_t)$ in the product topology.
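Since each coordinate of $D$ is damped by $1/t$, the supremum can be approximated by truncation. A minimal sketch (we index the supremum from $t = 1$ so the weight is well defined, an assumption about the paper's indexing; `d` stands for the product metric $d_t$):

```python
def history_distance(h, h_prime, d, horizon=1000):
    """Approximate D(h, h') = sup_t min{d_t(u_t, u_t'), 1} / t by truncating
    the sup at a finite horizon; the neglected tail is bounded by 1/horizon."""
    return max(min(d(u, v), 1.0) / t
               for t, (u, v) in enumerate(zip(h[:horizon], h_prime[:horizon]), start=1))
```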

Definition 4 $\mathcal{L}$ is the topology with basis consisting of the sets $\{\pi \in \Pi : \rho^t \pi = \rho^t \bar{\pi}\}$, obtained as $\bar{\pi}$ varies over $\Pi$ and $t$ varies over all periods.

In words, $\pi^T \to \pi$ with respect to $\mathcal{L}$ if and only if for all subgames simultaneously the sequences of histories induced by $\pi^T$ converge in the discrete topology (they fully agree) to the histories induced by $\pi$. $\mathcal{L}$ is essentially a uniform version of $\mathcal{W}$, for convergence in $\mathcal{L}$ requires that for all $t$ there exists a $T(t)$ such that for all $T \geq T(t)$ we have $\rho^t \pi^T = \rho^t \pi$; that is, the strategy combinations from the first period up to period $t$ prescribed by $\pi^T$ and $\pi$ agree. Clearly, for practical purposes it may be easier to prove convergence with respect to $\mathcal{W}$, since convergence need only be verified for representative subgames. On the contrary, $\mathcal{L}$ imposes more restrictive conditions on an approximating sequence, so it may be more helpful in proving uniqueness.

We now present the metric on $\Pi$ that was used by Fudenberg and Levine [1983] and that for most applications induces the topology $\mathcal{L}$ on $\Pi$. To capture the notion of closeness most relevant to Markov Perfect Equilibrium, we expect two strategies to be close if, for every $t$ and from every feasible state $s_t$ (subgame), the histories induced by these strategies are close and the histories resulting when any one player deviates from them are also close. The metric that generates this topology is defined, for $\pi, \gamma \in \Pi$, as
$$\rho(\pi, \gamma) = \sup_{t,\, s_t \in S_t} \left\{ D(h^{\pi}(s_t), h^{\gamma}(s_t)) \;\vee\; \sup_{i \in \mathcal{I},\, \sigma^i \in \Pi^i} D\big(h^{(\sigma^i, \pi^{-i})}(s_t),\, h^{(\sigma^i, \gamma^{-i})}(s_t)\big) \right\}$$
Fudenberg and Levine [1983] observed that the topology induced by the metric $\rho$ coincides with $\mathcal{L}$ when the action sets for all players in all periods are finite.

3 Constrained Markov Perfect Equilibria.

In this section we introduce a new methodology to obtain a sequential characterization of infinite horizon perfect equilibria.

3.1 Preliminaries.

We denote by $\Pi^*$, $\Pi^*_s$ and $\Pi^*(T)$ the set of Infinite Horizon MPE strategy combinations, the set of strict Infinite Horizon MPE strategy combinations, and the set of $T$-period horizon MPE strategy combinations, respectively.

Definition 5 Let $s_T \in S_T$ be some feasible state at time period $T$. We denote by $\Pi(T, s_T)$ the set of closed-loop strategy combinations such that, for every $s_k \in S_k$ with $0 \leq k \leq T$ such that the state $s_T$ is reachable from $s_k$, the play to follow must reach $s_T$. In other words, the history prescribed,

i.e., $h^{\pi^T}(s_k)$, reaches state $s_T$ at time period $T$ whenever state $s_T$ is reachable from $s_k$, $0 \leq k \leq T$. Note that the play prescribed by any $\pi^T \in \Pi(T, s_T)$ from some state $s_k \in S_k$ from which $s_T$ is not reachable is completely irrelevant to the definition.

Let us now briefly discuss the motivation for the solution concept relaxation that we will introduce shortly. The main difficulty for a sequential characterization of infinite horizon equilibria as limits of finite horizon equilibria is due to "end of horizon" effects. In words, for a fixed finite horizon, the final state attained by a finite horizon equilibrium will generally be different from the state attained by the truncation of the infinite horizon equilibrium. Myopic behavior close to the fixed finite horizon is the explanation for this. We will try to overcome this effect by forcing equilibrium strategies to attain a certain "target" state. However, this artificially imposed constraint introduces "strategic" alterations to the original game. Intuitively, some player may find it attractive to deviate in the early stages if he knows in advance that players will have to coordinate at the final stages in order to attain the target state. This is clearly a new artificial feature that, as we will see, poses difficulties in proving that any Infinite Horizon MPE is the limit of finite horizon Constrained Approximate MPE.

Definition 6 A strategy combination $\pi^T \in \Pi(T, s_T)$ is called a Constrained MPE to state $s_T$ if, for every deviation $\gamma^{T,i} \in \Pi^i(T)$ such that $(\gamma^{T,i}, \pi^{T,-i}) \in \Pi(T, s_T)$, from every $s_k \in S_k$ with $0 \leq k \leq T$ such that state $s_T$ is reachable from $s_k$, we have
$$P^i_T(h^{(\gamma^{T,i}, \pi^{T,-i})}(s_k)) \leq P^i_T(h^{\pi^T}(s_k))$$
We denote by $\Pi^*(T, s_T)$ the set of all Constrained MPE to state $s_T$.

We now introduce the last additional piece of notation needed for the analysis.

* Let
$$\mathcal{X}^*(T) = \bigcup_{s_T \in S_T} \Pi^*(T, s_T)$$
the set of all constrained MPE strategies to all attainable states for horizon $T$.
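For finite control and state sets, a constrained MPE candidate can be computed by backward induction over the states from which the target remains reachable, admitting only deviations that preserve reachability, in the spirit of Definition 6. The following is a minimal sketch of our own construction on top of the hypothetical container from Section 2; it searches for pure stage equilibria, whose existence is what the Non-Emptiness assumption of Section 6.1 will guarantee.

```python
from itertools import product

def reachable_sets(game, target, T):
    """R[k] = states at period k from which `target` is reachable at period T,
    computed backward from R[T] = {target}."""
    R = [set() for _ in range(T + 1)]
    R[T] = {target}
    for k in range(T - 1, -1, -1):
        for s in game.states(k):
            for u in product(game.controls(k, 0), game.controls(k, 1)):
                if game.dynamics(k, s, u) in R[k + 1]:
                    R[k].add(s)
                    break
    return R

def constrained_mpe(game, target, T, lam=(1.0, 1.0)):
    """Backward-induction sketch of a constrained MPE to `target`: at each
    (k, s) pick a reachability-preserving profile from which neither player
    gains by a unilateral, reachability-preserving deviation.  Returns the
    policy {(k, s): profile}, or None when some stage has no pure equilibrium."""
    R = reachable_sets(game, target, T)
    V = {s: (0.0, 0.0) for s in R[T]}   # continuation values at the horizon
    policy = {}
    for k in range(T - 1, -1, -1):
        newV = {}
        for s in R[k]:
            feasible = [u for u in product(game.controls(k, 0), game.controls(k, 1))
                        if game.dynamics(k, s, u) in R[k + 1]]
            def value(u, i):
                return (game.reward(k, i, s, u)
                        + lam[i] * V[game.dynamics(k, s, u)][i])
            equilibrium = None
            for u in feasible:
                dev0 = [(d, u[1]) for d in game.controls(k, 0) if (d, u[1]) in feasible]
                dev1 = [(u[0], d) for d in game.controls(k, 1) if (u[0], d) in feasible]
                if (all(value(v, 0) <= value(u, 0) for v in dev0)
                        and all(value(v, 1) <= value(u, 1) for v in dev1)):
                    equilibrium = u
                    break
            if equilibrium is None:
                return None             # no pure stage equilibrium found
            policy[(k, s)] = equilibrium
            newV[s] = (value(equilibrium, 0), value(equilibrium, 1))
        V = newV
    return policy
```

In general, stage games need not possess pure equilibria, which is why this is only a sketch; under the Non-Emptiness assumption the inner search is assumed never to fail.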

4 A First Example: Sequential Duopoly.

In this section we briefly illustrate the definitions introduced above for the case of duopoly competition in prices, as in Maskin and Tirole [1988]. Players move sequentially: in odd numbered periods $k$, firm 1 chooses its price, which remains unchanged until period $k + 2$; that is, $p^1_{k+1} = p^1_k$ if $k$ is odd. Similarly, firm 2 chooses prices only in even numbered periods: $p^2_{k+1} = p^2_k$ if $k$ is even. Hence, at time period $k$, firm $i$'s instantaneous reward $r^i_k(\cdot)$ is a function of the "state", i.e., the price $p^j_{k-1}$ that firm $j$ set in period $k - 1$, and the "action", i.e., the price $p^i_k$ that firm $i$ will establish. Price sets are discrete and bounded, and goods are perfect substitutes; that is, firms share the market equally whenever they charge the same price. Firms have the same unit cost $c$. Let $D_k(\cdot)$ denote the market demand function at time period $k$. The total reward at time period $k$ is given by
$$r_k(p) = (p - c) D_k(p)$$
Then:
$$r^i_k(p^j, p^i) = \begin{cases} r_k(p^i) & \text{if } p^i < p^j \\ \dfrac{r_k(p^i)}{2} & \text{if } p^i = p^j \\ 0 & \text{if } p^i > p^j \end{cases}$$
Strategies are "Markovian" in that they depend on the current "state", i.e., the rival's last-period action. Hence, the set of all histories is the same as the set of all feasible sequences of states. Consider the infinite history $h = \{(p^1_k, p^2_k)\}$; then firm $i$'s total discounted payoff is
$$P^i(h) = \sum_{t=0}^{\infty} \lambda^t\, r^i_t(p^j_t, p^i_t)$$
Now, let us assume that $\bar{p}^1_T$ is a feasible price decision for firm 1 at odd time period $T$. Then $\Pi(T, \bar{p}^1_T)$ stands for the set of all Markovian strategy combinations for horizon $T$ in which player 1 is constrained to play $\bar{p}^1_T$ at time period $T$. Similarly, $\Pi^*(T, \bar{p}^1_T)$ is the set of $T$-long horizon "constrained" MPE strategy combinations to "state" $\bar{p}^1_T$. Notice that under the assumptions, by a backwards induction argument one can show that $\Pi^*(T, \bar{p}^1_T) \neq \emptyset$.
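A minimal sketch of the per-period reward in this example (the function name and the `demand` callable are ours, not the paper's):

```python
def duopoly_reward(k, my_price, rival_price, demand, c):
    """r_k^i(p^j, p^i) from the display above: undercutting captures the whole
    market r_k(p) = (p - c) D_k(p), equal prices split it, overpricing earns 0."""
    if my_price < rival_price:
        return (my_price - c) * demand(k, my_price)
    if my_price == rival_price:
        return (my_price - c) * demand(k, my_price) / 2.0
    return 0.0
```

For instance, with linear demand `lambda k, p: max(0.0, 100 - p)` and unit cost 4, `duopoly_reward(3, 10, 12, ...)` yields the undercutting profit (10 - 4) * 90 = 540.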

5 A Second Example: Linear Quadratic Games.

Each player chooses controls at time period $k$, $u^i_k \in U^i_k \subset \mathbb{R}^q$, where $i \in \mathcal{I}$ and $q$ is some positive integer. The state variable $s_k \in S_k \subset \mathbb{R}^p$, with $p$ some positive integer, follows the linear dynamics
$$s_k = A s_{k-1} + \sum_{i \in \mathcal{I}} B^i u^i_k$$
where $A$ is a given $p \times p$ matrix and the $B^i$, $i \in \mathcal{I}$, are given $p \times q$ matrices. The initial value of the state vector $s_0$ is given. Players' payoffs are of the form
$$r_k(s_k, u_k) = s_k' Q_k s_k + u_k' P_k u_k$$
where $Q_k$ and $P_k$ are $p \times p$ and $q \times q$ positive definite matrices, respectively. Basar and Olsder [1995] give a wide array of applications of this model and sufficient conditions for the existence of constrained open-loop Nash Equilibria.

6 Approximating Nonstationary MPE.

6.1 Assumptions.

Assumption 1 (Non-Emptiness): $\mathcal{X}^*(T) \neq \emptyset$ for every $T$.

Assumption 2 (Continuity): Reward functions are continuous and uniformly bounded; that is, for all $k \in \mathcal{K}$, $i \in \mathcal{I}$, $s_k \in S_k$, $u_k \in U_k$:
$$|r^i_k(s_k, u_k)| \leq M < \infty$$
Note that $P^i_T \to P^i$ uniformly, so that $P^i(\cdot)$ is continuous.

Assumption 3 (Reachability): For any $\hat{s}_k \in S_k$ and any infinite feasible sequence of states $s = (s_0, s_1, s_2, \ldots)$ with $\hat{s}_k \neq s_k$, there exist some finite time period $T > k$ and a sequence of control profiles $\{u_t\}_{k \leq t \leq T}$ so that state $s_T$ in the sequence is reached in $T = k + \Delta(\hat{s}_k, s) > k$ periods.

6.2 The Approximation Scheme.

We now present the main results of this section. Lemma 1 (Harris [1985b]) ensures that, in the family of dynamic games considered, to prove that a certain strategy combination is an MPE for the Infinite Horizon game we only need to look at finite deviations, that is, deviations that take place over a finite number of periods. This result simplifies the task of proving that a certain candidate is in fact an MPE. On the other hand, Lemma 2 ensures that a limit point of the constrained MPE set constitutes a feasible strategy for the infinite horizon game. In words, the reachability assumption ensures that the limit point strategy is well defined.
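In the finite-control case, the reachability assumption is checkable by forward search. A sketch, again on the hypothetical container of Section 2 (the cap `max_delay` is ours; under the uniformly bounded version of the assumption introduced in Section 7, it could be set to the uniform bound $L$):

```python
from itertools import product

def rejoin_delay(game, k, s_hat, trajectory, max_delay):
    """Smallest Delta such that some sequence of control profiles steers the
    state from s_hat at period k back onto `trajectory` at period k + Delta,
    i.e. the Delta(s_hat, s) of Assumption 3.  `trajectory` is indexed by
    absolute period; returns None if no rejoining path of length <= max_delay
    exists."""
    frontier = {s_hat}
    for delta in range(1, max_delay + 1):
        t = k + delta - 1
        frontier = {game.dynamics(t, s, u)
                    for s in frontier
                    for u in product(game.controls(t, 0), game.controls(t, 1))}
        if trajectory[k + delta] in frontier:
            return delta
    return None
```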

6.2.1 Preliminaries.

Lemma 1 Let $\pi \in \Pi$ and assume that no single period deviation from any subgame against $\pi$ is profitable; then $\pi \in \Pi^*$, i.e., it is a Markov Perfect Equilibrium.

Proof. (Harris [1985b]) Let $\gamma^i = (\gamma^i_0, \gamma^i_1, \ldots) \in \Pi^i$ be some general deviation for player $i$ from $\pi$ from the initial state $s_0$. Let us construct a collection of deviations, denoted $\tilde{\gamma}^{i,k}$, for player $i$ that approximate $\gamma^i$ as follows:
$$\tilde{\gamma}^{i,k} = (\gamma^i_0, \ldots, \gamma^i_k, \pi^i_{k+1}, \pi^i_{k+2}, \ldots)$$
By hypothesis,
$$P^i(h^{(\tilde{\gamma}^{i,0}, \pi^{-i})}(s_0)) \leq P^i(h^{\pi}(s_0))$$
Let us denote by $s_k$ the state attained by following deviation $\tilde{\gamma}^{i,k}$, with $0 \leq k \leq T$. Then by hypothesis:
$$P^i(h^{(\tilde{\gamma}^{i,k+1}, \pi^{-i})}(s_k)) \leq P^i(h^{\pi}(s_k))$$
Concatenating these inequalities we obtain
$$P^i(h^{(\tilde{\gamma}^{i,T}, \pi^{-i})}(s_0)) \leq P^i(h^{\pi}(s_0))$$
Then, by construction,
$$h^{(\tilde{\gamma}^{i,T}, \pi^{-i})}(s_0) \to h^{(\gamma^i, \pi^{-i})}(s_0) \quad \text{as } T \to \infty$$
and by continuity of the discounted reward functional:
$$P^i(h^{(\gamma^i, \pi^{-i})}(s_0)) \leq P^i(h^{\pi}(s_0))$$
The same argument can be made when the deviation "starts" at an arbitrary intermediate feasible state. ∎

The idea behind the above proof is simple. If single period deviations are unprofitable, then finite sequences of deviations are unprofitable. Any infinite sequence of deviations gives rise to a history that is the limit of histories derived from finite sequences of deviations; since none of these finite deviations is profitable, the limit is also not profitable, by continuity.

The next lemma ensures that a limit point of our constrained MPE set is a well defined strategy for the Infinite Horizon game.

Lemma 2
$$\limsup_{T \to \infty} \mathcal{X}^*(T) \subset \Pi$$

Proof. Let $\pi \in \limsup_{T \to \infty} \mathcal{X}^*(T)$. By definition, there are an infinite feasible sequence of states $\{s_T : T \in \mathcal{K}\}$ and constrained MPE strategy combinations $\{\pi^T : T \in \mathcal{K}\}$ with $\pi^T \in \Pi^*(T, s_T)$, such that the latter has a converging subsequence $\{\pi^{T_k} : k \in \mathcal{K}' \subset \mathcal{K}\}$, with respect to the topology $\mathcal{W}$, whose limit is $\pi$. We focus our attention on the following infinite sequence of states:
$$s = (\ldots, s_{T_k},\; s_{T_k+1} = f_{T_k}(s_{T_k}, \pi^{T_{k+1}}(s_{T_k})),\; s_{T_k+2} = f_{T_k+1}(s_{T_k+1}, \pi^{T_{k+1}}(s_{T_k+1})),\; \ldots,\; s_{T_{k+1}}, \ldots)$$
In words, we have appropriately filled the gaps in the sequence $\{s_{T_k}\}_{k \in \mathcal{K}'}$. In order to prove that $\pi \in \Pi$, we need to verify that from an arbitrary intermediate feasible state $s_m$ the play prescribed by $\pi$ is well defined. By reachability from state $s_m$, there exists a feasible finite sequence of actions that "reaches" the infinite sequence of states $s$ constructed above at some period $n$. Then let
$$k^* = \min\{k \in \mathcal{K}' : T_k \geq n\}$$
Thus every $\pi^{T_k}$ with $k \geq k^*$ prescribes play that reaches state $s_{T_k}$, which we denote by $h^{\pi^{T_k}}(s_m)$. By definition of convergence in $\mathcal{W}$,
$$h^{\pi^{T_k}}(s_m) \to h^{\pi}(s_m)$$
So we conclude that $\pi \in \Pi$. ∎

"The limit point of constrained MPE is an MPE for the Infinite Horizon Game."

Theorem 3 $\limsup_{T \to \infty} \mathcal{X}^*(T) \subset \Pi^*$ with respect to the topology $\mathcal{W}$.

Proof. Let us first show that
$$P^i(h^{(\gamma^i, \pi^{-i})}(s_m)) \leq P^i(h^{\pi}(s_m))$$
for any player $i \in \mathcal{I}$ who would deviate by playing $\gamma^i \in \Pi^i$, where $\gamma^i$ constitutes a single period deviation from $\pi$ at the arbitrary feasible state $s_m$. We recall that this is sufficient in view of Lemma 1. Then, by convergence in $\mathcal{W}$ and Lemma 2,
$$h^{(\rho^{T_k} \gamma^i,\, \pi^{T_k, -i})}(s_m) \to h^{(\gamma^i, \pi^{-i})}(s_m) \quad \text{as } k \to \infty$$
where $(\rho^{T_k} \gamma^i, \pi^{T_k, -i})$ is simply the strategy combination formed of the $T_k$-period truncation of $\gamma^i$ as a deviation from $\pi^{T_k}$. By hypothesis:
$$P^i_{T_k}(h^{(\rho^{T_k} \gamma^i,\, \pi^{T_k, -i})}(s_m)) \leq P^i_{T_k}(h^{\pi^{T_k}}(s_m))$$
Finally, by continuity of the discounted reward functional, after taking limits as $k \to \infty$:
$$P^i(h^{(\gamma^i, \pi^{-i})}(s_m)) \leq P^i(h^{\pi}(s_m))$$
∎

We note that Lemma 2 and the above theorem are easily proven when considering the topology $\mathcal{L}$ on the set of Infinite Horizon strategies.

Corollary 4 Lemma 2 and Theorem 3 hold true with respect to the stronger topology $\mathcal{L}$.

"Any MPE is the limit of Constrained MPE."

Let $\rho^T \pi$ be the $T$-truncation of $\pi$ and let $s_T$ be the state attained by $h^{\rho^T \pi}(s_0)$. In general, $\rho^T \pi \notin \Pi(T, s_T)$, the reason being that from some state $s_k \in S_k$ with $0 \leq k \leq T$ the play prescribed by $\rho^T \pi$, i.e., $h^{\rho^T \pi}(s_k)$, need not attain state $s_T$ at time period $T$. We now define a projection operation which, when applied to any strategy combination $\pi^T$, will yield the "closest" constrained strategy combination to it.

Definition 7 Let $\pi^T \in \Pi(T)$ be some strategy combination that attains state $s_T$ at time period $T$. The projection $\tau^T(\pi^T) = \hat{\pi}^T$ of $\pi^T$ is defined as follows:

* The play prescribed by $\hat{\pi}^T$ from $s_0$ is exactly the one prescribed by $\pi^T$. That is, $h^{\hat{\pi}^T}(s_0) = h^{\pi^T}(s_0)$.

* For all states $s_k \in S_k$ with $0 < k < T$ such that $h^{\pi^T}(s_k)$ reaches state $s_T$ at time period $T$, we again set $h^{\hat{\pi}^T}(s_k)$ to be exactly $h^{\pi^T}(s_k)$.

* For all states $s_k \in S_k$ with $0 < k < T$ such that $\pi^T(s_k)$ leads to a state $s_{k+1}$ (through the dynamics $s_{k+1} = f_k(s_k, \pi^T(s_k))$) from where state $s_T$ is reachable, we set $\hat{\pi}^T(s_k) = \pi^T(s_k)$.

* For any other state $s_k \in S_k$ with $0 < k < T$ not in any of the above cases, and such that state $s_T$ is reachable from $s_k$, we set $h^{\hat{\pi}^T}(s_k)$ to be any sequence of controls required to attain state $s_T$.

Notice that, by construction, $\hat{\pi}^T \in \Pi(T, s_T)$. Intuitively, this is the constrained strategy combination to state $s_T$ whose play from every feasible state $s_t$, $t \leq T$, "resembles" most the play prescribed by $\pi^T$. However, as we shall now see, the great degree of freedom in choosing ending play that satisfies the ending target state requirement renders the full sequential characterization difficult.

We now prove the converse statement.

Theorem 5 If $\Pi$ is compact with respect to $\mathcal{W}$ then
$$\Pi^*_s \subset \liminf_{T \to \infty} \mathcal{X}^*(T)$$

Proof. Let $\rho^T \pi$ be the $T$-truncation of $\pi \in \Pi^*_s$ and $s_T$ the state attained by following this strategy. Let us consider the sequence $\{\hat{\pi}^T : T \in \mathcal{K}\}$ where $\hat{\pi}^T = \tau^T(\rho^T \pi)$. By construction we have
$$\pi = \lim_{T \to \infty} \hat{\pi}^T$$
Reasoning by contradiction, let us now assume that
$$\hat{\pi}^T \notin \Pi^*(T, s_T) \quad \text{for all } T$$

By definition, this implies the existence of profitable deviations, i.e., for each $T$ there is $\gamma^{T,i}$ with $i \in \mathcal{I}$ such that $(\gamma^{T,i}, \hat{\pi}^{T,-i}) \in \Pi(T, s_T)$ and
$$P^i_T(h^{(\gamma^{T,i}, \hat{\pi}^{T,-i})}(s_0)) > P^i_T(h^{\hat{\pi}^T}(s_0))$$
But by compactness of the space $\Pi$ with respect to the topology $\mathcal{W}$, the collection $\{(\gamma^{T,i}, \hat{\pi}^{T,-i}) : T \in \mathcal{K}, i \in \mathcal{I}\}$ has a converging subsequence with limit, say, $(\gamma^i, \pi^{-i}) \in \Pi$. By continuity of the discounted reward functional, after taking limits we obtain:
$$P^i(h^{(\gamma^i, \pi^{-i})}(s_0)) \geq P^i(h^{\pi}(s_0))$$
In other words, $\pi \notin \Pi^*_s$, hence a contradiction. ∎

A comment on this result is mandatory: we have said that the artificially imposed constraint of reaching a target state introduces "strategic" alterations to the original game. Intuitively, some player may find it attractive to deviate in the early stages if he knows in advance that players will have to coordinate at the final stages in order to attain the target state. Thus it is critical, when constructing the projection map $\tau^T(\cdot)$, to choose the right ending play. However, as the planning horizon diverges to infinity this alteration becomes negligible. Nonetheless, only "strict" Infinite Horizon Markov Perfect Equilibria are immune to this effect. We believe that this problem can be solved by looking at the specifics of each application so as to reduce the artificiality involved in constructing the projection map $\tau^T(\cdot)$. This is a matter for further research.

Theorem 6 If all Infinite Horizon Markov Perfect Equilibria are "strict" and $\Pi$ is compact in $\mathcal{W}$ then
$$\lim_{T \to \infty} \mathcal{X}^*(T) = \Pi^*$$

Proof. By the statement in the hypothesis and Theorems 3 and 5:
$$\limsup_{T \to \infty} \mathcal{X}^*(T) \subset \Pi^* = \Pi^*_s \subset \liminf_{T \to \infty} \mathcal{X}^*(T)$$
∎
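Returning to Definition 7, the projection $\tau^T$ admits a direct computational sketch, reusing `reachable_sets` from the Section 3 sketch. The `ending_play` selector embodies exactly the degree of freedom discussed above; how to choose it well is the open issue.

```python
def project(game, policy, target, T, ending_play):
    """Sketch of tau^T: keep pi^T wherever its one-step play keeps `target`
    reachable, and splice in `ending_play(k, s)` (any reachability-preserving
    control profile) elsewhere.  States from which `target` is unreachable
    are dropped, since play there is irrelevant to membership in Pi(T, s_T)."""
    R = reachable_sets(game, target, T)
    projected = {}
    for (k, s), u in policy.items():
        if k >= T or s not in R[k]:
            continue                                 # play from here is irrelevant
        if game.dynamics(k, s, u) in R[k + 1]:
            projected[(k, s)] = u                    # the first three cases of Definition 7
        else:
            projected[(k, s)] = ending_play(k, s)    # the rerouting case
    return projected
```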

7 Existence of Markov Perfect Equilibria (Average Reward).

Perhaps the main contribution of the approach introduced here is that it enables us to infer existence in the undiscounted case, provided the set of constrained MPE for finite planning horizons is not empty. We say that it constitutes a significant advance in that none of the sequential characterizations available and reviewed can be applied in this context. We focus our attention on the case when players do not discount rewards, i.e., $\lambda_i = 1$ for $i \in \mathcal{I}$. For the infinite horizon dynamic game we define the following total aggregated reward received by player $i$ under strategy combination $\pi$:
$$A^i(h^{\pi}(s_0)) = \liminf_{T \to \infty} \frac{P^i_T(h^{\rho^T \pi}(s_0))}{T}$$
Since there is no discounting, the function $P^i_T(\cdot)$ yields the total reward accrued up to period $T$. Thus the above defined functional can be interpreted as the "average reward" received under strategy combination $\pi$ from initial state $s_0$. One can easily extend this definition to different starting states at different periods.

7.1 Assumptions.

Assumption 1 (Non-Emptiness): $\mathcal{X}^*(T) \neq \emptyset$ for every $T$.

Assumption 2 (Average case):

1. Discreteness. Each set $U^i_k$ is discrete and endowed with the discrete metric. Hence the product space $U = \prod_{k=0}^{\infty} (U^1_k \times U^2_k)$ is compact in the product topology and $\Pi$ is compact in $\mathcal{L}$.

2. Reward Boundedness: for every $i \in \mathcal{I}$ and $k \in \mathcal{K}$,
$$-\infty < -M \leq r^i_k(s_k, u_k) \leq M < \infty$$

Assumption 3' (Uniformly Bounded Reachability): For any $\hat{s}_k \in S_k$ and any infinite feasible sequence of states $s = (s_0, s_1, s_2, \ldots)$ with $\hat{s}_k \neq s_k$, there exist some finite time period $T > k$ and a sequence of control profiles $\{u_t\}_{k \leq t \leq T}$ so that state $s_T$ in the sequence is reached in $T = k + \Delta(\hat{s}_k, s) > k$ periods. Moreover,
$$\sup_k \sup_{\hat{s}, s} \Delta(\hat{s}_k, s) \leq L < \infty$$

A comment is due on this last assumption. In the discounted case, the reachability assumption required players to be able to effectively control the state dynamics by cooperating: given any infinite feasible sequence of states and any state off that sequence, players could agree on a finite sequence of actions so as to reach the given infinite sequence. In the average case, we have to limit the "reward" effect of such a finite sequence of actions by requiring that it be of uniformly bounded length. This assumption then ensures that, in the average, the "reward" effect it causes disappears.
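For a finite truncation, the average reward functional can be monitored directly; a sketch, reusing the rollout helper of Section 2 (here the game object is assumed truncated at a large horizon, since the liminf itself is not computable):

```python
def running_averages(game, i, strategy):
    """The sequence P_T^i(h^{rho^T pi}(s_0)) / T for T = 1, ..., game.T, whose
    liminf is A^i(h^pi(s_0)); rewards are undiscounted (lambda_i = 1)."""
    history, states = rollout(game, strategy)
    total, averages = 0.0, []
    for t, (s, u) in enumerate(zip(states, history)):
        total += game.reward(t, i, s, u)
        averages.append(total / (t + 1))
    return averages
```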

7.2 The Existence Result.

Theorem 7 $\emptyset \neq \limsup_{T \to \infty} \mathcal{X}^*(T) \subset \Pi^*$ with respect to $\mathcal{L}$.

Proof. Non-emptiness is due to discreteness of the action sets. Let $\pi \in \limsup_{T \to \infty} \mathcal{X}^*(T)$. By Lemma 2 we know that $\pi \in \Pi$. Let us denote by $\{\pi^{T_k} : k \in \mathcal{K}' \subset \mathcal{K}\}$ a subsequence such that $\pi = \lim_k \pi^{T_k}$ with respect to $\mathcal{L}$. Let us first show that
$$A^i(h^{(\gamma^i, \pi^{-i})}(s_0)) \leq A^i(h^{\pi}(s_0))$$
for any player $i \in \mathcal{I}$ who would deviate by playing $\gamma^i \in \Pi^i$ from initial state $s_0$. We recall that $h^{(\gamma^i, \pi^{-i})}_T(s_0)$ and $h^{\pi}_T(s_0)$ stand for the $T$-truncations of such histories. Let $s_T$ denote the state attained by $h^{(\gamma^i, \pi^{-i})}_T(s_0)$. By convergence in $\mathcal{L}$ and Lemma 2 there exists $k_T \in \mathcal{K}'$ such that, for any $\pi^{T_k}$ with $k \geq k_T \geq T$, the play prescribed by $(\gamma^i, \pi^{T_k, -i})$ and by $\pi^{T_k}$ coincides exactly with $h^{(\gamma^i, \pi^{-i})}(s_0)$ and $h^{\pi}(s_0)$, respectively, in the first $T$ periods. Moreover, the deviation for player $i$
$$\tilde{\gamma}^i = (\gamma^i_0, \ldots, \gamma^i_{T-1}, a_{T+1}, \ldots, a_{T_k})$$

in which we append, from period $T$ on, the actions $(a_{T+1}, \ldots, a_{T_k})$ prescribed by $\pi^{T_k}$, is such that $(\tilde{\gamma}^i, \pi^{T_k, -i})$ "reaches" state $s_{T_k}$. Formally:
$$(\tilde{\gamma}^i, \pi^{T_k, -i}) \in \Pi(T_k, s_{T_k})$$
Hence, by the hypothesis on $\pi^{T_k}$ with $k \geq k_T \geq T$ we have:
$$P^i_{T_k}(h^{(\tilde{\gamma}^i, \pi^{T_k, -i})}(s_0)) \leq P^i_{T_k}(h^{\pi^{T_k}}(s_0))$$
By reward boundedness, the choice of $k$, the uniformly bounded reachability assumption, and the fact that we deal with Markovian strategies:
$$\frac{P^i_T(h^{(\gamma^i, \pi^{-i})}(s_0))}{T} \leq \frac{P^i_{T_k}(h^{(\tilde{\gamma}^i, \pi^{T_k, -i})}(s_0))}{T} + \frac{2ML}{T}$$
and
$$\frac{P^i_{T_k}(h^{\pi^{T_k}}(s_0))}{T} \leq \frac{P^i_T(h^{\pi}(s_0))}{T} + \frac{2ML}{T}$$
Hence
$$A^i(h^{(\gamma^i, \pi^{-i})}(s_0)) = \liminf_{T \to \infty} \frac{P^i_T(h^{(\gamma^i, \pi^{-i})}(s_0))}{T} \leq \liminf_{T \to \infty} \frac{P^i_T(h^{\pi}(s_0))}{T} = A^i(h^{\pi}(s_0))$$
Thus, from the initial state, the proposed deviation is not profitable. For a deviation from any other state $s_k \in S_k$ with $0 \leq k$ we proceed in the same manner. ∎

8 Conclusion.

In this paper we have addressed the problem of approximating MPE for Infinite Horizon games. For finite planning horizons, we introduced the notion of "Constrained MPE" as a surrogate that is immune to "end of horizon" effects. We saw that, under fairly general assumptions and "reachability" (players can, by cooperating, effectively control the state dynamics), the limit point of Constrained MPE is an MPE for the Infinite Horizon game. This result is of particular value as a computational procedure. However, due to the "strategic" effects induced by the artificial terminal constraint, an Infinite Horizon MPE is not necessarily the limit of Constrained MPE.

Nonetheless, when all Infinite Horizon MPE are strict and the space of strategies is compact, every Infinite Horizon MPE is the limit of Constrained MPE. This sequential characterization allows for computational procedures as in Schochetman and Smith [1991] and [1992]. We believe that substantial improvements can be obtained by focusing on very specific settings in order to avoid the artificiality resulting from working in an abstract framework. This is a subject for further research. Finally, and perhaps most importantly, for a different set of assumptions, when players do not discount, the limit point of Constrained MPE is an MPE for the Infinite Horizon game. A standard compactness argument yields existence, which constitutes a significant advance given that existing methodologies do not work well in this setting.

References

[1] Schochetman, I. and Smith, R.L., "Infinite Horizon Optimization", Mathematics of Operations Research 14 (1989) 1-16.

[2] Schochetman, I. and Smith, R.L., "Finite Dimensional Approximation in Infinite Dimensional Mathematical Programming", Mathematical Programming 54 (1992) 307-333.

[3] Schochetman, I. and Smith, R.L., "Convergence of Selections with Applications to Optimization", Journal of Mathematical Analysis and Applications 155 (1991) 278-292.

[4] Schochetman, I. and Smith, R.L., "Convergence of Best Approximations from Unbounded Sets", Journal of Mathematical Analysis and Applications 166 (1992) 112-128.

[5] Fudenberg, D. and Levine, D., "Subgame-Perfect Equilibria of Finite- and Infinite-Horizon Games", Journal of Economic Theory 31 (1983) 251-267.

[6] Fudenberg, D. and Levine, D., "Limit Games and Limit Equilibria", Journal of Economic Theory 38 (1986) 261-279.

[7] Radner, R., "Collusive Behavior in Non-Cooperative Epsilon-Equilibria of Oligopolies with Long but Finite Lives", Journal of Economic Theory 22 (1980) 136-154.

[8] Klein, E. and Thompson, A.C., "Theory of Correspondences", John Wiley, 1984.

[9] Munkres, J.R., "Topology: A First Course", Prentice Hall, 1975.

[10] Börgers, T., "Perfect Equilibrium Histories of Finite and Infinite Horizon Games", Journal of Economic Theory 47 (1989) 218-227.

[11] Harris, C., "A Characterization of Perfect Equilibria in Infinite Horizon Games", Journal of Economic Theory 37 (1985) 99-127.

[12] Harris, C., "Existence and Characterization of Perfect Equilibrium in Games of Perfect Information", Econometrica 53, No. 3 (1985) 613-628.

[13] Pearce, D., "Cooperation and Rationality", Sixth World Congress of the Econometric Society (1990), Jean-Jacques Laffont, ed.

[14] Abreu, D., Pearce, D. and Stacchetti, E., "Toward a Theory of Discounted Repeated Games with Imperfect Monitoring", Econometrica 58, No. 5 (1990) 1041-1063.

[15] Abreu, D., Pearce, D. and Stacchetti, E., "Optimal Cartel Equilibria with Imperfect Monitoring", Journal of Economic Theory 39 (1986) 251-269.

[16] Maskin, E. and Tirole, J., "A Theory of Dynamic Oligopoly II: Price Competition, Kinked Demand Curves and Edgeworth Cycles", Econometrica 56, No. 3 (1988) 571-599.

[17] Fudenberg, D. and Tirole, J., "Game Theory", MIT Press, 1993.

[18] Basar, T. and Olsder, G., "Dynamic Non-Cooperative Game Theory", Academic Press, 1995 (2nd edition).

[19] Gaitsgory, V. and Nitzan, S., "A Folk Theorem for Dynamic Games", Journal of Mathematical Economics 23 (1994) 167-178.

[20] Tolwinski, B., "A Concept of Cooperative Equilibrium for Dynamic Games", Automatica 18 (1982) 431-446.

[21] Engwerda, J.C., "Computational Aspects of the Infinite Planning Horizon Open-Loop Nash Equilibrium in LQ Games", Working Paper, Tilburg University, 1996.