Learning Conditional Behavior in Similar Stag Hunt Games

Dale Stahl and John Van Huyck

November 2001

[ Download | Introduction | Conclusion | References | John's Web ]

Abstract: This paper reports an experiment that varies the range of Stag Hunt games experienced by the participants. The experiment provides evidence that changing the range influences the likelihood of efficient conventions emerging. In the experiment, we observe conditional behavior, much like risk dominance, emerging with experience. To explain these stylized facts, we develop a model of conditional expectations. The model depends crucially on the assumption that, after a brief learning period, participants categorize their experience using the same relative bandwidth in both treatments, even though the range of experience is twice as large in treatment 1 as in treatment 2. The data cannot reject this assumption. The analysis provides a formal example in which increasing experienced diversity, by changing the way similar experiences are categorized, increases the likelihood of efficient conventions emerging in communities playing similar Stag Hunt games.


Introduction

Solution concepts that satisfy subgame consistency allow one to construct a general deductive theory of games that is independent of history. The solution of a game is subgame consistent if the solution is independent of the path leading up to the game. The logic of strategic rationality suggests that once a well-defined game has been reached, other parts of some larger game ought to be strategically irrelevant. Subgame consistent solution concepts have the desirable property that one need not know how a game arose in order to prescribe or predict what will happen.

In general, games have multiple solutions that are neither equivalent nor interchangeable. The class of two-player Stag Hunt games, g(x), is an example (see Figure 1). Members of the class have three mutually consistent strategy combinations. The strategy combination (β,β) is a payoff dominant equilibrium, while the strategy combination (α,α) is a secure equilibrium. Harsanyi and Selten’s (1988) original deductive selection theory selects (β,β), but Harsanyi’s (1995) revised theory based on risk dominance selects (β,β) when x < 0.5, (α,α) when x > 0.5, and the mixed equilibrium when x = 0.5.

Figure 1. Stag Hunt game g(x)

          α        β
α       x, x     x, 0
β       0, x     1, 1
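The 0.5 cutoff quoted for risk dominance can be checked with a short calculation. The sketch below writes q (a symbol introduced here, not in the original) for the probability that the other player chooses the payoff dominant action, and uses the subscripts "sec" and "pd" for the secure and payoff dominant actions. It assumes 0 < x < 1.

```latex
\begin{align*}
u_{\mathrm{sec}}(q) &= x, \\
u_{\mathrm{pd}}(q)  &= q \cdot 1 + (1 - q) \cdot 0 = q, \\
u_{\mathrm{pd}}(q) \ge u_{\mathrm{sec}}(q) &\iff q \ge x .
\end{align*}
% Mixed equilibrium: q^* = x.
% Deviation losses: x at the secure equilibrium, 1 - x at the payoff
% dominant equilibrium. Comparing the products of these losses,
\[
(1 - x)^2 > x^2 \iff x < \tfrac{1}{2},
\]
% so the payoff dominant equilibrium is risk dominant exactly when x < 1/2.
```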

Without a unique solution, it is not obvious, even from the point of view of strategic rationality, that history shouldn’t matter. For example, past experience with the game could condition the players' beliefs in a way that solves the strategy coordination problem arising from multiple strategically rational solutions. Many investigators have found that participants can solve this strategy coordination problem if they are given repeated experience with the game; see, for example, Battalio, Samuelson, and Van Huyck (2001). Moreover, the equilibrium selected depends on the experience of the community from which the two players are drawn.

Learning models formalize the way past experience with the game influences behavior. Most learning models assume that the same game is played repeatedly. Even if the game has never been encountered in the past, experience with games similar to it could also condition players' beliefs and solve the strategy coordination problem. One of the defining human characteristics is the ability to classify strategic situations and gauge the similarity between situations. Our ability to understand and predict human behavior would be greatly enhanced by a successful theory of how past experiences with similar situations affect current behavior.

Very few theoretical results exist for learning in similar games. LiCalzi (1995) extends the fictitious play dynamic to similar games. He demonstrates that strict equilibria need not be absorbing and that, if convergence takes place, what is learned may be path-dependent. LiCalzi’s results are for repeated games rather than either the random matching or mean matching protocols discussed below. The myopic best response function assumed in the fictitious play literature leads to wildly inaccurate predictions when applied to repeated games; see almost any paper in the early experimental literature.

Rankin, Van Huyck, and Battalio (RVHB, 2000) report an experiment in which participants play a sequence of similar Stag Hunt games in which payoffs and labels change each period. Each cohort consisted of 8 participants. The participants were randomly pairwise matched and presented with a sequence of 75 Stag Hunt games, g(x). The value of x in period t was independently and uniformly selected from {1/370, 2/370, ..., 369/370}. A constant gt was added to all payoffs each period, where gt was independently and uniformly selected from {0/370, 1/370, ..., 50/370}. The action labels {A, B} were equally likely to designate {α, β} or {β, α}, that is, the secure and payoff dominant actions in either order. These perturbations resulted in payoff dominance emerging as the conventional equilibrium selection principle even when the risk dominant equilibrium had a large basin of attraction under the myopic best response dynamic. There was little evidence for the conditional behavior predicted by risk dominance.
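The per-period randomization described above can be sketched in a few lines of Python. This is only an illustration of the stated design, not the software used in the experiment; the function name `draw_stage_game` and the dictionary layout are hypothetical, and exact fractions are used so that the grid of 370ths is preserved.

```python
import random
from fractions import Fraction

def draw_stage_game(rng):
    """Draw one period's perturbed Stag Hunt game (illustrative sketch)."""
    x = Fraction(rng.randint(1, 369), 370)  # x uniform on {1/370, ..., 369/370}
    g = Fraction(rng.randint(0, 50), 370)   # additive constant uniform on {0/370, ..., 50/370}
    # Row player's base payoffs in g(x): row 0 = secure action, row 1 = payoff dominant action;
    # column 0 = opponent plays secure, column 1 = opponent plays payoff dominant.
    base = [[x, x], [Fraction(0), Fraction(1)]]
    payoffs = [[cell + g for cell in row] for row in base]
    # The labels A and B are equally likely to name either action.
    labels = ("A", "B") if rng.random() < 0.5 else ("B", "A")
    return {"x": x, "g": g, "payoffs": payoffs,
            "secure_label": labels[0], "pd_label": labels[1]}

rng = random.Random(2001)                           # seed chosen arbitrarily
sequence = [draw_stage_game(rng) for _ in range(75)]  # 75 periods, as in RVHB
```

Because the constant is added to every cell, it shifts payoff levels without changing best replies, which is one reason the underlying similarity across periods remains recoverable.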

We will refer to the action with the constant xt+gt payoff as the Secure (SEC) action, and to the other action as the Payoff-Dominant (PD) action. The participants were not told that they would be facing these similar Stag Hunt games, only that they would be facing a sequence of decisions. Thus, the participants were not prompted to look for similarities, and the design masked the similarities by permuting the rows and columns and randomizing the xt and gt values. The RVHB data leave no doubt that the participants perceived the similarity of the games, because they invariably converged to the PD equilibrium, which would have been impossible if they had not identified and labeled the actions as SEC and PD or something equivalent.

This paper reports an experiment that changes the RVHB design in two important ways. First, the participants are matched against everyone in the cohort each period and receive a payoff equal to the mean of these matches. We will call this protocol mean matching. Second, we report two treatments using mean matching. The first treatment uses the same xt sequence as RVHB, and the second treatment restricts xt to be independently and uniformly selected from {185/370, 186/370, ..., 369/370} without replacement. In the second treatment, risk dominance always selects the SEC action.
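As a rough illustration of mean matching, the sketch below computes each participant's payoff as the average over matches with every other cohort member. Two simplifications are assumptions of this sketch rather than statements from the paper: a participant is not matched with themselves, and the additive payoff constant is omitted. The function name `mean_matching_payoffs` is hypothetical.

```python
def mean_matching_payoffs(actions, x):
    """Mean payoff per participant under mean matching (illustrative sketch).

    actions: list of "SEC" or "PD", one entry per participant.
    x: the period's value of x, strictly between 0 and 1.
    """
    n = len(actions)
    payoffs = []
    for i, action in enumerate(actions):
        others = actions[:i] + actions[i + 1:]  # assumption: no self-match
        share_pd = sum(1 for a in others if a == "PD") / (n - 1)
        # SEC earns x against anyone; PD earns 1 against PD and 0 against SEC,
        # so its mean payoff equals the share of others choosing PD.
        payoffs.append(x if action == "SEC" else share_pd)
    return payoffs

# Example: a cohort of 8 in which 6 choose PD, with x = 0.6.
cohort = ["PD"] * 6 + ["SEC"] * 2
print(mean_matching_payoffs(cohort, 0.6))
```

Under this protocol a participant's payoff depends on the whole cohort's behavior in the period rather than on a single randomly drawn partner.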

These changes to the design result in one cohort that converges to the PD convention as in RVHB, two cohorts that converge to the SEC convention, and two cohorts that converge to what we will refer to as conditional behavior. Roughly speaking, in the two cohorts exhibiting conditional behavior the PD action was chosen when xt < 0.75 and the SEC action was chosen when xt > 0.75.
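One way to see why such conditional behavior can persist is to check that a common cut point rule is a mutual best reply: if everyone else follows it, no one gains by deviating. The sketch below is illustrative only. The helper names are hypothetical, the tie at exactly xt = 0.75 is assigned to SEC as an assumption, and payoffs ignore the additive constant.

```python
def cutpoint_action(x, cutpoint=0.75):
    """Choose PD below the cut point and SEC at or above it (tie rule is an assumption)."""
    return "PD" if x < cutpoint else "SEC"

def is_best_reply(x, cutpoint=0.75):
    """True if following the rule is optimal when all others follow it."""
    chosen = cutpoint_action(x, cutpoint)
    others_pd = 1.0 if chosen == "PD" else 0.0  # everyone else uses the same rule
    payoff = {"PD": others_pd, "SEC": x}        # PD earns the PD share; SEC earns x
    return payoff[chosen] >= max(payoff.values())

# The rule is self-enforcing at every value of x on the experimental grid.
print(all(is_best_reply(k / 370) for k in range(1, 370)))
```

The same check passes for any cut point strictly between 0 and 1, which is consistent with the view that the particular cut point a cohort settles on reflects its history rather than incentives alone.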

There is no way such behavior could have arisen without the participants recognizing the similarity of the games and conditioning on xt. While the participants may have formed conditional beliefs about what their opponents would do and then chosen a best reply to those beliefs, it is also possible that participants merely imitated the population, but to an extent that depended on the similarity of the current game and the previous game, that is, on |xt+1 - xt|. Such imitative behavior with inertia would generate a conditional trend in behavior.

Given these experimental findings, any theory that explains the data will have to incorporate conditional strategies, like the cut point rules in RVHB, conditional trends, and/or conditional beliefs. This paper rejects cut point rules in favor of extending Stahl (1999) to include conditional trends/beliefs to explain the conditional behavior observed in the experiment.




Surface mail request (comments, suggestions, references, etc.): john.vanhuyck@tamu.edu



Conclusion

Experience with a greater range of Stag Hunt games increases the likelihood that a convention based on payoff dominance emerges in a laboratory community. Specifically, when xt ranged between 0 and 1, a convention based on payoff dominance emerged three times as often as when xt ranged between 0.5 and 1. The likelihood of emergent conditional behavior was also influenced by the range of Stag Hunt games experienced.

This paper develops a model of conditional adaptive expectations to fit the laboratory phenomena. The key analytical assumption is that, after a brief learning period, participants categorize their experience using the same relative bandwidth in both treatments, even though the range of experience is twice as large in treatment 1 as in treatment 2. An important result is that this assumption cannot be rejected by the data.

Simply incorporating conditional adaptive expectations into a logistic best reply model did not allow us to explain the data with one set of fitted parameters. It was necessary to introduce an exogenous belief in the salience of the PD action. Again, we could not reject the key analytical assumption that, after a brief learning period, participants categorize their experience using the same relative bandwidth in both treatments, even though the range of experience is twice as large in treatment 1 as in treatment 2. Moreover, simulations with the fitted parameters reproduce the stylized facts described above. The estimated minimum bandwidth is about one-seventh the range of xt, which is consistent with the stylized categorization facts reported by cognitive psychologists. We have thus developed a formal example in which increasing experienced diversity increases the likelihood of efficient conventions emerging in laboratory communities playing similar Stag Hunt games.
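To make the ingredients of such a model concrete, the following Python sketch pairs a bandwidth-based conditional belief with a logistic best reply. It is a deliberately simplified stand-in, not the specification estimated in the paper. The uniform kernel, the plain average, the neutral prior, the `precision` parameter, and both function names are assumptions of this sketch, and the exogenous salience term is omitted.

```python
import math

def conditional_belief(history, x_now, bandwidth, prior=0.5):
    """Expected share of PD play, conditioned on games similar to x_now.

    history: list of (x, share_pd) pairs observed in earlier periods.
    Only past games with |x - x_now| <= bandwidth are treated as similar
    (a uniform kernel, chosen here purely for simplicity).
    """
    similar = [share for x, share in history if abs(x - x_now) <= bandwidth]
    if not similar:
        return prior  # no similar experience yet: fall back on the prior
    return sum(similar) / len(similar)

def prob_pd(belief, x_now, precision):
    """Logistic best reply: PD pays `belief` in expectation, SEC pays x_now."""
    return 1.0 / (1.0 + math.exp(-precision * (belief - x_now)))

# Same relative bandwidth (one-seventh of the range) in both treatments.
bandwidth_t1 = (1.0 - 0.0) / 7  # treatment 1: x ranges over roughly (0, 1)
bandwidth_t2 = (1.0 - 0.5) / 7  # treatment 2: x ranges over roughly (0.5, 1)

history = [(0.20, 1.0), (0.25, 0.75), (0.90, 0.0)]
belief = conditional_belief(history, 0.22, bandwidth_t1)
print(belief, prob_pd(belief, 0.22, precision=10.0))
```

With a fixed relative bandwidth, the absolute window in treatment 1 is twice as wide as in treatment 2, so experience in low-x games, where PD is attractive, can spill over to a wider set of games.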



References

Barsalou, L.W. (1992). Cognitive Psychology: An Overview for Cognitive Scientists, Hillsdale, NJ: Lawrence Erlbaum Associates.

Battalio, R., L. Samuelson, and J. Van Huyck, (2001) "Optimization Incentives and Coordination Failure in Laboratory Stag Hunt Games," Econometrica, 69(3), 749-764.

Carlsson, H. and Eric van Damme, (1993) "Global games and equilibrium selection," Econometrica 61, 989-1018.

Clark, K., Stephen Kay and Martin Sefton, (1996), "When are Nash Equilibria Self-Enforcing? An Experimental Analysis."

Cooper, R., D.V. DeJong, R. Forsythe, and T.W. Ross, (1992) "Communication in coordination games," Quarterly Journal of Economics 107, 739-773.

Crawford, V., (1991), "An ‘Evolutionary’ Interpretation of Van Huyck, Battalio, and Beil's Experimental Results on Coordination," Games and Economic Behavior 3, 25-59.

Friedman, D., (1996) "Equilibrium in evolutionary games: Some experimental results," Economic Journal 106, 1-25.

Fudenberg, D. and Kreps, D. (1993) "Learning Mixed Equilibria," Games and Economic Behavior, 5, 320-367.

Gilboa, I. and D. Schmeidler, (1995) "Case-based Decision Theory," Quarterly Journal of Economics, 110, 605-639.

Harsanyi, J.C. (1995) "A new theory of equilibrium selection for games with complete information," Games and Economic Behavior 8, 91-122.

Harsanyi, J.C. and Selten, R. (1988). A General Theory of Equilibrium Selection in Games, Cambridge, MA: MIT Press.

Haruvy, E. and D.O. Stahl, (2000) "Initial Play and Equilibrium Selection in Symmetric Normal-Form Games."

LiCalzi, M. (1995). "Fictitious Play by Cases," Games and Economic Behavior, 11, 64-89.

Rankin, F., J.B. Van Huyck, and R.C. Battalio, (2000) "Strategic Similarity and Emergent Conventions: Evidence from Similar Stag Hunt Games," Games and Economic Behavior, 32, 315-337.

Schmidt, D., Robert Shupp, James Walker, Elinor Ostrom, (1997) "Playing Safe in Coordination Games: The Role of Risk Dominance, Payoff Dominance, Social History, and Reputation."

Stahl, D.O. (1999), "A Horse Race Among Action Reinforcement Learning Models."

Stahl, D.O. (2000), "Rule Learning in Symmetric Normal-Form Games: Theory and Evidence," Games and Economic Behavior, 32(1), 105-138.

Straub, P., (1995) "Risk Dominance and Coordination Failure in Static Games," The Quarterly Review of Economics and Finance 35, 339-363.

Van Huyck, J.B., J.P. Cook, R.C. Battalio, (1997) "Adaptive Behavior and Coordination Failure," Journal of Economic Behavior and Organization, 32, 483-503.

Van Huyck, J.B., R.C. Battalio, and R.O. Beil, (1990) "Tacit Coordination Games, Strategic Uncertainty, and Coordination Failure," American Economic Review, 80(1), 234-248.

Van Huyck, J.B., R.C. Battalio, and F.W. Rankin, (2001) "Evidence on Learning in Coordination Games."
