Monday, September 16, 2013

Prisoner's Dilemma and evolutionary stability

In the last post of this series, I discussed a paper which established the existence of so-called ZD strategies in the iterated prisoner's dilemma.  Using a ZD strategy, you can either unilaterally choose your opponent's score, or you can enforce a linear relationship between your opponent's score and your own.  You can choose an extortionate ZD strategy, wherein a small increase in your opponent's score results in a larger increase in your own score.  Here, your opponent's best strategy is to cooperate fully, even though you reap most of the benefits.

The paper claimed that this is a powerful strategy against an evolutionary opponent, since you can cause your opponent to evolve into a tamer form.  Here, I will discuss a paper which disagrees:

Adami, C. & Hintze, A. Evolutionary instability of zero-determinant strategies demonstrates that winning is not everything. Nat. Commun. 4, 2193 (2013).

Evolutionary stability

A key concept in this paper is that of evolutionarily stable strategies.  As the title of the paper says, winning is not everything, and what makes a strategy "win" is not the same as what makes a strategy evolutionarily stable.

When we say that one strategy "wins" against another, we imagine the two strategies facing off directly, and the winner getting the higher score.  In symbolic terms, let E(X,Y) be the expected score of strategy X when playing against an opponent with strategy Y.  X wins against Y if

E(X,Y) > E(Y,X).

But when we say that one strategy is evolutionarily stable against another, we imagine that there is a population where one strategy is most prevalent.  We ask if mutations of that strategy will ever gain a foothold.  Rather than having X and Y face off against each other directly, they both face off against the most prevalent strategy, X.  X is evolutionarily stable against Y if

E(X,X) > E(Y,X).

If E(X,X) = E(Y,X), then X and Y seem to be on equal footing, so we need to consider higher-order effects.  X may be the most prevalent strategy, but at least a few individuals will mutate into Y.  So at least some of the time, the opponent will be Y rather than X.  X is weakly stable against Y if

E(X,X) = E(Y,X) and E(X,Y) > E(Y,Y),

and X is weakly unstable against Y if

E(X,X) = E(Y,X) and E(X,Y) < E(Y,Y).
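The four conditions above can be written as a small decision procedure. This is only a sketch: the function names, the labels "unstable" and "neutral" (for cases the definitions above leave unnamed), and the payoff table are my own illustrative choices, not anything taken from the paper.

```python
def wins(E, x, y):
    """X 'wins' against Y if it scores higher in a direct match."""
    return E(x, y) > E(y, x)


def classify_stability(E, x, y, tol=1e-9):
    """Classify resident strategy x against a rare mutant y.

    E(a, b) is the expected score of a when playing against b.
    """
    exx, eyx = E(x, x), E(y, x)
    if exx > eyx + tol:
        return "stable"
    if exx < eyx - tol:
        return "unstable"          # label assumed here; the mutant simply does better
    # Tie against the resident: compare how each does against the mutant.
    exy, eyy = E(x, y), E(y, y)
    if exy > eyy + tol:
        return "weakly stable"
    if exy < eyy - tol:
        return "weakly unstable"
    return "neutral"               # label assumed here; full tie


# Hypothetical payoff table, keyed as (player, opponent).
TABLE = {("X", "X"): 2.0, ("Y", "X"): 2.0, ("X", "Y"): 2.5, ("Y", "Y"): 3.0}


def E(a, b):
    return TABLE[(a, b)]


print(wins(E, "X", "Y"))                # True
print(classify_stability(E, "X", "Y"))  # weakly unstable
```

In this made-up table, X wins the head-to-head match against Y and is still weakly unstable against it, which is the sense in which winning is not everything.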

Already we can vaguely see how cooperation might be an evolutionarily stable strategy, even though it is unstable from a game theory perspective.  A key to evolutionary stability is how well you do against opponents that are like yourself.

Why ZD strategies don't work

Consider the ZD strategy where you unilaterally choose your opponent's score.  This isn't actually that great from the perspective of evolutionary stability.  For the ZD strategy to be evolutionarily stable, we want

E(ZD,ZD) > E(O,ZD),

where O is some other strategy.  But by unilaterally choosing your opponent's score, you force

E(ZD,ZD) = E(O,ZD).

At most, this ZD strategy can be weakly stable.  In fact, ZD is weakly unstable against many other strategies.  For example, the authors show that it is always weakly unstable against the Pavlov strategy [1].  This is shown not only in the equations but also in a couple of simulations.  These simulations pit a ZD population against a Pavlov population, and show that Pavlov dominates.
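To make the ZD-versus-Pavlov comparison concrete, here is a rough numerical sketch. It assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1), which this post never states, and an example score-setting strategy (0.99, 0.97, 0.02, 0.01) that I built from the Press-Dyson condition under those payoffs. The helper `expected_score` is my own name; it computes the long-run score from the stationary distribution of the Markov chain over the outcomes CC, CD, DC, DD.

```python
import numpy as np

# Assumed payoffs; the post does not specify them.
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def expected_score(p, q):
    """Long-run score E(p, q) of memory-one strategy p against q.

    Each strategy lists cooperation probabilities after the outcomes
    (CC, CD, DC, DD), written from that player's own point of view.
    Assumes the chain has a unique stationary distribution.
    """
    p = np.asarray(p, dtype=float)
    q_seen = np.asarray(q, dtype=float)[[0, 2, 1, 3]]  # opponent sees CD/DC swapped
    # Row = current outcome, column = next outcome (CC, CD, DC, DD).
    M = np.column_stack([p * q_seen, p * (1 - q_seen),
                         (1 - p) * q_seen, (1 - p) * (1 - q_seen)])
    # Solve v M = v together with sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones((1, 4))])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(v @ np.array([R, S, T, P]))


ZD = (0.99, 0.97, 0.02, 0.01)   # example score-setting strategy (assumed)
PAVLOV = (1.0, 0.0, 0.0, 1.0)

for name, (a, b) in {"E(ZD,ZD)": (ZD, ZD), "E(Pavlov,ZD)": (PAVLOV, ZD),
                     "E(ZD,Pavlov)": (ZD, PAVLOV),
                     "E(Pavlov,Pavlov)": (PAVLOV, PAVLOV)}.items():
    print(name, round(expected_score(a, b), 4))
```

Under these assumptions, both E(ZD,ZD) and E(Pavlov,ZD) come out at 2, since this ZD strategy pins every opponent's score to 2. E(ZD,Pavlov) is 27/11, about 2.45, while E(Pavlov,Pavlov) is 3. So ZD beats Pavlov head to head, yet it meets the condition for being weakly unstable against it.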

There's also another kind of simulation which allows strategies to mutate and evolve freely (as opposed to being constrained to just ZD and Pavlov).  In these simulations, if you start out with a ZD population and a very low mutation rate, the population eventually settles on the "general cooperation" strategy [2].  This is interesting, because the ZD strategy is evolutionarily stable against the general cooperation strategy.  Even though a ZD population would beat out a general cooperation population, small mutations cause ZD to be unstable, but do not cause general cooperation to be unstable [3].  This is a second, distinct sense in which ZD is an evolutionarily unstable strategy.  The paper calls this "mutational instability".

But so far I've only discussed the kind of ZD strategy where you unilaterally choose your opponent's score.  The paper briefly considers extortionate ZD strategies, and finds that they do even worse.  When an extortionate strategy faces off against itself, it results in mutual defection.  This makes it very unstable.  Extortionate ZD strategies do in fact tame opponents into cooperation--and this will result in cooperators replacing the existing ZD population.
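Here is a rough sketch of why self-play is the weak point of extortion. It assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1) and uses the extortionate strategy (11/13, 1/2, 7/26, 0), which is the Press-Dyson form with extortion factor 3 under those payoffs. Neither the payoffs nor this particular strategy comes from this post. The helper `long_run_scores` is my own name; it just pushes the outcome distribution forward for many rounds, starting from mutual cooperation.

```python
import numpy as np

# Assumed payoffs; the post does not specify them.
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def long_run_scores(p, q, rounds=5000):
    """Return (score of p, score of q) after many rounds of play.

    Strategies list cooperation probabilities after (CC, CD, DC, DD),
    each from its own player's point of view. Play starts at CC.
    """
    p = np.asarray(p, dtype=float)
    q_seen = np.asarray(q, dtype=float)[[0, 2, 1, 3]]  # opponent sees CD/DC swapped
    M = np.column_stack([p * q_seen, p * (1 - q_seen),
                         (1 - p) * q_seen, (1 - p) * (1 - q_seen)])
    v = np.array([1.0, 0.0, 0.0, 0.0])   # start from mutual cooperation
    for _ in range(rounds):
        v = v @ M                         # distribution over outcomes next round
    return float(v @ np.array([R, S, T, P])), float(v @ np.array([R, T, S, P]))


EXTORT = (11 / 13, 1 / 2, 7 / 26, 0.0)   # example extortionate strategy (assumed)
ALLC = (1.0, 1.0, 1.0, 1.0)               # unconditional cooperator

print(long_run_scores(EXTORT, EXTORT))    # both end up at the mutual-defection payoff
print(long_run_scores(EXTORT, ALLC))      # extortioner gains 3x the cooperator's surplus
```

With these numbers, two extortioners drift into permanent mutual defection and each scores 1. A cooperator facing an extortioner scores 21/11, about 1.91, so E(ZD,ZD) < E(O,ZD). A rare cooperator does better than the resident extortioners, which is the strong kind of instability, not just the weak kind.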

I have a criticism of this paper.  The paper only considers these two kinds of ZD strategies, which were the ones mentioned explicitly in the original ZD paper.  However, there are plenty of other ZD strategies not considered.

Communication: a force for evil?

In the BBC article about this paper, it's implied that cooperation evolves because of communication.  This is basically the opposite of what the paper says, no joke.

The paper says that perhaps ZD strategies can be evolutionarily stable against more cooperative strategies if ZD players recognize each other.  This isn't so much about communication (since communicators can lie) but about visible indicators of one's genetics.  If ZD players recognize each other, and selectively cooperate with each other (but not with mutants), this makes the ZD strategy more stable.

But there are a few problems with this solution.  First, if ZD players can recognize other ZD players, it stands to reason that other kinds of players can do the same.  Second, it's possible for other players to evolve "camouflage" to look like ZD players.  This would result in some sort of camouflage/detection arms race.

Explanation of my simulation results

The conditions for evolutionary instability very nicely explain the results of my simulation of the evolution of the iterated prisoner's dilemma.  In my simulation, I showed cyclic evolution from defection to tit-for-tat to cooperation, and back to defection.  After some thought, this makes sense, because each strategy in the cycle is evolutionarily unstable against the next one.  However, the instability of defection against tit-for-tat is extremely marginal, so I can see why the simulation seemed to get stuck in defection for long periods of time.

I was surprised to find that one of the simulations in the paper is quite similar to my own.  There are a few critical differences which I will discuss later.  It's become clear to me that in the next installment of this series, I will redo my simulation, with modifications in light of what I have read.


1. The Pavlov strategy is to cooperate if and only if the previous game was mutual cooperation or mutual defection.  The notation is (1,0,0,1).  The Pavlov strategy is strange, but it plays well against itself because it always leads to mutual cooperation regardless of the initial game.

2. The general cooperation strategy is (0.935, 0.229, 0.266, 0.42).  In the literature, simulations have established that this is the dominating strategy under the condition of low mutation rates.

3. I believe that this result may depend on the particular implementation of the mutation.  The implementation used is that there is a very small chance that an individual will replace one of its four numbers with a new random number between 0 and 1. This is different from the way I implemented mutation in my simulation, for instance.
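As a rough illustration of the mutation rule described in footnote 3, here is a minimal sketch. The function name, the `rate` parameter, and the use of Python's `random` module are my own choices; the paper's actual implementation may differ in details such as how the mutated entry is picked.

```python
import random


def mutate(strategy, rate, rng):
    """With probability `rate`, replace one randomly chosen entry of the
    strategy with a fresh uniform random number in [0, 1)."""
    genes = list(strategy)
    if rng.random() < rate:
        genes[rng.randrange(len(genes))] = rng.random()
    return tuple(genes)


rng = random.Random(0)                      # seeded only for reproducibility
general_cooperator = (0.935, 0.229, 0.266, 0.42)
print(mutate(general_cooperator, rate=1.0, rng=rng))
```

Note that under this rule a mutant can land arbitrarily far from its parent in a single step, which is one reason the result might depend on the mutation scheme.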

This is part of a miniseries on the evolution of Prisoner's Dilemma strategies.
1. Evolution of prisoner's dilemma strategies
2. Extortionate strategies in Prisoner's dilemma
3. Prisoner's Dilemma and evolutionary stability
4. A modified prisoner's dilemma simulation
