Wednesday, December 31, 2014

A modified prisoner's dilemma simulation

Recap

Over a year ago, I wrote a series of posts about the evolution of iterated prisoner's dilemma strategies.  Since it was so long ago, I will briefly recap, although reading the previous posts may be necessary.
1. I explained what the iterated prisoner's dilemma game is.  Then I program a simple evolutionary simulation where there is a population of individuals who play iterated prisoner's dilemma games against each other.  Rather than intelligently choosing strategies, the individuals blindly follow strategies determined by their "genes".

2. I explain an important scientific paper which demonstrates the existence of so-called "Zero-determinant strategies" (ZD strategies).  These strategies allow one player to unilaterally determine the score of their opponent, or unilaterally enforce a linear relationship between their opponent's score and their own.  The paper claims that this should lead to extortionate strategies in evolution.

3. I explain a response to the ZD paper.  In some sense, ZD strategies may "win", but this does not mean they are evolutionarily stable.  The paper explains conditions for evolutionary stability, and shows that several key ZD strategies are not evolutionarily stable.
In this, the final post of the series, I will modify my evolutionary simulation in light of what I learned from the papers.  I actually modified the simulation a long time ago, but I stalled while trying to figure out what parameters to use.  It was difficult to find a mutation rate which was low enough to allow for population stability, but high enough that the simulation would operate in a reasonable amount of time.

My primary conclusion, therefore, is that researchers in this field have their work cut out for them.  I could see using a supercomputer on this problem, and more fully exploring different variations.  But I'm not trying to contribute to the field, I'm just doing this for fun, and to better understand the field.

Modifications in the simulation

Here are the things I changed in my simulation in light of what I read in the literature:
• Previously, each individual was specified with three parameters.  Now I use four, because these individuals remember not only whether their opponents cooperated or defected in the previous round, but also remember what they themselves did.

• Previously, I calculated the average outcome after ten iterations of the prisoner's dilemma.  Using mathematical techniques from the literature, it now calculates the average outcome after an infinite number of iterations.  This modification eliminates the need to know what individuals do in the first iteration.  Also, the calculations don't work with "pure" strategies, so every individual always has at least 0.1% chance of cooperation and 0.1% chance of defection.

• I switched to a payoff matrix of 5/3/1/0, which is the standard in literature.

• Previously, the individual with the highest score reproduced, and the individual with the lowest score died.  In the literature, the chance of reproduction is proportional to score, and a random individual dies.  I found that the reproduction process makes a huge difference, because it's the difference between trying to be better and trying to be the best.  I changed it so that reproduction rate is proportional to score, and death rate is proportional to five minus the score.
I note that the literature seems to prefer a different mutation process than what I used.  I had individuals constantly mutating their parameters by small amounts.  They instead gave each individual a small chance of mutating a single parameter to a random value.  Both methods make sense.  If behaviors are controlled by many genes, there will be a large chance of small mutations; if behaviors are controlled by just a few genes, there will be a small chance of large mutations. I tried both ways, but stuck with my method because it was easier to make the simulation stable.

The results

The following comes from a population of 40 (which is far smaller than the populations used in the literature).  Each generation, each player plays against two randomly chosen opponents.  Then reproduction and death is assigned by the scores, and the genes of each individual are mutated by a random number between -0.005 and 0.005.

A million generations are shown.  It takes about seven minutes to run.

Here I plot values as they change over the generations.  The thick black line shows the average score (right axis).  The other four lines show the probability (left axis) of cooperating given the outcome of the previous iteration.  For example the CC line shows the probability of me cooperating if we both cooperated in the previous iteration.  The CD line shows the probability of me cooperating if in the last iteration I cooperated and you defected.  All of these values vary through the population, but what's shown are the population averages.

You could be forgiven for thinking that it all looks like a bunch of noise.  This is basically why my simulation stalled.  Is it all just noise from genetic drift?  Do I need a lower mutation rate or larger population?  Or am I biasing results by changing the parameters until the noise is reduced to my satisfaction?  I'm not paid enough to figure it out.

But you can see a few strategies showing up over and over:
• Defection.  All colored lines are near zero, and the average score is near 1.
• General cooperation.  In the literature, evolutionary simulations converge on the so-called general cooperation strategy, (0.935, 0.229, 0.266, 0.42).  My simulation is a bit different and not very stable, but you can see similar results whenever the green line is high and the other colored lines are low.
• Tit-for-tat.  This is the (1,0,1,0) strategy, when the green and orange lines are high, and the others are low.  It only appears a couple times, but seems stable.

• Other hybrid cooperation strategies (green plus blue, or green plus aqua) occur repeatedly, and generally precede a descent from cooperation to defection.  If I were to give this a narrative, it's sort of like the population goes soft on crime, and then the criminals take over?
•  I find myself very puzzled by the aqua only strategy, which occurs twice.  I call it the "submission" strategy, since it means that if you cooperate and your opponent defects, then you may just choose to cooperate again.  I don't understand how this could possibly be evolutionarily stable.  This was the sort of thing that made me wonder if my simulation was being wonky.  But here it is:

I also wanted to do a quick test to see how much the different strategies resemble ZD strategies.  I didn't figure out how to define resemblance though, so that was a dead end.

In summary, you can see the results are complicated, and that's why you have to pay researchers money instead of making them work for free.

Extra: Battle of the Sexes

My boyfriend was really curious what happens if you try the same simulation in an iterated Battle of the Sexes game.  In Battle of the Sexes, there is a married couple, and the husband wants to go to a monster truck rally, and the wife wants to go to the opera, or something ridiculously gendered.  Anyway, each person prefers to get their own choice, but going together is still preferable to going separately.

This is pretty easy to do with the same script I already wrote, I just need to replace the payoff matrix 5/3/1/0 with 5/0/0/3.

This is largely an afterthought, so I'm just running it for 100,000 generations.  The left axis shows the probability of giving in, given different outcomes of the previous iteration.
(ETA: I realized the legend is ambiguous.  "My choice" represents the probability that I will give in if in the previous iteration both me and my spouse went to my preferred event.  "Your choice" is the probability that I will give in if in the previous iteration both me and my spouse went to my spouse's preferred event.)

That was really straightforward.  If you're playing an infinite number of iterations of Battle of the Sexes, the evolutionarily stable strategy is to mostly avoid giving in.  But once somebody finally does give in, they continue to give in.  It's funny that the average score hovers around 3, instead of 4.  I guess it's common to have evolutionary players who just never give in.

I suppose this supports the idea that the Iterated Prisoner's Dilemma is really complicated, and it's not just that my simulation is noisy.

---------------------------------------------

This is the last part of a miniseries on the evolution of Prisoner's Dilemma strategies.
1. Evolution of prisoner's dilemma strategies
2. Extortionate strategies in Prisoner's dilemma
3. Prisoner's Dilemma and evolutionary stability
4. A modified prisoner's dilemma simulation