Saturday, August 24, 2013

Evolution of prisoner's dilemma strategies

While I was in France, I couldn't spend all my time going to the beach and talking about physics.  I mean, sometimes the beach had too many jellyfish.  And we had no internet for a time.

Our arch-nemesis, the jellyfish

So I might have spent an hour or two programming an evolutionary simulation of Prisoner's Dilemma strategies.  The inspiration was that my boyfriend described to me an awful article on BBC news which completely distorted a paper (update: corrected link) on evolutionarily stable strategies for the iterated prisoner's dilemma.

This is all based on multiple hearsay, since I have read neither the news article nor the paper.  So my evolutionary simulation may be completely off the mark.  It's okay, this is just for fun.  Later I will look at the paper and compare what they did to what I did.

Introduction

I assume readers are familiar with the prisoner's dilemma.  The prisoner's dilemma contains the basic problem of altruism.  If we all cooperate, we all get better results.  But as far as the individual is concerned, it's better not to cooperate (ie to defect).  It gets more complicated when two players play multiple games of prisoner's dilemma in a row.  In this case, even a purely selfish individual may wish to cooperate, or else in later iterations of the game, their opponent might punish them for defecting.

The iterated prisoner's dilemma can be analyzed to death with economics and philosophy.  But here I am interested in analyzing it from an evolutionary perspective.  We imagine that individuals, rather than intelligently choosing whether to cooperate or defect, blindly follow a strategy determined by their genes.  I parametrize each person's genes with three numbers:

x: The probability of cooperating if in the previous iteration, the opponent cooperated.
y: The probability of cooperating if in the previous iteration, the opponent defected.
z: The probability of cooperating in the first iteration

I believe everything is better with visual representation, so here is a visual representation of the genespace:
Each square represents an individual.  Each individual is described by three numbers x, y, and z, which are visually represented by the position and color of the square.  The major strategies of the iterated prisoner's dilemma are associated with different quadrants of the space.  In the upper right quadrant is the cooperative strategy.  In the lower left quadrant is the defecting strategy.  In the lower right, is the "tit for tat" strategy, wherein you cooperate if your opponent cooperate, and defect if your opponent defects.  The upper left quadrant... I don't think there's a name for it.

And of course, there are hybrid strategies.  For example, between cooperation and "Tit for Tat", there's "Tit for Tat with forgiveness".  In the evolutionary algorithm, the only way to get from one strategy to another is by accumulating mutations from generation to generation, so the hybrid strategies are important.

The simulation

I tried to perform the evolutionary simulation in the simplest way possible.  Start with some number of individuals (here I use 20 individuals).  Have each pair of individuals play an iterated prisoner's dilemma (here I used 10 iterations).1 Afterwards, I total up all the scores, and remove the individual with the worst score while duplicating the one with the best score.  Then every individual mutates by randomly making small changes in x, y, and z.

Upon testing, I found that this simulation favors defection far too much.  The trouble is that it's really a zero sum game.  There is no absolute standard for what sort of scores are good or bad, it's just a comparison between the scores of different individuals.  Even if defecting doesn't benefit you directly, it's still beneficial to hurt your opponents so that you come out ahead.

I'm not really sure how to solve this problem, and I'm very interested to see how researchers solve it.  I tried several things, and settled for a simple modification.  Rather than playing against every other player, each individual will play against two random individuals.2  It's still beneficial to hurt your opponents, but not that beneficial since you are only hurting a few of them.  It's also possible that an individual will play against itself, and by defecting hurt itself.

I believe that researchers typically use the following payoffs: mutual cooperation is 3 points, mutual defection is 1 point, and if only one player defects then they get 5 points while the cooperating player gets 0.   However, I wanted to find some interesting behavior, so I adjusted the parameters to better balance the cooperation and defection strategies.  I eventually settled on 4/3/1/0 instead of 5/3/1/0.

Results

Interestingly, rather than there being a single preferred strategy, multiple strategies can be evolutionarily stable.  Actually, this may not be so surprising, since I basically adjusted the parameters of the simulation until I saw interesting behavior.  But it's still fun to watch.

Often the population will settle into the defection strategy and not move for a thousand generations.  But sometimes there is a shift to a cooperative tit-for-tat hybrid strategy.  This hybrid strategy is not very stable, and always seems to eventually fall back to the defection strategy.  Here I show an example of this happening:


Why does this happen?  I'm not quite sure.  But I think that when everyone is defecting, there's no real disadvantage to taking a tit-for-tat strategy.  So genetic drift will sometimes cause the population to shift towards tit-for-tat.  Once there is a critical mass of tit-for-tat, it becomes advantageous to cooperate, since you will get back what you paid for.  Soon everyone is cooperating or using tit-for-tat.  But then, perhaps by genetic drift, we lose too many tit-for-tat individuals, and it becomes advantageous to defect again.  And so begins the cycle again.

Here I also show the average scores of the population over time:

If the population is defecting, then the average score will be 1.  If it is cooperating, the average score will be 3.  Here you see a punctuated equilibrium behavior.

Of course, you cannot draw any real conclusions from this simulation, since I did a lot of fine-tuning of the parameters, and because I'm not really interacting with the established literature.  The conclusion is that this is an interesting topic, and now I feel motivated to read some papers.  When I do, I'll report back.

--------------------------------
1.  Actually I don't have individuals really play an iterated prisoner's dilemma against each other.  Instead, I calculate the average result of an iterated prisoner's dilemma between those two players.  There is no chance involved in this calculation.
2. This means that each individual plays four games on average.  Two games where the opponent is chosen randomly, and two games where that individual was chosen as the opponent.

This is part of a miniseries on the evolution of Prisoner's Dilemma strategies.
1. Evolution of prisoner's dilemma strategies
2. Extortionate strategies in Prisoner's dilemma
3. Prisoner's Dilemma and evolutionary stability
4. A modified prisoner's dilemma simulation

1 comment:

Larry Hamelin said...

Keep us posted. This is important.