Thursday, August 29, 2013

Extortionate strategies in Prisoner's Dilemma

In an earlier post, I programmed a simple simulation of the evolution of prisoner's dilemma strategies.  I promised to read a bit of the established literature on the subject, and follow up.  In particular, I will write reports on the following two papers:

Press, W. & Dyson, F. J. Iterated Prisoners’ Dilemma contains strategies that dominate any evolutionary opponent. Proc. Natl Acad. Sci. USA 109, 1040910413 (2012).

Adami, C. & Hintze, A. Evolutionary instability of zero-determinant strategies demonstrates that winning is not everything. Nat. Commun. 4, 2193 (2013).

The first paper demonstrates that in the Iterated Prisoner's Dilemma game, there exist extortionate strategies.  If you use such a strategy, your opponent can slightly improve their own score, but only by greatly benefiting you.  While this seems like a pretty powerful strategy, the second paper demonstrates that it is not evolutionarily stable.  This post will focus on the first paper.

The strategic landscape

In the previous post, I already explained the Iterated Prisoner's Dilemma.  I proposed a player who is described by three numbers: x, y, and z.  The three numbers describe the probability that the player will cooperate, given what their opponent did in the last round.  This player is known as a "memory-one" player, because they only remember the outcome of one previous round.  But I oversimplified it a bit.  In this paper, a "memory-one" player doesn't just remember what their opponent did in the previous round, they also remember what they themselves did in the previous round.  Thus you really need to describe the player with four numbers, each describing the probability of cooperating, given each of the four outcomes in the previous round.1

You could also have a "memory-two" player who remembers the outcomes of the previous two rounds.  Or a player of arbitrarily large memory.  You'd think that more memory makes for a better player.  However, the paper proves that there is a sense in which this is not true.

Suppose that player X is a memory-one player, and chooses some particular strategy.  And suppose player Y is a memory-two player, who chooses the best strategy to respond to X's strategy.  But it turns out that Y's best memory-two strategy fares no better against X than does the best memory-one strategy.  Note that this is a precise statement, and it's not quite the same as saying that more memory is useless.2 But it provides some justification for focusing solely on memory-one strategies.

In summary, each player is described by four probabilities.  Player X is described by p1, p2, p3, and p4 (for the outcomes cc, cd, dc, and dd respectively, where c is cooperate, and d is defect).  Player Y is described by q1, q2, q3, q4.  Since these are all probabilities, they must be between zero and 1.

Zero-determinant strategies

From this point on, there will be a lot of math.  Skip to "extortionate ZD strategies" to get past the math.

When I calculated scores in my simulation, I used what's called the Markov matrix.  It looks like this:

Fig 2A of paper

The Markov matrix tells you how to transform the probabilities of one iteration to the probabilities of the next iteration.  Each iteration is associated with four probabilities describing the likelihood of each of the four outcomes.  The four probabilities can be written as four components of a vector.  By multiplying the vector by the Markov matrix, you get the vector corresponding to the next iteration.  By multiplying the Markov matrix many times, you can get the probabilities after many iterations.

But if you multiply the Markov matrix many many times, the outcome should approach some limiting state, called the stationary state.3  I didn't calculate the stationary state in my simulation, because I wasn't really sure how.  But the authors of the paper are a little bit more knowledgeable, and can calculate the stationary state.  In my simulation, I calculated the average outcome of the first 10 iterations, but here the authors effectively calculate the average outcome of an infinite number of iterations.

Once you calculate the stationary state, you just need to calculate the scores.  To do this, you do a dot product between the probability vector and the score vector.  The score vector is (3,0,5,1) for the first player and (3,5,0,1) for the second player.  It turns out that this dot product is equivalent to the determinant of a special matrix:

Figure 2B of paper.  v is the stationary vector, while f is the score vector.

Why is this matrix so important?  Because player X has complete control over the second column, and player Y has complete control over the third column.  This allows for "zero-determinant" strategies (henceforth ZD strategies), where the player forces the determinant of the above matrix to be zero.  All you need to do is choose probabilities such that your column is exactly equal to the last column.  If a matrix has two identical columns, its determinant is zero.

You might wonder, why would you ever want to set the determinant to zero?  Doesn't that just set someone's score to zero?  In fact, it does more than that.  You could, for instance, set the sum of your score and the opponent's score to zero.  Instead of using (3,0,5,1) or (3,5,0,1) for the score vector, all you have to do is use the sum of these two vectors, (6,5,5,2).  In general, you can enforce equations of the following form:

Equation 7 from the paper.  sX is X's score, and sY is Y's score, while the rest of the variables are parameters chosen by the player using the ZD strategy.

This equation should make you skeptical.  Can X choose a strategy such that sX - 10 = 0, thus causing their average score to be 10?  Clearly this is impossible, since the maximum score is only 5.  If you try such a ZD strategy, you must set p1, p2, p3, and p4 outside their allowed range between zero and one.  This is the catch of the ZD strategy: not every ZD strategy is possible.

Extortionate ZD strategies

So far, I've shown that a player using a ZD strategy can enforce a linear relationship between their own score, and their opponents score.  However, not all linear relationships are possible.  What linear relationships are possible?

One thing you can do is unilaterally choose your opponent's score.  In a game where the outcome scores are 5/3/1/0, you may set your opponent's score anywhere between 1 and 3.  However, you can't unilaterally choose your own score using ZD.  After all, it would be a contradiction if both you and your opponent had complete control over your score.

Another thing you can do is enforce a positive correlation between your score and your opponent's score.  Thus, the more your opponent helps you, the better it is for your opponent.  One such ZD strategy is known as the "tit for tat" strategy, where you simply copy the action of your opponent in the last iteration.  In the tit for tat strategy, your opponent's average score will exactly equal your own average score.

But there are more extortionate ZD strategies, where you benefit far more than your opponent does.  For example, player X could enforce the rule (sX-1) = 6*(sY-1), meaning that if Y tries to improve their own score, then X's score improves six times more than Y's does.4  In general, you can be as extortionate as you want, although at some point your opponent loses incentive to bother trying for such meager rewards.  Or if your opponent is smart, they may give up on those rewards to spite you.

Consequences for evolution of altruism

It might seem like the existence of extortionate strategies is bad news for the evolution of altruism.  In fact, it's more complicated than that.  You can use extortionate strategies to hurt your opponent, but hurting your opponent doesn't necessarily help yourself.  The mere existence of an extortionate strategy doesn't lead to selfishness.  However, ZD-extortionate strategies have an additional property that I think was not properly emphasized in the paper.  The more extortionate you are, the greater your own rewards when your opponent cooperates.

For example, the least extortionate strategy is the tit for tat strategy.  If you persuade your opponent to fully cooperate, your average score will be 3.  But if you instead use the extortionate strategy above, with (sX-1) = 6*(sY-1), and if your opponent fully cooperates, then your average score will be 4.  The problem with ZD strategies is not that they're extortionate, but that they benefit more extortionate players.

So imagine that you are playing against an evolutionary opponent.  You can choose a ZD-extortionate strategy, and watch your opponent evolve towards a more cooperative strategy.  When they cooperate fully, you can use a more and more extortionate strategy to reap greater rewards (while also hurting your opponent).

Of course, in this picture, we're imagining that one player is intelligently choosing a ZD strategy, and the other player is blindly evolving.  The paper also briefly discusses the case where both players are using ZD strategies, or if one player has a "theory of mind".  I am a bit unsure about the applicability to evolution.  Even the most intelligent agents are not going to choose a strategy based on its eventual evolutionary outcome, many generations down the road.  If agents are not intelligent, I'm not sure if ZD would ever evolve.  Does a ZD strategy really give you an advantage if you're mostly playing against siblings and cousins who are using similar strategies?

I suspect that the next paper, which shows that ZD strategies are not evolutionarily stable, will bring up one or more of these issues.  I will read the paper and see.

-----------------------------------

1. Actually, a memory-one player requires five numbers.  The first four numbers specify what to do in the case of the four possible outcomes of the previous game.  The fifth number specifies what to do in the very first game.  However, in the long run, the first game doesn't really matter (except in the scenario mentioned in footnote 3).  And this paper uses some special calculations that don't use the outcome of the first iteration.

2. The key thing is that if the low-memory player chooses a strategy first, and the high-memory player responds, then there is no advantage to having high memory.  However, if the high-memory player chooses a strategy first, and the low-memory player responds, there is a disadvantage to having low memory.

3. There's also the possibility that the game-state will circle around the stationary state without ever reaching it, and I think it's possible for there to be many stationary states.  The authors don't really address these possibilities, but they're smart so I'm sure they know what they're doing.  I think we can safely ignore these possibilities because they only occur for pathological cases.

4. To get this outcome, try (p1,p2,p3,p4)=(3/5,0,2/5,0).  Tit for tat is (1,0,1,0).

This is part of a miniseries on the evolution of Prisoner's Dilemma strategies.
1. Evolution of prisoner's dilemma strategies
2. Extortionate strategies in Prisoner's dilemma
3. Prisoner's Dilemma and evolutionary stability
4. A modified prisoner's dilemma simulation

Monday, August 26, 2013

A relaxed community

This post was cross-posted on The Asexual Agenda.

I've seen many arguments over demisexuality over the years, and I've seen many people say that demisexuals aren't an oppressed group.  Defenders of demisexuality  generally use two strategies.  The first is to talk about a few problems experienced by demisexuals.  The second is to argue that even if demisexuals are not oppressed, it is still a useful label.  The second strategy was seen in SlightlyMetaphysical's post earlier, but can be seen in much older articles as well.

------------------------------------------

At my last queer conference, I attended a caucus for queers in STEM (science, technology, engineering, mathematics).  One person I talked to there just couldn't think of a single issue he personally dealt with as a queer in STEM.  That's okay!  There are some intersecting identities that just don't have that many issues, and there are some individuals who for whatever reason avoid the issues experienced by others in the same minority group.  But a queer STEM community can still be useful, because it gathers people with similar interests.  I think communities are great, why do we even need an excuse to create one?

------------------------------------------

I'm trying to draw these threads together, and apply them to ace identity.

I know that it is distasteful to compare the marginalization experienced by different groups of people, because this often results in dismissing smaller problems, as if only large problems were worth solving.  But we all basically know that aces tend to experience less marginalization than, say, transgender people.  This clearly doesn't imply that we should ignore ace problems, but what does it imply?

I think it means we can relax a bit.  We don't need to focus so much of our discussion on the problems we face, or how to solve those problems.  We can also talk about our favorite cakes, exchange our favorite webcomics, geek out over our favorite sexuality models.  We can discuss the ups and downs of social networks, organize outings, find partners.

We can spend some time helping other people, especially other sexual minorities.  I've heard many a complaint from queer folk that "allies" are too demanding and unhelpful.  It's an unfortunate consequence of allies not having much personal investment in queer issues.  But if you're ace and identify as queer, the personal investment is there.  You spent all that time researching and educating yourself about the ins and outs of asexuality, you can do the same for other sexual minorities.  I consider it damn near a moral obligation.

Another advantage is that we can be ourselves!  We often talk about the "gold star asexual", who has all their personal characteristics arranged in such away that no one can deny their asexuality.  We all feel pressure to conform to that gold star image, lest you cause people to doubt asexuality, and you hurt your fellow aces.  But if we recognize that the stakes are low, that takes some of the load off.  I'd like more people to recognize and affirm ace identity, but I feel confident that we will eventually achieve widespread public awareness.  So while we're at it, we might as well be ourselves, and not pressure ourselves to conform to a perfect image.

Of course, I don't mean to dismiss or minimize the problems faced by many aces.  Having your identity denied and erased is not a trivial thing.  It leads to some very real consequences.  But let's recognize that one of the negative consequences is that sometimes it makes us tense, as a community.  It is okay to relax as well.  There are situations where we as a community don't face many problems, and there are individuals who for whatever reason actually have it pretty good.  When you have problems, recognize them and fight them!  When you don't have problems, enjoy it for what it is.

Regardless of how marginalized we are, this community is still useful, and still meaningful.  And the great thing is, when people do face problems, we can instantly convert to a support group as necessary.