Wednesday, December 30, 2015

Finding Nash Equilibria, part 1

John Nash is the patron saint of game theory; the importance of his mathematical discoveries and methods across the breadth of the sciences is hard to overestimate. His profound work is perhaps even more humanly significant given his personal struggles. Thanks to "A Beautiful Mind" he is one of a very few celebrity scientists and mathematicians. His passing this year was a great tragedy, and I'd like to dedicate this first model in The Coded Lens to his memory.

Nash's game theory often describes a game in which two players each pick from a limited menu of strategies to play against each other. Depending on the pair of strategies played in a round, each player gets a payout.

The typical structure of Nash's games is as follows:
  • N strategies
  • 2 players
  • Both players are always free to play any strategy
  • Playing a strategy against another strategy has a payout
  • Payouts are the same for both players
  • Play is simultaneous and the payout is oblivious to history
  • Rational play is defined as maximising the individual payout (or minimising loss)

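A game with this structure can be held in a single N x N payout table. Here's a minimal sketch in Python; the naming is my own, and I'm reading "payouts are the same for both players" as both players sharing one table, so that player two's payout for a round is the mirrored entry:

```python
import random

def make_game(n, seed=None):
    """Build an N x N payout table: payout[i][j] is what you earn for
    playing strategy i against strategy j.  Payouts are uniform random
    in [-1, 1] -- my assumption; the model below only says random."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

# Both players share the one table: if player one plays i and player
# two plays j, their payouts are game[i][j] and game[j][i].
game = make_game(3, seed=42)
```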
Many customised styles of game, with different constraints, have been developed to test different theories. Most people are familiar with two games that fit the constraints listed above: the prisoner's dilemma and rock-scissors-paper.

The prisoner's dilemma is a game with N=2. The payouts are structured so that if both players collaborate they maximise the total payout and share it equally. If one cheats, the cheater wins a bigger share, but the net result is worse. If both cheat, both lose some: the net result is a loss, but each individually loses less than a player who plays nice and is the victim of cheating. Mutual cheating ends up being the only stable strategy that minimises a player's loss, so in a blind competition it is the rational behaviour. A great deal of moral theorising has been informed by this game.
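With some illustrative numbers (my own, chosen only to match the structure just described), the dilemma looks like this, and a quick check confirms that cheating always pays the individual better:

```python
# C = collaborate, D = cheat.  Entries are (row player, column player);
# the numbers are illustrative, not from any canonical version.
payoff = {
    ("C", "C"): (3, 3),    # share the best net result (6) equally
    ("C", "D"): (-2, 4),   # victim loses; cheater takes more, net worse (2)
    ("D", "C"): (4, -2),
    ("D", "D"): (-1, -1),  # both lose a little, net loss (-2)
}

# Whatever the opponent plays, cheating pays you better individually,
# which is why mutual cheating is the stable outcome:
for opponent in ("C", "D"):
    assert payoff[("D", opponent)][0] > payoff[("C", opponent)][0]
```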

Rock-scissors-paper is a game with N=3, with the payouts structured so that there is no simple strategy to minimise losses. In fact the best way, barring tricks of human psychology, is to make your choices randomly, with equal probability of playing any strategy.
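A quick simulation (a sketch with my own helper names) shows why uniform random play works: against another random player the average payout drifts towards zero, with neither side able to do better.

```python
import random

# Standard rock-scissors-paper payouts: +1 for a win, -1 for a loss,
# 0 for a draw.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def payout(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

rng = random.Random(0)
moves = list(BEATS)
rounds = 100_000
total = sum(payout(rng.choice(moves), rng.choice(moves))
            for _ in range(rounds))
print(total / rounds)  # hovers near 0 over many rounds
```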

Nash's work proves some interesting truths that are present in those two games. The first is that the best rational, stable result may not be the best result you could reach by organising to collaborate outside the game; forced blindness cuts us off from the best outcomes. The second is that the stable result may be a probabilistic matter over many plays of the game: you will win some of the time and lose some of the time, but the net outcome over time will minimise your loss.

The existence of Nash equilibria is often wrongly taken to mean something like there being an obvious strategy producing an optimal outcome. For instance, if markets are seen as Nash games (a moderately reasonable abstraction), then misinterpreting Nash equilibria to mean that there is a simple optimal strategy choice would (wrongly) prop up the idea of stable markets.

Beyond that there's another problem: finding the Nash equilibrium is hard, and the more strategies there are (the larger N), the harder it gets.

Hunting Nash Equilibrium

This model sets up Nash-type games for N strategies by randomly assigning payouts to the different strategy pairs and then plays them "endlessly". Each player starts with equal probabilities over the strategies and, after each round, adjusts the probability of the strategy just played according to whether it won or lost. Those probabilities become each player's only memory of past play; the actual payout table is never available to the players, and neither is the opponent's choice.
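Here's a minimal sketch of that loop, with my own parameter choices (payouts uniform in [-1, 1], a fixed probability step, and "won" read as earning more than the opponent that round); the real implementation, covered in the next posts, may well differ:

```python
import random

def play(n, rounds=10_000, step=0.01, seed=0):
    rng = random.Random(seed)
    # Random payout table shared by both players: payoff[i][j] is the
    # payout for playing strategy i against strategy j.
    payoff = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    # Equal starting probabilities; these are each player's only
    # "memory" -- the payout table itself stays hidden from them.
    probs = [[1.0 / n] * n for _ in (0, 1)]
    diffs = []
    for _ in range(rounds):
        a = rng.choices(range(n), weights=probs[0])[0]
        b = rng.choices(range(n), weights=probs[1])[0]
        pa, pb = payoff[a][b], payoff[b][a]
        diffs.append(pa - pb)
        # Nudge the probability of the strategy just played up if it
        # won this round and down if it lost, then renormalise.
        for p, choice, won in ((probs[0], a, pa > pb), (probs[1], b, pb > pa)):
            p[choice] = max(p[choice] + (step if won else -step), 1e-6)
            total = sum(p)
            for i in range(n):
                p[i] /= total
    return diffs

diffs = play(3)
```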

We're using feedback from play to modify subsequent play; that's a recipe for a complex dynamic system.

The metric we'll look at is the difference in payout in each play:
  • in the simple equilibrium case, where both players always get the same payout, the difference should be 0 once the model has converged on the equilibrium;
  • if we've settled into a simple probabilistic phase, then the difference in payout should converge on a non-zero value;
  • the hardest case to analyse has multiple phases, which may trap any optima-seeking heuristic until some interesting perturbation pushes it into a new, comparatively stable state;
  • if the model hasn't converged anywhere, the differences in payout should be chaotically scattered.
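One rough way to tell these cases apart (a quick heuristic of my own, not the analysis used in later posts) is to track the mean and spread of the payout difference over successive windows of play: a mean and spread both near zero suggests the simple equilibrium, a steady non-zero mean suggests a probabilistic phase, and a persistently large spread suggests chaos.

```python
import statistics

def window_stats(diffs, window=1000):
    """Mean and standard deviation of the payout difference over
    successive non-overlapping windows of plays."""
    return [(statistics.mean(diffs[i:i + window]),
             statistics.stdev(diffs[i:i + window]))
            for i in range(0, len(diffs) - window + 1, window)]
```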

In the next couple of posts I'll look at some of the results from running this model and talk about its implementation.

In the outputs we'll see all the outcomes listed above: convergence on zero, convergence on simple probabilistic equilibria, chaos and phase shifts. We'll also see how the experimental set-up makes some of those states noisy. We'll see that settling into an equilibrium state takes much longer for even small increases to N. And we'll see that some of the interesting outcomes are hard to quantify.

One of my goals with this blog is to demonstrate how some modern programming techniques can be applied to these problems. I'm mostly going to build bespoke models rather than using particular modelling packages, partly because I'm interested in the breadth of modelling approaches that might be used in different domains, so presenting them on an equal footing, built from the ground up, is fair. I also think programming from scratch might best show how simple some of the most important algorithmic models have been. Also, I'm a generalist programmer, so it comes easily to me.

I'll also make some suggestions for people who want to play more with this model. One of my hopes is to make a resource that students in the sciences can play with to explore how computers and algorithmic models can be a part of their research practice. 
