Wednesday, December 30, 2015

Finding Nash Equilibria, part 2

Can a simple learning model help us understand Nash equilibria? Let's look at some results.

Given the setup described in the previous post, the following graphs show the results of many runs of the model for a given N. Each series plots the variation in the difference in payouts over a small window of rounds. If that variation stays small over time, the players' choices of strategies represent a stable equilibrium state.

This model is reproducing a finding that's fairly well attested. Even though Nash proved that stable equilibria always exist in games like this, it turns out that they can be enormously difficult to find. This is particularly true when the games are being used as models of some natural system and there are realistic constraints on the equilibrium-seeking algorithm; the lesson is that just because something exists in theory, don't expect it to manifest in a natural system.

So, what is the equilibrium-seeking algorithm we are using here? It's a type of lottery. Imagine a bucket of tickets, one set for each strategy: say five tickets for strategy 1, five for strategy 2, and so on. Each player makes their choice by drawing a ticket from the bucket and then returning it. If they win, they pick another ticket at random and replace it with a new ticket for the strategy that just won. If they lose, they find a ticket for the losing strategy and replace it with a ticket for another strategy chosen at random. Nothing changes on a tie. This is a particularly simple-minded learning strategy: it has no knowledge of the other player's strategy, no knowledge of the actual payouts, and no memory of the specific history of plays. All the memory is embodied in the ticket proportions, and the only feedback is win or lose. I set it up so that for a game of N strategies each player starts with N tickets for each strategy, N*N tickets in total. (I've called it button shuffling in some of my code and you'll see that name on the graphs below.)
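The update rule can be sketched in a few lines of Python. This is a reconstruction from the description above, not the actual model code; names like make_bucket and update are mine.

```python
import random

def make_bucket(n):
    """Start with N tickets for each of N strategies: N*N tickets total."""
    return [s for s in range(n) for _ in range(n)]

def choose(bucket):
    """Draw a ticket (with replacement) to pick a strategy to play."""
    return random.choice(bucket)

def update(bucket, played, won, n):
    """Button-shuffling learning rule.
    Win:  replace a randomly chosen ticket with one for the winning strategy.
    Loss: replace a ticket for the losing strategy with a random other strategy.
    Tie:  no change.
    """
    if won is True:
        bucket[random.randrange(len(bucket))] = played
    elif won is False:
        i = bucket.index(played)  # a ticket for the losing strategy
        bucket[i] = random.choice([s for s in range(n) if s != played])
    # won is None -> tie, nothing changes
```

Note the only state is the bucket itself: the probability of playing a strategy is just its share of the tickets.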

What does simple convergence look like? (N=5)

For five strategies, N=5, Figure 1 shows data points that represent the variation in the difference in payouts over the previous 25 rounds.

Figure 1, N=5 example games

Notice that there's some really wide variation before around 300 rounds, but after that they settle into some narrow range of values. That's the sign of the equilibria we're looking for. (The payouts are between -1 and +1, and in fact we're seeing the average of the square of the differences in payouts over the window, so positive values only.)
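Roughly, the plotted metric can be computed like this. A sketch of my reading of the description above, not the exact code behind the graphs:

```python
def variation(payout_diffs, window):
    """Mean of squared per-round payout differences over the most recent
    window of rounds. Always non-negative; zero only when both players
    received the same payout in every round of the window."""
    recent = payout_diffs[-window:]
    return sum(d * d for d in recent) / len(recent)
```

Squaring keeps the metric positive, so a series that alternates wins and losses still registers as a non-zero band rather than averaging away to zero.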

The blue series settles into a narrow band of choices until it finds a single stable strategy and converges on zero. This is the sort of pattern we were expecting for prisoner's-dilemma-style situations where in every round both players get the same payout.

The yellow and red series settle into narrow non-zero bands, which suggests each is in a probabilistic equilibrium state, like rock-scissors-paper converging on 50:50 win:lose. The red series in particular is interesting: it seems to be tending toward a single stable non-zero value, which is what we'd expect for a simple probabilistic equilibrium. But that means in any round one player or the other is going to lose, so they'll be constantly tweaking their probabilities, and the series never gets to sit smoothly on that equilibrium state. (The equilibrium is really about settling on a set of probabilities for choosing a particular strategy to play.)
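To make "probabilistic equilibrium" concrete, here's a toy simulation, separate from the model above, of two players each playing a uniform random strategy in a rock-scissors-paper-like cyclic game: the long-run average payout sits near zero even though nearly every round has a winner and a loser.

```python
import random

def cyclic_round(rng):
    """One round of a 3-strategy cyclic game: each strategy beats one
    other and loses to one other. Returns +1/-1/0 for player one."""
    a, b = rng.randrange(3), rng.randrange(3)
    if a == b:
        return 0  # tie
    return 1 if (a - b) % 3 == 1 else -1

rng = random.Random(42)
results = [cyclic_round(rng) for _ in range(30000)]
avg = sum(results) / len(results)
# At the uniform mixed equilibrium the average payout hovers near zero,
# yet about two thirds of the rounds still produce a winner and a loser.
assert abs(avg) < 0.05
assert any(r != 0 for r in results)
```

This is the distinction the red series illustrates: the *probabilities* are stable, but individual rounds still deliver win/lose feedback that keeps nudging the buckets.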

The green series is interesting for having a slightly noisy phase typical of a probabilistic equilibrium, but around round 3400 it transitions to an even more stable probabilistic equilibrium. The fact that its equilibrium is so close to zero may be meaningful. In the next post I'll talk about how we might efficiently hold onto the player states and other information to give us more data, so we can look more deeply into its behaviour.

Quirky periodic noise and late discovery of optima, (N=8)

Figure 2 is a set of outcomes for some random games with 8 strategies, N=8, with a window size of 64 for the variation of differences metric.

Figure 2, example N=8 games

Again we see all the series settling into a stable state after around 3000 rounds (much later than the 300 rounds for N=5).

The blue and the red series are nice clear examples of settling into the probabilistic equilibrium that we know should be possible. That's probably true of the yellow as well. Notice how the red actually finds a simple equilibrium just after 40000 rounds; remember, it's just shuffling 64 tickets between 8 strategies!

But what is happening with green? It's clearly in a very different state to the others we've looked at so far. Up until about 6000 rounds it seems to be tending toward a wide, high band of values for its equilibrium state, but then it drops to a lower but even wider band of differences. Sometimes it finds a large variation in the payout differences over the window of rounds, and at other times a much smaller one. More than that, the high-variation state seems to hover around a consistent value, and the low-variation state may be stable as well. We may be looking at a two-attractor complex system, with some weird phasing between my window size and the length of time spent in each of the two competing equilibria.

This type of complex dynamic system, with multiple stable modes of operation that it can flip between, is a classic in the study of complex systems; if that's what series like this represent, then it's not surprising. I'll show more models with this behaviour in future, but for now look up the logistic map, bifurcation diagrams, and Mitchell Feigenbaum.
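For a quick taste of that mode-flipping behaviour, the logistic map's period doubling can be seen in a few lines. This is standard textbook material, not part of this post's model:

```python
def logistic_orbit(r, x0=0.2, burn=500, keep=8):
    """Iterate x -> r*x*(1-x), discard the transient, and return the
    distinct values the orbit settles into (rounded for comparison)."""
    x = x0
    for _ in range(burn):
        x = r * x * (1 - x)
    seen = []
    for _ in range(keep):
        x = r * x * (1 - x)
        seen.append(round(x, 6))
    return sorted(set(seen))

# r = 2.5: a single fixed point; r = 3.2: the orbit flips between two
# values; r = 3.5: four values -- the period-doubling route to chaos.
assert len(logistic_orbit(2.5)) == 1
assert len(logistic_orbit(3.2)) == 2
assert len(logistic_orbit(3.5)) == 4
```

A system hopping between two such attractors, sampled through a fixed-size window, could plausibly produce exactly the banded, phased output the green series shows.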

Stability is still there to be found, (N=10)

Figure 3 shows a set of outcomes for some random games with 10 strategies, N=10, with a window size of 100 for the variation of differences metric.

Figure 3, N=10 example games

I've decided to include this set of results to illustrate the kind of obvious convergence on a narrow range of differences we expected to see for straightforward probabilistic equilibria. The only truly interesting thing here is that, with only two more strategies than the previous examples, it now takes around 8000 rounds to find those equilibria.

Equilibrium keeps getting harder to find, (N=12)

Here is a set of outcomes for some random games with 12 strategies, N=12, with a window size of 144 for the variation of differences metric.

Figure 4, N=12 example games

Even the nice green series here doesn't find equilibrium until after about 16000 rounds. The growth in time to find stability is really steep.

The blue series seems to find an early noisy dynamic equilibrium (results in a narrow band), but then, really sharply just after 50000 rounds, it settles into a much narrower band, much closer to the output we expect for a simple probabilistic equilibrium.

The red series is interesting here. It's clearly settled into some sort of dynamically stable regime, with results tending to either of two values on the edges of the band of outputs, and few in the middle. And it's showing what to my eye look like phasing effects, probably from my windowing strategy. I think the yellow series might be doing the same in a less obvious way.

What does it all mean?

First, simple systems can produce complex behaviour, which will be a repeated theme in this blog.

Second, equilibrium is not a simple concept, which means automatically recognising it or quantifying it is also hard.

Third, small growth in variables (increasing N in this case) can have a serious impact on the performance of models like this. Here, running the model long enough to see the interesting results took much longer simply because more rounds were needed before the interesting behaviours appeared.

Fourth, finding the interesting range for a variable is an art, not a science.

Fifth, we're going to need to look more closely to understand some of these data series.
