More robust game AI with randomness

David Stark / Zarkonnen
11 Jan 2015, 3:33 p.m.

When programming game AI, a common pattern is to look at each possible action and assign them a numerical score to figure out which to take. This may be done explicitly, with each possible action actually considered, or implicitly. A lot of thought may go into calculating the best action scores, taking all available information into account and also projecting play forwards in time. Or it may be as simple as "there's an enemy in front of you, shooting it would be a great idea".

A problem can arise when the AI always picks the action with the best score. This sounds odd - after all, the best-scoring action is the best action, right? But unless you did a very good job at calculating scores, a particular type of action can end up constantly scoring best. The AI gets locked into repetitive actions that become easy to predict and outmanoeuvre, and may fail to take care of secondary or long-term goals.

(This post is partly inspired by Vi Hart and Nicky Case's Parable of the Polygons in that it has a bunch of interactive toys to hopefully illustrate its points. It's not nearly as pretty as the polygons stuff, mind you.)

For example, an AI in a space conquest game has decided that the planet Zurg is an excellent target for invasion. It sends an invasion fleet, but due to the superior tactics of the human opponent, the fleet is defeated. The next turn, the AI decides that there's this planet, Zurg, that would make an excellent target for invasion. It sends an invasion fleet, which is easily defeated by the recently reinforced defences on Zurg. Next turn, it notices a planet called Zurg that should make a suitable invasion target...

The AI isn't clever enough to adjust its strategic analysis on the basis of experience. If it were to spend some of its efforts developing new weapons, or chose a less juicy but worse-defended target, it might make some progress instead of just throwing its forces into a shredder.

As a programmer, you can compensate for this by making the AI remember that someone is harder to invade than they look. Now the AI yearnfully looks towards Zurg, but never dares to invade because it doesn't understand that the human's tactical advantage had been based on a temporary situation caused by the inferior quality of the AI's missile jammers, making its ships vulnerable against ground-based missiles from Zurg. On the basis of this inferiority complex, the AI now pours vast resources into fleet construction, terrified of being wiped out by the human's supposedly superior forces. Soon, most of its spare resources go into fleet maintenance, crippling its long-term economic prospects...

You can compensate for this by allowing for these experience-based re-evaluations of invasion targets to decay over time. And by enforcing a particular ratio of industrial development to fleet production. Or by sitting in a corner and drinking yourself into a stupor.

My point is that it's extremely hard to create a correct ranking of actions in all conceivable situations that never suffers from one type of action being overused or starved for attention.

The solution? Take the top actions and pick one at random. In the very simplest implementation, this just means taking all actions that aren't obviously a bad idea and picking from them with equal likelihood.

A better system would be more likely to pick higher-scoring actions. Also, it's a good idea to bundle actions by type. If there's three hundred invasion targets and only three ways to improve industry, if you count each action individually, the AI will keep on picking invasion. Instead, represent each type of action by the score of its best choice, make an initial random decision on the type, and then make a second random decision inside the type to determine which precise action to take.

Yes, this will sometimes lead to suboptimal play, but it also leads to far more robust play. It's much better to allocate some effort to everything fairly important than to get locked into pursuing the one thing at the top of the list.

The following is a concrete example of how this works, as a little interactive toy. In this game, there are two players playing two games of chance in parallel. Each turn, a player may put another point into the first or second game, increasing their likelihood of winning. Then, both games pay out a reward. For example, if player 1 has put 3 points into a game and player 2 has put 7 points, there's a 30% chance player 1 will get the payout, and a 70% chance player 2 will get the payout.

Now of course there's a perfectly simple algorithm for perfect play here: put in a point so that your expected payout is increased the most. But let's pretend that this is way harder, and so the AI programmer just uses the payout amount to pick which game to improve. In a game where the first game pays out 4 points and the second 3 points, the AI will always choose to improve game one. This is obviously stupid, but again, let's assume this is a much harder game where we don't really understand how the payoffs will work.

Now if we take the same AI, but just make it choose the second-best (other) option 30% of the time, it does much better than the original AI. You can try this out below, though note that it will take a little bit for the scores to stabilise.

The blue player is the normal AI, and the green one is the AI with randomisation.

0
0
0
Payout:
%
1
1
%
Payout:
0
1
1
30
Start
Reset

And more than that: an AI with randomness can beat one which has a better scoring function but doesn't randomise. We can represent this by switching around the probabilities. Player 2 is now under the impression that the second game is the more valuable one, and is saved from this error by randomization. Despite preferring game 2, it still picks game 1 30% of the time, which eventually turns out to be a better strategy than always picking game 1.

0
0
0
Payout:
%
1
1
%
Payout:
0
1
1
70
Start
Reset

Of course, there's a limit to how far you should go with randomness. If one action is much better than the other, picking randomly will do worse. Randomness is only useful for picking between a bunch of actions that are at least decent:

0
0
0
Payout:
%
1
1
%
Payout:
0
1
1
50
Start
Reset

Finally, there's an additional benefit simply in that it makes the AI a bit unpredictable (but not frustratingly random), which tends to look more clever and "natural" to players.

I've used this approach pretty heavily in some older games like "Asteroid Storm" and my Java4k remake of Master of Orion. In both cases I was pretty lazy and really only eliminated obviously stupid actions before just picking one at random, and the resulting AI played really quite well.

There hasn't been that much cause to use this technique yet in Airships: Conquer the Skies, as I've been focusing mostly on realtime tactical combat AI, where optimal play can be computed pretty robustly and there isn't a well-defined set of actions to choose from. But as I put more work into the strategic conquest mode, I imagine this will again be very useful.