Reinforcement Learning applied to Learning Beast Behaviour

It seems obvious to use reinforcement learning techniques to learn the behaviour of the autonomous agents in the game, given that the problem is to learn a priori unknown control policies, and rewards seem easy to assign.

The simplest (effective) such techniques seem to use the idea of temporal difference; of these, Q-learning is straightforward and has some nice convergence properties.

There are two aspects to these algorithms:


The idea of Q-learning is to directly approximate the Q-values, which are maintained in a two-dimensional table keyed by the world-state and the action. Roughly, a Q-value is an approximation to the expected reward if we take this action in this world-state and follow the optimal policy from the successor state onwards.

The update rule is as follows:
Q(s, a) := Q(s, a) + alpha * (reward + gamma * max_{a'} Q(s', a') - Q(s, a))

Note that it may be the case that the successor state is not entirely determined by the action performed by the agent.
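The update rule above can be sketched in Python roughly as follows. This is only an illustration: the dictionary-backed table, the function name q_update, and the default values for alpha and gamma are assumptions made for the example, not part of the original formulation.

```python
from collections import defaultdict

# Hypothetical Q-table: maps (state, action) pairs to Q-values, defaulting to 0.0.
Q = defaultdict(float)

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for a single observed transition.

    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Value of the best action available from the observed successor state.
    best_next = max(Q[(next_state, a)] for a in actions)
    # Move Q(s, a) a fraction alpha towards the one-step target.
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]
```

Because the update uses whichever successor state was actually observed, nothing in this sketch requires the successor to be fully determined by the agent's action.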

The formula can be applied directly, in a forward manner; this is nice as the memory requirements are low. The problem is that learning is quite inefficient, given the usual "zero rewards mostly" learning scenario, as non-zero rewards will only be propagated backwards very slowly.

The alternative is to maintain a stack of (state, action, reward) triples and, at some convenient time, update the Q-values backwards, i.e. from the most recent reward down to the bottom of the stack.
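A minimal sketch of this backward pass, under some assumptions: the stack is a Python list with the most recent triple at the end, the successor of each entry is the state recorded in the entry above it, and the state reached after the last action (final_state) is supplied separately. The name replay_backwards and the default alpha and gamma are hypothetical.

```python
from collections import defaultdict

def replay_backwards(Q, stack, final_state, actions, alpha=0.1, gamma=0.9):
    """Pop (state, action, reward) triples, most recent first, updating Q as we go."""
    next_state = final_state
    while stack:
        state, action, reward = stack.pop()
        # Best value obtainable from the successor, which has already been updated.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        # The state just updated is the successor of the next entry down the stack.
        next_state = state

# Usage: a three-step episode whose only reward arrives on the final step.
Q = defaultdict(float)
episode = [(0, "go", 0.0), (1, "go", 0.0), (2, "go", 1.0)]
replay_backwards(Q, episode, final_state=3, actions=["go"], alpha=1.0, gamma=0.5)
```

In this toy episode a single backward pass gives every visited state a non-zero Q-value, whereas forward one-step updates over the same episode would only have changed the entry for the last state-action pair.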

The requirements for the algorithm to converge on an optimal control policy are as follows:

See [Mitchell] for details.

Action Selection Policies

Which to choose? Softmax feels nicer as it exploits all the information available for a given state.
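To make concrete how softmax uses every Q-value for a state, here is a rough sketch of softmax action selection. The Boltzmann form with a temperature parameter is an assumption (it is the usual formulation, but the text above does not specify one), and the function names are hypothetical.

```python
import math
import random

def softmax_probabilities(q_values, temperature=1.0):
    """Turn a list of Q-values into selection probabilities (Boltzmann distribution)."""
    # Subtracting the maximum avoids overflow and leaves the probabilities unchanged.
    highest = max(q_values)
    weights = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def softmax_select(Q, state, actions, temperature=1.0, rng=random):
    """Sample an action for `state`, favouring actions with higher Q-values."""
    # Q is assumed to be a dict keyed by (state, action); missing entries count as 0.0.
    q_values = [Q.get((state, a), 0.0) for a in actions]
    probabilities = softmax_probabilities(q_values, temperature)
    return rng.choices(actions, weights=probabilities, k=1)[0]
```

Every action keeps a non-zero chance of being chosen, with the chance growing with its Q-value; a lower temperature makes the choice closer to greedy, a higher one closer to uniform.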

Learning Beast Behaviour



[Mitchell] Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.

[Sutton and Barto] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

Thanks to Waleed for several discussions, and in particular, the formulation of the state-space transformation.

Peter Gammie
Last modified: Sat Mar 16 19:34:42 EST 2002