Thursday, 18 April 2013

Game Evolution

Game Interference vs. Game Independence

Consider two games that are completely opposite, e.g. maximise and minimise the z-accelerometer reading. Consider binary tournament VEGA: actor molecules are selected randomly into a tournament on one of these games. An actor cannot be fit on both games; being fit on one means being less fit on the other. These games therefore interfere with each other during evolution. If interfering games share a mixed population of solutions, then improvement on the games is limited in comparison to the case where these games had been played in isolation.
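A minimal sketch of the interference effect, in toy Python. The one-dimensional "actor" and the fitness function are stand-ins for the real molecule machinery, not the actual implementation:

    import random

    def fitness(actor, game):
        # the two games are exact opposites: one rewards high z-accel,
        # the other rewards low z-accel
        return actor if game == "max_z" else -actor

    population = [random.gauss(0.0, 1.0) for _ in range(100)]

    for step in range(10000):
        game = random.choice(["max_z", "min_z"])          # random game
        i, j = random.sample(range(len(population)), 2)   # binary tournament
        if fitness(population[i], game) < fitness(population[j], game):
            i, j = j, i                                   # i is the winner
        # the loser is replaced by a mutated copy of the winner
        population[j] = population[i] + random.gauss(0.0, 0.1)

    # every gain on one game is a loss on the other, so the population
    # mean drifts instead of improving on either game
    print(sum(population) / len(population))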


[Figure: Interference on the min and max z-accel sum games]

A simple solution is to force perfect niching, such that for each game in the population of games there exists a unique niche of actors that are evolved only on that game. In binary tournament selection, when a random actor is chosen, the second actor must come from the same niche.

Each molecule is assigned to a particular niche. For now, atoms don't overlap between different molecules, so it's much simpler. When this is done, we see that there is obviously no interference at all, see below...
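The niching rule is a small change to the tournament sketch above: keep a separate sub-population per game and draw both contestants from it.

    import random

    GAMES = ["max_z", "min_z"]

    def fitness(actor, game):
        return actor if game == "max_z" else -actor

    # each niche holds its own sub-population of actors
    niches = {g: [random.gauss(0.0, 1.0) for _ in range(50)] for g in GAMES}

    for step in range(10000):
        game = random.choice(GAMES)
        pop = niches[game]            # both contestants from the same niche
        i, j = random.sample(range(len(pop)), 2)
        if fitness(pop[i], game) < fitness(pop[j], game):
            i, j = j, i
        pop[j] = pop[i] + random.gauss(0.0, 0.1)

    # each niche now improves on its own game with no interference
    for g in GAMES:
        print(g, sum(niches[g]) / len(niches[g]))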

But I suspect we need something in the middle. In this second case the molecules evolve independently on the games: there is no information sharing between populations, so if shared modules evolve under different games they cannot help the other games.

Therefore, we can introduce some kind of migration between game islands. For now we can include random migration. It is an empirical question whether this helps or not; perhaps it will help only if recombination is allowed, which we have not yet implemented.
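A sketch of random migration over the niches structure from the previous sketch; the migration rate and the swap scheme are placeholders:

    import random

    def migrate(niches, rate=0.01):
        games = list(niches)
        for g in games:
            for k in range(len(niches[g])):
                if random.random() < rate:
                    other = random.choice([h for h in games if h != g])
                    # swap this actor with a random resident of another island
                    m = random.randrange(len(niches[other]))
                    niches[g][k], niches[other][m] = niches[other][m], niches[g][k]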

So, we stick with the Hawaiian island model (no migration). The actor population is run for 100 generations per game generation. At that point we have a range of measures of the population of solutions that we can use to decide which games are worth concentrating further effort on. 

The basic measurements of the population are (a sketch of computing them follows this list):

1. Mean fitness in each island. 
2. Variance of fitnesses in each island. 
3. Covariance, over the individuals in one island, between fitness on that island's game and fitness on the other islands' games.
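A sketch of the three measures in plain Python; fitness(actor, game) is whatever evaluates an actor on a game:

    from statistics import mean, variance

    def island_stats(niches, fitness, games):
        stats = {}
        for g in games:
            own = [fitness(a, g) for a in niches[g]]
            stats[g] = {"mean": mean(own), "variance": variance(own)}
            # covariance of each islander's fitness on its own game
            # vs. its fitness on each other island's game
            for h in games:
                if h == g:
                    continue
                other = [fitness(a, h) for a in niches[g]]
                mu_g, mu_h = mean(own), mean(other)
                cov = sum((x - mu_g) * (y - mu_h)
                          for x, y in zip(own, other)) / (len(own) - 1)
                stats[g]["cov_vs_" + h] = cov
        return stats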

Several selective processes are possible given the above breakdown. 

1. Fitness of game = Mean fitness of solutions on that game. So, games with higher fitness replicate preferentially. 

2. Fitness of game = Variance of fitnesses on each island. 

3. Fitness of game = A·mean + B·variance, where A and B are some coefficients (see the sketch below).
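All three options fit in one function; the coefficient values are placeholders to be tuned:

    def game_fitness(stats_for_game, A=1.0, B=0.0):
        # A=1, B=0 gives option 1 (pure mean); A=0, B=1 gives option 2
        # (pure variance); anything else is the mixed option 3
        return A * stats_for_game["mean"] + B * stats_for_game["variance"]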

What is the outcome we are trying to optimise? We want interesting games to evolve and we can tune A and B to optimise interestingness. Not sure what that is. But I know it when I see it for now. 

So, let's try it out. We'll do SHC (stochastic hill climbing), basically, with a set of just 2 games for now. We choose the best game and we mutate it, reverting to the original if the mutant is no good; but if it's better we keep it and go on...

Game mutations for now are of a simplified type: they involve choosing one of four functions (sum, negsum, dif, negdif) and one of the possible 2-element combinations of sensory streams.
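A sketch of this SHC loop with the simplified mutation operator. The sensor names are hypothetical, and evaluate stands in for running the actor population on a game for 100 generations and returning whichever game-fitness measure we chose:

    import random

    FUNCTIONS = ["sum", "negsum", "dif", "negdif"]
    SENSORS = ["z_accel", "x_gyro", "y_gyro", "head_touch"]  # hypothetical

    def random_game():
        return (random.choice(FUNCTIONS), tuple(random.sample(SENSORS, 2)))

    def mutate(game):
        func, streams = game
        if random.random() < 0.5:
            func = random.choice(FUNCTIONS)              # swap the function
        else:
            streams = tuple(random.sample(SENSORS, 2))   # swap the stream pair
        return (func, streams)

    def shc(evaluate, steps=100):
        game = random_game()
        best = evaluate(game)
        for _ in range(steps):
            candidate = mutate(game)
            score = evaluate(candidate)
            if score > best:          # keep the mutant only if it improves
                game, best = candidate, score
        return game, best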

In this case, with only simple games evolving, the behaviour of the Nao is expected to be rather trivial, isn't it? There are so many games in the above space that involve e.g. just choosing a subset of sensors and maximising or minimising them! There is a combinatorial explosion of trivial games at this low level!

Something OTHER than selection for fitness variance seems to be needed to escape this lower level of trivial games. If games only get input from the sensors then they remain trivial. However, if game atoms get input from memory locations written to by the action atom they are scoring, then they can be as complex as the action atoms themselves. 

If an action atom contains, for example, compressors/clusterers, then the game can be to maximise or minimise some property of the clustering atom, e.g. the unexpectedness of an event. Or it might be how sophisticated/accurate/encompassing an LWPR model of some sensory/motor data can be. Shit. I think we're back to square one with the reward function.

I think we can't escape the fact that some pretty high-level heuristics for value functions are required, hard-coded/innate, in an epistemic agent.

Let's think backwards: what would make the Nao try to jump, or try to move forward? Seeing new things, trying to make sense of new things in the world. Novelty. Well, that can be done in terms of a classifier. Surprise. So, let's say that a game atom gets input from a classifier atom's message. The message signals the surprise in the sensory stream obtained during this run. If it's classified as different from previous sensory streams, then that is high fitness.
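A sketch of that surprise signal, with a nearest-neighbour novelty score standing in for the classifier atom (the real classifier atom could be anything that flags a stream as different):

    import math

    archive = []  # feature vectors of previously seen sensory streams

    def surprise(features):
        if not archive:
            return float("inf")   # the first stream is maximally novel
        return min(math.dist(features, past) for past in archive)

    def score_run(features):
        s = surprise(features)    # distance to the nearest past stream
        archive.append(features)  # remember this stream for future runs
        return s                  # high surprise means high game fitness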


THANKS FELIX FOR SENDING ME THIS REFERENCE!

http://www.cs.cmu.edu/~tom7/mario/

It watches people play, chooses bytes in memory that go up, and defines maximising them as winning. It infers that people WANT to maximise something, so this is imitation: it uses the assumption that people will try to maximise something. The controller then tries to maximise this set of memory locations.
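The core idea as a toy sketch (the real learnfun/playfun system is far more sophisticated than this): keep only the memory addresses whose values never decrease across recorded frames of human play, then treat driving their sum upward as the objective.

    def increasing_addresses(snapshots):
        """snapshots: list of memory byte arrays, one per recorded frame."""
        candidates = set(range(len(snapshots[0])))
        for prev, curr in zip(snapshots, snapshots[1:]):
            # discard any address whose value ever decreased during play
            candidates = {a for a in candidates if curr[a] >= prev[a]}
        return candidates

    def objective(state, addresses):
        # the controller then tries to push these locations up
        return sum(state[a] for a in addresses)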

So, to generalise, the heuristic is: the robot observes something that someone else does, and uses pattern recognition to extract a low-dimensional description of the data, i.e. something that the observed person seems to be maximising.

Not something I can use really, because in a way it is supervised by an expert. 
