Conversations: Transfer from population to Long Term Memory with Taboo Effects

MB12: Adding an archive to maintain diversity by punishing solutions that are similar to those already in the archive.

1. Adding archiving. Every M generations (e.g. 10), the best candidate is stored into the archive.

def myObserver(self, population, num_generations, num_evaluations, args):
    best = max(population)
    print('{0:6} -- {1} : {2}'.format(num_generations,
                                      best.fitness,
                                      str(best.candidate)))

    # Store the best candidate in an archive and use it to punish existing solutions in the population.
    # 1. On convergence, or simply every M generations, move the best individual into the archive.
    # 2. Later the archive may also be used to bias the variation operator (but not yet).
    if num_generations % 10 == 0:  # '==' compares values; 'is' compares object identity
        self.cl.addToArchieve(best)

2. Whenever fitness is calculated, a fitness penalty is also calculated by comparing the candidate to the archive.

fitness = self.getFitness(smMatrix)
# Compare agent x with the archive to find the fitness decrement it incurs for being similar to archived individuals.
fitnessMod = self.cl.getSimilarity(x)
print("fit mod = " + str(fitnessMod))
fitness = fitness - fitnessMod

return fitness
#return np.random.randn(1)[0]

The method for determining similarity looks like this. It adds a fitness penalty for each archive entry that has the same sensory prediction dimension or the same motor control dimension as the candidate whose fitness is being determined.

def getSimilarity(self, x):
    penalty = 0
    for a in self.archieve:
        #print(type(a.candidate))
        a2 = a.candidate
        # Add a linear penalty of 1 if the sensory stream predicted is the same as one that already exists.
        if a2[2] == x[2]:
            penalty = penalty + 1
        # Add another penalty (less than 1) if the motor dimension is the same as another one.
        if a2[1] == x[1]:
            penalty = penalty + 0.1
    return penalty
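
To check the penalty in isolation, here is a minimal standalone sketch. It assumes, as in the method above, that index 1 of a candidate holds the motor dimension and index 2 the sensory dimension. The function name `similarity_penalty`, the weight parameters and the sample candidates are hypothetical, not part of the original code.

```python
def similarity_penalty(candidate, archive, sensor_weight=1.0, motor_weight=0.1):
    """Sum a penalty over archived candidates that share a dimension with `candidate`.

    Assumption: index 1 is the motor dimension, index 2 the sensory dimension.
    """
    penalty = 0.0
    for archived in archive:
        if archived[2] == candidate[2]:   # same predicted sensory stream
            penalty += sensor_weight
        if archived[1] == candidate[1]:   # same motor dimension
            penalty += motor_weight
    return penalty


# Hypothetical candidates laid out as (placeholder, motor dimension, sensor dimension).
archive = [(None, 10, 40), (None, 0, 14)]
print(similarity_penalty((None, 10, 40), archive))  # shares motor and sensor with the first entry
print(similarity_penalty((None, 10, 36), archive))  # shares only the motor dimension
print(similarity_penalty((None, 3, 5), archive))    # shares nothing
```

The sketch compares with `==` rather than `is`. `is` tests object identity, which happens to work for small integers in CPython but is not guaranteed for equal values in general.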

The individuals that are moved to the archive are a collection of the best individuals through time, and so the archive is the repository of adaptation in LTM. Archive items are stable in this setting.

Results

Some high mutual information (MI) sm pairs discovered by a GA where fitness is the MI between a single motor and a single sensor:

[10, 40, [0.3459535709298964, -0.6131621226547173, 0.19087203169462041, 0.8003590573540689, -0.32932182058648873, 0.967009052971153, -0.3078637768319372, -0.9880379726303412, -0.9499366045595938]], fitness = 3.32192809489, birthdate = 1358458195.29
[0, 14, [-0.5324853272902659, -0.09836515624320341, -0.7370449134980012, -0.5394703720111196, -0.09431121706739425, 0.4531339866976117, 0.025579642363618482, -0.18384184128236153, -1.1156234876733415]], fitness = 3.32192809489, birthdate = 1358459011.34
[8, 36, [-0.4705824374585931, -0.16727919696753368, -1.3523655343568557, -0.7943489059779616, 0.9522655355401921, 0.5397046224948765, -0.5772961248785852, -0.39362067755596775, -0.46715438522785085]], fitness = 3.32192809489, birthdate = 1358459814.11
[16, 41, [-0.8732839721248119, -0.08163419394461269, -0.8761458614931034, -0.49311953958530497, 0.45628680751851103, 0.7918101875049637, -0.22994717681554705, -0.5309220753609515, -1.6099097132444657]], fitness = 3.32192809489, birthdate = 1358460615.54
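
The fitness of 3.32192809489 reported for every entry equals log2(10), which is the upper bound of an MI estimate in bits when each signal is discretised into 10 bins. Whether the original estimator actually used 10 bins is an assumption. Below is a minimal sketch of a histogram-based MI estimate of that kind, using NumPy; the function name `mutual_information_bits` and the synthetic motor and sensor signals are hypothetical.

```python
import numpy as np


def mutual_information_bits(x, y, bins=10):
    """Histogram (plug-in) estimate of the mutual information I(X;Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    outer = px @ py                       # product of marginals, shape (bins, bins)
    nonzero = pxy > 0                     # empty cells contribute nothing
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / outer[nonzero])))


rng = np.random.default_rng(0)
motor = rng.uniform(-1.0, 1.0, size=5000)
coupled_sensor = 2.0 * motor                            # fully determined by the motor
independent_sensor = rng.uniform(-1.0, 1.0, size=5000)  # unrelated to the motor

print(mutual_information_bits(motor, coupled_sensor))      # close to log2(10)
print(mutual_information_bits(motor, independent_sensor))  # close to 0
```

A plug-in estimate like this is biased upwards for short time series, so a fixed number of samples per evaluation makes fitness values easier to compare.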

Note that a different sensor dimension is put into the archive every 10 generations. Tomorrow I will take the archive file that evolved overnight, run these behaviours on the real robot, and upload videos of the diversity of behaviours learned. The early runs show that the population wanders over optima without revisiting previously discovered ones...

Pop size = 10. With archiving each generation. Probably I should run it overnight with archiving every 10 generations and a larger population size. There is elitism which might be switched off I think.

OK, so if this all works out tomorrow morning I'll have a load of curiosity loop type things, with sensory dimensions which seem to have a lot of mutual information with motor actions. These pairs will typically be pairs of sensory dimensions influenced by motor actions. The robot will have discovered a bunch of these pairs. What next?

Possible next steps:

1. This knowledge of interesting sm pairs could be built upon. So far the emergent goal of this algorithm has been to discover sensory dimensions that can be influenced by motor action. However, only very simple actions have been evolved. These are actions specified by 2,2,1 FFNNs which are optimised by EC.

2. Once we discover the lowest level sm primitives, one possibility is to attempt to form forward and inverse models of the dimensions that have been discovered, e.g. take gyroscope and left leg, attempt to model gyro(t), leg command(t) --> gyro(t+1), i.e. learn a forward model of how the sensor will be modified given the motor command. If such forward models are learned with all other joints kept fixed, they may not necessarily apply when other joints have different values however! Nevertheless, the dimensions discovered by the MI mechanism can be used to define what one tries to predict.

a. Construct predictors (forward and inverse) for each of the sm dimension pairs discovered by the MI method.
b. Spend some time T trying to learn each kind of model with the prediction machinery available.
c. Bias time spent on those models which are currently improving fastest in reducing prediction error.
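
Step c can be sketched as follows, assuming each model keeps a history of its prediction errors. "Improving fastest" is measured here as the drop in mean error between two consecutive windows, which is only one possible choice. The names `learning_progress` and `pick_model`, the window size and the error traces are all hypothetical.

```python
import random


def learning_progress(errors, window=5):
    """Recent drop in prediction error: mean of the older window minus mean of the newer one."""
    if len(errors) < 2 * window:
        return float("inf")   # too little data yet, so this model gets tried first
    older = sum(errors[-2 * window:-window]) / window
    newer = sum(errors[-window:]) / window
    return older - newer


def pick_model(error_histories, epsilon=0.1, rng=random):
    """Pick the model whose error is falling fastest, with occasional random exploration."""
    names = list(error_histories)
    if rng.random() < epsilon:
        return rng.choice(names)
    return max(names, key=lambda name: learning_progress(error_histories[name]))


# Hypothetical error traces for two sm pairs.
histories = {
    "gyro<-knee": [1.0 / (t + 1) for t in range(20)],  # still improving
    "touch<-arm": [0.5] * 20,                          # plateaued
}
print(pick_model(histories, epsilon=0.0))
```

The small exploration rate keeps plateaued models from being abandoned for good, in case they start improving again later.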

At the end of this, one will have some sm pairs with accurate inverse and forward models, and some with inaccurate inverse and forward models (possibly). For example, it may be that it is possible to predict very accurately what will happen to the gyroscope for a range of knee movements because there is a simple linear relationship between the two, given the way in which the knee movement is done. It may be that the robot learns that when it moves its arm its touch sensor gets activated.
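
For the simple linear case mentioned above, a forward model gyro(t), leg command(t) --> gyro(t+1) can be fitted by ordinary least squares. This is a minimal sketch on synthetic data standing in for logged robot readings. The function names and the coefficients of the synthetic system (0.8 and 0.5) are assumptions made up for illustration.

```python
import numpy as np


def fit_forward_model(sensor, motor):
    """Least-squares fit of sensor[t+1] ~ a*sensor[t] + b*motor[t] + c.

    `sensor` has one more sample than `motor`: motor[t] acts between sensor[t] and sensor[t+1].
    """
    inputs = np.column_stack([sensor[:-1], motor, np.ones(len(motor))])
    targets = sensor[1:]
    coeffs, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return coeffs


def predict_next(coeffs, sensor_t, motor_t):
    """One-step prediction of the sensor value from the fitted coefficients."""
    a, b, c = coeffs
    return a * sensor_t + b * motor_t + c


# Synthetic stand-in for a gyro driven by a leg command (made-up dynamics).
rng = np.random.default_rng(1)
leg = rng.uniform(-1.0, 1.0, size=500)
gyro = np.zeros(501)
for t in range(500):
    gyro[t + 1] = 0.8 * gyro[t] + 0.5 * leg[t] + rng.normal(scale=0.01)

coeffs = fit_forward_model(gyro, leg)
print(coeffs)  # roughly [0.8, 0.5, 0.0] for this synthetic system
```

The mean squared error of `predict_next` on held-out data gives the prediction error that separates the accurately modelled pairs from the inaccurate ones.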

3. Once these sm pair primitive forward and inverse models have been learned, more complex models can be attempted that are combinations of these dimensions. For example:

a. Take L leg to gyro model and combine it with R leg to gyro model, i.e. try to predict gyro from both the L leg and R leg joint commands simultaneously. There may be considerable interference between the two legs in how the gyro is influenced. What should the fitness of such higher order models be?

i. It could be the sum of the mutual information over all pairs in the system, aka neural (body) 'complexity' in the Tononi, Sporns and Edelman sense, but applied to body time series :)

ii. First derivative of prediction error?
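
Option i can be sketched as a sum of pairwise MI estimates over a set of named time series. This covers only the pairwise sum described above; the full Tononi, Sporns and Edelman measure considers subsets of all sizes, not just pairs. The histogram estimator, the function names and the synthetic leg and gyro signals are assumptions.

```python
from itertools import combinations

import numpy as np


def mi_bits(x, y, bins=10):
    """Histogram (plug-in) estimate of mutual information in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    outer = pxy.sum(axis=1, keepdims=True) @ pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / outer[mask])))


def summed_pairwise_mi(series, bins=10):
    """Sum of MI estimates over all unordered pairs of named time series."""
    return sum(mi_bits(series[a], series[b], bins) for a, b in combinations(series, 2))


rng = np.random.default_rng(2)
left = rng.uniform(-1.0, 1.0, size=5000)
right = rng.uniform(-1.0, 1.0, size=5000)

# Made-up body: the gyro responds to both legs, versus a gyro unrelated to either.
coupled = {"left_leg": left, "right_leg": right, "gyro": left + right}
uncoupled = {"left_leg": left, "right_leg": right, "gyro": rng.uniform(-2.0, 2.0, size=5000)}

print(summed_pairwise_mi(coupled))
print(summed_pairwise_mi(uncoupled))
```

On this synthetic data the coupled body scores higher than the uncoupled one, which is the ordering such a fitness would need to reward.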

4. Goals could be generated in the sensory dimensions present in the pairs, e.g. one may try to produce various patterns of the gyroscope and discover ways to produce these patterns by making specific kinds of motor action in the motor dimensions found in pairs that contain the sensory dimension in question. The goals may be not only final positions, but include FORCE, timing, trajectories, oscillations, etc... How could such a rich set of sm goals be represented, and how could it arise naturally? One possibility is that the sensory and motor commands are extracted from a Liquid State Machine (LSM). The sensory inputs that 'could' influence the motor are injected into the LSM, and the motor output is a linear readout from the LSM. Similarly, the sensory state to be predicted is now not predicted directly; instead it is fed into an LSM, dynamic features are extracted from it, and it is these features which are predicted from motor features.

Results continued

Results from a long run with archive-based exclusion. The later archive entries have not obtained maximum fitness, so they shouldn't be in here. The fix is to add an extra criterion for archive entry: the individual should be above some threshold fitness to enter the archive.
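
A minimal sketch of such a threshold gate, written as a standalone class rather than as a change to the existing archive code. The class name `Archive`, the method `add`, the threshold value and the sample candidates are hypothetical.

```python
class Archive:
    """Long-term store of the best individuals, gated by a minimum fitness."""

    def __init__(self, min_fitness):
        self.min_fitness = min_fitness
        self.entries = []

    def add(self, candidate, fitness):
        """Store the candidate only if it clears the threshold; report whether it was stored."""
        if fitness < self.min_fitness:
            return False
        self.entries.append((candidate, fitness))
        return True


archive = Archive(min_fitness=3.0)           # hypothetical threshold, just under log2(10)
print(archive.add([10, 40], 3.32192809489))  # True: stored
print(archive.add([5, 12], 1.7))             # False: rejected
```

One design question is which fitness to test against the threshold. Using the raw fitness, before the similarity penalty is subtracted, would stop the penalty itself from keeping good individuals out.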

Thursday, 17 January 2013
