Friday, 18 January 2013

Motor Babbling Video and what to do next....

MB12, video of the start of the run...

At the beginning of the run, NAO starts with some random babbles; you can see that each one is tried for some time before the robot moves on to the next babble.

The next step, as discussed previously, is to somehow use the insights from the MI pair discovery to guide the construction of predictive models.

Right now, the motor archive contains actions that have high MI between a single motor and a single sensor. The robot does not yet have any competence in achieving specific sensory states in these dimensions, so a natural thing to do would be to choose units from the archive and attempt to form forward and inverse models of the form:

METHODS FOR LEARNING MODELS OF THE AGENT/WORLD TO BE USED FOR CONTROL AND PREDICTION. 

FORWARD MODEL: s(t), m(t) → s(t+1)

INVERSE MODEL: s(t), s(t+1) → m(t)
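
To make this concrete, here is a minimal sketch of both models learned from a babbling log by nearest-neighbour lookup (the simplest memory-based learner). The one-dimensional toy plant and the data shapes are assumptions for illustration, not the actual NAO setup.

import numpy as np

rng = np.random.default_rng(0)

def plant(s, m):
    """Toy 1-D plant standing in for one NAO sm pair (assumed dynamics)."""
    return s + 0.5 * m

class MemoryModel:
    """Memory-based learner: store (input, output) pairs and answer a
    query with the output of the nearest stored input."""
    def __init__(self):
        self.X, self.Y = [], []

    def add(self, x, y):
        self.X.append(np.atleast_1d(x).astype(float))
        self.Y.append(float(y))

    def predict(self, x):
        d = [np.linalg.norm(np.atleast_1d(x) - xi) for xi in self.X]
        return self.Y[int(np.argmin(d))]

forward = MemoryModel()   # (s(t), m(t))   -> s(t+1)
inverse = MemoryModel()   # (s(t), s(t+1)) -> m(t)

s = 0.0
for _ in range(200):                      # random motor babbling
    m = rng.uniform(-1, 1)
    s1 = plant(s, m)
    forward.add([s, m], s1)
    inverse.add([s, s1], m)
    s = s1

# inverse query: which motor command takes the sensor from 0.0 to 0.3?
print(inverse.predict([0.0, 0.3]))        # expect something near 0.6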

How can inverse models be learned? A good next step is to try to learn inverse models for the above sm pairs discovered by the MI algorithm, e.g. move the legs and try to predict the gyroscope states, etc. This provides a set of competences. Competence progress, or some similar technique, can then be used to guide which pairs to learn more effectively (a small sketch follows).
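
One hedged sketch of such pair selection, assuming each sm pair keeps a history of its recent inverse-model errors: pick the pair whose error is dropping fastest (largest competence progress), in the spirit of Oudeyer-style intrinsic motivation. The window size and toy error histories are assumptions.

import numpy as np

def competence_progress(errors, window=10):
    """Progress = drop in mean error from the older half to the newer
    half of a sliding window of recent errors for one sm pair."""
    e = np.asarray(errors[-window:], float)
    if len(e) < window:
        return 0.0                       # not enough data yet
    half = window // 2
    return float(e[:half].mean() - e[half:].mean())

def choose_pair(error_histories):
    """error_histories: dict mapping sm-pair id -> list of recent errors."""
    return max(error_histories,
               key=lambda k: competence_progress(error_histories[k]))

# toy example: 'legs->gyro' is improving, 'arm->ir' is flat
hist = {"legs->gyro": [1.0, .9, .8, .7, .6, .5, .4, .3, .25, .2],
        "arm->ir":    [0.5] * 10}
print(choose_pair(hist))  # -> 'legs->gyro'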

Consider Dynamic Movement Primitives and CMA-ES, which Oudeyer et al. have used to optimise the achievement of goals. This is their lower-level algorithm (as opposed to their higher-level SAGG-RIAC algorithm; see the refs in http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf).

i.e.

[68,51,69,70]  Action Synergies are learned, rather than simple individual joint angles to move to.
http://homepages.inf.ed.ac.uk/svijayak/publications/bitzer-HUMANOIDS2009.pdf
http://hal.inria.fr/docs/00/43/85/95/PDF/BaranesOudeyerICDL09.pdf
http://people.cs.umass.edu/~bsilva/paramSkill_icml2012.pdf

The above techniques also allow the robot to reuse skills learned for one goal on another goal in a systematic way...

While doing...

[14], SSA Algorithm: Robot juggling: http://robotics.usc.edu/~tdahl/relpubs/Schaal-csm94.pdf
This is a critical principle. LWR (locally weighted regression) is used as the function approximation technique in [14]. Memory-based learning (MBL) means all training data is stored in memory; nearest neighbour, weighted averaging, and LWR are then applied to form local models over the MBL dataset, each local model combining points NEAR the query to estimate the appropriate output. I see... so it can, for example, approximate what it expects the motor action should be to achieve a desired sensory state, provided there are other actions in the vicinity from which to form an approximation of the function. Ahh... that's clever. Running LWR for each new point is expensive, and the above paper shows how to do LWR in real time. LWR weights points closest to the query most heavily: you form the weights, form the regression matrix, and solve the normal equations (a small numpy sketch of the query step appears after this passage).

The next issue is how to explore. The SSA algorithm does this by (i) trying to keep the system at a desired set point, and (ii) shifting the set points to achieve a goal (at a slower time scale). "The SSA tries to explore the world by going to the fringes of its data support in the direction of the goal. It sets the setpoints in the fringes until statistically sufficient data has been collected to make a further step towards the goal. In this way the SSA builds a narrow tube of data support in which it knows the world. This data can be used by more sophisticated control algorithms for planning or further exploration."

or

optimisation methods [84: PI^2 with CMA-ES; 99: ES methods; 83: policy gradients]

towards reaching the goal. So you try to get to the goal as best you can by some black-box method, and along the way you're learning the model for how to do it next time in a more directed (planning-based) manner, i.e. from [14]: "The point of view explored in this paper is that the goal of a learning system for robots is to be able to build internal models of tasks during execution of those tasks."
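
Here is a minimal numpy sketch of the LWR query step described above: Gaussian weights around the query point, then solve the weighted normal equations. This is just the batch query, not the real-time variant from [14], and the bandwidth and toy data are assumptions.

import numpy as np

def lwr_predict(X, y, x_query, h=0.3):
    """Locally weighted regression at a single query point.
    X: (N, d) inputs, y: (N,) outputs, h: kernel bandwidth (assumed)."""
    # Gaussian weights: points near the query count most
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * h ** 2))
    # augment with a bias column so the local model is affine
    Xa = np.hstack([X, np.ones((len(X), 1))])
    W = np.diag(w)
    # weighted normal equations: (Xa' W Xa) beta = Xa' W y
    beta = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ y)
    return np.append(x_query, 1.0) @ beta

# toy check: noisy samples of y = sin(x)
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
print(lwr_predict(X, y, np.array([1.5])))  # close to sin(1.5) ~ 0.997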

From Goren Gordon's Hierarchical Curiosity Loops

"Learning the inverse model is usually much harder than the forward model (Jordan1992Nguyen-Tuong,Peters, Seeger, & Scholkopf, 2008). These models are usually usedfor trajectory prediction and planning for robotic arms (Behera,Gopal, & Chaudhury, 1995Ouyang, Zhang, & Gupta, 2006) or description of internal models in the brain (Kawato1999Lalazar& Vaadia, 2008Shadmehr & Krakauer, 2008). Many learning algorithms have been developed for them (Cheah, Liu, & Slotine,2006Nguyen-Tuong et al.2008Ouyang et al.2006Wainscott,
Donchin, & Shadmehr, 2005). However, the training sets were always composed of random presentation of input–output pairs. The basic curiosity loop attempts to find the best input–output pair
presentation by selecting the appropriate actions. [Thats what SSA does too...]"


Jordan, M. I. (1992). Forward models: supervised learning with a distal teacher.
Cognitive Science, 16, 307–354.
http://www.inf.ed.ac.uk/teaching/courses/mlsc/Notes/Lecture11/jordan-CS92.pdf
[The inverse model maps sensorimotor state s(t) plus the desired/next sensory state s(t+1) to the motor action m(t); or, in delta form: the change in sensor state Δs = s(t+1) − s(t) and the current joint angles m(t) map to the change in joint angles Δm = m(t+1) − m(t). Inverse models can be arbitrarily complex. One of the simplest things to do is just to have a forward model (i.e. a simulation, as in Hod Lipson and Josh Bongard's work with the robot that models itself), run it several times, see what happens, and decide what to do after watching the results of the internal simulation. The idea of inverse models, however, is that it might be possible to store some data structure that means one doesn't have to run such simulations, which might be expensive. This paper is about learning inverse models directly, through self-supervised learning during experience.]
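
As a toy illustration of the "just run the forward model several times" idea: invert a known forward model by sampling candidate motor commands and keeping whichever lands closest to the goal. The one-step plant here is an assumption made up for the example, not anything from the paper.

import numpy as np

rng = np.random.default_rng(0)

def forward_model(s, m):
    """Stand-in forward model / internal simulation (assumed dynamics)."""
    return s + 0.5 * np.tanh(m)

def inverse_by_sampling(s, s_goal, n_samples=500):
    """Pick the motor command whose simulated outcome lands closest to
    the goal -- no stored inverse model, just repeated forward rollouts."""
    candidates = rng.uniform(-2, 2, size=n_samples)
    outcomes = forward_model(s, candidates)
    return candidates[np.argmin(np.abs(outcomes - s_goal))]

m = inverse_by_sampling(s=0.0, s_goal=0.3)
print(m, forward_model(0.0, m))  # chosen command and its simulated result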



Nguyen-Tuong, D., Peters, J., Seeger, M., & Scholkopf, B. (2008). Learning inverse
dynamics: a comparison. European Symposium on Artificial Neural Networks
(ESANN), pp. 13–18.
http://eprints.pascal-network.org/archive/00004342/01/ESANN2008-Nguyen-Tuong_4936%5B0%5D.pdf
[Says LWPR is the best method for learning inverse models, etc. See also:
http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Nguyen-Tuong-ModelLearningLocalGaussianl_6067%5b0%5d.pdf

Non-parametric regression methods are used to learn inverse models from behaviour, e.g. Gaussian process regression (GPR) or locally weighted projection regression (LWPR). They are less restricted than parametric models. Looks like I'm going to have to learn about learning inverse models, and try this out on the NAO! Damn, and there seem to be a shit load of ways of doing it. 

http://homepages.inf.ed.ac.uk/svijayak/publications/klanke-JMLR2008.pdf

TRY TO USE PYTHON LWPR LIBRARY 

http://wcms.inf.ed.ac.uk/ipab/slmc/research/software-lwpr
http://homepages.inf.ed.ac.uk/svijayak/publications/vijayakumar-NeuCom2005.pdf
http://wcms.inf.ed.ac.uk/ipab/slmc/research/lwpr/lwpr-doc.pdf

]
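
Following the TODO above, a minimal usage sketch of the Edinburgh LWPR Python bindings, based on my reading of lwpr-doc.pdf; treat the exact attribute names and the toy problem as assumptions to verify against the docs.

import numpy as np
from lwpr import LWPR  # Edinburgh LWPR Python bindings (links above)

rng = np.random.default_rng(0)

# inverse-model-shaped problem: 2 inputs (s(t), s(t+1)) -> 1 output m(t)
model = LWPR(2, 1)
model.init_D = 20 * np.eye(2)    # initial distance metric (bandwidth)
model.update_D = True            # let receptive fields adapt their metric

# train incrementally on toy data where m = 2 * (s(t+1) - s(t))
for _ in range(1000):
    s, s1 = rng.uniform(-1, 1, 2)
    m = 2 * (s1 - s)
    model.update(np.array([s, s1]), np.array([m]))

print(model.predict(np.array([0.0, 0.3])))  # expect roughly [0.6]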

Learning Inverse Kinematics
Aaron D’Souza
[Very good paper explaining a local method for learning inverse models. Basically, you learn

(Δs = s(t+1) − s(t) [change in sensor state], m(t) [current joint angles]) → Δm = m(t+1) − m(t) [change in joint angles]

by storing data during exploration. While exploring, you use LWPR to approximate the above function. Action selection is slowly taken over by LWPR rather than being randomly generated. The input to LWPR for action selection comes from a policy for how to change the sensory state given a goal; e.g. "simply head straight for the goal in sensor space" is one such policy.]
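
A hedged sketch of that bootstrapping loop, with a k-nearest-neighbour average standing in for LWPR so it runs standalone; the blending schedule, plant, and simplified Δs → m mapping are assumptions, not D'Souza's actual setup.

import numpy as np

rng = np.random.default_rng(0)

def plant(s, m):
    """Toy one-step plant standing in for the robot (assumed dynamics)."""
    return s + 0.5 * m

X, Y = [], []   # memory of experienced (sensor change -> motor command)

def inverse_predict(ds, k=5):
    """Nearest-neighbour stand-in for LWPR: average the motor commands of
    the k stored examples whose sensor change is closest to the query."""
    d = np.abs(np.array(X) - ds)
    idx = np.argsort(d)[:k]
    return float(np.mean(np.array(Y)[idx]))

s, goal = 0.0, 1.0
for t in range(300):
    p_model = min(1.0, len(X) / 100)        # model slowly takes over
    ds_desired = goal - s                   # policy: head straight for goal
    if X and rng.random() < p_model:
        m = inverse_predict(ds_desired)     # model-chosen action
    else:
        m = rng.uniform(-1, 1)              # random babble
    s1 = plant(s, m)
    X.append(s1 - s)                        # experienced sensor change
    Y.append(m)                             # motor command that caused it
    s = s1

print(s)  # should end near the goal once the inverse model has data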


Kawato, M. M. (1999). Internal models for motor control and trajectory planning.
Current Opinion in Neurobiology, 9, 718–727.
http://www.cns.atr.jp/~mieko/CONB.pdf
[Rather too general review with too much information about coupled inverse/forward models. Couldn't understand it on a second reading, but good pointers to the literature. Will come back to this.] 

Behera, L., Gopal, M., & Chaudhury, S. (1995). Self-organizing neural networks
for learning inverse dynamics of a robot manipulator. IEEE/IAS International
Conference on Industrial Automation and Control (I A & C '95), pp. 457–460.

Ouyang, P. R., Zhang, W. J., & Gupta, M. M. (2006). An adaptive switching learning
control method for trajectory tracking of robot manipulators. Mechatronics, 16,
51–61.
http://www.ryerson.ca/~pouyang/an%20adaptive%20switching%20learning%20control%20method.pdf

Lalazar, H., & Vaadia, E. (2008). Neural basis of sensorimotor learning: modifying
internal models. Current Opinion in Neurobiology, 18, 573–581.
http://web.media.mit.edu/~adamb/papers/lalazar_sensorimotor%20learning_review.pdf

Shadmehr, R., & Krakauer, J. W. (2008). A computational neuroanatomy for motor
control. Experimental Brain Research, 185, 359–381.
http://www.jsmf.org/meetings/2008/may/Shadmehr&krakauer.pdf

Wainscott, S. K., Donchin, O., & Shadmehr, R. (2005). Internal models and
contextual cues: encoding serial order and direction of movement. Journal of
Neurophysiology, 93, 786–800.
http://www.shadmehrlab.org/Reprints/jnp_05.pdf


