I've started working with Boris on his variant of Monte Carlo Tree Search (MCTS). Below is one version of MCTS, taken from https://github.com/glesica/mcts-project/blob/master/paper.markdown

function TREEPOLICY(v)
    while v is nonterminal do
        if v is not fully expanded then
            return EXPAND(v)
        else
            v = BESTCHILD(v, Cp)
    return v

function EXPAND(v)
    a = an untried action, valid at v
    v' = result of applying a to v
    add v' to v as a new child
    return v'

function BESTCHILD(v, c)
    return argmax of the children of v, based on weight (see the linked paper)

function DEFAULTPOLICY(s)
    while s is nonterminal do
        choose a valid action based on s, uniformly at random
        s = result of applying the action to s
    return reward for state s

function BACKUP(v, d)
    while v is not null do
        increment visit count of v
        update value of v based on d
        v = parent of v
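To check my reading of the listing, here is a minimal runnable sketch in Python. It assumes a single-player problem with rewards in [0, 1]; ToyProblem (pick 4 bits, reward = fraction of 1s), Node, uct_search and the iteration count are hypothetical names and choices for illustration only. Two further assumptions: expand() explicitly attaches the new node to v's children so the tree actually grows, and best_child() uses the UCB1 weight from UCT, Q(v')/N(v') + c * sqrt(2 ln N(v) / N(v')), with Cp = 1/sqrt(2). That weight may differ from the one described in the linked paper.

```python
import math
import random


class ToyProblem:
    """Hypothetical single-player problem: pick DEPTH bits one at a time.
    The reward is the fraction of bits set to 1, so it lies in [0, 1]."""

    DEPTH = 4

    def is_terminal(self, state):
        return len(state) == self.DEPTH

    def actions(self, state):
        # Both bits are valid until the sequence is full.
        return [] if self.is_terminal(state) else [0, 1]

    def apply(self, state, action):
        return state + (action,)

    def reward(self, state):
        return sum(state) / self.DEPTH


class Node:
    def __init__(self, state, parent=None, action=None, untried=()):
        self.state = state
        self.parent = parent
        self.action = action          # action that led from parent to this node
        self.children = []
        self.untried = list(untried)  # actions not yet expanded from this node
        self.visits = 0               # N(v)
        self.value = 0.0              # Q(v): total reward backed up through v


def tree_policy(problem, v, cp):
    while not problem.is_terminal(v.state):
        if v.untried:                 # v is not fully expanded
            return expand(problem, v)
        v = best_child(v, cp)
    return v


def expand(problem, v):
    a = v.untried.pop()               # an untried action, valid at v
    s = problem.apply(v.state, a)
    child = Node(s, parent=v, action=a, untried=problem.actions(s))
    v.children.append(child)          # attach v' so later iterations can reach it
    return child


def best_child(v, c):
    # Assumed weight: UCB1 as used in UCT. Every child has visits >= 1 here,
    # because each expanded node is backed up right after expansion.
    def weight(child):
        exploit = child.value / child.visits
        explore = c * math.sqrt(2 * math.log(v.visits) / child.visits)
        return exploit + explore
    return max(v.children, key=weight)


def default_policy(problem, s, rng):
    # Uniformly random rollout until a terminal state is reached.
    while not problem.is_terminal(s):
        s = problem.apply(s, rng.choice(problem.actions(s)))
    return problem.reward(s)


def backup(v, d):
    # Add the same reward d to every node on the path back to the root.
    while v is not None:
        v.visits += 1
        v.value += d
        v = v.parent


def uct_search(problem, root_state, iterations=1000, cp=1 / math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(root_state, untried=problem.actions(root_state))
    for _ in range(iterations):
        v = tree_policy(problem, root, cp)
        d = default_policy(problem, v.state, rng)
        backup(v, d)
    # Pick the final move by average reward alone (c = 0).
    return best_child(root, 0).action


print(uct_search(ToyProblem(), ()))  # prints 1: start with a 1 bit
```

As in the listing, BACKUP adds the same d at every level, which suits a single-agent problem like this one. For a two-player game one would usually negate d from one level to the next, which this sketch does not do.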