Monte-Carlo Tree Search in one page

<< Fast Graphical User Interface... A small selected sample of... >>

19 juin 2012 2 19 /06 /juin /2012 10:11

MCTS algorithm

Input: a state S

Output: a decision D

Global variable: a tree T, each node having nbSims

initiliazed at a single node with state S and nbSims=0 and totalReward=0

While (I have time)

{

Do one simulation from state S until a game is over

==> we get a sequence s(0), s(1), s(2), s(3),..., s(k) (with s(0)=S and s(k) a final state)

==> we get a reward r

Consider i minimal such that s(i) is not in T

If such an i exists, then add s(i) in T, with an edge from s(i-1) to s(i)

For each i such that s(i) is in T:

s(i).nbSims++

s(i).totalReward+=r

}

Procedure for doing one simulation from state S until a game is over:

s=S

while (s it not terminal)

{

choose child s' of s such that UCB(s') is maximum

s=s'

}

Procedure for computing the UCB score of a node s':

{

meanReward=s'.totalReward / s'.nbSims

fatherSims=s'.father.nbSims

UCB = meanReward + sqrt( log(fatherSims) / s'.nbSims)

}

Based on research in the TAO team

Partager cet article

Repost0

Published by teytaud - dans Recherche
commenter cet article …

<< Fast Graphical User Interface... A small selected sample of... >>

commentaires

E

ewe

17/10/2012 13:34

Voir Blog(fermaton.over-blog.com)No.22- THÉORÈME ÉTAT. - La base des mathématiques de l'Esprit.