Editer l'article Suivre ce blog Administration + Créer mon blog
19 juin 2012 2 19 /06 /juin /2012 10:11
MCTS algorithm
Input: a state S
Output: a decision D
Global variable: a tree T, each node having nbSims
    initiliazed at a single node with state S and nbSims=0 and totalReward=0
While (I have time)
    Do one simulation from state S until a game is over 
            ==> we get a sequence s(0), s(1), s(2), s(3),..., s(k)     (with s(0)=S and s(k) a final state)
            ==> we get a reward r
    Consider i minimal such that s(i) is not in T
    If such an i exists, then add s(i) in T, with an edge from s(i-1) to s(i)
    For each i such that s(i) is in T:
Procedure for doing one simulation from state S until a game is over:
while (s it not terminal)
    choose child s' of s such that UCB(s') is maximum
Procedure for computing the UCB score of a node s':
    meanReward=s'.totalReward / s'.nbSims
    UCB = meanReward + sqrt( log(fatherSims) / s'.nbSims)   
Based on research in the TAO team
Partager cet article


Voir Blog(fermaton.over-blog.com)No.22- THÉORÈME ÉTAT. - La base des mathématiques de l'Esprit.