Monte Carlo Tree Search (MCTS)

A decision algorithm suitable for planning in combinatorial game actions. It is essentially a heuristic search method.

The process consists of four steps:

Tree traversal
Node expansion
Rollout (Random simulation)
Backpropagation

UCB Formula (Upper Confidence Bounds)

– simulation estimate (cumulative value of the node divided by its visit count)
– visit count of the parent node
– visit count of the node itself

The UCB1 formula is commonly used in MCTS.

Steps

Each node in the tree has two values:

Q – cumulative value
N – visit count

(Selection) Starting from the root node, repeatedly select the child node with the largest UCB value.
Is the node expandable?
- Yes → Expansion.
- No → Rollout.
(Expansion) Generate new child node(s), then perform a rollout. Initialize new nodes with Q = 0; N = 0.
(Rollout) Continue random simulations until a terminal state is reached.
(Backpropagation) Using the terminal state value obtained from the rollout, update the nodes along the path by incrementing N += 1 and Q += terminal value.

MCTS uses the UCB formula to bias the algorithm toward high-value paths and ensures that node estimates converge to true values through increased iterations.

Monte Carlo Tree Search (MCTS) ​

UCB Formula (Upper Confidence Bounds) ​

Steps ​

Monte Carlo Tree Search (MCTS)

UCB Formula (Upper Confidence Bounds)

Steps