Monte Carlo Tree Search (MCTS)
A decision algorithm suitable for planning in combinatorial game actions. It is essentially a heuristic search method.
The process consists of four steps:
- Tree traversal
- Node expansion
- Rollout (Random simulation)
- Backpropagation
UCB Formula (Upper Confidence Bounds)
- – simulation estimate (cumulative value of the node divided by its visit count)
- – visit count of the parent node
- – visit count of the node itself
The UCB1 formula is commonly used in MCTS.
Steps
Each node in the tree has two values:
Q– cumulative valueN– visit count
- (Selection) Starting from the root node, repeatedly select the child node with the largest UCB value.
- Is the node expandable?
- Yes → Expansion.
- No → Rollout.
- (Expansion) Generate new child node(s), then perform a rollout. Initialize new nodes with
Q = 0; N = 0. - (Rollout) Continue random simulations until a terminal state is reached.
- (Backpropagation) Using the terminal state value obtained from the rollout, update the nodes along the path by incrementing
N += 1andQ += terminal value.
MCTS uses the UCB formula to bias the algorithm toward high-value paths and ensures that node estimates converge to true values through increased iterations.