How Does a Poker Solver Work Inside?
A poker solver builds a game tree, plays it against itself with regret minimization until neither side can improve, and outputs the equilibrium as frequencies.
A poker solver works by structured trial and error. It builds a game tree of every action both players could take in a spot, plays that tree against itself thousands of times while nudging both sides toward whatever performed better, and reports the stable end point, a close approximation of the Nash equilibrium, as frequencies. Nothing is looked up from a chart; the answer is computed from your inputs each time. Four parts make the machine run.
The four parts
- The game tree — a map of every legal action sequence from the spot you define to the end of the hand.
- The ranges — the hands each player can hold, weighted by how often.
- The learning algorithm — the rule that shifts strategy toward better choices each pass, usually from the CFR family.
- The convergence check — the measure that tells the solver when to stop.
Drop any one and there’s no solve. Supply all four and the solver grinds toward equilibrium on its own.
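To make the second part concrete, here is a minimal sketch of a range held as a mapping from hand class to weight. The representation and the helper names `combo_count` and `weighted_combos` are assumptions for illustration, not the format of any particular solver, and card-removal effects from the board are ignored.

```python
# Hypothetical representation: hand class -> weight between 0.0 and 1.0,
# meaning the fraction of that hand's combos the player is assumed to hold.
example_range = {"AA": 1.0, "AKs": 1.0, "AKo": 0.5, "76s": 0.25}


def combo_count(hand: str) -> int:
    """Number of card combinations in a hand class, ignoring blockers."""
    if len(hand) == 2:       # pocket pair such as "AA"
        return 6
    if hand.endswith("s"):   # suited, such as "AKs"
        return 4
    if hand.endswith("o"):   # offsuit, such as "AKo"
        return 12
    raise ValueError(f"unrecognised hand class: {hand}")


def weighted_combos(rng: dict[str, float]) -> float:
    """Total combos in the range after applying each hand's weight."""
    return sum(combo_count(hand) * weight for hand, weight in rng.items())


print(weighted_combos(example_range))
```

Halving the weight on a hand halves how often it appears in the solve, which is why a mis-weighted range changes the answer even when the hand list itself looks right.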
Building the tree, then playing it against itself
You start by defining the spot: effective stacks, board, and — critically — the bet sizes you allow. From that the solver builds the tree, where each branch is an action and each root-to-leaf path is one complete way the hand plays out. Branches multiply fast, so solvers restrict the allowed sizings rather than permitting every possible amount. Fewer sizings means a smaller tree and a faster solve. That deliberate simplification is called abstraction, and it’s one reason your choice of sizings shapes the answer you get back.
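A rough way to see how quickly branches multiply is to count complete action sequences on a single street. The sketch below uses a toy betting model (two players, a fixed menu of sizes, a cap on raises, no stack limits, no later streets); the function name `count_lines` and its parameters are hypothetical, and real solver trees are built with more rules than this.

```python
def count_lines(num_sizes: int, max_raises: int) -> int:
    """Count complete action sequences on one street in a toy betting model."""

    def facing_bet(raises_left: int) -> int:
        # Folding and calling both end the street: two leaves.
        leaves = 2
        if raises_left > 0:
            # Each allowed raise size opens a subtree where the other player faces a bet.
            leaves += num_sizes * facing_bet(raises_left - 1)
        return leaves

    def no_bet_yet(checks_so_far: int) -> int:
        # Each allowed bet size opens a subtree where the opponent faces a bet.
        leaves = num_sizes * facing_bet(max_raises)
        if checks_so_far == 0:
            leaves += no_bet_yet(1)   # first player checks, second player acts
        else:
            leaves += 1               # check-check ends the street
        return leaves

    return no_bet_yet(0)


for sizes in (1, 2, 3):
    print(sizes, "size(s):", count_lines(sizes, max_raises=2), "lines")
```

In this model, going from one allowed size to three with two raises permitted takes the street from 13 lines to 157, and every extra street multiplies that again.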
Then the algorithm takes over. It starts both players with arbitrary strategies and iterates:
- Each pass, it walks the tree and asks at every decision point whether a different action would have done better against the opponent’s current strategy.
- It records that “regret” — the value missed by not choosing the better action.
- It shifts each side toward the actions it regretted skipping, in proportion to that regret.
Repeat thousands of times and the strategies, averaged over all passes, settle at the point where neither side has anything left to regret, because no deviation would gain value. That regret-tracking method is counterfactual regret minimization (CFR), the engine inside most modern solvers.
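The three steps above can be sketched on a game with a single decision per player. This is only the core regret-matching update, not full CFR, which applies the same idea at every decision point of the tree and weights it by how likely that point is to be reached. The payoff matrix is hypothetical, and the names `regret_matching` and `solve` are illustrative.

```python
# Hypothetical payoffs to the row player; the column player receives the negative.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching(regrets):
    """Turn accumulated regrets into a strategy: play in proportion to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        # Nothing regretted yet: play every action equally often.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve(iterations):
    """Self-play with regret matching; returns both players' average strategies."""
    n_rows, n_cols = len(PAYOFF), len(PAYOFF[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)

        # Value of each pure action against the opponent's current strategy.
        row_values = [sum(PAYOFF[i][j] * col[j] for j in range(n_cols)) for i in range(n_rows)]
        col_values = [-sum(PAYOFF[i][j] * row[i] for i in range(n_rows)) for j in range(n_cols)]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))

        # Regret is the value missed by not having played that action outright.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += col[j]

    # The average over all passes, not the final pass, is what converges.
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


row_avg, col_avg = solve(20000)
print(row_avg, col_avg)
```

For this matrix the equilibrium has both players choosing their first action 40 percent of the time, so the averaged strategies should drift toward a 40/60 split as the iteration count grows.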
Knowing when it’s done
The solver needs a stopping signal, and it uses exploitability: how much a perfectly adjusting opponent could win against the current strategy. As iterations pile up, exploitability falls toward zero.
| Exploitability | What it means |
|---|---|
| High | Early, rough strategy — don’t trust it |
| Moderate | Directionally right, still refining |
| Low | Converged — safe to study |
When exploitability is low enough, extra iterations barely move the strategy. That’s convergence, and it’s your signal the output can be trusted. Stop too early and you’re reading a half-solved, misleading answer.
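As a sketch of that stopping check, exploitability in a small zero-sum game can be computed by letting each side switch to its best response and adding up what both would gain. The payoff matrix, the `TARGET` threshold, and the function names are assumptions for illustration; commercial solvers choose their own units and defaults.

```python
# Hypothetical payoffs to the row player; the column player receives the negative.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]
TARGET = 0.01  # hypothetical stopping threshold, in the same units as the payoffs


def exploitability(payoff, row, col):
    """Total gain available if each side switched to a best response."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Best the row player could do against the column strategy.
    best_row = max(sum(payoff[i][j] * col[j] for j in range(n_cols)) for i in range(n_rows))
    # Lowest value the column player could hold the row strategy to.
    best_col = min(sum(payoff[i][j] * row[i] for i in range(n_rows)) for j in range(n_cols))
    return best_row - best_col


def is_converged(payoff, row, col, target=TARGET):
    """True once a perfect opponent could gain no more than the target."""
    return exploitability(payoff, row, col) <= target


for label, row, col in [
    ("rough", [0.5, 0.5], [0.5, 0.5]),
    ("near equilibrium", [0.4, 0.6], [0.4, 0.6]),
]:
    print(label, round(exploitability(PAYOFF, row, col), 4), is_converged(PAYOFF, row, col))
```

A solver loop would run this check every so often and stop once it passes, which is the same judgment you make by eye when you read the exploitability figure.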
Reading the output
The solve finishes as mixed frequencies, not commands. A hand might bet 60 percent of the time and check 40 percent. That isn’t indecision — it’s the whole point. If a hand always did one thing, an opponent could adjust to punish it; mixing at the equilibrium rate keeps it unexploitable.
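A small worked example shows why the exact rate matters. It assumes a simplified river spot that is not taken from any solver output: the bettor holds either an unbeatable hand or a pure bluff, and the caller holds a hand that only beats bluffs. The function names are hypothetical.

```python
def call_ev(pot: float, bet: float, bluff_fraction: float) -> float:
    """Caller's expected value of calling with a bluff-catcher; folding is worth 0."""
    win = pot + bet   # collected when the bettor was bluffing
    lose = bet        # lost when the bettor had the value hand
    return bluff_fraction * win - (1.0 - bluff_fraction) * lose


def indifference_bluff_fraction(pot: float, bet: float) -> float:
    """Bluff share of the betting range that makes calling and folding equal."""
    return bet / (pot + 2.0 * bet)


pot, bet = 100.0, 100.0
balanced = indifference_bluff_fraction(pot, bet)

for label, fraction in [("too few bluffs", 0.2), ("balanced", balanced), ("too many bluffs", 0.5)]:
    ev = call_ev(pot, bet, fraction)
    response = "call" if ev > 1e-9 else "fold" if ev < -1e-9 else "indifferent"
    print(f"{label}: bluffs {fraction:.2f}, call EV {ev:+.1f}, best response: {response}")
```

With a pot-sized bet, the balanced bluff share is one third. Bluff less and the caller profits by always folding; bluff more and the caller profits by always calling. Only the mixed rate leaves nothing to adjust to.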
Which is exactly why the mechanism matters for study. Your inputs are load-bearing, so sloppy ranges mean a clean solution to the wrong problem. Convergence isn’t optional — read the exploitability figure before you trust a number. And study why a hand mixes rather than memorizing the exact split, because the reasoning transfers to thousands of spots the number won’t. For the plain-language overview, start with what a GTO solver is; for the full study routine built on this, see how to use a poker solver and drill the postflop spots where it pays off most.
Frequently asked
What algorithm do solvers use?
Most use an algorithm in the counterfactual regret minimization family, usually called CFR. It tracks how much each side “regrets” not taking other actions and steadily shifts play toward the choices it regretted skipping, converging on equilibrium.
What is convergence in a solver?
Convergence is the point where extra iterations barely change the strategy. Solvers measure it with exploitability — how much a perfect opponent could win against the current strategy. A low number means the solve is trustworthy.
Why do solvers output mixed frequencies?
Because at equilibrium some hands must take different actions at set rates to stay unexploitable. A hand might bet 60 percent and check 40 percent of the time, since always doing one or the other would let an opponent adjust profitably.