How to Program a Poker Solver
How to program a poker solver: build a game tree, then run counterfactual regret minimization (CFR) so the strategy converges toward a GTO equilibrium.
On this page · 6 sections
Picture the smallest interesting poker game: Kuhn poker, a three-card deck where each player is dealt one card and can check, bet, call, or fold. It has a handful of decision points and a known optimal strategy. If you can write a program that discovers that strategy on its own, you have written a poker solver — the exact same code, pointed at a vastly larger tree, is what solves Hold’em. So the task splits cleanly into two parts: model the spot as a game tree, then run counterfactual regret minimization (CFR) over it until the strategy stabilizes into a GTO equilibrium. Everything below builds those two pieces.
Building the game tree
The tree is the map of the spot. Every position in the game is one of three kinds of node:
- Decision node — a player must choose an action.
- Chance node — a card is dealt (or the deck deals a hole card).
- Terminal node — the hand is over and the payoffs are fixed.
Programming it means writing a recursive structure that, from any state, can list the legal actions and produce the resulting child state. For Kuhn that tree fits in memory with room to spare. For Hold’em it does not — the full tree is astronomically large — so real engines abstract it: bucket strategically similar hands together and permit only a few bet sizes per node instead of the continuum a human could choose from. That abstraction is the same simplification described in how a solver works internally, and how aggressively you abstract is the engineering trade-off that most shapes both your solver’s speed and its accuracy.
Tracking regret at each decision node
At every decision node you keep two arrays, one slot per legal action:
- Regret sum — cumulatively, how much better off you’d have been had you always taken that action.
- Strategy sum — the running total of the strategies you actually played, so you can average them at the end.
The strategy the solver uses right now comes from regret matching: give each action a probability proportional to its positive regret. Actions you regret not taking get played more often; actions with no positive regret get zero weight. That rule, applied everywhere, is the engine that pushes the whole tree toward equilibrium.
The CFR iteration
A single CFR iteration walks the entire tree. At each decision node it:
- Computes the current strategy from the positive regrets.
- Recurses into each action to find that action’s counterfactual value.
- Adds, to each action’s regret sum, the gap between that action’s value and the node’s expected value.
- Adds the current strategy into the strategy sum.
You repeat this for thousands to millions of iterations. The output you keep is the average strategy over all iterations — never the last one.
Watching CFR converge on one node
Numbers make this concrete. Take a single decision node with two actions, bet and check. Suppose across three iterations the counterfactual values (in chips, relative to the node’s average) land like this:
| Iteration | Bet value | Check value | Regret added: bet | Regret added: check |
|---|---|---|---|---|
| 1 | +2 | 0 | +2 | 0 |
| 2 | +1 | 0 | +1 | 0 |
| 3 | 0 | +1 | 0 | +1 |
Cumulative regret is now bet = 3, check = 1. Regret matching splits play in proportion to positive regret, so this node plays bet 3/4 of the time and check 1/4 of the time. Run that at every node on every iteration and the whole tree drifts toward a strategy no opponent can profitably attack. That is the complete algorithm in one node — the production version is only this, repeated across millions of nodes at high speed.
Making it fast enough for Hold’em
A correct Python prototype is far too slow to solve Hold’em in any reasonable time. Real engines close the gap with several techniques stacked together:
- A compiled core. The iteration loop moves to C++ or Rust; the setup and tree-building can stay in Python.
- CFR+. A variant that floors regrets at zero and converges dramatically faster than vanilla CFR — usually the first optimization to add.
- Monte Carlo sampling. Instead of walking the whole tree each iteration, sample a path through it, so each pass is cheap and you do many more of them.
- Tight memory layout. Regret and strategy arrays for a large abstracted tree can consume many gigabytes, so how you pack them in RAM directly limits how big a spot you can solve.
If a full solver feels like too much for a first project, an equity calculator in Python uses far simpler math and is an excellent warm-up for the recursive, tree-walking mindset CFR demands.
Proving the solver actually works
Convergence bugs are silent. A broken CFR loop still emits a strategy — just the wrong one — so it will happily hand you confident, incorrect output. Validate before you trust anything:
- Solve Kuhn poker and check the game value. Kuhn has a published analytical solution and a fixed value. If your engine reproduces those numbers, the CFR machinery is correct.
- Track exploitability. After each batch of iterations, compute how much a best-response opponent could win against your current strategy. A healthy solver drives that number toward zero; a curve that stays flat means the loop isn’t learning.
- Confirm you’re averaging. If the output looks noisy or unstable, verify you’re reading the strategy-sum average rather than the last iteration’s strategy.
Only after a toy game solves cleanly should you scale the abstraction and bet sizes up toward something Hold’em-shaped.
The two skills, then, are modeling a spot as a game tree and running CFR until the average strategy settles. A toy solver is genuinely within reach in a weekend; a full Hold’em engine is a real performance project defined mostly by how cleverly you abstract and store that tree. It helps to understand what a GTO solver is and the equilibrium it targets before you write a line — the theory tells the code what to aim for. More build-your-own projects live in the tools & software hub.
Frequently asked
What algorithm do poker solvers use?
Almost all use counterfactual regret minimization (CFR) and its variants like CFR+ and Monte Carlo CFR. CFR plays the game against itself, tracks how much it regrets not taking each action, and shifts future play toward higher-regret actions until the strategy stops improving.
What language should I write a poker solver in?
Prototype in Python because it's fast to reason about, then port the hot loop to C++ or Rust for speed. Real solvers are performance-bound, so the production engine is almost always compiled, sometimes with tree-building kept in a higher-level language.
Is building a poker solver hard?
A toy solver for Kuhn poker is a weekend project once CFR clicks. A full No-Limit Hold'em solver is a major engineering effort because the game tree is enormous and must be abstracted, stored efficiently, and iterated billions of times.
What is a game tree in a solver?
It's the branching map of every possible sequence of actions and chance events in the modeled spot. Each node is a decision point for a player or a card deal, and the solver assigns and refines a strategy at every decision node.
How do I know my solver is correct?
Test it on Kuhn poker, which has a known game value and analytical solution. If your engine reproduces those numbers and its exploitability falls toward zero across iterations, the CFR loop is sound. A flat exploitability curve signals a bug.