How does a GTO poker solver work?

A solver takes a defined spot with stacks, positions, ranges, and bet sizes, then repeatedly plays both ranges against each other. On each pass it nudges every hand toward its most profitable action. After enough passes the strategy stops changing, and that stable point is the game-theory-optimal solution.

What is CFR in a poker solver?

CFR stands for Counterfactual Regret Minimization, the algorithm most solvers use. It tracks how much each action would have gained compared to what was chosen, then shifts future frequency toward the actions it 'regrets' not taking. Over many iterations this converges on the equilibrium.

Why do solver outputs show mixed frequencies?

Because the equilibrium often makes an opponent indifferent between their options, and the only way to keep them indifferent is to play a hand multiple ways at set frequencies. A 60 percent raise, 40 percent call output is the balance that prevents exploitation, not indecision.

Does GTO actually work in real games?

Yes, as a defensive baseline. In a heads-up pot, a true GTO strategy cannot be beaten in the long run regardless of how the opponent plays (rake aside). It won't punish weak players as hard as a targeted counter-strategy, but it never loses to one, which is why strong players use it as their default.

How Do GTO Poker Solvers Work?

A GTO poker solver works by pitting two ranges against each other over and over, nudging every hand toward its most profitable action on each pass, until the strategy stops changing. That stable point — where neither player can improve by deviating — is the game-theory-optimal solution. This guide opens the black box: what a solver takes in, the algorithm that drives it, and why the answers come out as mixed frequencies rather than fixed plays.

What you feed a solver

A solver can’t compute anything until the spot is fully defined. You give it:

Stack depth — how many big blinds each player has.
Positions — who acts first, who’s in position postflop.
Starting ranges — the set of hands each player begins with.
Bet sizes — which raise and bet amounts are allowed at each decision.

Change any input and the solution changes. A 100bb cash spot and a 20bb tournament spot with the same hole cards produce entirely different strategies. This is why matching the solve to your actual game matters, a point covered in best GTO poker solver.
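To make the four inputs concrete, here is a minimal Python sketch of how a spot might be described before solving. The `SpotSpec` class, its field names, the `problems` helper, and the sample ranges and sizes are all hypothetical illustrations, not the input format of any real solver.

```python
from dataclasses import dataclass


@dataclass
class SpotSpec:
    """Hypothetical container for the inputs a solve needs."""
    stack_bb: float   # effective stack depth in big blinds
    positions: tuple  # (first to act postflop, in position)
    ranges: dict      # position -> list of starting hands
    bet_sizes: dict   # street -> allowed bets as fractions of the pot


def problems(spec):
    """Return a list of reasons the spot is not fully defined."""
    found = []
    if spec.stack_bb <= 0:
        found.append("stack depth must be positive")
    for pos in spec.positions:
        if not spec.ranges.get(pos):
            found.append(f"no starting range for {pos}")
    if not any(spec.bet_sizes.values()):
        found.append("no bet sizes allowed anywhere")
    return found


# Illustrative values only, not recommended ranges or sizes.
cash_100bb = SpotSpec(
    stack_bb=100,
    positions=("BB", "BTN"),
    ranges={"BB": ["AA", "KQs", "76s"], "BTN": ["AKo", "TT", "A5s"]},
    bet_sizes={"flop": [0.33, 0.75], "turn": [0.75], "river": [0.75, 1.0]},
)
short_20bb = SpotSpec(
    stack_bb=20,
    positions=("BB", "BTN"),
    ranges=cash_100bb.ranges,
    bet_sizes={"flop": [0.33], "turn": [], "river": []},
)

print(problems(cash_100bb))      # an empty list means the spot is fully defined
print(cash_100bb == short_20bb)  # same hands, different spot
```

The two specs share the same ranges but differ in stack depth and allowed sizes, so a solver would treat them as separate problems with separate solutions.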

The core idea: iterate toward equilibrium

Once the spot is defined, the solver’s job is to find the Nash equilibrium — the strategy pair where neither player can gain by changing their own play. It finds it not with a formula but by repetition.

The loop, in plain terms:

Start both players with a rough strategy (often uniform).
Play every hand in each range against every hand in the other, across all boards.
For each decision, see which action would have paid better.
Shift a little frequency toward the better actions.
Repeat — thousands or millions of times.

Each pass, the strategies improve slightly and the gap between “what I did” and “what I should have done” shrinks. When that gap is tiny, the solver has converged.

CFR: the engine under the hood

The algorithm that runs this loop is usually Counterfactual Regret Minimization, or CFR. The name describes exactly what it does:

Counterfactual — it asks, “what if I had taken a different action here?”
Regret — it measures how much better that alternative would have done than what it actually chose.
Minimization — it steers future frequency toward the high-regret actions, so over time the regret for every action drops toward zero.

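When no action carries meaningful regret anymore, the player can’t improve, and neither can the opponent. That mutual no-improvement is the equilibrium. Strictly speaking, in standard CFR it is the strategy averaged over all iterations that carries the convergence guarantee, not the strategy from the final pass.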
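The loop and the regret update can be sketched on a toy river game small enough to solve by hand. This is plain regret matching on a 2x2 matrix, the building block that CFR applies at every decision point. It is not a full CFR implementation and not the code of any real solver. The game itself is an assumption made for illustration: the pot is 1, the bettor holds a value hand 40 percent of the time and always bets pot with it, and otherwise holds air that can bluff or check; the caller can call or fold.

```python
# Bettor's expected payoff, in pots, in a hypothetical toy river game.
# Rows: what the bettor does with air (bluff, check).
# Columns: what the caller does facing a bet (call, fold).
# Assumes pot = 1, bet = 1, and the bettor holds value 40% of the time.
PAYOFF = [
    [0.2, 1.0],  # bluff air: 0.4*2 - 0.6*1 if called, wins the pot if not
    [0.8, 0.4],  # check air: only the value bets get called or folded to
]


def regret_matching(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        return [1.0 / len(regrets)] * len(regrets)  # no regret yet: uniform
    return [p / total for p in positive]


def solve(payoff, iterations=20000):
    """Self-play with regret matching; returns the average strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)
        # Counterfactual step: value of each pure action against the
        # opponent's current mix. The caller's payoff is the negative.
        row_val = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                   for i in range(n_rows)]
        col_val = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                   for j in range(n_cols)]
        row_ev = sum(p * v for p, v in zip(row, row_val))
        col_ev = sum(p * v for p, v in zip(col, col_val))
        for i in range(n_rows):
            row_regret[i] += row_val[i] - row_ev  # regret for not playing i
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_val[j] - col_ev
            col_sum[j] += col[j]
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


bettor, caller = solve(PAYOFF)
print("bluff frequency with air:", round(bettor[0], 3))
print("call frequency:", round(caller[0], 3))
```

Solving the indifference conditions by hand gives a bluff frequency of 1/3 with air and a call frequency of 1/2, and the averaged strategies should land close to those values. The individual passes swing between extremes; it is the average that settles.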

Why the outputs are mixed

The most surprising thing about solver output is that it rarely says “always do X.” Instead it says “raise 62 percent, call 38 percent.” That’s not hedging — it’s the point.

The equilibrium often works by making your opponent indifferent between their options. If you always raised a certain hand, they could exploit that by folding to it. If you always called, they’d exploit that too. The only unexploitable answer is a specific mix. The solver computes the exact frequency that leaves the opponent unable to profit from any single response. That balancing act is the heart of what is GTO poker.
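The indifference idea can be checked with a few lines of arithmetic. The sketch below assumes a simplified river spot in which the bettor’s value hands always beat the caller and the bluffs always lose; `call_ev` is a hypothetical helper written for this illustration, not part of any solver.

```python
def call_ev(bluff_share, pot=1.0, bet=1.0):
    """Caller's EV of calling, relative to folding (which is worth 0).

    bluff_share is the fraction of the betting range that is a bluff.
    Assumes value bets always win and bluffs always lose at showdown.
    """
    win = bluff_share * (pot + bet)    # caller picks up the pot plus the bet
    lose = (1.0 - bluff_share) * bet   # caller loses the amount called
    return win - lose


# Facing a pot-size bet with three different bluffing frequencies.
for share in (0.20, 1 / 3, 0.50):
    print(f"bluffs = {share:.0%} of bets, EV(call) = {call_ev(share):.2f} pots")
```

With too few bluffs, calling loses money and the opponent should always fold; with too many, calling wins and they should always call. Only at one-third bluffs is calling worth exactly the same as folding, so no single response gains anything.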

Convergence and exploitability

How does a solver know it’s done? It tracks a number called exploitability: how much a perfect, best-responding opponent could win against the current strategy. It is commonly reported in big blinds per 100 hands or as a percentage of the pot.

Exploitability	Interpretation
High	Strategy still has clear leaks; keep iterating
Low	Close to equilibrium; nearly unexploitable
Near zero	Converged; a perfect opponent gains almost nothing

A solve is considered “done” when exploitability drops below a small threshold. Run-your-own solvers let you choose how low to push it, trading computing time for accuracy.
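As a rough illustration of what that number measures, the sketch below computes exploitability for a pair of strategies in a small two-player zero-sum matrix game: the total each side could gain by switching to a best response. The payoff matrix is a hypothetical toy river game (pot of 1, pot-size bet, bettor holding value 40 percent of the time), and the result is in pots rather than big blinds per 100 hands. A real solver does the equivalent calculation over the whole game tree.

```python
# Bettor's payoff in pots. Rows: bluff or check with air.
# Columns: caller calls or folds. Hypothetical toy game.
PAYOFF = [
    [0.2, 1.0],
    [0.8, 0.4],
]


def exploitability(payoff, row, col):
    """Combined gain available to both players from best-responding."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Most the row player can win against the column player's mix.
    best_row = max(sum(payoff[i][j] * col[j] for j in range(n_cols))
                   for i in range(n_rows))
    # Least the column player can hold the row player to against row's mix.
    best_col = min(sum(payoff[i][j] * row[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


always_bluff = exploitability(PAYOFF, [1.0, 0.0], [0.5, 0.5])
uniform = exploitability(PAYOFF, [0.5, 0.5], [0.5, 0.5])
balanced = exploitability(PAYOFF, [1 / 3, 2 / 3], [0.5, 0.5])
print(round(always_bluff, 3), round(uniform, 3), round(balanced, 3))
```

The three cases line up with the rows of the table: always bluffing is highly exploitable, bluffing half the time is closer, and the balanced mix leaves a best-responding opponent with nothing to gain.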

The combo math the solver never forgets

A solver weighs every hand by its true combination count, something human players routinely misjudge:

Pocket pairs: 6 combos each.
Suited hands: 4 combos each.
Offsuit hands: 12 combos each.

So when a solve outputs “3-bet A-5 suited as a bluff,” it’s committing only 4 combos of bluff, and it balances that against the combos of value it’s raising. The solver’s frequencies are always computed over these real combo weights — one reason its ranges feel more precise than a hand-picked chart. The pot-odds logic that sets the value-to-bluff ratio comes from the odds and math hub.
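These counts are easy to verify by enumerating the deck. The sketch below is a minimal illustration; `count_combos` and its `dead` parameter (cards already visible on the board or in your own hand) are hypothetical names chosen for this example.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def count_combos(rank_a, rank_b, suited=None, dead=()):
    """Count two-card combos of the given ranks, skipping dead cards.

    suited=True counts suited combos only, suited=False offsuit only,
    and suited=None counts both (use None for pocket pairs).
    """
    live = [card for card in DECK if card not in dead]
    wanted = sorted((rank_a, rank_b))
    count = 0
    for c1, c2 in combinations(live, 2):
        if sorted((c1[0], c2[0])) != wanted:
            continue
        same_suit = c1[1] == c2[1]
        if suited is None or suited == same_suit:
            count += 1
    return count


print(count_combos("A", "A"))                # pocket pair
print(count_combos("A", "5", suited=True))   # suited hand
print(count_combos("A", "K", suited=False))  # offsuit hand
# With the ace of spades dead, every ace-containing hand gets rarer.
print(count_combos("A", "A", dead=("As",)))
print(count_combos("A", "K", suited=False, dead=("As",)))
```

The last two lines show why removing a single card matters: one dead ace cuts pocket aces from 6 combos to 3 and offsuit ace-king from 12 to 9.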

A worked example: reading a solve

Say you solve a river spot where you can bet pot or check. The solver outputs: bet with all your strong value hands, bet with a specific slice of missed draws as bluffs, and check the rest.

Why that exact slice of bluffs? Because a pot-size bet lays your opponent 2-to-1: they risk one pot to win two, so a break-even call needs to win one time in three. The solver picks a bluff frequency that makes the value-to-bluff ratio roughly 2-to-1, which means about a third of your bets are bluffs and the caller is left indifferent. It chooses which specific hands to bluff by preferring the ones with the best blockers (cards that make the opponent less likely to hold a calling hand). That’s not a memorized answer; it’s what fell out of CFR minimizing regret over real combos.
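The same arithmetic extends to other bet sizes and to actual combo counts. The sketch below is a simplified model that assumes value bets always win when called and bluffs always lose; `river_bluff_math` is a hypothetical helper, and real solver output will drift from these round numbers because of blockers and hands with partial equity.

```python
from fractions import Fraction


def river_bluff_math(pot, bet, value_combos):
    """Return (caller's required equity, value-to-bluff ratio, bluff combos).

    Simplified model: value bets always win, bluffs always lose.
    """
    pot, bet = Fraction(pot), Fraction(bet)
    # The caller risks `bet` to win `pot + bet`.
    required_equity = bet / (pot + 2 * bet)
    # The caller is indifferent when bluffs make up exactly that share.
    bluff_share = required_equity
    value_to_bluff = (1 - bluff_share) / bluff_share
    bluff_combos = value_combos / value_to_bluff
    return required_equity, value_to_bluff, bluff_combos


# 12 value combos (an illustrative number), pot of 1, three bet sizes.
for bet in (Fraction(1, 2), Fraction(1), Fraction(2)):
    equity, ratio, bluffs = river_bluff_math(1, bet, 12)
    print(f"bet {bet} pot: caller needs {equity}, "
          f"value:bluff {ratio}:1, bluff combos {bluffs}")
```

Larger bets give the caller worse odds, so the betting range can hold more bluffs per value hand; smaller bets force the range to be more value-heavy.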

Does GTO actually work?

Yes, as a baseline. In a heads-up pot, a converged GTO strategy cannot be beaten in the long run (before rake), no matter how the opponent plays. What it won’t do is maximally punish a bad player; a targeted exploitative line wins more against a known leak. Strong players use GTO as the default that never loses and deviate only with a clear read. So “does GTO work” has a precise answer: it works as an unexploitable floor, not as the profit-maximizing ceiling.

Common misunderstandings

“A solver looks up answers.” It computes them by iterating a spot to equilibrium every time.
“Mixed frequencies mean the solver is unsure.” The mix is the answer — it’s what keeps the opponent indifferent.
“Solver output is universal.” Every solution is tied to its exact inputs; change stacks or ranges and it changes.
“GTO guarantees you win each session.” It guarantees you can’t be beaten long-term, not that variance disappears.

Wrapping up

A GTO solver defines a spot, then uses Counterfactual Regret Minimization to iterate two ranges against each other until neither can improve, tracking exploitability to know when it’s converged. The mixed frequencies it outputs are deliberate balance, and its precision comes from weighing every hand by true combos. Understand the process and solver output stops looking like a magic answer key and starts looking like what it is — computed balance. Build the foundation with what is GTO poker, compare tools in best GTO poker solver, and return to the preflop strategy hub to apply it.
