METHODOLOGY

HOW WE PREDICT THE WORLD CUP

A machine learning ensemble trained on 35,000+ international matches, combining team strength ratings, squad quality data, and historical form to simulate the entire tournament 10,000 times.

THE THREE SIGNALS

The model combines three independent sources of team strength. Each captures something the others miss.

01 ELO Ratings

A chess-inspired rating system adapted for international football. Every team starts with a base rating that goes up after wins and down after losses — weighted by opponent strength and match importance. Teams that consistently beat strong opponents rise to the top.
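The update described above can be sketched with the standard Elo formula. This is a minimal illustration, not the model's actual code: the 400-point scale is the conventional Elo choice, and the `k` weight standing in for match importance is an assumed value.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for team A under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=20.0):
    """Return both teams' new ratings after one match.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    k is a hypothetical match-importance weight (larger for bigger matches).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Beating a stronger opponent earns more points than beating a weaker one.
underdog_gain = update_elo(1500, 1700, 1.0)[0] - 1500
favourite_gain = update_elo(1700, 1500, 1.0)[0] - 1700
```

Because the gain scales with how unexpected the result was, a team only climbs quickly by beating opponents rated above it.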

30.2% of model importance

02 EA FC Squad Ratings

Player ratings from EA Sports FC (formerly FIFA) video games. Professional scouts rate every player’s pace, shooting, passing, defending, and physicality on a 1–99 scale. These are aggregated into squad-level metrics — overall strength, positional depth, and top-player quality.
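One way the aggregation into squad-level metrics could look is sketched here. The input layout, the metric names, and the choice of a top-3 average are all assumptions made for illustration; the ratings are made-up numbers.

```python
from statistics import mean


def squad_metrics(players, top_n=3):
    """Aggregate player ratings (1-99) into squad-level numbers.

    players: list of (position, overall) tuples. Both this layout and the
    metric definitions are illustrative assumptions.
    """
    overalls = sorted((ovr for _, ovr in players), reverse=True)
    by_position = {}
    for pos, ovr in players:
        by_position.setdefault(pos, []).append(ovr)
    return {
        "overall": mean(overalls),
        "top_player_quality": mean(overalls[:top_n]),
        "depth": {pos: mean(vals) for pos, vals in by_position.items()},
    }


# Made-up ratings for a five-player example squad.
squad = [("GK", 85), ("DEF", 80), ("DEF", 78), ("MID", 88), ("FWD", 91)]
metrics = squad_metrics(squad)
```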

31.2% of model importance

03 Form, History & Context

Recent win rate, goals scored and conceded, head-to-head record between the two teams, tournament importance (World Cup vs friendly), home advantage, and momentum — whether a team is on a winning or losing streak heading into the match.
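Two of these signals, recent win rate and momentum, can be sketched as follows. The five-match window and the signed-streak encoding are assumptions, not the model's documented definitions.

```python
def form_features(results, window=5):
    """results: chronological list of 'W', 'D', 'L' for one team.

    Returns the win rate over the last `window` matches and a signed streak:
    +n for n straight wins, -n for n straight losses, 0 after a draw.
    """
    recent = results[-window:]
    win_rate = recent.count("W") / len(recent) if recent else 0.0

    streak = 0
    if results and results[-1] in ("W", "L"):
        last = results[-1]
        for result in reversed(results):
            if result != last:
                break
            streak += 1
        if last == "L":
            streak = -streak
    return {"win_rate": win_rate, "streak": streak}


features = form_features(["L", "D", "W", "W", "W"])
```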

38.6% of model importance

THE MODEL

Instead of one model making all decisions, this uses an ensemble — four models that each learn different patterns, then vote on the outcome.

× 3 COPIES: XGBoost

Gradient-boosted decision trees. Learns complex non-linear patterns — like how ELO difference matters more in knockout rounds than group stages. Handles missing data natively.

× 1 COPY: Random Forest

500 independent decision trees that each see a random subset of features. Provides stability and reduces overfitting — when XGBoost is too confident, the Random Forest pulls predictions back toward reality.
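The voting step can be illustrated with a small sketch. It assumes soft voting, meaning the four models' probabilities are averaged with equal weight; the text does not say whether the real ensemble averages or uses another rule, and the numbers below are made up.

```python
def soft_vote(model_probs):
    """Average per-class probabilities across models with equal weights."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


# Made-up outputs as (home win, draw, away win):
# three XGBoost copies and one Random Forest.
xgb_outputs = [[0.50, 0.26, 0.24], [0.47, 0.27, 0.26], [0.48, 0.28, 0.24]]
rf_output = [0.39, 0.31, 0.30]
ensemble = soft_vote(xgb_outputs + [rf_output])
```

In this example the more cautious Random Forest output pulls the averaged home-win probability below what any of the XGBoost copies predicted on its own.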

97 Input Features

35,304 Training Matches

1884–2024 Training Period

10,000 Tournament Sims

FROM WIN PROBABILITY TO SCORELINES

The model predicts the probability of home win, draw, or away win. But to simulate a tournament, you need actual scorelines. Here's how the model bridges that gap.

STEP 1: Predict Outcome Probabilities

The ensemble outputs three probabilities for each match. Example: France vs Brazil → 45% home win, 28% draw, 27% away win.

STEP 2: Reverse-Engineer Goal Rates

Using a precomputed Poisson grid, the model finds the goal-scoring rates (λ) that best reproduce those probabilities. If France has a 45% win chance, it finds λ values where Poisson-generated scores give ~45% home wins.
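This lookup can be sketched as a grid search. The sketch assumes home and away goals are independent Poisson variables, a grid of rates from 0.1 to 3.0 in steps of 0.1, and a squared-error match criterion; the real grid's range, resolution, and fitting rule are not stated in the text.

```python
import math

MAX_GOALS = 10  # truncate the score grid; higher scores are rare at these rates


def poisson_pmf(lam):
    """Probabilities of scoring 0..MAX_GOALS goals at rate lam."""
    return [math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(MAX_GOALS + 1)]


def outcome_probs(lam_home, lam_away):
    """(home win, draw, away win) assuming independent Poisson goal counts."""
    home_pmf, away_pmf = poisson_pmf(lam_home), poisson_pmf(lam_away)
    home = draw = away = 0.0
    for h, ph in enumerate(home_pmf):
        for a, pa in enumerate(away_pmf):
            if h > a:
                home += ph * pa
            elif h == a:
                draw += ph * pa
            else:
                away += ph * pa
    return home, draw, away


# Precompute once: every (lam_home, lam_away) pair from 0.1 to 3.0.
RATES = [i / 10 for i in range(1, 31)]
GRID = {(lh, la): outcome_probs(lh, la) for lh in RATES for la in RATES}


def fit_goal_rates(target):
    """Return the grid pair whose outcome probabilities are closest to target."""
    def squared_error(pair):
        return sum((p - t) ** 2 for p, t in zip(GRID[pair], target))
    return min(GRID, key=squared_error)


# The France vs Brazil example: 45% home win, 28% draw, 27% away win.
lam_home, lam_away = fit_goal_rates((0.45, 0.28, 0.27))
fitted = outcome_probs(lam_home, lam_away)
```

Two goal rates cannot always reproduce three probabilities exactly, so the fit is a closest match rather than an exact inversion.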

STEP 3: Simulate Scorelines

Goals for each side are drawn at random from the Poisson distribution, so the same match might finish 2-1 in one simulation and 0-0 in another. Over 10,000 runs the randomness averages out, and the simulated outcome frequencies converge on the model's predicted probabilities.
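A minimal sketch of the sampling step, using only the standard library. The goal rates of 1.4 and 1.0 are illustrative values rather than model output, and Knuth's algorithm is just one simple way to draw Poisson samples.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's algorithm: multiply uniform draws until the product
    falls to e^-lam or below; the number of extra draws is the sample."""
    threshold = math.exp(-lam)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def simulate_match(lam_home, lam_away, rng):
    """Draw one scoreline from independent Poisson goal rates."""
    return sample_poisson(lam_home, rng), sample_poisson(lam_away, rng)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_sims = 10_000
scores = [simulate_match(1.4, 1.0, rng) for _ in range(n_sims)]
home_win_rate = sum(h > a for h, a in scores) / n_sims
```

Individual scorelines vary from run to run, while `home_win_rate` settles close to the home-win probability implied by the two rates.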

WHAT I DISCOVERED

Building this model taught me things about football prediction that weren't obvious at the start.

THE ECHO CHAMBER PROBLEM: Win-based ratings inflate weak-region teams

ELO ratings are calculated from match results. Teams in weaker regions (like CONCACAF or AFC) accumulate inflated ratings by beating other weak teams. Mexico and Japan appeared as realistic contenders in early models — anyone who watches football knows that’s wrong. Adding scout-assessed player ratings from EA FC broke the echo chamber.

THE COVERAGE TRADE-OFF: More matches > better features

EA player ratings only exist from 2014, but the training data goes back to 1884. I tried training only on modern data (6,000 matches with full features) vs all data (35,000 matches with gaps). The larger dataset won every time — old matches still teach the model about ELO patterns, form, and home advantage.

CALIBRATION > RAW ACCURACY: A well-calibrated model matters more for simulation

When the model says ‘70% chance of winning,’ does it actually happen 70% of the time? That’s calibration. For Monte Carlo simulation, calibration matters more than getting individual matches right: you want the dice to be fair, even if you can’t predict every roll. This model achieves an expected calibration error (ECE) of 0.018, which is near-perfect calibration.
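A binned ECE can be computed along these lines. This is a sketch assuming ten equal-width bins and a single event per prediction; the text does not say how the reported 0.018 was binned or how the three outcome classes were combined.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned ECE: weighted mean of |average predicted prob - observed rate|.

    probs: predicted probabilities for an event.
    outcomes: 1 if the event happened, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_prob = sum(p for p, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(avg_prob - hit_rate)
    return ece


# Ten predictions of 70% where the event happened 7 times: calibrated.
well_calibrated = expected_calibration_error([0.7] * 10, [1] * 7 + [0] * 3)
# Same predictions, but the event happened only 4 times: overconfident.
overconfident = expected_calibration_error([0.7] * 10, [1] * 4 + [0] * 6)
```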

LEAN BEATS COMPLEX: 9 difference features outperform 44 detailed features

I tried feeding the model 44 individual squad attributes (pace, shooting, defending for each team). But simple difference features — ‘how much better is Team A’s attack than Team B’s defense?’ — performed equally well with far less noise. When 82% of training rows have missing squad data, simpler is better.
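The idea of difference features can be sketched as below. The attribute names and the three gaps shown are hypothetical, not the model's actual nine features, and the numbers are made up. Missing squad data is kept as `None` so that a model which handles missing values natively can treat it as such.

```python
def difference_features(team_a, team_b):
    """Build matchup features as A-minus-B gaps; None marks missing data.

    Attribute names are hypothetical examples.
    """
    def gap(x, y):
        return None if x is None or y is None else x - y

    return {
        "elo_diff": gap(team_a.get("elo"), team_b.get("elo")),
        "attack_vs_defense": gap(team_a.get("attack"), team_b.get("defense")),
        "defense_vs_attack": gap(team_a.get("defense"), team_b.get("attack")),
    }


# A modern match with squad data and an older match with ratings only.
modern = difference_features(
    {"elo": 2000, "attack": 86, "defense": 82},
    {"elo": 1900, "attack": 80, "defense": 79},
)
historic = difference_features({"elo": 1700}, {"elo": 1650})
```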

FOOTBALL HAS EVOLVED: A 3-0 win in 1920 is not the same as a 3-0 win today

Goals per game have dropped from 5.5 to 2.7 over 140 years. Home advantage has shrunk from 40% to 23%. The game has fundamentally changed — tactics are more defensive, away teams are better prepared, and international squads are more balanced. The model is trained on all eras but learns to weight recent patterns more heavily.

THE 17% THAT MATTERS: ELO and EA ratings agree 83% of the time, and the value is where they disagree

Both ELO and EA FC ratings correctly rank France above Norway and Brazil above Bolivia. That’s the easy 83%. The real value of adding squad ratings is the 17% where they disagree — teams like Mexico and Japan whose ELO is inflated by beating weak opponents, but whose player ratings reveal they lack the individual quality to compete with Europe and South America’s best.
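One possible way to measure agreement between two rating systems is the share of team pairs that both order the same way. That definition is an assumption, since the text does not say how the 83% was computed, and the teams and ratings below are placeholders.

```python
from itertools import combinations


def pairwise_agreement(rating_a, rating_b):
    """Share of team pairs that both rating systems order the same way."""
    teams = sorted(set(rating_a) & set(rating_b))
    agree = total = 0
    for t1, t2 in combinations(teams, 2):
        diff_a = rating_a[t1] - rating_a[t2]
        diff_b = rating_b[t1] - rating_b[t2]
        if diff_a == 0 or diff_b == 0:
            continue  # skip pairs that either system rates as tied
        total += 1
        agree += (diff_a > 0) == (diff_b > 0)
    return agree / total if total else float("nan")


# Placeholder teams and made-up ratings, for illustration only.
elo = {"A": 2050, "B": 1850, "C": 1800, "D": 1700, "E": 1600}
squad = {"A": 85, "B": 74, "C": 78, "D": 70, "E": 66}
agreement = pairwise_agreement(elo, squad)
```

Here the two systems disagree only on the B versus C pair, which is the kind of case where the second signal adds information.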

MODEL ACCURACY

Tested on 3,313 matches from 2023–2024 that the model never saw during training, and backtested against the 2022 World Cup.

0.018 Expected Calibration Error: near-perfect probability accuracy

1.3% Maximum Prediction Bias: home win slightly overestimated

0.826 Log Loss: lower is better; it penalizes confident predictions that turn out wrong
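For reference, three-way log loss can be computed as sketched below. A model that always assigns 1/3 to each outcome scores ln 3 ≈ 1.099, so a value of 0.826 sits below that uninformed baseline. The function name and input layout are illustrative.

```python
import math


def log_loss(prob_rows, outcomes):
    """Mean negative log of the probability given to the actual outcome.

    prob_rows: (home win, draw, away win) tuples.
    outcomes: index of what happened (0, 1 or 2).
    """
    eps = 1e-15  # guard against log(0)
    total = sum(-math.log(max(row[y], eps))
                for row, y in zip(prob_rows, outcomes))
    return total / len(outcomes)


# Always predicting 1/3 for each outcome scores ln(3), whatever happens.
uniform_baseline = log_loss([(1 / 3, 1 / 3, 1 / 3)] * 4, [0, 1, 2, 0])
# A confident wrong prediction costs far more than a confident right one gains.
confident_right = log_loss([(0.9, 0.05, 0.05)], [0])
confident_wrong = log_loss([(0.9, 0.05, 0.05)], [2])
```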

Calibration curves showing predicted vs actual probabilities

2022 WORLD CUP BACKTEST

I ran 10,000 simulations of the 2022 Qatar World Cup to see how the model would have performed. It correctly identified Argentina as a top contender with 25.3% championship probability.

CHAMPION: ARGENTINA (ranked #2, 25.30% win probability)

Behind only Spain (25.5%) — correctly identified as a top-2 contender

RUNNER-UP: FRANCE (ranked #3, 12.59% win probability)

Top 3 prediction — correctly flagged as a finalist-caliber team
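Championship probabilities like these come from counting how often each team wins across simulated tournaments. The sketch below shows the counting on a toy four-team knockout; the teams, strengths, and the strength-ratio win rule are placeholders standing in for the model's match predictions and the real tournament format.

```python
import random
from collections import Counter


def play(team_a, team_b, strength, rng):
    """Pick a winner with probability proportional to strength.

    A stand-in for the model's match prediction; draws are ignored.
    """
    p_a = strength[team_a] / (strength[team_a] + strength[team_b])
    return team_a if rng.random() < p_a else team_b


def simulate_knockout(teams, strength, rng):
    """Run one single-elimination bracket; len(teams) must be a power of two."""
    while len(teams) > 1:
        teams = [play(teams[i], teams[i + 1], strength, rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def title_probabilities(teams, strength, n_sims=10_000, seed=0):
    """Estimate each team's title probability by repeated simulation."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(teams, strength, rng)
                   for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in teams}


# Placeholder teams and strengths, for illustration only.
strength = {"A": 4.0, "B": 2.0, "C": 2.0, "D": 1.0}
probs = title_probabilities(["A", "B", "C", "D"], strength)
```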

BIAS CORRECTION VALIDATED

Teams that were unrealistically ranked in early models got corrected:

Mexico: 6.28% → 0.40% (−5.88 percentage points)

Senegal: 6.76% → 1.18% (−5.58 percentage points)

Japan: 6.27% → 0.29% (−5.98 percentage points)

Australia: 5.19% → 0.11% (−5.08 percentage points)

Adding EA squad ratings — assessed by scouts, not match results — corrected the inflated rankings of teams from weaker confederations.