METHODOLOGY
HOW WE PREDICT THE WORLD CUP
A machine learning ensemble trained on 35,000+ international matches, combining team strength ratings, squad quality data, and historical form to simulate the entire tournament 10,000 times.
THE THREE SIGNALS
The model combines three independent sources of team strength. Each captures something the others miss.
A chess-inspired rating system adapted for international football. Every team starts with a base rating that goes up after wins and down after losses — weighted by opponent strength and match importance. Teams that consistently beat strong opponents rise to the top.
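As a rough illustration of the update rule described above, here is a minimal sketch of a standard Elo update with a match-importance multiplier. The 400-point scale, the base K-factor of 20, and the `importance` values are assumptions borrowed from common Elo practice; the constants this model actually uses are not stated here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for team A under the classic Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k_base=20.0, importance=1.0):
    """Return the updated (rating_a, rating_b) after one match.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k_base and importance are assumed values, not the model's real settings.
    """
    k = k_base * importance  # bigger matches move ratings further
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Beating a stronger side earns more points than beating a weaker one.
upset_gain = update_elo(1500, 1700, 1.0)[0] - 1500
routine_gain = update_elo(1500, 1300, 1.0)[0] - 1500
print(round(upset_gain, 1), round(routine_gain, 1))
```

Because the gain scales with the gap between the actual and expected result, a win over a much stronger opponent moves a rating several times further than a win over a weaker one.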
30.2% of model importance
Player ratings from EA Sports FC (formerly FIFA) video games. Professional scouts rate every player’s pace, shooting, passing, defending, and physicality on a 1–99 scale. These are aggregated into squad-level metrics: overall strength, positional depth, and top-player quality.
31.2% of model importance
Recent win rate, goals scored and conceded, head-to-head record between the two teams, tournament importance (World Cup vs friendly), home advantage, and momentum (whether a team is on a winning or losing streak heading into the match).
38.6% of model importance
THE MODEL
Instead of one model making all decisions, this uses an ensemble — four models that each learn different patterns, then vote on the outcome.
XGBoost: gradient-boosted decision trees. Learns complex non-linear patterns, such as how ELO difference matters more in knockout rounds than group stages. Handles missing data natively.
Random Forest: 500 independent decision trees that each see a random subset of features. Provides stability and reduces overfitting: when XGBoost is too confident, the Random Forest pulls predictions back toward reality.
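One common way for ensemble members to "vote" is soft voting: averaging each model's class probabilities. The sketch below assumes that scheme with equal weights by default. The function name, the weights, and the example numbers are illustrative and not taken from the actual model.

```python
from typing import Optional, Sequence, Tuple

Probs = Tuple[float, float, float]  # (home win, draw, away win)


def soft_vote(model_probs: Sequence[Probs],
              weights: Optional[Sequence[float]] = None) -> Probs:
    """Blend per-model probabilities with a weighted average (assumed scheme)."""
    if not model_probs:
        raise ValueError("need at least one model's probabilities")
    if weights is None:
        weights = [1.0] * len(model_probs)  # equal say for every model
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total_weight = sum(weights)
    blended = [
        sum(w * p[k] for w, p in zip(weights, model_probs)) / total_weight
        for k in range(3)
    ]
    norm = sum(blended)  # guard against inputs that do not sum exactly to 1
    return tuple(x / norm for x in blended)


# A confident model and a more cautious one, blended with equal weight.
confident = (0.70, 0.18, 0.12)
cautious = (0.50, 0.27, 0.23)
print(soft_vote([confident, cautious]))
```

Averaging pulls the confident model's 70% toward the cautious model's 50%, which is the stabilising effect described above.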
FROM WIN PROBABILITY TO SCORELINES
The model predicts the probability of home win, draw, or away win. But to simulate a tournament, you need actual scorelines. Here's how the model bridges that gap.
The ensemble outputs three probabilities for each match. Example: France vs Brazil → 45% home win, 28% draw, 27% away win.
Using a precomputed Poisson grid, the model finds the goal-scoring rates (λ) that best reproduce those probabilities. If France has a 45% win chance, it finds λ values where Poisson-generated scores give ~45% home wins.
Random goals are drawn from the Poisson distribution — so the same match might be 2-1 in one simulation and 0-0 in another. Over 10,000 runs, the randomness averages out and the true probabilities emerge.
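The three steps above can be sketched end to end. This is a minimal illustration, not the actual implementation: it assumes independent Poisson goal counts for each side, a grid of λ values from 0.2 to 3.5 in steps of 0.1, scores truncated at 10 goals, and squared error as the matching criterion. It simulates a single match rather than a full bracket.

```python
import math
import random
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_home, lam_away, max_goals=10):
    """(home win, draw, away win) for independent Poisson goal counts."""
    p_home = [poisson_pmf(g, lam_home) for g in range(max_goals + 1)]
    p_away = [poisson_pmf(g, lam_away) for g in range(max_goals + 1)]
    home = draw = away = 0.0
    for h, ph in enumerate(p_home):
        for a, pa in enumerate(p_away):
            if h > a:
                home += ph * pa
            elif h == a:
                draw += ph * pa
            else:
                away += ph * pa
    total = home + draw + away  # renormalise the truncated tail
    return home / total, draw / total, away / total


def build_grid(lo=0.2, hi=3.5, step=0.1):
    """Precompute outcome probabilities for every (lam_home, lam_away) pair."""
    n = int(round((hi - lo) / step)) + 1
    rates = [round(lo + i * step, 2) for i in range(n)]
    return {(lh, la): outcome_probs(lh, la) for lh in rates for la in rates}


def fit_lambdas(target, grid):
    """Grid cell whose probabilities are closest (squared error) to target."""
    return min(grid, key=lambda cell: sum(
        (p - t) ** 2 for p, t in zip(grid[cell], target)))


def sample_poisson(lam, rng):
    """Knuth's method; adequate for small rates such as goals per match."""
    threshold = math.exp(-lam)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def simulate_match(lam_home, lam_away, n_sims=10_000, seed=0):
    """Draw random scorelines and return the observed outcome frequencies."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        h, a = sample_poisson(lam_home, rng), sample_poisson(lam_away, rng)
        counts["home" if h > a else "draw" if h == a else "away"] += 1
    return {k: counts[k] / n_sims for k in ("home", "draw", "away")}


GRID = build_grid()
lam_home, lam_away = fit_lambdas((0.45, 0.28, 0.27), GRID)
frequencies = simulate_match(lam_home, lam_away)
print(lam_home, lam_away, frequencies)
```

Two rates cannot reproduce three target probabilities exactly, so the fit is a closest match rather than an identity. Over 10,000 draws the simulated frequencies settle close to the fitted cell's probabilities.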
WHAT I DISCOVERED
Building this model taught me things about football prediction that weren't obvious at the start.
ELO ratings are calculated from match results. Teams in weaker regions (like CONCACAF or AFC) accumulate inflated ratings by beating other weak teams. Mexico and Japan appeared as realistic contenders in early models — anyone who watches football knows that’s wrong. Adding scout-assessed player ratings from EA FC broke the echo chamber.
EA player ratings exist only from 2014 onward, but the training data goes back to 1884. I compared training only on modern data (6,000 matches with full features) against training on all data (35,000 matches with gaps). The larger dataset won every time: old matches still teach the model about ELO patterns, form, and home advantage.
When the model says ‘70% chance of winning,’ does it actually happen 70% of the time? That’s calibration. For Monte Carlo simulation, calibration matters more than getting individual matches right: you want the dice to be fair, even if you can’t predict every roll. This model achieves an expected calibration error (ECE) of 0.018 (near-perfect calibration).
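For readers unfamiliar with the metric, here is a minimal sketch of a binned expected calibration error. It assumes 10 equal-width bins and a binary "did the predicted event happen" framing. How the 0.018 figure was actually computed (bin count, handling of the three outcome classes) is not stated here.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between predicted probability and observed rate.

    probs    -- predicted probabilities in [0, 1]
    outcomes -- 1 if the predicted event happened, else 0
    n_bins   -- number of equal-width bins (an assumed default)
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(hit_rate - mean_prob)
    return ece


# Ten "70%" predictions, seven of which came true: perfectly calibrated.
print(expected_calibration_error([0.7] * 10, [1] * 7 + [0] * 3))
```

A value near zero means predicted probabilities match observed frequencies. It says nothing about how often the single most likely outcome was correct.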
I tried feeding the model 44 individual squad attributes (pace, shooting, defending for each team). But simple difference features — ‘how much better is Team A’s attack than Team B’s defense?’ — performed equally well with far less noise. When 82% of training rows have missing squad data, simpler is better.
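A minimal sketch of what such difference features might look like. The dictionary keys (`attack`, `defense`, `overall`) and the feature names are hypothetical, and missing squad data is represented as NaN on the assumption that the downstream tree models can handle it.

```python
import math
from typing import Dict, Optional

Squad = Dict[str, float]  # hypothetical keys: "attack", "defense", "overall"

FEATURE_NAMES = ("attack_vs_defense", "defense_vs_attack", "overall_diff")


def difference_features(home: Optional[Squad],
                        away: Optional[Squad]) -> Dict[str, float]:
    """Collapse two squads' ratings into a few head-to-head differences."""
    if home is None or away is None:
        # No squad data for this match: emit NaN rather than guessing.
        return {name: math.nan for name in FEATURE_NAMES}
    return {
        "attack_vs_defense": home["attack"] - away["defense"],
        "defense_vs_attack": home["defense"] - away["attack"],
        "overall_diff": home["overall"] - away["overall"],
    }


team_a = {"attack": 86.0, "defense": 82.0, "overall": 84.0}
team_b = {"attack": 80.0, "defense": 78.0, "overall": 79.0}
print(difference_features(team_a, team_b))
print(difference_features(team_a, None))  # a match without squad ratings
```

Three columns replace dozens of raw attributes, and a row with no squad data carries three NaN values instead of 44.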
Goals per game have dropped from 5.5 to 2.7 over 140 years. Home advantage has shrunk from 40% to 23%. The game has fundamentally changed — tactics are more defensive, away teams are better prepared, and international squads are more balanced. The model is trained on all eras but learns to weight recent patterns more heavily.
Both ELO and EA FC ratings correctly rank France above Norway and Brazil above Bolivia. Cases like these, where the two ratings agree, are the easy 83%. The real value of adding squad ratings is the 17% where they disagree: teams like Mexico and Japan whose ELO is inflated by beating weak opponents, but whose player ratings reveal they lack the individual quality to compete with Europe and South America’s best.
MODEL ACCURACY
Tested on 3,313 matches from 2023–2024 that the model never saw during training, and backtested against the 2022 World Cup.


2022 WORLD CUP BACKTEST
I ran 10,000 simulations of the 2022 Qatar World Cup to see how the model would have performed. It correctly identified Argentina as a top contender with 25.3% championship probability.
Argentina: behind only Spain (25.5%), correctly identified as a top-2 contender
Top 3 prediction — correctly flagged as a finalist-caliber team
Teams that were unrealistically ranked in early models got corrected:
Adding EA squad ratings — assessed by scouts, not match results — corrected the inflated rankings of teams from weaker confederations.