Measuring the reliability of investment forecasts: the Brier score
The Brier score measures the gap between what a forecaster announced and what reality delivered. Devised in 1950 for weather forecasting, it applies to any prediction expressed as a probability, investment convictions included. Here is how it works, where it comes from, and what changes when a family takes it seriously.
What the Brier score measures
A probabilistic forecast does not say “the market will go up”. It says: “there is a 70% chance the S&P 500 ends the year above its current level”. The Brier score measures the gap between that announced probability and what reality eventually decided.
Its simplest form reads BS = (1/N) Σₜ (fₜ − oₜ)², summed over the N forecasts being scored, where fₜ is the probability announced for event t and oₜ its outcome: 1 if it happened, 0 if it did not. The score runs from 0, a perfect forecast, to 1, absolute certainty proven wrong every time. Lower is better.
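As a minimal sketch, the formula translates into a few lines of Python. The function name `brier_score` and the three sample forecasts are illustrative assumptions, not data from any real journal.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between announced probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Three hypothetical forecasts: 70% (happened), 20% (did not), 90% (did not).
score = brier_score([0.7, 0.2, 0.9], [1, 0, 0])
print(round(score, 4))  # (0.09 + 0.04 + 0.81) / 3, i.e. 0.3133
```

The single overconfident miss at 90% accounts for most of the total, which is the behaviour the score is designed to have.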
Its cardinal virtue: it punishes misplaced confidence. Announcing 90% and watching the event happen costs (0.9 − 1)² = 0.01. Announcing 90% and watching it fail costs (0.9 − 0)² = 0.81. Eloquence protects nothing: the stronger the conviction, the dearer the error.
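The asymmetry can be tabulated for a few confidence levels. This is an illustrative sketch; the helper `penalty` and the chosen probabilities are assumptions made for the example.

```python
def penalty(probability: float, happened: bool) -> float:
    """Squared-error cost of a single forecast once the event is resolved."""
    outcome = 1.0 if happened else 0.0
    return (probability - outcome) ** 2


# Cost of being right versus wrong at increasing levels of conviction.
for p in (0.5, 0.7, 0.9, 0.99):
    print(f"announced {p:.2f}  right: {penalty(p, True):.4f}  wrong: {penalty(p, False):.4f}")
```

At 50% the cost is 0.25 whichever way the event goes, which is why 0.25 is commonly read as the score of a forecaster who conveys no information. At 99% the reward for being right is almost nil (0.0001) while the cost of being wrong approaches the maximum (0.9801).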
Since the work of Allan Murphy, statisticians distinguish two qualities of a forecaster within the score. Calibration: do your 70% calls come true 70% of the time? Resolution: do your probabilities separate the events that happen from those that do not, rather than hiding near 50%? A good forecaster holds both.
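Murphy's 1973 partition can be sketched as follows. It splits the score into three terms: reliability (the calibration error, lower is better), resolution (higher is better) and uncertainty, which depends only on how often the events occur and not on the forecaster. The identity BS = reliability − resolution + uncertainty holds exactly when forecasts are grouped by identical announced probability, as done here. The function name and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by announced probability."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the outcomes observed for each distinct announced probability.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # gap between announced probability and observed frequency
    resolution = 0.0   # gap between observed frequency and the overall base rate
    for f, hits in groups.items():
        frequency = sum(hits) / len(hits)
        reliability += len(hits) * (f - frequency) ** 2
        resolution += len(hits) * (frequency - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Hypothetical, perfectly calibrated forecaster:
# ten 70% calls with 7 hits, ten 20% calls with 2 hits.
forecasts = [0.7] * 10 + [0.2] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8
rel, res, unc = murphy_decomposition(forecasts, outcomes)
print(round(rel, 4), round(res, 4), round(unc, 4), round(rel - res + unc, 4))
```

In this example reliability is zero because every announced probability matches the observed frequency, and the whole advantage over the uncertainty term (0.2475) comes from resolution (0.0625), giving a Brier score of 0.185.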
Born of weather, adopted by judgement research
Glenn W. Brier published his measure in 1950 in the Monthly Weather Review. Meteorologists were among the first professions required to publish falsifiable predictions every day, and therefore among the first that had to measure their own accuracy honestly.
Half a century later, the psychologist Philip Tetlock made it the benchmark of forecasting research. His study Expert Political Judgment followed 284 experts and more than 80,000 predictions over twenty years: the average expert barely beat chance. Then the Good Judgment Project, which won the IARPA forecasting tournaments held between 2011 and 2015, showed the other side of the coin: ordinary but disciplined forecasters, the superforecasters, durably outperformed intelligence professionals. In both cases the referee was the same: the Brier score.
Applying it to investment decisions
Nothing in the formula is specific to weather. It only demands three disciplines, which are precisely the ones most investment processes lack.
- Record before the verdict. A forecast is logged the moment it is made: dated, quantified, with a horizon and an unambiguous resolution criterion. “The market will correct” cannot be scored; “the CAC 40 will close 2026 below 8,000 points” can.
- Confront without exception. Every forecast that reaches its deadline is resolved, not only the ones remembered fondly. Selective memory is the first way a track record gets falsified.
- Update over time. The score only means something over a series of forecasts: skill separates from luck as resolved forecasts accumulate.
That is what an investment decision journal is for: a register where each conviction is recorded with its reasons, its price and its horizon, then confronted with reality. Kept seriously, it turns a memory of anecdotes into accuracy data.
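A minimal sketch of such a register, under stated assumptions: the class names `Forecast` and `DecisionJournal`, the field names, and the date and probability in the example are all hypothetical. Only the CAC 40 statement comes from the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Forecast:
    made_on: date                            # logged the day the forecast is made
    statement: str                           # unambiguous resolution criterion
    probability: float                       # announced probability, between 0 and 1
    deadline: date                           # horizon at which the forecast is resolved
    rationale: str = ""                      # the reasons behind the conviction
    reference_price: Optional[float] = None  # price when the forecast was made
    outcome: Optional[int] = None            # 1 or 0 once resolved, None before


class DecisionJournal:
    def __init__(self) -> None:
        self.entries: List[Forecast] = []

    def record(self, forecast: Forecast) -> None:
        """Log a forecast; reject entries that could not be scored later."""
        if not 0.0 <= forecast.probability <= 1.0:
            raise ValueError("probability must lie between 0 and 1")
        if forecast.deadline < forecast.made_on:
            raise ValueError("deadline cannot precede the forecast date")
        if forecast.outcome is not None:
            raise ValueError("a forecast is recorded before its verdict")
        self.entries.append(forecast)

    def overdue(self, today: date) -> List[Forecast]:
        """Forecasts past their deadline that still await resolution."""
        return [f for f in self.entries if f.outcome is None and f.deadline <= today]

    def resolve(self, forecast: Forecast, happened: bool) -> None:
        """Attach the verdict to a recorded forecast."""
        forecast.outcome = 1 if happened else 0

    def brier_score(self) -> Optional[float]:
        """Mean squared error over every resolved forecast, or None if there are none."""
        resolved = [f for f in self.entries if f.outcome is not None]
        if not resolved:
            return None
        return sum((f.probability - f.outcome) ** 2 for f in resolved) / len(resolved)


journal = DecisionJournal()
journal.record(Forecast(
    made_on=date(2026, 1, 15),   # hypothetical date
    statement="CAC 40 closes 2026 below 8,000 points",
    probability=0.60,            # hypothetical probability
    deadline=date(2026, 12, 31),
))
print(len(journal.overdue(date(2027, 1, 4))))  # 1: one forecast awaiting its verdict
```

The `overdue` method is the guard against selective memory: anything it returns must be resolved before the score is read.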
What it changes for family wealth
A track record measures outcomes, in which the quality of judgement, position sizing and luck are entangled. The Brier score isolates the first component, forecast by forecast. The question is no longer “how much did this call return” but “who was right, when, how often”.
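The question "who was right, how often" can be sketched as a per-forecaster table. The function name, the analyst labels and every figure below are hypothetical, chosen only to illustrate the grouping.

```python
from collections import defaultdict


def scores_by_forecaster(records):
    """Rank forecasters by mean Brier score (best first).

    `records` holds (forecaster, probability, outcome) tuples for resolved forecasts.
    Returns a list of (forecaster, brier_score, number_of_forecasts).
    """
    squared_errors = defaultdict(list)
    for name, probability, outcome in records:
        squared_errors[name].append((probability - outcome) ** 2)

    table = [
        (name, sum(errors) / len(errors), len(errors))
        for name, errors in squared_errors.items()
    ]
    return sorted(table, key=lambda row: row[1])


# Hypothetical resolved forecasts for two analysts.
records = [
    ("Analyst A", 0.8, 1), ("Analyst A", 0.7, 1), ("Analyst A", 0.3, 0),
    ("Analyst B", 0.9, 0), ("Analyst B", 0.6, 1), ("Analyst B", 0.5, 1),
]
for name, score, count in scores_by_forecaster(records):
    print(f"{name}: Brier {score:.4f} over {count} forecasts")
```

The count is reported alongside the score on purpose: a ranking built on three forecasts per person, as in this toy example, says very little, and only becomes informative as resolved forecasts accumulate.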
For wealth designed to last several generations, that memory is worth as much as the assets. People pass, eloquence fades. What can be handed down is a hierarchy of demonstrated accuracy, and with it a simple allocation rule: capital follows the convictions that have already proven right.
At Verdoso
Cassandra, the platform Verdoso built for its own wealth, keeps this journal natively: every forecast is recorded the moment it is made, confronted with the markets month after month, and every analyst followed carries a continuously updated Brier score. Discover Cassandra.
What counts as a good Brier score?
How is a Brier score different from a track record?
Can market forecasts that are not binary be scored?
Why must a forecast be recorded the moment it is made?
- Glenn W. Brier, “Verification of forecasts expressed in terms of probability”, Monthly Weather Review, vol. 78, no. 1, 1950.
- Allan H. Murphy, “A New Vector Partition of the Probability Score”, Journal of Applied Meteorology, vol. 12, 1973.
- Philip E. Tetlock, Expert Political Judgment: How Good Is It? How Can We Know?, Princeton University Press, 2005.
- Philip E. Tetlock and Dan Gardner, Superforecasting: The Art and Science of Prediction, Crown, 2015.
These notes describe methods. They are neither investment advice nor an offer of services.