What counts as a good Brier score?

A score of 0 is perfection, 1 is absolute certainty proven wrong, and answering 50% to everything locks in 0.25, so on yes/no questions a score above 0.25 is worse than never committing. In practice a score is not judged in isolation: forecasters are compared with one another, on the same questions over the same periods.

How is a Brier score different from a track record?

A track record measures outcomes, in which judgement, position sizing and luck are entangled. The Brier score isolates the accuracy of judgement, forecast by forecast, regardless of the amounts at stake.

Can market forecasts that are not binary be scored?

Yes. A question with several outcomes is split into mutually exclusive ranges, each receiving a probability, with the probabilities summing to 100%. The harder requirement lies elsewhere: an unambiguous resolution criterion, fixed when the forecast is made, that determines what actually happened.
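As an illustration, here is a minimal sketch of how one forecast spread over several ranges could be scored. It follows the multi-category form of the Brier score, which sums the squared errors over all ranges; the function name, the range labels and the probabilities are assumptions made up for the example. Note that this form runs from 0 to 2 rather than from 0 to 1.

```python
def multi_outcome_brier(probabilities: dict[str, float], realized: str) -> float:
    """Score one forecast spread over mutually exclusive ranges.

    probabilities maps each range label to its announced probability;
    realized is the label of the range that actually occurred.
    """
    if realized not in probabilities:
        raise ValueError("the realized range must be one of the forecast ranges")
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcome is 1 for the realized range and 0 for every other range.
    return sum(
        (p - (1.0 if label == realized else 0.0)) ** 2
        for label, p in probabilities.items()
    )


# Hypothetical forecast on a year-end index return (illustrative numbers only).
forecast = {
    "below -10%": 0.10,
    "-10% to 0%": 0.20,
    "0% to +10%": 0.40,
    "above +10%": 0.30,
}
score = multi_outcome_brier(forecast, realized="0% to +10%")
print(round(score, 4))
```

Here the score is 0.01 + 0.04 + 0.36 + 0.09 = 0.50. The two validation checks matter as much as the arithmetic: a forecast whose ranges overlap or leave gaps cannot be resolved cleanly.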

Why must a forecast be recorded the moment it is made?

Because memory rewrites the past. After the fact, everyone remembers having seen it coming: that is hindsight bias. Only a time-stamped, quantified, falsifiable forecast can be honestly checked against reality. That is what a decision journal is for.

Brier score: measuring the reliability of investment forecasts

What the Brier score measures

A probabilistic forecast does not say “the market will go up”. It says: “there is a 70% chance the S&P 500 ends the year above its current level”. The Brier score measures the gap between that announced probability and what reality eventually decided.

Its simplest form reads BS = (1/N) Σ (fₜ − oₜ)², summed over the N forecasts being scored, where fₜ is the probability announced for event t, and oₜ its outcome: 1 if it happened, 0 if it did not. The score runs from 0, the perfect forecast, to 1, absolute certainty proven entirely wrong. Lower is better.
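The formula translates directly into a few lines of code. The sketch below is illustrative only: the function name and the sample forecasts are assumptions, not data from any real track record.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between announced probabilities and 0/1 outcomes."""
    if len(forecasts) == 0 or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Four made-up forecasts and what actually happened (1 = event occurred).
forecasts = [0.7, 0.9, 0.2, 0.6]
outcomes = [1, 0, 0, 1]

score = brier_score(forecasts, outcomes)          # about 0.275
baseline = brier_score([0.5] * len(outcomes), outcomes)  # always 0.25

print(round(score, 4), round(baseline, 4))
```

In this sample the single confident miss (90% announced, event failed) contributes 0.81 of the 1.10 total, enough to push the forecaster above the 0.25 earned by answering 50% to everything.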

Its cardinal virtue: it punishes misplaced confidence. Announcing 90% and watching the event happen costs (0.9 − 1)² = 0.01. Announcing 90% and watching it fail costs (0.9 − 0)² = 0.81. Eloquence protects nothing: the stronger the conviction, the dearer the error.

Since the work of Allan Murphy, statisticians distinguish two qualities of a forecaster within the score. Calibration: do your 70% calls come true 70% of the time? Resolution: do your probabilities separate the events that happen from those that do not, rather than hiding near 50%? A good forecaster holds both.
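Murphy's decomposition writes the score as reliability − resolution + uncertainty, where reliability is the calibration error (lower is better), resolution rewards forecasts that move away from the base rate (higher is better), and uncertainty depends only on how often the events occur. The sketch below is a minimal version under one simplifying assumption: forecasts are grouped by their exact announced value, whereas real data sets usually group them into bins. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def murphy_decomposition(
    forecasts: Sequence[float], outcomes: Sequence[int]
) -> tuple[float, float, float]:
    """Return (reliability, resolution, uncertainty) for binary forecasts."""
    n = len(forecasts)
    if n == 0 or n != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    base_rate = sum(outcomes) / n

    # Group outcomes by the probability that was announced for them.
    groups: dict[float, list[int]] = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        observed_freq = sum(obs) / len(obs)
        reliability += len(obs) * (f - observed_freq) ** 2
        resolution += len(obs) * (observed_freq - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Made-up, perfectly calibrated forecaster: 70% calls happen 7 times in 10,
# 30% calls happen 3 times in 10.
forecasts = [0.7] * 10 + [0.3] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7

rel, res, unc = murphy_decomposition(forecasts, outcomes)
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(round(rel, 4), round(res, 4), round(unc, 4), round(brier, 4))
```

With this sample, reliability is 0 (perfect calibration), resolution is 0.04 and uncertainty is 0.25, which recombine to the Brier score of 0.21. A forecaster who answered 50% throughout would also be calibrated here, but with zero resolution and a score of 0.25.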

Born of weather, adopted by judgement research

Glenn W. Brier published his measure in 1950 in the Monthly Weather Review. Meteorologists were among the first professions required to publish falsifiable predictions every day, and therefore among the first that had to measure their own accuracy honestly.

Half a century later, the psychologist Philip Tetlock made it the benchmark of forecasting research. His study Expert Political Judgment followed 284 experts and more than 80,000 predictions over twenty years: the average expert barely beat chance. Then the Good Judgment Project, winner of the IARPA forecasting tournaments between 2011 and 2015, demonstrated the opposite: ordinary but disciplined forecasters, the superforecasters, durably outperformed intelligence professionals. In both cases the referee was the same: the Brier score.

Applying it to investment decisions

Nothing in the formula is specific to weather. It only demands three disciplines, which are precisely the ones most investment processes lack.

Record before the verdict. A forecast is logged the moment it is made: dated, quantified, with a horizon and an unambiguous resolution criterion. “The market will correct” cannot be scored; “the CAC 40 will close 2026 below 8,000 points” can.
Confront without exception. Every forecast that reaches its deadline is resolved, not only the ones remembered fondly. Selective resolution is the first way a track record gets falsified.
Update over time. The score only means something in series: it separates skill from luck as forecasts accumulate.
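The three disciplines above can be sketched as a small data structure. This is one possible layout under stated assumptions, not a description of any existing system: the `Forecast` fields, the helper names, the analyst labels and the resolved outcomes are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Forecast:
    analyst: str
    statement: str                  # unambiguous resolution criterion
    probability: float              # announced chance the statement is true
    made_on: date                   # record before the verdict
    deadline: date                  # horizon
    outcome: Optional[bool] = None  # stays None until resolved


def overdue(journal: list[Forecast], today: date) -> list[Forecast]:
    """Forecasts past their deadline that nobody has resolved yet."""
    return [f for f in journal if f.outcome is None and f.deadline < today]


def brier_by_analyst(journal: list[Forecast]) -> dict[str, float]:
    """Average Brier score per analyst, over resolved forecasts only."""
    totals: dict[str, tuple[float, int]] = {}
    for f in journal:
        if f.outcome is None:
            continue
        squared_error = (f.probability - float(f.outcome)) ** 2
        running_sum, count = totals.get(f.analyst, (0.0, 0))
        totals[f.analyst] = (running_sum + squared_error, count + 1)
    return {analyst: s / n for analyst, (s, n) in totals.items()}


# Illustrative journal: the outcomes below are invented for the example.
journal = [
    Forecast("A", "CAC 40 closes 2026 below 8,000 points", 0.30,
             date(2026, 1, 5), date(2026, 12, 31)),
    Forecast("A", "Hypothetical statement X is true at year end", 0.80,
             date(2025, 1, 6), date(2025, 12, 31), outcome=True),
    Forecast("B", "Hypothetical statement X is true at year end", 0.40,
             date(2025, 1, 6), date(2025, 12, 31), outcome=True),
]

scores = brier_by_analyst(journal)
pending = overdue(journal, today=date(2027, 1, 1))
print(scores, [f.statement for f in pending])
```

The `overdue` check is what enforces "confront without exception": a forecast that has passed its deadline without an outcome is surfaced rather than quietly forgotten. With one resolved forecast each, the two scores say little; they only start to separate skill from luck as entries accumulate.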

That is what an investment decision journal is for: a register where each conviction is recorded with its reasons, its price and its horizon, then confronted with reality. Kept seriously, it turns a memory of anecdotes into accuracy data.

What it changes for family wealth

A track record measures outcomes, in which the quality of judgement, position sizing and luck are entangled. The Brier score isolates the first component, forecast by forecast. The question is no longer “how much did this call return” but “who was right, when, how often”.

For wealth designed to last several generations, that memory is worth as much as the assets. People pass, eloquence fades. A hierarchy of demonstrated accuracy is transmitted, and with it a simple allocation rule: capital follows the convictions that have already proven right.

At Verdoso

Cassandra, the platform Verdoso built for its own wealth, keeps this journal natively: every forecast is recorded the moment it is made, confronted with the markets month after month, and 100% of the analysts followed carry a continuously updated Brier score.

These notes describe methods. They are neither investment advice nor an offer of services.