Superforecasting: The Art and Science of Prediction

Amazon link

written by Philip Tetlock:

  • Wharton professor
  • famous for the analysis (the basis of his 2006 book Expert Political Judgment) showing that 'expert' political pundits couldn't predict geopolitical events better than random chance
  • A WSJ article cast Superforecasting as: "The most important book on decision making since Daniel Kahneman’s Thinking, Fast and Slow."

Like many other great books, Superforecasting isn't simply a 'one-off' book. It is a continuation of Tetlock's life's work in the field of prediction-making, the central hypothesis of which is that "predicting things is hard, but it's a developable skill." The difficulty of prediction is attributed partly to the biases we all exhibit in our daily lives and partly to a culture of making unmeasurable predictions.

Quick Harvard Business Review summary of research results | Tetlock's 10 Commandments for prediction making

If you didn't read the above, here are the points I gleaned:

  • Today's predictions/forecasts are hard to judge as correct/wrong, e.g. Steve Ballmer's 2007 prediction that "There's no chance that the iPhone is going to get any significant market share. No chance." What did he mean by significant, which market, how much of a chance constitutes no chance?
  • In addition, because we equate confidence with accuracy, we celebrate pundits who push one-sided arguments, but we can't check their track records because those generally don't exist; we like hedgehogs, not foxes
    • This doesn't help the pundits either: without feedback, how can they know where to improve? Feedback is key in TFAAT (try, fail, analyze, adjust, try again)
  • One of the most common prediction mistakes is falling for your own bait & switch: replacing the real question at hand with a more accessible one and answering that instead. Watch out
  • Personality traits such as open-mindedness (and grit) are far more important than raw intelligence: roughly three times better as a predictor of being a 'superforecaster'
    • Explained by ability to simultaneously hold multiple contradictory views
  • 'Wisdom of the crowds': if you average the predictions of a ton of people and then extremize the average (bump 0.6->0.9 and drop 0.4->0.1), you'll get more accurate predictions than experts
    • Due to every member of the crowd having different information
  • The most important questions today may be complex and seemingly indeterminate, but they can be broken down into more manageable clusters
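The average-then-extremize idea from the 'wisdom of the crowds' point can be sketched in a few lines. This is only an illustration: the power transform and the exponent `a` below are my assumptions (one common way to push a probability away from 0.5), not necessarily the exact procedure the GJP used, and the individual forecasts are made up.

```python
from statistics import mean


def extremize(p: float, a: float = 2.5) -> float:
    """Push a probability away from 0.5.

    a > 1 extremizes, a = 1 leaves p unchanged. The value 2.5 is an
    arbitrary illustrative choice, not a figure from the book.
    """
    return p**a / (p**a + (1 - p) ** a)


def crowd_forecast(forecasts: list[float], a: float = 2.5) -> float:
    """Average the individual probabilities, then extremize the average."""
    return extremize(mean(forecasts), a)


# Made-up individual forecasts for a single yes/no question.
forecasts = [0.55, 0.60, 0.70, 0.65, 0.50]

print("plain average:     ", round(mean(forecasts), 3))
print("extremized average:", round(crowd_forecast(forecasts), 3))
```

The transform is symmetric around 0.5, so a crowd that leans "yes" gets pushed further towards "yes" and a crowd that leans "no" further towards "no", while a crowd sitting exactly at 0.5 stays put.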

One of Tetlock's more interesting claims is that not only can good probabilistic thinking be taught, but that a 60-minute lesson in common statistical fallacies can lead to 10% better prediction accuracy (see his 10 commandments).

Tetlock's work on why humans are so bad at thinking probabilistically resulted in a long-term partnership with Daniel Kahneman (which in turn led me to this book). Setting the problem in the context of behavioral psychology, he describes a 'three-setting mental dial': reading a 70% forecast as one of Yes/No/Maybe, instead of as "it will rain on roughly 70% of the days that get this forecast, give or take some variation, if our forecasts are good." Tetlock argues that this dial, which we have to move past in order to think probabilistically, is a result of evolution: indecision wasn't rewarded for proto-humans.
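The "70% means it rains on about 70% of such days" reading is what a calibration check measures. Here is a minimal sketch of one, assuming forecasts are grouped by their stated probability rounded to one decimal; the function name and the rain record are hypothetical, made up for illustration.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Return {stated probability: observed frequency of the event}.

    Forecasts are bucketed by their value rounded to one decimal place;
    outcomes are 1 if the event happened and 0 if it did not.
    """
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        buckets[round(p, 1)].append(happened)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}


# Hypothetical record: ten days forecast at 70% rain, five days at 30%.
forecasts = [0.7] * 10 + [0.3] * 5
outcomes = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0] + [0, 0, 1, 0, 0]

for stated, observed in calibration_table(forecasts, outcomes).items():
    print(f"forecast {stated:.0%} -> happened {observed:.0%} of the time")
```

A well-calibrated forecaster's stated and observed numbers line up across many forecasts; a single rainy or dry day says almost nothing either way.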

The meat of this book is based on results from the ongoing Good Judgment Project (GJP), which Tetlock co-chairs and which began in 2011 with funding from IARPA (the intelligence community version of DARPA). The IARPA-funded project fielded 5 teams from around the world, but after 2 years GJP did so much better than its academic competitors that IARPA dropped the other teams.

Tetlock goes to great lengths to analyze why the participants in the GJP were so successful despite not being professional geopolitical analysts, nor even being paid (a lot) for it. They were average Joes, mostly; they displayed above-average intelligence and numeracy, but nothing superhuman or extraordinary. One common trait he recognized amongst them was a need for cognition, a personality variable that reflects the degree to which an individual pursues effortful cognitive activities. As mentioned earlier, open-mindedness was another such trait. Additionally, he observed that participants who updated their predictions more often had better (i.e. lower) Brier scores (more in a bit), which underscores a crucial point: predictions are made in snapshots of time with temporarily static information. They should be updated as new information is released, though they seldom are.

Another central pillar of Tetlock's book is the need for accountability in prediction: we need to keep track so that we can eschew the bad forecasters and stop giving them TV time and book deals. Failing to do so leaves us with an outcome as suboptimal as medicine before evidence-based treatments: bad doctors who keep practicing because they reason like this (hint: confirmation bias):

"All who drink of this treatment recover in a short time, except those whom it does not help, who all die. It is obvious, therefore, that it fails only in incurable cases." - Galen, ~200AD physician whose ideas endured

It's notoriously difficult, however, to do so. Outside of a competition context, not only are people unused to making precise predictions (5% increased risk) with deadlines (by January 10), they aren't incentivized to do so either: why risk being called out in the future for wrong calls? Although Brier scores were used to rank participants in the GJP, it is important to note that the questions were unambiguous and the answers had a fixed format. Anyway, Brier scores go from 0 (perfect accuracy) to 1 (as wrong as possible), and can be modified to (i) incentivize accountable confidence by rewarding correct, and penalizing wrong, extreme predictions (10-30% and 70-90%), and (ii) weight false negatives more heavily than false positives, e.g. for security/terrorism.
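For reference, here is a minimal sketch of the unmodified Brier score in its single-outcome form, which matches the 0-to-1 range above: the mean squared difference between the forecast probability and what actually happened (1 or 0). The example forecasts are made up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0 is a perfect score; 1 means being 100% confident and wrong every time.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up example: three forecasters on one question where the event happened.
print("confident and right:", brier_score([0.9], [1]))
print("always says 50/50:  ", brier_score([0.5], [1]))
print("confident and wrong:", brier_score([0.1], [1]))
```

Because the error is squared, a confident miss costs far more than a hedged one, which is what makes the score a reasonable basis for ranking forecasters on unambiguous questions.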

Finally, a nicely fleshed out example of bait & switch / automatic emotional responses that Tetlock explains is the following question:

"Will either the French or Swiss inquiries find elevated levels of polonium in the remains of Yasser Arafat's body?"

Yasser Arafat was the former political leader of Palestine who died mysteriously in 2004. Tetlock points out that many people will try to answer the question "Was Arafat poisoned?" instead, but that's not what's being asked. The question actually being asked breaks down into narrower ones:

  • Did Israel have polonium?
  • Did Israel want Arafat dead badly enough to take the risk?
  • Did Israel have the ability to poison Arafat with polonium?
  • If Arafat was poisoned with polonium, would it still be detectable (at the time of the investigation)?

Big difference!