The Incerto Trilogy (excl. Bed of Procrustes) - Nassim Taleb
The Briefest: actionable advice:
- an event’s impact is more important than its calculated improbability: prepare for the seemingly impossible and question your model (the world is complex and impossible to predict anyway; you only think you can because you can safely predict certain things)… it only takes one “impossible event” to sink the ship
- consider the silent evidence, i.e. the events outside your sample (swimmer’s body: you purport that swimming->good body, but your sample excludes the swimmers who don’t have nice bodies)
- consider the sample size: if the pool started with 100,000 investors and 40% survive each year, about 10 of them would “beat the market” for 10 years running by luck alone… adopt the monkey who wrote Shakespeare only if he was the only one given a typewriter
- no map is better than a wrong map, i.e. avoid naive interventionism and the need to always have an explanation: the Bell Curve/other ‘statistical science’ + your illusory superiority might have you convinced that an event is a non-possibility (like the Nobel-laureate-run hedge fund LTCM’s idea that 1998 was a 10-sigma event)
- understand that even if you get everything Taleb is saying, there’s a high probability you won’t be able to apply it because knowledge tends to be domain-specific
- address your fragility, and build antifragility: don’t predict the tsunami, make your buildings tsunami proof; don’t eliminate human greed, make your system greed-proof
- options and the ability to exercise them, optionality, are essentially Antifragile (more to gain than lose from volatility) and are the opposite of path-dependency, so Antifragile behavior requires constant re-evaluation and internalizing of new information
- consider the possibility of an agency problem: ensure that policy setters have ‘skin in the game’, or something to lose if they’re wrong; consider that currently, money managers have perfect optionality and that their antifragility comes at the cost of their shareholders’ fragility
A note on the whole series
Taleb’s multi-decade career on Wall Street as a trader and subsequent transition to statistician and probabilistic thinker uniquely positions him as an intermediary between the abstruse “science” behind statistics and the layperson.
Often mistaken for a manual on becoming a better trader, the Incerto series is meant as a sort of re-education on probability: a counter-intuitive introduction to seeing probability not as a precise mathematical science but as a perspective, a mindset of applied skepticism which he urges us all to adopt. (Although not too successfully, as he notes that if everyone succeeded, he would be out of a job.)
> Fooled By Randomness: understanding probability is not about computation; there is fundamental uncertainty
> The Black Swan: narrative fallacy, confirmation bias
> Antifragile: systems that benefit from volatility, fragility can be calculated
An exercise in extreme erudition: Taleb includes conversations and thoughts shared with eminent thinkers who span the whole gamut of disciplines (Kahneman, psychology; Terry Burnham, evolutionary psychology; Mandelbrot, mathematics); contributes many witty aphorisms and thoughtful anecdotes; disparages scientists, thinkers, and intellectuals who pompously declare certainty where there is none (a quest which, he notes, Descartes started); and ridicules journalists and anyone who takes them too seriously. Buckle up. (Although initially amusingly derisive, Taleb’s tone becomes tedious, sounding condescending at best and cringe-worthily acerbic at worst.)
Fooled By Randomness
In Brief: beware reliance on inductive reasoning
Probability is a matter of knowledge, of questioning your assumptions, rather than one of computation. Today, economists and risk management specialists drown us in indecipherable maths. However, what use is their 99.99% confidence interval if their model is built on shaky foundations? Case in point: look at LTCM and their proclamation that the event that sank them was a “ten sigma” event (under a Gaussian model, odds on the order of 1 in 10^23). Wittgenstein’s ruler applies here: unless you have confidence in the ruler’s reliability, when you use a ruler to measure a table you may also be using the table to measure the ruler.
Induction, and our misplaced reliance on it, is the key problem here. Most of the time it works, which is the problem because it fails us when it matters most: Black Swan events – difficult to predict, low probability, high impact events. Taleb suggests that we use this inductive-explanation shortcut because when we think probabilistically, we use our limbic emotional brain over our cognitive neocortex. He also includes Kahneman’s work on biases and heuristics in the discussion.
Bertrand Russell's chicken analogy is a nice example of the problems inherent in inductive reasoning:
"Domestic animals expect food when they see the person who usually feeds them. We know that all these rather crude expectations of uniformity are liable to be misleading. The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken…The mere fact that something has happened a certain number of times causes animals and men to expect that it will happen again. Thus our instincts certainly cause us to believe that the sun will rise to-morrow, but we may be in no better a position than the chicken which unexpectedly has its neck wrung."
Expectation (outcome x probability) is far more important than simply mapping probabilities, and somebody who understands this would protect themselves by buying insurance against Black Swans and similar events in a world where there are fat tails (kurtosis); and it’ll pay off in the long run. Robustness > predictive abilities
Survivorship bias means you shouldn’t be too impressed by a fund that beats the market for 30 years if… 30 years ago there were many funds in the market. Conversely, if it was the only fund in the sample and it beat the market, buy into it. (One monkey on a typewriter producing Shakespeare means something, one from an infinite sample doesn’t) Understanding survivorship bias means understanding that models should account for catastrophe-scale events despite their absence in existing data sets simply because such events are necessarily absent. This is also an inductive issue: history isn’t made until it is.
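The survivorship arithmetic from the opening advice (100,000 investors, a 40% survival rate, 10 years) can be checked in a few lines. This is a minimal sketch that assumes the 40% applies per year and that every manager survives each year independently and purely by luck; the function name is illustrative and not from the book.

```python
def expected_survivors(pool: int, yearly_survival: float, years: int) -> float:
    """Expected number of managers still 'beating the market' after `years`.

    Assumes each manager independently survives any given year with
    probability `yearly_survival` (pure luck, no skill).
    """
    return pool * yearly_survival ** years


if __name__ == "__main__":
    # Figures from the opening advice: 100,000 investors, 40% yearly survival, 10 years.
    print(expected_survivors(100_000, 0.4, 10))
    # Same luck-only model, but with a single manager in the starting pool.
    print(expected_survivors(1, 0.4, 10))
```

Under these assumptions a large pool is expected to produce roughly ten "stars" with no skill at all, while a lone manager with the same record would be very hard to explain by luck, which is the point of the single-monkey remark.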
The Black Swan
In Brief: feedback loops create a complex environment we shouldn’t pretend to be able to predict, buy insurance for what the Bell Curve says is impossible, because it probably isn’t
Black Swans (the invention of the Internet and our own existence, to name a few) dictate history, and understanding them requires understanding that today’s society is increasingly complex, characterized by interdependent events and feedback loops. However, most of us mistake this environment for a linear one because confirmation bias (selecting evidence that fits our narrative, itself a fallacy) has us commit the “ludic fallacy”: seeing linear instances as proof that nonlinear ones are linear as well. This lends us false confidence, which is augmented by our inherent epistemic arrogance (thinking we know more than we really do), leading to the paradox where a person’s beliefs are strengthened in the face of contradictory information.
Complex systems are unpredictable; think butterfly effect. Thus whether an event is deterministic but impossible to predict or truly random, it doesn’t matter -- we can’t predict either. Yet, we continue to model complex systems, and we can get some degree of accuracy as long as we use a fractal model, and not a Gaussian bell-curve. While the bell-curve works in many instances, it quickly reaches an upper ceiling (asymptote) beyond which outcomes are considered impossible. Reality, however, is far different and the high impact stuff is all pre-asymptote.
E.g. in the U.S. the mean height is 69.1 inches and the standard deviation is 2.9 inches. The tallest man was 107 inches, or 13 standard deviations above the mean, which the bell curve would tell us is a 1 in, well… a really big number (even 7 SDs is already about 1 in 390,000,000,000).
On the other hand, conditional on a deviation in height of more than 2SD, the average deviation is 2.37SD. So the bell curve is a reasonable approximation for ordinary events. But in a nonlinear world, the conditional expectation of an increase in a random variable doesn’t converge: conditional on a stock loss of 100 units, the average loss is 250 units.
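The height figures above can be reproduced with the standard normal distribution. The sketch below uses only Python's standard library; the function names are illustrative. The Pareto function is an assumption added for contrast: the text does not say which fat-tailed model produces the "loss of 100, average loss of 250" figure, so a Pareto tail with index 5/3 is just one choice that happens to give the same ratio.

```python
import math


def z_score(value: float, mean: float, sd: float) -> float:
    """How many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


def two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))


def normal_mean_beyond(k: float) -> float:
    """E[Z | Z > k] for a standard normal Z, i.e. pdf(k) / P(Z > k)."""
    pdf = math.exp(-k * k / 2) / math.sqrt(2 * math.pi)
    upper_tail = 0.5 * math.erfc(k / math.sqrt(2))
    return pdf / upper_tail


def pareto_mean_beyond(k: float, alpha: float) -> float:
    """E[X | X > k] for a Pareto tail with index alpha > 1 (assumed model)."""
    return k * alpha / (alpha - 1)


if __name__ == "__main__":
    print(z_score(107, 69.1, 2.9))            # tallest man, in SDs
    print(1 / two_sided_tail(7))              # "1 in ..." odds of a 7 SD deviation
    print(normal_mean_beyond(2))              # average deviation given more than 2 SD
    print(pareto_mean_beyond(100, 5 / 3))     # hypothetical fat-tailed loss
```

In the Gaussian case the conditional average hugs the threshold ever more tightly as the threshold rises, whereas for the assumed Pareto tail it stays a fixed multiple of the threshold, which is one way to read "doesn't converge".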
Often, a single observation represents 90% of the fat tail. In the stock market: the ten most extreme days in the financial market (S&P500) represented half of the returns.
Antifragile
In Brief: fragility/antifragility is easier to calculate than a statistical relationship; Antifragile systems look unprofitable from the outside because they suffer constant losses, but in the long-run they fare the best because they are prepared for Black Swan events; align interests and build systems from the bottom-up, not the other way round
Taleb invents the word Antifragile because the concept of something which benefits from stress doesn’t exist in the English language. Yet such a phenomenon exists all around us, and even in us: he goes to great lengths to discuss how the human body does better when exposed to stress: Wolff’s Law, bones strengthen after being broken; muscle fibers expand after being torn; the body responds better to sporadic starvation than routine feeding.
Robustness is the closest we have to Antifragile, but it doesn’t go far enough. Perfect robustness might be characterized by the phoenix. Antifragility would be the hydra.
Addressing fragility and building antifragility is a better strategy than maximizing upside with a fragile strategy, because when a negative black swan comes you’ll lose more than all the gains you made thus far. Practically and philosophically speaking, losses and mistakes are more informative than successes because they tell you where your flaws are and they are more definitive (success might be due to a confluence of other reasons). Thus avoiding them makes you more fragile, and prone to severe mistakes. So building an Antifragile system is all about allowing randomness, volatility, risk-taking, and failures, but distributing them across a large number of small fallible units. The role of connectivity and responsiveness cannot be ignored either: they are needed to prevent mistakes being repeated.
- Mistakes need to happen… requires non-intervention
- Mistakes need to be contained… requires small independent units with little risk of contagion
- Mistakes need to be learnt from… requires path-independence
Practice, unbiased and uncorrupted (by commercial motives), tinkering, and failing is what has led to most theoretical discoveries. (Taleb references an interesting book, At Home by Bill Bryson, which convincingly traces many pre-modern discoveries to the independent clergyman.) E.g. cooking recipes are developed purely through experiments, not theorizing, while the jet engine and 99.99% of modern drugs are the result of empirical research. So concentrating on theory as academia does today bears little fruit.
More on Fooled by Randomness
Generator and result
Taleb is very well-known for his skepticism of successful traders: “Most city hotshots are lucky fools”, something he delves into in Chapter 2. Using Russian roulette as an analogy of the generator (the underlying, inherently random processes that define our lives) behind reality, he describes how our myopic fixation on results precludes our ability to see the generator behind the scenes, and how the hotshot trader, simply put, placed more bullets in the chamber for a greater jackpot. Subsequent survivorship bias means that we only see the successes, which we rationalize after the fact with explanations such as “skill” since the part of the brain that deals with probability is actually the emotional part.
The Russian roulette analogy is apt because it applies to us all. Reality, Taleb claims, is far more vicious, because the revolver fires so rarely that we forget it’s even loaded. In this way, we’ve been fooled by randomness. The first step to probabilistic thinking is to see the successful trader for what he really is: an outlier, because the alternative histories where someone with such a strategy lost it all far outnumber the ones where he was successful. Very long random sample paths average out to equal each other (ergodicity), in Taleb’s words.
More on survivorship bias: if an infinite number of monkeys wrote on typewriters, one would eventually rewrite Shakespeare’s entire Folio word for word. But would you then bet on that very same monkey being able to do it again? No? Then how can we expect fund managers to repeat their performance when:
- you haven’t seen the number of other monkeys (fund managers who failed), and
- you fail to understand the degree of randomness involved in their work?
Evolution and randomness
People have come to abuse the true meaning of Darwin’s theories. Believing that organisms reproduce on a one-way route to perfection, they fail to see how randomness plays a part in the process and how in the “short-run” (which might span millennia), individuals who are not reproductively fit may reproduce and survive. Scientifically such individuals might be said to have negative mutations (or to just be lucky). Fortunately, ergodicity (extending time to infinity) + true Darwinian theory wipes these out.
Example: put two people in a casino. One will do better than the other. To a naïve observer, that individual has a survival advantage over the other. Say he’s taller, then the observer will identify this as the explanation for his survival advantage.
The bell curve has almost universal application in a wide range of fields. Its symmetrical shape and the fact that outliers in most situations can be safely ignored (the professor and the weather forecaster remove them before coming up with their averages) is perhaps why we’ve come to assume all other probability distributions to also be symmetrical, which has made it very difficult for us to distinguish between probability and expectation. (Expectation = probability x outcome). Understanding this distinction means that you wouldn’t repeatedly take a bet with a 99% chance of winning 99c if there was a 1% chance of losing $100. (You might take it once, but take it too many times and ergodicity means you’ll end up losing.)
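The 99c bet is easy to work through explicitly. A minimal sketch, using only the figures quoted above (the helper name is illustrative):

```python
def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# The bet from the text: 99% chance of winning $0.99, 1% chance of losing $100.
bet = [(0.99, 0.99), (0.01, -100.0)]
ev_per_bet = expected_value(bet)

if __name__ == "__main__":
    print(ev_per_bet)          # expectation of a single bet, in dollars
    print(1000 * ev_per_bet)   # expectation after taking the bet 1,000 times
```

The bet wins 99 times out of 100, yet its expectation is slightly negative, so a high probability of success says nothing on its own about whether the bet is worth repeating.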
Black swans, falsification, and Solon
John Stuart Mill first raised the black swan problem:
No amount of observations of white swans can allow the inference that all swans are white, but the observation of a single black swan is sufficient to refute that conclusion.
This combines with two other things: our natural tendency to resort to induction/inference (as opposed to logically sound deduction; the Problem of Induction is elaborated on in my summary of Philosophy The Basics), and the essence of finance, which is that there is a lot of information but no ability to conduct experiments. Together they inevitably lead us to conclusions which are logically flawed.
Further: just because a market has never gone down 20% in 3 months doesn’t mean it will never go down 20% in 3 months. It takes just one black swan to falsify the statement that “all swans are white”, whereas an infinite number of white swans will never prove the statement.
The material above is all part of the first chapter, aptly named “Solon’s Warning” after the story of Croesus, King of Lydia and the richest man of his day, and Solon, the wise Greek. Croesus, expecting the answer to be him, asked Solon if he was not the happiest man of all. Solon answered: “The observation of the numerous misfortunes that attend all conditions forbids us to grow insolent upon our present enjoyments, or to admire a man’s happiness that may yet, in course of time, suffer change. For the uncertain future has yet to come, with all variety of future; and him only to whom the divinity has [guaranteed] continued happiness until the end we may call happy.”
Yogi Berra puts it more succinctly: “it ain’t over until it’s over.” Or similarly, you’ll never know black swans exist until you see one.
Why are we so bad at thinking probabilistically?
Case in point:
- OJ Simpson trial: it was argued that only 10% of men who beat their wives go on to murder them, a probability that ignores the fact of the murder… but we need to deal with conditional probability, the probability that OJ killed his wife given that he had beaten her and that she was murdered (50%)
- A test for a disease has a 5% chance of resulting in a false positive. The disease affects 1/1000 people. A random person tests positive. What’s the probability of him having the disease?
- not 95%, but actually 2%: out of 1000 people, 1 would actually have the disease, but out of the remaining 999, ~50 would falsely test positive. The chance of said person having the disease is (diseased)/(all positive tests) and thus ~2%
- when asked whether people would prefer to buy flood insurance in California or for North America, respondents chose the former option even though North America includes California
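The disease-test answer in the list above is a direct application of Bayes' rule. A minimal sketch follows; it assumes, as the worked answer implicitly does, that the test never misses a true case (100% sensitivity), and the function name is illustrative.

```python
def posterior_given_positive(prevalence: float,
                             false_positive_rate: float,
                             sensitivity: float = 1.0) -> float:
    """P(disease | positive test) via Bayes' rule.

    `sensitivity` defaults to 1.0, an assumption: the text's example
    treats every diseased person as testing positive.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Figures from the text: 1 in 1,000 prevalence, 5% false positive rate.
    print(posterior_given_positive(1 / 1000, 0.05))
```

The false positives among the 999 healthy people swamp the single true case, which is why the answer lands near 2% rather than 95%.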
Much of this irrationality is covered exhaustively by Daniel Kahneman and the new field of behavioral economics, whose theoretical consequences mount a very robust challenge to neoclassical economics' Homo economicus model. Here Taleb offers the theories of Herbert Simon and Terry Burnham to provide an explanation. Simon, a Nobel laureate economist turned computer scientist, posited that it would cost too much energy to optimize every decision in life and therefore we make approximations. We stop deliberating when we reach a near-satisfactory solution/explanation, “satisficing”. Burnham, offering the evolutionary psychology side of things, suggests that we don’t, and can’t, think probabilistically simply because we never needed to on the plains of the African savanna.
Taleb offers more food for thought in Damasio’s Descartes’ Error and LeDoux’s Emotional Brain. The first revolves around the fictional surgical removal of a subject’s emotional brain. Subsequently the subject is unable to make a decision. (The mathematical explanation being that no computer faced with optimizing a multi-variable situation can come to a solution; see Buridan’s Donkey.) LeDoux’s reasoning is just as convincing: the connections from the body’s emotional systems to the cognitive systems are stronger than the reverse, i.e. we feel emotions (limbic) then rationalize (neocortex).
More on Black Swan
We’re not hopeless. It was shown from studies of infant behavior that we come equipped with mental machinery that causes us to selectively generalize from experiences (i.e. to selectively acquire inductive learning in some domains but remain skeptical in others). That’s why we don’t think somebody is immortal simply because we’ve never seen them die. But while inductive reasoning may hold for phenomena such as life expectancy, it fails in extreme events: war, earthquakes, market crashes.
The truncation of information is the solution humans use for the problem of endless information and limited neural infrastructure. Narratives, i.e. stories, with causative relationships are easy to remember – why do most books/films follow the same story?
This, and our tendency to “platonify”, or see things in terms of studied materials and conceptualized ideals, blinds us to the randomness around us.
I asked you to think of the butterfly effect earlier, right? It’s not ludicrous: Berry in 1978 showed that by the 56th impact of billiard balls on a table, every elementary particle in the universe needs to be included in the calculations of the balls’ future positions… It is in this context that Taleb introduces the idea of metaprobability, or the uncertainty of our uncertainty. This is vital: certainty is dangerous, and oftentimes we prefer false certainty over well-founded uncertainty.
“Some people would rather use a map of the Pyrenees while lost in the Alps than use nothing at all”
In a linear world, things converge; but not so in a non-linear one.
If you purport A->B, you must also show that a lack of A doesn’t also cause B, but such evidence is often less researched – though more informative – and thus presents the problem of silent evidence
Cicero presented the following story: Diagoras was shown painted tablets bearing the portraits of worshippers who prayed, then survived a subsequent shipwreck. Diagoras asked, “Where are the pictures of those who prayed, then drowned?”
This is an extension to the confirmation bias idea presented earlier in that the silent evidence is the sample size you cannot see: the monkeys who just produced manuscripts of gibberish.
So, applying this, we can debunk common misconceptions such as “beginner’s luck” and “swimmer’s body”: simply ask yourself where the unlucky beginners or the swimmers without good bodies are. Well, not in the sample you’re observing.
Domain specificity: our intuitions depend on context—SAT-smart people can be really dumb in real life
More on Antifragility
Antifragile systems are often built of smaller fragile units e.g. evolution depends on the mortality and implicitly risky nature of species mutation, economic progress depends on the risk-taking and high failure rate of entrepreneurs, while an empire of semi-autonomous regions does better than a centralized nation-state – top-down systems are often fragile: look at the great Soviet experiment.
So good systems are set up to have small, contained errors and to learn from them, for example the airline industry, where after every accident a major industry review is conducted and recommendations implemented. In the same line of thinking, somebody who has never made any mistakes is less reliable than one who has made many mistakes (but never the same one twice).
- restaurants take risks and often blow up but those that survive constitute the robust F&B ecosystem
- individual species go extinct, but the “fittest” species survive thereby passing on their genes
- forests have small fires to purge flammable material before it accumulates to catastrophic levels
Some pics from his "Triad" table, which view certain things from different angles across the fragile spectrum, e.g. in the bottom row: all press is good press for artists such as authors while your average middle-manager would get fired for any bad publicity and the average worker is relatively insulated/secure
Fragility and antifragility simply denote concave (more to lose than gain) and convex (more to gain than lose) functions respectively (orange line = neutral outcome):
Although calculating the probability of x itself, the event/variable, is impossible, f(x), the exposure or payoff (impact x probability) to an event x, is easier to map and also far more useful. E.g. x might be the intensity of an earthquake while f(x) is the number of people that die from it; x is the number of meters somebody falls while f(x) is the damage he sustains from a fall of height x.
Whether the payoff is fragile or antifragile to volatility can then be calculated through derivative analysis over a range of x values… for example if it positively reacts more than proportionally to increases in x then it is antifragile:
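One simple way to read "derivative analysis over a range of x values" is a second-difference check: perturb x up and down by the same amount and see whether the average payoff improves (convex, antifragile) or worsens (concave, fragile). The sketch below is an illustration under that reading, not Taleb's own procedure, and both example payoff functions are hypothetical.

```python
def second_difference(f, x: float, dx: float) -> float:
    """f(x + dx) + f(x - dx) - 2 f(x): positive if f is convex around x."""
    return f(x + dx) + f(x - dx) - 2 * f(x)


def classify_exposure(f, x: float, dx: float, tol: float = 1e-12) -> str:
    """Label the payoff f as antifragile, fragile, or neutral to swings of size dx around x."""
    gap = second_difference(f, x, dx)
    if gap > tol:
        return "antifragile"   # gains from +dx outweigh losses from -dx
    if gap < -tol:
        return "fragile"       # losses from swings outweigh gains
    return "neutral"


def fall_damage(height: float) -> float:
    """Hypothetical payoff: harm (negative) grows faster than the height fallen."""
    return -(height ** 2)


def option_payoff(price: float) -> float:
    """Hypothetical payoff: right, but not obligation, to buy at 100."""
    return max(price - 100.0, 0.0)


if __name__ == "__main__":
    print(classify_exposure(fall_damage, 2.0, 1.0))
    print(classify_exposure(option_payoff, 100.0, 10.0))
    print(classify_exposure(lambda x: 3 * x + 1, 1.0, 0.5))
```

Note that the check needs no probability for x at all; it only asks how the payoff responds to a symmetric swing, which is what makes exposure easier to assess than the event itself.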
On robustness vs antifragility
Vaccinations are the practical application of robustification, while hormesis, the idea that a small dose of a harmful substance is good for the organism, is characteristic of antifragility. A German toxicologist noted that small doses of poison stimulate the growth of yeast while larger doses cause harm.
Epiphenomenal beliefs are when you observe A&B occurring and thus believe that there is a causal relationship between the two. For example the mistaken belief that education leads to wealth. (Empirical studies show that the direction of causality is actually reversed)
Religion and us
Modernity is the trend of humans assuming the agency of making improvements by stifling volatility while blindly believing in the religion of science. So, in fact, theistic religion was good for us in the sense that it removed our confidence in our ability to “stabilize” the world by attempting to remove inherent volatility, something which destabilized it.
The “Lucretius problem” is named by Taleb after the Latin philosopher who wrote that the fool believes that the tallest mountain in the world will be equal to the tallest one he has observed. History isn’t made till it is, by which time it might be too late.
Beware halo effects, which are the opposite of domain dependence: halo effects occur when you imagine somebody to be good at A because he’s good at B, e.g. you imagine a good talker is also a good do-er.