Mostly in order to practice the Mastodon interface, I'm going to reproduce one of my posts from the now-defunct Google+ service (on probability paradoxes, first posted Sep 11, 2014) as a thread here. 🧵

Probability and statistics abound with fallacies, "paradoxes", and otherwise unintuitive statements, often caused by sampling biases that might not be immediately obvious without some mathematical (or similarly careful) analysis. Some of my favorite examples of these phenomena are Simpson's paradox, Berkson's fallacy, and the friendship paradox (all of which can be found for instance on Wikipedia). 1/n

One such unintuitive fact (basically a special case of Simpson's paradox) is the following: the number of failures that a person experiences in a field can be positively correlated with how competent that person is in that field. The more failures a person generates, the more likely that person is to be competent! 2/n

Let's see how. For the sake of discussion, let's say that the field is book writing, and that there are only two types of book one can produce: good books (successes) and bad books (failures). Let's also say for simplicity that authors fall into two classes: competent writers, who produce good books 3/4 of the time and bad books 1/4 of the time, and incompetent writers, who produce good books only 1/4 of the time and bad books 3/4 of the time. 3/n

Intuitively, one would then expect that an author who has written a lot of bad books is more likely to be an incompetent author than an author who has written very few bad books. But this is not necessarily the case! The problem is that there is a confounding variable, namely the number of books each author writes. 4/n

This variable is partly determined by internal author traits, such as the author's persistence, but is also influenced by the external success of the author's works, and this can be enough to flip the correlation. 5/n

To make a concrete numerical illustration of this, let us consider four types of authors, who are either competent or incompetent, and either persistent or not persistent.

1. A competent, not persistent author might write just 4 books (say) during his or her career, of which 3 are good and 1 is bad.

2. An incompetent, not persistent author might also write just 4 books in his or her career, of which 1 is good and 3 are bad.

6/n

3. A competent, persistent author might write as many as 40 books over his or her career, of which 30 are good and 10 are bad.

4. An incompetent, persistent author might aspire to write 40 books in his or her career; however, faced with frequent negative reviews and poor sales, he or she may eventually quit after writing only 8 books, of which 2 are good and 6 are bad.

7/n

With this simple example, one sees that it is the competent, persistent author who ends up writing the most bad books (10); also, out of the 20 bad books published by the above four authors, the majority (1 + 10 = 11, versus 3 + 6 = 9) were written by competent authors rather than incompetent ones. Because of the biasing effect of discouragement of persistent, incompetent authors, the fact that an author has a large number of bad books to his or her name becomes evidence in favor of competent writing, rather than incompetent writing! 8/n
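For anyone who wants to check the arithmetic, here is a minimal Python sketch of the tally. The book counts are the ones from the four author types above; the variable names, and the assumption that there is exactly one author of each type, are mine and purely illustrative.

```python
from fractions import Fraction

# (description, is_competent, good_books, bad_books) for the four author types.
# Assumption: exactly one author of each type, as in the illustration above.
authors = [
    ("competent, not persistent", True, 3, 1),
    ("incompetent, not persistent", False, 1, 3),
    ("competent, persistent", True, 30, 10),
    ("incompetent, persistent", False, 2, 6),  # quit after 8 of a planned 40 books
]

# Total bad books written by each class of author.
bad_by_competent = sum(bad for _, competent, _, bad in authors if competent)
bad_by_incompetent = sum(bad for _, competent, _, bad in authors if not competent)
total_bad = bad_by_competent + bad_by_incompetent

# Fraction of all bad books that were written by a competent author.
share_competent = Fraction(bad_by_competent, total_bad)

# The single author with the most bad books.
most_bad = max(authors, key=lambda author: author[3])

print(f"bad books by competent authors:   {bad_by_competent}")
print(f"bad books by incompetent authors: {bad_by_incompetent}")
print(f"share written by competent authors: {share_competent}")
print(f"most bad books: {most_bad[0]} ({most_bad[3]})")
```

Changing the last entry to the 40 books that author originally aspired to (10 good, 30 bad) flips the conclusion back to the intuitive one, which shows that the early quitting is what drives the effect here.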

It is important though to note that correlation does not imply causation: the above analysis does not indicate that one can become a better writer by intentionally producing bad books. (See also "Goodhart's law".) Similarly, the above analysis does not mean that a writer can cite his or her large number of badly written books as evidence of competence, as one has to take self-selection bias into account in this case. END


@tao every person who historically confused correlation with causation has since ended up dead. I think we all know what that means.

@ColinTheMathmo @tao oh god, it's not mine, I don't claim to be that smart. I saw it somewhere and liked it.

Searching Google suggests I paraphrased twitter.com/silvervvulpes/stat

@Geoff Oh, cool, thanks for tracking that down. I'll have a look and see what seems to be the right balance.

Lots of these clever, pithy sayings have many variants over time as they evolve into a "best version". Attributions in these cases can be hard, and sometimes not really justified.

I'll check.

Cheers!

