What is Bayes’ Theorem and Bayesian reasoning?

It’s a helpful way to teach yourself how to react appropriately to new information.

Introduction

Let’s say you strongly believe something to be true. For example, I might have a pile of 100 coins, and I tell you that 99 of them are fair but one is rigged. When you randomly draw a coin from the pile, you’re pretty sure the coin isn’t rigged — you’re 99% sure, in fact. You decide to flip it a few times anyway. The first time it lands on heads. The second time? Heads. The third time? Heads again. And the fourth time? Heads!

There are a few different ways you could process this information.

  1. You could ignore what you’ve just seen. Everything you knew before flipping the coin is still valid, so you still think there’s a 99% chance of it being a fair coin — the four heads in a row is just a fluke.
  2. You could decide definitively that the coin is rigged. If the coin was fair, the odds of seeing what you’ve just seen are very low: 50% * 50% * 50% * 50% = 6.25%. There’s a much higher chance of seeing four heads in a row if the coin is rigged, so you’re now sure that the coin is rigged.
  3. You could take the middle ground. Everything you knew before flipping the coin is still valid — there was a much higher chance of you picking a fair coin than a rigged one — so you’re still pretty sure the coin is fair. But you acknowledge that what you just saw would be very unlikely if the coin were fair, so you’re a bit less confident in its fairness than before — instead of being 99% confident, you’re now only, say, 86% confident.

Option 3 strikes me, and many other people, as the most sensible way of thinking. It’s less of an all-or-nothing approach, considering both what you knew beforehand and what you’ve just learnt. Helpfully, it’s also pretty easy to apply in day-to-day life. It’s called Bayesian reasoning.

Why is Bayesian reasoning useful?

We have a tendency to either under- or over-react to new pieces of evidence. The example above, where one option was to ignore the evidence altogether because of your prior beliefs and another was to let the evidence completely overrule them, was extreme. But without realising it, you’ve probably done something similar before.

Consider the last time you read about a murder. I’d bet it made you a fair bit more scared of being murdered, even though statistically one additional murder doesn’t meaningfully change the overall likelihood of being murdered. Being more scared is a completely human reaction, but sometimes it’s helpful to step back and remember what your beliefs were before you learnt about the new event.

Similarly, we all know people who are entrenched in their views and refuse to change their minds, even when presented with important new evidence. Being open to having your mind changed is really important.

Approaching situations with a Bayesian mindset is a good way to overcome this. It helps you ensure that your beliefs change with the evidence, and that your beliefs are grounded in everything you know, not just the thing that’s come to mind most recently.

Decoding Bayesian jargon

Bayesians — people who employ Bayesian reasoning — like to use jargon to simplify how they talk about things. So before we jump in any further, let’s learn the jargon.

When we approach a situation, we often go in with a prior belief, often shortened to just “prior”. In the above example, your prior was your 99% certainty that the coin isn’t rigged. You have a whole range of priors on all sorts of topics, at all sorts of confidence levels: you might be 25% confident that aliens exist, or 75% confident that technology will help solve climate change. (Priors can be objective probabilities, as in the example where you know how many coins are rigged, or they can be your subjective estimates of something’s likelihood. We’ll discuss this more shortly…)

Every so often, you’re confronted by new evidence. In the above example, that evidence was seeing the coin land on heads four times in a row.

If you’re a Bayesian, you then use the evidence to update your priors. In the example above, this was the process where you acknowledged that the evidence you just saw was unlikely, so you reduced your confidence in the coin’s fairness a little bit.

Bayes’ Theorem: how to apply this in practice

In most situations, all you need to do is remember that Bayesian reasoning exists. When you encounter a new piece of evidence, X, remind yourself of your prior belief, think about how likely it is that you’d see X if your prior belief was true, and think about how likely it is you’d see X in general, regardless of whether your prior belief was true or not. Taking all these things into account will help you “update” your priors.

If you want to really be an expert, though, you can learn Bayes’ Theorem. Bayes’ Theorem is the mathematical representation of what we’ve just talked about: it tells you just how much to update your priors by. The theorem looks like this:

    P(A|B) = ( P(B|A) * P(A) ) / P(B)

Where:

  • P(A|B) means the probability of A happening, given B (the | means ‘given’ in probability notation).
  • P(B|A) means the probability of B happening, given A.
  • P(A) means the probability of A happening.
  • P(B) means the probability of B happening.
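
If it helps to see that written out as code, here’s a minimal sketch of the formula in Python (the function name is just illustrative, not from any library):

    # A minimal sketch of Bayes' Theorem as a Python function:
    # P(A|B) = P(B|A) * P(A) / P(B)
    def bayes_posterior(p_b_given_a, p_a, p_b):
        """Return P(A|B), the updated probability of A after seeing evidence B."""
        return p_b_given_a * p_a / p_b

    # The worked examples below plug the coin and cancer-test numbers into this.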

It looks complicated, but I promise you that it’s simpler than it seems. Let’s apply it to our coin example. We’re trying to figure out the probability of the coin being fair, given that it showed four heads. First, we need to think about how likely it is that we’d get four heads if the coin was fair. Then we need to multiply that by the probability of the coin being fair, regardless of the evidence. Then we need to divide the whole thing by the probability of getting four heads, regardless of whether the coin is fair or not.

Okay, now let’s go back to the equation. Calculating the probability of the coin being fair (A), given that it showed four heads (B), is super easy:

    P(A|B) = ( P(B|A) * P(A) ) / P(B)

P(B|A) — the probability of getting four heads if the coin was fair — is 50% * 50% * 50% * 50%, or 6.25% — the value we calculated earlier.

P(A) — the probability of the coin being fair — is your prior. In this case, you thought there was a 99% chance of the coin being fair. 

P(B) — the probability of getting four heads — is a little trickier to calculate. We think the coin is fair 99% of the time, so 99% of the time there’s a 6.25% chance of getting four heads. But 1% of the time the coin is rigged — and if we assume a rigged coin always lands on heads, it has a 100% chance of showing four heads. So to calculate the overall probability of getting four heads, we add those two scenarios together: (99% * 6.25%) + (1% * 100%) = 7.1875%.

So putting all of that together: 

  • P(A|B)= (6.25% * 99%)/(7.1875%)
  • P(A|B)= 86%

The real magic of this formula is that the more “surprising” the evidence, the more your prior gets updated. Instead of seeing four heads in a row, for instance, let’s say you saw ten. That’s very unlikely to happen if the coin is fair (about a 0.1% likelihood). And if we apply Bayes’ Theorem with that new value, we get P(A|B) ≈ 8.8%. In this case, the smart thing to do would be to assume the coin is rigged.
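
If you want to check these numbers yourself, here’s a short Python sketch of the coin example. Like the arithmetic above, it assumes the rigged coin always lands on heads, and the function name is just for illustration:

    # Posterior probability that the coin is fair after seeing n heads in a row,
    # starting from a 99% prior. Assumes the rigged coin always lands on heads.
    def p_fair_given_heads(n, prior_fair=0.99):
        p_heads_if_fair = 0.5 ** n                           # P(B|A)
        p_heads_if_rigged = 1.0                              # rigged coin always shows heads
        p_heads = (prior_fair * p_heads_if_fair
                   + (1 - prior_fair) * p_heads_if_rigged)   # P(B)
        return p_heads_if_fair * prior_fair / p_heads        # Bayes' Theorem

    print(p_fair_given_heads(4))    # ~0.86: still fairly confident the coin is fair
    print(p_fair_given_heads(10))   # ~0.088: probably rigged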

An example of when this is useful

Bayesians often like to point to cancer tests as a way to demonstrate the method’s value. Let’s say you’re about to get a test for breast cancer. On your way to the clinic, you do some googling, and you learn that only 1% of women have breast cancer. You also learn that people that do have cancer test positive only 80% of the time. Finally, you learn that people that don’t have cancer still test positive 9.6% of the time. 

The next day, you receive a positive test result for breast cancer. You’re probably freaking out! But should you? Bayes’ theorem can help you out. Let’s pull up the equation again:

    P(A|B) = ( P(B|A) * P(A) ) / P(B)

In this case, we want to figure out the probability that you have cancer (A), given a positive test result (B).

P(B|A), the probability of getting a positive test result given that you do have cancer, is 80%: people that do have cancer test positive 80% of the time.

P(A), the probability of having cancer, is 1%.

P(B) is the overall probability of getting a positive result. Again, we can break it down into two scenarios:

The 1% of people who do have cancer get a positive result 80% of the time, and the 99% of people who don’t have cancer get a positive result 9.6% of the time. Putting that together gives us:

  • P(B) = (1%*80%)+(99%*9.6%)
  • P(B) = 10.304%

So putting everything together:

  • P(A|B)= (80%*1%)/10.304%
  • P(A|B) = 7.76%
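
If you’d rather see that arithmetic spelled out in code, here’s the same calculation as a short Python sketch (the variable names are just for illustration):

    # Probability of having cancer given a positive test, using the numbers above.
    p_cancer = 0.01            # P(A): 1% of women have breast cancer
    p_pos_if_cancer = 0.80     # P(B|A): people with cancer test positive 80% of the time
    p_pos_if_healthy = 0.096   # people without cancer still test positive 9.6% of the time

    # P(B): the overall probability of a positive result, combining both groups
    p_pos = p_cancer * p_pos_if_cancer + (1 - p_cancer) * p_pos_if_healthy

    p_cancer_given_pos = p_cancer * p_pos_if_cancer / p_pos   # Bayes' Theorem
    print(p_pos)                # 0.10304, i.e. 10.304%
    print(p_cancer_given_pos)   # ~0.0776, i.e. about 7.76%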

That means you’ve got a 7.76% chance of having cancer. That’s not nothing, obviously, but it’s not an astronomically high percentage either. And chances are, it’s a much lower probability than you expected before doing the maths — and a lot less scary.

Use with caution

Bayesian reasoning and Bayes’ Theorem are a super useful way of thinking. But we quite often don’t have real, calculated numbers to plug into the theorem. And the theorem’s results are only as good as the numbers you put into it.

Instead of picking a coin from a pile where I know one in 100 is rigged, I might be trying to figure out if a coin I got from the bank is fair, given that it showed heads four times in a row. Bayes’ Theorem requires me to put my prior in — the probability of a coin from a bank being fair. What number should I put in here? I’d guess 99.9%, but that isn’t a particularly educated guess — I have no idea how common rigged coins are, or how good banks are at spotting them.
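
To see how much the answer depends on that guess, here’s a small sketch that reruns the four-heads calculation with a few different priors. It still assumes a rigged coin always lands on heads, which is itself a guess:

    # How the posterior after four heads in a row changes with the guessed prior.
    # Assumes (another guess) that a rigged coin always lands on heads.
    for prior_fair in (0.99, 0.999, 0.9999):
        p_heads_if_fair = 0.5 ** 4
        p_heads = prior_fair * p_heads_if_fair + (1 - prior_fair) * 1.0
        posterior = p_heads_if_fair * prior_fair / p_heads
        print(prior_fair, round(posterior, 3))
    # 0.99   -> 0.861
    # 0.999  -> 0.984
    # 0.9999 -> 0.998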

We can imagine other examples where I don’t know what numbers to put into Bayes’ Theorem. Let’s say tomorrow we find a colony of bacteria on Mars, and I want to revise my estimate of the probability of intelligent life on Mars. To do that, I’d need to know the probability of finding bacteria if there were intelligent life, my prior probability of there being intelligent life on Mars, and the overall probability of there being bacteria on Mars. I don’t know any of those numbers, and even if I were to guess, there’s a very good chance that my guesses would be off by an order of magnitude.

In all of these cases, I could make up some numbers, plug them into the theorem, and confidently present the theorem’s output as an “accurate” probability. But the theorem’s output would in fact be close to meaningless, because the inputs are all made up.

It’s very easy to hide our uncertainty using numbers. By assigning a numerical probability to something, we can trick ourselves and others into thinking that the probability is “objective” and “correct”, even if it has very little basis in reality. The only way we can actually learn about reality is to do what scientists have always done: investigate, learn, and test hypotheses. Even when we do this, our priors will often be subjective — influenced by what we’ve chosen to investigate, our methods for investigation, and how much weight we put on the conclusions.

Bayes’ Theorem isn’t a replacement for the knowledge-seeking process; it’s just a helpful way to synthesise the knowledge you discover in the knowledge-seeking process. But as long as you remember that, it can be a very useful tool for making sure your beliefs take into account new evidence.
