
Do you really mean that "removing impossibility and renormalizing" is equivalent to Bayes Rule?

For example, imagine that we know a priori that a coin has only two possible biases, both of which are neither 0 nor 1. When I use Bayes Rule to update my belief in the two biases in response to observed data, I don't see how I am removing any impossibilities. It seems much more like I am downweighting the unlikely bias and upweighting the likely bias.



Excellent question. Let me unpack a bit. Here's an example from the PDF I linked above:

  fluStatusGivenPositiveTest = do
    fluStatus  <- percentWithFlu 10
    testResult <- if fluStatus == Flu
                    then percentPositive 70
                    else percentPositive 10
    guard (testResult == Pos)
    return fluStatus
In this code, you can read the operator "<-" as "pick a possible value from a probability distribution."
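
If you want to run something like that yourself, here is a minimal sketch of definitions that make the snippet above compile. (The Dist type and the bodies of percentWithFlu/percentPositive are my guesses at plausible implementations, not necessarily what the PDF actually uses.)

  {-# LANGUAGE DeriveFunctor #-}
  import Control.Applicative (Alternative (..))
  import Control.Monad (MonadPlus, guard)

  data FluStatus  = Flu | Healthy deriving (Eq, Show)
  data TestResult = Pos | Neg     deriving (Eq, Show)

  -- A distribution is just a list of (value, probability) pairs.
  newtype Dist a = Dist { unDist :: [(a, Rational)] }
    deriving (Functor, Show)

  instance Applicative Dist where
    pure x = Dist [(x, 1)]
    Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

  instance Monad Dist where
    Dist xs >>= f = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- unDist (f x)]

  -- `guard` needs this; `empty` is the distribution with no possible worlds.
  instance Alternative Dist where
    empty = Dist []
    Dist xs <|> Dist ys = Dist (xs ++ ys)

  instance MonadPlus Dist  -- methods default to the Alternative instance

  -- "n percent of people have the flu"
  percentWithFlu :: Rational -> Dist FluStatus
  percentWithFlu n = Dist [(Flu, n / 100), (Healthy, 1 - n / 100)]

  -- "n percent of tests come back positive"
  percentPositive :: Rational -> Dist TestResult
  percentPositive n = Dist [(Pos, n / 100), (Neg, 1 - n / 100)]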

Until we look at the test results, there are four possible "worlds" with the following probabilities:

   7% (Flu, Pos)
   3% (Flu, Neg)
   9% (Healthy, Pos)
  81% (Healthy, Neg)
But once we see that the test result is "Pos", then all those worlds with "Neg" become impossible, giving us:

  7% (Flu, Pos)
  9% (Healthy, Pos)
But our probabilities don't add up to 100% any more. Fortunately, all we need to do is normalize them. That gives us:

  43.75% (Flu, Pos)
  56.25% (Healthy, Pos)
This normalization step is basically the denominator in Bayes rule.
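
In code, the renormalization is a one-liner. Continuing the sketch above (again, just one plausible way to write it, not necessarily how the PDF does it):

  -- Rescale the surviving worlds so their probabilities sum to 1 again.
  -- This step plays the role of the denominator in Bayes' rule.
  normalize :: Dist a -> Dist a
  normalize (Dist xs) = Dist [(x, p / total) | (x, p) <- xs]
    where total = sum (map snd xs)

  -- With the definitions sketched above:
  --   unDist fluStatusGivenPositiveTest
  --     == [(Flu, 7 % 100), (Healthy, 9 % 100)]   -- the "Neg" worlds are already gone
  --   unDist (normalize fluStatusGivenPositiveTest)
  --     == [(Flu, 7 % 16), (Healthy, 9 % 16)]     -- i.e. 43.75% and 56.25%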

As noted above, the number of worlds grows exponentially with the number of probabilistic choices we make. So if you want to use this on a larger problem, you need to use sampling (run the program N times, tabulate the results) or a particle system (which essentially does the same thing in parallel).
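
Here is a crude version of the "run it N times and tabulate" idea, reusing the FluStatus/TestResult types from the sketch upthread. The helper names are made up, and this is plain rejection sampling, nothing clever:

  import System.Random (randomRIO)

  sampleFlu :: IO FluStatus
  sampleFlu = do
    r <- randomRIO (0, 1 :: Double)
    return (if r < 0.1 then Flu else Healthy)

  sampleTest :: FluStatus -> IO TestResult
  sampleTest s = do
    r <- randomRIO (0, 1 :: Double)
    return (if r < (if s == Flu then 0.7 else 0.1) then Pos else Neg)

  -- Run until we have n accepted samples; runs whose test isn't Pos are thrown away.
  -- Returns (flu count, healthy count); flu / (flu + healthy) approaches 0.4375 as n grows.
  countFluGivenPos :: Int -> IO (Int, Int)
  countFluGivenPos n = go n (0, 0)
    where
      go 0 counts = return counts
      go k (flu, healthy) = do
        s <- sampleFlu
        t <- sampleTest s
        case (t, s) of
          (Neg, _)       -> go k       (flu, healthy)        -- rejected sample
          (Pos, Flu)     -> go (k - 1) (flu + 1, healthy)
          (Pos, Healthy) -> go (k - 1) (flu, healthy + 1)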


It's unimportant, but the recently reintroduced 'monad comprehensions' bring out some nice features of this approach:

      fluStatusGivenPositiveTest = [fluStatus | fluStatus <- percentWithFlu 10
                                              , testResult <- testRate fluStatus
                                              , testResult == Pos]
        where testRate Flu     = percentPositive 70
              testRate Healthy = percentPositive 10
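
For anyone following along at home, I believe this form needs the corresponding language extension enabled:

  {-# LANGUAGE MonadComprehensions #-}

and the testResult == Pos guard still relies on the distribution type having a MonadPlus/Alternative instance, just like the do-notation version.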


Until we look at the coin flip results for a coin known a priori to have only two possible biases a and b, where neither a nor b equals 0 or 1, there are two possible "worlds" with the following probabilities:

  0.5 (a)
  0.5 (b)
But once we see that the first coin flip is tails, which we code as -1, then we update using Bayes rule, giving us:

  p(-1|a)p(a)/p(-1) (a, -1)
  p(-1|b)p(b)/p(-1) (b, -1)
I just don't see any removing of possible worlds here. Am I doing this wrong?


Let's take the flu test I used in the grandparent post, and walk through it step by step using the traditional notation.

  P(Flu) = 0.1
  P(not Flu) = 1 - P(Flu) = 0.9
  P(Pos|Flu) = 0.7
  P(Pos|not Flu) = 0.1

  P(Flu|Pos)
    = P(Pos|Flu)P(Flu)
      -------------------------------------------
      P(Pos|Flu)P(Flu) + P(Pos|not Flu)P(not Flu)
    = 0.07 / (0.07 + 0.09)
    = 0.4375

  P(not Flu|Pos) = 0.09 / (0.07 + 0.09) = 0.5625
As you can see, the same numbers show up in both versions of the problem. In the traditional version, the "impossible worlds" are (Flu, not Pos) and (not Flu, not Pos); because we know we have Pos, their terms P(not Pos|Flu)P(Flu) and P(not Pos|not Flu)P(not Flu) never appear in the equation. The division by (0.07 + 0.09) is the same as the normalization step in my example: once we narrow our world down to 0.07 and 0.09, we need to divide by 0.16 so that everything sums back up to 1.

You may need to work through the math in both the Haskell version and the traditional version before it becomes completely obvious that they're exactly the same thing.
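
If it helps, here is the same arithmetic as a couple of plain Haskell definitions you can check against both versions (no monads involved, just the numbers above):

  pFluGivenPos, pNotFluGivenPos :: Double
  pFluGivenPos    = (0.7 * 0.1) / (0.7 * 0.1 + 0.1 * 0.9)  -- 0.07 / 0.16 = 0.4375
  pNotFluGivenPos = (0.1 * 0.9) / (0.7 * 0.1 + 0.1 * 0.9)  -- 0.09 / 0.16 = 0.5625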


The two a priori conditions can also be viewed as 4 possible worlds by branching on the outcome of the coin toss (unobserved either because it hasn't happened yet or, if you prefer, because it has happened but you haven't looked at it yet):

    P(a)P(heads|a)     (a, heads)
    P(a)P(tails|a)     (a, tails)
    P(b)P(heads|b)     (b, heads)
    P(b)P(tails|b)     (b, tails)
When the coin toss is observed, you are removing the possible worlds in which the coin toss did not have the observed outcome. The renormalization step at that point will yield the same posterior probabilities as Bayes' rule.

In your example, we are eliminating the possible worlds where the flip is heads, leaving a total remaining probability that is precisely p(-1).
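
For what it's worth, here is the coin version written in the same style as the flu snippet upthread, using the toy Dist monad sketched there. Bias, Face, and pTails are names I've made up, and a and b stay abstract as a function argument:

  data Bias = A | B          deriving (Eq, Show)
  data Face = Heads | Tails  deriving (Eq, Show)

  -- pTails gives each bias's probability of tails (coded as -1 in your notation).
  biasGivenTails :: (Bias -> Rational) -> Dist Bias
  biasGivenTails pTails = do
    bias <- Dist [(A, 1/2), (B, 1/2)]                            -- the two a priori worlds
    face <- Dist [(Tails, pTails bias), (Heads, 1 - pTails bias)]
    guard (face == Tails)          -- observing tails removes both Heads worlds
    return bias
  -- What survives is [(A, p(-1|a)p(a)), (B, p(-1|b)p(b))]; renormalizing divides by
  -- their sum, p(-1), which is exactly the Bayes' rule update from the question.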


Great explanation. Thank you!



