
Do you really mean that "removing impossibility and renormalizing" is equivalent to Bayes Rule?

For example, imagine that we know a priori that a coin has only two possible biases, both of which are neither 0 nor 1. When I use Bayes Rule to update my belief in the two biases in response to observed data, I don't see how I am removing any impossibilities. It seems much more like I am downweighting the unlikely bias and upweighting the likely bias.



Excellent question. Let me unpack a bit. Here's an example from the PDF I linked above:

  fluStatusGivenPositiveTest = do
    fluStatus  <- percentWithFlu 10
    testResult <- if fluStatus == Flu
                    then percentPositive 70
                    else percentPositive 10
    guard (testResult == Pos)
    return fluStatus
In this code, you can read the operator "<-" as "pick a possible value from a probability distribution."
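
If you want to run something like that yourself, here is a minimal sketch of definitions that make the snippet above compile. (The Dist type and the bodies of percentWithFlu/percentPositive are my guesses at plausible implementations, not necessarily what the PDF actually uses.)

  {-# LANGUAGE DeriveFunctor #-}
  import Control.Applicative (Alternative (..))
  import Control.Monad (MonadPlus, guard)

  data FluStatus  = Flu | Healthy deriving (Eq, Show)
  data TestResult = Pos | Neg     deriving (Eq, Show)

  -- A distribution is just a list of (value, probability) pairs.
  newtype Dist a = Dist { unDist :: [(a, Rational)] }
    deriving (Functor, Show)

  instance Applicative Dist where
    pure x = Dist [(x, 1)]
    Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

  instance Monad Dist where
    Dist xs >>= f = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- unDist (f x)]

  -- `guard` needs this; `empty` is the distribution with no possible worlds.
  instance Alternative Dist where
    empty = Dist []
    Dist xs <|> Dist ys = Dist (xs ++ ys)

  instance MonadPlus Dist  -- methods default to the Alternative instance

  -- "n percent of people have the flu"
  percentWithFlu :: Rational -> Dist FluStatus
  percentWithFlu n = Dist [(Flu, n / 100), (Healthy, 1 - n / 100)]

  -- "n percent of tests come back positive"
  percentPositive :: Rational -> Dist TestResult
  percentPositive n = Dist [(Pos, n / 100), (Neg, 1 - n / 100)]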

Until we look at the test results, there are four possible "worlds" with the following probabilities:

   7% (Flu, Pos)
   3% (Flu, Neg)
   9% (Healthy, Pos)
  81% (Healthy, Neg)
But once we see that the test result is "Pos", then all those worlds with "Neg" become impossible, giving us:

  7% (Flu, Pos)
  9% (Healthy, Pos)
But our probabilities don't add up to 100% any more. Fortunately, all we need to do is normalize them. That gives us:

  43.75% (Flu, Pos)
  56.25% (Healthy, Pos)
This normalization step is basically the denominator in Bayes rule.
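
In code, the renormalization is a one-liner. Continuing the sketch above (again, just one plausible way to write it, not necessarily how the PDF does it):

  -- Rescale the surviving worlds so their probabilities sum to 1 again.
  -- This step plays the role of the denominator in Bayes' rule.
  normalize :: Dist a -> Dist a
  normalize (Dist xs) = Dist [(x, p / total) | (x, p) <- xs]
    where total = sum (map snd xs)

  -- With the definitions sketched above:
  --   unDist fluStatusGivenPositiveTest
  --     == [(Flu, 7 % 100), (Healthy, 9 % 100)]   -- the "Neg" worlds are already gone
  --   unDist (normalize fluStatusGivenPositiveTest)
  --     == [(Flu, 7 % 16), (Healthy, 9 % 16)]     -- i.e. 43.75% and 56.25%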

As noted above, the number of worlds grows exponentially with the number of probabilistic choices we make. So if you want to use this on a larger problem, you need to use sampling (run the program N times, tabulate the results) or a particle system (which essentially does the same thing in parallel).
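
Here is a crude version of the "run it N times and tabulate" idea, reusing the FluStatus/TestResult types from the sketch upthread. The helper names are made up, and this is plain rejection sampling, nothing clever:

  import System.Random (randomRIO)

  sampleFlu :: IO FluStatus
  sampleFlu = do
    r <- randomRIO (0, 1 :: Double)
    return (if r < 0.1 then Flu else Healthy)

  sampleTest :: FluStatus -> IO TestResult
  sampleTest s = do
    r <- randomRIO (0, 1 :: Double)
    return (if r < (if s == Flu then 0.7 else 0.1) then Pos else Neg)

  -- Run until we have n accepted samples; runs whose test isn't Pos are thrown away.
  -- Returns (flu count, healthy count); flu / (flu + healthy) approaches 0.4375 as n grows.
  countFluGivenPos :: Int -> IO (Int, Int)
  countFluGivenPos n = go n (0, 0)
    where
      go 0 counts = return counts
      go k (flu, healthy) = do
        s <- sampleFlu
        t <- sampleTest s
        case (t, s) of
          (Neg, _)       -> go k       (flu, healthy)        -- rejected sample
          (Pos, Flu)     -> go (k - 1) (flu + 1, healthy)
          (Pos, Healthy) -> go (k - 1) (flu, healthy + 1)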


It's unimportant, but the recently reintroduced 'monad comprehensions' bring out some nice features of this approach:

      fluStatusGivenPositiveTest = [fluStatus | fluStatus <- percentWithFlu 10
                                              , testResult <- testRate fluStatus
                                              , testResult == Pos]
        where testRate Flu     = percentPositive 70
              testRate Healthy = percentPositive 10
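
For anyone following along at home, I believe this form needs the corresponding language extension enabled:

  {-# LANGUAGE MonadComprehensions #-}

and the testResult == Pos guard still relies on the distribution type having a MonadPlus/Alternative instance, just like the do-notation version.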


Until we look at the coin flip results for a coin known a priori to have only two possible biases a and b, where neither a nor b equals 0 or 1, there are two possible "worlds" with the following probabilities:

  0.5 (a)
  0.5 (b)
But once we see that the first coin flip is tails, which we code as -1, then we update using Bayes rule, giving us:

  p(-1|a)p(a)/p(-1) (a, -1)
  p(-1|b)p(b)/p(-1) (b, -1)
I just don't see any removing of possible worlds here. Am I doing this wrong?


Let's take the flu test I used in the grandparent post, and walk through it step by step using the traditional notation.

  P(Flu) = 0.1
  P(not Flu) = 1 - P(Flu) = 0.9
  P(Pos|Flu) = 0.7
  P(Pos|not Flu) = 0.1

  P(Flu|Pos)
    = P(Pos|Flu)P(Flu)
      -------------------------------------------
      P(Pos|Flu)P(Flu) + P(Pos|not Flu)P(not Flu)
    = 0.07 / (0.07 + 0.09)
    = 0.4375

  P(not Flu|Pos) = 0.09 / (0.07 + 0.09) = 0.5625
As you can see, the same numbers show up in both versions of the problem. In the traditional version, the "impossible worlds" are (Flu, not Pos) and (not Flu, not Pos); because we know we have Pos, their terms P(not Pos|Flu)P(Flu) and P(not Pos|not Flu)P(not Flu) never appear in the equation. The division by (0.07 + 0.09) is the same as the normalization step in my example: once we narrow our world down to 0.07 and 0.09, we need to divide by 0.16 so that everything sums back up to 1.

You may need to work through the math in both the Haskell version and the traditional version before it becomes completely obvious that they're exactly the same thing.
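
If it helps, here is the same arithmetic as a couple of plain Haskell definitions you can check against both versions (no monads involved, just the numbers above):

  pFluGivenPos, pNotFluGivenPos :: Double
  pFluGivenPos    = (0.7 * 0.1) / (0.7 * 0.1 + 0.1 * 0.9)  -- 0.07 / 0.16 = 0.4375
  pNotFluGivenPos = (0.1 * 0.9) / (0.7 * 0.1 + 0.1 * 0.9)  -- 0.09 / 0.16 = 0.5625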


The two a priori conditions can also be viewed as 4 possible worlds by branching on the outcome of the coin toss (unobserved either because it hasn't happened yet or, if you prefer, because it has happened but you haven't looked at it yet):

    P(a)P(heads|a)     (a, heads)
    P(a)P(tails|a)     (a, tails)
    P(b)P(heads|b)     (b, heads)
    P(b)P(tails|b)     (b, tails)
When the coin toss is observed, you are removing the possible worlds in which the coin toss did not have the observed outcome. The renormalization step at that point will yield the same posterior probabilities as Bayes' rule.

In your example, we are eliminating the possible worlds where the flip is heads, leaving a total remaining probability that is precisely p(-1).
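
For what it's worth, here is the coin version written in the same style as the flu snippet upthread, using the toy Dist monad sketched there. Bias, Face, and pTails are names I've made up, and a and b stay abstract as a function argument:

  data Bias = A | B          deriving (Eq, Show)
  data Face = Heads | Tails  deriving (Eq, Show)

  -- pTails gives each bias's probability of tails (coded as -1 in your notation).
  biasGivenTails :: (Bias -> Rational) -> Dist Bias
  biasGivenTails pTails = do
    bias <- Dist [(A, 1/2), (B, 1/2)]                            -- the two a priori worlds
    face <- Dist [(Tails, pTails bias), (Heads, 1 - pTails bias)]
    guard (face == Tails)          -- observing tails removes both Heads worlds
    return bias
  -- What survives is [(A, p(-1|a)p(a)), (B, p(-1|b)p(b))]; renormalizing divides by
  -- their sum, p(-1), which is exactly the Bayes' rule update from the question.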


Great explanation. Thank you!



