Monthly Archive: July 2015

Bayes and Drugs

One of the most curious features of Bayesian inference is the non-intuitive conclusions that can result from innocent-looking observations.  A case in point is the well-known issue with mandatory drug tests administered to a population that is mostly clean.

For the sake of this post, let’s assume that there is a drug, called Drugg, that is the new addiction on the block and that we know from arrest records and related materials that about 7 percent of the population uses it.  We want to develop a test that will detect the residue in a person’s bloodstream, thus indicating that the subject has used Drugg within some period of time (e.g. two weeks) prior to the administration of the test.  The test will return a binary result, with a ‘+’ indicating that the subject has used Drugg and a ‘-’ indicating that the subject is clean.

Of course, since no test is infallible, one of our requirements is that the test will produce an acceptably low percentage of missed detections and false alarms. A missed detection occurs when the subject uses Drugg but the test fails to return a ‘+’.  Likewise, a false alarm occurs when the test returns a ‘+’ but the subject is clean.  Both situations present substantial risk and potentially high costs, so the lower both percentages can be made the better.

In order to develop the test, we gather 200 subjects for clinical trials; 100 of them are known Drugg users (e.g. they were caught in the act or are seeking help with their addiction) and the remaining 100 of them are known to be clean.  After some experimentation, we have reached the stage where, 99 percent of the time, the test correctly returns a ‘+’ when administered to a Drugg user and, 95 percent of the time, it correctly returns a ‘-’ when administered to someone who is clean. What are the false alarm and missed detection rates?

This is where Bayes theorem allows us to make a statistically based inference, and one that is usually surprising.  To apply the theorem, we need to be a bit careful, so let’s first define some additional notation.  A person who belongs to the population that uses Drugg will be denoted by ‘D’.  A person who belongs to the population that is clean will be denoted by ‘C’.  Let’s summarize what we know in the following table.

Description                                        Symbol    Value
Probability of a ‘+’ given that the person is C    P(+|C)    0.05
Probability of a ‘-’ given that the person is C    P(-|C)    0.95
Probability of a ‘+’ given that the person is D    P(+|D)    0.99
Probability of a ‘-’ given that the person is D    P(-|D)    0.01
Probability that a person is C                     P(C)      0.93
Probability that a person is D                     P(D)      0.07

There are two things to note.  First, the results of our clinical trials are all expressed as conditional probabilities.  Second, the conditional probabilities for disjoint events sum to 1 (e.g. P(+|D) + P(-|D) = 1, since a member of D, when tested, must result in either a ‘+’ or a ‘-’).

In the population as a whole, we won’t know to which group a subject belongs.  Instead, we will administer the test, get back either a ‘+’ or a ‘-’, and from that observation infer to which group the subject most likely belongs.

For example, let’s use Bayes theorem to infer the missed detection probability, P(D|-) (note the role reversal between ‘D’ and ‘-’).  Applying the theorem, we get

\[ P(D|-) = \frac{ P(-|D) P(D) }{ P(-) } \; . \]

Values for P(-|D) and P(D) are already listed above, so all we need is P(-) and we are in business.  This probability is obtained from the formula

\[ P(-) = P(-|C) P(C) + P(-|D) P(D) \; . \]

Note that this relationship can be derived from $P(-) = P(- \cap C ) + P(- \cap D)$ and $P(A \cap B) = P(A|B) P(B)$.  The first formula says, in words, that the probability of getting a negative from the test is the probability of either getting a negative and the subject is clean or getting a negative and the subject uses Drugg.  The second formula is essentially the definition of conditional probability.

Since we’ll be needing P(+) as well, let’s compute both values now and note them.

Description                   Formula                             Symbol    Value
Total probability of a ‘+’    P(+) = P(+|C) P(C) + P(+|D) P(D)    P(+)      0.1158
Total probability of a ‘-’    P(-) = P(-|C) P(C) + P(-|D) P(D)    P(-)      0.8842
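Since these two marginals drive everything that follows, it’s worth verifying the arithmetic.  Here is a minimal Python sketch (the variable names are mine, not from the post) that computes P(+) and P(-) from the clinical trial numbers:

```python
# Clinical trial results (conditional probabilities) and population priors.
p_plus_given_C = 0.05    # P(+|C): a clean person incorrectly tests positive
p_minus_given_C = 0.95   # P(-|C): a clean person correctly tests negative
p_plus_given_D = 0.99    # P(+|D): a Drugg user correctly tests positive
p_minus_given_D = 0.01   # P(-|D): a Drugg user incorrectly tests negative
p_C = 0.93               # P(C): prior probability that a person is clean
p_D = 0.07               # P(D): prior probability that a person uses Drugg

# Law of total probability: marginalize over the two disjoint populations.
p_plus = p_plus_given_C * p_C + p_plus_given_D * p_D     # 0.1158
p_minus = p_minus_given_C * p_C + p_minus_given_D * p_D  # 0.8842

print(f"P(+) = {p_plus:.4f}")
print(f"P(-) = {p_minus:.4f}")
```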

The missed detection probability is

\[ P(D|-) = \frac{ P(-|D) P(D) }{ P(-) } = \frac{ 0.01 \cdot 0.07 }{ 0.8842 } = 0.0008 \;  . \]

So things are looking good and we are happy.  But our joy soon turns to perplexity when we compute the false alarm probability

\[ P(C|+) = \frac{ P(+|C) P(C) }{ P(+) } = \frac{ 0.05 \cdot 0.93 }{ 0.1158 } = 0.4016 \; . \]

This result says that around 40 percent of the subjects who test ‘+’ will actually be clean; our test incorrectly points a finger at a clean person in roughly two out of every five positive results.
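Both posteriors follow from the same two-line application of Bayes theorem.  A self-contained sketch, with the values copied from the tables above:

```python
# Inputs from the tables above.
p_minus_given_D, p_plus_given_C = 0.01, 0.05  # P(-|D), P(+|C)
p_D, p_C = 0.07, 0.93                         # priors P(D), P(C)
p_minus, p_plus = 0.8842, 0.1158              # marginals P(-), P(+)

# Bayes theorem reverses the conditioning: from test result back to group.
p_D_given_minus = p_minus_given_D * p_D / p_minus  # missed detection
p_C_given_plus = p_plus_given_C * p_C / p_plus     # false alarm

print(f"P(D|-) = {p_D_given_minus:.4f}")  # 0.0008
print(f"P(C|+) = {p_C_given_plus:.4f}")   # 0.4016
```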

Suppose we went back to our clinical trials and came out with a second version of the test in which nothing had changed except that P(-|C) had risen from 0.95 to 0.99.  As the figure below shows, the false alarm rate does decrease but still remains surprisingly high when the percentage of the population using Drugg is low.

Figure (Drugg Testing): false alarm probability P(C|+) as a function of the percentage of the population using Drugg, for P(-|C) = 0.95 and P(-|C) = 0.99.

The reason for this is that, when the percentage of users in the population is small, the clean population vastly outnumbers the users, so driving the missed detection rate down comes at the expense of a greater percentage of false alarms: even a small error rate among the many clean subjects produces more false ‘+’ results than the few users produce true ones.  In other words, our diligence in finding Drugg users has made us overly suspicious.
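To see this effect numerically, the sketch below sweeps the fraction of the population using Drugg and prints the false alarm probability P(C|+) for both versions of the test; the grid of prevalence values is chosen arbitrarily for illustration:

```python
# False alarm probability P(C|+) as a function of prevalence P(D),
# for the original (P(-|C) = 0.95) and improved (P(-|C) = 0.99) tests.
p_plus_given_D = 0.99  # sensitivity, unchanged between versions

for p_minus_given_C in (0.95, 0.99):
    p_plus_given_C = 1.0 - p_minus_given_C
    print(f"P(-|C) = {p_minus_given_C}:")
    for p_D in (0.001, 0.01, 0.07, 0.20, 0.50):
        p_C = 1.0 - p_D
        p_plus = p_plus_given_C * p_C + p_plus_given_D * p_D
        p_C_given_plus = p_plus_given_C * p_C / p_plus
        print(f"  P(D) = {p_D:5.3f} -> P(C|+) = {p_C_given_plus:.4f}")
```

Even with the improved test, a prevalence of one user in a thousand leaves roughly 90 percent of positive results pointing at clean people.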

Bayesian Inference – Cause and Effect

In the last column, the basic inner workings of Bayes theorem were demonstrated in the case where two different random variable realizations (the attributes of the Christmas tree bulbs) occurred together in a joint probability function.  The theorem holds whether the probability functions for the two events are independent or correlated.  In addition, it generalizes in an obvious way to cases where there are more than two variables and where some or all of them are continuous rather than discrete random variables.

If that were all there was to it – a mechanical relationship between conditional and joint probabilities – Bayes theorem would be a curious footnote in probability and statistics textbooks, holding little practical interest and no controversy.  However, the real power of Bayes theorem comes from its ability to link one statistical event with another and to allow inferences to be made about cause and effect.

Before looking at how inferences (sometimes very subtle and non-intuitive) can be drawn, let’s take a moment to step back and consider why Bayes theorem works.

The key insight comes from examining the meaning contained in the joint probability that two events, $A$ and $B$, will both occur.  This probability is written as

\[ P( A \cap B ) \; , \]

where the operator $\cap$ is the logical ‘and’ requiring both $A$ and $B$ to be true.  It is at this point that the philosophically interesting implications arise.

Suppose that we believe that $A$ is a cause of $B$.  This causal link could take the form of something like: $A$ = ‘it was raining’ and $B$ = ‘the ground is wet’.  Then it is obvious that the joint probability takes the form

\[ P( A \cap B ) = P(B|A) P(A) \; , \]

which in words says that the probability that ‘it was raining and the ground is wet’ = the probability that ‘the ground is wet given that it was raining’ times the probability that ‘it was raining’.

Sometimes, the link between cause and effect is obvious and no probabilistic reasoning is required.  For example, if the event is changed from ‘it was raining’ to ‘it is raining’, it becomes clear that ‘the ground is wet’ due to the rain.  (Of course, even in this case, another factor may also be contributing to how wet the ground is, but that complication is naturally handled with the conditional probability.)

Often, however, we don’t observe the direct connection between the cause and the effect.  Maybe we woke up after the rain had stopped and the clouds had moved on, and all we observe is that the ground is wet.  What can we then infer?  If we lived somewhere without running water (natural or man-made), then the conditional probability that ‘it was raining given that the ground is wet’ would be 1 and we would infer that ‘it was raining’.  There would be no way for the ground to be wet other than to have had rain fall from the sky.  In general, such a clear indication of cause from effect doesn’t happen, and the conditional probability describes the likelihood that some other cause has led to the same event.  In the case of the ‘ground is wet’ event, perhaps a water main had burst or a neighbor had watered their lawn.

In order to infer anything about the cause from the observed effect, we want to reverse the roles of $A$ and $B$ and argue backwards, as it were.  The joint probability can be written with the mathematical roles of $A$ and $B$ reversed to yield

\[ P( A \cap B ) = P(A|B) P(B) \; . \]

Equating the two expressions for the joint probability and dividing by $P(B)$ gives Bayes theorem,

\[ P(A|B) = \frac{ P(B|A) P(A) }{ P(B) } \; , \]

which provides a way of statistically inferring the likelihood that a particular cause $A$ gave the observed effect $B$.
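To make the reversal concrete, here is a small sketch applying the theorem to the rain example; the numbers are invented purely for illustration and are not from the text:

```python
# Hypothetical numbers, chosen only to illustrate the calculation.
p_wet_given_rain = 0.95  # P(B|A): ground is wet given that it was raining
p_wet_given_dry = 0.10   # ground wet from other causes (burst main, lawn watering)
p_rain = 0.30            # P(A): prior probability that it was raining

# Total probability of the observed effect (wet ground).
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1.0 - p_rain)

# Bayes theorem: infer the likelihood of the cause from the observed effect.
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
print(f"P(rain | wet ground) = {p_rain_given_wet:.3f}")  # ~0.803
```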

Of course, any inference obtained in this fashion is open to a great deal of doubt and scrutiny because the link backwards from observation to proposed or inferred origin is built on probabilities.  Without some overriding philosophical principle (e.g. a conservation law), it is easy to confuse coincidence or correlation with causation. Inductive reasoning can then lead to probabilistically supported but untrue conclusions, like ‘all swans are white’, so we have to be on our guard.

Next week’s column will showcase one such trap within the context of mandatory drug testing.