It is odd how the phrasing of a question changes the meaning and interpretation of probability-based situations.  Philosophically, we should expect a degree of fluidity because when we engage in thinking about and discussing probabilities we wander into the twilight zone of thought.  Clear distinctions between what is known and what can be known, between epistemological uncertainty and ontological uncertainty (sometimes called aleatory variability), and how and why we know are important if we ever want to emerge from the forest of confusion and doubt.

For a simple example of some of the complexity that can arise, consider the lowly coin flip.  Imagine you are at a friend’s house and the two of you are arguing over what movie to watch.  Your friend wants to watch Predator and you want to watch Alien.  You decide to settle the debate on a coin flip of the variety where he flips the coin, catches it, mashes it onto his arm with his hand covering it, and then he invites you to call it heads or tails.

Assuming the coin is fair, you reckon that there is a fifty-fifty chance that you’ll be enjoying Alien tonight while he just has to grin and bear it.  He then flips the coin and, as you contemplate the hidden disk upon which all your hopes and dreams ride (at least as far as this evening’s movie selection is concerned), you may be moved to say that the probability of heads is 0.5.  But in this you would be wrong.  The probability of the flip coming up heads before it is tossed is 0.5 (an example of ontological uncertainty), but after your friend has flipped and caught the coin there is a decided outcome.  The correct way of phrasing the situation is to say that the probability that you will guess the already selected result is 0.5 (an example of epistemological uncertainty).

Hopefully this simple example has clarified these points a bit.  Ontological uncertainty usually arises when making predictions of physical outcomes of an event, with the traditional example being Aristotle’s sea battle.  Whether a sea battle will happen tomorrow is a statement that cannot have a definitive truth value (either true or false) and is an example where the law of the excluded middle may be violated.  Epistemological uncertainty arises when making decisions, with limited knowledge, about the past outcome of an event, with a corresponding example being whether the sea battle that happened today was a victory for one side or a defeat.

It is very easy to get confused on these points, and an excellent example of this controversy was raised by Zach Star in his YouTube video entitled “This May Be The Most Counterintuitive Probability Paradox I’ve Ever Seen | Can you spot the error?” from April 7, 2019.

I don’t recommend watching the whole video precisely because Zach gets very contorted in the analysis of a variant of the Boy-Girl Paradox, but it is an important precursor to his follow-up video entitled “The Boy or Girl Probability Paradox Resolved | It was never really a paradox” from April 11, 2019.

Even in his clarification video, he goes to some effort to caution about his tenuous grasp of the right way to analyze the situation and why his earlier conclusions were wrong.

To explain where the tangle arises, let’s start with the most basic premise of the Boy-Girl Paradox that asks the following.  Suppose you meet a father in a bar and, in the course of conversation (say over gin and tonics), he reveals that he has two children.  What is the probability that he has two girls?

Well, assuming that boys and girls are equally likely, the probability is 0.25.  This conclusion is straightforward but best presented in the following figure, which assumes that you’ve now met 10,000 such two-children families (and have run up a large bar tab).

This is a statement of ontological uncertainty.  That is to say that, in families that have birthed two children, the random process of sex selection will distribute the sexes such that the proportions shown in the figure result. 

But in the context of the bar conversation, the probability is really epistemological in that we are trying to determine, based on the clues we pick up, what is the probability that we will guess correctly.  Since only 2,500 two-girl families are present in the population of 10,000 total families, the probability that we are right when we guess that a given father has two girls, given no other data, is one quarter or 0.25.

Now suppose that he lets slip that one of his children is a girl.  This revelation provides a bit more data and so our expectation is that the probability should increase and so it does because we now get to exclude all the families with two boys.  Our two-girl families remain at 2,500 but the population against which it is measured as a proportion has dropped to 7,500 and the probability that we will correctly guess that the father has two girls rises to 1/3.  Let me underline this last distinction.  The probability that the father has two girls if he has two children is always 1/4 ontologically.  What we are doing at this point is narrowing our epistemological uncertainty.   
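The two counts above can be checked with a few lines of Python.  This is only a sketch of the bookkeeping, assuming (as the text does) that the four older/younger combinations are equally likely; the variable names are mine.

```python
from fractions import Fraction
from itertools import product

FAMILIES = 10_000  # population size used in the text

# Four equally likely (older, younger) combinations: BB, BG, GB, GG.
outcomes = list(product("BG", repeat=2))
per_outcome = FAMILIES // len(outcomes)  # 2,500 families of each kind

# Families with two girls, and families with at least one girl.
two_girls = sum(per_outcome for o in outcomes if o == ("G", "G"))
at_least_one_girl = sum(per_outcome for o in outcomes if "G" in o)

p_no_info = Fraction(two_girls, FAMILIES)              # nothing known
p_given_girl = Fraction(two_girls, at_least_one_girl)  # "one is a girl"

print(two_girls, at_least_one_girl)  # 2500 7500
print(p_no_info, p_given_girl)       # 1/4 1/3
```

The number of two-girl families never moves; only the population we compare it against shrinks.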

Now comes the tricky part that initially caused Zach Star to stumble.  Suppose that a given father says one of his children is named Julie.  Star says that the probability that the man has two girls has risen to one half or 0.5.  He reasons his way to that conclusion as follows.  Assume that the probability of a girl being named Julie is 1/100 (the actual probability value doesn’t matter but this value is convenient).  Then the set of one-girl families supplies 50 girls who fit the bill (on average, of course; that is why we took the number of families to be large to begin with, so that we could ignore fluctuations).  The set of two-girl families, while half the size when taken in aggregate as a two-child household, supplies 50 girls as well, since they have two girls for each one in the other set.  Ergo, the probability is 0.5.  And this change in probability is a paradox to him because how can knowing the name Julie make a difference?
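Star’s head count is easy to reproduce.  The sketch below follows his argument as described here, counting girls named Julie rather than families, with the 1/100 name probability taken from the text; the variable names are my own, and the small chance that both sisters are named Julie is ignored, just as it is in the counting argument.

```python
from fractions import Fraction

FAMILIES = 10_000
P_JULIE = Fraction(1, 100)  # assumed chance that a girl is named Julie

one_girl_families = FAMILIES // 2  # BG and GB: 5,000 families, one girl each
two_girl_families = FAMILIES // 4  # GG: 2,500 families, two girls each

# Expected number of girls named Julie supplied by each set.
julies_in_one_girl = one_girl_families * 1 * P_JULIE
julies_in_two_girl = two_girl_families * 2 * P_JULIE

# Star's estimate: the share of Julies who live in two-girl families.
star_estimate = julies_in_two_girl / (julies_in_one_girl + julies_in_two_girl)

print(julies_in_one_girl, julies_in_two_girl, star_estimate)  # 50 50 1/2
```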

This way of talking is sloppy for several reasons.  First, as pointed out before, the ontological probability never changes; what changes is our ability to guess properly, and that should go up or down as new info is provided.  Second, and more important, the reasoning is wrong.  Only half the fathers in the two-daughter set are going to randomly mention that they have a daughter named Julie even if there are 50 Julies to be found.  That is because they have no incentive to select Julie over the other daughter, whatever her name may be.  If, however, we systematically poll each family and ask if they have a daughter named Julie then we will be sure to uncover all the ones in the two-daughter set.  This process increases our knowledge and so it should decrease our epistemological uncertainty.
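The difference between a father who happens to mention a name and a family that is asked directly can be made concrete.  The sketch below is my own reading of the two situations, not Star’s calculation.  It assumes that a father with one daughter names her, that a father with two daughters names one of them at random, and that sisters’ names are independent (so two Julies in one family are possible, if rare).  The function names are hypothetical.

```python
from fractions import Fraction

P_JULIE = Fraction(1, 100)    # assumed chance that a girl is named Julie
P_ONE_GIRL = Fraction(1, 2)   # BG or GB
P_TWO_GIRLS = Fraction(1, 4)  # GG

def p_two_girls_random_mention(p):
    """Father volunteers the name of one daughter, picked at random."""
    one = P_ONE_GIRL * p   # his only daughter is Julie
    two = P_TWO_GIRLS * p  # the daughter he happens to name is Julie
    return two / (one + two)

def p_two_girls_polled(p):
    """Every family is asked: do you have a daughter named Julie?"""
    one = P_ONE_GIRL * p
    two = P_TWO_GIRLS * (1 - (1 - p) ** 2)  # at least one of two is Julie
    return two / (one + two)

print(p_two_girls_random_mention(P_JULIE))  # 1/3
print(p_two_girls_polled(P_JULIE))          # 199/399
```

Under these assumptions a chance mention of Julie tells us no more than “one of them is a girl” did, while systematic polling pushes the figure to just under one half.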

It’s amazing how easy it is to get tangled up in probability.