Monthly Archive: June 2015

Bayesian Inference – The Basics

In last week’s article, I discussed some of the interesting contributions to the scientific method made by the pair of English Bacons, Roger and Francis.  A common and central theme to both of their approaches is the emphasis they placed on performing experiments and then inferring from those experiments what the logical underpinning was.  Put another way, both of these philosophers advocated inductive reasoning as a powerful tool for understanding nature.

One of the problems with the inductive approach is that in generalizing from a few observations to a proposed universal law one may overreach.  It is true that, in the physical sciences, great generalizations have been made (e.g., Newton’s universal law of gravity or the conservation of energy) but these have ultimately rested on some well-supported philosophical principles.

For example, the conservation of momentum rests on a fundamental principle that is hard to refute in any reasonable way: that space has no preferred origin.  This is a point that we would be loath to give up because it would imply that there was some special place in the universe.  But since all places are connected (otherwise they can’t be places), how would nature know to make one of them the preferred spot and how would it keep such a spot inviolate?

But in other matters, where no appeal can be made to an over-arching principle as a guide, the inductive approach can be quite problematic.  The classic and often used example of the black swan is a case in point.  Usually the best that can be done in these cases is to make a probabilistic generalization.  We infer that such and such is the most likely explanation but by no means necessarily the correct one.

The probabilistic approach is time honored.  William of Occam’s dictum that the simplest explanation that fits all the available facts is usually the correct one is, at its heart, a statement about probabilities.  Furthermore, general laws of nature started out as merely suppositions until enough evidence and corresponding development of theory and concepts led to the principles upon which our confidence rests.

So the only thorny questions are what is meant by ‘fact’ and ‘simplest’.  On these points, opinions vary and much argument ensues.  In this post, I’ll be exploring one of the more favored approaches for inductive inference known as the Bayesian method.

The entire method is based on the theorem attributed to Thomas Bayes, a Presbyterian minister and statistician, who first published this law in the latter half of the 1700s.  It was later refined by Pierre-Simon Laplace in 1812.

The theorem is very easy to write down, and that perhaps is what hides its power and charm.  We start by assuming that two random events, $A$ and $B$, can occur, each according to some probability distribution.  The random events can be anything at all and don’t have to be causally connected or correlated.  Each event has some possible set of outcomes $a_1, a_2, \ldots$ and $b_1, b_2, \ldots$.  Mathematically, the theorem is written as

\[ P(a_i|b_j) = \frac{P(b_j|a_i) P(a_i)}{P(b_j)} \; , \]

where $a_i$ and $b_j$ are some specific outcomes of the events $A$ and $B$ and $P(a_i|b_j)$ ($P(b_j|a_i)$) is called the conditional probability that $a_i$ ($b_j$) results given that we know that $b_j$ ($a_i$) happened.  As advertised it is nice and simple to write down and yet amazingly rich and complex in its applications.  To understand the theorem, let’s consider a practical case where the events $A$ and $B$ take on some easy-to-understand meaning.
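Before putting the theorem to use, it is worth seeing where it comes from.  The joint probability of a pair of outcomes can be factored in two equivalent ways using the definition of conditional probability,

\[ P(a_i, b_j) = P(a_i|b_j) P(b_j) = P(b_j|a_i) P(a_i) \; , \]

and dividing the second equality through by $P(b_j)$ gives the theorem immediately.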

Suppose that we are getting ready for Christmas and want to decorate our tree with the classic strings of different-colored lights.  We decide to purchase a big box of bulbs of assorted colors from the Christmas light manufacturer, Brighty-Lite, which provides bulbs in red, blue, green, and yellow.  Allow the set $A$ to represent the colors

\[ A = \left\{\text{red}, \text{blue}, \text{green}, \text{yellow} \right\} = \left\{r,b,g,y\right\} \; . \]

On its website, Brighty-Lite proudly tells us that it has tweaked the color distribution in the variety pack to best match its customers’ desires.  It lists the distribution as consisting of 30% red, 30% blue, 25% green, and 15% yellow.  So the probabilities associated with reaching into the box and pulling out a bulb of a particular color are

\[ P(A) = \left\{ P(r), P(b), P(g), P(y) \right\} = \left\{0.30, 0.30, 0.25, 0.15 \right\} \; . \]

The price for bulbs from Brighty-Lite is very attractive, but being cautious people, we are curious how long the bulbs will last before burning out.   We find a local university that put its undergraduates to good use testing the lifetimes of these bulbs.  For ease of use, they categorized their results into three bins: short, medium, and long lifetimes. Allowing the set $B$ to represent the lifetimes

\[ B = \left\{\text{short}, \text{medium}, \text{long} \right\} = \left\{s,m,l\right\} \]

the student results are reported as

\[ P(B) = \left\{ P(s), P(m), P(l) \right\} = \left\{0.40, 0.35, 0.25 \right\} \; , \]

which confirmed our suspicions that Brighty-Lite doesn’t make its bulbs to last.  However, since we don’t plan on having the lights on all the time, we decide to buy a box.

After receiving the box and buying the tree, we set aside a weekend for decorating.  Come Friday night we put up the lights and, as we work, we begin wondering whether all colors have the same lifetime distribution or whether some colors are more prone to be short-lived than others.  The probability distribution that describes both the color of the bulb and its lifetime is known as the joint probability distribution.

If the bulb color doesn’t have any effect on the lifetime of the filament, then the events are independent, and the joint probability of, say, a red bulb with a medium lifetime is given by the product of the probability that the bulb is red and the probability that it has a medium lifespan (symbolically $P(r,m) = P(r) P(m)$).

The full joint probability distribution is thus

         red     blue    green   yellow  |  P(B)
short    0.12    0.12    0.10    0.06    |  0.40
medium   0.105   0.105   0.0875  0.0525  |  0.35
long     0.075   0.075   0.0625  0.0375  |  0.25
P(A)     0.30    0.30    0.25    0.15    |
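If you want to play with these numbers, the independent joint table can be reproduced in a few lines of Python (a quick sketch; the marginal distributions are the ones quoted above):

```python
# Marginal distributions for color and lifetime, as quoted in the text.
p_color = {"red": 0.30, "blue": 0.30, "green": 0.25, "yellow": 0.15}
p_life = {"short": 0.40, "medium": 0.35, "long": 0.25}

# Under independence, each joint entry is the product of the marginals:
# P(color, lifetime) = P(color) * P(lifetime).
joint = {(c, t): p_color[c] * p_life[t] for c in p_color for t in p_life}

print(round(joint[("green", "short")], 6))   # 0.1, matching the table
print(round(sum(joint.values()), 6))         # 1.0: the entries sum to one
```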

Now we are in a position to see Bayes theorem in action.  Suppose that we pull out a green bulb from the box.  The conditional probability that the lifetime is short $P(s|g)$ is the relative proportion that the green and short entry $P(g,s)$ has compared to the sum of the probabilities $P(g)$ found in the column labeled green.  Numerically,

\[ P(s|g) = \frac{P(g,s)}{P(g)} = \frac{0.1}{0.25} = 0.4 \; . \]

Another way to write this is as

\[ P(s|g) = \frac{P(g,s)}{P(g,s) + P(g,m) + P(g,l)} \; , \]

which better shows that the conditional probability is the relative proportion within the column headed by the label green.

Likewise, the conditional probability that the bulb is green given that its lifetime is short is

\[ P(g|s) = \frac{ P(g,s) }{P(r,s) + P(b,s) + P(g,s) + P(y,s)} \; . \]

Notice that this time the relative proportion is measured against joint probabilities across the colors (i.e., across the row labeled short). Numerically, $P(g|s) = 0.1/0.4 = 0.25$.

Bayes theorem links these two probabilities through

\[ P(s|g) = \frac{ P(g|s) P(s) }{ P(g) } = \frac{0.25 \cdot 0.4}{0.25} = 0.4 \; , \]

which is happily the value we got from working directly with the joint probabilities.
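For the skeptical, the arithmetic can be checked with a short Python sketch that reads the green column and the short row straight off the joint table:

```python
# Green column and short row of the independent joint table from the text.
green_col = {"short": 0.10, "medium": 0.0875, "long": 0.0625}
short_row = {"red": 0.12, "blue": 0.12, "green": 0.10, "yellow": 0.06}

p_green = sum(green_col.values())   # P(g) = 0.25
p_short = sum(short_row.values())   # P(s) = 0.40

p_s_given_g = green_col["short"] / p_green   # proportion within the column
p_g_given_s = short_row["green"] / p_short   # proportion within the row

print(round(p_s_given_g, 6))   # 0.4
print(round(p_g_given_s, 6))   # 0.25

# Bayes theorem links the two: P(s|g) = P(g|s) P(s) / P(g).
print(round(p_g_given_s * p_short / p_green, 6))   # 0.4 again
```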

The next day, we did some more cyber-digging and found that a group of graduate students at the same university extended the undergraduate results (were they perhaps the same people?) and reported the following joint probability distribution:

 

         red     blue    green   yellow  |  P(B)
short    0.15    0.10    0.05    0.10    |  0.40
medium   0.05    0.12    0.15    0.03    |  0.35
long     0.10    0.08    0.05    0.02    |  0.25
P(A)     0.30    0.30    0.25    0.15    |

Sadly, we noticed that our assumption of independence between the lifetime and color was not borne out by experiment since $P(A,B) \neq P(A) \cdot P(B)$ or, in more explicit terms, $P(\text{color}, \text{lifetime}) \neq P(\text{color}) \, P(\text{lifetime})$.  However, we were not completely disheartened since Bayes theorem relates relative proportions and, therefore, it might still work.

Trying it out, we computed

\[ P(s|g) = \frac{P(g,s)}{P(g,s) + P(g,m) + P(g,l)} = \frac{0.05}{0.05 + 0.15 + 0.05} = 0.2 \]

and

\[ P(g|s) = \frac{ P(g,s) }{P(r,s) + P(b,s) + P(g,s) + P(y,s)} = \frac{0.05}{0.15 + 0.10 + 0.05 + 0.10} = 0.125 \; . \]

Checking Bayes theorem, we found

\[ P(s|g) = \frac{ P(g|s) P(s) }{ P(g) } = \frac{0.125 \cdot 0.4}{0.25} = 0.2 \]

guaranteeing a happy and merry Christmas for all.
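The same quick check, repeated in Python for the graduate students’ table, shows Bayes theorem holding without any assumption of independence:

```python
# Green column and short row of the graduate students' (dependent) table.
green_col = {"short": 0.05, "medium": 0.15, "long": 0.05}
short_row = {"red": 0.15, "blue": 0.10, "green": 0.05, "yellow": 0.10}

p_green = sum(green_col.values())   # P(g) = 0.25
p_short = sum(short_row.values())   # P(s) = 0.40

p_s_given_g = green_col["short"] / p_green   # 0.2
p_g_given_s = short_row["green"] / p_short   # 0.125

# Bayes theorem holds even though color and lifetime are not independent.
print(round(p_g_given_s * p_short / p_green, 6))   # 0.2, matching P(s|g)
```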

Next time, I’ll show how this innocent looking computation can be put to subtle use in inferring cause and effect.

Bringing Home the Bacon

Don’t worry; this week’s entry is not about America’s favorite pork-related product (seriously there exists bacon-flavored candy).  It’s about the scientific method.  Not the whole thing, of course, as that would take volumes and volumes of text and would be outdated and maybe obsolete by the time it was finished.  No, this column is about two men who are considered by science historians to have contributed substantially to the scientific method and the philosophy of science.  And it just so happens that both of them bore the last name of Bacon.

Roger Bacon was born somewhere around 1214 (give or take – time and record keeping then, as now, was hard to do) in England.  Roger became both an English philosopher of note and a Franciscan friar.  Most of the best scholastic philosophers of the Middle Ages were monks, and in taking Holy Orders, Bacon falls amongst the ranks of other prominent thinking religious, including Robert Grosseteste, Albertus Magnus, Thomas Aquinas, John Duns Scotus, and William of Ockham.

It seems that the cultural milieu of that time was planting the intellectual seeds for the scientific and artistic renaissance that followed.  Roger Bacon cultivated modes of thought that would be needed for the advances to come.  Basing his philosophy on Aristotle, he advocated the following ‘modern’ ideas:

  • Experimental testing for all inductively derived conclusions
  • Rejection of blind following of prior authorities
  • Repeating pattern of observation, hypothesis, and testing
  • Independent corroboration and verification

In addition, he wrote extensively on science, both on its general structure and on specific applications.  Among his particular fields of interest was optics, where his diagrams have the look and feel of a modern experimental lab notebook.

[Figure: one of Roger Bacon’s optical diagrams]

He also criticized the Julian calendar and argued for dropping a day every 125 years.  This system would not be adopted until about 300 years after his death, with the creation of the Gregorian calendar in 1582.  He was also an outspoken supporter of experimental science, saying that it had three great prerogatives over other sciences and arts in that:

  • It verifies all of its conclusions by direct experiment
  • It discovers truths which can’t be reached without observation
  • It reveals the secrets of nature

Francis Bacon was born in 1561 in England.  He was a government official (Attorney General and Lord Chancellor) and a well-known philosopher.  His writings on science and philosophy established a firm footing for inductive methods used for scientific inquiry.  The details of the method are collectively known as the Baconian Method or the scientific method.

In his work Novum Organum (literally ‘the new Organon’, referring to Aristotle’s treatises on logic), Francis has this to say about induction:

Our only hope, then is in genuine Induction… There is the same degree of licentiousness and error in forming Axioms, as in abstracting Notions: and that in the first principles, which depend in common induction. Still more is this the case in Axioms and inferior propositions derived from Syllogisms.

By induction, he meant the careful gathering of data and then refinement of a theory from those observations.

Curiously, both Bacons talk about four errors that interfere with the acquisition of knowledge:  Roger does so in his Opus Majus; Francis in his Novum Organum.  The following table attempts to match up the two lists.

Roger Bacon’s Four Causes of Error                     | Francis Bacon’s Four Idols of the Mind
Authority (reliance on prior authority)                | Idols of the Theater (following academic dogma)
Custom                                                 | Idols of the Tribe (tendency of humans to see order where it isn’t)
Opinion of the unskilled many                          | Idols of the Marketplace (confusion in the use of language)
Concealment of ignorance behind the mask of knowledge  | Idols of the Cave (interference from personal beliefs, likes, and dislikes)

While not an exact match, the two Baconian lists of errors line up fairly well, which is puzzling if the historical assumption that Francis Bacon had no access to the works of Roger Bacon is true.  Perhaps the most logical explanation is that both of them saw the same patterns of error; that humankind doesn’t change its fundamental nature with the passage of time or across space.

Or perhaps Francis is simply the reincarnation of Roger, an explanation that I am positively sure William of Occam would endorse if he were alive today…

Ideal Forms and Error

A central concept of Socratic and Platonic thought is the idea of an ideal form.  It sits at the base of all discussions about knowledge and epistemology.  Any rectangle that we draw on paper or in a drawing software package, that we construct using rulers and scissors, or that we manufacture with computer-controlled fabrication is a shadow or reflection of the ideal rectangle.  This ideal rectangle exists in the space of forms, which may lie entirely within the human capacity to understand the world and make distinctions, or may actually have an independent existence outside the human mind, reflecting a higher power.  All of these notions about the ideal forms are familiar from the philosophy of antiquity.

What isn’t so clear is what Plato’s reaction would be if he were suddenly transported forward in time and plunked down in a classroom discussion about the propagation of error.  The intriguing question is: would he modify his philosophical thought to expand the concept of an ideal form to include an ideal form of error?

Let’s see if I can make this question concrete by the use of an example.  Consider a diagram representing an ideal rectangle of length $L$ and height $H$.

[Figure: an ideal rectangle of length $L$ and height $H$]

Euclidean geometry tells us that the area of such a rectangle is given by the product

\[ A = L \cdot H \; . \]

Of course, the rectangle represented in the diagram doesn’t really exist since there are always imperfections and physical limitations.  The usual strategy is to not take the world as we would like it to be but to take it as it is and cope with these departures from the ideal.

The departures from the ideal can be classified into two broad categories.

The first category, called knowledge error, contains all of the errors in our ability to know.  For example, we do not know exactly what numerical value to give the length $L$.  There are fundamental limitations on our ability to measure or represent the numerical value of $L$ and so we know the ‘true’ value of $L$ only to within some fuzzy approximation.

The second category doesn’t seem to have a universally agreed-upon name, reflecting the fact that, as a society, we are still coming to grips with the implications of this idea.  This departure from the ideal describes the fact that at some level there may not even be one definable concept of ‘true’.  Essentially, the idea of the length of an object is context-dependent and may have no absolutely clear meaning at the atomic level due to the inherent uncertainty in quantum mechanics.  This type of ‘error’ is sometimes called aleatory error (in contrast to epistemic error, which is synonymous with knowledge error).

Taken together, the knowledge and aleatory errors contribute to an uncertainty in length of the rectangle of $dL$ and an uncertainty in its height of $dH$.

[Figure: the rectangle with uncertainties $dL$ and $dH$ added to its sides]

Scientists and engineers are commonly exposed to a model for determining the error in the area of such a rectangle as part of their training in dealing with uncertainty and error, a technique sometimes called the propagation of error (or uncertainty).  For the case of this error-bound rectangle, the true area, $A'$, is also determined in Euclidean fashion, yielding

\[ A' = (L+dL) \cdot (H+dH) = L \cdot H + dL \cdot H + L \cdot dH + dL \cdot dH \; . \]

So the error in the area, denoted as $dA$, has a more complicated form than the area itself

\[ dA = dL \cdot H + L \cdot dH + dL \cdot dH \; . \]
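A quick numerical sketch in Python (the values of $L$, $H$, $dL$, and $dH$ are purely illustrative) confirms that this $dA$ is exact, accounting for the entire difference between the true and ideal areas:

```python
# Hypothetical rectangle dimensions and their uncertainties.
L, H = 10.0, 4.0
dL, dH = 0.1, 0.05

A = L * H                      # ideal area
A_true = (L + dL) * (H + dH)   # area with the departures included

dA = dL * H + L * dH + dL * dH   # the error term derived above
print(abs((A_true - A) - dA) < 1e-12)   # True: dA is exact, not approximate

# For small uncertainties the cross term dL*dH is negligible, giving the
# familiar first-order rule dA ~ dL*H + L*dH.
print(round(dL * H + L * dH, 6))   # 0.9 (versus exact dA = 0.905)
```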

Now suppose that Plato were in the classroom when this lesson was taught.  What would his reaction be?  I bring this up because although the treatment above is meant to handle error it is still an idealization.  There is still a notion of an ideal rectangle sitting underneath.

The curious question that follows in its train is this:  is there an ideal form for this error idealization?  In other words, is there a perfect or ideal error in the space of forms of which our particular error discussion is a shadow or reflection?

It may sound like this question is predicated on a contradiction, but my contention is that it only seems so on the surface.  In understanding the propagation of error in the calculation of the rectangle’s area, I’ve had to assume a particular functional relationship.

It is a profound assumption that the object drawn above (not what it represents but that object itself), which is called a rectangle but which is embodied in the real world as made up of atomic parts (be they physical atoms or pixels), can be characterized by two numbers ($L$ and $H$) even if I don’t know what values $L$ and $H$ take on.  In some sense, this idealization should sit in the space of forms.

But if that is true, what stops us there?  Suppose we had a more complex functional relationship, something, say, that tries to model the boundaries of the object as a set of curves that deviate from linearity just enough to capture a shaky hand when the object was drawn or the deviations of a manufacturing process when it was machined.  Is this model not also an idealization and therefore a reflection of something within the space of forms?

And why stop there?  It seems to me that the boundary line between what is and is not in the space of forms is arbitrary (and perhaps self-referential: is the boundary between what is and is not in the space of forms itself in the space of forms?).  Just as the levels of abstraction in a computer model depend on the context, could not the space of forms depend on the questions that are being asked?

Perhaps the space of forms is as infinite or as finite as we need it to be.  Perhaps it’s forms all the way down.

Why do We Teach the Earth is Round?

You’re no doubt asking yourself “Why the provocative title?  It’s obvious why we should teach that the Earth is round!” In some sense, this was my initial reaction when this exact question was posed in a round table discussion that I participated in recently.  The person who posed the question was undaunted by the initial pushback and persisted.  Her point was simply a genuinely honest question driven by a certain pragmatism.

Her basic premise is this.  For the vast majority of people on the Earth, a flat-Earth model best fits their daily experiences.  None of us plan our day-to-day trips using the geometry of Gauss.  Many of us fly, but far fewer of us fly distances long enough that the pilot or navigator consciously lays in a great circle path.  And even if all of us were to fly, say from New York to Rome, so what if the path the plane follows is a ‘geodesic on the sphere’; very few of us are either aware or care.  After all, that is someone else’s job.  And certainly gone are the days when we sat at the seashore and watched the masts of ships disappear last over the horizon; cell phones and the internet are far more interesting.

I listened to the argument carefully, mulled it over for a few days, and realized that there was a lot of truth in it.  The point wasn’t that we shouldn’t teach that the Earth is round but rather that we should know, with a firm and articulable conviction, why we should teach it, and that the criteria for inclusion should be open to debate when schools draw up their curricula.

So what criteria should be used to construct a firm and articulable conviction? It seems that at the core of this question was a dividing line between types of knowledge and why we would care to know one over the other.

The first distinction in our round-Earth epistemological exploration is one between what I will call tangible and intangible knowledge.  Tangible knowledge consists of all those facts that have an immediate impact on a person’s everyday existence.  For example, knowing that a particular road bogs down in the afternoon is a slice of tangible knowledge because acting on it can prevent me from arriving home late for dinner (or perhaps having no dinner at all).  Knowing that the rainbow is formed by light entering a water droplet in the atmosphere in a particular way so that it is subjected to a single total internal reflection before exiting the drop with the visible light substantially dispersed is an intangible fact, since I am neither a farmer nor a meteorologist.  Many are the people who have said “don’t tell me how a rainbow is formed – it ruins all the beauty and poetry!”

An immediate corollary of this distinction is that what counts as tangible and intangible knowledge is governed by what impacts a person’s life.  It differs both from person to person and over time.  A person who doesn’t drive the particular stretch of road that I do would find the knowledge that my route home bogs down at certain times intangible, while the meteorologist would find the physical mechanism for the rainbow a tangible bit of knowledge, even if it kills the poet in him.

The second distinction is between what I will call private and common knowledge.  The particular PIN I use to access my phone is knowledge that is, and should remain, private to me.  In the hands of others it is either useless (for the vast majority who are either honest, or don’t know me, or both) or it is dangerous (for those who do know me and are up to no good).  Common knowledge describes those facts that can be shared among all people with no harm.  Knowing how electromagnetic waves propagate is an example of common knowledge, but knowing the particular frequency on which to intercept enemy communications is private.

With these distinctions in hand, it is now easy to see what was meant by the original, provocative question.  As it is taught in schools, knowledge that the Earth is round is, for most people, a common, intangible slice of human knowledge.  In this context, it is reasonable to ask why we even teach it to the students.

A far better course of action is to try to transform this discovery into a common but tangible slice of knowledge that affects each student on a core level.  The particular ways that this can be done are numerous, but let me suggest one that I regard as particularly important.

[Figure: the Earth]

Teaching that the Earth is round should be done within a broader context of how we know anything about the world around us, how certain we are, and where the corners of doubt and uncertainty lie.  A common misconception is that the knowledge that the Earth is round was lost during the Dark and early Middle Ages.  The ancient Greeks knew with a great deal of certainty that the Earth was round, and books from antiquity tell the story of how Eratosthenes determined the radius of the Earth to an astounding accuracy considering the technology of his day.  This discovery persisted through the Dark and Middle Ages and was finally put to practical use only when the collective technology of the world progressed to the point that the voyages of Columbus and Magellan were possible.  Framing the lesson of the Earth’s roundness in this way provides a historical context that elevates it from mere geometry into a societally shaping event.  Science, technology, sociology, geography, and human affairs are all intertwined and should be taught as such.

Along the way, numerous departure points are afforded to discuss other facets of what society knows and how it knows it.  Modern discoveries that the Earth is not particularly spherical (the equatorial bulge) now take on a life outside of geodesy, and the concepts of approximations, models, and contexts by which ‘facts’ are known and consumed become tools for honing critical thinking about the host of policy decisions each and every one of us has to make.

By articulating the philosophical underpinnings for choosing a particular curriculum, society can be sure that arbitrary decisions about what topics are taught are held in check.  Different segments can openly debate, in an aboveboard manner, what material should be included and what can be safely omitted.  Emotional and aesthetic points can be addressed side-by-side with practical points without confusion.  And all the while we can be sure that the development of critical thinking is center stage.

Failure to do this leaves two dangerous scenarios.  The first is that the student is filled with a lot of unconnected facts that improve neither his civic participation in practical matters nor his general appreciation for the beauty of the world.  The second, and more important, is that the student is left with the impression that science delivers to us unassailable facts.  This is a dangerous position since it leads to modern interpretations of science as a new type of religion whose dogma has replaced the older dogma of the spiritual simply by virtue of the fact that its magic (microwaves, TVs, cell phones, rockets, nuclear power, and so on) is more powerful and apparent.