{"id":239,"date":"2015-06-27T03:23:22","date_gmt":"2015-06-27T03:23:22","guid":{"rendered":"http:\/\/aristotle2digital.blogwyrm.com\/?p=239"},"modified":"2023-05-07T05:23:43","modified_gmt":"2023-05-07T09:23:43","slug":"bayesian-inference-the-basics","status":"publish","type":"post","link":"https:\/\/aristotle2digital.blogwyrm.com\/?p=239","title":{"rendered":"Bayesian Inference \u2013 The Basics"},"content":{"rendered":"<p><style>\ntable, th, td {<br \/>\nborder: 1px solid black;<br \/>\nborder-collapse: collapse;<br \/>\n}<br \/>\nth { text-align: center !important; }<br \/>\n<\/style><\/p>\n<p>In last week\u2019s article, I discussed some of the interesting contributions to the scientific method made by the pair of English Bacons, Roger and Francis.\u00a0 A common and central theme to both of their approaches is the emphasis they placed on performing experiments and then inferring from those experiments what the logical underpinning was.\u00a0 Put another way, both of these philosophers advocated inductive reasoning as a powerful tool for understanding nature.<\/p>\n<p>One of the problems with the inductive approach is that in generalizing from a few observations to a proposed universal law one may overreach.\u00a0 It is true that, in the physical sciences, great generalizations have been made (e.g., Newton\u2019s universal law of gravity or the conservation of energy) but these have ultimately rested on some well-supported philosophical principles.<\/p>\n<p>For example, the conservation of momentum rests on a fundamental principle that is hard to refute in any reasonable way; that space has no preferred origin.\u00a0 This is a point that we would be loath to give up because it would imply that there was some special place in the universe.\u00a0 But since all places are connected (otherwise they can\u2019t be places) how would nature know to make one of them the preferred spot and how would it keep such a spot inviolate?<\/p>\n<p>But in other matters, where no 
appeal can be made to an over-arching principle as a guide, the inductive approach can be quite problematic.\u00a0 The classic and often-used example of the black swan is a case in point.\u00a0 Usually the best that can be done in these cases is to make a probabilistic generalization.\u00a0 We infer that such and such is the most likely explanation but by no means necessarily the correct one.<\/p>\n<p>The probabilistic approach is time-honored.\u00a0 William of Occam\u2019s dictum that the simplest explanation that fits all the available facts is usually the correct one is, at its heart, a statement about probabilities.\u00a0 Furthermore, general laws of nature started out as mere suppositions until enough evidence and corresponding development of theory and concepts led to the principles upon which our confidence rests.<\/p>\n<p>So the only thorny questions are what is meant by \u2018fact\u2019 and \u2018simplest\u2019.\u00a0 On these points, opinions vary and much argument ensues.\u00a0 In this post, I\u2019ll be exploring one of the more favored approaches for inductive inference known as the Bayesian method.<\/p>\n<p>The entire method is based on the theorem attributed to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Thomas_Bayes\">Thomas Bayes<\/a>, a Presbyterian minister and statistician, whose result was published posthumously in 1763.\u00a0 It was later refined by Pierre-Simon Laplace in 1812.<\/p>\n<p>The theorem is very easy to write down, and that perhaps is what hides its power and charm.\u00a0 We start by assuming that two random events, $A$ and $B$, can occur, each according to some probability distribution.\u00a0 The random events can be anything at all and don\u2019t have to be causally connected or correlated.\u00a0 Each event has some possible set of outcomes $a_1, a_2, \\ldots$ and $b_1, b_2, \\ldots$.\u00a0 Mathematically, the theorem is written as<\/p>\n<p>\\[ P(a_i|b_j) = \\frac{P(b_j|a_i) P(a_i)}{P(b_j)} \\; , 
\\]<\/p>\n<p>where $a_i$ and $b_j$ are some specific outcomes of the events $A$ and $B$ and $P(a_i|b_j)$ ($P(b_j|a_i)$) is called the conditional probability that $a_i$ ($b_j$) results given that we know that $b_j$ ($a_i$) happened.\u00a0 As advertised, it is nice and simple to write down and yet amazingly rich and complex in its applications.\u00a0 To understand the theorem, let\u2019s consider a practical case where the events $A$ and $B$ take on some easy-to-understand meaning.<\/p>\n<p>Suppose that we are getting ready for Christmas and want to decorate our tree with the classic strings of different-colored lights.\u00a0 We decide to purchase a big box of bulbs of assorted colors from the Christmas light manufacturer, Brighty-Lite, who provides bulbs in red, blue, green, and yellow.\u00a0 Allow the set $A$ to represent the colors<\/p>\n<p>\\[ A = \\left\\{\\text{red}, \\text{blue}, \\text{green}, \\text{yellow} \\right\\} = \\left\\{r,b,g,y\\right\\} \\; . \\]<\/p>\n<p>On its website, Brighty-Lite proudly tells us that they have tweaked their color distribution in the variety pack to best match their customers\u2019 desires.\u00a0 They list their distribution as consisting of 30% each of red and blue, 25% green, and 15% yellow.\u00a0 So the probabilities associated with reaching into the box and pulling out a bulb of a particular color are<\/p>\n<p>\\[ P(A) = \\left\\{ P(r), P(b), P(g), P(y) \\right\\} = \\left\\{0.30, 0.30, 0.25, 0.15 \\right\\} \\; . \\]<\/p>\n<p>The price for bulbs from Brighty-Lite is very attractive, but being cautious people, we are curious how long the bulbs will last before burning out.\u00a0 We find a local university that put its undergraduates to good use testing the lifetimes of these bulbs.\u00a0 For ease of use, they categorized their results into three bins: short, medium, and long lifetimes. 
Allowing the set $B$ to represent the lifetimes<\/p>\n<p>\\[ B = \\left\\{\\text{short}, \\text{medium}, \\text{long} \\right\\} = \\left\\{s,m,l\\right\\} \\]<\/p>\n<p>the student results are reported as<\/p>\n<p>\\[ P(B) = \\left\\{ P(s), P(m), P(l) \\right\\} = \\left\\{0.40, 0.35, 0.25 \\right\\} \\; , \\]<\/p>\n<p>which confirms our suspicions that Brighty-Lite doesn\u2019t make its bulbs to last.\u00a0 However, since we don\u2019t plan on having the lights on all the time, we decide to buy a box.<\/p>\n<p>After receiving the box and buying the tree, we set aside a weekend for decorating.\u00a0 Come Friday night we start by putting up the lights and, as we work, we start wondering whether all colors have the same lifetime distribution or whether some colors are more prone to being short-lived than others. The probability distribution that describes the color of the bulb and its lifetime is known as the joint probability distribution.<\/p>\n<p>If the bulb color doesn\u2019t have any effect on the lifetime of the filament, then the events are independent, and the joint probability of, say, a red bulb with a medium lifetime is given by the product of the probability that the bulb is red and the probability that it has a medium lifespan (symbolically $P(r,m) = P(r) P(m)$).<\/p>\n<p>Under this independence assumption, the full joint probability distribution is thus<\/p>\n<table style=\"border-style: none !important;\">\n<tbody>\n<tr>\n<th style=\"border-style: none !important; background-color: #ffffff !important;\" width=\"106\">\u00a0<\/th>\n<th width=\"106\">red<\/th>\n<th width=\"106\">blue<\/th>\n<th width=\"106\">green<\/th>\n<th width=\"106\">yellow<\/th>\n<th style=\"border-style: none !important; background-color: #ffffff !important;\" width=\"106\">\u00a0<\/th>\n<\/tr>\n<tr>\n<td width=\"106\">short<\/td>\n<td width=\"106\">0.12<\/td>\n<td width=\"106\">0.12<\/td>\n<td width=\"106\">0.10<\/td>\n<td width=\"106\">0.06<\/td>\n<td width=\"106\">0.40<\/td>\n<\/tr>\n<tr>\n<td 
width=\"106\">medium<\/td>\n<td width=\"106\">0.105<\/td>\n<td width=\"106\">0.105<\/td>\n<td width=\"106\">0.0875<\/td>\n<td width=\"106\">0.0525<\/td>\n<td width=\"106\">0.35<\/td>\n<\/tr>\n<tr>\n<td width=\"106\">long<\/td>\n<td width=\"106\">0.075<\/td>\n<td width=\"106\">0.075<\/td>\n<td width=\"106\">0.0625<\/td>\n<td width=\"106\">0.0375<\/td>\n<td width=\"106\">0.25<\/td>\n<\/tr>\n<tr>\n<td style=\"border-style: none !important;\" width=\"106\">\u00a0<\/td>\n<td width=\"106\">0.30<\/td>\n<td width=\"106\">0.30<\/td>\n<td width=\"106\">0.25<\/td>\n<td width=\"106\">0.15<\/td>\n<td style=\"border-style: none !important;\" width=\"106\">\u00a0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Now we are in a position to see Bayes theorem in action.\u00a0 Suppose that we pull out a green bulb from the box. \u00a0The conditional probability that the lifetime is short $P(s|g)$ is the relative proportion that the green and short entry $P(g,s)$ has compared to the sum of the probabilities $P(g)$ found in the column labeled green.\u00a0 Numerically,<\/p>\n<p>\\[ P(s|g) = \\frac{P(g,s)}{P(g)} = \\frac{0.1}{0.25} = 0.4 \\; . \\]<\/p>\n<p>Another way to write this is as<\/p>\n<p>\\[ P(s|g) = \\frac{P(g,s)}{P(g,s) + P(g,m) + P(g,l)} \\; , \\]<\/p>\n<p>which better shows that the conditional probability is the relative proportion within the column headed by the label green.<\/p>\n<p>Likewise, the conditional probability that the bulb is green given that its lifetime is short is<\/p>\n<p>\\[ P(g|s) = \\frac{ P(g,s) }{P(r,s) + P(b,s) + P(g,s) + P(y,s)} \\; . \\]<\/p>\n<p>Notice that this time the relative proportion is measured against joint probabilities across the colors (i.e., across the row labeled short). 
Numerically, $P(g|s) = 0.1\/0.4 = 0.25$.<\/p>\n<p>Bayes theorem links these two probabilities through<\/p>\n<p>\\[ P(s|g) = \\frac{ P(g|s) P(s) }{ P(g) } = \\frac{0.25 \\cdot 0.4}{0.25} = 0.4 \\; , \\]<\/p>\n<p>which is happily the value we got from working directly with the joint probabilities.<\/p>\n<p>The next day, we did some more cyber-digging and found that a group of graduate students at the same university extended the undergraduate results (were they perhaps the same people?) and reported the following joint probability distribution:<\/p>\n<p>\u00a0<\/p>\n<table style=\"border-style: none !important;\">\n<tbody>\n<tr>\n<th style=\"border-style: none !important; background-color: #ffffff !important;\" width=\"106\">\u00a0<\/th>\n<th width=\"106\">red<\/th>\n<th width=\"106\">blue<\/th>\n<th width=\"106\">green<\/th>\n<th width=\"106\">yellow<\/th>\n<th style=\"border-style: none !important; background-color: #ffffff !important;\" width=\"106\">\u00a0<\/th>\n<\/tr>\n<tr>\n<td width=\"106\">short<\/td>\n<td width=\"106\">0.15<\/td>\n<td width=\"106\">0.10<\/td>\n<td width=\"106\">0.05<\/td>\n<td width=\"106\">0.10<\/td>\n<td width=\"106\">0.40<\/td>\n<\/tr>\n<tr>\n<td width=\"106\">medium<\/td>\n<td width=\"106\">0.05<\/td>\n<td width=\"106\">0.12<\/td>\n<td width=\"106\">0.15<\/td>\n<td width=\"106\">0.03<\/td>\n<td width=\"106\">0.35<\/td>\n<\/tr>\n<tr>\n<td width=\"106\">long<\/td>\n<td width=\"106\">0.10<\/td>\n<td width=\"106\">0.08<\/td>\n<td width=\"106\">0.05<\/td>\n<td width=\"106\">0.02<\/td>\n<td width=\"106\">0.25<\/td>\n<\/tr>\n<tr>\n<td style=\"border-style: none !important;\" width=\"106\">\u00a0<\/td>\n<td width=\"106\">0.30<\/td>\n<td width=\"106\">0.30<\/td>\n<td width=\"106\">0.25<\/td>\n<td width=\"106\">0.15<\/td>\n<td style=\"border-style: none !important;\" width=\"106\">\u00a0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Sadly, we noticed that our assumption of independence between the lifetime and color was not borne out by experiment 
since $P(A,B) \\neq P(A) \\cdot P(B)$ or, in more explicit terms, $P(\\text{color},\\text{lifetime}) \\neq P(\\text{color}) P(\\text{lifetime})$.\u00a0 However, we were not completely disheartened since Bayes theorem relates relative proportions and, therefore, it might still work.<\/p>\n<p>Trying it out, we computed<\/p>\n<p>\\[ P(s|g) = \\frac{P(g,s)}{P(g,s) + P(g,m) + P(g,l)} = \\frac{0.05}{0.05 + 0.15 + 0.05} = 0.2 \\]<\/p>\n<p>and<\/p>\n<p>\\[ P(g|s) = \\frac{ P(g,s) }{P(r,s) + P(b,s) + P(g,s) + P(y,s)} \\\\ = \\frac{0.05}{0.15 + 0.10 + 0.05 + 0.10} = 0.125 \\; . \\]<\/p>\n<p>Checking Bayes theorem, we found<\/p>\n<p>\\[ P(s|g) = \\frac{ P(g|s) P(s) }{ P(g) } = \\frac{0.125 \\cdot 0.4}{0.25} = 0.2 \\]<\/p>\n<p>guaranteeing a happy and merry Christmas for all.<\/p>\n<p>Next time, I\u2019ll show how this innocent-looking computation can be put to subtle use in inferring cause and effect.<\/p>\n\n\n\n","protected":false},"excerpt":{"rendered":"<p>In last week\u2019s article, I discussed some of the interesting contributions to the scientific method made by the pair of English Bacons, Roger and Francis.\u00a0 A common and central theme&#8230; <a class=\"read-more-button\" href=\"https:\/\/aristotle2digital.blogwyrm.com\/?p=239\">Read more 
&gt;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-239","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/239","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=239"}],"version-history":[{"count":7,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/239\/revisions"}],"predecessor-version":[{"id":1629,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/239\/revisions\/1629"}],"wp:attachment":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=239"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=239"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=239"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}