{"id":1467,"date":"2022-05-27T23:30:05","date_gmt":"2022-05-28T03:30:05","guid":{"rendered":"http:\/\/aristotle2digital.blogwyrm.com\/?p=1467"},"modified":"2022-07-04T06:23:29","modified_gmt":"2022-07-04T10:23:29","slug":"metropolis-hastings-1","status":"publish","type":"post","link":"https:\/\/aristotle2digital.blogwyrm.com\/?p=1467","title":{"rendered":"Metropolis-Hastings I"},"content":{"rendered":"\n<p>The last two columns explored the notion for transition probabilities, which are conditional probabilities that express the probability of a new state being realized some system given the current state that the system finds itself in.&nbsp; The candidate case that was examined involved the transition probabilities of between weather conditions (sunny-to-rainy, sunny-to-sunny, etc.).&nbsp; That problem yielded answers via two routes: 1) stationary analysis using matrix methods and 2) Markov Chain Monte Carlo (MCMC).&nbsp;<\/p>\n\n\n\n<p>The analysis of the weather problem left one large question looming.&nbsp; Why anyone would ever use the MCMC algorithm when the matrix method was both computationally less burdensome and also gave exact results compared to the estimated values that naturally arise in an Monte Carlo method.<\/p>\n\n\n\n<p>There are many applications but the most straightforward one is the Metropolis-Hasting algorithm for producing pseudorandom numbers from an arbitrary distribution.&nbsp; In an <a href=\"http:\/\/aristotle2digital.blogwyrm.com\/?p=1408\">earlier post<\/a>, we presented an algorithm for producing pseudorandom numbers within a limited class of distributions using the <a href=\"https:\/\/towardsdatascience.com\/generate-random-variables-using-inverse-transform-method-in-python-8e5392f170a3\">Inverse Transform Method<\/a>.&nbsp; While extremely economical the Inverse Transform Method depends on the finding invertible relationship between a uniformly sampled pseudorandom number $y$ and the variable $x$ that we want distributed according to some desired distribution.<\/p>\n\n\n\n<p>The textbook example of the Inverse Transform Method is the exponential distribution.&nbsp; The set of pseudorandom numbers given by<\/p>\n\n\n\n<p>\\[ &nbsp;x = -\\ln (1-y) \\; , \\]<\/p>\n\n\n\n<p>where $y$ is uniformly distributed on the interval $[0,1]$, is guaranteed to be distributed exponentially.&nbsp;<\/p>\n\n\n\n<p>If no such relationship can be found, the Inverse Transform Method fails to be useful.&nbsp;<\/p>\n\n\n\n<p>The Metropolis-Hastings method is capable to overcoming this hurdle at the expense of being computationally much more expensive.&nbsp; The algorithm works as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Setup<ul><li>Select the distribution $f(x)$ from which a set random numbers are desired<\/li><\/ul><ul><li>Arbitrarily assign a value to a random realization $x_{start}$ of the desired distribution consistent with its range<\/li><\/ul><ul><li>Select a simple distribution $g(x|x_{cur})$, centered on the current value of the realization from which to draw candidate changes<\/li><\/ul><\/li><li>Execution<ul><li>Draw a change from $g(x|x_{cur})$<\/li><\/ul><ul><li>Create a candidate new realization $x_{cand}$<\/li><\/ul><ul><li>Form the ratio $R = f(x_{cand})\/f(x_{cur})$<\/li><\/ul><ul><li>Make a decision as the keeping or rejecting $x_{cand}$ as the new state to transition to according to the Metropolis-Hastings rule<ul><li>If $R \\geq 1$ then keep the change and let $x_{cur} = x_{cand}$<\/li><\/ul><ul><li>If $R &lt; 1$ then draw a random number $q$ from the unit uniform distribution and make the transition if $q &lt; R$<\/li><\/ul><\/li><\/ul><\/li><\/ul>\n\n\n\n<p>The bulk of this algorithm is very straightforward and simple.&nbsp; The only confusing part is the Metropolis-Hastings decision rule in the last step and why it requires the various pieces above to work.&nbsp; The next post will be dedicated to a detailed discussion of that rule.&nbsp; For now, we\u2019ll just show the algorithm in action with various forms of $f(x)$ which are otherwise intractable.<\/p>\n\n\n\n<p>The python code for implementing this algorithm is relatively simple.&nbsp; Below is a snippet from the implementation that produced the numerical results that follow.&nbsp; The key points to note are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>the numpy unit uniform distribution ($U(0,1)$) <strong><em>random<\/em><\/strong> was used for both $g(x|x_{cur})$ and the Metropolis-Hastings decision rule<\/li><li><strong><em>N_trials<\/em><\/strong> is the number of times the decision loop was executed but is not the number of resulting accepted or kept random numbers, which is necessarily a smaller number<\/li><li><strong><em>delta<\/em><\/strong> is the parameter that governs the size of the candidate change (i.e., $g(x|x_{cur}) = U(x_{cur}-\\delta,x_{cur}+\\delta)$<\/li><li><strong><em>n_burn<\/em><\/strong> is the number of initial accepted changes to ignore or throw away (more on this next post)<\/li><li><strong><em>x_0 <\/em><\/strong>is the arbitrarily assigned starting realization value<\/li><li><strong><em>prob<\/em><\/strong> is a pointer to the function $f(x)$<\/li><\/ul>\n\n\n\n<p>In addition to these input parameters, the value $f = N_{kept}\/N_{trials}$ was reported.&nbsp; The rule of thumb for Metropolis-Hastings is to have a value $0.25 \\leq f \\leq 0.33$.<\/p>\n\n\n\n<div class=\"myQuoteDiv\">\n<pre class=\"wp-block-preformatted\">x_lim&nbsp;&nbsp;&nbsp; = 2.8\nN_trials = 1_000_000\ndelta&nbsp;&nbsp;&nbsp; = 7.0\nDs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = (2*random(N_trials) - 1.0)*delta\nRs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = random(N_trials)\nn_accept = 0\nn_burn&nbsp;&nbsp; = 1_000\nx_0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = -2.0\nx_cur&nbsp;&nbsp;&nbsp; = x_0\nprob&nbsp;&nbsp;&nbsp;&nbsp; = pos_tanh\np_cur&nbsp;&nbsp;&nbsp; = prob(x_cur,-x_lim,x_lim)\n&nbsp;\nx = np.arange(-3,3,0.01)\nz = np.array([prob(xval,-x_lim,x_lim) for xval in x])\n&nbsp;\nkept = []\nfor d,r in zip(Ds,Rs):\n&nbsp;&nbsp;&nbsp; x_trial = x_cur + d\n&nbsp;&nbsp;&nbsp; p_trial = prob(x_trial,-x_lim,x_lim)\n&nbsp;&nbsp;&nbsp; w&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = p_trial\/p_cur\n&nbsp;&nbsp;&nbsp; if w &gt;= 1 or w &gt; r:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; x_cur = x_trial\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; n_accept += 1\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if n_accept &gt; n_burn:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; kept.append(x_cur)<\/pre><\/div>\n\n\n\n<p>Two arbitrary and difficult functions were used for testing the algorithm:&nbsp; $f_1(x) = |tanh(x)|\/N_1$ and $f_2(x) = \\cos^6 (x)\/N_2$.&nbsp; $N_1$ and $N_2$ are normalization constants obtained by numerically integrating over the range $[-2.8,2.8]$, which was chosen for strictly aesthetic reasons (i.e., the resulting plots looked nice).<\/p>\n\n\n\n<p>The two parameters that had the most impact of the performance of the algorithm were the values of <strong><em>delta<\/em><\/strong> and <strong><em>x_0<\/em><\/strong>. The metric chosen for the \u2018goodness\u2019 of the algorithm is simply a visual inspection of how well the resulting distributions matched the normalized probability density functions $f_1(x)$ and $f_2(x)$.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Case 1 &#8211; $f_1(x) = |tanh(x)|\/N_1$<\/h2>\n\n\n\n<p>This distribution was chosen since it has a sharp cusp in the center and two fairly flat and broad wings.&nbsp; The best results came from a starting point $x_0 = -2.3$ over in the left wing (the right wing also worked equally well) and a fairly large value of $\\delta=7.0$ was chosen such that $f \\approx 1\/3$.&nbsp; The agreement is quite good.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_left_delta_big.png\" alt=\"\" class=\"wp-image-1463\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_left_delta_big.png 640w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_left_delta_big-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure><\/div>\n\n\n\n<p>A smaller value of $\\delta$ lead to more samples being collected ($f=0.85$) but the realizations couldn\u2019t \u2018shake off the memory\u2019 that they started over in the left wing and the results were not very good, although the two-wing structure can still be seen.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_left_delta_small.png\" alt=\"\" class=\"wp-image-1461\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_left_delta_small.png 640w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_left_delta_small-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure><\/div>\n\n\n\n<p>Starting off with $x_0 = 0$ resulted in a realization of random numbers that looks far more like the uniform distribution than the distribution we were hunting.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_middle_delta_big.png\" alt=\"\" class=\"wp-image-1462\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_middle_delta_big.png 640w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/pos_tanh_x0_middle_delta_big-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<p>This result persists with smaller values of $\\delta$.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Case 2 &#8211; $f_2(x) = \\cos^6(x)\/N_2$<\/h2>\n\n\n\n<p>In this case, the distribution has a large central hump or mode and two truncated wings.&nbsp;&nbsp; The best results came from a starting value of $x_0$ at or near zero and a large value of $\\delta$.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_middle_delta_big.png\" alt=\"\" class=\"wp-image-1465\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_middle_delta_big.png 640w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_middle_delta_big-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure><\/div>\n\n\n\n<p>Lowering the value of $\\delta$ leads to the algorithm being trapped in the central peak with no way to cross into the wings.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_middle_delta_small.png\" alt=\"\" class=\"wp-image-1466\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_middle_delta_small.png 640w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_middle_delta_small-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure><\/div>\n\n\n\n<p>Finally, starting in the wings with a large value of $\\delta$ leads to a realization that looks like it has the general idea as to the shape of the distribution but without the ability to capture the fine detail.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_right_delta_big.png\" alt=\"\" class=\"wp-image-1464\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_right_delta_big.png 640w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2022\/05\/cos_pos_x0_right_delta_big-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure><\/div>\n\n\n\n<p>This behavior persists even when the number of trials is increased by a factor of 10 and no immediately satisfactory explanation exists.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The last two columns explored the notion for transition probabilities, which are conditional probabilities that express the probability of a new state being realized some system given the current state&#8230; <a class=\"read-more-button\" href=\"https:\/\/aristotle2digital.blogwyrm.com\/?p=1467\">Read more &gt;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1467","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/1467","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1467"}],"version-history":[{"count":1,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/1467\/revisions"}],"predecessor-version":[{"id":1481,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/1467\/revisions\/1481"}],"wp:attachment":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1467"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1467"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1467"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}