{"id":718,"date":"2018-02-23T23:30:58","date_gmt":"2018-02-24T04:30:58","guid":{"rendered":"http:\/\/aristotle2digital.blogwyrm.com\/?p=718"},"modified":"2021-11-25T19:57:03","modified_gmt":"2021-11-26T00:57:03","slug":"to-great-lengths","status":"publish","type":"post","link":"https:\/\/aristotle2digital.blogwyrm.com\/?p=718","title":{"rendered":"To Great Lengths"},"content":{"rendered":"<p>It is well known that developers will go to great lengths to speed up algorithms or optimize code.\u00a0 Performance, after all, is king.\u00a0 In general, accelerating throughput comes in three flavors.<\/p>\n<p>The first category covers all those really sophisticated ideas that come from a clear and insightful understanding of an algorithm or a physical model.<\/p>\n<p>The examples that spring to mind in the case of the algorithm are the development of the solution to the traveling salesman problem or the invention of the Fast Fourier Transform.\u00a0 In these cases, solutions to the problem existed but generalizing these so that they would perform well in an arbitrary setting proved to be challenging and elusive.\u00a0 It took decades of many dedicated people\u2019s times to wrangle out the techniques needed to transform the elementary examples into a working technique.<\/p>\n<p>A keen example of how optimization for a physical model only comes from a deep understanding of the physics is the story that <a href=\"http:\/\/physics.cornell.edu\/peter-lepage\">Peter Lepage<\/a> relayed at a seminar that I attended.\u00a0 The story is powerful and sticks with me to this day.\u00a0 At the beginning of the seminar he took moment to set aside a laptop (1990s vintage) on which he started a \u2018job\u2019. He then proceeded to talk about the evolution of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Quantum_chromodynamics\">Quantum Chromodynamics<\/a> (QCD) in the early days after the theory had been proposed.\u00a0 Every few years, the team he was on would go to the NSF requesting more powerful super-computers with which to solve the QCD problem, giving in return more promises that the solution was just at hand.\u00a0 After 3 or 4 iterations of this loop, Lepage, said that he realized that the team had lived through something like 5 progressions of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Moore's_law\">Moore\u2019s law<\/a>.\u00a0 The computing power had increased by a factor of roughly 32 and yet the solution remained out-of-reach.\u00a0 He then decided that the fault, dear Brutus, laid not in their stars but in themselves in that their algorithm was all wrong (my paraphrasing of Shakespeare not his).\u00a0 At this point, he interrupted the seminar and went over to the laptop.\u00a0 Examining it, he declared that during the time he was speaking, a small program on his machine had solved a QCD problem and had \u2018discovered\u2019 the <a href=\"https:\/\/en.wikipedia.org\/wiki\/J\/psi_meson\">J\/Psi particle<\/a> and a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Quark%E2%80%93gluon_plasma\">quark ball<\/a>.\u00a0 He went on to explain that the performance gain he was able to create came not from better computing or large, more powerful computers, but from reworking the algorithm so that the physics (in this case <a href=\"https:\/\/en.wikipedia.org\/wiki\/Renormalization\">renormalization<\/a>) were properly considered.<\/p>\n<p>In the second category, are the general optimization techniques that are taught in computer science.\u00a0 These involve a careful examination of how to do common things like multiplying out a polynomial, sorting a list, convolving data and so on. \u00a0Counting operations and thinking about grouping and ordering are the usual ways of tackling these problems.<\/p>\n<p>For example, a usual strategy to speed up polynomial evaluation is to refactor the usual way a polynomial is presented so as to minimize the number of multiplications.\u00a0 The fourth-order polynomial<\/p>\n<p>\\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 \\]<\/p>\n<p>transforms into<\/p>\n<p>\\[ a_0 + x ( a_1 + x ( a_2 + x ( a_3 + a_4 x) ) ) \\; ; \\]<\/p>\n<p>far less readable but far more efficient.\u00a0 The former case has 4 additions and 10 multiplies (assuming no calls to \u2018pow\u2019 are made).\u00a0 The latter case still has 4 additions but only 4 multiplies, resulting in a huge savings if the polynomial is evaluated frequently.<\/p>\n<p>Another example I encountered recently centered on the binning of a large list of floating point data into a smaller array.\u00a0 A colleague of mine was using one of the standard scientific packages and he was amazed at the performance difference that resulted simply from changing the order of operations.<\/p>\n<p>To illustrate, with a concrete case, consider the case where you have millions of floating point numbers that fall somewhere in the range between 0 and 8.\u00a0 In addition, you have 8 bins, labeled B0 through B7 into which to place them.<\/p>\n<p><a href=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Lists_and_bins.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-721\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Lists_and_bins.png\" alt=\"\" width=\"857\" height=\"273\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Lists_and_bins.png 857w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Lists_and_bins-300x96.png 300w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Lists_and_bins-768x245.png 768w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Lists_and_bins-810x258.png 810w\" sizes=\"auto, (max-width: 857px) 100vw, 857px\" \/><\/a>While it may seem natural, a clumsy way to place the numbers within the bins would be to pick a given bin and then traverse the list of floats finding all those that belong in that bin and then starting with the next bin and so on.\u00a0 This requires the traversal of the large list 8 times.\u00a0\u00a0 A better way would be to pick a number off of the list of floats and place it into the appropriate bin, requiring only 1 movement through the list.\u00a0 These simple things make a huge difference but it isn\u2019t always easy to tell how to massage a commercial piece of software into doing one versus the other.<\/p>\n<p>At this point, you may be thinking that you understand the previous two categories.\u00a0 While you may never have the insight to contribute to the first category, you can appreciate the results.\u00a0 And you most likely can and will apply the techniques of the other in your day-to-day computing task.\u00a0 So what exactly falls into the third category?<\/p>\n<p>Well, the third category comprises all of those incredible \u2018tricks\u2019 that only result from equal parts inspiration (first category), standard algorithmic understanding (second category), pride, and outright desperation.<\/p>\n<p>The poster child for this category is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fast_inverse_square_root\">\u2018Fast Inverse Square Root\u2019<\/a>, an algorithm that was used to perform a high-performance estimate to $1\/\\sqrt{x}$, since this computation is the primary one needed to normalize vectors and compute scalar products for lighting in video games.<\/p>\n<p>The Wikipedia article, which gives the history, context, and analysis of the algorithm is worth reading and maybe should be mandatory for all game designers and computer scientists.\u00a0 I\u2019ll not try to cover it here except to make a few observations.<\/p>\n<p>First, the following figure, adapted from the Wikipedia article, shows the short set of lines needed to implement the algorithm<\/p>\n<p><a href=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Fast_Inverse_Sqrt.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-722\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Fast_Inverse_Sqrt.png\" alt=\"\" width=\"857\" height=\"313\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Fast_Inverse_Sqrt.png 857w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Fast_Inverse_Sqrt-300x110.png 300w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Fast_Inverse_Sqrt-768x280.png 768w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/03\/Fast_Inverse_Sqrt-810x296.png 810w\" sizes=\"auto, (max-width: 857px) 100vw, 857px\" \/><\/a><\/p>\n<p>It may have been fast, but hardly maintainable.\u00a0 The comments (most amusing) don\u2019t reveal much in how the algorithm works.\u00a0 Why, precisely, is there a cast from float to integer and what is the origin of the very specific hex value of <strong>0x5f3759df<\/strong>, and to what algorithm do the lines decorated with 1<sup>st<\/sup> iteration and 2<sup>nd<\/sup> iteration refer?\u00a0 Apparently, the second question haunted the developer who put in the comment that I sanitized for this venue.\u00a0 Answers to all of these questions can be found in Wikipedia but it is worth pondering the mindset that led to the lines pictured above.<\/p>\n<p>The second comment, a whimsical afterthought if you will, is the astonishing resemblance between John Carmack, credited with the fast inverse square root and Gary Cole, the actor who played Bill Lumberg in <em>Office Space<\/em><\/p>\n<p><a href=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/02\/Uncanny-Resemblance.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-717\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/02\/Uncanny-Resemblance.png\" alt=\"\" width=\"857\" height=\"422\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/02\/Uncanny-Resemblance.png 857w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/02\/Uncanny-Resemblance-300x148.png 300w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/02\/Uncanny-Resemblance-768x378.png 768w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2018\/02\/Uncanny-Resemblance-810x399.png 810w\" sizes=\"auto, (max-width: 857px) 100vw, 857px\" \/><\/a><\/p>\n<p>a movie about the great lengths to which developers will go.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is well known that developers will go to great lengths to speed up algorithms or optimize code.\u00a0 Performance, after all, is king.\u00a0 In general, accelerating throughput comes in three&#8230; <a class=\"read-more-button\" href=\"https:\/\/aristotle2digital.blogwyrm.com\/?p=718\">Read more &gt;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-718","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/718","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=718"}],"version-history":[{"count":0,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/718\/revisions"}],"wp:attachment":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=718"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=718"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=718"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}