Latest Posts

To Great Lengths

It is well known that developers will go to great lengths to speed up algorithms or optimize code.  Performance, after all, is king.  In general, accelerating throughput comes in three flavors.

The first category covers all those really sophisticated ideas that come from a clear and insightful understanding of

Logically the Best

In keeping with the New Year's theme, this month’s column lists the five most important accomplishments from the world of logic, mathematics, computing, or philosophy in the last 100 years.  This idea is an offshoot of the books Five Golden Rules: Great Theories of 20th Century Mathematics – and Why They Matter

and Five More Golden Rules: Knots, Codes, Chaos and Other Great Theories of 20th Century Mathematics,

both written by John Casti.  In these works, Casti devotes a chapter to each of the following:

  1. Minimax Theorem (Game Theory)
  2. Brouwer Fixed-Point Theorem (Topology)
  3. Morse’s Theorem (Singularity Theory)
  4. Halting Theorem (Theory of Computation)
  5. Simplex Method (Optimization Theory)
  6. Alexander Knot Polynomial (Knot Theory)
  7. Hopf Bifurcation (Dynamical System Theory)
  8. Kalman Filter (Control Theory)
  9. Hahn-Banach Theorem (Functional Analysis)
  10. Shannon Coding Theorem (Information Theory)

with items 1-5 being from the first book and 6-10 from the second.

The idea of ranking my own top 5 had been pinballing around the back of my mind since I first encountered these books in 2015 and when the opportunity presented itself for this month’s theme, I thought I would seize it and run.  But it is important to be aware that the list I’ve compiled here is my own, inspired by Casti’s work, but independent in both content and analysis.  In fact, I think Casti misses the two most important contributions even with a list of 10 – but that is a matter of judgement and taste.

So, without any more preamble, let’s get to the list.

  5. Kalman Filter

It is hard to overemphasize the Kalman filter’s importance to real-time computing.  The algorithm allows for fitting and estimation to be performed, in the presence of noise and uncertainty, during the course of a process, such as the motion of a robotic arm or the flight of a spacecraft.  The filter takes in measured points as they become available and ‘re-fits’ each new point in the context of all previous points, thereby giving an improved estimate of the state of the system.  The key point is that the filter directly takes into account both the uncertainty in the measurements and the shortcomings of any physical model of the motion.  The introduction of the Kalman filter gave rise to a whole industry of algorithm development (extended Kalman filters, unscented filters, Bayesian networks, etc.) that balances model-based prediction (e.g. Newton’s laws) with uncertainty in knowledge and measurement – the essential ingredient for computation in the real world.  Final note:  Casti also recognizes the Kalman filter.
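To make the idea concrete, here is a minimal sketch (my own, not any production implementation) of a one-dimensional Kalman filter estimating a constant value from noisy measurements; the noise levels and data below are invented purely for illustration.

import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    # q: process noise variance, r: measurement noise variance
    x, p = x0, p0                    # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + q                    # predict: the state model is 'constant', so only the variance grows
        k = p / (p + r)              # Kalman gain balances model confidence against measurement noise
        x = x + k * (z - x)          # update the estimate with the new measurement
        p = (1.0 - k) * p            # updated (smaller) uncertainty
        estimates.append(x)
    return np.array(estimates)

np.random.seed(0)
measurements = 5.0 + np.random.normal(0.0, 0.5, 50)   # invented noisy data around a true value of 5
print(kalman_1d(measurements)[-1])                    # the final estimate settles near 5.0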

  4. Optimization Techniques

This entry is a catch-all for a variety of algorithms and approaches that have been stirred into action with the advent of the computer.  While not particularly glamorous, optimization methods solve a host of real-world problems associated with getting the best, so long as ‘best’ can be mathematically encoded in something called an objective function.  There are two basic families of optimization algorithms:  indirect and direct.  In the indirect methods, additional equations, typically called adjoint equations, are added to the problem to be solved, and the response of the new variables in these equations indicates when controls should be applied and how.  In the direct methods, a set of controls is assumed at the onset and their values are adjusted to achieve optimality.  Optimization techniques find application in thousands of practical problems including trajectory optimization, operations research, and economic modeling.  Final note: Casti singles out the simplex method for special recognition.  While this method is of particular importance in both the history of optimization and the particular application to economics, its limitation to linear problems is a significant liability, hence the larger scope of this entry.
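As a toy illustration of the direct family, here is a sketch of gradient descent on a made-up objective function; the step size, tolerance, and objective are arbitrary choices for the example, not a recommendation.

import numpy as np

def gradient_descent(grad, x0, step=0.1, tol=1e-10, max_iter=10000):
    # adjust the control variables until the objective's gradient is (numerically) zero
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = x - step * grad(x)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# objective: J(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum sits at (3, -1)
grad_J = lambda v: np.array([2.0 * (v[0] - 3.0), 2.0 * (v[1] + 1.0)])
print(gradient_descent(grad_J, [0.0, 0.0]))    # -> approximately [ 3. -1.]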

  3. Shannon’s Mathematical Model of Information

In 1948, Claude Shannon published A Mathematical Theory of Communication in The Bell System Technical Journal.  This paper, considered a landmark, set the stage for the modern digital age and for the advances in the field of telecommunications (wired and wireless).  Building on the ideas of Nyquist and Hartley, Shannon essentially created the field of information theory and the digital signal processing that follows in its train.  His theory defined the unit of information (the bit), set the theoretical limits on the amount of information that can be transferred, adapted the idea of entropy to communications, and provided a set of theorems that defined what could and could not be done in a communications system.  Final note: Casti recognizes the importance of Shannon’s work in his second book, and his treatment deepened my own appreciation of it.
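As a small illustration of the central quantity in that theory, here is a sketch that computes the Shannon entropy of a discrete distribution; the example probabilities are made up.

import numpy as np

def entropy_bits(probabilities):
    # Shannon entropy H = -sum p*log2(p), measured in bits per symbol
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]                     # by convention 0*log(0) contributes nothing
    return float(-np.sum(p * np.log2(p)))

print(entropy_bits([0.5, 0.5]))      # 1.0 bit: a fair coin is maximally informative
print(entropy_bits([0.9, 0.1]))      # ~0.47 bits: a biased coin carries less information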

  2. Fast Fourier Transform

Deemed one of the most important numerical algorithms ever created, the fast Fourier transform, or FFT, is the workhorse of the modern digital age.  There are so many applications that it is hard to list even a fraction of them, but some of the most important are:  1) signal processing in telecommunications, 2) image construction in medical imaging, and 3) spectrum analysis of scientific data ranging from the mechanics of materials to the response of electrical components to quantum mechanics.  An FFT resides in virtually every home in the US in high-definition televisions, cellphones, videogame systems, and so on.  Final note:  Casti has no mention that I’ve found of the FFT.
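Here is a minimal sketch of the spectrum-analysis use case, using numpy’s FFT on an invented two-tone signal; the sampling rate and frequencies are arbitrary choices for the example.

import numpy as np

fs = 100.0                                    # sampling rate in Hz (invented for the example)
t = np.arange(0, 1.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

spectrum = np.fft.rfft(signal)                # FFT of a real-valued signal
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

# the two largest spectral peaks recover the 5 Hz and 12 Hz tones
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(sorted(peaks))                          # -> [5.0, 12.0]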

  1. Gödel’s Theorem

Rounding out the top of this list is the far-reaching analysis on the limits of logic (mathematical and otherwise) that Kurt Gödel provided the world in 1931.  This theorem, along with the closely related theorems due to Tarski (undefinability theorem), Church (undecidability), and Turing (halting theorem), states that any formal system powerful enough to express arithmetic must be either incomplete or inconsistent.  All of these related theorems deal with how logic views or expresses statements about itself and have profound implications for human thought, philosophy, and artificial intelligence.  I also interpret them as saying something essential about man as a rational animal and, therefore, about the human condition, as I’ve written in earlier posts (Gödel’s Theorem – General Notions, Turing, Godel, and the Universe, Logic and Limits, Bare Bones Halting, and Self-Reference and Paradoxes).  The full content of these results will no doubt be explored for years to come.  Final note:  Casti focuses more on the halting theorem and its implications for computing.

Throwing Darts

Riding a bike, catching a ball, driving a car, throwing a dart – these and countless other similar, everyday activities involve the very amazing human capacity to target.  This capacity has yet to be duplicated in the machine world even in its most sophisticated applications.

To expand on this concept, take dart throwing as the prototype for analysis.  In the act of hitting a bullseye, a brain, an eye, a hand, a dart, a dartboard, a throw, and a landing all come together to produce a spectacular trajectory where the missile hits exactly where the thrower intends.

How does the person learn to throw accurately?  The equations governing the flight of the dart are nonlinear and subject to a variety of errors.  Clearly practice and feedback from eye to brain to hand are required but how exactly does the training take hold?

While some believe that they know the answer, an honest weighing of the facts suggests that we don’t have more than an inkling into the mechanism by which intelligence and repetition interact to produce a specific skill.  In Aristotelian terms, the mystery lies in how a human moves from having the potentiality to score a bullseye into being able to do it dependably.

What does seem to be clear is that this human mechanism, whatever it actually is, is quite different from any of the mathematical processes that are used in machine applications.  To appreciate just how different, let’s look at one specific mathematical targeting method: differential correction.

Differential correction is the umbrella term that covers a variety of related numerical techniques, all of them based on Newton’s method.  Related terms are the secant method, Newton-Raphson, the shooting method, and forward targeting.  Generally speaking, these tools are used to teach a computer how to solve a set of nonlinear equations (e.g., the flight of a dart, the motion of a car, the trajectory of a rocket, etc.) that symbolically can be written as

 \Phi: \bar V \rightarrow \bar G \; ,

where \bar V are the variables that we are free to manipulate and \bar G are the goals we are trying to achieve.  The variables are expressed in an n-dimensional space that, at least locally, looks like \mathbb{R}^n.  Likewise, the goals live in an m-dimensional space that is also locally \mathbb{R}^m.  The primary situation where these spaces deviate from being purely Cartesian is when some subset of the variables and/or goals are defined on a compact interval, the most typical case being when a variable is a heading angle.

These fine details and others, such as how to solve the equations when m \neq n, are important in practice but are not essential for this argument.  The key point is that the mapping \Phi only goes one way.  We know how to throw the dart at the board and then see where it lands.  We don’t know how to start at the desired landing point (i.e., the bullseye) and work backwards to determine how to throw.   Likewise, in the mathematical setting, the equations linking the variables to the goals can’t be easily inverted.

Nonetheless, both the natural and mathematical targeting processes allow for a local inversion in the following way.  Once a trial throw is made and the resulting landing point seen and understood, the initial way the dart was thrown can be modified to try to come closer.  This subsequent trial informs further throws and often (although not always) the bullseye can be eventually reached.

To illustrate this in the differential correction scheme, suppose the flight of the dart depends on two variables x and y (the two heading angles right-to-left and up-to-down) and consider just how to solve these two equations describing where the dart lands on the board (the values of f and g):

 x^2 y = f

and

 2 x + \sin(y) = g \; .

To strike the bullseye means to solve these equations for a specified set of goals f^* and g^*, which are the values of the desired target.  There may be a clever way to solve these particular equations for x and y in terms of f and g, but one can always construct so-called transcendental equations where no such analytic solution can ever be obtained (e.g. Kepler’s equation), so let’s tackle them as if no analytic form of the solution exists.

Assuming that we can find a pair of starting values x_0 and y_0 that get us close to the unknown solutions x^* and y^*, we can ‘walk’ this first guess over to the desired initial conditions by exploiting calculus.  Take the first differential of the equations

 2 x y dx + x^2 dy = df

and

 2 dx + \cos(y) dy = dg \; .

Next relax the idea of a differential to a finite difference – a sort of reversal of the usual limiting process by which we arrive at the calculus in the first place – and consider the differentials of the goals as the finite deviations of the target values from those achieved on the first throw:  df = f_0 - f^* and dg = g_0 - g^*.  Likewise, the differentials dx = x_0 - x^* and dy = y_0 - y^*.   Since we now have two linear equations in the two unknowns x^* and y^*, we can invert to find a solution.  Formally, this approach is made more obvious by combining the system into a single matrix equation

 \left [ \begin{array}{cc} 2 x y & x^2 \\ 2 & \cos(y) \end{array} \right] \left[ \begin{array}{c} x_0 - x^* \\ y_0 - y^* \end{array} \right] = \left[ \begin{array}{c} f_0 - f^* \\ g_0 - g^* \end{array} \right] \; .

The solution of this equation is then

 \left[ \begin{array}{c} x^* \\ y^* \end{array} \right] = \left[ \begin{array}{c} x_0 \\ y_0 \end{array} \right] - \left [ \begin{array}{cc} 2 x y & x^2 \\ 2 & \cos(y) \end{array} \right]^{-1} \left[ \begin{array}{c} f_0 - f^* \\ g_0 - g^* \end{array} \right] \; .

Since the original equations are non-linear, this solution is only an estimate of the actual values and, by taking these estimates as the improved guess, x_1 and y_1, an iteration scheme can be used to refine the solutions to the desired accuracy.

Everything rests on two major assumptions:  1) a sufficiently close first guess can be obtained to seed the process and 2) the equations are smooth and can be differentiated.  By examining each of these assumptions, we will arrive at a keen distinction between machine and human thought.

First, the idea of a first guess is so natural to us that it is often hard to even notice how crucial it is.  Just how do we orient ourselves to the dartboard?  The obvious answer is that we look around to find it or we ask someone else where it is.  Once it is in sight, we lock our attention on it and finally we throw the dart.  But this description isn’t really a description at all.  It lacks the depth needed to explain the process to a blind person, let alone to a machine.  In some fashion, we integrate our senses into the process in a highly non-trivial way before even the first dart is thrown.

Returning to the mathematical model, consider how to find x_0 and y_0 when f^* = 2.5 and g^* = 4.0.  The traditional way is to make a plot of the mapping \Phi, which may be considered as taking the usual Cartesian grid in x and y and mapping it to a set of level curves.  A bit of playing with the equations leads one to the plot

where level curves of f are shown in black and those for g are in blue.  The trained eye can guess a starting point at or around (1.5,1.0) but, for the sake of this argument, let’s pick (1.0,0.5) (the green dot on the figure).  In just a handful of iterations, the differential correction process homes in on the precise solution of (1.57829170833, 1.00361110653) (the black dot on the plot).  But the mystery lies in the training of the eye and not in the calculus that follows after the first guess has been chosen.  What is it about the human capacity that makes looking at the plot such an intuitive thing?  Imagine trying to teach a machine to accomplish this.  The machine vision algorithms needed to look at the plot and infer the first guess still don’t exist, even though the differential correction algorithm is centuries old.
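For readers who want to see the loop itself, here is a minimal sketch of the differential-correction iteration applied to the two example equations above with the targets f^* = 2.5 and g^* = 4.0; the function and variable names are my own choices for illustration.

import numpy as np

def phi(v):
    # the one-way map Phi from the variables (x, y) to the goals (f, g)
    x, y = v
    return np.array([x**2 * y, 2.0 * x + np.sin(y)])

def jacobian(v):
    # matrix of partial derivatives used in the linearized correction
    x, y = v
    return np.array([[2.0 * x * y, x**2],
                     [2.0,         np.cos(y)]])

def differential_correction(v0, goals, tol=1e-12, max_iter=25):
    # walk an initial guess toward the targets by repeated linear inversion
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        residual = phi(v) - goals               # (f_0 - f*, g_0 - g*)
        if np.max(np.abs(residual)) < tol:
            break
        v = v - np.linalg.solve(jacobian(v), residual)
    return v

print(differential_correction([1.0, 0.5], np.array([2.5, 4.0])))
# converges in a handful of iterations to roughly (1.57829, 1.00361)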

Even more astonishing, once realized, is that the human experience neither deals with symbolically specified equations, nor are the day-to-day physical processes for throwing darts, driving cars, or any of the myriad other things people do strictly differentiable.  All of these activities are subject to knowledge and control errors of varying degrees.  Inexactness and noise pervade everything, and yet wetware is able to adapt and extend in a way that software can’t.  Somehow the human mind deals with the complexity in an emergent way that is far removed from the formal structure of calculus.

So, while it is true that the human and the machine both can target and that targeting entails trial-and-error and iteration, they do it in completely different ways.  The machine process is analytic and local and limited.  The human process is emergent, global, and heuristic.  In addition, we can teach ourselves and others how to do it and even invent the mathematics needed to teach a pale imitation of it to computers, but we really don’t understand how.

Logic and Limits

For those of us who grew up in the seventies and had an interest in things other than sex, drugs, and rock and roll, Star Trek was a mainstay.  The characters, the visuals, the storylines all inspired a generation of scientists, engineers, and explorers.  And no other character in Star Trek epitomized a bright new future full of exciting technology more than did Mr. Spock.  As a Vulcan, his strong embrace of logic influenced many of us to value reason and clear thinking ever since.

But a curious aspect of Vulcans, in general, and Mr. Spock, in particular, is their illogical devotion to logic.  Although the original show frequently emphasized that logic wasn’t enough (often through the intuition or hunches of Kirk, McCoy, or Scott) and that the human experience transcends logic, there is no indication that the Vulcans ever examine the basis of logic.  As a race, they never seem to engage in meta-logic, the logical look at the underpinnings of logic and an exploration of what it can do well and where it fails.

Clearly this point is not the central concern for the show, nor should it be.  Nonetheless, it is fun to speculate about what Spock might have done had the writers known about the limits of logic.  I’m not speaking of something as sophisticated as the implications of Gödel’s theorem.  Rather, I am talking about the simple and obvious issues associated with logic: the need for reasoning to be based on the finite knowledge of beings, the need for first principles or axioms, and the bootstrapping needed to go from a guess to a hypothesis to a theory with strong evidentiary support.  In short, inductive reasoning.

Induction is the key pillar of scientific advances; it is the method by which all knowledge of the physical world is explored and gathered.  Induction is inherently probabilistic in the way it tries to link cause and effect (think of the number of times Kirk asked Spock for the odds that such-and-such a thing would happen) and has been remarkably successful in helping mankind grope its way to a better understanding of the physical world.  But despite its success, its logical foundations are weak.

A nice discussion of the issues associated with inductive reasoning is found in Chapter 5 of Introduction to Logic, by Harry J. Gensler, and I’ve cited Gensler and his discussions in several posts in the past.

What struck me about the arguments found towards the end of Chapter 5 is the remarkably fragile way that the whole logical structure holds together when examined closely.  In other words, logic examining logic comes away disappointed in two areas – the proper formulation of induction principles and the justification for believing an inductive argument.

The element of probability inherent in induction can lead to some absurd and laughable conclusions.  Gensler offers this statistical syllogism by way of illustration.

60 percent of All Cleveland voters are Democrats
This non-Democrat is a Cleveland voter
This is all we know about the matter
Therefore, it’s 60 percent probable that this non-Democrat is a Democrat

Of course, this is a nonsense argument; but before being quick to dismiss it, consider these three areas where arguments like this are not easy to spot and can do damage.  First is the student (at any age) who is learning a subject for the first time.  Unless the entire curriculum is memorization, somewhere along the line the student will need to infer something based on the knowledge in hand.  This step is usually where beginners make their mistakes – jumping to an unwarranted conclusion precisely because their induction skills (their intuition) haven’t been honed within the broader context.  This leads to frustration and, far too often, abandonment of the subject.  Extending this idea to machine-based inference engines, ones whose intuition can’t be improved through learning (at least not in the way humans learn), makes the need for rigorous laws of inductive inference far more acute.  Even though it makes for engaging science fiction, we don’t want a machine in our lives coming up with an absurd conclusion just because of sloppy rules.  And finally, large segments of society are constantly being grifted by arguments of the Cleveland-voter sort.  We see it in politics, advertising, and all the other sophistical interactions surrounding us.  To quote P.T. Barnum, “There’s a sucker born every minute.”  Unfortunately, as Gensler goes to great pains to point out, there doesn’t seem to be a way to make the rules of inductive inference tighter.  Even principles like Occam’s Razor are merely guidelines that must be applied with judgement.  Or, as I like to phrase it, there doesn’t seem to be a way to make inductive logic ‘sucker-proof’.

Even more interesting is the fact that the justification of inductive arguments seems to be self-referential and circular.  Gensler discusses the criticism leveled by David Hume against induction and the five responses that have arisen over the intervening years.  I won’t go over all five in detail but will confine the discussion to the three most interesting ones.

  1. Justification for induction can be done by presuming nature is uniform, meaning that it works in regular patterns. I happen to favor this justification even though Gensler is correct in insisting that it is hopelessly vague.  Curiously, his most interesting observation is that the idea that nature is uniform is based on arguing from prior experience.  This argument is inductive and thus we have induction justifying induction.
  2. Closely related is the idea that inductive methods have been so profoundly successful in every aspect of life that they must be correct. Of course, this is another circular argument in which an inductive syllogism is used to justify the inductive method.
  3. The final response Gensler cites is the most interesting – that we approach the justification of inductive logic in the same way we approach the justification for deductive logic. He singles out modus ponens as that particular part of deductive logic worth examining.  Why does modus ponens work and why should we believe it?  This last question is most germane when considering the truth table for modus ponens where the premise is false but the consequent is true (A \rightarrow B is true if A is false but B is true).  In some sense this is an arbitrary convention (although a reasonable one – if I can use a very vague word).  So why should we expect modus ponens to work?  How do we justify it?  He first says that we might try using this syllogism
    If the truth table for modus ponens never gives true premises and a false conclusion, then modus ponens is valid.
    The truth table for modus ponens never gives true premises and a false conclusion.
    Therefore, modus ponens is valid.

    But here we have a problem: we are using modus ponens to justify modus ponens.  The justification, even on the firm ground of deductive logic, leaves something to be desired.

In the end, it seems that we have to simply accept that logic has its limits and that, as a race, we’ve bootstrapped our way to our present level of knowledge.  This holds for inductive and deductive logic, with the former providing the basis for the latter (i.e., experience has taught us that deductive logic axioms are the ones we need to assume to get the whole thing to work).  Gensler concludes with:

Inductive reasoning has been very useful.  Instructively, we assume that it will continue to be useful.  In our lives, we can’t do without it.  But the intellectual basis for inductive reasoning is shaky.

It would have been interesting to hear Mr. Spock discuss these points, to argue them a bit with the more intuitive Kirk.  But perhaps a serious examination would have undermined the entire purpose that he served on the show.  Perhaps it would have undermined the very reason for Vulcan to be, and maybe that would have been an episode worth seeing.

Logic and Politics

There is a widespread belief about politicians and used car salesmen that both species can often speak with a lot of gusto, using sizeable portions of the English language, without actually saying much or anything at all.

Sometimes there is a certain charm in circumlocution. For example, the legendary speech by Noah 'Soggy' Sweat Jr. certainly can be whittled down to 'it all depends' but at the loss of a great deal of fun and verve.

On other occasions, ambiguity and double-speak can be misleading or even downright deadly. An excellent example of such a situation is the following vivid exchange taken from Isaac Asimov's Foundation.

For those unfamiliar with this recognized science fiction classic, the story is set far in the future of mankind in which a vast galactic empire is beginning its decline. The Foundation is a large colony of scientists and researchers whom the empire has exiled to the galactic fringe in order that they may compile an encyclopedia that catalogs the knowledge of the race in advance of the collapse. In actuality, the empire has been tricked into forcing their exile so that the colony can become the nucleus around which the next empire coalesces.

The out-of-the-way planet that the Foundation calls home sits within a set of smaller star systems that are breaking away from the empire as the latter's influence and dominion recedes. The following excerpt comes from a strategy meeting where the Foundation's leaders are trying to determine how to respond to the ultimatum they've just received from Anacreon, the largest of the breakaway states, in light of recent diplomatic visits of the delegations from Anacreon and the Galactic Empire.

The exchange starts with Salvor Hardin, the Foundation's politically savvy mayor, trying to convince the Board of Trustees, who oversee the completion of the encyclopedia, just how precarious their situation is. Hardin's position is that the Board's appeal to the Empire for protection was the cause of the threat from Anacreon.

Said Yate Fulham: "And just how do you arrive at that remarkable conclusion, Mr. Mayor?"

"In a rather simple way. It merely required the use of that much-neglected commodity – common sense. You see, there is a branch of human knowledge known as symbolic logic, which can be used to prune away all sorts of clogging deadwood that clutters up human language."

"What about it?" said Fulham.

"I applied it. Among other things, I applied it to this document here. I didn't really need to for myself because I knew what it was all about, but I think I can explain it more easily to five physical scientists by symbols rather than by words."

...

"The message from Anacreon was a simple problem, naturally, for the men who wrote it were men of action rather than men of words. It boils down easily and straightforwardly to the unqualified statement,...,which in words, roughly translated, is, 'You give us what we want in a week, or we take it by force.'"

"All right." Hardin replaced the sheets. "Before you now you see a copy of the treaty between the Empire and Anacreon – a treaty, incidentally, which is signed on the Emperor's behalf by the same Lord Dorwin who was here last week – and with it a symbolic analysis."

"As you see, gentlemen, something like ninety percent of the treaty boiled right out of the analysis as being meaningless, and what we end up with can be described in the following interesting manner:

"Obligations of Anacreon to the Empire: None!

"Powers of the Empire over Anacreon: None!"

Later on the group discusses a similar analysis of the visits from the empire's representative Lord Dorwin

"You know, that's the most interesting part of the whole business. I'll admit I had thought his Lordship a most consummate donkey when I first met him – but it turned out that he was actually an accomplished diplomat and a most clever man. I took the liberty of recording all his statements."

"... The analysis was the most difficult of the three by all odds. When Holk, after two days of steady work, succeeded in eliminating meaningless statements, vague gibberish, useless qualifications – in short, all the goo and dribble – he found he had nothing left. Everything canceled out."

"Lord Dorwin, gentlemen, in five days of discussion didn't say one damned thing, and said it so you never noticed."

Not all applications of symbolic logic are as dramatic and interesting as the one Asimov depicts. Nonetheless, even though there may not be any cosmic significance, symbolic logic can be a tool that makes life easier and reasoning and comprehension more clear.

Suppose, for example, that you have a friend who says

If it is raining and either it is not raining or it is snowing then it is snowing.

What do you make of that statement? What does it mean? Does it ever make sense? Trying to parse the sentence is nearly impossible - at least in its fully decorated language form. Symbolic logic lets us, much like Mayor Hardin, strip away all the nonsense and come to some supported conclusion about your friend's ability to communicate.

To apply it, first take the basic pieces and represent them with simple symbols. For this example, let p mean 'it is not raining' and q mean 'it is snowing'. Your friend's cryptic statement is symbolically represented as:

 \left[ \neg p \wedge (p \vee q) \right] \rightarrow q \; ,

where \neg means not (i.e. \neg p means it is raining), \wedge is the logical and, \vee is logical or, and \rightarrow is the usual if-then relation.

Having translated the cryptic statement into symbols, we can now manipulate it using the standard rules of propositional logic.

The first step is to rewrite the if-then implication in its 'or' form

 \neg \left[ \neg p \wedge (p \vee q) \right] \vee q \; .

Then use de Morgan's rule to bring the negation inside

 \left[ \neg \neg p \vee \neg (p \vee q) \right] \vee q \; .

Next use the double negation to simplify the first term

 \left[ p \vee \neg(p \vee q) \right] \vee q

and then use de Morgan's rule again

 \left[ p \vee (\neg p \wedge \neg q) \right] \vee q \; .

The law of distribution is the next step

 \left[ (p \vee \neg p) \wedge (p \vee \neg q) \right] \vee q \; .

From the basic laws of logic, p \vee \neg p \equiv T, since a proposition is either true or false, so a proposition or its negation is always true (a tautology). This observation yields

 \left[ T \wedge (p \vee \neg q) \right] \vee q \; .

Next apply T \wedge p \equiv p , and our original statement is now

 (p \vee \neg q) \vee q \; ,

which in English reads something like 'either it is not raining or it is not snowing or it is snowing'. Still a bit confusing to parse but certainly much simpler. Fortunately, we can continue analyzing the statement further.

Using distribution again gives

 (p \vee q) \vee (\neg q \vee q ) \; ,

which becomes

 (p \vee q ) \vee T

when \neg q \vee q \equiv T is used.

Finally, as noted earlier, anything \vee T \equiv T, and the original statement boils down to being always true. Your friend has uttered a tautology and, for one brief moment, shown himself worthy of being called 'an accomplished diplomat and a most clever man' – able to avoid saying 'one damned thing' and to say it so you never noticed.
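The same conclusion can be checked by brute force. Here is a small sketch (the function name is mine) that enumerates the truth table for the statement:

from itertools import product

def statement(p, q):
    # [not p and (p or q)] -> q, with the implication written as 'not A or B'
    antecedent = (not p) and (p or q)
    return (not antecedent) or q

# exhaustively check all four assignments of truth values
print(all(statement(p, q) for p, q in product([True, False], repeat=2)))   # True: a tautology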

Interfaces and Thoughts

This month’s column is, on the face of it, a whimsical undertaking.  I’ll be examining the man-machine interface on two different cars – a 2004 Saturn Ion and a 2000 Infiniti I30.  Yes, you are correct!  I own some old cars and, just in case your thoughts wander into the notion that at least there is a luxury car in the mix, please note that the Infiniti was purchased used, with over 100,000 miles on it.  Anyway, the central theme of this post is not what kind of cars I own and drive but rather what the design of the car tells about the designer.

The idea that the form and function of a work reveals a portion of the mind of the designer is old, very old, and is one of the central arguments from the medieval synthesis for the existence of God.  I’m not claiming any deep metaphysical insights from the discussion here but I do think there are interesting reflections on the human mind in this very tongue-in-cheek analysis.

To start, let me say that the Infiniti is the better-designed car from the point of view of acceleration, handling, and space.  It has a V6 compared to the Ion’s inline 4-cylinder engine.  It has leather seats versus the Saturn’s cloth ones.  And the list could go on.  Nonetheless, the man-machine interface in the Saturn is hands down better.

Let’s start with the power locks on the doors.  Here is a picture of the locking control for the I30:

and the corresponding look at the Ion:

While they are very close in design, they don’t work the same way at all – leading to a great deal of confusion whenever I switch cars.  The icon of the black key on the white background means unlock on the Saturn and lock on the Infiniti.  And that difference speaks to how the people who designed these controls think.

I am quite aware that there are cultural differences between the Japanese and US mind, but these shouldn’t have come into play.  That isn’t to say that these differences aren’t important or valid, nor that the resulting man-machine interface isn’t influenced by them, but rather that there should be universal agreement across both designs.

The primary purpose of any key, regardless of the time and place in which it is used, is to unlock a locked door.  In certain, older circumstances keys are also used to lock the door again, but this lesser purpose should not be the one that the designers seized upon for their design.  This is because, for reasons of safety and security, a car is something that is generally in a locked state unless being entered or exited.  Once in a car, especially one that has power locks, the notion of key as the locking agent becomes almost meaningless.  In addition, cars are imported and exported around the world and international standardizations are commonplace.  Thus, the only possible conclusion is that the Ion gets it right and the Infiniti gets it wrong.  This conclusion also suggests that the Infiniti designers were perhaps focused on the Japanese interpretation and not on how their product would be used in the global market.

Of course, there are those of you who still object that this is simply a matter of convention; that the functional argument is not persuasive.  I must admit that when I first encountered this difference I wasn’t swayed either.  The interface that really draws out the difference and pushed my thinking from equivocation to certainty is the windshield wiper control.  Here the problem isn’t embracing a difference between conventions but a matter of internal consistency.  And again, the Saturn is the clear winner.

To be concrete, here is a photo of the interface to the wiper system from the Ion

and the I30

Again, the two designs look very similar – too similar, in fact, to keep me from getting confused.  Both designs sport the traditional iconography of a dashed line (- -) for intermittent wipers, a solid line (—) for continuous low, and a heavy solid line for continuous high.  A careful examination reveals that the direction in which one must articulate the control is different; on the Ion, the wipers go from intermittent to low to high by pushing the lever upwards, while on the I30 the same sequence results from pushing down.  Again, the difference seems to be one of convention, but we haven’t discussed the intermittency setting and it is here that the I30 shows itself to be inconsistent.

Before getting to the inconsistency, there is one more matter of convention that differs between the two.  Both controls sport a dial bearing graduated lines (the white lines on the right that are wider at the top and taper to almost nothing at the bottom) that set the speed of the intermittent wipers.  For the I30, the larger the line the larger the time gap between successive swipes of the wipers.  For the Ion, the larger the line the smaller the time gap between successive swipes.  So their conventions are dual to each other, with the I30 working in terms of time and the Ion in terms of frequency.

The inconsistency rears its head when the lever and dial are used in tandem (i.e. when using intermittent wipers).  In the I30, higher frequency is obtained by pushing the lever down but by turning the dial up.  On the Ion, up means the same thing for both lever and dial.  And that, in a nutshell, is why the man-machine interface of the Ion is better than that of the I30, despite the I30 being a better car overall.

So, what do these design choices reveal about the minds of the designers?  In the case of the Ion, it seems to show that there was one single guiding mind or principle.  Whether or not you prefer the design choices of the Ion over the I30, there is no arguing that the Ion design is self-consistent.  White icons always mean changing the state from the previously passive state to something more active: for the doors, from locked (passive) to unlocked (active); for the wipers, from off (passive) to low frequency (active) to high frequency (more active).  In the case of the I30, the design is a hodge-podge of concepts with differing motifs and little consistency.  This suggests that no single guiding principle knitted together the design of the man-machine interface.  Part of this is, no doubt, cultural, but part of it seems to be indicative of a company that takes pride in the subsystems but fails to put as much emphasis on knitting the systems together into a seamless whole.

K-Means++ Data Clustering: Improved Seeding

This month’s column is a follow on to last month’s look at the K-means data clustering.  Again, I’ll be looking at James McCaffrey’s Test Run article K-Means++ Data Clustering, published in MSDN in August 2015.

In the last post, I confined my analysis to the K-means data clustering algorithm using McCaffrey’s hand-picked data points, which showed that the heights and weights of a sample of some population aggregate into 3 distinct clusters.  Once the data were normalized, the human eye easily picks out these clusters but the machine, not so much.  The purpose of the K-means clustering was to provide the machine with a mechanism whereby it might mimic, at least in the most primitive of senses, the ability of the human eye to correlate a 2-dimensional image.

Of course, McCaffrey’s sample was designed specifically to illustrate the techniques with no gray areas that might confuse the person, let alone the computer.  Nonetheless, the algorithm failed to find the clusters completely around 20-30% of the time; an estimate of my own based on a semi-empirical method of restarting the algorithm a number of times and simply counting the number of times that the clustering gave a color-coded result that looked like the following figure (each color denoting a cluster).

The problem with the algorithm is one of seeding the process.   For each of the k clusters chosen as a guess by the user, the algorithm needs to start with a guess as to where the centers of those clusters lie.  The K-means algorithm seeds the process by selecting samples from the distribution at random to give the initial guess as to the cluster centers.  If this random guess selects two nearby points as two distinct centers then a situation like the one shown above arises.

The ‘++’ in the K-means++ algorithm denotes an improvement to the seeding process to avoid this undesirable outcome.  The technique is called roulette wheel selection and it works as follows.
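McCaffrey gives his own implementation in the article; what follows is only my sketch of the idea, in which each new center is drawn by spinning a ‘roulette wheel’ whose slices are proportional to each point’s squared distance from the nearest center already chosen.

import numpy as np

def kmeanspp_seed(data, k):
    # sketch of k-means++ seeding via roulette wheel selection
    centers = [data[np.random.randint(len(data))]]         # first center: uniform pick
    for _ in range(k - 1):
        # squared distance from each point to its nearest chosen center
        d2 = np.min([np.sum((data - c)**2, axis=1) for c in centers], axis=0)
        wheel = np.cumsum(d2 / d2.sum())                    # cumulative probability 'wheel'
        spin  = np.random.rand()
        centers.append(data[np.searchsorted(wheel, spin)])  # roulette wheel pick
    return np.array(centers)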

K-Means Data Clustering

This month’s column is inspired by James McCaffrey’s Test Run article K-Means++ Data Clustering, published in MSDN in August 2015.   McCaffrey’s piece is really two articles rolled into one and is interesting not only for the data clustering algorithm that it presents but also for the fascinating light it sheds on the human facility to visually analyze data (not that McCaffrey comments on this point – it’s just there for the careful observer to note).  But before getting to the more philosophical aspects, it’s necessary to discuss what K-means data clustering is all about.

To explain what data clustering entails, I will use the very same data that McCaffrey presents in his article as the test set for mine.  He gives a set of ordered pairs of height (in inches) and weight (in pounds) for 20 people in some sampled population.  The table listing these points

Height (in)   Weight (lbs)
65.0 220.0
73.0 160.0
59.0 110.0
61.0 120.0
75.0 150.0
67.0 240.0
68.0 230.0
70.0 220.0
62.0 130.0
66.0 210.0
77.0 190.0
75.0 180.0
74.0 170.0
70.0 210.0
61.0 110.0
58.0 100.0
66.0 230.0
59.0 120.0
68.0 210.0
61.0 130.0

doesn’t reveal anything special at a glance; nor should it be expected that it would.  As every good data scientist knows, the best (human) approach is to plot the data so that the eye can pick out patterns that the mind may fail to perceive in the tabulated points.  The resulting plot

reveals that there are three distinct clusters in the data. McCaffrey doesn’t show a plot of these data, but it seems that he omitted it because he wanted to focus on how to teach the computer to find these three groups using K-means clustering without the benefit of eyes and a brain, which it obviously doesn’t have.

The principle behind K-means clustering is quite simple:  each group of points is called a cluster if all the points are closer to the center of that group than they are to the centers of the other groups.

The implementation is a bit more difficult and tricky.  Distance plays a key role in this algorithm but whereas the human eye can take in the entire plot at one go and can distinguish the center of each cluster intuitively, the computer must calculate each point one at a time.  This leads to two problems:  1) how to numerically calculate distances between points and 2) how to find the centers of the cluster from which these distances are measured.  Let’s deal with each of these.

The fact that the units for the x-axis are different from those on the y-axis can lead to some problems identifying the clusters – for both the human and the machine.  Consider the following plot of the same data but with the x- and y-axes given the same extent from 50 to 200 in their respective units:

It isn’t at all clear that there are three distinct clusters as opposed to two.  A better approach is to normalize the data in some fashion.  I prefer using the Z-score where the data are normalized according to

 Z_x = \frac{ x - \langle x \rangle}{\sigma_x} \; ,

where \langle x \rangle is the mean over all the values and \sigma_x is the corresponding standard deviation.  Applying the normalization to the height and the weight gives

Once the data are normalized, the distance between them is determined by the usual Euclidean norm.
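In numpy, that normalization is a one-liner; the helper below is just a sketch (the function name is my own):

import numpy as np

def z_normalize(data):
    # column-wise Z-score: subtract each column's mean and divide by its standard deviation
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)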

The second step is to determine the centers of each cluster, which is essentially a chicken-and-the-egg problem.  On one hand, one needs the centers to determine which point belongs to which cluster.  On the other hand, one needs to have the points clustered in order to find their center.  The K-means approach is to guess the centers and then to iterate to find a self-consistent solution.

I implemented the K-means clustering code in a Jupyter notebook using Python 2.7 and numpy.  I defined 2 helper functions.

The first one determines the cluster into which each point is classified based on the current guess of the centers.  It loops over all points and compares each point’s distance to the k centers.  Each point belongs to the closest center and is assigned the id number of that center in a list, which is returned.

import numpy as np

def find_cluster(curr_means,data):
    # assign each point to the nearest of the current cluster centers
    num_pts      = len(data)
    k            = len(curr_means)
    cluster_list = np.zeros(num_pts,dtype=int)
    dist         = np.zeros(k)
    for i in range(num_pts):
        for m in range(k):
            l       = data[i] - curr_means[m]
            dist[m] = l.dot(l)            # squared Euclidean distance
        cluster_list[i] = np.argmin(dist) # id of the closest center
    return cluster_list

The second helper function updates the position of the centers by selecting, from the list of all points, those that belong to the current cluster.  This function depends heavily on the slicing and numerical functions built into numpy.

def update_means(k,cluster_list,curr_data):
    # recompute each cluster center as the mean of its member points
    updated_means = np.zeros((k,2))
    for i in range(k):
        clustered_data     = curr_data[np.where(cluster_list == i)]
        updated_means[i,:] = np.array([np.mean(clustered_data[:,0]),np.mean(clustered_data[:,1])])
    return updated_means

For this implementation, I chose 3 clusters (k = 3) and I seeded the guess for the centers by randomly selecting from the list of the initial points.  Subsequent updates moved the centers off the tabulated points.  The update process was iterated until no changes were observed in the position of the cluster centers.  The final cluster assignments were then used to color code the points based on the cluster to which they belong.
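The driver loop itself isn’t shown above; here is a sketch of what it might look like, reusing the two helper functions and following the random seeding and convergence test just described.

def k_means(data, k=3):
    # seed the centers by picking k distinct points at random from the data
    means = data[np.random.choice(len(data), size=k, replace=False)]
    while True:
        clusters  = find_cluster(means, data)        # assign points to the nearest centers
        new_means = update_means(k, clusters, data)  # move centers to the cluster means
        if np.allclose(new_means, means):            # stop when the centers no longer move
            return clusters, new_means
        means = new_means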

The first run of the algorithm was successful with the resulting plot indicating that the process had done a good job of identifying the clusters similar to the way the human eye would.

Subsequent runs show that about 70-80% of the time, the algorithm correctly identifies the clusters.  The other 20-30% of the time, the algorithm gets ‘stuck’ and misidentifies the points in a way that tends to split the small cluster in the bottom left into two pieces (although not always in the same way).

The ‘++’ nomenclature featured in McCaffrey’s article (K-means++ clustering) refers to a more sophisticated way of selecting the initial cluster centers that minimizes the chances that the algorithm will get confused.  This new type of seeding and some numerical experiments with a larger data set (in both number of points and dimensionality) will be the subject of next month’s column.

Of Fishbones and Philosophy

On the off chance that you, dear reader, are thinking that there is precious little overlap between the skeletons left over from dead fish and the high art of philosophy, let me set your mind at rest.  You are correct; there isn’t much.  Nonetheless, this installment isn’t a shortened quip-of-a-column designed to note this simple observation and then to make a quick, albeit graceful exit.  In point of fact, the fishbones that I am referring to have a great deal to do with philosophy, in general, and epistemology, specifically.

For those of you who aren’t aware, the fishbone or Ishikawa diagram (named after Kaoru Ishikawa) is a way of cataloging the possible, specific causes of an observed event as a way of inferring which one is the most likely.  Its primary application is to those events where the effect is clearly and obviously identifiable but where the trigger of that event is unknown or, at least, unobservable.  One can usually find these diagrams applied in industrial or technological settings where a fault in a complex system rears its ugly head but the failure mode is totally or partially unknown.

Now it is one of those trendy nuggets of common knowledge that philosophy is one of those subjects designed for the technically-challenged to while away their time considering how many angels can dance on the head of a pin or whether to push the fat man onto the tracks in order to save the lives of the many passengers on the train.  No practical applications can be found in philosophy.  It has nothing important to offer workplaces where holes are drilled, sheet metal bent, circuits soldered, products built, and so on.

The fishbone diagram speaks otherwise – it deals with what is real and practical and with what we know and how we know it in a practical setting.  It marries concepts of ontology and, more importantly, epistemology with the seemingly humdrum worlds of quality assurance and manufacturing.

To appreciate exactly how this odd marriage is effected, let’s first start with a distinction that is made in fishbone analysis between the proximate cause and the root cause.  A practical example will serve much better here than any amount of abstract generalization.

Suppose that as we are strolling through ancient Athens, we stumble upon a dead body.  We recognize that it is our sometime companion by the name of Socrates.  Having been fond of that abrasive gadfly and possessing a slice of curiosity consistent with being an ancient Greek, we start trying to determine just what killed Socrates.  One of us, who works in the new Athenian pottery plant where the emerging science of quality management is practiced, recommends making a fishbone diagram to help organize our investigation.

Inside the head of the fish we place the key observation that Socrates is dead.  Off the central spine, we string possible causes of death, grouped into categories that make sense to us.  After a lot of discussion, we agree on these four:  Divine Intervention, Natural Causes, Accidental Death, and Foul Play.   Under each of these broad headings we add specific instances.  For example, some of us have heard rumors of the dead man’s impiety, so perhaps Zeus has struck him down with a thunderbolt.  Others suggest that being hit with a discus was the cause of death, just like what happened to uncle Telemachus at the last Olympic Games.  We continue on until we have our finished fishbone.

 

This version of the fishbone diagram aims at helping us determine the proximate cause.  We want to know what actually killed him without, at this stage, trying to figure out why (although the question of ‘why’ helped us in populating the list).

We then, in good logical fashion, start looking for observations that either strengthen or weaken each of the bones in our diagram.  We find no evidence of charring or submergence in water, so we argue that Divine Intervention is highly unlikely.  There is no blood or signs of blunt force trauma, so scratch all the possibilities under Accidental Death.  One of us notes that his belongings are all present and that his face is peaceful and his body shows no subtle signs of violence like what might be attributed to strangulation or smothering, so we think murder very unlikely.  Finally, one of us detects a faint whiff of a distinct odor and concludes that Socrates has died by drinking hemlock.

In fishbone analysis, hemlock poisoning is the proximate cause – the direct, previous link in the chain of causation that led to his death.  Note that we haven’t actually seen Socrates consume the lethal cocktail; we are simply inferring it based on the effect (he’s dead) and the smell (likeliest cause).  The next step is to determine the root cause – the reason or motivation for his consumption of the hemlock.

We find, after collecting a different type of observation, that he was executed by the Polis of Athens for impiety and for corrupting the morals of the youths of our city state.  We generally fill out this step by interviewing people and collecting human impressions rather than physical evidence.  At what point we decide that we’ve hit the root is up to us.  We can stop with the death sentence passed down by the Athenian court or we can look to the politics that led to that sentence.  We can stop with the politics or dig further into the social and demographic forces that made Athenian democracy so disposed to dispatch the father of Western thought.  We can trace events back to Hippias the tyrant, or back to Homer, or wherever.

This sense of arbitrariness isn’t confined solely to where we cut off the determination of the root cause.  We also limited our universe of explanations in determining the proximate cause.  We can’t consider everything – how about dryads, sylphs, and satyrs?

In other words, all of us start our fishbone analysis with a Bayesian a priori expectation of likeliest causes and we apply, whether consciously or not, Occam’s razor to simplify.  Let’s reflect on this point a bit more.  Doing so brings into sharper focus the distinction between what we think we know, what we actually know, and what we don’t know; between the universe of knowable, unknown, and unknowable.  Ultimately, what we are dealing with is deep questions of epistemology masquerading as crime scene investigation.

The situation is even more interesting when one has an observable effect with no discernable cause.  Is the cause simply unknown or is it unknowable?  And how do we know in which category it goes without knowing it in the first place?

This epistemological division is even further muddied when we deal with indirect observations provided by tools (usually computers).  Consider the case where a remote machine (perhaps in orbit) communicates with another machine, which unpacks the electronic signals it receives.  If a problem is observed (a part is reported dead, for example), what does this actually mean?  Where does the fault lie?  Is it in the first machine or the second one?  Could the second one, by accident or malice (hacking), be spoofing the fault on the first?  How does one know and where does one start?  And if one is willing to extend the concept of a second machine to include human beings and their senses, then the line gets even more blurred between observer and observed.  Where does the fault lie, with our machines or with ourselves, and how does one know?

I will close on that note of uncertainty and confusion with an aporetic ending in honor of Socrates.  And all of it came from a little fishbone, whose most common practitioners would most likely tell you that they are not interested in anything so impractical as philosophy.

Dumbing AI Down

The concept of the Turing Test as the basic gate that an artificially intelligent system must pass to be judged sufficiently human-like is both pervasive and intriguing.   The idea is dealt with widely in both serious academic circles and fanciful science fiction, and the usual theme is one in which the AI must overcome a set of hurdles to pass the test.

Usually, these hurdles are viewed as a question of evolution - of smartening the AI so that it acts like a human being.  Topics along this line include enabling sophisticated algorithms that recognize levels of evocation, an essential property that allows for understanding humor, getting double entendres, and recognizing sarcasm.  Poetry and evocative imagery are also complications that have been explored off and on.

Far less frequently is the concept of devolution explored.  The idea here is to dumb down the AI so that it seems less like a computer and more like a human being.  It should know how to round numbers grossly, use vague characterizations, use contractions, cut verbal corners, and the like.  One should imagine Commander Data from Star Trek: The Next Generation as the textbook example.

This post deals with an unillumined corner of this latter category.  Specifically, how to make sure an AI can mimic the intuition of a human being, warts and all.  What I am talking about is an AI designed with the same blind spots and foibles as the average human being.  Nothing illustrates this so clearly as the intuition-defying results that come from big numbers.

Humans are not usually good with numbers in general and are notoriously bad with big numbers.  This is such a prevalent problem that there is even a term to describe just how poor the average soul’s understanding of numbers and mathematics is – innumeracy.

Even for those practiced in the art, intuition can fail when big numbers come in the form of probability and statistics.  Two puzzles are famous for challenging the mortal mind:  the Birthday and the Monty Hall puzzles.  Any AI that wants to blend in had better trip over these two problems with the rest of us or run the risk of being exposed as something other than human.

The Birthday Puzzle

Often called the Birthday Paradox, this puzzle is a significant challenge to the basic intuition that each of us has about the likelihood of coincidences.  As described, the Birthday Puzzle goes something like this.  Suppose that there are n persons in a room, say attending a party.  What is the probability that any two of them have the same birthday?  Stated slightly differently, how many people do you need in a room before the probability is 50% that any two of them share the same birthday?

To be concrete and to keep things as simple as possible, let’s agree to ignore leap days and the possibility of a birthday falling on February 29th.  This step is not essential but it keeps the number of special cases to consider down to a minimum.

Ask the average person and they will tell you that you need about 182 people in the room to get a 50-50 shot (assuming that the average person can actually divide 365 in half and properly round).  Whether it is a result of nature or nurture, this ‘intuitive’ and ‘obvious’ answer is grossly wrong.

The easiest way to compute the probability is to compute the much easier probability that none of the n persons have the same birthday and then to subtract this number from 1 to get the probability that at least one pair share a birthdate in common.

Suppose that there are 3 people in the room; then there are 365 days to assign to person 1’s birthday, 364 days to assign to person 2’s birthday, and 363 days to assign to person 3’s birthday.  Each of these numbers is then divided by the total number of days, and the resulting ratios are multiplied together, to get the probability that no two birthdays coincide.  The value of this number is

 \tilde P = \frac{365}{365} \frac{364}{365} \frac{363}{365} \; .

The probability that in a group of 3 persons at least one birthday is held in common is

 P = 1 - \tilde P = 0.0082 \; .

This approach, which doesn’t come naturally to most of us, is at least comforting in that common sense tells us that when there are 366 or more people in a room then at least one pair must share a birthday.  The real assault on common sense begins when we generalize the analysis to an arbitrary number of people and graph the result.

The general formula is

 P_n = 1 - \frac{365}{365} \frac{364}{365} \cdots \frac{365-n+1}{365} \; .

When graphed, the unexpected appears:  only 23 people are needed to get a probability just over 50%.
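The formula is easy to evaluate directly; here is a small sketch that does so (the function name is mine).

import numpy as np

def prob_shared_birthday(n):
    # P_n = 1 - (365/365)(364/365)...((365 - n + 1)/365)
    return 1.0 - np.prod((365.0 - np.arange(n)) / 365.0)

print(prob_shared_birthday(22))   # ~0.476, still under 50%
print(prob_shared_birthday(23))   # ~0.507, just over 50%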

By the time the number of people reaches about 60, the probability of a match is nearly 100%.  This result challenges our expectations and causes us genuine surprise.  How will an AI that passes the more conventional aspects of a Turing test react?

The Monty Hall Puzzle

Even more interesting and non-intuitive is the Monty Hall or Let’s Make a Deal Puzzle.  Based on the final segment of the game show Let’s Make a Deal, contestants are offered a choice among three doors.  Behind two of them are so-called booby prizes, usually a farm animal or some other unwanted thing.  Behind one of the doors is usually a car.  Monty Hall, the host of the show, asks the contestant to pick one door.  Next, he opens one of the other two doors and reveals one of the booby prizes (e.g. a goat).

Monty’s final step is to offer the one unopened door in trade to the contestant.  The question then is: should the contestant accept the offer and switch doors or should he stay with his original pick?  Of course, there is no way to guarantee the correct choice, but the contestant has a definite statistical advantage if he switches.  The probability that the car is behind the door he chose is 1/3 while the probability it is behind the other unopened door is 2/3.  Most people see two doors and assume that the odds are 50-50.  That’s human intuition – even though it is wrong.  And Monty Hall, who I believe must have been a confidence man before going legit, played on the contestants’ greed and excitement by offering cash if they stayed with their first choice.  Usually, he kept them from getting the car, which I suppose was his aim.
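The 1/3 versus 2/3 split is easy to check by simulation; the sketch below is mine and assumes, for simplicity, that the host always opens a losing door the contestant did not pick.

import random

def lets_make_a_deal(trials=100000, switch=True):
    wins = 0
    for _ in range(trials):
        car  = random.randrange(3)                                     # door hiding the car
        pick = random.randrange(3)                                     # contestant's first choice
        opened = next(d for d in range(3) if d != pick and d != car)   # host reveals a booby prize
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / float(trials)

print(lets_make_a_deal(switch=True))    # ~0.667
print(lets_make_a_deal(switch=False))   # ~0.333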

Now imagine what would happen when an AI went onto Let’s Make a Deal.  Certainly, the AI should be able to understand natural language.  But how should it react to the door choices, to Monty Hall’s con-man techniques?  If the AI is going to fool the humans around it, it'd better be conned alongside the rest of us.