Latest Posts

Interfaces and Thoughts

This month’s column is, on the face of it, a whimsical undertaking.  I’ll be examining the man-machine interface on two different cars – a 2004 Saturn Ion and a 2000 Infiniti I30.  Yes, you are correct!  I own some old cars and, just in case your thoughts wander into the notion that at least there is a luxury car in the mix, please note that the Infiniti was purchased used, with over 100,000 miles on it.  Anyway, the central theme of this post is not what kind of cars I own and drive but rather what the design of a car tells us about its designer.

The idea that the form and function of a work reveals a portion of the mind of the designer is old, very old, and is one of the central arguments from the medieval synthesis for the existence of God.  I’m not claiming any deep metaphysical insights from the discussion here but I do think there are interesting reflections on the human mind in this very tongue-in-cheek analysis.

To start, let me say that the Infiniti is the better-designed car from the point of view of acceleration, handling, and space.  It has a V6 compared to the Ion’s inline 4-cylinder.  It has leather seats versus the Saturn’s cloth ones.  And the list could go on.  Nonetheless, the man-machine interface in the Saturn is hands down better.

Let’s start with the power locks on the doors.  Here is a picture of the locking control for the I30:

and the corresponding look at the Ion:

While they are very close in design they don’t work the same way at all – leading to a great deal of confusion whenever I switch.  The icon of the black key on the white background means unlock on the Saturn and lock on the Infiniti.  And that difference speaks to how the people who designed these controls think.

I am quite aware that there are cultural differences between the Japanese and US mind, but these shouldn’t have come into play.  That isn’t to say that such differences aren’t important or valid, nor that the resulting man-machine interface isn’t influenced by them, but rather that on this point there should have been universal agreement across both designs.

The primary purpose of any key, regardless of the time and place in which it is used, is to unlock a locked door.  In certain, older circumstances keys are also used to lock the door again, but this lesser purpose should not be the one the designers seized upon.  This is because, for reasons of safety and security, a car is something that is generally in a locked state unless being entered or exited.  Once in a car, especially one that has power locks, the notion of the key as the locking agent becomes almost meaningless.  In addition, cars are imported and exported around the world, and international standardizations are commonplace.  Thus, the only possible conclusion is that the Ion gets it right and the Infiniti gets it wrong.  This conclusion also suggests that the Infiniti designers were perhaps focused on the Japanese interpretation and not on how their product would be used in the global market.

Of course, there are those of you who still object that this is simply a matter of convention; that the functional argument is not persuasive.  I must admit that when I first encountered this difference I wasn’t swayed either.  The interface that really draws the difference and pushed my thinking from equivocation to certainty is the windshield wiper control.  Here the problem isn’t embracing a difference between conventions but a matter of internal consistency.  And again, the Saturn is the clear winner.

To be concrete, here is a photo of the interface to the wiper system from the Ion

and the I30

Again, the two designs look very similar – too similar, in fact, to keep me from getting confused.  Both designs sport the traditional iconography of a dashed line (- -) for intermittent wipers, a solid line (—) for continuous low, and a heavy solid line for continuous high.  A careful examination reveals that the direction in which one must articulate the control is different; on the Ion, the wipers go from intermittent to low to high by pushing the lever upwards, while on the I30 the same sequence results from pushing it down.  Again, the difference seems to be one of convention, but we haven’t yet discussed the intermittency setting, and it is here that the I30 shows itself to be inconsistent.

Before getting to the inconsistency, there is one more matter of convention that differs between the two.  Both controls sport a dial bearing graduated lines (the white lines on the right that are wider at the top and taper to almost nothing at the bottom) that set the speed of the intermittent wipers.  For the I30, the larger the line the larger the time gap between successive swipes of the wipers.  For the Ion, the larger the line the smaller the time gap between successive swipes.  So their conventions are dual to each other, with the I30 working in terms of time and the Ion in terms of frequency.

The inconsistency rears its head when the lever and dial are used in tandem (i.e. when using intermittent wipers).  In the I30, higher frequency is obtained by pushing the lever down but by turning the dial up.  On the Ion, up means the same thing for both lever and dial.  And that, in a nutshell, is why the man-machine interface of the Ion is better than that of the I30, despite the I30 being a better car overall.

So, what do these design choices reveal about the minds of the designers?  In the case of the Ion, the design seems to show a single guiding mind or principle.  Whether or not you prefer the design choices of the Ion over the I30, there is no arguing that the Ion design is self-consistent.  White icons always mean changing the state to something more active from the previously passive state: for the doors, from locked (passive) to unlocked (active); for the wipers, from off (passive) to low frequency (active) to high frequency (more active).  In the case of the I30, the design is a hodge-podge of concepts with differing motifs and little consistency.  This suggests that no single guiding principle knitted together the design of the man-machine interface.  Part of this is, no doubt, cultural, but part of it seems to be indicative of a company that takes pride in the subsystems but fails to put as much emphasis on knitting those systems together into a seamless whole.

K-Means++ Data Clustering: Improved Seeding

This month’s column is a follow-on to last month’s look at K-means data clustering.  Again, I’ll be looking at James McCaffrey’s Test Run article K-Means++ Data Clustering, published in MSDN in August 2015.

In the last post, I confined my analysis to the K-means data clustering algorithm using McCaffrey’s hand-picked data points, which showed that the heights and weights of a sample of some population aggregated into 3 distinct clusters.  Once the data were normalized, the human eye easily picked out these clusters but the machine, not so much.  The purpose of the K-means clustering was to provide the machine with a mechanism by which it might mimic, at least in the most primitive of senses, the ability of the human eye to correlate a 2-dimensional image.

Of course, McCaffrey’s sample was designed specifically to illustrate the technique, with no gray areas that might confuse the person, let alone the computer.  Nonetheless, the algorithm failed to find the clusters completely around 20-30% of the time; an estimate of my own based on a semi-empirical method of restarting the algorithm a number of times and simply counting the number of times that the clustering gave a color-coded result that looked like the following figure (each color denoting a cluster).

The problem with the algorithm is one of seeding the process.   For each of the k clusters chosen as a guess by the user, the algorithm needs to start with a guess as to where the centers of those clusters lie.  The K-means algorithm seeds the process by selecting samples from the distribution at random to give the initial guess as to the cluster centers.  If this random guess selects two nearby points as two distinct centers then a situation like the one shown above arises.

The ‘++’ in the K-means++ algorithm denotes an improvement to the seeding process to avoid this undesirable outcome.  The technique is called roulette wheel selection and it works as follows.
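The essence of roulette wheel selection can be sketched in a few lines of numpy.  This is my own sketch, not McCaffrey’s code (the function name kmeans_pp_seed is mine): the first center is a random data point, and each subsequent center is drawn with probability proportional to its squared distance from the nearest center already chosen, so that nearby duplicates are strongly disfavored.

```python
import numpy as np

def kmeans_pp_seed(data, k, seed=None):
    """Roulette-wheel (k-means++) seeding sketch.

    The first center is a uniformly random data point; each subsequent
    center is a data point drawn with probability proportional to its
    squared distance from the nearest center chosen so far.
    """
    rng = np.random.default_rng(seed)
    centers = [data[rng.integers(len(data))]]
    for _ in range(k - 1):
        # squared distance from every point to its nearest chosen center
        d2 = np.min([np.sum((data - c) ** 2, axis=1) for c in centers], axis=0)
        # spin the 'roulette wheel': selection probability proportional to d2
        probs = d2 / d2.sum()
        centers.append(data[rng.choice(len(data), p=probs)])
    return np.array(centers)
```

Note that an already-chosen point has zero distance to itself and therefore zero probability of being drawn again, which is exactly the property that prevents two nearby points from being selected as distinct centers.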

K-Means Data Clustering

This month’s column is inspired by James McCaffrey’s Test Run article K-Means++ Data Clustering, published in MSDN in August 2015.   McCaffrey’s piece is really two articles rolled into one and is interesting not only for the data clustering algorithm that it presents but also for the fascinating light it sheds on the human facility to visually analyze data (not that McCaffrey comments on this point – it’s just there for the careful observer to note).  But before getting to the more philosophical aspects, it’s necessary to discuss what K-means data clustering is all about.

To explain what data clustering entails, I will use the very same data that McCaffrey presents in his article as the test set for mine.  He gives a set of ordered pairs of height (in inches) and weight (in pounds) for 20 people in some sampled population.  The table listing these points

Height (in)  Weight (lb)
65.0 220.0
73.0 160.0
59.0 110.0
61.0 120.0
75.0 150.0
67.0 240.0
68.0 230.0
70.0 220.0
62.0 130.0
66.0 210.0
77.0 190.0
75.0 180.0
74.0 170.0
70.0 210.0
61.0 110.0
58.0 100.0
66.0 230.0
59.0 120.0
68.0 210.0
61.0 130.0

doesn’t reveal anything special at a glance; nor should it be expected that it would.  As every good data scientist knows, the best (human) approach is to plot the data so that the eye can pick out patterns that the mind may fail to perceive in the tabulated points.  The resulting plot

reveals that there are three distinct clusters in the data.  McCaffrey doesn’t show a plot of these data, but it seems that he omitted it because he wanted to focus on how to teach the computer to find these three groups using K-means clustering without the benefit of eyes and a brain, which it obviously doesn’t have.

The principle behind K-means clustering is quite simple:  each group of points is called a cluster if all the points in it are closer to the center of that group than they are to the centers of the other groups.

The implementation is a bit more difficult and tricky.  Distance plays a key role in this algorithm but, whereas the human eye can take in the entire plot at one go and distinguish the center of each cluster intuitively, the computer must work through the points one at a time.  This leads to two problems:  1) how to numerically calculate distances between points and 2) how to find the centers of the clusters from which these distances are measured.  Let’s deal with each of these.

The fact that the units for the x-axis are different from those on the y-axis can lead to some problems identifying the clusters – for both the human and the machine.  Consider the following plot of the same data but with the x- and y-axes given the same extent from 50 to 200 in their respective units:

It isn’t at all clear that there are three distinct clusters as opposed to two.  A better approach is to normalize the data in some fashion.  I prefer using the Z-score where the data are normalized according to

 Z_x = \frac{x - \langle x \rangle}{\sigma_x} \; ,

where \langle x \rangle is the mean over all the values and \sigma_x is the corresponding standard deviation.  Applying the normalization to the height and the weight gives

Once the data are normalized, the distance between them is determined by the usual Euclidean norm.
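As a minimal sketch (my own, applied column-wise to a height/weight array), the Z-score normalization is a one-liner in numpy:

```python
import numpy as np

def z_normalize(data):
    # column-wise Z-score: subtract each column's mean and divide by its
    # standard deviation so height and weight become unitless and comparable
    return (data - data.mean(axis=0)) / data.std(axis=0)
```

After this step, np.linalg.norm(a - b) between two normalized rows gives the Euclidean distance the algorithm uses.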

The second step is to determine the centers of each cluster, which is essentially a chicken-and-the-egg problem.  On one hand, one needs the centers to determine which point belongs to which cluster.  On the other hand, one needs to have the points clustered in order to find their center.  The K-means approach is to guess the centers and then to iterate to find a self-consistent solution.

I implemented the K-means clustering code in a Jupyter notebook using Python 2.7 and numpy.  I defined 2 helper functions.

The first one determines the cluster into which each point is classified based on the current guess of the centers.  It loops over all points and compares each point’s distance to the k centers.  The point belongs to the closest center, and the id number of that center is recorded in a list, which is returned.

import numpy as np

def find_cluster(curr_means,data):
    num_pts      = len(data)
    k            = len(curr_means)
    cluster_list = np.zeros(num_pts,dtype=int)
    dist         = np.zeros(k)
    for i in range(num_pts):
        for m in range(k):
            l       = data[i] - curr_means[m]
            dist[m] = np.sqrt(np.dot(l,l))   # Euclidean distance to center m
        cluster_list[i] = np.argmin(dist)
    return cluster_list

The second helper function updates the position of the centers by selecting, from the list of all points, those that belong to the current cluster.  This function depends heavily on the slicing and numerical functions built into numpy.

def update_means(k,cluster_list,curr_data):
    updated_means = np.zeros((k,2))
    for i in range(k):
        clustered_data     = curr_data[np.where(cluster_list == i)]
        updated_means[i,:] = np.array([np.mean(clustered_data[:,0]),np.mean(clustered_data[:,1])])
    return updated_means  

For this implementation, I chose 3 clusters (k = 3) and I seeded the guess for the centers by randomly selecting from the list of the initial points.  Subsequent updates moved the centers off the tabulated points.  The update process was iterated until no changes were observed in the position of the cluster centers.  The final cluster assignments were then used to color code the points based on the cluster to which they belong.
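The iteration described above can be sketched as a self-contained loop.  The two helpers are reproduced here (in slightly condensed form, with the distance calculation written out) so the snippet runs on its own; the k_means wrapper name is mine, and for clarity it takes explicit initial centers rather than drawing a random seed:

```python
import numpy as np

def find_cluster(curr_means, data):
    # assign each point the id of its nearest center
    cluster_list = np.zeros(len(data), dtype=int)
    for i in range(len(data)):
        dist = np.linalg.norm(data[i] - curr_means, axis=1)
        cluster_list[i] = np.argmin(dist)
    return cluster_list

def update_means(k, cluster_list, curr_data):
    # recompute each center as the mean of its member points
    updated_means = np.zeros((k, curr_data.shape[1]))
    for i in range(k):
        updated_means[i, :] = curr_data[cluster_list == i].mean(axis=0)
    return updated_means

def k_means(data, init_means, max_iter=100):
    # alternate assignment and update until the centers stop moving
    means = np.array(init_means, dtype=float)
    for _ in range(max_iter):
        clusters = find_cluster(means, data)
        new_means = update_means(len(means), clusters, data)
        if np.allclose(new_means, means):
            break
        means = new_means
    return clusters, means
```

The convergence test uses np.allclose rather than exact equality, which is the safer habit when comparing floating-point centers.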

The first run of the algorithm was successful with the resulting plot indicating that the process had done a good job of identifying the clusters similar to the way the human eye would.

Subsequent runs show that about 70-80% of the time, the algorithm correctly identifies the clusters.  The other 20-30% of the time, the algorithm gets ‘stuck’ and misidentifies the points in a way that tends to split the small cluster in the bottom left into two pieces (although not always in the same way).

The ‘++’ nomenclature featured in McCaffrey’s article (K-means++ clustering) refers to a more sophisticated way of selecting the initial cluster centers that minimizes the chances that the algorithm will get confused.  This new type of seeding, and some numerical experiments with a larger data set (in both number of points and dimensionality), will be the subject of next month’s column.

Of Fishbones and Philosophy

On the off chance that you, dear reader, are thinking that there is precious little overlap between the skeletons left over from dead fish and the high art of philosophy, let me set your mind at rest.  You are correct; there isn’t much.  Nonetheless, this installment isn’t a shortened quip-of-a-column designed to note this simple observation and then to make a quick, albeit graceful exit.  In point of fact, the fishbones that I am referring to have a great deal to do with philosophy, in general, and epistemology, specifically.

For those of you who aren’t aware, the fishbone or Ishikawa diagram (after Kaoru Ishikawa) is a way of cataloging the possible, specific causes of an observed event as a way of inferring which one is the most likely.  Its primary application is to those events where the effect is clearly and obviously identifiable but where the trigger of that event is unknown or, at least, unobservable.  One usually finds these diagrams applied in industrial or technological settings where a fault in a complex system rears its ugly head but the failure mode is totally or partially unknown.

Now it is one of those trendy nuggets of common knowledge that philosophy is one of those subjects designed for the technically-challenged to while away their time considering how many angels can dance on the head of a pin or whether to push the fat man onto the tracks in order to save the lives of the many passengers on the train.  No practical applications can be found in philosophy.  It has nothing important to offer workplaces where holes are drilled, sheet metal bent, circuits soldered, products built, and so on.

The fishbone diagram speaks otherwise – it deals with what is real and practical and with what we know and how we know it in a practical setting.  It marries concepts of ontology and, more importantly, epistemology with the seemingly humdrum worlds of quality assurance and manufacturing.

To appreciate exactly how this odd marriage is effected, let’s first start with a distinction that is made in fishbone analysis between the proximate cause and the root cause.  A practical example will serve much better here than any amount of abstract generalization.

Suppose that as we are strolling through ancient Athens, we stumble upon a dead body.  We recognize that it is our sometime companion by the name of Socrates.  Having been fond of that abrasive gadfly and possessing a slice of curiosity consistent with being an ancient Greek, we start trying to determine just what killed Socrates.  One of us, who works in the new Athenian pottery plant where the emerging science of quality management is practiced, recommends making a fishbone diagram to help organize our investigation.

Inside the head of the fish we place the key observation that Socrates is dead.  Off the central spine, we string possible causes of death, grouped into categories that make sense to us.  After a lot of discussion, we agree on these four:  Divine Intervention, Natural Causes, Accidental Death, and Foul Play.   Under each of these broad headings we add specific instances.  For example, some of us have heard rumors of the dead man’s impiety, so perhaps Zeus has struck him down with a thunderbolt.  Others suggest that being hit with a discus was the cause of death, just like what happened to uncle Telemachus at the last Olympic Games.  We continue on until we have our finished fishbone.


This version of the fishbone diagram aims at helping us determine the proximate cause.  We want to know what actually killed him without, at this stage, trying to figure out why (although the question of ‘why’ helped us in populating the list).

We then, in good logical fashion, start looking for observations that either strengthen or weaken each of the bones in our diagram.  We find no evidence of charring or submergence in water, so we argue that Divine Intervention is highly unlikely.  There is no blood or signs of blunt force trauma, so scratch all the possibilities under Accidental Death.  One of us notes that his belongings are all present and that his face is peaceful and his body shows no subtle signs of violence like what might be attributed to strangulation or smothering, so we think murder very unlikely.  Finally, one of us detects a faint whiff of a distinct odor and concludes that Socrates has died by drinking hemlock.

In fishbone analysis, hemlock poisoning is the proximate cause – the direct, previous link in the chain of causation that led to his death.  Note that we haven’t actually seen Socrates consume the lethal cocktail; we are simply inferring it based on the effect (he’s dead) and the smell (likeliest cause).  The next step is to determine the root cause – the reason or motivation for his consumption of the hemlock.

We find, after collecting a different type of observation, that he was executed by the Polis of Athens for impiety and for corrupting the morals of the youth of our city-state.  We generally fill out this step by interviewing people and collecting human impressions rather than physical evidence.  At what point we decide that we’ve hit the root is up to us.  We can stop with the death sentence passed down by the Athenian court or we can look to the politics that led to that sentence.  We can stop with the politics or dig further into the social and demographic forces that left Athenian democracy so disposed to dispatch the father of Western thought.  We can trace events back to Hippias the tyrant, or back to Homer, or wherever.

This sense of arbitrariness isn’t confined solely to where we cut off the determination of the root cause.  We also limited our universe of explanations in determining the proximate cause.  We can’t consider everything – how about dryads, sylphs, and satyrs?

In other words, all of us start our fishbone analysis with a Bayesian a priori expectation of likeliest causes and we apply, whether consciously or not, Occam’s razor to simplify.  Let’s reflect on this point a bit more.  Doing so brings into sharper focus the distinction between what we think we know, what we actually know, and what we don’t know; between the universe of knowable, unknown, and unknowable.  Ultimately, what we are dealing with is deep questions of epistemology masquerading as crime scene investigation.

The situation is even more interesting when one has an observable effect with no discernable cause.  Is the cause simply unknown or is it unknowable?  And how do we know in which category it goes without knowing it in the first place?

This epistemological division is even further muddied when we deal with indirect observations provided by tools (usually computers).  Consider the case where a remote machine (perhaps in orbit) communicates with another machine, which unpacks the electronic signals it receives.  If a problem is observed (a part is reported dead, for example), what does this actually mean?  Where does the fault lie?  Is it in the first machine or the second one?  Could the second one be spoofing the fault on the first, by accident or malice (hacking)?  How does one know and where does one start?  And if one is willing to extend the concept of a second machine to include human beings and their senses, then the line between observer and observed gets even more blurred.  Where does the fault lie, with our machines or with ourselves, and how does one know?

I will close on that note of uncertainty and confusion with an aporetic ending in honor of Socrates.  And all of it came from a little fishbone, whose most common practitioners would most likely tell you that they are not interested in anything so impractical as philosophy.

Dumbing AI Down

The concept of the Turing Test as the basic gate that an artificially intelligent system must pass to be judged sufficiently human-like is both pervasive and intriguing.   Dealt with widely in both serious academic circles and in fanciful science fiction avenues, the usual theme is one in which the AI must overcome a set of hurdles to pass the test.

Usually, these hurdles are viewed as a question of evolution - of smartening the AI so that it acts like a human being.  Topics along this line include enabling sophisticated algorithms that recognize levels of evocation - an essential property that allows for understanding humor, getting double entendres, and recognizing sarcasm.  Poetry and evocative imagery are also complications that have been explored off and on.

Far less frequently is the concept of devolution explored.  The idea here is to dumb down the AI so that it seems less like a computer and more like a human being.  It should know how to round numbers grossly, use vague characterizations, use contractions, cut verbal corners, and the like.  One should imagine Commander Data from Star Trek: The Next Generation as the textbook example.

This post deals with an unillumined corner of this latter category.  Specifically, how to make sure an AI can mimic the intuition of a human being, warts and all.  What I am talking about is a designed AI with the same blind spots and foibles as the average human being.  Nothing illustrates this so clearly as the intuition-defying results that come from big numbers.

Humans are not usually good with numbers in general and are notoriously bad with big numbers.  This is such a prevalent problem that there is even a term to describe just how poor the average soul’s understanding of numbers and mathematics is – innumeracy.

Even for those practiced in the art, intuition can fail when big numbers come in the form of probability and statistics.  Two puzzles are famous for challenging the mortal mind:  the Birthday and the Monty Hall Puzzles.  Any AI that wants to blend in had better trip over these two problems with the rest of us or run the risk of being exposed as something other than human.

The Birthday Puzzle

Often called the Birthday Paradox, this puzzle is a significant challenge to the basic intuition that each of us has about the likelihood of coincidences.  As described, the Birthday Puzzle goes something like this.  Suppose that there are n persons in a room, say attending a party.  What is the probability that any two of them have the same birthday?  Stated slightly differently, how many people do you need in a room before the probability is 50% that any two of them share the same birthday?

To be concrete and to keep things as simple as possible, let’s agree to ignore leap days and the possibility of a birthday falling on February 29th.  This step is not essential but it keeps the number of special cases to consider down to a minimum.

Ask the average person and they will tell you that you need about 182 people in the room to get a 50-50 shot (assuming that the average person can actually divide 365 in half and properly round).  Whether it is a result of nature or nurture, this ‘intuitive’ and ‘obvious’ answer is grossly wrong.

The easiest way to compute the probability is to compute the much easier probability that none of the n persons share a birthday and then to subtract this number from 1 to get the probability that at least one pair has a birthdate in common.

Suppose that there are 3 people in the room; then there are 365 days to assign person 1’s birthday, 364 days to assign to person 2’s birthday, and 363 days to assign to person 3’s birthday.  Each of these numbers is then divided by the total number of days to get the probability.  The value of this number is

 \tilde P = \frac{365}{365} \frac{364}{365} \frac{363}{365} \; .

The probability that in a group of 3 persons at least one birthday is held in common is

 P = 1 - \tilde P = 0.0082 \; .

This approach, which doesn’t come naturally to most of us, is at least comforting in that common sense tells us that when there are 366 or more people in a room, at least one pair must share a birthday.  The real assault on common sense begins when we generalize the analysis to an arbitrary number of people and graph the result.

The general formula is

 P_n = 1 - \frac{365}{365} \frac{364}{365} \cdots \frac{365-n+1}{365} \; .

When graphed the unexpected appears:  only 23 people are needed to get a probability just over 50%. 

By the time the number of people reaches about 60, the probability of a match is nearly 100%.  This result challenges our expectations and causes us genuine surprise.  How will an AI that passes the more conventional aspects of a Turing test react?
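The general formula is easy to check numerically.  Here is a small sketch of my own that reproduces the 23-person crossover (leap days ignored, as agreed above):

```python
def birthday_match_prob(n, days=365):
    # probability that at least two of n people share a birthday:
    # 1 minus the probability that all n birthdays are distinct
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (days - i) / float(days)
    return 1.0 - p_no_match
```

Evaluating birthday_match_prob(3) recovers the 0.0082 computed above, and the probability first exceeds 50% at n = 23.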

The Monty Hall Puzzle

Even more interesting and non-intuitive is the Monty Hall or Let’s Make a Deal Puzzle.  Based on the final segment of the game show Let’s Make a Deal, contestants are offered a choice between three doors.  Behind two of them are so-called booby prizes, usually a farm animal or some other unwanted thing.  Behind the third door is usually a car.  Monty Hall, the host of the show, asks the contestant to pick one door.  Next, he opens one of the other two doors and reveals one of the booby prizes (e.g. the goat).

Monty’s final step is to offer the one remaining unopened door in trade to the contestant.  The question then is: should the contestant accept the offer and switch doors or should he stay with his original pick?  Of course, there is no way to guarantee the correct choice, but the contestant has a definite statistical advantage if he switches.  The probability that the car is behind the door he chose is 1/3 while the probability that it is behind the other unopened door is 2/3.  Most people see two doors and assume that the odds are 50-50.  That’s human intuition – even though it is wrong.  And Monty Hall, who I believe must have been a confidence man before going legit, played on the contestants’ greed and excitement by offering cash if they stayed with their first choice.  Usually, he kept them from getting the car, which I suppose was his aim.
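For the skeptical, a quick Monte Carlo sketch (my own illustration, not anything from the show) confirms the 1/3-versus-2/3 split:

```python
import random

def play(switch, rng):
    # one round of the three-door game
    car = rng.randrange(3)
    pick = rng.randrange(3)
    # the host opens a door that hides a booby prize and isn't the pick
    opened = rng.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        # trade for the one door that is neither the pick nor the opened door
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100000, seed=2015):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / float(trials)
```

Over many trials, switching wins about two-thirds of the time and staying about one-third, just as the argument above predicts.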

Now imagine what would happen when an AI went onto Let’s Make a Deal.  Certainly, the AI should be able to understand natural language.  But how should it react to the door choices, to Monty Hall’s con-man techniques?  If the AI is going to fool the humans around it, it'd better be conned alongside the rest of us.

Knowing When to Stop

Imagine that it's Christmas Eve and, due to some poor planning on your part, you find yourself short of a few gifts – gifts for key people in your life.  You reckon that you have no choice but to go out to the mall and fight all the other last minute shoppers to find those special trinkets that will bring smiles to all those faces you would rather not look at when they are frowning.  You know parking will be a delicate issue with few choices available at any given time and, as you enter the lot, you happen to see a space about one foot shy of a football-field’s distance to the mall entrance.  Should you take it or is there a better one closer?

If you take the space, you are in for a long walk to and fro as well as a waste of your time - and maybe, just maybe, the gifts will be gone by the time you get there.  If you pass by the space you run the risk of not finding a closer space and, most likely, this space will not be there when you circle back.

In a nutshell, this type of problem is best described under the heading ‘knowing when it is time to settle’.  It has broad applications in wide ranging fields; any discipline where decision making is done within the context of uncertainty mixed with a now-or-never flavor falls under this heading.

Within the computing and mathematical communities, this scenario is dubbed The Secretary Problem and has been widely studied.  The article Knowing When to Stop by Theodore Hill, published in American Scientist, presents a nice introduction to and discussion of the problem along with many of its real-world applications.  The aim of this month’s column is to look at some realizations of the problem within a computing context, and to look at some variations that lead to interesting deviations from the common wisdom.  The code and approach presented here are strongly influenced by the article The Secretary Problem by James McCaffrey in the Test Run column of MSDN Magazine.  All of the code presented and all of the results were produced in a Jupyter notebook using Python 2.7 and the standard suite of numpy and matplotlib.

The basic notion of the Secretary Problem is that a company is hiring for the position of secretary and has received a pool of applicants.  Since it is expensive to interview and vet applicants, and there is a lost-opportunity cost for each day the position goes unfilled, the company would like to fill the position as soon as possible.  On the other hand, the company doesn’t want to settle for a poor candidate if a more suitable one could be found with a bit more searching.  And, overall, what expectations should the company have for the qualifications of the secretary?  Perhaps the market is bad all over.

Within a fairly stringent set of assumptions, there is a way to maximize the probability of selecting the best choice by using the 1/e stopping rule.  To illustrate the method, imagine that 10 applicants seek the position.  Divide the applicant pool up into a testing pool and a selection pool, where the size of the testing pool is determined (to within some rounding or truncation scheme) by dividing the total number of applicants by e, the base of the natural logarithms. Using truncation, the testing pool has 3 members and the selection pool has 7.

Secretary - Pool

The testing pool is interviewed and the applicants assessed and scored.  This sampling of the applicant pool serves to survey the entire pool.  The highest score from the testing pool sets a threshold that must be met or exceeded (hopefully) by an applicant within the selection pool.  The first applicant from the selection pool to meet or exceed the threshold is selected; this may or may not be the best overall candidate. Following this approach, and using the additional assumption that each applicant is scored uniquely, the probability of getting the best applicant is 36.8% (interestingly, this percentage is also 1/e).

This decision-making framework has three possible outcomes:  it can find the best applicant, it can settle on a sub-optimal applicant, or it can fail to find any applicant that fits the bill.  This latter case occurs when the best applicant lands in the testing pool, so that no applicant in the selection pool can match or exceed the threshold.

To test the 1/e rule, I developed code in Python within the Jupyter notebook framework.  The key function is the one that sets up the initial applicant pool.  This function

import numpy as np

def generate_applicants(N, flag='uniform'):
    # 'integer': integer scores drawn uniformly from a range 10 times
    # larger than the number of applicants
    if flag == 'integer':
        pool = []
        for i in range(0, N):
            pool.append(np.random.randint(0, 10*N))
        return np.array(pool)
    # 'normal': scores drawn from the normal distribution, scaled onto [0, 10]
    if flag == 'normal':
        temp          = np.abs(np.random.randn(N))
        return np.floor(temp/np.max(temp)*100.0)/10.0
    # 'uniform': uniformly distributed scores quantized to one decimal place
    if flag == 'uniform':
        return np.floor(np.random.rand(N)*100.0)/10.0
    # fall through: an unrecognized flag defaults to the uniform method
    print "Didn't understand your specification - using uniform distribution"
    return np.floor(np.random.rand(N)*100.0)/10.0

sets the scores of the applicants in one of three ways.  The first method, called ‘integer’, assigns an integer to each applicant based on a uniform probability distribution.  The selected range is chosen to be 10 times larger than the number of applicants, effectively guaranteeing that no two applicants have the same score.  The second, called ‘normal’, assigns a score from the normal distribution.  This approach also effectively guarantees that no two applicants have the same score.  The occasions where both methods violate the assumption of uniqueness form a very small subset of the whole.  The third method, called ‘uniform’, distributes scores uniformly but ‘quantizes’ the score to a discrete set.  This last method is used to test the importance of the assumption of a unique score for each applicant.

A specific applicant pool and the application of the 1/e rule can be regarded as an individual Monte Carlo trial.  Each trial is repeated a large number of times to assemble the statistics for analysis.  The statistics comprise the number of times the best applicant is found, the number of times no suitable applicant is found, and the number of times a sub-optimal applicant is found and how far from the optimum said applicant is.  This last statistic is called the settle value, since this is what the company has had to settle for.

The following figure shows the percentage of times that each method finds an optimal candidate from the selection pool by using the 1/e stopping rule.

Secretary - Total Success

Note that for the two methods where duplication is nearly impossible (integer and normal), the percentage of total success remains, to within Monte Carlo error, at the theoretically derived value of about 36.8%.  In contrast, the uniform method, with its quantized scoring system, shoots upward to a total success rate of 100%.  This behavior arises because a quantized scoring system admits only a discrete set of values any applicant can achieve.  Once the number of applicants gets large enough, the testing pool perfectly characterizes the whole.   And while the number of applicants needed to achieve this higher percentage is impractical for finding a secretary (who really wants 640 applicants interviewing for one position?), the application to other problems is obvious.  There is really no reason that a decision process should always hinge on a difference between two choices of less than a fraction of the overall score.  This fact also explains why businesses typically ‘look’ at the market and pay careful attention to who is hiring whom.

For completeness, the following figures show the analogous behavior for the partial success percentage

Secretary - Partial Success

and the total failure scenarios

Secretary - Failure

An interesting corollary question is, in the cases of partial success, how far short of optimal the decision process fell in settling on a sub-optimal selection.  The following figures show histograms, for the normal and uniform cases respectively, of pools with 10, 80, and 640 applicants in those cases where the decision process had to settle for a sub-optimal choice.  As expected, the shortfall from the maximum shrinks as the testing pool size increases but, even with 640 applicants, the normal process has a significant probability of falling short by 20% or more.

Secretary - Settle Normal

In contrast, the distribution for the uniform scoring quickly collapses, so that the amount that the settled-upon candidate falls from the optimum is essentially within 5% even with a moderately sized applicant pool.  Again, this behavior is due to the quantized scoring, which more accurately reflects real world scenarios.

Secretary - Settle Quantized

At this point, there are two observations worth making in brief.  First, the core assumption of the original problem, that all applicants can be assigned a unique score, is worth throwing away.  Even if its adoption was crucial in deriving the 1/e stopping rule, real world applications simply do not admit a clear, unambiguous way to assign unique scores.  Second, it is, perhaps, astonishing how much richness is hidden in something so mundane as hiring a qualified candidate. Of course, this is to be expected, since good help is hard to find.

Aristotle on Whiskey

It has been some time since this column explicitly examined the great Philosopher or explicitly cited his philosophy.  And while his approach to thinking and reflecting on various problems has never been far from the matters usually discussed here, I’ve not actually invoked his name for many columns.  So, it may seem to be a bit of a surprise to start the new year by mentioning Aristotle and whiskey together in this month’s title.  To some it may even be viewed as an unforgivable irreverence to one of the world’s greatest thinkers.  But, as I hope to show, there is nothing irreverent or surprising in linking Aristotle to alcoholic spirits, beyond the usual association that many have about the ancient Greeks – an expectation, no doubt, largely set by Plato’s Symposium.  At issue is the Aristotelian concept of virtue, the sloppy practice of equivocation (double-speak) in logical arguments, and a somewhat famous speech about whiskey made by Noah ‘Soggy’ Sweat Jr.

In 1952, a Mississippi lawmaker by the name of Noah ‘Soggy’ Sweat Jr. was asked about his position regarding the state’s continued prohibition on selling alcoholic beverages to its citizens.  Soggy’s speech, which has since become immortalized for its colorful language and terseness, reads as follows:

My friends, I had not intended to discuss this controversial subject at this particular time. However, I want you to know that I do not shun controversy. On the contrary, I will take a stand on any issue at any time, regardless of how fraught with controversy it might be. You have asked me how I feel about whiskey. All right, here is how I feel about whiskey:

If when you say whiskey you mean the devil's brew, the poison scourge, the bloody monster, that defiles innocence, dethrones reason, destroys the home, creates misery and poverty, yea, literally takes the bread from the mouths of little children; if you mean the evil drink that topples the Christian man and woman from the pinnacle of righteous, gracious living into the bottomless pit of degradation, and despair, and shame and helplessness, and hopelessness, then certainly I am against it.

But, if when you say whiskey you mean the oil of conversation, the philosophic wine, the ale that is consumed when good fellows get together, that puts a song in their hearts and laughter on their lips, and the warm glow of contentment in their eyes; if you mean Christmas cheer; if you mean the stimulating drink that puts the spring in the old gentleman's step on a frosty, crispy morning; if you mean the drink which enables a man to magnify his joy, and his happiness, and to forget, if only for a little while, life's great tragedies, and heartaches, and sorrows; if you mean that drink, the sale of which pours into our treasuries untold millions of dollars, which are used to provide tender care for our little crippled children, our blind, our deaf, our dumb, our pitiful aged and infirm; to build highways and hospitals and schools, then certainly I am for it.

This is my stand. I will not retreat from it. I will not compromise.

The standard analysis found at Wikipedia or at Bo Bennett’s Logically Fallacious website is that Soggy’s rhetoric is an amusing example of double-speak.  Bennett has the following to say about this speech:

This is an amazing insight to the human mind and the area of rhetoric.  We can see how when both sides of the issue are presented through the same use of emotionally charged words and phrases, the argument is really vacuous and presents very little factual information, nor does it even take a stance on the issue.

On the surface, Bennett’s analysis seems to be spot on; Soggy’s speech suggests double-talk of the highest order uttered, most likely, with that old, rolling, Southern voice best exemplified by Foghorn Leghorn.  But there is another interpretation that is equally valid and should be explored, in the spirit of fairness.

To understand this more charitable interpretation, we need to step back and understand the Aristotelian concept of virtue; a concept discussed by Aristotle in many places, most notably in Book II of the Nicomachean Ethics.

The concept of virtue coincides with the proper balance between an excess and a deficiency of a trait.  In the case of courage or bravery, Aristotle would say that the virtue lies in the proper mix between the two extremes.  On one side, the soldier who possesses too little courage is timid and is incapable of performing his function in battle or even, most probably, of saving his own life.  On the other, the soldier who jumps into danger with no thought whatsoever for his safety or that of his compatriots serves no useful purpose due to his rashness and foolhardiness.

The Aristotelian notion of virtue as the balance between two extremes can be applied to Soggy’s speech as well.  At one extreme is his first meaning of ‘by whiskey’: the overindulgence in alcohol that weakens character, causes lapses in judgement, and dissipates wealth, prosperity, and family cohesion.  This extreme is drunkenness indulged in by the alcoholic and should be avoided.

The other extreme is a bit more difficult to identify precisely because Soggy refers to it obliquely by noting all the advantages that result from its avoidance rather than discussing all the ills that follow by its pursuit.  This extreme, which may be called prudishness or uptightness, is often the province of the teetotaler, who deprives himself of the benefits that follow from the proper use of wine and spirits.  History shows that almost all cultures reserve an honored spot for ‘adult beverages’ because of the good effects they bring to both the body and the soul of its citizens.  In addition, Soggy points out that their production forms a significant sector of the modern economy, resulting in gainful employment and ample tax revenues that are also beneficial to society.

So there are at least two readings of Soggy’s speech: the first looks at it as a crass example of political jibber-jabber, the second credits it as a colorful explanation, in layman’s terms, of the virtue of alcohol.  Personally, I prefer the latter interpretation, as it brings the great philosophical thought of ancient Greece to the everyday political doings of the modern world.

The More Things Change...

The scope of human knowledge has certainly changed over the last 3000 years.  Daily, we manipulate electrons and beam electromagnetic signals to and fro.  Large scale distribution networks move goods between highly specialized production centers.  Information flows from one corner of the globe to another in a matter of seconds.  Clearly, we live in an age of wonder.  But interestingly, while what we have learned has increased over the centuries, the methods of inquiry into obtaining new knowledge really haven’t changed all that much; certainly what we know has changed greatly but not how we learn it.

A case in point is the use of regression or recursion as a tool for understanding the world in a philosophical way.  This approach has found numerous and fruitful applications.

For example, Aristotle argued that Man must have a purpose by using a type of regression argument whose spirit, although not its explicit form, goes something like this.  Consider the bones of the hand; their function is to provide stiffness.  Likewise, the ligaments, tendons, and muscles provide the articulation.  The nerves provide the sense of touch and the flesh provides a means of touching, gripping, and holding as well as a unity and an encapsulation for the other parts.  All of these pieces have a function to perform in the greater existence that is the hand.  Likewise, one can find smaller parts serving limited roles within limbs, organs, and systems of the human body:  the eye serves to see; the nose to smell and breathe; the mouth to chew, taste, drink, eat, breathe, and talk; and so on.  Since each piece contributes, through its function, to the greater function of the thing of which it is a part, isn’t it reasonable to assume that the completed whole, the sum of all these individual parts within parts, also has a function or purpose?  This argument, put forward over 2,300 years ago, is still compelling and persuasive.

Saint Thomas Aquinas, one of the great philosophers of the medieval world, put forward arguments greatly influenced by Aristotle and cast in the form of Aristotle’s arguments.  In the First Part of the Summa Theologica, Aquinas offers five proofs for the existence of God based on the concept of regression.  In outline form they are:

  1. Argument from Motion/Change: changing things depend on their interaction with other things (a plant depends on sunlight which, in turn, depends on ongoing fusion, and so on).  Since no change can start itself, things react when acted on by an external mover.  The only way to avoid an infinite regression of things depending on yet more things (all operating simultaneously) is to assume that there is a prime or unmoved mover.
  2. Argument from Efficient Causes: current effects are brought about by prior causes, which are, in turn, effects of causes one level more removed.  The only way to avoid an infinite regression of things causing other things is to assume that there was a first cause not caused by anything else.
  3. Argument from Possibility and Necessity: beings come and go into and out of existence; they are contingent – having a limited time during which they exist.  Given infinite time in the past, there must have been a time where all things were absent implying nothing could exist now.  Given the current state of existence, the only way to avoid this contradiction is to assume the existence of a non-contingent being.
  4. Argument from Gradation of Being: natural objects are understood and ranked by quantities or qualities; this object is hotter than that one, this thing is better than another (better constructed, better conceived, and so on). Ranking requires a maximum (e.g. hottest object) from which all are measured.  The only way to judge something as better is if there is a best that exists.
  5. Argument from Design: natural objects seem to work towards a goal that they themselves don’t or can’t know and so are directed by a higher intelligence. The only way to avoid an infinite regression of greater intelligences directing lesser ones is to assume that there is a master intelligence that directs all.

In all of these arguments, Aquinas identifies the thing that prevents an infinite regress, that stops the chain of thinking at a well-defined point, as God.

These kinds of logical arguments are not limited to the purely metaphysical realm.  One of the crowning achievements of mathematics is the development of set theory, which depends heavily on the type of arguments put forward above.  The presence of a run-away regress, one that never stops, is a sign of issues within set theory.  This typically, although not exclusively, happens when dealing with infinite collections or sets of things and leads to many paradoxes that must be resolved in order for mathematics to be put on a firm foundation.

Perhaps the most famous example is Russell’s paradox.  The paradox is based on defining two types of sets:

  • Ordinary sets – sets that do not contain themselves
  • Extraordinary sets – sets that do contain themselves

It isn’t at all clear that extraordinary sets exist.  If they do, they can’t be constructed using finite sets or the more familiar infinite sets like the integers or reals.  But assuming that they do exist, one can ask the following question:

Does the set of all ordinary sets contain itself?

To see that this leads to a paradox, first suppose that this set, call it Q, doesn’t contain itself.  It is then an ordinary set by definition.  Since it is ordinary, it should go in a listing of all ordinary sets – that is, it should contain itself.  Thus, we conclude that Q is extraordinary (hence the need to define the term in the first place).

So far so good!

But the fly in the ointment comes when we look carefully at Q being extraordinary.  The membership requirement to be in Q is that the element must be ordinary.  But if Q is contained within Q, it must be ordinary.  And so, we arrive at an infinite loop where one condition implies the other and vice versa.

Oddly enough, even though the trappings are those of present-day set theory, replete with fancy modern symbols, the structure is almost the same as the ancient liar’s paradox.  The main ingredients are self-reference and a requirement to end an infinite regression.

The resolution of these paradoxes is a story for another post.  The point here is that while the physical tools we use to manipulate the world have evolved dramatically over the centuries, the mental tools we use to grapple with logic remain largely unchanged.

Game of Life

Last month’s column explored the surprising complexity that results from the repeated application of the simple set of rules called the Half or Triple-Plus-One (HTPO) process.  Despite the fact that the rules are easy to understand and apply, composing their application iteratively led to rich patterns that defy the ability of the current state of mathematical logic to fully predict or prove.  So it should come as no surprise that similar systems exhibit this ‘big things in small packages’ behavior.

One class of systems where the application of simple local rules leads to complex global behavior is known as cellular automata.  These systems are usually modeled as living on a grid or lattice usually with a rectangular topology, although other connection schemes exist.

The state of the system is specified by a discrete state variable living at the grid point, with the most common state being specified by the integers 0 or 1, corresponding to ‘on’ or ‘off’.  The transition rules for deciding whether a grid point turns from on to off or vice versa are usually based on the local neighborhood of the grid point and are usually specified by relatively simple rules.

As in the HTPO case, the fun begins when the entire system dynamics for extended systems are taken into account.  Even though a specific grid point only interacts directly with its neighbors (the precise definition of what these are differ from system to system) its future evolution is determined by its neighbor’s interactions with their neighbors and so on.  In a real sense, local rules propagate out globally, and this is how the complexity results.

Perhaps the best-known and most popular cellular automaton is Conway’s Game of Life.  This system takes place on a rectangular grid with the state of each cell specified by 0 for off (‘dead’) and 1 for on (‘alive’).  Each cell has 8 neighbors, the closest cells to the north, north-east, east, south-east, south, south-west, west, and north-west.  The rules for how a particular grid cell evolves in time are:

  • If a cell is alive and has fewer than 2 live neighbors, it dies from heart-crushing loneliness and a failure by society-at-large to meet its needs,
  • if a cell is alive and has 2 or 3 neighbors, it flourishes due to a balanced social life where it is integrated properly into society,
  • if a cell is alive and has 4 or more neighbors, it dies from overcrowding, squalor, and disease,
  • and if a cell is dead but there are exactly 3 neighbors, a new baby is born to settle into the empty spot.

Of course, none of the colorful language used above is canonical.  John Horton Conway picked the rules to try to balance the birth and death rates such that there would be no explosive growth and that the outcomes were not trivially predictable from the inputs.  According to the Wikipedia article linked above, Conway’s work was motivated by the desire to simplify von Neumann’s invention of a computational ‘machine’ that could build copies of itself.
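Stripped of the colorful language, the rules collapse to ‘survive on 2 or 3 live neighbors, be born on exactly 3’.  A minimal NumPy sketch of one update step (life_step is an illustrative name, and cells beyond the grid edge are assumed dead):

```python
import numpy as np

def life_step(grid):
    """One generation of the Game of Life on a 2-D array of 0s and 1s,
    with everything beyond the edge treated as dead."""
    H, W = grid.shape
    padded = np.pad(grid, 1, mode='constant')
    # count the 8 neighbors of every cell by summing shifted copies
    neighbors = sum(
        padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # survival with 2 or 3 neighbors; birth on exactly 3
    alive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    born = (grid == 0) & (neighbors == 3)
    return (alive | born).astype(int)

# a blinker: three cells in a row oscillate with period 2
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1
```

Applying life_step twice to the blinker returns the starting pattern, and the block described below maps to itself.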

The introduction of the Game of Life in 1970 sparked an entire industry due to the rich, emergent patterns that result.  To give a taste of the possibilities consider some simple configurations on a 6x6 grid.  Perhaps the simplest, non-trivial configuration is the block, which consists of 4 cells clustered together.


Since each cell in the block has exactly 3 neighbors, the pattern, once established, persists into eternity and the four cells live forever together in a stable colony.   Interestingly, this future-time stability does not imply past-time stability, as this configuration can be reached from a variety of enormously dynamic and turbulent pasts, for example when the penultimate step is a 3-cell ‘L’ configuration.

There are many stable configurations like the block.  Another one that tends to appear in simulations is the beehive, which also lasts from one time to another time unless disturbed.


It is interesting to note that the addition of one extra cell anywhere it counts as a neighbor to the beehive ruins the stability completely.

Dynamic patterns that keep a stable overall form are also present, some of which are periodic and others of which exhibit non-periodic, complex motions.  The simplest examples of periodic behavior are structures that seem to spin or rotate, apparently retaining their shape.  The textbook example is the blinker, which appears to rotate.


A more careful analysis shows that only the central cell persists while the exterior ones die and resurrect every second time step.  The Wikipedia article shows other periodic structures, including the very beautiful pulsar, that exhibit much more complex periodicities.

In the class of moving structures comes the glider, a 5-cell structure that sort-of shuffles its way across the board, cycling through 4 distinct shapes such that its center moves one cell diagonally every 4 steps.


Obviously, a huge amount of complexity results from these simple rules.  But the Game of Life has more structure than the emergence of a bewildering array of patterns.  It turns out all logic gates needed in computing can be constructed using gliders, as seen in this video by Alex Bellos.

The key structure for producing the gliders is known as Gosper’s Glider Gun, which pumps out a signal (a glider) that can be interrupted by gliders from other guns.  The fact that all possible logic gates can be constructed in the Game of Life means that it is Turing-complete.  Thus you can program any algorithm in the Game of Life, including the Game of Life itself.

Amazing what a few simple rules can do!

Collatz Conjecture

Big things come in small packages.  From tiny acorns grow mighty oaks.  Never judge a book by its cover.  These familiar adages try to capture, in a pithy way, the basic idea that simple-looking systems can often hide a surprising amount of complexity.   This basic observation couldn’t be more true than in the case of the Collatz Conjecture.

The Collatz Conjecture is so simple that, on the face of it, it must be easy to prove.  But like other easily stated suppositions in mathematics, the proof, if one exists, must be particularly difficult to construct, since it has eluded mathematicians for more than 80 years.

In a nutshell, the Collatz Conjecture says that a particular process, described just below, when repeatedly applied to any integer always ends up the same way, regardless of the starting value of the integer.  The process, referred to as the Half or Triple-Plus-One (HTPO) process, is as follows:

  • If the integer is even, divide it by 2
  • If the integer is odd, multiply it by 3 and then add 1
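For concreteness, here is a minimal Python sketch of the process together with the stopping-time bookkeeping, counting every term including the starting value and halting the first time 1 is reached (stopping_time is an illustrative name):

```python
def stopping_time(n):
    """Number of terms in the hailstone sequence of n, counting the
    starting value, stopping the first time 1 is reached."""
    count = 1
    while n != 1:
        # the Half or Triple-Plus-One (HTPO) process
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count

# the trajectory 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 has 8 terms
```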

There it is.  It is so simple that it can be implemented in a few lines in just about any language; probably even in COBOL.  And yet, proving this conjecture is so hard that the famous mathematician Paul Erdös is credited with saying

Mathematics may not be ready for such problems.

– Paul Erdös on the Collatz Conjecture

Obviously this column is not going to present a proof but it is going to explore some of the properties of the conjecture – including a few that may not have been seen in the literature.  There are two reasons for doing so.

The first reason is the sheer joy and delight that arises from seeing inexplicable complexity arise out of such simple rules.  Amazingly rich plots result simply from looking at the data from numerical experiments in a variety of different ways.  What at first looks like randomness resolves itself into patterns as the number of integers examined increases.

The second reason is less about mathematics and far more about human reason.  Why a proof is hard to find is a topic in epistemology worth exploring all on its own.  Consider that the Collatz conjecture is far easier to encode in a computer than, say, the solution to an orbital mechanics problem or the motion of a fluid over a fixed object like an airplane wing.  No calculus or linear algebra is required.  Nowhere does one need real or complex numbers.  All the machinery that is needed is learned in elementary school, and yet the proof is much harder than those associated with the ‘more advanced’ topics.  Surely there is a Socratic lesson buried in all of this. But before we explore that topic, let’s look at the Collatz conjecture in detail.

Pick an integer, say n = 3, and apply the HTPO process to it.  Since 3 is odd, the resulting value is 10.  Now use 10 as the next value and apply the HTPO process again.  Since 10 is even, the resulting value is 5.  Continuing in succession leads to the following ‘trajectory’.


Note how the sequence of numbers rises and falls getting up as high as 16 and falling as low as 1.  This is called a hailstone sequence since it is reminiscent of the multiple rises and falls of a hailstone during a thunderstorm.  Also note that once the number 1 is reached the sequence is now trapped in the infinitely-repeating ‘4-2-1’ loop.  It is customary to stop the iterations when 1 is reached for the first time and to declare that the sequence has stopped.  By convention, the number of integers in the sequence (including the starting value) is declared as the stopping time.  Thus the stopping time for a starting value of 3 is 8, the number of unique circles in the figure.

The Collatz Conjecture is then the statement that the number 1 is always reached no matter what the initial value may be.  While the proof of this assertion has not been obtained, huge numbers have been tested (up to 2^60 = 1,152,921,504,606,846,976) and none have failed to reach 1 and settle into the ‘4-2-1’ loop.

Investigators, looking for a proof, have employed a number of tools in an attempt to better understand what makes this conjecture so shy in being characterized with a logical proof.  Many of these tools are visualizations of the stopping times as a function of initial value.

The following figure is one such plot showing the stopping times for the first 100 integers.


It is remarkable that there is no smooth pattern in the results.  Adjacent integers, such as 26 and 27, can have wildly different stopping times, 11 versus 112, respectively, while adjacent pairs can have identical stopping times.  Particularly noteworthy is the fact that the integers 28, 29, and 30 all have a stopping time of 19 despite their rather different trajectories:


The jerky or random character of the stopping length plot for the first 100 integers transitions into something more akin to patterns within patterns when the number of integers surveyed increases to 2000.


There seem to be overlapping curves asymptotically rising and falling, layered one on top of the other with large regions where they interleave.

Different visualizations reveal different structures.  For example, the stopping times for the first 10000 integers, plotted on a semilog plot


reveal a general triangular shape, whereas the same data shown on a full log-log plot shows


that the values tend to cluster rather than moving in an unbounded fashion.

This latter observation opened another line of inquiry centered on how high the hailstone trajectory goes rather than how long it takes to land.  A little additional coding to capture the full trajectory for each of the first 2000 integers reveals that one value, 9232, tends to be hit more often than all the others.
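Tracking the trajectory’s peak instead of its length requires only a slightly different loop (hailstone_peak is an illustrative name):

```python
def hailstone_peak(n):
    """Highest value reached in the hailstone trajectory of n."""
    peak = n
    while n != 1:
        # the Half or Triple-Plus-One (HTPO) process
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        peak = max(peak, n)
    return peak

# e.g. the trajectory of 3 rises no higher than 16,
# while the trajectory of 27 reaches 9232
```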


There is a strong line visible just under 10^4 on the plot, and some simple statistics show that 9232 is the peak value for 16% of the trajectories of the first 100 integers and about 33.8% for the first 2000.  As the integer range increases to 20000, additional horizontal attractors (to coin a term in relation to the Collatz Conjecture) come in, although it is difficult to pick out just how prominent they are due to the busyness of the plot.


It is interesting to see just how low these horizontal attractors extend.

As this column closes, it is worth repeating that all of this structure comes from a repeated application of the HTPO process a finite number of times.  The fact that mathematics can’t say whether the process will stop for arbitrary integers is astonishing and speaks to many of the basic complexities that arise in proof and logic when repeated operations are involved.  It seems that if Socrates were alive and playing with the Collatz Conjecture he might be inclined to point out that the only wisdom is knowing that we don’t know very much.