Latest Posts

Logic and Limits

For those of us who grew up in the seventies and had an interest in things other than sex, drugs, and rock and roll, Star Trek was a mainstay.  The characters, the visuals, and the storylines all inspired a generation of scientists, engineers, and explorers.  And no other character in Star Trek epitomized a bright new future full of exciting technology more than Mr. Spock.  As a Vulcan, his strong embrace of logic has influenced many of us to value reason and clear thinking ever since.

But a curious aspect of Vulcans, in general, and Mr. Spock, in particular, is their illogical devotion to logic.  Although the original show frequently emphasized that logic wasn’t enough (often through the intuition or hunches of Kirk, McCoy, or Scott) and that the human experience transcends logic, the Vulcans are never portrayed examining the basis of logic itself.  As a race, they never seem to engage in meta-logic, the logical look at the underpinnings of logic and an exploration of what it can do well and where it fails.

Clearly this point is not the central concern of the show, nor should it be.  Nonetheless, it is fun to speculate about what Spock might have done had the writers known about the limits of logic.  I’m not speaking of something as sophisticated as the implications of Gödel’s theorem.  Rather, I am talking about the simple and obvious issues associated with logic: the need for reasoning to be based on the finite knowledge of beings, the need for first principles or axioms, and the bootstrapping needed to go from a guess to a hypothesis to a theory with strong evidentiary support.  In short, inductive reasoning.

Induction is the key pillar of scientific advances; it is the method by which all knowledge of the physical world is explored and gathered.  Induction is inherently probabilistic in the way it tries to link cause and effect (think of the number of times Kirk asked Spock for the odds that such-and-such a thing would happen) and has been remarkably successful in helping mankind grope its way to a better understanding of the physical world.  But despite its success, its logical foundations are weak.

A nice discussion of the issues associated with inductive reasoning is found in Chapter 5 of Introduction to Logic, by Harry J. Gensler, and I’ve cited Gensler and his discussions in several posts in the past.

What struck me about the arguments found towards the end of Chapter 5 is the remarkably fragile way that the whole logical structure holds together when examined closely.  In other words, logic examining logic comes up short in two areas – the proper formulation of induction principles and the justification for believing an inductive argument.

The element of probability inherent in induction can lead to some absurd and laughable conclusions.  Gensler offers this statistical syllogism by way of illustration.

60 percent of all Cleveland voters are Democrats
This non-Democrat is a Cleveland voter
This is all we know about the matter
Therefore, it’s 60 percent probable that this non-Democrat is a Democrat

Of course, this is a nonsense argument; but before being quick to dismiss it, consider these three areas where arguments like this are not easy to spot and can do damage.  First is the student (at any age) who is learning a subject for the first time.  Unless the entire curriculum is memorization, somewhere along the line the student will need to infer something based on the knowledge in hand.  This step is usually where beginners make their mistakes – jumping to an unwarranted conclusion precisely because their induction skills (their intuition) haven’t been honed within the broader context.  This leads to frustration and, far too often, abandonment of the subject.  Extending this idea to machine-based inference engines, ones whose intuition can’t be improved through learning (at least not in the way humans learn), makes the need for rigorous laws of inductive inference far more acute.  Even though it makes for engaging science fiction, we don’t want a machine in our lives coming up with an absurd conclusion just because of sloppy rules.  And finally, large components of society are constantly being grifted by arguments of the Cleveland-voter sort.  We see it in politics, advertising, and all the other sophistical interactions surrounding us.  To quote P.T. Barnum, “There’s a sucker born every minute.”  Unfortunately, as Gensler goes to great pains to point out, there doesn’t seem to be a way to make the rules of inductive inference tighter.  Even principles like Occam’s Razor are merely guidelines that must be applied with judgement.  Or, as I like to phrase it, there doesn’t seem to be a way to make inductive logic ‘sucker-proof’.

Even more interesting is the fact that the justification of inductive arguments seems to be self-referential and circular.  Gensler discusses the criticism leveled by David Hume against induction and the five responses that have arisen over the intervening years.  I won’t go over all five in detail but will confine the discussion to the three most interesting ones.

  1. Justification for induction can be done by presuming nature is uniform, meaning that it works in regular patterns. I happen to favor this justification even though Gensler is correct in insisting that it is hopelessly vague.  Curiously, his most interesting observation is that the idea that nature is uniform is based on arguing from prior experience.  This argument is inductive and thus we have induction justifying induction.
  2. Closely related is the idea that inductive methods have been so profoundly successful in every aspect of life that they must be correct. Of course, this is another circular argument in which an inductive syllogism is used to justify the inductive method.
  3. The final response Gensler cites is the most interesting – that we approach the justification of inductive logic in the same way we approach the justification for deductive logic. He singles out modus ponens as the particular part of deductive logic worth examining.  Why does modus ponens work and why should we believe it?  This last question is most germane when considering the truth table for the conditional in the case where the antecedent is false but the consequent is true (A \rightarrow B is true if A is false but B is true; the full table appears just after this list).  In some sense this is an arbitrary convention (although a reasonable one – if I can use a very vague word).  So why should we expect modus ponens to work?  How do we justify it?  He first says that we might try using this syllogism
    If the truth table for modus ponens never gives true premises and a false conclusion, then modus ponens is valid.
    The truth table for modus ponens never gives true premises and a false conclusion.
    Therefore, modus ponens is valid.

    But here we have a problem: we are using modus ponens to justify modus ponens.  The justification, even on the firm ground of deductive logic, leaves something to be desired.
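
For reference, here is the truth table for the material conditional that the argument turns on.  The only row ruled out is a true antecedent paired with a false consequent; the two rows with a false antecedent are counted as true by convention.

 \begin{array}{cc|c} A & B & A \rightarrow B \\ \hline T & T & T \\ T & F & F \\ F & T & T \\ F & F & T \end{array}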

In the end, it seems that we have to simply accept that logic has its limits and that, as a race, we’ve bootstrapped our way to our present level of knowledge.  This holds for inductive and deductive logic, with the former providing the basis for the latter (i.e., experience has taught us that deductive logic axioms are the ones we need to assume to get the whole thing to work).  Gensler concludes with:

Inductive reasoning has been very useful.  Instinctively, we assume that it will continue to be useful.  In our lives, we can’t do without it.  But the intellectual basis for inductive reasoning is shaky.

It would have been interesting to hear Mr. Spock discuss these points, to argue them a bit with the more intuitive Kirk.  But perhaps a serious examination would have undermined the entire purpose that he served on the show.  Perhaps it would have undermined the very reason for Vulcans to be, and maybe that would have been an episode worth seeing.

Logic and Politics

There is a widespread belief about politicians and used car salesmen that both species can often speak with a lot of gusto, using sizeable portions of the English language, without actually saying much of anything at all.

Sometimes there is a certain charm in circumlocution. For example, the legendary speech by Noah 'Soggy' Sweat Jr. certainly can be whittled down to 'it all depends' but at the loss of a great deal of fun and verve.

On other occasions, ambiguity and double-speak can be misleading or even downright deadly. An excellent example of such a situation is the following vivid exchange taken from Isaac Asimov's Foundation.

For those unfamiliar with this recognized science fiction classic, the story is set far in the future of mankind in which a vast galactic empire is beginning its decline. The Foundation is a large colony of scientists and researchers whom the empire has exiled to the galactic fringe in order that they may compile an encyclopedia that catalogs the knowledge of the race in advance of the collapse. In actuality, the empire has been tricked into forcing their exile so that the colony can become the nucleus around which the next empire coalesces.

The out-of-the-way planet that the Foundation calls home sits within a set of smaller star systems that are breaking away from the empire as the latter's influence and dominion recedes. The following excerpt comes from a strategy meeting where the Foundation's leaders are trying to determine how to respond to the ultimatum they've just received from Anacreon, the largest of the breakaway states, in light of recent diplomatic visits of the delegations from Anacreon and the Galactic Empire.

The exchange starts with Salvor Hardin, the Foundation's politically savvy mayor, trying to convince the Board of Trustees, who oversee the completion of the encyclopedia, just how precarious their situation is. Hardin's position is that the Board's appeal to the Empire for protection was the cause of the threat from Anacreon.

Said Yate Fulham: "And just how do you arrive at that remarkable conclusion, Mr. Mayor?"

"In a rather simple way. It merely required the use of that much-neglected commodity – common sense. You see, there is a branch of human knowledge known as symbolic logic, which can be used to prune away all sorts of clogging deadwood that clutters up human language."

"What about it?" said Fulham.

"I applied it. Among other things, I applied it to this document here. I didn't really need to for myself because I knew what it was all about, but I think I can explain it more easily to five physical scientists by symbols rather than by words."

...

"The message from Anacreon was a simple problem, naturally, for the men who wrote it were men of action rather than men of words. It boils down easily and straightforwardly to the unqualified statement,...,which in words, roughly translated, is, 'You give us what we want in a week, or we take it by force.'"

"All right." Hardin replaced the sheets. "Before you now you see a copy of the treaty between the Empire and Anacreon – a treaty, incidentally, which is signed on the Emperor's behalf by the same Lord Dorwin who was here last week – and with it a symbolic analysis."

"As you see, gentlemen, something like ninety percent of the treaty boiled right out of the analysis as being meaningless, and what we end up with can be described in the following interesting manner:

"Obligations of Anacreon to the Empire: None!

"Powers of the Empire over Anacreon: None!"

Later on, the group discusses a similar analysis of the visits from the empire's representative, Lord Dorwin:

"You know, that's the most interesting part of the whole business. I'll admit I had thought his Lordship a most consummate donkey when I first met him – but it turned out that he was actually an accomplished diplomat and a most clever man. I took the liberty of recording all his statements."

"... The analysis was the most difficult of the three by all odds. When Holk, after two days of steady work, succeeded in eliminating meaningless statements, vague gibberish, useless qualifications – in short, all the goo and dribble – he found he had nothing left. Everything canceled out."

"Lord Dorwin, gentlemen, in five days of discussion didn't say one damned thing, and said it so you never noticed."

Not all applications of symbolic logic are as dramatic and interesting as the one Asimov depicts. Nonetheless, even though there may not be any cosmic significance, symbolic logic can be a tool that makes life easier and reasoning and comprehension more clear.

Suppose, for example, that you have a friend who says

If it is raining and either it is not raining or it is snowing then it is snowing.

What do you make of that statement? What does it mean? Does it ever make sense? Trying to parse the sentence is nearly impossible - at least in its fully decorated language form. Symbolic logic lets us, much like Mayor Hardin, strip away all the nonsense and come to some supported conclusion about your friend's ability to communicate.

To apply it, first take the basic pieces and represent them with simple symbols. For this example, let p mean 'it is not raining' and q mean 'it is snowing'. Your friend's cryptic statement is symbolically represented as:

 \left[ \neg p \wedge (p \vee q) \right] \rightarrow q \; ,

where \neg means not (i.e. \neg p means it is raining), \wedge is the logical and, \vee is logical or, and \rightarrow is the usual if-then relation.

Having translated the cryptic statement into symbols, we can now manipulate it using the standard rules of propositional logic.

The first step is to rewrite the if-then implication in its 'or' form

 \neg \left[ \neg p \wedge (p \vee q) \right] \vee q \; .

Then use de Morgan's rule to bring the negation inside

 \left[ \neg \neg p \vee \neg (p \vee q) \right] \vee q \; .

Next use the double negation to simplify the first term

 \left[ p \vee \neg(p \vee q) \right] \vee q

and then use de Morgan's rule again

 \left[ p \vee (\neg p \wedge \neg q) \right] \vee q \; .

The law of distribution is the next step

 \left[ (p \vee \neg p) \wedge (p \vee \neg q) \right] \vee q \; .

From the basic laws of logic, p \vee \neg p \equiv T , since a proposition is either true or false, so that a proposition or its negation is always true (a tautology). This observation yields

 \left[ T \wedge (p \vee \neg q) \right] \vee q \; .

Next apply T \wedge p \equiv p , and our original statement is now

 (p \vee \neg q) \vee q \; ,

which in English reads something like 'either it is not raining or it is not snowing or it is snowing'. Still a bit confusing to parse but certainly much simpler. Fortunately, we can continue analyzing the statement further.

Using distribution again gives

 (p \vee q) \vee (\neg q \vee q ) \; ,

which becomes

 (p \vee q ) \vee T

when \neg q \vee q \equiv T is used.

Finally, as noted earlier,  anything \vee T \equiv T , and the original statement boils down to being always true. Your friend has uttered a tautology and, for one brief moment, shown himself to be worthy of being called 'an accomplished diplomat and a most clever man', able to avoid saying 'one damned thing' and to say it 'so you never noticed'.
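
For those who would rather let the machine do the pruning, a brute-force truth-table check settles the matter as well. The snippet below is a quick sketch of my own (the helper name is purely illustrative); it simply evaluates the original statement for every combination of truth values.

from itertools import product

def implies(a, b):
    # material conditional: false only when a is true and b is false
    return (not a) or b

# p = 'it is not raining', q = 'it is snowing', as defined above
for p, q in product([True, False], repeat=2):
    statement = implies((not p) and (p or q), q)
    print("p=%s, q=%s: %s" % (p, q, statement))

# every combination prints True, confirming the tautology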

Interfaces and Thoughts

This month’s column is, on the face of it, a whimsical undertaking.  I’ll be examining the man-machine interface on two different cars – a 2004 Saturn Ion and a 2000 Infiniti I30.  Yes, you are correct!  I own some old cars and, just in case your thoughts wander into the notion that at least there is a luxury car in the mix, please note that the Infiniti was purchased used, with over 100,000 miles on it.  Anyway, the central theme of this post is not what kind of cars I own and drive but rather what the design of the car tells about the designer.

The idea that the form and function of a work reveals a portion of the mind of the designer is old, very old, and is one of the central arguments from the medieval synthesis for the existence of God.  I’m not claiming any deep metaphysical insights from the discussion here but I do think there are interesting reflections on the human mind in this very tongue-in-cheek analysis.

To start, let me say that the Infiniti is the better designed car from the point of view of acceleration, handling, and space.  It has a V6 compared to the Ion’s inline 4-cylinder.  It has leather seats versus the Saturn’s cloth ones.  And the list could go on.  Nonetheless, the man-machine interface in the Saturn is hands down better.

Let’s start with the power locks on the doors.  Here is a picture of the locking control for the I30:

and the corresponding look at the Ion:

While they are very close in design they don’t work the same way at all – leading to a great deal of confusion whenever I switch.  The icon of the black key on the white background means unlock on the Saturn and lock on the Infiniti.  And that difference speaks to how the people who designed these controls think.

I am quite aware that there are cultural differences between the Japanese and US mind, but these shouldn’t have come into play.  That isn’t to say that such differences aren’t important or valid, nor that the resulting man-machine interface isn’t influenced by them, but rather that on this point there should be universal agreement across both designs.

The primary purpose of any key, regardless of the time and place in which it is used, is to unlock a locked door.  In certain older circumstances keys are also used to lock the door again, but this lesser purpose should not be the one that the designers seized upon.  This is because, for reasons of safety and security, a car is something that is generally in a locked state unless being entered or exited.  Once in a car, especially one that has power locks, the notion of the key as the locking agent becomes almost meaningless.  In addition, cars are imported and exported around the world and international standardizations are commonplace.  Thus, the only possible conclusion is that the Ion gets it right and the Infiniti gets it wrong.  This conclusion also suggests that the Infiniti designers were perhaps focused on the Japanese interpretation and not on how their product would be used in the global market.

Of course, there are those of you who still object that this is simply a matter of convention; that the functional argument is not persuasive.  I must admit that when I first encountered this difference I wasn’t swayed either.  The interface that really draws the difference and pushed my thinking from equivocation to certainty is the windshield wiper control.  Here the problem isn’t embracing a difference between conventions but a matter of internal consistency.  And again, the Saturn is the clear winner.

To be concrete, here is a photo of the interface to the wiper system from the Ion

and the I30

Again, the two designs look very similar – too similar, in fact, to keep me from getting confused.  Both designs sport the traditional iconography of a dashed line (- -) for intermittent wipers, a solid line (—) for continuous low, and a heavy solid line for continuous high.  A careful examination reveals that the direction in which one must articulate the control is different; on the Ion, the wipers go from intermittent to low to high by pushing the lever upwards, while on the I30 the same sequence results from pushing down.  Again, the difference seems to be one of convention, but we haven’t discussed the intermittency setting and it is here that the I30 shows itself to be inconsistent.

Before getting to the inconsistency, there is one more matter of convention that differs between the two.  Both controls sport a dial bearing graduated lines (the white lines on the right that are wider at the top and taper to almost nothing at the bottom) that set the speed of the intermittent wipers.  For the I30, the larger the line the larger the time gap between successive swipes of the wipers.  For the Ion, the larger the line the smaller the time gap between successive swipes.  So their conventions are dual to each other, with the I30 working in terms of time and the Ion in terms of frequency.

The inconsistency rears its head when the lever and dial are used in tandem (i.e. when using intermittent wipers).  In the I30, higher frequency is obtained by pushing the lever down but by turning the dial up.  On the Ion, up means the same thing for both lever and dial.  And that, in a nutshell, is why the man-machine interface of the Ion is better than that of the I30, despite the I30 being a better car overall.

So, what do these design choices reveal about the minds of the designers?  In the case of the Ion, it seems to show that there was one single guiding mind or principle.  Whether or not you prefer the design choices of the Ion over the I30, there is no arguing that the Ion design is self-consistent.  White icons always mean changing the state from the previously passive state to something more active: for the doors, from locked (passive) to unlocked (active); for the wipers, from off (passive) to low frequency (active) to high frequency (more active).  In the case of the I30, the design is a hodge-podge of concepts with differing motifs and little consistency.  This suggests that no single guiding principle knitted together the design of the man-machine interface.  Part of this is, no doubt, cultural, but part of it seems to be indicative of a company that takes pride in its subsystems but fails to put as much emphasis on knitting those systems together into a seamless whole.

K-Means++ Data Clustering: Improved Seeding

This month’s column is a follow-on to last month’s look at K-means data clustering.  Again, I’ll be looking at James McCaffrey’s Test Run article K-Means++ Data Clustering, published in MSDN Magazine in August 2015.

In the last post, I confined my analysis to the K-means data clustering algorithm using McCaffrey’s hand-picked data points, which showed that the heights and weights of a sample of some population aggregated into 3 distinct clusters.  Once the data are normalized, the human eye easily picks out these clusters but the machine, not so much.  The purpose of the K-means clustering is to provide the machine with a mechanism by which it might mimic, at least in the most primitive of senses, the ability of the human eye to correlate a 2-dimensional image.

Of course, McCaffrey’s sample was designed specifically to illustrate the techniques with no gray areas that might confuse the person, let alone the computer.  Nonetheless, the algorithm failed to find the clusters correctly around 20-30% of the time; an estimate of my own based on a semi-empirical method of restarting the algorithm a number of times and simply counting the number of times that the clustering gave a color-coded result that looked like the following figure (each color denoting a cluster).

The problem with the algorithm is one of seeding the process.   For each of the k clusters chosen as a guess by the user, the algorithm needs to start with a guess as to where the centers of those clusters lie.  The K-means algorithm seeds the process by selecting samples from the distribution at random to give the initial guess as to the cluster centers.  If this random guess selects two nearby points as two distinct centers then a situation like the one shown above arises.

The ‘++’ in the K-means++ algorithm denotes an improvement to the seeding process to avoid this undesirable outcome.  The technique is called roulette wheel selection and it works as follows.
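
As a rough sketch of the roulette wheel idea (my own reconstruction of the standard K-means++ seeding rather than McCaffrey’s code), each new center is drawn with probability proportional to the squared distance from a point to the nearest center already chosen:

import numpy as np

def kmeanspp_seed(data, k):
    # pick the first center uniformly at random from the data points
    centers = [data[np.random.randint(len(data))]]
    for _ in range(k - 1):
        # squared distance from every point to its nearest chosen center
        d2 = np.array([min(np.sum((x - c)**2) for c in centers) for x in data])
        wheel = np.cumsum(d2 / d2.sum())          # cumulative 'roulette wheel'
        spin  = np.random.rand()                  # spin the wheel
        centers.append(data[np.searchsorted(wheel, spin)])
    return np.array(centers)

Points far from every existing center occupy a larger slice of the wheel, which is exactly what keeps two nearby points from both being chosen as seeds.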

K-Means Data Clustering

This month’s column is inspired by James McCaffrey’s Test Run article K-Means++ Data Clustering, published in MSDN in August 2015.   McCaffrey’s piece is really two articles rolled into one and is interesting not only for the data clustering algorithm that it presents but also for the fascinating light it sheds on the human facility to visually analyze data (not that McCaffrey comments on this point – it’s just there for the careful observer to note).  But before getting to the more philosophical aspects, it’s necessary to discuss what K-means data clustering is all about.

To explain what data clustering entails, I will use the very same data that McCaffrey presents in his article as the test set for mine.  He gives a set of ordered pairs of height (in inches) and weight (in pounds) for 20 people in some sampled population.  The table listing these points

Height (in)    Weight (lbs)
65.0           220.0
73.0           160.0
59.0           110.0
61.0           120.0
75.0           150.0
67.0           240.0
68.0           230.0
70.0           220.0
62.0           130.0
66.0           210.0
77.0           190.0
75.0           180.0
74.0           170.0
70.0           210.0
61.0           110.0
58.0           100.0
66.0           230.0
59.0           120.0
68.0           210.0
61.0           130.0

doesn’t reveal anything special at a glance; nor should it be expected that it would.  As every good data scientist knows, the best (human) approach is to plot the data so that the eye can pick out patterns that the mind may fail to perceive in the tabulated points.  The resulting plot

reveals that there are three distinct clusters in the data. McCaffrey doesn’t show a plot of these data, but it seems that he omitted it because he wanted to focus on how to teach the computer to find these three groups using K-means clustering without the benefit of eyes and a brain, which it obviously doesn’t have.

The principle behind K-means clustering is quite simple:  each group of points is called a cluster if all the points are closer to the center of that group than they are to the centers of the other groups.

The implementation is a bit more difficult and tricky.  Distance plays a key role in this algorithm, but whereas the human eye can take in the entire plot at one go and can distinguish the center of each cluster intuitively, the computer must process each point one at a time.  This leads to two problems:  1) how to numerically calculate distances between points and 2) how to find the centers of the clusters from which these distances are measured.  Let’s deal with each of these.

The fact that the units for the x-axis are different from those on the y-axis can lead to some problems identifying the clusters – for both the human and the machine.  Consider the following plot of the same data but with the x- and y-axes given the same extent from 50 to 200 in their respective units:

It isn’t at all clear that there are three distinct clusters as opposed to two.  A better approach is to normalize the data in some fashion.  I prefer using the Z-score where the data are normalized according to

 Z_x = \frac{ x - \langle x \rangle}{\sigma_x} \; ,

where \langle x \rangle is the mean over all the values and \sigma_x is the corresponding standard deviation.  Applying the normalization to the height and the weight gives

Once the data are normalized, the distance between them is determined by the usual Euclidean norm.
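
In numpy, the normalization and the Euclidean distance take only a couple of lines (a small sketch; the array name raw_data is mine and holds the height/weight pairs from the table above):

import numpy as np

raw_data  = np.array([[65.0, 220.0], [73.0, 160.0], [59.0, 110.0]])    # first few rows of the table
norm_data = (raw_data - raw_data.mean(axis=0)) / raw_data.std(axis=0)  # Z-score each column

# Euclidean distance between the first two normalized points
dist = np.sqrt(np.sum((norm_data[0] - norm_data[1])**2))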

The second step is to determine the centers of each cluster, which is essentially a chicken-and-the-egg problem.  On one hand, one needs the centers to determine which point belongs to which cluster.  On the other hand, one needs to have the points clustered in order to find their center.  The K-means approach is to guess the centers and then to iterate to find a self-consistent solution.

I implemented the K-means clustering code in a Jupyter notebook using Python 2.7 and numpy.  I defined 2 helper functions.

The first one determined the cluster into which each point was classified based on the current guess of the centers.  It loops over all points and compares each point’s distance to the k centers.  Each point belongs to the closest center, and the id number of that center is recorded in a list, which is returned.

import numpy as np

def find_cluster(curr_means,data):
    # assign each point the index of its nearest cluster center
    num_pts      = len(data)
    k            = len(curr_means)
    cluster_list = np.zeros(num_pts,dtype=int)
    dist         = np.zeros(k)
    for i in range(num_pts):
        for m in range(k):
            l       = data[i] - curr_means[m]
            dist[m] = l.dot(l)              # squared Euclidean distance
        cluster_list[i] = np.argmin(dist)   # id of the closest center
    return cluster_list

The second helper function updates the position of the centers by selecting, from the list of all points, those that belong to the current cluster.  This function depends heavily on the slicing and numerical functions built into numpy.

def update_means(k,cluster_list,curr_data):
    # recompute each center as the mean of the points currently assigned to it
    updated_means = np.zeros((k,2))
    for i in range(k):
        clustered_data     = curr_data[np.where(cluster_list == i)]
        updated_means[i,:] = np.array([np.mean(clustered_data[:,0]),np.mean(clustered_data[:,1])])
    return updated_means

For this implementation, I chose 3 clusters (k = 3) and I seeded the guess for the centers by randomly selecting from the list of the initial points.  Subsequent updates moved the centers off the tabulated points.  The update process was iterated until no changes were observed in the position of the cluster centers.  The final cluster assignments were then used to color code the points based on the cluster to which they belong.
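
In outline, the driver loop that ties the two helper functions together looks something like the following sketch (not the exact notebook code; data is assumed to hold the 20 normalized points as an (N, 2) array):

k     = 3
means = data[np.random.choice(len(data), k, replace=False)]   # seed by picking k of the points at random

while True:
    clusters  = find_cluster(means, data)
    new_means = update_means(k, clusters, data)
    if np.allclose(new_means, means):                          # the centers have stopped moving
        break
    means = new_means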

The first run of the algorithm was successful, with the resulting plot indicating that the process had done a good job of identifying the clusters in much the way the human eye would.

Subsequent runs show that about 70-80% of the time, the algorithm correctly identifies the clusters.  The other 20-30% of the time, the algorithm gets ‘stuck’ and misidentifies the points in a way that tends to split the small cluster in the bottom left into two pieces (although not always in the same way).

The ‘++’ nomenclature featured in McCaffrey’s article (K-means++ clustering) refers to a more sophisticated way of selecting the initial cluster centers that minimizes the chances that the algorithm will get confused.  This new type of seeding and some numerical experiments with a larger data set (in both number of points and dimensionality) will be the subject of next month’s column.

Of Fishbones and Philosophy

On the off chance that you, dear reader, are thinking that there is precious little overlap between the skeletons left over from dead fish and the high art of philosophy, let me set your mind at rest.  You are correct; there isn’t much.  Nonetheless, this installment isn’t a shortened quip-of-a-column designed to note this simple observation and then to make a quick, albeit graceful exit.  In point of fact, the fishbones that I am referring to have a great deal to do with philosophy, in general, and epistemology, specifically.

For those of you who aren’t aware, the fishbone or Ishikawa diagram (after Kaoru Ishikawa) is a way of cataloging the possible specific causes of an observed event in order to infer which one is the most likely.  Its primary application is to those events where the effect is clearly and obviously identifiable but where the trigger of that event is unknown or, at least, unobservable.  One can usually find these diagrams applied in industrial or technological settings where a fault in a complex system rears its ugly head but the failure mode is totally or partially unknown.

Now it is one of those trendy nuggets of common knowledge that philosophy is one of those subjects designed for the technically-challenged to while away their time considering how many angels can dance on the head of a pin or whether to push the fat man onto the tracks in order to save the lives of the many passengers on the train.  No practical applications can be found in philosophy.  It has nothing important to offer workplaces where holes are drilled, sheet metal bent, circuits soldered, products built, and so on.

The fishbone diagram speaks otherwise – it deals with what is real and practical and with what we know and how we know it in a practical setting.  It marries concepts of ontology and, more importantly, epistemology with the seemingly humdrum worlds of quality assurance and manufacturing.

To appreciate exactly how this odd marriage is effected, let’s first start with a distinction that is made in fishbone analysis between the proximate cause and the root cause.  A practical example will serve much better here than any amount of abstract generalization.

Suppose that as we are strolling through ancient Athens, we stumble upon a dead body.  We recognize that it is our sometime companion by the name of Socrates.  Having been fond of that abrasive gadfly and possessing a slice of curiosity consistent with being an ancient Greek, we start trying to determine just what killed Socrates.  One of us, who works in the new Athenian pottery plant where the emerging science of quality management is practiced, recommends making a fishbone diagram to help organize our investigation.

Inside the head of the fish we place the key observation that Socrates is dead.  Off the central spine, we string possible causes of death, grouped into categories that make sense to us.  After a lot of discussion, we agree on these four:  Divine Intervention, Natural Causes, Accidental Death, and Foul Play.   Under each of these broad headings we add specific instances.  For example, some of us have heard rumors of the dead man’s impiety, so perhaps Zeus has struck him down with a thunderbolt.  Others suggest that being hit with a discus was the cause of death, just like what happened to uncle Telemachus at the last Olympic Games.  We continue on until we have our finished fishbone.

 

This version of the fishbone diagram aims at helping us determine the proximate cause.  We want to know what actually killed him without, at this stage, trying to figure out why (although the question of ‘why’ helped us in populating the list).

We then, in good logical fashion, start looking for observations that either strengthen or weaken each of the bones in our diagram.  We find no evidence of charring or submergence in water, so we argue that Divine Intervention is highly unlikely.  There is no blood or signs of blunt force trauma, so scratch all the possibilities under Accidental Death.  One of us notes that his belongings are all present and that his face is peaceful and his body shows no subtle signs of violence like what might be attributed to strangulation or smothering, so we think murder very unlikely.  Finally, one of us detects a faint whiff of a distinct odor and concludes that Socrates has died by drinking hemlock.

In fishbone analysis, hemlock poisoning is the proximate cause – the direct, previous link in the chain of causation that led to his death.  Note that we haven’t actually seen Socrates consume the lethal cocktail; we are simply inferring it based on the effect (he’s dead) and the smell (likeliest cause).  The next step is to determine the root cause – the reason or motivation for his consumption of the hemlock.

We find, after collecting a different type of observation, that he was executed by the Polis of Athens for impiety and for corrupting the morals of the youths of our city state.  We generally fill out this step by interviewing people and collecting human impressions rather than physical evidence.  At what point we decide that we’ve hit the root is up to us.  We can stop with the death sentence passed down by the Athenian court or we can look to the politics that led to that sentence.  We can stop with the politics or dig further into the social and demographic forces that left Athenian democracy so disposed to dispatch the father of Western thought.  We can trace events back to Hippias the tyrant, or back to Homer, or wherever.

This sense of arbitrariness isn’t confined solely to where we cut off the determination of the root cause.  We also limited our universe of explanations in determining the proximate cause.  We can’t consider everything – how about dryads, sylphs, and satyrs?

In other words, all of us start our fishbone analysis with a Bayesian a priori expectation of likeliest causes and we apply, whether consciously or not, Occam’s razor to simplify.  Let’s reflect on this point a bit more.  Doing so brings into sharper focus the distinction between what we think we know, what we actually know, and what we don’t know; between the universe of knowable, unknown, and unknowable.  Ultimately, what we are dealing with is deep questions of epistemology masquerading as crime scene investigation.

The situation is even more interesting when one has an observable effect with no discernable cause.  Is the cause simply unknown or is it unknowable?  And how do we know in which category it goes without knowing it in the first place?

This epistemological division is even further muddied when we deal with indirect observations provided by tools (usually computers).  Consider the case where a remote machine (perhaps in orbit) communicates with another machine, which unpacks the electronic signals it receives.  If a problem is observed (a part is reported dead, for example), what does this actually mean?  Where does the fault lie?  Is it in the first machine or the second one?  Could the second one, by accident or malice (hacking), be spoofing the fault on the first?  How does one know and where does one start?  And if one is willing to extend the concept of a second machine to include human beings and their senses, then the line gets even more blurred between observer and observed.  Where does the fault lie, with our machines or with ourselves, and how does one know?

I will close on that note of uncertainty and confusion with an aporetic ending in honor of Socrates.  And all of it came from a little fishbone, whose most common practitioners would most likely tell you that they are not interested in anything so impractical as philosophy.

Dumbing AI Down

The concept of the Turing Test as the basic gate that an artificially intelligent system must pass to be judged sufficiently human-like is both pervasive and intriguing.   Dealt with widely both in serious academic circles and in fanciful science fiction, the usual theme is one in which the AI must overcome a set of hurdles to pass the test.

Usually, these hurdles are viewed as a question of evolution - of smartening the AI so that it acts like a human being.  Topics along this line include enabling sophisticated algorithms that recognize levels of evocation, an essential property that allows for understanding humor, getting double entendres, and recognizing sarcasm.  Poetry and evocative imagery are also complications that have been explored off and on.

Far less frequently is the concept of devolution explored.  The idea here is to dumb down the AI so that it seems less like a computer and more like a human being.  It should know how to round numbers grossly, use vague characterizations, use contractions, cut verbal corners, and the like.  One should imagine Commander Data from Star Trek: The Next Generation as the textbook example.

This post deals with an unillumined corner of this latter category.  Specifically, how to make sure an AI can mimic the intuition of a human being, warts and all.  What I am talking about is an AI designed with the same blind spots and foibles as the average human being. Nothing illustrates this so clearly as the intuition-defying results that come from big numbers.

Humans are not usually good with numbers in general and are notoriously bad with big numbers.  This is such a prevalent problem that there is even a term to describe just how poor the average soul’s understanding of numbers and mathematics is – innumeracy.

Even for those practiced in the art, intuition can fail when big numbers come in the form of probability and statistics.  Two puzzles are famous for challenging the mortal mind:  the Birthday and Monty Hall puzzles.  Any AI that wants to blend in had better trip over these two problems with the rest of us or run the risk of being exposed as something other than human.

The Birthday Puzzle

Often called the Birthday Paradox, this puzzle is a significant challenge to the basic intuition that each of us has about the likelihood of coincidences.  As described, the Birthday Puzzle goes something like this.  Suppose that there are n persons in a room, say attending a party.  What is the probability that any two of them have the same birthday?  Stated slightly differently, how many people do you need in a room before the probability is 50% that any two of them share the same birthday?

To be concrete and to keep things as simple as possible, let’s agree to ignore leap days and the possibility of a birthday falling on February 29th.  This step is not essential but it keeps the number of special cases to consider down to a minimum.

Ask the average person and they will tell you that you need about 182 people in the room to get a 50-50 shot (assuming that the average person can actually divide 365 in half and properly round).  Whether it is a result of nature or nurture, this ‘intuitive’ and ‘obvious’ answer is grossly wrong.

The easiest way to compute the probability is to compute the much easier probability that none of the n persons have the same birthday and then to subtract this number from 1 to get the probability that at least one pair share a birthdate in common.

Suppose that there are 3 people in the room; then there are 365 days to assign person 1’s birthday, 364 days to assign to person 2’s birthday, and 363 days to assign to person 3’s birthday.  Each of these numbers is then divided by the total number of days to get the probability.  The value of this number is

 \tilde P = \frac{365}{365} \frac{364}{365} \frac{363}{365} \; .

The probability that in a group of 3 persons at least one birthday is held in common is

 P = 1 - \tilde P = 0.0082 \; .

This approach, which doesn’t come naturally to most of us, is at least comforting in that common sense tells us that when there are 366 or more people in a room then at least one pair must share a birthday.  The real assault on common sense begins when we generalize the analysis to an arbitrary number of people and graph the result.

The general formula is

 P_n = 1 - \frac{365}{365} \frac{364}{365} \cdots \frac{365-n+1}{365} \; .

When graphed, the unexpected appears:  only 23 people are needed to get a probability just over 50%.
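
A few lines of code make the claim easy to check against the general formula above (a small sketch; the function name is mine):

def birthday_match_probability(n):
    # probability that at least two of n people share a birthday
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (365.0 - i) / 365.0
    return 1.0 - p_no_match

print(birthday_match_probability(23))   # roughly 0.507
print(birthday_match_probability(60))   # roughly 0.994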

By the time the number of people reaches about 60, the probability of a match is nearly 100%.  This result challenges our expectations and causes us genuine surprise.  How will an AI that passes the more conventional aspects of a Turing test react?

The Monty Hall Puzzle

Even more interesting and non-intuitive is the Monty Hall or Let’s Make a Deal Puzzle.  Based on the final segment of the game show Let’s Make a Deal, contestants are offered a choice among three doors.  Behind two of them are so-called booby prizes, usually a farm animal or some other unwanted thing.  Behind one of the doors is usually a car.  Monty Hall, the host of the show, asks the contestant to pick one door.  Next, he opens one of the other two doors and reveals one of the booby prizes (e.g., a goat).

Monty’s final step is to offer the one unopened door in trade to the contestant.  The question then is: should the contestant accept the offer and switch doors or should he stay with his original pick?  Of course, there is no way to guarantee the correct choice, but the contestant has a definite statistical advantage if he switches.  The probability that the car is behind the door he chose is 1/3 while the probability it is behind the other unopened door is 2/3.  Most people see two doors and assume that the odds are 50-50.  That’s human intuition – even though it is wrong.  And Monty Hall, who I believe must have been a confidence man before going legit, played on the contestants’ greed and excitement by offering cash if they stayed with their first choice.  Usually, he kept them from getting the car, which I suppose was his aim.
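
A short Monte Carlo simulation (a sketch of my own, not anything from the show) makes the lopsided odds plain:

import random

def play(switch, trials=100000):
    wins = 0
    for _ in range(trials):
        car  = random.randrange(3)     # door hiding the car
        pick = random.randrange(3)     # contestant's first choice
        # the host opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # switching means taking the one door that is neither picked nor opened
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / float(trials)

print(play(switch=False))   # staying wins about 1/3 of the time
print(play(switch=True))    # switching wins about 2/3 of the time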

Now imagine what would happen when an AI went onto Let’s Make a Deal.  Certainly, the AI should be able to understand natural language.  But how should it react to the door choices, to Monty Hall’s con-man techniques?  If the AI is going to fool the humans around it, it'd better be conned alongside the rest of us.

Knowing When to Stop

Imagine that it's Christmas Eve and, due to some poor planning on your part, you find yourself short of a few gifts – gifts for key people in your life.  You reckon that you have no choice but to go out to the mall and fight all the other last minute shoppers to find those special trinkets that will bring smiles to all those faces you would rather not look at when they are frowning.  You know parking will be a delicate issue with few choices available at any given time and, as you enter the lot, you happen to see a space about one foot shy of a football-field’s distance to the mall entrance.  Should you take it or is there a better one closer?

If you take the space, you are in for a long walk to and fro as well as a waste of your time - and maybe, just maybe, the gifts will be gone by the time you get there.  If you pass by the space you run the risk of not finding a closer space and, most likely, this space will not be there when you circle back.

In a nutshell, this type of problem is best described under the heading ‘knowing when it is time to settle’.  It has broad applications in wide-ranging fields; any discipline where decision making is done within a context of uncertainty mixed with a now-or-never flavor falls under this heading.

Within the computing and mathematical communities, this scenario is dubbed The Secretary Problem and has been widely studied.  The article Knowing When to Stop by Theodore Hill, published in American Scientist, presents a nice introduction to and discussion of the problem and many of its real-world applications.  The aim of this month’s column is to look at some realizations of the problem within a computing context, and to look at some variations that lead to interesting deviations from the common wisdom.  The code and approach presented here are strongly influenced by the article The Secretary Problem by James McCaffrey in the Test Run column of MSDN Magazine.  All of the code presented and all of the results were produced in a Jupyter notebook using Python 2.7 and the standard suite of numpy and matplotlib.

The basic notion of the Secretary Problem is that a company is hiring for the position of secretary and has received a pool of applicants.  Since it is expensive to interview and vet applicants and there is a lost opportunity cost for each day the position goes unfilled, the company would like to fill the position as soon as possible.  On the other hand, the company doesn’t want to settle for a poor candidate if a more suitable one could be found with a bit more searching.  And, overall, what expectations should the company have for the qualifications of the secretary?  Perhaps the market is bad all over.

Within a fairly stringent set of assumptions, there is a way to maximize the probability of selecting the best choice by using the 1/e stopping rule.  To illustrate the method, imagine that 10 applicants seek the position.  Divide the applicant pool up into a testing pool and a selection pool, where the size of the testing pool is determined (to within some rounding or truncation scheme) by dividing the total number of applicants by e, the base of the natural logarithms. Using truncation, the testing pool has 3 members and the selection pool has 7.

Secretary Problem_pool

The testing pool is interviewed and the applicants assessed and scored.  This sampling of the applicant pool serves to survey the entire pool.  The highest score from the testing pool sets a threshold that must be met or exceeded (hopefully) by an applicant within the additional population found in the selection pool.  The first applicant from the selection pool to meet or exceed the threshold is selected; this may or may not be the best overall candidate. Following this approach, and using the additional assumption that each applicant is scored uniquely, the probability of getting the best applicant is 36.8% (interestingly, this fraction is also 1/e).

This decision-making framework has three possible outcomes:  it can find the best applicant, it can settle on a sub-optimal applicant, or it can fail to find any applicant that fits the bill.  This latter case occurs when the best applicants are all in the testing pool and no applicant in the selection pool can match or exceed the threshold.
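
In code, the rule itself is only a few lines (a sketch of the procedure just described, not McCaffrey’s implementation; it reports which of the three outcomes occurred):

import numpy as np

def one_over_e_rule(scores):
    # the first n/e applicants are interviewed only to set the bar
    n_test    = max(1, int(len(scores) / np.e))
    threshold = max(scores[:n_test])
    # hire the first applicant from the selection pool who meets or beats the bar
    for score in scores[n_test:]:
        if score >= threshold:
            return 'best' if score == max(scores) else 'settled'
    return 'failed'   # the best applicants were all in the testing pool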

To test the 1/e rule, I developed code in Python within the Jupyter notebook framework.  The key function is the one that sets up the initial applicant pool.  This function

import numpy as np

def generate_applicants(N,flag='uniform'):
    # assign a score to each of the N applicants using one of three schemes
    if flag == 'integer':
        # integers drawn uniformly from a range 10x the pool size, so duplicates are very unlikely
        pool = []
        for i in range(0,N):
            pool.append(np.random.randint(10*N))
        return np.array(pool)
    elif flag == 'normal':
        # scores drawn from a normal distribution and scaled onto [0, 10]
        temp = np.abs(np.random.randn(N))
        return np.floor(temp/np.max(temp)*100.0)/10.0
    elif flag == 'uniform':
        # uniformly distributed scores quantized to one decimal place
        return np.floor(np.random.rand(N)*100.0)/10.0
    else:
        print "Didn't understand your specification - using uniform distribution"
        return np.floor(np.random.rand(N)*100.0)/10.0

sets the scores of the applicants in one of three ways.  The first method, called ‘integer’, assigns an integer to each applicant based on a uniform probability distribution.  The selected range is chosen to be 10 times larger than the number of applicants, effectively guaranteeing that no two applicants have the same score.  The second, called ‘normal’, assigns a score from the normal distribution.  This approach also effectively guarantees that no two applicants have the same score.  The occasions where both methods violate the assumption of uniqueness form a very small subset of the whole.  The third method, called ‘uniform’, distributes scores uniformly but ‘quantizes’ the score to a discrete set.  This last method is used to test the importance of the assumption of a unique score for each applicant.

A specific applicant pool and the application of the 1/e rule can be regarded as an individual Monte Carlo trial.  Each trial is repeated a large number of times to assemble the statistics for analysis.  The statistics comprise the number of times the best applicant is found, the number of times no suitable applicant is found, and the number of times a sub-optimal applicant is found and how far from the optimum said applicant is.  This last statistic is called the settle value, since this is what the company has had to settle for.
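
Each trial, then, is just one call of the stopping rule on a fresh pool; repeating the trial many times and tallying the outcomes gives the statistics (again a sketch, reusing the illustrative one_over_e_rule helper from above):

from collections import Counter

trials = 10000
counts = Counter(one_over_e_rule(generate_applicants(80, 'normal')) for _ in range(trials))
print(counts)   # tallies of 'best', 'settled', and 'failed' outcomes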

The following figure shows the percentage of times that each method finds an optimal candidate from the selection pool by using the 1/e stopping rule.

Secretary - Total Success

Note that for the two methods where duplication is nearly impossible (integer and normal), the percentage of total success remains, to within Monte Carlo error, at the theoretically derived value of about 36.8%.  In contrast, the uniform method, which enjoys a quantized scoring system, shoots upwards to a total success rate of 100%.  The reason for this behavior is that with a quantized scoring system there is only a discrete set of values any applicant can achieve.  Once the number of applicants gets great enough, the testing pool perfectly characterizes the whole.   And while the number of applicants needed to achieve this higher percentage is impractical for finding a secretary (who really wants 640 applicants interviewing for the position?), the application to other problems is obvious.  There is really no reason that a decision process should always hinge on a difference between two choices of less than a fraction of the overall score.  This fact also explains why businesses typically ‘look’ at the market and pay careful attention to who is hiring whom.

For completeness, the following figures show the analogous behavior for the partial success percentage

Secretary - Partial Success

and the total failure scenarios

Secretary - Failure

An interesting corollary is to ask, in the case of partial success, how far short of optimal the decision process fell in settling on a sub-optimal selection.  The following figures show histograms for 10, 80, and 640 applicants in the applicant pool for those cases where the decision process had to settle for a sub-optimal choice, for the normal and uniform cases, respectively.  As expected, there is an improvement in how far from the maximum the decision falls as the testing pool size increases but, even with 640 applicants, the normal process has a significant probability of falling short by 20% or more.

Secretary - Settle Normal

In contrast, the distribution for the uniform scoring quickly collapses, so that the amount that the settled-upon candidate falls from the optimum is essentially within 5% even with a moderately sized applicant pool.  Again, this behavior is due to the quantized scoring, which more accurately reflects real world scenarios.

Secretary - Settle Quantized

At this point, there are two observations worth making in brief.  First, the core assumption of the original problem, that all applicants can be assigned a unique score, is worth throwing away.  Even if its adoption was crucial in deriving the 1/e stopping rule, real world applications simply do not admit a clear, unambiguous way to assign unique scores.  Second, it is, perhaps, astonishing how much richness is hidden in something so mundane as hiring a qualified candidate. Of course, this is to be expected, since good help is hard to find.

Aristotle on Whiskey

It has been some time since this column explicitly examined the great Philosopher or cited his philosophy.  And while his approach to thinking and reflecting on various problems has never been far from the matters usually discussed here, I’ve not actually invoked his name for many columns.  So, it may seem to be a bit of a surprise to start the new year by mentioning Aristotle and whiskey together in this month’s title.  To some it may even be viewed as an unforgivable irreverence to one of the world’s greatest thinkers.  But, as I hope to show, there is nothing irreverent or surprising in linking Aristotle to alcoholic spirits, beyond the usual association that many have about the ancient Greeks – an expectation, no doubt, largely set by Plato’s Symposium.  At issue is the Aristotelian concept of virtue, the sloppy practice of equivocation (double-speak) in logical arguments, and a somewhat famous speech about whiskey made by Noah ‘Soggy’ Sweat Jr.

In 1952, a Mississippi lawmaker by the name of Noah ‘Soggy’ Sweat Jr. was asked about his position regarding the state’s continued prohibition on selling alcoholic beverages to its citizens.  Soggy’s speech, which has since become immortalized due to its colorful language and its terseness, reads as follows:

My friends, I had not intended to discuss this controversial subject at this particular time. However, I want you to know that I do not shun controversy. On the contrary, I will take a stand on any issue at any time, regardless of how fraught with controversy it might be. You have asked me how I feel about whiskey. All right, here is how I feel about whiskey:

If when you say whiskey you mean the devil's brew, the poison scourge, the bloody monster, that defiles innocence, dethrones reason, destroys the home, creates misery and poverty, yea, literally takes the bread from the mouths of little children; if you mean the evil drink that topples the Christian man and woman from the pinnacle of righteous, gracious living into the bottomless pit of degradation, and despair, and shame and helplessness, and hopelessness, then certainly I am against it.

But, if when you say whiskey you mean the oil of conversation, the philosophic wine, the ale that is consumed when good fellows get together, that puts a song in their hearts and laughter on their lips, and the warm glow of contentment in their eyes; if you mean Christmas cheer; if you mean the stimulating drink that puts the spring in the old gentleman's step on a frosty, crispy morning; if you mean the drink which enables a man to magnify his joy, and his happiness, and to forget, if only for a little while, life's great tragedies, and heartaches, and sorrows; if you mean that drink, the sale of which pours into our treasuries untold millions of dollars, which are used to provide tender care for our little crippled children, our blind, our deaf, our dumb, our pitiful aged and infirm; to build highways and hospitals and schools, then certainly I am for it.

This is my stand. I will not retreat from it. I will not compromise.

The standard analysis found at Wikipedia or at Bo Bennett’s Logically Fallacious website is that Soggy’s rhetoric is an amusing example of double-speak.  Bennett has the following to say about this speech:

This is an amazing insight to the human mind and the area of rhetoric.  We can see how when both sides of the issue are presented through the same use of emotionally charged words and phrases, the argument is really vacuous and presents very little factual information, nor does it even take a stance on the issue.

On the surface, Bennett’s analysis seems to be spot on; Soggy’s speech suggests double-talk of the highest order, uttered, most likely, in that old, rolling Southern voice best exemplified by Foghorn Leghorn.  But there is another interpretation that is equally valid and, in the spirit of fairness, should be explored.

To understand this more charitable interpretation, we need to step back and understand the Aristotelian concept of virtue – a concept discussed by Aristotle in many places, most notably in Book II of the Nicomachean Ethics.

The concept of virtue coincides with the proper balance between an excess and a deficiency of a trait.  In the case of courage or bravery, Aristotle would say that the virtue lies in the proper mean between its two extremes.  On one side, the soldier who possesses too little courage is timid and incapable of performing his function in battle or, most probably, even of saving his own life.  On the other, the soldier who jumps into danger with no thought whatsoever for his own safety or that of his compatriots serves no useful purpose due to his rashness and foolhardiness.

The Aristotelian notion of virtue as the balance between two extremes can be applied to Soggy’s speech as well.  At one extreme is his first meaning of ‘by whiskey’: the overindulgence in alcohol that weakens character, causes lapses in judgement, dissipates wealth and prosperity, and destroys family cohesion.  This extreme is the drunkenness indulged in by the alcoholic, and it should be avoided.

The other extreme is a bit more difficult to identify, precisely because Soggy refers to it obliquely, noting all the advantages that follow from its avoidance rather than the ills that follow from its pursuit.  This extreme, which may be called prudishness or uptightness, is often the province of the teetotaler, who deprives himself of the benefits that follow from the proper use of wine and spirits.  History shows that almost all cultures reserve an honored spot for ‘adult beverages’ because of the good effects they bring to both the body and the soul of their people.  In addition, Soggy points out that their production forms a significant sector of the modern economy, resulting in gainful employment and ample tax revenues that are also beneficial to society.

So there are at least two readings of Soggy’s speech: the first looks at it as a crass example of political jibber-jabber, while the second credits it as a colorful explanation, in layman’s terms, of the virtue of alcohol.  Personally, I prefer the latter interpretation, as it brings the great philosophical thought of ancient Greece to bear on the everyday political doings of the modern world.

The More Things Change...

The scope of human knowledge has certainly changed over the last 3000 years.  Daily, we manipulate electrons and beam electromagnetic signals to and fro.  Large-scale distribution networks move goods between highly specialized production centers.  Information flows from one corner of the globe to another in a matter of seconds.  Clearly, we live in an age of wonder.  But interestingly, while what we have learned has increased over the centuries, the methods of inquiry for obtaining new knowledge really haven’t changed all that much; what we know has changed greatly, but not how we come to know it.

A case in point is the use of regression, or recursion, as a tool for understanding the world in a philosophical way.  This approach has found numerous and fruitful applications because of its wide utility.

For example, Aristotle argued that Man must have a purpose by using a type of regression argument whose spirit, although not its explicit form, goes something like this.  Consider the bones of the hand; their function is to provide stiffness.  Likewise, the ligaments, tendons, and muscles provide the articulation.  The nerves provide the sense of touch, and the flesh provides a means of touching, gripping, and holding, as well as a unity and an encapsulation for the other parts.  All of these pieces have a function to perform in the greater existence that is the hand.  Likewise, one can find smaller parts serving limited roles within the limbs, organs, and systems of the human body:  the eye serves to see; the nose to smell and breathe; the mouth to chew, taste, drink, eat, breathe, and talk; and so on.  Since each piece contributes, through its function, to the greater function of the thing of which it is a part, isn’t it reasonable to assume that the completed whole, the sum of all these individual parts within parts, also has a function or purpose?  This argument, put forward well over two millennia ago, is still compelling and persuasive.

Saint Thomas Aquinas, one of the great philosophers of the medieval world, put forward arguments greatly influenced by, and cast in the form of, Aristotle’s.  In the Cosmos section of the Summa Theologica, Aquinas offers five proofs for the existence of God, each based on the concept of regression.  In outline form they are:

  1. Argument from Motion/Change: changing things depend on their interaction with other things (a plant depends on sunlight, which, in turn, depends on ongoing fusion, and so on).  Since no change can start itself, things change only when acted on by an external mover.  The only way to avoid an infinite regression of things depending on yet more things (all operating simultaneously) is to assume that there is a prime, or unmoved, mover.
  2. Argument from Efficient Causes: current effects are brought about by prior causes, which are, in turn, effects of causes one level more removed.  The only way to avoid an infinite regression of things causing other things is to assume that there was a first cause not caused by anything else.
  3. Argument from Possibility and Necessity: beings come into and go out of existence; they are contingent – having only a limited time during which they exist.  Given infinite time in the past, there must have been a time when all contingent things were absent, implying that nothing could exist now.  Given the current state of existence, the only way to avoid this contradiction is to assume the existence of a non-contingent being.
  4. Argument from Gradation of Being: natural objects are understood and ranked by quantities or qualities; this object is hotter than that one, this thing is better than another (better constructed, better conceived, and so on).  Ranking requires a maximum (e.g., a hottest object) against which all others are measured.  The only way to judge something as better is if a best exists.
  5. Argument from Design: natural objects seem to work towards a goal that they themselves don’t or can’t know and so are directed by a higher intelligence. The only way to avoid an infinite regression of greater intelligences directing lesser ones is to assume that there is a master intelligence that directs all.

In all of these arguments, Aquinas identifies the thing that prevents an infinite regress, that stops the chain of thinking at a well-defined point, as God.

These kinds of logical arguments are not limited to the purely metaphysical realm.  One of the crowning achievements of mathematics is the development of set theory, which depends heavily on the type of arguments put forward above.  The presence of a runaway regress, one that never stops, is a sign of trouble within set theory.  This typically, although not exclusively, happens when dealing with infinite collections or sets of things, and it leads to many paradoxes that must be resolved in order for mathematics to be put on a firm foundation.

Perhaps the most famous example is Russell’s paradox.  The paradox is based on defining two types of sets:

  • Ordinary sets – sets that do not contain themselves
  • Extraordinary sets – sets that do contain themselves

It isn’t at all clear that extraordinary sets exist.  If they do, they can’t be constructed using finite sets or the more familiar infinite sets like the integers or reals.  But assuming that they do exist, one can ask the following question:

Does the set of all ordinary sets contain itself?

To see that this leads to a paradox, first suppose that this set, call it Q, doesn’t contain itself.  It is then, by definition, an ordinary set.  Since it is ordinary, it belongs in any listing of all ordinary sets – that is, it must contain itself.  Thus, we conclude that Q is extraordinary (hence the need to define the term in the first place).

So far so good!

But the fly in the ointment comes when we look carefully at Q being extraordinary.  The membership requirement for Q is that an element must be ordinary.  So if Q is contained within Q, it must be ordinary.  And so we arrive at an endless loop in which each assumption implies its opposite.
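If a programming analogy helps make the loop concrete, here is a small Python sketch of my own (an informal analogy, not a piece of set theory).  Since a Python set cannot literally contain itself, the sketch models a ‘set’ as a membership predicate – a function that answers whether a given thing belongs to it – and then asks whether Q belongs to Q.

def is_ordinary(s):
    # A "set" here is a predicate; s is ordinary exactly when it does not contain itself.
    return not s(s)

# Q plays the role of the set of all ordinary sets: it "contains" precisely the ordinary ones.
Q = is_ordinary

# Asking whether Q contains itself reproduces the loop: Q(Q) asks for not Q(Q),
# which asks for Q(Q) again, and so on without end.
try:
    Q(Q)
except RecursionError:
    print("Q(Q) never settles on True or False - the regress never ends")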

Oddly enough, even though the trappings are those of present-day set theory, replete with fancy modern symbols, the structure is almost the same as the ancient liar’s paradox.  The main ingredients are self-reference and the need to end an infinite regression.

The resolution of these paradoxes is a story for another post.  The point here is that while the physical tools we use to manipulate the world have evolved dramatically over the centuries, the mental tools we use to grapple with logic remain largely unchanged.