
Gödel’s Theorem – General Notions

As I mentioned in last week’s blog, I was encouraged by Casti’s chapter on the Halting Theorem and questions of undecidability in logic and computing.  In fact, I was inspired enough that I resolved to have another go at studying Gödel’s theorem.

To give some background, many years ago I came across ‘Gödel, Escher, Bach: An Eternal Golden Braid’ by Douglas Hofstadter.  While I appreciate that his work is considered a classic, I found it difficult and ponderous.  Its over 700 pages of popularized exposition did little to drive home Gödel’s theorem and its connections to Turing and Church.  What’s more, Hofstadter seems to say (it’s difficult to tell exactly, as he mixes and muddles many concepts) that Gödel’s work supports his idea that consciousness can emerge from purely mechanistic means at the lowest level.  This point always seemed dodgy to me, especially since Hofstadter left me with the impression that Gödel’s theorem showed a fundamental lack in basic logic, not an emergent property.

For this go around, I decided that smaller was better and picked up the slim work entitled ‘Gödel’s Proof’ by Nagel and Newman.  What a difference a serious exposition makes.  Nagel and Newman present all the essential flavor and some of the machinery that Gödel used in a mere 102 pages.

Note that formal logic is not one of my strong suits and the intellectual terrain was rocky and difficult.  Nonetheless, Nagel and Newman’s basic program was laid out well and consisted of the following points.

To start, the whole thing was initiated by David Hilbert, whose Second Problem challenged the mathematical community to provide an absolute proof of the consistency of a formal system (specifically arithmetic) based solely on its structure.  This idea of absolute proof stands in contrast to a relative proof, where the system in question is put into relation with another system whose validity and consistency are accepted.  If the relation between the two is faithful, then the consistency of the second system carries over or is imparted to the first.

Hilbert was unsatisfied by the relative approach as it depended on some system being accepted at face value as being consistent and finding such a system was a tricky proposition.

The preferred way to implement an absolute proof is to start by stripping away all meaning from the system and to deal only with abstract symbols and a mechanistic way of manipulating these symbols using well-defined rules.  The example that Nagel and Newman present is the sentential calculus slightly adapted from the ‘Principia Mathematica’ by Whitehead and Russell.  The codification of the formal logic system depends on symbols that fall into two classes: variables and constant signs.  Variables, denoted by letters, stand for statements.  For example, ‘p’ could stand for ‘all punters kick the football far’.  There are six constant signs with the mapping

  • ~ : Not
  • $\vee$ : Or
  • $\supset$ : If… Then…
  • $\cdot$ : And
  • ( : Left-hand punctuation
  • ) : Right-hand punctuation

The idea is then to map all content-laden statements, like ‘If either John or Tim is late then we will miss the bus!’, to formal statements like ( (J $\vee$ T ) $\supset$ B) with all meaning and additional fluff removed.  Two rules for manipulating the symbols, the Rule of Substitution and the Rule of Detachment, are adopted and four axioms are used as starting points.

In using this system, one has to sharpen one’s thinking to be able to distinguish between statements in the system (mathematical statements like ‘2 > 1’) from statements about the system (meta-mathematical statements like ‘2>1’ is true or that the ‘>’ symbol is an infix operator).  One must also be careful to note the subtle differences between the symbol ‘0’, the number 0, and the concept of zero meaning nothing.

The advantage of this approach is that the proofs are cleaner, especially when there are many symbols.  The disadvantage is that it takes time and effort to be able to work in this language.

The consistency of the formal system is shown when there is at least one formal statement (or formula) that cannot be derived from the axioms.  The reason for this is complicated and I don’t have a good grasp on it, but it goes something like this.  The formula ‘p $\supset$ (~ p $\supset$ q)’ can be derived in the sentential calculus.  If the statements S and ~S are both deducible, then any statement you like can be derived from the axioms (via the Rules of Substitution and Detachment): substituting S for p gives ‘S $\supset$ (~ S $\supset$ q)’, detaching with S gives ‘~ S $\supset$ q’, and detaching again with ~S gives q, whatever statement q may be.  A system in which everything is derivable is clearly inconsistent.  In the words of Nagel and Newman:

‘The task, therefore, is to show that there is at least one formula that cannot be derived from the axioms.’ [p51]

For the sentential calculus, they point out that the formula ‘p $\vee$ q’ fits the bill since it is a formula yet it does not follow from the axioms.  Thus this system is consistent.  Note that there is no truth statement attached to this formula.  The observation simply means that ‘p $\vee$ q’ can’t be obtained from the axioms by mechanical manipulation.  They present this argument in their chapter titled ‘An Example of a Successful Absolute Proof of Consistency’.

Well, despite that lofty achievement, they go on to show how Gödel took a similar approach and largely ended any hope of an absolute consistency proof for formal systems of much greater complexity.  Gödel used as his symbols the numerals and as his rules the basic operations of arithmetic and, in particular, the arithmetic of prime numbers.  Using an ingenious mapping of the variables and constant symbols to numbers, he not only could encode the structure of the formal system itself, he could also encode statements about the formal system as well (meta-mathematics).  In the language of Hofstadter, these are self-referential statements, although Nagel and Newman don’t use this term.

Using this approach, Gödel was able to prove that there is no absolute proof of consistency.  At best, the system can say about itself that it is either incomplete or inconsistent.  If the system is incomplete, there are true statements that are not part of the axioms and that cannot be derived from them.  Enlarging the set of axioms to include them doesn’t work, since their presence begets new unprovable truths.  If the system is inconsistent, then everything is ‘true’, as discussed above.
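To get a feel for the arithmetization trick, here is a toy scheme sketched in Python (a drastic simplification of my own, not Gödel’s actual numbering): assign each symbol of the sentential calculus a small code number and encode a whole formula as a product of prime powers.  Because prime factorization is unique, the resulting integer can always be decoded back into the formula, so statements about formulas become statements about numbers.

import math

# Toy Gödel numbering: each symbol gets a code number, and a formula is
# encoded as 2**c1 * 3**c2 * 5**c3 * ..., one prime per symbol position.
SYMBOL_CODES = {'~': 1, 'v': 2, '>': 3, '.': 4, '(': 5, ')': 6, 'p': 7, 'q': 8}

def primes(n):
    # Return the first n primes by simple trial division.
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def godel_number(formula):
    # Encode a formula (a string of symbols) as a single integer.
    codes = [SYMBOL_CODES[s] for s in formula]
    return math.prod(p ** c for p, c in zip(primes(len(codes)), codes))

print(godel_number('(pvq)'))   # one integer stands for the whole formula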

Nagel and Newman leave the reader with some final thoughts that are worth contemplating.  On the hope that Hilbert’s program can be successfully completed, they have this to say:

‘These conclusions show that the prospect of finding for every deductive system an absolute proof of consistency that satisfies the finitistic requirements of Hilbert’s proposal, though not logically impossible, is most unlikely.’

They also comment on the possibility of proving consistency from an outside-looking-in approach using meta-mathematical techniques when they say

‘[W]hether an all-inclusive definition of mathematical or logical truth can be devised, and whether, as Gödel himself appears to believe, only a thoroughgoing philosophical ‘realism’ of the ancient Platonic type can supply an adequate definition, are problems still under debate…’

Finally, they have this to say about artificial intelligence (although that term wasn’t in vogue at the time they published):

‘[T]he brain appears to embody a structure of rules of operation which is far more powerful than the structure of currently conceived artificial machines.  There is no immediate prospect of replacing the human mind by robots.’

And there you have it.  A whirlwind tour of Gödel’s theorem with surprise appearances by philosophy from antiquity and ideas about artificial intelligence.

Turing, Gödel, and the Universe

Something about the book ‘Five Golden Rules: Great Theories of 20th-Century Mathematics and Why They Matter’ by John L. Casti caught my eye the other day at the library.  On a whim, I signed the book out and started reading.  Living up to the promise of the title, the book had five chapters, each one devoted to one of the great math theories of the 20th century.  All in all, I had been exposed, at least superficially, to all the topics covered, so I wasn’t sure what I would get out of the book other than some additional insight into material I already knew or confirmation of my wisdom in staying away from the topics I didn’t.

Anyway, I am quite glad that the mechanism of providence pointed me at this book because the connections that Casti draws are worth thinking about.  None of these connections was as profound for me as the deep link that he explores in Chapter 4 entitled ‘The Halting Theorem (Theory of Computation)’.

In this chapter, Casti first presents the concept of the universal Turing machine (UTM) as a mechanism for attacking the Decision Problem (or Entscheidungsproblem) proposed by David Hilbert in 1928.  Casti then couples this presentation with a discussion of the famous Gödel Incompleteness Theorem.

I’m simply a beginner in these fields and Casti omits important details and glosses over a variety of things to help the novice.  From this point-of-view, I can’t recommend his work.  But I am grateful and excited about the perspective he provided by making this linkage.

To understand my excitement, let me first try to fill in some of the background as best as I can.

Hilbert’s Decision Problem asks the following question.  Given a set of input data and an algorithm for manipulating said data, is there a way to know if the algorithm will be able to make a yes/no decision about the input?  For example, if the input data is a set of axioms in a logical system and some corresponding assertion in the same system, and the algorithm is a logical formalism, such as the classical syllogism, will the algorithm be able to prove or disprove the assertion as either true (yes) or false (no)?

While the Decision Problem might seem straightforward enough to talk about at a seminar, when it comes to actually tackling the question there are some vague conceptions that need further elaboration.  Specifically, what the term ‘algorithm’ really means and what constitutes a workable algorithm seem to have been the murkiest parts when the problem was first posed.

Turing chose to clarify the algorithm concept by inventing the machine which bears his name.  The UTM is a basic model of computation that is often used to understand the theory behind computer programming.  It is also the avenue that gave Turing a way to tackle Hilbert’s Decision Problem.  The basic ingredients for a UTM are: a tape or strip of paper, divided into squares; a set of symbols (usually ‘0’ and ‘1’) that can be written into the squares; and a box containing a mechanism for moving the strip left or right and a head that reads from and writes to the strip and that holds an internal state, which allows the UTM to select an action from a pre-defined set, given the internal state and the current symbol being read.  A typical graphical representation of a UTM looks like

[Figure: Turing_machine – a typical graphical representation of a Turing machine]

The UTM can represent the input to the decision process by a particular pattern of symbols on the strip.  It can also implement the steps of an algorithm by encoding them as symbols on the tape.  So the entire nature of the Decision Problem comes down to the question of whether, once the UTM starts, it will ever halt; hence the name of the chapter.
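To make the halting framing concrete, here is a minimal Turing machine simulator sketched in Python (a toy example of my own, not Casti’s); the program table plays the role of the ‘pre-defined set’ of actions mentioned above, and the open question is whether a given program and tape ever reach the halting state.

def run_turing_machine(program, tape, state='start', blank='_', max_steps=1000):
    # A minimal Turing machine: 'program' maps (state, symbol) to
    # (symbol_to_write, move, next_state), where move is -1 (left) or +1 (right).
    # The tape is a dict from cell index to symbol; missing cells read as blank.
    head = 0
    for _ in range(max_steps):
        if state == 'halt':
            return tape                      # the machine halted: a decision was reached
        symbol = tape.get(head, blank)
        if (state, symbol) not in program:
            return tape                      # no applicable rule: treat as a halt
        write, move, state = program[(state, symbol)]
        tape[head] = write
        head += move
    return None                              # never halted within max_steps: undecided

# A toy program: scan right over a block of 1s and append one more 1 (unary increment).
program = {
    ('start', '1'): ('1', +1, 'start'),
    ('start', '_'): ('1', +1, 'halt'),
}
print(run_turing_machine(program, {0: '1', 1: '1', 2: '1'}))   # the unary number 3 becomes 4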

Kurt Gödel took a different approach to answering the Decision Problem.  He developed a way to map any formal system to a set of numbers, called Gödel numbers.  Then, by formally manipulating these numbers, he was in a position to try to answer Hilbert’s challenge.

Now it is reasonable to suppose that not every question has a yes or no answer.  Questions of feeling, opinion, or taste spring to mind.  But I think that most of us expect that questions of mathematics and logic and programming have answers and that we should be able to figure them out.  The remarkable thing about the work of both Turing and Gödel is that there are cases where the formal logical system simply can’t decide.

In the language of Turing, there are computational problems for which there is no knowing beforehand if the algorithm will terminate with an answer.  In the language of Gödel, no matter what is done to a logical system in terms of refining existing axioms or adopting new ones, there will always be truths that are unprovable.

I was quite aware of Gödel’s theorem and even wrestled with it for a while when I was concerned about its implications for physical systems.  Eventually, I decided that while man’s logic may be limited, nature didn’t need to worry about it because she could make decisions unencumbered by our shortcomings.

I was also quite aware that Turing’s machine was used fruitfully as a model of computation.  What I was unaware of until reading Casti’s book was the parallel between the conclusions of Gödel and of Turing.

And here we arrive at the source of my excitement.  As I’ve already said, I remain convinced that Nature can decide – that is to say that Nature is free of the issues discussed above.  And yet, in some capacity the Universe is an enormous Turing machine.

[Figure: Universe_as_Turing – the Universe pictured as a Turing machine]

So why does Nature always make a decision, and why does the computation of trajectories, the motion of waves, and the evolution of quantum bits of stuff always reach a decision?  It may not be the decision we expect or anticipate, but it seems clear that a decision is reached.  And so, by looking at how the Universe as a Turing Machine (UaaTM) differs from a Universal Turing Machine, one may learn more about both.  This is the exciting new idea that has occurred to me and one which I will be exploring in the coming months.

States, Abstraction, and Implicit Objects

This post grew out of a conversation in which I recently participated, about the correct way to computationally model a system.  The conversation covered the usual strategies of identifying the system’s states and how to abstract the appropriate objects to hold or contain the states based on the question at hand.  But what was most interesting were the arguments put forward about the importance of recognizing not only the explicit objects used in the modeling but also the implicit objects that were ignored.

By way of a general definition, an implicit object is an object whose existence is needed in the computer modeling of the system in question but one that is not represented in code as either a real or virtual class.  I don’t think this definition is used in computer science or programming books but it should be.  This is a point that is almost entirely ignored in the field but is vital for a correct understanding of many systems.

A simple example will go a long way to clarify the definition and to also raise awareness that implicit objects are used more often than people realize.  The system to be modeled is the motion of traffic on a highway.  For simplicity, we’ll focus on two cars, each traveling along a straight stretch of road.  Generalizations to more complicated situations should be obvious but don’t add anything to the discussion.  A picture of the situation is

[Figure: cars_on_road – a blue car and a green car traveling along a straight stretch of road]

Here we have a blue car and a green car moving to the right as denoted by the matching colored arrows.  The typical abstraction at this point is to create a car class with appropriate attributes and member functions for the questions we want to answer.  Let’s suppose that we are interested in modeling (with an eye towards prevention) the collision between the two cars.  What ingredients are needed to perform this modeling?

A collision between two objects occurs, by definition, when they try to occupy the same space at the same time.  Unpacking this sentence leads us to the notion of position for each car as a function of time and a measure of the physical extent of each car.  Collision occurs when the relative position between the centers of the two cars is such that their physical extents overlap.

[Figure: collision – the relative position at which the two cars’ physical extents overlap]

Now, real life can be rather complicated, and this situation is no different.  The physical shape of a car is generally hard to model with many faces, facets, and extrusions.  And the position of the car is a function of what the driver perceives, how the driver reacts, and how much acceleration or deceleration the driver imposes.

In time-honored fashion, we will idealize the cars as having a simple two-dimensional rectangular shape as shown in the figure, and will also assume that the cars have a simple control law (which we don’t need to specify in detail here except to say that it depends on the position, velocity, and time) that gives the velocity of the car at any instant of time.  With these simplifying assumptions, the trajectory of the car is completely specified by giving its position at some initial time and then using the control law to update the velocity at each time step.  The position is then updated using the simple formula ${\vec x}(t+dt) = {\vec x}(t) + {\vec v} dt$.

With all these data specified, we might arrive at a class definition for the car object that looks like:

[Figure: car_class – sketch of the car class definition]
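A minimal Python sketch of what such a class might look like (the attribute and method names here are illustrative guesses based on the discussion above, not a definitive design):

class Car:
    # Idealized car: a 2-D rectangle moving along a straight road.
    # Positions are measured in the (implicit) shared road frame, and each
    # car carries its own copy of the (implicitly shared) clock.
    def __init__(self, x, y, vx, vy, length, width, time=0.0):
        self.x, self.y = x, y          # position of the car's center
        self.vx, self.vy = vx, vy      # velocity components
        self.length = length           # extent along the road
        self.width = width             # extent across the road
        self.time = time

    def control_law(self, t):
        # Placeholder control law: return the commanded velocity at time t.
        return self.vx, self.vy

    def update(self, dt):
        # Advance one step: x(t+dt) = x(t) + v*dt.
        self.vx, self.vy = self.control_law(self.time)
        self.x = self.x + self.vx * dt
        self.y = self.y + self.vy * dt
        self.time = self.time + dt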

At this point, we’ve already encountered our first two implicit objects.

The first one is a coordinate frame, comprised of an origin and an orientation of the coordinate axes, which tells us what the position of the object is measured against.  This coordinate frame is an object in its own right and, in many circles, worthy of study.  Here it is simply important to acknowledge that it exists and that the width and length parameters need to be defined consistently with regards to it.  This is an important point, since a change in coordinate frame orientation will change the test (to be described later) for the geometric condition of overlap (i.e., collision).

The second implicit object is much more subtle and it took the discovery of the principles behind special relativity to drive that point home.  Both cars share a common clock to describe their motion.  That is to say that their velocities are only meaningfully compared if their clocks are synchronized and ticking at the same speed.  The role of the implicit clock is usually carried out by the main program of the simulation but, since each object holds its own time, care must be exercised to keep them synchronized.  It is possible to make this common-clock assumption stronger by eliminating time from the list of member data, but that approach also has its drawbacks, as will be touched on below.

The final question is, how do we determine if the cars collide somewhere during the simulation?  There are two possible approaches.  The first is that we give each car a new member function, called detect_collision, which takes as arguments the position of the other car and its size.  This is not an unreasonable approach for simple situations, but it quickly becomes unwieldy should we later want to add fidelity to the simulation.  For example, if we wanted to allow for the body of the car changing its orientation as it changes lanes, then the number of parameters that we have to pass into the detect_collision function has to increase.  A more robust way is to perform this computation at the main workspace in the simulation.  If this method is chosen, we’ve now introduced a third implicit object.  It’s not quite clear from the context what to call this object, but its function is absolutely clear – it communicates to the world the message that the two cars have collided.
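As a sketch of the overlap test itself (using the hypothetical Car attributes from the earlier sketch and the idealized axis-aligned rectangles), the check performed in the main workspace might look like:

def detect_collision(car_a, car_b):
    # Axis-aligned rectangle overlap: the cars collide when their extents
    # overlap both along the road (x) and across the road (y).
    overlap_x = abs(car_a.x - car_b.x) < 0.5 * (car_a.length + car_b.length)
    overlap_y = abs(car_a.y - car_b.y) < 0.5 * (car_a.width + car_b.width)
    return overlap_x and overlap_y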

This last notion may seem like a trivial observation leading to an unnecessary complication, but consider the case where each car is equipped with some sensor that warns of an imminent collision.  Remote sensing requires some model of the medium between the sensor and the thing being sensed.  If the collision detector uses sound, then the air between the cars must be modeled as an elastic medium, with a finite propagation speed of the signal, and variations in this medium due to temperature, humidity, and the like may also need to be modeled.  If the collision detector uses electromagnetic waves (radar or laser ranging), then the index of refraction and dispersion properties of the air become the key parameters, which again may depend on the thermodynamics of the atmosphere.  In either case, this communicating medium must now be promoted from an implicit object to an explicit one, and the times of the two objects take on new meaning as we need to track the transmit and receive times of the signals sent between them.

The lesson, I hope, is clear.  It is just as important to pay attention to the implicit objects in a simulation as it is to the explicit ones.  There is a parallel here with the spoken word – where it is often as important to pay attention to what was left unsaid as it is to what was said.

Finding the Socratic Method

There is a standard debate in mathematics about the application of the terms ‘invention’ versus ‘discovery’.  It resurfaced the other day when a colleague and I were talking about some mathematical graffiti that adorned a door jamb in the conference room in which we were meeting.  This graffiti took the form of some mathematical symbols printed on a magnet held in place at the top part of the door.  None of us in the room were able to determine what, if any, message was being sent but, in the process of discussing the possible meaning, my colleague said, in passing, that Pythagoras had invented the theorem that bears his name.  I questioned whether the verb should have been ‘discovered’ rather than ‘invented’.  We spent a few minutes discussing that point, then we gave up altogether and went our separate ways. On the drive home that evening I began to think about the proper use of those two words and I finished up wondering if Socrates invented or discovered his famous method.

To understand the distinction between ‘invent’ and ‘discover’, let’s return to the Pythagorean Theorem for a moment.  Most everyone knows the theorem, namely that the square of the length of the hypotenuse of a right triangle is equal to the sum of the squares of the lengths of the other 2 sides.  The most common proof and, in my opinion, the most elegant, draws a right triangle with squares of the appropriate areas on each side.  I’ve provided such a diagram for a 3-4-5 Pythagorean triangle below.

[Figure: PT_triangles – a 3-4-5 right triangle with a square drawn on each side]

The proof proceeds by laying the bigger of the two 90-degree side squares onto the hypotenuse square

[Figure: PT_overlay – the larger of the two side squares laid onto the hypotenuse square]

and then adding up the remaining area and showing that it is equal to the area of the smaller square

[Figure: PT_small_area – the remaining area equals the area of the smaller square]

Of course, the steps shown above were quickly done in PowerPoint graphics and there is no reason for a skeptic to actually accept them as proof.  But the doubters can go at this ‘proof’ with whatever vigor they desire.  The answer will always be the same:  $a^2 + b^2 = c^2$.

And that brings us to the point about invention versus discovery.  I would argue that the Pythagorean Theorem is an exercise in discovery.  The finding that all right triangles satisfy it is exactly the point – we find or discover that all right triangles obey this relationship.

Contrast this with Edison’s invention of the light bulb.  I say invention because of a number of factors.  First, there is the particular form of the object in question.  The base, with its two contacts, one at the bottom center and one on the periphery, is a choice of form that could have been made many other ways.  The shape of the bulb itself is only a suggestion of what could be done given the state of the art of glass blowing at the time of its introduction to society.  Second, there is the particular design and implementation.  The placement of the filament and the current and voltage running through it were all carefully chosen to meet specific goals or requirements.  The materials that comprise all the parts were chosen to provide the maximum economy based on availability and convenience in pre-existing manufacturing processes.  So I would argue that, while it is proper to say that the laws and properties of electricity were discovered, the light bulb is a true invention.

So far so good, but what about Socrates?  Did he discover or invent the Socratic Method?  As a quick review, a few brief words are in order about what I mean when I say the Socratic Method (ironically, if you follow the link to the Wikipedia article, you’ll find that both ‘invention’ and ‘discovery’ are used in describing Socrates’s contribution).

The Socratic Method is best explained by the Platonic dialog called Euthyphro.  In this dialog, we find Socrates and Euthyphro both showing up at the Athenian court but for very different reasons.  Socrates is answering a call by the court to give an account of his ‘criminal ways’, whereas Euthyphro intends to petition the court to bring a charge of murder against his father for the death of a slave in his possession.  The two meet on the steps leading inside and exchange with each other their reasons for being there.  Socrates expresses surprise that Euthyphro is accusing his father of murder, since the slave in question died from being imprisoned for murdering another slave.  Euthyphro says that he is compelled to this course of action by his piety.  That’s all the prompting needed by Socrates, and soon the two are engaged in a discussion where Socrates asks a question like ‘what is piety’ and Euthyphro attempts to answer with a response like ‘what is pleasing to the gods’.  After each answer, Socrates questions some new part of the response as a way of sharpening the reasoning behind it.

The Socratic Method is a way of examining the logical content of a statement by carefully examining the basic notions that make up that statement.  So, asking what do ‘piety’, ‘pleasing’ and ‘gods’ mean is a way of finding the truth. Generally, when the method is applied, we are more apt to find out what a particular concept, like ‘piety’, isn’t, rather than finding out what it is.  Most of the dialogs (and, for that matter, modern applications) end with both parties departing before the full meaning has been established but at least with a clearer picture of what is not meant.

So, with all the preliminaries out of the way, the key question to grapple with is whether Socrates invented this method or discovered it.  My vote is for discovery.  I say this mostly because of the universal nature of this mode of inquiry, but partly because Socrates believed in Truth in the most absolute sense.  If he invented this type of intellectual exploration, then the application of it would be necessarily limited to those contexts where its design matched a particular cast of mind or cultural milieu. The fact that it is a successful philosophical pursuit the world over is testament to its ability to transcend the accidentals of human culture.  The fact that it was fashioned with the goal of discovering Truth through logic and reason and that Socrates believed in Truth leads me to believe that he would agree that he discovered his method.

I am willing to say that Plato invented the particular encounters presented in the dialogs and that Socrates invented the accidental trappings whereby he applied his method to the Athenian society.  Recognizing these inventions is equivalent to recognizing the invention of the particular symbols for denoting an algebraic quantity as ‘$a$’ or multiplication by ‘$\times$’ or equality as ‘$=$’.  Writing $c \times c = a \times a + b \times b$ versus saying ‘the square of the hypotenuse is equal to the sum of the squares of the other two sides’ are two different, invented ways for expressing the same truth.

Language and Metaphor

Why do we quote movies?  I often think about this fascination we have as a culture with repeating and reliving moments from our favorite films.  Large numbers of clips exist on YouTube devoted to capturing that magical moment when a character utters an unforgettable line.  But why?  What is it that drives us to remember a great line or identify with the person who uttered a quote?

This question came up over the dinner table one night and, as I reflected on it, my thoughts were drawn to an explanation about speed reading that I had once come across.  The author went to great trouble to make the point that the trick to speed reading is to see groups of words in a sentence as a single chunk of information.

To understand that point, consider the words you read in a sentence.  To make it concrete, let’s take the word ‘understand’.  When you read the word ‘understand’, you are seeing a group of 10 individual letters, but are you really conscious of each letter, one at a time?  Do you really see a ‘u’ followed by an ‘n’ followed by a ‘d’ and so on?  No.  What each of us with any sophistication in reading accomplishes is to perceive these 10 letters as a unit that conveys the word ‘understand’.  Of course, this is why we can read a sentence with a misspelling quite comfortably and often may not even notice.

This idea of chunking and clumping comfortably scales upward to where common phrases can be consumed at a glance without a detailed examination of the individual words.  One common approach is to abbreviate the phrase into some short form.  Expressions like ‘LOL’ or ‘BTW’ or any of the other text and chat speak concepts are excellent examples.  The other approach is to pick lyrical or repetitive expressions to encapsulate a phrase.  The famous ‘rosy-fingered dawn’ from the Homeric epics and ‘ready, set, go!’ are examples of these kinds, respectively.

But there is an even more compelling approach – the concept of the metaphor.  The idea here is that we liken an object to another object, one with well-known properties.  The properties of the second object are then inherited by the first simply by the equating of the names.  Some examples of this include the following sentences.

  • ‘That guy’s a Benedict Arnold for leaving our team for that other one.’
  • ‘How was the date last night, Romeo?’
  • ‘Stay away from her, she’s Mata Hari through and through!’
  • ‘That man is death itself’.

I think that our desire to quote movies is indicative of this.  That by repeating the dialog of the movie, the quote itself becomes a cultural metaphor for the feelings and experiences expressed in the movie.  This idea was brilliantly explored in the Star Trek: The Next Generation episode Darmok.

In this episode, the crew of the Enterprise are coaxed into a meeting with a Tamarian ship in orbit around a mostly unexplored planet.  Despite their universal translator, no ship of the Federation had ever been able to crack the Tamarian language.  The words themselves could be translated, but the syntax and context seemed to be utterly devoid of meaning.  The Tamarian captain, Dathon, seeing that this meeting was no different from previous encounters and that the ordinary avenues for communication were not working, arranged for himself and Captain Picard to be trapped on a planet.

On that planet, both captains were confronted by a dangerous creature.  This shared danger spurred a meeting of the minds and eventual understanding dawned on Picard.  The Tamarian race thought and communicated in metaphors.  They would say statements like ‘Temba, his arms wide’ to mean the concept of giving or generosity.  Back on the Enterprise, the crew had come to a similar epiphany.  By analogy, they constructed a Tamarian-like way of expressing romance by saying ‘Juliet on her balcony’, but they lamented that without the proper context, in this case Shakespeare’s tragic play about Romeo and Juliet, one didn’t know who Juliet was and why she was on her balcony.

The episode closes with Dathon dying from the wounds inflicted by the creature, and with Picard arriving back aboard the Enterprise just in time to make peace between the Tamarians and the Federation by speaking their language.

The episode left some dangling ideas: how do the Tamarians specify an offer that involves a choice among many things, or express an abstract idea, like giving someone his freedom?  Nonetheless, it was a creative and insightful way of exploring how powerful metaphor can be and how abstract the thought that lies behind it can be.

So, the next time you quote a movie, give a thought to the metaphor that you are tapping into and take a moment to marvel at the miracle of speech and thought.

Aces High

This week’s column has a three-fold inspiration.  First off, many of the most exciting and controversial philosophical endeavors involve arguing from incomplete or imprecise information to a general conclusion.  These inferences fall under the heading of either inductive or abductive reasoning, and most of the real sticking points in modern society (or any society for that matter) revolve around how well a conclusion is supported by fact and argument.  The second source comes from my recent dabbling in artificial intelligence.  I am currently taking the edX course CS188.1x, and the basic message is that the real steps forward in the AI landscape came after the realization that computers must be equipped to handle incomplete, noisy, and inconsistent data.  Statistical analysis and inference deals, by design, with such data, and its use allows an algorithm to make a rational decision in such circumstances.  The final inspiration came from a fall meeting of my local AAPT chapter at which Carl Mungan of the United States Naval Academy gave a talk.  His discussion of the two aces problem was nicely done, and I decided to see how to produce code in Python that would ‘experimentally’ verify the results.

Before presenting the various pieces of code, let’s talk a bit about the problem.  This particular problem is due to Boas in her book Mathematical Methods in the Physical Sciences, and goes something like this:

What is the probability that, when being dealt two random cards from a standard deck, the two cards will be two aces?  Suppose that one of the cards is known to be an ace, what is the probability?  Suppose that one of the cards is known to be the ace of spades, what is the probability?

 

The first part of this three-fold problem is very well defined and relatively simple to calculate.  The second two parts require some clarification of the phrase ‘is known to be’.  The first possible interpretation is that the player separates out the aces from the deck and either chooses one of them at random (part 2) or he chooses the ace of spades (part 3).  He returns the remaining 3 aces to the deck, which he subsequently shuffles prior to drawing the second card.  I will refer to this method as the solitaire option.  The second possible interpretation (due to Mungan) is that a dealer draws two cards at random and examines them both while keeping their identity hidden from the player.  If one of the cards is an ace (part 2) or is the ace of spades (part 3), the dealer then gives the cards to the player.  Parts 2 and 3 of the question then ask for the probability that the hands that pass this inspection step actually have two aces.  Since this last process involves assistance from the dealer, I will refer to it as the assisted option.  Finally, all probabilities will be understood from the frequentist point of view.  That is to say that each probability will be computed as a ratio of desired outcomes to all possible outcomes.

With those preliminary definitions out of the way, let’s compute some probabilities by determining various two-card hand outcomes.

First let’s calculate the number of possible two-card hands.  Since there are 52 choices for the first card and 51 for the second, there are 52×51/2 = 1326 possible two-card hands.  Of these, there are 48×47/2 = 1128 possible two-card hands with no aces, since the set of cards without the aces is comprised of 48 cards.  Notice that, in both of these computations, we divide by 2 to account for the fact that order doesn’t matter.  For example, a 2 of clubs as the first card and a 3 of diamonds as the second is the same hand as a 3 of diamonds as the first card and a 2 of clubs as the second.

[Figure: no_aces – the 48-card set containing no aces]

Likewise, there are 48×4 = 192 ways of getting a hand with only 1 ace.  Note that there is no need to divide by 2, since the two cards are drawn from two different sets.

[Figure: one_ace – pairing one of the 4 aces with one of the 48 non-ace cards]

Finally, there are only 6 ways to get a two-ace hand.  These are the 6 unique pairs that can be constructed from the green set shown in the above figure.

As a check, we should sum the sizes of the individual sets and confirm that the total equals the number of all possible two-card hands.  This sum is 1128 + 192 + 6 for no-ace, one-ace, and two-ace hands, and it totals 1326, which is equal to the number of two-card hands.  So, the division into subsets is correct.

With the size of these sets well understood, it is reasonably easy to calculate the probabilities asked for in the Boas problem.  In addition, we’ll be in a position to determine something about the behavior of the algorithms developed to model the assisted option.

For part 1, the probability is easy to find as the ratio of all possible two-ace hands (6 of these) to the number of all possible two-card hands (1326 of these).  Calculating this ratio gives 6/1326 = 0.004525 as the probability of pulling a two-ace hand from a random draw from a well-shuffled standard deck.

For parts 2 and 3 in the solitaire option, the first card is either given to be an ace or the ace of spades.  In both cases, the probability of getting another ace is the fraction of times that one of the three remaining aces is pulled from the deck that now holds 51 cards.  The probability is then 3/51 or 0.05882.

The answers for the assisted option for parts 2 and 3 are a bit more subtle.  For part 2, the dealer assists by winnowing the possible set down from 1326 possibilities associated with all two-card hands, to only 192 possibilities associated with all one-ace hands, plus 6 possibilities associated with all two-ace hands.  The correct ratio is 6/198 = 0.03030 as the probability for getting a single hand with two aces when it is known that one is an ace.  For part 3, the dealer is even more zealous in his diminishment of the set of possible hands with one card, the ace of spades.  After he is done, there are only 51 possibilities, of which 3 are winners, and so the correct ratio is 3/51 = 0.05882 as the probability of getting a single hand with two aces when it is known that one is the ace of spades.
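As a quick cross-check of these counts and ratios, here is a sketch using Python’s standard library (the Monte Carlo verification follows below):

from math import comb

total    = comb(52, 2)      # all two-card hands: 1326
no_ace   = comb(48, 2)      # hands with no aces: 1128
one_ace  = 48 * 4           # hands with exactly one ace: 192
two_aces = comb(4, 2)       # hands with two aces: 6
assert no_ace + one_ace + two_aces == total

print(two_aces / total)                   # part 1: 6/1326, about 0.004525
print(3 / 51)                             # solitaire parts 2 and 3, and assisted part 3: about 0.05882
print(two_aces / (one_ace + two_aces))    # assisted part 2: 6/198, about 0.03030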

All of these probabilities are easily checked by writing computer code.  I’ve chosen Python because it is very easy to perform string concatenation and to append and remove items from lists.  The basic data structure is the deck, which is a list of 52 cards constructed from 4 suits (‘S’,’H’,’D’,’C’), ranked in traditional bridge order, and 13 cards (‘A’,’2’,’3’,’4’,’5’,’6’,’7’,’8’,’9’,’10’,’J’,’Q’,’K’).  Picking a card at random is done by importing the random package and using random.choice, which returns a random element of the list passed into it as an argument.  Using the random package in the code turns the computer modeling into a Monte Carlo simulation.  For all cases modeled, I input the number of Monte Carlo trials to be N = 1,000,000.

Code to model part 1 and parts 2 and 3 for the solitaire option is easy to implement and understand, so I include only a minimal sketch of the part 1 case below.  The Monte Carlo results (10 cases each with N trials) certainly support the analysis done above, but, if one didn’t know how to do the combinatorics, one would only be able to conclude that the results are approximately 0.0045204 +/- 0.00017.
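A minimal sketch of the part 1 check (the solitaire cases for parts 2 and 3 follow the same pattern):

import random

suits = ['S', 'H', 'D', 'C']
cards = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
deck = [c + s for s in suits for c in cards]

def two_aces(num_mc):
    # Deal two random cards num_mc times and count the two-ace hands.
    num_good = 0
    for _ in range(num_mc):
        hand = random.sample(deck, 2)      # two distinct cards, order irrelevant
        if all(card.startswith('A') for card in hand):
            num_good += 1
    return num_good / num_mc

print(two_aces(1000000))   # should hover around 6/1326 = 0.004525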

The code to model parts 2 and 3 for the assisted option is a bit more involved because the dealer (played by the computer) has to draw a hand and then either accept or reject it.  Of the N Monte Carlo trials drawn, what percentage of them will be rejected?  For part 2, this amounts to determining the ratio of two-card hands that have at least one ace to all possible two-card hands.  This ratio is 198/1326 = 0.1493.  So, roughly 85 % of the time the dealer is wasting my time and his.  This lack of economy becomes more pronounced when the dealer rejects anything without the ace of spades.  In this case, the ratio is 51/1326 = 1/26 = 0.03846, and approximately 96 % of the time the dealer is throwing away the dealt cards because they don’t meet the standard.  In both cases, the Monte Carlo results support the combinatoric analysis with 0.03031 +/- 0.0013 and 0.05948 +/- 0.0044.

import random

suits = ['S', 'H', 'D', 'C']          # traditional bridge order
cards = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']

def make_deck():
    # Build the 52-card deck as strings like 'AS', '10H', 'KD'.
    deck = []
    for s in suits:
        for c in cards:
            deck.append(c + s)
    return deck

def is_ace(card):
    # Card strings start with 'A' only for aces.
    return card.startswith('A')

def one_card_ace(num_mc):
    # Assisted option, part 2: the dealer keeps only hands with at least
    # one ace; count how many of those are actually two-ace hands.
    num_times = 0
    num_good = 0
    deck = make_deck()
    for _ in range(num_mc):
        card1 = random.choice(deck)
        deck.remove(card1)               # draw the second card without replacement
        card2 = random.choice(deck)
        deck.append(card1)               # restore the deck for the next trial
        if is_ace(card1) or is_ace(card2):
            num_times = num_times + 1
            if is_ace(card1) and is_ace(card2):
                num_good = num_good + 1
    return [num_times, num_good]

def ace_of_spades(num_mc):
    # Assisted option, part 3: the dealer keeps only hands containing the
    # ace of spades (either card may be it); count the two-ace hands.
    num_times = 0
    num_good = 0
    deck = make_deck()
    for _ in range(num_mc):
        card1 = random.choice(deck)
        deck.remove(card1)
        card2 = random.choice(deck)
        deck.append(card1)
        if card1 == 'AS' or card2 == 'AS':
            num_times = num_times + 1
            if is_ace(card1) and is_ace(card2):
                num_good = num_good + 1
    return [num_times, num_good]

Notice that the uncertainty in the Monte Carlo results grows larger in part 2 and even larger in part 3.  This reflects the fact that the dealer only really affords us roughly 150,000 and 38,000 trials of useful information due to the rejection process.

Finally, there are a couple of philosophical points to touch on briefly.  First, the Monte Carlo results certainly support the frequentist point of view, but they are not actual proofs of the results.  Even more troubling is that a set enumeration, such as given above, is not a proof of the probabilities, either.  It is a compelling argument and an excellent model, but it presupposes that the probability should be calculated by the ratios as above.  Fundamentally, there is no way to actually prove the assumption that set ratios give us the correct probabilities.  This assumption rests on the belief that all possible two-card hands are equally likely.  This is a very reasonable assumption, but it is an assumption nonetheless.  Second, there is often an accompanying statement along the lines that the more that is known, the higher the likelihood of the result.  For example, knowing that one of the cards was an ace increased the likelihood that both were aces by a factor of 6.7.  While true, this statement is a bit misleading, since, in order to know, the dealer had to face the more realistic odds that 85 percent of the time he would be rejecting the hand.  So, as the player, our uncertainty was reduced only at the expense of a great deal of effort done on our behalf by another party.  This observation has implications for statistical inference that I will explore in a future column.

Random Thoughts on Randomness

One of the questions I like to indulge, from time to time, is “what would Aristotle’s reaction be to some fixture of modern life?” This week I decided to ponder what the famous Macedonian would say about the use of randomness, statistics, and probabilistic reasoning that pervades our daily existence.

On one hand, I think that he would be intrigued by the very way that randomness is built into nature and our interactions with her. Quantum mechanics has randomness at its very core. This type of randomness, sometimes referred to as aleatory (meaning depending on the dice) uncertainty, is (or at least seems to be) an inescapable facet of how matter works at its most basic level. The outcome of any experiment is probabilistic no matter how precisely controlled was the setup. I imagine the fascination and, perhaps, the disbelief that he would have displayed when first encountering the Heisenberg uncertainty principle. In fact, since I am fantasizing here, it would be really interesting to see these two great men sit down for lunch and a long philosophical discussion about the meaning of uncertainty and causality.

I also think that Aristotle would have quickly grasped how we apply uncertainty in our knowledge, sometimes referred to as epistemological uncertainty, to a host of important everyday activities.

I envision him nodding his head in approval to the sciences of chemistry, thermodynamics, and statistical mechanics. In each of these disciplines, the sheer amount of data needed to describe the motion of the individual atoms, molecules, or components is overwhelming, and one must sacrifice complete knowledge of the small in order to get good knowledge of the all. No doubt, the connection between information and entropy would consume much of his attention as he explored the combinatoric possibilities of very large numbers of objects.

Aristotle would also be at home in the discipline of statistics and estimation theory. As a skeptic of the universal validity of the Platonic forms, he well knew the individual differences in development and form of everything on the Earth. I picture him exclaiming, with satisfaction, “Yes! That’s how it should be done,” when he saw how statistical estimation theory is applied to turning noisy measurements into adequate conclusions. All the modern applications, from quality control to data fitting, would have caught his fancy.

Computing applications of randomness and uncertainty would also meet with his approval. Numerical techniques, such as the Monte Carlo method, that use random samples as a way of ferreting out details of a process would hold a particular fascination for him, and I imagine him spending hours playing with cellular automata simulations, such as Conway’s game of life. Most of all, the growing consensus in the artificial intelligence community that uncertainty is the vital component in producing machines that make rational decisions would have been pure delight for the Philosopher.

All that said, I think Aristotle would recoil in horror at that one, very large, component of randomness in modern life – the public’s approach to the scientific study.

Everywhere we go we are confronted by a headline, radio broadcast, news report, or tweet that says some study performed somewhere has determined something. Often times the public, even that component that is highly educated, consumes these studies with little or no heed. Most are unaware of how these studies are performed and how weak the inferences (i.e., conclusions) actually are.

If Aristotle were sitting amongst us when the news of the latest study were to break, he would be the first to caution us to be clear in our thinking and skeptical of the results. As a logician, he would know the difficulty in inferring a cause from the observed effect. And as a member of the Athenian polis, he would know better than we the various conflicting political agendas that drive people to various conclusions.

But the most grievous complaint that Aristotle would level against modern scientific studies would come only after he returned to the academy to see just how ‘the sausage is made’. He would be shocked and repulsed by the bastardization of the academic process. Not because of the political pressures and the conflicts of interest due to funding or the lack of it. No, he would be repulsed with how little regard our institutions of higher learning have for the concept of critical thinking.

To be a skeptic, to doubt a conclusion, to question and probe are no longer virtues (in the strictest Aristotelian sense). These traits brand their bearer not as a philosopher but as a dullard or, worse, a denier. Graduate students, post docs, and professors worldwide are increasingly drifting further away from intellectual integrity. It is enough, they say, to find a positive or negative correlation in some data, perform a hypothesis test, and publish a result (see, e.g., the critiques leveled by Michael Starbird in Meaning from Data: Statistics Made Clear). Damn critical thinking and the discomfort that it brings, they say. Aristotle, of course, would say differently.

The Dangers of Being Equal

One of the favorite themes of this blog is how language and reasoning affect each other, sometimes to the detriment of both.  The overlap between logical reasoning and mathematical language is particularly rife with possibilities for confusion because of the way certain concepts are used contextually.  In an earlier post I discussed the seductive properties of the humble symbol $\infty$.  A far more deadly symbol is the highly overloaded glyph described by two parallel horizontal lines – the equal sign ‘$=$’.

There are so many contextual uses of the equal sign that it is hard to know where to start.  And each and every one of them is sinister to the untrained.  Like some kind of bizarre initiation ritual, we subject students of all stripes to this ambiguous notation, and then we get frustrated when they don’t grasp the subtle distinctions and shaded nuances of meaning that we take for granted.  This situation closely parallels the experiences many of us have had learning how to swim, or ride a bike, or ice-skate, or drive.  Those of us who know how to do something, often can’t remember how hard it is to learn when you don’t know.

Of course, this situation is not unprecedented in language.  A simple internet search using the string ‘word with the most definitions’ returns the following statement from Dictionary.com

“Set” has 464 definitions in the Oxford English Dictionary. “Run” runs a distant second, with 396. Rounding out the top ten are “go” with 368, “take” with 343, “stand” with 334, “get” with 289, “turn” with 288, “put” with 268, “fall” with 264, and “strike” with 250.

So, functionally overloading a word with numerous meanings, some of them very closely related and some of them quite distinct, is commonplace.

What makes the equal sign so frustrating is that it is mostly applied in highly technical fields where shades of meaning in thought can have large implication in outcomes.  Consider the differences in meaning in the following equations:
\[ \pi = \frac{C}{D} = \frac{C}{2 r} \]
and
\[ \pi = \ln\left( i^{-2i} \right) \]
and
\[ \pi = \sqrt{6 \sum_{n=1}^{\infty} \frac{1}{n^2}} \; . \]

Each of them tells us something about the irrational number $\pi$, but in very different ways.  In the first equation, we think of $\pi$ as the assigned value for the ratio between the circumference of a circle, $C$, and its diameter, $D$.  This concept is purely geometric, and can be explored with rulers and compasses and pieces of paper.  In some sense, it can even be regarded as a causative relation, telling us that, if we make a circle of radius $r$, then we are making an object whose perimeter is a distance $C$.  The second equation is an identity in the purest sense of that term.  It boldly states that one of the many disguises of $\pi$ is an algebraic expression involving the natural logarithm and the imaginary number $i$.  The final equation is neither an assignment nor an identity, but a set of instructions saying ‘if you want to know how to compute $\pi$ to some accuracy, then set up a computing process that takes the first $n$ integers and combines them in this funny way.’

The science of computing has long recognized that the usual ambiguity of human language would be inadequate for machine instructions.  All programming languages to which I’ve been exposed clearly distinguish between the concepts of assignment, equivalence, and function definition.  Using the pi equations above, one might express them in the programming languages Python and Maxima as

Pi equation: \[ \small  \pi = \frac{C}{2r} \]

Python (assignment):

pi = C/(2*r)

Maxima (assignment):

pi : C/(2*r)

Pi equation: \[ \small \pi = \ln\left(i^{-2i}\right) \]

Python (test for identity):

pi == ln(i**(-2*i))

Maxima (identity/equation):

pi = ln(i**(-2*i))

Pi equation: \[ \small \pi = \sqrt{6 \sum_{n=1}^{\infty} \frac{1}{n^2}} \]

Python (function definition):

def sum_sqr(n):
    # Partial sum of 1/k**2 for k = 1..n.
    total = 0.0
    for k in range(1, n+1):
        total = total + 1.0/(k*k)
    return total

def approx_pi(n):
    # Approximate pi from the first n terms of the series.
    return (6*sum_sqr(n))**0.5

Maxima (function definition):

calc_pi(n) :=
  block([total],
    total : 0.0,
    for k : 1 thru n do
      total : total + 1.0/(k*k),
    sqrt(6*total));

Note that, in each case, there is a clear syntactical difference between assignment (‘$=$’ or ‘$:$’), the conditional test for identity (‘$==$’ or ‘$=$’), and functional definition (‘def’ or ‘$:=$’).  For anyone who’s been programming for some time, switching back and forth between these ideas of assignment, equivalence, and definition is relatively effortless, but for the beginner it is one of the hardest concepts he will have to learn.

The situation is even more complex in the physical sciences, for two primary reasons.  First, and foremost, because man has been investigating the physical world longer than he has been writing computer programs.  As a result, there has been more time for man to layer different meanings and subtle distinctions.  Second, computers are intrinsically stupid and require a high degree of precision and clarity to function.  A nice discussion of this last point can be found in the prologue of the book Functional Differential Geometry by Sussman and Wisdom.

As an example, let’s look at perhaps the most famous physical statement – Newton’s second law.  Many people, even those lacking formal training in science, know that the expression of the law is ‘force equals mass times acceleration’ or, in mathematical terms,

\[ \vec F = m \vec a \; . \]

But what does the equal sign here mean?  The concept of a force tells us that it is a vector quantity that transforms like a position vector.  That means that a force relationship is the same in all frames.  For example, the balancing of the pulls from three ropes tied to an object such that the object doesn’t move is an equilibrium condition that is independent of the frame in which it is expressed.  An accelerating observer will make the same conclusion as an inertial observer. So the force on the left-hand side of $f=ma$ is geometric in its meaning.

On the other hand, we understand that the acceleration appearing on the right-hand side is kinematic.  It describes an object’s motion and it’s the kind of thing measured with rulers and clocks.  It is fundamentally frame dependent when described by an accelerating observer.  Just imagine the visual perception of someone on a merry-go-round.  The mass, which measures the object’s unwillingness to move under the influence of a force, simply scales the acceleration and can be regarded as constant.

So how do we reconcile what the equal sign means here?   On one side is a geometric quantity as immutable and placid as a mountain.  The other side is as ephemeral as rising mist or running water, flowing to and fro.  How can they actually be equal?

Well, the answer is that the equal sign should be regarded as relating cause and effect.  If we regard the force as known (e.g., Newton’s universal law of gravity), then the equal sign allows us to deduce the resulting motion once the force is applied.  If we regard the acceleration as known (e.g., we film the motion and do a frame analysis), we can infer (via abductive reasoning) the force that caused it.
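As a cartoon of this ‘force as cause’ reading, here is a sketch (with made-up numbers, not a serious integrator) that regards the force as known and marches the motion forward in time; the equal sign then deduces the effect (the motion) from the cause (the force).

def integrate_motion(force, m, x0, v0, dt, n_steps):
    # Given a known force law F(x, v, t), deduce the motion with Euler steps:
    # a = F/m, then v <- v + a*dt and x <- x + v*dt.
    x, v, t = x0, v0, 0.0
    trajectory = [x]
    for _ in range(n_steps):
        a = force(x, v, t) / m
        v = v + a * dt
        x = x + v * dt
        t = t + dt
        trajectory.append(x)
    return trajectory

# Uniform gravity near the Earth's surface as the known 'cause'; the deduced
# 'effect' is the familiar accelerating fall of a dropped object.
heights = integrate_motion(lambda x, v, t: -9.81, m=1.0, x0=100.0, v0=0.0, dt=0.01, n_steps=100)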

Clearly, the innocent-looking ‘$=$’ packs a lot more meaning than at first it appears. It is interesting to ponder why it is that the shortest of strings, such as ‘$\infty$’, or ‘set’, or ‘$=$’, have the longest and deepest of meanings. Maybe it reflects the subtlety of the human mind.

It Isn’t Elementary

I suppose that this post grew out of a recent opportunity I had to re-watch the M*A*S*H television series. One particular episode, entitled The Light That Failed, finds the army surgeons and nurses of the 4077th unit suffering through a brutal Korean winter and desperately low on supplies.  The supply truck soon arrives bearing a cargo more suited for a military unit in Guam or the Philippines than mainland Asia.  As the troops are left wondering what they can do with mosquito netting, ice cream churns, and swim fins when the temperature is hovering around freezing, someone notices that one of the doctors, B J Hunnicutt, has received a very rare object – a book.

And this is not just any book, but a murder mystery by the famed writer Abigail Porterfield called The Rooster Crowed at Midnight.  Either out of the goodness of his heart or out of a desire to end the constant nagging (probably both), B J decides to tear portions of the book out so that it can circulate throughout the camp.  As the old saying goes, no good deed goes unpunished, and soon he discovers that the last page of the book, in which all is revealed, is missing.  And thus begins the great debate as to who committed the murders, how they did it, and why.

The team comes up with many answers, all of which are first widely embraced as the solution and then scuttled when someone gives a counterexample that pokes a hole in the theory.  They eventually place a long distance phone call to the author herself, now residing in Australia, to get the answer.  But even this authoritative voice doesn’t quell the skepticism.  Shortly after they ring off, Col. Potter, the commanding officer, points out that Abigail Porterfield’s own solution can’t be true.  The episode closes with the delivery of the much-needed supplies, and some comic hijinks, but with no satisfactory explanation as to who the culprit was.

I was in middle school when I first saw that episode and it left a lasting impression on me.  For many years I carried misconceptions about mystery stories and I wondered why anyone would ever read them.  In particular, I held a very skewed idea about deductive reasoning and what can and cannot be determined from the evidence.  With the perspective of years (really decades) I am both happy and disappointed to say that I was not alone in my poor understanding of what logic and reason are capable of achieving.

Let’s talk about deductive logic first.  The basic idea behind deductive logic is that the conclusion is guaranteed to be true if the premises are true.  It is a strong approach to logic, since it argues from first principles that apply to a broad class or set of objects and, from those, narrows down a conclusion about a specific object.  In a pictorial sense, deductive logic can be thought of in terms of Venn diagrams: to conclude something about an object, we need only know into what categories or classes it falls, and the intersection of those categories tells us exactly what can be concluded about it.

Deductive reasoning is, unfortunately, also limited by the fact that no one is born with, nor ever acquires, the universe’s user manual spelling out in detail what attributes each object has and into what categories it may be grouped.  So the standard objections raised against a deductive argument fall squarely on disagreements about the truth of one or more premises.

For example, the syllogism

  • All men are mammals
  • George is a man
  • Therefore George is a mammal

is a logically correct deduction, since the conclusion follows from the premises, and it is true since the premises are true (or at least we regard them as true).  The syllogism

  • All men are mammals
  • George is a mammal
  • Therefore George is a man

is invalid, even though its premises are true: knowing that George is a mammal does not place him among the men, since the first premise runs in only one direction.  In contrast, the syllogism

  • All white birds are man-eaters
  • All swans are white birds
  • Therefore all swans are man-eaters

is perfectly valid, since the conclusion follows from the premises, but its conclusion is not true, since neither premise is true (or so I hope!).
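
To make the Venn-diagram picture concrete, here is a toy sketch in Python; the sets and their members are invented purely for illustration.  It shows why the first syllogism is forced by its premises while the second is not.

    # Toy universe, invented only to illustrate the Venn-diagram picture of deduction
    mammals = {"George", "Tim", "Fido"}   # Fido is the neighbor's dog
    men = {"George", "Tim"}               # 'All men are mammals': men is a subset of mammals

    # Valid syllogism: the premises force the conclusion
    assert men <= mammals                 # premise: all men are mammals
    assert "George" in men                # premise: George is a man
    print("George" in mammals)            # True, and it cannot be otherwise

    # Invalid syllogism: membership in the larger set says nothing about the smaller one
    assert "Fido" in mammals              # premise: Fido is a mammal
    print("Fido" in men)                  # False; the 'conclusion' fails even though the premises hold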

All of this should be familiar.  But what to make of this ‘deduction’ offered by B J Hunnicutt in the episode (call it B J’s syllogism):

  • Lord Cheevers was murdered in the locked library in Huntley Manor
  • Randolf had motive for murdering Lord Cheevers
  • Randolf played in Huntley Manor as a child
  • Randolf would have known if there were secret passages in Huntley Manor
  • Therefore Randolf was the murderer.

Is this really a deduction as he claimed?  According to the novel, the first three premises are true.  The fourth premise is certainly plausible but is not necessarily true.  How then should we feel about the conclusion?  What kind of logic is this if it is not deductive?  Suppose we knew that Randolf was the murderer (e.g., we caught him in the act). What can we infer about the fourth premise?

Before answering these questions, consider what would happen if we were to modify the argument a bit to simplify the various possibilities.  The syllogism (B J’s syllogism revised) now reads

  • Lord Cheevers was murdered in the locked library in Huntley Manor
  • Randolf had motive for murdering Lord Cheevers
  • Randolf played in Huntley Manor as a child
  • Randolf knew there was a secret passage from the study to the library
  • Randolf was seen entering the empty study just before the murder
  • Therefore Randolf was the murderer.

This argument is certainly stronger than the first one proffered, but it still isn’t conclusive.  Again, though, how should we feel about the conclusion?  What kind of logic is this?

In both cases, we know that the conclusion is not iron-clad; that it doesn’t necessarily follow from the premises.  But just like those fictional characters in M*A*S*H, we are often faced with the need to draw a conclusion from a set of premises that do not completely ‘nail down’ an unequivocal conclusion.

The type of logic that deals with uncertainty falls under the broad descriptions of inductive and abductive reasoning.  Inductive reasoning allows us to draw a plausible conclusion ‘B’ from a set of premises ‘A’ without ‘B’ necessarily following from ‘A’.  Abductive reasoning allows us to infer the premise ‘A’ based on our knowledge that outcome ‘B’ has occurred.

In the M*A*S*H examples given above, B J’s revised syllogism is an example of inductive reasoning.  All the necessary ingredients are there for Randolf to have committed the crime but there is not enough evidence to inescapably conclude that he did. We can infer that Randolf is the killer but we can’t conclude that with certainty.

B J’s original syllogism is a lot more complicated.  It involves elements of both inductive and abductive reasoning.  If we believe Randolf is guilty, we might then try to establish that there were secret passages in Huntley Manor that connected the locked library to some other room in the mansion.  We would then also have to establish, maybe through eyewitness testimony, that Randolf knew of the passages (e.g., an old servant recalls showing one to a young Randolf).  Even so, all we would be doing is establishing the premises with more certainty; the conclusion of his guilt would still not necessarily follow.  If, on the other hand, we knew that he was guilty (perhaps he was seen by someone looking into the library from outside), we might abductively infer that there was a secret passage and that Randolf knew of its existence.

So here it is, a great irony of life.  It’s decades after I first watched an episode of M*A*S*H that turned me off of mystery stories for a long time, and I find myself using that episode as a model for discussing logic and reason.  That, I can’t figure out.

The Language of Infinity

“How language shapes thought and thought shapes language” is an age-old question in linguistics and philosophy.  I’m not in any position to give a definitive answer, nor, I suspect, is anyone else.  Having taught math and physics at the university level, though, I am willing to offer some thoughts about how the language of mathematics, and the symbols and glyphs used to turn mathematical concepts into written words, shapes how people think and solve problems.

In this blog I will be focusing on the concept of infinity and the philosophical implications that come from using it.  But before I get to infinity directly, I would like to discuss, by way of a warm-up exercise, how the use of the symbol ‘x’ throws off a lot of beginning students.

When describing a function or mapping between two sets of real numbers, the most common notation by far is to let the symbol ‘x’, called the independent variable, stand for any member of the initial set, and the symbols ‘y’ and ‘f(x)’ stand for the corresponding member of the target set and the function that generates it.  The symbolic equation ‘y = f(x)’ becomes so rigidly fixed in some students’ minds that the idea that the symbols ‘x’ or ‘y’ could be replaced with any other symbols, say ‘y’ and ‘z’, never occurs to them.  I have had students come and ask whether their book has a typo when it asks them to solve ‘x = f(y)’ or integrate ‘f(y) dy’ or the like (once this happened while I was out to dinner at Olive Garden with my family – but that is a story for another day).
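
A one-line worked example makes the point that the letter is only a placeholder; renaming the variable of integration changes nothing:

\[ \int_0^1 x^2 \, dx = \int_0^1 y^2 \, dy = \frac{1}{3} \; . \]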

There is no easy way to fix this problem, as there is a kind of catch-22 in the teaching of mathematics.  On one hand, the mapping between sets exists as a pictorial relation between ‘clouds’ and ‘the points within them’

[Figure: a pictorial mapping between two sets]

without the need for written glyphs.  On the other hand, a fixed set of well-defined symbols keeps early confusion to a minimum and allows the student to focus on the concepts without all the possible freedom of choice in notation getting in the way.  (Note:  a reader comfortable with classical philosophy may point out that a mapping between sets can be abstracted even further, perhaps to the notion of a Platonic form, but this is a side issue.)

Okay, with the appetizer firmly digesting in our minds, let’s turn to perhaps the most confusing symbol in all of mathematics, the symbol for infinity, ‘$\infty$’. This symbol, which looks like the number ‘8’ passed out after a night of heavy drinking, seduces students and instructors alike into all sorts of bad thoughts.

How does it have this power, you may ask? Well, its very shape, small and compact and slightly numberish, encourages our minds to treat it like all other numbers. There are literally countless examples of infinity masquerading as a plain number, much like a wolf in sheep’s clothing. One of the most egregious examples is the innocent-looking expression

\[ \int_0^\infty e^{-x^2} dx = \frac{\sqrt{\pi}}{2} \]

where ‘$\infty$’ is compared side-by-side with the number ‘0’.  There is perhaps a more palatable way of writing the integral as

\[ \lim_{a \rightarrow \infty} \int_0^a e^{-x^2} dx = \frac{\sqrt{\pi}}{2} \]

but it still looks as though the number ‘a’ can be thought of as becoming or approaching ‘$\infty$’.  A seasoned practitioner knows that both expressions are really shorthand for something much more involved.  I will summarize what this more involved thing is in one short and sweet sentence.  Infinity is a process that you have the freedom to perform as many times as you like.  Or even shorter: Infinity is an inexhaustible process.

Take a moment to think that through.  Are you back with me?  If you don’t see the wisdom in that maxim, consider either form of the integral expression listed above.  In both cases, what is really being said is the following.  Pick an upper bound for the integral (call it ‘a’ to be consistent with the second form).  Evaluate the integral for that value of ‘a’.  Record the result.  Now increase ‘a’ a bit more, maybe by doubling it or multiplying it by 10 or however you like, as long as it gets bigger.  Evaluate the integral again and record the result.  Keep at it until one of several things happens: 1) the difference between successive recorded values has dropped below some threshold, 2) you run out of time, or 3) you run out of patience and decide to go do something else.  The term infinity simply means that you have the freedom to decide when to stop, and the freedom to resume whenever you like and continue onwards.
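
Here is a minimal sketch of that recipe in Python; the trapezoid rule and the doubling schedule are my own arbitrary choices, made only to show the process in action.

    import math

    def integral_to(a, n=20_000):
        # Trapezoid-rule estimate of the integral of exp(-x^2) from 0 to a
        h = a / n
        total = 0.5 * (math.exp(0.0) + math.exp(-a * a))
        for i in range(1, n):
            x = i * h
            total += math.exp(-x * x)
        return h * total

    a, previous, threshold = 1.0, float("inf"), 1e-6
    while True:
        current = integral_to(a)
        if abs(current - previous) < threshold:  # 1) successive answers have settled down
            break
        previous, a = current, 2.0 * a           # otherwise push the bound out and go again

    print(current, math.sqrt(math.pi) / 2)       # both are approximately 0.8862269

The loop is free to stop, or to be restarted later with a bigger bound, at any time; that freedom is the whole point of the maxim.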

If you are new to calculus, you will no doubt find this short sentence definition somewhat at odds with what your instructors have told you. Where are all the formal limits and precise nomenclature? Where is all the fancy machinery? If you are an old hand at calculus you may even be offended by the use of the words ‘process’, ‘freedom’, or ‘inexhaustible’. But this sentiment is exactly at the heart of the Cauchy delta-epsilon formalism, and the casual nomenclature has the advantage of ruthlessly demolishing the ‘high-brow’ language of mathematics to bring what is really a simple idea back to its rightful place as an everyday tool in the thinking person’s toolbox.

On the other hand, you may be thinking that everyone knows this and that I am making a mountain out of a molehill.  If you fall into that camp, consider this video about the properties of zero by Numberphile.

I must admit I like many of Numberphile’s videos, but this one made me shake my head.  They allowed language to affect their thinking, and they were seduced by the evil camouflage powers of infinity.  They go to great trouble to explain why you can’t divide by zero, note that people ask “isn’t dividing by zero just infinity?”, and insist that it isn’t that simple.

The problem is, it is that simple!  Dividing by zero is infinity as understood by the maxim above, and the Numberphile presenters prove this themselves.  At about a minute into the video, one of them begins to explain how multiplication is ‘glorified addition’ and division is ‘glorified subtraction’.  The argument for ‘glorified subtraction’ goes something like this.

If one wishes to divide 20 by 4, one keeps subtracting 4 until one is left with a number smaller than 4 (in this case zero).  The number of times one performs this subtraction is the answer, with whatever is left over being the remainder.  Dividing 17 by 5, for instance, is shorthand for subtracting 5 from 17 three times and finding that 2 is left over, so one says 17/5 = 3 with a remainder of 2.

The same bloke (I use bloke because of their cool English or Australian or whatever accents) then says that 20 divided by 0 goes on forever, because each time you subtract 0 you are left with 20.  Here then is the inexhaustible process that lives at the very heart of infinity.  Sadly, while he looks like he is about to hit the bull’s-eye (at 2:20 he even says infinity isn’t a number), his aim goes horribly awry at the last moment when he argues that the expression ‘1/0 = $\infty$’ can’t be true, because one could then go on to write ‘1/0 = $\infty$ = 2/0’ and conclude that ‘1 = 2’.

This is, of course, a nonsensical objection, since the expression ‘1/0 = $\infty$’ is shorthand for saying ‘the glorified subtraction of 0 from 1 (in the sense used above) is an inexhaustible process.’  It is no more meaningful to equate this process with the ‘glorified subtraction of 0 from 2’ than it is to equate ‘1/0’ with any other inexhaustible process, like halving a non-zero number over and over in the hope of reaching zero.
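
A few lines of Python make the ‘glorified subtraction’ picture explicit; the function and the step cap are my own additions, the cap being there only to keep the divisor-zero case from literally running forever.

    def divide_by_subtraction(dividend, divisor, max_steps=1_000_000):
        # Division as 'glorified subtraction': count how many times the divisor
        # can be taken away before what remains is smaller than the divisor.
        count, remainder = 0, dividend
        while remainder >= divisor:
            if count >= max_steps:       # a divisor of 0 never shrinks the remainder,
                return None, remainder   # so the process is inexhaustible rather than a number
            remainder -= divisor
            count += 1
        return count, remainder

    print(divide_by_subtraction(17, 5))  # (3, 2): 17/5 is 3 with a remainder of 2
    print(divide_by_subtraction(20, 0))  # (None, 20): subtracting 0 gets you nowhere, forever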

The fact that stringing the symbols ‘0’, ‘1’, and ‘$\infty$’ into the sentence ‘1/0 = $\infty$’, and then treating ‘$\infty$’ as an ordinary number, leads to an illogical conclusion is an important warning about the power of language to shape thought.  The Numberphile presenters had all the right ideas, but they arrived at a wrong result.