
Gödel’s Theorem – General Notions

As I mentioned in last week’s blog, I was encouraged by Casti’s chapter on the Halting Theorem and questions of undecidability in logic and computing.  In fact, I was inspired enough that I resolved to have another go at studying Gödel’s theorem.

To give some background, many years ago I came across ‘Gödel, Escher, Bach: An Eternal Golden Braid’ by Douglas Hofstadter.  While I appreciate that his work is considered a classic, I found it difficult and ponderous.  Its over 700 pages of popularized exposition did little to drive home Gödel’s theorem and its connections to Turing and Church.  What’s more, Hofstadter seems to say (it’s difficult to tell exactly, as he mixes and muddles many concepts) that Gödel’s work supports his idea that consciousness can emerge from purely mechanistic means at the lowest level.  This point always seemed dodgy to me, especially since Hofstadter left me with the impression that Gödel’s theorem showed a fundamental lack in basic logic, not an emergent property.

For this go around, I decided that smaller was better and picked up the slim work entitled ‘Gödel’s Proof’ by Nagel and Newman.  What a difference a serious exposition makes.  Nagel and Newman present all the essential flavor and some of the machinery that Gödel used in a mere 102 pages.

Note that formal logic is not one of my strong suits and the intellectual terrain was rocky and difficult.  Nonetheless, Nagel and Newman’s basic program was laid out well and consisted of the following points.

To start, the whole thing was initiated by David Hilbert, whose Second Problem challenged the mathematical community to provide an absolute proof of the consistency of a formal system (specifically arithmetic) based solely on its structure.  This idea of absolute proof stands in contrast to a relative proof, where the system in question is put into relation with another system whose validity and consistency are accepted.  If the relation between the two is faithful, then the consistency of the second system carries over or is imparted to the first.

Hilbert was unsatisfied by the relative approach as it depended on some system being accepted at face value as being consistent and finding such a system was a tricky proposition.

The preferred way to implement an absolute proof is to start by stripping away all meaning from the system and to deal only with abstract symbols and a mechanistic way of manipulating these symbols using well-defined rules.  The example that Nagel and Newman present is the sentential calculus slightly adapted from the ‘Principia Mathematica’ by Whitehead and Russell.  The codification of the formal logic system depends on symbols that fall into two classes: variables and constant signs.  Variables, denoted by letters, stand for statements.  For example, ‘p’ could stand for ‘all punters kick the football far’.  There are six constant signs with the mapping

  • ~ : Not
  • $\vee$ : Or
  • $\supset$ : If… Then…
  • $\cdot$ : And
  • ( : Left-hand punctuation
  • ) : Right-hand punctuation

The idea is then to map all content-laden statements, like ‘If either John or Tim is late then we will miss the bus!’, to formal statements like ( (J $\vee$ T ) $\supset$ B) with all meaning and additional fluff removed.  Two rules for manipulating the symbols, the Rule of Substitution and the Rule of Detachment, are adopted and four axioms are used as starting points.

In using this system, one has to sharpen one’s thinking to be able to distinguish between statements in the system (mathematical statements like ‘2 > 1’) from statements about the system (meta-mathematical statements like ‘2>1’ is true or that the ‘>’ symbol is an infix operator).  One must also be careful to note the subtle differences between the symbol ‘0’, the number 0, and the concept of zero meaning nothing.

The advantage of this approach is that the proofs are cleaner, especially when there are many symbols.  The disadvantage is that it takes time and effort to be able to work in this language.

The consistency of the formal system is shown when there is at least one formal statement (or formula) that cannot be derived from the axioms.  The reason for this is complicated and I don’t have a good grasp on it, but it goes something like this.  The formula ‘p $\supset$ (~ p $\supset$ q)’ can be derived in the sentential calculus.  If the statements S and ~S are both deducible, then any statement you like can be derived from the axioms (via the Rules of Substitution and Detachment): substituting S for p gives ‘S $\supset$ (~ S $\supset$ q)’, detaching with S gives ‘~ S $\supset$ q’, and detaching again with ~S gives q, whatever statement q may be.  A system in which everything is derivable is clearly inconsistent.  In the words of Nagel and Newman:

‘The task, therefore, is to show that there is at least one formula that cannot be derived from the axioms.’ [p51]

For the sentential calculus, they point out that the formula ‘p $\vee$ q’ fits the bill since it is a formula yet it does not follow from the axioms.  Thus this system is consistent.  Note that there is no truth statement attached to this formula.  The observation simply means that ‘p $\vee$ q’ can’t be obtained from the axioms by mechanical manipulation.  They present this argument in their chapter titled ‘An Example of a Successful Absolute Proof of Consistency’.

Well, despite that lofty achievement, they go on to show how Gödel took a similar approach and largely ended any hope of an absolute consistency proof for formal systems of much greater complexity.  Gödel used as his symbols the numerals and as his rules the basic operations of arithmetic and, in particular, the arithmetic of prime numbers.  Using an ingenious mapping of the variables and constant symbols to numbers, he not only could encode the structure of the formal system itself, he could also encode statements about the formal system as well (meta-mathematics).  In the language of Hofstadter, these are self-referential statements, although Nagel and Newman don’t use this term.

Using this approach, Gödel was able to prove that there is no absolute proof of consistency.  At best, the system can say about itself that it is either incomplete or inconsistent.  If the system is incomplete, there are true statements that are not part of the axioms and that cannot be derived from them.  Enlarging the set of axioms to include them doesn’t work, since their presence begets new unprovable truths.  If the system is inconsistent, then everything is ‘true’, as discussed above.
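To get a feel for the arithmetization trick, here is a toy scheme sketched in Python (a drastic simplification of my own, not Gödel’s actual numbering): assign each symbol of the sentential calculus a small code number and encode a whole formula as a product of prime powers.  Because prime factorization is unique, the resulting integer can always be decoded back into the formula, so statements about formulas become statements about numbers.

import math

# Toy Gödel numbering: each symbol gets a code number, and a formula is
# encoded as 2**c1 * 3**c2 * 5**c3 * ..., one prime per symbol position.
SYMBOL_CODES = {'~': 1, 'v': 2, '>': 3, '.': 4, '(': 5, ')': 6, 'p': 7, 'q': 8}

def primes(n):
    # Return the first n primes by simple trial division.
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def godel_number(formula):
    # Encode a formula (a string of symbols) as a single integer.
    codes = [SYMBOL_CODES[s] for s in formula]
    return math.prod(p ** c for p, c in zip(primes(len(codes)), codes))

print(godel_number('(pvq)'))   # one integer stands for the whole formula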

Nagel and Newman leave the reader with some final thoughts that are worth contemplating.  On the hope that Hilbert’s program can be successfully completed, they have this to say:

‘These conclusions show that the prospect of finding for every deductive system an absolute proof of consistency that satisfies the finitistic requirements of Hilbert’s proposal, though not logically impossible, is most unlikely.’

They also comment on the possibility of proving consistency from an outside-looking-in approach using meta-mathematical techniques when they say

‘[W]hether an all-inclusive definition of mathematical or logical truth can be devised, and whether, as Gödel himself appears to believe, only a thoroughgoing philosophical ‘realism’ of the ancient Platonic type can supply an adequate definition, are problems still under debate…’

Finally, they have this to say about artificial intelligence (although that term wasn’t in vogue at the time they published):

‘[T]he brain appears to embody a structure of rules of operation which is far more powerful than the structure of currently conceived artificial machines.  There is no immediate prospect of replacing the human mind by robots.’

And there you have it.  A whirlwind tour of Gödel’s theorem with surprise appearances by philosophy from antiquity and ideas about artificial intelligence.

Turing, Gödel, and the Universe

Something about the book ‘Five Golden Rules: Great Theories of 20th-Century Mathematics and Why They Matter’ by John L. Casti caught my eye the other day at the library.  On a whim, I signed the book out and started reading.  Living up to the promise of the title, the book had five chapters, each one devoted to one of the great math theories of the 20th century.  All in all, I had been exposed, at least superficially, to all the topics covered, so I wasn’t sure what I would get out of the book other than some additional insight into material I already knew or confirmation of my wisdom in staying away from the topics I didn’t.

Anyway, I am quite glad that the mechanism of providence pointed me at this book because the connections that Casti draws are worth thinking about.  None of these connections was as profound for me as the deep link that he explores in Chapter 4 entitled ‘The Halting Theorem (Theory of Computation)’.

In this chapter, Casti first presents the concept of the universal Turing machine (UTM) as a mechanism for attacking the Decision Problem (or Entscheidungsproblem) proposed by David Hilbert in 1928.  Casti then couples this presentation with a discussion of the famous Gödel Incompleteness Theorem.

I’m simply a beginner in these fields and Casti omits important details and glosses over a variety of things to help the novice.  From this point-of-view, I can’t recommend his work.  But I am grateful and excited about the perspective he provided by making this linkage.

To understand my excitement, let me first try to fill in some of the background as best as I can.

Hilbert’s Decision Problem asks the following question.  Given a set of input data and an algorithm for manipulating said data, is there a way to know if the algorithm will be able to make a yes/no decision about the input?  For example, if the input data is a set of axioms in a logical system and some corresponding assertion in the same system, and the algorithm is a logical formalism, such as the classical syllogism, will the algorithm be able to prove or disprove the assertion as either true (yes) or false (no)?

While the Decision Problem might seem straightforward enough to talk about at a seminar, when it comes to actually tackling the question there are some vague conceptions that need further elaboration.  Specifically, what the term ‘algorithm’ really means and what constitutes a workable algorithm seem to have been the murkiest parts when the problem was first posed.

Turing chose to clarify the algorithm concept by inventing the machine which bears his name.  The UTM is a basic model of computation that is often used to understand the theory behind computer programming.  It is also the avenue that gave Turing a way to tackle Hilbert’s Decision Problem.  The basic ingredients for a UTM are: a tape or strip of paper, divided into squares; a set of symbols (usually ‘0’ and ‘1’) that can be written into the squares; and a box containing a mechanism for moving the strip left or right and a head that reads from and writes to the strip and that holds an internal state, which allows the UTM to select an action from a pre-defined set, given the internal state and the current symbol being read.  A typical graphical representation of a UTM looks like

[Figure: Turing_machine – a typical graphical representation of a Turing machine]

The UTM can represent the input to the decision process by a particular pattern of symbols on the strip.  It can also implement the steps of an algorithm by encoding them as symbols on the tape.  So the entire nature of the Decision Problem comes down to the question of whether, once the UTM starts, it will ever halt; hence the name of the chapter.
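To make the halting framing concrete, here is a minimal Turing machine simulator sketched in Python (a toy example of my own, not Casti’s); the program table plays the role of the ‘pre-defined set’ of actions mentioned above, and the open question is whether a given program and tape ever reach the halting state.

def run_turing_machine(program, tape, state='start', blank='_', max_steps=1000):
    # A minimal Turing machine: 'program' maps (state, symbol) to
    # (symbol_to_write, move, next_state), where move is -1 (left) or +1 (right).
    # The tape is a dict from cell index to symbol; missing cells read as blank.
    head = 0
    for _ in range(max_steps):
        if state == 'halt':
            return tape                      # the machine halted: a decision was reached
        symbol = tape.get(head, blank)
        if (state, symbol) not in program:
            return tape                      # no applicable rule: treat as a halt
        write, move, state = program[(state, symbol)]
        tape[head] = write
        head += move
    return None                              # never halted within max_steps: undecided

# A toy program: scan right over a block of 1s and append one more 1 (unary increment).
program = {
    ('start', '1'): ('1', +1, 'start'),
    ('start', '_'): ('1', +1, 'halt'),
}
print(run_turing_machine(program, {0: '1', 1: '1', 2: '1'}))   # the unary number 3 becomes 4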

Kurt Gödel took a different approach to answering the Decision Problem.  He developed a way to map any formal system to a set of numbers, called Gödel numbers.  Then, by formally manipulating these numbers, he was in a position to try to answer Hilbert’s challenge.

Now it is reasonable to suppose that not every question has a yes or no answer.  Questions of feeling, opinion, or taste spring to mind.  But I think that most of us expect that questions of mathematics and logic and programming have answers and that we should be able to figure them out.  The remarkable thing about the work of both Turing and Gödel is that there are cases where the formal logical system simply can’t decide.

In the language of Turing, there are computational problems for which there is no knowing beforehand if the algorithm will terminate with an answer.  In the language of Gödel, no matter what is done to a logical system in terms of refining existing axioms or adopting new ones, there will always be truths that are unprovable.

I was quite aware of Gödel’s theorem and even wrestled with it for a while when I was concerned about its implications for physical systems.  Eventually, I decided that while man’s logic may be limited, nature didn’t need to worry about it because she could make decisions unencumbered by our shortcomings.

I was also quite aware that Turing’s machine was used fruitfully as a model of computation.  What I was unaware of until reading Casti’s book was the parallel between the conclusions of Gödel and of Turing.

And here we arrive at the source of my excitement.  As I’ve already said, I remain convinced that Nature can decide – that is to say that Nature is free of the issues discussed above.  And yet, in some capacity the Universe is an enormous Turing machine.

[Figure: Universe_as_Turing – the Universe pictured as a Turing machine]

So why does Nature always make a decision, and why does the computation of trajectories, the motion of waves, and the evolution of quantum bits of stuff always reach a decision?  It may not be the decision we expect or anticipate, but it seems clear that a decision is reached.  And so, by looking at how the Universe as a Turing Machine (UaaTM) differs from a Universal Turing Machine, one may learn more about both.  This is the exciting new idea that has occurred to me and one which I will be exploring in the coming months.

States, Abstraction, and Implicit Objects

This post grew out of a conversation in which I recently participated, about the correct way to computationally model a system.  The conversation covered the usual strategies of identifying the system’s states and how to abstract the appropriate objects to hold or contain the states based on the question at hand.  But what was most interesting were the arguments put forward about the importance of recognizing not only the explicit objects used in the modeling but also the implicit objects that were ignored.

By way of a general definition, an implicit object is an object whose existence is needed in the computer modeling of the system in question but one that is not represented in code as either a real or virtual class.  I don’t think this definition is used in computer science or programming books but it should be.  This is a point that is almost entirely ignored in the field but is vital for a correct understanding of many systems.

A simple example will go a long way to clarify the definition and to also raise awareness that implicit objects are used more often than people realize.  The system to be modeled is the motion of traffic on a highway.  For simplicity, we’ll focus on two cars, each traveling along a straight stretch of road.  Generalizations to more complicated situations should be obvious but don’t add anything to the discussion.  A picture of the situation is

[Figure: cars_on_road – a blue car and a green car traveling along a straight stretch of road]

Here we have a blue car and a green car moving to the right as denoted by the matching colored arrows.  The typical abstraction at this point is to create a car class with appropriate attributes and member functions for the questions we want to answer.  Let’s suppose that we are interested in modeling (with an eye towards prevention) the collision between the two cars.  What ingredients are needed to perform this modeling?

A collision between two objects occurs, by definition, when they try to occupy the same space at the same time.  Unpacking this sentence leads us to the notion of position for each car as a function of time and a measure of the physical extent of each car.  Collision occurs when the relative position between the centers of the two cars is such that their physical extents overlap.

[Figure: collision – the relative position at which the two cars’ physical extents overlap]

Now, real life can be rather complicated, and this situation is no different.  The physical shape of a car is generally hard to model with many faces, facets, and extrusions.  And the position of the car is a function of what the driver perceives, how the driver reacts, and how much acceleration or deceleration the driver imposes.

In time-honored fashion, we will idealize the cars as having a simple two-dimensional rectangular shape as shown in the figure, and will also assume that the cars have a simple control law (which we don’t need to specify in detail here except to say that it depends on the position, velocity, and time) that gives the velocity of the car at any instant of time.  With these simplifying assumptions, the trajectory of the car is completely specified by giving its position at some initial time and then using the control law to update the velocity at each time step.  The position is then updated using the simple formula ${\vec x}(t+dt) = {\vec x}(t) + {\vec v} dt$.

With all these data specified, we might arrive at a class definition for the car object that looks like:

[Figure: car_class – sketch of the car class definition]
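A minimal Python sketch of what such a class might look like (the attribute and method names here are illustrative guesses based on the discussion above, not a definitive design):

class Car:
    # Idealized car: a 2-D rectangle moving along a straight road.
    # Positions are measured in the (implicit) shared road frame, and each
    # car carries its own copy of the (implicitly shared) clock.
    def __init__(self, x, y, vx, vy, length, width, time=0.0):
        self.x, self.y = x, y          # position of the car's center
        self.vx, self.vy = vx, vy      # velocity components
        self.length = length           # extent along the road
        self.width = width             # extent across the road
        self.time = time

    def control_law(self, t):
        # Placeholder control law: return the commanded velocity at time t.
        return self.vx, self.vy

    def update(self, dt):
        # Advance one step: x(t+dt) = x(t) + v*dt.
        self.vx, self.vy = self.control_law(self.time)
        self.x = self.x + self.vx * dt
        self.y = self.y + self.vy * dt
        self.time = self.time + dt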

At this point, we’ve already encountered our first two implicit objects.

The first one is a coordinate frame, comprised of an origin and an orientation of the coordinate axes, which tells us what the position of the object is measured against.  This coordinate frame is an object in its own right and, in many circles, worthy of study.  Here it is simply important to acknowledge that it exists and that the width and length parameters need to be defined consistently with regards to it.  This is an important point, since a change in coordinate frame orientation will change the test (to be described later) for the geometric condition of overlap (i.e., collision).

The second implicit object is much more subtle and it took the discovery of the principles behind special relativity to drive that point home.  Both cars share a common clock to describe their motion.  That is to say that their velocities are only meaningfully compared if their clocks are synchronized and ticking at the same speed.  The role of the implicit clock is usually carried out by the main program of the simulation but, since each object holds its own time, care must be exercised to keep them synchronized.  It is possible to make this common-clock assumption stronger by eliminating time from the list of member data, but that approach also has its drawbacks, as will be touched on below.

The final question is, how do we determine if the cars collide somewhere during the simulation?  There are two possible approaches.  The first is that we give each car a new member function, called detect_collision, which takes as arguments the position of the other car and its size.  This is not an unreasonable approach for simple situations, but it quickly becomes unwieldy should we later want to add fidelity to the simulation.  For example, if we wanted to allow for the body of the car changing its orientation as it changes lanes, then the number of parameters that we have to pass into the detect_collision function has to increase.  A more robust way is to perform this computation at the main workspace in the simulation.  If this method is chosen, we’ve now introduced a third implicit object.  It’s not quite clear from the context what to call this object, but its function is absolutely clear – it communicates to the world the message that the two cars have collided.
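As a sketch of the overlap test itself (using the hypothetical Car attributes from the earlier sketch and the idealized axis-aligned rectangles), the check performed in the main workspace might look like:

def detect_collision(car_a, car_b):
    # Axis-aligned rectangle overlap: the cars collide when their extents
    # overlap both along the road (x) and across the road (y).
    overlap_x = abs(car_a.x - car_b.x) < 0.5 * (car_a.length + car_b.length)
    overlap_y = abs(car_a.y - car_b.y) < 0.5 * (car_a.width + car_b.width)
    return overlap_x and overlap_y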

This last notion may seem like a trivial observation leading to an unnecessary complication, but consider the case where each car is equipped with some sensor that warns of an imminent collision.  Remote sensing requires some model of the medium between the sensor and the thing being sensed.  If the collision detector uses sound, then the air between the cars must be modeled as an elastic medium, with a finite propagation speed of the signal, and variations in this medium due to temperature, humidity, and the like may also need to be modeled.  If the collision detector uses electromagnetic waves (radar or laser ranging), then the index of refraction and dispersion properties of the air become the key parameters, which again may depend on the thermodynamics of the atmosphere.  In either case, this communicating medium must now be promoted from an implicit object to an explicit one, and the times of the two objects take on new meaning as we need to track the transmit and receive times of the signals sent between them.

The lesson, I hope, is clear.  It is just as important to pay attention to the implicit objects in a simulation as it is to the explicit ones.  There is a parallel here with the spoken word – where it is often as important to pay attention to what was left unsaid as it is to what was said.

Finding the Socratic Method

There is a standard debate in mathematics about the application of the terms ‘invention’ versus ‘discovery’.  It resurfaced the other day when a colleague and I were talking about some mathematical graffiti that adorned a door jamb in the conference room in which we were meeting.  This graffiti took the form of some mathematical symbols printed on a magnet held in place at the top part of the door.  None of us in the room were able to determine what, if any, message was being sent but, in the process of discussing the possible meaning, my colleague said, in passing, that Pythagoras had invented the theorem that bears his name.  I questioned whether the verb should have been ‘discovered’ rather than ‘invented’.  We spent a few minutes discussing that point, then we gave up altogether and went our separate ways. On the drive home that evening I began to think about the proper use of those two words and I finished up wondering if Socrates invented or discovered his famous method.

To understand the distinction between ‘invent’ and ‘discover’, let’s return to the Pythagorean Theorem for a moment.  Most everyone knows the theorem, namely that the square of the length of the hypotenuse of a right triangle is equal to the sum of the squares of the lengths of the other 2 sides.  The most common proof and, in my opinion, the most elegant, draws a right triangle with squares of the appropriate areas on each side.  I’ve provided such a diagram for a 3-4-5 Pythagorean triangle below.

[Figure: PT_triangles – a 3-4-5 right triangle with a square drawn on each side]

The proof proceeds by laying the bigger of the two 90-degree side squares onto the hypotenuse square

[Figure: PT_overlay – the larger of the two side squares laid onto the hypotenuse square]

and then adding up the remaining area and showing that it is equal to the area of the smaller square

[Figure: PT_small_area – the remaining area equals the area of the smaller square]

Of course, the steps shown above were quickly done in PowerPoint graphics and there is no reason for a skeptic to actually accept them as proof.  But the doubters can go at this ‘proof’ with whatever vigor they desire.  The answer will always be the same:  $a^2 + b^2 = c^2$.

And that brings us to the point about invention versus discovery.  I would argue that the Pythagorean Theorem is an exercise in discovery.  The finding that all right triangles satisfy it is exactly the point – we find or discover that all right triangles obey this relationship.

Contrast this with Edison’s invention of the light bulb.  I say invention because of a number of factors.  First, there is the particular form of the object in question.  The base, with its two contacts, one at the bottom center and one on the periphery, is a choice of form that could have been made many other ways.  The shape of the bulb itself is only a suggestion of what could be done given the state of the art of glass blowing at the time of its introduction to society.  Second, there is the particular design and implementation.  The placement of the filament and the current and voltage running through it were all carefully chosen to meet specific goals or requirements.  The materials that comprise all the parts were chosen to provide the maximum economy based on availability and convenience in pre-existing manufacturing processes.  So I would argue that, while it is proper to say that the laws and properties of electricity were discovered, the light bulb is a true invention.

So far so good, but what about Socrates?  Did he discover or invent the Socratic Method?  As a quick review, a few brief words are in order about what I mean when I say the Socratic Method (ironically, if you follow the link to the Wikipedia article, you’ll find that both ‘invention’ and ‘discovery’ are used in describing Socrates’s contribution).

The Socratic Method is best explained by the Platonic dialog called Euthyphro.  In this dialog, we find Socrates and Euthyphro both showing up at the Athenian court but for very different reasons.  Socrates is answering a call by the court to give an account of his ‘criminal ways’, whereas Euthyphro intends to petition the court to bring a charge of murder against his father for the death of a slave in his possession.  The two meet on the steps leading inside and exchange with each other their reasons for being there.  Socrates expresses surprise that Euthyphro is accusing his father of murder, since the slave in question died from being imprisoned for murdering another slave.  Euthyphro says that he is compelled to this course of action by his piety.  That’s all the prompting needed by Socrates, and soon the two are engaged in a discussion where Socrates asks a question like ‘what is piety’ and Euthyphro attempts to answer with a response like ‘what is pleasing to the gods’.  After each answer, Socrates questions some new part of the response as a way of sharpening the reasoning behind it.

The Socratic Method is a way of examining the logical content of a statement by carefully examining the basic notions that make up that statement.  So, asking what do ‘piety’, ‘pleasing’ and ‘gods’ mean is a way of finding the truth. Generally, when the method is applied, we are more apt to find out what a particular concept, like ‘piety’, isn’t, rather than finding out what it is.  Most of the dialogs (and, for that matter, modern applications) end with both parties departing before the full meaning has been established but at least with a clearer picture of what is not meant.

So, with all the preliminaries out of the way, the key question to grapple with is whether Socrates invented this method or discovered it.  My vote is for discovery.  I say this mostly because of the universal nature of this mode of inquiry, but partly because Socrates believed in Truth in the most absolute sense.  If he invented this type of intellectual exploration, then the application of it would be necessarily limited to those contexts where its design matched a particular cast of mind or cultural milieu. The fact that it is a successful philosophical pursuit the world over is testament to its ability to transcend the accidentals of human culture.  The fact that it was fashioned with the goal of discovering Truth through logic and reason and that Socrates believed in Truth leads me to believe that he would agree that he discovered his method.

I am willing to say that Plato invented the particular encounters presented in the dialogs and that Socrates invented the accidental trappings whereby he applied his method to the Athenian society.  Recognizing these inventions is equivalent to recognizing the invention of the particular symbols for denoting an algebraic quantity as ‘$a$’ or multiplication by ‘$\times$’ or equality as ‘$=$’.  Writing $c \times c = a \times a + b \times b$ versus saying ‘the square of the hypotenuse is equal to the sum of the squares of the other two sides’ are two different, invented ways for expressing the same truth.

Language and Metaphor

Why do we quote movies?  I often think about this fascination we have as a culture with repeating and reliving moments from our favorite films.  Large numbers of clips exist on YouTube devoted to capturing that magical moment when a character utters an unforgettable line.  But why?  What is it that drives us to remember a great line or identify with the person who uttered a quote?

This question came up over the dinner table one night and, as I reflected on it, my thoughts were drawn to an explanation about speed reading that I had once come across.  The author went to great trouble to make the point that the trick to speed reading is to see groups of words in a sentence as a single chunk of information.

To understand that point, consider the words you read in a sentence.  To make it concrete, let’s take the word ‘understand’.  When you read the word ‘understand’, you are seeing a group of 10 individual letters, but are you really conscious of each letter, one at a time?  Do you really see a ‘u’ followed by an ‘n’ followed by a ‘d’ and so on?  No.  What each of us with any sophistication in reading accomplishes is to perceive these 10 letters as a unit that conveys the word ‘understand’.  Of course, this is why we can read a sentence with a misspelling quite comfortably and often may not even notice.

This idea of chunking and clumping comfortably scales upward to where common phrases can be consumed at a glance without a detailed examination of the individual words.  One common approach is to abbreviate the phrase into some short form.  Expressions like ‘LOL’ or ‘BTW’ or any of the other text and chat speak concepts are excellent examples.  The other approach is to pick lyrical or repetitive expressions to encapsulate a phrase.  The famous ‘rosy-fingered dawn’ from the Homeric epics and ‘ready, set, go!’ are examples of these kinds, respectively.

But there is an even more compelling approach – the concept of the metaphor.  The idea here is that we liken an object to another object, one with well-known properties.  The properties of the second object are then inherited by the first simply by the equating of the names.  Some examples of this include the following sentences.

  • ‘That guy’s a Benedict Arnold for leaving our team for that other one.’
  • ‘How was the date last night, Romeo?’
  • ‘Stay away from her, she’s Mata Hari through and through!’
  • ‘That man is death itself’.

I think that our desire to quote movies is indicative of this.  That by repeating the dialog of the movie, the quote itself becomes a cultural metaphor for the feelings and experiences expressed in the movie.  This idea was brilliantly explored in the Star Trek: The Next Generation episode Darmok.

In this episode, the crew of the Enterprise are coaxed into a meeting with a Tamarian ship in orbit around a mostly unexplored planet.  Despite their universal translator, no ship of the Federation had ever been able to crack the Tamarian language.  The words themselves could be translated, but the syntax and context seemed to be utterly devoid of meaning.  The Tamarian captain, Dathon, seeing that this meeting was no different from previous encounters and that the ordinary avenues for communication were not working, arranged for himself and Captain Picard to be trapped on a planet.

On that planet, both captains were confronted by a dangerous creature.  This shared danger spurred a meeting of the minds and eventual understanding dawned on Picard.  The Tamarian race thought and communicated in metaphors.  They would say statements like ‘Temba, his arms wide’ to mean the concept of giving or generosity.  Back on the Enterprise, the crew had come to a similar epiphany.  By analogy, they constructed a Tamarian-like way of expressing romance by saying ‘Juliet on her balcony’, but they lamented that without the proper context, in this case Shakespeare’s tragic play about Romeo and Juliet, one didn’t know who Juliet was and why she was on her balcony.

The episode closes with Dathon dying from the wounds inflicted by the creature, and with Picard arriving back aboard the Enterprise just in time to make peace between the Tamarians and the Federation by speaking their language.

The episode left some dangling ideas: how do the Tamarians specify an offer that involves a choice among many things, or express an abstract idea, like giving someone his freedom?  Nonetheless, it was a creative and insightful way of exploring how powerful metaphor can be and how abstract the thought that lies behind it can be.

So, the next time you quote a movie, give a thought to the metaphor that you are tapping into and take a moment to marvel at the miracle of speech and thought.

Aces High

This week’s column has a three-fold inspiration.  First off, many of the most exciting and controversial philosophical endeavors involve arguing from incomplete or imprecise information to a general conclusion.  These inferences fall under the heading of either inductive or abductive reasoning, and most of the real sticking points in modern society (or any society for that matter) revolve around how well a conclusion is supported by fact and argument.  The second source comes from my recent dabbling in artificial intelligence.  I am currently taking the edX course CS188.1x, and the basic message is that the real steps forward in the AI landscape came after the realization that computers must be equipped to handle incomplete, noisy, and inconsistent data.  Statistical analysis and inference deals, by design, with such data, and its use allows an algorithm to make a rational decision in such circumstances.  The final inspiration came from a fall meeting of my local AAPT chapter at which Carl Mungan of the United States Naval Academy gave a talk.  His discussion of the two aces problem was nicely done, and I decided to see how to produce code in Python that would ‘experimentally’ verify the results.

Before presenting the various pieces of code, let’s talk a bit about the problem.  This particular problem is due to Boas in her book Mathematical Methods in the Physical Sciences, and goes something like this:

What is the probability that, when being dealt two random cards from a standard deck, the two cards will be two aces?  Suppose that one of the cards is known to be an ace, what is the probability?  Suppose that one of the cards is known to be the ace of spades, what is the probability?

 

The first part of this three-fold problem is very well defined and relatively simple to calculate.  The second two parts require some clarification of the phrase ‘is known to be’.  The first possible interpretation is that the player separates out the aces from the deck and either chooses one of them at random (part 2) or he chooses the ace of spades (part 3).  He returns the remaining 3 aces to the deck, which he subsequently shuffles prior to drawing the second card.  I will refer to this method as the solitaire option.  The second possible interpretation (due to Mungan) is that a dealer draws two cards at random and examines them both while keeping their identity hidden from the player.  If one of the cards is an ace (part 2) or is the ace of spades (part 3), the dealer then gives the cards to the player.  Parts 2 and 3 of the question then ask for the probability that the hands that pass this inspection step actually have two aces.  Since this last process involves assistance from the dealer, I will refer to it as the assisted option.  Finally, all probabilities will be understood from the frequentist point of view.  That is to say that each probability will be computed as a ratio of desired outcomes to all possible outcomes.

With those preliminary definitions out of the way, let’s compute some probabilities by determining various two-card hand outcomes.

First let’s calculate the number of possible two-card hands.  Since there are 52 choices for the first card and 51 for the second, there are 52×51/2 = 1326 possible two-card hands.  Of these, there are 48×47/2 = 1128 possible two-card hands with no aces, since the set of cards without the aces is comprised of 48 cards.  Notice that, in both of these computations, we divide by 2 to account for the fact that order doesn’t matter.  For example, a 2 of clubs as the first card and a 3 of diamonds as the second is the same hand as a 3 of diamonds as the first card and a 2 of clubs as the second.

[Figure: no_aces – the 48-card set containing no aces]

Likewise, there are 48×4 = 192 ways of getting a hand with only 1 ace.  Note that there is no need to divide by 2, since the two cards are drawn from two different sets.

[Figure: one_ace – pairing one of the 4 aces with one of the 48 non-ace cards]

Finally, there are only 6 ways to get a two-ace hand.  These are the 6 unique pairs that can be constructed from the green set shown in the above figure.

As a check, we should sum the sizes of the individual sets and confirm that the total equals the number of all possible two-card hands.  This sum is 1128 + 192 + 6 for no-ace, one-ace, and two-ace hands, and it totals 1326, which is equal to the number of two-card hands.  So, the division into subsets is correct.

With the size of these sets well understood, it is reasonably easy to calculate the probabilities asked for in the Boas problem.  In addition, we’ll be in a position to determine something about the behavior of the algorithms developed to model the assisted option.

For part 1, the probability is easy to find as the ratio of all possible two-ace hands (6 of these) to the number of all possible two-card hands (1326 of these).  Calculating this ratio gives 6/1326 = 0.004525 as the probability of pulling a two-ace hand from a random draw from a well-shuffled standard deck.

For parts 2 and 3 in the solitaire option, the first card is either given to be an ace or the ace of spades.  In both cases, the probability of getting another ace is the fraction of times that one of the three remaining aces is pulled from the deck that now holds 51 cards.  The probability is then 3/51 or 0.05882.

The answers for the assisted option for parts 2 and 3 are a bit more subtle.  For part 2, the dealer assists by winnowing the possible set down from 1326 possibilities associated with all two-card hands, to only 192 possibilities associated with all one-ace hands, plus 6 possibilities associated with all two-ace hands.  The correct ratio is 6/198 = 0.03030 as the probability for getting a single hand with two aces when it is known that one is an ace.  For part 3, the dealer is even more zealous in his diminishment of the set of possible hands with one card, the ace of spades.  After he is done, there are only 51 possibilities, of which 3 are winners, and so the correct ratio is 3/51 = 0.05882 as the probability of getting a single hand with two aces when it is known that one is the ace of spades.
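As a quick cross-check of these counts and ratios, here is a sketch using Python’s standard library (the Monte Carlo verification follows below):

from math import comb

total    = comb(52, 2)      # all two-card hands: 1326
no_ace   = comb(48, 2)      # hands with no aces: 1128
one_ace  = 48 * 4           # hands with exactly one ace: 192
two_aces = comb(4, 2)       # hands with two aces: 6
assert no_ace + one_ace + two_aces == total

print(two_aces / total)                   # part 1: 6/1326, about 0.004525
print(3 / 51)                             # solitaire parts 2 and 3, and assisted part 3: about 0.05882
print(two_aces / (one_ace + two_aces))    # assisted part 2: 6/198, about 0.03030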

All of these probabilities are easily checked by writing computer code.  I’ve chosen Python because it is very easy to perform string concatenation and to append and remove items from lists.  The basic data structure is the deck, which is a list of 52 cards constructed from 4 suits (‘S’,’H’,’D’,’C’), ranked in traditional bridge order, and 13 cards (‘A’,’2’,’3’,’4’,’5’,’6’,’7’,’8’,’9’,’10’,’J’,’Q’,’K’).  Picking a card at random is done by importing the random package and using random.choice, which returns a random element of the list passed into it as an argument.  Using the random package in the code turns the computer modeling into a Monte Carlo simulation.  For all cases modeled, I input the number of Monte Carlo trials to be N = 1,000,000.

Code to model part 1 and parts 2 and 3 for the solitaire option is easy to implement and understand, so I include only a minimal sketch of the part 1 case below.  The Monte Carlo results (10 cases each with N trials) certainly support the analysis done above, but, if one didn’t know how to do the combinatorics, one would only be able to conclude that the results are approximately 0.0045204 +/- 0.00017.
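A minimal sketch of the part 1 check (the solitaire cases for parts 2 and 3 follow the same pattern):

import random

suits = ['S', 'H', 'D', 'C']
cards = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
deck = [c + s for s in suits for c in cards]

def two_aces(num_mc):
    # Deal two random cards num_mc times and count the two-ace hands.
    num_good = 0
    for _ in range(num_mc):
        hand = random.sample(deck, 2)      # two distinct cards, order irrelevant
        if all(card.startswith('A') for card in hand):
            num_good += 1
    return num_good / num_mc

print(two_aces(1000000))   # should hover around 6/1326 = 0.004525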

The code to model parts 2 and 3 for the assisted option is a bit more involved because the dealer (played by the computer) has to draw a hand and then either accept or reject it.  Of the N Monte Carlo trials drawn, what percentage of them will be rejected?  For part 2, this amounts to determining the ratio of two-card hands that have at least one ace to all possible two-card hands.  This ratio is 198/1326 = 0.1493.  So, roughly 85 % of the time the dealer is wasting my time and his.  This lack of economy becomes more pronounced when the dealer rejects anything without the ace of spades.  In this case, the ratio is 51/1326 = 1/26 = 0.03846, and approximately 96 % of the time the dealer is throwing away the dealt cards because they don’t meet the standard.  In both cases, the Monte Carlo results support the combinatoric analysis with 0.03031 +/- 0.0013 and 0.05948 +/- 0.0044.

import random

suits = ['S', 'H', 'D', 'C']          # traditional bridge order
cards = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']

def make_deck():
    # Build the 52-card deck as strings like 'AS', '10H', 'KD'.
    deck = []
    for s in suits:
        for c in cards:
            deck.append(c + s)
    return deck

def is_ace(card):
    # Card strings start with 'A' only for aces.
    return card.startswith('A')

def one_card_ace(num_mc):
    # Assisted option, part 2: the dealer keeps only hands with at least
    # one ace; count how many of those are actually two-ace hands.
    num_times = 0
    num_good = 0
    deck = make_deck()
    for _ in range(num_mc):
        card1 = random.choice(deck)
        deck.remove(card1)               # draw the second card without replacement
        card2 = random.choice(deck)
        deck.append(card1)               # restore the deck for the next trial
        if is_ace(card1) or is_ace(card2):
            num_times = num_times + 1
            if is_ace(card1) and is_ace(card2):
                num_good = num_good + 1
    return [num_times, num_good]

def ace_of_spades(num_mc):
    # Assisted option, part 3: the dealer keeps only hands containing the
    # ace of spades (either card may be it); count the two-ace hands.
    num_times = 0
    num_good = 0
    deck = make_deck()
    for _ in range(num_mc):
        card1 = random.choice(deck)
        deck.remove(card1)
        card2 = random.choice(deck)
        deck.append(card1)
        if card1 == 'AS' or card2 == 'AS':
            num_times = num_times + 1
            if is_ace(card1) and is_ace(card2):
                num_good = num_good + 1
    return [num_times, num_good]

Notice that the uncertainty in the Monte Carlo results grows larger in part 2 and even larger in part 3.  This reflects the fact that the dealer only really affords us roughly 150,000 and 38,000 trials of useful information due to the rejection process.

Finally, there are a couple of philosophical points to touch on briefly.  First, the Monte Carlo results certainly support the frequentist point of view, but they are not actual proofs of the results.  Even more troubling is that a set enumeration, such as given above, is not a proof of the probabilities, either.  It is a compelling argument and an excellent model, but it presupposes that the probability should be calculated by the ratios as above.  Fundamentally, there is no way to actually prove the assumption that set ratios give us the correct probabilities.  This assumption rests on the belief that all possible two-card hands are equally likely.  This is a very reasonable assumption, but it is an assumption nonetheless.  Second, there is often an accompanying statement along the lines that the more that is known, the higher the likelihood of the result.  For example, knowing that one of the cards was an ace increased the likelihood that both were aces by a factor of 6.7.  While true, this statement is a bit misleading, since, in order to know, the dealer had to face the more realistic odds that 85 percent of the time he would be rejecting the hand.  So, as the player, our uncertainty was reduced only at the expense of a great deal of effort done on our behalf by another party.  This observation has implications for statistical inference that I will explore in a future column.

Random Thoughts on Randomness

One of the questions I like to indulge, from time to time, is “what would Aristotle’s reaction be to some fixture of modern life?” This week I decided to ponder what the famous Macedonian would say about the use of randomness, statistics, and probabilistic reasoning that pervades our daily existence.

On one hand, I think that he would be intrigued by the very way that randomness is built into nature and our interactions with her. Quantum mechanics has randomness at its very core. This type of randomness, sometimes referred to as aleatory (meaning depending on the dice) uncertainty, is (or at least seems to be) an inescapable facet of how matter works at its most basic level. The outcome of any experiment is probabilistic no matter how precisely controlled was the setup. I imagine the fascination and, perhaps, the disbelief that he would have displayed when first encountering the Heisenberg uncertainty principle. In fact, since I am fantasizing here, it would be really interesting to see these two great men sit down for lunch and a long philosophical discussion about the meaning of uncertainty and causality.

I also think that Aristotle would have quickly grasped how we apply uncertainty in our knowledge, sometimes referred to as epistemological uncertainty, to a host of important everyday activities.

I envision him nodding his head in approval to the sciences of chemistry, thermodynamics, and statistical mechanics. In each of these disciplines, the sheer amount of data needed to describe the motion of the individual atoms, molecules, or components is overwhelming, and one must sacrifice complete knowledge of the small in order to get good knowledge of the all. No doubt, the connection between information and entropy would consume much of his attention as he explored the combinatoric possibilities of very large numbers of objects.

Aristotle would also be at home in the discipline of statistics and estimation theory. As a skeptic of the universal validity of the Platonic forms, he well knew the individual differences in development and form of everything on the Earth. I picture him exclaiming, with satisfaction, “Yes! That’s how it should be done,” when he saw how statistical estimation theory is applied to turning noisy measurements into adequate conclusions. All the modern applications, from quality control to data fitting, would have caught his fancy.

Computing applications of randomness and uncertainty would also meet with his approval. Numerical techniques, such as the Monte Carlo method, that use random samples as a way of ferreting out details of a process would hold a particular fascination for him, and I imagine him spending hours playing with cellular automata simulations, such as Conway’s game of life. Most of all, the growing consensus in the artificial intelligence community that uncertainty is the vital component in producing machines that make rational decisions would have been pure delight for the Philosopher.

All that said, I think Aristotle would recoil in horror at that one, very large, component of randomness in modern life – the public’s approach to the scientific study.

Everywhere we go we are confronted by a headline, radio broadcast, news report, or tweet that says some study performed somewhere has determined something. Often times the public, even that component that is highly educated, consumes these studies with little or no heed. Most are unaware of how these studies are performed and how weak the inferences (i.e., conclusions) actually are.

If Aristotle were sitting amongst us when the news of the latest study were to break, he would be the first to caution us to be clear in our thinking and skeptical of the results. As a logician, he would know the difficulty in inferring a cause from the observed effect. And as a member of the Athenian polis, he would know better than we the various conflicting political agendas that drive people to various conclusions.

But the most grievous complaint that Aristotle would level against modern scientific studies would come only after he returned to the academy to see just how ‘the sausage is made’. He would be shocked and repulsed by the bastardization of the academic process. Not because of the political pressures and the conflicts of interest due to funding or the lack of it. No, he would be repulsed with how little regard our institutions of higher learning have for the concept of critical thinking.

To be a skeptic, to doubt a conclusion, to question and probe are no longer virtues (in the strictest Aristotelian sense). These traits brand their bearer not as a philosopher but as a dullard or, worse, a denier. Graduate students, post docs, and professors worldwide are increasingly drifting further away from intellectual integrity. It is enough, they say, to find a positive or negative correlation in some data, perform a hypothesis test, and publish a result (see, e.g., the critiques leveled by Michael Starbird in Meaning from Data: Statistics Made Clear). Damn critical thinking and the discomfort that it brings, they say. Aristotle, of course, would say differently.

The Dangers of Being Equal

One of the favorite themes of this blog is how language and reasoning affect each other, sometimes to the detriment of both.  The overlap between logical reasoning and mathematical language is particularly rife with possibilities for confusion because of the way certain concepts are used contextually.  In an earlier post I discussed the seductive properties of the humble symbol $\infty$.  A far more deadly symbol is the highly overloaded glyph described by two parallel horizontal lines – the equal sign ‘$=$’.

There are so many contextual uses of the equal sign that it is hard to know where to start.  And each and every one of them is sinister to the untrained.  Like some kind of bizarre initiation ritual, we subject students of all stripes to this ambiguous notation, and then we get frustrated when they don’t grasp the subtle distinctions and shaded nuances of meaning that we take for granted.  This situation closely parallels the experiences many of us have had learning how to swim, or ride a bike, or ice-skate, or drive.  Those of us who know how to do something, often can’t remember how hard it is to learn when you don’t know.

Of course, this situation is not unprecedented in language.  A simple internet search using the string ‘word with the most definitions’ returns the following statement from Dictionary.com

“Set” has 464 definitions in the Oxford English Dictionary. “Run” runs a distant second, with 396. Rounding out the top ten are “go” with 368, “take” with 343, “stand” with 334, “get” with 289, “turn” with 288, “put” with 268, “fall” with 264, and “strike” with 250.

So, functionally overloading a word with numerous meanings, some of them very closely related and some of them quite distinct, is commonplace.

What makes the equal sign so frustrating is that it is mostly applied in highly technical fields where shades of meaning in thought can have large implication in outcomes.  Consider the differences in meaning in the following equations:
\[ \pi = \frac{C}{D} = \frac{C}{2 r} \]
and
\[ \pi = \ln\left( i^{-2i} \right) \]
and
\[ \pi = \sqrt{6 \sum_{n=1}^{\infty} \frac{1}{n^2}} \; . \]

Each of them tells us something about the irrational number $\pi$, but in very different ways.  In the first equation, we think of $\pi$ as the assigned value for the ratio between the circumference of a circle, $C$, and its diameter, $D$.  This concept is purely geometric, and can be explored with rulers and compasses and pieces of paper.  In some sense, it can even be regarded as a causative relation, telling us that, if we make a circle of radius $r$, then we are making an object whose perimeter is a distance $C$.  The second equation is an identity in the purest sense of that term.  It boldly states that one of the many disguises of $\pi$ is an algebraic expression involving the natural logarithm and the imaginary number $i$.  The final equation is neither an assignment nor an identity, but a set of instructions saying ‘if you want to know how to compute $\pi$ to some accuracy, then set up a computing process that takes the first $n$ integers and combines them in this funny way.’

The science of computing has long recognized that the usual ambiguity of human language would be inadequate for machine instructions.  All programming languages to which I’ve been exposed clearly distinguish between the concepts of assignment, equivalence, and function definition.  Using the pi equations above, one might express them in the programming languages Python and Maxima as

Pi equation: \[ \small  \pi = \frac{C}{2r} \]

Python (assignment):

pi = C/(2*r)

Maxima (assignment):

pi : C/(2*r)

Pi equation: \[ \small \pi = \ln\left(i^{-2i}\right) \]

Python (test for identity):

pi == ln(i**(-2*i))

Maxima (identity/equation):

pi = ln(i**(-2*i))

Pi equation: \[ \small \pi = \sqrt{6 \sum_{n=1}^{\infty} \frac{1}{n^2}} \]

Python (function definition):

def sum_sqr(n):
    # Partial sum of 1/k**2 for k = 1..n.
    total = 0.0
    for k in range(1, n+1):
        total = total + 1.0/(k*k)
    return total

def approx_pi(n):
    # Approximate pi from the first n terms of the series.
    return (6*sum_sqr(n))**0.5

Maxima (function definition):

calc_pi(n) :=
  block([total],
    total : 0.0,
    for k : 1 thru n do
      total : total + 1.0/(k*k),
    sqrt(6*total));

Note that, in each case, there is a clear syntactical difference between assignment (‘$=$’ or ‘$:$’), the conditional test for identity (‘$==$’ or ‘$=$’), and functional definition (‘def’ or ‘$:=$’).  For anyone who’s been programming for some time, switching back and forth between these ideas of assignment, equivalence, and definition is relatively effortless, but for the beginner it is one of the hardest concepts he will have to learn.

The situation is even more complex in the physical sciences, for two primary reasons.  First, and foremost, because man has been investigating the physical world longer than he has been writing computer programs.  As a result, there has been more time for man to layer different meanings and subtle distinctions.  Second, computers are intrinsically stupid and require a high degree of precision and clarity to function.  A nice discussion of this last point can be found in the prologue of the book Functional Differential Geometry by Sussman and Wisdom.

As an example, let’s look at perhaps the most famous physical statement – Newton’s second law.  Many people, even those lacking formal training in science, know that the expression of the law is ‘force equals mass times acceleration’ or, in mathematical terms,

\[ \vec F = m \vec a \; . \]

But what does the equal sign here mean?  The concept of a force tells us that it is a vector quantity that transforms like a position vector.  That means that a force relationship is the same in all frames.  For example, the balancing of the pulls from three ropes tied to an object such that the object doesn’t move is an equilibrium condition that is independent of the frame in which it is expressed.  An accelerating observer will make the same conclusion as an inertial observer. So the force on the left-hand side of $f=ma$ is geometric in its meaning.

On the other hand, we understand that the acceleration appearing on the right-hand side is kinematic.  It describes an object’s motion and it’s the kind of thing measured with rulers and clocks.  It is fundamentally frame dependent when described by an accelerating observer.  Just imagine the visual perception of someone on a merry-go-round.  The mass, which measures the object’s unwillingness to move under the influence of a force, simply scales the acceleration and can be regarded as constant.

So how do we reconcile what the equal sign means here?   On one side is a geometric quantity as immutable and placid as a mountain.  The other side is as ephemeral as rising mist or running water, flowing to and fro.  How can they actually be equal?

Well, the answer is that the equal sign should be regarded as relating cause and effect.  If we regard the force as known (e.g., Newton’s universal law of gravity), then the equal sign allows us to deduce the resulting motion once the force is applied.  If we regard the acceleration as known (e.g., we film the motion and do a frame analysis), we can infer (via abductive reasoning) the force that caused it.
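As a cartoon of this ‘force as cause’ reading, here is a sketch (with made-up numbers, not a serious integrator) that regards the force as known and marches the motion forward in time; the equal sign then deduces the effect (the motion) from the cause (the force).

def integrate_motion(force, m, x0, v0, dt, n_steps):
    # Given a known force law F(x, v, t), deduce the motion with Euler steps:
    # a = F/m, then v <- v + a*dt and x <- x + v*dt.
    x, v, t = x0, v0, 0.0
    trajectory = [x]
    for _ in range(n_steps):
        a = force(x, v, t) / m
        v = v + a * dt
        x = x + v * dt
        t = t + dt
        trajectory.append(x)
    return trajectory

# Uniform gravity near the Earth's surface as the known 'cause'; the deduced
# 'effect' is the familiar accelerating fall of a dropped object.
heights = integrate_motion(lambda x, v, t: -9.81, m=1.0, x0=100.0, v0=0.0, dt=0.01, n_steps=100)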

Clearly, the innocent-looking ‘$=$’ packs a lot more meaning than at first it appears. It is interesting to ponder why it is that the shortest of strings, such as ‘$\infty$’, or ‘set’, or ‘$=$’, have the longest and deepest of meanings. Maybe it reflects the subtlety of the human mind.

It Isn’t Elementary

I suppose that this post grew out of a recent opportunity I had to re-watch the M*A*S*H television series. One particular episode, entitled The Light That Failed, finds the army surgeons and nurses of the 4077th unit suffering through a brutal Korean winter and desperately low on supplies.  The supply truck soon arrives bearing a cargo more suited for a military unit in Guam or the Philippines than mainland Asia.  As the troops are left wondering what they can do with mosquito netting, ice cream churns, and swim fins when the temperature is hovering around freezing, someone notices that one of the doctors, B J Hunnicutt, has received a very rare object – a book.

And this is not just any book, but a murder mystery by the famed writer Abigail Porterfield called The Rooster Crowed at Midnight.  Either out of the goodness of his heart or out of a desire to end the constant nagging (probably both), B J decides to tear portions of the book out so that it can circulate throughout the camp.  As the old saying goes, no good deed goes unpunished, and soon he discovers that the last page of the book, in which all is revealed, is missing.  And thus begins the great debate as to who committed the murders, how they did it, and why.

The team comes up with many answers, all of which are first widely embraced as the solution and then scuttled when someone gives a counterexample that pokes a hole in the theory.  They eventually place a long distance phone call to the author herself, now residing in Australia, to get the answer.  But even this authoritative voice doesn’t quell the skepticism.  Shortly after they ring off, Col. Potter, the commanding officer, points out that Abigail Porterfield’s own solution can’t be true.  The episode closes with the delivery of the much-needed supplies, and some comic hijinks, but with no satisfactory explanation as to who the culprit was.

I was in middle school when I first saw that episode and it left a lasting impression on me.  For many years I carried misconceptions about mystery stories and I wondered why anyone would ever read them.  In particular, I held a very skewed idea about deductive reasoning and what can and cannot be determined from the evidence.  With the perspective of years (really decades) I am both happy and disappointed to say that I was not alone in my poor understanding of what logic and reason are capable of achieving.

Let’s talk about deductive logic first.  The basic idea behind deductive logic is that the conclusion is guaranteed to be true if the premises are true.  It is a strong approach to logic, since it argues from first principles that apply to a broad class or set of objects and, from those, narrows down a conclusion about a specific object.  In a pictorial sense, deductive logic can be thought of in terms of Venn diagrams: to conclude something about an object, we need only know into what categories or classes it falls, and the intersection of those categories tells us exactly what can be concluded about it.

Deductive reasoning is, unfortunately, also limited by the fact that no one is born with, nor ever acquires, the universe’s user manual spelling out in detail what attributes each object has and into what categories it may be grouped.  So the standard objections raised against a deductive argument fall squarely on disagreements about the truth of one or more premises.

For example, the syllogism

  • All men are mammals
  • George is a man
  • Therefore George is a mammal

is a logically correct deduction, since the conclusion follows from the premises, and it is true since the premises are true (or at least we regard them as true).  The syllogism

  • All men are mammals
  • George is a mammal
  • Therefore George is a man

is invalid, even though its premises are true: knowing that George is a mammal does not place him among the men, since the first premise runs in only one direction.  In contrast, the syllogism

  • All white birds are man-eaters
  • All swans are white birds
  • Therefore all swans are man-eaters

is perfectly valid, since the conclusion follows from the premises, but its conclusion is not true, since neither premise is true (or so I hope!).
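
To make the Venn-diagram picture concrete, here is a toy sketch in Python; the sets and their members are invented purely for illustration.  It shows why the first syllogism is forced by its premises while the second is not.

    # Toy universe, invented only to illustrate the Venn-diagram picture of deduction
    mammals = {"George", "Tim", "Fido"}   # Fido is the neighbor's dog
    men = {"George", "Tim"}               # 'All men are mammals': men is a subset of mammals

    # Valid syllogism: the premises force the conclusion
    assert men <= mammals                 # premise: all men are mammals
    assert "George" in men                # premise: George is a man
    print("George" in mammals)            # True, and it cannot be otherwise

    # Invalid syllogism: membership in the larger set says nothing about the smaller one
    assert "Fido" in mammals              # premise: Fido is a mammal
    print("Fido" in men)                  # False; the 'conclusion' fails even though the premises hold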

All of this should be familiar.  But what to make of this ‘deduction’ offered by B J Hunnicutt in the episode (call it B J’s syllogism):

  • Lord Cheevers was murdered in the locked library in Huntley Manor
  • Randolf had motive for murdering Lord Cheevers
  • Randolf played in Huntley Manor as a child
  • Randolf would have known if there were secret passages in Huntley Manor
  • Therefore Randolf was the murderer.

Is this really a deduction as he claimed?  According to the novel, the first three premises are true.  The fourth premise is certainly plausible but is not necessarily true.  How then should we feel about the conclusion?  What kind of logic is this if it is not deductive?  Suppose we knew that Randolf was the murderer (e.g., we caught him in the act). What can we infer about the fourth premise?

Before answering these questions, consider what would happen if we were to modify the argument a bit to simplify the various possibilities.  The syllogism (B J’s syllogism revised) now reads

  • Lord Cheevers was murdered in the locked library in Huntley Manor
  • Randolf had motive for murdering Lord Cheevers
  • Randolf played in Huntley Manor as a child
  • Randolf knew there was a secret passage from the study to the library
  • Randolf was seen entering the empty study just before the murder
  • Therefore Randolf was the murderer.

This argument is certainly stronger than the first one proffered, but it still isn’t conclusive.  Again, though, how should we feel about the conclusion?  What kind of logic is this?

In both cases, we know that the conclusion is not iron-clad; that it doesn’t necessarily follow from the premises.  But just like those fictional characters in M*A*S*H, we are often faced with the need to draw a conclusion from a set of premises that do not completely ‘nail down’ an unequivocal conclusion.

The type of logic that deals with uncertainty falls under the broad descriptions of inductive and abductive reasoning.  Inductive reasoning allows us to draw a plausible conclusion ‘B’ from a set of premises ‘A’ without ‘B’ necessarily following from ‘A’.  Abductive reasoning allows us to infer the premise ‘A’ based on our knowledge that outcome ‘B’ has occurred.

In the M*A*S*H examples given above, B J’s revised syllogism is an example of inductive reasoning.  All the necessary ingredients are there for Randolf to have committed the crime but there is not enough evidence to inescapably conclude that he did. We can infer that Randolf is the killer but we can’t conclude that with certainty.

B J’s original syllogism is a lot more complicated.  It involves elements of both inductive and abductive reasoning.  If we believe Randolf is guilty, we might then try to establish that there were secret passages in Huntley Manor that connected the locked library to some other room in the mansion.  We would then also have to establish, maybe through eyewitness testimony, that Randolf knew of the passages (e.g., an old servant recalls showing one to a young Randolf).  Even so, all we would be doing is establishing the premises with more certainty; the conclusion of his guilt would still not necessarily follow.  If, on the other hand, we knew that he was guilty (perhaps he was seen by someone looking into the library from outside), we might abductively infer that there was a secret passage and that Randolf knew of its existence.

So here it is, a great irony of life.  It’s decades after I first watched an episode of M*A*S*H that turned me off of mystery stories for a long time, and I find myself using that episode as a model for discussing logic and reason.  That, I can’t figure out.

The Language of Infinity

“How language shapes thought and thought shapes language” is an age-old question in linguistics and philosophy.  I’m not in any position to give a definitive answer, nor, I suspect, is anyone else.  Having taught math and physics at the university level, though, I am willing to offer some thoughts about how the language of mathematics, and the symbols and glyphs used to turn mathematical concepts into written words, shapes how people think and solve problems.

In this blog I will be focusing on the concept of infinity and the philosophical implications that come from using it.  But before I get to infinity directly, I would like to discuss, by way of a warm-up exercise, how the use of the symbol ‘x’ throws off a lot of beginning students.

When describing a function or mapping between two sets of real numbers, the most common notation by far is to let the symbol ‘x’, called the independent variable, stand for any member of the initial set, and the symbols ‘y’ and ‘f(x)’ stand for the corresponding member of the target set and the function that generates it.  The symbolic equation ‘y = f(x)’ becomes so rigidly fixed in some students’ minds that the idea that the symbols ‘x’ or ‘y’ could be replaced with any other symbols, say ‘y’ and ‘z’, never occurs to them.  I have had students come and ask whether their book has a typo when it asks them to solve ‘x = f(y)’ or integrate ‘f(y) dy’ or the like (once this happened while I was out to dinner at Olive Garden with my family – but that is a story for another day).
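
A one-line worked example makes the point that the letter is only a placeholder; renaming the variable of integration changes nothing:

\[ \int_0^1 x^2 \, dx = \int_0^1 y^2 \, dy = \frac{1}{3} \; . \]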

There is no easy way to fix this problem, as there is a kind of catch-22 in the teaching of mathematics.  On one hand, the mapping between sets exists as a pictorial relation between ‘clouds’ and ‘the points within them’

[Figure: a pictorial mapping between two sets]

without the need for written glyphs.  On the other hand, a fixed set of well-defined symbols keeps early confusion to a minimum and allows the student to focus on the concepts without all the possible freedom of choice in notation getting in the way.  (Note:  a reader comfortable with classical philosophy may point out that a mapping between sets can be abstracted even further, perhaps to the notion of a Platonic form, but this is a side issue.)

Okay, with the appetizer firmly digesting in our minds, let’s turn to perhaps the most confusing symbol in all of mathematics, the symbol for infinity, ‘$\infty$’. This symbol, which looks like the number ‘8’ passed out after a night of heavy drinking, seduces students and instructors alike into all sorts of bad thoughts.

How does it have this power, you may ask? Well, its very shape, small and compact and slightly numberish, encourages our minds to treat it like all other numbers. There are literally countless examples of infinity masquerading as a plain number, much like a wolf in sheep’s clothing. One of the most egregious examples is the innocent-looking expression

\[ \int_0^\infty e^{-x^2} dx = \frac{\sqrt{\pi}}{2} \]

where ‘$\infty$’ is compared side-by-side with the number ‘0’.  There is perhaps a more palatable way of writing the integral as

\[ \lim_{a \rightarrow \infty} \int_0^a e^{-x^2} dx = \frac{\sqrt{\pi}}{2} \]

but it still looks as though the number ‘a’ can be thought of as becoming or approaching ‘$\infty$’.  A seasoned practitioner knows that both expressions are really shorthand for something much more involved.  I will summarize what this more involved thing is in one short and sweet sentence.  Infinity is a process that you have the freedom to perform as many times as you like.  Or even shorter: Infinity is an inexhaustible process.

Take a moment to think that through.  Are you back with me?  If you don’t see the wisdom in that maxim, consider either form of the integral expression listed above.  In both cases, what is really being said is the following.  Pick an upper bound for the integral (call it ‘a’ to be consistent with the second form).  Evaluate the integral for that value of ‘a’.  Record the result.  Now increase ‘a’ a bit more, maybe by doubling it or multiplying it by 10 or however you like, as long as it gets bigger.  Evaluate the integral again and record the result.  Keep at it until one of several things happens: 1) the difference between successive recorded values has dropped below some threshold, 2) you run out of time, or 3) you run out of patience and decide to go do something else.  The term infinity simply means that you have the freedom to decide when to stop, and the freedom to resume whenever you like and continue onwards.
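
Here is a minimal sketch of that recipe in Python; the trapezoid rule and the doubling schedule are my own arbitrary choices, made only to show the process in action.

    import math

    def integral_to(a, n=20_000):
        # Trapezoid-rule estimate of the integral of exp(-x^2) from 0 to a
        h = a / n
        total = 0.5 * (math.exp(0.0) + math.exp(-a * a))
        for i in range(1, n):
            x = i * h
            total += math.exp(-x * x)
        return h * total

    a, previous, threshold = 1.0, float("inf"), 1e-6
    while True:
        current = integral_to(a)
        if abs(current - previous) < threshold:  # 1) successive answers have settled down
            break
        previous, a = current, 2.0 * a           # otherwise push the bound out and go again

    print(current, math.sqrt(math.pi) / 2)       # both are approximately 0.8862269

The loop is free to stop, or to be restarted later with a bigger bound, at any time; that freedom is the whole point of the maxim.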

If you are new to calculus, you will no doubt find this short sentence definition somewhat at odds with what your instructors have told you. Where are all the formal limits and precise nomenclature? Where is all the fancy machinery? If you are an old hand at calculus you may even be offended by the use of the words ‘process’, ‘freedom’, or ‘inexhaustible’. But this sentiment is exactly at the heart of the Cauchy delta-epsilon formalism, and the casual nomenclature has the advantage of ruthlessly demolishing the ‘high-brow’ language of mathematics to bring what is really a simple idea back to its rightful place as an everyday tool in the thinking person’s toolbox.

On the other hand, you may be thinking that everyone knows this and that I am making a mountain out of a molehill.  If you fall into that camp, consider this video about the properties of zero by Numberphile.

I must admit I like many of Numberphile’s videos, but this one made me shake my head.  They allowed language to affect their thinking, and they were seduced by the evil camouflage powers of infinity.  They go to great trouble to explain why you can’t divide by zero, note that people ask “isn’t dividing by zero just infinity?”, and insist that it isn’t that simple.

The problem is, it is that simple!  Dividing by zero is infinity as understood by the maxim above, and the Numberphile presenters prove this themselves.  At about a minute into the video, one of them begins to explain how multiplication is ‘glorified addition’ and division is ‘glorified subtraction’.  The argument for ‘glorified subtraction’ goes something like this.

If one wishes to divide 20 by 4, one keeps subtracting 4 until one is left with a number smaller than 4 (in this case zero).  The number of times one performs this subtraction is the answer, with whatever is left over being the remainder.  Dividing 17 by 5, for instance, is shorthand for subtracting 5 from 17 three times and finding that 2 is left over, so one says 17/5 = 3 with a remainder of 2.

The same bloke (I use bloke because of their cool English or Australian or whatever accents) then says that 20 divided by 0 goes on forever, because each time you subtract 0 you are left with 20.  Here then is the inexhaustible process that lives at the very heart of infinity.  Sadly, while he looks like he is about to hit the bull’s-eye (at 2:20 he even says infinity isn’t a number), his aim goes horribly awry at the last moment when he argues that the expression ‘1/0 = $\infty$’ can’t be true, because one could then go on to write ‘1/0 = $\infty$ = 2/0’ and conclude that ‘1 = 2’.

This is, of course, a nonsensical objection, since the expression ‘1/0 = $\infty$’ is shorthand for saying ‘the glorified subtraction of 0 from 1 (in the sense used above) is an inexhaustible process.’  It is no more meaningful to equate this process with the ‘glorified subtraction of 0 from 2’ than it is to equate ‘1/0’ with any other inexhaustible process, like halving a non-zero number over and over in the hope of reaching zero.
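
A few lines of Python make the ‘glorified subtraction’ picture explicit; the function and the step cap are my own additions, the cap being there only to keep the divisor-zero case from literally running forever.

    def divide_by_subtraction(dividend, divisor, max_steps=1_000_000):
        # Division as 'glorified subtraction': count how many times the divisor
        # can be taken away before what remains is smaller than the divisor.
        count, remainder = 0, dividend
        while remainder >= divisor:
            if count >= max_steps:       # a divisor of 0 never shrinks the remainder,
                return None, remainder   # so the process is inexhaustible rather than a number
            remainder -= divisor
            count += 1
        return count, remainder

    print(divide_by_subtraction(17, 5))  # (3, 2): 17/5 is 3 with a remainder of 2
    print(divide_by_subtraction(20, 0))  # (None, 20): subtracting 0 gets you nowhere, forever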

The fact that stringing the symbols ‘0’, ‘1’, and ‘$\infty$’ into the sentence ‘1/0 = $\infty$’, and then treating ‘$\infty$’ as an ordinary number, leads to an illogical conclusion is an important warning about the power of language to shape thought.  The Numberphile presenters had all the right ideas, but they arrived at a wrong result.