Monthly Archive: May 2015

Self-Reference and Paradoxes

The essence of the Gödel idea is to encode not just the facts but also the ‘facts about the facts’ of the formal system being examined within the framework of that same system.  This meta-mathematical technique allowed Gödel to express, within the context of the system itself, simple facts like ‘2 + 2 = 4’ as well as hard facts like ‘not every true statement is an axiom or a theorem – some are simply beyond the reach of the formal system to prove’.  The hard facts come from the system talking about, or referring to, itself in its own language.

As astonishing as Gödel’s theorem is, the concept of paradoxes within self-referential systems is actually a very common experience in natural language.  All of us have played at one time or another with odd sentences like ‘This sentence is false!’  Examined from a strictly mechanical and logical vantage, how should that sentence be parsed?  If the sentence is true, then it is lying to us.  If it is false, then it is sweetly and innocently telling us the truth.  This example of the liar’s paradox has been known since antiquity, and variations of it have appeared throughout the ages in stories of all sorts.

Perhaps the most famous example comes from the original Star Trek television series in an episode entitled ‘I, Mudd’.  In this installment of the ongoing adventures of the starship Enterprise, an impish Captain Kirk defeats a colony of androids that hold him and his crew hostage by exploiting their inability to be meta.

There are actually a host of paradoxes (or antinomies, in the technical speak) that some dwerping around on the internet can uncover in just a handful of clicks.  They all arise when a formal system talks about itself in its own language, and their paradoxical character often emerges when the self-reference is of a negative sort.  The sentence ‘This sentence is true’ is fine, while ‘This sentence is false’ is not.

Not all of the examples show up as either interesting-but-useless tricks of the spoken language or as formal encodings in mathematical logic.  One of the most interesting cases deals with libraries, whether of the brick-and-mortar variety or those existing solely on hard drives, in RAM, and in FTP packets.

Consider for a moment that you’ve been given charge of a library.  Properly speaking, a library has two basic components: the books to read and a system to catalog and locate the books so that they can be read.  Now thinking about the books is no problem.  They are the atoms of the system and so can be examined separately or in groups or classes.  It is reasonable and natural to talk about a single book like ‘Moby Dick’ and to catalog this book along with all the other separate works that the library contains.  It is also reasonable and natural to talk about all books written by Herman Melville and to catalog them within a new list, perhaps titled ‘List of works by H. Melville’.  A similar list can be made with a grouping criterion that selects books about the books by Melville.  This list would have a title like ‘List of critiques and reviews of the works by H. Melville’.

An obvious extension would be to construct something like the following list.

List of Author Critiques and Reviews:

  • List of critiques and reviews of H. Melville
  • List of critiques and reviews of J. R. R. Tolkien
  • List of critiques and reviews of U. Eco
  • List of critiques and reviews of R. Stout
  • List of critiques and reviews of G. K. Chesterton
  • List of critiques and reviews of A. Christie
  • ….

Since the lists are themselves written works, what status do they have in the cataloging system?  Should there also be lists of lists?  If so, how deep should their construction go?  At some point won’t we arrive at lists that have to refer to themselves, and what do we do when we reach that point?  Should the library catalog have a reference to itself as a written work?

Bertrand Russell wrestled with these questions in the context of set theory around the turn of the 20th century.  To continue with the library example, Russell would label the ‘List of Author Critiques and Reviews’ a normal set, since it is a collection of things that doesn’t include itself.  He would label as an abnormal set any list that has itself as a member – in this case, a catalog (i.e., a list) of all lists pertaining to the library.  General feeling suggests that the normal sets are well behaved but the abnormal sets are likely to cause problems.  So let’s just focus on the normal sets.  Russell asks the following question about the normal sets: Is the set, R, of all normal sets, itself normal or abnormal?  If R is normal, then it must appear as a member in its own listing, thus making R abnormal.  Alternatively, if R is abnormal, it can’t be listed as a member within itself and, therefore, it must be normal.  No matter which way you start, you are led to a contradiction.
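For the programmatically minded, the knot can be felt just by trying to build R.  Below is a small, purely illustrative Python sketch (not a proof, and certainly not anything Russell wrote); the list names and the membership table are made up to echo the library example.  However the construction is ordered, R ends up violating its own membership rule.

```python
# A toy illustration (not a proof) of Russell's question, assuming we model
# library lists as named entries in an explicit membership table.
# A list is 'normal' if it does not appear among its own members.

def is_normal(name, members):
    return name not in members[name]

# membership table: every list maps to the lists it contains (made-up data)
members = {
    "critiques_of_melville": [],
    "list_of_author_critiques": ["critiques_of_melville"],
    "R": [],   # R is meant to collect exactly the normal lists
}

# Build R: add every list that is normal at the moment we check it.
for name in list(members):
    if is_normal(name, members):
        members["R"].append(name)

# R was normal when checked, so it was added to itself -- which makes it
# abnormal, contradicting the rule that R contains only normal lists.
print("R contains itself?", "R" in members["R"])    # True
print("R is normal?", is_normal("R", members))      # False
```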

The natural tendency is, at this point, to cry foul and to suggest that the whole thing is being drawn out to an absurd length.  Short and simple answers to each of the questions posed in the earlier paragraph come to mind with the application of a little common sense.  Lists should only themselves be cataloged if they are independent works that are distinct parts of the library.  The overall library catalog need not list itself because its primary function is to help the patron find all the other books, publications, and related works in the library.  If the patron can find the catalog, then there is no need to have it listed within itself.  On the other hand, if the patron cannot find the catalog, having it listed within itself serves no purpose – the patron will need something else to point him towards the catalog.

And as far as Russell and his perfidious paradox are concerned, who cares?  This might be a matter to worry about if one is a stuffy logician who can’t get a date on a Saturday night, but normal people (does this mean Russell and his kind are abnormal?) have better things to do with their lives than worry about such ridiculous ideas.

Despite these responses, or maybe because of them, we should care.  The application of common sense is actually quite sophisticated, even if we are quite unaware of the subtleties involved.  In all of these common-sensical responses there is an implicit assumption about something standing above or outside the system.  If the patron can’t find the library catalog, well then that is what a librarian is for – to point the way to the catalog.  The librarian doesn’t need to be referred to or listed in the catalog.  He sits outside the system and can act as an entry point into the system.  If there is a paradox in set theory, not to worry; there are more important things than complete consistency in formal systems.

This concept of sitting outside the system is at the heart of the current differences between human intelligence and machine intelligence.  The latter, codified by the formal rules of logic, can’t resolve these kinds of paradoxes precisely because machines can’t step outside themselves like people can.  And maybe they never will.

Gödel’s Theorem – General Notions

As I mentioned in last week’s blog, I was encouraged by Casti’s chapter on the Halting Theorem and questions of undecidability in logic and computing.  In fact, I was inspired enough that I resolved to have another go at studying Gödel’s theorem.

To give some background, many years ago I came across ‘Gödel, Escher, Bach: An Eternal Golden Braid’ by Douglas Hofstadter.  While I appreciate that his work is considered a classic, I found it difficult and ponderous.  Its more than 700 pages of popularization did little to nothing to really drive home Gödel’s theorem and the connections to Turing and Church.  What’s more, Hofstadter seems to say (it’s difficult to tell exactly, as he mixes and muddles many concepts) that Gödel’s work supports his idea that consciousness can emerge from purely mechanistic means at the lowest level.  This point always seemed dodgy to me, especially since Hofstadter left me with the impression that Gödel’s theorem showed a fundamental lack in formal logic, not an emergent property.

For this go around, I decided that smaller was better and picked up the slim work entitled ‘Gödel’s Proof’ by Nagel and Newman.  What a difference a serious exposition makes.  Nagel and Newman present all the essential flavor and some of the machinery that Gödel used in a mere 102 pages.

Note that formal logic is not one of my strong suits, and the intellectual terrain was rocky and difficult.  Nonetheless, Nagel and Newman’s basic program is laid out well and consists of the following points.

To start, the whole thing was initiated by David Hilbert, whose Second Problem challenged the mathematical community to provide an absolute proof of the consistency of a formal system (specifically arithmetic) based solely on its structure.  This idea of absolute proof stands in contrast to a relative proof, where the system in question is put into relation with another system whose validity and consistency are accepted.  If the relation between the two is faithful, then the consistency of the second system carries over, or is imparted, to the first.

Hilbert was unsatisfied by the relative approach as it depended on some system being accepted at face value as being consistent and finding such a system was a tricky proposition.

The preferred way to implement an absolute proof is to start by stripping away all meaning from the system and to deal only with abstract symbols and a mechanistic way of manipulating those symbols using well-defined rules.  The example that Nagel and Newman present is the sentential calculus, slightly adapted from the ‘Principia Mathematica’ of Whitehead and Russell.  The codification of the formal logic system depends on symbols that fall into two classes: variables and constant signs.  Variables, denoted by letters, stand for statements.  For example, ‘p’ could stand for ‘all punters kick the football far’.  There are six constant signs with the following mapping:

  • ~ : Not
  • $\vee$ : Or
  • $\supset$ : If… Then…
  • $\cdot$ : And
  • ( : Left-hand punctuation
  • ) : Right-hand punctuation

The idea is then to map all content-laden statements, like ‘If either John or Tim is late then we will miss the bus!’, to formal statements like ((J $\vee$ T) $\supset$ B), with all meaning and additional fluff removed.  Two rules for manipulating the symbols, the Rule of Substitution and the Rule of Detachment, are adopted, and four axioms are used as starting points.
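To get a feel for what ‘mechanical manipulation of symbols’ means in practice, here is a minimal Python sketch of the Rule of Detachment acting on formulas represented as nested tuples.  The representation and the function names are my own illustrative choices, not Nagel and Newman’s notation.

```python
# A minimal sketch: formulas as nested tuples, manipulated purely by shape.
# ('implies', X, Y) plays the role of (X [superset] Y); names are illustrative.

def detach(premise, conditional):
    """Rule of Detachment (modus ponens): from S and (S implies T), produce T."""
    op, antecedent, consequent = conditional
    if op == 'implies' and antecedent == premise:
        return consequent
    raise ValueError("the Rule of Detachment does not apply here")

# 'If either John or Tim is late then we will miss the bus' -> ((J v T) implies B)
conditional = ('implies', ('or', 'J', 'T'), 'B')
premise = ('or', 'J', 'T')           # 'John or Tim is late'

print(detach(premise, conditional))  # prints 'B' -- 'we will miss the bus'
```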

In using this system, one has to sharpen one’s thinking to be able to distinguish statements in the system (mathematical statements like ‘2 > 1’) from statements about the system (meta-mathematical statements such as the claim that ‘2 > 1’ is true, or that the ‘>’ symbol is an infix operator).  One must also be careful to note the subtle differences between the symbol ‘0’, the number 0, and the concept of zero meaning nothing.

The advantage of this approach is that the proofs are cleaner, especially when there are many symbols.  The disadvantage is that it takes time and effort to be able to work in this language.

The consistency of the formal system is shown when there is at least one formal statement (or formula) that cannot be derived from the axioms.  The reason for this is a bit involved, but it goes something like this.  The formula ‘p $\supset$ (~p $\supset$ q)’ can be derived in the sentential calculus.  If the statements S and ~S are both deducible, then, by substituting S for p and any formula you like for q (Rule of Substitution) and detaching twice (Rule of Detachment), that arbitrary formula q follows from the axioms.  In other words, an inconsistent system proves everything, so exhibiting even one underivable formula establishes consistency.  In the words of Nagel and Newman:

‘The task, therefore, is to show that there is at least one formula that cannot be derived from the axioms.’ [p51]

For the sentential calculus, they point out that the formula ‘p $\vee$ q’ fits the bill: although it is a well-formed formula, it does not follow from the axioms.  (Their argument, roughly, is that every axiom is a tautology, that the two rules preserve tautology-hood, and that ‘p $\vee$ q’ is not a tautology.)  Thus this system is consistent.  Note that there is no truth statement attached to this formula.  The observation simply means that ‘p $\vee$ q’ can’t be obtained from the axioms by mechanical manipulation.  They present this argument in their chapter titled ‘An Example of a Successful Absolute Proof of Consistency’.
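As a quick sanity check on the last step of that argument, here is a small Python snippet confirming that ‘p $\vee$ q’ is not a tautology; it fails when both p and q are false.  This is only bookkeeping for the truth-table reading, not the formal derivation itself.

```python
# Enumerate the truth table of 'p v q' and check whether it is a tautology.
from itertools import product

rows = [(p, q, p or q) for p, q in product([True, False], repeat=2)]
for p, q, value in rows:
    print(f"p={p!s:5} q={q!s:5}  p v q = {value}")

# Not a tautology: the formula is false when p and q are both False.
print("tautology?", all(value for _, _, value in rows))   # prints: tautology? False
```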

Well, despite that lofty achievement, they go on to show how Gödel took a similar approach and largely ended any hope for a proof of absolute consistency in formal systems of much greater complexity, such as arithmetic.  Gödel used as his symbols the numerals, and as his rules the basic operations of arithmetic – in particular, the arithmetic of prime numbers.  Using an ingenious mapping of the variables and constant symbols to numbers, he could encode not only the structure of the formal system itself but also statements about the formal system (meta-mathematics).  In the language of Hofstadter, these are self-referential statements, although Nagel and Newman don’t use this term.
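To make the ‘mapping to numbers’ idea concrete, here is a toy Python sketch of a Gödel-style numbering: each symbol gets a small code, and a whole formula becomes a single number built from prime powers.  The symbol codes below are made up for illustration; they are not Gödel’s actual assignment.

```python
# Toy Goedel numbering: the symbol codes are illustrative, not Goedel's own table.
from math import prod

SYMBOL_CODES = {'0': 1, '=': 2, '~': 3, 'v': 4, '(': 5, ')': 6, 's': 7}

def first_primes(n):
    """Return the first n primes by simple trial division (fine for short formulas)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def goedel_number(formula):
    """Encode a string of symbols as a single natural number via prime powers."""
    codes = [SYMBOL_CODES[ch] for ch in formula]
    return prod(p ** c for p, c in zip(first_primes(len(codes)), codes))

print(goedel_number('0=0'))   # 2**1 * 3**2 * 5**1 = 90
```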

Using this approach, Gödel was able to prove that there is no absolute proof of consistency.  At best, the system can say about itself that it is either incomplete or inconsistent.  If the system is incomplete, there are true statements that are not among the axioms and that cannot be derived from them.  Enlarging the set of axioms to include them doesn’t work, since their presence begets new unprovable truths.  If the system is inconsistent, then everything is ‘true’, as discussed above.

Nagel and Newman leave the reader with some final thoughts that are worth contemplation.  On the hope that Hilbert’s program can be successfully completed, they have this to say:

‘These conclusions show that the prospect of finding for every deductive system an absolute proof of consistency that satisfies the finitistic requirements of Hilbert’s proposal, though not logically impossible, is most unlikely.’

They also comment on the possibility of proving consistency from an outside-looking-in approach using meta-mathematical techniques when they say

‘[W]hether an all-inclusive definition of mathematical or logical truth can be devised, and whether, as Gödel himself appears to believe, only a thoroughgoing philosophical ‘realism’ of the ancient Platonic type can supply an adequate definition, are problems still under debate…’

Finally, they have this to say about artificial intelligence (although that term wasn’t in vogue at the time they published):

‘[T]he brain appears to embody a structure of rules of operation which is far more powerful than the structure of currently conceived artificial machines.  There is no immediate prospect of replacing the human mind by robots.’

And there you have it: a whirlwind tour of Gödel’s theorem, with surprise appearances by philosophy from antiquity and ideas about artificial intelligence.

Turing, Gödel, and the Universe

Something about the book ‘Five Golden Rules: Great Theories of 20th-Century Mathematics and Why They Matter’ by John L. Casti caught my eye the other day at the library.  On a whim, I signed the book out and started reading.  Living up to the promise of the title, the book had five chapters, each one devoted to one of the great math theories of the 20th century.  All in all, I had been exposed, at least superficially, to all the topics covered, so I wasn’t sure what I would get out of the book other than some additional insight into material I already knew or confirmation of my wisdom in staying away from topics that I didn’t.

Anyway, I am quite glad that the mechanism of providence pointed me at this book because the connections that Casti draws are worth thinking about.  None of these connections was as profound for me as the deep link that he explores in Chapter 4 entitled ‘The Halting Theorem (Theory of Computation)’.

In this chapter, Casti first presents the concept of the universal Turing machine (UTM) as a mechanism for attacking the Decision Problem (or Entscheidungsproblem) posed by David Hilbert in 1928.  Casti then couples this presentation with a discussion of the famous Gödel Incompleteness Theorem.

I’m simply a beginner in these fields, and Casti omits important details and glosses over a variety of things to help the novice.  From this point of view, I can’t recommend his work.  But I am grateful for, and excited about, the perspective he provided by making this linkage.

To understand my excitement, let me first try to fill in some of the background as best as I can.

Hilbert’s Decision Problem asks the following question.  Given a set of input data and an algorithm for manipulating said data, is there a way to know whether the algorithm will be able to make a yes/no decision about the input?  For example, if the input data is a set of axioms in a logical system along with some assertion in the same system, and the algorithm is a logical formalism, such as the classical syllogism, will the algorithm be able to prove or disprove the assertion, deciding it as either true (yes) or false (no)?

While the Decision Problem might seem straightforward enough to talk about at a seminar, when it comes to actually tackling the question there are some vague conceptions that need further elaboration.  Specifically, what the term ‘algorithm’ really means, and what constitutes a workable algorithm, seem to have been the murkiest parts when the problem was first posed.

Turing chose to clarify the algorithm concept by inventing the machine that bears his name.  The UTM is a basic model of computation that is often used to understand the theory behind computer programming.  It is also the avenue that gave Turing a way to tackle Hilbert’s Decision Problem.  The basic ingredients for a UTM are a tape or strip of paper divided into squares, a set of symbols (usually ‘0’ and ‘1’) that can be written into the squares, and a box that contains a mechanism for moving the strip left or right, a head that reads from and writes to the strip, and an internal state that allows the UTM to select an action from a pre-defined set, given that state and the current symbol being read.  A typical graphical representation of a UTM looks like this:

[Figure: schematic of a Turing machine, showing the tape, the read/write head, and the box holding the internal state]

The UTM can represent the input to the decision process by a particular pattern of symbols on the strip.  It can also implement the steps of an algorithm by encoding these, too, as symbols on the tape.  So the entire Decision Problem comes down to the question of whether, once the UTM starts, it will ever halt; hence the name of the chapter.
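To make the ingredients above a bit more tangible, here is a minimal Python sketch of a Turing-machine stepping loop.  The particular transition table (a toy machine that flips the bits of its input and halts at the first blank cell) is made up purely for illustration; it is not the universal machine itself, just the tape/head/state/table mechanics.

```python
# Minimal Turing-machine mechanics: a tape, a head, an internal state, and a
# transition table mapping (state, symbol) -> (symbol to write, move, next state).

def run(tape, transitions, state='start', head=0, max_steps=1000):
    """Step the machine until it enters 'halt' or the step budget runs out."""
    tape = list(tape)
    for _ in range(max_steps):
        if state == 'halt':
            return ''.join(tape)
        symbol = tape[head] if head < len(tape) else '_'    # '_' is a blank cell
        write, move, state = transitions[(state, symbol)]
        if head < len(tape):
            tape[head] = write
        else:
            tape.append(write)
        head += 1 if move == 'R' else -1
    raise RuntimeError("step budget exhausted -- did this machine ever halt?")

# Toy machine: invert each bit, then halt on reaching the first blank.
flip = {
    ('start', '0'): ('1', 'R', 'start'),
    ('start', '1'): ('0', 'R', 'start'),
    ('start', '_'): ('_', 'R', 'halt'),
}

print(run('0110', flip))   # prints '1001_'
```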

Kurt Gödel took a different approach to answering the Decision Problem.  He developed a way to map any formal system to a set of numbers, called Gödel numbers.  Then, by formally manipulating these numbers, he was in a position to try to answer Hilbert’s challenge.

Now it is reasonable to suppose that not every question has a yes or no answer.  Questions about feeling, opinion, or taste spring to mind.  But I think most of us expect that questions of mathematics and logic and programming have answers and that we should be able to figure them out.  The remarkable thing about the work of both Turing and Gödel is that there are cases where the formal logical system simply can’t decide.

In the language of Turing, there are computational problems for which there is no knowing beforehand whether the algorithm will terminate with an answer.  In the language of Gödel, no matter what is done to a logical system in terms of refining existing axioms or adopting new ones, there will always be truths that are unprovable.
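Turing’s side of that claim can be sketched with the classic self-referential trap.  The ‘halts’ function below is hypothetical (Turing’s result is precisely that no such general procedure exists); the point of the sketch is only to show why assuming it leads to a contradiction.

```python
# A sketch of the halting-problem argument; 'halts' is a hypothetical oracle.

def halts(program, argument):
    """Hypothetical: return True if program(argument) would eventually halt."""
    raise NotImplementedError("Turing showed no such general procedure can exist")

def troublemaker(program):
    # Do the opposite of whatever the oracle predicts about program(program).
    if halts(program, program):
        while True:          # the oracle said we halt, so loop forever
            pass
    return                   # the oracle said we loop forever, so halt at once

# Now ask: does troublemaker(troublemaker) halt?  If it halts, the oracle said
# it wouldn't; if it runs forever, the oracle said it would.  Either way the
# oracle is wrong, so no such oracle can exist.
```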

I was quite aware of Gödel’s theorem and even wrestled with it for a while when I was concerned about its implications for physical systems.  Eventually, I decided that while man’s logic may be limited, nature didn’t need to worry about it because she could make decisions unencumbered by our shortcomings.

I was also quite aware that Turing’s machine was used fruitfully as a model of computation.  What I was unaware of until reading Casti’s book was the parallel between the conclusions of Gödel and of Turing.

And here we arrive at the source of my excitement.  As I’ve already said, I remain convinced that Nature can decide – that is to say that Nature is free of the issues discussed above.  And yet, in some capacity the Universe is an enormous Turing machine.

[Figure: the Universe pictured as a Turing machine]

So why does Nature always make a decision?  Why do the computation of trajectories, the motion of waves, and the evolution of quantum bits of stuff always reach a decision?  It may not be the decision we expect or anticipate, but it seems clear that a decision is reached.  And so, by looking at how the Universe as a Turing Machine (UaaTM) differs from a Universal Turing Machine, one may learn more about both.  This is the exciting new idea that has occurred to me and one which I will be exploring in the coming months.

States, Abstraction, and Implicit Objects

This post grew out of a conversation in which I recently participated, about the correct way to computationally model a system.  The conversation covered the usual strategies of identifying the system’s states and how to abstract the appropriate objects to hold or contain the states based on the question at hand.  But what was most interesting were the arguments put forward about the importance of recognizing not only the explicit objects used in the modeling but also the implicit objects that were ignored.

By way of a general definition, an implicit object is an object whose existence is needed in the computer modeling of the system in question but one that is not represented in code as either a real or virtual class.  I don’t think this definition is used in computer science or programming books but it should be.  This is a point that is almost entirely ignored in the field but is vital for a correct understanding of many systems.

A simple example will go a long way to clarify the definition and to also raise awareness that implicit objects are used more often than people realize.  The system to be modeled is the motion of traffic on a highway.  For simplicity, we’ll focus on two cars, each traveling along a straight stretch of road.  Generalizations to more complicated situations should be obvious but don’t add anything to the discussion.  A picture of the situation is

[Figure: a blue car and a green car traveling to the right along a straight stretch of road]

Here we have a blue car and a green car moving to the right as denoted by the matching colored arrows.  The typical abstraction at this point is to create a car class with appropriate attributes and member functions for the questions we want to answer.  Let’s suppose that we are interested in modeling (with an eye towards prevention) the collision between the two cars.  What ingredients are needed to perform this modeling?

A collision between two objects occurs, by definition, when they try to occupy the same space at the same time.  Unpacking this sentence leads us to the notion of position for each car as a function of time and a measure of the physical extent of each car.  Collision occurs when the relative position between the centers of the two cars is such that their physical extents overlap.

[Figure: a collision, shown as the overlap of the two cars’ physical extents]

Now, real life can be rather complicated, and this situation is no different.  The physical shape of a car, with its many faces, facets, and extrusions, is generally hard to model.  And the position of the car is a function of what the driver perceives, how the driver reacts, and how much acceleration or deceleration the driver imposes.

In time-honored fashion, we will idealize the cars as having a simple two-dimensional rectangular shape as shown in the figure, and will also assume that the cars have a simple control law (which we don’t need to specify in detail here except to say that it depends on the position, velocity, and time) that gives the velocity of the car at any instant of time.  With these simplifying assumptions, the trajectory of the car is completely specified by giving its position at some initial time and then using the control law to update the velocity at each time step.  The position is then updated using the simple formula ${\vec x}(t+dt) = {\vec x}(t) + {\vec v} dt$.

With all these data specified, we might arrive at a class definition for the car object that looks like:

[Figure: the car class definition]
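Since the original figure isn’t reproduced here, the following is a minimal Python sketch of what such a car class might contain, given the attributes discussed above.  The attribute and method names are my own illustrative guesses, and the control law is a trivial placeholder.

```python
# A minimal car abstraction: position, velocity, footprint, and a clock,
# advanced by x(t + dt) = x(t) + v * dt with a placeholder control law.
from dataclasses import dataclass
import numpy as np

@dataclass
class Car:
    position: np.ndarray      # center of the car, measured in the shared frame
    velocity: np.ndarray
    length: float             # physical extent along the road
    width: float              # physical extent across the road
    time: float = 0.0         # each car carries its own clock

    def control_law(self):
        """Placeholder: simply hold the current velocity."""
        return self.velocity

    def update(self, dt):
        self.velocity = self.control_law()
        self.position = self.position + self.velocity * dt
        self.time += dt

blue = Car(position=np.array([0.0, 0.0]), velocity=np.array([30.0, 0.0]),
           length=4.5, width=1.8)
blue.update(0.1)
print(blue.position, blue.time)   # [3. 0.] 0.1
```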

At this point, we’ve already encountered our first two implicit objects.

The first one is a coordinate frame, comprised of an origin and an orientation of the coordinate axes, which tells us what the position of the object is measured against.  This coordinate frame is an object in its own right and, in many circles, worthy of study.  Here it is simply important to acknowledge that it exists and that the width and length parameters need to be defined consistently with regards to it.  This is an important point, since a change in coordinate frame orientation will change the test (to be described later) for the geometric condition of overlap (i.e., collision).

The second implicit object is much more subtle and it took the discovery of the principles behind special relativity to drive that point home.  Both cars share a common clock to describe their motion.  That is to say that their velocities are only meaningfully compared if their clocks are synchronized and ticking at the same speed.  The role of the implicit clock is usually carried out by the main program of the simulation but, since each object holds its own time, care must be exercised to keep them synchronized.  It is possible to make this common-clock assumption stronger by eliminating time from the list of member data, but that approach also has its drawbacks, as will be touched on below.

The final question is, how do we determine if the cars collide somewhere during the simulation?  There are two possible approaches.  The first is that we give each car a new member function, called detect_collision, which takes as arguments the position of the other car and its size.  This is not an unreasonable approach for simple situations, but it quickly becomes unwieldy should we later want to add fidelity to the simulation.  For example, if we wanted to allow for the body of the car changing its orientation as it changes lanes, then the number of parameters that we have to hand to the detect_collision function has to increase.  A more robust way is to perform this computation in the main workspace of the simulation.  If this method is chosen, we’ve now introduced a third implicit object.  It’s not quite clear from the context what to call this object, but its function is absolutely clear – it communicates to the world the message that the two cars have collided.
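Here is a minimal sketch of that second approach, with the overlap test living in the main workspace rather than inside either car.  It assumes the cars keep their rectangular footprints axis-aligned, as in the simplifications above; the names are illustrative only.

```python
# Overlap test performed by the main workspace; the cars know nothing about it.
from collections import namedtuple

# Stand-in for the car objects sketched earlier; only the fields used here.
Box = namedtuple('Box', 'position length width')

def collided(a, b):
    """Axis-aligned rectangle overlap between the two cars' footprints."""
    dx = abs(a.position[0] - b.position[0])
    dy = abs(a.position[1] - b.position[1])
    return (dx <= (a.length + b.length) / 2 and
            dy <= (a.width + b.width) / 2)

blue  = Box(position=(0.0, 0.0), length=4.5, width=1.8)
green = Box(position=(3.0, 0.5), length=4.5, width=1.8)
print(collided(blue, green))   # True: the centers are closer than the summed half-sizes
```

The main loop that calls a function like this at each time step is also the natural home for the implicit clock and for announcing the collision to the rest of the world.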

This last notion may seem like a trivial observation leading to an unnecessary complication, but consider the case where each car is equipped with some sensor that warns of an imminent collision.  Remote sensing requires some model of the medium between the sensor and the thing being sensed.  If the collision detector uses sound, then the air between the cars must be modeled as an elastic medium, with a finite propagation speed of the signal, and variations in this medium due to temperature, humidity, and the like may also need to be modeled.  If the collision detector uses electromagnetic waves (radar or laser ranging), then the index of refraction and dispersion properties of the air become the key parameters, which again may depend on the thermodynamics of the atmosphere.  In either case, this communicating medium must now be promoted from an implicit object to an explicit one, and the times of the two objects take on new meaning as we need to track the transmit and receive times of the signals sent between them.

The lesson, I hope, is clear.  It is just as important to pay attention to the implicit objects in a simulation as it is to the explicit ones.  There is a parallel here with the spoken word – where it is often as important to pay attention to what was left unsaid as it is to what was said.