Latest Posts

Milling About

Arguing about cause and effect is a difficult enterprise even in the best of circumstances. Rarely is it as clean, or as boring, as introductory texts on logic make it out to be. Examples of simple cause and effect – it is raining and therefore the ground is getting wet – are neither controversial nor fun. Most everyone agrees on the matter and that's that. But arguments over new things, undiscovered or unseen, are one of the best things going. Who isn't thrilled by the prospect of figuring out something that no one else has?

Interestingly enough, almost all of us use standard methods for arguing from effect to cause. Most likely, we've all learned these methods by first watching others apply them and then jumping into the game and trying them out ourselves. As will be discussed in more detail below, these methods worm themselves into almost every aspect of life, often without our notice. They form the backbone of most editorials, advertisements, and dinner-table arguments. And despite their anonymity, they do have a name: Mill's Methods.

First codified by John Stuart Mill in his book A System of Logic (1843), these methods, which no doubt date back to antiquity, go by the obscure names of:

  • Method of Agreement
  • Method of Difference
  • Joint Method
  • Method of Concomitant Variation
  • Method of Residues

As unfamiliar as these terms may be to the ear, their use and application is familiar to anyone who has ever tried to figure out which food at dinner last night didn't agree with them, or some similar scenario. Indeed, many of the examples presented in the literature deal with food and indigestion. (In fact, applying Mill's Methods to medical diagnosis is the core of the TV show House.)

Not wanting to dwell on food-related illness (since it is done to death in the literature), I propose illustrating the methods using a more interesting question faced by parents and teachers across the country: what factors contribute to good grades?

Consider a group of 10 students from a local school. After circulating a questionnaire, their teacher compiles a table listing various activities they pursued and the study method they used (written homework or online quizzes). The teacher wants to see what caused half the students to pass while the other half failed, and so he looks for a factor that is both necessary and sufficient to explain why the first group passed. He suspects that video games have poisoned the minds of the students who play them and that students who avoid this digital scourge are the ones who pass. But being a man of integrity, he decides to pursue the answer with an open mind. To do this he employs Mill's Methods in succession.

To apply the Method of Agreement, he looks to see what features all the passing students have in common. He starts by looking at a subset composed of Amy, Carl, and Walter.

Student | Recreation | Musical Instrument | Teaching Technique | Exercise | Pass/Fail
Amy | Video games | None | Written Homework | None | Pass
Carl | Drawing | Piano | Written Homework | Swimming | Pass
Walter | Blogging | Clarinet | Written Homework | Yoga | Pass

He notices that these three students have nothing in common in terms of their recreational pursuits, that they don't play the same musical instrument (in fact Amy doesn't play any), and that they don't all engage in the same exercise. But all three of them were taught using the same technique of written homework. He concludes that written homework is very likely the cause of their success in the class.

To apply the Method of Difference, he then looks for a pair of students, one who has passed and one who has failed, who have almost everything in common. Any difference between them is then a strong indication of the cause of success or failure. He finds such a pair in Ben and Stacey.

Student | Recreation | Musical Instrument | Teaching Technique | Exercise | Pass/Fail
Ben | Blogging | Guitar | Online Quizzes | Running | Fail
Stacey | Blogging | Guitar | Written Homework | Running | Pass

Both of them enjoy blogging, play guitar, and exercise by running.  The difference between them seems to be that Stacey was required to do written homework while Ben was required to do online quizzes.  He concludes that there is a very strong possibility that written homework leads to good grades.

The Joint Method marries the two approaches together, looking for support that this one factor, the teaching technique, is the primary cause of classroom success. To this end, our teacher combines all the students into the following table:

Student | Recreation | Musical Instrument | Teaching Technique | Exercise | Pass/Fail
Amy | Video games | None | Written Homework | None | Pass
Ben | Blogging | Guitar | Online Quizzes | Running | Fail
Carl | Drawing | Piano | Written Homework | Swimming | Pass
Diane | Blogging | None | Online Quizzes | Yoga | Fail
Ethan | Drawing | Piano | Online Quizzes | None | Fail
Vanda | Video games | Guitar | Written Homework | Yoga | Pass
Walter | Blogging | Clarinet | Written Homework | Yoga | Pass
Thomas | Video games | Clarinet | Online Quizzes | None | Fail
Ursula | Drawing | None | Online Quizzes | Swimming | Fail
Stacey | Blogging | Guitar | Written Homework | Running | Pass

and he notices that, in each case, the only factor that correlates with passing or failing is written homework or online quizzes, respectively. Despite his preconception that video games are dangerous, he finds that two of the three students who play them (Amy and Vanda) actually passed the course.
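For anyone who wants to see the bookkeeping spelled out, the Joint Method is easy to mechanize. The short Python sketch below is my own illustration (the field layout is just the table above transcribed); it hunts for a factor that every passing student shares and that no failing student has.

# The Joint Method mechanized: find an attribute value common to all passers
# (agreement) and absent from all failers (difference). Rows follow the order
# (name, recreation, instrument, technique, exercise, result).
rows = [
    ("Amy",    "Video games", "None",     "Written Homework", "None",     "Pass"),
    ("Ben",    "Blogging",    "Guitar",   "Online Quizzes",   "Running",  "Fail"),
    ("Carl",   "Drawing",     "Piano",    "Written Homework", "Swimming", "Pass"),
    ("Diane",  "Blogging",    "None",     "Online Quizzes",   "Yoga",     "Fail"),
    ("Ethan",  "Drawing",     "Piano",    "Online Quizzes",   "None",     "Fail"),
    ("Vanda",  "Video games", "Guitar",   "Written Homework", "Yoga",     "Pass"),
    ("Walter", "Blogging",    "Clarinet", "Written Homework", "Yoga",     "Pass"),
    ("Thomas", "Video games", "Clarinet", "Online Quizzes",   "None",     "Fail"),
    ("Ursula", "Drawing",     "None",     "Online Quizzes",   "Swimming", "Fail"),
    ("Stacey", "Blogging",    "Guitar",   "Written Homework", "Running",  "Pass"),
]

factors = {"recreation": 1, "instrument": 2, "technique": 3, "exercise": 4}
passers = [r for r in rows if r[5] == "Pass"]
failers = [r for r in rows if r[5] == "Fail"]

for name, col in factors.items():
    pass_values = {r[col] for r in passers}
    fail_values = {r[col] for r in failers}
    # Agreement: one value among all passers; Difference: absent from all failers.
    if len(pass_values) == 1 and pass_values.isdisjoint(fail_values):
        print(f"{name}: {pass_values.pop()} separates the passes from the fails")

Running it reports only the teaching technique (Written Homework), exactly the conclusion the teacher reached by inspection.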

The final two of Mill's Methods deal with matters of degree. They help to answer how much written homework really helps and whether there is another factor that might contribute to success. To this end, the teacher modifies the table to list the hours each student spends completing written homework each week and their GPA.

Student | Recreation | Musical Instrument | Homework Hours per Week | Exercise | GPA
Amy | Video games | None | 12 | None | 3.8
Carl | Drawing | Piano | 7 | Swimming | 3.5
Vanda | Video games | Guitar | 5 | Yoga | 3.2
Walter | Blogging | Clarinet | 10 | Yoga | 4.0
Stacey | Blogging | Guitar | 8 | Running | 3.6


In the case of the Method of Concomitant Variation, the teacher notices that there is a strong correlation between the number of hours of homework each week and the student's GPA. Vanda does the least amount of homework each week and she has the lowest GPA. Carl and Stacey are in the middle in terms of time invested in homework, and so are their GPAs. And finally, Amy and Walter spend the most time on homework and have the highest GPAs. This behavior is a strong indication that requiring students to complete written homework causes students to have high GPAs.
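For the numerically inclined, the concomitant variation can be put on a quantitative footing with a Pearson correlation coefficient. The short Python sketch below is my own illustration, using the hours and GPAs transcribed from the table above; it prints a strongly positive value.

# Pearson correlation between weekly homework hours and GPA,
# using the five students in the table above.
from math import sqrt

hours = [12, 7, 5, 10, 8]         # Amy, Carl, Vanda, Walter, Stacey
gpa   = [3.8, 3.5, 3.2, 4.0, 3.6]

n = len(hours)
mean_h = sum(hours) / n
mean_g = sum(gpa) / n

cov   = sum((h - mean_h) * (g - mean_g) for h, g in zip(hours, gpa))
var_h = sum((h - mean_h) ** 2 for h in hours)
var_g = sum((g - mean_g) ** 2 for g in gpa)

print(f"correlation = {cov / sqrt(var_h * var_g):.2f}")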

The Method of Residues helps to point towards additional factors that have not been considered but which contribute to the cause-and-effect relationship. In the case of the two top-performing students, the teacher notices that although Amy spends the greatest amount of time on homework each week she doesn't have the highest GPA. Of course, this minor difference between her and Walter might be explained in many ways (e.g. her courses are harder). But let's suppose that the table exhaustively lists all the relevant attributes and that Walter and Amy are in the same class in elementary school, so that they see all the same material and are assigned the same homework. Our teacher might be inclined to conclude that either playing a musical instrument or exercising is the cause of the remaining difference. This method can also be applied to situations where the differences are a matter of quality rather than quantity.

One of the most famous examples of the application of the Method of Residues was to the motion of the planet Mercury. After all the known contributions to Mercury's orbit had been accounted for by astronomers of the late 1800s, there remained about 43 arcseconds/century of perihelion precession that could not be ascribed to any particular cause. This difference, though small, helped to spur Einstein to create the theory of General Relativity.

While the discussion above was both illustrative and interesting, it is hardly the only or primary application of Mill's Methods. As Prof. Dave Beisecker points out in his discussion, Mill's Methods are used in all sorts of persuasive arguments about products, policies, and the like. I would encourage the reader to visit his page, as some of his examples are both educational and laugh-out-loud funny. Consider this gem used to illustrate how the Method of Difference is used in advertising:

Jiffy Squid fries are the best, and you know what the secret is? While the recipe, the potatoes, and everything else is the same as at Burger Thing, the fries at Jiffy Squid are cooked in oil that has been through the crankcase of a ’57 Desoto. The result – mmm-mmm fries!

– Dave Beisecker

Of course, real life is never as clear cut as the contrived examples seem to imply. But that's what makes it so fun. Putting one's skill to the test to find what causes what can lead to amazing discoveries and bring out the best in us.

Logic and Cause & Effect

There is a famous scene in the movie All the President’s Men where Bob Woodward (played by Robert Redford) and Carl Bernstein (played by Dustin Hoffman) are struggling to see if they have enough facts to continue to publish their stories about Watergate.  As they are eating at a local fast food joint they discuss what they can ‘deduce’ from what they already know.  Woodward essentially says that they can infer what they need circumstantially.  He defends this position with a discussion about logic and cause and effect.

In essence, what he says is this: if you look out your window before you go to bed one fine, cold winter night and the ground is snow-free, and then you wake up the next morning to a winter wonderland of white all around, you can conclude that it snowed overnight, even if you didn't see it. A simple application of cause and effect, it seems. But arguing for causes by observing effects can be tricky, since causes are often elusive even though their effects are quite observable.

Consider the basic demonstration of gravity. Hold an object up and then let go. The effect is clearly observable; the object falls to the ground. The cause, on the other hand, is quite mysterious. Just what is gravity? None of us can feel it or see it or sense it in any way. The word gravity is just a name we give to this cause. The genius of Newton gave us a way to match the nature of the effect (distance and speed as a function of time) to the geometry of the situation (where the object is in relation to the Earth), but he didn't really explain it. Nor do the modern explanations of gravity as spacetime bending or the exchange of gravitons do anything other than explain one mysterious concept in terms of others (which, granted, provide a more complete way of predicting the outcome).

The situation becomes substantially more complex when several causes can be present, all of which result in a particular effect.  It’s no wonder that misapplications of cause and effect abound even outside the realms of science.  Sometimes this leads to amusingly bad arguments; sometimes the results are more frustrating and thorny.

At the heart of the logic of cause and effect is the idea that if a cause, call it $P$, is present, then its effect, call it $Q$, must follow.  Symbolically, this linkage is denoted by $P \rightarrow Q$, which is the familiar construct of propositional logic that was discussed in this column in earlier postings.

The question is what can be said about $P$ based on an observation of $Q$ or vice versa.  The four options are:

  • Observe P and infer something about Q
  • Fail to observe P and infer something about Q
  • Fail to observe Q and infer something about P
  • Observe Q and infer something about P

The usual names for these options and their logical validity are summarized symbolically in the following table.

Propositional If Then Arguments
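In brief, the standard forms and their verdicts, each assuming the causal link $P \rightarrow Q$, are:

Observation | Argument Form | Inference | Validity
Observe $P$ | Modus Ponens | conclude $Q$ | Valid
Observe $\neg Q$ | Modus Tollens | conclude $\neg P$ | Valid
Observe $Q$ | Affirming the Consequent | conclude $P$ | Invalid
Observe $\neg P$ | Denying the Antecedent | conclude $\neg Q$ | Invalid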

This innocent looking table can lull one into thinking that good arguments are easily sifted from the bad ones and this expectation would be correct – for simple examples.  Bo Bennett’s website, Logically Fallacious, has hundreds of examples of all sorts of fallacies both formal and informal.  These have been specially crafted to make detection of the fallacy nice and easy.  Often they are also fun to read and some are laugh-out-loud funny.

I don’t have nearly the same devotion to cataloging fallacies nor do I have the same sense of humor as Dr. Bennett, so I will stick first to some easy examples of the meteorological type with which we began above.  I’ll then look at just where things go awry and I’ll emphasize that, for things to move forward, we often have to ‘embrace the fallacies’, specifically affirming the consequent.

Suppose that you look out the window and you see that it is raining.  Literally, you are seeing water come from the sky and hit the earth.  Furthermore, you can readily see that when the water strikes it wets the ground.  Through observation, you’ve determined the causal link that ‘if it is raining then the ground is wet’.  The clauses ‘it is raining’ and ‘the ground is wet’ abbreviate to $P$ and $Q$, respectively.

With this fact established, you no longer need to directly observe the ground during rainfall to know it is wet.  If someone else, say a morning weather report, tells you that “it is currently raining in the metropolitan area” then you can immediately deduce that the ground is wet.   This valid deduction is the content of Modus Ponens.

But what can you do with other observations?  That is the key point since often, as argued above, observing the cause is very difficult (more on this in a bit).

Well, Modus Tollens tells us that if we were to observe that the ground is dry then we can immediately deduce that it is not raining.  I personally use this ‘trick’ whenever the day is hazy or foggy and the rain, if there is any, is too fine to be seen or heard.  If the ground looks dry then it isn’t raining.

However, if the ground is wet I am stuck and my deductive chain fails.  The reason for this is obviously the fact that the ground could be wet for a bunch of other reasons.  Maybe it snowed overnight but melted before I made my observation.  More probably, my neighbors may have watered their lawns in the early morning hours with those automatic, beneath-the-ground systems that have proved most popular in my neighborhood.  Observing that the ground is wet (Affirming the Consequent) merely tells me that one of several possibilities was the cause.

This same line of argument also leads us to reject the conclusion that the ground is not wet when I observe that it is not raining (Denying the Antecedent).  Again, several causes can lead to the same effect so the absence of a single cause does not support the conclusion that the effect is also absent.

Now, all of this sounds neat and tidy but the world is never so clear cut. Linkage between cause and effect is difficult to observe completely and, in many cases, the cause can't be observed directly. For example, if I observe that the ground is wet but I can't see it raining, I may still be able to conclude that it is raining, even though I would be committing the formal error of Affirming the Consequent. I would do this in a probabilistic fashion, arguing that the other causes are very unlikely. Perhaps the street in front of my house is wet in a way that would be unlikely due to water sprinklers alone. Or maybe it is summer and the likelihood of snow is small. As compelling as these arguments may be, a water main break out of my line of sight may explain everything.

Despite the fact that this rainfall example is contrived, it captures the essence of the natural sciences. Medicine, in particular, is plagued by this kind of uncertainty. A patient comes in with an effect, say a fever, and the cause is unknown. Literally hundreds of causes exist and one of the tasks set before the doctor is to infer the likeliest cause and start there. Each and every day, medical practitioners commit the fallacy of Affirming the Consequent – they need to, since inaction is often far more dangerous than action based on a wrong conclusion. The key is not making the inference but rather having the wisdom to know when to abandon that inference and to make a new one.

This type of statistical reasoning is not limited to the STEM professions.  Each of us deals with unseen or poorly observed causes and effects that may have many parents each and every day.  Each of us must embrace formally fallacious reasoning just to be able to move forward.

So, the next time you see someone employ one of these fallacies in a manner obvious to you and are inclined to reject their argument, just remember two things. First, he may not be able to see cause and effect as clearly as you do. Second, perhaps he knows something you don't and it is you who are not seeing cause and effect clearly. In either case, certainly reject abject certainty in the conclusion but try to have compassion for the arguer. After all, drawing conclusions about cause and effect, even the cause and effect of a bad argument, is a difficult job.

Propositional Calculus – Part 3: Hunt the Wumpus

This post completes a brief three-part study of propositional calculus. In the first post, the basic notions, symbols and rules for this logical system were introduced. The second post dealt with how to prove the validity of an argument. In this context, an argument is a formal manipulation of initial assumptions into a final conclusion using the 10 inference rules. Proofs are highly algorithmic, although usually tricky, and computer programs exist that can work out proofs and determine validity. This post will be devoted to discussing some of the applications of such a decision/inference engine.

The idea of an inference engine making decisions akin to what a human would do is the core notion of an expert system. Expert systems try to encapsulate the problem-solving or decision-making techniques of human experts through the interaction of the inference engine with the knowledge base. Rather than relying on a defined procedural algorithm for moving data back-and-forth, the expert system operates on a set of initial premises, using its inference engine to link these premises to facts in its knowledge base. These linked facts then act as the new premises and the chain continues until the goal is met or the system declares that it has had enough.

There are two different approaches to building the inferential links: forward-chaining and backward-chaining.

Forward-chaining works much like syllogistic logic starting from the premises and working forward to the conclusion. The example in Wikipedia is particularly nice in that it demonstrates the method using, perhaps, the most famous syllogism in logic. The knowledge base would have a rule reading something like:

\[ Man(x) \implies Mortal(x) \]

This rule means that anytime the object $x$ is identified as belonging to the set $Man$ it can be inferred that the same object belongs to the set $Mortal$. The inference engine, when given

\[ x = Socrates \; \]

immediately can determine that Socrates is mortal, thus rediscovering in cyberspace the classic syllogism

All men are mortal
Socrates is a man
Therefore Socrates is mortal

Backwards-chaining starts with the conclusion and tries to find a path that gets it all the way back to the beginning. Backwards-chaining is often used, to comedic effect, in certain stories in which in order to get something the lead character wants (say Z) he must first acquire Y. But acquisition of Y requires first getting X and so on. Of course, the humor comes when the chain is almost complete only to break apart, link-by-link, at the last moment.

In both cases, the inference rule that is primarily employed is the Modus Ponens which asserts

\[ P \rightarrow Q, P \implies Q \; . \]
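To make that concrete, here is a minimal forward-chaining sketch in Python. The rule and fact strings are just the Socrates example recast; this is my own illustration, not the engine of any particular expert-system package.

# A minimal forward-chaining sketch: rules are (antecedent, consequent) pairs
# and known facts are matched against antecedents until nothing new appears.
# The rule below is Man(x) -> Mortal(x), already instantiated for x = Socrates.
rules = [("Man(Socrates)", "Mortal(Socrates)")]
facts = {"Man(Socrates)"}

changed = True
while changed:
    changed = False
    for antecedent, consequent in rules:
        if antecedent in facts and consequent not in facts:
            facts.add(consequent)   # Modus Ponens: P -> Q together with P yields Q
            changed = True

print(sorted(facts))   # ['Man(Socrates)', 'Mortal(Socrates)']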

It is easy to see how the rise of the expert system raised the practical importance of propositional calculus.

Now the formal instance of Modus Ponens given above is deceptively simple-looking but can be quite complex when applied to real-world problems. As a result, the actual track record of expert systems is mixed. Some applications have failed (e.g. Hearsay for speech-recognition) whereas other applications have been quite successful (e.g. MYCIN or CADUCEUS for medical diagnoses).

Expert systems were quite in fashion in the late 1980s and early 1990s but their usage has receded from view. It appears that nowadays, they tend to find a place in and amongst other AI tools rather than being standalone systems.

Despite their relegated position in modern AI, expert systems and propositional calculus are still investigated today. One very fun and challenging application is to produce an agent that is capable of playing the game Hunt the Wumpus.

In Hunt the Wumpus, the player finds himself trapped in a dungeon. He can only see the contents of his room and the exits that issue forth. Hidden in the dungeon are pits that will plunge him to a painful death and a Wumpus, a bizarre beast, that will eat the player. Both of these threats are certain doom if the player enters the room in which they are contained. Also hidden in the dungeon is a pile of gold. The player’s goal is to find the gold and kill the Wumpus, thus allowing the player to escape with the riches.

Hunt the Wumpus

In order for the player to navigate the dungeon, he must depend on his senses of smell and touch. When he is in a room adjacent to a pit, he senses a slight breeze blowing but he cannot tell from where. When he is in a room within two rooms of the Wumpus, he smells a stench that alerts him that the Wumpus is near. Armed with only one arrow, the player must use his perceptions to deduce where the Wumpus is and, firing from the safety of an adjacent room, must kill the foul beast. Note that this is a variant of the original game – as far as I can tell, all versions are variants and everyone who implements the game tinkers to some minor degree with the game play.

As the player moves around the board, he increases his knowledge base and, using propositional calculus, attempts to win the gold, kill the Wumpus, and escape to tell his tale.
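To give a flavor of what that bookkeeping looks like, here is a toy Python sketch of the pit-related reasoning alone. The grid size, visited rooms, and percepts are invented for illustration and are not taken from any particular version of the game.

# Toy propositional bookkeeping for pits: a room adjacent to a visited,
# breeze-free room cannot hold a pit ("pit nearby -> breeze", used in the
# Modus Tollens direction). Rooms near a breeze that are not yet cleared
# remain suspect.
from itertools import product

SIZE = 4
rooms = set(product(range(SIZE), range(SIZE)))

# Percepts recorded in rooms the agent has already visited:
# True means a breeze was felt there.
breeze = {(0, 0): False, (0, 1): False, (1, 0): True}

def neighbors(room):
    r, c = room
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [n for n in candidates if n in rooms]

pit_free = set(breeze)                      # visited rooms are known safe
for room, felt in breeze.items():
    if not felt:
        pit_free.update(neighbors(room))    # no breeze here, so no pit next door

suspect = {n for room, felt in breeze.items() if felt
           for n in neighbors(room) if n not in pit_free}

print("provably pit-free:", sorted(pit_free))
print("still suspect:", sorted(suspect))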

A nice example of the game play available on the TI-99/4A can be found on YouTube. In the following example, the human player has revealed a large percentage of the board.

Wumpus_Case_1_before

The green coloring of the rooms denotes the 'breeze observation' while the rust color denotes the 'stench observation'. In this version, there are also bats that, when triggered, transport the player to a random location. These bats, however, rarely seem to activate and I'll ignore them.

With a little thought, it is possible to see that there is a pit in the 2nd row from the top and the 5th column from the left. Likewise, the Wumpus is in the 6th row and 5th column. The player recognized this and shot appropriately, killing the Wumpus and saving his skin. The following figure shows the game board revealed after the success.

Wumpus_Case_1_after

This sort of deduction is relatively easy for the human player once a bit of experience is gained but not so much for the computer agent. As the video below demonstrates, even a relatively simple configuration (without bats and variable topology of the rooms) can be quite challenging.

And so I’ll close with a few final remarks. First, the propositional calculus is a subject alive and well in the AI community. It provides an underpinning for many decisions made by computer agents. Second, the skill of such systems has yet to rival human thought. Maybe it never will… time (and, perhaps the Wumpus) will tell.

Propositional Calculus – Part 2: Proofs

The last column introduced the basic structure of the propositional logic consisting of 10 inference rules that manipulated 5 logical symbols: negation $\neg$, conjunction (and) $\land$, disjunction (or) $\lor$, conditional $\rightarrow$, and biconditional $\leftrightarrow$.

The point of the system was to formalize the rules of reasoning so that two difficulties of natural language were eliminated. The first difficulty is that natural language premises (e.g. it is raining) are unwieldy compared to single symbols (e.g. R). The second, and more problematic, difficulty is the ambiguous aspect of natural language.

Suppose that a friend says "I am interested in logic and programming. Could you help me find good websites?" You try to help by doing a web search. How should you frame your search string? Does your friend mean that he is interested in getting hits that deal either with logic or with programming? Or perhaps he only wants sites that discuss both concepts in the same posting. With this ambiguity present, how should the search engine interpret the simple search string 'logic programming'?

By formalizing and abstracting the objects and rules away from natural language, propositional calculus seeks a clean way of reasoning with algorithmic precision. The line of reasoning that infers a set of well-formed formulas (wffs) from an initial set is called, appropriately enough, a proof. The flow of a proof is improved by abbreviating the rules of inference with 2- or 3-character abbreviations:

  1. Modus ponens = MP
  2. Negation Elimination = $E\neg$
  3. Conjunction Introduction = $I\land$
  4. Conjunction Elimination = $E\land$
  5. Disjunction Introduction = $I\lor$
  6. Disjunction Elimination = $E\lor$
  7. Biconditional Introduction = $I\leftrightarrow$
  8. Biconditional Elimination = $E\leftrightarrow$
  9. Conditional Proof = CP
  10. Reductio ad Absurdum = RAA

I don't find any proofs, except the most elementary ones, particularly easy to figure out and, I suppose, that is the point. If it were easy, there would be no need for the formalism.

As an example (taken from problem 3.31 of Theory and Problems of Logic by Nolt and Rohatyn), consider how to prove the following argument.

$ \neg( P \land Q) \rightarrow \neg P \lor \neg Q $

To recast that in natural language, let P stand for the sentence 'it is raining' and Q for 'the streets are wet'. The argument above says that 'if it is not true that it is raining and the streets are wet, then either it is not raining or the streets are not wet'. This translation is not only a mouthful; it may also seem to some to be incomplete. In natural language, the 'or' in the consequent can be interpreted as an exclusive or – that either it is not raining or the streets are not wet, but not both. However, the 'or' of propositional logic is the inclusive or, which means the statement is true if the first disjunct is true alone, the second is true alone, or both are true. Thus the propositional calculus arrives at a proof that, on the surface, may seem a surprise to the natural-language ear.

The formal proof for this argument is given by:

\[ \tiny \begin{array}{l|llll}
1 & \neg(P \land Q) & & & \text{A} \\ \hline
2 & & \neg(\neg P \lor \neg Q) & & \text{RAA H} \\ \hline
3 & & & \neg P & \text{RAA H} \\ \hline
4 & & & \neg P \lor \neg Q & 3 \; I\lor \\ \hline
5 & & & (\neg P \lor \neg Q) \land \neg(\neg P \lor \neg Q) & 2, 4 \; I\land \\ \hline
6 & & \neg \neg P & & \text{3-5 RAA} \\ \hline
7 & & P & & 6 \; E\neg \\ \hline
8 & & & \neg Q & \text{RAA H} \\ \hline
9 & & & \neg P \lor \neg Q & 8 \; I\lor \\ \hline
10 & & & (\neg P \lor \neg Q) \land \neg(\neg P \lor \neg Q) & 2, 9 \; I\land \\ \hline
11 & & \neg \neg Q & & \text{8-10 RAA} \\ \hline
12 & & Q & & 11 \; E\neg \\ \hline
13 & & P \land Q & & 7, 12 \; I\land \\ \hline
14 & & (P \land Q) \land \neg(P \land Q) & & 1, 13 \; I\land \\ \hline
15 & \neg \neg(\neg P \lor \neg Q) & & & \text{2-14 RAA} \\ \hline
16 & \neg P \lor \neg Q & & & 15 \; E\neg
\end{array} \]

The indentations denote hypothetical proofs within the main body of the proof (this one has 3 Reductio ad Absurdum sub-proofs) and the notation in the last column gives the line number(s) and the rule used to manipulate the wffs.

It is reasonable to ask why go through all this bother. Surely we don’t need formal logic to figure out if it is raining or if the streets are wet. There are several good reasons for studying the propositional calculus but perhaps the most interesting and important is that the system can be programmed on a computer and the rules of inference can be used on problems much harder than the human mind may be willing to deal with.
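As a tiny taste of that programmability, the argument proved above can also be checked semantically by brute force over all truth assignments, which is a different route to the same verdict. The Python sketch below is my own illustration, not part of any proof-checking package.

# Brute-force semantic check of the argument proved above:
# from not(P and Q) infer (not P) or (not Q).
from itertools import product

for p, q in product([True, False], repeat=2):
    premise = not (p and q)
    conclusion = (not p) or (not q)
    if premise and not conclusion:
        print("counterexample found:", p, q)
        break
else:
    print("valid: the conclusion holds whenever the premise does")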

An interesting example of this is the Tree Proof Generator.

Tree Proof Generator
This web-based application allows the user to enter an argument; its internal inference engine not only determines the argument's validity but also generates the step-by-step proof.

Now this example may not be particularly motivational until one stops to consider the scope. Since the well-formed formulas can be anything, such a system could solve complex mathematical problems, or work through a complicated epidemiological analysis to assist in diagnosing a patient's illness or in determining the pattern of an epidemic.

In other words, the propositional calculus can be used to make an expert system that can deal with large amounts of data and inferences in a formal way that would be difficult or impossible for people to do unaided.

Next week, I will look at what has been done with such expert agents – in particular with an agent application that attempts to keep a digital explorer from a gruesome cyber-fate.

Propositional Calculus – Part 1: Introduction

Propositional calculus is one of the oldest forms of logic.  It lies at the heart of all syllogisms, deductions, inductive inferences, and the like.  It is the system that allows us to deal with logic problems such as determining the truth of the following argument:

It is raining
If it is raining then I am depressed
I am depressed

or of this argument:

It is sunny
If it is sunny then it is bright
If it is bright then I should wear sunglasses
I should wear a raincoat.

Note that it doesn't judge the truth or falsehood of the premises (it is raining or it is sunny) or of the conditionals (if it is…); it only judges the truth content of the conclusion based on these assumptions (premises and conditionals). It won't know whether or not it is raining, but it will be able to tell me that I am depressed when it is raining (not true – I like the rain) or that the raincoat conclusion does not follow when it is sunny. Note also that the system is not equipped to handle syllogisms using the quantifiers all, some, or none. Thus the true syllogism

All men are mortal
Socrates is a man
Socrates is mortal

and the false syllogism

All cats have fur
All cats are mammals
All mammals have fur

are outside of its ability to evaluate.

Despite these limitations, the system is fairly powerful and can be used in successful applications of artificial intelligence to solve real-world, interesting problems.

Amazingly propositional logic is able to pack a lot of power into a fairly brief amount of symbolism.  The system as a whole consists of 2 objects, 5 expressions that define relations between the objects, 10 rules for manipulating the relations, and 3 helper symbols that act as traffic cops to impose order on the steps.

The simplest object of the system is an atomic proposition, like the statement 'It is raining'. This proposition is usually denoted by a single capital letter – 'R' for 'It is raining'. More complex propositions are built out of the atomic propositions using the 5 logical expressions, subject to certain rules. Any primitive proposition and all valid compound propositions are collectively known as well-formed formulae (wffs – pronounced 'woofs', like the sound dogs make). Often wffs are denoted by Greek symbols, like $\phi$ or $\psi$.

The 5 logical expressions denote relations associated with the propositions. There is one prefix expression, where the expression symbol operates on only one proposition, and 4 infix expressions that link two propositions together.

The prefix expression is the ‘not’ which translates the proposition ‘It is raining’ into the negative proposition ‘It is not the case that it is raining’.  This somewhat more clunky way of expressing the negation (rather than ‘It is not raining’) seems to be preferred since it makes adding or removing a negation as simple as adding or removing the phrase ‘It is not the case that’ to the front of an existing proposition.

The four infix expressions link two propositions together.  These are:

  • Conjunction – ‘It is raining’ and ‘It is cold’
  • Disjunction – Either ‘it is raining’ or ‘it is sunny’
  • Conditional – If ‘it is raining’ then ‘the ground is wet’
  • Biconditional – ‘It is raining’ if and only if ‘water droplets are falling from sky’

Since the conjunction, disjunction, and biconditional expressions are symmetric upon interchange of the two propositions (or wffs), there is no special name for the first or second slots. The conditional, however, requires a sense of cause-and-effect and, as a result, the first slot is called the antecedent and the second slot the consequent. In the conditional 'If it is raining then I am depressed', 'it is raining' is the antecedent and 'I am depressed' is the consequent.

The system's objects and expressions can be summarized as follows:

Expression | Name | Symbol | Example
It is not the case that … | Negation | $\neg$, ~, ! | $\neg R$
… and … | Conjunction | $\land$, & | $R \land S$
Either … or … | Disjunction | $\lor$ | $R \lor S$
If … then … | Conditional | $\rightarrow$ | $R \rightarrow S$
… if and only if … | Biconditional | $\leftrightarrow$ | $R \leftrightarrow S$

In addition to the expression symbols, there are a few additional helper symbols that keep things neat. The first is the 'implies' symbol $\implies$. It is sometimes called 'infer that' and is then denoted by $\vdash$. Either one basically denotes the final conclusion (the output) of the argument. So the first argument above translates into

$R$
$R \rightarrow D$
$\implies D$

where $R$ is the proposition ‘It is raining’ and $D$ is the proposition ‘I am depressed’.  The second set of symbols are the parentheses ‘(‘ and ‘)’ which are used to group terms together to avoid ambiguous expressions such as $A \land B \land C$, which could mean ‘I did A and then I did B and C’ or ‘I did A and B and then I did C’ or other meanings.
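As a concrete (if crude) illustration, the five expressions can be modeled as ordinary Boolean functions in Python. The sketch below is my own and treats propositions as plain True/False values rather than symbols; the final lines show Modus Ponens by exhaustion for the raining/depressed argument.

# The five logical expressions as Boolean functions (a sketch with plain
# True/False values, not a full symbolic system).
def Not(p):        return not p
def And(p, q):     return p and q
def Or(p, q):      return p or q
def Implies(p, q): return (not p) or q   # false only when p is true and q is false
def Iff(p, q):     return p == q

# Modus Ponens by exhaustion: with R true and the premise R -> D required
# to be true, the only consistent value for D is True.
R = True
consistent_D = [D for D in (True, False) if Implies(R, D)]
print(consistent_D)   # prints [True]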

The next piece is the rules of inference that allow proper manipulation of one set of wffs into another.  These rules are:

  1. Modus ponens: from a conditional and a true antecedent, infer the consequent: $\phi \rightarrow \psi, \phi \implies \psi$
  2. Negation Elimination: $\neg \neg \phi \implies \phi$
  3. Conjunction Introduction: $\phi, \psi \implies \phi \land \psi$
  4. Conjunction Elimination: $\phi \land \psi \implies \phi, \psi$
  5. Disjunction Introduction: $\phi \implies \phi \lor \psi$ for any $\psi$
  6. Disjunction Elimination: $\phi \lor \psi, \phi \rightarrow \chi, \psi \rightarrow \chi \implies \chi$
  7. Biconditional Introduction: $(\phi \rightarrow \psi), (\psi \rightarrow \phi) \implies \phi \leftrightarrow \psi$
  8. Biconditional Elimination: $\phi \leftrightarrow \psi \implies \phi \rightarrow \psi, \psi \rightarrow \phi$
  9. Conditional Proof (CP): if assuming a proposition $P$ allows one to derive another $Q$, then conclude $P \rightarrow Q$
  10. Reductio ad Absurdum (RAA): if assuming $\neg \phi$ leads to a contradiction, then conclude $\phi$

Note that the truth values of the propositions are assumed to be known from the outset (with the exception of the conditional proof and reductio ad absurdum, where an assumption is made during the course of the argument). The purpose of the system is to determine the truth of the conclusion based on the truth values of the assumptions. The formal inference rules act as a computer program that transforms input to output.

Next week’s column will apply the Propositional Calculus to prove some interesting outcomes and to show how unexpected inferences can result.  All of that is a prelude to the final, fun application of preventing an AI explorer from dying due to misadventure before he can go ‘there and back again’.

Knowledge and Uncertainty

The disciplines of the natural sciences and philosophy enjoy a rich, complicated, and, at times, subtle relationship.  Philosophic pursuits help to guide and inform the scientific enterprise while the phenomena, which science discovers, categorizes, and explains, expands and grounds the philosophic thought.  Nowhere is this interaction more interesting and, perhaps, more important than in the area of knowledge and uncertainty.

Epistemological ideas dealing with what is knowable, unknown, and unknowable have played a large role since the earliest days of philosophy.  In the Platonic dialog The Meno, Socrates puts forward the idea that much (or perhaps all) human learning is really a kind of remembrance of knowledge attained in past incarnations of the soul (anamnesis).  How exactly the cycle starts and what knowledge the proto-soul possesses or whether Plato/Socrates actually worried about an infinite regress is not clear.

Questions of knowledge continue on for thousands of years without much of a change in the focus or tenor until the rise of quantitative scientific methods in the post-Renaissance world.  Almost overnight, there is now a way to discuss three vital components of knowing, at least within the context of physical systems:

  • Knowledge characterized by measurement
  • Uncertainty characterized by error
  • Mathematical description of how the two propagate their influence

These new ingredients are not developed to shed light on ages-old debates but rather to determine just how to deal with these new models of the physical world – differential equations.  In differential equations, man had an operational model for cause-and-effect; a laboratory wherein the ideas of what is known and unknown/unknowable could be made the subject of experimentation.  Nature’s own fabric helped to shape and mold how mankind saw knowledge.

These ideas matured in many different directions subject to need and taste.  The three most interesting ones are:

  • Control theory
  • Quantum mechanics
  • Statistical mechanics

In control theory, the basic notion is one of a state whose evolution is subject to a set of differential equations that describe the influence of the natural environment and the man-made controls used to guide the evolution into a desired behavior. The physical world is divided into pieces known and unknown. Generally, the known pieces are considered to be deterministic and the unknown pieces are random. The random variables are assigned probability distributions that describe what sort of state realizations can occur and how often they are likely to come on the scene. Sometimes, there is a further division of the random variables as either being aleatory or epistemic. The former term, aleatory, is best described as saying the randomness is somehow intrinsic to the system being modeled. In contrast, the latter term, epistemic, refers to randomness that is due to lack of measurement precision. The errors in the knowledge of the initial state of a system are often thought of as epistemic, while the uncertainties in the evolution of the differential equation are often thought of as aleatory. The distinction is that the initial state knowledge may be improved by better measurements, while the evolution model for the system, the so-called right-hand side of the differential equation, will never be able to accurately represent the true dynamics due to random fluctuations in the forces that cause the motion.
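A minimal numerical sketch may help fix the idea. In the Python fragment below (all numbers are invented for illustration), the error in the initial state estimate plays the epistemic role, while the random term driving the true dynamics plays the aleatory one; the model propagates deterministically and so slowly drifts away from the truth.

# Epistemic vs. aleatory uncertainty in a toy one-dimensional state propagation.
# All numbers are invented for illustration.
import random

a = 0.98                              # known, deterministic part of the dynamics
true_x = 1.0

# Epistemic: our estimate of the initial state is off by a measurement error.
est_x = true_x + random.gauss(0.0, 0.05)

for k in range(50):
    w = random.gauss(0.0, 0.02)       # aleatory: process noise on the true system
    true_x = a * true_x + w
    est_x  = a * est_x                # the model knows a but cannot know w

print(f"true state: {true_x:.3f}   model prediction: {est_x:.3f}")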

Generally, the control system community does not delve too deeply into the ontological nature of these uncertainties, contenting itself with the need to operationally model them. And this approach is reasonable since it isn't nearly as important to understand where 'noise' comes from as it is to determine how to deal with it.

Nonetheless, the very concept of noise and randomness and the study of how they arise can guide the techniques used to control and mitigate their presence.  This is where the two disciplines in physics, statistical mechanics and quantum mechanics, shine.

These two disciplines are, properly speaking, two sides of the same coin, but it is often convenient to separate out the randomness into two separate bins, one dealing with the quantum nature and the other with the many-particle nature of the system being studied. Although the terminology is rarely used by physicists, the descriptions of aleatory and epistemic fit these bins nicely, at least at the conceptual level. However, hard pushing on these concepts will soon show that the divisions are not as clear cut as they might first appear.

First, consider quantum mechanics.  By the very nature of the quantum wave function, the state of a system at any time cannot be determined with infinite precision; so a complete knowledge of conjugate pairs of variables (e.g. position and momentum) is impossible.  In some sense the system is aleatory.  But the evolution of the wave function is mediated by the Hamiltonian, whose nature is considered known.  The state evolution is completely deterministic and the only insertion of randomness comes in the measurement step, where the wave function collapses into an eigenstate of the measurement Hamiltonian.  Thus the measurement process is aleatory but this randomness can be used to an advantage since the initial state of the system can be prepared so that it is perfectly an eigenstate of the measurement Hamiltonian and hence has no state uncertainty.

Statistical mechanics deals with the added complication of having an enormous number of degrees of freedom (e.g. many particles) so that a complete description of the state is practically impossible. (It is interesting to note that not all systems with enormous or even infinite degrees of freedom are intractable; the common field theory – say the wave equation – has an infinite number of Fourier modes that all behave in a describable fashion.) In classical statistical mechanics, the state of the system is not limited by the uncertainty principle, so the specification of the initial state is probabilistic only due to our ignorance; thus it is epistemic. Since separately tracking the particles' individual motions, and hence their interactions, is also intractable, the evolution is subject to 'noise', but of an epistemic nature as well; in principle, if the individual states could be tracked (e.g. on a computer), then complete state knowledge would be possible.

Statistical mechanics becomes richer when combined with quantum mechanics.  The initial state of the system can be statistically distributed across multiple eigenstates.  For example, 10 percent of the system can be in one quantum state while 90 percent in another.  The density matrix formalism is designed to handle the case where epistemic uncertainty is layered on top of aleatory uncertainty.

All this is well and good but things become complicated when these concepts are pushed to their logical boundaries by asking some ontological questions about the nature of uncertainty. The most intriguing questions deal with the boundary between the epistemic and the aleatory. Many researchers are fascinated with the idea that the aleatory uncertainty of quantum mechanics may give way to hidden variables, pilot waves, and the like. The unspoken goal is to eliminate or, otherwise, get around the uncertainty principle. But the more interesting question flows the other way. Is our ignorance a physical manifestation of aleatory rather than of epistemic uncertainty? Buried deep under these distinctions is the notion of a human who can possess knowledge of the physical world; an observer in the language of quantum mechanics. But no matter what the knowledge possessor is named, it is still a physical object. Its knowledge is represented by physical patterns of matter and energy. Its ability to measure and interact is still mediated materially. So where does the actual boundary lie? Just how separate is the measurer from the measured? The answer is, to close with a pun, completely uncertain.

Frames and Systems of Reference

Thinking about other points-of-view is a proven strategy for more clearly defining what a concept is and what it isn’t.  The heart of the Socratic Method involves repeatedly changing perspectives along this line.  Each dialog employs an operational approach roughly rendered as ‘yes, suppose we look at the matter with this definition, what do we find’.   Following this line of questioning diligently and in a disciplined manner strips away more of the accidentals and allows a sharper picture to emerge of the essential nature of the idea in question.

Such an approach is also useful for sharpening the thinking involved in modeling physical or mathematical objects. Steps forward in science, particularly physics, often come about from a cleaner definition of just what some primitive object involves.

Oddly enough, I had the good fortune to be involved in two separate and unrelated discussions this past week about the essential natures of the points-of-view used to describe the physical world, which, in the physical sciences, are always referred to as reference frames or coordinate systems or some closely similar phrasing.   The resulting dialogs certainly helped me to see better what the physical sciences really know about frames and systems of reference.

To set the stage, a disclaimer is in order.  As far as I can tell, there is no universal agreement about how to define a reference frame, or how, exactly, it differs from a coordinate system and the associated measurements.  This lack of uniform definition points to some deep issue – either epistemological or ontological – about the nature of space and time and how humans perceive these things.   One part of the reason seems to be that the operational concepts are so primitive that we have only a basic notion, in many cases, of how to describe it.  I liken it to being able to drive a car or ride a bike but yet be unable to describe how to do these things to someone who can’t.  But I think that there is an even bigger reason that speaks to how we divide the world up into categories and how we identify the essentials from the accidentals.

To make this last point clearer, let me concretely discuss my definitions of reference frame and coordinate system and then point out how one may logically use these definitions to come up with something akin to a contradiction.

A reference frame is a physical object possessing a point-of-view.  The prototype is the human being so defined to have the essential parts of a set of limbs to move about, eyes to look, a mind/brain to process, a mouth to speak the results, and ears to listen.  Even when physics speaks about inanimate objects there is, lurking in the background, the notion of what an observer would see were he a disembodied spirit moving along with or sitting upon the object (such is the nature of our imaginations and how we understand the world).  A convenient abstraction is that a reference frame is any object that has a definite place, which possesses three (independent) directions that it can use, in combination, to point at something, and which has some measure of scale.

frame_of_reference

Now suppose something of interest comes into this object's field-of-view. As a reference frame it can point towards the object and can denote how far away the thing of interest is. By convention, our primitive reference frame object will adjust the length of the direction to the thing of interest, making the length of the arrow along the direction longer or shorter in proportion to the distance. Thus we have defined a traditional position vector with respect to our primitive reference frame.

vector_in_the_frame

Note that the notions of direction and distance are also primitive concepts with no easy way to define them in terms of other, simpler things.  Also note that there are no names for the directions yet nor is there any developed idea of how to specify these directions or distances mathematically.

The next step is to remedy this shortcoming, because being able to measure, compute, and reproduce values are vital ingredients to understanding the world. The remedy involves giving the primitive reference frame basic measuring tools. For this discussion, the ruler, the clock, and the protractor will suffice, and the generalization to more sophisticated modes shouldn't be too hard.

Using only rulers, we can decorate the primitive reference frame with a set of planes, each possessing a ruled grid of lines and spaced with a known distance.  One such configuration is shown below.

Cartesian_coordinates

The thing of interest is then specified by the labels specifying on which plane and within which cell is it located.  Other objects can be as deftly located and thus we arrive at a coordinate system – an instance of the Cartesian coordinate system to be precise.

Two important things are worth noting.  First, in this scheme, the reference frame possesses the coordinate system – we’ll return to this point below.  Second, the coordinate system is arbitrary.  The planes shown above were oriented so that their edges coincide with the reference frame directions but this choice is no better or worse (at least philosophically) than any other.

Indeed, the whole idea of using planes, regardless of alignment, can be abandoned altogether.  Instead, we could have chosen to use concentric spherical shells of different radii with great circles and latitude lines (taken as primitive notions) drawn on them.  The protractor is now our tool of choice and the result is the spherical coordinate system.  One such shell of one such instance is shown below.

spherical_coordinates

In terms of these shells, the location of the thing of interest would be specified by stating on which shell it lies and by giving the great circle and latitude lines on which it lies.  Of course the orientation of the great circles and latitude lines are as equally arbitrary as the alignment of the planes pictured above.
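To make the arbitrariness concrete, here is a short Python sketch expressing one and the same position vector in both of the systems just described. The numbers and angle conventions (polar angle measured from a chosen vertical direction, azimuth measured from a chosen reference direction) are illustrative, one common choice among several.

# One position vector expressed in the Cartesian and spherical systems
# described above. The numbers and angle conventions are illustrative.
from math import sqrt, acos, atan2, degrees

x, y, z = 1.0, 2.0, 2.0               # cell labels on the ruled planes

r     = sqrt(x**2 + y**2 + z**2)      # which spherical shell
theta = acos(z / r)                   # which latitude line (polar angle)
phi   = atan2(y, x)                   # which great circle (azimuth)

print(f"Cartesian: ({x}, {y}, {z})")
print(f"Spherical: r = {r:.3f}, theta = {degrees(theta):.1f} deg, phi = {degrees(phi):.1f} deg")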

The whole scheme holds up just fine as long as it is being used operationally.  The trouble comes when one starts examining it closely with an eye to first principles (yet another point-of-view).   Several annoying questions come up which bring into doubt the underlying consistency of the scheme.  Some of these are:

  • How can the reference frame have a notion of direction and length without first having some notion of how to measure angles and lengths? In other words, which comes first, reference frame or coordinate system?
  • What objects are used to find the position and orientation of the first object – are they not also reference frames? There is a Machian idea buried here but no time to worry about that now.  It suffices to point out that this ambiguity leads to the perpetual confusion between active and passive rotations.

One might also pose the following question. Since coordinate systems are also objects in their own right, with directions determined by lines of constant coordinate value, can't they also be used as reference frames? My answer to that question is a guarded no. Cartesian coordinates don't really have an origin that matters – they really describe an affine space – so they aren't quite the same type of object as the primitive thing to which we attached directions and an origin. This observation is also not very solid, since the spherical coordinate system has to have an origin upon which the spherical shells are centered. Even this requirement doesn't prevent the origin from shifting; it just makes the algebra much harder, and since most everyone goes back to Cartesian coordinates to compute, it isn't a strong point.

More troubling is the observation that some origins are devoid of a physical object.  For example, the barycenter of two equal mass objects separated by a distance great enough that they don’t touch is located in the empty space between them.  Nonetheless, scientists are quite happy to use this mathematical construction as an origin of a reference frame.

So in the final analysis we are left with two basic conclusions. First, it is no wonder that there is no uniformly accepted definition of the basic terms of reference frame and coordinate system. In some sense they are tightly interconnected and too primitive to define precisely. As long as a scheme works (i.e. gives the right numbers) it is operationally sound, if not totally logically so. Second, by studying this thorny problem, we can get some insight into just what is knowable and explainable.

Bare Bones Halting

It’s a rare thing to find a clear and simple explanation for something complicated – something like the Pythagorean Theorem or calculus or whatnot.  It’s even rarer to find a clear and simple explanation for something that challenges the very pillars of logic – something like Gödel’s Theorem or the halting theorem.  But J. Glenn Brookshear treats readers of his text Computer Science: An Overview to a remarkably clear exploration of the halting theorem in just a scant 3 pages in Chapter 12. I found this explanation so easy to digest that I thought I would devote a post to it, expanding on it in certain spots and putting my own spin on it.

Now to manage some expectations:  it is important to note that this proof is not rigorous; many technical details needed to precisely define terms are swept under the rug; but neither is it incorrect.  It is logically sound if a bit on the heuristic side.

The halting theorem is important in computer science and logic because it proves that it is impossible to construct an algorithm that is able to analyze an arbitrary computer program and all its possible inputs and determine if the program will run to a successful completion (i.e. halt) or will run forever, being unable to reach a solution.  The implications of this theorem are profound since its proof demonstrates that some functions are not computable and I’ve discussed some of these notions in previous posts.  This will be the first time I’ve set down a general notion of its proof.

The basic elements of the proof are a universal programming language, which Brookshear calls Bare Bones, and the basic idea of self-referential analysis.

The Bare Bones language is a sort of universal pseudocode to express those computable functions that run on a Turing Machine (called Turing-computable functions).  The language, which is very simple, has three assignment commands:

  • Clear a variable
  • Increment a variable
  • Decrement a variable

In addition, there is one looping construct based on the while loop that looks like

while x not 0 do;
   ...
end;

While primitive, these commands make it possible to produce any type of higher level function.  For  example, Brookshear provides an algorithm for computing the product of two positive integers:

BB Multiply

Note: for simplicity only positive integers will be discussed – but the results generalize to any type of computer word (float, sign int, etc.).
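To give a feel for what such an algorithm looks like, here is a transliteration of the usual Bare Bones multiplication scheme into Python, restricted to the three assignments and the while-not-zero loop. It is a sketch of the standard approach, not a copy of Brookshear's figure.

# Multiplication using only the Bare Bones repertoire: clear, increment,
# decrement, and the while-not-zero loop.
def multiply(x, y):
    z = 0                    # clear Z
    while x != 0:            # while X not 0 do
        w = 0                #   clear W (scratch copy of Y)
        while y != 0:        #   move Y into both Z and W
            z = z + 1        #     incr Z
            w = w + 1        #     incr W
            y = y - 1        #     decr Y
        while w != 0:        #   restore Y from W
            y = y + 1        #     incr Y
            w = w - 1        #     decr W
        x = x - 1            #   decr X
    return z

print(multiply(3, 4))        # prints 12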

Now Bare Bones is a universal language in the sense that researchers have demonstrated that all Turing-computable functions can be expressed as Bare Bones algorithms similar to the Multiply function shown above.  Assuming the Church-Turing thesis, which states that the set of all computable functions is the same as the set of all Turing-computable functions, allows one to conclude that Bare Bones is sufficiently rich to encode all algorithms that express computable functions.

One particularly important Bare Bones algorithm is something I call x-test (since Brookshear doesn’t give it a name).

x-test

The job of x-test is simple: it either terminates when x is identically zero or it runs forever otherwise.

x-test at work

The Terminate flag is assigned a value of 0 if x-test fails to halt and 1 if it does.

The next step brings in the concept of self-reference. The function x-test takes an arbitrary integer in and gives the binary output of 0 or 1. Since x-test is expressed as a set of characters in Bare Bones and since each character can be associated with an integer, the entire expression of x-test can be associated with an integer. To be concrete, the string representation of x-test is:

‘while x not 0 do;\n    increment x;\nend;’

A simple Python program, called string_to_int:

def string_to_int(s):
    # map each character to its zero-padded, 3-digit ASCII code
    ord3 = lambda x: '%.3d' % ord(x)
    # concatenate the codes and read the result as one integer
    return int(''.join(map(ord3, s)))

transforms this string to the (very long) integer

119104105108101032120032110111116032048032100111059010032032032032105110099114101109101110116032120059010101110100059

Using this integer as the input to x-test results in the Terminate flag being set to 0.

Of course, any expression of x-test in any programming language will result in a nonzero integer and so x-test never has a chance to terminate.

x-test is not self-terminating

But there are clearly functions that do terminate when they are expressed as input for their own execution (one obvious example is the dual function to x-test, where the while loop stops if x is non-zero).  Such functions are said to be self-terminating.
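A quick sketch of that dual function (the name dual_test is mine) makes the point:

def dual_test(x):
    # the mirror image of x-test: the loop runs forever only when x is zero
    while x == 0:
        pass
    # the integer encoding of this (or any) program is nonzero, so when
    # dual_test is handed its own encoding the loop is skipped and it halts:
    # dual_test is self-terminating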

Although it may look like a lot of effort for nothing, this notion of self-terminating functions is the key to the halting theorem and all that remains is to combine the above results with some clever thinking.

The clever thinking starts with the assumption that there exists a function, which I call the Hypothetical Halting Program (HHP), that can analyze any arbitrary program written in Bare Bones, along with any corresponding input, and determine whether that combination will terminate or will run forever.

Again, to be concrete, let’s look at what the HHP would do when looking to see if x-test is self-terminating.

HHP analyzes x-test

The HHP would be given the x-test function and the corresponding input (x-test in integer form) and would determine, just as we did above, that x-test won’t terminate – that is, x-test is not self-terminating.

Now such a function that works only on x-test is easy to understand; the text above that argued against x-test being self-terminating is one such instantiation.  However, the goal is to have the HHP be able to analyze all such algorithms so that we don’t have to think.  Already, warning bells should be going off, since the word ‘all’ denotes a pretty big set – in fact an infinite one – and infinite sets have a habit of behaving badly.

But let’s set aside that concern for the moment and just continue our assumption that HHP can do the job.  As we progress, we’ll see that our assumption of the existence of the HHP will lead to a contradiction that rules out this assumption and, thus, gives the desired proof.  Operating without this foresight, we argue that since the HHP is computable it must be able to find expression as a Bare Bones program.  And since the HHP analyzes Bare Bones programs, it must be able to analyze itself and determine if it is self-terminating – spitting out a 1 if so and a 0 otherwise.

Now we can construct a new program, called the HHP++, which is the sequential running of the HHP followed by x-test.

HHP++

We can always do this in Bare Bones by first running HHP and then assigning its output (0 or 1) to the input of x-test.  And here is where it gets interesting.
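In Python-flavored terms (the function names here are mine, and hhp itself is precisely the hypothetical program whose existence is being assumed, so this is a sketch rather than runnable code), the construction looks like:

def hhp_plus_plus(program):
    # hhp(program, data) is assumed to return 1 if 'program' halts on 'data'
    # and 0 otherwise; no such function can actually be supplied -- that is
    # the point of the proof
    flag = hhp(program, program)   # step 1: is 'program' self-terminating?
    x_test(flag)                   # step 2: the x-test sketch above halts only if flag is 0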

Take the HHP++ and let it test itself to see if it is self-terminating.  If it is self-terminating, the HHP piece will confirm it so and will output a value of 1.  This value is then passed to the x-test piece, which then fails to halt – so the HHP++ does not terminate on its own encoding after all.  If the HHP++ is not self-terminating, the HHP piece will output a 0 which, when passed to the x-test piece, halts the operation – so the HHP++ does terminate on its own encoding after all.  Thus the HHP++ is a program that can be neither self-terminating nor not self-terminating, and the only way that can be true is if our original assumption that the HHP exists is false.

HHP++ contradiction

It is remarkable that, using such a simple argument, the human mind is able to rule out the possibility of a general function like the HHP.

Vectors as Instructions

For many of us, it’s hard to imagine how the standard blackboard construction of lines and points in the plane can hold any mystery or subtlety. What could be simpler than drawing a line in two dimensions and decorating it with some points, each bearing a label? However, many of the intuitive notions that we have about things aren’t sufficiently well-defined to survive the transition from casual use to rigorous proof. This is especially true in all things associated with getting a computer to perform a set of functions for us; the first step in building good software often being the sharpening of fuzzy ideas. The human capacity to ‘figure it out’ has yet to be designed into a set of instructions.

The issue that arises in constructing a line with points in the plane is in understanding the distinction between the ordered pair of numbers used to denote a given point and the ordered pair of numbers used to denote a vector. Take the line $\ell$ in the plane.

line and plane
The points ${\mathcal P}$ and ${\mathcal Q}$ fall on the line and have coordinates given by the ordered pairs


\[ {\mathcal P} = \left( \begin{array}{c} x_1 \\ y_1 \end{array} \right) \]

and

\[ {\mathcal Q} = \left( \begin{array}{c} x_0 \\ y_0 \end{array} \right) \; . \]

Note that there is nothing special about writing the points in terms of column arrays, rather than the more usual $(x_i, y_i)$ notation. The reason for doing this is a matter of notational convenience that should become clear below.

Intuitively, we understand that if the vector $\vec v$ stretches from ${\mathcal Q}$ to ${\mathcal P}$, its components are given by

\[  [\vec v]_x \equiv v_x = (x_1 - x_0) \]

and

\[  [\vec v]_y \equiv v_y = (y_1 - y_0) \; , \]

where the notation $[\vec v]_i$ should be read as ‘get the ith component of the vector $\vec v$’.

At this point, there is a strong temptation to write the vector $\vec v$ in the same fashion

\[ \vec v = \left( \begin{array}{c} x_1 - x_0 \\ y_1 - y_0 \end{array} \right)  \]

as we did the points ${\mathcal P}$ and ${\mathcal Q}$.

This approach certainly provides some benefits. That the notational forms for points and vectors look the same fits the visual picture of $\vec v$ connecting ${\mathcal Q}$ to ${\mathcal P}$. But the drawbacks of such an approach outweigh the benefits. An individual point on the line is a zero-dimensional object whose address in space is given by the ordered pair. The vector is a one-dimensional object, a portion of the line that behaves like a directed line segment.

In addition, the vector is completely insensitive to a change of origin.  What are we to make of two new points ${\mathcal R}$ and ${\mathcal S}$ specified by the ordered pairs

\[ {\mathcal R} = \left(\begin{array}{c} x_2 \\ y_2 \end{array} \right) = \left(\begin{array}{c} x_0 + a \\ y_0 +b \end{array} \right) \]

and

\[  {\mathcal S} = \left(\begin{array}{c} x_3 \\ y_3 \end{array} \right) = \left(\begin{array}{c} x_1 + a \\ y_1 +b \end{array} \right) \; ? \]

Depending on the choices for the values $a$ and $b$, ${\mathcal R}$ and ${\mathcal S}$ may not even fall on $\ell$, and yet they have the same vector $\vec v$ connecting them as do ${\mathcal Q}$ and ${\mathcal P}$.

Mathematicians like to draw the distinction between points and vectors but they are often clumsy about it. Take, for example, A course in mathematics for students of physics, Vol. 1 by Bamberg and Sternberg. These authors identify the vector $\vec v$ as an equivalence class and they use the cluttered notation

\[ {\mathcal Q} \; “+” \; \vec v = {\mathcal P} \]

to define it in terms of the more primitive points. They also use different delimiters around the column arrays which specify the components: parentheses for one and square brackets for the other. Although it isn’t important which is used for which, note, for completeness, that the notation used in this column is the opposite of Bamberg and Sternberg’s.

In this notation, the distinction between vectors and points is front and center, but at the cost of complication in the visual presentation. A parametric line would be specified as the set

\[  \ell(t) = \left\{ \left. \left(\begin{array}{c} x_0 \\ y_0 \end{array} \right) + t \left[ \begin{array}{c} v_x \\ v_y \end{array} \right] \right| t \in \mathbb{R} \right\} \; , \]

of all points related by the real number $t$, where the components of $\vec v$ are as specified above.

A cleaner way of thinking about these distinctions is to regard the relations as computing instructions rather than as mathematical definitions. This simplifies the notation and allows the defining equation

\[ \vec v = {\mathcal P} – {\mathcal Q} \]

to be interpreted as ‘the vector $\vec v$ contains the instructions on how to move from the point ${\mathcal Q}$ to the point ${\mathcal P}$’. The equation of the parametric line, now cast in the abstract form without the column arrays, would be the set

\[ \ell(t) = \left\{ \left. {\mathcal Q} + t \vec v \; \right| \; t \in \mathbb{R} \right\} \; , \]

of all points formed by moving a variable amount $t$ from ${\mathcal Q}$ according to the instructions held in $\vec v$.

The translation to objects in a computer program is now much more straightforward and natural than trying to parse what is meant by an equivalence class. To be clear, I am not criticizing the notion of an equivalence class nor its citation by Bamberg and Sternberg. Rather I am simply saying that viewing vectors in the context of directed line segments is much more natural in this computer-centric age.
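As a minimal sketch of how this instruction-based view might be carried into code (the class and function names are my own, not drawn from any particular library), points and vectors can be kept as distinct types, with the difference of two points producing a vector and a point plus a vector producing a new point:

from dataclasses import dataclass

@dataclass(frozen=True)
class Vector:
    # instructions for how to move: displacement components, not a location
    dx: float
    dy: float

    def scale(self, t):
        # the same instructions, followed t times as far
        return Vector(t * self.dx, t * self.dy)

@dataclass(frozen=True)
class Point:
    # a location in the plane
    x: float
    y: float

    def __sub__(self, other):
        # P - Q yields the instructions for moving from Q to P
        return Vector(self.x - other.x, self.y - other.y)

    def __add__(self, v):
        # Q + v follows the instructions held in v to reach a new point
        return Point(self.x + v.dx, self.y + v.dy)

def line_point(Q, v, t):
    # a point on the parametric line ell(t) = Q + t*v
    return Q + v.scale(t)

With these definitions, v = P - Q and line_point(Q, v, t) read almost exactly like the abstract equations above, and translating both points by the same amounts $a$ and $b$ leaves the vector between them unchanged.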

A Flip of the Coin

Ideological conflicts are often the bitterest of arguments that appear in the race of Man.  Whether the religious wars of a post-Reformation Europe, the polarizing arguments of modern US politics, the simple disputes over which is better – PCs or Macs – or the question of whether somebody should be a dog-person or a cat-person, these conflicts are always passionate and, while important, they are, in some respects, pointless.  Clearly, adherents of both sides always have a good reason for supporting their position; if they didn’t they wouldn’t support it so vehemently.  Those bystanders without a strong opinion one way or another are left to just shake their heads.

One such ideological conflict is the strong, often mean-spirited, argument between the Frequentists and the Bayesians over the right way to characterize or define probabilities.  For much of this particular cultural hotspot, I’ve been a bystander.  By training and early practice, an outsider would have characterized me as a Frequentist since I am comfortable with and enjoy using sampling techniques, like classical Monte Carlo, to investigate the evolution of a given probability distribution.  Over the past 6 or 7 years, I’ve come to a better appreciation of Bayesian methods and find myself in the ever-growing position of seeing the complementary utility of both.

Nonetheless, finding a simple scenario that captures the essential aspects of these schools of thought and is easy to articulate has been elusive – that is until recently.  I now believe I’ve found a reasonably compact and understandable way to demonstrate the complementary nature of these techniques through the flip of a coin. (Although I am quite sure that there is great room to improve it across the board).

Before diving into the coin flip model, I would like to summarize the differences between the Frequentist and Bayesian camps.  While the coin-flip model is strictly my own – based on my own thoughts and observations – the following summary is heavily influenced by the online book entitled Entropic Inference and the Foundations of Physics, by Ariel Caticha.

The reader may ask why people just can’t agree on a single definition of probability.  Why do the two camps even matter?

On the notion of probability, Caticha has this to say:

The question of the meaning and interpretation of the concept of probability has long been controversial. Needless to say the interpretations offered by various schools are at least partially successful or else they would already have been discarded. But the different interpretations are not equivalent. They lead people to ask different questions and to pursue their research in different directions. Some questions may become essential and urgent under one interpretation while totally irrelevant under another. And perhaps even more important: under different interpretations equations can be used differently and this can lead to different predictions.

Ariel Caticha

The Frequentist model of probability is based on what I call the gambling notion.  Take a probability-based scenario, like throwing a pair of dice, and repeat for a huge number of trials. The probability of a random event occurring, say the rolling of snake eyes (two 1’s)

Snake Eyes

is empirically determined by how frequently it shows up compared to the total number of trials.  The advantage that this method has is that, when applicable, it has a well-defined operational procedure that doesn’t depend on human judgement.  It is seen as an objective way of assigning probabilities.  It suffers in two regards.  First, the notion of what random means is a poorly-defined philosophical concept and different definitions lead to different interpretations.  Second, the method fails entirely in those circumstances where trials cannot be practically repeated.  This failure manifests itself in two quite distinct ways.  First, the question being asked could be entirely ill-suited to repeated trials.  For example, Caticha cites the probability of there being life on Mars as one such question.  Such a question drives allocation of resources in space programs around the world but is not subject to the creation and analysis of an ensemble of trials.  Second, the scenario may naturally lend itself to experimental trials but the cost of such trials may be prohibitively expensive.  In these cases, it is common to build a computational model which is subject to a classical Monte Carlo method in which initially assumed distributions are mapped to final distributions by the action of the trial.  Both the mechanics of the trial and the assumed initial distribution are subjective and so are the results obtained.
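Going back to the dice, that operational recipe can be sketched in a few lines of Python (my own illustration, not from Caticha): estimate the probability of snake eyes as the relative frequency of the event over a large number of simulated rolls.

import random

def estimate_snake_eyes(trials=1_000_000):
    # frequentist estimate: relative frequency of rolling (1, 1) over many trials
    hits = sum(
        1 for _ in range(trials)
        if random.randint(1, 6) == 1 and random.randint(1, 6) == 1
    )
    return hits / trials

# the estimate should hover near the exact value of 1/36, about 0.0278
print(estimate_snake_eyes())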

The Bayesian approach relies heavily on Bayes theorem and, in doing so, allows the user to argue consistently about issues like the life-on-Mars question without needing to define random or determine how to implement trials.  This approach eschews the notion of objective determination of probability and, instead, views probability as an epistemological question about knowledge and confidence.  Thus two observers can look at the same scenario and quite naturally and honestly assign very different probabilities to the outcome based on each one’s judgement.  The drawback of this method is that it is much harder to reach a consensus in a group setting.
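To make the contrast concrete (this little example is mine, not Caticha’s), suppose two observers watch the same ten flips of a coin of unknown bias but bring different prior judgements to the table.  Applying Bayes theorem with a conjugate Beta prior, each updates consistently and yet each reports a different probability of heads:

def posterior_mean_heads(prior_heads, prior_tails, heads, tails):
    # a Beta(prior_heads, prior_tails) prior updated with the observed flips
    # gives a Beta(prior_heads + heads, prior_tails + tails) posterior;
    # return the posterior mean probability of heads
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)

# same data (7 heads, 3 tails), different priors, different honest conclusions
skeptic  = posterior_mean_heads(50, 50, heads=7, tails=3)  # strongly expects a fair coin
believer = posterior_mean_heads(3, 1, heads=7, tails=3)    # already suspects a head-heavy coin

print(round(skeptic, 3), round(believer, 3))  # roughly 0.518 versus 0.714

Neither observer is being dishonest; the disagreement simply reflects the different states of knowledge each brought to the data, which is exactly the epistemological reading of probability described above.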

Now to the coin-flip model that embodies both points of view.  The basic notion is this:  Suppose you and I meet in the street and decide to have lunch.  We can’t agree on where to go and decide to pick the restaurant based on a coin flip done in the following fashion.  I’ll flip a quarter and catch it against the back of my hand.  If you call the outcome correctly we go to your choice; alternatively we go to mine.  What is the probability that we end up in your restaurant versus mine?

Well, there are actually two aspects to this problem.  The first is the frequentist assignment of probability to the coin flip.  Having observed lots of coin flips, we both can agree that, prior to the flip, the probability that it lands heads or tails is 50-50.

Before the Flip

But after I’ve flipped the coin and caught it, asking for the probability of it being heads is a meaningless question.  It is either heads or tails – the outcome is entirely deterministic.  It is just that you don’t know what the outcome is.  So the probability of you picking the right answer is not amenable to trials.  Sure, we could repeat this little ritual every time we go out to lunch, but it won’t be an identical trial.  Your selection of heads versus tails will be informed by all your previous attempts, how you are feeling that day, and so on.

So it seems that we need both concepts.  And after all, we are human and can actually entertain multiple points of view on any given subject, provided we are able to get past our own ideology.