Latest Posts

Bayes and Drugs

One of the most curious features of Bayesian inference is the non-intuitive conclusions that can result from innocent looking observations.  A case in point is the well-known issue with mandatory drug tests being administered in a population that is mostly clean.

For the sake of this post, let's assume that there is a drug, called Drugg, that is the new addiction on the block and that we know from arrest records and related materials that about 7 percent of the population uses it.  We want to develop a test that will detect the residue in a person's bloodstream, thus indicating that the subject has used Drugg within some period of time (e.g. two weeks) prior to the administration of the test.  The test will return a binary result, with a '+' indicating that the subject has used Drugg and a '-' indicating that the subject is clean.

Of course, since no test will be infallible, one of our requirements is that the test will produce an acceptably low percentage of cases that are either missed detections or false alarms.  A missed detection occurs when the subject uses Drugg but the test fails to return a '+'.  Likewise, a false alarm occurs when the test returns a '+' even though the subject is clean.  Both situations present substantial risk and potentially high costs, so the lower both percentages can be made the better.

In order to develop the test, we gather 200 subjects for clinical trials; 100 of them are known Drugg users (e.g. they were caught in the act or are seeking help with their addiction) and the remaining 100 of them are known to be clean.  After some experimentation, we have reached the stage where, 99 percent of the time, the test correctly returns a '+' when administered to a Drugg user and, 95 percent of the time, it correctly returns a '-' when administered to someone who is clean.  What are the false alarm and missed detection rates?

This is where Bayes theorem allows us to make a statistically based inference, and one that is usually surprising.  To apply the theorem, we need to be a bit careful, so let's first define some additional notation.  A person who belongs to the population that uses Drugg will be denoted by 'D'.  A person who belongs to the population that is clean will be denoted by 'C'.  Let's summarize what we know in the following table.

Description                                        Symbol    Value
Probability of a '+' given that the person is C    P(+|C)    0.05
Probability of a '-' given that the person is C    P(-|C)    0.95
Probability of a '+' given that the person is D    P(+|D)    0.99
Probability of a '-' given that the person is D    P(-|D)    0.01
Probability that a person is C                     P(C)      0.93
Probability that a person is D                     P(D)      0.07

There are two things to note.  First, the results of our clinical trials are all expressed as conditional probabilities.  Second, the conditional probabilities for disjoint events sum to 1 (e.g. P(+|D) + P(-|D) = 1, since a member of D, when tested, must yield either a '+' or a '-').

In the population as a whole, we won’t know to which group the subject belongs.  Instead, we will administer the test and get back either a ‘+’ or a ‘-‘ and from that observation we need to infer to what group the subject is most likely to belong.

For example, let's use Bayes theorem to infer the missed detection probability, P(D|-) (note the role reversal between 'D' and '-').  Applying the theorem we get

\[ P(D|-) = \frac{ P(-|D) P(D) }{ P(-) } \; . \]

Values for P(-|D) and P(D) are already listed above, so all we need is P(-) and we are in business.  This probability is obtained from the formula

\[ P(-) = P(-|C) P(C) + P(-|D) P(D) \; . \]

Note that this relationship can be derived from $P(-) = P(- \cap C ) + P(- \cap D)$ and $P(A \cap B) = P(A|B) P(B)$.  The first formula says, in words, that the probability of getting a negative from the test is the probability of either getting a negative and the subject is clean or getting a negative and the subject uses Drugg.  The second formula is essentially the definition of conditional probability.

Since we'll be needing P(+) as well, let's compute both probabilities now and note their values.

Description                                      Formula                             Symbol    Value
Total probability of a '+' (person in C or D)    P(+) = P(+|C) P(C) + P(+|D) P(D)    P(+)      0.1158
Total probability of a '-' (person in C or D)    P(-) = P(-|C) P(C) + P(-|D) P(D)    P(-)      0.8842

The missed detection probability is

\[ P(D|-) = \frac{ P(-|D) P(D) }{ P(-) } = \frac{ 0.01 \cdot 0.07 }{ 0.8842 } = 0.0008 \;  . \]

So things are looking good and we are happy.  But our joy soon turns to perplexity when we compute the false alarm probability

\[ P(C|+) = \frac{ P(+|C) P(C) }{ P(+) } = \frac{ 0.05 \cdot 0.93 }{ 0.1158 } = 0.4016 \; . \]

This result says that around 40 percent of the time, our test is going to incorrectly point a finger at a clean person.
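To make the arithmetic easy to replay, here is a minimal Python sketch of the calculation above; the variable names are just illustrative choices.

```python
# Bayes theorem for the drug test: posteriors from the clinical-trial
# conditionals and the assumed 7% prevalence of Drugg use.

p_D = 0.07                 # P(D): prevalence of Drugg users
p_C = 1.0 - p_D            # P(C): clean
p_pos_given_D = 0.99       # P(+|D): test catches a user
p_neg_given_C = 0.95       # P(-|C): test clears a clean person
p_pos_given_C = 1.0 - p_neg_given_C   # P(+|C)
p_neg_given_D = 1.0 - p_pos_given_D   # P(-|D)

# Total probabilities of each test outcome
p_pos = p_pos_given_C * p_C + p_pos_given_D * p_D   # P(+) = 0.1158
p_neg = p_neg_given_C * p_C + p_neg_given_D * p_D   # P(-) = 0.8842

# Posteriors via Bayes theorem
p_D_given_neg = p_neg_given_D * p_D / p_neg   # "missed detection" ~ 0.0008
p_C_given_pos = p_pos_given_C * p_C / p_pos   # "false alarm"      ~ 0.4016

print(f"P(+) = {p_pos:.4f}, P(-) = {p_neg:.4f}")
print(f"P(D|-) = {p_D_given_neg:.4f}")
print(f"P(C|+) = {p_C_given_pos:.4f}")
```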

Suppose we went back to our clinical trials and came out with a second version of the test in which nothing had changed except that P(-|C) had now risen from 0.95 to 0.99.  As the figure below shows, the false alarm rate does decrease but still remains very high (surprisingly high) when the percentage of the population using Drugg is low.

Drugg Testing

The reason for this is that when the percentage of users in the population is small, driving the missed detection rate down comes at the expense of a greater percentage of false alarms.  In other words, our diligence in finding Drugg users has made us overly suspicious.
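To see how the false alarm probability behaves as the user fraction shrinks (the trend the figure above illustrates), here is a short sketch that sweeps the prevalence for both versions of the test; the grid of prevalence values is an arbitrary choice.

```python
# Compare P(C|+) for the original test (P(-|C) = 0.95) and the improved
# version (P(-|C) = 0.99) as the prevalence P(D) varies.

def false_alarm_posterior(p_D, p_neg_given_C, p_pos_given_D=0.99):
    """Return P(C|+) for the given prevalence and test characteristics."""
    p_C = 1.0 - p_D
    p_pos_given_C = 1.0 - p_neg_given_C
    p_pos = p_pos_given_C * p_C + p_pos_given_D * p_D
    return p_pos_given_C * p_C / p_pos

for p_D in (0.01, 0.02, 0.05, 0.07, 0.10, 0.20):
    v1 = false_alarm_posterior(p_D, 0.95)
    v2 = false_alarm_posterior(p_D, 0.99)
    print(f"P(D) = {p_D:4.2f}:  P(C|+) = {v1:.3f} (v1)   {v2:.3f} (v2)")
```

The smaller the user fraction, the more the '+' results are dominated by clean subjects, no matter how good the test's conditionals look.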

Bayesian Inference – Cause and Effect

In the last column, the basic inner workings of Bayes theorem were demonstrated in the case where two different random variable realizations (the attributes of the Christmas tree bulbs) occurred together in a joint probability function.  The theorem holds whether the probability functions for the two events are independent or are correlated.  In addition, it can be generalized in an obvious way to cases where there are more than two variables and where some or all of them are continuous rather than discrete random variables.

If that were all there was to it – a mechanical demonstration of the relationship between conditional and joint probabilities – Bayes theorem would make a curious footnote in probability and statistics textbooks and would hold little practical interest and no controversy.  However, the real power of Bayes theorem comes from its ability to link one statistical event with another and to allow inferences to be made about cause and effect.

Before looking at how inferences (sometimes very subtle and non-intuitive) can be drawn, let’s take a moment to step back and consider why Bayes theorem works.

The key insight comes from examining the meaning contained in the joint probability that two events, $A$ and $B$, will both occur.  This probability is written as

\[ P( A \cap B ) \; , \]

where the operator $\cap$ is the logical ‘and’ requiring both $A$ and $B$ to be true.  It is at this point that the philosophically interesting implications can be made.

Suppose that we believe that $A$ is a cause of $B$.  This causal link could take the form of something like: $A$ = ‘it was raining’ and $B$ = ‘the ground is wet’.  Then it is obvious that the joint probability takes the form

\[ P( A \cap B ) = P(B|A) P(A) \; , \]

which in words says that the probability that ‘it was raining and the ground is wet’ = the probability that ‘the ground is wet given that it was raining’ times the probability that ‘it was raining’.

Sometimes, the link between cause and effect is obvious and no probabilistic reasoning is required.  For example, if the event is changed from ‘it was raining’ to ‘it is raining’, it becomes clear that ‘the ground is wet’ due to the rain.  (Of course even in this case, another factor may also be contributing to how wet the ground is but that complication is naturally handled with the conditional probability).

Often, however, we don't observe the direct connection between the cause and the effect.  Maybe we woke up after the rain had stopped and the clouds had moved on, and all we observe is that the ground is wet.  What can we then infer?  If we lived somewhere without running water (natural or man-made), then the conditional probability that 'the ground is wet given that it was raining' would be 1 and we would infer that 'it was raining'.  There would be no way for the ground to be wet other than to have had rain fall from the sky.  In general, such a clear indication between cause and effect doesn't happen, and the conditional probability describes the likelihood that some other cause has led to the same event.  In the case of the 'ground is wet' event, perhaps a water main had burst or a neighbor had watered their lawn.

In order to infer anything about the cause from the observed effect, we want to reverse the roles of $A$ and $B$ and argue backwards, as it were.  The joint probability can be written with the mathematical roles of $A$ and $B$ reversed to yield

\[ P( A \cap B ) = P(A|B) P(B) \; . \]

Equating the two expressions for the joint probability gives Bayes theorem and also a way of statistically inferring the likelihood that a particular cause $A$ gave the observed effect $B$.

Of course, any inference obtained in this fashion is open to a great deal of doubt and scrutiny because the link backwards from observation to proposed or inferred origin is one built on probabilities.  Without some overriding philosophical principle (e.g. a conservation law) it is easy to confuse coincidence or correlation with causation.  Inductive reasoning can then lead to probabilistically supported but untrue conclusions, like 'all swans are white' – so we have to be on our guard.

Next week’s column will showcase one such trap within the context of mandatory drug testing.

Bayesian Inference – The Basics

In last week’s article, I discussed some of the interesting contributions to the scientific method made by the pair of English Bacons, Roger and Francis.  A common and central theme to both of their approaches is the emphasis they placed on performing experiments and then inferring from those experiments what the logical underpinning was.  Put another way, both of these philosophers advocated inductive reasoning as a powerful tool for understanding nature.

One of the problems with the inductive approach is that in generalizing from a few observations to a proposed universal law one may overreach.  It is true that, in the physical sciences, great generalizations have been made (e.g., Newton’s universal law of gravity or the conservation of energy) but these have ultimately rested on some well-supported philosophical principles.

For example, the conservation of momentum rests on a fundamental principle that is hard to refute in any reasonable way; that space has no preferred origin.  This is a point that we would be loath to give up because it would imply that there was some special place in the universe.  But since all places are connected (otherwise they can’t be places) how would nature know to make one of them the preferred spot and how would it keep such a spot inviolate?

But in other matters, where no appeal can be made to an over-arching principle as a guide, the inductive approach can be quite problematic.  The classic and often used example of the black swan is a case in point.  Usually the best that can be done in these cases is to make a probabilistic generalization.  We infer that such and such is the most likely explanation but by no means necessarily the correct one.

The probabilistic approach is time honored.  William of Occam’s dictum that the simplest explanation that fits all the available facts is usually the correct one is, at its heart, a statement about probabilities.  Furthermore, general laws of nature started out as merely suppositions until enough evidence and corresponding development of theory and concepts led to the principles upon which our confidence rests.

So the only thorny questions are what is meant by 'fact' and 'simplest'.  On these points, opinions vary and much argument ensues.  In this post, I'll be exploring one of the more favored approaches for inductive inference, known as the Bayesian method.

The entire method is based on the theorem attributed to Thomas Bayes, a Presbyterian minister and statistician, whose statement of the law was published posthumously in the latter half of the 1700s.  It was later refined and extended by Pierre-Simon Laplace in 1812.

The theorem is very easy to write down, and that perhaps is what hides its power and charm.  We start by assuming that two random events, $A$ and $B$, can occur, each according to some probability distribution.  The random events can be anything at all and don’t have to be causally connected or correlated.  Each event has some possible set of outcomes $a_1, a_2, \ldots$ and $b_1, b_2, \ldots$.  Mathematically, the theorem is written as

\[ P(a_i|b_j) = \frac{P(b_j|a_i) P(a_i)}{P(b_j)} \; , \]

where $a_i$ and $b_j$ are some specific outcomes of the events $A$ and $B$ and $P(a_i|b_j)$ ($P(b_j|a_i)$) is called the conditional probability that $a_i$ ($b_j$) results given that we know that $b_j$ ($a_i$) happened.  As advertised it is nice and simple to write down and yet amazingly rich and complex in its applications.  To understand the theorem, let’s consider a practical case where the events $A$ and $B$ take on some easy-to-understand meaning.

Suppose that we are getting ready for Christmas and want to decorate our tree with the classic strings of different-colored lights.  We decide to purchase a big box of bulbs of assorted colors from the Christmas light manufacturer, Brighty-Lite, who provides bulbs in red, blue, green, and yellow.  Allow the set $A$ to represent the colors

\[ A = \left\{\text{red}, \text{blue}, \text{green}, \text{yellow} \right\} = \left\{r,b,g,y\right\} \; . \]

On its website, Brighty-Lite proudly tells us that they have tweaked their color distribution in the variety pack to best match their customers' desires.  They list their distribution as consisting of 30 percent each for red and blue, 25 percent green, and 15 percent yellow.  So the probabilities associated with reaching into the box and pulling out a bulb of a particular color are

\[ P(A) = \left\{ P(r), P(b), P(g), P(y) \right\} = \left\{0.30, 0.30, 0.25, 0.15 \right\} \; . \]

The price for bulbs from Brighty-Lite is very attractive, but being cautious people, we are curious how long the bulbs will last before burning out.   We find a local university that put its undergraduates to good use testing the lifetimes of these bulbs.  For ease of use, they categorized their results into three bins: short, medium, and long lifetimes. Allowing the set $B$ to represent the lifetimes

\[ B = \left\{\text{short}, \text{medium}, \text{long} \right\} = \left\{s,m,l\right\} \]

the student results are reported as

\[ P(B) = \left\{ P(s), P(m), P(l) \right\} = \left\{0.40, 0.35, 0.25 \right\} \; , \]

which confirmed our suspicions that Brighty-Lite doesn’t make its bulbs to last.  However, since we don’t plan on having the lights on all the time, we decide to buy a box.

After receiving the box and buying the tree, we set aside a weekend for decorating.  Come Friday night we start by putting up the lights and, as we work, we start wondering whether all colors have the same lifetime distribution or whether some colors are more prone to be short-lived compared with the others. The probability distribution that describes the color of the bulb and its lifetime is known as the joint probability distribution.

If the bulb color doesn’t have any effect on the lifetime of the filament, then the events are independent, and the joint probability of, say, a red bulb with a medium lifetime is given by the product of the probability that the bulb is red and the probability that it has a medium lifespan (symbolically $P(r,m) = P(r) P(m)$).

The full joint probability distribution is thus

           red      blue     green     yellow    total
short      0.12     0.12     0.10      0.06      0.40
medium     0.105    0.105    0.0875    0.0525    0.35
long       0.075    0.075    0.0625    0.0375    0.25
total      0.30     0.30     0.25      0.15      1.00
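A quick way to reproduce this table, and the conditional probabilities worked out below, is to build the joint distribution as an outer product of the two marginals; this is only a sketch, and the dictionary representation is just one convenient choice.

```python
# Joint distribution of bulb color and lifetime under the independence
# assumption, built as the outer product of the two marginal distributions.

P_color = {"red": 0.30, "blue": 0.30, "green": 0.25, "yellow": 0.15}   # P(A)
P_life  = {"short": 0.40, "medium": 0.35, "long": 0.25}                # P(B)

joint = {(c, l): P_color[c] * P_life[l] for c in P_color for l in P_life}

# Conditional probabilities as relative proportions of the joint distribution
p_s_given_g = joint[("green", "short")] / sum(joint[("green", l)] for l in P_life)
p_g_given_s = joint[("green", "short")] / sum(joint[(c, "short")] for c in P_color)

print(p_s_given_g)   # 0.4
print(p_g_given_s)   # 0.25
```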

Now we are in a position to see Bayes theorem in action.  Suppose that we pull out a green bulb from the box.  The conditional probability that the lifetime is short $P(s|g)$ is the relative proportion that the green and short entry $P(g,s)$ has compared to the sum of the probabilities $P(g)$ found in the column labeled green.  Numerically,

\[ P(s|g) = \frac{P(g,s)}{P(g)} = \frac{0.1}{0.25} = 0.4 \; . \]

Another way to write this is as

\[ P(s|g) = \frac{P(g,s)}{P(g,s) + P(g,m) + P(g,l)} \; , \]

which better shows that the conditional probability is the relative proportion within the column headed by the label green.

Likewise, the conditional probability that the bulb is green given that its lifetime is short is

\[ P(g|s) = \frac{ P(g,s) }{P(r,s) + P(b,s) + P(g,s) + P(y,s)} \; . \]

Notice that this time the relative proportion is measured against joint probabilities across the colors (i.e., across the row labeled short). Numerically, $P(g|s) = 0.1/0.4 = 0.25$.

Bayes theorem links these two probabilities through

\[ P(s|g) = \frac{ P(g|s) P(s) }{ P(g) } = \frac{0.25 \cdot 0.4}{0.25} = 0.4 \; , \]

which is happily the value we got from working directly with the joint probabilities.

The next day, we did some more cyber-digging and found that a group of graduate students at the same university extended the undergraduate results (were they perhaps the same people?) and reported the following joint probability distribution:

           red      blue     green     yellow    total
short      0.15     0.10     0.05      0.10      0.40
medium     0.05     0.12     0.15      0.03      0.35
long       0.10     0.08     0.05      0.02      0.25
total      0.30     0.30     0.25      0.15      1.00

Sadly, we noticed that our assumption of independence between the lifetime and color was not borne out by experiment, since $P(A,B) \neq P(A) \cdot P(B)$ or, in more explicit terms, $P(\text{color},\text{lifetime}) \neq P(\text{color}) P(\text{lifetime})$.  However, we were not completely disheartened since Bayes theorem relates relative proportions and, therefore, it might still work.

Trying it out, we computed

\[ P(s|g) = \frac{P(g,s)}{P(g,s) + P(g,m) + P(g,l)} = \frac{0.05}{0.05 + 0.15 + 0.05} = 0.2 \]

and

\[ P(g|s) = \frac{ P(g,s) }{P(r,s) + P(b,s) + P(g,s) + P(y,s)} = \frac{0.05}{0.15 + 0.10 + 0.05 + 0.10} = 0.125 \; . \]

Checking Bayes theorem, we found

\[ P(s|g) = \frac{ P(g|s) P(s) }{ P(g) } = \frac{0.125 \cdot 0.4}{0.25} = 0.2 \]

guaranteeing a happy and merry Christmas for all.
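The same kind of check can be run against the graduate students' (dependent) table; here is a minimal sketch with the joint probabilities typed in by hand from the table above.

```python
# Bayes theorem check for the dependent (graduate-student) joint distribution.
joint = {
    ("red", "short"): 0.15, ("blue", "short"): 0.10, ("green", "short"): 0.05, ("yellow", "short"): 0.10,
    ("red", "medium"): 0.05, ("blue", "medium"): 0.12, ("green", "medium"): 0.15, ("yellow", "medium"): 0.03,
    ("red", "long"): 0.10, ("blue", "long"): 0.08, ("green", "long"): 0.05, ("yellow", "long"): 0.02,
}
colors = ("red", "blue", "green", "yellow")
lifetimes = ("short", "medium", "long")

P_g = sum(joint[("green", l)] for l in lifetimes)   # 0.25
P_s = sum(joint[(c, "short")] for c in colors)      # 0.40

p_s_given_g = joint[("green", "short")] / P_g       # 0.2
p_g_given_s = joint[("green", "short")] / P_s       # 0.125

# Bayes theorem recovers P(s|g) from P(g|s), P(s), and P(g)
assert abs(p_s_given_g - p_g_given_s * P_s / P_g) < 1e-12
print(p_s_given_g, p_g_given_s)
```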

Next time, I’ll show how this innocent looking computation can be put to subtle use in inferring cause and effect.

Bringing Home the Bacon

Don’t worry; this week’s entry is not about America’s favorite pork-related product (seriously there exists bacon-flavored candy).  It’s about the scientific method.  Not the whole thing, of course, as that would take volumes and volumes of text and would be outdated and maybe obsolete by the time it was finished.  No, this column is about two men who are considered by science historians to have contributed substantially to the scientific method and the philosophy of science.  And it just so happens that both of them bore the last name of Bacon.

Roger Bacon was born somewhere around 1214 (give or take – time and record keeping then, as now, was hard to do) in England.  Roger became both an English philosopher of note and a Franciscan friar.  Most of the best scholastic philosophers of the Middle Ages were monks, and in taking Holy Orders, Bacon falls amongst the ranks of other prominent religious thinkers, including Robert Grosseteste, Albertus Magnus, Thomas Aquinas, John Duns Scotus, and William of Ockham.

It seems that the cultural milieu of that time was planting the intellectual seeds for the scientific and artistic renaissance that followed.  Roger Bacon cultivated modes of thought that would be needed for the advances to come.  Basing his philosophy on Aristotle, he advocated for the following 'modern' ideas:

  • Experimental testing for all inductively derived conclusions
  • Rejection of blind following of prior authorities
  • Repeating pattern of observation, hypothesis, and testing
  • Independent corroboration and verification

In addition, he wrote extensively on science, both on its general structure and on specific applications.  Among his particular fields of interest was optics, where his diagrams have the look and feel of the modern experimental lab notebook.

Roger_Bacon_optics01

He also criticized the Julian calendar and argued for dropping a day every 125 years.  Such a reform would not be adopted until about 300 years after his death, with the creation of the Gregorian calendar in 1582.  He was also an outspoken supporter of experimental science, saying that it had three great prerogatives over other sciences and arts in that:

  • It verifies all of its conclusions by direct experiment
  • It discovers truths which can’t be reached without observation
  • It reveals the secrets of nature

Francis Bacon was born in 1561 in England.  He was a government official (Attorney General and Lord Chancellor) and a well-known philosopher.  His writings on science and philosophy established a firm footing for inductive methods used for scientific inquiry.  The details of the method are collectively known as the Baconian Method or the scientific method.

In his work Novum Organum (literally the 'new Organon', referring to Aristotle's collected works on logic), Francis has this to say about induction:

Our only hope, then is in genuine Induction… There is the same degree of licentiousness and error in forming Axioms, as in abstracting Notions: and that in the first principles, which depend in common induction. Still more is this the case in Axioms and inferior propositions derived from Syllogisms.

By induction, he meant the careful gathering of data and then refinement of a theory from those observations.

Curiously, both Bacons talk about four errors that interfere with the acquisition of knowledge: Roger does so in his Opus Majus; Francis in his Novum Organum.  The following table makes an attempt to match up the two men's lists.

Roger Bacon's Four Causes of Error                      Francis Bacon's Four Idols of the Mind
Authority (reliance on prior authority)                 Idols of the Theater (following academic dogma)
Custom                                                  Idols of the Tribe (tendency of humans to see order where it isn't)
Opinion of the unskilled many                           Idols of the Marketplace (confusion in the use of language)
Concealment of ignorance behind the mask of knowledge   Idols of the Cave (interference from personal beliefs, likes, and dislikes)

While not an exact match, the two Baconian lists of errors line up fairly well, which is puzzling if the historical assumption that Francis Bacon had no access to the works of Roger Bacon is true.  Perhaps the most logical explanation is that both of them saw the same patterns of error; that humankind doesn't change its fundamental nature in the passage of time or space.

Or perhaps Francis is simply the reincarnation of Roger, an explanation that I am positively sure William of Occam would endorse if he were alive today…

Ideal Forms and Error

A central concept of Socratic and Platonic thought is the idea of an ideal form.  It sits at the base of all discussions about knowledge and epistemology.  Any rectangle that we draw on paper or in a drawing software package, that we construct using rulers and scissors, or that we manufacture with computer-controlled fabrication is a shadow or reflection of the ideal rectangle.  This ideal rectangle exists in the space of forms, which may lie entirely within the human capacity to understand and distinguish the world, or may actually have an independent existence outside the human mind, reflecting a higher power.  All of these notions about the ideal forms are familiar from the philosophy of antiquity.

What isn't so clear is what Plato's reaction would be if he were suddenly transported forward in time and plunked down in a classroom discussion about the propagation of error.  The intriguing question is: would he modify his philosophical thought to expand the concept of an ideal form to include an ideal form of error?

Let’s see if I can make this question concrete by the use of an example.  Consider a diagram representing an ideal rectangle of length $L$ and height $H$.

true_rectangle

Euclidean geometry tells us that the area of such a rectangle is given by the product

\[ A = L \cdot H \; . \]

Of course, the rectangle represented in the diagram doesn’t really exist since there are always imperfections and physical limitations.  The usual strategy is to not take the world as we would like it to be but to take it as it is and cope with these departures from the ideal.

The departures from the ideal can be classified into two broad categories.

The first category, called knowledge error, contains all of the errors in our ability to know.  For example, we do not know exactly what numerical value to give the length $L$.  There are fundamental limitations on our ability to measure or represent the numerical value of $L$ and so we know the ‘true’ value of $L$ only to within some fuzzy approximation.

The second category doesn't seem to have a universally agreed-upon name, reflecting the fact that, as a society, we are still coming to grips with the implications of this idea.  This departure from the ideal describes the fact that at some level there may not even be one definable concept of 'true'.  Essentially, the idea of the length of an object is context-dependent and may have no absolutely clear meaning at the atomic level due to the inherent uncertainty in quantum mechanics.  This type of 'error' is sometimes called aleatory error (in contrast to epistemic error, synonymous with knowledge error).

Taken together, the knowledge and aleatory errors contribute to an uncertainty in length of the rectangle of $dL$ and an uncertainty in its height of $dH$.

error_rectangle

Scientists and engineers are commonly exposed, as part of their training in dealing with uncertainty and error, to a model for determining the error in the area of such a rectangle, using a technique sometimes called the propagation of error (or uncertainty).  For the case of this error-bound rectangle, the true area, $A'$, is also determined in Euclidean fashion, yielding

\[ A' = (L + dL) \cdot (H + dH) = L \cdot H + dL \cdot H + L \cdot dH + dL \cdot dH \; . \]

So the error in the area, denoted as $dA$, has a more complicated form than the area itself

\[ dA = dL \cdot H + L \cdot dH + dL \cdot dH \; . \]
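As a small numerical illustration of this formula (the specific values of $L$, $H$, $dL$, and $dH$ below are made up for the example), note that when the uncertainties are small the cross term $dL \cdot dH$ is usually negligible compared with the two linear terms.

```python
# Propagation of error for the area of a rectangle A = L * H.
# The numbers below are arbitrary, chosen only to illustrate the formula.

L, H = 10.0, 4.0       # nominal length and height
dL, dH = 0.1, 0.05     # uncertainties in length and height

A = L * H
dA = dL * H + L * dH + dL * dH   # exact expansion of (L+dL)(H+dH) - L*H

# The common first-order approximation drops the tiny dL*dH cross term
dA_first_order = dL * H + L * dH

print(f"A  = {A}")
print(f"dA (exact expansion) = {dA}")
print(f"dA (first order)     = {dA_first_order}")
```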

Now suppose that Plato were in the classroom when this lesson was taught.  What would his reaction be?  I bring this up because although the treatment above is meant to handle error it is still an idealization.  There is still a notion of an ideal rectangle sitting underneath.

The curious question that follows in its train is this:  is there an ideal form for this error idealization?  In other words, is there a perfect or ideal error in the space of forms of which our particular error discussion is a shadow or reflection?

It may sound like this question is predicated on a contradiction, but my contention is that it only seems so on the surface.  In understanding the propagation of error in the calculation of the rectangle, I've had to assume a particular functional relationship.

It is a profound assumption that the object drawn above (not what it represents but that object itself), which is called a rectangle but which is embodied in the real world as made up of atomic parts (be they physical atoms or pixels), can be characterized by two numbers ($L$ and $H$) even if I don’t know what values $L$ and $H$ take on.  In some sense, this idealization should sit in the space of forms.

But if that is true, what stops us there?  Suppose we had a more complex functional relationship, something, say, that tries to model the boundaries of the object as a set of curves that don't deviate much from linearity but deviate enough to capture a shaky hand when the object was drawn, or a manufacturing process with deviations when machined.  Is this model not also an idealization and therefore a reflection of something within the space of forms?

And why stop there?  It seems to me that the boundary line between what is and is not in the space of forms is arbitrary (and perhaps self-referential – is the boundary between what is and is not in the space of forms itself in the space of forms?).  Just as the levels of abstraction in a computer model depend on the context, could not the space of forms depend on the questions that are being asked?

Perhaps the space of forms is as infinite or as finite as we need it to be.  Perhaps it's forms all the way down.

Why do We Teach the Earth is Round?

You’re no doubt asking yourself “Why the provocative title?  It’s obvious why we should teach that the Earth is round!” In some sense, this was my initial reaction when this exact question was posed in a round table discussion that I participated in recently.  The person who posed the question was undaunted by the initial pushback and persisted.  Her point was simply a genuinely honest question driven by a certain pragmatism.

Her basic premise is this.  For the vast majority of people on the Earth, a flat Earth model best fits their daily experiences.  None of us plan our day-to-day trips using the geometry of Gauss.  Many of us fly, but far fewer of us fly distances long enough for the pilot or navigator to consciously lay in a great circle path.  And even if all of us were to fly, say from New York to Rome, and even if the path the plane follows is a 'geodesic on the sphere', very few of us are aware or care.  After all, that is someone else's job.  And certainly gone are the days when we sit at the seashore and watch the masts of ships disappear last over the horizon – cell phones and the internet are far more interesting.

I listened to the argument carefully, mulled it over for a few days, and realized that there was a lot of truth in it.  The point here wasn't that we shouldn't teach that the Earth is round, but rather that we should know, with a firm and articulable conviction, why we teach it, and that the criteria for inclusion should be open to debate when schools draw up their curricula.

So what criteria should be used to construct a firm and articulable conviction? It seems that at the core of this question was a dividing line between types of knowledge and why we would care to know one over the other.

The first distinction in our round-Earth epistemological exploration is one between what I will call tangible and intangible knowledge.  Tangible knowledge consists of all those facts that have an immediate impact on a person’s everyday existence.  For example, knowing that a particular road bogs down in the afternoon is a slice of tangible knowledge because acting on it can prevent me from arriving home late for dinner (or perhaps having no dinner at all).  Knowing that the rainbow is formed by light entering a water droplet in the atmosphere in a particular way so that it is subjected to a single total internal reflection before exiting the drop with the visible light substantially dispersed is an intangible fact, since I am neither a farmer nor a meteorologist.  Many are the people who have said “don’t tell me how a rainbow is formed – it ruins all the beauty and poetry!”

An immediate corollary of this distinction is that what counts as tangible or intangible knowledge is governed by what impacts a person's life.  It differs both from person to person and over time.  A person who doesn't drive the particular stretch of road that I do would find the knowledge that my route home bogs down at certain times intangible, while the meteorologist would find the physical mechanism for the rainbow a tangible bit of knowledge, even if it kills the poet in him.

The second distinction is between what I will call private and common knowledge.  The particular PIN I use to access my phone is knowledge that is, and should remain, private to me.  In the hands of others it is either useless (for the vast majority who are either honest, or don't know me, or both) or it is dangerous (for those who do know me and are up to no good).  Common knowledge describes those facts that can be shared with no harm between all people.  Knowing how electromagnetic waves propagate is an example of common knowledge, but knowing a particular frequency to intercept enemy communications is private.

With these distinctions in hand, it is now easy to see what was meant by the original, provocative question.  As it is taught in schools, knowledge that the Earth is round is, for most people, a common, intangible slice of human knowledge.  In this context, it is reasonable to ask why we even teach it to the students.

A far better course of action is to try to transform this discovery into a common but tangible slice of knowledge that affects each student on a core level.  The particular ways that this can be done are numerous, but let me suggest one that I regard as particularly important.

Fancy earth

Teaching that the Earth is round should be done within a broader context of how we know anything about the world around us, how certain we are, and where the corners of doubt and uncertainty lie.  A common misconception is that the knowledge that the Earth is round was lost during the Dark and early Middle Ages.  The ancient Greeks knew with a great deal of certainty that the Earth was round, and books from antiquity tell the story of how Eratosthenes determined the radius of the Earth to an astounding accuracy considering the technology of his day.  This discovery persisted into the Dark and Middle Ages and was finally put to some practical use only when the collective technology of the world progressed to the point that the voyages of Columbus and Magellan were possible.  Framing the lesson of the Earth's roundness in this way provides a historical context that elevates it from mere geometry into a societally shaping event.  Science, technology, sociology, geography, and human affairs are all intertwined and should be taught as such.

Along the way, numerous departure points are afforded to discuss other facets of what society knows and how it knows it.  Modern discoveries that the Earth is not particularly spherical (the equatorial bulge) now take on a life outside of geodesy, and the concepts of approximations, models, and contexts by which 'facts' are known and consumed become tools for honing critical thinking about a host of policy decisions each and every one of us has to make.

By articulating the philosophical underpinnings for choosing a particular curriculum, society can be sure that arbitrary decisions about what topics are taught can be held in check. Different segments can openly debate what material should be included and what can be safely omitted in an above board manner.  Emotional and aesthetic points can be addressed side-by-side with practical points without confusion.  And all the while we can be sure that development of critical thinking is center stage.

Failure to do this leaves two dangerous scenarios.  The first is that the student is filled with a lot of unconnected facts that improve neither his civic participation in practical matters nor his general appreciation for the beauty of the world.  The second, and more important, is that the student is left with the impression that science delivers to us unassailable facts.  This is a dangerous position since it leads to modern interpretations of science as a new type of religion whose dogma has replaced the older dogma of the spiritual simply by virtue of the fact that its magic (microwaves, TVs, cell phones, rockets, nuclear power, and so on) is more powerful and apparent.

Self-Reference and Paradoxes

The essence of the Gödel idea is to encode not just the facts but also the ‘facts about the facts’ of the formal system being examined within the framework of the system being examined.  This meta-mathematics technique allowed Gödel to prove simple facts like ‘2 + 2 = 4’ and hard facts like ‘not all true statements are axioms or are theorems – some are simply out of reach of the formal system to prove’ within the context of the system itself.  The hard facts come from the system talking about or referring to itself with its own language.

As astonishing as Gödel's theorem is, the concept of paradoxes within self-referential systems is actually a very common experience in natural language.  All of us have played at one time or another with odd sentences like 'This sentence is false!'.  Examined from a strictly mechanical and logical vantage, how should that sentence be parsed?  If the sentence is true, then it is lying to us.  If it is false, then it is sweetly and innocently telling us the truth.  This example of the liar's paradox has been known since antiquity, and variations of it have appeared throughout the ages in stories of all sorts.

Perhaps the most famous example comes from the original Star Trek television series in an episode entitled 'I, Mudd'.  In this installment of the ongoing adventures of the starship Enterprise, an impish Captain Kirk defeats a colony of androids that hold him and his crew hostage by exploiting their inability to be meta.

There are actually a host of paradoxes (or antinomies, in the technical speak) that some dwerping around on the internet can uncover in just a handful of clicks.  They all arise when a formal system talks about itself in its own language, and often their paradoxical nature arises when they talk about something of a negative nature.  The sentence 'This sentence is true.' is fine, while 'This sentence is false.' is not.

Not all of the examples show up as either interesting but useless tricks of the spoken language or as formal encodings in mathematical logic.  One of the most interesting cases deals with libraries of either the brick and mortar variety or existing solely on hard drives and in RAM and FTP packets.

Consider for a moment that you've been given charge of a library.  Properly speaking, a library has two basic components: the books to read and a system to catalog and locate the books so that they can be read.  Now thinking about the books is no problem.  They are the atoms of the system and so can be examined separately or in groups or classes.  It is reasonable and natural to talk about a single book like 'Moby Dick' and to catalog this book along with all the other separate works that the library contains.  It is also reasonable and natural to talk about all books written by Herman Melville and to catalog them within a new list with a title perhaps like 'List of works by H. Melville'.  A similar list can be made whose grouping criterion selects books about the books by Melville.  This list would have a title like 'List of critiques and reviews of the works by H. Melville'.

An obvious extension would be to construct something like the following list.

List of Author Critiques and Reviews:

  • List of critiques and reviews of H. Melville
  • List of critiques and reviews of J. R. R. Tolkien
  • List of critiques and reviews of U. Eco
  • List of critiques and reviews of R. Stout
  • List of critiques and reviews of G. K. Chesterton
  • List of critiques and reviews of A. Christie
  • ….

Since the lists are themselves written works, what status do they have in the cataloging system?  Should there also be lists of lists?  If so, how deep should their construction go?  At some point won't we arrive at lists that have to refer to themselves, and what do we do when we reach that point?  Should the library catalog have a reference to itself as a written work?

Bertrand Russell wrestled with these questions in the context of set theory around the turn of the 20th century.  To continue with the library example, Russell would label the 'List of Author Critiques and Reviews' as a normal set since it is a collection of things that doesn't include itself.  He would also label as an abnormal set any list that has itself as a member – in this case, a catalog (i.e. list) of all lists pertaining to the library.  General feeling suggests that the normal sets are well behaved but the abnormal sets are likely to cause problems.  So let's just focus on the normal sets.  Russell asks the following question about the normal sets: Is the set, R, of all normal sets, itself normal or abnormal?  If R is normal, then it must appear as a member in its own listing, thus making R abnormal.  Alternatively, if R is abnormal, it can't be listed as a member within itself (since R lists only normal sets) and, therefore, it must be normal.  No matter which way you start, you are led to a contradiction.

The natural tendency is, at this point, to cry foul and to suggest that the whole thing is being drawn out to an absurd length.  Short and simple answers to each of the questions posed in the earlier paragraph come to mind with the application of a little common sense.  Lists should only be cataloged if they are independent works that are distinct parts of the library.  The overall library catalog need not list itself because its primary function is to help the patron find all the other books, publications, and related works in the library.  If the patron can find the catalog, then there is no need to have it listed within itself.  On the other hand, if the patron cannot find the catalog, having it listed within itself serves no purpose – the patron will need something else to point him towards the catalog.

And as far as Russell and his perfidious paradox are concerned, who cares?  This might be a matter to worry about if one is a stuffy logician who can't get a date on a Saturday night, but normal people (does this mean Russell and his kind are abnormal?) have better things to do with their lives than worry about such ridiculous ideas.

Despite these responses, or maybe because of them, we should care.  Application of common sense is actually quite sophisticated even if we are quite unaware of the subtleties involved.  In all of these common-sensical responses there is an implicit assumption about something above or outside.  If the patron can’t find the library catalog, well then that is what a librarian is for – to point the way to the catalog.  The librarian doesn’t need to be referred to or listed in the catalog.  He sits outside the system and can act as an entry point into the system.  If there is a paradox in set theory, not to worry, there are more important things than complete consistency in formal systems.

This concept of sitting outside the system is at the heart of the current differences between human intelligence and machine intelligence.  The latter, codified by the formal rules of logic, can't resolve these kinds of paradoxes precisely because machines can't step outside themselves like people can.  And maybe they never will.

Gödel’s Theorem – General Notions

As I mentioned in last week's blog, I was encouraged by Casti's chapter on the Halting Theorem and questions of undecidability in logic and computing.  In fact, I was inspired enough that I resolved to have another go at studying Gödel's theorem.

To give some background, many years ago I came across 'Gödel, Escher, Bach: An Eternal Golden Braid' by Douglas Hofstadter.  While I appreciate that his work is considered a classic, I found it difficult and ponderous.  Its over 700 pages of popularized exposition did little to nothing to really drive home Gödel's theorem and the connections to Turing and Church.  What's more, Hofstadter seems to say (it's difficult to tell exactly as he mixes and muddles many concepts) that Gödel's work supports his idea that consciousness can emerge from purely mechanistic means at the lowest level.  This point always seemed dodgy to me, especially since Hofstadter left me with the impression that Gödel's theorem showed a fundamental lack in basic logic, not an emergent property.

For this go-around, I decided that smaller was better and picked up the slim work 'Gödel's Proof' by Nagel and Newman.  What a difference a serious exposition makes.  Nagel and Newman present all the essential flavor and some of the machinery that Gödel used in a mere 102 pages.

Note that formal logic is not one of my strong suits and the intellectual terrain was rocky and difficult.  Nonetheless, Nagel and Newman’s basic program was laid out well and consisted of the following points.

To start, the whole thing was initiated by David Hilbert, whose Second Problem challenged the mathematical community to provide an absolute proof of the consistency of a formal system (specifically arithmetic) based solely on its structure.  This idea of absolute proof stands in contrast to a relative proof, where the system in question is put into relation with another system whose validity and consistency are accepted.  If the relation between the two is faithful, then the consistency of the second system carries over, or is imparted, to the first.

Hilbert was unsatisfied by the relative approach as it depended on some system being accepted at face value as being consistent and finding such a system was a tricky proposition.

The preferred way to implement an absolute proof is to start by stripping away all meaning from the system and to deal only with abstract symbols and a mechanistic way of manipulating these symbols using well-defined rules.  The example that Nagel and Newman present is the sentential calculus slightly adapted from the ‘Principia Mathematica’ by Whitehead and Russell.  The codification of the formal logic system depends on symbols that fall into two classes: variables and constant signs.  Variables, denoted by letters, stand for statements.  For example, ‘p’ could stand for ‘all punters kick the football far’.  There are six constant signs with the mapping

~ Not
$\vee$ Or
$\supset$ If… Then…
$\cdot$ And
( Left-hand punctuation
) Right-hand punctuation

The idea is then to map all content-laden statements, like ‘If either John or Tim are late then we will miss the bus!’, to formal statements like ( (J $\vee$ T ) $\supset$ B) with all meaning and additional fluff removed.  Two rules for manipulating the symbols, the Rule of Substitution and the Rule of Detachment, are adopted and four axioms are used as starting points.

In using this system, one has to sharpen one's thinking to be able to distinguish statements in the system (mathematical statements like '2 > 1') from statements about the system (meta-mathematical statements like "'2 > 1' is true" or "the '>' symbol is an infix operator").  One must also be careful to note the subtle differences between the symbol '0', the number 0, and the concept of zero meaning nothing.

The advantage of this approach is that the proofs are cleaner, especially when there are many symbols.  The disadvantage is that it takes time and effort to be able to work in this language.

The consistency of the formal system is shown when there is at least one formal statement (or formula) that cannot be derived from the axioms.  The reason for this is complicated and I don't have a good grasp on it, but it goes something like this.  The formula 'p $\supset$ (~p $\supset$ q)' can be derived in the sentential calculus.  If the statements S and ~S are both deducible, then any statement you like can be derived from the axioms (via the Rules of Substitution and Detachment) and the system is clearly inconsistent.  Turning this around, if even one formula cannot be derived, the system cannot be inconsistent.  In the words of Nagel and Newman:

‘The task, therefore, is to show that there is at least one formula that cannot be derived from the axioms.’ [p51]

For the sentential calculus, they point out that the formula 'p $\vee$ q' fits the bill since, although it is a well-formed formula, it doesn't follow from the axioms.  Thus this system is consistent.  Note that there is no truth statement attached to this formula.  The observation simply means that 'p $\vee$ q' can't be obtained from the axioms by mechanical manipulation.  They present this argument in their chapter titled 'An Example of a Successful Absolute Proof of Consistency'.
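As I understand it, the argument turns on the property of being a tautology: the axioms have it, the two rules preserve it, and 'p $\vee$ q' lacks it, so it can never be derived.  For readers who want to poke at that distinction mechanically, here is a small Python sketch (the helper names are mine) that brute-forces truth tables to show that 'p $\supset$ (~p $\supset$ q)' is a tautology while 'p $\vee$ q' is not.

```python
# Brute-force truth-table check of two formulas from the sentential calculus.
from itertools import product

def implies(a, b):
    """Material implication: a 'if...then' b."""
    return (not a) or b

def is_tautology(formula, n_vars):
    """Return True if formula(*values) is True for every assignment of truth values."""
    return all(formula(*values) for values in product([True, False], repeat=n_vars))

# 'p implies (not p implies q)' -- derivable in the calculus, and indeed a tautology
f1 = lambda p, q: implies(p, implies(not p, q))

# 'p or q' -- a well-formed formula, but not a tautology (false when p = q = False)
f2 = lambda p, q: p or q

print(is_tautology(f1, 2))   # True
print(is_tautology(f2, 2))   # False
```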

Well, despite that lofty achievement, they go on to show how Gödel took a similar approach and effectively ended any hope for a proof of absolute consistency in formal systems of a great deal more complexity.  Gödel used as his symbols the numerals and as his rules the basic operations of arithmetic and, in particular, the arithmetic of prime numbers.  Using an ingenious mapping of the variables and constant symbols to numbers, he not only could encode the structure of the formal system itself, he could also encode statements about the formal system (meta-mathematics).  In the language of Hofstadter, these are self-referential statements, although Nagel and Newman don't use this term.
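To give a flavor of the kind of encoding involved (this is a generic toy illustration of prime-exponent coding, not Gödel's or Nagel and Newman's exact numbering; the symbol codes and function names are arbitrary choices of mine), a formula can be mapped to a single integer by packing symbol codes into the exponents of successive primes.

```python
# A toy Goedel-style numbering: each symbol gets a code, and a formula is encoded
# as the product of successive primes raised to those codes.

SYMBOL_CODES = {"~": 1, "v": 2, ">": 3, ".": 4, "(": 5, ")": 6, "p": 11, "q": 13}

def primes(n):
    """Return the first n primes (simple trial division, fine for short formulas)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def godel_number(formula):
    """Encode a formula (a sequence of symbols) as a single integer."""
    codes = [SYMBOL_CODES[s] for s in formula]
    result = 1
    for p, c in zip(primes(len(codes)), codes):
        result *= p ** c
    return result

# Example: the formula ( p v q )
print(godel_number(["(", "p", "v", "q", ")"]))
```

Because prime factorization is unique, the original formula can always be recovered from its number, which is what lets statements about formulas become statements about numbers.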

Using this approach, Gödel was able to prove that there is no absolute proof of consistency.  At best, the system can say about itself that it is either incomplete or inconsistent.  If the system is incomplete, there are true statements that are not part of the axioms and that cannot be derived from them.  Enlarging the set of axioms to include them doesn't work since their presence begets new unprovable truths.  If the system is inconsistent, then everything is 'true', as discussed above.

Nagel and Newman leave the reader with some final thoughts that are worth contemplating.  On the hope that Hilbert's program can be successfully completed, they have this to say:

‘These conclusions show that the prospect of finding for every deductive system an absolute proof of consistency that satisfies the finitistic requirements of Hilbert's proposal, though not logically impossible, is most unlikely.'

They also comment on the possibility of proving consistency from an outside-looking-in approach using meta-mathematical techniques when they say

‘[W]hether an all-inclusive definition of mathematical or logical truth can be devised, and whether, as Gödel himself appears to believe, only a thoroughgoing philosophical ‘realism’ of the ancient Platonic type can supply an adequate definition, are problems still under debate…’

Finally, they have this to say about artificial intelligence (although that term wasn't in vogue at the time they published):

‘[T]he brain appears to embody a structure of rules of operation which is far more powerful than the structure of currently conceived artificial machines.  There is no immediate prospect of replacing the human mind by robots.’

And there you have it: a whirlwind tour of Gödel's theorem, with surprise appearances by the philosophy of antiquity and the ideas of artificial intelligence.

Turing, Gödel, and the Universe

Something about the book 'Five Golden Rules: Great Theories of 20th-Century Mathematics and Why They Matter' by John L. Casti caught my eye the other day at the library.  On a whim, I signed the book out and started reading.  Living up to the promise of the title, the book had five chapters, each one devoted to one of the great math theories of the 20th century.  All in all, I had been exposed, at least superficially, to all the topics covered, so I wasn't sure what I would get out of the book other than some additional insight into material I already knew or confirmation of my wisdom in staying away from topics that I didn't.

Anyway, I am quite glad that the mechanism of providence pointed me at this book because the connections that Casti draws are worth thinking about.  None of these connections was as profound for me as the deep link that he explores in Chapter 4 entitled ‘The Halting Theorem (Theory of Computation)’.

In this chapter, Casti first presents the concept of the universal Turing machine (UTM) as a mechanism for attacking the Decision Problem (or Entscheidungsproblem) proposed by David Hilbert in 1928.  Casti then couples this presentation with a discussion of the famous Gödel Incompleteness Theorem.

I’m simply a beginner in these fields and Casti omits important details and glosses over a variety of things to help the novice.  From this point-of-view, I can’t recommend his work.  But I am grateful and excited about the perspective he provided by making this linkage.

To understand my excitement, let me first try to fill in some of the background as best as I can.

Hilbert's Decision Problem asks the following question.  Given a set of input data and an algorithm for manipulating said data, is there a way to know if the algorithm will be able to make a yes/no decision about the input?  For example, if the input data is a set of axioms in a logical system and some corresponding assertion in the same system, and the algorithm is a logical formalism, such as the classical syllogism, will the algorithm be able to prove or disprove the assertion as either true (yes) or false (no)?

While the Decision Problem might seem straightforward enough to talk about at a seminar, when it comes to actually tackling the question there are some vague conceptions that need further elaboration.  Specifically, what the term 'algorithm' really means and what constitutes a workable algorithm seem to have been the murkiest parts when the problem was first posed.

Turing chose to clarify the algorithm concept by inventing the machine which bears his name.  The UTM is a basic model of computation that is often used to understand the theory behind computer programming.  It is also the avenue that gave Turing a way to tackle Hilbert's Decision Problem.  The basic ingredients for a UTM are a tape or strip of paper, divided into squares; a set of symbols (usually '0' and '1') that can be written into the squares; and a box that has a mechanism for moving the strip left or right, a head that reads from and writes to the strip, and an internal state that allows the UTM to select an action from a pre-defined set, given the internal state and the current symbol being read.  A typical graphical representation of a UTM looks like

Turing_machine

The UTM can represent the input to the decision process by a particular pattern of symbols on the strip.  It can also implement the steps associated with an algorithm by encoding them as symbols on the tape.  So the entire nature of the Decision Problem comes down to the question of whether, once the UTM starts, it will ever halt; hence the name of the chapter.
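As a concrete illustration of the ingredients just listed, here is a minimal simulator sketch; the particular toy machine (a unary incrementer), its transition table, and the function name are my own inventions for illustration, not anything from Casti's book.

```python
# A minimal Turing machine simulator: a tape of symbols, a head position,
# an internal state, and a transition table mapping (state, symbol) to
# (new symbol, head move, new state).

def run_turing_machine(tape, transitions, state="start", blank="0", max_steps=1000):
    tape = list(tape)
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            return "".join(tape)
        symbol = tape[head]
        new_symbol, move, state = transitions[(state, symbol)]
        tape[head] = new_symbol
        head += 1 if move == "R" else -1
        if head < 0:                     # grow the tape on demand
            tape.insert(0, blank)
            head = 0
        elif head >= len(tape):
            tape.append(blank)
    raise RuntimeError("machine did not halt within max_steps")

# Toy machine: scan right over a block of 1s and append one more 1 (unary increment).
transitions = {
    ("start", "1"): ("1", "R", "start"),   # keep moving right over the 1s
    ("start", "0"): ("1", "R", "halt"),    # first blank: write a 1 and halt
}

print(run_turing_machine("1110", transitions))   # -> 11110
```

The Halting Theorem says there is no general procedure that, given an arbitrary machine and tape, decides in advance whether a loop like the one above will ever reach the halt state.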

Kurt Gödel took a different approach to answering the Decision Problem.  He developed a way to map any formal system to a set of numbers, called Gödel numbers.  Then by formally manipulating these numbers, he was in position to try to answer Hilbert’s challenge.

Now it is reasonable to suppose that not every question has a yes or no answer.  Questions about feeling, opinion, or taste spring to mind.  But I think that most of us expect that questions of mathematics and logic and programming have answers and that we should be able to figure them out.  The remarkable thing about the work of both Turing and Gödel is that there are cases where the formal logical system simply can't decide.

In the language of Turing, there are computational problems for which there is no knowing beforehand whether the algorithm will terminate with an answer.  In the language of Gödel, no matter what is done to a logical system in terms of refining existing axioms or adopting new ones, there will always be truths that are unprovable.

I was quite aware of Gödel's theorem and even wrestled with it for a while when I was concerned about its implications for physical systems.  Eventually, I decided that while man's logic may be limited, nature didn't need to worry about it because she could make decisions unencumbered by our shortcomings.

I was also quite aware that Turing's machine was used fruitfully as a model of computation.  What I was unaware of until reading Casti's book was the parallel between the conclusions of Gödel and of Turing.

And here we arrive at the source of my excitement.  As I’ve already said, I remain convinced that Nature can decide – that is to say that Nature is free of the issues discussed above.  And yet, in some capacity the Universe is an enormous Turing machine.

Universe_as_Turing

So why does Nature always make a decision?  Why do the computation of trajectories, the motion of waves, and the evolution of quantum bits of stuff always reach a decision?  It may not be the decision we expect or anticipate, but it seems clear that a decision is reached.  And so, by looking at how the Universe as a Turing Machine (UaaTM) differs from a Universal Turing Machine, one may learn more about both.  This is the exciting new idea that has occurred to me and one which I will be exploring in the coming months.

States, Abstraction, and Implicit Objects

This post grew out of a conversation in which I recently participated, about the correct way to computationally model a system.  The conversation covered the usual strategies of identifying the system’s states and how to abstract the appropriate objects to hold or contain the states based on the question at hand.  But what was most interesting were the arguments put forward about the importance of recognizing not only the explicit objects used in the modeling but also the implicit objects that were ignored.

By way of a general definition, an implicit object is an object whose existence is needed in the computer modeling of the system in question but one that is not represented in code as either a real or virtual class.  I don’t think this definition is used in computer science or programming books but it should be.  This is a point that is almost entirely ignored in the field but is vital for a correct understanding of many systems.

A simple example will go a long way to clarify the definition and to also raise awareness that implicit objects are used more often than people realize.  The system to be modeled is the motion of traffic on a highway.  For simplicity, we’ll focus on two cars, each traveling along a straight stretch of road.  Generalizations to more complicated situations should be obvious but don’t add anything to the discussion.  A picture of the situation is

cars_on_road

Here we have a blue car and a green car moving to the right as denoted by the matching colored arrows.  The typical abstraction at this point is to create a car class with appropriate attributes and member functions for the questions we want to answer.  Let’s suppose that we are interested in modeling (with an eye towards prevention) the collision between the two cars.  What ingredients are needed to perform this modeling?

A collision between two objects occurs, by definition, when they try to occupy the same space at the same time.  Unpacking this sentence leads us to the notion of position for each car as a function of time and a measure of the physical extent of each car.  Collision occurs when the relative position between the centers of the two cars is such that their physical extents overlap.

collision

Now, real life can be rather complicated, and this situation is no different.  The physical shape of a car is generally hard to model with many faces, facets, and extrusions.  And the position of the car is a function of what the driver perceives, how the driver reacts, and how much acceleration or deceleration the driver imposes.

In time-honored fashion, we will idealize the cars as having a simple two-dimensional rectangular shape as shown in the figure, and will also assume that the cars have a simple control law (which we don’t need to specify in detail here except to say that it depends on the position, velocity, and time) that gives the velocity of the car at any instant of time.  With these simplifying assumptions, the trajectory of the car is completely specified by giving its position at some initial time and then using the control law to update the velocity at each time step.  The position is then updated using the simple formula ${\vec x}(t+dt) = {\vec x}(t) + {\vec v} dt$.

With all these data specified, we might arrive at a class definition for the car object that looks like:

car_class
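Since the class diagram above is just an image, here is a rough Python sketch of what such a car class might look like; the attribute names and the placeholder control law are illustrative assumptions, not a prescribed design.

```python
# A rough sketch of the car abstraction: position, velocity, physical extent,
# a clock, and a simple Euler update driven by a user-supplied control law.

from dataclasses import dataclass, field
from typing import Callable, Tuple

Vector = Tuple[float, float]

@dataclass
class Car:
    position: Vector            # (x, y) measured in some shared coordinate frame
    velocity: Vector            # (vx, vy)
    length: float               # extent along the direction of travel
    width: float                # extent across the lane
    time: float = 0.0           # this car's copy of the simulation clock
    control_law: Callable[["Car"], Vector] = field(default=lambda car: car.velocity)

    def step(self, dt: float) -> None:
        """Update velocity from the control law, then advance position: x(t+dt) = x(t) + v*dt."""
        self.velocity = self.control_law(self)
        self.position = (self.position[0] + self.velocity[0] * dt,
                         self.position[1] + self.velocity[1] * dt)
        self.time += dt
```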

At this point, we’ve already encountered our first two implicit objects.

The first one is a coordinate frame, comprised of an origin and an orientation of the coordinate axes, which tells us what the position of the object is measured against.  This coordinate frame is an object in its own right and, in many circles, worthy of study.  Here it is simply important to acknowledge that it exists and that the width and length parameters need to be defined consistently with regards to it.  This is an important point, since a change in coordinate frame orientation will change the test (to be described later) for the geometric condition of overlap (i.e., collision).

The second implicit object is much more subtle and it took the discovery of the principles behind special relativity to drive that point home.  Both cars share a common clock to describe their motion.  That is to say that their velocities are only meaningfully compared if their clocks are synchronized and ticking at the same speed.  The role of the implicit clock is usually carried out by the main program of the simulation but, since each object holds its own time, care must be exercised to keep them synchronized.  It is possible to make this common-clock assumption stronger by eliminating time from the list of member data, but that approach also has its drawbacks, as will be touched on below.

The final question is, how do we determine if the cars collide somewhere during the simulation?  There are two possible approaches.  The first is that we give each car a new member function, called detect_collision, which takes as arguments the position of the other car and its size.  This is not an unreasonable approach for simple situations, but it quickly becomes unwieldy should we later want to add fidelity to the simulation.  For example, if we wanted to allow for the body of the car changing its orientation as it changes lanes, then the number of parameters that we have to hand into the detect_collision function has to increase.  A more robust way to do this is to perform this computation at the main workspace in the simulation.  If this method is chosen, we've now introduced a third implicit object.  It's not quite clear from the context what to call this object, but its function is absolutely clear – it communicates to the world the message that the two cars have collided.
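For the idealized rectangular cars (ignoring orientation changes), the overlap test performed at the main workspace could look something like the following sketch; it assumes the hypothetical Car attributes from the earlier sketch, with length measured along the direction of travel and width across it.

```python
# Axis-aligned rectangle overlap test performed by the main simulation loop,
# outside of either Car object. Assumes the simplified, unrotated car shapes.

def cars_collide(car_a, car_b) -> bool:
    """Return True if the two cars' rectangular footprints overlap."""
    dx = abs(car_a.position[0] - car_b.position[0])
    dy = abs(car_a.position[1] - car_b.position[1])
    # Overlap occurs when the center separation is smaller than the summed half-extents
    return (dx < 0.5 * (car_a.length + car_b.length) and
            dy < 0.5 * (car_a.width + car_b.width))
```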

This last notion may seem like a trivial observation leading to an unnecessary complication, but consider the case where each car is equipped with some sensor that warns of an imminent collision.  Remote sensing requires some model of the medium between the sensor and the thing being sensed.  If the collision detector uses sound, then the air between the cars must be modeled as an elastic medium, with a finite propagation speed of the signal, and variations in this medium due to temperature, humidity, and the like may also need to be modeled.  If the collision detector uses electromagnetic waves (radar or laser ranging), then the index of refraction and dispersion properties of the air become the key parameters, which again may depend on the thermodynamics of the atmosphere.  In either case, this communicating medium now must be promoted from an implicit object to an explicit one, and the times of the two objects take on new meaning as we need to track the transmit and receive times of the signals sent between them.

The lesson, I hope, is clear.  It is just as important to pay attention to the implicit objects in a simulation as it is to the explicit ones.  There is a parallel here with the spoken word – where it is often as important to pay attention to what was left unsaid as it is to what was said.