Monthly Archive: November 2015

Do Web Searches Influence How We Think?

One of the central questions in linguistics asks how much thinking influences language and how much language, in turn, influences thinking.  Clearly there is a link between how we express a thought and what the thought entails.  The words we know and the familiar phrases we invoke certainly shape how others perceive our expressed thought; but what of ourselves?  How much of a feedback loop exists, wherein the manner in which we express a thought influences how we ourselves perceive that thought, is unknown.  Each of us plays the role of both speaker and listener when speaking, and often how we express ourselves influences how we perceive ourselves – at least as we imagine how we are perceived through others’ eyes.

Clearly there is a link, but measuring that link and determining which direction is strongest (thought to language vs. language to thought) is elusive.  Natural language has been around for so long, and with no formal metrics kept until what is essentially yesterday in terms of the march of history, that it seems nearly impossible to find out much by examining everyday speech.  Much like the fish that lives in the ocean, we are often unable even to see the medium of speech in which we swim, since it is so pervasive.

Mathematical languages have offered better laboratories.  Formal mathematical expression is relatively young – perhaps 500 years old – and is continually refined.  By examining the evolution of mathematics, both the thoughts codified and the method used for the codification, a consensus has been reached that the form of the expression is often as important as what is being expressed.  Why else are there what amount to holy wars over mathematical notation?  The camps of Newton and Leibniz warred with each other over the best way to denote the calculus.  Friction of this sort continued on into the 18th and 19th centuries with arguments over vectors and quaternions, sets and logic, and so on.  The general observation is that better notation means better thinking.

This sentiment is also very much alive and well in the equally contentious ‘discussions’ about which programming language is best.  Adherents on all sides will talk about the advantages and disadvantages that each has but, regardless of the particulars, one thing is clear: each programmer thinks his favorite language provides him the best avenue to express his thought, while simultaneously holding that other languages limit how a programmer even thinks about solving a problem.

In a nutshell, all three disciplines – linguistics, mathematics, and computer programming – possess practitioners who, at one time or another, have expressed the old proverb:

If all you have is a hammer, everything looks like a nail. 

Old Proverb

Unfortunately, all three of these disciplines pre-date modern big data metric collection and analysis.  But there is a modern language-and-thinking loop that is amenable to quantitative analysis.  Each and every day, millions of web-enabled searches are performed.  Some are looking for news, some researching information, some surfing for gossip, some rummaging for products or services, but all are getting auto-completion tossed their way.

It would be interesting and revealing to see to what extent auto-completion influences how we think.  I am sure each of us has started typing a search string into Amazon or Bing or Google only to find ourselves distracted by an auto-completion suggested by other searchers.  Surely these entities have the data and are probably sitting on it for commercial purposes, but whether they are looking at it from a scientific point-of-view is unknown – although the probability is against it.

Perhaps there is a sociology or psychology department out there that can craft a well-thought-out experiment, complete with control and test groups, in which students are asked to perform research on the internet in support of a course.  The students are then given two identical-looking search engines.  The control group would get an unmodified search engine and the test group one with a different approach to auto-complete, perhaps with more suggestive or more distracting completions relevant to the stated research goal.  Once the data were reduced and analyzed, new insights into how language and thought interact would be available.  These insights may then shape how we think about thinking, and so on.

 

Summing Up is (NP) Hard to Do

One can’t really be alive, work with computers, write algorithms to solve hard problems, and be ignorant of the importance of a good algorithm.  Most texts on scientific programming, which formed the bulk of what I read in my formative technical years, possess at least one cautionary tale about how slow a computation can be if the algorithm is bad.

One classic tale I like, which is often trotted out with gusto, is the calculation of the Fibonacci numbers.  As a reminder, the nth Fibonacci number is the sum of the two previous ones

\[  F_n = F_{n-1} + F_{n-2} \; . \]

So to compute $F_{100}$, one needs to compute $F_{99}$ and $F_{98}$, each of which needs the two below it.  Computing naively causes a lot of repetition, since one computes $F_{98}$ and $F_{97}$ to get $F_{99}$, then $F_{97}$ and $F_{96}$ to get $F_{98}$, and so on.  Of course, a fast algorithm exists, which essentially involves recording these values once computed, thus ensuring that each is computed only once in the recursion.
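
To make the contrast concrete, here is a minimal Python sketch (my own, not taken from any particular text) of the naive recursion and its memoized counterpart:

```python
from functools import lru_cache

def fib_naive(n):
    """Naive recursion: the same values are recomputed over and over (exponential time)."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each Fibonacci number is computed only once (linear number of calls)."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(100))   # 354224848179261915075, returned essentially instantly
# fib_naive(100) would make on the order of F_100 recursive calls -- far too slow to wait for
```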

While compelling, this cautionary tale is a bit of a cheat.  Someone actually knows the best algorithm, it happens to be quite fast, and, generally speaking, it isn’t all that interesting to compute the Fibonacci numbers oneself, since they are immutable.

Much more interesting are the situations that are dynamic.  Finding the best route to visit an arbitrary list of cities is the stuff of legends, and it is what the famous Traveling Salesman Problem (TSP) wrestles with.  This problem is often listed as hard, or rather NP-Complete, a class of problems that belongs to the larger class of NP-Hard problems.

Roughly speaking, NP problems (to which NP-Complete and NP-Hard problems belong) are computational problems that do not possess known algorithms that can find solutions in polynomial time.  What that means is that as the input data set, usually thought of as consisting of $N$ entries, is increased (i.e. $N$ grows), the computing time required to find a solution grows much faster than any polynomial in $N$.  Usually the growth is exponential, although worse cases, such as combinatorial growth, also occur.  This is in sharp contrast with the P class of computational problems, which do possess known algorithms that can find solutions in polynomial time.

The curious thing about NP problems is that while they are not quickly solvable, they are quickly checkable.  This phrasing means that while it takes a long time to find a solution for a large input data set, once the solution has been found it can be verified in a relatively short period of time that scales as a polynomial in $N$.

This distinction is hard to understand, and the difficulty is compounded by the fact that the NP class has three qualifiers, making for NP, NP-Complete, and NP-Hard, as well as containing the P class.  In addition, problems like the TSP are very specialized and complex, making them very hard to relate to in general.

[Figure: NP_hard – diagram of the P, NP, NP-Complete, and NP-Hard complexity classes]

Note: The reason NP-Hard lies partially outside the NP boundary is that some NP-Hard problems may not even be decidable.  Undecidable problems possess no definitive answer to the computing question with which they are posed.  That such problems exist follows inescapably from Turing’s analysis of the halting problem.

So I searched for a simplified problem that I could sink my teeth into – a problem that would be tractable enough to play with but that still demonstrates the ‘NP-ness’ in a way that I could relate to.  After some trial-and-error, it seems that the Subset Sum Problem (SSP) fits the bill.

The SSP asks the following question.  Given a sum $S$ and a set of integers $Q$, is there a non-empty subset of $Q$ that sums to $S$?  It is provable that the SSP is NP-Complete: a solution exists (i.e. the problem is decidable); it is hard to find a solution, especially when $Q$ has a lot of elements; and it is easy to check the solution once the answer is revealed.

That the SSP is decidable is actually trivial to see.  Given a subset $P \subseteq Q$ of integers, it is easy to sum them together and to test whether the result equals $S$.  By playing with a simple configuration for $Q$, it was easy to demonstrate that it is hard to just find a subset that sums to the desired value and that it is equally easy to check once an answer is proffered.
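
A small Python sketch illustrates the asymmetry between finding and checking (the set used here is just an arbitrary example, not the grid discussed below):

```python
from itertools import combinations

def check_subset_sum(subset, S):
    """Verifying a proposed answer is one pass over the subset -- polynomial (in fact linear) time."""
    return sum(subset) == S

def find_subset_sum(Q, S):
    """Brute-force search over all non-empty subsets of Q -- up to 2**len(Q) candidates."""
    for r in range(1, len(Q) + 1):
        for subset in combinations(Q, r):
            if sum(subset) == S:
                return subset
    return None

Q = [3, 34, 4, 12, 5, 2]            # arbitrary example set
print(find_subset_sum(Q, 9))        # (4, 5)
print(check_subset_sum((4, 5), 9))  # True
```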

Consider the following $3 \times 4$ block of integers between 1 and 9, inclusive.

[Figure: 3x4 grid – the $3 \times 4$ block of integers]

The sum of all of them is 53 and the smallest number in the set is 1, so it is also easy to answer no to the SSP when $S$ is outside this range.  But what about the rest of the values?  Can every value between 1 and 53 be represented?  (It should be clear that 1 is represented trivially by one of the 1s in $Q$ and that 53 is represented by the whole set.)

For a test, can we find a subset that sums to 19?  The answer is obvious if one stares at the grid for a while.  Since there is a 9 in the grid, all one needs to find is two numbers that sum to 10 in order to demonstrate that 19 is a valid subset sum.  One such possibility is

[Figure: subset_sum_19 – a subset of the grid summing to 19]

How should we go about answering the general question?  This is where the NP part comes in.  In general, the best algorithms known are not polynomial in complexity.  I won’t try to prove this; I am simply stating the conclusion of the computer science community.  But being NP, it is easy to check that a proposed solution is valid in polynomial time.  For example, it may not be obvious that 47 is contained as a subset sum within the grid, but the following shows it to be true:

[Figure: subset_sum_47 – a subset of the grid summing to 47]

Each of the blocks of different shades of red sums to 10 (i.e.: 6 + 4; 9 + 1; 8 + 2; 5 + 4 + 1) and the light orange block is a 7.  Checking that solution is easier than finding it in the first place or than finding an alternative (note that one trivial change can be made in 3 cells to get a different set that subset-sums to 47 – hint – look at the blocks that remained white).

Now back to the general question as to whether any sum can be constructed.  Consider the grid now colored so that the red subsets sum to 30 and the white portion of the grid consists of the numbers 1, 2, 3, 4, 5, and 8.

[Figure: all_sum – the grid colored so that the red subsets sum to 30]

Since the numbers 1, 2, 3, and 4 can sum to every number in the range 1-10, we already have that all numbers from 1 to 40 are subset sums.  For example, 27 can be formed from two of the pairs of tens (say 1 + 9 and 7 + 3) together with 3 + 4.  Finally, since 8 + 2 = 10 and 8 + 3 = 11, the white numbers 1, 2, 3, 4, 5, and 8 can in fact form every value from 1 up to their total of 23, so adding the red 30 covers the remaining sums from 41 through 52 as well.  Thus we have proved that all values between 1 and 53 are achievable.
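
The colored figures themselves aren’t reproduced here, but the claim is easy to confirm in a few lines of Python, using a hypothetical multiset of grid values chosen to be consistent with the sums quoted above (total of 53, white values 1, 2, 3, 4, 5, and 8, red values summing to 30):

```python
# Hypothetical grid values consistent with the sums quoted in the text
# (the actual figure is not reproduced here).
Q = [1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]

def subset_sums(values):
    """All achievable subset sums, built up one value at a time (dynamic programming)."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return sums

achievable = subset_sums(Q)
print(all(t in achievable for t in range(1, 54)))   # True: every value from 1 to 53 is a subset sum
```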

Note that the time it takes to confirm any of these values is much shorter than the time it would take an algorithm to find these solutions.  That is what it means to be NP-Complete.

Making Rational Decisions

I recently came across an interesting method for combining qualitative and quantitative data on a common footing to allow for a mathematically supported framework for making complicated decisions where many criteria are involved.  The method is called the Analytic Hierarchy Process.

The Analytic Hierarchy Process (AHP), which was invented by Thomas L. Saaty in the 1970s, uses a technique based on matrices and eigenvectors to structure complex decision making when large sets of alternatives and criteria are involved and/or when some of the criteria are described by attributes that cannot be assigned objective rankings.  It is especially useful in group-based decision making, since it allows the judgements of disparate stakeholders, often with quite different points-of-view, to be considered in a dispassionate way.

In a nutshell, the AHP consists of three parts: the objective, the criteria, and the alternatives.  Criteria can be sub-divided as finely as desired, with the obvious, concomitant cost of more complexity in the decision-making process.  Each alternative is then assigned a value in each criterion and each criterion is given a weighting.  The assessments are normalized and matrix methods are used to link the relative values and weightings to give a ranking.  Graphically, these parts are usually presented in a hierarchical chart that looks something like:

[Figure: AHP_structure – hierarchy linking the objective to the criteria and the alternatives]

A nice tutorial by Haas and Meixner, entitled An Illustrated Guide to the Analytic Hierarchy Process, exists, and this posting is patterned closely after their slides.  The decision-making process that they address is buying a car.  This is the objective (‘the what’) that we seek to accomplish.  We will use three criteria when selecting the car to buy: Style, Reliability, and Fuel Economy.

Two of these criteria, Style and Reliability, are qualitative or, at least, semi-qualitative, whereas the Fuel Economy is quantitative.  Our alternatives/selections for the cars will be AutoFine, BigMotors, CoolCar, and Dynamix.

The first step is to assign numerical labels to the qualitative criteria.  We will use a 1-9 scale for Style and Reliability.  Since we are weighing judgments, the absolute values of these scores are meaningless.  Instead the labels indicate the relative ranking.  For example, we can assume that the 1-9 scale can be interpreted as:

  • 1 – perfectly equal
  • 3 – moderately more important/moderately better
  • 5 – strongly more important/strongly better
  • 7 – very strongly more important/very strongly better
  • 9 – extremely more important/extremely better

with the even-labeled values slightly greater in shading than the odd labels that precede them.  This ranking scheme can be used to assign weightings to the criteria relative to each other (for example, Style is almost strongly more important than Reliability – 4/1) and to weigh the alternatives against each other in a particular criterion (for example, AutoFine is moderately better than CoolCar in Reliability).

To be concrete, let’s suppose our friend Michael is looking to buy a car.  We interview Michael and find that he feels that:

  • Style is half as important as Reliability
  • Style is 3 times more important than Fuel Economy
  • Reliability is 4 times more important than Fuel Economy

Based on these responses, we construct a weighting table

              Style        Reliability    Fuel Economy
Style          1/1             1/2            3/1
Reliability                    1/1            4/1
Fuel Economy                                  1/1

where the first number in the entry corresponds to the row and the second to the column.  So the ‘4/1’ entry encodes the statement that Reliability is 4 times more important than Fuel Economy.  The omitted entries below the diagonal are simply the reciprocals of the ones above (e.g. 1/2 goes to 2/1).

This table is converted to a weighting matrix $\mathbf{W}$ which numerically looks like

\[ {\mathbf W} = \left[ \begin{array}{ccc} 1.000 & 0.5000 & 3.000 \\ 2.000 & 1.000 & 4.000 \\ 0.3333 & 0.2500 & 1.000 \end{array} \right] \; . \]
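
As a side note, the reciprocal structure makes these matrices easy to build programmatically.  Here is a small numpy sketch (variable names are mine, not from the tutorial) that fills in $\mathbf{W}$ from the three stated judgments:

```python
import numpy as np

# Pairwise judgments above the diagonal, indexed (row, column) over
# (Style, Reliability, Fuel Economy) = (0, 1, 2)
judgments = {(0, 1): 1 / 2,   # Style vs Reliability
             (0, 2): 3.0,     # Style vs Fuel Economy
             (1, 2): 4.0}     # Reliability vs Fuel Economy

W = np.eye(3)
for (i, j), value in judgments.items():
    W[i, j] = value          # stated comparison
    W[j, i] = 1.0 / value    # reciprocal entry below the diagonal

print(np.round(W, 4))
# [[1.     0.5    3.    ]
#  [2.     1.     4.    ]
#  [0.3333 0.25   1.    ]]
```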

In a similar fashion, we interview Michael for his judgments of each automobile model within each criterion and find corresponding weighting matrices for Style and Reliability:

\[ {\mathbf S} = \left[ \begin{array}{cccc} 1.000 & 0.2500 & 4.000 & 0.1667 \\ 4.000 & 1.000 & 4.000 & 0.2500 \\ 0.2500 & 0.2500 & 1.000 & 0.2000 \\ 6.000 & 4.000 & 5.000 & 1.000 \end{array} \right] \; \]

and

\[ {\mathbf R} = \left[ \begin{array}{cccc} 1.000 & 2.000 & 5.000 & 1.000 \\ 0.5000 & 1.000 & 3.000 & 2.000 \\ 0.2000 & 0.3333 & 1.000 & 0.2500 \\ 1.000 & 0.5000 & 4.000 & 1.000 \end{array} \right] \; . \]

Finally, we rank the fuel economy for each alternative.  Here we don’t need to depend on Michael’s judgment and can simply look up the CAFE standards to find

\[ {\mathbf F} = \left[ \begin{array}{c}34\\27\\24\\28 \end{array} \right] \; \mathrm{mpg} \; . \]

Saaty’s method directs us to first find the eigenvector of each of the $4 \times 4$ criteria matrices and of the $3 \times 3$ weighting matrix corresponding to the largest eigenvalue of each.  Note that the Fuel Economy is already in vector form.  The L1 norm is used, so that each vector is normalized by the sum of its elements.  The resulting vectors are:

\[ {\mathbf vW} = \left[ \begin{array}{c}0.3196\\0.5584\\0.1220 \end{array} \right] \; , \]

\[ {\mathbf vS} = \left[ \begin{array}{c}0.1163\\0.2473\\0.0599\\0.5764 \end{array} \right] \; , \]

\[ {\mathbf vR} = \left[ \begin{array}{c}0.3786\\0.2901\\0.0742\\0.2571 \end{array} \right] \; , \]

and

\[ {\mathbf vF} = \left[ \begin{array}{c}0.3009\\0.2389\\0.2124\\0.2479 \end{array} \right] \; . \]

A $4 \times 3$ matrix is formed whose columns are ${\mathbf vS}$, ${\mathbf vR}$, and ${\mathbf vF}$; this matrix then multiplies ${\mathbf vW}$ to give a final ranking.  Doing this gives:

AutoFine 0.2853
BigMotors 0.2702
CoolCar 0.0865
Dynamix 0.3580
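
For readers who want to reproduce these numbers, here is a numpy sketch of the whole calculation (my own code, not Haas and Meixner’s); up to rounding it should return the scores listed above:

```python
import numpy as np

def principal_eigenvector(M):
    """Eigenvector for the largest eigenvalue, normalized so its entries sum to 1 (L1 norm)."""
    eigvals, eigvecs = np.linalg.eig(M)
    v = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return v / v.sum()

# Pairwise comparison matrices from the text
W = np.array([[1.0, 0.5, 3.0],
              [2.0, 1.0, 4.0],
              [1/3, 0.25, 1.0]])
S = np.array([[1.0, 0.25, 4.0, 1/6],
              [4.0, 1.0, 4.0, 0.25],
              [0.25, 0.25, 1.0, 0.2],
              [6.0, 4.0, 5.0, 1.0]])
R = np.array([[1.0, 2.0, 5.0, 1.0],
              [0.5, 1.0, 3.0, 2.0],
              [0.2, 1/3, 1.0, 0.25],
              [1.0, 0.5, 4.0, 1.0]])
F = np.array([34.0, 27.0, 24.0, 28.0])    # CAFE fuel economy figures, mpg

vW = principal_eigenvector(W)
vS = principal_eigenvector(S)
vR = principal_eigenvector(R)
vF = F / F.sum()                          # already a vector; just L1-normalize

ranking = np.column_stack([vS, vR, vF]) @ vW
for name, score in zip(["AutoFine", "BigMotors", "CoolCar", "Dynamix"], ranking):
    print(f"{name:10s} {score:.4f}")
```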

 

So from the point-of-view of our criteria, Dynamix is the way to go.  Of course, we haven’t figured in cost.  To this end, Haas and Meixner recommend scaling these results by the cost to get a value.  This is done straightforwardly as shown in the following table.

            Ranking     Cost    Normalized Cost   Value = Ranking/Normalized Cost
AutoFine     0.2853   $20,000            0.2899                             0.984
BigMotors    0.2702   $15,000            0.2174                             1.243
CoolCar      0.0865   $12,000            0.1739                             0.497
Dynamix      0.3580   $22,000            0.3188                             1.122
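
The same scaling is a one-liner in numpy; a quick sketch using the figures from the table confirms the ordering:

```python
import numpy as np

ranking = np.array([0.2853, 0.2702, 0.0865, 0.3580])
cost = np.array([20000, 15000, 12000, 22000])

value = ranking / (cost / cost.sum())     # ranking divided by normalized cost
for name, v in zip(["AutoFine", "BigMotors", "CoolCar", "Dynamix"], value):
    print(f"{name:10s} {v:.3f}")          # BigMotors comes out on top
```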

 

With this new data incorporated, we now conclude that BigMotors gives us the best value for the money.  Whether our friend Michael will follow either of these two recommendations is, of course, only answerable by him, but at least the AHP gives some rational way of weighing the facts.  I suspect that Aristotle would have been pleased.

Technique and Judgment – Distinguishing ‘How’ and ‘What’

I suppose this column grew out of a confluence of things that made me realize yet another limitation of computers in their gallant attempt to unseat the human race in its mastery of the planet.  This limitation, unfortunately, also impacts the human being first learning a new skill.  What is this limitation, you may ask – it’s the inability to distinguish technique from judgment.  Fortunately humans can grow out of it, computers not so much (that is to say not at all).

On the face of it, this limitation seems to be a mismatching of concepts bordering on a non sequitur.  After all, technique is about how one does a thing, whereas judgment centers on the ability to decide or conclude.  What do the two have to do with each other?

To illustrate, let’s consider the average student in one of the STEM programs at a university.  The student spends large amounts of time in mathematical courses learning the techniques associated with Calculus, Linear Algebra, Differential Equations, Vector Analysis and the like.  A good student earning good grades succeeds at tests with questions of the sort:

Given a vector field $\vec f(x,y,z) = x^2 \hat \imath + \sin(y) \hat \jmath + \frac{1}{3 z^3} \hat k$ compute the divergence $\nabla \cdot \vec f(x,y,z)$

A successful completion of this problem leads to the answer:

$\nabla \cdot \vec f(x,y,z) = 2x + \cos(y) - \frac{1}{z^4}$

demonstrating that the student knows how to compute a divergence.  To be sure, this skill, and the others listed above, are important skills to have and are nothing to sneeze at, but they don’t take one far enough.  Without the judgment of knowing what to do, the technique of how to do it becomes nearly useless.

To illustrate this, our student, having gotten straight A’s in all her subjects, now moves on to an engineering class where she is asked to solve a problem in electricity and magnetism that says something like

Given the following distribution of charges and arrangements of conducting surfaces, compute the electric field.

Suddenly there is no specification on what technique to use, no indication how to solve the problem. All that is being asked is a ‘what’ – what is the electric field.  Prudent judgment is needed to figure out how to solve the problem.  And here we find the biggest stumbling block for the human (and a nearly insurmountable obstacle for current computing).

Lawvere and Schanuel, in their book Conceptual Mathematics: a first introduction to categories, summarize this distinction when they note

There will be little discussion about how to do specialized calculations, but much about the analysis that goes into deciding what calculations need to be done, and in what order.  Anyone who has struggled with a genuine problem without having been taught an explicit method knows that this is the hardest part.

– Lawvere and Schanuel

Of course, any technique, sufficiently refined, can be ‘taught’ to a computer.  In the example from Vector Calculus discussed above, any number of computer algebra programs can be used to compute the divergence of the vector field $\vec f(x,y,z)$.  Almost nothing exists in the way of computing judgment to determine the best strategy to solve a ‘what’ type problem.  Even the most sophisticated expert systems fail to compete with competent human judgment unless the application is heavily structured.
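
As a concrete illustration, a few lines of Python with sympy (my example, not drawn from the texts mentioned above) reproduce the divergence computed earlier:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = (x**2, sp.sin(y), 1 / (3 * z**3))     # components of the vector field from the example above

# Divergence: sum of the partial derivative of each component with respect to its own variable
divergence = sum(sp.diff(component, var) for component, var in zip(f, (x, y, z)))
print(divergence)   # 2*x + cos(y) - 1/z**4 (term order may vary)
```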

The distinction between the ‘what’ and the ‘how’, between the judgment needed to determine what and when to do a thing and the technique needed to perform the thing, is often complicated and subtle – even for the human.  Much like the intricate interplay between language and thought, the interaction between judgment and technique has no clean lines.  In fact, viewed from a certain perspective, a technique can be thought of as the language of doing and judgment as the thought associated with what should be done.  How we do a thing often affects what we think can be done.  That’s what makes it fun being alive!