Latest Posts

Paradigms and Constraints

To start, this month’s column brings the experimental exploration of category theory from Lawvere and Schanuel’s book to an end.  Why?  Well, up to this point, the effort invested seems far larger than the benefit gained.  Category theory promises deep insight into more complex ideas – otherwise why would anyone study it – but the amount of effort required to get to that place in the text by Lawvere and Schanuel seems overwhelming.  There is no middle ground in their work.  Either it is painfully pedantic or deeply complex.

So, as a change of pace, this column will be returning to simpler, one-off topics.  For this month, I thought it would be interesting to share a thought about the overlap between philosophy and computer programming that came about from an interesting video.  In this video, Dave Farley of Continuous Delivery discusses the history of computing paradigms, with particular attention to the tribal standoff between the object-oriented crowd and the functional programming adherents over which paradigm is best.

It is both comforting and amusing that Dave says both positions are ‘rubbish’ (with a delightful accent no less).  I heartily agree with his basic point that one should pick the right paradigm for the right job.  Along those lines, the most insightful thing that he had to say is an observation he attributed to “Uncle” Bob Martin.  Martin, a computing pundit of some note, says the purpose of a programming paradigm is to constrain us, to take away a degree of freedom that we had otherwise been allowed.

After reflecting on this point on more than one occasion, I realized, even though Farley didn’t say this, that the idea of constraining the world as part of a paradigm is a metaphysical process – at once ontological and epistemological – whose scope is far larger than just computer programming.  A paradigmatic way of thinking constrains the way we perceive reality by constraining how we divvy up the world around us into pieces we understand.

One of the earliest texts covering the idea of chopping up the world into useful chunks is found in the philosophical work of Aristotle entitled the Categories.  In this work, Aristotle lists 10 separate categories that cover all the possible things that can be the subject of a proposition.  The work centers around language and how parts of speech reflect the ideas we hold, and this ten-fold division was meant to encompass everything a person can think (or perhaps just talk) about.  While the work is useful, because it deals with common observations and modes of thought we all share, it is dense and sometimes difficult.  Mortimer Adler, in his book Aristotle for Everybody, calls it uncommon common sense.  Fortunately, the key point for this discussion is that Aristotle offers a paradigm (perhaps hard to grasp) that constrains how we think about (or describe) the world.

A more prosaic example comes from our everyday experience of communicating with our fellow beings.  For example, many of us would consider the word ‘automobile’ to be straightforward to define and we expect that the dictionary entry would be non-controversial.  However, exactly how each of us thinks about ‘automobile’ depends on the paradigm being used. 

For a thermal engineer, whose livelihood centers on the study of work and heat, an automobile may conjure the concept of a gasoline or diesel engine, a drivetrain, and an exhaust.  He has constrained the world to one dealing with how energy is stored, how much of it can be stored in a given volume, and how it moves from place to place.  A systems engineer, whose livelihood derives from understanding the interplay of various technologies, would mostly see a car as a set of subsystems designed to interact appropriately.  Does the engine provide the necessary power to match the steering and suspension?  Do the fuel tank and mileage match the intended purpose?  And so on.  Our systems engineer has constrained his view of the automobile in such a way that it ignores the details of thermodynamics and deals only with the abstract ways in which such subsystems interface with each other.  To an urban planner, the automobile represents an atomic entity whose characteristics and behaviors are constrained to the driving level.  Doing so allows the urban planner to deal with matters of traffic congestion, proper placement of tolls, and the like without concerning himself with the way in which the automobile moves or how well it moves.

Now let’s return to the question of computer programming paradigms.  Certainly, a programmer must know, when modeling an automobile, what level his solution must deal with.  Is he programming a finite-element thermal model, an abstract entity-relationship model, or a stochastic Monte Carlo simulation to match the fictitious engineers of the previous discussion?  However, the interesting aspect of a seasoned programmer is that, when thinking about programming paradigms in the abstract, he is actually engaging in metaphysics.  He is not asking how to describe how a thing can be an automobile but how a thing can be.  The comparison of various paradigms naturally leads to the study of being itself.  The computer programmer has to deal not only with the problem of deciding into what category the thing he is interested in falls but also with the meta-problem of how to categorize the categorizations; how to pick between the various ways of abstracting and how to apply them to the ontology that is the world.

Farley raises another point that is worth pondering.  He believes that human beings are intrinsically classifying animals, even if he doesn’t phrase it quite that way.  For that reason, he believes that the object-oriented paradigm is more naturally matched to human thought.  However, his definition of object-oriented programming as the marriage of data and behaviors, with polymorphism as the constraining agent (each object owns its specific response to a common message like ‘fall’), does lead to some difficult ontological questions when modeling the physical world.  Does a rock know how to behave under the influence of gravity because it knows how to interpret the message sent from the Earth, or does the Earth know how to make the rock fall because of a message that the rock sends, or is the message loop (i.e., the gravitational interaction) the key point?  Even though an exploration of this question is beyond the scope of this post, the ability to pose it shows the power of thinking about paradigms as constraints.

It is for this reason that I think programming will become an increasingly important component of our day-to-day lives; not just because of the practical applications involving our smartphones and tablets, our Xboxes and PlayStations, and our computers working to solve this problem or that, but because it will trigger in us a way to look at the world anew.  If only Aristotle could have lived to participate in these debates.

Category Theory 6 – Coding Up Mappings

This month’s article is a bit of a departure from the previous installments on category theory in that it deals with the computational side of things rather than the theoretical.  But, as the recent columns have shown, working with specific examples typically facilitates a basic understanding of the general.

As a case in point, consider the set of all possible mappings between the sets $A$, $B$, and $C$, an example of which is shown below

where $A$ has three elements $\left\{ a_1, a_2, a_3\right\}$, $B$ has two elements $\left\{ b_1, b_2 \right\}$, and $C$ has three elements $\left\{ c_1, c_2, c_3\right\}$.  Although we’ve focused on automorphisms wherein we identified $A$ and $C$ as the same set, the general case will be what we tackle here. 

Even though these sets are small, the combinatorics associated with all possible mappings quickly defeats paper and pencil.  There are 8 distinct mappings $f:A \rightarrow B$ and 9 distinct mappings $g:B \rightarrow C$, giving 72 compositions in total.  Obviously this situation calls for some basic computing.

Despite the simplicity of the discussion to date in Lawvere and Schanuel, one thing they didn’t do is adopt terminology well-suited to writing computer code.  The word ‘set’, of course, is perfect for describing the collection, so that’s okay, but what to call the contents of each set?  The typical mathematical terminology would be members or elements, but the former has an object-oriented connotation that would tend to obscure things while the latter is simply too generic.  In addition, both of those are rather long to type and really don’t speak to what the diagram looks like.  For the purposes here, things like $a_3$ or $b_1$ will be called dots, since they are represented as such in the diagrams.  At this point their terminology fails entirely: the thing that connects two dots doesn’t even have a name in their bestiary.  Since it is drawn as an arrow in the diagrams, that is the name that will be used in the code.  The end of the arrow without the point will be called the outgoing side; the end with the point is the incoming side.

Using this terminology, we then say that:

  • each set is made up of dots,
  • each dot supports exactly one outgoing arrow,
  • each dot can receive any number of incoming arrows, including zero, and
  • a mapping is a collection of arrows.

The Python programming language is an excellent choice, and everything that follows is done in version 3.x within the Jupyter notebook framework.  The sets discussed above are represented as Python lists:

Aset = ['a1','a2','a3']
Bset = ['b1','b2']
Cset = ['c1','c2','c3']

The first function

def generate_arrows(set1,set2):
    # build every possible arrow from a dot in set1 to a dot in set2
    arrows = []
    for dot1 in set1:
        for dot2 in set2:
            arrows.append(dot1+'->'+dot2)
    return arrows

returns a list of all possible ways to draw arrows from set1 to set2.  Applying this function to sets $A$ and $B$ gives:

generate_arrows(Aset,Bset)
['a1->b1', 'a1->b2', 'a2->b1', 'a2->b2', 'a3->b1', 'a3->b2']

This helper function is used by the larger function

def generate_mappings(set1,set2):
    import itertools as it

    # candidate mappings are combinations of len(set1) arrows drawn from
    # all possible arrows; a genuine mapping must give every dot in set1
    # exactly one outgoing arrow
    arrows      = generate_arrows(set1,set2)
    mapping_gen = it.combinations(arrows,len(set1))
    mappings    = []
    for candidate in mapping_gen:
        count = []
        for i,dot1 in zip(range(len(set1)),set1):
            count.append(0)
            for arrow in candidate:
                # substring test; safe here since no dot name contains another
                if dot1 in arrow:
                    count[i] += 1
        # every dot covered at least once (hence, by counting, exactly once)
        if min(count) != 0:
            mappings.append(candidate)
    return mappings

The function generate_mappings relies on the itertools module in Python to enumerate all the combinations of arrows that could make a mapping.  When applied to $A$ and $B$, the output of

A_to_B = generate_mappings(Aset,Bset) 

[('a1->b1', 'a2->b1', 'a3->b1'),
 ('a1->b1', 'a2->b1', 'a3->b2'),
 ('a1->b1', 'a2->b2', 'a3->b1'),
 ('a1->b1', 'a2->b2', 'a3->b2'),
 ('a1->b2', 'a2->b1', 'a3->b1'),
 ('a1->b2', 'a2->b1', 'a3->b2'),
 ('a1->b2', 'a2->b2', 'a3->b1'),
 ('a1->b2', 'a2->b2', 'a3->b2')]

is a list of 8 elements, as expected.  Each line is one possible mapping between the two sets.  Likewise, the total listing of all mappings from $B$ to $C$

[('b1->c1', 'b2->c1'),
 ('b1->c1', 'b2->c2'),
 ('b1->c1', 'b2->c3'),
 ('b1->c2', 'b2->c1'),
 ('b1->c2', 'b2->c2'),
 ('b1->c2', 'b2->c3'),
 ('b1->c3', 'b2->c1'),
 ('b1->c3', 'b2->c2'),
 ('b1->c3', 'b2->c3')]

has 9 possibilities, also as expected.
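As a quick sanity check, both counts agree with the general formula (recalled in the first installment) that the number of mappings between two finite sets is the size of the codomain raised to the power of the size of the domain:

len(generate_mappings(Aset,Bset)) == len(Bset)**len(Aset)  # True, 8 == 2**3
len(generate_mappings(Bset,Cset)) == len(Cset)**len(Bset)  # True, 9 == 3**2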

In order to compose functions, we need a way of combining expressions like ‘a1->b1’ and ‘b1->c3’ to get ‘a1->c3’.  This is provided by

def compose_arrows(arrow1,arrow2):
    # 'a1->b1' followed by 'b1->c3' chains to give 'a1->c3'
    head1,targ1 = arrow1.split('->')
    head2,targ2 = arrow2.split('->')
    if targ1 == head2:
        return head1+'->'+targ2
    else:
        return False

Note that if the middle terms do not match, the function returns False.
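A quick illustration of both branches:

compose_arrows('a1->b1','b1->c3')  # returns 'a1->c3'
compose_arrows('a1->b1','b2->c3')  # middle terms differ, returns False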

The functionality is rounded out by the function

def compose_mappings(mappings1,mappings2):
    # compose every mapping in mappings1 with every mapping in mappings2,
    # keeping only the arrow pairs that actually chain together
    mappings = []
    for f in mappings1:
        for g in mappings2:
            compositions = []
            for f_arrow in f:
                for g_arrow in g:
                    compositions.append(compose_arrows(f_arrow,g_arrow))
            while(False in compositions):
                compositions.remove(False)
            mappings.append(tuple(compositions))
    return mappings

which composes two lists of mappings.  Plugging in the previous two lists of mappings (mappings = compose_mappings(A_to_B,B_to_C)) gives a list with 72 elements.  Of course, given the sizes of $A$ and $C$, there can be, at most, only 27 unique ones.  The repetitions are trivially removed by casting the list to a set and then casting it back to a list.  Doing this gives the following list of unique mappings:

[('a1->c1', 'a2->c2', 'a3->c1'),
 ('a1->c2', 'a2->c3', 'a3->c2'),
 ('a1->c3', 'a2->c2', 'a3->c3'),
 ('a1->c2', 'a2->c1', 'a3->c2'),
 ('a1->c1', 'a2->c3', 'a3->c1'),
 ('a1->c3', 'a2->c3', 'a3->c1'),
 ('a1->c2', 'a2->c2', 'a3->c3'),
 ('a1->c1', 'a2->c1', 'a3->c1'),
 ('a1->c3', 'a2->c1', 'a3->c1'),
 ('a1->c3', 'a2->c2', 'a3->c2'),
 ('a1->c3', 'a2->c1', 'a3->c3'),
 ('a1->c1', 'a2->c1', 'a3->c3'),
 ('a1->c3', 'a2->c3', 'a3->c3'),
 ('a1->c1', 'a2->c3', 'a3->c3'),
 ('a1->c2', 'a2->c1', 'a3->c1'),
 ('a1->c1', 'a2->c1', 'a3->c2'),
 ('a1->c2', 'a2->c2', 'a3->c1'),
 ('a1->c3', 'a2->c3', 'a3->c2'),
 ('a1->c2', 'a2->c3', 'a3->c3'),
 ('a1->c2', 'a2->c2', 'a3->c2'),
 ('a1->c1', 'a2->c2', 'a3->c2')]

There are 21 in all, 6 fewer than the 27 possible mappings between $A$ and $C$.  The missing 6 correspond to the one-to-one mappings between $A$ and $C$ (i.e., $a_1 \rightarrow c_1$, $a_2 \rightarrow c_2$, $a_3 \rightarrow c_3$ and its permutations).  Of course, these results drive home the point that, since $B$ is smaller than both $A$ and $C$, there is simply no way to support a one-to-one mapping.
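For reference, the entire pipeline, dedupe included, runs as the sketch below (assuming B_to_C was produced the same way as A_to_B):

B_to_C   = generate_mappings(Bset,Cset)
mappings = compose_mappings(A_to_B,B_to_C)
unique   = list(set(mappings))
len(mappings), len(unique)  # (72, 21)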

One can experiment with replacing $B$ with a 4-element set $D$.  Doing so gives the expected 27 unique mappings $f: A \rightarrow C$ (at the expense of calculating 5184 possible combinations of going from $A$ to $C$ with a layover at $D$).  Obviously nobody should want to do this by hand.
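A minimal sketch of that experiment, reusing the functions above (the set name Dset is, of course, just for illustration):

Dset   = ['d1','d2','d3','d4']
A_to_D = generate_mappings(Aset,Dset)  # 4**3 = 64 mappings
D_to_C = generate_mappings(Dset,Cset)  # 3**4 = 81 mappings
len(A_to_D)*len(D_to_C), len(set(compose_mappings(A_to_D,D_to_C)))  # (5184, 27)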

So there you have it:  while not theoretically glamorous, functionality like this really allows for an exploration that couldn’t be done by hand.

Category Theory 5 – Retractions, Sections, and a New Definition for Inverse

With a clearer picture of determination and choice established in the last post, we now turn to refining the picture of the ‘division problem’ within the category of sets.  The basic mechanic in both of these problems is trying to find the missing side of the triangle, where the base is the direct mapping $h: A \rightarrow C$ and the two sides are the mappings $f:A \rightarrow B$ and $g:B \rightarrow C$ composed together, with the set $B$ acting as a waypoint.  The only difference between the two problems is which of the two sides is considered given and which is being solved for.  In other words, the idea is to ‘divide’ both sides by $f$ to solve $g? \circ f = h$ in the determination case, or to ‘divide’ both sides by $g$ to solve $g \circ f? = h$ in the choice case, knowing full well that in certain circumstances an answer simply does not exist.

Lawvere and Schanuel, without explicitly saying why in words, pay particular attention to a smaller subset of the determination and choice problems in which the base of the triangle is the identity automorphism $I_A: A \rightarrow A$.  In this special case, the solutions to the determination and choice problems take the special names retractions and sections, respectively.

The two mathematical relationships that summarize these special cases are $r \circ f = I_A$ and $f \circ s = I_A$, where $f$ is the given mapping in whichever of the two problems is being considered (although, to be clear, $f$ will sometimes be called $g$ when it is the second leg of the composition and the context is clearer that way).

Of course, no matter which leg of the waypoint is specified as $f$, once a solution is found, the two sides enter a dual relationship with each other and it then becomes a matter of taste as to which is the retraction and which is the section.  Thus, $f$ can be regarded as the section to the retraction $r$ in the determination case and as the retraction to the section $s$ in the choice case.  The following diagram illustrates this duality for a particular set of mappings between a two-element set (the label $A$ being suppressed for clarity) and the three-element set $B$.

The green lines correspond to the specific components of the mapping being solved for.  The dotted lines correspond to arrows that could be chosen differently (e.g. $b_2$ could point to either element in $A$). 

Once a section or a retraction has been found, other, more general ‘division’ problems can be solved.  In some sense, retractions and sections are the primitives that unlock these problems.

The first of these general cases is built upon the retraction problem, where the mapping $y$, which goes from $A$ to a new set $T$, is given and the question is whether there exists a mapping $x$ from $B$ to $T$.  The ‘division problem’ we are trying to solve is defined by

\[ x \circ f = y \; . \]

The triangle portion of the problem is summarized on the left, where the ‘circular arrow’ connecting $A$ to $A$ is meant to remind us of the automorphism.

Lawvere and Schanuel point out that this problem always has a solution, given by $x \equiv y \circ r$, since

\[ x \circ f = (y \circ r ) \circ f = y \circ ( r \circ f ) = y \circ I_A = y \; . \]

A specific instance of this, for the two- and three-element sets above, is given by the following diagram,

where the components of $y$ are the blue arrows, and the black and purple arrows are the original $f$ and $r$ dual pair of the retraction problem, respectively. The action of the retraction $r$ is to bring an element of $B$ back to $A$ where it can use the mapping $y$ to have access to the set $T$. The specific pieces of $x$ are:

\[ x(b_1) = (y \circ r) (b_1) = y (a_1) = t_1 \; , \]

\[ x(b_2) = (y \circ r) (b_2) = y (a_2) = t_4 \; , \]

and

\[ x(b_3) = (y \circ r) (b_3) = y (a_2) = t_4 \; . \]
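Since the diagram doesn’t reproduce well in text, here is a minimal sketch in Python that checks this instance, with each map written as a dictionary.  The components of $r$ and $y$ are read off the figure; the arrow $f(a_2) = b_2$ is an assumption (the check goes through just as well with $b_3$):

f = {'a1':'b1', 'a2':'b2'}             # assumed injective f: A -> B
r = {'b1':'a1', 'b2':'a2', 'b3':'a2'}  # retraction r: B -> A with r o f = I_A
y = {'a1':'t1', 'a2':'t4'}             # the given y: A -> T

compose = lambda p, q: {dot: p[q[dot]] for dot in q}  # p o q (q acts first)

x = compose(y, r)   # {'b1': 't1', 'b2': 't4', 'b3': 't4'}
compose(x, f) == y  # True: x o f = y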

In an analogous way, the other general case is built upon the section problem, where the mapping $y$, which goes from a new set $T$ to $A$, is given and the question is whether there exists a mapping $x$ from $T$ to $B$.  The ‘division problem’ we are now trying to solve is defined by

\[ f \circ x = y \; . \]

The triangle portion of the problem is now summarized on the right where, again, the ‘circular arrow’ connecting $A$ to $A$ is meant to remind us of the automorphism.

This problem always has a solution, given by $x \equiv s \circ y$, since

\[ f \circ x = f \circ (s \circ y) = (f \circ s) \circ y = I_A \circ y = y \; . \]

A specific instance of the section problem, again for the two- and three-element sets above, is given by the following diagram,

where the components of $y$ are still the blue arrows, and the black and green arrows are the original $f$ and $s$ dual pair of the section problem, respectively.  The action of the section is to connect the action of the mapping $y: T \rightarrow A$ to the set $B$, thereby putting each element of $T$ into the appropriate correspondence with elements in set $B$.  The specific pieces of $x$ are:

\[ x(t_1) = (s \circ y) (t_1) = s (a_1) = b_1 \; , \]

\[ x(t_2) = (s \circ y) (t_2) = s (a_1) = b_1 \; , \]

\[ x(t_3) = (s \circ y) (t_3) = s (a_1) = b_1 \; , \]

and

\[ x(t_4) = (s \circ y) (t_4) = s (a_2) = b_3 \; . \]
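The dual check works the same way; $s$ and $y$ come from the figure, while $f(b_2) = a_1$ is an assumption (any target works, since $b_2$ lies off the image of $s$):

s = {'a1':'b1', 'a2':'b3'}                        # section s: A -> B with f o s = I_A
f = {'b1':'a1', 'b2':'a1', 'b3':'a2'}             # surjective f: B -> A (f(b2) assumed)
y = {'t1':'a1', 't2':'a1', 't3':'a1', 't4':'a2'}  # the given y: T -> A

compose = lambda p, q: {dot: p[q[dot]] for dot in q}

x = compose(s, y)   # {'t1': 'b1', 't2': 'b1', 't3': 'b1', 't4': 'b3'}
compose(f, x) == y  # True: f o x = y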

One final note about the retraction and section problems.  By making the base mapping of the triangle problem an automorphism, we are forced to supply additional requirements not imposed in the more general case.  As was seen in the last post, an essential ingredient is that the left-side mapping from $A$ to $B$ be injective and the right-side mapping from $B$ to $A$ be surjective.  It is instructive to look at how the number of retractions varies with the size of the waypoint set $B$.  The following diagram illustrates the retraction problem for the three cases with one fewer, the same number of, or one more element in $B$ than in $A$ (where $f$ and $g$ are used to specify the left and right sides of the triangle).

Likewise, the following diagram illustrates how the number of sections varies through the same progression in the set $B$.

In both cases, there is a sense that the size of $B$ (i.e., the number of elements) is central to whether or not the problem has a solution and, if so, how many solutions exist.  This observation leads to two conclusions.

First, the definition of an inverse mapping, which was originally specified by the two conditions $g \circ f = I_A$ and $f \circ g = I_B$ (see this post for details), can now be recast as:

A map $f:A \rightarrow B$ is an isomorphism if there exists a unique inverse map $f^{-1}$ which is both a retraction and a section for $f$ such that $f \circ f^{-1} = I_B$ and $f^{-1} \circ f = I_A$.

Since $f$ then possesses both a retraction and a section, it must be both injective and surjective, and we arrive at the familiar result that an invertible map must be bijective.

Second, one may wonder why all this machinery is needed to arrive at a result already familiar to mathematics centuries earlier.  According to Lawvere and Schanuel, all this hard work will pay off when tackling categories where the objects are promoted from simple finite sets to richer sorts.  In particular, infinite sets and dynamical systems are cited, but the jury is still out in this exploration as to whether their argument will ultimately be convincing.

Category Theory Part 4 – Determination and Choice

This month’s post on category theory picks up where the last post left off, with a look at the choice problem, the complementary situation in map composition to the determination problem considered last time.  However, before digging into the choice problem directly, this narrative will begin by revisiting the determination problem.  This go-back is one of the consequences of just following where the roots of discovery lead without having trampled down a path.  After having a month to reflect on the determination problem, it became clear that a discussion of the choice problem would greatly benefit from a little more detail in the former.

One of the major failures of Lawvere and Schanuel’s book is not sufficiently connecting the combinatorics associated with composing maps to both problems.  While it is true that the text emphasizes the formal analogies between multiplication of ordinary numbers and composition of maps, the usefulness of the analogy is never clearly stated.  Their exposition goes from the very simplistic (e.g., considering how to tell when two sets have the same number of elements) to the very complicated (retractions and sections as a way of defining inverse mappings) with no intermediate steps.  The purpose of this post is to provide those waypoints.  Along the way, it will become clearer why the authors focus on automorphisms.

To lay the groundwork, consider the finite sets $A$, $B$, and $C$ as shown in the diagram below, 

along with known mappings $f:A \rightarrow B$ and $h:A \rightarrow C$ and the yet-to-be-determined mapping $g:B \rightarrow C$.  Set $A$ has two elements, as does $C$, while $B$ has only one.  In the last post, the importance of the mapping $f$ being injective was raised but not sufficiently explained.  A clear distinction wasn’t made as to whether or not a non-injective map could actually solve the determination problem.  We will detail the various mappings possible between the three sets shown in the figure and will show that non-injective mappings can solve the determination problem, but only in special contrived cases and never when the mapping between $A$ and $C$ is an automorphism.

Before diagramming each of the individual cases, let’s count the number of diagrams needed – one for each possible set of mappings $\{f,g,h\}$ (whether or not the set solves the determination problem).  The number of mappings between any two sets is equal to the number of elements in the target or codomain set raised to the power of the number of elements in the domain set.  Since $B$ has only one element, all mappings from any other set must look the same; that is, every element of the domain maps to the singleton in $B$.  The number of mappings from $B$ to $C$ is equal to the number of elements in $C$, which in this case is two.  A similar computation shows that the number of distinct mappings from $A$ to $C$ is four.  Already we can see that there is likely to be a problem in trying to solve the determination problem, as there are only two distinct ways to go from $A$ to $C$ with a waypoint at $B$ whereas there are four ways to go from $A$ to $C$ directly.  To see which of the four versions of $h$ are incompatible, let’s look at the diagrams for the cases where $h$ is not injective followed by those where it is.
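As a quick check of that counting rule in code, using the set sizes from the figure:

A, B, C = ['a1','a2'], ['b1'], ['c1','c2']
len(B)**len(A), len(C)**len(B), len(C)**len(A)  # (1, 2, 4) maps for A->B, B->C, A->C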

In the case where the mapping $h:A \rightarrow C$ is not injective, there is always a solution to the determination problem, despite the fact that the mapping $f:A \rightarrow B$ also fails to be injective.  The figure below shows the four possible sets of $\{f,g,h\}$ in this case, with the left side showing the correct choice for the mapping $g$ and the right showing the wrong way to do it.

However, when the mapping $h$ is injective, that is to say each element in $A$ maps to a unique element in $C$ (more generally, every point in the codomain has, at most, one incoming arrow), then no solution to the determination problem can be found.  This is shown in the following diagram for the two distinct possibilities for $h$, one of which is an automorphism (with the identifications $a_1 = c_1$ and $a_2 = c_2$) and the other a permutation.

If we restrict ourselves to only the automorphism between $A$ and $C$, then we have to give up all hope of contriving a solution if $f$ is not injective because, otherwise, we would have to force some doubly-targeted element (i.e., $b_1$ in this case) into trying to do the impossible – to simultaneously point to two separate elements in its codomain.

The special mapping $g$ that solves the determination problem in the case where $h=I_A$ is called a retraction $r$ for $f$, and it obeys the relation $r \circ f = I_A$.

We are now in a better position to understand the choice problem and the role of the automorphisms in it.  The choice problem is the analog of the determination problem where the mappings $h: A \rightarrow C$ and $g:B \rightarrow C$ are known and where the map being sought is $f:A \rightarrow B$.  We’ve already cataloged all the possible sets of maps in the examples above and we know that the choice problem is only compatible with the case where $h$ is non-injective, but this doesn’t give us enough information to understand the properties of $g$.  So, to really appreciate the choice problem we are going to have to change the composition of the sets.  The simplest approach is to enlarge $B$ to have three elements (the reason for skipping a two-element $B$ should become obvious afterwards).

Since we are interested in automorphisms as a special case of the 4 possible mappings from $A$ to $C$, we can substantially decrease the tedium of working through all the possible combinatorics.  In this special case, a solution to the choice problem is called a section and is usually denoted as $s$.

Despite this simplification, there are still nine possible mappings into $B$ and eight possible mappings from it (for a total of 72 possibilities).  Fortunately, our job will get even easier after we examine the following diagram, which focuses on one specific mapping from $B$.

Of the nine possibilities that map the 2-element set (call it $A$ or $C$ as you wish) to $B$, none of them work because each element of $B$ aims at the same element in $A$.  The remedy is to make sure that the mapping $g$ sends at least one arrow to each element in $C$ – that is, that $g$ is surjective.

It doesn’t matter exactly how the rest of the mapping is specified (for example, $b_3$ could point to the other element).  The key feature is simply that all elements in $A$ are covered by at least one arrowhead.

These two special cases are summarized quite nicely in the following diagram (adapted from the one on page 73 of Lawvere and Schanuel).

Next month, we’ll explore how to relate retractions and sections of a given map $f$ to a variety of ‘division problems’ in the category of finite sets, and we’ll explore a new insight into inverse mappings $f^{-1}$ that they provide.

Category Theory Part 3 – Determination and Division

Last month’s column talked about the mathematical and philosophical aspects of that one special mapping out of all the ones that exist: the isomorphism.  An isomorphism is special because only this mapping possesses the property that it can be ‘run in reverse’ and, like watching a movie played backwards, the end state (the codomain) returns exactly to the domain.  When this occurs, the original mapping has an inverse.

Lawvere and Schanuel (see here for details) emphasize that much like the arithmetic inverse of 2 is 1/2, so too can we interpret the mapping inverse as a sort of division problem.  Along those lines they introduce the reader to two types of ‘division’ problems – known as the determination and choice problems, respectively – within the context of mappings and category theory.  The fact that there are two types of ‘division mappings’ that one must contend with, in contrast to only one ‘division number’ in the arithmetic case, is a consequence of the fact that we don’t require, in general, an equal number of elements in the two sets being mapped.  The authors fail to explore this last point sufficiently, which leads to their discussion being somewhat impenetrable (a cursory search on the web leads me to conclude that this is a universal finding of people who study their text).  They also add insult to injury by interleaving the discussion of the two problems without much of an organized plan and with several smaller pedagogical no-nos. 

So, to address these shortcomings, this post will look at the first of these two problems, the determination problem, while deferring the choice problem to next month’s column.  In each case, the abstract definition of the problem will be given first, followed by a concrete example.  The only remaining question is what specific concrete example to use.  Combining a bit of whimsy with the fact that I just introduced a friend to the 1979 movie The Warriors led me to pull my examples from it.

For those unfamiliar with the movie, the plot centers around a group from a gang called the Warriors who must fight their way through a host of rival gangs as they try to travel from the Bronx to Coney Island.  Each of the rival gangs has a particular set of colors and a trademark approach to their violence.  Police officers and ordinary people also show up, but in limited roles.  The finite sets based on this story are: a set of some of the gangs, a set of some of the characters, a set of some of the uniforms/colors worn by the gangs, and a set of some of the weapons.  (Credit is due to The Warriors Gangs – The Warriors Movie Site, which provided some of the finer details about the gangs.)

Division Problem 1 – Determination Problem

Abstractly, the determination problem asks the following question.  If $A$, $B$, and $C$ are sets with mappings $f:A \rightarrow B$ and $h:A \rightarrow C$, do we have enough information to determine what mapping, if any, exists between $B$ and $C$?

Pictorially, the question amounts to asking if a mapping $g$ exists such that the following triangle picture closes.

For our concrete example, let’s take the set $A$ to be three members of the titular gang, the set $B$ to be each member’s weapon of choice, and the set $C$ the gang each member most hates. Each member has his own unique weapon of choice (as seen by the mapping $f:A \rightarrow B$): $f(Ajax)=Bat$, $f(Cochise)=Pipe$, and $f(Swan)=Knife$. However, two of the Warriors rank the Rogues at the top of their most-hated lists ($h(Cochise)=Rogues=h(Swan)$), while another member views the Baseball Furies as easily the most detestable ($h(Ajax)=Furies$). Pictorially, the situation looks like:

The question is: can we generate a mapping $g$ from the set of preferred weapons to the set of most-hated gangs such that $g \circ f = h$?  In the parlance of category theory, finding $g$ means that ‘$h$ is a function of $f$’.

A little bit of thought should convince the reader that a mapping $g$ defined in the following diagram fits the bill.

Of course we can check each case separately as well by the following list:

  • $h(Ajax) = Furies = g \circ f (Ajax) = g(Bat) = Furies$
  • $h(Cochise) = Rogues = g \circ f (Cochise) = g(Pipe) = Rogues$
  • $h(Swan) = Rogues = g \circ f (Swan) = g(Knife) = Rogues$
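The same check can be scripted; a minimal sketch with each map as a Python dictionary:

f = {'Ajax':'Bat', 'Cochise':'Pipe', 'Swan':'Knife'}
g = {'Bat':'Furies', 'Pipe':'Rogues', 'Knife':'Rogues'}
h = {'Ajax':'Furies', 'Cochise':'Rogues', 'Swan':'Rogues'}
{w: g[f[w]] for w in f} == h  # True: g o f = h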

Since we’ve exhausted all the elements of the set $A$ and, in each case, found that $h = g \circ f$, we can say that we’ve found a solution to the determination problem for our little gang-warfare scenario.

Before we interpret that solution in plain language, let’s look at a related situation where a solution can never be found.  We imagine that Cochise now prefers the Bat as his weapon of choice but that he maintains his deep and abiding hatred for the Rogues.  Let’s make the same list as above and see if anything is now amiss:

  • $h(Ajax) = Furies = g \circ f (Ajax) = g(Bat) = Furies$
  • $h(Cochise) = Rogues = g \circ f (Cochise) = g(Bat) = Rogues$
  • $h(Swan) = Rogues = g \circ f (Swan) = g(Knife) = Rogues$

By the time we get to the second line, we see that we have a problem.  Ajax’s case requires that $g(Bat) = Furies$ while Cochise’s situation requires $g(Bat)=Rogues$.  Under the rules of set mapping, one, and only one, arrow can start on an element in the domain set.  Pictorially, we have a situation where reconciling Ajax’s mode-de-guerre with that of Cochise requires two arrows starting from their favorite weapon, the baseball bat (note that extraneous lines have been omitted for clarity).

The mathematical/philosophical machinery invoked leaves us with a clear picture of why a solution exists in the former case but not in the latter.  The key difference is that in the first scenario $f$ had each element in $A$ point to a unique element in $B$.  This requirement makes $f$ an injective mapping, and this will be the central aspect of all mappings that possess a retraction.  An important corollary is that the size of set $B$ must be at least as big as that of $A$ so that an injection can be set up (compare with the case if there were only two types of weapons; then one of the Warriors would, necessarily, need to favor the same weapon as another member).

Interestingly, while the category theory explanation of the contrast between the two cases is clear, finding plain language to describe the difference is quite hard.  One might think that a successfully constructed mapping $g$ in the first case would mean that we could conclude with certainty that if we see Ajax reach for a Bat then he’s coming after one of the Furies.  But a good defense lawyer could argue that he merely wants to go to the batting cage.  Likewise, the lack of a solution might seem daunting in the second case, but if we find a Rogue beaten up with a baseball bat, any NYPD detective would have probable cause to talk to Cochise.  It seems that this aspect of category theory doesn’t adapt itself very well to real-world situations with nearly unlimited possibilities, uncertainties, and confounding variables, despite Lawvere and Schanuel’s attempts at doing so.  But the jury is still out in this exploration and maybe new ideas will emerge that force a reassessment.

Next month we’ll look at the choice problem but we’ll drop the gang violence in favor of more prosaic sets.

Category Theory Part 2 – Basics of Isomorphisms

This month’s installment continues the journey through Lawvere and Schanuel’s book on category theory (see last month’s installment).  In that article, the basic idea of category theory emerged as a set of ‘meta-rules’ that focused not on the elements of two sets but on the mappings between those sets.  In addition, the authors emphasized analogies with scientific thought and with ordinary arithmetic.

That section was rather short, though, and in this post the machinery gets to be a bit more complicated.  The central theme for the second section of the book (Article II and its accompanying Sessions) centers on isomorphisms.  This month’s installment will focus only on the basics of isomorphisms, with the hooks to future blogs on the topics of choice (i.e., sections of a map) and determination (i.e., retractions of a map).

It is interesting to note that a suspicion voiced in last month’s column seems to be on target: namely, that much of this work is philosophical rather than just mathematical.  The first indication of this is that Lawvere and Schanuel try to motivate the concept of isomorphisms by imagining a point in humanity’s past where the formal concept of numbers had yet to be invented.  The question that they pose is: just how did man know when two things were equal?

The answer that they provide is a simple one.  Humanity, devoid of formal numbers, would not be able to count the number of elements in a set and compare that total to the total from another set to establish that the sets have the same size.  But they could try to put the elements of each set into a one-to-one correspondence.

The example they construct consists of a nuclear family (father, mother, and child) and a collection of items that belong to or are identified with the person. 

The idea is that no matter how unsophisticated the family mathematics may be, each member of this group can claim treasure all his own and, in doing so, realize that the family group is the same size as the collection of personal treasures.  This example is a bit too ‘tribal’ for my taste, but it did get me thinking of those old heist movies where, after having made off with the loot, the criminals divvy up the cash using the ‘one for you, one for me’ scheme.

Be that as it may, there is an even more subtle notion lurking under the surface of this basic concept of equality.  Not only does each person have a treasure to call their own, but each treasure has an owner.  This bidirectionality is more special than the rules discussed in the category of sets.  In the latter, the only condition on a proper mapping is that each element in the set labeled as the domain has an arrow departing from it that connects with one and only one element in the codomain.  There was no condition placed on each element of the codomain connecting it back to one and only one element in the domain. 

A bit of reflection should, in general, convince one to reject a requirement that an element in the codomain must connect back to the domain.  Imposing such a requirement would mean that entire propositions would be impossible to express.  For example, consider the set of students {Andy, Barbara, Carl, Denise} who have just taken a test, and the set of possible grade outcomes {A,B,C,D,F}. 

The statement “No student failed the exam” could not be encoded into the language of category theory if every grade level needed to link to a student (and only one student).  (It is comical to think about a world where such a requirement was imposed.  Classes could never have any number of students but 5, and then, even if the scores on a 100-point exam were 1, 2, 3, 4, and 5, the top scorer would walk away with an A.)

So, the idea that the lines connecting the domain to the codomain can, in certain classes of mapping, be reversed is quite remarkable.  When a mapping possesses this bidirectionality it is called an isomorphism, and the formal definition is as follows:

A map $f: A \rightarrow B$ is called an isomorphism or invertible map, if there is a map $g:B \rightarrow A$ such that $g \circ f = I_A$ and $ f \circ g = I_B $.

Two special maps, $I_A$ and $I_B$, appear; these are the identity maps that take elements of the sets $A$ and $B$ back into themselves.  The map $g$ is also special in that it is unique, and so it can be, without loss of generality, denoted as $f^{-1}$.

It might be tempting to think that having both composition rules is redundant (e.g., $g \circ f = I_A$ has the same information as $ f \circ g = I_B$) but, as the following example shows, both are needed.

For these mappings, $g \circ f = I_A$ but $f \circ g \neq I_B$. 
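The diagram doesn’t reproduce in text, but a typical instance is easy to write down; the maps below are assumed stand-ins with a two-element $A$ and a three-element $B$:

f = {'a1':'b1', 'a2':'b2'}             # A -> B, injective but not surjective
g = {'b1':'a1', 'b2':'a2', 'b3':'a2'}  # B -> A
compose = lambda p, q: {x: p[q[x]] for x in q}  # p o q (q acts first)
compose(g, f)  # {'a1': 'a1', 'a2': 'a2'} = I_A
compose(f, g)  # {'b1': 'b1', 'b2': 'b2', 'b3': 'b2'} != I_B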

It’s also easy to look past this definition and think that more must be needed, but the authors soon dispel that idea as well by showing that the following properties follow from it – namely, that the relation ‘is isomorphic to’ is:

  • Reflexive: $A$ is isomorphic to $A$
  • Symmetric: If $A$ is isomorphic to $B$, then $B$ is isomorphic to $A$
  • Transitive: If $A$ is isomorphic to $B$, and $B$ is isomorphic to $C$, then $A$ is isomorphic to $C$

These properties, and a host of other ones as well, are relatively easy to prove but can look strange to one not used to the manipulations.  The primary strategy involves introducing the identity maps (either $I_A$ or $I_B$) at opportune moments.

For example, the proof of the transitive property goes as:

  1. $A$ isomorphic to $B$ means there is a map $f:A \rightarrow B$ and a map $g: B \rightarrow A$ such that $f \circ g = I_B$ and $g \circ f = I_A$.
  2. $B$ isomorphic to $C$ means there is a map $h:B \rightarrow C$ and a map $k: C \rightarrow B$ such that $h \circ k = I_C$ and $k \circ h = I_B$.
  3. Then $h \circ f: A \rightarrow C$ is a map (by composition).
  4. We now need to show that there exists an inverse map $(h \circ f)^{-1}:C \rightarrow A$ such that $(h \circ f)^{-1} \circ (h \circ f) = I_A$ and $(h \circ f) \circ (h \circ f)^{-1} = I_C$.
  5. The natural candidate for $(h \circ f)^{-1}$ is $g \circ k$.
  6. Putting it all together we see that:
    1. $(h \circ f)^{-1} \circ (h \circ f) = (g \circ k) \circ (h \circ f) = $ $g \circ (k \circ h) \circ f = g \circ I_B \circ f = g \circ f = I_A$
    2. $(h \circ f) \circ (h \circ f)^{-1} = (h \circ f) \circ (g \circ k) = $ $h \circ (f \circ g) \circ k = h \circ I_B \circ k = h \circ k = I_C$

Next month we’ll look at an equivalent definition of an isomorphism that has analogies to division in arithmetic using the concepts of sections and retractions of maps. 

Category Theory Part 1 – An Introduction

Since it is fashionable to start a new year with a new resolution, I thought it might be fun to try something new within the domain of this blog.  So, starting this month, the content and approach will be a little different.  I’ll be working through Conceptual Mathematics: A first introduction to categories, by F. William Lawvere and Stephen H. Schanuel. 

I had picked up this book some time back based on a whim and the promise on the back cover (obviously not quite a wing and a prayer but one improvises).  The authors claimed that categories lead to “remarkable unification and simplification of mathematics” and such a bold claim easily caught my attention and sparked my curiosity.   Thumbing through the pages made it clear that this work sits as much in the realm of philosophy as it does mathematics and that moved it from a possible purchase to an immediate sale.  However, life being life, I only got through the first chapter or so before overriding pressures shoved the book to the back burner, where it remained for many years. 

Now the time is right to return to this work, but the process will be a little different.  Rather than digesting the material first and writing about it afterward, I’ll be chronicling my progress through largely foreign material presented in a work that is also substantially unfamiliar.  It should prove an interesting journey, but perhaps not a very successful one in the long run; only time will tell.

This first installment covers the introductory material in the book starting with some basic ontology and epistemology of physics and ending with a distillation of the basic properties concerning the category of finite sets.

Like any other mathematical scheme, the category of finite sets consists of two components: the basic objects or data comprising the category and a set of rules for how to relate them.  The curious thing about categories is that it seems that the rules are, in some sense, meta-rules.   

To better understand this point, consider a diagram from early in the book showing a standard picture of the trajectory of an object (left), in this case the flight of a bird, and the ontological breakdown of the motion mathematically (right).

At each instant in time, the bird is at a specific location in space.  The physicist’s standard approach is to consider the trajectory as a mapping $f$ from time to space which can symbolically be written as

\[ f:t \rightarrow {\mathbb R}^3 \equiv f:t \rightarrow {\mathbb R} \times {\mathbb R}^2 \; , \]

where the Cartesian product in the last relation means that ‘space’ ${\mathbb R}^3$ can be decomposed into ‘line’ ${\mathbb R}$ and ‘plane’ ${\mathbb R}^2$.

I think most physicists stop there with the job accomplished.  Being one myself, I pondered for a while what was so important about this point that it deserved a special place in the introduction – the part of the book that a prospective buyer would be most likely to read before purchasing.  While I am not completely sold on my interpretation, I believe that the authors were trying to say that the mappings are the basic objects in categories rather than simply tools to understand the more ‘physical’ objects of time and space.

To further support this interpretation, I’ll point out that the authors list (p 17) the basic ingredients (as they call them) to have a category or ‘mathematical universe’ (again their words).  According to this terminology, these ingredients are

Data for the category:

  • Objects/sets:  $A$, $B$, $C$, $\ldots$
  • Maps between sets: $f:A \rightarrow B$ or, synonymously, $A \xrightarrow{f} B$.  The maps have only one stipulation: every element in the domain $A$ goes to one and only one element in the codomain $B$.
  • Identity Maps (one per object): $A \xrightarrow{i_A} A$
  • Composition of Maps: assigns to each pair of compatible maps (say a map from $f:A \rightarrow B$ followed by another map from $g:B \rightarrow C$) a single map that accomplishes the job at one go (a map directly from $h: A \rightarrow C$) with $h \equiv g \circ f$ read as $g$ follows $f$

Rules for the category:

  • The identity laws
    • If $A \xrightarrow{i_A} A \xrightarrow{g} B$, then $g \circ i_A = g$
    • If $A \xrightarrow{f} B \xrightarrow{i_B} B$, then $i_B \circ f = f$
  • The associative law
    • If $A \xrightarrow{f} B \xrightarrow{g} C \xrightarrow{h} D$, then $A \xrightarrow{h \circ (g \circ f) = (h \circ g) \circ f} D$

They emphasize that the rules are important, even crucial.  I take it they mean that the key insight is the interaction between the maps and not the objects the maps relate, and that is why I characterize the rules for categories as meta-rules.

These meta-rules center on the composition of the maps.  The identity laws are fairly straightforward, with one caveat: two identity rules are required since the maps are not required to be bijections, unlike the usual case in physics where the maps almost always are.  This extra freedom makes it a little more difficult to think about the associativity rule.  The following diagram, adapted from the exercises starting on page 18, shows that even in the absence of a clear inverse for each map associativity holds; if one starts with some element in the first set it has to go somewhere in the last set, and it doesn’t matter if one thinks of grouping the intermediate steps one way or the other.
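A small sketch makes the point concrete; the maps below are assumed non-bijective stand-ins for the ones in the diagram:

compose = lambda p, q: {x: p[q[x]] for x in q}  # p o q, with q acting first
f = {'a1':'b1', 'a2':'b1'}   # A -> B, not injective
g = {'b1':'c2'}              # B -> C
h = {'c1':'d1', 'c2':'d2'}   # C -> D
compose(h, compose(g, f)) == compose(compose(h, g), f)  # True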

The final observation is that the authors make a point of the strong analogy between the multiplication of real numbers and the composition of maps.  This analogy derives from the combinatorics associated with the ways in which one can relate $n$ elements in one set to $m$ elements in another.

Mathematically, the number of different maps between $A$ and $B$ is given in terms of the number of elements in each (denoted by $N_A$ and $N_B$ respectively) by

\[ {N_B}^{N_A} \; , \]

which for the case in the diagram above is $ 2^3 = 8$ as expected.
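The count is easy to reproduce by brute force, since a map simply assigns one element of $B$ to each element of $A$; a minimal sketch:

import itertools as it
A, B = ['a1','a2','a3'], ['b1','b2']
len(list(it.product(B, repeat=len(A))))  # 8 == 2**3, one target chosen per element of A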

The exact insights that this analogy provides remain a mystery, but there is always next month.

Dictionaries and Forms

There is a curious point about human beings when it comes to language.  This point is comically portrayed in the episode And the Rock Cried Out, No Hiding Place from the third season of the television series Babylon 5.  The particular scene starts when Delenn, a member of an alien race called the Minbari (distinguished by the bone-like protrusion surrounding the sides and back of their heads), enters the command and control section of the eponymous space station.  She approaches John Sheridan, the commander of Babylon 5 and the man she is starting to grow romantically close to, to reproach him for driving himself too hard.  The way that she chooses to discuss his recent behavior sheds a humorous light on the idiosyncrasies of human language.

While her central point is obviously meant to be silly, the core message about language is, nonetheless, serious – one cannot use a dictionary to learn a language.  A dictionary’s job consists of simply relating a word unknown to the inquirer to other words that the inquirer knows.  If no words are known, the dictionary can only serve secondary functions as a paperweight, doorstop, or bug smasher.

Even if one already knows one language, a dictionary translating terms from the known language to the unknown one is, by itself, an extremely poor guide to learning that language.  Other scenes, no doubt, spring to mind showing some clueless tourist trying to use a phrase book in a foreign country to comic effect.

Despite being a source of endless hours of mirth, these humorous vignettes point towards one of the most mysterious and yet evident parts of human existence – that with no formal training each of us learns his mother tongue simply by observation and immersion.  The process by which we all learn to speak is so commonplace that many of us may even have lost sight of it.

To clarify just what mysterious power is operating, consider the newborn emerging from the womb.  He knows none of the words of his mother tongue and so cannot be expected to use a dictionary.  Indeed, he cannot even hold a dictionary or turn its pages, let alone read the entries.  However, he is equipped with the potentiality to speak and to really learn the abstract ideas that words represent, not just mimic them as would a parrot.  We can comfortably assert that he has some power to grasp the world and to extract from the objects he encounters (animate or inanimate) something of their essential forms.

An earlier blog (Learning Essential and Accidentals) discussed this ability to apprehend the essential forms of objects within the context of learning a foreign language.  In that case, we suppose that the learner already has a good grasp of his mother tongue and the associated abstractions (redness or beauty or big) and that the process is one of making a many-to-many mapping between the known and the unknown.  In that scenario a learner is being taught, much in the way a newborn is, by example.  The native speaker points to a bottle of water and says ‘zerk’, a word which, to the learner, signifies nothing beyond the possibilities of ‘bottle’, ‘clear’, ‘full’, ‘water’, and so on.  With some trial-and-error, the learner soon finds that what for him was a nonsense sound can now be taken to mean ‘water’.

The ability of the learner to sift through quite different concepts to arrive at the idea of water is quite remarkable.  Consider that he was required to compare extremely different types of nouns.  Water, as a substance, is very different from a cup of water used in drinking or preparing a recipe.  A bottle, which can be held in one’s own hand, is very different from the concept of clear, which is an attribute that can only be grasped in the mind.

As remarkable as this ability is, it pales in comparison to the far more interesting case of the infant.  The learner of a foreign tongue is not starting fresh but already has his faculties fully equipped, exhibits fluency in his native language, and is familiar with all of the abstractions he needs in order to extend what he knows into what he doesn’t.  The infant, however, knows no words (at least not as far as we can tell), has only nascent faculties to learn with no facility of having learned, and, if he does grasp some level of abstraction, he is in no position to communicate it.

So how does the process get started?  For the newborn, just what does he know that allows him to actualize speaking?  How do the gibberish syllables of ‘da-da’ or ‘ba-ba’ slowly but surely transform into not just the word ‘father’ but the abstract idea that this older man looks after me, is related to me, is someone with whom I have a connection, and so on?

The fact that we don’t have satisfactory answers to these questions is the heart of the mystery.  We, as humans, exercise a faculty which we use so well yet understand so poorly.  It is a faculty of a different type compared to reading or learning to play a musical instrument.  In those latter cases, we are conscious of having been taught how to accomplish these skills.  We remember formal lessons and, even if we get to the point where each is automatic, we still needed to learn.  We needed someone to tell us how to interface with a shared scheme of denoting sounds by symbols or producing those sounds on a man-made device.

The learning displayed by the infant is more primitive but also more sophisticated.  It is a learning peculiar to that individual (how to move lips, mouth, gums, lungs, and so on) but it also links that individual’s existence with society as a whole.  And it results in far more than a simple communication system about things happening now.  With speech, the individual can eventually express immaterial concepts such as love and justice.

We are forced to conclude that each of us possesses, innately, an active faculty from the moment we begin to exist that allows us to learn to grasp not just the sensible world but the essential forms that sit behind (or within or whatever) the world as a whole.  Each one of us should pause and ponder this truly astonishing ability the next time we look up a word in the dictionary.

Election Conundrums

Well, the election is upon us and the only good part of it is that a week after this blog goes live the whole thing will be over.  The political posturing and half-baked thinking displayed by most politicians, of course, will still be there, and it is possible that the outcome will not be decided for some time to come – but the endless jockeying for position will be over.  Sanctimonious speeches will erupt on both sides about how voting integrity was either maintained against the fearsome onslaught of those wishing to taint the election (argument from the winning side) or how the election was actually stolen by bad actors and is now illegitimate (argument from the losing side).  Curiously, neither side actually talks about the inadequacies of the election process itself and the severe problems any type of election poses to properly ‘reflecting the will of the people’.

In a nutshell, what makes all elections border on illegitimacy is that in every electoral scheme there demonstrably exist situations in which a candidate disfavored by a majority of the people participating in the election is elected over other, more desired candidates.

Before talking about the problem in its most general form, let’s look at the case study from Chapter 15: Election Problems and Engine Failure in the companion Course Guidebook to Michael Starbird’s Great Courses lecture series entitled Meaning from Data: Statistics Made Clear.  Some of the structure in the following narrative is Starbird’s, but the detailed analysis and presentation here are my own.

The case study starts with polling results that show the population’s preferences regarding four candidates for office.  Each candidate is scored by what percentage of the population ranked that candidate as their first choice, what percentage had that candidate as their second choice, and so on.  The tabular results are:

Candidate   1st Place %   2nd Place %   3rd Place %   4th Place %   Total %
A                40            14            16            30          100
B                13            46            33             8          100
C                18            18             3            61          100
D                29            22            48             1          100
Total %         100           100           100           100

The problem of determining the will of the people boils down to figuring out how to summarize the data in this table so as to determine, unequivocally, which candidate best represents the voting population.

Suppose that we hold an election in which all four candidates are on the ballot and each voter may vote for only one.  In this case, it is clear that Candidate A will be elected, since he will have garnered the greatest number of votes.  He would then represent the entire population even though 60% of that population would prefer someone else.  This system, called plurality voting, should give us a great deal of concern, since the result seems to defy the will of a clear majority of the voters.

Dissatisfied with that outcome, suppose we decide to allow voters to pick their top two choices – their first-place and runner-up picks.  In this case, the rankings would be: Candidate B, with 59% of the voters placing him in either first or second place, followed by Candidate A with 54%, Candidate D with 51%, and Candidate C with 36%.  This vote-for-two method elects Candidate B – the candidate voters least liked for first place – to represent the population.  In other words, the one thing the voters agreed upon was that Candidate B was not the man for the top job.
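For concreteness, here is a small Python sketch (my own, not from Starbird’s guidebook) that encodes the polling table above and tallies both counting schemes:

    # Polling table from above: for each candidate, the share of voters
    # ranking him 1st, 2nd, 3rd, and 4th (percentages).
    rankings = {
        "A": [40, 14, 16, 30],
        "B": [13, 46, 33, 8],
        "C": [18, 18, 3, 61],
        "D": [29, 22, 48, 1],
    }

    # Plurality: only first-place votes count.
    plurality = max(rankings, key=lambda c: rankings[c][0])
    print("Plurality winner:", plurality)          # A, with 40%

    # Vote-for-two: first- and second-place votes count equally.
    vote_for_two = max(rankings, key=lambda c: rankings[c][0] + rankings[c][1])
    print("Vote-for-two winner:", vote_for_two)    # B, with 59%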

Now beginning to mutter to ourselves about the stupidity of it all, we finally hit upon a scheme we are sure will work.  Let the voters choose their top candidate, as in the plurality case, but instead of keeping only one candidate we keep the top two.  Having thinned the field, we re-run the vote with a clearer choice.  This scheme, a variation of runoff voting, seems to have the best of both worlds.  Putting it in place, we are happy to see that Candidates A and D are selected to proceed to the second round.  Only now do we realize that we aren’t quite sure of the next step.  How will the 31% of the population not committed to either candidate break their support?  Looking beyond the numbers to examine the issues, we realize that the 18% supporting Candidate C will throw their support behind Candidate D because of his stand on Issue #1.  We further realize that of the 13% supporting Candidate B, about half (say 7%) will also support Candidate D for the same reason.  Candidate D wins the election with 54% of the vote, and we feel good until we notice something disturbing.

What if Candidate B, totally irrelevant in the race for first place, had dropped out before the first round?  Again examining the issues, we realize that Candidate C would have gained most of B’s support (based on their common viewpoint on Issue #1), bringing his percentage up to 30% – enough to edge out Candidate D for the runoff.  Now we are deeply unsettled, because our system seems to hinge on the presence or absence of an irrelevant candidate who splits the vote.  Following that scenario further, of the 29% committed to Candidate D, who is now absent from the second round, just over half (say 15%) will break toward Candidate A based on a different issue, Issue #2.  That brings Candidate A up to 55% of the vote in the second round, making him the clear winner.
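The two runoff stories can be checked with the same back-of-the-envelope arithmetic.  In the sketch below, the transfer figures are the narrative’s guesses about voter behavior on Issues #1 and #2, not polling data:

    first_place = {"A": 40, "B": 13, "C": 18, "D": 29}

    # Runoff with B on the ballot: A and D advance.  C's 18% and about
    # 7% of B's 13% break to D over Issue #1; the rest go to A.
    d_final = first_place["D"] + 18 + 7            # 54 -> D wins
    a_final = first_place["A"] + (13 - 7)          # 46
    print("B runs:", {"A": a_final, "D": d_final})

    # Runoff with B dropped out: C inherits most of B's support (say 12%),
    # edging out D, 30% to 29%, for the second slot.  Just over half of
    # D's 29% (say 15%) then breaks to A over Issue #2.
    c_round_one = first_place["C"] + 12            # 30 -> C advances over D
    a_final2 = first_place["A"] + 15               # 55 -> A wins
    print("B drops: C reaches", c_round_one, "; A finishes with", a_final2)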

The fun doesn’t stop there.  Having now turned to the issues, we realize that if Candidates D and C had pulled 12% and 11% of the support from Candidates A and B, respectively, over Issue #2 before the first round, then the runoff would instead involve Candidates C and D:

Candidate   Current 1st Place %   Hypothetical 1st Place % after Issue #2 shift
A                   40                              28
B                   13                               2
C                   18                              29
D                   29                              41

If the remaining 30% not directly committed to either finalist breaks roughly three-to-one in favor of Candidate C, based on Issue #1, then Candidate C edges out Candidate D by about 51.5% to 48.5% – even though Candidate D had done better and would have won a plurality vote even more decisively than Candidate A did in the first scenario.
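Again, a quick sketch makes the arithmetic explicit; the three-to-one split is an assumption chosen to match the narrative, not measured data:

    # Issue #2 shifts support before round one: D takes 12% from A and
    # C takes 11% from B.
    shifted = {"A": 40 - 12, "B": 13 - 11, "C": 18 + 11, "D": 29 + 12}
    print(shifted)                                  # A 28, B 2, C 29, D 41

    # C and D advance; the leftover 30% (A's 28 plus B's 2) breaks
    # roughly three-to-one toward C over Issue #1.
    leftover = shifted["A"] + shifted["B"]          # 30
    c_final = shifted["C"] + 0.75 * leftover        # 51.5 -> C wins...
    d_final = shifted["D"] + 0.25 * leftover        # 48.5 ...despite D's gains
    print({"C": c_final, "D": d_final})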

These results aren’t entirely hypothetical but are modeled on an actual presidential election in the United States, with Candidates A and D standing in for Abraham Lincoln and Frederick Douglass, respectively.  Perhaps we can take some comfort that the two most favored candidates were both abolitionists, so that important issue would have been carried forward regardless of who won, but the course of the Civil War might have changed dramatically with Douglass in office.  That said, surely we remain disturbed by the impression that we could elect nearly any of the candidates simply by choosing the voting system.

In fact, this impression has been fashioned into an actual, mathematically provable result known as Arrow’s Impossibility Theorem.  The theorem says that it is impossible for any ranked voting system to simultaneously

  • Maintain the consensus (e.g., Candidate B winning the vote-for-two scenario even though the voters agreed he should not be first)
  • Reward candidates for doing better (e.g., Candidate D losing the final runoff scenario even though he had improved his standing)
  • Remain unswayed by irrelevant candidates (e.g., Candidate B’s dropping out changing the outcome in the second runoff scenario)

Mathematically, the outcome of an election has as much to do with the system used to count the votes as with the votes themselves, and no ranked system is strictly superior to the rest.  Remember that next election day.

Leaky Abstraction

Last month’s column promised more on Naïve Bayesian Classification, with a specific look at its performance on synthetically generated gemstone data from the little universe of James McCaffrey’s creation.  But deadlines being deadlines, life being life, and distractions and technological failings being aplenty, that promise must be deferred until next month.  A new idea crept in and, because it was so apropos, it refused to leave until it was expressed in writing.  And so here we are, with an article on leaky abstraction.

I hadn’t heard of this term until the end of August, when I came across an article about version control with Git that mentioned it, and suddenly the light came on.  From context, I figured out the general gist, but it wasn’t until I actually did some research on the web and read the Wikipedia article that I really knew how to talk about it.  Here, finally, was a term for the angst I so often felt around the technology of the future – technology that was supposed to make things better but instead made life harder and more opaque, sometimes dramatically so.

The experience was a lot like suffering for years with a chronic condition you could neither name nor treat, only to visit a new doctor who not only believed something was wrong but knew what it was and how to diagnose it.  Suddenly, you go from suspecting it may all be in your head to being on the road to recovery or management, with the mental relief of knowing you really were ailing the whole time.

In a nutshell, a leaky abstraction is a tool whose interface turns the user experience into an exercise in deducing how the developers think about the problem their tool addresses, rather than simply using the solution they provide.  While there is a more formal definition, a pair of examples comparing and contrasting (a la Goofus and Gallant in Highlights, but in the opposite order) serves much better.

On the side of the angels is the modern automobile.  One doesn’t need to know any physics or engineering to operate one.  Starting the engine is effortless, with no knowledge required of Newton’s laws, mechanical advantage, fluid dynamics, thermodynamics, metallurgy, or engineering principles.  One doesn’t need to know whether the brakes are mechanical, hydraulic, or computer-controlled to stomp on them and stop quickly.  The human-machine interface is intuitive: want to go right, turn the steering wheel that way; want to go faster, push the gas pedal down.  It is true that when things go amiss you are slightly better off knowing something about how the vehicle is designed, but even in these relatively rare cases there is an entire support network built around making the process easy (albeit potentially expensive).

On the other side is Git.  To use it, one really needs to know how Git’s data model works.  The term ‘directed acyclic graph’ entered my vocabulary because I am reliably told (e.g., in the blog post here) that the only way to really understand Git is to understand DAGs.  But I am not interested in discrete math; I simply want to version my software.  I appreciate that version control is a hard problem, but so is designing a car or a laptop or a smartphone.  Those systems move in the direction of simplicity, while Git seems happy to embrace a user experience best summarized by the well-known XKCD cartoon on the subject.
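To be fair to those experts, the data model itself fits in a few lines.  Here is a toy Python sketch (mine, emphatically not Git’s implementation) of why a commit history forms a DAG: every commit points back at its parents, and a merge is just a node with two of them:

    class Commit:
        def __init__(self, message, parents=()):
            self.message = message
            self.parents = list(parents)   # edges always point to the past

    root    = Commit("initial commit")
    main    = Commit("fix typo", [root])
    feature = Commit("add feature", [root])              # history forks...
    merge   = Commit("merge feature", [main, feature])   # ...and rejoins

    # Walking the ancestry can never loop, because no commit can name a
    # descendant as a parent - hence 'acyclic'.
    def ancestors(commit, seen=None):
        seen = set() if seen is None else seen
        for parent in commit.parents:
            if parent not in seen:
                seen.add(parent)
                yield parent
                yield from ancestors(parent, seen)

    print([c.message for c in ancestors(merge)])
    # ['fix typo', 'initial commit', 'add feature']

Simple enough to state, yes – but knowing it should not be a prerequisite for checking in a file.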

I’ve tried to like Git, really I have.  When GitHub came online I was an early adopter, precisely because I recognized the importance of saving to the cloud.  I am even grandfathered in with a username that isn’t an email address.  I am grateful to GitHub for providing that platform, and I find Octocat to be a cute mascot (especially in his MegaMan guise).

But weighed against all that benefit is the distaste I have for Git’s mechanics, best exemplified by the biting final word balloon of that cartoon.

Too many times I’ve been bitten by Git. 

My first regrettable experience came early on.  By accident, I committed a large file that took my commit over the 50 MB limit GitHub had (perhaps it still does – I don’t know and don’t care).  It was a careless mistake, but I was tired from doing real work.  I only realized the mistake when I tried to push from the local repo to GitHub.  No matter what hocus pocus I tried, I couldn’t get Git to forget about that commit.  Eventually I stumbled upon a magical incantation that remedied the situation, but to this day I don’t know what it was.

Chagrined by this incident, I tried to become better at Git, but the arcana just left me cold.  I’ve tried experimenting with Git on the theory that the best way to learn wasn’t to RTFM but to actually do.  Along these lines, I created phantom project structures with stub files and tried my best to move, reshuffle, rename, and undo.  One expert (see, I was reading) argued that Git could track file moves without being explicitly told, but it didn’t always work.  Another expert said renaming would be a breeze.  And so on.  Numerous pieces of advice but, in the end, only vice.

Critical readers, especially the haughty programmer types, are no doubt enraged by my lack of discipline in seeing an understanding of Git through to the end.  But I would ask them: do you understand the quantum mechanics that allows your beloved digital age to work?  Judging by my experiences teaching the subject, the answer is a resounding no.  People in general, and programmers in particular, are bad at understanding quantum mechanics.  Suppose that to operate a computer one had to understand semiconductors, quantum wave functions, and Fermi-Dirac statistics (just to name a few).  How many homes would boast a computer?  Suppose you had to understand orbital mechanics and General Relativity to use a GPS-based navigation system.  How many people would stay home (especially the haughty programmer types) for fear of getting lost?

No, I am neither lazy nor stupid.  I am smart enough to know that software can do better than the leaky abstraction Git foists on its user community.  I am cautious enough to use the magical incantations precisely so that I don’t get bloated commits or detached heads or whatever.  And I am industrious enough to put my time to better use than memorizing the Git data model.  If only I were as cute as Octocat, I would be set.