Latest Posts

Machine Classification: Part 2 – A Naïve Bayes Classification Algorithm

As discussed in the last column, classification can be a tricky thing.  Much of the machine learning buzz centers on classification problems.  Typical examples include things like optical character recognition (classify a set of pixels in an image as a particular character), computer vision and image processing (classify a region on the ground as flooded or not), and so on.

This column focuses on one of the more common classification algorithms: the naïve Bayes classifier (NBC).  Paraphrasing the Wikipedia article, the NBC is a simple technique that produces a model that acts as an agent which, by looking at some collection of features associated with an object, can place that object within the appropriate ‘bucket’.

To create a concrete example, we’ll use the scheme used by James McCaffrey in his June 2019 Test Run column entitled Simplified Naive Bayes Classification Using C#.  One can imagine that we are pawnbrokers in McCaffrey’s universe.  People frequently come in hawking jewelry and, given that we run a pawn shop, we should expect that some of our clientele are less than trustworthy.  We want to build a model that allows us to classify a gemstone as being real or fake based on its color, size, and style of the cut.

These three attributes of the gemstone will be the factors used to make the prediction and they are typically arranged in a list or array that is euphemistically called a vector (it is only euphemistically so as these lists don’t obey the accepted definitions for a vector space).  The gemstone vector will have 3 dimensions for color, size, and style.  Each attribute has various realizations as shown in this figure.

To develop our model we first have to pool together what we know based on the gemstones we’ve seen.  For example, if a kind-hearted woman who had fallen on hard times came in with a small, twisted aqua-colored stone that we verified was authentic then we would enter into our database the entry:

Aqua,Small,Twisted,1

where the ‘1’ means authentic or good.  If some shady character, acting all tough, came in with a small, blue, pointed stone that we reluctantly took and found out later was fake, we would amend our database to read:

Aqua,Small,Twisted,1
Blue,Small,Pointed,0

where the ‘0’ means fake or bad.  Proceeding in this fashion, we produce a training set for our agent to gain experience with as it develops its own internal model.  For this initial prototype, I used the 40-element training set provided by McCaffrey (of which the first two points are as shown above).

This kind of training is called supervised since we actually label each feature vector with the category into which it belongs.  It is worth noting that there isn’t a single Bayesian classifier but rather a family of related algorithms.  The basic concepts are the same but the particular way in which the training set is characterized leads to better or worse performance based on context.  In particular, all NBCs assume that the value of a given attribute is independent of the value of any other attribute, given the class.
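Stated as a formula (my paraphrase, not McCaffrey’s notation), the classifier scores a class $C$ for a feature vector $(x_1, x_2, x_3)$ as $P(C \mid x_1, x_2, x_3) \propto P(C)\,P(x_1 \mid C)\,P(x_2 \mid C)\,P(x_3 \mid C)$ and picks the class with the larger score; the Laplace smoothing that appears below keeps a single unseen attribute/class combination from zeroing out the whole product.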

Anyway, returning to McCaffrey’s NBC, the structure of his algorithm is most easily summarized in the following steps (the names of my Python routines to implement these steps are shown in parentheses):

  1. The training data is digested, the dimension of the feature vector is deduced, and the distinct values of each attribute are uniquely cataloged (find_distinct_values)
  2. The marginals of the distributions are determined (calculate_Laplace_smoothed_marginals), with an added nuance to handle attribute/class combinations that are not present in the training data
  3. Additional statistics are computed to facilitate the classification scheme (characterize_data_set)
  4. Finally, the model classifies a new instance of a gemstone based on its feature vector (calculate_evidence)

The primary data structure is the Python dictionary, which is built up around each attribute value discovered in the training set.  Obviously, this limits the NBC to classifying on known attributes.  In other words, if a ruby-colored gemstone came on the scene the agent/model wouldn’t know how to classify it.  This situation would be the same for us manning the pawn shop when a person whom we don’t know whether to trust comes in with such a stone.
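To make the structure concrete, after digesting only the two rows shown above the smoothed marginals dictionary would look something like the following (a hand-worked illustration under that two-row assumption, not program output):

{'Aqua':    np.array([1, 2]),   # [count alongside fake, count alongside authentic], both seeded at 1
 'Blue':    np.array([2, 1]),
 'Small':   np.array([2, 2]),
 'Twisted': np.array([1, 2]),
 'Pointed': np.array([2, 1])}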

The code for each function is listed here:

import numpy as np


def find_distinct_values(df, attributes_lst):
    # Catalog the distinct values observed for each attribute in the training set
    distinct_values = {}
    for attribute in attributes_lst:
        distinct_values[attribute] = set(df[attribute])

    return distinct_values

def calculate_Laplace_smoothed_marginals(df, distinct_values):
    # Count how often each attribute value occurs with each class label, starting
    # every count at [1, 1] (Laplace smoothing) so that an unseen value/class
    # combination never zeroes out the evidence product
    marginals = {}
    for attribute_type in distinct_values:
        for attribute in distinct_values[attribute_type]:
            marginals[attribute] = np.array([1, 1])

    for attribute_type in distinct_values:
        for attribute, authenticity in zip(df[attribute_type], df['authenticity']):
            marginals[attribute][authenticity] += 1

    return marginals

def characterize_data_set(df):
    # Tally how many fake (0) and authentic (1) samples appear in the training set
    fake_label        = 0
    true_label        = 1
    summary           = {}
    authenticity_data = df['authenticity']
    fake_counts       = len(np.where(authenticity_data == fake_label)[0])
    true_counts       = len(np.where(authenticity_data == true_label)[0])

    summary['num samples'] = fake_counts + true_counts
    summary['num fake']    = fake_counts
    summary['num true']    = true_counts

    return summary

def calculate_evidence(distinct_values, smoothed_marginals, summary, sample_values):
    # Combine the class priors with the smoothed per-attribute marginals to get
    # the normalized evidence that the sample is fake or authentic
    fake_label        = 0
    true_label        = 1
    num_attributes    = len(distinct_values)

    prob_fake         = summary['num fake']/summary['num samples']
    prob_true         = summary['num true']/summary['num samples']
    smoothed_num_fake = summary['num fake'] + num_attributes
    smoothed_num_true = summary['num true'] + num_attributes

    sample_evidence_fake = 1
    for attribute in sample_values:
        sample_evidence_fake *= smoothed_marginals[attribute][fake_label]/smoothed_num_fake
    sample_evidence_fake *= prob_fake

    sample_evidence_true = 1
    for attribute in sample_values:
        sample_evidence_true *= smoothed_marginals[attribute][true_label]/smoothed_num_true
    sample_evidence_true *= prob_true

    normalization = sample_evidence_fake + sample_evidence_true

    return sample_evidence_fake/normalization, sample_evidence_true/normalization
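
For completeness, here is a minimal sketch of how these pieces might be wired together.  The pandas DataFrame construction and the column names ‘color’, ‘size’, and ‘cut’ are my own assumptions about the layout of the training data; only the ‘authenticity’ column is dictated by the code above.

import pandas as pd

# Hypothetical layout of the training data: one row per gemstone, with the
# class label in the 'authenticity' column (1 = real, 0 = fake)
df = pd.DataFrame(
    [['Aqua', 'Small', 'Twisted', 1],
     ['Blue', 'Small', 'Pointed', 0]],   # ... plus the rest of the 40-element training set
    columns=['color', 'size', 'cut', 'authenticity'])

attributes_lst     = ['color', 'size', 'cut']
distinct_values    = find_distinct_values(df, attributes_lst)
smoothed_marginals = calculate_Laplace_smoothed_marginals(df, distinct_values)
summary            = characterize_data_set(df)

# Classify a new gemstone described by its three attribute values
p_fake, p_true = calculate_evidence(distinct_values, smoothed_marginals,
                                    summary, ['Blue', 'Small', 'Twisted'])
print(p_fake, p_true)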

Happily, the code reproduces the results of McCaffrey’s original article but preliminary tests with more varied training sets have been disappointing. 

Machine Classification: Part 1 – Overview

Classification is one of the key components of the modern AI toolkit in which a machine learning (ML) algorithm attempts to mimic the human ability to distinguish and categorize.  The idea is that the algorithm, when confronted with a new instance of an object, is to statistically determine the class or category into which this object best fits.   

Classification is one of those human activities that is deceptively simple.  For example, for decades people thought that the horseshoe crab was related to the crustaceans because it could be found in the ocean.  As biology progressed, it became clearer that the horseshoe crab wasn’t in the same class as the crustaceans and that it had more in common with arachnids, making it the ‘spider of the seas’.

The ability of an expert to determine the category into which an object belongs is also a subtle affair that is often as much art as it is science, as the following excerpt from Miss Marple’s speech in A Christmas Tragedy by Agatha Christie nicely describes:

It’s really a matter of practice and experience.  An Egyptologist, so I’ve heard, if you show him one of those curious little beetles, can tell you by the look and feel of the thing what date B.C. it is, or if it’s a Birmingham imitation.  And he can’t always give a definite rule for doing so.  He just knows.  His life has been spent handling such things.

Christie makes several important points in that brief passage.  First, there is the matter of ‘practice and experience’.  This translates in the domain of machine learning to training.  Second, she speaks of the Egyptologist ‘handling’ the beetle and judging by the ‘look and feel of the thing’.  This requirement corresponds to having a set of percepts about the object, a point that is, arguably, the trickiest.  The third point she raises is that the expert can classify the age of the object (‘what date B.C. it is’) or can spot a counterfeit.  Of course, this is the point of the ML algorithm in the first place: to be able to judge a new object expertly.  The fourth and final point is that the expert can’t always give a definite rule explaining how he judged.  There is no direct translation of this rule into the domain of machine learning, but more on that point below.

In thinking about an expert (Egyptologist or otherwise), we need to recognize that what makes him an expert is that he is more often right than wrong.  The context for the previous excerpt is the argument that Miss Marple, a spinster sleuth of uncertain age, makes about how ‘superfluous women’ (such as herself) who engage in ‘tittle tattle’ are ‘nine times out of ten’ correct, and ‘[t]hat’s really just what makes people so annoyed about it’.  So, we can’t expect our machine learning algorithm to be able to be 100% accurate, since no expert ever is; we can only hope that it is ‘accurate enough’. 

It is also very likely that the algorithm will never be as accurate as a human expert for the following reason.  In philosophical terms, machine classification overlaps the first and second Acts of the Mind (the Act of Understanding and the Act of Judgement) without, necessarily, being fully developed in either.  

For humans, the first act involves apprehending the percepts provided (‘handling’ the object to get an idea of its ‘look and feel’).  A baby is born with the ability to process his perceptions, to make some sort of sense of the input from the five senses.  In the second act, the person abstracts universals (or what, at least functionally, passes as such) from those sensory experiences to be able to understand ‘redness’ or ‘roundness’ or the being-qua-being of any other form.  These universals allow the human to then classify and sub-classify the objects in the surrounding world.

In contrast, the machine is taught about only a small subset of possible percepts (typically digital data representing an image or a time series).  Currently, no machine can expand or contract its attention when it realizes it needs to know more or is being blasted with too much information.  In addition, it only knows the categories that are used to train it. 

The human has a decided advantage in that he can expand or contract the number of attributes used in the classification on the fly (e.g., concentrating only on the weight and the texture first and then adding in color and style as needed later) and the human can invent new attributes at need (e.g., suddenly noticing that the size matters).  The machine has only two advantages: raw speed and the ability to handle an arbitrarily large number of attributes (although the number must be fixed for each situation).  As a result, the machine’s ability to classify is entirely based on some statistical or probabilistic measure.  The human’s ability to classify is surely rooted in probability as well, but what, if anything, else is going on is, at this time, anybody’s guess.

To be more concrete, consider the problem of spam emails.  Determining whether a given email is spam or not is a good example of a classification problem that illustrates some of the advantages and disadvantages on both sides.  The human can actually read the content of an email, comprehend the meanings, and judge the context (which may require consideration of different attributes compared to the previous email) before deciding whether the message is good or bad.  However, the human can only read a limited number of emails each day and is prone to getting bored or tired and making mistakes.  The machine can make sense of large amounts of the associated network data, be it IP addresses, message size, number of hops, and so on – data that would make little or no sense to the overwhelming majority of humans.  In addition, the machine can analyze a vast number of messages in the time it takes the human to read one.

Over the coming months, this column will look at some of the more popular ML techniques for classifying data and compare the pros and cons of each technique.  Some of the metrics for the comparison will be: the difficulty of assembling a training set (the data that gives the required ‘practice and experience’); whether the data need to be pre-labeled into classes (e.g., a real scarab or a Birmingham imitation) or whether we can allow the algorithm to find the possible classes based on how the data cluster; the accuracy of the method compared to truth; and the application domains in which experts use it.  In the end, we will have essentially a classification of classification algorithms.

Randomness and Structure (or Monkey Mayhem)

There is a lot of plain silliness surrounding how people talk about the role of randomness in producing structure in nature.  There is no denying that randomness either exists in nature or must be invoked in our models due to our ignorance (which of the two you choose is a matter of some philosophical debate).  However, there is also no denying that its role is far from understood, and that some of the brasher amongst us forget this from time to time and end up uttering the most ridiculous pronouncements.

The poster child for this sort of unalloyed over-zealousness is nicely examined and ridiculed in the following homework exercise from the book Thermal Physics by Kittel and Kroemer, who broached the role of pure randomness in their problem entitled The meaning of “never” (page 53 in the 2nd edition).  The beginning of the problem reads

It has been said (footnote: J. Jeans, Mysterious Universe, Cambridge University Press, 1930, p. 4.  The statement is attributed to Huxley) that “six monkeys, set to strum unintelligently on typewriters for millions of years, would be bound in time to write all the books in the British Museum.”…Could all the monkeys in the world have typed out a single specified book in the age of the universe?

While the attribution to Huxley by Jeans (presumably Thomas Henry Huxley) seems to be apocryphal, it is clear that a large segment of the population shares in the expressed sentiment that randomness eventually leads to structure.  The operative question is: should they believe in the power of fluctuations and chance?

To help in answering the posed question, Kittel and Kroemer ask the student to make the following assumptions:

  • The monkeys have 44 keys on their typewriters. Ignoring the use of the shift key, the breakdown is 26 letters, 10 digits, 8 punctuation marks.  Modern laptop keyboards seem to have something more like 10 punctuation mark keys but a smaller number of keys is better for the meandering paws of the monkeys as they have fewer ways of producing gibberish.
  • The primate population is $10^{10}$ monkeys, which corresponds to 10 billion simian typists, roughly 30% more than the number of people walking the planet at this moment
  • Allot $10^{18}$ seconds to our monkeys for their unorchestrated typing.  This time span totals up at about 31.7 billion years compared to the 13.8 billion years estimated age of the universe.
  • Each monkey can type 10 characters per second, which is fast even for trained typists.
  • The specified text is Hamlet, which has $10^5$ characters, and we ignore case.

Kittel and Kroemer calculate that the probability that any 100,000-character string chosen at random matches Hamlet is $10^{-164,345}$ and that the probability that the monkeys will produce Hamlet is $10^{-164,316}$, to which they state:

The probability of Hamlet is therefore zero in any operational sense of an event, so the original statement at the beginning of the problem is nonsense: one book, much less a library, will never occur in the total literary production of the monkeys.

But, before moving on to a deeper discussion about randomness and structure, it is useful to skim through the probability computations in a simplified setting.  To see the nuts and bolts, we will ask what the probability is of producing the inviting string ‘hello’ given a keyboard with only 14 keys corresponding to the letters {a,b,c,d,e,f,g,h,i,k,l,m,n,o}.  There are five characters in the target string, translating to five slots each with 14 possible choices, for a grand total of $14^5 = 537{,}824$ possible strings (one of which is shown below).

The only additional pieces used in the Kittel and Kroemer problem, which are not needed in this simple example, are the use of logarithms to handle the number of possible Hamlet-length strings, $44^{10^5}$, and the multiplication of the corresponding probability by $10^{29}$ to account for the number of attempts the monkey population can produce in the allotted time.
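
For those who want to check the arithmetic, here is a short sketch of my own (not code from Kittel and Kroemer) that reproduces both the simple ‘hello’ count and the Hamlet estimate using base-10 logarithms:

import math

print(14**5)                           # 537824 possible five-character strings on the 14-key keyboard
print(1 / 14**5)                       # probability that one random five-character string is 'hello'

log10_strings = 1e5 * math.log10(44)   # log10 of 44**(10**5), the number of Hamlet-length strings
print(log10_strings)                   # ~164345, so P(random string matches Hamlet) ~ 10**-164345

# 10**10 monkeys * 10 characters/second * 10**18 seconds = 10**29 characters typed in total
print(-log10_strings + 29)             # ~ -164316, the exponent of the monkeys' overall probability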

Numbers like these are, or should be, damning evidence to anyone who thinks structure arises solely from random fluctuations, and this problem has been a major hurdle that biologists have had to deal with when considering how life evolves.  After all, the idea that many people have about biological evolution is that, if one waits long enough, structure will appear, but, clearly, that can’t happen simply with blind chance.

So, how should one think about the role that randomness plays in physical and biological phenomena?  Richard Dawkins claims in The Blind Watchmaker that the key to randomness in biological processes is found in the concept of cumulative selection.  He brushes aside the objections raised above in his chapter entitled ‘Accumulating small change’.  In that chapter, Dawkins supports this claim by presenting his ‘weasel program’, which is a piece of code that takes a random string (produced, say, by one of the monkeys) and breeds a fitter string.  For example, one can take ‘aF!rty.opRSWi’ and breed it towards ‘Hello, World!’.

Dawkins is quite proud of his little tinkering and he uses it as a refutation of the idea of intelligent design.  However, his program is simply a probabilistic way of traversing a tree of all possible strings of the target length, moving from a random string at the bottom towards a specific string at the top using a fitness function and a rate of randomization, which he calls mutation.

The following figure illustrates the 27 possible strings that can be created from the simple alphabet {a,e,r}.  Each variation is color-coded based on how many mutations away it is from the target string ‘ear’, with green, yellow, and red corresponding to 1, 2, or 3 mutations, respectively.  The arrows show the connections between levels for one particular path to the target, passing through the single mutation separating ‘aar’ from ‘ear’.

For simplicity, assume that when the program breeds a brood from a given parent it can only make one mutation.  Then it can move the fitness up a level, down a level, or keep it the same (with obvious limitations at the top and bottom).  The action of selecting the fittest child is the equivalent of always moving up the ladder, provided a sufficient number of children are produced.  It is almost a certainty that the program will reach its goal.  And so, Dawkins argues, randomness amortized over generations produces the most complex structures we know – living organisms.
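
For the curious, here is a minimal sketch of a weasel-style run under my own simplifying assumptions (a fixed brood size, a per-character mutation rate, and the best member of each brood, parent included, always fathering the next generation); it is not Dawkins’s original code:

import random
import string

TARGET        = 'Hello, World!'
ALPHABET      = string.ascii_letters + string.digits + string.punctuation + ' '
BROOD_SIZE    = 100
MUTATION_RATE = 0.05

def fitness(s):
    # Deterministic fitness: how many positions already match the target
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(parent):
    # Each character independently has a small chance of being randomized
    return ''.join(random.choice(ALPHABET) if random.random() < MUTATION_RATE else c
                   for c in parent)

current    = ''.join(random.choice(ALPHABET) for _ in TARGET)
generation = 0
while current != TARGET:
    brood      = [current] + [mutate(current) for _ in range(BROOD_SIZE)]
    current    = max(brood, key=fitness)   # cumulative selection: keep the fittest
    generation += 1

print(generation, 'generations to reach', current)

With the parent retained in each brood the fitness can never decrease, so the run converges quickly, which is exactly the behavior Dawkins touts.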

But the weasel program merely underscores his sloppy thinking about randomness.  Dawkins’s process of cumulative selection is just that, a process that assumes a whole set of deterministic rules, of which the following are a sample:

  • The length of the target string is known a priori
  • The content of the target string is also known – it is the goal or telos of the algorithm
  • A deterministic fitness function exists that clearly finds the best string or strings from a set
  • At each generation a fixed number of children are produced
  • The best child is always chosen by the fitness function
  • The best child always fathers the next generation
  • The algorithm exists within a deterministic environment in which strings can be interpreted as commands, and those commands do the same thing time and again

With so much structure it is easy to see why a desirable outcome results in a fairly short amount of time.  The key point is that a clear set of underlying rules combined with some randomness is really the only way in which a stochastic process leads to structure.  In other words, it is the process that matters, and blind chance merely adds spice to the recipe.  How the underlying process comes about is something about which, curiously, Dawkins has little to say, but that hasn’t prevented untold numbers of people from being duped by his 20th century version of the ‘Huxley argument’ about randomness.  I guess monkey see, monkey do.

Boy-Girl Paradox and the Language of Probability

It is odd how the phrasing of a question changes the meaning and interpretation of probability-based situations.  Philosophically, we should expect a degree of fluidity because when we engage in thinking about and discussing probabilities we wander into the twilight zone of thought.  Clear distinctions between what is known and what can be known, between epistemological uncertainty and ontological uncertainty (sometimes called aleatory variability), and how and why we know are important if we ever want to emerge from the forest of confusion and doubt.

For a simple example of some of the complexity that can arise, consider the lowly coin flip.  Imagine you are at a friend’s house and the two of you are arguing over what movie to watch.  Your friend wants to watch Predator and you want to watch Alien.  You decide to settle the debate on a coin flip of the variety where he flips the coin, catches it, mashes it onto his arm with his hand covering it, and then he invites you to call it heads or tails.

Assuming the coin is fair, you reckon that there is a fifty-fifty chance that you’ll be enjoying Alien tonight while he just has to grin and bear it.  He then flips the coin and, as you contemplate the hidden disk upon which all your hopes and dreams ride (at least as far as this evening’s movie selection is concerned), you may be moved to say that the probability of heads is 0.5.  But in this you would be wrong.  The probability of the flip coming up heads before it is tossed is 0.5 (an example of ontological uncertainty), but after your friend has flipped and caught the coin there is a decided outcome.  The correct way of phrasing the situation is to say that the probability that you will guess the already selected result is 0.5 (an example of epistemological uncertainty).

Hopefully this simple example has clarified these points a bit.  Ontological uncertainty usually arises when making predictions of physical outcomes of an event with the traditional example being Aristotle’s sea battle.  Whether a sea battle will happen tomorrow is a statement that cannot have definitive truth value (either true or false) and is an example where the law of the excluded middle may be violated.  Epistemological uncertainty arises when making decisions about the past outcome of an event with limited knowledge with a corresponding example being whether the sea battle that happened today was a victory for one side or a defeat.

It is very easy to get confused on these points, and an excellent example of this controversy was raised by Zach Star in his YouTube video entitled This May Be The Most Counterintuitive Probability Paradox I’ve Ever Seen | Can you spot the error? from April 7, 2019. 

I don’t recommend watching the whole video precisely because Zach gets very contorted in the analysis of a variant of the Boy-Girl Paradox, but it is an important precursor to his follow-up video entitled The Boy or Girl Probability Paradox Resolved | It was never really a paradox from April 11, 2019.

Even in his clarification video, he goes to some effort to caution about his tenuous grasp of the right way to analyze the situation and why his earlier conclusions were wrong.

To explain where the tangle arises, let’s start with the most basic premise of the Boy-Girl Paradox that asks the following.  Suppose you meet a father in a bar and, in the course of conversation (say over gin and tonics), he reveals that he has two children.  What is the probability that he has two girls?

Well, assuming that boys and girls are equally likely, the probability is 0.25.  This conclusion is straightforward but is best presented in the following figure, which assumes that you’ve now met 10,000 such two-child families (and have run up a large bar tab).

This is a statement of ontological uncertainty.  That is to say that, in families that have birthed two children, the random process of sex determination will distribute the sexes such that the proportions shown in the figure result.

But in the context of the bar conversation, the probability is really epistemological in that we are trying to determine, based on the clues we pick up, what the probability is that we will guess correctly.  Since only 2,500 two-girl families are present in the population of 10,000 total families, the probability that we will correctly guess that a given father has two girls, given no other data, is one quarter or 0.25.

Now suppose that he lets slip that one of his children is a girl.  This revelation provides a bit more data, and so our expectation is that the probability should increase, and so it does, because we now get to exclude all the families with two boys.  Our two-girl families remain at 2,500 but the population against which they are measured as a proportion has dropped to 7,500, and the probability that we will correctly guess that the father has two girls rises to 1/3.  Let me underline this last distinction.  The probability that the father has two girls if he has two children is always 1/4 ontologically.  What we are doing at this point is narrowing our epistemological uncertainty.

Now comes the tricky part that initially caused Zach Star to stumble.  Suppose that a given father says one of his children is named Julie.  Star says that the probability that the man has two girls has risen to one half or 0.5.  He reasons to that conclusion this way.  Assume that the probability of a girl being named Julie is 1/100 (the actual probability value doesn’t matter but this value is convenient).  Then the set of one-girl families supplies 50 girls who fit the bill (on average, of course – that is why we took the number of families to be large to begin with, so that we could ignore fluctuations).  The set of two-girl families, while half the size when counted as households, supplies 50 girls as well, since each such family has two girls instead of one.  Ergo, the probability is 0.5.  And this change in probability is a paradox to him because how can knowing the name Julie make a difference?

This way of talking is sloppy for several reasons.  First, as pointed out before, the ontological probability never changes; what changes is our ability to guess properly, and that should go up or down as new information is provided.  Second, and more important, the reasoning is wrong.  Only half the fathers in the two-daughter set are going to randomly mention that they have a daughter named Julie, even if there are 50 Julies to be found.  That is because they have no incentive to select Julie over the other daughter, whatever her name may be.  If, however, we systematically poll each family and ask if they have a daughter named Julie then we will be sure to uncover all of the Julies in the two-girl set.  This polling process increases our knowledge and so it should decrease our epistemological uncertainty.
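
A quick Monte Carlo sketch makes the distinction between the two sampling processes visible.  The model below is my own rendering of the scenario (girls are named Julie with probability 1/100, and a father who volunteers information picks uniformly among his daughters when deciding which one to mention); it is not code from either of Star’s videos:

import random

P_JULIE    = 0.01        # probability that a girl is named Julie
N_FAMILIES = 1_000_000   # large enough to ignore fluctuations

def make_family():
    # Each child is a girl with probability 0.5; each girl is a Julie with probability P_JULIE
    return [('G', random.random() < P_JULIE) if random.random() < 0.5 else ('B', False)
            for _ in range(2)]

polled_two_girls  = polled_total  = 0   # ask every family: 'do you have a daughter named Julie?'
mention_two_girls = mention_total = 0   # father mentions one daughter at random and she is a Julie

for _ in range(N_FAMILIES):
    family    = make_family()
    girls     = [is_julie for sex, is_julie in family if sex == 'G']
    two_girls = len(girls) == 2

    if any(girls):                      # systematic polling uncovers every Julie
        polled_total     += 1
        polled_two_girls += two_girls

    if girls and random.choice(girls):  # the randomly mentioned daughter happens to be a Julie
        mention_total     += 1
        mention_two_girls += two_girls

print('polled  :', polled_two_girls / polled_total)    # tends toward ~1/2
print('mention :', mention_two_girls / mention_total)  # tends toward ~1/3

The polled ratio drifts toward one half while the volunteered-at-random ratio drifts toward one third, which is exactly the distinction between gaining knowledge by systematic filtering and merely overhearing a randomly chosen fact.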

It’s amazing how easy it is to get tangled up in probability.

What is Random

Call it happenstance or blind luck, but we all have a notional idea of randomness and the role it plays in our lives.  We see learned authorities invoke random processes in Darwin’s theory of natural selection, in the vicissitudes of the Stock Market, or in the efficacy of a pharmaceutical in treating an ailment or disease.  Surprisingly, a precise definition of randomness is often elusive and controversial.  Despite the very nature of randomness, this post will try to proceed deterministically through some of the discussions over the millennia, starting with the bigger questions and ending with the smaller ones.  Interestingly, there are deep connections between the smallest and largest questions that often sneak up on us.

At the top of the list is the very question of free will versus determinism.  If free will exists then random outcomes are possible but, conversely, if the universe’s evolution is predetermined then every outcome is part of a larger plan and only our own (predetermined) ignorance prevents us from perceiving it.  Much of the thinking here falls under the problem of future contingents.  One of the first philosophical scenarios used to explore future contingents was Aristotle’s example of the sea battle that may or may not be fought tomorrow.

Assume, for a moment, that the proposition ‘the sea battle will be fought tomorrow’ is unequivocally true.  Next, let’s turn our face from the future and consider the past.  If the proposition ‘the sea battle will be fought tomorrow’ is true today, then it was also true yesterday because the proposition’s truth value in the past was also resolved and locked in.  If it was true yesterday then it must also have been true the day before, and so on.  The same holds if the proposition ‘the sea battle will not be fought tomorrow’ is true.  The particular proposition is not at all important; what matters is that the truth value of a given future event is certain.  At this point we are forced to note that if a future event has a well-defined truth value then free will and/or randomness cannot be allowed.  Free will of the combatants would afford them the ability to cancel the battle, while a random event, say an unforeseen terrible storm, might arise and prevent one navy from arriving.

To get around the problem that a future proposition must have a definite state of true or false, as current propositions do, Aristotle created a third truth value for future events that regards them as neither true nor false but contingent.  This is the only violation of the law of the excluded middle that Aristotle allows.  Philosophers seem to love to argue about this (e.g., see the introduction to The Problem of Future Contingents by Richard Taylor, originally published in The Philosophical Review, 66 (1957), now accessibly reprinted in Philosophy for the 21st Century: A Comprehensive Reader, edited by Steven M. Cahn).

The interesting thing is that while philosophers argue over how well Aristotle’s argument addresses the problem of future contingents and all the concepts that flow from it (particularly randomness), mathematicians, engineers, and scientists have all assumed that randomness is inherent in reality.  For example, in a recent paper on Bell’s Theorem and random numbers, Pironio et al start by saying “Randomness is a fundamental feature of nature…”, already assuming, as a given, an ontological position still wrestled over by philosophers.

Of course, the very roots of probability theory date back to the 17th century and often centered on characterizing the randomness seen in games of chance. Indeed, it is hard to find any general text on the subject without encountering a coin flip, a dice roll, or the standard deck of cards.  The concept of the random variable doesn’t seem to have taken clear shape until the 19th century when the ideas of probability and expectation value were systematically expressed.  It isn’t easy to get a clear and firm history on the flow of ideas in this field and nailing down who contributed what and when is a task best left to others. 

The subject then jumped into overdrive in the 20th century, with ideas being applied to so many things it is hard to keep track.  The importance of randomness and decision making in the face of uncertainty became a central focal point in economics, control theory, and artificial intelligence.  A variety of distributions were invented, and continue to be invented, for understanding experimental data, for actuarial science and the setting of premiums in insurance, and for predicting the reliability and lifetimes of manufactured goods.  The concepts of algorithmic complexity/randomness and of random signals grew in the fields of computer science and electronics.  The field of dynamical systems changed how we look at random behavior in systems that exhibit chaos.  Perhaps the single most influential development was the theory of quantum mechanics, which is completely deterministic in its predictions right up to, but not including, measurement, at which point randomness and probability come straight to the front.

All that being said, one of the more interesting corners of the modern treatment of randomness is the generation of sequences of random numbers for a variety of computer applications.  Pseudo-random number generators (PRNGs) typically deliver the sequences used in simulations of what we believe to be random effects in nature (usually under some guise of the Monte Carlo method).  However, anyone who has used PRNGs professionally soon comes to suspect their performance in large-scale simulations, since they are really periodic algorithms whose short-term manifestation ‘looks’ random.  In more recent years, RANDOM.ORG has been producing random numbers based on what many believe are ontologically random events in nature (in this case atmospheric noise).  The figure below shows a comparison between a set of uniform and normally distributed random numbers (top and bottom, respectively) produced in numpy and generated by random.org (left and right, respectively).
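
For reference, the numpy half of such a comparison can be generated with just a few lines (a sketch of my own; the RANDOM.ORG samples would have to be downloaded from their site separately):

import numpy as np

rng = np.random.default_rng()

uniform_xy = rng.uniform(0.0, 1.0, size=(1000, 2))  # 1000 points scattered uniformly over the unit square
normal_xy  = rng.normal(0.0, 1.0, size=(1000, 2))   # 1000 points drawn from a standard normal in x and y

print(uniform_xy.mean(axis=0), uniform_xy.std(axis=0))  # ~[0.5 0.5] and ~[0.29 0.29]
print(normal_xy.mean(axis=0), normal_xy.std(axis=0))    # ~[0 0] and ~[1 1]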

To the naked eye they look similar.  Both systems produce ‘open spots’ and ‘clumps’ just as one would expect from a random process.  But the gap between looking good and being good is wide, and crossing it is difficult.  RANDOM.ORG’s page on randomness and the Wikipedia article on tests for randomness detail the various struggles in constructing adequate tests, but the long and short of it is that the difficulty lies in the fact that, after millennia of talking about chance versus fate, we still know far too little about how the world works.

Ad Hominem Ad Nauseum

It is common in the field of logic to throw objections at ad hominem attacks on an argument.  The objection is so strong that the use of an ad hominem attack in critiquing an argument is considered an informal or material logical fallacy.  For those who aren’t quite sure what ‘ad hominem’ means, an ad hominem attack is one in which criticism is leveled at the arguer, typically with respect to some deficiency in makeup or character, rather than at the argument.  The academically accepted approach (at least on paper) is to confine any criticism to an argument’s validity independent of who is presenting that argument.  Simply put, one should pick on the message and not the messenger.

But this approach, while it is charitable, is certainly one that can be called into question.  Consider the courts of law. The rules of evidence allow, during cross examination, the opposing counsel to call into question the credibility of the witness under the idea that the latter may be “biased”.  

The obvious objection to applying this ‘exception’ or ‘violation’ to the ad hominem fallacy is that legal proceedings only flirt with logic but are not based solely on it.  The recognition that findings of the law often don’t follow strictly from the laws of logic is well-known and annoying.  But is there really no logical rationale for attacking the credibility of the arguer?

Well, the attentive reader can obviously sense that I am going to argue that there are many occasions where attacking the arguer rather than the argument is clearly both a logical and a charitable action.  The rationale for throwing away the ad hominem fallacy is built on two pillars: 1) the primary one being the appeal to authority and 2) the lesser one being the sophistry that experts often employ.

Let’s explore each of these rationales in turn.

Appeal to authority is, itself, considered to be a fallacy, but anyone who argues that point must be a true occupant of the ivory tower because almost all arguments concerning the physical world involve an appeal to authority.  For example, how many of us have observed an electron by performing Millikan’s oil drop experiment?  Most of us depend on what ‘experts’ have discovered about the electron, including its mass, charge, and spin.  How many of us have analyzed DNA and the cell replication process?  And yet we believe overwhelmingly in the power of genetics, as evidenced by prenatal testing for birth defects and the popularity of 23andMe and similar tests.  How many of us have combed through the climate change data to verify anthropogenic global warming?  And yet we believe the Paris Accords are vital for our continued survival on planet Earth.

An overwhelming majority of the body of knowledge we claim to be accepted and common rests on the authority of experts.  The Wikipedia article on the appeal to authority features the following quote:

One of the great commandments of science is, “Mistrust arguments from authority.” … Too many such arguments have proved too painfully wrong. Authorities must prove their contentions like everybody else.
Carl Sagan, The Demon-Haunted World

Clearly Sagan was never an actual practicing scientist or, if he was, he was pretending that science was pure and noble when the actual historical evidence shows numerous ignoble events.  A small sample of this would include the fraud perpetrated by Jan Hendrik Schön, as disclosed in Plastic Fantastic, fraud that took in the majority of the condensed matter physics community based solely on his reputation (i.e., authority).  It would include the increasing inability of science to reproduce many of the published experiments in so-called peer-reviewed journals, as is documented in the very valuable review of the scientific paradigm by William A. Wilson in his article Scientific Regress.  Wilson’s opening line is particularly biting:

The problem with science is that so much of it simply isn’t

I can speak to the peer-review process with some authority (see, there it is again) as both a submitter and a reviewer.  Submittals and reviews are done by human beings who are motivated by reputational capital, possible fame and funding, and similar ‘human’ motivations.  Therefore the proper thing to do is to always question the motivations of any person who proffers an argument.

The purist would object to that last sentiment by reminding me that an argument stands on its own merit regardless of the reputation and motivation of the arguer.  But this objection applies only to deductive arguments, and not even all of these.  Deductive arguments form a very small part of ordinary life, even though academia thinks otherwise.  An additional point is that deductive arguments, at least by modern symbolic standards, have some appalling properties.

So, a prudent approach to any argument is to remain skeptical about the arguer.

Now on to sophistry.  The word sophist commands little if any respect, based on the classic condemnation of Plato, who described these teachers as practitioners of deception, and the description of Aristophanes, who characterized them as ‘hairsplitting wordsmiths’.  Sophists were known to be able to argue and support two contrary positions, using ambiguities of language to attain power rather than pursue truth.

Often our current societal dialog centers around some pundit whose arguments are a mix of one part ignorance or opinion and one part sophistry in an attempt to sway us to their positions regardless of truth.  Even when these modern liars use deductive logic, the premises usually rest on their authority.  So why shouldn’t we come to the table loaded for bear, ready to ‘shoot the messenger’ with an ad hominem attack, just in case he turns out to talk out of both sides of his mouth?

The Axiom of Choice

I was a graduate student when I first heard of the Axiom of Choice.  The mention, by one of the faculty, was in passing and, since it didn’t seem to impact my particular field and my desire to graduate was high, I didn’t pursue exactly what it was.  All that I was left with was a lasting impression of the awe (or was it frustration or, perhaps, disgust), for lack of a better term, that I heard in his voice.  That impression has stayed with me for years and I thought that this month’s column should be devoted to an introduction to the Axiom of Choice.

Years later, I am in a better position to understand the mix of emotions that came from that faculty member when it came to the Axiom of Choice.  Said simply, the Axiom of Choice may very well be the most controversial of topics in modern mathematics.

The Axiom of Choice is a part of axiomatic set theory, the branch of mathematics that studies sets and attempts to distill their essential properties, from the most mundane finite sets, to the more difficult but commonly used infinite sets like the number line, and beyond (transfinite sets).

Axiomatic set theory grew out of the desire to eliminate the paradoxes of naive set theory, the most famous example being Russell’s paradox.  Towards that goal, several axiomatic systems (or models) cropped up.  The most commonly used system is the Zermelo-Fraenkel model with the Axiom of Choice thrown in (denoted by ZFC).

The importance of the Axiom of Choice is immediately apparent in that it is the only axiom that has pride of place (or shame) next to the names of the men who founded this system.  The following table is a synthesis of the articles on the ZFC system from the Encyclopedic Dictionary of Mathematics (EDM-2) and the Wikipedia article on axiomatic set theory.  The first number is taken from the EDM-2 ordering, the second from Wikipedia; likewise for the name, with the EDM-2 nomenclature coming first and the Wikipedia name (if different) coming after.

Axiom Name | Content
1. (1) Axiom of Extensionality | Sets formed by the same elements are equal
2. (4) Axiom of the Unordered Pair (Axiom of Pairing) | For any sets A and B, a set X exists whose only elements are A and B
3. (5) Axiom of the Sum Set (Axiom of Union) | For any set A, a set X exists whose elements are all the elements of the elements of A.  For example, the union over {{1,2},{2,3}} is {1,2,3}
4. (8) Axiom of the Power Set | For any set A, a set X exists whose elements are all the possible subsets of A (i.e., the power set of A)
5. (8) Axiom of the Empty Set | There exists an empty set {} (this axiom is not included as a separate axiom in the Wikipedia article nor in the PBS Infinite Series video, but is mentioned in passing under Axiom 8 in the former)
6. (7) Axiom of Infinity | There exists a set having infinitely many members (e.g., one containing all the natural numbers: 0 = {}, 1 = {0} = {{}}, 2 = {0,1} = {{},{{}}}, and so on)
7. (3) Axiom of Separation (Axiom of Specification) | Any set X can be separated into two subsets, the first obeying some condition and the second not
8. (6) Axiom of Replacement | The image of a set under any definable function is also a set
9. (2) Axiom of Regularity (Axiom of Foundation) | Every non-empty set X contains a member Y such that X and Y are disjoint sets
10. (9) Axiom of Choice (Well-ordering Theorem) | Let X be a set whose members are all non-empty.  Then there exists a choice function f from X to the union of the members of X such that for all Y in X, f(Y) is in Y
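
In modern notation the Axiom of Choice is often written as the formal statement (a standard rendering, not taken verbatim from either source above) $\forall X \left[ \varnothing \notin X \implies \exists f\colon X \to \bigcup X \;\; \forall Y \in X \; \left( f(Y) \in Y \right) \right]$, which says exactly what the last row of the table says: every family of non-empty sets admits a function that picks one element out of each of them.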

On the face of it, the Axiom of Choice may seem to be as technically innocuous as its brother axioms and, indeed, it can be deduced from the other axioms when applied to finite sets.  The Axiom of Choice is distinct when dealing with infinite sets. And this is where the controversy arises.

Before discussing that point, it is worthwhile recalling that infinite sets are not strictly the preserve of the erudite mathematician.  Much of physics is built on the idea of an infinite set representing a physical process (e.g., Fourier series).  And infinite sets, or, more properly said, their avoidance, were used in classical philosophy for all sorts of arguments (e.g., Aquinas’s arguments for the existence of God).  So, the implications of the ZFC system may range further than simply a technical point on the handling of infinite sets.

That said, the controversy surrounding the Axiom of Choice is both serious and interesting.  The first controversy that the Axiom of Choice presents is the fact that, while it tells us that there is a way of selecting a member from each set, it doesn’t tell us how.  In other terms, the Axiom of Choice is not constructive and, so, we are left with the problem of saying that we can choose but we don’t know how the choice is made or even, sometimes, what is chosen.  It is as if a miracle happened.

A direct consequence of this miracle, as discussed in the PBS Infinite Series video, is that the Axiom of Choice can produce non-measurable sets from perfectly measurable ones.  As shown in the video, a set S has, by one argument, a measure between 1 and 3 but, by another argument, its measure must be zero.  So, S is a non-measurable set.

Even more disturbing is the Banach-Tarski Paradox.  The essence of this paradox is that a 3-dimensional ball can be taken apart into a finite number of disjoint pieces.  These pieces can then be reassembled to make two balls of the same size as the original.  We are supposed to be comforted by the fact that, in this process, the disjoint pieces are non-measurable sets.  Details explaining the process are found in the Vsauce video linked below.

I’ll close by saying that after really thinking about the Axiom of Choice and all of the controversy that comes in its wake, I now have a better idea about the mix of feelings of that academic mentioned at the beginning.  I also must confess a deep concern with mathematical logic applied to infinite sets.  Unfortunately, that concern seems to have no immediate mitigation, which means researchers and, for that matter, the whole human race still have a long way to go in understanding infinity.

The Three Acts of the Mind

The last post presented the concept of a hybrid AI or, perhaps more correctly said, a hybrid intelligent system which mixed and matched various tools, developed in computer science, to mimic the Three Acts of the Mind.  The idea is that to best mimic the intelligent behavior of humans one must understand what is being mimicked (the human mind) so that one may most closely align one’s mimicry (machine behavior) to the object being mimicked.  Of course, the last sentence is a bit fanciful in its language but its message is quite serious – to best duplicate the behavior of the human mind first figure out what is going on within the human mind.  And that is the point of this post.

To be clearer, neither the dark corners of the human mind, explored by Freud, nor the nearly inexplicable connections between minds and society at large, posited by Jung, nor the red hot passions presented in oh so many of the Greek tragedies are being pursued here.  Only that smallest sliver of human thought, the rational and ordered part, is being discussed because this is the only part that maybe can be mimicked by a machine.  And the description of the rational mind that will be employed is Three Acts of the Mind.

The go-to reference on this is Peter Kreeft’s Socratic Logic.  The entire textbook is built around the Three Acts of the Mind and concentrates on classical, Aristotelian logic rather than the usual symbolic propositional logic (more on this in a future post).  Despite the fact that the book boasts nearly 400 pages of content, the basic principles are summarized in a few pages in Section 5.  A condensed form of that summary is presented below.

The situation is bleaker when considering online references.  There isn’t a large amount of material to be found on the web concerning the Three Acts of the Mind.  One possible explanation is that the Three Acts are so obvious as first principles that there really isn’t anything to say about them.  But careful reflection should lead one to realize that this argument is vacuous, since it is the first principles of any philosophy that always receive the most scrutiny because they can’t be proved; they must either be accepted or rejected at face value.  Anyway, for those who can’t obtain Kreeft’s book, Dr. Christopher Anadale from Mount Saint Mary’s provides a nice succinct summary.

Both Kreeft and Anadale agree on the basic observations of human thought that underpin the Three Acts, namely that:

  1. Human beings think
  2. Human thought has structure
  3. That structure is objective and knowable

These three assumptions are the hunting license that allows us to go off and categorize the acts of the human mind.  Briefly stated, the Three Acts are (in order from least to most complex):

Act of Understanding – grasp one object of thought,  

Act of Judgement – combine two objects of thought into a proposition with a subject and predicate,

Act of Reasoning – combine two or more propositions into a reasoned argument producing a conclusion.

Several things should be noted for each act.  For the First Act of Understanding, it is important that the term or concept be clearly defined.  Nothing obscures human communication so much as two individuals using the same word to mean different things, and nothing is so tricky as a logical argument where the meaning of a term changes somewhere between the beginning and the end.  For the Second Act of Judgement there is little to be said in general.  The key point here is being able to see whether a proposition is true or false, and this is a difficult task with no fixed rules for figuring out the truth value of a claim.  For the Third Act of Reasoning, deductive arguments can be algorithmically determined to be valid or invalid.  Inductive arguments can also be analyzed, but only up to a degree of certainty, since an inductive argument can only render conclusions that are true within a confidence interval.  It should also be noted that the Act of Reasoning, with its arguments and conclusions, is the activity that actually produces new knowledge, conjectures, hypotheses, and actions.
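
To illustrate what ‘algorithmically determined’ means for the Third Act, here is a tiny brute-force validity checker of my own; it works in the symbolic propositional style rather than Kreeft’s Aristotelian forms, and the example arguments are mine:

from itertools import product

def is_valid(premises, conclusion, n_vars):
    # An argument is valid when no assignment of truth values makes every
    # premise true while the conclusion is false
    return all(conclusion(*vals)
               for vals in product([True, False], repeat=n_vars)
               if all(p(*vals) for p in premises))

implies = lambda a, b: (not a) or b

# Modus ponens: (p -> q), p, therefore q  -- valid
print(is_valid([lambda p, q: implies(p, q), lambda p, q: p], lambda p, q: q, 2))   # True

# Affirming the consequent: (p -> q), q, therefore p  -- invalid
print(is_valid([lambda p, q: implies(p, q), lambda p, q: q], lambda p, q: p, 2))   # False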

Now, the Three Acts can be a bit obscure when stated in an abstract way, so the following table, adapted from a table in Kreeft’s book, should help.  Note that some rows have been omitted, simply for brevity, and that a new row, proposing a possible machine equivalent, has been added.

 | 1st Act – Understanding | 2nd Act – Judgement | 3rd Act – Reasoning
Logical Expression | Term | Proposition | Argument
Linguistic Expression | Word or phrase | Declarative sentence | Paragraph
Example | book or football game | All books have pages with words.  A football game is a sport | All books have pages with words.  Too Many Cooks has pages with words.  Therefore, Too Many Cooks is a book
Structural Parts | None | Subject term and a predicate term | Premises (propositions) and conclusion (also a proposition)
Question Answered | What it is | Whether it is | Why it is
Aspect of Reality | Essence | Existence | Cause
Good When | Clear (unambiguous) | True | Valid
Bad When | Unclear (ambiguous) | False | Invalid
Possible Machine Equivalent | Recognized percept | Classification | Syllogism and action

Of course, there is a lot more to be said about the Three Acts but the above sets the generally agreed upon framework from which other analyses spring.

Hybrid AI

There is certainly a lot of excitement throughout the tech community about the promise of artificial intelligence or AI, as it is more commonly known.  And while many of the advances are impressive compared to where computer science was only a decade ago there is a lot more hype than fact in many of the more outlandish claims being made.  Skynet from the Terminator Movies or the Machines from the Matrix are not soon to take over nor are they likely to do so for many generations.  However, there is a distinct possibility that AI may actually be able to make competent decisions in the near future but only if the community takes a broader focus than it is apparently taking right now.

Now some may object that AI is enabling all sorts of important activities that would not otherwise work.  Reports of scientific discoveries, business approaches, and computational improvements abound, so why can’t one conclude that AI is making competent decisions now?  The crux of the matter is the definition of what AI is/does and what competent means.  To flesh out this argument, let’s take a small digression to discuss what is commonly meant by AI and what it is capable of doing now.

Many modern books on AI and a spate of YouTube videos extol the recent advances in AI, with particular attention on algorithms like convolutional neural nets and the vast improvements they offer in image classification, or on support vector machines and k-means++ algorithms and the power they offer in clustering data.  For example, one of my favorite videos on the subject is But what is a Neural Network? by 3Blue1Brown.

Grant Sanderson (the voice and vision behind 3Blue1Brown) starts off his video by discussing the remarkable functioning of the human visual cortex, which can recognize all sorts of different renderings of the number 3.  For example, each of the following glyphs shows the character “3” printed in a different font.

Most people (and their remarkable visual system) can tell that each character is a different rendering of the same number.  And, as Grant details in his video, convolutional neural networks seem to be able to perform the same recognition. 

He moves on to discuss how a neural net can be used to encode similar kinds of pattern recognition for a machine, allowing it to recognize edges, or loops, and so on, and how within its multiple layers can be found the ability to report, with very high levels of certainty, that each rendering corresponds to the same underlying digit.

This is quite a step forward in machine vision but does it really constitute artificial intelligence or decision making?  Sure, the algorithms can comb through vast amounts of data looking for a prescribed pattern and can ‘decide’ when they have found a candidate.  And these results should not be surprising because, after all, neural nets were designed to mimic the human cortex, and who is to say that the training the net receives and the way it decomposes images into parts doesn’t mimic what is done in the brain?

Despite those arguments, it is philosophically hard to say that the AI can even come close to making competent decisions. 

There are two reasons for this assertion, one technical and one philosophical.  On the technical front, the best operative definition of artificial intelligence is that of the rational agent taken from Artificial Intelligence: A Modern Approach by Russell and Norvig.  Under this definition, the machine must not only recognize the desired pattern (e.g., deciding that it sees a “3”) but it also has to perform an appropriate action based on that recognition.

To understand how a rational agent takes the appropriate action, Russell and Norvig assert that a rational agent receives a stimulus, called a percept (in the case above, the pixel values of an image of the rendering of “3”), and then acts so as to maximize the value of some performance metric based on the percept, the sequence of percepts up to that point (i.e., some notion of memory), its knowledge about its environment, and the rules that it has for actuating whatever actions it can take.

Simply pattern matching and ‘deciding’ that the pattern is either seen or not doesn’t really meet the definition of a rational agent.  For example, no sequence of previous percepts (excepting the initial training) is used by the machine learning techniques currently being enthusiastically pursued.  The current systems aren’t capable of continuous adjustment as situations change.  To pattern match a “3” the system needs to find two half-loops stacked one upon the other.  If, gradually, other styles, say the Roman numeral III, became popular in the percept stream, the net would be stymied.

In addition, it is difficult to call a binary sorting a ‘real decision’.  Certainly, it is useful to have such a sorting algorithm that can look at vast amounts of ‘noisy’ data and point out the parts that are of most interest, but the judgement of what is of interest still sits with the human element.  And, to be sure, this is an important step forward, but it doesn’t really constitute ‘learning’ or ‘rationality’ in the human sense.

And this brings up the second point.  Traditional philosophy recognizes the Three Acts of the Mind.  Roughly, the three acts break down as follows (using our “3” once again).  First Act:  the system recognizes the “3” in a sentence.  Second Act:  the system grasps the meaning of the sentence “Start the process at 3.”  Third Act:  the system can reason through a chain of statements to answer “Why didn’t the process start at 3?”  At best, what has been accomplished so far corresponds to the initial, baby steps into the First Act.

For a system to really exhibit some semblance of rationality it must mimic the three acts; there needs to be a hierarchy of different types of agents all working together.  A possible example would be a system like the one pictured below:

At the bottom would be some set of agents performing the First Act, perhaps two differently trained convolutional neural nets, or a convolutional neural net and a support vector machine, or whatever other combination one could imagine.  This layer, like the others, should be tool agnostic, focusing on what should be done, not how.  The second layer might have an expert system to interpret the percepts from the lower layer within the context in which the system finds itself (effectively answering Russell and Norvig’s requirement that the rational agent know its percept history and its environment).  This layer should also have some way of swapping the tools in the lowest layer or adjusting their operating parameters, allowing it to change how it looks at a problem; perhaps an Analytical Hierarchy Process (AHP) or a genetic algorithm could be employed to weight the performance of the tools.  Finally, the third layer should hold some set of tools for chaining the results from the second layer so that rational decisions can be made.  Here the tool set is far more speculative: a different kind of expert system, another AHP, or perhaps an A* algorithm could be used.  It really doesn’t matter what the tools are but rather what they do.  This seems to be the only blueprint for achieving a real AI. 
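To make the layering more tangible, here is a rough Python sketch of such a stack.  Every class and method name is hypothetical, and each layer could be backed by any of the tools mentioned above; the point is only how the layers hand results to one another.

```python
# A schematic of the proposed three-layer system.  All names are
# hypothetical; each layer is deliberately tool-agnostic.

class FirstActLayer:
    """Recognition: any ensemble of classifiers (CNNs, SVMs, NBCs, ...)."""
    def __init__(self, classifiers):
        self.classifiers = classifiers            # callables returning (label, confidence)

    def perceive(self, raw_input):
        # Report every recognizer's label and confidence for the raw percept.
        return [classify(raw_input) for classify in self.classifiers]


class SecondActLayer:
    """Interpretation: weigh the recognizers in context and retune them."""
    def __init__(self, weights):
        self.weights = weights                    # e.g. set by an AHP or a genetic algorithm

    def interpret(self, reports):
        # Fuse the lower layer's reports into a single contextual judgment.
        scores = {}
        for (label, confidence), weight in zip(reports, self.weights):
            scores[label] = scores.get(label, 0.0) + weight * confidence
        return max(scores, key=scores.get)


class ThirdActLayer:
    """Reasoning: chain interpreted facts toward a decision."""
    def __init__(self, chaining_engine):
        self.chaining_engine = chaining_engine    # expert system, A*, or whatever fits

    def decide(self, judgment):
        return self.chaining_engine(judgment)
```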


Pseudoscience and Science?


The roots of this post go back roughly three decades.  Shortly after I started full-time employment and had some money to spend, I decided to get a subscription to Scientific American.  At the time, I thought it was an excellent publication devoted to pure science.  With time I began to see the business side of it, the need to generate ‘science excitement’ even when there was perhaps none.  Over the years, I became much more savvy at seeing the ebb and flow, the highs and the lows, of its editorial policy.  And I learned as much, or even more, from the bad treatments as I did from the good ones, since the former challenge one’s ability to identify, isolate, and explain what went wrong.

One particular low, which caused me to let my subscription lapse, was a provocative piece on pseudoscience.  In principle, I have no beef with anyone taking sloppy thinking and evidence-poor arguments to task.  I have had, on many occasions, the need to roll up my intellectual sleeves and rip apart poor scientific logic, whether it was someone else’s (I’ll discuss a particularly interesting example later on) or my own.  But that’s not what the author did.  Rather than focus on the ‘truth content’ of the claims, he focused on the ‘stupidity’ of the ideas and made light of those who held them, and I have a big problem with that.

As a typical example, the author mocked the concept of alien abduction.  You may be remarking to yourself that alien abduction should be mocked; just look at the people who assert that they are victims.  But we shouldn’t be mocking people in general, nor should we mock an idea, no matter how far-fetched.  The assertion that people are taken off the earth, placed on alien spacecraft, and subjected to who knows what else is neither a scientific statement nor an unscientific one; it is ascientific.

Admittedly, the word ‘ascientific’ is one that I’ve coined for this discussion, but I think it serves the need well.  No simple statement that can be uttered, such as ‘the sky is blue’ or ‘the sky is red’, can have any scientific content, merely truth content.  We add the ‘science’ to these assertions by the way we investigate the claim made in the statement.  In other words, the word science should refer to the process by which we assign a truth content to an assertion, not to the truth of the assertion itself.

The ‘science’ begins with the very obvious but often forgotten steps of formulating a question and then gathering what evidence exists.  (The Wikipedia article on the scientific method is amazingly detailed and explanatory on these points, as well as on the others that span the general principles of the method.)  In the case of alien abductions, we take the claim at face value and look at what evidence exists.  Admittedly, the evidence its proponents muster is the worst kind, riddled with eyewitness testimony and scant bits of circumstantial material, and no reasonable analysis would credit it.

So, based on the evidence, we can conclude that there is no support for the statement ‘alien abductions take place’ and that the truth content of the claims of any supposed victims is low.  But that doesn’t mean we have disproved the statement as a whole, any more than Europeans of roughly 350 years ago could have proved the statement ‘some swans are black’ to be false.  The tests we perform merely set a limit on the likelihood of the particular assertions that have been made.
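To put a rough number on that kind of limit, here is a small, purely illustrative Python calculation using the classical ‘rule of three’: with zero positives in n independent trials, an approximate 95% upper confidence bound on the underlying rate is 3/n.  The claim count below is invented for the example.

```python
# Illustrative only: repeated negative evidence bounds a claim, it never
# disproves it.  The count below is invented for the example.

claims_examined = 200        # hypothetical number of abduction claims reviewed
credible_found = 0           # none of them held up

# Rule of three: zero successes in n trials gives an approximate
# 95% upper confidence bound of 3/n on the true rate.
upper_bound = 3.0 / claims_examined
print(f"~95% upper bound on the rate of credible claims: {upper_bound:.3f}")
# Prints about 0.015: small, but never exactly zero -- bounded, not disproved.
```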

To be more concrete, I’ll draw from personal experience and talk about two of my encounters with people who believed in the explanatory and predictive power of astrology.  Both of these encounters occurred when I worked for a small technology firm, and the people involved were highly trained STEM professionals (oddly enough, they didn’t know each other).  Once the individual in question brought up their belief in astrology, I thought I would put the power of the system to a test.  Rather than reveal my ‘star sign’ and get a stream of vague statements that could be interpreted in just about any way, I asked the person to predict my astrological sign given what they knew about my behavior, attitudes, and habits.  In both cases, the adherent started by suggesting what I might be and eventually ruled out the actual sign under which I was born.  They might have had better odds simply by spinning a zodiac spinner.

In both of these areas (alien abduction and astrological prediction), we are on firm ground saying that, since we’ve examined a large number of such claims and found little to no evidence supporting them, it is unlikely we will ever find anyone with a credible set of facts in their favor.  But we need to recognize that: 1) our conclusions are not deductive proofs but merely the products of statistical inference, and 2) the individuals making these claims are real people, and we must treat them with rudimentary respect if we ever hope to persuade them that their positions are very likely wrong.

Now some might suggest that I’m splitting hairs with this fine distinction between the claim and the truth of the claim.  What practical implications could warrant such careful treatment?  One prominent example springs to mind: the debate over whether to vaccinate.

Before digging in, let me state categorically that, in my opinion, the benefits of vaccines outweigh their risks and I endorse getting children vaccinated.  But I am not ignorant of the fact that in the current debate there are actually two pseudoscientific positions.  The first is the obvious one taken by the anti-vaccine crowd, best attested to by the recent and ongoing outbreaks of measles in the US.  The other pseudoscientific position is held by the medical professionals who assert that vaccines are perfectly safe by citing ‘medical studies’.

Much of what passes for medical studies, or statistical studies in most disciplines for that matter, itself borders on pseudoscience.  As William Wilson so nicely puts it in the opening paragraph of his May 2016 First Things article: “The problem with science is that so much of it simply isn’t.”  In a nutshell, his points are that most studies are unreproducible (and therefore unfalsifiable) and are built on the flimsiest of statistical interpretations.

The public is aware of these problems even if it is unable to articulate its concerns precisely enough to satisfy a philosopher.  It is hardly a surprise, then, when people hold pseudoscientific positions that cause unneeded misery and death.  The science community needs to put its own house in order by raising publication standards before turning its critical eye elsewhere.