Summing Up is (NP) Hard to Do

One can’t really be alive, work with computers, writing algorithms to solve hard problems, and be ignorant of the importance of a good algorithm.  Most texts on scientific programming, which formed the bulk of what I read in my formative technical years, possess at least on cautionary tale about how slow a computation can be if the algorithm is bad.

One classic tale I like, which is often trotted out with gusto, is the calculation of the Fibonacci numbers.  As a reminder, the nth Fibonacci number is the sum of the two previous ones

Fn=Fn1+Fn2.

So to compute F100, one need to compute F99 and F98, each of which need the two below them.  Computing naively causes a lot of repetition since one computes F98 and F97 to compute F99 and F97 and F96 for F98 and so on.  Of course, a fast algorithm exists which essentially involves recording these values once computed, thus ensuring that they are computed only once in the recursion.

While compelling, this cautionary tale is a bit of a cheat.  Someone actually knows the best algorithm and it happens to be quite fast and, generally speaking, it isn’t all that interesting to compute the Fibonacci numbers oneself, since they are immutable.

Much more interesting are the situations that are dynamic.  Finding the best route to visit an arbitrary list of cities is the stuff of legends with which the famous Traveling Salesman Problem (TSP) wrestles.  This problem is often listed as hard, or rather NP-Complete, a class of algorithms that belongs to the larger class of NP-Hard problems.

Roughly speaking, NP problems (or which NP-Complete and NP-Hard belong) are computational problems that do not possess known algorithms that can find solutions in polynomial time.  What that means is that as the input data set, usually thought of as consisting of N entries, is increased (i.e. N grows) the computing time required to find a solution grows much fasters than any polynomial in N.  Usually the growth is exponential, although worse cases, such as combinatorial growth, also occur.  This is in sharp contrast with the P class of computational problems that do possess known algorithms that can find solutions in polynomial time.

The curious thing about NP problems are that while they are not quickly solvable, they are quickly checkable.  This phrasing means that while it takes a long time to find a solution for a large input data set, once the solution has been found it can be verified in a relative short period of time that scales as a polynomial in N.

This distinction is hard to understand and this difficulty is compounded by the fact that the NP class has three qualifiers making for NP, NP-Complete, and NP-Hard as well as containing the P class.  In addition, problems like the TSP are very specialized and complex making them very hard to relate to in general

NP_hard

Note: The reason NP-Hard lies partially outside the NP boundary is because some NP-Hard may not even be decidable. Undecidable problems possess no definitive answer to the computing question with which they are posed.  That such problems exist inescapably follows from Turing’s analysis of the halting problem.

So I searched for a simplified problem that I could sink my teeth into.  A problem that would be tractable enough to play with but still demonstrated the ‘NP-ness’ in a way that I could relate.  After some trial-and-error, it seems that Subset Sum Problem (SSP) fits the bill.

The SSP asks the following question.  Given a sum S and a set of integers Q is there a proper subset of Q that sums to S.   It is provable that the SSP is NP-Complete:  a solution exists (i.e. the problem is decidable) ; it is hard to find a solution, especially when Q has a lot of elements; and it is easy to check the solution once the answer is revealed.

That the SSP is decidable is actually trivial to see.  Given a subset PQ of integers, it is easy to sum them together and to test whether the result equals S.  By playing with a simple configuration for Q, it was easy to demonstrate that it is hard to just find a subset that sums to the desired value and that is equal easy to check once an answer is proffered.

Consider the following 3×4 block of integers between 1-9, inclusive.

3x4 grid

The sum of all of them is 53 and the smallest number in the set is 1, so it is also easy to answer no to the SSP when S is outside this range.  But what about the rest of the values?  Can any value between 1 and 53 be represented?  (It should be clear that 1 is represented trivially by one of the 1s in Q and that 53 is represented by the whole set).

For a test, can we find a subset that sums to 19?  That answer is obvious if one stares at the grid for a while.  Since there is a 9 in the grid, all one needs to find is two numbers that sum to 10 in order to demonstrate that 19 is a valid subset sum.  One such possibility is

subset_sum_19

How should we go about answering the general question?  This is where the NP part comes in.  In general, the best algorithms known are not polynomial in complexity.  I won’t try to prove this; I am simply stating the conclusion of the computer science community.   But being NP, it is easy to check that a proposed solution is valid in polynomial time.  For example, it may not be obvious that 47 is contained as a subset sum within the grid but the following shows it to be true

subset_sum_47

Each of the blocks of different shades of red sum to 10 (i.e.: 6 + 4; 9 + 1; 8 + 2; 5 + 4 + 1) and the light orange block is a 7. Checking that solution is easier than finding it in the first place or in finding an alternative (note that one trivial change can be done in 3 cell to get a different set that subset sums to 47 – hint – look at the blocks that remained white).

Now back to the general question as to whether any sum can be constructed.  Consider the grid now colored so that the red subsets sum to 30 and the white portion of the grid consists of the numbers 1, 2, 3, 4, 5, and 8.

all_sum

Since the numbers 1, 2, 3, and 4 can sum to every number on the range 1-10, we already have that all numbers from 1 to 40 are subset sums.  For example, 27 can be formed from the two pairs of tens (say 1 + 9 & 7 + 3) and from 3 + 4.  Finally since 8 + 2 = 10 and 8 + 3 = 11, the remaining sums from 41 through 52 are also achievable.  Thus we have proved that all values between 1 and 53 are achievable.

Note that the time it takes to confirm any of these values is much shorter than the time it would take an algorithm to find these solutions.  That is what it means to be NP-Complete.

Leave a Comment