THE NP-COMPLETE ARCADE

Here, for your amusement and experimentation, is a set of applets that demonstrate some well-known NP-Complete problems.

Updated 21st June 2011: Postscript

 

click image to go to applet page
Click on the image to go to the applet.
click image to go to applet page
The Million Dollar Sudoku Puzzle

Had you solved this 'super fiendish' puzzle, and posted your solution to The Times, you might have won yourself a bottle of bubbly. To earn that, you would have used a fair number of the deductive tricks that you've learned over the last few years

Alternatively, you could have used something like the Java applet shown on the left. It's a non-deductive sudoku solver.

What I mean is that the program doesn't use any sudoku strategies. It simply looks for a permutation that fits the basic rules, and the given numbers.

The generalised form of this puzzle, viewed this way, is one of a steadily growing number of problems, classed as NP-Complete, that are the focus of the PvsNP question. I won't attempt to explain this. You probably know at least as much about it as I do. There's an introduction in the Wikipedia. If you want the full works, read Stephen Cook's statement of the problem.

The Clay Institute of Mathematics are offering a prize of $1,000,000 to anyone who can answer the PvsNP question. It's one of the seven Millenium Problems.

The world of technology would give a great deal more.

This page is not concerned with it.

Regardless of whether or not the PvsNP question is answered, the theoretically hard NP-complete problems in its domain are easily solvable in many real-world circumstances.

Sudoku puzzles are created in the expectation that they will be solved. Generally, there is only one solution. Normally they are solved by deduction. They can also be solved, as they are here, using similar methods to those used to solve the Travelling Salesman Problem.

One thing that's said about NP-complete problems is that if you've solved one you've solved them all. There are some that say that sudoku is the one to concentrate on.

 

In common with all the applets on these pages, this one uses a random method. But, unlike the Music Tape applet (Misterioso below), this one doesn't simply try random arrangements until it finds one that fits. It wouldn't be possible to do this with today's technology.

There are 6,670,903,752,021,072,936,960 valid sudoku grid arrangements of the 9 sets of 9 numbers. To work through these at ,say, 1 million per second, would take 211 million years.

This applet doesn't look for the combination that fits....it shapes it. Often, it takes less than half a minute to do this. It seldom takes more than a couple. It's all a question of luck. You'll find quicker and surer sudoku solvers on the net, but this one's not trying to be clever.

Initially, the given values are set in their places in the matrix from the stock of 9 sets of the numbers 1 to 9. The rest of the stock is allocated sequentially to fill the matrix. The result is far removed from being a valid arrangement. The applet then embarks on a procedure whereby random incremental progressive changes to the non-given elements of the matrix cause it to evolve towards being a valid sudoku permutation. In the case of a a single-solution puzzle like the one here, the resultant permutation will be the 1-in-6,670,903,752,021,072,936,960 arrangement.

As happens with evolutionary systems, the applet can find itself in a blind alley where, apparently, no further small changes can yield improvement. It doesn't know for sure that it's stuck, but it assumes that, if it has gone through 100,000 interations and not come to the solution, it is in a cul-de-sac. To overcome this stalemate it partially scrambles the failed result, before resuming the process. The jerky progress of the applet is a manifestation of this.

When it reaches each millionth iteration point, it is about to restart after being stuck. You have the options of continuing, or resetting. My feeling is that it probably doesn't make a lot of difference which you choose.

 

Misterioso

Some years ago, I was writing a Java application to browse my collection of jazz recordings. One of the things I wanted do with it, was compile collections of tracks that played for exactly the amount of time that I specified. That was in the days of tape cassettes, when you aimed to fill each side exactly so that there was no need to rewind when you turned the tape over.

My idea was that the compilation routine would simply to work through random combinations of track times until it found one that fitted. If it didn't find one after 5000 tries, it would present the one that was closest. With thousands of tracks to choose from, I didn't think it would have any problem finding a set of tracks that exactly fitted the specified tape length.

I was surprised though, when I tested the program with just 20 tracks, that it came up with a set that fitted.

I've put the algorithm into the applet shown below. Click on the image if you want to play with it.

You can change the tape length and track lengths. If you blank out a track length, you knock it out of the process. You'll see that you can reduce the number tracks to 15 before there's any difficulty finding a combination of the right length.

 

click image to go to applet page

 

What I didn't know then, was that the problem of selecting a subset of numbers from a set, to exactly total a target value is one of the best known NP-Complete problems. It is the subset sum problem.

With just 20 tracks, there are just over one million possible combinations, so we're not in the going to blow any fuses in getting our answer, even if there happened to be just one solution.

The next applet applies the same algorithm to a set of 40 tracks. This gives us the much meatier number of 1,099,511,627,776 combinations to work through.

If you play with it, you'll see that, although we have a million times the number of possible combinations here, it doesn't take appreciably longer to obtain a subset totalling to the required tape length. This is because the proportion of such subsets to the overall number remains the same.

We can thin out this density, by changing track times to a value (eg 50:00) that is more than the tape length. If we do this to 25 of them, we'll end up with just one or two (or perhaps nil) qualifying subsets in among our trillion combinations. The first 20 tracks are the same as in the 20-track applet, so you can use that to check that there is a subset to be obtained.

I've limited the number of iterations to 1 million, so that you won't have the applet running on for hundreds of hours!.

 

click image to go to applet page

 

 

 

 

Sudoku 17

I've run my sudoku applet with one of Gordon Royle's Minimum Sudokus. These have a single solution with just 17 givens (or hints, or clues, whatever you want to call them). As my program doesn't use any sudoku strategies, I had expected it to stupidly chomp through it in a reasonably short time........ But it didn't.

I was disappointed at first, but now I'm just a bit amazed. It's not because of the single solution (not a problem up until now). Nor is it the scarcity of the givens (see sad puzzle to the right). It just seems to have found it too difficult*.

If you want to see the applet struggle with a minimum sudoku, go to my applet page. It's the second applet down.

* However, see Postscript lower down.

 

click image to go to applet page

 

 

click image to go to applet page

Sit Down You're Rocking The Boat!

This little trick really blows my mind. Dressed up as a clever balancing act, it is simply a special case of the subset sum problem. It's called the partition problem.

For this demonstration, you have 20 sealed crates to load into the hold of a ship. Their weights are random numbers in the range 1-50,000 units. You have to flip them, port and starboard, so that you have exactly the same weight (or a difference of 1 unit if the total weight is an odd number) on either side.

The applet allows you to do the partitioning manually if you feel up to it. It also allows you to alter the numbers that you've been given.

 

A Question Of Balance

This next applet will help you find some answers to questions that no doubt occurred to you while you were playing with the 'StevedoreStomp' applet above. What happens with larger numbers of crates, and bigger crates.

With this one you can increase the number of crates to 1 million, and the maximum crate size to 1 billion. You can run for as long as you're prepared to keep pressing the 'CONTINUE' button that pops up every 10 million iterations. You can also play around with the 'STOMP' factor, that controls the degree to which applet is focussed on its goal.

Here are some statistics to keep in mind if the applet doesn't seem to be getting anywhere......

Number of crates Time required to review all the ways of dividing them between the 2 holds
20 1 second
30 20 minutes
40 12 days
50 35 years
60 37,000 years
70 37,000,000 years
80 Twice the age of the universe
1,000,000 Don't bother thinking about it!

click image to go to applet page

 

That's not to say that balancing the cargo gets harder simply as you increase the number of crates. Indeed, under certain conditions it may get easier (see bit on 'stomp' below).

To balance a cargo of 1 million crates in the range 1 to 50,000 takes a few seconds..... around the same time as it does for 20.

The maximum crate weight seems to be a more significant factor. To balance 80 crates with max. weight 50,000 takes seconds. To balance the same number with max.weight 1 billion takes around half an hour.

The weights of the crates, that are randomly generated for this applet, range from 1 unit to the maximum number of units that you specify. As the range is increased, the difference between one crate and the next in weight also increases. It is these differences that have to be combined to cancel each other out. For a given range of differences, you need a minimum number of them to do the trick.

The table below shows results from a large number of runs, using an applet similar to the one here....but automated for multiple runs. It shows the maximum crate weight that can be exactly balanced without a single failure in the series of runs for each Number of Crates. I started out with 5,000 runs each, and cut it down to 1,000 after I'd passed the 25-crate mark.

 

Minimum number of crates

Half the number of possible arrangements

 Maximum  crate weight

Average number of  iterations 

n c=2(n-1) m i
20 524,288 15,000 31,835
21 1,048,576 35,000 72,669
22 2,097,152 69,000 142,112
23 4,194,304 140,000 308,916
24 8,388,608 283,000 631,578
25 16,777,216 564,000 1,251,452
26 33,554,432 1,128,000 2,651,895
27 67,108,864 2,267,000 5,291,363

 

There's a wide variation in the number of iterations it takes the applet to balance a cargo....even with successive runs on the same cargo. The figures in the columns 3 and 4 are a bit wobbly, but I think it's clear enough how they progress. As n is incremented by 1, the observed values of m and i are doubled. The calculated value of c is also doubled.  The value of m is approximately c/32 or 2(n-5), and i is approximately 2m or c/16 or 2(n-4).

Using these relationships, we can say that with 80 crates we can balance a maximum crate weight of 37,778,931,862,957,161,709,568, and that it would take, on average, double that number of iterations to do so.

If we reckon that we can work through 1,000,000 iterations per second, this would take 2,395,924,141 years.

Looking at it one way, we could say that taking this long to shunt 80 crates around to get them balanced, has all the hallmarks of an NP-complete problem.

Looking at it another way......if we had an applet that just counted and displayed up to the maximum crate weight, at 1 million per second, it would run for around a billion years. We wouldn't regard that as a NP-complete problem.

So, in this case, is the applet concerned with working through the combinations of 80 crates, or with balancing large numbers*, to get a successful result? Are these necessarily the same thing?

* See Postscript lower down.

A 23-digit number is not that impressive to look at, but it's a hell of a lot of beans.

I've written another version of the applet that allows you to set a minimum crate weight. I think this is is a bit more realistic. It allows you to work with big crates that vary relatively little in weight. The closer you set the minimum to the maximum, the more you apply the additional contraint that the number of crates on each side is the same. The amounts to be balanced are the excesses of the crate weights over the minumum, so the minimum number of crates and the iterations are reduced accordingly. Generally, the need to have the same number of crates on each side is not a problem.......unless of course you have an odd number of crates!

STOMP

The stomp value is the percentage number of regressive changes that are disallowed. All the above trials were carried out with a value of 50. At that value, the number of crates has very little effect on the resolution time.

The stomp value has a big effect on the running time when the number of crates is large. Here are the results of a series of runs (1000 per stomp value) for a maximum crate weight of 50,000.

 

Stomp Value Average iterations with 23 crates Average iterations with 100 crates Average iterations with 1,000 crates Average iterations with 1,000,000 crates
0 149,169 266,192 893,861 24,556,773
10 133,687 206,081 403,299 518,989
20 123,461 167,718 243,180 259,776
30 110,188 133,769 156,932 164,251
50 98,227 105,568 107,117 117,066
70 95,135 93,992 94,019 90,660
80 98,436 86,769 82,210 79,539
90 117,183 84,006 74,383 75,111
100 fails fails fails 79,432

 

 

Combinatorial Junior Tool Kit No.1

The applets that I've posted up until now were designed to explore the circumstances under which some np-complete problems become difficult.

Leaving aside the theoretical difficulty, many instances of these problems in real life are more or less easy to solve, depending on the sets of data they operate on. They can be difficult enough however, to require the help of some sort of tool.

The Junior Tool Kit contains a set of applets designed as tools that can be used by you to solve your problems using your data. My expectation is that generally they will give you an exact solution. Failing that, they should give you a solution with a very small error. If you've played with the applets above you'll have some idea of what to expect. Take them as you find them. If they help you, that's fine.... if not, then you probably need the Senior Tool Kit.

Don't hold your breath though...the Senior Tool Kit may never become available.

I'll start with the partition problem, because it's a fairly straight forward modification of the previous applet.

You just paste (or type) your set of integers (with or without associated identifiers) into the input text area, press a couple of buttons, and copy the resultant subsets from the output text area. It uses the CSV (comma seperated values) import/export format used by most spreadsheets.

Click on the image below to go to the applet page. You'll find full instructions there.

 

click image to go to applet page

 

Although the Target Difference, in the above applet, is ostensibly for the purpose of producing unequal partitions, it also provides the means of obtaining subsets that sum to a specified value. Maybe you've already used it that way. For each difference, there are 2 subsets. One of these can be regarded as the obtained subset; the other is then the residual subset. ie the original set minus that obtained. The relationship between the sum of the original set (S), the obtained subset sum (s), and the target difference (d) is:

s = (S - d)/2 or d = S - 2s

So, it should come as no surprise that the second tool in the box, the Subset Sum Tool, looks very much like the first.

 

click image to go to applet page

 



The Forgotten 400

I misunderstood the Clay Institute's 400-student problem, but I don't think I was the only one that thought the 100 students were to be accommodated in 50 double units. When I wrote a program to solve the wrong problem, I found it was very difficult to come up with a Dean's list that was even slightly difficult to resolve. Only when the list contained all but a few hundred of the possible 79,800 possible pairs did it become hard. I was unable to devise a list that was impossible without it being obviously so.

Then I thought 'that's too easy', re-read the problem description, and wrote a second program to populate a 100-student dormitary. The list started to get difficult with around 3,200 pairs. I began to understand what the 'hard' bit was.

That was 3 years ago. The other day, while I was testing the Clique Tool (below) I had the feeling that I was revisiting somewhere that I'd been before. Sure enough, googling on 'clay 400 student clique', I found that I was probably right about the dorm.

The applet below begins to find it difficult to form a clique of 100 at around the same point as my bespoke program did. A dean's list of 3200 is equivalent to an average 383 adjacencies in a graph of 400 vertices.

So here's tool #3....

 

click image to go to applet page

 

 

1962

The jazz buffs among you might have noticed that all the jazz tracks in my sample data for the subset sum tool were recorded in the same year...1962...nearly half a century ago. Stylistically, they can be identified as being from that time, but for those unfamiliar with jazz history, they could have been made any time since. You couldn't have said that in 1962 about King Oliver's recordings cut just 39 years previously, and sounding like carefully preserved relics from another age. Recording quality in the sixties was nearly as good as it ever was going to get. That's a subjective view of course. I'm not looking to argue with anyone about it.

Proctor and Gamble held the Car 54 TSP contest in 1962. The $10,000 prize required finding the shortest round tour of 33 specified cities in the US. My guess is that the winners used a large university computer. By large I mean that it took up a lot of real estate. They probably had it booked for a whole weekend. Although they were thought to be pretty awesome then, computers had not reached a plateau in their development. Forty years on, what was a weekend's computation can be completed in a fraction of a second. If Moore's Law continues to apply, computations that would now take a million years to complete will only take half a minute in 80 years time.

The doubling of speeds will bring more and more time-consuming processes into the realm of the instantly completable. Will this eventually make the PvsNP question irrelevant? No more than it already is, I think. We only need 60 members of a set of whatever we're playing with, for it to take a million years to examine all combinations. We can wait 80 years to do it in seconds, but we then only have to add another 40 members to be right back where we started. If we wait until the end of time, then maybe the question gets a bit flakey....but what doesn't?

It Ain't What You Do It's The Way That You Do It

When I started reading about the Travelling Salesman Problem I found much more accessible information about techniques used, than I had hitherto been able to find for the other problems I'd been working on. I found that the methods that I'd been using were not new, except perhaps in minor detail. I hadn't really expected them to be. It's not rocket science. You only have to ask yourself how you're going to solve a problem without working out how to do it. I can't solve sudoku puzzles. I didn't need to know how to solve them to write an applet that can. All I needed to know was what the problem was.

I'm an artist, not a mathematician. If I were wearing a conceptualist hat, I'd class these simple applets that randomly bring order out of chaos, as works (it's Turner Prize stuff if anything is). Even from a more traditional point of view, the output from my next applet, with its increasing economy of line, is pleasing to the artistic eye.

I said about the sudoku applet that there are better solvers around. You'll find plenty in the web that address the problem through its logic, rather than through chance mutations. The same could have been said the about TSP applet below if I'd gone ahead with my original intention to present an applet that used a similar method to everthing else I've put here up until now. But then I wrote another applet that used a more formal method. This uses an arbitrary insertions algorithm. I read about it in this paper. It works much better than my earlier applet. So figuring you'd want the the best, here's tool #4. Click on the image to go to the applet page. You'll also find my original applet there, along with some other goodies. TSPLIB format is used for input. Most of the TSP and ATSP examples in TSPLIB can be loaded.

 

click image to go to applet page

 



Random thoughts

Only 14 billion years?

My thought when I first heard that number as the estimate of the age of the universe was that it was far too small. 14 with nine noughts after it is a very small number in combinatorial terms. The apes whose job it is to type the complete works of Shakespeare will only get the first few words right in that number of years. Fortunately we don't have to wait for them to finish the job. Evolution already produced the bard, and he got it done around 400 years before typewriters were invented.

Richard Dawkins, in his Blind Watchmaker TV documentary, used a simple computer program to show how quickly a short phrase could be generated through incremental improvement, as opposed to the way the monkeys would do it. He was using it as part of his demonstration of how natural selection could give rise to complex components. In the course of the rest of the program, he convincingly shrugged off the creationists objections to Darwinism. But the monkeys, briefly introduced and dismissed, were left typing away.

The Laureate Ape Show runs and runs. The audience is promised that sooner or later one of the monkeys will type the first bardic capital letter, and from that point on will faultlessly reproduce the great works to the last fullstop. At the end of time the team will have typed not only Shakespeare's stuff an infinite number of times, but all the other great writings from Dostoevsky to Hank Janson, an infinite collection of highly acclaimed original works, and an infinitely larger collection of lesser works, incomplete works, and pure gibberish....the audience will have slipped away.....and the show will go on.

By, once in a sub-eternity, sitting down at the keyboard and getting it right straight off, the monkeys would be demonstrating the inevitability of creation. Bolstered by infinity, a process doesn't have to be efficient to a get a result. Luck rules. The improbable becomes a certainty. It's the result that counts.

In the infinitude* of time, our 14 billion years is but one of an infinite number of instants. It's unthinkable that just one of these has witnessed the birth of a universe. If it can happen once it can happen an infinite number of times no matter how unlikely it is for the right conditions to arise. Accepting that line of thought, I suppose that no estimate of the age of the universe should be surprising.

* I have in mind a common sense, atheistic, unscientific concept of endless time here.

******

A process that would take 14 billion years to complete if I started it on my computer now would be computed in a fraction of a second using a Moore's Law compliant computer in 2140. If that law continued to apply indefinitely then at the end of time we would have an infinitely fast computer. I'll stay clear of commenting on what that would mean. I'd be in danger of paraphrasing Douglas Adams. Anyway we're not going to get there.

Sometime in the finitely measured future, however, it will appear that for most practical purposes computers perform tasks instantly. My trial and error sudoku solver will appear to be as fast as the cleverest of those around. It won't be, but no one will care that it produces the answer in a microsecond rather than a nanosecond. In many fields computers are already close to the point where an increase in speed will not make them appear any faster. Google presents the sites that relate to your query in a blink of the eye. When you click on a high definition JPEG file you see the picture flash up instantly even though the computer is working sequentially through millions of instructions to display it. Computer games smoothly move around complex objects in immediate response to your play. There may be still room for enhancement, but it won't be many years before increased speed and definition would be in excess of what can be perceived.

But, so long as computers are less than infinitely fast, NP-complete problems will still require unimaginable amounts of time to solve. Unless someone finds a surefire method of checking out all combinations without going through them, these problems will continue to be in a category of their own.

Postscript

It's more than 4 years ago that I started putting together this page. Reading back over the early text I wrote around the applets, I encounter myself as one who could benefit from what I know.

In the 'Question of Balance' section, I explored the effect that the size of the numbers have on the partition problem. I set out my results without giving them much thought. I asked a question then that I can now answer. In the case in question, we are balancing large numbers....or rather, we're working to greater accuracy. We pay a price for this, but it's not exponential. If we want the answer to be 23-digit times more accurate, it will take a 23-digit times longer to obtain. In the examination the NP-completeness of the problem, the size of integers is something of a red herring. It has, however, relevance when it comes setting up an instance of the problem. By balancing the number of integers and their size, we can adjust the statistical incidence of the solutions. A mathematician would be able calculate this ratio to yield a given probability of solution. I would have liked to have been able to do this when I was testing the applet. I'd have liked even more to have been able to have created large sets with specific numbers of solutions. If I'd been able to do this, I'd now be quite wealthy.

The applet in that section works well, but has been made redundant by the Partition Problem Tool. The tools have turned out to be a popular attraction. I'm particularly pleased with the Clique Problem Tool. I've tested it with my own data, and with some fairly 'difficult' data from other sources. If anyone has a graph with a known maximum clique that it hasn't been able to find, I'd be glad to hear from them.

The music tape applet is the simplest one. The size of the numbers that it's working with, guarantee a high density of solutions. It simply draws tracks at random from the library until it gets an exact fit, or runs out of time. It's a trick that you might consider staying around to see the chimps complete. The cargo balancing puzzle applet works in a similar way.

The other applets use incremental improvement algorithms similar to that which I describe for the sudoku solver. I've lost track of the different strategies that I've played around with to get them to recover from being stuck and/or to stop them getting so. The question of what is changed and how much, and how progress is rated, for each iteration, has also been the subject of much experimentation.

None of the applets can tell you if there is no solution. The question is open as to whether any of them will reliably home in on a single solution. The sudoku applet seems to. With my newer faster computer, I now find that it does solve the minumum sudoku. It takes around 100,000,000 iterations. This is still a very small number when one considers the number of permutations that it's faced with (much more than the number of valid sudoku grids that I quoted). I don't know how it manages to do this. I'm pretty sure that the subset sum tool would not be able to locate a single solution under comparable odds.

Although the problems I've worked on here, all have the attribute of being NP-complete, I'm not really so interested in that aspect of them. Their theoretical property of being ultimately unsolvable, maybe be of vital interest to computer scientists and cryptologists, but I think the extent to which they are solvable is of more interest. More particularly, not being comfortable with too much analytical thinking, I'm much drawn to the notion that solving this sort of problem by trial and error becomes increasingly feasible as computer speeds double.

The popular articles that I've read over the last few years have all made a hard sell of the proposition that unless P=NP, in the absence of the implied polynomial-time algorithm, we are only able to obtain approximate solutions to NP-complete problems. The bit about this only being true when we're on the border of worst-case instances is in the small print. With these pages I've aimed to provide the means by which anyone can demonstrate for themselves the broad range of cases in which exact solutions can be easily found. Although the applications here are simple ones, they together represent a wide range of financial, industrial and logistical processes.

 

Some links:

Cool Math Sites - Loads of good stuff here.

 

 

Pi in the Sky

Pi in the Sky magazine is primarily aimed at high-school students and teachers, with the main goal of providing a cultural context/landscape for mathematics. It has a natural extension to junior high school students and undergraduates, and articles may also put curriculum topics in a different perspective.

Published by Pacific Institute for the Mathematical Sciences (PIMS). All issues are freely available on-line in PDF format. Aimed at schools in Alberta B.C. and Washington State, it's for anyone who enjoys mathematics....or wishes they did!

 

 

Knot a braid of links

 

Contact: john.lawrence@hagaregn.org.uk