THE NP-COMPLETE ARCADE

Here, for your amusement and experimentation, is a set of applets that demonstrate some well-known NP-Complete problems.

Updated 6th December 2009: Clique Problem Tool

 

click image to go to applet page
Click on the image to go to the applet.
click image to go to applet page
The Million Dollar Sudoku Puzzle

Had you solved this 'super fiendish' puzzle, and posted your solution to The Times, you might have won yourself a bottle of bubbly. To earn that, you would have used a fair number of the deductive processes that you've developed over the last few years

Alternatively, you could have used something like the Java applet shown on the left. It's a non-deductive sudoku solver.

What I mean is that the program doesn't use any sudoku strategies. It simply looks for a permutation that fits the basic rules, and the given numbers.

The generalised form of this puzzle, viewed this way, is one of a steadily growing number of problems, classed as NP-Complete, that are the focus of the PvsNP question. I won't attempt to explain this. I don't fully understand it. There's an introduction in the Wikipedia. If you want the full works, read Stephen Cook's statement of the problem. It was he who formulated the question.

The Clay Institute of Mathematics are offering a prize of $1,000,000 to anyone who can answer the PvsNP question. It's one of the seven Millenium Problems.

The world of technology would give a great deal more.

Regardless of whether or not the PvsNP question is answered, the theoretically hard NP-complete problems in its domain are easily solvable in many real-world circumstances.

Sudoku puzzles are created in the expectation that they will be solved. Generally, there is only one solution. Normally they are solved by deduction. They can also be solved, as they are here, using similar methods to those used to solve the Travelling Salesman Problem.

One thing that's said about NP-complete problems is that if you've solved one you've solved them all. There are some that say that sudoku is the one to concentrate on.

 

In common with all the applets on these pages, this one uses a random method. But, unlike the Music Tape applet (Misterioso below), this one doesn't simply try random arrangements until it finds one that fits. It wouldn't be possible to do this with today's technology.

There are 6,670,903,752,021,072,936,960 valid sudoku grid arrangements of the 9 sets of 9 numbers. To work through these at ,say, 1 million per second, would take 211 million years.

This applet doesn't look for the combination that fits....it shapes it. Often, it takes less than half a minute to do this. It seldom takes more than a couple. It's all a question of luck. You'll find quicker and surer sudoku solvers on the net, but this one's not trying to be clever.

Initially, the given values are set in their places in the matrix from the stock of 9 sets of the numbers 1 to 9. The rest of the stock is allocated sequentially to fill the matrix. The result is far removed from being a valid arrangement. The applet then embarks on a procedure whereby random incremental progressive changes to the non-given elements of the matrix cause it to evolve towards being a valid sudoku permutation. In the case of a a single-solution puzzle like the one here, the resultant permutation will be the 1-in-6,670,903,752,021,072,936,960 arrangement.

As happens with evolutionary systems, the applet can find itself in a blind alley where, apparently, no further small changes can yield improvement. It doesn't know for sure that it's stuck, but it assumes that, if it has gone through 100,000 interations and not come to the solution, it is in a cul-de-sac. To overcome this stalemate it partially scrambles the failed result, before resuming the process. The jerky progress of the applet is a manifestation of this.

When it reaches each millionth iteration point, it is about to restart after being stuck. You have the options of continuing, or resetting. My feeling is that it probably doesn't make a lot of difference which you choose.

 

Misterioso

Some years ago, I was writing a Java application to browse my collection of jazz recordings. One of the things I wanted do with it, was compile collections of tracks that played for exactly the amount of time that I specified. That was in the days of tape cassettes, when you aimed to fill each side exactly so that there was no need to rewind when you turned the tape over.

My idea was that the compilation routine would simply to work through random combinations of track times until it found one that fitted. If it didn't find one after 5000 tries, it would present the one that was closest. With thousands of tracks to choose from, I didn't think it would have any problem finding a set of tracks that exactly fitted the specified tape length.

I was surprised though, when I tested the program with just 20 tracks, that it came up with a set that fitted.

I've put the algorithm into the applet shown below. Click on the image if you want to play with it.

You can change the tape length and track lengths. If you blank out a track length, you knock it out of the process. You'll see that you can reduce the number tracks to 15 before there's any difficulty finding a combination of the right length.

 

click image to go to applet page

 

What I didn't know then, was that the problem of selecting a subset of numbers from a set, to exactly total a target value is one of the best known NP-Complete problems. It is the subset sum problem.

With just 20 tracks, there are just over one million possible combinations, so we're not in the going to blow any fuses in getting our answer, even if there happened to be just one solution.

The next applet applies the same algorithm to a set of 40 tracks. This gives us the much meatier number of 1,099,511,627,776 combinations to work through.

If you play with it, you'll see that, although we have a million times the number of possible combinations here, it doesn't take appreciably longer to obtain a subset totalling to the required tape length. This is because the proportion of such subsets to the overall number remains the same.

We can thin out this density, by changing track times to a value (eg 50:00) that is more than the tape length. If we do this to 25 of them, we'll end up with just one or two (or perhaps nil) qualifying subsets in among our trillion combinations. The first 20 tracks are the same as in the 20-track applet, so you can use that to check that there is a subset to be obtained.

I've limited the number of iterations to 1 million, so that you won't have the applet running on for hundreds of hours!.

 

click image to go to applet page

 

 

 

 

Sudoku 17

I've run my sudoku applet with one of Gordon Royle's Minimum Sudokus. These have a single solution with just 17 givens (or hints, or clues, whatever you want to call them). As my program doesn't use any sudoku strategies, I had expected it to stupidly chomp through it in a reasonably short time........ But it didn't.

I was disappointed at first, but now I'm just a bit amazed. It's not because of the single solution (not a problem up until now). Nor is it the scarcity of the givens (see sad puzzle to the right). It just seems to have found it too difficult.

If you want to see the applet struggle with a minimum sudoku, go to my applet page. It's the second applet down.

 

click image to go to applet page

 

 

click image to go to applet page

Sit Down You're Rocking The Boat!

This little trick really blows my mind. Dressed up as a clever balancing act, it is simply a special case of the subset sum problem. It's called the partition problem.

For this demonstration, you have 20 sealed crates to load into the hold of a ship. Their weights are random numbers in the range 1-50,000 units. You have to flip them, port and starboard, so that you have exactly the same weight (or a difference of 1 unit if the total weight is an odd number) on either side.

What really amazes me is that 20 randomly generated numbers of this size, are so amenable to being partioned in this way.

The applet allows you to do the partitioning manually if you feel up to it. It also allows you to alter the numbers that you've been given.

 

A Question Of Balance

This next applet will help you find some answers to questions that no doubt occurred to you while you were playing with the 'StevedoreStomp' applet above. What happens with larger numbers of crates, and bigger crates.

With this one you can increase the number of crates to 1 million, and the maximum crate size to 1 billion. You can run for as long as you're prepared to keep pressing the 'CONTINUE' button that pops up every 10 million iterations. You can also play around with the 'STOMP' factor, that controls the degree to which applet is focussed on its goal.

Here are some statistics to keep in mind if the applet doesn't seem to be getting anywhere......

Number of crates Time required to review all the ways of dividing them between the 2 holds
20 1 second
30 20 minutes
40 12 days
50 35 years
60 37,000 years
70 37,000,000 years
80 Twice the age of the universe
1,000,000 Don't bother thinking about it!

click image to go to applet page

 

That's not to say that balancing the cargo gets harder simply as you increase the number of crates. Indeed, under certain conditions it may get easier (see bit on 'stomp' below).

To balance a cargo of 1 million crates in the range 1 to 50,000 takes a few seconds..... around the same time as it does for 20.

The maximum crate weight seems to be a more significant factor. To balance 80 crates with max. weight 50,000 takes seconds. To balance the same number with max.weight 1 billion takes around half an hour.

The weights of the crates, that are randomly generated for this applet, range from 1 unit to the maximum number of units that you specify. As the range is increased, the difference between one crate and the next in weight also increases. It is these differences that have to be combined to cancel each other out. For a given range of differences, you need a minimum number of them to do the trick.

The table below shows results from a large number of runs, using an applet similar to the one here....but automated for multiple runs. It shows the maximum crate weight that can be exactly balanced without a single failure in the series of runs for each Number of Crates. I started out with 5,000 runs each, and cut it down to 1,000 after I'd passed the 25-crate mark.

 

Minimum number of crates

Half the number of possible arrangements

 Maximum  crate weight

Average number of  iterations 

n c=2(n-1) m i
20 524,288 15,000 31,835
21 1,048,576 35,000 72,669
22 2,097,152 69,000 142,112
23 4,194,304 140,000 308,916
24 8,388,608 283,000 631,578
25 16,777,216 564,000 1,251,452
26 33,554,432 1,128,000 2,651,895
27 67,108,864 2,267,000 5,291,363

 

There's a wide variation in the number of iterations it takes the applet to balance a cargo....even with successive runs on the same cargo. The figures in the columns 3 and 4 are a bit wobbly, but I think it's clear enough how they progress. As n is incremented by 1, the observed values of m and i are doubled. The calculated value of c is also doubled.  The value of m is approximately c/32 or 2(n-5), and i is approximately 2m or c/16 or 2(n-4).

Using these relationships, we can say that with 80 crates we can balance a maximum crate weight of 37,778,931,862,957,161,709,568, and that it would take, on average, double that number of iterations to do so.

If we reckon that we can work through 1,000,000 iterations per second, this would take 2,395,924,141 years.

Looking at it one way, we could say that taking this long to shunt 80 crates around to get them balanced, has all the hallmarks of an NP-complete problem.

Looking at it another way......if we had an applet that just counted and displayed up to the maximum crate weight, at 1 million per second, it would run for around a billion years. We wouldn't regard that as a NP-complete problem.

So, in this case, is the applet concerned with working through the combinations of 80 crates, or with balancing large numbers, to get a successful result? Are these necessarily the same thing?

A 23-digit number is not that impressive to look at, but it's a hell of a lot of beans.

I've written another version of the applet that allows you to set a minimum crate weight. I think this is is a bit more realistic. It allows you to work with big crates that vary relatively little in weight. The closer you set the minimum to the maximum, the more you apply the additional contraint that the number of crates on each side is the same. The amounts to be balanced are the excesses of the crate weights over the minumum, so the minimum number of crates and the iterations are reduced accordingly. Generally, the need to have the same number of crates on each side is not a problem.......unless of course you have an odd number of crates!

STOMP

The stomp value is the percentage number of regressive changes that are disallowed. All the above trials were carried out with a value of 50. At that value, the number of crates has very little effect on the resolution time.

The stomp value has a big effect on the running time when the number of crates is large. Here are the results of a series of runs (1000 per stomp value) for a maximum crate weight of 50,000.

 

Stomp Value Average iterations with 23 crates Average iterations with 100 crates Average iterations with 1,000 crates Average iterations with 1,000,000 crates
0 149,169 266,192 893,861 24,556,773
10 133,687 206,081 403,299 518,989
20 123,461 167,718 243,180 259,776
30 110,188 133,769 156,932 164,251
50 98,227 105,568 107,117 117,066
70 95,135 93,992 94,019 90,660
80 98,436 86,769 82,210 79,539
90 117,183 84,006 74,383 75,111
100 fails fails fails 79,432

 

 

Combinatorial Junior Tool Kit No.1

The applets that I've posted up until now were designed to explore the circumstances under which some np-complete problems become difficult.

Leaving aside the theoretical difficulty, many instances of these problems in real life are more or less easy to solve, depending on the sets of data they operate on. They can be difficult enough however, to require the help of some sort of tool.

The Junior Tool Kit contains a set of applets designed as tools that can be used by you to solve your problems using your data. My expectation is that generally they will give you an exact solution. Failing that, they should give you a solution with a very small error. If you've played with the applets above you'll have some idea of what to expect. Take them as you find them. If they help you, that's fine.... if not, then you probably need the Senior Tool Kit.

Don't hold your breath though...the Senior Tool Kit may never become available.

I'll start with the partition problem, because it's a fairly straight forward modification of the previous applet.

You just paste (or type) your set of integers (with or without associated identifiers) into the input text area, press a couple of buttons, and copy the resultant subsets from the output text area. It uses the CSV (comma seperated values) import/export format used by most spreadsheets.

Click on the image below to go to the applet page. You'll find full instructions there.

 

click image to go to applet page

 

Although the Target Difference, in the above applet, is ostensibly for the purpose of producing unequal partitions, it also provides the means of obtaining subsets that sum to a specified value. Maybe you've already used it that way. For each difference, there are 2 subsets. One of these can be regarded as the obtained subset; the other is then the residual subset. ie the original set minus that obtained. The relationship between the sum of the original set (S), the obtained subset sum (s), and the target difference (d) is:

s = (S - d)/2 or d = S - 2s

So, it should come as no surprise that the second tool in the box, the Subset Sum Tool, looks very much like the first.

 

click image to go to applet page

 



The Forgotten 400

I suppose that I misunderstood the Clay Institute's 400-student problem, but I don't think was the only one that thought the 100 students were to be accommodated in 50 double units, rather than in a single unit. When I wrote a program to solve the wrong problem, I found it was very difficult to come up with a Dean's list that was even slightly difficult to resolve successfully. Only when the list contained all but a few hundred of the possible 79,800 possible pairs did it become hard. I was unable to devise a list that was impossible without it being obviously so.

Then I thought that's too easy, re-read the problem description, and wrote a second program to populate a 100-student block. The list started to get difficult with around 3,200 pairs. I began to understand what the 'hard' bit was.

That was 3 years ago. The other day, while I was testing the Clique Tool (below) I had the feeling that I was revisiting somewhere that I'd been before. Sure enough, googling on 'clay 400 clique', I found that I was probably right about the dorm.

The applet below begins to find it difficult to form a clique of 100 at around the same point as my bespoke program did. A dean's list of 3200 is equivalent to an average 383 adjacencies in a graph of 400 vertices.

So here's tool #3....

 

click image to go to applet page

 



To be continued......

 

Some links:

Cool Math Sites - Loads of good stuff here.

 

 

Knot a braid of links

 

Contact: john.lawrence@hagaregn.org.uk