Introduction
In this section we introduce the ‘0/1 Knapsack’ problem.
The 0/1 Knapsack Problem and Logistics
Suppose an airline cargo company has one aeroplane, which it flies from the UK to the US every day to transport cargo. Before each flight, it receives bids for deliveries from (many) customers.
Customers state:
• the weight of the cargo item they would like to be delivered;
• the amount they are prepared to pay.
The company must choose a subset of the packages (bids) to carry in order to make the maximum possible profit, given the total weight limit that the plane is allowed to carry.
In mathematical form the problem is: Given a set of $N$ items, where item $i$ has weight $w_i$ and value $v_i$ for $i = 1$ to $N$, choose a subset of items (e.g. to carry in a knapsack, or in this case an aeroplane) so that the total value carried is maximised and the total weight carried is less than or equal to a given carrying capacity, $C$. As we are maximising a value subject to some constraints, this is an optimisation problem.
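Writing $x_i = 1$ if item $i$ is carried and $x_i = 0$ if it is not (the decision variables $x_i$ are notation introduced here for illustration), the same problem can be stated compactly as:

```latex
\max_{x \in \{0,1\}^N} \; \sum_{i=1}^{N} v_i x_i
\qquad \text{subject to} \qquad \sum_{i=1}^{N} w_i x_i \le C
```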
This kind of problem is known as a 0/1 Knapsack problem. A Knapsack problem is any problem that involves packing things into a limited space or a limited weight capacity. The problem above is “0/1” because we either carry an item (“1”) or we don’t (“0”). Other variants allow us to take more than one copy of an item, or only a fraction of it. For a description of the fractional problem, see Algorithm Design and Applications, p. 498, or the briefer description in Introduction to Algorithms, p. 353.
An Enumeration Method for solving 0/1 Knapsack
A straightforward method for solving any 0/1 Knapsack problem is to try out all possible ways of packing/leaving out the items. We can then choose the most valuable packing that is within the weight limit.
For example, consider the following knapsack problem instance:
Sample Input
3
1 5 4
2 12 10
3 8 5
11
The first line gives the number of items; the last line gives the capacity of the knapsack; the remaining lines give the index, value and weight of each item e.g. item 2 has value 12 and weight 10.
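As an illustration of this file format, here is a minimal Python sketch of a reader for it. The function name parse_instance is hypothetical and this is not the provided support code; as in the support code, index 0 of each array is left as None so that item i is stored at index i.

```python
def parse_instance(text):
    """Parse an instance in the format described above (hypothetical helper).

    Returns (values, weights, capacity). Index 0 of values and weights is
    None, so item i is stored at index i.
    """
    # Split every non-blank line into whitespace-separated fields.
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    n = int(rows[0][0])                      # first line: number of items
    values = [None] * (n + 1)
    weights = [None] * (n + 1)
    for row in rows[1:n + 1]:                # next n lines: index value weight
        index, value, weight = map(int, row)
        values[index] = value
        weights[index] = weight
    capacity = int(rows[n + 1][0])           # last line: capacity
    return values, weights, capacity


SAMPLE = """3
1 5 4
2 12 10
3 8 5
11
"""

print(parse_instance(SAMPLE))  # ([None, 5, 12, 8], [None, 4, 10, 5], 11)
```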
The full enumeration of possible packings would be as follows:
Items Packed   Value   Weight   Feasible?
000            0       0        Yes
001            8       5        Yes
010            12      10       Yes
011            20      15       No
100            5       4        Yes
101            13      9        Yes   OPTIMAL
110            17      14       No
111            25      19       No
The items packed column represents the packings as a binary string, where “1” in position i means pack item i, and 0 means do not pack it. Every combination of 0s and 1s has been tried. The one which is best is 101 (take items 1 and 3), which has weight 9 (so less than C = 11) and value 13. We can also represent a solution as an array of booleans (this approach is taken in the Java and Python stubs).
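The enumeration in the table can be reproduced with a short Python sketch (an illustration only, not the provided enum program; the function name is hypothetical). itertools.product generates the binary strings in the same order as the table, with the leftmost digit standing for item 1.

```python
from itertools import product

# The sample instance above; index 0 is unused so item i is at index i.
values = [None, 5, 12, 8]
weights = [None, 4, 10, 5]
capacity = 11
n = 3


def enumerate_packings(values, weights, capacity, n):
    """Yield (bitstring, value, weight, feasible) for all 2**n packings."""
    # product yields (0,0,0), (0,0,1), ... i.e. the table's row order;
    # bits[k] says whether item k + 1 is packed (leftmost digit = item 1).
    for bits in product((0, 1), repeat=n):
        value = sum(values[k + 1] for k, b in enumerate(bits) if b)
        weight = sum(weights[k + 1] for k, b in enumerate(bits) if b)
        yield "".join(map(str, bits)), value, weight, weight <= capacity


rows = list(enumerate_packings(values, weights, capacity, n))
for row in rows:
    print(*row)

# The optimal packing is the feasible row with the highest value.
best = max((row for row in rows if row[3]), key=lambda row: row[1])
print("best:", best)  # best: ('101', 13, 9, True)
```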
Some vocabulary
• A solution: Any binary or boolean array of length N is referred to as a packing or a solution; this only means that it is a correctly formatted instruction saying which items to pack.
• A feasible solution: A solution whose total weight is less than or equal to the capacity C of the knapsack.
• An optimal solution: The best possible feasible solution (in terms of value).
• An approximate solution: A feasible solution with a high value, but not necessarily optimal.
In this lab we will investigate some efficient ways of finding optimal solutions and approximate solutions.
Description
This lab asks you to implement four different solutions to the 0/1 Knapsack problem over three weeks. We have provided partial solutions and it is your job to complete them.
Important Note 1: The C support code represents Knapsack solutions as a bitstring (an array of 0s and 1s) indicating whether an item should (1) or should not (0) be packed into the knapsack. The Java and Python code represent solutions as arrays of booleans (True and False). The Java and Python support code converts the boolean values to 0 or 1 for printing so that results are presented uniformly. In what follows we will sometimes use T as shorthand for True and F as shorthand for False.
Important Note 2: In the input files, items for the Knapsack are numbered from 1 to N. The support functions read these into arrays of size N+1 (one for item weights, one for item values and one to map the index of the item in the input file to that in a sorted array (for the algorithms where sorting by value/weight ratios is useful)). These arrays all have a null or None value (depending on language) as the 0th element of the array. In this way the array indices match up with the numbering in the input files, but it does mean that functions and methods working with these arrays have to account for the irrelevant 0th element. This is easier in C where you can simply pass a pointer to the element of the array at position 1 than in Python or Java where you need to start any iterations etc., explicitly at element 1.
Task 1a: Full Enumeration
Time Budget: Task 1a is primarily intended to help you familiarise yourself with the input files and running the knapsack program. You should not spend more than an hour on this task and it is not needed for later tasks in this coursework. If you are stuck on this for some reason but are confident that you understand how our input files work, how problems are represented internally in the program and how the program can be run, then we would recommend you move on to the rest of this coursework after an hour.
We have provided an implementation of the enumeration method for the Knapsack problem in each language. You can (compile, if necessary, and) run this program on data/easy.20.txt. The program enumerates the value, weight and feasibility of every solution and prints them to the screen. However, it does not “remember” the best (highest value) feasible solution or display it at the end.
- Adapt the code so that it does that. NB. on data/easy.20.txt this should compute a solution value of 377.
- It would also be useful to display how much of the enumeration has been done – like a progress bar. Add code to the enumeration loop to print out the fraction of the enumeration that is complete and the value of the current best solution. Note: if you update the progress bar too often it will slow you down a lot. You may want to think about how often you update it if you want a more efficient program.
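Both adaptations can be sketched in Python as follows. This is not the provided enum_kp.py; the function name and the report_every parameter are hypothetical, and the reporting interval of 2^16 packings is an arbitrary starting point to tune, not a recommended value. Progress is written to stderr with a carriage return so that the line is overwritten in place.

```python
import sys


def enumerate_with_progress(values, weights, capacity, n, report_every=1 << 16):
    """Full enumeration that remembers the best feasible packing.

    values/weights are 1-indexed (index 0 unused). Packings are the integers
    0 .. 2**n - 1; bit (n - i) of an integer says whether item i is packed,
    so formatting it in binary reads item 1 first, as in the table above.
    Returns (best_value, best_bitstring).
    """
    total = 1 << n
    best_value, best_mask = -1, None
    for mask in range(total):
        value = weight = 0
        for i in range(1, n + 1):
            if (mask >> (n - i)) & 1:
                value += values[i]
                weight += weights[i]
        # Remember the packing if it is feasible and beats the best so far.
        if weight <= capacity and value > best_value:
            best_value, best_mask = value, mask
        # Printing on every iteration would slow the loop down considerably,
        # so only report every `report_every` packings (a value to tune).
        if mask % report_every == 0:
            print(f"\r{mask / total:7.2%} done, best so far {best_value}",
                  end="", file=sys.stderr, flush=True)
    print(f"\r100.00% done, best so far {best_value}", file=sys.stderr)
    return best_value, format(best_mask, f"0{n}b")


print(enumerate_with_progress([None, 5, 12, 8], [None, 4, 10, 5], 11, 3))  # (13, '101')
```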
You may change the value of the QUIET variable to suppress output to the screen and make the code run faster. HOWEVER please make sure that in your submitted version we can see both the progress bar and the final solution – i.e., that the value of QUIET is not suppressing this output.
Running on our Example Inputs
In the data directory you will find four files:
easy.20.txt easy.200.txt hard.200.txt hard.2000.txt
In each case the number in the file name (20, 200, 2000) indicates how many items the example wants you to put into the knapsack. Your enumeration algorithm will probably only manage to solve the 20 item knapsack. The correct optimal answers for the problems are 377 (easy.20.txt), 4077 (easy.200.txt), 126968 (hard.200.txt), 1205259 (hard.2000.txt).
Blackboard Submission
Once you have completed this implementation you should look at the Blackboard submission form where you will be asked the following questions about Part 1a.
- How did you implement remembering the best solution - include your code? (no more than 100 words - not counting code snippet).
- How did you implement the progress bar? How often did you choose to refresh this and why? (no more than 100 words - not counting any code snippets).
Task 1b: Dynamic Programming
Time Budget: Task 1b should take you 1-2 hours to complete. Dynamic programming can sometimes be a bit fiddly so you might want to budget this time to include one of the drop-in sessions so you can access GTA help if necessary. If you are going over 2 hours, it is probably worth moving on to the next part of the coursework, but you will need either a dynamic programming solution or a branch-and-bound solution for the final part of the coursework, so bear this in mind.
Unlike the enumeration approach (which generates all possible solutions and picks the best), dynamic programming approaches iteratively compute the solutions to larger and larger subproblems using the results of smaller subproblems, until we have the solution to the overall problem.
Complete the program that solves the 0/1 Knapsack Problem by dynamic programming, using the program stubs and support files we provide for you.
The Dynamic Programming Solution
There is a detailed explanation of the dynamic programming approach to the 0/1 Knapsack Problem with examples on p. 343-345 of Algorithm Design and Applications. In brief the solution is as follows:
Identify sub-problems in a tabular form We will use a two-dimensional array, $V$, for our sub-problems. $V[i][w]$ is the maximum value we can achieve with the first $i$ items in our input list using at most weight $w$. If we have a list of $N$ items and a capacity of $C$, then once we have computed all values in the array $V$, the entry $V[N][C]$ will contain the optimal value of the items that can fit into our aeroplane.
Initialise the array If we have no items then our optimal value is 0, so $V[0][w] = 0$ for all $0 \le w \le C$. All other values in the array we initialise with null or None (depending upon the language).
Recursive Step The maximum value we can make with the first $i$ items and a weight limit of $w$, where the value of the $i$th item is $v_i$ and its weight is $w_i$, is the larger of:
1. the maximum value we could make with the first $i - 1$ items (ignoring the $i$th item); and
2. the value of the $i$th item, $v_i$, plus the maximum value we could make using the first $i - 1$ items and a maximum weight of $w - w_i$ (this option only exists when $w_i \le w$).
That is, $V[i][w] = \max\left(V[i-1][w],\ v_i + V[i-1][w - w_i]\right)$ for $1 \le i \le N$, $0 \le w \le C$.
The table $V$ only tells us the best value we can get after considering $i$ items; it doesn’t tell us which items we added. We can update the algorithm to keep track of this information by using an auxiliary array, $keep$, where $keep[i][w]$ records whether the $i$th item is used in the maximal solution for $V[i][w]$. If $keep[i][w] = 0$ then we know that the $i$th item has been ignored and this maximal solution has been constructed from the maximal solution for $V[i-1][w]$, so we should use $keep[i-1][w]$ to find out which other items are included. If $keep[i][w] = 1$ then we know that the $i$th item has been included and this maximal solution has been constructed from the maximal solution for $V[i-1][w - w_i]$, so we should use $keep[i-1][w - w_i]$ to find out which other items are included.
The following pseudo-code outlines the complete solution
KnapSack(v, w, N, C) {
  for (w = 0 to C) V[0][w] = 0
  for (i = 1 to N)
    for (w = 0 to C)
      if (w[i] <= w) and (v[i] + V[i - 1][w - w[i]] > V[i - 1][w]) {
        V[i][w] = v[i] + V[i - 1][w - w[i]]
        keep[i][w] = 1
      } else {
        V[i][w] = V[i - 1][w]
        keep[i][w] = 0
      }
  K = C
  for (i = N downto 1)
    if (keep[i][K] == 1) {
      output i
      K = K - w[i]
    }
  return V[N][C]
}
You may want to run a few simple examples of the algorithm by hand to check you understand how it works. For instance, how does it behave with the following four items:
i      1    2    3    4
v_i    10   40   30   50
w_i    5    4    6    3
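To check hand-worked examples, here is a minimal Python sketch of the pseudocode above. It is not the dp_kp.py stub and the names are hypothetical. Two deliberate departures: the capacity loop variable is called c to avoid clashing with the weight array w, and every table cell starts at 0 rather than null/None, which is harmless because rows 1 to N are written before they are read. The text does not give a capacity for the four items, so the usage line assumes C = 10.

```python
def knapsack_dp(values, weights, n, capacity):
    """0/1 knapsack by dynamic programming, following the pseudocode above.

    values and weights are 1-indexed (index 0 unused). Returns the optimal
    value and the sorted list of packed item numbers.
    """
    # V[i][c]: best value using the first i items with capacity c.
    # Row 0 (no items) stays 0; rows 1..n are filled in before they are read.
    V = [[0] * (capacity + 1) for _ in range(n + 1)]
    # keep[i][c] == 1 iff item i is packed in the solution for V[i][c].
    keep = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            if weights[i] <= c and values[i] + V[i - 1][c - weights[i]] > V[i - 1][c]:
                V[i][c] = values[i] + V[i - 1][c - weights[i]]
                keep[i][c] = 1
            else:
                V[i][c] = V[i - 1][c]   # keep[i][c] is already 0
    # Trace back from V[n][capacity], as in the last loop of the pseudocode.
    items, c = [], capacity
    for i in range(n, 0, -1):
        if keep[i][c] == 1:
            items.append(i)
            c -= weights[i]
    return V[n][capacity], sorted(items)


# Assumption: the text gives no capacity for the four-item example; C = 10 here.
print(knapsack_dp([None, 10, 40, 30, 50], [None, 5, 4, 6, 3], 4, 10))  # (90, [2, 4])
```

Building the full table costs time and space proportional to N × C, which is worth keeping in mind when you try the harder inputs.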
Implementing and Testing the Dynamic Programming Solution
You can find further instructions for this approach in the dp.c/dp_kp.java/dp_kp.py file (depending upon which language you are using). If you wish, you may ignore these program stubs and implement your own dynamic programming approach to the 0/1 Knapsack problem.
Test your code on the instance file, data/easy.20.txt. You should get the same answer as with the enum program.
Try the harder problems in the data directory. If interested, and you have time, you might want to investigate the space and time complexity of your solution.
Blackboard Submission
Once you have completed this implementation you should look at the Blackboard submission form where you will be asked the following question about Part 1b.
- How does your implementation of dynamic programming work? Illustrate this with the part of your code that is equivalent to lines 3-11 of the pseudocode we have given above. In particular, explain how your code calculates the maximum value for some set of items and some knapsack capacity with reference to the relevant lines of code in your implementation; and how your code tracks which items are in the knapsack for the optimal solution (no more than 400 words, not including code snippet)
Ideally, you should be here by the end of week 1 of this lab.
Task 2a: Fractional Knapsack Bound
Time Budget: Tasks 2a and 2b, taken together, are intended to take between 1 and 2 hours to complete. You will need either this, or dynamic programming, working for the final part of this coursework. If you have spent more than three hours on Part 2, you might want to look ahead and check the mark scheme, and make some decisions about where best to invest your time.
Imagine we have decided we definitely want to pack some particular items, and definitely don’t want to pack some particular other ones. For the remaining items we are not sure yet. We can represent this situation as
001110 or FFTTTF
where 0 or F (False) means we definitely won’t take the item, 1 or T (True) means we definitely will, and * means we don’t know yet. We call these partial solutions.
We can now calculate an estimate of the best value over all possible ways of replacing the *s by 0/Fs and 1/Ts. This estimate can never be lower than the best value actually achievable (it may be higher), so we call it an upper bound.
To calculate the estimate, we take the * items in decreasing order of value-to-weight ratio and add them to the knapsack until we go over the capacity. We then take the last item added out again and add only the fraction of it that fits, adding the same fraction of its value to the total value of the knapsack. This is a kind of cheating, since a real 0/1 packing cannot contain part of an item; however, we are only interested in making an estimate.
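The calculation just described can be sketched in Python as follows. This is not the provided frac_bound(); the function name is hypothetical, and the partial solution is assumed to be a list of True, False or None, with None standing for *. Unlike a branch-and-bound prefix, the undecided items may be anywhere, so the sketch sorts them by ratio itself.

```python
def frac_bound_sketch(partial, values, weights, capacity):
    """Value and fractional upper bound of a partial solution (a sketch).

    partial[k] describes item k + 1: True (pack), False (leave out) or
    None (undecided, i.e. '*'). values/weights are 1-indexed (index 0 unused).
    Returns (value, bound), or (-1, -1) if the packed items are too heavy.
    """
    n = len(partial)
    packed = [i for i in range(1, n + 1) if partial[i - 1] is True]
    value = sum(values[i] for i in packed)
    weight = sum(weights[i] for i in packed)
    if weight > capacity:
        return -1, -1                     # infeasible partial solution
    bound, room = value, capacity - weight
    # Undecided items in decreasing value-to-weight ratio.
    undecided = sorted((i for i in range(1, n + 1) if partial[i - 1] is None),
                       key=lambda i: values[i] / weights[i], reverse=True)
    for i in undecided:
        if weights[i] <= room:            # the whole item fits
            bound += values[i]
            room -= weights[i]
        else:                             # add only the fraction that fits, then stop
            bound += values[i] * room / weights[i]
            break
    return value, bound


# Item 1 packed, item 2 left out, item 3 undecided, on the sample instance.
print(frac_bound_sketch([True, False, None], [None, 5, 12, 8], [None, 4, 10, 5], 11))  # (5, 13)
```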
In the bnb.c/bnb_kp.java/bnb_kp.py code, we have already provided an almost complete function frac_bound() which accepts a partial solution of the form 01101 (C) or FTTFT** (Java/Python) as input and does the following things:
- Checks the feasibility of the partial solution and, if it is infeasible, sets the solution value to -1 and returns.
- If the partial solution is feasible, calculates its value, i.e. the value of the items already packed, and updates the value.
- If the partial solution is feasible, also computes and updates its upper bound.
Make sure you understand this frac_bound() function and complete it by filling in the two missing lines.
Task 2b: Branch-and-Bound
The branch and bound approach was covered in lectures as a backtracking method for optimisation problems. You should also see pages 521-524 of Algorithm Design and Applications.
Let’s consider how we might, as people, solve the 0/1 knapsack problem. We would try the ‘best’ item first (e.g. the one with the highest value-to-weight ratio) and see if we can fit the best items in. This is also a reasonable approach for a computer e.g. to sort the items in descending order of value-to-weight ratio, and add items in that order until the knapsack is full. However, we might realise that putting that really big item in the bag means there is left over space and putting two smaller items in would have been better i.e. we back-track by removing an item and trying something else. We get computers to do the same thing when systematically exploring the search space.
In order to avoid backtracking through every possible solution, we can “prune” off parts of the solution space that we know cannot contain a feasible or optimal solution. If we know that a particular subset of items is heavier than the capacity, we do not need to consider any solutions that include that subset. Similarly, if we know that without a particular item (e.g. the first item) the best we can do is value $v_{\text{upper}}$, and we have already found a solution better than $v_{\text{upper}}$, then we no longer have to consider any solution that does not contain that crucial item.
You must now use the frac_bound() function to complete the branch-and-bound implementation. The outline of the algorithm is as follows (see Algorithm Design and Applications for more details).
- Sort the items by decreasing value-to-weight ratio.
- Compute the upper bound of the solution **...
- Compute the current values of each of the two solutions 0... (or F...) and 1... (or T...) (i.e. the total value of all the 1/Ts in each string), as well as their upper bound values, and check that they are feasible.
- If they are feasible, place them on a priority queue (note that we have provided implementations of a priority queue for all three languages).
- In the next and all subsequent iterations, remove the partial solution with the best bound value from the priority queue and again consider appending a 0 (C) or False (Java and Python) and a 1 (C) or True (Java and Python) to it.
- The algorithm stops when the queue is empty or a complete solution (one with no stars in it) is found whose value is equal to the current upper bound.
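The outline above can be sketched in Python as below. This is an illustration, not the bnb_kp.py stub: it uses the standard library heapq module (a min-heap, so bounds are negated) instead of the priority queue we provide, the function name is hypothetical, and it uses a common variant of the stopping rule, stopping as soon as the best bound left on the queue cannot beat the best complete solution found so far. A node stores the True/False decisions for the first k items in ratio order.

```python
import heapq
from itertools import count


def knapsack_bnb(values, weights, capacity):
    """Best-first branch-and-bound for the 0/1 knapsack problem (a sketch).

    values and weights are 1-indexed (index 0 unused). Returns the optimal
    value and the sorted list of packed item numbers.
    """
    n = len(values) - 1
    # Step 1: sort items by decreasing value-to-weight ratio.
    order = sorted(range(1, n + 1), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]        # 0-indexed, in ratio order
    w = [weights[i] for i in order]

    def bound(k, value, room):
        """Fractional bound when items 0..k-1 are decided and k..n-1 are '*'."""
        for j in range(k, n):
            if w[j] > room:
                return value + v[j] * room / w[j]
            value += v[j]
            room -= w[j]
        return value

    best_value, best_choice = 0, ()       # the empty packing is always feasible
    tie = count()                         # tie-breaker so tuples of bools are never compared
    # A node is (-bound, tie, k, value, room, decisions for the first k items).
    heap = [(-bound(0, 0, capacity), next(tie), 0, 0, capacity, ())]
    while heap:
        neg_b, _, k, value, room, choice = heapq.heappop(heap)
        if -neg_b <= best_value:
            break                         # no node left can beat the incumbent
        if k == n:
            continue                      # complete solution: nothing to branch on
        for take in (True, False):        # branch: append T, then F
            if take and w[k] > room:
                continue                  # prune: item k does not fit
            nv = value + v[k] if take else value
            nr = room - w[k] if take else room
            nc = choice + (take,)
            if nv > best_value:           # remaining '*'s read as F give a full solution
                best_value, best_choice = nv, nc
            b = bound(k + 1, nv, nr)
            if b > best_value:            # prune: bound cannot beat the incumbent
                heapq.heappush(heap, (-b, next(tie), k + 1, nv, nr, nc))
    packed = sorted(order[j] for j, t in enumerate(best_choice) if t)
    return best_value, packed


print(knapsack_bnb([None, 5, 12, 8], [None, 4, 10, 5], 11))  # (13, [1, 3])
```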
Complete the branch-and-bound function given in bnb.c/bnb_kp.java/bnb_kp.py. This will call the fractional knapsack bound function frac_bound() and use the priority queue functions provided. More instructions are given in the code files.
When testing branch-and-bound you should consider stopping the program early if it is taking too long and think about how close the current best solution is to the actual optimal solution. Note that if you wait long enough you might hit the capacity of the priority queue – this probably indicates that you should give up trying to find an exact solution with branch-and-bound, although you can try making the queue larger.
Blackboard Submission
Once you have completed this implementation (or got as far as you can with it) you should look at the Blackboard submission form where you will be asked the following questions about Part 2.
- What is a partial solution? What does the frac_bound function compute (there should be two things)? How do these relate to a partial solution? What is a feasible partial solution? (no more than 200 words).
- Branch and bound works by “pruning” branches in the search space. Our suggested implementation is also prioritising which branch to explore next. Explain how this pruning and prioritising is working in your code. What data structure are you using to manage the branches? Illustrate your answer with the relevant parts of your code – you may omit lines of the code (use ...) to focus on those bits most relevant to the question (don’t show large amounts of code provided by us (if you are using it) but show where you make calls to those functions/methods). (no more than 300 words, not including code snippet).
- Compare the behaviour of your Dynamic Programming solution and your Branch-and-Bound solution on the test data we have provided (killing test runs after about 1 minute if it is taking longer than that). What is the difference between the two approaches in terms of the time taken to find a solution, and what happens when no solution can be found in reasonable time (i.e., 1 minute)? (no more than 200 words).
You should be here by the end of week 2 of this lab.
Task 3a: Greedy Algorithm
Time Budget: Task 3a is intended to take less than half an hour to complete. It is not needed elsewhere in the lab and should be abandoned if time is running out.
The greedy algorithm is very simple. It sorts the items in decreasing value-to-weight ratio. Then it adds them in one by one in that order, skipping over any items that cannot fit in the knapsack, but continuing to add items that do fit until the last item is considered. There is no backtracking to be done.
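The greedy algorithm just described fits in a few lines; here is a Python sketch (not the greedy_kp.py stub; the function name is hypothetical). The usage line uses a small made-up instance on which greedy is not optimal: the single best-ratio item leaves room for nothing else.

```python
def knapsack_greedy(values, weights, capacity):
    """Greedy 0/1 knapsack: best ratio first, skip items that don't fit.

    values and weights are 1-indexed (index 0 unused). Returns the value
    reached and the sorted packed item numbers; the result is always
    feasible but not necessarily optimal.
    """
    n = len(values) - 1
    order = sorted(range(1, n + 1), key=lambda i: values[i] / weights[i], reverse=True)
    room, total, packed = capacity, 0, []
    for i in order:
        if weights[i] <= room:            # skip (rather than stop at) items that don't fit
            room -= weights[i]
            total += values[i]
            packed.append(i)
    return total, sorted(packed)


# Item 1 has the best ratio (11/6 > 9/5), but packing items 2 and 3 would give 18.
print(knapsack_greedy([None, 11, 9, 9], [None, 6, 5, 5], 10))  # (11, [1])
```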
Write your own greedy algorithm in greedy.c/greedy_kp.java/greedy_kp.py. Note that we have provided a sort by ratio function/method (you should have encountered this when implementing the branch-and-bound solution).
Blackboard Submission
Once you have completed this implementation (or got as far as you can with it) you should look at the Blackboard submission form where you will be asked the following questions about Part 3a.
- Describe your implementation of the Greedy Algorithm. Illustrate your description with the relevant part of your code (no more than 200 words, not counting the code snippet).
- What are the advantages and disadvantages of the Greedy Algorithm? Answer this in reference to the test data we have provided (no more than 100 words).
Task 3b: What makes a Knapsack problem Difficult for Dynamic Programming or Branch-and-Bound?
Time Budget: Task 3b is intended to take 1-3 hours to complete. Please bear this in mind when investing time trying to figure out what happened if you do not get the results you expected.
Generating Data
To help with experiments we have supplied a Python script called kp_generate.py which will automatically generate knapsack instance files for you. It takes four arguments:
- The first argument is the number of items you want to put in the knapsack
- The second argument is the capacity of the knapsack
- The third argument is an upper bound on the profit and weight of each item – the script will randomly generate values for profit and weight between 1 and this number.
- The fourth argument is the name of the output file.
So, for instance if you call the script from the command line with
python3 kp_generate.py 5 6 7 test.txt
It will generate a file called test.txt whose contents will look something like:
5
1 6 5
2 3 3
3 5 1
4 3 4
5 7 5
6
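If you adapt the generator, it can help to have a version you can call directly from an experiment script. The following Python sketch is a hypothetical function (generate_instance), not the provided kp_generate.py, and whether that script draws values before weights or accepts a seed is not stated here. It writes the same file format, draws values and weights uniformly from 1 to max_value, and takes a seed so that instances can be regenerated exactly, which helps with the Data Statement later on.

```python
import random


def generate_instance(n, capacity, max_value, seed=None):
    """Return a random instance as text in the input-file format used here:
    the item count, then one 'index value weight' line per item, then the
    capacity. Values and weights are uniform integers in [1, max_value]."""
    rng = random.Random(seed)            # seeded so experiments can be re-run
    lines = [str(n)]
    for i in range(1, n + 1):
        value = rng.randint(1, max_value)
        weight = rng.randint(1, max_value)
        lines.append(f"{i} {value} {weight}")
    lines.append(str(capacity))
    return "\n".join(lines) + "\n"


# Same arguments as the kp_generate.py example: 5 items, capacity 6, maximum 7.
print(generate_instance(5, 6, 7, seed=1), end="")
```

Changing how a value is drawn relative to its item's weight, or how the capacity relates to the total weight, are natural places to vary an instance for the experiment below.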
Feel free to adapt and modify this script for your own use.
The Experiment
Obviously Knapsack problems get harder to solve the more potential items there are to put into the Knapsack, but that’s not the only factor. Construct an experiment to explore what else (beyond number of items) makes a knapsack problem difficult for either dynamic programming or branch-and-bound and write this up in your report.
If you choose to explore dynamic programming, think about how the theoretical complexity for dynamic programming is computed and what factors are considered in that analysis.
The question of what makes a Knapsack problem difficult for branch and bound is quite complex. It is the harder of the two choices, so worth more marks but we don’t recommend taking this option unless you have made good time on the early parts of the coursework. Obviously, in the worst case, branch and bound performs no better than enumeration but in many cases, as you should have seen, it performs much better than enumeration. So what makes a knapsack instance cause branch and bound to behave more or less like enumeration? Part of this task involves doing some research to help you form a hypothesis. Make sure you reference any of your sources correctly in your report. If possible you should favour archival reports, such as scientific papers, over web pages. You can find these more easily using Google Scholar rather than Google’s general search engine. The library also has a lot of support available for how to find literature. As a hint, you might want to investigate “weakly correlated knapsack instances”.
Construct an experiment to explore what makes a knapsack problem difficult for dynamic programming or branch-and-bound and write this up in a report. Your write-up should have the following sections:
Hypothesis You should state, as a hypothesis, what aspect of a knapsack problem, other than the number of items, might make the problem more difficult for a dynamic programming/branch-and-bound solution, with a brief explanation why. (This section should be about 200 words and take up no more than half a page in your report.)
Experimental Design You should describe how you designed the experiment. This should include:
• Your intended independent variable – this is the thing you are varying.
• Your intended dependent variable – this is the thing you are measuring whose value your hypothesis says depends upon the value of the independent variable.
• Anything that you could vary but you are not – for instance to avoid confusing your results with other variables that might also influence performance.
• What input data you generated and how many input files you used for each value of your independent variable.
• What program you ran on what inputs. How many times you ran it on each input.
• What you measured and how.
You should also mention anything else you think is relevant that will help your marker judge how well you designed your experiment to test your hypothesis. (This section should be about 500 words and take up no more than a page in your report – not counting any scripts included as appendices).
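One way to make "what you ran, how many times, and what you measured" reproducible is a small timing harness. The Python sketch below is an assumption of how you might do it, not part of the provided code: time_command, run_experiment, the CSV column names and the 60-second default timeout (echoing the one-minute cut-off suggested in Part 2) are all hypothetical choices. It measures the wall-clock time of a whole process with time.perf_counter, so start-up time and file reading are included in every measurement.

```python
import csv
import subprocess
import time


def time_command(cmd, repeats=3, timeout=60):
    """Run cmd `repeats` times; return wall-clock seconds for each run,
    or None for runs killed after `timeout` seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            # check=True raises CalledProcessError if the program fails.
            subprocess.run(cmd, stdout=subprocess.DEVNULL, timeout=timeout, check=True)
            times.append(time.perf_counter() - start)
        except subprocess.TimeoutExpired:
            times.append(None)
    return times


def run_experiment(rows, out_path, repeats=3, timeout=60):
    """rows: iterable of (independent_variable_value, cmd) pairs.
    Writes raw results as CSV: one line per run (x, run index, seconds)."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "run", "seconds"])
        for x, cmd in rows:
            for run, t in enumerate(time_command(cmd, repeats, timeout)):
                writer.writerow([x, run, "" if t is None else f"{t:.4f}"])
```

For example, rows could pair each value of your independent variable with a command such as ["python3", "dp_kp.py", "<instance file>"], where the instance files are whatever you generated. Keeping every individual run in the CSV, rather than only averages, gives you the raw data asked for in the Data Statement.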
Results You should present your results, ideally as a graph showing a best fit line. If you do any processing on results, such as generating best fit lines, computing averages, etc., then these should be described. Raw data should be presented in an appendix or included with your code submission. You should state clearly whether your hypothesis was confirmed or refuted (or it is equivocal or difficult to tell). (This section should be about 200 words and take up no more than half a page in your report – not counting graphs and tables).
Data Statement It is good practice to inform readers where they can find your data and code in order to check your results and re-run experiments for themselves. This should appear in a Data statement.
We’ve asked you not to include large input files (knapsack instances with more than 10,000 items) in your GitLab repository, but you should include any scripts you created (for instance to generate input files or run experiments) and the “raw” output data, e.g. a table (as a .csv file or spreadsheet) of independent variable values matched with dependent variable values. You may also include this as an appendix in the report.
The data statement should clearly state where all the input data (if included), scripts and output data can be found.
In total your experiment write up should not take up more than two sides of A4 at a sensible font size (not counting appendices). When you have written it please upload your report as a PDF to Blackboard.
1 C Instructions
This section contains some additional notes on completing the exercise in C.
Task 1a: Full Enumeration
Take a look at the files.
• enum.c implements the full enumeration of all possible solutions as described above.
• knapsack-util.c provides some functions to read in a knapsack instance, print the instance details, print out a solution, and evaluate the values and weights of a solution. There’s no need to edit knapsack-util.c during this lab, but you may do so if you wish.
Make enum and run it on easy.20.1.txt, which is a 0/1 knapsack problem instance with 20 items to pack:
make enum
./enum ../data/easy.20.1.txt
Task 1b: Dynamic Programming
Complete the program that solves the 0/1 Knapsack Problem by dynamic programming. Use the files knapsack-util.c and dp.c we provide for you. Further instructions are in dp.c.
Tasks 2a and 2b: Branch-and-Bound
Complete the program that solves the 0/1 Knapsack Problem by branch-and-bound. Use the files knapsack-util.c and bnb.c we provide for you. Further instructions are in bnb.c.
Task 3a: Greedy
Complete the program that solves the 0/1 Knapsack Problem with a greedy algorithm. Use the files knapsack-util.c and greedy.c we provide for you. Further instructions are in greedy.c.
2 Java Instructions
This section contains some additional notes on completing the exercise in Java.
Task 1a: Full Enumeration
Take a look at the files.
• enum_kp.java implements the full enumeration of all possible solutions as described above.
• KnapSack.java provides a class with some functions to read in a knapsack instance, print the instance details, print out a solution, and evaluate the values and weights of a solution. There’s no need to edit KnapSack.java during this lab, but you may do so if you wish.
Note that enum_kp.java sub-classes KnapSack, as do all the other Knapsack solutions in this lab.
Compile enum_kp and run it on easy.20.1.txt, which is a 0/1 knapsack problem instance with 20 items to pack:
cd java/comp26120
javac enum_kp.java KnapSack.java
cd ..
java comp26120.enum_kp ../data/easy.20.1.txt
Task 1b: Dynamic Programming
Complete the program that solves the 0/1 Knapsack Problem by dynamic programming. Use the files KnapSack.java and dp_kp.java we provide for you. Further instructions are in dp_kp.java.
Tasks 2a and 2b: Branch-and-Bound
Complete the program that solves the 0/1 Knapsack Problem by branch-and-bound. Use the files KnapSack.java and bnb_kp.java we provide for you. Further instructions are in bnb_kp.java.
Task 3a: Greedy
Complete the program that solves the 0/1 Knapsack Problem with a greedy algorithm. Use the files KnapSack.java and greedy_kp.java we provide for you. Further instructions are in greedy_kp.java.
3 Python Instructions
This section contains some additional notes on completing the exercise in Python.
Task 1a: Full Enumeration
Take a look at the files.
• enum_kp.py implements the full enumeration of all possible solutions as described above.
• knapsack.py provides a class with some functions to read in a knapsack instance, print the instance details, print out a solution, and evaluate the values and weights of a solution. There’s no need to edit knapsack.py during this lab, but you may do so if you wish.
Note that enum_kp.py sub-classes knapsack, as do all the other Knapsack solutions in this lab.
Run enum_kp.py on easy.20.1.txt, which is a 0/1 knapsack problem instance with 20 items to pack:
python3 enum_kp.py ../data/easy.20.1.txt
Task 1b: Dynamic Programming
Complete the program that solves the 0/1 Knapsack Problem by dynamic programming. Use the files knapsack.py and dp_kp.py we provide for you. Further instructions are in dp_kp.py.
Tasks 2a and 2b: Branch-and-Bound
Complete the program that solves the 0/1 Knapsack Problem by branch-and-bound. Use the files knapsack.py and bnb_kp.py we provide for you. Further instructions are in bnb_kp.py.
Task 3a: Greedy
Complete the program that solves the 0/1 Knapsack Problem with a greedy algorithm. Use the files knapsack.py and greedy_kp.py we provide for you. Further instructions are in greedy_kp.py.
Marking Scheme
This coursework is worth 15% of your final mark for COMP26120. This means each mark in this mark scheme is worth about 0.5% of your final mark for the module.
In the rubric below the marks an item is worth are slightly approximate due to the way Blackboard allocates percentages across sections. In general unsatisfactory performance will get you 10% of the available marks for a section, satisfactory will get you 50% and excellent will get you 100% but this can vary because several criteria may be involved each of which can be marked unsatisfactory, satisfactory or excellent.
Code Submission (1 mark)