Greedy algorithms are incredibly powerful when applicable. That "when applicable" is pretty important because, in my experience, problems that admit a greedy algorithm are rare, but when one applies, it usually leads to a very fast solution. In this post, I'll try to demystify greedy algorithms from more of a mathematical perspective.
I think most things make sense through an example, so I'll start with the problem that made greedy algorithms click for me.
Let's say we have $n$ processes. Think, computer processes. Process $p_i$ takes time $t_i$ to execute. We need to schedule the processes sequentially, i.e., $p_1$ is scheduled before $p_2$, and so on. The finishing time of a process $p_i$, denoted $f_i$, is,

$$f_i = \sum_{j=1}^{i} t_j$$
We want to find an ordering of the processes such that the average finishing time is minimized. The average finishing time is defined as,

$$\bar{f} = \frac{1}{n} \sum_{i=1}^{n} f_i$$
As an example, let's say the execution times (not the finishing times!) are as follows,

$$t = (5, 2, 1)$$

Then, the finishing times would be,

$$f = (5, 7, 8)$$

And the average finishing time would be $(5 + 7 + 8)/3 \approx 6.67$. Can we do better? Yes, if we arrange the processes as follows,

$$t = (1, 2, 5)$$

We get an average finishing time of $(1 + 3 + 8)/3 = 4$. Much better! So the goal of the problem is to find an ordering of the given times that minimizes the average finishing time.
Note that our optimization objective has a $\frac{1}{n}$ term. Also notice that the problem asks us to schedule all the processes. This means that $n$ is fixed no matter what ordering we return. If $n$ is fixed, we're only optimizing the numerator, which is the sum of finishing times.
After playing with the example above, one might notice that placing smaller processes at the start gives a better total sum. This is because the sums are cumulative: each finishing time includes everything that came before, so the execution time of the first scheduled process shows up in every single finishing time. We would like that term to be the smallest. This leads to a simple greedy algorithm: sort the times in increasing order. You can use your favorite comparison sort here, yielding an $O(n \log n)$ algorithm.
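As a concrete sketch (the function names are my own, not from any library), the whole algorithm fits in a few lines of Python:

```python
def average_finishing_time(times):
    """Average finishing time of processes executed in the given order."""
    total, elapsed = 0, 0
    for t in times:
        elapsed += t      # elapsed is now this process's finishing time
        total += elapsed  # accumulate the sum of finishing times
    return total / len(times)


def greedy_schedule(times):
    """Greedy choice: run the shortest processes first."""
    return sorted(times)  # any O(n log n) comparison sort works


times = [5, 2, 1]
print(average_finishing_time(times))                   # 6.666... (finishing times 5, 7, 8)
print(average_finishing_time(greedy_schedule(times)))  # 4.0      (finishing times 1, 3, 8)
```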
But how can we prove that our algorithm is correct? In the next section, we'll walk through a formal proof.
We will first define a subproblem. Let $M(S)$ be the minimum total finishing time achievable by scheduling the processes whose execution times are in the set $S$ (since $n$ is fixed, minimizing the total is the same as minimizing the average). $S$ is simply the set of all the execution times. Clearly, if we solve the problem where $S$ is the entire input set, $S = \{t_1, t_2, \dots, t_n\}$, we solve the entire problem.
But when we use the greedy technique, we make a local choice. We might decide to schedule one of the processes in $S$ first. Say, by some magic, we decided to schedule a process with execution time $t$. Then we are left with the problem $M(S \setminus \{t\})$. This problem is smaller than our original problem, but solving it allows us to solve the whole problem.
We will now show that our subproblem has the optimal substructure property, i.e., an optimal solution to the whole problem can be constructed from optimal solutions to its subproblems.
Consider a problem $M(S)$, and suppose by some magic we make a choice and decide to schedule a process with execution time $t$ first. Then, as before, we are left with $M(S \setminus \{t\})$. Let the optimal value of $M(S \setminus \{t\})$ be $M^*$, and let $F$ be the value of the full solution built on this choice. Then, we know that $F = M^* + c$, where $c$ is some fixed constant: the contribution from scheduling $t$ first (concretely, $c = |S| \cdot t$, since $t$ appears in the finishing time of all $|S|$ processes). It doesn't matter what $c$ is.
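This decomposition can be written as the recurrence $M(S) = \min_{t \in S}\left(|S| \cdot t + M(S \setminus \{t\})\right)$. Here is a purely illustrative brute-force evaluation of it in Python (exponential time, the names are mine, and it assumes distinct execution times since it uses a set):

```python
def min_total_finishing_time(S):
    """Brute-force M(S): try every process as the first one scheduled.

    Scheduling t first costs len(S) * t, since t appears in the finishing
    time of all len(S) processes. Exponential time; for illustration only.
    Assumes distinct execution times (a multiset would be needed otherwise).
    """
    if not S:
        return 0
    return min(len(S) * t + min_total_finishing_time(S - {t}) for t in S)


# Agrees with the greedy answer on our example: 1 + 3 + 8 = 12.
print(min_total_finishing_time(frozenset({1, 2, 5})))  # 12
```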
We want to show that the optimal solution to $M(S \setminus \{t\})$ is contained within the optimal solution to $M(S)$. Let $M^*$ be the optimal value of $M(S \setminus \{t\})$ and, for contradiction, let $M'$ be the value of a suboptimal solution to it, so $M' > M^*$. This means that,

$$F' = M' + c > M^* + c = F$$
That is, picking any solution other than the optimal solution to $M(S \setminus \{t\})$ leads to a suboptimal solution for the whole problem.
We have shown that we can make a decision locally, solve the remaining subproblem optimally, and combine the two to get a globally optimal answer. Now we show that the local decision we make is indeed the best one. In this problem, the local decision was to pick the process with the lowest execution time first. We show that this is optimal using a proof by contradiction.
For contradiction, let $t_1, t_2, \dots, t_n$ be an optimal order that is not sorted by execution time. Then, there must be some positions $i$ and $j$ such that $i < j$ and $t_i > t_j$; otherwise, the order would be sorted, matching our greedy choice. We then show that swapping $t_i$ and $t_j$ gives a lower average finishing time.
The finishing times of all processes before position $i$ and of all processes from position $j$ onward are unchanged (the prefix sums there contain the same terms). The only changes are to processes at positions $i$ through $j - 1$: for every process in this range, the change in finishing time is $t_j - t_i$. There could be any number of such changes, but there is at least one (position $i$ itself). We know that $t_j - t_i < 0$ because $t_i > t_j$. Thus, swapping $t_i$ and $t_j$ strictly lowers the average finishing time, contradicting the optimality of the unsorted order.
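This is easy to check numerically. A minimal sketch, reusing the helper logic from above (the names are mine):

```python
def total_finishing_time(order):
    """Sum of finishing times for processes executed in the given order."""
    total, elapsed = 0, 0
    for t in order:
        elapsed += t
        total += elapsed
    return total


order = [5, 2, 1]  # positions 0 and 2 form an inversion: 5 > 1
i, j = 0, 2
swapped = order[:]
swapped[i], swapped[j] = swapped[j], swapped[i]  # [1, 2, 5]

# Finishing times (5, 7, 8) become (1, 3, 8): positions i..j-1 each drop
# by t_i - t_j = 4, exactly as the proof predicts.
print(total_finishing_time(order))    # 20
print(total_finishing_time(swapped))  # 12
```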
Hopefully this post provided some clarity on how to approach writing proofs of correctness for greedy algorithms. The technique described here is fairly general and is also very useful for dynamic programming proofs, which show up more often than not in various machine learning tasks. It's pretty powerful to be able to convincingly argue why a particular algorithm will always yield a correct answer.