How To Prove Your Greedy Algorithm

2 September 2021 · 5 mins

Greedy algorithms are incredibly powerful when applicable. That "when applicable" is pretty important because, in my experience, it is not often that we are able to use a greedy algorithm, but when we can, they usually lead to very fast solutions. In this post, I'll try to demystify greedy algorithms from a more mathematical perspective.

I think most things make sense through an example, so I'll pick a problem that made greedy algorithms clear to me.

Example Problem

Let's say we have $n$ processes. Think, computer processes. The $i^\text{th}$ process takes $t_i$ time to execute. We need to schedule the processes sequentially, i.e., $p_0$ is scheduled before $p_1$ and so on. The finishing time of a process, $p_i$, is,

$$
\begin{align}
C_{p_i} = \sum_{j = 0}^{i} t_{p_j}
\end{align}
$$

We want to find an ordering of processes such that the average finishing time is minimized. The average finishing time is defined as,

$$
\begin{align}
\sum_i C_{p_i} / n
\end{align}
$$

As an example, let's say the execution times (not the finishing times!) are as follows,

$$
T = [2, 1, 9, 1, 4, 3, 3]
$$

Then, the finishing times would be,

$$
C = [2, 3, 12, 13, 17, 20, 23]
$$

And the average finishing time would be $90 / 7 \approx 12.86$. Can we do better? Yes, if we arrange the processes as follows,

$$
\begin{align}
T' &= [1, 1, 2, 4, 3, 3, 9] \\
C' &= [1, 2, 4, 8, 11, 14, 23]
\end{align}
$$

We get an average finishing time of $63 / 7 = 9$. Much better! So the goal of the problem is to find an ordering of the given times such that the average finishing time is minimized.
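For concreteness, here's a tiny Python sketch (my own helper, not from the post) that computes these numbers; the finishing times are just the prefix sums of the execution times:

```python
from itertools import accumulate

def average_finishing_time(times):
    """Average finishing time of an ordering: finishing times are
    the prefix sums of the execution times."""
    finishing = list(accumulate(times))   # C_i = t_0 + ... + t_i
    return sum(finishing) / len(finishing)

print(average_finishing_time([2, 1, 9, 1, 4, 3, 3]))  # 12.857...
print(average_finishing_time([1, 1, 2, 4, 3, 3, 9]))  # 9.0
```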

Solution

Note that our optimization objective has a $\frac{1}{n}$ term. Also notice that the problem asks us to schedule all the processes. This means that $n$ is fixed no matter what ordering we return. If $n$ is fixed, we're only optimizing the numerator, which is the sum of finishing times.
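Spelled out (my notation, with the $\arg\min$ taken over orderings of the processes):

$$
\begin{align}
\operatorname*{arg\,min} \frac{1}{n} \sum_i C_{p_i} = \operatorname*{arg\,min} \sum_i C_{p_i}
\end{align}
$$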

After playing with the example above, one might notice that by placing smaller processes at the start we get a better total sum. This is because the sums are cumulative: the finishing times are sums of everything that came before. Hence, the execution time of the first scheduled process shows up in every single finishing time, so we would like it to be the smallest. This leads to a simple greedy algorithm: sort the times in increasing order. You can use your favorite comparison sort here to yield an $O(n \log n)$ algorithm.
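As a sketch, the whole algorithm is a single sort (again my own minimal code, assuming we only need the order of execution times):

```python
from itertools import accumulate

def schedule(times):
    """Greedy: run the shortest process first. The sort dominates,
    so the whole algorithm is O(n log n)."""
    return sorted(times)

order = schedule([2, 1, 9, 1, 4, 3, 3])
avg = sum(accumulate(order)) / len(order)
print(order, avg)  # [1, 1, 2, 3, 3, 4, 9] 8.714285714285714
```

Note that the fully sorted order scores about $8.71$, slightly better than the $63/7 = 9$ rearrangement above: that rearrangement improved on the original order but still contained an inversion (the $4$ scheduled before a $3$).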

But how can we prove that our algorithm is correct? In the next section we're gonna walk through a formal proof of our algorithm.

Subproblem

We will first define a subproblem. Let $G(X)$ be the minimum total completion time for scheduling the processes with execution times in $X$ (a multiset, since times can repeat). As we argued above, $n$ is fixed, so minimizing the total also minimizes the average. Clearly, if we solve the problem $G(X)$ where $X$ is the entire input, $X = T$, we solve the entire problem.

But when we use the greedy technique, we make a local choice: we decide which process in $X$ to schedule first. Say, by some magic, we decide to schedule first a process with execution time $t \in X$. Then we are left with the problem $G(X - \{t\})$. This problem is smaller than our original problem, but solving it allows us to solve the whole problem.
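One way to write this recursion down explicitly (my formulation, using the total-completion-time version of $G$): scheduling $t$ first adds $t$ to the finishing time of every one of the $|X|$ processes, so,

$$
\begin{align}
G(X) = \min_{t \in X} \Big( |X| \cdot t + G(X - \{t\}) \Big), \qquad G(\emptyset) = 0
\end{align}
$$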

Optimal Substructure

We will now show that our subproblem has the optimal substructure property, i.e., an optimal solution to the whole problem can be constructed from optimal solutions to its subproblems.

Consider a problem $X$, and suppose by some magic we decide to schedule a process with execution time $t$ first. Then, as before, we are left with $X - \{t\}$. Let $S$ be the best total completion time achievable for $X$ given this choice, and let $s$ be a solution to $G(X - \{t\})$. Then we know that $S = k + s$, where $k$ is a fixed constant, the contribution of scheduling $t$ first (in fact, $k = |X| \cdot t$, since $t$ appears in every finishing time). It doesn't matter exactly what $k$ is.

We want to show that the optimal solution to $G(X - \{t\})$ is contained within the optimal solution of $G(X)$. Let $s_{\text{opt}}$ be the optimal solution to $G(X - \{t\})$ and, for contradiction, let $s_{\text{unopt}}$ be any suboptimal solution to $G(X - \{t\})$. This means that,

$$
\begin{align}
s_{\text{opt}} &< s_{\text{unopt}} \\
\implies k + s_{\text{opt}} &< k + s_{\text{unopt}}
\end{align}
$$

This means that picking any solution other than the optimal solution to $G(X - \{t\})$ leads to a suboptimal solution for the whole problem. $\Box$

Greedy Choice Property

We have shown that we can make a decision locally, solve the rest of the subproblem, and combine the two to get a globally optimal answer. Now we show that the local decision we make is indeed the best one. In this problem, the local decision was to pick the process with the lowest execution time first. We show that this is optimal using a proof by contradiction.

For contradiction, let $p_0, \ldots, p_{n-1}$ be an optimal order that is not sorted by execution time. Then there must be some $p_i$ and $p_j$ such that $i < j$ and $t_{p_i} > t_{p_j}$; otherwise the order would be sorted, i.e., it would be our greedy choice. We now show that swapping $p_i$ and $p_j$ gives a lower average finishing time.

The finishing times of all processes before $p_i$, and of all processes from position $j$ onward, are unchanged (position $j$ still sees the same set of processes before it). The only changes are to the processes at positions $[i, j)$. Each finishing time in this range decreases by exactly $\Delta = t_{p_i} - t_{p_j}$, and there is at least one such position, namely $i$ itself. We know that $\Delta > 0$ because $t_{p_i} > t_{p_j}$. Thus swapping $p_i$ and $p_j$ strictly lowers the average finishing time, contradicting optimality, so the optimal order must be sorted. $\Box$
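To sanity-check the exchange argument numerically, here's a small sketch (my own, using the example from earlier) that fixes the one inversion in $T'$ and confirms the total finishing time strictly drops:

```python
from itertools import accumulate

def total_finishing_time(times):
    return sum(accumulate(times))  # sum of prefix sums

# The improved-but-unsorted order from earlier has one inversion: 4 before 3.
order = [1, 1, 2, 4, 3, 3, 9]
i, j = 3, 4                        # positions with order[i] > order[j], i < j
swapped = order[:]
swapped[i], swapped[j] = swapped[j], swapped[i]

print(total_finishing_time(order))    # 63
print(total_finishing_time(swapped))  # 62: the one position in [i, j) dropped by delta = 1
```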

Wrapping Up

Hopefully this post provided some clarity on how to approach writing proofs of correctness for greedy algorithms. The technique described here is fairly general, and is also very useful for dynamic programming proofs, which come up quite often in various machine learning tasks. It's pretty powerful to be able to convincingly argue why a particular algorithm will always yield a correct answer.
