2 September 2021 · 5 mins

Greedy algorithms are incredibly powerful when applicable. That "when applicable" is pretty important because, in my experience, we don't get to use a greedy algorithm very often, but when we can, it usually leads to a very fast solution. In this post, I'll try to demystify greedy algorithms from a more mathematical perspective.

I think most things make sense from an example, so I'll pick an example problem that made greedy algorithms clear to me.

Let's say we have $n$ processes (think computer processes). The $i^\text{th}$ process takes $t_i$ time to execute. We need to schedule processes sequentially, i.e., $p_0$ is scheduled before $p_1$ and so on. The finishing time of a process, $p_i$, is,

$\begin{align}
C_{p_i} = \sum_{j = 0}^i t_{p_j}
\end{align}$

We want to find an ordering of processes such that the average finishing time is minimized. The average finishing time is defined as,

$\begin{align}
\sum_i C_{p_i} / n
\end{align}$

As an example, let's say the execution times (**not** the finishing times!) are as follows,

$T = [2, 1, 9, 1, 4, 3, 3]$

Then, the finishing times would be,

$C = [2, 3, 12, 13, 17, 20, 23]$

And the average finishing time would be $90 / 7 \approx 12.86$. Can we do better? Yes, if we arrange the processes as follows,

$\begin{align}
T' &= [1, 1, 2, 3, 3, 4, 9] \\
C' &= [1, 2, 4, 7, 10, 14, 23]
\end{align}$

We get an average finishing time of $61 / 7 \approx 8.71$. Much better! So the goal of the problem is to find an ordering of the given times that minimizes the average finishing time.
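The arithmetic above is easy to check in code. Here's a small Python sketch (the helper name is my own, not anything from a library) that computes finishing times as cumulative sums:

```python
from itertools import accumulate

def finishing_times(times):
    # C_i is the cumulative sum of execution times up to and including i
    return list(accumulate(times))

T = [2, 1, 9, 1, 4, 3, 3]
C = finishing_times(T)
print(C)                  # [2, 3, 12, 13, 17, 20, 23]
print(sum(C) / len(C))    # 90/7 ≈ 12.857
```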

Note that our optimization objective has a $\frac{1}{n}$ term. Also notice that the problem asks us to schedule *all* the processes. This means that $n$ is fixed no matter what ordering we return. If $n$ is fixed we're only optimizing the numerator, which is the sum of finishing times.

After playing with the example above, one might notice that by placing *smaller* processes at the start we get a better total sum. This is because the sums are cumulative: each finishing time is the sum of everything that came before it. Hence, the execution time of the first scheduled process shows up in every single finishing time, so we would like it to be the smallest. This leads to a simple greedy algorithm: sort the times in increasing order. You can use your favorite comparison sort here to yield an $O(n \log n)$ algorithm.
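The whole algorithm fits in a few lines. A minimal sketch (the function name is mine): sort, then accumulate finishing times in a single pass.

```python
def average_finishing_time(times):
    # greedy schedule: shortest execution times first
    total, elapsed = 0, 0
    for t in sorted(times):
        elapsed += t        # this process finishes at time `elapsed`
        total += elapsed    # add its finishing time to the running sum
    return total / len(times)
```

The sort dominates the running time; the accumulation pass is linear.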

But how can we prove that our algorithm is correct? In the next section we're gonna walk through a formal proof of our algorithm.

We will first define a subproblem. Let $G(X)$ be the minimum average completion time to schedule processes with times in $X$, where $X$ is the collection of execution times (a multiset, really, since times can repeat). Clearly, if we solve the problem $G(X)$ where $X$ is the entire input, $X = T$, we solve the entire problem.

But when we use the greedy technique, we make a local *choice*. We might decide to schedule one of the processes in $X$ first. Say, by some magic, we decided to schedule a process with execution time $t \in X$. Then we are left with the problem $G(X - \{t\})$. This problem is smaller than our original problem, and solving it allows us to solve the whole problem.
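To make the subproblem structure concrete, here's an intentionally inefficient exhaustive solver sketched in Python. It tries every choice of first process and recurses on $X - \{t\}$; the function name and the sorted-tuple encoding of the multiset are my own choices, not from the post.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def min_total(times):
    # `times` is a sorted tuple standing in for the multiset X;
    # returns the minimum *total* finishing time (average = total / n)
    if not times:
        return 0
    n = len(times)
    # scheduling a process with time t first adds t to all n finishing
    # times, and leaves the smaller subproblem X - {t}
    return min(n * times[i] + min_total(times[:i] + times[i + 1:])
               for i in range(n))
```

On the example input, `min_total(tuple(sorted([2, 1, 9, 1, 4, 3, 3])))` agrees with the greedy answer.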

We will now show that our subproblem has the optimal substructure property, i.e., the solution of the whole problem can be constructed from the optimal solution of subproblems.

Consider a problem $X$, and suppose by some magic we make a choice and decide to schedule a process with execution time $t$ first. Then, as before, we are left with $X - \{t\}$. Any solution $S$ to $G(X)$ that schedules $t$ first has cost $S = k + s$, where $s$ is the cost of the corresponding solution to $G(X - \{t\})$ and $k$ is a fixed constant: the contribution of scheduling $t$ first. It doesn't matter what $k$ is; it is the same for every such solution.

We want to show that the optimal solution to $G(X - \{t\})$ is contained within the optimal solution to $G(X)$. Let $s_{\text{opt}}$ be the cost of the optimal solution to $G(X - \{t\})$, and, for contradiction, let $s_{\text{unopt}}$ be the cost of a suboptimal solution to $G(X - \{t\})$. This means that,

$\begin{align}
s_{\text{opt}} &< s_{\text{unopt}} \\
\implies k + s_{\text{opt}} &< k + s_{\text{unopt}}
\end{align}$

This means that picking any solution other than the optimal solution to $G(X - \{t\})$ leads to a suboptimal solution for the whole problem. $\Box$

We have shown that we can locally make a decision and then solve the rest of the subproblem, combine them and we'll get a globally optimal answer. Now we show that the local decision we make is indeed the best one. In this problem, the local decision we decided to make was to pick the process with the lowest execution time first. We show that this is optimal using a proof by contradiction.
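On small inputs we can also check the claim empirically before proving it: compare the sorted order against every permutation. A brute-force sketch (names are mine) using the total rather than the average, since $n$ is fixed:

```python
from itertools import permutations

def total_finishing_time(order):
    # sum of cumulative finishing times for a given schedule
    total, elapsed = 0, 0
    for t in order:
        elapsed += t
        total += elapsed
    return total

times = [2, 1, 9, 1, 4, 3, 3]
best = min(total_finishing_time(p) for p in permutations(times))
assert best == total_finishing_time(sorted(times))  # sorted order is optimal here
```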

For contradiction, let $p_0, \ldots, p_{n-1}$ be an ordering that is not sorted by execution time. Then there must be some $p_i$ and $p_j$ with $i < j$ and $t_{p_i} > t_{p_j}$; otherwise the ordering would already match our greedy choice. We now show that swapping $p_i$ and $p_j$ gives us a lower average finishing time.

The finishing times of all processes before $p_i$, and of all processes after $p_j$, are unchanged ($p_j$'s old position still has the same set of times before it, just reordered). The only changes are to the positions in $[i, j)$: each of their finishing times *decreases* by $\Delta = t_{p_i} - t_{p_j}$, since $t_{p_j}$ replaces $t_{p_i}$ in their prefix sums. There is at least one such position, and $\Delta > 0$ because $t_{p_i} > t_{p_j}$. Thus swapping $p_i$ and $p_j$ strictly improves the ordering, contradicting the assumption that an unsorted order could be optimal. $\Box$
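The swap can also be checked numerically. On the example input, the pair at positions $i = 2$ and $j = 6$ is out of order, and swapping it should reduce the total by exactly $(j - i)\,\Delta$ (the helper name is mine):

```python
def total_finishing_time(order):
    # sum of cumulative finishing times for a given schedule
    total, elapsed = 0, 0
    for t in order:
        elapsed += t
        total += elapsed
    return total

order = [2, 1, 9, 1, 4, 3, 3]
i, j = 2, 6                     # order[i] = 9 > order[j] = 3, an out-of-order pair
delta = order[i] - order[j]     # Δ = 6
swapped = order[:]
swapped[i], swapped[j] = swapped[j], swapped[i]
# every position in [i, j) finishes Δ earlier; positions outside are unchanged
assert total_finishing_time(order) - total_finishing_time(swapped) == (j - i) * delta
```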

Hopefully this post provided some clarity on how to approach writing proofs of correctness for greedy algorithms. The technique described here is fairly general, and is also very useful for dynamic programming proofs, which show up more often than not in various machine learning tasks. It's pretty powerful to be able to convincingly argue why a particular algorithm will always yield a correct answer.