Week 5
VL 9 - 13.05.25
Review
- review of quick sort
Efficiency of Randomized Quicksort
- choose Pivot randomly \(\Rightarrow\)
k
can be any index in[l, r]
with the same probability - good case - k ends up in the middle:
- sub arrays for the recursive calls have both size \(\frac{r - l}{2}\) \(\Rightarrow\) we profit from divide and conquer
- bad case - k ends up on the edge (
k == l of k == r
) \(\Rightarrow\) Recursion is not advantagous, we do not profit from D&C. - typical case: typical case is neither the best, nor the best case - it’s the average of those, i.e.
k
is somewhere between being exactly in the middle or exactly on the edges - what happens here ?
Before let’s determine the run time of the best and the worst cases.
For recursive algorithms the following genera run time formula holds:
\[L_{\text{total}}(N) = L_{\text{local}}(N) + L_{\text{recursive}}(N)\]
for quicksort \(L_{\text{local}}(N) = L_{\text{parition}}(N)\).
Bad Case
\[\begin{align*} L(N) &= (N + 1) + &&\text{// partition} \\ &\quad\;\; L(0) + L(N - 1) &&\text{// recursive calls} \\ &= N + 1 + 0 + L(N - 1) \quad &&\text{// since } L(0) = 0 \text{ and } L(1) = 0 \\ &= (N + 1) + L(N - 1) \\ &= (N + 1) + (N) + L(N - 2) \\ &\;\;\vdots \\ &= (N + 1) + N + \cdots + 2 + 1 \\ &= \sum_{k = 1}^{N + 1} k \\ &= \frac{(N + 1)(N + 2)}{2} \\ &\in \Theta(N^2) \end{align*}\]
Best Case
Choose N, s.t. both sub arrays are of the same size: N in {1, 3, 7, 15, …, 2^n - 1, …}. Then:
\[\begin{align*} L(2^M - 1) &= 2^M &&\text{partition} \\ &\quad +\; 2 \cdot L(2^{M - 1} - 1) &&\text{recursion on two halves} \\ &= 2^M + 2 \cdot 2^{M - 1} + 2^2 \cdot 2^{M - 2} + \cdots + 2^{M - 1} \cdot 2 &&\text{expanding recursively} \\ &= \sum_{k = 1}^{M - 1} 2^k \cdot 2^{M - k} &&\text{collecting all terms} \\ &= (M - 1) \cdot 2^M &&\text{each term simplifies to } 2^M \\ &= \log_2(N + 1) \cdot (N + 1) &&\text{since } N = 2^M - 1 \Rightarrow M = \log_2(N + 1) \\ &\in \Theta(N \log N) &&\text{best-case runtime} \end{align*}\]
Average Case
The average case of Quicksort involves a recurrence that doesn’t simplify as neatly as the worst and best cases because it includes a weighted sum over all possible pivot positions.
The key recurrence for the expected number of comparisons in Quicksort is:
\[ L(N) = N + 1 + \frac{1}{N} \sum_{k=0}^{N - 1} \left( L(k) + L(N - 1 - k) \right) \]
This models the expected cost when the pivot lands uniformly at random. By symmetry and change of variables, it simplifies to:
\[ L(N) = N + 1 + \frac{2}{N} \sum_{k=0}^{N - 1} L(k) \]
That leads to a recurrence for \(L(N)\) involving cumulative sums, and can be shown (via substitution or using generating functions) that the solution satisfies:
\[ L(N) \in \Theta(N \log N) \]
Derivation:
\[\begin{align*} L(N) &= N + 1 &&\text{partition step: compares } N \text{ elements to pivot} \\ &\quad + \frac{1}{N} \sum_{k=0}^{N-1} \left( L(k) + L(N - 1 - k) \right) &&\text{average over all pivot positions} \\ &= N + 1 + \frac{2}{N} \sum_{k=0}^{N-1} L(k) &&\text{by symmetry: } L(k) = L(N - 1 - k) \\ \Rightarrow N L(N) &= N(N + 1) + 2 \sum_{k=0}^{N-1} L(k) &&\text{multiply both sides by } N \\ \text{Let } A(N) &= \sum_{k=0}^{N} L(k) &&\text{define cumulative sum} \\ \Rightarrow N L(N) &= N(N + 1) + 2 A(N - 1) \\ \Rightarrow L(N) &= (N + 1) + \frac{2}{N} A(N - 1) \\ \text{Now guess: } L(N) &\in \Theta(N \log N) &&\text{solve via master theorem or induction} \end{align*}\]
Alternatively:
Your professor is summarizing a classic analysis of the average-case comparisons in Quicksort using a more elegant expression derived from solving the recurrence via indicator random variables or expected value over pivot positions.
Key Idea:
- Let \(L(N)\) be the expected number of comparisons when sorting \(N\) elements with Quicksort.
- Each pair \((i, j)\) is compared at most once, and only if one of them is chosen as a pivot before any of the elements between them.
Probability Argument:
- For any two distinct elements \(a_i\) and \(a_j\), the probability that they are compared is \(\frac{2}{|j - i| + 1}\).
- This is because the first pivot chosen among \(a_i, a_{i+1}, ..., a_j\) must be either \(a_i\) or \(a_j\) for them to be directly compared.
Summing Over All Pairs:
There are \(\binom{N+1}{2}\) pairs, and the expected cost is:
\[ L(N) = \sum_{1 \le i < j \le N+1} \frac{2}{j - i + 1} \]
This sum can be manipulated into the form:
\[ L(N) = 2(N+1) \sum_{k=2}^{N+1} \frac{1}{k} = 2(N+1) \left( \sum_{k=1}^{N+1} \frac{1}{k} - 1 \right) \]
Harmonic Approximation:
The harmonic sum \(H_{N+1} = \sum_{k=1}^{N+1} \frac{1}{k} \approx \ln(N+1) + \gamma\)
So:
\[ L(N) \approx 2(N+1)(\ln(N+1) + \gamma - 1) \in \Theta(N \log N) \]
\[\begin{align*} L(N) &= \sum_{1 \le i < j \le N+1} \mathbb{P}(\text{elements } i \text{ and } j \text{ are compared}) \\ &= \sum_{1 \le i < j \le N+1} \frac{2}{j - i + 1} \\ &= 2 \sum_{d=1}^{N} (N + 1 - d) \cdot \frac{1}{d + 1} &&\text{substitute } d = j - i \\ &= 2(N+1) \sum_{k=2}^{N+1} \frac{1}{k} \\ &= 2(N+1) \left( \sum_{k=1}^{N+1} \frac{1}{k} - 1 \right) \\ &= 2(N+1) \left( H_{N+1} - 1 \right) &&\text{where } H_n = \sum_{k=1}^{n} \frac{1}{k} \\ &\approx 2(N+1) \left( \ln(N+1) + \gamma - 1 \right) \\ &\in \Theta(N \log N) \end{align*}\]
Correctness
- if an algorithm is not correct, all other properties (elegance, efficiency) are irrelevant. Correctness is therefore a fundamental concept
- two aspecs:
- specification: what not how
- implementation: how
\(\Rightarrow\)
- validation: Deos the specification really satisfy our goal?
- verification: Does the algorithms really implement the specification?
Nothing prevents conceptually the specification to be probability based. I.e. the correct solution is given as a probability and not as a precise value \(\Rightarrow\) e.g. Neural networks
- problem: how are randomized algorithms tested? (open problem)
3 Approach to Correctness
- Formal proof of Correctness (Royal way - but not practical)
- Assistance by the programming language itself
- Programming language reports errors when something wrong is done
- Programming language prevents from doing errors (strongly typed languages)
- Systematic software testing
- in python the module “unittest”
Formal Proof of Correctness
two approaches:
- On the basis of pseudo-code (does the algorithm do the the correct thing in principle, typically in literature)
- On the basis of the implementation (very complicated, usually only applied for critical software)
- bank software
- parts of the operating systems in aircraft \(\Rightarrow\) automated theorem prover.
VL 10 - 15.05.25
Correctness (Cont.)
Correctness on the level of pseudocode:
uses Axioms:
- Pre- and Postconditions
- Loop invariants \(\Rightarrow\) Induction proof
example1: finding the minimum element of an array
def min(a): = len(a) N = 0 m # i == 0 # invariant: a[m] == min(a[0..i]) for i in range(N-1): if a[i + 1] < a[m]: m = i + 1 # post: i == N - 1 return a[m]
example 2: selection sort
def selection_sort(a): = len(a) N # i == 0 # invariant: a[0..i-1] <= a[i..N-1] and sorted(a[0..i-1]) and i <= N while i in range(N - 1): = i m # k == i # invariant: a[m] == min(a[i..k]) for k in range(i, N-1): if a[k+1] < a[m]: m = k + 1 # post: k == N - 1 = a[i], a[m] a[m], a[i] # post: i == N - 1
axioms:
- Pre: -
- Post:
- len(a’) == len(a)
- a’ has the same elements as a
- a’ is sorted, i.e. a[i - 1] <= a[i], for all i in [1..N] …
Correctness on the Basis of Software Testing
Example for Testing, babylonian root method for finding square root of a number
def sqrt(x):
if x < 0:
raise ValueError("sqrt(x) with x < 0 not allowed")
= x // 2 # bug floor division
y while y * y != x:
= (y + x / 2) / 2
y return y
We use the unit test module:
class SqrtTest(unittest.TestCase):
def testSqrt(Self): # in this function we write the tests
self.assertRaises(ValueError, sqrt, -1)
self.assertEqual(sqrt(9), 3)
self.assertEqual(sqrt(1), 1) # bud => y = x / 2 instead of y = x // 2
self.assertEqual(sqrt(1.21)**2, 1.21) # bug because the function never finishes;
# infinite loop due to while loop condition
# fix: while abs(y**2 / x - 1) > 1e-15 (relative error)
# but the error is still there
# we need a new test that accepts a tolarance
self.assertAlmostEqual(sqrt(1.21)**2, 1.21)
self.assertEqual(sqrt(4), 2)
self.assertEqual(sqrt(0), 0) # bug because we divide by zero in the while condition
# fix: while abs(x - y**2) > 1e-15