Week 4

Published: May 6, 2025

VL 7 - 06.05.25

Selection Sort (Sorting by selecting)

Principle:

  • sort from left to right (everything left of the current position is already sorted)
  • select the minimum of the remaining elements to the right of the current position and move it there.
def selection_sort(a):
    N = len(a)
    # invariant: sorted(a[0..i-1]) and a[0..i-1] <= a[i..N-1]
    for i in range(N - 1):
        j = i # index of the minimum found so far
        # inner invariant: a[j] == min(a[i..k-1])
        for k in range(i + 1, N):
            if a[k] < a[j]: j = k
        a[i], a[j] = a[j], a[i] # move the minimum to position i

a = [3, -1, 5, 10]
selection_sort(a)
print(a)
[-1, 3, 5, 10]

Complexity Analysis of Selection Sort

How many steps does selection sort take?

  • obviously depends on N.
  • estimating the runtime: \(T(N) \approx c \cdot f(N)\), where \(f(N)\) is some simple function (the formally correct version comes later with the \(\mathcal{O}\) notation)

The inner loop compares a[k] < a[j] for each k in range(i + 1, N), so the number of comparisons decreases as i increases. This gives the following table for the nested loops:

| i     | k                | number of comparisons |
|-------|------------------|-----------------------|
| 0     | 1 … N - 1        | N - 1                 |
| 1     | 2 … N - 1        | N - 2                 |
| 2     | 3 … N - 1        | N - 3                 |
| ⋮     | ⋮                | ⋮                     |
| N - 3 | N - 2 … N - 1    | 2                     |
| N - 2 | N - 1            | 1                     |
| N - 1 | — (no iteration) | 0                     |

Total number of comparisons:

To find the overall time complexity, sum all the comparisons:

\[ (N - 1) + (N - 2) + \dots + 1 = \frac{N(N - 1)}{2} \]

So the time complexity of selection sort is \(\mathcal{O}(N^2)\) in the best, average, and worst case (it always performs the same number of comparisons).
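
To see the quadratic growth concretely, here is a small sketch (not from the lecture; count_comparisons is a hypothetical helper) that counts the comparisons of selection sort empirically and checks them against the formula:

def count_comparisons(N): # hypothetical helper, not from the lecture
    a = list(range(N)) # the content doesn't matter for the count, only the length
    count = 0
    for i in range(N - 1):
        j = i
        for k in range(i + 1, N):
            count += 1 # one comparison a[k] < a[j] per inner iteration
            if a[k] < a[j]: j = k
        a[i], a[j] = a[j], a[i]
    return count

for N in [10, 100, 1000]:
    print(N, count_comparisons(N), N * (N - 1) // 2)
10 45 45
100 4950 4950
1000 499500 499500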

How can sorting functions be made generic so that they can be provided in libraries? \(\Rightarrow\) via an API (Application Programming Interface) that lets the user choose the sorting criterion, like this:

def selection_sort2(a, key=lambda x: x):
    N = len(a)
    # invariant: sorted(a[0..i-1]) and a[0..i-1] <= a[i..N-1] (w.r.t. key)
    for i in range(N - 1):
        m = i # index of the minimal key found so far
        for k in range(i + 1, N):
            if key(a[k]) < key(a[m]): m = k
        a[i], a[m] = a[m], a[i]

Assume we have an array that contains tuples of students:

students = [("andrej", 20, 1.3),
            ("igor", 22, 2.7),
            ("olga", 21, 1.7)]

Then we can sort by different criteria as follows:

selection_sort2(students, lambda x : x[0]) # sort w.r.t. names
print(students)
selection_sort2(students, lambda x : x[1]) # sort w.r.t. age
print(students)
selection_sort2(students, lambda x : x[2]) # sort w.r.t. grade
print(students)
[('andrej', 20, 1.3), ('igor', 22, 2.7), ('olga', 21, 1.7)]
[('andrej', 20, 1.3), ('olga', 21, 1.7), ('igor', 22, 2.7)]
[('andrej', 20, 1.3), ('olga', 21, 1.7), ('igor', 22, 2.7)]

Stable Sorting

Stable sorting preserves the original order of elements with equal keys, which is especially useful when sorting complex objects.

  • Example: Given a = [("Abel", 20), ("Babel", 20), ("Frey", 19)], initially sorted by name, sorting again by age using sort(a, key=lambda x: x[1]) yields a = [("Frey", 19), ("Abel", 20), ("Babel", 20)].

The idea behind stable sorting is that you can first sort by a secondary criterion (e.g., names), then by a primary one (e.g., grades), and the final result will retain the secondary order within groups that share the same primary key. Without stability, you’d need to re-sort each group manually.
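
As a quick illustration of the two-pass idea, here is a sketch using Python's built-in sorted, which is stable (the student tuples are illustrative, not from the lecture): sorting first by name and then by grade keeps students with equal grades in name order.

students2 = [("olga", 21, 1.7), ("igor", 22, 2.7), ("andrej", 20, 1.7)]
by_name = sorted(students2, key=lambda x: x[0]) # first pass: secondary key (name)
by_grade = sorted(by_name, key=lambda x: x[2]) # second pass: primary key (grade)
print(by_grade)
[('andrej', 20, 1.7), ('olga', 21, 1.7), ('igor', 22, 2.7)]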

Insertion Sort

  • insertion sort is stable
  • insertion sort is the fastest algorithm for small N.
  • Principle:
    • sorts from left to right.
    • move the element a[i] left into its correct position within the sorted prefix (conceptually: create a “hole” at the correct position for a[i])
def insertion_sort(a):
    N = len(a)
    # invariant: sorted(a[0..i-1])
    for i in range(1, N):
        k = i
        # swap a[k] to the left until it reaches its correct position
        while k > 0:
            if a[k] < a[k - 1]:
                a[k - 1], a[k] = a[k], a[k - 1]
            else: break
            k = k - 1

a = [1, -3, 4, 10, -99]
insertion_sort(a)
print(a)
[-99, -3, 1, 4, 10]

Run-time analysis of Insertion Sort

The runtime of insertion sort depends on whether the array was sorted beforehand. (For selection sort, the runtime always follows the same formula.)

  • Case 1 (best case): the array is already sorted \(\Rightarrow \mathcal{O}(N)\), since the inner while loop hits the break statement immediately.

  • Case 2 (worst case): the array is reverse sorted \(\Rightarrow \mathcal{O}(N^2)\), since the inner while loop runs over the whole range of \(i\) steps each time.

  • Case 3 (average case): the array is in random order, i.e. intuitively somewhere between sorted and reverse sorted. In this case the inner while loop breaks after about \(\frac{i}{2}\) steps. Then the complexity can be written as

    \[T(N) = \frac{c}{2}(1 + 2 + \ldots + (N - 1)) \in \mathcal{O}(N^2)\]
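
The three cases can also be checked empirically; here is a small sketch (insertion_sort_swaps is a hypothetical helper, not from the lecture) that counts the swaps of the inner loop:

import random

def insertion_sort_swaps(a): # hypothetical helper, not from the lecture
    a = list(a)
    count = 0
    for i in range(1, len(a)):
        k = i
        while k > 0 and a[k] < a[k - 1]:
            a[k - 1], a[k] = a[k], a[k - 1] # same swap as in insertion_sort
            count += 1
            k -= 1
    return count

N = 100
print(insertion_sort_swaps(range(N)))                   # best case: 0
print(insertion_sort_swaps(range(N - 1, -1, -1)))       # worst case: N(N-1)/2 = 4950
print(insertion_sort_swaps(random.sample(range(N), N))) # random order: around N(N-1)/4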

Faster Sorting Algorithms

  • there are faster sorting algorithms that run in \(\mathcal{O}(N\cdot\log{N})\) or even \(\mathcal{O}(N)\) (bucket sort).
  • such algorithms are faster because they employ the “divide and conquer” principle (recursion):
    • divide the problem into smaller sub-problems and solve those
    • combine the smaller solutions
  • two classic examples of such algorithms:
    • merge sort:
      • divide the array into two equal parts
      • sort the two halves
      • merge the sorted halves
    • quick sort:
      • choose a pivot element
      • divide the array into “smaller than pivot” and “greater than pivot”
      • sort the individual parts.

VL 8 - 06.05.25

Efficient Sorting Algorithms (Cont.)

  • for small values, \(N < 30\), insertion sort is the fastest algorithm
  • for larger \(N\), efficient sorting algorithms should be preferred: merge sort and quicksort (divide & conquer)

Merge Sort

  1. Divide the array into two equal parts
  2. Sort the two halves (recursively)
  3. Merge the sorted halves
def merge_sorted_arrays(l, r): # both l and r are sorted
  a = [] # the sorted result array
  i, k = 0, 0
  Nl, Nr = len(l), len(r)
  # invariant: a == merged(l[0:i], r[0:k])
  while i < Nl and k < Nr:
    if l[i] <= r[k]: # <= instead of < so that sorting is stable
      a.append(l[i]) # append is efficient
      i += 1
    else:
      a.append(r[k])
      k += 1
  a += l[i : Nl] # append rest of l
  a += r[k : Nr] # append rest of r
  return a

l = [1, 3, 10, 11]
r = [-1, 2, 20]

a = merge_sorted_arrays(l, r)
print(a)
[-1, 1, 2, 3, 10, 11, 20]

The final step

a += l[i : Nl] # append rest of l
a += r[k : Nr] # append rest of r

is enabled by Python's elegant slicing syntax. The usual way, without slicing, would be:

while i < Nl:
  a.append(l[i])
  i += 1
while k < Nr:
  a.append(r[k])
  k += 1

Now we define merge sort recursively as follows:

def merge_sort(a):
  N = len(a)
  if N <= 1: return a
  l = merge_sort(a[: N//2])
  r = merge_sort(a[N//2:])
  return merge_sorted_arrays(l, r)

import random

a = list(range(20))
random.shuffle(a)
print(a)
a = merge_sort(a)
print(a)
[7, 14, 6, 16, 12, 5, 18, 2, 8, 19, 4, 10, 0, 1, 3, 11, 17, 15, 9, 13]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
  • Disadvantage: Merge sort is not in-place \(\Rightarrow\) extra memory is necessary for each merge sort call
  • Advantage: Merge sort is efficient and stable
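
Because insertion sort is the fastest algorithm for small \(N\), libraries often combine the two approaches. A minimal sketch of such a hybrid (hybrid_merge_sort and the cutoff 30 are assumptions based on the rule of thumb above, not from the lecture):

def hybrid_merge_sort(a, cutoff=30): # hypothetical helper, not from the lecture
  if len(a) <= cutoff:
    insertion_sort(a) # fastest for small arrays
    return a
  l = hybrid_merge_sort(a[: len(a) // 2], cutoff)
  r = hybrid_merge_sort(a[len(a) // 2 :], cutoff)
  return merge_sorted_arrays(l, r)
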
Run-time Analysis of Merge Sort

The runtime of merge sort satisfies the following recurrence in both the worst and the best case:

\[\begin{align*} &T(0) = T(1) = c_1 \\ &T(N) = c_2 + 2\cdot T\!\left(\frac{N}{2}\right) + c_3 N \end{align*}\]

Solution of the recurrence (the recursion-tree diagram from the lecture is omitted here): each of the \(\log_2 N\) levels of the recursion tree costs \(c_3 N\) for merging, and the \(N\) base cases together cost \(\mathcal{O}(N)\), so
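
\[T(N) = c_3 \cdot N \log_2 N + \mathcal{O}(N) \in \mathcal{O}(N \log N)\]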

Quicksort

  • in practice somewhat faster than merge sort (with a good choice of pivot)
  • not stable, but in-place.
  • idea:
    1. choose an element of the array as “pivot” (a fixed choice is naive; better to choose the pivot randomly, see below)
    2. partition the array (an incomplete sorting):
      1. the pivot is at its final position.
      2. all elements <= pivot are left of the pivot
      3. all elements > pivot are right of the pivot
      4. the left and right portions are not yet sorted
    3. call quicksort recursively on the left and right partitions (divide and conquer)

First we define the partitioning function:

def partition(a, l, r):  # l, r are left and right boundaries (inclusive) of a
  pivot = a[r] # naive choice: the right-most element as pivot
  i = l
  k = r - 1
  while True:
    while i < r and a[i] <= pivot: i += 1 # scan right until a misplaced element is found
    while k > l and a[k] >= pivot: k -= 1 # scan left until a misplaced element is found
    if i < k: a[i], a[k] = a[k], a[i] # swap the two misplaced elements
    else: break
  a[i], a[r] = a[r], a[i] # bring the pivot to its correct position
  return i # so that the recursive calls know where the partitions are
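
A quick check of what partition does on its own (illustrative input, not from the lecture): the pivot 6 ends up at its final index 5, everything smaller is left of it, everything larger right of it, and neither side is sorted yet.

a = [3, 8, 2, 5, 1, 4, 7, 6]
i = partition(a, 0, len(a) - 1) # pivot is a[r] == 6
print(i, a)
5 [3, 4, 2, 5, 1, 6, 7, 8]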

Next we define the recursive implementation:

def quick_sort_impl(a, l, r):
  if r <= l: return # no return value since sorting happens in place
  i = partition(a, l, r)
  quick_sort_impl(a, l, i-1)
  quick_sort_impl(a, i+1, r)

Finally we define a wrapper function as an interface for the user:

def quick_sort(a):
  quick_sort_impl(a, 0, len(a) - 1)

a = list(range(20))
random.shuffle(a)
print(a)
quick_sort(a)
print(a)
[3, 12, 6, 13, 10, 19, 1, 14, 7, 4, 16, 9, 17, 18, 0, 11, 8, 15, 2, 5]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Why is a fixed pivot choice (here: always the right-most element) naive?

  • if the array is already sorted, the pivot (the largest element) stays at the last position, so the partition splits into sizes \(N - 1\) and \(0\); this repeats at every level of the recursion \(\Rightarrow \mathcal{O}(N^2)\).

  • a simple fix: shuffle the array before sorting (somewhat wasteful on its own) \(\Rightarrow\) equivalently: choose the pivot randomly each time.

    import random
    p = random.randint(l, r)
    a[p], a[r] = a[r], a[p] # swap the randomly chosen element to the right-most position
    pivot = a[r]
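
Putting this together, a minimal sketch of a fully randomized quicksort (partition_random and quick_sort_random_impl are hypothetical names; this integration is my own, not verbatim from the lecture):

import random

def partition_random(a, l, r): # hypothetical helper, not from the lecture
  p = random.randint(l, r) # choose the pivot index randomly
  a[p], a[r] = a[r], a[p] # swap it to the right-most position
  return partition(a, l, r) # then partition exactly as before

def quick_sort_random_impl(a, l, r):
  if r <= l: return
  i = partition_random(a, l, r)
  quick_sort_random_impl(a, l, i - 1)
  quick_sort_random_impl(a, i + 1, r)

a = list(range(1000)) # already sorted: the worst case for the fixed pivot choice
quick_sort_random_impl(a, 0, len(a) - 1) # expected O(N log N) regardless of input order
print(a[:3], a[-3:])
[0, 1, 2] [997, 998, 999]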