Shell Sort

Shell Sort is a variant of insertion sorting with preliminary combing of the array of numbers.

We need to remember how insertion sort works:

1. The loop starts from zero to the end of the loop, thus the array is divided into two parts
2. For the left part, the second cycle is started, comparing elements from right to left, the smaller element on the right is omitted until a smaller element on the left is found
3. At the end of both cycles, we get a sorted list

Once upon a time, computer scientist Donald Shell was puzzled by how to improve the insertion sort algorithm. He came up with the idea of ​​also going through the array with two cycles, but at a certain distance, gradually reducing the “comb” until it turns into a regular insertion sort algorithm. Everything is really that simple, no pitfalls, we add another cycle to the two cycles on top, in which we gradually reduce the size of the “comb”. The only thing you will need to do is check the distance when comparing, so that it does not go beyond the array.

A really interesting topic is the choice of the sequence of changing the comparison length at each iteration of the first cycle. It is interesting because the performance of the algorithm depends on it.

A table of known variants and time complexity can be found here: https://en.wikipedia.org/wiki/Shellsort#Gap_sequences

Different people were involved in calculating the ideal distance, apparently they were so interested in the topic. Couldn’t they just run Ruby and call the fastest sort() algorithm?

In general, these strange people wrote dissertations on the topic of calculating the distance/gap of the “comb” for the Shell algorithm. I simply used the results of their work and checked 5 types of sequences, Hibbard, Knuth-Pratt, Chiura, Sedgewick.

import time
import random
from functools import reduce
import math

DEMO_MODE = False

if input("Demo Mode Y/N? ").upper() == "Y":
    DEMO_MODE = True

class Colors:
    BLUE = '\033[94m'
    RED = '\033[31m'
    END = '\033[0m'

def swap(list, lhs, rhs):
    list[lhs], list[rhs] = list[rhs], list[lhs]
    return list

def colorPrintoutStep(numbers: List[int], lhs: int, rhs: int):
    for index, number in enumerate(numbers):
        if index == lhs:
            print(f"{Colors.BLUE}", end = "")
        elif index == rhs:
            print(f"{Colors.RED}", end = "")
        print(f"{number},", end = "")
        if index == lhs or index == rhs:
            print(f"{Colors.END}", end = "")
        if index == lhs or index == rhs:
            print(f"{Colors.END}", end = "")
    print("\n")
    input(">")

def ShellSortLoop(numbers: List[int], distanceSequence: List[int]):
    distanceSequenceIterator = reversed(distanceSequence)
    while distance:= next(distanceSequenceIterator, None):
        for sortArea in range(0, len(numbers)):
            for rhs in reversed(range(distance, sortArea + 1)):
                lhs = rhs - distance
                if DEMO_MODE:
                    print(f"Distance: {distance}")
                    colorPrintoutStep(numbers, lhs, rhs)
                if numbers[lhs] > numbers[rhs]:
                    swap(numbers, lhs, rhs)
                else:
                    break

def ShellSort(numbers: List[int]):
    global ShellSequence
    ShellSortLoop(numbers, ShellSequence)

def HibbardSort(numbers: List[int]):
    global HibbardSequence
    ShellSortLoop(numbers, HibbardSequence)

def ShellPlusKnuttPrattSort(numbers: List[int]):
    global KnuttPrattSequence
    ShellSortLoop(numbers, KnuttPrattSequence)

def ShellPlusCiuraSort(numbers: List[int]):
    global CiuraSequence
    ShellSortLoop(numbers, CiuraSequence)

def ShellPlusSedgewickSort(numbers: List[int]):
    global SedgewickSequence
    ShellSortLoop(numbers, SedgewickSequence)

def insertionSort(numbers: List[int]):
    global insertionSortDistanceSequence
    ShellSortLoop(numbers, insertionSortDistanceSequence)

def defaultSort(numbers: List[int]):
    numbers.sort()

def measureExecution(inputNumbers: List[int], algorithmName: str, algorithm):
    if DEMO_MODE:
        print(f"{algorithmName} started")
    numbers = inputNumbers.copy()
    startTime = time.perf_counter()
    algorithm(numbers)
    endTime = time.perf_counter()
    print(f"{algorithmName} performance: {endTime - startTime}")

def sortedNumbersAsString(inputNumbers: List[int], algorithm) -> str:
    numbers = inputNumbers.copy()
    algorithm(numbers)
    return str(numbers)

if DEMO_MODE:
    maximalNumber = 10
    numbersCount = 10
else:
    maximalNumber = 10
    numbersCount = random.randint(10000, 20000)

randomNumbers = [random.randrange(1, maximalNumber) for i in range(numbersCount)]

ShellSequenceGenerator = lambda n: reduce(lambda x, _: x + [int(x[-1]/2)], range(int(math.log(numbersCount, 2))), [int(numbersCount / 2)])
ShellSequence = ShellSequenceGenerator(randomNumbers)
ShellSequence.reverse()
ShellSequence.pop()

HibbardSequence = [
    0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095,
    8191, 16383, 32767, 65535, 131071, 262143, 524287, 1048575,
    2097151, 4194303, 8388607, 16777215, 33554431, 67108863, 134217727,
    268435455, 536870911, 1073741823, 2147483647, 4294967295, 8589934591
]

KnuttPrattSequence = [
    1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524, 88573, 265720, 
    797161, 2391484, 7174453, 21523360, 64570081, 193710244, 581130733, 
    1743392200, 5230176601, 15690529804, 47071589413
]

CiuraSequence = [
            1, 4, 10, 23, 57, 132, 301, 701, 1750, 4376, 
            10941, 27353, 68383, 170958, 427396, 1068491, 
            2671228, 6678071, 16695178, 41737946, 104344866, 
            260862166, 652155416, 1630388541
]

SedgewickSequence = [
            1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905,
            8929, 16001, 36289, 64769, 146305, 260609, 587521, 
            1045505, 2354689, 4188161, 9427969, 16764929, 37730305, 
            67084289, 150958081, 268386305, 603906049, 1073643521, 
            2415771649, 4294770689, 9663381505, 17179475969
]

insertionSortDistanceSequence = [1]

algorithms = {
    "Default Python Sort": defaultSort,
    "Shell Sort": ShellSort,
    "Shell + Hibbard" : HibbardSort,
    "Shell + Prat, Knutt": ShellPlusKnuttPrattSort,
    "Shell + Ciura Sort": ShellPlusCiuraSort,
    "Shell + Sedgewick Sort": ShellPlusSedgewickSort,
    "Insertion Sort": insertionSort
}

for name, algorithm in algorithms.items():
    measureExecution(randomNumbers, name, algorithm)

reference = sortedNumbersAsString(randomNumbers, defaultSort)

for name, algorithm in algorithms.items():
    if sortedNumbersAsString(randomNumbers, algorithm) != reference:
        print("Sorting validation failed")
        exit(1)

print("Sorting validation success")
exit(0)

In my implementation, for a random set of numbers, the fastest gaps are Sedgewick and Hibbard.

mypy

I would also like to mention the static typing analyzer for Python 3 – mypy. It helps to cope with the problems inherent in languages ​​with dynamic typing, namely, it eliminates the possibility of slipping something where it shouldn’t.

As experienced programmers say, “static typing is not needed when you have a team of professionals”, someday we will all become professionals, we will write code in complete unity and understanding with machines, but for now you can use such utilities and languages ​​with static typing.

Links

https://gitlab.com/demensdeum /algorithms/-/tree/master/sortAlgorithms/shellSort
http://mypy-lang.org/

Sources

https://dl.acm.org/doi/10.1145/368370.368387
https://en.wikipedia.org/wiki/Shellsort
http://rosettacode.org/wiki/Sorting_algorithms/Shell_sort
https://ru.wikipedia.org/wiki/Сортировка_Шелла
https://neerc.ifmo.ru/wiki/index.php?title=Сортировка_Шелла
https://twitter.com/gvanrossum/status/700741601966985216