Insertion Sort, Merge Sort

Insertion Sort

Insertion Sort – each element is compared to the previous ones in the list and the element is swapped with the larger one if there is one, otherwise the inner comparison loop stops. Since the elements are sorted from first to last, each element is compared to the already sorted list, which *may* reduce the overall running time. The time complexity of the algorithm is O(n^2), i.e. identical to bubble sort.

Merge Sort

Merge sort – the list is divided into groups of one element, then the groups are “merged” in pairs with simultaneous comparison. In my implementation, when merging pairs, the elements on the left are compared with the elements on the right, then moved to the resulting list, if there are no more elements on the left, then all the elements on the right are added to the resulting list (their additional comparison is unnecessary, since all elements in the groups undergo sorting iterations)
The operation of this algorithm is very easy to parallelize, the pair merging stage can be performed in threads, waiting for the end of iterations in the dispatcher.
Output of the algorithm for single-threaded execution:

["John", "Alice", "Mike", "#1", "Артем", "20", "60", "60", "DoubleTrouble"]
[["John"], ["Alice"], ["Mike"], ["#1"], ["Артем"], ["20"], ["60"], ["60"], ["DoubleTrouble"]]
[["Alice", "John"], ["#1", "Mike"], ["20", "Артем"], ["60", "60"], ["DoubleTrouble"]]
[["#1", "Alice", "John", "Mike"], ["20", "60", "60", "Артем"], ["DoubleTrouble"]]
[["#1", "20", "60", "60", "Alice", "John", "Mike", "Артем"], ["DoubleTrouble"]]
["#1", "20", "60", "60", "Alice", "DoubleTrouble", "John", "Mike", "Артем"]

Output of the algorithm for multithreaded execution:

["John", "Alice", "Mike", "#1", "Артем", "20", "60", "60", "DoubleTrouble"]
[["John"], ["Alice"], ["Mike"], ["#1"], ["Артем"], ["20"], ["60"], ["60"], ["DoubleTrouble"]]
[["20", "Артем"], ["Alice", "John"], ["60", "60"], ["#1", "Mike"], ["DoubleTrouble"]]
[["#1", "60", "60", "Mike"], ["20", "Alice", "John", "Артем"], ["DoubleTrouble"]]
[["DoubleTrouble"], ["#1", "20", "60", "60", "Alice", "John", "Mike", "Артем"]]
["#1", "20", "60", "60", "Alice", "DoubleTrouble", "John", "Mike", "Артем"]

The time complexity of the algorithm is O(n*log(n)), which is slightly better than O(n^2)

Sources

https://en.wikipedia.org/wiki/Insertion_sort
https://en.wikipedia.org/wiki/Merge_sort

Source code

https://gitlab.com/demensdeum /algorithms/-/tree/master/sortAlgorithms/insertionSort
https://gitlab.com/demensdeum/algorithms/-/tree/master/sortAlgorithms/mergeSort

Bubble Sort in Erlang

Bubble sort is pretty boring, but it becomes more interesting if you try to implement it in a functional language for telecom – Erlang.

We have a list of numbers, we need to sort it. The bubble sort algorithm goes through the entire list, iterating and comparing numbers in pairs. During the check, the following happens: the smaller number is added to the output list, or the numbers are swapped in the current list; if there is less on the right, the iteration continues with the next number in the iteration. This iteration is repeated until there are no more replacements in the list.

In practice, it is not worth using because of the high time complexity of the algorithm – O(n^2); I implemented it in Erlang, in an imperative style, but if you are interested, you can look for better options:

-module(bubbleSort).
-export([main/1]).

startBubbleSort([CurrentHead|Tail]) ->
    compareHeads(CurrentHead, Tail, [], [CurrentHead|Tail]).

compareHeads(CurrentHead, [NextHead|Tail], [], OriginalList) ->   
    if
        CurrentHead < NextHead ->
            compareHeads(NextHead, Tail, [CurrentHead], OriginalList);
        true ->
            compareHeads(CurrentHead, Tail, [NextHead], OriginalList)
    end;
    
compareHeads(CurrentHead, [NextHead|Tail], OriginalOutputList, OriginalList) ->
    if
        CurrentHead < NextHead ->
            OutputList = OriginalOutputList ++ [CurrentHead],
            compareHeads(NextHead, Tail, OutputList, OriginalList);
        true ->
            OutputList = OriginalOutputList ++ [NextHead],
            compareHeads(CurrentHead, Tail, OutputList, OriginalList)
    end;
  
compareHeads(CurrentHead, [], OriginalOutputList, OriginalList) ->
    OutputList = OriginalOutputList ++ [CurrentHead],
    if
        OriginalList == OutputList ->
            io:format("OutputList: ~w~n", [OutputList]);
        true ->
            startBubbleSort(OutputList)
    end.
  
main(_) ->
    UnsortedList = [69,7,4,44,2,9,10,6,26,1],
    startBubbleSort(UnsortedList).

Installation and launch

In Ubuntu, Erlang is very easy to install, just type sudo apt install erlang in the terminal. In this language, each file must be a module, with a list of functions that can be used from the outside – export. Interesting features of the language include the absence of variables, only constants, the absence of standard syntax for OOP (which does not prevent the use of OOP techniques), and of course parallel computations without locks based on the actor model.

You can launch the module either through the interactive console erl, running one command after another, or more simply through escript bubbleSort.erl; The file will look different for different cases, for example, for escript you need to make a main function from which it will start.

Sources

https://www.erlang.org/
https://habr.com/ru/post/197364/

Source code

https://gitlab.com/ demensdeum/algorithms/blob/master/bubbleSort/bubbleSort.erl

Lexicographic comparison algorithm

The lexicographic string comparison algorithm works very simply, in a loop the character codes are compared and the result is returned if the characters are not equal.

An example for the C language can be found here:
https://github.com/gcc-mirror/gcc/blob/master/libiberty/memcmp.c

It should be taken into account that the characters need to be compared in a single static encoding, for example, in Swift I used character-by-character comparison on UTF-32. The option of sorting an array using memcmp will work exactly for single-byte characters, in other cases (variable-length encodings) the order may be incorrect. I do not exclude the possibility of implementation based on variable-length encodings, but most likely it will be an order of magnitude more complicated.

Time complexity of the algorithm is O(1) in the best case, O(n) in the average and worst case

Sources

https://ru.wikipedia.org/wiki/Лексикографические_порядок

Sources

https://gitlab.com/demensdeum /algorithms/blob/master/lexiCompare/lexiCompare.swift

Binary search

Let’s say we need to find out whether the email address “demensdeum@gmail.com” is on the list of email addresses allowed to receive emails.

We’ll go through the entire list from the first to the last element, checking whether the element is equal to the specified address – we’ll implement a linear search algorithm. But it will take a long time, won’t it?

To answer this question, we use the “Time complexity of algorithms”, “O” notation. The worst-case linear search time is equal to the n-th number of array elements, we write this in “O” notation – O(n). Next, we need to explain that for any known algorithm, there are three performance indicators – the best-case, worst-case, and average-case execution time. For example, the email address “demensdeum@gmail.com” is in the first index of the array, then it will be found in the first step of the algorithm, which means that the best-case execution time is – O(1); and if it is at the end of the list, then this is the worst case – O(n)

But what about the software implementation details, hardware performance, they should influence big O, right? Now take a deep breath and imagine that the calculation of time complexity is calculated for some abstract ideal machine, which has only this algorithm and nothing else.

Algorithm

Okay, it turns out that linear search is quite slow, let’s try using Binary Search. First, it should be explained that we will not be working with binary data, this name was given to this method due to the peculiarities of its operation. Initially, we sort the array in lexicographic order, then the algorithm takes the range of the entire array, gets the middle element of the range, compares it lexicographically, and depending on the comparison result decides which range to take for further search – the upper half of the current one or the lower one. That is, at each step of the search a decision is made from two possible ones – binary logic. This step is repeated until either the word is found or not found (the lower and upper indices of the range intersect).

The performance of this algorithm is – the best case when an element in the middle of the array is immediately found is O(1), the worst case of enumeration is O(log n)

Pitfalls

When implementing binary search, I encountered not only an interesting problem of lack of standardization of lexicographic comparison in programming language libraries, but even discovered the absence of a single standard for implementing localeCompare within JavaScript. The ECMAScript standard allows for different implementations of this function, which is why when sorting using localeCompare, completely different results can be observed on different JavaScript engines.

Therefore, for the algorithm to work correctly, it is necessary to sort and use only the same lexicographic comparison algorithm in the work, otherwise nothing will work. But if, for example, you try to sort an array in Scala, and search using nodejs, without implementing your own sorting/sorting of one implementation, then nothing awaits you except disappointment in humanity.

Sources

What is lexicographic comparison and what does it represent?
Почему для вычисления сложности алгоритмов используется log N вместо lb N?
Двоичный поиск
Знай сложности алгоритмов
https://stackoverflow.com/questions/52941016/sorting-in-localecompare-in-javascript

Source code

https://gitlab.com/demensdeum/algorithms