Let’s say we need to find out whether the email address “demensdeum@gmail.com” is on the list of email addresses allowed to receive emails.
We’ll go through the entire list from the first to the last element, checking whether the element is equal to the specified address – we’ll implement a linear search algorithm. But it will take a long time, won’t it?
To answer this question, we use the “Time complexity of algorithms”, “O” notation. The worst-case linear search time is equal to the n-th number of array elements, we write this in “O” notation – O(n). Next, we need to explain that for any known algorithm, there are three performance indicators – the best-case, worst-case, and average-case execution time. For example, the email address “demensdeum@gmail.com” is in the first index of the array, then it will be found in the first step of the algorithm, which means that the best-case execution time is – O(1); and if it is at the end of the list, then this is the worst case – O(n)
But what about the software implementation details, hardware performance, they should influence big O, right? Now take a deep breath and imagine that the calculation of time complexity is calculated for some abstract ideal machine, which has only this algorithm and nothing else.
Algorithm
Okay, it turns out that linear search is quite slow, let’s try using Binary Search. First, it should be explained that we will not be working with binary data, this name was given to this method due to the peculiarities of its operation. Initially, we sort the array in lexicographic order, then the algorithm takes the range of the entire array, gets the middle element of the range, compares it lexicographically, and depending on the comparison result decides which range to take for further search – the upper half of the current one or the lower one. That is, at each step of the search a decision is made from two possible ones – binary logic. This step is repeated until either the word is found or not found (the lower and upper indices of the range intersect).
The performance of this algorithm is – the best case when an element in the middle of the array is immediately found is O(1), the worst case of enumeration is O(log n)
Pitfalls
When implementing binary search, I encountered not only an interesting problem of lack of standardization of lexicographic comparison in programming language libraries, but even discovered the absence of a single standard for implementing localeCompare within JavaScript. The ECMAScript standard allows for different implementations of this function, which is why when sorting using localeCompare, completely different results can be observed on different JavaScript engines.
Therefore, for the algorithm to work correctly, it is necessary to sort and use only the same lexicographic comparison algorithm in the work, otherwise nothing will work. But if, for example, you try to sort an array in Scala, and search using nodejs, without implementing your own sorting/sorting of one implementation, then nothing awaits you except disappointment in humanity.
Sources
What is lexicographic comparison and what does it represent?
Почему для вычисления сложности алгоритмов используется log N вместо lb N?
Двоичный поиск
Знай сложности алгоритмов
https://stackoverflow.com/questions/52941016/sorting-in-localecompare-in-javascript
Source code
https://gitlab.com/demensdeum/algorithms