{"id":2469,"date":"2020-01-01T15:03:07","date_gmt":"2020-01-01T12:03:07","guid":{"rendered":"http:\/\/demensdeum.com\/blog\/?p=2469"},"modified":"2024-12-16T22:32:30","modified_gmt":"2024-12-16T19:32:30","slug":"longest-common-substring","status":"publish","type":"post","link":"https:\/\/demensdeum.com\/blog\/2020\/01\/01\/longest-common-substring\/","title":{"rendered":"Longest Common Substring"},"content":{"rendered":"<p>In this note I will describe an algorithm for solving the longest common substring problem. Let&#8217;s say we are trying to decrypt encrypted binary data, let&#8217;s first try to find common patterns using the longest substring search.<br \/>Input string for example:<br \/><strong>adasDATAHEADER??jpjjwerthhkjbcvkDATAHEADER??kkasdf<\/strong><br \/>We are looking for a string that is repeated twice:<br \/><strong>DATAHEADER??<\/strong><\/p>\n<h3>Prefixes<\/h3>\n<p>First, let&#8217;s write a method for comparing the prefixes of two strings, let it return the resulting string in which the characters of the left prefix are equal to the characters of the right prefix.<br \/>For example, for the lines:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-unknown\" data-lang=\"unknown\"><code>        val lhs = \"asdfWUKI\"\n        val rhs = \"asdfIKUW\"\n\n<\/code><\/pre>\n<\/div>\n<p>The resulting string is &#8211; asdf<br \/>Example in Kotlin:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-unknown\" data-lang=\"unknown\"><code>fun longestPrefix(lhs: String, rhs: String): String {\n        val maximalLength = min(lhs.length-1, rhs.length -1)\n        for (i in 0..maximalLength) {\n            val xChar = lhs.take(i)\n            val yChar = rhs.take(i)\n                if (xChar != yChar) {\n                    return lhs.substring(0, i-1)\n                }\n        }\n        return lhs.substring(0,maximalLength)\n}\n\n<\/code><\/pre>\n<\/div>\n<h3>Brute Force<\/h3>\n<p>When it doesn&#8217;t work out the good way, it&#8217;s worth resorting to brute force. Using the longestPrefix method, we&#8217;ll go through the string in two loops, the first one takes the string from i to the end, the second one from i + 1 to the end, and passes them to the search for the largest prefix. The time complexity of this algorithm is approximately O(n^2) ~ O(n*^3).<br \/>Example in Kotlin:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-unknown\" data-lang=\"unknown\"><code>fun searchLongestRepeatedSubstring(searchString: String): String {\n        var longestRepeatedSubstring = \"\"\n        for (x in 0..searchString.length-1) {\n            val lhs = searchString.substring(x)\n            for (y in x+1..searchString.length-1) {\n                val rhs = searchString.substring(y)\n                val longestPrefix = longestPrefix(lhs, rhs)\n                if (longestRepeatedSubstring.length < longestPrefix.length) {\n                    longestRepeatedSubstring = longestPrefix\n                }\n            }\n        }\n        return longestRepeatedSubstring\n}\n\n<\/code><\/pre>\n<\/div>\n<h3>Suffix array<\/h3>\n<p>For a more elegant solution, we need a tool - a data structure called \"Suffix Array\". This data structure is an array of substrings filled in a loop, where each substring starts from the next character of the string to the end.<br \/>For example for the line:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-unknown\" data-lang=\"unknown\"><code>adasDATAHEADER??\n\n<\/code><\/pre>\n<\/div>\n<p>The suffix array looks like this:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-unknown\" data-lang=\"unknown\"><code>adasDATAHEADER??\ndasDATAHEADER??\nasDATAHEADER??\nsDATAHEADER??\nDATAHEADER??\nATAHEADER??\nTAHEADER??\nAHEADER??\nHEADER??\nEADER??\nADER??\nDER??\nER??\nR??\n??\n?\n\n<\/code><\/pre>\n<\/div>\n<h3>Solving by sorting<\/h3>\n<p>We sort the suffix array, then we go through all the elements in a loop where the left hand (lhs) contains the current element, the right hand (rhs) contains the next one, and we calculate the longest prefix using the longestPrefix method.<br \/>Example in Kotlin:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-unknown\" data-lang=\"unknown\"><code>fun searchLongestRepeatedSubstring(searchString: String): String {\n    val suffixTree = suffixArray(searchString)\n    val sortedSuffixTree = suffixTree.sorted()\n\n    var longestRepeatedSubstring = \"\"\n    for (i in 0..sortedSuffixTree.count() - 2) {\n        val lhs = sortedSuffixTree[i]\n        val rhs = sortedSuffixTree[i+1]\n        val longestPrefix = longestPrefix(lhs, rhs)\n        if (longestRepeatedSubstring.length < longestPrefix.length) {\n            longestRepeatedSubstring = longestPrefix\n        }\n    }\n    return longestRepeatedSubstring\n}\n\n<\/code><\/pre>\n<\/div>\n<p>The time complexity of the algorithm is O(N log N), which is much better than a brute-force solution.<\/p>\n<h3>Sources<\/h3>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Longest_common_substring_problem\" target=\"_blank\" rel=\"noopener\">https:\/\/en.wikipedia.org\/wiki\/Longest_common_substring_problem<\/a> <\/p>\n<h3>Source code<\/h3>\n<p><a href=\"https:\/\/gitlab.com\/demensdeum\/algorithms\" target=\"_blank\" rel=\"noopener\">https:\/\/gitlab.com\/demensdeum\/algorithms<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this note I will describe an algorithm for solving the longest common substring problem. Let&#8217;s say we are trying to decrypt encrypted binary data, let&#8217;s first try to find common patterns using the longest substring search.Input string for example:adasDATAHEADER??jpjjwerthhkjbcvkDATAHEADER??kkasdfWe are looking for a string that is repeated twice:DATAHEADER?? Prefixes First, let&#8217;s write a method<a class=\"more-link\" href=\"https:\/\/demensdeum.com\/blog\/2020\/01\/01\/longest-common-substring\/\">Continue reading <span class=\"screen-reader-text\">&#8220;Longest Common Substring&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[61,52],"tags":[131,140,141],"class_list":["post-2469","post","type-post","status-publish","format-standard","hentry","category-techie","category-tutorials","tag-algorithms","tag-lcs","tag-suffix-array","entry"],"translation":{"provider":"WPGlobus","version":"3.0.2","language":"en","enabled_languages":["en","ru","zh","de","fr","ja","pt","hi"],"languages":{"en":{"title":true,"content":true,"excerpt":false},"ru":{"title":true,"content":true,"excerpt":false},"zh":{"title":true,"content":true,"excerpt":false},"de":{"title":true,"content":true,"excerpt":false},"fr":{"title":true,"content":true,"excerpt":false},"ja":{"title":true,"content":true,"excerpt":false},"pt":{"title":true,"content":true,"excerpt":false},"hi":{"title":false,"content":false,"excerpt":false}}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/posts\/2469","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/comments?post=2469"}],"version-history":[{"count":16,"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/posts\/2469\/revisions"}],"predecessor-version":[{"id":3926,"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/posts\/2469\/revisions\/3926"}],"wp:attachment":[{"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/media?parent=2469"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/categories?post=2469"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/demensdeum.com\/blog\/wp-json\/wp\/v2\/tags?post=2469"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}