KMP算法简介-string匹配

2022-01-25 11:59:13 阅读：233 来源： 互联网

标签：string 简介 LPS lps tp bp KMP match pw

KMP算法简介

KMP算法是用来做字符串匹配的，比如目标字符串：'acabacacd’是否能在源字符串：'acfacabacabacacdkacfacabacabacacdk’找到匹配。传统的暴力解法复杂度是O(nk)的，k是目标字符串长度，n是源字符串长度。所以有优化的KMP算法复杂度为O(n)

算法核心

这种算法的核心在于如何处理某个字符不match的情况。对于传统暴力算法，当某个位置字符不match的时候，就会往下顺移一位，这也是为什么暴力算法的复杂度是O(nk)。而KMP在某个位置字符不match的时候，会跳过某些不需要的比较，牵涉到的核心概念是“Longest Proper Prefix which is Suffix” （lps数组）。lps数组是一个整数数组，长度与目标字符串一致，每个位置的整数值代表“以这个位置index为结尾的后缀传和这个位置的数字作为index的之前的前缀串相同”。这个确实很不好理解，但是总体的意思就是，有了这个数组，当某个位置字符不match的时候，我们就无需一位一位顺移，而是可以跳（具体的原理非常难以理解，所以建议暂时先理解这个high level的idea）

lps数组计算

这个数据计算具体的implementation也是非常的难以理解，虽然只有寥寥几行代码，但是想要完全理解为什么是这样依旧非常困难。如果想要完全搞清楚这个意义，建议看下面两个详细教程：教程1，教程2
下面是根据我浅显的理解写的implementation。更详细的解释在上面两个教程里面也有。想不通了就回去看看

def create_LPS(w):
    # initialize the array with all 0s
    LPS = [0] * len(w)
    # tp means the pointer at the prefix position
    tp = 0
    # bp means the pointer at the suffix position
    bp = 1
    # iterate through the whole word
    while bp < len(w):
        # if two pointers points to the same char, we give a value in the lps array and increment two pointers
        if w[tp] == w[bp]:
            LPS[bp] = tp + 1
            tp += 1 
            bp += 1 
        # if two pointers points to different char, there are two posibilities:
        else:
            # 1. if tp is 0, we keep tp and put zero value in lps, and only increment bp by one (we don't increment tp because no match char found)
            if tp == 0:
                LPS[bp] = 0
                bp += 1 
            # 2. else, comes the part that very difficult to understand, we update the tp using the value in lps
            else:
                tp = LPS[tp-1]
                
    return LPS

完整KMP算法

有了这个lps数组，一旦遇到不match的字符，直接根据lps数组跳过无需比较的位置

LPS = create_LPS(word)
pt = 0
pw = 0

while pt < len(text):
    # if current place match, keep going until found the complete match or a mismatch
    if text[pt] == word[pw]:
        pt += 1 
        pw += 1
        if pw == len(word):
            print('found',pt-len(word),pt)
            # we can jump again if we want to search for multiple matching once one match is already found
            pw = LPS[pw-1]
    # if current place does not match, two conditions
    else:
        # if pointer in word is 0, means the beginning does not match, we only need to increment the pointer in the text
        if pw == 0:
            pt += 1 
        # if pointer in word is not 0, means there is some place we can jump according to the LPS array
        else:
            pw = LPS[pw-1]

在具体的详细教程后面再写吧，确实比较难解释。。。

引用：

教程1，教程2

标签：string,简介,LPS,lps,tp,bp,KMP,match,pw
来源： https://blog.csdn.net/qq_37821701/article/details/122681569

本站声明： 1. iCode9 技术分享网（下文简称本站）提供的所有内容，仅供技术学习、探讨和分享；
2. 关于本站的所有留言、评论、转载及引用，纯属内容发起人的个人观点，与本站观点和立场无关；
3. 关于本站的所有言论和文字，纯属内容发起人的个人观点，与本站观点和立场无关；
4. 本站文章均是网友提供，不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属；如您发现该文章侵犯了您的权益，可联系我们第一时间进行删除；
5. 本站为非盈利性的个人网站，所有内容不会用来进行牟利，也不会利用任何形式的广告来间接获益，纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

ICode9

KMP算法简介-string匹配

KMP算法简介

算法核心

lps数组计算

完整KMP算法

引用：