字符串哈希（含例题）

tech2023-10-21 122

字符串哈希 • 字符串的hash是通过某种字符串hash函数将不同的字符串映射到不同的数字，配合其他数据结构或STL，进行判重，统计，查询等操作。实现哈希 • 一个常用的字符串hash函数是hash[i]=(hash[i-1]*p+idx(s[i]))%mod，即hash[i]是字符串的前i个字符组成的前缀的hash值，而idx(s)为字符s的一个自定义索引，例如，idx(‘a’)=1，idx(‘b’)=2，……，idx(‘z’)=26。 • 例如，p=7，mod=91，把字符串"abc"映射为一个整数： hash[0]=idx(‘a’)%91=1，字串"a"被映射为1； hash[1]=(hash[0]*p+idx(‘b’))%mod=9，表示字符串"ab"被映射为9； hash[2]=(hash[1]*p+idx(‘c’))%mod=66，所以，字符串"abc"被映射成66。运用哈希 • 基于字符串hash函数，可以求字符串的任何一个子串的hash值：hash[l…r]=((hash[r]-hash[l-1]*p^(r-l+1)) %mod+mod) %mod。 • 如上例，对于字符串"abab"， hash[2]=(hash[1]*p+idx(‘a’))%mod=64，表示字符串"aba"被映射为64； hash[3]=(hash[2]*p+idx(‘b’))%mod=86，即字符串"abab"被映射为86。则hash[2…3]=( (hash[3]-hash[1]*p²)%mod+mod)%mod =9=hash[1]，即字符串"abab"的第一个"ab"子串和第二个"ab"子串所对应的hash值相同，都是9。 • p和mod取值要合适，否则可能会出现不同字符串有相同的hash值。一般p和mod要取素数，p取一个6位到8位的较大的素数，mod取一个大素数，比如109+7，或者109+9。优化哈希 •经验值：根据极多大佬的计算得出p取131或13331时hash值重合概率较小，mod则取尽量大的素数 •溢出取模：用unsigned long long定义hash数组，在计算过程中因为unsigned long long的数据范围在-2⁶⁴~2⁶⁴,数据溢出时相当于对2⁶⁴取模。这样可以省去取模操作。 •在计算p的n次方时，用普通的累乘或pow函数常常会超时，这里可以采用快速幂，或在计算hash[i]值的同时计算p的i次方存在一个数组中

快速幂代码

ll cal(int x,ll y) //计算和返回 y^x % mod 的结果值 { ll re = 1; while(x) { if(x&1) re = (re*y)%mod; x >>= 1; y = (y *y)%mod; } return re; }

题目地址：POJ2406【Power Strings】

例题讲解

AC代码：

#include<iostream> #include<string.h> using namespace std; typedef unsigned long long ULL; const int N = 1000010,base = 131; char str[N]; ULL h[N],p[N]; int len; ULL get(int l,int r)//获得 l 到 r 的 hash 值 { return h[r] - h[l - 1] * p[r - l + 1]; } bool check(int x)//若所有长度为 x 的相邻子串对应的散列函数值相等，则返回 true；否则返回 false { for(int i = x*2 ;i <= len;i += x) if(h[x] != get(i-x+1,i)) return false; return true; } int main() { while(~scanf("%s",str + 1)) { len = strlen(str + 1); if(len == 1&&str[1] == '.') return 0; h[0] = 0,p[0] = 1; for(int i = 1;i <= len;i ++ ) { h[i] = h[i-1] * base + str[i] - 'a' + 1; p[i] = p[i-1] * base; } for(int i = 1;i <= len;i ++ ) if(len%i == 0 && check(i)) { printf("%d\n",len/i); break; } } return 0; }

最新回复(0)