algomaths
What
- A rolling hash algorithm where the hash is a polynomial function with the input values as coefficients.
- Uses only multiplications and additions.
H(s[1⋯k])=i=1∑ksi∗pk−i=s1∗pk−1+s2∗pk−2+⋯+sk∗p0modmmodm
where:
- s[1⋯k] : the substring of s from 1 to k
- k: the window size
- p and m: are some positive integers
- modulo m was done to avoid huge H values
- the choice of p and m affects the performance and the security of the hash function
- p should be chosen to have a modular multiplicative inverse
- for lower-case only strings, p=31 is a good choice.
- competitive programmers prefer a larger value of p: 29791, 11111, 111111.
- p should be greater than the alphabet size (number of possible values for si)
- m is typically a prime number.
- should be large since the colliding probability is m1.
- 109+7 and 109+9 are widely used.
Note:
- avoid converting s[i]→0 (e.g. a→0) because then the hashes for
a, aa, aaa, ...
are all 0
- the power series can be in reverse, i.e.
s1∗p0+s2∗p1+⋯+sk∗pk−1
To move window to the right:
H(s[2⋯k+1])=(H[1⋯k]−s1∗pk−1)∗p+sk+1modm
To move window to the left:
H(s[0⋯k−1])=(H[1⋯k]−sk)∗p−1+s0∗pk−1modm
- p−1 is a modular multiplicative inverse of p by which H can be multiply to get the result of modulo division without actually performing a division.
- p−1≡pm−2modm if m is prime and p is coprime to m
- In python, use the built in
pow
to calculate modular exponentiation efficiently:
Implementation
Applications
- usually used in the Rabin-Karp string search algorithm.