What
- A rolling hash algorithm where the hash is a polynomial function with the input values as coefficients.
- Uses only multiplications and additions.
Formula
where:
- : the substring of s from to
- : the window size
- and : are some positive integers
- modulo was done to avoid huge values
- the choice of and affects the performance and the security of the hash function
- should be chosen to have a modular multiplicative inverse
- for lower-case only strings, is a good choice.
- competitive programmers prefer a larger value of : , , .
- should be greater than the alphabet size (number of possible values for )
- is typically a prime number.
- should be large since the colliding probability is .
- and are widely used.
- should be chosen to have a modular multiplicative inverse
Note:
- avoid converting (e.g. ) because then the hashes for
a, aa, aaa, ...
are all 0 - the power series can be in reverse, i.e.
To move window to the right:
To move window to the left:
- is a modular multiplicative inverse of by which can be multiply to get the result of modulo division without actually performing a division.
- if m is prime and is coprime to
- In python, use the built in
pow
to calculate modular exponentiation efficiently:
pow(p, m - 2, m)
Implementation
class PolynomialRollingHash:
M = 10**9 + 9
P = 29791 # 31, 53, 29791, 11111, 111111
P_INV = pow(P, M - 2, M)
POWS = [1]
def __init__(self, sequence=""):
self.sequence = deque()
self.hash = 0
for element in sequence:
self.append(element)
def get_pow(self, i) -> int:
while len(self.POWS) <= i:
self.POWS.append(self.POWS[-1] * self.P % self.M)
return self.POWS[i]
def encode(self, element) -> int:
"""
Encode a single element
Note: avoid producing an encoding of 0.
E.g. a -> 0 then the hashes for a, aa, aaa, ... are all 0.
"""
# for lower-cased str inputs
return ord(element) - 96
def appendleft(self, element) -> int:
x = self.encode(element)
p = self.get_pow(len(self.sequence))
self.sequence.appendleft(element)
self.hash = (x * p + self.hash) % self.M
return self.hash
def popleft(self) -> int:
element = self.sequence.popleft()
x = self.encode(element)
p = self.get_pow(len(self.sequence))
self.hash = (self.hash - x * p) % self.M
return self.hash
def append(self, element) -> int:
x = self.encode(element)
self.sequence.append(element)
self.hash = (self.hash * self.P + x) % self.M
return self.hash
def pop(self) -> int:
element = self.sequence.pop()
x = self.encode(element)
self.hash = ((self.hash - x) * self.P_INV) % self.M
return self.hash
Applications
- usually used in the Rabin-Karp string search algorithm.