algomaths

What

  • A rolling hash algorithm where the hash is a polynomial function with the input values as coefficients.
  • Uses only multiplications and additions.

Formula

where:

  • : the substring of s from to
  • : the window size
  • and : are some positive integers
    • modulo was done to avoid huge values
    • the choice of and affects the performance and the security of the hash function
      • should be chosen to have a modular multiplicative inverse
        • for lower-case only strings, is a good choice.
        • competitive programmers prefer a larger value of : , , .
      • should be greater than the alphabet size (number of possible values for )
      • is typically a prime number.
        • should be large since the colliding probability is .
        • and are widely used.

Note:

  • avoid converting (e.g. ) because then the hashes for a, aa, aaa, ... are all 0
  • the power series can be in reverse, i.e.

To move window to the right:

To move window to the left:

  • is a modular multiplicative inverse of by which can be multiply to get the result of modulo division without actually performing a division.
    • if m is prime and is coprime to
    • In python, use the built in pow to calculate modular exponentiation efficiently:
pow(p, m - 2, m)

Implementation

class PolynomialRollingHash:
    M = 10**9 + 9
    P = 29791  # 31, 53, 29791, 11111, 111111
    P_INV = pow(P, M - 2, M)
    POWS = [1]
 
    def __init__(self, sequence=""):
        self.sequence = deque()
        self.hash = 0
        for element in sequence:
            self.append(element)
 
    def get_pow(self, i) -> int:
        while len(self.POWS) <= i:
            self.POWS.append(self.POWS[-1] * self.P % self.M)
 
        return self.POWS[i]
 
    def encode(self, element) -> int:
        """
        Encode a single element
 
        Note: avoid producing an encoding of 0.
        E.g. a -> 0 then the hashes for a, aa, aaa, ... are all 0.
        """
 
        # for lower-cased str inputs
        return ord(element) - 96
 
    def appendleft(self, element) -> int:
        x = self.encode(element)
        p = self.get_pow(len(self.sequence))
        self.sequence.appendleft(element)
 
        self.hash = (x * p + self.hash) % self.M
 
        return self.hash
 
    def popleft(self) -> int:
        element = self.sequence.popleft()
        x = self.encode(element)
        p = self.get_pow(len(self.sequence))
 
        self.hash = (self.hash - x * p) % self.M
 
        return self.hash
 
    def append(self, element) -> int:
        x = self.encode(element)
        self.sequence.append(element)
 
        self.hash = (self.hash * self.P + x) % self.M
 
        return self.hash
 
    def pop(self) -> int:
        element = self.sequence.pop()
        x = self.encode(element)
 
        self.hash = ((self.hash - x) * self.P_INV) % self.M
 
        return self.hash

Applications

  • usually used in the Rabin-Karp string search algorithm.