This script implements the Rabin-Karp algorithm for substring search.
The Rabin-Karp algorithm is a string-searching algorithm that efficiently finds a pattern within a text. It works by hashing the pattern and sliding a window across the text, checking for matches based on hash values.
- Initialization:
  - n: string length
  - m: substring length
  - q: prime modulus
  - d: ASCII alphabet size
- Calculate Hashes: Calculate the hash of the substring and the hash of the first m-character window of the string.
- Compare Hashes:
  - If the hashes are equal, output the current index.
  - If the hashes are not equal, recalculate the hash for the next m-character window of the string: remove the contribution of the window's first character and add the contribution of the next character.
- If no matches are found, output -1.
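The steps above can be sketched in Python as follows. This is a minimal illustration, not necessarily the code in this repository: the default q=101, the handling of an empty substring, returning only the first match, and the extra character-by-character check on equal hashes are all assumptions made for the example.

```python
def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 101) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # assumption: an empty pattern matches at index 0, like str.find
    if m > n:
        return -1

    # h = d^(m-1) mod q, the weight of the window's leading character
    h = pow(d, m - 1, q)

    # Hash the pattern and the first m-character window of the text
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q

    for i in range(n - m + 1):
        # Equal hashes mark a candidate; compare characters to rule out a collision
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Roll the window: drop text[i], shift by d, append text[i + m]
            t_hash = (d * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % q
    return -1


print(rabin_karp("Rabin Karp", "Karp"))  # 6
```

Python's `%` always returns a non-negative result for a positive modulus, so the rolling update needs no extra correction after the subtraction.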
- Clone this repository
git clone https://github.com/torshin5ergey/python-playground.git
- Go to this project directory
cd python-playground/ungrouped/rabin_karp
- Run Python file
python rabin_karp.py
- Run unit tests file
python test_rabin_karp.py
or
python -m unittest test_rabin_karp.py
This will run the unit tests and display the results.
- rabinkarp.py: The Rabin-Karp algorithm in the rabin_karp() function.
- test_rabinkarp.py: Unit tests for the rabin_karp() function using Python's unittest framework.
Test class: TestRabinKarp
Included tests:
- test_smoke_rabin_karp(): basic functionality tests (smoke tests).
- test_edge_rabin_karp(): edge-case tests.
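For orientation, a test class of this shape might look like the sketch below. The class and method names come from the list above, but the test bodies are illustrative assumptions, not the repository's actual cases. The `rabin_karp()` defined here is only a stand-in built on `str.find` so that the sketch runs on its own.

```python
import unittest


def rabin_karp(text, pattern):
    # Stand-in (assumption) so the sketch is self-contained;
    # the real function lives in the repository.
    return text.find(pattern)


class TestRabinKarp(unittest.TestCase):
    def test_smoke_rabin_karp(self):
        # Basic functionality: a present and an absent substring
        self.assertEqual(rabin_karp("Rabin Karp", "Karp"), 6)
        self.assertEqual(rabin_karp("Rabin Karp", "xyz"), -1)

    def test_edge_rabin_karp(self):
        # Substring longer than the string
        self.assertEqual(rabin_karp("ab", "abc"), -1)
        # Substring equal to the whole string
        self.assertEqual(rabin_karp("abc", "abc"), 0)
        # Match in the last possible window
        self.assertEqual(rabin_karp("abc", "c"), 2)
```

Saved to a file, a class like this can be run with `python -m unittest <file>`.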
Enter a string:
>>> Rabin Karp
Enter a substring:
>>> Karp
6
- Hashing: The algorithm relies on hashing to efficiently compare substrings.
- Rabin-Karp
- d is the base of the hash. In the Rabin-Karp algorithm, each character of a string is treated as a digit in a number system with base d. With d=256, an ASCII string is read as a base-256 number, which is unique for each substring until it is reduced modulo q.
- q is the modulus applied to the hash. Reducing modulo q keeps hash values small and, in languages with fixed-width integers, prevents overflow. Choosing a large prime for q helps to distribute hashes more evenly, which reduces the probability of collisions (situations where different substrings produce identical hashes).
- Collisions remain possible, so equal hashes indicate only a candidate match; comparing the characters directly confirms it.
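The roles of d and q can be checked with a small experiment. The sketch below uses d=256 and an assumed small prime q=101, chosen only so that a collision is easy to find. The helper names `poly_hash` and `roll` are hypothetical and not taken from the repository.

```python
d, q = 256, 101  # q=101 is an assumption for illustration


def poly_hash(s: str) -> int:
    """Read s as a base-d number and reduce it modulo q."""
    value = 0
    for ch in s:
        value = (d * value + ord(ch)) % q
    return value


def roll(prev_hash: int, old_char: str, new_char: str, m: int) -> int:
    """Update the hash of an m-character window after sliding it by one."""
    h = pow(d, m - 1, q)  # weight of the character leaving the window
    return (d * (prev_hash - ord(old_char) * h) + ord(new_char)) % q


text, m = "Rabin Karp", 4
window_hash = poly_hash(text[:m])
for i in range(1, len(text) - m + 1):
    window_hash = roll(window_hash, text[i - 1], text[i + m - 1], m)
    # The rolled hash agrees with hashing the window from scratch
    assert window_hash == poly_hash(text[i:i + m])

# Two different strings with the same hash: both give 84 for d=256, q=101
assert poly_hash("ab") == poly_hash("b,") == 84
```

With such a small q, collisions are frequent; a larger prime makes them rarer but does not eliminate them.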
Sergey Torshin @torshin5ergey