Commit 73e2486

Use DJB2 hash algorithm
1 parent b008f05 commit 73e2486

1 file changed

Lines changed: 13 additions & 7 deletions

src/statementcache.c

@@ -166,19 +166,25 @@ statementcache_finalize(StatementCache *sc, APSWStatement *statement)
 static Py_hash_t
 apsw_hash_bytes(void *data, Py_ssize_t nbytes)
 {
-  /* This is the same algorithm as fts3StrHash from the SQLite source
-     so it is battle tested. There is also strhash in SQLite showing
-     an algorithm from Knuth but that one has the problem of being
-     32 bit specific and we do 64 bit mostly. */
+  /* This is the DJB2 hash algorithm which is effective, simple, and
+     works particularly well on ascii text which most SQL is.
+
+     Previously a similar algorithm from SQLite was used which is a shift
+     and two xors. djb2 has fewer collisions so speedtest with
+     larger cache sizes performs a few percent better.
+
+     I did experiment with just using the length as the hash
+     but it was not a good discriminator.
+  */
 
   const unsigned char *cdata = (const unsigned char *)data;
 
-  /* unsigned must be used because signed overflow is undefined behaviour*/
-  Py_uhash_t hash = 0;
+  /* unsigned must be used because signed overflow is undefined behaviour */
+  Py_uhash_t hash = 5381;
 
   while (nbytes > 0)
   {
-    hash = (hash << 3) ^ hash ^ *cdata;
+    hash = (hash * 33) ^ *cdata;
     cdata++;
     nbytes--;
   }
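
To make the change easier to try outside APSW, here is a minimal standalone sketch of the two loops from the diff. It is not the project's code: `uint64_t` stands in for `Py_uhash_t`, and the function names `old_shift_xor_hash` and `new_djb2_xor_hash` are hypothetical, chosen only for this illustration. Note that the new loop folds each byte in with XOR, which matches the variant of Bernstein's hash often written as djb2a; the original djb2 adds the byte instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint64_t stands in for Py_uhash_t so this compiles without
   Python headers. Unsigned arithmetic wraps on overflow, which is why the
   commit insists on an unsigned type. */

/* Removed loop: shift left by 3, xor with the previous hash, xor the byte. */
static uint64_t
old_shift_xor_hash(const void *data, size_t nbytes)
{
  const unsigned char *cdata = (const unsigned char *)data;
  uint64_t hash = 0;

  while (nbytes > 0)
  {
    hash = (hash << 3) ^ hash ^ *cdata;
    cdata++;
    nbytes--;
  }
  return hash;
}

/* Added loop: seed of 5381, multiply by 33, xor the byte. */
static uint64_t
new_djb2_xor_hash(const void *data, size_t nbytes)
{
  const unsigned char *cdata = (const unsigned char *)data;
  uint64_t hash = 5381;

  while (nbytes > 0)
  {
    hash = (hash * 33) ^ *cdata;
    cdata++;
    nbytes--;
  }
  return hash;
}
```

Calling both functions on the same SQL strings and reducing the results modulo a cache size is one way to compare how evenly each spreads keys. The sketch itself makes no claim about collision rates; the few percent figure comes from the commit's own speedtest runs.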
