Description
First 8 digits of result of ssn() in 'nl_NL' provider are (needlessly) unique
Issue
The ssn() code generates the first 8 digits by calling random.sample(range(10), k=8). Because sample() draws without replacement, only the 9th digit can duplicate one of the first 8 digits.
The first 8 digits are therefore always all different:
>>> assert all(len(set(fake.ssn()[:-1])) == 8 for _ in range(100_000))
Fix
The code should call random.choices(range(10), k=8) instead.
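A minimal sketch of the difference between the two calls, using only the standard library rather than the provider code itself. The variable names and the fixed seed are illustrative assumptions, chosen so the sketch is repeatable.

```python
import random

rng = random.Random(0)  # seeded only so this sketch is repeatable

# sample() draws without replacement: the 8 digits are distinct every time.
sampled = [rng.sample(range(10), k=8) for _ in range(10_000)]
all_distinct = all(len(set(digits)) == 8 for digits in sampled)

# choices() draws with replacement: repeated digits are allowed.
chosen = [rng.choices(range(10), k=8) for _ in range(10_000)]
some_repeat = any(len(set(digits)) < 8 for digits in chosen)

print(all_distinct, some_repeat)
```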
Requirement
Uniqueness of the first 8 digits is not a requirement for Dutch BSNs (see example at Wikipedia).
It reduces the number of possible BSNs from more than 80 million to only about 1.6 million, which is less than the population of the Netherlands. That's how I found the issue: I was trying to generate a test file of 2 million unique BSNs.
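The size of the reduced range can be checked with a rough count. This sketch assumes the usual 11-test for BSNs (weights 9 down to 2 for the first 8 digits, and -1 for the 9th) and assumes that a prefix is rejected when its check digit would have to be 10. Both are assumptions about the provider's behaviour, not taken from its code. The brute-force loop takes a few seconds.

```python
import math
from itertools import permutations

# Weights of the first 8 digits in the 11-test (assumed, see lead-in).
WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2)

# Ordered selections of 8 distinct digits out of 10: 10! / 2!
distinct_prefixes = math.perm(10, 8)

# A prefix yields a valid BSN only if its weighted sum mod 11 is a single
# digit (0..9), because that value becomes the 9th digit.
valid = sum(
    1
    for prefix in permutations(range(10), 8)
    if sum(w * d for w, d in zip(WEIGHTS, prefix)) % 11 != 10
)

print(distinct_prefixes, valid)
```

Under these assumptions, the count of valid results comes out somewhat below the number of distinct prefixes, in line with the figure quoted above.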
Secondary issue
Although incorrect, the current version avoids ssn() results with 2 (or more) leading zeroes.
That's a happy accident, as BSNs must be either 8 digits long (written with one leading zero) or 9 digits long.
Therefore, when fixing ssn(), the code should also filter out results with leading '00'. That can be accomplished
- either by brute force (i.e. filtering out generated [0, 0, ...] digits lists)
- or by choosing digit 2 from range(1 if digit1 == 0 else 0, 10) (instead of range(10)).
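The two options above can be sketched as follows. The function names are hypothetical, and the sketch only covers the first 8 digits. The check digit and the 11-test retry loop are left out.

```python
import random


def digits_by_rejection(rng: random.Random) -> list[int]:
    """Option 1 (brute force): redraw whenever the list starts with 0, 0."""
    while True:
        digits = rng.choices(range(10), k=8)
        if digits[:2] != [0, 0]:
            return digits


def digits_by_constrained_second(rng: random.Random) -> list[int]:
    """Option 2: exclude 0 for the second digit when the first digit is 0."""
    first = rng.choice(range(10))
    second = rng.choice(range(1 if first == 0 else 0, 10))
    return [first, second] + rng.choices(range(10), k=6)


rng = random.Random()
print(digits_by_rejection(rng), digits_by_constrained_second(rng))
```

One design difference: option 1 is uniform over all allowed prefixes, while option 2 picks a leading 0 with probability 1/10 instead of 1/11, so prefixes with one leading zero are slightly over-represented. For test data that may not matter.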