
First 8 digits of result of ssn() in 'nl_NL' provider are (needlessly) unique #2253

@Dutcho


Issue

The ssn() code generates the first 8 digits by calling random.sample(range(10), k=8), which samples without replacement.
As a result, the first 8 digits are always all different, and only the 9th digit can duplicate one of them.

>>> from faker import Faker; fake = Faker('nl_NL')
>>> assert all(len(set(fake.ssn()[:-1])) == 8 for _ in range(100_000))

Fix

The code should call random.choices(range(10), k=8) instead, which samples with replacement, so digits may repeat.
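A minimal sketch of the proposed fix, assuming the usual BSN 11-proef (9·d1 + 8·d2 + ... + 2·d8 − d9 must be divisible by 11). `make_bsn` and `passes_11_proef` are hypothetical standalone helpers, not Faker's actual provider code. The first 8 digits come from `random.choices`, and the 9th digit is derived from the weighted sum, redrawing when no single digit fits. It does not yet handle the leading-zero problem described under "Secondary issue".

```python
import random

# Assumed 11-proef weights for digits 1..8; digit 9 gets weight -1
WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2)


def passes_11_proef(digits):
    """Check 9*d1 + 8*d2 + ... + 2*d8 - d9 is divisible by 11."""
    total = sum(w * d for w, d in zip(WEIGHTS, digits[:8])) - digits[8]
    return total % 11 == 0


def make_bsn(rng=random):
    """Hypothetical generator: first 8 digits drawn WITH replacement."""
    while True:
        digits = rng.choices(range(10), k=8)  # repeats allowed, unlike sample()
        check = sum(w * d for w, d in zip(WEIGHTS, digits)) % 11
        if check <= 9:  # a residue of 10 has no single-digit check digit
            digits.append(check)
            return "".join(map(str, digits))


rng = random.Random(2253)
bsn = make_bsn(rng)
print(bsn, passes_11_proef([int(c) for c in bsn]))
```

Since almost every 8-digit string drawn with replacement contains a repeated digit, outputs from this sketch will routinely fail the uniqueness assertion shown under "Issue", which is the intended behaviour.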

Requirement

Uniqueness of the first 8 digits is not a requirement for Dutch BSNs (see the example on Wikipedia).
It shrinks the number of possible BSNs from more than 80 million to only about 1.6 million, which is less than the population of the Netherlands. That's how I found the issue, while trying to generate a test file of 2 million unique BSNs.
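The figures above can be checked with a small counting sketch. It assumes the 11-proef with weights 9 down to 2 for the first 8 digits and −1 for the 9th, under which a prefix can be completed to a valid BSN exactly when its weighted sum modulo 11 is at most 9. Prefixes with 8 distinct digits (everything `random.sample(range(10), k=8)` can return) are counted by brute force; prefixes drawn with replacement, minus those starting with '00', are counted with a dynamic program over residues. All function names are hypothetical.

```python
from itertools import permutations
from math import perm

WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2)  # assumed 11-proef weights for digits 1..8


def completable(prefix):
    """True if some 9th digit 0..9 makes the prefix pass the 11-proef."""
    return sum(w * d for w, d in zip(WEIGHTS, prefix)) % 11 <= 9


def count_distinct_digit_bsns():
    """Brute force over all ordered 8-digit prefixes without repeated digits."""
    return sum(completable(p) for p in permutations(range(10), 8))


def count_completable(weights):
    """Count digit strings (repeats allowed) whose weighted sum mod 11 is <= 9."""
    counts = [1] + [0] * 10  # counts[r]: strings so far with weighted sum = r (mod 11)
    for w in weights:
        nxt = [0] * 11
        for r, c in enumerate(counts):
            for d in range(10):
                nxt[(r + w * d) % 11] += c
        counts = nxt
    return sum(counts[:10])


def count_all_bsns():
    """All completable prefixes minus those starting with '00'."""
    # With d1 = d2 = 0, only the remaining 6 weights contribute to the sum.
    return count_completable(WEIGHTS) - count_completable(WEIGHTS[2:])


distinct = count_distinct_digit_bsns()  # takes a few seconds
print(perm(10, 8))       # 1814400 prefixes with 8 distinct digits
print(distinct)          # valid BSNs reachable via random.sample
print(count_all_bsns())  # 90000000 valid BSNs without a leading '00'
```

Under these assumptions the full space is exactly 90 million, consistent with the ">80 million" estimate, while the distinct-digit space is a fraction of the 1,814,400 prefixes.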

Secondary issue

Although incorrect, the current version never returns ssn() results with 2 (or more) leading zeroes, since the first 8 digits contain at most one 0.
That's a happy accident, as BSNs must be 9 digits long (or 8 digits with a single leading zero).

Therefore, when fixing ssn(), the code should also filter out results with a leading '00'. That can be accomplished:

  • either by brute force (i.e. filtering out generated [0, 0, ...] digit lists)
  • or by choosing digit 2 from range(1 if digit1 == 0 else 0, 10) (instead of range(10)).
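Both options for avoiding a leading '00' can be sketched as follows. The helper names are hypothetical, and each assumes the first 8 digits are drawn with replacement as proposed under "Fix"; the 9th digit would still be derived from the 11-proef afterwards. Option 1 redraws whenever the prefix starts with two zeros; option 2 draws digit 2 from `range(1 if d1 == 0 else 0, 10)`.

```python
import random


def prefix_brute_force(rng=random):
    """Option 1 (hypothetical): redraw until the prefix does not start with 0, 0."""
    while True:
        digits = rng.choices(range(10), k=8)
        if digits[:2] != [0, 0]:
            return digits


def prefix_restricted_second_digit(rng=random):
    """Option 2 (hypothetical): digit 2 may not be 0 when digit 1 is 0."""
    d1 = rng.choice(range(10))
    d2 = rng.choice(range(1 if d1 == 0 else 0, 10))
    return [d1, d2] + rng.choices(range(10), k=6)


rng = random.Random(2253)
print(prefix_brute_force(rng), prefix_restricted_second_digit(rng))
```

Option 1 keeps all 99 allowed two-digit starts equally likely (1/99 each). Option 2 needs no retry loop, but each of the 9 starts "01" to "09" gets probability 1/10 × 1/9 instead of 1/10 × 1/10, a slight bias that may or may not matter for test data.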
