Allow country specific phone number blacklisting #14382

@talflon

Problem and background

Currently, the only country-specific phone number handling is for the NANP (USA et al.), where a number might be written with or without an initial 1 or +1. However, we often get phone number spam from other countries.

A majority of countries that are not in the NANP dial numbers in this way:

  • an initial + or 00 indicates what follows is the country code, for an international call. Surprisingly for me, it looks like almost no one is using 00 anymore.
  • an initial 0 indicates what follows is a national call and specifically not international. This is being used, at least for India.
  • most allow national calls that omit an initial 0.
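As a rough illustration of the three prefix conventions above, here is a minimal Python sketch that classifies a raw number by its leading characters. The function name `classify_prefix` and the three labels are made up for this example and are not part of SmokeDetector.

```python
import re


def classify_prefix(raw):
    """Return (kind, digits) for a raw number string.

    kind is 'international' for a leading + or 00, 'national' for a single
    leading 0, and 'unmarked' otherwise. The returned digits have the
    prefix removed. (Hypothetical helper, for illustration only.)
    """
    text = raw.strip()
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        return "international", digits
    if digits.startswith("00"):
        return "international", digits[2:]
    if digits.startswith("0"):
        return "national", digits[1:]
    return "unmarked", digits
```

With this, `+91 12345678` and `009112345678` both come out as international with digits `9112345678`, while `012345678` comes out as national with digits `12345678`.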

India is one of our most common spam number countries recently, and there are numbers that have been blacklisted three times: once starting with 91, once starting with 0, and once with no prefix. In general these are inserted separately each time a variant comes up. If the system understood that they were all the same number, like it does for NANP numbers, it could (in theory) be inserted once and automatically catch the others.

The question of country codes was previously discussed under #2169, where @teward pointed out that trying to generally catch any possible phone number, when considering every country's dialing plan, means a large number of possible matches. Keeping that in mind, I'd like to expand the number of TP matches in a controlled, limited way that also reduces the size of the blacklists.

An additional, but minor, point of frustration with the current code is that 10-digit long numbers that are not NANP need to be written like

9876543210(?#no noram)

While not a lot of numbers with country code are exactly 10 digits long, some domestic numbers, including India's, are 10 digits long if they don't include the initial 0.
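For readers unfamiliar with the marker: `(?#...)` is regex inline-comment syntax, so it adds no characters to what is matched. The small Python check below only demonstrates that property with the standard `re` module. How SmokeDetector itself reads the marker is not shown here.

```python
import re

# The blacklist-style entry from above, compiled as an ordinary regex.
entry = r"9876543210(?#no noram)"
pattern = re.compile(entry)

# The comment is ignored by the regex engine: only the digits are matched.
matches_plain = pattern.fullmatch("9876543210") is not None
matches_comment_text = pattern.fullmatch("9876543210no noram") is not None
```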

Proposed solution

For a number where the country code is known, I'd like a blacklist entry with a clear initial country code to automatically blacklist all the expected number formats. For example, a blacklist of one of the following:

+9112345678
009112345678

would match all of the following:

9112345678 (therefore also +9112345678)
12345678
012345678
009112345678

Ideally it would not match

+1-12345678
52-12345678

but that might be more subtle to implement.
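A minimal sketch of the expansion described above, in Python. The names `KNOWN_COUNTRY_CODES`, `split_country_code`, and `expand_entry` are hypothetical, and the country-code table is deliberately tiny; a real one would need every assigned code. NANP entries are left out of this sketch.

```python
import re

# Hypothetical, deliberately tiny table. Country codes are prefix-free,
# so at most one of them can match the start of a number.
KNOWN_COUNTRY_CODES = ("52", "91")


def split_country_code(international_digits):
    """Return (country_code, national_number), or None if the code is unknown."""
    for cc in KNOWN_COUNTRY_CODES:
        if international_digits.startswith(cc):
            return cc, international_digits[len(cc):]
    return None


def expand_entry(entry):
    """Return the set of digit strings a blacklist entry should cover."""
    text = entry.strip()
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        international = digits
    elif digits.startswith("00"):
        international = digits[2:]
    else:
        return {digits}  # no explicit country code: keep the exact number
    split = split_country_code(international)
    if split is None:
        return {international}
    cc, national = split
    return {cc + national, national, "0" + national, "00" + cc + national}
```

Both `expand_entry("+9112345678")` and `expand_entry("009112345678")` give the four variants listed above. Avoiding matches on `+1-12345678` or `52-12345678` would additionally require comparing against whole normalized numbers found in a post rather than substrings, which this sketch does not attempt.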

For a number where the country code is not known, I'd like a blacklist entry with a single initial 0 to automatically match both with and without the initial 0. I believe this would have very few new FPs, because:

  • SmokeDetector currently does not try to guess country codes, so it would only add one more number to be matched
  • many countries allow dialing without the initial 0, so it's likely a valid variant of the number
  • these are long numbers

The idea is that, instead of guessing every possible number match that might have been intended, blacklist entries with an initial 0, +, or 00 are ones where either the spammer or the blacklister decided to give information about the number: that it's a domestic number, or that it does in fact have the country code included*. Let's use that information.
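For the unknown-country case, the rule is small enough to sketch directly. This is an illustration under my own naming (`leading_zero_variants` is hypothetical), not existing SmokeDetector code: an entry with exactly one leading 0 covers itself and the same number without the 0, and every other entry is left alone.

```python
import re


def leading_zero_variants(entry):
    """Return the digit strings covered by an entry with no known country code."""
    text = entry.strip()
    digits = re.sub(r"\D", "", text)
    # Entries with + or 00 carry a country code and are handled elsewhere;
    # entries without a leading 0 give no extra information.
    if text.startswith("+") or digits.startswith("00") or not digits.startswith("0"):
        return {digits}
    return {digits, digits[1:]}
```

So `012345678` covers `012345678` and `12345678`, while a bare `12345678` still covers only itself.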

NANP numbers would continue to be handled similarly to the existing code, with any of the following:

12223334444
+12223334444
0012223334444
222-333-4444
2223334444(?#is noram)

matching all of the following:

12223334444
+12223334444
0012223334444
2223334444

And the following would continue to match only the single number given:

2223334444(?#no noram)
2223334444   (if -forced)
987654321
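The NANP rules listed above could be sketched roughly as follows. `nanp_variants` is a hypothetical name. The sketch does not validate area codes, and the `-forced` flag is not modeled: a bare 10-digit entry is simply treated as exact, which corresponds to the forced case in the list.

```python
import re

# Matches the (?#is noram) / (?#no noram) markers used in blacklist entries.
MARKER_RE = re.compile(r"\(\?#(is|no) noram\)")


def nanp_variants(entry):
    """Return the set of digit strings an entry covers under the NANP rules."""
    marker = MARKER_RE.search(entry)
    body = MARKER_RE.sub("", entry).strip()
    digits = re.sub(r"\D", "", body)
    if marker and marker.group(1) == "no":
        return {digits}  # explicitly not NANP: exact match only
    # Drop an international 00 prefix; a leading + leaves no digits to drop.
    international = digits[2:] if digits.startswith("00") else digits
    if len(international) == 11 and international.startswith("1"):
        national = international[1:]
    elif len(digits) == 10 and (
        (marker and marker.group(1) == "is")
        or re.fullmatch(r"\d{3}-\d{3}-\d{4}", body)
    ):
        national = digits
    else:
        return {digits}  # anything else: exact match only
    return {national, "1" + national, "001" + national}
```

`+12223334444` and `12223334444` reduce to the same digit string, so three digit strings are enough to cover the four written forms above.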

The current blacklist could also be deduplicated after these changes are made to the number-processing code.

Alternatives considered

Some of these additional matches can be automatically added by simply dropping initial 0s from all numbers when matching them. I think my proposal might not be much harder to implement, it's more targeted, and it preserves two ways for blacklisters to blacklist only the exact number given.
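To make the comparison concrete, here is a small Python sketch of the zero-dropping alternative. The function names are hypothetical. It treats two numbers as equal when their digits match after removing all leading 0s.

```python
import re


def strip_leading_zeros(number):
    """Keep digits only, then drop every leading 0."""
    return re.sub(r"\D", "", number).lstrip("0")


def same_after_stripping(a, b):
    """True if the two numbers are equal once leading zeros are ignored."""
    return strip_leading_zeros(a) == strip_leading_zeros(b)
```

This catches `012345678` against `12345678`, and `009112345678` against `+9112345678`, but not `+9112345678` against `12345678`, since it knows nothing about country codes. It also applies to every entry, with no per-entry way to opt out.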

I've considered proposing this just specifically for India. I think making it general over all country codes would automatically catch some more TPs. Alternatively, we could make a "shortlist" of country codes we know we've seen use these dialing patterns in spam.

I've considered grepping the blacklists to suggest a PR adding the different numbers as separate items, where I see that only one has been added. I don't like that because they're really the same number, and I'd prefer they be kept together for blacklisting and unblacklisting. Instead, my proposal opens the possibility that, where one member has blacklisted 111222333 and another member has information that it's really an Indian number, the second member can replace 111222333 with +91111222333 without increasing the number of lines in the file.

Possible addition

If someone proposes to blacklist a number with a country code or an initial 0, it would also be possible to automatically remove any old items that will be duplicated by the new item. That is, a watch would remove old watch items matched by the new one, and a blacklist would remove old blacklist and watch items matched by the new one.

This would keep the lists cleaner. It might, however, make it more complicated to roll back mistakes.
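A sketch of that cleanup step, assuming the set of variants covered by the new entry has already been computed somehow. `prune_superseded` and its arguments are hypothetical. Returning the removed items explicitly is one way to make a later rollback easier.

```python
import re


def digits_only(entry):
    """Reduce a list entry to its digits."""
    return re.sub(r"\D", "", entry)


def prune_superseded(new_variants, new_is_blacklist, watchlist, blacklist):
    """Drop old entries covered by a new one.

    A new watch removes covered watch items; a new blacklist removes covered
    watch and blacklist items. Returns (watchlist, blacklist, removed).
    """
    def covered(entry):
        return digits_only(entry) in new_variants

    kept_watch = [e for e in watchlist if not covered(e)]
    removed = [e for e in watchlist if covered(e)]
    if new_is_blacklist:
        kept_black = [e for e in blacklist if not covered(e)]
        removed += [e for e in blacklist if covered(e)]
    else:
        kept_black = list(blacklist)
    return kept_watch, kept_black, removed
```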

Known issue

* Spammers are sloppy, both accidentally and on purpose. Sometimes they add an initial + to a number that does not actually have the country code included, like +12345678+ or even +12345678 for the number +9112345678, or +8001234567 for the number +1-800-123-4567. This could easily be copied by a blacklister, confusing SmokeDetector into matching some unintended numbers. We could add some heuristics for SmokeDetector to push back on numbers that look "wrong" like this, or we could just accept this as collateral damage that isn't hard to fix on a case-by-case basis.
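One possible shape for such a heuristic, as a Python sketch. `looks_suspicious` is a hypothetical name, and the checks rely only on a few general facts: no country code starts with 0, E.164 numbers have at most 15 digits, and a +1 number has 10 digits after the 1. It is far from complete.

```python
import re


def looks_suspicious(entry):
    """Flag +-prefixed entries that probably lack a real country code."""
    text = entry.strip()
    if not text.startswith("+"):
        return False  # only entries claiming a country code are checked
    digits = re.sub(r"\D", "", text)
    if digits.startswith("0"):
        return True  # no country code begins with 0
    if len(digits) > 15:
        return True  # longer than E.164 allows
    if digits.startswith("1") and len(digits) != 11:
        return True  # NANP: 1 plus a 10-digit national number
    return False
```

This flags `+12345678` but not `+8001234567`, so it would only reduce the problem, not remove it.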
