Is a custom word list okay to add to the repository? #43

@luc-x41

Hi! Over the last few years I've been checking and adding words to an internal dictionary that we use for our penetration testing reports, blog posts, etc. A colleague pointed out that I should maybe just be using a better dictionary than the default one, and pointed me here :)

Many of these words, such as cryptographically, canonicalized, satisfiable, and transpiling, are not yet in this repository, so I want to contribute/consolidate those. I have no automatically updating source for them (and we certainly could not publish customers' reports for the project to scrape words from 😅), so my question is whether you are interested in including a list of custom words that does not get automatic updates. The words are split out into:

  • about 250 jargon words like the aforementioned four, all in American English spelling because that's what we've standardized on;
  • about 100 acronyms and 200 brand names that I think you will mostly already have (I haven't done a diff yet); and
  • about 60 names of different protocols or standards (WHOIS, WebAuthn), tools (sudo, rsync), attacks (Heartbleed, Clickjacking), encodings (Base64), etc., few of which seem to be in the repository.
    • (By the way, if someone has an overarching name for these sixty, I'd be interested. The distinction I'm making is that a brand is something you might recommend, whereas Heartbleed, rsync, or USB are different. They can be trademarked, and you could recommend USB or rsync over some other connector or transfer software, but they are very broadly used as neutral terms (no risk of looking like an endorsement). They are also part of compound nouns such as USB drive, which I wouldn't think is trademarked or considered branded.)
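
For the diff mentioned above, a minimal sketch of how the overlap could be checked. The function name `missing_words` and the sample data are hypothetical, not something from this repository. The comparison is assumed to be case-sensitive, since casing matters for acronyms and brand names.

```python
def missing_words(custom, existing):
    """Return the custom words that are not in the existing list.

    Order of first appearance is kept and duplicates are dropped.
    The comparison is case-sensitive (an assumption, see above).
    """
    known = set(existing)
    seen = set()
    result = []
    for word in custom:
        if word not in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Hypothetical sample data, only to show the behavior.
custom = ["WHOIS", "sudo", "Heartbleed", "Base64", "sudo"]
existing = ["sudo", "base64"]
print(missing_words(custom, existing))
```

With this sample, `Base64` is still reported because only the lowercase `base64` is in the existing list.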

If yes, follow-up questions are:

  • What should the structure be? If everything in ./wordlists/ is expected to be automatically generated, I could simply write a script that does nothing more than echo the words. Alternatively, a plain text file could be included among the generated word lists, perhaps with a comment at the top that indicates it is custom.
  • Should they be in one list/file with a blank line and a comment separating each category (that's my current structure), or in separate lists? The repository currently uses separate lists per category, but because these words will not update, it might also be sensible to just collect them in one place.
    • The acronyms and brand names are a bit different because those categories already exist. I will first check whether there's any point in merging them: if there are more than, say, a dozen new entries, then it probably makes sense to incorporate them into the existing scripts for acronyms and brands.
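
To make the single-file option concrete, here is a rough sketch of how such a file could be read back into categories. The format is an assumption based on my current structure: every `#` comment line is treated as a category heading, and blank lines are ignored. The name `parse_word_list` and the sample contents are hypothetical.

```python
def parse_word_list(text):
    """Split a word list into {category: [words]}.

    Assumed format (hypothetical): a '#' comment line names the category
    for the words that follow it; blank lines are skipped.
    """
    categories = {}
    current = "uncategorized"  # used for words before the first comment
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Keep the previous category if the comment is empty.
            current = line.lstrip("#").strip() or current
            continue
        categories.setdefault(current, []).append(line)
    return categories


# Hypothetical sample file contents.
SAMPLE = """\
# jargon
cryptographically
canonicalized

# protocols and tools
WHOIS
rsync
"""

for name, words in parse_word_list(SAMPLE).items():
    print(name, len(words))
```

If separate lists are preferred after all, the same function could be used to split the single file into one file per category.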
