fixed uri regex issue #3815

Merged: 19 commits, Feb 13, 2025

kashifkhan0771
Contributor

@kashifkhan0771 kashifkhan0771 commented Dec 23, 2024

Description:

This PR fixes GitHub issue #3686.
Screenshot from 2024-12-23 19-05-15

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@kashifkhan0771 kashifkhan0771 requested a review from a team as a code owner December 23, 2024 14:13
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil)
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil)

var (
-	keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`)
+	keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\w-\.]{3,50}:([\w-\.]{3,50})@[-.%\w\/:]+\b`)
Contributor Author

\S matches any non-whitespace character, which is very broad. Instead, we are now using \w, which matches [A-Za-z0-9_], and extending it by adding a few special characters to suit our needs.
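
To illustrate why `\S` is too broad here, the sketch below runs the old pattern and a narrowed variant over a single line that holds two URLs. The input string and the `firstMatch` helper are made up for illustration, and the narrowed class is written as `[\w.-]`, which is assumed to be equivalent to the `[\w-\.]` in the diff.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Old pattern from the diff: username and password are any non-whitespace run.
	oldPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`)
	// Narrowed variant: [\w.-] stands in for the diff's [\w-\.] (assumed equivalent).
	newPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\w.-]{3,50}:([\w.-]{3,50})@[-.%\w\/:]+\b`)
)

// sampleInput is illustrative: a plain URL followed by one with credentials.
const sampleInput = "https://example.com/docs,https://user:secret@db.internal"

// firstMatch is a hypothetical helper: it returns the first full match, or "".
func firstMatch(re *regexp.Regexp, s string) string {
	return re.FindString(s)
}

func main() {
	fmt.Printf("old: %q\n", firstMatch(oldPat, sampleInput))
	fmt.Printf("new: %q\n", firstMatch(newPat, sampleInput))
}
```

With this input, the old pattern's first match spans both URLs, because `\S` also consumes `/`, `,` and `:`. The narrowed variant matches only the second URL.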

@kashifkhan0771 kashifkhan0771 self-assigned this Jan 10, 2025
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil)
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil)

var (
-	keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`)
+	keyPat = regexp.MustCompile(`\b(?:https?:\/\/)?[\w-\.$~!]{3,50}:([\w-\.%$^&#]{3,50})@[-.\w]+\b`)
Contributor

This is missing a large number of valid characters for usernames and passwords. The host pattern is also still fairly permissive and would match things that could never be valid, e.g. @----__-2as-2.
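
The host concern can be checked in isolation. The sketch below anchors the proposed host class `[-.\w]+` and compares it with a stricter label-based check. The stricter pattern is a hypothetical illustration, not something proposed in this PR, and the sample hosts other than `----__-2as-2` are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Host class from the proposed pattern, anchored so it must cover the whole input.
	looseHost = regexp.MustCompile(`^[-.\w]+$`)
	// Hypothetical stricter check (not from the PR): dot-separated labels that
	// start and end with a letter or digit and otherwise allow only '-'.
	strictHost = regexp.MustCompile(`^[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?)*$`)
)

func main() {
	for _, host := range []string{"----__-2as-2", "db.example.com", "localhost"} {
		fmt.Printf("%-16q loose=%v strict=%v\n",
			host, looseHost.MatchString(host), strictHost.MatchString(host))
	}
}
```

The loose class accepts every sample, including the invalid one, while the stricter check rejects `----__-2as-2` because its first label starts with a hyphen and contains underscores.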

Contributor Author

How about something like:

\b(?:https?:\/\/)?[\w-\.$~!&'()*+,;=:%-]{3,50}:([\w-\.%$^#&'()*+,;=:%-]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?\b

Added additional valid characters and fixed the host pattern too.

Contributor

That's looking a bit better. Some notes:

  1. The scheme prefix shouldn't be optional
  2. : isn't a valid username character
  3. Username isn't always required (example)
  4. The username and password patterns are both missing special characters. It would be pragmatic to add all applicable special characters from [[:graph:]], and remove them later if they're causing issues. e.g.,
\bhttps?:\/\/[\w!#$%&()*+,\-./;<=>?@[\\\]^_{|}~]{0,50}:([\w!#$%&()*+,\-./:;<=>?[\\\]^_{|}~]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?\b
  5. The pattern needs to be able to detect port as well as path
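
A quick way to sanity-check the first three notes is to run the suggested pattern over a few inputs. This is a minimal sketch: the pattern is copied verbatim from the comment above, while the `password` helper and the sample URLs are hypothetical and do not come from the PR's test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// suggestedPat is the pattern from the review comment above, copied verbatim.
var suggestedPat = regexp.MustCompile(`\bhttps?:\/\/[\w!#$%&()*+,\-./;<=>?@[\\\]^_{|}~]{0,50}:([\w!#$%&()*+,\-./:;<=>?[\\\]^_{|}~]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?\b`)

// password is a hypothetical helper: it returns the captured password and
// whether the input matched at all.
func password(s string) (string, bool) {
	m := suggestedPat.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Illustrative inputs only.
	for _, s := range []string{
		"https://user:secret@example.com", // scheme, username, password
		"https://:secret@example.com",     // empty username (note 3)
		"//user:secret@example.com",       // no scheme (note 1)
	} {
		pw, ok := password(s)
		fmt.Printf("%-34q matched=%v password=%q\n", s, ok, pw)
	}
}
```

The first two inputs match and capture `secret`; the scheme-less input does not match, since `https?:` is no longer optional.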

Contributor Author

Updated with port detection

@@ -30,7 +30,7 @@ var _ interface {
} = (*Scanner)(nil)

var (
-	keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`)
+	keyPat = regexp.MustCompile(`\bhttps?:\/\/[\w!#$%&()*+,\-./;<=>?@[\\\]^_{|}~]{0,50}:([\w!#$%&()*+,\-./:;<=>?[\\\]^_{|}~]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?(?::\d{1,5})?\b`)
Contributor

Hmm, this no longer matches paths after host/port. That's probably something worth keeping.
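
The sketch below shows the behaviour being described: the updated pattern stops after the port. It also shows one possible way to keep the path, by appending an optional path group. That extension is a hypothetical illustration and is not claimed to be the pattern that was eventually merged; the sample line is made up as well.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostPortSrc is the updated pattern from the diff above, copied verbatim.
const hostPortSrc = `\bhttps?:\/\/[\w!#$%&()*+,\-./;<=>?@[\\\]^_{|}~]{0,50}:([\w!#$%&()*+,\-./:;<=>?[\\\]^_{|}~]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?(?::\d{1,5})?\b`

var (
	hostPortPat = regexp.MustCompile(hostPortSrc)
	// Hypothetical extension, not the merged pattern: an optional path made of
	// word characters, '-', '.', '/' and '%'.
	withPathPat = regexp.MustCompile(hostPortSrc + `(?:/[\w\-./%]*)?`)
)

// sampleLine is an illustrative input, not taken from the PR.
const sampleLine = "dsn=https://user:secret@example.com:8080/api/v1 retry=3"

func main() {
	fmt.Printf("host/port only: %q\n", hostPortPat.FindString(sampleLine))
	fmt.Printf("with path:      %q\n", withPathPat.FindString(sampleLine))
}
```

On this input the first pattern ends its match at `:8080`, while the extended one also includes `/api/v1`.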

@kashifkhan0771 kashifkhan0771 requested a review from rgmz February 7, 2025 11:53
Collaborator

ahrav commented Feb 9, 2025

@kashifkhan0771 Is this ready to be merged? Wanted to double check before merging. Thanks.

@kashifkhan0771
Contributor Author

@kashifkhan0771 Is this ready to be merged? Wanted to double check before merging. Thanks.

It's done from my side. @rgmz needs to do a final review of the regex.

@zricethezav zricethezav merged commit d010607 into trufflesecurity:main Feb 13, 2025
13 checks passed
@kashifkhan0771 kashifkhan0771 deleted the fix/github-3686 branch February 13, 2025 05:59
Successfully merging this pull request may close these issues.

URI detector: regex matches invalid characters, greedily overlaps with other URIs
5 participants