Skip to content

Fix REGEX_URL_PROTOCOL to support standard URL schemes with colons #644

@opbot-xd

Description

@opbot-xd

Description

During the refactoring of the Cowrie extraction strategy (PR #639), a bug was identified in greedybear/regex.py regarding how URL protocols are matched.

Problem

The current regex REGEX_URL_PROTOCOL uses the pattern ps?, which expects the protocol to be immediately followed by // (e.g., http//). This causes the regex to miss standard URLs that include a colon, such as http:// or https://, which are commonly found in payloads.

Proposed Solution

Update the regex pattern to optionally match the colon character by changing ps? to ps?:?.

Location

greedybear/regex.py

Example

  • Current behavior: Matches http//example.com, misses http://example.com
  • Expected behavior: Should match both.

Related PRs

Discovered and fixed in #639.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingpythonPull requests that update Python code

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions