Pass additional Chunk information to detectors #1517

Open
@rgmz

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

Currently, detectors have no knowledge of a chunk's source (e.g., "Git") or its metadata (e.g., "file: package-lock.json"); they only receive raw bytes:

// FromData will scan bytes for results, and optionally verify them.
FromData(ctx context.Context, verify bool, data []byte) ([]Result, error)

While this design makes sense given TruffleHog's goal of scanning many kinds of sources (e.g., Git, Confluence, Slack), the lack of contextual information limits the usefulness of the detectors. For example, you cannot skip file types known to be problematic, such as yarn.lock (#1460) [1], nor can you write filetype- or language-specific rules, such as checking for JDBC credentials only in .java/JVM code [2].

Problem to be Addressed

Provide more context to detectors so that it is possible to ignore known-problematic files and file types (such as lockfiles) and to write file- or filetype-specific rules.
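
Once a detector can see file metadata, both goals reduce to cheap predicates over the file path. The sketch below assumes the detector is handed a slash-separated file path (which it currently is not); the deny-list contents and the helper names `shouldSkipFile` and `isJVMSource` are hypothetical.

```go
package main

import (
	"path"
	"strings"
)

// noisyFiles is a hypothetical deny-list of file names that tend to
// produce noisy results (e.g. lockfiles full of hashes and URLs).
var noisyFiles = map[string]bool{
	"yarn.lock":         true,
	"package-lock.json": true,
}

// shouldSkipFile reports whether a chunk from filePath should be ignored.
// filePath is assumed to be slash-separated, as Git paths are.
func shouldSkipFile(filePath string) bool {
	return noisyFiles[path.Base(filePath)]
}

// isJVMSource reports whether filePath looks like JVM source code, which a
// filetype-specific rule (e.g. a JDBC check) could use to decide whether to run.
func isJVMSource(filePath string) bool {
	switch strings.ToLower(path.Ext(filePath)) {
	case ".java", ".kt", ".scala", ".groovy":
		return true
	}
	return false
}
```

For example, `shouldSkipFile("web/yarn.lock")` is true and `isJVMSource("src/Main.java")` is true, while neither predicate needs to look at the chunk's bytes at all.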

Description of the Preferred Solution

A few potential solutions come to mind:

  1. Replace the FromData(ctx context.Context, verify bool, data []byte) ([]Result, error) function with FromChunk(ctx context.Context, chunk Chunk) ([]Result, error)
    https://github.com/trufflesecurity/trufflehog/blob/20b77938285b82bc80531ba176989b7f8bae8c4b/pkg/sources/sources.go#L14C1-L29
  2. Alter the signature of FromData to include SourceType as well as SourceMetadata (presumably you'd want SourceType to make pulling relevant metadata easier).
  3. Add a "preflight" check to each detector, separate from FromData, that determines whether it should run on a given chunk.

Additional Context

N/A

References

Footnotes

  1. As far as I can tell.

  2. You can write that rule; however, it would run on every chunk, which could adversely affect performance.

Metadata

Labels: enhancement, pkg/engine (PRs and Issues related to the `engine` package), pkg/sources (PRs and Issues related to the `sources` package)
