
Pass additional Chunk information to detectors #1517

Open
@rgmz


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

Presently, detectors have no knowledge of the source (e.g., "Git") or its metadata (e.g., "file: package-lock.json"); they receive only the raw bytes of each chunk:

// FromData will scan bytes for results, and optionally verify them.
FromData(ctx context.Context, verify bool, data []byte) ([]Result, error)

While this design makes sense given TruffleHog's goal of scanning many different sources (e.g., Git, Confluence, Slack), the lack of contextual information limits how useful the detectors can be. For example, you cannot skip known-bad file types like yarn.lock (#1460)[1], nor can you write file-type- or language-specific rules, such as checking for JDBC credentials only in .java/JVM code[2].
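A detector that sees only `data []byte` has no way to run a check like the one below. This is a minimal sketch, assuming a detector were handed the file path of each chunk; the `shouldSkip` helper and its deny-list are hypothetical and not part of TruffleHog.

```go
package main

import (
	"fmt"
	"path"
)

// noisyFiles is a hypothetical deny-list of generated lockfiles whose
// integrity hashes and URLs tend to produce false positives.
var noisyFiles = map[string]bool{
	"yarn.lock":         true,
	"package-lock.json": true,
}

// shouldSkip reports whether a chunk originating from filePath should be
// ignored. Git paths are slash-separated, so the path package (rather than
// path/filepath) is used to take the base name.
func shouldSkip(filePath string) bool {
	return noisyFiles[path.Base(filePath)]
}

func main() {
	for _, p := range []string{"yarn.lock", "web/package-lock.json", "src/config.js"} {
		fmt.Printf("%s -> skip=%v\n", p, shouldSkip(p))
	}
}
```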

Problem to be Addressed

Provide more context to detectors so that it's possible to ignore known-bad files/file types and to write file- or file-type-specific rules.

Description of the Preferred Solution

A few potential solutions come to mind:

  1. Replace the FromData(ctx context.Context, verify bool, data []byte) ([]Result, error) function with FromChunk(ctx context.Context, chunk Chunk) ([]Result, error), using the existing Chunk type:
    https://github.com/trufflesecurity/trufflehog/blob/20b77938285b82bc80531ba176989b7f8bae8c4b/pkg/sources/sources.go#L14C1-L29
  2. Alter the signature of FromData to include SourceType as well as SourceMetadata (presumably you'd want SourceType to make pulling relevant metadata easier).
  3. Add a "preflight" check for each detector, separate from FromData, to determine whether or not it should run.

Additional Context

N/A

References

Footnotes

  1. As far as I can tell

  2. You can write that rule; however, it would run on every chunk, which could adversely affect performance.

Metadata


Assignees

No one assigned

Labels

enhancement; pkg/engine (PRs and Issues related to the `engine` package); pkg/sources (PRs and Issues related to the `sources` package)

