Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Description
Presently, detectors have no knowledge of the source (e.g., "Git") or metadata (e.g., "file: package-lock.json"), and only receive a stream of bytes.
See trufflehog/pkg/detectors/detectors.go, lines 20 to 21 at commit 20b7793.
While this design makes sense given TruffleHog's goal of scanning a multitude of sources (e.g., Git, Confluence, Slack), the lack of contextual information limits the usefulness of the detectors. For example, you cannot skip known-bad files like yarn.lock (#1460), nor can you write filetype- or language-specific rules, such as checking for JDBC credentials only in .java/JVM code (#1456).
Problem to be Addressed
Provide more context to detectors so that it's possible to ignore known-bad files/filetypes and write file/filetype-specific rules.
Description of the Preferred Solution
A few potential solutions come to mind:
- Replace the `FromData(ctx context.Context, verify bool, data []byte) ([]Result, error)` function with `FromChunk(ctx context.Context, chunk Chunk) ([]Result, error)` (see https://github.com/trufflesecurity/trufflehog/blob/20b77938285b82bc80531ba176989b7f8bae8c4b/pkg/sources/sources.go#L14C1-L29).
- Alter the signature of `FromData` to include `SourceType` as well as `SourceMetadata` (presumably you'd want `SourceType` to make pulling relevant metadata easier).
- Add a "preflight" check for each detector, separate from `FromData`, to determine whether or not it should run.
Additional Context
N/A
References
- Ignore known false-positives for the Parseur detector #1460
- Potential typo in Signable key regex #1456