Pass additional `Chunk` information to detectors

### Community Note

* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the community and maintainers prioritize this request
* Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
* If you are interested in working on this issue or have submitted a pull request, please leave a comment

### Description

Presently, detectors have no knowledge of the source (e.g., "Git") or metadata (e.g., ["file: package-lock.json"](https://github.com/trufflesecurity/trufflehog/blob/20b77938285b82bc80531ba176989b7f8bae8c4b/pkg/engine/git.go#L50-L63)), and only receive a stream of bytes.

https://github.com/trufflesecurity/trufflehog/blob/20b77938285b82bc80531ba176989b7f8bae8c4b/pkg/detectors/detectors.go#L20-L21

While this design makes sense given TruffleHog's goal of scanning a multitude of sources (e.g., Git, Confluence, Slack), the lack of contextual information limits the power/usefulness of the detectors. For example, you cannot skip known bad filetypes like `yarn.lock` (#1460)[^1], nor can you write filetype/language-specific rules like [checking for JDBC credentials in .java/JVM code](https://github.com/dbeaver/dbeaver/blob/a66e69a30cf5dec82c9fec488e7991cb6f15aca9/plugins/org.jkiss.dbeaver.ext.mysql/src/MySQLErrorsTest.java#L24)[^2].


[^1]: As far as I can tell.
[^2]: You can write that rule; however, it seems like it would run on every chunk, which could adversely affect performance.

## Problem to be Addressed

Provide more context to detectors so that it's possible to ignore known bad files/filetypes and write file/filetype-specific rules.
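
To make the goal concrete, here is a minimal sketch of the kind of path-based checks a detector could perform if it were given the file name from the chunk's metadata. The helper names (`shouldSkip`, `hasExtension`) and the ignore list are hypothetical and do not exist in TruffleHog; they only illustrate the two use cases above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ignoredNames is a hypothetical set of file names a detector might skip
// because they tend to produce noise (e.g. lockfiles full of hashes).
var ignoredNames = map[string]bool{
	"yarn.lock":         true,
	"package-lock.json": true,
}

// shouldSkip reports whether the file at path is on the ignore list.
// Only the base name is compared, so the directory does not matter.
func shouldSkip(path string) bool {
	return ignoredNames[filepath.Base(path)]
}

// hasExtension reports whether path ends in one of the given extensions
// (compared case-insensitively), e.g. to limit a JDBC rule to JVM sources.
func hasExtension(path string, exts ...string) bool {
	ext := strings.ToLower(filepath.Ext(path))
	for _, e := range exts {
		if ext == e {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"web/yarn.lock", "src/Main.java", "README.md"} {
		fmt.Printf("%s skip=%t jvm=%t\n", p, shouldSkip(p), hasExtension(p, ".java", ".kt", ".scala"))
	}
}
```

Neither check is possible today, because `FromData` only receives the raw bytes.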

## Description of the Preferred Solution

A few potential solutions come to mind:

1. Replace the `FromData(ctx context.Context, verify bool, data []byte) ([]Result, error)` function with `FromChunk(ctx context.Context, chunk Chunk) ([]Result, error)`
  https://github.com/trufflesecurity/trufflehog/blob/20b77938285b82bc80531ba176989b7f8bae8c4b/pkg/sources/sources.go#L14C1-L29
2. Alter the signature of `FromData` to include `SourceType` as well as `SourceMetadata` (presumably you'd want `SourceType` to make pulling relevant metadata easier).
3. Add a "preflight" check for each detector, separate from `FromData`, to determine whether or not it should run.


## Additional Context

N/A

### References

* #1460 
* #1456 

