

@thomasleplus

@thomasleplus thomasleplus commented Nov 12, 2025

What kind of change does this PR introduce?

This PR adds Java support to the probe introduced by #4499. It looks for references to the sun.misc.Unsafe or jdk.internal.misc.Unsafe classes, which can bypass the JVM's memory safety features (garbage collection, checks against out-of-bounds reads and writes, etc.).

Note that the PR includes a Java source code parser generated with Antlr4 that can be reused to add more Java probes and checks in the future.

What is the current behavior?

The probe looks for unsafe patterns in Go and C# code.

What is the new behavior (if this is a feature change)?

The probe also looks for unsafe patterns in Java code.

  • Tests for the changes have been added (for bug fixes/features)

Which issue(s) this PR fixes

Contributes to #3736.

Special notes for your reviewer

Does this PR introduce a user-facing change?

For user-facing changes, please add a concise, human-readable release note to
the release-note

(In particular, describe what changes users might need to make in their
application as a result of this pull request.)

Added Java support to probe for non-memory safe practices by detecting references to the sun.misc.Unsafe and jdk.internal.misc.Unsafe classes.

@thomasleplus thomasleplus requested a review from a team as a code owner November 12, 2025 19:14
@thomasleplus thomasleplus requested review from AdamKorcz and jeffmendoza and removed request for a team November 12, 2025 19:14
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Nov 12, 2025
@github-actions

This pull request has been marked stale because it has been open for 10 days with no activity

@github-actions github-actions bot added the Stale label Nov 23, 2025
Looks for references to the sun.misc.Unsafe or jdk.internal.misc.Unsafe classes, which can bypass the JVM's memory safety features (garbage collection, checks against out-of-bounds reads and writes, etc.).

Signed-off-by: Thomas Leplus <thomasleplus@users.noreply.github.com>

ci(github): improve performance
@github-actions

This pull request has been marked stale because it has been open for 10 days with no activity

@github-actions

This pull request has been marked stale because it has been open for 10 days with no activity

@github-actions github-actions bot added the Stale label Dec 28, 2025
@github-actions github-actions bot removed the Stale label Dec 29, 2025
@AdamKorcz
Contributor

Hi @thomasleplus, thanks a lot for the PR and sorry for the long wait here. I think we want coverage of unsafe code blocks in Java added to Scorecard, since Scorecard already supports it for other languages. However, given the size of your PR, have you thought of ways to do this with less code? For example, have you explored simple pattern search? At the time of writing, the PR has almost 59k lines of code, and I am not sure the added feature justifies that amount of code.

@thomasleplus
Author

Hi @AdamKorcz,

First, let me say that I understand your concern. I considered using a simple regex, but the trade-off is reliability. A regex can produce false positives, for example if it finds an import statement inside commented-out code. With some advanced regex magic we might exclude some false positives, but there are too many cases to exclude them all (what if the import is inside a string literal, or in an annotation attribute, etc.?). I know that it sounds unlikely, but given a large enough code base, I am sure someone will find such an occurrence in their code and complain about it. Only a real Java parser can handle all the cases (except dead code, maybe). That's why the equivalent Golang check uses a parser too; it just happens that one was available ready-made, whereas I couldn't find a Golang parser for Java, hence my decision to generate one with Antlr.

Another option I considered is to use something like go-tree-sitter, which supports Java, but it would require the tree-sitter binaries to be either installed separately or shipped with Scorecard. That doesn't seem ideal.

I think that my approach has the benefit of being a pure Golang solution. It's definitely a lot of code for just the feature described in this PR but it also means that any future Java check can benefit from the fact that the parser is there if it needs to find more complex code issues.

If your main concern is that it is impossible to review that many lines of code, a workaround could be to generate the parser code at build time from the grammar instead of committing it.
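If the generated code were dropped from the repository, a go:generate directive is one way to rebuild it from the grammar. The sketch below is illustrative only: it assumes a hypothetical javaparser package whose directory holds the Java grammar files (here named JavaLexer.g4 and JavaParser.g4) and an ANTLR 4 tool jar at the path shown. The paths, version and package name are all assumptions:

```go
// Package javaparser would hold the ANTLR-generated Java lexer and parser.
// Hypothetical layout: the .g4 grammars live in this directory and the ANTLR
// tool jar lives under tools/ at the repository root.
package javaparser

// Running `go generate ./...` invokes the ANTLR tool to emit Go sources here.
//go:generate java -jar ../../tools/antlr-4.13.2-complete.jar -Dlanguage=Go -package javaparser JavaLexer.g4 JavaParser.g4
```

One caveat with this design: go build does not run go generate, so contributors and CI would need Java and the ANTLR jar to run the generation step before building, and installing with go install module@version only works if the generated files are present in the published module.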

Ultimately I will respect what you guys as maintainers feel is best for the project. As I said it's a trade-off, there is no perfect answer. If you want me to rewrite this PR with a regex, I can do that.

Cheers,

Tom

@github-actions

This pull request has been marked stale because it has been open for 10 days with no activity

@github-actions github-actions bot added the Stale label Jan 18, 2026
@github-actions github-actions bot removed the Stale label Jan 25, 2026