Update julia grammar #520

Merged
merged 3 commits into from
May 1, 2025

savq
Contributor

@savq savq commented Jan 8, 2025

Update the Julia grammar to tree-sitter/tree-sitter-julia#135. That PR introduced several breaking changes, including changes to how function signatures and parameters are parsed.

(This should reduce the size of `parser.c` in semgrep-julia by about 15 MB.)

Checklist

  • Any new parsing code was already published, integrated, and merged into Semgrep. DO NOT MERGE THIS PR BEFORE THE SEMGREP INTEGRATION WORK WAS COMPLETED.
  • Change has no security implications (otherwise, ping the security team)

@CLAassistant
CLAassistant commented Jan 8, 2025

CLA assistant check
All committers have signed the CLA.

@aryx aryx requested a review from mjambon January 9, 2025 07:22
Collaborator

@aryx aryx left a comment

looks good to me

),

_statement: ($, previous) => choice(
previous,
$.semgrep_ellipsis,
),

typed_parameter: ($, previous) => choice(
Collaborator

Why remove this? Can we still have an ellipsis in type parameters even with this removal?

Contributor Author

Julia doesn't have a distinction between expressions and patterns, so function signatures are parsed as function calls, and checked in the lowering phase (after macro expansion). Having these parameter rules added a lot of duplication and conflicts, so we removed them.

Can we still have an ellipsis in type parameters even with this removal?

Yes. Ellipsis would now be parsed as splat_expression.
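
As a rough illustration of what recovering a signature from a call-shaped node might involve, here is a minimal JavaScript sketch. Only `splat_expression` is named in this thread; the other node labels and every field name (`callee`, `args`, `value`, `annotation`, `operand`) are assumptions made for the example, not the actual tree-sitter-julia CST or Semgrep's OCaml code.

```javascript
// Hypothetical, simplified node shapes; the real CST and Semgrep's
// OCaml translation differ.

// Turn one call argument into a parameter description, or reject it.
function paramOf(arg) {
  switch (arg.type) {
    case 'identifier':
      return { name: arg.name };
    case 'typed_expression': // x::T
      return { ...paramOf(arg.value), type: arg.annotation.name };
    case 'splat_expression': // x...
      return { ...paramOf(arg.operand), variadic: true };
    default:
      // Valid as an expression, but not as a parameter.
      throw new Error(`not a valid parameter: ${arg.type}`);
  }
}

// Split a call-shaped signature into its name and parameters.
function signatureParts(sig) {
  if (sig.type !== 'call_expression' || sig.callee.type !== 'identifier') {
    throw new Error(`not a function signature: ${sig.type}`);
  }
  return { name: sig.callee.name, params: sig.args.map(paramOf) };
}

const id = (name) => ({ type: 'identifier', name });

// Hand-built stand-in for the signature f(x::Int, rest...)
const signature = {
  type: 'call_expression',
  callee: id('f'),
  args: [
    { type: 'typed_expression', value: id('x'), annotation: id('Int') },
    { type: 'splat_expression', operand: id('rest') },
  ],
};

const parts = signatureParts(signature);
console.log(JSON.stringify(parts));
```

The point of the sketch is that validity checking moves out of the grammar: any expression is accepted as an argument by the parser, and the consumer decides which shapes count as parameters.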

@savq savq marked this pull request as ready for review January 17, 2025 16:03
@savq savq requested a review from a team as a code owner January 17, 2025 16:03
@nmote
Collaborator

nmote commented Feb 3, 2025

Hey, apologies for the delay here. I can take responsibility for this review going forward. The changes here look good, but do you plan to integrate these into Semgrep as well? We likely can't accept this PR otherwise.

@savq
Contributor Author

savq commented Feb 3, 2025

Yes, I'm currently working on the changes to semgrep itself. I opened semgrep/semgrep#10820, though there's still some work to do there.

Collaborator

@nmote nmote left a comment

Looks good! I've released this as semgrep/semgrep-julia@f973ebd.

We will merge this once the associated Semgrep PR is merged.

The `_statement` rule doesn't need to be extended because statements are
expressions and the `_expression` rule is already extended.
@savq
Contributor Author

savq commented Apr 24, 2025

Hey @nmote,

I added a fix for a bug in the extended grammar: it was including semgrep_ellipsis for both expressions and statements (which is redundant because statements are expressions in Julia). That was producing parse trees that were pretty hard to debug.
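
To make the redundancy concrete, here is a toy JavaScript model (not tree-sitter itself). It treats a grammar as a map from rule names to alternatives and counts how many ways a statement can derive an ellipsis. The `extend` and `countDerivations` helpers and the two-rule base grammar are hypothetical simplifications for this example.

```javascript
// Toy model, not tree-sitter: a grammar maps each rule name to its
// alternatives, and any name without an entry counts as a terminal.
const base = {
  _statement: ['_expression'],
  _expression: ['identifier'],
};

// Mimics the `($, previous) => choice(previous, extra)` extension pattern.
function extend(grammar, rule, extra) {
  return { ...grammar, [rule]: [...grammar[rule], extra] };
}

// Count the distinct ways `rule` can derive the single terminal `token`.
function countDerivations(grammar, rule, token) {
  if (!(rule in grammar)) return rule === token ? 1 : 0;
  return grammar[rule]
    .map((alt) => countDerivations(grammar, alt, token))
    .reduce((a, b) => a + b, 0);
}

// Extending only `_expression` is enough for statements to see the ellipsis.
const fixed = extend(base, '_expression', 'semgrep_ellipsis');
// Extending `_statement` as well adds a second, overlapping derivation.
const redundant = extend(fixed, '_statement', 'semgrep_ellipsis');

console.log(countDerivations(fixed, '_statement', 'semgrep_ellipsis'));     // 1
console.log(countDerivations(redundant, '_statement', 'semgrep_ellipsis')); // 2
```

This only shows why the second extension is redundant; it does not reproduce how tree-sitter resolves the overlap.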

Could you please make a new release?

@nmote
Collaborator

nmote commented Apr 24, 2025

Done!

nmote added a commit to semgrep/semgrep that referenced this pull request May 1, 2025
This PR upgrades support for Julia to a more recent version of
tree-sitter-julia.

- See tree-sitter/tree-sitter-julia#135 for the changes to the parser
- See semgrep/ocaml-tree-sitter-semgrep#520 for the changes to
semgrep-julia

Some notes:

- The changes to tree-sitter-julia in the linked PR were focused on
producing parse trees more faithful to JuliaSyntax (the default Julia
parser). However, JuliaSyntax delegates a lot of work to the lowering
phase, like checking function signatures. This means that while
tree-sitter produces a smaller parser, semgrep now needs to do more work
to find a function's name and parameters.

- Since parameters are now parsed as expressions, I left a lot of `todo`s
for expressions that don't constitute valid "patterns". I'm not sure
whether it would be a good idea to create an alias for `todo` to clearly
indicate that we don't handle those cases because they're not valid in
normal Julia (outside macros), and not because they should be handled in
the future.

- tree-sitter-julia now parses short functions as assignments (that's
also how it's done in the reference parser). To handle this, I translate
short functions into lambdas: `f(...) = ...` becomes `f = (...) -> ...`.

- The special handling for `where_clause` was removed. A later version
of tree-sitter-julia unified `where_clause` with `where_expression`.

- One of the issues I really hoped would be fixed was #10649, but that's
actually a limitation on semgrep's side! In general, I still feel there
are a lot of weird hacks because semgrep's AST only considers C-like
macros.
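
The short-function translation in the notes above can be pictured with a
small JavaScript sketch. The node shapes and the `desugarShortFunction`
helper are hypothetical, chosen for illustration; the actual translation
lives in semgrep's OCaml code and is not reproduced here.

```javascript
// Hypothetical node shapes, chosen for illustration only.

// Rewrite `f(args) = body` (an assignment whose left-hand side is a
// call) into `f = (args) -> body`. Other nodes are returned untouched.
function desugarShortFunction(node) {
  const isShortFunction =
    node.type === 'assignment' && node.lhs.type === 'call_expression';
  if (!isShortFunction) return node;
  return {
    type: 'assignment',
    lhs: node.lhs.callee,
    rhs: { type: 'lambda', params: node.lhs.args, body: node.rhs },
  };
}

const id = (name) => ({ type: 'identifier', name });
const int = (value) => ({ type: 'integer', value });

// Stand-in for: f(x) = x + 1
const shortFunction = {
  type: 'assignment',
  lhs: { type: 'call_expression', callee: id('f'), args: [id('x')] },
  rhs: { type: 'binary_expression', operator: '+', left: id('x'), right: int(1) },
};

// Stand-in for: y = 2 (a plain assignment, left alone)
const plain = { type: 'assignment', lhs: id('y'), rhs: int(2) };

const lowered = desugarShortFunction(shortFunction);
console.log(lowered.lhs.name, lowered.rhs.type);
console.log(desugarShortFunction(plain) === plain);
```

The first line printed shows the name moved to the left-hand side and the
body wrapped in a lambda; the second confirms that an ordinary assignment
passes through unchanged.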

---------

Co-authored-by: Nat Mote <[email protected]>
@nmote nmote removed the do not merge label May 1, 2025
@nmote nmote merged commit dcba33d into semgrep:main May 1, 2025
25 of 45 checks passed
@savq savq deleted the upgrade-julia branch May 3, 2025 21:12
semgrep-ci bot added a commit to semgrep/semgrep that referenced this pull request May 5, 2025
…emgrep/semgrep-proprietary#3802)

synced from OSS 06525d0

Co-authored-by: Sergio A. Vargas <[email protected]>

synced from Pro 7029d9d64eb5cdd2835b9eb238b02a84fe7ec17b
semgrep-ci bot added a commit to semgrep/semgrep that referenced this pull request May 6, 2025
…emgrep/semgrep-proprietary#3802)

synced from OSS 06525d0

Co-authored-by: Sergio A. Vargas <[email protected]>

synced from Pro 7029d9d64eb5cdd2835b9eb238b02a84fe7ec17b
nmote pushed a commit to semgrep/semgrep that referenced this pull request May 6, 2025
…emgrep/semgrep-proprietary#3802)

synced from OSS 06525d0

Co-authored-by: Sergio A. Vargas <[email protected]>

synced from Pro 7029d9d64eb5cdd2835b9eb238b02a84fe7ec17b